Tests of Difference: Mann-Whitney & Wilcoxon (AQA A-Level Psychology): Revision Notes
Introduction to inferential tests
Inferential tests are statistical procedures used to determine whether observed differences between groups are statistically significant or likely due to chance. Building on the sign test from Year 1, A-Level psychology introduces two additional non-parametric tests of difference: the Mann-Whitney U test and the Wilcoxon signed-rank test.
Both tests work with ordinal data (ranked data) and help researchers decide whether to accept or reject their null hypothesis based on calculated and critical values.
Non-parametric tests are particularly useful because they don't require data to follow a normal distribution, making them suitable for psychological research where this assumption is often violated.
Mann-Whitney U test
When to use Mann-Whitney
The Mann-Whitney U test is appropriate when:
- You have an independent groups design (unrelated participants)
- Data is at the ordinal level of measurement
- You want to test for differences between two groups
- Group sizes can be different (e.g., Group A = 10 participants, Group B = 8 participants)
Worked Example: Employment Discrimination Study
Aim: To investigate whether employers show bias against job candidates with a history of schizophrenia.
Method: Eighteen employers rated job applicants on interview suitability using a 1-20 scale (1 = definitely would not interview, 20 = definitely would interview). Group A received application forms mentioning 'recovering schizophrenic', whilst Group B received identical forms without this phrase.
Hypotheses:
- Alternative hypothesis: There is a difference in suitability ratings based on whether applicants are described as having schizophrenia (non-directional, two-tailed)
- Null hypothesis: There is no difference in suitability ratings between the two groups
Step-by-step calculation process
Step 1: Create a table of ranks
All data from both groups must be ranked together from lowest to highest. When identical scores appear multiple times, calculate the mean rank position. For example, if the score 12 appears at rank positions 7, 8, 9, and 10, all receive the rank of 8.5.
Set out the data in a table showing:
- Individual participant scores
- Rank assigned to each score
- Sum of ranks for each group ($R_A$ and $R_B$)
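The Step 1 ranking procedure can be sketched in Python as below. This is a minimal illustration, not part of the AQA specification: the function names `rank_with_ties` and `rank_sums` are hypothetical, and the scores in the example are invented toy values, not data from the employment study.

```python
def rank_with_ties(values):
    """Rank values from lowest (rank 1) to highest; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over every value tied with the one at 'start'
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end are rank positions start+1..end+1; use their mean
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sums(group_a, group_b):
    """Rank both groups together as one set, then return (R_A, R_B)."""
    ranks = rank_with_ties(group_a + group_b)
    return sum(ranks[:len(group_a)]), sum(ranks[len(group_a):])


# Invented toy scores: the two 7s tie for positions 2 and 3, so each gets rank 2.5
print(rank_sums([4, 7, 7, 10], [9, 12, 15]))  # (11.0, 17.0)
```

Because tied scores share the mean rank, the total of all ranks is unchanged: R_A + R_B always equals N(N + 1)/2, where N is the combined number of participants (here 7 × 8 / 2 = 28). This makes a quick arithmetic check on the ranking.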
Step 2: Calculate the U value
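Use the formula:
$$U_A = R_A - \frac{n_A(n_A + 1)}{2}$$
Where:
- $U_A$ = U value for group A
- $R_A$ = sum of ranks for that group
- $n_A$ = number of participants in that group
Repeat the calculation for group B to obtain $U_B$. As a check, the two values should satisfy $U_A + U_B = n_A n_B$. The smaller of the two U values becomes your calculated test statistic.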
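The Step 2 arithmetic can be checked with a short sketch. It assumes the rank sums have already been found in Step 1 and applies U = R − n(n + 1)/2 to each group; the function name `mann_whitney_u` is hypothetical and the numbers in the example are invented.

```python
import math


def mann_whitney_u(r_a, n_a, r_b, n_b):
    """Return the smaller U, computing U = R - n(n + 1)/2 for each group."""
    u_a = r_a - n_a * (n_a + 1) / 2
    u_b = r_b - n_b * (n_b + 1) / 2
    # Arithmetic check: the two U values must add up to n_A * n_B
    if not math.isclose(u_a + u_b, n_a * n_b):
        raise ValueError("rank sums do not match the group sizes; re-check the ranking")
    return min(u_a, u_b)


# Invented example: R_A = 11 with 4 participants, R_B = 17 with 3 participants
print(mann_whitney_u(11, 4, 17, 3))  # 1.0
```

Here U_A = 11 − 10 = 1 and U_B = 17 − 6 = 11; they sum to 4 × 3 = 12, and the smaller value, 1, is the test statistic.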
Step 3: Compare calculated and critical values
- Find the appropriate critical value using the critical values table
- Look up values based on group sizes ($n_A$ and $n_B$) at p ≤ 0.05 for a two-tailed test
- The calculated U value must be EQUAL TO or LESS THAN the critical value for significance
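The decision rule can be expressed as a small function, mainly to make the direction of the comparison explicit. This is a sketch only: `mann_whitney_decision` is a hypothetical name, and the critical value must be read from the published Mann-Whitney table for your n_A and n_B. The numbers below are arbitrary placeholders, not real table entries.

```python
def mann_whitney_decision(calculated_u, critical_u):
    """Significant when the calculated U is EQUAL TO or LESS THAN the critical value."""
    if calculated_u <= critical_u:
        return "significant: reject the null hypothesis"
    return "not significant: accept the null hypothesis"


# Arbitrary placeholder values; note that equality still counts as significant
print(mann_whitney_decision(5, 5))  # significant: reject the null hypothesis
print(mann_whitney_decision(6, 5))  # not significant: accept the null hypothesis
```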
Interpreting results
If the calculated U value meets the criterion above, the result is statistically significant (p ≤ 0.05). This means you reject the null hypothesis and accept the alternative hypothesis: the difference between the groups is unlikely to be due to chance alone.
Wilcoxon signed-rank test
When to use Wilcoxon
The Wilcoxon signed-rank test is appropriate when:
- You have a repeated measures design (related participants)
- Data is at the ordinal level of measurement
- You want to test for a difference between two conditions (e.g., before and after treatment)
- The same participants are measured twice
The key difference from Mann-Whitney is that Wilcoxon compares the same participants across both conditions (for example, at two different time points), rather than comparing two separate groups.
Worked Example: Anger Management Programme
Aim: To assess the effectiveness of an anger management programme for young offenders.
Method: Twenty teenagers in a young offenders institute completed anger questionnaires before and after eight intensive anger management sessions. Each participant received two anger scores which were compared to determine programme effectiveness.
Hypotheses:
- Alternative hypothesis: There is a difference in anger scores before and after treatment (non-directional, two-tailed)
- Null hypothesis: There is no difference in anger scores before and after treatment
Step-by-step calculation process
Step 1: Calculate differences and rank them
- Calculate the difference between each participant's before and after scores
- Ignore the sign (positive or negative) when ranking differences, but record it for Step 2
- Exclude any zero differences from the ranking process
- Rank all non-zero differences from smallest to largest
- Assign tied ranks the mean position when differences are identical
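Step 1 for Wilcoxon can be sketched as follows. The function names `mean_ranks` and `signed_ranks` are hypothetical, the before/after scores are invented toy values (not data from the anger management study), and differences are assumed to be taken as before minus after; taking after minus before simply swaps every sign.

```python
def mean_ranks(values):
    """Rank values from lowest (rank 1) upwards; ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # mean of positions start+1..end+1
        start = end + 1
    return ranks


def signed_ranks(before, after):
    """Return (sign, rank) pairs for each non-zero before-minus-after difference."""
    diffs = [x - y for x, y in zip(before, after)]
    nonzero = [d for d in diffs if d != 0]          # zero differences are dropped
    ranks = mean_ranks([abs(d) for d in nonzero])   # rank the sizes, ignoring sign
    return [("+" if d > 0 else "-", r) for d, r in zip(nonzero, ranks)]


# Invented scores for six participants; participant 2 has a zero difference
before = [20, 18, 15, 22, 17, 14]
after = [14, 18, 16, 16, 12, 11]
print(signed_ranks(before, after))
# [('+', 4.5), ('-', 1.0), ('+', 4.5), ('+', 3.0), ('+', 2.0)]
```

The two differences of 6 tie for rank positions 4 and 5, so each receives 4.5, while the sign of each difference is kept alongside its rank.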
Step 2: Calculate the T value
- Identify which sign (+ or -) appears less frequently in your difference column
- T = sum of ranks for the less frequent sign
- This T value becomes your calculated test statistic
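The rule for T can be written as a short function that takes (sign, rank) pairs from Step 1; the name `wilcoxon_t` is hypothetical. The notes do not say what to do if both signs appear equally often, so the sketch assumes you take the smaller of the two rank sums in that case, which matches the common textbook definition of T as the smaller signed-rank sum.

```python
def wilcoxon_t(signed):
    """T = sum of the ranks carrying the less frequent sign."""
    plus = [rank for sign, rank in signed if sign == "+"]
    minus = [rank for sign, rank in signed if sign == "-"]
    if len(plus) < len(minus):
        return sum(plus)
    if len(minus) < len(plus):
        return sum(minus)
    # Assumption: with equal counts, use the smaller of the two rank sums
    return min(sum(plus), sum(minus))


# Invented signed ranks: '-' appears once, so T is that single rank
signed = [("+", 4.5), ("-", 1.0), ("+", 4.5), ("+", 3.0), ("+", 2.0)]
print(wilcoxon_t(signed))  # 1.0
```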
Step 3: Compare calculated and critical values
- Use the critical values table for Wilcoxon
- Find the critical value based on N (number of participants, excluding those with zero differences)
- The calculated T value must be EQUAL TO or LESS THAN the critical value for significance
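Finding N and applying the decision rule can be sketched as below. The function names are hypothetical and the scores are invented toy values. The critical value must come from the Wilcoxon table for your N; the numbers passed to `wilcoxon_significant` are arbitrary and only demonstrate the direction of the rule.

```python
def wilcoxon_n(before, after):
    """N for the critical values table: participants whose two scores differ."""
    return sum(1 for x, y in zip(before, after) if x != y)


def wilcoxon_significant(t_value, critical_t):
    """Significant when the calculated T is EQUAL TO or LESS THAN the critical value."""
    return t_value <= critical_t


# Invented scores: one of the six participants has a zero difference, so N = 5
before = [20, 18, 15, 22, 17, 14]
after = [14, 18, 16, 16, 12, 11]
print(wilcoxon_n(before, after))  # 5

# Arbitrary values: equal to the critical value still counts as significant
print(wilcoxon_significant(3, 3))  # True
print(wilcoxon_significant(4, 3))  # False
```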
Interpreting results
When the calculated T value is greater than the critical value, the result is not significant (p > 0.05). This means you accept the null hypothesis - there is insufficient evidence of a real difference, and any observed changes could be due to chance.
Key distinctions between the tests
| Test | Design Type | Data Level | Test Statistic | Usage |
|---|---|---|---|---|
| Mann-Whitney U | Independent groups | Ordinal | U value | Comparing two different groups |
| Wilcoxon | Repeated measures | Ordinal | T value | Comparing same participants twice |
Both tests require the calculated value to be equal to or less than the critical value for statistical significance, unlike tests such as chi-squared, where the calculated value must be equal to or greater than the critical value.
Key Points to Remember:
- Mann-Whitney = independent groups with ordinal data (different participants in each condition)
- Wilcoxon = repeated measures with ordinal data (same participants measured twice)
- Both tests use ranking procedures to convert raw scores into ordinal data
- Calculated values must be ≤ critical values for significance in both tests
- Always exclude zero differences when calculating Wilcoxon signed-rank tests