Tests of Difference: Mann-Whitney & Wilcoxon Revision Notes for AQA A-Level Psychology

Tests of Difference: Mann-Whitney & Wilcoxon

Introduction to inferential tests

Inferential tests are statistical procedures used to determine whether observed differences between groups are statistically significant or likely due to chance. Building on the sign test from Year 1, A-Level psychology introduces two additional non-parametric tests of difference: the Mann-Whitney U test and the Wilcoxon signed-rank test.

Both tests work with ordinal (ranked) data. In each case the researcher compares a calculated value with a critical value to decide whether to accept or reject the null hypothesis.

Note

Non-parametric tests are particularly useful because they don't require data to follow a normal distribution, making them suitable for psychological research where this assumption is often violated.

Mann-Whitney U test

When to use Mann-Whitney

The Mann-Whitney U test is appropriate when:

You have an independent groups design (unrelated participants)
Data is at ordinal level of measurement
You want to test for differences between two groups
Group sizes can be different (e.g., Group A = 10 participants, Group B = 8 participants)

Example

Worked Example: Employment Discrimination Study

Aim: To investigate whether employers show bias against job candidates with a history of schizophrenia.

Method: Eighteen employers rated job applicants on interview suitability using a 1-20 scale (1 = definitely would not interview, 20 = definitely would interview). Group A received application forms mentioning 'recovering schizophrenic', whilst Group B received identical forms without this phrase.

Hypotheses:

Alternative hypothesis: There is a difference in suitability ratings based on whether applicants are described as having schizophrenia (non-directional, two-tailed)
Null hypothesis: There is no difference in suitability ratings between the two groups

Step-by-step calculation process

Step 1: Create a table of ranks

All data from both groups must be ranked together from lowest to highest. When identical scores appear multiple times, calculate the mean rank position. For example, if the score 12 appears at rank positions 7, 8, 9, and 10, all receive the rank of 8.5.

Set out a calculations table showing:

Individual participant scores
Rank assigned to each score
Sum of ranks for each group (RA and RB)
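
The ranking step above can be sketched in Python. This is a minimal illustration rather than a prescribed method: the function name `mean_ranks` and all the ratings are invented for the example and are not data from the study.

```python
def mean_ranks(scores):
    """Rank scores from lowest to highest; tied scores share their mean rank."""
    ordered = sorted(scores)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        # Move j past every copy of the same score
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Rank positions i+1 .. j (1-based) are averaged
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[s] for s in scores]

# The tie described in the text: a score of 12 at positions 7, 8, 9 and 10
tie_demo = mean_ranks([1, 2, 3, 4, 5, 6, 12, 12, 12, 12])
print(tie_demo[6:])  # [8.5, 8.5, 8.5, 8.5]

# Hypothetical ratings, invented for illustration only
group_a = [3, 5, 8, 12]
group_b = [12, 14, 15, 12, 18]

# Rank both groups together, then split the ranks back out
ranks = mean_ranks(group_a + group_b)
r_a = sum(ranks[:len(group_a)])  # sum of ranks for group A
r_b = sum(ranks[len(group_a):])  # sum of ranks for group B
print(r_a, r_b)  # 11.0 34.0
```

A quick check on the ranking: the two rank sums should add up to N(N + 1)/2 for the pooled sample, here 9 × 10 / 2 = 45.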

Step 2: Calculate the U value

Important

Use the formula: $U = U_A = R_A - \frac{N_A (N_A + 1)}{2}$

Where:

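$U_A$ = U value for group A
$R_A$ = sum of ranks for that group
$N_A$ = number of participants in that group

Repeat the calculation for group B, using $U_B = R_B - \frac{N_B (N_B + 1)}{2}$. The smaller of $U_A$ and $U_B$ becomes your test statistic. As a check, $U_A + U_B = N_A N_B$.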
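
As a rough sketch, the formula can be applied to each group in turn and the smaller result kept. The function names and the rank sums below are hypothetical values chosen for illustration, not figures from the worked example.

```python
def u_value(rank_sum, n):
    """U for one group: its rank sum minus n(n + 1) / 2."""
    return rank_sum - n * (n + 1) / 2

def mann_whitney_u(r_a, n_a, r_b, n_b):
    """Return the smaller of the two U values (the test statistic)."""
    u_a = u_value(r_a, n_a)
    u_b = u_value(r_b, n_b)
    return min(u_a, u_b)

# Hypothetical rank sums: group A has 4 participants, group B has 5
u_a = u_value(11, 4)   # 11 - (4 * 5) / 2 = 1.0
u_b = u_value(34, 5)   # 34 - (5 * 6) / 2 = 19.0
u = mann_whitney_u(11, 4, 34, 5)
print(u_a, u_b, u)  # 1.0 19.0 1.0
```

The two U values always sum to the product of the group sizes (here 1 + 19 = 4 × 5), which is a useful way to catch arithmetic slips.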

Step 3: Compare calculated and critical values

Find the appropriate critical value using the critical values table
Look up values based on group sizes ($N_A$ and $N_B$) at p ≤ 0.05 for a two-tailed test
The calculated U value must be EQUAL TO or LESS THAN the critical value for significance
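
The decision rule can be written as a one-line comparison. In this sketch the function name `is_significant` is made up, and the critical value of 10 is a placeholder only: the real figure must be read from the critical values table for your group sizes.

```python
def is_significant(calculated, critical):
    """Significant when the calculated value is equal to or less than the critical value."""
    return calculated <= critical

placeholder_critical = 10  # NOT a real table value; look yours up

print(is_significant(8, placeholder_critical))   # True: below the critical value
print(is_significant(10, placeholder_critical))  # True: equal also counts
print(is_significant(12, placeholder_critical))  # False: above the critical value
```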

Interpreting results

If the calculated value meets the criteria above, the result is statistically significant (p ≤ 0.05). This means you reject the null hypothesis and accept the alternative hypothesis: there is a genuine difference between the groups that is unlikely to be due to chance alone.

Wilcoxon signed-rank test

When to use Wilcoxon

The Wilcoxon signed-rank test is appropriate when:

You have a repeated measures design (related participants)
Data is at ordinal level of measurement
You want to test for differences before and after treatment
The same participants are measured twice

Note

The key difference from Mann-Whitney is that Wilcoxon uses the same participants measured at different time points, rather than comparing two separate groups.

Example

Worked Example: Anger Management Programme

Aim: To assess the effectiveness of an anger management programme for young offenders.

Method: Twenty teenagers in a young offenders institute completed anger questionnaires before and after eight intensive anger management sessions. Each participant received two anger scores which were compared to determine programme effectiveness.

Hypotheses:

Alternative hypothesis: There is a difference in anger scores before and after treatment (non-directional, two-tailed)
Null hypothesis: There is no difference in anger scores before and after treatment

Step-by-step calculation process

Step 1: Calculate differences and rank them

Calculate the difference between each participant's before and after scores
Ignore the sign (positive or negative) when ranking differences
Exclude any zero differences from the ranking process
Rank all non-zero differences from smallest to largest
Assign tied ranks the mean position when differences are identical
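
The steps above might look like this in Python. It is a sketch under assumptions: the scores are invented, the function name `rank_differences` is hypothetical, and the difference is taken as after minus before (the notes do not fix a direction, and the final T value does not depend on it).

```python
def rank_differences(before, after):
    """Return (difference, rank) pairs for every non-zero difference."""
    diffs = [a - b for b, a in zip(before, after)]
    nonzero = [d for d in diffs if d != 0]          # drop zero differences
    sizes = sorted(abs(d) for d in nonzero)         # ignore the sign when ranking

    def mean_rank(size):
        first = sizes.index(size) + 1               # first 1-based position
        count = sizes.count(size)                   # how many ties share it
        return first + (count - 1) / 2              # mean of the tied positions

    return [(d, mean_rank(abs(d))) for d in nonzero]

# Hypothetical anger scores, invented for illustration only
before = [20, 15, 18, 22, 17, 19]
after = [16, 15, 20, 17, 15, 16]

ranked = rank_differences(before, after)
print(ranked)
# [(-4, 4.0), (2, 1.5), (-5, 5.0), (-2, 1.5), (-3, 3.0)]
```

Six participants were measured here, but one has a zero difference, so only five differences are ranked. The two differences of size 2 occupy rank positions 1 and 2 and so share the mean rank of 1.5.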

Step 2: Calculate the T value

Important

Identify which sign (+ or -) appears less frequently in your difference column
T = sum of ranks for the less frequent sign
This T value becomes your calculated test statistic
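
A minimal sketch of this step is shown below, using hypothetical (difference, rank) pairs. The function name `wilcoxon_t` is invented, and the handling of an equal number of plus and minus signs (taking the smaller rank sum) is an assumption, since the notes do not cover that case. Many texts define T directly as the smaller of the two rank sums, which usually gives the same answer.

```python
def wilcoxon_t(ranked):
    """T = sum of the ranks belonging to the less frequent sign."""
    positive = [rank for diff, rank in ranked if diff > 0]
    negative = [rank for diff, rank in ranked if diff < 0]
    if len(positive) == len(negative):
        # Assumption: with equal counts, fall back to the smaller rank sum
        return min(sum(positive), sum(negative))
    less_frequent = positive if len(positive) < len(negative) else negative
    return sum(less_frequent)

# Hypothetical (difference, rank) pairs, invented for illustration only
ranked = [(-4, 4.0), (2, 1.5), (-5, 5.0), (-2, 1.5), (-3, 3.0)]

t = wilcoxon_t(ranked)
n = len(ranked)  # N excludes any zero differences
print(t, n)  # 1.5 5
```

Only one difference is positive, so plus is the less frequent sign and T is that single rank, 1.5.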

Step 3: Compare calculated and critical values

Use the critical values table for Wilcoxon
Find the critical value based on N (number of participants, excluding those with zero differences)
The calculated T value must be EQUAL TO or LESS THAN the critical value for significance

Interpreting results

When the calculated T value is greater than the critical value, the result is not significant (p > 0.05). This means you retain (accept) the null hypothesis: there is insufficient evidence of a real difference, and any observed changes could be due to chance.

Key distinctions between the tests

Test	Design Type	Data Level	Test Statistic	Usage
Mann-Whitney U	Independent groups	Ordinal	U value	Comparing two different groups
Wilcoxon	Repeated measures	Ordinal	T value	Comparing same participants twice

Note

Both tests require the calculated value to be equal to or less than the critical value for statistical significance, unlike some other statistical tests where calculated values must exceed critical values.

Summary

Key Points to Remember:

Mann-Whitney = independent groups with ordinal data (different participants in each condition)
Wilcoxon = repeated measures with ordinal data (same participants measured twice)
Both tests use ranking procedures to convert raw scores into ordinal data
Calculated values must be ≤ critical values for significance in both tests
Always exclude zero differences when calculating the Wilcoxon signed-rank test
