Choosing a Statistical Test (AQA A-Level Psychology): Revision Notes
Statistical testing helps researchers determine whether observed differences or relationships in their data are genuine findings or merely due to chance. Descriptive statistics summarise data through measures of central tendency and dispersion, but they cannot tell us whether findings are statistically significant. This is where inferential statistical tests become essential.
Purpose of statistical testing
Statistical tests are used to analyse whether differences or correlations found in research are statistically significant - meaning they are unlikely to have occurred by chance alone. The results help researchers decide whether to accept or reject the null hypothesis, which typically states that no real difference or relationship exists between variables.
Statistical significance doesn't prove that a finding is practically important or meaningful - it simply indicates that the result is unlikely to be due to random chance alone.
Three key factors for test selection
When selecting an appropriate statistical test, researchers must consider three essential factors that will guide their decision-making process:
The Three Essential Factors:
- Whether they are looking for a difference or correlation
- The experimental design being used (if testing for differences)
- The level of measurement of the data
All three factors must be carefully considered to select the most appropriate statistical test for your research question.
Factor 1: Difference or correlation?
The first consideration relates to the research aim. Researchers typically investigate either:
- Differences between groups or conditions (e.g., comparing memory performance between two age groups)
- Correlations or associations between variables (e.g., examining the relationship between stress levels and academic performance)
This distinction should be clear from the research hypothesis. Note that "correlation" in this context includes both correlational analyses and investigations examining associations between categorical variables.
If your research question asks "Is there a difference between..." or "Do groups differ in...", you're testing for differences. If it asks "Is there a relationship between..." or "Are variables associated...", you're testing for correlation.
Factor 2: Experimental design
This factor only applies when testing for differences. Researchers must identify whether their study uses:
Related designs:
- Repeated measures: Same participants take part in all conditions
- Matched pairs: Different participants in each condition who have been matched on important variables
Unrelated design:
- Independent groups: Completely different participants in each condition with no matching
The key distinction is whether participants across conditions are connected in some meaningful way (related) or are entirely separate (unrelated).
Remember: If you're testing for correlation or association, experimental design doesn't matter - you can skip this factor and move directly to considering your level of measurement.
Factor 3: Levels of measurement
Data can be classified into three distinct levels of measurement, each with different characteristics and statistical requirements:
Nominal data
Nominal data consists of categories or labels where items can only belong to one group. Sometimes called categorical data, this represents the most basic level of measurement.
Key features:
- Data appears as frequencies within categories
- Items are discrete - they can only appear in one category
- The mode is the only meaningful average; a mean, median or range cannot be calculated
Examples of Nominal Data:
- Favourite colours (red, blue, green, yellow)
- Preferred transport methods (car, bus, train, bicycle)
- Gender categories (male, female, non-binary)
- Political party preference (Conservative, Labour, Liberal Democrat)
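Because nominal data is recorded as frequencies, the only sensible summaries are a count per category and the mode. The short Python sketch below illustrates this; the `responses` list is made-up example data, not results from any study.

```python
from collections import Counter

# Hypothetical nominal data: each participant's preferred transport method
responses = ["bus", "car", "train", "car", "bicycle", "car", "bus"]

# Nominal data is summarised as frequencies within each category
frequencies = Counter(responses)
print(frequencies)

# The mode (most frequent category) is the only meaningful average here;
# a mean or median of category labels would have no meaning
mode_category, mode_count = frequencies.most_common(1)[0]
print(mode_category, mode_count)  # car 3
```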
Ordinal data
Ordinal data involves ranking or ordering items along a scale where the position matters, but the intervals between positions are not equal.
Key features:
- Data can be arranged in order from lowest to highest
- Intervals between units are not equal in size
- Often based on subjective judgements (e.g., ratings) rather than objective measurements
- Sometimes called "unsafe data" due to lack of precision
For statistical testing, ordinal data is converted to ranks (1st, 2nd, 3rd, etc.) rather than using the original scores, as the raw numbers lack meaningful mathematical properties.
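Converting scores to ranks can be sketched in Python as follows. This assumes the convention commonly used in worked examples for tests such as Mann-Whitney and Wilcoxon: the lowest score receives rank 1, and tied scores share the average of the ranks they would otherwise occupy. The function name `rank_scores` and the ratings are hypothetical.

```python
def rank_scores(scores):
    """Convert raw scores to ranks (1 = lowest), giving tied scores
    the mean of the ranks they occupy."""
    # Sort positions by score so that equal scores sit next to each other
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at sorted position i
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        # Positions i..j would take ranks i+1..j+1; each tie gets their average
        shared_rank = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


# Hypothetical satisfaction ratings (1-10 scale) from six participants
ratings = [7, 3, 9, 7, 5, 10]
print(rank_scores(ratings))  # [3.5, 1.0, 5.0, 3.5, 2.0, 6.0]
```

Note that the ranks keep only the order of the scores: the two ratings of 7 share rank 3.5, and the size of the gaps between ratings is discarded.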
Examples of Ordinal Data:
- Satisfaction ratings on a 1-10 scale
- Agreement levels (strongly disagree, disagree, neutral, agree, strongly agree)
- Competition rankings (1st place, 2nd place, 3rd place)
- Education levels (GCSE, A-Level, Undergraduate, Postgraduate)
Interval data
Interval data is the most precise of the three levels, using numerical scales with equal, precisely defined units.
Key features:
- Based on standardised units of measurement (time, weight, temperature)
- Equal intervals between all points on the scale
- Preserves maximum detail and precision
- Required for parametric statistical tests
Think of interval data as measurements you could take with scientific instruments like stopwatches, thermometers, or weighing scales - these produce objective, standardised measurements.
Examples of Interval Data:
- Reaction times measured in milliseconds
- Test scores based on objective marking criteria
- Physical measurements (height, weight, temperature)
- Age measured in years or months
Statistical test selection table
The following table provides a systematic approach to selecting the appropriate statistical test based on your three key factors:
| Data Level | Test of Difference (Unrelated) | Test of Difference (Related) | Test of Association/Correlation |
|---|---|---|---|
| Nominal | Chi-squared | Sign test | Chi-squared |
| Ordinal | Mann-Whitney | Wilcoxon | Spearman's rho |
| Interval | Unrelated t-test | Related t-test | Pearson's r |
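The table works as a lookup on the three factors. The Python sketch below encodes it directly; the names `TEST_TABLE` and `choose_test` and the string labels for aims, levels and designs are illustrative choices, not part of any specification.

```python
# The selection table, keyed by level of measurement and then by
# the kind of analysis (difference tests are split by design)
TEST_TABLE = {
    "nominal": {"unrelated": "Chi-squared", "related": "Sign test", "correlation": "Chi-squared"},
    "ordinal": {"unrelated": "Mann-Whitney", "related": "Wilcoxon", "correlation": "Spearman's rho"},
    "interval": {"unrelated": "Unrelated t-test", "related": "Related t-test", "correlation": "Pearson's r"},
}

# Experimental designs grouped as in Factor 2
RELATED_DESIGNS = {"repeated measures", "matched pairs"}
UNRELATED_DESIGNS = {"independent groups"}


def choose_test(aim, level, design=None):
    """Return the test from the selection table.

    aim:    "difference" or "correlation"
    level:  "nominal", "ordinal" or "interval"
    design: needed only when aim == "difference"
    """
    if level not in TEST_TABLE:
        raise ValueError(f"unknown level of measurement: {level}")
    if aim == "correlation":
        # Factor 2 (design) is irrelevant for correlation/association
        column = "correlation"
    elif aim == "difference":
        if design in RELATED_DESIGNS:
            column = "related"
        elif design in UNRELATED_DESIGNS:
            column = "unrelated"
        else:
            raise ValueError(f"unknown experimental design: {design}")
    else:
        raise ValueError(f"aim must be 'difference' or 'correlation', got {aim}")
    return TEST_TABLE[level][column]


print(choose_test("difference", "ordinal", "independent groups"))  # Mann-Whitney
print(choose_test("difference", "nominal", "matched pairs"))       # Sign test
print(choose_test("correlation", "interval"))                      # Pearson's r
```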
Key points about the table:
- Chi-squared can test both differences and associations, but always requires unrelated/independent data
- The three tests in the bottom row (unrelated t-test, related t-test, and Pearson's r) are parametric tests
- All other tests are non-parametric tests
Understanding parametric vs non-parametric tests
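Parametric tests (unrelated t-test, related t-test, Pearson's r) require interval-level data and make assumptions about the underlying distribution of scores: the data should come from a population with a normal distribution, and for the unrelated t-test the two groups should have similar variances. They are considered more powerful, meaning they are more sensitive to genuine effects.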
Non-parametric tests make fewer assumptions about the data and can be used with ordinal or nominal data. They are more robust but generally less sensitive than parametric alternatives.
Choosing Between Test Types:
- Use parametric tests when you have interval data and can meet their assumptions - they're more likely to detect real effects
- Use non-parametric tests when you have nominal/ordinal data or when parametric assumptions are violated
- When in doubt, non-parametric tests are the safer choice
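The guidance above can be expressed as a simple check. This is a sketch under stated assumptions: the parametric conditions encoded are interval data, a normally distributed population, and (for the unrelated t-test only) similar variances in the two groups. The function `use_parametric` and its flags are hypothetical, and it does not test normality itself; the researcher supplies those judgements.

```python
def use_parametric(level, normal_distribution, similar_variances=True):
    """Decide whether a parametric test is appropriate.

    level:               "nominal", "ordinal" or "interval"
    normal_distribution: judgement that scores come from a normally
                         distributed population
    similar_variances:   judgement that the groups have similar spread;
                         only relevant for the unrelated t-test
    """
    if level != "interval":
        # Nominal and ordinal data always need a non-parametric test
        return False
    # Interval data still needs the distribution assumptions to hold;
    # if they are doubtful, fall back to the safer non-parametric test
    return normal_distribution and similar_variances


print(use_parametric("interval", normal_distribution=True))   # True
print(use_parametric("interval", normal_distribution=False))  # False
print(use_parametric("ordinal", normal_distribution=True))    # False
```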
Memory aid for test selection
The following mnemonic can help you recall the tests in the table, read row by row from left to right:
"Carrots Should Come Mashed With Swede Under Roast Potatoes"
This corresponds to:
- Chi-squared
- Sign test
- Chi-squared
- Mann-Whitney
- Wilcoxon
- Spearman's rho
- Unrelated t-test
- Related t-test
- Pearson's r
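A quick way to confirm the mnemonic matches the table is to compare first letters. The Python sketch below lists the tests in table order (nominal, ordinal, interval rows; unrelated, related, correlation columns) and checks them against the initials of each mnemonic word.

```python
# Tests from the selection table, read row by row, left to right
TABLE_ORDER = [
    "Chi-squared", "Sign test", "Chi-squared",
    "Mann-Whitney", "Wilcoxon", "Spearman's rho",
    "Unrelated t-test", "Related t-test", "Pearson's r",
]

MNEMONIC = "Carrots Should Come Mashed With Swede Under Roast Potatoes"

# First letter of each mnemonic word and of each test name
mnemonic_initials = [word[0] for word in MNEMONIC.split()]
test_initials = [test[0] for test in TABLE_ORDER]

print("".join(mnemonic_initials))          # CSCMWSURP
print(mnemonic_initials == test_initials)  # True
```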
Data classification challenges
In psychology, classifying data types can sometimes be ambiguous and requires careful consideration of the underlying measurement properties.
Common Classification Dilemma: "Number of words recalled" in a memory test could theoretically be interval data if all words are equally difficult to remember. However, since some words are naturally more memorable than others, it's often safer to treat such data as ordinal and use appropriate non-parametric tests.
Always provide clear reasoning when determining the level of measurement for your data, as this decision directly impacts which statistical test is appropriate. When classification is unclear, err on the side of caution and choose the more conservative (lower level) classification.
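Because the three levels are ordered (nominal, then ordinal, then interval), the "lower level when unsure" rule amounts to taking the minimum of the plausible classifications. The sketch below assumes that ordering; the function `conservative_level` is a hypothetical illustration.

```python
# Levels of measurement from lowest to highest
LEVELS = ["nominal", "ordinal", "interval"]


def conservative_level(plausible_levels):
    """Given the levels the data could reasonably be classed as,
    return the lowest (most conservative) one."""
    return min(plausible_levels, key=LEVELS.index)


# "Number of words recalled" could be argued to be ordinal or interval
print(conservative_level(["interval", "ordinal"]))  # ordinal
```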
Key Points to Remember:
- Statistical tests determine whether findings are significant or due to chance
- Three factors guide test selection: difference vs correlation, experimental design (related vs unrelated), and level of measurement
- Data levels: Nominal data uses categories, ordinal data involves ranking, interval data has equal units
- Parametric tests (t-tests, Pearson's r) require interval data and are more powerful
- Chi-squared tests differences or associations in nominal data and always requires independent (unrelated) data
- When in doubt about data level, justify your reasoning clearly