Methods (Edexcel A-Level Psychology): Revision Notes
Correlational Research and Data
What is correlational research?
Correlational research involves measuring two different variables to determine whether they are related. Unlike experimental research, correlational studies do not establish whether one variable causes another to change; they simply examine whether a relationship exists between two co-variables.
Co-variables can be measured directly by the researcher or obtained from secondary data collected by other sources. In biological psychology, co-variables might include genetic similarity (such as closeness of family relationship) and behavioural characteristics (such as levels of aggression).
By plotting scores for these two variables on a scatter diagram, researchers can identify whether any relationship exists between them. This visual representation makes it easier to spot patterns that might not be obvious from looking at raw numbers alone.
Types of correlation
When examining correlational data, three types of relationships may emerge:
Positive correlation occurs when both co-variables increase together. As one variable increases, the other also increases. For example, more hours of revision might be associated with higher test scores.
Negative correlation is shown when one variable increases whilst the other decreases. For instance, higher stress levels might be associated with lower wellbeing scores.
No correlation means there is no clear relationship between the variables. The data points on a scatter diagram appear randomly scattered with no discernible pattern.
Understanding Scatter Diagrams
The most effective way to identify the relationship between co-variables is to plot them on a scatter diagram (also called a scattergraph). One variable is plotted on the y-axis and the other on the x-axis, with each participant's paired scores represented as a single point. The pattern formed by these points reveals the type of correlation present.
Evaluation of correlational research
Cannot establish causality
A major limitation of correlational research is that identifying a relationship between two variables does not reveal which variable causes the other to change. The relationship may be coincidental. For example, if research shows that increased stress is associated with increased aggression, we cannot determine whether stress causes aggression or vice versa.
Third variable problem
Another complicating factor is that a third, unmeasured variable might be influencing both co-variables simultaneously. Using the stress and aggression example, warm weather could increase both stress levels and aggression independently.
Some biological psychologists use correlations to test for genetic explanations by measuring genetic similarity between people and similarity in their behaviour (concordance rate). However, people who are closely related often share the same environment and experiences, making it difficult to isolate the influence of genetics from environmental factors.
Use of secondary data
Correlations frequently employ secondary data (information gathered from previous research) to investigate potential links between variables. This approach can be cost-effective, as it allows researchers to identify whether variables appear linked before investing resources in more expensive large-scale experiments.
If initial correlational analysis suggests no relationship exists, researchers can avoid costly experimental designs. However, if a relationship seems to exist and warrants further investigation, an experiment can then be designed to test whether the relationship is causal.
Analysing correlational data
Scatter diagrams
When conducting a correlational study, you examine the relationship between two variables. Drawing a scatter diagram provides one of the simplest ways to begin your analysis and determine whether any link exists between changes in the co-variables. If the variables appear to show some connection, you can proceed to conduct an inferential test of significance. For correlational data gathered using ordinal measurement, Spearman's rho test is appropriate.
Consider an investigation into the relationship between hours of revision and test marks. The following table shows data collected from ten students:
| Student number | Hours of revision | Mark achieved on the test (out of 20) |
|---|---|---|
| 1 | 4 | 14 |
| 2 | 2 | 11 |
| 3 | 1 | 8 |
| 4 | 3 | 7 |
| 5 | 4 | 17 |
| 6 | 5 | 20 |
| 7 | 1 | 7 |
| 8 | 1 | 12 |
| 9 | 6 | 18 |
| 10 | 4 | 15 |
When plotted on a scatter diagram, a trend line reveals that a positive correlation appears to exist between the two variables. However, the line does not suggest a particularly strong correlation, so further investigation is needed. An inferential test would be useful here. Because the scores will be ranked, and therefore treated as ordinal data, Spearman's test is most appropriate.
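As a rough illustration, the revision data can be plotted as a text-only scatter diagram. This is a minimal sketch: the function name `text_scatter` and the axis limits are assumptions made for the example, with hours of revision on the x-axis and marks on the y-axis.

```python
# Paired scores from the table above: (hours of revision, mark out of 20).
data = [(4, 14), (2, 11), (1, 8), (3, 7), (4, 17),
        (5, 20), (1, 7), (1, 12), (6, 18), (4, 15)]


def text_scatter(pairs, max_x=6, max_y=20):
    """Return a text scatter diagram: x runs across, y runs up, '*' marks a point."""
    points = set(pairs)
    lines = []
    for y in range(max_y, 0, -1):  # the top row is the highest mark
        row = "".join(" *" if (x, y) in points else " ."
                      for x in range(1, max_x + 1))
        lines.append(f"{y:>2} |{row}")
    lines.append("   +" + "--" * max_x)  # x-axis
    lines.append("    " + "".join(f"{x:>2}" for x in range(1, max_x + 1)))
    return "\n".join(lines)


print(text_scatter(data))
```

The stars drift from bottom-left to top-right, which is the visual pattern of a positive correlation.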
Ordinal data refers to a level of measurement where numbers represent rankings rather than absolute scores. For example, ranking attractiveness on a scale of 1 to 5, or ranking exam performance from highest to lowest. For a scatter diagram to be used for plotting correlation data, the data must be at least ordinal level.
Spearman's rho
Spearman's rho is an inferential test used to determine whether statistical data gathered in a correlation using ordinal data can be generalised from the sample to the whole population. It can only be used to analyse correlational data where the level of measurement is ordinal, or where interval or ratio data has been converted to ordinal (through ranking).
The formula to conduct a Spearman's calculation is:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$

Where $r_s$ represents the result of the test, $d$ represents the difference between the ranked position of the scores on each row, and $n$ represents the number of scored pairs gathered (in this case $n = 10$). The symbol $\sum$ means 'the sum of' or 'the total'.
Worked Example: Calculating Spearman's rho for Revision Hours and Test Scores
Let's work through the complete calculation process using the student revision data.
Step 1: Rank the data
When calculating Spearman's rho, the first step is to rank the scores of each of the two variables measured. For any ranks where several scores are the same (tied), calculate the mid-point of the ranks they would occupy. For example, if three students did four hours of revision, they would take positions 3, 4 and 5 in the ranks, so each receives a rank of 4. The next available rank would then be position 6 for the student who did three hours of revision.
| Student number | Hours of revision | Rank of revision hours | Mark (out of 20) | Rank of mark |
|---|---|---|---|---|
| 1 | 4 | 4 | 14 | 5 |
| 2 | 2 | 7 | 11 | 7 |
| 3 | 1 | 9 | 8 | 8 |
| 4 | 3 | 6 | 7 | 9.5 |
| 5 | 4 | 4 | 17 | 3 |
| 6 | 5 | 2 | 20 | 1 |
| 7 | 1 | 9 | 7 | 9.5 |
| 8 | 1 | 9 | 12 | 6 |
| 9 | 6 | 1 | 18 | 2 |
| 10 | 4 | 4 | 15 | 4 |
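The ranking rule in Step 1 can be sketched in Python. This is an illustrative sketch only: the function name `rank_descending` is an assumption, and it gives rank 1 to the highest score, as in the table above, with tied scores sharing the mid-point of the positions they occupy.

```python
def rank_descending(scores):
    """Rank scores so the highest gets rank 1; tied scores share the mid-point rank."""
    ordered = sorted(scores, reverse=True)
    ranks = {}
    for value in set(scores):
        first = ordered.index(value) + 1        # first position the value occupies
        count = ordered.count(value)            # number of tied scores
        ranks[value] = first + (count - 1) / 2  # mid-point of the occupied positions
    return [ranks[score] for score in scores]


hours = [4, 2, 1, 3, 4, 5, 1, 1, 6, 4]
marks = [14, 11, 8, 7, 17, 20, 7, 12, 18, 15]

print(rank_descending(hours))  # [4.0, 7.0, 9.0, 6.0, 4.0, 2.0, 9.0, 9.0, 1.0, 4.0]
print(rank_descending(marks))  # [5.0, 7.0, 8.0, 9.5, 3.0, 1.0, 9.5, 6.0, 2.0, 4.0]
```

The three students with four hours of revision occupy positions 3, 4 and 5, so each receives rank 4; the two students with a mark of 7 occupy positions 9 and 10, so each receives rank 9.5.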
Step 2: Calculate differences
The next step is to calculate the difference between the ranked positions of each pair of scores. The easiest method is to subtract the second rank from the first rank and record this in the next column.
| Student number | Hours of revision | Rank of revision hours | Mark (out of 20) | Rank of mark | Difference in ranks (d) |
|---|---|---|---|---|---|
| 1 | 4 | 4 | 14 | 5 | -1 |
| 2 | 2 | 7 | 11 | 7 | 0 |
| 3 | 1 | 9 | 8 | 8 | 1 |
| 4 | 3 | 6 | 7 | 9.5 | -3.5 |
| 5 | 4 | 4 | 17 | 3 | 1 |
| 6 | 5 | 2 | 20 | 1 | 1 |
| 7 | 1 | 9 | 7 | 9.5 | -0.5 |
| 8 | 1 | 9 | 12 | 6 | 3 |
| 9 | 6 | 1 | 18 | 2 | -1 |
| 10 | 4 | 4 | 15 | 4 | 0 |
Step 3: Square the differences
The final preparatory step is to square the differences to eliminate any negative figures and calculate the total of squared differences.
| Student number | Hours of revision | Rank of revision hours | Mark (out of 20) | Rank of mark | Difference in ranks (d) | Difference squared (d²) |
|---|---|---|---|---|---|---|
| 1 | 4 | 4 | 14 | 5 | -1 | 1 |
| 2 | 2 | 7 | 11 | 7 | 0 | 0 |
| 3 | 1 | 9 | 8 | 8 | 1 | 1 |
| 4 | 3 | 6 | 7 | 9.5 | -3.5 | 12.25 |
| 5 | 4 | 4 | 17 | 3 | 1 | 1 |
| 6 | 5 | 2 | 20 | 1 | 1 | 1 |
| 7 | 1 | 9 | 7 | 9.5 | -0.5 | 0.25 |
| 8 | 1 | 9 | 12 | 6 | 3 | 9 |
| 9 | 6 | 1 | 18 | 2 | -1 | 1 |
| 10 | 4 | 4 | 15 | 4 | 0 | 0 |
| Total (Σd²) | | | | | | 26.5 |
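Steps 2 and 3 can be checked with a few lines of Python. This is a sketch that assumes the ranks are typed in from the tables above; the variable names are illustrative.

```python
# Ranks copied from the tables above, in student order 1 to 10.
rank_hours = [4, 7, 9, 6, 4, 2, 9, 9, 1, 4]
rank_marks = [5, 7, 8, 9.5, 3, 1, 9.5, 6, 2, 4]

# Step 2: subtract the second rank from the first rank on each row.
differences = [h - m for h, m in zip(rank_hours, rank_marks)]

# Step 3: square each difference, then total the squares.
squared = [d ** 2 for d in differences]
sum_d_squared = sum(squared)

print(differences)
print(squared)
print(sum_d_squared)  # 26.5
```

Squaring removes the minus signs, so positive and negative differences cannot cancel each other out when they are added together.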
Step 4: Apply the formula
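All required information can now be inserted into the formula to calculate Spearman's rho. With a total of squared differences of 26.5 and $n = 10$:

$$r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 26.5}{10(10^2 - 1)} = 1 - \frac{159}{990} \approx 0.839$$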
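The same substitution can be sketched in Python. The function name `spearman_rho` is an assumption for illustration, and it implements only the formula used in these notes. That formula is the simplified version, so statistical software that corrects for tied ranks may return a slightly different value for this data set.

```python
def spearman_rho(sum_d_squared, n):
    """Spearman's rho from the total of squared rank differences and n pairs."""
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


rho = spearman_rho(26.5, 10)
print(round(rho, 3))  # 0.839
```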
Interpreting the correlation coefficient
Once the value of $r_s$ has been calculated, the correlation coefficient can be examined to determine how closely the co-variables are related. The coefficient is a figure between +1 and -1 that indicates the strength and direction of the relationship between variables.
Understanding Correlation Coefficients:
- A positive coefficient indicates a positive correlation
- A negative coefficient indicates a negative correlation
- The closer to +1 or -1, the stronger the correlation
- The closer to 0, the weaker the correlation
- A coefficient of 0 indicates no correlation whatsoever
In this case, the calculated coefficient is 0.839, which suggests a strong, positive correlation between the number of hours of revision completed by students and the marks they achieved on the test.
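The interpretation rules above can be sketched as a small function. This is illustrative only: the name `describe_coefficient` and the 0.7 cut-off for "strong" are assumptions, since the notes give no numeric boundary between strong and weak correlations.

```python
def describe_coefficient(r):
    """Describe the direction and rough strength of a correlation coefficient."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    # Hypothetical cut-off: the notes do not define where 'strong' begins.
    strength = "strong" if abs(r) >= 0.7 else "weaker"
    return f"{strength} {direction} correlation"


print(describe_coefficient(0.839))  # strong positive correlation
print(describe_coefficient(-0.2))   # weaker negative correlation
```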
Statistical significance
The next step is to determine whether this result is statistically significant—that is, whether the result truly demonstrates a relationship between the two variables. If a result is found to be significant, it is reasonable to assume that the two variables are genuinely related. However, if the result is not shown to be significant, there is a high probability that any relationship observed is actually due to chance factors such as sampling error.
Statistical significance refers to the probability that the data results from an actual relationship existing between the variables rather than coincidence. When conducting research and analysing data using descriptive statistics (such as measures of central tendency or dispersion), it is possible to observe trends in the data. However, it remains unclear whether these trends show how one variable actually affects another or are simply showing coincidence.
Understanding p-values and Significance Levels
Inferential statistical tests enable researchers to calculate the probability that their data shows evidence of an actual relationship existing between the variables rather than a coincidence. In psychology, the generally accepted level of significance is 5 per cent: a result is treated as significant when there is a 5 per cent or smaller probability that it arose by chance.
This is often expressed as $p \le 0.05$, where $p$ stands for 'the probability of the results being due to chance'. In other words, the probability of the results being due to chance is less than or equal to 0.05, or 5 per cent.
Once the result of a statistical test has been calculated, this figure is known as the observed value.
Observed value refers to the value given by a statistical test, such as rho for Spearman's. It is compared with the relevant critical value to determine if a null hypothesis should be retained or rejected.
This value must be compared to a table of critical values to assess whether it is significant. These figures have been calculated by statisticians to enable researchers to more easily determine whether data meets the criteria for significance.
Critical value is a statistical cut-off point presented on a table of critical values that determines whether the result is significant enough for the null hypothesis to be rejected. The table below shows part of a critical values table for the Spearman's rho test. For the result to be significant, the observed value needs to equal or exceed the critical value in the table relevant to the study conducted.
| n | One-tailed 0.05 (two-tailed 0.10) | One-tailed 0.025 (two-tailed 0.05) | One-tailed 0.01 (two-tailed 0.02) | One-tailed 0.005 (two-tailed 0.01) |
|---|---|---|---|---|
| 4 | 1.000 | - | - | - |
| 5 | 0.900 | 1.000 | 1.000 | - |
| 6 | 0.829 | 0.886 | 0.943 | 1.000 |
| 7 | 0.714 | 0.786 | 0.893 | 0.929 |
| 8 | 0.643 | 0.738 | 0.833 | 0.881 |
| 9 | 0.600 | 0.700 | 0.783 | 0.833 |
| 10 | 0.564 | 0.648 | 0.745 | 0.794 |
The calculated $r_s$ must be equal to or exceed the table (critical) value for significance at the level shown.
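Looking up a critical value can be sketched as a dictionary lookup. This is a minimal sketch: the names `CRITICAL`, `critical_value` and `is_significant` are assumptions for illustration, and only the part of the table shown above is included. The second column is the 0.025 one-tailed level, which corresponds to 0.05 for a two-tailed test.

```python
# Critical values of Spearman's rho, keyed by n (number of pairs).
# Columns follow the one-tailed levels 0.05, 0.025, 0.01, 0.005;
# None marks a cell with no value in the table.
CRITICAL = {
    4: (1.000, None, None, None),
    5: (0.900, 1.000, 1.000, None),
    6: (0.829, 0.886, 0.943, 1.000),
    7: (0.714, 0.786, 0.893, 0.929),
    8: (0.643, 0.738, 0.833, 0.881),
    9: (0.600, 0.700, 0.783, 0.833),
    10: (0.564, 0.648, 0.745, 0.794),
}
ONE_TAILED = (0.05, 0.025, 0.01, 0.005)
TWO_TAILED = (0.10, 0.05, 0.02, 0.01)


def critical_value(n, level, one_tailed):
    """Return the critical value for n pairs at the given significance level."""
    levels = ONE_TAILED if one_tailed else TWO_TAILED
    value = CRITICAL[n][levels.index(level)]
    if value is None:
        raise ValueError("no critical value is listed for this n and level")
    return value


def is_significant(observed, n, level=0.05, one_tailed=True):
    """True if the observed value equals or exceeds the critical value."""
    # The sign of the coefficient is ignored for this comparison.
    return abs(observed) >= critical_value(n, level, one_tailed)


print(critical_value(10, 0.05, one_tailed=True))   # 0.564
print(critical_value(10, 0.05, one_tailed=False))  # 0.648
print(is_significant(0.839, 10, level=0.05, one_tailed=False))  # True
```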
Worked Example: Interpreting Statistical Significance
For the data in this example, the value of $n$ was 10, so the final row of the table is relevant. If the original hypothesis was directional (one-tailed), the critical value of interest would be in the first column (0.564), whereas if the hypothesis was non-directional (two-tailed), the critical value of interest would be in the second column (0.648), as these relate to the 0.05 level of significance.
The observed value of $r_s$ in the example was 0.839. As this exceeds both critical values, the result would be significant whether the hypothesis was directional or non-directional, meaning that the alternative hypothesis could be accepted and the null hypothesis rejected. This indicates a probability of 5 per cent or less that the relationship found occurred by chance.
In fact, the calculated Spearman's rho value of 0.839 exceeds the critical values for a one- and two-tailed test at 0.01, so we can be 99 per cent confident in the relationship found. This also means there is less chance of a Type 1 error having been made.
Important considerations
Cautions When Using Spearman's rho
Caution should be exercised when interpreting the results of a Spearman's rho test when a large sample size has been used. The greater the sample, the more likely you are to find that results are significant, even if the correlation coefficient is close to zero. You should also check for outliers or extreme scores in the data. These may affect the correlation coefficient you calculate and therefore whether the findings are significant or not.
When comparing the correlation coefficient from your test to the critical values in the table, you should ignore whether your coefficient is positive or negative when determining its significance. For example, if your coefficient is -0.5, you should ignore the negative (-) sign when comparing it to the critical value in the table. However, remember that the negative sign is important when you interpret your findings and relate them back to the hypothesis.
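The point about ignoring the sign can be sketched as follows. The function name `compare_to_critical` is an assumption, and 0.564 is the one-tailed 0.05 critical value for ten pairs of scores.

```python
def compare_to_critical(coefficient, critical):
    """Compare the size of a coefficient to a critical value; report direction separately."""
    significant = abs(coefficient) >= critical  # the sign is ignored here
    if coefficient > 0:
        direction = "positive"
    elif coefficient < 0:
        direction = "negative"
    else:
        direction = "none"
    return significant, direction


print(compare_to_critical(-0.5, 0.564))  # (False, 'negative')
print(compare_to_critical(-0.7, 0.564))  # (True, 'negative')
```

The direction is returned alongside the significance decision because it still matters when the finding is related back to a directional hypothesis.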
Key Points to Remember:
- Correlational research identifies relationships between two variables but cannot establish causation—it shows whether variables are associated, not whether one causes the other
- Three types of correlation exist: positive (both variables increase together), negative (one increases as the other decreases), and no correlation (no pattern)
- Scatter diagrams provide a visual method to identify patterns in correlational data by plotting one variable on each axis
- Spearman's rho ($r_s$) is used to test the statistical significance of correlations with ordinal data, producing a correlation coefficient between -1 and +1
- Statistical significance is determined by comparing the observed value (calculated $r_s$) to critical values in a table; if the observed value equals or exceeds the critical value, the relationship is unlikely to be due to chance (typically $p \le 0.05$, or 5% probability)