Scatter graphs (Edexcel GCSE Maths): Revision Notes
Scatter graphs
What is a scatter graph?
A scatter graph (also called a scatter plot) displays the relationship between two variables by plotting points on a coordinate system. The points on a scatter graph aren't always scattered randomly - they often show patterns that help us understand how the two variables relate to each other.
The closer the points are to forming a straight line, the stronger the relationship between the variables. When points form a clear linear pattern, we say there is a strong correlation.
Types of correlation
Understanding correlation is essential for interpreting scatter graphs. There are three main types:
Positive correlation
- As one variable increases, the other variable also increases
- Points generally slope upwards from left to right
- Example: As the weight hung on a spring increases, the length of the spring increases
Worked Example: Positive Correlation
Imagine plotting the relationship between hours studied and test scores:
- Student A: 2 hours studied → 65% test score
- Student B: 4 hours studied → 75% test score
- Student C: 6 hours studied → 85% test score
As study time increases, test scores also increase, showing positive correlation.
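The direction and strength of a correlation can also be summarised by a single number, the correlation coefficient r, which lies between -1 and 1. This goes beyond what these notes require, so treat the sketch below as an optional check rather than an exam method. It computes r for the three students in the worked example; the function name pearson_r is an illustrative choice.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the correlation coefficient r (between -1 and 1) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of the deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

hours = [2, 4, 6]      # hours studied by Students A, B, C
scores = [65, 75, 85]  # their test scores in percent

r = pearson_r(hours, scores)
print(round(r, 3))  # 1.0
```

A value of 1 means the points lie exactly on an upward-sloping straight line, which matches the perfectly regular pattern in this small example. Real data would normally give a value below 1.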
Negative correlation
- As one variable increases, the other variable decreases
- Points generally slope downwards from left to right
- Example: As distance from London increases, house prices tend to decrease
No correlation
- There is no clear relationship between the variables
- Points appear randomly scattered with no obvious pattern
- Example: A person's weight and their score on a maths test
When examining scatter graphs, the strength of correlation depends on how close the points are to forming a straight line pattern. Points that are tightly clustered around an imaginary line show strong correlation, while widely scattered points indicate weak or no correlation.
Outliers
An outlier is a data point that doesn't fit the general pattern of the other data points. Outliers appear as points that are far away from where you'd expect them to be based on the overall trend.
Identifying Outliers: Look for data points that lie a long way from the main cluster or from the line of best fit. These points may be the result of measurement errors or unusual circumstances, or they may be genuinely exceptional cases that deserve further investigation.
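One way to make "far away from the trend" concrete is to measure each point's vertical distance from a trend line and flag the points where that distance is large. The sketch below assumes a trend line of score = 5 x hours + 55, which passes through the three students in the study-time example earlier in these notes. The extra students and the cut-off of 10 marks are made up for illustration.

```python
# Assumed trend line for the study-time example: score = 5 * hours + 55
def predicted_score(hours):
    return 5 * hours + 55

# Made-up data: the last student is deliberately far from the trend
students = [(2, 65), (3, 71), (4, 75), (5, 79), (6, 85), (5, 40)]

# Illustrative cut-off: flag points more than 10 marks from the line
THRESHOLD = 10

outliers = [
    (hours, score)
    for hours, score in students
    if abs(score - predicted_score(hours)) > THRESHOLD
]
print(outliers)  # [(5, 40)]
```

There is no fixed rule for the cut-off. On an exam graph you judge outliers by eye, looking for the point that clearly breaks the pattern.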
Line of best fit
A line of best fit is a straight line drawn through the data points to show the general trend. This line helps us make estimates and predictions.
Guidelines for Drawing a Line of Best Fit:
- Draw the line as close as possible to as many points as you can
- Aim to have roughly equal numbers of points above and below the line
- The line doesn't need to pass through every point - it shows the overall pattern
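In an exam the line of best fit is drawn by eye with a ruler. A computer would usually calculate it instead, using a method called least squares, which is a different technique from the by-eye guidelines above. The sketch below applies that method to five made-up points and then counts how many points end up above and below the line. The function name line_of_best_fit is an illustrative choice.

```python
def line_of_best_fit(xs, ys):
    """Return (gradient, intercept) of the least-squares line y = gradient * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    # The least-squares line always passes through the mean point (mean_x, mean_y)
    intercept = mean_y - gradient * mean_x
    return gradient, intercept

# Made-up data for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

gradient, intercept = line_of_best_fit(xs, ys)
print(round(gradient, 2), round(intercept, 2))  # 0.6 2.2

# Count the points on each side of the line
above = sum(1 for x, y in zip(xs, ys) if y > gradient * x + intercept)
below = sum(1 for x, y in zip(xs, ys) if y < gradient * x + intercept)
print(above, below)  # 2 3
```

The calculated line leaves two points above and three below, which fits the "roughly equal numbers" guideline, and it does not pass through any of the five points.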
Making predictions
Using a scatter graph and line of best fit, we can make two types of predictions:
Interpolation
- Definition: Predicting a value that falls within the range of your existing data
- More reliable because you're staying within known data limits
- Example: If your data covers values from 0 to 50 miles per gallon, estimating at 30 miles per gallon is interpolation
Extrapolation
- Definition: Predicting a value that falls outside the range of your existing data
- Less reliable because you're going beyond what you actually know
- Example: If your data covers values from 0 to 50 miles per gallon, estimating at 70 miles per gallon is extrapolation
Worked Example: Making Predictions
A scatter graph shows the relationship between car engine size (1.0L to 3.0L) and fuel consumption:
Interpolation: Predicting fuel consumption for a 2.2L engine
- This falls within our data range (1.0L to 3.0L)
- More reliable prediction - we can use the line of best fit with reasonable confidence, provided the correlation is strong
Extrapolation: Predicting fuel consumption for a 4.5L engine
- This falls outside our data range
- Less reliable - the relationship might change beyond 3.0L
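The check behind this worked example is simply whether the value you are predicting from lies between the smallest and largest values in the data. A minimal sketch, assuming a made-up set of engine sizes that spans 1.0L to 3.0L (the function name prediction_type is illustrative):

```python
def prediction_type(x, x_values):
    """Classify an estimate at x as interpolation or extrapolation."""
    if min(x_values) <= x <= max(x_values):
        return "interpolation"
    return "extrapolation"

# Made-up engine sizes in litres, spanning the 1.0L to 3.0L range
engine_sizes = [1.0, 1.4, 1.6, 2.0, 2.5, 3.0]

print(prediction_type(2.2, engine_sizes))  # interpolation
print(prediction_type(4.5, engine_sizes))  # extrapolation
```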
Cause and effect
Critical Warning: Correlation vs Causation
A correlation between two variables does not prove that one causes the other, or even that the two are directly linked.
For example, bottled water sales and bee stings might both increase during hot weather, but bottled water doesn't cause bee stings. Instead, both are affected by a third factor - temperature.
Always consider whether there might be other factors influencing both variables before concluding that one causes the other.
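The bottled water and bee stings example can be illustrated with numbers. In the sketch below all the figures are made up: both quantities are chosen to rise with temperature, and neither depends on the other. The correlation coefficient r (a measure between -1 and 1 that goes beyond these notes) still comes out close to 1 for water sales against bee stings.

```python
from math import sqrt

def correlation(xs, ys):
    """Return the correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up daily figures: both series rise on hotter days
temperature = [14, 18, 22, 26, 30]       # degrees Celsius
water_sales = [102, 118, 142, 158, 182]  # bottles sold
bee_stings = [1, 3, 4, 6, 8]             # stings reported

r_water_stings = correlation(water_sales, bee_stings)
print(round(r_water_stings, 2))

# Each variable is also strongly correlated with the third factor
print(round(correlation(temperature, water_sales), 2))
print(round(correlation(temperature, bee_stings), 2))
```

The strong correlation between water sales and bee stings appears only because temperature drives both, which is why a scatter graph alone cannot show cause and effect.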
Reliability of estimates
When commenting on the reliability of estimates from scatter graphs, consider:
- Strong correlation: If points are close to the line of best fit, estimates are more reliable
- Weak correlation: If points are widely scattered, estimates are less reliable
- Interpolation vs extrapolation: Estimates made within the data range (interpolation) are generally more reliable than estimates made outside it (extrapolation)
Exam Tip for Reliability Questions: Always mention both the strength of correlation (how close points are to the line) and whether you're making predictions within or outside the data range when discussing reliability.
Exam tips
Essential Exam Strategies:
- Always read the axes labels carefully to understand what variables are being compared
- Look for the overall pattern before identifying outliers
- When asked to comment on reliability, mention both the strength of correlation and whether you're interpolating or extrapolating
- Show your working when making estimates by drawing lines on the graph from one axis to the line of best fit and then across to the other axis
Remember!
Key Points to Remember:
- Positive correlation: both variables increase together
- Negative correlation: as one increases, the other decreases
- Line of best fit: helps you make estimates and see trends clearly
- Interpolation (within data range) is more reliable than extrapolation (outside data range)
- Correlation doesn't prove causation - there might be other factors involved