Outliers (Edexcel GCSE Statistics): Revision Notes
Outliers
An outlier is an unusually extreme value in a set of data that stands out from the rest of the distribution. These values can significantly affect statistical calculations and are important to identify when analysing data sets.
What makes a value an outlier?
There are two main methods used to identify outliers in GCSE Statistics: the box plot method, which uses the interquartile range, and the standard deviation method, which uses the mean and standard deviation. Both are valid approaches, but because they use different criteria they may identify different outliers in the same data set.
Identifying outliers using box plots
When working with box plots, we use the interquartile range (IQR) to determine which values are outliers. The IQR measures the spread of the middle 50% of the data.
The 1.5 × IQR rule
A data value is considered an outlier if it falls:
- Below $Q_1 - 1.5 \times \text{IQR}$, or
- Above $Q_3 + 1.5 \times \text{IQR}$
Where:
- $Q_1$ = first quartile (25th percentile)
- $Q_3$ = third quartile (75th percentile)
- $\text{IQR} = Q_3 - Q_1$
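As a rough illustration of the rule, the two boundaries can be worked out from the quartiles in a few lines of Python. The function name `iqr_outlier_bounds` is made up for this sketch, and it assumes the quartiles have already been found by hand.

```python
def iqr_outlier_bounds(q1, q3):
    """Return (lower, upper) boundaries under the 1.5 x IQR rule.

    Hypothetical helper for illustration only; q1 and q3 are assumed
    to be the first and third quartiles, already worked out.
    """
    iqr = q3 - q1             # spread of the middle 50% of the data
    lower = q1 - 1.5 * iqr    # values below this are outliers
    upper = q3 + 1.5 * iqr    # values above this are outliers
    return lower, upper


# Example with quartiles 3 and 9 (an IQR of 6)
print(iqr_outlier_bounds(3, 9))  # prints (-6.0, 18.0)
```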
Representing outliers on box plots
On a box plot, outliers are shown as individual crosses (×) beyond the whiskers. The whiskers themselves only extend to the most extreme values that are not outliers.
Remember that outliers appear as separate points (×) on box plots, while the whiskers connect to the most extreme values that are still within the acceptable range.
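One way to picture where the whiskers stop is the short Python sketch below. The helper name `whiskers_and_outliers` is an invention for this example, and the boundaries are assumed to have been calculated already. It uses the email data from a worked example later in these notes, where the boundaries come out as -6 and 18.

```python
def whiskers_and_outliers(values, lower, upper):
    """Return (low whisker end, high whisker end, list of outliers).

    Hypothetical helper: lower and upper are the outlier boundaries,
    assumed to have been calculated beforehand.
    """
    inside = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    # Whiskers stop at the most extreme values that are NOT outliers
    return min(inside), max(inside), outliers


emails = [0, 0, 1, 3, 3, 4, 5, 6, 7, 7, 9, 9, 10, 15, 19]
low_end, high_end, crosses = whiskers_and_outliers(emails, -6, 18)
print(low_end, high_end, crosses)  # prints 0 15 [19]
```

Here the upper whisker would end at 15, and 19 would be drawn as a separate cross.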
Worked Example: Social media usage
The data below show how many times each of 11 friends checked social media in one day.
Data: 3, 10, 14, 15, 16, 16, 18, 18, 19, 20, 21
Step 1: Find the quartiles
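- $Q_1$ is the $\frac{11 + 1}{4}$ = 3rd value = 14
- $Q_3$ is the $\frac{3(11 + 1)}{4}$ = 9th value = 19
Step 2: Calculate the IQR
- $\text{IQR} = Q_3 - Q_1 = 19 - 14 = 5$
Step 3: Apply the outlier criteria
- Lower boundary: $14 - 1.5 \times 5 = 6.5$
- Upper boundary: $19 + 1.5 \times 5 = 26.5$
Step 4: Identify outliers
- Any value less than 6.5: the value 3 is an outlier
- Any value greater than 26.5: the largest value is 21, so there are no high outliers in this data set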
Identifying outliers using standard deviation
The standard deviation method uses a different approach, defining outliers based on how far they are from the mean.
The 3σ (sigma) rule
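A data value is considered an outlier if it falls outside the range $\bar{x} \pm 3\sigma$, that is, below $\bar{x} - 3\sigma$ or above $\bar{x} + 3\sigma$.
Where $\bar{x}$ = mean and $\sigma$ = standard deviation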
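The rule can be sketched in Python as follows. The function name `three_sigma_bounds` is hypothetical, and the mean and standard deviation are assumed to be given, as they usually are in exam questions. The figures used are those from the survey example that follows.

```python
def three_sigma_bounds(mean, sd):
    """Return (lower, upper) boundaries: mean minus and plus 3 standard deviations.

    Hypothetical helper for illustration; mean and sd are assumed known.
    """
    lower = mean - 3 * sd   # values below this are outliers
    upper = mean + 3 * sd   # values above this are outliers
    return lower, upper


# Mean 18 and standard deviation 3.4
lower, upper = three_sigma_bounds(18, 3.4)
print(round(lower, 1), round(upper, 1))  # prints 7.8 28.2
```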
Worked Example: Social media survey
In a larger survey about social media use, the mean number of daily checks was 18 with a standard deviation of 3.4.
Step 1: Calculate the outlier boundaries
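- Lower boundary: $18 - 3 \times 3.4 = 7.8$
- Upper boundary: $18 + 3 \times 3.4 = 28.2$
Step 2: Interpret the results
- The number of checks is a whole number, so anyone who checked fewer than 8 times (below 7.8) would be an outlier
- Anyone who checked more than 28 times (above 28.2) would be an outlier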
Worked Example: Email access frequency
The data below show the number of times 15 people accessed their email in one day.
Data: 0, 0, 1, 3, 3, 4, 5, 6, 7, 7, 9, 9, 10, 15, 19
Step 1: Find the quartiles
- $Q_1$ is the 4th value = 3
- $Q_3$ is the 12th value = 9
Step 2: Calculate IQR and boundaries
- $\text{IQR} = 9 - 3 = 6$
- Upper boundary: $9 + 1.5 \times 6 = 18$
- Lower boundary: $3 - 1.5 \times 6 = -6$
Step 3: Identify outliers
- Since $19 > 18$, the value 19 is an outlier
- No values are below -6, so there are no low outliers
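The steps of this example can be checked with a short Python sketch. It assumes the quartiles sit at positions (n + 1)/4 and 3(n + 1)/4 in the ordered list, which matches the 4th and 12th values used here, and it only covers the case where n + 1 divides exactly by 4. The function name `quartile_by_position` is made up for illustration.

```python
def quartile_by_position(sorted_values, k):
    """Return the k-th quartile (k = 1 or 3) at position k(n + 1)/4.

    Hypothetical helper: only handles lists where n + 1 is a multiple
    of 4, so the position is always a whole number.
    """
    n = len(sorted_values)
    if (n + 1) % 4 != 0:
        raise ValueError("this sketch needs n + 1 to be a multiple of 4")
    position = k * (n + 1) // 4          # position counted from 1
    return sorted_values[position - 1]   # Python lists count from 0


emails = sorted([0, 0, 1, 3, 3, 4, 5, 6, 7, 7, 9, 9, 10, 15, 19])

q1 = quartile_by_position(emails, 1)     # 4th value
q3 = quartile_by_position(emails, 3)     # 12th value
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [v for v in emails if v < lower or v > upper]

print(q1, q3, lower, upper, outliers)    # prints 3 9 -6.0 18.0 [19]
```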
Worked Example: Maths test scores
A student scored 95% on a maths test where the mean mark was 46% and the standard deviation was 10.8.
Step 1: Calculate the upper outlier boundary
- Upper boundary: $46 + 3 \times 10.8 = 78.4$
Step 2: Compare the student's score
Since $95 > 78.4$, this student's result is an outlier (exceptionally high performance).
Key differences between methods
The box plot method focuses on the position within the data distribution, while the standard deviation method considers how far values are from the average. Both are valid approaches, but they may identify different outliers in the same dataset.
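To see how the two methods can disagree, the sketch below applies both to the email data from the worked example above. It takes the quartiles 3 and 9 from that example and, as an assumption, uses the population standard deviation (dividing by n) via Python's `statistics.pstdev`.

```python
import statistics

emails = [0, 0, 1, 3, 3, 4, 5, 6, 7, 7, 9, 9, 10, 15, 19]

# Box plot method, using Q1 = 3 and Q3 = 9 from the worked example
q1, q3 = 3, 9
iqr = q3 - q1
iqr_outliers = [v for v in emails
                if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

# Standard deviation method (population standard deviation assumed)
mean = statistics.mean(emails)
sd = statistics.pstdev(emails)
sigma_outliers = [v for v in emails
                  if v < mean - 3 * sd or v > mean + 3 * sd]

print(iqr_outliers)    # prints [19]
print(sigma_outliers)  # prints []
```

Here the value 19 is flagged by the 1.5 × IQR rule but not by the 3σ rule, because the mean plus three standard deviations comes to roughly 22.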
Common Exam Tips
- Always show your working when calculating outlier boundaries
- Remember that outliers can be either unusually high or unusually low
- On box plots, outliers are marked with × symbols, not included in the whiskers
- Check your arithmetic carefully when multiplying by 1.5 or 3
- State clearly which values are outliers in your final answer
Key Points to Remember:
- An outlier is an extreme value that stands out from the rest of the data
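- For box plots: use $Q_1 - 1.5 \times \text{IQR}$ and $Q_3 + 1.5 \times \text{IQR}$ as boundaries
- For standard deviation: use $\bar{x} \pm 3\sigma$ (mean ± 3 standard deviations) as boundaries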
- Outliers appear as crosses (×) beyond the whiskers on box plots
- Both methods are valid but may give different results for the same data