Skewness (Edexcel GCSE Statistics): Revision Notes
Skewness
What is skewness?
Skewness describes how data is distributed compared to a perfectly symmetrical distribution. When you analyse data, you need to understand whether it's spread evenly or if it leans more heavily towards one side. This concept helps you understand the shape and nature of your dataset.
Understanding skewness is crucial for interpreting real-world data, as perfectly symmetrical distributions are rare in practice. Most datasets will show some degree of skewness.
You can identify skewness in two main ways: by looking at graphs and charts visually, or by using mathematical calculations. Both methods are essential skills for GCSE Statistics.
Types of skewness
There are three main types of distribution patterns you need to recognise:
Positive skew (right-skewed)
In a positively skewed distribution, most of the data clusters towards the lower values, with a long tail extending towards the higher values. Think of the tail as being "pulled" out towards the right side.
Key characteristics:
- The median sits closer to the left side (the LQ end) of the box in a box plot
- In terms of averages: mean > median > mode
- On a box plot: (median - LQ) < (UQ - median)
- The mean is larger than the median because it's influenced by the extreme high values in the tail
Critical Relationship for Positive Skew: The key mathematical relationship mean > median > mode is essential to remember for exams. The mean gets "pulled" towards the tail by extreme values.
Symmetrical distribution
A perfectly balanced distribution where data is spread evenly on both sides of the centre. The two halves are mirror images of each other, and the curve is often bell-shaped.
Key characteristics:
- The median sits in the middle of a box plot
- All averages are equal: mean = median = mode
- On a box plot: (median - LQ) = (UQ - median)
- The "ideal" normal distribution is the best-known example of this shape
Negative skew (left-skewed)
In a negatively skewed distribution, most data clusters towards the higher values, with a long tail extending towards the lower values. The tail appears "pulled" out towards the left side.
Key characteristics:
- The median sits closer to the right side (the UQ end) of the box in a box plot
- In terms of averages: mode > median > mean
- On a box plot: (UQ - median) < (median - LQ)
- The mean is smaller than the median due to the influence of extreme low values
Critical Relationship for Negative Skew: Remember that mode > median > mean for negative skew - this is the reverse of positive skew. The tail pulls the mean in the opposite direction.
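The comparison of averages for the three types above can be sketched in Python. This is only an illustration: the function name skew_from_averages is made up for these notes, and it compares just the mean and median (the mode is left out to keep it short). The values 7.6 and 6 are the mean and median from the worked example later in these notes; the other values are invented for the test cases.

```python
def skew_from_averages(mean, median):
    """Classify skew by comparing the mean with the median.

    Hypothetical helper: mean > median suggests positive skew,
    mean < median suggests negative skew, equal suggests symmetry.
    """
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "symmetrical"


print(skew_from_averages(7.6, 6))  # mean above median
print(skew_from_averages(5, 8))    # mean below median (invented values)
print(skew_from_averages(4, 4))    # mean equals median (invented values)
```

The mean is the average that reacts most to extreme values, so its position relative to the median is a quick first check before looking at a graph.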
Calculating skewness
You can calculate a numerical value for skewness using this formula:
Skewness = 3 × (mean - median) ÷ standard deviation
This formula gives you a precise measurement of how skewed your data is:
- Positive value: Indicates positive skewness (the higher the value, the more skewed)
- Negative value: Indicates negative skewness (the further below zero, the more skewed)
- Zero: Indicates perfectly symmetrical distribution
The factor of 3 comes from a rule of thumb: for moderately skewed data, (mean - mode) is roughly 3 × (mean - median), so the formula can use the median instead of the mode. Don't forget to include it in your calculations!
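As a rough sketch, the calculation can be written as a small Python function. It assumes the formula is 3 × (mean - median) ÷ standard deviation, which matches the factor of 3 described above and the figures in the worked example. The function name skewness and the test values are invented for illustration.

```python
def skewness(mean, median, standard_deviation):
    """Return 3 x (mean - median) / standard deviation.

    Assumed form of the skewness formula used in these notes.
    """
    if standard_deviation <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / standard_deviation


# Invented values, chosen so the arithmetic is easy to follow by hand.
print(skewness(12, 10, 3))  # mean above median -> positive result
print(skewness(8, 10, 3))   # mean below median -> negative result
print(skewness(10, 10, 3))  # mean equals median -> zero
```

The sign of the result comes entirely from (mean - median), because the standard deviation is always positive.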
Worked example
Worked Example: Calculating Skewness
Problem: The data shows the number of times people used their mobile phones in one day: 2, 4, 7, 10, 0, 5, 4, 1, 12, 14, 12, 4, 18, 15, 6
Step 1: Create a box plot
First, arrange the data in ascending order: 0, 1, 2, 4, 4, 4, 5, 6, 7, 10, 12, 12, 14, 15, 18
Find the key values (with 15 values, the LQ, median and UQ are the 4th, 8th and 12th values):
- Minimum = 0
- LQ = 4
- Median = 6
- UQ = 12
- Maximum = 18
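The key values above can be checked with Python's built-in statistics module. This is a sketch rather than the exam method: it relies on statistics.quantiles with its default "exclusive" method, which places the quartiles at the (n + 1)/4 positions and so agrees with the 4th, 8th and 12th values for this dataset. The variable names are chosen for illustration.

```python
import statistics

# Data from the worked example (quantiles() sorts it internally).
data = [2, 4, 7, 10, 0, 5, 4, 1, 12, 14, 12, 4, 18, 15, 6]

# n=4 splits the data into quarters and returns the three cut points:
# lower quartile, median, upper quartile.
lq, median, uq = statistics.quantiles(data, n=4)

five_number_summary = (min(data), lq, median, uq, max(data))
print(five_number_summary)
```

The quartiles come back as decimals (for example 4.0 rather than 4), but they are the same values as in the list above.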
Step 2: Identify the type of skewness
Compare the distances:
(median - LQ) = (6 - 4) = 2
(UQ - median) = (12 - 6) = 6
Since 2 < 6, this shows positive skewness because the median is closer to the lower quartile.
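The box plot comparison in this step can be sketched as a short Python function. The name skew_from_quartiles is hypothetical; the rule it applies is the one listed for each type of skew earlier in these notes.

```python
def skew_from_quartiles(lq, median, uq):
    """Classify skew by comparing the two halves of the box.

    Hypothetical helper based on the box plot rules in these notes.
    """
    lower_gap = median - lq   # width of the left half of the box
    upper_gap = uq - median   # width of the right half of the box
    if lower_gap < upper_gap:
        return "positive"
    if lower_gap > upper_gap:
        return "negative"
    return "symmetrical"


# Quartiles from the worked example: LQ = 4, median = 6, UQ = 12.
print(skew_from_quartiles(4, 6, 12))
```

Here the gaps are 2 and 6, so the function reports positive skew, in line with the reasoning above.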
Step 3: Calculate the mean
Mean = (0 + 1 + 2 + 4 + 4 + 4 + 5 + 6 + 7 + 10 + 12 + 12 + 14 + 15 + 18) ÷ 15
Mean = 114 ÷ 15 = 7.6 times
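Step 4: Calculate the skewness value
Given that the standard deviation is 5.54:
Skewness = 3 × (mean - median) ÷ standard deviation
Skewness = 3 × (7.6 - 6) ÷ 5.54 = 4.8 ÷ 5.54 = 0.87 (to 2 decimal places)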
This positive value of 0.87 confirms strong positive skewness.
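The arithmetic in this worked example can be checked with a few lines of Python. This sketch assumes the skewness formula is 3 × (mean - median) ÷ standard deviation and uses the standard deviation of 5.54 quoted in the question. It also assumes that figure is the sample standard deviation (dividing by n - 1), since that is the version that rounds to 5.54 for this data.

```python
import statistics

# Data from the worked example.
data = [2, 4, 7, 10, 0, 5, 4, 1, 12, 14, 12, 4, 18, 15, 6]

mean = statistics.mean(data)      # 114 / 15
median = statistics.median(data)  # 8th value of the sorted data
given_sd = 5.54                   # standard deviation quoted in the question

# Assumed formula: 3 x (mean - median) / standard deviation.
skew = 3 * (mean - median) / given_sd
print(round(mean, 1), median, round(skew, 2))

# Assumption: the quoted 5.54 is the sample standard deviation.
sample_sd = statistics.stdev(data)
print(round(sample_sd, 2))
```

In an exam the standard deviation is given or found on a calculator, so the check only needs the mean, the median and the final division.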
Exam tips and common traps
Common Exam Mistakes to Avoid:
- Visual inspection: Always check if your calculated skewness matches what you can see visually in box plots or frequency curves
- Formula mistakes: Remember the formula uses 3 times the difference, not just the difference
- Interpretation errors: A common mistake is confusing which direction the skew points - remember that positive skew has a tail pointing right (towards positive numbers)
- Box plot reading: The position of the median line within the box is crucial for identifying skewness quickly
Remember!
Key Points to Remember:
- Skewness describes how data is distributed compared to a symmetrical pattern
- Positive skew means mean > median > mode, with data clustering towards lower values
- Negative skew means mode > median > mean, with data clustering towards higher values
- The skewness formula is: skewness = 3 × (mean - median) ÷ standard deviation
- Box plots show skewness through the position of the median relative to the quartiles