Statistics (Grade 10 NSC Matric Mathematics): Revision Notes
Summary
Statistics is the branch of mathematics that deals with collecting, analysing, and interpreting data. Understanding the different types of data and how to summarise them is fundamental to solving statistical problems effectively.
Types of data
Data are pieces of information that have been observed and recorded, typically from experiments or surveys. Understanding the type of data you're working with is crucial for choosing the right statistical methods.
Choosing a statistical method that does not suit your data type can lead to incorrect conclusions, so identify the data type before selecting an analysis technique.
Quantitative data
Quantitative data can be written as numbers because they are counted or measured. There are two main types:
- Discrete data: Can only take specific, separate values (like the number of students in a class - you can't have 2.5 students)
- Continuous data: Can take any value within a range (like height or weight measurements)
Worked Example: Identifying Data Types
Discrete data examples:
- Number of cars in a parking lot: 0, 1, 2, 3, 4... (cannot be 2.7 cars)
- Number of goals scored in a football match: 0, 1, 2, 3...
Continuous data examples:
- Height of students: 165.7 cm, 172.3 cm, 180.1 cm...
- Time taken to complete a race: 12.45 seconds, 13.02 seconds...
Qualitative data
Qualitative data cannot be written as numbers. Instead, they describe qualities or characteristics. The two common types are:
- Categorical data: Data that can be sorted into categories (like favourite colours or types of transport)
- Anecdotal data: Data based on personal accounts or stories rather than systematic collection
Measures of central tendency
Measures of central tendency are single values that represent the "centre", or typical value, of a dataset.
The three covered here are the mean, the median, and the mode. Each has its own strengths and is useful in different situations.
Mean
The mean is the sum of all values divided by the number of values in the dataset. It's what most people call the "average."
Formula: $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Where $\bar{x}$ is the mean, $n$ is the number of values, and $x_i$ represents each individual value.
Worked Example: Calculating the Mean
Find the mean of the following test scores: 85, 92, 78, 96, 89
Step 1: Add all values together: 85 + 92 + 78 + 96 + 89 = 440
Step 2: Divide by the number of values: 440 ÷ 5 = 88
Answer: The mean test score is 88.
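The two steps of the worked example can be checked with a short Python sketch. The variable names are illustrative only and are not part of the notes.

```python
# Test scores from the worked example above.
scores = [85, 92, 78, 96, 89]

# Step 1: add all values together.
total = sum(scores)

# Step 2: divide by the number of values.
mean = total / len(scores)

print(f"Sum = {total}, n = {len(scores)}, mean = {mean}")
```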
Median
The median is the value in the central position when data is arranged from lowest to highest.
- If there's an odd number of values, the median is the middle value
- If there's an even number of values, the median is halfway between the two middle values
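As a minimal sketch, the odd and even cases can be written as one small Python function. The function name and the second dataset are assumptions made for illustration; the first dataset reuses the test scores from the mean example.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # arrange from lowest to highest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[middle]
    # Even count: halfway between the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_scores = [85, 92, 78, 96, 89]     # test scores from the mean example
even_scores = [85, 92, 78, 96]        # hypothetical: one score removed

print(median(odd_scores))    # 89
print(median(even_scores))   # 88.5
```

Note that the data must be sorted first; taking the middle of the unsorted list is a common mistake.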
Mode
The mode is the value that appears most frequently in the dataset. A dataset can have one mode, multiple modes, or no mode at all.
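The three cases (one mode, several modes, no mode) can be illustrated with a short Python sketch. The function name and the small datasets are hypothetical, and the sketch assumes the convention that a dataset in which no value repeats has no mode.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of modes; empty if no value repeats."""
    counts = Counter(values)             # how often each value appears
    highest = max(counts.values())
    if highest == 1:
        return []                        # assumed convention: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 3, 3, 5, 7]))        # [3]      one mode
print(modes([1, 1, 4, 4, 6]))        # [1, 4]   two modes
print(modes([85, 92, 78, 96, 89]))   # []       no mode
```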
Data distribution and outliers
Outliers
An outlier is a value that doesn't fit the pattern of the rest of the data; it is usually much larger or much smaller than the other values in the dataset.
Because the mean uses every value, a single outlier can pull it well away from the typical values, while the median is affected far less. Always check for outliers and decide whether each one is a recording error to investigate or a genuine value to keep in the analysis.
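To see the effect, the sketch below adds one very low score to the test scores from the mean example and compares the mean and the median before and after. The extra score of 5 is a made-up value used only for illustration.

```python
from statistics import mean, median

scores = [85, 92, 78, 96, 89]     # test scores from the mean example
with_outlier = scores + [5]       # hypothetical outlier added

print("Without outlier:", mean(scores), median(scores))
print("With outlier:   ", round(mean(with_outlier), 2), median(with_outlier))
```

In this sketch the mean falls from 88 to about 74.17, while the median only moves from 89 to 87.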
Grouping continuous data
Continuous quantitative data can be grouped by dividing the full range of values into smaller sub-ranges or classes. This transforms continuous data into discrete categories, making it easier to analyse and present.
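A rough sketch of grouping is shown below. The first three heights come from the continuous data examples earlier in these notes; the remaining heights, the class width of 10 cm, and the starting boundary of 150 cm are assumptions chosen for illustration.

```python
from collections import Counter

# First three heights come from the notes; the rest are hypothetical.
heights_cm = [165.7, 172.3, 180.1, 158.4, 169.9, 175.0, 162.2]

CLASS_WIDTH = 10   # assumed class width in cm
LOWEST = 150       # assumed lower boundary of the first class


def class_start(height):
    """Return the lower boundary of the class that contains height."""
    steps = int((height - LOWEST) // CLASS_WIDTH)
    return LOWEST + steps * CLASS_WIDTH


# Count how many heights fall into each class.
frequencies = Counter(class_start(h) for h in heights_cm)

for start in sorted(frequencies):
    print(f"{start} <= h < {start + CLASS_WIDTH}: {frequencies[start]}")
```

Each class includes its lower boundary and excludes its upper boundary, so every height belongs to exactly one class.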
Measures of dispersion
Dispersion describes how spread out the values are around the centre of the data. Several statistics help us understand this spread.
While measures of central tendency tell us about the "typical" value, measures of dispersion tell us how much variation exists in our data. High dispersion means values are spread out; low dispersion means values are clustered together.
Range
The range is the difference between the maximum and minimum values in the dataset. It gives a simple measure of how spread out the data is.
Formula: Range = Maximum value - Minimum value
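As a quick check of the formula, this sketch computes the range of the dataset used in the quartile worked example later in these notes.

```python
# Dataset from the quartile worked example.
data = [12, 15, 18, 20, 22, 25, 28, 30, 35]

# Range = maximum value - minimum value
data_range = max(data) - min(data)

print(f"Range = {max(data)} - {min(data)} = {data_range}")
```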
Percentiles
The p-th percentile is a value that divides the dataset so that p% of values are less than it, and (100-p)% of values are greater than it.
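Formula for finding percentile position (one common convention, and the one that agrees with the quartile worked example in these notes): $i = \frac{p}{100}(n + 1)$
Where $i$ is the position in the ordered dataset, $p$ is the percentile, and $n$ is the number of values. If $i$ is not a whole number, the percentile lies between the two neighbouring values; for example, $i = 2.5$ means halfway between the 2nd and 3rd values.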
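The sketch below shows one way to turn a percentile position into a value. It assumes the position convention position = (p / 100) × (n + 1), counted from 1 in the ordered data, with linear interpolation between neighbouring values. Textbooks differ on this convention, so treat the function as an illustration rather than the required method.

```python
def percentile(values, p):
    """Return the p-th percentile using the assumed (n + 1) convention."""
    ordered = sorted(values)
    n = len(ordered)
    position = p / 100 * (n + 1)          # 1-based position in ordered data
    position = min(max(position, 1), n)   # keep the position inside the data
    whole = int(position)                 # position of the lower neighbour
    fraction = position - whole           # how far towards the next value
    lower_value = ordered[whole - 1]
    if fraction == 0:
        return lower_value
    upper_value = ordered[whole]
    return lower_value + fraction * (upper_value - lower_value)


data = [12, 15, 18, 20, 22, 25, 28, 30, 35]   # dataset from the quartile example
print(percentile(data, 25))   # 16.5
print(percentile(data, 50))   # 22
print(percentile(data, 75))   # 29.0
```

With this convention the 25th, 50th, and 75th percentiles of the dataset agree with the quartiles found in the worked example.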
Quartiles
Quartiles divide an ordered dataset into four equal groups:
- Q1 (Lower quartile): 25% of data falls below this value
- Q2 (Median): 50% of data falls below this value
- Q3 (Upper quartile): 75% of data falls below this value
Worked Example: Finding Quartiles
Dataset: 12, 15, 18, 20, 22, 25, 28, 30, 35
Step 1: Data is already ordered (n = 9)
Step 2: Find Q2 (median) - middle value = 22
Step 3: Find Q1 - median of lower half (12, 15, 18, 20) = 16.5
Step 4: Find Q3 - median of upper half (25, 28, 30, 35) = 29
Answer: Q1 = 16.5, Q2 = 22, Q3 = 29
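The method used in this worked example (take the median, then take the median of each half, leaving the middle value out when n is odd) can be sketched in Python as follows. The function name is an assumption, and the sketch expects at least two values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # For an odd count, the middle value belongs to neither half.
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


data = [12, 15, 18, 20, 22, 25, 28, 30, 35]
q1, q2, q3 = quartiles(data)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```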
Interquartile range (IQR)
The interquartile range measures the spread of the middle 50% of the data. It's calculated by subtracting the lower quartile from the upper quartile.
Formula: IQR = Q3 - Q1
This measure is less affected by outliers than the range.
Semi interquartile range
The semi interquartile range is half of the interquartile range.
Formula: Semi IQR = (Q3 - Q1) / 2
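Using the quartiles from the worked example (Q1 = 16.5, Q3 = 29), a short sketch of both calculations, where the semi interquartile range is taken as half of Q3 - Q1:

```python
# Quartiles taken from the worked example above.
q1 = 16.5
q3 = 29

iqr = q3 - q1          # interquartile range
semi_iqr = iqr / 2     # half of the interquartile range

print(f"IQR = {iqr}, semi IQR = {semi_iqr}")
```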
Five number summary and box plots
Five number summary
The five number summary consists of:
- Minimum value
- Q1 (Lower quartile)
- Q2 (Median)
- Q3 (Upper quartile)
- Maximum value
These five values provide a comprehensive overview of how the data is distributed.
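One possible way to compute the five values in Python is sketched below using the standard library's `statistics.quantiles`. Its default "exclusive" method happens to reproduce the quartiles of the worked example dataset, but it may not match the median-of-halves method for every dataset, so this is an illustration only. The function name is an assumption.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # n=4 splits the data into four groups and returns the three cut points.
    q1, q2, q3 = quantiles(ordered, n=4)
    return ordered[0], q1, q2, q3, ordered[-1]


data = [12, 15, 18, 20, 22, 25, 28, 30, 35]   # dataset from the quartile example
print(five_number_summary(data))
```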
Box-and-whisker plot
A box-and-whisker plot (or box plot) is a visual representation of the five number summary. It shows:
- A box extending from Q1 to Q3
- A line inside the box at the median (Q2)
- Whiskers extending from the box to the minimum and maximum values
- Outliers, where they have been identified, are sometimes drawn as separate points beyond the whiskers
Box plots make it easy to compare the distributions of different datasets at a glance.
Drawn side by side on the same scale, they show the centre (median), the spread (box and whiskers), and any potential outliers of each dataset.
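To make the parts of a box plot concrete, here is a rough text-only sketch that places the five number summary on a single line of characters. The function name, the symbols, and the width are all assumptions, and the sketch requires the maximum to be greater than the minimum. A real plot would normally be drawn by hand on a number line or with graphing software.

```python
def text_box_plot(minimum, q1, median, q3, maximum, width=47):
    """Return a one-line box plot: |---[===M===]---|"""
    span = maximum - minimum

    def column(value):
        # Scale a data value to a character position from 0 to width - 1.
        return round((value - minimum) / span * (width - 1))

    line = [" "] * width
    for c in range(column(minimum), column(q1)):
        line[c] = "-"                      # left whisker
    for c in range(column(q1), column(q3) + 1):
        line[c] = "="                      # box from Q1 to Q3
    for c in range(column(q3) + 1, column(maximum) + 1):
        line[c] = "-"                      # right whisker
    line[column(minimum)] = "|"            # minimum
    line[column(maximum)] = "|"            # maximum
    line[column(q1)] = "["                 # lower quartile
    line[column(q3)] = "]"                 # upper quartile
    line[column(median)] = "M"             # median
    return "".join(line)


# Five number summary of the dataset from the quartile worked example.
plot = text_box_plot(12, 16.5, 22, 29, 35)
print(plot)
```

Printing two such lines with the same scale, one under the other, gives a crude version of the side-by-side comparison described above.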
Key Points to Remember:
- Data types matter: Quantitative data uses numbers, qualitative data uses descriptions
- Mean is sensitive to outliers, while median is more resistant to extreme values
- The range gives you the total spread, but IQR tells you about the middle 50% of your data
- Quartiles divide your data into four equal parts: Q1 (25%), Q2 (50% - the median), Q3 (75%)
- Box plots provide a visual summary of all the key features of your dataset in one graph