Statistics Revision Notes for Grade 10 NSC Matric Mathematics

Summary

Statistics is the branch of mathematics that deals with collecting, analysing, and interpreting data. Understanding the different types of data and how to summarise them is fundamental to solving statistical problems effectively.

Types of data

Data are pieces of information that have been observed and recorded, typically from experiments or surveys. Understanding the type of data you're working with is crucial for choosing the right statistical methods.

Important

Choosing the wrong statistical method for your data type can lead to incorrect conclusions. Always identify your data type first before selecting analysis techniques.

Quantitative data

Quantitative data can be written as numbers and measured. There are two main types:

Discrete data: Can only take specific, separate values (like the number of students in a class - you can't have 2.5 students)
Continuous data: Can take any value within a range (like height or weight measurements)

Example

Worked Example: Identifying Data Types

Discrete data examples:

Number of cars in a parking lot: 0, 1, 2, 3, 4... (cannot be 2.7 cars)
Number of goals scored in a football match: 0, 1, 2, 3...

Continuous data examples:

Height of students: 165.7 cm, 172.3 cm, 180.1 cm...
Time taken to complete a race: 12.45 seconds, 13.02 seconds...

Qualitative data

Qualitative data cannot be written as numbers. Instead, they describe qualities or characteristics. The two common types are:

Categorical data: Data that can be sorted into categories (like favourite colours or types of transport)
Anecdotal data: Data based on personal accounts or stories rather than systematic collection

Measures of central tendency

These are values that represent the "centre" or typical value of a dataset.

Note

Measures of central tendency help us understand what a "typical" value looks like in our dataset. Each measure has its own strengths and is useful in different situations.

Mean

The mean is the sum of all values divided by the number of values in the dataset. It's what most people call the "average."

Formula: $\overline{x} = \frac{1}{n}\sum_{i=1}^{n}x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$

Where $\overline{x}$ is the mean, $n$ is the number of values, and $x_i$ represents each individual value.

Example

Worked Example: Calculating the Mean

Find the mean of the following test scores: 85, 92, 78, 96, 89

Step 1: Add all values together

$85 + 92 + 78 + 96 + 89 = 440$

Step 2: Divide by the number of values

$\overline{x} = \frac{440}{5} = 88$

Answer: The mean test score is 88.
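
The same calculation can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean` and the variable `scores` are illustrative choices, not part of the notes.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


scores = [85, 92, 78, 96, 89]  # test scores from the worked example
print(mean(scores))  # 88.0
```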

Median

The median is the value in the central position when data is arranged from lowest to highest.

If there's an odd number of values, the median is the middle value
If there's an even number of values, the median is halfway between the two middle values
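
The two cases above can be sketched in Python as follows. The function name `median` is an illustrative choice, and the second dataset (3, 9, 5, 7) is made up here to show the even case.

```python
def median(values):
    """Return the middle value of the data once it is sorted."""
    if not values:
        raise ValueError("the median of an empty dataset is undefined")
    ordered = sorted(values)  # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: take the single middle value.
        return ordered[mid]
    # Even number of values: halfway between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([85, 92, 78, 96, 89]))  # 89
print(median([3, 9, 5, 7]))          # 6.0
```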

Mode

The mode is the value that appears most frequently in the dataset. A dataset can have one mode, multiple modes, or no mode at all.
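
A short Python sketch shows how one mode, several modes, or no mode can arise. It assumes the common convention that a dataset has no mode when every value appears exactly once. The function name `modes` and the sample datasets are illustrative.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of the most frequent values (empty if no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: if no value repeats, there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 3, 3, 5, 7]))  # [3]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([4, 5, 6]))        # []
```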

Data distribution and outliers

Outliers

An outlier is a value that doesn't fit the typical pattern of the rest of the data. It's usually much larger or much smaller than other values in the dataset. Outliers can significantly affect the mean, so it's important to identify them.

Important

Outliers can dramatically skew your results, especially when calculating the mean. Always check for outliers and consider whether they should be included in your analysis or investigated further.
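
To see the effect in numbers, the sketch below adds one hypothetical low score (12) to the test scores 85, 92, 78, 96, 89 and compares the mean and median before and after. It uses Python's standard `statistics` module. The outlier value is invented purely for illustration.

```python
from statistics import mean, median

scores = [85, 92, 78, 96, 89]
with_outlier = scores + [12]  # 12 is a made-up outlier

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(mean_before, median_before)           # mean 88, median 89
print(round(mean_after, 2), median_after)   # mean about 75.33, median 87.0
```

One extra value drags the mean down by more than 12 marks, while the median moves by only 2.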

Grouping continuous data

Continuous quantitative data can be grouped by dividing the full range of values into smaller sub-ranges or classes. This transforms continuous data into discrete categories, making it easier to analyse and present.
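
As a rough sketch of grouping, the Python below counts how many heights fall into equal-width classes. The class boundaries (starting at 150 cm, width 10 cm), the function name `group_into_classes`, and the last three heights in the list are assumptions made for illustration.

```python
def group_into_classes(values, start, width, num_classes):
    """Count the values in each class lower <= x < upper.

    Values outside the covered range are ignored in this sketch.
    """
    counts = [0] * num_classes
    for v in values:
        index = int((v - start) // width)  # which class the value falls in
        if 0 <= index < num_classes:
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]


heights = [165.7, 172.3, 180.1, 158.4, 169.9, 174.0]  # in cm
for lower, upper, count in group_into_classes(heights, 150, 10, 4):
    print(f"{lower} <= h < {upper}: {count}")
```

For these heights the class counts are 1, 2, 2 and 1.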

Measures of dispersion

Dispersion describes how spread out the values are around the centre of the data. Several statistics help us understand this spread.

Note

While measures of central tendency tell us about the "typical" value, measures of dispersion tell us how much variation exists in our data. High dispersion means values are spread out; low dispersion means values are clustered together.

Range

The range is the difference between the maximum and minimum values in the dataset. It gives a simple measure of how spread out the data is.

Formula: Range = Maximum value - Minimum value
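
In Python the formula is a one-liner. The function name `data_range` is an illustrative choice, and the data are the test scores used for the mean.

```python
def data_range(values):
    """Return the maximum value minus the minimum value."""
    return max(values) - min(values)


print(data_range([85, 92, 78, 96, 89]))  # 96 - 78 = 18
```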

Percentiles

The p-th percentile is a value that divides the dataset so that p% of values are less than it, and (100-p)% of values are greater than it.

Formula for finding percentile position: $r = \frac{p}{100}(n-1) + 1$

Where $r$ is the position, $p$ is the percentile, and $n$ is the number of values.
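
The position formula can be sketched in Python as below. The notes do not say what to do when $r$ is not a whole number, so this sketch assumes linear interpolation between the two neighbouring values. That choice, and the function names, are assumptions.

```python
def percentile_position(p, n):
    """Position r = p/100 * (n - 1) + 1, counting from 1."""
    return p / 100 * (n - 1) + 1


def percentile(values, p):
    """Value at the p-th percentile position of the sorted data."""
    if not values:
        raise ValueError("percentiles of an empty dataset are undefined")
    ordered = sorted(values)
    r = percentile_position(p, len(ordered))
    lower = int(r)           # whole part of the position
    fraction = r - lower     # how far towards the next value
    if lower >= len(ordered):
        return ordered[-1]   # position is at the last value
    # Assumed: interpolate linearly between the neighbouring values.
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


data = [12, 15, 18, 20, 22, 25, 28, 30, 35]
print(percentile_position(50, len(data)))  # 5.0, the 5th value
print(percentile(data, 50))                # 22
```

Different textbooks and software use slightly different percentile conventions, so values found this way may not match those from another method.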

Quartiles

Quartiles divide an ordered dataset into four equal groups:

Q1 (Lower quartile): 25% of data falls below this value
Q2 (Median): 50% of data falls below this value
Q3 (Upper quartile): 75% of data falls below this value

Example

Worked Example: Finding Quartiles

Dataset: 12, 15, 18, 20, 22, 25, 28, 30, 35

Step 1: Data is already ordered (n = 9)

Step 2: Find Q2 (median) - middle value = 22

Step 3: Find Q1, the median of the lower half (12, 15, 18, 20): $\frac{15 + 18}{2} = 16.5$

Step 4: Find Q3, the median of the upper half (25, 28, 30, 35): $\frac{28 + 30}{2} = 29$

Answer: Q1 = 16.5, Q2 = 22, Q3 = 29
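
The steps of this worked example can be sketched in Python. The sketch follows the same method: when $n$ is odd, the median is left out of both halves. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted and not empty."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        median_of_sorted(lower),
        median_of_sorted(ordered),
        median_of_sorted(upper),
    )


print(quartiles([12, 15, 18, 20, 22, 25, 28, 30, 35]))  # (16.5, 22, 29.0)
```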

Interquartile range (IQR)

The interquartile range measures the spread of the middle 50% of the data. It's calculated by subtracting the lower quartile from the upper quartile.

Formula: IQR = Q3 - Q1

This measure is less affected by outliers than the range.

Semi interquartile range

The semi interquartile range is half of the interquartile range.

Formula: Semi IQR = $\frac{\text{IQR}}{2} = \frac{\text{Q3} - \text{Q1}}{2}$
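
As a quick check of the IQR and semi-IQR formulas, the sketch below uses the quartiles from the worked example (Q1 = 16.5, Q3 = 29). The function names are illustrative.

```python
def iqr(q1, q3):
    """Interquartile range: upper quartile minus lower quartile."""
    return q3 - q1


def semi_iqr(q1, q3):
    """Semi-interquartile range: half of the IQR."""
    return (q3 - q1) / 2


q1, q3 = 16.5, 29  # quartiles from the worked example
print(iqr(q1, q3))       # 12.5
print(semi_iqr(q1, q3))  # 6.25
```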

Five number summary and box plots

Five number summary

The five number summary consists of:

Minimum value
Q1 (Lower quartile)
Q2 (Median)
Q3 (Upper quartile)
Maximum value

These five values provide a comprehensive overview of how the data is distributed.
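
A minimal Python sketch of the five number summary follows. It assumes quartiles are found as medians of the lower and upper halves, with the median left out of both halves when $n$ is odd. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted and not empty."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, the middle value belongs to neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        ordered[0],
        median_of_sorted(lower),
        median_of_sorted(ordered),
        median_of_sorted(upper),
        ordered[-1],
    )


print(five_number_summary([12, 15, 18, 20, 22, 25, 28, 30, 35]))
# (12, 16.5, 22, 29.0, 35)
```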

Box-and-whisker plot

A box-and-whisker plot (or box plot) is a visual representation of the five number summary. It shows:

A box extending from Q1 to Q3
A line inside the box at the median (Q2)
Whiskers extending to the minimum and maximum values
Sometimes outliers are shown as separate points

Box plots make it easy to compare the distribution of different datasets at a glance.

Note

Box plots are particularly useful when you need to compare multiple datasets side by side. They quickly show you the central tendency, spread, and any potential outliers in your data.

Summary

Key Points to Remember:

Data types matter: Quantitative data uses numbers, qualitative data uses descriptions
Mean is sensitive to outliers, while median is more resistant to extreme values
The range gives you the total spread, but IQR tells you about the middle 50% of your data
Quartiles divide your data into four equal parts: Q1 (25%), Q2 (50% - the median), Q3 (75%)
Box plots provide a visual summary of all the key features of your dataset in one graph
