Measures of Spread Revision Notes for VCE SSCE General Mathematics

Measures of Spread

Understanding variability in data

When analysing data, we need to understand how spread out the values are. Are the observations clustered closely together, or do they vary widely? A measure of spread (also called a measure of variability) helps us quantify this characteristic of a data set.

Note

Three main measures of spread are used in statistics:

The range
The interquartile range
The standard deviation

Each measure has different strengths and is suited to different situations.

The range

What is the range?

The range is the simplest way to measure how spread out data is. It tells us the distance between the smallest and largest values in a data set.

The range ($R$) is defined as:

$R = \text{largest data value} - \text{smallest data value}$

The range is easy to calculate but has limitations because it only uses two values from the entire data set.

Worked example: calculating the range

Worked Example: Finding the Range

Consider the marks awarded to a group of students for two different tasks:

Task A: $2, 6, 9, 10, 11, 12, 13, 22, 23, 24, 26, 26, 27, 33, 34, 35, 38, 38, 39, 42, 46, 47, 47, 52, 52, 56, 56, 59, 91, 94$

Task B: $11, 16, 19, 21, 23, 28, 31, 31, 33, 38, 41, 49, 52, 53, 54, 56, 59, 63, 65, 68, 71, 72, 73, 75, 78, 78, 78, 86, 88, 91$

Let's find the range for each distribution.

Solution for Task A:

The minimum mark is $2$ and the maximum mark is $94$.

$R = 94 - 2 = 92$

Solution for Task B:

The minimum mark is $11$ and the maximum mark is $91$.

$R = 91 - 11 = 80$
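The same calculation can be checked with a few lines of Python. This is only an illustrative sketch: the list names `task_a` and `task_b` and the helper `data_range` are made up for this example, and the marks are copied from the worked example above.

```python
# Marks for the two tasks, copied from the worked example above.
task_a = [2, 6, 9, 10, 11, 12, 13, 22, 23, 24, 26, 26, 27, 33, 34,
          35, 38, 38, 39, 42, 46, 47, 47, 52, 52, 56, 56, 59, 91, 94]
task_b = [11, 16, 19, 21, 23, 28, 31, 31, 33, 38, 41, 49, 52, 53, 54,
          56, 59, 63, 65, 68, 71, 72, 73, 75, 78, 78, 78, 86, 88, 91]


def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


range_a = data_range(task_a)
range_b = data_range(task_b)
print("Task A range:", range_a)  # 92
print("Task B range:", range_b)  # 80
```

Because only `max` and `min` are used, the other 28 marks in each list have no influence on the result.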

Interpreting the range from stem-and-leaf plots

Stem-and-leaf plots (also called stem plots) provide a useful way to visualise data distributions and identify minimum and maximum values.

For the cat weights shown in the stem plot above, we can see:

Minimum value: $0.5$ kg (from stem $0$, leaf $5$)

Maximum value: $6.4$ kg (from stem $6$, leaf $4$)

Therefore: $R = 6.4 - 0.5 = 5.9$ kg

Limitations of the range

Important

A Critical Weakness of the Range

While the range is simple to calculate, it has a significant weakness: it can be heavily influenced by extreme values (outliers).

Let's compare the stem plots for Task A and Task B:

Even though Task A has a larger range than Task B ($92$ compared to $80$), looking at the stem plots reveals that Task A's marks are actually more concentrated than Task B's marks. The large range for Task A is caused by just two unusual values ($91$ and $94$).

Most of Task A's marks are clustered between $10$ and $60$, whilst Task B's marks are more evenly spread across the range from $11$ to $91$. This demonstrates that the range alone doesn't always give us an accurate picture of how spread out data truly is.
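The stem plots referred to above can be rebuilt from the raw marks. The sketch below is one possible way to do it in Python, assuming whole-number marks with the tens digit as the stem and the units digit as the leaf; the function name `stem_plot` is hypothetical.

```python
from collections import defaultdict

task_a = [2, 6, 9, 10, 11, 12, 13, 22, 23, 24, 26, 26, 27, 33, 34,
          35, 38, 38, 39, 42, 46, 47, 47, 52, 52, 56, 56, 59, 91, 94]
task_b = [11, 16, 19, 21, 23, 28, 31, 31, 33, 38, 41, 49, 52, 53, 54,
          56, 59, 63, 65, 68, 71, 72, 73, 75, 78, 78, 78, 86, 88, 91]


def stem_plot(values):
    """Return stem plot lines for whole numbers (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for value in sorted(values):
        leaves[value // 10].append(value % 10)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {leaf_text}".rstrip())
    return lines


print("Task A")
print("\n".join(stem_plot(task_a)))
print("Task B")
print("\n".join(stem_plot(task_b)))
```

In the Task A output, stems $6$, $7$ and $8$ have no leaves, which makes the gap between the main cluster and the two marks in the $90$s easy to see.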

Note

This limitation leads us to consider alternative measures of spread that are not so affected by extreme values.

The interquartile range

What is the interquartile range?

The interquartile range (IQR) is a more robust measure of spread than the range. It focuses on the middle portion of the data and is not affected by extreme values or outliers.

The IQR measures the spread of the middle 50% of observations by using quartiles to divide the data.

Understanding quartiles

Note

How Quartiles Work

Quartiles split an ordered data set into four equal parts:

$Q_1$ (first quartile): The median of the lower half of the data. This value has $25\%$ of observations below it and $75\%$ above it.
$Q_2$ (second quartile): This is simply the median of the entire data set, with $50\%$ of observations below and $50\%$ above. We don't commonly use the notation $Q_2$.
$Q_3$ (third quartile): The median of the upper half of the data. This value has $75\%$ of observations below it and $25\%$ above it.

How to calculate the IQR

To find the interquartile range:

Arrange all observations in ascending order
Divide the observations into two equal-sized groups (if $n$ is odd, omit the median from both groups)
Find $Q_1$ by calculating the median of the lower half
Find $Q_3$ by calculating the median of the upper half
Calculate: $\text{IQR} = Q_3 - Q_1$

The IQR represents the range covered by the middle $50\%$ of the data values.
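The steps above can be sketched in Python as follows. This is a minimal illustration of the median-of-halves method described in these notes, not a general-purpose routine; the function names `quartiles` and `iqr` and both small data sets are invented for the example.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves method."""
    ordered = sorted(values)              # step 1: ascending order
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                # step 2: lower half
    # If n is odd, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)   # steps 3 and 4


def iqr(values):
    """Return Q3 - Q1 (step 5)."""
    q1, q3 = quartiles(values)
    return q3 - q1


even_data = [3, 5, 7, 8, 12, 13, 14, 18]      # hypothetical, n = 8
odd_data = [3, 5, 7, 8, 12, 13, 14, 18, 21]   # hypothetical, n = 9

print(quartiles(even_data), iqr(even_data))   # (6.0, 13.5) 7.5
print(quartiles(odd_data), iqr(odd_data))     # (6.0, 16.0) 10.0
```

For the odd-sized data set the median ($12$) is left out of both halves, so each half contains four values.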

Worked example: finding the interquartile range

Worked Example: Calculating the IQR

Let's find the IQR for Task A and Task B from our earlier example, and compare the results.

Task A (30 values total):

Since there are $30$ values, we have $15$ values in each half.

Lower half: $2, 6, 9, 10, 11, 12, 13, 22, 23, 24, 26, 26, 27, 33, 34$

The median of the lower half ($Q_1$) is the 8th value: $Q_1 = 22$

Upper half: $35, 38, 38, 39, 42, 46, 47, 47, 52, 52, 56, 56, 59, 91, 94$

The median of the upper half ($Q_3$) is the 8th value: $Q_3 = 47$

Calculate IQR:

$\text{IQR} = Q_3 - Q_1 = 47 - 22 = 25$

Task B (30 values total):

Following the same process:

$Q_1 = 31$

$Q_3 = 73$

$\text{IQR} = Q_3 - Q_1 = 73 - 31 = 42$

Comparison:

The IQR shows that the variability of Task A marks ($\text{IQR} = 25$) is smaller than the variability of Task B marks ($\text{IQR} = 42$), even though the range suggested the opposite. This makes sense when we look at the stem plots: most Task A marks are concentrated in a narrower range than Task B marks.
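The quartiles in this worked example can be confirmed with a short Python sketch. The helper `quartiles` is a hypothetical name, and it assumes the median-of-halves method used in these notes.

```python
from statistics import median

task_a = [2, 6, 9, 10, 11, 12, 13, 22, 23, 24, 26, 26, 27, 33, 34,
          35, 38, 38, 39, 42, 46, 47, 47, 52, 52, 56, 56, 59, 91, 94]
task_b = [11, 16, 19, 21, 23, 28, 31, 31, 33, 38, 41, 49, 52, 53, 54,
          56, 59, 63, 65, 68, 71, 72, 73, 75, 78, 78, 78, 86, 88, 91]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)


for name, marks in [("Task A", task_a), ("Task B", task_b)]:
    q1, q3 = quartiles(marks)
    print(f"{name}: Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")
# Task A: Q1 = 22, Q3 = 47, IQR = 25
# Task B: Q1 = 31, Q3 = 73, IQR = 42
```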

Finding IQR from a stem plot

For the cat weights data shown above with $19$ values:

Find the median (the 10th value in the ordered list)
Since we have an odd number of values, omit the median
This leaves $9$ values in each half
Find $Q_1$ (the median of the lower $9$ values)
Find $Q_3$ (the median of the upper $9$ values)
Calculate: $\text{IQR} = Q_3 - Q_1$

Advantages of the IQR

Important

Why the IQR is Superior to the Range

The interquartile range has several important advantages:

It describes the spread of the middle 50% of observations
It measures variability around the median
It is not affected by outliers or extreme values, since the upper $25\%$ and lower $25\%$ of observations are excluded from the calculation
It is a reliable measure of spread for any distribution, whether skewed or symmetric

This makes the IQR particularly useful when comparing data sets that may contain unusual values.
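One way to see this robustness is to change an extreme value and recalculate both measures. The sketch below takes the Task A marks and, purely as a hypothetical scenario, supposes the top mark of $94$ was mistyped as $940$. The helper names are invented for the example.

```python
from statistics import median

task_a = [2, 6, 9, 10, 11, 12, 13, 22, 23, 24, 26, 26, 27, 33, 34,
          35, 38, 38, 39, 42, 46, 47, 47, 52, 52, 56, 56, 59, 91, 94]


def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Return Q3 - Q1 using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(upper) - median(lower)


# Hypothetical data-entry slip: 94 recorded as 940.
task_a_typo = [940 if mark == 94 else mark for mark in task_a]

print(data_range(task_a), iqr(task_a))            # 92 25
print(data_range(task_a_typo), iqr(task_a_typo))  # 938 25
```

The range jumps from $92$ to $938$, whilst the IQR stays at $25$ because the altered value lies in the top $25\%$ of the data and never enters the quartile calculation.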

The standard deviation

What is standard deviation?

The standard deviation ($s$) is the most commonly used measure of spread in statistics. Unlike the IQR, which measures spread around the median, the standard deviation measures the spread of data around the mean ($\bar{x}$).

Standard deviation tells us the typical distance that data values sit from the mean. A small standard deviation indicates that values are clustered close to the mean, whilst a large standard deviation indicates that values are more spread out.

The formula for standard deviation

Important

The Standard Deviation Formula

The standard deviation is defined as:

$s = \sqrt{\frac{\sum(x - \bar{x})^2}{n - 1}}$

where:

$n$ is the number of data values (sample size)
$\bar{x}$ is the mean
$x$ represents each individual data value
$(x - \bar{x})$ represents the deviation of each value from the mean

Understanding the formula

The standard deviation formula takes the square root of an average of the squared deviations from the mean. Let's break down why this approach is used:

We find how far each value is from the mean: $(x - \bar{x})$
We square these deviations: $(x - \bar{x})^2$
We sum all the squared deviations: $\sum(x - \bar{x})^2$
We divide by $n - 1$ to find an average
We take the square root to get back to the original units
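The five steps above can be followed one at a time in Python. This is a sketch for illustration only; the data values are hypothetical and the variable names are chosen to match the steps.

```python
import math

data = [4, 7, 8, 10, 11]            # hypothetical data set
n = len(data)
mean = sum(data) / n                # 8.0

deviations = [x - mean for x in data]     # step 1: x - mean
squared = [d ** 2 for d in deviations]    # step 2: square each deviation
total = sum(squared)                      # step 3: sum of squares = 30.0
variance = total / (n - 1)                # step 4: divide by n - 1 = 7.5
s = math.sqrt(variance)                   # step 5: back to original units

print(deviations)    # [-4.0, -1.0, 0.0, 2.0, 3.0]
print(round(s, 4))   # 2.7386
```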

Note

Why Square the Deviations?

If we simply added up all the deviations $(x - \bar{x})$, they would always sum to zero (positive and negative deviations cancel out). Squaring makes all deviations positive so we can meaningfully add them.
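
A short derivation shows why the cancellation is exact for every data set. Since $\bar{x} = \frac{\sum x}{n}$, we have $\sum x = n\bar{x}$, and so:

$\sum(x - \bar{x}) = \sum x - n\bar{x} = n\bar{x} - n\bar{x} = 0$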

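Why Divide by $n - 1$ Instead of $n$?

The deviations are measured from the sample mean $\bar{x}$, which is itself calculated from the data, so dividing by $n$ would tend to underestimate the spread of the wider population the sample came from. Dividing by $n - 1$ corrects for this and gives the sample standard deviation, which is the version we use in this course.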

Worked example: calculating standard deviation by hand

Worked Example: Manual Calculation of Standard Deviation

Let's calculate the standard deviation for the data set: $2, 3, 4$

Step 1: Create a table to organise the calculations

We need columns for:

$x$ (the data values)
$(x - \bar{x})$ (deviations from the mean)
$(x - \bar{x})^2$ (squared deviations)

Step 2: Calculate the mean

$\bar{x} = \frac{\sum x}{n} = \frac{2 + 3 + 4}{3} = \frac{9}{3} = 3$

Step 3: Complete the table

$x$	$(x - \bar{x})$	$(x - \bar{x})^2$
$2$	$-1$	$1$
$3$	$0$	$0$
$4$	$1$	$1$
$\sum x = 9$	$\sum(x - \bar{x}) = 0$	$\sum(x - \bar{x})^2 = 2$

Notice that the sum of the deviations is zero. This is always the case, so it is a useful check that the mean and the deviations have been calculated correctly.

Step 4: Substitute into the formula

$s = \sqrt{\frac{\sum(x - \bar{x})^2}{n - 1}}$

$s = \sqrt{\frac{2}{3 - 1}}$

$s = \sqrt{\frac{2}{2}}$

$s = \sqrt{1} = 1$

Therefore, the standard deviation is $1$.
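
This result can be checked with Python's built-in `statistics` module, where `stdev` divides by $n - 1$ and `pstdev` divides by $n$. The sketch below is only a cross-check of the hand calculation; comparing the two functions shows how much the choice of divisor matters for a very small data set.

```python
import math
from statistics import pstdev, stdev

data = [2, 3, 4]

s = stdev(data)         # sample standard deviation, divides by n - 1
sigma = pstdev(data)    # population standard deviation, divides by n

print(s)                 # 1.0
print(round(sigma, 4))   # 0.8165
```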

Calculating standard deviation in practice

Note

Whilst it's important to understand how standard deviation is calculated, in practice you will use a CAS calculator to find this value. Manual calculation is only required for the simplest data sets.

Using technology to find measures of spread

Modern technology makes calculating summary statistics much faster and more accurate than manual calculation. Both the TI-Nspire CAS and ClassPad calculators report the standard deviation, quartiles, minimum and maximum in one step, and the IQR and range then follow by subtraction.

Example: monthly rainfall data

Worked Example: Using Technology for Summary Statistics

The following table shows monthly rainfall figures for Melbourne over one year:

Month	J	F	M	A	M	J	J	A	S	O	N	D
Rainfall (mm)	48	57	52	57	58	49	49	50	59	67	60	59

Let's use technology to find the mean, standard deviation, median, interquartile range, and range for this data.

Using the TI-Nspire CAS calculator

Steps:

Start a new document and add a Lists & Spreadsheet application
Enter the rainfall data into a list (name it 'rain')
Add a Calculator application
Select: Statistics → Stat Calculations → One-Variable Statistics
Press enter and select OK to generate the results

Calculator output:

The calculator displays all summary statistics including:

Mean: $\bar{x} = 55.4167$
Sample standard deviation: $s_x = 5.80687$
Median: $\text{MedianX} = 57$
First quartile: $Q_1 = 49.5$
Third quartile: $Q_3 = 59$
Minimum: $\text{MinX} = 48$
Maximum: $\text{MaxX} = 67$

Final answers (rounded to one decimal place):

Mean: $\bar{x} = 55.4$ mm

Standard deviation: $s = 5.8$ mm

Median: $M = 57$ mm

Interquartile range: $\text{IQR} = Q_3 - Q_1 = 59 - 49.5 = 9.5$ mm

Range: $R = \text{max} - \text{min} = 67 - 48 = 19$ mm
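
For readers without a CAS calculator to hand, the same figures can be reproduced with Python's `statistics` module. This is a sketch rather than a replacement for the calculator steps: the variable names are invented, and the quartiles are computed by taking the median of each half (as in these notes) because library quartile functions may use a different interpolation rule and give slightly different values.

```python
from statistics import mean, median, stdev

# Monthly rainfall (mm), January to December, from the table above.
rain = [48, 57, 52, 57, 58, 49, 49, 50, 59, 67, 60, 59]

ordered = sorted(rain)
half = len(ordered) // 2        # 12 values, so 6 in each half

mean_rain = mean(rain)
s_rain = stdev(rain)            # sample standard deviation (n - 1)
median_rain = median(ordered)
q1 = median(ordered[:half])     # median of the lower half
q3 = median(ordered[half:])     # median of the upper half

print("Mean:  ", round(mean_rain, 1))     # 55.4
print("s:     ", round(s_rain, 1))        # 5.8
print("Median:", median_rain)             # 57.0
print("IQR:   ", q3 - q1)                 # 9.5
print("Range: ", max(rain) - min(rain))   # 19
```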

Using the ClassPad calculator

The ClassPad follows a similar process:

Open the Statistics application
Enter data into a column
Select Calc → One-Variable from the menu
Complete the dialog box to specify the data list
Tap OK to view results

The ClassPad displays the same summary statistics, using $S_x$ for the sample standard deviation.

Note

Both calculators allow you to scroll through the results to see additional statistics if needed.

Choosing the right measure of spread

Different measures of spread are appropriate for different situations:

Use the range when:

You need a quick, simple measure
You want to know the total span of the data
The data has no extreme outliers

Use the IQR when:

The data contains outliers or extreme values
The distribution is skewed
You want a robust measure not affected by extremes
You're using the median as the measure of centre

Use the standard deviation when:

The distribution is roughly symmetric
You want to measure spread around the mean
You need the most commonly used measure for further statistical analysis
You're using the mean as the measure of centre

Summary

Key Points to Remember:

The range is the simplest measure of spread: $R = \text{maximum} - \text{minimum}$, but it's affected by outliers.
The interquartile range measures the spread of the middle $50\%$ of data: $\text{IQR} = Q_3 - Q_1$. It's robust and not affected by extreme values.
Quartiles divide ordered data into quarters: $Q_1$ has $25\%$ below it, $Q_3$ has $75\%$ below it.
The standard deviation measures typical distance from the mean: $s = \sqrt{\frac{\sum(x - \bar{x})^2}{n - 1}}$. It uses squared deviations because the unsquared deviations always sum to zero.
Use a CAS calculator to calculate measures of spread efficiently and accurately. Both TI-Nspire and ClassPad can compute all summary statistics from data lists.
Choose your measure of spread based on the data distribution and whether it contains outliers. IQR is best for skewed data, whilst standard deviation works well for symmetric distributions.
