Numerical and Statistical Skills Revision Notes for OCR GCSE Geography B (Geography for Enquiring Minds)

Introduction

Working with numerical data is a fundamental skill for geographers, particularly when conducting fieldwork and drawing conclusions. Understanding how to handle data properly enables you to identify trends, recognise patterns, and make predictions about future developments. This is essential for geographical investigation and analysis.

Measures of central tendency

Central tendency refers to statistical measures that identify the typical or central value within a dataset. These measures help geographers understand what is 'normal' or 'average' in their data.

Median

The median represents the middle value when numbers are arranged in order from smallest to largest. To find the median, you must first organise your data numerically, then locate the value in the central position.

Note

How to calculate the median:

Arrange all values in ascending order
If there is an odd number of values, the median is the middle number
If there is an even number of values, the median is the average of the two middle numbers

Important

The median is useful because it is not affected by extremely high or low values (outliers) in your dataset, making it a reliable measure when your data contains unusual values.
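The steps for finding the median can be sketched in Python. This is a minimal illustration, not part of the specification; the function name `median` and the outlier value 420 are chosen here for the example.

```python
def median(values):
    """Return the middle value of a non-empty list of numbers."""
    ordered = sorted(values)      # arrange the values in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                # odd count: a single middle value
        return ordered[middle]
    # even count: average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


data = [4, 8, 8, 15, 16, 23, 42]
print(median(data))                  # 15
print(median(data[:-1] + [420]))     # 15 (the outlier does not move the median)
print(median([4, 8, 15, 16]))        # 11.5
```

Replacing 42 with a much larger value leaves the median at 15, which is the robustness to outliers described above.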

Mean

The mean is commonly referred to as the 'average' and represents the sum of all values divided by the number of values in the dataset.

Formula for calculating the mean:

$\text{Mean} = \frac{\text{Sum of all values}}{\text{Total number of values}}$

Worked Example: Calculating the Mean

For the dataset: 4, 8, 8, 15, 16, 23, 42

Sum = $4 + 8 + 8 + 15 + 16 + 23 + 42 = 116$

Mean = $116 \div 7 \approx 16.6$ (to 1 decimal place)

Important

The mean uses all values in the calculation, so it can be affected by extreme values. Be aware of this limitation when interpreting your results.
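A short Python sketch of the same calculation, using the dataset from the worked example. Removing the largest value is an illustration added here to show how strongly one extreme value can pull the mean.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)


data = [4, 8, 8, 15, 16, 23, 42]
print(round(mean(data), 1))        # 16.6

# Drop the largest value (42) to see its effect on the mean.
print(round(mean(data[:-1]), 1))   # 12.3
```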

Mode

The mode identifies the value that appears most frequently in a dataset. This measure is particularly useful when dealing with categorical data or when you need to identify the most common occurrence.

In the dataset shown (4, 8, 8, 15, 16, 23, 42), the number 8 appears twice, making it the mode as it occurs more frequently than any other value.

Note

A dataset can have more than one mode (bimodal or multimodal) if multiple values occur with equal highest frequency. Some datasets may have no mode if all values appear only once.
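The three cases (one mode, several modes, no mode) can be sketched as follows. The function name `modes` and the two extra datasets are illustrative assumptions; the function expects a non-empty list.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency (empty list if no mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                 # every value appears once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([4, 8, 8, 15, 16, 23, 42]))   # [8]
print(modes([1, 1, 2, 2, 3]))             # [1, 2] (bimodal)
print(modes([1, 2, 3]))                   # [] (no mode)
```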

Modal class

When data is grouped into classes or categories, the modal class is the category with the highest frequency. This is particularly useful when working with grouped frequency tables.

In the frequency table shown, the class interval 0-10 contains the highest frequency (3 observations), making it the modal class.
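As a rough sketch, the modal class can be found by counting how many values fall in each class. The class boundaries other than 0-10 and the choice of dataset are assumptions made here for illustration, picked so that the 0-10 class has a frequency of 3 as stated above.

```python
def modal_class(values, classes):
    """Return (class, frequency) for the class holding the most values.

    `classes` is a list of (lower, upper) bounds, both inclusive.
    """
    frequencies = {c: 0 for c in classes}
    for v in values:
        for lower, upper in classes:
            if lower <= v <= upper:
                frequencies[(lower, upper)] += 1
                break
    best = max(frequencies, key=frequencies.get)
    return best, frequencies[best]


# Hypothetical class intervals; only 0-10 is named in the notes.
classes = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
data = [4, 8, 8, 15, 16, 23, 42]
print(modal_class(data, classes))   # ((0, 10), 3)
```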

Measures of spread

Measures of spread describe how data is distributed and how much variation exists within a dataset. Understanding spread is crucial for interpreting the reliability and variability of geographical data.

Range

The range measures the spread of data by calculating the difference between the largest and smallest values in the dataset.

Formula for calculating range:

$\text{Range} = \text{Maximum value} - \text{Minimum value}$

Worked Example: Calculating the Range

For the dataset: 4, 8, 8, 15, 16, 23, 42

Range = $42 - 4 = 38$

Important

The range is easy to calculate but can be misleading if there are extreme outliers in your data, as it only considers the two most extreme values.
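A minimal Python sketch of the range, with the function name `value_range` chosen here for illustration. Dropping the largest value shows how much a single extreme value can change the result.

```python
def value_range(values):
    """Difference between the largest and smallest values."""
    return max(values) - min(values)


data = [4, 8, 8, 15, 16, 23, 42]
print(value_range(data))        # 38
print(value_range(data[:-1]))   # 19 (without the largest value, 42)
```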

Quartiles

Quartiles divide an ordered dataset into four equal parts, providing information about the distribution of values. Understanding quartiles helps you analyse how data is spread across the range.

Note

The three quartile values are:

Lower quartile (Q1): The value that separates the lowest 25% of data from the rest
Median (Q2): The middle value that separates the lower 50% from the upper 50%
Upper quartile (Q3): The value that separates the lowest 75% of data from the highest 25%

Interquartile range (IQR)

The interquartile range measures the spread of the middle 50% of the data, providing a more robust measure of spread than the range because it excludes extreme values.

Formula for calculating IQR:

$\text{IQR} = \text{Upper quartile (Q3)} - \text{Lower quartile (Q1)}$

Worked Example: Calculating the IQR

From the dataset: 4, 8, 8, 15, 16, 23, 42

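The median is 15, which splits the data into a lower half (4, 8, 8) and an upper half (16, 23, 42).

$Q_1 = 8$ (the middle value of the lower half)

$Q_3 = 23$ (the middle value of the upper half)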

$\text{IQR} = 23 - 8 = 15$

Note

The IQR is particularly useful in geography for identifying the typical spread of your data whilst ignoring extreme values that might distort your analysis.
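The quartiles and IQR for the worked example can be checked with Python's standard `statistics` module. This is a sketch only: software packages use several slightly different quartile conventions, and the default ("exclusive") method of `statistics.quantiles` happens to agree with the hand method for this dataset, which may not hold for every dataset.

```python
from statistics import quantiles

data = [4, 8, 8, 15, 16, 23, 42]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3)   # 8.0 15.0 23.0
print(iqr)          # 15.0
```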

Percentage calculations

Calculating percentage changes is essential for comparing data collected at different times or locations. These calculations help geographers quantify changes and make meaningful comparisons.

Percentage increase

Percentage increase calculations show how much a value has grown relative to its original size. This is useful for analysing changes such as river channel width downstream or population growth.

Method for calculating percentage increase:

Calculate the difference between the new value and the original value (increase)
Divide the increase by the original value
Multiply the result by 100

Formula:

$\text{Percentage increase} = \frac{\text{Increase}}{\text{Original value}} \times 100$

Worked Example: Percentage Increase

Robin population in woodland:

December count: 15 robins
January count: 23 robins

Increase = $23 - 15 = 8$

Percentage increase = $\frac{8}{15} \times 100 \approx 53.3\%$ (to 1 decimal place)

Important

Always identify the original (starting) value correctly, as this forms the denominator in your calculation. Using the wrong baseline will produce incorrect results.
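The method can be sketched in Python using the robin counts from the worked example. The second calculation deliberately divides by the wrong baseline to show how the answer changes; the function name is an assumption for illustration.

```python
def percentage_increase(original, new):
    """Percentage increase from the original value to the new value."""
    increase = new - original           # difference between new and original
    return increase / original * 100    # divide by the original, multiply by 100


december, january = 15, 23
print(round(percentage_increase(december, january), 1))   # 53.3

# Wrong baseline: dividing by the new value gives a different, incorrect answer.
print(round((january - december) / january * 100, 1))     # 34.8
```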

Percentage decrease

Percentage decrease calculations show how much a value has reduced relative to its original size. This is useful for analysing reductions such as particle size in rivers downstream or population decline.

Method for calculating percentage decrease:

Calculate the difference between the original value and the new value (decrease)
Divide the decrease by the original value
Multiply the result by 100

Formula:

$\text{Percentage decrease} = \frac{\text{Decrease}}{\text{Original value}} \times 100$

Worked Example: Percentage Decrease

Robin population in woodland:

February count: 22 robins
March count: 12 robins

Decrease = $22 - 12 = 10$

Percentage decrease = $\frac{10}{22} \times 100 \approx 45.5\%$ (to 1 decimal place)

Important

When calculating percentage change, ensure you identify whether the change is an increase or decrease before selecting the appropriate formula.
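One way to handle both cases is a single function that reports the direction of change alongside its size. This is an illustrative sketch; the function name and the returned labels are assumptions, not exam terminology.

```python
def percentage_change(original, new):
    """Return (direction, size of change as a percentage of the original value)."""
    change = new - original
    percent = abs(change) / original * 100
    if change > 0:
        direction = "increase"
    elif change < 0:
        direction = "decrease"
    else:
        direction = "no change"
    return direction, round(percent, 1)


print(percentage_change(22, 12))   # ('decrease', 45.5)
print(percentage_change(15, 23))   # ('increase', 53.3)
```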

Percentiles

Percentiles divide a dataset into 100 equal parts, providing a very detailed way of understanding where a particular value sits within the overall distribution. Whilst quartiles divide data into quarters, percentiles offer much finer divisions.

Note

Understanding percentiles:

If a value is in the 90th percentile, this means that 90% of all values in the dataset are equal to or less than this value, whilst 10% are greater.

Everyday application:

Percentiles are widely used outside geography. For example, when measuring baby growth, a midwife might report that a baby is in the 90th percentile for weight, meaning that out of 100 babies of the same age, 90 would weigh the same or less, whilst only 10 would weigh more.

Note

Remember that percentiles and quartiles are related - the 25th percentile equals Q1, the 50th percentile equals the median (Q2), and the 75th percentile equals Q3.
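Following the definition above (the percentage of values equal to or less than a given value), a value's position can be sketched in Python. The function name `percentile_rank` and the list of 100 values are hypothetical, chosen only to mirror the "out of 100" example.

```python
def percentile_rank(values, value):
    """Percentage of values in the dataset that are equal to or less than `value`."""
    at_or_below = sum(1 for v in values if v <= value)
    return at_or_below / len(values) * 100


# Hypothetical dataset of 100 values: 1, 2, ..., 100.
hundred_values = list(range(1, 101))
print(percentile_rank(hundred_values, 90))   # 90.0

data = [4, 8, 8, 15, 16, 23, 42]
print(round(percentile_rank(data, 23)))      # 86 (6 of the 7 values are 23 or less)
```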

Identifying relationships in data

Analysing relationships between different variables is crucial for understanding geographical patterns and processes. Recognising how variables relate to each other allows you to make predictions and test hypotheses about geographical phenomena.

Scatter graphs

Scatter graphs provide a visual representation of the relationship between two variables. Each point on the graph represents a pair of values, allowing you to see whether there is a pattern or correlation between the variables.

Common geographical applications:

Number of tourists versus number of tourist facilities
River discharge versus sediment load
Distance from CBD versus land values
Height versus weight

The scatter graph shown displays data points marked with 'x' symbols, plotting height against weight. Each point represents an individual observation with both a height and weight value.

Line of best fit

A line of best fit (also called a trend line) can be drawn through the scatter graph points to show the general relationship between the two variables. When drawing this line, aim to have approximately equal numbers of points above and below the line, with the line passing as close as possible to all points.

Important

When drawing a line of best fit, use a ruler and ensure the line extends across the full range of your data. Don't just connect the first and last points - the line should reflect the overall trend of all the data.
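In an exam the line of best fit is drawn by eye. Software usually calculates it instead, most commonly with the least-squares method, which is a different technique from the by-eye approach described here. The sketch below uses hypothetical measurements (distance downstream in km, channel width in m) invented for illustration.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept


# Hypothetical fieldwork data, not taken from the notes.
distance_km = [1, 2, 3, 4, 5]
width_m = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = least_squares_line(distance_km, width_m)
print(round(slope, 2), round(intercept, 2))   # 1.97 0.09
```

A least-squares line always passes through the point made up of the mean of each variable, which is also a useful guide when drawing a line by eye.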

Understanding correlation

Correlation describes the strength and direction of the relationship between two variables shown on a scatter graph.

Strong correlation occurs when data points cluster tightly around the line of best fit. This indicates a close relationship between the variables - as one changes, the other changes in a predictable way.

Weak correlation occurs when data points are scattered widely from the line of best fit. This suggests the variables are not closely related - a change in one variable does not reliably predict a change in the other.

Note

Types of correlation:

Positive correlation: As one variable increases, the other also increases (upward-sloping line)
Negative correlation: As one variable increases, the other decreases (downward-sloping line)
No correlation: No clear relationship exists between the variables (scattered points with no pattern)

Important

Always describe both the direction (positive/negative) and strength (strong/weak) of correlation in your answers.
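Direction and strength can also be put into numbers. One common measure, not covered in these notes, is Pearson's correlation coefficient, which runs from -1 (perfect negative) to +1 (perfect positive). The sketch below is illustrative only: the data are hypothetical and the cut-offs for "strong" and "no clear correlation" are assumptions, not official boundaries.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe_correlation(r, strong=0.7, none=0.3):
    """Turn r into a direction-and-strength description (cut-offs are assumed)."""
    if abs(r) < none:
        return "no clear correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


# Hypothetical data: distance downstream (km) against two other measurements.
distance_km = [1, 2, 3, 4, 5]
width_m = [2.1, 3.9, 6.2, 7.8, 10.0]
particle_size_cm = [10.0, 7.8, 6.2, 3.9, 2.1]

print(describe_correlation(pearson_r(distance_km, width_m)))           # strong positive correlation
print(describe_correlation(pearson_r(distance_km, particle_size_cm)))  # strong negative correlation
```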

Interpolation

Interpolation involves finding a value within the existing data range using the line of best fit. This technique allows you to estimate values that were not directly measured or plotted in your original data collection.

The diagram shows how to read a value from the line of best fit using horizontal and vertical reference lines (shown in dashed pink). This process reads from one axis, across to the line of best fit, then to the other axis.

Note

How to interpolate:

Locate the known value on one axis
Draw a line from this point to intersect the line of best fit
From the intersection point, draw a line to the other axis
Read the interpolated value

Note

Interpolation is generally reliable because you are estimating within the range of your collected data, where the relationship has been demonstrated.
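If the line of best fit is written as an equation, reading a value from the graph becomes a calculation. The sketch below assumes a hypothetical straight line (slope 2.0, intercept 0.5) and a hypothetical data range of 1 to 5 on the x-axis; none of these figures come from the notes.

```python
def y_from_x(x, slope, intercept):
    """Read up from the x-axis to the line, then across to the y-axis."""
    return slope * x + intercept


def x_from_y(y, slope, intercept):
    """Read across from the y-axis to the line, then down to the x-axis."""
    return (y - intercept) / slope


# Hypothetical line of best fit and data range.
slope, intercept = 2.0, 0.5
x_min, x_max = 1, 5

x = 2.5
if x_min <= x <= x_max:                  # inside the data range: interpolation
    print(y_from_x(x, slope, intercept))     # 5.5

print(x_from_y(8.5, slope, intercept))       # 4.0
```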

Extrapolation

Extrapolation involves extending the line of best fit beyond the range of collected data to predict values outside your dataset. This technique allows you to make predictions about what might happen beyond your observations.

The diagram shows the line of best fit being extended beyond the data points (shown by the dashed extension line) to predict values outside the measured range.

Important considerations:

Extrapolation may provide uncertain or unreliable results because you are assuming the relationship continues in the same way beyond your data range. Many geographical relationships change at extreme values, meaning extrapolated predictions may be inaccurate.

Important

When asked to evaluate extrapolation in an exam, always mention that it involves greater uncertainty than interpolation because you are predicting beyond known data. External factors or changing relationships might affect the accuracy of extrapolated values.
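A simple safeguard when predicting from a line of best fit is to label each estimate as interpolation or extrapolation, so that the less reliable predictions stand out. This is a sketch under stated assumptions: the line (slope 2.0, intercept 0.5) and the measured x-values are hypothetical.

```python
def predict(x, measured_xs, slope, intercept):
    """Return (estimate, kind), where kind flags whether x lies inside the data range."""
    inside = min(measured_xs) <= x <= max(measured_xs)
    kind = "interpolation" if inside else "extrapolation"
    return slope * x + intercept, kind


# Hypothetical measured x-values and line of best fit.
measured_xs = [1, 2, 3, 4, 5]
slope, intercept = 2.0, 0.5

print(predict(3, measured_xs, slope, intercept))   # (6.5, 'interpolation')
print(predict(8, measured_xs, slope, intercept))   # (16.5, 'extrapolation')
```

The arithmetic is identical in both cases; only the second estimate relies on the assumption that the relationship continues unchanged beyond the measured range.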

Remember!

Summary

Key Points to Remember:

Central tendency measures (mean, median, mode) identify typical values, whilst measures of spread (range, IQR) show how data is distributed
Mean is calculated by summing all values and dividing by the count, but can be affected by extreme values; median is the middle value and is more robust to outliers
Percentage change is calculated using the formula $\frac{\text{Change}}{\text{Original value}} \times 100$, remembering to always divide by the original value
Scatter graphs reveal relationships between variables; a line of best fit should have roughly equal points above and below it
Interpolation (finding values within the data range) is more reliable than extrapolation (predicting beyond the data range), which involves greater uncertainty

Key terms: Mean, median, mode, modal class, range, quartiles, interquartile range, percentiles, scatter graph, correlation, line of best fit, interpolation, extrapolation

Critical skills for exams:

Always show your working in calculations to gain method marks even if your final answer is incorrect
When describing scatter graphs, state both the direction and strength of correlation
For percentage calculations, identify the original value carefully as this is your denominator
Remember that extrapolation is less reliable than interpolation because it extends beyond known data
