Sampling and Data Collection Revision Notes for AQA GCSE Maths

Sampling and Data Collection

What is sampling?

Sampling is the process of studying smaller groups to learn about much larger groups. When you want to understand something about a big population, it's often impossible or impractical to study everyone, so you examine a smaller, manageable group instead.

Think of it like tasting a spoonful of soup to check if the whole pot needs more salt - you don't need to drink the entire pot to know what it tastes like!

Note

Why sampling matters: In the real world, researchers use sampling constantly - from opinion polls predicting election results to medical studies testing new treatments. Understanding sampling helps you evaluate whether the conclusions drawn from these studies are reliable.

Populations and samples

The population is the complete group you want to investigate. This could be anything - all the students in your school, every penguin in Antarctica, or all the people in the UK who own cars.

However, you usually cannot study the entire population because:

It might be too large to manage
It could be too expensive or time-consuming
It might even be impossible to access everyone

That's where a sample comes in. A sample is a smaller group selected from the population that you actually study. The key is making sure your sample gives you accurate information about the whole population.

Important

Critical concept: The goal of sampling is to use what you learn about a smaller group to draw accurate conclusions about the entire population. If your sample doesn't represent the population well, your conclusions are likely to be unreliable.

Requirements for a good sample

For your sample to give you reliable information about the population, it must be representative. This means it should reflect the characteristics of the entire population accurately.

A representative sample needs to be:

Random

Every member of the population must have an equal chance of being selected. This prevents you from accidentally (or deliberately) choosing people who might give you biased results.

Large enough

The bigger your sample, the more reliable your results should be. A sample that's too small might not capture the true diversity of the population.
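To see why sample size matters, here is a rough simulation sketch. The population is hypothetical (10,000 people, 30% of whom answer "yes" to some question), and the names `population` and `sample_proportions` are made up for this illustration. It takes many small samples and many large samples and compares how much the estimates jump around.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: 1 means "yes", 0 means "no" (30% say yes).
population = [1] * 3000 + [0] * 7000

def sample_proportions(sample_size, repeats=200):
    """Take `repeats` simple random samples; return the 'yes' proportion of each."""
    return [
        sum(random.sample(population, sample_size)) / sample_size
        for _ in range(repeats)
    ]

small = sample_proportions(10)
large = sample_proportions(1000)

# Standard deviation of the estimates: a smaller spread means more consistent results.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print(f"Samples of 10:   spread of estimates = {spread_small:.3f}")
print(f"Samples of 1000: spread of estimates = {spread_large:.3f}")
```

In this sketch the estimates from samples of 1000 should cluster much more tightly around the true value of 0.3 than the estimates from samples of 10.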

Simple random sampling

To create a simple random sample, follow these steps:

Example

Worked Example: Creating a Random Sample

Scenario: You want to survey 50 students from your school of 500 students about their lunch preferences.

Step 1: Assign a number to every member of the population

Give each of the 500 students a number from 1 to 500

Step 2: Generate random numbers using a computer or calculator, or by picking numbers from a bag

Use a random number generator to create 50 different numbers between 1 and 500

Step 3: Match the random numbers to the corresponding members of the population

Find the students with those numbers - these form your random sample

This method gives every student the same chance of being chosen, so the selection process is fair and unbiased.
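If you have a computer to hand, the three steps above could be sketched in Python as follows. This is only one possible way to do it; the variable names `student_numbers` and `sample` are chosen for this illustration.

```python
import random

# Step 1: number every student in the population from 1 to 500.
student_numbers = list(range(1, 501))

# Step 2: pick 50 different numbers at random.
# random.sample never picks the same item twice.
sample = random.sample(student_numbers, 50)

# Step 3: the students with these numbers form the sample.
print(sorted(sample))
```

Each run gives a different set of 50 numbers, and no student can be chosen more than once.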

Spotting bias in sampling

A biased sample doesn't properly represent the whole population. This happens when certain groups are more likely to be included than others.

Important

Common sampling mistakes to avoid:

Location bias: Only surveying people at specific locations (like train stations for transport surveys)
Time bias: Only collecting data at certain times of day or days of the week
Size bias: Using samples that are too small to be representative
Self-selection bias: Only including people who volunteer to participate

To identify bias, consider:

When, where, and how the sample was taken
How many people are in the sample
Which groups might be excluded from the sampling process

For example, if you only survey people at a train station about transport preferences, your sample will be biased towards people who already use public transport. Similarly, if your sample is too small, it may not capture the full range of opinions in the population.
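The train station example can be illustrated with a small simulation. Everything here is an assumption made for the sketch: a hypothetical town of 1,000 people, 300 of whom mainly use public transport, and the extreme simplification that a station survey reaches only public transport users.

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical town: 300 people mainly use public transport, 700 do not.
population = ["public"] * 300 + ["other"] * 700

def proportion_public(group):
    """Fraction of a group who mainly use public transport."""
    return group.count("public") / len(group)

# Biased sample: surveying at the station reaches only public transport users
# (an extreme simplification for illustration).
station_sample = random.sample([p for p in population if p == "public"], 100)

# Fairer sample: a simple random sample drawn from the whole town.
random_sample = random.sample(population, 100)

print("Whole town:     ", proportion_public(population))
print("Station sample: ", proportion_public(station_sample))
print("Random sample:  ", proportion_public(random_sample))
```

The station sample suggests that everyone uses public transport, while the random sample should land much closer to the true figure of 0.3.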

Primary and secondary data

Primary data is information you collect yourself through surveys, experiments, or observations. You control how it's gathered, so you are in a good position to judge how reliable it is.

Secondary data is information that someone else has already collected, such as data from websites, books, or research papers. While this can save time, you need to be careful about its accuracy and whether it's suitable for your purposes.

Note

Advantages and disadvantages:

Primary data: Usually more reliable and tailored to your needs, but often time-consuming and expensive to collect
Secondary data: Quick and cheap to access, but may not be exactly what you need and you can't control its quality

Types of data

Qualitative data

Qualitative data is descriptive and uses words rather than numbers. Examples include:

People's names (like Smudge, Snowy, Dave)
Favourite ice cream flavours (vanilla, chocolate, caramel-marshmallow-ripple)
Colours, opinions, or categories

Quantitative data

Quantitative data measures quantities using numbers. Examples include:

Heights of people
Time taken to complete a race
Number of goals scored in football matches

Discrete and continuous data

Quantitative data can be further classified into two types:

Discrete data

Discrete data can only take certain exact values, usually whole numbers. You cannot have values between these points. For example, the number of customers in a shop must be a whole number - you cannot have 2.5 customers.

Continuous data

Continuous data can take any value within a range. Examples include heights, weights, and temperatures, which can be measured to any level of precision (like 1.73 metres or 1.734 metres).

Note

Memory aid: Think of discrete data as having gaps between possible values (like steps on a staircase), while continuous data flows smoothly from one value to another (like a ramp).

Organising data into classes

When working with large datasets, you often need to group your data into classes to make it more manageable.

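For discrete data, make sure the classes do not overlap, so that every value belongs to exactly one class. For example, use "0-1 goals", "2-3 goals", "4-5 goals" rather than overlapping ranges such as "0-2 goals" and "2-4 goals". Gaps between these classes are fine because in-between values such as 1.5 goals cannot occur.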

For continuous data, classes should have no gaps between them and are often written using inequalities. For example, "0 ≤ age < 20", "20 ≤ age < 40", "40 ≤ age < 60".

Example

Worked Example: Creating Data Classes

For discrete data (number of pets):

Class 1: 0-1 pets
Class 2: 2-3 pets
Class 3: 4-5 pets
Class 4: 6+ pets

For continuous data (height in cm):

Class 1: 150 ≤ height < 160
Class 2: 160 ≤ height < 170
Class 3: 170 ≤ height < 180
Class 4: 180 ≤ height < 190

When grouping data, ensure your classes:

Cover all possible values
Don't overlap
Are sensible in number (not too many or too few)
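As a rough sketch, the continuous height classes above could be built into a frequency table like this. The heights are made-up values for illustration, and the function name `height_class` is hypothetical. Notice how a height of exactly 160 goes into "160 ≤ height < 170", not the class below it.

```python
from collections import Counter

# Hypothetical heights in cm, made up for this illustration.
heights = [150.0, 159.9, 160.0, 165.2, 171.4, 179.9, 180.0, 188.5]

def height_class(height, start=150, width=10, end=190):
    """Return the class label 'lower ≤ height < upper' for a height in cm."""
    if not start <= height < end:
        raise ValueError(f"{height} is outside the classes {start} to {end}")
    # Work out how many whole class widths the height is above the start.
    lower = start + width * int((height - start) // width)
    return f"{lower} ≤ height < {lower + width}"

# Count how many heights fall into each class.
frequencies = Counter(height_class(h) for h in heights)
for label, count in sorted(frequencies.items()):
    print(label, count)
```

Because each class includes its lower boundary but not its upper one, every height lands in exactly one class.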

Using sampling to estimate population size

Sometimes you can use sampling techniques to estimate the total size of a population. This is particularly useful when counting every individual would be impossible.

Example

Worked Example: Capture-Recapture Method

Scenario: Estimating the number of fish in a lake

Step 1: Capture a sample, mark them, and release them back

Catch 100 fish, tag them, and release them back into the lake

Step 2: Later, capture another sample and count how many are marked

A week later, catch 80 fish and find that 10 of them are tagged

Step 3: Use the proportion of marked individuals to estimate the total population

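10 out of the 80 fish in the second sample are marked, which is 10 ÷ 80 = 0.125 (12.5%), so we estimate that about 12.5% of all the fish in the lake are marked
Since we originally marked 100 fish, 100 is about 12.5% of the total, so the total population ≈ 100 ÷ 0.125 = 800 fish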

The idea is that the fraction of marked individuals in your second sample should be similar to the fraction of marked individuals in the entire population.
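The same calculation can be written as a short function. This is a minimal sketch: the function and parameter names are made up for illustration, and it assumes the marked fish have mixed evenly back into the population.

```python
def estimate_population(first_marked, second_caught, second_marked):
    """Capture-recapture estimate of the total population size.

    Assumes the proportion marked in the second sample is about the
    same as the proportion marked in the whole population.
    """
    if second_marked == 0:
        raise ValueError("No marked individuals were recaptured, so no estimate is possible")
    proportion_marked = second_marked / second_caught
    return first_marked / proportion_marked

# Fish example: 100 tagged, then 80 caught, of which 10 were tagged.
estimate = estimate_population(first_marked=100, second_caught=80, second_marked=10)
print(estimate)  # 800.0
```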

Summary

Key Points to Remember:

A sample is a smaller group used to learn about a larger population
Good samples must be random and large enough to be representative
Bias occurs when certain groups are more likely to be included than others
Primary data is collected by you, while secondary data comes from other sources
Qualitative data uses words, while quantitative data uses numbers
Discrete data can only take certain exact values, while continuous data can take any value in a range
When organising data into classes, ensure they don't overlap and cover all possibilities
