Sampling and Data Collection (AQA GCSE Maths): Revision Notes
Sampling and Data Collection
What is sampling?
Sampling is the process of studying smaller groups to learn about much larger groups. When you want to understand something about a big population, it's often impossible or impractical to study everyone, so you examine a smaller, manageable group instead.
Think of it like tasting a spoonful of soup to check if the whole pot needs more salt - you don't need to drink the entire pot to know what it tastes like!
Why sampling matters: In the real world, researchers use sampling constantly - from opinion polls predicting election results to medical studies testing new treatments. Understanding sampling helps you evaluate whether the conclusions drawn from these studies are reliable.
Populations and samples
The population is the complete group you want to investigate. This could be anything - all the students in your school, every penguin in Antarctica, or all the people in the UK who own cars.
However, you usually cannot study the entire population because:
- It might be too large to manage
- It could be too expensive or time-consuming
- It might even be impossible to access everyone
That's where a sample comes in. A sample is a smaller group selected from the population that you actually study. The key is making sure your sample gives you accurate information about the whole population.
Critical concept: The goal of sampling is to use what you learn about a smaller group to make accurate conclusions about the entire population. If your sample doesn't represent the population well, your conclusions are likely to be unreliable.
Requirements for a good sample
For your sample to give you reliable information about the population, it must be representative. This means it should reflect the characteristics of the entire population accurately.
A representative sample needs to be:
Random
Every member of the population must have an equal chance of being selected. This prevents you from accidentally (or deliberately) choosing people who might give you biased results.
Large enough
The bigger your sample, the more reliable your results should be. A sample that's too small might not capture the true diversity of the population.
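The effect of sample size can be seen in a small simulation. This is only a sketch: the population of 1000 heights is made up for illustration, and the function name sample_mean is hypothetical. It takes many samples of each size and reports how widely the sample means are spread.

```python
import random
import statistics

def sample_mean(population, sample_size, rng):
    """Mean of a simple random sample drawn without replacement."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample)

# Hypothetical population: 1000 made-up heights in cm (illustrative data only).
rng = random.Random(1)  # fixed seed so the run is repeatable
population = [rng.randint(140, 190) for _ in range(1000)]
true_mean = statistics.mean(population)

for size in (5, 50, 500):
    # Take 200 samples of this size and see how far apart the estimates are.
    estimates = [sample_mean(population, size, rng) for _ in range(200)]
    spread = max(estimates) - min(estimates)
    print(f"sample size {size}: estimates spread over {spread:.1f} cm")
```

If you run this, you should generally see the spread shrink as the sample size grows, which is what "more reliable" means here: estimates from larger samples tend to land closer to the true population mean.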
Simple random sampling
To create a simple random sample, follow these steps:
Worked Example: Creating a Random Sample
Scenario: You want to survey 50 students from your school of 500 students about their lunch preferences.
Step 1: Assign a number to every member of the population
- Give each of the 500 students a number from 1 to 500
Step 2: Generate random numbers using a computer, calculator, or by picking numbers from a bag
- Use a random number generator to create 50 different numbers between 1 and 500
Step 3: Match the random numbers to the corresponding members of the population
- Find the students with those numbers - these form your random sample
This method gives every student the same chance of being chosen, so the selection process is fair and unbiased.
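The three steps above can be sketched in Python using the standard library's random module. The function name simple_random_sample and the seed value are assumptions chosen for illustration; a calculator or numbers drawn from a bag would do the same job.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size different numbers from 1 to population_size."""
    rng = random.Random(seed)
    # Step 1: every member of the population has a number from 1 upwards.
    student_numbers = range(1, population_size + 1)
    # Step 2: generate different random numbers (no repeats).
    chosen_numbers = rng.sample(student_numbers, sample_size)
    # Step 3: these numbers identify the students in the sample.
    return sorted(chosen_numbers)

# The scenario above: 50 students chosen from a school of 500.
chosen = simple_random_sample(500, 50, seed=42)
print(chosen)
```

Using rng.sample rather than generating numbers one at a time guarantees there are no repeats, so no student is picked twice.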
Spotting bias in sampling
A biased sample doesn't properly represent the whole population. This happens when certain groups are more likely to be included than others.
Common sampling mistakes to avoid:
- Location bias: Only surveying people at specific locations (like train stations for transport surveys)
- Time bias: Only collecting data at certain times of day or days of the week
- Size bias: Using samples that are too small to be representative
- Self-selection bias: Only including people who volunteer to participate
To identify bias, consider:
- When, where, and how the sample was taken
- How many people are in the sample
- Which groups might be excluded from the sampling process
For example, if you only survey people at a train station about transport preferences, your sample will be biased towards people who already use public transport. Similarly, if your sample is too small, it won't capture the full range of opinions in the population.
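The train station example can be illustrated with a tiny made-up population. All the figures below are invented for illustration, and the function name share_using_public_transport is hypothetical. Each person is recorded as a pair: whether they use public transport, and whether they are at the station when the survey takes place.

```python
# Made-up population of 1000 people: (uses_public_transport, is_at_station).
population = (
    [(True, True)] * 200      # public transport users who are at the station
    + [(True, False)] * 100   # public transport users who are elsewhere
    + [(False, False)] * 700  # people who do not use public transport
)

def share_using_public_transport(people):
    """Fraction of the given group who use public transport."""
    users = sum(1 for uses, _ in people if uses)
    return users / len(people)

# A biased sample: only the people found at the station.
station_sample = [person for person in population if person[1]]

print(share_using_public_transport(population))      # 0.3
print(share_using_public_transport(station_sample))  # 1.0
```

In this invented population only 30% of people use public transport, but the station-only sample suggests 100%, because everyone who doesn't use it was excluded from the sampling process.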
Primary and secondary data
Primary data is information you collect yourself through surveys, experiments, or observations. You control how it's gathered, so you are in a good position to judge how reliable it is.
Secondary data is information that someone else has already collected, such as data from websites, books, or research papers. While this can save time, you need to be careful about its accuracy and whether it's suitable for your purposes.
Advantages and disadvantages:
- Primary data: Usually more reliable and tailored to your needs, but time-consuming and expensive to collect
- Secondary data: Quick and cheap to access, but may not be exactly what you need and you can't control its quality
Types of data
Qualitative data
Qualitative data is descriptive and uses words rather than numbers. Examples include:
- People's names (like Smudge, Snowy, Dave)
- Favourite ice cream flavours (vanilla, chocolate, caramel-marshmallow-ripple)
- Colours, opinions, or categories
Quantitative data
Quantitative data measures quantities using numbers. Examples include:
- Heights of people
- Time taken to complete a race
- Number of goals scored in football matches
Discrete and continuous data
Quantitative data can be further classified into two types:
Discrete data
Discrete data can only take certain exact values, usually whole numbers. You cannot have values between these points. For example, the number of customers in a shop must be a whole number - you cannot have 2.5 customers.
Continuous data
Continuous data can take any value within a range. Examples include heights, weights, and temperatures, which can be measured to any level of precision (like 1.73 metres or 1.734 metres).
Memory aid: Think of discrete data as having gaps between possible values (like steps on a staircase), while continuous data flows smoothly from one value to another (like a ramp).
Organising data into classes
When working with large datasets, you often need to group your data into classes to make it more manageable.
For discrete data, make sure there are clear gaps between classes. For example, use "0-1 goals", "2-3 goals", "4-5 goals" rather than overlapping ranges.
For continuous data, classes should have no gaps between them and are often written using inequalities. For example, "0 ≤ age < 20", "20 ≤ age < 40", "40 ≤ age < 60".
Worked Example: Creating Data Classes
For discrete data (number of pets):
- Class 1: 0-1 pets
- Class 2: 2-3 pets
- Class 3: 4-5 pets
- Class 4: 6+ pets
For continuous data (height in cm):
- Class 1: 150 ≤ height < 160
- Class 2: 160 ≤ height < 170
- Class 3: 170 ≤ height < 180
- Class 4: 180 ≤ height < 190
When grouping data, ensure your classes:
- Cover all possible values
- Don't overlap
- Are sensible in number (not too many or too few)
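The classes from the worked example can be sketched in Python to show how each value falls into exactly one class. The function names pet_class and height_class and the data lists are made up for illustration.

```python
from collections import Counter

def pet_class(number_of_pets):
    """Discrete classes: 0-1, 2-3, 4-5 and 6+ pets."""
    if number_of_pets < 0:
        raise ValueError("number of pets cannot be negative")
    if number_of_pets >= 6:
        return "6+ pets"
    lower = (number_of_pets // 2) * 2
    return f"{lower}-{lower + 1} pets"

def height_class(height_cm):
    """Continuous classes of width 10 cm covering 150 <= height < 190."""
    if not 150 <= height_cm < 190:
        raise ValueError("height is outside the classes in this example")
    lower = int(height_cm // 10) * 10
    return f"{lower} ≤ height < {lower + 10}"

# Made-up data for illustration.
pets = [0, 1, 1, 2, 3, 3, 4, 6, 7]
heights = [150.0, 159.9, 160.0, 172.4, 189.9]

pet_counts = Counter(pet_class(n) for n in pets)
height_counts = Counter(height_class(h) for h in heights)
print(pet_counts)
print(height_counts)
```

Notice that a height of exactly 160 cm goes into "160 ≤ height < 170", not the class below. That is why the inequalities use ≤ at the lower end and < at the upper end: boundary values belong to one class only.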
Using sampling to estimate population size
Sometimes you can use sampling techniques to estimate the total size of a population. This is particularly useful when counting every individual would be impossible.
Worked Example: Capture-Recapture Method
Scenario: Estimating the number of fish in a lake
Step 1: Capture a sample, mark them, and release them back
- Catch 100 fish, tag them, and release them back into the lake
Step 2: Later, capture another sample and count how many are marked
- A week later, catch 80 fish and find that 10 of them are tagged
Step 3: Use the proportion of marked individuals to estimate the total population
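- 10 out of the 80 fish in the second sample are marked, which is 10 ÷ 80 = 0.125, or 12.5%. We assume that roughly 12.5% of all the fish in the lake are marked too
- Since we originally marked 100 fish, 100 is about 12.5% of the total, so the total population ≈ 100 ÷ 0.125 = 800 fish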
The idea is that the fraction of marked individuals in your second sample should be similar to the fraction of marked individuals in the entire population.
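The calculation can be written as a short Python function. This is a minimal sketch: the function and parameter names are chosen for illustration, and it assumes the marked fish have mixed evenly back into the population.

```python
def estimate_population(marked_first, caught_second, marked_in_second):
    """Capture-recapture estimate of the total population size."""
    if marked_in_second == 0:
        raise ValueError("no marked individuals were recaptured, so no estimate is possible")
    # Fraction of the second sample that is marked (10 / 80 = 0.125 in the example).
    proportion_marked = marked_in_second / caught_second
    # The marked individuals should be about the same fraction of the whole population.
    return marked_first / proportion_marked

# The fish example: 100 tagged, then 80 caught with 10 tagged.
estimate = estimate_population(100, 80, 10)
print(estimate)  # 800.0
```

The same answer comes from rearranging the proportion directly: total ≈ (100 × 80) ÷ 10 = 800.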
Key Points to Remember:
- A sample is a smaller group used to learn about a larger population
- Good samples must be random and large enough to be representative
- Bias occurs when certain groups are more likely to be included than others
- Primary data is collected by you, while secondary data comes from other sources
- Qualitative data uses words; quantitative data uses numbers
- Discrete data can only take certain exact values; continuous data can take any value in a range
- When organising data into classes, ensure they don't overlap and cover all possibilities