Analyse Data (HSC SSCE Mathematics Advanced): Revision Notes
Learning objectives
After working through this topic, you will be able to:
- Calculate the relative frequency of an outcome in a dataset
- Use relative frequency to estimate the probability of a result in an experiment
- Understand that larger sample sizes generally lead to more reliable probability estimates
- Apply estimated probabilities to make predictions about larger populations
Understanding relative frequency
When we collect data from experiments or surveys, we often want to know how frequently different outcomes occur. Relative frequency gives us a way to express this as a proportion rather than just a raw count.
Relative frequency is the proportion of times a particular outcome occurs within a dataset. It tells us what fraction of the total observations consists of that specific outcome.
The formula for relative frequency is:
$$\text{relative frequency} = \frac{\text{frequency of the outcome}}{\text{total frequency}}$$
This formula divides the number of times an outcome occurred by the total number of observations. The result is always a number between $0$ and $1$, similar to a probability.
Think of relative frequency as the "part over whole" - it answers the question "what portion of my total data is this specific outcome?"
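The "part over whole" calculation can be sketched in a few lines of Python. This is only an illustration: the function name `relative_frequencies` and the survey responses are hypothetical and do not come from these notes.

```python
from collections import Counter


def relative_frequencies(observations):
    """Return a dict mapping each outcome to its share of all observations."""
    counts = Counter(observations)   # raw frequency of each outcome
    total = len(observations)        # total number of observations
    return {outcome: count / total for outcome, count in counts.items()}


# Hypothetical survey responses, made up for illustration only.
responses = ["bus", "car", "bus", "walk", "bus", "car", "bus", "walk"]
rel_freq = relative_frequencies(responses)

for outcome, proportion in rel_freq.items():
    print(f"{outcome}: {proportion}")
```

Here "bus" appears 4 times out of 8, so its relative frequency is 0.5. The relative frequencies of all outcomes always add to 1.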
Using relative frequency to estimate probability
Relative frequency has an important connection to probability. When we conduct experiments, we can use relative frequency to estimate the probability of outcomes, especially when we don't know the theoretical probability.
For a random variable $X$, the relative frequency approximates the probability:
$$P(X = x) \approx \frac{\text{frequency of } x}{\text{total frequency}}$$
Here, $P(X = x)$ represents the estimated probability that the random variable $X$ takes the specific value $x$. The symbol $\approx$ means "approximately equal to" because relative frequency gives us an estimate based on our sample, not an exact theoretical probability.
Critical concept: This estimate tends to become more accurate as we increase the number of trials or observations. With more data, the relative frequency is increasingly likely to be close to the true probability, although any single run can still land above or below it.
Remember: Bigger samples = Better estimates
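One way to see this effect is to simulate it. The sketch below assumes a fair six-sided die, modelled with Python's `random` module, and estimates the probability of rolling a 3 from increasingly large samples. The function name `estimate_probability`, the seed and the sample sizes are arbitrary choices made for illustration.

```python
import random


def estimate_probability(target, n_rolls, rng):
    """Roll a simulated fair die n_rolls times; return the relative frequency of target."""
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls


rng = random.Random(2024)  # fixed seed (arbitrary) so the run is repeatable
true_p = 1 / 6             # theoretical probability for a fair die

for n in (60, 600, 6000, 60000):
    estimate = estimate_probability(3, n, rng)
    error = abs(estimate - true_p)
    print(f"{n:>6} rolls: estimate = {estimate:.4f}, error = {error:.4f}")
```

The error will not necessarily shrink at every step, but across many runs the larger samples tend to sit much closer to 1/6 than the smaller ones.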
Worked example: estimating probability from die rolls
A die is rolled 60 times, with the following outcomes recorded:
- 1 appears 12 times
- 2 appears 9 times
- 3 appears 11 times
- 4 appears 10 times
- 5 appears 8 times
- 6 appears 10 times
Question: Estimate the probability of rolling a 3 using relative frequency.
Solution:
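We need to find the relative frequency of rolling a 3, which will give us an estimate for $P(X = 3)$.
Step 1: Write the formula
$$P(X = 3) \approx \frac{\text{frequency of } 3}{\text{total frequency}}$$
Step 2: Substitute the values
$$P(X = 3) \approx \frac{11}{60}$$
Step 3: Calculate the result
$$P(X = 3) \approx 0.183$$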
Therefore, the estimated probability of rolling a 3 is approximately 0.183 or 18.3%.
Checking our answer: We can verify the total by adding all frequencies: $12 + 9 + 11 + 10 + 8 + 10 = 60$ ✓
Comparison with theory: The theoretical probability of rolling any particular number on a fair die is $\frac{1}{6} \approx 0.167$. Our estimate of $0.183$ is close to this, which suggests the die may be approximately fair, though we'd need more rolls to be confident.
Key idea: Relative frequency, calculated as $\frac{\text{frequency of } x}{\text{total frequency}}$, estimates $P(X = x)$ for a random variable $X$.
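The die calculation can be reproduced with a short Python sketch. The frequencies are the ones recorded in the worked example. The variable names are illustrative, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction

# Frequencies recorded in the worked example (face -> number of times rolled).
counts = {1: 12, 2: 9, 3: 11, 4: 10, 5: 8, 6: 10}

total = sum(counts.values())            # total number of rolls
estimate = Fraction(counts[3], total)   # relative frequency of rolling a 3
theoretical = Fraction(1, 6)            # fair-die probability of any one face

print(f"total rolls     = {total}")
print(f"P(X = 3) approx = {estimate} = {float(estimate):.3f}")
print(f"fair-die value  = {theoretical} = {float(theoretical):.3f}")
```

The estimate is 11/60, about 0.183, compared with 1/6, about 0.167, for a fair die.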
Applying relative frequency in experiments
Relative frequencies become particularly useful when we're working with experiments where the theoretical probabilities are unknown. In these situations, we must rely on experimental data to estimate probabilities.
An important principle to remember is that larger sample sizes yield more accurate estimates. When we increase the number of trials in an experiment, the relative frequency tends to get closer to the true probability. This is why statisticians prefer to work with larger datasets when making predictions.
Histograms are often used to visualise relative frequency distributions, making it easier to see patterns and compare frequencies across different outcomes. This visual representation helps you quickly identify which outcomes occur most frequently.
Worked example: quality control predictions
A store checks a sample of batteries and finds that 8% of them are faulty.
Part a: Estimate the probability that a battery is faulty.
Solution:
We use the relative frequency formula to estimate the probability.
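Step 1: Write the formula
$$P(\text{faulty}) \approx \frac{\text{number of faulty batteries}}{\text{number of batteries checked}}$$
Step 2: Substitute the values
Step 3: Calculate
$$P(\text{faulty}) \approx 0.08$$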
Therefore, P(battery is faulty) ≈ 0.08 or 8%.
Part b: Predict the number of faulty batteries in 1000 tests.
Solution:
To make a prediction for a larger sample, we multiply the estimated probability by the new sample size.
Step 1: Apply the probability
$$\text{expected number faulty} \approx 0.08 \times 1000$$
Step 2: Calculate the result
$$0.08 \times 1000 = 80$$
We would expect approximately 80 batteries to be faulty out of 1000 tested.
Why this works: We're assuming that the rate of faulty batteries (8%) observed in our sample will continue in the larger population. This is a reasonable assumption when the sample is representative of the population.
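The prediction step is a single multiplication, sketched below in Python. The function name `predict_count` is hypothetical. The new sample size of 1000 is an assumption, chosen because it is the size consistent with the 8% estimate and the expected count of about 80 faulty batteries.

```python
def predict_count(estimated_probability, population_size):
    """Expected number of occurrences = estimated probability x population size."""
    return estimated_probability * population_size


p_faulty = 0.08          # estimate from Part a of the worked example
new_sample_size = 1000   # assumption: the size consistent with 80 expected faults (80 / 0.08)

expected_faulty = predict_count(p_faulty, new_sample_size)
print(f"Expected faulty batteries: about {round(expected_faulty)}")
```

The result is an expected value, not a guarantee. An actual batch of that size could contain somewhat more or fewer faulty batteries.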
Key idea: Relative frequencies from experiments estimate probabilities when theoretical values are unknown. These estimates improve with larger samples and can be visualised using histograms.
Exam tips
Examination Success Strategies:
- Always show your working when calculating relative frequencies, even for simple calculations
- Remember to compare experimental probability with theoretical probability when possible to check if results are reasonable
- For prediction questions, clearly identify the estimated probability first, then multiply by the new population size
- Be careful with rounding - keep extra decimal places in intermediate steps and round only your final answer
- State your answers in context when making predictions (e.g., "approximately 80 batteries" rather than just "80")
Remember!
Key Points to Remember:
- Relative frequency is calculated as the frequency of an outcome divided by the total frequency, giving a proportion between $0$ and $1$
- Relative frequency provides an estimate for probability in experiments: $P(X = x) \approx \frac{\text{frequency of } x}{\text{total frequency}}$
- Larger sample sizes produce more reliable estimates - the more data you collect, the closer your relative frequency tends to be to the true probability
- Relative frequencies can be used to make predictions about larger populations by multiplying the estimated probability by the population size
- Always check if your estimated probabilities are reasonable by comparing them to theoretical values when available