Hypothesis Testing and Contingency Tables (AQA A-Level Further Maths): Revision Notes
Confidence Intervals
Introduction
When we need to find the mean or variance of an entire population, measuring every individual is usually impractical or impossible. Instead, we take a sample from the population and calculate the sample mean and sample variance. These sample statistics help us estimate the corresponding population parameters.
A confidence interval provides a range of values within which we expect the true population mean to lie, with a specified level of confidence.
Confidence intervals are widely used in scientific research, quality control, and policy-making where understanding the accuracy of estimates is crucial for decision-making.
Sample statistics
Sample mean
For a set of data values $x_1, x_2, \ldots, x_n$, the sample mean is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The sample mean is calculated by adding all data values and dividing by the number of observations.
Sample variance
The sample variance is:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Notice the denominator is n-1 rather than n. This correction makes the sample variance an unbiased estimator of the population variance. The "minus one" is called Bessel's correction and compensates for the fact that we're using the sample mean instead of the true population mean in the calculation.
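As a quick check of the two definitions, here is a minimal Python sketch. The data values are hypothetical and chosen only so the arithmetic is easy to follow; the names `mean_manual` and `var_manual` are illustrative.

```python
import statistics

# Hypothetical data values, chosen only for illustration.
data = [4.0, 7.0, 5.0, 9.0, 5.0]
n = len(data)

# Sample mean: sum of the values divided by the number of observations.
mean_manual = sum(data) / n

# Sample variance with the n - 1 denominator (Bessel's correction).
var_manual = sum((x - mean_manual) ** 2 for x in data) / (n - 1)

# The standard library offers both denominators.
var_lib = statistics.variance(data)         # divides by n - 1
var_divide_by_n = statistics.pvariance(data)  # divides by n, for comparison

print(mean_manual, var_manual, var_lib, var_divide_by_n)
```

Dividing by n gives a smaller value than dividing by n - 1, which is the underestimate that the correction is designed to remove.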
Unbiased estimators
An unbiased estimator is a statistic whose expected value equals the population parameter it estimates. Both the sample mean and sample variance are unbiased estimators of the population mean and population variance respectively.
However, being unbiased does not guarantee the estimate will be close to the true parameter value, especially for small samples. The estimate improves as sample size increases.
Distribution of the sample mean
For large samples drawn from a population with mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X}$ approximately follows a normal distribution:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
This means:
- The expected value of the sample mean equals the population mean
- The variance of the sample mean is $\frac{\sigma^2}{n}$ (smaller than the population variance)
- As sample size increases, the sample mean becomes more concentrated around $\mu$
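The behaviour of the sample mean can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be uniform on [0, 1] (mean 1/2, variance 1/12), and the sample size, number of repeats, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

n = 50          # size of each sample
repeats = 2000  # number of samples drawn

# Assumed population: uniform on [0, 1], with mean 1/2 and variance 1/12.
mu = 0.5
sigma_sq = 1 / 12

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.mean([random.random() for _ in range(n)])
    for _ in range(repeats)
]

mean_of_means = statistics.mean(sample_means)
var_of_means = statistics.variance(sample_means)

print("mean of sample means:", mean_of_means, "population mean:", mu)
print("variance of sample means:", var_of_means, "sigma^2 / n:", sigma_sq / n)
```

The two printed pairs should be close to each other, even though the population itself is not normal.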
Rule of Thumb for Sample Size
A general rule of thumb is that a sample size of at least 30 is considered 'large enough' to obtain a reasonable estimate of the population variance, allowing us to use $s$ in place of $\sigma$. This is based on the Central Limit Theorem, though the theorem itself is not required for this course.
The accuracy of the sample mean as an estimate of the population mean depends on the population variance, which is often unknown.
What is a confidence interval?
A $p\%$-confidence interval is a range of values generated from sample data. Before the sample is taken, we expect the population mean to fall within this interval with probability $p\%$.
Key point: A $p\%$-confidence interval means that if we repeatedly took samples and calculated confidence intervals, we would expect approximately $p\%$ of those intervals to contain the true population mean.
Critical Interpretation Note
Once a confidence interval has been generated from a specific sample, we cannot say it has a $p\%$ probability of containing $\mu$. The probability statement only applies before sampling.
After calculating an interval from your data, the population mean either is or isn't in that specific interval – there's no probability involved anymore.
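The repeated-sampling interpretation can be sketched in Python. The population parameters, sample size, number of trials, and seed below are all hypothetical choices made for the illustration; each interval is built as sample mean ± z × (sample standard deviation / √n) with a 95% level.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is repeatable

mu, sigma = 10.0, 2.0  # hypothetical population mean and standard deviation
n = 40                 # size of each sample
trials = 2000          # number of intervals generated
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # Each individual interval either contains mu or it does not.
    if x_bar - z * se <= mu <= x_bar + z * se:
        hits += 1

coverage = hits / trials
print("proportion of intervals containing mu:", coverage)
```

The proportion should come out at roughly 95%, which is a statement about the procedure across many samples rather than about any single interval.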
Calculating confidence intervals when variance is estimated
When the population variance is unknown and must be estimated from the sample, the $p\%$-confidence interval for the population mean is:
$$\bar{x} \pm z \frac{s}{\sqrt{n}}$$
where:
- $\bar{x}$ is the sample mean
- $s$ is the sample standard deviation (square root of sample variance)
- $n$ is the sample size
- $z$ is the critical value from the standard normal distribution
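The calculation from raw data can be sketched as follows. The function name `confidence_interval` and the data set are hypothetical; the data are 36 made-up values so that the sample is large enough for the rule of thumb above.

```python
import math
import statistics

def confidence_interval(data, z):
    """Return (lower, upper) for x_bar ± z * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical data: the values 10, 11, ..., 15, each appearing six times.
data = [10 + (i % 6) for i in range(36)]

lower, upper = confidence_interval(data, z=1.96)  # 95% interval
print(lower, upper)
```

For this data set the sample mean is 12.5 and the sample variance is 3, so the interval is 12.5 ± 1.96 × √3 / 6.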
Standard error
The standard error is the standard deviation of the sample mean:
$$\text{SE} = \frac{s}{\sqrt{n}}$$
The standard error measures how much the sample mean varies from sample to sample. A smaller standard error indicates more precise estimates.
Remember the mnemonic: SEM = Standard Error of Mean = $\frac{s}{\sqrt{n}}$
The standard error decreases as sample size increases, because we're dividing by $\sqrt{n}$. This is why larger samples give more reliable estimates.
Finding the z-value
The critical value $z$ depends on the confidence level $p\%$. For common confidence levels:
| Confidence level ($p\%$) | $z$-value |
|---|---|
| 90% | 1.645 |
| 95% | 1.96 |
| 98% | 2.326 |
| 99% | 2.576 |
These values can be calculated using the inverse normal function:
$$z = \Phi^{-1}\left(\frac{1 + p/100}{2}\right)$$
where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. Most calculators have this function built in.
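The table values can be reproduced with Python's standard library. This sketch assumes the usual two-sided convention, in which the central p% of the standard normal distribution lies between -z and z; the helper name `z_value` is hypothetical.

```python
from statistics import NormalDist

def z_value(confidence_percent):
    """Critical value z for a two-sided interval at the given percentage level."""
    # The central region leaves (100 - p)/2 percent in each tail,
    # so z cuts off a cumulative probability of (1 + p/100) / 2.
    cumulative = (1 + confidence_percent / 100) / 2
    return NormalDist().inv_cdf(cumulative)

for level in (90, 95, 98, 99):
    print(level, round(z_value(level), 3))
```

For example, a 95% level leaves 2.5% in each tail, so the inverse normal function is evaluated at 0.975.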
Interpretation
The confidence interval tells us that we can be $p\%$ confident the true population mean lies between the lower and upper bounds. The interval width increases as:
- The confidence level increases (we need more certainty)
- The sample variance increases (more variable data)
- The sample size decreases (less information)
CLIP Mnemonic
Confidence Level Increases, Precision decreases
Higher confidence = wider interval = less precise estimate. This represents the fundamental trade-off in statistical estimation.
Calculating confidence intervals when population variance is known
When the population variance $\sigma^2$ is known (rare in practice), we can use it directly without needing to estimate it from the sample. The $p\%$-confidence interval becomes:
$$\bar{x} \pm z \frac{\sigma}{\sqrt{n}}$$
This formula uses the known population standard deviation $\sigma$ instead of the sample standard deviation $s$.
Worked examples
Worked Example 1: Distribution of Sample Mean
Question: A population follows a normal distribution with mean 6.2 and variance 6.2. A sample of size 50 is taken. Give a model of the distribution of the sample mean.
Solution:
The sample mean is distributed as:
$$\bar{X} \sim N\left(6.2, \frac{6.2}{50}\right) = N(6.2, 0.124)$$
The mean of the distribution equals the population mean (6.2), and the variance is the population variance divided by the sample size.
Worked Example 2: Confidence Interval with Estimated Variance
Question: A sample of size 36 is taken from a population. The sample standard deviation is 20.4 and the sample mean is 13.6. A 95% confidence interval for the population mean is to be constructed.
a) Before the sample is taken, what is the probability that the confidence interval will contain the population mean?
b) Find the 95% confidence interval.
Solution:
a) 95%
Before the sample is taken, there is a 95% probability that a 95%-confidence interval will contain the true population mean.
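b) First, find the number of standard errors:
For a 95% confidence interval: $z = 1.96$
The confidence interval is:
$$\bar{x} \pm z \frac{s}{\sqrt{n}}$$
Calculate the standard error:
$$\frac{s}{\sqrt{n}} = \frac{20.4}{\sqrt{36}} = 3.4$$
Therefore:
$$13.6 \pm 1.96 \times 3.4 = 13.6 \pm 6.664$$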
The 95% confidence interval for the population mean is (6.94, 20.3).
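The arithmetic in this example can be checked with a short Python sketch. The variable names are illustrative, and the unrounded z-value from the inverse normal function is used in place of the table value 1.96.

```python
import math
from statistics import NormalDist

n = 36        # sample size
sd = 20.4     # standard deviation given in the question
x_bar = 13.6  # sample mean

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
se = sd / math.sqrt(n)           # standard error

lower = x_bar - z * se
upper = x_bar + z * se
print(se, lower, upper)
```

The bounds come out at about 6.936 and 20.264, which give the stated interval when rounded to 3 significant figures.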
Worked Example 3: Confidence Interval with Known Variance
Question: A 90% confidence interval is to be created for a normal distribution whose variance is known to be 19.5. A sample of size 72 is taken and the sample mean is 19.3.
a) Calculate the standard error.
b) Determine the value of $z$.
c) Find the confidence interval.
Solution:
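a) Standard error = $\frac{\sigma}{\sqrt{n}} = \sqrt{\frac{19.5}{72}} \approx 0.5204$
Note: We use $\sigma$ (population standard deviation) because the variance is known.
b) For a 90% confidence interval, $z = 1.645$ (from tables or calculator)
c) The confidence interval is:
$$19.3 \pm 1.645 \times 0.5204 \approx 19.3 \pm 0.856$$
This gives bounds of 18.444 and 20.156.
The 90% confidence interval for the population mean is (18.4, 20.2).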
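A short Python sketch of the same calculation, keeping full precision until the end. The variable names are illustrative; rounding to 3 significant figures is left as the final step.

```python
import math
from statistics import NormalDist

variance = 19.5  # known population variance
n = 72           # sample size
x_bar = 19.3     # sample mean

se = math.sqrt(variance / n)    # sigma / sqrt(n)
z = NormalDist().inv_cdf(0.95)  # about 1.645 for a 90% interval

lower = x_bar - z * se
upper = x_bar + z * se
print(se, z, lower, upper)
```

Storing `se` and `z` unrounded mirrors the calculator tip of keeping intermediate values to avoid rounding errors.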
Exam tips and common traps
Sample size and interval width
Increasing the sample size decreases the width of the confidence interval because the standard error contains $\sqrt{n}$ in the denominator. A larger sample provides a more accurate estimate of the population mean.
Since the standard error is proportional to $\frac{1}{\sqrt{n}}$, a sample four times as large will halve the width of the interval. This square root relationship is important for planning studies – if you want to double your precision, you need to quadruple your sample size!
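The square root relationship can be seen numerically. In this sketch the standard deviation and sample sizes are hypothetical, and `interval_width` is an illustrative helper that returns the full width 2 × z × sd / √n.

```python
import math

def interval_width(z, sd, n):
    """Full width of the interval: twice the half-width z * sd / sqrt(n)."""
    return 2 * z * sd / math.sqrt(n)

z, sd = 1.96, 5.0  # hypothetical 95% level and standard deviation

width_n25 = interval_width(z, sd, 25)
width_n100 = interval_width(z, sd, 100)  # four times the sample size

print(width_n25, width_n100, width_n25 / width_n100)
```

Going from 25 to 100 observations takes the width from 3.92 to 1.96, a factor of 2.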
When to use each formula
- Use sample variance ($s^2$): When the population variance is unknown (most common situation)
- Use population variance ($\sigma^2$): When explicitly stated that the population variance is known
Common Formula Selection Mistake
Always read the question carefully to determine whether you're given:
- The population standard deviation or variance (rare, but explicitly stated)
- The sample standard deviation or variance (most common)
If the question doesn't specify, assume you need to use the sample variance.
Confidence level interpretation
Common Interpretation Mistake
Incorrect: "There is a 95% probability that $\mu$ lies in this interval" (after calculating the interval from a sample)
Correct: "We are 95% confident that $\mu$ lies in this interval" or "If we repeated this process many times, 95% of intervals would contain $\mu$."
The difference is subtle but critical for exams!
Representative samples
Always ensure your sample is representative of the population to avoid bias. A biased sample (such as measuring only basketball players when estimating average human height) will produce misleading confidence intervals, even if calculated correctly.
Sources of Bias
Common sources of sample bias include:
- Selection bias (non-random sampling)
- Response bias (people who respond differ from those who don't)
- Measurement bias (systematic errors in data collection)
Even perfect mathematics cannot overcome poor sampling!
Trade-off between confidence and precision
You can never be certain that your interval contains the population mean. The more confident you want to be, the larger the interval becomes ($z$ increases as $p$ increases). This represents a fundamental trade-off between confidence and precision.
Checking your work
When calculating confidence intervals:
- Identify whether $s$ or $\sigma$ should be used
- Calculate the standard error correctly
- Find the appropriate $z$-value for the confidence level
- Apply the formula systematically
- Give your final answer to an appropriate degree of accuracy (usually 3 significant figures)
Calculator Tips
Most scientific calculators can:
- Calculate $\Phi^{-1}$ (inverse normal) directly
- Compute standard deviations from raw data
- Store intermediate values to avoid rounding errors
Learn your calculator's functions before the exam!
Remember!
Key Points to Remember:
- The sample mean and sample variance are unbiased estimators of the population parameters $\mu$ and $\sigma^2$.
- A $p\%$-confidence interval means that before sampling, there is a $p\%$ probability the interval will contain the true population mean. This interpretation only applies before the interval is generated.
- The standard error measures the variability of the sample mean. Larger samples produce smaller standard errors and narrower confidence intervals.
- Use $s$ (sample standard deviation) when the population variance is unknown, and $\sigma$ (population standard deviation) when the population variance is known.
- Increasing the confidence level produces wider intervals. There is always a trade-off between confidence and precision.
- Remember the $n-1$ denominator in sample variance – this is the "minus one for bias correction" that makes it an unbiased estimator.