Hypothesis Testing and Contingency Tables Revision Notes for AQA A-Level Further Maths

Confidence Intervals

Introduction

When we need to find the mean or variance of an entire population, measuring every individual is usually impractical or impossible. Instead, we take a sample from the population and calculate the sample mean and sample variance. These sample statistics help us estimate the corresponding population parameters.

A confidence interval provides a range of values within which we expect the true population mean to lie, with a specified level of confidence.

Note

Confidence intervals are widely used in scientific research, quality control, and policy-making where understanding the accuracy of estimates is crucial for decision-making.

Sample statistics

Sample mean

For a set of $n$ data values $\{x_i\}$ , the sample mean is:

$\bar{x} = \frac{\sum x_i}{n}$

The sample mean is calculated by adding all data values and dividing by the number of observations.

Sample variance

The sample variance is:

$s^2 = \frac{1}{n-1}\sum(x_i - \bar{x})^2$

Important

Notice the denominator is $n-1$ rather than $n$. This correction makes the sample variance an unbiased estimator of the population variance. The "minus one" is called Bessel's correction and compensates for the fact that we use the sample mean instead of the true population mean in the calculation.
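As a minimal sketch of the two formulas above, the following Python computes the sample mean and the sample variance by hand and compares them with the standard library. The data values are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical data values, chosen only for illustration.
data = [4.0, 7.0, 5.0, 9.0, 10.0]
n = len(data)

# Sample mean: add all values and divide by the number of observations.
x_bar = sum(data) / n

# Sample variance: sum of squared deviations divided by n - 1, not n.
s_squared = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# statistics.variance uses the same n - 1 divisor;
# statistics.pvariance divides by n instead.
assert abs(x_bar - statistics.mean(data)) < 1e-12
assert abs(s_squared - statistics.variance(data)) < 1e-12

print(x_bar, s_squared, statistics.pvariance(data))
```

Dividing by $n$ instead (as `pvariance` does) gives a smaller value, which is why the $n-1$ divisor matters most for small samples.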

Unbiased estimators

An unbiased estimator is a statistic whose expected value equals the population parameter it estimates. Both the sample mean $\bar{x}$ and sample variance $s^2$ are unbiased estimators of the population mean $\mu$ and population variance $\sigma^2$ respectively.

However, being unbiased does not guarantee the estimate will be close to the true parameter value, especially for small samples. The estimate improves as sample size increases.

Distribution of the sample mean

For large samples drawn from a population with mean $\mu$ and variance $\sigma^2$, the sample mean is approximately normally distributed (exactly so if the population itself is normal):

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

This means:

The expected value of the sample mean equals the population mean $\mu$
The variance of the sample mean is $\frac{\sigma^2}{n}$ (smaller than the population variance)
As sample size $n$ increases, the sample mean becomes more concentrated around $\mu$
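
The variance $\frac{\sigma^2}{n}$ can be justified with a short derivation. This is a sketch that assumes the $n$ observations $X_1, \dots, X_n$ are independent, each with variance $\sigma^2$:

$\text{Var}(\bar{X}) = \text{Var}\left(\frac{1}{n}\sum X_i\right) = \frac{1}{n^2}\sum \text{Var}(X_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$

The factor $\frac{1}{n^2}$ appears because scaling a random variable by a constant scales its variance by the square of that constant.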

Note

Rule of Thumb for Sample Size

A general rule of thumb is that a sample size of at least 30 is considered 'large enough'. The sample mean is then approximately normally distributed whatever the shape of the population (a consequence of the Central Limit Theorem, though the theorem itself is not required for this course), and $s^2$ gives a reasonable estimate of the population variance, allowing us to use $s^2$ in place of $\sigma^2$.

The accuracy of the sample mean as an estimate of the population mean depends on the population variance, which is often unknown.

What is a confidence interval?

A $p\%$-confidence interval is a range of values generated from sample data. Before the sample is taken, the probability that the interval will contain the population mean $\mu$ is $p\%$.

Key point: A $p\%$-confidence interval means that if we repeatedly took samples and calculated a confidence interval from each, we would expect approximately $p\%$ of those intervals to contain the true population mean.

Important

Critical Interpretation Note

Once a confidence interval has been generated from a specific sample, we cannot say it has a p% probability of containing $\mu$ . The probability statement only applies before sampling.

After calculating an interval from your data, the population mean either is or isn't in that specific interval – there's no probability involved anymore.
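
The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is taken to be normal with hypothetical values $\mu = 50$ and $\sigma = 10$, and the sample size and number of trials are arbitrary choices. Each trial builds a 95% interval using $z$ and the sample standard deviation, then checks whether it contains $\mu$.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 50.0, 10.0   # hypothetical "true" population values
N, TRIALS = 40, 2000     # sample size and number of repeated samples
z = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar, s = mean(sample), stdev(sample)  # stdev uses the n - 1 divisor
    half_width = z * s / N ** 0.5
    # Each individual interval either contains MU or it does not.
    if x_bar - half_width < MU < x_bar + half_width:
        hits += 1

coverage = hits / TRIALS
print(f"Proportion of intervals containing mu: {coverage:.3f}")
```

The printed proportion should be close to 0.95, although no single interval carries a probability once it has been calculated.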

Calculating confidence intervals when variance is estimated

When the population variance is unknown and must be estimated from the sample, the $p\%$ -confidence interval for the population mean $\mu$ is:

$\bar{x} - z \times \frac{s}{\sqrt{n}} < \mu < \bar{x} + z \times \frac{s}{\sqrt{n}}$

where:

$\bar{x}$ is the sample mean
$s$ is the sample standard deviation (square root of sample variance)
$n$ is the sample size
$z$ is the critical value from the standard normal distribution
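
The formula above can be sketched as a small Python function. The function name `confidence_interval` and the summary statistics passed to it are hypothetical, and the confidence level is assumed to be given as a decimal such as 0.95.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(x_bar, s, n, level):
    """Return (lower, upper) for the mean; `level` is a decimal such as 0.95."""
    z = NormalDist().inv_cdf((1 + level) / 2)  # two-tailed critical value
    half_width = z * s / sqrt(n)               # z multiplied by s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical summary statistics, for illustration only.
lower, upper = confidence_interval(x_bar=50.0, s=8.0, n=64, level=0.95)
print(round(lower, 2), round(upper, 2))
```

Passing the known $\sigma$ in place of `s` gives the interval for the case where the population variance is known, since the two formulas differ only in that symbol.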

Standard error

The standard error is the estimated standard deviation of the sample mean:

$\text{Standard error} = \frac{s}{\sqrt{n}}$

The standard error measures how much the sample mean varies from sample to sample. A smaller standard error indicates more precise estimates.

Note

Remember the mnemonic: SEM = Standard Error of Mean = $s/\sqrt{n}$

The standard error decreases as sample size increases, because we're dividing by $\sqrt{n}$ . This is why larger samples give more reliable estimates.

Finding the z-value

The critical value $z$ depends on the confidence level $p$ . For common confidence levels:

Confidence level ( $p$ )	z-value
90%	1.645
95%	1.96
98%	2.326
99%	2.576

These values can be calculated using the inverse normal function:

$z = \Phi^{-1}\left(\frac{1+p}{2}\right)$

where $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function and $p$ is written as a decimal (for example, $p = 0.95$ for a 95% confidence level). Most calculators have this function built in.
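
As a check on the table, the inverse normal function is available in Python's standard library as `NormalDist().inv_cdf`. The helper name `z_value` is hypothetical, and $p$ is assumed to be supplied as a decimal.

```python
from statistics import NormalDist


def z_value(p):
    """Critical value for confidence level p, with p given as a decimal."""
    return NormalDist().inv_cdf((1 + p) / 2)


for p in (0.90, 0.95, 0.98, 0.99):
    print(f"{p:.0%}: z = {z_value(p):.3f}")
```

The argument $\frac{1+p}{2}$ is used because the remaining probability $1-p$ is split equally between the two tails.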

Interpretation

The confidence interval tells us that we can be $p\%$ confident the true population mean lies between the lower and upper bounds. The interval width increases as:

The confidence level $p$ increases (we need more certainty)
The sample variance $s^2$ increases (more variable data)
The sample size $n$ decreases (less information)
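
These three effects can be read off the width of the interval, obtained by subtracting the lower bound from the upper bound:

$\text{width} = \left(\bar{x} + z \times \frac{s}{\sqrt{n}}\right) - \left(\bar{x} - z \times \frac{s}{\sqrt{n}}\right) = \frac{2zs}{\sqrt{n}}$

The width grows with $z$ (and hence with $p$) and with $s$, and shrinks as $n$ grows. It does not depend on $\bar{x}$, which only fixes the centre of the interval.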

Note

CLIP Mnemonic

Confidence Level Increases, Precision decreases

Higher confidence = wider interval = less precise estimate. This represents the fundamental trade-off in statistical estimation.

Calculating confidence intervals when population variance is known

When the population variance $\sigma^2$ is known (rare in practice), we can use it directly without needing to estimate it from the sample. The $p\%$ -confidence interval becomes:

$\bar{x} - z \times \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z \times \frac{\sigma}{\sqrt{n}}$

This formula uses the known population standard deviation $\sigma$ instead of the sample standard deviation $s$ .

Worked examples

Example

Worked Example 1: Distribution of Sample Mean

Question: A population follows a normal distribution with mean 6.2 and variance 6.2. A sample of size 50 is taken. Give a model of the distribution of the sample mean.

Solution:

The sample mean is distributed as:

$\bar{X} \sim N\left(6.2, \frac{6.2}{50}\right) = N(6.2, 0.124)$

The mean of the distribution equals the population mean (6.2), and the variance is the population variance divided by the sample size.

Example

Worked Example 2: Confidence Interval with Estimated Variance

Question: A sample of size 36 is taken from a population. The sample mean is 13.6 and the sample standard deviation is 20.4. A 95% confidence interval for the population mean is to be constructed.

a) Before the sample is taken, what is the probability that the confidence interval will contain the population mean?

b) Find the 95% confidence interval.

Solution:

a) 95%

Before the sample is taken, there is a 95% probability that a 95%-confidence interval will contain the true population mean.

b) First, find the number of standard errors:

For a 95% confidence interval: $z = 1.96$

The confidence interval is:

$13.6 - 1.96 \times \frac{20.4}{\sqrt{36}} < \mu < 13.6 + 1.96 \times \frac{20.4}{\sqrt{36}}$

Calculate the standard error:

$\frac{20.4}{\sqrt{36}} = \frac{20.4}{6} = 3.4$

Therefore:

$13.6 - 1.96 \times 3.4 < \mu < 13.6 + 1.96 \times 3.4$

$13.6 - 6.664 < \mu < 13.6 + 6.664$

$6.94 < \mu < 20.3 \text{ (3 sf)}$

The 95% confidence interval for the population mean is (6.94, 20.3).
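
The arithmetic in part b) can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and $z = 1.96$ is taken from the table rather than computed.

```python
from math import sqrt

# Values from the question; z = 1.96 is the tabulated 95% critical value.
x_bar, sd, n, z = 13.6, 20.4, 36, 1.96

standard_error = sd / sqrt(n)
half_width = z * standard_error
lower, upper = x_bar - half_width, x_bar + half_width

# The "3g" format rounds to 3 significant figures.
print(f"SE = {standard_error:.1f}, interval = ({lower:.3g}, {upper:.3g})")
```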

Example

Worked Example 3: Confidence Interval with Known Variance

Question: A 90% confidence interval is to be created for a normal distribution whose variance is known to be 19.5. A sample of size 72 is taken and the sample mean is 19.3.

a) Calculate the standard error.

b) Determine the value of $z$ .

c) Find the confidence interval.

Solution:

a) Standard error = $\frac{\sigma}{\sqrt{n}} = \frac{\sqrt{19.5}}{\sqrt{72}} = 0.520$

Note: We use $\sigma$ (population standard deviation) because the variance is known.

b) For a 90% confidence interval, $z = 1.645$ (from tables or calculator)

c) The confidence interval is:

$19.3 - 1.645 \times 0.52 < \mu < 19.3 + 1.645 \times 0.52$

$19.3 - 0.8554 < \mu < 19.3 + 0.8554$

$18.4 < \mu < 20.2 \text{ (3 sf)}$

The 90% confidence interval for the population mean is (18.4, 20.2).
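
The same calculation can be carried out without rounding intermediate values. The sketch below takes its inputs from the question, keeps the standard error at full precision and computes $z$ from the inverse normal function rather than the table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the question; the level is written as a decimal.
variance, n, x_bar, level = 19.5, 72, 19.3, 0.90

standard_error = sqrt(variance / n)        # sigma / sqrt(n), kept unrounded
z = NormalDist().inv_cdf((1 + level) / 2)  # two-tailed critical value
lower = x_bar - z * standard_error
upper = x_bar + z * standard_error

print(f"SE = {standard_error:.3f}, z = {z:.3f}")
print(f"{lower:.3f} < mu < {upper:.3f}")
```

Rounding only at the final step avoids small errors building up in the interval bounds.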

Exam tips and common traps

Sample size and interval width

Increasing the sample size decreases the width of the confidence interval because the standard error contains $\sqrt{n}$ in the denominator. A larger sample provides a more accurate estimate of the population mean.

Note

Since the standard error is based on $\sqrt{n}$ , a sample four times as large will halve the width of the interval. This square root relationship is important for planning studies – if you want to double your precision, you need to quadruple your sample size!
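
The square root relationship can be seen numerically. In this sketch the helper `interval_width` and the sample standard deviation of 12 are hypothetical, and $z = 1.96$ is assumed for a 95% interval.

```python
from math import sqrt


def interval_width(s, n, z=1.96):
    """Full width of the interval: 2 * z * s / sqrt(n)."""
    return 2 * z * s / sqrt(n)


s = 12.0  # hypothetical sample standard deviation
for n in (25, 100, 400):
    # Each step multiplies n by 4, so each width is half the previous one.
    print(n, round(interval_width(s, n), 3))
```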

When to use each formula

Use sample variance ( $s^2$ ): When the population variance is unknown (most common situation)
Use population variance ( $\sigma^2$ ): When explicitly stated that the population variance is known

Important

Common Formula Selection Mistake

Always read the question carefully to determine whether you're given:

The population standard deviation $\sigma$ or variance $\sigma^2$ (rare, but explicitly stated)
The sample standard deviation $s$ or variance $s^2$ (most common)

If the question doesn't specify, assume you need to use the sample variance.

Confidence level interpretation

Important

Common Interpretation Mistake

Incorrect: "There is a 95% probability that $\mu$ lies in this interval" (after calculating the interval from a sample)

Correct: "We are 95% confident that $\mu$ lies in this interval" or "If we repeated this process many times, 95% of intervals would contain $\mu$ ."

The difference is subtle but critical for exams!

Representative samples

Always ensure your sample is representative of the population to avoid bias. A biased sample (such as measuring only basketball players when estimating average human height) will produce misleading confidence intervals, even if calculated correctly.

Note

Sources of Bias

Common sources of sample bias include:

Selection bias (non-random sampling)
Response bias (people who respond differ from those who don't)
Measurement bias (systematic errors in data collection)

Even perfect mathematics cannot overcome poor sampling!

Trade-off between confidence and precision

You can never be certain that your interval contains the population mean. The more confident you want to be, the larger the interval becomes ( $z$ increases as $p$ increases). This represents a fundamental trade-off between confidence and precision.

Checking your work

When calculating confidence intervals:

Identify whether $\sigma$ or $s$ should be used
Calculate the standard error correctly
Find the appropriate $z$ -value for the confidence level
Apply the formula systematically
Give your final answer to an appropriate degree of accuracy (usually 3 significant figures)

Note

Calculator Tips

Most scientific calculators can:

Calculate $\Phi^{-1}$ (inverse normal) directly
Compute standard deviations from raw data
Store intermediate values to avoid rounding errors

Learn your calculator's functions before the exam!

Remember!

Summary

Key Points to Remember:

The sample mean $\bar{x}$ and sample variance $s^2$ are unbiased estimators of the population parameters $\mu$ and $\sigma^2$ .
A $p\%$-confidence interval means that before sampling, there is a $p\%$ probability the interval will contain the true population mean. This interpretation only applies before the interval is generated.
The standard error $\frac{s}{\sqrt{n}}$ measures the variability of the sample mean. Larger samples produce smaller standard errors and narrower confidence intervals.
Use $s$ (sample standard deviation) when the population variance is unknown, and $\sigma$ (population standard deviation) when the population variance is known.
Increasing the confidence level produces wider intervals. There is always a trade-off between confidence and precision.
Remember the $n-1$ denominator in the sample variance: this is Bessel's correction, which makes it an unbiased estimator.
