Hypothesis Testing and Errors Revision Notes for AQA A-Level Further Maths

Hypothesis Testing and Errors

Introduction to hypothesis testing with Poisson distributions

Hypothesis testing is a statistical method that helps us determine whether a claim about a population parameter is reasonable based on sample data. When working with events that occur randomly over time or space, we often use the Poisson distribution to model the situation.

The Poisson distribution is characterised by a single parameter, λ (lambda), which represents the average number of events occurring in a fixed interval. For example, λ could represent the average number of goals scored per football match, the average number of defects per product, or the average number of customers arriving per hour.

Note

When we don't know the true value of λ, the best approach is to estimate it from a random sample. The best estimate of λ is the sample mean, calculated by dividing the total number of events observed by the sample size. This estimate forms the basis for hypothesis testing.
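As a minimal sketch of this estimate, the Python snippet below uses a made-up list of counts. The data and the variable names are illustrative assumptions, not values from an exam question.

```python
# Hypothetical data: number of events seen in each of five equal intervals.
counts = [2, 0, 3, 1, 4]

total_events = sum(counts)      # total number of events observed
sample_size = len(counts)       # number of observations (intervals)

# Estimate of lambda: total events divided by sample size (the sample mean).
lambda_estimate = total_events / sample_size
print(lambda_estimate)
```

Here 10 events over 5 intervals give an estimate of 2 events per interval.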

Setting up your hypotheses

Before conducting a hypothesis test, you must clearly state two competing hypotheses.

The null hypothesis ( $H_0$ )

The null hypothesis represents the claim or assumption we are testing. It always states that the parameter equals a specific value. We write this as:

$H_0: \lambda = \lambda_0$

where $\lambda_0$ is the assumed value of the parameter. The null hypothesis represents the "status quo" or the claim we want to investigate. We assume $H_0$ is true unless we find strong evidence against it.

The alternative hypothesis ( $H_1$ )

The alternative hypothesis expresses what we suspect might be true if the null hypothesis is wrong. There are three possible forms:

Two-tailed test: $H_1: \lambda \neq \lambda_0$ (the parameter differs from $\lambda_0$ in either direction)
One-tailed test (lower tail): $H_1: \lambda < \lambda_0$ (the parameter is less than $\lambda_0$ )
One-tailed test (upper tail): $H_1: \lambda > \lambda_0$ (the parameter is greater than $\lambda_0$ )

The choice between one-tailed and two-tailed tests depends on the research question. Use a one-tailed test when you're specifically interested in a change in one stated direction (only an increase, or only a decrease). Use a two-tailed test when you want to detect any difference from the assumed value.

Important

A single observation counts as a sample of size 1. This is important when calculating expected values. Don't forget to multiply by the sample size when determining your test distribution!

The test statistic

When conducting a hypothesis test for a Poisson distribution, the test statistic is straightforward to calculate.

The test statistic equals the total number of events that occur in your sample.

If you take a sample of size $n$ (meaning $n$ independent observations), simply count up all the events that occurred across all observations. This total becomes your test statistic.

Under the null hypothesis, we would expect approximately $n\lambda_0$ events to occur in a sample of size $n$ . The key question is: is your observed total sufficiently different from this expected value to reject $H_0$ ?

Note

Why compare to $n\lambda_0$ instead of just $\lambda_0$? The total number of events in $n$ independent observations, each distributed as $\text{Po}(\lambda_0)$, itself follows a Poisson distribution with mean $n\lambda_0$. The sample mean does not, because it need not be a whole number, so the test is based on the total. For example, if $\lambda = 8.6$ and you take 5 observations, the expected total is $5 \times 8.6 = 43$ events, so you use $Y \sim \text{Po}(43)$ for your calculations.
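The claim that the total of 5 independent $\text{Po}(8.6)$ counts behaves like $\text{Po}(43)$ can be checked numerically. The sketch below is only an illustration: the helper names `poisson_pmf_list` and `convolve` are invented for this example, and the probabilities are compared only for totals from 0 to 60.

```python
from math import exp

def poisson_pmf_list(mean, max_k):
    """Return [P(Y = 0), ..., P(Y = max_k)] for Y ~ Po(mean)."""
    probs = [exp(-mean)]
    for k in range(1, max_k + 1):
        probs.append(probs[-1] * mean / k)   # P(Y = k) from P(Y = k - 1)
    return probs

def convolve(a, b):
    """Probabilities for the sum of two independent counts, for totals 0..len(a)-1."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(len(a))]

lam, n, max_k = 8.6, 5, 60
single = poisson_pmf_list(lam, max_k)        # one observation, Po(8.6)

total = single
for _ in range(n - 1):
    total = convolve(total, single)          # add one more observation each time

direct = poisson_pmf_list(n * lam, max_k)    # Po(43) computed directly
largest_gap = max(abs(t - d) for t, d in zip(total, direct))
print(f"largest difference = {largest_gap:.2e}")
```

The two lists should agree apart from floating-point rounding, which is why the total can be tested against $\text{Po}(n\lambda_0)$ directly.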

Conducting the hypothesis test

There are two main approaches to conducting hypothesis tests: the critical values method and the p-value method. Both lead to the same conclusion, but they present the evidence differently.

Critical values method

The critical values method identifies the boundary values that separate the critical region (where we reject $H_0$ ) from the acceptance region.

Steps:

Determine the Poisson distribution for the total number of events: $Y \sim \text{Po}(n\lambda_0)$
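Use your calculator or tables to find the values whose tail probabilities are as close as possible to the required level without exceeding it (the full significance level for a one-tailed test, half of it in each tail for a two-tailed test)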
The critical values bound the acceptance region
Compare your test statistic to these critical values
Make your decision

Decision rule for two-tailed tests: If your test statistic is less than or equal to the lower critical value, or greater than or equal to the upper critical value, reject $H_0$ . Otherwise, accept $H_0$ .

Decision rule for one-tailed tests: For $H_1: \lambda > \lambda_0$ , reject $H_0$ if your test statistic is greater than or equal to the upper critical value. For $H_1: \lambda < \lambda_0$ , reject $H_0$ if your test statistic is less than or equal to the lower critical value.

Important

The critical region is bounded by the critical values. Remember that for discrete distributions like the Poisson, we can't always achieve exactly the stated significance level, so we choose values that keep the probability in the critical region as close to the significance level as possible without exceeding it.
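The search for critical values described above can be sketched in Python. This is one possible approach rather than a required method: the function names are invented for this example, the cumulative probability is built up term by term instead of being read from tables, and `tail_level` is assumed to lie strictly between 0 and 1.

```python
from math import exp

def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Po(mean); returns 0 when k is negative."""
    if k < 0:
        return 0.0
    term = exp(-mean)            # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i         # P(Y = i) obtained from P(Y = i - 1)
        total += term
    return total

def lower_critical_value(mean, tail_level):
    """Largest c with P(Y <= c) <= tail_level (None if even P(Y = 0) is too large)."""
    c = -1
    while poisson_cdf(c + 1, mean) <= tail_level:
        c += 1
    return c if c >= 0 else None

def upper_critical_value(mean, tail_level):
    """Smallest c with P(Y >= c) <= tail_level."""
    c = 0
    while 1 - poisson_cdf(c - 1, mean) > tail_level:
        c += 1
    return c

# Two-tailed test at the 10% level with Y ~ Po(43): 5% in each tail.
lower = lower_critical_value(43, 0.05)
upper = upper_critical_value(43, 0.05)
print(lower, upper)
```

For $Y \sim \text{Po}(43)$ with 5% in each tail, the search returns 32 and 55, so $H_0$ would be rejected for totals of 32 or fewer, or 55 or more.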

p-value method

The p-value represents the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true.

Steps:

Calculate your test statistic
Determine the appropriate Poisson distribution: $Y \sim \text{Po}(n\lambda_0)$
Calculate the probability of getting a value at least as extreme as your test statistic
Compare this p-value to the significance level
Make your decision

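Decision rule: If the p-value is less than the significance level, reject $H_0$ . If the p-value is greater than or equal to the significance level, accept $H_0$ . For a two-tailed test, the tail probability you calculated covers only one direction, so compare it with half the significance level (or double it before comparing).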

Note

A smaller p-value indicates stronger evidence against the null hypothesis. The p-value tells you how surprising your result would be if $H_0$ were true.
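A one-tailed p-value calculation might look like the sketch below. The function names and the numbers in the usage lines (5 observations, $\lambda_0 = 8.6$, an observed total of 55) are hypothetical and chosen only to illustrate the steps.

```python
from math import exp

def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Po(mean); returns 0 when k is negative."""
    if k < 0:
        return 0.0
    term = exp(-mean)            # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i         # P(Y = i) obtained from P(Y = i - 1)
        total += term
    return total

def one_tail_p_value(observed_total, mean, alternative):
    """Tail probability under Y ~ Po(mean).

    alternative = "greater" for H1: lambda > lambda0, "less" for H1: lambda < lambda0.
    """
    if alternative == "greater":
        return 1 - poisson_cdf(observed_total - 1, mean)   # P(Y >= observed)
    if alternative == "less":
        return poisson_cdf(observed_total, mean)           # P(Y <= observed)
    raise ValueError("alternative must be 'greater' or 'less'")

# Hypothetical upper-tail test at the 5% level.
n, lambda0, alpha = 5, 8.6, 0.05
observed_total = 55
p = one_tail_p_value(observed_total, n * lambda0, "greater")
decision = "reject H0" if p < alpha else "accept H0"
print(f"p-value = {p:.4f}, so {decision}")
```

With these made-up figures the p-value is a little under 0.05, so the sketch reports a rejection of $H_0$.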

Understanding errors in hypothesis testing

Even when we follow correct procedures, hypothesis tests can lead to wrong conclusions. There are two types of errors to consider, each with different implications.

Type I error (false positive)

A Type I error occurs when we reject a null hypothesis that is actually true. This is often called a false positive because we conclude there is an effect or difference when none actually exists.

Key characteristics:

We find "evidence" of something that isn't really there
The result attracts attention unnecessarily
Resources might be wasted investigating a non-existent problem

Note

Calculating the probability of a Type I error:

For continuous probability distributions, the probability of making a Type I error equals the significance level (α). For example, if we test at the 5% level and $H_0$ is true, there is exactly a 5% chance of making a Type I error.

For discrete distributions like the Poisson, the probability of a Type I error may be less than the significance level. This happens because we can't always find critical values that give exactly the desired probability.

The probability of a Type I error equals the probability of the critical region when the null hypothesis is true.

When testing with $Y \sim \text{Po}(10.4)$ at the 5% significance level, we find $P(Y \geq 16) = 0.06405 > 0.05$ , but $P(Y \geq 17) = 0.03681 < 0.05$ . Therefore, the critical value is 17, and the probability of a Type I error equals 0.03681, not exactly 0.05.
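The two tail probabilities quoted for $Y \sim \text{Po}(10.4)$ can be recomputed with a short script. This is a sketch using a hand-written cumulative probability helper (an assumption of this example) rather than calculator functions or tables.

```python
from math import exp

def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Po(mean); returns 0 when k is negative."""
    if k < 0:
        return 0.0
    term = exp(-mean)            # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i         # P(Y = i) obtained from P(Y = i - 1)
        total += term
    return total

alpha, mean = 0.05, 10.4
tail_16 = 1 - poisson_cdf(15, mean)   # P(Y >= 16)
tail_17 = 1 - poisson_cdf(16, mean)   # P(Y >= 17)

for value, tail in ((16, tail_16), (17, tail_17)):
    verdict = "exceeds" if tail > alpha else "does not exceed"
    print(f"P(Y >= {value}) = {tail:.5f}, which {verdict} {alpha}")

# The critical region is Y >= 17, so this is the Type I error probability.
type_one_probability = tail_17
```

Only the second tail probability stays within the 5% limit, which is why the critical value is 17 and the Type I error probability is below 5%.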

Type II error (false negative)

A Type II error occurs when we accept a null hypothesis that is actually false. This is often called a false negative because we fail to detect a real effect or difference.

Key characteristics:

We overlook something important that is actually happening
A genuine problem remains undetected
The consequences can be more serious than a Type I error in many contexts

Important

Comparing the two errors:

The severity of each type of error depends entirely on the context:

In medical testing, a Type II error (failing to detect a disease) could be fatal, while a Type I error (false alarm) merely leads to further testing
In quality control, a Type I error (rejecting good products) costs money, while a Type II error (accepting defective products) could harm customers
In scientific research, Type I errors can lead to false discoveries, while Type II errors mean we miss real effects

Exam guidance: When asked to explain errors in context, always describe what each error means specifically for that situation and discuss the consequences. Don't just give the general definition.

Worked examples

Example

Worked Example 1: Testing with critical values (two-tailed test)

A Poisson distribution is believed to have parameter $\lambda = 8.6$ . We test the hypotheses $H_0: \lambda = 8.6$ and $H_1: \lambda \neq 8.6$ at the 10% significance level using a sample of size 5.

Part a) Find the critical values

Step 1: Determine the distribution for the total number of events.

Sample size × parameter = $5 \times 8.6 = 43$

Therefore, $Y \sim \text{Po}(43)$

Step 2: For a two-tailed test at the 10% level, we need 5% in each tail.

Find values where the cumulative probability is close to 0.05 and 0.95:

Using tables or calculator:

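$P(Y \leq 32) = 0.0497$ (just below 5%)
$P(Y \leq 33) = 0.0693$
$P(Y \geq 54) = 1 - P(Y \leq 53) = 1 - 0.9414 = 0.0586$
$P(Y \geq 55) = 1 - P(Y \leq 54) = 1 - 0.9561 = 0.0439$ (just below 5%)

The critical values are 32 and 55, because these are the values closest to the centre of the distribution whose tail probabilities do not exceed 0.05.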

These bound the acceptance region. We reject $H_0$ if our test statistic is $\leq 32$ or $\geq 55$ .

Part b) A sample produces an average rate of 10.4. Should we accept or reject $H_0$ ?

Step 1: Calculate the test statistic.

Total events = sample size × average rate = $5 \times 10.4 = 52$

Step 2: Compare to critical values.

52 lies between 32 and 55 (inside the acceptance region)

Conclusion: Accept the null hypothesis. The sample provides insufficient evidence to suggest that $\lambda \neq 8.6$ at the 10% significance level.
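Part b) can be checked with a few lines of Python. The sketch below takes the critical values 32 and 55 from Part a) as given, and also evaluates the probability of the whole critical region under $H_0$, which is the actual Type I error probability for this test. The helper function and variable names are assumptions of this example.

```python
from math import exp

def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Po(mean); returns 0 when k is negative."""
    if k < 0:
        return 0.0
    term = exp(-mean)            # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i         # P(Y = i) obtained from P(Y = i - 1)
        total += term
    return total

n, lambda0 = 5, 8.6
mean_total = n * lambda0                 # 43 under H0
lower_cv, upper_cv = 32, 55              # critical values from Part a)

total = round(n * 10.4)                  # sample mean 10.4 over 5 observations
in_critical_region = total <= lower_cv or total >= upper_cv
decision = "reject H0" if in_critical_region else "accept H0"

# Probability of landing in either tail of the critical region when H0 is true.
critical_region_probability = (
    poisson_cdf(lower_cv, mean_total) + (1 - poisson_cdf(upper_cv - 1, mean_total))
)
print(total, decision, round(critical_region_probability, 4))
```

The total of 52 lies strictly between the critical values, so the sketch agrees with the conclusion above, and the probability of the critical region comes out somewhat below the nominal 10%.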

Example

Worked Example 2: Testing with p-values (one-tailed test)

A Poisson distribution is believed to have parameter $\lambda = 6.1$ . We test $H_0: \lambda = 6.1$ against $H_1: \lambda > 6.1$ at the 5% significance level. A sample yields the results: 3, 4, 4, 6, 7, 8, 9, 10, 13, 14.

Part a) Calculate the total number of occurrences

Simply add all the values:

Total $= 3 + 4 + 4 + 6 + 7 + 8 + 9 + 10 + 13 + 14 = 78$

Part b) Find the p-value

Step 1: Identify the distribution.

Sample size = 10 observations
Expected total = $10 \times 6.1 = 61$

Therefore, $Y \sim \text{Po}(61)$

Step 2: Calculate the p-value.

For $H_1: \lambda > 6.1$ , we want $P(Y \geq 78)$ when $Y \sim \text{Po}(61)$

Using a calculator: $P(Y \geq 78) = 0.02032$

Part c) State your conclusion

The p-value (0.02032) is less than the significance level (0.05).

Conclusion: Reject the null hypothesis. There is sufficient evidence at the 5% level to conclude that $\lambda > 6.1$ . The sample suggests the parameter is higher than 6.1.
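The whole of this example can be reproduced from the raw results. The following sketch assumes a hand-written cumulative probability helper in place of a calculator's Poisson function; the variable names are illustrative.

```python
from math import exp

def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Po(mean); returns 0 when k is negative."""
    if k < 0:
        return 0.0
    term = exp(-mean)            # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i         # P(Y = i) obtained from P(Y = i - 1)
        total += term
    return total

results = [3, 4, 4, 6, 7, 8, 9, 10, 13, 14]
lambda0, alpha = 6.1, 0.05

observed_total = sum(results)                 # test statistic
mean_total = len(results) * lambda0           # mean of Y under H0 (about 61)

# Upper-tail test: p-value is P(Y >= observed total).
p_value = 1 - poisson_cdf(observed_total - 1, mean_total)
decision = "reject H0" if p_value < alpha else "accept H0"
print(observed_total, f"{p_value:.5f}", decision)
```

The computed p-value should be close to the 0.02032 quoted in Part b), leading to the same decision.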

Example

Worked Example 3: Calculating Type I error probability

A Poisson distribution is believed to have parameter $\lambda = 1.3$ . We test $H_0: \lambda = 1.3$ against $H_1: \lambda > 1.3$ at the 5% significance level using a sample of size 8.

Part a) Find the critical value

Step 1: Determine the distribution.

Expected total = $8 \times 1.3 = 10.4$

Therefore, $Y \sim \text{Po}(10.4)$

Step 2: Find the smallest value whose upper-tail probability does not exceed 0.05.

For a one-tailed test at 5% level, we need $P(Y \geq \text{critical value}) \leq 0.05$

Calculate probabilities:

$P(Y \geq 16) = 0.06405$ (this is greater than 0.05)
$P(Y \geq 17) = 0.03681$ (this is less than 0.05)

Since $0.06405 > 0.05$ but $0.05 > 0.03681$ , we choose the critical value that keeps the probability below 0.05.

The critical value is 17.

Part b) Find the probability of making a Type I error

The probability of a Type I error equals the probability of the critical region when $H_0$ is true.

The critical region is $Y \geq 17$ .

$P(\text{Type I error}) = P(Y \geq 17) = 0.03681$

This probability is less than the 5% significance level because the Poisson distribution is discrete. We cannot achieve exactly 5% in the critical region.

Interpretation: If the true value of $\lambda$ is actually 1.3, there is approximately a 3.68% chance we will incorrectly reject the null hypothesis. A Type I error here means we conclude the parameter has increased when it really hasn't changed.
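This interpretation can be illustrated by simulation: generate many samples of size 8 with $\lambda = 1.3$ (so $H_0$ is true) and count how often the total reaches the critical region $Y \geq 17$. The sketch below is an illustration under stated assumptions: the seed, the number of trials and the helper `poisson_draw` (a standard multiplication method for generating Poisson values, suitable for small rates) are choices made for this example.

```python
import random
from math import exp

def poisson_draw(rate, rng):
    """One Poisson(rate) value by multiplying uniform random numbers."""
    threshold = exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(2024)           # fixed seed so the run is repeatable
n, lambda0, critical_value = 8, 1.3, 17
trials = 20000

rejections = 0
for _ in range(trials):
    total = sum(poisson_draw(lambda0, rng) for _ in range(n))
    if total >= critical_value:     # total lands in the critical region
        rejections += 1             # H0 is true here, so this is a Type I error

type_one_rate = rejections / trials
print(f"Simulated Type I error rate: {type_one_rate:.4f}")
```

The simulated rate should settle near 0.037 rather than 0.05, with some random variation from run to run if the seed is changed.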

Example

Worked Example 4: Hypothesis testing in context

A Poisson distribution with parameter $\lambda = 2.8$ is thought to model the number of goals scored per football match well. Due to improved defensive tactics, it's believed the average number of goals has decreased. We test $H_0: \lambda = 2.8$ against $H_1: \lambda < 2.8$ at the 5% significance level using a random sample of 24 matches.

Part a) Find the critical value

Step 1: Set up the distribution.

Expected total = $24 \times 2.8 = 67.2$

The mean of a Poisson distribution does not have to be a whole number, so do not round it: use $Y \sim \text{Po}(67.2)$

Step 2: For a one-tailed test (lower tail) at 5%, find where $P(Y \leq k) \approx 0.05$

From tables or calculator:

$P(Y \leq 53) = 0.0434$
$P(Y \leq 54) = 0.0569$

Since we want the probability as close to 0.05 as possible without exceeding it:

The critical value is 53.

We reject $H_0$ if the total number of goals $\leq 53$ .

Part b) The sample contains 49 goals total. State your conclusion

Step 1: Compare test statistic to critical value.

Test statistic = 49
Critical value = 53

Since $49 < 53$ , our test statistic falls in the critical region.

Step 2: Make decision and interpret.

Reject the null hypothesis. There is sufficient evidence at the 5% significance level to conclude that $\lambda < 2.8$ .

In context: The data provide evidence that the average number of goals scored per match has fallen below the previous average of 2.8, which is consistent with the claim about improved defensive tactics. The test itself cannot show that the tactics caused the decrease.
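The lower-tail search in this example can be sketched as follows, working with the unrounded mean $24 \times 2.8 = 67.2$. The helper function and variable names are assumptions made for this illustration.

```python
from math import exp

def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Po(mean); returns 0 when k is negative."""
    if k < 0:
        return 0.0
    term = exp(-mean)            # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i         # P(Y = i) obtained from P(Y = i - 1)
        total += term
    return total

n, lambda0, alpha = 24, 2.8, 0.05
mean_total = n * lambda0                 # 67.2 goals expected under H0
observed_total = 49

# Largest value whose lower-tail probability does not exceed alpha.
critical_value = -1
while poisson_cdf(critical_value + 1, mean_total) <= alpha:
    critical_value += 1

decision = "reject H0" if observed_total <= critical_value else "accept H0"
print(critical_value, f"{poisson_cdf(critical_value, mean_total):.4f}", decision)
```

The search stops at 53, matching Part a), and 49 goals falls inside the critical region.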

Example

Worked Example 5: Type I and Type II errors in context

A kitchen in a nut-free school examines ingredients to ensure compliance with regulations. The null hypothesis for each ingredient is that it contains no trace of nuts.

Part a) Explain a Type I error and its consequences

A Type I error would occur if the kitchen suspects an ingredient contains nuts when it actually doesn't.

Consequences: This is a false positive. The kitchen would unnecessarily reject a safe ingredient, which might cost some money and require finding an alternative. However, students with allergies remain safe, so the consequences are relatively minor.

Why the kitchen might not worry: It's better to be over-cautious when allergies are involved. The cost of wrongly rejecting a safe ingredient is much less than the risk of accepting a contaminated one.

Part b) Explain a Type II error and its consequences

A Type II error would occur if the kitchen believes an ingredient doesn't contain nuts when it actually does.

Consequences: This is a false negative. The contaminated ingredient would be used, potentially exposing students with nut allergies to a life-threatening reaction. This could lead to severe health consequences or even death.

Why the kitchen should worry: Type II errors have potentially fatal consequences. The kitchen should set their testing procedures to minimise the probability of Type II errors, even if this means increasing Type I errors (false alarms).

Key lesson: The relative importance of Type I and Type II errors always depends on the context and consequences of each type of mistake.

Problem-solving strategy

Note

When answering hypothesis testing questions, follow this systematic approach:

Step 1: Conduct the test

Clearly state $H_0$ and $H_1$
Identify the sample size and calculate expected values
Determine the appropriate Poisson distribution

Step 2: Find critical values or calculate p-value

For critical values: find boundaries of the acceptance region
For p-value: calculate the probability of a result at least as extreme as the one observed
Show your working clearly

Step 3: Accept or reject the null hypothesis

Compare your test statistic to critical values, or
Compare your p-value to the significance level
State your decision clearly

Step 4: Interpret the result in context

Explain what your conclusion means for the real-world situation
Use appropriate language ("evidence suggests..." rather than "proves...")
Address the original research question

Exam tips:

Always show which distribution you're using (e.g., $Y \sim \text{Po}(43)$ )
State probabilities to at least 4 decimal places when calculating
For discrete distributions, remember the Type I error probability may be less than the significance level
When asked to explain errors, always relate them specifically to the context
Check whether you need a one-tailed or two-tailed test before starting calculations

Remember!

Summary

Key Points to Remember:

The test statistic for Poisson hypothesis testing is the total number of events in the sample, compared to the expected value $n\lambda_0$ under the null hypothesis.
Type I error (false positive) means rejecting a true null hypothesis – the probability equals the significance level for continuous distributions, but may be less for discrete distributions like the Poisson.
Type II error (false negative) means accepting a false null hypothesis – the seriousness depends entirely on context, but often has more severe consequences than Type I errors.
For discrete distributions, you cannot always achieve exactly the stated significance level – choose critical values that keep the probability in the critical region as close as possible to the significance level without exceeding it.
Always interpret your results in context – explain what your statistical conclusion means for the real-world situation, and remember that hypothesis tests provide evidence for or against claims, not absolute proof.
