Hypothesis Testing and Errors (AQA A-Level Further Maths): Revision Notes
Introduction to hypothesis testing with Poisson distributions
Hypothesis testing is a statistical method that helps us determine whether a claim about a population parameter is reasonable based on sample data. When working with events that occur randomly over time or space, we often use the Poisson distribution to model the situation.
The Poisson distribution is characterised by a single parameter, λ (lambda), which represents the average number of events occurring in a fixed interval. For example, λ could represent the average number of goals scored per football match, the average number of defects per product, or the average number of customers arriving per hour.
When we don't know the true value of λ, the best approach is to estimate it from a random sample. The best estimate of λ is the sample mean, calculated by dividing the total number of events observed by the sample size. This estimate forms the basis for hypothesis testing.
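As a minimal sketch of this estimate in Python, the snippet below uses the ten observations that appear in Worked Example 2 of these notes. The variable names are illustrative only.

```python
# Ten observations (the data used in Worked Example 2 of these notes)
observations = [3, 4, 4, 6, 7, 8, 9, 10, 13, 14]

total_events = sum(observations)          # total number of events observed
sample_size = len(observations)           # number of independent observations
lambda_hat = total_events / sample_size   # sample mean = estimate of lambda

print(total_events, sample_size, lambda_hat)  # 78 10 7.8
```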
Setting up your hypotheses
Before conducting a hypothesis test, you must clearly state two competing hypotheses.
The null hypothesis ($H_0$)
The null hypothesis represents the claim or assumption we are testing. It always states that the parameter equals a specific value. We write this as:
$H_0: \lambda = \lambda_0$
where $\lambda_0$ is the assumed value of the parameter. The null hypothesis represents the "status quo" or the claim we want to investigate. We assume $H_0$ is true unless we find strong evidence against it.
The alternative hypothesis ($H_1$)
The alternative hypothesis expresses what we suspect might be true if the null hypothesis is wrong. There are three possible forms:
- Two-tailed test: $H_1: \lambda \neq \lambda_0$ (the parameter differs from $\lambda_0$ in either direction)
- One-tailed test (lower tail): $H_1: \lambda < \lambda_0$ (the parameter is less than $\lambda_0$)
- One-tailed test (upper tail): $H_1: \lambda > \lambda_0$ (the parameter is greater than $\lambda_0$)
The choice between one-tailed and two-tailed tests depends on the research question. Use a one-tailed test when you're specifically interested in whether the parameter has increased or decreased. Use a two-tailed test when you want to detect any difference from the assumed value.
A single observation counts as a sample of size 1. This is important when calculating expected values. Don't forget to multiply by the sample size when determining your test distribution!
The test statistic
When conducting a hypothesis test for a Poisson distribution, the test statistic is straightforward to calculate.
The test statistic equals the total number of events that occur in your sample.
If you take a sample of size $n$ (meaning $n$ independent observations), simply count up all the events that occurred across all observations. This total becomes your test statistic.
Under the null hypothesis, we would expect approximately $n\lambda_0$ events to occur in a sample of size $n$. The key question is: is your observed total sufficiently different from this expected value to reject $H_0$?
Why compare to $n\lambda_0$ instead of just $\lambda_0$? The Poisson distribution only allows integer outcomes, so we work with the total count rather than the sample mean, which need not be an integer. The total of $n$ independent $\text{Po}(\lambda_0)$ observations follows $\text{Po}(n\lambda_0)$, so you multiply the parameter by the sample size to get the expected total. For example, if you take 5 observations, you expect about $5\lambda_0$ events in total, so you use $\text{Po}(5\lambda_0)$ for your calculations.
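The claim that the total of several independent Poisson counts is itself Poisson, with the rates added, can be checked numerically. The sketch below is only an illustration: the values of `lam0`, `n` and the table length `K` are assumptions chosen for the demonstration, and the helper names are not from any library. It builds the distribution of the total by repeated convolution and compares it with the Poisson distribution whose mean is `n * lam0`.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) for X ~ Po(mu)."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def convolve(p, q):
    """pmf of the sum of two independent counts, for totals 0..len(p)-1."""
    return [sum(p[j] * q[k - j] for j in range(k + 1)) for k in range(len(p))]

lam0, n, K = 1.3, 8, 30   # illustrative values; K is where the table stops
single = [poisson_pmf(k, lam0) for k in range(K + 1)]

# Distribution of the total of n independent Po(lam0) observations
total = single
for _ in range(n - 1):
    total = convolve(total, single)

# Distribution of a single Po(n * lam0) count
direct = [poisson_pmf(k, n * lam0) for k in range(K + 1)]

max_gap = max(abs(a - b) for a, b in zip(total, direct))
print(f"largest difference over k = 0..{K}: {max_gap:.2e}")
```

The two tables agree up to floating-point rounding, which is why the test distribution uses the sample size multiplied by the parameter.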
Conducting the hypothesis test
There are two main approaches to conducting hypothesis tests: the critical values method and the p-value method. Both lead to the same conclusion, but they present the evidence differently.
Critical values method
The critical values method identifies the boundary values that separate the critical region (where we reject $H_0$) from the acceptance region.
Steps:
- Determine the Poisson distribution for the total number of events: $X \sim \text{Po}(n\lambda_0)$
- Use your calculator or tables to find the values that give tail probabilities closest to the significance level without exceeding it
- The critical values bound the acceptance region
- Compare your test statistic to these critical values
- Make your decision
Decision rule for two-tailed tests: If your test statistic falls outside the range bounded by the critical values, reject $H_0$. Otherwise, accept $H_0$.
Decision rule for one-tailed tests: For $H_1: \lambda > \lambda_0$, reject $H_0$ if your test statistic is greater than or equal to the upper critical value. For $H_1: \lambda < \lambda_0$, reject $H_0$ if your test statistic is less than or equal to the lower critical value.
The critical region is bounded by the critical values. Remember that for discrete distributions like the Poisson, we can't always achieve exactly the stated significance level, so we choose values that keep the probability in the critical region as close to the significance level as possible without exceeding it.
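The search for critical values can be sketched in a few lines of Python using only the standard library. The function names here are hypothetical, and the convention assumed is the one in the one-tailed decision rules above: the critical value is the boundary value that is itself inside the critical region.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Po(mu); returns 0 when k < 0."""
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
               for i in range(k + 1))

def lower_critical_value(mu: float, tail_prob: float) -> int:
    """Largest c with P(X <= c) <= tail_prob, or -1 if no such c exists."""
    c = -1
    while poisson_cdf(c + 1, mu) <= tail_prob:
        c += 1
    return c

def upper_critical_value(mu: float, tail_prob: float) -> int:
    """Smallest c with P(X >= c) <= tail_prob."""
    c = 0
    while 1 - poisson_cdf(c - 1, mu) > tail_prob:
        c += 1
    return c

mu = 10.4  # illustrative total rate: 8 observations with lambda_0 = 1.3
print(upper_critical_value(mu, 0.05))  # 17
print(lower_critical_value(mu, 0.05))  # 4
```

For a two-tailed test, the same functions could be called with half the significance level in each tail.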
p-value method
The p-value represents the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true.
Steps:
- Calculate your test statistic
- Determine the appropriate Poisson distribution: $X \sim \text{Po}(n\lambda_0)$
- Calculate the probability of getting a value at least as extreme as your test statistic
- Compare this p-value to the significance level
- Make your decision
Decision rule: If the p-value is less than the significance level, reject $H_0$. If the p-value is greater than or equal to the significance level, accept $H_0$.
A smaller p-value indicates stronger evidence against the null hypothesis. The p-value tells you how surprising your result would be if $H_0$ were true.
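A one-tailed p-value can be computed directly from the Poisson cumulative probabilities. The following is a sketch under stated assumptions: `poisson_p_value` is a hypothetical helper, it handles only the two one-tailed alternatives, and the numbers in the usage lines are those of the football data in Worked Example 4 (49 goals in 24 matches, with an assumed rate of 2.8 per match).

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Po(mu); returns 0 when k < 0."""
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
               for i in range(k + 1))

def poisson_p_value(total: int, mu: float, alternative: str) -> float:
    """One-tailed p-value for an observed total when X ~ Po(mu) under H0."""
    if alternative == "greater":          # H1: lambda > lambda_0
        return 1 - poisson_cdf(total - 1, mu)   # P(X >= total)
    if alternative == "less":             # H1: lambda < lambda_0
        return poisson_cdf(total, mu)           # P(X <= total)
    raise ValueError("alternative must be 'greater' or 'less'")

alpha = 0.05
p = poisson_p_value(49, 24 * 2.8, "less")
print(f"p-value = {p:.4f}, reject H0: {p < alpha}")
```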
Understanding errors in hypothesis testing
Even when we follow correct procedures, hypothesis tests can lead to wrong conclusions. There are two types of errors to consider, each with different implications.
Type I error (false positive)
A Type I error occurs when we reject a null hypothesis that is actually true. This is often called a false positive because we conclude there is an effect or difference when none actually exists.
Key characteristics:
- We find "evidence" of something that isn't really there
- The result attracts attention unnecessarily
- Resources might be wasted investigating a non-existent problem
Calculating the probability of a Type I error:
For continuous probability distributions, the probability of making a Type I error equals the significance level (α). For example, if we test at the 5% level and the null hypothesis is true, we have exactly a 5% chance of making a Type I error.
For discrete distributions like the Poisson, the probability of a Type I error may be less than the significance level. This happens because we can't always find critical values that give exactly the desired probability.
The probability of a Type I error equals the probability of the critical region when the null hypothesis is true.
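When testing with $X \sim \text{Po}(10.4)$ at the 5% significance level (upper tail), we find $P(X \geq 16) = 0.06405 > 0.05$, but $P(X \geq 17) = 0.03681 < 0.05$. Therefore, the critical value is 17, and the probability of a Type I error equals 0.03681, not exactly 0.05.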
Type II error (false negative)
A Type II error occurs when we accept a null hypothesis that is actually false. This is often called a false negative because we fail to detect a real effect or difference.
Key characteristics:
- We overlook something important that is actually happening
- A genuine problem remains undetected
- The consequences can be more serious than a Type I error in many contexts
Comparing the two errors:
The severity of each type of error depends entirely on the context:
- In medical testing, a Type II error (failing to detect a disease) could be fatal, while a Type I error (false alarm) merely leads to further testing
- In quality control, a Type I error (rejecting good products) costs money, while a Type II error (accepting defective products) could harm customers
- In scientific research, Type I errors can lead to false discoveries, while Type II errors mean we miss real effects
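Unlike a Type I error, the probability of a Type II error depends on what the true parameter actually is. The sketch below is an illustration only: it takes the upper-tail test from Worked Example 3 (8 observations, critical value 17) and assumes a hypothetical true rate of 2.0, which is not a value from these notes. The function name is also hypothetical.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Po(mu); returns 0 when k < 0."""
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
               for i in range(k + 1))

def type_ii_probability(critical_value: int, n: int, true_lambda: float) -> float:
    """P(accept H0) for an upper-tail test that rejects when X >= critical_value,
    where the total X really follows Po(n * true_lambda)."""
    return poisson_cdf(critical_value - 1, n * true_lambda)

# Hypothetical true rate of 2.0 per observation (an assumption for illustration)
beta = type_ii_probability(critical_value=17, n=8, true_lambda=2.0)
print(f"P(Type II error) = {beta:.4f}")
```

The further the true rate is from the value in the null hypothesis, the smaller this probability becomes, because a real change is then easier to detect.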
Exam guidance: When asked to explain errors in context, always describe what each error means specifically for that situation and discuss the consequences. Don't just give the general definition.
Worked examples
Worked Example 1: Testing with critical values (two-tailed test)
A Poisson distribution is believed to have parameter $\lambda = \lambda_0$. We test the hypotheses $H_0: \lambda = \lambda_0$ and $H_1: \lambda \neq \lambda_0$ at the 10% significance level using a sample of size 5.
Part a) Find the critical values
Step 1: Determine the distribution for the total number of events.
Sample size × parameter = $5\lambda_0$
Therefore, $X \sim \text{Po}(5\lambda_0)$
Step 2: For a two-tailed test at the 10% level, we need 5% in each tail.
Find values where the cumulative probability is close to 0.05 and 0.95:
Using tables or calculator:
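- $P(X \leq 32) \approx 0.05$ (approximately 5%)
- $P(X \leq 55) \approx 0.95$ (approximately 95%)
The critical values are 32 and 55.
These bound the acceptance region. We reject $H_0$ if our test statistic falls outside the range they bound.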
Part b) A sample produces an average rate of 10.4. Should we accept or reject ?
Step 1: Calculate the test statistic.
Total events = sample size × average rate = $5 \times 10.4 = 52$
Step 2: Compare to critical values.
52 lies between 32 and 55 (inside the acceptance region)
Conclusion: Accept the null hypothesis. The sample provides insufficient evidence to suggest that $\lambda \neq \lambda_0$ at the 10% significance level.
Worked Example 2: Testing with p-values (one-tailed test)
A Poisson distribution is believed to have parameter $\lambda = 6.1$. We test $H_0: \lambda = 6.1$ against $H_1: \lambda > 6.1$ at the 5% significance level. A sample yields the results: 3, 4, 4, 6, 7, 8, 9, 10, 13, 14.
Part a) Calculate the total number of occurrences
Simply add all the values: $3 + 4 + 4 + 6 + 7 + 8 + 9 + 10 + 13 + 14 = 78$
Part b) Find the p-value
Step 1: Identify the distribution.
Sample size = 10 observations
Expected total = $10 \times 6.1 = 61$
Therefore, $X \sim \text{Po}(61)$
Step 2: Calculate the p-value.
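For $H_1: \lambda > 6.1$, we want $P(X \geq 78)$ when $X \sim \text{Po}(61)$
Using a calculator: $P(X \geq 78) = 1 - P(X \leq 77) = 0.02032$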
Part c) State your conclusion
The p-value (0.02032) is less than the significance level (0.05).
Conclusion: Reject the null hypothesis. There is sufficient evidence at the 5% level to conclude that $\lambda > 6.1$. The sample suggests the parameter is higher than 6.1.
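The arithmetic in this example can be checked with a short script. This is a sketch using only the Python standard library, with illustrative variable names; the p-value it prints should come out close to the 0.02032 quoted above.

```python
import math

observations = [3, 4, 4, 6, 7, 8, 9, 10, 13, 14]
lambda_0 = 6.1                       # value of lambda under the null hypothesis
alpha = 0.05                         # significance level

total = sum(observations)            # test statistic: total number of events
mu = len(observations) * lambda_0    # expected total under H0

# P(X <= total - 1) for X ~ Po(mu), summing the probability of each count
cdf_below = sum(math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))
                for k in range(total))
p_value = 1 - cdf_below              # upper tail: P(X >= total)

print(f"total = {total}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```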
Worked Example 3: Calculating Type I error probability
A Poisson distribution is believed to have parameter $\lambda = 1.3$. We test $H_0: \lambda = 1.3$ against $H_1: \lambda > 1.3$ at the 5% significance level using a sample of size 8.
Part a) Find the critical value
Step 1: Determine the distribution.
Expected total = $8 \times 1.3 = 10.4$
Therefore, $X \sim \text{Po}(10.4)$
Step 2: Find where cumulative probability equals approximately 0.95.
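For a one-tailed test at 5% level, we need the smallest $c$ with $P(X \geq c) \leq 0.05$
Calculate probabilities:
- $P(X \geq 16) = 0.06405$ (this is greater than 0.05)
- $P(X \geq 17) = 0.03681$ (this is less than 0.05)
Since $P(X \geq 16) > 0.05$ but $P(X \geq 17) < 0.05$, we choose the critical value that keeps the probability below 0.05.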
The critical value is 17.
Part b) Find the probability of making a Type I error
The probability of a Type I error equals the probability of the critical region when $H_0$ is true.
The critical region is $X \geq 17$.
$P(\text{Type I error}) = P(X \geq 17 \mid \lambda = 1.3) = 0.03681$
This probability is less than the 5% significance level because the Poisson distribution is discrete. We cannot achieve exactly 5% in the critical region.
Interpretation: If the true value of $\lambda$ is actually 1.3, there is approximately a 3.68% chance we will incorrectly reject the null hypothesis. A Type I error here means we conclude the parameter has increased when it really hasn't changed.
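The critical value and the Type I error probability in this example can be reproduced with a short search. This is a minimal sketch with illustrative names, using only the standard library.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Po(mu); returns 0 when k < 0."""
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
               for i in range(k + 1))

n, lambda_0, alpha = 8, 1.3, 0.05
mu = n * lambda_0                    # expected total under H0 (10.4)

# Smallest c whose upper tail P(X >= c) does not exceed the significance level
c = 0
while 1 - poisson_cdf(c - 1, mu) > alpha:
    c += 1

type_i = 1 - poisson_cdf(c - 1, mu)  # actual probability of the critical region
print(c, round(type_i, 5))           # 17 0.03681
```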
Worked Example 4: Hypothesis testing in context
A Poisson distribution with parameter $\lambda = 2.8$ is thought to model the number of goals scored per football match well. Due to improved defensive tactics, it's believed the average number of goals has decreased. We test $H_0: \lambda = 2.8$ against $H_1: \lambda < 2.8$ at the 5% significance level using a random sample of 24 matches.
Part a) Find the critical value
Step 1: Set up the distribution.
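Expected total = $24 \times 2.8 = 67.2$
Use $X \sim \text{Po}(67.2)$
Step 2: For a one-tailed test (lower tail) at 5%, find the largest $c$ where $P(X \leq c) \leq 0.05$
From tables or calculator: $P(X \leq 53) < 0.05$, whereas $P(X \leq 54) > 0.05$
Since we want the probability as close to 0.05 as possible without exceeding it:
The critical value is 53.
We reject $H_0$ if the total number of goals $\leq 53$.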
Part b) The sample contains 49 goals total. State your conclusion
Step 1: Compare test statistic to critical value.
Test statistic = 49
Critical value = 53
Since $49 \leq 53$, our test statistic falls in the critical region.
Step 2: Make decision and interpret.
Reject the null hypothesis. There is sufficient evidence at the 5% significance level to conclude that $\lambda < 2.8$.
In context: The data supports the claim that improved defensive tactics have reduced the average number of goals scored per match. On average, there are now fewer goals scored per match than the previous average of 2.8.
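The lower-tail critical value for the football data can be checked in the same way. The sketch below uses only the standard library, and the variable names are illustrative.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Po(mu); returns 0 when k < 0."""
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
               for i in range(k + 1))

n, lambda_0, alpha = 24, 2.8, 0.05
mu = n * lambda_0                    # expected total goals under H0 (67.2)

# Largest c whose lower tail P(X <= c) does not exceed the significance level
c = -1
while poisson_cdf(c + 1, mu) <= alpha:
    c += 1

observed = 49                        # total goals in the sample of 24 matches
print(c, observed <= c)              # 53 True
```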
Worked Example 5: Type I and Type II errors in context
A kitchen in a nut-free school examines ingredients to ensure compliance with regulations. The null hypothesis for each ingredient is that it contains no trace of nuts.
Part a) Explain a Type I error and its consequences
A Type I error would occur if the kitchen suspects an ingredient contains nuts when it actually doesn't.
Consequences: This is a false positive. The kitchen would unnecessarily reject a safe ingredient, which might cost some money and require finding an alternative. However, students with allergies remain safe, so the consequences are relatively minor.
Why the kitchen might not worry: It's better to be over-cautious when allergies are involved. The cost of wrongly rejecting a safe ingredient is much less than the risk of accepting a contaminated one.
Part b) Explain a Type II error and its consequences
A Type II error would occur if the kitchen believes an ingredient doesn't contain nuts when it actually does.
Consequences: This is a false negative. The contaminated ingredient would be used, potentially exposing students with nut allergies to a life-threatening reaction. This could lead to severe health consequences or even death.
Why the kitchen should worry: Type II errors have potentially fatal consequences. The kitchen should set their testing procedures to minimise the probability of Type II errors, even if this means increasing Type I errors (false alarms).
Key lesson: The relative importance of Type I and Type II errors always depends on the context and consequences of each type of mistake.
Problem-solving strategy
When answering hypothesis testing questions, follow this systematic approach:
Step 1: Conduct the test
- Clearly state $H_0$ and $H_1$
- Identify the sample size and calculate expected values
- Determine the appropriate Poisson distribution
Step 2: Find critical values or calculate p-value
- For critical values: find boundaries of the acceptance region
- For p-value: calculate the probability of a result at least as extreme as the one observed
- Show your working clearly
Step 3: Accept or reject the null hypothesis
- Compare your test statistic to critical values, or
- Compare your p-value to the significance level
- State your decision clearly
Step 4: Interpret the result in context
- Explain what your conclusion means for the real-world situation
- Use appropriate language ("evidence suggests..." rather than "proves...")
- Address the original research question
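The four steps can be collected into one small function. This is a sketch rather than a definitive implementation: `poisson_test` is a hypothetical name, it uses the p-value route, and it covers only one-tailed alternatives. The final interpretation in context (Step 4) still has to be written by hand.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Po(mu); returns 0 when k < 0."""
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
               for i in range(k + 1))

def poisson_test(total: int, n: int, lambda_0: float,
                 alternative: str, alpha: float = 0.05) -> dict:
    """One-tailed Poisson test on the total count from n observations."""
    mu = n * lambda_0                         # Step 1: distribution under H0
    if alternative == "greater":              # Step 2: p-value in the right tail
        p = 1 - poisson_cdf(total - 1, mu)
    elif alternative == "less":
        p = poisson_cdf(total, mu)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    decision = "reject H0" if p < alpha else "accept H0"   # Step 3
    return {"mu": mu, "p_value": p, "decision": decision}

# Football data from Worked Example 4: 49 goals in 24 matches, H1: lambda < 2.8
result = poisson_test(total=49, n=24, lambda_0=2.8, alternative="less")
print(result["decision"])                     # reject H0
```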
Exam tips:
- Always show which distribution you're using (e.g., $X \sim \text{Po}(n\lambda_0)$)
- State probabilities to at least 4 decimal places when calculating
- For discrete distributions, remember the Type I error probability may be less than the significance level
- When asked to explain errors, always relate them specifically to the context
- Check whether you need a one-tailed or two-tailed test before starting calculations
Remember!
Key Points to Remember:
- The test statistic for Poisson hypothesis testing is the total number of events in the sample, compared to the expected value under the null hypothesis.
- Type I error (false positive) means rejecting a true null hypothesis – the probability equals the significance level for continuous distributions, but may be less for discrete distributions like the Poisson.
- Type II error (false negative) means accepting a false null hypothesis – the seriousness depends entirely on context, but often has more severe consequences than Type I errors.
- For discrete distributions, you cannot always achieve exactly the stated significance level – choose critical values that keep the probability in the critical region as close as possible to the significance level without exceeding it.
- Always interpret your results in context – explain what your statistical conclusion means for the real-world situation, and remember that hypothesis tests provide evidence for or against claims, not absolute proof.