Handling Data (AQA A-Level Biology): Revision Notes
Statistical Tests
Statistical tests help biologists determine whether observed differences in data are due to genuine effects or simply random chance. In biology, any probability greater than 5% suggests results could be due to chance alone, while probabilities of 5% or below indicate the data differ significantly and a real cause must be influencing the outcome.
Understanding statistical significance
Statistical significance occurs when the probability that results are due to chance alone is 5% or less (p ≤ 0.05). This means we can be at least 95% confident that observed differences represent real effects rather than random variation.
The null hypothesis assumes there is no significant difference between observed and expected results. Statistical tests help determine whether to accept or reject this null hypothesis based on calculated probability values.
The 5% Rule: Results with p ≤ 0.05 are considered statistically significant, meaning there is a 5% chance or less that they occurred due to random variation alone. This is the standard threshold used in biological research.
Chi-squared (χ²) test
The chi-squared test compares patterns in collected data with patterns expected by chance. This test determines how much observed frequencies deviate from expected frequencies.
When to use chi-squared tests
Use chi-squared tests when comparing observed frequencies with expected frequencies, for example when checking genetic cross results, or when testing whether a die is fair by comparing the actual number of throws of each face with the equal frequencies expected.
Chi-squared tests are particularly useful in genetics for testing whether experimental crosses match predicted Mendelian ratios, such as 3:1 or 9:3:3:1 ratios.
Chi-squared formula and calculation
The formula for the chi-squared test is:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
- O = observed values
- E = expected values
- Σ = sum of all categories
Worked Example: Genetic Cross Analysis
Consider a cross between two heterozygous tall plants where the expected outcome is 3 tall:1 short plant. In reality, 69 tall and 28 short plants were observed from 97 total plants.
Expected values: 72.75 tall plants, 24.25 short plants
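The expected values above come from sharing the 97 plants out in a 3:1 ratio. A minimal Python sketch of that step follows; the function name `expected_counts` is purely illustrative and not part of any library.

```python
def expected_counts(ratio, total):
    """Share `total` individuals across categories in the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

# 3 tall : 1 short among the 97 plants in the worked example
tall, short = expected_counts([3, 1], 97)
print(tall, short)  # 72.75 24.25
```

The same helper would work for a 9:3:3:1 dihybrid ratio by passing `[9, 3, 3, 1]`.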
| Category | Observed (O) | Expected (E) | (O-E) | (O-E)² | (O-E)²/E |
|---|---|---|---|---|---|
| Tall plants | 69 | 72.75 | -3.75 | 14.06 | 0.19 |
| Short plants | 28 | 24.25 | 3.75 | 14.06 | 0.58 |
| Total | 97 | 97 | | | χ² = 0.77 |
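The table can be checked with a few lines of Python. This is a sketch only: `chi_squared` is an illustrative helper that sums (O - E)²/E over the categories, using the observed and expected values from the worked example.

```python
def chi_squared(observed, expected):
    """Sum (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [69, 28]        # tall, short
expected = [72.75, 24.25]  # 3:1 ratio applied to 97 plants
chi2 = chi_squared(observed, expected)
print(round(chi2, 2))  # 0.77
```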
Interpreting chi-squared results
Calculate degrees of freedom as the number of categories minus 1. Here: 2 categories - 1 = 1 degree of freedom.
Compare the calculated χ² value (0.77) with the critical value from statistical tables. For 1 degree of freedom at p = 5%, the critical value is 3.84. Since 0.77 < 3.84, the probability that the deviation is due to chance is greater than 5% (the tables place it between 10% and 50%), so we accept the null hypothesis: there is no significant difference between observed and expected results.
Critical Point: If your calculated χ² value is greater than the critical value from the table, the result is statistically significant and you reject the null hypothesis.
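The decision rule can be sketched in Python as follows. The names `reject_null` and `p_value_df1` are hypothetical helpers. The p-value shortcut uses the complementary error function and is valid only for 1 degree of freedom; for other degrees of freedom, use statistical tables or a statistics library instead.

```python
import math

CRITICAL_5_PERCENT_DF1 = 3.84  # critical value quoted in the text (1 d.f., p = 0.05)

def reject_null(chi2, critical=CRITICAL_5_PERCENT_DF1):
    """True when the calculated value exceeds the critical value."""
    return chi2 > critical

def p_value_df1(chi2):
    """Probability of a chi-squared value this large or larger by chance.
    Assumption: exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

print(reject_null(0.77))  # False, so the null hypothesis is accepted
p = p_value_df1(0.77)
print(0.10 < p < 0.50)    # True, matching the range read from the table
```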
Student t test
The Student t test judges whether differences between means of two data sets are statistically significant. This test requires normally distributed data with sufficient sample sizes (ideally more than 15 in each group).
When to use t tests
Use t tests when comparing means from two independent groups to determine if observed differences are statistically significant. Sample sizes do not need to be equal between groups.
The t test assumes your data follows a normal distribution. Always check this assumption before applying the test, especially with smaller sample sizes.
Student t test formula and calculation
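The formula for an unpaired t test is:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Where:
- $\bar{x}_1$ and $\bar{x}_2$ = means of each group
- $s_1^2$ and $s_2^2$ = variances of each group
- $n_1$ and $n_2$ = sample sizes of each group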
Worked Example: Limpet Diameter Comparison
Comparing limpet diameters between east-facing and west-facing sites:
| Site | n | Mean diameter (mm) | Variance (s²) |
|---|---|---|---|
| East | 28 | 35.64 | 77.17 |
| West | 28 | 37.36 | 74.4 |
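Step 1: Calculate the t value
$$t = \frac{35.64 - 37.36}{\sqrt{\dfrac{77.17}{28} + \dfrac{74.4}{28}}} = \frac{-1.72}{\sqrt{5.41}} \approx -0.74$$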
Step 2: Calculate degrees of freedom
Degrees of freedom = (n₁ + n₂) - 2 = 54
Step 3: Interpret results
From statistical tables, this t value indicates the probability of difference being due to chance is more than 10%. Therefore, we accept the null hypothesis - no significant difference exists between sites.
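The limpet calculation can be reproduced from the summary table alone. The sketch below assumes the unpaired form of the test, with the difference in means divided by the square root of the summed variance-over-sample-size terms; `unpaired_t` is an illustrative name.

```python
import math

def unpaired_t(mean1, var1, n1, mean2, var2, n2):
    """t = (mean1 - mean2) / sqrt(var1/n1 + var2/n2)."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error

# East-facing site first, west-facing site second
t = unpaired_t(35.64, 77.17, 28, 37.36, 74.4, 28)
df = 28 + 28 - 2
print(round(t, 2), df)  # -0.74 54
```

The sign only shows which mean is larger; the size of t (0.74) is what gets compared with the table.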
Correlation coefficient (Pearson's product moment correlation coefficient)
The correlation coefficient (r) measures the strength and direction of linear relationships between two variables. Values range from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no correlation.
When to use correlation analysis
Use correlation analysis when examining relationships between two continuous variables. Always plot data on scatter graphs first to visualise potential relationships before calculating correlation coefficients.
Remember: Correlation does not imply causation. A strong correlation between two variables doesn't necessarily mean one causes the other.
Correlation coefficient formula and calculation
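The formula is:
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$$
Where:
- x = values of first variable, $\bar{x}$ = mean of first variable
- y = values of second variable, $\bar{y}$ = mean of second variable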
- Σ = sum of all values
Worked Example: Seed Mass and Wrinkles Correlation
Investigating correlation between horse chestnut seed mass and number of wrinkles:
Data: 6 seeds with masses 12 g, 10 g, 8 g, 6 g, 4 g, 2 g and wrinkle counts 1, 3, 8, 15, 27, 36 respectively.
After tabulated calculations: r = -254/261 = -0.97
Interpretation: This indicates a strong negative correlation between seed mass and number of wrinkles.
Degrees of freedom = n - 2 = 4
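From correlation tables, ignoring the sign, r = 0.97 at 4 degrees of freedom exceeds the critical values at both p = 0.05 (0.811) and p = 0.01 (0.917). So p < 0.01: the probability that a correlation this strong arose by chance alone is below 1%, and the correlation is statistically significant.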
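The tabulated calculation can be sketched in Python using the six seeds from the example. `pearson_r` is an illustrative helper: it divides the sum of the products of the deviations by the square root of the product of the summed squared deviations.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

mass = [12, 10, 8, 6, 4, 2]       # seed mass in g
wrinkles = [1, 3, 8, 15, 27, 36]  # number of wrinkles
r = pearson_r(mass, wrinkles)
print(round(r, 2))  # -0.97
```

Here the numerator is -254 and the denominator is about 261, as in the worked example.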
Using statistical tables effectively
Understanding how to calculate and use degrees of freedom is crucial for interpreting statistical results correctly.
Degrees of Freedom Calculations:
- Chi-squared: number of categories - 1
- t test (unpaired): (n₁ + n₂) - 2
- Correlation: n - 2
Compare calculated values with critical values from appropriate statistical tables. If calculated values exceed critical values at p = 0.05, results are statistically significant.
Table Usage Tip: Always convert negative t or r values to positive when using tables, as statistical tables show absolute values only.
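The degrees-of-freedom rules and the table comparison can be summarised in a short Python sketch. All function names are illustrative, and the critical value must still be looked up in the appropriate table for the test and degrees of freedom.

```python
def df_chi_squared(n_categories):
    return n_categories - 1

def df_unpaired_t(n1, n2):
    return (n1 + n2) - 2

def df_correlation(n_pairs):
    return n_pairs - 2

def is_significant(calculated, critical_value):
    """Compare the size of the calculated value (sign ignored) with the
    critical value read from the table at p = 0.05."""
    return abs(calculated) > critical_value

print(df_chi_squared(2), df_unpaired_t(28, 28), df_correlation(6))  # 1 54 4
print(is_significant(0.77, 3.84))  # False: chi-squared example, not significant
```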
Key Points to Remember:
- Statistical significance occurs when p ≤ 0.05 (5% probability or less)
- Chi-squared tests compare observed vs expected frequencies using $\chi^2 = \sum (O - E)^2 / E$
- Student t tests compare means between two groups and require normally distributed data
- Correlation coefficients measure relationship strength between variables, ranging from -1 to +1
- Always calculate correct degrees of freedom and use appropriate statistical tables to interpret results