The Coefficient of Determination (VCE SSCE General Mathematics): Revision Notes
The Coefficient of Determination
Introduction to prediction and association
When two variables are associated, we can use the value of one variable to estimate the value of the other. The strength of this prediction depends on how closely the variables are related.
Understanding association strength
The relationship between variables can vary in strength, and this affects our ability to make predictions:
- Perfect linear association ($r = 1$ or $r = -1$): We can make exact predictions. For example, when buying cheese by weight, there is a perfect relationship between the weight and the cost ($r = 1$).
- No association ($r = 0$): Knowing one variable doesn't help predict the other. For example, an adult's height does not help predict their IQ ($r = 0$).
- Partial association ($0 < |r| < 1$): We can make approximate predictions. For example, people's heights and weights are associated, so knowing someone's height helps roughly predict their weight.
The ability to make accurate predictions depends on the correlation coefficient ($r$). The stronger the correlation, the better our predictions will be.
What is the coefficient of determination?
The coefficient of determination is a statistic that measures how well one variable can be predicted from another linearly related variable. It tells us the proportion of variation in one variable that can be explained by variation in the other variable.
Formula
The coefficient of determination is calculated by squaring the correlation coefficient:
$$\text{coefficient of determination} = r^2$$
Key Properties:
- The coefficient of determination is never negative (even if $r$ is negative)
- It is usually expressed as a percentage
- The value ranges from $0$ to $1$ (or $0\%$ to $100\%$)
Calculating the coefficient of determination
To calculate the coefficient of determination, follow these steps:
- Square the correlation coefficient $r$
- Convert to a percentage by multiplying by $100$
- Round appropriately (usually to one decimal place)
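The three steps above can be sketched in a few lines of Python. This is only an illustrative helper: the function name `coefficient_of_determination_percent` and the input value are assumptions made for the sketch, not part of the course material.

```python
def coefficient_of_determination_percent(r: float, decimals: int = 1) -> float:
    """Square r, convert to a percentage, and round (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    r_squared = r * r              # step 1: square the correlation coefficient
    percentage = r_squared * 100   # step 2: convert to a percentage
    return round(percentage, decimals)  # step 3: round appropriately


# A negative correlation still gives a non-negative result.
print(coefficient_of_determination_percent(-0.5))  # 25.0
```

Because squaring removes the sign, `r = 0.5` and `r = -0.5` give the same percentage.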
Worked Example: Height and Weight
If the correlation between weight and height is $r = 0.8$, find the value of the coefficient of determination as a percentage.
Solution:
$$r^2 = 0.8^2 = 0.64 = 64\%$$
The coefficient of determination is 64%.
Note: Expressing the coefficient of determination as a percentage makes it easier to interpret as a share of the total variation.
Interpreting the coefficient of determination
The coefficient of determination (as a percentage) tells us the percentage of the variation in the response variable that is explained by the variation in the explanatory variable.
What does "explained variation" mean?
Understanding Explained Variation
Consider the height and weight example, where the coefficient of determination is $64\%$:
Interpretation: $64\%$ of the variation in people's weight is explained by the variation in their height.
This means:
- Differences in height account for $64\%$ of the differences in people's weights
- The remaining $36\%$ of the variation in weight is explained by other factors (such as diet, lifestyle, and build)
- We can say that $36\%$ of the variation in weight is NOT explained by the variation in height
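To see where "explained variation" comes from, the sketch below fits a least squares line to a small data set and compares $r^2$ with the fraction of variation in the response variable that the line accounts for. The height and weight values are hypothetical, invented only for illustration, and the variable names are assumptions of this sketch.

```python
# Hypothetical data, chosen only for illustration (not real measurements).
heights = [150, 160, 170, 180, 190]   # explanatory variable (cm)
weights = [52, 58, 71, 74, 85]        # response variable (kg)

n = len(heights)
mean_x = sum(heights) / n
mean_y = sum(weights) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in heights)
syy = sum((y - mean_y) ** 2 for y in weights)   # total variation in weight
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))

# Pearson's correlation coefficient.
r = sxy / (sxx * syy) ** 0.5

# Least squares line: weight = intercept + slope * height.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left over after using the line to predict weight.
residual_ss = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(heights, weights)
)
explained_fraction = 1 - residual_ss / syy

# The two printed values agree (up to floating-point rounding).
print(round(r ** 2, 4), round(explained_fraction, 4))
```

The agreement between the two numbers is what justifies reading $r^2$ as the proportion of variation explained when a straight line is used for prediction.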
When interpreting the coefficient of determination, always:
- Identify which variable is the response variable
- Identify which variable is the explanatory variable
- State the percentage of variation in the response variable explained by the explanatory variable
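As a rough template for this checklist, the following sketch builds an interpretation sentence from a correlation coefficient and the names of the two variables. The function name `interpret_r_squared`, the variable names, and the value of `r` are all hypothetical, and the wording is one acceptable phrasing rather than a required form.

```python
def interpret_r_squared(r: float, response: str, explanatory: str) -> str:
    """Return an interpretation sentence for r squared (hypothetical helper)."""
    explained = round(r * r * 100, 1)        # percentage explained
    unexplained = round(100 - explained, 1)  # percentage not explained
    return (
        f"{explained}% of the variation in {response} is explained by "
        f"the variation in {explanatory}; the remaining {unexplained}% is not."
    )


# Hypothetical variables and correlation, for illustration only.
print(interpret_r_squared(0.5, "exam score", "hours of study"))
```

Note that the response variable is named first and the explanatory variable second, matching the order in the checklist.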
Worked Example: Carbon Monoxide and Traffic Volume
The level of carbon monoxide (CO) in the air at the roadside and traffic volume at the same location are linearly related, with $r = 0.985$. Traffic volume is the explanatory variable. Determine the coefficient of determination, express it as a percentage, and interpret.
Solution:
Calculate $r^2$: $r^2 = 0.985^2 \approx 0.970$
Convert to percentage: $0.970 \times 100\% = 97.0\%$
Interpretation: 97.0% of the variation in carbon monoxide levels in the air can be explained by the variation in traffic volume.
Conclusion: Traffic volume is a very good predictor of carbon monoxide levels. Knowing the traffic volume enables us to predict carbon monoxide levels with high accuracy.
Worked Example: Verbal and Mathematical Ability
Scores on tests of verbal and mathematical ability are linearly related with $r = 0.275$. Verbal ability is the explanatory variable. Determine the coefficient of determination, express it as a percentage, and interpret.
Solution:
Calculate $r^2$: $r^2 = 0.275^2 \approx 0.076$
Convert to percentage: $0.076 \times 100\% = 7.6\%$
Interpretation: Only 7.6% of the variation observed in scores on the mathematical ability test can be explained by the variation in scores obtained on the verbal ability test.
Conclusion: Scores on the verbal ability test are not good predictors of scores on the mathematical ability test. The remaining $92.4\%$ of the variation in mathematical ability is explained by other factors.
Finding the correlation coefficient from the coefficient of determination
If we know the coefficient of determination ($r^2$), we can work backwards to find the correlation coefficient ($r$). However, since a positive value and its negative have the same square, we need additional information to determine the correct sign.
Using a scatterplot to determine the sign
To find $r$ from $r^2$:
- Take the square root of $r^2$ to get the two candidates $r = \pm\sqrt{r^2}$
- Examine the scatterplot to determine the direction of the association
- If the association is positive (upward trend), use the positive value
- If the association is negative (downward trend), use the negative value
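The procedure above can be sketched as a small function. The name `correlation_from_r_squared` and the `direction` argument are assumptions of this sketch; the direction must still come from the scatterplot or the context, since it cannot be computed from $r^2$.

```python
import math


def correlation_from_r_squared(r_squared: float, direction: str) -> float:
    """Recover r from r squared, given the direction of the association."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r squared must lie between 0 and 1")
    if direction not in ("positive", "negative"):
        raise ValueError("direction must be 'positive' or 'negative'")
    magnitude = math.sqrt(r_squared)  # size of r, without its sign
    # Attach the sign read from the scatterplot.
    return magnitude if direction == "positive" else -magnitude


# Hypothetical value: r squared of 0.25 with a downward trend.
print(round(correlation_from_r_squared(0.25, "negative"), 4))  # -0.5
```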
Worked Example: Finding r from r²
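For a relationship shown in a scatterplot (displaying a negative association), the value of the coefficient of determination $r^2$ is given. Determine the correlation coefficient $r$, rounded to four decimal places.
Solution:
Since the coefficient of determination is the square of the correlation coefficient:
$$r = \pm\sqrt{r^2}$$
The scatterplot shows a negative association, so we choose the negative value:
$$r = -\sqrt{r^2}$$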
Exam Tip: Always check the scatterplot or context to determine whether $r$ should be positive or negative. The coefficient of determination alone cannot tell you the direction of the relationship!
Understanding prediction accuracy
The coefficient of determination helps us understand how useful one variable is for predicting another. Here's a general guide to interpreting different values:
| Coefficient of Determination | Interpretation | Prediction Quality |
|---|---|---|
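| Close to $100\%$ (e.g., $97.0\%$ in the carbon monoxide example) | Most variation is explained | Excellent predictor |
| Moderate (e.g., $64\%$ in the height and weight example) | Moderate variation explained | Reasonably good predictor |
| Low, close to $0\%$ (e.g., $7.6\%$ in the verbal ability example) | Very little variation explained | Poor predictor |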
Remember: A high coefficient of determination means:
- The explanatory variable is a good predictor of the response variable
- Most of the variation in the response variable is accounted for
- Predictions will be more accurate
Summary
Key Points to Remember:
- The coefficient of determination is calculated by squaring the correlation coefficient: coefficient of determination $= r^2$
- It is expressed as a percentage and tells us the proportion of variation in the response variable explained by the explanatory variable
- The higher the coefficient of determination, the better one variable predicts the other
- To find $r$ from $r^2$, take the square root and use a scatterplot to determine the correct sign (positive or negative)
- The remaining percentage (unexplained variation) is due to other factors not included in the relationship