Fitting a Linear Model to the Data (VCE SSCE General Mathematics): Revision Notes
Introduction to linear modelling
When we observe a linear relationship between two numerical variables on a scatterplot, we can take the analysis further by creating a linear model. This involves fitting a straight line to the data and determining its equation. The resulting line helps us better understand the relationship between variables and enables us to make predictions about future values.
The process of fitting a straight line to data showing a linear association is called linear regression. The line itself is known as the regression line or line of good fit.
The equation of any regression line follows the standard form:
$$y = a + bx$$
where:
- $a$ represents the $y$-intercept (where the line crosses the vertical axis)
- $b$ represents the slope (the steepness of the line)
Fitting a line 'by eye'
The most straightforward approach to fitting a linear model is the visual method, sometimes called fitting by eye. This involves using a ruler to draw a straight line on the scatterplot that appears to balance the data points evenly on both sides. Ideally, the line should have roughly equal numbers of points above and below it.
Once you have drawn this line on your scatterplot, you can determine its equation using techniques for finding the equation of a straight line.
Method 1: Using the intercept and slope
This method involves identifying the $y$-intercept directly from the graph and calculating the slope using two convenient points on the line.
Steps:
- Read the $y$-intercept ($a$) from where the line crosses the vertical axis
- Calculate the slope ($b$) using: $b = \dfrac{\text{rise}}{\text{run}}$
- Write the equation as $y = a + bx$
Worked Example: Finding the Line Equation Using Intercept and Slope
A line has been fitted by eye to data showing the velocity of an accelerating car over time.

To find the equation:
- The $y$-intercept: m/s (reading from the graph where time = 0)
- The slope: Using points and :
- Therefore:
Method 2: Using two points on the line
This alternative method uses the coordinates of two clear points on the fitted line.
Steps:
- Identify two points on the line where coordinates can be read easily from the graph
- Calculate the slope ($b$) using: $b = \dfrac{y_2 - y_1}{x_2 - x_1}$
- Substitute one point's coordinates into $y = a + bx$ and solve for $a$
- Write the final equation using the actual variable names
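The two-point steps can be sketched in a few lines of Python. This is only an illustration: the function name `line_through` and the two points are hypothetical and are not taken from the worked examples in these notes.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = a + b*x passing through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x-values")
    b = (y2 - y1) / (x2 - x1)  # slope: rise over run
    a = y1 - b * x1            # substitute the first point into y = a + b*x
    return a, b


# Hypothetical points read from a fitted line (made up for illustration)
a, b = line_through((2, 7), (6, 15))
print(f"y = {a:g} + {b:g}x")  # prints "y = 3 + 2x"
```

Either point can be used in the substitution step; both give the same intercept because both lie on the line.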
Worked Example: Finding the Line Equation Using Two Points
A line has been fitted to a scatterplot showing student-staff ratios in primary versus secondary education.

Working through the steps:
- Two suitable points on the line: and
- Calculate slope:
- Find intercept using point :
- Final equation:
Understanding residuals
When we fit a line to data, most points will not lie exactly on the line. The vertical distance between each actual data point and the fitted line is called a residual.

Residuals can be:
- Positive when the actual data point lies above the line
- Negative when the actual data point lies below the line
These residuals are important because they measure how well our line fits the data. Smaller residuals indicate a better fit.
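As a rough illustration, a residual can be computed as the actual value minus the value predicted by the line, so its sign shows whether the point sits above or below the line. The data, the line $y = 3 + 2x$ and the helper name `residuals` below are all hypothetical.

```python
def residuals(xs, ys, a, b):
    """Residual for each point: actual y minus the y predicted by y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data and a hypothetical fitted line y = 3 + 2x
xs = [1, 2, 3, 4]
ys = [5, 8, 8, 12]

for x, y, res in zip(xs, ys, residuals(xs, ys, a=3, b=2)):
    position = "above" if res > 0 else "below" if res < 0 else "on"
    print(f"({x}, {y}): residual {res:+g}, point is {position} the line")
```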
The least squares method
While fitting a line by eye is simple, a more precise mathematical approach is the least squares method. This technique finds the single best line for the data based on a clear mathematical criterion.
How the method works
The least squares method finds the line that minimises the sum of the squared residuals. Here's why we square the residuals:
- Some residuals are positive (points above the line)
- Some residuals are negative (points below the line)
- If we simply added residuals together, positive and negative values would cancel out
- By squaring each residual first, all values become positive
- We can then add them together meaningfully
- The line that makes this sum as small as possible is our best fit
The name "least squares" comes from finding the line with the least value for the sum of squared residuals.
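The criterion can be illustrated by scoring a few candidate lines on the same data and keeping the one with the smallest sum of squared residuals. This sketch only compares three hypothetical candidates on made-up data; it does not search over every possible line.

```python
def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data and three hypothetical candidate lines given as (a, b)
xs = [1, 2, 3, 4]
ys = [5, 8, 8, 12]
candidates = {"line A": (4, 2), "line B": (3, 2), "line C": (3, 2.1)}

scores = {name: sum_squared_residuals(xs, ys, a, b)
          for name, (a, b) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: sum of squared residuals = {score:.2f}")
print(f"Smallest sum among these candidates: {best}")
```

For this made-up data, line C is in fact the least squares line, so no other straight line would give a smaller sum.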
Formulas for the least squares line
The least squares regression line has equation $y = a + bx$, where the slope and intercept are calculated using these formulas:
Slope: $b = \dfrac{r s_y}{s_x}$
Intercept: $a = \bar{y} - b\bar{x}$
where:
- $r$ is the correlation coefficient
- $s_x$ and $s_y$ are the standard deviations of $x$ and $y$
- $\bar{x}$ and $\bar{y}$ are the mean values of $x$ and $y$
Always calculate the slope ($b$) first, as you need it to find the intercept ($a$).
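A minimal sketch of this calculation, assuming the standard relations slope = r × s_y / s_x and intercept = ȳ − slope × x̄. The function name and the summary statistics are hypothetical and are not the values from the worked example.

```python
def least_squares_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for y = a + b*x from summary statistics."""
    if s_x <= 0:
        raise ValueError("s_x must be positive")
    b = r * s_y / s_x        # slope first
    a = y_bar - b * x_bar    # the intercept needs the slope
    return a, b


# Hypothetical summary statistics (made up for illustration)
a, b = least_squares_from_summary(r=0.8, s_x=2.0, s_y=5.0, x_bar=10.0, y_bar=30.0)
print(f"y = {a:.3g} + {b:.3g}x")  # prints "y = 10 + 2x"
```

Rounding happens only in the final print, in line with working to more decimal places than the answer requires.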
Assumptions for using least squares
Before applying the least squares method, check that:
- Both variables are numerical (not categorical)
- The association appears linear (not curved) on the scatterplot
These are the same assumptions required for calculating Pearson's correlation coefficient.
Worked Example: Calculating the Least Squares Line
Find the equation of the least squares regression line when:
Solution:
Calculate the slope:
Calculate the intercept:
Write the equation (rounded to 3 significant figures):
Exam tip: When performing these calculations, work with more decimal places than required and only round your final answer to avoid rounding errors accumulating.
Using technology to find the least squares line
Modern calculators can determine the least squares regression line directly from your data. Both TI-Nspire CAS and ClassPad calculators have built-in functions for this purpose.
General process (all calculators)
- Enter your data into two lists or columns
- Identify which variable is explanatory (independent, on $x$-axis) and which is response (dependent, on $y$-axis)
- Create a scatterplot
- Use the regression function to fit a linear model
- The calculator displays the equation and correlation coefficient
Example using calculator functions
Worked Example: Using Technology for Linear Regression
For the income and CO₂ emissions data:
| Income (\$'000) | 8.9 | 23.0 | 7.5 | 8.0 | 18.0 | 16.7 | 5.2 | 12.8 | 19.1 | 16.4 | 21.7 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CO₂ (tonnes) | 7.5 | 12.0 | 6.0 | 1.8 | 7.7 | 5.7 | 3.8 | 5.7 | 11.0 | 9.7 | 9.9 |
The calculator produces:

The output shows:
- Regression equation:
- Correlation coefficient:
In terms of the actual variables:
This tells us that for every \$1,000 increase in income, CO₂ emissions increase by approximately 0.417 tonnes on average.
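For readers without a CAS calculator, the same fit can be checked with a short Python sketch using only the standard library. It uses the income and CO₂ values from the table above and assumes the usual sample standard deviation (dividing by n − 1). The variable names are illustrative, and the slope it prints should agree with the 0.417 quoted above.

```python
from statistics import mean, stdev

# Data from the table: income in $'000 (explanatory), CO2 in tonnes (response)
income = [8.9, 23.0, 7.5, 8.0, 18.0, 16.7, 5.2, 12.8, 19.1, 16.4, 21.7]
co2 = [7.5, 12.0, 6.0, 1.8, 7.7, 5.7, 3.8, 5.7, 11.0, 9.7, 9.9]

n = len(income)
x_bar, y_bar = mean(income), mean(co2)
s_x, s_y = stdev(income), stdev(co2)  # sample standard deviations

# Pearson's correlation coefficient
r = sum((x - x_bar) * (y - y_bar)
        for x, y in zip(income, co2)) / ((n - 1) * s_x * s_y)

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # intercept

print(f"CO2 = {a:.3g} + {b:.3g} x income")
print(f"r = {r:.3g}")
```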

Exam tip: Even when using a calculator, always write your final equation using the actual variable names from the question, not just $x$ and $y$. This demonstrates understanding of what the equation represents.
Interpreting the regression line
When you have found the equation of the regression line:
- The slope tells you how much the response variable changes for each unit increase in the explanatory variable
- The intercept gives the predicted value of the response variable when the explanatory variable equals zero (though this may not always be meaningful in context)
- The correlation coefficient indicates the strength and direction of the linear relationship
Key Points to Remember:
- Linear regression creates a mathematical model of the linear relationship between two numerical variables
- The regression line equation is $y = a + bx$, where $a$ is the intercept and $b$ is the slope
- Fitting by eye is a simple visual method—draw a line that balances the points
- Residuals are the vertical distances between data points and the fitted line
- The least squares method finds the line that minimises the sum of squared residuals—this is the best mathematical fit
- Always calculate slope first: $b = \dfrac{r s_y}{s_x}$, then intercept: $a = \bar{y} - b\bar{x}$
- Calculators can determine the least squares line efficiently, but always express the final equation using the actual variable names from your context