Chi Squared Tests for Standard Distributions Revision Notes for Edexcel A-Level Further Mathematics

21.2.3 Chi Squared Tests for Standard Distributions

Least Squares Regression Line

When we have evidence (from inspecting a scatter diagram and calculating the product moment correlation coefficient, $r$) that a data set is linearly correlated, it makes sense to draw a line of best fit. This is called the "least squares regression line".

The regression line is calculated by minimising the sum of the squares of the vertical distances from the points to the line.

If each vertical distance is drawn as the side of a square, the least squares regression line is the line for which the total area of these squares is least.
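To make the "total area of the squares" idea concrete, here is a minimal Python sketch using the tree data and the coefficients from the worked example. The helper `sum_of_squares` is a hypothetical name chosen for illustration; the sketch simply checks that nudging the intercept or the gradient of the least squares line makes the total area larger.

```python
from fractions import Fraction

# Tree data from the worked example: age x (years) and height y (m), stored exactly
xs = [7, 24, 12, 19]
ys = [Fraction("2.4"), Fraction(6), Fraction(5), Fraction("5.8")]


def sum_of_squares(a, b):
    """Total area of the squares: sum of squared vertical distances to y = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Coefficients of the least squares line found in the worked example
a_ls, b_ls = Fraction(587, 338), Fraction(167, 845)
best = sum_of_squares(a_ls, b_ls)

# Nudging either coefficient up or down always increases the total area
step = Fraction(1, 100)
for da, db in [(step, 0), (-step, 0), (0, step), (0, -step)]:
    assert sum_of_squares(a_ls + da, b_ls + db) > best

print(best)  # 277/169
```

Exact fractions are used so that the comparison is not affected by floating-point rounding.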

Worked Example


Example: Given that the following data are known to be linearly correlated, calculate the regression line of $y$ on $x$:

\begin{array}{l|c|c|c|c} \text{Age of tree, } x \text{ (years)} & 7 & 24 & 12 & 19 \\ \hline \text{Height of tree, } y \text{ (m)} & 2.4 & 6 & 5 & 5.8 \\ \end{array}

Step 1: "$y$ on $x$" means "$y$ depends on $x$". Decide which variable depends on the other, then assign the dependent variable to $y$ and the independent variable to $x$.

\text{Height depends on age} \quad \Rightarrow \quad y \text{ on } x

This matters because the least squares method assumes that any variability (deviation from the line of best fit) occurs only in the dependent variable, $y$.

Step 2: With the variables assigned, use the least squares regression line formulae given in the formula booklet:

The regression coefficient of $y$ on $x$ is: $b = \dfrac{S_{xy}}{S_{xx}}$

Least squares regression line of $y$ on $x$ is: $y = a + bx \quad \text{where} \quad a = \bar{y} - b\bar{x}$
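The two booklet formulae can be sketched in Python as follows. This sketch assumes the deviation forms $S_{xy} = \sum (x - \bar{x})(y - \bar{y})$ and $S_{xx} = \sum (x - \bar{x})^2$, which are algebraically equal to the summation forms used in the working; `regression_y_on_x` is a hypothetical helper name.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least squares regression line of y on x, y = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviation forms of the summary statistics
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx        # regression coefficient of y on x
    a = y_bar - b * x_bar  # the line passes through (x_bar, y_bar)
    return a, b


a, b = regression_y_on_x([7, 24, 12, 19], [2.4, 6, 5, 5.8])
print(f"y = {a:.2f} + {b:.3f}x")
```

For the tree data this prints `y = 1.74 + 0.198x`.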

\begin{array}{c|c|c|c|c} x & y & x^2 & y^2 & xy \\ \hline 7 & 2.4 & 49 & 5.76 & 16.8 \\ 24 & 6 & 576 & 36 & 144 \\ 12 & 5 & 144 & 25 & 60 \\ 19 & 5.8 & 361 & 33.64 & 110.2 \\ \end{array}

Summing up the values:

\sum x = 62, \quad \sum y = 19.2, \quad \sum x^2 = 1130, \quad \sum y^2 = 100.4, \quad \sum xy = 331
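A quick way to check the column totals is to recompute them with exact fractions, which avoids floating-point rounding in values such as $2.4^2$. A minimal sketch using Python's standard `fractions` module:

```python
from fractions import Fraction

# Columns of the table: age x (years) and height y (m), as exact fractions
xs = [Fraction(v) for v in ("7", "24", "12", "19")]
ys = [Fraction(v) for v in ("2.4", "6", "5", "5.8")]

sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
```

Each total should match the corresponding value above.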

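Now, using $S_{xy} = \sum xy - \dfrac{\sum x \sum y}{n}$ and $S_{xx} = \sum x^2 - \dfrac{\left(\sum x\right)^2}{n}$ with $n = 4$: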

S_{xy} = 331 - \frac{62 \times 19.2}{4} = 33.4

S_{xx} = 1130 - \frac{62^2}{4} =169
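The same two summary statistics can be checked from the column totals. This sketch assumes the computational forms $S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$ and $S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$ with $n = 4$:

```python
from fractions import Fraction

n = 4
sum_x, sum_y = Fraction(62), Fraction("19.2")
sum_x2, sum_xy = Fraction(1130), Fraction(331)

# Computational forms of the summary statistics
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n

print(float(s_xy), float(s_xx))  # 33.4 169.0
```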

So:

b = \frac{33.4}{169} = \frac{167}{845}

Now for $a$ :

\overline {x}= \frac {62}{4}=15.5 \quad | \quad \overline {y}= \frac {19.2}{4}=4.8

\Rightarrow a = 4.8 - \frac{167}{845} \times 15.5 = \frac{587}{338}
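Working in exact fractions reproduces both coefficients. This is only a check of the arithmetic above, with the values of $S_{xy}$, $S_{xx}$, $\bar{x}$ and $\bar{y}$ typed in by hand:

```python
from fractions import Fraction

s_xy, s_xx = Fraction("33.4"), Fraction(169)
x_bar = Fraction(62, 4)        # 15.5
y_bar = Fraction("19.2") / 4   # 4.8

b = s_xy / s_xx
a = y_bar - b * x_bar

print(b, a)                                    # 167/845 587/338
print(round(float(a), 2), round(float(b), 3))  # 1.74 0.198
```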

Thus the equation of the regression line is:

\therefore y=a+bx

\boxed {y = \frac{587}{338} + \frac{167}{845}x}

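Least Squares Regression Line on a Calculator: entering the same data into a calculator's linear regression mode gives the coefficients directly (to 3 significant figures):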

\Rightarrow y = 1.74 + 0.198x

Unless you are required to show full working, this is the quicker method.

Notable Points

The least squares regression line is only a reliable predictor if:

We have evidence that the data is linearly correlated.
We use it to predict values within the range of the data we already have (predicting values outside this range is called extrapolation and is unreliable).

If we extrapolate beyond the range of the $x$ values, the regression line simply continues as a straight line.

However, if the actual relationship outside the range of the data were not linear, then the regression line would not be an appropriate predictor. This is because we cannot be sure the linear relationship holds beyond the data we have.
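The range condition can be expressed as a small check. This is a sketch only: `predict` is a hypothetical helper that evaluates $y = a + bx$ and flags when $x$ lies outside the observed range, here using the rounded tree-example line $y = 1.74 + 0.198x$ and the observed ages of $7$ to $24$ years.

```python
def predict(a, b, x, x_min, x_max):
    """Evaluate y = a + bx and flag whether x lies outside [x_min, x_max]."""
    y = a + b * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y, is_extrapolation


# Rounded tree-example line, fitted to ages from 7 to 24 years
for age in (15, 60):
    height, extrapolated = predict(1.74, 0.198, age, 7, 24)
    note = "extrapolation: unreliable" if extrapolated else "within the data range"
    print(f"age {age}: predicted height {height:.2f} m ({note})")
```

The flag does not make the age-60 prediction any more trustworthy; it only records that the line is being used outside the data.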

When it is unclear which variable depends on which, $x$ should be the control variable. For example, if you choose to measure the heights of trees every $2$ years, you are controlling time, so time should be $x$.


Example: Research was done to see if there is a relationship between finger dexterity and the ability to do work on a production line. The data is shown in the table.

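\begin{array}{l|c|c|c|c|c|c|c|c|c|c} \text{Dexterity score, } x & 2.5 & 3 & 3.5 & 4 & 5 & 5 & 5.5 & 6.5 & 7 & 8 \\ \hline \text{Productivity, } y & 80 & 130 & 100 & 220 & 190 & 210 & 270 & 290 & 350 & 400 \\ \end{array}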

The equation of the regression line for these data is $y = -59 + 57x$ .
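As a check (not required in the question), recomputing the line from the table with exact fractions gives $b = \frac{1640}{29} \approx 56.55$ and $a = -\frac{1704}{29} \approx -58.76$, so the given equation appears to use coefficients rounded to whole numbers. A minimal sketch:

```python
from fractions import Fraction

dexterity = [Fraction(v) for v in ("2.5", "3", "3.5", "4", "5", "5", "5.5", "6.5", "7", "8")]
productivity = [80, 130, 100, 220, 190, 210, 270, 290, 350, 400]

n = len(dexterity)
# Computational forms of S_xy and S_xx
s_xy = sum(x * y for x, y in zip(dexterity, productivity)) - sum(dexterity) * sum(productivity) / n
s_xx = sum(x * x for x in dexterity) - sum(dexterity) ** 2 / n

b = s_xy / s_xx
a = Fraction(sum(productivity), n) - b * sum(dexterity) / n

print(round(float(a)), round(float(b)))  # -59 57
```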

a. Use the equation to estimate the productivity of someone with a dexterity of $6$ .

b. Give an interpretation of the value of $57$ in the equation of the regression line.

c. State, giving in each case a reason, whether or not it would be reasonable to use this equation to work out the productivity of someone with dexterity of:

i) $2$

ii) $14$

Answers:

a. Let $x = 6$:

\Rightarrow y = -59 + 57(6) = 283
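The substitution in part (a) as a one-line function; `productivity` is a hypothetical name for the fitted line $y = -59 + 57x$:

```python
def productivity(dexterity):
    """Estimated productivity from the regression line y = -59 + 57x."""
    return -59 + 57 * dexterity


print(productivity(6))  # 283
```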

b. For each increase of $1$ in the dexterity score, the estimated productivity increases by $57$. (Give your answer in context.)
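One way to see why the gradient carries this interpretation: for a straight line, increasing $x$ by $1$ always changes the predicted $y$ by exactly $b$, wherever you start. A minimal sketch using the same hypothetical `productivity` function:

```python
def productivity(dexterity):
    """Estimated productivity from the regression line y = -59 + 57x."""
    return -59 + 57 * dexterity


# A 1-point increase in dexterity changes the estimate by the gradient, 57,
# whatever the starting score
for x in (3, 4.5, 7):
    print(x, productivity(x + 1) - productivity(x))
```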

c. i) Reasonable, as $2$ is only just outside the range of the data ($2.5$ to $8$), so the extrapolation is very slight.

ii) Unreasonable, as $14$ is well outside the range of the data. (Extrapolation is unreliable.)
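How "close" a value must be to the data range is a judgement call, but the distance itself is easy to measure. This sketch, with a hypothetical helper `distance_outside_range`, finds how far each dexterity score lies outside the observed range of $2.5$ to $8$:

```python
dexterity = [2.5, 3, 3.5, 4, 5, 5, 5.5, 6.5, 7, 8]
x_min, x_max = min(dexterity), max(dexterity)


def distance_outside_range(x):
    """How far x lies outside the observed range [x_min, x_max] (0 if inside)."""
    if x < x_min:
        return x_min - x
    if x > x_max:
        return x - x_max
    return 0


for x in (2, 14):
    print(x, distance_outside_range(x))
```

It gives $0.5$ for a score of $2$ and $6$ for a score of $14$, which supports the answers to part (c).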
