The Bivariate Scatterplot: Revision Notes for HSC SSCE Mathematics Standard

The Bivariate Scatterplot

Introduction to bivariate data and scatterplots

When working with statistics, you will often need to explore relationships between two different measurements or variables. This type of data is called bivariate data - data that involves two variables measured together.

A scatterplot (also called a scatter diagram) is a visual tool that helps us determine whether a relationship exists between two numerical variables. Each data point on a scatterplot represents a pair of measurements, plotted as a dot on a graph.

For example, if you wanted to investigate whether there is a connection between a person's height and their arm span, you would collect measurements for both variables from several people. Each person would provide an ordered pair of numbers (height, arm span), which you would then plot on a scatterplot to look for patterns.

Note

The term "bivariate" comes from "bi-" meaning two and "variate" referring to variables. This distinguishes it from univariate data (one variable) or multivariate data (more than two variables).

The table above shows bivariate data for $15$ people, with height and arm span measurements recorded in centimetres.

Constructing a scatterplot

Follow these steps to create a scatterplot from bivariate data:

Step 1: Draw a number plane

Begin by drawing a set of perpendicular axes (horizontal and vertical lines that cross at right angles).

Step 2: Set up the horizontal axis

Determine an appropriate scale based on the range of your first variable
Add a clear title that identifies what the axis represents

Step 3: Set up the vertical axis

Choose a suitable scale for the range of your second variable
Label the axis with a descriptive title

Step 4: Plot the data points

For each ordered pair in your data, locate the correct position on the graph
Mark each pair with a dot at the intersection of the two values
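The four steps above can also be sketched in code. This is a minimal, text-only illustration in Python, assuming the nine (height, mass) pairs from the worked example later in these notes. The function name `text_scatterplot` and the grid size are illustrative choices, not part of the syllabus.

```python
def text_scatterplot(pairs, width=30, height=10):
    """Return a list of text rows with a '*' for each ordered pair.

    The horizontal axis is scaled to the range of the first variable
    and the vertical axis to the range of the second (Steps 2 and 3).
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_min, y_min = min(xs), min(ys)
    # Guard against a zero range so the scaling never divides by zero.
    x_range = (max(xs) - x_min) or 1
    y_range = (max(ys) - y_min) or 1

    # Step 1: an empty grid stands in for the number plane.
    grid = [[" "] * width for _ in range(height)]

    # Step 4: place a dot where the two values of each pair intersect.
    for x, y in pairs:
        col = round((x - x_min) / x_range * (width - 1))
        row = round((y - y_min) / y_range * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]


# (height in cm, mass in kg) pairs from the worked example in these notes
height_mass = [(163, 55), (165, 60), (170, 64), (175, 66), (178, 65),
               (180, 70), (182, 71), (186, 74), (190, 78)]

for line in text_scatterplot(height_mass):
    print(line)
```

Pairs that land in the same grid cell share one dot, so a coarse grid can hide points. On paper, a finer scale has the same effect as a larger `width` and `height` here.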

Here is the completed scatterplot for the height and arm span data:

Note

Notice how the dots show a clear pattern - as height increases, arm span tends to increase as well. This visual pattern suggests there is a relationship between these two variables.

Reading and interpreting scatterplots

Being able to extract information from a scatterplot is an essential skill. Let's look at a worked example.

Example

Worked Example: Reading a scatterplot

The scatterplot below displays data from $10$ countries, comparing their average numeracy scores for year $6$ students with their general internet usage rate (expressed as a percentage).

Questions:

a) What is the scale for the vertical axis?

To determine the scale, count the number of divisions between two marked values. Between $0$ and $50$, there are $5$ divisions. Therefore:

$\text{One division} = \frac{50}{5} = 10$

Each division on the vertical axis represents $10$ score points.
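The division-counting rule in part (a) is a single subtraction and division. A small Python sketch, with the hypothetical helper name `value_per_division`, makes the arithmetic explicit:

```python
def value_per_division(lower, upper, divisions):
    """Value represented by one division between two marked axis values."""
    if divisions <= 0:
        raise ValueError("divisions must be a positive whole number")
    return (upper - lower) / divisions


# Part (a): 5 divisions between the marked values 0 and 50.
print(value_per_division(0, 50, 5))  # 10.0
```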

b) What is the average numeracy score for the country with 24% internet use?

Find the dot positioned at 24% on the horizontal axis, then read across to the vertical axis. The numeracy score is 120.

c) What is the internet use percentage for the country with an average numeracy score of 160?

Locate the dot at $160$ on the vertical axis, then read down to the horizontal axis. The internet use is 50%.

d) How many countries have internet use of less than 50%?

Count all the dots on the left-hand side of the 50% mark. There are 6 countries.

e) How many countries have a numeracy score greater than 100?

Count the dots in the upper portion of the graph above the $100$ mark. There are 6 countries.

f) Is there a relationship between these two variables?

Looking at the pattern of dots, we can observe that when internet use exceeds 20%, there is a clear upward trend - both variables increase together. However, this pattern is less evident when internet use is below 20%. Overall, there appears to be a relationship between these variables, particularly at higher internet usage rates.

Identifying relationships in scatterplots

No relationship

When data points are randomly scattered across the plot with no discernible pattern, this indicates there is no relationship (or no association) between the variables.

In the example above, the dots are spread randomly with no clear pattern, suggesting the two variables are not related.

Types of association

When a clear pattern exists in the scatterplot, we say there is an association between the variables. To describe this association fully, we need to consider three characteristics: form, direction, and strength.

Important

The three scatterplots above all show clear patterns, but each pattern is different. We need specific vocabulary to describe these differences accurately. Always describe an association using all three characteristics: form, direction (if applicable), and strength.

Form of association

The form describes the shape of the pattern formed by the data points.

Linear form

When the points approximately follow a straight line, the association has a linear form.

The left scatterplot shows a linear pattern where points cluster around an imaginary straight line.

Non-linear form

When the points follow a curved pattern rather than a straight line, the association has a non-linear form.

Note

The example above shows points arranged in a curved arc, indicating a non-linear relationship. Non-linear patterns can take many forms including curves, exponential growth, or cyclical patterns.

Direction of association

For associations with linear form, we can describe the direction based on whether the line slopes upward or downward.

Positive association

A positive association exists when the imaginary line through the points has a positive gradient. In practical terms, this means as one variable increases, the other variable also tends to increase. The dots trend upward from left to right.

The scatterplot shows a positive association - both variables increase together.

Negative association

A negative association exists when the imaginary line through the points has a negative gradient. This means as one variable increases, the other variable tends to decrease. The dots trend downward from left to right.

Note

The scatterplot above demonstrates a negative association where one variable decreases as the other increases. This is also sometimes called an inverse relationship.
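Direction can also be checked numerically. The sketch below is not the by-eye method these notes describe: it uses the sign of the sum of products of deviations from the means, which matches the sign of the gradient of a least-squares line. The function name `direction` is hypothetical, and the data are the (height, mass) pairs from the worked example later in these notes.

```python
def direction(pairs):
    """Classify the direction of a linear association as a string."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Positive when the variables tend to rise together, negative when
    # one tends to fall as the other rises.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if s_xy > 0:
        return "positive"
    if s_xy < 0:
        return "negative"
    return "no linear direction"


# (height in cm, mass in kg) pairs from the worked example in these notes
height_mass = [(163, 55), (165, 60), (170, 64), (175, 66), (178, 65),
               (180, 70), (182, 71), (186, 74), (190, 78)]

print(direction(height_mass))  # positive
```

This check only makes sense for associations with linear form. For a curved pattern the sign can be misleading, so look at the scatterplot first.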

Strength of association

The strength of an association measures how closely the data points cluster around the pattern. The tighter the clustering, the more confident we can be in predictions made from the pattern. Below you can see strong, moderate, and weak associations (in that order).

Strong association

In a strong association, the dots form a tight cluster following a single, clear stream. There is minimal scatter, and the pattern is very obvious.

The points lie very close to an imaginary line (for linear associations) or curve (for non-linear associations).

Moderate association

In a moderate association, there is more scatter in the data points. The pattern is still visible but less distinct than in a strong association. The points are more spread out around the line or curve.

Weak association

In a weak association, the scatter increases significantly. The pattern becomes much less clear, and the linear (or non-linear) form is less evident. Points are widely dispersed.
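These notes judge strength by eye. As a numerical companion, the sketch below computes Pearson's correlation coefficient $r$, a standard measure of linear strength that these notes do not otherwise use. The cut-offs in `describe_strength` (0.75, 0.5, and 0.25) are illustrative assumptions only, since textbooks draw these boundaries differently. The data are the (height, mass) pairs from the worked example later in these notes.

```python
from math import sqrt


def pearson_r(pairs):
    """Pearson's correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when a variable never changes")
    return s_xy / sqrt(s_xx * s_yy)


def describe_strength(r, strong=0.75, moderate=0.5, weak=0.25):
    """Label the strength of a linear association (assumed cut-offs)."""
    size = abs(r)  # strength ignores direction
    if size >= strong:
        return "strong"
    if size >= moderate:
        return "moderate"
    if size >= weak:
        return "weak"
    return "little or no linear association"


# (height in cm, mass in kg) pairs from the worked example in these notes
height_mass = [(163, 55), (165, 60), (170, 64), (175, 66), (178, 65),
               (180, 70), (182, 71), (186, 74), (190, 78)]

r = pearson_r(height_mass)
print(round(r, 2), describe_strength(r))  # 0.97 strong
```

The value of $r$ only describes how tightly points follow a straight line, so a strong curved pattern can still give a small $r$.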

Important

Exam tip: When describing an association in an exam, you should always mention the form (linear/non-linear), direction (positive/negative - if linear), and strength (strong/moderate/weak).

Worked example: Describing a bivariate dataset

Example

The table below shows height (in cm) and mass (in kg) for nine people.

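Height, $h$ (cm): 163, 165, 170, 175, 178, 180, 182, 186, 190
Mass, $m$ (kg): 55, 60, 64, 66, 65, 70, 71, 74, 78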

a) Construct a scatterplot using the data

Following our four-step process:

Draw a number plane with $h$ (height) on the horizontal axis and $m$ (mass) on the vertical axis
Use a scale where each unit represents $1$ cm for height
Use a scale where each unit represents $1$ kg for mass
Plot each ordered pair: $(163, 55)$, $(165, 60)$, $(170, 64)$, $(175, 66)$, $(178, 65)$, $(180, 70)$, $(182, 71)$, $(186, 74)$, $(190, 78)$

b) Describe the form of the association

The points approximately follow a straight line, so the association has a linear form.

c) Describe the direction of the association

The gradient of the imaginary line is positive - the dots trend upward from left to right. Therefore, this is a positive association.

d) Describe the strength of the association

There is only a small amount of scatter in the points. They cluster closely around the linear pattern. This is a strong association.

Complete description: Strong, positive, linear association.

e) Predict the mass of a person who is 173 cm tall

Draw an imaginary vertical line up from $173$ cm on the horizontal axis until it meets the imaginary straight line through the dots, then read across to the vertical axis. The predicted mass is approximately 65 kg.

f) Predict the height of a person who has a mass of 75 kg

Draw an imaginary horizontal line from $75$ kg on the vertical axis. Where it intersects the pattern, read down to the horizontal axis. The predicted height is approximately 187 cm.
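The predictions in parts (e) and (f) are read by eye from the graph. As a rough cross-check, the sketch below fits a least-squares line to the nine data points and reads both values from it. Least squares is a swapped-in technique, not the method used in this worked example, and the function name `fit_line` is hypothetical.

```python
def fit_line(pairs):
    """Least-squares gradient and intercept for y = gradient * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    gradient = s_xy / s_xx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# (height in cm, mass in kg) pairs from part (a)
height_mass = [(163, 55), (165, 60), (170, 64), (175, 66), (178, 65),
               (180, 70), (182, 71), (186, 74), (190, 78)]

gradient, intercept = fit_line(height_mass)

# Part (e): go up from 173 cm to the line, then across to the mass axis.
mass_at_173 = gradient * 173 + intercept
# Part (f): go across from 75 kg to the same line, then down to the height axis.
height_at_75 = (75 - intercept) / gradient

print(round(mass_at_173, 1), round(height_at_75, 1))  # 64.3 187.2
```

Both values sit close to the by-eye estimates of 65 kg and 187 cm. Small differences are expected, because a line drawn by eye will not match the least-squares line exactly.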

Independent and dependent variables

In many bivariate datasets, one variable influences or affects the other. We classify these as independent and dependent variables.

Independent variable

The independent variable is the input variable. It is not affected by the other variable. Think of it as the variable you control or choose.

Represented on the horizontal axis (x-axis)
Usually comes first in ordered pairs
Often represents time, distance, or a controlled factor

Dependent variable

The dependent variable is the output variable. Its value depends on or is influenced by the independent variable.

Represented on the vertical axis (y-axis)
Usually comes second in ordered pairs
Often represents a response, result, or measurement

Example

Worked Example: Identifying variables

The table below shows the time taken (in hours) to travel various distances (in kilometres).

Distance, $d$ (km): 0, 10, 20, 30, 40, 50
Time, $t$ (hours): 0, 0.25, 0.38, 0.59, 0.82, 1.00

a) Draw a scatterplot

Draw a number plane with $d$ (distance) on the horizontal axis and $t$ (time) on the vertical axis
Use a scale of $10$ km per unit on the horizontal axis
Use a scale of $0.2$ hours per unit on the vertical axis
Plot the points: $(0, 0)$, $(10, 0.25)$, $(20, 0.38)$, $(30, 0.59)$, $(40, 0.82)$, $(50, 1.00)$

b) Identify the independent and dependent variables

The independent variable is distance $(d)$ because it is the input - we choose how far to travel. It appears on the horizontal axis.

The dependent variable is time $(t)$ because it depends on the distance travelled - the time taken is determined by how far we go. It appears on the vertical axis.
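One way to keep the two roles straight in code is to store each ordered pair as (independent, dependent) and split the pairs into two lists, one for each axis. A minimal Python sketch using the values from part (a); the names `trips`, `distance_km`, and `time_h` are illustrative.

```python
# Ordered pairs (distance in km, time in hours) from part (a).
trips = [(0, 0), (10, 0.25), (20, 0.38), (30, 0.59), (40, 0.82), (50, 1.00)]

# The first value in each pair is the independent variable (horizontal
# axis); the second is the dependent variable (vertical axis).
distance_km = [d for d, _ in trips]
time_h = [t for _, t in trips]

print(distance_km)  # [0, 10, 20, 30, 40, 50]
print(time_h)       # [0, 0.25, 0.38, 0.59, 0.82, 1.0]
```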

Important

Exam tip: Think about cause and effect. The independent variable is the cause (what you change), and the dependent variable is the effect (what changes as a result).

Summary

Key Points to Remember:

Bivariate data involves two variables measured together for each observation
A scatterplot is a graph that displays bivariate data as dots to help identify relationships
To construct a scatterplot: draw axes, set appropriate scales with titles, and plot each ordered pair as a dot
Describe associations using three characteristics: form (linear or non-linear), direction (positive or negative), and strength (strong, moderate, or weak)
The independent variable (input) goes on the horizontal axis; the dependent variable (output) goes on the vertical axis
No discernible pattern in a scatterplot suggests there is no relationship (no association) between the variables
