Bivariate Data – Classifying the Variables Revision Notes for VCE SSCE General Mathematics

Introduction to bivariate data

Up until now, you have focused on analysing single variables individually. For example, you might have investigated questions like "What is the favourite colour of students?" or "How do the weights of fish vary?" These questions involve examining just one variable at a time.

However, many real-world questions require us to examine how two variables relate to each other. Consider these examples:

Does a new headache treatment work more quickly than an old treatment?
Are city voters more likely to vote for the Greens party than country voters?
Can we predict a student's test score from the time they spent studying?

These questions cannot be answered by looking at each variable separately. Instead, we need to investigate how the two variables are associated or linked. Data that records two variables for each individual, so that we can investigate how they vary together, is called bivariate data.

Note

Bivariate data is data in which two variables are recorded for each individual, so that we can investigate whether and how the two variables are associated.

Classifying variables as categorical or numerical

Before we can investigate the association between two variables, we need to identify what type of variables we are working with. There are two main types:

Categorical variables

Categorical variables produce data values that are names or labels rather than numbers.

Examples:

Favourite pet (dog, cat, rabbit, bird)
Coffee size (small, medium, large)
Type of treatment (new, old)
Place of residence (city, country)

Numerical variables

Numerical variables produce data values that are numbers, typically from counting or measuring.

Examples:

Number of brothers ($0, 1, 2, \ldots$)
Hand span (measured in cm)
Time taken for headache relief (measured in minutes)
Test score (measured as a percentage)

Important

Categorical variables can be further classified as nominal or ordinal. Numerical variables can be further classified as discrete or continuous.
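
The categorical/numerical distinction can be sketched in a few lines of Python. This is only an illustrative rule of thumb, not part of the course material: the function name `classify_variable` and the sample values are assumptions made up for this example. The rule treats a variable as numerical when every recorded value is a number, and as categorical otherwise.

```python
from numbers import Number


def classify_variable(values):
    """Return 'numerical' if every value is a number, otherwise 'categorical'.

    Hypothetical helper for illustration only.
    """
    # bool is a subclass of int in Python, so exclude it explicitly:
    # True/False behave like 'yes'/'no' labels, not counts or measurements.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "numerical"
    return "categorical"


# Made-up sample values for two of the variables listed above.
coffee_size = ["small", "large", "medium", "small"]
hand_span_cm = [18.5, 20.1, 19.0, 21.3]

print(classify_variable(coffee_size))   # categorical
print(classify_variable(hand_span_cm))  # numerical
```

A rule like this looks only at how the values are stored. Labels that happen to be coded as numbers (for example, 1 for 'city' and 2 for 'country') would be misclassified, so the final decision should always come from what the variable actually represents.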

Types of associations between variables

Once we have classified each variable as categorical or numerical, we can identify the type of association we are investigating. There are three possible combinations:

1. Association between a categorical variable and a numerical variable

Example question: Does the new treatment for headache work more quickly than the old treatment?

Variable 1: Type of treatment (categorical: 'new' or 'old')
Variable 2: Time taken for headache to be relieved (numerical: measured in minutes)

This is an investigation of the association between a categorical variable and a numerical variable.

2. Association between two categorical variables

Example question: Are city voters more likely to vote for the Greens party than country voters?

Variable 1: Place of residence (categorical: 'city' or 'country')
Variable 2: Vote for the Greens (categorical: 'yes' or 'no')

This is an investigation of the association between two categorical variables.

3. Association between two numerical variables

Example question: Can we predict a student's test score from time spent studying for the test?

Variable 1: Time spent studying (numerical: measured in hours)
Variable 2: Test score (numerical: measured as a percentage)

This is an investigation of the association between two numerical variables.

Note

Three Types of Associations:

Categorical and numerical variables
Two categorical variables
Two numerical variables
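
The three combinations can also be written as a small lookup. The sketch below assumes each variable has already been classified as 'categorical' or 'numerical'; the function name `association_type` is hypothetical, and the three calls reuse the example questions from this section.

```python
def association_type(kind_1, kind_2):
    """Name the type of association for two already-classified variables.

    Each argument must be 'categorical' or 'numerical'.
    Hypothetical helper for illustration only.
    """
    # Sorting makes the order of the two arguments irrelevant.
    kinds = sorted([kind_1, kind_2])
    if kinds == ["categorical", "categorical"]:
        return "two categorical variables"
    if kinds == ["numerical", "numerical"]:
        return "two numerical variables"
    if kinds == ["categorical", "numerical"]:
        return "a categorical variable and a numerical variable"
    raise ValueError("each kind must be 'categorical' or 'numerical'")


# Type of treatment (categorical) and time to relief (numerical)
print(association_type("categorical", "numerical"))
# Place of residence (categorical) and vote for the Greens (categorical)
print(association_type("categorical", "categorical"))
# Time spent studying (numerical) and test score (numerical)
print(association_type("numerical", "numerical"))
```

The order in which the two variables are given does not matter here: the type of association depends only on the pair of classifications.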

Identifying response and explanatory variables

When investigating associations between variables, it is helpful to identify which variable explains changes in the other variable.

Key definitions

Explanatory variable (EV): The variable we use to explain or predict the value of the response variable.

Response variable (RV): The variable that responds to or is predicted by the explanatory variable.

Note

Alternative names:

The explanatory variable is sometimes called the independent variable (IV)
The response variable is sometimes called the dependent variable (DV)

How to identify which is which

Think about the question you are asking. The variable you are using to explain or predict is the explanatory variable. The variable being explained or predicted is the response variable.

Example 1:

Question: Are city voters more likely to vote for the Greens party than country voters?

This question suggests that knowing a person's place of residence might help explain their voting preference.

Explanatory variable: Place of residence
Response variable: Vote for Greens

Example 2:

Question: Does the time it takes a student to get to school depend on their mode of transport?

The question suggests that mode of transport might explain differences in travel time.

Explanatory variable: Mode of transport
Response variable: Time taken to get to school

Example 3:

Question: Can we predict people's height from their wrist measurement?

Here we want to use wrist measurement to predict height.

Explanatory variable: Wrist measurement
Response variable: Height

Important consideration

Sometimes the choice of explanatory and response variables depends on how we phrase the question. For instance, we could reverse Example 3 and ask: "Can we predict people's wrist measurement from their height?" In this case:

Explanatory variable: Height
Response variable: Wrist measurement

Important

The way we ask our statistical question determines which variable is explanatory and which is the response, especially when there is no obvious cause-and-effect relationship.
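
To make the role swap concrete, here is a minimal sketch that stores the two roles for an investigation and reverses them, as in the wrist measurement and height example. The class name `Investigation` and the method `swap_roles` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Investigation:
    """Hypothetical record of the two roles in a statistical question."""

    explanatory: str  # the variable used to explain or predict
    response: str     # the variable being explained or predicted

    def swap_roles(self):
        # Rephrasing the question the other way round exchanges the roles.
        return Investigation(explanatory=self.response,
                             response=self.explanatory)


# "Can we predict people's height from their wrist measurement?"
predict_height = Investigation(explanatory="wrist measurement",
                               response="height")

# "Can we predict people's wrist measurement from their height?"
predict_wrist = predict_height.swap_roles()

print(predict_height.explanatory, "->", predict_height.response)
print(predict_wrist.explanatory, "->", predict_wrist.response)
```

The data are the same in both cases; only the question, and therefore the assignment of roles, has changed.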

Worked example: Classifying associations

Question: For each of the following questions, determine whether it involves investigating an association between:

One numerical variable and one categorical variable, or
Two categorical variables, or
Two numerical variables

a) Are younger people more likely to believe in astrology than older people?

Age is measured in years
Belief in astrology is measured as 'yes' or 'no'

b) Do people who weigh more tend to have higher blood pressure?

Weight is measured in kg
Blood pressure is measured in mmHg

c) Are people who have a driver's licence more likely to be in favour of lowering the driving age?

Having a driver's licence is measured as 'yes' or 'no'
Support for lowering the driving age is measured as 'yes' or 'no'

Solution:

a) One numerical variable (age measured in years) and one categorical variable (belief in astrology: yes/no)

b) Two numerical variables (weight in kg and blood pressure in mmHg)

c) Two categorical variables (have a driver's licence: yes/no and support for lowering the driving age: yes/no)

Summary

Key Points to Remember:

Bivariate data involves two variables that are associated or linked together
Categorical variables produce names or labels as data values
Numerical variables produce numbers from counting or measuring
There are three types of associations to investigate:
- Categorical and numerical
- Two categorical variables
- Two numerical variables
The explanatory variable (EV) is used to explain or predict the response variable (RV)
How we phrase our question helps determine which variable is explanatory and which is the response
