Bivariate Data – Classifying the Variables (VCE SSCE General Mathematics): Revision Notes
Introduction to bivariate data
Up until now, you have focused on analysing single variables individually. For example, you might have investigated questions like "What is the favourite colour of students?" or "How do the weights of fish vary?" These questions involve examining just one variable at a time.
However, many real-world questions require us to examine how two variables relate to each other. Consider these examples:
- Does a new headache treatment work more quickly than an old treatment?
- Are city voters more likely to vote for the Greens party than country voters?
- Can we predict a student's test score from the time they spent studying?
These questions cannot be answered by looking at variables separately. Instead, we need to investigate how the two variables are associated or linked together. When two variables vary together in this way, we call the data bivariate data.
Bivariate data is data that involves two variables that are linked or associated in some way, so they vary together.
Classifying variables as categorical or numerical
Before we can investigate the association between two variables, we need to identify what type of variables we are working with. There are two main types:
Categorical variables
Categorical variables produce data values that are names or labels rather than numbers.
Examples:
- Favourite pet (dog, cat, rabbit, bird)
- Coffee size (small, medium, large)
- Type of treatment (new, old)
- Place of residence (city, country)
Numerical variables
Numerical variables produce data values that are numbers, typically from counting or measuring.
Examples:
- Number of brothers (a count: 0, 1, 2, and so on)
- Hand span (measured in cm)
- Time taken for headache relief (measured in minutes)
- Test score (measured as a percentage)
Categorical variables can be further classified as nominal or ordinal. Numerical variables can be further classified as discrete or continuous.
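The distinction above can be sketched in Python as a rough rule of thumb: if every data value is a number, treat the variable as numerical, otherwise as categorical. The function name `classify_variable` and the sample values are made up for illustration. The rule is only a heuristic, since numbers used purely as labels (such as postcodes) are still categorical data.

```python
from numbers import Number


def classify_variable(values):
    """Return 'numerical' if every value is a number, otherwise 'categorical'."""
    # bool is a subclass of int in Python, so exclude it explicitly:
    # True/False behave like yes/no labels, not like counts or measurements.
    all_numbers = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "numerical" if all_numbers else "categorical"


# Illustrative, made-up data values for three variables from the examples.
favourite_pet = ["dog", "cat", "rabbit", "bird"]
hand_span_cm = [18.5, 20.1, 19.0, 21.3]
number_of_brothers = [0, 2, 1, 3]

print(classify_variable(favourite_pet))       # categorical
print(classify_variable(hand_span_cm))        # numerical
print(classify_variable(number_of_brothers))  # numerical
```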
Types of associations between variables
Once we have classified each variable as categorical or numerical, we can identify the type of association we are investigating. There are three possible combinations:
1. Association between a categorical variable and a numerical variable
Example question: Does the new treatment for headache work more quickly than the old treatment?
- Variable 1: Type of treatment (categorical: 'new' or 'old')
- Variable 2: Time taken for headache to be relieved (numerical: measured in minutes)
This is an investigation of the association between a categorical variable and a numerical variable.
2. Association between two categorical variables
Example question: Are city voters more likely to vote for the Greens party than country voters?
- Variable 1: Place of residence (categorical: 'city' or 'country')
- Variable 2: Vote for the Greens (categorical: 'yes' or 'no')
This is an investigation of the association between two categorical variables.
3. Association between two numerical variables
Example question: Can we predict a student's test score from time spent studying for the test?
- Variable 1: Time spent studying (numerical: measured in hours)
- Variable 2: Test score (numerical: measured as a percentage)
This is an investigation of the association between two numerical variables.
Three Types of Associations:
- Categorical and numerical variables
- Two categorical variables
- Two numerical variables
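As a minimal sketch, the three combinations can be written as a small Python lookup. The function name `association_type` is hypothetical; it takes the type of each variable ('categorical' or 'numerical') and names the kind of association, regardless of the order in which the two types are given.

```python
def association_type(type_1, type_2):
    """Name the kind of association for two variable types."""
    # Sorting makes the result independent of the order of the two variables.
    kinds = sorted([type_1, type_2])
    if kinds == ["categorical", "categorical"]:
        return "two categorical variables"
    if kinds == ["numerical", "numerical"]:
        return "two numerical variables"
    if kinds == ["categorical", "numerical"]:
        return "a categorical variable and a numerical variable"
    raise ValueError("each type must be 'categorical' or 'numerical'")


# Type of treatment (categorical) and time to relief (numerical)
print(association_type("categorical", "numerical"))
# Place of residence (categorical) and vote for the Greens (categorical)
print(association_type("categorical", "categorical"))
# Time spent studying (numerical) and test score (numerical)
print(association_type("numerical", "numerical"))
```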
Identifying response and explanatory variables
When investigating associations between variables, it is helpful to identify which variable explains changes in the other variable.
Key definitions
Explanatory variable (EV): The variable we use to explain or predict the value of the response variable.
Response variable (RV): The variable that responds to or is predicted by the explanatory variable.
Alternative names:
- The explanatory variable is sometimes called the independent variable (IV)
- The response variable is sometimes called the dependent variable (DV)
How to identify which is which
Think about the question you are asking. The variable you are using to explain or predict is the explanatory variable. The variable being explained or predicted is the response variable.
Example 1:
Question: Are city voters more likely to vote for the Greens party than country voters?
This question suggests that knowing a person's place of residence might help explain their voting preference.
- Explanatory variable: Place of residence
- Response variable: Vote for Greens
Example 2:
Question: Does the time it takes a student to get to school depend on their mode of transport?
The question suggests that mode of transport might explain differences in travel time.
- Explanatory variable: Mode of transport
- Response variable: Time taken to get to school
Example 3:
Question: Can we predict people's height from their wrist measurement?
Here we want to use wrist measurement to predict height.
- Explanatory variable: Wrist measurement
- Response variable: Height
Important consideration
Sometimes the choice of explanatory and response variables depends on how we phrase the question. For instance, we could reverse Example 3 and ask: "Can we predict people's wrist measurement from their height?" In this case:
- Explanatory variable: Height
- Response variable: Wrist measurement
The way we ask our statistical question determines which variable is explanatory and which is the response, especially when there is no obvious cause-and-effect relationship.
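One way to picture this is to store a question as a pair of roles and swap them. The sketch below is illustrative only: the class `Question` and its methods `phrase` and `swap_roles` are hypothetical names, and the variables come from Example 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    explanatory: str  # the variable used to explain or predict
    response: str     # the variable being explained or predicted

    def phrase(self):
        """Word the question so that the roles are visible."""
        return f"Can we predict {self.response} from {self.explanatory}?"

    def swap_roles(self):
        """Return the reversed question: the two variables exchange roles."""
        return Question(explanatory=self.response, response=self.explanatory)


q = Question(explanatory="wrist measurement", response="height")
print(q.phrase())               # Can we predict height from wrist measurement?
print(q.swap_roles().phrase())  # Can we predict wrist measurement from height?
```

The data are the same in both cases; only the wording of the question changes which variable is treated as explanatory.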
Worked example: Classifying associations
Question: For each of the following questions, determine whether it involves investigating an association between:
- One numerical variable and one categorical variable, or
- Two categorical variables, or
- Two numerical variables
a) Are younger people more likely to believe in astrology than older people?
- Age is measured in years
- Belief in astrology is measured as 'yes' or 'no'
b) Do people who weigh more tend to have higher blood pressure?
- Weight is measured in kg
- Blood pressure is measured in mmHg
c) Are people who have a driver's licence more likely to be in favour of lowering the driving age?
- Having a driver's licence is measured as 'yes' or 'no'
- Support for lowering the driving age is measured as 'yes' or 'no'
Solution:
a) One numerical variable (age measured in years) and one categorical variable (belief in astrology: yes/no)
b) Two numerical variables (weight in kg and blood pressure in mmHg)
c) Two categorical variables (have a driver's licence: yes/no and support for lowering the driving age: yes/no)
Summary
Key Points to Remember:
- Bivariate data involves two variables that are associated or linked together
- Categorical variables produce names or labels as data values
- Numerical variables produce numbers from counting or measuring
- There are three types of associations to investigate:
  - Categorical and numerical
  - Two categorical variables
  - Two numerical variables
- The explanatory variable (EV) is used to explain or predict the response variable (RV)
- How we phrase our question helps determine which variable is explanatory and which is the response