Terminology Revision Notes for Leaving Cert Mathematics

Statistics Terminology

Understanding populations and samples

In statistics, we need to distinguish between the group we want to learn about and the group we actually study.

Population refers to the complete group of individuals or items that we want to investigate. This could be all Leaving Cert students in Ireland, all cars manufactured in 2023, or all trees in a particular forest. The population represents everyone or everything we're interested in understanding.

Sample means a smaller subset taken from the population for practical study. Since we usually cannot examine every member of a population (imagine trying to survey every Leaving Cert student!), we select a manageable group to represent the whole.

Note

For example, if we want to understand the study habits of all 60,000 Leaving Cert students in Ireland (the population), we might survey 200 randomly chosen students (the sample) to draw conclusions about the entire group.
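As a rough illustration of the survey described above, the following Python sketch draws a simple random sample of 200 from a population of 60,000. The ID numbers and the fixed seed are assumptions made for the illustration, not part of any real survey.

```python
import random

# Hypothetical population: ID numbers 1..60000 standing in for the students.
population = range(1, 60_001)

# A seeded generator makes the illustration repeatable; a real survey would not fix the seed.
rng = random.Random(2024)

# random.sample draws without replacement, so no student is picked twice.
sample = rng.sample(population, k=200)

print(len(sample))         # number of students surveyed
print(sorted(sample)[:5])  # a few of the chosen ID numbers
```

Because every ID has the same chance of being chosen, no group of students is favoured by the selection itself.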

A representative sample accurately reflects the characteristics of the entire population. This means the different groups should appear in the sample in roughly the same proportions as they do in the population.

Important

Bias occurs when certain groups are over-represented or under-represented in our sample, leading to inaccurate conclusions about the population. Bias is one of the biggest threats to valid statistical conclusions.

Types of data

Understanding how to classify data helps us choose appropriate analysis methods. Data falls into two main categories, each with important subcategories.

Categorical data consists of information sorted into distinct categories or groups.

Nominal data has no natural ordering between categories. Examples include colours, gender, or favourite subjects. You cannot rank these meaningfully.
Ordinal data has categories with a natural order or ranking. Examples include satisfaction ratings (poor, fair, good, excellent) or finishing positions in a race.

Numerical data represents measured or counted quantities.

Discrete data consists of countable values, often whole numbers. Examples include the number of students in a class, goals scored in a match, or books owned.
Continuous data can take any value within a range and is typically measured rather than counted. Examples include height, weight, time, or temperature.

Note

Data Classification Hierarchy: The distinction between these data types determines which statistical methods and graphs are appropriate for analysis. Always identify your data type before choosing analysis techniques.

Sampling methods and errors

Different methods exist for selecting samples, including simple random, systematic, stratified, and quota sampling. Each method has specific advantages and applications depending on the research context.
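Two of these methods can be sketched in a few lines of Python. The function names, the school of 100 students, and the 10% sampling fraction are all hypothetical choices for the illustration. Quota sampling is not shown, since it depends on the interviewer's choices rather than on random selection.

```python
import random

def systematic_sample(items, step, rng):
    """Pick a random starting point, then take every `step`-th item."""
    start = rng.randrange(step)
    return items[start::step]

def stratified_sample(strata, fraction, rng):
    """Take the same fraction from each group (stratum) by simple random sampling."""
    chosen = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)
        chosen[name] = rng.sample(members, k)
    return chosen

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical school of 100 students: 60 in fifth year, 40 in sixth year.
fifth = [f"5th-{i}" for i in range(60)]
sixth = [f"6th-{i}" for i in range(40)]

every_tenth = systematic_sample(fifth + sixth, step=10, rng=rng)
by_year = stratified_sample({"fifth": fifth, "sixth": sixth}, fraction=0.1, rng=rng)

print(len(every_tenth))                         # 10
print({k: len(v) for k, v in by_year.items()})  # {'fifth': 6, 'sixth': 4}
```

The stratified sample keeps the 60:40 split between year groups, which is what makes it representative with respect to year.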

Sampling error represents the natural difference between what we find in our sample and the true population value. This occurs simply because we're studying part of the population, not all of it.

Non-sampling error includes mistakes unrelated to sample selection, such as measurement errors, recording mistakes, or response bias.

Note

Even with perfect sampling methods, sampling error will always exist. The key is to minimise it through an adequate sample size and sound selection techniques, while keeping non-sampling errors as small as possible through careful data collection procedures.
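A small simulation can make sampling error concrete. The sketch below is only an illustration: the population of 10,000 scores is randomly generated, and the helper name `sampling_error` is hypothetical.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 test scores between 0 and 100.
population = [rng.randint(0, 100) for _ in range(10_000)]
true_mean = statistics.mean(population)

def sampling_error(sample_size):
    """Absolute difference between a sample mean and the population mean."""
    sample = rng.sample(population, sample_size)
    return abs(statistics.mean(sample) - true_mean)

for n in (10, 100, 1_000, len(population)):
    print(n, round(sampling_error(n), 2))
```

The error is exactly zero only when the sample is the whole population. For smaller samples it changes from draw to draw, but it tends to shrink as the sample size grows.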

Measures of central tendency

These statistics help us identify the "typical" or "average" value in our data. Each measure has specific uses and limitations.

Mean calculates the arithmetic average by adding all values and dividing by the number of observations. The mean can be significantly affected by extreme values (outliers), which may make it less representative of typical values.


Worked Example: Calculating the Mean

Test scores: 65, 70, 72, 75, 78, 82, 95

Mean = $\frac{65 + 70 + 72 + 75 + 78 + 82 + 95}{7} = \frac{537}{7} \approx 76.7$

Median identifies the middle value when data is arranged in order. With an even number of values, we take the average of the two middle numbers. The median remains stable even when extreme values are present.


Worked Example: Finding the Median

Same test scores arranged in order: 65, 70, 72, 75, 78, 82, 95

With 7 values, the median is the 4th value: 75

Mode represents the most frequently occurring value in the dataset. A dataset might have no mode, one mode, or multiple modes.

Important

When to Use Each Measure:

Use mean for symmetrical data without outliers
Use median when data is skewed or contains outliers
Use mode for categorical data or when identifying the most common value
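The three measures can be checked with Python's standard `statistics` module. The sketch below reuses the test scores from the worked examples; the extra score of 0 and the short list used for the mode are hypothetical values added to show the effect of an outlier and a dataset with two modes.

```python
import statistics

scores = [65, 70, 72, 75, 78, 82, 95]

mean_score = statistics.mean(scores)      # 537 / 7
median_score = statistics.median(scores)  # 4th of the 7 ordered values

# Hypothetical extra result: one score of 0 added as an outlier.
with_outlier = scores + [0]

print(round(mean_score, 1), median_score)
print(statistics.mean(with_outlier), statistics.median(with_outlier))

# multimode returns every value tied for most frequent.
print(statistics.multimode([2, 3, 3, 5, 5, 7]))
```

Adding the outlier pulls the mean from about 76.7 down to 67.125, while the median only moves from 75 to 73.5 (the average of the two middle values, 72 and 75). This is why the median is preferred for skewed data.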

Measures of spread

These statistics describe how much the data varies around the central value, giving us insight into data consistency and reliability.

Range provides the simplest measure of spread by calculating the difference between the maximum and minimum values: $\text{Range} = \text{Max} - \text{Min}$.

Interquartile range (IQR) measures the spread of the middle 50% of data by finding the difference between the third quartile and first quartile: $\text{IQR} = Q_3 - Q_1$.

Variance calculates the average of the squared deviations from the mean, providing a measure of overall variability:

$\text{Variance} = \frac{\sum(x_i - \bar{x})^2}{n}$

Standard deviation represents the typical distance of data points from the mean. It's calculated as the square root of variance and uses the same units as the original data:

$\text{Standard Deviation} = \sqrt{\text{Variance}}$
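As a check on the two formulas, the following sketch computes the variance and standard deviation of the test scores used in the earlier worked examples, first by hand and then with the standard library. It assumes the divide-by-$n$ (population) definition given above.

```python
import math
import statistics

scores = [65, 70, 72, 75, 78, 82, 95]
n = len(scores)
mean = sum(scores) / n

# Average of the squared deviations from the mean (dividing by n, as in the formula above).
variance = sum((x - mean) ** 2 for x in scores) / n
std_dev = math.sqrt(variance)

print(round(variance, 2), round(std_dev, 2))

# The standard library's population versions use the same divide-by-n definition.
print(round(statistics.pvariance(scores), 2), round(statistics.pstdev(scores), 2))
```

Both lines give a variance of about 81.63 and a standard deviation of about 9.04 marks. Note that `statistics.variance` and `statistics.stdev` divide by $n - 1$ instead, so they would give slightly larger values.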

Important

Outliers are data points that fall far from most other observations and can significantly impact our analysis. They're typically defined as values more than 1.5 × IQR below Q₁ or above Q₃.
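The range, IQR, and outlier rule can be sketched together in Python. Quartile conventions vary between textbooks and software; the hypothetical `quartiles` helper below assumes the "median of each half" method, leaving out the overall median when the number of values is odd.

```python
import statistics

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    Assumption: with an odd number of values the overall median is left out
    of both halves. Other conventions exist and give slightly different answers.
    Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return statistics.median(lower), statistics.median(upper)

scores = [65, 70, 72, 75, 78, 82, 95]

data_range = max(scores) - min(scores)
q1, q3 = quartiles(scores)
iqr = q3 - q1

# Fences: anything outside them counts as an outlier under the 1.5 x IQR rule.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < low_fence or x > high_fence]

print(data_range, q1, q3, iqr)  # 30 70 82 12
print(low_fence, high_fence)    # 52.0 100.0
print(outliers)                 # []
```

Under this convention the score of 95 sits inside the upper fence of 100, so it is a high value but not an outlier.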

Generalisability and data presentation

Generalisability refers to our ability to apply sample findings to the broader population. This is only valid when our sample is representative and bias is minimised.

Note

Common Data Presentation Methods: Different visualisation methods suit different purposes: frequency tables and histograms for numerical data, bar charts and pie charts for categorical data, box plots for showing spread and outliers, and scatter plots for relationships between variables.

Probability concepts

Basic probability terminology forms the foundation for understanding statistical inference and decision-making under uncertainty.

An outcome is a single possible result of an experiment, and the sample space is the set of all possible outcomes. An event is a set of one or more outcomes that we're interested in, such as rolling an even number on a die.

Mutually exclusive events cannot occur at the same time. Independent events are events where the occurrence of one does not change the probability of the other.
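These definitions can be tested on a small example. The sketch below assumes one roll of a fair six-sided die and checks independence using the standard rule $P(A \text{ and } B) = P(A) \times P(B)$. The function and event names are hypothetical.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when every outcome is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def mutually_exclusive(a, b):
    """True when the two events share no outcomes."""
    return not (a & b)

def independent(a, b):
    """True when P(A and B) = P(A) x P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
odd = {1, 3, 5}
at_most_two = {1, 2}

print(mutually_exclusive(even, odd))   # True
print(independent(even, odd))          # False
print(independent(even, at_most_two))  # True
```

The second result shows that the two ideas are different: "even" and "odd" are mutually exclusive, but they are not independent, because knowing the roll was even tells you it was definitely not odd.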

Note

Understanding these probability concepts is essential for hypothesis testing and confidence intervals, which form the basis of statistical inference.

Correlation and regression

Correlation describes the relationship between two variables, which can be positive (both increase together), negative (one increases as the other decreases), or show no relationship.

The correlation coefficient (r) quantifies this relationship on a scale from -1 to +1, where $r = -1$ indicates perfect negative correlation, $r = 0$ indicates no linear relationship, and $r = +1$ indicates perfect positive correlation.

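Line of best fit represents the straight line that best describes the relationship between two variables. The least-squares line of best fit is the one that minimises the sum of squared residuals, where a residual is the vertical distance between a data point and the line.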
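The correlation coefficient and the least-squares line can be computed directly from their standard formulas. The sketch below uses a hypothetical dataset of study hours and test scores for five students, and the function name is an assumption made for the illustration.

```python
import math

def correlation_and_line(xs, ys):
    """Return (r, slope, intercept) for paired data using the least-squares formulas."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Hypothetical data: hours of study and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

r, slope, intercept = correlation_and_line(hours, scores)
print(round(r, 3), round(slope, 2), round(intercept, 2))
```

For this made-up data the line of best fit is roughly score = 46.2 + 5.6 × hours, with $r$ close to $+1$, which indicates a strong positive correlation.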

Note

Remember that correlation does not imply causation. A strong correlation between two variables doesn't mean one causes the other - there may be other factors involved or the relationship may be coincidental.

Common statistical misuse

Important

Be Aware of These Problems in Statistical Presentations:

Biased samples that don't represent the population
Misleading graphs with inappropriate scales or formats
Using inappropriate averages for the data type
Ignoring the importance of sample size in drawing conclusions
Cherry-picking data that supports a predetermined conclusion
Confusing correlation with causation

These misuses can lead to incorrect conclusions and poor decision-making.

Summary


Key Points to Remember:

Population is everyone you want to study; sample is who you actually study
Categorical data goes in groups; numerical data involves numbers you can calculate with
Mean is affected by outliers; median is more resistant to extreme values
IQR measures the spread of the middle 50% of your data
A sample must be representative to make valid conclusions about the population
Always consider the context and limitations of your data when drawing conclusions
