Summary (Grade 11 NSC Matric Mathematics): Revision Notes
Summary
Data visualisation methods
Statistics uses several visual methods to represent data effectively. Understanding these different approaches helps you choose the most appropriate way to display and interpret information.
Histograms
Histograms are bar-style charts that show how frequently values occur in each class interval of your data set. Each rectangle (bar) represents one interval, and when all intervals have the same width, the height of the rectangle equals the number of values in that interval. The higher the bar, the more values fall in that interval.
When reading histograms, remember that:
- The width of each bar represents the class interval
- The area of each rectangle is proportional to the frequency
- Adjacent bars touch each other (unlike regular bar charts)
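As a rough sketch of the counting behind a histogram, the snippet below groups values into equal-width class intervals and prints one row of stars per interval. The marks, the starting value of 40 and the class width of 10 are made-up illustrations, and `class_frequencies` is a hypothetical helper name.

```python
# Hypothetical test marks; the data and class width are illustrative only.
marks = [42, 47, 51, 55, 58, 63, 64, 66, 71, 78]
START, WIDTH, N_CLASSES = 40, 10, 4


def class_frequencies(data, start, width, n_classes):
    """Count how many values fall in each interval [lower, lower + width)."""
    counts = [0] * n_classes
    for value in data:
        index = int((value - start) // width)
        if 0 <= index < n_classes:  # ignore values outside the chosen intervals
            counts[index] += 1
    return counts


frequencies = class_frequencies(marks, START, WIDTH, N_CLASSES)

# Text histogram: one row per class interval, one star per value.
for i, count in enumerate(frequencies):
    lower = START + WIDTH * i
    print(f"{lower} to {lower + WIDTH}: {'*' * count} ({count})")
```

Each interval here includes its lower boundary and excludes its upper boundary, which is one common convention for avoiding double counting.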
Frequency polygons
Frequency polygons display the same information as histograms but use a different visual approach. Instead of rectangles, frequency polygons use connected line segments and points. To create a frequency polygon, you connect the midpoint of the top edge of each rectangle from a histogram with straight lines.
This method is particularly useful when:
- You want to compare multiple data sets on the same graph
- You need to show trends more clearly than bars would allow
- You're working with continuous data
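The points of a frequency polygon can be worked out from grouped data before any drawing is done. The sketch below assumes made-up class intervals and frequencies, and simply pairs each interval midpoint with its frequency.

```python
# Hypothetical grouped data: class intervals of width 10 and their frequencies.
intervals = [(40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [2, 3, 3, 2]

# Each point of the polygon sits above the midpoint of its class interval,
# at a height equal to that interval's frequency.
points = [
    ((lower + upper) / 2, freq)
    for (lower, upper), freq in zip(intervals, frequencies)
]

for midpoint, freq in points:
    print(f"({midpoint}, {freq})")
```

Joining these points in order with straight lines gives the frequency polygon.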
Ogives (cumulative histograms)
Ogives, also called cumulative histograms, show the running total of frequencies in your data set. Rather than showing individual frequencies, ogives display how many values are less than or equal to each point.
Key properties of ogives:
- The first count is always zero - no values lie below the lower boundary of the first interval
- The last count equals the total number of data points - all values are less than or equal to the largest value
- To construct an ogive, add up all frequencies from left to right as you progress through the data
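The running total described above can be sketched in a few lines. The class boundaries and frequencies are made-up values, and each cumulative count is plotted against the upper boundary of its interval, which is an assumption about how the ogive is drawn.

```python
from itertools import accumulate

# Hypothetical class boundaries and the frequency of each interval between them.
boundaries = [40, 50, 60, 70, 80]
frequencies = [2, 3, 3, 2]

# Start at zero at the lowest boundary, then keep a running total.
cumulative = [0] + list(accumulate(frequencies))

# Pair each boundary with the number of values below it.
ogive_points = list(zip(boundaries, cumulative))

for boundary, count in ogive_points:
    print(f"below {boundary}: {count}")
```

The first count is zero and the last count equals the total number of data values, matching the key properties listed above.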
Measures of dispersion
Dispersion tells you how spread out your data points are from the centre. Two important measures help you quantify this spread.
Variance and standard deviation
Variance and standard deviation are both measures of dispersion that tell you how scattered your data points are around the mean.
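Variance formula: $\sigma^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$
Standard deviation formula: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
Here $x_i$ are the data values, $\bar{x}$ is the mean and $n$ is the number of data values.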
The relationship between these measures:
- Standard deviation is the square root of the variance
- Standard deviation is measured in the same units as your original data
- Variance is measured in squared units of your original data
Worked Example: Understanding Units
If you're measuring heights in centimetres:
- The standard deviation will be in centimetres
- The variance will be in square centimetres
This is why standard deviation is often more interpretable than variance!
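To see the two measures side by side, here is a minimal sketch that computes them step by step, assuming the population form (dividing by the number of values). The heights are invented for illustration.

```python
import math
import statistics

# Hypothetical heights in centimetres.
heights_cm = [162, 164, 164, 164, 165, 165, 167, 169]

n = len(heights_cm)
mean = sum(heights_cm) / n

# Variance: the average of the squared distances from the mean (square centimetres).
variance = sum((x - mean) ** 2 for x in heights_cm) / n

# Standard deviation: the square root of the variance (centimetres).
std_dev = math.sqrt(variance)

print(f"mean = {mean} cm")
print(f"variance = {variance} cm^2")
print(f"standard deviation = {std_dev} cm")

# Cross-check against the standard library's population versions.
assert math.isclose(variance, statistics.pvariance(heights_cm))
assert math.isclose(std_dev, statistics.pstdev(heights_cm))
```

For this data the mean is 165 cm, the variance is 4 square centimetres and the standard deviation is 2 cm.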
Distribution shapes
The shape of your data distribution affects the relationship between different measures of central tendency.
Symmetric distribution
In a symmetric distribution, the data is evenly balanced around the centre point.
Characteristics:
- The mean approximately equals the median
- The tails on both sides are balanced and equal in length
- The distribution looks like a mirror image on both sides of the centre
Right (positively) skewed distribution
In a right-skewed distribution, there's a longer tail extending to the right side.
Characteristics:
- The mean is greater than the median
- The tail on the right side is longer than the tail on the left side
- The median is closer to the first quartile than to the third quartile
This happens because extreme high values pull the mean upward, but the median remains more resistant to these outliers.
Left (negatively) skewed distribution
In a left-skewed distribution, there's a longer tail extending to the left side.
Characteristics:
- The mean is less than the median
- The tail on the left side is longer than the tail on the right side
- The median is closer to the third quartile than to the first quartile
This occurs because extreme low values pull the mean downward, while the median stays more stable.
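The mean and median comparisons above can be turned into a quick check. This is only a rule-of-thumb sketch: `skew_direction` and its `tolerance` parameter are hypothetical names, and the three data sets are invented for illustration.

```python
import statistics


def skew_direction(data, tolerance=0.0):
    """Compare mean and median to suggest the direction of skew.

    'tolerance' is a hypothetical cut-off for treating the two as
    approximately equal; real data rarely matches exactly.
    """
    difference = statistics.mean(data) - statistics.median(data)
    if difference > tolerance:
        return "right-skewed"   # mean pulled above the median
    if difference < -tolerance:
        return "left-skewed"    # mean pulled below the median
    return "roughly symmetric"


symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 2, 2, 3, 3, 4, 15]
long_left_tail = [1, 12, 13, 13, 14, 14, 15]

for name, data in [("symmetric", symmetric),
                   ("long right tail", long_right_tail),
                   ("long left tail", long_left_tail)]:
    print(f"{name}: {skew_direction(data)}")
```

Comparing mean and median gives a hint about the shape; a histogram or box-and-whisker plot remains the better way to confirm it.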
Outliers
An outlier is a data value that falls far away from the rest of the data points. Outliers can significantly affect your statistical measures, particularly the mean and standard deviation, which is why it's important to identify them when analysing data.
Outliers have the greatest impact on:
- The mean (can shift it dramatically)
- The standard deviation (can increase it significantly)
- The range (always affected by extreme values)
The median and interquartile range are more resistant to outliers.
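A small experiment makes the difference concrete. The sketch below compares the measures for an invented data set with and without one extreme value. The `iqr` helper is hypothetical and uses the median-of-halves method for quartiles, which is an assumed convention; other quartile methods give slightly different values.

```python
import statistics


def iqr(data):
    """Interquartile range using the median of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half, upper_half = ordered[:half], ordered[-half:]
    return statistics.median(upper_half) - statistics.median(lower_half)


def summary(data):
    """Collect the measures discussed above for one data set."""
    return {
        "mean": statistics.mean(data),
        "std dev": statistics.pstdev(data),
        "range": max(data) - min(data),
        "median": statistics.median(data),
        "IQR": iqr(data),
    }


clean = [10, 12, 13, 14, 15, 16, 18]   # hypothetical data
with_outlier = clean + [90]            # same data plus one extreme value

before, after = summary(clean), summary(with_outlier)
for measure in before:
    print(f"{measure}: {before[measure]:.2f} -> {after[measure]:.2f}")
```

In this example the mean moves from 14 to 23.5 and the range from 8 to 80, while the median only moves from 14 to 14.5 and the interquartile range from 4 to 4.5.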
Key Points to Remember:
- Histograms use rectangles, frequency polygons use connected lines, and ogives show cumulative totals
- Standard deviation = √variance, and standard deviation uses the same units as your original data
- In right-skewed distributions: mean > median; in left-skewed: mean < median
- Ogives always start at zero and end at the total count of your data set
- Outliers are extreme values that can dramatically affect your statistical measures