Descriptive Statistics (AQA A-Level Psychology): Revision Notes
Descriptive Statistics
Descriptive statistics provide the tools needed to summarise and present large datasets in meaningful ways. They help researchers identify patterns and communicate findings clearly through numerical summaries and visual representations.
Descriptive statistics form the foundation of data analysis, allowing researchers to transform raw data into understandable insights. These techniques are essential for making sense of complex datasets before moving to more advanced statistical analyses.
Measures of central tendency
Central tendency measures help researchers find typical or representative scores within a dataset. These measures provide a single value that best represents the entire group of scores. There are three main types: median, mean, and mode.
The median
The median represents the middle value when all scores are arranged in numerical order. For datasets with an odd number of values, the median is simply the middle score. When there's an even number of scores, researchers calculate the median by finding the midpoint between the two central values.
Worked Example: Finding the Median
For an odd number of values: 3, 5, 7, 9, 11. The median is 7 (the middle value).
For an even number of values: 2, 4, 6, 8. The median is (4 + 6) ÷ 2 = 5.
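The two cases above can be checked with a short Python sketch. It assumes the standard-library `statistics` module; the variable names are illustrative only.

```python
from statistics import median

odd_scores = [3, 5, 7, 9, 11]
even_scores = [2, 4, 6, 8]

# median() sorts the values internally, so the input order does not matter.
odd_median = median(odd_scores)    # the single middle value
even_median = median(even_scores)  # midpoint of the two central values (4 and 6)

print(odd_median)
print(even_median)
```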
Advantages of using the median:
- Extreme scores (outliers) don't distort the result, making it robust against unusual values
- Usually simple to calculate once the scores are placed in order
- Works effectively with ordinal data (ranked information), unlike the mean
Limitations of the median:
- Less sensitive than the mean because it doesn't incorporate every score in the calculation
- May not represent the dataset well when dealing with small samples
- For example, in the dataset 1, 1, 2, 3, 4, 5, 6, 7, 8, the median is 4, but this doesn't reflect the concentration of low scores
The mean
The mean calculates the average by adding all scores together and dividing by the total number of values. This provides the mathematical centre of the dataset.
Worked Example: Calculating the Mean
Dataset: 2, 4, 6, 8, 10
Step 1: Add all values together: 2 + 4 + 6 + 8 + 10 = 30
Step 2: Divide by the number of values: Mean = 30 ÷ 5 = 6
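As a quick check, the same two steps can be written out in Python. This is a minimal sketch using the dataset from the worked example; `statistics.mean` is included only to confirm the manual result.

```python
from statistics import mean

scores = [2, 4, 6, 8, 10]

total = sum(scores)          # Step 1: add all values together
n = len(scores)              # number of values
manual_mean = total / n      # Step 2: divide the total by the number of values

library_mean = mean(scores)  # standard-library result, for comparison

print(total, manual_mean, library_mean)
```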
Advantages of using the mean:
- Most precise measure of central tendency as it incorporates every single data point
- Appropriate for interval-level data, where the units of measurement are equally spaced (such as time measurements)
- Provides the foundation for many advanced statistical calculations
Limitations of the mean:
- Susceptible to distortion from extremely high or low scores
- The calculated value might not match any actual score in the dataset
- For instance, with scores 1, 1, 2, 3, 4, 5, 6, 7, 8, the mean is approximately 4.1 (37 ÷ 9), but no participant actually scored 4.1
The mode
The mode identifies the most frequently occurring value within a dataset. This measure proves particularly useful when describing the most common or popular response.
Advantages of using the mode:
- Resistant to distortion from extreme values
- Sometimes provides more practical information than other measures - for example, describing the typical number of children in British families as 2 (mode) rather than 2.4 (mean)
Limitations of the mode:
- Datasets may contain multiple modes or no clear mode at all
- For example, the dataset 2, 3, 6, 7, 7, 9, 15, 16, 16, 20 has modes at both 7 and 16
- Doesn't utilise all available data points in its calculation
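The multiple-mode problem can be seen with a small Python sketch. It assumes the standard-library `statistics.multimode` function (Python 3.8+), and both datasets are illustrative ones chosen so that values tie for most frequent.

```python
from statistics import multimode

# Illustrative dataset in which 7 and 16 each occur twice.
bimodal_scores = [2, 3, 6, 7, 7, 9, 15, 16, 16, 20]

# Illustrative dataset in which every value occurs exactly once.
no_repeats = [1, 2, 3, 4]

# multimode() returns every value tied for the highest frequency,
# in the order each value first appears.
modes = multimode(bimodal_scores)
all_tied = multimode(no_repeats)

print(modes)
print(all_tied)
```

When no score repeats, every value is returned, which is one way of seeing that such a dataset has no clear mode.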
Measures of dispersion
Dispersion measures indicate how spread out or variable the scores are within a dataset. Understanding variability is crucial for interpreting data accurately, as datasets with the same central tendency can have vastly different distributions. Two primary measures are the range and standard deviation.
The range
The range shows the spread of scores by calculating the difference between the highest and lowest values in the dataset.
Advantages of the range:
- Quick and straightforward to calculate
- Takes into account the most extreme values in the dataset
Limitations of the range:
- Can be misleading when extreme outliers are present
- Doesn't show whether scores cluster together or spread evenly across the range
- Two very different datasets can have identical ranges - for example, both datasets (2, 3, 4, 5, 5, 6, 7, 8, 9, 21) and (2, 5, 8, 9, 10, 12, 13, 15, 16, 18, 21) have the same range of 19, despite having very different distributions
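The identical ranges of the two datasets above can be confirmed with a minimal Python sketch. The helper name `value_range` is hypothetical.

```python
def value_range(scores):
    """Return the difference between the highest and lowest score."""
    return max(scores) - min(scores)

# The two datasets from the example above.
clustered = [2, 3, 4, 5, 5, 6, 7, 8, 9, 21]
spread_out = [2, 5, 8, 9, 10, 12, 13, 15, 16, 18, 21]

range_clustered = value_range(clustered)
range_spread_out = value_range(spread_out)

print(range_clustered, range_spread_out)
```

Only the two extreme scores enter the calculation, so the range cannot distinguish between these distributions.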
Standard deviation
Standard deviation measures how far, on average, individual scores deviate from the mean. A larger standard deviation indicates greater variability in the dataset.
Worked Example: Calculating Standard Deviation
Dataset: 2, 4, 6, 8, 10
Step 1: Calculate the mean: Mean = (2 + 4 + 6 + 8 + 10) ÷ 5 = 6
Step 2: Subtract the mean from each score
- 2 - 6 = -4
- 4 - 6 = -2
- 6 - 6 = 0
- 8 - 6 = 2
- 10 - 6 = 4
Step 3: Square each difference: (-4)² = 16, (-2)² = 4, (0)² = 0, (2)² = 4, (4)² = 16
Step 4: Add the squared differences: 16 + 4 + 0 + 4 + 16 = 40
Step 5: Divide by (n - 1) to get the variance: Variance = 40 ÷ (5 - 1) = 40 ÷ 4 = 10
Step 6: Take the square root: Standard deviation = √10 ≈ 3.16
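The six steps can be followed line by line in Python. This is a sketch using the dataset from the worked example; `statistics.stdev`, which also divides by (n - 1), is used only to confirm the manual result.

```python
from math import sqrt
from statistics import stdev

scores = [2, 4, 6, 8, 10]

mean_score = sum(scores) / len(scores)           # Step 1: the mean
deviations = [x - mean_score for x in scores]    # Step 2: score minus mean
squared = [d ** 2 for d in deviations]           # Step 3: square each difference
sum_squares = sum(squared)                       # Step 4: add the squared differences
variance = sum_squares / (len(scores) - 1)       # Step 5: divide by (n - 1)
sd = sqrt(variance)                              # Step 6: take the square root

print(round(sd, 2))
print(round(stdev(scores), 2))  # standard-library check
```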
Advantages of standard deviation:
- More sensitive measure than range because it incorporates every data point
- Enables interpretation of individual scores - in normally distributed data, approximately 68% of scores fall within one standard deviation of the mean, and approximately 95% fall within two standard deviations
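The proportions quoted for normally distributed data can be reproduced from the normal curve itself. The sketch below assumes the standard-library `statistics.NormalDist` class (Python 3.8+) and a standard normal distribution with mean 0 and standard deviation 1.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of scores lying within one and within two standard deviations of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

print(round(within_one_sd * 100, 1))
print(round(within_two_sd * 100, 1))
```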
Limitations of standard deviation:
- More complex to calculate than the range
- Less meaningful when data doesn't follow a normal distribution pattern
Percentages
Percentages convert raw data into proportional formats, making comparisons easier. Researchers calculate a percentage by dividing the obtained value by the total possible value and multiplying the result by 100.
Worked Example: Converting to Percentages
Test score: 67 out of 80
Percentage = (67 ÷ 80) × 100 = 0.8375 × 100 = 83.75%
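The conversion can be wrapped in a small helper. This is a minimal sketch; the function name `to_percentage` is hypothetical.

```python
def to_percentage(score, total):
    """Express score as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be greater than zero")
    # Multiplying first and dividing second gives the same result as
    # dividing first and then multiplying by 100.
    return score * 100 / total

test_percentage = to_percentage(67, 80)
print(test_percentage)
```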
Correlational data
Correlational data examines relationships between variables using correlation coefficients. These coefficients range from +1 (perfect positive correlation) through 0 (no correlation) to -1 (perfect negative correlation). Researchers typically present correlational data using scattergrams, which visually demonstrate the strength and direction of relationships between variables.
Correlation coefficients provide a standardised way to measure relationships, making it possible to compare the strength of different relationships regardless of the units of measurement used.
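To illustrate how a coefficient between -1 and +1 arises, the sketch below computes Pearson's r, which is an assumption here because the notes do not name a specific coefficient. The function name and both small datasets are hypothetical and chosen only to show the two extremes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of the products of paired deviations from each mean.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable does not vary")
    return co_deviation / sqrt(ss_x * ss_y)

# Hypothetical data: the second variable rises (or falls) in step with the first.
hours = [1, 2, 3, 4, 5]
rising_scores = [2, 4, 6, 8, 10]
falling_scores = [10, 8, 6, 4, 2]

r_positive = pearson_r(hours, rising_scores)   # perfect positive correlation
r_negative = pearson_r(hours, falling_scores)  # perfect negative correlation

print(r_positive, r_negative)
```

Because the deviations are divided by the spread of each variable, the result does not depend on the units in which either variable is measured.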
Presentation of quantitative data
Effective data presentation helps audiences understand research findings through clear visual and numerical formats. The choice of presentation method significantly impacts how well your audience can interpret and understand your findings. Researchers can choose from various presentation methods depending on their data type and research objectives.
Graphs and charts
Visual representations make complex datasets more accessible and help identify patterns that might not be obvious in raw numerical form.
Bar charts
Bar charts display categorical data by showing different groups as separate bars. The categories appear on the horizontal axis, while the measured values show on the vertical axis. Each bar should have equal width with spaces between them to emphasise that the categories are distinct rather than continuous.
Bar charts can display totals, means, percentages, or ratios, and can show multiple variables together for comparison purposes, such as comparing chocolate consumption between males and females across different age groups.
Histograms
Histograms present continuous data where the variable can take any value within a range. Unlike bar charts, histograms have no spaces between bars because the data represents a continuous scale. The continuous scores appear on the horizontal axis, while frequency shows on the vertical axis.
Each bar should have the same width because each represents an equal interval on the continuous scale. The absence of gaps is the key difference from bar charts, where spacing indicates distinct categories.
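The frequencies behind a histogram come from counting how many scores fall into each equal-width interval. The sketch below shows that counting step only, not the drawing; the function name and the reaction-time data (in milliseconds) are hypothetical.

```python
def bin_frequencies(scores, start, width, n_bins):
    """Count the scores falling in each of n_bins equal-width intervals."""
    counts = [0] * n_bins
    for score in scores:
        index = (score - start) // width  # which interval the score falls in
        if 0 <= index < n_bins:           # scores outside the intervals are ignored
            counts[index] += 1
    return counts

# Hypothetical reaction times in milliseconds.
reaction_times = [212, 348, 375, 420, 466, 489, 551, 618]

# Intervals: 200-299, 300-399, 400-499, 500-599, 600-699.
frequencies = bin_frequencies(reaction_times, start=200, width=100, n_bins=5)
print(frequencies)
```

Each count becomes the height of one bar, and because the intervals meet end to end the bars touch.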
Frequency polygon (line graph)
Frequency polygons provide an alternative way to display continuous data: the frequency of each interval is plotted as a point, typically at the interval's midpoint, and the points are joined with straight lines to create a polygon shape.
This format proves particularly useful when comparing two or more frequency distributions on the same graph, as multiple lines can be displayed simultaneously without visual confusion.
Pie charts
Pie charts illustrate the frequency of different categories as percentages of the whole. Each section's size corresponds to the frequency of that category, with the entire circle representing 100% of the data.
This format works best when showing how different parts contribute to a complete whole, but becomes difficult to read with too many small categories.
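The size of each section follows directly from the category frequencies. As a minimal sketch, the code below converts frequencies into percentages and sector angles; the function name, category labels, and counts are hypothetical.

```python
def pie_sections(frequencies):
    """Return {category: (percentage, angle in degrees)} for a pie chart."""
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("frequencies must sum to more than zero")
    return {
        category: (count * 100 / total, count * 360 / total)
        for category, count in frequencies.items()
    }

# Hypothetical frequencies for three categories.
observed = {"Category A": 12, "Category B": 5, "Category C": 3}

sections = pie_sections(observed)
for category, (percentage, angle) in sections.items():
    print(category, percentage, angle)
```

The percentages sum to 100 and the angles to 360, which is why the whole circle represents the complete dataset.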
Key Points to Remember:
- Central tendency measures each have specific strengths: median resists extreme scores, mean uses all data points, and mode shows the most common value
- Standard deviation provides more detailed information about data spread than range because it incorporates every score in its calculation
- Choose appropriate graphs based on data type: bar charts for categorical data, histograms for continuous data, and pie charts for showing parts of a whole
- Visual presentations help identify patterns and relationships that may not be obvious in raw numerical data
- All descriptive statistics have limitations; understanding these helps researchers select the most appropriate measures for their specific research questions