Descriptive Statistics Revision Notes for AQA A-Level Psychology

Descriptive Statistics

Descriptive statistics provide the tools needed to summarise and present large datasets in meaningful ways. They help researchers identify patterns and communicate findings clearly through numerical summaries and visual representations.

Note

Descriptive statistics form the foundation of data analysis, allowing researchers to transform raw data into understandable insights. These techniques are essential for making sense of complex datasets before moving to more advanced statistical analyses.

Measures of central tendency

Central tendency measures help researchers find typical or representative scores within a dataset. These measures provide a single value that best represents the entire group of scores. There are three main types: median, mean, and mode.

The median

The median represents the middle value when all scores are arranged in numerical order. For datasets with an odd number of values, the median is simply the middle score. When there's an even number of scores, researchers calculate the median by finding the midpoint between the two central values.

Example

Worked Example: Finding the Median

For an odd number of values: 3, 5, 7, 9, 11
The median is 7 (the middle value)

For an even number of values: 2, 4, 6, 8
The median is (4 + 6) ÷ 2 = 5
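The same procedure can be sketched in Python. This is a minimal illustration, not a required method; the function name `median` is chosen here purely for the example.

```python
def median(scores):
    """Return the middle value of a list of scores."""
    ordered = sorted(scores)      # arrange the scores in numerical order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # odd number of scores: take the single middle score
        return ordered[middle]
    # even number of scores: midpoint between the two central scores
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_result = median([3, 5, 7, 9, 11])   # 7
even_result = median([2, 4, 6, 8])      # 5.0
```

Both results match the worked example above.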

Note

Advantages of using the median:

Extreme scores (outliers) don't distort the result, making it robust against unusual values
Calculation is typically straightforward compared to other measures
Works effectively with ordinal data (ranked information), unlike the mean

Important

Limitations of the median:

Less sensitive than the mean because it doesn't incorporate every score in the calculation
May not represent the dataset well when dealing with small samples
For example, in the dataset 1, 1, 2, 3, 4, 5, 6, 7, 8, the median is 4, but this doesn't reflect the concentration of low scores

The mean

The mean calculates the average by adding all scores together and dividing by the total number of values. This provides the mathematical centre of the dataset.

Example

Worked Example: Calculating the Mean

Dataset: 2, 4, 6, 8, 10

Step 1: Add all values together
2 + 4 + 6 + 8 + 10 = 30

Step 2: Divide by the number of values
Mean = 30 ÷ 5 = 6
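As a short sketch, the two steps of the worked example translate directly into Python (the variable names are illustrative):

```python
scores = [2, 4, 6, 8, 10]

total = sum(scores)           # Step 1: add all values together -> 30
mean = total / len(scores)    # Step 2: divide by the number of values -> 6.0
```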

Note

Advantages of using the mean:

Most precise measure of central tendency as it incorporates every single data point
Suited to interval-level data, where the intervals between units are equal (such as time measurements)
Provides the foundation for many advanced statistical calculations

Important

Limitations of the mean:

Susceptible to distortion from extremely high or low scores
The calculated value might not match any actual score in the dataset
For instance, with scores 1, 1, 2, 3, 4, 5, 6, 7, 8, the mean is approximately 4.1 (37 ÷ 9), but no participant actually scored 4.1
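The first limitation, distortion by extreme scores, can be illustrated with a small comparison. The second dataset below is invented for this sketch: it is the worked-example dataset with its highest score replaced by an extreme value.

```python
import statistics

typical = [2, 4, 6, 8, 10]
with_outlier = [2, 4, 6, 8, 100]   # invented data: one extreme score

mean_typical = statistics.mean(typical)             # 6
mean_outlier = statistics.mean(with_outlier)        # 24
median_typical = statistics.median(typical)         # 6
median_outlier = statistics.median(with_outlier)    # 6
```

A single extreme score moves the mean from 6 to 24, while the median stays at 6.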

The mode

The mode identifies the most frequently occurring value within a dataset. This measure proves particularly useful when describing the most common or popular response.

Note

Advantages of using the mode:

Resistant to distortion from extreme values
Sometimes provides more practical information than other measures - for example, describing the typical number of children in British families as 2 (mode) rather than 2.4 (mean)

Important

Limitations of the mode:

Datasets may contain multiple modes or no clear mode at all
For example, the dataset 2, 3, 6, 7, 7, 9, 15, 16, 16, 20 has modes at both 7 and 16, because each of those values occurs twice
Doesn't utilise all available data points in its calculation
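A minimal sketch of finding the mode or modes in Python follows. The helper name `modes` and both datasets are assumptions made for illustration; the first dataset is constructed so that 7 and 16 each appear twice.

```python
from collections import Counter


def modes(scores):
    """Return every value that shares the highest frequency."""
    counts = Counter(scores)              # how often each value occurs
    highest = max(counts.values())        # the largest frequency
    return sorted(value for value, count in counts.items() if count == highest)


bimodal = modes([2, 3, 6, 7, 7, 9, 15, 16, 16, 20])   # [7, 16]
single = modes([1, 2, 2, 3])                          # [2]
```

When every value occurs equally often, the function returns all of them, which reflects the "no clear mode" case.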

Measures of dispersion

Dispersion measures indicate how spread out or variable the scores are within a dataset. Understanding variability is crucial for interpreting data accurately, as datasets with the same central tendency can have vastly different distributions. Two primary measures are the range and standard deviation.

The range

The range shows the spread of scores by calculating the difference between the highest and lowest values in the dataset.

Note

Advantages of the range:

Quick and straightforward to calculate
Takes into account the most extreme values in the dataset

Important

Limitations of the range:

Can be misleading when extreme outliers are present
Doesn't show whether scores cluster together or spread evenly across the range
Two very different datasets can have identical ranges - for example, both datasets (2, 3, 4, 5, 5, 6, 7, 8, 9, 21) and (2, 5, 8, 9, 10, 12, 13, 15, 16, 18, 21) have the same range of 19, despite having very different distributions
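The identical ranges of the two datasets above can be checked with a short sketch (the function name `value_range` is illustrative):

```python
def value_range(scores):
    """Difference between the highest and lowest score."""
    return max(scores) - min(scores)


set_a = [2, 3, 4, 5, 5, 6, 7, 8, 9, 21]
set_b = [2, 5, 8, 9, 10, 12, 13, 15, 16, 18, 21]

range_a = value_range(set_a)   # 21 - 2 = 19
range_b = value_range(set_b)   # 21 - 2 = 19
```

The calculation only ever looks at two scores per dataset, which is why it cannot distinguish how the remaining scores are distributed.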

Standard deviation

Standard deviation measures how far, on average, individual scores deviate from the mean. A larger standard deviation indicates greater variability in the dataset.

Example

Worked Example: Calculating Standard Deviation

Dataset: 2, 4, 6, 8, 10

Step 1: Calculate the mean
Mean = (2 + 4 + 6 + 8 + 10) ÷ 5 = 6

Step 2: Subtract the mean from each score

2 - 6 = -4
4 - 6 = -2
6 - 6 = 0
8 - 6 = 2
10 - 6 = 4

Step 3: Square each difference
(-4)² = 16, (-2)² = 4, (0)² = 0, (2)² = 4, (4)² = 16

Step 4: Add the squared differences
16 + 4 + 0 + 4 + 16 = 40

Step 5: Divide by (n-1) to get the variance
Variance = 40 ÷ (5-1) = 40 ÷ 4 = 10

Step 6: Take the square root
Standard deviation = √10 ≈ 3.16
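The six steps can be followed line by line in Python. This is a sketch of the worked example, using the same (n-1) divisor; the final line compares the result with the standard library's `statistics.stdev`, which also uses (n-1).

```python
import math
import statistics

scores = [2, 4, 6, 8, 10]

mean = sum(scores) / len(scores)                  # Step 1: mean = 6.0
deviations = [s - mean for s in scores]           # Step 2: subtract the mean
squared = [d ** 2 for d in deviations]            # Step 3: square each difference
sum_of_squares = sum(squared)                     # Step 4: add them up = 40.0
variance = sum_of_squares / (len(scores) - 1)     # Step 5: divide by n-1 = 10.0
standard_deviation = math.sqrt(variance)          # Step 6: square root, about 3.16

library_value = statistics.stdev(scores)          # same calculation from the library
```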

Note

Advantages of standard deviation:

More sensitive measure than range because it incorporates every data point
Enables interpretation of individual scores - in normally distributed data, approximately 68.26% of scores fall within one standard deviation of the mean, and 95.44% fall within two standard deviations
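The percentages quoted above can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the results agree with the quoted figures to within rounding.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of scores between -1 and +1 standard deviations, as a percentage
within_one_sd = (standard_normal.cdf(1) - standard_normal.cdf(-1)) * 100

# Proportion of scores between -2 and +2 standard deviations, as a percentage
within_two_sd = (standard_normal.cdf(2) - standard_normal.cdf(-2)) * 100
```

Here `within_one_sd` comes out at roughly 68.27 and `within_two_sd` at roughly 95.45.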

Important

Limitations of standard deviation:

More complex to calculate than the range
Less meaningful when data doesn't follow a normal distribution pattern

Percentages

Percentages convert raw data into proportional formats, making comparisons easier. Researchers calculate percentages by multiplying the original value by 100 and dividing by the total possible score.

Example

Worked Example: Converting to Percentages

Test score: 67 out of 80

Percentage = (67 ÷ 80) × 100 = 0.8375 × 100 = 83.75%
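A one-function sketch of the same conversion (the name `to_percentage` is illustrative). It multiplies by 100 before dividing, which gives the same answer as dividing first.

```python
def to_percentage(score, total_possible):
    """Express a raw score as a percentage of the total possible score."""
    return score * 100 / total_possible


result = to_percentage(67, 80)   # 83.75
```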

Correlational data

Correlational data examines relationships between variables using correlation coefficients. These coefficients range from +1 (perfect positive correlation) through 0 (no correlation) to -1 (perfect negative correlation). Researchers typically present correlational data using scattergrams, which visually demonstrate the strength and direction of relationships between variables.

Note

Correlation coefficients provide a standardised way to measure relationships, making it possible to compare the strength of different relationships regardless of the units of measurement used.
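As a hedged illustration of how a coefficient lands between -1 and +1, the sketch below computes Pearson's r by hand. The notes do not name a specific coefficient, so Pearson's r is an assumption here, and the revision-hours and test-score figures are invented for the example.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # how the two variables vary together
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # how much each variable varies on its own
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / math.sqrt(spread_x * spread_y)


hours_revised = [1, 2, 3, 4, 5]      # invented data
rising_scores = [2, 4, 6, 8, 10]     # increases in step with hours
falling_scores = [10, 8, 6, 4, 2]    # decreases in step with hours

r_positive = pearson_r(hours_revised, rising_scores)    # perfect positive: +1
r_negative = pearson_r(hours_revised, falling_scores)   # perfect negative: -1
```

Real data rarely fall on a perfect line, so coefficients in practice sit somewhere between these two extremes.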

Presentation of quantitative data

Effective data presentation helps audiences understand research findings through clear visual and numerical formats. The choice of presentation method significantly impacts how well your audience can interpret and understand your findings. Researchers can choose from various presentation methods depending on their data type and research objectives.

Graphs and charts

Visual representations make complex datasets more accessible and help identify patterns that might not be obvious in raw numerical form.

Bar charts

Bar charts display categorical data by showing different groups as separate bars. The categories appear on the horizontal axis, while the measured values show on the vertical axis. Each bar should have equal width with spaces between them to emphasise that the categories are distinct rather than continuous.

Note

Bar charts can display totals, means, percentages, or ratios, and can show multiple variables together for comparison purposes, such as comparing chocolate consumption between males and females across different age groups.

Histograms

Histograms present continuous data where the variable can take any value within a range. Unlike bar charts, histograms have no spaces between bars because the data represents a continuous scale. The continuous scores appear on the horizontal axis, while frequency shows on the vertical axis.

Important

Each bar should have the same width, since the bars represent equal intervals on the continuous scale. The key difference from bar charts is that histogram bars touch, whereas the gaps in a bar chart signal distinct categories.
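Before a histogram can be drawn, continuous scores have to be grouped into equal-width intervals and counted. The sketch below shows one way to do that; the function name, the interval settings, and the scores are all invented for illustration.

```python
def bin_frequencies(scores, start, width, n_bins):
    """Count how many scores fall into each equal-width interval."""
    counts = [0] * n_bins
    for score in scores:
        index = int((score - start) // width)   # which interval the score belongs to
        if 0 <= index < n_bins:                 # ignore scores outside the intervals
            counts[index] += 1
    return counts


scores = [12, 15, 21, 24, 27, 33, 35, 38, 41, 47]   # invented data

# Intervals 10-19, 20-29, 30-39 and 40-49
frequencies = bin_frequencies(scores, start=10, width=10, n_bins=4)   # [2, 3, 3, 2]
```

Each count becomes the height of one bar, and because every interval has the same width, every bar does too.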

Frequency polygon (line graph)

Frequency polygons provide an alternative way to display continuous data by connecting points rather than using bars. The graph connects data points with lines, creating a polygon shape.

Note

This format proves particularly useful when comparing two or more frequency distributions on the same graph, as multiple lines can be displayed simultaneously without visual confusion.

Pie charts

Pie charts illustrate the frequency of different categories as percentages of the whole. Each section's size corresponds to the frequency of that category, with the entire circle representing 100% of the data.

Important

This format works best when showing how different parts contribute to a complete whole, but becomes difficult to read with too many small categories.
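To draw a pie chart by hand, each category's frequency is converted to a share of the whole: a percentage of 100 and an angle out of the circle's 360 degrees. A minimal sketch, using invented category names and frequencies:

```python
frequencies = {"Category A": 10, "Category B": 20, "Category C": 10}   # invented data

total = sum(frequencies.values())   # 40

# Share of the whole as a percentage
percentages = {name: count * 100 / total for name, count in frequencies.items()}

# Angle of each slice in degrees (a full circle is 360)
angles = {name: count * 360 / total for name, count in frequencies.items()}
```

Here Category B takes 50% of the circle (180 degrees) and the other two take 25% each (90 degrees).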

Summary

Key Points to Remember:

Central tendency measures each have specific strengths: median resists extreme scores, mean uses all data points, and mode shows the most common value
Standard deviation provides more detailed information about data spread than range because it incorporates every score in its calculation
Choose appropriate graphs based on data type: bar charts for categorical data, histograms for continuous data, and pie charts for showing parts of a whole
Visual presentations help identify patterns and relationships that may not be obvious in raw numerical data
All descriptive statistics have limitations - understanding these helps researchers select the most appropriate measures for their specific research questions
