Methods (Edexcel A-Level Psychology): Revision Notes
Analysis of Quantitative Data
Introduction to quantitative data
After conducting psychological research, the collected data must be analyzed to identify patterns and draw conclusions. Quantitative data refers to numerical information gathered through investigations. This type of data emerges from closed questions and ranked-scale questions, which produce values that can be measured and counted.
Research produces raw scores - the initial numerical results before any processing. Raw data in its unprocessed form can be difficult to interpret, so researchers summarize it using descriptive statistics. These are calculations that measure central tendency (average values) and dispersion (spread of scores), making it easier to identify trends and compare groups.
Descriptive statistics transform raw numerical data into meaningful summaries that reveal patterns and trends. Without this statistical analysis, researchers would struggle to draw valid conclusions from their investigations or compare findings across different groups or conditions.
Data tables
Raw data table
A raw data table presents all individual values measured in a study without any processing or summarization. It displays the complete dataset in a structured format, typically organized by participant and condition.
For example, in a study measuring self-rated obedience (out of 10) for males and females:
| Participant | Males | Females |
|---|---|---|
| A | 3 | 7 |
| B | 5 | 9 |
| C | 4 | 6 |
| D | 6 | 8 |
| E | 4 | 7 |
| F | 3 | 6 |
| G | 4 | 9 |
While raw data tables show all collected values, they do not reveal obvious patterns. Other table formats help analyze findings more effectively.
Frequency table
A frequency table displays how many times each score occurs in a dataset. Rather than listing every individual measurement, it counts the occurrences (frequency) of each value.
Using the obedience data, a frequency table would show:
| Self-rated obedience | Males | Females |
|---|---|---|
| 3 | 2 | 0 |
| 4 | 3 | 0 |
| 5 | 1 | 0 |
| 6 | 1 | 2 |
| 7 | 0 | 2 |
| 8 | 0 | 1 |
| 9 | 0 | 2 |
This format immediately reveals that female ratings sit higher than male ratings: males scored between 3 and 6, females scored between 6 and 9, and the two groups overlap only at a score of 6.
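As a minimal sketch, the frequency table can be rebuilt from the raw data table by counting how often each score occurs. The variable and function names here are illustrative and not part of the specification.

```python
from collections import Counter

# Self-rated obedience scores from the raw data table (participants A to G).
males = [3, 5, 4, 6, 4, 3, 4]
females = [7, 9, 6, 8, 7, 6, 9]

def frequency_table(scores):
    """Count how many times each score occurs in a list of scores."""
    return Counter(scores)

male_freq = frequency_table(males)
female_freq = frequency_table(females)

# Print one row for every score that occurs in either group.
# A Counter returns 0 for a score that never occurred.
print("Score | Males | Females")
for score in sorted(set(males) | set(females)):
    print(f"{score:>5} | {male_freq[score]:>5} | {female_freq[score]:>7}")
```

The printed rows should match the frequency table above, one row per score from 3 to 9.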
Raw data tables vs. Frequency tables:
Raw data tables are essential for maintaining complete records and performing detailed calculations, but they can be overwhelming when dealing with large datasets. Frequency tables immediately reveal patterns in the data by showing the distribution of scores, making them particularly valuable for identifying trends and comparing groups at a glance.
Measures of central tendency
Central tendency statistics calculate the average or most typical value in a dataset. These measures provide a single representative score for the entire data set, making interpretation and comparison easier. There are three main measures: the mean, median, and mode.
The arithmetic mean
The mean is calculated by adding all values in a dataset and dividing by the total number of scores. It is the most commonly used measure of central tendency.
Formula: $\bar{x} = \dfrac{\sum x}{n}$
Where:
- $\bar{x}$ = mean
- $\sum$ = sum of
- $x$ = individual scores
- $n$ = number of scores
Worked Example: Calculating the Mean
Data set: 3, 5, 7, 9, 10, 11, 13
Step 1: Sum all scores: $3 + 5 + 7 + 9 + 10 + 11 + 13 = 58$
Step 2: Count the number of scores: $n = 7$
Step 3: Divide the sum by the number of scores: $58 \div 7 = 8.29$
Result: The mean is 8.3 (to one decimal place)
The mean is the most sensitive measure of central tendency because every value in the dataset affects the calculation. This makes it powerful for statistical analysis. However, it can be distorted by extreme values (outliers) or in datasets with a skewed distribution - where values cluster at one end rather than forming a symmetrical pattern. The mean is most appropriate for interval/ratio level data - numerical measurements on scales with equal intervals, such as time or height.
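The worked example, and the point about outliers, can be checked with a short sketch. The extra score of 100 is an assumed value added purely to illustrate distortion; it is not part of the original data set.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)

data = [3, 5, 7, 9, 10, 11, 13]
print(round(mean(data), 1))  # 8.3, as in the worked example

# Assumed illustration: a single extreme score pulls the mean upwards,
# even though the other seven scores are unchanged.
with_outlier = data + [100]
print(round(mean(with_outlier), 1))
```

Because every score enters the sum, one unusual value is enough to move the mean well away from the typical score.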
When to use the mean:
- When you have interval or ratio level data (continuous numerical measurements)
- When your data is symmetrically distributed without extreme outliers
- When you need the most mathematically sensitive measure for further statistical tests
- When all values in your dataset are relevant to the overall picture
The median
The median is the middle value when scores are arranged in order from smallest to largest. It provides a measure of central tendency that is not affected by extreme scores.
Calculation process:
- Arrange all values in rank order
- If there is an odd number of scores, the median is the middle value
- If there is an even number of scores, calculate the mean of the two middle values
Worked Example: Finding the Median
With odd number of scores: Data set: 3, 6, 8, 9, 10 The median is 8 (the middle value)
With even number of scores: Data set: 3, 5, 6, 7, 8, 9
Step 1: Identify the two middle values: 6 and 7
Step 2: Calculate their mean: $(6 + 7) \div 2 = 6.5$
Result: The median is 6.5
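The calculation process above can be sketched as a small function that handles both the odd and even cases. The function name is illustrative only.

```python
def median(scores):
    """Middle value of the ranked scores.

    With an even number of scores, returns the mean of the two middle values.
    """
    ranked = sorted(scores)          # arrange values in rank order
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:                   # odd number of scores
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2  # even number of scores

print(median([3, 6, 8, 9, 10]))    # 8
print(median([3, 5, 6, 7, 8, 9]))  # 6.5
```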
The median provides a straightforward calculation of central tendency that is not influenced by outliers or skewed distributions. However, it is less sensitive than the mean because it only considers the position of values, not their actual magnitude. The median is not useful for small datasets as it may not represent the typical score. It is most commonly used for ordinal data - measurements that represent rankings rather than precise intervals.
When to use the median:
- When your data contains extreme outliers that would distort the mean
- When you have ordinal (ranked) data
- When your distribution is skewed
- When you need a quick estimate of the central value that isn't affected by unusual scores
The mode
The mode is the value that appears most frequently in a dataset. It identifies the most common score.
Example: Data set: 2, 2, 3, 3, 4, 4, 4, 6, 7, 7, 7, 7, 9 The mode is 7 (it occurs four times)
When two scores occur most frequently with equal frequency, the data is described as bi-modal, and both values should be reported. When more than two modes exist, the mode becomes meaningless as a measure of central tendency.
The mode is used for nominal data - categorical information where values represent discrete categories (such as hair colour: blonde, brown, red, black). It is easy to determine and is not affected by extreme scores. However, it is not a useful measure for datasets with multiple modes or with frequently recurring identical values.
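A brief sketch of finding the mode, including a bi-modal case and nominal data. It uses `statistics.multimode` from the Python standard library (Python 3.8 or later); the bi-modal and hair colour data sets are made up for illustration.

```python
from statistics import multimode

# Data set from the example above: 7 occurs four times.
scores = [2, 2, 3, 3, 4, 4, 4, 6, 7, 7, 7, 7, 9]
print(multimode(scores))   # [7]

# Hypothetical bi-modal data: 2 and 5 each occur twice, so both are reported.
bimodal = [1, 2, 2, 3, 5, 5, 6]
print(multimode(bimodal))  # [2, 5]

# Hypothetical nominal data: categories cannot be averaged, but they can be counted.
hair = ["brown", "blonde", "brown", "black", "red", "brown"]
print(multimode(hair))     # ['brown']
```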
When to use the mode:
- When you have nominal (categorical) data that cannot be meaningfully averaged
- When you want to identify the most popular or common category
- When you need a measure unaffected by extreme values
- When dealing with discrete categories rather than continuous measurements
Measures of dispersion
Dispersion statistics calculate the spread of scores in a dataset. While measures of central tendency can be similar between two groups, the spread of scores may differ considerably. Understanding dispersion is essential for interpreting whether scores cluster closely together or vary widely.
Range
The range is the simplest calculation of dispersion. It represents the difference between the highest and lowest value in a dataset.
Calculation: Range = Highest value - Lowest value
Worked Example: Calculating Range
Data set: 2, 4, 6, 8, 9
Step 1: Identify the highest value: 9
Step 2: Identify the lowest value: 2
Step 3: Calculate the difference: $9 - 2 = 7$
Result: Range = 7
Interpretation: The scores span 7 points from lowest to highest. Whether this counts as a wide or narrow spread depends on the scale being used.
A high range value indicates that scores are spread out widely. A low range value indicates that scores are close together.
Limitation of the Range:
The range is affected by extreme scores, making it potentially misleading if outliers are present in the data. It also provides no information about whether scores cluster around the mean or are evenly distributed. When a dataset contains extreme scores, the interquartile range provides a better measure. This involves removing the lowest quarter and highest quarter of values (the bottom and top 25%) and calculating the range of the remaining middle half of scores.
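The sketch below computes the range and a simplified version of the interquartile range that follows the description above (drop the lowest and highest quarter of the ranked scores, then take the range of what remains). This is an assumption-laden simplification: textbooks and software locate quartiles in slightly different ways, and the data set containing the extreme score of 50 is hypothetical.

```python
def value_range(scores):
    """Range: highest value minus lowest value."""
    return max(scores) - min(scores)

def middle_half_range(scores):
    """Simplified interquartile range.

    Removes the lowest and highest quarter of the ranked scores and
    returns the range of the remaining middle half.
    """
    ranked = sorted(scores)
    quarter = len(ranked) // 4
    middle = ranked[quarter:len(ranked) - quarter]
    return value_range(middle)

print(value_range([2, 4, 6, 8, 9]))     # 7, as in the worked example

# Hypothetical data with one extreme score.
with_outlier = [2, 4, 6, 8, 9, 10, 12, 50]
print(value_range(with_outlier))        # 48, inflated by the outlier
print(middle_half_range(with_outlier))  # 4, range of the middle scores 6, 8, 9, 10
```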
Standard deviation
The standard deviation is a more sophisticated way to measure the spread of scores. It calculates the average distance of each value from the arithmetic mean. This provides a single value representing how scores are distributed around the mean - the higher the standard deviation, the greater the spread of scores from the mean value.
Understanding deviation: Deviation refers to the distance of each individual value from the mean. For example, if the mean obedience rating for males is 7, and one male rated himself as 9, the deviation score would be +2. If another male rated himself as 5, the deviation would be -2.
To obtain a single value representing all deviation scores, the standard deviation calculation involves several steps:
Formula: $\text{standard deviation} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Calculation steps:
- Calculate the mean of the dataset
- Subtract the mean from each individual score to find the deviation
- Square each deviation value (this eliminates negative values)
- Sum all the squared deviations
- Divide this sum by the number of scores minus 1 (this gives the variance)
- Take the square root of the variance to find the standard deviation
Worked Example: Calculating Standard Deviation
| Score ($x$) | Mean ($\bar{x}$) | Deviation ($x - \bar{x}$) | Squared deviation ($(x - \bar{x})^2$) |
|---|---|---|---|
| 70 | 100 | -30 | 900 |
| 80 | 100 | -20 | 400 |
| 90 | 100 | -10 | 100 |
| 100 | 100 | 0 | 0 |
| 110 | 100 | 10 | 100 |
| 120 | 100 | 20 | 400 |
| 130 | 100 | 30 | 900 |
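Step 1: Calculate the sum of squared deviations: $900 + 400 + 100 + 0 + 100 + 400 + 900 = 2800$
Step 2: Identify the number of scores: $n = 7$, so $n - 1 = 6$
Step 3: Calculate the variance: $2800 \div 6 = 466.67$
Step 4: Take the square root to find standard deviation: $\sqrt{466.67} = 21.60$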
Result: The standard deviation is 21.6
Interpretation: On average, scores deviate from the mean (100) by approximately 21.6 points, indicating a moderate spread in the data.
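The calculation steps can be followed one by one in code. This is a sketch of the version that divides by the number of scores minus 1, as in the steps above; the function name is illustrative, and `statistics.stdev` from the Python standard library is used only as a cross-check.

```python
import math
from statistics import stdev

def standard_deviation(scores):
    """Standard deviation using n - 1, following the calculation steps above."""
    mean = sum(scores) / len(scores)                        # calculate the mean
    squared_deviations = [(x - mean) ** 2 for x in scores]  # deviation, then square
    sum_of_squares = sum(squared_deviations)                # sum the squared deviations
    variance = sum_of_squares / (len(scores) - 1)           # divide by n - 1
    return math.sqrt(variance)                              # square root of the variance

scores = [70, 80, 90, 100, 110, 120, 130]
print(round(standard_deviation(scores), 1))  # 21.6

# Cross-check against the standard library's n - 1 version.
print(round(stdev(scores), 1))               # 21.6
```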
Using calculators:
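Scientific calculators can compute the mean and standard deviation quickly. Input each score individually using the 'add' function, then press the sample standard deviation button (on many models this is labelled $\sigma_{n-1}$ or $s_x$). For standard calculations, use this button rather than the population version ($\sigma_n$ or $\sigma_x$), because the calculation steps above divide by $n - 1$. Standard calculators can perform statistical calculations but take longer to use.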
When asked to calculate a sum in an examination, all working must be shown along with the numerical result achieved.
Summary tables
Summary tables present multiple descriptive statistics together, allowing clear comparison between conditions or groups. They typically include measures of both central tendency and dispersion.
Example summary table:
| Measure | Males | Females |
|---|---|---|
| Mean obedience rating (x̄) | 4.1 | 7.4 |
| Median obedience rating | 4 | 7 |
| Mode obedience rating | 4 | 6, 7 and 9 |
| Range of obedience ratings | 3 | 3 |
| Standard deviation (σx) | 1 | 1.2 |
This format reveals that the typical obedience score is higher for females than males. The modal score for females is not useful because three modes were found: several scores tie for most frequent, so no single score is the most common. The range is identical for both groups, and the standard deviations are similar, suggesting a roughly equal spread of scores around the mean in both conditions. Overall, it can be concluded that females in this sample rate themselves as more obedient than males.
However, whether this difference represents a genuine difference or is due to chance factors can only be established by conducting an inferential test on the data. Inferential tests are statistical procedures performed on data to establish whether relationships or differences found were due to chance factors or whether there was indeed a relationship or difference between the variables.
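As a rough check, the summary table can be regenerated from the raw obedience data using Python's standard `statistics` module. One observation, offered tentatively: the σx values in the table (1 and 1.2) appear to match the calculation that divides by n (`pstdev`), whereas dividing by n - 1 (`stdev`), as in the calculation steps earlier, gives slightly larger values. The `summarise` function and its dictionary keys are illustrative names.

```python
from statistics import mean, median, multimode, pstdev, stdev

# Raw data table: self-rated obedience (out of 10), participants A to G.
males = [3, 5, 4, 6, 4, 3, 4]
females = [7, 9, 6, 8, 7, 6, 9]

def summarise(scores):
    """Return the descriptive statistics shown in the summary table."""
    return {
        "mean": round(mean(scores), 1),
        "median": median(scores),
        "mode": sorted(multimode(scores)),
        "range": max(scores) - min(scores),
        "sd (divide by n)": round(pstdev(scores), 1),
        "sd (divide by n - 1)": round(stdev(scores), 1),
    }

male_summary = summarise(males)
female_summary = summarise(females)

for group, summary in (("Males", male_summary), ("Females", female_summary)):
    print(group, summary)
```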
Graphical representation of data
Graphs can be useful for illustrating summary data or data frequencies. Visual representations make patterns and trends more immediately apparent than tables of numbers.
Bar charts
Bar charts are used to present a summary value, such as the mean, median or mode, for each category of a categorical variable. The categories are placed on the x-axis (horizontal), and the height of each bar represents the value for that category.
For example, a bar chart showing the mean number of words recalled in two conditions (acoustically similar sounding words versus acoustically dissimilar sounding words) would place the two conditions on the x-axis and use bar height to show the mean recall score for each condition. The bars are separated by spaces to emphasize that the data represents separate categories.
Histograms
A histogram presents the distribution of scores by illustrating the frequency of values in the dataset. Unlike bar charts, where bars are separated by spaces, the bars on a histogram are joined together to represent continuous data rather than categorical (discrete) data. The possible values are presented on the x-axis, and the height of each bar represents the frequency (how often that value occurred).
Bar Charts vs. Histograms - Key Distinction:
The critical difference between these two graph types:
- Bar charts: Bars are separated by spaces - used for discrete categorical data
- Histograms: Bars are joined together - used for continuous numerical data
Choosing the wrong graph type can misrepresent your data and lead to incorrect interpretations.
Improving graph readability:
Patterns within data may be easier to identify when data is grouped before being displayed on a graph. Raw scores displayed on a graph can be very difficult to interpret, appearing messy and providing little useful information for drawing conclusions. Graphical representations should be meaningful and informative - simply displaying raw data does not achieve this purpose.
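A small sketch of grouping raw scores into class intervals before plotting. The scores and the interval width of 10 are hypothetical choices made for illustration; the text-based bars stand in for a histogram.

```python
from collections import Counter

# Hypothetical raw scores (out of 50), for illustration only.
raw_scores = [3, 7, 12, 14, 18, 21, 22, 25, 27, 29, 31, 33, 36, 41, 48]

def group_scores(scores, width):
    """Count how many scores fall in each class interval (e.g. 0-9, 10-19)."""
    counts = Counter((score // width) * width for score in scores)
    return {
        f"{start}-{start + width - 1}": counts[start]
        for start in sorted(counts)
    }

grouped = group_scores(raw_scores, 10)

# One row per interval; the number of # symbols is the frequency.
for interval, frequency in grouped.items():
    print(f"{interval:>5}: {'#' * frequency}")
```

Fifteen separate raw scores become five intervals, and the peak in the 20-29 interval is visible at a glance.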
When interpreting a data table or graph in an examination, describe the trends and features of the data that are presented, such as which condition produced higher scores or which group showed greater variation.
Distribution patterns
Understanding distribution
Psychological research can use small samples of participants where only measures of central tendency and dispersion provide useful descriptive statistics. When larger samples are gathered, it may be more useful to examine the overall distribution formed by the collected data. Distribution refers to the overall frequency pattern of values in a dataset. Examining distribution can reveal trends in the data that cannot be detected using small samples, and allows estimation of the distribution of scores in the whole population.
Percentage calculations
The percentage score provides an overall indication of the relative proportion of people who achieved a particular score. To calculate a percentage, the sum of all values must be calculated, then each individual value is divided by this sum total and multiplied by 100.
Worked Example: Calculating Percentages
5 people achieved an obedience score of 9. The sum total of participants was 49.
Step 1: Divide the number achieving the score by the total: $5 \div 49 = 0.1020$
Step 2: Multiply by 100 to convert to percentage: $0.1020 \times 100 = 10.20$
Result: 10.20% of participants achieved an obedience score of 9.
Interpretation: Approximately 10% of the participants achieved an obedience score of 9.
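The same two steps, written as a short sketch with an illustrative function name:

```python
def percentage(count, total):
    """Proportion of the total, expressed as a percentage."""
    return count / total * 100

# 5 of the 49 participants achieved an obedience score of 9.
result = percentage(5, 49)
print(f"{result:.2f}%")  # 10.20%
```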
Normal distribution
When the frequency distribution of a population is calculated, it can be represented on a frequency graph. If the graph illustrates a bell-shaped curve, the data has a normal distribution.
Normal distribution is characterized by its symmetry around the mid-point. The mean, median and mode should be aligned around the mid-point. The tail ends do not meet the horizontal axis, and the percentage of people falling under the curve at each standard deviation can be estimated.
In a normal distribution:
- Approximately 68% of the population falls within one standard deviation either side of the mid-point
- Approximately 95% of the population falls within two standard deviations either side of the mid-point
The standard deviation must be calculated on the raw scores to understand exactly what value is represented by these intervals.
Worked Example: Interpreting Normal Distribution
The standard deviation for obedience scores was 2.8 and the mean, median and mode score was 7.
Step 1: Place the mean at the mid-point: 7
Step 2: Calculate +1 standard deviation from the mid-point: $7 + 2.8 = 9.8$
Step 3: Calculate -1 standard deviation from the mid-point: $7 - 2.8 = 4.2$
Step 4: Calculate +2 standard deviations: $7 + (2 \times 2.8) = 12.6$
Step 5: Calculate -2 standard deviations: $7 - (2 \times 2.8) = 1.4$
Interpretation:
- 68% of the sample would achieve an obedience score between 4.2 and 9.8
- 95% of the sample would achieve an obedience score between 1.4 and 12.6
Conclusion: The distribution of self-reported obedience would be normal.
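A sketch of the interval calculations, with a check of the 68% and 95% figures using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). It assumes the scores follow an idealised normal curve with mean 7 and standard deviation 2.8; the function name is illustrative.

```python
from statistics import NormalDist

def sd_interval(mean, sd, k):
    """Scores lying k standard deviations either side of the mean."""
    return (round(mean - k * sd, 1), round(mean + k * sd, 1))

one_sd = sd_interval(7, 2.8, 1)
two_sd = sd_interval(7, 2.8, 2)
print(one_sd)  # (4.2, 9.8)
print(two_sd)  # (1.4, 12.6)

# Proportion of an idealised normal curve that lies inside each interval.
curve = NormalDist(mu=7, sigma=2.8)
within_one = curve.cdf(one_sd[1]) - curve.cdf(one_sd[0])
within_two = curve.cdf(two_sd[1]) - curve.cdf(two_sd[0])
print(f"{within_one:.1%}, {within_two:.1%}")
```

The printed proportions come out slightly above 68% and 95%, which is why these figures are best treated as approximations.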
Skewed distribution
Some distributions are not normal but are described as skewed because they are not symmetrical. This may result from the test administered or the type of sample gathered.
Negative skew: If a test is easy or the aptitude of the sample is unusually high, most people will score highly. This leads to a negative skew, where many people score above the average or mean score. In a negatively skewed distribution, the tail is at the negative end of values on the horizontal axis (lower scores).
Positive skew: If the test was particularly difficult or the aptitude of the sample is low, most people will achieve a low score. This leads to a positive skew, where the tail is at the positive end of values on the horizontal axis (higher scores).
Remember: The tail determines the skew direction
- Positive skew: Tail points toward higher scores (positive end) → most people scored low
- Negative skew: Tail points toward lower scores (negative end) → most people scored high
Memory aid: An alternative way to remember this is to imagine drawing a face on the graph curve - a whale swimming toward the vertical axis is coming home (positive), and a whale swimming away from the vertical axis is leaving home (negative).
How the mean changes in skewed distributions:
The mean is affected by extreme scores, so it is pulled toward the tail. In a negative skew the mean is lower than the mode; in a positive skew the mean is higher than the mode. The median typically falls between the two. The relationship between mean, median and mode therefore shifts depending on the type of distribution.
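The shift in the mean can be seen with two small data sets. Both are hypothetical test scores (out of 10) invented for illustration: one from an easy test, one from a hard test.

```python
from statistics import mean, median, mode

# Hypothetical easy test: most people score highly, tail toward low scores.
negative_skew = [2, 5, 7, 8, 8, 9, 9, 9, 10, 10]

# Hypothetical hard test: most people score low, tail toward high scores.
positive_skew = [0, 0, 1, 1, 1, 2, 2, 3, 5, 8]

for label, scores in (("Negative skew", negative_skew),
                      ("Positive skew", positive_skew)):
    print(label, "mean:", mean(scores),
          "median:", median(scores), "mode:", mode(scores))
```

In the negatively skewed set the order is mean (7.7), median (8.5), mode (9); in the positively skewed set it reverses to mode (1), median (1.5), mean (2.3).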
Key Points to Remember:
- Quantitative data is numerical information that can be analyzed using descriptive statistics including measures of central tendency and dispersion.
- The mean is calculated by summing all scores and dividing by the number of scores; it is most appropriate for interval/ratio data but can be affected by extreme values. Formula: $\bar{x} = \dfrac{\sum x}{n}$
- The median is the middle value when scores are ranked; it is not affected by outliers and is most suitable for ordinal data.
- The mode is the most frequent score; it is used for nominal data and is not affected by extreme scores.
- Standard deviation measures the average spread of scores around the mean and is calculated using the formula: $\sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
- In a normal distribution, data forms a symmetrical bell-shaped curve with the mean, median and mode aligned, and 68% of values fall within one standard deviation of the mean.
- Skewed distributions are asymmetrical - a positive skew has its tail toward higher scores (most people score low), while a negative skew has its tail toward lower scores (most people score high).
- Bar charts have separated bars for categorical data, while histograms have joined bars for continuous data.
- Summary tables effectively present multiple statistics together, but inferential tests are needed to determine if differences are statistically significant or due to chance.