Reliability (AQA A-Level Psychology): Revision Notes
Reliability
What is reliability?
Reliability refers to the consistency of findings from research investigations or measuring instruments. When we describe something as reliable in everyday life, we mean it is dependable and consistent - like a friend who is always punctual or a car that maintains steady performance. In psychology, reliability works similarly: research methods and measuring tools should produce consistent results each time they are used.
Psychology measures abstract concepts like intelligence, attitudes, and memory. This makes establishing reliability particularly challenging compared to measuring concrete physical properties like height or weight.
For psychological tests, scales, surveys, observations and experiments to be trustworthy, they must demonstrate consistency in their measurements across different occasions.
Types of reliability
Test-retest reliability
This method evaluates whether a questionnaire or psychological test produces consistent results over time. The same test is administered to the same participants on two separate occasions. If the measuring instrument is reliable, participants should achieve similar scores both times.
The time gap between administrations must be carefully chosen - long enough so participants cannot simply remember their previous answers, but short enough that genuine changes in attitudes, opinions or abilities have not occurred.
Researchers then correlate the two sets of scores. A significant positive correlation, conventionally a coefficient of +.80 or higher, indicates good reliability.
Worked Example: Test-Retest Reliability
A researcher wants to test the reliability of an anxiety questionnaire:
- Week 1: Administer the questionnaire to 50 participants
- Week 3: Re-administer the same questionnaire to the same 50 participants
- Analysis: Calculate correlation between Week 1 and Week 3 scores
- Result: If the correlation is +.85, this indicates good test-retest reliability (above the +.80 threshold)
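The analysis step in the worked example can be sketched in Python. This is a minimal illustration, not a prescribed method: the scores are made up, only six hypothetical participants are shown rather than 50, and the names `pearson_r`, `week1`, `week3` and `THRESHOLD` are assumptions introduced here. It computes a Pearson correlation coefficient and compares it with the +.80 threshold from these notes; it does not test statistical significance.

```python
from math import sqrt

THRESHOLD = 0.80  # reliability threshold quoted in these notes


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each set of scores
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical anxiety scores for the same six participants (illustrative only)
week1 = [12, 18, 25, 30, 22, 15]
week3 = [14, 17, 27, 29, 20, 16]

r = pearson_r(week1, week3)
print(f"correlation = {r:+.2f}")
if r >= THRESHOLD:
    print("meets the +.80 threshold: good test-retest reliability")
else:
    print("below the +.80 threshold: low test-retest reliability")
```

Each position in `week1` and `week3` must refer to the same participant, otherwise the coefficient is meaningless.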
Inter-observer reliability
Different observers may interpret behaviours differently, introducing subjectivity and bias into data collection. The phrase "beauty is in the eye of the beholder" illustrates how personal perspectives can affect observations.
Inter-observer reliability measures the extent of agreement between two or more observers recording the same behaviour. Rather than working alone, observers should work in teams of at least two people, each recording the behaviour independently. The observations from different observers are then correlated to assess consistency.
Agreement Formula for High Reliability:
If (total number of agreements) ÷ (total number of observations) exceeds .80 (that is, 80% agreement), the data demonstrate high inter-observer reliability.
This method also applies to inter-rater reliability (for content analysis) and inter-interviewer reliability.
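The agreement formula above can be sketched in Python. This is a minimal illustration under stated assumptions: the behaviour codes and both observers' records are made up, and the names `agreement_proportion`, `observer_a` and `observer_b` are hypothetical. Each list position stands for the same observation interval, recorded independently by each observer.

```python
THRESHOLD = 0.80  # agreement threshold quoted in these notes


def agreement_proportion(records_a, records_b):
    """Return (number of agreements) / (number of observations)."""
    if len(records_a) != len(records_b) or not records_a:
        raise ValueError("need two non-empty records of equal length")
    # An agreement is an interval where both observers logged the same category
    agreements = sum(a == b for a, b in zip(records_a, records_b))
    return agreements / len(records_a)


# Hypothetical records for ten observation intervals (illustrative only)
observer_a = ["push", "hit", "none", "push", "none",
              "hit", "none", "push", "none", "none"]
observer_b = ["push", "hit", "none", "hit", "none",
              "hit", "none", "push", "none", "none"]

proportion = agreement_proportion(observer_a, observer_b)
print(f"agreement = {proportion:.2f}")
if proportion > THRESHOLD:
    print("high inter-observer reliability")
else:
    print("low inter-observer reliability")
```

Here the observers disagree on one interval out of ten, so the proportion is 9 ÷ 10 = .90, which exceeds the threshold.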
Internal and external reliability
Internal reliability is the extent to which the items within a questionnaire or psychological test are consistent with each other. External reliability is the extent to which the questionnaire or test produces consistent results every time it is used, which is essentially what test-retest reliability measures.
Improving reliability across research methods
Questionnaires
When questionnaires show low test-retest reliability (correlation below +.80), some items may need modification or removal. Questions that are complex or ambiguous might be interpreted differently by the same person on different occasions.
Solutions for improving questionnaire reliability:
- Replacing open questions with closed, fixed-choice alternatives to reduce ambiguity
- Rewriting unclear questions
- Removing problematic items entirely
Interviews
The most effective way to ensure interview reliability is using the same interviewer throughout the study. When this is impractical, all interviewers must receive proper training to avoid asking leading or ambiguous questions.
Structured interviews with fixed questions are more reliable than unstructured, free-flowing interviews because they provide greater control over the interviewer's behaviour and reduce variability between sessions.
Experiments
Laboratory experiments are often considered reliable because researchers can exert strict control over procedures, instructions, and testing conditions. Strictly speaking, this control makes it possible to replicate the method precisely, which is not the same as demonstrating that a finding is reliable.
However, reliability can be compromised if participants are tested under slightly different conditions each time. Maintaining identical conditions across all testing sessions is essential for reliable results.
Observations
Observation reliability improves when behavioural categories are properly operationalised. Categories should be:
- Measurable and self-evident (e.g., "pushing" is clearer than "aggression")
- Non-overlapping (e.g., "hugging" and "cuddling" should not both appear)
- Comprehensive enough to cover all possible relevant behaviours
Before main data collection, researchers may conduct a pilot study to check that observers apply behavioural categories consistently.
Poor operationalisation forces observers to make subjective judgements, leading to inconsistent and unreliable records.
Key Points to Remember:
- Reliability means consistency - measuring devices should produce the same results when used repeatedly
- Test-retest reliability requires a correlation of +.80 or higher between two testing sessions
- Inter-observer reliability needs multiple observers to record independently, with agreement above .80 (80%)
- Different research methods require specific strategies to improve reliability
- Proper operationalisation of behavioural categories is essential for reliable observations