Operant Conditioning (Edexcel A-Level Psychology): Revision Notes
Introduction
Operant conditioning is a learning process in which behaviour becomes associated with its consequences. The principle is straightforward: if a behaviour is followed by a pleasant consequence (reward), the likelihood of repeating that behaviour increases. Conversely, if a behaviour is followed by an unpleasant consequence (punishment), the likelihood of repeating that behaviour decreases.
Classic Example: The Learning Pigeon
When a laboratory pigeon pecks a blue button with its beak and receives a food pellet as a reward, it learns to repeat this behaviour. However, if pecking a red button results in a mild electric shock, the pigeon learns to avoid pecking the red button in future. This demonstrates how organisms learn to associate their actions with outcomes and adjust their behaviour accordingly.
Historical background: Edward Thorndike
Edward Thorndike (1911) pioneered this area of research and originally termed it instrumental learning. This refers to learning in which the consequence of a behaviour determines whether that behaviour will be repeated in future.
Thorndike's puzzle box experiment
Worked Example: The Puzzle Box Experiment
Thorndike conducted experiments using a puzzle box: a small cage in which a hungry kitten was placed, with a food reward outside. To escape and obtain the food, the kitten had to solve a puzzle by operating a mechanism (such as pressing a latch) that opened the door.
Observation Process:
- Trial 1: The kitten explores randomly, eventually hitting the latch by accident and escaping
- Subsequent trials: The kitten escapes progressively faster as it learns through trial and error
- Learning outcome: The kitten learns that pressing the latch leads to door opening and food reward
Law of effect
From this research, Thorndike formulated the law of effect, which states that behaviour followed by a pleasant consequence tends to be repeated, whilst behaviour followed by an unpleasant consequence tends not to be repeated. This principle became foundational to understanding how consequences shape behaviour.
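The law of effect can be illustrated with a small model of the puzzle box. This Python sketch rests on hypothetical assumptions, not on Thorndike's data: the kitten tries actions at random, each chosen with probability proportional to an invented "strength" value, and escaping to the food adds a fixed amount to the strength of the action that opened the door. The action names, starting strengths and increment are all illustrative.

```python
# Hypothetical model of Thorndike's law of effect in the puzzle box.
# Assumptions (illustrative only, not Thorndike's data): the kitten tries
# actions at random, each chosen with probability proportional to its
# "strength", and escaping to the food adds a fixed amount to the strength
# of the action that opened the door.


def expected_actions_to_escape(strengths: dict[str, float], escape_action: str) -> float:
    """Average number of actions needed to hit the escape action (including it).

    Choices are independent, so this is the mean of a geometric
    distribution: 1 / p, where p is the escape action's share of strength.
    """
    p = strengths[escape_action] / sum(strengths.values())
    return 1 / p


def run_trials(n_trials: int, reward_increment: float = 1.0) -> list[float]:
    # Equal starting strengths: the first trial is pure trial and error.
    strengths = {"press latch": 1.0, "scratch walls": 1.0,
                 "push door": 1.0, "bite bars": 1.0}
    history = []
    for _ in range(n_trials):
        history.append(expected_actions_to_escape(strengths, "press latch"))
        # Law of effect: the action followed by a satisfying outcome
        # (escape and food) is strengthened for the next trial.
        strengths["press latch"] += reward_increment
    return history


for trial, actions in enumerate(run_trials(5), start=1):
    print(f"Trial {trial}: about {actions:.2f} actions before escaping")
```

Under these assumptions the expected number of actions falls from 4.00 on the first trial to 2.50, 2.00, 1.75 and 1.60, mirroring the progressively faster escapes described in the puzzle box observations. Unsuccessful actions are simply left unchanged here, which is a deliberate simplification.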
Thorndike also proposed the Law of Exercise, which suggests that, all other things being equal, the more frequently a response is performed in a given situation, the more likely it is to be repeated. This emphasises the role of repetition and practice in strengthening behavioural responses.
B.F. Skinner's contribution
B.F. Skinner built upon Thorndike's work in the 1930s and renamed instrumental learning as operant conditioning. Skinner was committed to the scientific study of observable behaviour and believed that to understand human behaviour, it was necessary to apply rigorous scientific principles and methods. He felt the term 'operant conditioning' was more appropriate because it emphasises that organisms actively 'operate' on their environment, producing consequences that in turn shape their future behaviour.
The ABC model of operant conditioning
Skinner developed the ABC model to explain how operant conditioning functions:
- Antecedent: The stimulus (such as lights or noise) that triggers a behaviour
- Behaviour: The response made by the organism that can be observed and measured as an outcome of the antecedent
- Consequence: The reward or punishment following the behaviour (such as food or an electric shock)
The stimulus-response association is learnt and repeated only if the consequence is rewarding (reinforcing). An unpleasant (punishing) consequence weakens the stimulus-response link.
The Skinner box
Skinner conducted systematic laboratory experiments using his 'Skinner box': a chamber containing a lever (or, for pigeons, a key to peck) linked to a food dispenser and, in some versions, a floor that could deliver electric shocks to animals such as rats or pigeons. This apparatus allowed precise control over experimental conditions and measurement of behavioural responses.
For instance, if a rat or pigeon is given a food pellet following a desired behaviour (such as lever pressing), it becomes more likely to repeat this behaviour. Conversely, if it is given an electric shock when pressing a lever, it learns to avoid this behaviour to prevent further unpleasant stimulation.
Positive and negative reinforcement
Reinforcement refers to any consequence that increases the likelihood of a behaviour being repeated. There are two types:
Positive reinforcement
Positive reinforcement occurs when a pleasurable stimulus is added following a desired behaviour, increasing the likelihood of repetition. The term 'positive' indicates that something is being added to the situation. For example, if a rat presses a lever and receives a food pellet, the food acts as positive reinforcement, making the rat more likely to press the lever again to obtain further food rewards.
Negative reinforcement
Negative reinforcement involves the removal of an unpleasant stimulus in response to a desired behaviour, which also increases the likelihood of repetition. The term 'negative' indicates that something is being taken away. For instance, if a rat or pigeon is given an electric shock that continues until a lever is pressed, pressing the lever removes the unpleasant stimulus. The organism learns to press the lever to escape the shock, and is more likely to press it whenever the shock occurs in future.
Critical Distinction
Both positive and negative reinforcement increase the frequency of behaviour - they differ only in whether a pleasant stimulus is added or an unpleasant stimulus is removed.
Punishment
Whilst reinforcement strengthens behaviour, punishment weakens it by presenting something unpleasant or removing something pleasant whenever the behaviour is shown. There are two types:
Positive punishment (P+)
Positive punishment involves adding an aversive stimulus to reduce the frequency of a behaviour. For example, a child who misbehaves at a party might be shouted at and scolded. This unpleasant experience (shouting and scolding) reduces the likelihood of the misbehaviour recurring.
Negative punishment (P-)
Negative punishment involves the removal of a liked or desirable stimulus to reduce the frequency of a behaviour. For instance, if a dog jumps on a person to greet them and the person walks away, the person is removing their attention from the dog. This removal of attention reduces the frequency of jumping behaviour in future.
Understanding Positive and Negative Terminology
The terms 'positive' and 'negative' in this context refer to adding or removing stimuli, not to whether the stimulus is good or bad.
- Both positive and negative punishment aim to reduce behaviour
- Both positive and negative reinforcement aim to increase behaviour
- Reinforcement and punishment can operate through both the addition (positive) and removal (negative) of stimuli
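The four combinations can be summarised as a simple decision rule with two questions: was a stimulus added or removed, and was it pleasant or unpleasant? The Python sketch below is only a study aid; the function names are hypothetical and the code simply encodes the definitions given in this section, using the examples from the notes as comments.

```python
# Study-aid sketch: the two questions that name any consequence.
# Function names are hypothetical; the rules are the definitions above.


def classify_consequence(stimulus_added: bool, stimulus_pleasant: bool) -> str:
    """'Positive'/'negative' = stimulus added/removed, not good/bad."""
    if stimulus_added and stimulus_pleasant:
        return "positive reinforcement"  # food pellet after a lever press
    if not stimulus_added and not stimulus_pleasant:
        return "negative reinforcement"  # shock switched off by a lever press
    if stimulus_added and not stimulus_pleasant:
        return "positive punishment"     # child scolded for misbehaving
    return "negative punishment"         # attention withdrawn from a jumping dog


def effect_on_behaviour(kind: str) -> str:
    # Both kinds of reinforcement increase behaviour; both kinds of punishment reduce it.
    return "increase" if kind.endswith("reinforcement") else "decrease"


for added in (True, False):
    for pleasant in (True, False):
        kind = classify_consequence(added, pleasant)
        action = "added" if added else "removed"
        quality = "pleasant" if pleasant else "unpleasant"
        print(f"{quality} stimulus {action}: {kind} ({effect_on_behaviour(kind)}s behaviour)")
```

Notice that the pleasantness of the stimulus alone does not decide the outcome: removing an unpleasant stimulus strengthens behaviour just as adding a pleasant one does.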
Types of reinforcer
Reinforcers are consequences that increase the likelihood of a behaviour being repeated. They fall into two categories:
Primary reinforcers
Primary reinforcers occur naturally and satisfy basic biological needs such as food, water and shelter. These are innately rewarding and do not require any prior learning to be effective. Humans and animals are naturally motivated to obtain primary reinforcers as they are essential for survival.
Secondary reinforcers
Secondary reinforcers, in contrast, strengthen behaviour only because they are associated with a primary reinforcer. Money is a classic example: it can be used to purchase food, accommodation and clothing. Secondary reinforcers derive their reinforcing properties through learning and association rather than being inherently rewarding.
Token economy
A token economy is a practical application based on operant conditioning principles. It aims to encourage desirable behaviour through a system of reward whilst discouraging undesirable behaviour through withdrawal of reward (punishment). The tokens used in such systems are secondary reinforcers that can be exchanged for primary reinforcers.
How Token Economies Work
Tokens are only provided in return for demonstrating desired behaviour. The more tokens accumulated, the better the reward available. Through selective reinforcement, desirable behaviours are encouraged and undesirable behaviours are reduced.
Token economies have been implemented in institutions such as schools and prisons. For example, students may be allocated tokens for good behaviour such as attendance, punctuality or high test scores. These tokens can then be exchanged for items in the school shop or perhaps a school trip.
In some high-security prisons, inmates receive credits for participating in constructive activities such as attending the library, cleaning or learning to play an instrument. These credits can be used to purchase tobacco, toiletries or telephone time.
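The mechanics of a token economy can be sketched as a simple ledger: tokens are credited only for listed desired behaviours, can be removed after undesirable behaviour, and are exchanged for backup rewards once enough have been saved. This Python sketch is hypothetical; the class name, behaviours, token values and prices are invented for illustration and are not taken from any real school or prison programme.

```python
# Hypothetical token economy ledger. Behaviours, token values and reward
# prices are illustrative examples, not taken from any real programme.


class TokenEconomy:
    def __init__(self, token_values: dict[str, int], reward_prices: dict[str, int]):
        self.token_values = token_values    # desired behaviour -> tokens earned
        self.reward_prices = reward_prices  # backup reward -> tokens needed
        self.balances: dict[str, int] = {}

    def record_behaviour(self, person: str, behaviour: str) -> int:
        """Award tokens (secondary reinforcers) only for listed desired behaviours."""
        earned = self.token_values.get(behaviour, 0)
        self.balances[person] = self.balances.get(person, 0) + earned
        return self.balances[person]

    def remove_tokens(self, person: str, tokens: int) -> int:
        """Take tokens away after undesirable behaviour (negative punishment)."""
        self.balances[person] = max(0, self.balances.get(person, 0) - tokens)
        return self.balances[person]

    def exchange(self, person: str, reward: str) -> bool:
        """Swap tokens for a backup reward if the balance is high enough."""
        price = self.reward_prices[reward]
        if self.balances.get(person, 0) < price:
            return False
        self.balances[person] -= price
        return True


school = TokenEconomy(
    token_values={"attendance": 1, "punctuality": 1, "high test score": 3},
    reward_prices={"school shop item": 4, "school trip": 20},
)
school.record_behaviour("Sam", "attendance")
school.record_behaviour("Sam", "high test score")
print(school.exchange("Sam", "school shop item"))  # True: 4 tokens earned, 4 spent
print(school.exchange("Sam", "school trip"))       # False: balance is now 0
```

Pricing bigger rewards higher reflects the point that the more tokens accumulated, the better the reward available; behaviours not on the list earn nothing, which is the selective reinforcement described above.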
Paul and Lentz (1977) study
Research Application: Token Economy in Psychiatric Settings
Paul and Lentz investigated the effectiveness of operant conditioning by reinforcing appropriate behaviour in 84 patients with chronic schizophrenia.
Method:
- Patients received tokens as rewards when they behaved appropriately
- Tokens could be exchanged for luxury items
Results:
- The token economy reduced some schizophrenic symptoms, such as bizarre motor behaviours (for example, rocking and blank staring)
- Successfully improved interpersonal skills and self-care abilities
- Not effective in treating cognitive symptoms (delusions and hallucinations)
- Not effective in treating hostile behaviour (screaming and swearing)
- Only 11% of token economy patients required drug treatment, compared with 100% in the control group
Conclusion: Operant conditioning is an effective means of treating people with chronic schizophrenia.
However, Paul and Lentz's study raises important issues regarding social control. Some argue it is not morally appropriate for one person to control another's behaviour and that this experiment violated patients' basic human rights. The patients had their rights to personal property and freedom of choice regarding treatment options constrained by the token economy.
It is also possible that the token economy benefited psychiatric staff by making schizophrenic patients more manageable, rather than being implemented for the patients' own benefit. This raises questions about the therapeutic goals of such treatment.
Schedules of reinforcement
Whilst the consequence of a behaviour determines whether it will be repeated, the timing and frequency of reinforcement also significantly impact the strength and likelihood of a behavioural response.
A schedule of reinforcement is a rule that dictates the situations in which a behaviour will be reinforced. There are two main categories:
Continuous reinforcement
Continuous reinforcement occurs when the desired behaviour is reinforced every time it occurs. For example, every time a rat presses a lever in the Skinner box, it receives a food pellet.
This schedule leads to rapid learning but the behaviour is less resistant to extinction once reinforcement stops.
Partial reinforcement
Partial reinforcement occurs when the desired response is reinforced only some of the time. Interestingly, behaviour acquired through partial reinforcement takes longer to learn but is significantly more resistant to extinction compared to continuous reinforcement. There are four types of partial reinforcement schedules:
1. Fixed interval
In a fixed interval schedule, the first correct response is rewarded only after a predetermined amount of time has passed since the last reward. For example, a rat in the Skinner box receives a food pellet for the first lever press made after a 30-second interval; presses made during the interval are not rewarded.
Learning takes longer with this schedule, and responding is uneven: the response rate is low just after each reinforcement and increases as the end of the interval approaches. This produces a characteristic scalloping effect, with a sharp drop in responding immediately after reinforcement is received followed by a gradual acceleration towards the next reinforcement.
2. Variable interval
In a variable interval schedule, the first correct response after a period of time is rewarded, but the length of that period varies unpredictably around an average, and a new period begins after each reinforcement. Learning still occurs, but because the organism cannot predict when the next reinforcement will become available, it tends to respond at a steady rate, and the scalloping effect seen with fixed interval reinforcement is absent.
3. Fixed ratio
In a fixed ratio schedule, a response is reinforced only after a specified number of responses has occurred. For example, a rat receives a food pellet after every eighth lever press. This schedule tends to produce high response rates, usually with a brief pause just after each reinforcement before responding resumes at a steady pace.
4. Variable ratio
In a variable ratio schedule, a response is reinforced after a number of correct responses that varies unpredictably around an average (for example, on average every eight lever presses). After each reinforcement, the number of responses required for the next one changes.
Skinner argued that this form of schedule is particularly effective for maintaining behaviour over time, as the unpredictability keeps the organism responding consistently.
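To make the differences between the schedules concrete, each rule can be written as a small Python class that answers one question: is this response reinforced? This is a hypothetical sketch. The class names are invented, a response is a single lever press, continuous reinforcement is treated as the special case of a fixed ratio of 1, and the variable schedules assume requirements drawn uniformly around the stated average, which is only one of several ways such schedules can be arranged. It models when reinforcement is delivered, not how the animal's response rate changes.

```python
import random

# Hypothetical sketch of the reinforcement schedules. Continuous
# reinforcement is modelled as FixedRatio(1): every response is reinforced.


class FixedRatio:
    """Reinforce every `ratio`-th response (FR-8: every eighth press)."""

    def __init__(self, ratio: int):
        self.ratio, self.count = ratio, 0

    def respond(self) -> bool:
        self.count += 1
        if self.count == self.ratio:
            self.count = 0
            return True
        return False


class VariableRatio:
    """Reinforce after an unpredictable number of responses averaging `mean`."""

    def __init__(self, mean: int, rng: random.Random):
        self.mean, self.rng, self.count = mean, rng, 0
        self.required = self._draw()

    def _draw(self) -> int:
        # Assumption: requirement drawn uniformly from 1..2*mean-1 (average = mean).
        return self.rng.randint(1, 2 * self.mean - 1)

    def respond(self) -> bool:
        self.count += 1
        if self.count >= self.required:
            self.count, self.required = 0, self._draw()
            return True
        return False


class FixedInterval:
    """Reinforce the first response made once `interval` seconds have
    passed since the last reinforcement (FI 30 s)."""

    def __init__(self, interval: float):
        self.interval = interval
        self.available_at = interval

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self.interval
            return True
        return False


class VariableInterval:
    """Like FixedInterval, but each waiting time is unpredictable, averaging `mean`."""

    def __init__(self, mean: float, rng: random.Random):
        self.mean, self.rng = mean, rng
        self.available_at = self._draw()

    def _draw(self) -> float:
        # Assumption: waits drawn uniformly between 0 and 2*mean seconds.
        return self.rng.uniform(0, 2 * self.mean)

    def respond(self, t: float) -> bool:
        if t >= self.available_at:
            self.available_at = t + self._draw()
            return True
        return False


# A rat presses once per second for two minutes under each schedule.
rng = random.Random(42)
ratio_schedules = {
    "continuous (FR-1)": FixedRatio(1),
    "fixed ratio 8": FixedRatio(8),
    "variable ratio 8": VariableRatio(8, rng),
}
for name, schedule in ratio_schedules.items():
    pellets = sum(schedule.respond() for _ in range(120))
    print(f"{name}: {pellets} pellets")

interval_schedules = {
    "fixed interval 30 s": FixedInterval(30),
    "variable interval 30 s": VariableInterval(30, rng),
}
for name, schedule in interval_schedules.items():
    times = [t for t in range(1, 121) if schedule.respond(t)]
    print(f"{name}: pellets at {times}")
```

With one press per second for two minutes, continuous reinforcement delivers 120 pellets, the fixed ratio delivers 15, and the fixed interval delivers pellets only at 30, 60, 90 and 120 seconds however often the rat presses. The variable schedules give counts and times that cannot be predicted in advance, which is the unpredictability Skinner linked to consistent responding.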
Behaviour modification (including shaping behaviour)
Behaviour modification is a therapeutic approach with its theoretical foundation in operant conditioning and Skinner's experimental work. The underlying principles of behaviour modification are to:
- Extinguish undesirable behaviour by removing the reinforcer
- Replace original behaviour with a desirable behaviour and reinforce it
Skinner was interested in understanding how more complex behaviours could be learnt, beyond simple responses made to obtain food. He developed the theory of behaviour shaping, which he termed the 'method of successive approximations'. This is a step-by-step process: at first, any behaviour that broadly resembles the target behaviour is rewarded. Once this behaviour is reliably shown, the rewards become more selective, so that only behaviours closer to the exact target are reinforced. This gradually guides the organism closer and closer to the precise target behaviour.
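Successive approximations can be sketched as a loop whose reinforcement criterion tightens after each success. This Python sketch is hypothetical: it assumes each attempt can be scored on an invented 0 to 10 scale of closeness to the target behaviour, and that the criterion rises by a fixed step after every reinforced attempt. In practice a trainer or therapist judges which approximations to reward.

```python
# Hypothetical shaping sketch: each attempt is scored 0-10 for closeness to
# the target behaviour (an invented scale), and the criterion for reward
# rises after every reinforced attempt, never beyond the target itself.


def shape(attempts: list[float], start_criterion: float, target: float,
          step: float) -> list[tuple[float, float, bool]]:
    """Return (score, criterion in force, reinforced?) for each attempt."""
    criterion = start_criterion
    log = []
    for score in attempts:
        reinforced = score >= criterion
        log.append((score, criterion, reinforced))
        if reinforced:
            # Become more selective: only closer approximations pay off next time.
            criterion = min(target, criterion + step)
    return log


attempts = [2, 3, 5, 4, 6, 8, 7, 9, 10, 10]
for score, criterion, reinforced in shape(attempts, start_criterion=2, target=10, step=2):
    print(f"score {score:>2} vs criterion {criterion:>2}: "
          f"{'reward' if reinforced else 'no reward'}")
```

Here the attempts scoring 2, 5, 6, 8, 10 and 10 are rewarded; an attempt of 9 goes unrewarded once the criterion has reached the full target, which is the increasing selectivity described above.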
Applications of behaviour modification
Behaviour modification has been applied in various contexts as a therapeutic model to treat conditions including Attention Deficit Hyperactivity Disorder (ADHD), Obsessive Compulsive Disorder (OCD) and autism. The target behaviour is identified and then rewards are provided for behaviours that gradually approach the target.
Application Example: Working with Autism
A therapist working with a child with autism might use rewards to reinforce good behaviour and then gradually become more selective in giving rewards, so that only increasingly specific behaviours, including those the child finds most difficult, are reinforced.
The token economy is a particularly successful example of operant techniques being applied in society. Token economies have been employed in psychiatry, clinical psychology and education, using patterns of reward to shape behaviour. Some token economies remove tokens as punishment for undesirable behaviour, such as aggression.
Evaluation of token economies
Ayllon and Milan (1979) reviewed numerous programmes and found they were successful in promoting certain behaviours, such as following rules and controlling aggression. However, research suggests that the benefits of token economies are relatively short-lived and do not generalise beyond the institution itself. This raises questions about the rehabilitative value of such systems.
Evaluation
Strengths
Wide applicability: Operant conditioning can explain a broad range of behaviours, from addiction to language acquisition. Any substance or activity can become addictive if it is rewarding, that is, if pleasurable or enjoyable outcomes result from the behaviour. In language acquisition, correct utterances can be positively reinforced. For instance, a child who says 'juice' may receive a smile from their parent and be given some juice as a result. The child finds the outcome of saying this word rewarding, which aids language development. The theory also has practical applications: token economies have been successfully implemented in psychiatric hospitals, schools and prisons.
Scientific Approach
Both classical and operant conditioning are presented as scientific theories. Concepts can be defined precisely, measured accurately and controlled systematically, as illustrated by both Pavlov's and Skinner's laboratory experiments on animals. Because only observable behaviour is measured, it can be argued that this is an objective measure. Moreover, such experiments can be replicated, allowing their reliability to be assessed.
However, the contrived and artificial nature of such experiments raises concerns about ecological validity and the extent to which findings can be applied to real-life settings.
Historical impact: Classical conditioning, which Pavlov first observed by accident, is now a widely accepted principle in psychology. It has remained largely unchanged since Pavlov formulated it and continues to be one of the most important principles in psychology's history. It formed the basis for the behavioural or learning approach. Pavlov's work greatly influenced John Watson and B.F. Skinner, and continues to inspire psychological research today.
Between 1997 and 2000, more than 220 articles appeared in scientific journals citing Pavlov's research on classical conditioning, demonstrating its enduring relevance. Pavlov's contributions to psychology have helped shape the discipline and are likely to continue influencing our understanding of human behaviour into the future.
Weaknesses
Reductionism: Learning theorists, such as Skinner, tend to explain all behaviour as the outcome of previous learning; in a sense, they argue that organisms behave the way they do because of the sum of their experiences. This approach is reductionist because it reduces complex behaviour to simpler components: learnt associations between stimuli, responses and their consequences.
Limitations of the Reductionist Approach
Both classical and operant conditioning greatly underestimate the role of biological factors, including genetic differences and instincts, on behaviour. It could be argued that Skinner's explanations only account for observable behaviours and do not account for unobservable phenomena, such as mental and emotional states (for example, anger or happiness). This makes his explanations limited and oversimplified.
Animal research concerns: A major criticism is the extensive use of animal research upon which a large proportion of learning theories are based. This raises the issue of extrapolating findings from animals and applying them to humans. Animals obviously differ from humans in terms of anatomy and physiology - day-to-day experiences differ considerably between species.
For example, animals do not reflect on their learning experiences with logic, patience or feelings as humans do. A key difference between rats and humans is language. A human can cease a behaviour simply by being informed that no more rewards will be provided. For a rat, this is not an option and it will continue to press a lever for food long after the food has stopped being dispensed.
Ethical Concerns
The use of laboratory experiments with animals in classical and operant conditioning also raises ethical concerns. It could be argued that Pavlov's research, for instance, caused unnecessary suffering to the dogs in his experiment. This needs to be weighed against the benefits of the research and whether the ends justify the means. Others may argue that the research was justified as it furthered our understanding of behaviour.
Deterministic implications: Both theories can be viewed as strongly deterministic, suggesting that behaviour is largely governed by environmental forces. If individuals are primarily products of their environment, this suggests they cannot control their own actions and cannot be held responsible for them. This also has potentially concerning implications, as it suggests that others could control an individual's behaviour through conditioning.
Social Control Concerns
This raises further ethical issues linked to social control. Skinner, however, viewed this as potentially positive, believing behaviourist principles could be used to create a better world.
Remember!
Key Takeaways from Operant Conditioning:
- Operant conditioning involves learning through consequences: behaviours followed by pleasant outcomes increase in frequency, whilst those followed by unpleasant outcomes decrease
- Thorndike's law of effect established that behaviour followed by pleasant consequences tends to be repeated, whilst behaviour followed by unpleasant consequences tends not to be repeated
- Skinner's ABC model (Antecedent-Behaviour-Consequence) explains how the consequence of a behaviour influences whether it is repeated in future
- Positive reinforcement adds a pleasant stimulus, negative reinforcement removes an unpleasant stimulus, positive punishment adds an aversive stimulus, and negative punishment removes a pleasant stimulus
- Schedules of reinforcement (continuous, fixed interval, variable interval, fixed ratio, variable ratio) significantly affect how quickly behaviour is learnt and how resistant it is to extinction
- Token economies and behaviour modification through successive approximations demonstrate practical applications of operant conditioning principles in therapeutic and institutional settings