Operant applications


Principles of Learning

Applications of Operant Conditioning

Skinner introduced the concept of teaching machines that shape learning in small steps and provide reinforcement for correct answers.

In School


Reinforcers affect productivity. Many companies now allow employees to share profits and participate in company ownership.

At work


At Home

In children, reinforcing good behavior increases how often that behavior occurs; ignoring unwanted behavior decreases how often it occurs.


Operant conditioning: Addiction (1)

Drug use is a behaviour that is positively reinforced by the pharmacologic properties of the drug.



Once a person is addicted, drug use is reinforced by the negative reinforcement of removing or avoiding painful withdrawal symptoms.

Behavior Therapy

• Behavior therapy uses learning methods to change abnormal behavior, thoughts, and feelings

– Behavior therapists use classical and operant conditioning techniques as well as modeling

– Counterconditioning: learning a new response to a stimulus that previously elicited an unwanted response

• Systematic desensitization: relaxation is paired with a stimulus that formerly induced anxiety

• Aversive conditioning: an unpleasant event is paired with a stimulus to reduce its attractiveness


Counterconditioning

Cognitive Behavior Therapy

• Cognitive therapy assumes that thought patterns can cause a disturbance of emotion or behavior

– Beck’s Cognitive Therapy for Depression

• Depressed mood is caused by cognitive distortions, e.g., “Nothing good ever happens to me”

– Ellis’s Rational Emotive Behavior Therapy

• Emotional upset is due to irrational beliefs, e.g., “I must be loved by everyone”


The Cognitive Paradigm

• Cognition involves the mental processes of perceiving, recognizing, judging and reasoning

• The cognitive paradigm focuses on how people structure and understand their experiences and how these experiences are related to past experiences stored in memory


Operant conditioning: Application to CBT techniques

• Functional Analysis – identify high-risk situations and determine reinforcers

• Examine long- and short-term consequences of drug use to strengthen the resolve to remain abstinent

• Schedule time and receive praise

• Develop meaningful alternative reinforcers to drug use

Gary Wilkes (1994) Animal Trainer

• Elephants:

– Dangerous; sensitive to handling stress

– Calluses build up (can leave them unable to walk)

– Calluses must be cut away with a sharp tool

Elephant Manicure

• Violent, aggressive bull

• Calluses not trimmed in 10 years

• Vets cannot touch him

• What to do?

• Large steel gate with a hole in the corner (the size of the elephant’s foot)

• Clicker + carrot

• Clicker + approach gate + carrot

• Clicker + lift foot + carrot

• Clicker + move foot to hole

• Etc.

• After training: the elephant would voluntarily walk to the gate and put his foot through

Elephant Manicure

• Clicker + carrot = CS + US (the pairing makes the click a conditioned reinforcer)

• Clicking and rewarding each successive step toward the final behavior = SHAPING
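The elephant procedure is shaping by successive approximations. As a rough illustration only, the Python sketch below simulates that logic: a response matching the current approximation earns a click plus carrot, a non-matching response goes unreinforced, and the trainer moves on to the next approximation after a run of successes. The step names come from the slide; the `shape` function, the `STEPS` list, and the rule of advancing after three consecutive reinforced responses (`criterion=3`) are hypothetical choices for this sketch, not Wilkes's actual protocol. The initial clicker + carrot pairing (CS + US) is assumed to have already taken place.

```python
from __future__ import annotations

# Successive approximations toward the final behavior (step names taken from the slide).
STEPS = [
    "approach gate",
    "lift foot",
    "move foot to hole",
    "put foot through hole",
]


def shape(responses: list[str], steps: list[str] = STEPS, criterion: int = 3) -> tuple[int, int]:
    """Simulate clicker training by successive approximations.

    A response that matches the current step earns click + carrot.
    After `criterion` consecutive reinforced responses (an assumed rule),
    the requirement is raised to the next step; earlier steps are no
    longer reinforced. Returns (steps established, reinforcers delivered).
    """
    step, streak, reinforcers = 0, 0, 0
    for response in responses:
        if step >= len(steps):
            break  # final behavior already established
        if response == steps[step]:
            reinforcers += 1  # click + carrot
            streak += 1
            if streak == criterion:
                step += 1  # criterion met: require the next approximation
                streak = 0
        else:
            streak = 0  # no click: non-matching responses are not reinforced
    return step, reinforcers
```

For example, `shape(["approach gate"] * 3 + ["lift foot"] * 3)` returns `(2, 6)`: two approximations established and six reinforcers delivered. Lifting a foot before the elephant has learned to approach the gate earns nothing, which mirrors the rule that only the current approximation is reinforced.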

Self Awareness

• Self-aware: observing one’s own behavior

• “I think Joe will quit school” (he is engaging in those types of behaviors)

• I have observed myself engaging in those behaviors (“I think I will quit school”)

• Long-term comas

– Behave as if awake: open eyes, turn heads, move a hand

– Coma = not responsive to the environment

Boyle and Greer (1983)

• Reinforced spontaneous behaviors with music

– Moved patient

– Requested action

– Reward = short selection of favorite music

– 2 sessions a day for 16 weeks

• Reinforcement

– Outcome (reward) contingent on behavior

– Cause and effect!

• 33% increased spontaneous movement

– 1 came out of coma

Norris Edwards: Chapter 8: Wade08.ppt Page: 19

The Problem with Reward

• Misuse of reward ~ rewards must be tied to the behavior we are trying to increase.

• Each of us has had the experience of standing in the checkout line at the market and seeing a child in a shopping cart tempted by the candy and toys on display next to the line.

• When we as parents give in and purchase something to quiet our kids in that situation, what behavior are we actually reinforcing?

Norris Edwards: Chapter 8: Wade08.ppt Page: 20©1999 Prentice Hall

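Hidden Cost of Rewards

• Preschoolers were observed playing with felt-tipped markers

• Divided into 3 groups:

– Given markers again and asked to draw

– Promised a reward for playing with markers

– Played with markers, then rewarded

• Later, the children who had been promised a reward spent less free time playing with the markers than the other groups: an expected reward can undermine intrinsic interest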

Albert Bandura: Social Cognitive Theory

• Theories that emphasize how behavior is learned and maintained through observation and imitation of others, positive consequences, and cognitive processes such as plans, expectations, and beliefs.

• Observational Learning ~ A process in which an individual learns new responses by observing the behavior of another (a model) rather than through direct experience; sometimes called Vicarious Conditioning.

Skinner (1953) and Verbal Behaviors

• “That itches” • “That tickles” • “That hurts”

• Observed behavior: scratching, giggling, tears and groans

Basic Behavioral Principles

• Antecedent - any stimulus that happens before a behavior (S)

• Behavior - an observable and measurable act of an individual (R)

• Consequence - any stimulus that happens after a behavior (O)

Social-Cognitive Learning Theories

• To this point, most American learning theories have maintained that most learning can be explained in terms of the behavioral ABCs:

– Antecedents: the event preceding the behavior

– Behavior itself

– Consequences of the behavior

• Social learning theories emphasize the importance of observational learning, that is, learning by observing people in a social context.

Verbal Conditioning

S-R-O

Skinner (1957)

The Mand (Requesting)

• All mands have one thing in common: in the antecedent condition, there is a Motivative Operation (or motivation {S-S}) in place.

• A = thirst (MO) (S)

• B = “I want juice” (R)

• C = student gets juice (O)

• If a child does not want the item, you cannot teach them to mand for it.

Verbal Conditioning

• S: “Hungry?” “Sleepy?”

• R: “Yes!” “No!” (self-awareness?)

• O: Reinforced (behavior and self-aware observation). Reward or punish?


When Punishment Fails

• Most misbehavior is hard to punish immediately.

• Punishment conveys little information.

• An action intended to punish may instead be reinforcing because it brings attention.

Behavior and the Mind

• Edward Tolman (1938) experiment with rats demonstrated latent learning

• Latent learning is learning that is not immediately revealed through a change in behavior

• Latent learning occurs without obvious reinforcement

• Learners’ perceptions of the model and of themselves influence what they learn.

Tolman

Latent Learning: A Classic Experiment (Tolman & Honzik, 1930)

Three groups of rats were given practice trials in a maze, 1 trial per day.

The maze consisted of a series of components shaped like the letter T.

A trial started when the rat was placed in the Start box and ended when he entered the Goal box, after which he was removed from the maze.

[Figure: multiple-T maze leading from START to GOAL]

When the rat went up the stem of the T, he reached a choice point. If he turned one way, he came to a dead end. If he turned the other way, he came to the entrance of the next component.


Each time the rat turned into the dead end, it was counted as an error. The measure of performance (dependent variable) was the number of errors on a trial.

If learning occurred, the number of errors should decrease as more and more trials were given.



GROUP 1: On every trial, these rats received food when they reached the goal box.

GROUP 2: These rats never received food. They were simply removed from the maze when they got to the goal box.

GROUP 3: These rats got no food on Trials 1 to 10. But on Trial 11, and every trial afterwards, they received a food reward.

US = Food

UR = Consume Food

CS = Maze

CR = Consume Food


[Figure: average errors (0 to 10) by trial (1 trial per day, Trials 1 to 17) for Groups 1, 2, and 3]

The day-to-day decrease in errors represented a “relatively permanent change in behavior” that resulted from practice.

This was clear evidence for learning.

Hull’s theory predicts that the rats in Groups 2 and 3 will not learn.


Group 2 got no food but still improved slightly. Removal from the maze was a small reward.

There was little evidence for learning.

Hull vs. Tolman

• Hull’s law of primary reinforcement:

– “when a stimulus-response relationship is followed by a reduction in need, the probability increases that on subsequent occasions the same stimulus will invoke the same response” (Schultz & Schultz, op. cit., p. 329)

• Learning can only take place if there is reinforcement

• S-R connections are strengthened by the number of reinforcements that have occurred; Hull called this “habit strength”

• Habit strength = intervening variable

Hull vs. Tolman

• Tolman devised an experimental test of Hull’s theory

• Hull’s theory states that learning must involve reinforcement

– So we can deduce this hypothesis from his theory:

• Rats will not learn if they are not rewarded

– Tolman tested this hypothesis



Because they got no food on Trials 1 to 10, the Group 3 rats performed like Group 2 through Trial 11.



On the next trial, Group 3 matched Group 1, and then did even better!


Interpretation

Group 3 learned the route through the maze on Trials 1 to 10 but didn’t show it because there was no motivation to perform. How could they learn if there were no CS/US pairings? They outperformed Group 1 because the shift from no reward to reward made the reward seem larger by comparison. This is called “positive contrast.”

So S-S is the way animals learn?

Hull maintained that the maze itself caused small S-R bonds to form

S-R theory still dominated psychology for 40 more years

Response Vs. Place Learning

GROUP P always found food in Goal Box 1.

[Figure: open maze with Start 1, Start 2, Goal 1, and Goal 2]

(Tolman, Ritchie & Kalish, 1946)

This maze had no walls or roof so that rats could see “landmarks” in the room such as a window, door, or lamp.

On a random half of the trials, the rats started from Start Box 1, and on the other half they started from Start Box 2.

GROUP R found food in Goal Box 1 when they started from Start Box 1 but received food in Goal Box 2 when they started from Start Box 2.





Cognitive theory predicted that GROUP P would learn faster because they only had to learn one cognitive map.

Behavior theory predicted GROUP R would learn faster because they only had to learn one sequence of movements at the choice point—a right turn.







What’s YOUR prediction? Are you a behaviorist or a cognitivist? GROUP P or GROUP R?








Group P learned faster.

But later studies found that if the maze had a roof so the rats couldn’t see things in the room, response learning was faster.








Both response and place learning occur. Which type is faster depends on what cues are available. So both the S-R and S-S views turned out to be right!

S-R or S-S

Classical conditioning can involve both S-R and S-S associations.

Today:

Controlled vs. Automatic processing

S-S = while learning

S-R = after learning

Theories Explaining Classical Conditioning

HULL

• Born 1884 in Akron, NY

• Graduated U. of Michigan in 1913

• Ph.D. U. of Wisconsin 1918

• 1929-1952 Professor of Psychology at Yale

• Died 1952

Tolman

• Born Newton, Mass., on April 14, 1886

• B.S. at MIT in electrochemistry

• Ph.D. in psychology in 1915

• Spent a month at Giessen under Koffka; heavily influenced by the Gestalt movement

• Ardent pacifist

• Dismissed at Northwestern U.

• Went to UC Berkeley for the rest of his career

S-R or S-S

Behavioral vs. Cognitive Views of Learning

These traditions in learning theory have existed for decades. They give different answers to the fundamental question: “What is learned when learning takes place?”

Behaviorists say: “Specific actions”

Cognitivists say: “Mental representations”

For example, in a “Skinner Box”, a rat may receive a food reward every time he presses the bar. He presses faster and faster. What has he learned?

S-R vs. S-S Views of Learning

S-R view: the rat has learned “to press the bar.”

S-S view: the rat has learned “that pressing produces food.”


S-R (“learns to”)

1. Learning involves the formation of associations between specific actions and specific events (stimuli) in the environment. These stimuli may either precede or follow the action (antecedents vs. consequences).

2. Many behaviorists use intervening variables to explain behavior (e.g., habit, drive) but avoid references to mental states.

3. RADICAL BEHAVIORISM (operant conditioning/behavior modification/behavior analysis): avoids any intervening variables and focuses on descriptions of relationships between behavior and environment (“functional analysis”).


S-S (“learns that”)

1. Learning takes place in the mind, not in behavior. It involves the formation of mental representations of the elements of a task and the discovery of how these elements are related.

2. Behavior is used to make inferences about mental states but is not of interest in itself (“methodological behaviorism”).

3. EXAMPLE: Tolman & Honzik’s experiment on latent learning. Tolman, a pioneer of cognitive psychology, argued that when rats practice mazes, they acquire a “cognitive map” of the layout: mental representations of the landmarks and their spatial relationships.

S-R or S-S

• Autoshaping

• Taste aversion

• Eyeblink conditioning

• Blocking

• Extinction

• Spontaneous Recovery

• S-R

• S-S

• S-R

• S-S

• S-R

• S-S

Latent Learning

• Rats: one maze trial/day

• One group found food every time (red line)

• Second group never found food (blue line)

• Third group found food on Day 11 (green line)

– Sudden change, day 12

• Learning isn’t the same as performance


Cognitive Maps

• Tolman trained rats in this maze, with all alleys open

– Not to scale; the path on the left is too long.

• If “Block A” in place, rats chose green (shorter) path

• If “Block B” in place, rats chose blue path

– Green path also blocked

• Rats navigate as if they have an internal map

Varieties of cognitive maps? (Gallistel 1990)

Specific issues:

• Spatial scale (local vs. home-range)

• Geometric content (metric, topological)

• Reference frame (egocentric/view-dependent vs. allocentric/view-independent)

Evidence:

• People: shortcuts in cities and VR (errors); mixed evidence on contents of underlying map

• Rodents: most studies on local scale; mixed evidence on contents

• Insects: on local and home-range scale; metric, egocentric

Broader Definition (Gallistel 1990): ‘A cognitive map is a record in the central nervous system of macroscopic geometric relations among surfaces in the environment used to plan movements through the environment. A central question is what type of geometric relations a map encodes’.

More on Cognitive Maps: Chimpanzee Behavior


• Chimpanzee carried on experimenter’s back

• Watched site baiting: 18 locations

• Later released to retrieve food

• Most food found

• Retrieval route differed from baiting route

• Travel distance was very efficient

Cognitive Maps (spatial learning)


• Second experiment

• Same general plan

• 18 locations: 9 fruits and 9 vegetables

• First retrieval visits were to the fruits, in accordance with food preferences


• Results suggest that chimpanzees have something like a cognitive map of the compound.

• As they are carried around, chimpanzees store information about food locations not on the basis of the particular path that they are traveling, but on the basis of their cognitive map.

• Cognitive map = a separate type of memory (bedroom, Gestalt)


• Chimpanzees work with this cognitive representation to determine most efficient route to travel in gathering food.

• This solution depends on cognitive mediation between inputs and behavior that transforms and organizes inputs.

• Explaining the chimpanzees’ behavior without appeal to mediating processes would provide an impoverished view of what the animal does.

http://www.scottcamazine.com/photos/BeeBehavior/images/06waggleDance_jpg.jpg

Sun Compass and Memory in Bees

[Figure: food at 20°, 40°, and 75°, with matching dance angles relative to “up”]

• Bees encode (allocentric?) flight direction in dances

• As the sun moves, dances change

• Dances change even when bees can’t see the sun (thus they compensate by memory)

• Reference for memory: landmarks (Dyer & Gould 1981; Dyer & Dickinson 1996)


The basic task


A STRATEGY FOR INCREASING BEHAVIOUR

• Behavioral self-management is a strategy for increasing some desired behavior (for example, hours spent studying or exercising) by using self-administered rewards. A behavioral self-management program requires the following:

Strategies for increasing a desired behavior

• Choose a target behaviour (the behaviour you want to increase)

• Record a baseline (count the time engaged in the desired behaviour or the number of times the desired behaviour is performed)

• Establish goals (set gradual goals, daily and weekly)

• Choose reinforcers (for when you reach daily and weekly goals)

• Record your progress (time you engaged in the behaviour or number of times you performed the activity)
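The five steps above lend themselves to simple record-keeping. The Python sketch below is illustrative only: the `SelfManagementPlan` class, measuring the behaviour in minutes, and the rule of raising the daily goal by 10% each week are hypothetical assumptions, not part of the program described here. It averages the baseline, derives gradual daily goals, records progress, and reports when a daily or weekly goal has been met so that the chosen reinforcer can be delivered.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import mean


@dataclass
class SelfManagementPlan:
    """Hypothetical record-keeper for a behavioural self-management program."""

    target: str                     # target behaviour, e.g. "studying"
    baseline_minutes: list[float]   # daily minutes recorded before the program starts
    weekly_increase: float = 0.10   # assumed rule: raise the daily goal 10% per week
    log: list[tuple[int, float]] = field(default_factory=list)  # (week, minutes)

    def baseline(self) -> float:
        """Average daily time spent on the target behaviour before the program."""
        return mean(self.baseline_minutes)

    def daily_goal(self, week: int) -> float:
        """Gradual goal: baseline raised by `weekly_increase`, compounded each week."""
        return round(self.baseline() * (1 + self.weekly_increase) ** week, 1)

    def record(self, minutes: float, week: int) -> bool:
        """Record today's progress; True means the daily goal was met (deliver reinforcer)."""
        self.log.append((week, minutes))
        return minutes >= self.daily_goal(week)

    def weekly_goal_met(self, week: int, days: int = 7) -> bool:
        """Weekly reinforcer check: the week's total reaches `days` times the daily goal."""
        total = sum(m for w, m in self.log if w == week)
        return total >= self.daily_goal(week) * days
```

For example, with a baseline of 30, 40, and 50 minutes of studying, `SelfManagementPlan("studying", [30, 40, 50]).daily_goal(1)` gives 44.0 minutes, and a 45-minute day recorded in week 1 earns the daily reinforcer. Keeping the goal increase small and gradual reflects the "set gradual goals" step; the exact rate is a free choice.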
