
Law of Effect

• Animals improve on performance:

– They optimize

– They do not just repeat the same behavior, but make it better and more efficient

• Adaptation to the environment via learning

• Herrnstein considers this behavior change/adaptation a question, not an answer:

– e.g., what are animals adapting to? How are they adapting? How do they know to adapt?

• He wants to know whether there is a common way in which animals optimize, and whether it can be described by a unified paradigm.

Reinforcement as strength:

• Reinforcement = making a stronger link between responding and reward

• Relative frequency

– measure of response-reinforcer strength:

– Absolute rates: P1/time and Sr/time

– Relative rate = P1/P2 and Sr1/Sr2

– Response rate as a function of reinforcer rate
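As a concrete illustration, here is a minimal Python sketch of these absolute and relative rate measures; all counts and the session length are invented for the example.

```python
# Minimal sketch of the rate measures above; all numbers are invented.
session_minutes = 60
p1_count, p2_count = 1200, 400    # responses on alternatives 1 and 2
sr1_count, sr2_count = 40, 20     # reinforcers earned on each alternative

# Absolute rates: responses/time and reinforcers/time
p1_rate = p1_count / session_minutes     # 20 responses per minute
sr1_rate = sr1_count / session_minutes   # ~0.67 reinforcers per minute

# Relative rates: P1/P2 and Sr1/Sr2
relative_response_rate = p1_count / p2_count       # 3.0
relative_reinforcer_rate = sr1_count / sr2_count   # 2.0
print(p1_rate, sr1_rate, relative_response_rate, relative_reinforcer_rate)
```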

Reinforcement as strength:

• Plot proportion of responses as function of proportion of reward

– Should be a linear relationship

– As the rate of reward increases, the rate of responding should increase

• Note: this is a continuous measure, not a discrete-trial one: the animal has more “choice”

– Discrete trial – trial by trial

– Free operant: animal controls how many responses it makes

Reinforcement as strength:

• Differences arise when the organism controls the rate vs. when time controls the rate:

– Get exclusive choice on FR or VR schedules

• Faster responding = more reinforcers

• On time-based (interval) schedules, faster responding does not necessarily get you more

• But: it should alter the rate of one response alternative in comparison to another

– BUT: VI schedules allow examination of changes in response rate as a function of a predetermined reinforcer rate

– With VI schedules, one can use reinforcer rate as the independent variable!

• This becomes the basis of the matching law:

– PL / (PL + PR) = RL / (RL + RR)

– The relative rate of responding should approximate the relative rate of reinforcement
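A minimal numeric sketch of this proportion-form matching relation, with invented left/right counts:

```python
# Matching in proportion form (invented counts): the proportion of
# responses on the left should approximate the proportion of
# reinforcers earned on the left.
pl, pr = 750, 250   # left and right response counts
rl, rr = 45, 15     # left and right reinforcer counts

response_proportion = pl / (pl + pr)     # 0.75
reinforcer_proportion = rl / (rl + rr)   # 0.75 -> matching holds here
print(response_proportion, reinforcer_proportion)
```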

A side bar: The Use of CODs

• COD = change over delay

• Use of a COD affects response strength and choice:

– Shull and Pliskoff (1967) compared COD and no-COD conditions

– Got a better approximation of matching with the COD

• Why important:

– The COD is not the controlling factor; the controlling factor is the response ratio

– The COD increases discriminability between the two reinforcer schedules

• Increased discriminability = better “matching”

• Why?

Herrnstein’s Matching Equation (1961)

• Begin with a single reinforcer & response

• P1 = kR1 / (R1 + Ro)

– P1 = rate of responding to alternative 1

– R1 = rate of reinforcement for alternative 1

– Ro = rate of unaccounted sources of reinforcement

– k = asymptote of response rate
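The single-alternative equation translates directly into code. A minimal sketch; the function name and parameter values below are illustrative, not from the source:

```python
def herrnstein_rate(r1, k=60.0, ro=20.0):
    """Predicted response rate: P1 = k*R1 / (R1 + Ro).

    r1: obtained reinforcement rate for alternative 1
    k:  asymptotic response rate (illustrative value)
    ro: rate of unaccounted, extraneous reinforcement (illustrative value)
    """
    return k * r1 / (r1 + ro)

# As R1 grows, P1 rises quickly at first and then approaches k:
print(herrnstein_rate(5), herrnstein_rate(50), herrnstein_rate(500))
# -> 12.0, ~42.9, ~57.7 (never exceeding k = 60)
```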

Can derive a more general two-choice equation

P1 = kR1 / (R1 + R2 + Ro)

P2 = kR2 / (R1 + R2 + Ro)

Cancelling out: dividing the P1 equation by the P2 equation, k and the common denominator (R1 + R2 + Ro) cancel.

Two-Parameter Matching Equation

P1 / P2 = R1 / R2

– Assume that Ro is equal for both P1 and P2

– What are some possible “Ro”s?

• Note that everything here is measurable!

What does this mean?

• Relative rate of responding varies with relative rate of reinforcement;

• Must have some effect on absolute rates of responding as well.

• Simple matching law: P1 = kR1 / (R1 + Ro)

– This makes a hyperbolic function

– There is some maximum rate of responding

How plot?

• Plot response rate (responses/min) as a function of reinforcement rate (Sr/min):

– Makes a hyperbola

– A decelerating, ascending curve

– Why decelerating? Why reach an asymptote?

• Note: this is a STEADY-STATE theory, not an acquisition model
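A sketch of the plot just described, assuming matplotlib and arbitrary values for k and Ro:

```python
import numpy as np
import matplotlib.pyplot as plt

k, ro = 40.0, 30.0              # arbitrary asymptote and Ro for illustration
r = np.linspace(0, 300, 200)    # reinforcement rate (Sr per hour)
p = k * r / (r + ro)            # Herrnstein's hyperbola

plt.plot(r, p)
plt.xlabel("Reinforcers per hour")
plt.ylabel("Responses per hour")
plt.title("Decelerating, ascending curve approaching the asymptote k")
plt.show()
```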

Example:

• Plot response rate as a function of reinforcer rate:

[Figure: responses per hour (0-50) plotted as a function of reinforcers per hour (0-300), tracing the hyperbola.]

Responses per hour | Reinforcers per hour
0.17 | 5
1.76 | 52.75
12.86 | 385.67
26.36 | 790.83
30.80 | 924.00
39.08 | 1172.25
40.00 | 1200.00
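As a sketch of how such rate data can be fit to the hyperbola, here is an example using scipy's curve_fit on synthetic data generated from known parameters; real response/reinforcer rates such as those in the table would be fit with the same call.

```python
import numpy as np
from scipy.optimize import curve_fit

def hyperbola(r, k, ro):
    # Herrnstein's single-alternative equation: P = k*R / (R + Ro)
    return k * r / (r + ro)

# Synthetic data generated from known parameters (k = 40, Ro = 60)
# plus a little noise; real rate data would be fit the same way.
rng = np.random.default_rng(0)
r = np.array([5, 25, 60, 120, 300, 600, 1200], dtype=float)
p = hyperbola(r, 40.0, 60.0) + rng.normal(0.0, 0.5, r.size)

(k_hat, ro_hat), _ = curve_fit(hyperbola, r, p, p0=(30.0, 30.0))
print(f"k = {k_hat:.1f} responses/hr, Ro = {ro_hat:.1f} reinforcers/hr")
```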

Factors affecting the hyperbola

• Absolute rates are affected by reinforcement rates:

– The higher the reinforcement rate, the higher the rate of responding

– True up to some point (the asymptote). Why?

• Can also plot P1/P2 = R1/R2 and get the same general trend

Generalized Matching Law (Baum, 1974)

Describes basic matching law:

• P1 / (P1 + P2) = R1 / (R1 + R2)

• Revises to: P1/P2 = R1/R2

• Notes that Staddon (1968) found that taking logs produces straight lines

• Also adds two parameters: b and a

• New version: log(P1/P2) = a * log(R1/R2) + log b

– Equivalently: P1/P2 = b(R1/R2)^a

– where a = the undermatching, or sensitivity-to-reward, parameter

– and b = bias
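Baum's log form makes estimating a and b a simple linear regression. A minimal sketch with invented response and reinforcer ratios:

```python
import numpy as np

# Invented data: reinforcer ratios (R1/R2) and response ratios (P1/P2)
# measured across five conditions.
r_ratio = np.array([0.25, 0.5, 1.0, 2.0, 4.0])
p_ratio = np.array([0.33, 0.55, 1.05, 1.90, 3.40])

# log(P1/P2) = a*log(R1/R2) + log(b): the slope is a, the intercept is log(b)
a, log_b = np.polyfit(np.log10(r_ratio), np.log10(p_ratio), 1)
b = 10 ** log_b
print(f"sensitivity a = {a:.2f}, bias b = {b:.2f}")  # a < 1 -> undermatching
```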

What is Undermatching?

• Perfect sensitivity to reward, or “matching”: a = 1.0

• Undermatching, or under-sensitivity to reward: a < 1.0

– Any preference less extreme than the matching relation would predict

– A systematic deviation from the matching relation for preferences toward both alternatives, in the direction of indifference

– The organism is less sensitive than predicted to changes in the reinforcer ratios

What is Overmatching?

• a > 1.0: overmatching, or oversensitivity to reward

– A preference that is MORE extreme than the equation would predict

– A systematic deviation from the matching relation for preferences toward the better alternative, to the neglect of the lesser alternative

– The organism is more sensitive than predicted to differences between the reinforcer alternatives

• Reward sensitivity = a discrimination or sensitivity model:

– It tells us how sensitive the animal is to changes in the rate of reward between the two alternatives

[Figure: three log-log plots of time allocation, log(T1/T2), against reinforcer ratio, log(R1/R2), both axes running from -0.8 to 0.8:

Alone: a = 0.99, b = 0.02, r2 = 0.99

Unpredicted Competitor: a = 0.59, b = -0.06, r2 = 0.99

Predicted Competitor: a = 1.37, b = 0.008, r2 = 0.96]

The Alone panel is an example of almost perfect matching with little bias. Why?

The Unpredicted Competitor panel is an example of undermatching with some bias toward the RIGHT feeder. Why?

The Predicted Competitor panel is an example of overmatching with little bias. Why? Is overmatching BETTER than matching or undermatching? Why or why not?

Factors affecting the a or undermatching parameter:

• Discriminability between the stimuli signaling the two schedules

• Discriminability between the two rates of reinforcers

• Component duration

• COD and COD duration

• Deprivation level

• Social interactions during the experiment

• Others?

Bias

• Definition: the magnitude of preference shifted toward one reinforcer when there is apparent equality between the rewards

• Unaccounted-for preference

• It is the experimenter’s failure to make both alternatives equal!

• Calculated from the intercept of the line:

– A positive bias is a preference for R1

– A negative bias is a preference for R2

Four Sources of Bias

• Response bias

• Discrepancy between scheduled and obtained reinforcement

• Qualitatively different reinforcers

• Qualitatively different reinforcement schedules

• Examples:

– Difficulty of making the response: one response key is harder to push than the other

– Qualitatively different reinforcers: Spam vs. steak

– Color

– Preference for one side of the box, etc.

Qualitatively Different Rewards

• Matching law only takes into consideration the rate of reward

• If reinforcers are qualitatively different, this must be added in:

– So: P1/P2 = (V1/V2)(R1/R2)^a, as sketched after this list

– An additional factor (the value ratio V1/V2) is needed for qualitative differences

– Assumes value stays constant regardless of the richness of a reinforcement schedule

• Interestingly, one can get U-shaped functions rather than hyperbolas:

– This has to do with the changing value of reward ratios when dealing with qualitatively different reinforcers

– There are different satiation/habituation points for each type of reward

– This motivates a move to economic models that allow for U-shaped rather than hyperbolic functions
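A minimal sketch of the value-augmented form; the value ratio, rates, and sensitivity below are hypothetical numbers chosen for illustration:

```python
def choice_ratio(r1, r2, v1=3.0, v2=1.0, a=0.9):
    """P1/P2 = (V1/V2) * (R1/R2)**a; the value ratio acts like a bias term.

    v1, v2: hypothetical 'values' of the two reinforcer types
    a:      hypothetical sensitivity parameter
    """
    return (v1 / v2) * (r1 / r2) ** a

# Equal reinforcement rates, but reinforcer 1 is valued three times as much:
print(choice_ratio(30, 30))   # 3.0 -> preference driven by value alone
```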

Qualitatively different reinforcement schedules

• Use of VI versus VR

• Animal should show exclusive choice for VR, or minimal responding to VI

• Can control response rate, but not time

• The animal does not “match” in the typical sense, but it is still optimizing

So, does the matching law work?

• It is a really OLD model!

• Matching holds up well under mathematical and data tests

• There are some limitations to the model

• It tells us about sensitivity to reward and bias

Applications: McDowell, 1984

• Wants to apply Herrnstein’s equation to clinical settings:

– Uses Herrnstein’s equation: P1 = kR1 / (R1 + Ro)

• Makes several important points about Herrnstein’s equation:

– Ro governs the rapidity with which the hyperbola reaches its asymptote

– Thus, extraneous reinforcement can affect response strength (rate)

 

Shows importance of equation

• Contingent reinforcement supports a higher rate of responding in barren environments than in rich environments

• High rates of Ro can affect the situation

• When few other Srs are available, your Srs matter more
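A quick numeric check of this point with the single-alternative equation (all values invented): the same contingent reinforcement rate supports far more responding when Ro is small.

```python
def herrnstein(r1, k=60.0, ro=1.0):
    # P1 = k*R1 / (R1 + Ro)
    return k * r1 / (r1 + ro)

r1 = 20.0  # the same contingent reinforcement rate in both settings
print(herrnstein(r1, ro=5.0))    # barren environment (low Ro):  48 resp/hr
print(herrnstein(r1, ro=100.0))  # rich environment (high Ro):   10 resp/hr
```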

Applications: McDowell

• Law of diminishing returns: a given increment in reinforcement rate (delta-Sr) produces a larger increment in response rate (delta-R) when the prevailing rate of contingent reinforcement is low than when it is high

– Response rate increases hyperbolically with increases in reinforcement
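This diminishing-returns point can be verified directly from the hyperbola; the parameter values below are illustrative:

```python
k, ro = 60.0, 20.0
p = lambda r: k * r / (r + ro)   # Herrnstein's hyperbola

delta_sr = 10.0                  # the same increment in reinforcement rate
low, high = 5.0, 100.0           # low vs. high prevailing Sr rates
print(p(low + delta_sr) - p(low))     # ~13.7 extra responses/hr
print(p(high + delta_sr) - p(high))   # ~0.8 extra responses/hr
```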

Applications: McDowell

• Reinforcement by the experimenter/therapist DOES NOT OCCUR in isolation: you must deal with Ro

• What else, and where else, is your client getting reward?

• What are they comparing YOUR reward to?

Demonstrates with several human studies

• Bradshaw, Szabadi, & Bevan (1976, 1977, 1978): button pressing for money; subjects matched

• Bradshaw et al. (1981): used manic-depressive subjects

– Response rate was hyperbolic regardless of mood state

– k was larger and Ro smaller when manic

– k was smaller and Ro larger when depressed

Demonstrates with several human studies

• McDowell study: a boy with SIB (scratching)

– Used punishment (obtained reprimands) for scratching

– Found a large value of Ro, but the child did match

– Ro was so pervasive that it was the dominant source of reinforcement

• Ayllon & Roberts (1974): 5th-grade boys and studying

– Reading test performance was rewarded (R1)

– Disruptive behavior/attention = Ro

– Found that when reinforcement for reading (R1) was increased, responding increased and disruptive behavior decreased (reduced values of Ro)

Demonstrates with several human studies

• Critchfield: shows the law works well in sports, e.g., three-point shots and running vs. passing

– Basketball

– Football

• Why choose football?

– Play calling = individual behavior

– Quarterback

– Offensive coordinator and head coach

Demonstrates with several human studies

• Highly skilled players:

– When calling a play, they consider the success/failure of the previous attempt in deciding the next play

– There are individual differences in play-calling patterns (rushing vs. passing teams)

– The focus is at the team level

General Method

• Data obtained from the NFL; primary data:

– Number of passing/rushing plays

– Net yards gained

• Data taken from ESPN websites

General Method

• Several coding characteristics:

– Plays were categorized as rushing or passing based on what occurred rather than what was called (there is no way of knowing the call)

– Sacks were counted as failed passing plays

– Yards gained counted as a completion even if there was a fumble after the catch

• Fit the data to the matching equation:

– The ratio of yards gained through passing vs. rushing was used as a predictor of the ratio of pass plays to rush plays called

Season aggregate league outcome

• a = 0.725, r2 = .757; b = -0.129 (in favor of rushing)

• Historical comparisons (1975-2005):

– 2004 fell outside the typical range

– r2 decreases about 4% per year across years, suggesting more variability in play calling

• Why?– Shift in rules designed to favor passing– Free-agency rules– Salary caps

comparison with other leagues:

• Differences:

– NFL Europe: a = .619, r2 = .821

– CFL: a = .544, r2 = .567

– Arena Football: a = .56, r2 = .784

– United Indoor Football League: a = .613, r2 = .598

– National Women’s Football Association: a = .55, r2 = .709

– NCAA Atlantic Coast: a = .63, r2 = .809

– NCAA Western: a = .868, r2 = .946

– NCAA Mid-American: a = .509, r2 = .634

• Generally good fits: r2 = .57-.95

• 6 of 9 leagues favored passing rather than rushing

– CFL: rushing rather than passing (turnover risk?)

conditional play calling:

• Examined specific circumstances: down number (1, 2, 3)

• How does matching change?

– a decreases with down

– Less likely to pass with increased down

– Is this surprising? Why or why not?

To reduce behavior a la Herrnstein

• Increase the rate of reinforcement for the concurrently available response alternatives:

– Children engage in out-of-seat behavior because it is reinforcing

– So increase the rate of reinforcement for IN-SEAT behaviors

Game by Game outcomes:

• Regular season games

• Preseason fits relatively poor: a = .43

• Later in season: better fits: a = .58

• Post season slightly better: a = .59

• Why?

Actual Therapy Situation: Reduce Behavior a la Herrnstein

• Works like a DRO schedule, except:

– We are not reinforcing incompatible responses

– We are arranging the environment so that the relative rate of reinforcement for the desired response is higher than the relative rate of reinforcement for the undesired behavior

• Get more for “being good” than for “being bad”

To reduce behavior a la Herrnstein

• Take-home message: it is the disparity between the two relative rates of reinforcement that is important, not the incompatibility of the two responses

Dealing with noncontingent reinforcement (Ro)

• An example: unconditional positive regard = free, noncontingent reinforcement

• It will reduce the frequency of undesired responding

• BUT it will also reduce behaviors that you may want!!!

 

Dealing with noncontingent reinforcement (Ro)

• To increase responding, there are 3 ways:

– Increase the rate of contingent reinforcement

– Decrease the rate of concurrently available reinforcement for one alternative

– Decrease the rate of free, noncontingent reinforcement

 

Dealing with noncontingent reinforcement (Ro)

• This works well in rich environments, where there is more opportunity to alter reinforcement rates

• You do not have to add reinforcers: you can DECREASE reinforcement to alter the situation and avoid satiation/habituation

• It allows for contextual changes in reinforcement

Behavioral Contrast

• Behavioral contrast: an often-found “side effect”; original study: Reynolds (1961)

– Pigeons on CONC schedules of reinforcement, with equal schedules at first

– Then reinforcement was extinguished on one alternative

– Got a HUGE change in responding on the non-EXT alternative

• Why? Behavioral contrast: the value of the remaining schedule changed

• Also called the Pullman effect!!!!

Behavioral Contrast

• Helps explain “side effects” of reinforcement:

– e.g., a boy’s talking to the teacher during class is put on EXT, but then the kid talks more to peers

• Why?

– P1/P2 = R1/R2: 100/100 = 100/100

– But then one option goes to EXT

– P1/P2 = R1/R2: 100/100 = 100/0?

Behavioral Contrast

• Example: a boy talks to the teacher during class, so the teacher puts the talking on EXT

– But then the kid talks more to peers

– Look at the ratios:

– Look at ratios:

P1 / P2 = R1 / R2

Behavioral Contrast

• Let’s plug in values:

– Before, talking to the teacher was highly valuable: P1/P2 = 100/50

– Now, talking to the teacher is not valuable: P1/P2 = 1/50

• The alternative is now much more “preferable” than in the original situation

– If you alter Ro, you get similar changes!

Can Mathematically Predict

• Responses:

– P1 = staying in seat

– P2 = out of seat

• Rewards:

– R1 = rewards for staying in seat

– R2 = rewards for being out of seat

– Ro = rewards for playing around in seat

• What happens as we vary each of these?

P1 = kR1 / (R1 + R2 + Ro)

P2 = kR2 / (R1 + R2 + Ro)
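A sketch of that exercise: compute both absolute response rates from the two-alternative equations and watch how each manipulation shifts in-seat behavior (k, the rates, and Ro are invented):

```python
def rates(r1, r2, ro, k=100.0):
    """Absolute rates: P1 = k*R1/(R1+R2+Ro), P2 = k*R2/(R1+R2+Ro)."""
    total = r1 + r2 + ro
    return k * r1 / total, k * r2 / total

print(rates(r1=10, r2=10, ro=10))  # baseline:                (33.3, 33.3)
print(rates(r1=30, r2=10, ro=10))  # enrich in-seat reward:   (60.0, 20.0)
print(rates(r1=10, r2=10, ro=2))   # cut free reinforcement:  (45.5, 45.5)
```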

Conclusions: Clinical applications:

• MUST consider broader environmental conceptualizations of problem behavior

• Must account for sources of reinforcement other than those provided by the therapist

– Again, this is Herrnstein’s idea of the context of reinforcement

– If not, you shoot yourself in the old therapeutic foot