Q-Learning and Dynamic Treatment Regimes S.A. Murphy Univ. of Michigan IMS/Bernoulli: July, 2004.
Dynamic Treatment Regimes
1
Dynamic Treatment Regimes
S.A. Murphy
CASBS
November 2, 2007
2
Outline
– Three apparently dissimilar problems
– Myopic decision making
– Constructing strategies
– Challenges
  • Unknown, unobserved causes
  • Small, expensive data sets
– Discussion
3
Three Apparently Dissimilar Problems
– Artificial Intelligence: Autonomous Helicopter Flight
– Management of Substance Abuse/Mental Illness
– Management of a Welfare Program
4
Artificial Intelligence
• Autonomous Helicopter Flight
– Observations: characteristics of the helicopter (position, orientation, velocity, angular velocity, …), characteristics of the environment (wind speed, wind angle, turbulence, …)
– Actions/treatments: cyclic pitch (causes forward/backward and sideways acceleration), tilt angle of main rotor blades (direction), tail rotor pitch control (turning)
– Rewards: closeness of the helicopter’s flight path to the desired path; avoidance of crashes(!)
5
Andrew Ng’s Helicopter: http://ai.stanford.edu/~ang/
6
The Management of Substance Abuse/Mental Illness
• Treating Patients with Opioid Dependence (heroin)
– Observations: individual characteristics (withdrawal symptoms, craving, attendance at counseling sessions, results of urine tests, …), characteristics of the environment (housing, employment, …)
– Actions/treatments: methadone dose, number of weekly group counseling sessions, daily dosing time of methadone, individual counseling sessions, methadone taper
– Rewards: minimize opioid use and maximize health/functionality, minimize cost
7
http://www.nida.nih.gov/perspectives/vol1no1.html
8
Management of a Welfare Program
• “Jobs First” Program in Connecticut
– Observations: individual characteristics (assets, income, age, health, employment), characteristics of the environment (domestic violence, incapacitated family member, number of children, living arrangements, …)
– Actions/treatments: child care, job search skills training, amount of cash benefit, medical assistance, education
– Rewards: maximize employment/independence.
10
The Common Thread: Multi-Stage Decision Making
• Observation, action, observation, action, observation, action, …
• A strategy tells us how to use the observations to choose the actions.
• We’d like to develop strategies that maximize the rewards.
11
Role of the Statistician
• What kinds of data are most useful for developing strategies?
• How do we use limited and expensive data to construct good strategies?
• How do we evaluate strategies using the limited data?
(A strategy tells us how to use the observations to choose the actions.)
13
Myopic Decision Making
• In myopic decision making, decision makers use strategies that seek to maximize immediate rewards. Problems:
– Ignore longer term consequences of present actions.
– Ignore the range of feasible future actions/treatments.
– Ignore the fact that immediate responses to present actions may yield information that pinpoints best future actions.
• (A strategy tells us how to use the observations to choose the actions.)
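The cost of myopia can be made concrete with a toy calculation. The sketch below is a minimal illustration and is not taken from the talk: the action names and reward numbers are hypothetical, chosen so that the action with the best immediate reward leaves only poor follow-up options.

```python
# Hypothetical two-stage problem; all numbers are made up for illustration.
# immediate[a1] is the reward right after the first action.
# later[a1][a2] is the reward after the second action, which depends on a1.
immediate = {"aggressive": 5, "cautious": 3}
later = {
    "aggressive": {"continue": 0, "switch": 1},  # few good options remain
    "cautious": {"continue": 4, "switch": 6},    # good options stay open
}


def myopic_first_action():
    """Pick the first action with the largest immediate reward."""
    return max(immediate, key=immediate.get)


def farsighted_first_action():
    """Pick the first action maximizing immediate plus best later reward."""
    return max(immediate, key=lambda a1: immediate[a1] + max(later[a1].values()))


def total_reward(a1):
    """Total reward when the best available second action follows a1."""
    return immediate[a1] + max(later[a1].values())
```

With these made-up numbers the myopic rule picks "aggressive" (total reward 6 after the best follow-up), while the rule that looks ahead picks "cautious" (total reward 9).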
14
Autonomous Helicopter Flight
The helicopter has veered from flight plan.
• Myopic action: Choose an acceleration and direction that will ASAP bring us back to the flight plan.
• The result: The myopic action results in the helicopter overshooting the planned flight path and in drastic situations may lead to the helicopter cycling out of control.
• The mistake: We did not consider the range of actions we can take following the initial action. The ability to slow down is mechanically limited.
• The message: Use an acceleration that will not return us as quickly to the planned flight path but will take into account the ability of the helicopter to slow down and reduce the overshoot.
15
Treatment of Psychosis
• Myopic action: Offer patients a treatment that reduces psychosis for as many people as possible.
• The result: Some patients are not helped and/or experience abnormal movements of the voluntary muscles (TDs). The range of medications that can be used subsequently is greatly reduced.
• The mistake: We should have taken into account the variety of treatments available to those for whom the first treatment is ineffective.
• The message: Use an initial medication that may not have as large a success rate but that will be less likely to cause TDs.
16
Treatment of Opioid Dependence
• Myopic action: Choose an intensive multi-component treatment (methadone + counseling + behavioral contingencies) that immediately reduces opioid use for as many people as possible.
• The result: Behavioral contingencies are burdensome/expensive to implement and many people may not need the contingencies to improve.
• The mistake: We should allow the patient to exhibit poor adherence prior to implementing the behavioral contingencies.
• The message: Use an initial treatment that may not have as large an immediate success rate but carefully monitor patient adherence to ascertain if behavioral contingencies are required.
18
Basic Idea for Constructing a Strategy: Move Backwards Through Time.
(Pretend you are “All-Knowing”)
[Figure: timeline. Observations (Time 1) → Treatment/Action (Time 2) → Observations (Time 2) → Treatment/Action (Time 3) → Reward]
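Moving backwards through time can be sketched as a two-step calculation. The example below assumes the "all-knowing" setting, in which the probabilities and expected rewards are known exactly; the action names, observation names, and numbers are hypothetical and do not come from the talk.

```python
# Hypothetical, fully known two-stage model (the "all-knowing" setting).
# p_obs2[a1][o2]: probability of observation o2 at time 2 after first action a1.
# reward[o2][a2]: expected final reward when action a2 follows observation o2.
p_obs2 = {
    "A": {"good": 0.7, "poor": 0.3},
    "B": {"good": 0.4, "poor": 0.6},
}
reward = {
    "good": {"maintain": 10.0, "intensify": 8.0},
    "poor": {"maintain": 2.0, "intensify": 5.0},
}


def backward_induction(p_obs2, reward):
    # Step 1 (last decision): best action and its value for each observation.
    rule2 = {o2: max(acts, key=acts.get) for o2, acts in reward.items()}
    value2 = {o2: reward[o2][rule2[o2]] for o2 in reward}
    # Step 2 (first decision): expected value of each first action,
    # assuming the best second action will be taken afterwards.
    value1 = {
        a1: sum(p * value2[o2] for o2, p in probs.items())
        for a1, probs in p_obs2.items()
    }
    rule1 = max(value1, key=value1.get)
    return rule1, rule2, value1


rule1, rule2, value1 = backward_induction(p_obs2, reward)
```

The last decision is solved first, one observation at a time; the first decision is then evaluated by averaging over what might be observed next. The resulting strategy is the pair (rule1, rule2).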
19
Outline
– Three apparently dissimilar problems
– Myopic decision making
– Constructing strategies
– Challenges
  • Unknown, unobserved causes (e.g. how data might mislead you)
  • Small, expensive data sets
– Discussion
20
Artificial Intelligence
• Scientists who construct strategies in autonomous helicopter flight use mechanistic theory (physical laws: momentum = m*v, W = F*d*cos(θ), …) to model the interrelationships between observations and how the actions might impact the observations.
– Scientists know many (most?) of the causes of the observations and know how the observations relate to one another.
• Scientists can quickly evaluate strategies for selecting the actions (within a matter of months).
21
Comparatively Less Known Mechanistic Models in Behavioral/Social/Medical Sciences
• Scientists who want to use data on individuals to construct strategies must confront the fact that non-causal “associations” occur due to the unknown causes of the observations.
22
Conceptual Structure in the Behavioral/Social/Medical Sciences
[Figure: causal diagram. Timeline: Observations (Time 1) → Treatment (Time 2) → Observations (Time 2) → Treatment (Time 3) → Reward. Two "Unknown Causes" nodes sit above the timeline.]
23
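Unknown, Unobserved Causes (Incomplete Mechanistic Models)
[Figure: causal diagram. Timeline: Binge Drinking (Time 1) → Treatment (Time 2) → Binge Drinking (Time 2) → Counseling (Time 3) → Functionality. Unobserved nodes "Maturity / Unknown Causes" and "Decision to join 'Adult' Society" sit above the timeline; two arrows are marked "+".]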
24
Unknown, Unobserved Causes (Incomplete Mechanistic Models)
• Problem: Non-causal associations between treatment (here counseling) and rewards are likely.
• Solution: Construct strategies using data sets in which randomization is used to assign treatments to students. This breaks the non-causal associations yet permits causal associations.
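A small simulation can illustrate why randomization helps. This is a sketch under stated assumptions, not an analysis from the talk: "maturity" stands in for the unobserved cause, counseling is given no effect at all on the reward, and the probabilities are hypothetical.

```python
import random


def simulate(n, randomized, seed=0):
    """Return mean(reward | counseled) - mean(reward | not counseled)."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        mature = rng.random() < 0.5           # unobserved cause
        if randomized:
            p_treat = 0.5                     # coin flip, ignores maturity
        else:
            p_treat = 0.8 if mature else 0.2  # mature students seek counseling
        counseled = rng.random() < p_treat
        # Counseling has NO effect on the reward in this toy model.
        reward = 2.0 * mature + rng.gauss(0.0, 1.0)
        (treated if counseled else untreated).append(reward)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


observational_gap = simulate(20000, randomized=False)
randomized_gap = simulate(20000, randomized=True)
```

Without randomization the counseled students look clearly better off even though counseling does nothing here, because maturity drives both the choice of counseling and the reward. With randomization the gap is close to zero.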
25
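Unknown, Unobserved Causes (Incomplete Mechanistic Models)
[Figure: causal diagram. Timeline: Observations (Time 1) → Treatment (Time 2) → Binge Drinking (Time 2) → Counseling (Time 3) → Functionality. Unobserved nodes "Maturity / Unknown Causes" and "Decision to join 'Adult' Society" sit above the timeline; one arrow is marked "+".]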
26
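Unknown, Unobserved Causes (Incomplete Mechanistic Models)
[Figure: causal diagram. Timeline: Binge Drinking (Yes) → Counseling on Health Consequences (Yes/No) → Binge Drinking, Time 2 (Yes/No) → Sanctions + counseling, Time 3 (Yes/No) → Functionality. Unobserved nodes "Maturity / Unknown Causes" and "Decision to join 'Adult' Society" sit above the timeline; arrows are marked "+" and "-".]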
27
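Unknown, Unobserved Causes (Incomplete Mechanistic Models)
[Figure: causal diagram. Timeline: Observations (student is a superior athlete) → Student admitted to University → Treatment (Time 2) → Grades. Nodes "Unknown Causes" and "High SAT Scores" sit above the timeline; three arrows are marked "+".]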
28
Unknown, Unobserved Causes (Incomplete Mechanistic Models)
• The problem: Even when treatments are randomized, non-causal associations occur in the data.
• The solution: Statistical methods for constructing strategies must be conducted in stages as opposed to “all-at-once.” Statistical methods should appropriately “average” over the non-causal associations between treatment and reward.
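The point that randomized data can still contain non-causal associations can be illustrated with a toy simulation. The model below is an assumption made for illustration only: the first treatment is randomized and has no effect on the reward, an unobserved cause ("maturity") affects both the time-2 observation and the reward, and all probabilities are hypothetical.

```python
import random


def mean_diff(rows):
    """Mean reward of treated minus untreated among the given rows."""
    t = [y for a1, y in rows if a1]
    c = [y for a1, y in rows if not a1]
    return sum(t) / len(t) - sum(c) / len(c)


def simulate(n=40000, seed=1):
    rng = random.Random(seed)
    all_rows, binge_rows = [], []
    for _ in range(n):
        mature = rng.random() < 0.5  # unobserved cause
        a1 = rng.random() < 0.5      # randomized first treatment
        # Time-2 binge drinking is less likely after treatment
        # and less likely for mature students.
        binge2 = rng.random() < 0.8 - 0.3 * a1 - 0.4 * mature
        # Reward depends on maturity only: the first treatment has NO effect.
        y = 2.0 * mature + rng.gauss(0.0, 1.0)
        all_rows.append((a1, y))
        if binge2:
            binge_rows.append((a1, y))
    return mean_diff(all_rows), mean_diff(binge_rows)


overall_gap, gap_among_binge_drinkers = simulate()
```

Comparing treated and untreated students only among those who still binge drink at time 2 makes the first treatment look harmful, although it has no effect on the reward in this model. Averaging over the time-2 observation (the overall comparison) recovers a gap near zero, which is the kind of averaging a staged analysis is meant to respect.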
30
Summary of Solutions To Causal Problems
• Experiments should randomize treatments (i.e., actions).
• Develop statistical methods that avoid being influenced by non-causal associations yet help you construct the strategy.
• Subjects in your data should be representative of the population of subjects.
32
Expensive Data on a Limited Number of Individuals
• Scientists who want to use data on individuals to construct treatment strategies must provide measures of confidence and also evaluations of alternative treatment strategies.
• This is challenging because the methods for constructing strategies are non-smooth.
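One way to see the non-smoothness is that constructing a strategy involves picking the apparently best treatment, which is a maximum over estimates. The simulation below is a hedged illustration, not a method from the talk: it assumes two treatments with normally distributed rewards and a hypothetical sample size of 25 per treatment.

```python
import random


def average_estimated_best(mu1, mu2, n=25, reps=4000, seed=2):
    """Average over replications of max(sample mean 1, sample mean 2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        m1 = sum(rng.gauss(mu1, 1.0) for _ in range(n)) / n
        m2 = sum(rng.gauss(mu2, 1.0) for _ in range(n)) / n
        total += max(m1, m2)  # the non-smooth step: pick the apparent winner
    return total / reps


tie_case = average_estimated_best(0.0, 0.0)        # true best value is 0
separated_case = average_estimated_best(0.0, 1.0)  # true best value is 1
```

When the two treatments are truly equivalent, the estimated value of the best treatment is biased upward; when they are well separated, the estimate is close to the truth. The behavior of the estimator changes with how close the treatments are, which is what makes confidence statements hard with small data sets.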
33
Basic Idea for Constructing a Strategy: Move Backwards Through Time.
[Figure: timeline. Observations (Time 1) → Treatment (Time 2) → Observations (Time 2) → Treatment (Time 3) → Reward]
34
Expensive, Limited Data on Individuals
• In order to provide measures of confidence and comparisons of strategies, the statistical methods for constructing strategies must be regularized.
• A number of theoreticians are working hard on this open question.
35
Outline
– Three apparently dissimilar problems
– Myopic decision making
– Constructing strategies
– Challenges
  • Unknown, unobserved causes
  • Small, expensive data sets
– Experiments & Discussion
36
ExTENd
• Ongoing study at U. Pennsylvania (D. Oslin)
• Goal is to learn how best to help alcohol-dependent individuals reduce alcohol consumption.
37
Oslin ExTENd
Random assignment:
– Early Trigger for Nonresponse
  • 8 wks Response: random assignment to TDM + Naltrexone or Naltrexone
  • Nonresponse: random assignment to CBI + Naltrexone or CBI
– Late Trigger for Nonresponse
  • 8 wks Response: random assignment to TDM + Naltrexone or Naltrexone
  • Nonresponse: random assignment to CBI + Naltrexone or CBI
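The sequential randomization in a design of this kind can be sketched in a few lines. This is an illustration only: the arm names are taken from the figure, but the mapping of second-stage arms to responders and nonresponders, the equal allocation probabilities, and the function names are assumptions.

```python
import random

# First-stage options: which definition of nonresponse is applied.
STAGE1 = ["early trigger", "late trigger"]
# Second-stage options, keyed by whether the participant responded.
# The mapping of arms to responders/nonresponders is an assumption.
STAGE2 = {
    True: ["TDM + Naltrexone", "Naltrexone"],  # responders
    False: ["CBI + Naltrexone", "CBI"],        # nonresponders
}


def assign_stage1(rng):
    """Randomize a participant to a first-stage option."""
    return rng.choice(STAGE1)


def assign_stage2(responded, rng):
    """Randomize again, among the options for the observed response status."""
    return rng.choice(STAGE2[responded])


rng = random.Random(0)
first = assign_stage1(rng)
second = assign_stage2(responded=False, rng=rng)
```

Each participant is randomized more than once, and the second randomization depends on what was observed after the first stage.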
38
Adaptive Treatment for ADHD
• Ongoing study at the State U. of NY at Buffalo (B. Pelham)
• Goal is to learn how best to help children with ADHD improve functioning at home and school.
39
ADHD Study
Random assignment:
A. Begin low-intensity behavior modification
– 8 weeks, then assess: adequate response?
– Yes: A1. Continue, reassess monthly; randomize if deteriorate
– No: random assignment:
  • A2. Add medication; bemod remains stable but medication dose may vary
  • A3. Increase intensity of bemod with adaptive modifications based on impairment
B. Begin low dose medication
– 8 weeks, then assess: adequate response?
– Yes: B1. Continue, reassess monthly; randomize if deteriorate
– No: random assignment:
  • B2. Increase dose of medication with monthly changes as needed
  • B3. Add behavioral treatment; medication dose remains stable but intensity of bemod may increase with adaptive modifications based on impairment
40
Studies under review
• H. Jones study of drug-addicted pregnant women (goal is to reduce cocaine/heroin use during pregnancy and thereby improve neonatal outcomes)
• J. Sacks study of parolees with substance abuse disorders (goal is to reduce recidivism and substance use)
41
Jones’ Study for Drug-Addicted Pregnant Women
[Figure: flow chart of the design. Random assignment to an initial treatment; after 2 wks, responders and nonresponders are each randomly assigned again. Arms shown: rRBT, tRBT, aRBT, eRBT.]
42
Sacks’ Study of Adaptive Transitional Case Management
[Figure: flow chart of the design with two random assignments. Arms shown: Standard Services, Standard TCM, Augmented TCM. Branch labels: 4 wks Response, Nonresponse.]
43
Discussion
• The best management of chronic disorders (poverty, mental illness, other medical conditions) requires multi-stage decision making.
• Avoid myopic decision making!
– Allow for longer term effects of the treatment
– When comparing treatment options, take into account the effect of future treatments
– Appreciate the value of observing patient outcomes such as adherence
• Basic experimental designs and statistical methods are available.
44
This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/CASBS07.ppt
Email me with questions or if you would like a copy:
45
Unknown, Unobserved Causes
[Figure: causal diagram. Timeline: Observations (Time 1) → Treatment (Time 2) → Observations (Time 2) → Treatment (Time 3) → Reward. Two "Unknown Causes" nodes sit above the timeline.]
46
Unknown, Unobserved Causes
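[Figure: causal diagram. Timeline: Binge Drinking (Time 1) → Treatment (Time 2) → Frequent Binge Drinking (Time 2) → Treatment (Time 3) → Grade. Nodes "Unknown Causes" and "Maturity of Student" sit above the timeline; arrows are marked "+" and "-".]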
47
Unknown, Unobserved Causes
• Problem: We recruit students via flyers posted in dormitories. Because of the unknown causes, associations between observations and rewards in such a sample are highly likely to be non-representative.
• Solution: Sample a representative group of college students.
48
STAR*D
• This trial is over and the data is being analyzed (PI: J. Rush).
• One goal of the trial is to construct good treatment sequences for patients suffering from treatment-resistant depression.
www.star-d.org