
Page 1

Human Decisions and Machine Predictions

Jure Leskovec (@jure), including joint work with Himabindu Lakkaraju, Sendhil Mullainathan, Jon Kleinberg, Jens Ludwig

Page 2

Machine Learning

Humans vs. Machines

Page 3

Humans vs. Machines

§ Given the same data/features X, machines tend to beat humans:
§ Games: chess, AlphaGo, poker
§ Image classification
§ Question answering (IBM Watson)

§ But humans tend to see more than machines do. Humans make decisions based on (X, Z)!

Page 4

Humans Make Decisions

Humans make decisions based on (X, Z)!
§ The machine is trained to model P(Y|X)
§ But humans use P(Y|X, Z) to make decisions
§ Could this be a problem?

Yes! Why? Because the data is selectively labeled, and cross-validation can severely overestimate model performance.

Page 5

Plan for the Talk

1) Can machines make better decisions than humans?
2) Why do humans make mistakes?
3) Can machines help humans make better decisions?

Page 6

Example Problem: Criminal Court Bail Assignment

Page 7

Example Decision Problem

U.S. police make >12M arrests/year.
Q: Where do people wait for trial? Release vs. detain is high stakes:
§ Pre-trial detention spells average 2-3 months (and can run 9-12 months)
§ Nearly 750,000 people are in jails in the US
§ Consequential for jobs and families as well as crime

Page 8

Judge's Decision Problem

The judge must decide whether to release (bail) or detain (jail).
Outcomes: A defendant out on bail can behave badly:
§ Fail to appear at trial
§ Commit a non-violent crime
§ Commit a violent crime

The judge is making a prediction!

Page 9

Bail as a Decision Problem

§ Not assessing guilt on the current charge
§ Not a punishment
§ The judge may only consider failure-to-appear and crime risk

Page 10

The ML Task

We want to build a predictor that, based on a defendant's criminal history, predicts the defendant's future bad behavior.

[Diagram: defendant characteristics → learning algorithm → prediction of bad behavior in the future]

Page 11

What Characteristics to Use?

Only administrative features:
§ Common across regions
§ Easy to get
§ No "gaming the system" problem

Only features that are legal to use:
§ No race, gender, religion

Note: Judge predictions may not obey either of these constraints.

Page 12

Data: Defendant Features

(only legal features that are easy to get)

§ Age at first arrest
§ Times sentenced to residential correction
§ Level of charge
§ Number of active warrants
§ Number of misdemeanor cases
§ Number of past revocations
§ Current charge is domestic violence
§ Is first arrest
§ Prior jail sentence
§ Prior prison sentence
§ Employed at first arrest
§ Currently on supervision
§ Had previous revocation
§ Arrest for new offense while on supervision or bond
§ Has active warrant
§ Has active misdemeanor warrant
§ Has other pending charge
§ Had previous adult conviction
§ Had previous adult misdemeanor conviction
§ Had previous adult felony conviction
§ Had previous failure to appear
§ Prior supervision within 10 years

Page 13

Data: Kentucky & Federal

Jurisdiction              Cases   Fraction released   Fraction crime   FTA at trial   Non-violent crime   Violent crime
Kentucky                  362k    73%                 17%              10%            4.2%                2.8%
Federal Pretrial System   1.1m    78%                 19%              12%            5.4%                1.9%

Page 14

Applying Machine Learning

Just apply machine learning:
§ Use the labeled dataset (released defendants)
§ Train a model to predict whether a defendant, when out on bail, will misbehave
§ Report accuracy on a held-out test set
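As a rough illustration of this naive pipeline, here is a minimal sketch in Python with scikit-learn. The file name and column names are hypothetical; the only assumptions from the talk are that training uses administrative features of released defendants and a decision-tree model.

```python
# Minimal sketch of the naive pipeline, assuming a table of RELEASED
# defendants with administrative features and an observed "misbehaved"
# label (FTA or new arrest while on bail). Data and columns are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("released_defendants.csv")   # hypothetical file
X = df.drop(columns=["misbehaved"])           # administrative features only
y = df["misbehaved"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# The talk reports decision trees (typical depth 20-25) beating LogReg.
model = DecisionTreeClassifier(max_depth=25, min_samples_leaf=50)
model.fit(X_train, y_train)

# Held-out accuracy -- but note: this only measures performance on the
# selectively labeled (released) population, which is the coming problem.
print("held-out accuracy:", model.score(X_test, y_test))
```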

Page 15


Prediction Model

[Figure: top four levels of the decision tree; leaves show the probability of NOT committing a crime.]

Typical tree depth: 20-25.
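To make "the top levels of the tree" concrete, one could print them with scikit-learn. A short sketch, continuing the hypothetical `model` and `X` from the previous snippet:

```python
# Sketch: inspect the top four levels of the learned tree
# (continues the hypothetical `model` and `X` from the earlier sketch).
from sklearn.tree import export_text

print(export_text(model, feature_names=list(X.columns), max_depth=4))
```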

But there is a problem with this…

Page 16

The Problem

§ Issue: The judge sees factors the machine does not
§ The judge makes decisions based on P(Y|X, Z)
§ X … prior crime history
§ Z … other factors not seen by the machine
§ The machine makes decisions based on P(Y|X)
§ Problem: The judge is selectively labeling the data based on P(Y|X, Z)!

Page 17

Problem: Selective Labels

The judge is selectively labeling the dataset:
§ We can only train on released people
§ By jailing, the judge is selectively hiding labels!

[Diagram: Judge → Release (crime outcome observed: Yes/No) vs. Jail (crime outcome unobserved: ?)]

Page 18

Selection on Unobservables

Selective labels introduce bias:
§ Say young people with no tattoos have no risk of crime. The judge releases them.
§ But the machine does not observe whether defendants have tattoos.
§ So the machine would falsely conclude that no young people commit crime.
§ And falsely presume that, by releasing all young people, it does better than the judge!
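A tiny simulation can make this failure mode tangible. The sketch below is illustrative only, with made-up numbers: age is observed by both, the tattoo indicator only by the judge.

```python
# Illustrative simulation of selective-labels bias (made-up data).
# X = "young" (observed by the machine), Z = "tattoo" (seen only by the judge).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
young = rng.random(n) < 0.5
tattoo = rng.random(n) < 0.5
# True risk: young defendants WITHOUT tattoos never offend; others do at 40%.
crime = rng.random(n) < np.where(young & ~tattoo, 0.0, 0.4)

# The judge uses Z: releases exactly the young, no-tattoo defendants.
released = young & ~tattoo

# A machine trained only on released cases sees zero crime among the young...
print("observed crime rate, released young:", crime[released].mean())  # 0.0
# ...but releasing ALL young people would actually yield:
print("true crime rate, all young:", crime[young].mean())              # ~0.2
```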

Page 19

Challenges: (1) Selective Labels, (2) Unobservables Z

Page 20

Solution: 3 Properties

Our solution depends on three key properties:
§ Multiple judges, with natural variability between them
§ Random assignment of cases to judges
§ One-sidedness of the problem

Page 21

Solution: Three Observations

1) The problem is one-sided:
§ Releasing a person the judge jailed is a problem (outcome unobserved)
§ Jailing a person the judge released is not (outcome observed)

[Diagram: Judge decision (release/jail) × Algorithm decision (release/jail); three of the four cells can be evaluated (✓); the algorithm releasing a judge-jailed defendant cannot (✗)]

Page 22

Solution: Three Observations

2) In the federal system, cases are randomly assigned:
§ This means that, on average, all judges see similar cases

Page 23

Solution: Three Observations

3) Natural variability between judges:
§ Due to human nature, there will be some variability between judges
§ Some judges are stricter, while others are more lenient

Page 24

Solution: Contraction

Solution: Use the most lenient judges!
§ Take the released population of a lenient judge
§ Ask which of those the algorithm would jail so as to become less lenient
§ Compare the resulting crime rate to that of a strict human judge

Note: Does not rely on judges having "similar" predictions.

Page 25

Solution: Contraction

Making lenient judges strict. What makes contraction work:
§ 1) Judges have similar cases (random assignment of cases)
§ 2) Selective labels are one-sided

[Figure: defendants ordered by predicted crime probability; the algorithm jails the riskiest of those released by a lenient judge, contracting the release rate down to that of a strict human judge]
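A sketch of contraction under these assumptions. Column names (`risk`, `crime`) and the usage numbers are hypothetical; `risk` is the model's predicted crime probability on the lenient judge's released pool.

```python
# Sketch of contraction: shrink a lenient judge's released pool to a strict
# judge's release rate by jailing the highest-risk releases first.
import pandas as pd

def contract(released: pd.DataFrame, n_cases: int, target_rate: float) -> float:
    """Crime rate per case if we keep only the lowest-risk releases,
    jailing the rest, until target_rate * n_cases defendants stay out."""
    keep = int(target_rate * n_cases)
    kept = released.sort_values("risk").head(keep)
    # One-sidedness: every kept defendant was actually released by the
    # lenient judge, so their crime outcome is observed.
    return kept["crime"].sum() / n_cases

# Usage (hypothetical): compare against a strict judge who releases 55%:
# crime_rate_ml = contract(lenient_released_df, n_cases=1000, target_rate=0.55)
```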

Page 26

Evaluating the Prediction

Predictions create a new release rule:
§ Parameterized by % released
§ For every defendant, predict P(crime)
§ Sort defendants by increasing P(crime) and "release" the bottom k
§ Note: This is different from AUC

What is the fraction released vs. crime rate tradeoff?
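One way to trace that tradeoff curve, as a hedged sketch: `p_crime` holds predicted probabilities and `crime` the observed outcomes on the evaluable population, both hypothetical inputs.

```python
# Sketch: release-rate vs. crime-rate tradeoff for the predicted release rule.
import numpy as np

def tradeoff_curve(p_crime, crime, rates):
    """Yield (release rate, overall crime rate per case) pairs."""
    p_crime = np.asarray(p_crime, dtype=float)
    crime = np.asarray(crime, dtype=float)
    order = np.argsort(p_crime)            # safest (lowest-risk) first
    n = len(p_crime)
    for rate in rates:
        released = order[: int(rate * n)]  # "release" the bottom k
        # Jailed defendants contribute no crime, so divide by all n cases.
        yield rate, crime[released].sum() / n

# Usage with hypothetical arrays p (predicted probs) and y (outcomes):
# for rate, cr in tradeoff_curve(p, y, np.linspace(0.1, 0.9, 9)):
#     print(f"release {rate:.0%} -> crime rate {cr:.2%}")
```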

Page 27

Overall Predicted Crime

[Figure: overall crime rate (includes FTA, NCA, NVCA) vs. release rate, comparing the ML release rule (all features) against judges. Annotated points: Judges at a 73% release rate with a 17% crime rate; Machine Learning at an 88% release rate with an 11% crime rate. Holding the release rate constant, the crime rate is reduced by 0.102 (a 58.45% decrease). Standard errors too small to display. Decision trees beat LogReg by 30%. Contrast with AUC ROC.]

Page 28

Predicted Failure to Appear

[Figure: FTA rate vs. release rate. At the judges' 73% release rate, the judges' FTA rate is 10% vs. 4.1% for the ML algorithm. Holding the release rate constant, the FTA rate is reduced by 0.06 (a 60% decrease).]

Page 29

Predicted Non-Violent Crime

[Figure: NCA rate vs. release rate. At the judges' 73% release rate, the judges' NCA rate is 4.2% vs. 1.7% for the ML algorithm. Holding the release rate constant, the NCA rate is reduced by 0.025 (a 58% decrease). Standard errors too small to display.]

Page 30

Predicted Violent Crime

[Figure: NVCA rate vs. release rate. At the judges' 73% release rate, the judges' NVCA rate is 2.8% vs. 1.4% for the ML algorithm. Holding the release rate constant, the NVCA rate is reduced by 0.014 (a 50% decrease). Standard errors too small to display.]

Page 31

Effect of Race

ML does not beat the judge by racially discriminating

Page 32

Source of Errors

Judges err on the riskiest defendants.

[Figure: judge release rate vs. predicted risk]

Page 33

Predicted Risk vs. Jailing

[Figure: predicted risk vs. defendants ordered from least likely to most likely to be jailed by the judge]

Page 34

Why do Judges do Poorly?

The odds are against the algorithm: the judge sees many things the algorithm does not:
§ Offender history: observable by both
§ Demeanor ("I looked in his eyes and saw his soul"): unobservable by the algorithm

Can we diagnose judges' mistakes and help them make better decisions?

Page 35

Why do Judges do Poorly?

Two possible reasons why judges make mistakes:
§ Misuse of observable features
§ These are the administrative features available to the algorithm
– E.g., previous crime history
§ Misuse of unobservable features
§ Features not seen by the algorithm
– "I looked in his eyes and saw his soul"

Page 36

A Different Test

Predict the judge's decision:
§ Training data does NOT include whether the defendant actually commits a crime

This gives a release rule: release if the predicted judge would release.
§ This weights features the way the judge does

What does it do differently, then?
§ It does not use the unobservables!

Page 37

Predicting the Judge

Build a model to predict the judge's behavior:
§ We predict what the judge will do (and NOT whether a defendant will commit a crime)

Mix the judge and the model:

(1 − γ) · (judge's real decision) + γ · (predicted decision)

Does this beat the judge (for some γ)?
§ The model will do better only if the judge misweighs the unobservable features
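A sketch of one reading of this mixture: follow the model-of-the-judge on a random γ fraction of cases and the judge otherwise. That interpretation, and all inputs (`judge_release`, `p_judge`), are assumptions for illustration.

```python
# Sketch: blend the judge's real decisions with the model-of-the-judge.
# Interpretation (an assumption): on a random gamma fraction of cases we
# follow the model's predicted-judge rule, otherwise the judge's own call.
import numpy as np

def mixed_release(judge_release, p_judge, gamma, release_rate, rng):
    judge_release = np.asarray(judge_release, dtype=bool)
    p_judge = np.asarray(p_judge, dtype=float)
    # Model-of-the-judge rule: release the defendants the predicted judge
    # finds most releasable, at the same overall release rate.
    k = int(release_rate * len(p_judge))
    model_release = np.zeros(len(p_judge), dtype=bool)
    model_release[np.argsort(-p_judge)[:k]] = True
    use_model = rng.random(len(p_judge)) < gamma
    return np.where(use_model, model_release, judge_release)

# decisions = mixed_release(judge_release, p_judge, gamma=0.5,
#                           release_rate=0.73, rng=np.random.default_rng(0))
```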

Page 38

Artificial Judge Beats the Judge

The model of the judge beats the judge by 35%.

[Figure: performance as γ varies from the pure judge (γ = 0) to the pure ML model of the judge (γ = 1)]

Page 39

Putting It All Together

[Figure: percent change in FTA vs. percentage point change in release rate, comparing Judges, Lenient Judge + ML (contraction), Predicted Judge, and Random]

Page 40

Recap so far…

The algorithm beats the human judge in the bail assignment problem.

The artificial judge beats the real judge. This means the judge is misweighing unobservable features.

Can we model human mistakes?

Page 41

Modeling Human Decisions

[Diagram: judge j makes decisions (release/jail) on items i, each of which has a true label (no crime / crime)]

Page 42

Modeling Human Decisions

Judge j's decision on defendant i, with true label t_i, is modeled by a confusion matrix (rows: truth, columns: decision):

p_{c,c}  p_{c,n}
p_{n,c}  p_{n,n}

E.g., p_{c,n} is the probability that judge j decided "no crime" for a defendant who will do "crime".
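Restated in standard notation, a formalization of the matrix written above (notation only; no new content):

```latex
% Judge j's confusion matrix: rows index the true label t_i,
% columns index the decision d_{ij}; each row sums to one.
P(d_{ij} = b \mid t_i = a) = p_{a,b}, \qquad a, b \in \{c, n\},
\qquad
\begin{pmatrix} p_{c,c} & p_{c,n} \\ p_{n,c} & p_{n,n} \end{pmatrix},
\qquad p_{a,c} + p_{a,n} = 1 .
```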

Page 43

Problem Setting

Input:
§ Judges and defendants described by features

Output:
§ Discover groups of judges and defendants that share common confusion matrices

Page 44

Joint Confusion Model

[Diagram: judge attributes (a1…a5) determine judge j's cluster c_j; item attributes (b1…b4) determine item i's cluster d_i; each (judge cluster, item cluster) pair shares a truth-by-decision confusion matrix, which together with the item label z_i generates the decision of judge j on item i]

Similar judges share confusion matrices when deciding on similar items.
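A minimal sketch of this generative story, under my reading of the diagram: cluster assignment and parameter learning are left out, and the cluster counts and parameter shapes are assumptions.

```python
# Sketch of the joint confusion model's generative story (assumed shapes):
# judges cluster by attributes, items cluster by attributes, and each
# (judge cluster, item cluster) pair owns a 2x2 confusion matrix.
import numpy as np

rng = np.random.default_rng(0)
K_JUDGE, K_ITEM = 3, 4            # assumed numbers of clusters
# confusion[c_j, d_i, truth] = P(judge decides "no crime" | true label)
confusion = rng.uniform(0.1, 0.9, size=(K_JUDGE, K_ITEM, 2))

def sample_decision(c_j: int, d_i: int, true_label: int) -> str:
    """Draw a decision from the confusion matrix shared by
    judge cluster c_j and item cluster d_i."""
    p_no_crime = confusion[c_j, d_i, true_label]
    return "no crime" if rng.random() < p_no_crime else "crime"

# e.g., a cluster-2 judge on a cluster-1 item whose true label is "crime" (1):
print(sample_decision(c_j=2, d_i=1, true_label=1))
```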

Page 45

What do we learn on Bail?

Judge attributes:
§ Year, county, number of previous cases, number of previous felony cases, number of previous misdemeanor cases, number of previous minor cases

Defendant attributes:
§ Previous crime history, personal attributes (children/married, etc.), social status (education, home ownership, moved a lot in the past 10 years), current offense details

Page 46

Why judges make mistakes

Less experienced judges make more mistakes:
§ Left: judges with a high number of cases
§ Right: judges with a low number of cases

[Figure: two truth-by-decision confusion matrices, entries 0.9/0.1 (experienced) vs. 0.6/0.4 (inexperienced)]

Page 47

Why judges make mistakes

Judges with many felony cases are stricter:
§ Left: judges with many felony cases
§ Right: judges with few felony cases

[Figure: two truth-by-decision confusion matrices, entries 0.8/0.2 vs. 0.5/0.5 and 0.3/0.7]

Page 48

Why judges make mistakes

Defendants who are single, committed felonies, and moved a lot are judged accurately; defendants who have kids are confusing to judges.

[Figure: two truth-by-decision confusion matrices, entries 0.9/0.1 (accurately judged) vs. all 0.5 (confusing)]

Page 49

Conclusion

A new tool to understand human decision making. The framework can be applied to various other questions:
§ Decisions about patient treatments
§ Assigning students to intervention programs
§ Hiring and promotion decisions

Page 50

Power of Algorithms

Algorithms can help us understand whether human judges are biased.

Algorithms can be used to diagnose the reason for bias.

Key insight:
§ More than prediction tools
§ They serve as behavioral diagnostic tools

Page 51

Focusing on Decisions

Not just about prediction; the key is starting with the decision:
§ Performance benchmark: current "human" decisions
§ Not ROC but "human ROC"
§ What are we really optimizing?

Focusing on the decision also raises new concerns.

Page 52

Reflections: New Concerns

1) Selective labels
§ We don't see labels for people who are jailed
§ Extremely common problem: prediction -> decision -> action
§ Which outcomes we see depends on our decisions

Page 53

Reflections: New Concerns

2) Data responds to prediction
§ Whenever we make a prediction/decision, we affect the labels/data we see in the future
§ This creates a self-reinforcing feedback loop

Page 54

Reflections: New Concerns

3) Constraints on input features
For example, race is an illegal feature:
§ Our models don't use it

But is that enough?
§ Many other characteristics correlate with race

How do we deal with this additional constraint?

Page 55

Reflections: New Concerns

4) Omitted payoff bias
§ Is minimizing the crime rate really the right goal?
§ There are other important factors:
§ Consequences of jailing for the family
§ Jobs and the workplace
§ Future behavior of the defendant
§ How could we model these?

Page 56

Conclusion

Bail is not the only prediction problem. Many important problems in public policy are prediction problems!

Potential for large impact.

Page 57

References

§ The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables. H. Lakkaraju, J. Kleinberg, J. Leskovec, J. Ludwig, S. Mullainathan. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2017.
§ Human Decisions and Machine Predictions. J. Kleinberg, H. Lakkaraju, J. Leskovec, J. Ludwig, S. Mullainathan. NBER Working Paper, 2017.
§ A Bayesian Framework for Modeling Human Evaluations. H. Lakkaraju, J. Leskovec, J. Kleinberg, S. Mullainathan. SIAM International Conference on Data Mining (SDM), 2015.
§ Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making. H. Lakkaraju, J. Leskovec. Neural Information Processing Systems (NIPS), 2016.