MLconf NYC Samantha Kleinberg

23
Causal Inference from Uncertain Time Series Data Samantha Kleinberg Stevens Institute of Technology

description

 

Transcript of MLconf NYC Samantha Kleinberg

Page 1: MLconf NYC Samantha Kleinberg

Causal Inference from Uncertain Time Series Data

Samantha KleinbergStevens Institute of Technology

Page 2: MLconf NYC Samantha Kleinberg
Page 3: MLconf NYC Samantha Kleinberg

http://discoverysedge.mayo.edu/artificial-pancreas/

Page 4: MLconf NYC Samantha Kleinberg

insulindependence.orginfluxis.com gigaom.comNate Heintzman @ UCSD, Dexcom

Page 5: MLconf NYC Samantha Kleinberg

TimeBlood Glucose

Heart Rate

Skin Temp.

Energy Use Weight

7:207:25 95.7 26.80 37.237:30 90.5 28.63 12.857:35 82.4 29.69 6.787:40 79.2 30.32 6.417:45 77.0 30.55 8.177:50 73.3 31.19 6.87 1557:55 72.7 31.29 6.088:00 130.0 71.4 31.96 9.748:05 203.5 89.8 31.59 24.588:10 208.2 87.9 31.67 10.638:15 209.0 84.8 31.16 15.638:20 207.8 81.0 31.11 11.748:25 205.9 83.7 31.26 16.288:30 204.8 98.3 31.05 23.598:35 214.9 82.2 30.47 11.668:40 225.7 82.9 30.95 16.938:45 231.4 0.18:50 232.8 89.6 29.67 21.968:55 239.4 81.2 30.81 15.86

Page 6: MLconf NYC Samantha Kleinberg

TimeBlood Glucose

Heart Rate

Skin Temp.

Energy Use Weight

7:207:25 95.7 26.80 37.237:30 90.5 28.63 12.857:35 82.4 29.69 6.787:40 79.2 30.32 6.417:45 77.0 30.55 8.177:50 73.3 31.19 6.87 1557:55 72.7 31.29 6.088:00 130.0 71.4 31.96 9.748:05 203.5 89.8 31.59 24.588:10 208.2 87.9 31.67 10.638:15 209.0 84.8 31.16 15.638:20 207.8 81.0 31.11 11.748:25 205.9 83.7 31.26 16.288:30 204.8 98.3 31.05 23.598:35 214.9 82.2 30.47 11.668:40 225.7 82.9 30.95 16.938:45 231.4 0.18:50 232.8 89.6 29.67 21.968:55 239.4 81.2 30.81 15.86

Errors

Page 7: MLconf NYC Samantha Kleinberg

TimeBlood Glucose

Heart Rate

Skin Temp.

Energy Use Weight

7:207:25 95.7 26.80 37.237:30 90.5 28.63 12.857:35 82.4 29.69 6.787:40 79.2 30.32 6.417:45 77.0 30.55 8.177:50 73.3 31.19 6.87 1557:55 72.7 31.29 6.088:00 130.0 71.4 31.96 9.748:05 203.5 89.8 31.59 24.588:10 208.2 87.9 31.67 10.638:15 209.0 84.8 31.16 15.638:20 207.8 81.0 31.11 11.748:25 205.9 83.7 31.26 16.288:30 204.8 98.3 31.05 23.598:35 214.9 82.2 30.47 11.668:40 225.7 82.9 30.95 16.938:45 231.4 0.18:50 232.8 89.6 29.67 21.968:55 239.4 81.2 30.81 15.86

Errors

Missing Data

Page 8: MLconf NYC Samantha Kleinberg

TimeBlood Glucose

Heart Rate

Skin Temp.

Energy Use Weight

7:207:25 95.7 26.80 37.237:30 90.5 28.63 12.857:35 82.4 29.69 6.787:40 79.2 30.32 6.417:45 77.0 30.55 8.177:50 73.3 31.19 6.87 1557:55 72.7 31.29 6.088:00 130.0 71.4 31.96 9.748:05 203.5 89.8 31.59 24.588:10 208.2 87.9 31.67 10.638:15 209.0 84.8 31.16 15.638:20 207.8 81.0 31.11 11.748:25 205.9 83.7 31.26 16.288:30 204.8 98.3 31.05 23.598:35 214.9 82.2 30.47 11.668:40 225.7 82.9 30.95 16.938:45 231.4 0.18:50 232.8 89.6 29.67 21.968:55 239.4 81.2 30.81 15.86

Errors

Missing Data

Different timescales

Page 9: MLconf NYC Samantha Kleinberg

Discretization

Page 10: MLconf NYC Samantha Kleinberg

Logic-based causal inference

• Complex, temporal relationships

• Potential causes raise probability of and are earlier than effects

Kleinberg, S. (2012) Causality, Probability, and Time.

Page 11: MLconf NYC Samantha Kleinberg

Which causes are significant?

• Main idea: looking for better explanations for the effect

• Assess average difference cause makes to probability of effect

• Determine which ε are statistically significant

Page 12: MLconf NYC Samantha Kleinberg

Causes from EHR data

dx_urinary_symptoms

first_antihtn_combo

dx_overweight

dx_diabetes

dx_hypothyroidism

0510152025

8

9

17

17

18

11

10

22

22

21

Months before CHF diagnosis

S. Kleinberg and N. Elhadad (2013) Lessons Learned in Replicating Data-Driven Experiments in Multiple Medical Systems and Patient Populations. AMIA Annual Symposium.

Page 13: MLconf NYC Samantha Kleinberg

Adding uncertainty

D. Hutchison and S. Kleinberg (2013) Causality and Experimentation in the Sciences.

Page 14: MLconf NYC Samantha Kleinberg

Carb-heavy meal (c), vigorous exercise (e), blood glucose (g)

Page 15: MLconf NYC Samantha Kleinberg

Calculate:

What’s the effect of exercise on glucose?

Page 16: MLconf NYC Samantha Kleinberg

Calculate: Strict discretization:

What’s the effect of exercise on glucose?

Page 17: MLconf NYC Samantha Kleinberg

Calculate: Strict discretization:Probabilistic discretization:

What’s the effect of exercise on glucose?

Page 18: MLconf NYC Samantha Kleinberg

Validation

Simulated structures• Randomly generated

relationships• 6 levels of uncertainty• 20K observations

Page 19: MLconf NYC Samantha Kleinberg

Results on simulated data

Page 20: MLconf NYC Samantha Kleinberg

Causes of changes in glucose

Cohort: 17 subjects with T1DMSensor data (collected for >72 hours)– Glucose values– Insulin dosage– Activity– Sleep stage– Heart rate– Temperature

With N. Heintzman (UCSD,Dexcom) http://dial.ucsd.edu/what-we-do.php

Page 21: MLconf NYC Samantha Kleinberg

Results

very vigorous exercise leads to hyperglycemia (fdr <.01) in 5-30 minutes– Found using both HR (anaerobic activity zone) and

METs

• Not found with strict discretization• Supported by literature

(Marliss and Vranic, 2002; Riddell and Perkins, 2006)

S. Kleinberg and N. Heintzman. Inference of Causal Relationships from Uncertain Data and Application to Type 1 Diabetes. (under review)

Page 22: MLconf NYC Samantha Kleinberg

Open problems

• Latent variables• Nonstationary time series• Real-time prediction/explanation• Simulation

Page 23: MLconf NYC Samantha Kleinberg

Thanks!

• NSF/CRA Computing Innovation fellowship

• NLM Computational Thinking contract

• NLM R01

• Columbia– Noemie Elhadad– George Hripcsak– Mitzi Morris– Rimma Pivovarov

• Geisinger– Buzz Stewart– Craig Wood

• UCSD– Nathaniel Heintzman

• Stevens– Dylan Hutchison