
The Case for Evaluating Causal Models Using Interventional Measures and Empirical Data

Summary

Algorithms for causal discovery are unlikely to be widely accepted unless they can be shown to accurately predict the effects of intervention when applied to empirical data.

Q1. Don't we do this already?
Q2. Aren't structural measures enough?
Q3. Is there any data that supports this type of evaluation?
Q4. Does empirical data add any value?

• We surveyed 91 causality papers from the past 5 years of NeurIPS, AAAI, KDD, UAI, and ICML.

[Survey figures: source of data used for evaluation (synthetic, empirical, or both); percentage of papers using each type of evaluation measure (structural, distributional, visual inspection, interventional), split by synthetic vs. empirical data; and type of algorithm evaluated (bivariate, multivariate, other).]

[Evaluation pipeline diagram: observational sampling of a system with covariate C, treatment T, and outcome O yields observational data and interventional data; structure learning and parameterization on the observational data produce a parameterized DAG; the interventional distribution computed from that DAG is evaluated against the interventional distribution estimated directly from the interventional data. Example observational and interventional data tables with columns ID, T, O, and C.]
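A minimal, self-contained sketch of this pipeline on a hypothetical three-variable system (covariate C, treatment T, outcome O), using only NumPy. The toy structure, the probabilities, and the assumption that the correct structure has already been learned are all illustrative; they are not the poster's actual systems or algorithms.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical ground-truth system used only to generate data: C -> T, C -> O, T -> O.
    def sample_system(n, do_t=None):
        C = rng.binomial(1, 0.5, n)
        T = rng.binomial(1, 0.3 + 0.4 * C) if do_t is None else np.full(n, do_t)
        O = rng.binomial(1, 0.2 + 0.3 * T + 0.3 * C)
        return C, T, O

    # Observational data (for learning) and interventional data (for evaluation).
    C_obs, T_obs, O_obs = sample_system(50_000)
    _, _, O_do1 = sample_system(50_000, do_t=1)

    # "Parameterized DAG" step: assume the structure C -> T -> O, C -> O was recovered,
    # and compute P(O | do(T=1)) from the observational data by backdoor adjustment on C.
    p_do1_model = np.zeros(2)
    for c in (0, 1):
        mask = (C_obs == c) & (T_obs == 1)
        p_o_given_tc = np.bincount(O_obs[mask], minlength=2) / mask.sum()
        p_do1_model += p_o_given_tc * (C_obs == c).mean()

    # Estimate P(O | do(T=1)) directly from the interventional data.
    p_do1_empirical = np.bincount(O_do1, minlength=2) / len(O_do1)

    # Evaluation: an interventional measure (total variation distance) between the two.
    tvd = 0.5 * np.abs(p_do1_model - p_do1_empirical).sum()
    print(f"TVD between model-implied and empirical interventional distributions: {tvd:.3f}")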

[Figure: performance of GES, MMHC, and PC on look-alike synthetic data (generated from structures learned by PC and by GES) compared with their performance on the original empirical data.]

• Many datasets exist with known interventional effects (DREAM, ACIC 2016 challenge, flow cytometry, cause-effect pairs challenge, etc.)
• We can collect data from computational systems
• Many advantages: empirical, easily intervenable, natural stochasticity
• We collected data from Postgres, the JDK, and networking infrastructure and intervened by changing system parameters
• We can create pseudo-observational data by biasing with an observed covariate (a sketch of one such biasing scheme follows below)
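One plausible reading of "biasing with an observed covariate", sketched with NumPy: start from experimental data in which treatment T was randomized, then subsample rows so that treatment assignment becomes correlated with an observed covariate C, inducing the kind of confounding found in observational data. The selection rule and probabilities are illustrative assumptions, not the authors' exact procedure.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical experimental data: randomized treatment T, observed covariate C, outcome O.
    n = 20_000
    C = rng.binomial(1, 0.5, n)
    T = rng.binomial(1, 0.5, n)                      # randomized treatment
    O = rng.binomial(1, 0.2 + 0.3 * T + 0.3 * C)

    def make_pseudo_observational(C, T, O, p_match=0.9, p_mismatch=0.1):
        # Keep rows where T == C with high probability and other rows with low
        # probability, so that treatment assignment depends on C in the subsample.
        keep_p = np.where(T == C, p_match, p_mismatch)
        keep = rng.random(len(T)) < keep_p
        return C[keep], T[keep], O[keep]

    C_b, T_b, O_b = make_pseudo_observational(C, T, O)
    print("P(T=1 | C=1) before biasing:", round(T[C == 1].mean(), 2))    # ~0.5
    print("P(T=1 | C=1) after biasing: ", round(T_b[C_b == 1].mean(), 2))  # well above 0.5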

• Most structural measures penalize all errors equally
• Even structural measures designed to consider interventions (e.g., Structural Intervention Distance) produce similar results to other structural measures
• Interventional measures (e.g., total variation distance) directly compare interventional distributions rather than graph structure (a sketch contrasting the two kinds of measure follows below)
• We generated random DAGs and compared different evaluation measures for both GES and PC
• TVD and SHD produce significantly different results, varying by algorithm
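A small sketch of the two kinds of measure, for concreteness. SHD here is a simplified variant that counts node pairs whose edge status differs; TVD compares two discrete interventional distributions. The functions and toy inputs are illustrative, not the exact implementations used in the study.

    import numpy as np

    def shd(adj_true, adj_learned):
        # Simplified structural Hamming distance: number of node pairs whose edge
        # status (absent, i->j, or j->i) differs between the two DAGs.
        d = adj_true.shape[0]
        dist = 0
        for i in range(d):
            for j in range(i + 1, d):
                if (adj_true[i, j], adj_true[j, i]) != (adj_learned[i, j], adj_learned[j, i]):
                    dist += 1
        return dist

    def tvd(p, q):
        # Total variation distance between two discrete distributions.
        return 0.5 * np.abs(np.asarray(p, float) - np.asarray(q, float)).sum()

    # Example: a learned graph one edge away from the truth (SHD = 1), and a pair of
    # interventional distributions over a binary outcome compared with TVD.
    adj_true    = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
    adj_learned = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
    print("SHD:", shd(adj_true, adj_learned))       # 1
    print("TVD:", tvd([0.2, 0.8], [0.5, 0.5]))      # 0.3

The same SHD can correspond to very different TVDs, depending on which edges are wrong and how strongly they affect the interventional distribution.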

Measures of interventional effect are necessary when trying to assess how well an algorithm learns actual causal effects.

Effective methods exist to create empirical data for evaluating algorithms for causal discovery.


• Empirical data provides realistic complexity generally not present in synthetic data
• Data not generated by the researcher is less likely to contain unintentional biases and can be standardized across the community
• Provides a stronger demonstration of effectiveness
• Learned causal structure of computational systems using PC (left) and GES (center)
• Generated synthetic data based on these structures and evaluated performance of GES, MMHC, and PC
• Compared to the performance of GES, MMHC, and PC on the original empirical data, the relative order differs significantly (a sketch of generating such look-alike synthetic data follows below)
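A sketch of one way to generate "look-alike" synthetic data from a learned structure, assuming a linear-Gaussian parameterization fit to the empirical data; the parameterization and the example DAG are illustrative assumptions, not necessarily what was used on the poster.

    import numpy as np

    def topological_order(adj):
        # Topological order of a DAG given its adjacency matrix (adj[i, j] == 1 means i -> j).
        adj = adj.copy()
        order, remaining = [], list(range(adj.shape[0]))
        while remaining:
            roots = [j for j in remaining if adj[:, j].sum() == 0]  # nodes with no remaining parents
            assert roots, "adjacency matrix must describe a DAG"
            order.extend(roots)
            for r in roots:
                adj[r, :] = 0
                remaining.remove(r)
        return order

    def fit_and_sample(adj, data, n_samples, rng):
        # Fit a linear-Gaussian model for each node given its parents in the learned
        # DAG, then sample a look-alike synthetic dataset of the requested size.
        d = adj.shape[0]
        synth = np.zeros((n_samples, d))
        for j in topological_order(adj):
            parents = np.flatnonzero(adj[:, j])
            if len(parents) == 0:
                synth[:, j] = rng.normal(data[:, j].mean(), data[:, j].std(), n_samples)
            else:
                X = np.column_stack([data[:, parents], np.ones(len(data))])
                coef, *_ = np.linalg.lstsq(X, data[:, j], rcond=None)
                resid_std = (data[:, j] - X @ coef).std()
                Xs = np.column_stack([synth[:, parents], np.ones(n_samples)])
                synth[:, j] = Xs @ coef + rng.normal(0, resid_std, n_samples)
        return synth

    # Example usage with a hypothetical learned DAG (0 -> 1 -> 2) and placeholder data.
    rng = np.random.default_rng(2)
    data = rng.normal(size=(1000, 3))
    adj = np.array([[0, 1, 0],
                    [0, 0, 1],
                    [0, 0, 0]])
    lookalike = fit_and_sample(adj, data, n_samples=1000, rng=rng)

Each discovery algorithm can then be run on both the look-alike data and the original empirical data, and the resulting scores compared to see whether the relative ordering of the algorithms is preserved.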

A1. No, fewer than 10% of papers published in the past 5 years use a combination of empirical data and interventional measures.

A2. No, structural measures correspond poorly to measures of interventional effect, such as TVD.

A3. Yes, several data sets exist, including ones we have recently created from experimentation with computational systems.

A4. Yes, results on empirical data sets appear to differ substantially from results on 'look-alike' synthetic data sets.

Amanda Gentzel, Dan Garant, and David Jensen
College of Information and Computer Sciences, University of Massachusetts Amherst
