A Multi-Expert Scenario Analysis for Systematic Comparison of Expert Weighting Approaches *


Page 1: A Multi-Expert Scenario Analysis for Systematic Comparison of Expert Weighting Approaches *

A Multi-Expert Scenario Analysis for Systematic Comparison of Expert Weighting Approaches*

CEDM Annual Meeting, Pittsburgh, PA, May 20, 2012

Umit Guvenc, Mitchell Small, Granger Morgan, Carnegie Mellon University

*Work supported under a cooperative agreement between NSF and Carnegie Mellon University through the Center for Climate and Energy Decision Making (SES-0949710)

Page 2

Multi-Expert Weighting: A Common Challenge in Public Policy

• Within the climate change context, many critical quantities and probability distributions are elicited from multiple experts (e.g., climate sensitivity)

• No consensus on the best methodology for aggregating multiple, sometimes conflicting, expert opinions

• Critical to demonstrate the advantages and disadvantages of different approaches under different circumstances


Page 3

General Issues Regarding Multi-Expert Weighting

1. Should we aggregate expert judgments at all?
2. If we do, should we use a differential weighting scheme?
3. If we do, should we use “seed questions” to assess expert skill?
4. If we do, how should we choose “appropriate” seed questions?
5. If we do, how do different weighting schemes perform under different circumstances?
   • Equal weights
   • Likelihood weights
   • “Classical” (Cooke) weights


Page 4

Presentation Outline

1. Alternative Weighting Methods – Likelihood, “Classical”, and Equal Weighting Schemes

2. Our Approach

3. Characterizing Experts– Bias, Precision, Confidence

4. Multi-Expert Scenario Analysis

5. Conclusions


Page 5

Likelihood Weights

• Traditional approach for multi-model aggregation in classical statistics
  – Equivalent to Bayesian model aggregation with uninformed priors

• Uses relative likelihoods for Prob[true value | expert estimate]
  – We assume an expert’s actual likelihood depends on their skill: Bias and Precision
  – An expert’s self-perceived likelihood depends on his/her Confidence

• Parametric error distribution function required
  – Normal distribution assumed in the analysis that follows (many risk-related quantities are ~lognormal, so this is directly applicable to these)

• “Micro” validation incorporated
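Under the normal-error assumption above, the likelihood-weighting idea can be sketched as follows. This is a minimal illustration, not the authors' code; the function names, the expert sigmas, and the example numbers are all assumed for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood_weights(estimates, true_value, sigmas):
    """Weight each expert by the density their assumed normal error
    distribution places on the realized true value, then normalize."""
    likes = [normal_pdf(true_value, est, sig)
             for est, sig in zip(estimates, sigmas)]
    total = sum(likes)
    return [lk / total for lk in likes]

# Illustrative numbers: three experts with equal precision (sigma = 0.3)
# answering a question whose true value is 0.
weights = likelihood_weights([0.1, -0.05, 0.8], true_value=0.0,
                             sigmas=[0.3, 0.3, 0.3])
```

The weight falls off exponentially in the squared error, which is one way to see why a likelihood scheme penalizes bias and imprecision so sharply in the scenarios that follow.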


Page 6

“Classical” Weights

• Cooke RM (1991), Experts in Uncertainty, Oxford University Press, Oxford
• Cooke RM and Goossens LLHJ (2008), “TU Delft Expert Judgment Database”, Reliability Engineering and System Safety, v.93, p.657-674
  – Per study: 7-55 seeds, 6-47 “effective” seeds, 4-77 experts
  – Parameters chosen to maximize expert weights
  – Within-sample validation

• “Macro” validation only
  – Based on frequencies across percentiles across all questions

• Non-parametric, based on chi-square distribution
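The calibration side of the “Classical” method can be sketched as below. This is a simplified, assumed implementation: it computes only the 2N·KL calibration statistic over the four interquantile bins implied by stated 5th/50th/95th percentiles; Cooke's full method converts this statistic to a chi-square p-value and multiplies it by an information score before normalizing. All names here are illustrative.

```python
import math

# Expected bin probabilities for a well-calibrated expert who states
# 5th, 50th, and 95th percentiles.
EXPECTED = [0.05, 0.45, 0.45, 0.05]

def bin_counts(percentile_triples, true_values):
    """Count realized true values falling in each interquantile bin."""
    counts = [0, 0, 0, 0]
    for (q5, q50, q95), x in zip(percentile_triples, true_values):
        if x < q5:
            counts[0] += 1
        elif x < q50:
            counts[1] += 1
        elif x < q95:
            counts[2] += 1
        else:
            counts[3] += 1
    return counts

def calibration_statistic(percentile_triples, true_values):
    """2N times the KL divergence between observed and expected bin
    frequencies; small values indicate a well-calibrated expert."""
    n = len(true_values)
    stat = 0.0
    for c, p in zip(bin_counts(percentile_triples, true_values), EXPECTED):
        s = c / n
        if s > 0.0:
            stat += 2.0 * n * s * math.log(s / p)
    return stat

# A perfectly calibrated expert over 20 seed questions: 1/9/9/1 bin hits.
stat = calibration_statistic([(-1.645, 0.0, 1.645)] * 20,
                             [-2.0] + [-0.5] * 9 + [0.5] * 9 + [2.0])
```

Because calibration is judged only from hit frequencies across all questions, this is the “macro” validation the slide contrasts with the likelihood method's per-question “micro” validation.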


Page 7

Our Approach

• Monte Carlo simulation with 10 hypothetical questions

• Experts characterized along three dimensions– Bias– Precision– Confidence

• Multi-Expert Scenario Analysis


Page 8

Characterizing Experts: Bias, Precision, Confidence

[Figure: schematic of the three-parameter expert model. The expert’s best estimate µ is drawn from a distribution with mean offset from the true value (0) by Bias and with spread σmean (Precision). The expert’s stated distribution fX for the variable X, summarized by the percentiles X5%, X50%, X95%, has spread σX (Confidence). The likelihood credited to the expert is L = fX(0), the density the stated distribution places on the true value.]

• Expert thinks about the mean (i.e., the best estimate)
• Expert thinks about the distribution of the variable X
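One plausible way to turn Bias, Precision, and Confidence into simulated elicitation answers (a hypothetical sketch; the slides do not show the authors' exact generator):

```python
import random

Z95 = 1.645  # standard normal 95th-percentile z-score

def simulate_expert(bias, precision, confidence, true_value=0.0, rng=random):
    """Draw one expert's stated (5th, 50th, 95th) percentiles for one question.

    The best estimate is the true value shifted by Bias plus normal noise
    with sigma = Precision; the stated percentiles then spread around the
    best estimate using the expert's self-perceived sigma (Confidence)."""
    best = true_value + bias + rng.gauss(0.0, precision)
    return (best - Z95 * confidence, best, best + Z95 * confidence)
```

Under this reading, an overconfident expert is one whose Confidence is smaller than their Precision (intervals too narrow), and an underconfident one the reverse, matching the C/P ratio used in the scenarios.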

Page 9

Multi-Expert Scenario Analysis

• 9 experts, characterized by Bias, Precision, Confidence
• 10 hypothetical questions (i = 1 to 10)
  – True value XTrue(i) = 0
  – Expert estimate XEstimate(i): X5%, X50%, X95%
  – Predictive error(i) = XTrue(i) - XGuess(i); MSE

• Leave one question out at a time to predict (cross-validation)
  – Determine expert weights using the other 9 questions

• Compare weights and predictive error for an assumed group of experts
  – Equal Weights
  – Likelihood Weights
  – “Classical” Weights
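The cross-validation loop above can be sketched as follows, assuming the group guess is a weighted mean of the experts' best estimates (`loo_mse` and `weight_fn` are hypothetical names introduced here, not from the talk):

```python
def loo_mse(expert_estimates, true_values, weight_fn):
    """Leave-one-out cross-validation of an aggregation scheme.

    expert_estimates[j][i] is expert j's best estimate for question i.
    weight_fn(train_indices) must return one weight per expert, fit on
    the held-in questions; the weighted mean of the estimates on the
    held-out question is the group guess."""
    n = len(true_values)
    sq_errors = []
    for held_out in range(n):
        train = [i for i in range(n) if i != held_out]
        weights = weight_fn(train)
        guess = sum(w * est[held_out]
                    for w, est in zip(weights, expert_estimates))
        sq_errors.append((true_values[held_out] - guess) ** 2)
    return sum(sq_errors) / n
```

With equal weights, `weight_fn` simply ignores the training indices; likelihood or “Classical” weights would be refit on the nine held-in questions each round.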


Page 10

Multi-Expert Scenarios

1. Base Case
2. Impact of Bias
3. Impact of Precision
4. Impact of Confidence
5. Experts with bias, precision, and confidence all varying


Page 11

Scenario #1: Base Case


Experts are arranged in a 3×3 grid (rows: experts 1, 4, 7 / 2, 5, 8 / 3, 6, 9)

Expert characteristics:
  Bias:        0 for all experts
  Precision:   0.3 for all experts
  Confidence:  0.3 for all experts (Confidence/Precision = 1)

Results (weights and predictive error):
  Likelihood weights:  10.9%, 11.9%, 10.7% / 11.3%, 11.1%, 11.4% / 11.4%, 10.7%, 10.5%   MSE(L) = 0.03
  “Classical” weights: 10.8%, 11.3%, 11.1% / 11.5%, 11.0%, 11.1% / 11.0%, 11.1%, 11.1%   MSE(C) = 0.01
  Equal weights:       11.11% each   MSE(E) = 0.82

• Model validation: equal weights are assigned to experts with equal skill

Page 12

Scenario #2: Impact of Bias


Experts are arranged in a 3×3 grid (rows: experts 1, 4, 7 / 2, 5, 8 / 3, 6, 9)

Expert characteristics:
  Bias:        0, 0, 0 / 0.1, 0.1, 0.1 / 0.3, 0.3, 0.3
  Precision:   0.3 for all experts
  Confidence:  0.3 for all experts (Confidence/Precision = 1)

Results (weights and predictive error):
  Likelihood weights:  17.2%, 18.1%, 16.9% / 14.0%, 13.9%, 14.0% / 2.2%, 1.8%, 1.8%   MSE(L) = 0.04
  “Classical” weights: 15.8%, 16.4%, 16.2% / 14.0%, 13.5%, 13.4% / 3.5%, 3.6%, 3.6%   MSE(C) = 0.02
  Equal weights:       11.11% each   MSE(E) = 2.26

• When small and moderate bias are introduced for some experts, the weights change to penalize bias (more strongly under the likelihood method)

Page 13

Scenario #3: Impact of Precision


Experts are arranged in a 3×3 grid (rows: experts 1, 4, 7 / 2, 5, 8 / 3, 6, 9)

Expert characteristics:
  Bias:        0 for all experts
  Precision:   0.2, 0.3, 1 in each row (experts 1-3: 0.2; experts 4-6: 0.3; experts 7-9: 1)
  Confidence:  0.2, 0.3, 1 in each row (Confidence/Precision = 1 for all)

Results (weights and predictive error):
  Likelihood weights:  30.7%, 2.2%, 0.0% / 31.9%, 2.1%, 0.0% / 31.2%, 1.9%, 0.0%   MSE(L) = 0.02
  “Classical” weights: 15.5%, 13.3%, 4.3% / 16.5%, 12.9%, 4.3% / 15.7%, 13.1%, 4.3%   MSE(C) = 0.02
  Equal weights:       11.11% each   MSE(E) = 3.42

• When Bias = 0 for all and imprecision is introduced for some experts, the weights change to reward precision and penalize imprecision (more strongly under the likelihood method)

Page 14

Scenario #4: Impact of Confidence


Experts are arranged in a 3×3 grid (rows: experts 1, 4, 7 / 2, 5, 8 / 3, 6, 9)

Expert characteristics:
  Bias:        0 for all experts
  Precision:   0.3 for all experts
  Confidence:  0.15 for experts 1, 4, 7 (C/P = 0.5); 0.3 for experts 2, 5, 8 (C/P = 1); 0.6 for experts 3, 6, 9 (C/P = 2)

Results (weights and predictive error):
  Likelihood weights:  10.3%, 11.5%, 10.2% / 22.2%, 21.3%, 22.1% / 0.8%, 0.8%, 0.8%   MSE(L) = 0.05
  “Classical” weights: 7.0%, 7.9%, 7.4% / 18.2%, 17.4%, 17.5% / 8.1%, 8.2%, 8.2%   MSE(C) = 0.02
  Equal weights:       11.11% each   MSE(E) = 0.82

• When Bias = 0 for all and over- and under-confidence are introduced for some experts, the weights change to penalize inappropriate confidence (more strongly under the likelihood method for under-confidence)

Page 15

Scenario #5a: Impact of Precision & Confidence (Bias = 0 for all)


Experts are arranged in a 3×3 grid (rows: experts 1, 4, 7 / 2, 5, 8 / 3, 6, 9)

Expert characteristics:
  Bias:        0 for all experts
  Precision:   0.2, 0.3, 1 in each row
  Confidence:  0.1, 0.15, 0.5 (C/P = 0.5) / 0.2, 0.3, 1 (C/P = 1) / 0.4, 0.6, 2 (C/P = 2)

Results (weights and predictive error):
  Likelihood weights:  17.7%, 6.1%, 0.0% / 62.5%, 7.4%, 0.0% / 6.1%, 0.2%, 0.0%   MSE(L) = 0.03
  “Classical” weights: 6.8%, 6.9%, 4.2% / 21.5%, 17.8%, 9.4% / 15.9%, 13.2%, 4.3%   MSE(C) = 0.03
  Equal weights:       11.11% each   MSE(E) = 3.42

• When Bias = 0 and both imprecision and over- and under-confidence are introduced:
  – Weights shift to reward the “ideal” expert (more strongly under likelihood)
  – For “Classical”, proper confidence can somewhat compensate for imprecision; not so for likelihood (imprecise experts are heavily penalized, even if they know they are imprecise)

Page 16

Scenario #5b: Impact of Precision & Confidence(Bias for all)


Experts are arranged in a 3×3 grid (rows: experts 1, 4, 7 / 2, 5, 8 / 3, 6, 9)

Expert characteristics:
  Bias:        0.5 for all experts
  Precision:   0.2, 0.3, 1 in each row
  Confidence:  0.1, 0.15, 0.5 (C/P = 0.5) / 0.2, 0.3, 1 (C/P = 1) / 0.4, 0.6, 2 (C/P = 2)

Results (weights and predictive error):
  Likelihood weights:  0.0%, 0.2%, 1.2% / 0.3%, 8.8%, 1.0% / 42.3%, 46.3%, 0.0%   MSE(L) = 0.30
  “Classical” weights: 0.0%, 0.1%, 12.7% / 0.0%, 1.9%, 46.5% / 2.0%, 8.3%, 28.6%   MSE(C) = 0.72
  Equal weights:       11.11% each   MSE(E) = 23.80

• When all experts are biased, and varying amounts of imprecision and improper relative confidence are introduced:
  – Likelihood weights shift to reward relatively precise but underconfident experts
  – “Classical” weights shift to reward imprecise experts

Page 17

Scenario #5c: Precision & Confidence (Bias for 3 Experts)


Experts are arranged in a 3×3 grid (rows: experts 1, 4, 7 / 2, 5, 8 / 3, 6, 9)

Expert characteristics:
  Bias:        0.3, 0, 0 / 0, 0.3, 0 / 0.3, 0, 0
  Precision:   0.2, 0.3, 1 in each row
  Confidence:  0.1, 0.15, 0.5 (C/P = 0.5) / 0.2, 0.3, 1 (C/P = 1) / 0.4, 0.6, 2 (C/P = 2)

Results (weights and predictive error):
  Likelihood weights:  0.0%, 8.2%, 0.0% / 86.7%, 2.0%, 0.0% / 2.4%, 0.6%, 0.0%   MSE(L) = 0.04
  “Classical” weights: 0.0%, 9.9%, 6.2% / 32.8%, 6.5%, 14.6% / 2.8%, 20.4%, 6.7%   MSE(C) = 0.06
  Equal weights:       11.11% each   MSE(E) = 4.26

• When there is moderate bias in a subset of “good” experts, and both imprecision and over- and under-confidence are introduced for all:
  – Likelihood rewards the “best” expert heavily
  – “Classical” spreads weight across many more experts

Page 18

Conclusions (1)

• Overall: Likelihood and “Classical” show similar performance (much better than equal weights), but assign very different weights to experts with different degrees of bias, precision, and relative confidence

• Model check: Both assign equal weights to experts with equal skill (equal bias, precision, and relative confidence)

• Bias: Both penalize biased experts, with a stronger penalty under Likelihood

• Precision: Both penalize imprecise experts, again with a stronger penalty under Likelihood

• Confidence: “Classical” penalizes overconfidence and underconfidence equally; Likelihood penalizes overconfidence by a similar amount, but underconfidence much more


Page 19

Conclusions (2)

• Precision & Confidence: For “Classical”, proper (or under-) confidence can compensate somewhat for imprecision; not so for the Likelihood weights (and over-confidence remains better for Likelihood weighting)

• Future direction: Consider 3-parameter distributions fit to each expert's 5th, 50th, and 95th percentile values to enable a more flexible Likelihood approach
  – Conduct an elicitation in which 2- and 3-parameter likelihood functions are used and compared
  – Consider how new information affects experts' performance on seed questions (explore VOI for correcting experts' biases, imprecision, and under- or overconfidence)


Page 20

Thank you

Questions?
