A Multi-Expert Scenario Analysis for Systematic Comparison of Expert Weighting Approaches*
CEDM Annual Meeting, Pittsburgh, PA, May 20, 2012
Umit Guvenc, Mitchell Small, Granger Morgan
Carnegie Mellon University
*Work supported under a cooperative agreement between NSF and Carnegie Mellon University through the Center for Climate and Energy Decision Making (SES-0949710)
Multi-Expert Weighting: A Common Challenge in Public Policy
• Within the climate change context, many critical quantities and probability distributions are elicited from multiple experts (e.g., climate sensitivity)
• There is no consensus on the best methodology for aggregating multiple, sometimes conflicting, expert opinions
• It is critical to demonstrate the advantages and disadvantages of different approaches under different circumstances
General Issues Regarding Multi-Expert Weighting
1. Should we aggregate expert judgments at all?
2. If we do, should we use a differential weighting scheme?
3. If we do, should we use “seed questions” to assess expert skill?
4. If we do, how should we choose “appropriate” seed questions?
5. If we do, how do different weighting schemes perform under different circumstances?
   • Equal weights
   • Likelihood weights
   • “Classical” (Cooke) weights
Presentation Outline
1. Alternative Weighting Methods
   – Likelihood, “Classical”, and Equal weighting schemes
2. Our Approach
3. Characterizing Experts
   – Bias, Precision, Confidence
4. Multi-Expert Scenario Analysis
5. Conclusions
Likelihood Weights
• Traditional approach for multi-model aggregation in classical statistics
  – Equivalent to Bayesian model aggregation with uninformative priors
• Uses relative likelihoods for Prob[true value | expert estimate]
  – We assume an expert’s actual likelihood depends on their skill: Bias and Precision
  – An expert’s self-perceived likelihood depends on their Confidence
• A parametric error distribution function is required
  – Normal distribution assumed in the analysis that follows (many risk-related quantities are approximately lognormal, so after a log transform the normal case applies directly to them)
• “Micro” validation incorporated
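The likelihood-weighting step described above can be sketched in a few lines, assuming (as the slides do) a normal error distribution fit to each expert's stated 5th/50th/95th percentiles and evaluated at the true value. The function name and the three-expert example are illustrative, not from the original analysis.

```python
from scipy import stats

def likelihood_weights(estimates, true_value=0.0):
    """Weight each expert by the density of the true value under a normal
    distribution fit to their stated (5th, 50th, 95th) percentiles."""
    z95 = stats.norm.ppf(0.95)  # ~1.645: half-width of the 5-95 band in sd units
    likelihoods = []
    for x5, x50, x95 in estimates:
        mu = x50                         # for a normal, median = mean
        sigma = (x95 - x5) / (2 * z95)   # implied standard deviation
        likelihoods.append(stats.norm.pdf(true_value, mu, sigma))
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]

# Three hypothetical experts: unbiased/sharp, unbiased/wide, biased/sharp
experts = [(-0.5, 0.0, 0.5), (-2.0, 0.0, 2.0), (0.5, 1.0, 1.5)]
w = likelihood_weights(experts)
```

As the scenarios below illustrate, this construction rewards precise, unbiased experts sharply: the sharp unbiased expert here receives most of the weight.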
“Classical” Weights
• Cooke RM (1991), Experts in Uncertainty, Oxford University Press, Oxford
• Cooke RM and Goossens LLHJ (2008), “TU Delft Expert Judgment Database”, Reliability Engineering and System Safety, v. 93, p. 657-674
• Per study: 7-55 seeds, 6-47 “effective” seeds, 4-77 experts
• Parameters chosen to maximize expert weights
• Within-sample validation
• “Macro” validation only
  – Based on frequencies across percentiles across all questions
• Non-parametric, based on the chi-square distribution
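A minimal sketch of the calibration component of Cooke's classical model, as described in Cooke (1991): each seed-question realization is binned by the interquantile interval it falls into, and the empirical bin frequencies are scored against the expected (0.05, 0.45, 0.45, 0.05) pattern with an asymptotic chi-square statistic. The function name and the seed data are illustrative; the full classical model also multiplies this score by an information score and optimizes a weight cutoff, which are omitted here.

```python
import numpy as np
from scipy import stats

def calibration_score(quantiles, realizations):
    """Cooke-style calibration: bin each realization by where it falls
    relative to the expert's stated (5th, 50th, 95th) percentiles, then
    test the empirical bin frequencies against (.05, .45, .45, .05)."""
    counts = np.zeros(4)
    for (q5, q50, q95), x in zip(quantiles, realizations):
        counts[np.searchsorted([q5, q50, q95], x)] += 1
    n = counts.sum()
    s = counts / n
    p = np.array([0.05, 0.45, 0.45, 0.05])
    mask = s > 0
    info = float(np.sum(s[mask] * np.log(s[mask] / p[mask])))  # KL divergence
    # 2*n*I is asymptotically chi-square with 3 degrees of freedom
    return 1.0 - stats.chi2.cdf(2.0 * n * info, df=3)

# 10 seed questions with identical stated percentiles, fairly calibrated answers
seeds = [(-1.645, 0.0, 1.645)] * 10
observed = [0.4, -0.3, 1.0, -1.2, 0.2, 0.8, -0.6, 1.4, -0.9, 0.1]
score = calibration_score(seeds, observed)
```

Note the “macro” character of this check: it only looks at frequencies across all questions, not at the density assigned to each individual true value as the likelihood approach does.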
Our Approach
• Monte Carlo simulation with 10 hypothetical questions
• Experts characterized along three dimensions:
  – Bias
  – Precision
  – Confidence
• Multi-Expert Scenario Analysis
Characterizing Experts: Bias, Precision, Confidence

[Figure: two sketches. First, the expert thinks about the mean (i.e., the best estimate): the density f_µ of the stated mean has center µ_mean and spread σ_mean (Precision), and Bias is the offset of µ_mean from the true value 0. Second, the expert thinks about the distribution of the variable X itself: the stated density f_X has percentiles X5%, X50%, X95% and spread σ_X (Confidence); the expert's likelihood is L = f_X(0), the stated density evaluated at the true value X = 0.]
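Under this characterization, a simulated expert can be generated in a few lines: the stated median is drawn around the true value with an offset (Bias) and spread (Precision), and the stated 5th/95th percentiles are set from the expert's self-perceived spread (Confidence), assuming a normal stated distribution. This is a sketch under that reading of the figure; the function and parameter names are illustrative.

```python
import numpy as np

Z95 = 1.645  # standard-normal 95th percentile

def simulate_expert(bias, precision, confidence, rng):
    """Return an expert's stated (5th, 50th, 95th) percentiles for a
    quantity whose true value is 0 (as in the scenario analysis)."""
    x50 = rng.normal(bias, precision)   # best estimate: biased and noisy
    x5 = x50 - Z95 * confidence         # self-perceived uncertainty band
    x95 = x50 + Z95 * confidence
    return x5, x50, x95

rng = np.random.default_rng(42)
x5, x50, x95 = simulate_expert(bias=0.0, precision=0.3, confidence=0.3, rng=rng)
```

An expert with confidence equal to precision is well calibrated; confidence below precision corresponds to overconfidence (intervals too narrow) and above it to underconfidence.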
Multi-Expert Scenario Analysis
• 9 experts, characterized by Bias, Precision, Confidence
• 10 hypothetical questions (i = 1 to 10)
  – True value XTrue(i) = 0
  – Expert estimate XEstimate(i): X5%, X50%, X95%
  – Predictive error(i) = XTrue(i) - XGuess(i); MSE
• Leave one question out at a time to predict (cross-validation)
• Determine expert weights using the remaining 9 questions
• Compare weights and predictive error for an assumed group of experts:
  – Equal Weights
  – Likelihood Weights
  – “Classical” Weights
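The leave-one-out procedure above can be sketched as follows: for each of the 10 questions, expert weights are computed from the other 9 and used to form a weighted combination of the experts' medians, whose squared error against the true value (0) is accumulated into an MSE. The weighting scheme is passed in as a function so any of the three schemes could be plugged in; the names and the unbiased-expert example data are illustrative.

```python
import numpy as np

def loo_mse(medians, weight_fn, true_values):
    """medians: (n_questions, n_experts) array of experts' best estimates.
    weight_fn(train_medians, train_truth) -> expert weights summing to 1.
    Returns the leave-one-out MSE of the weighted prediction."""
    n_q = len(true_values)
    errors = []
    for i in range(n_q):
        train = np.delete(np.arange(n_q), i)      # hold question i out
        w = weight_fn(medians[train], true_values[train])
        prediction = medians[i] @ w               # weighted combination
        errors.append((true_values[i] - prediction) ** 2)
    return float(np.mean(errors))

# Equal weighting as the simplest scheme, on 9 unbiased experts (sd 0.3)
rng = np.random.default_rng(0)
n_q, n_e = 10, 9
truth = np.zeros(n_q)                             # true value is 0 per slide
medians = rng.normal(0.0, 0.3, size=(n_q, n_e))
equal = lambda m, t: np.full(m.shape[1], 1.0 / m.shape[1])
mse = loo_mse(medians, equal, truth)
```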
Multi-Expert Scenarios
1. Base Case
2. Impact of Bias
3. Impact of Precision
4. Impact of Confidence
5. Experts with bias, precision, and confidence all varying
Scenario #1: Base Case
Experts: 1 4 7 / 2 5 8 / 3 6 9

Expert Characteristics:
  Bias:               0     0     0
                      0     0     0
                      0     0     0
  Precision:          0.3   0.3   0.3
                      0.3   0.3   0.3
                      0.3   0.3   0.3
  Confidence (C/P = 1 per row):
                      0.3   0.3   0.3
                      0.3   0.3   0.3
                      0.3   0.3   0.3

Results: Weights & Error
  Avg Likelihood Weights (MSE(L) = 0.03):
                      10.9%  11.9%  10.7%
                      11.3%  11.1%  11.4%
                      11.4%  10.7%  10.5%
  Avg Classical Weights (MSE(C) = 0.01):
                      10.8%  11.3%  11.1%
                      11.5%  11.0%  11.1%
                      11.0%  11.1%  11.1%
  Equal Weights: 11.11% each (MSE(E) = 0.82)
• Model validation: both schemes assign (near-)equal weights to experts of equal skill
Scenario #2: Impact of Bias
Experts: 1 4 7 / 2 5 8 / 3 6 9

Expert Characteristics:
  Bias:               0     0     0
                      0.1   0.1   0.1
                      0.3   0.3   0.3
  Precision:          0.3   0.3   0.3
                      0.3   0.3   0.3
                      0.3   0.3   0.3
  Confidence (C/P = 1 per row):
                      0.3   0.3   0.3
                      0.3   0.3   0.3
                      0.3   0.3   0.3

Results: Weights & Error
  Avg Likelihood Weights (MSE(L) = 0.04):
                      17.2%  18.1%  16.9%
                      14.0%  13.9%  14.0%
                       2.2%   1.8%   1.8%
  Avg Classical Weights (MSE(C) = 0.02):
                      15.8%  16.4%  16.2%
                      14.0%  13.5%  13.4%
                       3.5%   3.6%   3.6%
  Equal Weights: 11.11% each (MSE(E) = 2.26)
• When small and moderate bias is introduced for some experts, the weights shift to penalize bias (more prominently in the Likelihood method)
Scenario #3: Impact of Precision
Experts: 1 4 7 / 2 5 8 / 3 6 9

Expert Characteristics:
  Bias:               0     0     0
                      0     0     0
                      0     0     0
  Precision:          0.2   0.3   1
                      0.2   0.3   1
                      0.2   0.3   1
  Confidence (C/P = 1 per row):
                      0.2   0.3   1
                      0.2   0.3   1
                      0.2   0.3   1

Results: Weights & Error
  Avg Likelihood Weights (MSE(L) = 0.02):
                      30.7%   2.2%   0.0%
                      31.9%   2.1%   0.0%
                      31.2%   1.9%   0.0%
  Avg Classical Weights (MSE(C) = 0.02):
                      15.5%  13.3%   4.3%
                      16.5%  12.9%   4.3%
                      15.7%  13.1%   4.3%
  Equal Weights: 11.11% each (MSE(E) = 3.42)
• When Bias = 0 for all and imprecision is introduced for some experts, the weights shift to reward precision and penalize imprecision (more prominently in the Likelihood method)
Scenario #4: Impact of Confidence
Experts: 1 4 7 / 2 5 8 / 3 6 9

Expert Characteristics:
  Bias:               0     0     0
                      0     0     0
                      0     0     0
  Precision:          0.3   0.3   0.3
                      0.3   0.3   0.3
                      0.3   0.3   0.3
  Confidence (C/P = 0.5 / 1 / 2 by row):
                      0.15  0.15  0.15
                      0.3   0.3   0.3
                      0.6   0.6   0.6

Results: Weights & Error
  Avg Likelihood Weights (MSE(L) = 0.05):
                      10.3%  11.5%  10.2%
                      22.2%  21.3%  22.1%
                       0.8%   0.8%   0.8%
  Avg Classical Weights (MSE(C) = 0.02):
                       7.0%   7.9%   7.4%
                      18.2%  17.4%  17.5%
                       8.1%   8.2%   8.2%
  Equal Weights: 11.11% each (MSE(E) = 0.82)
• When Bias = 0 for all and over- and under-confidence are introduced for some experts, the weights shift to penalize inappropriate confidence (more prominently in the Likelihood method for under-confidence)
Scenario #5a: Impact of Precision & Confidence (Bias = 0 for all)
Experts: 1 4 7 / 2 5 8 / 3 6 9

Expert Characteristics:
  Bias:               0     0     0
                      0     0     0
                      0     0     0
  Precision:          0.2   0.3   1
                      0.2   0.3   1
                      0.2   0.3   1
  Confidence (C/P = 0.5 / 1 / 2 by row):
                      0.1   0.15  0.5
                      0.2   0.3   1
                      0.4   0.6   2

Results: Weights & Error
  Avg Likelihood Weights (MSE(L) = 0.03):
                      17.7%   6.1%   0.0%
                      62.5%   7.4%   0.0%
                       6.1%   0.2%   0.0%
  Avg Classical Weights (MSE(C) = 0.03):
                       6.8%   6.9%   4.2%
                      21.5%  17.8%   9.4%
                      15.9%  13.2%   4.3%
  Equal Weights: 11.11% each (MSE(E) = 3.42)
• When Bias = 0 and imprecision and over- and under-confidence are introduced for multiple experts, the weights shift to reward the “ideal” expert (more prominently in Likelihood)
• For “Classical”, proper confidence can somewhat compensate for imprecision; not so for Likelihood (imprecise experts are penalized heavily, even if they know they are imprecise)
Scenario #5b: Impact of Precision & Confidence (Bias for all)

Experts: 1 4 7 / 2 5 8 / 3 6 9

Expert Characteristics:
  Bias:               0.5   0.5   0.5
                      0.5   0.5   0.5
                      0.5   0.5   0.5
  Precision:          0.2   0.3   1
                      0.2   0.3   1
                      0.2   0.3   1
  Confidence (C/P = 0.5 / 1 / 2 by row):
                      0.1   0.15  0.5
                      0.2   0.3   1
                      0.4   0.6   2

Results: Weights & Error
  Avg Likelihood Weights (MSE(L) = 0.30):
                       0.0%   0.2%   1.2%
                       0.3%   8.8%   1.0%
                      42.3%  46.3%   0.0%
  Avg Classical Weights (MSE(C) = 0.72):
                       0.0%   0.1%  12.7%
                       0.0%   1.9%  46.5%
                       2.0%   8.3%  28.6%
  Equal Weights: 11.11% each (MSE(E) = 23.80)
• When bias is introduced for all, with varying precision and improper relative confidence:
  – Likelihood weights shift to reward relatively precise but underconfident experts
  – Classical weights shift to reward imprecise experts
Scenario #5c: Precision & Confidence (Bias for 3 Experts)
Experts: 1 4 7 / 2 5 8 / 3 6 9

Expert Characteristics:
  Bias:               0.3   0     0
                      0     0.3   0
                      0.3   0     0
  Precision:          0.2   0.3   1
                      0.2   0.3   1
                      0.2   0.3   1
  Confidence (C/P = 0.5 / 1 / 2 by row):
                      0.1   0.15  0.5
                      0.2   0.3   1
                      0.4   0.6   2

Results: Weights & Error
  Avg Likelihood Weights (MSE(L) = 0.04):
                       0.0%   8.2%   0.0%
                      86.7%   2.0%   0.0%
                       2.4%   0.6%   0.0%
  Avg Classical Weights (MSE(C) = 0.06):
                       0.0%   9.9%   6.2%
                      32.8%   6.5%  14.6%
                       2.8%  20.4%   6.7%
  Equal Weights: 11.11% each (MSE(E) = 4.26)
• When there is moderate bias in a subset of “good” experts, and both imprecision and over- and under-confidence are introduced for all:
  – Likelihood rewards the “best” expert heavily
  – Classical spreads the weights much more broadly
Conclusions (1)
• Overall: Likelihood and “Classical” show similar performance (much better than equal weights), but assign very different weights to experts with different degrees of bias, precision, and relative confidence
• Model check: both assign equal weights to experts with equal skill (equal bias, precision, and relative confidence)
• Bias: both penalize biased experts, with a stronger penalty in Likelihood
• Precision: both penalize imprecise experts, again with a stronger penalty in Likelihood
• Confidence: “Classical” penalizes overconfidence and underconfidence equally; Likelihood penalizes overconfidence a similar amount, but underconfidence much more
Conclusions (2)
• Precision & Confidence: for “Classical”, proper (or under-) confidence can compensate somewhat for imprecision; not so for the Likelihood weights (and over-confidence remains better for Likelihood weighting)
• Future direction: consider 3-parameter distributions fit to each expert's 5th, 50th, and 95th percentile values to enable a more flexible Likelihood approach
  – Conduct an elicitation in which 2- and 3-parameter likelihood functions are used and compared
  – Consider how new information affects experts' performance on seed questions (explore VOI for correcting experts' biases, imprecision, and under- or overconfidence)
Thank you
Questions?