Quality Measurement: Is the Information Sound Enough to be Used by Decision Makers?
Cheryl L. Damberg, Ph.D., Director of Research, Pacific Business Group on Health
AcademyHealth: June 8, 2004

Reframed Question… How good is good enough? For use by whom, for what purposes?
Purchasers--changes in plan design to reward higher-quality, more efficient providers; steer enrollment (premiums, out-of-pocket costs)
Plans--incentive payments, tiering, narrow networks, channeling, or centers of excellence
Consumers--to guide treatment choices
Providers--quality improvement

How Good is Good Enough?
We don't know what the right standard is
Should standards apply in the same way to all end users?
What are the dangers of "noisy" information? Deming's Toyota studies (Six Sigma) showed that when noisy performance information was fed back:
  Increased variation, decreased quality
  Disorienting; people lost their natural instinct for how to improve
How do we make optimal decisions in the face of uncertainty? Decision-theory analysis could help to inform these questions; research is needed in this area

Reality Check! What information?
Measures exist--few are implemented routinely or universally
Most providers have no clue what their performance is: "I'm following guidelines; it is someone else who isn't"
Is the current information better than no information?
  Absent information, choice is like a flip of the coin (50:50)
  Decisions will still be made with no information or poor information
  Default position is to base decisions solely on price
Consequences differ
  Patient--inconvenience for little gain in outcome
  Provider--ruined reputation, livelihood

What's Currently Going On Out There in Measurement? Two ends of the extreme…examples
Commercial vendors
  Using administrative data, often with poor case-mix adjustment
  Omitted variables that can lead to biased results
  Poor handling of missing data
  Rank-ordering problems that lead the end user to an incorrect decision
Research-level work
  Producing shrinkage estimates to address the noise problem without thinking about issues of underlying data quality (a sketch of the shrinkage idea follows)
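
For readers unfamiliar with shrinkage, here is a minimal sketch of the idea the slide refers to: pulling each provider's raw rate toward the overall mean in proportion to how little data supports it. The function and its default prior strength are illustrative assumptions, not the estimator any particular vendor or researcher used.

```python
import numpy as np

def shrink_rates(events, n, prior_strength=None):
    """Shrink each provider's raw rate toward the overall mean rate.

    events, n: per-provider event counts and denominators.
    prior_strength: pseudo-observations given to the overall mean;
    the mean denominator is used as a rough default (an assumption).
    """
    events = np.asarray(events, dtype=float)
    n = np.asarray(n, dtype=float)
    overall = events.sum() / n.sum()  # grand mean rate
    k = prior_strength if prior_strength is not None else n.mean()
    # Small-n providers are pulled strongly toward the overall rate;
    # large-n providers keep estimates close to their raw rates.
    return (events + k * overall) / (n + k)

# A 10-patient group and a 500-patient group with similar raw rates
print(shrink_rates([1, 45], [10, 500]))
```

The slide's caution stands either way: shrinkage tames noise, but it cannot repair flawed underlying data.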

Where in the Measurement Process Can Things Go Wrong?
[Diagram: failure points at each stage of the measurement process]
Measures--importance; validity; reliability; link to outcomes
Implementation--poor data; small "n"
Display/Reporting--will the end user draw the correct conclusion based on how results are reported?

Data: The Next Generation…

Underlying Problem of Data Quality
One of the greatest threats to the validity of performance results is the data that "feed" the measures
Even if a quality measure is good (i.e., reliable, valid), it can still produce a bad ("biased") result if the data used to score performance are flawed, or if the source of data omits key variables important in predicting the outcome

Example 1: Risk-Adjusted Hospital Outcome for Bypass Surgery--CA CABG Mortality Reporting Program (CCMRP)
70 hospitals submitted data in 1999
Concern about comparability across hospitals in coding
  Potential impact on hospital scores
  Importance of "getting it right" given public reporting
38 hospitals selected for audit
  Focused on outliers or near-outliers, with random selection in the middle; oversampled high-risk cases
2,408 cases audited
Inter-rater reliability 97.6% (range: 95-99%; Cohen's kappa), computed as sketched below
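
As a hedged illustration of that reliability check, the snippet below computes Cohen's kappa for two raters' codes. The toy acuity labels are invented for the example; the deck does not publish the audit's case-level data.

```python
import numpy as np

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: inter-rater agreement corrected for chance."""
    cats = sorted(set(rater_a) | set(rater_b))
    idx = {c: i for i, c in enumerate(cats)}
    m = np.zeros((len(cats), len(cats)))
    for a, b in zip(rater_a, rater_b):
        m[idx[a], idx[b]] += 1          # build the confusion matrix
    n = m.sum()
    p_obs = np.trace(m) / n                          # observed agreement
    p_exp = (m.sum(axis=1) @ m.sum(axis=0)) / n**2   # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

# Invented toy acuity codes: hospital submission vs. audit re-abstraction
submitted = ["Elective", "Urgent", "Urgent", "Emergent", "Elective"]
audited   = ["Elective", "Urgent", "Elective", "Emergent", "Elective"]
print(cohens_kappa(submitted, audited))  # ~0.69 on this toy data
```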

Table 1: Comparison of Audited Data and CCMRP Submissions for Acuity, All Hospitals, 1999 Data
(rows: acuity as submitted to CCMRP; columns: acuity per audit)

CCMRP Data   Elective   Urgent   Emergent   Salvage    Total
Elective          447      431          7         1      886
Urgent            140      911         53         4    1,108
Emergent           16      117        199         3      335
Salvage             1       18         29         4       52
Total             604    1,477        288        12    2,381
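
A quick check the reader can reproduce: the diagonal of Table 1 counts the cases where submitted and audited acuity agree, which recovers the 65.6% acuity agreement figure reported on the next slide.

```python
import numpy as np

# Table 1 cross-tabulation: rows = CCMRP-submitted acuity, columns = audited acuity
ccmrp_vs_audit = np.array([
    [447, 431,   7,  1],   # Elective
    [140, 911,  53,  4],   # Urgent
    [ 16, 117, 199,  3],   # Emergent
    [  1,  18,  29,  4],   # Salvage
])
agreement = np.trace(ccmrp_vs_audit) / ccmrp_vs_audit.sum()
print(f"Acuity agreement: {agreement:.1%}")  # ~65.6%, as the audit reported
```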

Results of Audit
Revealed downcoding and upcoding problems
Worst agreement: acuity (65.6%), angina type (65.4%), angina class (45.8%), MI (68.3%), and ejection fraction (78.0%)
Missing data: incorrect classification of risk based on the policy of replacing missing values with the lowest-risk category
  Ejection fraction (15.8%), MI (38.1%)

Table 2: Agreement Statistics, All Hospitals, 1999 Data

Variable                                Records   Missing   % Missing Values   % Agreement   % Lower-Triangle
                                        Audited   Values    That Would Be                    Severity-Weighted
                                                            Incorrectly                      Disagreement
                                                            Classified
Acuity                                    2,408        2       100.00             65.56          64.36
Angina Type (Stable/Unstable)             2,408        0           NA             65.37          34.73
Angina (Yes/No)                           2,408        0           NA             86.21          42.47
CCS Angina Class                          2,408      105        79.05             45.76          53.19
Congestive Heart Failure                  2,408       31        38.71             82.23          32.94
COPD                                      2,408        6         0.00             86.34          73.25
Creatinine (mg/dl)                        2,408      556         3.96             93.31          56.37
Cerebrovascular Disease                   2,408        3         0.00             87.67          45.79
Dialysis                                  2,408       91         0.00             98.13          86.67
Diabetes                                  2,408        3         0.00             94.73          45.67
Ejection Fraction (%)                     2,408      228        15.79             78.95          60.27
Method of measuring ejection fraction     2,408      406         0.00             74.34          Not calculated
Hypertension                              2,408        7        85.71             84.39          40.43
Time from PTCA to surgery                   125       45        42.22             78.40          12.50
Left Main Stenosis                        2,408      388         7.22             85.96          51.46

Results of Audit (continued)
Classification of some hospitals as outliers may be a result of coding deficiencies
When the model was re-run, saw changes in statistical significance and/or risk differential
Death (outcome variable)--small levels of disagreement can change a hospital's rating
Change in rankings:
  1 hospital moved from "no different" to "better than"
  6 hospitals moved from "worse than" to "no different"
  1 hospital moved from "no different" to "worse than"

Impact on Fitted Model Characteristics when Replacing Audited Records with Information from Audit, 1999 Data

                                   Model: CCMRP Data           Model: CCMRP Data and Audited Data
                                                               Where Record Was Audited
Variable                        Estimate  p-value     OR*      Estimate  p-value     OR
Intercept                         -7.74     0.00                 -9.11     0.00
Creatinine (mg/dl)                 0.18     0.00     1.20         0.01     0.15     1.01
Congestive Heart Failure           0.38     0.00     1.46         0.55     0.00     1.73
Hypertension                       0.14     0.18     1.15         0.23     0.04     1.25
Dialysis                           0.39     0.18     1.47         1.24     0.00     3.45
Diabetes                           0.19     0.04     1.21         0.25     0.01     1.29
Acuity:
  Elective                      Reference group               Reference group
  Urgent                           0.26     0.02     1.29         0.33     0.00     1.39
  Emergent                         1.24     0.00     3.46         1.33     0.00     3.77
  Salvage                          2.46     0.00    11.71         3.11     0.00    22.46
Fit statistics:
  R²                               0.188                         0.202
  c-statistic                      0.818                         0.833
  Hosmer-Lemeshow χ² (p-value)     9.303 (0.317)                23.068 (0.003)
*OR = odds ratio
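
For orientation, here is a minimal sketch of the kind of logistic risk-adjustment model summarized above, written against a hypothetical case-level DataFrame. The column names are invented, and this is a sketch of the general technique, not the CCMRP's actual specification.

```python
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.metrics import roc_auc_score

def fit_risk_model(cases: pd.DataFrame):
    """Fit a logistic risk-adjustment model for operative mortality.

    `cases` is assumed to have a 0/1 `death` column plus risk-factor
    columns; all names here (creatinine, chf, acuity, ...) are invented.
    """
    model = smf.logit(
        "death ~ creatinine + chf + hypertension + dialysis + diabetes"
        " + C(acuity, Treatment(reference='Elective'))",
        data=cases,
    ).fit()
    # c-statistic: probability the model ranks a death above a survivor
    c_stat = roc_auc_score(cases["death"], model.predict(cases))
    return model, c_stat
```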

Steps Taken to Safeguard Against Getting It Wrong
Audit; data cross-validation
Training on coding of variables; support to hospital coders
Display of confidence intervals
  Small hospital with zero deaths (CI: 0.0%-10.0%); see the sketch below
Combine data over multiple years
  Generates more stable estimates for small-volume hospitals
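
A hedged illustration of that zero-death interval: an exact (Clopper-Pearson) binomial confidence interval. The 36-case volume below is an assumption chosen so the result roughly reproduces the slide's 0.0%-10.0% example; the deck does not state the hospital's actual volume.

```python
from scipy.stats import beta

def exact_binomial_ci(deaths, n, level=0.95):
    """Clopper-Pearson (exact) confidence interval for a rate of deaths/n."""
    a = 1 - level
    lo = 0.0 if deaths == 0 else beta.ppf(a / 2, deaths, n - deaths + 1)
    hi = 1.0 if deaths == n else beta.ppf(1 - a / 2, deaths + 1, n - deaths)
    return lo, hi

# Hypothetical small hospital: 0 deaths in 36 CABG cases -> roughly (0.0, 0.10)
print(exact_binomial_ci(0, 36))
```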

Example 2: Pay for Performance
Plan payouts to medical groups reward those groups that rank at the 75th percentile or higher
Rank-ordering problems (see the simulation sketch below)
  Medical groups with estimates based on small "n" (i.e., noisy) are more likely to fall in the top or bottom part of the distribution
  Straight ranking ignores uncertainty in the estimates
Potential for rewarding the wrong players
  Rewarding noise, not signal
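
To make the rank-ordering point concrete, here is a small simulation sketch: every group has identical true quality, yet the small-panel groups dominate the extremes of the observed ranking. The panel sizes and group counts are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 0.80                      # identical true quality for every group
sizes = [30] * 50 + [500] * 50        # 50 small and 50 large panels (invented)

# Observed rates differ only through sampling noise
observed = np.array([rng.binomial(n, true_rate) / n for n in sizes])
top10 = np.argsort(observed)[-10:]    # the 10 best-looking groups

small_in_top10 = sum(sizes[i] == 30 for i in top10)
print(f"{small_in_top10}/10 of the apparent top performers are small panels")
```

Under straight ranking at the 75th percentile, these noise-driven leaders would capture the payout.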

Example 3: Individual Physician Performance Measurement
Small "n" problem
  Physician lacks enough events (e.g., diabetics) to score him/her at the level of the individual indicator
  Estimates at the indicator level are noisy (large SEs)
Need to pool more information on the physician's performance across conditions to improve the signal-to-noise ratio
  Create summary scores (e.g., RAND QA Tools); a pooling sketch follows
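
As a sketch of the pooling idea only (not the actual RAND QA Tools scoring method), the function below aggregates pass/fail events across all of a physician's indicators into one summary percentage; the indicator names are hypothetical.

```python
def summary_score(indicator_results):
    """Pool pass/fail events across indicators into one summary score.

    indicator_results maps indicator name -> (numerator, denominator).
    Pooling across conditions yields a far larger effective "n" per
    physician than any single indicator can offer.
    """
    passed = sum(num for num, _ in indicator_results.values())
    eligible = sum(den for _, den in indicator_results.values())
    return passed / eligible if eligible else None

# Hypothetical physician with too few events on any one indicator
print(summary_score({
    "hba1c_tested": (7, 9),
    "bp_controlled": (4, 6),
    "beta_blocker_post_mi": (3, 4),
}))
```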

Can We Proceed?
OK to start with Version 1.0 of the measures
  Means of soliciting feedback
  Helps drive improvement in measurement
  Won't get it perfect on the first attempt
Important to safeguard against possible mistakes in classifying
  Check validity of data (audit, cross-validate)
  Assess extent of disagreement
  Perform sensitivity analyses

Hedging Against Uncertainty
Report conservatively so as not to mislead (convey the level of certainty in the estimate)
  Rank ordering--small groups may rank in either the highest or lowest part of the distribution, yet we are most uncertain of their true performance
Cruder binning (categorization) when faced with more uncertainty or when the consequences are higher (a binning sketch follows)
Use measures as a tool to identify bottom performers, then send out teams to find out what is going on, as a way to validate
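
One hedged way to implement that cruder binning, assuming a confidence interval is available for each group (the function and the benchmark value are illustrative, not a method the deck prescribes):

```python
def bin_performance(ci_low, ci_high, benchmark):
    """Three-bin categorization that respects estimate uncertainty.

    A group is labeled above or below the benchmark only when its entire
    confidence interval clears it; anything ambiguous lands in the middle.
    """
    if ci_low > benchmark:
        return "above benchmark"
    if ci_high < benchmark:
        return "below benchmark"
    return "no different from benchmark"

# The zero-death small hospital from earlier, against an invented 3% benchmark
print(bin_performance(0.00, 0.10, 0.03))  # -> "no different from benchmark"
```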

Measurement Issues Remain
Existing measures
  OK, but difficult to implement (many rely on chart review)
Hospital performance
  Complexity of what to measure (service line vs. overall)
Physician performance
  Small "n" problem; challenges of pooling data
Comprehensive assessment is important, but too much information will overwhelm end users
  Need for summary measures
Need to improve data systems

Why Do We Need to Fill the Gaps?
Lack of information and transparency
  Hard to improve if you don't know where the problem is
  Continues rewarding the status quo
Need to increase competition to improve quality and contain costs
  Information is vital for competitive markets to operate