Evaluation of Support Vector Machines for Risk Modeling in Interventional Cardiology


Page 1: Evaluation of Support Vector Machines for Risk Modeling in Interventional Cardiology

© 2003 By Default!

A Free sample background from www.powerpointbackgrounds.com

Slide 1

Evaluation of Support Vector Machines for Risk Modeling in Interventional Cardiology

Michael E. Matheny, M.D.

Slide 2

Goal

Comparison of support vector machines and logistic regression risk modeling performance over time for the outcome of death in pre-intervention cardiac catheterization patients.

Slide 3

Pre-intervention Risk Assessment

Percutaneous coronary intervention (PCI) is a high-volume procedure with significant morbidity and mortality

Risk of death in PCI varies widely based on co-morbidities

Providing accurate case-level risk estimates can greatly aid patient and physician decision-making

Slide 4

Domain Data Quality

The American College of Cardiology (ACC) has published a standardized data dictionary (ACC-NCDR, the National Cardiovascular Data Registry) and mandates that accredited centers maintain detailed data on all PCI patients

Some states, including Massachusetts, now have mandatory reporting of case data based on the ACC-NCDR

Slide 5

Current Risk Model Standard: Logistic Regression (LR)

Gold standard for risk modeling in interventional cardiology

Type of generalized linear model
– Used in analysis of a binary outcome
– Predicted probabilities bounded by 0 and 1

Feature (variable) selection
– From all available data
– Known risk factors from prior studies
– Selected subset of data based on study design
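Slide 5 describes LR as a model for a binary outcome whose output is bounded by 0 and 1. A minimal sketch of how a fitted model turns a patient's features into a risk estimate follows; the coefficients are hypothetical and are not the fitted values from this study.

```python
import math

def logistic_risk(intercept, coefs, x):
    """Predicted probability of the outcome from a fitted logistic regression.

    The linear predictor (log-odds) can be any real number; the inverse-logit
    link maps it into the interval (0, 1).
    """
    eta = intercept + sum(b * xi for b, xi in zip(coefs, x))
    if eta >= 0:  # numerically stable inverse logit
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Hypothetical coefficients for two binary risk factors (illustrative only)
b0, b = -4.0, [1.5, 2.0]
print(logistic_risk(b0, b, [0, 0]))  # neither risk factor present
print(logistic_risk(b0, b, [1, 1]))  # both risk factors present
```

Each coefficient is a log odds ratio, so adding a risk factor shifts the log-odds linearly while the predicted risk stays between 0 and 1.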

Slide 6

Alternative Risk Model: Support Vector Machine (SVM)

Key Features
– Kernel functions: introduce non-linearity in the hypothesis space without explicitly requiring a non-linear algorithm
  • Linear
  • Polynomial
  • Radial basis function (RBF)
– Global minimum: SVM training is a convex optimization problem, so the optimum found is global rather than local
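To illustrate the three kernel types on slide 6, here is a minimal sketch using common textbook forms. The exact parameterization used by GIST 2.1.1 (for example, how its "radial width factor" maps to `sigma`, or the constant `c` in the polynomial kernel) is an assumption here, not taken from the slides. The last lines show the kernel trick: the degree-2 polynomial kernel equals a dot product in an explicit 6-dimensional feature space that the SVM never has to construct.

```python
import math

def linear_kernel(x, y):
    """Plain dot product: the SVM stays a linear classifier."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, c=1.0):
    """(x . y + c) ** degree; `c` and this exact form are assumptions."""
    return (linear_kernel(x, y) + c) ** degree

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian radial basis function exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def quadratic_feature_map(x):
    """Explicit 2-D feature map whose dot product equals polynomial_kernel(degree=2, c=1)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]

# The kernel gives the same value as a dot product in the 6-D feature space,
# without ever constructing that space
x, y = [0.5, -1.0], [2.0, 0.3]
print(polynomial_kernel(x, y, degree=2))
print(linear_kernel(quadratic_feature_map(x), quadratic_feature_map(y)))
```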

Slide 7

Risk Model Evaluation: Discrimination

Provides an estimate of population-level accuracy

Area under the Receiver Operating Characteristic (ROC) curve

Graphed as sensitivity vs. 1 - specificity across different decision thresholds
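The ROC area on slide 7 can be computed in two equivalent ways, sketched below on toy data (the scores and outcomes are invented for illustration): tracing sensitivity vs. 1 - specificity over every threshold and integrating with the trapezoidal rule, or counting how often a patient who died received a higher predicted risk than one who survived (the Mann-Whitney reading, with ties counted as one half).

```python
def roc_points(scores, labels):
    """(1 - specificity, sensitivity) at every distinct score used as a threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc_mann_whitney(scores, labels):
    """Same area, read as P(score of a random death > score of a random survivor)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predicted risks and outcomes (1 = death)
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
print(auc_trapezoid(roc_points(scores, labels)), auc_mann_whitney(scores, labels))
```

Because the area depends only on the ranking of the scores, rescaling all predicted risks leaves it unchanged, which is one reason discrimination alone says little about calibration.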

Slide 8

Risk Model Evaluation: Calibration

Provides an estimate of case-level accuracy

Hosmer-Lemeshow (HL) Goodness-of-Fit Test
– Primarily used in logistic regression
– Measures how well observed and expected event frequencies match across groups of predicted risk
– Handles data sparsity better than more common methods (deviance, Pearson's chi-square)
– P > 0.05 indicates no significant lack of fit (good calibration)
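A minimal sketch of the HL test from slide 8, assuming the common "deciles of risk" grouping (10 groups, compared with chi-square on 8 degrees of freedom); the number of groups used in this study is not stated on the slides. To stay within the Python standard library, the P value uses the closed-form chi-square tail that exists for even degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X >= x).

    Uses the closed form that exists only for even degrees of freedom:
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic and P value.

    Cases are sorted by predicted risk and cut into `groups` roughly equal
    groups; the statistic is compared with chi-square on groups - 2 df, so
    `groups` must be even here. Needs at least `groups` cases, and each group
    must have a mean predicted risk strictly between 0 and 1.
    """
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        observed = sum(y for _, y in chunk)   # observed deaths in the group
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)

# Toy data: 100 distinct predicted risks; every case survives, so the model
# over-predicts deaths and should show poor calibration
probs = [0.05 + 0.009 * i for i in range(100)]
stat, p = hosmer_lemeshow(probs, [0] * 100)
print(f"HL statistic = {stat:.1f}, P = {p:.2g}")
```

Under the slide's convention, this toy model would fail the test (P well below 0.05), while outcomes that roughly match the expected count in each group give a P value near 1.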

Slide 9

Source Data

Brigham & Women's Hospital Interventional Cardiology Database
January 1, 2002 – October 30, 2004
5383 cases

– Data split two ways, each into 2/3 training (3588) and 1/3 test (1795)
  • Sequential split
    – Sorted chronologically
    – Split at October 27, 2003
  • Random split
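The two splitting schemes on slide 9 can be sketched as follows. The case records are hypothetical (5383 cases with random dates inside the study window), and rounding the training set down to two thirds is an assumption that happens to reproduce the slide's 3588 / 1795 counts. The sequential split trains only on earlier cases, so its test set probes performance over time; the random split mixes both periods.

```python
import random
from datetime import date, timedelta

def n_train(n_cases):
    """Two-thirds of the cases for training, rounded down (5383 -> 3588)."""
    return (2 * n_cases) // 3

def sequential_split(cases):
    """Chronological split: earliest 2/3 of cases train, latest 1/3 test."""
    ordered = sorted(cases, key=lambda c: c["date"])
    k = n_train(len(ordered))
    return ordered[:k], ordered[k:]

def random_split(cases, seed=0):
    """Random split into the same 2/3 : 1/3 proportions."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    k = n_train(len(shuffled))
    return shuffled[:k], shuffled[k:]

# Hypothetical records: 5383 cases with random dates from
# January 1, 2002 through October 30, 2004
rng = random.Random(1)
start = date(2002, 1, 1)
cases = [{"id": i, "date": start + timedelta(days=rng.randrange(1034))} for i in range(5383)]

seq_train, seq_test = sequential_split(cases)
rnd_train, rnd_test = random_split(cases)
print(len(seq_train), len(seq_test), max(c["date"] for c in seq_train))
```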

Slide 10

Sample Demographics: Overview

Characteristic                 N        %
Age
  0-49                       590    10.96
  50-59                     1167    21.68
  60-69                     1497    27.81
  70-79                     1398    25.98
  80+                        652    12.22
Diabetic                    1721    31.98
Hypertensive                4083    75.86
Hyperlipidemia              3737    69.44
Prior PCI                   1822    33.85
Salvage procedure             24     0.45
Cardiogenic shock             98     1.82
Hemodynamic instability      265     4.92
Death                         78     1.45

Slide 11

Model Features

Age (D)                     Hyperlipidemia      Hx COPD
Gender                      HTN                 Hx CVD
BMI (D)                     Diabetes            Hx PVD
Cardiogenic shock           Creatinine (D)      Thrombolytic
Cardiac arrest              Hx CHF              IABP
Hemodynamic instability     CHF                 EF (D)
Smoker                      Prior MI            AMI
Prior CABG                  Prior PCI           Procedure urgency (D)
Unstable angina             Chronic angina      AMI within 24 hours

Slide 12

Logistic Regression: Model Development

STATA 8.2 (College Station, TX)
Backwards stepwise technique
Exclusion threshold (P = 0.05 to 0.15)
Feature selection

Slide 13

Logistic Regression: Feature Selection

Model development
– Sequential training set
– Backwards stepwise selection (P = 0.10) used for feature selection
– Further stepwise feature removal to optimize ROC area and Hosmer-Lemeshow (HL) goodness-of-fit
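The backwards stepwise step on slides 12 and 13 can be sketched generically: refit, drop the least significant feature if its P value exceeds the exclusion threshold, and repeat. This is not Stata's implementation; `fit_pvalues` is a hypothetical caller-supplied function standing in for a real logistic regression fit.

```python
def backward_stepwise(features, fit_pvalues, p_remove=0.10):
    """Backward elimination on P values.

    fit_pvalues(selected) is a hypothetical, caller-supplied function that
    refits the logistic regression on `selected` and returns {feature: P value}.
    The least significant feature is dropped while its P value exceeds p_remove.
    """
    selected = list(features)
    while selected:
        pvalues = fit_pvalues(selected)
        worst = max(selected, key=lambda f: pvalues[f])
        if pvalues[worst] <= p_remove:
            break
        selected.remove(worst)
    return selected

# Toy stand-in for a real model fit: fixed P values that ignore which
# features are in the model (a real refit would change them each round)
toy_p = {"age": 0.001, "bmi": 0.40, "ef": 0.02, "smoker": 0.20}
print(backward_stepwise(list(toy_p), lambda fs: {f: toy_p[f] for f in fs}))
```

Raising `p_remove` (for example to 0.15) keeps more features, which is the knob varied across the exclusion thresholds of 0.05 to 0.15.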

Slide 14

Logistic Regression: Feature Selection

Features removed            ROC       HL P
None (all features)         0.952     0.0358
BMI                         0.952     0.0706
EF                          0.945     0.0004
Arrest                      0.951     0.0602
Hyperlipid                  0.9408    0.0001
BMI, EF                     0.9482    0.0743
BMI, Urgency                0.949     0.1066
BMI, Urgency, CHF Hx        0.956     0.956
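One way to read the feature-removal results on slide 14 is to keep only candidates that pass the HL test (P > 0.05, the cutoff from slide 8) and then prefer the highest ROC area. The sketch below applies that rule to the values transcribed from the table; the rule itself is an assumption for illustration, since slide 22 notes that the actual selection relied on manual feature selection and expert knowledge.

```python
# (features removed, ROC area, HL P) transcribed from the slide 14 table
candidates = [
    ("None (all features)", 0.952, 0.0358),
    ("BMI", 0.952, 0.0706),
    ("EF", 0.945, 0.0004),
    ("Arrest", 0.951, 0.0602),
    ("Hyperlipid", 0.9408, 0.0001),
    ("BMI, EF", 0.9482, 0.0743),
    ("BMI, Urgency", 0.949, 0.1066),
    ("BMI, Urgency, CHF Hx", 0.956, 0.956),
]

def pick_model(rows, alpha=0.05):
    """Hypothetical rule: among models with HL P > alpha, take the highest ROC area."""
    calibrated = [r for r in rows if r[2] > alpha]
    return max(calibrated, key=lambda r: r[1]) if calibrated else None

print(pick_model(candidates))
```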

Slide 15

Logistic Regression: Evaluation

                       Training           Test
Split   Exclusion P    ROC      HL P      ROC      HL P
SEQ     0.15           0.946    0.672     0.894    <0.001
SEQ     0.10           0.949    0.488     0.904    <0.001
SEQ     0.05           0.936    0.704     0.889    0.004
RND     0.15           0.926    0.269     0.920    0.140
RND     0.10           0.926    0.269     0.920    0.140
RND     0.05           0.900    0.095     0.899    <0.001

SEQ = sequential split; RND = random split; Exclusion P = backwards stepwise exclusion threshold

Slide 16

Support Vector Machine: Model Development

GIST 2.1.1 (Columbia University, NY, NY)
STATA 8.2 (College Station, TX)
All variables used
Kernel choice
– Polynomial (degree 1-6)
– Radial, width factor related to sigma (0.1-20)

Probabilistic output methodology
– Discriminant: distance from the separating hyperplane
– LR model using the discriminant as the only feature
– Established method to convert SVM classification output into probability estimates (similar to Platt scaling)
– Allows use of the HL goodness-of-fit test
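The probabilistic-output step on slide 16 fits a one-feature logistic regression with the SVM discriminant as its only input. A minimal sketch using plain maximum likelihood by Newton-Raphson follows; the discriminant values and outcomes are toy data, and Platt's original method adds target smoothing that this sketch omits.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_discriminant_lr(d, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of P(y = 1 | d) = sigmoid(a + b * d) by Newton-Raphson.

    d: SVM discriminant values (signed distance from the hyperplane)
    y: observed outcomes (1 = death, 0 = survival)
    Assumes the outcomes are not perfectly separated by d; otherwise the
    maximum-likelihood slope is infinite and the iteration will not converge.
    """
    a = b = 0.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for di, yi in zip(d, y):
            p = sigmoid(a + b * di)
            w = p * (1.0 - p)
            g_a += yi - p            # gradient of the log-likelihood
            g_b += (yi - p) * di
            h_aa += w                # information matrix (negative Hessian)
            h_ab += w * di
            h_bb += w * di * di
        det = h_aa * h_bb - h_ab * h_ab
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

# Toy discriminant values and outcomes (overlapping, so the fit is finite)
d = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
a, b = fit_discriminant_lr(d, y)
print([round(sigmoid(a + b * di), 3) for di in d])
```

Because the output is an ordinary logistic model, its predicted risks can be fed directly into the HL test, which is the point of the final bullet on slide 16.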

Slide 17

Support Vector Machine: Polynomial Evaluation
Sequential split (SEQ)

             Training           Test
Kernel       ROC      HL P      ROC      HL P
Lin          0.970    0.503     0.896    0.003
P2           0.991    0.966     0.907    0.002
P3           0.994    0.999     0.909    0.067
P4           0.992    0.997     0.907    0.163
P5           0.987    0.818     0.899    0.713
P6           0.976    0.049     0.885    0.738

Lin = linear kernel; P2-P6 = polynomial kernel of degree 2-6

Slide 18

Support Vector Machine: Polynomial Evaluation
Random split (RND)

             Training           Test
Kernel       ROC      HL P      ROC      HL P
Lin          0.963    0.616     0.862    0.817
P2           0.992    0.920     0.900    0.754
P3           0.995    0.999     0.901    0.617
P4           0.996    1.000     0.903    0.521
P5           0.996    0.903     0.878    0.749
P6           0.997    0.013     0.871    0.856

Lin = linear kernel; P2-P6 = polynomial kernel of degree 2-6

Slide 19

Support Vector Machine: Radial Evaluation
Sequential split (SEQ)

             Training           Test
Kernel       ROC      HL P      ROC      HL P
R 0.25       1        1         0.889    0.111
R 0.50       1        1         0.909    0.601
R 0.75       1        1         0.910    0.200
R 1.00       0.997    1         0.910    0.246
R 1.50       0.970    0.502     0.904    0.001
R 2.00       0.974    0.817     0.904    0.001

R = radial kernel with the given width factor

Slide 20

Support Vector Machine: Radial Evaluation
Random split (RND)

             Training           Test
Kernel       ROC      HL P      ROC      HL P
R 0.25       0.999    1         0.891    0.046
R 0.50       1        1         0.908    0.593
R 0.75       1        1         0.910    0.199
R 1.00       0.997    1         0.911    0.542
R 1.50       0.992    0.961     0.907    0.810
R 2.00       0.895    0.961     0.898    0.232

R = radial kernel with the given width factor

Slide 21

Discussion: Discrimination (All Models)

All models showed excellent discrimination

None of the models differed significantly in discrimination

ROC area was relatively insensitive to changes in the data, even across widely varying levels of calibration

Slide 22

Discussion: LR Calibration

For these data, the LR models did not maintain calibration on the sequential test set, most likely because of temporal data drift

The LR models required manual feature selection and expert knowledge to achieve calibration on the training data sets

Slide 23

Discussion: SVM Calibration

Some settings of both kernel types maintained calibration on both data sets

Calibration was maintained across wider parameter ranges of both kernels for the random data set than for the sequential data set

Current assessments of discrimination and calibration on the training set are insufficient to choose the optimal kernel parameter
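The second point on slide 23 can be checked directly against the SVM evaluation tables. The sketch below counts, for each kernel and split, how many parameter settings keep the test-set HL P above 0.05 (the "good fit" cutoff from slide 8). The values are transcribed from slides 17-20; the linear kernel is grouped with the polynomial kernels, consistent with the degree range 1-6 on slide 16.

```python
# Test-set HL P values transcribed from the SVM evaluation tables (slides 17-20)
test_hl = {
    ("polynomial", "SEQ"): {"Lin": 0.003, "P2": 0.002, "P3": 0.067,
                            "P4": 0.163, "P5": 0.713, "P6": 0.738},
    ("polynomial", "RND"): {"Lin": 0.817, "P2": 0.754, "P3": 0.617,
                            "P4": 0.521, "P5": 0.749, "P6": 0.856},
    ("radial", "SEQ"): {"R 0.25": 0.111, "R 0.50": 0.601, "R 0.75": 0.200,
                        "R 1.00": 0.246, "R 1.50": 0.001, "R 2.00": 0.001},
    ("radial", "RND"): {"R 0.25": 0.046, "R 0.50": 0.593, "R 0.75": 0.199,
                        "R 1.00": 0.542, "R 1.50": 0.810, "R 2.00": 0.232},
}

def calibrated_settings(pvals, alpha=0.05):
    """Kernel settings whose test-set HL P exceeds alpha."""
    return [k for k, p in pvals.items() if p > alpha]

for (kernel, split), pvals in test_hl.items():
    ok = calibrated_settings(pvals)
    print(f"{kernel:<10} {split}: {len(ok)}/{len(pvals)} settings calibrated {ok}")
```

The counts (4 of 6 vs. 6 of 6 for polynomial, 4 of 6 vs. 5 of 6 for radial) agree with the observation that calibration held over wider parameter ranges on the random split.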

Slide 24

Conclusions

SVMs could be superior to LR in maintaining calibration over time in this domain

Further exploration is needed to develop additional markers of model robustness

Further work is needed to evaluate optimal time intervals for creating new models or recalibrating old ones

Slide 25

The end