Results for Keith


Chapter 4: Multivariate Models

Table of Contents

4.3 REALTROMINS-Real Time Risk of Mortality Example

4.3.1 Streaming ECG Data

4.3.2 Physiologic based measures of organ function

4.3.3 Chart data

4.3.4 Developing a Common Time Hierarchy

4.3.5 Multivariate Models

4.3.6 Online Interactive Reporting: COGNOS Example

4.1 Introduction

The aim of data mining is to construct a mathematical algorithm that captures viable representations of an existing phenomenon hidden within a database. These representations must be robust enough, given their parameter estimates, to reproduce the predicted classification on an independent hold-out sample. SAS Enterprise Miner 5.1 (SAS-EM 5.1) offers different classes of algorithms, ranging from standard regression to rule induction, neural networks, and two-stage models. In addition, a model comparison module provides a cross-comparison among multiple models based strictly upon ROC characteristics.
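As a rough illustration of the kind of ROC characteristic such a comparison relies on, the area under the ROC curve (AUC) can be computed from two sets of model scores through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. This is a minimal Python sketch with a hypothetical function name and made-up scores; it is not how SAS-EM 5.1 computes or reports ROC statistics.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank (Mann-Whitney) interpretation.

    Returns the fraction of (positive, negative) pairs in which the positive
    case scores higher, counting ties as one half. This pairwise loop is
    O(n * m) and meant for illustration, not for large samples.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical model scores, not study data: higher = more likely "dead".
dead_scores = [0.9, 0.8, 0.4]
live_scores = [0.3, 0.5, 0.1]
print(roc_auc(dead_scores, live_scores))  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 means the scores rank cases no better than chance; 1.0 means every positive case outranks every negative one.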

4.3 REALTROMINS-Real Time Risk of Mortality

This example attempts to differentiate arrested from surviving children admitted to the University of North Carolina at Chapel Hill PICU (Pediatric Intensive Care Unit) using streaming ECG data, clinical laboratory results, and demographic chart data. A sample of 10 pediatric patients was selected as the analytic set for this proof of concept.

4.3.1 Streaming ECG Data. ECG data were captured at 222 data points per second, each tagged with a UTC millisecond timestamp provided by a SpaceLabs Monitor XXXX. Plotted, these data form a series of peaks of varying spacing and height. To create time-domain variables, all peaks must be found. The initial step in determining Rpeaks was to ensure that the marks provided by SpaceLabs were reasonably consistent and representative of the true peak locations. It was discovered that these marks do not occur at consistent locations within the ECG cycle; the problems included both extra marks and missing marks in unsteady data segments. A multiple-filtering algorithm was developed in MATLAB that checked for these conditions and corrected them using several interpolation methods. Once this was completed, the next step was to calculate HR (heart rate) from the Rpeak spacing using UTC time. The interval between Rpeaks is the period between beats; to convert it to a heart rate, divide 60 seconds/minute by this value. For example:

60 sec/min / (0.558 sec/beat) = 107.5 beats per minute (bpm)
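The HR calculation can be sketched in Python, assuming the Rpeak marks have already been corrected and are available as ascending UTC millisecond timestamps. The function name is hypothetical and this is not the MATLAB code used in the study; it simply applies HR = 60 / RR interval to each pair of consecutive Rpeaks.

```python
from typing import List


def heart_rate_bpm(rpeak_utc_ms: List[int]) -> List[float]:
    """Heart rate (bpm) at each Rpeak after the first.

    rpeak_utc_ms: corrected Rpeak timestamps in UTC milliseconds, ascending.
    HR = 60 (sec/min) / RR interval (sec/beat).
    """
    rates = []
    for prev, curr in zip(rpeak_utc_ms, rpeak_utc_ms[1:]):
        rr_sec = (curr - prev) / 1000.0  # milliseconds -> seconds
        if rr_sec <= 0:
            raise ValueError("Rpeak timestamps must be strictly increasing")
        rates.append(60.0 / rr_sec)
    return rates


# An RR interval of 558 ms reproduces the worked example in the text.
print(round(heart_rate_bpm([1_000_000, 1_000_558])[0], 1))  # 107.5
```

Each HR value is attached to the later Rpeak of its pair, so n Rpeaks yield n - 1 rates.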


This calculated HR is a value located at each Rpeak. Frequency-domain variables require adequate frequency resolution while keeping the analysis window reasonably short, so a 128-point FFT (Fast Fourier Transform) was chosen as the standard. This provides 64 estimated points over the positive frequency range. The length of the N-second interval was determined by the heart rate and the 128-point FFT requirement: from the heart rate, we determine how long it will take to acquire 128 beats. The formula for this calculation is:

N (sec) = (128 / average HR) * 60, with average HR in beats per minute

This value is set as the length of the N-second period for the current data segment. The corresponding N seconds of data are then retrieved, with the appropriate marks and Rpeaks. The resulting segment may contain a few more or a few fewer than 128 beats. To sample 128 times consistently (evenly spaced) within this N-second segment, the HR data are interpolated onto 128 evenly spaced points.
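The window-length formula and the resampling step can be sketched with NumPy. The function names are hypothetical, the caller is assumed to supply the segment's average HR, and linear interpolation (numpy.interp) is an assumption, since the text does not say which interpolation method was used.

```python
import numpy as np


def n_seconds(avg_hr_bpm: float, n_beats: int = 128) -> float:
    """Seconds needed to acquire n_beats at the given average heart rate."""
    return (n_beats / avg_hr_bpm) * 60.0


def resample_hr(beat_times_s: np.ndarray, hr_bpm: np.ndarray,
                start_s: float, avg_hr_bpm: float, n_points: int = 128):
    """Interpolate beat-by-beat HR onto n_points evenly spaced samples
    covering the N-second segment that starts at start_s.

    Linear interpolation (np.interp) is an assumption; the source does not
    name the interpolation method. beat_times_s must be increasing.
    """
    duration = n_seconds(avg_hr_bpm, n_points)
    step = duration / n_points                 # seconds between samples
    t = start_s + step * np.arange(n_points)   # 128 evenly spaced times
    return t, np.interp(t, beat_times_s, hr_bpm)


print(round(n_seconds(107.5), 1))  # 71.4 seconds at 107.5 bpm
```

Because N = 128 * 60 / HR, the resulting sampling rate 128 / N equals HR / 60 samples per second, that is, roughly one sample per average beat.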

An FFT is first performed on the interpolated HR data. To produce a spectrum with the 0 Hz (DC) component removed, the HR data are then normalized by their mean:

HRnorm = (HR interpolated - average of HR interpolated) / (average of HR interpolated)

A second FFT is then performed on the normalized, interpolated HR data (HRnorm).
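A NumPy sketch of the mean normalization and the two FFTs follows. It assumes the 128 interpolated samples are evenly spaced over the N-second window; the function name is hypothetical, and returning complex FFT values (rather than magnitudes) is a choice made here, not something the text specifies. numpy.fft.rfft on 128 samples yields 65 bins (0 Hz up to the Nyquist frequency); dropping the 0 Hz bin leaves the 64 positive-frequency points described above.

```python
import numpy as np


def hr_spectra(hr_interp: np.ndarray, duration_s: float):
    """Return (freqs, FFT of HR interpolated, FFT of HRnorm) over the
    positive-frequency bins, with the 0 Hz (DC) bin dropped."""
    n = hr_interp.size                        # 128 in the text
    mean = hr_interp.mean()
    hr_norm = (hr_interp - mean) / mean       # HRnorm: zero-mean fractional deviation
    fs = n / duration_s                       # samples per second
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)    # n//2 + 1 bins: 0 Hz .. fs/2
    fft_interp = np.fft.rfft(hr_interp)
    fft_norm = np.fft.rfft(hr_norm)
    # Drop bin 0 (DC) to keep the 64 positive-frequency points.
    return freqs[1:], fft_interp[1:], fft_norm[1:]
```

The bin spacing is fs / 128 = 1 / N Hz; for the 71.4-second window implied by an average HR of 107.5 bpm, that is about 0.014 Hz.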

The beginning and ending values of each consistent time segment, and the beginning and ending values of each time segment over which the analysis actually occurred (i.e., the first and last points in UTC time used for the HR interpolation), are written out to the HR data file. Other columns in this file include Rpeaks, Rind, HR, HR interpolated, HRnorm, frequency (Hz), FFT HR interpolated, and FFT HRnorm.

The data structure developed to store all of this information was multi-hierarchical and of variable length. While the number of heart-peak records varied around 128, the frequency-domain variables were fixed at 64 records each. The column headings of the HR data file were:

Variable Name                                         Hierarchy (records per epoch)
A.) Actual Begin Time for the 128-beat sample epoch   1
B.) Actual End Time for the 128-beat sample epoch     1
C.) Actual Rpeak Timestamp                            ~128
D.) Rpeak value                                       ~128
E.) HR - Raw                                          ~128
F.) HR - Interpolated                                 ~128
G.) HR - Normalized                                   ~128
H.) Frequency                                         64
I.) Heart Rate Spectra Raw                            64
J.) Heart Rate Spectra Interpolated                   64
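One way to hold this two-level layout in memory is one record per epoch, with variable-length beat-level lists and fixed-length spectra. The class and field names below are hypothetical; they map one-to-one onto items A.) through J.) but do not reproduce the study's actual file format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HREpoch:
    """One ~128-beat epoch from the HR data file (hypothetical field names)."""
    begin_utc_ms: int                 # A.) 1 value per epoch
    end_utc_ms: int                   # B.) 1 value per epoch
    # C.)-G.) beat-level series, roughly 128 values each (variable length)
    rpeak_utc_ms: List[int] = field(default_factory=list)
    rpeak_value: List[float] = field(default_factory=list)
    hr_raw: List[float] = field(default_factory=list)
    hr_interp: List[float] = field(default_factory=list)
    hr_norm: List[float] = field(default_factory=list)
    # H.)-J.) frequency-domain series, fixed at 64 values each
    frequency: List[float] = field(default_factory=list)
    spectra_raw: List[float] = field(default_factory=list)
    spectra_interp: List[float] = field(default_factory=list)

    def spectra_complete(self) -> bool:
        """True when all three frequency-domain series hold exactly 64 values."""
        return all(len(s) == 64 for s in
                   (self.frequency, self.spectra_raw, self.spectra_interp))
```

Keeping the variable-length beat series and the fixed-length spectra in separate lists mirrors the two levels of the hierarchy; a flat table would instead need one row per beat or per frequency bin, keyed by the epoch's begin time.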


Time-domain variables were calculated by stripping out the actual Rpeak timestamps, binning the distribution of NN intervals (in seconds), and calculating the proportion of intervals in each bin as follows:

/* Flag each NN interval (seconds) into exactly one bin */
if nn_interval > 0 and nn_interval <= .05 then NN50=1;
else if nn_interval > .05 and nn_interval <= .10 then NN100=1;
else if nn_interval > .10 and nn_interval <= .20 then NN200=1;
else if nn_interval > .20 and nn_interval <= .30 then NN300=1;
else if nn_interval > .30 and nn_interval <= .40 then NN400=1;
else if nn_interval > .40 and nn_interval <= .50 then NN500=1;
else if nn_interval > .50 and nn_interval <= .60 then NN600=1;
else if nn_interval > .60 and nn_interval <= .70 then NN700=1;
else if nn_interval > .70 and nn_interval <= .80 then NN800=1;
else if nn_interval > .80 and nn_interval <= .90 then NN900=1;
else if nn_interval > .90 and nn_interval <= 1.0 then NN1000=1;
else if nn_interval > 1.0 and nn_interval <= 3.0 then NN1000P=1;

/* nn50-nn1000p below are the flag counts summed per epoch; */
/* rpeak_counts is the number of Rpeaks in that epoch       */
pNN50=nn50/rpeak_counts;
pNN100=nn100/rpeak_counts;
pNN200=nn200/rpeak_counts;
pNN300=nn300/rpeak_counts;
pNN400=nn400/rpeak_counts;
pNN500=nn500/rpeak_counts;
pNN600=nn600/rpeak_counts;
pNN700=nn700/rpeak_counts;
pNN800=nn800/rpeak_counts;
pNN900=nn900/rpeak_counts;
pNN1000=nn1000/rpeak_counts;
pNN1000p=nn1000p/rpeak_counts;
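For checking the SAS output, the same binning can be mirrored in Python. This sketch assumes nn_interval is in seconds and that rpeak_counts is the number of Rpeaks in the epoch; the function name is hypothetical. Right-closed bins reproduce the "greater than the lower edge, at most the upper edge" conditions above.

```python
from bisect import bisect_left
from typing import Dict, Sequence

# Upper bin edges in seconds, matching the SAS conditions above.
EDGES = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 1.0, 3.0]
LABELS = ["pNN50", "pNN100", "pNN200", "pNN300", "pNN400", "pNN500",
          "pNN600", "pNN700", "pNN800", "pNN900", "pNN1000", "pNN1000p"]


def nn_bin_proportions(nn_intervals: Sequence[float],
                       rpeak_counts: int) -> Dict[str, float]:
    """Count intervals in each (lower, upper] bin and divide by rpeak_counts.

    Intervals <= 0 or > 3.0 s fall in no bin, as in the SAS code.
    """
    counts = [0] * len(EDGES)
    for x in nn_intervals:
        if 0 < x <= EDGES[-1]:
            # bisect_left returns the first edge >= x, i.e. the right-closed bin.
            counts[bisect_left(EDGES, x)] += 1
    return {label: c / rpeak_counts for label, c in zip(LABELS, counts)}
```

For example, intervals of 0.45, 0.55, 0.55, and 1.5 seconds over 4 Rpeaks give pNN500 = 0.25, pNN600 = 0.5, and pNN1000p = 0.25.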

Spectral data were extracted by stripping out the fixed 64 records for each group of records. Frequency bands were defined as follows, and the interpolated heart-rate spectra values were used to calculate the area within each band:

/* Contiguous bands in Hz: ULF includes 0 Hz; the others are (lower, upper] */
if frequency >= 0 and frequency <= .003 then freq_type = '1-ULF';
else if frequency > .003 and frequency <= .040 then freq_type = '2-VLF';
else if frequency > .040 and frequency <= .150 then freq_type = '3-LF';
else if frequency > .150 then freq_type = '4-HF';
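The band-area step can be sketched with NumPy under two stated assumptions, since the text only says "area": spectral power is taken as the squared magnitude of the FFT values, and the area is the band's summed power times the bin width (a rectangle rule). The function name is hypothetical. Bands are contiguous, with ULF including 0 Hz and the others right-closed.

```python
import numpy as np

# Contiguous bands in Hz: ULF includes 0 Hz; the others are (low, high].
BANDS = [("1-ULF", 0.0, 0.003), ("2-VLF", 0.003, 0.04),
         ("3-LF", 0.04, 0.15), ("4-HF", 0.15, np.inf)]


def band_areas(freqs: np.ndarray, spectrum: np.ndarray) -> dict:
    """Approximate the area under the power spectrum within each band.

    Power is taken as |spectrum|**2 and the area as power summed over the
    band's bins times the bin width (rectangle rule). Both choices are
    assumptions; the source only says "area". freqs must be evenly spaced.
    """
    df = freqs[1] - freqs[0]
    power = np.abs(spectrum) ** 2
    areas = {}
    for name, low, high in BANDS:
        if low == 0.0:
            in_band = (freqs >= low) & (freqs <= high)
        else:
            in_band = (freqs > low) & (freqs <= high)
        areas[name] = float(power[in_band].sum() * df)
    return areas
```

With a 128-point FFT the bin spacing is 1 / N Hz, which for typical windows is wider than the whole ULF band, so after the 0 Hz bin is dropped the ULF area will usually be zero.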

4.3.2 Physiologic based measures of organ function. Measures of organ function important in predicting mortality were defined by a battery of 75 lab tests collected during a PICU stay. These tests were ordered according to clinical need, which varied by patient and over time within a patient. This raised two concerns: how to include all of this information while keeping a reasonable patient-to-variable ratio, and how to handle the high expected inter-correlation among the tests. The answer provided here was to create a series of derogatory biomarkers. Only those tests that could identify segments with the highest index against base mortality were considered. This was accomplished by examining each variable's distribution and identifying intervals that index high against base mortality, using the single factor scan procedure in QTMS V3.2.


Base Balance

SINGLE FACTOR SCAN RESULT                          QTMS
TYPE IS "CONTINUOUS"                               V3.2

                      NO.OF      NO.OF  RESPONSE                   RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE    CHISQ   PROB.     INDEX
---------------------------------------------------------------------------
 1  -15.1 - -6.2         42         40    95.238    5.751  0.0165       146 |******************
 2  -6.1 - -4.1          44         40    90.909    4.407  0.0358       139 |****************
 3  -4 - -2.7            42         33    78.571    1.127  0.2884       120 |*************
 4  -2.6 - -1.3          42         25    59.524    0.217  0.6415        91 |******* |
 5  -1.2 - 0.6           46         18    39.130    4.833  0.0279        60 |** |
 6  0.7 - 1.9            45         28    62.222    0.067  0.7965        95 |********|
 7  2 - 3.4              44         28    63.636    0.019  0.8894        97 |*********
 8  3.5 - 5.4            43         29    67.442    0.029  0.8640       103 |**********
 9  5.5 - 8.2            43         15    34.884    6.101  0.0135        53 |* |
10  8.5 - 14.2           33         21    63.636    0.014  0.9042        97 |*********
---------------------------------------------------------------------------
                        424        277    65.330   22.565  0.0072       100

O2Sat - ART (meas)

SINGLE FACTOR SCAN RESULT                          QTMS
TYPE IS "CONTINUOUS"                               V3.2

                      NO.OF      NO.OF  RESPONSE                   RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE    CHISQ   PROB.     INDEX
---------------------------------------------------------------------------
 1  37.4 - 68.2          42         37    88.095    2.747  0.0974       131 |***************
 2  68.8 - 76            42         36    85.714    2.158  0.1418       128 |**************
 3  76.1 - 84.2          42         41    97.619    5.812  0.0159       145 |******************
 4  84.3 - 93.7          42         33    78.571    0.818  0.3659       117 |************
 5  94 - 97.5            47         16    34.043    7.668  0.0056        51 |* |
 6  97.6 - 98.2          46         23    50.000    2.013  0.1560        74 |***** |
 7  98.3 - 98.6          42         24    57.143    0.625  0.4291        85 |******* |
 8  98.7 - 99.1          46         18    39.130    5.375  0.0204        58 |** |
 9  99.2 - 99.6          49         38    77.551    0.791  0.3738       116 |************
10  99.7 - 100           25         18    72.000    0.088  0.7668       107 |***********
---------------------------------------------------------------------------
                        423        284    67.139   28.095  0.0009       100

Sodium

SINGLE FACTOR SCAN RESULT                          QTMS
TYPE IS "CONTINUOUS"                               V3.2

                      NO.OF      NO.OF  RESPONSE                   RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE    CHISQ   PROB.     INDEX
---------------------------------------------------------------------------
 1  .                     4          1    25.000    0.993  0.3190        38 |* |
 2  121 - 134            83         34    40.964    7.499  0.0062        63 |**** |
 3  135 - 136            84         49    58.333    0.615  0.4331        89 |******** |
 4  137 - 138           115         60    52.174    3.010  0.0827        80 |******* |
 5  139 - 140           106         39    36.792   13.150  0.0003        56 |*** |
 6  141 - 143            91         63    69.231    0.222  0.6376       106 |***********
 7  144 - 149            79         72    91.139    8.121  0.0044       140 |***************
 8  150 - 157            83         83   100.000   15.369  0.0001       153 |******************
 9  158 - 165            57         57   100.000   10.555  0.0012       153 |******************
---------------------------------------------------------------------------
                        702        458    65.242   59.534  0.0000       100

Hemoglobin

SINGLE FACTOR SCAN RESULT                          QTMS
TYPE IS "CONTINUOUS"                               V3.2

                      NO.OF      NO.OF  RESPONSE                   RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE    CHISQ   PROB.     INDEX
---------------------------------------------------------------------------
 1  6.2 - 8.5            60         48    80.000    1.673  0.1958       120 |************
 2  8.6 - 9.2            62         47    75.806    0.828  0.3629       114 |**********
 3  9.3 - 9.8            61         33    54.098    1.389  0.2386        81 |**** |
 4  9.9 - 10.4           67         30    44.776    4.715  0.0299        67 |* |
 5  10.5 - 11.3          66         28    42.424    5.711  0.0169        64 |* |
 6  11.4 - 12.2          65         36    55.385    1.186  0.2761        83 |**** |
 7  12.3 - 12.9          71         49    69.014    0.074  0.7863       104 |********
 8  13 - 13.8            62         47    75.806    0.828  0.3629       114 |**********
 9  13.9 - 15.7          62         54    87.097    4.003  0.0454       131 |**************
10  15.8 - 19.2          31         31   100.000    5.274  0.0216       151 |******************
---------------------------------------------------------------------------
                        607        403    66.392   25.680  0.0023       100

Glucose

SINGLE FACTOR SCAN RESULT                          QTMS
TYPE IS "CONTINUOUS"                               V3.2

                      NO.OF      NO.OF  RESPONSE                   RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE    CHISQ   PROB.     INDEX
---------------------------------------------------------------------------
 1  .                     5          1    20.000    1.497  0.2211        31 |* |
 2  30 - 82              76         57    75.000    1.534  0.2155       118 |*****************
 3  83 - 91              83         64    77.108    2.357  0.1248       121 |******************
 4  92 - 101             87         60    68.966    0.384  0.5354       108 |***************
 5  102 - 110            77         48    62.338    0.021  0.8841        98 |*************
 6  111 - 119            73         34    46.575    3.348  0.0673        73 |******** |
 7  120 - 133            74         43    58.108    0.359  0.5492        91 |************|
 8  134 - 159            75         48    64.000    0.001  0.9709       101 |**************
 9  161 - 240            74         42    56.757    0.555  0.4565        89 |*********** |
10  241 - 453            42         27    64.286    0.003  0.9597       101 |**************
---------------------------------------------------------------------------
                        666        424    63.664   10.059  0.3457       100
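The derived columns of these scans can be reproduced from the counts. Checking the Base Balance scan by hand, each row's CHISQ equals (O - E)^2 / E computed on the responder count only, where E is the number solicited times the overall response rate (277 / 424); the row values also sum to the total CHISQ of 22.565, and INDEX is 100 times the row rate divided by the overall rate. The sketch below encodes that reading. It is inferred from the printed numbers (checked against rows 1 and 5), not taken from QTMS documentation, and the function name is hypothetical.

```python
import math


def scan_row(solicited: int, responders: int,
             total_solicited: int, total_responders: int):
    """RATE, CHISQ, PROB. and INDEX for one interval of a single factor scan.

    CHISQ is (observed - expected)^2 / expected on the responder count only,
    with expected = solicited * overall response rate; PROB. is its 1-df
    upper-tail probability; INDEX is 100 * interval rate / overall rate.
    """
    base_rate = total_responders / total_solicited
    rate = responders / solicited
    expected = solicited * base_rate
    chisq = (responders - expected) ** 2 / expected
    prob = math.erfc(math.sqrt(chisq / 2.0))  # P(chi-square with 1 df > chisq)
    index = round(100.0 * rate / base_rate)
    return 100.0 * rate, chisq, prob, index


# Row 1 of the Base Balance scan: 42 solicited, 40 responders, totals 424/277.
rate, chisq, prob, index = scan_row(42, 40, 424, 277)
print(f"{rate:.3f} {chisq:.3f} {prob:.4f} {index}")  # 95.238 5.751 0.0165 146
```

Under this reading, an INDEX above 100 marks an interval whose mortality rate exceeds the base rate, which is what the derogatory biomarkers are built from.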


4.3.3 Chart data. Chart data are defined as information taken at the point of admission. Again, these measures were collected according to clinical need, which varied by patient and over time within a patient. We employed the same derogatory biomarker methodology described above for the battery of lab tests.

SINGLE FACTOR SCAN EPI                             QTMS
TYPE IS "CONTINUOUS"                               V3.2

                      NO.OF      NO.OF  RESPONSE                   RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE    CHISQ   PROB.     INDEX
---------------------------------------------------------------------------
 1  .                    97         48    49.485    0.513  0.4738        90 |**** |
 2  0                  1189        416    34.987   85.678  0.0000        64 |* |
 3  1                   551        544    98.730  193.147  0.0000       180 |******************
---------------------------------------------------------------------------
                       1837       1008    54.872  279.338  0.0000       100

SINGLE FACTOR SCAN FIO2                            QTMS
TYPE IS "CONTINUOUS"                               V3.2

                      NO.OF      NO.OF  RESPONSE                   RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE    CHISQ   PROB.     INDEX
---------------------------------------------------------------------------
 1  .                   103         53    51.456    0.219  0.6398        94 |*******|
 2  0.21                522        256    49.042    3.233  0.0722        89 |*******|
 3  0.22 - 0.28         223         66    29.596   25.963  0.0000        54 |* |
 4  0.3 - 0.32          194         86    44.330    3.929  0.0475        81 |***** |
 5  0.34 - 0.35         290        205    70.690   13.223  0.0003       129 |*************
 6  0.4                 203        104    51.232    0.490  0.4838        93 |*******|
 7  0.45 - 0.58         193        162    83.938   29.715  0.0000       153 |******************
 8  0.6 - 1             109         76    69.725    4.382  0.0363       127 |*************
---------------------------------------------------------------------------
                       1837       1008    54.872   81.155  0.0000       100

SINGLE FACTOR SCAN GCS                             QTMS
TYPE IS "CONTINUOUS"                               V3.2

                      NO.OF      NO.OF  RESPONSE                   RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE    CHISQ   PROB.     INDEX
---------------------------------------------------------------------------
 1  .                   146         80    54.795    0.000  0.9899       100 |******
 2  3                   470        442    94.043  131.421  0.0000       171 |******************
 3  4 - 7               237         84    35.443   16.304  0.0001        65 |* |
 4  8 - 9               223        126    56.502    0.108  0.7424       103 |*******
 5  10 - 11             428        150    35.047   30.657  0.0000        64 |* |
 6  12 - 15             333        126    37.838   17.609  0.0000        69 |* |
---------------------------------------------------------------------------
                       1837       1008    54.872  196.100  0.0000       100

SINGLE FACTOR SCAN OTHER                           QTMS
TYPE IS "CONTINUOUS"                               V3.2

                      NO.OF      NO.OF  RESPONSE                   RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE    CHISQ   PROB.     INDEX
---------------------------------------------------------------------------
 1  .                    97         48    49.485    0.513  0.4738        90 |* |
 2  0                  1348        662    49.110    8.157  0.0043        89 |* |
 3  1                   392        298    76.020   31.951  0.0000       139 |******************
---------------------------------------------------------------------------
                       1837       1008    54.872   40.621  0.0000       100

SINGLE FACTOR SCAN PUPILS                          QTMS
TYPE IS "CONTINUOUS"                               V3.2

                      NO.OF      NO.OF  RESPONSE                   RESPONSE
 #  INTERVAL      SOLICITED RESPONDERS      RATE    CHISQ   PROB.     INDEX
---------------------------------------------------------------------------
 1  .                   116         53    45.690    1.782  0.1818        83 |* |
 2  0                  1406        640    45.519   22.414  0.0000        83 |* |
 3  1 - 2               315        315   100.000  116.910  0.0000       182 |******************
---------------------------------------------------------------------------
                       1837       1008    54.872  141.106  0.0000       100


4.3.4 Developing a Common Time Hierarchy. This study accumulated patient data collected during the normal course of running a PICU, and the unit of time differed for each grouping of variables. The time-domain variables were sub-second differences between Rpeaks, while the spectral-domain variables were computed over groups of 128 beats, or approximately 2 minutes of clock time. Finally, the lab and chart data were intermittent but logged at the actual clock time taken. To join these four groups of data, all information was put into a common 2-minute time frame. This was accomplished using SAS PROC EXPAND, which can combine time series with different frequencies: its interpolation methods can convert raw data into a higher-frequency series or aggregate them down to a lower-frequency series. Data from all sources were recalibrated into 2-minute records.

We conclude with data from four sources calibrated into 2-minute epochs for all patients. We separated the first four hours of data from each patient file and attached the final outcome to each record, which resulted in an analytic data set of 1080 records (600 live packets and 480 dead packets). This represents a summarization of over 125,000 heartbeats from the 10 patients' first four hours, appended with derogatory biomarkers from lab and chart sources.
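A much-simplified stand-in for the epoch alignment is sketched below in plain Python. It does not reproduce PROC EXPAND: it averages the samples that fall within each 2-minute epoch and carries the last value forward through epochs with no samples (a reasonable choice for intermittent lab and chart values, but an assumption here). The function name is hypothetical, and epochs are indexed as utc_ms // 120000.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

EPOCH_MS = 2 * 60 * 1000  # one 2-minute epoch in milliseconds


def to_epochs(samples: List[Tuple[int, float]], first_epoch: int,
              n_epochs: int) -> List[Optional[float]]:
    """Map (utc_ms, value) samples onto consecutive 2-minute epochs.

    Epochs with samples get the mean of those samples; epochs without samples
    carry the last epoch value forward (None before the first observation).
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for utc_ms, value in samples:
        epoch = utc_ms // EPOCH_MS
        sums[epoch] += value
        counts[epoch] += 1
    out: List[Optional[float]] = []
    last: Optional[float] = None
    for epoch in range(first_epoch, first_epoch + n_epochs):
        if counts.get(epoch, 0):
            last = sums[epoch] / counts[epoch]
        out.append(last)
    return out


print(to_epochs([(0, 10.0), (60_000, 20.0), (250_000, 5.0)], 0, 4))
# [15.0, 15.0, 5.0, 5.0]
```

At 2-minute resolution, each patient's first four hours span 120 epochs.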

4.3.5 Multivariate Models. This file was analyzed using SAS Enterprise Miner 5.1 as diagrammed below. This included creating a 20% hold-out sample and implementing 8 different modeling approaches. The summary of each for both the training and validation samples is provided below. Based upon the ability of the hold-out sample to replicate the ROC profile of the training sample, as well as the overall sensitivity, it was concluded that the regression model fared best.


Model       Sample    dead  live   fp   fn   tp   tn    %miss  %FalseAlarm
Reg         TRAIN      384   480   23    4  380  457    1.04%        4.79%
Reg         VALIDATE    96   120    7    3   93  113    3.13%        5.83%
DmineReg    TRAIN      384   480   24    9  375  456    2.34%        5.00%
DmineReg    VALIDATE    96   120   10    7   89  110    7.29%        8.33%
Tree        TRAIN      384   480   16   14  370  464    3.65%        3.33%
Tree        VALIDATE    96   120    4    8   88  116    8.33%        3.33%
Rule        TRAIN      384   480    9   19  365  471    4.95%        1.88%
Rule        VALIDATE    96   120    3   10   86  117   10.42%        2.50%
Neural      TRAIN      384   480   23    3  381  457    0.78%        4.79%
Neural      VALIDATE    96   120    7    4   92  113    4.17%        5.83%
AutoNeural  TRAIN      384   480  179    1  383  301    0.26%       37.29%
AutoNeural  VALIDATE    96   120   47    0   96   73    0.00%       39.17%
DMNeural    TRAIN      384   480   55    0  384  425    0.00%       11.46%
DMNeural    VALIDATE    96   120   16    0   96  104    0.00%       13.33%
MBR         TRAIN      384   480   26    6  378  454    1.56%        5.42%
MBR         VALIDATE    96   120    9    5   91  111    5.21%        7.50%
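The two rate columns follow directly from the counts. In every row, %miss = fn / dead and %FalseAlarm = fp / live, with dead records treated as the positive class (tp + fn = dead and fp + tn = live). A small Python check, with a hypothetical function name:

```python
def miss_and_false_alarm(fp: int, fn: int, tp: int, tn: int):
    """Return (%miss, %FalseAlarm) from confusion-matrix counts.

    %miss       = fn / (tp + fn): dead records scored as live
    %FalseAlarm = fp / (fp + tn): live records scored as dead
    """
    dead = tp + fn
    live = fp + tn
    return 100.0 * fn / dead, 100.0 * fp / live


# Reg TRAIN row: fp=23, fn=4, tp=380, tn=457
miss, false_alarm = miss_and_false_alarm(23, 4, 380, 457)
print(f"{miss:.2f}% {false_alarm:.2f}%")  # 1.04% 4.79%
```

Sensitivity is 100 minus %miss, so the Reg VALIDATE row corresponds to a sensitivity of about 96.9%.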


4.3.6 Online Interactive Reporting: COGNOS Example. This test model can be accessed online and used in real time. This could allow non-sampled patients to be scored every 2 minutes with an updated estimate of the odds of mortality.
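Assuming the selected regression model is a logistic regression (the usual form for a binary dead/live target, although the text does not state the link function), each new 2-minute record would be scored as odds = exp(b0 + sum of b_i * x_i). The coefficients and variable names below are hypothetical placeholders, not the fitted model, and nothing here reflects how COGNOS hosts the scoring.

```python
import math
from typing import Dict


def mortality_odds(intercept: float, coefs: Dict[str, float],
                   record: Dict[str, float]) -> float:
    """Odds of mortality from a logistic model: exp(b0 + sum of b_i * x_i)."""
    logit = intercept + sum(b * record[name] for name, b in coefs.items())
    return math.exp(logit)


def odds_to_probability(odds: float) -> float:
    """Convert odds to a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


# Hypothetical coefficients and one hypothetical 2-minute record.
coefs = {"pNN500": 1.5, "sodium_150_157": 0.8}
record = {"pNN500": 0.2, "sodium_150_157": 1.0}
odds = mortality_odds(-2.0, coefs, record)
print(round(odds, 3), round(odds_to_probability(odds), 3))
```

Re-running this on each new epoch gives the running odds update described above; the probability form is often easier to display in a report.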