Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear...

18
Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia

Transcript of Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear...

Page 1: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Random Forest-Based Classification of Heart Rate Variability Signals by UsingCombinations of Linear and Nonlinear Features

Alan Jovic, Nikola Bogunovic

Faculty of Electrical Engineering and Computing,

University of Zagreb, Croatia

Page 2: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Contents

Problem description Methods

Feature extraction Classification and evaluation

HRV records Results Discussion Conclusion

Page 3: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Problem description Heart rate variability (HRV) analysis examines fluctuations in

the sequence of cardiac interbeat (RR) intervals

Each cardiac rhythm has a pattern (regular or irregular) in these RR interval fluctuations

HRV is a strong predictor of arrhythmic mortality

Page 4: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Problem description

Some rhythms have very similar patterns of HRV, e.g. normal and (accelerated) junctional rhythm

Some patterns cannot be efficiently detected by HRV analysis alone, e.g. bundle branch blocks, differentiating atrial disorders (atrial fibrillation vs. wandering atrial pacemaker)

Many rhythms and anomalies can be automatically detected and classified using HRV analysis alone

The main question are: How accurately can a rhythm be classified? What should the optimal length of the analyzed segment be? Which features should be used for which type of rhythm?

Page 5: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Motivation Nonlinear phenomena are involved in

genesis of HRV:

Lots of nonlinear features described inliterature, very few comparisons of

different features’ combinations on the same dataset

The aim of this work is to evaluate a number of different combinations of (sometimes interrelated) linear and nonlinear HRV features in classification of several types of cardiac rhythms

Page 6: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Feature extraction

We consider that a feature is linear if it is unable to take into account the nonlinear dynamics of the HRV signal

Examples of linear features include: Time domain statistical and geometric measures Frequency domain spectral features

Nonlinear features try to encompass and quantify the observed complexity of the HRV signal changes

Most of the employed nonlinear features make no assumptions on whether the changes are deterministic or stochastic in origin

Some of the features are specifically designed for HRV analysis, others have more broad areas of application

Page 7: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Feature extractionSchemenumber

Features in schemeNumber of

featuresDescription Comment

1 SDNN, pNN20, pNN50, RMSSD, HTI 5Linear, time

domain

2 PSD, VLF, LF, HF, LF/HF 5Linear,

frequency domain

3 Linear (time domain), linear (frequency domain) 10 Linear

4 Linear, SD1/SD2 ratio, Fano factor, Allan factor 13Linear +nonlinear

5Spatial filling index (SFI), Correlation dimension (D2),

Central tendency measure (CTM)3

Nonlinear chaos attractor features

Time interval (lag) , T={1,2,5,

10, and 20}, reconstructiondimension d=2

6

Approximate entropy (ApEn1-ApEn4), Maximum approximate entropy (MaxApEn), r for MaxApEn, Multiscale sample entropy (SampEn1-SampEn20), Multiscale Carnap 1D entropy(Carnap1-Carnap20)

46 EntropiesDimension m=2 for ApEn

and SampEn

7Advanced sequential trend analysis (ASTA): ASTA1-ASTA11

11 ASTA

8Detrended fluctuation analysis (DFA): DFA 5, DFA 7, DFA 10, DFA 15, DFA 20

5 DFA

9SFI, D2, CTM, ApEn1-ApEn4, SDNN, pNN20, RMSSD,

HTI11

Features combination

T=1, d=2, m=2

10 All features 78Advanced linear + nonlinear chaos attractor features (T=1) + entropies + ASTA + DFA

Page 8: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Feature extraction

Most of the linear and nonlinear features were implemented in our own framework for HRV called ECG Chaos Extractor

The only exceptions were frequency domain features, which were extracted in Matlab using the autoregressive function

Some newly proposed nonlinear features ASTA, Carnap 1D (tessellation) entropy – both methods require

more elaborate further research

Not all of the nonlinear HRV features covered in literature were inspected (e.g. Lyapunov exponents, spectral entropy...)

Page 9: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Classification and evaluation

For the best results on a large number of features, a strong classification algorithm is required

We opted for Random Forests (RF), an ensemble of random decision trees developed by Breiman in 2001

Internal mechanism for feature selection makes it a valuable tool in the case of a large number of potentially insignificant features

We have also tried other classifiers: ANN, SVM, and C4.5 decision tree, however none of the algorithms gives better results in terms of accuracy and speed

Page 10: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Classification and evaluation

RF was constructed with 40 trees for each feature scheme Stratified 10x10-fold cross-validation evaluation procedure was

executed Analysis overview:

HRV annotations

records

ECG Chaos Extractor

Matlab

Weka Classification accuracy

DATABASES FEATURE EXTRACTION

FEATURE SELECTION AND CLASSIFICATION

RESULTS

Page 11: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

HRV records

Heart rhythm (total no. of feature vectors)

PhysioBank database ECG annotations records RR intervals analyzed

Normal heart rhythm (665)

MIT-BIH Normal Sinus Rhythm Database,

Normal Sinus Rhythm RR Interval Database

MIT-BIH: 16265-19830NSR: nsr001-nsr054

1-500, 251-750, 501-1000, 751-1250, 1001-1500, 1251-1750,

1501-2000, 1751-2250, 2001-2500

Any arrhythmia(492)

MIT-BIH Arrhythmia Database,CAST RR Interval Sub-Study

Database

MIT-BIH: 100-234CAST: e001a-e130a, f001a-f130a

1-500, 501-1000

Supraventricular arrhythmia (312)

MIT-BIH SupraventricularArrhythmia Database

800-8941-500, 251-750, 501-1000, 751-

1250

Congestive heart failure (747)

BIDMC Congestive Heart Failure Database,

Congestive Heart FailureRR Interval Database

BIDMC: chf01-chf15CHF RR: chf201-chf219

1-500, 251-750, 501-1000, ... , 3751-4250, 4001-4500

Four types of cardiac rhythms (seven databases) 500 RR intervals analyzed, with overlapping A total of 2216 feature vectors

Page 12: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Results

Schemenumber

Description

1 Linear, time domain

2Linear,

frequency domain

3 Linear

4Linear +nonlinear

5Nonlinear chaos attractor features

6 Entropies

7 ASTA

8 DFA

9 Features combination

10 All features

Page 13: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Results

Schemenumber

Description

1 Linear, time domain

2Linear,

frequency domain

3 Linear

4Linear +nonlinear

5Nonlinear chaos attractor features

6 Entropies

7 ASTA

8 DFA

9 Features combination

10 All features

Page 14: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Results

Schemenumber

Description

1 Linear, time domain

2Linear,

frequency domain

3 Linear

4Linear +nonlinear

5Nonlinear chaos attractor features

6 Entropies

7 ASTA

8 DFA

9 Features combination

10 All features

Page 15: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Results

Schemenumber

Description

1 Linear, time domain

2Linear,

frequency domain

3 Linear

4Linear +nonlinear

5Nonlinear chaos attractor features

6 Entropies

7 ASTA

8 DFA

9 Features combination

10 All features

Page 16: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Discussion Good performance was achieved with schemes: 4, 10, 3, and 9 The most promising combination is the one in scheme 4

consisting of the following features: SDNN, pNN20, pNN50, RMSSD, HTI, PSD, VLF, LF, HF, LF/HF,

SD1/SD2 ratio, Fano factor, Allan factor High increase in the number of nonlinear features did not

significantly improve classification accuracy For further research, each segment should be labeled based on

the beats or rhythm it contains, and not based on database it originated from Improvement in accuracy is to be expected Drawback is the time needed for careful labeling of the rhythms

Additional research is required for useful applicability of certain methods (ASTA, Carnap entropy)

Page 17: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Conclusion

Results suggest high efficiency of linear features in the classification problems

Some of the nonlinear features contribute to greater accuracy of the models

Random forest proved valuable for: Finding the most relevant subset of features Efficient classification of different cardiac rhythms

The authors recommend a combination of several time domain, frequency domain and nonlinear features for the best results on medium-sized HRV segments

Page 18: Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear and Nonlinear Features Alan Jovic, Nikola Bogunovic.

Thank you!

Questions?