Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear...
-
Upload
rodney-clark -
Category
Documents
-
view
215 -
download
0
Transcript of Random Forest-Based Classification of Heart Rate Variability Signals by Using Combinations of Linear...
Random Forest-Based Classification of Heart Rate Variability Signals by UsingCombinations of Linear and Nonlinear Features
Alan Jovic, Nikola Bogunovic
Faculty of Electrical Engineering and Computing,
University of Zagreb, Croatia
Contents
Problem description Methods
Feature extraction Classification and evaluation
HRV records Results Discussion Conclusion
Problem description Heart rate variability (HRV) analysis examines fluctuations in
the sequence of cardiac interbeat (RR) intervals
Each cardiac rhythm has a pattern (regular or irregular) in these RR interval fluctuations
HRV is a strong predictor of arrhythmic mortality
Problem description
Some rhythms have very similar patterns of HRV, e.g. normal and (accelerated) junctional rhythm
Some patterns cannot be efficiently detected by HRV analysis alone, e.g. bundle branch blocks, differentiating atrial disorders (atrial fibrillation vs. wandering atrial pacemaker)
Many rhythms and anomalies can be automatically detected and classified using HRV analysis alone
The main question are: How accurately can a rhythm be classified? What should the optimal length of the analyzed segment be? Which features should be used for which type of rhythm?
Motivation Nonlinear phenomena are involved in
genesis of HRV:
Lots of nonlinear features described inliterature, very few comparisons of
different features’ combinations on the same dataset
The aim of this work is to evaluate a number of different combinations of (sometimes interrelated) linear and nonlinear HRV features in classification of several types of cardiac rhythms
Feature extraction
We consider that a feature is linear if it is unable to take into account the nonlinear dynamics of the HRV signal
Examples of linear features include: Time domain statistical and geometric measures Frequency domain spectral features
Nonlinear features try to encompass and quantify the observed complexity of the HRV signal changes
Most of the employed nonlinear features make no assumptions on whether the changes are deterministic or stochastic in origin
Some of the features are specifically designed for HRV analysis, others have more broad areas of application
Feature extractionSchemenumber
Features in schemeNumber of
featuresDescription Comment
1 SDNN, pNN20, pNN50, RMSSD, HTI 5Linear, time
domain
2 PSD, VLF, LF, HF, LF/HF 5Linear,
frequency domain
3 Linear (time domain), linear (frequency domain) 10 Linear
4 Linear, SD1/SD2 ratio, Fano factor, Allan factor 13Linear +nonlinear
5Spatial filling index (SFI), Correlation dimension (D2),
Central tendency measure (CTM)3
Nonlinear chaos attractor features
Time interval (lag) , T={1,2,5,
10, and 20}, reconstructiondimension d=2
6
Approximate entropy (ApEn1-ApEn4), Maximum approximate entropy (MaxApEn), r for MaxApEn, Multiscale sample entropy (SampEn1-SampEn20), Multiscale Carnap 1D entropy(Carnap1-Carnap20)
46 EntropiesDimension m=2 for ApEn
and SampEn
7Advanced sequential trend analysis (ASTA): ASTA1-ASTA11
11 ASTA
8Detrended fluctuation analysis (DFA): DFA 5, DFA 7, DFA 10, DFA 15, DFA 20
5 DFA
9SFI, D2, CTM, ApEn1-ApEn4, SDNN, pNN20, RMSSD,
HTI11
Features combination
T=1, d=2, m=2
10 All features 78Advanced linear + nonlinear chaos attractor features (T=1) + entropies + ASTA + DFA
Feature extraction
Most of the linear and nonlinear features were implemented in our own framework for HRV called ECG Chaos Extractor
The only exceptions were frequency domain features, which were extracted in Matlab using the autoregressive function
Some newly proposed nonlinear features ASTA, Carnap 1D (tessellation) entropy – both methods require
more elaborate further research
Not all of the nonlinear HRV features covered in literature were inspected (e.g. Lyapunov exponents, spectral entropy...)
Classification and evaluation
For the best results on a large number of features, a strong classification algorithm is required
We opted for Random Forests (RF), an ensemble of random decision trees developed by Breiman in 2001
Internal mechanism for feature selection makes it a valuable tool in the case of a large number of potentially insignificant features
We have also tried other classifiers: ANN, SVM, and C4.5 decision tree, however none of the algorithms gives better results in terms of accuracy and speed
Classification and evaluation
RF was constructed with 40 trees for each feature scheme Stratified 10x10-fold cross-validation evaluation procedure was
executed Analysis overview:
HRV annotations
records
ECG Chaos Extractor
Matlab
Weka Classification accuracy
DATABASES FEATURE EXTRACTION
FEATURE SELECTION AND CLASSIFICATION
RESULTS
HRV records
Heart rhythm (total no. of feature vectors)
PhysioBank database ECG annotations records RR intervals analyzed
Normal heart rhythm (665)
MIT-BIH Normal Sinus Rhythm Database,
Normal Sinus Rhythm RR Interval Database
MIT-BIH: 16265-19830NSR: nsr001-nsr054
1-500, 251-750, 501-1000, 751-1250, 1001-1500, 1251-1750,
1501-2000, 1751-2250, 2001-2500
Any arrhythmia(492)
MIT-BIH Arrhythmia Database,CAST RR Interval Sub-Study
Database
MIT-BIH: 100-234CAST: e001a-e130a, f001a-f130a
1-500, 501-1000
Supraventricular arrhythmia (312)
MIT-BIH SupraventricularArrhythmia Database
800-8941-500, 251-750, 501-1000, 751-
1250
Congestive heart failure (747)
BIDMC Congestive Heart Failure Database,
Congestive Heart FailureRR Interval Database
BIDMC: chf01-chf15CHF RR: chf201-chf219
1-500, 251-750, 501-1000, ... , 3751-4250, 4001-4500
Four types of cardiac rhythms (seven databases) 500 RR intervals analyzed, with overlapping A total of 2216 feature vectors
Results
Schemenumber
Description
1 Linear, time domain
2Linear,
frequency domain
3 Linear
4Linear +nonlinear
5Nonlinear chaos attractor features
6 Entropies
7 ASTA
8 DFA
9 Features combination
10 All features
Results
Schemenumber
Description
1 Linear, time domain
2Linear,
frequency domain
3 Linear
4Linear +nonlinear
5Nonlinear chaos attractor features
6 Entropies
7 ASTA
8 DFA
9 Features combination
10 All features
Results
Schemenumber
Description
1 Linear, time domain
2Linear,
frequency domain
3 Linear
4Linear +nonlinear
5Nonlinear chaos attractor features
6 Entropies
7 ASTA
8 DFA
9 Features combination
10 All features
Results
Schemenumber
Description
1 Linear, time domain
2Linear,
frequency domain
3 Linear
4Linear +nonlinear
5Nonlinear chaos attractor features
6 Entropies
7 ASTA
8 DFA
9 Features combination
10 All features
Discussion Good performance was achieved with schemes: 4, 10, 3, and 9 The most promising combination is the one in scheme 4
consisting of the following features: SDNN, pNN20, pNN50, RMSSD, HTI, PSD, VLF, LF, HF, LF/HF,
SD1/SD2 ratio, Fano factor, Allan factor High increase in the number of nonlinear features did not
significantly improve classification accuracy For further research, each segment should be labeled based on
the beats or rhythm it contains, and not based on database it originated from Improvement in accuracy is to be expected Drawback is the time needed for careful labeling of the rhythms
Additional research is required for useful applicability of certain methods (ASTA, Carnap entropy)
Conclusion
Results suggest high efficiency of linear features in the classification problems
Some of the nonlinear features contribute to greater accuracy of the models
Random forest proved valuable for: Finding the most relevant subset of features Efficient classification of different cardiac rhythms
The authors recommend a combination of several time domain, frequency domain and nonlinear features for the best results on medium-sized HRV segments
Thank you!
Questions?