Department of Mathematics and Computer Science Eindhoven University of Technology
Patient Modeling for Next Generation Remote Patient Management Systems: Heart Failure Hospitalization Prediction
By Goran Manev
A thesis submitted for the degree of
Master of Science
Supervisors: Asst. Prof. Dr. Mykola Pechenizkiy (TU/e)
Dr. Aleksandra Tesanovic (Philips Research)
Eindhoven, February 2010
Abstract
In order to maintain and improve the quality of care without exploding costs,
healthcare systems are undergoing a paradigm shift from patient care in the hospital
to patient care at home. Remote patient management (RPM) systems offer great
potential for reducing hospitalization costs and the worsening of symptoms for
patients with chronic diseases, e.g. heart failure and diabetes. The different types of
data collected by RPM systems provide an opportunity for personalizing information
services and alerting medical personnel about the changing condition of the patient.
Early and highly accurate detection of situations that lead to a worsening of the
patient's condition (e.g. possible hospitalization due to HF) is very important, so that
more patients receive timely and appropriate feedback (e.g. instruction or education)
to improve their condition. However, the feedback provided by today's RPM systems
is generic and given to all patients regardless of their personality or current condition.
Additionally, although the richness of the data provides an opportunity for tailoring
and personalizing information services, there is limited understanding of the necessary
architecture, methodology, and tailoring criteria to facilitate personalization.
In this thesis we tackle these problems by presenting a possible next generation
RPM system that enables personalization of educational content and its delivery to
patients, and by introducing a generic knowledge discovery (KDD) methodology for
patient modeling.
We focus in particular on one patient modeling problem: HF hospitalization
prediction. We consider the process of learning a predictive model from RPM data
collected during a clinical trial. The results of our experimental study illustrate that
with an intelligent data analysis approach we can build models that are significantly
more accurate than expert-based (pre-)authored decision rules.
TU Eindhoven 3 G. Manev
Table of Contents
Table of Contents .........................................................................................................3
List of Figures...............................................................................................................5
List of Tables ................................................................................................................7
List of Acronyms ..........................................................................................................9
1 Introduction............................................................................................................11
1.1 Background and Motivation ............................................................................11
1.2 Thesis Objectives and Methodology................................................................12
1.2.1 Hospitalization Classification task......................................................13
1.3 Results..............................................................................................................16
1.4 Organization of the thesis ................................................................................16
2 RPM Systems..........................................................................................................18
2.1 Current state of the art......................................................................................18
2.1.1 Data description ..................................................................................21
2.2 Adaptation challenge .......................................................................................22
2.3 Next generation adaptive RPM systems ..........................................................23
3 KDD Process...........................................................................................................26
3.1 A knowledge discovery framework .................................................................28
3.2 Data ..................................................................................................................28
3.3 Data exploration...............................................................................................28
3.3.1 Visual exploration...............................................................................29
3.4 Data preparation...............................................................................................34
3.4.1 Data cleaning ......................................................................................34
3.4.2 Data selection......................................................................................35
3.4.3 Data preprocessing..............................................................................35
3.4.4 Data transformation ............................................................................36
3.4.5 Feature extraction and selection..........................................................36
3.5 Data mining (Pattern discovery) ......................................................................37
3.5.1 Data mining algorithms.......................................................................38
3.5.2 Classification issues ............................................................................39
3.5.3 Example pattern discovery..................................................................40
4 Heart Failure Hospitalization prediction...............................................................43
4.1 Heart Failure ....................................................................................................43
4.2 Problem of HF hospitalization .........................................................................46
4.3 Related work ....................................................................................................46
4.4 Our approach....................................................................................................48
4.4.1 HF hospitalization Classification task ................................................49
4.4.2 Creation of training instances .............................................................52
4.4.3 Evaluation method ..............................................................................53
5 Case Study: TEN-HMS .........................................................................................57
5.1 Basic data findings...........................................................................................57
5.2 Feature space representation ............................................................................59
5.2.1 General HF features ............................................................................60
5.2.2 Symptoms features..............................................................................60
5.2.3 Features from daily measurements .....................................................60
5.2.4 Medical history feature .......................................................................61
5.2.5 Feature selection .................................................................................61
5.3 Experiment design ...........................................................................................62
5.4 Creation of training instances ..........................................................................63
5.5 Classification model prediction .......................................................................64
5.5.1 Learning models using (S) features ....................................................65
5.5.2 Learning models using (S+D) feature set ...........................................68
5.5.3 Learning models using (S+H) feature set ...........................................70
5.5.4 Learning models using (S+D+H) feature set ......................................71
5.5.5 Learning with feature selection (S+D+H+FS)....................................72
5.5.6 Summary of Classification model prediction .....................................73
5.6 Heart Failure Hospitalization Evaluation.........................................................75
5.6.1 Evaluation on a daily basis .................................................................76
5.6.2 Evaluation using prediction period .....................................................80
5.6.3 Summary of HF hospitalization evaluation ........................................89
6 Conclusions and future work................................................................................91
6.1 Summary and conclusions ...............................................................................91
6.2 Limitations .......................................................................................................94
6.3 Future work......................................................................................................95
References ...................................................................................................................97
A Results on the training dataset...........................................................................101
A.1 Experiment 1 (S) ............................................................................................101
A.2 Experiment 2 (S+D).......................................................................................103
A.3 Experiment 3 (S+H).......................................................................................106
A.4 Experiment 4 (S+D+H)..................................................................................108
A.5 Experiment 5 (S+D+H+FS) ...........................................................................111
A.6 Summary........................................................................................................112
B Results on the test dataset from preemptive mode of work ............................114
List of Figures
1.1 The general process of the classification task. Figure adapted from [38]. ............14
1.2 High level overview of the classification task. ....................................................15
2.1. Basic architecture of an RPM system ..................................................................19
2.2 A high level view of the next generation RPM....................................................24
3.1 KDD process from [33]. ......................................................................................26
3.2 Modified KDD framework for pattern discovery. ...............................................27
3.3 Dot chart analysis of usage data...........................................................................30
3.4 Example of weight timeseries for three different patients for breathlessness limit
activity symptom (G, S, A and B values). ..........................................................31
3.5 Example weight, heart rate, and blood pressure measurements of a single patient
for breathlessness limit activity symptom (G, S, A and B values) ......................32
3.6 Zoom-in of Figure 3.5..........................................................................................33
3.7 Zoom-in of Figure 3.5..........................................................................................33
3.8 Data preprocessing step for different problems (e.g. HF hospitalization
prediction) ............................................................................................................35
3.9 An overview of features used for three different problems. ................................37
3.10 Training classifiers for three different problems. Applying the learned model on
real-time data to determine whether e.g. an HF hospitalization will occur.
..............................................................................................................................38
4.1 Settings from previous approaches. .....................................................................50
4.2 Hospitalization prediction for the following two weeks window........................50
4.3 Hospitalization prediction for the following monthly window............................51
4.4 Forming of a positive (hospitalization took place) training instance...................52
4.5 Receiver operating characteristic (ROC). ............................................................55
5.1 Overall experiment design. ..................................................................................63
5.2 Hospitalization prediction accuracies for different classifiers and feature sets
on the training dataset. ........................................................................................74
5.3 Evaluation setup...................................................................................................75
5.4 Hospitalization prediction accuracies for different classifiers and feature sets on
test dataset. ...........................................................................................................77
5.5 Hospitalization rates on the test dataset. ..............................................................79
5.6 Example of firing alarms in the non-preemptive mode of work..........................83
5.7 Example of firing alarms in the preemptive mode of work. ................................84
5.8 Hospitalization prediction accuracies for different classifiers and feature sets on
test dataset using the non-preemptive mode of work on the adaptive engine. ....86
5.9 An example of predicted HF hospitalizations from the same classifier run in
both modes of work of the engine........................................................................88
A.1 All models from all experiments........................................................................112
B.1 Hospitalization prediction accuracies for different classifiers and feature sets on
test dataset using the preemptive mode of work on the adaptive engine...........114
List of Tables
1.1 An example of prediction using input features for unseen instances...................14
2.1. Comparison of RPM systems...............................................................................20
2.2 An overview of different types of data ................................................................21
2.3 Four heart failure management programs for four groups of patients [29] ..........22
2.4. Typical features included in a patient model template ........................................25
3.1. Examples of discovered patterns .........................................................................41
3.2. Examples of adaptation rules ...............................................................................42
4.1 HF classification by New York Heart Association [15]. .....................................44
4.2 Confusion matrix for a two class problem...........................................................54
5.1 Number of days with certain number of measurements. .....................................58
5.2. Distribution of HF hospitalizations between two monthly contacts. ...................59
5.3 Number of patients in the training dataset and test dataset. ...............................63
5.4 HF hospitalizations and symptoms in the training and test datasets....................64
5.5 Training instances for the HF hospitalization classification task. .......................64
5.6. TPRs, FPRs and YIs of all models that use symptom (S) features......................66
5.7 Set of best models according to Youden Index (S features). ...............................67
5.8 Result of T-test on FPR between classifiers of the “best set of models” (S).......68
5.9 Set of best models according to Youden Index (S+D features)...........................69
5.10 Result of T-test on FPR between classifiers of the “best set of models” (S+D) (+
significantly outperforms, = tie, - significantly outperformed) ...........................69
5.11 Set of best models according to Youden Index (S+H features)...........................70
5.12 Result of T-test between classifiers of the “best set of models” (S+H)...............71
5.13 Set of best models according to Youden Index (S+D+H features)......................71
5.14 Result of T-test on FPR between classifiers of the “best set of models” (S+D+H)
(+ significantly outperforms, = tie, - significantly outperformed).......................72
5.15 Set of best models according to Youden Index (S+D+H+FS features). ..............72
5.16 Result of T-test on FPR between classifiers of the “best set of models”.............73
5.17 Prediction accuracies on the test dataset. .............................................................78
5.18 Types of rules constructed during the learning process.......................................81
5.19 Examples of adaptive (meta) rules.......................................................................82
5.20 Prediction accuracies on the test dataset using the non-preemptive mode of work
on the adaptive engine. ........................................................................................87
A.1 Result of T-test on TPR between all classifiers (S features). ............................101
A.2 Result of T-test on Youden Index between all classifiers (S features). .............102
A.3 Performances of all classifiers (S+D features)...................................................103
A.4 Result of T-test on TPR between all classifiers (S+D). .....................................104
A.5 Result of T-test on Youden Index between all classifiers (S+D)........................105
A.6 Performances of all classifiers (S+H). ...............................................................106
A.7 Result of T-test on TPR between all classifiers (S+H). .....................................107
A.8 Result of T-test on Youden Index between all classifiers (S+H).......................107
A.9 Performances of all classifiers (S+D+H). ..........................................................108
A.10 Result of T-test on TPR between all classifiers (S+D+H).................................109
A.11 Result of T-test on Youden Index between all classifiers (S+D+H). ...............110
A.12 Performances of all algorithms (S+D+H+FS). .................................................111
A.13 Result of T-test on TPR between all classifiers (S+D+H+FS) .........................111
A.14 Result of T-test on Youden Index between all classifiers (S+D+H+FS)..........112
A.15 Summary of the two best algorithms per experiment with max TPR and min FPR.
............................................................................................................................113
B.1 Prediction accuracies on the test dataset using the preemptive mode of work on
the adaptive engine. ...........................................................................................115
List of Acronyms
BP Blood pressure
C Clinical visits feature set
CHF Chronic Heart Failure
CV Cardiovascular
CVD Cardiovascular disease
D Daily measurements feature set
DCA Dot Chart Analysis
DT Decision tree
EMA Exponential moving average
ESC European Society of Cardiology
FN False negative
FP False positive
FPR FP rate
FS Feature selection
GHF General Heart Failure (feature set)
GWS General Worsening Symptom (feature set)
GNSS General Next Symptom Status (feature set)
HDR Hospitalization detection rate
HF Heart failure
HR Heart rate
HRTI Heart rate trend index
HTM Home telemonitoring
LVEF Left ventricular ejection fraction
MACD Moving average convergence divergence
NTS Nurse telephone support
NYHA New York Heart Association
RIPPER Repeated Incremental Pruning to Produce Error Reduction
RPM Remote patient management
RoT Rule of thumb
S Symptoms (feature set)
SVM Support vector machine
SW Symptom worsening
TEN-HMS Trans-European Network-Home-Care Management System
TN True negative
TP True positive
TPR TP rate
UC Usual care
WHF Worsening HF
WTI Weight Trend Index
YI Youden Index
Chapter 1
Introduction
1.1 Background and Motivation
Chronic diseases are the leading cause of death and healthcare costs in developed
countries. According to the 2008 report by the American Heart Association,
cardiovascular diseases (CVD) are the number one killer in the USA [1]. The
European Heart Network reported similar results for Europe in its 2008 report [2].
The reports show that CVD causes nearly half of all deaths in Europe (48%) and in
the USA (42%).
One of the most severe CVDs with respect to mortality and cost is heart failure
(HF). It is an end-stage disease that cannot be cured but only managed. In Europe,
the prevalence of symptomatic heart failure ranges between 0.4% and 2% [3]. The
prevalence of HF increases rapidly with age [4], with the mean age of the HF
population being 74 years. It is estimated that 4-5 million people in the US [1] and
10 million people in Europe [6] suffer from HF.
Chronic heart failure alone costs the US economy over 33.7 billion dollars per
year, of which 16 billion is due to re-hospitalization [8]. European healthcare systems
experience similar expenditures [2]. 42 percent of re-hospitalizations are preventable
by adequate patient monitoring, instruction, education and motivation, all of which
can be done outside the hospital.
Hence, in order to maintain and improve the quality of care without exploding
costs, healthcare systems are undergoing a paradigm shift from patient care in the
hospital to patient care at home [9]. In that context, remote patient management
(RPM) systems offer great potential for reducing hospitalization costs, mortality rates,
and the worsening of symptoms for patients with chronic diseases, e.g., heart failure,
coronary artery disease and diabetes.
To fulfill its goal of improving the patient's quality of health, an RPM system
should ideally be able to 1) monitor vital signs, 2) detect (predict) critical situations
that may lead to e.g. hospitalization, and 3) provide feedback to the patient in the
form of appropriate instruction, education and/or motivation to improve their
condition.
Early and highly accurate detection of critical situations (e.g. HF hospitalizations
or symptom worsening), together with the reasons (the patient's conditions) behind
the detection, is very important. In this way, more patients will receive timely and
appropriate instructional, motivational and/or educational material (matched to those
reasons) to improve their condition. Detecting such situations and providing feedback
tailored to the patient is thus one of the main goals of an RPM system.
However, most of the educational and instructional material provided by today's
RPM systems is generic and given to all patients regardless of their personality,
current condition, or physical or mental state. Recent clinical studies show that
education and coaching tailored to the patient is a promising approach to increase
adherence to treatment and potentially improve clinical outcomes [31,29,40].
Additionally, although the large volumes of data collected by an RPM system
provide an opportunity for tailoring and personalizing information services, there is
limited understanding of the necessary architecture, methodology, and tailoring
criteria to facilitate personalization of the content.
1.2 Thesis Objectives and Methodology
Based on the problems discussed in the previous section, we define three main
objectives for this thesis:
• Design a general architecture of a system that allows personalization in RPM
• Present a KDD framework for discovering patterns and features for patient
modeling
• Focus on the prediction of heart failure (HF) hospitalizations
The last objective is especially important, as HF hospitalization is one of the main
critical situations; by predicting it, adaptation, motivation, instruction, etc. can be
provided by the system.
We define a possible architecture for the next generation of RPM systems
(Chapter 2), in which we place a generic knowledge discovery (KDD) framework
based on data mining approaches as an essential component. Besides the KDD
process, we show the other key components of the architecture, such as the patient
model, domain model, adaptation rules and adaptation engine.
The proposed knowledge discovery framework (Chapter 3) is essential for
discovering relevant, actionable patterns that are the basis for patient modeling, i.e.
the creation of the patient model and the adaptation rules. This process uses machine
learning and data mining techniques to define tasks from which patterns can be
learned. With this approach, many kinds of features can be constructed and several
learning algorithms can be tested. The overall KDD process is rather complex and
includes several steps: data exploration for a better understanding of the data, proper
data selection, data preprocessing, data transformation, feature space construction,
pattern discovery (data mining) and evaluation of the learned rules.
The problem of HF hospitalization prediction can be seen as a case study that
follows the general framework to discover rules able to predict HF hospitalizations.
More specifically, it is based on a classification data mining task, which aims to
classify from a patient's available medical data whether an HF hospitalization will
happen or not. In the next section we describe the general idea behind this
classification task in more detail.
1.2.1 Hospitalization Classification task
During this master's project we considered and experimented with three prediction
problems: prediction of HF hospitalization, prediction of symptom worsening and
prediction of the next symptom value. For each of the prediction problems we
followed the general framework that we present in this thesis. Because of this, and
for the sake of compactness, we focus in this thesis only on the hospitalization
prediction problem.
As explained, the prediction of HF hospitalization using daily measurements,
symptoms or other patient-related data can be seen as a classification task.
Definition 1 (Classification). Classification is the task of learning a target function
(classification model) F that maps each feature set X to one of the predefined class
labels Y [38].
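A minimal sketch in Python may help make Definition 1 concrete. The rules, feature names and thresholds below are purely illustrative assumptions, not models learned in this thesis (which uses rule learners, decision trees and SVMs):

```python
# Illustrative sketch of a target function F: X -> Y for the HF
# hospitalization task. Feature names and thresholds are hypothetical
# and chosen only to show the shape of a learned classification model.

def classify(features: dict) -> str:
    """Map a feature vector X to one of the class labels {existsHF, noHF}."""
    # Hand-authored rules of the kind a rule learner might produce.
    if features.get("weight_gain_kg", 0.0) > 2.0 and features.get("breathlessness", 0) >= 2:
        return "existsHF"
    if features.get("nyha_class", 1) >= 3:
        return "existsHF"
    return "noHF"

patient = {"weight_gain_kg": 2.5, "breathlessness": 2, "nyha_class": 2}
print(classify(patient))  # existsHF
```

In practice such a function is not written by hand but learned from training data, as described below.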
Figure 1.1 shows the general process of the classification task. In our case of HF
hospitalization prediction, symptoms or features extracted from the other data (e.g.
daily measurements, clinical visits, or medical history) represent the feature set. The
target function is the existence or absence of a next HF hospitalization, and it has two
class values: existsHF – there will be an HF hospitalization; noHF – there will not be
an HF hospitalization.
Figure 1.1 The general process of the classification task. Figure adapted from [38].
The main reason for applying classification is its ability to be used for predictive
modeling, i.e. the classification model is used to predict the class label of an unseen
instance. An additional reason is that we already have a clear distinction of class
labels (existsHF and noHF), so it is not necessary to search for class labels as with
clustering. An example of prediction for the hospitalization classification task is
shown in Table 1.1.
Table 1.1 An example of prediction using input features for unseen instances
Patient ID Features Exists next HF hospitalization?
P1 {f1,….,fn} existsHF
P2 {f1,….,fn} noHF
P3 {f1,….,fn} existsHF
Figure 1.2 provides a high-level overview of the contribution of the classification
task in this thesis. First, given training data (symptoms, measurements, clinical
visits, medical history, etc.), a (classification) model is learned. To learn
classification models, we conducted experiments using combinations of different
feature sets. One of the goals of this thesis was to include symptoms and other types
of data (e.g. medical history) in the prediction of HF hospitalizations, since earlier
studies tried to predict HF hospitalizations only from daily measurements (e.g.
weight). Therefore, many kinds of features were constructed, such as symptoms,
features from daily measurements, and features from medical history. We also used
different learning algorithms (rule classifier, decision tree and support vector
machine) with different parameter settings to learn the best-performing models. The
best learned models are then used in an online (real-time) setting to classify (predict)
for unseen records whether an HF hospitalization will happen or not.
Figure 1.2 High level overview of the classification task.
Additionally, if the unseen record is classified as a possible HF hospitalization,
education, motivation or instruction can be provided to the patient according to the
rules (patterns) of features that predict the possible hospitalization.
One of the main problems we faced for the HF hospitalization prediction problem
was the proper definition of the classification task (e.g. whether an HF
hospitalization happens within 14 days or 30 days). Another challenge was the
construction of the training instances. For example, we faced questions such as
"what are positive and negative instances?" and "what are the right features to
represent these instances?". The difficulties come from the fact that the different
types of data that we wanted to include in the learning process are gathered at
different time intervals (e.g. symptoms on a monthly basis, vital signs on a daily
basis).
Furthermore, an important challenge of this work was the proper evaluation of
the HF hospitalization prediction task. Although some studies have tried to predict
HF hospitalizations, so far there is no common way of evaluating such predictions.
We discuss these problems in more detail in Chapter 4 and Chapter 5.
1.3 Results
The results of this thesis comprise our research work on patient modeling in RPM
systems, especially on the problem of predicting hospitalizations due to HF.
Specifically, we developed an architecture for next generation RPM systems that
facilitates personalization of educational content and its delivery to patients. We
created a generic framework for personalization in RPM that includes machine
learning and data mining approaches, and provided illustrative examples drawn from
the analysis of data from a real clinical trial. With these examples we showed how
patient profiling and tailoring of the educational material can be achieved.
As for the problem of HF hospitalization prediction, we showed that using
machine learning techniques and combining daily measurements with symptoms and
other medical data gives better prediction results than current methods. We also
showed that symptoms are very good predictors and should therefore be taken into
account in the prediction. However, since symptoms are collected on a monthly
basis, more accurate results might be achieved if they were collected at shorter
intervals (e.g. every 15 days).
Furthermore, we showed that different features (e.g. symptoms) may have predictive power over different time intervals, and this information should therefore be used when an alarm is cast in a real-time setting. We developed a simple adaptive engine that uses the known predictive ability of each algorithm (which depends on the features it is built on) to generate alarms for a possible HF hospitalization adaptively.
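To illustrate, a toy version of such an adaptive alarm engine might combine feature-specific classifiers, each weighted by its known detection ability over its own prediction horizon. This is a minimal sketch, not the engine developed in this thesis; all names and numbers below are hypothetical.

```python
# Hypothetical sketch of an adaptive alarm engine: each feature-specific
# predictor carries the horizon over which it is reliable and an estimated
# detection rate, and fired predictors are combined into one alarm score.
from dataclasses import dataclass

@dataclass
class Predictor:
    name: str               # feature set the classifier was built on
    horizon_days: int       # how far ahead this predictor is reliable
    detection_rate: float   # estimated true-positive rate on held-out data

def alarm_score(predictors, fired):
    """Combine fired predictors into a single alarm confidence in [0, 1].

    `fired` is the set of predictor names whose classifier raised a
    positive prediction for the current day.
    """
    total = sum(p.detection_rate for p in predictors)
    score = sum(p.detection_rate for p in predictors if p.name in fired)
    return score / total if total else 0.0

# Hypothetical predictors and detection rates (illustration only):
predictors = [
    Predictor("symptoms", horizon_days=30, detection_rate=0.80),
    Predictor("weight", horizon_days=7, detection_rate=0.65),
    Predictor("blood_pressure", horizon_days=7, detection_rate=0.55),
]

# A symptom-based and a weight-based classifier both fired today:
score = alarm_score(predictors, fired={"symptoms", "weight"})
cast_alarm = score > 0.5   # threshold is an illustrative choice
```

An alarm is then cast only when enough reliable predictors agree, rather than on any single deviation.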
In general, patient modeling with machine learning is very useful for providing tailored materials in RPM systems. Together with the high detection rate achieved in the HF hospitalization prediction task, it allows early detection of worsening situations (e.g. HF hospitalization) and thereby a possible improvement of the patient’s condition and a reduction of hospitalization costs.
1.4 Organization of the thesis
The structure of this thesis is as follows. Chapter 2 provides background on the current state of the art of RPM systems and introduces a new architecture for next generation RPM systems. Chapter 3 outlines the steps taken according to the general
framework for knowledge discovery to discover patterns for patient user modeling.
Chapter 4 explains our approach to the HF hospitalization prediction problem.
Chapter 5 presents a case study of the TEN-HMS database for the HF hospitalization
prediction problem. The experiments and evaluation of our approach are also shown
in this chapter. Finally, in Chapter 6 we present the conclusions from our work and
directions for future work.
Chapter 2
RPM Systems
In order to maintain and improve the quality of care without exploding costs, healthcare systems are undergoing a paradigm shift from patient care in the hospital to patient care at home [9]. In that context, remote patient management (RPM) systems offer great potential in reducing hospitalization costs, mortality rates, and worsening of symptoms for patients with chronic diseases, e.g., coronary artery disease, heart failure and diabetes.
In this chapter we first review the state of the art in RPM systems. We then proceed to identify why adaptation and personalization are a challenge from both a technical and a clinical point of view. Finally, we present a possible architecture for the next generation of RPM systems.
2.1 Current state of the art
The aim of using RPM systems is to improve the quality of care and reduce costs by using technology for cost-effective early detection of a worsening patient health status, and to support professionals in providing timely feedback that helps the patient cope with the condition or in intervening on time.
Existing commercial RPM systems normally provide an end-to-end infrastructure
(as shown in Figure 2.1) that connects patients at home with health professionals at
their institution. The patients at home are equipped with a number of sensors
measuring vital signs to obtain objective measurements about their physical condition.
Patients are typically monitored on several vital signs depending on the chronic
disease in question, e.g., weight and blood pressure for heart failure patients, glucose
and weight for diabetes patients.

Figure 2.1 Basic architecture of an RPM system

The vital sign measurements are transferred via an application hosting device to the monitoring and management server. The subjective
measurements such as symptoms and quality of life (QoL) scores are also collected
from the patients via questionnaires. The questionnaires can be presented to the patient directly via the application hosting device or via a feedback device such as a TV. Objective and subjective measurements (referred to as RPM data) are presented to the medical professional, who, based on the indicated deviations from normal values, adjusts the patient’s treatment plan, including medications and lifestyle goals
(nutrition and physical activity).
The majority of commercial RPM systems have only the link from the patient to the professional that enables uploading patient data for review and treatment changes; these systems are typically referred to as remote patient monitoring systems, as they provide only the monitoring part, not the management part.
Monitoring solutions, e.g., HealthHero [21] and HomeMed [22], do not have a separate feedback device but collect subjective data via an application hosting device. Patient coaching and education has been found to be an influential factor for adherence to the treatment, both medication treatment and lifestyle adjustments [26]. Management solutions have a feedback loop to the patient that enables the professional to provide appropriate education and counseling (coaching of the patients) via the feedback device. Along these lines, a number of RPM systems provide
educational material for the patients to help them cope with their condition. Table 2.1
shows a comparison of representative RPM systems that provide both measurements and education for managing chronic heart failure, namely solutions
from Health Hero Health TV, BL Healthcare TVx [23], Card Guard iTV [24], and
Philips Motiva [25].
Table 2.1. Comparison of RPM systems

                                         Health   BL       Card    Philips
  Comparison wrt:                        Hero     Health-  Guard   Motiva
                                         Health   care     iTV
                                         TV       TVx
  Objective       weight                 ●        ●        ●       ●
  measurements    ECG                    ●        ●
                  BP & pulse             ●        ●        ●       ●
                  SpO2                   ●        ●        ●
                  temperature            ●
                  fluid status
                  glucose                ●        ●        ●       ●
                  peak flow              ●        ●        ●
                  PT/INR
  Subjective      questionnaires         ●        ●        ●       ●
  measurements
  Education       nutrition              ●        ●        ○       ●
  and             physical activity      ●        ●        ○       ●
  counseling      smoking                ●        ○        ○       ●
                  stress                 ●        ○        ○       ●
                  sleep disorders        ●        ○        ○
                  weight reduction       ●        ●        ○       ●
                  worsening signs        ●        ●        ○       ●
                  depression             ●        ○        ○       ●
                  video conf.            ●        ●        ●

  ● – covered by the RPM system, ○ – partly covered by the RPM system
RPM solutions can also be found in the research community. For example, the RPM system described in [27] also visualizes information for the patients on the TV, while presenting the information needed by the healthcare personnel on their PC. The system supports a Patient Health Diary with disease related questions, vital signs measurements such as heart rate, oxygen saturation, and blood glucose values, and other sensor data. Similarly, the C-Monitor system provides medical information and manages the medical staff and patients involved in the disease management [28]. The system supports two workspaces: one for the doctors (to monitor patients’ condition and adjust therapy) and one for the patients (to post symptoms, read documents regarding specific diseases, and exchange information with the responding medical professional). The system allows the delivery of personalized documents to the patients, such as disease information, healthy lifestyle recommendations, suggestions on diet, etc.
2.1.1 Data description
Table 2.2 gives an overview of different types of data that are normally collected
when the patient is using an RPM system. Vital signs and questionnaires are normally collected by the RPM system from the measurement devices in the patients’ home, as described in the previous section. Occasionally, the answers to the questionnaires are also collected by a medical professional (e.g., a nurse) via a phone conversation and are stored in the RPM database. Collection of data via telephone contacts is not done on a daily basis but rather on a weekly or monthly basis.
Table 2.2 An overview of different types of data

  Data class       Data                       Collected via                       (Typical) frequency
  ---------------  -------------------------  ----------------------------------  --------------------------------
  Medical history  Causes,                    Face to face meeting at a           Once, when the diagnosis for
                   co-morbidities,            medical professional’s              the chronic condition is made
                   prior hospitalizations,    institution
                   implanted devices
  Baseline data    Vitals, height,            Face to face meeting at a           Every few months, during a
                   other diagnoses,           medical professional’s              regular follow-up
                   lab results                institution
  Vital signs      Weight, blood pressure,    An RPM system at the                Daily
                   pulse                      patient’s home
  Questionnaires   Symptoms, depression,      Several alternatives:               Varies depending on the
                   anxiety, overall health,   - an RPM system at the              protocol of care; can be
                   overall QoL, stress,         patient’s home, but also          collected:
                   sleep patterns, fatigue,   - via a telephone contact by        - daily (RPM)
                   loneliness                   a medical professional            - weekly (RPM)
                                              - via a face to face meeting       - monthly (telephone)
                                                during regular checkups at       - every few months (face to
                                                a medical professional’s           face meetings)
                                                institution
  Bio-markers                                 Face to face meeting at a           (Few) months
                                              medical professional’s
                                              institution
  Medications      Disease related drugs,     - Via a telephone contact by        A few weeks to a few months
                   non-disease related          a medical professional
                   drugs                      - via a face to face meeting
                                                during regular checkups
Medical history data are collected during the first clinical visit, i.e., the visit where the diagnosis of the condition is made and the initial therapy is prescribed, while the baseline data are collected at the first visit and re-measured at every subsequent clinical follow-up visit. Labs are currently only collected at the clinical visits and are very important for adjusting medications during the visit. Medications could potentially (depending on the local protocols of care) also be changed over the phone.
2.2 Adaptation challenge
In the TEHAF clinical study [29], nurses used the Health Buddy system to deliver educational material to chronic heart failure patients. During the study, the nurses observed their patients over a period of time, learning their behavioral characteristics, knowledge level and health state, and concluded that adjusting the content of the educational material based on symptoms, knowledge and behavior is beneficial for the patients. They then designed a simple manual adaptation scheme for the educational material (mostly in terms of the quantity of education) and used that scheme to deliver the educational material to their patients, as illustrated in Table 2.3. Similarly, a need for tailoring educational material has been identified in the recent COACH study [31].
Table 2.3. Four heart failure management programs for four groups of patients [29]

  Program #  Duration (days)  Symptoms  Knowledge &        Benefits
                                        behaviour change
  1          90               ↑         ↓                  High monitoring, high education
  2          30               ↑         ↑                  High monitoring, low education
  3          90               ↓         ↓                  Low monitoring, high education
  4          180              ↓         ↑                  Low monitoring, low education

  ↑ (symptoms) patient is exhibiting an increasing number of symptoms
  ↓ (symptoms) patient is stabilizing and has a decreased number of symptoms
  ↑ (knowledge/behaviour) patient is showing knowledge of the disease and
  behaviour in line with the treatment
  ↓ (knowledge/behaviour) patient is not showing knowledge and does not
  have behaviour in line with the treatment
As can be observed, commercial systems typically focus on raising alarms to the health professional based on the status of the patient’s vital signs and their deviations from normal (baseline) values. These systems typically send the same content to all patients, regardless of their current health condition, knowledge level, or mental state. For example, the Philips Motiva [25] system sends messages and videos to all patients, regarding their health conditions, at predefined time intervals. A step further are research RPM systems that provide some aspects of personalization. However, this personalization is still limited and does not exploit the available RPM data for adapting the educational content toward specific patient needs.
Research on personalization is ongoing in e-Learning, and there are a number of successful implementations of adaptive hypermedia systems such as AHA! and Interbook [32]. However, existing architectures have not been adopted in eHealth applications such as RPM systems. Furthermore, in the mentioned systems, the adaptation and personalization is pre-authored and thus remains highly static and often subjective, based on some domain expertise translated into machine readable form. In the next section we suggest a general architecture of a personalized RPM system in which we follow the general principles of personalization in e-Learning systems, with the KDD process as one of the key integrated components.
2.3 Next generation adaptive RPM systems
Given the challenge identified in the previous section, we outline a part of the
architecture that provides a possible foundation for the next generation adaptive
eHealth systems (Figure 2.2).
The key components of the system that facilitate personalization and adaptation
include: (1) patient (user) model, (2) domain model, (3) adaptation rules, (4)
adaptation engine, and (5) knowledge discovery process. Further, there are authoring
and management tools allowing medical experts and professionals to monitor, control
and manage patient models, domain models and adaptation rules.
The knowledge discovery in databases (KDD) process is essential for discovering relevant actionable patterns that form the basis for the creation of the patient model and the adaptation rules. This KDD process is (initially) done “off-line”, using
stable historical data available from an existing RPM database or from completed
clinical trials relevant for the disease in question. Via this knowledge discovery
process we obtain relevant patterns that are used to build a patient model template.
The same patterns are utilized to build the adaptation rules and domain model of the
available content material that is stored in corresponding databases. The KDD process
is highly iterative and interactive and involves considerable effort from domain and
KDD experts. Moreover, this is by no means a one-time activity.

Figure 2.2 A high level view of the next generation RPM

With the accumulation of new evidence and possible contextual changes, models and rules might need an update or extension. We discuss the KDD process in detail and give examples of a patient
model and adaptation rules in Chapter 3.
Other processes, including actual adaptation of the content, are executed “on-
line”, during the use of the system. Namely, for each patient that uses the system, the
patient model template is instantiated into a (personal) patient model which is stored
in the patient models database and updated regularly, e.g., when relevant information
becomes available in the RPM database. The adaptation engine takes the patient
model and domain model, and generates personalized content, e.g., educational,
instructional, motivational, or alerting. The content is then presented at the patient’s feedback device, e.g., a TV, a smart phone, or a computer, and/or at the medical professional’s side (alerting). Based on the patient’s usage of the system and his/her health behavior characteristics, the patient model will be updated. Potentially, adaptation
strategies, or individual rules can be automatically revised (based on new evidence) or
a corresponding alert can be sent to a human expert.
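The on-line loop described above (template instantiation, adaptation rules, content selection) can be sketched in a few lines. This is an illustrative sketch only; the model slots, the rule, and the content items are hypothetical, not part of any actual system described here.

```python
# Hypothetical sketch of the on-line adaptation loop: a patient model
# template is instantiated from RPM data, and an adaptation rule maps
# the model state to an item from the domain model (available content).

# Patient model template with empty slots, instantiated per patient
template = {"symptoms_increasing": None, "knowledge_ok": None}

def instantiate(template, rpm_data):
    """Fill the template slots from data in the RPM database."""
    model = dict(template)
    model.update(rpm_data)
    return model

# Domain model: available educational content, indexed by strategy
domain_model = {
    "high_education": "video: managing your fluid intake",
    "low_education": "short reminder: weigh yourself daily",
}

def adaptation_engine(model, domain_model):
    """A single illustrative adaptation rule mapping state to content."""
    if model["symptoms_increasing"] and not model["knowledge_ok"]:
        return domain_model["high_education"]
    return domain_model["low_education"]

patient = instantiate(template, {"symptoms_increasing": True,
                                 "knowledge_ok": False})
content = adaptation_engine(patient, domain_model)
```

In a real system the rules would be derived from the KDD process and revised as new evidence accumulates, rather than hard-coded as here.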
Developing personalized RPM systems or adaptation rules is possible only if we
can learn key (potentially changing and dynamic) characteristics of the patients and
track them continuously. Personalization can be organized using individual and group
(or stereotype) user modeling. In a stereotype approach, the users are classified into
several groups. In eHealth applications users can be classified according to their main
disease, background in medicine (patients, nurses, and physicians), general education
background (no degree, college degree, doctorate, etc), and their tasks (consultation,
education, and emergency cases). Individual patient (user) models, besides the user’s
TU Eindhoven 25 G. Manev
medical profile, could include also individual characteristics such as cognitive and
psychological individual peculiarities, the interaction parameters – the last visited
pages, used links, number of the particular pages visits, resource usages, etc.
Table 2.4 gives an overview of possible features of various data classes that can play a role in the patient model of an RPM system. A feature can be static, e.g. gender, residence, language; relatively static, e.g. age, or cognitive impairment (which a patient can develop during the usage of the RPM system); or dynamic, e.g. values of weight measurements or system usage. The example given is for heart failure, but it can be generalized to any chronic disease given a specific set of relevant symptoms and vital signs for that disease.
Table 2.4. Typical features included in a patient model template

  Data class                Feature                   Static  Dynamic
  ------------------------  ------------------------  ------  -------
  Demographic               Gender                    x
                            Age                       x
                            Country                   x
                            Language                  x
  Living status             Single/Family             x
  Baseline data             Weight                    x
                            Height                    x
                            Body Mass Index           x
                            Edema                     x
                            Biomarker values          x
  Medical history           Cause of disease          x
                            Co-morbidities            x
                            Implantables              x       x
  Symptoms                  Ankle swelling                    x
                            Breathlessness                    x
                            Depression                        x
                            Anxiety                           x
  Vital signs               Weight                            x
  (frequencies of values    Heart rate                        x
  out of band)              Blood pressure                    x
                            Diastolic blood pressure          x
  System usage              Weight                            x
  (frequency of             Blood pressure                    x
  measurements)             Heart rate                        x
  Learning styles           Verbaliser/Imager         x
                            FD/FI                     x
  Cognitive function        Reduced eyesight                  x
                            Dementia                          x

  Legend: Frequency of vital sign measurements - how often the patient
  has been using a sensor for measurements (1 - every day, 0 - not at all);
  FD/FI - field dependent/independent.
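The static/dynamic split of such a patient model template could, for instance, be captured in a small record type. The sketch below is illustrative only; the field names are assumptions, not a prescribed schema.

```python
# Illustrative patient-model record separating static features (set once),
# relatively static features (re-checked at follow-ups) and dynamic
# features (appended from daily RPM measurements). Field names are
# hypothetical, loosely following Table 2.4.
from dataclasses import dataclass, field

@dataclass
class PatientModel:
    # static features, set once at enrollment
    gender: str
    country: str
    language: str
    # relatively static, re-checked at clinical follow-ups
    age: int
    cognitive_impairment: bool = False
    # dynamic features, updated from daily RPM measurements
    weight_kg: list = field(default_factory=list)
    heart_rate_bpm: list = field(default_factory=list)

    def add_weight(self, value):
        """Append one daily weight measurement (kg)."""
        self.weight_kg.append(value)

p = PatientModel(gender="F", country="NL", language="nl", age=67)
p.add_weight(81.4)
p.add_weight(82.0)
```

The static part would be filled from baseline and demographic data, while the dynamic lists grow as the RPM database receives new measurements.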
Chapter 3
KDD Process
3.1 A knowledge discovery framework
In this section we present a framework for knowledge discovery in databases (KDD) for patient modeling. The framework is generic: it has been adapted from the well-known KDD process presented in [33] (Figure 3.1) and applied to our problem of pattern discovery, specifically to the case of HF hospitalization prediction. Figure 3.2 shows our modified KDD framework.
Figure 3.1 KDD process from [33]: understanding of the problem, exploration and selection of data, preparation of data, data mining, and evaluation/interpretation, successively transforming data into target data, preprocessed data, patterns, and finally knowledge.
The first step consists of problem definition and understanding. Different
problems may be defined, such as the HF hospitalization prediction problem
discussed in Chapter 4. The input to the second step of the framework, data exploration and selection, consists of the patient’s data (demographic data, medical history), monthly contact data, data from clinical visits, and daily measurement data. We used data from the well-known TEN-HMS study [30] as a case study.
Figure 3.2 Modified KDD framework for pattern discovery. Starting from the RPM database (all data types: vitals, usage, medical history, clinical study protocol, etc.), relevant data types are selected and cleaned (statistical analysis, outlier detection, data cleaning) to obtain an RPM DB subset. Data exploration (event-pattern analysis and time-series analysis) aims at identifying potential correlations between subjective and objective measurements and at identifying and constructing relevant features. Feature extraction and construction yields meaningful features that can bring interesting results, i.e., a data view ready to be mined. Pattern discovery (data mining: classification, association analysis, subgroup discovery, emerging pattern mining, clustering) produces all patterns, from which useful actionable patterns (features relevant for the patient model, association and classification rules) are selected manually by a domain expert.
The second step consists of data exploration, for a better understanding of the data, and relevant data selection. Using explorative data analysis, outlier detection and data cleaning approaches, we performed basic data preprocessing and selected a subset of relevant data. This step is particularly useful because we used a database from a clinical trial, which has a more elaborate data set. Further, we performed visual data exploration, including visualization of event data and time-series data. All data explorations are performed in iterative and interactive steps to get a better understanding of which features, and which relations between them, may potentially describe the patient’s current state and its short-term and long-term dynamics. Next, based on the findings from the data exploration, we performed further data preparation, including data selection, more complex data preprocessing, transformation, and feature extraction and construction. In the data mining (pattern discovery) step, we show how data mining can be used to discover patterns, and we define the data mining techniques used in our work. We used three classification techniques, namely a support vector machine (SVM), a rule classifier and a decision tree (DT), for our task of HF hospitalization prediction, but other data mining methods can be used for pattern discovery as well. Finally, an example set of patterns that can potentially be used for creating the patient model and adaptation rules is presented.
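As an illustration of the pattern-discovery step, the sketch below trains two of the classifiers named above (an SVM and a decision tree) with scikit-learn. This is not the thesis setup: the features, labels and library choice are assumptions made for the example, and the data is synthetic.

```python
# Illustrative classification sketch on synthetic data: two of the
# classifier families used for HF hospitalization prediction (SVM and
# decision tree), trained and scored on a held-out split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical toy features: [7-day weight change (kg), symptom severity]
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0.5).astype(int)   # 1 = hospitalization (invented rule)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

accuracies = {}
for clf in (SVC(kernel="rbf"), DecisionTreeClassifier(max_depth=3)):
    clf.fit(X_tr, y_tr)                       # learn on the training split
    accuracies[type(clf).__name__] = clf.score(X_te, y_te)  # held-out accuracy
```

With real RPM data the split would have to respect time (train on the past, test on the future), a point taken up in the evaluation discussion of Chapter 4.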
As mentioned, the HF hospitalization prediction task is not defined in this section, as we devote a separate chapter to this problem. Nor do we define here how the evaluation is performed, as different problems and different techniques can be evaluated in many different ways. Specifically for the HF hospitalization prediction task, the evaluation is defined together with the definition of the task, in Chapter 4.
In the remainder of this chapter we describe each step of the KDD framework.
3.2 Data
In this thesis we used data from the well-known TEN-HMS (Trans-European Network-Home-Care Management System) study [30]. This study was conducted over a two-year period with the aim of monitoring patients with cardiovascular diseases. In this period, 421 patients from several hospitals in the Netherlands, the United Kingdom and Germany were randomly assigned to 3 different groups: usual care (UC, 85 patients), nurse telephone support (NTS, 170 patients) and home telemonitoring (HTM, 166 patients).
At the beginning of the study (enrollment), baseline characteristics such as demographics (gender, age, etc.), laboratory exam data (sodium, potassium, weight, height, etc.) and medical history (e.g. primary cause of HF, co-morbidities, implanted devices, etc.) were gathered. Additionally, patients filled out two questionnaires to assess their condition with respect to HF symptoms: Quality of Life Symptoms (QoL) (e.g. anxiety, depression, loss of memory, etc.) and European Quality of Life (EuroQoL) (e.g. mobility, stress, self care, etc.). Also, at this stage the medication was adapted to the patient’s health status. The questionnaire data and the laboratory exams were also collected every four months, when patients had a regular clinical visit (face-to-face contact). Besides the clinical visits, patients in the NTS and HTM groups were contacted by a heart failure specialist nurse via telephone on a monthly basis to assess the patient’s medication, symptoms and the number of visits/contacts (at home, by phone, at the office, or at the clinic) that the patient had in the last month (whether with a physician, investigator, specialist or nurse). In addition to the already collected data, patients in the HTM group were asked to measure their weight, blood pressure and heart rate twice daily (before breakfast and before the evening meal).
Finally, the TEN-HMS database contains records of patients’ hospital admissions. These records contain information related to the hospitalization, such as the reasons for admission, the number of days stayed in the hospital, the complete treatments during the hospitalization, etc.
3.3 Data exploration
Data exploration is a preliminary investigation of the data in order to better understand its specific characteristics. We conducted statistical analysis on the TEN-HMS data directly in the database using SQL queries. Besides the statistical analysis, we conducted visual data exploration, namely visual exploration of event data and of time-series data. Visualization of the data can be helpful for discovering particularly interesting characteristics (patterns), and for seeing which features and relations between them may potentially describe the patient’s current state and its short-term and long-term dynamics. All data explorations are performed in iterative and interactive steps. In this way, some results found with the statistical analysis are confirmed by the visual exploration, or the other way around.
3.3.1 Visual exploration
Visual exploration of the data is important for getting a first impression of how different events are related to each other. It can help us easily identify correlations between different types of data, and find extreme situations (outliers) or other interesting (or uninteresting) situations. In other words, it can give us indications of what can be reduced and what can be used for the pattern discovery.
3.3.1.1 Event pattern analysis
The TEN-HMS database contains dynamic data, because it records different events that happen in a certain time period. Each of the various data items described may represent such an event (e.g. enrollment in the study). A log of all events for all patients can be constructed, plotted and analyzed on a dot chart.
We conducted event pattern analysis on dot charts, which are similar to a Gantt chart in that they show the spread of events over time by plotting a dot for each event in the log. We used the DCA (Dot Chart Analysis) plug-in of the ProM 5.0 open source process mining tool [34] for this analysis, and the ProM Import framework [35] to construct the log of events used by the DCA plug-in.
The chart has a few (orthogonal) dimensions: one showing the time of the event, and the others showing (possibly different) components of the event, e.g. the patient ID and the task ID. Time is measured along the horizontal axis. The first considered component is shown along the vertical axis, in boxes. The second component of the event is given by the color and/or shape of the dot.
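A minimal dot chart in this spirit can be drawn with matplotlib, standing in for the ProM DCA plug-in; the event log below is synthetic, and all identifiers and event types are invented for the example.

```python
# Minimal dot-chart sketch: time on the x-axis, one row per patient,
# dot color encoding the event type. Synthetic data only.
import matplotlib
matplotlib.use("Agg")                 # render off-screen, no display needed
import matplotlib.pyplot as plt

# Hypothetical event log: (patient_id, day since enrolment, event type)
log = [
    ("P1", 1, "measurement"), ("P1", 2, "measurement"),
    ("P1", 30, "monthly_contact"),
    ("P2", 5, "measurement"), ("P2", 12, "hospitalization"),
    ("P2", 40, "measurement"),
]
colors = {"measurement": "tab:blue",
          "monthly_contact": "tab:green",
          "hospitalization": "tab:red"}

patients = sorted({pid for pid, _, _ in log})
row = {pid: i for i, pid in enumerate(patients)}   # one row per patient

fig, ax = plt.subplots()
for pid, day, event in log:
    ax.scatter(day, row[pid], color=colors[event])
ax.set_yticks(range(len(patients)))
ax.set_yticklabels(patients)
ax.set_xlabel("days since enrolment")
fig.savefig("dot_chart.png")
```

Sparse rows (few dots) then stand out immediately as candidates for data reduction, as discussed below.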
The value of visual inspection of event patterns using dot charts is threefold. Firstly, we get an insight into the frequency and precedence of events starting from the start of the clinical study, from which we are able to decide what the real start of the study was and when it stopped. Figure 3.3 illustrates the frequency of events on a timeline, starting from the patient’s enrolment in the program until the last recorded event. Secondly, we get clear candidates for data reduction. For example, it is easy to identify patients who used the system for a very short time period or had too few daily measurements (sparsely dotted lines in Figure 3.3). It is also easy to identify events of a patient that should not be taken into account in further analysis, e.g., a monthly contact via phone or a measurement that took place while the patient was in the hospital. Some outlier events can also be noticed relatively easily, e.g. when a monthly contact via phone happened while the patient was in the hospital, or when someone else used the measurement equipment while the patient was in the hospital.
Figure 3.3 Dot chart analysis of usage data.
Lastly, the true value of event-pattern analysis lies in the identification of interesting patterns. We found, for example (see the top part of Figure 3.3), that: (i) a number of patients measured themselves during working days, but not during weekends, which points to the influential role of lifestyle habits in the use of the system; (ii) patients start using the measurement devices more (or less) before and/or after contact with medical professionals, e.g. if a patient has not measured himself for some time, a clinical visit or a monthly contact event triggers the patient to restart measuring, which implies a possibly strong effect of communication on the patients’ motivation to use the system; and (iii) patients may stop measuring themselves after clinical visits or monthly phone contacts, and then resume measuring after a couple of days, which is a potential indicator of a worsening of the patient’s condition (not being able to measure), of an improvement of their condition (they get reassurance from their caregivers that they are doing well), or of de-motivation by the contact.
3.3.1.2 Visual exploration of time series data
While event-pattern analysis provides us with patterns of events as discussed above, it does not provide information on how the values of daily measurements or symptoms change over time, or how these values are correlated with each other or with hospitalization (due to HF or other causes).
We used visual exploration of time-series data for a detailed inspection of changes and correlations in the values of daily measurements and symptoms (and of their correlation with hospitalizations) during a certain period. We visualized the daily measurement data (weight, blood pressure and heart rate), hospitalization events, and the events from which we obtain information about symptoms. We did this analysis in a number of consecutive steps, using two different visualizations.
All patients, one symptom, only one vital sign
For all patients and one symptom (e.g. breathlessness limits activity) we performed a visual inspection of the correlation of one vital parameter (e.g. weight) with that symptom; on a single plot, sub-plots for all patients are shown.

Figure 3.4. Example weight timeseries for three different patients for the breathlessness limits activity symptom (G, S, A and B values).

The investigation is repeated for different combinations, using the most relevant
symptoms and different vital signs (weight, blood pressure and heart rate).
Additionally, there is a further distinction regarding whether the symptoms are plotted using the G (Good, no problems), S (Small amount of problems), A (Average), and B (Bad, many problems) statuses or the W (Worse), I (Improved), and S (Same, not changed) statuses. An example of weight timeseries for three patients and the breathlessness limits activity symptom is shown in Figure 3.4.
One patient, one symptom, all vital signs
This visualization is similar to the previous one, but instead of plotting only one vital sign (e.g. weight) for all patients on the same plot, we visualized all daily measurement signs and the most relevant symptom(s) on the same plot for a single patient (Figure 3.5). Additionally, a more detailed plot for a specific vital sign (e.g. weight) and time period is constructed (e.g. Figure 3.6). As before, the investigation is repeated for different symptoms, again making the distinction regarding the usage of symptom statuses (G, S, A and B) or (W, I and S).
Figure 3.5. Example weight, heart rate, and blood pressure measurements of a single patient for the breathlessness limits activity symptom (G, S, A and B values).
Figure 3.6. Zoom-in of Figure 3.5.
Figure 3.7. Zoom-in of Figure 3.5.
Benefits of visualization of timeseries data
This process helps to visualize the huge amount of measured data; more specifically, it helps:
• to get an overview of a patient’s daily measurements, symptoms and hospitalizations;
• to get an impression of how the measurements (e.g. weight) behave before and/or after symptom statuses (their change: worse, improved or not changed, or simply the different symptom values: G, S, A, and B) or before and/or after a hospitalization (due to heart failure or other causes);
• to get an impression of how the symptom statuses change over time, particularly how they behave before and/or after a hospitalization (due to heart failure or other causes).
From Figure 3.5 (and the zoomed regions in Figure 3.6 and Figure 3.7) we get an overview of how weight correlates over time with hospitalizations, and of how symptom statuses correlate with hospitalizations (e.g. in the shown zoomed regions the symptom statuses before a HF hospitalization are either A or B for breathlessness limit activity, a prominent symptom of HF). In addition to the well-known heart failure feature (rapid weight increase before hospitalization), this rather simple approach helped us to discover additional interesting symptom- and weight-related features that we then used in the next step of knowledge discovery, the data mining tasks.
3.4 Data preparation
Data preparation is an important step in data mining that includes data cleaning, data selection, data preprocessing, transformation, and variable extraction and selection. Well-prepared data yields good results, while poorly prepared data usually leads to failure of the data mining step. The findings obtained in the data exploration step are used to perform this complex and time-consuming step. In this section we briefly explain these substeps.
3.4.1 Data cleaning
In the data cleaning step we deal with data anomalies, namely missing values, outliers (noise), duplicate data and wrong data. Data anomalies can significantly decrease the performance of models, so it is recommended to detect and remove them or replace them with normal values.
Missing values represent data that is not known. Many available data mining algorithms can deal with missing data. However, if there is sufficient data and the proportion of missing data is low, it is recommended to remove data points with missing values. In this thesis, for training we remove instances whose data is completely missing, while instances with only a few missing values are handled by the data mining algorithms.
Outliers represent values that are far away from the majority of the data. They can be extreme cases, measurement errors or other anomalies. Because they are very different from the other data values, outliers do not train well and can significantly degrade model performance. In this thesis work we remove outlier instances.
Duplicate data. For unknown reasons (possibly introduced during data migration) there were a few duplicate records in the database. Duplicate records were noticed in the pulse and blood pressure measurement data.
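As an illustration, the duplicate and outlier removal described above could be sketched as follows (in Python, although the thesis itself used SQL and Java for data preparation). The median/MAD outlier rule and the threshold `k=3.0` are our illustrative assumptions, not the exact procedure used in the thesis:

```python
def clean(measurements, k=3.0):
    """Drop exact duplicates, then drop outliers whose distance from the
    median exceeds k times the median absolute deviation (MAD)."""
    unique = list(dict.fromkeys(measurements))   # keeps first occurrence
    s = sorted(unique)
    n = len(s)
    median = (s[n // 2] + s[(n - 1) // 2]) / 2
    devs = sorted(abs(v - median) for v in unique)
    mad = (devs[n // 2] + devs[(n - 1) // 2]) / 2
    if mad == 0:                                 # all values identical
        return unique
    return [v for v in unique if abs(v - median) <= k * mad]

# duplicated reading (80.1) and a clearly wrong weight entry (800.0)
print(clean([80.1, 80.1, 79.8, 80.4, 80.0, 79.9, 800.0]))
# → [80.1, 79.8, 80.4, 80.0, 79.9]
```

A median-based rule is used here because the mean and standard deviation are themselves distorted by the very outliers one wants to remove.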
3.4.2 Data selection
In this step proper (target) data is selected that will be used in the next steps of the
KDD process. For matters of convenience and readability the data selection is
presented in the case study (Chapter 5).
3.4.3 Data preprocessing
Data preprocessing plays a major role in the preparation step. Because the original data is not in a format suitable for the data mining step, and some information is usually hidden, the data must first be preprocessed so that it contains the proper information. Preprocessing is a very time-consuming task that usually takes around 80% of the time of the KDD process. In this thesis the complete preprocessing step (Figure 3.8) is a complex task performed by a set of SQL statements for simple preprocessing and a Java application for the more complex preprocessing.
[Figure: the target data flows through a set of SQL preprocessing steps (selections, insertions, updates, functional views, procedures) producing modified tables, and through a Java application producing new, derived and modified tables; the preprocessed data feeds HF hospitalization prediction, symptom worsening prediction, and next symptom value prediction.]
Figure 3.8. Data preprocessing step for different problems (e.g. HF hospitalization prediction).
3.4.4 Data transformation
Data transformation is a substep of the preparation in which data values are transformed into new values. The motivation for transformation is to reduce the dimensionality of the data. The input variables in this study may be numerical (e.g. potassium) or categorical (e.g. primary cause of HF). Depending on the attribute, we transformed the data using different techniques, namely discretization, normalization or categorical transformation.
Discretization. With discretization, numeric attributes are discretized, i.e. mapped to categorical values. It is used to reduce the dimensionality of the attribute value set, as high dimensionality degrades classifier performance. We applied two types of discretization:
• Supervised discretization - numeric attributes are discretized according to the technique presented in [36]. This technique discretizes the range of a numeric attribute in the dataset into nominal values using the class label, such that the number of instances covered by each new categorical value is approximately equal.
• Unsupervised discretization - in contrast to supervised discretization, unsupervised discretization uses simple binning, i.e. it divides the range of numeric values into a user-defined number of bins. For example, if the range of numeric values is from 0 to 100, setting the number of bins to 5 yields the new categorical values [0-20], (20-40], (40-60], (60-80] and (80-100].
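The equal-width binning just described can be sketched as follows; the right-closed interval convention matches the example above, but the function itself is only an illustrative sketch, not the actual implementation used in the thesis:

```python
import math

def equal_width_bins(values, low, high, n_bins):
    """Map numeric values to 0-based equal-width bin indices over [low, high].
    Bins are right-closed -- [0-20], (20-40], ... for low=0, high=100, 5 bins --
    with `low` itself falling into the first bin."""
    width = (high - low) / n_bins
    indices = []
    for v in values:
        v = min(max(v, low), high)               # clamp to the known range
        idx = math.ceil((v - low) / width) - 1   # right-closed bin index
        indices.append(max(idx, 0))
    return indices

print(equal_width_bins([0, 20, 21, 55, 100], 0, 100, 5))  # → [0, 0, 1, 2, 4]
```

Note that 20 falls into bin 0 and 21 into bin 1, matching the right-closed intervals in the example.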
Normalization is used to rescale numeric values onto a scale from 0 to 1. This is needed because different attributes may have widely varying ranges of real values.
Categorical transformation. We used this transformation to map categorical attributes to a smaller set of new categories.
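Min-max normalization onto [0, 1] can be sketched as follows (the sample values are illustrative, not taken from the study data):

```python
def min_max_normalize(values):
    """Rescale numeric values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# e.g. potassium measurements (mmol/l, illustrative values)
print(min_max_normalize([3.5, 4.0, 4.5, 5.5]))  # → [0.0, 0.25, 0.5, 1.0]
```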
3.4.5 Feature extraction and selection
As with the data selection, a more detailed feature extraction is presented in the case study. This section gives an overview of the different feature sets that we considered for the different data mining tasks: prediction of HF hospitalization, prediction of worsening symptom and prediction of the next symptom status. However, as stated earlier, in this thesis we focus only on HF hospitalization prediction. We constructed features based on the findings obtained in the data exploration. In general, for the HF hospitalization prediction task, we constructed six different types of feature sets: the General HF (GHF) feature set, features from symptoms (S), features from daily measurements (D), medical history (H) features, features from clinical visits (C), and, as the sixth, the result of applying feature selection to the union of all feature sets. Other tasks, as Figure 3.9 shows, use some of these feature sets, but additional feature sets specific to those problems were also constructed: the General WS (GWS) feature set, the General Next Symptom Status (GNSS) feature set, and the Start Symptom (SS) feature.
Figure 3.9 An overview of features used for three different problems.
3.5 Data mining (Pattern discovery)
In this section we describe the data mining step of the KDD process. First, we describe the place of the data mining task in the framework. Next, we describe the data mining algorithms that we used and motivate why we chose them. Finally, we give an example of pattern discovery with the rules constructed. The main problem of HF hospitalization prediction is defined in Chapter 4 and is therefore not repeated here.
In general, different types of approaches can be used for the discovery of useful patterns, including association analysis, subgroup discovery, etc. In this thesis we search for discriminating patterns by defining corresponding classification tasks.
Figure 3.10 depicts the contribution of the mining task in this framework. Given a particular set of features (or a combination of them) for a particular classification task (problem), the role of the data mining algorithms (classifiers) is to learn a model represented by a set of rules. Later, the learned model is used in an online setting to classify unknown data instances into one of the predefined classes of the classification task. For example, in the case of HF hospitalization prediction, the learned model determines whether a HF hospitalization will happen or not.
Figure 3.10. Training classifiers for three different problems. The learned model is applied to real-time data to determine, e.g., whether a HF hospitalization will occur or not.
3.5.1 Data mining algorithms
In this section we describe the data mining classification methods that are used in this thesis work. The classification methods have to address three important issues that are characteristic of the context of this thesis:
• the patterns (rules) discovered should be expressed in a readable format so that the predictive features can easily be identified;
• the class distribution is imbalanced;
• the class labels are known, and prediction is needed.
We applied three different classification methods: a rule classifier, a decision tree (DT) and a support vector machine (SVM), of which the first two are more important for this work, as they produce output in a readable format that is easy to interpret. The SVM does not produce rules in a readable format; we use it only to compare against the results found by the other two classifiers.
As the rule classifier we used the RIPPER (Repeated Incremental Pruning to Produce Error Reduction) algorithm [37], which is one of the most popular publicly available rule classifiers. We used JRip, WEKA's [13] implementation of this algorithm.
A rule classifier is a direct method to extract rules from the data. A DT can also be used for rule generation, an approach considered an indirect method of rule extraction. We used one of the most widely used DTs, C4.5; more specifically, its WEKA implementation, the J48 algorithm.
3.5.2 Classification issues
Learning a classification model usually involves a number of common classification issues. In this thesis work we deal with two of them, addressed in the following paragraphs.
Class imbalance
Class imbalance [38] is a well-known issue addressed in many research studies on machine learning classification and performance: it arises when the class distribution is imbalanced. A number of studies [37,39,41] have proposed different ways of handling this problem. In this thesis work we use cost-sensitive learning [42] to handle class imbalance.
Cost-sensitive learning. The data mining algorithms used in this work are embedded in a meta-classifier to make them cost sensitive. In general, a penalty is given for misclassified instances; in terms of the confusion matrix, false negatives are assigned a higher penalty than false positives. For example, in the classification task of HF prediction most of the instances (95%) are associated with no HF hospitalization, and only a few of them (5%) with a HF hospitalization. Without cost-sensitive learning the classifier would mostly learn situations in which a HF hospitalization does not occur. Therefore, a penalty (cost) is assigned for misclassification of a HF hospitalization. In that way a classifier can learn both class values as if they were equally distributed.
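The effect of cost-sensitive learning can be sketched as a cost-minimizing decision rule. The 19:1 cost ratio below is an illustrative choice mirroring the 95%/5% class distribution, and the hand-rolled rule stands in for the meta-classifier actually used in the thesis:

```python
# Cost matrix: COST[(actual, predicted)]. Missing a HF hospitalization
# (a false negative) is penalized more heavily than a false alarm.
COST = {
    ("existsHF", "noHF"): 19.0,      # false negative: missed hospitalization
    ("noHF", "existsHF"): 1.0,       # false positive: unnecessary alert
    ("existsHF", "existsHF"): 0.0,
    ("noHF", "noHF"): 0.0,
}

def cost_sensitive_predict(p_exists_hf):
    """Pick the class with the lower expected misclassification cost,
    given the classifier's estimated probability of existsHF."""
    exp_cost_no = p_exists_hf * COST[("existsHF", "noHF")]
    exp_cost_yes = (1 - p_exists_hf) * COST[("noHF", "existsHF")]
    return "existsHF" if exp_cost_yes < exp_cost_no else "noHF"

# Even a modest 10% estimated risk now triggers an existsHF prediction:
print(cost_sensitive_predict(0.10))  # → existsHF
```

With these costs the decision threshold moves from 0.5 down to 0.05, which is exactly the minority-class proportion in the example above.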
Model overfitting
Model overfitting is another well-known classification issue, related to the learning of the classification model. While learning the model, the data is divided into two sets, a training dataset and a validation dataset. The data mining algorithms use the training data to learn the classification model. Later, the validation data is used to test and/or optimize the learned classification models. Model overfitting may occur if the training and validation data used for learning the model are not properly divided. Two types of errors are important in this context. A training error is a misclassification of a training instance. A generalization error is the error of the model on previously unseen instances. In this thesis work we apply cross-validation to handle this issue.
Cross-validation is a technique used in the learning of the model that aims to reduce the generalization error [42]. In k-fold cross-validation all data instances are divided into k approximately equal folds (partitions), after which k models are learned. One model is created for each of the k partitions, such that that partition serves as validation data and the other k-1 partitions form the training data from which the model is learned. Additionally, each of the k partitions is created in such a way that each class is properly represented in both the training and validation datasets.
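A stratified k-fold split, in which each class is proportionally represented in every fold, might be sketched as follows (an illustrative round-robin implementation, not the exact one used by the tooling in the thesis):

```python
def stratified_k_folds(labels, k):
    """Split instance indices into k folds so that each class is spread
    evenly (round-robin) across the folds."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for indices in by_class.values():
        for j, i in enumerate(indices):   # deal each class out in turn
            folds[j % k].append(i)
    return folds

labels = ["noHF"] * 8 + ["existsHF"] * 4
folds = stratified_k_folds(labels, 4)
# each of the 4 folds holds 2 noHF and 1 existsHF instance
```

Each fold then serves once as validation data while the remaining k-1 folds form the training data.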
3.5.3 Example pattern discovery
In this section we present an example of pattern discovery by defining a corresponding classification task. We show this example on a classification task that we also performed as part of our study, but which differs from our main problem of HF hospitalization prediction. Specifically, we searched for rules that would predict the next symptom status and the change in the next symptom value. Recall that one of the key features of RPM systems is the ability to perform subjective assessment of the patient via questionnaires, and thereby detect worsening of the symptoms and alert the caregiver. Hence, it would be especially useful to be able to predict changes in the symptom values, especially worsening, in advance, in order to be able to intervene via education (coaching, lifestyle changes) or medications. We therefore focus the classification task on predicting the next value of the symptoms. Available features from the clinical database, such as gender, age, and frequency of daily system usage, which potentially impact the next value of the symptoms, are used in the classification task.
Table 3.1 illustrates example patterns found (with the help of the J48 and JRip classification techniques) for the two most prominent symptoms, breathlessness and swelling of ankles. We can observe that women in general are at higher risk of remaining breathless (P1), and of keeping swollen ankles if they do not use the system regularly (P4-P5). In general, patients are at risk if they under-utilize the system (P2-P3), while the male patient population above 75 is at risk of worsening of their condition (P6).
Table 3.1. Examples of discovered patterns

Breathlessness:
P1: (StartSymptom = 'B') & (Sex = 'F') => EndSymptom = 'B'
P2: (StartSymptom = 'A') & (Age = '(37.5-81.5]') & (freqOfWeightUsage < 0.4) => EndSymptom = 'B'
P3: (StartSymptom = 'A') & (Age = '(37.5-81.5]') & (freqOfPulseUsage < 0.4) => EndSymptom = 'B'

Swelling of ankles:
P4: (StartSymptom = 'B') & (Sex = 'F') & (freqOfWeightUsage < 0.6) => EndSymptom = 'B'
P5: (StartSymptom = 'S') & (Sex = 'F') & (freqOfWeightUsage < 0.6) & (Age < 74.5) => EndSymptom = 'W'
P6: (StartSymptom = 'S') & (Sex = 'M') & (Age = '(74.5-79.5]') => EndSymptom = 'B'

Legend: Start/End Symptom: G = good (no problem), S = small problems, A = average, B = bad (many problems), W = worse, I = improved.
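Read as executable predicates, patterns such as P1 and P2 amount to simple conjunctions over a patient record. The dictionary keys below follow the attribute names in the discovered rules; the patient record itself is invented for illustration:

```python
def p1(r):
    # P1: (StartSymptom = 'B') & (Sex = 'F') => EndSymptom = 'B'
    return r["StartSymptom"] == "B" and r["Sex"] == "F"

def p2(r):
    # P2: (StartSymptom = 'A') & (Age in (37.5, 81.5]) &
    #     (freqOfWeightUsage < 0.4) => EndSymptom = 'B'
    return (r["StartSymptom"] == "A"
            and 37.5 < r["Age"] <= 81.5
            and r["freqOfWeightUsage"] < 0.4)

# illustrative record: a 68-year-old woman under-utilizing the weight scale
patient = {"StartSymptom": "A", "Sex": "F", "Age": 68, "freqOfWeightUsage": 0.3}
if p2(patient):
    print("P2 fires: predicted EndSymptom = 'B' (at risk of remaining breathless)")
```

A rule's consequent (the predicted EndSymptom) applies whenever its antecedent predicate evaluates to true on the record.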
In Table 3.2 we present possible adaptation rules based on the previously discovered patterns P1-P6 from Table 3.1. As mentioned in Chapter 2, current systems do not personalize the delivered content to the patient's condition, but rather send the same content to all patients. Based on the currently reported symptom values, the caregiver nowadays needs to do the personalization over the phone or in face-to-face consults, explaining to the patient which of his behaviors influence the current breathlessness (or a similar symptom), and how.
With the rules presented in Table 3.2 the system would automatically identify patients at risk of worsening of their condition, notify the medical professional about the risk, and send adequate content to the patient so that the worsening can be prevented. Thereby, the actual workload of the caregiver could be reduced, as (part of) what is now done face to face or over the phone could be done automatically via the system. Moreover, with risk identification and adaptation of the content, the chances of improving clinical outcomes, and thereby also reducing the future workload associated with worsening, are higher. For example, the first rule, based on pattern P1, would send content to the patient to help her master her breathing, while at the same time notifying the medical professional that this woman is at risk of remaining breathless. Similarly, the second rule, based on patterns P2-P3, would identify a patient at risk, send appropriate educational and instructional material to the patient, and send a notification about the risk to the professional. In this case the patient needs to be motivated to use the system, and properly instructed how to do so.
Table 3.2. Examples of adaptation rules

P1: If Sex = 'F' and BreathlessSymptom = 'B' then send videos with breathing exercises.
Desired effect (patient): regain control over the breathing. Desired effect (medical professional): notification of patient at risk.

P2, P3: If BreathlessSymptom = 'A' and Age = (37.5-81.5] and (freqOfWeightUsage < 0.4 or freqOfPulseUsage < 0.4) then send motivational content.
Desired effect (patient): motivation, instruction for using the system, education on breathlessness. Desired effect (medical professional): notification of patient at risk.

P4: If SwellingSymptom = 'B' and Sex = 'F' and freqOfWeightUsage < 0.6 then send motivational video.
Desired effect (patient): motivation, instruction for using the system, education on swelling ankles. Desired effect (medical professional): alert for additional action.

P5: If StartSymptom = 'S' and Sex = 'F' and freqOfWeightUsage < 0.6 and Age < 74.5 then send motivational content.
Desired effect (medical professional): notification of patient at risk.

P6: If SwellingSymptom = 'S' and Sex = 'M' and Age = (74.5-79.5] then send educational content.
Desired effect (patient): motivation, education on importance of managing condition. Desired effect (medical professional): notification of patient at risk.
Chapter 4
Heart Failure Hospitalization Prediction
In this chapter we present the problem of HF hospitalization prediction. First, we explore in more detail the domain of cardiovascular diseases (CVD), especially heart failure (HF), one of the most severe chronic cardiovascular diseases. Then we define the problem of HF hospitalization prediction. Next, related approaches to hospitalization prediction are presented. Finally, we present our approach, based on a classification task, for the problem of HF hospitalization prediction.
4.1 Heart Failure
Cardiovascular diseases (CVD) refer to a number of (related) conditions of the heart muscle and its vessels. Heart failure is the end stage of the CVD cycle: a stage at which the cardiovascular disease cannot be cured, but at which effective treatment can still improve the symptoms. It describes a clinical syndrome in which the heart cannot pump enough blood to meet the peripheral needs of the body [7, 15, 16]. The body cannot work normally because the blood cannot deliver enough oxygen and nutrition to it, which may cause fatigue in the muscles. Also, the body cannot eliminate waste products properly, leading to a build-up of fluid in the lungs, legs and abdomen.
HF is a very complex chronic condition, with many co-morbidities and a number of symptoms, which is hard to define in its entirety. According to the European Society of Cardiology (ESC) guidelines for chronic heart failure (CHF) [15], patients are considered HF patients if they have symptoms of HF (e.g. breathlessness, fatigue, etc., either at rest or during exercise), signs of fluid retention such as pulmonary congestion, and objective evidence of cardiac dysfunction obtained by e.g. echocardiography to determine the left ventricular ejection fraction (LVEF).
Classification of HF
As mentioned earlier, due to the complexity of the HF syndrome, several classifications exist for it. One distinction is between acute and chronic HF [15]. The term acute HF is often used to mean a sudden, new onset of HF symptoms in patients with previously normal cardiac function. Chronic HF, on the other hand, means that patients have periods of clinical stability interrupted by episodes of worsening symptoms. These episodes are also called decompensation of HF [17]. One of the most frequently used classifications of HF is the New York Heart Association (NYHA) classification [15], which classifies HF patients by their ability to perform physical activity.
Table 4.1. HF classification by the New York Heart Association [15].

Class I: No limitation: ordinary physical exercise does not cause undue fatigue, dyspnea, or palpitations.
Class II: Slight limitation of physical activity: comfortable at rest, but ordinary activity results in fatigue, palpitations, or dyspnea.
Class III: Marked limitation of physical activity: comfortable at rest, but less than ordinary activity results in symptoms.
Class IV: Unable to carry out any physical activity without discomfort: symptoms of HF are present even at rest, with increased discomfort with any physical activity.
Causes and Diagnoses of HF
There are many causes that may lead to or worsen HF. Myocardial dysfunction, usually as a consequence of myocardial infarction, and coronary artery disease are the most common causes of HF among patients under the age of 75 years [18]. Acute ischaemia, anaemia, renal or thyroid dysfunction, past heart attacks, heart valve disease, lung conditions, arrhythmia, and alcohol/drug abuse are examples of the many other causes of (worsening of) HF.
Symptoms and signs are important for early detection of heart failure as they alert
the observer to the possibility that heart failure exists. Fluid retention is one of the
most prominent symptoms, which can lead to a sudden increase of weight.
Breathlessness, ankle swelling and fatigue are other relevant symptoms. Besides
physical symptoms of HF, some patients may have emotional symptoms, such as
anxiety, depression, and dementia.
Treatment and management of HF
The treatment of HF varies greatly from patient to patient. Usually a combination of multiple strategies is applied in the therapy to fulfill its objectives. Besides the improvement of symptoms and prevention of progression of CHF, the aims of treatment often also include improving quality of life and reducing morbidity and mortality.
The treatment of HF can include pharmacological therapy (medications) and non-pharmacological approaches, such as the monitoring of body vital signs (e.g. weight) and lifestyle activities (e.g. dietary restriction in order to control the patient's salt or fluid intake, as suggested in [15,19,20]).
Non-pharmacological programs, especially the monitoring of vital signs, are very important to prevent hospitalization and worsening of HF, and also to improve the patient's quality of life. These programs are also known as HF management programs. Many leading disease management companies offer HF management programs that vary greatly in design and execution. Usual (home) care (UC) is a program in which the patient's health status, symptoms, diet and medication compliance are assessed only via clinical visits. (Nurse) telephone support (NTS) is a form of management provided through scheduled telephone calls from a HF nurse or physician and through clinical visits. These two programs do not use connected measurement devices to monitor the patient's health. (Home) telemonitoring (HTM), on the other hand, is a form of remote management where patients are equipped with connected measurement devices to allow daily monitoring of symptoms and signs measured by the patients themselves. HTM programs are also known as Remote Patient Management (RPM) programs, and the systems that provide them are known as Remote Patient Management (RPM) systems. We discussed RPM systems in Chapter 2; here we focus on the particular problem of HF hospitalization prediction.
4.2 Problem of HF hospitalization
In order to improve patients' health without exploding costs, RPM systems should be able to monitor the patient's condition, detect whether problems occur and, if so, provide feedback to the patient in terms of appropriate instruction, education and/or motivation to improve the patient's condition.
As HF is one of the most severe cardiovascular diseases (CVD) with respect to mortality and cost, early and highly accurate detection of HF situations is important so that RPM systems can intervene via appropriate education, instructions, or medications to improve patients' conditions, and hence reduce costs.
As mentioned in the previous section, patients with CHF have phases of clinical stability interrupted by episodes of decompensation. This term, also called worsening of HF, means that a previously stable CHF patient shows worsening symptoms that the body can no longer compensate. There are two cases in which worsening of HF may occur. In the first case, worsening of HF occurs as symptom worsening that does not result in hospitalization of the patient. In the second case, the patient is hospitalized due to (worsening of) HF. Although both cases are important to predict, in this thesis we consider only the problem of HF hospitalization prediction, as hospitalizations incur the higher costs. However, as already mentioned, we also experimented with the prediction of symptom worsening.
4.3 Related work
Several studies have identified features that may be useful for the prediction of HF hospitalization [19, 20, 44, 45, 46]. For a long-term prediction of the patient's health status, only attributes such as demographics or baseline measurements are of interest. However, other features obtained on a daily, weekly or even monthly basis have short-term prognostic value. So far, studies have tried to predict HF hospitalization either by using simple statistical approaches to compare two groups of patients, with and without HF hospitalizations [19, 20, 44], or by using well-known trend algorithms or algorithms defined by experts, again based on prior statistical analysis [45, 46]. Although it is known from the medical domain that various symptoms and signs may precede a HF hospitalization (and probably predict it), studies so far have focused mainly on daily measurements such as weight for the prediction of HF hospitalization.
The last two unpublished studies on the prediction of HF hospitalizations, [45] and [46], were based on an analysis of the TEN-HMS data [30]. The objective in both studies was prediction from daily weight measurements, and in [46] also from heart rate. However, the two studies define the operational setting differently and therefore use different evaluation strategies.
The study described in [45] tried to predict episodes of worsening HF that may lead to hospitalizations, where an episode was defined as the 14 days prior to a HF hospitalization. In this study only weight measurements were considered. Among 168 HTM patients, there were 45 HF hospitalizations and 76 non-HF hospitalizations. Two different algorithms were used and compared: a simple rule of thumb (RoT) and a moving average convergence divergence (MACD). A detailed description of the algorithms can be found in Appendix C, as they are also used in this work for feature construction. The alarms generated by the algorithms are evaluated per episode, not per day. This means that if at least one alarm was generated during an episode, the HF hospitalization related to that episode was counted as a true positive. Therefore, the true positive rate (TPR) represents the proportion of episodes leading to a HF hospitalization that have at least one alarm, and the false positive rate (FPR) relates to the number of episodes in which nothing HF-related happened.
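The two algorithms might be sketched as follows; the thresholds and window lengths below are illustrative placeholders, while the actual parameter values are those given in Appendix C:

```python
def rot_alarm(weights, gain_kg=2.0, window=3):
    """Rule of thumb (RoT): alarm if today's weight is more than `gain_kg`
    above the minimum of the previous `window` daily measurements."""
    if len(weights) < window + 1:
        return False
    recent = weights[-(window + 1):]
    return recent[-1] - min(recent[:-1]) > gain_kg

def macd_alarm(weights, short=7, long=21, threshold=0.5):
    """MACD-style: alarm when the short-window moving average exceeds the
    long-window moving average by more than `threshold` kg."""
    if len(weights) < long:
        return False
    return sum(weights[-short:]) / short - sum(weights[-long:]) / long > threshold

print(rot_alarm([80.0, 80.2, 80.1, 82.5]))    # → True (2.5 kg jump)
print(macd_alarm([80.0] * 14 + [81.0] * 7))   # → True (short average drifts up)
```

The RoT reacts to sudden jumps, while the MACD reacts to a sustained upward drift of the weight trend.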
The study of [46] extended the previous study with two algorithms not used in [45], a weight trend index (WTI) and a heart rate trend index (HRTI). Additionally, a number of different combinations of the algorithms were tested. The definition of the episodes of HF hospitalizations is the same as in [45], but the evaluation procedure is slightly different.
As we also compared our results with the results of this study, we describe the operational setting and evaluation criteria of [46]. In this study an algorithm can raise an alarm each day. If an alarm is raised within 14 days prior to a HF hospitalization it is counted as a true positive (TP), and all days with weight measurements in that period but no alarm are false negatives (FN). If an alarm is generated on a day outside the specified range, it is marked as a false positive (FP), while days outside the range without an alarm are true negatives (TN). Because of this definition of the evaluation criteria, more than one alarm can be raised in the 14-day period prior to a single HF hospitalization. Therefore, the raw true positive rate (TPR) and false positive rate (FPR) may not meaningfully describe an algorithm's ability to detect a HF hospitalization. To handle the problem of multiple TPs per hospitalization, a new term was introduced, the hospitalization detection rate (HDR), which represents the proportion of all HF hospitalizations that have at least one alarm in the previously defined time range. It reflects which of the HF hospitalizations are really detected by the algorithms. The FPR, on the other hand, describes how often physicians and patients will be alerted although no hospitalization will happen. The Youden index, the difference between TPR and FPR, is also used.
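The daily evaluation scheme described above, including the HDR and the Youden index, can be sketched as follows; the day numbering and example alarms are illustrative:

```python
def evaluate_daily(alarm_days, hosp_days, n_days, window=14):
    """Daily evaluation in the style of [46]: the `window` days before each
    HF hospitalization are positive days, all remaining days are negative."""
    positive = set()
    for h in hosp_days:
        positive.update(range(max(0, h - window), h))
    alarms = set(alarm_days)
    tp = len(alarms & positive)
    fn = len(positive - alarms)
    fp = len(alarms - positive)
    tn = n_days - tp - fn - fp
    tpr, fpr = tp / (tp + fn), fp / (fp + tn)
    # HDR: fraction of hospitalizations with >= 1 alarm in their window
    hdr = sum(1 for h in hosp_days
              if any(h - window <= a < h for a in alarms)) / len(hosp_days)
    return tpr, fpr, hdr, tpr - fpr   # last value: Youden index

tpr, fpr, hdr, youden = evaluate_daily([10, 12, 30], [20], n_days=40)
# two of the 14 positive days carry an alarm, one false alarm on day 30,
# and the single hospitalization is detected (hdr == 1.0)
```

The example shows why TPR alone can be misleading: the hospitalization is fully detected (HDR = 1.0) even though only 2 of its 14 positive days carry an alarm.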
The TPR in [45] corresponds to the hospitalization detection rate in [46]. However, the definitions of the FPR differ between the two works: in [45] the FPR relates to the number of episodes in which nothing HF-related happened, while the FPR in [46] is the number of false alarms divided by the number of days that do not precede a HF hospitalization.
4.4 Our approach
In this section we present our approach to HF hospitalization prediction. It differs from the existing approaches in the following ways. First, it uses a learning process to detect (predict) HF hospitalizations. By using learning algorithms, we can combine many different kinds of features, not only those derived from daily measurements. For example, we used features such as symptoms (on a monthly basis) or medical history, which were not considered in the previous approaches. Second, we defined a new evaluation procedure which solves the problems of computing TPR and FPR identified in the previous works. We defined a prediction period in which a HF hospitalization may be predicted. For example, so far the 14 days prior to a HF hospitalization were considered as the period for a TP alert. In our approach we also use this period if an alert is generated by a rule consisting of features constructed from daily measurements. However, if an alert is generated by rules constructed from monthly symptoms, it may be valid (a TP) if it falls, for example, within 30 days before a HF hospitalization. This characteristic of our approach is a consequence of the learning process and of the characteristics of the data (e.g. symptoms are available on a monthly basis).
Lastly, we use a slightly different definition of a HF hospitalization than the one used in [45] and [46]. In this work, a hospitalization is considered HF-related if at least one of the three diagnosis codes of the hospitalization is I50 (heart failure), while all other hospitalizations are classified as non-HF hospitalizations. In [45] and [46], on the other hand, a hospitalization was considered HF-related if the first diagnosis code was I50 (heart failure) or the admission reason was primary WHF. The remaining hospitalizations were classified as non-HF hospitalizations.
In general, our approach follows the general framework of the knowledge discovery process presented in Chapter 3. More specifically, it is based on a classification data mining task. The classification task is used to identify whether a HF hospitalization will happen (class existsHF) or not (class noHF) based on the features used. First, classification models are learned from a training dataset, and later the best learned models are evaluated on independent test data. For the construction of the training dataset, we follow the general framework.
In the next sections we first describe the classification task in more detail. Then we present how training instances are created. Next, we show the evaluation metrics used to evaluate our approach. In Chapter 5 (case study) we show the overall process of HF hospitalization prediction.
4.4.1 HF hospitalization classification task
Before we define how we view the problem in our approach to HF hospitalization prediction, we show how this problem was viewed so far. Then, we show the motivation and reasoning for applying our approach. Finally, we explain the setting of our approach.
Figure 4.1 shows how the problem of HF hospitalization prediction was viewed so far. Based on the available daily measurement data about a patient at moment ti, a prediction was cast (alert a warning or do not alert) on a daily basis on whether the hospitalization of this patient is likely within the next 14-day period [ti, ti+14]. Study [46] uses this setup and based on it constructs an evaluation procedure such that TPs, FPs, FNs and TNs are calculated on a daily basis. Although the study presented in [45] uses the same problem definition, its evaluation procedure calculates the same metrics not for every day but for episodes (periods) of 14 days prior to a HF hospitalization (e.g. only one TP per period).
Figure 4.1 Settings from previous approaches.
One of the goals in this thesis was to add symptoms and possibly other features (e.g. from medical history) to the features extracted from daily measurements in the prediction of HF hospitalization. Therefore, in the definition of the problem of HF hospitalization prediction in our approach, we take into account the different features that may be used in the prediction.
However, as we also want to compare our results with those of previous works, and in order to better understand our approach, we first consider an oversimplified setup for the definition of the hospitalization prediction problem. This setup (Figure 4.2) is based on the current approaches, in which only daily measurements were considered for the prediction of HF hospitalization. Later, we reconsider our setting for the definition of hospitalization prediction such that the information of the other features is included.
An oversimplified setup
In the oversimplified setup, the problem of hospitalization prediction can be defined similarly as before (Figure 4.2). Based on the available data (not only daily measurements) about a patient at moment ti, cast a prediction (alert a warning (class existsHF) or do not alert (class noHF)) on a daily basis on whether the hospitalization of this patient is likely within the next 14-day period [ti, ti+14]. We consider here the case of heart failure (HF) as the primary cause of hospitalization. However, the methodology is rather generic and applicable to a wider range of cases.
Figure 4.2. Hospitalization prediction for the following two weeks window.
Figure 4.2 illustrates the timeline of data availability in the form of the predictive features used by a domain expert or an automated classifier to facilitate decision making. At the moment of a patient's enrolment t0, the complete medical history data (corresponding to the H features) are recorded. A complete record may contain dozens of fields providing different information, such as the primary cause of HF, diagnoses and other information related to previous hospital admissions, existence of valve diseases (mitral regurgitation, mitral stenosis, aortic regurgitation, etc.), evidence of coronary diseases (myocardial infarction, history of angina, angioplasty, etc.), arrhythmias (chronic atrial fibrillation, ventricular tachycardia, etc.), implanted devices (pacemaker and defibrillator) and other possible diagnoses.
During a (monthly) phone contact MCj, patients are asked to assess quality of life (QoL) symptoms (the S features), and to report additional data such as disease and non-disease medication (or medication changes) and the number of visits/contacts (at home, by phone, at office, at clinic) in the last month. On a daily basis the patients are monitored regarding their vital signs, such as weight, blood pressure and heart rate (the source for constructing the D features).
Extended setup
The problem of hospitalization prediction defined by the oversimplified setup is a general problem, which does not take into account the characteristics of the symptoms (e.g. the possibility to predict within the period for which they are valid), and therefore it assumes that symptoms cannot accurately predict hospitalization beyond the period of 14 days. If, for example, symptoms were gathered in a 15-day time window, there would be no need to reconsider the described setup.
Figure 4.3 Hospitalization prediction for the following monthly window.
However, since in the TEN-HMS database, on which we base our investigation, symptoms are available on a monthly basis and they are the basis for our analysis, we reconsider the previous setup such that this information is included in the definition of the problem. As shown in Figure 4.3, an alert can also be fired (or not) at moment MCj (monthly contact), and moreover, the rule that generates this alert may contain only information from symptoms (and not from daily measurements). Then, the problem of hospitalization prediction can be extended such that, also at MCj, we cast a prediction (alert a warning or do not alert) on whether the hospitalization of this patient is likely within 30 days (until the next monthly contact).
In the remainder of this thesis we first look at the problem as it was defined in the oversimplified setup, and after we show the results from that setup, we come back to this definition of the problem and reconsider the setup in a more meaningful way.
4.4.2 Creation of training instances
In Figure 4.4 we show our approach of constructing training instances, particularly a
positive instance, i.e. the case when hospitalization took place.
In the construction of the training instances, two things are very important: the period between two monthly contacts, and the existence of a previous HF hospitalization within 30 days before the HF hospitalization for which we want to construct a positive instance. The latter is important since about 25% of all HF hospitalizations occur within 30 days after a previous HF hospitalization. First, we find the day on which a hospitalization occurred (th) and take the 14-day window [th-14, th) to compute the features related to daily measurements. It should be noted that the data for computing these features may include days outside this two-week window. A typical example would be to check, for each day of the window, whether the dynamics of the patient exceeds some predefined threshold (e.g. a feature from the MACD algorithm); the dynamics itself can be computed on windows of different sizes. Then, depending on whether there was a previous HF hospitalization in the window of 30 days prior to the HF hospitalization (th-30), each instance is associated with a feature that holds that information (Y - there was a previous HF hospitalization, N - there was not). Medical history related and symptom related features are computed based on the data available at the moment of the patient's enrolment and at the monthly visit preceding the hospitalization, respectively.
Figure 4.4 Forming of a positive (hospitalization took place) training instance.
Negative instances are constructed in a similar way, but the period for constructing the daily measurement features is the period between two monthly contacts in which there was no HF hospitalization. If there was a HF hospitalization between two monthly contacts, as already shown, we constructed a positive instance for that hospitalization. However, in order to be complete (to cover the whole period), if there is no other HF hospitalization in the 30 days after the HF hospitalization for which we created a positive instance, we create an additional instance for the period after that HF hospitalization and before the next monthly contact. For this instance, the feature "existence of previous HF hospitalization" is marked as Y, but the class label is negative (noHF - no next HF hospitalization).
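The instance-construction procedure of Figure 4.4 can be sketched roughly as follows (the flat-dict layout, helper name and example values are assumptions for illustration):

```python
from datetime import date, timedelta

def make_instance(hosp_day, prev_hf_days, symptoms, history_features,
                  daily_features, label):
    """Assemble one training instance (hypothetical flat-dict layout).

    PreviousHFhosp is 'Y' if another HF hospitalization occurred within the
    30 days before hosp_day, mirroring the ~25% of HF hospitalizations that
    follow a previous one within a month."""
    prev = any(hosp_day - timedelta(days=30) <= d < hosp_day
               for d in prev_hf_days)
    instance = {"PreviousHFhosp": "Y" if prev else "N", "class": label}
    instance.update(history_features)   # H features, from enrolment
    instance.update(symptoms)           # S features, from preceding monthly contact
    instance.update(daily_features)     # D features, from the 14-day window [th-14, th)
    return instance

# Positive instance: a HF hospitalization 10 days after a previous one.
pos = make_instance(date(2010, 5, 20), [date(2010, 5, 10)],
                    {"Fatigue": "B"}, {"Cluster": "clus1"},
                    {"ExistsRoTAlarm": "Y"}, "existsHF")
print(pos["PreviousHFhosp"])  # Y
```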
4.4.3 Evaluation method
In order to measure the performance of our hospitalization prediction approach, an evaluation method is required. As already mentioned, we first learn classification models from the training data using different feature sets in order to choose the best models, and later we evaluate the selected models on the test data. In the following sections we first explain the evaluation methods and the metrics used to measure the performance of the classification models (classifiers) during the learning phase.
After that, we explain the evaluation methods used in the evaluation phase to evaluate our approach (the best models selected in the learning phase) on independent test data. Recent studies have shown that there is no common way to evaluate HF hospitalization prediction (e.g. [45] and [46] use different definitions of TPs, FPs, FNs and TNs). Therefore, for the final evaluation of our models, we use two different evaluation methods: the one used in [46], where the above metrics are computed on a daily basis, and a new evaluation method introduced in this thesis.
4.4.3.1 Classification model evaluation
The classification model evaluation is conducted to measure the performance of classifiers during the learning process. The classification models obtained from the different machine learning methods (JRip, J48 and SVM) using different feature sets are evaluated and compared to determine which of them show the best results.
The metrics that we use for this evaluation are the true positive rate (TPR), the false positive rate (FPR) and the Youden Index (YI). Below, a motivation is given for using these metrics.
In the medical domain, and especially in the prediction of HF hospitalization, it is important to measure the performance such that the number of correctly predicted hospitalizations (true positive (TP) alerts) is as high as possible, while the number of false alarms (false positives, FP) is as low as possible. In terms of rates, this means that the true positive rate (TPR), the proportion of positives correctly classified as such, should be as high as possible. On the other hand, the false positive rate (FPR), the proportion of negative instances that are incorrectly classified as positive (false alarms), should be as low as possible. In data mining, the TPR is also known as recall, and in the medical community it is better known as sensitivity. The FPR in medical terms is better known as 1 - specificity.
However, in practice a high number of TPs (high TPR) usually comes with a relatively high number of FPs (and thus a higher FPR), and vice versa: with very few FPs (small FPR), the number of TPs (and the TPR) is also smaller. There is therefore a tradeoff between the two, and for now it is not known which has the higher cost (a higher TPR or a smaller FPR). For now, we propose that TPR and FPR are equally important. Therefore, we use the Youden Index (YI) = TPR - FPR as the basic metric to measure the performance of the classification models. It takes values between -1 and 1, where 0 corresponds to a random guess and 1 is the highest possible value. The Youden Index is highest when the TPR is maximized and the FPR is minimized. Therefore, our goal is to search for the best classifiers such that the Youden Index is maximized.
In order to compute these metrics, we need to calculate the numbers of true positives, false positives, false negatives, and true negatives. We use a confusion matrix to do that. Table 4.2 shows the confusion matrix for a two-class problem, in which rows represent the actual classes and columns the predicted classes.
Table 4.2 Confusion matrix for a two class problem.
                            Predicted class
Actual class      Positive      Negative
Positive          tp            fn
Negative          fp            tn
In our context, the positive class is "exists HF hospitalization" (existsHF), and the negative class is "no next HF hospitalization" (noHF).
The TPR and FPR are defined as follows:
True positive rate (TPR) = tp / (tp + fn),   False positive rate (FPR) = fp / (fp + tn)   (1)
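A small sketch computing these rates and the Youden Index from the counts of Table 4.2:

```python
def rates(tp: int, fp: int, fn: int, tn: int):
    """TPR (sensitivity), FPR (1 - specificity) and Youden Index
    from the confusion-matrix counts of Table 4.2."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return tpr, fpr, tpr - fpr  # YI = TPR - FPR

# Example: 50 positives of which 40 caught, 200 negatives with 20 false alarms.
tpr, fpr, yi = rates(tp=40, fp=20, fn=10, tn=180)
print(tpr, fpr, round(yi, 3))  # 0.8 0.1 0.7
```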
We also visualize the TPRs and FPRs using a receiver operating characteristic (ROC) plot [5]. We plot the TPR on the y-axis versus the FPR on the x-axis for each algorithm using the different feature sets. The diagonal line from point (0, 0) to point (1, 1), on which the TPR and the FPR are equal, represents a random guess. Algorithms are considered better predictors than a random guess if their values lie above the diagonal line, and worse if they lie below it.
Figure 4.5 Receiver operating characteristic (ROC).
When using YI as the evaluation metric, there can be classification models that perform equally with respect to YI while their TPRs and FPRs differ. For example, two classifiers with TPRs of 0.5 and 0.4 and FPRs of 0.2 and 0.1 perform equally with respect to YI (for both it is 0.3). However, the first classifier performs better than the second with respect to TPR (0.5 vs 0.4), while the second performs better than the first with respect to FPR (0.1 vs 0.2). We make this distinction in order to allow domain experts to adjust the choice to their needs. For example, if it is more important to predict more positives (higher TPR), then the first classifier will be selected as the better one. On the other hand, if it is more important to have fewer false alarms (smaller FPR), then the second classifier can be selected.
4.4.3.2 Statistical significance test
The t-test [14] assesses whether the means (averages) of two groups of samples are significantly different from each other. The t-test can be either unpaired or paired. The unpaired t-test is used when, for each of the two populations that we want to compare, two separate independent and identically distributed samples are obtained. On the other hand, the paired t-test is typically used on a sample of matched pairs of similar units. We use the paired t-test, because we utilize the same dataset and the observations on the two populations of interest are collected in pairs.
The null hypothesis, which states that one classifier performs no better than the other, is defined as the mean of the first sample being equal to the mean of the second sample. If the calculated p-value is below a certain threshold chosen for statistical significance, usually 0.05, then the null hypothesis is rejected, and we can assume that one classifier performs better than the other.
We use the statistical significance test (t-test) to assess whether one classifier performs statistically significantly better than the others with respect to the Youden Index. However, as already explained, we are also interested in finding the classifiers with the best performance with respect to TPR (maximum TPR) and FPR (minimum FPR) among those that perform equally with respect to YI. Therefore, once we have found the best algorithms with respect to YI, we apply the t-test to find the algorithms that statistically significantly outperform the others with respect to TPR and FPR.
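As a sketch, the paired t statistic over per-run Youden Index values can be computed with the standard library alone (the example YI values are made up, and the critical value 2.262 assumes a two-sided test at alpha = 0.05 with df = 9):

```python
from statistics import mean, stdev
from math import sqrt

def paired_t(sample_a, sample_b):
    """Paired t statistic: t = mean(d) / (sd(d) / sqrt(n)),
    where d are the per-run differences (e.g. YI of classifier A minus B)."""
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))

# Hypothetical YI values of two classifiers over 10 cross-validation runs.
yi_a = [0.42, 0.45, 0.40, 0.44, 0.43, 0.46, 0.41, 0.44, 0.45, 0.42]
yi_b = [0.38, 0.40, 0.37, 0.39, 0.41, 0.40, 0.36, 0.39, 0.40, 0.38]
t = paired_t(yi_a, yi_b)
# Compare |t| to the critical value for df = 9 at alpha = 0.05 (about 2.262):
print(abs(t) > 2.262)  # True
```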
4.4.3.3 Heart Failure hospitalization evaluation
From the classification model evaluation in the learning phase, we learned and chose the best models for our approach. In order to show the performance of our approach, we evaluated the selected models on independent test (evaluation) data. This evaluation is used as the final performance measurement of the hospitalization prediction process. We also compared the results of our approach with the results of study [46].
Similarly as in the classification model evaluation, we used the Youden Index, TPR and FPR as evaluation metrics. As discussed before, we evaluated our models using two different definitions of the TPs, FPs, FNs and TNs, one for each of the defined setups, the oversimplified one and the new improved setup. We show these evaluations in the case study (Chapter 5).
Chapter 5
Case Study: TEN-HMS
In this chapter we show a case study of our approach to HF hospitalization prediction using the data of the well known TEN-HMS study [30]. We follow the general framework described in Chapter 3 and show some of the steps of the KDD process that we performed for the hospitalization task. First, we show some basic findings from the data exploration needed for the classification task. Second, we show the representation of the feature space. Then, we define the experiment design we constructed for our approach. We conducted a series of experiments to learn classification models. Next, we evaluate the results of our approach and compare them with the other existing approach. Lastly, we summarize the results.
5.1 Basic data findings
In this thesis work we use the data from the well known TEN-HMS study [30]. In Chapter 3 we briefly described the main characteristics of the database. Here we show some basic statistics and findings obtained during the data exploration for our approach.
Patient data
The complete TEN-HMS dataset contains information about 421 patients with cardiovascular diseases, out of which 166 patients were HF patients home-telemonitored during a period of two years. We excluded patients that did not have any data available (5 patients), were in the study for less than 50 days (7 patients), or had fewer than 50 measurements (11 patients). Therefore, in our experiments we focus on the remaining 143 patients (112 alive and 31 dead).
Time interval selection
In the TEN-HMS study, most of the patients started to use the system (to measure themselves) with a 2-3 week delay after enrolment, while for other patients the delay was even longer (a month or even a few months). The reason is that some time was needed for technicians to install the system at the patient's home. In order to really focus on the results of the home telemonitoring, in this thesis work we focus on the time interval after a patient started to measure. As the end point of the time interval, we use the last clinical visit for patients that stayed alive, and for patients that died the end point is the patient's death.
Measurement data
The TEN-HMS database contains approximately 100 000 records for each of the vital sign measurements. Patients were asked to measure themselves two times per day, in the morning and in the evening. However, during the study a large number of patients did not comply with this requirement: some measured themselves only once per day, while others measured two or more times per day. Moreover, some patients measured themselves several times within a few minutes (e.g. 7 times in 10 minutes); the latter indicates that those measurements should actually be considered as one. Table 5.1 shows the number of days on which patients measured themselves a specific number of times (number of measurements per day). For example, there were 9 614 days on which patients had only one weight measurement. Similar numbers exist for blood pressure and pulse.
Table 5.1 Number of days with certain number of measurements.
# of meas. per day    # of days
1                     9 614
2                     39 342
3                     1 207
4                     150
5                     44
6                     19
7                     4
8                     5
9                     3
10                    1
11                    1
Because of this, we make a distinction between morning and evening measurements, and later in our study we focus only on the morning measurements. We also do not take into account measurements taken after the end point, measurements during hospitalizations, outliers, and duplicate data.
Monthly symptoms
Symptoms were taken from monthly phone contacts or from clinical visits. The time period between two monthly contacts ranged between 15 and 40 days, with an average of 30 days. For the 143 patients that we considered, there were 1836 monthly phone contacts or clinical visits after the start of the time interval that we consider.
Clinical data
Similarly as for the monthly symptoms, we select the proper clinical visits for the training data. From 619 clinical visits, we select only the 492 that took place after the start of our study period.
Hospitalization data
From the 143 considered patients, there were in total 93 HF hospitalizations and 107 non-HF hospitalizations. Table 5.2 shows the distribution of hospitalizations over the periods between two monthly contacts. As shown, 46 out of 93 HF hospitalizations occur in the first 15 days after a monthly contact, 40 occur between the 16th and the 30th day, and 7 occur between the 31st and the 40th day after the monthly contact. The exploration of the hospitalization data showed a very interesting phenomenon: 24 (26%) of all 93 HF hospitalizations occur within 30 days after a previous HF hospitalization.
Table 5.2 Distribution of HF hospitalizations between two monthly contacts.
Days after a monthly contact    Number of HF hospitalizations
1-15                            46
16-30                           40
31-40                           7
5.2 Feature space representation
We construct the features for our task based on the findings from the data exploration, namely the visualization of event data and the visualization of time series data (Chapter 3). Most of the considered features are new, not used in previous studies. However, the features for the daily measurements are based on the algorithms used in previous works [45] and [46]. We divide all of the constructed features into the following feature sets: general heart failure features (abbreviated GHF), features constructed from symptoms (S), features from daily measurements (D), and features from medical history (H). This division is made so that we can examine the performance of HF hospitalization prediction using only a single feature set, as well as a number of combinations of the feature sets. In the following subsections we describe each of the feature sets we used.
5.2.1 General HF features
This feature set combines two different features, Gender and PreviousHFhosp. The first represents the patient's gender and has two categorical values, M and F, for male and female patients respectively. The second feature represents the existence (category Y) or absence (category N) of a previous HF hospitalization within the last 30 days before the HF hospitalization that we want to predict.
5.2.2 Symptoms features
The basis for our analysis is the feature set containing 11 symptoms (e.g. Swelling Ankles, Breathlessness Limit Activity, Fatigue, Rate Overall Health, etc.) that were assessed on a monthly basis (during both monthly phone contacts and clinical visits). The motivation for using monthly symptoms as predictive features is the indication that symptoms are one of the signs of HF hospitalization. We applied our own transformation to map the existing 7 assessment values of the symptoms (2-Very Good, 3-Good, 4-Quite Good, 5-Average, 6-Quite Poor, 7-Poor, and 8-Very Poor) onto 4 new categorical health states: G-Good (value 2), S-small problems (values 3 and 4), A-Average (value 5), and B-much problems (values 6, 7, and 8). There were three motivations for this transformation: to reduce the number of categories, the skewed distribution of the previous categories, and the different reasoning behind the previous symptom values (e.g. there is only a small worsening of the health state when going from status 3 (Good) to 4 (Quite Good), while the symptom worsens much more from 2 (Very Good - no problems at all) to 3 (Good - but still with a little problem)).
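The transformation of the 7 assessment values into the 4 health states can be written as a simple lookup (symptom names written as identifiers are our own shorthand):

```python
# Mapping from the original 7 assessment values (2=Very Good ... 8=Very Poor)
# to the 4 health states used in this thesis.
HEALTH_STATE = {2: "G",                  # Good
                3: "S", 4: "S",          # small problems
                5: "A",                  # Average
                6: "B", 7: "B", 8: "B"}  # much problems

def transform_symptoms(assessments):
    """Map each symptom's 2-8 assessment value to its categorical state."""
    return {symptom: HEALTH_STATE[value] for symptom, value in assessments.items()}

print(transform_symptoms({"Fatigue": 3, "SwellingAnkles": 7}))
# {'Fatigue': 'S', 'SwellingAnkles': 'B'}
```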
5.2.3 Features from daily measurements
We used the algorithms from previous studies, namely RoT, MACD, WTI and HRTI, to extract features from the daily measurements that are used in the data mining task. Due to confidentiality, we show the algorithms that we use for feature construction in Appendix C, and here we show only the actual features constructed.
We constructed 4 features, ExistsRoTAlarm, ExistsMACDAlarm, ExistsWTIAlarm, and ExistsHRTIAlarm, one for each of the four algorithms RoT, MACD, WTI and HRTI. A particular feature (e.g. ExistsRoTAlarm) indicates the existence (category Y) or absence (category N) of at least one alarm raised by the particular algorithm (in this case RoT) within the period of 14 days prior to a HF hospitalization (in the case of positive instances) or between two monthly contacts (in the case of negative instances).
5.2.4 Medical history feature
Medical history data was collected at the beginning of the TEN-HMS study. Because these data are collected only once (they are static data), we clustered patients according to their medical history data. A hierarchical agglomerative clustering algorithm with complete linkage was used to cluster the patients, after which we selected the 4 top clusters: cluster 1 (51 patients), cluster 2 (35 patients), cluster 3 (14 patients) and cluster 4 (14 patients). We constructed a feature named Cluster to express to which cluster a particular patient belongs. The categories of this feature correspond to the selected clusters (clus1, clus2, clus3 and clus4). Each training instance of a particular patient is associated with the Cluster feature containing the category of the cluster to which the patient was assigned.
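A naive sketch of hierarchical agglomerative clustering with complete linkage; the use of Hamming distance on binary medical-history vectors is an assumption, as the thesis does not state the distance measure:

```python
def hamming(a, b):
    # Number of positions in which two binary vectors differ.
    return sum(x != y for x, y in zip(a, b))

def complete_linkage(c1, c2, points):
    # Complete linkage: distance between clusters = maximum pairwise distance.
    return max(hamming(points[i], points[j]) for i in c1 for j in c2)

def agglomerate(points, k):
    """Merge the closest pair of clusters (complete linkage) until k remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda p: complete_linkage(clusters[p[0]],
                                                  clusters[p[1]], points))
        clusters[i] += clusters.pop(j)
    return clusters

# Toy binary medical-history vectors for 5 patients, clustered into 2 groups.
patients = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1), (1, 0, 0)]
print(agglomerate(patients, 2))  # [[0, 1, 4], [2, 3]]
```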
5.2.5 Feature selection
As the number of considered features increases when combining different feature sets, the performance of the data mining algorithms may degrade. Feature selection reduces dimensionality by removing redundant or irrelevant features. This reduces the training and usage times and helps to improve the prediction accuracy [12], and may therefore lead to better results of the used data mining algorithms.
We use an exhaustive search algorithm in combination with the Correlation-based Feature Selection (CFS) method. The exhaustive search algorithm, starting from the empty set of attributes, performs an exhaustive search through the space of attribute subsets. The CFS algorithm selects subsets of features that are highly correlated with the class while having low intercorrelation [43].
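The CFS merit that guides the search can be sketched as follows (a standard formulation of the measure from [43]; variable names are ours):

```python
from math import sqrt

def cfs_merit(feat_class_corr, feat_feat_corr):
    """CFS merit of a feature subset:
    k * avg(r_cf) / sqrt(k + k*(k-1) * avg(r_ff)),
    favouring subsets highly correlated with the class (r_cf) but with low
    inter-feature correlation (r_ff)."""
    k = len(feat_class_corr)
    r_cf = sum(feat_class_corr) / k
    r_ff = (sum(feat_feat_corr) / len(feat_feat_corr)) if feat_feat_corr else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)

# Two subsets with equal class correlation: the less redundant one wins.
redundant = cfs_merit([0.5, 0.5], feat_feat_corr=[0.9])
diverse = cfs_merit([0.5, 0.5], feat_feat_corr=[0.1])
print(diverse > redundant)  # True
```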
We applied the described feature selection method on the training set in the fifth experiment, using all feature sets (S+D+H) and 10-fold cross-validation, utilizing the AttributeSelectedClassifier for the different learning algorithms (JRip, J48 and SVM) and different combinations of parameters. We compared the performance of the algorithms with different parameters by measuring the statistical significance. The feature selection reduced the feature space to only 5 features.
5.3 Experiment design
Our experiment setup consists of two major steps. First, in the learning phase, we conduct experiments to learn and select the best classification models. Then, in the evaluation phase, we evaluate the performance of our approach and compare it to the other existing approaches. Additionally, for the evaluation of the performance of our approach, we use two different definitions for the computation of the evaluation metrics TP, FP, FN, and TN.
Figure 5.1 shows the overall experiment setup. First, we divided the complete dataset into two parts, a training dataset and a test dataset. From the training dataset we generated training instances. We used different representations of the training instances using different feature sets: only symptom features (S), symptom features with features extracted from the daily measurement data (S+D), symptoms with the medical history of the patients (S+H), and their union (S+D+H). Additionally, we tried some of these subsets and finally an exhaustive search for the best feature subset (S+D+H+FS).
The training instances are used to train classification models. We applied different classification techniques, including support vector machines (SVM), decision trees (J48), and rule-based learners (JRip). By means of cross-validation and cost-sensitive learning, we searched for and fixed the best parameters for each classification technique. In each category we kept only those combinations that were statistically significantly better than the others according to the paired t-test with respect to the Youden Index. Additionally, we chose two models, the one with the maximum TPR and the one with the minimum FPR, from those that perform approximately equally with respect to YI.
After this point, we applied the obtained classification models to the test dataset to evaluate our approach for HF hospitalization prediction. We used two evaluation strategies to evaluate our approach and compare it with the existing approach.
Figure 5.1 Overall experiment design.
5.4 Creation of training instances
We focused our investigation on the 143 HTM patients as explained before. Of these patients, 43 (30%) had been hospitalized at least once due to HF, accounting for 93 hospitalizations. For the test dataset we randomly chose 29 (20%) patients (9 who had at least one HF hospitalization and 20 who did not), and the remaining 114 (80%) patients (34 who had and 80 who did not) form the training set.
Table 5.3 Number of patients in the training dataset and test dataset.
                              Training dataset    Test dataset
with HF hospitalization             34                 9
without HF hospitalization          80                20
total                              114                29
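The patient-level 80/20 split can be sketched as follows (the seed and function name are arbitrary; the thesis drew the split randomly):

```python
import random

def split_patients(patient_ids, test_fraction=0.2, seed=7):
    """Patient-level 80/20 split (an illustration; the seed here is arbitrary).
    Splitting by patient, not by instance, keeps all instances of one patient
    on the same side of the split."""
    rng = random.Random(seed)
    ids = sorted(patient_ids)
    test = set(rng.sample(ids, round(len(ids) * test_fraction)))
    train = [p for p in ids if p not in test]
    return train, sorted(test)

train, test = split_patients(range(143))
print(len(train), len(test))  # 114 29
```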
After the division, the training set of patients has in total 73 HF hospitalizations, while the test set has 20 hospitalizations (Table 5.4). As for the events at which symptoms are collected (monthly contacts and clinical visits), the training and test data contain 1361 and 378 records respectively.
Table 5.4 HF hospitalizations and symptoms in the training and test datasets.
                                  Training dataset    Test dataset
Number of HF hospitalizations           73                 20
Number of events with symptoms        1361                378
We construct the training instances from the training dataset as described in Section 4.4.2. We use different representations of the training instances using the different feature sets. However, the number of instances remains the same in each representation; the only difference lies in the features constructed for the training instances. Table 5.5 shows the numbers of positive and negative training instances, with their corresponding class labels existsHF and noHF. This training set will be used for learning the different classification models explained in the next section.
Table 5.5 Training instances for the HF hospitalization classification task.
Class label Number of instances
existsHF 73
noHF 1330
5.5 Classification model prediction
In this section we present the experiments performed for the hospitalization
classification task. We conducted five experiments using combinations of
different feature sets: only symptom features (S), symptom features with
features extracted from the daily measurements data (S+D), symptoms with the
medical history of the patients (S+H), and their union (S+D+H). Within each
experiment we additionally tried subsets of these feature sets, and in the
final experiment an exhaustive search for the best feature subset
(S+D+H+FS). Although not shown explicitly, the set of General HF
Hospitalization (G) features is used in each of the experiments.
In each of the experiments, we experiment with three base classification
techniques using WEKA [13], namely support vector machines (SVM), the JRip
rule learner and the J48 decision tree. For J48, we also experiment with
different settings of the minimum number of instances per leaf in the decision
tree, referred to as the object size. Three values of the object size were
considered: 2, 10 and 20. The base classifiers were embedded into the
CostSensitiveClassifier classification method from WEKA. In this way we apply
cost-sensitive learning, experimenting with different costs (8, 9, 10 and 14).
In general, cost-sensitive learning may be applied when the cost of
misclassifying an HF hospitalization is higher than the cost of reporting a
false alarm. In this thesis, however, we consider these two costs to be equal,
and we apply cost-sensitive learning only because of the imbalanced class
distribution of the training instances.
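The effect of such a cost-sensitive wrapper can be sketched as instance reweighting: each positive (existsHF) instance receives a weight equal to the misclassification cost, which partially compensates for the roughly 1:18 class imbalance of Table 5.5. This is a minimal illustration of the idea, not WEKA's implementation.

```python
def instance_weights(labels, false_negative_cost):
    """Weight existsHF instances by the cost of missing a hospitalization."""
    return [false_negative_cost if y == "existsHF" else 1.0 for y in labels]

# Class distribution from Table 5.5: 73 positive vs 1330 negative instances
labels = ["existsHF"] * 73 + ["noHF"] * 1330
weights = instance_weights(labels, false_negative_cost=14)

# With cost 14 the total weight of the positives (73 * 14 = 1022) approaches
# the total weight of the negatives (1330), partially rebalancing the classes.
print(sum(w for w, y in zip(weights, labels) if y == "existsHF"))  # 1022
```

This also explains the range of costs tried (8 to 14): full rebalancing would require a cost of about 1330/73 ≈ 18, so the chosen costs move the learner toward, but not past, a balanced weighting.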
The Explorer tool from WEKA was used to configure the experiments, with 30
runs made for each of them. In addition, 10-fold cross-validation was used to
address model overfitting while learning a particular classifier. A t-test was
used to determine whether a particular classification model statistically
significantly outperforms another with respect to the Youden Index (YI). We
construct a “set of best models” from the models that perform equally well with
respect to YI.
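The pairwise comparison behind the win/tie/loss counts can be sketched as a paired t-test on the per-run Youden Index of two classifiers. The critical value 2.045 is the two-sided 5% point of the t-distribution with 29 degrees of freedom (30 runs); the YI sequences below are invented for illustration, and note that WEKA's Experimenter actually applies a variance-corrected variant of this test.

```python
import math
import statistics

def compare_yi(a, b, critical=2.045):
    """Paired t-test on per-run YI values: 'win', 'loss' or 'tie' for a vs b."""
    diffs = [x - y for x, y in zip(a, b)]
    mean = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        return "tie"
    t = mean / (sd / math.sqrt(len(diffs)))
    if t > critical:
        return "win"
    if t < -critical:
        return "loss"
    return "tie"

yi_a = [0.30 + 0.001 * i for i in range(30)]  # consistently higher YI
yi_b = [0.20 + 0.002 * i for i in range(30)]
print(compare_yi(yi_a, yi_b))  # win
```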
Additionally, we want to distinguish between classifiers with the same YI but
different TPRs and FPRs; this information can be used by domain experts to
choose a single best classifier according to their needs. Therefore, we show an
example of how this can be done by choosing two models from the “set of best
models”: one with maximum TPR and one with minimum FPR. We use these
models in the evaluation phase to evaluate the performance of our approach.
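With the Youden Index defined as YI = TPR − FPR, the selection of a max-TPR and a min-FPR model from models tied on YI can be sketched as follows. The three entries are mean values taken from Table 5.6; the dictionary layout is illustrative, not the thesis code.

```python
# Mean TPR/FPR of three of the models from Table 5.6
models = {
    "S (SVM - cost 10)":        {"TPR": 0.3800, "FPR": 0.1427},
    "S (Jrip - cost 14)":       {"TPR": 0.5141, "FPR": 0.2396},
    "S\\RQoL (Jrip - cost 14)": {"TPR": 0.5289, "FPR": 0.2561},
}
for m in models.values():
    m["YI"] = m["TPR"] - m["FPR"]  # Youden Index: sensitivity + specificity - 1

max_tpr = max(models, key=lambda k: models[k]["TPR"])  # alarm-sensitive choice
min_fpr = min(models, key=lambda k: models[k]["FPR"])  # false-alarm-averse choice
print(max_tpr)  # S\RQoL (Jrip - cost 14)
print(min_fpr)  # S (SVM - cost 10)
```

A domain expert who prefers catching hospitalizations would pick the max-TPR model; one who wants to minimize false alarms would pick the min-FPR model, even though both are statistically tied on YI.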
5.5.1 Learning models using (S) features
In the first experiment, besides the General HF set of features, we tried to predict HF
hospitalization using the Symptoms (S) feature set. We also consider a subset of this
feature set, namely S\RQoL, which does not include the Rate QoL and Rate Overall
Health features, as they represent the patient’s overall health and are therefore
correlated with the other symptoms. Due to the different costs used, the different
object-size settings for J48, and the additional feature subset (S\RQoL), the total
number of classifiers tested in this experiment is 36.
Table 5.6 shows the mean and the standard deviation of the TPR, FPR and YI
metrics for all 36 classifiers. From all combinations, we want to select the
classifiers that are statistically significantly better than the others with respect to
YI, and we performed a t-test on YI to obtain them. The complete table of
how each classifier statistically compares to the others is shown in Appendix A.
Table 5.6 TPRs, FPRs and YIs of all models that use symptom (S) features
Classification model TPR FPR Youden Index
Mean Std.dev. Mean Std.dev. Mean Std.dev.
S (SVM) cost 8 0.2555 0.1499 0.0795 0.0381 0.1759 0.1462
cost 9 0.3398 0.1554 0.1206 0.0299 0.2192 0.1562
cost 10 0.38 0.1544 0.1427 0.0305 0.2373 0.1582
cost 14 0.4486 0.1565 0.2045 0.0367 0.2441 0.1586
S (Jrip) cost 8 0.3915 0.1876 0.1548 0.0449 0.2367 0.1801
cost 9 0.4162 0.1837 0.1714 0.0494 0.2448 0.1771
cost 10 0.4339 0.179 0.1858 0.0507 0.2481 0.1728
cost 14 0.5141 0.1813 0.2396 0.0616 0.2745 0.1757
S (J48) cost 8 Obj.2 0.2205 0.1404 0.0943 0.0279 0.1262 0.1401
Obj.10 0.2496 0.146 0.1143 0.0315 0.1353 0.14
Obj.20 0.2768 0.1513 0.1193 0.0365 0.1576 0.1493
cost 9 Obj.2 0.2258 0.1415 0.0978 0.0289 0.128 0.1412
Obj.10 0.2915 0.1519 0.1358 0.0335 0.1556 0.1482
Obj.20 0.3088 0.151 0.1416 0.0391 0.1673 0.1504
cost 10 Obj.2 0.2318 0.1467 0.1016 0.0291 0.1303 0.1457
Obj.10 0.3307 0.1535 0.1545 0.0358 0.1762 0.1515
Obj.20 0.3214 0.1518 0.1706 0.0415 0.1509 0.1476
cost 14 Obj.2 0.2719 0.157 0.1216 0.0313 0.1503 0.1558
Obj.10 0.3863 0.1586 0.1924 0.0408 0.194 0.1549
Obj.20 0.3926 0.1603 0.234 0.0477 0.1587 0.1561
S\RQoL cost 8 0.3222 0.181 0.1215 0.0464 0.2007 0.173
(Jrip) cost 9 0.3483 0.1766 0.145 0.0524 0.2033 0.1674
cost 10 0.369 0.1824 0.17 0.0635 0.199 0.1664
cost 14 0.5289 0.1842 0.2561 0.0613 0.2729 0.1793
S\RQoL cost 8 Obj.2 0.261 0.1581 0.0991 0.0273 0.1619 0.1578
(J48) Obj.10 0.3232 0.1566 0.1252 0.0352 0.198 0.1517
Obj.20 0.3188 0.1596 0.122 0.0365 0.1969 0.154
cost 9 Obj.2 0.2624 0.1567 0.1041 0.0281 0.1583 0.1568
Obj.10 0.3661 0.1656 0.1446 0.0363 0.2215 0.1589
Obj.20 0.3662 0.1595 0.1402 0.0413 0.226 0.1566
cost 10 Obj.2 0.2683 0.1596 0.1096 0.0282 0.1587 0.1595
Obj.10 0.394 0.1632 0.1614 0.0388 0.2326 0.1587
Obj.20 0.3808 0.163 0.1508 0.0426 0.23 0.1598
cost 14 Obj.2 0.3143 0.169 0.1408 0.0326 0.1735 0.1657
Obj.10 0.4361 0.1708 0.1997 0.0428 0.2364 0.1694
Obj.20 0.4548 0.1723 0.2185 0.05 0.2363 0.1683
In this experiment, and as we will see later in some other experiments, very few or
none of the classifiers statistically outperform the others with respect to YI. For example,
the best classifier according to the t-test in this experiment (Table 5.7) is Jrip with
cost 14 (abbreviated S(Jrip-cost 14)), because it has the highest number of wins (5) over
other classifiers. However, it wins over only 5 (out of 36) classifiers, and there is a large
number of classifiers to which it is tied (31, including itself). On the other hand, if we
look at the mean YIs of all 36 classifiers, we notice several
models that perform approximately equally with respect to the mean YI. It can also
be noticed that those models either perform statistically significantly better than
others or are tied with most of the classifiers (as the S(Jrip-cost 14) model was).
Table 5.7 Set of best models according to Youden Index (S features).
Classification model TPR FPR Youden Index Cumulative YI Cumulative TPR
Mean Std.dev. Mean Std.dev. Mean Std.dev. Nb + Nb = NB - Nb + Nb = NB -
S (SVM - cost 10) 0.38 0.1544 0.1427 0.0305 0.2373 0.1582 0 36 0 8 26 2
S (SVM - cost 14) 0.4486 0.1565 0.2045 0.0367 0.2441 0.1586 0 36 0 19 17 0
S (Jrip - cost 8) 0.3915 0.1876 0.1548 0.0449 0.2367 0.1801 0 36 0 5 30 1
S (Jrip - cost 9) 0.4162 0.1837 0.1714 0.0494 0.2448 0.1771 0 36 0 10 26 0
S (Jrip - cost 10) 0.4339 0.179 0.1858 0.0507 0.2481 0.1728 0 36 0 12 24 0
S (Jrip - cost 14) 0.5141 0.1813 0.2396 0.0616 0.2745 0.1757 5 31 0 25 11 0
S\RQoL (Jrip - cost 14) 0.5289 0.1842 0.2561 0.0613 0.2729 0.1793 4 32 0 27 9 0
S\RQoL (J48 - cost 10 - obj.10) 0.394 0.1632 0.1614 0.0388 0.2326 0.1587 1 35 0 11 24 1
S\RQoL (J48 - cost 14 - obj.10) 0.4361 0.1708 0.1997 0.0428 0.2364 0.1694 0 36 0 15 21 0
S\RQoL (J48 - cost 14 - obj.20) 0.4548 0.1723 0.2185 0.05 0.2363 0.1683 0 36 0 20 16 0
Additionally, because we consider three different classification methods (SVM, JRip
and J48) and want our comparison to be complete, we select models from each
classification method, which makes the selection of the “set of best classifiers” with
respect to YI even more complex.
Therefore, from now on, the “set of best models” with respect to YI in a given
experiment contains the models that 1) statistically significantly outperform
others with respect to YI and/or 2) have the highest mean YI. This is
done for each classification method.
Table 5.7 shows the “set of best classification models” using only the S features.
We can see that Jrip with cost 14 performs best for both feature subsets, S and
S\RQoL. On the other hand, as this table shows only the best of all 36
models in this experiment, J48 performs better in the case of S\RQoL, when Rate QoL
and Rate Overall Health symptoms are not used.
From the “set of best classification models” we look for the best models with respect
to TPR (max TPR) and FPR (min FPR) for each classification method. We perform a t-
test with respect to TPR to investigate which model is statistically significantly better
than the others. The complete table of these results is shown in
Appendix A, while Table 5.7 also shows the TPR results, but only for
the best set of classifiers. Note that for TPR it is not necessary to
perform the t-test over the “set of best models” only; it can be done on the whole set of
36 classifiers. From Table 5.7 it can be noticed that all algorithms (SVM, Jrip and J48)
have their maximum TPR when cost 14 is used; for J48 with cost 14, this is the
model with object size = 20.
Similarly, we performed a t-test analysis with respect to FPR. In contrast to the case
of max TPR, here we perform this analysis only on the set of best models. For the FPR
metric, a particular classifier is better than another if it has a lower FPR; therefore, a
classifier is statistically significantly better than the others if it is outperformed by them.
As Table 5.8 shows, SVM with cost 10 is the SVM model with minimum
FPR, as it is outperformed by 6 of the “best models”. For Jrip, the model with
minimum FPR is S (Jrip-cost8), and for J48 it is the S\RQoL (J48-cost10-obj.10)
classification model.
Table 5.8 Result of T-test on FPR between classifiers of the “best set of models” (S)
(+ significantly outperforms, = tie, - significantly outperformed)
Classification model Cumulative FPR (Nb+)-
1 2 3 4 5 6 7 8 9 10 Nb + Nb = NB - (Nb-)
1 = S (SVM-cost 10) / - = = - - - = - - 0 3 6 -6
2 = S (SVM-cost 14) + / + = = = - + = = 3 5 1 2
3 = S (Jrip-cost 8) = - / = = - - = - - 0 4 5 -5
4 = S (Jrip-cost 9) = = = / = - - = = - 0 6 3 -3
5 = S (Jrip-cost 10) + = = = / - - = = = 1 6 2 -1
6 = S (Jrip-cost 14) + = + + + / = + = = 5 4 0 5
7 = S\RQoL (Jrip-cost 14) + + + + + = / + + = 7 2 0 7
8 = S\RQoL (J48-cost 10-obj.10) = - = = = - - / - - 0 4 5 -5
9 = S\RQoL (J48-cost 14-obj.10) + = + = = = - + / = 3 5 1 2
10 = S\RQoL (J48-cost 14-obj.20) + = + + = = = + = / 4 5 0 4
5.5.2 Learning models using (S+D) feature set
In this experiment, we learned classification models using a combination of symptoms
and features from daily measurements (S+D). Similarly, we also considered a subset of
this feature set, namely S+D\RH, which does not include the features generated by the RoT
(ExistsRoTAlarm) and HRTI (ExistsHRTIAlarm) algorithms, as a preliminary
investigation showed that they do not contribute significantly to the prediction of HF
hospitalization. Additionally, we considered another subset of this feature set, which does
not include the Rate QoL and Rate Overall Health features (abbreviated S\RQoL+D\RH).
The complete set of classifiers in this experiment is 52.
In the previous experiment we explained how we select the “set of best
models” with respect to YI, and how we then choose from this set the models with max
TPR and min FPR for each of the classification methods SVM, Jrip and J48. In this
section, and in the following experiments, we follow the same procedure for choosing
the “set of best models” and the models with max TPR and min FPR. Therefore, we
only show those classification models here. The complete results of all models can
be found in Appendix A.
As Table 5.9 shows, with respect to YI, the SVM models with cost 10 and cost 14
actually outperform the other models, as they have the maximum number of wins (15 and
19, respectively). Jrip with cost 14 performs better with the S\RQoL+D\RH
feature set than with the S+D and S+D\RH feature sets. Similarly,
J48 performs better when the S\RQoL+D\RH feature set is used than with the other
combinations of features.
Table 5.9 Set of best models according to Youden Index (S+D features).
Classification model TPR FPR Youden Index Cumulative YI Cumulative TPR (Nb+)-
Mean Std.dev. Mean Std.dev. Mean Std.dev. Nb + Nb = NB - Nb + Nb = NB - (Nb-)
S+D (SVM-cost 10) 0.4605 0.1687 0.1381 0.0327 0.3225 0.168 15 37 0 15 36 1 14
S+D (SVM-cost 14) 0.5476 0.1745 0.1843 0.0349 0.3634 0.176 19 33 0 32 20 0 32
S+D (Jrip-cost 14) 0.5349 0.1842 0.2496 0.0628 0.2853 0.1746 5 47 0 28 24 0 28
S+D\RH (Jrip-cost 10) 0.48 0.1865 0.1889 0.049 0.2911 0.1758 7 45 0 16 36 0 16
S+D\RH (Jrip-cost 14) 0.5385 0.1922 0.2413 0.0597 0.2972 0.1809 7 45 0 29 23 0 29
S+D\RH (J48-cost 9-obj.20) 0.3957 0.1686 0.1206 0.0387 0.2751 0.1646 8 55 0 10 37 5 5
S\RQoL+D\RH (Jrip-cost 9) 0.4618 0.1772 0.1664 0.0439 0.2953 0.1712 10 42 0 14 38 0 14
S\RQoL+D\RH (Jrip-cost 10) 0.4795 0.1763 0.1776 0.044 0.3019 0.1732 10 42 0 15 37 0 15
S\RQoL+D\RH (Jrip-cost 14) 0.5372 0.1851 0.2241 0.0666 0.3131 0.1693 12 40 0 29 23 0 29
S\RQoL+D\RH (J48-cost 8-obj.20) 0.3765 0.1656 0.0987 0.0376 0.2778 0.1603 9 43 0 7 39 6 1
S\RQoL+D\RH (J48-cost 9-obj.20) 0.404 0.1688 0.1263 0.0388 0.2777 0.1656 8 44 0 10 37 5 5
S\RQoL+D\RH (J48-cost 10-obj.10) 0.443 0.1716 0.1519 0.0338 0.2912 0.1683 12 40 0 16 36 0 16
S\RQoL+D\RH (J48-cost 10-obj.20) 0.4312 0.1723 0.1508 0.0384 0.2804 0.1685 8 44 0 14 38 0 14
S\RQoL+D\RH (J48-cost 14-obj.10) 0.47 0.1741 0.1918 0.0377 0.2782 0.1705 11 41 0 17 35 0 17
S\RQoL+D\RH (J48-cost 14-obj.20) 0.4743 0.167 0.1811 0.0397 0.2932 0.1672 13 39 0 24 28 0 24
As for the models with max TPR, SVM with cost 14 and the S+D feature set performs
statistically better than all other classifiers: it wins over 32 other models and has the
maximum TPR of 0.5476. The model with maximum TPR for the Jrip method is
S\RQoL+D\RH (Jrip-cost14), as it has more wins (29) than the other Jrip models.
Finally, S\RQoL+D\RH (J48-cost14-obj.20) is selected as the model with maximum TPR
for the J48 classification method.
Table 5.10 Result of t-test on FPR between classifiers of the “best set of models” (S+D)
(+ significantly outperforms, = tie, - significantly outperformed)
Classification model Cumulative FPR (Nb+)-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Nb + Nb = NB - (Nb-)
1 = S+D (SVM-cost 10) / - - - - = = - - + = = = - - 1 5 8 -7
2 = S+D (SVM-cost 14) + / - = - + = = = + + + + = = 6 6 2 4
3 = S+D (Jrip-cost 14) + + / + = + + + = + + + + + + 12 2 0 12
4 = S+D\RH (Jrip-cost 10) + = - / - + = = = + + + = = = 5 7 2 3
5 = S+D\RH (Jrip-cost 14) + + = + / + + + = + + + + + + 12 2 0 12
6 = S+D\RH (J48-cost 9-obj.20) = - - - - / - - - = = - = - - 0 4 10 -10
7 = S\RQoL+D\RH (Jrip-cost 9) = = - = - + / = - + + = = = = 3 8 3 0
8 = S\RQoL+D\RH (Jrip-cost 10) + = - = - + = / - + + = = = = 4 7 3 1
9 = S\RQoL+D\RH (Jrip-cost 14) + = = = = + + + / + + + + = = 8 6 0 8
10 = S\RQoL+D\RH (J48-cost 8-obj.20) - - - - - = - - - / - - - - - 0 1 13 -13
11 = S\RQoL+D\RH (J48-cost 9-obj.20) = - - - - = - - - + / - - - - 1 2 11 -10
12 = S\RQoL+D\RH (J48-cost 10-obj.10) = - - - - + = = - + + / = - - 3 4 7 -4
13 = S\RQoL+D\RH (J48-cost 10-obj.20) = - - = - = = = - + + = / - - 2 6 6 -4
14 = S\RQoL+D\RH (J48-cost 14-obj.10) + = - = - + = = = + + + + / = 6 6 2 4
15 = S\RQoL+D\RH (J48-cost 14-obj.20) + = - = - + = = = + + + + = / 6 6 2 4
Similarly, Table 5.10 shows the results of the t-test with respect to FPR on the
“set of best models”. The best models for the SVM, Jrip and J48 methods with respect to
minimum FPR are S+D(SVM-cost10), S\RQoL+D\RH(Jrip-cost9) and
S\RQoL+D\RH(J48-cost8-obj20), respectively, as they are outperformed by more
models than the others (SVM: 7, Jrip: 0, J48: 13).
5.5.3 Learning models using (S+H) feature set
This experiment is similar to the one using only the symptom (S) features. In addition
to the S feature set, the feature constructed from the medical history (H) is used. Again, we
consider a subset of this feature set, namely S\RQoL+H. The complete results of all 36
models constructed in the experiment are shown in Appendix A.
As Table 5.11 shows, there are 10 models in the “set of best models” with respect
to YI. However, none of the models but SVM with cost 14 significantly outperforms
the others with respect to YI; they are all tied to each other. In this case, as
explained before, the “set of best models” is selected from the models with the highest
mean YI. Additionally, it can be noticed that in this experiment, too, J48 performs
better when the Rate QoL and Rate Overall Health symptoms are not included.
Table 5.11 Set of best models according to Youden Index (S+H features).
Classification model TPR FPR Youden Index Cumulative YI Cumulative TPR (Nb+)-
Mean Std.dev. Mean Std.dev. Mean Std.dev. Nb + Nb = NB - Nb + Nb = NB - (Nb-)
S+H (SVM-cost 10) 0.4008 0.1689 0.144 0.0323 0.2567 0.1689 0 36 0 4 31 1 3
S+H (SVM-cost 14) 0.4851 0.1752 0.1994 0.037 0.2857 0.1769 3 33 0 19 17 0 19
S+H (Jrip-cost 10) 0.4232 0.1868 0.1769 0.053 0.2463 0.1795 0 36 0 3 33 0 3
S+H (Jrip-cost 14) 0.5123 0.1953 0.2437 0.0664 0.2686 0.1878 0 36 0 21 15 0 21
S\RQoL+H (Jrip-cost 8) 0.3824 0.1825 0.1347 0.0456 0.2477 0.1792 0 36 0 3 32 1 2
S\RQoL+H (Jrip-cost 9) 0.3985 0.1861 0.1562 0.0482 0.2423 0.1773 0 36 0 3 33 0 3
S\RQoL+H (Jrip-cost 10) 0.4205 0.1795 0.1741 0.0516 0.2464 0.1741 0 36 0 5 31 0 5
S\RQoL+H (Jrip-cost 14) 0.502 0.1832 0.2357 0.0715 0.2662 0.1761 0 36 0 21 15 0 21
S\RQoL+H (J48-cost 9-obj.10) 0.3685 0.1686 0.1281 0.0351 0.2404 0.1665 0 36 0 3 31 2 1
S\RQoL+H (J48-cost 9-obj.20) 0.3677 0.1612 0.1206 0.0388 0.2471 0.1602 0 36 0 3 30 3 0
With respect to TPR, as Table 5.11 shows, Jrip with cost 14 outperforms the other
models for both the S+H and S\RQoL+H feature subsets (21 wins in both cases).
However, we select the S\RQoL+H (Jrip-cost14) model as the best for the Jrip
method, as it uses fewer features (and fewer rules). The model with max TPR for the
SVM method is the model with cost 14 and the S+H feature set (S+H(SVM-cost14)); it is
actually the second-best model overall with respect to TPR, winning over
19 other models. On the other hand, the J48 models do not perform significantly better
than many other models: the best J48 model, S\RQoL+H(J48-cost9-obj.10), wins over only
3 other models and is itself outperformed by 2 models.
Table 5.12 shows the t-test performed on the “set of best models” with respect to
FPR. The selected models with min FPR for SVM and Jrip are S+H(SVM-cost10)
and S\RQoL+H(Jrip-cost8), as they are outperformed by more of the other models with
respect to FPR. For J48, two models perform equally with respect to FPR, as
both are outperformed by 5 other models; however, the model S\RQoL+H(J48-
cost9-obj.20) is selected, as it has a slightly smaller mean FPR (Table 5.11).
Table 5.12 Result of T-test between classifiers of the “best set of models” (S+H)
(+ significantly outperforms, = tie, - significantly outperformed)
Classification model Cumulative FPR (Nb+)-
1 2 3 4 5 6 7 8 9 10 Nb + Nb = NB - (Nb-)
1 = S+H (SVM-cost 10) / - = - = = = - = = 0 6 3 -3
2 = S+H (SVM-cost 14) + / = = + + = = + + 5 4 0 5
3 = S+H (Jrip-cost 10) = = / - = = = - + + 2 5 2 0
4 = S+H (Jrip-cost 14) + = + / + + + = + + 7 2 0 7
5 = S\RQoL+H (Jrip-cost 8) = - = - / = - - = = 0 5 4 -4
6 = S\RQoL+H (Jrip-cost 9) = - = - = / = - = = 0 6 3 -3
7 = S\RQoL+H (Jrip-cost 10) = = = - + = / - + + 3 4 2 1
8 = S\RQoL+H (Jrip-cost 14) + = + = + + + / + + 7 2 0 7
9 = S\RQoL+H (J48-cost 9-obj.10) = - - - = = - - / = 0 4 5 -5
10 = S\RQoL+H (J48-cost 9-obj.20) = - - - = = - - = / 0 4 5 -5
5.5.4 Learning models using (S+D+H) feature set
This experiment extends the experiment that uses the (S+D) features with the
additional feature constructed from the medical history (H). As in the (S+D)
experiment, two additional feature subsets are considered, S+D\RH+H and
S\RQoL+D\RH+H. The total number of classification models compared in this
experiment is 52, and the complete results are shown in Appendix A.
From Table 5.13, it can be noticed that some classification models actually
statistically significantly outperform the others, such as SVM for all costs (most of all
with cost 14), and Jrip with cost 14 for both the S+D+H and S\RQoL+D\RH+H feature
subsets, as they have wins over other models. As already explained, for completeness
and in order to select models with min FPR, we also include in the set of best models
other models that have an approximately similar or slightly smaller mean YI.
Table 5.13 Set of best models according to Youden Index (S+D+H features).
Classification model TPR FPR Youden Index Cumulative YI Cumulative TPR (Nb+)-
Mean Std.dev. Mean Std.dev. Mean Std.dev. Nb + Nb = NB - Nb + Nb = NB - (Nb-)
S+D+H (SVM-cost8) 0.4045 0.171 0.1048 0.0287 0.2997 0.1703 7 45 0 9 38 5 4
S+D+H (SVM-cost9) 0.4577 0.1828 0.1248 0.0319 0.3329 0.182 19 33 0 15 34 3 12
S+D+H (SVM-cost10) 0.4914 0.1771 0.1371 0.0337 0.3543 0.1777 25 27 0 22 30 0 22
S+D+H (SVM-cost14) 0.5482 0.1784 0.1788 0.0366 0.3694 0.1765 31 21 0 35 17 0 35
S+D+H (Jrip-cost9) 0.4313 0.1908 0.176 0.0511 0.2553 0.1749 0 52 0 10 39 3 7
S+D+H (Jrip-cost10) 0.4718 0.1931 0.1999 0.0607 0.2719 0.1754 0 52 0 14 37 1 13
S+D+H (Jrip-cost14) 0.5944 0.2068 0.2645 0.0694 0.3299 0.1845 12 40 0 40 12 0 40
S+D+H (J48-cost10-obj20) 0.4337 0.1699 0.1608 0.0395 0.2729 0.169 3 49 0 14 35 3 11
S+D+H (J48-cost14-obj20) 0.4777 0.1809 0.2049 0.0425 0.2729 0.1799 0 45 7 23 29 0 23
S+D\RH+H (JRip-cost10) 0.4873 0.2111 0.2063 0.0637 0.281 0.1926 0 52 0 16 36 0 16
S+D\RH+H (JRip-cost14) 0.5917 0.1894 0.2814 0.0657 0.3103 0.1723 7 45 0 42 10 0 42
S+D\RH+H (J48-cost14-obj20) 0.4646 0.1747 0.2055 0.0414 0.2591 0.1716 1 51 0 23 28 1 22
S\RQoL+D\RH+H (Jrip-cost 8) 0.4224 0.1818 0.1542 0.0424 0.2683 0.171 0 52 0 9 40 3 6
S\RQoL+D\RH+H (Jrip-cost 9) 0.442 0.1966 0.1751 0.0474 0.2669 0.1843 0 52 0 10 40 2 8
S\RQoL+D\RH+H (Jrip-cost 10) 0.4733 0.2011 0.1974 0.0592 0.2759 0.1844 0 52 0 14 37 1 13
S\RQoL+D\RH+H (Jrip-cost 14) 0.6235 0.1931 0.2688 0.0601 0.3547 0.1825 21 31 0 45 7 0 45
When we select the models with maximum TPR, exactly the models
discussed above are chosen. For the SVM method, the model with max TPR is the
model with cost 14 (S+D+H(SVM-cost 14)), as it has the highest number of wins over
other models (35 in total). Similarly, S\RQoL+D\RH+H(Jrip-cost 14), with 45 wins
over other models, is the best model of the Jrip method. For J48, the model with max
TPR is S+D\RH+H(J48-cost14-obj20).
Table 5.14 shows the results of the t-test with respect to FPR over the set of best
models. Here, the models with min FPR are those with lower costs. For SVM, it is
the model with cost 8, for Jrip it is the S\RQoL+D\RH+H(Jrip-cost 8) model, and for
J48 the best model with respect to FPR is S+D+H(J48-cost 10-obj20).
Table 5.14 Result of t-test on FPR between classifiers of the “best set of models” (S+D+H)
(+ significantly outperforms, = tie, - significantly outperformed)
Classification model Cumulative FPR (Nb+)-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Nb + Nb = NB - (Nb-)
1 = S+D+H (SVM-cost8) / - - - - - - - - - - - - - - - 0 0 15 -15
2 = S+D+H (SVM-cost9) = / - - - - - - - - - - = - - - 1 1 13 -12
3 = S+D+H (SVM-cost10) = = / - - - - = - - - - = - - - 2 2 11 -9
4 = S+D+H (SVM-cost14) = = = / = = - = = = - = = = = - 3 9 3 0
5 = S+D+H (Jrip-cost9) = = = = / = - = = = - = = = = - 3 9 3 0
6 = S+D+H (Jrip-cost10) = = = = = / - = = = - = = = = - 4 8 3 1
7 = S+D+H (Jrip-cost14) = = = = = = / = = = = = = = = = 13 2 0 13
8 = S+D+H (J48-cost10-obj20) = = = = = = - / - - - - = = = - 2 7 6 -4
9 = S+D+H (J48-cost14-obj20) = = = = = = - = / = - = = = = - 5 7 3 2
10 = S+D\RH+H (JRip-cost10) = = = = = = - = = / - = = = = - 5 7 3 2
11 = S+D\RH+H (JRip-cost14) = = = = = = = = = = / = = = = = 13 2 0 13
12 = S+D\RH+H (J48-cost14-obj20) = = = = = = - = = = - / = = = - 5 7 3 2
13 = S\RQoL+D\RH+H (Jrip-cost 8) = = = = = - - = - - - - / = - - 1 6 8 -7
14 = S\RQoL+D\RH+H (Jrip-cost 9) = = = = = = - = = = - = = / = - 4 8 3 1
15 = S\RQoL+D\RH+H (Jrip-cost 10) = = = = = = - = = = - = = = / - 4 8 3 1
16 = S\RQoL+D\RH+H (Jrip-cost 14) = = = = = = = = = = = = = = = / 13 2 0 13
5.5.5 Learning with feature selection (S+D+H+FS)
In the last experiment we use all feature sets (S+D+H) and apply feature
selection using an exhaustive search algorithm in combination with the Correlation-based
Feature Selection (CFS) method. Using 10-fold cross-validation, we utilize the
AttributeSelectedClassifier for the different learning algorithms (Jrip, J48 and SVM) and
different combinations of parameters and costs.
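The subset score that the CFS evaluator maximizes during the exhaustive search is the merit M_S = k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the average feature-class correlation and r_ff the average feature-feature correlation of the k features in subset S. The sketch below uses invented correlation values to show the trade-off CFS encodes: relevance to the class is rewarded, redundancy among features is penalized.

```python
import math

def cfs_merit(feature_class_corr, avg_feature_feature_corr):
    """CFS merit of a feature subset, given its per-feature class correlations."""
    k = len(feature_class_corr)
    r_cf = sum(feature_class_corr) / k          # mean feature-class correlation
    r_ff = avg_feature_feature_corr             # mean feature-feature correlation
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# A compact subset of weakly intercorrelated, class-relevant features beats
# a larger subset whose extra features are redundant (higher r_ff).
small = cfs_merit([0.40, 0.35, 0.30, 0.30, 0.25], 0.1)
large = cfs_merit([0.40, 0.35, 0.30, 0.30, 0.25, 0.20, 0.20], 0.4)
print(small > large)  # True
```

This bias toward small, non-redundant subsets is consistent with the outcome reported below, where the selected models converge on a common set of only 5 features.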
There are 9 models in the “set of best models”. These models were selected based on
the highest mean YIs, as the t-test shows that none of the models (but the
S+D+H+FS (SVM-cost14) model) significantly outperforms the others.
Table 5.15 Set of best models according to Youden Index (S+D+H+FS features).
Classification model TPR FPR Youden Index Cumulative YI Cumulative TPR (Nb+)-
Mean Std.dev. Mean Std.dev. Mean Std.dev. Nb + Nb = NB - Nb + Nb = NB - (Nb-)
S+D+H+FS (SVM-cost8) 0.4173 0.1736 0.1054 0.0358 0.3119 0.1711 0 20 0 0 18 2 -2
S+D+H+FS (SVM-cost9) 0.4369 0.174 0.1111 0.0333 0.3258 0.1732 0 20 0 3 16 1 2
S+D+H+FS (SVM-cost10) 0.4401 0.1714 0.1187 0.0354 0.3213 0.1722 0 20 0 3 16 1 2
S+D+H+FS (SVM-cost14) 0.5408 0.1971 0.1872 0.056 0.3536 0.1844 2 18 0 15 5 0 15
S+D+H+FS (Jrip-cost10) 0.4371 0.1938 0.1544 0.0552 0.2827 0.176 0 20 0 2 16 2 0
S+D+H+FS (Jrip-cost14) 0.5303 0.1935 0.203 0.0521 0.3273 0.1835 0 20 0 13 7 0 13
S+D+H+FS (J48-cost14-obj2) 0.4571 0.1852 0.1604 0.0497 0.2967 0.1755 0 20 0 6 14 0 6
S+D+H+FS (J48-cost14-obj10) 0.472 0.1826 0.1696 0.0514 0.3024 0.1713 0 20 0 8 12 0 8
S+D+H+FS (J48-cost14-obj20) 0.4862 0.1921 0.1779 0.0529 0.3083 0.179 0 20 0 10 10 0 10
As for the best models with respect to TPR, the models with the highest cost, 14, were
selected for all methods: SVM (15 wins), Jrip (13 wins), and J48 with object
size = 20 (10 wins).
The results of the t-test with respect to FPR are shown in Table 5.16. For SVM
there are 3 models that perform equally, as all are outperformed by 6 other
models; however, the model with cost 8 is selected as best with respect to FPR, as it
has the smallest mean FPR (0.1054) (Table 5.15). For Jrip and J48, the models with cost 10 and
cost 14 (with object size = 2), respectively, are chosen as best with respect to
min FPR.
Table 5.16 Result of T-test on FPR between classifiers of the “best set of models”
(S+D+H+FS) (+ significantly outperforms, = tie, - significantly outperformed)
Classification model Cumulative FPR (Nb+)-
1 2 3 4 5 6 7 8 9 Nb + Nb = NB - (Nb-)
1 = S+D+H+FS (SVM-cost8) / = = - - - - - - 0 2 6 -6
2 = S+D+H+FS (SVM-cost9) = / = - - - - - - 0 2 6 -6
3 = S+D+H+FS (SVM-cost10) = = / - - - - - - 0 2 6 -6
4 = S+D+H+FS (SVM-cost14) + + + / = = = = = 3 5 0 3
5 = S+D+H+FS (Jrip-cost10) + + + = / - = = = 3 4 1 2
6 = S+D+H+FS (Jrip-cost14) + + + = + / + + = 6 2 0 6
7 = S+D+H+FS (J48-cost14-obj2) + + + = = - / = = 3 4 1 2
8 = S+D+H+FS (J48-cost14-obj10) + + + = = - = / = 3 4 1 2
9 = S+D+H+FS (J48-cost14-obj20) + + + = = = = = / 3 5 0 3
Although the selected best models with respect to TPR and FPR for all methods
(SVM, Jrip and J48) show slightly different results, they actually produce the same
feature set, consisting of 5 features.
5.5.6 Summary of Classification model prediction
In the previous sections we experimented with the learning process for the
classification task. We performed 5 different experiments using combinations of
different feature sets, and in each experiment we compared the
performance of three classification methods (SVM, Jrip and J48); for
J48, different parameter settings were also considered. As the output of this
experimentation, from each experiment we constructed a “set of best models”
that perform approximately equally with respect to the Youden Index. Additionally, for
each classification method, we selected from the “set of best models” two
models that significantly outperform the others with respect to TPR (max TPR) and FPR
(min FPR), respectively. These models will be used to evaluate our approach on an
independent test dataset.
Figure 5.2 shows the selected “sets of best models” with respect to YI from all
experiments. We can see from the figure that using the information extracted from
the daily measurements (D features) together with the S features improves the
performance of the classification techniques compared with using only the S features
or combining the S features with the patient’s medical history (H features): the
corresponding points lie closer to the top-left corner. Adding the patient’s
medical history (H features) to the symptoms (S features) gives a considerable
improvement for the SVM technique, and does not change the performance of the Jrip and
J48 techniques compared with using only the S features. On the other hand, adding the
medical history information (H) to the symptoms (S) and daily measurements (D) improves the
performance of SVM, does not change (or slightly degrades) the performance of Jrip, and
degrades the performance of J48. Finally, the figure shows that feature selection
improves the performance of all classification techniques (except for SVM using the
S+D+H features, whose performance is unchanged).
[Figure: scatter plot of TPR (sensitivity) versus FPR (1-specificity) for the best models
of each classifier and feature-set combination — S (SVM, Jrip), S\RQoL (Jrip, J48),
S+D (SVM, JRip), S+D\RH (JRip, J48), S\RQoL+D\RH (JRip, J48), S+H (SVM, Jrip),
S\RQoL+H (Jrip, J48), S+D+H (SVM, Jrip, J48), S+D\RH+H (Jrip, J48),
S\RQoL+D\RH+H (JRip), S+D+H+FS (SVM, Jrip, J48) — together with the random-guess line.]
Figure 5.2 Hospitalization prediction accuracies for different classifiers and feature
sets on the training dataset.
The SVM classifiers show lower FPRs, while JRip shows higher TPRs. J48 has shown the
lowest FPR, but overall was slightly behind the other two classification techniques with
respect to the Youden Index.
Furthermore, in all experiments, classifiers with cost 14 perform better with
respect to TPR than classifiers with other costs; for J48, models
with object size = 20 additionally show better performance. With respect to FPR, classifiers with lower
costs (usually 8 or 10) perform better than classifiers with higher costs.
In terms of using subsets of the feature sets (e.g. S\RQoL, S+D\RH), J48 classifiers
without the Rate QoL and Rate Overall Health symptoms and without the features from the RoT
and HRTI algorithms perform better with respect to YI, TPR and FPR than
classifiers in which these features are included. Similarly, Jrip classifiers
without the RoT and HRTI features perform better with respect to TPR and FPR.
5.6 Heart Failure Hospitalization Evaluation
In Section 5.5 we discussed the process of learning classification models from the
training dataset. For the different combinations of feature sets, three classification
methods, SVM, Jrip and J48, were applied with different costs. From the set of
best models with respect to the Youden Index, we selected the two best classification models
with respect to TPR (maximum) and FPR (minimum) for each classification method
and each combination of feature sets.
In this section we evaluate the selected models of our approach on the test dataset
and compare the performance of our approach to the other existing approaches. As
shown in Figure 5.3, we performed two different evaluations, using different
definitions of the operational setting for computing the TPs, FPs, FNs and TNs.
[Figure: evaluation setup. Both the classifier models and the existing rules are applied
to the test dataset; the TPs, FPs, FNs and TNs are computed in two ways (on a daily basis
and using a period of validity) and the resulting performances are compared.]
Figure 5.3 Evaluation setup.
We first evaluate our approach according to the evaluation performed in [46],
based on the oversimplified setup discussed in Chapter 4. After showing the results
of this evaluation and its limitations, we return to the operational setting of how
classifiers should cast a prediction: we created a simple adaptive engine for casting
alarms and, based on its output, we evaluate the performance of the
approaches and compare them with the existing one.
5.6.1 Evaluation on a daily basis
5.6.1.1 Setup definition
The setup for the first evaluation follows the evaluation in [46]. The algorithms
have to take a decision on every day on which measurements are available, and the
goal is to predict a HF hospitalization within the next 14 days (explained in the
related work in Section 4.2). An alarm is counted as a TP if it is fired within
14 days before a HF hospitalization; an alarm generated on a day outside this range
is counted as a FP. All days with measurements that are within the 14-day window
before a HF hospitalization and on which no alarm is generated are FNs. Finally,
all days (with measurements) with no alarm that do not belong to the specified time
period are TNs. TPR and FPR are computed as in formula (1).
However, due to the definition of the above metrics, the TPR and FPR may not
meaningfully describe an algorithm's ability to detect a HF hospitalization.
As in [46], we therefore also use the hospitalization detection rate (HDR), which
represents the proportion of HF hospitalizations that have at least one alarm in the
14-day window prior to the hospitalization.
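The bookkeeping above can be sketched in Python as follows (a minimal illustration under the assumption that days are plain integer indices; `daily_metrics` is a hypothetical helper, not part of the thesis implementation):

```python
# Sketch of the daily-basis evaluation: every day with a measurement is
# classified as TP/FP/FN/TN relative to 14-day windows before each HF
# hospitalization; HDR counts hospitalizations with at least one alarm
# in their window. Inputs are plain day indices (illustrative).

def daily_metrics(measurement_days, alarm_days, hosp_days, window=14):
    alarm_days = set(alarm_days)
    # days that fall within `window` days before some hospitalization
    positive = {d for h in hosp_days for d in range(h - window, h)}
    tp = fp = fn = tn = 0
    for d in measurement_days:
        if d in positive:
            if d in alarm_days: tp += 1
            else:               fn += 1
        else:
            if d in alarm_days: fp += 1
            else:               tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    # hospitalization detection rate: fraction of hospitalizations with
    # at least one alarm in the `window` days before them
    hdr = sum(
        any(h - window <= a < h for a in alarm_days) for h in hosp_days
    ) / len(hosp_days)
    return tp, fp, fn, tn, tpr, fpr, hdr
```

Note that a single hospitalization can contribute up to 14 TPs under this counting, which is exactly the distortion that HDR corrects for.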
5.6.1.2 Evaluation results
Because we wanted to test the algorithms on a daily basis, a new instance was
constructed for each day on which there was at least one weight measurement, for
each patient from the test dataset (29 patients). The number of positive test
instances was 220, and of negative instances 9605.
The results on the test dataset under the above setup are shown in Figure 5.4 and
complemented by Table 5.17.

[Figure 5.4 here: scatter of TPR (sensitivity) against FPR (1-specificity) for all classifier/feature-set combinations (S, S+D, S+H, S+D+H, S+D+H+FS for SVM, Jrip, and J48), the previous-approach algorithms (RoT, MACD, WTI, HRTI, UnionMW, UnionMWH, UnionRMW, UnionRMWH), and a random-guess reference line.]

Figure 5.4 Hospitalization prediction accuracies for different classifiers and feature
sets on the test dataset.

It should be noted that, because daily-based prediction is simulated in the given
operational setting, the number of correct classifications can be up to 14 times
higher than the actual number of hospitalizations. Therefore, we also report the
hospitalization detection rate (HDR), which has the same semantics as the TPR during
the learning phase (Figure 5.5). The most important observation here is that all
classification approaches perform much better than the previous approaches according
to both the Youden index and the hospitalization detection rate.
However, the relative performance of the different classification techniques
according to the Youden index differs somewhat from their performance on the training
data. This is a reasonable outcome, since the numbers of TPs and FPs can be higher.
We can see that using both daily measurement (D) features and symptom (S) features
improves the performance of the Jrip and J48 classification techniques, and slightly
degrades the performance of SVM, compared to using only S features. In general, SVM
using only S features, and the Jrip and J48 classifiers that use both symptom (S) and
daily measurement (D) features, show the best performance with respect to the Youden
index. On the other hand, adding information from the medical history (H) to the
symptoms (S) improves the performance of only J48 and degrades the performance of
Jrip and SVM. Similarly, adding medical history (H) features to daily measurements
(D) and symptoms (S) degrades the performance of SVM and Jrip, while the performance
of J48 does not change.
Table 5.17 Prediction accuracies on the test dataset.
Classification model TP FN FP TN TPR FPR Yindex Hrate #Alarms
Our Approach
S (SVM) max TPR 135 85 2108 7497 0.6136 0.2195 0.3941 0.6 2243
min FPR 118 102 1338 8267 0.5364 0.1393 0.3971 0.5 1456
(Jrip) max TPR 135 85 2769 6836 0.6136 0.2883 0.3253 0.65 2904
min FPR 106 114 1583 8022 0.4818 0.1648 0.317 0.5 1689
(J48) max TPR 80 140 1014 8591 0.3636 0.1056 0.258 0.35 1094
min FPR 82 138 1226 8379 0.3727 0.1276 0.2451 0.4 1308
S+D (SVM) max TPR 129 91 2077 7528 0.5864 0.2162 0.3702 0.5 2206
min FPR 126 94 1725 7880 0.5727 0.1796 0.3931 0.45 1851
(Jrip) max TPR 116 104 1626 7979 0.5273 0.1693 0.358 0.6 1742
min FPR 112 108 1197 8408 0.5091 0.1246 0.3845 0.55 1309
(J48) max TPR 99 121 1127 8478 0.45 0.1173 0.3327 0.55 1226
min FPR 87 133 844 8761 0.3955 0.0879 0.3076 0.55 931
S+H (SVM) max TPR 84 136 1628 7977 0.3818 0.1695 0.2123 0.4 1712
min FPR 64 156 1287 8318 0.2909 0.134 0.1569 0.4 1351
(Jrip) max TPR 87 133 1457 8148 0.3955 0.1517 0.2438 0.4 1544
min FPR 86 134 1383 8222 0.3909 0.144 0.2469 0.35 1469
(J48) max TPR 91 129 1169 8436 0.4136 0.1217 0.2919 0.4 1260
min FPR 91 129 1160 8445 0.4136 0.1208 0.2928 0.4 1251
S+D+H (SVM) max TPR 97 123 1926 7679 0.4409 0.2005 0.2404 0.45 2023
min FPR 97 123 1212 8393 0.4409 0.1262 0.3147 0.4 1309
(Jrip) max TPR 95 125 1720 7885 0.4318 0.1791 0.2527 0.55 1815
min FPR 85 135 1135 8470 0.3864 0.1182 0.2682 0.55 1220
(J48) max TPR 129 91 2261 7344 0.5864 0.2354 0.351 0.6 2390
min FPR 127 93 2268 7337 0.5773 0.2361 0.3412 0.6 2395
S+D+H+FS (SVM) max TPR 106 114 1741 7564 0.4818 0.1871 0.2947 0.6 1847
min FPR 95 125 679 8926 0.4318 0.0707 0.3611 0.55 774
(Jrip) max TPR 108 112 1749 7856 0.4909 0.1821 0.3088 0.65 1857
min FPR 72 148 1005 8600 0.3273 0.1046 0.2227 0.35 1077
(J48) max TPR 106 114 1741 7864 0.4818 0.1813 0.3005 0.6 1847
min FPR 94 126 737 8868 0.4273 0.0767 0.3506 0.25 831
Previous approach
RoT 8 212 184 9430 0.0364 0.0191 0.0173 0.3 192
MACD 40 180 554 9060 0.1818 0.0576 0.1242 0.3 594
WTI 28 192 168 9446 0.1273 0.0175 0.1098 0.2 196
HRTI 2 213 208 9170 0.0093 0.0222 -0.0129 0.05 210
MACD+WTI 51 169 669 8945 0.2318 0.0696 0.1622 0.3 720
MACD+WTI+HRTI 53 168 867 8827 0.2398 0.0894 0.1504 0.35 920
RoT+MACD+WTI 56 164 826 8788 0.2545 0.0859 0.1686 0.4 882
RoT+MACD+WTI+HRTI 58 163 1021 8673 0.2624 0.1053 0.1571 0.45 1079
Feature selection results in models that favor optimization of FPR, and the SVM
classifier stands out here. The FPRs of the tested classifiers are rather high
compared to the existing approaches because S features often have a high impact on
the classification. Especially when looking at the absolute numbers of FPs (false
alarms) in Table 5.17, these numbers are very high. Indeed, S features allow
predicting a HF hospitalization in many cases. However, these predictions are
imprecise in the sense that S features normally indicate a high chance of
hospitalization within a month, but not within a particular 14-day period. Therefore,
if classification is based primarily on S features, the model can generate a large
number of false alarms in a row. To avoid such situations, additional handling
mechanisms should be introduced. Alternatively, the classifier can simply output a
'possible hospitalization within 30 days' warning.
With respect to the hospitalization rate (Figure 5.5), the maximum of 0.65 is
obtained by Jrip using only symptom (S) features and by feature selection over all
features (S+D+H+FS). Using both S and D features improves the performance of only the
J48 technique, while it degrades the performance of SVM, compared with using only S
features. The performance of Jrip is higher for the models with max TPR and lower for
the models with min FPR. Adding medical history information to the S features worsens
the performance of all methods, and adding H to S+D improves the performance of only
J48 while worsening it for the other models.
[Figure 5.5 here: bar chart of hospitalization rate for the models with max TPR and min FPR (SVM, Jrip, J48) across the feature sets S, S+D, S+H, S+D+H, S+D+H+FS, and the previous-approach algorithms (RoT/MACD/UnionMW, WTI, HRTI, UnionMWH, UnionRMW, UnionRMWH).]
Figure 5.5 Hospitalization rates on the test dataset.
We can conclude that our classification approach in general performs better than the
existing approach in terms of both the Youden index and the hospitalization detection
rate (HDR). The maximum HDR is obtained by Jrip and by feature selection. For the
Youden index, the models with maximum TPRs are those of the SVM method, while feature
selection yields lower FPRs.
5.6.1.3 Limitations
This evaluation has some limitations. First, not all days are considered (e.g. days
without measurements are not taken into account). Because no alarm exists on those
days, they should be counted as either FNs or TNs, depending on whether the day falls
within the 14-day period before a HF hospitalization or outside of it. Second, the
period of validity of an alarm is not taken into account in the definition of the
metrics (TP, FP, TN, FN). For example, the algorithms used in [46] were constructed
based on an observation of the data 14 days prior to a HF hospitalization. In our
approach we also have rules (e.g. from symptoms) that are valid for a longer period
(e.g. 30 days). Therefore, as was shown, there can be several alarms within the
14-day period (TPs) that predict the same hospitalization, or more FPs (e.g. 10 false
alarms on 10 consecutive days). The hospitalization rate solves the problem of many
TPs, but the number of FPs remains very high and, moreover, the FPR does not reflect
the real performance.
5.6.2 Evaluation using prediction period
5.6.2.1 Setup definition
As explained before, we used different features when training our classifiers.
These features have different characteristics and are taken from different time
windows. For example, the PreviousHFhosp feature indicates whether there was a
previous HF hospitalization within the last 30 days before the current HF
hospitalization, features from daily measurements were constructed from the time
window of 14 days prior to a HF hospitalization, and symptom features were taken from
the last monthly contact (which in the extreme case may date back 40 days). As shown
in Table 5.18, taking this into account, our classifiers may learn rules with
different features during the training phase: only symptom (S) features, only the
PreviousHFhosp feature, symptoms and PreviousHFhosp, only daily measurement (D)
features, daily measurements (D) and PreviousHFhosp, symptom and daily measurement
(S+D) features, or a union of all (D+PreviousHFhosp+S).
Because of the different types of features and the periods over which they are
constructed, we define a prediction period for which a particular classifier may
predict (the period in which an alarm is valid). For example, if a learned rule
consists only of monthly symptoms, then it is valid at most 40 days from the day of
the monthly contact, or until the next monthly contact when new symptoms become
available (at which point new rules may apply). If a learned rule consists only of
features from daily measurements, then it is valid for the next 14 days. All possible
combinations of rules with the prediction periods in which they are valid are shown
in Table 5.18. It should be noted that construction of the rules is possible only for
the Jrip and J48 algorithms, since their output is interpretable; SVM classifiers do
not produce a readable format and are therefore not considered in this process.
Table 5.18 Types of rules constructed during the learning process.
Type of rule (using features) | Example of a rule | Prediction period
S | (SwellingAnkles = B and BreathlessnessLimitActivity = A) => existsHF | dayOfLastMC + 40
only PreviousHFhosp | (PreviousHFhosp = True) => existsHF | dayOfLastHFdisch + 30
S+PreviousHFhosp | (PreviousHFhosp = True and BreathlessnessLimitActivity = B) => existsHF | min(dayOfLastMC + 40, dayOfLastHFdisch + 30)
D | (ExistsWTIAlarm = Y and ExistsMACDAlarm = Y) => existsHF | currentDay + 14
D+PreviousHFhosp | (PreviousHFhosp = True and ExistsMACDAlarm = Y) => existsHF | min(currentDay + 14, dayOfLastHFdisch + 30)
S+D | (BreathlessnessLimitActivity = A and ExistsMACDAlarm = Y) => existsHF | min(currentDay + 14, dayOfLastMC + 40)
D+PreviousHFhosp+S | (BreathlessnessLimitActivity = A and ExistsMACDAlarm = Y and PreviousHFhosp = True) => existsHF | min(currentDay + 14, dayOfLastHFdisch + 30, dayOfLastMC + 40)
Legend:
• dayOfLastMC – the day at which the last monthly contact is recorded
• dayOfLastHFdisch – the day at which the last discharge from hospitalization due to HF is recorded
• currentDay – the current day at which a rule is fired
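The prediction-period logic of Table 5.18 can be sketched as follows (an illustrative Python helper; the argument names mirror the legend, but the function itself is an assumption of this sketch, not part of the thesis implementation):

```python
# Sketch of Table 5.18's prediction-period logic: the validity of a rule
# is the minimum over the expiry days implied by the feature types it
# uses. Day arguments are plain integer day indices (illustrative).

def prediction_period_end(feature_types, current_day,
                          day_of_last_mc=None, day_of_last_hf_disch=None):
    candidates = []
    if "S" in feature_types:               # monthly symptoms
        candidates.append(day_of_last_mc + 40)
    if "PreviousHFhosp" in feature_types:  # recent HF discharge
        candidates.append(day_of_last_hf_disch + 30)
    if "D" in feature_types:               # daily measurements
        candidates.append(current_day + 14)
    return min(candidates)
```

For a rule mixing S and D features, for instance, the shorter of the two expiry days wins, matching the min(...) entries in the table.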
Because of the limitations of the previous evaluation, and because of the
characteristics of the generated rules, we define how the obtained classifiers should
be used in a real-time setting for predicting a HF hospitalization. Taking into
account the characteristics of the possible rules that a classifier produces, we
construct a new adaptive (meta) classifier for each of the classifiers. To each rule
of a classifier we add a new condition that checks the period for which that
particular rule is valid. Together with an alarm for a possible HF hospitalization,
the adaptive (meta) classifier also outputs the prediction period in which that alarm
is valid. Table 5.19 shows some examples of original (unchanged) rules and the new
adaptive rules constructed from them. For example, R1.1 is an original rule which, in
the previous evaluation, predicts a HF hospitalization within 14 days. However, since
the rule is constructed from symptoms, the new adaptive rule (R1.2) constructed from
it has a prediction period of at most 40 days after the last monthly contact at which
the symptoms were gathered. Similar adaptive rules may be constructed for the other
rules. It should be noted that additional output can be generated, such as the
features (reasons) from which the rule was fired, but for simplicity we show only
what is important for the prediction task.
We construct a simple online adaptive engine that, based on the meta rules of the
classifiers and their characteristics, handles the possible alarms generated by the
meta rules. This means that a meta classifier may produce an alarm, but the engine
will not necessarily show it. Only alarms that are allowed (fired) by the engine are
real alarms, and these are output by the system.
Table 5.19 Examples of adaptive (meta) rules.
Rule# | Possible rule | Output
R1.1 | (SwellingAnkles = B and BreathlessnessLimitActivity = A) => existsHF | Alert for possible hospitalization in 14 days
R1.2 | (SwellingAnkles = B and BreathlessnessLimitActivity = A and (currentDay - dayOfLastMC) < 40) => existsHF | Alert for possible hospitalization valid in the next (dayOfLastMC + 40 - currentDay) days
R2.1 | (PreviousHFhosp = True) => existsHF | Alert for possible hospitalization in 14 days
R2.2 | (PreviousHFhosp = True and (currentDay - dayOfLastHFdisch) < 30) => existsHF | Alert for possible hospitalization valid in the next (dayOfLastHFdisch + 30 - currentDay) days
R3.1 | (ExistsWTIAlarm = Y and ExistsMACDAlarm = Y) => existsHF | Alert for possible hospitalization in 14 days
R3.2 | (ExistsWTIAlarm = Y and ExistsMACDAlarm = Y) => existsHF | Alert for possible hospitalization in 14 days
Legend:
• dayOfLastMC – the day at which the last monthly contact is recorded
• dayOfLastHFdisch – the day at which the last discharge from hospitalization due to HF is recorded
• currentDay – the current day at which a rule is fired
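Wrapping an original rule into an adaptive (meta) rule, as in Table 5.19, can be sketched as follows (an illustrative Python fragment; modeling rules as predicates over a plain instance dictionary is an assumption of this sketch):

```python
# Sketch of turning an original rule into an adaptive (meta) rule as in
# Table 5.19: the meta rule adds a validity check and outputs the number
# of days for which the alarm remains valid. All names are illustrative.

def make_meta_rule(rule, period_end_fn):
    """rule: instance -> bool; period_end_fn: instance -> last valid day."""
    def meta_rule(instance):
        if not rule(instance):
            return None                      # original rule does not fire
        days_left = period_end_fn(instance) - instance["currentDay"]
        if days_left <= 0:
            return None                      # rule fired outside its validity
        return {"alert": "possible HF hospitalization",
                "valid_for_days": days_left}
    return meta_rule

# R1.2 from Table 5.19: symptom rule, valid until dayOfLastMC + 40
r1 = lambda x: (x["SwellingAnkles"] == "B"
                and x["BreathlessnessLimitActivity"] == "A")
meta_r1 = make_meta_rule(r1, lambda x: x["dayOfLastMC"] + 40)
```

The added validity condition is what distinguishes R1.2 from R1.1: the same symptoms fire the alarm, but only while the monthly contact is recent enough.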
So far, we do not compute a decision list from the rules generated by a particular
classifier, i.e. the rules obtained from a classifier are not ordered, and they have
to be further checked by domain experts to apply a proper ordering. Moreover, rules
from Jrip are not mutually exclusive, i.e. more than one rule may cover the same
instance. Therefore, we give an example of the adaptive engine with two modes of
work, namely non-preemptive longer and preemptive shorter. After a proper ordering of
the rules, different definitions of the operation of the adaptation engine can be
considered.
Non-preemptive longer mode of work of the engine
At any point in time (every day) the engine runs a particular meta classifier and
checks whether the classifier produces an alert or not. If an alarm is produced, the
engine checks whether multiple rules apply; the rule with the maximum prediction
period is taken, and a (real) alarm is output together with the period of its
validity (the prediction period). On every day within the prediction period of this
alarm, the engine again checks whether the classifier produces an alarm. If an alarm
is produced, it is not shown, because there is already an alarm for a possible HF
hospitalization; this was one of the limitations of the previous case, where multiple
TPs and FPs were generated for the same HF hospitalization. However, the engine
remembers the last alarm generated within this prediction period. When the prediction
period ends, the engine again checks for a new possible alarm on the current day. If
there is one, the procedure is repeated and the remembered alarm (if there was one)
is forgotten. However, if there is no alarm on the end day of the prediction period,
but there is a remembered alarm whose prediction period is still valid, then this
previously remembered alarm is raised at this point (again with its prediction
period). Then the procedure is repeated.
This mode of work of the adaptation engine takes into account the characteristics of
the features and of the rules constructed from them. In this way, predictions are
cast in the same way as we constructed our training instances. An example of casting
alarms in the non-preemptive mode of work is shown in Figure 5.6.
[Figure 5.6 here: weight time series (MACD view, days 140–200) annotated with monthly contacts, a HF hospitalization, a fired alarm with the start and end of its prediction period, a not-shown alarm whose prediction period is already covered, a remembered alarm, the end of the prediction period of the remembered alarm, and an alarm fired due to the remembered alarm.]
Figure 5.6 Example of firing alarms in the non-preemptive mode of work.
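The non-preemptive longer mode can be sketched as follows (a minimal Python illustration; `classify(day)` is assumed to return whether the meta classifier fires on that day and the end day of the longest applicable prediction period, which abstracts away the rule handling described above):

```python
# Sketch of the non-preemptive longer mode: during an open prediction
# period new alarms are suppressed but the last one is remembered; at
# the period's end, a still-valid remembered alarm is re-raised.
# classify(day) -> (fires: bool, period_end: int). Illustrative only.

def run_nonpreemptive(classify, days):
    shown = []           # (day, period_end) of alarms actually output
    period_end = None    # end of the currently open prediction period
    remembered = None    # last suppressed alarm within that period
    for day in days:
        fires, end = classify(day)
        if period_end is not None and day < period_end:
            if fires:
                remembered = (day, end)   # suppress, but remember it
            continue
        # no open period (or it just ended): check the current day first
        if fires:
            shown.append((day, end))
            period_end, remembered = end, None
        elif remembered is not None and remembered[1] > day:
            shown.append((day, remembered[1]))  # re-raise remembered alarm
            period_end, remembered = remembered[1], None
        else:
            period_end, remembered = None, None
    return shown
```

An alarm on day 5 inside a period opened on day 0, for example, is silently remembered and only re-raised when the first period expires, provided its own validity has not run out.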
Preemptive shorter mode of work of the engine
The preemptive shorter mode of work of the engine is similar to the non-preemptive
longer mode, but differs in two respects. First, instead of taking the rule with the
longer prediction period, the rule with the shorter prediction period is taken.
Second, a previously generated alarm can be preempted by another alarm (with a
prediction period that finishes before the current one) generated during the
prediction period of the previous alarm. In that case, the new alarm is shown (not
remembered, as in the non-preemptive mode) with its new, shorter prediction period,
and the previous alarm is forgotten. Although the preemptive shorter mode of work
predicts the same HF hospitalizations as the non-preemptive longer mode, it always
keeps active an alarm that may predict a HF hospitalization in the near future. It
still improves on the previous work, because the minimum length of a prediction
period is 14 days; if another alarm is generated by the classifier within a 14-day
prediction period, the new alarm cannot predict sooner than the current one, and the
procedure is then the same as in the non-preemptive mode of work. An example of
firing alarms in this mode is shown in Figure 5.7.
[Figure 5.7 here: weight time series (MACD view, days 130–200) annotated with fired alarms, the start and end of their prediction periods, an alarm fired due to a remembered alarm, and the end of the prediction period of the remembered alarm.]
Figure 5.7 Example of firing alarms in the preemptive mode of work.
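The preemptive shorter mode can be sketched analogously (again illustrative; for brevity this sketch drops the remembered-alarm carry-over of the non-preemptive mode and shows only the preemption by a shorter prediction period):

```python
# Sketch of the preemptive shorter mode: a new alarm whose prediction
# period ends before the current one preempts it and is shown at once.
# classify(day) -> (fires: bool, period_end: int). Illustrative only.

def run_preemptive(classify, days):
    shown = []
    period_end = None
    for day in days:
        fires, end = classify(day)
        if period_end is not None and day < period_end:
            if fires and end < period_end:
                shown.append((day, end))   # preempt: shorter period wins
                period_end = end
            continue
        if fires:
            shown.append((day, end))
            period_end = end
        else:
            period_end = None
    return shown
```

Compared with the non-preemptive sketch, the only structural change is inside the open-period branch: instead of remembering the new alarm, the engine shows it immediately when its period ends sooner.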
Definitions of the evaluation
Using the setup of the adaptive engine, we construct an evaluation method to assess
the performance of our algorithms and of the existing ones. In constructing the
metrics for this evaluation, we use all days from the day on which a patient started
to measure until the end of the clinical study (for patients who stayed alive) or
until the patient's death (for patients who died).
We calculate the TPs, FPs, FNs, and TNs based on prediction periods. For each HF
hospitalization there can be only one period in which it is predicted (one TP), or it
is not predicted at all (one FN), which was not the case in the previous evaluation.
If a HF hospitalization is predicted by some generated alarm, i.e. if the HF
hospitalization falls within some prediction period, then that prediction period is
counted as a TP. All prediction periods (for which a real alarm was generated) that
do not predict a HF hospitalization are counted as FPs. If a HF hospitalization was
not predicted at all, then the period of 14 days prior to the hospitalization is
counted as a FN; we take this period to be 14 days long because the minimum
prediction period is 14 days. Finally, all other days with no alarm, and therefore
not belonging to any prediction period, are the basis for the construction of TN
periods. Non-overlapping periods of at most 14 days, from the beginning of the study
(the day of the first measurement) until the end day (death or last clinical visit),
are counted as TN periods. Note that these periods can be interrupted by prediction
periods. Therefore, we allow even a short alarm-free period (e.g. 3 days), squeezed
between an immediately preceding and an immediately following prediction period, to
be counted as a TN period, since a hospitalization within those 3 days should still
have been predictable.
The TPR in this evaluation is defined in the same way as the hospitalization rate in
the previous evaluation: it represents the fraction of predicted HF hospitalizations
out of all HF hospitalizations. The FPs indicate the real number of false alarms
generated, each valid for some period. The FPR reflects the rate of false alarms out
of all periods that do not contain a HF hospitalization (negative instances), which
is the true formula for the FPR according to (1).
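The period-based counting above can be sketched as follows (an illustrative Python helper under the assumption that alarm periods and hospitalizations are given as day indices; `period_metrics` is not part of the thesis implementation):

```python
# Sketch of the period-based evaluation: each HF hospitalization yields
# at most one TP (some shown alarm's period covers it) or one FN; every
# non-covering alarm period is an FP; alarm-free stretches are cut into
# TN periods of at most 14 days. All inputs are day indices.

def period_metrics(alarm_periods, hosp_days, start, end, window=14):
    # alarm_periods: list of (first_day, last_day) for shown alarms
    tp = fn = 0
    covering = set()
    for h in hosp_days:
        hits = [p for p in alarm_periods if p[0] <= h <= p[1]]
        if hits:
            tp += 1
            covering.update(hits)
        else:
            fn += 1
    fp = len([p for p in alarm_periods if p not in covering])
    # alarm-free days, chopped into TN periods of at most `window` days;
    # a short run interrupted by prediction periods still counts as a TN
    alarm_days = {d for p in alarm_periods for d in range(p[0], p[1] + 1)}
    tn, run = 0, 0
    for d in range(start, end + 1):
        if d in alarm_days:
            if run: tn += 1          # close the interrupted alarm-free run
            run = 0
        else:
            run += 1
            if run == window:        # a full 14-day TN period
                tn += 1
                run = 0
    if run: tn += 1                  # trailing partial TN period
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tp, fp, fn, tn, tpr, fpr
```

Here the TPR coincides with the hospitalization rate, and the FPR is a rate over periods rather than over individual days, as defined above.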
5.6.2.2 Evaluation results
Because we wanted to test the algorithms according to the operation of the engine, a
new instance was constructed for all days in the time interval of our study, for each
patient from the test dataset (29 patients). The total number of instances was 11706.
Note that this number is higher than the number of instances used in the previous
evaluation (9825 in total), since we now include all days, not only days on which
there was a daily measurement. These 11706 instances are the basis for the
construction of the possible prediction periods discussed above. Since TPs and FNs
are constructed from the periods prior to a HF hospitalization, the number of
positive instances is 20, which equals the number of HF hospitalizations. However,
the number of negative instances (TNs and FPs) differs for each classification model,
because different prediction periods can be constructed.
The results from the evaluation of the models in the non-preemptive mode of work of
the engine are shown in Figure 5.8 and complemented by Table 5.20. Again, all
classification approaches perform much better than the previous approaches according
to the Youden index and the hospitalization rate, which in this case is represented
by the TPR. It can also be noticed that the problem of many false alarms (FPs), and
therefore high FPRs, from the previous evaluation is solved: compared to the previous
evaluation, the number of real alarms fired is much smaller. As explained, these
alarms take into account their period of validity from the moment they are fired.
[Figure 5.8 here: scatter of TPR (sensitivity) against FPR (1-specificity) for the Jrip and J48 models over the feature sets S, S+D, S+H, S+D+H, S+D+H+FS, the previous-approach algorithms (RoT, MACD, WTI, HRTI, UnionMW, UnionMWH, UnionRMW, UnionRMWH), and a random-guess reference line.]
Figure 5.8 Hospitalization prediction accuracies for different classifiers and feature
sets on test dataset using the non-preemptive mode of work on the adaptive engine.
The best performance of the classification techniques according to the Youden index
is achieved by the Jrip method when using only symptom (S) features (0.5013) and when
using feature selection (0.4701). Using a combination of symptoms (S) and daily
measurements (D) (and additionally medical history (H)) also gives high performance
with respect to the Youden index. However, these performances are somewhat lower than
those of Jrip with only symptoms, because these classifiers may learn rules that
include daily measurement features. Such rules may fire for a shorter period than
rules based only on symptoms, which increases the number of FPs.
The worst performances are observed for models that use a combination of symptom (S)
and medical history (H) features. Additionally, in the case of feature selection, the
models with min FPR selected during the training phase also show poor performance
with respect to the Youden index. However, as already explained, feature selection
results in models that favor optimization of FPR, so these results are reasonable.
The results with respect to the hospitalization rate, or in this case the TPR, are
the same as the hospitalization-rate results from the previous evaluation. With
respect to FPR, feature selection shows the best results, as do models that use a
combination of symptom and medical history features (S+H). The complete results for
the preemptive shorter mode of work of the engine can be found in Appendix B.
Table 5.20 Prediction accuracies on the test dataset using the non-preemptive mode
of work on the adaptive engine.
Classification model TP FN FP TN TPR FPR Yindex Hrate #Alarms
Our Approach
S (Jrip) max TPR 13 7 109 624 0.65 0.1487 0.5013 0.65 122
min FPR 10 10 69 717 0.5 0.0878 0.4122 0.5 79
(J48) max TPR 7 13 55 766 0.35 0.067 0.283 0.35 62
min FPR 8 12 46 781 0.4 0.0556 0.3444 0.4 54
S+D (Jrip) max TPR 12 8 158 680 0.6 0.1885 0.4115 0.6 170
min FPR 11 9 115 724 0.55 0.1371 0.4129 0.55 126
(J48) max TPR 11 9 147 818 0.55 0.1523 0.3977 0.55 158
min FPR 11 9 96 815 0.55 0.1054 0.4446 0.55 107
S+H (Jrip) max TPR 8 12 59 733 0.4 0.0745 0.3255 0.4 67
min FPR 7 13 49 756 0.35 0.0609 0.2891 0.35 56
(J48) max TPR 8 12 46 776 0.4 0.056 0.344 0.4 54
min FPR 8 12 49 772 0.4 0.0597 0.3403 0.4 57
S+D+H (Jrip) max TPR 11 9 135 691 0.55 0.1634 0.3866 0.55 146
min FPR 11 9 90 742 0.55 0.1082 0.4418 0.55 101
(J48) max TPR 12 8 207 668 0.6 0.2366 0.3634 0.6 219
min FPR 12 8 219 708 0.6 0.2362 0.3638 0.6 231
S+D+H+FS (Jrip) max TPR 13 7 154 702 0.65 0.1799 0.4701 0.65 167
min FPR 7 13 54 764 0.35 0.066 0.284 0.35 61
(J48) max TPR 12 8 175 724 0.6 0.1947 0.4053 0.6 187
min FPR 5 15 56 824 0.25 0.0636 0.1864 0.25 61
Previous approach
RoT 6 14 152 763 0.3 0.1661 0.1339 0.3 158
MACD 6 14 99 784 0.3 0.1121 0.1879 0.3 105
WTI 4 16 71 822 0.2 0.0795 0.1205 0.2 75
HRTI 1 19 139 800 0.05 0.148 -0.098 0.05 140
MACD+WTI 6 14 137 766 0.3 0.1517 0.1483 0.3 143
MACD+WTI+HRTI 7 13 258 709 0.35 0.2668 0.0832 0.35 265
RoT+MACD+WTI 8 12 238 703 0.4 0.2529 0.1471 0.4 246
RoT+MACD+WTI+HRTI 9 11 344 645 0.45 0.3478 0.1022 0.45 353
In general, the results from these models are similar to those of the non-preemptive
mode of work of the engine. The classification algorithms again perform better than
the previous approaches with respect to the Youden index and the hospitalization
rate. The only difference is that the performance of the classification models with
respect to FPR is a bit lower than for the models in the non-preemptive mode of work.
As an implication, the performance according to the Youden index is also a bit lower.
These results follow from the setup of the engine, which allows an alarm to be
preempted by another alarm within its prediction period if the second alarm can
predict sooner than the current prediction period. The previous alarm is discarded
but counted as a FP, and the new alarm is shown. In this way the number of FPs
increases, and thereby the FPR increases and the Youden index decreases. On the other
hand, the number of days from firing an alarm until the HF hospitalization is
smaller.
Figure 5.9 shows an example of predicted hospitalizations (12 out of 20) from one
classifier run in both modes of work. On the Y-axis we plot the number of days from
the fired alarm until the predicted hospitalization. There were 4 hospitalizations
for which the alarm generated in the preemptive mode of work was closer to the HF
hospitalization than in the non-preemptive mode. It should be noticed, however, that
in the preemptive mode of work another alarm was actually generated before, which was
preempted by the new one.
The relative performance of the models with respect to the Youden index, TPR, and
FPR is the same as in the non-preemptive mode of work.
[Figure 5.9 here: bar chart of the number of days before HF hospitalization for 12 predicted HF hospitalizations, comparing the non-preemptive longer and preemptive shorter modes.]
Figure 5.9 An example of predicted HF hospitalizations from the same classifier run
in both modes of work of the engine.
5.6.3 Summary of HF hospitalization evaluation
We presented two evaluation methods, on a daily basis and using prediction periods,
to measure the performance of our approach and compare it with the existing ones.
In both evaluations, all classification models from our approach perform much better
than the rules of the existing approach according to the Youden index and the
hospitalization detection rate.
In the evaluation on a daily basis, the SVM model that uses only S features, and the
SVM and Jrip models that use S+D features, perform best with respect to the Youden
index. However, because an SVM model cannot be presented in a readable format, Jrip
is preferable when adaptation based on the features is required. Also, J48 models
that use D features in combination with S features perform better than J48 models
that use other combinations of features. SVM can be used only to predict a HF
hospitalization; it is then up to the medical professional to decide what the reasons
for the alarm were. This indicates that other data mining classification techniques
may be used for prediction if the requirements are different.
Although the performance of the models (for both Youden index and HDR) that use other
combinations of feature sets (such as S or S+D+H) is lower than the performance with
S+D features, it is important to notice that those models produce different rules
using different features (including H). Later on, domain experts may pick some of
those rules, if they find them interesting, and include them in the final decision
list that will be used for the prediction of HF hospitalizations. Also, ensemble
learning may be employed over the classifiers (or individual rules) from different
feature sets, e.g. by simple voting among those classifiers (rules).
With respect to the hospitalization rate, Jrip using only symptom (S) features and
feature selection (S+D+H+FS) shows the best performance (0.65). Models that use a
combination of S+D features also have a very high HDR. With respect to FPR, feature
selection shows the best performance. However, according to the FPR, our algorithms
perform much worse than the existing approaches. This is because symptoms have a high
impact on the classification: if, for example, a false alarm (FP) is generated by
symptoms, then, because symptoms are valid for one month, a false alarm will be
generated on every day of that month, which increases the FPR.
We overcome this problem in the second evaluation by taking into account the
period of validity of a certain alarm. In the previous example, the alarm would be
generated only at the beginning of the monthly period, with a simple “possible
hospitalization within 30 days” warning. Two cases for casting the alarm were
considered: non-preemptive (longer periods) and preemptive (shorter periods). With
this evaluation the hospitalization rate is the same as the TPR. The FPR represents
the rate of periods with false alarms out of all negative periods (false alarm
periods and periods with no prediction).
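The period-based counting described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: hospitalizations are given as day indices, alarm validity periods as inclusive day intervals, and the total number of negative periods is supplied externally; none of these representations come from the thesis.

```python
# Sketch of the period-based evaluation: each cast alarm covers a validity
# period; the period counts as TP if a hospitalization occurs inside it,
# otherwise FP. Hospitalizations not covered by any alarm are FNs.
# The interval arithmetic below is an assumption for illustration.

def evaluate_periods(alarm_periods, hosp_days, negative_periods_total):
    tp = sum(1 for start, end in alarm_periods
             if any(start <= d <= end for d in hosp_days))
    fp = len(alarm_periods) - tp
    fn = sum(1 for d in hosp_days
             if not any(start <= d <= end for start, end in alarm_periods))
    tpr = tp / (tp + fn) if tp + fn else 0.0  # equals the hospitalization rate
    fpr = fp / negative_periods_total if negative_periods_total else 0.0
    return tpr, fpr

# One 30-day alarm covering a hospitalization, one false 14-day alarm,
# one missed hospitalization; 10 negative periods in total.
tpr, fpr = evaluate_periods([(0, 29), (60, 73)], [15, 100], 10)
print(tpr, fpr)  # 0.5 0.1
```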
Again, all our classifiers perform better than previous approaches with respect to
Youden index and TPR (hospitalization rate). Additionally, as explained, the
performance of our approaches here is comparable with previous approaches with
respect to FPR as well. Moreover, feature selection results in models that are better
than all algorithms from the previous approach with respect to FPR.
The best performance with respect to Youden index and TPR is achieved by Jrip
using S and S+D+H+FS features. High performance is also achieved with S+D and
S+D+H, especially with S+D. The reason for the slightly lower performance with
S+D than with S features alone is the length of the prediction period: with only S
features it is always 30 days, whereas when D features are also included it may be
shorter (e.g. 14 days). With respect to FPR, the best performance is achieved with
feature selection and by models that use S+H features. These models also outperform
the previous approaches according to FPR.
Compared to the previous approaches, the models in the preemptive mode of work
perform in the same way as the models in the non-preemptive mode. The only
difference from the non-preemptive models is that they produce more FPs, and
therefore have an increased FPR and a decreased Youden index. Their value is that
they can cast predictions with a shorter prediction period.
Chapter 6
Conclusions and future work
In this thesis we have worked on several topics in the context of patient modeling in
RPM systems. This chapter summarizes the contributions and conclusions of our
work and presents some points for future work.
6.1 Summary and conclusions
Remote Patient Management (RPM) systems are expected to be increasingly used
in the near future. The current generation of RPM systems follows a one-size-fits-all
approach, despite the widely accepted benefits of personalization and adaptation of
information services.
In this thesis we presented a complete roadmap for patient modeling in RPM
systems. First, we presented an architecture of the next generation RPM systems that
facilitates personalization of educational content, its delivery to patients and alarming
services. We showed the main building blocks of the architecture such as patient
model, adaptation rules, domain model, adaptation engine and knowledge discovery
(KDD) process.
Then, we introduced a generic framework for knowledge discovery, which is
essential for the personalization of RPM, i.e. for discovering actionable patterns that
form the basis for creating the patient model and adaptation rules. The framework is
based on machine learning and data mining approaches. We provided illustrative
examples drawn from the analysis of data from a real clinical trial. With these
examples we showed how patient profiling and tailoring of the educational material
can be achieved. This framework was embedded in the architecture that we presented.
We applied the presented framework for patient modeling to the heart failure
hospitalization prediction problem, using data from a real clinical study (TEN-HMS).
We defined a classification task for the HF hospitalization problem. Using the data
from the clinical trial and following the steps of the framework, we learned models
(algorithms) that may later be used in real-time settings to cast an alarm, based on the
patient’s medical data, indicating whether an HF hospitalization will occur or not. We
experimented with different classification methods and with combinations of
different features to learn the best models for our approach.
By using data mining approaches, our approach for HF hospitalization prediction
allows features from different kinds of data to be included, not only daily
measurements as in current approaches. Besides features from daily measurements,
we experimented with other features such as symptoms, the existence of a previous
HF hospitalization, and medical history.
The process of learning the best models resulted in a broader “set of best models”
that performed approximately equally according to the Youden index, which we used
as the primary metric for evaluating the performance of the algorithms. However, we
showed that different models in this “set of best models” may have different TPRs
and FPRs. This demonstrates that, using the framework, domain experts may select a
particular algorithm according to their needs. For example, if fewer false alarms are
more important, the model with the minimum FPR should be used; the other way
around, if predicting more HF hospitalizations is more important, the model with the
maximum TPR may be used. Finally, any other combination that domain experts
consider suitable for their needs may be used. As an example, we evaluated our
approach on the test dataset using two models: one that favors FPR and one with the
maximum TPR.
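The selection procedure sketched above (Youden index as primary metric, expert preference on TPR or FPR as secondary criterion) can be made concrete as follows. The candidate models and their scores below are illustrative placeholders, not numbers from the thesis.

```python
# Sketch of selecting from a "set of best models": rank by Youden index
# (TPR - FPR), then let a domain expert pick by TPR or FPR preference.
# Model names and scores are hypothetical, for illustration only.

def youden(tpr, fpr):
    return tpr - fpr

candidates = {
    "Jrip S":       {"tpr": 0.65, "fpr": 0.22},
    "Jrip S+D":     {"tpr": 0.60, "fpr": 0.15},
    "J48 S+D+G+FS": {"tpr": 0.50, "fpr": 0.06},
}

# Primary ranking by Youden index.
best = max(candidates, key=lambda m: youden(**candidates[m]))
# If fewer false alarms matter more, pick the minimum-FPR model;
# if catching more hospitalizations matters more, pick the maximum-TPR one.
fewest_fp = min(candidates, key=lambda m: candidates[m]["fpr"])
most_tp = max(candidates, key=lambda m: candidates[m]["tpr"])
print(best, fewest_fp, most_tp)  # Jrip S+D J48 S+D+G+FS Jrip S
```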
For the evaluation of the performance of our approach and its comparison with
existing approaches, we performed two evaluations: on a daily basis (as in the
existing approach) and on prediction periods, which we introduced.
We showed that our approach, using machine learning techniques and combinations
of different features to learn algorithms for the prediction of HF hospitalization, gives
much better results than current approaches with respect to Youden index and
hospitalization rate in both evaluations. According to the FPR, our approaches in the
first evaluation are outperformed by the existing approaches, since many FPs may be
fired by rules generated from symptoms (e.g. for every day between two monthly
contacts). In the second evaluation, the performance of our approaches is comparable
with the existing ones with respect to FPR.
In general, the first evaluation showed that symptoms have a high impact on the
prediction and should therefore be used in it. Jrip and J48 using S+D features
perform best with respect to the Youden index, taking into account the requirement
that algorithms should produce rules in a readable format. SVM models can also be
used for prediction, since they also achieve good performance; but in this case it is up
to the professionals to decide what the reasons for the alarm were, as SVM cannot
produce rules in a readable format. Other models can also be used, depending on
what is preferred. For example, the maximum TPR is achieved by Jrip using only
symptoms, while feature selection results in models that favor FPR.
Current approaches cast a prediction for a possible HF hospitalization every day
the existing algorithm fires an alarm, without taking into account the period of
validity of a particular alarm. This results in many alarms being fired in sequence for
the prediction of a single HF hospitalization. They also evaluated the performance of
the algorithms on a daily basis, which results in many TPs and FPs (because of the
sequence of fired alarms). In this way the TPRs and FPRs do not give meaningful
results with respect to the rate of correctly predicted hospitalizations (out of all
possible HF hospitalizations) and the rate of false alarms (out of the negative
instances). Although the problem of inflated TP counts was solved by the
introduction of the hospitalization rate in [46], the problem of many false alarms
remained.
We showed that when alarms are fired in real time, and later in the evaluation, the
period of validity of a fired alarm should be taken into account. For example, the
existing approaches alert that an HF hospitalization is possible within 14 days, so it is
not necessary to show another alert within those 14 days; after this period, however,
alerts should again be fired. Therefore, we created adaptive (meta) rules from the
learned rules, assigning to each rule its prediction period. We presented an example
of a simple adaptation engine that handles the process of firing alarms, taking into
account the characteristics of the learned (meta) rules. The engine has two modes of
work, non-preemptive and preemptive. In general, the engine fires an alarm when
there is no previous alarm; in the non-preemptive mode it does not allow another
alarm to be fired during the period of validity of a previous alarm, while in the
preemptive mode another alarm can preempt a previous alarm if it can predict a
hospitalization sooner.
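The two modes of the adaptation engine described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the class name, the day-based time representation, and the use of the rule's prediction period as the preemption criterion are all simplifications for the sketch.

```python
# A minimal sketch of the adaptation engine's alarm-casting policy. Each
# learned (meta) rule carries its prediction period in days; the engine
# suppresses new alarms while a previous alarm is still valid and, in
# preemptive mode, lets a shorter-period alarm replace the current one.
# Names and the day-based clock are illustrative assumptions.

class AlarmEngine:
    def __init__(self, preemptive=False):
        self.preemptive = preemptive
        self.active_until = None   # day the current alarm expires
        self.active_period = None  # length of the current alarm's period

    def cast(self, day, period):
        """Try to cast an alarm on `day`, valid for `period` days.
        Returns True if the alarm is fired (or preempts the current one)."""
        if self.active_until is None or day > self.active_until:
            self.active_until, self.active_period = day + period, period
            return True
        if self.preemptive and period < self.active_period:
            # A shorter-period rule predicts a hospitalization sooner.
            self.active_until, self.active_period = day + period, period
            return True
        return False  # suppressed: the previous alarm is still valid

engine = AlarmEngine(preemptive=True)
print(engine.cast(0, 30))   # True: no active alarm
print(engine.cast(5, 30))   # False: a 30-day alarm is still valid
print(engine.cast(10, 14))  # True: shorter alarm preempts in preemptive mode
```

In non-preemptive mode the same third call would return False, since preemption is disabled and the first alarm is still valid.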
By doing this, we introduced an improved evaluation strategy that is based on
periods with an alarm (TPs or FPs) and periods without an alarm (FNs or TNs). This
also solved the problem of the high number of FPs, and therefore of the FPR, so the
performance of our approaches is comparable with the existing approaches with
respect to FPR as well. Moreover, feature selection resulted in better performance
than the existing approaches.
Finally, for the HF hospitalization problem, we can conclude that our approach,
using data mining techniques together with the adaptive engine we constructed,
improved HF hospitalization prediction. Furthermore, this thesis showed that
symptoms are very good predictors and should therefore be taken into account in
prediction.
6.2 Limitations
Our work has some limitations. First, the direct measurements of S features may
become outdated, due, for instance, to the relatively long intervals between monthly
contacts. Thus, a particular symptom might have changed, but the change has not yet
been recorded. More accurate results may therefore be achieved if symptoms are
recorded at shorter time intervals (e.g. every 15 days). Another possible problem is
that this information may be completely or partially missing due to organizational or
technical reasons. Therefore, timely and accurate prediction of symptom features may
improve the performance of HF hospitalization prediction. A preliminary study of
predictive modeling for the two most prominent symptoms, breathlessness and
swelling of the ankles, has shown promising results. However, it is the goal of further
empirical evaluation to show whether the generated predictions are accurate enough
and indeed improve the performance of HF hospitalization predictors.
Next, we expected that medical history would improve the performance of the
classifiers, but we showed that in some cases the results were worse (e.g. models
from S+H performed worse than models from S). We noticed that this problem occurs
because some algorithms learned a rule that contains only the Cluster feature, which
was constructed by clustering the medical history data. Therefore, that rule will
always fire for any patient belonging to the cluster that triggers it. This rule actually
indicates that patients in that cluster form a group with a certain risk (e.g. a high risk
of HF hospitalization). Better results could therefore probably be achieved if models
were learned separately for each group (cluster) of patients. However, this may be a
problem, since not many patients exist in the database.
6.3 Future work
As this thesis discussed several topics, we present future work for each of them.
In the presented architecture of the RPM system, we focused on and defined only
the KDD process. As future work, the other components of the architecture should
therefore be defined (e.g. the user and domain model representations). As mentioned,
we considered only the off-line process of discovering useful actionable knowledge
for patient modeling and adaptation. However, since some of the patterns inherently
change over time, it is important to investigate the potential of online learning,
concept drift handling mechanisms, and the discovery and use of re-occurring
contexts for the so-called second-order adaptation. As part of the project we
investigated and illustrated examples of naïve seasonal and time-changing patterns to
show the benefits of online learning, concept drift mechanisms, and the discovery and
use of contextual features for adapting the set of adaptation rules and user modeling
procedures. However, we did not try specific algorithms for online learning or
examine how they would perform, e.g., in HF hospitalization prediction. This is
another interesting point for future work.
RPM systems are also becoming more interactive, and therefore there is a natural
need for the development of other types of feedback personalization mechanisms in
RPM systems. Other technologies, e.g. avatars, personalized information retrieval,
and open corpus adaptation, may become important add-ons to the future generation
of RPM systems. Integration of these technologies in the presented architecture is
another challenge for future work.
As for HF hospitalization prediction, there are several issues to be considered.
Since symptoms are good predictors of HF hospitalization, up-to-date information
about symptoms is important for its prediction. As mentioned earlier, we also
investigated two other problems: prediction of symptom worsening and prediction of
the next symptom status. The predicted symptoms from the latter can be used to keep
symptom information up to date. However, it is the goal of further empirical
evaluation to show whether the generated predictions are accurate enough and indeed
improve the performance of HF hospitalization predictors.
Another challenge is improving HF hospitalization prediction by applying
ensemble approaches to the classifiers (or single rules) from the different feature sets
and by considering context-awareness issues. Additionally, different features can be
tried in the prediction. Next, “traditional” time series prediction can be tried (e.g.
predicting how soon an HF hospitalization may happen).
Some of the classification techniques we used for patient modeling (JRip and J48)
allow the learned models to be analyzed by domain experts. It is highly important
that the patient models are interpretable and make sense to the medical personnel.
Later on, domain experts may investigate the obtained rules and, if they find them
interesting, include them in the final decision list for the prediction of HF
hospitalizations. The approach can be extended to handle (the usage of) educational
data, motivational messages, and other feedback information provided to the patient
by the RPM system or medical personnel, for better modeling of the patient’s state.
References
[1] American Heart Association, Heart Diseases and Stroke Statistics – 2008
update.
[2] European cardiovascular disease statistics 2008. European Heart Network,
2008.
[3] The EuroHeart Failure Survey Programme. A survey on the quality of care
among patients with heart failure in Europe. Part 2: treatment. European Heart
Journal 2003. 24: p.464-474.
[4] McKee PA., Castelli WP., McNamara PM. et al.. The natural history of
congestive heart failure: the Framingham study. N Engl J Med 1971;
285: 1441-1446.
[5] Zweig MH. and Campbell G.. Receiver-operating characteristic (ROC) plots: a
fundamental evaluation tool in clinical medicine. Clin Chem 1993;39:561-577
[6] European Society of Cardiology 2008, www.esc.org.
[7] Adams K.A. and Zannad F.. Clinical Definition and epidemiology of advanced
heart failure. American Journal of Cardiology, 1998. 135: p. S204-215.
[8] Heart disease and stroke statistics. American Heart Association, 2009.
[9] Wang H.. Disease management industry and high-tech adoption. An industry
report from parks associates. Park Associates, 2008.
[10] Ennet C.M. et al.. Decision Support Options for Cardiovascular Medicine.
Technical Note PR-TN 2005/00418, 2005, Philips Research.
[11] Cleland J., Atkin P., and Cullington D.. Patient flow and responsibilities of
medical professionals in HF clinic at Castle Hill Hospital, Hull, UK, 2008.
[12] Wang H., Parrish A., Smith R., and Vrbsky S.. Variable selection and ranking
for analyzing automobile traffic accident data. SAC ’05: proceedings of the
2005 ACM symposium on Applied computing, New York, NY, USA 2005.
p.32-37. ACM.
[13] WEKA open source data mining toolkit. www.cs.waikato.ac.nz/ml/weka/.
[14] Montgomery D.C. and Runger G.C.. Applied Statistics and Probability
for Engineers. Wiley, 2007.
[15] Hunt S.A., et al.. ACC/AHA Guidelines for the Evaluation and Management of
Chronic Heart Failure in the Adult: Executive Summary A Report of the
American College of Cardiology/American Heart Association Task Force on
Practice Guidelines Circulation, 2001. 104: p.2996-3007.
[16] Swedberg K., et al.. Guidelines for the diagnosis and treatment of Chronic
Heart Failure (update 2005): The task Force for the Diagnosis and Treatment
of Chronic Heart Failure of the European Society of Cardiology. European
Journal of Heart Failure 2005. 7(3): p.343-349.
[17] Felker GM., Adams KF., Konstam MA. et al. The problem of decompensated
heart failure: nomenclature, classification, and risk stratification. Am Heart J
2003; 145:S18–S25.
[18] Sutton G.C.. Epidemiologic aspects of heart failure. American Heart
Journal, 1990. 120: p.1538-1540.
[19] Chaudhry SI., Wang Y., Concato J. et al.. Patterns of Weight Change Preceding
Hospitalization for Heart Failure. Circulation 2007; 116:1549-1554.
[20] Tsuyuki RT., McKelvie RS., Arnold JMO et al. Acute Precipitants of
Congestive Heart Failure Exacerbations. Arch Intern Med 2001; 161:2337-
2342.
[21] Health Hero. www.healthhero.com, 2008.
[22] HomMed. Honeywell hommed. www.hommed.com, 2008.
[23] B. Healthcare. Bl healthcare. www.blhealthcare.com, 2008.
[24] C. Guard. Card guard. www.cardguard.com, 2008.
[25] Philips Motiva. www.healthcare.philips.com.
[26] Wal M., Jaarsma T., Moser D., Veeger N., Gilst W., and Veldhuisen D..
Compliance in heart failure patients: the importance of knowledge and beliefs.
European Heart Journal, 27:434–440, 2006.
[27] Burkow T., Vognild L., Krogstad T., Borch N., Ostengen G., Bratvold A., and
Risberg M. J.. An easy to use and affordable home-based personal ehealth
system for chronic disease management based on free open source software.
Studies in health technology and informatics, 136:83–88, 2008.
[28] Wang X., Istepanian R., Geake T., Hayes J., Desco M., Kontaxakis G., Santos
A., Prentza A., and Pavlopoulos S.. A feasibility study of a personalized,
internet-based compliance system for chronic disease management.
Telemedicine and e-Health, 11(5):559– 566, 2005.
[29] Janssen-Boyne J.. The healthbuddy system: a new way of telemonitoring.
European Heart Failure Congress 2008.
[30] Cleland J., Louis A.A., Rigby A., Janssens U., and Balk A.. Noninvasive home
telemonitoring for patients with heart failure at high risk of recurrent admission
and death. Journal of American College of Cardiology, 45(10):1654--1664,
2005.
[31] Wal M. and Jaarsma T.. Adherence in heart failure in the elderly: Problem and
possible solutions. International Journal of Cardiology, 125(2):203--208, 2008.
[32] Brusilovsky P. and Millan E.. User models for adaptive hypermedia and
adaptive educational systems. The Adaptive Web, pages 3–53, 2007.
[33] Fayyad U., Piatetsky-shapiro G., and Smyth P.. From data mining to knowledge
discovery in databases. AI Magazine, 1996. 17: p.37-54.
[34] B. F. van Dongen, A. K. A. de Medeiros, Verbeek H. M. W., Weijters A. J. M.
M., and W. M. P. van der Aalst. The prom framework: A new era in process
mining tool support. In G. Ciardo and P. Darondeau, editors, ICATPN, volume
3536 of Lecture Notes in Computer Science, pages 444–454. Springer, 2005.
[35] Gunther C.W. and W.M.P. van der Aalst. A Generic Import Framework for
Process Event Logs. In J. Eder and S. Dustdar, editors, Business Process
Management Workshops, Workshop on Business Process Intelligence (BPI
2006), volume 4103 of Lecture Notes in Computer Science, pages 81-92.
Springer-Verlag, Berlin, 2006.
[36] Fayyad U.M. and Irani K.B.. Multi-interval discretization of
continuous-valued attributes for classification learning. In: Thirteenth
International Joint Conference on Artificial Intelligence, 1022-1027, 1993.
[37] Cohen W.W.. Fast effective rule induction. In Proceedings of the 12th
International Conference on Machine Learning. Morgan Kaufmann, 1995. p.
115-123.
[38] Pang-Ning Tan, Steinbach M., and Kumar V.. Introduction to Data Mining.
Addison Wesley, May 2005.
[39] Japkowicz N., Stephen S.. The class imbalance problem: A systematic study.
Intell. Data Anal., 6(5):429-449, 2002.
[40] Wal M. and Jaarsma T.. Nurse-led intervention can improve adherence to non-
pharmacological treatment in heart failure patients (data from the coach
study). European Journal of Cardiovascular Nursing, 7(1):S41, 2008.
[41] Weiss G.M. and Provost F.. Learning when training data are costly: The effect of
class distribution on tree induction. Journal of Artificial Intelligence Research,
2003. 19: p.315-354.
[42] Witten I.H, and Frank E.. Data Mining: Practical Machine Learning Tools and
Techniques. Morgan Kaufmann, second edition, 2005.
[43] Hall, M. A., Smith, L. A.. Practical feature subset selection for machine
learning. Australian Computer Science Conference. Springer,1998 181-191.
[44] Packer M, Abraham WT, Mehra MR et al. Utility of Impedance Cardiography
for the Identification of Short-Term Risk of Clinical Decompensation in Stable
Patients With Chronic Heart Failure. J Am Coll Cardiol 2006; 47:2245-2252.
[45] Zhang J., Goode KM., Cuddihy PE. et al. Predicting hospitalisation due to
worsening heart failure using daily weight measurement: An analysis of the
Trans-European Network-Home-Care Management System (TEN-HMS) Study.
Unpublished 2008.
[46] Wilhelms M. Internship report on decompensation of Chronic Heart Failure,
Philips Technologie GmbH Forschungslaboratorien, Aachen, Medical Signal
Processing. Unpublished 2008.
Appendix A
Results on the training dataset
In the following sections the results of each of the conducted experiments are
shown. It should be noted that the following notation is used for the T-test results:
- 1 in a certain cell means that the algorithm in the column outperforms the algorithm
in the row.
- 0 in a certain cell means that the algorithms in the column and in the row are tied.
A.1 Experiment 1 (S)
Table A.1 Result of T-test on TPR between all classifiers (S features).
Rows/columns: classification model (feature set, algorithm, cost, object size); cells give the pairwise comparison against models 1-36; the last three columns are the cumulative counts Nb+ / Nb= / Nb-.
1 = S (SVM) cost 8 - 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 1 0 21 15
2 = cost 9 0 - 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 3 29 4
3 = cost 10 0 0 - 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 8 26 2
4 = cost 14 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 17 0
5 = S (Jrip) cost 8 0 0 0 0 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 30 1
6 = cost 9 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 26 0
7 = cost 10 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 24 0
8 = cost 14 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 25 11 0
9 = S (J48) cost 8 Obj.2 0 1 1 1 1 1 1 1 - 0 0 0 0 0 0 1 1 0 1 1 0 1 1 1 0 1 0 0 1 1 0 1 1 0 1 1 0 15 21
10 = Obj.10 0 0 1 1 1 1 1 1 0 - 0 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 1 1 0 1 1 0 1 1 0 20 16
11 = Obj.20 0 0 0 1 0 1 1 1 0 0 - 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 1 0 25 11
12 = cost 9 Obj.2 0 1 1 1 1 1 1 1 0 0 0 - 0 0 0 1 0 0 1 1 0 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 0 17 19
13 = Obj.10 0 0 0 1 0 0 1 1 0 0 0 0 - 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 28 8
14 = Obj.20 0 0 0 1 0 0 1 1 0 0 0 0 0 - 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 29 7
15 = cost 10 Obj.2 0 0 1 1 1 1 1 1 0 0 0 0 0 0 - 0 0 0 1 1 0 0 1 1 0 0 0 0 1 1 0 1 1 0 1 1 0 20 16
16 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 3 29 4
17 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 31 4
18 = cost 14 Obj.2 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 - 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 1 0 26 10
19 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 25 0
20 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 12 23 1
21 = S\RQoL cost 8 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 32 4
22 = (Jrip) cost 9 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 1 0 0 0 0 0 0 0 0 0 0 0 0 2 32 2
23 = cost 10 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 1 0 0 0 0 0 0 0 0 0 0 0 0 3 31 2
24 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 27 9 0
25 = S\RQoL cost 8 Obj.2 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 - 0 0 0 1 0 0 1 1 0 1 1 0 23 13
26 = (J48) Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 - 0 0 0 0 0 1 0 0 1 1 1 29 6
27 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 - 0 0 0 0 0 0 0 1 1 0 31 5
28 = cost 9 Obj.2 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 - 1 0 0 1 1 0 1 1 0 23 13
29 = Obj.10 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 - 0 0 0 0 0 0 0 7 27 2
30 = Obj.20 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 - 0 0 0 0 0 1 5 28 3
31 = cost 10 Obj.2 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 - 1 1 0 1 1 0 23 13
32 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 - 0 0 0 0 11 24 1
33 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 - 0 0 0 9 26 1
34 = cost 14 Obj.2 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 - 1 1 0 31 5
35 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 15 21 0
36 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 20 16 0
Table A.2 Result of T-test on Youden Index between all classifiers (S features).
Rows/columns: classification model (feature set, algorithm, cost, object size); cells give the pairwise comparison against models 1-36; the last three columns are the cumulative counts Nb+ / Nb= / Nb-.
1 = S (SVM) cost 8 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
2 = cost 9 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
3 = cost 10 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
4 = cost 14 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
5 = S (Jrip) cost 8 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
6 = cost 9 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
7 = cost 10 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
8 = cost 14 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 31 0
9 = S (J48) cost 8 Obj.2 0 0 0 0 0 0 0 1 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 33 3
10 = Obj.10 0 0 0 0 0 0 0 1 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 34 2
11 = Obj.20 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
12 = cost 9 Obj.2 0 0 0 0 0 0 0 1 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 34 2
13 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
14 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
15 = cost 10 Obj.2 0 0 0 0 0 0 0 1 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 34 2
16 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
17 = Obj.20 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 1
18 = cost 14 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
19 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
20 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
21 = S\RQoL cost 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
22 = (Jrip) cost 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
23 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
24 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 4 32 0
25 = S\RQoL cost 8 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 36 0
26 = (J48) Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 36 0
27 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 36 0
28 = cost 9 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 36 0
29 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 36 0
30 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 36 0
31 = cost 10 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 36 0
32 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 1 35 0
33 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 36 0
34 = cost 14 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 36 0
35 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 36 0
36 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 36 0
A.2 Experiment 2 (S+D)
Table A.3 Performances of all classifiers (S+D features).
Classification model TPR FPR Youden Index
Mean Std.dev. Mean Std.dev. Mean Std.dev.
S+D cost 8 0.3589 0.1642 0.0977 0.0286 0.2612 0.1644
(SVM) cost 9 0.4087 0.165 0.1189 0.0315 0.2898 0.1658
cost 10 0.4605 0.1687 0.1381 0.0327 0.3225 0.168
cost 14 0.5476 0.1745 0.1843 0.0349 0.3634 0.176
S+D cost 8 0.4011 0.18 0.1582 0.0421 0.2429 0.174
(Jrip) cost 9 0.4376 0.1855 0.1738 0.0491 0.2638 0.1783
cost 10 0.4576 0.1844 0.1884 0.0529 0.2692 0.1768
cost 14 0.5349 0.1842 0.2496 0.0628 0.2853 0.1746
S+D (J48) cost 8 Obj.2 0.2364 0.1443 0.0979 0.0287 0.1384 0.142
Obj.10 0.317 0.1543 0.1221 0.0339 0.1949 0.1534
Obj.20 0.3626 0.1608 0.1012 0.0306 0.2614 0.1595
cost 9 Obj.2 0.2442 0.1483 0.1012 0.0297 0.143 0.1455
Obj.10 0.3451 0.1592 0.1485 0.0364 0.1966 0.1562
Obj.20 0.393 0.1673 0.1257 0.0384 0.2672 0.1629
cost 10 Obj.2 0.2554 0.1552 0.1052 0.0288 0.1502 0.1526
Obj.10 0.3652 0.1639 0.1654 0.038 0.1997 0.1615
Obj.20 0.4123 0.1674 0.1471 0.0379 0.2652 0.1663
cost 14 Obj.2 0.2745 0.1568 0.1252 0.0298 0.1493 0.1536
Obj.10 0.4222 0.1743 0.2211 0.0393 0.2011 0.1687
Obj.20 0.4776 0.175 0.221 0.0448 0.2566 0.1703
S+D\RH cost 8 0.4001 0.1835 0.1582 0.0437 0.2418 0.1742
(Jrip) cost 9 0.435 0.1931 0.1742 0.0466 0.2608 0.1783
cost 10 0.48 0.1865 0.1889 0.049 0.2911 0.1758
cost 14 0.5385 0.1922 0.2413 0.0597 0.2972 0.1809
S+D\RH cost 8 Obj.2 0.2422 0.1461 0.09 0.0256 0.1522 0.1451
(J48) Obj.10 0.3301 0.1579 0.1117 0.0305 0.2184 0.158
Obj.20 0.3667 0.1649 0.0958 0.0305 0.271 0.1629
cost 9 Obj.2 0.2531 0.1462 0.0951 0.026 0.158 0.146
Obj.10 0.3614 0.1627 0.1288 0.0302 0.2326 0.1639
Obj.20 0.3957 0.1686 0.1206 0.0387 0.2751 0.1646
cost 10 Obj.2 0.2587 0.1517 0.1008 0.0259 0.1579 0.1512
Obj.10 0.3889 0.1626 0.148 0.0328 0.2409 0.1649
Obj.20 0.4241 0.1728 0.1447 0.0369 0.2794 0.1702
cost 14 Obj.2 0.2921 0.1635 0.1221 0.0279 0.17 0.1621
Obj.10 0.4454 0.1661 0.199 0.0364 0.2464 0.1649
Obj.20 0.4623 0.1659 0.1864 0.0418 0.2759 0.1661
S+D\RH cost 8 0.4302 0.1769 0.1517 0.0422 0.2785 0.1717
(Jrip) cost 9 0.4618 0.1772 0.1664 0.0439 0.2953 0.1712
cost 10 0.4795 0.1763 0.1776 0.044 0.3019 0.1732
cost 14 0.5372 0.1851 0.2241 0.0666 0.3131 0.1693
S+D\RH cost 8 Obj.2 0.259 0.1526 0.0961 0.026 0.1629 0.1511
(J48) Obj.10 0.3761 0.167 0.1285 0.0318 0.2475 0.1609
Obj.20 0.3765 0.1656 0.0987 0.0376 0.2778 0.1603
cost 9 Obj.2 0.269 0.1561 0.1008 0.0269 0.1682 0.1543
Obj.10 0.4098 0.1715 0.1418 0.0325 0.2679 0.1654
Obj.20 0.404 0.1688 0.1263 0.0388 0.2777 0.1656
cost 10 Obj.2 0.2755 0.1586 0.1049 0.0275 0.1706 0.1564
Obj.10 0.443 0.1716 0.1519 0.0338 0.2912 0.1683
Obj.20 0.4312 0.1723 0.1508 0.0384 0.2804 0.1685
cost 14 Obj.2 0.3008 0.1661 0.1273 0.0289 0.1735 0.1648
Obj.10 0.47 0.1741 0.1918 0.0377 0.2782 0.1705
Obj.20 0.4743 0.167 0.1811 0.0397 0.2932 0.1672
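The Youden Index column in Table A.3 can be reproduced directly from the TPR and FPR columns, since J = TPR − FPR. A minimal sketch; the example values are the mean TPR and FPR of the S+D (SVM), cost 8 row:

```python
# Youden Index for a binary classifier: J = TPR - FPR
# (equivalently, sensitivity + specificity - 1).
# Example values: S+D (SVM), cost 8 row of Table A.3
# (mean TPR = 0.3589, mean FPR = 0.0977, mean J = 0.2612).
def youden_index(tpr: float, fpr: float) -> float:
    """Youden Index (informedness) of a classifier."""
    return tpr - fpr

print(round(youden_index(0.3589, 0.0977), 4))  # → 0.2612
```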
TU Eindhoven 104 G. Manev
Table A.4 Result of T-test on TPR between all classifiers (S+D).
Classification model Cumulative
cost object size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 Nb + Nb = Nb -
1 = S+D cost 8 - 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 2 42 8
2 = (SVM) cost 9 0 - 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 10 39 3
3 = cost 10 0 0 - 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 36 1
4 = cost 14 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 32 20 0
5 = S+D cost 8 0 0 0 1 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 8 40 4
6 = (Jrip) cost 9 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 40 0
7 = cost 10 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 39 0
8 = cost 14 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28 24 0
9 = S+D cost 8 Obj.2 1 1 1 1 1 1 1 1 - 0 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 14 38
10 = (J48) Obj.10 0 0 1 1 0 0 1 1 0 - 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 33 19
11 = Obj.20 0 0 0 1 0 0 0 1 0 0 - 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 4 41 7
12 = cost 9 Obj.2 0 1 1 1 1 1 1 1 0 0 1 - 1 1 0 1 1 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 15 37
13 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 - 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 1 3 37 12
14 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 - 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 10 37 5
15 = cost 10 Obj.2 0 1 1 1 1 1 1 1 0 0 0 0 0 1 - 1 1 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 17 35
16 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 - 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 7 38 7
17 = Obj.20 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 39 1
18 = cost 14 Obj.2 0 1 1 1 0 1 1 1 0 0 0 0 0 1 0 0 1 - 1 1 0 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 0 24 28
19 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 39 0
20 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 30 0
21 = S+D\RH cost 8 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 8 40 4
22 = (Jrip) cost 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 41 0
23 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 36 0
24 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 29 23 0
25 = S+D\RH cost 8 Obj.2 1 1 1 1 1 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 1 1 1 1 - 0 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 14 38
26 = (J48) Obj.10 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 - 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 36 16
27 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 - 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 5 40 7
28 = cost 9 Obj.2 0 1 1 1 1 1 1 1 0 0 1 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 1 - 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 16 36
29 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 - 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 7 36 9
30 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 10 37 5
31 = cost 10 Obj.2 0 1 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 - 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 18 34
32 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 10 38 4
33 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 39 0
34 = cost 14 Obj.2 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 1 - 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 0 1 1 0 30 22
35 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 36 0
36 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 33 0
37 = S+D\RH cost 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 40 0
38 = (Jrip) cost 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 38 0
39 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 15 37 0
40 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 29 23 0
41 = S+D\RH cost 8 Obj.2 0 1 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 1 1 1 1 - 1 1 0 1 1 0 1 1 0 1 1 0 18 34
42 = (J48) Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 - 0 0 0 0 0 1 0 0 1 1 8 37 7
43 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 - 0 0 0 0 0 0 0 0 1 7 39 6
44 = cost 9 Obj.2 0 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 0 1 0 - 1 1 0 1 1 0 1 1 0 21 31
45 = Obj.10 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 11 40 1
46 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 - 0 0 0 0 0 1 10 37 5
47 = cost 10 Obj.2 0 1 1 1 0 1 1 1 0 0 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 - 1 1 0 1 1 0 24 28
48 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 16 36 0
49 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 14 38 0
50 = cost 14 Obj.2 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 1 0 0 1 1 - 1 1 0 32 20
51 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 17 35 0
52 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 24 28 0
Table A.5 Result of T-test on Youden Index between all classifiers (S+D).
Classification model Cumulative
cost object size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 Nb + Nb = Nb -
1 = S+D cost 8 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 50 1
2 = (SVM) cost 9 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 44 0
3 = cost 10 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 37 0
4 = cost 14 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 33 0
5 = S+D cost 8 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
6 = (Jrip) cost 9 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
7 = cost 10 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 50 0
8 = cost 14 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 47 0
9 = S+D cost 8 Obj.2 1 1 1 1 0 0 1 1 - 0 1 0 0 1 0 0 1 0 0 1 0 0 1 1 0 0 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 21 31
10 = (J48) Obj.10 0 0 1 1 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50 2
11 = Obj.20 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 47 0
12 = cost 9 Obj.2 0 1 1 1 0 0 1 1 0 0 1 - 0 1 0 0 1 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 0 25 27
13 = Obj.10 0 0 1 1 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 49 3
14 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 45 0
15 = cost 10 Obj.2 0 1 1 1 0 0 0 1 0 0 1 0 0 1 - 0 1 0 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 0 28 24
16 = Obj.10 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50 2
17 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 47 0
18 = cost 14 Obj.2 0 1 1 1 0 0 0 1 0 0 1 0 0 1 0 0 1 - 0 0 0 0 1 1 0 0 1 0 0 1 0 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 0 28 24
19 = Obj.10 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 51 1
20 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 50 0
21 = S+D\RH cost 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
22 = (Jrip) cost 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
23 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 45 0
24 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 45 0
25 = S+D\RH cost 8 Obj.2 0 1 1 1 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 0 0 0 1 1 - 0 1 0 0 1 0 0 1 0 0 1 1 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 0 28 24
26 = (J48) Obj.10 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 51 1
27 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 45 0
28 = cost 9 Obj.2 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 - 0 1 0 0 1 0 0 1 0 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 0 32 20
29 = Obj.10 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 50 1
30 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 44 0
31 = cost 10 Obj.2 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 1 - 0 1 0 0 1 0 1 1 1 0 0 1 0 1 1 0 1 1 0 1 1 0 32 20
32 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 51 0
33 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 45 0
34 = cost 14 Obj.2 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 47 5
35 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 50 0
36 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 45 0
37 = S+D\RH cost 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 47 0
38 = (Jrip) cost 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 42 0
39 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 10 42 0
40 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 12 40 0
41 = S+D\RH cost 8 Obj.2 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 - 0 1 0 1 1 0 1 1 0 1 1 0 38 14
42 = (J48) Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 1 51 0
43 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 9 43 0
44 = cost 9 Obj.2 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 - 1 0 0 1 0 0 1 1 0 42 10
45 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 9 43 0
46 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 8 44 0
47 = cost 10 Obj.2 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 - 1 0 0 1 1 0 44 8
48 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 12 40 0
49 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 8 44 0
50 = cost 14 Obj.2 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 - 1 1 0 46 6
51 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 11 41 0
52 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 13 39 0
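Each 0/1 cell in the t-test matrices above records whether the row classifier's score differs significantly from the column classifier's under a paired t-test over the per-fold results; the cumulative Nb+, Nb=, Nb− columns then count wins, ties, and losses. A minimal sketch of that pairwise comparison, assuming per-fold score lists are available; the fold scores and the critical value `t_crit` below are illustrative assumptions, not values taken from the thesis:

```python
import math

def paired_t_statistic(a, b):
    """Paired t-statistic over per-fold scores of two classifiers."""
    n = len(a)
    diffs = [x - y for x, y in zip(a, b)]
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n)

def compare(a, b, t_crit=2.0):
    """+1 win, 0 tie, -1 loss for the row classifier (Nb+/Nb=/Nb-)."""
    t = paired_t_statistic(a, b)
    if t > t_crit:
        return 1
    if t < -t_crit:
        return -1
    return 0

# Illustrative per-fold TPR scores (not from the thesis):
row = [0.50, 0.60, 0.55, 0.65]
col = [0.30, 0.35, 0.32, 0.40]
print(compare(row, col), compare(col, row))  # → 1 -1
```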
A.3 Experiment 3 (S+H)
Table A.6 Performances of all classifiers (S+H).
Classification model TPR FPR Youden Index
Mean Std.dev. Mean Std.dev. Mean Std.dev.
S+H cost 8 0.3146 0.1556 0.1103 0.0296 0.2044 0.1553
(SVM) cost 9 0.3788 0.1655 0.1292 0.0309 0.2496 0.1668
cost 10 0.4008 0.1689 0.144 0.0323 0.2567 0.1689
cost 14 0.4851 0.1752 0.1994 0.037 0.2857 0.1769
S+H cost 8 0.3673 0.1875 0.1447 0.0445 0.2227 0.1802
(Jrip) cost 9 0.3973 0.173 0.1626 0.0476 0.2347 0.1687
cost 10 0.4232 0.1868 0.1769 0.053 0.2463 0.1795
cost 14 0.5123 0.1953 0.2437 0.0664 0.2686 0.1878
S+H cost 8 Obj.2 0.2384 0.1588 0.0966 0.0271 0.1418 0.1603
(J48) Obj.10 0.2906 0.1578 0.1255 0.0318 0.1651 0.1514
Obj.20 0.2954 0.1605 0.1192 0.0347 0.1761 0.1584
cost 9 Obj.2 0.2434 0.1585 0.1002 0.028 0.1432 0.16
Obj.10 0.3286 0.1725 0.1433 0.0317 0.1853 0.1662
Obj.20 0.3317 0.1602 0.1379 0.0382 0.1939 0.1624
cost 10 Obj.2 0.2484 0.1627 0.1038 0.0283 0.1446 0.1633
Obj.10 0.3677 0.1769 0.1603 0.0345 0.2074 0.1713
Obj.20 0.3446 0.1627 0.1626 0.0416 0.182 0.1609
cost 14 Obj.2 0.282 0.1661 0.1287 0.0319 0.1533 0.1628
Obj.10 0.4151 0.1814 0.199 0.04 0.2161 0.177
Obj.20 0.4157 0.1775 0.2331 0.0485 0.1826 0.1711
S\RQoL+H cost 8 0.3824 0.1825 0.1347 0.0456 0.2477 0.1792
(Jrip) cost 9 0.3985 0.1861 0.1562 0.0482 0.2423 0.1773
cost 10 0.4205 0.1795 0.1741 0.0516 0.2464 0.1741
cost 14 0.502 0.1832 0.2357 0.0715 0.2662 0.1761
S\RQoL+H cost 8 Obj.2 0.2973 0.1729 0.0953 0.0261 0.2019 0.1721
(J48) Obj.10 0.33 0.1621 0.1071 0.0319 0.2229 0.1601
Obj.20 0.3372 0.1682 0.1043 0.0362 0.2329 0.1642
cost 9 Obj.2 0.3011 0.1706 0.1019 0.0271 0.1992 0.1695
Obj.10 0.3685 0.1686 0.1281 0.0351 0.2404 0.1665
Obj.20 0.3677 0.1612 0.1206 0.0388 0.2471 0.1602
cost 10 Obj.2 0.3011 0.1684 0.1061 0.0275 0.195 0.1673
Obj.10 0.391 0.1713 0.1544 0.04 0.2366 0.1699
Obj.20 0.3767 0.1659 0.1398 0.0407 0.2368 0.1647
cost 14 Obj.2 0.3349 0.1684 0.1449 0.0327 0.1901 0.1686
Obj.10 0.413 0.1739 0.2084 0.0414 0.2046 0.1749
Obj.20 0.4265 0.1777 0.2145 0.0485 0.2121 0.1728
Table A.7 Result of T-test on TPR between all classifiers (S+H).
Classification model Cumulative
Cost Object size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 Nb + Nb = Nb -
1 = S+H cost 8 - 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 32 4
2 = (SVM) cost 9 0 - 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 32 1
3 = cost 10 0 0 - 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 31 1
4 = cost 14 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 17 0
5 = S+H cost 8 0 0 0 0 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 33 2
6 = (Jrip) cost 9 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 33 0
7 = cost 10 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 30 0
8 = cost 14 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 21 15 0
9 = S+H cost 8 Obj.2 0 1 1 1 1 1 1 1 - 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 0 15 21
10 = (J48) Obj.10 0 0 0 1 0 0 1 1 0 - 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 29 7
11 = Obj.20 0 0 0 1 0 0 1 1 0 0 - 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 28 8
12 = cost 9 Obj.2 0 1 1 1 0 1 1 1 0 0 0 - 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 0 17 19
13 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 33 3
14 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 33 3
15 = cost 10 Obj.2 0 1 1 1 0 1 1 1 0 0 0 0 0 0 - 1 0 0 1 1 1 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 0 17 19
16 = Obj.10 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 32 1
17 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 32 3
18 = cost 14 Obj.2 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 - 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 27 9
19 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 30 0
20 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 30 0
21 = S\RQoL+H cost 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 3 32 1
22 = (Jrip) cost 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 33 0
23 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 5 31 0
24 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 21 15 0
25 = S\RQoL+H cost 8 Obj.2 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 - 0 0 0 0 0 0 0 0 0 1 1 0 31 5
26 = (J48) Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 - 0 0 0 0 0 0 0 0 0 1 0 32 4
27 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 - 0 0 0 0 0 0 0 0 0 0 33 3
28 = cost 9 Obj.2 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 - 0 0 0 0 0 0 1 1 0 31 5
29 = Obj.10 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 - 0 0 0 0 0 0 0 3 31 2
30 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 - 0 0 0 0 0 0 3 30 3
31 = cost 10 Obj.2 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 - 0 0 0 1 1 0 31 5
32 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 3 33 0
33 = Obj.20 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 - 0 0 0 3 31 2
34 = cost 14 Obj.2 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 - 0 0 0 33 3
35 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 7 29 0
36 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 10 26 0
Table A.8 Result of T-test on Youden Index between all classifiers (S+H).
Classification model Cumulative
Cost Object size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 Nb + Nb = Nb -
1 = S+H cost 8 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
2 = (SVM) cost 9 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
3 = cost 10 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
4 = cost 14 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 33 0
5 = S+H cost 8 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
6 = (Jrip) cost 9 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
7 = cost 10 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
8 = cost 14 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
9 = S+H cost 8 Obj.2 0 0 0 1 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 1
10 = (J48) Obj.10 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
11 = Obj.20 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
12 = cost 9 Obj.2 0 0 0 1 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 1
13 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
14 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
15 = cost 10 Obj.2 0 0 0 1 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 1
16 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
17 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
18 = cost 14 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
19 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
20 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
21 = S\RQoL+H cost 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
22 = (Jrip) cost 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
23 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
24 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 36 0
25 = S\RQoL+H cost 8 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 36 0
26 = (J48) Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 36 0
27 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 36 0
28 = cost 9 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 36 0
29 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 36 0
30 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 36 0
31 = cost 10 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 36 0
32 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 36 0
33 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 36 0
34 = cost 14 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 36 0
35 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 36 0
36 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 36 0
A.4 Experiment 4 (S+D+H)
Table A.9 Performances of all classifiers (S+D+H).
Classification model TPR FPR Youden Index
Mean Std.dev. Mean Std.dev. Mean Std.dev.
S+D+H cost 8 0.4045 0.171 0.1048 0.0287 0.2997 0.1703
(SVM) cost 9 0.4577 0.1828 0.1248 0.0319 0.3329 0.182
cost 10 0.4914 0.1771 0.1371 0.0337 0.3543 0.1777
cost 14 0.5482 0.1784 0.1788 0.0366 0.3694 0.1765
S+D+H cost 8 0.3992 0.1774 0.1591 0.0479 0.2401 0.1703
(Jrip) cost 9 0.4313 0.1908 0.176 0.0511 0.2553 0.1749
cost 10 0.4718 0.1931 0.1999 0.0607 0.2719 0.1754
cost 14 0.5944 0.2068 0.2645 0.0694 0.3299 0.1845
S+D+H cost 8 Obj.2 0.2626 0.1604 0.0934 0.0265 0.1692 0.1588
(J48) Obj.10 0.3171 0.1581 0.1282 0.0361 0.189 0.1529
Obj.20 0.3611 0.1617 0.123 0.0367 0.2381 0.1594
cost 9 Obj.2 0.2637 0.1593 0.0966 0.0271 0.167 0.1577
Obj.10 0.3434 0.1572 0.1456 0.0361 0.1978 0.1528
Obj.20 0.3987 0.1686 0.1448 0.0399 0.2539 0.1653
cost 10 Obj.2 0.2682 0.1634 0.0985 0.0263 0.1696 0.161
Obj.10 0.3626 0.1592 0.1617 0.037 0.2009 0.1581
Obj.20 0.4337 0.1699 0.1608 0.0395 0.2729 0.169
cost 14 Obj.2 0.2862 0.1545 0.1228 0.0288 0.1634 0.1523
Obj.10 0.4215 0.1759 0.2204 0.0387 0.2012 0.1701
Obj.20 0.4777 0.1809 0.2049 0.0425 0.2729 0.1799
S+D\RH+H cost 8 0.3961 0.1861 0.1552 0.0463 0.2409 0.182
(Jrip) cost 9 0.4197 0.1947 0.1762 0.0497 0.2435 0.1817
cost 10 0.4873 0.2111 0.2063 0.0637 0.281 0.1926
cost 14 0.5917 0.1894 0.2814 0.0657 0.3103 0.1723
S+D\RH+H cost 8 Obj.2 0.2455 0.1525 0.0995 0.0262 0.146 0.1489
(J48) Obj.10 0.3267 0.1599 0.1227 0.0323 0.2041 0.1582
Obj.20 0.354 0.1667 0.1048 0.0346 0.2492 0.1632
cost 9 Obj.2 0.2546 0.1517 0.1018 0.0271 0.1528 0.1485
Obj.10 0.3493 0.1587 0.1384 0.0326 0.2109 0.1579
Obj.20 0.3898 0.164 0.1343 0.0408 0.2555 0.1619
cost 10 Obj.2 0.2573 0.1519 0.1042 0.0266 0.1532 0.1489
Obj.10 0.3721 0.1546 0.1585 0.0351 0.2136 0.1569
Obj.20 0.4119 0.1666 0.1542 0.0393 0.2577 0.1648
cost 14 Obj.2 0.303 0.1584 0.123 0.0286 0.1801 0.1558
Obj.10 0.4356 0.1724 0.2116 0.0351 0.224 0.1658
Obj.20 0.4646 0.1747 0.2055 0.0414 0.2591 0.1716
S\RQoL+D\RH+H cost 8 0.4224 0.1818 0.1542 0.0424 0.2683 0.171
(Jrip) cost 9 0.442 0.1966 0.1751 0.0474 0.2669 0.1843
cost 10 0.4733 0.2011 0.1974 0.0592 0.2759 0.1844
cost 14 0.6235 0.1931 0.2688 0.0601 0.3547 0.1825
S\RQoL+D\RH+H cost 8 Obj.2 0.2527 0.1521 0.0969 0.0272 0.1558 0.1528
(J48) Obj.10 0.3313 0.1648 0.1205 0.0308 0.2108 0.1619
Obj.20 0.3586 0.1681 0.102 0.0374 0.2567 0.1655
cost 9 Obj.2 0.2635 0.1536 0.1039 0.0285 0.1597 0.1528
Obj.10 0.3565 0.163 0.1361 0.0321 0.2204 0.1587
Obj.20 0.3873 0.1681 0.132 0.0392 0.2553 0.165
cost 10 Obj.2 0.2668 0.1513 0.1099 0.0288 0.1569 0.1511
Obj.10 0.3789 0.1601 0.151 0.0352 0.2278 0.1599
Obj.20 0.4033 0.1653 0.1502 0.0366 0.2531 0.1636
cost 14 Obj.2 0.312 0.1573 0.1346 0.0288 0.1774 0.158
Obj.10 0.4137 0.1643 0.214 0.0392 0.1998 0.1585
Obj.20 0.4368 0.17 0.2075 0.0427 0.2294 0.1666
Table A.10 Result of T-test on TPR between all classifiers (S+D+H).
Classification model Cumulative
cost object size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 Nb + Nb = Nb -
1 = S+D+H cost 8 - 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 9 38 5
2 = (SVM) cost 9 0 - 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 15 34 3
3 = cost 10 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 30 0
4 = cost 14 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 35 17 0
5 = S+D+H cost 8 0 0 0 1 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 7 41 4
6 = (Jrip) cost 9 0 0 0 0 0 - 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 10 39 3
7 = cost 10 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 14 37 1
8 = cost 14 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 40 12 0
9 = S+D+H cost 8 Obj.2 1 1 1 1 1 1 1 1 - 0 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 0 24 28
10 = (J48) Obj.10 0 1 1 1 0 0 1 1 0 - 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 37 15
11 = Obj.20 0 0 1 1 0 0 0 1 0 0 - 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 44 7
12 = cost 9 Obj.2 1 1 1 1 0 1 1 1 0 0 0 - 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 0 25 27
13 = Obj.10 0 0 1 1 0 0 0 1 0 0 0 0 - 0 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 43 9
14 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 9 39 4
15 = cost 10 Obj.2 1 1 1 1 0 1 1 1 0 0 0 0 0 1 - 0 1 0 1 1 0 1 1 1 0 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 27 25
16 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 - 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 4 42 6
17 = Obj.20 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 14 35 3
18 = cost 14 Obj.2 0 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 1 - 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 32 20
19 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 12 36 4
20 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 23 29 0
21 = S+D\RH+H cost 8 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 4 44 4
22 = (Jrip) cost 9 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 9 40 3
23 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 16 36 0
24 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 42 10 0
25 = S+D\RH+H cost 8 Obj.2 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 1 1 0 1 1 1 1 1 1 - 0 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 0 1 1 0 1 1 0 18 34
26 = (J48) Obj.10 0 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 0 - 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 38 14
27 = Obj.20 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 - 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 44 8
28 = cost 9 Obj.2 1 1 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 - 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 0 21 31
29 = Obj.10 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 - 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 44 7
30 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 9 39 4
31 = cost 10 Obj.2 1 1 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 0 1 - 1 1 0 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 0 21 31
32 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 - 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 5 41 6
33 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 10 38 4
34 = cost 14 Obj.2 0 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 - 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 37 15
35 = Obj.10 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 15 34 3
36 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 23 28 1
37 = S\RQoL+D\RH+H cost 8 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 9 40 3
38 = (Jrip) cost 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 1 0 0 0 0 0 0 0 0 0 0 0 0 10 40 2
39 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 1 0 0 0 0 0 0 0 0 0 0 0 0 14 37 1
40 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 45 7 0
41 = S\RQoL+D\RH+H cost 8 Obj.2 1 1 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 - 0 0 0 1 1 0 1 1 0 1 1 0 20 32
42 = (J48) Obj.10 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 - 0 0 0 0 0 0 0 0 0 1 0 41 11
43 = Obj.20 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 - 0 0 0 0 0 0 0 0 1 0 44 8
44 = cost 9 Obj.2 1 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 1 0 1 1 0 1 1 1 1 1 1 0 0 0 - 0 1 0 1 1 0 1 1 0 23 29
45 = Obj.10 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 - 0 0 0 0 0 0 0 2 43 7
46 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 - 0 0 0 0 0 0 8 40 4
47 = cost 10 Obj.2 1 1 1 1 1 1 1 1 0 0 0 0 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1 0 0 0 0 0 1 - 1 1 0 1 1 0 24 28
48 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 - 0 0 0 0 8 40 4
49 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 - 0 0 0 10 38 4
50 = cost 14 Obj.2 0 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 - 1 1 0 38 14
51 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 - 0 11 37 4
52 = Obj.20 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 - 17 32 3
TU Eindhoven 110 G. Manev
Table A.11 Result of T-test on Youden Index between all classifiers (S+D+H).
Classification model Cumulative
cost object size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 Nb + Nb = Nb -
1 = S+D+H cost 8 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 45 0
2 = (SVM) cost 9 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 19 33 0
3 = cost 10 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 25 27 0
4 = cost 14 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 31 21 0
5 = S+D+H cost 8 0 0 0 1 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 51 1
6 = (Jrip) cost 9 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
7 = cost 10 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
8 = cost 14 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 12 40 0
9 = S+D+H cost 8 Obj.2 0 1 1 1 0 0 0 1 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 47 5
10 = (J48) Obj.10 0 1 1 1 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 48 4
11 = Obj.20 0 0 1 1 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50 2
12 = cost 9 Obj.2 0 1 1 1 0 0 0 1 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 47 5
13 = Obj.10 0 1 1 1 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 48 4
14 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
15 = cost 10 Obj.2 0 1 1 1 0 0 0 1 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 47 5
16 = Obj.10 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 48 4
17 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 49 0
18 = cost 14 Obj.2 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 45 7
19 = Obj.10 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 49 3
20 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 49 0
21 = S+D\RH+H cost 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
22 = (Jrip) cost 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
23 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
24 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 45 0
25 = S+D\RH+H cost 8 Obj.2 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 - 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 42 10
26 = (J48) Obj.10 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 48 4
27 = Obj.20 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 51 1
28 = cost 9 Obj.2 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 43 9
29 = Obj.10 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 48 4
30 = Obj.20 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 51 1
31 = cost 10 Obj.2 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 43 9
32 = Obj.10 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 49 3
33 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
34 = cost 14 Obj.2 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 47 5
35 = Obj.10 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 50 2
36 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 51 0
37 = S\RQoL+D\RH+H cost 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
38 = (Jrip) cost 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
39 = cost 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 52 0
40 = cost 14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 21 31 0
41 = S\RQoL+D\RH+H cost 8 Obj.2 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 - 0 0 0 0 0 0 0 0 0 0 0 0 45 7
42 = (J48) Obj.10 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 - 0 0 0 0 0 0 0 0 0 0 0 48 4
43 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 52 0
44 = cost 9 Obj.2 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 - 0 0 0 0 0 0 0 0 0 45 7
45 = Obj.10 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 50 2
46 = Obj.20 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 51 1
47 = cost 10 Obj.2 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 - 0 0 0 0 0 0 45 7
48 = Obj.10 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 51 1
49 = Obj.20 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 51 1
50 = cost 14 Obj.2 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 - 0 0 0 47 5
51 = Obj.10 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 - 0 0 48 4
52 = Obj.20 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 50 2
A.5 Experiment 5 (S+D+H+FS)
Table A.12 Performances of all algorithms (S+D+H+FS).
Classification model TPR FPR Youden Index
Mean Std.dev. Mean Std.dev. Mean Std.dev.
S+D+H+FS cost 8 0.4173 0.1736 0.1054 0.0358 0.3119 0.1711
(SVM) cost 9 0.4369 0.174 0.1111 0.0333 0.3258 0.1732
cost 10 0.4401 0.1714 0.1187 0.0354 0.3213 0.1722
cost 14 0.5408 0.1971 0.1872 0.056 0.3536 0.1844
S+D+H+FS cost 8 0.3823 0.1827 0.1185 0.0472 0.2639 0.1717
(Jrip) cost 9 0.4095 0.1819 0.135 0.0498 0.2745 0.1709
cost 10 0.4371 0.1938 0.1544 0.0552 0.2827 0.176
cost 14 0.5303 0.1935 0.203 0.0521 0.3273 0.1835
S+D+H+FS cost 8 Obj.2 0.3235 0.169 0.0839 0.0269 0.2396 0.1653
(J48) Obj.10 0.3407 0.1675 0.0878 0.0294 0.2529 0.1649
Obj.20 0.3402 0.1723 0.0899 0.0317 0.2503 0.1689
cost 9 Obj.2 0.3429 0.1694 0.0944 0.0317 0.2485 0.1662
Obj.10 0.3581 0.1693 0.098 0.0343 0.2601 0.1682
Obj.20 0.3567 0.1771 0.0986 0.0346 0.2582 0.1725
cost 10 Obj.2 0.3674 0.176 0.1073 0.0403 0.2601 0.1703
Obj.10 0.3761 0.1753 0.111 0.0414 0.265 0.1701
Obj.20 0.381 0.179 0.1106 0.0413 0.2704 0.1736
cost 14 Obj.2 0.4571 0.1852 0.1604 0.0497 0.2967 0.1755
Obj.10 0.472 0.1826 0.1696 0.0514 0.3024 0.1713
Obj.20 0.4862 0.1921 0.1779 0.0529 0.3083 0.179
Table A.13 Result of T-test on TPR between all classifiers (S+D+H+FS).
Classification model Cumulative
Cost Object size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Nb + Nb = Nb -
1 = S+D+H+FS cost 8 - 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 18 2
2 = (SVM) cost 9 0 - 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 16 1
3 = cost 10 0 0 - 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 16 1
4 = cost 14 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 15 5 0
5 = S+D+H+FS cost 8 0 0 0 1 - 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 17 3
6 = (Jrip) cost 9 0 0 0 1 0 - 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 18 2
7 = cost 10 0 0 0 1 0 0 - 1 0 0 0 0 0 0 0 0 0 0 0 0 2 16 2
8 = cost 14 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 13 7 0
9 = S+D+H+FS cost 8 Obj.2 0 1 1 1 0 0 1 1 - 0 0 0 0 0 0 0 0 1 1 1 0 12 8
10 = (J48) Obj.10 0 1 1 1 0 0 1 1 0 - 0 0 0 0 0 0 0 1 1 1 0 12 8
11 = Obj.20 0 1 1 1 0 0 0 1 0 0 - 0 0 0 0 0 0 1 1 1 0 13 7
12 = cost 9 Obj.2 0 0 0 1 0 0 0 1 0 0 0 - 0 0 0 0 0 1 1 1 0 15 5
13 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 - 0 0 0 0 1 1 1 0 15 5
14 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 - 0 0 0 0 1 1 0 16 4
15 = cost 10 Obj.2 0 0 0 1 0 0 0 1 0 0 0 0 0 0 - 0 0 1 1 1 0 15 5
16 = Obj.10 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 - 0 0 1 1 0 16 4
17 = Obj.20 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 - 0 0 1 0 17 3
18 = cost 14 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 6 14 0
19 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 8 12 0
20 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 10 10 0
Table A.14 Result of T-test on Youden Index between all classifiers (S+D+H+FS).
Classification model Cumulative
Cost Object size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Nb + Nb = Nb -
1 = S+D+H+FS cost 8 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0
2 = (SVM) cost 9 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0
3 = cost 10 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0
4 = cost 14 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 18 0
5 = S+D+H+FS cost 8 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0
6 = (Jrip) cost 9 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0
7 = cost 10 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0
8 = cost 14 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 20 0
9 = S+D+H+FS cost 8 Obj.2 0 0 0 1 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 0 19 1
10 = (J48) Obj.10 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 0 20 0
11 = Obj.20 0 0 0 1 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 0 19 1
12 = cost 9 Obj.2 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 0 20 0
13 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 0 20 0
14 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 20 0
15 = cost 10 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 0 20 0
16 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 0 20 0
17 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 0 20 0
18 = cost 14 Obj.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 0 20 0
19 = Obj.10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 0 20 0
20 = Obj.20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 - 0 20 0
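The 0/1 cells of the T-test tables above and their cumulative Nb+ / Nb= / Nb- columns can be reproduced mechanically from per-fold scores: a cell is 1 when a paired t-test finds a significant difference between the row and column classifiers, and the cumulative columns count significant wins, ties, and losses per row. A minimal sketch, assuming 10 cross-validation folds and a fixed 5% two-sided critical value; the score vectors below are purely illustrative, not the thesis's actual results:

```python
import math

# Illustrative per-fold Youden Index scores for three classifiers
# (made-up values, NOT the thesis's real cross-validation data).
scores = {
    "SVM cost 8":  [0.30, 0.32, 0.28, 0.35, 0.31, 0.29, 0.33, 0.30, 0.27, 0.34],
    "Jrip cost 8": [0.25, 0.27, 0.24, 0.29, 0.26, 0.23, 0.28, 0.25, 0.22, 0.27],
    "J48 Obj.2":   [0.26, 0.25, 0.25, 0.28, 0.27, 0.22, 0.29, 0.24, 0.23, 0.28],
}

T_CRIT = 2.262  # two-sided 5% critical value of Student's t for df = 9 (10 folds)

def paired_t(a, b):
    """t statistic of the paired t-test between two per-fold score vectors."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n) if var > 0 else 0.0

cumulative = {}
for row, row_scores in scores.items():
    nb_plus = nb_eq = nb_minus = 0
    for col, col_scores in scores.items():
        if row == col:
            continue  # the diagonal is '-' in the tables
        t = paired_t(row_scores, col_scores)
        if t > T_CRIT:
            nb_plus += 1    # row significantly better (cell = 1, counted in Nb+)
        elif t < -T_CRIT:
            nb_minus += 1   # row significantly worse (counted in Nb-)
        else:
            nb_eq += 1      # no significant difference (cell = 0, counted in Nb=)
    cumulative[row] = (nb_plus, nb_eq, nb_minus)
    print(row, cumulative[row])
```

Note that Nb+ + Nb= + Nb- always equals the number of other classifiers compared against, which is why each row of the tables sums to 52 (or 20) minus one diagonal entry.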
A.6 Summary
[Figure: scatter plot of TPR (sensitivity) against FPR (1-specificity), TPR axis 0.0-0.7, FPR axis 0.0-0.3. For each feature-set combination (S, S\RQoL, S+D, S+D\RH, S\RQoL+D\RH, S+H, S\RQoL+H, S+D+H, S+D\RH+H, S\RQoL+D\RH+H, S+D+H+FS) the best model and the remaining models are plotted as separate series, alongside the random-guess diagonal.]
Figure A.1 All models from all experiments.
Table A.15 Summary of the two best algorithms per experiment (max TPR and min FPR).
Classification model TPR FPR Yindex
S (SVM) max TPR 0.4486 0.2045 0.2441
min FPR 0.38 0.1427 0.2373
(Jrip) max TPR 0.5289 0.2561 0.2729
min FPR 0.3915 0.1548 0.2367
(J48) max TPR 0.4548 0.2185 0.2363
min FPR 0.394 0.1614 0.2326
S+D (SVM) max TPR 0.5476 0.1843 0.3634
min FPR 0.4605 0.1381 0.3225
(Jrip) max TPR 0.5372 0.2241 0.3131
min FPR 0.4618 0.1664 0.2953
(J48) max TPR 0.4743 0.1811 0.2932
min FPR 0.3765 0.0987 0.2778
S+H (SVM) max TPR 0.4851 0.1994 0.2857
min FPR 0.4008 0.144 0.2567
(Jrip) max TPR 0.502 0.2357 0.2662
min FPR 0.3824 0.1347 0.2477
(J48) max TPR 0.3685 0.1281 0.2404
min FPR 0.3677 0.1206 0.2471
S+D+H (SVM) max TPR 0.5482 0.1788 0.3694
min FPR 0.4045 0.1048 0.2997
(Jrip) max TPR 0.6235 0.2688 0.3547
min FPR 0.4224 0.1542 0.2683
(J48) max TPR 0.4777 0.2049 0.2729
min FPR 0.4337 0.1608 0.2729
S+D+H+FS (SVM) max TPR 0.5485 0.1887 0.3598
min FPR 0.4185 0.1054 0.3131
(Jrip) max TPR 0.5335 0.1995 0.334
min FPR 0.4426 0.1507 0.2918
(J48) max TPR 0.4883 0.1766 0.3116
min FPR 0.4736 0.1646 0.309
Appendix B
Results on test data from the preemptive mode of work
[Figure: scatter plot of TPR (sensitivity) against FPR (1-specificity), TPR axis 0.0-0.7, FPR axis 0.0-0.3, comparing our classifiers (S, S+D, S+H, S+D+H and S+D+H+FS, each with Jrip and J48) against the previous approaches (RoT, MACD, WTI, HRTI, UnionMW, UnionMWH, UnionRMW, UnionRMWH) and the random-guess diagonal.]
Figure B.1 Hospitalization prediction accuracies for different classifiers and feature sets on the test dataset using the preemptive mode of work on the adaptive engine.
Table B.1 Prediction accuracies on the test dataset using the preemptive mode of
work on the adaptive engine.
Classification model TP FN FP TN TPR FPR Yindex Hrate #Alarms
Our Approach
S (Jrip) max TPR 13 7 111 624 0.65 0.151 0.499 0.65 124
min FPR 10 10 69 717 0.5 0.0878 0.4122 0.5 79
(J48) max TPR 7 13 56 766 0.35 0.0681 0.2819 0.35 63
min FPR 8 12 47 779 0.4 0.0569 0.3431 0.4 55
S+D (Jrip) max TPR 12 8 191 680 0.6 0.2193 0.3807 0.6 203
min FPR 11 9 137 724 0.55 0.1591 0.3909 0.55 148
(J48) max TPR 11 9 104 773 0.55 0.1186 0.4314 0.55 115
min FPR 11 9 75 794 0.55 0.0863 0.4637 0.55 86
S+H (Jrip) max TPR 8 12 59 733 0.4 0.0745 0.3255 0.4 67
min FPR 7 13 49 756 0.35 0.0609 0.2891 0.35 56
(J48) max TPR 8 12 46 776 0.4 0.056 0.344 0.4 54
min FPR 8 12 49 772 0.4 0.0597 0.3403 0.4 57
S+D+H (Jrip) max TPR 11 9 153 691 0.55 0.1813 0.3687 0.55 164
min FPR 11 9 108 754 0.55 0.1253 0.4247 0.55 119
(J48) max TPR 12 8 215 666 0.6 0.244 0.356 0.6 227
min FPR 12 8 194 662 0.6 0.2266 0.3734 0.6 206
S+D+H+FS (Jrip) max TPR 13 7 174 702 0.65 0.1986 0.4514 0.65 187
min FPR 7 13 55 764 0.35 0.0672 0.2828 0.35 62
(J48) max TPR 12 8 175 703 0.6 0.1993 0.4007 0.6 187
min FPR 5 15 57 824 0.25 0.0647 0.1853 0.25 62
Previous approach
RoT 6 14 152 763 0.3 0.1661 0.1339 0.3 158
MACD 6 14 99 784 0.3 0.1121 0.1879 0.3 105
WTI 4 16 71 822 0.2 0.0795 0.1205 0.2 75
HRTI 1 19 139 800 0.05 0.148 -0.098 0.05 140
MACD+WTI 6 14 137 766 0.3 0.1517 0.1483 0.3 143
MACD+WTI+HRTI 7 13 258 709 0.35 0.2668 0.0832 0.35 265
RoT+MACD+WTI 8 12 238 703 0.4 0.2529 0.1471 0.4 246
RoT+MACD+WTI+HRTI 9 11 344 645 0.45 0.3478 0.1022 0.45 353
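The derived columns of Table B.1 follow directly from the raw confusion-matrix counts: TPR = TP/(TP+FN), FPR = FP/(FP+TN), Yindex = TPR - FPR, and #Alarms = TP + FP (every positive prediction raises an alarm). A small sketch verifying the first row; the helper name `rates` is ours, for illustration only:

```python
# Recompute the derived columns of Table B.1 from raw confusion-matrix counts.
# The helper name `rates` is hypothetical, introduced for this illustration.
def rates(tp, fn, fp, tn):
    tpr = tp / (tp + fn)        # sensitivity (also reported as Hrate)
    fpr = fp / (fp + tn)        # 1 - specificity
    yindex = tpr - fpr          # Youden Index
    alarms = tp + fp            # every positive prediction is an alarm
    return tpr, fpr, yindex, alarms

# First row, S (Jrip) max TPR: TP=13, FN=7, FP=111, TN=624
tpr, fpr, yindex, alarms = rates(13, 7, 111, 624)
print(round(tpr, 2), round(fpr, 3), round(yindex, 3), alarms)  # matches 0.65 0.151 0.499 124
```

The same check applies to the "Previous approach" rows, e.g. HRTI's negative Yindex (0.05 - 0.148 = -0.098) shows it performs worse than random guessing on this test set.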