Kernel-based image analysis towards MRI segmentation and classification
David Augusto Cardenas Pena
Universidad Nacional de Colombia
Faculty of Engineering and Architecture
Department of Electrical, Electronic, and Computing Engineering
Manizales, Colombia
2016
A dissertation submitted in partial fulfillment of the requirements for the degree of:
Doctor in Engineering - Automatics
Advisor:
Cesar German Castellanos Domínguez, PhD
Examining Committee:
Beatriz Marcotegui Iturmendi, PhD (MINES ParisTech)
Juan Manuel Gorriz Saez, PhD (Universidad de Granada)
Pablo Andres Arbelaez Escalante, PhD (Universidad de los Andes)
Academic Research Group:
Signal Processing and Recognition Group
Universidad Nacional de Colombia
Faculty of Engineering and Architecture
Department of Electrical, Electronic, and Computing Engineering
Manizales, Colombia
2016
Image analysis using kernels for MRI segmentation and classification
David Augusto Cardenas Pena
Thesis or degree project submitted as a partial requirement for the degree of:
Doctor in Engineering - Automatic Engineering
Director:
Ph.D. Cesar German Castellanos Domínguez
Jury:
Beatriz Marcotegui Iturmendi, PhD (MINES ParisTech)
Juan Manuel Gorriz Saez, PhD (Universidad de Granada)
Pablo Andres Arbelaez Escalante, PhD (Universidad de los Andes)
Research Group:
Grupo de Control y Procesamiento Digital de Señales
Universidad Nacional de Colombia
Faculty of Engineering and Architecture
Department of Electrical, Electronic, and Computing Engineering
Manizales, Colombia
2016
To my little sister, have a good journey.
To Martín, welcome.
Acknowledgment
To my family, thank you for always being there, for all your love, and for encouraging me.
I would like to express my gratitude to my advisor, Prof. Dr. German Castellanos-Domínguez, for his valuable guidance during this research and for teaching me to think outside the box. Thanks to the Signal Processing and Recognition Group (SPRG) at the Universidad Nacional de Colombia (Manizales) for hosting me over these years. Thanks to my lab colleagues for discussing ideas and giving me new points of view. My friends el Oso, el Paya, and el Mostro always helped me through good and hard times. Finally, thanks to Mauricio, my master's student, for letting me advise him and for working with me.
Furthermore, thanks to all the members of the Fundación Centro de Investigación Enfermedades Neurológicas (Madrid, Spain), the Medical Imaging and Signal Processing (Ghent, Belgium), and the Centre for Mathematical Morphology (Fontainebleau, France) for their hospitality. In particular, I would like to thank professors Juan Antonio Hernández Tamames, Stefaan Vandenberghe, and Etienne Decencière for the opportunity to visit their labs, as well as for their helpful insights about open issues and research directions.
Finally, I acknowledge that this research would not have been possible without the COLCIENCIAS Ph.D. scholarship Programa Nacional de Formación de Investigadores "Generación del Bicentenario" 2011. In addition, some of the results of this work were partially supported by the research projects 111045426008, 20101008258, and 111056934461, also funded by COLCIENCIAS.
Abstract
Recently, medical image analysis has received significant interest due to its wide span of applications, including brain surgery, atlas building, and computer-aided diagnosis. In addition, kernel theory is one of the most widely used machine learning frameworks across several tasks, owing to its properties and the variety of available techniques. In this work, we combine medical images, specifically magnetic resonance images, with kernel theory to improve segmentation and classification tasks.
The first contribution of the thesis is a novel tuning criterion for the bandwidth parameter of the Gaussian kernel, termed KEIPV. The approach maximizes the information potential variability in the reproducing kernel Hilbert space. This criterion allows tuning all Gaussian kernels considered in this work. Secondly, we propose a new image representation that highlights inherent image categories, particularly age and gender. The resulting embedded image similarity supports atlas-based segmentation algorithms by selecting the most relevant templates, so that the computational cost (induced by pairwise image registration) is reduced and the segmentation performance is improved. Then, we propose two template-based segmentation approaches. The first one introduces an information cost function into the Bayesian image intensity modeling so that the model parameters better fit the image properties. The second approach is patch-based: we introduce a voxel-wise feature extraction that is locally learned using supervised information provided by neighboring labeled voxels. To this end, the maximization of the centered kernel alignment (CKA) criterion enhances tissue discrimination in the feature space. Finally, the last chapter describes a new training scheme for multi-layer perceptrons (MLP) with two contributions: a supervised MLP pre-training stage that uses CKA to learn linear projection matrices, and a matrix-based conditional entropy cost function for training the MLP parameters within a backpropagation updating scheme.
Keywords: Medical image analysis, Kernel theory, MRI clustering, Atlas-based segmentation, Template selection, Bayesian segmentation, Patch-based segmentation, Computer-aided diagnosis, Information-based cost function, Neural networks
Summary
Recently, medical image analysis has received great interest due to its wide range of applications, including brain surgery, atlas construction, and computer-aided diagnosis. On the other hand, kernel theory is one of the most widely used machine learning methods in a variety of tasks due to its properties and multiple techniques. In this work, medical images, in particular magnetic resonance images, are combined with kernel theory to improve segmentation and classification tasks.
The first contribution of this thesis is a new criterion, termed KEIPV, for tuning the bandwidth of the Gaussian kernel as its only free parameter. The algorithm maximizes the variability of the information potential in the reproducing kernel Hilbert space. This criterion is used to tune all the Gaussian kernels considered in this work. Then, a new representation of 3D images is proposed that highlights the categories inherent to the subjects, specifically age and gender. The embedded image similarity measure supports atlas-based segmentation algorithms by selecting the most relevant templates, so that the computational cost (induced by deformable registration) is reduced and the segmentation performance is improved. Subsequently, two atlas-based segmentation strategies are proposed. The first one presents a cost function based on information measures for the Bayesian segmentation scheme, so that the model parameters better fit the image properties. The second one is a patch-based strategy, for which a local voxel-wise feature extraction is proposed that is trained with supervised information from the labels of neighboring voxels. To this end, the maximization of the centered kernel alignment (CKA) criterion improves tissue discrimination in the feature space. Finally, a new training scheme for multi-layer perceptrons (MLP) is described in the last chapter with two contributions: a supervised pre-training stage using CKA that estimates linear projection matrices, and a cost function based on the matrix conditional entropy for fine-tuning the MLP parameters in a backpropagation updating scheme.
Keywords: Medical image analysis, Kernel theory, MRI clustering, Atlas-based segmentation, Template selection, Bayesian segmentation, Patch-based segmentation, Computer-aided diagnosis, Information-based cost function, Neural networks
Contents
Acknowledgment
Abstract
1 Preliminaries
1.1 Motivation
1.2 Problem statement
1.3 State-of-the-art
1.3.1 Image Representation Enhancement
1.3.2 Image Classification
1.4 Objectives
1.4.1 General Objective
1.4.2 Specific objectives
1.5 Contributions
2 Background
2.1 Reproducing kernel Hilbert spaces in machine learning
2.2 Gaussian kernel estimation from information potential variability
2.3 Template-based Image Segmentation
2.4 Magnetic resonance image databases
3 Kernel-based Template Selection using Embedding Representations
3.1 Template selection for image segmentation
3.1.1 Feature extraction based on inter-slice similarities
3.2 Experiments and Results
3.2.1 MRI Preprocessing
3.2.2 ISK feature extraction
3.2.3 Image similarity function from TKR
3.2.4 Tissue Labeling Performance
3.3 Summary
4 Information-based cost function for Bayesian MRI segmentation
4.1 Bayesian Image Segmentation
4.1.1 Parameter optimization
4.2 Experiments and Results
4.2.1 Evaluation of performed segmentation
4.3 Summary
5 Multi-atlas label fusion using supervised local weighting
5.1 Feature-based label fusion within α-neighborhoods
5.1.1 Supervised feature learning based on centered kernel alignment
5.1.2 CKA-LF optimization using gradient descent
5.2 Experimental Setup
5.2.1 Algorithm parameter setup
5.2.2 Patch-based segmentation performance
5.3 Summary
6 Magnetic resonance image classification using kernel-enhanced neural networks
6.1 Multi-layer perceptron-based classifier using kernels
6.1.1 Matrix-based entropy as a cost function for MLP
6.1.2 Network pre-training using centered kernel alignment
6.2 Experimental Setup
6.2.1 Processing of MRI data
6.2.2 Tuning of ANN model parameters
6.2.3 Classifier performance of neurological classes
6.3 Summary
7 Conclusions and Future Work
7.1 Concluding remarks
7.2 Future Work
7.3 Academic discussion
Bibliography
Biographical sketch
List of Figures
1-1 4D neonatal brain atlas.
1-2 Components of brain mapping.
1-3 Magnetic resonance intensity histogram of brain structures.
2-1 Kernel-based mapping.
2-2 KEIPV illustrative example.
2-3 Multi-atlas-based segmentation scheme.
2-4 Examples of MRI databases.
3-1 ISK representation for IXI database.
3-2 Similarity kernels for IXI database.
3-3 KPCA-based projection of IXI database from SSD and TKR.
3-4 Template selection performance.
4-1 α-order Rényi's entropy versus the number of iterations of the optimization procedure, for several α values and a given image in the dataset.
4-2 Average Dice similarity index versus the entropy order for the available image noise intensities.
5-1 Proposed multi-atlas patch-based label fusion scheme.
5-2 Parameter tuning for patch-based approaches by exhaustive search.
5-3 Resulting kernel matrices for a random subset of voxels before and after learning the projection matrix Wr. Voxels are sorted by tissue type.
5-4 β radius effect in a subject's volume.
6-1 General MRI classification pipeline.
6-2 ANN performance versus the number of hidden nodes.
6-3 Relevance indexes grouped by feature type.
6-4 Receiver-operating-characteristic curve (top) and confusion matrix (bottom) on the 30% test data for AEN (left), PCA (center), and CKA (right) initialization approaches at the best parameter set of the ANN classifier.
List of Tables
2-1 Demographic and clinical details of the selected ADNI cohort.
2-2 Summary of characteristics of the considered MRI databases.
3-1 Template selection performance for each tissue using the optimal number of atlases.
4-1 Dice index for each structure at optimal α = 0.5.
5-1 SATA segmentation performance.
6-1 FreeSurfer extracted features.
6-2 Best performing algorithms in the 2014 CADDementia challenge.
6-3 ADNI classification.
1 Preliminaries
This chapter introduces the research problem and the upcoming chapters. First, a brief medical motivation for the research problem is presented, together with the issues under consideration. Then, a review of the state of the art specifically related to magnetic resonance brain image analysis is given, highlighting some of its main limitations. Finally, the proposed hypothesis and thesis objectives are listed.
1.1 Motivation
Medical images have evolved over the years, providing essential information for clinical applications such as diagnosis, therapy planning and execution, and disease and treatment monitoring. The best-known imaging modalities include magnetic resonance imaging (MRI), ultrasound, X-ray computed tomography (CT), and positron emission tomography (PET). In particular, MRI is a commonly used medical imaging technique because it is non-invasive, avoids ionizing radiation, and offers good contrast resolution for different body tissues [Wang and Wang, 2008]. MRI can reveal precise anatomical details, and its flexibility allows enhancing tissue contrast by varying the image acquisition parameters: adjusting the radio-frequency (RF) and gradient pulses, as well as the sequence timings (such as repetition and echo times), highlights different components in the imaged object and produces high-contrast images [Liew and Yan, 2006].
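To make the role of the timing parameters concrete, the sketch below evaluates the textbook simplified spin-echo signal model, S = PD (1 - exp(-TR/T1)) exp(-TE/T2), for three brain tissues. The tissue parameters in `TISSUES` are rounded, assumed values chosen only for illustration (not measurements from this work), and the function names are hypothetical.

```python
import math


def spin_echo_signal(pd, t1, t2, tr, te):
    """Simplified spin-echo signal: S = PD * (1 - exp(-TR/T1)) * exp(-TE/T2).

    Times are in milliseconds; pd is a relative proton density.
    """
    return pd * (1.0 - math.exp(-tr / t1)) * math.exp(-te / t2)


# Assumed, rounded tissue parameters for illustration only (roughly 1.5 T).
TISSUES = {
    "WM": {"pd": 0.65, "t1": 600.0, "t2": 80.0},
    "GM": {"pd": 0.75, "t1": 950.0, "t2": 100.0},
    "CSF": {"pd": 1.00, "t1": 4000.0, "t2": 2000.0},
}


def simulate(tr, te):
    """Signal of each tissue for one (TR, TE) acquisition setting."""
    return {name: spin_echo_signal(p["pd"], p["t1"], p["t2"], tr, te)
            for name, p in TISSUES.items()}


t1w = simulate(tr=500.0, te=15.0)    # short TR, short TE: T1-weighted contrast
t2w = simulate(tr=4000.0, te=100.0)  # long TR, long TE: T2-weighted contrast
for name in TISSUES:
    print(f"{name:>3}: T1w={t1w[name]:.3f}  T2w={t2w[name]:.3f}")
```

With these assumed values, CSF is the darkest tissue under the short-TR/short-TE setting and the brightest under the long-TR/long-TE setting, which is the familiar inversion between T1- and T2-weighted contrast.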
Owing to the detailed anatomical images it provides, MRI has become essential for structural and functional studies of the human brain, one of the organs of greatest interest in modern medicine. Such images allow a precise analysis of cerebral structures, an essential point in interdisciplinary technologies such as computer-aided detection, diagnosis, and patient follow-up. In general, brain image analyses are classified into two categories [Cerrolaza et al., 2012]: i) structural analysis and ii) functional analysis.
Regarding structural analysis, brain morphometry is perhaps the most important application: it quantifies morphological brain features to learn how age, gender, disease, genetic composition, environmental exposures, and treatment, among other factors, affect brain structure [Davatzikos, 2004, Ashburner and Friston, 2000]. To this end, image analysis
methods (such as segmentation and registration) and machine learning techniques are applied to structural volumes (three-dimensional images) and sometimes to functional volumes (time-varying images). As an example application, the authors of [Kuklisova-Murgasova et al., 2011] built a 4D temporal probabilistic atlas of the neonatal brain, using regression approaches to dynamically generate prior tissue probability maps at any chosen stage of neonatal brain development. Figure 1-1 shows the resulting temporal atlas¹. Brain models of this kind have improved the survival rates of prematurely born infants, in particular of those developing neurological problems.
Figure 1-1: A 4D dynamic probabilistic atlas of neonatal brain structures at ages of 29, 32, 35, 38, 41 and 44 weeks gestational age shown in columns from left to right, axial view. Structure probability maps shown in rows from top to bottom: intensity template, color-coding the sum of probability maps, white matter, cortical gray matter, cerebrospinal fluid, subcortical gray matter. The cerebellum and brainstem are not shown as they are not present in this slice.
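One way to read "regression approaches to generate prior maps at any chosen stage" is kernel (Nadaraya-Watson) regression over age: each available probability map is weighted by a Gaussian of its distance in age to the query age, and the weighted maps are averaged. The sketch below assumes a fixed kernel width and toy two-voxel maps; it illustrates the idea and is not necessarily the exact scheme of [Kuklisova-Murgasova et al., 2011]. `kernel_regression_map` and its parameters are hypothetical names.

```python
import math


def kernel_regression_map(ages, maps, query_age, sigma=1.5):
    """Nadaraya-Watson estimate of a voxel-wise probability map at query_age.

    ages:  gestational ages (weeks) of the available probability maps.
    maps:  one list of per-voxel probabilities per age (all the same length).
    sigma: Gaussian kernel width in weeks (a hypothetical fixed value).
    """
    weights = [math.exp(-0.5 * ((a - query_age) / sigma) ** 2) for a in ages]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("query_age is too far from every available age")
    # Each voxel gets the kernel-weighted average of its probabilities across ages.
    return [sum(w * m[v] for w, m in zip(weights, maps)) / total
            for v in range(len(maps[0]))]


# Toy white-matter probabilities for two voxels at three ages (made-up values).
ages = [29.0, 32.0, 35.0]
wm_maps = [[0.2, 0.8], [0.4, 0.6], [0.6, 0.4]]
print(kernel_regression_map(ages, wm_maps, query_age=33.5))
```

Querying an age between two acquired stages yields a smooth blend of the neighboring maps, which is what makes an atlas available at any chosen stage rather than only at the sampled ages.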
Particularly for neuropathologies, several medical studies have shown that neurodegeneration begins in the medial temporal lobe, successively affecting the entorhinal cortex, the hippocampus, and the limbic system, and then extends towards neocortical areas [Wolz et al., 2011, Magnin et al., 2009]. For example, early and significant hippocampal atrophy in people with memory complaints usually points to a diagnosis of Alzheimer's disease (AD) [Lotjonen et al., 2010]. Therefore, along with clinical history, neuropsychological tests, and laboratory assessment, the clinical diagnosis of AD also includes imaging techniques such as PET and MRI. Nonetheless, issues related to image quality and radiologist experience call for the automatic assessment of quantitative biomarkers to improve dementia diagnosis [Dubois et al., 2014, Jack et al., 2013, McKhann et al., 2011]. As a result, the estimation of morphological changes from structural data may support the computer-aided diagnosis of neurological diseases. In these cases, pattern recognition and multivariate data analysis methods have been used to build discriminative models [Bron et al., 2015, Ramírez et al., 2013]. These methods benefit from the large amounts of neuroimaging data made available over recent years to learn differences that clinicians may miss during qualitative visual inspection. Consequently, they achieve an earlier and more objective diagnosis than clinical criteria alone [Kloppel et al., 2012a].
¹ Figure and caption taken from the original paper by Kuklisova-Murgasova et al.
In the regional context, researchers at the Clínica de la Memoria (Universidad Autónoma de Manizales), in partnership with the Signal Processing and Recognition research group (Universidad Nacional de Colombia), have been studying potential biomarkers for attention deficit hyperactivity disorder (ADHD) in children. Building on previous findings, they are developing the research project Assisted evaluation of evoked related potentials as a biomarker of the ADHD, whose main goal is to improve ADHD diagnosis through a multimodal scheme combining clinical tests, brain activity recordings, and structural MRI.
On the other hand, the most widespread functional analysis task is brain mapping, which assesses neuronal activity from the synchronous, compact electrical sources it generates within the brain. In the standard setup, the propagation of such activity is measured with excellent temporal resolution by a set of sensors located on or near the subject's scalp [de Munck et al., 1988]. Electroencephalography (EEG) sensors measure voltage potentials using standardized electrode placement systems (see Figure 1-2(a)). In contrast, magnetoencephalography (MEG) records, through very sensitive magnetometers, the magnetic fields produced by electrical currents occurring in the brain. Next, inverse modeling estimates the unknown sources over the brain cortex that best fit the obtained M/EEG recordings, as illustrated in Figure 1-2(b).
Nonetheless, M/EEG recordings depend not only on the location of the activated neurons (source model) but also on the spatial arrangement of the electrodes/sensors (channel positions) and on the geometrical and electromagnetic properties of the head. A leadfield matrix gathers all these properties and describes the current flow from each source position, through the head geometry, to a given sensor, so that sources and sensors are linearly related [Montes-Restrepo et al., 2014]. In this case, the head geometry is computed from the patient's medical images (MRI/CT) by segmenting them into components with different conductivity properties, e.g., skin, bone, cerebrospinal fluid, and brain tissue [Darvas et al., 2004]. Hence, accurate segmentation of head structures from imaging studies allows constructing patient-specific conductivity matrices and improves the accuracy of source activity modeling [Lanfer et al., 2012, Grech et al., 2008].
[Figure 1-2. (a) International 10-20 electrode layout: Fp1, Fp2, F7, F3, Fz, F4, F8, A1, T3, C3, Cz, C4, T4, A2, T5, P3, Pz, P4, T6, O1, O2, with nasion and inion landmarks. (b) Brain-mapping pipeline blocks: MRI, CT, channel positions, MEG, EEG, head model, leadfield, processing of functional data, inverse solution, statistics/visualization.]
Figure 1-2: Left: Electrode locations of the International 10-20 system for electroencephalography recording. Right: Major steps of source localization for brain mapping.
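Because the leadfield makes sensors and sources linearly related, the forward model is a matrix-vector product m = L s, and with fewer sensors than candidate sources the inverse problem is underdetermined. The sketch below uses a regularized minimum-norm estimate, s = L^T (L L^T + λI)^(-1) m, as one standard inverse solution; it is an illustrative choice, not necessarily the one used in the projects mentioned here, and the 2-sensor, 3-source leadfield and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def forward(leadfield, sources):
    """Linear forward model: sensor readings m = L s."""
    return [sum(l * s for l, s in zip(row, sources)) for row in leadfield]


def minimum_norm(leadfield, measurements, lam=1e-9):
    """Regularized minimum-norm estimate s = L^T (L L^T + lam I)^-1 m."""
    n_sens = len(leadfield)
    gram = [[sum(a * b for a, b in zip(leadfield[i], leadfield[j]))
             + (lam if i == j else 0.0) for j in range(n_sens)] for i in range(n_sens)]
    y = solve(gram, measurements)
    n_src = len(leadfield[0])
    return [sum(leadfield[i][k] * y[i] for i in range(n_sens)) for k in range(n_src)]


# Toy leadfield: 2 sensors, 3 candidate sources (hypothetical numbers).
L = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
true_sources = [1.0, 2.0, 0.0]
m = forward(L, true_sources)
s_hat = minimum_norm(L, m)
print(m, [round(v, 6) for v in s_hat])
```

Here the estimate reproduces the measurements exactly but differs from the true sources, because the minimum-norm criterion prefers the lowest-energy source configuration consistent with the data; this is why head-model accuracy and priors matter so much in source imaging.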
In this regard, the research groups Signal Processing and Recognition (Universidad Nacional de Colombia) and Control e Instrumentación (Universidad Tecnológica de Pereira), together with the Instituto de Epilepsia y Parkinson - Neurocentro, proposed the research project entitled Development of an automatic system for brain mapping and intraoperative monitoring: Application to neurosurgery. The research is conducted on subjects diagnosed with epilepsy and Parkinson's disease and attempts to integrate brain mapping and surgery monitoring into a single framework to improve clinical procedures, enhance the treatment of neurological diseases, and prevent surgical complications.
1.2 Problem statement
As stated above, structural and functional brain imaging are playing an expanding role in neuroscience and experimental medicine. Nevertheless, the amount of data produced increasingly exceeds the visual analysis capacity of expert clinicians. For instance, building reliable atlases and leadfields requires manually delineating several structures in hundreds of volumes, a task that depends on factors related to image quality and radiologist experience [Dubois et al., 2014]. As a consequence, there is a growing need for automated image analysis: information extraction requires accurate segmentation, while robust classification schemes may support the computer-aided diagnosis of neuropathologies [Sabuncu and Konukoglu, 2015, Heckemann et al., 2006]. However, such procedures are not straightforward to perform due to issues related to imaging artifacts or structure properties.
Perhaps the main artifact is a slowly varying spatial bias that multiplies the measured intensities of several imaging modalities, known as intensity non-uniformity, intensity inhomogeneity, shading, or bias field. Some causes of intensity inhomogeneity in MRI are radio-frequency pulse attenuation, non-uniformity of the magnetic field, and the magnetic susceptibility of the tissues [Vovk et al., 2007, Brinkmann et al., 1998]. Although such shading is hardly visible and is not a serious drawback for qualitative clinical diagnosis, the intensity variation significantly hampers precise measurement in automated processing tasks such as segmentation, registration, and classification [Demirhan and Guler, 2011, Balafar et al., 2010].
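Since the bias field multiplies the true intensities, taking logarithms turns it into an additive, slowly varying term that a low-pass filter can estimate; this is the idea behind homomorphic filtering. The 1D sketch below assumes a single homogeneous tissue and a synthetic exponential ramp as the bias; the function names and the window half-width are hypothetical.

```python
import math
import statistics


def moving_average(values, half_width):
    """Centered moving average; the window is truncated at the borders."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def homomorphic_correct(observed, half_width=20):
    """Estimate and remove a slowly varying multiplicative bias (1D sketch).

    observed = true * bias  =>  log(observed) = log(true) + log(bias).
    The low-pass part of log(observed) is taken as the log-bias estimate.
    """
    logs = [math.log(v) for v in observed]
    low = moving_average(logs, half_width)
    offset = statistics.fmean(low)  # restores the overall intensity level
    return [math.exp(lg - b + offset) for lg, b in zip(logs, low)]


# One homogeneous tissue (true intensity 100) under a smooth synthetic bias.
n = 200
bias = [math.exp(-0.2 + 0.4 * i / (n - 1)) for i in range(n)]
observed = [100.0 * b for b in bias]
corrected = homomorphic_correct(observed)
print(statistics.pstdev(observed), statistics.pstdev(corrected))
```

On real images, the low-pass estimate also absorbs genuine tissue contrast near boundaries, which is one motivation for coupling bias correction with segmentation rather than running it as an independent step.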
In particular, segmentation is difficult to perform based solely on MR intensity because anatomically distinct structures do not necessarily differ in their signal properties. For instance, MR relaxometry has shown that different regions of white matter have significantly different T1 relaxation times [Cho et al., 1997]. The frontal cortex also has an average T1 that is 20% longer than that of the motor and somatosensory cortices [Steen et al., 2000]. Moreover, the MR intensity histograms of manually labeled brain structures reported in [Ledig et al., 2012] show that the boundaries between white matter and subcortical gray matter are generally less clear than those between white matter and cortical gray matter. The original histograms are shown in Figure 1-3. Besides, limited image resolution yields voxels composed of more than one tissue type, known as the partial volume effect [Ahmed et al., 2011, Heckemann et al., 2006, Liew and Yan, 2006].
Figure 1-3: Gaussian intensity distribution of manually segmented classes. CSF (black), WM (red), deep GM structures (green), cortical GM, brainstem and cerebellum GM fractions (blue).
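A minimal way to see why partial-volume voxels are hard to label from intensity alone is a linear mixing model, in which a voxel's intensity is the fraction-weighted average of pure-tissue means, scored against Gaussian class models like those in Figure 1-3. The means, the shared standard deviation, and the function names below are illustrative assumptions.

```python
import math

# Hypothetical pure-tissue mean intensities and a shared standard deviation.
MEANS = {"CSF": 30.0, "GM": 80.0, "WM": 120.0}
SIGMA = 10.0


def partial_volume_intensity(fractions):
    """Linear mixing model: intensity = sum over tissues of fraction * tissue mean."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("tissue fractions must sum to one")
    return sum(f * MEANS[t] for t, f in fractions.items())


def class_posteriors(x):
    """Posterior of each pure class under equal priors and Gaussian likelihoods."""
    lik = {t: math.exp(-0.5 * ((x - m) / SIGMA) ** 2) for t, m in MEANS.items()}
    total = sum(lik.values())
    return {t: v / total for t, v in lik.items()}


x = partial_volume_intensity({"GM": 0.5, "WM": 0.5})
post = class_posteriors(x)
print(x, {t: round(p, 3) for t, p in post.items()})
```

A voxel that is half gray matter and half white matter lands midway between the two class means, so both pure-tissue classes receive the same posterior and an intensity-only classifier has no basis for choosing either.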
Regarding computer-aided diagnosis, using the millions of voxels of an MRI directly as the input of conventional classification machines is still unrealistic: the number of features is much larger than the number of subjects, and the resulting number of parameters leads to model overfitting [Ota et al., 2015]. In general, dimensionality reduction through feature extraction and feature selection approaches improves the generalization of pattern recognition systems. In this regard, brain parcellation from predefined anatomical templates provides a simple yet versatile and interpretable set of features, such as volume, thickness, and shape [Cuingnet et al., 2011, Zhang et al., 2011]. However, the diagnosis outcome still depends on the quality of the resulting parcellation and on the template definition, which sometimes differs from atlas to atlas even for the same structure [Ota et al., 2014, Bohland et al., 2009].
In addition, the very nature of the disease poses a challenge. For instance, dementia diagnosis from imaging studies must also identify mild cognitive impairment (MCI), a heterogeneous, intermediate category between the healthy and Alzheimer's diagnostic groups, from which subjects may convert to AD or return to normal cognition [Kloppel et al., 2015].
1.3 State-of-the-art
In the medical image processing field, several approaches deal with the bias field at the image segmentation stage. In [Wang and Wang, 2008], a fuzzy C-means (FCM) algorithm for MRI brain image segmentation incorporates both local and non-local neighborhood information into the clustering process to increase noise robustness. Another FCM approach is introduced by [Sikka et al., 2009], where an entropy-driven homomorphic filter is used for inhomogeneity correction. A histogram-based local peak merger with an adaptive window initializes the cluster centers, and a neighborhood-based membership ambiguity correction smooths the boundaries between tissue clusters. [Li et al., 2008] segments brain MRI and corrects the bias field using a spatially constrained kernel clustering algorithm, whose kernel implicitly maps image data to a higher-dimensional space to enhance separability. The clustering and the bias field correction are alternated so that each benefits the other, accelerating overall convergence.
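For reference, the sketch below implements the standard fuzzy C-means iterations on scalar intensities: memberships u_ik = 1 / Σ_j (d_ik / d_ij)^(2/(m-1)), with d_ik the distance from sample i to center k, and centers c_k = Σ_i u_ik^m x_i / Σ_i u_ik^m. It deliberately omits the neighborhood terms of [Wang and Wang, 2008] and the bias modeling of the other cited works; the synthetic data, the evenly spread initialization, and the parameter values are assumptions.

```python
import random


def fuzzy_c_means(x, n_clusters, m=2.0, n_iter=100, eps=1e-12):
    """Standard fuzzy C-means on scalar intensities (no spatial or bias terms).

    Returns (centers, memberships), where memberships[i][k] is u_ik computed
    from the centers of the last iteration.
    """
    lo, hi = min(x), max(x)
    # Deterministic initialization: centers evenly spread over the intensity range.
    centers = [lo + (k + 0.5) * (hi - lo) / n_clusters for k in range(n_clusters)]
    expo = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)).
        u = []
        for xi in x:
            d = [abs(xi - c) + eps for c in centers]
            u.append([1.0 / sum((d[k] / dj) ** expo for dj in d)
                      for k in range(n_clusters)])
        # Center update: c_k = sum_i u_ik^m x_i / sum_i u_ik^m.
        centers = [sum(ui[k] ** m * xi for ui, xi in zip(u, x))
                   / sum(ui[k] ** m for ui in u) for k in range(n_clusters)]
    return centers, u


# Synthetic intensities for three tissue-like classes (made-up means and spread).
rng = random.Random(1)
data = [rng.gauss(mu, 5.0) for mu in (30.0, 80.0, 120.0) for _ in range(100)]
centers, memberships = fuzzy_c_means(data, n_clusters=3)
print(sorted(round(c, 1) for c in centers))
```

Each sample keeps a graded membership in every cluster rather than a hard label, which is what lets FCM variants represent ambiguous boundary voxels and add spatial or bias-field terms to the objective.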
Active contour models (ACM) have also been used to deal with low-frequency artifacts at the segmentation stage. Specifically, geometric active contours are represented implicitly as level sets of a function defined in a higher dimension, which evolves according to a partial differential equation (PDE). Usually, the evolution equation results from minimizing an energy functional through variational calculus. For instance, a sum of edge and region energies can be minimized in the variational scheme [Paragios and Deriche, 2002, Sum and Cheung, 2008]. [Brox and Cremers, 2008] approximated the piecewise-smooth Mumford-Shah functional of the variational framework using local means. Other localized statistics have been computed from convolutions, achieving results similar to piecewise-smooth segmentation much more efficiently [Lankton et al., 2007, Lankton and Tannenbaum, 2008].
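For reference, the piecewise-smooth Mumford-Shah functional is commonly written as below, for an image I on a domain Ω, a smooth approximation u, and a contour set C of length |C|. Localized region models replace the smooth approximation inside each region i by a kernel-weighted local mean, where χ_i is the indicator of region i and K_σ a Gaussian kernel. The weights μ and ν and this notation are generic, not taken from this thesis.

```latex
E_{\mathrm{MS}}(u, C) = \int_{\Omega} \bigl(u(\mathbf{x}) - I(\mathbf{x})\bigr)^{2}\, d\mathbf{x}
  + \mu \int_{\Omega \setminus C} \lVert \nabla u(\mathbf{x}) \rVert^{2}\, d\mathbf{x}
  + \nu\, |C|

\hat{u}_i(\mathbf{x}) = \frac{\int_{\Omega} K_{\sigma}(\mathbf{x} - \mathbf{y})\, \chi_i(\mathbf{y})\, I(\mathbf{y})\, d\mathbf{y}}{\int_{\Omega} K_{\sigma}(\mathbf{x} - \mathbf{y})\, \chi_i(\mathbf{y})\, d\mathbf{y}}
```

Because both integrals in the local mean are convolutions with K_σ, they can be computed with fast filtering, which is the source of the efficiency gains mentioned above.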
In this way, the variational scheme can model objects with heterogeneous statistics. The same goal is reached by [Li et al., 2007], where a local binary fitting (LBF) energy extracts local features that are robust to intensity inhomogeneity. Nevertheless, the number of operations required to compute these energies implies a higher computational cost. Furthermore, algorithms that accurately solve PDEs need a large number of iterations to converge [Wang et al., 2010].
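The LBF idea can be sketched through the local fitting values used in that model, f1 = K_σ*(H(φ) I) / K_σ*H(φ) and f2 = K_σ*((1 - H(φ)) I) / K_σ*(1 - H(φ)), which are Gaussian-weighted local means of the inside and outside regions computed by convolution. The 1D sketch below assumes a fixed binary region in place of the evolving level set and omits the energy minimization itself; the function names and parameters are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at a radius of 3*sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def convolve_same(signal, kernel):
    """Zero-padded 1D convolution with a symmetric kernel; output has the input length."""
    r = len(kernel) // 2
    n = len(signal)
    return [sum(kv * signal[i + j - r] for j, kv in enumerate(kernel) if 0 <= i + j - r < n)
            for i in range(n)]


def lbf_fitting_values(image, inside, sigma=2.0, eps=1e-12):
    """Local fitting values f1 (inside) and f2 (outside) of an LBF-style model, in 1D.

    `inside` holds 0/1 values standing in for the Heaviside H(phi) of the level set.
    """
    kernel = gaussian_kernel(sigma)

    def local_mean(mask):
        num = convolve_same([h * v for h, v in zip(mask, image)], kernel)
        den = convolve_same(mask, kernel)
        return [a / (b + eps) for a, b in zip(num, den)]

    outside = [1.0 - h for h in inside]
    return local_mean(inside), local_mean(outside)


# Two regions whose intensities drift smoothly (synthetic inhomogeneity).
image = [10.0 + 0.2 * i for i in range(20)] + [50.0 + 0.2 * i for i in range(20, 40)]
inside = [1.0] * 20 + [0.0] * 20
f1, f2 = lbf_fitting_values(image, inside)
print(round(f1[19], 2), round(f2[20], 2))
```

Near the boundary, f1 follows the local intensity of the drifting inside region (about 13.5) rather than its global mean (11.9), which is what makes such local fitting terms tolerant to intensity inhomogeneity.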
1.3.1 Image Representation Enhancement
A practical way to enhance the image properties and ease discrimination, especially when a single modality is available, is to include a voxel-wise feature extraction stage. For instance, in [Iftekharuddin et al., 2009], two kinds of texture features are extracted for brain tumor segmentation: the first, fractal-based, are estimated with the authors' Piecewise-Triangular-Prism-Surface-Area algorithm, and the second result from combining fractal and wavelet analyses. These features were extracted from T1, T2, and FLAIR MRI modalities and fused by means of Self-Organizing Maps. [Tu and Bai, 2010] also adopted texture features and combined them with Haar features in a probabilistic boosting tree, further enhanced with an auto-context model that iteratively refines the labeling results. The efficiency of different feature types from multimodal MRI was later studied by [Ahmed et al., 2011]: fractal dimension, multifractional Brownian motion (mBm), level-set-based shape, and normalized image intensity composed the set of evaluated features. The evaluation includes a feature selection stage using either PCA, boosting, entropy metrics, or a proposed Kullback-Leibler-divergence-based ranking. Results showed that mBm performed best for T1 and FLAIR modalities, while the normalized intensity was more appropriate for segmenting T2 images.
Wavelet-like transforms have been widely studied for image representation in computer vision and in medical image processing. In [Demirhan and Guler, 2011], the wavelet decomposition, along with its statistical information, feeds a self-organizing map and a supervised learning vector quantizer to segment gray and white matter from T1 MRI. [Alzubi et al., 2011] introduced a multi-resolution analysis using wavelets, ridgelets, and curvelets for Region-Of-Interest segmentation on medical images, particularly of cancer tissue. These transformations exhibited suitable edge reconstruction because they add a directional component to the traditional wavelet transform. [De et al., 2011] maximized the image entropy using Particle Swarm Optimization enhanced with a wavelet-based mutation operation. Reported results showed successful extraction of large, high-contrast lesions. [Cerrolaza et al., 2012] decomposed a statistical active shape model using the wavelet transform to model object relationships at different levels of detail. The approach properly models local details at the cost of an increased feature-space dimension.
More recently, non-local weighted label fusion approaches have promoted the development of several groups of voxel-wise features. In general, such approaches relax the one-to-one registration correspondence by allowing neighboring labeled elements to contribute to labeling a target location. This contribution is usually given by a similarity function and is assessed using voxel-wise features. In the most straightforward case, a spatially varying decision fusion uses the intensity difference between voxel pairs after deformable mapping as an indicator of the local registration performance [Isgum et al., 2009]. More robust versions use the patch intensities around a voxel as the feature space to estimate nonlocal means [Coupe et al., 2011, Rousseau et al., 2011]. Further, the voting weights can be computed as the coefficients of a linear regression of the target patch onto the surrounding atlas patches under some constraint (e.g., sparsity) [Zhang et al., 2012]. This idea was later enhanced by including a discriminative dictionary learning stage [Tong et al., 2013], generative modeling [Wu et al., 2014], and a local element-wise atlas selection [Tong et al., 2015]. Most recently, [Bai et al., 2015] computed a set of patch-based weights from contextual and gradient features. [Ma et al., 2014] combined the outputs of linear filters (first- and second-order differences, 3D hyperplane, Sobel, and Laplacian filters) with Haar-like features in a selection stage based on forest learning.
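The following minimal NumPy sketch makes filter-based voxel features concrete. It is only an illustration, not the filter bank or the forest-based selection of [Ma et al., 2014]. It stacks first- and second-order finite differences and a discrete Laplacian into one feature vector per voxel. Sobel, hyperplane, and Haar-like responses are omitted, and the helper name `linear_filter_features` is hypothetical.

```python
import numpy as np


def linear_filter_features(volume):
    """Stack simple linear-filter responses as voxel-wise features.

    Hypothetical helper: for a D-dimensional volume it returns an array of
    shape volume.shape + (2 * D + 1,) holding, in order, the first-order
    differences along each axis, the second-order differences along each
    axis, and their sum (a discrete Laplacian).
    """
    vol = np.asarray(volume, dtype=float)
    first = np.gradient(vol)  # central differences (one-sided at borders)
    if vol.ndim == 1:  # np.gradient returns a bare array for 1-D input
        first = [first]
    first = list(first)
    second = [np.gradient(g, axis=a) for a, g in enumerate(first)]
    laplacian = np.sum(second, axis=0)
    return np.stack(first + second + [laplacian], axis=-1)


# Usage: a volume that grows quadratically along the first axis.
x = np.arange(8, dtype=float)
vol = (x[:, None, None] ** 2) * np.ones((8, 6, 7))
feats = linear_filter_features(vol)
print(feats.shape)  # (8, 6, 7, 7)
```

Away from the borders, the second-order difference along the first axis of this toy volume equals 2, and so does the Laplacian. The one-sided differences used by `np.gradient` at the edges slightly bias border voxels.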
1.3.2 Image Classification
As stated above, classifying volumes from raw intensities is complex in several ways, which makes feature extraction an essential stage. According to the literature review, two kinds of standard MRI features are widely accepted for diagnosis from structural brain scans. Firstly, structure-wise morphometry comprises thickness, area, and volume measurements for anatomically defined regions of interest. Such regions correspond either to the white and gray matter structures or to the cortical and subcortical structures. FreeSurfer is the best-known toolkit for extracting such features in a fully automatic way [Jung et al., 2015, Ota et al., 2014, Westman et al., 2013, Cuingnet et al., 2011]. Secondly, voxel-based morphometry provides voxel-level statistics. The posterior tissue probability maps of gray and white matter, extracted with Statistical Parametric Mapping (SPM), are the most commonly considered features [Ramírez et al., 2016, Khedher et al., 2015, Moradi et al., 2015, Ota et al., 2015, Chyzhyk et al., 2014, Falahati et al., 2014, Cuingnet et al., 2011].
Such features have been combined with different multivariate pattern recognition (MVPR) tools for neuroimage data classification. Reported classifiers range from conventional approaches to combinations of classifiers [Farhan et al., 2014]. The conventional approaches include k-Nearest Neighbors [Papakostas et al., 2015], Linear Discriminant Analysis [Sørensen et al., 2013], Support Vector Machines [Kloppel et al., 2012b], Random Forests [Moradi et al., 2015], and Regressions [Eskildsen et al., 2015]. [Sabuncu and Konukoglu, 2015] analyzed three representative MVPR tools for schizophrenia, dementia, and attention deficit hyperactivity disorder. The authors concluded that MVPR tools offer more accurate predictions than univariate markers, while the choice of feature set and machine-learning algorithm has a significant impact on prediction performance.

In the particular case of dementia, most of the above approaches were evaluated at the 2014 CADDementia challenge [Bron et al., 2015]. The challenge required reproducing the clinical diagnosis of 354 subjects from their T1-weighted MRI scans in a three-class problem: patients diagnosed with Alzheimer's disease, subjects with mild cognitive impairment (MCI), and healthy controls (NC). The best-performing algorithm yielded an accuracy of 63.0% and an area under the receiver-operating-characteristic curve of 78.8%. However, its true positive rates were 96.9% for NC but only 28.7% for MCI. Such results reveal a bias towards specific classes when a heterogeneous class such as MCI is present.
Another kind of machine learning tool, the Artificial Neural Network (ANN), has proven well suited to several computer-aided diagnosis tasks. ANNs offer the following advantages [Amato et al., 2013, Chyzhyk et al., 2014]:
(i) the ability to process large amounts of data,
(ii) a reduced likelihood of overlooking relevant information, and
(iii) a reduction of diagnosis time.

Nonetheless, initializing the network (termed pre-training) is an essential step of ANN implementation. The most naive strategy uses randomly initialized parameters, but it performs poorly in practice [Vincent et al., 2010]. To improve on a random initial guess, a local unsupervised criterion can be used to pre-train each layer stepwise. Each layer then produces a useful higher-level description of the lower-level representation output by the previous layer. Examples of such unsupervised representation learning include Restricted Boltzmann Machines [Hinton et al., 2006], autoencoders [Bengio et al., 2007], and sparse autoencoders [Ranzato et al., 2007]. The most common approach is greedy layer-wise training, which learns one layer of a deep architecture at a time [Bengio, 2012]. Although unsupervised pre-training generates hidden representations that are more useful than the input space, many of the resulting features may be irrelevant for the discrimination task [Weston et al., 2012, Mohamed et al., 2011].
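Greedy layer-wise pre-training can be illustrated with stacked autoencoders. The minimal NumPy sketch below makes several assumptions for illustration only. Each layer is a tied-weight autoencoder with a sigmoid encoder, a linear decoder, and a squared reconstruction error, trained by full-batch gradient descent. It is not tied to any of the cited implementations, and the function names are hypothetical. Each layer is trained on the frozen output of the previous one. The pre-trained weights would then initialize a supervised fine-tuning stage, which is not shown.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_tied_autoencoder(X, n_hidden, lr=0.05, epochs=300, seed=0):
    """Train one tied-weight autoencoder layer with full-batch gradient descent.

    Encoder H = sigmoid(X W + b); linear decoder X_hat = H W^T + c.
    Loss = 0.5 * mean over samples of ||X_hat - X||^2.
    Returns W, b, and the loss history (first entry = loss before training).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, n_hidden))
    b = np.zeros(n_hidden)
    c = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W + b)
        R = H @ W.T + c - X                  # reconstruction residual
        losses.append(0.5 * np.sum(R ** 2) / n)
        E = R / n                            # d loss / d X_hat
        dH = E @ W                           # back-propagate through the decoder
        dZ = dH * H * (1.0 - H)              # and through the sigmoid
        grad_W = X.T @ dZ + E.T @ H          # encoder + decoder terms (tied W)
        W -= lr * grad_W
        b -= lr * dZ.sum(axis=0)
        c -= lr * E.sum(axis=0)
    H = sigmoid(X @ W + b)
    losses.append(0.5 * np.sum((H @ W.T + c - X) ** 2) / n)
    return W, b, losses


def greedy_layerwise_pretrain(X, layer_sizes, **kwargs):
    """Pre-train a stack of layers one at a time (greedy layer-wise scheme)."""
    params, rep = [], X
    for k in layer_sizes:
        W, b, losses = train_tied_autoencoder(rep, k, **kwargs)
        params.append((W, b, losses))
        rep = sigmoid(rep @ W + b)  # frozen layer output feeds the next layer
    return params


X = np.random.default_rng(1).normal(size=(200, 8))
params = greedy_layerwise_pretrain(X, [6, 3])
```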
1.4 Objectives
1.4.1 General Objective
To develop a segmentation and classification framework using kernel tools that supports partition, clustering, and classification of magnetic resonance images of the human brain. The framework must include a priori tissue distributions to enhance the feature extraction of atlas-voting strategies. In addition, demographic data must be included in the learning stages. These data should drive a subject-dependent selection of templates and the extraction of discriminative biomarkers for the clinical prediction of diseases from brain structural information.
1.4.2 Specific objectives
• To develop an unsupervised kernel-based image representation for 3D volumes for clus-
tering anatomically similar subjects. The proposed representation must highlight the
inherent image distributions while reducing the feature space dimension. Additionally,
the image metric induced in the new space must support template selection for an
atlas-based image segmentation approach.
• To propose a learning methodology for voxel representation using kernel-based cost functions that enhance intrinsic tissue features. The scheme must include the local intensity distribution and supervised tissue information. The resulting features must improve the segmentation performance of atlas voting strategies and provide robustness to image artifacts.
• To build a supervised scheme for biomarker extraction from structural brain MRI scans supporting clinical prediction tasks. The biomarkers extracted from prior subject information must enhance MRI discrimination and provide data interpretability.
1.5 Contributions
Taking into account the results of the proposed models, we highlight the following contribu-
tions of this thesis:
• A new kernel-based representation of 3D volumes is introduced from the inherent Inter-Slice Kernel (ISK) relationship, aiming to improve MRI discrimination and highlight brain structure distributions. Specifically, we propose three different types of ISK-based feature representations to estimate pairwise MRI similarities using generalized Euclidean metrics. We tune all the metric parameters by means of a centered alignment approach, so that the obtained kernels best match the prior demographic information. The proposed approach is tested on MRI data discrimination using patient demographic categories (namely, patient age and gender). As a result, our discriminative representation proves useful for MRI clustering tasks while properly supporting atlas selection approaches.
• An unsupervised cost function, termed kernel function estimation based on information potential variability maximization (KEIPV), is proposed for choosing kernel parameters. The model assumes that a Reproducing Kernel Hilbert Space maximizing the overall information potential variability also highlights the intrinsic data distribution. Therefore, we start from a Parzen-based probability density function estimator and develop the updating rule for the bandwidth of a Gaussian kernel using a finite dataset.
• Renyi's α entropy is proposed as the cost function of the unified framework for atlas-based segmentation of brain structures on MRI. This sort of function leads to more discriminative tissue distributions than the standard maximum likelihood, since the latter relies on the assumption that tissue properties do not overlap significantly, which is far from realistic.
• A voxel-wise feature extraction methodology using linear projections is de-
veloped in a supervised scheme for supporting a patch-based non-local seg-
mentation approach. In this regard, we generalize convolution-based representations
(like gradients, Laplacians, and non-local means) and spatially adapt the feature rep-
resentation, relying on the fact that structure distributions may vary along the image
domain. Linear projections are calculated by maximizing the affinity between label
and feature distributions.
• An entropy operator for matrices is introduced as the cost function for optimizing a non-linear image representation. Such a representation, based on multi-layer perceptrons, supports MRI classification tasks while reducing the dimension of the original image domain.
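The first contribution above tunes metric parameters by centered kernel alignment. The hedged illustration below computes the empirical centered alignment between a data kernel K and a target kernel L:

ρ(K, L) = ⟨HKH, HLH⟩_F / (‖HKH‖_F ‖HLH‖_F), with H = I − 11ᵀ/N.

It then picks a Gaussian bandwidth by grid search. Several choices are assumptions rather than the thesis's actual setup: a plain Euclidean Gaussian kernel instead of generalized Euclidean metrics, a categorical target kernel, the grid, and the toy data.

```python
import numpy as np


def centered_alignment(K, L):
    """Empirical centered kernel alignment between two N x N Gram matrices."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    Kc, Lc = H @ K @ H, H @ L @ H
    return float(np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc)))


def gaussian_gram(X, sigma):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def select_bandwidth(X, labels, sigmas):
    """Pick the Gaussian bandwidth whose kernel best aligns with the labels."""
    labels = np.asarray(labels)
    L = (labels[:, None] == labels[None, :]).astype(float)  # target kernel
    scores = np.array([centered_alignment(gaussian_gram(X, s), L) for s in sigmas])
    return sigmas[int(np.argmax(scores))], scores


# Two well-separated toy groups standing in for a demographic category.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(10.0, 0.1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
best_sigma, scores = select_bandwidth(X, y, np.logspace(-2, 2, 9))
```

Centering removes constant offsets, so the score reflects how well the kernel's structure, rather than its mean level, matches the target.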
2 Background
This chapter overviews the technical background of the concepts considered in the development of this thesis. Firstly, we provide the mathematical background on the use of kernel functions in machine learning applications, and we introduce our proposed criterion to estimate Gaussian kernel functions from given samples. Secondly, we formally define the image segmentation task from the pattern recognition point of view. Finally, we describe the image databases considered for brain structure segmentation and classification. The contents of this chapter are based on the works of Ashburner and Friston [Ashburner and Friston, 2000], Aljabar [Aljabar et al., 2009], Principe [Principe, 2010], Coupe [Coupe et al., 2011], and Alvarez [Alvarez-Meza et al., 2014].
2.1 Reproducing kernel Hilbert spaces in machine learning
The study of positive definite kernels is widely acknowledged as a topic of interest for the machine learning community, since it generalizes the large body of theory developed for linear models. In this setting, a positive definite kernel $\kappa$ is an implicit way to represent the samples of the input space $\mathcal{X}$. There is a correspondence between $\kappa$ and a Reproducing Kernel Hilbert Space (RKHS) of functions $\mathcal{H}$. Therefore, the kernel can be understood as an indirect way to compute inner products between elements of a Hilbert space that result from mapping the elements of $\mathcal{X}$ to $\mathcal{H}$. That is, there is a mapping function $\phi:\mathcal{X}\to\mathcal{H}$ such that:
$$\kappa(x,x')=\langle\phi(x),\phi(x')\rangle_{\mathcal{H}}. \tag{2-1}$$
Regarding this, the space $\mathcal{H}$ can be viewed as a feature space, and $\phi$ is called the feature map. Consequently, linear operations in $\mathcal{H}$ amount to nonlinear manipulations in the input space $\mathcal{X}$ (see Figure 2-1). As a rule, the dimension of $\mathcal{H}$ is much larger than that of $\mathcal{X}$, and it may even be infinite, as for the Gaussian kernel. In practice, there is no need to perform any explicit computations in $\mathcal{H}$.
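Eq. (2-1) can be checked numerically with a kernel whose feature map is small enough to write down. The sketch below is an illustration chosen here, not a kernel used in this thesis. It takes the homogeneous polynomial kernel of degree 2 on ℝ², κ(x, x′) = (x·x′)², whose explicit feature map is ϕ(x) = (x₁², √2·x₁x₂, x₂²). For the Gaussian kernel no finite ϕ exists, which is exactly why the implicit evaluation of κ matters.

```python
import numpy as np


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: (x . y)^2."""
    return float(np.dot(x, y)) ** 2


def poly2_feature_map(x):
    """Explicit feature map on R^2 satisfying <phi(x), phi(y)> = (x . y)^2."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


# The kernel value equals the inner product of the mapped samples.
rng = np.random.default_rng(0)
for _ in range(5):
    a, b = rng.normal(size=2), rng.normal(size=2)
    implicit = poly2_kernel(a, b)
    explicit = float(np.dot(poly2_feature_map(a), poly2_feature_map(b)))
    assert np.isclose(implicit, explicit)
```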
An important property associated with the use of positive definite kernels in machine learning is the so-called representer theorem [Kimeldorf and Wahba, 1971, Scholkopf and Smola, 2002]:
Figure 2-1: Kernel-based mapping (left: input space with samples x and o; right: their images ϕ(x) and ϕ(o) in the RKHS).
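Theorem 2.1.1 Let $\Theta:[0,+\infty)\to\mathbb{R}$ be a strictly monotonically increasing function, $\mathcal{X}$ be a set, and $\epsilon:(\mathcal{X}\times\mathbb{R}^2)^N\to\mathbb{R}\cup\{\infty\}$ be an arbitrary loss function. Then, each minimizer $f\in\mathcal{H}$ of the regularized risk functional
$$\epsilon\big((x_1,y_1,f(x_1)),\dots,(x_N,y_N,f(x_N))\big)+\Theta\big(\|f\|_{\mathcal{H}}^2\big) \tag{2-2}$$
admits a representation of the form
$$f(x)=\sum_{n=1}^{N}\alpha_n\,\kappa(x_n,x), \tag{2-3}$$
where $\alpha_n\in\mathbb{R}$, and each $y_n\in\mathbb{R}$ is a given output associated with the input $x_n\in\mathcal{X}$.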
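Proof 2.1.1 Let $S=\operatorname{span}\{\kappa(x_n,\cdot): x_n\in\mathcal{X}, n\in[1,N]\}$ denote the subspace of $\mathcal{H}$ spanned by the $N$ training samples. Any candidate solution $f\in\mathcal{H}$ can be written as $f=f_S+f_{S^\perp}$, where $f_S\in S$, $f_{S^\perp}\in S^\perp$, and $\perp$ denotes the orthogonal complement. By the reproducing property, $f_{S^\perp}(x_n)=\langle f_{S^\perp},\kappa(x_n,\cdot)\rangle_{\mathcal{H}}=0$. Consequently, $f(x_n)=f_S(x_n)+f_{S^\perp}(x_n)=f_S(x_n)+0$, so the loss term in Equation (2-2) does not depend on $f_{S^\perp}$. Now, for the second term of the regularized risk functional:
$$\Theta\big(\|f\|_{\mathcal{H}}^2\big)=\Theta\big(\|f_S\|_{\mathcal{H}}^2+\|f_{S^\perp}\|_{\mathcal{H}}^2\big).$$
Since $\Theta$ is strictly monotonically increasing, the minimum is achieved for $\|f_{S^\perp}\|_{\mathcal{H}}=0$. This implies that $f_{S^\perp}=0$, and hence $f=f_S\in S$.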
With this in mind, the representer theorem states that the minimizer of the regularized risk functional can be expressed in terms of the training samples $\{(x_n,y_n): n\in[1,N]\}$. Therefore, it allows us to deal with problems that at first glance appear to be infinite dimensional. Nonetheless, the regularization does not rule out multiple local minima; that property requires extra conditions, namely, convexity.
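Kernel ridge regression is a familiar instance of Theorem 2.1.1. It uses the squared loss ε = Σₙ (yₙ − f(xₙ))² and Θ(t) = λt. Substituting Eq. (2-3) gives f(xₙ) = (Kα)ₙ and ‖f‖²_H = αᵀKα, where K is the Gram matrix. Setting the gradient 2K(Kα + λα − y) to zero is satisfied by α = (K + λI)⁻¹y. The NumPy sketch below assumes a Gaussian kernel, and the values of σ and λ are arbitrary illustrative choices.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Gram matrix kappa(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def fit_kernel_ridge(X, y, sigma=1.0, lam=1e-2):
    """Coefficients alpha_n of Eq. (2-3) for squared loss and Theta(t) = lam * t."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def predict(X_train, alpha, X_new, sigma=1.0):
    """Evaluate f(x) = sum_n alpha_n kappa(x_n, x), i.e. Eq. (2-3)."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha


X = np.linspace(0.0, 2.0 * np.pi, 30)[:, None]
y = np.sin(X[:, 0])
alpha = fit_kernel_ridge(X, y)
y_hat = predict(X, alpha, np.array([[np.pi / 2 + 0.05]]))
```

The solution never manipulates ϕ explicitly. The search over an infinite-dimensional RKHS reduces to solving an N×N linear system.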
2.2 Gaussian kernel estimation from information potential variability
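Let $X\in\mathcal{X}$ be a random variable in the representation space $\mathcal{X}$. Renyi's entropy, given in Equation (2-4), quantifies the level of randomness of $X$:
$$H_\alpha(X)=\frac{1}{1-\alpha}\log\left(\mathbb{E}_x\left\{p(x)^{\alpha-1}\right\}\right). \tag{2-4}$$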
In practice, $p(x)$ can be estimated from a set of $N$ samples $\mathbf{X}=\{x_n\in\mathcal{X}: n\in[1,N]\}$ using Parzen's nonparametric probability density function estimator:
$$p(x)\approx\hat{p}_{\mathbf{X}}(x|\sigma_X)=\mathbb{E}_n\left\{\kappa(x-x_n)\right\}, \tag{2-5}$$
where $\kappa(\cdot)\in\mathbb{R}^+$ is a symmetric kernel function and $\mathbb{E}_n\{\cdot\}$ stands for the averaging operator over the $N$ samples. Though many kernel functions are feasible, the Gaussian is commonly preferred because of its universal approximating capability [Liu et al., 2011].
In this case, the Gaussian kernel can be defined for the input domain $\mathcal{X}$ as:
$$\kappa_G(x-x';\sigma_X)=\exp\left(-\frac{\|x-x'\|_{\mathcal{X}}^2}{2\sigma_X^2}\right), \tag{2-6}$$
where $\|\cdot\|_{\mathcal{X}}$ is a given norm in $\mathcal{X}$.
Given the observation set $\mathbf{X}$ and based on the Parzen estimator of Equation (2-5), we obtain the following estimator of Renyi's α-order entropy [Principe, 2010]:
$$H_\alpha(X)\approx\hat{H}_\alpha(\mathbf{X}|\sigma_X)=\frac{1}{1-\alpha}\log\left(V_\alpha(\mathbf{X}|\sigma_X)\right), \tag{2-7}$$
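where the so-called information potential (IP) $V_\alpha(\mathbf{X}|\sigma_X)$ of the set $\mathbf{X}$ is defined as follows:
$$V_\alpha(\mathbf{X}|\sigma_X)=\mathbb{E}_n\left\{v_\alpha(x_n|\sigma_X)\right\}, \tag{2-8}$$
with $v_\alpha(x_n|\sigma_X)$ the IP of the sample $x_n$, computed as:
$$v_\alpha(x_n|\sigma_X)=\frac{1}{N^{\alpha-1}}\left(\sum_{n'=1}^{N}\kappa_G(x_n-x_{n'};\sigma_X)\right)^{\alpha-1}. \tag{2-9}$$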
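The Parzen-based IP and entropy estimators can be written in a few lines of NumPy. This sketch assumes the Euclidean norm in Eq. (2-6), and the function names are illustrative. The per-sample IP raises the averaged kernel sum to the power α − 1, which is the Parzen plug-in of E_x{p(x)^{α−1}} in Eq. (2-4). For α = 2, this reduces to the plain mean of the Gram matrix. Because κ_G is not normalized, the estimate differs from the one obtained with a normalized Parzen window only by an additive constant.

```python
import numpy as np


def gaussian_gram(X, sigma):
    """kappa_G(x_n - x_n'; sigma) for all sample pairs, Eq. (2-6)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def sample_ip(X, sigma, alpha=2.0):
    """Per-sample information potential: (mean_n' kappa_G)^(alpha - 1)."""
    return gaussian_gram(X, sigma).mean(axis=1) ** (alpha - 1.0)


def renyi_entropy(X, sigma, alpha=2.0):
    """Parzen-based Renyi entropy estimate (alpha != 1) and the set IP."""
    V = sample_ip(X, sigma, alpha).mean()   # average over samples
    return np.log(V) / (1.0 - alpha), V


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
H2, V2 = renyi_entropy(X, sigma=0.5)
```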
Equation (2-9) shows that the IP yields an entropy estimate based on the summation of pairwise sample interactions through the Gaussian kernel function [Morejon and Principe, 2004]. Also, the Information Force (IF), $F_n\in\mathcal{X}$, is defined as the force acting on the particle $x_n$ due to all other particles in $\mathbf{X}$. It is given by the derivative of the IP with respect to $x_n$.
Particularly, for $\alpha=2$, the well-known quadratic Renyi's entropy leads to the following estimate of the IF:
$$F_n=\frac{\partial}{\partial x_n}V_2(\mathbf{X}|\sigma_X)=-\frac{2}{N^2\sigma_X^2}\sum_{x_{n'}\in\mathbf{X}}\kappa_G(x_n-x_{n'};\sigma_X)\,(x_n-x_{n'})=\mathbb{E}_{n'}\left\{F(x_n|x_{n'})\right\}, \tag{2-10}$$
$$F(x_n|x_{n'})=-\frac{2}{N\sigma_X^2}\,\kappa_G(x_n-x_{n'};\sigma_X)\,(x_n-x_{n'}), \tag{2-11}$$
where $F(x_n|x_{n'})$ corresponds to the conditional IF acting on $x_n$ due to $x_{n'}$. Generally, the IFs can be interpreted in light of inner products in a high-dimensional feature space [Jenssen et al., 2003].
Some important facts have to be highlighted from Equation (2-10):
• Firstly, given that $\mathbf{X}$ is fixed and the factor $(x_n-x_{n'})$ points towards $x_n$, all IF directions are also fixed. Because of the negative sign, the forces are attractive.
• Secondly, since $F_n$ depends on the free parameter $\sigma_X$, the IP and all IF magnitudes become functions of the Gaussian kernel bandwidth. In fact, since $\kappa_G$ in Equation (2-6) is not normalized, the quadratic IP increases monotonically with $\sigma_X$. Equivalently, the entropy estimate decreases.
• At the same time, the conditional IF magnitude tends to zero as $\sigma_X$ goes either to zero or to infinity, reaching its maximum at some value in $\mathbb{R}^+$.
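The force computation can be verified numerically. With α = 2, the quadratic IP is V₂ = (1/N²) Σₙ Σₙ′ κ_G(xₙ − xₙ′; σ_X). Since every pair appears twice in this double sum, its exact gradient with respect to xₙ is −(2/(N²σ_X²)) Σₙ′ κ_G(xₙ − xₙ′; σ_X)(xₙ − xₙ′). The NumPy sketch below computes these forces and checks them against central finite differences. It also illustrates that a per-pair force magnitude vanishes for very small and very large bandwidths. Function names and test values are illustrative assumptions.

```python
import numpy as np


def quadratic_ip(X, sigma):
    """V_2(X | sigma): mean of the Gaussian Gram matrix."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2)).mean()


def information_forces(X, sigma):
    """Exact gradient of V_2 with respect to every sample (one row per x_n)."""
    N = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]                       # x_n - x_n'
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
    return -2.0 / (N ** 2 * sigma ** 2) * np.einsum("ij,ijd->id", K, diff)


def finite_difference_forces(X, sigma, h=1e-5):
    """Central finite-difference approximation of the same gradient."""
    F = np.zeros_like(X)
    for n in range(X.shape[0]):
        for d in range(X.shape[1]):
            Xp, Xm = X.copy(), X.copy()
            Xp[n, d] += h
            Xm[n, d] -= h
            F[n, d] = (quadratic_ip(Xp, sigma) - quadratic_ip(Xm, sigma)) / (2 * h)
    return F


def conditional_force_norm(dist, sigma, N):
    """Magnitude of a per-pair force term (scaled so its average over n' gives F_n)."""
    return 2.0 / (N * sigma ** 2) * np.exp(-dist ** 2 / (2.0 * sigma ** 2)) * dist


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
F = information_forces(X, 0.7)
```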
Hence, an adequate tuning of the Gaussian kernel bandwidth becomes essential. In this sense, we seek an RKHS maximizing the overall IP variability with respect to the kernel bandwidth, so that the IF magnitudes spread as widely as possible over $\mathcal{X}$. To this end, the variability of the estimated IP is maximized in terms of the kernel bandwidth parameter as follows:
$$\sigma_X^\star=\arg\max_{\sigma_X}\ \mathrm{Var}\left\{v_2(x|\sigma_X)\right\}, \tag{2-12}$$
$$\mathrm{Var}\left\{v_2(x|\sigma_X)\right\}=\mathbb{E}_x\left\{\left(v_2(x|\sigma_X)-\mathbb{E}_x\left\{v_2(x|\sigma_X)\right\}\right)^2\right\}. \tag{2-13}$$
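Differentiating Equation (2-13) with respect to $\sigma_X$, with the expectations replaced by sample averages over $\mathbf{X}$, and using $\partial\kappa_G/\partial\sigma_X=\kappa_G\,\|x_n-x_{n'}\|_{\mathcal{X}}^2/\sigma_X^3$, yields:
$$\frac{d}{d\sigma_X}\mathrm{Var}\left\{v_2(x|\sigma_X)\right\}=\frac{2}{N^2\sigma_X^3}\sum_{n,n'=1}^{N}\left(v_2(x_n|\sigma_X)-V_2(\mathbf{X}|\sigma_X)\right)\kappa_G(x_n-x_{n'};\sigma_X)\,\|x_n-x_{n'}\|_{\mathcal{X}}^2.$$
Through Equation (2-11), each product $\kappa_G(x_n-x_{n'};\sigma_X)\,\|x_n-x_{n'}\|_{\mathcal{X}}^2$ is proportional to $F(x_n|x_{n'})^\top(x_n-x_{n'})$. Hence, this optimality condition can be equivalently written in terms of the IP and the conditional IFs.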
Lastly, by equating the above derivative to zero, a fixed-point or a gradient-based update rule can be employed to find a suitable $\sigma_X$ value. As a result, we get a scale updating rule as a function of the IFs, which are induced by a kernel function applied over a finite sample set. Thereby, a Gaussian kernel-based RKHS coding the most spread-out IF magnitudes can be estimated using the introduced approach, termed Kernel function Estimation from Information Potential Variability (KEIPV). Figure 2-2 illustrates the influence of the bandwidth on the data representation, as well as the result of KEIPV on a synthetic dataset.
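The bandwidth criterion in Eqs. (2-12) and (2-13) can be sketched in NumPy as below. As a simplification, this sketch swaps the fixed-point or gradient-based update for a plain grid search over σ_X. It also includes the closed-form derivative (2/(N²σ_X³)) Σₙ,ₙ′ (v₂(xₙ) − V₂) κ_G(xₙ − xₙ′) ‖xₙ − xₙ′‖², obtained from ∂κ_G/∂σ_X = κ_G ‖xₙ − xₙ′‖²/σ_X³, so that it can be checked against finite differences. The grid, the toy data, and the function names are illustrative assumptions.

```python
import numpy as np


def _sq_dists(X):
    return np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)


def ip_variance(X, sigma):
    """Sample variance over the data set of the per-sample quadratic IP v_2."""
    v = np.exp(-_sq_dists(X) / (2.0 * sigma ** 2)).mean(axis=1)   # v_2(x_n | sigma)
    return np.mean((v - v.mean()) ** 2)


def ip_variance_grad(X, sigma):
    """Closed-form derivative of ip_variance with respect to sigma."""
    D = _sq_dists(X)
    K = np.exp(-D / (2.0 * sigma ** 2))
    v = K.mean(axis=1)
    N = X.shape[0]
    return 2.0 / (N ** 2 * sigma ** 3) * np.sum((v - v.mean())[:, None] * K * D)


def keipv_grid(X, sigmas):
    """Bandwidth maximizing the IP variability over a candidate grid."""
    scores = np.array([ip_variance(X, s) for s in sigmas])
    return sigmas[int(np.argmax(scores))], scores


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
sigmas = np.logspace(-3, 3, 61)
sigma_star, scores = keipv_grid(X, sigmas)
```

For very narrow kernels every v₂(xₙ) tends to 1/N, and for very wide ones every v₂(xₙ) tends to 1. In both limits the variance vanishes, so the maximizer lies strictly inside a wide enough grid.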
2.3 Template-based Image Segmentation
Let an intensity image be a set of spatially arranged real-valued measurements, $\mathbf{X}=\{x_r\in\mathbb{R}: r\in\Omega\}$. The bounded domain $\Omega\subset\mathbb{R}^D$ is defined over a $D$-dimensional space. Its elements $r$ are commonly known as pixels, for 2D images, or voxels, for 3D volumes.
The image segmentation task consists of partitioning the intensity image into multiple segments, each of them more meaningful and easier to analyze than the original image. We formally define the task as assigning a single label, $l_r\in\mathcal{L}$, to each element of $\Omega$ based on its location and intensity ($r$ and $x_r$). The result is the segmentation image $\mathbf{L}=\{l_r: r\in\Omega\}$, also known as partition or label image. The set $\mathcal{L}$, of size $|\mathcal{L}|$, holds the possible labels.
Template-based segmentation approaches make use of a template (atlas), or a set of templates, containing a priori information about the shape, position, and/or topology of the imaged structures of interest. Generally, an atlas is described by an intensity image $\mathbf{X}$, a tissue membership map $\mathbf{B}=\{b_{rl}: r\in\Omega, l\in\mathcal{L}\}$, and a spatial domain $\Omega$. We recognize two kinds of atlases depending on their membership map definition. Deterministic atlases assume a crisp membership, i.e., $b_{rl}\in\{0,1\}$. Probabilistic atlases, instead, are constrained to $b_{rl}\in[0,1]$ and $\sum_{l\in\mathcal{L}}b_{rl}=1$.
The latter kind of atlases is built by spatially normalizing and averaging a set of $N$ anatomical atlases $\mathcal{A}=\{\mathbf{X}_n,\mathbf{L}_n,\Omega_n: n=1,\dots,N\}$. Spatial normalization requires re-parameterizing the domains into a common coordinate space, $\Omega'$, so that each coordinate $r$ indexes the same anatomical position in each image. Formally, this requires a set of functions $\tau_n:\Omega_n\to\Omega';\ r\mapsto\tau_n(r)$, such that each mapping $\tau_n$ holds the coordinate transformation between the domain $\Omega_n$ and the canonical configuration. In computational anatomy, the mappings are $C^n$-smooth, invertible, and topology-preserving transformations that avoid the emergence of holes.

Figure 2-2: KEIPV illustrative example. (a) Multivariate Gaussian toy set. (b) Log of the IP variability versus bandwidth. Second row, panels (c)-(e): Gaussian kernel for the toy set. Third row, panels (f)-(h): IFs acting on a fixed particle (green). Columns correspond to narrow ($\sigma=6.13\times10^{-3}$), KEIPV ($\sigma=6.13\times10^{-1}$), and wide ($\sigma=6.13\times10$) bandwidth values.

Hence, applying the mappings to the corresponding membership maps allows building the average anatomical template, also known as tissue probability map:
$$\bar{\mathbf{B}}=\left\{\bar{b}_{rl}=\frac{1}{N}\sum_{n=1}^{N}\left(b^n_{l}*g\right)_r : r\in\Omega, l\in\mathcal{L}\right\}, \tag{2-14}$$
where $g$ is a convolution kernel usually introduced for smoothing purposes and $*$ is the spatial convolution operator. In general, $\tau_n$ may be composed of an affine transformation (accounting for translation, rotation, scale, and skew) and/or a non-linear function that finely aligns the images [Ashburner, 2007, Avants and Gee, 2004].
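The tissue probability map of Eq. (2-14) can be sketched as follows under several assumptions. The N label maps are assumed to be already mapped to the common space Ω′ by τₙ, since registration is not shown. Memberships are one-hot, and g is a separable Gaussian applied with edge padding. All helper names are hypothetical. Because the smoothing is linear and the kernel sums to one, the resulting map still sums to one over labels at every voxel.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian truncated at three standard deviations."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    g = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()


def smooth(volume, sigma):
    """Separable Gaussian smoothing (the kernel g) with edge padding."""
    g = gaussian_kernel_1d(sigma)
    r = len(g) // 2
    out = volume.astype(float)
    for axis in range(out.ndim):
        widths = [(r, r) if a == axis else (0, 0) for a in range(out.ndim)]
        padded = np.pad(out, widths, mode="edge")
        out = np.apply_along_axis(lambda v: np.convolve(v, g, mode="valid"), axis, padded)
    return out


def tissue_probability_map(label_maps, n_labels, sigma=1.0):
    """Average of smoothed one-hot memberships; label maps already in common space."""
    tpm = np.zeros(label_maps[0].shape + (n_labels,))
    for L in label_maps:
        for l in range(n_labels):
            tpm[..., l] += smooth((L == l).astype(float), sigma)
    return tpm / len(label_maps)


rng = np.random.default_rng(0)
maps = [rng.integers(0, 3, size=(4, 5, 6)) for _ in range(3)]
tpm = tissue_probability_map(maps, n_labels=3, sigma=1.0)
```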
Figure 2-3: Schematic illustration of template-based segmentation using multiple atlases (left: atlas space holding $\{\mathbf{X}_n,\mathbf{B}_n,\Omega_n\}$; right: query space). Each anatomical atlas, $\mathcal{A}_n$, is registered to the query anatomy $\mathbf{X}_q$. The resulting transformation $\tau_n$ is used to map the corresponding tissue membership map $\mathbf{B}_n$ to the query space $\Omega_q$. The transformed segmentations are combined through the weighting function $\nu^n_{sl}$ to create an estimate of the query segmentation $\mathbf{L}_q$.
First, the provided atlases are registered to the target image to propagate their labels to the target spatial domain. Afterwards, the propagated labels are fused into a single class at each location $r$. Figure 2-3 illustrates the template-based segmentation procedures, which are outlined as follows:
a) Image registration computes the spatial transformations $\tau_n$ maximizing the alignment between $\mathbf{X}_n$ and the query image $\mathbf{X}_q$ [Zitova and Flusser, 2003].
b) Label propagation maps each $n$-th template to the spatial domain of the query image, $\Omega_q$, through the transformation $\tau_n$. This yields the propagated membership set $\mathbf{B}_n=\{b^n_{rl}=b^n_{r'l}: r'=\tau_n(r)\in\Omega_q\}$.
c) Voxel classification supplies the estimated label image $\mathbf{L}_q$. It gathers the memberships $b^n_{rl}$ assigned to each voxel into a single label $l^q_r\in\mathcal{L}$, following the general rule [Wu et al., 2014]:
$$l^q_r=\arg\max_{l}\ \frac{1}{N}\sum_{n=1}^{N}\sum_{s\in\mathcal{B}_r}\nu^n_{sl}\,b^n_{sl}\quad\text{s.t.}\quad\nu^n_{sl}\ge 0, \tag{2-15}$$
where $\mathcal{B}_r\subset\Omega$ is a spatial neighborhood around $r$ and $\nu^n_{sl}\in\mathbb{R}^+$ is the weighting function.
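Equation (2-15) can be sketched on toy 1-D signals with one-hot memberships $b^n_{sl}$. Constant weights reproduce majority voting. Gaussian patch-similarity weights give a nonlocal-means-style fusion in the spirit of [Coupe et al., 2011]. The weight form exp(−‖P_q(r) − P_n(s)‖²/h²), the patch half-width, and the edge padding are assumptions for illustration, not the exact weighting of any cited method. The function name is hypothetical.

```python
import numpy as np


def fuse_labels(query, atlas_images, atlas_labels, n_labels, radius=0, patch=1, h=None):
    """Weighted label fusion in the form of Eq. (2-15) on 1-D toy signals.

    query: (R,) intensities; atlas_images, atlas_labels: (N, R) registered atlases.
    radius: half-width of the neighborhood B_r (0 means B_r = {r}).
    h: None gives constant weights (majority voting); otherwise weights are
       exp(-||P_q(r) - P_n(s)||^2 / h^2) on patches of half-width `patch`.
    """
    N, R = atlas_images.shape
    pad = radius + patch
    q = np.pad(np.asarray(query, dtype=float), pad, mode="edge")
    A = np.pad(np.asarray(atlas_images, dtype=float), ((0, 0), (pad, pad)), mode="edge")
    scores = np.zeros((R, n_labels))
    for r in range(R):
        q_patch = q[r + pad - patch: r + pad + patch + 1]
        for n in range(N):
            for s in range(max(0, r - radius), min(R, r + radius + 1)):
                if h is None:
                    w = 1.0
                else:
                    a_patch = A[n, s + pad - patch: s + pad + patch + 1]
                    w = np.exp(-np.sum((q_patch - a_patch) ** 2) / h ** 2)
                scores[r, atlas_labels[n, s]] += w / N   # one-hot b^n_sl
    return scores.argmax(axis=1)


# One atlas matches the query in intensity; two mismatched atlases carry wrong labels.
q = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
truth = np.array([0, 0, 0, 1, 1, 1])
imgs = np.stack([q, q + 5.0, q + 5.0])
labs = np.stack([truth, 1 - truth, 1 - truth])
majority = fuse_labels(q, imgs, labs, 2)
nonlocal_means = fuse_labels(q, imgs, labs, 2, radius=1, patch=1, h=1.0)
```

In the usage example, majority voting follows the two mismatched atlases. The patch-based weights instead suppress them because their intensities disagree with the query.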
Equation (2-15) allows including most of the template-based segmentation criteria in a single expression. Some particular cases include:
• Majority voting: $\mathcal{B}_r=\{r\}$ and $\nu^n_{sl}$ is constant.
• Global weighted fusion: $\mathcal{B}_r=\{r\}$ and $\nu^n_{sl}=\nu_n=f(\mathbf{X}_q,\mathbf{X}_n)$ depends on the atlas but not on the location. It is usually constrained to $\sum_{n=1}^{N}\nu_n=1$.
• Nonlocal means: $\mathcal{B}_r=\{s:\|s-r\|_\Omega\le\beta\}$ is a closed ball of radius $\beta$, and $\nu^n_{sl}=f(x^q_r,x^n_s)$ is a function of the local affinity. These approaches are also known as patch-driven segmentation.
• Parametric intensity modeling: $\mathcal{B}_r=\{r\}$ and $\nu^n_{rl}=p(x_r|l_r=l)$ is the tissue conditional probability, given by a distribution function $f(x_r|\theta_l)$ parameterized by $\theta_l$. In this case, the segmentation rule in Equation (2-15) is rewritten as the maximization of the posterior probability [Ashburner and Friston, 2005]:
$$l^q_r=\arg\max_{l}\ p(l_r=l|x_r) \tag{2-16}$$
$$p(l_r=l|x_r)\propto p(x_r|l_r=l)\,p(l^n_r=l) \tag{2-17}$$
$$p(l^n_r=l)=\bar{b}_{rl}=\frac{1}{N}\sum_{n=1}^{N}b^n_{rl} \tag{2-18}$$
$$p(x_r|l_r=l)=f(x_r|\theta_l) \tag{2-19}$$
where $p(l^n_r=l)=\bar{b}_{rl}$ becomes the tissue probability map (TPM) and the parameter set $\theta_l$ is derived from the query image intensities.
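Eqs. (2-16)-(2-19) amount to a voxel-wise Bayes rule. The minimal sketch below assumes Gaussian class-conditionals for f(x_r | θ_l), with θ_l = (μ_l, σ_l²). It also includes one re-estimation step of θ_l from posterior-weighted query intensities. This EM-style update is chosen here for illustration and is not necessarily the procedure of [Ashburner and Friston, 2005]. Function names and numbers are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def map_segmentation(x, tpm, mu, var):
    """MAP labels with p(l | x_r) proportional to f(x_r | theta_l) * b_rl.

    x: (R,) query intensities; tpm: (R, L) tissue probability map;
    mu, var: (L,) Gaussian parameters per tissue.
    """
    lik = gaussian_pdf(x[:, None], mu[None, :], var[None, :])   # f(x_r | theta_l)
    post = lik * tpm                                            # unnormalized posterior
    post /= post.sum(axis=1, keepdims=True)
    return post.argmax(axis=1), post


def update_parameters(x, post, eps=1e-6):
    """Re-estimate theta_l = (mu_l, var_l) from posterior-weighted intensities."""
    w = post.sum(axis=0)
    mu = (post * x[:, None]).sum(axis=0) / w
    var = (post * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / w + eps
    return mu, var


x = np.array([0.0, 0.1, 1.0, 0.9])
tpm = np.full((4, 2), 0.5)                  # uninformative prior
mu, var = np.array([0.0, 1.0]), np.array([0.05, 0.05])
labels, post = map_segmentation(x, tpm, mu, var)
mu_new, var_new = update_parameters(x, post)
```

When the likelihoods tie (an intensity halfway between the tissue means), the TPM term decides the label. This is how the spatial prior resolves intensity overlap.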
2.4 Magnetic resonance image databases
This section describes the magnetic resonance image (MRI) databases considered in the development of this manuscript. These databases have been widely used for training and evaluating medical image processing methods, and they are publicly available (in some cases, upon request). Figure 2-4 exemplifies an image of each collection.
BrainWeb database: The Simulated Brain Database, or BrainWeb, contains a set of realistic MRI data volumes produced by an MRI simulator. The neuroimaging community evaluates the performance of various image analysis methods on this database because its ground truth is known [Cocosco et al., 1997, Kwan et al., 1996]. The Internet-connected MRI Simulator, at the McConnell Brain Imaging Centre in Montreal, has two publicly available¹ pre-computed MRI sets:
SBD1: 18 simulated images of the same subject, with dimensions of 181×217×181 voxels of 1×1×1 mm. Simulated contrasts are T1-weighted, T2-weighted, and proton density (PD). The T1-weighted image was simulated as a spoiled FLASH sequence with a 30° flip angle, 18 ms repetition time, and 10 ms echo time. Three levels of image nonuniformity (0, 40, and 100% RF) and six noise levels (0, 1, 3, 5, 7, and 9% relative to the brightest tissue in the images) are simulated to attain 18 volumes.
SBD2: 20 simulated images with constant image quality (3% noise, 0% image nonuniformity) and varying anatomy. Each volume was generated from the anatomical model of a different normal individual. The provided ground-truth volumes have 362×434×362 voxels, and the intensity volumes have 181×256×256 voxels. The ground truth is resampled to the corresponding intensity-volume resolution to allow voxel-wise comparisons.
For both sets, the ground-truth images hold 10 structures, namely, white matter, gray matter, cerebrospinal fluid, skull, bone marrow, dura mater, fat, connective tissue, muscles, and skin. Further details on the simulation process can be found in [Aubert-Broche et al., 2006].
OASIS database: The Open Access Series of Imaging Studies is a project aimed at making MRI data sets of the brain freely available to the scientific community. OASIS was compiled and made freely available by the Washington University Alzheimer's Disease Research Center, the Howard Hughes Medical Institute (HHMI) at Harvard University, the Neuroinformatics Research Group (NRG) at Washington University School of Medicine, and the Biomedical Informatics Research Network (BIRN). The study holds MRI from 416 subjects aged 18 to 96 years [Marcus et al., 2010]. The cohort includes subjects diagnosed with very mild dementia (70), mild dementia (28), and moderate dementia (2), as well as healthy subjects (316). For each subject, three or four T1-weighted MR scans obtained within a single imaging session are included, from which a motion-corrected, co-registered average image is obtained. Additionally, each subject is provided with segmented gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) structures. A fourth label (BG) is assigned to voxels with no label to model the image background.
¹ http://brainweb.bic.mni.mcgill.ca/brainweb/
SATA database: The MICCAI 2013 challenge workshop on Segmentation: Algorithms, Theory, and Applications² provides a framework for comparing atlas-based segmentation methods on three standardized datasets. We are interested in the third challenge, on brain labeling, in which the diencephalon is labeled into 14 regions of interest: accumbens, amygdala, caudate, hippocampus, pallidum, putamen, and thalamus (in all cases, left and right). This collection contains training and testing sets of 35 and 12 brain T1 MR scans, respectively. The training set contains both intensity and label images with the 14 annotated structures, whereas the testing set contains only the intensity images. Pairwise nonrigid alignments, among the training volumes and between training and testing volumes, are also provided so that methods can compete under standardized registration.
IXI database: The Information eXtraction from Images dataset is a brain imaging study holding MR images from 575 normal subjects aged between 20 and 80 years. Subjects are provided with T1, T2, PD, DTI, and angiogram volumes. All image sequences were obtained with three different scanners (Philips 1.5T, Philips 3T, and GE 3T) at different hospitals in London, and were then anonymised and converted to NIFTI format. Additionally, basic demographic information is included for each subject (age, gender, ethnicity, and handedness, among others). The whole dataset is publicly available online³.
ADNI database: The Alzheimer's Disease Neuroimaging Initiative database⁴ was launched in 2003 by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies, and non-profit organizations. The primary goal of ADNI is to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer's disease. The ADNI database is split into three phases, namely ADNI 1, ADNI 2, and ADNI GO. In all phases, subjects are imaged up to six times using the same scanner: at the baseline visit and 6, 12, 18, 24, and 36 months after baseline. At each visit, subjects are clinically evaluated and labeled as Normal Control (NC), Mild Cognitive Impairment (MCI), or Alzheimer's disease (AD). The clinical evaluation includes the mini-mental state examination (MMSE), which is widely used to assess mental status. Its maximum score is 30, and a score of 23 points or less is indicative of cognitive impairment [Folstein et al., 1975]. From the three phases, we selected a subset of 633 subjects with scans that had been rated with the “best” quality mark. As a result, the selected cohort holds N=1993 images with the three class labels described above. Additionally, 629 images with a “partial” quality mark were selected to assess the classification performance under more challenging imaging conditions. Table 2-1 summarizes the demographic information of the selected ADNI cohort.
² https://masi.vuse.vanderbilt.edu/workshop2013/index.php/Main_Page
³ http://www.brain-development.org/
⁴ adni.loni.usc.edu
                “best” quality                              “partial” quality
              NC            MCI           AD            NC            MCI           AD
N             655           825           513           465           130           34
Age (years)   74.9 ± 5.0    74.4 ± 7.4    74.0 ± 7.4    76.6 ± 6.4    76.0 ± 6.3    74.3 ± 6.5
Male          47.5%         39.5%         47.6%         70.1%         62.3%         70.6%
MMSE          29.0 ± 1.0    27.1 ± 2.5    21.9 ± 4.4    27.5 ± 2.0    21.2 ± 1.6    14.4 ± 2.8
Table 2-1: Demographic and clinical details of the selected ADNI cohort.
Figure 2-4: Examples of MRI databases. Top to bottom and left to right: SBD1 for three noise levels (1, 5, 9%), SBD2 simulated structures, OASIS, SATA (overlaying labeled structures), and IXI.
Database  N      Age (yo)          Male    Structures        Modalities             T1 vol. size                   T1 vox. size
SDB1      1      -                 -       10 whole brain    T1, T2, PD             181×217×181                    1×1×1 mm
SDB2      20     29 ± 4            50.0%   10 whole brain    T1                     181×217×181                    1×1×1 mm
OASIS     416    52 ± 25           38.4%   WM, GM, CSF       T1                     208×176×176                    1×1×1 mm
SATA      47     -                 -       14 diencephalon   T1                     256×256×287                    1×1×1 mm
IXI       619    49 ± 16           44.7%   -                 T1, T2, PD, MRA, DWI   256×256×130 to 256×256×150     0.93×0.93×1.2 to 0.97×0.97×1.2 mm
ADNI      2622   71 ± 7 (55, 91)   50.1%   -                 T1                     192×192×160 to 256×256×180     0.93×0.93×1.2 to 1.25×1.25×1.2 mm
Table 2-2: Summary of characteristics of the considered MRI databases.
3 Kernel-based Template Selection using Embedding Representations
As previously stated, templates provide shape, intensity, and/or functional information regarding the imaged structures. For segmentation tasks, a set of templates is either registered to a query image or used to build the prior spatial distribution of each tissue. In either case, using the whole set of templates assumes a unimodal shape distribution. Therefore, the resulting segmentations may be biased towards anatomically unrepresentative images [Valdes-Hernandez et al., 2009]. In contrast, template selection improves segmentation performance, in terms of both computational cost and accuracy, when only representative images from large datasets are propagated [Aljabar et al., 2009].
This chapter introduces a new Kernel-based Atlas Image Selection computed in the Embedding Representation space (termed KAISER) to support MRI segmentation. The approach encodes inter-slice similarities for each volume to retain the main shape information in a lower-dimensional representation. Then, a tensor-product kernel combines the multiple representations into a single metric. Finally, a spectral decomposition of the dataset estimates a compact embedding space for data visualization, where the latent data structure is highlighted.
3.1 Template selection for image segmentation
Let an atlas dataset $\mathcal{A}=\{A_n : n\in[1..N]\}$ be composed of $N$ tuples $A_n=(X_n, B_n, \Omega_n, c_n)$ holding an intensity image $X_n$, a tissue membership map $B_n$, a spatial domain $\Omega_n$, and a demographic category $c_n\in\mathcal{C}$ (e.g., age, gender, disease). The selection framework ranks the atlas subjects according to their similarity to the query $Q=(X_q, \Omega_q, c_q)$ as follows:

$$\mathcal{A}_q = \{A_n\in\mathcal{A} : s(A_n, Q)\geq\epsilon\}, \tag{3-1}$$

where $s(\cdot,\cdot)\in\mathbb{R}^+$ is an atlas similarity function and $\epsilon\in\mathbb{R}^+$ is a predefined threshold value. Then, the top-ranked atlases are aligned to the query to segment it in its native space. Recalling Equation (2-15), template-based segmentation using atlas selection can be further written in terms of the weighting value $\nu_{nsl}$ as:

$$\nu_{nsl} = \nu_n = \begin{cases} 1, & s(A_n, Q)\geq\epsilon \\ 0, & \text{otherwise} \end{cases} \tag{3-2}$$
The similarity function $s(\cdot,\cdot)\in\mathbb{R}^+$ can be assessed using either the meta-information or the intensity images. In the first case, such a function measures how closely atlas subjects match the query in terms of the demographic variable $c$: $s_C(A_n, Q)=s(c_n, c_q)$. In the second case, the similarity is derived either from the intensity images in a standard spatial domain $\Omega$ (e.g., Sum of Squared Differences, Correlation Coefficient Histogram, and Normalized Mutual Information) or from a feature vector $\mathbf{y}$ extracted from each image $X$.
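As an illustration of Equations (3-1) and (3-2), the following minimal NumPy sketch selects atlases from an intensity-based similarity. It assumes that the query and the atlases already lie on a common grid and, as a hypothetical choice not taken from the thesis, turns the SSD into a non-negative similarity through exp(-SSD/tau) with a user-chosen scale tau. The function names are ours.

```python
import numpy as np


def ssd_similarity(query, atlases, tau):
    """Map the SSD between the query and each atlas to a similarity in (0, 1].

    Hypothetical choice: s = exp(-SSD / tau). The thesis only requires
    s(., .) >= 0, with larger values for closer images.
    """
    q = np.asarray(query, dtype=float)
    ssd = np.array([np.sum((q - np.asarray(a, dtype=float)) ** 2) for a in atlases])
    return np.exp(-ssd / tau)


def select_atlases(similarities, eps):
    """Binary weights nu_n (Eq. 3-2) and the selected index set A_q (Eq. 3-1)."""
    nu = (np.asarray(similarities) >= eps).astype(float)  # 1 if s >= eps, else 0
    selected = np.flatnonzero(nu)                          # indices forming A_q
    return selected, nu


# Usage: a toy query and three toy atlases on the same 2x2x2 grid.
query = np.zeros((2, 2, 2))
atlases = [np.zeros((2, 2, 2)), np.full((2, 2, 2), 0.1), np.ones((2, 2, 2))]
s = ssd_similarity(query, atlases, tau=1.0)
selected, nu = select_atlases(s, eps=0.5)
print(selected, nu)
```

In practice, the threshold and the similarity scale interact. A top-k rule is an equivalent alternative when a fixed number of atlases is preferred.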
3.1.1 Feature extraction based on inter-slice similarities
Let the bounded spatial domain for 3D volumes, $\Omega\subset\mathbb{R}^3$, be indexed by a three-component vector $r=t_a\mathbf{u}_a + t_s\mathbf{u}_s + t_c\mathbf{u}_c$, where $\mathbf{u}_v\in\mathbb{R}^3$ is the orthonormal vector along the axis $v$, namely Axial ($v=a$), Sagittal ($v=s$), or Coronal ($v=c$). Since any intensity image $X=\{x_r\in\mathbb{R} : r\in\Omega\}$ is a finite set of measurements indexed by $r$, one may assume that $t_v\in[1..L_v]$, with $L_v$ the volume size along the axis $v$. In addition, any volume can be expressed as a set of $P$ different non-overlapping partitions:

$$X = \{X_p\in\mathbb{R}^{|\Omega_p|} : p\in[1..P]\},\quad X_p = \{x_r\in\mathbb{R} : r\in\Omega_p\},\quad \text{s.t.}\ \bigcup_{p=1}^{P}\Omega_p = \Omega,\ \ \Omega_p\cap\Omega_{p'} = \emptyset\ \ \forall p\neq p' \tag{3-3}$$
where $|\Omega_p|$ is the cardinality of the $p$-th partition (its number of voxels). The set $\Omega_p$ is provided as a high-level segmentation or as low-level regions for any spatially normalized anatomical image. Equation (3-3) allows spatial relations to be encoded by introducing the following similarity metric:

$$y_{pp'} = \kappa_P(X_p, X_{p'}) = \langle\varphi(X_p), \varphi(X_{p'})\rangle_{\mathcal{H}}\in\mathbb{R}^+,\quad\forall p, p'\in[1..P], \tag{3-4}$$

with the function $\varphi:\mathbb{R}^{|\Omega_p|}\to\mathcal{H}$ mapping each partition $X_p$ to the Hilbert space $\mathcal{H}$ reproduced by the kernel function $\kappa_P(\cdot,\cdot)$, where $\langle\cdot,\cdot\rangle_{\mathcal{H}}$ stands for the inner product on $\mathcal{H}$.
For the function in Equation (3-4) to model the spatial dynamics of the imaged objects, we split the volumes into ordered slices of size $L_{v'}\times L_{v''}$, varying smoothly along the canonical axis $v$, by defining the partitions as:

$$\Omega^v_p = \{r = p\,\mathbf{u}_v + t_{v'}\mathbf{u}_{v'} + t_{v''}\mathbf{u}_{v''} : t_{v'}\in[1..L_{v'}],\ t_{v''}\in[1..L_{v''}]\},\quad\forall p\in[1..L_v] \tag{3-5}$$
Therefore, applying $\kappa_P(\cdot,\cdot)$ over the above partition yields the Inter-Slice Kernel (ISK) features of the image $X$ along the $v$-th axis:

$$\mathbf{y}^v = \{\kappa_P(X^v_p, X^v_{p'})\in\mathbb{R}^+ : p, p'\in[1..L_v],\ p\geq p'\}. \tag{3-6}$$

Note that the symmetry of kernel functions (i.e., $\kappa_P(X_p, X_{p'}) = \kappa_P(X_{p'}, X_p)$) allows only the pairs $p\geq p'$ to be considered when building the ISK. As a result, the similarity function in Equation (3-2) can be assessed in the new feature space:

$$s_v(A_n, Q) = s(\mathbf{y}^v_n, \mathbf{y}^v_q) \tag{3-7}$$
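The ISK extraction of Equations (3-5), (3-6), and (3-10) can be sketched in NumPy as below. This is an illustrative sketch, not the thesis code. The helper name `isk_features` is hypothetical, slices are taken with `np.moveaxis`, and the diagonal pairs p = p′ are dropped because the Gaussian kernel makes them constant (equal to 1). This yields the L_v(L_v−1)/2 entries reported in Section 3.2.2.

```python
import numpy as np


def isk_features(volume, axis, sigma):
    """Inter-Slice Kernel features of a 3D volume along one axis (hypothetical helper).

    Slices X_p are taken along `axis` (Eq. 3-5) and compared with the Gaussian
    kernel of Eq. (3-10): exp(-||X_p - X_p'||_F^2 / (2 sigma^2)).
    """
    slices = np.moveaxis(np.asarray(volume, dtype=float), axis, 0)  # (L_v, L_v', L_v'')
    flat = slices.reshape(slices.shape[0], -1)                       # one row per slice
    sq = np.sum(flat ** 2, axis=1)
    # Squared Frobenius distances between every pair of slices
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T, 0.0)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    # Keep p > p' only: k is symmetric and its diagonal is always 1
    rows, cols = np.tril_indices(k.shape[0], k=-1)
    return k[rows, cols]


# Usage: a random 4x5x6 volume gives 6, 10 and 15 features along axes 0, 1 and 2.
vol = np.random.default_rng(0).random((4, 5, 6))
print([isk_features(vol, ax, sigma=1.0).shape for ax in range(3)])
```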
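After spatial normalization, the relations among the images in $\mathcal{A}$ can be encoded into a symmetric positive definite kernel matrix $K^v\in\mathbb{R}^{N\times N}$ by defining the similarity in the ISK space as:

$$s_v(A_n, A_m) = \kappa_v(\mathbf{y}^v_n, \mathbf{y}^v_m) = \langle\varphi(\mathbf{y}^v_n), \varphi(\mathbf{y}^v_m)\rangle_{\mathcal{H}}\in\mathbb{R}^+,\quad\forall n, m\in[1..N], \tag{3-8}$$

where $\kappa_v(\cdot,\cdot)$ is a positive definite and infinitely divisible kernel function producing the matrix $K^v$ with elements $\kappa^v_{nm} = \kappa_v(\mathbf{y}^v_n, \mathbf{y}^v_m)\in\mathbb{R}^+$.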
Since each ISK represents the high-dimensional image information along a single axis, changing the axis view highlights different dynamics. Hence, we introduce the Tensor-Product Kernel Representation (TKR) to join the similarity measures in the $K^v$ matrices:

$$s_X(A_n, A_m) = \kappa^T_{nm} = \prod_{v\in\{a,s,c\}}\left(\kappa_v(\mathbf{y}^v_n, \mathbf{y}^v_m)\right)^{\theta_v}, \tag{3-9}$$

where $\theta_v\in\mathbb{R}^+$ weighs the contribution of $\kappa_v$ to the TKR, and $K^T = \{\kappa^T_{nm}\in\mathbb{R}^+ : n, m\in[1..N]\}$ holds the joint similarity of the dataset $\mathcal{A}$.
Because $\kappa^T_{nm}\to 0$ whenever any $\kappa^v_{nm}\to 0$, a single view can have a deleterious effect on the TKR. To overcome this, the influence of $K^v$ is decreased by letting $\theta_v\to 0$, so that $(\kappa^v_{nm})^{\theta_v}\to 1$. Besides, the positive definite and infinitely divisible properties of the kernels in Equation (3-8) allow arbitrary powers $\theta_v$ to be fixed while keeping the resulting TKR in Equation (3-9) positive definite.
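A minimal sketch of Equations (3-8) and (3-9) follows. It assumes a Gaussian kernel for each κ_v (the thesis only requires a positive definite, infinitely divisible kernel) and ISK feature matrices with one row per image. The function names are hypothetical.

```python
import numpy as np


def gaussian_gram(features, sigma):
    """K^v with entries exp(-||y_n - y_m||^2 / (2 sigma^2)); rows of `features` are images."""
    y = np.asarray(features, dtype=float)
    sq = np.sum(y ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * y @ y.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def tensor_product_kernel(kernels, thetas):
    """TKR of Eq. (3-9): element-wise product of each K^v raised to theta_v."""
    kt = np.ones_like(kernels[0], dtype=float)
    for k, theta in zip(kernels, thetas):
        kt *= k ** theta  # theta -> 0 drives this factor towards 1
    return kt


# Usage: three hypothetical ISK feature sets (axial, sagittal, coronal) for 5 images.
rng = np.random.default_rng(1)
views = [rng.random((5, 3)) for _ in range(3)]
k_views = [gaussian_gram(y, sigma=1.0) for y in views]
k_t = tensor_product_kernel(k_views, [0.32, 0.35, 0.33])
```

With Gaussian kernels, (exp(−d²/2σ²))^θ = exp(−θd²/2σ²). Raising a Gaussian kernel to θ_v therefore only widens its bandwidth by a factor 1/√θ_v, which is one way to see why the product stays positive definite.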
3.2 Experiments and Results
In this chapter, we use the IXI dataset for demographic analyses, and the SDB2, OASIS, and SATA datasets to evaluate the atlas selection performance in different segmentation tasks.
3.2.1 MRI Preprocessing
Three preprocessing steps are carried out over all image data. Images are first spatially normalized to the Talairach space so that they can be compared and their features extracted within a standard space. To this end, rigid registration to the MNI305 atlas is applied to each volume using the quaternion-based mapping and the mutual information (MI) metric of the Advanced Normalization Tools (ANTS). As a result, volumes are resampled to a size of 197 × 233 × 189 voxels (the MNI305 template size). Secondly, an intensity normalization is performed by scaling each voxel value so that the mean intensity of the white matter is equal for all images [Fischl et al., 2002]. This step is carried out with Freesurfer, a freely available image analysis suite¹.
3.2.2 ISK feature extraction
The ISK feature vector of an image $X_n$ is denoted as $\mathbf{y}^v_n\in\mathbb{R}^{L_v(L_v-1)/2}$. Hence, a new representation space of order $10^4$ is achieved, instead of the original image domain of order $10^6$. The kernel function $\kappa_P$ is chosen as the well-known Gaussian function:

$$\kappa_P(X^n_{p'}, X^n_p) = \exp\left(-\frac{\|X^n_{p'} - X^n_p\|^2_F}{2\sigma^2_P}\right), \tag{3-10}$$

where $\|\cdot\|_F$ stands for the Frobenius norm and $\sigma_P\in\mathbb{R}^+$ is the kernel bandwidth parameter, which is tuned according to the KEIPV criterion in Equation (2-12). Figure 3-1(a) illustrates the tuning curve for the ISK bandwidth parameter. Since the sampling rates and image dimensions are similar for the three axes, the dynamic range of the inter-slice differences is also similar. As a result, the KEIPV converges near the same bandwidth for all axes. The resulting ISK representations of an IXI image using the optimal bandwidth are shown in Figures 3-1(b) to 3-1(d). The red corner patches in the matrices encode MRI regions with no content, i.e., the background. Since the sagittal ISK is symmetric with respect to the anti-diagonal, this representation properly encodes the head symmetry along the sagittal axis.
3.2.3 Image similarity function from TKR
Although the latent phenomenon is the same for all ISKs, each of them provides a different view of the data distribution. Hence, to include all the information in a single similarity function, we compute the Tensor-product Kernel Representation using Equation (3-9).

¹ http://surfer.nmr.mgh.harvard.edu/
[Figure 3-1 plots: (a) KEIPV cost, Var(IP), versus bandwidth σ (20 to 100), with the optimum σ⋆P marked; (b) Axial, (c) Sagittal, and (d) Coronal ISK matrices.]
Figure 3-1: Top: Bandwidth vs. KEIPV cost function for the IXI database volumes. Mean and standard deviation values are plotted. Bottom: ISK representation along each view for an IXI subject.
In addition, we propose to fix the weights $\theta_v$ of Equation (3-9) according to the Information Potential of their corresponding $K^v$, under the assumption that high variability should identify discriminative MRI patterns:

$$\theta_v = \frac{V_2(K^v)}{\sum_{v'\in\{a,s,c\}} V_2(K^{v'})}, \tag{3-11}$$

where the operator $V_2(\cdot)$ measures the Information Potential of a kernel matrix, as defined in Equation (2-8). The resulting parameter setup for the IXI subjects is $\theta_a=0.32$, $\theta_s=0.35$, and $\theta_c=0.33$. Figure 3-2 shows the kernel matrices attained with each ISK representation and with the TKR, with the subjects sorted by gender and age. We also show the Sum of Squared Differences (SSD) as a baseline similarity metric. As seen, the kernel representations highlight both categories, gender and age, revealing patterns in the MRI distribution.
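The normalized weights of Equation (3-11), which are the exponents θ_v of Equation (3-9), could be computed as follows. Equation (2-8) lies outside this section, so the sketch assumes the common quadratic Information Potential estimator V_2(K) = (1/N²) Σ_nm k_nm. Substitute the thesis definition if it differs.

```python
import numpy as np


def information_potential(k):
    """Assumed quadratic Information Potential: the mean of all kernel entries."""
    return float(np.mean(k))


def tkr_weights(kernels):
    """Eq. (3-11): each view's IP divided by the sum of the IPs of all views."""
    ips = np.array([information_potential(k) for k in kernels])
    return ips / ips.sum()


# Usage with three toy 2x2 kernel matrices (stand-ins for K^a, K^s, K^c)
toy = [np.ones((2, 2)), np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]
w = tkr_weights(toy)
print(w)
```

Because the weights are normalized to sum to one, only the relative Information Potential of each view matters.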
[Figure 3-2 panels: (a) ISK Axial, (b) ISK Sagittal, (c) ISK Coronal, (d) SSD, (e) TKR; rows and columns are subjects grouped by gender (Male, Female) and sorted by age within each group; color scale from 0 to 1.]
Figure 3-2: Resulting kernel representations for the IXI database using ISK, TKR and SSD.
Colormap is normalized to [0, 1] for all matrices.
To visually identify MRI patterns, we estimate a low-dimensional space from $K^T$ using Kernel Principal Component Analysis (KPCA). Figure 3-3 compares TKR and SSD using their three largest principal components. The TKR-based projection (Figure 3-3(b)) supports the following observations: i) the third eigenvector is mainly related to gender discrimination; ii) the first and second eigenvectors are nonlinearly related, and both of them unfold the age category; and iii) older subjects are more widely spread than younger ones. This last finding agrees with anatomical knowledge: brain anatomy is relatively steady in middle age and changes faster (gray matter volume diminishes) in older subjects [Aljabar et al., 2009]. As a result, our proposal naturally highlights the principal subject categories (age and gender), thus representing inter-subject relations better than SSD.
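The KPCA projection behind Figure 3-3 can be sketched with standard kernel PCA: double-center K^T, eigendecompose it, and keep the three leading components. This is a generic NumPy sketch, not the author's code, and the function name is hypothetical.

```python
import numpy as np


def kernel_pca(k, n_components=3):
    """Coordinates of N samples on the leading kernel principal components."""
    k = np.asarray(k, dtype=float)
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    kc = h @ k @ h                               # centered Gram matrix
    vals, vecs = np.linalg.eigh(kc)              # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    vals = np.clip(vals[order], 0.0, None)       # guard against tiny negative values
    return vecs[:, order] * np.sqrt(vals)        # (N, n_components) coordinates


# Usage: a linear kernel of centered 1-D points recovers the points (up to sign).
x = np.array([[-1.0], [0.0], [1.0]])
z = kernel_pca(x @ x.T, n_components=1)
```

Applied to K^T, each row of the output gives the three coordinates plotted for one subject. The sign of each component is arbitrary.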
[Figure 3-3 panels: (a) SSD-based projection, (b) TKR-based projection; pairwise scatter plots of the 1st, 2nd, and 3rd KPCA coordinates, with markers by gender (M, F) and colors by age (30 to 80 years).]
Figure 3-3: IXI database projections using Kernel Principal Component Analysis from SSD
and TKR similarities.
3.2.4 Tissue Labeling Performance
The proposed Kernel-based Atlas Image Selection from the Embedding Representation (KAISER) is evaluated in two tissue labeling tasks. The first task consists of segmenting the cerebrospinal fluid, gray matter, and white matter of the SDB2 collection (the queries) using OASIS images as templates. To this end, the Diffeomorphic Anatomical Registration using Exponentiated Lie algebra (DARTEL) algorithm aligns the selected subset of images and builds a tissue probability map (TPM) for each query. Then, the resulting TPMs are fed into the unified segmentation tool of Statistical Parametric Mapping (SPM), a probabilistic generative approach that estimates the query labels. The second task aims to segment the 14 diencephalon structures of the training SATA images in a leave-one-out validation. In this case, a weighted majority voting scheme is adopted for segmentation, where the template-to-query TKR similarity weighs the contribution of each atlas. Since the database provides pairwise non-rigid registrations among the training images, no further alignment is required.
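For the SATA task, the weighted majority voting can be sketched as below. The sketch assumes that atlas label maps have already been propagated to the query (the database provides the registrations) and that each atlas has one global TKR similarity. The function and argument names are hypothetical.

```python
import numpy as np


def weighted_majority_voting(atlas_labels, weights, n_labels):
    """Fuse propagated atlas label maps using one global weight per atlas.

    atlas_labels: integer array (N, ...) of labels already warped to the query;
    weights: (N,) template-to-query similarities (e.g., TKR values).
    Returns the label with the largest weighted vote at every voxel.
    """
    labels = np.asarray(atlas_labels)
    w = np.asarray(weights, dtype=float)
    votes = np.zeros((n_labels,) + labels.shape[1:])
    for l in range(n_labels):
        # Sum of the weights of the atlases voting for label l at each voxel
        votes[l] = np.tensordot(w, (labels == l).astype(float), axes=1)
    return np.argmax(votes, axis=0)


# Usage: three atlases over four voxels; the third atlas is the most similar.
maps = np.array([[0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 0]])
fused = weighted_majority_voting(maps, [0.5, 0.3, 0.9], n_labels=2)
```

With equal weights, this reduces to plain majority voting. The similarity-dependent weights are what make the SATA results differ across selection methods even when all atlases are used.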
Segmentation performance is measured in terms of the well-known Dice similarity index:

$$DI_l = \mathbb{E}_q\left\{200\,\frac{|B^q_l\cap\hat{B}^q_l|}{|B^q_l| + |\hat{B}^q_l|}\right\} \tag{3-12a}$$

$$DI = \mathbb{E}_l\{DI_l\}, \tag{3-12b}$$

where $\mathbb{E}_q\{\cdot\}$ and $\mathbb{E}_l\{\cdot\}$ stand for the averaging operator over the set of query images and over the tissue types, respectively. $B^q_l$ is the binary ground truth for the $l$-th tissue, $\hat{B}^q_l$ its estimation, and $DI_l$ its attained Dice index. If there is no agreement between the ground truth and the attained segmentation, $DI_l$ is minimal (0%); in the case of total agreement, $DI_l$ is maximal (100%). Figure 3-4 compares KAISER against SSD, mutual information (MI), and random (Rand) selection approaches in an incremental search for the optimal number of atlases. The average Dice index and its standard deviation for each tissue and selection approach are reported in Table 3-1, where parentheses indicate the optimal number of selected templates $|\mathcal{A}_q|$. The results show that KAISER not only reaches its maximum accuracy with fewer atlases than the baseline approaches but also selects a subset of images that performs better than larger sets. For SDB2 in particular, the performance of all approaches converges to the same value when the whole dataset is used, since the TPMs are built on a selection-only scheme. In contrast, the SATA results differ across approaches even when the whole dataset is used, because the voting weights depend on the similarity measure. In this regard, the tensor kernel representation induces an image similarity that improves segmentation performance.
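Equation (3-12) translates directly into code. A minimal NumPy sketch follows. Returning 100% when a label is absent from both maps is our assumption, since the thesis does not specify that case.

```python
import numpy as np


def dice_per_label(truth, estimate, labels):
    """Dice index in % (inner term of Eq. 3-12a) for one query image."""
    truth, estimate = np.asarray(truth), np.asarray(estimate)
    scores = {}
    for l in labels:
        b, b_hat = truth == l, estimate == l
        denom = b.sum() + b_hat.sum()
        # 200 |B and B_hat| / (|B| + |B_hat|); assumed 100% if l is absent from both
        scores[l] = 200.0 * np.logical_and(b, b_hat).sum() / denom if denom else 100.0
    return scores


def mean_dice(truths, estimates, labels):
    """DI_l averaged over queries (Eq. 3-12a) and DI averaged over labels (Eq. 3-12b)."""
    per_query = [dice_per_label(t, e, labels) for t, e in zip(truths, estimates)]
    di_l = {l: float(np.mean([d[l] for d in per_query])) for l in labels}
    return di_l, float(np.mean(list(di_l.values())))


# Usage: one toy query with three labels over four voxels.
truth = np.array([0, 1, 1, 2])
est = np.array([0, 1, 2, 2])
di_l, di = mean_dice([truth], [est], labels=[0, 1, 2])
```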
[Figure 3-4 plots: average Dice index (DI) versus the number of selected templates |Aq| for Rand, SSD, MI, and KAISER; (a) SDB2, |Aq| from 50 to 250; (b) SATA, |Aq| from 5 to 30.]
Figure 3-4: Template selection performance for the considered image similarities. The average Dice similarity index is plotted against the number of selected templates for both segmentation tasks.
Table 3-1: Template selection performance for each tissue using optimal number of atlases.
SDB2 database
                Rand (240)    SSD (240)     MI (70)       KAISER (60)
Gray matter     91.9 ± 1.7    92.5 ± 1.7    93.0 ± 1.7    93.2 ± 1.7
White matter    94.8 ± 1.3    95.3 ± 1.4    95.7 ± 1.4    95.9 ± 1.4
CSF             89.0 ± 2.9    88.1 ± 2.8    88.2 ± 3.5    88.5 ± 3.6
Average         91.9 ± 2.0    92.0 ± 2.0    92.3 ± 2.2    92.5 ± 2.2
SATA database
                Rand (34)     SSD (24)      MI (18)       KAISER (13)
Accumbens       78.0 ± 0.9    79.0 ± 1.0    78.7 ± 1.0    79.3 ± 0.8
Amygdala        80.3 ± 0.6    80.8 ± 0.6    80.9 ± 0.6    80.8 ± 0.5
Caudate         82.3 ± 1.3    83.6 ± 1.3    83.9 ± 1.3    86.5 ± 0.8
Hippocampus     83.8 ± 0.6    84.6 ± 0.6    84.9 ± 0.5    85.1 ± 0.5
Pallidum        88.2 ± 0.5    88.5 ± 0.5    88.7 ± 0.5    88.2 ± 0.5
Putamen         92.1 ± 0.3    92.4 ± 0.3    92.6 ± 0.3    92.2 ± 0.3
Thalamus        91.2 ± 0.3    91.6 ± 0.3    91.9 ± 0.3    91.9 ± 0.2
Average         85.1 ± 0.6    85.8 ± 0.7    85.9 ± 0.6    86.3 ± 0.5
3.3 Summary
A kernel-based image representation is introduced to support MRI atlas selection in template-based segmentation of brain structures. The proposed approach first encodes smooth MRI inter-slice variations using a kernel function (ISK), which can be related to the brain structure distribution. Then, the ISKs along each canonical axis (Axial, Coronal, and Sagittal) are combined into a single Tensor-product Kernel Representation (TKR) that induces pairwise image similarities.

The computed pairwise image kernel, shown in Figure 3-2 with subjects sorted first by gender and then by age, exhibits a stacked-block structure, suggesting that the similarities encode demographic groups. Furthermore, the KPCA-based projection in Figure 3-3 showed that the TKR highlights the inherent image distribution, particularly age and gender relations. Indeed, compared to the standard sum of squared differences, the TKR enhances both data interpretability and separability with respect to patient demographic information.

The capability of the proposal to support image segmentation is evaluated on two query image collections (the synthetic SDB2 and SATA for diencephalon structures) and two segmentation approaches (a probabilistic generative one and weighted majority voting). To this end, the similarity metric embedded in the TKR space (KAISER) is used to select templates and weigh their contributions to the labeling. The obtained results lead to the following conclusions: i) on average, KAISER outperforms other selection strategies such as SSD and MI; ii) there exists a small number of templates that performs better than the whole dataset, which reduces the computational cost of pairwise image registration; and iii) large atlas sets bias the segmentation towards the average population.
4 Information-based cost function for Bayesian MRI segmentation
An information-based cost function is introduced for learning the conditional class probability model required in probabilistic atlas-based brain magnetic resonance image segmentation. Aiming to improve the segmentation results, the α-order Renyi's entropy is considered as the function to be maximized, since this kind of function has been shown to lead to more discriminative distributions. Additionally, we derive the model parameter updates for the considered function, leading to a set of weighted averages that depend on the α factor. Our proposal is tested by segmenting the synthetic BrainWeb MRI database and compared against the standard log-likelihood function. The achieved results show an improvement in segmentation accuracy of ∼5% with respect to the baseline cost function.
4.1 Bayesian Image Segmentation
As seen in Section 2.3, one may write the voxel classification rule in Equation (2-15) as the maximization of the tissue posterior probability [Ashburner and Friston, 2005]:

$$l^q_r = \arg\max_l\ p(l_r = l\mid x_r) \tag{4-1a}$$

$$l^q_r = \arg\max_l\ p(x_r\mid l_r = l)\,p(l_r = l), \tag{4-1b}$$

where $p(l_r = l) = b_{rl} = \frac{1}{N}\sum_{n=1}^{N} b_{nrl}$ is built by averaging a set of spatially normalized label atlases into the prior Tissue Probability Map (TPM), and $p(x_r\mid l_r = l) = f_{rl}(\theta)$ follows a predefined probability distribution parameterized by $\theta$. In the most common scheme, the set of model parameters $\theta$ is found by maximizing the probability of the whole voxel set (i.e., the volume intensities) under the assumption of independent voxels:

$$P(X) = \prod_{r\in\Omega}\sum_{l\in\mathcal{L}} f_{rl}(\theta)\,b_{rl} \tag{4-2}$$
In practice, the negative log-likelihood is minimized to optimize the parameter set $\theta$, as a smoother equivalent of Equation (4-2):

$$\mathcal{L}(X) = -\sum_{r\in\Omega}\log\left(\sum_{l\in\mathcal{L}} f_{rl}(\theta)\,b_{rl}\right) \tag{4-3}$$
Instead of using the common log-likelihood as the cost function, we propose to maximize the amount of information contained in the image $X$. To this end, we consider the α-order Renyi's entropy as the cost function to be maximized with respect to the set of parameters $\theta$:

$$\max_\theta H_\alpha(X)\equiv\min_\theta\ -\frac{1}{1-\alpha}\log\left(\int_\Omega p^\alpha(x_r)\right) \tag{4-4}$$
4.1.1 Parameter optimization
As is common, we assume that the conditional tissue probability follows a normal distribution, so that:

$$f_{rl}(\theta) = \gamma_l\,\mathcal{N}(x_r\mid\mu_l,\sigma^2_l) = \frac{\gamma_l}{\sqrt{2\pi\sigma^2_l}}\exp\left(-\frac{|x_r-\mu_l|^2}{2\sigma^2_l}\right), \tag{4-5}$$

where $\mu_l$ and $\sigma^2_l$ are the intensity mean and variance of the class $l$, respectively. $\gamma_l\in[0,1]$ is the prior probability that any voxel, irrespective of its intensity, belongs to the $l$-th class, subject to $\sum_{l\in\mathcal{L}}\gamma_l = 1$. Consequently, the parameter set becomes $\theta = \{\gamma_l, \mu_l, \sigma^2_l : l\in\mathcal{L}\}$.
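To make the generative model concrete, the sketch below evaluates Equation (4-5) for a 1-D vector of voxel intensities, the posterior of Equation (4-1a), and the baseline negative log-likelihood of Equation (4-3). It assumes that the TPM is given as an R×L array of priors b_rl. The function names are hypothetical.

```python
import numpy as np


def class_likelihoods(x, gamma, mu, var):
    """f_rl = gamma_l N(x_r | mu_l, var_l) of Eq. (4-5), returned as an (R, L) array."""
    x = np.asarray(x, dtype=float)[:, None]
    return gamma / np.sqrt(2.0 * np.pi * var) * np.exp(-(x - mu) ** 2 / (2.0 * var))


def posterior_and_nll(x, tpm, gamma, mu, var):
    """Posteriors p(l_r = l | x_r) (Eq. 4-1a) and negative log-likelihood (Eq. 4-3).

    tpm: (R, L) array of atlas priors b_rl.
    """
    joint = class_likelihoods(x, gamma, mu, var) * tpm   # f_rl * b_rl
    evidence = joint.sum(axis=1)                         # sum over tissues l
    posterior = joint / evidence[:, None]
    return posterior, float(-np.sum(np.log(evidence)))


# Usage: two voxels and two tissues with means 0 and 10 under flat priors.
x = np.array([0.0, 10.0])
tpm = np.full((2, 2), 0.5)
gamma, mu, var = np.array([0.5, 0.5]), np.array([0.0, 10.0]), np.array([1.0, 1.0])
post, nll = posterior_and_nll(x, tpm, gamma, mu, var)
labels = post.argmax(axis=1)  # Eq. (4-1): voxel-wise MAP labels
```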
Here, the Expectation-Maximization (EM) algorithm minimizes a given energy function $E$ with respect to the parameters $\theta$ and an introduced distribution $Q = \{q_{rl}\in[0,1] : \forall r\in\Omega, l\in\mathcal{L}\}$:

$$-H_\alpha\leq E = -H_\alpha + \sum_{l\in\mathcal{L}} D_\alpha\left(q_{rl}\,\|\,p(l_r = l\mid x_r)\right) \tag{4-6}$$

This new energy function works as an upper bound on the proposed cost function and is composed of two terms. The first term only considers the α-order Renyi's entropy, while the second term, $D_\alpha(\cdot\|\cdot)$, corresponds to the α-order Renyi's divergence between the posterior probability and the introduced distribution $Q$. Using Equation (4-6), the optimization problem in Equation (4-4) is replaced by:
$$\min_{\theta,Q} E = -\frac{1}{1-\alpha}\log\left(\int_\Omega p^\alpha(x_r)\right) + \sum_{l\in\mathcal{L}}\int_\Omega\frac{1}{\alpha-1}\log\left(\frac{q^\alpha_{rl}}{p^{\alpha-1}(l_r = l\mid x_r)}\right) \tag{4-7}$$

EM optimization iteratively updates the parameters by alternating between the following two steps:
E-step: The energy function $E$ is minimized w.r.t. $Q$. Since the first term does not depend on $Q$, the introduced distribution only has to minimize the α-divergence, and the problem becomes:

$$\min_Q E\equiv\min_Q\sum_{l\in\mathcal{L}}\int_\Omega\frac{1}{\alpha-1}\log\left(\frac{q^\alpha_{rl}}{p^{\alpha-1}(l_r = l\mid x_r)}\right)$$

Given that the α-divergence attains its minimum, $D_\alpha = 0$, the solution for $Q$ is:

$$q_{rl} = p^{(\alpha-1)/\alpha}(l_r = l\mid x_r). \tag{4-8}$$

In addition, the property $\sum_{l\in\mathcal{L}} p(l_r = l\mid x_r) = 1$ leads to the following restriction:

$$\sum_{l\in\mathcal{L}} q^{\alpha/(\alpha-1)}_{rl} = 1. \tag{4-9}$$
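M-step: The energy function $E$ is minimized w.r.t. $\theta$. Given the results of the E-step, the Renyi's divergence is zero whenever the relation in Equation (4-8) holds. Hence, as in Equation (4-4), the optimization problem for the M-step consists only of minimizing the Renyi's entropy term. Using Bayes' theorem, the properties of the conditional probability, and the results in Equations (4-8) and (4-9), one can introduce the distribution $Q$ into $H_\alpha(X)$:

$$\begin{aligned}
H_\alpha(X) &= -\frac{1}{1-\alpha}\log\left(\int_\Omega p^\alpha(x_r)\right) = -\frac{1}{1-\alpha}\log\left(\int_\Omega\frac{p^\alpha(x_r, l_r = l)}{p^\alpha(l_r = l\mid x_r)}\right)\\
&= -\frac{1}{1-\alpha}\log\left(\int_\Omega\frac{p^\alpha(x_r\mid l_r = l)\,p^\alpha(l_r = l)}{q^{2\alpha/(\alpha-1)}_{rl}}\right)\\
&= -\frac{1}{1-\alpha}\log\left(\int_\Omega\sum_{l\in\mathcal{L}} q^{\alpha/(\alpha-1)}_{rl}\,\frac{p^\alpha(x_r\mid l_r = l)\,p^\alpha(l_r = l)}{q^{2\alpha/(\alpha-1)}_{rl}}\right)\\
H_\alpha(X) &= \frac{1}{\alpha-1}\log\left(\int_\Omega\sum_{l\in\mathcal{L}}\frac{f^\alpha_{rl}\,b^\alpha_{rl}}{q^\alpha_{rl}}\right)
\end{aligned} \tag{4-10}$$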
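As a result, the optimization for the M-step is rewritten as:

$$\min_\theta\ -\frac{1}{1-\alpha}\log\left(\int_\Omega\sum_{l\in\mathcal{L}}\frac{f^\alpha_{rl}\,b^\alpha_{rl}}{q^\alpha_{rl}}\right) \tag{4-11}$$

Since $\alpha\in[0,1)$, the factor $-1/(1-\alpha)$ is negative and the logarithm is monotonically increasing. Therefore, minimizing the function in Equation (4-11) is equivalent to maximizing the argument of the log function, known as the Information Potential (IP):

$$\mathcal{V}(X) = \int_\Omega\sum_{l\in\mathcal{L}}\left(\frac{f_{rl}(\theta)\,b_{rl}}{q_{rl}}\right)^\alpha$$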
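Given that only a finite number of samples can be measured in the image, the IP for a given image $X$ is approximated as:

$$\mathcal{V}(X)\approx V(X) = \sum_{r\in\Omega}\sum_{l\in\mathcal{L}}\left(\frac{f_{rl}\,b_{rl}}{q_{rl}}\right)^\alpha \tag{4-12}$$

Finally, optimal parameter values are found where the derivatives of $V$ with respect to $\theta$ are zero:

$$\frac{dV(X)}{d\theta} = \alpha\sum_{r\in\Omega}\left(\frac{b_{rl}}{q_{rl}}\right)^\alpha f^{\alpha-1}_{rl}\,\frac{df_{rl}}{d\theta} = 0$$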
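By differentiating Equation (4-12) with respect to the mean $\mu_l$, the following expression is attained:

$$\frac{dV(X)}{d\mu_l} = \alpha\sum_{r\in\Omega}\left(\frac{b_{rl}f_{rl}}{q_{rl}}\right)^\alpha\frac{x_r-\mu_l}{\sigma^2_l}, \tag{4-13}$$

which, solved for $dV/d\mu_l = 0$, results in the updating rule:

$$\mu^{(k+1)}_l = \frac{\sum_{r\in\Omega}\left(\frac{b_{rl}f_{rl}}{q_{rl}}\right)^\alpha x_r}{\sum_{r\in\Omega}\left(\frac{b_{rl}f_{rl}}{q_{rl}}\right)^\alpha} \tag{4-14}$$

Likewise, the derivative of $V$ with respect to the variance parameter $\sigma^2_l$ is:

$$\frac{dV(X)}{d\sigma^2_l} = \frac{\alpha}{2\sigma^2_l}\sum_{r\in\Omega}\left(\frac{b_{rl}f_{rl}}{q_{rl}}\right)^\alpha\left(\frac{(x_r-\mu_l)^2}{\sigma^2_l} - 1\right). \tag{4-15}$$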
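Hence, the variance is updated according to the following rule:

$$(\sigma^2_l)^{(k+1)} = \frac{\sum_{r\in\Omega}\left(\frac{b_{rl}f_{rl}}{q_{rl}}\right)^\alpha\left(x_r-\mu^{(k+1)}_l\right)^2}{\sum_{r\in\Omega}\left(\frac{b_{rl}f_{rl}}{q_{rl}}\right)^\alpha} \tag{4-16}$$

Following the same derivation scheme, the updating function for the prior parameter $\gamma_l$ is given by:

$$\gamma^{(k+1)}_l = \frac{\sum_{r\in\Omega}\left(\frac{b_{rl}f_{rl}}{q_{rl}}\right)^\alpha\mathcal{N}(x_r\mid\mu_l,\sigma^2_l)}{\sum_{r\in\Omega}\left(\frac{b_{rl}f_{rl}}{q_{rl}}\right)^\alpha} \tag{4-17}$$

As a result, the EM algorithm alternately updates $Q$, using Equation (4-8), and $\mu_l$, $\sigma^2_l$, $\gamma_l$, using Equations (4-14), (4-16), and (4-17), until the convergence criteria are met.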
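One iteration of the resulting EM scheme can be sketched as follows. It implements Equations (4-8), (4-14), (4-16), and (4-17) as written, for a 1-D vector of intensities. Two choices are our assumptions rather than part of the derivation. First, posteriors are floored at 1e-12 before the power in Equation (4-8), whose exponent is negative for α∈(0,1). Second, γ is renormalized to sum to one, since Equation (4-17) alone does not enforce the constraint Σ_l γ_l = 1. The function name is hypothetical.

```python
import numpy as np


def renyi_em_step(x, tpm, gamma, mu, var, alpha):
    """One EM iteration of the Renyi-entropy cost (Eqs. 4-8, 4-14, 4-16, 4-17).

    x: (R,) intensities; tpm: (R, L) priors b_rl; gamma, mu, var: (L,) arrays;
    alpha: entropy order in (0, 1).
    """
    x = np.asarray(x, dtype=float)
    dens = np.exp(-(x[:, None] - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    f = gamma * dens                                        # f_rl, Eq. (4-5)
    post = f * tpm / np.sum(f * tpm, axis=1, keepdims=True)
    post = np.maximum(post, 1e-12)                          # assumption: avoid 0 ** negative
    q = post ** ((alpha - 1.0) / alpha)                     # E-step, Eq. (4-8)
    w = (tpm * f / q) ** alpha                              # voxel weights (b f / q)^alpha
    wsum = w.sum(axis=0)
    mu_new = (w * x[:, None]).sum(axis=0) / wsum                    # Eq. (4-14)
    var_new = (w * (x[:, None] - mu_new) ** 2).sum(axis=0) / wsum   # Eq. (4-16)
    gamma_new = (w * dens).sum(axis=0) / wsum                       # Eq. (4-17)
    gamma_new = gamma_new / gamma_new.sum()                 # assumption: keep sum(gamma) = 1
    return gamma_new, mu_new, var_new


# Usage: two well-separated intensity clusters under flat priors.
x = np.array([0.0, 0.1, -0.1, 10.0, 10.1, 9.9])
tpm = np.full((6, 2), 0.5)
gamma, mu, var = np.array([0.5, 0.5]), np.array([1.0, 9.0]), np.array([4.0, 4.0])
for _ in range(5):
    gamma, mu, var = renyi_em_step(x, tpm, gamma, mu, var, alpha=0.5)
```

The weights (b_rl f_rl / q_rl)^α are the only place where α enters the updates. As α → 0 they all tend to 1 and every voxel counts equally, which matches the discussion in the chapter summary.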
4.2 Experiments and Results
The experiments in this chapter are carried out on the SDB1 and SDB2 collections for segmenting five compartments, namely white matter (WM), gray matter (GM), cerebrospinal fluid (CSF), skull (SK), and scalp (SC). The tissue probability maps (TPMs) provided by the SPM software are used as priors to segment the volumes [Ashburner and Friston, 2005]. We assess the segmentation performance using the Dice Similarity Index defined in Equation (3-12a).
4.2.1 Evaluation of performed segmentation
First, we analyze the influence of the α order on the optimization process. Figure 4-1 depicts the information-based cost function as a function of the number of iterations for several α values. As expected, the entropy decreases with its order: $H_\alpha(X) < H_{\alpha'}(X)$ for $0\leq\alpha'<\alpha<1$. That is, the larger the entropy order, the smaller its value. Moreover, the EM algorithm converges faster for smaller orders.
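The ordering H_α < H_α′ for α′ < α can be checked on a discrete distribution with the textbook Rényi entropy H_α(p) = log(Σ_i p_i^α)/(1−α), using the Shannon entropy as the α→1 limit. This is a stand-alone illustration on a toy probability vector, not the image-based cost of Equation (4-4).

```python
import numpy as np


def renyi_entropy(p, alpha):
    """Renyi entropy log(sum_i p_i^alpha) / (1 - alpha) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                                   # zero-probability outcomes do not count
    if np.isclose(alpha, 1.0):
        return float(-np.sum(p * np.log(p)))       # Shannon entropy, the alpha -> 1 limit
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))


# Usage: the entropy decreases as the order grows.
p = [0.5, 0.25, 0.25]
orders = [0.0, 0.5, 0.9, 1.0]
values = [renyi_entropy(p, a) for a in orders]
```

For α = 0 the entropy equals the log of the support size. With full support over |Ω| outcomes, this gives log|Ω| = −log(1/|Ω|), the value quoted for H_0 in the chapter summary.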
[Figure 4-1 plot: H_α versus the number of iterations (0 to 50) for α = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 0.99.]
Figure 4-1: α-order Renyi's entropy versus the number of iterations of the optimization procedure, for several α values and a given image in the dataset.
Figure 4-2 shows the influence of the entropy order on the segmentation performance, plotting the DI versus α for the available noise intensities in SDB1. As seen, the α order drives the accuracy: the Dice index is larger for mid-range orders than for small or large ones. Moreover, the highest segmentation accuracy is generally achieved at α=0.5. It is also worth noting that the algorithm is robust under conventional noise levels. For the largest noise level (9%), the performance decreases markedly, since such levels introduce high variations in the tissue distributions, hindering the convergence of the algorithm.

Finally, we compare the attained results against the log-likelihood cost function in Equation (4-3).
[Figure 4-2 plot: average Dice index versus α (0 to 0.8) for noise levels of 1%, 3%, 5%, 7%, and 9%.]
Figure 4-2: Average Dice similarity index versus the entropy order for available image noise intensities.
The achieved segmentation accuracy is computed using optimal α=0.5 for each considered
structure. As shown in Table 4-1, the proposed Renyi’s entropy outperforms the baseline
log-likelihood.
            Proposed Entropy                         Baseline Log-likelihood
Noise       1%     3%     5%     7%     9%           1%     3%     5%     7%     9%
Average     85.38  85.22  84.66  84.69  82.78        81.08  80.25  79.94  79.73  77.88
SC          88.61  89.28  86.86  86.92  88.11        84.43  84.97  82.17  82.87  83.15
SK          67.73  67.90  68.40  68.46  68.46        63.16  63.28  64.17  64.26  63.65
CSF         73.03  73.15  71.66  71.57  68.59        68.52  68.60  67.30  67.28  64.35
GM          89.60  88.50  89.73  89.72  84.15        84.88  84.18  85.39  84.89  79.27
WM          90.22  89.12  90.51  90.53  85.77        85.72  85.08  85.85  86.11  80.99
Table 4-1: Dice index for each structure at optimal α = 0.5
4.3 Summary
In this chapter, we have discussed the use of information-based measures in the parameter optimization scheme for MRI segmentation. In particular, we introduce the α-order Renyi's entropy as a new cost function for finding the tissue distribution parameters under the assumption of normally distributed classes. Additionally, we have derived the updating equations for an EM-based optimization of the considered function. As a result, parameters are updated from weighted averages (see Equations (4-14), (4-16), and (4-17)), where the influence of the $r$-th voxel on each parameter is given by $(b_{rl}f_{rl}/q_{rl})^\alpha$.
Figure 4-1 illustrates the relationship between different entropy orders: in the range [0, 1], the larger the order, the smaller the information measure. In fact, the maximum possible value of the Renyi's entropy is achieved when $\alpha = 0$, corresponding to $H_0 = -\log(1/|\Omega|)$. We also find that the number of iterations needed for convergence grows with the order. This is due to the influence of α on the probability values: the entropy tends to weigh all the events (the voxels) more evenly as α tends to zero, regardless of their probability, i.e., $(b_{rl}f_{rl}/q_{rl})^\alpha\to 1$. Conversely, for large α values, the entropy is determined by the most probable events.
Regarding the segmentation results in Figure 4-2, we obtain the maximum performance at α=0.5. For this value, we compare the proposed cost function against the log-likelihood as the baseline approach. The Dice indexes in Table 4-1 show that our scheme outperforms the baseline, indicating that the parameters obtained with the entropy function are more discriminative than those obtained with the log-likelihood.
5 Multi-atlas label fusion using supervised local weighting
This chapter introduces a multi-atlas weighted label fusion approach that takes advantage of supervised learning to fuse labels and improve the segmentation accuracy of brain MR images. Namely, we employ knowledge about both the neighborhood and the patch structure to be segmented. To this end, we assume a voxel-wise feature extraction procedure based on a spatially varying linear combination of patch intensities (like gradients, Laplacians, and non-local means). The parameters of such a linear combination are locally computed in a supervised learning scheme by maximizing the match between the local labels and the extracted features, aiming to attain a more discriminative voxel representation. In particular, we use the centered kernel alignment (CKA) criterion, which assesses the correlation between a pair of kernel matrices [Cortes et al., 2012]. Then, we benefit from the neighborhood-wise analysis, which provides more information about the local tissue structure and reduces the influence of small target-atlas registration errors.
5.1 Feature-based label fusion within α-neighborhoods
From a pattern recognition point of view, label fusion builds a set of discriminative functions, denoted as $G = \{g_l(r):\Omega\to\mathbb{R}^+, \forall l\in\mathcal{L}\}$, scoring the membership of the $r$-th voxel to the $l$-th class, so that the larger the score, the more likely the voxel belongs to that class. Consequently, each voxel label is attained as:

$$l^q_r = \arg\max_{l\in\mathcal{L}} g_l(r) \tag{5-1}$$
In practice, registration issues, imaging artifacts, and intricate shape structures degrade the label estimation. To further increase the segmentation performance, the discriminating functions are written as a majority voting scheme that depends on neighboring voxels:

$$g_l(r) = \frac{1}{N}\sum_{n=1}^{N}\sum_{s\in B_r}\nu^n_s\,b^n_{sl} \tag{5-2}$$

$$\nu^n_s = \nu(\mathbf{y}^q_r, \mathbf{y}^n_s);\quad\forall s\in B_r, n\in[1, N], \tag{5-3}$$
where $\mathbf{y}^n_s\in\mathcal{Y}$ is a feature vector extracted at location $s$ from the $n$-th intensity image, and $B_r = \{s\in\Omega^q : \|s-r\|_\Omega\leq\alpha\}$ is a neighborhood centered at $r$ with radius $\alpha\in\mathbb{R}^+$. The scalars $\nu^n_s\in\mathbb{R}^+$ measure the similarity between voxels $r$ and $s$ in the feature space $\mathcal{Y}$ through the function $\nu(\cdot,\cdot)$. The notation $\|\cdot\|_\Omega$ stands for the norm defined on the spatial domain $\Omega$, and $\|\cdot\|_2$ for the $L_2$-norm.
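We then rewrite the discriminating functions in terms of the weighting factors as $g_l(r) = \boldsymbol{\nu}^\top_r\mathbf{d}_{rl}$, dropping the constant factor $1/N$, which does not change the maximizer. Here, $\boldsymbol{\nu}_r = \{\nu^n_s\in\mathbb{R}^+ : s\in B_r, n\in[1..N]\}$, with $\boldsymbol{\nu}_r\in\mathbb{R}^S$, holds the weights of the $S$ labeled voxels contributing to the segmentation of the query image at location $r$ ($S\leq N|B_r|$), and the vector $\mathbf{d}_{rl}\in\{0,1\}^S$ comprises the votes for the $l$-th tissue as $\mathbf{d}_{rl} = \{b^n_{sl} : s\in B_r, n\in[1..N]\}$. Consequently, the segmentation problem stated in Equation (5-1) becomes:

$$l^q_r = \arg\max_{l\in\mathcal{L}}\boldsymbol{\nu}^\top_r\mathbf{d}_{rl}, \tag{5-4}$$

and the weighting factors in Equation (5-4) now depend on the similarity to the query voxel $r$ in the feature space $\mathcal{Y}$.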
5.1.1 Supervised feature learning based on centered kernel alignment
Here, we propose to compute the feature vectors in Equation (5-3) using the following pro-
jection:
yns = Wrx
ns , (5-5)
where $\mathbf{x}^n_s=\{x^n_t:\|t-s\|_\Omega\le\beta,\ t\in\Omega\}$ denotes the vector containing the $p\in\mathbb{Z}^+$ voxel intensities within a patch centered at $s$ with radius $\beta\in\mathbb{R}^+$, termed the $\beta$-patch, and the $h\times p$ matrix $\mathbf{W}_r=\{w_{ut}\in\mathbb{R}: u\in[1..h],\ t\in[1..p]\}$ linearly projects the $\beta$-patch onto the $h$-dimensional feature space $\mathcal{Y}$. Therefore, we may assess the space-varying weights in Equation (5-3) using a Gaussian kernel as follows:
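$$\nu^n_s(\mathbf{W}_r) = \exp\left(-\frac{\|\mathbf{W}_r\mathbf{x}^q_r-\mathbf{W}_r\mathbf{x}^n_s\|_2^2}{2\sigma^2}\right);\quad\forall s\in\mathcal{B}_r \tag{5-6}$$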
When the same projection is shared by all locations ($\mathbf{W}_r=\mathbf{W}$), the projection matrix can be seen as a bank of $h$ 3D convolution masks with radius $\beta$, e.g., averaging, Laplacian, and gradient filters.
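The projected Gaussian weights of Equations (5-5) and (5-6) can be sketched as follows. This is an illustration under our own conventions (patches flattened into the rows of a matrix; the function name `patch_weights` is hypothetical). Setting $\mathbf{W}_r$ to the identity recovers plain patch-similarity weights.

```python
import numpy as np


def patch_weights(W_r, x_query, X_atlas, sigma):
    """Voting weights nu^n_s(W_r) of Equation (5-6).

    W_r     : (h, p) projection matrix.
    x_query : (p,) flattened beta-patch x^q_r of the query voxel.
    X_atlas : (S, p) flattened beta-patches x^n_s, one labeled voxel per row.
    sigma   : Gaussian kernel bandwidth.
    """
    y_query = W_r @ x_query        # Equation (5-5) for the query patch
    Y_atlas = X_atlas @ W_r.T      # Equation (5-5) for every atlas patch, shape (S, h)
    sq_dist = np.sum((Y_atlas - y_query) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


x_q = np.zeros(3)
X_n = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(patch_weights(np.eye(3), x_q, X_n, sigma=1.0))                    # weights 1 and exp(-0.5), about 0.607
print(patch_weights(np.array([[0.0, 1.0, 0.0]]), x_q, X_n, sigma=1.0))  # both weights equal 1
```

The second call shows why the projection matters: a $\mathbf{W}_r$ that ignores the first intensity makes the two patches indistinguishable, so learning $\mathbf{W}_r$ decides which patch differences affect the vote.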
For the resulting feature space to improve tissue discrimination, the joint information between the feature vectors and their corresponding labels should be maximal. A practical estimate of such joint information is given by the centered kernel alignment (CKA) score [Cortes et al., 2012]:
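$$\rho(\mathcal{Y},\mathcal{B}) = \frac{\mathbb{E}_{\mathbf{y}\mathbf{y}',\mathbf{b}\mathbf{b}'}\left\{\bar\kappa_Y\left(\mathbf{y}^n_s,\mathbf{y}^{n'}_{s'}\right)\bar\kappa_B\left(\mathbf{b}^n_s,\mathbf{b}^{n'}_{s'}\right)\right\}}{\sqrt{\mathbb{E}_{\mathbf{y}\mathbf{y}'}\left\{\bar\kappa_Y^2\left(\mathbf{y}^n_s,\mathbf{y}^{n'}_{s'}\right)\right\}\mathbb{E}_{\mathbf{b}\mathbf{b}'}\left\{\bar\kappa_B^2\left(\mathbf{b}^n_s,\mathbf{b}^{n'}_{s'}\right)\right\}}}, \tag{5-7}$$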
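where $\bar\kappa(\cdot,\cdot)$ is the centered version of the kernel function $\kappa(\cdot,\cdot)$, given by:
$$\bar\kappa(\mathbf{z},\mathbf{z}') = \kappa(\mathbf{z},\mathbf{z}') - \mathbb{E}_{\mathbf{z}'}\{\kappa(\mathbf{z},\mathbf{z}')\} - \mathbb{E}_{\mathbf{z}}\{\kappa(\mathbf{z},\mathbf{z}')\} + \mathbb{E}_{\mathbf{z}\mathbf{z}'}\{\kappa(\mathbf{z},\mathbf{z}')\}. \tag{5-8}$$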
Therefore, $\rho\in[0,1]$ estimates the statistical dependence between the feature and label spaces ($\mathcal{Y}$ and $\mathcal{L}$): the more the pairwise similarities in one space agree with those in the other, the larger the $\rho$ score.
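Here, the function $\kappa_B(\mathbf{b}^n_s,\mathbf{b}^{n'}_{s'})=\delta(\|\mathbf{b}^n_s-\mathbf{b}^{n'}_{s'}\|_2)$ defines the label agreement between any pair of voxels, where $\delta(\cdot)$ is the delta function. In addition, the pairwise feature similarity can be computed using Equation (5-6) as $\kappa_Y(\mathbf{y}^n_s,\mathbf{y}^{n'}_{s'})=\exp\left(-\|\mathbf{W}_r\mathbf{x}^{n'}_{s'}-\mathbf{W}_r\mathbf{x}^n_s\|_2^2/(2\sigma^2)\right)$.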
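Finally, we enclose all the feature and label similarities in the following kernel matrices:
$$\mathbf{K}_Y(\mathbf{W}_r,\sigma) = \left\{\kappa_Y\left(\mathbf{y}^n_s,\mathbf{y}^{n'}_{s'}\right): s,s'\in\mathcal{B}_r;\ n,n'\in[1,N]\right\} \tag{5-9a}$$
$$\mathbf{K}_L = \left\{\kappa_B\left(\mathbf{b}^n_s,\mathbf{b}^{n'}_{s'}\right): s,s'\in\mathcal{B}_r;\ n,n'\in[1,N]\right\} \tag{5-9b}$$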
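Both kernel matrices, $\mathbf{K}_Y\in[0,1]^{S\times S}$ and $\mathbf{K}_L\in[0,1]^{S\times S}$, allow the CKA to be estimated empirically, in accordance with [Brockmeier et al., 2014]:
$$\rho(\mathbf{W}_r,\sigma) = \frac{\langle\bar{\mathbf{K}}_Y,\bar{\mathbf{K}}_L\rangle_F}{\sqrt{\langle\bar{\mathbf{K}}_Y,\bar{\mathbf{K}}_Y\rangle_F\,\langle\bar{\mathbf{K}}_L,\bar{\mathbf{K}}_L\rangle_F}}, \tag{5-10}$$
where $\langle\cdot,\cdot\rangle_F$ stands for the matrix-based Frobenius inner product, so that the denominator equals $\|\bar{\mathbf{K}}_Y\|_F\|\bar{\mathbf{K}}_L\|_F$. The notation $\bar{\mathbf{K}}$ corresponds to the centered version of the kernel matrix $\mathbf{K}$, calculated as $\bar{\mathbf{K}}=\mathbf{H}\mathbf{K}\mathbf{H}$, where $\mathbf{H}=\mathbf{I}-\mathbf{1}\mathbf{1}^\top/S$ is the empirical centering matrix, $\mathbf{I}\in\mathbb{R}^{S\times S}$ the identity matrix, and $\mathbf{1}\in\mathbb{R}^S$ an all-ones vector.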
Thus, maximizing Equation (5-10) with respect to $\mathbf{W}_r$ yields weighting factors that enhance label discrimination. Accordingly, using the introduced supervised feature space within the majority voting scheme is termed Centered Kernel Alignment-based Label Fusion (CKA-LF).
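The empirical estimate in Equation (5-10) reduces to a few matrix products. The sketch below assumes integer tissue labels, so that the delta kernel of Equation (5-9b) becomes a label-equality test; the function names are hypothetical.

```python
import numpy as np


def centered(K):
    """Empirical centering H K H with H = I - 1 1^T / S."""
    S = K.shape[0]
    H = np.eye(S) - np.ones((S, S)) / S
    return H @ K @ H


def label_kernel(labels):
    """K_L of Equation (5-9b): 1 when two voxels share a tissue label, else 0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def empirical_cka(K_Y, K_L):
    """Empirical CKA of Equation (5-10) between a feature kernel and a label kernel."""
    A, B = centered(K_Y), centered(K_L)
    inner = np.sum(A * B)                       # Frobenius inner product <A, B>_F
    return inner / np.sqrt(np.sum(A * A) * np.sum(B * B))


labels = np.array([0, 0, 1, 1, 2, 2])
print(empirical_cka(label_kernel(labels), label_kernel(labels)))  # 1.0 up to rounding
```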
5.1.2 CKA-LF optimization using gradient descent
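Taking the negative logarithm of the empirical CKA in Equation (5-10), and using $\langle\mathbf{H}\mathbf{A}\mathbf{H},\mathbf{H}\mathbf{B}\mathbf{H}\rangle_F=\mathrm{tr}(\mathbf{A}\mathbf{H}\mathbf{B}\mathbf{H})$ for symmetric $\mathbf{A},\mathbf{B}$, yields the objective function:
$$F(\mathbf{W}_r,\sigma) = -\log\left(\mathrm{tr}\left(\mathbf{K}_Y(\mathbf{W}_r,\sigma)\mathbf{H}\mathbf{K}_L\mathbf{H}\right)\right) + \frac{1}{2}\log\left(\mathrm{tr}\left(\mathbf{K}_Y(\mathbf{W}_r,\sigma)\mathbf{H}\mathbf{K}_Y(\mathbf{W}_r,\sigma)\mathbf{H}\right)\right) + \rho_0, \tag{5-11}$$
where $\rho_0=\frac{1}{2}\log\left(\mathrm{tr}(\mathbf{K}_L\mathbf{H}\mathbf{K}_L\mathbf{H})\right)$ is a scalar independent of $\mathbf{W}_r$. Since $F=-\log\rho$, maximizing the CKA is equivalent to minimizing $F$ over two variables: the projection matrix $\mathbf{W}_r$ and the Gaussian kernel bandwidth $\sigma$. We estimate each parameter alternately in an iterative approach. Fixing $\sigma$, gradient descent updates $\mathbf{W}_r$ using the derivative of the objective function in Equation (5-11) with respect to $\mathbf{W}_r$, given by: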
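$$\frac{\partial F(\mathbf{W}_r)}{\partial\mathbf{W}_r} = -4\mathbf{X}_r^\top\left((\mathbf{G}\circ\mathbf{K}_Y)-\mathrm{diag}\left(\mathbf{1}^\top(\mathbf{G}\circ\mathbf{K}_Y)\right)\right)\mathbf{Y}_r, \tag{5-12}$$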
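with $\mathrm{diag}(\cdot)$ and $\circ$ denoting the diagonal operator and the Hadamard product, respectively. The matrices $\mathbf{X}_r\in\mathbb{R}^{S\times p}$ and $\mathbf{Y}_r\in\mathbb{R}^{S\times h}$ are built from neighboring voxels as $\mathbf{X}_r=\{\mathbf{x}^n_s: s\in\mathcal{B}_r, n\in[1..N]\}$ and $\mathbf{Y}_r=\{\mathbf{y}^n_s=\mathbf{W}_r\mathbf{x}^n_s: s\in\mathcal{B}_r, n\in[1..N]\}$. The matrix $\mathbf{G}\in\mathbb{R}^{S\times S}$ is the negative gradient of the objective function with respect to $\mathbf{K}_Y$ (that is, the gradient of $\log\rho$), computed as:
$$\mathbf{G} = -\frac{dF}{d\mathbf{K}_Y} = \frac{\mathbf{H}\mathbf{K}_L\mathbf{H}}{\mathrm{tr}(\mathbf{K}_Y\mathbf{H}\mathbf{K}_L\mathbf{H})} - \frac{\mathbf{H}\mathbf{K}_Y\mathbf{H}}{\mathrm{tr}(\mathbf{K}_Y\mathbf{H}\mathbf{K}_Y\mathbf{H})}. \tag{5-13}$$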
$\mathbf{W}_r$ is then updated following the rule:
$$\mathbf{W}^{t+1}_r = \mathbf{W}^t_r - \mu_t\frac{\partial F(\mathbf{W}_r)}{\partial\mathbf{W}^t_r}, \tag{5-14}$$
where $\mu_t\in\mathbb{R}^+$ and $\mathbf{W}^t_r$ are the learning rate and the estimated projection at iteration $t$, respectively. Then, fixing $\mathbf{W}_r$, we set the bandwidth. Since this parameter scales all pairwise distances in the projected space $\mathcal{Y}^t$, we estimate $\sigma_t$ through the Kernel Function Estimation from Information Potential Variability (KEIPV) criterion introduced in Section 2.2. That is, we choose the bandwidth that maximizes the overall variability of the information potential of the projected samples $\mathbf{Y}_r$, which spreads the magnitudes of the information forces more widely.
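A sketch of one gradient step on Equation (5-11) with $\sigma$ held fixed is given below; the KEIPV bandwidth update is not included. Differentiating the Gaussian kernel directly gives $-(2/\sigma^2)\,\mathbf{Y}_r^\top\big((\mathbf{G}\circ\mathbf{K}_Y)-\mathrm{diag}(\mathbf{1}^\top(\mathbf{G}\circ\mathbf{K}_Y))\big)\mathbf{X}_r$, which is the transpose of the matrix in Equation (5-12) up to a positive scale factor (absorbable into $\mu_t$) and has the same $h\times p$ shape as $\mathbf{W}_r$. Patches are stored as rows of `X`, $\mathbf{G}$ follows Equation (5-13), and all function names are hypothetical.

```python
import numpy as np


def _centering(S):
    return np.eye(S) - np.ones((S, S)) / S


def _gaussian_kernel(Y, sigma):
    diff = Y[:, None, :] - Y[None, :, :]       # pairwise differences, shape (S, S, h)
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))


def objective(W, X, K_L, sigma):
    """F(W_r, sigma) of Equation (5-11) without the constant rho_0.

    W : (h, p) projection; X : (S, p) patches as rows; K_L : (S, S) label kernel.
    """
    H = _centering(X.shape[0])
    K_Y = _gaussian_kernel(X @ W.T, sigma)
    return (-np.log(np.trace(K_Y @ H @ K_L @ H))
            + 0.5 * np.log(np.trace(K_Y @ H @ K_Y @ H)))


def gradient(W, X, K_L, sigma):
    """dF/dW_r for the Gaussian kernel, returned with the (h, p) shape of W_r."""
    H = _centering(X.shape[0])
    Y = X @ W.T                                # rows are y = W_r x
    K_Y = _gaussian_kernel(Y, sigma)
    HKLH, HKYH = H @ K_L @ H, H @ K_Y @ H
    # G as in Equation (5-13); it equals -dF/dK_Y.
    G = HKLH / np.trace(K_Y @ HKLH) - HKYH / np.trace(K_Y @ HKYH)
    A = G * K_Y                                # Hadamard product G o K_Y
    M = A - np.diag(A.sum(axis=0))             # (G o K_Y) - diag(1^T (G o K_Y))
    return -(2.0 / sigma ** 2) * (Y.T @ M @ X)


def gd_step(W, X, K_L, sigma, mu):
    """One update of Equation (5-14) with the bandwidth sigma held fixed."""
    return W - mu * gradient(W, X, K_L, sigma)
```

In an actual run, `gd_step` would be iterated and alternated with the bandwidth estimation step described above.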
5.2 Experimental Setup
Since the contribution of the current chapter centers on the label fusion stage, atlas selection is carried out using the KAISER approach proposed in Chapter 3. Then, patch-based segmentation of a query image follows the procedure in Figure 5-1:
i) A subset of $S$ voxels is extracted from the fixed $\alpha$-neighborhood of the atlas set. Each voxel is described by its label and its $\beta$-patch (red patches in the Input Atlases block).
ii) The linear mapping $\mathbf{W}_r$ is then computed from those samples by minimizing Equation (5-11) (Mapping Learning block).
iii) Query patches are extracted from the target image for all voxels belonging to the $\alpha$-neighborhood (blue patches in the Input Query block), and all feature vectors are computed using the learned projection $\mathbf{W}_r$.
iv) Voting weights are estimated using the similarity function in the feature space defined in Equation (5-3).
v) Lastly, a weighted majority vote is carried out at each query voxel according to Equation (5-4).
Figure 5-1: Proposed multi-atlas patch-based label fusion scheme (blocks: Input Atlases, Input Query, Mapping Learning, Patch Mapping, Label Fusion).
5.2.1 Algorithm parameter setup
First, we tune the neighborhood and patch sizes, as they are the most critical parameters of the proposed CKA-LF approach. Describing each voxel by its $\beta$-patch entails a trade-off: the larger the patch, the higher the computational cost, while the more distant the elements, the lower their relevance. With this in mind, the parameters are set by an exhaustive search over $(\alpha,\beta)\in\{1,2,3\}^2$, as illustrated in Figure 5-2. We also compare two well-known patch-based approaches, namely, patch similarity-based and regression-based voting. The exhaustive search yields the following optimal $(\alpha,\beta)$ settings: (3, 1) for similarity, (2, 1) for regression, and (2, 2) for the proposed CKA-LF.
(a) Similarity
β \ α   1     2     3
1       86.9  87.1  87.2
2       86.6  87.0  87.2
3       86.0  86.9  87.2
(b) Regression
β \ α   1     2     3
1       87.4  87.4  87.2
2       87.1  87.4  87.2
3       86.7  87.3  87.2
(c) CKA-LF
β \ α   1     2     3
1       88.0  87.2  86.4
2       88.4  88.4  88.1
3       88.3  88.0  87.9
Figure 5-2: Parameter tuning for patch-based approaches by exhaustive search (rows: β radius; columns: α radius).
Examples of the label similarity matrix, the non-projected weights (i.e., $\mathbf{W}_r=\mathbf{I}$), and the supervised weights (after the CKA-based projection) are provided in Figures 5.3(a) to 5.3(c), respectively. All values are computed for the optimal patch radius over a random subset of voxels. For easier visualization, the voxels are sorted by tissue label. As seen in Figure 5-3, the supervised weights discriminate tissues better than the similarity assessed directly on the patches.
Figure 5-3: Resulting kernel matrices for a random subset of voxels before and after learning the projection matrix $\mathbf{W}_r$: (a) Reference kernel $\mathbf{K}_L$; (b) Patch-based kernel $\mathbf{K}_X$; (c) Feature-based kernel $\mathbf{K}_Y$. Voxels are sorted by tissue type (BG, Acc, Amy, Caud, Hippo, Pall, Put, Thal).
On the other hand, the influence of the $\beta$-patch radius on the resulting labeling is illustrated in Figure 5-4 for a given region of interest, where mislabeled voxels are marked in red.
As seen in Figure 5.4(a), $\beta=0$ performs the worst (DI = 86.8), since there is no patch information from which to compute the projection. Increasing the patch radius to $\beta=1$ and $\beta=2$ makes the label fusion more accurate (DI = 87.8 and 89.0, respectively), showing the benefit of incorporating spatial information into the voxel representation. However, the segmentation accuracy decreases for excessively large patches ($\beta=3$, DI = 88.8; see Figure 5.4(d)) because of two convergence issues. First, the size of the projection matrix grows rapidly with $\beta$ (the number $p$ of patch voxels scales with the cube of the radius), which hinders the convergence of the CKA to a suitable maximum. Second, the larger the radius, the more complex the distribution of the patch vectors, so the proposed linear projection cannot find a weight function that properly encodes the supervised information.
5.2.2 Patch-based segmentation performance
To compare a pair of segmentations, we consider the Dice similarity index (DI) previously defined in Equation (3-12b). The segmentation performance achieved by the examined methods is shown in Table 5-1 as the mean and standard deviation of DI. Template selection results from Table 3-1 are also included to evaluate the benefit of patch-based approaches. Although the proposed CKA-LF generally outperforms the compared approaches, the Nucleus Accumbens and Pallidum structures are better segmented by the regression scheme. It is important to highlight that the best Amygdala extraction is attained by the voxel-wise weighted majority voting (the baseline approach). The amygdala is anatomically connected to the hippocampus, and the contrast between both structures in MRI is low. Consequently, we consider that, in this particular case, including neighboring information reduces the segmentation performance.
Structure      Voting       Similarity   Regression   CKA-LF
Accumbens      80.8 ± 0.5   80.8 ± 0.7   81.2 ± 0.6   80.3 ± 0.8
Amygdala       86.5 ± 0.8   82.9 ± 0.4   83.1 ± 0.4   82.9 ± 0.5
Caudate        85.1 ± 0.5   90.6 ± 0.5   91.3 ± 0.4   92.5 ± 0.6
Hippocampus    85.1 ± 0.5   87.0 ± 0.3   87.6 ± 0.3   89.3 ± 0.4
Pallidum       88.2 ± 0.5   88.5 ± 0.4   88.7 ± 0.4   88.4 ± 0.6
Putamen        92.2 ± 0.3   92.3 ± 0.3   92.5 ± 0.3   94.0 ± 0.5
Thalamus       91.9 ± 0.2   92.7 ± 0.2   92.8 ± 0.2   95.0 ± 0.2
Average        86.3 ± 0.5   87.8 ± 0.4   88.2 ± 0.4   88.9 ± 0.5
Table 5-1: Dice index scores for the considered approaches and structures. Mean and standard deviation across subjects are reported.
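For reference, the Dice index between a segmentation $A$ and a reference $B$ of one structure is commonly computed as $2|A\cap B|/(|A|+|B|)$. The sketch below assumes that this standard form matches Equation (3-12b), which is not reproduced in this section, and expresses it as a percentage to match the scale of Table 5-1; the function name is hypothetical.

```python
import numpy as np


def dice_index(seg, ref, label):
    """Dice similarity index (in %) of one structure between two label maps."""
    a = np.asarray(seg) == label
    b = np.asarray(ref) == label
    denom = a.sum() + b.sum()
    if denom == 0:
        return float("nan")                    # structure absent from both maps
    return 100.0 * 2.0 * np.logical_and(a, b).sum() / denom


seg = np.array([[1, 1], [0, 2]])
ref = np.array([[1, 0], [0, 2]])
print(dice_index(seg, ref, 1), dice_index(seg, ref, 2))  # about 66.67 and 100.0
```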
Figure 5-4: Effect of the $\beta$ radius on a subject's volume: (a) $\beta=0$, DI = 86.8; (b) $\beta=1$, DI = 87.8; (c) $\beta=2$, DI = 89.0; (d) $\beta=3$, DI = 88.8. Mislabeled voxels are plotted in red in the axial (left), sagittal (center), and coronal (right) views.
5.3 Summary
From the validation carried out above, the following aspects emerge as relevant in developing the CKA-LF method:
First, the voting function measures the similarity between feature vectors obtained by linearly mapping the patch representation. This mapping is given by the projection matrix, which is learned in a supervised scheme so that the feature relations resemble the label relations as closely as possible. To this end, we maximize the centered kernel alignment criterion, which estimates the affinity between a pair of similarity matrices. Thus, combining the CKA criterion with the linear projection of the patches (see Figure 5-3) yields a voting function that is highly related to the tissue labels, increasing class discrimination.
The second aspect is the tuning of the patch and neighborhood sizes, which strongly influence the estimation of the mapping function. Specifically, the patch radius $\beta$ determines the projection domain and the intensity variability inside each patch, whereas the neighborhood radius $\alpha$ establishes the number of samples available for estimating the projection matrix, as well as the shape changes within the neighborhood that allow coping with small registration errors.
Therefore, the lack of patch information in the computed projection decreases the achievable label fusion accuracy when $\beta=0$, as seen in Figure 5.4(a). Nonetheless, performance decreases again for very large patches (see Figure 5.4(d)) due to the rapidly growing size of the projection matrix and the complex distribution of the patch vectors. Consequently, the algorithm cannot produce a projection matrix that suitably aligns features and labels, and hence the resulting weight functions do not properly encode the supervised information. We also investigate the effect of the search neighborhood size $\alpha$ on the voting function estimation. Although small neighborhood radii should provide more robustness to low-frequency artifacts, they provide too few patches, leading to a poorly estimated $\mathbf{W}_r$. By contrast, large values of $\alpha$ produce many patches, but the shape variability also increases, so that the projection matrix calculation becomes more complex and yields suboptimal solutions.
6 Magnetic resonance image classification using kernel-enhanced neural networks
Several computer-aided dementia diagnosis methods have been proposed to discriminate between patients with Alzheimer's disease (AD), mild cognitive impairment (MCI), and healthy controls (NC) given their MRI scans. Nonetheless, the problem is particularly challenging because of the heterogeneous and intermediate nature of MCI. To cope with this issue, we exploit the advantages of artificial neural networks (ANN) for complex classification tasks and introduce a novel supervised pre-training approach devoted to automated dementia diagnosis. The proposal initializes an ANN with linear projections that yield more discriminative spaces. Such projections are computed by maximizing the centered kernel alignment criterion, which assesses the affinity between the data kernel matrix and the label target matrix. As a result, the linear embedding accounts for the features that contribute the most to class discrimination. We contrast the proposed approach against two unsupervised initialization approaches (autoencoders and Principal Component Analysis) and against the four best-performing classification methodologies from the 2014 CADDementia challenge. Our proposal outperforms all the baselines (by about 7 percentage points in classification accuracy and area under the ROC curve) while reducing class bias.
6.1 Multi-layer perceptron-based classifier using kernels
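Within the classification framework, a multi-layer perceptron (MLP) with $L$ layers is assumed to predict a class $c\in\mathcal{C}$, within a set of labels $\mathcal{C}$, through a battery of feedforward deterministic transformations, which are implemented at the hidden layers $\{\mathbf{h}^l: l\in[1..L]\}$ by mapping an input sample $\mathbf{z}\in\mathcal{Z}$ to the network output $\mathbf{h}^L$ as below [Bengio, 2009]:
$$\begin{aligned}\mathbf{h}^l &= \phi^l(\mathbf{s}^l),\quad\forall l\in[1..L]\\ \mathbf{s}^l &= \mathbf{b}^l + \mathbf{W}^l\mathbf{h}^{l-1}\\ \mathbf{h}^0 &= \mathbf{z}\end{aligned} \tag{6-1}$$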
where $\mathbf{b}^l\in\mathbb{R}^{P_{l+1}}$ is the $l$-th offset vector, $\mathbf{W}^l\in\mathbb{R}^{P_{l+1}\times P_l}$ the $l$-th linear projection, and $P_l\in\mathbb{N}$ the size of the $l$-th layer. The function $\phi^l(\cdot)$ applies saturating, non-linear, element-wise operations. The first layer in Equation (6-1) (that is, $\mathbf{h}^0\in\mathbb{R}^P$) is fixed to the $P$-dimensional input feature vectors, so that $\mathcal{Z}\subset\mathbb{R}^P$. In turn, the output layer $\mathbf{h}^L\in\mathbb{R}^C$, with $C=|\mathcal{C}|$, works as an estimator of the posterior class probability $p(c|\mathbf{z})$ when the last saturating function $\phi^L(\cdot)$ is subject to the following constraints:
$$\phi^L(u_c)\in[0,1] \tag{6-2a}$$
$$\sum_{c\in\mathcal{C}}\phi^L(u_c) = 1 \tag{6-2b}$$
so that the maximum a posteriori classification criterion holds:
$$c^\star = \arg\max_{c\in\mathcal{C}} p(c|\mathbf{z}) = \arg\max_{c\in\mathcal{C}} h^L_c(\mathbf{z}). \tag{6-3}$$
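To train an MLP-based classifier, a set of input samples $\mathbf{Z}=\{\mathbf{z}_n\in\mathbb{R}^P: n\in[1..N]\}$ along with their corresponding expected outputs $\mathbf{c}=\{c_n\in\mathcal{C}: n\in[1..N]\}$ is provided, and a predefined cost function $\mathcal{L}(\mathbf{H}^L(\mathbf{Z}),\mathbf{c})\in\mathbb{R}$ is minimized with respect to the set of parameters $\theta=\{\mathbf{W}^l,\mathbf{b}^l: l\in[1..L]\}$.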
6.1.1 Matrix-based entropy as a cost function for MLP
Here, we use the matrix-based entropy as a cost function for learning the MLP parameters. To this end, we adopt the definition of the $\alpha$-order entropy of a symmetric positive definite matrix proposed by [Giraldo and Principe, 2013]:
$$S_\alpha(\mathbf{K}) = \frac{1}{1-\alpha}\log\left(\mathrm{tr}\left(\mathbf{K}^\alpha\right)\right) \tag{6-4}$$
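Therefore, we look for the set of parameters minimizing the matrix conditional entropy (MCE) of the expected outputs $\mathbf{c}$ given the MLP evaluations $\mathbf{h}^L_n(\mathbf{z}_n)$:
$$\begin{aligned}\underset{\theta}{\text{minimize}}\quad & \mathcal{L} = S_\alpha(\mathbf{H}^L|\mathbf{c})\\ \text{subject to}\quad & \mathbf{h}^l = \phi^l(\mathbf{s}^l)\\ & \mathbf{s}^l = \mathbf{b}^l + \mathbf{W}^l\mathbf{h}^{l-1}\end{aligned} \tag{6-5}$$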
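We now translate this problem into kernel matrices. Let the matrix $\mathbf{K}^L\in\mathbb{R}^{N\times N}$ encode the similarity between pairs of projected samples $\mathbf{h}^L_n,\mathbf{h}^L_m$, with elements $k^L_{nm}=\kappa(\mathbf{e}^L_{nm})$ and $\mathbf{e}^L_{nm}=\mathbf{h}^L_n-\mathbf{h}^L_m$, and let the output similarities be assessed by $k^C_{nm}=\kappa_C(c_n,c_m)$, enclosed in $\mathbf{K}_C\in\mathbb{R}^{N\times N}$. Then, the conditional entropy of the data assembly can be computed as:
$$S_\alpha(\mathbf{H}^L|\mathbf{c}) = S_\alpha(N\mathbf{K}^L\circ\mathbf{K}_C) - S_\alpha(\mathbf{K}^L), \tag{6-6}$$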
where $\circ$ corresponds to the Hadamard product.
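Here, the optimization problem is solved by gradient descent with backpropagated parameter derivatives. To this end, we first take advantage of the spectral properties of the kernel matrices to compute the derivative of the matrix entropy in Equation (6-4) with respect to $\mathbf{K}$ [Lewis, 1996]:
$$\nabla S_\alpha(\mathbf{K}) = \frac{\alpha}{(1-\alpha)\,\mathrm{tr}(\mathbf{K}^\alpha)}\mathbf{U}\boldsymbol{\Lambda}^{\alpha-1}\mathbf{U}^\top, \tag{6-7}$$
where $\mathbf{K}=\mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top$ is the spectral decomposition of the kernel matrix, with eigenvectors encompassed in $\mathbf{U}\in\mathbb{R}^{N\times N}$ and eigenvalues $\boldsymbol{\Lambda}=\mathrm{diag}(\lambda_1,\ldots,\lambda_N)$. The gradient of the conditional entropy with respect to the output data kernel is given by:
$$\nabla_{\mathbf{K}^L}S_\alpha(\mathbf{H}^L|\mathbf{c}) = \frac{\alpha N}{1-\alpha}\,\frac{(N\mathbf{K}^L\circ\mathbf{K}_C)^{\alpha-1}}{\mathrm{tr}\left((N\mathbf{K}^L\circ\mathbf{K}_C)^\alpha\right)}\circ\mathbf{K}_C - \frac{\alpha}{1-\alpha}\,\frac{(\mathbf{K}^L)^{\alpha-1}}{\mathrm{tr}\left((\mathbf{K}^L)^\alpha\right)} \tag{6-8}$$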
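The matrix-based entropy (6-4), its spectral gradient (6-7), the conditional entropy (6-6), and the gradient (6-8) can be sketched with an eigendecomposition. We assume unit-trace kernels with constant diagonal $1/N$ (e.g., a Gaussian Gram matrix divided by $N$), which is what makes $N\mathbf{K}^L\circ\mathbf{K}_C$ unit-trace as well, and $\alpha>1$ so that $\boldsymbol{\Lambda}^{\alpha-1}$ stays finite for zero eigenvalues. All function names are hypothetical.

```python
import numpy as np


def mat_power(K, a):
    """K^a for a symmetric PSD matrix through K = U diag(lam) U^T."""
    lam, U = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)              # remove tiny negative round-off
    return (U * lam ** a) @ U.T


def renyi_entropy(K, alpha):
    """Matrix-based alpha-order entropy, Equation (6-4)."""
    return np.log(np.trace(mat_power(K, alpha))) / (1.0 - alpha)


def renyi_entropy_grad(K, alpha):
    """Spectral gradient of Equation (6-4) with respect to K, Equation (6-7)."""
    return alpha / ((1.0 - alpha) * np.trace(mat_power(K, alpha))) * mat_power(K, alpha - 1.0)


def conditional_entropy(K_out, K_c, alpha):
    """Equation (6-6): S_alpha(N K^L o K_C) - S_alpha(K^L)."""
    N = K_out.shape[0]
    return renyi_entropy(N * K_out * K_c, alpha) - renyi_entropy(K_out, alpha)


def conditional_entropy_grad(K_out, K_c, alpha):
    """Equation (6-8): gradient of Equation (6-6) with respect to K^L."""
    N = K_out.shape[0]
    joint = N * K_out * K_c
    return N * renyi_entropy_grad(joint, alpha) * K_c - renyi_entropy_grad(K_out, alpha)
```

As a sanity check of the normalization, the uniform matrix $\mathbf{I}/N$ has entropy $\log N$ for every $\alpha$, and a label kernel in which all samples share one class leaves the conditional entropy at zero.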
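The derivatives with respect to the MLP parameters can then be computed with the following chain rule:
$$\frac{dS_\alpha(\mathbf{H}^L|\mathbf{c})}{d\theta} = \nabla_{\mathbf{K}^L}S_\alpha(\mathbf{H}^L|\mathbf{c})\left(\frac{d\mathbf{K}^L(\mathbf{e})}{d\mathbf{e}}\right)\left(\frac{d\mathbf{e}}{d\theta}\right) \tag{6-9}$$
Algorithm 1 summarizes the backpropagation procedure for updating the MLP parameters using the matrix conditional entropy as the cost function.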
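To make the chain rule (6-9) concrete, the sketch below assumes a Gaussian kernel $\kappa(\mathbf{e})=\exp(-\|\mathbf{e}\|_2^2/(2\sigma^2))$ on the network outputs (the kernel is not fixed in this section) and returns $\partial S/\partial\mathbf{H}^L$ for any symmetric kernel gradient $\boldsymbol{\Gamma}=\nabla_{\mathbf{K}^L}S$, such as Equation (6-8). Propagating this matrix through the layers with ordinary backpropagation yields the parameter derivatives; the exact recursion of Algorithm 1 is not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def gaussian_gram(H, sigma):
    """K^L with k_nm = exp(-||h_n - h_m||^2 / (2 sigma^2)); rows of H are outputs h^L_n."""
    diff = H[:, None, :] - H[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))


def grad_wrt_outputs(H, Gamma, sigma):
    """Chain rule of Equation (6-9) up to the network outputs.

    H     : (N, C) network outputs, one sample per row.
    Gamma : (N, N) symmetric gradient of the cost with respect to K^L.
    Returns dS/dH with the same shape as H.
    """
    K = gaussian_gram(H, sigma)
    P = Gamma * K                              # Hadamard product of Gamma and dK-related factor
    # dS/dh_n = -(2 / sigma^2) * sum_m P_nm (h_n - h_m)
    return -(2.0 / sigma ** 2) * (np.diag(P.sum(axis=1)) - P) @ H
```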
6.1.2 Network pre-training using centered kernel alignment
Let $\mathbf{H}^l=\{\mathbf{h}^l_i(\mathbf{z}_i)\in\mathbb{R}^{P_{l+1}}: i\in[1..N]\}$, with $\mathbf{H}^l\in\mathbb{R}^{P_{l+1}\times N}$, be the hidden state matrix projecting $\mathbf{Z}$ onto the $l$-th latent space. To encode the affinity between a pair of latent samples $\mathbf{h}^l_n,\mathbf{h}^l_m$, we define the following kernel function:
$$k^l_{nm} = \kappa\left(d\left(\mathbf{h}^l_n,\mathbf{h}^l_m\right)\right), \tag{6-10}$$
where $d:\mathbb{R}^{P_{l+1}}\times\mathbb{R}^{P_{l+1}}\to\mathbb{R}^+$ is a distance operator implementing the positive definite kernel function $\kappa(\cdot)$. Therefore, applying $\kappa$ to each sample pair in $\mathbf{H}^l$ yields the kernel matrix $\mathbf{K}^l\in\mathbb{R}^{N\times N}$, which estimates the covariance of the induced random process $\mathbf{H}^l$ over the RKHS. Since consecutive layers are related through linear transformations, we apply the Mahalanobis distance, defined for $P$-dimensional spaces by the following inverse covariance matrix $\mathbf{W}^l\mathbf{W}^{l\top}$:
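$$d\left(\mathbf{h}^l_n,\mathbf{h}^l_m\right) = \left(\mathbf{h}^l_n-\mathbf{h}^l_m\right)\mathbf{W}^l\mathbf{W}^{l\top}\left(\mathbf{h}^l_n-\mathbf{h}^l_m\right)^\top. \tag{6-11}$$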
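Algorithm 1 (MCE-MLP): Backpropagation MLP training for the $S_\alpha$ cost function.
1: Compute, from input to output, the latent variables $\mathbf{s}^l,\mathbf{h}^l$.
2: Compute the matrix $\mathbf{P}\in\mathbb{R}^{N\times N}$ at the output layer:
$$\mathbf{P} = \left(\nabla_{\mathbf{K}^L}S_\alpha(\mathbf{H}^L|\mathbf{c})\right)\circ\left(\frac{d\mathbf{K}^L(\mathbf{e})}{d\mathbf{e}}\right)$$
3: Compute, from output to input, the auxiliary equations:
$$\mathbf{g}^l_n = \begin{cases}\mathbf{h}^L_n\circ\phi'(\mathbf{s}^{L-1}_n) & l=L\\ \left[(\mathbf{W}^l)^\top\mathbf{g}^l_n\right]\circ\phi'(\mathbf{s}^{l-1}_n) & l\in[2..L-1]\\ \left[(\mathbf{W}^1)^\top\mathbf{g}^1_n\right]\circ\phi'(\mathbf{z}_n) & l=1\end{cases}$$
$$\mathbf{g}^l_{nm} = \begin{cases}\mathbf{h}^L_m\circ\phi'(\mathbf{s}^{L-1}_n) & l=L\\ \left[(\mathbf{W}^l)^\top\mathbf{g}^l_{nm}\right]\circ\phi'(\mathbf{s}^{l-1}_n) & l\in[2..L-1]\\ \left[(\mathbf{W}^1)^\top\mathbf{g}^1_{nm}\right]\circ\phi'(\mathbf{z}_n) & l=1\end{cases}$$
4: Compute the partial derivatives:
$$\frac{\partial S_\alpha(\mathbf{H}^L|\mathbf{c})}{\partial\mathbf{W}^l} = 4\sum_{n,m=1}^{N}p_{nm}\,\mathbf{g}^l_n(\mathbf{h}^l_n)^\top - 4\sum_{n,m=1}^{N}p_{nm}\,\mathbf{g}^l_{nm}(\mathbf{h}^l_n)^\top$$
$$\frac{\partial S_\alpha(\mathbf{H}^L|\mathbf{c})}{\partial\mathbf{b}^l} = 4\sum_{n,m=1}^{N}p_{nm}\,\mathbf{g}^l_n - 4\sum_{n,m=1}^{N}p_{nm}\,\mathbf{g}^l_{nm}$$
5: Update the parameters at iteration $t$ using the gradient descent rule:
$$\mathbf{W}^l(t) = \mathbf{W}^l(t-1) - \lambda_t\frac{\partial S_\alpha(\mathbf{H}^L|\mathbf{c})}{\partial\mathbf{W}^l},\qquad \mathbf{b}^l(t) = \mathbf{b}^l(t-1) - \lambda_t\frac{\partial S_\alpha(\mathbf{H}^L|\mathbf{c})}{\partial\mathbf{b}^l}$$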
To improve system performance in terms of both learning speed and classification accuracy, we introduce the available supervised knowledge into the pre-training stage as the target kernel matrix $\mathbf{K}_C$. Then, we learn each matrix $\mathbf{W}^l$ by maximizing the similarity between $\mathbf{K}^l$ and $\mathbf{K}_C$ through the real-valued centered kernel alignment (CKA) [Brockmeier et al., 2014]:
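$$\rho\left(\mathbf{K}^l,\mathbf{K}_C\right) = \frac{\left\langle\mathbf{H}\mathbf{K}^l\mathbf{H},\mathbf{H}\mathbf{K}_C\mathbf{H}\right\rangle_F}{\|\mathbf{H}\mathbf{K}^l\mathbf{H}\|_F\,\|\mathbf{H}\mathbf{K}_C\mathbf{H}\|_F}, \tag{6-12}$$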
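where $\mathbf{H}=\mathbf{I}-N^{-1}\mathbf{1}\mathbf{1}^\top$ is a centering matrix ($\mathbf{H}\in\mathbb{R}^{N\times N}$), $\mathbf{1}\in\mathbb{R}^N$ is an all-ones vector, and the notations $\langle\cdot,\cdot\rangle_F$ and $\|\cdot\|_F$ stand for the Frobenius inner product and norm, respectively.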
Since maximizing the CKA in Equation (6-12) provides discriminative linear projections $\mathbf{W}^l$ that better match the relations between the hidden states $\mathbf{H}^l$ and the target information $\mathbf{K}_C$, we compute each projection matrix by solving the following optimization problem:
$$\widehat{\mathbf{W}}^l = \arg\max_{\mathbf{W}^l}\rho\left(\mathbf{K}^l,\mathbf{K}_C\right), \tag{6-13}$$
where the pre-trained $\widehat{\mathbf{W}}^l$ initializes the $l$-th MLP layer.
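The pre-training objective (6-12) with the Mahalanobis-type distance (6-11) can be evaluated as below. We assume $\kappa(d)=\exp(-d/2)$ and the row-vector convention of Equation (6-11), with $\mathbf{W}$ of size $P\times m$; the optimizer used to solve (6-13) is left out, and any gradient-based method could be plugged in. The names `layer_kernel` and `cka` are hypothetical.

```python
import numpy as np


def layer_kernel(H_prev, W):
    """K^l from Equations (6-10)-(6-11), assuming kappa(d) = exp(-d / 2).

    H_prev : (N, P) samples as rows; W : (P, m), so that
    d(h_n, h_m) = (h_n - h_m) W W^T (h_n - h_m)^T = ||(h_n - h_m) W||^2.
    """
    Z = H_prev @ W
    diff = Z[:, None, :] - Z[None, :, :]
    return np.exp(-0.5 * np.sum(diff ** 2, axis=-1))


def cka(K, K_C):
    """Real-valued centered kernel alignment, Equation (6-12)."""
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    Kc, Cc = H @ K @ H, H @ K_C @ H
    return np.sum(Kc * Cc) / (np.linalg.norm(Kc) * np.linalg.norm(Cc))


# Toy comparison: projecting onto a label-related feature versus onto noise.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 10)
X = np.column_stack([y + 0.1 * rng.normal(size=30), rng.normal(size=30)])
K_C = (y[:, None] == y[None, :]).astype(float)
print(cka(layer_kernel(X, np.array([[1.0], [0.0]])), K_C),
      cka(layer_kernel(X, np.array([[0.0], [1.0]])), K_C))
```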
Additionally, the weighting matrix mapping the input to the first hidden layer allows analyzing the contribution of the original features to the latent space by computing the relevance vector $\boldsymbol{\varrho}\in\mathbb{R}^P$ as the average of the squared weights of each feature:
$$\varrho_p = \mathbb{E}\left\{w^2_{qp}:\forall q\in[1,P_1]\right\}, \tag{6-14}$$
where the weight $w_{qp}\in\mathbb{R}$ associates the $p$-th feature with the $q$-th hidden neuron, and $\mathbb{E}\{\cdot\}$ stands for the averaging operator. The main assumption behind the relevance in Equation (6-14) is that the larger the value of $\varrho_p$, the larger the dependency of the estimated embedding on the $p$-th input attribute.
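The relevance vector of Equation (6-14) is a column-wise mean of squared first-layer weights. The grouping helper mirrors the per-type summary reported later under the assumption that features are stored group by group; both function names are hypothetical.

```python
import numpy as np


def relevance(W1):
    """Relevance vector of Equation (6-14).

    W1 : (P_1, P) first-layer weights; entry (q, p) links input feature p to hidden neuron q.
    Returns the mean squared weight of every input feature (one value per column).
    """
    return np.mean(W1 ** 2, axis=0)


def grouped_relevance(rel, group_sizes):
    """Mean relevance per feature group, assuming features are stored group by group."""
    edges = np.cumsum([0] + list(group_sizes))
    return [float(rel[a:b].mean()) for a, b in zip(edges[:-1], edges[1:])]
```

For instance, if the features were concatenated in the order of Table 6-1, `grouped_relevance(relevance(W1), [70, 42, 72, 70, 70])` would return one value each for CV, SV, SA, TA, and TS.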
6.2 Experimental Setup
An automated computer-aided diagnosis system based on artificial neural networks is introduced to classify structural MRI scans from the ADNI dataset into the following three neurological classes: Normal Control (NC), Mild Cognitive Impairment (MCI), and Alzheimer's Disease (AD). Figure 6-1 illustrates the methodological development of the proposed approach. First, MRIs are independently segmented, and a set of features is extracted from the resulting parcellations. Then, centered kernel alignment is used to learn a projection matrix that initializes the MLP, and the matrix conditional entropy is minimized during MLP training. Finally, the tuned and trained model predicts the diagnosis for a given image.
Figure 6-1: General MRI classification pipeline: Input (MRI scans) → MRI processing (segmentation, feature extraction) → Cross-validation training (MLP classifier, CKA pre-training, MCE function) → Output (CAD).
6.2.1 Processing of MRI data
We used FreeSurfer version 5.1 (a freely available¹, widely used, and extensively validated brain MRI analysis software package) to process the structural brain MRI scans and compute the morphological measurements [Fischl, 2012]. FreeSurfer morphometric procedures are used because they have shown good test-retest reliability across scanner manufacturers and field strengths, thus becoming a standard for MRI feature extraction [Han et al., 2006]. The FreeSurfer pipeline is fully automatic and includes the following procedures: watershed-based skull stripping [Segonne et al., 2004], transformation to Talairach space, intensity normalization and bias field correction [Sled et al., 1998], tessellation of the gray/white matter boundary, topology correction [Segonne et al., 2007], and surface deformation [Fischl, 2004]. Consequently, representations of the white/gray matter boundary surface and of the pial surface are obtained, together with a segmentation of the white matter from the rest of the brain.
FreeSurfer computes structure-specific volume, area, and thickness measurements. Cortical and subcortical volumes are normalized to each subject's estimated Total Intracranial Volume (eTIV) [Buckner et al., 2004]. Table 6-1 summarizes the five feature sets extracted for each subject, which are concatenated into the feature matrix $\mathbf{X}$ with $N=1993$ samples and $D=324$ features.
Type                        Number of features   Units
Cortical Volumes (CV)       70                   mm³
Subcortical Volumes (SV)    42                   mm³
Surface Area (SA)           72                   mm²
Thickness Average (TA)      70                   mm
Thickness Std. (TS)         70                   mm
Total                       324
Table 6-1: FreeSurfer extracted features.
¹ freesurfer.nmr.mgh.harvard.edu
6.2.2 Tuning of ANN model parameter
Given $D=324$ input MRI features and three neurological classes, we use feedforward ANNs with one hidden layer, 324 input neurons, and 3 output neurons. An exhaustive search is carried out to tune the single free parameter, namely, the number of neurons in the hidden layer ($m_1$). For comparison, we also consider AutoENcoders (AEN) [Vincent et al., 2010] and the well-known Principal Component Analysis (PCA) for the initialization stage. All of these approaches (AEN, PCA, and CKA) provide a projection matrix whose output dimension, in this case, equals the hidden layer size. Thus, the resulting projections are used as the initial weights of the first layer, while the biases and output layer weights are randomly initialized. We then train the MLP with the MCE cost function for different numbers of hidden neurons under a 5-fold cross-validation scheme, as shown in Figure 6-2. Seeking the network configuration with the highest accuracy and the lowest deviation, the search indicates that the best numbers of hidden neurons are $m_1=20$, $m_1=16$, and $m_1=14$ for the AEN, PCA, and CKA approaches, respectively.
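As an illustration of the PCA baseline initialization described above, the sketch below takes the first $m_1$ principal directions as the first-layer weights and draws small random values for the biases and the output layer. The 0.01 scale, the seeded generator, and the function name `pca_init` are our assumptions, not settings reported here.

```python
import numpy as np


def pca_init(X, m1, n_classes, seed=0):
    """Illustrative PCA-based initialization of a one-hidden-layer MLP.

    X : (N, D) training features, one subject per row (requires m1 <= min(N, D)).
    Returns W1 (m1, D), b1 (m1,), W2 (n_classes, m1), b2 (n_classes,).
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal directions
    W1 = Vt[:m1]                                       # first m1 directions as first-layer weights
    b1 = 0.01 * rng.normal(size=m1)                    # small random biases (assumed scale)
    W2 = 0.01 * rng.normal(size=(n_classes, m1))       # random output layer (assumed scale)
    b2 = 0.01 * rng.normal(size=n_classes)
    return W1, b1, W2, b2
```

The CKA and AEN variants would differ only in how `W1` is obtained; the rest of the network is initialized in the same way.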
Figure 6-2: Artificial neural network performance (accuracy) versus the number of nodes in the hidden layer ($m_1$, from 6 to 30) for the three initialization approaches: (a) AutoENcoder, (b) PCA-based projection, and (c) CKA-based projection. Results are computed under a 5-fold cross-validation scheme.
We further analyze the influence of each feature on the initialization process through the relevance criterion introduced in Equation (6-14). The relevance values in Figure 6-3 show that the CKA approach enhances the Subcortical Volume features while diminishing the influence of most Cortical Volumes and Thickness Averages, whereas AEN and PCA assign practically the same relevance to every feature set. Hence, CKA allows the selection of relevant biomarkers from MRI.
Figure 6-3: Relevance indexes (from 0 to 1) grouped by feature type for the (a) AEN, (b) PCA, and (c) CKA initializations: Cortical Volume (CV), Subcortical Volume (SV), Surface Area (SA), Thickness Average (TA), and Thickness Std. (TS).
6.2.3 Classifier performance of neurological classes
Algorithm    Features                                        Classifier
Abdulkadir   Voxel-based morphometry                         Support Vector Machine
Ledig        Volume and intensity relations                  Random forest classifier
Sørensen     Volume, thickness, shape, intensity relations   Regularized linear discriminant analysis
Wachinger    Volume, thickness, shape                        Generalized linear model
Table 6-2: Best performing algorithms in the 2014 CADDementia challenge.
The ANN models tuned for the three initialization strategies are contrasted with the four best-performing approaches of the 2014 CADDementia challenge [Bron et al., 2015], listed in Table 6-2. The compared algorithms are evaluated in terms of their classification performance: accuracy (Acc), area under the receiver-operating-characteristic curve (Auc), and class-wise true positive rate ($\tau^c$), defined as:
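$$\mathrm{Acc} = \frac{\sum_c\left(t^c_p+t^c_n\right)}{\sum_c N^c} \tag{6-15a}$$
$$\tau^c = \frac{t^c_p}{N^c} \tag{6-15b}$$
$$\mathrm{Auc} = \frac{\sum_c A^c_{uc}\cdot N^c}{\sum_c N^c}, \tag{6-15c}$$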
where $c\in\{\mathrm{NC},\mathrm{MCI},\mathrm{AD}\}$ indexes each class, and $N^c$, $t^c_p$, and $t^c_n$ are the number of samples, true positives, and true negatives of the $c$-th class, respectively. The area under the curve Auc is the weighted average of the per-class areas under the ROC curve $A^c_{uc}$. Although the test samples of the challenge and those tested in this work are not the same, we assume that both test sets are equivalent for evaluation purposes.
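The class-wise true positive rate (6-15b) and the size-weighted AUC (6-15c) can be sketched as follows. Each per-class $A^c_{uc}$ is computed one-vs-rest through the Mann-Whitney statistic, which equals the area under the empirical ROC curve. The one-vs-rest scheme and the function names are our assumptions, and Equation (6-15a) is not coded here.

```python
import numpy as np


def true_positive_rates(y_true, y_pred, classes):
    """Class-wise true positive rate tau^c = t^c_p / N^c, Equation (6-15b)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {c: float(np.mean(y_pred[y_true == c] == c)) for c in classes}


def binary_auc(scores, positive):
    """One-vs-rest ROC AUC via the Mann-Whitney statistic (ties count 1/2)."""
    pos, neg = scores[positive], scores[~positive]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def weighted_auc(y_true, probs, classes):
    """Auc of Equation (6-15c): per-class AUCs weighted by the class sizes N^c.

    probs : (N, C) class scores whose k-th column corresponds to classes[k].
    """
    y_true = np.asarray(y_true)
    aucs = np.array([binary_auc(probs[:, k], y_true == c) for k, c in enumerate(classes)])
    sizes = np.array([np.sum(y_true == c) for c in classes])
    return float(np.sum(aucs * sizes) / np.sum(sizes))
```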
As seen in Table 6-3, which compares the classification performance on the 30% “best” quality test set, the proposed approach outperforms the other initialization approaches. Moreover, it performs better than the other computer-aided diagnosis methods as a whole. For the “partial” quality images, as expected, the general performance of the MLP approaches diminishes. Nonetheless, the overall accuracy and area under the curve are still competitive with respect to the challenge winner. Based on the ROC curves and confusion matrices displayed for the MLP-based classifiers with the optimal parameter set (see Figure 6-4), we also infer that the proposed approach improves MCI discrimination.
Algorithm     Acc    τ^NC   τ^MCI  τ^AD   Auc    A^NC_uc  A^MCI_uc  A^AD_uc
2014 CADDementia
Sørensen      63.0   96.9   28.7   61.2   78.8   86.3     63.1      87.5
Wachinger     59.0   72.1   51.6   51.5   77.0   83.3     59.4      88.2
Ledig         57.9   89.1   41.0   38.8   76.7   86.6     59.7      84.9
Abdulkadir    53.7   45.7   65.6   49.5   77.7   85.6     59.9      86.7
“best” quality testing
NN-AEN        47.6   73.4   33.1   38.1   64.9   71.4     53.4      75.1
NN-PCA        63.8   70.4   56.7   66.9   80.0   87.2     70.0      87.0
NN-CKA        70.9   78.4   66.6   68.3   85.3   91.7     78.4      88.3
“partial” quality
NN-AEN        62.9   64.6   46.4   32.0   77.0   82.5     65.6      72.5
NN-PCA        64.4   67.6   49.3   26.0   78.4   82.3     67.5      79.2
NN-CKA        65.2   68.6   38.6   42.0   81.6   85.7     70.1      82.4
Table 6-3: Classification performance on the testing groups for the considered algorithms under the evaluation criteria. Top: Baseline approaches. Bottom: ANN pre-trainings.
Figure 6-4: Receiver-operating-characteristic curves (top; true positive rate versus false positive rate) and confusion matrices (bottom; output class versus target class) on the 30% test data for the AEN (left), PCA (center), and CKA (right) initialization approaches at the best parameter set of the ANN classifier. Panels: (a) AEN (Auc: 64.9; NC: 71.4, MCI: 53.4, AD: 75.1); (b) PCA (Auc: 80.0; NC: 87.2, MCI: 70.0, AD: 87.0); (c) CKA (Auc: 85.3; NC: 91.7, MCI: 78.4, AD: 88.3); (d) AEN (Acc: 47.6); (e) PCA (Acc: 63.8); (f) CKA (Acc: 70.9).
6.3 Summary
From the validation carried out above for MRI-based dementia diagnosis, the following aspects emerge as relevant for the proposed ANN pre-training:
– As is common in state-of-the-art ANN algorithms, the proposed pre-training approach has a free model parameter, namely, the number of hidden neurons. This parameter is tuned heuristically by an exhaustive search that maximizes the accuracy under 5-fold cross-validation (see Figure 6-2). Thus, 14, 20, and 16 hidden neurons are selected for CKA, AEN, and PCA, respectively. As a result, the suggested CKA approach improves on the other ANN pre-training approaches (by about 10%) with the additional benefit of reduced sensitivity to this parameter.
– We assess the influence of each MRI feature on the pre-training procedure through the relevance criterion introduced in Equation (6-14). As follows from Figure 6-3, AEN and PCA weigh every feature evenly, which restrains their ability to extract biomarkers. By contrast, CKA enhances the influence of Subcortical Volumes and Thickness Standard deviations while diminishing the contribution of Cortical Volumes and Thickness Averages. Consequently, the proposed approach is also suitable for feature selection tasks.
– We compare the developed MLP approach with the four best classification strategies of the 2014 CADDementia challenge, which is devoted specifically to dementia classification. The results, summarized in Table 6-3, show that our proposal outperforms the other algorithms in most of the evaluation criteria and imaging conditions, providing the most balanced performance over all classes. In particular, we increase the classification accuracy and the average area under the ROC curve by about 7 percentage points. It is worth noting that, although Sørensen's approach achieves a $\tau^{\mathrm{NC}}$ value 18.5 percentage points higher than the proposal, its performance is biased towards the NC class, yielding a much lower $\tau^{\mathrm{MCI}}$ (28.7 versus 66.6). That is, CKA-MLP achieves a less biased class performance in dementia classification. In the case of “partial” quality images, despite the general performance reduction, the proposed pre-training remains the best approach. Moreover, the overall measures are still competitive with the results of the CADDementia challenge.
– Figure 6-4 shows the per-class ROC curves and confusion matrices obtained by the
contrasted approaches. In all cases, the area under the curve and the accuracy for the NC
and AD classes are higher than those achieved for the MCI class (Figures 6-4(a)
to 6-4(c)). Hence, MCI classification from the incorporated MRI features remains a
challenging task for the following reasons: the widely known heterogeneity of MCI, the
fact that MCI is an intermediate class between healthy subjects and diagnosed Alzheimer’s
patients, and the fact that MCI subjects may eventually convert to AD or revert to NC.
Moreover, the confusion matrices displayed in Figures 6-4(d) to 6-4(f) confirm that NC
and AD are well distinguished in most cases. Nevertheless, the MCI class introduces the
most errors, whether considered as the target or the output class. Therefore, dedicated
studies of mild cognitive impairment should improve the diagnosis [Ramírez et al., 2016,
Wolz et al., 2011].
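Regarding the tuning of the number of hidden neurons described in the first item of this list, the exhaustive search can be sketched as a loop over candidate sizes scored by k-fold cross-validation. This is a minimal sketch under stated assumptions: `fold_accuracy` is a hypothetical callback that would train the network with `h` hidden neurons on the training indices and return its accuracy on the held-out fold, and the folds are contiguous and unshuffled for brevity, whereas a real experiment would shuffle and stratify the subjects by class.

```python
from statistics import mean
from typing import Callable, Iterator, Sequence


def kfold_indices(n_samples: int, k: int = 5) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train, test) index lists for k contiguous, near-equal folds.

    No shuffling is done here for brevity; a real experiment would shuffle
    and, ideally, stratify the subjects by class before splitting.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if not start <= i < start + size]
        yield train, test
        start += size


def select_hidden_units(
    candidates: Sequence[int],
    n_samples: int,
    fold_accuracy: Callable[[int, list, list], float],
    k: int = 5,
) -> tuple[int, dict]:
    """Exhaustive search: return the size with the highest mean k-fold accuracy.

    `fold_accuracy(h, train_idx, test_idx)` is a hypothetical callback that
    trains a network with h hidden neurons and scores it on the held-out fold.
    """
    scores = {}
    for h in candidates:
        scores[h] = mean(fold_accuracy(h, tr, te) for tr, te in kfold_indices(n_samples, k))
    # max() keeps the first maximal key, so ties go to the earliest candidate.
    best = max(scores, key=scores.get)
    return best, scores
```

For instance, with a toy score that peaks at 14 neurons, `select_hidden_units(range(2, 31, 2), 100, lambda h, tr, te: 1.0 - abs(h - 14) / 100)` returns 14 as the best size.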
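To illustrate how a relevance analysis can rank feature families, as in the second item of this list, the following numpy sketch assumes that the relevance of the d-th input feature is the L2 norm of the d-th row of the learned D×M projection matrix W, normalized to sum to one. This is a hypothetical stand-in for the criterion of Equation (6-14), which is not reproduced here; the family names and index groups are likewise illustrative.

```python
import numpy as np


def feature_relevance(W: np.ndarray) -> np.ndarray:
    """Relevance of each input feature from a D x M projection matrix W.

    ASSUMPTION: relevance is the L2 norm of the d-th row of W, normalized to
    sum to one. This stands in for Equation (6-14), which is not reproduced.
    """
    rho = np.linalg.norm(W, axis=1)
    return rho / rho.sum()


def group_relevance(rho: np.ndarray, groups: dict) -> dict:
    """Average the per-feature relevance within each (hypothetical) feature family."""
    return {name: float(rho[idx].mean()) for name, idx in groups.items()}
```

Under this assumption, a family whose rows of W are close to zero receives a low averaged relevance, which is what makes such a ranking usable for feature selection.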
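The claims on balanced performance and per-class ROC curves in this list rest on per-class scores. The sketch below shows how accuracy and per-class true positive fractions follow from a confusion matrix, and how a one-vs-rest area under the ROC curve can be computed through the Mann-Whitney statistic (the fraction of positive/negative pairs in which the positive subject scores higher, ties counting one half). It assumes that τ_c denotes the true positive fraction of class c; the `tpf_spread` helper (largest minus smallest per-class fraction) is a hypothetical summary of class bias introduced only for illustration, and the counts in the usage note are made up.

```python
def accuracy_and_tpf(confusion, classes):
    """confusion[i][j] = number of subjects of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(classes)))
    # True positive fraction of each class: correct predictions over class size.
    tpf = {c: confusion[i][i] / sum(confusion[i]) for i, c in enumerate(classes)}
    return correct / total, tpf


def tpf_spread(tpf):
    """Hypothetical bias indicator: largest minus smallest per-class TPF."""
    return max(tpf.values()) - min(tpf.values())


def one_vs_rest_auc(scores, labels, positive):
    """AUC of `scores` for class `positive` against all other classes.

    Computed as the Mann-Whitney statistic divided by n_pos * n_neg.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))
```

For example, the made-up matrix `[[40, 8, 2], [10, 25, 15], [1, 9, 40]]` with classes NC, MCI, AD gives an accuracy of 0.7 and true positive fractions of 0.8, 0.5, and 0.8: a reasonable accuracy that hides a much weaker MCI class.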
7 Conclusions and Future Work
7.1 Concluding remarks
– A new kernel function estimation approach based on an information potential
variability framework is presented. KEIPV estimates an RKHS spanning the widest range
of information force magnitudes among data points. In particular, KEIPV relates different
kernel functions to the intrinsic information potential variations in Parzen-based pdf
estimation [Principe, 2010]. Thereby, we seek an RKHS that maximizes the overall
information potential variability with respect to the global kernel parameter. An updating
rule for the Gaussian kernel bandwidth is also introduced as a function of the forces
induced by the distances among samples. The KEIPV criterion is used to tune the Gaussian
kernel parameters throughout this manuscript.
– A kernel-based image representation is introduced to support MRI discrimination for
the segmentation of brain structures. Our proposal encodes inter-slice variations related
to the distribution of brain structures, so that head patterns can be extracted along each
3D axis view, namely, Axial, Coronal, and Sagittal. The results attained over well-known
MRI datasets show that the proposed kernel-based representation uncovers the natural
groupings of the images, namely, age and gender categories. The representation is further
evaluated as a template selection strategy for image segmentation, termed KAISER. Results
show that template selection reduces the bias towards population averages by providing a
small subset that performs better than the whole atlas set. In addition, KAISER selects the
smallest atlas subset with the best performance compared with other conventional
selection strategies, such as SSD and MI.
– We discuss the use of information-based measures to optimize model parameters for
MRI segmentation. In particular, we introduce the α-order Rényi’s entropy as a new
cost function for finding the tissue distribution parameters under the assumption of
normally distributed classes. Additionally, we develop the updating equations for
an EM-based optimization using the considered function. As a result, parameters
are updated from weighted voxel-wise averages, where the influence of the r-th voxel
on each parameter is $(b_{rc} f_{rc} / q_{rc})^{\alpha}$. We also prove the relation between two
different entropy orders: for α∈[0, 1), the larger the order, the smaller the information
measure. In fact, the maximum possible value of Rényi’s entropy is achieved when α=0,
corresponding to $H_0 = -\log(1/|\Omega|) = \log|\Omega|$. Additionally, the entropy order and the
algorithm convergence are proportionally related. This is due to the influence of α on the
probability values: as α tends to zero, the entropy tends to weight all the events more
evenly, regardless of their probability, i.e., $(b_{rl} f_{rl} / q_{rl})^{\alpha} \to 1,\ \forall r \in \Omega,\ l \in \mathcal{L}$.
On the other hand, for large order values, the entropy is dominated by the most probable
events. Regarding the segmentation accuracy, we show that the larger the noise intensity,
the larger the number of misclassifications. Additionally, we compare our proposal against
the log-likelihood as the baseline approach. The results achieved for the optimal entropy
order show that our scheme outperforms the baseline, since the parameters obtained with
the entropy function are more discriminative than those obtained with the log-likelihood.
– Chapter 5 proposes a new multi-atlas weighted label fusion approach for brain image
segmentation that takes advantage of a more elaborate fusion procedure incorporating
knowledge about the neighborhood as well as the patch structure of the considered
tissues. For this purpose, all image patches are projected into a discriminative space that
maximizes the similarity between the labels and the feature vectors by using the
introduced centered kernel alignment criterion. Besides, the adopted neighborhood-wise
analysis accounts for more useful information about local tissue structure, reducing the
influence of small registration errors on the query image. Nonetheless, we identify two
restrictions on the use of centered kernel alignment (CKA). Firstly, the number of samples
should be larger than the input and output dimensions to avoid overfitted projections. We
cope with this drawback by considering large enough sample subsets for training.
Otherwise, validation techniques should be included in the CKA learning. Secondly, the
attained projections must always be lower-dimensional than the original feature space.
In this case, the improvement in tissue discrimination is due to the affinity between
labels and features, not to an increase in dimension.
– Finally, a new multi-layer perceptron training scheme is introduced, aiming to improve
the computer-aided diagnosis of dementia. Given a set of features extracted from a
subject’s brain MRI, the dementia diagnosis task consists of assigning each subject to the
Normal Control, Mild Cognitive Impairment (MCI), or Alzheimer’s Disease class. To improve
the classification performance, we incorporate a matrix that projects the samples into a
more discriminative feature space, so that the affinity between projected features and
class labels is maximized. Such an affinity criterion is implemented by the Centered Kernel
Alignment (CKA), which provides two key benefits: i) the only free parameter is the hidden
dimension, and ii) a relevance analysis can be introduced to find biomarkers. The MLP is
then trained using a gradient descent algorithm that minimizes the matrix conditional
entropy as a new cost function. As a result, our proposal outperforms the contrasted
algorithms (by 7 percentage points in classification accuracy and area under the ROC
curve) and reduces the class bias, resulting in better MCI discrimination.
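A simplified version of the bandwidth selection summarized in the KEIPV item above can be sketched as follows. The sketch assumes that the information potential variability is measured by the variance of the off-diagonal Gaussian kernel values κ(x_i, x_j) = exp(−‖x_i − x_j‖²/(2σ²)), and it replaces the updating rule derived in the thesis with a plain grid search over σ; it approximates the criterion and is not the KEIPV update itself.

```python
import numpy as np


def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between the rows of X (clipped at zero)."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)


def kernel_variance(D2: np.ndarray, sigma: float) -> float:
    """Variance of the Gaussian kernel values over distinct pairs (i < j)."""
    iu = np.triu_indices(D2.shape[0], k=1)
    return float(np.exp(-D2[iu] / (2.0 * sigma ** 2)).var())


def select_bandwidth(X: np.ndarray, sigmas: np.ndarray):
    """Grid search (ASSUMED proxy for KEIPV): sigma maximizing the kernel-value variance."""
    D2 = pairwise_sq_dists(X)
    variances = np.array([kernel_variance(D2, s) for s in sigmas])
    return float(sigmas[int(np.argmax(variances))]), variances
```

The variance vanishes both for very small σ (all off-diagonal values tend to 0) and for very large σ (all tend to 1), so the maximizer lies in between, at a scale comparable to the typical distances among samples.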
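The ordering of Rényi’s entropies stated in the segmentation item above can be checked numerically on a discrete distribution. The sketch below uses the natural logarithm and the Shannon limit at α = 1; the helper `tempered_weights` is a hypothetical illustration of how raising positive weights to the power α flattens them towards one as α → 0, mirroring the behavior of the factor (b f / q)^α, without implementing the segmentation update itself.

```python
import math


def renyi_entropy(p, alpha):
    """Rényi entropy of order alpha (natural log) of a discrete distribution p."""
    p = [x for x in p if x > 0]  # events with zero probability do not contribute
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must sum to one")
    if alpha == 1:
        return -sum(x * math.log(x) for x in p)  # Shannon limit as alpha -> 1
    return math.log(sum(x ** alpha for x in p)) / (1.0 - alpha)


def tempered_weights(ratios, alpha):
    """Hypothetical helper: raise positive weights to alpha (all tend to 1 as alpha -> 0)."""
    return [r ** alpha for r in ratios]
```

With p = [0.5, 0.25, 0.125, 0.125], the order-0 value equals log 4 (the logarithm of the support size), and the entropy decreases as α grows; for a uniform distribution every order gives the same value.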
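For reference, the empirical centered kernel alignment used in the label fusion and MLP items above is ⟨HKH, HLH⟩_F / (‖HKH‖_F ‖HLH‖_F), with H = I − 11ᵀ/N the centering matrix [Cortes et al., 2012]. The numpy sketch below computes this quantity and compares two candidate one-dimensional projections of toy data; it evaluates the criterion only, omits the gradient-based learning of the projection matrix, and the Gaussian bandwidth and data are arbitrary choices.

```python
import numpy as np


def gaussian_kernel(Z: np.ndarray, sigma: float) -> np.ndarray:
    """Gaussian kernel matrix between the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def centered_kernel_alignment(K: np.ndarray, L: np.ndarray) -> float:
    """Empirical CKA: Frobenius cosine between the centered kernels HKH and HLH."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc, Lc = H @ K @ H, H @ L @ H
    return float(np.sum(Kc * Lc) / (np.linalg.norm(Kc) * np.linalg.norm(Lc)))


# Toy data: only the first coordinate carries class information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = np.column_stack([np.where(y == 0, -2.0, 2.0) + 0.5 * rng.normal(size=40),
                     rng.normal(size=40)])
L = (y[:, None] == y[None, :]).astype(float)  # label kernel: 1 if same class
cka_good = centered_kernel_alignment(gaussian_kernel(X @ np.array([[1.0], [0.0]]), 1.0), L)
cka_bad = centered_kernel_alignment(gaussian_kernel(X @ np.array([[0.0], [1.0]]), 1.0), L)
```

The projection that keeps the class-informative coordinate yields a larger alignment with the label kernel (`cka_good > cka_bad`), which is the property a CKA-learned projection is optimized for.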
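The matrix conditional entropy mentioned in the last item above can be sketched with the matrix-based Rényi entropy formulation of [Giraldo and Principe, 2013]: for a trace-normalized kernel matrix A, S_α(A) = log₂(Σ_i λ_i(A)^α)/(1 − α), the joint entropy uses the normalized Hadamard product A∘B/tr(A∘B), and S(A|B) = S(A∘B/tr(A∘B)) − S(B). The numpy sketch below evaluates these quantities for fixed kernels; which kernels enter the MLP cost and how its gradient is computed are not reproduced here, and α = 2 is an arbitrary default.

```python
import numpy as np


def normalize(K: np.ndarray) -> np.ndarray:
    """Scale a PSD kernel matrix to unit trace."""
    return K / np.trace(K)


def matrix_renyi_entropy(A: np.ndarray, alpha: float = 2.0) -> float:
    """S_alpha(A) = log2(sum_i lambda_i(A)**alpha) / (1 - alpha), for unit-trace PSD A."""
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)  # drop tiny negative round-off
    return float(np.log2(np.sum(lam ** alpha)) / (1.0 - alpha))


def matrix_conditional_entropy(K_a: np.ndarray, K_b: np.ndarray, alpha: float = 2.0) -> float:
    """S(A|B) = S(A o B / tr(A o B)) - S(B), where o is the Hadamard product."""
    A, B = normalize(K_a), normalize(K_b)
    return matrix_renyi_entropy(normalize(A * B), alpha) - matrix_renyi_entropy(B, alpha)
```

As sanity checks, I/N has entropy log₂ N (maximal spread) and the all-ones matrix divided by N has entropy 0 (a single nonzero eigenvalue).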
7.2 Future Work
– Regarding template selection, the following research lines are proposed. i) Since the
obtained decomposition eigenvectors showed non-linear relations, other non-linear
embedding techniques, e.g., locally linear embedding, can be used to highlight the
essential structure. ii) Slice-wise metrics, such as mutual information, can be tested as
the core of ISK once they are shown to satisfy the kernel properties. iii) Partitions other
than slices, e.g., 3D blocks, can be explored to account for different spatial
dependencies. iv) Tensor-product kernel parameters can be tuned under supervised
schemes aiming to highlight other demographic categories.
– One outcome of using Rényi’s entropy for probabilistic image segmentation is the explicit
influence of each voxel on the cost function minimization. Such a contribution may be
extended to model the image intensity distribution with other methods, such as Parzen
windows. In addition, this kind of information metric can be further adapted to other
image processing tasks (e.g., registration) or to unified schemes involving registration,
template selection, and dictionary learning [Roy et al., 2014].
– The centered kernel alignment for label fusion (CKA-LF) can be further combined with
multimodal segmentation approaches to make the best use of the different imaging
techniques and to discard the non-useful ones. Moreover, CKA-LF remains to be evaluated
in template-based segmentation of structures such as bone, liver, lung, and heart. On the
other hand, convolution masks can be used to extract features from image modalities such
as functional MRI. In this regard, CKA provides a new strategy for learning such masks,
aiming to enhance classification tasks.
– In Chapter 6, we proposed a new training scheme for multi-layer perceptrons using
kernel-based cost functions. We plan to evaluate this scheme in other brain MRI tasks,
such as predicting Alzheimer’s conversion from MCI, classifying attention deficit
hyperactivity disorder, and building temporal atlases. Finally, CKA and matrix
conditional entropy functions can be embedded into new machine learning tools, such as
deep learning, to enhance their performance.
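As a pointer for the Parzen-window extension proposed in this list, the quadratic Rényi entropy of a set of intensities has a closed-form estimator: with Gaussian windows of width σ, H₂ = −log((1/N²) Σ_i Σ_j G_{σ√2}(x_i − x_j)), where the double sum is the information potential [Principe, 2010]. The pure-Python sketch below is quadratic in the number of samples and only meant to show the estimator; σ would still have to be tuned, e.g., with a bandwidth selection criterion.

```python
import math


def parzen_quadratic_renyi_entropy(x, sigma):
    """Parzen estimate of Rényi's quadratic entropy of 1-D samples x.

    H2 = -log(IP), IP = (1/N^2) sum_ij G(x_i - x_j), where G is a Gaussian
    with variance 2*sigma^2 (the convolution of two windows of width sigma).
    """
    n = len(x)
    var = 2.0 * sigma ** 2
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    ip = sum(norm * math.exp(-((xi - xj) ** 2) / (2.0 * var)) for xi in x for xj in x) / n ** 2
    return -math.log(ip)
```

Spread-out intensities produce a smaller information potential, and hence a larger entropy, than tightly clustered ones.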
7.3 Academic discussion
1 D. Cardenas-Pena, D. Collazos-Huertas, and G. Castellanos-Dominguez, “Centered
Kernel Alignment Enhancing Neural Network Pretraining for MRI-Based Dementia
Diagnosis,” Computational and Mathematical Methods in Medicine, vol. 2016, Article
ID 9523849, 10 pages, 2016.
2 M. Orbes-Arteaga, D. Cardenas-Pena, and G. Castellanos-Dominguez, “Head and
Neck Auto Segmentation Challenge based on Non-Local Generative Mod-
els,” in MIDAS Journal, 2016.
3 E. E. Bron, M. Smits, W. M. Van Der Flier, H. Vrenken, F. Barkhof, P. Scheltens,
J. M. Papma, R. M. Steketee, C. M. Orellana, R. Meijboom, et al., “Standardized
evaluation of algorithms for computer-aided diagnosis of dementia based on
structural MRI: The CADDementia challenge,” NeuroImage, vol. 111, pp. 562–579,
2015.
4 D. Cardenas-Pena, M. Orbes-Arteaga, and G. Castellanos-Dominguez, “Supervised
brain tissue segmentation using a spatially enhanced similarity metric,” in
Artificial Computation in Biology and Medicine, pp. 398–407, Springer International
Publishing, 2015.
5 M. Orbes-Arteaga, D. Cardenas-Pena, M. A. Alvarez, A. Orozco, and G. Castellanos-
Dominguez, “Spatial-dependent similarity metric supporting multi-atlas MRI
segmentation,” in Pattern Recognition and Image Analysis, pp. 300–308, Springer
International Publishing, 2015.
6 D. Cardenas-Pena, A. A. Orozco, and G. Castellanos-Dominguez, “Information-based
cost function for a Bayesian MRI segmentation framework,” in Image
Analysis and Processing-ICIAP 2015, pp. 548–556, Springer International Publishing,
2015.
7 M. Orbes-Arteaga, D. Cardenas-Pena, M. A. Alvarez, A. A. Orozco, and G. Castellanos-
Dominguez, “Kernel centered alignment supervised metric for multi-atlas
segmentation,” in Image Analysis and Processing-ICIAP 2015, pp. 658–667, Springer
International Publishing, 2015. Best young paper award finalist.
8 V. Machairas, M. Faessel, D. Cardenas-Pena, T. Chabardes, T. Walter, and
E. Decenciere, “Waterpixels,” IEEE Transactions on Image Processing, vol. 24, no. 11,
pp. 3707–3716, 2015.
9 M. Orbes-Arteaga, D. Cardenas-Pena, M. A. Alvarez, A. A. Orozco, and G. Castellanos-
Dominguez, “Magnetic resonance image selection for multi-atlas segmenta-
tion using mixture models,” in Progress in Pattern Recognition, Image Analysis,
Computer Vision, and Applications, pp. 391–399, Springer International Publishing,
2015.
10 D. Cardenas-Pena, M. Orbes-Arteaga, A. Castro-Ospina, A. Alvarez-Meza, and G.
Castellanos-Dominguez, “A Kernel-based Representation to Support 3D MRI
Unsupervised Clustering,” 22nd International Conference on Pattern Recognition
in Stockholm, August 2014.
11 D. Cardenas-Pena, M. Orbes-Arteaga, and G. Castellanos-Dominguez, “Kernel-based
Atlas Image Selection for Brain Tissue Segmentation,” 36th Annual Interna-
tional Conference of the IEEE Engineering in Medicine and Biology Society in Chicago,
August 2014.
12 D. Cardenas-Pena, A. Alvarez-Meza, and G. Castellanos-Dominguez. “Kernel-Based
Image Representation for Brain MRI Discrimination.” Progress in Pattern
Recognition, Image Analysis, Computer Vision, and Applications. Springer Interna-
tional Publishing, 2014. 343-350. APR-CIARP 2014 Best Paper Award.
13 A. Alvarez-Meza, D. Cardenas-Pena, A.E. Castro-Ospina, and G. Castellanos-Dominguez,
“Tensor-Product Kernel-based Representation encoding Joint MRI View
Similarity,” 36th Annual International Conference of the IEEE Engineering in Medicine
and Biology Society in Chicago, August 2014.
14 E. Cuartas-Morales, D. Cardenas-Pena, and G. Castellanos-Dominguez, “Influence of
anisotropic white matter modeling on EEG source localization,” 36th Annual
International Conference of the IEEE Engineering in Medicine and Biology Society in
Chicago, August 2014.
15 D. Collazos-Huertas, A. Giraldo-Forero, D. Cardenas-Pena, A. Alvarez-Meza, and G.
Castellanos-Domínguez, “Functional protein prediction using HMM-based feature
representation and relevance analysis,” Advances in Intelligent Systems and
Computing, vol. 232, pp. 71-76, 2014.
16 G. Strobbe, D. Cardenas-Pena, V. E. Montes Restrepo, P. van Mierlo, and
S. Vandenberghe, “Selecting volume conductor
models for EEG source localization of epileptic spikes: preliminary results
based on 4 operated epileptic patients,” International Conference on Basic and
Clinical Multimodal Imaging, Abstracts, Geneva-Switzerland, 2013.
17 D. Cardenas-Pena, J. Martínez-Vargas, and G. Castellanos-Domínguez, “Local binary
fitting energy solution by graph cuts for MRI segmentation,” pp. 5131-5134, 2013.
18 D. Cardenas-Pena, M. Orozco-Alzate, and G. Castellanos-Dominguez, “Selection of
time-variant features for earthquake classification at the Nevado del Ruiz
volcano,” Computers.
19 J. Martinez-Vargas, D. Cardenas-Pena, and G. Castellanos-Dominguez, “Extraction
of stationary components in biosignal discrimination,” pp. 1-4, 2012.
20 D. Cardenas-Pena, J. Martínez-Vargas, and G. Castellanos-Domínguez, “Extraction
of stationary spectral components using stochastic variability,” Lecture Notes
in Computer Science (including subseries Lecture Notes in Artificial Intelligence and
Lecture Notes in Bioinformatics), vol. 7441 LNCS, pp. 765-772, 2012.
Bibliography
[Ahmed et al., 2011] Ahmed, S., Iftekharuddin, K. M., and Vossough, A. (2011). Efficacy
of texture, shape, and intensity feature fusion for posterior-fossa tumor segmentation in
MRI. IEEE transactions on information technology in biomedicine : a publication of the
IEEE Engineering in Medicine and Biology Society, 15(2):206–13.
[Aljabar et al., 2009] Aljabar, P., Heckemann, R. A., Hammers, A., Hajnal, J. V., and
Rueckert, D. (2009). Multi-atlas based segmentation of brain images: Atlas selection
and its effect on accuracy. NeuroImage, 46(3):726–738.
[Alvarez-Meza et al., 2014] Alvarez-Meza, A. M., Cardenas-Pena, D., and Castellanos-
Dominguez, G. (2014). Unsupervised Kernel Function Building Using Maximization of
Information Potential Variability. In Bayro-Corrochano, E. and Hancock, E., editors,
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications,
volume 8827 of Lecture Notes in Computer Science, pages 335–342. Springer International
Publishing.
[Alzubi et al., 2011] Alzubi, S., Islam, N., and Abbod, M. (2011). Multiresolution analysis
using wavelet, ridgelet, and curvelet transforms for medical image segmentation. Interna-
tional journal of biomedical imaging, 2011:136034.
[Amato et al., 2013] Amato, F., Lopez, A., Pena-Mendez, E. M., Vanhara, P., Hampl, A.,
and Havel, J. (2013). Artificial neural networks in medical diagnosis. Journal of Applied
Biomedicine, 11(2):47–58.
[Ashburner, 2007] Ashburner, J. (2007). A fast diffeomorphic image registration algorithm.
NeuroImage, 38(1):95–113.
[Ashburner and Friston, 2000] Ashburner, J. and Friston, K. J. (2000). Voxel-based
morphometry–the methods. NeuroImage, 11(6 Pt 1):805–21.
[Ashburner and Friston, 2005] Ashburner, J. and Friston, K. J. (2005). Unified segmenta-
tion. NeuroImage, 26(3):839–51.
[Aubert-Broche et al., 2006] Aubert-Broche, B., Griffin, M., Pike, G. B., Evans, A. C., and
Collins, D. L. (2006). Twenty new digital brain phantoms for creation of validation image
data bases. IEEE transactions on medical imaging, 25(11):1410–6.
[Avants and Gee, 2004] Avants, B. and Gee, J. C. (2004). Geodesic estimation for large
deformation anatomical shape averaging and interpolation. NeuroImage, 23 Suppl 1:S139–
50.
[Bai et al., 2015] Bai, W., Shi, W., Ledig, C., and Rueckert, D. (2015). Multi-atlas
segmentation with augmented features for cardiac MR images. Medical Image Analysis,
19(1):98–109.
[Balafar et al., 2010] Balafar, M. A., Ramli, A. R., Saripan, M. I., and Mashohor, S.
(2010). Review of brain MRI image segmentation methods. Artificial Intelligence Re-
view, 33(3):261–274.
[Bengio, 2009] Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and
Trends in Machine Learning, 2(1):1–127.
[Bengio, 2012] Bengio, Y. (2012). Practical recommendations for gradient-based training of
deep architectures. In Neural Networks: Tricks of the Trade, volume 7700 of Lecture Notes
in Computer Science, pages 437–478. Springer Berlin Heidelberg.
[Bengio et al., 2007] Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. (2007).
Greedy layer-wise training of deep networks. In Scholkopf, B., Platt, J., and Hoffman, T.,
editors, Advances in Neural Information Processing Systems 19 (NIPS’06), pages 153–160.
MIT Press.
[Bohland et al., 2009] Bohland, J. W., Bokil, H., Allen, C. B., and Mitra, P. P. (2009). The
brain atlas concordance problem: quantitative comparison of anatomical parcellations.
PloS one, 4(9):e7200.
[Brinkmann et al., 1998] Brinkmann, B. H., Manduca, A., and Robb, R. A. (1998). Opti-
mized homomorphic unsharp masking for MR grayscale inhomogeneity correction. Medical
Imaging, IEEE Transactions on, 17(2):161–171.
[Brockmeier et al., 2014] Brockmeier, A., Choi, J., Kriminger, E., Francis, J., and Principe,
J. (2014). Neural decoding with kernel-based metric learning. Neural Computation, 26:–.
[Bron et al., 2015] Bron, E. E., Smits, M., van der Flier, W. M., Vrenken, H., Barkhof,
F., Scheltens, P., Papma, J. M., Steketee, R. M., Orellana, C. M., Meijboom, R., Pinto,
M., Meireles, J. R., Garrett, C., Bastos-Leite, A. J., Abdulkadir, A., Ronneberger, O.,
Amoroso, N., Bellotti, R., Cardenas-Pena, D., Alvarez-Meza, A. M., Dolph, C. V.,
Iftekharuddin, K. M., Eskildsen, S. F., Coupe, P., Fonov, V. S., Franke, K., Gaser, C.,
Ledig, C., Guerrero, R., Tong, T., Gray, K. R., Moradi, E., Tohka, J., Routier, A., Dur-
rleman, S., Sarica, A., Fatta, G. D., Sensi, F., Chincarini, A., Smith, G. M., Stoyanov,
Z. V., Sørensen, L., Nielsen, M., Tangaro, S., Inglese, P., Wachinger, C., Reuter, M., van
Swieten, J. C., Niessen, W. J., and Klein, S. (2015). Standardized evaluation of algorithms
for computer-aided diagnosis of dementia based on structural MRI: The CADDementia
challenge. NeuroImage, 111:562–579.
[Brox and Cremers, 2008] Brox, T. and Cremers, D. (2008). On Local Region Models and a
Statistical Interpretation of the Piecewise Smooth Mumford-Shah Functional. International
Journal of Computer Vision, 84(2):184–193.
[Buckner et al., 2004] Buckner, R. L., Head, D., Parker, J., Fotenos, A. F., Marcus, D., Mor-
ris, J. C., and Snyder, A. Z. (2004). A unified approach for morphometric and functional
data analysis in young, old, and demented adults using automated atlas-based head size
normalization: reliability and validation against manual measurement of total intracranial
volume. NeuroImage, 23(2):724–738.
[Cerrolaza et al., 2012] Cerrolaza, J. J., Villanueva, A., and Cabeza, R. (2012). Hierarchical
statistical shape models of multiobject anatomical structures: application to brain MRI.
IEEE transactions on medical imaging, 31(3):713–24.
[Cho et al., 1997] Cho, S., Jones, D., Reddick, W. E., Ogg, R. J., and Steen, R. G. (1997).
Establishing norms for age-related changes in proton T1 of human brain tissue in vivo.
Magnetic resonance imaging, 15(10):1133–1143.
[Chyzhyk et al., 2014] Chyzhyk, D., Savio, A., and Grana, M. (2014). Evolutionary ELM
wrapper feature selection for Alzheimer’s disease CAD on anatomical brain MRI. Neuro-
computing, 128:73–80.
[Cocosco et al., 1997] Cocosco, C. A., Kollokian, V., Kwan, R. K.-S., Pike, G. B., and Evans,
A. C. (1997). BrainWeb: Online Interface to a 3D MRI Simulated Brain Database.
NeuroImage, 5:425.
[Cortes et al., 2012] Cortes, C., Mohri, M., and Rostamizadeh, A. (2012). Algorithms for
learning kernels based on centered alignment. The Journal of Machine Learning Research,
13:795–828.
[Coupe et al., 2011] Coupe, P., Manjon, J. V., Fonov, V., Pruessner, J., Robles, M., and
Collins, D. L. (2011). Patch-based segmentation using expert priors: Application to
hippocampus and ventricle segmentation. NeuroImage, 54(2):940–954.
[Cuingnet et al., 2011] Cuingnet, R., Gerardin, E., Tessieras, J., Auzias, G., Lehericy, S.,
Habert, M.-O., Chupin, M., Benali, H., and Colliot, O. (2011). Automatic classification
of patients with Alzheimer’s disease from structural MRI: a comparison of ten methods
using the ADNI database. NeuroImage, 56(2):766–81.
[Darvas et al., 2004] Darvas, F., Pantazis, D., Kucukaltun-Yildirim, E., and Leahy, R. M.
(2004). Mapping human brain function with MEG and EEG: methods and validation.
NeuroImage, 23 Suppl 1:S289–99.
[Davatzikos, 2004] Davatzikos, C. (2004). Why voxel-based morphometric analysis should be
used with great caution when characterizing group differences. NeuroImage, 23(1):17–20.
[De et al., 2011] De, A., Bhattacharjee, A. K., Chanda, C. K., and Maji, B. (2011). MRI
segmentation using Entropy maximization and Hybrid Particle Swarm Optimization with
Wavelet Mutation. 2011 World Congress on Information and Communication Technolo-
gies, pages 362–367.
[de Munck et al., 1988] de Munck, J. C., van Dijk, B. W., and Spekreijse, H. (1988). Mathe-
matical dipoles are adequate to describe realistic generators of human brain activity. IEEE
transactions on bio-medical engineering, 35(11):960–6.
[Demirhan and Guler, 2011] Demirhan, A. and Guler, A. (2011). Combining stationary
wavelet transform and self-organizing maps for brain MR image segmentation. Engineering
Applications of Artificial Intelligence, 24(2):358–367.
[Dubois et al., 2014] Dubois, B., Feldman, H. H., Jacova, C., Hampel, H., Molinuevo, J. L.,
Blennow, K., DeKosky, S. T., Gauthier, S., Selkoe, D., Bateman, R., Cappa, S., Crutch,
S., Engelborghs, S., Frisoni, G. B., Fox, N. C., Galasko, D., Habert, M.-O., Jicha, G. A.,
Nordberg, A., Pasquier, F., Rabinovici, G., Robert, P., Rowe, C., Salloway, S., Sarazin,
M., Epelbaum, S., de Souza, L. C., Vellas, B., Visser, P. J., Schneider, L., Stern, Y.,
Scheltens, P., and Cummings, J. L. (2014). Advancing research diagnostic criteria for
Alzheimer’s disease: the IWG-2 criteria. Lancet neurology, 13(6):614–29.
[Eskildsen et al., 2015] Eskildsen, S. F., Coupe, P., Fonov, V. S., Pruessner, J. C., and
Collins, D. L. (2015). Structural imaging biomarkers of Alzheimer’s disease: predicting
disease progression. Neurobiology of Aging, 36:S23–S31.
[Falahati et al., 2014] Falahati, F., Westman, E., and Simmons, A. (2014). Multivariate data
analysis and machine learning in Alzheimer’s disease with a focus on structural magnetic
resonance imaging. Journal of Alzheimer’s disease : JAD, 41(3):685–708.
[Farhan et al., 2014] Farhan, S., Fahiem, M. A., and Tauseef, H. (2014). An ensemble-of-
classifiers based approach for early diagnosis of Alzheimer’s disease: classification using
structural features of brain images. Computational and mathematical methods in medicine,
2014:862307.
[Fischl, 2004] Fischl, B. (2004). Automatically Parcellating the Human Cerebral Cortex.
Cerebral Cortex, 14(1):11–22.
[Fischl, 2012] Fischl, B. (2012). FreeSurfer. NeuroImage, 62(2):774–81.
[Fischl et al., 2002] Fischl, B., Salat, D. H., Busa, E., Albert, M., Dieterich, M., Haselgrove,
C., van der Kouwe, A., Killiany, R., Kennedy, D., Klaveness, S., Montillo, A., Makris, N.,
Rosen, B., and Dale, A. M. (2002). Whole brain segmentation: automated labeling of
neuroanatomical structures in the human brain. Neuron, 33(3):341–55.
[Folstein et al., 1975] Folstein, M. F., Folstein, S. E., and McHugh, P. R. (1975). “Mini-
mental state”: A practical method for grading the cognitive state of patients for the
clinician. Journal of Psychiatric Research, 12(3):189–198.
[Giraldo and Principe, 2013] Giraldo, L. G. S. and Principe, J. C. (2013). Information The-
oretic Learning with Infinitely Divisible Kernels.
[Grech et al., 2008] Grech, R., Cassar, T., Muscat, J., Camilleri, K. P., Fabri, S. G., Zer-
vakis, M., Xanthopoulos, P., Sakkalis, V., and Vanrumste, B. (2008). Review on solving
the inverse problem in EEG source analysis. Journal of neuroengineering and rehabilita-
tion, 5:25.
[Han et al., 2006] Han, X., Jovicich, J., Salat, D., van der Kouwe, A., Quinn, B., Czanner,
S., Busa, E., Pacheco, J., Albert, M., Killiany, R., Maguire, P., Rosas, D., Makris, N.,
Dale, A., Dickerson, B., and Fischl, B. (2006). Reliability of MRI-derived measurements
of human cerebral cortical thickness: the effects of field strength, scanner upgrade and
manufacturer. NeuroImage, 32(1):180–94.
[Heckemann et al., 2006] Heckemann, R. A., Hajnal, J. V., Aljabar, P., Rueckert, D., and
Hammers, A. (2006). Automatic anatomical brain MRI segmentation combining label
propagation and decision fusion. NeuroImage, 33(1):115–26.
[Hinton et al., 2006] Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast learning
algorithm for deep belief nets. Neural computation, 18(7):1527–54.
[Iftekharuddin et al., 2009] Iftekharuddin, K. M., Zheng, J., Islam, M. A., and Ogg, R. J.
(2009). Fractal-based brain tumor detection in multimodal MRI. Applied Mathematics
and Computation, 207(1):23–41.
[Isgum et al., 2009] Isgum, I., Staring, M., Rutten, A., Prokop, M., Viergever, M. A., and
Van Ginneken, B. (2009). Multi-atlas-based segmentation with local decision fusion-
application to cardiac and aortic segmentation in CT scans. IEEE Transactions on Medical
Imaging, 28(7):1000–1010.
[Jack et al., 2013] Jack, C. R., Knopman, D. S., Jagust, W. J., Petersen, R. C., Weiner,
M. W., Aisen, P. S., Shaw, L. M., Vemuri, P., Wiste, H. J., Weigand, S. D., Lesnick,
T. G., Pankratz, V. S., Donohue, M. C., and Trojanowski, J. Q. (2013). Tracking patho-
physiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic
biomarkers. The Lancet. Neurology, 12(2):207–16.
[Jenssen et al., 2003] Jenssen, R., Principe, J., and Eltoft, T. (2003). Information cut and in-
formation forces for clustering. In Neural Networks for Signal Processing, 2003. NNSP’03.
2003 IEEE 13th Workshop on, pages 459–468.
[Jung et al., 2015] Jung, W. B., Lee, Y. M., Kim, Y. H., and Mun, C.-w. (2015). Automated
Classification to Predict the Progression of Alzheimer’s Disease Using Whole-Brain Vol-
umetry and DTI. Psychiatry Investigation, 12(1):92–102.
[Khedher et al., 2015] Khedher, L., Ramírez, J., Gorriz, J., Brahim, A., and Segovia, F.
(2015). Early diagnosis of Alzheimer’s disease based on partial least squares, principal
component analysis and support vector machine using segmented MRI images. Neuro-
computing, 151:139–150.
[Kimeldorf and Wahba, 1971] Kimeldorf, G. and Wahba, G. (1971). Some results on
Tchebycheffian spline functions. Journal of mathematical analysis and applications, 33(1):82–95.
[Kloppel et al., 2012a] Kloppel, S., Abdulkadir, A., Jack, C. R., Koutsouleris, N., Mourao-
Miranda, J., and Vemuri, P. (2012a). Diagnostic neuroimaging across diseases. NeuroIm-
age, 61(2):457–463.
[Kloppel et al., 2012b] Kloppel, S., Abdulkadir, A., Jack, C. R., Koutsouleris, N., Mourao-
Miranda, J., and Vemuri, P. (2012b). Diagnostic neuroimaging across diseases. NeuroIm-
age, 61(2):457–63.
[Kloppel et al., 2015] Kloppel, S., Peter, J., Ludl, A., Pilatus, A., Maier, S., Mader, I., Heim-
bach, B., Frings, L., Egger, K., Dukart, J., Schroeter, M. L., Perneczky, R., Haussermann,
P., Vach, W., Urbach, H., Teipel, S., Hull, M., and Abdulkadir, A. (2015). Applying
Automated MR-Based Diagnostic Methods to the Memory Clinic: A Prospective Study.
Journal of Alzheimer’s disease : JAD, 47(4):939–54.
[Kuklisova-Murgasova et al., 2011] Kuklisova-Murgasova, M., Aljabar, P., Srinivasan, L.,
Counsell, S. J., Doria, V., Serag, A., Gousias, I. S., Boardman, J. P., Rutherford, M. A.,
Edwards, A. D., Hajnal, J. V., and Rueckert, D. (2011). A dynamic 4D probabilistic atlas
of the developing brain. NeuroImage, 54(4):2750–2763.
[Kwan et al., 1996] Kwan, R. K.-S., Evans, A. C., and Pike, G. B. (1996). An extensible
MRI simulator for post-processing evaluation. In Hohne, K. H. and Kikinis, R., editors,
Visualization in Biomedical Computing, volume 1131 of Lecture Notes in Computer Sci-
ence, pages 135–140. Springer Berlin Heidelberg, Berlin, Heidelberg.
[Lanfer et al., 2012] Lanfer, B., Scherg, M., Dannhauer, M., Knosche, T. R., Burger, M.,
and Wolters, C. H. (2012). Influences of skull segmentation inaccuracies on EEG source
analysis. NeuroImage, 62(1):418–431.
[Lankton et al., 2007] Lankton, S., Nain, D., Yezzi, A., and Tannenbaum, A. (2007). Hybrid
geodesic region-based curve evolutions for image segmentation. In Cleary, K. R., Hsieh,
J., Manduca, A., Pluim, J. P. W., Horii, S. C., Emelianov, S. Y., Giger, M. L., Jiang, Y.,
Sahiner, B., Karssemeijer, N., McAleavey, S. A., Andriole, K. P., Reinhardt, J. M., Hu,
X. P., Flynn, M. J., and Miga, M. I., editors, Proc. SPIE 6510, Medical Imaging 2007:
Physics of Medical Imaging, volume 6510, pages 65104U–65104U–10.
[Lankton and Tannenbaum, 2008] Lankton, S. and Tannenbaum, A. (2008). Localizing
region-based active contours. IEEE transactions on image processing : a publication of
the IEEE Signal Processing Society, 17(11):2029–39.
[Ledig et al., 2012] Ledig, C., Wolz, R., Aljabar, P., Lotjonen, J., Heckemann, R. A.,
Hammers, A., and Rueckert, D. (2012). Multi-class brain segmentation using atlas propagation
and EM-based refinement. Proceedings - International Symposium on Biomedical Imaging,
pages 896–899.
[Lewis, 1996] Lewis, A. S. (1996). Derivatives of Spectral Functions. Mathematics of
Operations Research, 21(3):576–588.
[Li et al., 2007] Li, C., Kao, C.-Y., Gore, J. C., and Ding, Z. (2007). Implicit Active Con-
tours Driven by Local Binary Fitting Energy. 2007 IEEE Conference on Computer Vision
and Pattern Recognition, pages 1–7.
[Li et al., 2008] Li, M., Huang, T., and Zhu, G. (2008). Improved Fast Fuzzy C-Means
Algorithm for Medical MR Images Segmentation. 2008 Second International Conference
on Genetic and Evolutionary Computing, pages 285–288.
[Liew and Yan, 2006] Liew, A. W.-C. and Yan, H. (2006). Current Methods in the Auto-
matic Tissue Segmentation of 3D Magnetic Resonance Brain Images. Current Medical
Imaging Reviews, 2(1):91–103.
[Liu et al., 2011] Liu, W., Principe, J. C., and Haykin, S. (2011). Kernel Adaptive Filtering:
A Comprehensive Introduction, volume 57. John Wiley & Sons.
[Lotjonen et al., 2010] Lotjonen, J. M., Wolz, R., Koikkalainen, J. R., Thurfjell, L., Walde-
mar, G., Soininen, H., and Rueckert, D. (2010). Fast and robust multi-atlas segmentation
of brain magnetic resonance images. NeuroImage, 49(3):2352–65.
[Ma et al., 2014] Ma, G., Gao, Y., Wu, G., Wu, L., and Shen, D. (2014). Atlas-Guided
Multi-channel Forest Learning for Human Brain Labeling. In Menze, B., Langs, G.,
Montillo, A., Kelm, M., Muller, H., Zhang, S., Cai, W. T., and Metaxas, D., editors,
Medical Computer Vision: Algorithms for Big Data, volume 8848 of LNCS, pages 97–104.
Springer International Publishing.
[Magnin et al., 2009] Magnin, B., Mesrob, L., Kinkingnehun, S., Pelegrini-Issac, M., Col-
liot, O., Sarazin, M., Dubois, B., Lehericy, S., and Benali, H. (2009). Support vector
machine-based classification of Alzheimer’s disease from whole-brain anatomical MRI.
Neuroradiology, 51(2):73–83.
[Marcus et al., 2010] Marcus, D. S., Fotenos, A. F., Csernansky, J. G., Morris, J. C., and
Buckner, R. L. (2010). Open Access Series of Imaging Studies (OASIS): Longitudinal MRI
Data in Nondemented and Demented Older Adults. Journal of cognitive neuroscience,
22(12):2677–2684.
[McKhann et al., 2011] McKhann, G. M., Knopman, D. S., Chertkow, H., Hyman, B. T.,
Jack, C. R., Kawas, C. H., Klunk, W. E., Koroshetz, W. J., Manly, J. J., Mayeux, R.,
Mohs, R. C., Morris, J. C., Rossor, M. N., Scheltens, P., Carrillo, M. C., Thies, B.,
Weintraub, S., and Phelps, C. H. (2011). The diagnosis of dementia due to Alzheimer’s
disease: recommendations from the National Institute on Aging-Alzheimer’s Association
workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s & dementia :
the journal of the Alzheimer’s Association, 7(3):263–9.
[Mohamed et al., 2011] Mohamed, A. R., Sainath, T. N., Dahl, G., Ramabhadran, B.,
Hinton, G. E., and Picheny, M. A. (2011). Deep belief networks using discriminative features
for phone recognition. ICASSP, IEEE International Conference on Acoustics, Speech and
Signal Processing - Proceedings, pages 5060–5063.
[Montes-Restrepo et al., 2014] Montes-Restrepo, V., Van Mierlo, P., Strobbe, G., Staelens,
S., Vandenberghe, S., and Hallez, H. (2014). Influence of skull modeling approaches on
EEG source localization. Brain Topography, 27:95–111.
[Moradi et al., 2015] Moradi, E., Pepe, A., Gaser, C., Huttunen, H., and Tohka, J. (2015).
Machine learning framework for early MRI-based Alzheimer’s conversion prediction in
MCI subjects. NeuroImage, 104:398–412.
[Morejon and Principe, 2004] Morejon, R. and Principe, J. (2004). Advanced search
algorithms for information-theoretic learning with kernel-based estimators. IEEE
Transactions on Neural Networks, 15(4):874–884.
[Ota et al., 2014] Ota, K., Oishi, N., Ito, K., and Fukuyama, H. (2014). A comparison of
three brain atlases for MCI prediction. Journal of Neuroscience Methods, 221:139–150.
[Ota et al., 2015] Ota, K., Oishi, N., Ito, K., and Fukuyama, H. (2015). Effects of imaging
modalities, brain atlases and feature selection on prediction of Alzheimer’s disease. Journal
of neuroscience methods, 256:168–83.
[Papakostas et al., 2015] Papakostas, G., Savio, A., Grana, M., and Kaburlasos, V. (2015).
A lattice computing approach to Alzheimer’s disease computer assisted diagnosis based
on MRI data. Neurocomputing, 150:37–42.
[Paragios and Deriche, 2002] Paragios, N. and Deriche, R. (2002). Geodesic Active Regions
and Level Set Methods for Supervised Texture Segmentation. International Journal of
Computer Vision, 46(3):223–247.
[Principe, 2010] Principe, J. C. (2010). Information theoretic learning: Renyi’s entropy and
kernel perspectives. Springer.
[Ramírez et al., 2013] Ramírez, J., Gorriz, J., Salas-Gonzalez, D., Romero, A., Lopez, M.,
Alvarez, I., and Gomez-Río, M. (2013). Computer-aided diagnosis of Alzheimer’s type
dementia combining support vector machines and discriminant set of features. Information
Sciences, 237:59–72.
[Ramírez et al., 2016] Ramírez, J., Gorriz, J. M., Ortiz, A., Padilla, P., Martínez-Murcia,
F. J., and Neuroimaging, D. (2016). Ensemble Tree Learning Techniques for Magnetic
Resonance Image Analysis. In Innovation in Medicine and Healthcare 2015, volume 45,
pages 395–404. Springer International Publishing.
[Ranzato et al., 2007] Ranzato, M., Poultney, C., Chopra, S., and LeCun, Y. (2007).
Efficient Learning of Sparse Representations with an Energy-Based Model. Advances in
Neural Information Processing Systems, pages 1137–1144.
[Rousseau et al., 2011] Rousseau, F., Habas, P. A., and Studholme, C. (2011). A supervised
patch-based approach for human brain labeling. IEEE Transactions on Medical Imaging,
30(10):1852–1862.
[Roy et al., 2014] Roy, S., Carass, A., Prince, J. L., and Pham, D. L. (2014). Subject Specific
Sparse Dictionary Learning for Atlas based Brain MRI Segmentation. Mach Learn Med
Imaging, 8679:248–255.
[Sabuncu and Konukoglu, 2015] Sabuncu, M. R. and Konukoglu, E. (2015). Clinical predic-
tion from structural brain MRI scans: a large-scale empirical study. Neuroinformatics,
13(1):31–46.
[Scholkopf and Smola, 2002] Scholkopf, B. and Smola, A. J. (2002). Learning with Kernels.
The MIT Press, Cambridge, MA, USA.
[Segonne et al., 2004] Segonne, F., Dale, A. M., Busa, E., Glessner, M., Salat, D., Hahn,
H. K., and Fischl, B. (2004). A hybrid approach to the skull stripping problem in MRI.
NeuroImage, 22(3):1060–75.
[Segonne et al., 2007] Segonne, F., Pacheco, J., and Fischl, B. (2007). Geometrically accu-
rate topology-correction of cortical surfaces using nonseparating loops. IEEE Transactions
on Medical Imaging, 26(4):518–529.
[Sikka et al., 2009] Sikka, K., Sinha, N., Singh, P. K., and Mishra, A. K. (2009). A fully
automated algorithm under modified FCM framework for improved brain MR image seg-
mentation. Magnetic resonance imaging, 27(7):994–1004.
[Sled et al., 1998] Sled, J. G., Zijdenbos, A. P., and Evans, A. C. (1998). A nonparametric
method for automatic correction of intensity nonuniformity in MRI data. IEEE
transactions on medical imaging, 17(1):87–97.
[Sørensen et al., 2013] Sørensen, L., Pai, A., Igel, C., and Nielsen, M. (2013). Hippocampal
texture predicts conversion from MCI to Alzheimer’s disease. Alzheimer’s & Dementia,
9(4):P581.
[Steen et al., 2000] Steen, R. G., Reddick, W. E., and Ogg, R. J. (2000). More than meets
the eye: significant regional heterogeneity in human cortical T1. Magnetic resonance
imaging, 18(4):361–8.
[Sum and Cheung, 2008] Sum, K. W. and Cheung, P. Y. S. (2008). Vessel Extraction Under
Non-Uniform Illumination: A Level Set Approach. In IEEE transactions on bio-medical
engineering, volume 55, pages 358–360.
[Tong et al., 2013] Tong, T., Wolz, R., Coupe, P., Hajnal, J. V., and Rueckert, D. (2013).
Segmentation of MR images via discriminative dictionary learning and sparse coding:
Application to hippocampus labeling. NeuroImage, 76:11–23.
[Tong et al., 2015] Tong, T., Wolz, R., Wang, Z., Gao, Q., Misawa, K., Fujiwara, M., Mori,
K., Hajnal, J. V., and Rueckert, D. (2015). Discriminative Dictionary Learning for Ab-
dominal Multi-Organ Segmentation. Medical Image Analysis, 23:92–104.
[Tu and Bai, 2010] Tu, Z. and Bai, X. (2010). Auto-context and its application to high-level
vision tasks and 3D brain image segmentation. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 32(10):1744–1757.
[Valdes-Hernandez et al., 2009] Valdes-Hernandez, P. A., von Ellenrieder, N., Ojeda-
Gonzalez, A., Kochen, S., Aleman-Gomez, Y., Muravchik, C., and Valdes-Sosa, P. A.
(2009). Approximate average head models for EEG source imaging. Journal of neuro-
science methods, 185:125–32.
[Vincent et al., 2010] Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol,
P.-A. (2010). Stacked Denoising Autoencoders: Learning Useful Representations in a
Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research,
11(3):3371–3408.
[Vovk et al., 2007] Vovk, U., Pernus, F., and Likar, B. (2007). A review of methods for
correction of intensity inhomogeneity in MRI. IEEE transactions on medical imaging,
26(3):405–21.
[Wang and Wang, 2008] Wang, P. and Wang, H. (2008). A Modified FCM Algorithm for
MRI Brain Image Segmentation. 2008 International Seminar on Future BioMedical In-
formation Engineering, 32(8):685–698.
[Wang et al., 2010] Wang, X.-F., Huang, D.-S., and Xu, H. (2010). An efficient local Chan-
Vese model for image segmentation. Pattern Recognition, 43(3):603–618.
[Westman et al., 2013] Westman, E., Aguilar, C., Muehlboeck, J. S., and Simmons, A.
(2013). Regional magnetic resonance imaging measures for multivariate analysis in
Alzheimer’s disease and mild cognitive impairment. Brain Topography, 26(1):9–23.
[Weston et al., 2012] Weston, J., Ratle, F., Mobahi, H., and Collobert, R. (2012). Deep
learning via semi-supervised embedding. In Neural Networks: Tricks of the Trade, volume
7700 of Lecture Notes in Computer Science, pages 639–655. Springer Berlin Heidelberg.
[Wolz et al., 2011] Wolz, R., Julkunen, V., Koikkalainen, J., Niskanen, E., Zhang, D. P.,
Rueckert, D., Soininen, H., and Lotjonen, J. (2011). Multi-method analysis of MRI images
in early diagnostics of Alzheimer’s disease. PLoS ONE, 6(10):1–9.
[Wu et al., 2014] Wu, G., Wang, Q., Zhang, D., Nie, F., Huang, H., and Shen, D. (2014). A
generative probability model of joint label fusion for multi-atlas based brain segmentation.
Medical image analysis, 18(6):881–90.
[Zhang et al., 2012] Zhang, D., Guo, Q., Wu, G., and Shen, D. (2012). Sparse Patch-Based
Label Fusion for Multi-Atlas Segmentation. In Yap, P.-T., Liu, T., Shen, D., Westin,
C.-F., and Shen, L., editors, Multimodal Brain Image Analysis, volume 7509 of LNCS,
pages 94–102. Springer Berlin Heidelberg.
[Zhang et al., 2011] Zhang, D., Wang, Y., Zhou, L., Yuan, H., and Shen, D. (2011). Multi-
modal classification of Alzheimer’s disease and mild cognitive impairment. NeuroImage,
55(3):856–67.
[Zitova and Flusser, 2003] Zitova, B. and Flusser, J. (2003). Image registration methods: A
survey. Image and Vision Computing, 21(11):977–1000.
Biographical sketch
David Cardenas-Pena received the bachelor’s degree in electronic engineering and the M.Eng.
degree in industrial automation from the Universidad Nacional de Colombia, Manizales,
Colombia, in 2008 and 2011, respectively. He completed his Ph.D. degree in automatics at
the Universidad Nacional de Colombia in 2016. He has been a Research Assistant with the
Signal Processing and Recognition Group since 2008 and a Teaching Assistant since 2012.
His main research interests include machine learning and pattern recognition applied to
biosignal and image processing.