7(6,6'2&725$/ 6SDWLDO'HSWK %DVHG0HWKRGVIRU … · 7(6,6'2&725$/ 6sdwldo'hswk...
Transcript of 7(6,6'2&725$/ 6SDWLDO'HSWK %DVHG0HWKRGVIRU … · 7(6,6'2&725$/ 6sdwldo'hswk...
TESIS DOCTORAL
Spatial Depth-Based Methods for Functional Data
Autor:
Carlo Sguera
Director/es:
Pedro Galeano, Rosa Lillo
DEPARTAMENTO DE ESTADÍSTICA
Getafe, Junio 2014
Universidad Carlos III de Madrid
PH.D. THESIS
Spatial Depth-Based Methods for Functional
Data
Author:
Carlo Sguera
Advisors:
Pedro Galeano, Rosa Lillo
DEPARTMENT OF STATISTICS
Madrid, June, 2014
This dissertation was written in the Department of Statistics at Universidad Carlos III de
Madrid under the supervision of the Professors Pedro Galeano and Rosa Lillo. The author
was a visiting scholar at the Department of Biostatistics of Columbia University under the su-
pervision of Professors Sara Lopez-Pintado and Jeff Goldsmith in the period April-July 2013.
The author was supported by a scholarship for master studies (UC3M) and was subsequently
hired as a teaching and research assistant (PIF). Besides, the author and the advisors had the
partial support of the following research projects: Spanish Ministry of Science and Innovation
grant ECO2011-25706 and by Spanish Ministry of Economy and Competition grant ECO2012-
38442.
Agradecimientos
Dos cosas antes de empezar, bueno, tres, pero una ya se nota: primero, voy a escribir en
espanol, o mejor, en “itanol”; segundo, es la primera vez que escribo algo ası, y no se como me
va a salir; y tercero, soy de los que dejan estas cosas para el final: de hecho me acabo de servir
una cervecita para celebrarlo. Bueno, a por ellos
El primer “grazie” va a mi “sorellina” Ilaria. Te quiero mucho y a ver si te veo por Madrid
para la defensa, me gustarıa mucho. Nunzia y Rino, mi mama y mi papa, ya se que va ser
mas difıcil veros por aquı, pero bueno, ya lo celebraremos en Barletta, os quiero. Y ya que
menciono Barletta, mi ciudad, quiero mandar un abrazo a todos mis amigos de alla (uno mas
fuerte a Michele) y a mi familia alargada (primos, primas, tıos, tıas y a los abuelos que ya
no estan, que pero se hubieran alegrado un monton en un dıa como este). Uno kilometros
mas alla esta Andria, donde hice la secundaria: un beso a todas mis companeras de clase (sı,
eran casi todas chicas, ¡que suerte!). De Andria una mencion especial se la merecen Ornella
(siguiendo a ella a Roma descubrı casualmente que existıa una Facultad de Estadıstica), y claro,
Daniele e Marcello, que luego fueron mis companeros de piso en Roma, y ahora son mucho
mas. Ellos me vieron dar mis primeros pasos como estadıstico, juntos a Popolino, ya que
tuvimos la gran suerte de que fuera a vivir con nosotros este chico tan pequenito, que pero es
un grande: Popolı, “si nu grus!!!”. Un saludo tambien a todos mis companeros de universidad,
y sobre todo a dos, Marco e Simone: “cdddaaaaa”. Y ya llegamos a Madrid, donde por cierto
hice tambien el Erasmus en la UC3M, y donde tuve la suerte de que me diera clases Juanmi,
quizas el profesor que me hizo pensar que volver a la UC3M para emprender este camino
podıa ser una buena idea, y de conocer a mucha gente, entre otros a Leo e Irmi: “pasta al
i
huevo, chicos”. Y ya llegamos a los ultimos seis anos en Madrid donde he tenido la suerte
de compartir momentos con gente como: Raqui, perdon si el primer dıa no te quise dejar el
abrigo, pero por suerte luego compartimos muchas cosas mas, porque compartir es vivirun
abrazo tambien a tu hermano Albe y a toda tu familia que me “adopto” un poco. Fresa, mi
compi de despacho, pero tambien de “notti magiche”. Juancho, otro que de “notti magiche”
entiende mucho. Mahomud, mi habibi, un amigo que te da todo. Otro habibi, Azad, que
cuando se fue le seguıa viendo por todas partes. Mariano, el de Roma. Demian, yo soy el
dıa y el la noche, pero al final nos lo estamos pasando bien. Audra, que se enfada un poco
cuando la llamo . . . (ella sabe), pero es con carino. Y tambien Ana Laura, Diana, Joao, Gabi,
Davidy muchos mas. Y finalmente, no por ser pelotas, el ultimo gracias a Rosa y a Pedro, que
es tambien gracias a ellos si he podido escribir estos agradecimientos. Ciao, Carlo.
ii
Abstract
In this thesis we deal with functional data, and in particular with the notion of functional
depth. A functional depth is a measure that allows to order and rank the curves in a functional
sample from the most to the least central curve. In functional data analysis (FDA), unlike in
univariate statistics where R provides a natural order criterion for observations, the ways how
several existing functional depths rank curves differ among them. Moreover, there is no agree-
ment about the existence of a best available functional depth. For these reasons among others,
there is still ongoing research in the functional depth topic and this thesis intends to enhance
the progress in this field of FDA.
As first contribution, we enlarge the number of available functional depths by introducing
the kernelized functional spatial depth (KFSD). In the course of the dissertation, we show that
KFSD is the result of a modification of an existing functional depth known as functional spa-
tial depth (FSD). FSD falls into the category of global functional depths, which means that the
FSD value of a given curve relative to a functional sample depends equally on the rest of the
curves in the sample. However, first in the multivariate framework, where also the notion of
depth is used, and then in FDA, several authors suggested that a local approach to the depth
problem may result useful. Therefore, some local depths for which the depth value of a given
observation depends more on close than distant observations have been proposed in the lit-
erature. Unlike FSD, KFSD falls in the category of local depths, and it can be interpreted as a
local version of FSD. As the name of KFSD suggests, we achieve the transition from global to
local proposing a kernel-type modification of FSD.
KFSD, as well as any functional depth, may result useful for several purposes. For in-
v
stance, using KFSD it is possible to identify the most central curve in a functional sample, that
is, the KFSD-based sample median. Also, using the p% most central curves, we can draw a
p%-central region (0 < p < 100). Another application is the computation of robust means such
as the α-trimmed mean, 0 < α < 1, which consists in the functional mean calculated after
deleting the proportion α of least central curves. The use of functional depths in FDA has gone
beyond the previous examples and nowadays functional depths are also used to solve other
types of problems. In particular, in this thesis we consider supervised functional classification
and functional outlier detection, and we study and propose methods based on KFSD.
Our approach to both classification and outlier detection has a main feature: we are inter-
ested in scenarios where the solution of the problem is not extremely graphically clear. In more
detail, in classification we focus on cases in which the different groups of curves are hardly rec-
ognizable looking at a graph, and we overlook problems where the classes of curves are easily
graphically detectable. Similarly, we do not deal with outliers that are excessively distant from
the rest of the curves, but we consider low magnitude, shape and partial outliers, which are
harder to detect. We deal with this type of problems because in these challenging scenarios it
is possible to appreciate important differences among both depths and methods, while these
differences tend to be much smaller in easier problems.
Regarding classification, methods based on functional depths are already available. In this
thesis we consider three existing depth-based procedures. For the first time, several functional
depths (KFSD and six more depths) are employed to implement these depth-based techniques.
The main result is that KFSD stands out among its competitors. Indeed, KFSD, when used
together with one of the depth based methods, i.e., the within maximum depth procedure,
shows the most stable and best performances along a simulation study that considers six dif-
ferent curve generating processes and for the classification of two real datasets. Therefore, the
results supports the introduction of KFSD as a new functional depth.
For what concerns outlier detection, we also consider some existing depth-based proce-
dures and the above-mentioned battery of functional depths. In addition, we propose three
new methods exclusively designed for KFSD. They are all based on a desirable feature for a
vi
functional depth, that is, a functional depth should assign a low depth value to an outlier. Dur-
ing our research, we have observed that KFSD is endowed with this feature. Moreover, thanks
to its local approach, KFSD in general succeeds in ranking correctly outliers that do not stand
out evidently in a graph. However, a low KFSD value is not enough to detect outliers, and it is
necessary to have at disposal a threshold value for KFSD to distinguish between normal curves
and outliers. Indeed, the three methods that we present provide alternative ways to choose a
threshold for KFSD. The simulation study that we carry out for outlier detection is similarly
extensive as in classification. Besides our proposals, we consider three existing depth-based
methods and seven depths, and two techniques that do not use functional depths. The results
of this second simulation study are also encouraging: the proposed KFSD-based methods are
the only procedures that have good correct outlier detection performances in all the six scenar-
ios and for the two contamination probabilities that we consider.
To summarize, in this thesis we will present a new local functional depth, KFSD, which will
turn out to be a useful tool in supervised classification, when it used in conjunction with some
existing depth-based methods, and in outlier detection, by means of some new procedures that
we will also present in this work.
vii
Resumen
El tema de esta tesis es el analisis de datos funcionales, y en particular de la nocion de pro-
fundidad funcional. Una medida de profundidad funcional permite ordenar las curvas de
una muestra funcional de la mas central a la menos central. Al contrario de lo que ocurre en
R donde existe una forma natural de ordenar las observaciones, en el analisis de datos fun-
cionales (FDA) no existe una forma unica de ordenar las curvas, y por tanto las diferentes pro-
fundidades funcionales existentes ordenan las curvas de distintas formas. Ademas, no existe
un acuerdo sobre la existencia de una profundidad funcional mejor para todas las situaciones
entre las disponibles. Por estas razones, entre otras, el tema de la nocion de profundidad fun-
cional es todavıa un area de estudio de investigacion activa, y esta tesis se propone colaborar
en los avances en este campo de FDA.
Como primera contribucion, en esta tesis se amplıa el numero de profundidades fun-
cionales disponibles mediante la introduccion de la profundidad espacial funcional kernel-
izada (KFSD). A lo largo de este trabajo, se muestra que KFSD es el resultado de una modifi-
cacion de una profundidad funcional existente conocida como profundidad espacial funcional
(FSD). FSD se puede englobar dentro de la categorıa de las profundidades funcionales glob-
ales, lo que significa que el valor de FSD para una curva dada, en relacion con una muestra
funcional, depende igualmente del resto de las curvas en la muestra. Sin embargo, como en el
contexto multivariante, donde tambien se utiliza el concepto de profundidad, varios autores
han sugerido que un enfoque local para la definicion de una profundidad puede resultar util
tambien en FDA. Por este motivo, en la literatura se han propuesto algunas profundidades
locales para las que el valor de la profundidad de una observacion depende mas de las ob-
ix
servaciones cercanas que de las distantes. A diferencia de FSD, KFSD se puede clasificar en
la categorıa de las profundidades locales, y puede ser interpretada como una version local de
FSD. Como el nombre de KFSD sugiere, la transicion de lo global a lo local se lograra mediante
una modificacion de FSD basada en el uso de los kernels.
KFSD, ası como cualquier otra profundidad funcional, puede resultar util para varios
propositos en el ambito del analisis estadıstico de datos. Por ejemplo, usando KFSD es posible
identificar la curva mas central en una muestra funcional, es decir, la mediana de la muestra
segun KFSD. Ademas, utilizando el p% de las curvas centrales, es posible definir la p%-region
central (0 < p% < 100). Otra aplicacion es el calculo de medias robustas, como por ejemplo la
α-media truncada, con 0 < α < 1, que consiste en la media funcional calculada sin considerar
la proporcion α de las curvas menos centrales. El uso de las profundidades funcionales en FDA
ha ido mas alla de los ejemplos anteriores, y en la actualidad las profundidades funcionales
tambien se utilizan para resolver otros tipos de problemas. En particular, en esta tesis se con-
sideran la clasificacion supervisada funcional y la deteccion de curvas atıpicas, y se estudian y
proponen metodos basados en KFSD.
El enfoque que se presenta en esta tesis en clasificacion y deteccion de atıpicos tiene una
caracterıstica principal: el foco del trabajo esta puesto en escenarios en los que la solucion del
problema no resulta muy clara graficamente. Especıficamente, en el apartado de clasificacion
se consideran casos en los que los diferentes grupos de curvas son apenas reconocibles mi-
rando un grafico, mientras que no se consideran problemas donde las clases de las curvas son
facilmente detectables graficamente. De manera similar, no esta entre nuestros objetivos de-
tectar curvas atıpicas que estan excesivamente alejadas graficamente del resto de las curvas, y
por el contrario se consideran atıpicos de baja magnitud, de forma y atıpicos parciales, que son
mas difıciles de detectar con los procedimientos que ya existen en la literatura. En este sentido,
se pondra en evidencia que en este tipo de problemas existen diferencias sustanciales entre las
profundidades y los metodos de analisis, mientras que estas diferencias tienden a ser menores
en problemas mas sencillos o visualmente mas evidentes.
En relacion con el problema de clasificacion funcional, existen en la literatura metodos basa-
x
dos en el uso de las profundidades funcionales. En esta tesis se consideran tres procedimien-
tos de este tipo, y por primera vez se combinan con varias profundidades funcionales (KFSD
y seis mas) con el objetivo de establecer comparativas entre metodos y/o profundidades con
los mismos escenarios. El resultado principal que se observa es que KFSD se destaca entre
sus competidores. De hecho, KFSD, cuando se utiliza junto a uno de los metodos conocido
como el procedimiento de profundidad maxima en los grupos, muestra los resultados mejores
y mas estables a lo largo de un estudio de simulacion que considera seis procesos diferentes
para generar las curvas, ası como en la clasificacion de dos conjuntos de datos reales. Por lo
tanto, los resultados obtenidos sustentan la introduccion de KFSD como nueva profundidad
funcional.
Por lo que se refiere a la deteccion de curvas atıpicas, tambien se consideran algunos pro-
cedimientos ya existentes basados en el uso de la nocion de profundidad y el grupo de sietes
profundidades mencionado arriba. Ademas, se proponen tres nuevos metodos disenados ex-
clusivamente para KFSD. Todos ellos se basan en una caracterıstica deseable en una profun-
didad funcional, es decir, que esta asigne un valor de profundidad baja a una curva atıpica.
Durante nuestra investigacion, se ha observado que KFSD posee esta caracterıstica. Ademas,
gracias a su enfoque local, KFSD es en general capaz de ordenar correctamente los atıpicos
que no se destacan claramente en un grafico. Sin embargo, un valor bajo de KFSD no es su-
ficiente para detectar curvas atıpicas, y es necesario tener a disposicion un valor umbral para
KFSD para distinguir entre curvas normales y atıpicas. De hecho, los tres metodos que se
presentan ofrecen formas alternativas para elegir un umbral para KFSD. Desde un punto de
vista metodologico, estos procedimientos estan respaldados por resultados teoricos de corte
probabilısticos. El estudio de simulacion que se lleva a cabo para la deteccion de atıpicos es
igualmente extenso como en el caso de clasificacion. Ademas de nuestras propuestas, se con-
sideran tres metodos existentes que estan basados en el uso de profundidades funcionales y
dos tecnicas que no utilizan profundidades funcionales. Los resultados de este segundo estu-
dio de simulacion son tambien positivos: los metodos basados en KFSD que se proponen en
esta tesis resultan ser los procedimientos que detectan mejor los atıpicos para un conjunto de
xi
seis escenarios simulados y para las dos probabilidades de contaminacion que se consideran.
En resumen, en esta tesis se presenta una nueva profundidad funcional local, KFSD, que
resulta ser una herramienta util en clasificacion supervisada cuando se utiliza conjuntamente
con algunos metodos basados en el uso de profundidades, y en la deteccion de curvas atıpicas
por medio de algunos nuevos procedimientos que tambien se presentan en este trabajo.
xii
Contents
List of Figures xix
List of Tables xxi
1 Introduction 1
1.1 The Notion of Functional Depth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Depth-based Functional Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 The Kernelized Functional Spatial Depth 15
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Other Functional Depths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 The Functional Spatial Depth and Associated Quantiles . . . . . . . . . . . . . . . 18
2.4 The Kernelized Functional Spatial Depth . . . . . . . . . . . . . . . . . . . . . . . 20
2.5 From Global to Local Depths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3 Supervised Functional Classification 31
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2 The Supervised Functional Classification Problem . . . . . . . . . . . . . . . . . . 32
3.3 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Real Data Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
xv
3.4.1 Growth Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4.2 Phoneme Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4 Functional Outlier Detection 55
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.2 Outlier Detection for Functional Data . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3 Simulation Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 Real Data Study: Nitrogen Oxides (NOx) Data . . . . . . . . . . . . . . . . . . . . 72
4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
4.6 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5 Conclusions 81
5.1 Research Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
References 87
xvi
List of Figures
1.1 Three real functional datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.1 Showing the differences between global and local depths . . . . . . . . . . . . . . 24
2.2 Examples of contaminated datasets: clear contamination (top) and faint contam-
ination (bottom). The solid curves are normal curves and the dashed curves are
outliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Comparison of depth-based rankings . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Comparison of KFSD-based rankings . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1 Simulated datasets from CGP1, CGP2, CGP1out and CGP2out . . . . . . . . . . . . 37
3.2 Simulated datasets from CGP3 and CGP4 . . . . . . . . . . . . . . . . . . . . . . . 43
3.3 Growth curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Growth curves: highlighting some interesting curves for the classification problem 48
3.5 Phoneme curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.1 Simulated datasets from MM1, MM2, MM3, MM4, MM5 and MM6 . . . . . . . . 64
4.2 Example of a training sample of peripheral curves . . . . . . . . . . . . . . . . . . 66
4.3 Boxplots of the selected bandwidths for KFSD . . . . . . . . . . . . . . . . . . . . 71
4.4 NOx curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.5 NOx curves: highlighting some detected outliers . . . . . . . . . . . . . . . . . . . 74
xix
List of Tables
2.1 Percentages of times a depth assigns a value among the nout,j lowest ones to an
outlier. Types of outliers: clear and faint . . . . . . . . . . . . . . . . . . . . . . . . 26
3.1 Supervised classification: CGP1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.2 Supervised classification: CGP2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.3 Supervised classification: CGP1out . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4 Supervised classification: CGP2out . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.5 Cross-validation and best performing percentiles: CGP1, CGP2, CGP1out and
CGP2out . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.6 Supervised classification: CGP3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.7 Supervised classification: CGP4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.8 Cross-validation and best performing percentiles: CGP3 and CGP4 . . . . . . . . 44
3.9 Supervised classification: Growth data and T1 . . . . . . . . . . . . . . . . . . . . 46
3.10 Supervised classification: Growth data and T2 . . . . . . . . . . . . . . . . . . . . 47
3.11 Cross-validation and best performing percentiles: Growth data . . . . . . . . . . 47
3.12 Supervised classification: Phoneme data and T1 . . . . . . . . . . . . . . . . . . . 51
3.13 Supervised classification: Phoneme data and T2 . . . . . . . . . . . . . . . . . . . 51
3.14 Cross-validation and best performing percentiles: Phoneme data . . . . . . . . . 51
4.1 Outlier detection: MM1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.2 Outlier detection: MM2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
xxi
4.3 Outlier detection: MM3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4 Outlier detection: MM4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.5 Outlier detection: MM5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.6 Outlier detection: MM6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.7 Outlier detection: NOx data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
xxii
La poesia non e di chi la scrive,e di chi gli serve!(Mario Ruoppolo to Pablo Neruda in Il Postino)
Chapter 1
Introduction
The technological advances of the last decades in fields such as chemometrics, engineering,
finance, growth analysis or medicine among others have allowed to observe random samples
of curves. In these cases it is common to assume that the curves have been generated by a
stochastic function and to refer to them as functional data. More precisely, a functional datum
y is an observation of a functional random variable Y ∈ H, where H is an infinite-dimensional
(functional) space. Alternatively, Y can also be interpreted as a stochastic process Y (s), s ∈ I,
where I is an interval in R, and the functional datum expressed as y(s).
In practice, a functional datum y(s) is always observed at a finite number of evaluation
points, that is, in the form of y(s) = (y(s1), . . . , y(sm)), where s = (s1, . . . , sm) is the set of
domain points where y(s) has been measured. However, even if y(s) is a vector, there are at
least three reasons why it is harmful to treat functional data as multivariate data.
1. The sets of domain points where the functional data of a sample are measured may differ
in size and/or elements among them. Moreover, for each observation the order of the
vector s contains an important amount of information and permutations of its elements
are not allowed, unlike in the multivariate case.
2. Any curve is a realization of a stochastic function with a certain dependence structure.
Consequently, functional data are often autocorrelated. Therefore, it is hard to analyze
1
INTRODUCTION 2
functional data by means of standard multivariate procedures because they usually fail
in presence of autocorrelation.
3. Functional samples may contain less curves than evaluation points or, in multivariate
jargon, less rows than columns. It is well known that matrices with more columns than
rows are hardly handled by multivariate techniques.
As a response to the difficulty described above of multivariate data analysis to deal with
functional data, the area of statistics known as functional data analysis (FDA) has arisen in
recent years. Two seminal and complementary books about FDA, one parametric and the other
nonparametric, are Ramsay and Silverman (2005) and Ferraty and Vieu (2006), respectively.
More recently, Horvath and Kokoszka (2012) focused on inference and asymptotic theory for
functional data, whereas Cuevas (2014) presented a partial overview of the state of art in FDA
theory. To give an idea of the kind of data FDA deals with, we show in Figure 1.1 three real
datasets that are considered in this thesis. They are:
• growth curves of 54 heights of girls measured at a common discretized set of 31
nonequidistant ages between 1 and 18 years (Ramsay and Silverman 2005);
• 100 log-periodograms of length 150 corresponding to recordings of speakers pronounc-
ing the phoneme “aa” (Ferraty and Vieu 2006);
• 76 nitrogen oxides (NOx) emission level daily curves measured every hour close to an in-
dustrial are in Poblenou (Barcelona). All the curves correspond to working days (Febrero
et al. 2008).
Before closing this introduction, we make a brief remark on a FDA aspect that is not central
in this thesis, but it is worth to mention. According to Cuevas (2014), a functional datum y(s)
may require a preliminary treatment to either reduce its dimension or remove noise. Among
other alternatives, both objectives can be achieved through basis representation: let ej(s) be
a basis system of H, then y(s) can be substituted by a truncated basis representation given by
INTRODUCTION 3
5 10 15
8010
012
014
016
018
0
Growth
0 50 100 150
05
1015
2025
Phoneme
0 5 10 15 20
010
020
030
040
0
NOx
Figure 1.1: Three real functional datasets: growth curves of girls (top left), log-periodograms of the phoneme “aa” (top right), daily curves of NOx levels (bottom)
y(s) =
J∑j=1
cjej(s),
where J denotes the number of basis functions to use and cj the coefficients to be chosen
according to some criteria, e.g., least squares
m∑k=1
y(sk)−J∑j=1
cjej(sk)
2
.
Clearly, the choices of ej(s), J and the criterion to select cj are very important aspects, but
we do not discuss them in the thesis since they are beyond our scope. For more details on this
FDA issue, we recommend the book by Ramsay and Silverman (2005).
INTRODUCTION 4
1.1 The Notion of Functional Depth
In the previous section we discussed the juxtaposition of multivariate and functional data anal-
ysis. However, their differences did not prevent multivariate techniques to inspire advances
in FDA. A good example of the collaboration between the two areas is given by the extension
of the notion of data depth from the multivariate to the functional framework.
The idea of data depth was born in the multivariate context in a successful attempt to ex-
tend the univariate notion of order statistics to multidimensional spaces. Univariate order
statistics allow to rank data from the smallest to the largest observation and to evaluate the
degree of centrality of a point relative to a probability distribution or a sample. Moving to Rd,
d ≥ 2, although Rd lacks the natural order of R, it is still interesting to have a center-outward
ordering criterion and the notion of multivariate depth succeeds in providing it. According to
Serfling (2006), a multivariate depth is a function that measures how deep (or central) a point
x ∈ Rd is relative to the probability distribution P . To give an idea of how a multivariate depth
should work, consider a P with a unique mode Mo anda certain tail. Then, the depth value of
x should be high for x = Mo and low for x equal to a value in the tail region.
Several implementations of the notion of multivariate depth have been proposed in the
literature. For an overview on this topic, see for example Liu et al. (1999) or Zuo and Ser-
fling (2000). Next, we report the definitions of three multivariate depths. First, the halfspace
depth (Tukey 1975), which is defined as the minimum probability mass carried by any closed
halfspace H ∈ Rd containing x. More precisely,
Definition 1.1. Halfspace (or Tukey) depth (Tukey 1975)
The halfspace depth of x ∈ Rd relative to P on Rd is given by
HSD(x, P ) = infHPr(H) : x ∈ H . (1.1)
Second, the simplicial depth (Liu 1990), which is defined as the probability that x belongs
to a random simplex S ∈ Rd, that is,
INTRODUCTION 5
Definition 1.2. Simplicial depth (Liu 1990)
The simplicial depth of x ∈ Rd relative to P on Rd is given by
SID(x, P ) = Pr (x ∈ S (y1, . . . ,yd+1)) , (1.2)
where S (y1, . . . ,yd+1) denotes the d-dimensional simplex with vertices y1, . . . ,yd+1.
Third, the spatial depth (Serfling 2002), that uses the geometry of multivariate data clouds
and is connected with a notion of multivariate quantiles as we show later.
Definition 1.3. Spatial depth (Serfling 2002)
Let Y be a random variable having probability distribution P on Rd and F be the associated
cumulative distribution of Y. The spatial depth of x ∈ Rd relative to P is given by
SD(x, P ) = 1−∥∥∥∥∫ S(x− y)dF (y)
∥∥∥∥E
= 1− ‖E [S(x−Y)]‖E , (1.3)
where ‖·‖E is the Euclidean norm in Rd and S : Rd → Rd is the spatial sign function given by
S(x) =
x‖x‖E , x 6= 0,
0, x = 0.
Among the previous three multivariate depths, SD has certainly the most important role
for the development of this thesis. For this reason, we report two features of the multivariate
spatial depth. First, as mentioned above, SD is connected with the notion of multivariate
spatial quantile introduced by Chaudhuri (1996):
Definition 1.4. Spatial quantile (Chaudhuri 1996)
Let Y be a random variable having probability distribution P on Rd. Consider u ∈ Rd such
that ‖u‖E < 1. Then, QP (u) is the uth spatial quantile of Y if and only if QP (u) is the value of
q which minimizes
E [Ψ(u,Y − q)−Ψ(u,Y)] ,
INTRODUCTION 6
where, for y ∈ Rd, Ψ(u,y) = ‖y‖E + 〈u,y〉E , and 〈u,y〉E is the Euclidean inner product of u
and y.
Then, it can be shown that under some mild conditions QP (u) and SD(x, P ) are linked in
the following way:
‖Q−1P (x)‖E = 1− SD(x, P ). (1.4)
Therefore, if x has a high spatial depth value, its associated spatial quantile u has a low norm,
and vice versa.
Second, SD(x, P ) depends on the whole P , that is, SD as well as HSD and SID tackle the
depth problem through an approach that we describe as “global”. However, in some analy-
ses it may be recommended to focus on narrower neighborhoods of P and tackle the depth
problem with a “local approach”. With the aim of showing how a local approach can be im-
plemented, we next briefly describe the local versions of the global depths HSD, SID and SD
that have been proposed in the literature:
• the local halfspace depth LHSD (Agostinelli and Romanazzi 2011), which replaces closed
halfspaces with closed slabs. Denote with Hu(a) the closed halfspace z ∈ Rd : u′z ≥
a, u′u = 1. Then, (1.1) can be written as
HSD(x, P ) = infu:u′u=1
Pr(Hu(u′x)). (1.5)
A closed slab SLu(a, a+ τ) between two parallel closed halfspaces Hu(a) and Hu(a+ τ)
is the intersection between the positive side of Hu(a) and the negative side of Hu(a+ τ).
Then, LHSD is given by the following modification of (1.5):
LHSD(x, P, τ) = infu:u′u=1
Pr(SLu(u′x, u′x+ τ)).
• the local simplicial depth LSID (Agostinelli and Romanazzi 2011), which only considers
INTRODUCTION 7
simplices with size no greater than a fixed threshold τ and is given by the following
modification of (1.2):
LSID(x, P, τ) = Pr (x ∈ S (y1, . . . ,yd+1) : v(S (y1, . . . ,yd+1)) ≤ τ) ,
where v(S (y1, . . . ,yd+1)) denotes the volume of the simplex with vertices y1, . . . ,yd+1.
• the kernelized spatial depth KSD (Chen et al. 2009), which is basically based on consid-
ering S(x− y) in (1.3) through a kernel function that reduces the effect of y on the depth
of x for y distant from x.
The notion of multivariate depth has been extended to functional data. In FDA, a depth
has a similar purpose, that is, it allows to obtain a center-outward ordering criterion for curves,
and different implementations already exist in the FDA literature. For example, Chakraborty
and Chaudhuri (2014) defined the functional version of SD(x, P ), the functional spatial depth
function FSD(x, P ), where now x is an element of an infinite-dimensional Hilbert space H
and P is a probability distribution on H. As well as SD, FSD is a global-oriented depth. We
give its formal definition in Chapter 2, where we also present the functional extension of the
notion of spatial quantile function, FQP (u), where now u ∈ H such that ‖u‖ < 1, and ‖ · ‖ is
the norm derived from the inner product 〈·, ·〉 in H (Chaudhuri 1996). In the same chapter we
show that the relationship given by (1.4) extends to the functional framework, and therefore
also FSD(x, P ) and FQP (u) are linked.
The main contribution of this thesis is the definition of a new functional depth. Indeed, we
define the kernelized functional spatial depth (KFSD) which can be interpreted in two ways:
first, KFSD is a functional extension of the multivariate KSD proposed by Chen et al. (2009);
second, KFSD is a local-oriented version of the global-oriented FSD. Then, we use KFSD to
solve functional statistical problems that may require an analysis at a local level. We mainly
focus on supervised classification and outlier detection and we show that KFSD represents a
useful tool to solve such problems.
Along the thesis, besides FSD and KFSD, we consider other functional depths that we next
INTRODUCTION 8
briefly introduce: Fraiman and Muniz (2001) defined the Fraiman and Muniz depth (FMD),
which tries to measure how long a curve remains in the middle of a sample. Cuevas et al.
(2006) proposed the h-modal depth (HMD), which tries to measure how densely a curve is sur-
rounded by other curves. Cuesta-Albertos and Nieto-Reyes (2008) and Cuevas and Fraiman
(2009) proposed the random Tukey depth (RTD) and the integrated dual depth (IDD), respec-
tively. Both depths are based on the computation of K random one-dimensional projections
of the curves, but they differ on how the projections are treated: RTD is given by the mini-
mum of the univariate Tukey depth values of the projections, IDD is given by the average of
the univariate simplicial depth values of the projections. Finally, Lopez-Pintado and Romo
(2009) proposed the band and modified band depths. In particular, the modified band depth
(MBD) is based on all the possible bands defined by the graphs on the plane of 2, 3, . . . and J
curves, and on a measure of the sets where another curve is inside these bands. We provide
the definitions of these functional depths in Chapter 2.
1.2 Depth-based Functional Methods
In what follows we often use the notion of i. i. d. functional sample that may refer alternatively
to:
1. an i. i. d. random sample of size n generated from the random variable Y ∈ H. In this
case we use the notation Yn = y1, . . . , yn;
2. an i. i. d. random sample of size n generated from the stochastic process Y (s), s ∈ I. In
this case we use the notation Yn(s) = y1(s), . . . , yn(s);
3. an i. i. d. random sample of size n generated from the stochastic process Y (s), s ∈ I
whose ith component has been observed at si = (s1i , . . . , smi), i ∈ 1, . . . , n. In this case
we use the notation Yn(s) = y1(s1), . . . , yn(sn).1
1if si 6= sj for some i, j ∈ 1, . . . , n, it may be necessary a preliminary step to estimate the curves at a commonset of domain points. In this thesis, when the data require such preliminary treatment, we carry out the estimationusing cubic spline interpolation. Other techniques can be used for this task.
INTRODUCTION 9
A functional sample (e.g., Yn) can be analyzed using a functional depth. For example,
functional location estimation of the center of Y can be provided by either non-depth-based or
depth-based statistics. A natural estimation of the center of Y is given by the functional mean,
µ =1
n
n∑i=1
yi.
Clearly, the computation of µ does not require the use of a functional depth. However, a depth
analysis of Yn provides a center-outward ordering of the observations, i.e., Y(n) = y(1), . . . ,
y(n), where y(1) and y(n) are the curves in Yn with minimum and maximum depth, respec-
tively. Y(n) provides an alternative to µ, that is, the functional depth-based median given by
the deepest/most central observation in Yn, Me = y(n). Another substitute for µ may be the
functional depth-based α-trimmed mean µα, 0 < α < 1, that can be obtained as follows: com-
pute rα = αn and take the smallest integer greater or equal than rα, drαe. Then,
µα =1
n
n∑i=drαe
y(i).
Moreover, the information contained in Y(n) allows to estimate locations other than the
center of Y by means of the notion of functional depth-based percentiles: let 0% ≤ p ≤ 100%,
rp = pn100 and rp be the nearest integer to rp, then the p depth-based percentile of Yn is given by
y(rp).
Besides for location estimation, functional depths can be used to build methods that are
robust to functional outliers, i.e., curves generated by a different random variable than the one
of the normal curves. For instance, supervised functional classification is one of the problems
that has been tackled using the notion of depth. In a supervised functional classification prob-
lem there are some labeled training curves that belong to two or more groups and the goal is to
classify some test curves with unknown class membership. Lopez-Pintado and Romo (2006)
and Cuevas et al. (2007) proposed three depth-based classification methods which are espe-
cially devised to deal with functional samples where the presence of outlying curves cannot
be discarded. They are the distance to the trimmed mean method (DTM, Lopez-Pintado and
INTRODUCTION 10
Romo 2006), the weighted averaged distance method (WAD, Lopez-Pintado and Romo 2006)
and the within maximum depth method (WMD, Cuevas et al. 2007). We formally present them
in Chapter 3, but here we anticipate that the robustness to outliers of DTM, WAD and WMD
is achieved thanks to the use of a functional depth: in DTM, the decision rule depends on the
group depth-based trimmed means, that are resistant to outliers; in WAD, the classification is
done giving more weight to those curves that are deeper within the group training samples,
and outliers have usually low depth values; in WMD, the class of an unlabeled curve depends
on its depth values relative to the different training groups, and these values are barely affected
by outliers. We enhance the study of DTM, WAD and WMD by studying their performances
when they are used in conjunction with KFSD and the other above-mentioned existing func-
tional depths. In particular, we consider supervised functional classification problems where
the differences between groups are not extremely clear-cut or the data may contain outlying
curves, and we show that in these challenging scenarios a local approach based on the use of
KFSD leads to good results.
An alternative strategy to the construction of robust functional methods is outlier detec-
tion. When a sample of curves is ordered from the most to the least central curve using a
functional depth, if any outlier is in the sample, its depth is expected to be among the lowest
values. Therefore, it is reasonable to build outlier detection methods based on this feature.
Indeed, methods of this nature already exist in the literature. For example, Febrero et al. (2008)
proposed to label as outliers those curves with depth values lower than a certain threshold. As
functional depths, they considered three alternatives, i.e., the Fraiman and Muniz depth FMD
(Fraiman and Muniz 2001), the h-modal depth HMD (Cuevas et al. 2006) and the integrated
dual depth IDD (Cuevas and Fraiman 2009), and they proposed two bootstrap procedures
based on depth-based trimmed and weighted resampling to determine the depth threshold.
Also, Sun and Genton (2011) introduced the functional boxplot, which is constructed using the
ordering provided by the modified band depth MBD (Lopez-Pintado and Romo 2009). Then,
the functional boxplot allows to detect outliers as well as the standard boxplot does. Clearly,
the use of a functional depth is only one of the possible strategies for tackling the functional
INTRODUCTION 11
outlier detection problem. For example, Hyndman and Shang (2010) proposed to reduce the
outlier detection problem from functional to multivariate data by means of functional prin-
cipal component analysis (FPCA), and to use two alternative multivariate techniques on the
scores to detect outliers, i.e., the bagplot and the high density region boxplot. In this thesis,
we contribute to the FDA literature on outlier detection by enlarging the number of available
procedures. Indeed, we present three new methods that are based on smoothed resampling
techniques and allow to select a threshold for KFSD to detect outliers. Similarly as for clas-
sification, we evaluate these new procedures in challenging scenarios: we focus on low mag-
nitude, shape and partial outliers, that is, on outliers that are difficult to recognize, and we
observe results that support the proposed KFSD-based outlier detection methods.
1.3 Structure of the Thesis
This thesis contains five chapters. In the current chapter we discussed some FDA issues, we
presented three real functional datasets and we introduced the notion of functional depth. To
do that, we also recalled some global and local multivariate depths and described how the
concept of depth has been extended to FDA. Finally, we recalled some examples of how a
functional depth can help in doing statistics in FDA, e.g., location estimation, supervised clas-
sification and detection of outliers.
The contributions of this dissertation are developed in Chapters 2, 3 and 4. In Chapter 2
we present a new functional depth, the kernelized functional spatial depth (KFSD). KFSD is a
depth relying on an approach that is both spatial and local. Its local orientation is based on the
use of kernel methods that allow the depth value of a curve x to depend more on the curves
lying in a narrow neighborhood of x. By means of motivating examples, we show that the
local approach behind KFSD results useful to analyze functional samples having for instance
a structure that deviates from unimodality or that are contaminated by outliers. In Sections 2.3
and 2.2, we present the other existing functional depths that we use in this thesis (FMD, HMD,
RTD, IDD, MBD and FSD).
INTRODUCTION 12
In Chapter 3 we use KFSD to perform supervised classification. After describing the sta-
tistical problem of functional discrimination, we present three existing depth-based methods
(DTM, WAD and WMD) and carry out an extensive simulation study. The main feature of the
study, and a novelty in the field, is that we focus on scenarios where the difference between
groups are not extreme. We show that in these scenarios the use of KFSD together with some of
the depth-based classification methods leads to competitive results. In Section 3.4 we classify
the curves of two real datasets and we confirm the good results observed with the simulated
curves.
In Chapter 4 we deal with outlier detection. As for classification, the strategy consists in us-
ing KFSD as a tool to reach our statistical goal. In this case the objective is the identification of
outliers for which we provide three new different procedures based on KFSD. We define these
methods in Section 4.2, whereas we evaluate them in Section 4.3, where we also consider some
existing procedures as competitors. If in classification we do not consider classes extremely
different, in outlier detection we not consider extreme outliers. Indeed, we find more interest-
ing to focus on atypical curves that are difficult to recognize and the new procedures perform
well in detecting this type of outliers. Finally, we close Chapter 4 doing outlier detection on a
real dataset.
The final chapter of the dissertation is dedicated to some conclusions and to the presenta-
tion of some possible future research lines.
E la vita, oggi a te domani a lui!(Dante Cruciani to Ferribotte in I Soliti Ignoti)
Chapter 2
The Kernelized Functional Spatial
Depth1
2.1 Introduction
The origins of the general idea of spatial depth date back to Brown (1983), who studied the
problem of robust location estimation for two-dimensional spatial data and introduced the
idea of spatial median. The spatial approach considers the geometry of the data and is at the
basis of the multivariate spatial depth (Serfling 2002) and quantiles (Chaudhuri 1996) intro-
duced in Chapter 1 (Definitions 1.3 and 1.4, respectively). Both notions have been extended to
functional spaces and they are presented in Section 2.5. In particular, we report the definition
of the functional spatial depth (FSD) due to Chakraborty and Chaudhuri (2014). Then, we
present a new functional depth, the kernelized functional spatial depth (KFSD), which arises
from a modification of FSD. From now on, we assume that H is an infinite-dimensional Hilbert
space, therefore equipped with an inner product function 〈·, ·〉 and a norm function ‖ · ‖ in-
herited from 〈·, ·〉. Before presenting FSD and KFSD, we report the definitions of the other
functional depths that we consider in this thesis.
1This chapter is mostly based on the forthcoming article in the journal TEST by Sguera et al. (2014b)available online at http://link.springer.com/article/10.1007/s11749-014-0379-1 or upon request([email protected])
15
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 16
2.2 Other Functional Depths
In this section we report the definitions of the other functional depths that we use to carry out
depth-based supervised classification and outlier detection.
Fraiman and Muniz (2001) introduced the first implementation of the notion of depth for
functional data and used it to define ranks and trimmed means in the functional framework.
Their idea consists in considering the integral of the univariate depths of x(s) at each single
point s ∈ I :
Definition 2.1. Fraiman and Muniz depth (FMD, Fraiman and Muniz 2001)
Let Y (s), s ∈ I be a stochastic process in C(I), the space of continuous functions on the
interval I . The Fraiman and Muniz depth of x(s) ∈ C(I) relative to Y (s) is given by
FMD(x(s), Y (s)) =
∫ID(x(s), Y (s)) ds,
where D(·, ·) is a univariate depth. Fraiman and Muniz (2001) propose to use the univariate
simplicial depth as D(·, ·), that is,
D(x(s), Y (s)) = Fs(x(s))(1− Fs(x(s)))
for any s ∈ I , where Fs(·) is the cumulative distribution function of Y (s) at any fixed s ∈ I .
In the initial attempt to extend the concept of mode to the functional setup, Cuevas et al.
(2006) also defined a local depth based on the use of a kernel function. Their basic idea is to
measure how densely a trajectory is surrounded by other trajectories of the process:
Definition 2.2. h-modal depth (HMD, Cuevas et al. 2006)
Let Y be a functional random variable having probability distribution P on H. The h-modal
depth of x ∈ H relative to P is given by
HMD(x, P ) = E [κh (x, Y )] =1
hE [κ (x, Y )] ,
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 17
where κh, κ : H × H → R are two kernel functions such that κh(x, Y ) = 1hκ(x, Y ) and h is a
fixed tuning parameter.
Clearly, HMD depends on the choice of κ and h. We report the recommendations of the au-
thors in Chapters 3 and 4.
Cuesta-Albertos and Nieto-Reyes (2008) proposed a multivariate depth consisting in a ran-
dom approximation of HSD based on a finite number of one-dimensional projections. Their
first goal was to unburden the demanding computation of HSD, but they also defined a func-
tional extension of their multivariate depth:
Definition 2.3. Random Tukey depth (RTD, Cuesta-Albertos and Nieto-Reyes 2008)
Let Y be a functional random variable having probability distribution P on H. Let v be an
absolutely continuous distribution on H and V = v1, . . . , vK be an i. i. d. random sample
generated from v. The Random Tukey depth of x ∈ H relative to P on V is given by
RTDV (x, P ) = min HSD (〈vk, x〉, Pvk) : vk ∈ V, k = 1, . . . ,K ,
where Pvk denotes the distribution of 〈vk, Y 〉.
Another random functional depth has been proposed by Cuevas and Fraiman (2009). We
report its definition for Hilbert spaces, but this depth can also be defined for data that are
elements of a Banach space.
Definition 2.4. Integrated dual depth (IDD, Cuevas and Fraiman 2009)
Let Y be a functional random variable having probability distribution P on H. Let v be an
absolutely continuous distribution on H and V = v1, . . . , vK be an i. i. d. random sample
generated from v. The integrated dual depth of x ∈ H relative to P on V is given by
IDDV (x, P ) =1
K
K∑k=1
SID (〈vk, x〉, Pvk) ,
where Pvk denotes the distribution of 〈vk, Y 〉.
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 18
Finally, Lopez-Pintado and Romo (2009) introduced two definitions of depth for functional
observations based on the graphic representation of the curves and on the bands in R2 delim-
ited by j curves. First, the band depth:
Definition 2.5. Band depth (BD, Lopez-Pintado and Romo 2009)
Let Y (s), s ∈ I be a stochastic process in C(I) and Y1(s), . . . YJ(s) be J i. i. d. copies of
Y (s). The band depth of x(s) ∈ C(I) relative to Y (s) is given by
BD(x(s), Y (s)) =J∑j=2
Pr
(min
i=1,...,jYi(s) ≤ x(s) ≤ max
i=1,...,jYi(s), ∀s ∈ I
),
In practice, BD consists in a sum of probabilities of indicator function values. Instead of
considering the indicator function, Lopez-Pintado and Romo (2009) introduced a more flexible
definition by measuring the set where the function x(s) is inside the corresponding band:
Definition 2.6. Modified band depth (MBD, Lopez-Pintado and Romo 2009)
Let Y (s), s ∈ I be a stochastic process in C(I) and Y1(s), . . . YJ(s) be J i. i. d. copies of
Y (s). The modified band depth of x(s) ∈ C(I) relative to Y (s) is given by
MBD(x(s), Y (s)) =J∑j=2
E[λ
(s ∈ I : min
i=1,...,jYi(s) ≤ x(s) ≤ max
i=1,...,jYi(s)
)],
where λ(·) is the Lebesgue measure on I .
In this thesis we do not employ BD, but only MBD. As recommended by the authors, we
use J = 2 as the maximum number of curves delimiting a band.
The last existing functional depth that we consider is the functional spatial depth. Since
FSD is closely related to our proposal KFSD, we dedicate a whole section of the thesis to the
introduction of FSD.
2.3 The Functional Spatial Depth and Associated Quantiles
The functional spatial depth FSD is a recent implementation of the general notion of functional
depth:
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 19
Definition 2.7. Functional spatial depth (FSD, Chakraborty and Chaudhuri 2014)
Let Y be a functional random variable having probability distribution P on H. The functional
spatial depth of x ∈ H relative to P is given by
FSD(x, P ) = 1− ‖E [FS(x− Y )]‖ ,
where FS : H→ H is the functional spatial sign function given by2
FS(x) =
x‖x‖ , x 6= 0,
0, x = 0.
Next, we show that FSD is connected with the notion of functional spatial quantiles due to
Chaudhuri (1996):
Definition 2.8. Functional spatial quantile (Chaudhuri 1996)
Let Y be a functional random variable having probability distribution P on H. Consider u ∈ H
such that ‖u‖ < 1. Then, FQP (u) is the uth functional spatial quantile of Y if and only if
FQP (u) is the value of q which minimizes
E [Ψ(u, Y − q)−Ψ(u, Y )] , (2.1)
where, for y ∈ H, Ψ(u, y) = ‖y‖+ 〈u, y〉.
Cardot et al. (2013) showed that, if Y is not concentrated on a straight line and is not
strongly concentrated around single points, the Frechet derivative of the convex function (2.1)
is given by
ψ(q) = −E[Y − q‖Y − q‖
]− u. (2.2)
Therefore, FQP (u) is also the value of q such that ψ(q) = 0. Let x be the solution of ψ(q) = 0.
Then, −E[Y−x‖Y−x‖
]= u = FQ−1P (x) and
2For x = x(s) = 0 for any s > 0 we use the notation x = 0.
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 20
‖FQ−1P (x)‖ =
∥∥∥∥−E [ Y − x‖Y − x‖
]∥∥∥∥ = ‖E [FS(x− Y )]‖ = 1− FSD(x, P ),
which indeed shows that the direct connection between the notions of spatial depth and quan-
tiles holds also in functional Hilbert spaces. Besides this interesting interpretability property
of FSD, Chakraborty and Chaudhuri (2014) also showed that: (1) FSD(x, P ) is invariant under
the class of linear transformations T : H → H, where T (x) = cAx + b, for c ∈ R, c > 0, b ∈ H
and an isometry A on H; (2) if P is non-atomic, then FSD(x, P ) is continuous in x; (3) if H
is strictly convex and P is non-atomic and not supported on a line in H, then FSD(x, P ) has
a unique maximum at the spatial median Me of Y and its maximum value is 1;3 (4) for any
non-zero x ∈ H and sequence Me + nxn∈N+ , the following holds: FSD(m + nx, P ) → 0 as
n → ∞; (5) FSD(x, P ) does not suffer from degeneracy for many infinite dimensional prob-
abilities distributions. For more details on the desirable properties of a functional depth, see
Mosler and Polyakova (2012).
When a functional sample is observed, FSD(x, P ) is replaced by its corresponding sample
version:
Definition 2.9. Functional spatial depth, sample version (Chakraborty and Chaudhuri 2014)
Let Yn = y1, . . . , yn be an i. i. d. random sample generated from the random variable Y ∈ H.
The sample functional spatial depth of x ∈ H relative to Yn is given by
FSD(x, Yn) = 1− 1
n
∥∥∥∥∥n∑i=1
FS(x− yi)
∥∥∥∥∥ . (2.3)
Note that the expression of above will result important for the definition of KFSD.
2.4 The Kernelized Functional Spatial Depth
In this section we propose the kernelized functional spatial depth, a local-oriented version of
FSD. A common way to implement a local approach is to consider kernel-based methods. This
3If Y is not concentrated on a straight line and is not strongly concentrated around single points, the spatialmedian Me of Y is the unique solution of (2.2) for u = 0.
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 21
was the strategy of Chen et al. (2009), who proposed the multivariate kernelized spatial depth
KSD, the local version of the multivariate spatial depth SD. Moving to functional spaces, we
achieve a similar result and we provide a local depth based on FSD.
To do this, our first step consists in recoding the data. More in detail, instead of considering
x ∈ H, we consider φ(x) ∈ F, where φ : H → F is an embedding map and F is a feature space.
Note that φ can be defined implicitly by a positive definite and stationary kernel, κ : H×H→ R,
through
κ(x, y) = 〈φ(x), φ(y)〉.
For this reason, we refer to this new functional depth as kernelized functional spatial depth.
Its definition is the following:
Definition 2.10. Kernelized functional spatial depth
Let Y be a functional random variable having probability distribution P on H, φ : H → F be
an embedding map and F be a feature space. The kernelized functional spatial depth of x ∈ H
relative to P is given by
KFSD(x, P ) = 1− ‖E [FS(φ(x)− φ(Y ))]‖ = FSD(φ(x), Pφ),
where φ(Y ) and Pφ are recoded version of Y and P , respectively.
The sample version of KFSD(x, P ) is easily obtainable and given by
KFSD(x, Yn) = 1− 1
n
∥∥∥∥∥n∑i=1
FS(φ(x)− φ(yi))
∥∥∥∥∥ = FSD(φ(x), φ(Yn)), (2.4)
where φ(Yn) is a recoded version of Yn. Therefore, both KFSD(x, P ) and KFSD(x, Yn) can
be interpreted as recoded versions of FSD(x, P ) and FSD(x, Yn), respectively.
Note that for KFSD(x, Yn) it is possible to obtain an expression where its kernel nature is
explicit. Indeed, the following holds:
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 22
∥∥∥∥∥n∑i=1
FS(x− yi)
∥∥∥∥∥2
=
n∑i,j=1;
yi 6=x;yj 6=x
〈x, x〉+ 〈yi, yj〉 − 〈x, yi〉 − 〈x, yj〉√〈x, x〉+ 〈yi, yi〉 − 2〈x, yi〉
√〈x, x〉+ 〈yj , yj〉 − 2〈x, yj〉
,
which implies that FSD(x, Yn) can be expressed in terms of inner products. Thereby, recoding
the data is equivalent to consider a kernel instead of the inner product. Both inner products
and kernel functions can be seen as similarity measures, but kernels are more powerful and
richer than inner products. Therefore, we opt for replacing the inner product function with
a positive definite and stationary kernel function, thus obtaining a κ-based sample version of
KFSD:
Definition 2.11. Kernelized functional spatial depth, κ-based sample version
Let Yn = y1, . . . , yn be an i. i. d. random sample generated from the random variable Y ∈ H
and κ be positive definite and stationary kernel, κ : H×H→ R. The κ-based sample kernelized
functional spatial depth of x ∈ H relative to Yn is given by
KFSD(x, Yn) = 1−
1
n
n∑i,j=1;
yi 6=x;yj 6=x
κ(x, x) + κ(yi, yj)− κ(x, yi)− κ(x, yj)√κ(x, x) + κ(yi, yi)− 2κ(x, yi)
√κ(x, x) + κ(yj , yj)− 2κ(x, yj)
1/2
. (2.5)
Note that (2.5) only requires the choice of κ, and not of φ, which can be left implicit. For
this reason, in practice we use the κ-based version of KFSD(x, Yn).
Regarding the choice of κ, we use a functional version of the Gaussian kernel function used
by Chen et al. (2009), that is,
κ(x, y) = exp
(−‖x− y‖
2
σ2
),
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 23
which in turn depends on the norm inherited by the functional Hilbert space where data are as-
sumed to lie and on the bandwidth σ. With the aim of exploring the behavior of κ as a function
of bandwidth, we consider different values of σ. We set each σ equal to the p% percentile of
the empirical distribution of ‖yi − yj‖, yi, yj ∈ Yn. Note that the lower p, the lower σ and the
more local the approach of KFSD. Indeed, with low percentiles KFSD considers small neigh-
borhoods and can be viewed as a potential functional density estimator, whereas with high
percentiles KFSD considers wide neighborhoods and behaves almost as a global depth. There-
fore, the use of different percentiles allows us to cover different degrees of KFSD-based local
approaches. For example, the 10%, 50% and 90% percentiles lead to strongly, moderately and
weakly local approaches, respectively. In the corresponding chapters, we present two methods
to choose a final value of σ in supervised classification and outlier detection problems.
After recalling the definitions of FMD, HMD, RTD, IDD, MBD and FSD, and after present-
ing KFSD, we dedicate the next section to discuss the differences between global and local
depths.
2.5 From Global to Local Depths
The definition of FSD(x, Yn) given by (2.3) allow us to discuss an important feature of FSD,
that is, its global approach to the depth problem, and to show how a local approach is an alter-
native. The functional spatial depth of x relative to Yn depends on the values of the functional
spatial sign function, FS(x−yi), i = 1, . . . , n. Each FS(x−yi) is a unit-norm curve representing
the direction from x to y. Therefore, FSD(x, Yn) depends equally on n directions. This feature
generates a trade-off: on one side, FSD(x, Yn) turns out to be robust to the presence of outliers
in Yn; on the other side, FSD(x, Yn) transforms (x − yi) into a unit-norm curve regardless of
yi being a neighboring or a distant curve from x. For this reason, we describe FSD as a global-
oriented depth since it makes depend the depth of x on the whole Yn, and with equal weights.
However, in some circumstances it may be useful an analysis of narrow neighborhoods of x
and to give more weight to close than distant curves, in other words, to have a local approach
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 24
to the depth problem that allow the information brought by yi to depend on the value of a
certain distance between x and yi, as it happens with KFSD.
To illustrate the differences between a global and a local approach to the depth problem, we
show two examples where the goal is to obtain a center-outward ordering of functional data.
We consider five global depths (FMD, RTD, IDD, MBD and FSD) and two local depths (HMD
and our proposal KFSD). In the first example, we generated 21 curves from a given process
and divided them in three groups with 10, 10 and 1 curves, respectively. Then, we added a
different constant curve to each group (the constants are 0, 10 and 5, respectively), obtaining
the curves at the top of Figure 2.1. Afterwards, we computed the depth values of all the curves
using the five global-oriented depths and we observed that according to all the global depths
the highest depth value is attained at the curve of the third group.
0.0 0.2 0.4 0.6 0.8 1.0
05
1015
0.0 0.2 0.4 0.6 0.8 1.0
01
23
45
Figure 2.1: Two functional datasets that we use to show the differences betweenglobal and local depths
The structure of the dataset of the second example is similar, but we used different trans-
formations to obtain the curves belonging to the second and third group (see the plot at the
bottom of Figure 2.1). Also in this case, we observed that according to all global depths the
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 25
curve of the third group turns out to be the deepest one. Clearly, both examples involve three
strongly different classes of curves. Nevertheless, when we treat the curves of each example as
belonging to a homogeneous sample, we observe rather inconvenient behaviors of the global
depths. In the two examples, the curve in the third group is roughly in the geometric center of
the dataset, but it is far from the remaining curves and it would be more reasonable to observe
a low depth value at this curve. Actually, this happens with the local depths HMD and KFSD,
and mainly because these depths reduce the contribution of distant curves to the depth value
of a given curve. Indeed, in both examples HMD and KFSD assigned the lowest depth value
to the curves belonging to the third group. This peculiar behavior of HMD and KFSD is due
to their local approach to the depth problem.
Additionally, we also want to give an idea of the potential usefulness of local depths in
ranking correctly outliers that contaminate a functional sample, but that are somehow graph-
ically hidden. To illustrate this fact, we present the following examples: first, we generated
10 datasets of size 50 from a mixture of two stochastic processes, one for normal curves and
one for outliers that are graphically clear, with the probability that a curve is an outlier equal
to 0.05. Second, we generated another group of 10 datasets from a different mixture which
produces outliers that are instead graphically faint. In Figure 2.2 we report a contaminated
dataset for each mixture.
Let nout,j , j = 1, . . . , 10, be the number of outliers generated in the jth dataset. For each
dataset and functional depth, it is desirable to assign the nout,j lowest depth values to the nout,j
generated outliers. For both mixtures and each generated dataset, we registered how many
times the depth of an outlier is indeed among the nout,j lowest values. As depth functions, we
considered the five global depths and two local depths mentioned above. The results reported
in Table 2.1 show that for all the functional depths the ranking of clear outliers is an easier task
than the ranking of faint outliers.
However, while the ranking of clear outliers is reasonably good in different cases, e.g., local
KFSD (100%) or global RTD (95%), the ranking of faint outliers is markedly better with local
depths, i.e., HMD and KFSD (both 76.19%). These results give an idea of the potential of local
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 26
0.0 0.2 0.4 0.6 0.8 1.0
−20
24
68
clear contamination
0.0 0.2 0.4 0.6 0.8 1.0
−20
24
68
faint contamination
Figure 2.2: Examples of contaminated datasets: clear contamination (top) and faintcontamination (bottom). The solid curves are normal curves and the dashed curvesare outliers
Table 2.1: Percentages of times a depth assigns avalue among the nout,j lowest ones to an outlier.Types of outliers: clear and faint
type of depths global depths local depthsdepths FMD RTD IDD MBD FSD HMD KFSDclear outliers 85.00 95.00 70.00 60.00 60.00 85.00 100.00faint outliers 0.00 28.57 38.10 14.29 33.33 76.19 76.19
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 27
depths in detecting correctly faint outliers.
Finally, we present a comparative example to evaluate the rankings associated to KFSD and
the other functional depths (FMD, HMD, RTD, IDD, MBD and FSD). We used the 54 growth
curves already reported in Figure 1.1 as functional sample Yn. We computed their depth val-
ues using KFSD with a strongly local percentile (i.e., 10%), as well as using the remaining
functional depths. Then, for each depth we obtained Y(n), that is, the depth-based ordered ver-
sion of Yn, and therefore the ranks of the curves (the curve with rank equal to 1 is the deepest
curve, and so forth). We compared the different rankings and we summarized these compar-
isons in Figure 2.3. For example, the figure at the top on the left of Figure 2.3 compares the
KFSD-based ranks (horizontal axis, in an increasing order) with the FMD-based ranks (in the
vertical axis).
0 10 20 30 40 50
010
2030
4050
KFSD vs FMD
0 10 20 30 40 50
010
2030
4050
KFSD vs HMD
0 10 20 30 40 50
010
2030
4050
KFSD vs RTD
0 10 20 30 40 50
010
2030
4050
KFSD vs IDD
0 10 20 30 40 50
010
2030
4050
KFSD vs MBD
0 10 20 30 40 50
010
2030
4050
KFSD vs FSD
Figure 2.3: Comparison of depth-based rankings (KFSD versus FMD, HMD, RTD,IDD, MBD and FSD) using the 54 growth curves of Figure 1.1
Observing Figure 2.3, it is possible to make some remarks:
• The KFSD-based ranking is rather different from the other rankings. However, it seems
that HMD and FSD are the closest depth in terms of ranking similarities. This result is
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 28
not surprising since HMD is also a local depth, whereas FSD is the global counterpart of
KFSD.
• The KFSD-based median, that is, the curve with rank equal to 1 according to KFSD, is
not the median for any other depth. On the contrary, all the depths agree on which is the
least deep curve, that is, the curve with rank equal to 54.
We also evaluated different KFSD-based local approaches, As mentioned above, to set σ
in KFSD we consider different percentiles of ‖yi − yj‖, yi, yj ∈ Yn. For example, looking at
the 10%, 50% and 90% percentiles it is possible to evaluate if strongly, moderately and weakly
KFSD-based local approaches differ among them. Using the same functional dataset, we did
this comparison, which is reported in Figure 2.4. The figure at the top of Figure 2.4 compares
the KFSD-based ranks using the 10% percentile (horizontal axis, in an increasing order) with
the case of the 50% percentile (in the vertical axis), while at the bottom the comparison is with
the case of the 90% percentile.
0 10 20 30 40 50
010
2030
4050
KFSD 10% vs 50%
0 10 20 30 40 50
010
2030
4050
KFSD 10% vs 90%
Figure 2.4: Comparison of KFSD-based rankings (percentile 10% versus 50% and90%) using the 54 growth curves of Figure 1.1
THE KERNELIZED FUNCTIONAL SPATIAL DEPTH 29
Observing Figure 2.4, it is possible to appreciate that the different percentiles generate dif-
ferent rankings. Therefore, the way how to choose an appropriate percentile for KFSD is an
issue to take into account, as indeed we do in the rest of the thesis.
2.6 Conclusions
In this chapter we introduced the kernelized functional spatial depth, a local-oriented and
kernel-based version of the functional spatial depth recently proposed by Chakraborty and
Chaudhuri (2014). Originally developed by Chaudhuri (1996) and Serfling (2002) in the multi-
variate context, the spatial approach allows both FSD and KFSD to study the degree of central-
ity of curves from a new point of view with respect to the other existing functional depths. The
main novelty introduced by KFSD consists in the fact that it addresses the study of functional
datasets at a local spatial level, whereas FSD is more appropriate for global spatial analyses.
For example, by means of two illustrative examples, we showed the potential of KFSD in
analyzing functional samples that deviate from unimodality or in ranking correctly outliers
graphically hidden.
As we showed in Section 2.4, KFSD and FSD are related: KFSD(x, P ) = FSD(φ(x), Pφ),
where φ : H → F is an embedding map, F is a feature space and Pφ is a recoded version of
the probability distribution P . The embedding map φ and the feature space F are implicitly
defined through a positive definite kernel, κ : H×H→ R. This relationship between FSD and
KFSD may represent the basis for the future theoretical study of the properties of KFSD.
KFSD depends on the choice of κ. Another possible future research line consists in the anal-
ysis of some alternatives to the kernel function employed in this thesis. Regarding the κ that
we use at the moment, it depends on its bandwidth σ. To understand the effects of σ on the
behavior of KFSD, we proposed to consider a range of values for σ. This choice allows to cover
different degrees of KFSD-based local approaches, from strongly to weakly local approaches,
and we showed that they may provide different depth rankings. The role and the choice of σ
in classification and outlier detection will be further analyzed in Chapters 3 and 4.
Our nada who art in nada,Nada be thy name.Thy kingdom nada.Thy will be nadain nada as it is in nada.(Ernest Hemingway, A Clean Well Lighted Place)
Chapter 3
Supervised Functional Classification1
3.1 Introduction
In this chapter we use KFSD as a tool to perform supervised functional classification. In gen-
eral, in a supervised functional classification problem there is available a training sample com-
posed of curves belonging to two or more groups and with known class memberships. The
information contained in the training data is then used to assign test curves with unknown
class membership to one of the groups. The analysis of the training information and the con-
struction of the classification rule may be based on the use of a functional depth. Indeed, in this
chapter we consider three existing depth-based procedures to perform supervised functional
classification with simulated and real data. We apply these methods using a battery of func-
tional depths: KFSD, but also FMD, HMD, RTD, IDD, MBD and FSD. We focus on challenging
scenarios in which the differences between groups are not extremely clear-cut or the data may
contain outlying curves. The results that we observe with simulated and real data indicate that
a local KFSD-based approach is a good solution in such setups.
1This chapter is mostly based on the forthcoming article in the journal TEST by Sguera et al. (2014b)available online at http://link.springer.com/article/10.1007/s11749-014-0379-1 or upon request([email protected])
31
SUPERVISED FUNCTIONAL CLASSIFICATION 32
3.2 The Supervised Functional Classification Problem
The natural theoretical framework of supervised functional classification is given by the ran-
dom pair (Y,G), where Y is a functional random variable and G is a categorical random vari-
able describing the class membership. From now on, we assume that G takes values 0 or
1. We also assume to observe a sample of n independent pairs generated from (Y,G), i.e.,
(Yn, Gn) = (y1, g1), . . . , (yn, gn), with n0 observations from the group with label 0 and n1
observations from the group with label 1, n0 + n1 = n, and a curve x with unknown class
membership generated from Y .
Using the information contained in (Yn, Gn), the goal of any supervised functional classi-
fication method is to provide a rule to classify the curve x, and several methods have been
proposed in the literature. For instance, Hastie et al. (1995) have proposed a penalized version
of the multivariate linear discriminant analysis technique, whereas James and Hastie (2001)
have directly built a functional linear discriminant analysis procedure that uses natural cubic
spline functions to model the observations. Using a P-spline approach, Marx and Eilers (1999)
have considered functional supervised classification as a special case of a generalized linear
regression model. Hall et al. (2001) have suggested to perform dimension reduction by means
of functional principal component analysis and then to solve the derived multivariate problem
with quadratic discriminant analysis or kernel methods. Ferraty and Vieu (2003) have devel-
oped a functional kernel-type classifier. Epifanio (2008) have proposed to describe curves by
means of shape feature vectors and to use classical multivariate classifiers for the discrimina-
tion stage. Galeano et al. (2014) have developed new versions of several well known functional
classification procedures which use a semi-distance for functional observations that general-
izes the multivariate Mahalanobis distance. Finally, Biau et al. (2005) and Cerou and Guyader
(2006) have studied some consistency properties of the extension of the k-nearest neighbor
procedure to infinite-dimensional spaces. The first extension considers a reduction of the di-
mensionality of the regressors based on a Fourier basis system, and the second generalization
deals with the real infinite dimension of the spaces under consideration. Note that the k-nearest
neighbor method is indeed a general tool that can be used to perform functional nonparamet-
SUPERVISED FUNCTIONAL CLASSIFICATION 33
ric regression, and that classification is a specific case that occurs when the response of the
regression model is categorical (for more details and theoretical results on the general case, see
Burba et al. 2009 and Kudraszow and Vieu 2013).
Besides the above described methods, other alternatives specially designed for datasets
that may contain outlying curves and based on the use of a functional depth have been pro-
posed. Indeed, in Section 3.3 we carry out a simulation study with scenarios that allow outliers,
whereas in Section Section 3.4 we consider two potentially contaminated real datasets. Next,
we report three depth-based proposals:
Definition 3.1. Distance to the trimmed mean method (DTM, Lopez-Pintado and Romo 2006)
Let (Yn, Gn) = (y1, g1), . . . , (yn, gn) be an i. i. d. random sample generated from the random
pair (Y,G) ∈ H × 0, 1 and x a random observation generated from the random variable
Y ∈ H. The distance to the trimmed mean method works as follows:
1. compute the depth-based α-trimmed means µα,g, g = 0, 1;
2. compute dg = ‖x− µα,g‖, g = 0, 1;
3. assign x to the group minimizing dg.
Definition 3.2. Weighted averaged distance method (WAD, Lopez-Pintado and Romo 2006)
Let (Yn, Gn) = (y1, g1), . . . , (yn, gn) be an i. i. d. random sample generated from the random
pair (Y,G) ∈ H × 0, 1 and x a random observation generated from the random variable
Y ∈ H. Denote with Yng =y1g , . . . , yng
the functional sample composed of curves belonging
to group g, g = 0, 1. The weighted averaged distance method works as follows:
1. for a given functional depth, say FD(·, ·), compute the within-group depth values, that
is, FD(yig , Yng
), i = 1, . . . , ng, g = 0, 1. Let wig = FD
(yig , Yng
);
2. compute Wg =∑ng
ig=1
wig‖x−yig‖∑ngig=1 wig
, g = 0, 1;
3. assign x to the group minimizing Wg.
SUPERVISED FUNCTIONAL CLASSIFICATION 34
Definition 3.3. Within maximum depth method (WMD, Cuevas et al. 2007)
Let (Yn, Gn) = (y1, g1), . . . , (yn, gn) be an i. i. d. random sample generated from the random
pair (Y,G) ∈ H × 0, 1 and x a random observation generated from the random variable
Y ∈ H. The within maximum depth method works as follows:
1. for a given functional depth, compute the depth values of x relative to Yng , that is,
FD(x, Yng
), g = 0, 1. Let Dg = FD
(x, Yng
);
2. assign x to the group maximizing Dg.
DTM, WAD and WMD can be used together with any functional depth. In the next two
sections we compare the performances of these methods when used in conjunction with alter-
native functional depths such as FMD, HMD, RTD, IDD, MBD, FSD and KFSD. Among the
non-depth-based methods, we consider as benchmark the functional k-nearest neighbor pro-
cedure (k-NN, Cerou and Guyader 2006), which looks at the k nearest neighbors of x in Yn in
terms of the norm of H and assigns x to a group according to the majority vote. It is worth
noting that the classification rule characterizing k-NN makes the method rather robust to the
presence of outliers, and this is the reason why k-NN represents an interesting competitor for
DTM, WAD and WMD.
3.3 Simulation Study
In this chapter and Chapter 2 we have presented three different depth-based classification
procedures (DTM, WAD and WMD) and seven different functional depths (FMD, HMD, RTD,
IDD, MBD, FSD and KFSD). Pairing all the procedures with all the depths, we obtain 21 depth-
based classification methods, plus k-NN. The goal of this section is to compare them through
an extensive simulation study. From now on, we refer to each method by the notation proce-
dure+depth: for example, DTM+FMD refers to the method obtained by using DTM together
with FMD.
We mainly explore the effectiveness of the methods in supervised classification problems
SUPERVISED FUNCTIONAL CLASSIFICATION 35
where the appropriate class membership of the curves is hard to be deduced by using graph-
ical tools and/or in scenarios in which outlying curves are allowed. In these cases, it may
happen that the curve we want to classify is geometrically rather central relative to the train-
ing samples of two different groups, but also relatively far from one of them, although not
in an obvious way. In such scenarios, a depth-based classification method may behave better
when used together with a local depth instead of a global depth.
Both methods and depths may depend on some parameters or assumptions. Regarding
the methods, DTM depends on the trimming parameter α, that we set at α = 0.2, as in Lopez-
Pintado and Romo (2006). For the benchmark procedure k-NN, we take k = 5 nearest neigh-
bors since it is a standard choice and the method is reasonably robust with respect to the
parameter k. Regarding the functional depths, for HMD, we follow the recommendations in
Febrero et al. (2008), that is, H is the L2 space, κ(x, y) = (2/√
2π) × exp(−‖x − y‖2/2h2), and
h is equal to the 15% percentile of the empirical distribution of ‖yi − yj‖, i, j = 1, . . . , n.
Note that for WMD+HMD we use a normalized version of HMD to make its range equal to
[0, 1]. For RTD and IDD, we work with 50 projections in random Gaussian directions. For
MBD, we consider bands defined by two curves. For FSD and KFSD, we assume that the
curves lie in the L2 space. Moreover, as introduced in Chapter 2, we consider a set of per-
centiles pk = p1, . . . , pK to set σ in KFSD. In particular, we take K = 7 and pk =
15%, 25%, 33%, 50%, 66%, 75%, 85%.
Next, we describe the structure of our simulation study. We overlook setups in which
curves may be almost well classified by a preliminary graphical analysis and focus on clas-
sification scenarios where the differences among groups are hard to be detected graphically.
Moreover, we allow outliers in one part of the simulation study. We consider two-groups sce-
narios throughout the whole simulation study, that is, g ∈ 0, 1, and xg(s) denotes the curve
generating process for group g.
In absence of contamination, we initially consider two different pairs of curve generating
processes:
1. First pair of curve generating processes (from now on, CGP1) with s ∈ [0, 1]
SUPERVISED FUNCTIONAL CLASSIFICATION 36
x0(s) = 4s+ ε(s),
x1(s) = 8s− 2 + ε(s),
where ε(s) is a zero-mean Gaussian component with covariance function given by
E(ε(s), ε(s′)) = 0.25 exp (−(s− s′)2), s, s′ ∈ [0, 1]. (3.1)
2. Second pair of curve generating processes (from now on, CGP2) with s ∈ [0, 2π]
x0(s) = u01 sin s+ u02 cos s,
x1(s) = u11 sin s+ u12 cos s,(3.2)
where u01 and u02 are observations from a continuous uniform random variable between
0.05 and 0.1, whereas u11 and u12 are observations from a continuous uniform random
variable between 0.1 and 0.12.
The main difference between the two processes is that for CGP1 both x0(s) and x1(s) are
composed of deterministic and linear mean functions, plus a random component, while for
CGP2 both x0(s) and x1(s) are exclusively composed of random and nonlinear mean functions.
To allow contamination, we consider the following modified version of CGP1 and CGP2,
where the contamination affects only group 0:
1. First pair of curve generating processes allowing outliers (from now on, CGP1out) with
s ∈ [0, 1]
x0(s) =
4s+ ε(s), with probability 1− q,
4√s+ ε(s), with probability q.
x1(s) = 8s− 2 + ε(s),
where ε(s) is a zero-mean Gaussian component with covariance function given by (3.1)
and 0 < q < 1.
SUPERVISED FUNCTIONAL CLASSIFICATION 37
2. Second pair of curve generating processes allowing outliers (from now on, CGP2out) with
s ∈ [0, 2π]
x0(s) =
u01 sin s+ u02 cos s, with probability 1− q,
u01 sin s+ u12 cos s, with probability q.
x1(s) = u11 sin s+ u12 cos s,
where ui,j , i = 0, 1, j = 1, 2 are defined as for CGP2 and 0 < q < 1.
In Figure 3.2 we report a simulated dataset from CGP1, CGP2, CGP1out and CGP2out.
0.0 0.2 0.4 0.6 0.8 1.0
−20
24
6
CGP1
0 1 2 3 4 5 6
−0.1
5−0
.05
0.05
0.10
0.15
CGP2
0.0 0.2 0.4 0.6 0.8 1.0
−20
24
6
CGP1_out
0 1 2 3 4 5 6
−0.1
5−0
.05
0.05
0.10
0.15
CGP2_out
Figure 3.1: Simulated datasets from CGP1, CGP2, CGP1out and CGP2out: eachdataset contains 25 curves from group g = 0 and 25 dashed curves from groupg = 1. For CGP1 and CGP2, the curves from group g = 0 are all noncontaminated(solid). For CGP1out and CGP2out, the curves from group g = 0 can be noncontam-inated (solid) or contaminated (dotted)
Next, we present some details of the simulation study: for each model, we generated 125
replications. For CGP1out and CGP2out, we set the contamination probability q = 0.10. For
each replication, we generated 100 curves, 50 for g = 0 and 50 for g = 1. We used 25 curves
from g = 0 and 25 curves from g = 1 to build each training sample, and we classified the
SUPERVISED FUNCTIONAL CLASSIFICATION 38
remaining curves. All curves were generated using a discretized and finite set of 51 equidistant
points between 0 and 1 or 0 and 2π, depending on the model. For all the functional depths, we
used a discretized version of their definitions. We performed the comparison among methods
in terms of misclassification percentages. We report means and standard deviations of the
misclassification percentages in Tables 3.1-3.4.
For what concerns KFSD, since we initially consider the set of percentiles pk to set σ,
for each replication and classification method, we have based the choice of the appropriate
percentile among the 7 options on a cross-validation step. Next, we describe it:
1. We divide each initial training sample in 5 groups-balanced cross-validation test samples,
each one of size 10, and we pair each of them with its natural cross-validation training
sample composed of the remaining 40 curves.
2. Therefore, for a each replication, we have available five pairs of cross-validation training
and test samples. For each classification method and percentile, and for all the pairs, we
classify the curves in the test samples and obtain a misclassification percentage.
3. For each replication and classification method, we search for the minimum misclassifica-
tion percentage and its corresponding percentile, say p∗.
Then, for each replication and classification method, we use exclusively p∗ to set the bandwidth
of KFSD to do classification with the initial pair of training and test samples. In a preliminary
stage of the study, since we observed several ties in the cross-validation step described above,
we used the following second criteria to break the ties:
• for DTM, we select the percentile that minimizes the sum of the distances of the curves
in the cross-validation test samples to the α-trimmed mean of the group at which the
curves belong;
• for WAD, we select the percentile that minimizes the sum of the weighted averaged dis-
tances of the curves in the cross-validation test samples to the group at which the curves
belong;
SUPERVISED FUNCTIONAL CLASSIFICATION 39
• for WMD, we select the percentile that maximizes the sum of the scaled KFSD of the
curves in the cross-validation test samples when included in the group at which they
belong.
The second criteria break almost all the ties. However, if after considering the second criterion
a tie is not broken, we break it randomly.
Before showing the results, we discuss some computational issues. Note that for both DTM
and WAD the key step consists in computing the within-group depth values of the training
curves. Hence, we report the computational times (in seconds) that require each functional
depth to perform this task for a training sample of size 50: 0.02 for FMD, 0.08 for HMD, 0.02
for RTD, 0.02 for IDD, less than 0.01 for MBD, 0.05 for FSD, and 0.14 for KFSD2. Additionally,
a 7-percentiles KFSD analysis takes 0.54 seconds, and a 7-percentiles KFSD analysis combined
with a 5-subsamples cross-validation step takes 1.70 seconds. Therefore, the computation of
KFSD is widely feasible, and the option of searching for an appropriate percentile by means of
a cross-validation does not cause major computational problems, whereas it will bring some
classification benefits.
Tables 3.1-3.4 report the performances observed with the data generated from CGP1, CGP2,
CGP1out and CGP2out. Regarding the KFSD-based methods, for each pair of initial training and
test samples and classification method, it may happen to observe the same misclassification
error using the 7 different percentiles. In such cases, the cross-validation step turns out to be
unnecessary. On the contrary, the cross-validation is required if there are at least two different
misclassification errors. For this reason, for each model and method, in Table 3.5 we report the
percentages of replications where the KFSD cross-validation step is required. In the same table
we also report the best performing percentiles for the pairs of initial training and test samples
since this information permits to identify the type of local analysis that would classify best
each type of data.
2For FMD, HMD, RTD and IDD we have used the corresponding R functions that are available in the R packagefda.usc on CRAN (Febrero and Oviedo de la Fuente 2012); for MBD we have followed the guidelines contained inSun et al. (2012); for FSD and KFSD we have built some functions for R, which are available upon request. Featuresof the workstation: Intel Core i7-3.40GHz and 16GB of RAM.
SUPERVISED FUNCTIONAL CLASSIFICATION 40
Table 3.1: CGP1. Means and standard deviations (in parenthesis) of the mis-classification percentages for DTM, WAD and WMD-based methods and k-NN
Method/Depth FMD HMD RTD IDD MBD FSD KFSD
DTM0.16 0.22 0.14 0.14 0.18 0.19 0.14
(0.65) (0.73) (0.58) (0.63) (0.62) (0.64) (0.58)
WAD0.10 0.16 0.11 0.11 0.08 0.13 0.10
(0.50) (0.60) (0.53) (0.53) (0.47) (0.55) (0.50)
WMD15.09 1.66 21.90 18.06 11.82 3.30 0.13(5.43) (2.46) (6.82) (6.27) (4.95) (2.89) (0.66)
k-NN0.11
(0.46)
Table 3.2: CGP2. Means and standard deviations (in parenthesis) of the mis-classification percentages for DTM, WAD and WMD-based methods and k-NN
Method/Depth FMD HMD RTD IDD MBD FSD KFSD
DTM2.43 2.59 2.40 2.45 2.45 2.42 2.35
(2.01) (2.24) (2.00) (2.05) (1.98) (1.99) (2.10)
WAD3.38 3.33 3.10 3.22 3.20 3.02 3.14
(2.32) (2.35) (2.25) (2.33) (2.23) (2.19) (2.25)
WMD0.10 2.16 1.12 0.99 0.13 0.14 0.11
(0.43) (2.75) (1.99) (1.75) (0.49) (0.52) (0.53)
k-NN0.88
(1.53)
Table 3.3: CGP1out. Means and standard deviations (in parenthesis) of the mis-classification percentages for DTM, WAD and WMD-based methods and k-NN
Method/Depth FMD HMD RTD IDD MBD FSD KFSD
DTM0.11 0.08 0.08 0.08 0.06 0.11 0.05
(0.46) (0.39) (0.39) (0.39) (0.35) (0.46) (0.31)
WAD0.06 0.06 0.06 0.08 0.06 0.06 0.06
(0.35) (0.35) (0.35) (0.39) (0.35) (0.35) (0.35)
WMD14.46 2.34 22.93 18.93 11.47 3.26 0.08(5.54) (2.97) (6.92) (7.11) (5.02) (2.80) (0.39)
k-NN0.08
(0.39)
Table 3.4: CGP2out. Means and standard deviations (in parenthesis) of the mis-classification percentages for DTM, WAD and WMD-based methods and k-NN
Method/Depth FMD HMD RTD IDD MBD FSD KFSD
DTM3.87 4.11 3.84 3.81 3.76 3.74 3.71
(2.86) (3.11) (2.84) (2.75) (2.86) (2.83) (2.91)
WAD4.94 4.94 4.67 4.74 4.80 4.48 4.70
(3.36) (3.44) (3.25) (3.25) (3.24) (3.16) (3.29)
WMD0.82 2.90 4.19 3.84 0.83 0.83 0.58
(1.48) (3.31) (3.72) (3.26) (1.55) (1.46) (1.29)
k-NN1.97
(2.05)
SUPERVISED FUNCTIONAL CLASSIFICATION 41
Table 3.5: DTM+KFSD, WAD+KFSD and WMD+KFSD, and curve generatingprocesses CGP1, CGP2, CGP1out and CGP2out: percentages of the replicationsfor which cross-validation is required and best performing percentiles for theinitial training and test samples
Model Method Percentage Percentiles Model Method Percentage Percentiles
CGP1DTM 7.20 50%
CGP2DTM 21.60 66%
WAD 2.40 15%, 25%, 50%, 66%, 75% WAD 13.60 85%WMD 56.00 50%, 66%, 75% WMD 12.80 66%, 85%
CGP1outDTM 3.20 15%, 25%, 50%, 66%
CGP2outDTM 26.40 85%
WAD 0.00 all WAD 19.20 85%WMD 58.40 66% WMD 37.60 75%
The results in Tables 3.1-3.5 show that:
1. When the curves are generated from CGP1 or CGP1out, WAD is the best classification
procedure, but also the performances of DTM are competitive. Both procedures turn
out rather stable with respect to the choice of the functional depth. On the contrary,
WMD is more sensitive to the choice of the depth measure, and only the combination
WMD+KFSD is able to compete with WAD and DTM. The performances of k-NN are
quite good, but they are not the best ones neither for CGP1 nor for CGP1out. Indeed, for
CGP1, the best method is WAD+MBD (0.08%), whereas WAD+KFSD and WAD+FMD are
the second best methods (0.10%). For CGP1out, the best method is DTM+KFSD (0.05%),
and some other spatial depth-based methods perform also quite good, e.g., WAD+FSD
and WAD+KFSD (0.06%). Finally, note that WMD+KFSD behaves reasonably well in
both scenarios (0.13% with CGP1 and 0.08% with CGP1out).
2. When the curves are generated from CGP2 or CGP2out, WMD, in conjunction with FMD,
MBD, FSD and KFSD, is clearly the best performing classification procedure: indeed,
the four resultant methods markedly outperform k-NN. For CGP2, the best method is
WMD+FMD (0.10%), but the performance of WMD+KFSD is almost equal (0.11%). For
CGP2out, the best method is clearly WMD+KFSD (0.58%).
3. The cross-validation step is in general required more by WMD+KFSD, and to a smaller
extent by DTM+KFSD, and finally by WAD+KFSD. For example, with CGP1out the cross-
validation step is completely unnecessary for WAD+KFSD, whereas it is advisable for
SUPERVISED FUNCTIONAL CLASSIFICATION 42
more than one-half of the replications with WMD+KFSD. On the other hand, looking at
the best performing percentiles and focusing on the methods highlighted in the previous
two points, we observe the following: under CGP1 and CGP1out, WAD+KFSD reaches
its best performances with several percentiles; under CGP2 and CGP2out, the best perfor-
mances of WMD+KFSD are with rather high percentiles. This last result is coherent with
the good performances of WMD+FSD under CGP2 and CGP2out. Indeed, the higher is
the percentile, the less local-oriented is KFSD, and its behavior tends towards the behav-
ior of FSD, its global-oriented counterpart.
Observing the curves generated from CGP1 in Figure 3.1, it is possible to appreciate a rather
strong data dependence structure due to the covariance function given by (3.1). On the other
hand, observing the curves generated from CGP2 at any fixed s ∈ [0, 2π], a low data variability
stands out. We enhance our simulation study by relaxing these two features of CGP1 and CGP2
and considering two modifications of them. First, we consider a variation of CGP1 (from now
on, CGP3) which consists in substituting the covariance function of the additive zero-mean
Gaussian component previously given by (3.1) with a weaker version defined by
E(ε(s), ε(s′)) = 0.30 exp (−|s− s′|/0.3), s, s′ ∈ [0, 1].
Second, we consider a variation of CGP2 (from now on, CGP4) which consists in adding to the
two processes in (3.2) two identical additive zero-mean Gaussian components having covari-
ance function
E(ε(s), ε(s′)) = 0.00025 exp (−(s− s′)2), s, s′ ∈ [0, 2π].
Figure 3.2 reports two simulated datasets from CGP3 and CGP4.
We use CGP3 and CGP4 to develop the third and last part of the simulation study. Tables
3.6 and 3.7 report the performances of the 21 depth-based methods and of k-NN, whereas
Table 3.8 is the analogous of Table 3.5 for CGP3 and CGP4.
The results in Tables 3.6-3.8 show that:
SUPERVISED FUNCTIONAL CLASSIFICATION 43
0.0 0.2 0.4 0.6 0.8 1.0
−4−2
02
46
CGP3
0.0 0.2 0.4 0.6 0.8 1.0
−0.1
0.0
0.1
0.2 CGP4
Figure 3.2: Simulated datasets from CGP3 and CGP4: each dataset contains 25 solidcurves from group g = 0 and 25 dashed curves from group g = 1
Table 3.6: CGP3. Means and standard deviations (in parenthesis) of the mis-classification percentages for DTM, WAD and WMD-based methods and k-NN
Method/Depth FMD HMD RTD IDD MBD FSD KFSD
DTM1.33 1.36 1.39 1.36 1.36 1.38 1.31
(1.39) (1.40) (1.44) (1.47) (1.31) (1.42) (1.35)
WAD1.14 1.18 1.20 1.17 1.17 1.20 1.17
(1.33) (1.32) (1.37) (1.35) (1.32) (1.34) (1.32)
WMD5.26 2.93 17.42 14.64 4.29 1.44 1.20
(3.12) (2.66) (5.90) (5.51) (2.76) (1.56) (1.39)
k-NN1.39
(1.46)
Table 3.7: CGP4. Means and standard deviations (in parenthesis) of the mis-classification percentages for DTM, WAD and WMD-based methods and k-NN
Method/Depth FMD HMD RTD IDD MBD FSD KFSD
DTM2.38 2.53 2.30 2.38 2.26 2.30 2.26
(2.61) (2.59) (2.46) (2.47) (2.42) (2.45) (2.40)
WAD3.55 3.46 3.46 3.46 3.60 3.36 3.41
(3.08) (3.08) (2.87) (2.98) (3.07) (2.92) (2.94)
WMD15.52 2.88 28.54 24.02 12.96 4.90 0.82(5.29) (3.19) (6.99) (6.98) (4.76) (3.96) (1.57)
k-NN1.81
(2.28)
SUPERVISED FUNCTIONAL CLASSIFICATION 44
Table 3.8: DTM+KFSD, WAD+KFSD and WMD+KFSD, and curve generatingprocesses CGP3 and CGP4: percentages of the replications for which cross-validation is required and best performing percentiles for the initial trainingand test samples
Model Method Percentage Percentiles Model Method Percentage Percentiles
CGP3DTM 12.00 25%, 75%, 85%
CGP4DTM 14.40 85%
WAD 0.80 25%, 33%, 50%, 66%, 75% WAD 7.20 66%WMD 28.00 75%, 85% WMD 77.60 33%
1. When the curves are generated from CGP3, which is a modification of CGP1, we observe
similar results. WAD is the best classification procedure, but the behavior of DTM is
not bad at all. WMD heavily fails, with the exceptions of the spatial depths KFSD and
FSD, and HMD. k-NN is a competitive classification procedure, but all the DTM and
WAD-based methods, and WMD+KFSD outperform it. The best method is WAD+FMD
(1.14%), whereas there are several best second methods (1.17%), including WAD+KFSD.
2. When the curves are generated from CGP4, which is a modification of CGP2, we observe
that WMD+KFSD is the only method able to outperform k-NN (0.82% against 1.81%),
whereas the remaining methods highlighted for CGP2, i.e., WMD+FMD, WMD+MBD
and WMD+FSD, drastically worsen.
3. As for the previous models, the cross-validation step is in general required more by
WMD+KFSD, and to a smaller extent by DTM+KFSD, and finally by WAD+KFSD. For
example, with CGP3 the cross-validation step is almost unnecessary for WAD+KFSD,
whereas it is advisable for more than one-quarter of the replications with WMD+KFSD.
On the other hand, looking at the best performing percentiles and focusing on the meth-
ods highlighted in the previous two points, we observe that for CGP3 WAD+KFSD
reaches its best performance with all the percentiles except the 15% one, whereas for
CGP4 the best performances of WMD+KFSD are with the 33% percentile, which means
that for this method there is a gain when a rather strong local approach is implemented.
To conclude, we have observed that for the curve generating processes having a deter-
ministic and linear mean function and a random component, i.e., CGP1, CGP1out and CGP3,
WAD+KFSD is among the best and most stable classification methods, and both DTM+KFSD
SUPERVISED FUNCTIONAL CLASSIFICATION 45
and WMD+KFSD have performances that are not so different. On the other hand, for the curve
generating processes having a random and nonlinear mean function, i.e., CGP2, CGP2out and
CGP4, WMD+KFSD is clearly the best classification method. Therefore, KFSD-based func-
tional supervised classification is certainly a good option to discriminate curves.
3.4 Real Data Study
To complete the comparison among the depth-based supervised classification methods and
k-NN, we also consider two real datasets.
3.4.1 Growth Data
The first real dataset consists of 93 growth curves: 54 are heights of girls, 39 are heights of
boys. All of them are observed at a common discretized set of 31 nonequidistant ages between
1 and 18 years. Figure 3.3 shows the curves (for more details about this dataset, see Ramsay
and Silverman 2005). We use natural cubic spline interpolation to estimate the growth curves
at a common and equally spaced domain.
This dataset has been analyzed by Lopez-Pintado and Romo (2006) and Cuevas et al. (2007).
From our point of view, these data are interesting mainly for two reasons: first, the differences
between the two groups are not so much sharp; and second, we can not discard the presence
of some outlying curves, especially among girls.
We perform the first part of the growth data classification study with a similar structure to
the simulation study. More precisely, we consider 150 training samples composed of 40 and
30 randomly chosen curves of girls and boys, respectively. We pair each training sample with
the test sample composed of the remaining 14 and 9 curves of girls and boys, respectively. We
denote this way of obtaining training and test samples as T1, and we try to classify the curves
included in each test sample by using the methods and depths considered in Section 3.4, with
the same specifications for both.
For what concerns the cross-validation step for KFSD, we divide each initial training sam-
SUPERVISED FUNCTIONAL CLASSIFICATION 46
5 10 15
8010
012
014
016
018
020
0 Girls
5 10 15
8010
012
014
016
018
020
0 Boys
5 10 15
8010
012
014
016
018
020
0 G & B
Figure 3.3: Growth curves: 54 heights of girls (top left), 39 heights of boys (topright), 93 heights of girls and boys (bottom; the dashed curves are boys)
ple in 5 groups-unbalanced cross-validation test samples with 8 and 6 curves of girls and boys,
respectively, and we pair each of them with its natural cross-validation training sample.
We report the performances of the 21 depth-based methods and k-NN in Table 3.9. In Ta-
ble 3.11 we report the percentages of pairs of initial training and test samples for which the
cross-validation step is required by DTM+KFSD, WAD+KFSD and WMD+KFSD, and we also
show the best performing percentiles for the same samples.
Table 3.9: Growth data and T1. Means and standard deviations (in parenthesis)of the misclassification percentages for DTM, WAD and WMD-based methodsand k-NN
Method/Depth FMD HMD RTD IDD MBD FSD KFSD
DTM15.22 10.84 19.33 19.91 17.30 19.45 12.14(8.18) (7.35) (8.78) (8.99) (8.80) (9.08) (7.82)
WAD14.43 11.22 15.10 15.16 14.87 15.42 12.84(7.56) (7.41) (8.18) (7.88) (8.11) (8.12) (7.48)
WMD30.41 4.96 35.36 33.16 27.59 18.03 3.45
(11.31) (4.56) (8.60) (8.79) (10.22) (7.39) (3.57)
k-NN3.86
(3.56)
Additionally, we consider important to study how the classification methods work when
SUPERVISED FUNCTIONAL CLASSIFICATION 47
the goal is to classify a single curve using the information contained in the rest of the curves.
To do this with the growth data, we consider each possible training sample composed of 92
curves, and classify the curve not included in the training set. We denote this way of obtaining
training and test samples as T2, and we implement a cross-validation step for DTM+KFSD,
WAD+KFSD and WMD+KFSD also for T2. We report the performances of the classification
methods under T2 in Tables 3.10 and 3.11.
Table 3.10: Growth data and T2. Number of misclassified curves with DTM,WAD and WMD-based methods and k-NN
Method/Depth FMD HMD RTD IDD MBD FSD KFSDDTM 15 9 17 19 15 18 11WAD 12 10 13 13 13 13 11WMD 28 3 32 30 24 16 2k-NN 3
Table 3.11: Growth data and DTM+KFSD, WAD+KFSD and WMD+ KFSD.Percentages of initial T1-type and T2-type training and test samples for whichcross-validation is required and best performing percentiles for the same sam-ples
T Method Percentage Percentiles T Method Percentage Percentiles
T1DTM 72.00 33%
T2DTM 3.23 25%
WAD 36.67 50% WAD 1.08 50%, 66%WMD 91.33 15% WMD 10.75 15%
The results in Tables 3.9-3.11 show that WMD+KFSD is the only method able to outper-
form k-NN, an occurrence that has been already observed with curves generated from CGP4.
Indeed, under T1, WMD+KFSD outperforms k-NN in terms of means of the misclassification
percentages (3.45% against 3.86%), whereas the third best method is WMD+HMD (4.96%);
something similar happens under T2: WMD+KFSD misclassifies 2 curves, and it is still the best
method, followed by k-NN and WMD+HMD, which misclassify 3 curves. If we convert these
T2 performances in percentages, i.e.,(
#misclassified curvessample size × 100
), we observe that, moving
from T1 to T2, there is a slight but systematic improvement: WMD+KFSD, 3.45% → 2.16%;
k-NN, 3.86%→ 3.23%; WMD+HMD, 4.96%→ 3.23%. This pattern is due to the greater size of
the training samples under T2, but it does not cause significant changes in the performances-
based order of the methods.
For what concerns the cross-validation step of the KFSD-based classification methods, most
SUPERVISED FUNCTIONAL CLASSIFICATION 48
of the remarks made for the simulated data also hold for this dataset. Moreover, it is clear that
the implementation of the cross-validation step is a key issue under T1, whereas it becomes
much less important under T2. Finally, under both T1 and T2, the best performing percentile
for the best method, i.e., WMD+KFSD, is the 15% percentile, which means that classification
of growth curves requires a strongly local approach.
Even though here we do not show the results obtained with the 7 different percentiles,
we would like to report that, when we combine WMD with KFSD and use a fixed percentile,
higher percentiles make the performances of the classification method worse. For example, us-
ing WMD and KFSD with the 15% and the 25% percentile, under T1 we observe means equal
to 3.68% and 4.70%, respectively, whereas under T2 the methods misclassify 3 and 4 curves,
respectively. Given the results under T2, we look at the misclassified curves by these two
versions of WMD+KFSD and k-NN: using the 15% percentile, WMD+KFSD misclassifies girls
with labels 11, 25 and 49; using the 25% percentile, it misclassifies girls with labels 8, 25, 49 and
38; k-NN misclassifies girls with labels 8, 25 and 49 (see Figure 3.4).
5 10 15
8010
012
014
016
018
020
0
Growth data
Age(years)
Hei
ght(c
m)
Girls
Boys
Girl 8
Girl 11
Girl 25
Girl 38
Girl 49
Figure 3.4: Growth curves: highlighting some interesting curves for the classifica-tion problem
SUPERVISED FUNCTIONAL CLASSIFICATION 49
Therefore, the differences between WMD+KFSD when used with the 15% percentile and
k-NN lie in girls 8 and 11. Observing these curves, we can appreciate that with a local spatial
approach it is possible to classify correctly a female height having apparently an outlying be-
havior (Girl 8), however at the price of misclassifying a more central female height (Girl 11);
on the contrary, k-NN makes the opposite, and its behavior is more similar to the behavior
of WMD+KFSD when used with the 25% percentile, which misclassifies the same curves as
k-NN, in addition to the girl with label 38. Thanks to the cross-validation step, which allows
a non-fixed percentile, WMD+KFSD takes advantage of the differences between the use of the
15% and the 25% percentile, and it succeeds in misclassifying only girls with labels 25 and 49.
3.4.2 Phoneme Data
The second real dataset that we consider consists in log-periodograms of length 150 corre-
sponding to recordings of speakers pronouncing the phonemes “aa” or “ao”. More precisely,
the dataset contains 400 recordings of the phoneme “aa” and 400 recordings of the phoneme
“ao”. Since we are considering a large number of methods, we perform the study using 100
randomly chosen recordings of the phoneme “aa” (from now on, AA curves) and 100 ran-
domly chosen recordings of the phoneme “ao” (from now on, AO curves). Figure 3.5 shows
the curves. For more details about this dataset, see Ferraty and Vieu (2006).
Observing Figure 3.5, we can appreciate similar features to the ones highlighted for the
growth curves: first, we can not discard the presence of some outlying curves in both groups;
and second, the differences between the two groups are not so much sharp. Indeed, this sec-
ond feature seems exaggerated in the second part of the data (frequencies from 76 to 150), and
the discriminant information seems to lie especially in the first part of them (frequencies from
1 to 75). This hypothesis has been confirmed by a preliminary classification analysis in which
we have observed that in general any method improves its performances when only the first
half of the curves is used. Then, we perform the phoneme data classification study using the
first 75 frequencies, and with a structure that is similar to the one used for the growth data,
that is, we classify curves included in test samples that we obtain by means of both T1 and T2.
SUPERVISED FUNCTIONAL CLASSIFICATION 50
0 50 100 150
05
1015
2025
Phoneme data
Frequencies
Log−
perio
dogr
ams
Figure 3.5: Phoneme data: log-periodograms of 100 AA curves (solid) and 100 AOcurves (dashed)
To perform the T1 part of the study, we consider 100 training samples composed of 75 ran-
domly chosen AA curves and 75 randomly chosen AO curves. Each training sample is paired
with the test sample composed of the remaining 25 AA curves and 25 AO curves (i.e., the allo-
cation “150 training curves, 50 test curves” defines T1), and we classify the curves included in
each test sample by using the same methods and depths as in subsection 3.4.1, with the same
specifications for both.
For what concerns the cross-validation step for KFSD, it is similar to the one implemented
for the simulated data, but in this case we divide each initial training sample in 5 groups-
balanced cross-validation test samples of size 30 and we pair each of them with its natural
cross-validation training sample of size 120.
We report the performances of the 21 depth-based methods and k-NN in Table 3.12, whereas
Table 3.14 is the analogous of Table 3.11 for the phoneme data.
To perform the T2 part of the study, we consider all the possible 200 training samples
composed of 199 phonemes, each one jointly with its corresponding test sample composed
SUPERVISED FUNCTIONAL CLASSIFICATION 51
Table 3.12: Phoneme data and T1. Means and standard deviations (in paren-thesis) of the misclassification percentages for DTM, WAD and WMD-basedmethods and k-NN
Method/Depth FMD HMD RTD IDD MBD FSD KFSD
DTM21.56 23.16 22.68 22.70 21.84 23.00 23.08(5.07) (5.64) (5.47) (5.37) (5.18) (5.56) (5.60)
WAD23.12 23.74 23.88 23.84 23.54 23.64 23.36(5.49) (5.82) (5.78) (5.73) (5.60) (5.82) (5.78)
WMD21.42 24.76 26.18 25.62 20.54 20.62 19.30(4.56) (5.50) (5.79) (6.06) (4.57) (4.86) (4.66)
k-NN22.14(5.01)
of the remaining curve. As for the growth data, we implement a cross-validation step for
DTM+KFSD, WAD+KFSD and WMD+KFSD also under T2. We report the performances of the
classification methods under T2 in Tables 3.13 and 3.14.
Table 3.13: Phoneme data and T2. Number of misclassified curves with DTM,WAD and WMD-based methods and k-NN
Method/Depth FMD HMD RTD IDD MBD FSD KFSDDTM 43 46 45 46 44 45 46WAD 46 50 51 52 46 49 46WMD 43 51 50 51 39 39 37k-NN 45
Table 3.14: Phoneme data and DTM+KFSD, WAD+KFSD and WMD+ KFSD.Percentages of initial T1-type and T2-type training and test samples for whichcross-validation is required and best performing percentiles for the same sam-ples
T Method Percentage Percentiles T Method Percentage Percentiles
T1DTM 10.00 25%
T2DTM 0.00 15%, 25%, 33%, 50%, 66%, 75%, 85%
WAD 33.00 15% WAD 1.00 15%, 25%, 33%, 50%, 66%WMD 54.00 15% WMD 1.50 15%, 25%, 33%, 50%, 66%, 75%
The results in Tables 3.12-3.14 show especially two facts: first, classification of phoneme
data is a hard problem, and effectively the number of misclassified curves is large with any
method and under both T1 and T2; second, WMD+KFSD is the best classification method.
Indeed, under T1, WMD+KFSD is the method with the best performance in terms of mean
of the misclassification percentages (19.30%), and it outperforms the second best method,
WMD+MBD (20.54%). The third best method is given by another spatial depth-based method,
WMD+FSD (20.62%). Under T2, WMD+KFSD is again the method with the best performance
SUPERVISED FUNCTIONAL CLASSIFICATION 52
(37 misclassified curves), whereas WMD+MBD and WMD+FSD are the second best methods
(39 misclassified curves). Note that the performance of the fourth best method is quite distant
(WMD+FMD, 43 misclassified curves), as well as the one of k-NN (45 misclassified curves).
If we convert the T2 performances of the three best methods in percentages, there is a slight
but systematic improvement when moving from T1 to T2: WMD+KFSD, 19.30% → 18.50%;
WMD+MBD, 20.54%→ 19.50%; WMD+FSD, 20.62%→ 19.50%. However, as for growth data,
we observe no significant changes in the performances-based order of the methods.
Observing Table 3.14 and focusing on the best KFSD-based method, i.e., WMD+KFSD, we
appreciate that under T1 the best performing percentile for WMD+KFSD is the 15% percentile,
whereas for T2 the best performing percentiles for WMD+KFSD are all except the 85% per-
centile. However, even though we do not show the results obtained with the 7 different per-
centiles, we would like to report that for the phoneme data, unlike for the growth data, when
we combine WMD with KFSD and a fixed percentile, even with the worst performing per-
centile, which is the 85% percentile, WMD+KFSD has performances comparable to the best
ones. Indeed, using the 85% percentile, WMD+KFSD still outperforms the second best me-
thod of Table 3.12 (19.90% against 20.54% of WMD+MBD under T1), and it misclassifies the
same number of curves as the second best methods of Table 3.13 (39 misclassified curves by
WMD+MBD and WMD+FSD under T2).
3.5 Conclusions
After presenting KFSD in Chapter 2, in this chapter we focused on supervised functional clas-
sification problems using the depth-based methods DTM, WAD and WMD, and a benchmark
procedure such as k-NN. The three depth-based methods were used together with KFSD, FSD
and five more existing functional depths. We developed a simulation study where the dif-
ferences between curves of different groups are not excessively marked and/or the data may
contain outliers, and we analyzed two real datasets with similar features. In both cases, we
observed that a KFSD-based method is always among the best methods in terms of classifica-
SUPERVISED FUNCTIONAL CLASSIFICATION 53
tion capabilities and that it outperforms the benchmark procedure k-NN. Note that no other
depth behaved as well as KFSD, and that WMD+KFSD produced doubtless the most stable
and best depth-based classification method: indeed, WMD+KFSD has always outperformed
WMD+FSD, which is its natural global-oriented competitor, and, more in general, it had ac-
ceptable results and often the best ones, as in the case of growth and phoneme data.
Io so. Ma non ho le prove.Non ho nemmeno indizi.(Pier Paolo Pasolini, Cos’ e questo golpe? Io so)
Chapter 4
Functional Outlier Detection1
4.1 Introduction
The accurate identification of outliers is an important aspect in any statistical data analysis.
Nowadays there are well-established outlier detection techniques in the univariate and multi-
variate frameworks (for a complete review, see for example Barnett and Lewis 1994) as well as
in FDA. The possible reasons why statisticians are interested in identifying outliers are mainly
two: first, any statistical technique may lead to misleading results and conclusions when ap-
plied to contaminated datasets; second, since outliers may arise because of extraordinary de-
viations from the average pattern of the data, a further non-statistical analysis of extreme real-
izations can reveal interesting but latent features of the phenomenon under study.
According to Febrero et al. (2007, 2008), a functional outlier is a curve generated by a sto-
chastic process with a different distribution than the distribution of the normal curves. This
definition covers many types of outliers, e.g., magnitude outliers, shape outliers and partial
outliers, i.e., curves having atypical behaviors only in some segments of the domain. Shape
and partial outliers are typically harder to detect than magnitude outliers (in the case of high
magnitude, outliers can even be recognized by simply looking at a graph), and therefore entail
more difficult outlier detection problems. In this chapter we focus on such scenarios and, from
1This chapter is mostly based on the working paper by Sguera et al. (2014a)
55
FUNCTIONAL OUTLIER DETECTION 56
now on, we refer to low magnitude, shape and partial outliers as “faint outliers” and to high
magnitude outliers as “clear outliers”.
When the local spatial depth KFSD is used to order sample curves from the most to the least
central, in general it succeeds in ranking faint outliers among the least deep curves. To exploit
at the best this feature of KFSD, we introduce three new procedures that provide a threshold
value for KFSD such that curves with depth values lower than the threshold are detected as
outliers: first, we present a result that allows to select a threshold for KFSD to detect outliers.
This result is based on a probabilistic upper bound on a desired false alarm probability of de-
tecting normal curves as outliers. However, its practical application requires the availability
of two samples, circumstance rather uncommon in classical outlier detection problems. Then,
we propose three solutions based on smoothed resampling techniques that require instead a
unique sample.
We study the performances of the resampling-based procedures in a simulation study and
in a real application with environmental data, and we compare them with the performances of
some competitors. First, we consider the methods proposed by Febrero et al. (2008), who also
suggested to label as outliers those curves with depth values lower than a certain threshold: as
functional depths, they considered three alternatives, i.e., FMD, HMD and IDD, whereas, they
proposed two alternative bootstrap procedures based on depth-based trimmed and weighted
resampling, respectively, to determine the depth threshold. Second, we employ the functional
boxplot presented by Sun and Genton (2011). The proposed functional boxplot is constructed
using the ranking of curves provided by MBD and allows to detect outliers as well as the stan-
dard boxplot does. Finally, since the use of a functional depth is only one among the possible
strategies for tackling the functional outlier detection problem, we also consider the methods
proposed by Hyndman and Shang (2010), who first reduce the outlier detection problem from
functional to multivariate data by means of functional principal component analysis (FPCA),
and then use two alternative multivariate techniques on the scores to detect outliers, i.e., the
bagplot and the high density region boxplot, respectively. The results of the comparative study
support our proposals.
FUNCTIONAL OUTLIER DETECTION 57
4.2 Outlier Detection for Functional Data
The outlier detection problem can be described as follows: let Yn = y1, . . . , yn be a sam-
ple that has been generated from a mixture of two functional random variables in H, one for
normal curves and one for outliers, say Ynor and Yout, respectively. Let Ymix be this mixture,
i.e.,
Ymix =
Ynor, with probability 1− α,
Yout, with probability α,(4.1)
where α ∈ [0, 1] is the contamination probability (usually, a value rather close to 0). The curves
composing Yn are all unlabeled, and the goal of the analysis is to decide whether each curve is
a normal curve or an outlier.
Since any functional depth measures the degree of centrality/extremality of a given curve
relative to a distribution or a sample, outliers are expected to have low depth values. In Chap-
ter 3 we used depth-based methods to solve supervised functional classification problems. It
was observed that a local approach is preferable when the classes involved in the problem are
not extremely different or distant. Here, we show that an approach based on a local depth
such as KFSD also succeeds in detecting faint outliers such as low magnitude, shape or partial
outliers.
Recall that KFSD is a functional extension of the kernelized spatial depth for multivariate
data (KSD) proposed by Chen et al. (2009), who also proposed a KSD-based outlier detector
that we generalize to KFSD: for a given dataset Yn generated from Ymix and t, b ∈ [0, 1], the
KFSD-based outlier detector for x ∈ H is given by
g(x, Yn) =
1, if KFSD(x, Yn) ≤ t,t+b−KFSD(x,Yn)
b , if t < KFSD(x, Yn) ≤ t+ b,
0, if KFSD(x, Yn) > t+ b,
(4.2)
where t is a threshold and b determines the transition rate of x from being an outlier (case
g(x, Yn) = 1) to be a normal curve (case g(x, Yn) = 0). Clearly, (4.2) depends on the values of
FUNCTIONAL OUTLIER DETECTION 58
t and b. On the one hand, it is desirable a value of t capable of discriminating between x gen-
erated from Ynor or Yout. On the other hand, the role of b depends on the goal of the analysis.
If the options “outlier” and “normal curve” are the only ones of interest, b should be set at 0.
However, if there is interest in further analysis of “potential outliers”, b may be allowed to be
greater than 0. In our case, since the main goal is outlier detection and t is the key parameter
to be set, we let b = 0.
For the multivariate case, Chen et al. (2009) studied KSD-based outlier detection under dif-
ferent scenarios. One of them consists in an outlier detection problem where two samples are
available, and for which they proposed to select the threshold t by controlling the probability
that normal observations are classified as outliers, i.e., the false alarm probability (FAP). They
proved a result providing a KSD-based probabilistic upper bound on the FAP which depends
on t. Then, the maximum value of t such that the upper bound does not exceed a given desired
FAP provides a threshold for KSD. Next, we extend this result to KFSD:
Theorem 4.1. Let YnY = yi, . . . , ynY and ZnZ = zi, . . . , znZ be two i. i. d. samples generated
from the unknown mixture of random variables Ymix ∈ H described by (4.1), with α > 0. Let g(·, YnY )
be the outlier detector defined in (4.2). Fix δ ∈ (0, 1) and suppose that α ≤ r for some r ∈ [0, 1]. For
a new random element x generated from Ynor, the following inequality holds with probability at least
1− δ:
Ex∼Ynor [g(x, YnY )] ≤ 1
1− r
1
nZ
nZ∑i=1
g (zi, YnY ) +
√ln 1/δ
2nZ
, (4.3)
where Ex∼Ynor refers to the expected value with respect to x generated from Ynor.
The proof of Theorem 4.1 is presented in the appendix of this chapter. Recall that the FAP
has been defined as the probability that a normal observation x is classified as outlier. For the
elements of Theorem 4.1, Prx∼Ynor (g(x, YnY ) = 1) is the FAP. If we set b = 0,
Prx∼Ynor (g(x, YnY ) = 1) = Ex∼Ynor [g(x, YnY )] .
Therefore, the probabilistic upper bound of Theorem 4.1 applies also to the FAP.
FUNCTIONAL OUTLIER DETECTION 59
It is worth noting that the application of Theorem 4.1 requires to observe two samples, cir-
cumstance rather uncommon in classical outlier detection problems, but also a considerably
large nZ . To show the last point, recall that the right-hand side of (4.3) has to be controlled
under the desired FAP and is in practice composed by two addends, with the second equal to
11−r
√ln 1/δ2nZ
. For some normal values such as r = 0.05, δ = 0.05 and nZ = 50, we would have
11−r
√ln 1/δ2nZ
= 0.18, which is greater than a normal desired FAP such as 0.10, and it shows that
the use of Theorem 4.1 may be compromised under some common situations.
We propose three solutions to overcome these limitations. Assume to observe a functional
sample Yn generated from an unknown mixture of random variables Ymix. The goal is to iden-
tify which curves in Yn are outliers, but in this situation there are not available two samples
and Theorem 4.1 cannot be applied. We propose to use Yn as YnY , and to obtain ZnZ by resam-
pling with replacement from Yn. In this way, we also solve the problematic issue related to the
second addend of the right-hand side of (4.3) because it is possible to set nZ as large as needed.
Regarding the resampling procedure to obtain ZnZ , we consider three different schemes, all of
them with replacement. Since we deal with potentially contaminated datasets, besides simple
resampling, we also consider two robust KFSD-based resampling procedures inspired by the
work of Febrero et al. (2008). Then, the three resampling schemes that we consider are:
1. Simple resampling.
2. KFSD-based trimmed resampling: once obtained KFSD(yi, Yn), i = 1, . . . , n, it is possi-
ble to identify the dαTne% of least deep curves, for certain 0 < αT < 1 usually close to
0. These least deep curves are deleted from the sample, and simple resampling is carried
out with the remaining curves.
3. KFSD-based weighted resampling: once obtained KFSD(yi, Yn), i = 1, . . . , n, weighted
resampling is carried out with weights wi = KFSD(yi, Yn).
All the above procedures generate samples with some repeated curves. However, in a pre-
liminary stage of our study we observed that it is preferable to work with ZnZ composed of
curves different among them. To obtain such samples, we add a common smoothing step to
FUNCTIONAL OUTLIER DETECTION 60
the previous three resampling schemes.
To describe the smoothing step, first recall that each curve in Yn is in practice observed
at a discretized and finite set of domain points, and that the sets may differ from one curve
to another. For this reason, the estimation of Yn at a common set of m equidistant domain
points may be required. Let (yi(s1), . . . , yi(sm)) be the observed or eventually estimated m-
dimensional equidistant discretized version of yi, ΣYn be the covariance matrix of the dis-
cretized form of Yn and γ be a smoothing parameter. Consider a zero-mean Gaussian process
whose discretized form has γΣYn as covariance matrix. Let (ζ(s1), . . . , ζ(sm)) be a discretized
realization of the previous Gaussian process. Consider any of the previous three resampling
procedures and assume that at the jth trial, j = 1, . . . , nZ , the ith curve in Yn has been sampled.
Then, the discretized form of the jth curve in ZnZ would be given by (zj(s1), . . . , zj(sm)) =
(yi(s1) + ζj(s1), . . . , yi(sm) + ζj(sm)), or, in functional form, by zj = yi+ζj . Therefore, combin-
ing each resampling scheme with this smoothing step, we provide three different approximate
ways to obtain ZnZ , and we refer to them as smo, tri and wei, respectively. Then, for fixed δ,
r and desired FAP, the threshold t for (4.2) is selected as the maximum value of t such that the
right-hand side of (4.3) does not exceed the desired FAP. Let t∗ be the selected threshold, which
is then used in (4.2) with b = 0 to compute g (yi, Yn), i = 1, . . . , n. If g (yi, Yn) = 1, yi is detected
as outlier. To summarize, we provide three KFSD-based outlier detection procedures and we
refer to them as KFSDsmo, KFSDtri and KFSDwei depending on how ZnZ is obtained (smo, tri
and wei, respectively; recall that YnY = Yn). As competitors of the proposed procedures, we
consider the methods mentioned in Section Section 4.1 that we next describe.
Sun and Genton (2011) proposed a depth-based functional boxplot and an associated out-
lier detection rule based on the center-outward ordering of the sample curves provided by
MBD. Once obtained the ordering, it is used to define the sample central region, that is, the
smallest band containing at least half of the deepest curves. The non-outlying region is de-
fined inflating the central region by 1.5 times. Curves that do not belong completely to the
non-outlying region are detected as outliers. The original functional boxplot is based on the
use of MBD, but clearly any functional depth can be used. Another contribution of this the-
FUNCTIONAL OUTLIER DETECTION 61
sis is the study of the performances of the outlier detection rule associated to the functional
boxplot (from now on, FBP) when used together with the battery of functional depths that we
considered in supervised classification.
Febrero et al. (2008) proposed two depth-based outlier detection procedures selecting a
threshold for FMD, HMD or IDD by means of two alternative robust smoothed bootstrap pro-
cedures whose single bootstrap samples are obtained using the above described tri and wei,
respectively. Then, at each bootstrap sample, the 1% percentile of the empirical distribution
of the depth values is obtained, say p0.01. If B is the number of bootstrap samples, B values
of p0.01 are obtained. Each method selects as cutoff c the median of the collection of p0.01 and,
using c as threshold, a first outlier detection is performed. If some curves are detected as out-
liers, they are deleted from the sample, and the procedure is repeated until no more outliers
are found (note that c is computed only in the first iteration). From now on, we refer to these
methods as Btri and Bwei. Also in this case, we evaluate these procedures using all the func-
tional depths considered in Chapter 3.
Finally, we also consider two procedures that are not based on the use of a functional depth
proposed by Hyndman and Shang (2010). Both procedures are based on the first two robust
functional principal components scores and on two different graphical representations of them.
The first proposal is the outlier detection rule associated to the functional bagplot (from now
on, FBG), which works as follows: obtain the bivariate robust scores and order them using
the multivariate halfspace depth HSD. Define an inner region by considering the smallest re-
gion containing at least the 50% of the deepest scores, and obtain a non-outlying region by
inflating the inner region by 2.58 times. FBG detects as outliers those curves whose scores are
outside the non-outlying region. Note that the scores-based regions and outliers allow to draw
a bivariate bagplot, which produces a functional bagplot once it is mapped onto the original
functional space. The second proposal is related to a different graphical tool, the high den-
sity region boxplot (from now on, we refer to its associated outlier detection rule as FHD). In
this case, once obtained the scores, perform a bivariate kernel density estimation. Define the
(1−β)-high density region (HDR), β ∈ (0, 1), as the region of scores with coverage probability
FUNCTIONAL OUTLIER DETECTION 62
equal to (1 − β). FHD detects as outliers those curves whose scores are outside the (1 − β)-
HDR. In this case, it is possible to draw a bivariate HDR boxplot which can be mapped onto a
functional version, thus providing the functional HDR boxplot.
4.3 Simulation Study
After introducing KFSDsmo, KFSDtri and KFSDwei and their competitors (FBP, Btri, Bwei, FBG
and FHD), in this section we carry out a simulation study to evaluate the performances of
the different methods. Since we employ FBP, Btri and Bwei with seven different functional
depths (FMD, HMD, RTD, IDD, MBD, FSD and KFSD), for them we use the notation proce-
dure+depth: for example, FBP+FMD refers to the method obtained by using FBP together with
FMD.
We consider six models for our simulation study: all of them generate curves according to
the mixture of random variables Ymix described in (4.1). The first three mixture models (MM1,
MM2 and MM3) share Ynor, with curves generated by
y(s) = 4s+ ε(s), (4.4)
where s ∈ [0, 1] and ε(s) is a zero-mean Gaussian component with covariance function given
by
E(ε(s), ε(s′)) = 0.25 exp (−(s− s′)2), s, s′ ∈ [0, 1].
Also the remaining three mixture models (MM4, MM5 and MM6) share Ynor, but, in this case,
the curves are generated by
y(s) = u1 sin s+ u2 cos s, (4.5)
where s ∈ [0, 2π] and u1 and u2 are observations from a continuous uniform random variable
between 0.05 and 0.15.
FUNCTIONAL OUTLIER DETECTION 63
MM1, MM2 and MM3 differ in their Yout components. Under MM1, the outliers are gener-
ated by
y(s) = 8s− 2 + ε(s),
which produces faint outliers of both shape and low magnitude nature. Under MM2, the
outliers are generated by adding to (4.4) an observation from a N(0, 1), and as result outliers
are more irregular than normal curves. Finally, under MM3, the outliers are generated by
y(s) = 4 exp(s) + ε(s),
which produces curves that are normal in the first part of the domain, but that become expo-
nentially outlying.
Similarly, MM4, MM5 and MM6 differ in their Yout components. Under MM4, the outliers
are generated replacing u2 with u3 in (4.5), where u3 is an observation from a continuous uni-
form random variable between 0.15 and 0.17. This change produces partial low magnitude
outliers in the first and middle part of the domain of the curves. Under MM5, the outliers are
generated by adding to (4.5) an observation from a N(0,(0.12
)2), and as result outliers are more
irregular curves. Finally, under MM6, the outliers are generated by
y(s) = u1 sin s+ exp
(0.69s
2π
)u4 cos s, (4.6)
where u4 is an observation from a continuous uniform random variable between 0.1 and 0.15.
As MM3, MM6 allows outliers that are normal in the first part of the domain and become
outlying with an exponential pattern. In Figure 4.1 we report a simulated dataset with at least
one outlier for each mixture model.
The details of the simulation study are the following: for each mixture model, we generate
100 datasets, each one composed of 50 curves. Two values of the contamination probability α
are considered: 0.02 and 0.05. All curves are generated using a discretized and finite set of 51
equidistant points in the domain of each mixture model ([0, 1] for MM1, MM2 and MM3; [0, 2π]
FUNCTIONAL OUTLIER DETECTION 64
0.0 0.2 0.4 0.6 0.8 1.0
−20
24
6
MM1
0 1 2 3 4 5 6
−0.2
−0.1
0.0
0.1
0.2
MM4
0.0 0.2 0.4 0.6 0.8 1.0
−10
12
34
5
MM2
0 1 2 3 4 5 6
−0.2
−0.1
0.0
0.1
0.2
MM5
0.0 0.2 0.4 0.6 0.8 1.0
02
46
MM3
0 1 2 3 4 5 6
−0.2
−0.1
0.0
0.1
0.2
MM6
Figure 4.1: Examples of contaminated functional datasets generated by MM1,MM2, MM3, MM4, MM5 and MM6. Solid curves are normal curves and dashedcurves are outliers
for MM4, MM5 and MM6) and the discretized versions of the functional depths are used.
In relation with the methods and the functional depths that we consider in the study, the
specifications that we use are the following:
1. FBP when used with FMD, HMD, RTD, IDD, MBD, FSD and KFSD: regarding FBP, as
reported in Section 4.2, the central region is built considering the 50% deepest curves
and the non-outlying region by inflating by 1.5 times the central region. Regarding the
depths, for HMD, we follow the recommendations in Febrero et al. (2008), that is, H is
the L2 space, κ(x, y) = 2√2π
exp(−‖x−y‖
2
2h2
)and h is equal to the 15% percentile of the
empirical distribution of ‖yi − yj‖, yi, yj ∈ Yn. For RTD and IDD, we work with 50
projections in random Gaussian directions. For MBD, we consider bands defined by two
curves. For FSD and KFSD, we assume that the curves lie in the L2 space. Moreover, in
KFSD we set σ equal to a moderately local percentile (50%) of the empirical distribution
of ‖yi − yj‖, yi, yj ∈ Yn.
FUNCTIONAL OUTLIER DETECTION 65
2. Btri and Bwei when used with FMD, HMD, RTD, IDD, MBD, FSD and KFSD: γ = 0.05,
B = 100, αT = α. Regarding the depths, we use the specifications reported for FBP.
3. FBG: as reported in Section 4.2, the central region is built considering the 50% deepest
bivariate robust functional principal component scores and the non-outlying region by
inflating by 2.58 times the central region.
4. FHD: β = α.
5. KFSDsmo, KFSDtri and KFSDwei: nY = n = 50 (since YnY = Yn), γ = 0.05, αT = α (only
for KFSDtri), nZ = 6n, δ = 0.05, r = α, desired FAP = 0.10. Moreover, as introduced in
Chapter 2, we consider a set of percentiles pk = p1, . . . , pK to set σ in KFSD. In out-
lier detection, pk = 10k%, k = 1, . . . ,K = 9. The way in which we propose to choose
the most suitable percentile for outlier detection is presented below.
In supervised classification the availability of training curves with known class member-
ships makes possible the definition of some natural procedures to set σ for KFSD, such as
cross-validation. However, in an outlier detection problem it is common to have no informa-
tion whether curves are normal or outliers. Therefore, training procedures are not immediately
available.
We propose to overcome this drawback by obtaining a “training sample of peripheral
curves”, and then choosing the percentile that ranks better the peripheral curves as final per-
centile for KFSD in KFSDsmo, KFSDtri and KFSDwei. Next, we describe the procedure, which
is based on J replications. Let Yn be the functional dataset on which outlier detection has to be
done and let Y(n) =y(1), . . . , y(n)
be the depth-based ordered version of Yn, where y(1) and
y(n) are the curves with minimum and maximum depth, respectively. The steps to obtain a set
of peripheral curves are the following:
1. Let p1, . . . , pK be the set of percentiles in use (in our case, pk = 10k%, k = 1, . . . ,K =
9), and choose randomly a percentile from the set. For the jth replication, j ∈ 1, . . . , J,
denote the selected percentile as pj . From now on, we use J = 20.
FUNCTIONAL OUTLIER DETECTION 66
2. Using pj , compute KFSDpj (yi, Yn), i = 1, . . . , n, where the notation KFSDpj (·, ·) is
used to describe what percentile is used. For the jth replication, denote the KFSD-based
ordered curves as y(1),j , . . . , y(n),j .
3. Take y(1),j , . . . , y(lj),j , where lj ∼ Bin(n, 1n). Apply the smoothing step described in Sec-
tion 4.2 to these curves. For the smoothing step, we use ΣYn and γ = 0.05. For the jth
replication, denote the peripheral and smoothed curves as y∗(1),j , . . . , y∗(lj),j
.
4. Repeat J times steps 1.-3. to obtain a collection of L =∑J
j=1 lj peripheral curves, say YL
(for an example, see Figure 4.2).
0.0 0.2 0.4 0.6 0.8 1.0
−20
24
6
Figure 4.2: Example of a training sample of peripheral curves for a contaminateddataset generated by MM1 with α = 0.05. The solid and shaded curves are theoriginal curves (both normal and outliers). The dashed curves are the peripheralcurves to use as training sample
Next, YL acts as training sample according to the following steps: for each y∗(i),j ∈ YL,
(i ≤ lj), and pk ∈ p1, . . . , pK, compute KFSDpk(y∗(i),j , Y−(i),j), where Y−(i),j = Yn \y(i),j
.
At the end, aL×K matrix is obtained, sayDLK = dlk l=1,...,Lk=1,...,K
, whose kth column is composed
of the KFSD values of the L training peripheral curves when the kth percentile is employed
FUNCTIONAL OUTLIER DETECTION 67
in KFSD. Let rlk be the rank of dlk in the vector KFSDpk(y1, Yn), . . . ,KFSDpk(yn, Yn), dlk,
e.g., rlk is equal to 1 (n + 1) if dlk is the minimum (maximum) value in the vector. Let RLK
be the result of this transformation of DLK , and sum the elements of each column, obtaining
a K-dimensional vector, say RK . Since the goal is to assign ranks as lower as possible to the
peripheral curves, choose the percentile associated to the minimum value of RK . When a tie
is observed, we break it randomly.
The comparison among methods is performed in terms of both correct and false outlier
detection percentages, which are reported in Tables 4.1-4.6. To ease the reading of the tables,
for each model and α, we report in bold the 5 best methods in terms of correct outlier detection
percentage (c).2 For each model, if a method is among the 5 best ones for both contamination
probabilities α, we report its label in bold.
Table 4.1: MM1, α = 0.02, 0.05. Correct (c) andfalse (f) outlier detection percentages of FBP, Btri,Bwei, FBG, FHD, KFSDsmo, KFSDtri and KFSDwei
α = 0.02 α = 0.05
c f c fFBP+FMD 44.34 1.23 43.86 0.73FBP+HMD 74.53 0.94 72.81 0.61FBP+RTD 61.32 0.57 63.16 0.31FBP+IDD 55.66 0.61 61.84 0.34FBP+MBD 49.06 1.33 50.44 0.69FBP+FSD 62.26 0.67 61.84 0.40FBP+KFSD 66.04 0.86 74.12 0.44Btri+FMD 0.00 0.92 0.00 1.80Btri+HMD 72.64 1.43 62.28 1.51Btri+RTD 8.49 0.37 14.47 0.40Btri+IDD 12.26 0.39 17.11 0.65Btri+MBD 0.00 0.67 0.00 1.51Btri+FSD 1.89 0.84 5.70 1.22Btri+KFSD 70.75 1.57 57.89 1.49Bwei+FMD 0.00 1.23 0.00 1.53Bwei+HMD 71.70 1.16 46.49 0.57Bwei+RTD 10.38 1.25 7.46 1.17Bwei+IDD 14.15 2.29 14.04 2.62Bwei+MBD 0.00 1.14 0.00 1.30Bwei+FSD 1.89 1.33 3.07 1.17Bwei+KFSD 66.04 0.94 57.02 0.52FBG 100.00 2.27 97.81 2.37FHD 48.11 1.00 73.68 2.77KFSDsmo 89.62 4.50 85.09 2.58KFSDtri 89.62 4.92 92.11 4.40KFSDwei 97.17 9.44 96.93 6.54
Table 4.2: MM2, α = 0.02, 0.05. Correct (c) andfalse (f) outlier detection percentages of FBP, Btri,Bwei, FBG, FHD, KFSDsmo, KFSDtri and KFSDwei
α = 0.02 α = 0.05
c f c fFBP+FMD 99.09 1.08 96.39 0.84FBP+HMD 96.36 0.96 96.39 0.88FBP+RTD 99.09 0.61 94.78 0.25FBP+IDD 99.09 0.70 95.18 0.38FBP+MBD 99.09 1.06 96.39 0.82FBP+FSD 99.09 0.57 94.78 0.36FBP+KFSD 98.18 0.63 93.98 0.36Btri+FMD 0.00 0.96 0.00 2.00Btri+HMD 94.55 1.60 95.18 1.73Btri+RTD 5.45 0.37 7.63 0.93Btri+IDD 6.36 0.45 10.04 0.97Btri+MBD 0.00 1.08 0.40 2.10Btri+FSD 4.55 1.06 6.02 1.64Btri+KFSD 99.09 1.60 96.39 1.56Bwei+FMD 0.00 1.41 0.00 1.39Bwei+HMD 94.55 0.94 83.53 0.32Bwei+RTD 7.27 1.51 8.43 1.89Bwei+IDD 8.18 2.49 8.84 2.86Bwei+MBD 0.00 1.29 0.40 1.54Bwei+FSD 6.36 1.43 4.82 1.41Bwei+KFSD 92.73 0.72 81.53 0.51FBG 8.18 3.07 4.42 2.95FHD 7.27 1.88 12.45 5.66KFSDsmo 100.00 3.91 95.18 2.76KFSDtri 100.00 5.19 97.99 4.84KFSDwei 100.00 9.20 99.60 6.48
2In presence of tie, we look at the false outlier detection percentage (f), preferring the method with lower f.
FUNCTIONAL OUTLIER DETECTION 68
Table 4.3: MM3, α = 0.02, 0.05. Correct (c) andfalse (f) outlier detection percentages of FBP, Btri,Bwei, FBG, FHD, KFSDsmo, KFSDtri and KFSDwei
α = 0.02 α = 0.05
c f c fFBP+FMD 65.69 0.92 49.19 0.97FBP+HMD 89.22 0.57 85.89 0.63FBP+RTD 86.27 0.45 76.61 0.34FBP+IDD 79.41 0.51 70.56 0.38FBP+MBD 74.51 0.88 59.27 0.84FBP+FSD 79.41 0.51 73.79 0.42FBP+KFSD 89.22 0.57 83.06 0.59Btri+FMD 2.94 0.96 4.84 1.24Btri+HMD 59.80 1.61 55.65 1.64Btri+RTD 5.88 0.33 4.03 0.40Btri+IDD 34.31 0.49 23.79 0.76Btri+MBD 0.98 1.12 3.63 1.49Btri+FSD 14.71 1.06 17.74 1.41Btri+KFSD 59.80 1.65 47.98 1.39Bwei+FMD 2.94 1.10 5.24 0.84Bwei+HMD 59.80 1.25 37.90 0.80Bwei+RTD 19.61 0.92 12.90 0.78Bwei+IDD 29.41 2.67 20.97 2.67Bwei+MBD 0.98 1.31 3.23 1.26Bwei+FSD 16.67 1.10 11.29 0.90Bwei+KFSD 55.88 1.12 41.13 0.72FBG 86.27 2.65 78.63 1.73FHD 49.02 1.02 65.73 2.88KFSDsmo 89.22 3.90 73.79 2.95KFSDtri 90.20 4.63 83.47 4.71KFSDwei 97.06 8.96 90.32 6.50
Table 4.4: MM4, α = 0.02, 0.05. Correct (c) andfalse (f) outlier detection percentages of FBP, Btri,Bwei, FBG, FHD, KFSDsmo, KFSDtri and KFSDwei
α = 0.02 α = 0.05
c f c fFBP+FMD 1.02 0.00 0.00 0.00FBP+HMD 6.12 0.00 1.60 0.02FBP+RTD 0.00 0.00 0.00 0.00FBP+IDD 0.00 0.00 0.00 0.00FBP+MBD 0.00 0.00 0.00 0.00FBP+FSD 0.00 0.00 0.00 0.00FBP+KFSD 2.04 0.00 0.80 0.00Btri+FMD 64.29 0.18 46.80 0.15Btri+HMD 43.88 0.06 20.40 0.21Btri+RTD 27.55 1.08 14.80 0.80Btri+IDD 67.35 0.59 47.60 0.46Btri+MBD 66.33 0.14 43.20 0.06Btri+FSD 68.37 0.12 46.80 0.13Btri+KFSD 57.14 0.24 27.20 0.11Bwei+FMD 51.02 0.12 22.40 0.02Bwei+HMD 40.82 0.04 12.00 0.00Bwei+RTD 24.49 0.18 16.00 0.04Bwei+IDD 90.82 2.26 73.60 1.47Bwei+MBD 56.12 0.08 26.40 0.00Bwei+FSD 61.22 0.08 28.00 0.00Bwei+KFSD 56.12 0.12 20.40 0.00FBG 9.18 0.53 6.80 1.09FHD 51.02 1.02 37.60 4.34KFSDsmo 87.76 2.16 50.00 1.24KFSDtri 91.84 3.00 64.80 2.91KFSDwei 95.92 5.08 62.00 3.35
FUNCTIONAL OUTLIER DETECTION 69
Table 4.5: MM5, α = 0.02, 0.05. Correct (c) andfalse (f) outlier detection percentages of FBP, Btri,Bwei, FBG, FHD, KFSDsmo, KFSDtri and KFSDwei
α = 0.02 α = 0.05
c f c fFBP+FMD 55.56 0.00 54.00 0.00FBP+HMD 66.67 0.00 68.40 0.04FBP+RTD 57.58 0.00 54.40 0.00FBP+IDD 52.53 0.00 56.00 0.00FBP+MBD 55.56 0.00 55.20 0.00FBP+FSD 55.56 0.00 55.60 0.00FBP+KFSD 60.61 0.00 59.20 0.00Btri+FMD 3.03 0.16 2.80 0.36Btri+HMD 96.97 0.16 89.20 0.17Btri+RTD 12.12 1.31 18.40 1.37Btri+IDD 22.22 0.84 29.20 0.63Btri+MBD 3.03 0.18 3.20 0.32Btri+FSD 29.29 0.18 29.20 0.29Btri+KFSD 90.91 0.27 91.20 0.19Bwei+FMD 3.03 0.22 2.40 0.19Bwei+HMD 93.94 0.02 71.20 0.00Bwei+RTD 16.16 0.41 20.00 0.38Bwei+IDD 23.23 3.20 21.60 2.74Bwei+MBD 4.04 0.24 3.60 0.23Bwei+FSD 26.26 0.12 21.60 0.08Bwei+KFSD 88.89 0.12 68.00 0.04FBG 0.00 1.02 0.40 0.04FHD 4.04 1.96 12.80 5.64KFSDsmo 98.99 1.82 94.00 0.44KFSDtri 98.99 2.61 98.00 2.11KFSDwei 100.00 4.61 98.40 2.11
Table 4.6: MM6, α = 0.02, 0.05. Correct (c) andfalse (f) outlier detection percentages of FBP, Btri,Bwei, FBG, FHD, KFSDsmo, KFSDtri and KFSDwei
α = 0.02 α = 0.05
c f c fFBP+FMD 48.42 0.00 44.19 0.00FBP+HMD 60.00 0.18 62.92 0.00FBP+RTD 55.79 0.00 54.68 0.00FBP+IDD 46.32 0.00 40.07 0.00FBP+MBD 48.42 0.00 45.69 0.00FBP+FSD 52.63 0.00 52.43 0.00FBP+KFSD 57.89 0.00 56.93 0.00Btri+FMD 30.53 0.16 35.21 0.32Btri+HMD 67.37 0.24 50.94 0.15Btri+RTD 22.11 1.06 17.23 0.61Btri+IDD 32.63 0.57 20.97 0.51Btri+MBD 28.42 0.24 31.46 0.36Btri+FSD 50.53 0.20 44.94 0.21Btri+KFSD 66.32 0.22 48.31 0.13Bwei+FMD 25.26 0.22 18.35 0.06Bwei+HMD 67.37 0.12 38.95 0.00Bwei+RTD 41.05 0.31 34.46 0.19Bwei+IDD 33.68 2.34 23.22 1.75Bwei+MBD 23.16 0.18 17.98 0.15Bwei+FSD 43.16 0.14 29.59 0.11Bwei+KFSD 64.21 0.14 43.45 0.00FBG 17.89 0.02 14.98 0.06FHD 52.63 1.02 61.80 2.85KFSDsmo 91.58 2.08 71.16 0.95KFSDtri 93.68 2.69 82.02 2.49KFSDwei 96.84 4.69 83.15 2.75
FUNCTIONAL OUTLIER DETECTION 70
The results in Tables 4.1-4.6 show that:
1. KFSDtri and KFSDwei are always among the 5 best methods, whereas KFSDsmo 10 times
over 12, but when its performance is not among the 5 best, it is neither extremely far from
the fifth method (MM2, α = 0.02: 95.18% against 96.39%; MM3, α = 0.02: 73.79% against
78.63%). The rest of the methods are among the 5 best procedures at most 5 times over
12 (FBP+HMD).
2. Regarding MM5 and MM6, our procedures are clearly the best options in terms of correct
detection (c), and in the following order: KFSDwei, KFSDtri and KFSDsmo. In general, this
pattern is observed overall the simulation study. Note that for MM6 and α = 0.02 we
observe the best relative performances of KFSDsmo, KFSDtri and KFSDwei, i.e., 91.58%,
93.68% and 96.84%, respectively, against 67.37% of the fourth best method (Bwei+HMD),
that is, we observe differences greater than 20%, and approaching 30% if KFSDwei and
Bwei+HMD are compared.
3. About MM3, KFSDwei is clearly the best method in terms of correct detection, however
at the price of having a greater false detection (f). This is in general the main weak point
of KFSDsmo, KFSDtri and KFSDwei. As for correct detection, we observe a overall pattern
in our methods in false detection, but in an opposite way, indicating therefore a trade-
off between c and f. Relative high false detection percentages are however something
expected in KFSDsmo, KFSDtri and KFSDwei since these methods are based on the defini-
tion of a desired false alarm probability, which is equal to 10% in this study. Concerning
MM2, we observe similar results to MM3, but in this case the performances of the best
competitors for KFSDwei, i.e., KFSDsmo, KFSDtri, FBP-based methods and Btri and Bwei
when used with local depths, are closer to the results of KFSDwei.
4. Finally, there are only 3 cases in which a competitor outperforms our best method: for
MM1 and both α the best method is FBG, whereas for MM4 and α = 0.05 the best me-
thod is Bwei+IDD. However, both FBG and Bwei+IDD do not show behaviors as stable as
FUNCTIONAL OUTLIER DETECTION 71
KFSDsmo, KFSDtri and KFSDwei do. Indeed, they show poor performances under other
scenarios, e.g., MM2, MM5 or MM6.
In Figure 4.3 we report a series of boxplots summarizing which percentiles have been se-
lected in the training steps for KFSDsmo, KFSDtri and KFSDwei.
MM
1, 0
.02
MM
1, 0
.05
MM
2, 0
.02
MM
2, 0
.05
MM
3, 0
.02
MM
3, 0
.05
MM
4, 0
.02
MM
4, 0
.05
MM
5, 0
.02
MM
5, 0
.05
MM
6, 0
.02
MM
6, 0
.05
perc
entil
es
1
2
3
4
5
6
7
8
9
Figure 4.3: Boxplots of the percentiles selected in the training steps of the simula-tion study for KFSDsmo, KFSDtri and KFSDwei
Observing Figure 4.3, the following general remarks can be done. First, MM6 is the mixture
model for which lower percentiles have been selected, and it is also the scenario in which our
methods considerably outperform their competitors. The need for a more local approach for
MM6-data may explain the two observed facts about this mixture model. Second, lower and
more local percentiles have been chosen for mixture models with nonlinear mean functions
(MM4, MM5 and MM6) than for mixture models with linear mean functions (MM1, MM2 and
MM3). Finally, the percentiles selected by means of the proposed training procedure seem to
vary among datasets. However, except for MM3 and α = 0.02, at least for half of the datasets
a percentile not greater than the median has been chosen, which implies at most a moderately
local approach.
FUNCTIONAL OUTLIER DETECTION 72
4.4 Real Data Study: Nitrogen Oxides (NOx) Data
Besides simulated data, we consider a real dataset which consists in nitrogen oxides (NOx)
emission level daily curves measured every hour close to an industrial area in Poblenou
(Barcelona). The dataset is available in the R package fda.usc (Febrero and Oviedo de la
Fuente 2012) and outlier detection on these data has been first performed by Febrero et al.
(2008) in the paper where Btri and Bwei were presented. We enhance their study by consider-
ing more methods and depths.
According to Febrero et al. (2008), NOx are one of the most important pollutants, and it
is important to identify outlying trajectories because these curves may both compromise any
statistical analysis and be of special interest for further analysis.
More in details, the NOx levels were measured in µg/m3 every hour of every day for the
period 23/02/2005-26/06/2005, but only for 115 days was possible to measure the NOx at ev-
ery hour. These 115 curves are the ones composing the final NOx dataset. However, since the
NOx dataset clearly includes working as well as nonworking day curves, following Febrero
et al. (2008), it is more appropriate to consider two different datasets, that is, a sample of 76
working day curves (from now on, W) and another of 39 nonworking day curves (from now
on, NW). Both W and NW are showed in Figure 4.4: at first glance, it seems that each dataset
may contain outliers, especially partial outliers.
Therefore, because of the possible presence of faint outliers, a local depth approach by
means of the use of KFSDsmo, KFSDtri and KFSDwei may be a good strategy to detect outliers.
Besides them, we do outlier detection with all the methods used in Section 4.3. For all the
procedures we use the same specifications as in Section 4.3, and we assume α = 0.05. For
each method, we report the labels of the curves detected as outliers in Table 4.7, whereas in
Figure 4.5 we highlight these curves.
For what concerns W, most of the methods detect as outlier day 37, which apparently shows
a partial outlying behavior before noon and at the end of the day. Another day detected as
outlier by many methods is day 16, whose curve is the one with the highest morning peak.
In addition to curves 16 and 37, KFSDsmo detects as outlier curve 14, as other nine methods
FUNCTIONAL OUTLIER DETECTION 73
0 5 10 15 20
010
020
030
040
0
W
0 5 10 15 20
010
020
030
040
0
NW
Figure 4.4: NOx data: working (top) and non working (bottom) day curves.
Table 4.7: NOx data, W and NW datasets. Curvesdetected as outliers by FBP, Btri, Bwei, FBG, FHD,KFSDsmo, KFSDtri and KFSDwei
w days non w daysdetected outliers
FBP+FMD - -FBP+HMD 12, 16, 37 5, 7, 20, 21FBP+RTD 37 20FBP+IDD - 5, 7, 20FBP+MBD - -FBP+FSD 37 -FBP+KFSD 12, 16, 37 5, 7, 20, 21Btri+FMD 16, 37 7Btri+HMD 14, 16, 37 7, 20Btri+RTD 14, 16, 37 -Btri+IDD 11, 14, 16, 37 -Btri+MBD 16, 37 7Btri+FSD 14, 16, 37 -Btri+KFSD 12, 14, 16, 37 7, 20Bwei+FMD 16 7Bwei+HMD 16, 37 7, 20Bwei+RTD 14, 16, 37, 38 -Bwei+IDD 16, 37 -Bwei+MBD 16 7Bwei+FSD 16, 37 -Bwei+KFSD 16, 37 7, 20FBG 16, 37 -FHD 12, 14, 16, 37 7, 20KFSDsmo 14, 16, 37 7, 20, 21KFSDtri 12, 14, 16, 37 7, 20, 21KFSDwei 11, 12, 13, 14, 15, 16, 37, 38 7, 20, 21
FUNCTIONAL OUTLIER DETECTION 74
0 5 10 15 20
010
020
030
040
0
W
11: WE, 09/03/200512: FR, 11/03/200513: TU, 15/03/200514: WE, 16/03/200515: TH, 17/03/200516: FR, 18/03/200537: FR, 29/04/200538: MO, 02/05/2005
0 5 10 15 20
050
100
150
200
250
NW
5: SA, 12/03/2005 7: SA, 19/03/200520: SA, 30/04/200521: SU, 01/05/2005
Figure 4.5: NOx dataset, curves detected as outliers in Table 4.7: working (top) andnon working (bottom) days
do, recognizing a seeming outlying pattern in early hours of the day. Additionally, KFSDtri
includes among the outliers also day 12, which may be atypical because of its behavior in early
afternoon. Finally, KFSDwei detects as outliers the greatest number of curves. This last result
may appear exaggerated, but all the curves that are outliers according to KFSDwei seem to
have some partial deviations from the majority of curves. For example, day 13, whose curve is
considered normal by the rest of the procedures, shows a peak at end of the day. Similar peaks
can be observed also in other curves detected as outliers by other methods (e.g., days 16 and
37), which means that it may be occurring a masking effect to day 13’s detriment, and only
KFSDwei points out this possibly outlying feature of the curve. Regarding the training step for
KFSD to set σ, it gives as result the 70% percentile. Observing the first graph of Figure 4.4, it
can be noticed that some curves have a likely outlying behavior, and this may be the reason
why a weakly local approach for KFSD may be adequate enough.
In the case of NW, some methods detect no curves as outliers (e.g., all the global FSD-
based methods), exclusively three FBP-based methods flag day 5 as outlier, whereas days 7, 20
FUNCTIONAL OUTLIER DETECTION 75
and 21 are detected as outliers by, among others, our methods. Days 7 and 20, which have two
peaks, at the beginning and end of the day, are also flagged by other twelve and eight methods,
respectively, while day 21, which shows a single peak in the first hours of the day, is considered
atypical by only two other methods, which happen to be local (FBP+HMD and FBP+KFSD).
This last result may be connected with what has been observed at the KFSD training step for
selecting the percentile, i.e., the selection of the 30% percentile. Therefore, KFSDsmo, KFSDtri
and KFSDwei work with a strongly local percentile, and their results partially resemble the ones
of the previously mentioned local techniques.
4.5 Conclusions
This chapter proposes three methods to detect outliers in functional samples based on KFSD.
We presented a way to set a KFSD-threshold to identify outliers in Theorem 4.1. In practice,
it is necessary to observe two samples to apply Theorem 4.1, and one sample is requested to
have a considerably large size. To overcome this practical limitation, we proposed KFSDsmo,
KFSDtri and KFSDwei: they are methods based on smoothed resampling techniques and, more
important, can be applied when a unique functional sample is available, no matter its size.
We also proposed a new procedure to set the key parameter of KFSD, i.e., its bandwidth
σ, based on obtaining training samples by means of smoothed resampling techniques. The
general idea behind this procedure can be applied to other functional depths or methods with
parameters that need to be set.
We investigated the performances of KFSDsmo, KFSDtri and KFSDwei by means of a sim-
ulation study. We focused on challenging scenarios with low magnitude, shape and partial
outliers (faint outliers) instead of high magnitude outliers (clear outliers). Along the simula-
tion study, KFSDwei, KFSDtri and KFSDsmo attained the largest correct detection performances
in most of the analyzed setups. However, in some cases they paid a price in terms of false de-
tection. Nevertheless, KFSDwei, KFSDtri and KFSDsmo work with a given desired false alarm
probability. Thus, higher false detection percentages than the competitors can be explained
FUNCTIONAL OUTLIER DETECTION 76
by the inherent structure of the methods. Concerning the remaining methods, few competi-
tors in few scenarios outperformed our methods. However, in these cases the differences were
not great, especially for KFSDwei and KFSDtri, and more important, these competitors did not
show stability across scenarios in their results. Finally, we also considered a real application
consisting in NOx emission daily curves.
4.6 Appendix
As explained in Section 4.2, Theorem 4.1 is a functional extension of a result derived by Chen
et al. (2009) for KSD, and since they are closely related, next we report a sketch of the proof
of Theorem 4.1. The proof for KSD is mostly based on an inequality known as McDiarmid’s
inequality (McDiarmid 1989), which also applies to general probability spaces, and therefore
to functional Hilbert spaces. We report this inequality in the next lemma:
Lemma 4.1. (McDiarmid 1989 [1.2]) Let Ω1, . . . ,Ωn be probability spaces. Let Ω =∏nj=1 Ωj
and let X : Ω → R be a random variable. For any j ∈ 1, . . . , n, let (ω1, . . . , ωj , . . . , ωn) and
(ω1, . . . , ωj , . . . , ωn) be two elements of Ω that differ only in their jth coordinates. Assume that X is
uniformly difference-bounded by cj, that is, for any j ∈ 1, . . . , n,
|X (ω1, . . . , ωj , . . . , ωn)−X (ω1, . . . , ωj , . . . , ωn)| ≤ cj . (4.7)
Then, if E[X] exists, for any τ > 0
Pr (X − E[X] ≥ τ) ≤ exp
(−2τ2∑nj=1 c
2j
).
In order to apply Lemma 4.1 to our problem, define
X(z1, . . . , znZ ) = − 1
nZ
nZ∑i=1
g(zi, YnY |YnY ), (4.8)
whose expected value is given by
FUNCTIONAL OUTLIER DETECTION 77
E[X] = E
[− 1
nZ
nZ∑i=1
g(zi, YnY |YnY )
]= −Ez1∼Ymix [g(z1, YnY |YnY )] . (4.9)
Now, for any j ∈ 1, . . . , nZ and zj ∈ H, the following inequality holds
|X(z1, . . . , zj , . . . , znZ )−X(z1, . . . , zj , . . . , znZ )| ≤ 1
nZ,
and it provides assumption (4.7) of Lemma 4.1. Therefore, for any τ > 0
Pr
(Ez1∼Ymix [g(z1, YnY |YnY )]− 1
nZ
nZ∑i=1
g(zi, YnY |YnY ) ≥ τ
)≤ exp
(−2nZτ
2),
and by the law of total probability
E[Pr(Ez1∼Ymix [g(z1, YnY |YnY )]− 1
nZ
∑nZi=1 g(zi, YnY |YnY ) ≥ τ
)]= Pr
(Ez1∼Ymix [g(z1, YnY )]− 1
nZ
∑nZi=1 g(zi, YnY ) ≥ τ
)≤ exp
(−2nZτ
2)
Next, setting δ = exp(−2nZτ
2), and solving for τ , the following result is obtained:
τ =
√ln 1/δ
2nZ.
Therefore,
Pr
Ez1∼Ymix [g(z1, YnY )] ≤ 1
nZ
nZ∑i=1
g(zi, YnY ) +
√ln 1/δ
2nZ
≥ 1− δ. (4.10)
However, Theorem 4.1 provides a probabilistic upper bound for Ex∼Ynor [g(x, YnY )]. First,
note that
Ex∼Ymix [g (x, YnY )] = (1− α)Ex∼Ynor [g (x, YnY )] + αEx∼Yout [g (x, YnY )] ,
and then, for α > 0,
FUNCTIONAL OUTLIER DETECTION 78
Ex∼Ynor [g (x, YnY )] ≤ 1
1− αEx∼Ymix [g (x, YnY )] =
1
1− αEz1∼Ymix [g (z1, YnY )] . (4.11)
Consequently, combining (4.10) and (4.11), we obtain
Pr
Ex∼Ynor [g(x, YnY )] ≤ 1
1− r
1
nZ
nZ∑i=1
g(zi, YnY ) +
√ln 1/δ
2nZ
≥ 1− δ,
which completes the proof.
Marta: Quando c’e l’amore c’e tutto.Gaetano: No, chell’ e ’a salute!(In Ricomincio da tre)
Chapter 5
Conclusions
This dissertation was framed in the statistical field of functional data analysis (FDA). In partic-
ular, our main interest was the notion of functional depth. A functional depth allows to obtain
a center-outward ordering criterion for a functional sample Yn and to assess whether a curve is
central or peripheral relative to the rest of the curves in the sample. Besides ranking curves, we
showed that a functional depth may be useful to estimate in a robust way the center of a func-
tional random variable Y , to obtain central regions or to build other robust procedures based
on the identification of the most representative/central curves and on the elimination or down-
weighting of the least representative/central functional observations. For these reasons, in the
last 15 years several authors have proposed different functional depths. We reported some of
them in Chapter 2.
One of these proposals is the functional spatial depth (FSD, Chakraborty and Chaudhuri
2014). FSD relies on both a spatial and global approach. The concept of global approach of a
depth can be described as follows: let yi, yj , yk ∈ Yn be two central and one peripheral curve
relative to Yn, respectively. Then, according to a global depth, the contributions of yj and yk
to the depth value of yi should be the same. However, an alternative and also reasonable ap-
proach may consist in reducing the contribution of yk to the depth value of yi since they are
two distant curves. We described this different approach as local. The main contribution of this
thesis was the definition of a new functional depth that is precisely based on a local approach,
81
CONCLUSIONS 82
that is, the kernelized functional spatial depth (KFSD). Indeed, KFSD is a local-oriented ver-
sion of FSD and the transition from global (FSD) to local (KFSD) was achieved using kernel
methods. Moreover, we provided also another way to interpret the connection between FSD
and KFSD: using an embedding map φ from an infinite-dimensional Hilbert space H onto a
feature space F, it is possible to see KFSD as a recoded version of FSD. The double nature of
KFSD as a kernel-based and recoded depth is explained by the existing link between kernel
functions and embedding maps through the inner product function defined on H.
In Chapter 3 KFSD was used together with three different supervised classification meth-
ods (DTM, WAD and WMD) to assign unlabeled test curves to one of the groups of some
labeled training curves. We considered both simulated and real curves and DTM, WAD and
WMD were used in conjunction with KFSD and other functional depths (FMD, HMD, RTD,
IDD, MBD and FSD). The functional extension of k-NN was considered as benchmark pro-
cedure. Therefore, we analyzed a total of 22 classification methods and we substantially con-
tributed to a better knowledge of supervised functional classification. Moreover, we designed a
rather original structure for our study since we focused on challenging classification problems
where the different classes of curves were difficult to recognize at first sight and/or were con-
taminated with outliers. According to the results reported in Chapter 3, KFSD-based methods
behave well in such difficult classification problems, and in particular the pair WMD+KFSD
resulted as the best and most stable method.
Finally, in Chapter 4 we dealt with another functional problem, that is, outlier detection.
The reason why it is a good idea to use functional depths to recognize atypical curves is the fol-
lowing: if any outlier is present in a functional sample, a functional depth is expected to assign
a low depth value to this atypical curve. However, even if it is natural to build outlier detec-
tion methods based on the notion of functional depth, low depth values may also be assigned
to peripheral but non-outlying curves. Therefore, it is necessary to define procedures able to
provide a threshold depth value to discriminate between normal curves and outliers. Indeed,
in Chapter 4 we proposed three different methods to select a threshold for KFSD (KFSDsmo,
KFSDtri and KFSDwei) and we compared these new proposals with other depth-based (FBP,
CONCLUSIONS 83
Btri and Bwei) and non-depth-based (FBG and FHD) existing procedures. As for classification,
we developed an extensive simulation study. In this case we allowed exclusively faint outliers,
which are outliers hard to be recognized, and we overlooked clear outliers, which are instead
easily identifiable by most outlier detection procedures. The results of the simulation study
showed that the new methods KFSDsmo, KFSDtri and KFSDwei based on KFSD had the best
and most stable performances. This was especially the case of KFSDwei and KFSDtri.
5.1 Research Lines
We close this dissertation describing some aspects that may characterize our future research:
1. Properties of KFSD
In Section 2.4 we presented KFSD which can be interpreted as a local-oriented version
of the global oriented depth FSD. Moreover, we also showed that KFSD and FSD are
related since the first can be interpreted as a recoded version of the second. Along the
dissertation this connection remained rather implicit because we exploited another im-
portant relationship, that is, the link between an embedding map φ and a kernel function
κ. However, a profounder study of the relationship between KFSD and FSD through φ
may result useful for the definition of some properties of KFSD. Indeed, as reported in
Section 2.3, Chakraborty and Chaudhuri (2014) stated some properties of FSD: their ex-
tension to KFSD is one of our main future research goals. Also, the fulfillment by KFSD
of the desirable properties for a functional depth presented by Mosler and Polyakova
(2012) is among our objectives.
2. Different kernel functions for KFSD
At this stage of our research we did not investigate different kernels functions than
κ(x, y) = exp(−‖x−y‖
2
σ2
)for KFSD. However, while it is rather common to use Gaus-
sian kernel functions in kernel-based methods, it would be interesting to explore how
the choice of different kernels would affect the behavior of KFSD in supervised classi-
CONCLUSIONS 84
fication, outlier detection or other problems. For example, besides Cuevas et al. (2006),
also Dabo-Niang et al. (2007) proposed a kernel-type estimator of the modal curve. They
used a kernel function of a different nature, that is,
κ(x, y) =3
2
(1− d(x, y)2
h2
)1(0,1)(d(x, y)),
where d(x, y) is some measure of proximity between x and y, h is a bandwidth and
1(0,1)(u) is the indicator function of the interval (0, 1) of d(x, y). Also, the review on the
classes of kernels for machine learning from a statistics perspective by Genton (2002)
may offer other interesting alternatives for the κ(x, y) in KFSD. Therefore, an extensive
study that compares the behavior of KFSD with different kernel functions, as well as the
definition of decision rules to make the final decision, may further improve the already
good performances of KFSD in supervised classification and outlier detection.
3. Issues about KFSD as a descriptive FDA tool
We provided two different procedures to set the bandwidth σ of KFSD in supervised
classification and outlier detection. However, KFSD can also be used as a tool to obtain
some descriptive statistics of a functional sample Yn. In this work we did not focus on
descriptive FDA and neither provided a rule to set the bandwidth of KFSD in descriptive
analyses. Indeed, for such analyses it is not even obvious what criterion to consider to
choose σ, unlike in supervised classification and outlier detection. For these reasons, we
intend to explore different criteria and decision rules to select the bandwidth of KFSD in
descriptive FDA problems.
4. Other KFSD-based methods
As described in the introduction of this thesis, besides supervised classification and out-
lier detection, there are other problems that can be tackled using the notion of functional
depth, and therefore KFSD. For instance, we did not study the performances of KFSD
in location estimation of the center of a functional random variable Y and such study
would enhance our knowledge about the usefulness of KFSD in FDA. Another natural
CONCLUSIONS 85
step ahead in our research may be the definition of KFSD-based cluster analysis proce-
dures. Note that outlier detection, that has been studied in this thesis, can be seen as
a special case of cluster analysis since it is a cluster problem with maximum two clus-
ters, one of them with size much smaller than the other (even 0). Therefore, our future
efforts may go in this direction to define KFSD-based cluster procedures for functional
data which may be compared with some non-depth-based cluster techniques, e.g., the
method based on the notion of impartial trimming proposed by Cuesta-Albertos and
Fraiman (2007) or the Funclust procedure defined by Jacques and Preda (2013).
References
Agostinelli, C. and Romanazzi, M. (2011). Local depth. Journal of Statistical Planning and Infer-
ence, 141:817–830.
Barnett, V. and Lewis, T. (1994). Outliers in Statistical Data, volume 3. Wiley New York.
Biau, G., Bunea, F., and Wegkamp, M. (2005). Functional classification in Hilbert spaces. IEEE
Transactions on Information Theory, 51:2163–2172.
Brown, B. M. (1983). Statistical uses of the spatial median. Journal of the Royal Statistical Society,
Series B, 45:25–30.
Burba, F., Ferraty, F., and Vieu, P. (2009). k-nearest neighbour method in functional nonpara-
metric regression. Journal of Nonparametric Statistics, 21:453–469.
Cardot, H., Cenac, P., and Zitt, P.-A. (2013). Efficient and fast estimation of the geometric
median in Hilbert spaces with an averaged stochastic gradient algorithm. Bernoulli, 19:18–
43.
Cerou, F. and Guyader, A. (2006). Nearest neighbor classification in infinite dimension. ESAIM.
Probability and Statistics, 10:340–355.
Chakraborty, A. and Chaudhuri, P. (2014). On data depth in infinite dimensional spaces. Annals
of the Institute of Statistical Mathematics, 66:303–324.
Chaudhuri, P. (1996). On a geometric notion of quantiles for multivariate data. Journal of the
American Statistical Association, 91:862–872.
87
Chen, Y., Dang, X., Peng, H., and Bart, H. L. (2009). Outlier detection with the kernelized
spatial depth function. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:288–
305.
Cuesta-Albertos, J. A. and Fraiman, R. (2007). Impartial trimmed k-means for functional data.
Computational statistics & data analysis, 51:4864–4877.
Cuesta-Albertos, J. A. and Nieto-Reyes, A. (2008). The random Tukey depth. Computational
Statistics and Data Analysis, 52:4979–4988.
Cuevas, A. (2014). A partial overview of the theory of statistics with functional data. Journal of
Statistical Planning and Inference, 147:1–23.
Cuevas, A., Febrero, M., and Fraiman, R. (2006). On the use of the bootstrap for estimating
functions with functional data. Computational Statistics and Data Analysis, 51:1063–1074.
Cuevas, A., Febrero, M., and Fraiman, R. (2007). Robust estimation and classification for func-
tional data via projection-based depth notions. Computational Statistics, 22:481–496.
Cuevas, A. and Fraiman, R. (2009). On depth measures and Dual statistics. a methodology for
dealing with general data. Journal of Multivariate Analysis, 100:753–766.
Dabo-Niang, S., Ferraty, F., and Vieu, P. (2007). On the using of modal curves for radar wave-
forms classification. Computational Statistics & Data Analysis, 51(10):4878–4890.
Epifanio, I. (2008). Shape descriptors for classification of functional data. Technometrics, 50:284–
294.
Febrero, M., Galeano, P., and Gonzalez-Manteiga, W. (2008). Outlier detection in functional
data by depth measures, with application to identify abnormal nox levels. Environmetrics,
19:331–345.
Febrero, M., Galeano, P., and Gonzlez-Manteiga, W. (2007). A functional analysis of nox levels:
location and scale estimation and outlier detection. Computational Statistics, 22:411–427.
88
Febrero, M. and Oviedo de la Fuente, M. (2012). Statistical computing in functional data anal-
ysis: the R package fda.usc. Journal of Statistical Software, 51:1–28.
Ferraty, F. and Vieu, P. (2003). Curves discrimination: a nonparametric functional approach.
Computational Statistics and Data Analysis, 44:161–173.
Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis : Theory and Practice.
Springer, New York.
Fraiman, R. and Muniz, G. (2001). Trimmed means for functional data. Test, 10:419–440.
Galeano, P., Joseph, E., and Lillo, R. E. (2014). The Mahalanobis distance for
functional data with applications to classification. Technometrics, forthcoming.
DOI:10.1080/00401706.2014.902774.
Genton, M. G. (2002). Classes of kernels for machine learning: a statistics perspective. The
Journal of Machine Learning Research, 2:299–312.
Hall, P., Poskitt, D., and Presnell, B. (2001). A functional data-analytic approach to signal
discrimination. Technometrics, 43:1–9.
Hastie, T., Buja, A., and Tibshirani, R. (1995). Penalized discriminant analysis. The Annals of
Statistics, 23:73–102.
Horvath, L. and Kokoszka, P. (2012). Inference for Functional Data With Applications. Springer,
New York.
Hyndman, R. J. and Shang, H. L. (2010). Rainbow plots, bagplots, and boxplots for functional
data. Journal of Computational and Graphical Statistics, 19:29–45.
Jacques, J. and Preda, C. (2013). Funclust: a curves clustering method using functional random
variables density approximation. Neurocomputing, 112:164–171.
James, G. and Hastie, T. (2001). Functional linear discriminant analysis for irregularly sampled
curves. Journal of the Royal Statistical Society, Series B, 63:533–550.
89
Kudraszow, N. L. and Vieu, P. (2013). Uniform consistency of knn regressors for functional
variables. Statistics and Probability Letters, 83:1863–1870.
Liu, R. Y. (1990). On a notion of data depth based on random simplices. Annals of Statistics,
18:405–414.
Liu, R. Y., Parelius, J. M., Singh, K., et al. (1999). Multivariate analysis by data depth: descrip-
tive statistics, graphics and inference, (with discussion and a rejoinder by Liu and Singh).
The Annals of Statistics, 27:783–858.
Lopez-Pintado, S. and Romo, J. (2006). Depth-based classification for functional data. In Liu,
R., Serfling, R., and Souvaine, D. L., editors, Data Depth: Robust Multivariate Analysis, Com-
putational Geometry and Applications, DIMACS Series, pages 103–120, Providence. American
Mathematical Society.
Lopez-Pintado, S. and Romo, J. (2009). On the concept of depth for functional data. Journal of
the American Statistical Association, 104:718–734.
Marx, B. and Eilers, P. (1999). Generalized linear regression on sampled signals and curves: a
p-spline approach. Technometrics, 41:1–13.
McDiarmid, C. (1989). On the method of bounded differences. In Survey in Combinatorics,
pages 148–188. Cambridge University Press, Cambridge.
Mosler, K. and Polyakova, Y. (2012). General notions of depth for functional data. arXiv
preprint. arXiv:1208.1981.
Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis. Springer, New York.
Serfling, R. (2002). A depth function and a scale curve based on spatial quantiles. In Dodge,
Y., editor, Statistical Data Analysis Based on the L1-Norm and Related Methods, pages 25–38.
Birkhauser, Basel.
Serfling, R. (2006). Depth functions in nonparametric multivariate inference. In Liu, R., Ser-
fling, R., and Souvaine, D. L., editors, Data Depth: Robust Multivariate Analysis, Computational
90
Geometry and Applications, DIMACS Series, pages 1–16, Providence. American Mathematical
Society.
Sguera, C., Galeano, P., and Lillo, R. (2014a). Functional outlier detection with a local spatial
depth. Working Paper UC3M, 14-14 (10).
Sguera, C., Galeano, P., and Lillo, R. (2014b). Spatial depth-based classification for functional
data. TEST, forthcoming. DOI:10.1007/s11749-014-0379-1.
Sun, Y. and Genton, M. G. (2011). Functional boxplots. Journal of Computational and Graphical
Statistics, 20:316–334.
Sun, Y., Genton, M. G., and Nychka, D. W. (2012). Exact fast computation of band depth for
large functional datasets: How quickly can one million curves be ranked? Stat, 1:68–74.
Tukey, J. W. (1975). Mathematics and the picturing of data. In Proceedings of the International
Congress of Mathematicians, volume 2, pages 523–531.
Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function. Annals of Statistics,
28:461–482.
91