The Asymptotic Distributions of The Kernel Estimations of The Conditional Mode and Quantiles

December 23, 2008


THE ISLAMIC UNIVERSITY of GAZA

DEANERY of HIGHER STUDIES

FACULTY of SCIENCE

DEPARTMENT of MATHEMATICS

The Asymptotic Distributions of The Kernel Estimations of The Conditional Mode and Quantiles

PRESENTED BY

Hossam Othman M. El-sayed

SUPERVISED BY

Dr. Raid Salha

A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS

FOR THE DEGREE OF MASTER OF MATHEMATICS

1429-2008


To my family...


Contents

Table of Contents iii

Acknowledgment iv

Abstract v

List of Figures vi

List of Tables vi

Preface 1

1 Introduction 3

1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Kernel Density Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.2.1 Kernel Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.3 Properties and Examples of the Kernels . . . . . . . . . . . . . . . . . . . 14

1.4 The MSE and MISE Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.5 Asymptotic MSE and MISE Approximations . . . . . . . . . . . . . . . . . 18

1.6 Optimal Bandwidth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

1.7 Optimal Kernel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2 On the Estimation of the Mode 29

2.1 Mode Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30


2.2 A Simple Estimation of the Mode . . . . . . . . . . . . . . . . . . . . . . . 38

2.3 Nonparametric Regression Estimation . . . . . . . . . . . . . . . . . . . . . 39

2.4 Joint Asymptotic Distribution of the Estimated Conditional Mode . . . . . 43

3 Quantiles Regression 53

3.1 Nonparametric estimation of conditional quantiles . . . . . . . . . . . . . . 54

3.2 Joint Asymptotic Distribution of the Conditional Quantiles . . . . . . . . . 68

3.3 Mode and Median as a Comparison . . . . . . . . . . . . . . . . . . . . . . 83

Bibliography 84


Acknowledgment

First of all, I give my great thanks to the Almighty Allah, who at all times helps me, grants me the power and courage to finish this study, and gives me success in my life.

My gratitude and respect are paid to my supervisor Dr. Raid Salha for all the interesting discussions I had with him.

I am grateful to the Islamic University of Gaza for offering me the opportunity to obtain the Master degree of Mathematics, and my thanks go to all the professors who taught me in the mathematics department. I would also like to express my deep thanks and appreciation to my family, especially my parents, for their encouragement.

I also wish to thank my colleagues and friends who provided suggestions for this study.

Finally, I pray to Allah to accept this work.


Abstract

In this thesis, we study the kernel estimation of the conditional probability density function and two of its aspects, the conditional mode and the conditional quantiles.

For the conditional mode, we study the asymptotic normality of its kernel estimation from [18], and we study the conditions under which the conditional mode estimated at finitely many distinct points is asymptotically normally distributed.

Also, we study the kernel estimation of the conditional quantiles from [1], and we study the conditions under which the joint distribution of several conditional quantiles is asymptotically normally distributed.


List of Figures

1.1 Kernel density estimation based on 7 points 14

1.2 Kernel density estimates based on different bandwidths 23

1.3 The Epanechnikov kernel K∗ 27

1.4 Kernel density estimates of the Ethanol data 28

List of Tables

1.1 Common kernel functions 15

1.2 Efficiency of several kernels compared to the optimal kernel K∗ 28


Preface

The probability density function is a fundamental concept in statistics. Suppose we have a set of observed data points assumed to be a sample from an unknown probability density function f. The construction of an estimate of the density function from the observed data is known as density estimation.

The classical approach to estimating the density function is called parametric density estimation. Here one assumes that the data are drawn from a known parametric distribution which depends only on finitely many parameters, and one uses the data to estimate the unknown values of these parameters. For example, the normal distribution depends on two parameters, the mean µ and the variance σ2. The density function f could be estimated by finding estimates of µ and σ2 from the data and substituting these estimates into the formula for the normal density.

Parametric estimates usually depend only on a few parameters; therefore they are suitable even for small sample sizes n. Another approach to density estimation is nonparametric estimation, for example histograms, the naive estimator, the kernel estimator, etc. We will concentrate on the kernel estimator. In this case we do not assume that the data are drawn from a known parametric distribution. The data are allowed to decide which function fits them best, without the restrictions imposed by parametric estimation. For more details see [22].

There are several reasons for using nonparametric smoothing approaches.

1) They can be employed as a convenient and succinct means of displaying the features of the data set and hence to aid practical parametric model building.


2) They can be used for diagnostic checking of an estimated parametric model.

3) One may want to conduct inference under only the minimal restrictions imposed in fully nonparametric structures. For more details see [20].

The main subject of this thesis is the kernel estimation of the probability density function and of the conditional distribution function.

Now suppose that (Xi, Yi) are R×R-valued random variables with a common probability density function f. We want to study the relationship between a response variable Y and a predictor variable X. To explore this relationship, we use regression analysis to quantify it.

The conditional distribution function F(Y|X = x) is very important for solving this problem. In parametric and nonparametric estimation of the conditional distribution function, most investigation of the underlying structure is concerned with the mean regression function m(x) = E(Y|X = x), the conditional mean of Y given the value x of X. New insight about the underlying structure can be gained by considering other aspects of the conditional distribution function F(Y|X).

In this thesis, we will study two other aspects of the conditional distribution function, its mode and its quantiles.

This thesis consists of three chapters. In the first chapter we present some basic definitions and theorems which will be used in the next chapters. Also, we present the idea of the kernel estimation of the probability density function and some related topics.

In Chapter two, we introduce the kernel estimation of the mode and the conditional mode function in the case of independent and identically distributed (i.i.d.) random variables. We will study the asymptotic behavior of the estimators of the mode and the conditional mode function.

Finally, in Chapter three we will study the kernel estimation of the conditional quantiles and the asymptotic behavior of this estimation.


Chapter 1

Introduction

This chapter contains some basic definitions and facts that we need in the remainder of this thesis. In Section 1.1, we present some preliminaries in probability and statistics, and in the remaining sections of this chapter we present the idea of kernel estimation and some important subjects related to it.

1.1 Preliminaries

In this section, we will introduce some basic definitions and theorems that will help in the remainder of this thesis.

Definition 1.1.1. [8](σ − Field). Let B be a collection of subsets of C. We say that B

is a σ − Field if

(1) φ ∈ B, (B is not empty).

(2) If A ∈ B, then Ac ∈ B, (B is closed under complements).

(3) If the sequence of sets C1, C2, . . . is in B, then ⋃_{i=1}^{∞} Ci ∈ B, (B is closed under countable unions).

Definition 1.1.2. [8](Probability). Let C be a sample space and let B be a σ − Field

on C. Let P be a real valued function defined on B. Then P is a probability set function


if P satisfies the following three conditions:

1. P(C)≥ 0, for all C ∈ B.

2. P(C)=1.

3. If {Cn} is a sequence of sets in B and Cm ∩ Cn = φ for all m ≠ n, then
P( ⋃_{n=1}^{∞} Cn ) = ∑_{n=1}^{∞} P(Cn).

Definition 1.1.3. [8] Consider a random experiment with a sample space C. A function

X, which assigns to each element c ∈ C one and only one number X(c) = x, is called a

random variable. The space or range of X is the set of real numbers

D = {x : x = X(c), c ∈ C}. D will generally be a countable set or an interval of real numbers.

Definition 1.1.4. [6] If X is a discrete random variable, the function given by f(x) =

P (X = x) for each x within the range of X is called the probability distribution of

X.

Definition 1.1.5. [6] If X is a discrete random variable, the function given by

F(x) = P(X ≤ x) = ∑_{t≤x} f(t) for −∞ < x < ∞,

where f(t) is the value of the probability distribution of X at t, is called the distribution function, or the cumulative distribution function (cdf), of X.

Definition 1.1.6. [6] A function with values f(x), defined over the set of all real numbers,

is called a probability density function of the continuous random variable X if and

only if

P(a ≤ X ≤ b) = ∫_a^b f(x) dx

for any real constants a and b with a ≤ b.


Definition 1.1.7. [6] If X is a continuous random variable and the value of its probability

density at t is f(t), then the function given by

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt for −∞ < x < ∞

is called the distribution function, or the cumulative distribution, of X.

Definition 1.1.8. [8] The support of a continuous random variable X consists of all

points x such that fX(x) > 0.

Definition 1.1.9. [8] (Independence). Let the random variables X1 and X2 have the

joint pdf f(x1, x2) and the marginal pdfs f1(x1) and f2(x2) respectively. The random

variables X1 and X2 are said to be independent if, and only if, f(x1, x2) ≡ f1(x1)f2(x2)

Random variables that are not independent are said to be dependent.

Definition 1.1.10. [8] Let X be a random variable with pdf depending on a parameter θ. Let X1, . . . , Xn be a random sample from the distribution of X and let T denote an estimator of θ. We say T is an unbiased estimator of θ if

E(T ) = θ.

If T is not unbiased, we say that T is a biased estimator of θ.

Theorem 1.1.1. [6]

If θ̂ is an unbiased estimator of θ and

Var(θ̂) = 1 / ( n E[ (∂ ln f(X)/∂θ)² ] ),

then θ̂ is a minimum variance unbiased estimator of θ.

Definition 1.1.11. [6] The statistic θ̂ is a consistent estimator of the parameter θ if and only if for each c > 0,

lim_{n→∞} P( |θ̂ − θ| < c ) = 1.


Theorem 1.1.2. [6]

If θ̂ is an unbiased estimator of θ and Var(θ̂) −→ 0 as n −→ ∞, then θ̂ is a consistent estimator of θ.

Definition 1.1.12. [6] The statistic θ̂ is a sufficient estimator of the parameter θ if and only if for each value of θ̂ the conditional probability distribution or density of the random sample X1, X2, . . . , Xn, given that value of θ̂, is independent of θ.

Definition 1.1.13. [8] (Characteristic Function) The characteristic function of a random variable X with distribution function F, denoted by k(u), is defined by

k(u) = ∫_{−∞}^{∞} e^{−iuy} K(y) dy.

Theorem 1.1.3. [8]

The characteristic function of any random variable is a uniformly continuous function.

Theorem 1.1.4. [8] (Minkowski's Inequality)

Let X, Y be two random variables. Then it holds for 1 ≤ p < ∞ that

( E(|X + Y|^p) )^{1/p} ≤ ( E(|X|^p) )^{1/p} + ( E(|Y|^p) )^{1/p}.

Definition 1.1.14. [12] Let r be a positive number such that

k_r = lim_{u→0} (1 − k(u)) / |u|^r

is finite. If there exists a value of r such that k_r is non-zero, then r is called the characteristic exponent of the transform k(u), and k_r is called the characteristic coefficient.

Definition 1.1.15. [16] If A is any set, we define the indicator function I_A of the set A to be the function given by

I_A(x) = 1 if x ∈ A, and I_A(x) = 0 if x ∉ A.


Definition 1.1.16. [8] (Convergence in Probability). Let {Xn} be a sequence of random variables and let X be a random variable defined on a sample space. We say Xn converges in probability to X if for all ε > 0 we have

lim_{n→∞} P[ |Xn − X| ≥ ε ] = 0,

or equivalently,

lim_{n→∞} P[ |Xn − X| < ε ] = 1.

If so, we write Xn −p→ X.

Definition 1.1.17. [8] (Convergence in Distribution). Let {Xn} be a sequence of random variables and let X be a random variable. Let F_{Xn} and F_X be, respectively, the cdfs of Xn and X. Let C(F_X) denote the set of all points where F_X is continuous. We say that Xn converges in distribution to X if

lim_{n→∞} F_{Xn}(x) = F_X(x) for all x ∈ C(F_X).

We denote this convergence by Xn −D→ X.

Definition 1.1.18. [8] (Convergence with probability 1). Let {Xn}_{n=1}^{∞} be a sequence of random variables on (Ω, L, P). We say that Xn converges almost surely to a random variable X (Xn −a.s.→ X), or converges with probability 1 to X, or converges strongly to X, if and only if

P({w : Xn(w) −→ X(w) as n −→ ∞}) = 1,

or equivalently, for all ε > 0,

lim_{N→∞} P( |Xn − X| < ε for all n ≥ N ) = 1.


Theorem 1.1.5. [8]

1. If Xn converges to X with probability 1, then Xn converges to X in probability.

2. If Xn converges to X in probability, then Xn converges to X in distribution.

3. If Xn converges to X in probability and g is a continuous function on R, then g(Xn) converges to g(X) in probability.

Example 1.1.1. (Convergence in probability does not imply convergence with probability 1.)

Let Ω = (0, 1] and let P be the uniform distribution on Ω.

Define An by

A1 = (0, 1/2], A2 = (1/2, 1],
A3 = (0, 1/4], A4 = (1/4, 1/2], A5 = (1/2, 3/4], A6 = (3/4, 1],
A7 = (0, 1/8], A8 = (1/8, 1/4], . . .

Let Xn(w) = I_{An}(w).

Then P(|Xn − 0| ≥ ε) −→ 0 for all ε > 0, since Xn is 0 except on An and P(An) ↓ 0. Thus Xn converges to 0 in probability.

But P({w : Xn(w) −→ 0}) = 0 (and not 1), because every w keeps falling in some An beyond any n0; i.e., the sequence Xn(w) looks like 0 . . . 010 . . . 010 . . . 010 . . . , so Xn does not converge with probability 1 to 0.

Definition 1.1.19. [4] Let A ⊆ R, let f : A −→ R, and let c ∈ A. We say that f is continuous at c if, given any neighborhood Vε(f(c)) of f(c), there exists a neighborhood Vδ(c) of c such that if x is any point of A ∩ Vδ(c), then f(x) belongs to Vε(f(c)).

Definition 1.1.20. [4] A function f : A −→ R is said to be bounded on A if there

exists a constant M > 0 such that |f(x)| ≤ M for all x ∈ A.

Definition 1.1.21. [4] Let A ⊆ R, let f : A −→ R. We say that f is uniformly

continuous on A if for each ε > 0 there is a δ(ε) > 0 such that if x, u ∈ A are any

numbers satisfying |x− u| < δ(ε), then |f(x)− f(u)| < ε.


Definition 1.1.22. [4] Let A ⊆ R, let f : A −→ R. If there exists a constant K > 0

such that

|f(x)− f(u)| ≤ K|x− u|

for all x, u ∈ A, then f is said to be a Lipschitz function (or satisfy a Lipschitz

condition) on A.

Definition 1.1.23. [4] Let f : [a, b] −→ R and let a = x0 < x1 < . . . < xk = b be any subdivision of [a, b]. Define

p = ∑_{i=1}^{k} [f(xi) − f(xi−1)]^+,  n = ∑_{i=1}^{k} [f(xi) − f(xi−1)]^−,  t = n + p.

Define P = sup p, N = sup n and T = sup t, where the suprema are taken over all subdivisions of [a, b].

If T < ∞, we say that f is of bounded variation over [a, b] and we write f ∈ BV.

Definition 1.1.24. [4] A function f : [a, b] −→ R is said to be absolutely continuous if, given ε > 0, there is δ > 0 such that if {(xi, yi)}_{i=1}^{n} is a finite pairwise disjoint family of subintervals of [a, b] with ∑_{i=1}^{n} |xi − yi| < δ, then ∑_{i=1}^{n} |f(xi) − f(yi)| < ε.

Theorem 1.1.6. [16]

Every absolutely continuous function is uniformly continuous.

Theorem 1.1.7. [16]

If f is absolutely continuous function on [a, b], then f is of bounded variation.

Definition 1.1.25. [4] A set E is said to be measurable if for each set A we have

M*(A) = M*(A ∩ E) + M*(A ∩ E^c),

where M* is the outer measure, defined by

M*(A) = inf_{A ⊂ ⋃ In} ∑ L(In),

the infimum being taken over all countable coverings of A by intervals In, with L(In) the length of In.

Theorem 1.1.8. [16]

If f : A −→ R is a Lipschitz function, then f is uniformly continuous on A.


Theorem 1.1.9. [13] (Classical Central Limit Theorem)

Let {Xk, k ≥ 1} be i.i.d. random variables with mean µ and finite variance σ². Also let

Zn = (Tn − nµ)/(σ√n),

where Tn = ∑_{i=1}^{n} Xi. Then Zn −D→ N(0, 1).

Theorem 1.1.10. [13] (Liapounov Theorem)

Let {Xk, k ≥ 1} be independent random variables such that E Xk = µk, Var Xk = σk², and for some 0 < δ ≤ 1,

v(k)^{2+δ} = E|Xk − µk|^{2+δ} < ∞, k ≥ 1.

Also let Tn = ∑_{k=1}^{n} Xk, ξn = E Tn = ∑_{k=1}^{n} µk, sn² = Var Tn = ∑_{k=1}^{n} σk², Zn = (Tn − ξn)/sn

and ρn = sn^{−(2+δ)} ∑_{k=1}^{n} v(k)^{2+δ}. Then, if lim_{n→∞} ρn = 0, we have Zn −D→ N(0, 1).

Theorem 1.1.11. [13]

Let {Xk, k ≥ 1} be independent random variables such that P{a ≤ Xk ≤ b} = 1 for some finite scalars a < b. Also let E Xk = µk, Var Xk = σk², Tn = ∑_{k=1}^{n} Xk, ξn = ∑_{k=1}^{n} µk and sn² = ∑_{k=1}^{n} σk². Then

Zn = (Tn − ξn)/sn −D→ N(0, 1) if and only if sn −→ ∞ as n −→ ∞.

Theorem 1.1.12. [13] (Borel–Cantelli Lemma)

Let {An} be a sequence of events and denote by P(An) the probability that An occurs, n ≥ 1. Also, let A denote the event that the An occur infinitely often (i.o.). Then

∑_{n≥1} P(An) < ∞ =⇒ P(A) = 0,

no matter whether the An are independent or not. If the An are independent, then

∑_{n≥1} P(An) = +∞ =⇒ P(A) = 1.


Lemma 1.1.13. [19]

There exists a universal constant C > 0 such that for each n > 0, εn > 0 and distribution function F,

P{ sup_{x∈R} |Fn(x) − F(x)| > εn } ≤ C exp(−2nεn²).

Theorem 1.1.14. [13] (Cramér–Wold)

Let X, X1, X2, . . . be random vectors in R^p; then Xn −D→ X if and only if, for each fixed λ ∈ R^p, we have λ^t Xn −D→ λ^t X.

Theorem 1.1.15. (Taylor's Theorem)

Suppose that f is a real-valued function defined on R and let x ∈ R. Assume that f has p continuous derivatives in an interval (x − δ, x + δ) for some δ > 0, and that the (p + 1)th derivative of f exists. Then for any sequence (αn) converging to zero, we have

f(x + αn) = ∑_{j=0}^{p} (αn^j / j!) f^{(j)}(x) + o(αn^p).


1.2 Kernel Density Estimation

Suppose X1, X2, . . . , Xn is a sequence of independent and identically distributed (i.i.d.) random variables with common probability density function f(x). The problem of estimating the function f(x) is of interest for many reasons. For instance, it can be used to calculate probabilities. In addition, if we know f(x), we are able, through its graph, to determine its shape as well as other features of the distribution, such as whether it has one peak or more, and whether it is smooth, symmetric, etc.

1.2.1 Kernel Estimator

Let X1, X2, . . . , Xn be i.i.d. random variables with distribution function F(x) = P(X ≤ x) which is absolutely continuous,

F(x) = ∫_{−∞}^{x} f(y) dy,

with probability density function f(x).

The sample distribution function Fn(x) at a point x is defined as

Fn(x) = (1/n) {number of observations x1, x2, . . . , xn falling in (−∞, x]}.

It is natural to take Fn(x) as an estimate of F(x) at a given point x. An estimate of f(x) may be

fn(x) = (1/(2hn)) {Fn(x + hn) − Fn(x − hn)},   (1.2.1)

where hn is chosen as a positive number.

Equation (1.2.1) can be written as

fn(x) = (1/(2nhn)) {number of observations falling in the interval [x − hn, x + hn]}
      = (1/(2nhn)) ∑_{i=1}^{n} I(|Xi − x| ≤ hn)
      = (1/(nhn)) ∑_{i=1}^{n} (1/2) I(|Xi − x|/hn ≤ 1)
      = (1/(nhn)) ∑_{i=1}^{n} w((Xi − x)/hn),   (1.2.2)


where

I(|Xi − x| ≤ hn) = 1 if x − hn ≤ Xi ≤ x + hn, and 0 otherwise,

and

w((Xi − x)/hn) = (1/2) I(|Xi − x|/hn ≤ 1) = 1/2 if −1 ≤ (Xi − x)/hn ≤ 1, and 0 otherwise.
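As an illustrative aside (not part of the original text), the naive estimator (1.2.2) can be computed directly. The following Python sketch assumes NumPy is available; the standard normal sample, the bandwidth hn = 0.4 and the evaluation grid are arbitrary choices made only for the example.

```python
import numpy as np

def naive_estimator(grid, data, h):
    """Naive estimate f_n(x) = (1/(2 n h)) * #{ X_i in [x - h, x + h] }."""
    data = np.asarray(data)
    # indicator |X_i - x| <= h, averaged over i and scaled by 1/(2h)
    return np.mean(np.abs(data - grid[:, None]) <= h, axis=1) / (2.0 * h)

rng = np.random.default_rng(0)
sample = rng.normal(size=200)            # hypothetical i.i.d. sample
grid = np.linspace(-3, 3, 61)
f_naive = naive_estimator(grid, sample, h=0.4)
```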

Definition 1.2.1. We consider a function, centered at the estimation point and used to weight nearby data points, as a weight function; we call it the kernel function and denote it by K(·). The corresponding kernel density estimator is defined as

fn(x) = (1/(nhn)) ∑_{i=1}^{n} K((x − Xi)/hn).   (1.2.3)

Note that Equation (1.2.3) can be written as

fn(x) = (1/n) ∑_{i=1}^{n} K_{hn}(x − Xi),

where K_h(x) = K(x/h)/h.
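The kernel estimator (1.2.3) differs from the naive estimator only in replacing the rectangular weight w by a general kernel K. The following is a minimal sketch, again assuming NumPy and using the Gaussian kernel; the sample and the bandwidth are arbitrary choices for illustration.

```python
import numpy as np

def kernel_density(grid, data, h):
    """Kernel estimate f_n(x) = (1/(n h)) * sum_i K((x - X_i)/h) with Gaussian K."""
    data = np.asarray(data)
    u = (grid[:, None] - data) / h                    # (x - X_i)/h_n
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)      # Gaussian kernel
    return K.mean(axis=1) / h

rng = np.random.default_rng(1)
sample = rng.normal(size=300)
grid = np.linspace(-3, 3, 121)
f_hat = kernel_density(grid, sample, h=0.3)
```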

The kernel estimator can be viewed as a sum of bumps placed at the observations. The kernel function K determines the shape of the bumps, while the bandwidth hn determines their width; see the illustration in Figure 1.1.

Figure 1.1: Kernel density estimation based on 7 points (see [20])

From Figure 1.1, we have:

(1) The shape of the bumps is defined by the kernel function.

(2) The spread of the bumps is determined by the bandwidth hn, which is analogous to the bandwidth of a histogram.

Thus the value of the kernel estimate at the point x is the average of the n kernel ordinates at this point.

1.3 Properties and Examples of the Kernels

In this section, we will consider some properties of the kernels. A kernel is a piecewise continuous function, symmetric around zero (an even function), integrating to one, i.e.

K(x) = K(−x),  ∫_{−∞}^{∞} K(x) dx = 1.   (1.3.1)


The kernel function need not have bounded support, and in most applications K is a

positive probability density function.

A kernel function K is said to be of order p if its first nonzero moment is µp, i.e. if

µj(K) = 0, j = 1, 2, . . . , p − 1;  µp(K) ≠ 0,

where

µj(K) = ∫_{−∞}^{∞} y^j K(y) dy.   (1.3.2)

Some examples of kernel functions are given in Table 1.1, where I is the indicator function.

Table 1.1: Common kernel functions

kernel        K(x)
Epanechnikov  (3/4)(1 − x^2) I(|x| ≤ 1)
Biweight      (15/16)(1 − x^2)^2 I(|x| ≤ 1)
Triweight     (35/32)(1 − x^2)^3 I(|x| ≤ 1)
Triangular    (1 − |x|) I(|x| ≤ 1)
Gaussian      (2π)^{−1/2} exp(−x^2/2)
Uniform       (1/2) I(|x| ≤ 1)
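As an illustrative check (not part of the thesis), the kernels of Table 1.1 can be verified numerically to integrate to one, to have first moment zero, and to have a nonzero second moment, i.e. to be of order 2. The sketch below assumes NumPy and uses simple Riemann sums on a fine grid.

```python
import numpy as np

# kernels from Table 1.1, written as functions of a NumPy array
kernels = {
    "Epanechnikov": lambda x: 0.75 * (1 - x**2) * (abs(x) <= 1),
    "Biweight":     lambda x: 15/16 * (1 - x**2)**2 * (abs(x) <= 1),
    "Triweight":    lambda x: 35/32 * (1 - x**2)**3 * (abs(x) <= 1),
    "Triangular":   lambda x: (1 - abs(x)) * (abs(x) <= 1),
    "Gaussian":     lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi),
    "Uniform":      lambda x: 0.5 * (abs(x) <= 1),
}

x = np.linspace(-10, 10, 400001)
dx = x[1] - x[0]
for name, K in kernels.items():
    Kx = K(x)
    mass = Kx.sum() * dx            # should be about 1
    mu1 = (x * Kx).sum() * dx       # should be about 0 (symmetry)
    mu2 = (x**2 * Kx).sum() * dx    # mu_2(K) != 0, so each kernel is of order 2
    print(f"{name:12s} mass={mass:.3f}  mu1={mu1:.4f}  mu2={mu2:.3f}")
```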

Now, we shall introduce some important properties of the kernel estimator. We consider the following conditions, which we will use in proving facts, lemmas and theorems in the remainder of this chapter.

i) The unknown density function f(x) has a continuous second derivative f^{(2)}(x).

ii) The bandwidth h = hn = h(n) is a sequence of positive numbers and satisfies

lim_{n→∞} hn = 0 and lim_{n→∞} nhn = ∞.

iii) The kernel K is a bounded probability density function of order 2, symmetric about zero.


Definition 1.3.1. The Bias of an estimator fn(x) of a density f(x) is the difference

between the expected value of fn(x) and f(x). That is

Bias(fn(x)) = E(fn(x))− f(x)

In [12], the author studied the statistical properties of the kernel estimator. In addition to the above, he proved several other properties. He showed that fn(x) is a consistent estimator of f(x) and that the sequence of estimates fn(x) is asymptotically normally distributed. Also, he proved that if the probability density function f(x) is uniformly continuous, and if lim_{n→∞} nhn² = ∞, then fn(x) tends uniformly (in probability) to f(x), in the sense that (1.3.3) holds:

lim_{n→∞} P( sup_{−∞<x<∞} |fn(x) − f(x)| < ε ) = 1, for all ε > 0.   (1.3.3)

1.4 The MSE and MISE Criteria

The important role played by the kernel density estimator makes us concerned with its performance, its efficiency and its accuracy in estimating the true density. We will study two types of error criteria, the mean square error (MSE) and the mean integrated square error (MISE).

Definition 1.4.1. The mean square error (MSE) is used to measure the error when estimating the density function at a single point. It is defined by

MSE{fn(x)} = E{fn(x) − f(x)}².   (1.4.1)

From its definition, the MSE measures the average squared difference between the

density estimator and the true density. In general, any function of the absolute distance |fn(x) − f(x)| (often called a metric) would serve as a measurement of the goodness of an estimator. But the MSE metric has at least two advantages over other metrics. First, it is analytically tractable. Second, it has an interesting decomposition into variance and squared bias, provided f(x) is not random, as follows:

MSE(fn(x)) = E(f(x) − fn(x))²
           = E( f²(x) − 2 f(x) fn(x) + fn(x)² )
           = E f²(x) − 2 f(x) E fn(x) + E fn(x)²
           = f²(x) − 2 f(x) E fn(x) + Var fn(x) + (E fn(x))²
           = Var fn(x) + (E fn(x) − f(x))².   (1.4.2)
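The decomposition (1.4.2) can be checked by simulation. The following sketch is an illustration only (an arbitrary standard normal target, Gaussian kernel and bandwidth, assuming NumPy): it estimates MSE{fn(x0)} at a single point by Monte Carlo and confirms that it equals the empirical variance plus squared bias.

```python
import numpy as np

def kde_at_point(x0, data, h):
    """Gaussian-kernel estimate f_n(x0) at a single point x0."""
    u = (x0 - data) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)) / h

rng = np.random.default_rng(2)
x0, h, n, reps = 0.0, 0.4, 200, 2000
true_f = 1 / np.sqrt(2 * np.pi)                  # standard normal density at 0

est = np.array([kde_at_point(x0, rng.normal(size=n), h) for _ in range(reps)])
mse  = np.mean((est - true_f) ** 2)
var  = np.var(est)
bias = np.mean(est) - true_f
print(mse, var + bias**2)                        # the two numbers coincide, as in (1.4.2)
```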

Theorem 1.4.1.

Let X be a random variable having a density f; then

MSE(fn(x)) = n^{−1} { ∫_{−∞}^{∞} K²_{hn}(x − y) f(y) dy − ( ∫_{−∞}^{∞} K_{hn}(x − y) f(y) dy )² }
            + ( ∫_{−∞}^{∞} K_{hn}(x − y) f(y) dy − f(x) )².   (1.4.3)

Proof: See [11].

Now, we are interested in considering an error criterion that globally measures the distance

between the estimation of f over the entire real line and f itself.

Definition 1.4.2. An error criterion that measures the distance between fn(x) and f(x) is the integrated squared error (ISE), given by

ISE{fn(x)} = ∫_{−∞}^{∞} (fn(x) − f(x))² dx.

Note that the ISE depends on the particular data set at hand, so we prefer to analyze the expected value of this random quantity.

Definition 1.4.3. The expected value of the ISE, called the mean integrated squared error (MISE), is given by

MISE{fn(x)} = E( ISE{fn(x)} ) = E ∫_{−∞}^{∞} (fn(x) − f(x))² dx.


By changing the order of integration we have

MISE(fn(x)) = ∫_{−∞}^{∞} MSE{fn(x)} dx
            = ∫_{−∞}^{∞} {E fn(x) − f(x)}² dx + ∫_{−∞}^{∞} Var(fn(x)) dx.   (1.4.4)

Theorem 1.4.2.

The MISE of an estimator fn(x) of a density f(x) is given by

MISE(fn(x)) = n^{−1} ∫_{−∞}^{∞} ∫_{−∞}^{∞} K²_{hn}(x − y) f(y) dy dx
            + (1 − n^{−1}) ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} K_{hn}(x − y) f(y) dy )² dx
            − 2 ∫_{−∞}^{∞} { ∫_{−∞}^{∞} K_{hn}(x − y) f(y) dy } f(x) dx + ∫_{−∞}^{∞} f²(x) dx.   (1.4.5)

Proof: See [11].

1.5 Asymptotic MSE and MISE Approximations

Here, we will derive an asymptotic approximation for the MISE which depends on hn in a simple way. The simple form of this approximation will exhibit the influence of the bandwidth hn as a smoothing parameter. The rate of convergence of the kernel density estimator and the optimal bandwidth can also be obtained from the asymptotic approximation of the MISE. Before we start our investigation we have to introduce some definitions, theorems and assumptions that are needed throughout our work.

Definition 1.5.1.

i) A function f is of order less than g as x −→ ∞ if

lim_{x→∞} f(x)/g(x) = 0.

We indicate this by writing f = o(g) ("f is little oh of g").


ii) Let f(x) and g(x) be positive for x sufficiently large. Then f is of at most the order of g as x −→ ∞ if there is a positive constant M for which

f(x)/g(x) ≤ M

for x sufficiently large. We indicate this by writing f = O(g) ("f is big oh of g").

Definition 1.5.2. Given two sequences {an} and {bn} such that bn ≥ 0 for all n, we write

an = O(bn) (read: "an is big oh of bn")

if there exists a constant M > 0 such that |an| ≤ M bn for all n.

We write an = o(bn) as n −→ ∞ (read: "an is little oh of bn") if

lim_{n→∞} an/bn = 0.

Definition 1.5.3. We say that an is asymptotically equivalent to bn, or simply an is asymptotic to bn, and we write

an ∼ bn  iff  lim_{n→∞} (an/bn) = 1.

Lemma 1.5.1.

Let X be a random variable having a density f; then the bias of fn(x) can be expressed as

E(fn(x)) − f(x) = (1/2) hn² µ2(K) f′′(x) + o(hn²).   (1.5.1)

Proof:

Firstly, we assume that

∫_{−∞}^{∞} K(z) dz = 1,  ∫_{−∞}^{∞} z K(z) dz = 0,  ∫_{−∞}^{∞} z² K(z) dz < ∞,  and  µ2(K) = ∫_{−∞}^{∞} z² K(z) dz.

Note that

E(fn(x)) = ∫_{−∞}^{∞} (1/hn) K((x − y)/hn) f(y) dy.


Let z = (x − y)/hn to get

E(fn(x)) = ∫_{−∞}^{∞} K(z) f(x − z hn) dz.

Since f has continuous derivatives of order 2, we can expand f(x − z hn) in a Taylor series as follows:

f(x − z hn) = ∑_{j=0}^{2} ((−z hn)^j / j!) f^{(j)}(x) + o((z hn)²)
            = f(x) + (−z hn) f′(x) + (z² hn²/2) f′′(x) + o(z² hn²)
            = f(x) − z hn f′(x) + (1/2) z² hn² f′′(x) + o(hn²).

Therefore,

E[fn(x)] = ∫_{−∞}^{∞} K(z) { f(x) − z hn f′(x) + (1/2) z² hn² f′′(x) + o(hn²) } dz
         = ∫_{−∞}^{∞} K(z) f(x) dz − ∫_{−∞}^{∞} z K(z) hn f′(x) dz + ∫_{−∞}^{∞} K(z) (1/2) hn² z² f′′(x) dz + ∫_{−∞}^{∞} K(z) o(hn²) dz
         = f(x) − hn f′(x) ∫_{−∞}^{∞} z K(z) dz + (1/2) hn² f′′(x) ∫_{−∞}^{∞} z² K(z) dz + o(hn²)
         = f(x) + (1/2) hn² f′′(x) ∫_{−∞}^{∞} z² K(z) dz + o(hn²).

By the assumption, we have the result.

Lemma 1.5.2.

Let X be a random variable having a density f; then

Var{fn(x)} = (nhn)^{−1} R(K) f(x) + o((nhn)^{−1}),   (1.5.2)

where R(K) = ∫_{−∞}^{∞} K²(x) dx.

Proof:

First, note that

Var{fn(x)} = (1/n) { ∫_{−∞}^{∞} K²_{hn}(x − y) f(y) dy − [ ∫_{−∞}^{∞} K_{hn}(x − y) f(y) dy ]² }.


Using the Taylor series expansion of f(x − z hn) about x, we get

Var{fn(x)} = (1/(nhn)) ∫_{−∞}^{∞} K²(z) f(x − z hn) dz − n^{−1} {E fn(x)}²
           = (1/(nhn)) ∫_{−∞}^{∞} K²(z) {f(x) + o(1)} dz − n^{−1} {f(x) + o(1)}²
           = (1/(nhn)) f(x) ∫_{−∞}^{∞} K²(z) dz + o((nhn)^{−1}).

From the assumption, the result holds.

Now from the above we have some properties of the bias and the variance:

1) The bias is of order O(hn²), which implies that fn(x) is an asymptotically unbiased estimator (by condition (ii) above).

2) The bias is large whenever the absolute value of the second derivative |f^{(2)}(x)| is large. This occurs for several densities at peaks, where the bias is negative, and valleys, where the bias is positive.

3) The variance is of order (nhn)^{−1}, which means that the variance converges to zero by condition (ii) above.

Theorem 1.5.3.

The MISE of an estimator fn of the unknown density f is given by

MISE{fn(x)} = AMISE{fn(x)} + o( (nhn)^{−1} + hn⁴ ),

where

AMISE{fn(x)} = (nhn)^{−1} R(K) + (1/4) hn⁴ µ2²(K) R(f′′)   (1.5.3)

is called the asymptotic MISE of fn(x), and R(K) = ∫_{−∞}^{∞} K²(x) dx.

Proof:

From Equations (1.5.1) and (1.5.2), applying Equation (1.4.2), we get

MSE{fn(x)} = (nhn)^{−1} R(K) f(x) + o((nhn)^{−1}) + (1/4) hn⁴ µ2²(K) f′′²(x) + o(hn⁴) + hn² µ2(K) f′′(x) o(hn²)
           = (nhn)^{−1} R(K) f(x) + (1/4) hn⁴ µ2²(K) f′′²(x) + o( (nhn)^{−1} + hn⁴ ).

From this, we have the asymptotic MSE

AMSE{fn(x)} = (nhn)^{−1} R(K) f(x) + (1/4) hn⁴ µ2²(K) f′′²(x).

Integrating the last expression over x, and using ∫ f(x) dx = 1 and ∫ f′′²(x) dx = R(f′′), gives

AMISE{fn(x)} = (nhn)^{−1} R(K) + (1/4) hn⁴ µ2²(K) R(f′′),

and the result follows from (1.4.4).

1.6 Optimal Bandwidth

The problem of bandwidth selection is very important in density estimation. The next figure (1.2) shows how the density estimates change with the bandwidth size. The choice of an appropriate bandwidth is critical to the performance of most nonparametric density estimators. When the bandwidth is very small, the estimate will be very close to the original data. Thus it will be very wiggly due to overfitting. The estimate will be almost unbiased, but it will have large variation under repeated sampling. If the bandwidth is very large, the estimate will be very smooth, lying close to the mean of all the data. Such an estimate will have small variance, but it will be highly biased. A brief survey of bandwidth selection for kernel density estimation has been given by [22] and [23].

One way to select the smoothing parameter is simply to look at plots of the smoothed data for several bandwidths. If the overall trend is the feature of most interest to the investigator, a very smooth estimate may be desirable. If the investigator is interested in local extremes, a less smooth estimate may be preferred. Subjective choice of the smoothing parameter offers a great deal of flexibility, as well as a comprehensive look at the data; see [22].

The AMISE (asymptotic MISE) has some useful advantages. Its simplicity as a mathematical expression makes it useful for large sample approximations. Also, we can see an important relationship between bias and variance, known as the variance-bias trade-off. It gives us an understanding of the role of the bandwidth hn.


Figure 1.2: Kernel density estimates based on different bandwidths (see [20])

Figure 1.2 is based on three bandwidths: if we choose hn = 0.25, we obtain the solid curve; if we choose hn = 0.5, we obtain the dashed curve; and if we choose hn = 0.75, we obtain the dotted curve.

There are many rules for bandwidth selection, for example Normal Scale Rules, Over-

smoothed bandwidth selection rules, Least Squares Cross-Validation, Biased Cross-Validation,

Estimation of density functionals and Plug-In Bandwidth Selection. For more details see

[11], [22] and [23].

Corollary 1.6.1.

The AMISE-optimal bandwidth, hAMISE, has the closed form

h_opt = [ R(K) / ( µ2²(K) R(f^{(2)}) n ) ]^{1/5}.   (1.6.1)


Proof:

By differentiating (1.5.3) with respect to hn and setting the derivative equal to zero, we can find the optimal bandwidth:

(d/dhn) AMISE{fn(x)} = −(n hn²)^{−1} R(K) + hn³ µ2²(K) R(f′′) = 0,

hn⁵ µ2²(K) R(f′′) = n^{−1} R(K),

h_opt = { R(K) / ( n µ2²(K) R(f′′) ) }^{1/5}.

When trying to understand what this hn tells us, we find that it depends on the known kernel function K and on n, and that it is inversely proportional to R(f′′)^{1/5}. The quantity R(f′′) measures the total curvature of f. So if R(f′′) is small, that is, f has little curvature, the bandwidth hn will be large. On the other hand, hn will be small if R(f′′) is large.

The previous corollary gives an optimal hn that can be used to choose a good bandwidth if R(f′′) is known. But f is unknown.
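One common way around this difficulty, added here only as an illustration, is a normal reference rule: if f is assumed to be N(µ, σ²) and K is the Gaussian kernel, then R(K) = 1/(2√π), µ2(K) = 1 and R(f′′) = 3/(8√π σ⁵), so (1.6.1) reduces to h_opt = (4/3)^{1/5} σ n^{−1/5} ≈ 1.06 σ n^{−1/5}. A minimal Python sketch under these assumptions:

```python
import numpy as np

def normal_reference_bandwidth(data):
    """h_opt from (1.6.1) assuming f is normal and K is the Gaussian kernel:
    h = (4/3)^(1/5) * sigma * n^(-1/5), with sigma estimated from the data."""
    data = np.asarray(data)
    n, sigma = data.size, data.std(ddof=1)
    return (4.0 / 3.0) ** 0.2 * sigma * n ** (-0.2)

rng = np.random.default_rng(3)
print(normal_reference_bandwidth(rng.normal(size=500)))   # roughly 1.06 * n^(-1/5)
```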

Therefore, if we substitute (1.6.1) into (1.5.3), we obtain the smallest value of the AMISE (since the second derivative is greater than zero) for estimating f using the kernel K:

AMISE{fn(x)} at h_opt = (n h_opt)^{−1} R(K) + (1/4) h_opt⁴ µ2²(K) R(f′′)
                      = (5/4) n^{−4/5} R(K)^{4/5} ( µ2²(K) R(f′′) )^{1/5}.

Taking the infimum over hn > 0, we get

inf_{hn>0} AMISE{fn} = (5/4) { µ2²(K) R(K)⁴ R(f^{(2)}) }^{1/5} n^{−4/5}.

Notice that in (1.6.1) the optimal bandwidth depends on the unknown density being estimated, so we cannot use (1.6.1) directly to find the optimal bandwidth h_opt. Also, from (1.6.1) we can draw the following useful conclusions:

1. The optimal bandwidth converges to zero as the sample size increases, but at a very slow rate.


2. The optimal bandwidth is inversely proportional to R(f′′)^{1/5}. Since R(f′′) measures the curvature of f, this means that for a density function with little curvature the optimal bandwidth will be large. Conversely, if the density function has large curvature, the optimal bandwidth will be small.

1.7 Optimal Kernel

In this section, we investigate what effect the shape of the kernel function K has on the density estimate. Usually K is taken to be a symmetric, unimodal density function, but there are many kernel functions that satisfy these characteristics and still their performance varies. The best kernel will be known as the optimal kernel.

Epanechnikov (1969) was the first to consider this problem in the density estimation

context and to give a comparison of common kernels in asymptotic performance terms.

Consider the formula for AMISE{fn(x)} in (1.5.3). In that formula the scaling of K is intertwined with the bandwidth hn. This causes difficulty in optimizing with respect to K. If we choose a re-scaling of K of the form

K_δ(x) = (1/δ) K(x/δ),

the dependence on K and hn can be separated. To see how this can be done, we give the following lemma.

Lemma 1.7.1.

R(K_δ) = µ2²(K_δ) is satisfied iff δ = δ0 = { R(K)/µ2²(K) }^{1/5}.

Proof: See [11].

Theorem 1.7.2.

Let R(K_δ) = µ2²(K_δ), where δ = δ0 = { R(K)/µ2²(K) }^{1/5}; then

AMISE(fn(x)) = C(K_{δ0}) { (nhn)^{−1} + (1/4) hn⁴ R(f′′) }.   (1.7.1)


Proof:

First, since R(K_δ) = δ^{−1} R(K) and µ2(K_δ) = δ² µ2(K), the choice δ = δ0 gives R(K_{δ0}) = µ2²(K_{δ0}) and

R(K_{δ0}) = δ0^{−1} R(K) = { R(K)⁴ µ2²(K) }^{1/5} = C(K_{δ0}).

Note that, for the estimator based on the canonical kernel K_{δ0},

AMISE(fn(x)) = (nhn)^{−1} R(K_{δ0}) + (1/4) hn⁴ µ2²(K_{δ0}) R(f′′)
             = R(K_{δ0}) { (nhn)^{−1} + (1/4) hn⁴ R(f′′) }
             = C(K_{δ0}) { (nhn)^{−1} + (1/4) hn⁴ R(f′′) }.

Thus the result holds.

Definition 1.7.1. We say that C(K) is invariant to re-scalings of K if C(K_{δ1}) = C(K_{δ2}) for any δ1, δ2 > 0. We call K^c = K_{δ0} the canonical kernel for the class {K_δ : δ > 0} of rescaled versions of K.

Corollary 1.7.3.

C(K) is invariant to re-scaling of K.

Proof : See [11].

Canonical kernels can also simplify the optimization of the kernel shape. That is, from Equation (1.7.1), it is enough to choose the K that minimizes C(K_{δ0}), subject to

∫_{−∞}^{∞} K(x) dx = 1,  ∫_{−∞}^{∞} x K(x) dx = 0,  ∫_{−∞}^{∞} x² K(x) dx = a² < ∞,  and K(x) ≥ 0 for all x.

The solution to this problem was given as

K_a(x) = (3/4) ( 1 − x²/(5a²) ) / (5^{1/2} a) I{|x| < 5^{1/2} a},   (1.7.2)


where a is an arbitrary scale parameter.

Now, if we choose a² = 1/5, we get the simplest form of K_a(x),

K*(x) = (3/4) (1 − x²) I{|x| < 1}.   (1.7.3)

The kernel in (1.7.3) is known as the Epanechnikov kernel, since its optimality properties in density estimation were first described by Epanechnikov (1969).

Now, we will introduce the useful ratio {C(K*)/C(K)}^{5/4}.

Definition 1.7.2. The ratio {C(K*)/C(K)}^{5/4} represents the ratio of sample sizes necessary to obtain the same minimum AMISE (for a given f) when using K*(x) as when using K, and is called the efficiency of K relative to K*.

Figure 1.3: The Epanechnikov kernel K* (see [20])


kernel        {C(K*)/C(K)}^{5/4}
Epanechnikov  1.000
Biweight      0.994
Triweight     0.987
Triangular    0.986
Gaussian      0.951
Uniform       0.930

Table 1.2: Efficiency of several kernels compared to the optimal kernel K*.

From Table 1.2, if the efficiency of K is 0.98, this means that the density estimate based on the optimal kernel K* needs only 98% of the data needed when using K in order to reach the same minimum AMISE.
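As an illustrative computation (not part of the thesis), the entries of Table 1.2 can be reproduced from C(K) = {R(K)⁴ µ2²(K)}^{1/5}, since the minimum AMISE is proportional to C(K). The sketch below, assuming NumPy and using Riemann sums, computes the efficiency of the Gaussian kernel relative to K*.

```python
import numpy as np

x = np.linspace(-10, 10, 400001)
dx = x[1] - x[0]

def C(K):
    """C(K) = {R(K)^4 * mu_2(K)^2}^(1/5); the minimum AMISE is proportional to C(K)."""
    Kx = K(x)
    RK = (Kx**2).sum() * dx          # R(K) = integral of K^2
    mu2 = (x**2 * Kx).sum() * dx     # mu_2(K)
    return (RK**4 * mu2**2) ** 0.2

epanechnikov = lambda t: 0.75 * (1 - t**2) * (abs(t) <= 1)        # K*
gaussian     = lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

# efficiency of the Gaussian kernel relative to K*, about 0.951 as in Table 1.2
print((C(epanechnikov) / C(gaussian)) ** 1.25)
```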

Figure 1.4 shows the kernel density estimates of the Ethanol data based on the same

bandwidth hn = 0.2, but using different kernels. The solid curve stands for the triangular

kernel, the dashed curve for the uniform kernel, and the dotted curve for the normal

kernel.


Figure 1.4: Kernel density estimates of the Ethanol data (see [20])


Chapter 2

On the Estimation of the Mode

The mode is considered one of the measures of central tendency, the tendency of data to cluster around a particular value. The mode is defined as the most common, or most frequently occurring, value among the observations; the data may have more than one mode, or none at all, in which case it cannot be calculated.

The mode can be found by calculation or graphically, and it is not affected by irregular (extreme) values. It is important to mention the relation between the mean, the median and the mode. These are the location measures most used by investigators, because they are easy to understand. They are equal when the curve is symmetric; when the curve is positively skewed, the mean is larger than the median and both are larger than the mode, and when the curve is negatively skewed, the mode is larger than the median and both are larger than the mean.

In this chapter, we first present the kernel estimation of the mode and of the conditional mode function, and we study them in the case of i.i.d. random variables. We also study the asymptotic behavior of the estimators of the mode and of the conditional mode.

This chapter consists of four sections. In Section 2.1, we introduce the problem of estimating the mode of a probability density function and give some historical notes. We study under what conditions the estimator of the unconditional mode is asymptotically normal. In the next section, we present the simple mode estimator of [3]. Section 2.3 comprises an introduction to the study of the relationship between two variables, X and Y, where the first is called the predictor variable and the second the response variable, and we present the Nadaraya-Watson estimator as one approach to kernel regression estimation. Finally, we study the joint estimation of the conditional mode function taken at k finite distinct points.

2.1 Mode Estimation

The problem of estimating the mode of a probability density function has received con-

siderable attention in the literature. The study of nonparametric mode estimation is now

four decades old, having roots in many papers. In the last few years, an increasing interest

in this topic can be observed. Among the most recent evidence of this growing interest

are the papers by [3].

There are many fields where the knowledge of the mode is of great interest. For example,

the estimation of contours is a natural extension of the estimation of mode points. For

more details see [18] and [20].

Let X1, X2, . . . , Xn be a sequence of i.i.d. random variables with pdf f. Assume that the probability density function f(x) is uniformly continuous in x. It follows that f(x) possesses a mode θ, which is defined by

f(θ) = max_x f(x).

Assume that θ is unique.

The classical procedure to estimate the mode is as follows: if f(x) is the unknown density and θ is the mode of f, then θ is estimated by the location θn which maximizes the estimate fn of f. Suppose that fn(x) is a continuous function and tends to 0 as |x| tends to ∞. Then there is a random variable θn such that

θn = arg max_x fn(x).   (2.1.1)

We call θn the sample mode.
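As an illustration (not part of the thesis), the sample mode (2.1.1) can be approximated by maximizing the kernel estimate fn over a fine grid. The sketch assumes NumPy and a Gaussian kernel; the sample (true mode 1.5), the bandwidth and the grid are arbitrary choices.

```python
import numpy as np

def kde(grid, data, h):
    """Gaussian-kernel estimate f_n evaluated on a grid."""
    u = (grid[:, None] - data) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi), axis=1) / h

rng = np.random.default_rng(4)
sample = rng.normal(loc=1.5, size=400)                # true mode theta = 1.5
grid = np.linspace(sample.min(), sample.max(), 1000)
theta_n = grid[np.argmax(kde(grid, sample, h=0.3))]   # sample mode: arg max of f_n
print(theta_n)
```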


Lemma 2.1.1. (Bochner Lemma)

Suppose K(y) is a measurable function satisfying the following:

1. sup_{−∞<y<∞} |K(y)| < ∞,

2. ∫_{−∞}^{∞} |K(y)| dy < ∞,

3. lim_{y→∞} |y K(y)| = 0.

Let g(y) satisfy ∫_{−∞}^{∞} |g(y)| dy < ∞. Let {hn} be a sequence of positive constants satisfying the condition

4. lim_{n→∞} hn = 0.

Define

gn(x) = (1/hn) ∫_{−∞}^{∞} K(y/hn) g(x − y) dy.

Then at every point x of continuity of g(·),

lim_{n→∞} gn(x) = g(x) ∫_{−∞}^{∞} K(y) dy.   (2.1.2)

Proof:

Note first that

gn(x) − g(x) ∫_{−∞}^{∞} K(y) dy = (1/hn) ∫_{−∞}^{∞} K(y/hn) g(x − y) dy − g(x) ∫_{−∞}^{∞} K(y) dy
                              = ∫_{−∞}^{∞} { g(x − y) − g(x) } (1/hn) K(y/hn) dy.

Let δ > 0, and split the region of integration into two regions, |y| ≤ δ and |y| > δ.

Now let z = y/hn. Then y = z hn, dy = hn dz, and dz = (1/hn) dy.


Now

| gn(x) − g(x) ∫_{−∞}^{∞} K(y) dy |
  ≤ ∫_{|y|≤δ} |g(x − y) − g(x)| (1/hn) |K(y/hn)| dy
    + ∫_{|y|>δ} |g(x − y)| (1/hn) |K(y/hn)| dy
    + |g(x)| ∫_{|y|>δ} (1/hn) |K(y/hn)| dy
  ≤ max_{|y|≤δ} |g(x − y) − g(x)| ∫_{|z|≤δ/hn} |K(z)| dz
    + ∫_{|y|≥δ} ( |g(x − y)| / |y| ) | (y/hn) K(y/hn) | dy
    + |g(x)| ∫_{|z|≥δ/hn} |K(z)| dz
  ≤ max_{|y|≤δ} |g(x − y) − g(x)| ∫_{−∞}^{∞} |K(z)| dz
    + (1/δ) sup_{|z|≥δ/hn} |z K(z)| ∫_{−∞}^{∞} |g(y)| dy
    + |g(x)| ∫_{|z|≥δ/hn} |K(z)| dz   (since 1/|y| ≤ 1/δ),

which tends to 0 as n tends to ∞ and then δ tends to 0.

Lemma 2.1.2.

Consider the formula for fn(x) as in Equation (1.2.3). Then fn(x) can be written as

fn(x) = (2π)^{−1} ∫_{−∞}^{∞} e^{−iux} k(u hn) ϕn(u) du,   (2.1.3)

where

ϕn(u) = ∫_{−∞}^{∞} e^{iux} dFn(x) = n^{−1} ∑_{k=1}^{n} e^{iuXk}.

Proof: See [12].

Theorem 2.1.3.

Under the conditions of Lemma 2.1.1, if hn is a function of n satisfying lim_{n→∞} n hn² = ∞ and lim_{n→∞} E[fn(x)] = f(x), and if the probability density function f(x) is uniformly continuous, then for every ε > 0,

P[ sup_x |fn(x) − f(x)| < ε ] −→ 1 as n −→ ∞.


Proof:

To prove this theorem we want to show that

lim_{n→∞} E^{1/2}[ sup_{−∞<x<∞} |fn(x) − f(x)|² ] = 0.   (2.1.4)

Since lim_{n→∞} E[fn(x)] = f(x), it suffices to show that

E^{1/2}[ sup_{−∞<x<∞} |fn(x) − E[fn(x)]|² ] −→ 0   (2.1.5)

as n −→ ∞, since by Lemma 2.1.1 it follows that

lim_{n→∞} sup_{−∞<x<∞} |E[fn(x)] − f(x)| = 0.

Since

fn(x) = (2π)^{−1} ∫_{−∞}^{∞} e^{−iux} k(u hn) ϕn(u) du,

then

sup_{−∞<x<∞} |fn(x) − E[fn(x)]| ≤ (2π)^{−1} | ∫_{−∞}^{∞} e^{−iux} k(u hn) ϕn(u) du − ∫_{−∞}^{∞} e^{−iux} k(u hn) E[ϕn(u)] du |
  ≤ (2π)^{−1} ∫_{−∞}^{∞} | e^{−iux} k(u hn) { ϕn(u) − E[ϕn(u)] } | du
  = (2π)^{−1} ∫_{−∞}^{∞} |k(u hn)| |ϕn(u) − E[ϕn(u)]| du   (since |e^{−iux}| = 1).

Therefore, by Minkowski's inequality, the quantity in (2.1.5) is no greater than

(2π)^{−1} ∫_{−∞}^{∞} |k(u hn)| σ[ϕn(u)] du ≤ (n^{1/2} hn)^{−1} ∫_{−∞}^{∞} |k(u)| du,

which tends to 0. The proof of this theorem is complete.

Theorem 2.1.4.

Under the conditions of the last theorem, if θn are the sample modes and the population mode θ is unique, then for every ε > 0,

lim_{n→∞} P( |θn − θ| < ε ) = 1.   (2.1.6)


Proof:

Since f(x) is a uniformly continuous probability density function with a unique mode θ, it has the following property: for every ε > 0 there exists an η > 0 such that, for every point x, |θ − x| ≥ ε implies |f(θ) − f(x)| ≥ η.

If this assertion were false, then there would exist an ε > 0 and a sequence {xn} such that

|f(θ) − f(xn)| < 1/n and |θ − xn| ≥ ε.   (2.1.7)

Now (2.1.7), and the fact that f(x) −→ 0 as x −→ ±∞, imply that there exists a point θ′ ≠ θ such that f(θ′) = f(θ), which contradicts the assumption that f(x) has a unique mode θ.

From this assertion, since f is uniformly continuous, it follows that to prove θn −→ θ in probability, it is sufficient to prove that

f(θn) −→ f(θ) in probability as n −→ ∞.   (2.1.8)

Now,

|f(θn) − f(θ)| = |f(θn) − fn(θn) + fn(θn) − f(θ)|
              ≤ |f(θn) − fn(θn)| + |fn(θn) − f(θ)|
              ≤ sup_x |f(x) − fn(x)| + sup_x |fn(x) − f(x)|
              = 2 sup_x |fn(x) − f(x)|,   (2.1.9)

since

|fn(θn) − f(θ)| = | sup_x fn(x) − sup_x f(x) | ≤ sup_x |fn(x) − f(x)|.   (2.1.10)

From (2.1.9) and Theorem 2.1.3, we obtain (2.1.8).

Nadaraya (1965) proved the strongest result in this direction. He proved that under certain conditions, the sample mode θn converges to the population mode θ with probability 1.

To achieve the asymptotic normality of θn, and therefore to be able to construct an asymptotic confidence interval for θ, it is generally believed that rather heavy smoothing conditions are needed. The next theorem states conditions on the constants hn and the kernel K(u) under which the estimated mode θn is asymptotically normally distributed.

Consider a probability density function f(x) with a unique mode at θ. If f(x) has a continuous second derivative, then by the definition of the mode we have

f′(θ) = 0,  f′′(θ) < 0.   (2.1.11)

Similarly, if the estimated probability density function fn(x) is chosen to be twice differentiable (that is, the weighting function K(y) is chosen to be twice differentiable), then

fn′(θn) = 0,  fn′′(θn) < 0,   (2.1.12)

if θn is the mode of fn(x). Then by Taylor's theorem, we have

0 = fn′(θn) = fn′(θ) + (θn − θ) fn′′(θ*n)   (2.1.13)

for some random variable θ*n between θn and θ. From (2.1.13) we can write

θn − θ = −fn′(θ)/fn′′(θ*n),   (2.1.14)

if the denominator does not vanish. Using (2.1.14) as a basis, we now state conditions under which the estimated mode θn is asymptotically normally distributed.

Theorem 2.1.5.

Suppose that there exists δ, 0 < δ < 1, such that the transform k(u) has a characteristic exponent r ≥ 2 and satisfies

1. ∫_{−∞}^{∞} u^{2+δ} |k(u)| du < ∞,

and hn is a function of n satisfying

2. lim_{n→∞} n hn^{5+2δ} = 0,

3. lim_{n→∞} n hn^{6} = ∞,

and the characteristic function ϕ(u) satisfies

4. ∫_{−∞}^{∞} u^{2+δ} |ϕ(u)| du < ∞.

Then as n −→ ∞,

E[ sup_{−∞<x<∞} |fn′′(x) − f′′(x)|² ] −→ 0,   (2.1.15)

fn′′(θ*n) −→ f′′(θ) in probability,   (2.1.16)

(n hn³)^{1/2} fn′(θ) −→ N(0, f(θ)J) in distribution,   (2.1.17)

(n hn³)^{1/2} (θn − θ) −→ N(0, f(θ)J/[f′′(θ)]²) in distribution,   (2.1.18)

where we define

J = ∫_{−∞}^{∞} K′²(y) dy = (2π)^{−1} ∫_{−∞}^{∞} u² k²(u) du.   (2.1.19)

Proof:

From (2.1.3), we have fn(x) = (2π)^{−1} ∫_{−∞}^{∞} e^{−iux} k(u hn) ϕn(u) du. Then

fn′(x) = (−i/(2π)) ∫_{−∞}^{∞} u e^{−iux} k(u hn) ϕn(u) du,

and so

fn′′(x) = (i²/(2π)) ∫_{−∞}^{∞} u² e^{−iux} k(u hn) ϕn(u) du = (−1/(2π)) ∫_{−∞}^{∞} u² e^{−iux} k(u hn) ϕn(u) du.

First, we will prove (2.1.15):

|fn′′(x) − E[fn′′(x)]| = | (−1/(2π)) ∫_{−∞}^{∞} u² e^{−iux} k(u hn) ϕn(u) du + (1/(2π)) ∫_{−∞}^{∞} u² e^{−iux} k(u hn) E[ϕn(u)] du |
  = (1/(2π)) | ∫_{−∞}^{∞} u² e^{−iux} k(u hn) { ϕn(u) − E[ϕn(u)] } du |
  ≤ (2π)^{−1} ∫_{−∞}^{∞} |k(u hn)| u² |ϕn(u) − E[ϕn(u)]| du   (since |e^{−iux}| = 1).

Let u hn = v; then dv = hn du, du = (1/hn) dv and u² = v²/hn², so that

E^{1/2}[ sup_{−∞<x<∞} |fn′′(x) − E[fn′′(x)]|² ] ≤ ∫_{−∞}^{∞} |k(u hn)| u² σ[ϕn(u)] du ≤ (n^{1/2} hn³)^{−1} ∫_{−∞}^{∞} |k(v)| v² dv,

and

|E[fn′′(x)] − f′′(x)| ≤ (2π)^{−1} ∫_{−∞}^{∞} |1 − k(u hn)| u² |ϕ(u)| du.


Equation (2.1.16) follows from (2.1.15) and the fact that θ*n tends to θ, since it lies between θn and θ, and θn tends to θ.

To prove (2.1.17), let

fn′(θ) = n^{−1} ∑_{k=1}^{n} Vnk,  Vnk = (1/hn²) K′((θ − Xk)/hn);

the Vnk are independent and identically distributed as Vn = (hn²)^{−1} K′((θ − X)/hn). Now,

E|Vn|^m = ∫_{−∞}^{∞} | (1/hn²) K′((θ − y)/hn) |^m f(y) dy
        = (1/hn^{2m}) ∫_{−∞}^{∞} |K′((θ − y)/hn)|^m f(y) dy.

Let u = (θ − y)/hn, to get y = θ − u hn, and so dy = −hn du. That is,

E|Vn|^m = (hn/hn^{2m}) ∫_{−∞}^{∞} |K′(u)|^m f(θ − u hn) du
        = (1/hn^{2m−1}) ∫_{−∞}^{∞} |K′(u)|^m f(θ − u hn) du.

Then, by Lemma 2.1.1,

E|Vn|^m ≈ (1/hn^{2m−1}) f(θ) ∫_{−∞}^{∞} |K′(y)|^m dy,

hence

hn^{2m−1} E|Vn|^m −→ f(θ) ∫_{−∞}^{∞} |K′(y)|^m dy.

Using Liapounov's condition, it is sufficient to show that, for some δ > 0,

E|Vn − E[Vn]|^{2+δ} / ( n^{δ/2} σ^{2+δ}[Vn] ) −→ 0 as n −→ ∞.

Now,

(n hn³)^{1/2} E[fn′(θ)] = (n hn³)^{1/2} (−i/(2π)) ∫_{−∞}^{∞} e^{−iuθ} { k(u hn) − 1 } u ϕ(u) du −→ 0,

37

and

n h_n³ Var[f′_n(θ)] = h_n^{−1} ∫_{−∞}^{∞} [K′((θ − y)/h_n)]² f(y) dy − n h_n³ E²[f′_n(θ)] → f(θ) ∫_{−∞}^{∞} [K′(y)]² dy.

Therefore

(n h_n³)^{1/2} f′_n(θ) → N(0, f(θ)J) in distribution,

which is equivalent to

( f′_n(θ) − E[f′_n(θ)] ) / σ[f′_n(θ)] → N(0, 1) in distribution.

To prove (2.1.18), from Equation (2.1.14) we have

θ_n − θ = −f′_n(θ) / f″_n(θ*_n).

Combining this with (2.1.16) and (2.1.17), we obtain

(n h_n³)^{1/2} (θ_n − θ) = −(n h_n³)^{1/2} f′_n(θ) / f″_n(θ*_n) → N( 0, f(θ)J/[f″(θ)]² ).
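The following is a minimal numerical sketch of the mode estimator θ_n = arg max_x f_n(x) discussed above. The Gaussian kernel, the simulated Gamma sample, the evaluation grid and the bandwidth h_n = n^{−1/7} are illustrative assumptions, not choices made in the text.

```python
# A minimal sketch of the kernel mode estimator theta_n = arg max f_n(x).
# Assumptions (not from the text): Gaussian kernel, simulated Gamma data, h_n = n^(-1/7).
import numpy as np

rng = np.random.default_rng(0)

def kde(x, data, h):
    """Kernel density estimate f_n(x) with a Gaussian kernel."""
    u = (x[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

n = 2000
sample = rng.gamma(shape=4.0, scale=1.0, size=n)   # true mode = (4 - 1) * 1 = 3
h_n = n ** (-1.0 / 7.0)

grid = np.linspace(sample.min(), sample.max(), 2001)
f_hat = kde(grid, sample, h_n)
theta_n = grid[np.argmax(f_hat)]                   # theta_n = arg max_x f_n(x)

print(f"estimated mode theta_n = {theta_n:.3f}, true mode = 3.000")
```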

2.2 A Simple Estimation of the Mode

The estimator (2.1.1) is increasingly used, although it is difficult to calculate. Indeed,

in addition to the calculation of fn, it involves a numerical step for the computation of

arg max .

As noticed by [5], classical search methods of the arg max perform satisfactorily only when

fn is sufficiently regular. Thus in practice, the arg max is usually computed over a finite

grid, although it may affect the asymptotic properties of the estimator. Moreover, when

the dimension of the sample space is large, or when accurate estimation is needed, the grid size, which increases exponentially with the dimension, leads to time-consuming computations.


Finally, the search grid should be located around high density areas. In high dimension, this is a difficult task and the search grid usually includes low density areas. To solve this problem, [3] proposed a concurrent estimator of the mode, θ*_n, defined by

θ*_n = arg max_{x∈S_n} f_n(x),   (2.2.1)

where S_n = {X_1, . . . , X_n} is a finite sample of d-dimensional data.
The main advantage of using θ*_n instead of θ_n is that the former is easily computed in a finite number of operations. Moreover, since the sample points are naturally concentrated in high density areas, the set S_n can be regarded as the most natural random grid for approximating the mode.
[3] established the strong consistency of θ*_n towards θ_n and provided almost sure rates of convergence without any differentiability condition on f around the mode.
[2] examined whether maximization over a finite sample alters the rate of convergence of the estimate θ*_n compared to that of the estimate θ_n. They proved that the two estimates have the same asymptotic behavior. Another use of θ*_n is that it may be an appropriate choice for a starting value of an optimization algorithm that approximates θ_n.
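Below is a minimal sketch of the sample-point mode estimator θ*_n from (2.2.1): the arg max of f_n is taken over the sample points S_n themselves rather than over a deterministic grid. The Gaussian product kernel, the simulated two-dimensional data and the bandwidth are illustrative assumptions only.

```python
# A minimal sketch of theta*_n = arg max over the sample S_n of f_n(x), as in (2.2.1).
# Assumptions: Gaussian product kernel, simulated 2-d data, illustrative bandwidth.
import numpy as np

rng = np.random.default_rng(1)

def kde_at_points(points, data, h):
    """Evaluate the d-dimensional product-Gaussian KDE at each row of `points`."""
    n, d = data.shape
    u = (points[:, None, :] - data[None, :, :]) / h           # (m, n, d)
    k = np.exp(-0.5 * (u**2).sum(axis=2))                     # product kernel
    return k.sum(axis=1) / (n * h**d * (2 * np.pi) ** (d / 2))

n, d = 1000, 2
data = rng.normal(loc=[1.0, -2.0], scale=0.7, size=(n, d))    # true mode near (1, -2)
h_n = n ** (-1.0 / (d + 4))

f_at_sample = kde_at_points(data, data, h_n)
theta_star = data[np.argmax(f_at_sample)]                     # maximize only over S_n

print("theta*_n =", np.round(theta_star, 3))
```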

2.3 Nonparametric Regression Estimation

Kernel smoothing provides a simple way of finding structure in data sets without the imposition of a parametric model. One of the most fundamental settings where kernel smoothing ideas can be applied is the simple regression problem. In this case, paired observations on two variables are available and one is interested in determining an appropriate functional relationship between them. One of the variables, usually denoted by X, is called the predictor variable, and the other, usually denoted by Y, is called the response variable.


A well-known result from elementary statistics is that the function m minimizing E[Y − m(X)]² is the conditional expectation (mean) function of Y given X, that is,

m(X) = E(Y |X).

This function is usually called the regression function of Y on X. There are now several

approaches to the nonparametric regression problem. Some of the more popular are those

based on kernel functions, spline functions and wavelets. For more details see [22] and [23].

Each of these approaches has its particular strengths and weaknesses, although kernel

regression estimators have the advantage of mathematical and intuitive simplicity. One of the best known kernel estimators is the Nadaraya-Watson estimator.

The Nadaraya-Watson Estimator

Let (X_i, Y_i) be R×R-valued independent random variables with a common probability density function f. Also assume that X admits a marginal density g(x). Suppose that we are given n observations of (X, Y), denoted by (X_1, Y_1), . . . , (X_n, Y_n).
First, we consider the following estimator of the joint density f(x, y) of (X, Y):

f_n(x, y) = (1/n) Σ_{i=1}^n K_{h_n}(x − X_i) K_{h_n}(y − Y_i),

and define the estimator of the marginal pdf of X as

g_n(x) = (1/n) Σ_{i=1}^n K_{h_n}(x − X_i),

where K_{h_n}(x) = K(x/h_n)/h_n.

The Nadaraya-Watson estimator of the conditional density function f(y|x) is given by

f_n(y|x) = f_n(x, y) / g_n(x) = [ Σ_{i=1}^n K_{h_n}(x − X_i) K_{h_n}(y − Y_i) ] / [ Σ_{i=1}^n K_{h_n}(x − X_i) ].

Now, to estimate m(·), we first compute an estimator of the joint density f(x, y) of (X, Y) and then integrate it according to the formula

m(x) = ∫_{−∞}^{∞} y f(x, y) dy / ∫_{−∞}^{∞} f(x, y) dy.   (2.3.1)

Lemma 2.3.1.
Under the formulas of f_n(x, y) and f(x, y), we have

(1) ∫_{−∞}^{∞} f_n(x, y) dy = (1/n) Σ_{i=1}^n K_{h_n}(x − X_i).

(2) ∫_{−∞}^{∞} y f_n(x, y) dy = (1/n) Σ_{i=1}^n K_{h_n}(x − X_i) Y_i.

Proof:
Since ∫_{−∞}^{∞} K(u) du = 1, we have that

(1) ∫_{−∞}^{∞} f_n(x, y) dy = ∫_{−∞}^{∞} (1/n) Σ_{i=1}^n K_{h_n}(x − X_i) K_{h_n}(y − Y_i) dy
= (1/n) Σ_{i=1}^n K_{h_n}(x − X_i) ∫_{−∞}^{∞} K_{h_n}(y − Y_i) dy
= (1/n) Σ_{i=1}^n K_{h_n}(x − X_i).

(2) ∫_{−∞}^{∞} y f_n(x, y) dy = ∫_{−∞}^{∞} (y/n) Σ_{i=1}^n K_{h_n}(x − X_i) K_{h_n}(y − Y_i) dy
= (1/n) Σ_{i=1}^n K_{h_n}(x − X_i) ∫_{−∞}^{∞} y K_{h_n}(y − Y_i) dy
= (1/n) Σ_{i=1}^n K_{h_n}(x − X_i) Y_i.

If we substitute these into the numerator and denominator of (2.3.1), we obtain the Nadaraya-Watson kernel estimator of m(·),

m_n(x) = Σ_{i=1}^n K_{h_n}(x − X_i) Y_i / Σ_{i=1}^n K_{h_n}(x − X_i) = Σ_{i=1}^n W_{ni}(x) Y_i,

where

W_{ni}(x) = K_{h_n}(x − X_i) / Σ_{j=1}^n K_{h_n}(x − X_j),   i = 1, . . . , n,

are the weight functions.
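A minimal sketch of the Nadaraya-Watson estimator m_n(x) = Σ_i W_ni(x) Y_i follows. The Gaussian kernel, the simulated model Y = sin(2X) + noise and the bandwidth are illustrative assumptions, not choices taken from the text.

```python
# A minimal sketch of the Nadaraya-Watson estimator m_n(x) = sum_i W_ni(x) Y_i.
# Assumptions: Gaussian kernel, simulated model, illustrative bandwidth.
import numpy as np

rng = np.random.default_rng(2)

def nadaraya_watson(x, X, Y, h):
    """m_n(x) at each point of `x`, with weights K_h(x - X_i) / sum_j K_h(x - X_j)."""
    u = (x[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2)                 # unnormalized Gaussian kernel (constants cancel)
    W = K / K.sum(axis=1, keepdims=True)    # weight functions W_ni(x)
    return W @ Y

n = 500
X = rng.uniform(0.0, 3.0, size=n)
Y = np.sin(2.0 * X) + rng.normal(scale=0.3, size=n)

x_grid = np.linspace(0.1, 2.9, 15)
m_hat = nadaraya_watson(x_grid, X, Y, h=0.15)
print(np.round(np.c_[x_grid, m_hat, np.sin(2.0 * x_grid)], 2))
```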

The bandwidth hn determines the degree of smoothness of mn(·). This can be imme-

diately seen by considering the limits for hn tending to zero or to infinity respectively.

Corollary 2.3.2.
(a) If h_n → 0, then at an observation X_i,

m_n(X_i) → K_{h_n}(0) Y_i / K_{h_n}(0) = Y_i,

indicating that small bandwidths reproduce the data.
(b) If h_n → ∞, then

m_n(X_i) → Σ_{i=1}^n K_{h_n}(0) Y_i / Σ_{i=1}^n K_{h_n}(0) = (1/n) Σ_{i=1}^n Y_i = Ȳ.

Proof: See [20].
That is, in (a), if h_n → 0, then m_n reproduces the data at each observation point, while in (b), if h_n → ∞, then m_n(x) tends to the sample mean. This suggests that a large bandwidth leads to an oversmoothed curve, the sample mean. In general, the bandwidth h_n acts as follows. If h_n is very small, then the weights focus on a few observations that are in the neighborhood of each X_i. If h_n is very large, then the weights spread over a larger neighborhood around each X_i.

Consequently, the choice of hn plays an important role in kernel regression. These two

limit considerations make it clear that the smoothing parameter hn, in relation to the

sample size n, should not converge to zero too rapidly nor too slowly.

2.4 Joint Asymptotic Distribution of the Estimated

Conditional Mode

In nonparametric estimation of the regression relationship, most investigations are concerned with the regression function m(x), the conditional mean of Y given the value x of a predictor X. However, new insights about the underlying structure can be gained by considering other aspects of the conditional distribution f(y|x) of Y given X = x. One of these aspects is the conditional mode function, which is the topic of this section.

Assume that (X_1, Y_1), (X_2, Y_2), . . . , (X_n, Y_n) are i.i.d. random variables with joint probability density function f(x, y). The marginal probability density function of X_1 is g(x) = ∫_{−∞}^{∞} f(x, y) dy, and the conditional probability density function of Y_1 given X_1 = x is given by f(y|x) = f(x, y)/g(x). We assume that for each x, f(x, y) is uniformly continuous in y, and it follows that f(·|x) possesses a mode θ(x) defined by

θ(x) = arg max_{−∞<y<∞} f(y|x).

We call θ(x) the population conditional mode or the mode function, and we assume that θ(x) is unique.


Let K be a measurable function and h_n a sequence of positive numbers converging to zero. We consider the Nadaraya-Watson estimator f_n(y|x) of the conditional density f(y|x). If K is chosen such that K(u) tends to zero as u tends to ±∞, then for every sample sequence and for each x, f_n(y|x) is a continuous function of y that tends to zero as y tends to ±∞. Consequently, there is a random variable θ_n(x) such that

θ_n(x) = arg max_{−∞<y<∞} f_n(y|x).

We call θ_n(x) the sample conditional mode. [18] considered θ_n(x) as an estimator of θ(x) and established conditions under which the estimator is strongly consistent and asymptotically normally distributed. They proved that (n h_n^4)^{1/2} (θ_n(x) − θ(x)) is asymptotically normally distributed with mean zero and variance

f(x, θ(x)) / [f^{(0,2)}(x, θ(x))]² ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u)[K^{(1)}(v)]² du dv;

equivalently, θ_n(x) − θ(x) has approximate variance equal to this quantity divided by n h_n^4. Here K^{(1)}(v) denotes the first derivative of K(v), and f^{(0,2)}(x, θ(x)) is defined in the following assumptions.

In this section, we will discuss this result for the multivariate case. For distinct points x_1, x_2, . . . , x_k we will establish conditions under which (n h_n^4)^{1/2} (θ_n(x_1) − θ(x_1), . . . , θ_n(x_k) − θ(x_k))^T, where T denotes the transpose, is asymptotically multivariate normal with mean zero vector and diagonal covariance matrix B = [b_{ij}] with diagonal entries

b_{ii} = f(x_i, θ(x_i)) / [f^{(0,2)}(x_i, θ(x_i))]² ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u)[K^{(1)}(v)]² du dv.

We consider the following assumptions from [18]:

(A1) (X_1, Y_1), . . . , (X_n, Y_n) is a sample of i.i.d. random variables with joint probability density function f(x, y), where the following hold:
(i) g(x), the marginal probability density function of X, is uniformly continuous.
(ii) f^{(i,j)}(x, y) = ∂^{i+j} f(x, y) / ∂x^i ∂y^j exists and is bounded for 1 ≤ i + j ≤ 3.

(A2) The kernel K is a Borel function and satisfies the following:
(i) K(u) tends to zero as u tends to ±∞.
(ii) K(u) and its first two derivatives are functions of bounded variation.
(iii) lim_{|u|→∞} |u² K^{(i)}(u)| = 0, (i = 0, 1).
(iv) ∫_{−∞}^{∞} u^i K(u) du = 1 for i = 0, and = 0 for i = 1, 2.
(v) ∫_{−∞}^{∞} |u|³ K(u) du < ∞.

(A3) h_n is a sequence of positive numbers tending to zero and satisfying h_n = n^{−δ} with 1/10 < δ < 1/8; i.e. lim_{n→∞} n h_n^8 = ∞ and lim_{n→∞} n h_n^{10} = 0.

To prove our result we will use the following preliminary lemmas from [18] and [20].

Lemma 2.4.1. (Bochner Lemma)
Suppose K_1(u) and K_2(u) are real-valued Borel measurable functions satisfying the following conditions:

1. sup_{u∈R} |K_i(u)| < ∞, (i = 1, 2).
2. ∫_{−∞}^{∞} |K_i(u)| du < ∞, (i = 1, 2).
3. lim_{|u|→∞} |u² K_i(u)| = 0, (i = 1, 2).

If (x, y) ∈ C(f), the set of continuity points of f, then for any η ≥ 0,

lim_{n→∞} h_n^{−2} ∫_{−∞}^{∞} ∫_{−∞}^{∞} |K_1(u/h_n) K_2(v/h_n)|^{1+η} f(x − u, y − v) du dv = f(x, y) ∫_{−∞}^{∞} ∫_{−∞}^{∞} |K_1(u) K_2(v)|^{1+η} du dv.

Define

f_n^{(0,j)}(x, y) = ∂^j f_n(x, y) / ∂y^j = (n h_n^{j+2})^{−1} Σ_{i=1}^n K((x − X_i)/h_n) K^{(j)}((y − Y_i)/h_n),

where K^{(j)} denotes the jth derivative of K, (j = 1, 2), and

W_{ni} = h_n^{−3} K((x − X_i)/h_n) K^{(1)}((y − Y_i)/h_n),   (i = 1, 2, . . . , n).

Lemma 2.4.2.
Under the assumptions (A1)(ii), (A2) and (A3), if (x, y) ∈ C(f), then the following are true:

(i) lim_{n→∞} n h_n^4 Var[f_n^{(0,1)}(x, y)] = f(x, y) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ( K(u) K^{(1)}(v) )² du dv.

(ii) (n h_n^4)^{1/2} { E f_n^{(0,1)}(x, y) − f^{(0,1)}(x, y) } = o(1).

Lemma 2.4.3.
Under the assumptions of the above Lemma, the following is true:

lim_{n→∞} (n^{−1} h_n^4)^{1+δ/2} Σ_{i=1}^n E|W_{ni} − E W_{ni}|^{2+δ} = 0.

For fixed x, expanding f_n^{(0,1)}(x, θ_n(x)) around θ(x), we obtain

0 = f_n^{(0,1)}(x, θ_n(x)) = f_n^{(0,1)}(x, θ(x)) + (θ_n(x) − θ(x)) f_n^{(0,2)}(x, θ*_n(x)),

where

|θ*_n(x) − θ(x)| < |θ_n(x) − θ(x)|.

Hence,

θ_n(x) − θ(x) = − f_n^{(0,1)}(x, θ(x)) / f_n^{(0,2)}(x, θ*_n(x)).   (2.4.1)

Lemma 2.4.4.
Under the assumptions (A1), (A2)(ii), (iii) and (A3), if g(x) > 0, then f_n^{(0,2)}(x, θ*_n(x)) converges in probability to f^{(0,2)}(x, θ(x)) as n tends to infinity.

Now we prove an intermediate result in the next theorem.

Theorem 2.4.5.
Suppose that x_1, x_2, . . . , x_k are distinct points, where f(x_i, y) > 0 and (x_i, y) ∈ C(f), (i = 1, 2, . . . , k). Then, under the assumptions (A1), (A2)(ii),(iii), and (A3), the distribution of the vector

(n h_n^4)^{1/2} { f_n^{(0,1)}(x_1, y) − f^{(0,1)}(x_1, y), . . . , f_n^{(0,1)}(x_k, y) − f^{(0,1)}(x_k, y) }^T,

where T denotes the transpose, is asymptotically multivariate normal with mean zero vector and diagonal covariance matrix Γ = [γ_{ij}], with diagonal entries

γ_{ii} = f(x_i, y) ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u)[K^{(1)}(v)]² du dv, (i = 1, 2, . . . , k).

Proof:
Without loss of generality, we consider the special case k = 2; the same arguments apply in the general case. Before we start the proof of the theorem, we introduce some notation. For i = 1, 2, . . . , n and s = 1, 2, we define the following:

V_{ni}(x_s) = h_n^{−3} K((x_s − X_i)/h_n) K^{(1)}((y − Y_i)/h_n),
W_{ni}(x_s) = h_n² ( V_{ni}(x_s) − E V_{ni}(x_s) ),
W_n(x_s) = Σ_{i=1}^n W_{ni}(x_s),
Z_{ni} = ( W_{ni}(x_1), W_{ni}(x_2) )^T,
Z_n = n^{−1/2} ( W_n(x_1), W_n(x_2) )^T.

Note that

Z_n = (n h_n^4)^{1/2} { f_n^{(0,1)}(x_1, y) − E f_n^{(0,1)}(x_1, y), f_n^{(0,1)}(x_2, y) − E f_n^{(0,1)}(x_2, y) }^T.   (2.4.2)

Let A = [a_{rs}] be a 2 × 2 diagonal matrix with diagonal entries

a_{ss} = f(x_s, y) ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u)[K^{(1)}(v)]² du dv,

and let Z be the bivariate normal random vector with mean vector zero and covariance matrix A.
First, we will show that Z_n converges in distribution to Z. To do that, we will use the Cramér-Wold theorem. It will be sufficient to prove that C Z_n^T converges in distribution to C Z^T for any constant vector C = (c_1, c_2) ∈ R², C ≠ 0. Note that

C Z_n^T = Σ_{i=1}^n n^{−1/2} C Z_{ni},   E( n^{−1/2} C Z_{ni} ) = 0.

Let ρ_{ni}^{2+δ} = E| n^{−1/2} C Z_{ni} |^{2+δ}, ρ_n^{2+δ} = Σ_{i=1}^n ρ_{ni}^{2+δ}, and σ_n² = Var( C Z_n^T ).
Using Liapounov's theorem, it will be sufficient to show that

lim_{n→∞} ρ_n^{2+δ} / σ_n^{2+δ} = 0.   (2.4.3)

Now, the proof of Theorem 2.4.5 will be given via the following lemmas.

Lemma 2.4.6.
Under conditions (A2)(ii),(iii),(iv), if (x_s, y) ∈ C(f), then for (s = 1, 2), (r = 1, 2), the following are true:

a. lim_{n→∞} E W²_{ni}(x_s) = f(x_s, y) ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u)[K^{(1)}(v)]² du dv.

b. lim_{n→∞} E W_{ni}(x_s) W_{ni}(x_r) = 0, (r ≠ s).

Proof:
(a) By the definition of W_{ni}(x_s),

E W²_{ni}(x_s) = h_n^4 ( E V²_{ni}(x_s) − (E V_{ni}(x_s))² ),   (2.4.4)


where

h_n^4 E V²_{ni}(x_s) = h_n^4 h_n^{−6} ∫_{−∞}^{∞} ∫_{−∞}^{∞} [ K((x_s − u)/h_n) K^{(1)}((y − v)/h_n) ]² f(u, v) du dv
= h_n^{−2} ∫_{−∞}^{∞} ∫_{−∞}^{∞} [ K(u/h_n) K^{(1)}(v/h_n) ]² f(x_s − u, y − v) du dv.

Now, by an application of the Bochner Lemma, we obtain that

lim_{n→∞} h_n^4 E V²_{ni}(x_s) = f(x_s, y) ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u)[K^{(1)}(v)]² du dv.   (2.4.5)

Next,

h_n^4 ( E V_{ni}(x_s) )² = h_n² ( h_n E V_{ni}(x_s) )² = h_n² ( h_n^{−2} ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u/h_n) K^{(1)}(v/h_n) f(x_s − u, y − v) du dv )².

By another application of the Bochner Lemma, we obtain that

lim_{n→∞} h_n^4 ( E V_{ni}(x_s) )² = 0.   (2.4.6)

By a combination of (2.4.4), (2.4.5), and (2.4.6), (a) holds.

(b) From the definition of W_{ni}(x), we have

E( W_{ni}(x_1) W_{ni}(x_2) ) = h_n^4 ( E V_{ni}(x_1) V_{ni}(x_2) − E V_{ni}(x_1) E V_{ni}(x_2) ).   (2.4.7)

Suppose that x_2 > x_1, let δ = x_2 − x_1 and δ_n = δ/h_n. Then

h_n^4 E V_{ni}(x_1) V_{ni}(x_2) = h_n^{−2} ∫_{−∞}^{∞} ∫_{−∞}^{∞} K((x_1 − u)/h_n) K((x_2 − u)/h_n) [K^{(1)}((y − v)/h_n)]² f(u, v) du dv
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u) K(δ_n + u) [K^{(1)}(v)]² f(x_1 − h_n u, y − h_n v) du dv
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u) K(δ_n + u) [K^{(1)}(v)]² g(x_1 − h_n u) f(y − h_n v | x_1 − h_n u) du dv.   (2.4.8)

Next,

∫_{−∞}^{∞} K(u) K(δ_n + u) g(x_1 − h_n u) du
= ∫_{|u|<δ_n/2} K(u) K(δ_n + u) g(x_1 − h_n u) du + ∫_{|u|≥δ_n/2} K(u) K(δ_n + u) g(x_1 − h_n u) du
≤ sup_{|u|<δ_n/2} K(δ_n + u) ∫_{−∞}^{∞} K(z) g(x_1 − h_n z) dz + sup_{|u|≥δ_n/2} K(u) ∫_{−∞}^{∞} K(δ_n + z) g(x_1 − h_n z) dz
≤ sup_{|u|≥δ_n/2} K(u) · O(1) + sup_{|u|≥δ_n/2} K(u) · O(1)
= 2 sup_{|u|≥δ_n/2} K(u) · O(1)
≤ (4/δ_n) sup_{|u|≥δ_n/2} |u K(u)| · O(1)
= (4 h_n/δ) sup_{|u|≥δ_n/2} |u K(u)| · O(1) = O(h_n).   (2.4.9)

Finally, from (2.4.8) and (2.4.9), we have that

lim_{n→∞} h_n^4 E V_{ni}(x_1) V_{ni}(x_2) = 0.   (2.4.10)

Next,

h_n^4 E V_{ni}(x_1) E V_{ni}(x_2) = h_n² [ h_n^{−2} ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u/h_n) K^{(1)}(v/h_n) f(x_1 − u, y − v) du dv ]
× [ h_n^{−2} ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(w/h_n) K^{(1)}(v/h_n) f(x_2 − w, y − v) dw dv ] → 0   (2.4.11)

by an application of the Bochner Lemma. The proof of the Lemma is completed by a combination of (2.4.7), (2.4.10), and (2.4.11).

Lemma 2.4.7.
Under the conditions of the last Lemma, we have that

lim_{n→∞} σ_n² = C A C^T.

Proof:
Since σ_n² = Var(C Z_n^T), by the definition of Z_n we have

σ_n² = Var( n^{−1/2} c_1 W_n(x_1) + n^{−1/2} c_2 W_n(x_2) )
= n^{−1} c_1² Var(W_n(x_1)) + n^{−1} c_2² Var(W_n(x_2)) + 2 n^{−1} c_1 c_2 Cov( W_n(x_1), W_n(x_2) )
= n^{−1} c_1² Σ_{i=1}^n Var(W_{ni}(x_1)) + n^{−1} c_2² Σ_{i=1}^n Var(W_{ni}(x_2)) + 2 n^{−1} c_1 c_2 Cov( Σ_{i=1}^n W_{ni}(x_1), Σ_{i=1}^n W_{ni}(x_2) )
= c_1² Var(W_{n1}(x_1)) + c_2² Var(W_{n1}(x_2)) + 2 n^{−1} c_1 c_2 E( Σ_{i=1}^n Σ_{j=1}^n W_{ni}(x_1) W_{nj}(x_2) ).

Since C A C^T is the quadratic form associated with the positive definite matrix A (that is, C A C^T > 0), an application of Lemma (2.4.6) implies that

lim_{n→∞} σ_n² = ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u)[K^{(1)}(v)]² du dv [ c_1² f(x_1, y) + c_2² f(x_2, y) ] = C A C^T > 0.

Now,

ρ_{ni}^{2+δ} ≤ n^{−(1+δ/2)} |C|^{2+δ} E|Z_{ni}|^{2+δ}
= n^{−(1+δ/2)} |C|^{2+δ} E|( W_{ni}(x_1), W_{ni}(x_2) )|^{2+δ}
≤ n^{−(1+δ/2)} |C|^{2+δ} 2^{2+δ} max{ E|W_{ni}(x_1)|^{2+δ}, E|W_{ni}(x_2)|^{2+δ} }.

Assume that E|W_{ni}(x_1)|^{2+δ} > E|W_{ni}(x_2)|^{2+δ}. Then we have

ρ_{ni}^{2+δ} ≤ n^{−(1+δ/2)} |C|^{2+δ} 2^{2+δ} E|W_{ni}(x_1)|^{2+δ}
= n^{−(1+δ/2)} |C|^{2+δ} 2^{2+δ} E| h_n² ( V_{ni}(x_1) − E V_{ni}(x_1) ) |^{2+δ}
= |C|^{2+δ} 2^{2+δ} ( n^{−1} h_n^4 )^{1+δ/2} E| V_{ni}(x_1) − E V_{ni}(x_1) |^{2+δ}.

This implies that

ρ_n^{2+δ} = Σ_{i=1}^n ρ_{ni}^{2+δ} ≤ |C|^{2+δ} 2^{2+δ} ( n^{−1} h_n^4 )^{1+δ/2} Σ_{i=1}^n E| V_{ni}(x_1) − E V_{ni}(x_1) |^{2+δ},

which converges to zero as n tends to infinity by an application of Lemma (2.4.3).
Hence the Liapounov condition, lim_{n→∞} ρ_n^{2+δ} / σ_n^{2+δ} = 0, is satisfied, so C Z_n^T is asymptotically normally distributed with mean zero and variance C A C^T.
By the Cramér-Wold theorem, Z_n converges in distribution to Z. Now an application of Lemma (2.4.2)(ii) to equation (2.4.2) completes the proof of Theorem 2.4.5.

We are now in a position to prove our main theorem.

Theorem 2.4.8.
Suppose that x_1, x_2, . . . , x_k are distinct points, where f(x_i, y) > 0 and (x_i, y) ∈ C(f), (i = 1, 2, . . . , k). Then, under the assumptions (A1)-(A3), the distribution of the vector

(n h_n^4)^{1/2} ( θ_n(x_1) − θ(x_1), . . . , θ_n(x_k) − θ(x_k) )^T,

where T denotes the transpose, is asymptotically multivariate normal with mean vector zero and diagonal covariance matrix B = [b_{ij}], with diagonal entries

b_{ii} = f(x_i, θ(x_i)) / [f^{(0,2)}(x_i, θ(x_i))]² ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(u)[K^{(1)}(v)]² du dv.

Proof:
From (2.4.1),

(n h_n^4)^{1/2} ( θ_n(x_1) − θ(x_1), . . . , θ_n(x_k) − θ(x_k) )^T
= −(n h_n^4)^{1/2} ( f_n^{(0,1)}(x_1, θ(x_1)) / f_n^{(0,2)}(x_1, θ*_n(x_1)), . . . , f_n^{(0,1)}(x_k, θ(x_k)) / f_n^{(0,2)}(x_k, θ*_n(x_k)) )^T,

where

|θ*_n(x_i) − θ(x_i)| < |θ_n(x_i) − θ(x_i)|, (i = 1, 2, . . . , k).

An application of Theorem (2.4.5) and Lemma (2.4.4) completes the proof.

Chapter 3

Quantiles Regression

The term quantile is synonymous with percentile; the median is the best example of

a quantile. We know that the sample median can be defined as the middle value (or

the value half-way between the two middle values) of a set of ranked data, i.e. the

sample median splits the data into two parts with an equal number of data points in

each. Usually, the sample median is taken as an estimator of the population median m,

a quantity which splits the distribution into two halves in the sense that, if the random

variable Y can be measured on the population, then P(Y ≤ m) = P(Y ≥ m) = 1/2. In particular, for a continuous random variable, m is a solution of the equation F(m) = 1/2, where F(y) = P(Y ≤ y) is the cumulative distribution function.
As an example of the use of the median, consider the distribution of salaries. This is typically skewed to the right, since relatively few people earn large salaries. As a consequence, the sample median provides a better summary of typical salaries than the mean.
More generally, the 25% and 75% sample quantiles can be defined as the values that split the data into proportions of one quarter and three quarters, and vice versa. Correspondingly, in the continuous case, the population lower and upper quartiles are the solutions of the equations F(y) = 1/4 and F(y) = 3/4, respectively. Generally, for a proportion α (0 < α < 1), and in the continuous case, the 100α% quantile (equivalently, the 100α-th percentile) of F is the value y which solves F(y) = α. Note that we assume that this value


is unique.

A further generalization of the concept to the conditional quantile emerges when we want to study the relationship between a response variable Y and some covariates X. To explore this relationship, we use regression analysis to quantify it, and the conditional distribution function F(y|X = x) plays the central role of inference in this problem.
In parametric and nonparametric estimation of the conditional distribution function, most investigation of the underlying structures is concerned with the conditional mean function m(x) = E(Y |X = x), the conditional mean of Y given the value x of X. New insight about the underlying structure can be gained by considering other aspects of the conditional distribution function F(y|X).
Estimation of the conditional quantiles has gained particular attention during the last three decades because of its useful applications in various fields such as econometrics, finance, environmental sciences and medicine. For more details see [9].
This chapter consists of three sections. In the first section we discuss the asymptotic normality of the conditional quantiles, while in the next section we discuss their joint asymptotic normality. Finally, in Section 3.3 we give a comparison between the mode and the median.

3.1 Nonparametric estimation of conditional quantiles

In this section, we introduce the definition of the conditional α-quantiles and discuss the asymptotic normality of their kernel estimators.
Let (X, Y) be a bivariate random variable and F(y|X = x) = P(Y ≤ y | X = x) the conditional distribution of Y given X = x.

Definition 3.1.1. The conditional α-quantile q_α(x) is defined as follows:

q_α(x) = inf{ y ∈ R : F(y|x) ≥ α },   0 < α < 1, x ∈ R.


The quantiles give more complete information about the distribution of Y as a function

of the predictor variable X than the conditional mean alone.

[1] discussed the following two kernel estimators of the cdf F(y|x) and the α-quantile q_α(x), respectively:

F_n(y|x) = Σ_{i=1}^n I_{{Y_i ≤ y}} K((x − X_i)/h_n) / Σ_{i=1}^n K((x − X_i)/h_n),   (3.1.1)

q_{n,α}(x) = inf{ y ∈ R : F_n(y|x) ≥ α },   0 < α < 1.   (3.1.2)
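A minimal sketch of the estimators (3.1.1) and (3.1.2) follows: the weighted indicator sum gives F_n(y|x), and the quantile estimator is the smallest y at which this estimated cdf reaches α. The Gaussian kernel weights, the simulated model and the bandwidth are illustrative assumptions only.

```python
# A minimal sketch of F_n(y|x) from (3.1.1) and q_{n,alpha}(x) from (3.1.2).
# Assumptions: Gaussian kernel weights, simulated model Y = 2X + noise, illustrative bandwidth.
import numpy as np

rng = np.random.default_rng(4)

def cond_cdf(y, x0, X, Y, h):
    """F_n(y|x0) = sum_i 1{Y_i <= y} K((x0-X_i)/h) / sum_i K((x0-X_i)/h)."""
    w = np.exp(-0.5 * ((x0 - X) / h) ** 2)
    return np.sum(w * (Y <= y)) / np.sum(w)

def cond_quantile(alpha, x0, X, Y, h):
    """q_{n,alpha}(x0) = inf{y : F_n(y|x0) >= alpha}, searched over the sorted Y_i."""
    for y in np.sort(Y):
        if cond_cdf(y, x0, X, Y, h) >= alpha:
            return y
    return np.max(Y)

n = 1000
X = rng.uniform(0.0, 1.0, size=n)
Y = 2.0 * X + rng.normal(scale=0.5, size=n)       # true median of Y given X = x is 2x

x0, h = 0.5, 0.08
for alpha in (0.25, 0.5, 0.75):
    print(f"alpha = {alpha}: q_n = {cond_quantile(alpha, x0, X, Y, h):.3f}")
```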

Now, we will consider some properties of E[F_n(y|x)] and Var[F_n(y|x)] to give more information about the mean squared error.

Lemma 3.1.1.
Let the Y_i be independent random variables. The expectation of the estimator F_n(y|x) is given by

E[F_n(y|x)] = Σ_{i=1}^n K((x − X_i)/h_n) F(y|X_i) / Σ_{i=1}^n K((x − X_i)/h_n).

Proof:
From the definition of the expectation and Equation (3.1.1) (the expectation being taken conditionally on X_1, . . . , X_n), we have

E[F_n(y|x)] = E[ Σ_{i=1}^n I_{{Y_i ≤ y}} K((x − X_i)/h_n) / Σ_{i=1}^n K((x − X_i)/h_n) ]
= E[ Σ_{i=1}^n I_{{Y_i ≤ y}} K((x − X_i)/h_n) ] / Σ_{i=1}^n K((x − X_i)/h_n)
= Σ_{i=1}^n K((x − X_i)/h_n) ∫_{−∞}^{∞} I_{{t ≤ y}} f(t|X_i) dt / Σ_{i=1}^n K((x − X_i)/h_n)
= Σ_{i=1}^n K((x − X_i)/h_n) ∫_{−∞}^{y} f(t|X_i) dt / Σ_{i=1}^n K((x − X_i)/h_n)
= Σ_{i=1}^n K((x − X_i)/h_n) F(y|X_i) / Σ_{i=1}^n K((x − X_i)/h_n).   (3.1.3)

Thus the proof of this lemma is completed.

Now, the next lemma gives the variance of F_n(y|x).

Lemma 3.1.2.
Let the Y_i be independent random variables. The variance of the estimator F_n(y|x) is given by

Var[F_n(y|x)] = Σ_{i=1}^n K²((x − X_i)/h_n) [ F(y|X_i) − F²(y|X_i) ] / [ Σ_{i=1}^n K((x − X_i)/h_n) ]².

Proof:
From the definition of the variance and Equation (3.1.1), we have (again conditionally on X_1, . . . , X_n)

Var[F_n(y|x)] = E[F_n²(y|x)] − ( E[F_n(y|x)] )²,

where

E[F_n²(y|x)] = E[ Σ_{i=1}^n I²_{{Y_i ≤ y}} K²((x − X_i)/h_n) + Σ_{i ≠ j} I_{{Y_i ≤ y}} I_{{Y_j ≤ y}} K((x − X_i)/h_n) K((x − X_j)/h_n) ] / [ Σ_{i=1}^n K((x − X_i)/h_n) ]².

By the independence of the Y_i, E[ I_{{Y_i ≤ y}} I_{{Y_j ≤ y}} ] = F(y|X_i) F(y|X_j) for i ≠ j, so the cross terms cancel against the corresponding terms of ( E[F_n(y|x)] )². Moreover, E[ I²_{{Y_i ≤ y}} ] = F(y|X_i), so that

Var[F_n(y|x)] = Σ_{i=1}^n K²((x − X_i)/h_n) F(y|X_i) / [ Σ_{i=1}^n K((x − X_i)/h_n) ]² − Σ_{i=1}^n K²((x − X_i)/h_n) F²(y|X_i) / [ Σ_{i=1}^n K((x − X_i)/h_n) ]²
= Σ_{i=1}^n K²((x − X_i)/h_n) [ F(y|X_i) − F²(y|X_i) ] / [ Σ_{i=1}^n K((x − X_i)/h_n) ]².   (3.1.4)

For further results, we need assumptions on the kernel function, the bandwidth and the conditional distribution function. These assumptions will be used throughout this chapter.

(A1) h_n is a sequence of positive numbers satisfying the following:
(i) h_n → 0, as n → ∞;
(ii) n h_n → ∞, as n → ∞.

(A2) The kernel K is a Borel function and satisfies the following:
(i) K has a compact support;
(ii) K is symmetric;
(iii) K is Lipschitz-continuous;
(iv) ∫ K(u) du = 1;
(v) K is bounded.

(A3) For a fixed y ∈ R there exists F″(y|x) = ∂²F(y|x)/∂x² in a neighborhood of x.

We assume that (A2)(i, ii) and (A3) are satisfied. Let U_i = (x − X_i)/h_n and x ∈ (h_n, 1 − h_n); then from Lemma (1.5.1) it follows that

E[F_n(y|x)] − F(y|x) = (1/2) h_n² μ_2(K) F″(y|x) + o(h_n²),

where

μ_2(K) = ∫ u² K(u) du = Σ_i U_i² K(U_i) / Σ_i K(U_i).

Then

E[F_n(y|x)] = F(y|x) + (h_n²/2) [ Σ_i U_i² K(U_i) / Σ_i K(U_i) ] F″(y|x) + o(h_n²).   (3.1.5)

Lemma 3.1.3. (Integral approximation of the sum over the kernel function)
With the compact support (A2)(i), the Lipschitz-continuity (A2)(iii) and the mean value theorem of integration, it follows that

(i) lim_{n→∞} Σ_{i=1}^n (1/(n h_n)) K(U_i) = ∫_{−∞}^{∞} K(u) du,

(ii) lim_{n→∞} Σ_{i=1}^n (1/(n h_n)) K²(U_i) = ∫_{−∞}^{∞} K²(u) du,

(iii) lim_{n→∞} Σ_{i=1}^n (1/(n h_n)) U_i K(U_i) = ∫_{−∞}^{∞} u K(u) du.

Proof:
Let J be the index set of the observations contributing to the sum, with cardinality |J| = O(n h_n). We prove (i); the proofs of (ii) and (iii) are similar. With ζ_i given by the mean value theorem of integration on [U_i, U_{i−1}],

| Σ_{i=1}^n (1/(n h_n)) K(U_i) − ∫_{−∞}^{∞} K(u) du | ≤ Σ_{i∈J} | (1/(n h_n)) K(U_i) − ∫_{U_i}^{U_{i−1}} K(u) du |
= Σ_{i∈J} | (1/(n h_n)) K(U_i) − (U_{i−1} − U_i) K(ζ_i) |
= Σ_{i∈J} | (1/(n h_n)) K(U_i) − ( (x − X_{i−1})/h_n − (x − X_i)/h_n ) K(ζ_i) |
= Σ_{i∈J} | (1/(n h_n)) K(U_i) − ( (X_i − X_{i−1})/h_n ) K(ζ_i) |
= Σ_{i∈J} | (1/(n h_n)) K(U_i) − (1/(n h_n)) K(ζ_i) |     (since the spacings X_i − X_{i−1} are of order 1/n)
= (1/(n h_n)) Σ_{i∈J} | K(U_i) − K(ζ_i) |
= (1/(n h_n)) Σ_{i∈J} L |U_i − ζ_i|     (from the Lipschitz condition)
≤ (1/(n h_n)) Σ_{i∈J} O(1/(n h_n))
= O(1/(n² h_n²)) Σ_{i∈J} 1 = O(1/(n h_n)).

Now, we want to approximate the mean square error as in the next Theorem.

Theorem 3.1.4.
Let the Y_i be independent and let (A1)(i, ii), (A2)(i, ii, iii, iv) and (A3) be satisfied. Then it holds for n → ∞ and x ∈ (h_n, 1 − h_n):

MSE(F_n(y|x)) ≈ [ (h_n²/2) F″(y|x) ∫ u² K(u) du ]² + (1/(n h_n)) ( F(y|x) − F²(y|x) ) ∫ K²(u) du.   (3.1.6)

Proof:
Since MSE(F_n(y|x)) = ( E[F_n(y|x)] − F(y|x) )² + Var[F_n(y|x)], by (3.1.5) we only need to find Var[F_n(y|x)]. A Taylor expansion yields

F(y | x − h_n U_i) = F(y|x) − h_n U_i F′(y|x) + h_n² U_i² F″(y|x) + o(h_n²),
F²(y | x − h_n U_i) = F²(y|x) − 2 h_n U_i F(y|x) F′(y|x) + h_n² U_i² [F′(y|x)]² + h_n² U_i² F(y|x) F″(y|x) + o(h_n²).

Let condition (A2)(ii) hold and write A = 1/[ Σ_i K(U_i) ]². Then, by Lemma 3.1.2 with X_i = x − h_n U_i,

Var[F_n(y|x)] = Σ_i K²(U_i) [ F(y|X_i) − F²(y|X_i) ] / [ Σ_i K(U_i) ]²
= A Σ_i K²(U_i) [ F(y | x − h_n U_i) − F²(y | x − h_n U_i) ]
= A [ F(y|x) − F²(y|x) ] Σ_i K²(U_i) + A h_n² [ F″(y|x) − (F′(y|x))² − F(y|x) F″(y|x) ] Σ_i U_i² K²(U_i) + A o(h_n²) Σ_i K²(U_i),

where the first-order terms vanish by the symmetry of K. That is, to leading order,

Var[F_n(y|x)] = Σ_i K²(U_i) [ F(y|x) − F²(y|x) ] / [ Σ_i K(U_i) ]².

From the last Lemma, we have

Var[F_n(y|x)] ≈ (1/(n h_n)) [ F(y|x) − F²(y|x) ] ∫ K²(u) du.

So the proof of this theorem is completed.
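The approximation (3.1.6) can be evaluated directly once plug-in values are available. The sketch below uses hypothetical values of F(y|x) and F″(y|x) and the constants of a Gaussian kernel (assumptions, not values from the text), and scans the bandwidth to show that the minimizing h behaves like n^{−1/5}.

```python
# A small sketch that evaluates the asymptotic MSE approximation (3.1.6) for F_n(y|x)
# and scans the bandwidth. F_val and Fxx_val are hypothetical plug-in numbers.
import numpy as np

def amse(h, n, F, Fxx, mu2=1.0, RK=1.0 / (2.0 * np.sqrt(np.pi))):
    """Squared bias plus variance from (3.1.6); mu2 = int u^2 K, RK = int K^2 (Gaussian kernel)."""
    bias2 = (0.5 * h**2 * Fxx * mu2) ** 2
    var = (F - F**2) * RK / (n * h)
    return bias2 + var

F_val, Fxx_val = 0.3, 1.5          # hypothetical F(y|x) and F''(y|x)
for n in (200, 2000, 20000):
    hs = np.linspace(0.01, 1.0, 500)
    h_opt = hs[np.argmin(amse(hs, n, F_val, Fxx_val))]
    print(f"n = {n:6d}:  minimizing h ≈ {h_opt:.3f},  n^(-1/5) = {n**-0.2:.3f}")
```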

Thus the bias of F_n(y|x) depends, through F″(y|x), on the smoothness of the underlying conditional distribution function. It is now possible to give a formal assessment of the asymptotic mean squared error.
Observe that the mean squared error depends on the second derivative of the conditional distribution and on the difference F(y|x) − F²(y|x). This means that the variance of the estimator is highest in the middle of the distribution, since the maximum of F(y|x) − F²(y|x) is 1/4 and is attained when F(y|x) = 1/2.
From the last Theorem it follows that the kernel estimator (3.1.1) is consistent. Next, the asymptotic normality of (n h_n)^{1/2} ( F_n(y|x) − E[F_n(y|x)] ) and of (n h_n)^{1/2} ( F_n(y|x) − F(y|x) ) is shown.

Theorem 3.1.5.
Let the conditions of the last theorem be satisfied. Then it holds for n → ∞,

(n h_n)^{1/2} ( F_n(y|x) − E[F_n(y|x)] ) →_d N( 0, [ F(y|x) − F²(y|x) ] ∫ K²(u) du ).   (3.1.7)

Proof:
To prove this theorem, we use the Liapunov condition. Let

Q_{n,i}(x) = [ K((x − X_i)/h_n) / Σ_j K((x − X_j)/h_n) ] [ I_{{Y_i ≤ y}} − F(y|X_i) ] / ( Var[F_n(y|x)] )^{1/2}.

Therefore,

Σ_{i=1}^n Q_{n,i}(x) = { Σ_{i=1}^n [ K((x − X_i)/h_n) / Σ_j K((x − X_j)/h_n) ] I_{{Y_i ≤ y}} − Σ_{i=1}^n [ K((x − X_i)/h_n) / Σ_j K((x − X_j)/h_n) ] F(y|X_i) } / ( Var[F_n(y|x)] )^{1/2}.

This means that

( F_n(y|x) − E[F_n(y|x)] ) / ( Var[F_n(y|x)] )^{1/2} = Σ_{i=1}^n Q_{n,i}(x),

which is asymptotically standard normal if the Liapunov condition

lim_{n→∞} Σ_{i=1}^n E|Q_{n,i}(x)|³ = lim_{n→∞} Σ_{i=1}^n E| [ K((x − X_i)/h_n) / Σ_j K((x − X_j)/h_n) ] [ I_{{Y_i ≤ y}} − F(y|X_i) ] |³ / ( Var[F_n(y|x)] )^{3/2} = 0

is satisfied.
With the integral approximation it holds for the numerator that

Σ_{i=1}^n E| [ K((x − X_i)/h_n) / Σ_j K((x − X_j)/h_n) ] [ I_{{Y_i ≤ y}} − F(y|X_i) ] |³
= Σ_{i=1}^n | K((x − X_i)/h_n) / Σ_j K((x − X_j)/h_n) |³ E| I_{{Y_i ≤ y}} − F(y|X_i) |³
≤ Σ_{i=1}^n | K((x − X_i)/h_n) / Σ_j K((x − X_j)/h_n) |³
= Σ_i K³((x − X_i)/h_n) / [ Σ_i K((x − X_i)/h_n) ]³
≈ n h_n ∫ K³(u) du / [ n h_n ∫ K(u) du ]³ = O( 1/(n² h_n²) )   (by Lemma 3.1.3).

For the variance of F_n(y|x) it follows from the last theorem that

Var[F_n(y|x)] = O( 1/(n h_n) ),

and thus

( Var[F_n(y|x)] )^{3/2} = O( 1/( n^{3/2} h_n^{3/2} ) ).

It follows for the Liapunov condition that

lim_{n→∞} Σ_i E|Q_{n,i}(x)|³ ≤ O( 1/(n² h_n²) ) / O( 1/( n^{3/2} h_n^{3/2} ) ) = O( 1/( n^{1/2} h_n^{1/2} ) ) = o(1).

From the Liapunov condition and the variance of F_n(y|x) from the last theorem, asymptotic normality follows:

( F_n(y|x) − E[F_n(y|x)] ) / ( Var[F_n(y|x)] )^{1/2} →_d N(0, 1).

Therefore,

(n h_n)^{1/2} ( F_n(y|x) − E[F_n(y|x)] ) →_d N( 0, [ F(y|x) − F²(y|x) ] ∫ K²(u) du ).

Corollary 3.1.6.
Let the conditions of the last theorem be satisfied and let n h_n^5 → 0 as n → ∞. Then it follows that

(n h_n)^{1/2} ( F_n(y|x) − F(y|x) ) →_d N( 0, [ F(y|x) − F²(y|x) ] ∫ K²(u) du ).   (3.1.8)

Proof:
The last theorem gives the asymptotic normality of (n h_n)^{1/2} ( F_n(y|x) − E[F_n(y|x)] ), so we may replace E[F_n(y|x)] by F(y|x) provided (n h_n)^{1/2} ( E[F_n(y|x)] − F(y|x) ) converges to zero as n → ∞.
From (3.1.5),

E[F_n(y|x)] − F(y|x) = (h_n²/2) [ Σ_i U_i² K(U_i) / Σ_i K(U_i) ] F″(y|x) + o(h_n²) = O(h_n²).

That is,

(n h_n)^{1/2} ( E[F_n(y|x)] − F(y|x) ) = (n h_n)^{1/2} O(h_n²) = O( (n h_n^5)^{1/2} ).

With n h_n^5 → 0 as n → ∞, the asymptotic normality of (n h_n)^{1/2} ( F_n(y|x) − F(y|x) ) follows.

The above theorems deal with the estimator of the conditional distribution. Now the behavior of the estimator of the conditional quantile is analyzed. So assume that F_n(q_{n,α}(x)|x) = F(q_α(x)|x) = α is unique and that the Y_i are independent.
Now, let

H_{n,α}(θ(x)) = Σ_i [ K((x − X_i)/h_n) / Σ_j K((x − X_j)/h_n) ] [ α − I_{{Y_i ≤ θ(x)}} ] = Σ_i H_{i,α}(θ(x)).   (3.1.9)

Using the central limit theorem,

( H_{n,α}(θ(x)) − E[H_{n,α}(θ(x))] ) / ( Var[H_{n,α}(θ(x))] )^{1/2} →_d N(0, 1),   n → ∞.   (3.1.10)

With H_{n,α}(θ(x)), the mean squared error of q_{n,α}(x) can be calculated.

Theorem 3.1.7.
Let the conditions of Theorem 3.1.4 be satisfied and let F_n(q_{n,α}(x)|x) = F(q_α(x)|x) = α be unique. Then it holds that

MSE[q_{n,α}(x)] = [ (h_n²/2) F^{(2,0)}(q_α(x)|x) / f(q_α(x)|x) ∫ u² K(u) du ]² + (1/(n h_n)) α(1 − α) / f²(q_α(x)|x) ∫ K²(u) du.   (3.1.11)

Proof:
By the Taylor expansions of the conditional distribution function used in the proof of Theorem 3.1.4, applied with θ(x) = q_{n,α}(x), it follows that

E[H_{n,α}(q_{n,α}(x))] ≈ f(q_α(x)|x) [ q_{n,α}(x) − q_α(x) ] + (h_n²/2) F^{(2,0)}(q_α(x)|x) Σ_i U_i² K(U_i) / Σ_i K(U_i),

and with the integral approximation,

E[H_{n,α}(q_{n,α}(x))] ≈ f(q_α(x)|x) [ q_{n,α}(x) − q_α(x) ] + (h_n²/2) F^{(2,0)}(q_α(x)|x) ∫ u² K(u) du.

Now,

Var[H_{n,α}(q_{n,α}(x))] = (1/[ Σ_i K(U_i) ]²) Σ_i K²(U_i) [ F(q_{n,α}(x) | x − h_n U_i) − F²(q_{n,α}(x) | x − h_n U_i) ]
≈ (1/[ Σ_i K(U_i) ]²) α(1 − α) Σ_i K²(U_i)
≈ (1/(n h_n)) α(1 − α) ∫ K²(u) du.

n h_n H_{n,α}(q_{n,α}(x)) is a sum of bounded random variables and Σ_i Var( n h_n H_{i,α}(q_{n,α}(x)) ) → ∞ as n → ∞. From this, asymptotic normality (Theorem 1.1.7, Ch. 1) follows:

( H_{n,α}(q_{n,α}(x)) − E[H_{n,α}(q_{n,α}(x))] ) / ( Var[H_{n,α}(q_{n,α}(x))] )^{1/2} →_d N(0, 1),   n → ∞.

Since F_n(q_{n,α}(x)|x) = α, we have H_{n,α}(q_{n,α}(x)) = 0. This implies, for n → ∞,

{ f(q_α(x)|x) [ q_{n,α}(x) − q_α(x) ] + (h_n²/2) F^{(2,0)}(q_α(x)|x) ∫ u² K(u) du } / ( (1/(n h_n)) α(1 − α) ∫ K²(u) du )^{1/2} →_d N(0, 1).   (3.1.12)

From this, the bias and variance of q_{n,α}(x) can be calculated.

The bias depends, through F^{(2,0)}(q_α(x)|x), on the smoothness of the quantile function. Because of the division by the conditional density at q_α(x), the steepness of the conditional distribution also affects the bias and the variance: the flatter the conditional distribution is at q_α(x) (that is, the smaller the conditional density there), the greater the mean squared error.
From the method of proof of Theorem 3.1.5, asymptotic normality can be established.

Corollary 3.1.8.
Let the conditions of Theorem 3.1.7 be satisfied and let n h_n^5 → 0 as n → ∞. Then it holds that

(n h_n)^{1/2} ( q_{n,α}(x) − q_α(x) ) →_d N( (1/2)(n h_n^5)^{1/2} F^{(2,0)}(q_α(x)|x) / f(q_α(x)|x) ∫ u² K(u) du ,  α(1 − α) / f²(q_α(x)|x) ∫ K²(u) du )
→_d N( 0, α(1 − α) / f²(q_α(x)|x) ∫ K²(u) du ).   (3.1.13)

Proof:
From Equation (3.1.12) we have

{ f(q_α(x)|x) [ q_{n,α}(x) − q_α(x) ] + (h_n²/2) F^{(2,0)}(q_α(x)|x) ∫ u² K(u) du } / ( (1/(n h_n)) α(1 − α) ∫ K²(u) du )^{1/2} →_d N(0, 1).

This implies that the expectation of this standardized quantity tends to 0 and its variance tends to 1. From the properties of the expectation and the variance, we therefore have

(n h_n)^{1/2} E[ q_{n,α}(x) − q_α(x) ] → (1/2)(n h_n^5)^{1/2} F^{(2,0)}(q_α(x)|x) / f(q_α(x)|x) ∫ u² K(u) du,   (3.1.14)

and

(n h_n) Var[ q_{n,α}(x) − q_α(x) ] → α(1 − α) / f²(q_α(x)|x) ∫ K²(u) du.   (3.1.15)

From (3.1.14) and (3.1.15), we have

(n h_n)^{1/2} ( q_{n,α}(x) − q_α(x) ) →_d N( (1/2)(n h_n^5)^{1/2} F^{(2,0)}(q_α(x)|x) / f(q_α(x)|x) ∫ u² K(u) du ,  α(1 − α) / f²(q_α(x)|x) ∫ K²(u) du ).

Since n h_n^5 → 0, we obtain (3.1.13).
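The limiting variance in (3.1.13) can be used to build a normal-approximation confidence interval for q_α(x). The sketch below is a minimal illustration under assumptions not taken from the text: Gaussian kernel weights, a simulated model, and a plug-in kernel estimate of f(q_α(x)|x).

```python
# A minimal sketch of a confidence interval for q_alpha(x) based on (3.1.13):
# Var[q_{n,alpha}(x)] ~ alpha(1-alpha) * int K^2 / (n h f^2(q_alpha(x)|x)).
# Assumptions: Gaussian kernel, simulated data, plug-in density estimate (all illustrative).
import numpy as np

rng = np.random.default_rng(5)
n = 4000
X = rng.uniform(0.0, 1.0, size=n)
Y = 2.0 * X + rng.normal(scale=0.5, size=n)      # true conditional median at x: 2x

x0, alpha, h = 0.5, 0.5, 0.1
w = np.exp(-0.5 * ((x0 - X) / h) ** 2)

# quantile estimate: smallest y with weighted ecdf >= alpha
order = np.argsort(Y)
cdf = np.cumsum(w[order]) / w.sum()
q_hat = Y[order][np.searchsorted(cdf, alpha)]

# plug-in conditional density f_n(q_hat | x0) and asymptotic standard error
hy = 0.1
f_hat = np.sum(w * np.exp(-0.5 * ((q_hat - Y) / hy) ** 2)) / (hy * np.sqrt(2 * np.pi) * w.sum())
RK = 1.0 / (2.0 * np.sqrt(np.pi))                # int K^2 for the Gaussian kernel
se = np.sqrt(alpha * (1 - alpha) * RK / (n * h * f_hat**2))

print(f"q_hat = {q_hat:.3f}, 95% CI = ({q_hat - 1.96*se:.3f}, {q_hat + 1.96*se:.3f}), true = 1.000")
```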

3.2 Joint Asymptotic Distribution of the Conditional

Quantiles

Let (X_1, Y_1), (X_2, Y_2), . . . , (X_n, Y_n) be independent and identically distributed two-dimensional random variables with a joint density function f(x, y) and a joint distribution function F(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(u, v) dv du. The marginal density function of X is g(x) = ∫_{−∞}^{∞} f(x, y) dy. The conditional density function and the conditional distribution function of Y given X = x are f(y|x) = f(x, y)/g(x) and

F(y|x) = ∫_{−∞}^{y} f(u|x) du = ∫_{−∞}^{y} f(x, u) du / g(x),

respectively. Now for i = 1, 2, let q_{α_i}(x) denote the α_i-th quantile of the conditional distribution F(y|x), i.e., a root of the equation F(q(x)|x) = α_i, with 0 < α_1 < α_2 < 1.
Let f_n(x, y), g_n(x), f_n(y|x) and F_n(y|x) be the estimators of f(x, y), g(x), f(y|x) and

F(y|x), respectively, defined as follows:

f_n(x, y) = (1/(n h_n²)) Σ_{i=1}^n K((x − X_i)/h_n) K((y − Y_i)/h_n),

g_n(x) = ∫_{−∞}^{∞} f_n(x, y) dy = (1/(n h_n)) Σ_{i=1}^n K((x − X_i)/h_n),

f_n(y|x) = f_n(x, y) / g_n(x),

F_n(y|x) = ∫_{−∞}^{y} f_n(u|x) du = B_n(x, y) / g_n(x),

where K is a probability density function, h_n is a sequence of positive numbers converging to zero, and

B_n(x, y) = (1/(n h_n)) Σ_{i=1}^n G((y − Y_i)/h_n) K((x − X_i)/h_n),

with G(y) = ∫_{−∞}^{y} K(u) du.
Now, we consider for i = 1, 2 two estimators q_{α_i,n}(x) of q_{α_i}(x), defined as the roots of the equations F_n(q(x)|x) = α_i, i = 1, 2. We shall call q_{α_i,n}(x) the conditional sample quantiles. We prove that, under some regularity conditions, these estimators are strongly consistent and asymptotically normally distributed.

Now, we shall assume the following conditions:

(A1) The conditional distribution function satisfies:
(i) F^{(i,j)}(x, y) = ∂^{i+j} F(x, y)/∂x^i ∂y^j exist and are bounded for (i, j) = (1, 2), (2, 0), (2, 1), (3, 0).
(ii) The conditional population quantiles q_{α_i}(x), defined by
F(q_{α_i}(x)|x) = F^{(1,0)}(x, q_{α_i}(x)) / g(x) = α_i, i = 1, 2,
are unique.
(iii) f(x, y) is uniformly continuous.

(A2) The marginal density function of X satisfies:
(i) g^{(i)}(x) = ∫_{−∞}^{∞} ∂^i f(x, y)/∂x^i dy exists for i = 1, 2.
(ii) Both h(x) = ∫_{−∞}^{∞} |∂f(x, y)/∂x| dy and g^{(i)}(x) are bounded for i = 1, 2.
(iii) g(x) is uniformly continuous.

(A3) The kernel K is a Borel function and satisfies the following:
(i) K(u) is a function of bounded variation.
(ii) ∫_{−∞}^{∞} u K(u) du = 0.
(iii) ∫_{−∞}^{∞} u² K(u) du < ∞.

(A4) h_n is a sequence of positive numbers of the form h_n = n^{−δ} with 1/5 < δ < 1/4, so that:
(i) lim_{n→∞} n h_n^4 = ∞;
(ii) lim_{n→∞} n h_n^5 = 0.

Lemma 3.2.1.
Under the conditions (A2)(i, ii), (A3)(i) and (A4)(i), we have

lim_{n→∞} sup_{x∈R} |g_n(x) − g(x)| = 0

with probability one.

Proof: see [17].

Lemma 3.2.2.
Under the conditions (A1)(i) and (A3)(iii), we have

sup_{x∈R} |E B_n(x, y) − F^{(1,0)}(x, y)| = O(h_n).

Proof:
By the definition of B_n(x, y), we have

E B_n(x, y) = E[ (1/(n h_n)) Σ_{i=1}^n G((y − Y_i)/h_n) K((x − X_i)/h_n) ]
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} (1/h_n) G((y − v)/h_n) K((x − u)/h_n) f(u, v) du dv
= h_n ∫_{−∞}^{∞} ∫_{−∞}^{∞} G(s) K(t) f(x − t h_n, y − s h_n) ds dt
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(s) K(t) F^{(1,0)}(x − t h_n, y − s h_n) ds dt   (integrating by parts in s)
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} K(s) K(t) { F^{(1,0)}(x, y) − t h_n F^{(2,0)}(x, y) − s h_n F^{(1,1)}(x, y) + O(h_n²) } ds dt
= F^{(1,0)}(x, y) + O(h_n).

Then we have the result.

Lemma 3.2.3.
Under the conditions (A1)(i), (A3)(i, iii) and (A4)(i), we have

lim_{n→∞} sup_{x∈R} |B_n(x, y) − F^{(1,0)}(x, y)| = 0

with probability one.

Proof:
By the above lemma, it suffices to show that

lim_{n→∞} sup_{x∈R} |B_n(x, y) − E B_n(x, y)| = 0

with probability one. Let S_n(u, v) be the two-dimensional empirical distribution function defined by

S_n(u, v) = (1/n) Σ_{i=1}^n I(u − X_i) I(v − Y_i),

where I(x − y) = 1 if x − y ≥ 0 and I(x − y) = 0 if x − y < 0; that is, I(x − y) = 1 if x ≥ y and 0 if x < y.
Now,

sup_{x∈R} |B_n(x, y) − E B_n(x, y)|
= sup_{x∈R} | ∫∫ (1/h_n) G((y − v)/h_n) K((x − u)/h_n) dS_n(u, v) − ∫∫ (1/h_n) G((y − v)/h_n) K((x − u)/h_n) dF(u, v) |
= sup_{x∈R} | ∫∫ (1/h_n) G((y − v)/h_n) K((x − u)/h_n) d[ S_n(u, v) − F(u, v) ] |
= h_n^{−1} sup_{x∈R} | ∫∫ [ S_n(u, v) − F(u, v) ] dG((y − v)/h_n) dK((x − u)/h_n) |   (integrating by parts)
≤ h_n^{−1} μ sup_{(u,v)∈R²} |S_n(u, v) − F(u, v)|,

where μ = ∫_{−∞}^{∞} |K^{(1)}(t)| dt. Hence, for any ε > 0, we have

Σ_{n=1}^{∞} P[ sup_{y∈R} |B_n(x, y) − E B_n(x, y)| ≥ ε ] ≤ Σ_{n=1}^{∞} P[ h_n^{−1} μ sup_{(u,v)∈R²} |S_n(u, v) − F(u, v)| ≥ ε ]
= Σ_{n=1}^{∞} P[ sup_{(u,v)∈R²} |S_n(u, v) − F(u, v)| ≥ h_n ε / μ ]
< C_1 Σ_{n=1}^{∞} exp( −C_2 ε² n h_n² / μ² ) < ∞   (by Lemma 1.1.9),

where C_1 and C_2 are positive constants. Since Σ_{n=1}^{∞} P[ sup_{y∈R} |B_n(x, y) − E B_n(x, y)| ≥ ε ] < ∞, the Borel-Cantelli Lemma gives the result.

Lemma 3.2.4.
Under the conditions (A1)(i), (A2)(ii), (A3)(i, iii) and (A4)(i), if g(x) > 0, then

lim_{n→∞} sup_{y∈R} |F_n(y|x) − F(y|x)| = 0

with probability one.

Proof:
Since

F(y|x) = ∫_{−∞}^{y} f(x, u) du / g(x) = F^{(1,0)}(x, y) / g(x),

by Lemma (3.2.1) and Lemma (3.2.3) we have

lim_{n→∞} sup_{y∈R} |F_n(y|x) − F(y|x)| = lim_{n→∞} sup_{y∈R} | B_n(x, y)/g_n(x) − F^{(1,0)}(x, y)/g(x) | = 0

with probability one.

Lemma 3.2.5.
Under the conditions of Lemma (3.2.4), if g(x) > 0, we have

lim_{n→∞} |F(q_{α_i,n}(x)|x) − F(q_{α_i}(x)|x)| = 0

with probability one, (i = 1, 2).

Proof:
Since

|F(q_{α_i,n}(x)|x) − F(q_{α_i}(x)|x)| = |F(q_{α_i,n}(x)|x) − F_n(q_{α_i,n}(x)|x) − F(q_{α_i}(x)|x) + F_n(q_{α_i,n}(x)|x)|
≤ |F(q_{α_i,n}(x)|x) − F_n(q_{α_i,n}(x)|x)| + |F(q_{α_i}(x)|x) − F_n(q_{α_i,n}(x)|x)|
≤ 2 sup_{y∈R} |F_n(y|x) − F(y|x)|,

applying the last Lemma gives

lim_{n→∞} |F(q_{α_i,n}(x)|x) − F(q_{α_i}(x)|x)| ≤ 2 lim_{n→∞} sup_{y∈R} |F_n(y|x) − F(y|x)| = 0.

Thus lim_{n→∞} |F(q_{α_i,n}(x)|x) − F(q_{α_i}(x)|x)| = 0.

Now, the following theorem deals with the strong consistency of the estimators qαi,n(x), i =

1, 2.

Theorem 3.2.6.
Under the conditions (A1)(i), (A3)(i, iii), (A4)(i), if g(x) > 0, then

lim_{n→∞} q_{α_i,n}(x) = q_{α_i}(x), i = 1, 2,

with probability one.

Proof:
We only prove the theorem for i = 1; that is, we want to show that lim_{n→∞} q_{α_1,n}(x) = q_{α_1}(x) with probability one.
Since q_{α_1}(x) is unique, for any ε > 0 there exists η = η(ε) > 0, defined by

η(ε) = min{ F(q_{α_1}(x) + ε|x) − F(q_{α_1}(x)|x), F(q_{α_1}(x)|x) − F(q_{α_1}(x) − ε|x) },

such that |q_{α_1,n}(x) − q_{α_1}(x)| > ε implies |F(q_{α_1,n}(x)|x) − F(q_{α_1}(x)|x)| > η(ε). By Lemma (3.2.5) the latter quantity tends to zero with probability one, and the result follows.
Expanding F_n(q_{α_i,n}(x)|x) around q_{α_i}(x), we get

F(q_{α_i}(x)|x) = α_i = F_n(q_{α_i,n}(x)|x) = F_n(q_{α_i}(x)|x) + ( q_{α_i,n}(x) − q_{α_i}(x) ) f_n(q_i|x),

where q_i is some random point between q_{α_i,n}(x) and q_{α_i}(x), i = 1, 2. Hence

q_{α_i,n}(x) − q_{α_i}(x) = ( F(q_{α_i}(x)|x) − F_n(q_{α_i}(x)|x) ) / f_n(q_i|x),

and so

(n h_n)^{1/2} ( q_{α_i,n}(x) − q_{α_i}(x) ) = −(n h_n)^{1/2} ( F_n(q_{α_i}(x)|x) − F(q_{α_i}(x)|x) ) / f_n(q_i|x), i = 1, 2.   (3.2.1)

Since lim_{n→∞} F_n(q_{α_i}(x)|x) = F(q_{α_i}(x)|x), we have the result.

Lemma 3.2.7.
Under the conditions (A1)(i), (A3)(i, iii) and (A4)(i), if g(x) > 0, then

f_n(q_i|x) = f(q_{α_i}(x)|x) + o_p(1), i = 1, 2.

Proof:
Since

|f_n(q_i|x) − f(q_{α_i}(x)|x)| = |f_n(q_i|x) − f(q_i|x) + f(q_i|x) − f(q_{α_i}(x)|x)|
≤ |f_n(q_i|x) − f(q_i|x)| + |f(q_i|x) − f(q_{α_i}(x)|x)|
≤ sup_{y∈R} |f_n(y|x) − f(y|x)| + |f(q_i|x) − f(q_{α_i}(x)|x)| = o_p(1).

To obtain the asymptotic joint distribution of q_{α_1,n}(x) and q_{α_2,n}(x), we define, for i = 1, 2 and j = 1, 2, . . . , n, the following:

U*_{nj}(x) = (1/h_n) K((x − X_j)/h_n),
V*_{nij}(x) = (1/h_n) G((q_{α_i}(x) − Y_j)/h_n) K((x − X_j)/h_n),
U_{nj}(x) = (h_n)^{1/2} [ U*_{nj}(x) − E U*_{nj}(x) ],
V_{nij}(x) = (h_n)^{1/2} [ V*_{nij}(x) − E V*_{nij}(x) ],
U_n(x) = Σ_{j=1}^n U_{nj}(x),   V_{ni}(x) = Σ_{j=1}^n V_{nij}(x),
W_{nj} = ( U_{nj}(x), V_{n1j}(x), V_{n2j}(x) )^T,   n^{1/2} Z_n = ( U_n(x), V_{n1}(x), V_{n2}(x) )^T,
w_i(x) = F^{(1,0)}(x, q_{α_i}(x)),
n^{1/2} Z*_n = (h_n)^{1/2} ( Σ_{j=1}^n [ U*_{nj}(x) − g(x) ], Σ_{j=1}^n [ V*_{n1j}(x) − w_1(x) ], Σ_{j=1}^n [ V*_{n2j}(x) − w_2(x) ] )^T,

and

A = ∫_{−∞}^{∞} K²(u) du ·
[ g(x)     w_1(x)   w_2(x) ]
[ w_1(x)   w_1(x)   w_1(x) ]
[ w_2(x)   w_1(x)   w_2(x) ].

Lemma 3.2.8.
Under the conditions (A1)(i), (A2)(ii), and (A3)(i, iii), the following results hold:

1. lim_{n→∞} E U²_{nj}(x) = g(x) ∫_{−∞}^{∞} K²(u) du,

2. lim_{n→∞} E V²_{nij}(x) = w_i(x) ∫_{−∞}^{∞} K²(u) du, i = 1, 2,

3. lim_{n→∞} E U_{nj}(x) V_{nij}(x) = w_i(x) ∫_{−∞}^{∞} K²(u) du, i = 1, 2,

4. lim_{n→∞} E V_{n1j}(x) V_{n2j}(x) = w_1(x) ∫_{−∞}^{∞} K²(u) du.

Proof:
(1) Since

E U²_{nj}(x) = h_n [ (1/h_n²) ∫_{−∞}^{∞} K²((x − u)/h_n) g(u) du − ( (1/h_n) ∫_{−∞}^{∞} K((x − u)/h_n) g(u) du )² ],

we have

lim_{n→∞} E U²_{nj}(x) = lim_{n→∞} (1/h_n) ∫_{−∞}^{∞} K²((x − u)/h_n) g(u) du − lim_{n→∞} h_n ( (1/h_n) ∫_{−∞}^{∞} K((x − u)/h_n) g(u) du )²
= g(x) ∫_{−∞}^{∞} K²(u) du − 0 = g(x) ∫_{−∞}^{∞} K²(u) du.

(2) Similarly,

lim_{n→∞} E V²_{nij}(x) = lim_{n→∞} (1/h_n) ∫_{−∞}^{∞} ∫_{−∞}^{∞} G²((q_{α_i}(x) − v)/h_n) K²((x − u)/h_n) f(u, v) du dv
= lim_{n→∞} ∫_{−∞}^{∞} K²(u) g(x − u h_n) [ ∫_{−∞}^{∞} G²((q_{α_i}(x) − v)/h_n) f(v | x − u h_n) dv ] du
= g(x) ∫_{−∞}^{∞} K²(u) du · ∫_{−∞}^{q_{α_i}(x)} f(v|x) dv
= g(x) F(q_{α_i}(x)|x) ∫_{−∞}^{∞} K²(u) du = w_i(x) ∫_{−∞}^{∞} K²(u) du,

since G²((q_{α_i}(x) − v)/h_n) tends to 1 for v < q_{α_i}(x) and to 0 for v > q_{α_i}(x) as h_n → 0.
The proofs of (3) and (4) are similar and are omitted.

Lemma 3.2.9.
Under the conditions (A1)(i), (A2)(ii), (A3)(i, iii) and (A4)(ii), Z_n converges in distribution to a trivariate normal random variable with mean vector 0 and covariance matrix A.

Proof:
To prove this lemma, it is sufficient to show that C^T Z_n converges in distribution to C^T Z for any real vector C = (C_1, C_2, C_3)^T ≠ 0, where Z is the trivariate normal random variable with mean vector 0 and covariance matrix A.
Now, we define, for j = 1, 2, . . . , n,

σ²_{nj} = Var[ C^T W_{nj} ],   ρ³_{nj} = E|C^T W_{nj}|³,

and let σ²_n = Σ_{j=1}^n σ²_{nj}, ρ³_n = Σ_{j=1}^n ρ³_{nj}.
Next, for any C ≠ 0, we have by Lemma (3.2.8)

lim_{n→∞} σ²_{nj} = lim_{n→∞} Var[ C_1 U_{nj}(x) + C_2 V_{n1j}(x) + C_3 V_{n2j}(x) ] = C^T A C > 0, j = 1, 2, . . . , n.

Using computations similar to those in Lemma (3.2.8), we have

E|U_{n1}(x)|³ = O(h_n^{−1/2})   and   E|V_{ni1}(x)|³ = O(h_n^{−1/2}), i = 1, 2.

Therefore,

ρ³_n = n E|C^T W_{n1}|³ = n E| C_1 U_{n1}(x) + C_2 V_{n11}(x) + C_3 V_{n21}(x) |³
≤ n E{ (C_1² + C_2² + C_3²)( U²_{n1}(x) + V²_{n11}(x) + V²_{n21}(x) ) }^{3/2}
≤ 3^{3/2} n (C_1² + C_2² + C_3²)^{3/2} max{ E|U_{n1}(x)|³, E|V_{n11}(x)|³, E|V_{n21}(x)|³ } = O( n h_n^{−1/2} ).

Hence it follows that lim_{n→∞} ρ_n / σ_n = 0. By Liapounov's version of the central limit theorem we conclude that C^T Z_n = n^{−1/2} Σ_{j=1}^n C^T W_{nj} converges in distribution to a univariate normal random variable with mean 0 and variance C^T A C.
We recall the Cramér-Wold theorem to complete the proof of this lemma.

Lemma 3.2.10.
Under the conditions (A1)(i), (A2)(ii) and (A3)(i, iii), Z*_n converges in distribution to a trivariate normal random variable with mean vector 0 and covariance matrix A.

Proof:
Since

n^{1/2} ( Z*_n − Z_n ) = (h_n)^{1/2} ( Σ_{j=1}^n [ U*_{nj}(x) − g(x) ], Σ_{j=1}^n [ V*_{n1j}(x) − w_1(x) ], Σ_{j=1}^n [ V*_{n2j}(x) − w_2(x) ] )^T − ( U_n(x), V_{n1}(x), V_{n2}(x) )^T
= ( h_n^{1/2} Σ_{j=1}^n [ U*_{nj}(x) − g(x) ] − Σ_{j=1}^n U_{nj}(x),  h_n^{1/2} Σ_{j=1}^n [ V*_{n1j}(x) − w_1(x) ] − Σ_{j=1}^n V_{n1j}(x),  h_n^{1/2} Σ_{j=1}^n [ V*_{n2j}(x) − w_2(x) ] − Σ_{j=1}^n V_{n2j}(x) )^T
= n h_n^{1/2} ( E U*_{nj}(x) − g(x),  E V*_{n1j}(x) − w_1(x),  E V*_{n2j}(x) − w_2(x) )^T,

that is,

Z*_n − Z_n = (n h_n)^{1/2} C_n,   where  C_n = ( E U*_{nj}(x) − g(x),  E V*_{n1j}(x) − w_1(x),  E V*_{n2j}(x) − w_2(x) )^T.

Since

E U*_{n1}(x) − g(x) = E[ (1/h_n) K((x − X_1)/h_n) ] − g(x)
= (1/h_n) ∫_{−∞}^{∞} K((x − u)/h_n) g(u) du − g(x)
= ∫_{−∞}^{∞} K(u) g(x − u h_n) du − g(x)
= ∫_{−∞}^{∞} K(u) { g(x) − u h_n g′(x) + u² h_n² g″(x) + o(h_n²) } du − g(x)
= h_n² g″(x) ∫_{−∞}^{∞} u² K(u) du + o(h_n²) ≤ C h_n² = O(h_n²),

and the other elements of C_n can be treated similarly, we get C_n = O(h_n²). Now

Z*_n = Z_n + ( Z*_n − Z_n ) = Z_n + (n h_n)^{1/2} C_n = Z_n + O( (n h_n^5)^{1/2} ) = Z_n + o(1),

by condition (A4)(ii). Then the proof of this lemma is complete.

Now, we will consider the main theorem of this section.

Theorem 3.2.11.
Under the conditions (A1)(i) and (A4)(ii), if g(x) > 0 and f(x, q_{α_i}(x)) > 0, i = 1, 2, then the vector ( q_{α_1,n}(x), q_{α_2,n}(x) )^T is asymptotically normally distributed with mean vector ( q_{α_1}(x), q_{α_2}(x) )^T and covariance matrix

B_n = [ ∫_{−∞}^{∞} K²(u) du / (n h_n g(x)) ] ·
[ b_11   b_12 ]
[ b_12   b_22 ],

where

b_{ij} = α_i (1 − α_j) / ( f(q_{α_i}(x)|x) f(q_{α_j}(x)|x) ),   1 ≤ i ≤ j ≤ 2.

Proof:
Let H be the function from R³ to R² defined by

H(y) = ( y_2/y_1, y_3/y_1 )^T,   with  y = ( y_1, y_2, y_3 )^T,   and let  θ = ( g(x), w_1(x), w_2(x) )^T.

We can write Z*_n = (n h_n)^{1/2} ( T_n − θ ), where T_n = ( T_{n1}, T_{n2}, T_{n3} )^T with

T_{n1} = (1/n) Σ_{j=1}^n U*_{nj}(x)   and   T_{ni} = (1/n) Σ_{j=1}^n V*_{n(i−1)j}(x), i = 2, 3.

From [17], with n^{1/2} replaced by (n h_n)^{1/2}, we conclude that

(n h_n)^{1/2} ( H(T_n) − H(θ) ) = (n h_n)^{1/2} ( F_n(q_{α_1}(x)|x) − F(q_{α_1}(x)|x), F_n(q_{α_2}(x)|x) − F(q_{α_2}(x)|x) )^T

converges in distribution to a bivariate normal random variable with mean vector 0 and covariance matrix D A D^T, where D is the matrix of partial derivatives of H evaluated at θ, given by

D = [ −w_1(x)/g²(x)   1/g(x)   0      ]
    [ −w_2(x)/g²(x)   0        1/g(x) ].

Then

D A D^T = [ ∫_{−∞}^{∞} K²(u) du / g(x) ] ·
[ α_1(1 − α_1)   α_1(1 − α_2) ]
[ α_1(1 − α_2)   α_2(1 − α_2) ].

By (3.2.1), we have the result.
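The asymptotic covariance matrix B_n of Theorem 3.2.11 can be assembled directly from plug-in quantities. The sketch below uses hypothetical density values and Gaussian-kernel constants, which are assumptions for illustration only.

```python
# A minimal sketch assembling B_n from Theorem 3.2.11 with
# b_ij = alpha_i (1 - alpha_j) / ( f(q_alpha_i|x) f(q_alpha_j|x) ) for i <= j.
# The density values f1, f2 and g_x below are hypothetical plug-in numbers.
import numpy as np

def quantile_cov(alpha1, alpha2, f1, f2, g_x, n, h, RK):
    """B_n = (RK / (n h g(x))) * [[b11, b12], [b12, b22]], with alpha1 < alpha2."""
    b11 = alpha1 * (1 - alpha1) / f1**2
    b22 = alpha2 * (1 - alpha2) / f2**2
    b12 = alpha1 * (1 - alpha2) / (f1 * f2)
    return (RK / (n * h * g_x)) * np.array([[b11, b12], [b12, b22]])

RK = 1.0 / (2.0 * np.sqrt(np.pi))                # int K^2 for a Gaussian kernel
Bn = quantile_cov(alpha1=0.25, alpha2=0.75,
                  f1=0.8, f2=0.8,                # hypothetical f(q_alpha_i(x)|x)
                  g_x=1.0, n=1000, h=0.1, RK=RK)
print(np.round(Bn, 5))
```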


3.3 Mode and Median as a Comparison

In chapter 2 and 3 we have studied two important aspects of the conditional density func-

tion, the conditional mode and the conditional quantiles.

As a measure of central density we compare between the mode and the median.

First, for the mode

i) The bias term vanishes since we use the condition (A2) (iv) page 45.

ii) The variance equalsf(x, θ(x))

f (0,2)(x, θ(x))2.

∫ ∫K(u)K(1)(v)2dudv

nh4n

That is the MSE depends only on the value of the variance.

Now, for the median

if we put α = 0.5 in Theorem 3.1.5, we get

i) The bias term is given by1

2h2

n

F (2,0)(q0.5(x)|x)

f(q0.5(x)|x)

∫u2K(u)du

ii) The variance term is1

nhn

1/4

f 2(q0.5(x))

∫K2(u)du

Recall that the mean square error of an estimator decomposes as $MSE(x)=\mathrm{Bias}^{2}(x)+\mathrm{Var}(x)$.

For the median, the bias depends, through $F^{(2,0)}(q_{0.5}(x)|x)$, on the smoothness of the quantile function. Because of the division by the conditional density at $q_{0.5}(x)$, the steepness of the conditional distribution also affects both the bias and the variance: the flatter the conditional distribution is at $q_{0.5}(x)$, that is, the smaller $f(q_{0.5}(x)|x)$ is, the greater the mean square error.

We also note that the variance factor $\alpha(1-\alpha)$ is largest for the median, since $F(y|x)\big(1-F(y|x)\big)\le \tfrac{1}{4}$.
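As a rough numerical check of this point, the following Python sketch (not from the thesis) evaluates the median's asymptotic variance term for conditional densities of varying steepness; the sample size, the bandwidth and the Gaussian family N(0, sigma^2) of conditional densities are illustrative assumptions.

import numpy as np

RK = 1.0 / (2.0 * np.sqrt(np.pi))   # int K^2(u) du for the Gaussian kernel
n, h = 1000, 0.1                    # illustrative sample size and bandwidth

for sigma in [0.5, 1.0, 2.0, 4.0]:
    f_med = 1.0 / (sigma * np.sqrt(2.0 * np.pi))     # conditional density at its median
    var_term = (1.0 / (n * h)) * 0.25 / f_med ** 2 * RK
    print(f"sigma = {sigma:3.1f}   f(q_0.5|x) = {f_med:.4f}   variance term = {var_term:.5f}")

Doubling sigma halves f(q_{0.5}(x)|x) and multiplies the variance term by four, which is exactly the "flatter conditional distribution, larger mean square error" effect described above; the bound alpha(1 - alpha) <= 1/4 shows in addition that this factor is largest at the median.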


Bibliography

[1] Abberger, K. (1997). Quantile Smoothing in Financial Time Series. Statistical Papers, Vol. 38, 125-148.

[2] Abraham, C., Biau, G. and Cadre, B. (2004). On the Asymptotic Properties of a Simple Estimation of the Mode. ESAIM: Probability and Statistics, Vol. 8, 1-11.

[3] Abraham, C., Biau, G. and Cadre, B. (2003). Simple Estimation of the Mode of a Multivariate Density. The Canadian Journal of Statistics, Vol. 31, 23-34.

[4] Bartle, R.G. and Sherbert, D.R. (1991). Introduction to Real Analysis. Eastern Michigan University.

[5] Devroye, L. (1979). Recursive Estimation of the Mode of a Multivariate Density. The Canadian Journal of Statistics, Vol. 7, 159-167.

[6] Freund, J. (1992). Mathematical Statistics. Arizona State University.

[7] Casella, G. and Berger, R.L. (1990). Statistical Inference. Cornell University and North Carolina State University.

[8] Hogg, R.V., McKean, J.W. and Craig, A.T. (2005). Introduction to Mathematical Statistics. University of Iowa and Western Michigan University.

[9] Yu, K. (2003). Quantile Regression: Applications and Current Research Areas. University of Plymouth, UK.

[10] Loeve, M. (1960). Probability Theory, 2nd Ed. Van Nostrand, Princeton.


[11] Nada, G. (2002). On the Kernel Density Estimation. Islamic University of Gaza, Palestine.

[12] Parzen, E. (1962). On Estimation of a Probability Density Function and Mode. The Annals of Mathematical Statistics, Vol. 33, 1065-1076.

[13] Sen, P.K. (1993). Large Sample Methods in Statistics: An Introduction with Applications. New York.

[14] Rao, C.R. (1965). Linear Statistical Inference and Its Applications. Wiley, New York.

[15] Rosenblatt, M. (1956). Remarks on Some Nonparametric Estimates of a Density Function. The Annals of Mathematical Statistics, Vol. 27, 832-837.

[16] Royden, H.L. (1997). Real Analysis. Stanford University.

[17] Samanta, M. (1988). Non-parametric Estimation of Conditional Quantiles. Department of Statistics, University of Manitoba, Canada.

[18] Samanta, M. and Thavaneswaran, A. (1990). Nonparametric Estimation of the Conditional Mode. Communications in Statistics - Theory and Methods, Vol. 19, 4515-4524.

[19] Samanta, M. (1973). Nonparametric Estimation of the Mode of a Multivariate Density. South African Statistical Journal, Vol. 7, 109-117.

[20] Salha, R. (2006). Kernel Estimation of the Conditional Quantiles and Mode for Time Series. University of Macedonia.

[21] Schuster, E. (1972). Joint Asymptotic Distribution of the Estimated Regression Function at a Finite Number of Distinct Points. The Annals of Mathematical Statistics, Vol. 43, 84-88.

[22] Silverman, B.W. (1986). Density Estimation for Statistics and Data Analysis. School of Mathematics, University of Bath, UK.


[23] Wand, M.P. and Jones, M.C. (1995). Kernel Smoothing. University of New South Wales, Australia.

[24] Watson, G.S. (1964). Smooth Regression Analysis. Sankhya, Series A, Vol. 26, 359-372.

[25] Whittle, P. (1958). On the Smoothing of Probability Density Function. J. Roy. Statist. Soc., Ser. B, 20, 334-343.
