Nonparametric Techniques - UCSB yfwang/courses/cs290i_prann/pdf/nonparametric.pdf
Nonparametric Techniques
PR, ANN, & ML
Nonparametric Techniques
Without assuming any particular distribution
the underlying function may not be known (e.g.
multi-modal densities)
too many parameters
Estimating density distribution directly
Transform into a lower-dimensional space
where parametric techniques may apply
(more on this later on dimension reduction)
Example
Estimate the population growth, annual
rainfall, etc. in the US
p(x,y) dx dy is the probability of rainfall in
[x, x+dx] × [y, y+dy]
Example (cont.)
A simple parametric model for p(x,y)
probably does not exist
Instead
partition the area into a lattice
At each (x,y), count the amount of rain r(x,y)
Do that for a whole year
Normalize so that $\sum_{x,y} r(x,y) = 1$
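The lattice procedure can be sketched in a few lines. This is only an illustration under assumptions: the rainfall readings are synthetic, and the function name `lattice_density`, the unit-square extent, and the 10 x 10 lattice are all hypothetical choices, not part of the slides.

```python
import numpy as np

def lattice_density(x, y, rain, bins=10, extent=((0.0, 1.0), (0.0, 1.0))):
    """Accumulate rainfall r(x, y) on a lattice and normalize the cells to sum to 1."""
    r, xedges, yedges = np.histogram2d(x, y, bins=bins, range=extent, weights=rain)
    return r / r.sum(), xedges, yedges

# Synthetic "one year" of readings (assumed data, for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 1000)      # location coordinates
y = rng.uniform(0.0, 1.0, 1000)
rain = rng.exponential(1.0, 1000)    # amount of rain recorded at each location
r, _, _ = lattice_density(x, y, rain)
```

Each cell of `r` then holds the fraction of the total rainfall that fell in that lattice cell.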
Density estimation
[Figure: probability density p(x) plotted against value x; the probability of the interval from $x_i$ to $x_j$ is the area under the curve]
$$P = \int_{x_i}^{x_j} p(x)\,dx$$
Density estimation
From equation
$$P = \int_{x_i}^{x_j} p(x)\,dx \approx p(x)\,(x_j - x_i)$$
From observation
$$P \approx \frac{k}{n}$$
Hence
$$p(x) \approx \frac{k/n}{x_j - x_i} = \frac{k/n}{V}$$
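A minimal sketch of the estimate p(x) ≈ (k/n)/V in one dimension, assuming an interval of width V centered at x. The helper name `interval_density` and the uniform test data are illustrative assumptions.

```python
import numpy as np

def interval_density(samples, x, width):
    """Estimate p(x) as (k/n)/V, where k counts samples inside an interval of width V around x."""
    samples = np.asarray(samples, dtype=float)
    k = np.count_nonzero(np.abs(samples - x) <= width / 2)
    return (k / samples.size) / width

# Uniform samples on [0, 1] have true density 1 everywhere inside the interval.
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, 10_000)
est = interval_density(samples, 0.5, 0.2)
```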
Comparison
In reality:
The number of training samples is limited
If V is too small, k becomes erratic (what does k = 0 mean?)
If V is too large, the estimate is not representative
In theory:
If n becomes infinitely large, k/n approaches the probability P; p(x) = (k/n)/V is then only a space average of the density over V
Hence, V must be allowed to go to zero as n goes to infinity
$$p(x) \approx \frac{k/n}{x_j - x_i} = \frac{k/n}{V}$$
In Theory
Theoretically, we can use a sequence of
samples with increasing size for estimation
Then $p_n(x) \to p(x)$ if
$$(1)\ \lim_{n\to\infty} V_n = 0 \qquad (2)\ \lim_{n\to\infty} k_n = \infty \qquad (3)\ \lim_{n\to\infty} \frac{k_n}{n} = 0$$
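As an illustration (a common textbook choice, not stated on the slide), shrinking the region as $V_n = V_1/\sqrt{n}$ satisfies all three conditions wherever $p(x) > 0$, because the expected count in the region is about $n V_n p(x)$:

$$V_n = \frac{V_1}{\sqrt{n}} \to 0, \qquad k_n \approx n V_n\, p(x) = \sqrt{n}\, V_1\, p(x) \to \infty, \qquad \frac{k_n}{n} \approx \frac{V_1\, p(x)}{\sqrt{n}} \to 0$$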
Two different approaches
Constrain the region size
Shrink the region to maintain good locality
(Parzen Windows)
Constrain the sample size
Enlarge the number of samples to maintain
good resolution (Kn-nearest-neighbors)
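Parzen Windows
Use a windowing function, e.g.
$$\varphi(x) = \begin{cases} 1 & |x| \le 1/2 \\ 0 & \text{otherwise} \end{cases} \qquad \text{or} \qquad \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$
A sequence of n regions can be defined
$$\varphi_n(x) = \frac{1}{h_n}\,\varphi\!\left(\frac{x}{h_n}\right)$$
By definition (in one dimension $V_n = h_n$)
$$k_n = \sum_{i=1}^{n} \varphi\!\left(\frac{x - x_i}{h_n}\right)$$
$$p_n(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{V_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right) = \frac{1}{n}\sum_{i=1}^{n} \varphi_n(x - x_i)$$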
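A minimal one-dimensional sketch of the Parzen-window estimate, assuming $V_n = h_n$ and the two window functions named on the slide. The function names `box`, `gauss`, and `parzen`, the bandwidth h = 0.3, and the Gaussian test data are illustrative assumptions.

```python
import numpy as np

def box(u):
    """Unit hypercube window: 1 where |u| <= 1/2, 0 otherwise."""
    return (np.abs(u) <= 0.5).astype(float)

def gauss(u):
    """Standard Gaussian window."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def parzen(x, samples, h, window=gauss):
    """p_n(x) = (1/n) * sum_i (1/h) * window((x - x_i) / h), evaluated at each x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    u = (x[:, None] - samples[None, :]) / h
    return window(u).sum(axis=1) / (samples.size * h)

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, 2000)
grid = np.linspace(-6.0, 6.0, 601)
p_hat = parzen(grid, samples, h=0.3)
area = p_hat.sum() * (grid[1] - grid[0])   # should be close to 1
```

A smaller `h` gives a sharper but more erratic estimate, and a larger `h` gives a smoother one.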
Parzen Window (cont.)
As n increases
The window becomes narrower (by hn)
The window becomes taller (by 1/Vn)
Sampling with smaller aperture but
higher focus
The same 100 dollars collected from 100
people and from 1 person is different
(per person)
$$\int \varphi_n(x - x_i)\,dx = \int \frac{1}{V_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right)dx = \int \varphi(u)\,du = 1$$
[Figure: $p_n(x)$ plotted against x]
Small n: large aperture, smoothed, fuzzy estimate
Large n: small aperture, sharp, erratic estimate
[Figure: Parzen-window estimates for n = 1, 16, 256, and n → ∞]
2D Sampling
Five samples
Windowing func:
[Figure: 2D Parzen-window estimates arranged by number of samples and window size]
[Figure: 2D Parzen-window estimates for n = 1, 16, 256, and n → ∞]
Does it work?
“Work” in the sense that if you are able to shrink the window size as much as you want (while simultaneously increasing the number of samples available), then the limit of the profile should be the correct probability density
This implies (treating $p_n$ as a random variable) that, as n goes to infinity,
E($p_n$(x)) -> p(x)
Var($p_n$(x)) -> 0
Convergence of Mean
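Will $p_n(x)$ go to p(x)?
If n goes to infinity
$x_i$ will cover all possible x (summation becomes integration)
with the p(x) distribution (weighted by p(x))
Sample v appears with probability p(v)
$$\bar p_n(x) = E[p_n(x)] = \frac{1}{n}\sum_{i=1}^{n} E\!\left[\frac{1}{V_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right)\right] = \int \frac{1}{V_n}\,\varphi\!\left(\frac{x - v}{h_n}\right) p(v)\,dv$$
$$\to \int \delta(x - v)\,p(v)\,dv = p(x) \quad \text{as } n \to \infty$$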
Convergence of Variance
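Will $p_n(x)$ always end up at p(x) for certain?
$nV_n$ must approach infinity, even when $V_n$ goes to zero
$$\sigma_n^2(x) = \sum_{i=1}^{n} E\!\left[\left(\frac{1}{nV_n}\,\varphi\!\left(\frac{x - x_i}{h_n}\right) - \frac{1}{n}\,\bar p_n(x)\right)^{2}\right] = n\,E\!\left[\frac{1}{n^2V_n^2}\,\varphi^2\!\left(\frac{x - x_i}{h_n}\right)\right] - \frac{1}{n}\,\bar p_n^2(x)$$
$$= \frac{1}{nV_n}\int \frac{1}{V_n}\,\varphi^2\!\left(\frac{x - v}{h_n}\right) p(v)\,dv - \frac{1}{n}\,\bar p_n^2(x) \le \frac{\sup(\varphi)\,\bar p_n(x)}{nV_n}$$
-> 0 as n -> infinity, provided $nV_n$ -> infinity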
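The two convergence claims can be checked empirically. The sketch below is an illustration under assumptions: samples come from a standard normal, the window is Gaussian, and the bandwidth shrinks as $h_n = 1/\sqrt{n}$ so that $nV_n$ grows. The names `parzen_at` and `trial_stats` are hypothetical.

```python
import numpy as np

def parzen_at(x, samples, h):
    """Gaussian-window Parzen estimate p_n(x) at a single point x."""
    u = (x - samples) / h
    return np.mean(np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)) / h

def trial_stats(n, trials=200, seed=0):
    """Mean and variance of p_n(0) over repeated draws of n standard-normal samples."""
    rng = np.random.default_rng(seed)
    h = 1.0 / np.sqrt(n)                     # V_n -> 0 while n * V_n = sqrt(n) -> infinity
    est = np.array([parzen_at(0.0, rng.normal(size=n), h) for _ in range(trials)])
    return est.mean(), est.var()

mean_small, var_small = trial_stats(100)
mean_large, var_large = trial_stats(10_000)
```

With more samples the average estimate moves toward the true value p(0) = 1/sqrt(2π) and the spread across trials shrinks.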
kn-nearest-neighbor
The Parzen window size is hard to choose
Constrain the number of data items instead
of the size of the window
Enlarge the window around x until it
encloses $k_n$ samples (e.g. $k_n = \sqrt{n}$), then
$$p_n(x) = \frac{k_n/n}{V_n}$$
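A minimal one-dimensional sketch of the $k_n$-nearest-neighbor density estimate, assuming the "volume" is the length of the smallest interval centered at x that reaches the k-th nearest sample. The name `knn_density` and the uniform test data are illustrative assumptions.

```python
import numpy as np

def knn_density(x, samples, k):
    """p_n(x) = (k/n) / V_n, with V_n the interval around x that reaches the k-th nearest sample."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    samples = np.asarray(samples, dtype=float)
    dist = np.sort(np.abs(x[:, None] - samples[None, :]), axis=1)
    radius = dist[:, k - 1]          # distance to the k-th nearest sample
    volume = 2.0 * radius            # 1-D volume of the enclosing interval
    return (k / samples.size) / volume

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, 10_000)
k = int(np.sqrt(samples.size))       # k_n = sqrt(n)
p_hat = knn_density([0.5], samples, k)
```

Note that if x coincides with a sample and k = 1, the radius is zero and the estimate is infinite.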
kn-nearest-neighbor
Intuitively, as n increases
kn should increase (for good representation)
Vn should decrease (for good localization)
The following conditions guarantee
convergence
$$\lim_{n\to\infty} k_n = \infty, \qquad \lim_{n\to\infty} \frac{k_n}{n} = 0$$
Sharp spikes around data points:
With $k_n$ = 1, the density estimate is infinite at each data point
(a region of zero size already captures 1 sample)
[Figure: $k_n$-nearest-neighbor estimates for (n, $k_n$) = (1, 1), (16, 4), (256, 16), and n → ∞]
An Example
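Estimating $p(\omega_i \mid x)$
n tagged samples
a volume V around x captures k samples,
$k_i$ of them are of class $\omega_i$
$$p_n(x, \omega_i) = \frac{k_i/n}{V}$$
$$p_n(\omega_i \mid x) = \frac{p_n(x, \omega_i)}{\sum_{j=1}^{c} p_n(x, \omega_j)} = \frac{(k_i/n)/V}{\sum_{j=1}^{c} (k_j/n)/V} = \frac{k_i}{k}$$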
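The result $k_i/k$ translates directly into code. This is a sketch under assumptions: Euclidean distance, integer class labels 0..c-1, and the hypothetical helper name `knn_posterior`. The toy points are made up for illustration.

```python
import numpy as np

def knn_posterior(x, samples, labels, k, n_classes):
    """Estimate p(class i | x) as k_i / k among the k nearest tagged samples."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(samples - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]                       # indices of the k nearest samples
    counts = np.bincount(labels[nearest], minlength=n_classes)
    return counts / k

samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0]]
labels = [0, 0, 0, 1, 1]
post = knn_posterior([0.05, 0.05], samples, labels, k=3, n_classes=2)
```

Labeling x with the class of largest `post` entry gives the k-nearest-neighbor classification rule.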
Comparison
Parametric
simple and analytical
may not fit real-world densities well
Non-parametric
flexible and can fit arbitrary densities
need to remember all samples
One Final Note
Here we treat the Parzen window and $k_n$-nearest-neighbor
rules as ways to estimate a
single probability density
These rules are equally useful for labeling a
sample against multiple candidate classes
(densities)
More on that under linear discriminant functions
More Realistic Scenarios
Drake’s Equation
Rate of star formation, fraction of stars having
planets, average # of planets per star that can
support life, fraction of such planets that actually
develop life, fraction of those that go on to
develop civilizations, fraction of such civilizations that have
communication, length of time such civilizations
actually release signals
More Realistic Scenarios
Chance that a person develops cancer
(ancestry, birth place, how raised, living
habits, education history, work history,
exercise habits, income, debt, food intake,
etc.)
Chance that a person contributes to a political
campaign (…)
Curse of Dimensionality
Not practical to estimate distributions in
such high-dimensional spaces
# of samples needed grows exponentially with the
number of dimensions
Practical Usage
X = rand(3,3)
Sampling based on a certain distribution
(default is uniform)
Need to evaluate certain expectations
Technology advances by alien contact
Life expectancy (for cancer case)
Amount of money for political campaigns
General Idea
Finite number of samples: use the sample
mean/variance to estimate the population
mean/variance
z(l) , l = 1, …, L
Samples may not be independent
Some distributions (e.g. uniform) are easier to sample
than others
f(z) may be small in regions where p(z) is large and
vice versa
From One to Another
z: uniform
y: any known distribution
Sampling z uniformly and transforming it to y is equivalent to
sampling y according to p(y)
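One standard way to map uniform z to a known distribution is the inverse-CDF transform. The slide does not name the transform, so this choice is an assumption. The sketch uses the exponential distribution, whose CDF 1 - exp(-rate * y) inverts in closed form; the name `sample_exponential` is hypothetical.

```python
import numpy as np

def sample_exponential(rate, size, rng):
    """Draw z ~ Uniform[0, 1) and map it through the inverse CDF y = -ln(1 - z) / rate."""
    z = rng.uniform(0.0, 1.0, size)
    return -np.log1p(-z) / rate

rng = np.random.default_rng(0)
y = sample_exponential(2.0, 100_000, rng)   # true mean is 1 / rate = 0.5
```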
Multi-Dimensional
Much more difficult
Do not know the form
Cannot get enough samples to populate the
landscape
How to generate IID samples?
Rejection Sampling
A real distribution p(z)
A proposal distribution q(z), with a constant k chosen so that k q(z) >= p(z) for all z
Procedure
Generate z0 from q(z)
Generate u0 from [0, k q(z0)] uniformly
Reject the sample if u0 > p(z0)
Otherwise, accept
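The procedure can be sketched as follows. The target p(z) = 6z(1 - z) on [0, 1], the uniform proposal, and the constant k = 1.5 are assumptions chosen so that k q(z) >= p(z) holds everywhere; `rejection_sample` is a hypothetical name.

```python
import numpy as np

def rejection_sample(p, q_sample, q_pdf, k, size, rng):
    """Draw `size` samples from p using proposal q, assuming k * q(z) >= p(z) for all z."""
    out = []
    while len(out) < size:
        z0 = q_sample(rng)                       # candidate from the proposal
        u0 = rng.uniform(0.0, k * q_pdf(z0))     # uniform height under the envelope k q(z0)
        if u0 <= p(z0):                          # keep only points that fall under p
            out.append(z0)
    return np.array(out)

rng = np.random.default_rng(0)
p = lambda z: 6.0 * z * (1.0 - z)                # target density on [0, 1], maximum 1.5
draws = rejection_sample(p, lambda r: r.uniform(0.0, 1.0), lambda z: 1.0, 1.5, 5000, rng)
```

The acceptance rate is 1/k when p is normalized, so a tight envelope wastes fewer candidates.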
Importance Sampling
A real distribution p(z)
A proposal distribution q(z)
Procedure
Generate z(l) from q(z); nothing is rejected
p(z(l))/q(z(l)): importance weight that accounts for
sampling from the wrong distribution
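A minimal sketch of an importance-sampling estimate of an expectation E_p[f(z)], averaging the weighted values of f over draws from q. The target N(0, 1), the proposal N(0, 2²), the test function f(z) = z², and the helper names are all assumptions for illustration.

```python
import numpy as np

def normal_pdf(z, sigma):
    """Density of a zero-mean normal with standard deviation sigma."""
    return np.exp(-0.5 * (z / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def importance_estimate(f, p_pdf, q_pdf, q_draws):
    """Estimate E_p[f] as the mean of w * f over draws from q, with w = p / q."""
    w = p_pdf(q_draws) / q_pdf(q_draws)          # importance weights
    return np.mean(w * f(q_draws))

rng = np.random.default_rng(0)
z = rng.normal(0.0, 2.0, 200_000)                # draws from the proposal q = N(0, 2^2)
est = importance_estimate(lambda t: t**2,
                          lambda t: normal_pdf(t, 1.0),
                          lambda t: normal_pdf(t, 2.0),
                          z)                      # true value E_p[z^2] = 1
```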
MCMC
Imagine
A very high-dimensional space
Samples occupy low-dimensional manifold in
such a high-dimensional space
Choose a random start point
Wander about in the space, seeking out places
with samples
With the right “seek” strategy, samples generated
along the walk have the right population
characteristics
MCMC
Successive sampling points are NOT
independent, but form a Markov chain
A candidate z* is generated at each step and accepted if its
acceptance probability exceeds a threshold drawn uniformly from (0, 1)
It can be shown that the distribution of z(t)
tends to p(z) as t -> infinity
So the distribution of the z's after some
initial steps can be used to approximate p(z)
For the Metropolis algorithm, q has to be
symmetrical: q(a|b) = q(b|a)
Metropolis-Hastings
f(x): proportional to p(x), the target distribution
Given:
x0: first sample
Q(x’|x): Markov process to generate the next sample
(x’) given the current sample (x); Q must be
symmetrical (e.g., Gaussian)
Iteration:
Pick x’ from Q(x’|x)
Compute r = f(x’)/f(x). If r >= 1, accept x’; otherwise accept x’ with
probability r. If rejected, keep the current sample (x’ = x)
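The iteration above can be sketched as follows, with a symmetric Gaussian proposal. The unnormalized target exp(-(x - 3)²/2), the step size, the burn-in length, and the name `metropolis` are illustrative assumptions.

```python
import numpy as np

def metropolis(f, x0, n_steps, step, rng):
    """Random-walk Metropolis: f is proportional to the target density."""
    x = x0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        x_new = x + rng.normal(0.0, step)        # symmetric Gaussian proposal Q(x'|x)
        r = f(x_new) / f(x)
        if r >= 1.0 or rng.uniform() < r:        # accept outright, or with probability r
            x = x_new
        chain[t] = x                             # a rejected step repeats the current sample
    return chain

rng = np.random.default_rng(0)
f = lambda x: np.exp(-0.5 * (x - 3.0) ** 2)      # unnormalized N(3, 1)
chain = metropolis(f, 0.0, 50_000, 1.0, rng)
kept = chain[5_000:]                             # discard the initial steps
```

Only the ratio f(x')/f(x) is used, so the normalizing constant of p never has to be computed.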
Intuition
Gibbs Sampling
Special case of MCMC Metropolis-
Hastings
From x(i) to x(i+1) by component-wise
sampling; the j-th variable in x(i+1) depends on
variables 1 to j-1 from the (i+1)-th iteration
variables j+1 to n from the i-th iteration
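A minimal sketch of component-wise sampling for a two-variable case, assuming a zero-mean bivariate normal target with unit variances and correlation rho, whose conditionals are N(rho * other, 1 - rho²). The name `gibbs_bivariate_normal` and the burn-in of 1,000 steps are illustrative assumptions.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_steps, rng):
    """Gibbs sampling for a standard bivariate normal with correlation rho."""
    x = np.zeros(2)
    chain = np.empty((n_steps, 2))
    sd = np.sqrt(1.0 - rho**2)                   # conditional standard deviation
    for i in range(n_steps):
        x[0] = rng.normal(rho * x[1], sd)        # uses x[1] from the previous iteration
        x[1] = rng.normal(rho * x[0], sd)        # uses x[0] already updated in this iteration
        chain[i] = x
    return chain

rng = np.random.default_rng(0)
chain = gibbs_bivariate_normal(0.8, 20_000, rng)[1_000:]
corr = np.corrcoef(chain.T)[0, 1]
```

Every draw is accepted, because each component is sampled exactly from its conditional distribution.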
Slice Sampling
Random walk under the probability curve
Start from an x0 with f(x0) > 0
Randomly select a height y, 0 < y <= f(x0)
Randomly select x’ lying within the slice {x : f(x) >= y}, then repeat from x’
[Figure: curve f(x) showing the point x0, its height f(x0), the sampled height y, and the horizontal slice at height y]
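A minimal sketch of the slice-sampling loop, assuming the unnormalized Gaussian f(x) = exp(-x²/2), for which the slice at height y is the interval |x| <= sqrt(-2 ln y) and can be sampled exactly. For a general f the slice has to be found numerically, which is not shown here. The name `slice_sample_gaussian` is hypothetical.

```python
import numpy as np

def slice_sample_gaussian(x0, n_steps, rng):
    """Slice sampling for f(x) = exp(-x^2 / 2), using the closed-form slice."""
    f = lambda x: np.exp(-0.5 * x**2)            # unnormalized N(0, 1)
    x = x0
    chain = np.empty(n_steps)
    for t in range(n_steps):
        y = f(x) * (1.0 - rng.uniform())         # height drawn uniformly in (0, f(x)]
        half_width = np.sqrt(-2.0 * np.log(y))   # slice is |x'| <= half_width
        x = rng.uniform(-half_width, half_width) # uniform draw within the slice
        chain[t] = x
    return chain

rng = np.random.default_rng(0)
chain = slice_sample_gaussian(1.0, 50_000, rng)
```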