A Geometric Perspective on Machine Learning
何晓飞 (Xiaofei He), College of Computer Science, Zhejiang University
Machine Learning: the problem
Information (training data) → f
f: X → Y, where X and Y are usually considered as Euclidean spaces.
Manifold Learning: geometric perspective
The data space may not be a Euclidean space, but a nonlinear manifold.
☒ Euclidean distance. ☒ f is defined on Euclidean space. ☒ Ambient dimension.
Instead:
☑ Geodesic distance. ☑ f is defined on the nonlinear manifold. ☑ Manifold dimension.
Manifold Learning: the challenges
The manifold is unknown! We have only samples!
How do we know whether M is a sphere, a torus, or something else?
How do we compute distances on M?
This is unknown: the manifold M (a sphere? a torus? something else?).
This is what we have: sample points.
Tools: topology, geometry, functional analysis.
Manifold Learning: current solution
Find a Euclidean embedding, and then perform traditional learning algorithms in the Euclidean space.
Simplicity
Simplicity is relative
Manifold-based Dimensionality Reduction
Given high dimensional data sampled from a low dimensional manifold, how to compute a faithful embedding?
How do we find the mapping function f?
How do we efficiently find the projective function f?
A Good Mapping Function
If x_i and x_j are close to each other, we hope f(x_i) and f(x_j) preserve the local structure (distance, similarity, …).
k-nearest neighbor graph:
Objective function: different algorithms have different concerns.
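For concreteness, the locality-preserving objective that these algorithms share can be written out; the heat-kernel weights below are the standard choice and are reconstructed rather than taken from the slide:

$$\min_f \sum_{i,j} \|f(x_i) - f(x_j)\|^2 \, W_{ij}, \qquad W_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{t}\right) \text{ if } x_i, x_j \text{ are neighbors, else } 0.$$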
Locality Preserving Projections
Principle: if x_i and x_j are close, then their maps y_i and y_j are also close.
Mathematical formulation: minimize the integral of the gradient of f.
Stokes' Theorem:
LPP finds a linear approximation to the nonlinear manifold, while preserving the local geometric structure.
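Written out, the derivation the slide alludes to runs as follows (standard LPP, with Δ the Laplace-Beltrami operator; the discrete objective uses the graph Laplacian L = D − W of the k-nearest neighbor graph, and a linear map f(x) = aᵀx yields a generalized eigenproblem):

$$\min_f \int_{\mathcal{M}} \|\nabla f\|^2 \;=\; \min_f \int_{\mathcal{M}} f\,\Delta f \;\;\longrightarrow\;\; \min_{\mathbf{f}} \mathbf{f}^T L\,\mathbf{f}, \qquad X L X^T \mathbf{a} = \lambda\, X D X^T \mathbf{a}.$$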
Manifold of Face Images
Expression (Sad >>> Happy)
Pose (Right >>> Left)
Manifold of Handwritten Digits
Thickness
Slant
Active and Semi-Supervised Learning: A Geometric Perspective
Linear Regression Model
Learning target:
Training examples:
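The slide's equations did not survive extraction; the standard linear regression model it refers to is (with z_i denoting the measured training points, matching the notation of the later slides):

$$y = \mathbf{w}^T \mathbf{x} + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2), \qquad \text{training examples } \{(\mathbf{z}_i, y_i)\}_{i=1}^{k}.$$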
Generalization Error
Goal of regression: obtain a learned function that minimizes the generalization error (expected error for unseen test input points).
Maximum Likelihood Estimate
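Under Gaussian noise the maximum likelihood estimate coincides with least squares; spelled out (standard result, with Z = [z_1, …, z_k]):

$$\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \sum_{i=1}^{k} (\mathbf{w}^T \mathbf{z}_i - y_i)^2 = (Z Z^T)^{-1} Z \mathbf{y}.$$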
Gauss-Markov Theorem
For a given x, the expected prediction error is:
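The lost equation is the standard bias/variance decomposition for the least-squares estimator; at a test point x:

$$E\big[(\hat{\mathbf{w}}^T \mathbf{x} - y)^2\big] = \sigma^2 + \sigma^2\, \mathbf{x}^T (Z Z^T)^{-1} \mathbf{x},$$

i.e. irreducible noise plus estimation variance, and the variance term depends only on which points Z were measured.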
[Figure: two sampling designs compared on the same axes, labeled "Good!" and "Bad!".]
Experimental Design Methods
The three most common scalar measures of the size of the parameter covariance matrix Cov(w):
A-optimal design: trace of Cov(w).
D-optimal design: determinant of Cov(w).
E-optimal design: maximum eigenvalue of Cov(w).
Disadvantage: these methods fail to take unmeasured (unlabeled) data points into account.
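These criteria all act on the same matrix; for the ordinary least-squares estimator above (a standard experimental-design fact, stated here for concreteness):

$$\mathrm{Cov}(\hat{\mathbf{w}}) = \sigma^2 (Z Z^T)^{-1}.$$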
Manifold Regularization: Semi-Supervised Setting
Measured (labeled) points: discriminant structure.
Unmeasured (unlabeled) points: geometrical structure.
[Figure sequence: random labeling → active learning → active learning + semi-supervised learning.]
Unlabeled Data to Estimate Geometry
Measured (labeled) points: discriminant structure.
Unmeasured (unlabeled) points: geometrical structure.
Compute the nearest neighbor graph G.
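A minimal sketch of this graph construction in numpy (the function name, the symmetrization rule, and the heat-kernel weights are my choices; the slides do not fix them):

```python
import numpy as np

def knn_graph_laplacian(X, k=5, t=1.0):
    """k-NN graph over the rows of X (m points, d features); returns L = D - W.

    Note: the slides write data points as columns of X, so their X L X^T
    corresponds to X.T @ L @ X in this row-major convention.
    """
    m = X.shape[0]
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                     # exclude self-matches
    W = np.zeros((m, m))
    for i in range(m):
        nbrs = np.argsort(d2[i])[:k]                 # k nearest neighbors of point i
        W[i, nbrs] = np.exp(-d2[i, nbrs] / t)        # heat-kernel weights (a common choice)
    W = np.maximum(W, W.T)                           # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W                # graph Laplacian L = D - W
```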
Laplacian Regularized Least Squares (Belkin and Niyogi, 2006)
Linear objective function:
Solution:
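Reconstructed to agree with the estimator that reappears on the bias/variance slide below, the linear objective and its closed-form solution are:

$$J(\mathbf{w}) = \sum_{i=1}^{k} (\mathbf{w}^T \mathbf{z}_i - y_i)^2 + \frac{\lambda_1}{2} \sum_{i,j=1}^{m} (\mathbf{w}^T \mathbf{x}_i - \mathbf{w}^T \mathbf{x}_j)^2 S_{ij} + \lambda_2 \|\mathbf{w}\|^2$$

$$\hat{\mathbf{w}} = (Z Z^T + \lambda_1 X L X^T + \lambda_2 I)^{-1} Z \mathbf{y}, \qquad L = D - S.$$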
Active Learning
How do we find the most representative points on the manifold?
Active Learning
Objective: guide the selection of the subset of data points that gives the greatest amount of information.
Experimental design: select samples to label.
Manifold Regularized Experimental Design: shares the same objective function as Laplacian Regularized Least Squares, simultaneously minimizing the least-square error on the measured samples while preserving the local geometrical structure of the data space.
Analysis of Bias and Variance
In order to make the estimator as stable as possible, the size of the covariance matrix should be as small as possible.
D-optimality: minimize the determinant of the covariance matrix.
$$\mathrm{Cov}(\mathbf{y}) = \sigma^2 I, \qquad \hat{\mathbf{w}} = (Z Z^T + \lambda_1 X L X^T + \lambda_2 I)^{-1} Z \mathbf{y}$$
$$H = Z Z^T + \lambda_1 X L X^T + \lambda_2 I, \qquad \mathrm{Cov}(\hat{\mathbf{w}}) = \sigma^2 H^{-1} Z Z^T H^{-1}$$
Manifold Regularized Experimental Design
The algorithm. Objective: choose $Z = (\mathbf{z}_1, \ldots, \mathbf{z}_k)$, selected from $\{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$, to maximize $\det H$.
1. Select the first data point $\mathbf{z}_1$ such that $\mathbf{z}_1^T (\lambda_1 X L X^T + \lambda_2 I)^{-1} \mathbf{z}_1$ is maximized, and set $H_1 = \mathbf{z}_1 \mathbf{z}_1^T + \lambda_1 X L X^T + \lambda_2 I$.
2. Suppose $k$ points have been selected; choose the $(k+1)$-th point $\mathbf{z}_{k+1} = \arg\max_{\mathbf{z}} \; \mathbf{z}^T H_k^{-1} \mathbf{z}$.
3. Update
$$H_{k+1}^{-1} = H_k^{-1} - \frac{H_k^{-1} \mathbf{z}_{k+1} \mathbf{z}_{k+1}^T H_k^{-1}}{1 + \mathbf{z}_{k+1}^T H_k^{-1} \mathbf{z}_{k+1}}.$$
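A compact sketch of this greedy loop in numpy (one reading of the slide's update rule; the function name and default parameters are mine):

```python
import numpy as np

def manifold_regularized_design(X, L, k, lam1=1.0, lam2=0.1):
    """Greedily select k columns of X (d x m, points as columns).

    Keeps H^{-1} up to date with the Sherman-Morrison rank-one formula
    instead of re-inverting H after every selection.
    (Illustrative sketch; names and defaults are not from the slides.)
    """
    d = X.shape[0]
    H_inv = np.linalg.inv(lam1 * X @ L @ X.T + lam2 * np.eye(d))
    selected = []
    for _ in range(k):
        scores = np.einsum('dm,de,em->m', X, H_inv, X)  # z^T H^{-1} z per candidate
        scores[selected] = -np.inf                      # never pick a point twice
        j = int(np.argmax(scores))
        selected.append(j)
        z = X[:, j:j+1]
        Hz = H_inv @ z
        # Sherman-Morrison: (H + z z^T)^{-1} = H^{-1} - H^{-1} z z^T H^{-1} / (1 + z^T H^{-1} z)
        H_inv -= (Hz @ Hz.T) / (1.0 + (z.T @ Hz).item())
    return selected
```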
Nonlinear Generalization in RKHS
Consider the feature space $\mathcal{F}$ induced by some nonlinear mapping $\varphi$, with $\langle \varphi(\mathbf{x}_i), \varphi(\mathbf{x}_j) \rangle = K(\mathbf{x}_i, \mathbf{x}_j)$.
K(·, ·): positive semi-definite kernel function.
Regression model in RKHS:
$$y = \boldsymbol{\nu}^T \varphi(\mathbf{x}), \qquad \boldsymbol{\nu} \in \mathcal{F}$$
Objective function in RKHS:
$$J_{\mathrm{LapRLS}}(\boldsymbol{\nu}) = \sum_{i=1}^{k} (\boldsymbol{\nu}^T \varphi(\mathbf{z}_i) - y_i)^2 + \frac{\lambda_1}{2} \sum_{i,j=1}^{m} (\boldsymbol{\nu}^T \varphi(\mathbf{x}_i) - \boldsymbol{\nu}^T \varphi(\mathbf{x}_j))^2 S_{ij} + \lambda_2 \|\boldsymbol{\nu}\|_{\mathcal{F}}^2$$
By the representer theorem, $\boldsymbol{\nu} = \sum_{i=1}^{m} \alpha_i \varphi(\mathbf{x}_i)$.
Kernel Graph Regularized Experimental Design
$$\mathrm{Cov}(\hat{\boldsymbol{\alpha}}) = \sigma^2 \left( K_{XZ} K_{ZX} + \lambda_1 K_{XX} L K_{XX} + \lambda_2 K_{XX} \right)^{-1}$$
The algorithm. Objective: choose $Z = (\mathbf{z}_1, \ldots, \mathbf{z}_k)$, selected from $\{\mathbf{x}_1, \ldots, \mathbf{x}_m\}$, to maximize $\det\left( K_{XZ} K_{ZX} + \lambda_1 K_{XX} L K_{XX} + \lambda_2 K_{XX} \right)$ (here $\mathbf{v}_i$ denotes the kernel vector of $\mathbf{z}_i$ against all $m$ points).
1. Select the first data point such that $\mathbf{v}_1^T (\lambda_1 K_{XX} L K_{XX} + \lambda_2 K_{XX})^{-1} \mathbf{v}_1$ is maximized, and set $M_1 = \mathbf{v}_1 \mathbf{v}_1^T + \lambda_1 K_{XX} L K_{XX} + \lambda_2 K_{XX}$.
2. Suppose $k$ points have been selected; choose the $(k+1)$-th point $\mathbf{v}_{k+1} = \arg\max_{\mathbf{v}} \; \mathbf{v}^T M_k^{-1} \mathbf{v}$.
3. Update
$$M_{k+1}^{-1} = M_k^{-1} - \frac{M_k^{-1} \mathbf{v}_{k+1} \mathbf{v}_{k+1}^T M_k^{-1}}{1 + \mathbf{v}_{k+1}^T M_k^{-1} \mathbf{v}_{k+1}}.$$
A Synthetic Example
[Figures: A-optimal Design vs. Laplacian Regularized Optimal Design.]
Application to image/video compression
Video compression
Topology
Can we always map a manifold to a Euclidean space without changing its topology?
Topology
Concepts: sample points, good cover, simplicial complex, homology group, homotopy, Betti numbers, Euler characteristic (number of components, dimension, …).
Topology
The Euler characteristic is a topological invariant: a number that describes one aspect of a topological space's shape or structure.
[Figure: example spaces with Euler characteristics 1, −2, 0, 1, 2, 0, 0.]
The Euler characteristic of Euclidean space is 1!
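For concreteness (standard definitions, not taken from the slide): for a simplicial complex the Euler characteristic is the alternating count of simplices, and it equals the alternating sum of the Betti numbers:

$$\chi = \sum_i (-1)^i k_i = \sum_i (-1)^i b_i,$$

where $k_i$ is the number of $i$-simplices and $b_i$ the $i$-th Betti number; e.g. $\chi(\text{sphere}) = 2$, $\chi(\text{torus}) = 0$, $\chi(\mathbb{R}^n) = 1$.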
Challenges
Insufficient sample points. Choosing a suitable radius. How to identify noisy holes (user interaction?).
[Figure: a noisy hole; homotopy vs. homeomorphism.]
Q & A