A Survey on Distance Metric Learning (Part 2)
A Survey on Distance Metric Learning (Part 2)
Gerry Tesauro
IBM T.J.Watson Research Center
Acknowledgement
• Lecture material shamelessly adapted from the following sources:
– Kilian Weinberger:
• "Survey on Distance Metric Learning" slides
• IBM summer intern talk slides (Aug. 2006)
– Sam Roweis slides (NIPS 2006 workshop on "Learning to Compare Examples")
– Yann LeCun talk slides (CVPR 2005, 2006)
Outline – Part 2
Neighbourhood Components Analysis (Goldberger et al.), Metric Learning by Collapsing Classes (Globerson & Roweis)
Metric Learning for Kernel Regression (Weinberger & Tesauro)
Metric learning for RL basis function construction (Keller et al.)
Similarity learning for image processing (LeCun et al.)
Neighborhood Component Analysis
(Goldberger et al. 2004)
Distance metric for visualization and kNN
Metric Learning for Kernel Regression
Weinberger & Tesauro, AISTATS 2007
Killing three birds with one stone:
We construct a method for linear dimensionality reduction that generates a meaningful distance metric, optimally tuned for distance-based kernel regression.
Kernel Regression
• Given a training set {(xj, yj), j = 1,…,N}, where each x is a d-dimensional vector and y is real-valued, estimate the value of a test point xi by a weighted average of the samples:

$\hat{y}_i = \frac{\sum_j k_{ij}\, y_j}{\sum_j k_{ij}}$

where $k_{ij} = k_D(x_i, x_j)$ is a distance-based kernel function using distance metric D
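As a sketch, the weighted-average estimator above is a few lines of numpy (Gaussian kernel on plain Euclidean distance; all names are illustrative):

```python
import numpy as np

def kernel_regression(x_test, X, y, kernel):
    """Estimate y at x_test as a kernel-weighted average of training targets."""
    k = np.array([kernel(x_test, xj) for xj in X])  # k_ij weights
    return np.dot(k, y) / k.sum()

def gauss(xi, xj):
    """Gaussian kernel on plain Euclidean distance (sigma = 1)."""
    return np.exp(-np.sum((xi - xj) ** 2))

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
print(kernel_regression(np.array([1.0]), X, y, gauss))  # 1.0 by symmetry
```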
Choice of Kernel
• Many functional forms for kij can be used in MLKR; our empirical work uses the Gaussian kernel

$k_{ij} = \exp(-D_{ij}^2 / \sigma^2)$

where σ is a kernel width parameter (can set σ = 1 w.l.o.g. since we learn D)
• Softmax regression estimate, similar to Roweis' softmax classifier:

$\hat{y}_i = \frac{\sum_j y_j \exp(-D_{ij}^2)}{\sum_j \exp(-D_{ij}^2)}$
Distance Metric for Nearest Neighbor Regression
Learn a linear transformation that allows us to estimate the value of a test point from its nearest neighbors
Mahalanobis Metric
The distance function is a pseudo-Mahalanobis metric (generalizes Euclidean distance):

$D(x_i, x_j) = \|A(x_i - x_j)\|$, i.e. $D^2 = (x_i - x_j)^\top M (x_i - x_j)$ with $M = A^\top A$
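A minimal numpy sketch of the pseudo-Mahalanobis distance induced by a linear map A (the matrix below is illustrative):

```python
import numpy as np

def mahalanobis_sq(xi, xj, A):
    """Squared distance ||A(xi - xj)||^2 = (xi - xj)^T (A^T A) (xi - xj)."""
    d = A @ (xi - xj)
    return float(d @ d)

A = np.array([[2.0, 0.0],
              [0.0, 1.0]])          # stretches the first coordinate
xi, xj = np.array([1.0, 0.0]), np.array([0.0, 0.0])
print(mahalanobis_sq(xi, xj, A))    # 4.0; with A = I it would be the Euclidean 1.0
```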
General Metric Learning Objective
• Find a parameterized distance function Dθ that minimizes the total leave-one-out cross-validation loss

$\mathcal{L} = \sum_i (\hat{y}_i - y_i)^2$

– e.g. params θ = elements Aij of the A matrix
• Since we're solving for A, not M, the optimization is non-convex ⇒ use gradient descent
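The leave-one-out objective can be sketched directly; this assumes the Gaussian kernel with σ = 1 and excludes each point from its own estimate:

```python
import numpy as np

def loo_loss(X, y, A):
    """Total leave-one-out loss sum_i (y_hat_i - y_i)^2 under metric A."""
    n = len(X)
    total = 0.0
    for i in range(n):
        # Gaussian kernel weights, with k_ii forced to 0 (leave-one-out)
        k = np.array([0.0 if j == i else np.exp(-np.sum((A @ (X[i] - X[j])) ** 2))
                      for j in range(n)])
        y_hat = np.dot(k, y) / k.sum()
        total += (y_hat - y[i]) ** 2
    return total

X = np.array([[0.0], [1.0], [2.0]])
print(loo_loss(X, np.ones(3), np.eye(1)))  # constant targets -> loss 0.0
```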
Gradient Computation
$\frac{\partial \mathcal{L}}{\partial A} = 4A \sum_i (\hat{y}_i - y_i) \sum_j (\hat{y}_j - y_j)\, k_{ij}\, x_{ij} x_{ij}^\top$   where $x_{ij} = x_i - x_j$

For a fast implementation:
– Don't sum over all i–j pairs; only go up to ~1000 nearest neighbors for each sample i
– Maintain nearest neighbors in a heap-tree structure; update the heap tree every 15 gradient steps
– Ignore sufficiently small values of kij (< e^{-34})
– Even better data structures: cover trees, k-d trees
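Putting the pieces together, a toy descent loop on the leave-one-out loss; for brevity this uses a finite-difference gradient rather than the analytic formula, and omits the nearest-neighbor and heap accelerations:

```python
import numpy as np

def loo_loss(X, y, A):
    """Leave-one-out loss sum_i (y_hat_i - y_i)^2, Gaussian kernel under metric A."""
    n = len(X)
    total = 0.0
    for i in range(n):
        k = np.array([0.0 if j == i else np.exp(-np.sum((A @ (X[i] - X[j])) ** 2))
                      for j in range(n)])
        total += (np.dot(k, y) / k.sum() - y[i]) ** 2
    return total

def num_grad(X, y, A, eps=1e-5):
    """Central-difference approximation to dL/dA (stands in for the analytic gradient)."""
    G = np.zeros_like(A)
    for r in range(A.shape[0]):
        for c in range(A.shape[1]):
            Ap, Am = A.copy(), A.copy()
            Ap[r, c] += eps
            Am[r, c] -= eps
            G[r, c] = (loo_loss(X, y, Ap) - loo_loss(X, y, Am)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X[:, 0] ** 2            # target depends only on the first input dimension
A = np.eye(2)
loss = loo_loss(X, y, A)
start = loss
for _ in range(20):
    G = num_grad(X, y, A)
    lr = 0.5
    while lr > 1e-8:        # backtracking: shrink the step until the loss improves
        cand = A - lr * G
        cand_loss = loo_loss(X, y, cand)
        if cand_loss < loss:
            A, loss = cand, cand_loss
            break
        lr *= 0.5
print(start, "->", loss)    # the loss decreases
```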
Learned Distance Metric example
Original Euclidean metric (D < 1) vs. learned metric (D < 1)
“Twin Peaks” test
n=8000
Training:
– we added 3 dimensions with 1000% noise
– we rotated 5 dimensions randomly
Input Variance (noise vs. signal)
Test data
Output Variance (signal vs. noise)
Dim. Reduction with MLKR
• FG-NET face data: 82 persons, 984 face images with age labels
Dim. Reduction with MLKR
– Force A to be rectangular
– Project onto eigenvectors of A
– Allows visualization of data
– Power Management data (d = 21)
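Both projection routes can be sketched in a few lines (shapes and matrices below are illustrative; d = 21 as on the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 21))        # e.g. 21-dim power-management samples

# Route 1: force A to be rectangular (2 x 21) and learn it directly
A_rect = rng.normal(size=(2, 21))
Z1 = X @ A_rect.T                     # 2-D coordinates for plotting

# Route 2: learn a square A, then project onto top eigenvectors of M = A^T A
A_sq = rng.normal(size=(21, 21))
M = A_sq.T @ A_sq
w, V = np.linalg.eigh(M)              # eigenvalues in ascending order
Z2 = X @ V[:, -2:]                    # projection onto the top-2 eigenvectors
print(Z1.shape, Z2.shape)             # (100, 2) (100, 2)
```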
Robot arm results (8- and 32-dim)
(plot: regression error)
© 2006 IBM Corporation
IBM
Unity Data Center Prototype
Objective: Learn long-range resource value estimates for each application manager
State Variables (~48):
– Arrival rate
– Response Time
– Queue Length
– iatVariance
– rtVariance
Action: # of servers allocated by Arbiter
Reward: SLA(Resp. Time)
(Architecture diagram: a Resource Arbiter allocates 8 xSeries servers among application managers, two Trade3 environments (WebSphere 5.1 + DB2) and a Batch workload, each reporting Value(#srvrs); Trade3 demand arrives as HTTP req/sec under response-time SLAs; the arbiter reallocates every 5 sec to maximize total SLA revenue.)
(Tesauro, AAAI 2005; Tesauro et al., ICAC 2006)
Power & Performance Management
Objective: Manage systems to multi-discipline objectives: minimize Resp. Time and minimize Power Usage
State Variables (21):
– Power Cap
– Power Usage
– CPU Utilization
– Temperature
– # of requests arrived
– Workload intensity (# Clients)
– Response Time
Action: Power Cap
Reward: SLA(Resp. Time) – Power Usage
(Kephart et al., ICAC 2007)
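The combined reward can be sketched as follows; the exact SLA function is not given on the slide, so the linear penalty form and all constants below are hypothetical:

```python
def reward(resp_time, power_usage, sla_target=1.0, sla_gain=10.0):
    """Hypothetical reward: SLA revenue (linear payoff below the target) minus power usage."""
    sla_revenue = sla_gain * max(0.0, sla_target - resp_time)
    return sla_revenue - power_usage

print(reward(resp_time=0.5, power_usage=3.0))  # 10*0.5 - 3.0 = 2.0
```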
IBM Regression Results: TEST ERROR
(bar chart: MLKR test error vs. baselines; labels 14/47, 10/22, 3/5)
Metric Learning for RL basis function construction (Keller et al. ICML 2006)
• RL dataset of state-action-reward tuples {(si, ai, ri), i = 1,…,N}
Value Iteration
• Define an iterative "bootstrap" calculation:

$V_{k+1}(s) = \max_a \sum_{s'} P_{ss'}^a \left[ R_{ss'}^a + \gamma V_k(s') \right]$

• Each round of VI must iterate over all states in the state space
• Try to speed this up using state aggregation (Bertsekas & Castanon, 1989)
• Idea: Use NCA to aggregate states:
– project states into a lower-dim representation; keep states with similar Bellman error close together
– use the projected states to define a set of basis functions {φi}
– learn a linear value function over the basis functions: V = Σi θi φi
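The value-iteration backup above, run on a toy two-state MDP (the transition and reward tensors are made up for illustration):

```python
import numpy as np

# P[a, s, s2]: transition probabilities; R[a, s, s2]: rewards; gamma: discount
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [0.5, 0.5]]])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Q[a, s] = sum_s2 P[a, s, s2] * (R[a, s, s2] + gamma * V[s2])
    Q = np.einsum('ast,ast->as', P, R + gamma * V[None, None, :])
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new
print(V)  # state 1 is more valuable: its best action mostly yields reward 2
```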
Chopra et al. 2005: Similarity metric for image verification

Problem: Given a pair of face images, decide if they are from the same person.
Chopra et al. 2005: Similarity metric for image verification

Too difficult for a linear mapping!

Problem: Given a pair of face images, decide if they are from the same person.
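A minimal sketch of the pairwise loss such siamese approaches train on; this hinge-style contrastive form is a common variant, not necessarily the exact loss of Chopra et al., and the embeddings are assumed given:

```python
import numpy as np

def contrastive_loss(e1, e2, same, margin=1.0):
    """Pull same-person embedding pairs together; push different pairs beyond the margin."""
    energy = np.linalg.norm(e1 - e2)  # distance between the two embeddings
    if same:
        return energy ** 2
    return max(0.0, margin - energy) ** 2

a, b = np.array([0.1, 0.2]), np.array([0.1, 0.2])
print(contrastive_loss(a, b, same=True))   # identical pair, same person: 0.0
print(contrastive_loss(a, b, same=False))  # identical pair, different person: 1.0
```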