Spring 2020: Venue: Haag 315, Time: M/W 4-5:15pm
ECE 5582 Computer Vision
Lec 13: Low Dimension Embedding
Zhu Li, Dept of CSEE, UMKC
Office: FH560E, Email: [email protected], Ph: x 2346, http://l.web.umkc.edu/lizhu
slides created with WPS Office Linux and the EqualX LaTeX equation editor
Outline
Recap: Part I
Linear Algebra Refresher
SVD and Principal Component Analysis (PCA)
Laplacian Eigen Map (LEM)
Stochastic Neighborhood Embedding (SNE)
Handcrafted Feature Pipeline
An image retrieval pipeline (hand-crafted features):
Image Formation: homography, color space
Feature Computing: color histogram, filtering, edge detection, HoG, Harris detector, LoG scale space, SIFT
Feature Aggregation: BoW, VLAD, Fisher Vector, Supervector
Classification: kNN, Bayesian, SVM, kernel machine; evaluation: TPR, FPR, Precision, Recall, mAP
Knowledge/Data Base
Vector and Matrix Notations
Vector
Matrix
Vector Products
Inner Product
Outer Product
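For reference, for column vectors x, y \in R^n:
Inner product: x^T y = \sum_{i=1}^{n} x_i y_i \in R (a scalar)
Outer product: x y^T \in R^{n \times n}, a rank-1 matrix with (x y^T)_{ij} = x_i y_j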
Matrix-Vector Product
y=Ax
So y is a linear combination of the columns {a_k} of A, with weights given by the entries of x.
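Written out, with a_k denoting the k-th column of A:
y = A x = \sum_{k=1}^{n} x_k \, a_k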
Matrix Product
C=AB
Associative: ABC = (AB)C = A(BC)
Distributive: A(B+C) = AB + AC
For A: n x p and B: p x m, the product C = AB is n x m.
Outer Product/Kron
Vector outer product:
Example
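A small Matlab sketch (the values are illustrative) of the vector outer product and its relation to kron:
u = [1; 2; 3];
v = [4; 5];
outer = u * v';        % 3x2 matrix, outer(i,j) = u(i)*v(j)
K = kron(u, v');       % for vectors, kron(u, v') gives the same layout as the outer product
isequal(outer, K)      % returns true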
Matrix Transpose
Transpose
Matrix Trace and Determinant
Trace Tr(A): defined only for an n x n square matrix; Tr(A) = \sum_i a_{ii}
Determinant Det(A): the volume spanned by the columns of A, i.e., by all possible linear combinations of a_1 and a_2 (in the 2 x 2 case)
Example (2 x 2): Det(A) = |2 - 9| = 7
Eigen Values and Eigen Vectors
Definition: for an n x n matrix A, an eigenvector x \neq 0 and its eigenvalue \lambda satisfy A x = \lambda x
In Matlab: [P, V]=eig(A);
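A quick Matlab check of the definition (matrix values are illustrative):
A = [2 1; 1 3];
[P, V] = eig(A);              % columns of P are eigenvectors, diag(V) the eigenvalues
x = P(:,1); lambda = V(1,1);
norm(A*x - lambda*x)          % ~0, i.e., A*x = lambda*x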
Eigen Vectors of Symmetric Matrix
If a square matrix A: n x n is symmetric, A = A^T,
then its eigenvalues are real and its eigenvectors are orthonormal:
A = U S U^T, with U^T U = I
where S is a diagonal matrix with the eigenvalues of A.
Application: solution to the quadratic form maximization
max_x x^T A x, s.t. ||x|| = 1:
the maximum will be the largest eigenvalue, and x* will be the corresponding eigenvector of A.
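A short Matlab sketch verifying these properties on an illustrative symmetric matrix:
A = [4 1 0; 1 3 1; 0 1 2];    % symmetric, A = A'
[U, S] = eig(A);
norm(U'*U - eye(3))           % ~0: eigenvectors are orthonormal
norm(U*S*U' - A)              % ~0: A = U*S*U'
[lam_max, idx] = max(diag(S));
x_star = U(:, idx);
x_star' * A * x_star          % equals lam_max, the max of x'Ax over unit vectors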
SVD for a non-square matrix A: m x n:
A = U \Sigma V^T
SVD as Signal Expansion
The Singular Value Decomposition (SVD) of an m x n matrix A is
A = U \Sigma V^T = \sum_i \sigma_i u_i v_i^T
where the diagonal of \Sigma holds the singular values [\sigma_1, \sigma_2, ..., \sigma_r] (the square roots of the eigenvalues of A A^T), the columns of U are eigenvectors of A A^T, and the columns of V are eigenvectors of A^T A.
The outer products u_i v_i^T are the basis of A in this reconstruction.
Dimensions: A (m x n) = U (m x m) \Sigma (m x n) V^T (n x n)
The 1st-order SVD approximation of A is A_1 = \sigma_1 u_1 v_1^T, i.e., s(1,1)*u(:,1)*v(:,1)' in Matlab.
SVD approximation of an image
Very easy:
function [x] = svd_approx(x0, k)
  % rank-k reconstruction of x0 from its top-k singular triplets
  dbg = 0;
  if dbg
    x0 = fix(100*randn(4,6)); k = 2;
  end
  [u, s, v] = svd(x0);
  [m, n] = size(s);
  x = zeros(m, n);
  sgm = diag(s);
  for j = 1:k
    x = x + sgm(j)*u(:,j)*v(:,j)';
  end
SVD for Separable Filtering
Take the LoG filter for example:
h = fspecial('LoG', 11, 2.0);
[u, s, v] = svd(h);
h1 = s(1,1)*u(:,1)*v(:,1)';   % rank-1 approximation of the LoG kernel
h1 is the rank-1 SVD approximation of the LoG filter.
Many implications for deep network acceleration!
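A minimal sketch of the payoff: a rank-1 kernel can be applied as two 1-D convolutions instead of one 2-D convolution (the image and its size are illustrative):
h = fspecial('LoG', 11, 2.0);
[u, s, v] = svd(h);
col = sqrt(s(1,1)) * u(:,1);            % 11x1 column filter
row = sqrt(s(1,1)) * v(:,1)';           % 1x11 row filter
img = rand(256, 256);                   % illustrative image
y2d  = conv2(img, col*row, 'same');     % rank-1 kernel applied as a 2-D convolution
ysep = conv2(col, row, img, 'same');    % same result via two 1-D passes
norm(y2d - ysep, 'fro')                 % ~0; per-pixel cost drops from O(k^2) to O(2k)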
Norm
Vector norm: the length of a vector
Euclidean norm (L2 norm): ||x||_2 = \sqrt{\sum_i x_i^2}; in Matlab: norm(x, 2)
Lp norm: ||x||_p = ( \sum_i |x_i|^p )^{1/p}
Matrix norm: Frobenius norm ||A||_F = \sqrt{ \sum_{i,j} a_{ij}^2 }
Quadratic Form
Quadratic form f(x) = x^T A x in R:
Positive Definite (PD): for all non-zero x, x^T A x > 0
Positive Semi-Definite (PSD): for all non-zero x, x^T A x >= 0
Indefinite: there exist non-zero x_1, x_2 with x_1^T A x_1 > 0 while x_2^T A x_2 < 0
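A quick numerical check of definiteness via the signs of the eigenvalues (for a symmetric A; values are illustrative):
A = [2 -1; -1 2];                 % illustrative symmetric matrix
ev = eig((A + A')/2);             % symmetrize, then inspect eigenvalues
if all(ev > 0)
    disp('positive definite');
elseif all(ev >= 0)
    disp('positive semi-definite');
elseif all(ev < 0)
    disp('negative definite');
else
    disp('indefinite');
end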
Matrix Calculus
Gradient of f(A): \nabla_A f(A) is the matrix of partial derivatives, with (\nabla_A f(A))_{ij} = \partial f(A) / \partial A_{ij}
Matrix Gradient Properties
Hessian of f(X)
For a function f: R^n -> R, the Hessian \nabla_x^2 f(x) is the n x n matrix with entries \partial^2 f(x) / \partial x_i \partial x_j
Gradient & Hessian of the quadratic form f(x) = x^T A x:
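The standard results, which reduce to 2 A x and 2 A when A is symmetric:
\nabla_x ( x^T A x ) = ( A + A^T ) x, \qquad \nabla_x^2 ( x^T A x ) = A + A^T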
PCA - Dimension Reduction
A typical image retrieval pipeline
Image Formation -> Feature Computing -> Feature Aggregation -> Classification, backed by a Knowledge/Data Base
e.g., dense SIFT: 12000 x 128; e.g., Fisher Vector: k=64, d=128
Dimension reduction: R^d -> R^p
Outline
Recap: Part I
Linear Algebra Refresher
SVD and Principal Component Analysis (PCA)
Laplacian Eigen Map (LEM)
Stochastic Neighborhood Embedding (SNE)
Principal Component Analysis
The formulation: for data points {x_1, x_2, ...} in R^n, find a lower-dimensional representation in R^m via a projection W: m x n, s.t. the energy of the data is preserved:
\max_w w^T S w, \; s.t. \; w^T w = 1
where S is the scatter (covariance) matrix of the data.
PCA solution
Take the Lagrangian of the problem:
L(w, \lambda) = w^T S w - \lambda ( w^T w - 1 )
Take the derivative w.r.t. w; the KKT condition gives us
S w = \lambda w
This is an eigen problem: the optimal projection directions are eigenvectors of the scatter matrix, along which the data is simply scaled.
PCA – how to compute
PCA via SVD on the Covariance matrix
S: covariance, nxn
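A minimal Matlab sketch of this computation (the data and target dimension are illustrative):
X  = randn(1000, 128);                        % illustrative data: 1000 points in R^128
Xc = X - repmat(mean(X,1), size(X,1), 1);     % center the data
S  = (Xc' * Xc) / (size(X,1) - 1);            % covariance matrix, 128x128
[U, D, ~] = svd(S);                           % S is symmetric PSD: singular vectors = eigenvectors, sorted
p  = 32;                                      % target dimension
W  = U(:, 1:p);                               % top-p principal directions; diag(D) gives the energy per direction
Y  = Xc * W;                                  % projected data, 1000 x p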
2-D Data (figure): scatter plot of the example data points.
Principal Components (figure): the 1st and 2nd principal vectors of the 2-D data.
The principal vectors give the best axis to project onto (minimum RMS error); principal vectors are orthogonal.
PCA on HoGs
Matlab Implementation of PCA: [A, s, eig_values]=princomp(hogs);
HoG basis functions (figure)
PCA Application in Aggregation
SIFT aggregation: usually a PCA is done on SIFT features to reduce the dimension from 128 to, say, 24 or 32; then a GMM is trained in the reduced space (e.g., R^32) for FV encoding.
Homework-2 Aggregation: Fisher Vector aggregation of SIFT.
load ../../dataset/cdvs_sift_aggregation_test_data.mat;
[n_sift, kd_sift] = size(gd_sift_cdvs);
offs = randperm(n_sift); offs = offs(1:200*2^10);
% PCA
[A1, s1, lat1] = princomp(double(gd_sift_cdvs(offs,:)));
figure(41); hold on; grid on; stem(lat1, '.'); title('sift pca eigen values');
SIFT PCA eigenvalues (figure): the energy is concentrated in the leading dimensions.
That is why we use kd = [24, 32, 48] for the SIFT GMM in FV aggregation.
SIFT PCA Basis Functions
Capturing max variation directions
Visualizing SIFT in lower dimensional space
Project SIFTs from 2 images to 2D space
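A minimal sketch of such a plot, reusing the PCA basis A1 and the sample gd_sift_cdvs(offs,:) from the snippet above; sift1 and sift2 are assumed variable names holding the two images' SIFT descriptors (one per row):
mu = mean(double(gd_sift_cdvs(offs,:)), 1);            % mean used by the PCA
y1 = bsxfun(@minus, double(sift1), mu) * A1(:, 1:2);    % image 1 SIFTs in 2-D
y2 = bsxfun(@minus, double(sift2), mu) * A1(:, 1:2);    % image 2 SIFTs in 2-D
figure(42); hold on; grid on;
plot(y1(:,1), y1(:,2), 'b.'); plot(y2(:,1), y2(:,2), 'r.');
title('SIFT descriptors of two images in 2-D PCA space');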
Laplacian Eigen Map
Directly compute an embedding {y_k} from the input {x_k} in R^D, without an explicit projection model A (i.e., no Y = AX).
Objective function:
\min_Y \sum_{i,j} || y_i - y_j ||^2 W_{ij}
where the n x n affinity matrix W reflects the relationships of the data points in the original space X.
M. Belkin and P. Niyogi. Laplacian Eigenmaps and spectral techniques for embedding and clustering. In Advances in Neural Information Processing Systems, volume 14, pages 585–591, Cambridge, MA, USA, 2002. The MIT Press
Graph Laplacian
Graph Laplacian: L= D - W
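A small illustrative construction in Matlab (Gaussian affinities on toy points; the bandwidth sigma is an assumption; pdist2 is from the Statistics Toolbox):
X = randn(5, 2);                             % 5 toy points in R^2
sigma = 1.0;                                 % assumed affinity bandwidth
W = exp(-pdist2(X, X).^2 / (2*sigma^2));     % Gaussian affinity matrix
W = W - diag(diag(W));                       % zero the diagonal (no self-affinity)
D = diag(sum(W, 2));                         % degree matrix
L = D - W;                                   % graph Laplacian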
Laplacian Eigenmap
Minimizing
\sum_{i,j} || y_i - y_j ||^2 W_{ij}
is equivalent to
\min_Y \, tr( Y^T L Y ), \; s.t. \; Y^T D Y = I
where D is the (diagonal) degree matrix with D_{ii} = \sum_j W_{ij}.
Laplacian Eigen Map Solution
Numerically, solve the generalized eigen problem
L y = \lambda D y
where the eigenvectors corresponding to the first d smallest non-zero eigenvalues form the d-dimensional features {y_k} (the trivial constant eigenvector at \lambda = 0 is discarded).
n points give an n x n Laplacian; the first k such eigenvectors, each of length n, give a k-dimensional induced embedding.
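Putting it together, a minimal Matlab sketch under these definitions (toy data; sigma and the target dimension d are assumptions):
X = [randn(50,2); randn(50,2) + 5];          % toy data: two clusters in R^2
sigma = 1.0; d = 2;
W = exp(-pdist2(X, X).^2 / (2*sigma^2));
W = W - diag(diag(W));
D = diag(sum(W, 2));
L = D - W;
[V, E] = eig(L, D);                          % generalized eigen problem L*v = lambda*D*v
[~, idx] = sort(diag(E), 'ascend');
Y = V(:, idx(2:d+1));                        % drop the trivial constant eigenvector, keep the next d
% each row of Y is the d-dimensional embedding of the corresponding input point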
Stochastic Neighbor Embedding
Hinton's work: for high-dimensional data {x_i} (e.g., 20x20 digit images in MNIST),
find a lower-dimensional (e.g., 2-D) embedding such that their relative affinity is preserved.
Unsupervised (no label info utilized).
Probability Preserving Embedding
• Each point in the high-dimensional space has a conditional probability of picking each other point as its neighbor.
• The distribution over neighbors is based on the high-dimensional pairwise distances. If we do not have coordinates for the data points, we can use a matrix of dissimilarities instead of pairwise distances.
(Figure, high-D space: the probability of picking point j given that you start at point i.)
Problem Formulation
SNE starts by converting the Euclidean distances between pairs of high-dimensional data points, d(x_i, x_j), into conditional probabilities that represent similarity:
p_{j|i} = exp( -||x_i - x_j||^2 / 2\sigma_i^2 ) / \sum_{k \neq i} exp( -||x_i - x_k||^2 / 2\sigma_i^2 )
Its lower-dimensional embedding, mapping {x_i} in R^D to {y_i} in R^d, is given a similar conditional probability:
q_{j|i} = exp( -||y_i - y_j||^2 ) / \sum_{k \neq i} exp( -||y_i - y_k||^2 )
Note: these conditional probabilities are not symmetric.
Stochastic Neighbor Embedding (SNE): preserving the pairwise probability relationships in terms of conditional probabilities, i.e., minimizing the differences between p(j|i) and q(j|i) over all pairs {x_i, x_j} and {y_i, y_j}:
C = \sum_i KL( P_i || Q_i ) = \sum_i \sum_j p(j|i) \log \frac{ p(j|i) }{ q(j|i) }
The KL distance measures the difference between two distributions (bonus points for HW-1: using KL to measure histogram distance) and has a coding-penalty interpretation:
http://sce2.umkc.edu/csee/lizhu/teaching/2018.fall.video-com/notes/lec02.pdf
SNE solution
Gradient of the total KL distance:
\partial C / \partial y_i = 2 \sum_j ( p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j} ) ( y_i - y_j )
This gives us a gradient-descent solution: move along the negative gradient, with a momentum factor:
y_i^{(t)} = y_i^{(t-1)} - \eta \, \partial C / \partial y_i + \alpha(t) \, ( y_i^{(t-1)} - y_i^{(t-2)} )
Matlab Implementation
t-Distribution SNE example: HW-2 data embedding
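A minimal sketch using Matlab's built-in tsne (Statistics and Machine Learning Toolbox, R2017a or later); the variable names features and labels are assumptions standing in for the HW-2 data:
Y = tsne(features, 'NumDimensions', 2, 'Perplexity', 30);   % rows of features are samples
gscatter(Y(:,1), Y(:,2), labels);                            % color points by class
title('t-SNE embedding of HW-2 features');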
Summary
SVD and PCA:
SVD – non-square matrix decomposition, a left transform and a right transform with scaling in between
SVD – as an image decomposition, a linear combination of outer-product basis images
PCA – eigenvalues indicate the amount of info/energy in each dimension
PCA – the basis vectors are the eigenvectors of the covariance matrix
Laplacian Eigenmap:
Direct data embedding without an explicit projection: input data -> affinity graph -> graph Laplacian -> embedding from its eigenvectors
Stochastic Neighbor Embedding:
No explicit projection matrix; embedding by preserving probabilistic affinity; solution via a gradient algorithm