Semi-Supervised Learning in Gigantic Image Collections
Rob Fergus (NYU), Yair Weiss (Hebrew U.), Antonio Torralba (MIT)

Transcript of the slides.

Page 1:

Semi-Supervised Learning in Gigantic Image Collections

Rob Fergus (NYU), Yair Weiss (Hebrew U.), Antonio Torralba (MIT)

Page 2:

Gigantic Image Collections

What does the world look like?
• High-level image statistics
• Object recognition for large-scale search

Page 3:

Spectrum of Label Information

Human annotations → Noisy labels → Unlabeled

Page 4:

Semi-Supervised Learning using the Graph Laplacian [Zhu03, Zhou04]

G = (V, E): V = data points, E = edges weighted by the n × n affinity matrix W

W_ij = exp(−‖x_i − x_j‖² / 2ε²),   D_ii = Σ_j W_ij

Normalized graph Laplacian:

L = D^(−1/2) (D − W) D^(−1/2) = I − D^(−1/2) W D^(−1/2)
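As a concrete sketch, the affinity matrix and normalized Laplacian above take only a few lines of NumPy; the two-cluster toy data and the value of ε here are illustrative choices, not from the slides.

```python
import numpy as np

def normalized_laplacian(X, eps):
    """L = I - D^(-1/2) W D^(-1/2), with W_ij = exp(-||x_i - x_j||^2 / (2 eps^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
    W = np.exp(-np.maximum(d2, 0.0) / (2.0 * eps**2))
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))        # D_ii = sum_j W_ij
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

# Toy usage: two well-separated clusters of 10 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
L = normalized_laplacian(X, eps=1.0)
```

For n in the millions this dense n × n construction is exactly the bottleneck the later slides address.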

Page 5:

SSL using the Graph Laplacian

• Want to find the label function f that minimizes:

J(f) = f^T L f + Σ_{i=1..l} λ (f(i) − y_i)²
       (smoothness)  (agreement with labels)

• y = labels, λ = weights
• Rewrite as:

J(f) = f^T L f + (f − y)^T Λ (f − y),  where Λ_ii = λ if point i is labeled, Λ_ii = 0 if unlabeled

• Straightforward solution

Page 6:

Eigenvectors of the Laplacian

• Smooth vectors will be linear combinations of the eigenvectors U with small eigenvalues:

f = Uα,  U = [φ₁, …, φ_k]

[Belkin & Niyogi 06, Schoelkopf & Smola 02, Zhu et al. 03, 08]

Page 7:

Rewrite System

• Let U = smallest k eigenvectors of L, α = coefficients, Σ = diagonal matrix of the corresponding eigenvalues:

J(α) = α^T Σ α + (Uα − y)^T Λ (Uα − y)

• The optimal α is now the solution to a k × k system:

(Σ + U^T Λ U) α = U^T Λ y,  then f = Uα
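The reduced solve can be sketched as below; U and Σ are random stand-ins (orthonormal columns and fabricated eigenvalues) rather than true Laplacian eigenvectors, and the four labels are invented, purely to show the k × k algebra.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, lam = 100, 5, 10.0

# Stand-ins for the smallest-k eigenvectors U and the eigenvalue matrix Sigma.
U, _ = np.linalg.qr(rng.normal(size=(n, k)))       # orthonormal columns
Sigma = np.diag(np.sort(rng.uniform(0.01, 1, k)))  # diagonal of eigenvalues

# Labels: Lam_ii = lambda for labeled points, 0 for unlabeled.
y = np.zeros(n)
labeled = np.array([0, 1, 2, 3])
y[labeled] = [1.0, 1.0, -1.0, -1.0]
Lam = np.zeros((n, n))
Lam[labeled, labeled] = lam

# Solve the k x k system (Sigma + U^T Lam U) alpha = U^T Lam y, then f = U alpha.
alpha = np.linalg.solve(Sigma + U.T @ Lam @ U, U.T @ Lam @ y)
f = U @ alpha
```

The point of the rewrite is visible here: only the k × k solve depends on the system size, while f is recovered for all n points by one matrix-vector product.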

Page 8:

Computational Bottleneck

• Consider a dataset of 80 million images
• Inverting L means inverting an 80 million × 80 million matrix
• Finding eigenvectors of L means diagonalizing an 80 million × 80 million matrix

Page 9:

Large-Scale SSL: Related Work [see Zhu '08 survey]

• Nyström method: pick a small set of landmark points
– Compute the exact solution on these
– Interpolate the solution to the rest
• Others iteratively use classifiers to label data
– E.g. the boosting-based method of Loeff et al., ICML '08

[Figure: data points with a subset marked as landmarks]

Page 10:

Our Approach

Page 11:

Overview of Our Approach

[Diagram: Nyström reduces n by computing on landmark points; our approach takes the limit n → ∞ and works directly with the data density]

Page 12:

Consider the Limit as n → ∞

• Consider x to be drawn from a 2-D distribution p(x)
• Let L_p(F) be a smoothness operator on p(x), for a function F(x):

L_p(F) = ½ ∫∫ (F(x₁) − F(x₂))² W(x₁, x₂) p(x₁) p(x₂) dx₁ dx₂

where W(x₁, x₂) = exp(−‖x₁ − x₂‖² / 2ε²)

• Analyze the eigenfunctions of L_p(F)

Page 13:

Eigenvectors & Eigenfunctions

Page 14:

Key Assumption: Separability of Input Data [Nadler et al. 06, Weiss et al. 08]

• Claim: if p is separable, p(x₁, x₂) = p(x₁) p(x₂), then the eigenfunctions of the marginals are also eigenfunctions of the joint density, with the same eigenvalues.

Page 15:

Numerical Approximations to Eigenfunctions in 1-D

• 300k points drawn from distribution p(x)
• Consider the marginal p(x₁), approximated by a histogram h(x₁)

Page 16:

Numerical Approximations to Eigenfunctions in 1-D

• Solve for the values g of the eigenfunction at a set of discrete locations (histogram bin centers), and the associated eigenvalues σ, via a B × B system (# histogram bins B = 50):

P (D̃ − W̃) P g = σ P D̂ g

where W̃ = affinity between the discrete locations, P = diag(h(x₁)), D̃_ii = Σ_j W̃_ij, and D̂_ii = Σ_j (P W̃)_ij
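A numerical sketch of this B × B generalized eigenproblem, following the slide's equation (NumPy only; the Gaussian data, histogram range, and ε are illustrative choices, and the empty-bin guard is my addition):

```python
import numpy as np

def eigenfunctions_1d(x, bins=50, eps=0.2, k=3):
    """Solve P (Dt - W) P g = sigma P Dh g at the histogram bin centers,
    where W is the affinity between bin centers and P = diag(h)."""
    h, edges = np.histogram(x, bins=bins, range=(-3.0, 3.0), density=True)
    h = np.maximum(h, 1e-12)                # guard against empty bins
    c = 0.5 * (edges[:-1] + edges[1:])      # discrete locations
    W = np.exp(-((c[:, None] - c[None, :])**2) / (2.0 * eps**2))
    P = np.diag(h)
    Dt = np.diag(W.sum(axis=1))             # Dt_ii = sum_j W_ij
    Dh = np.diag((P @ W).sum(axis=1))       # Dh_ii = sum_j (P W)_ij
    A = P @ (Dt - W) @ P                    # symmetric positive semi-definite
    B = P @ Dh                              # diagonal, positive definite
    # Reduce the generalized problem A g = sigma B g to a standard symmetric one.
    Bi = np.diag(1.0 / np.sqrt(np.diag(B)))
    sigma, u = np.linalg.eigh(Bi @ A @ Bi)  # ascending eigenvalues
    return c, sigma[:k], Bi @ u[:, :k]      # back-transform the eigenvectors

x = np.random.default_rng(2).normal(size=300_000)
centers, sigma, g = eigenfunctions_1d(x)
```

Note the cost depends only on B, never on the 300k data points: the density enters solely through the histogram h.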

Page 17:

1-D Approximate Eigenfunctions

• Solve P (D̃ − W̃) P g = σ P D̂ g

[Plots: 1st, 2nd, and 3rd eigenfunctions of h(x₁)]

Page 18:

Separability over Dimension

• Build a histogram over dimension 2: h(x₂)
• Now solve for the eigenfunctions of h(x₂)

[Plots: data; 1st, 2nd, and 3rd eigenfunctions of h(x₂)]

Page 19:

From Eigenfunctions to Approximate Eigenvectors

• Take each data point
• Do a 1-D interpolation in each eigenfunction → a k-dimensional vector (for k eigenfunctions)
• Very fast operation (has to be done nk times)

[Plot: eigenfunction value vs. histogram bin (1–50)]
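The interpolation step is just a table lookup per point; this sketch uses a made-up eigenfunction (tanh over 50 bin centers) to stand in for a real g:

```python
import numpy as np

centers = np.linspace(-3, 3, 50)   # histogram bin centers
g = np.tanh(centers)               # stand-in eigenfunction values at the centers
x = np.random.default_rng(3).normal(size=1000)  # data points

# One column of the approximate eigenvector matrix; done k times (nk interps total).
approx_eigvec = np.interp(x, centers, g)
```

np.interp clamps points falling outside the histogram range to the end values, which is a reasonable behavior for density tails.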

Page 20:

Preprocessing

• Need to make the data separable
• Rotate using PCA

[Figure: not separable → rotate → separable]
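A minimal sketch of the PCA rotation (decorrelation is a pragmatic stand-in for full separability; the correlated toy data is invented):

```python
import numpy as np

def pca_rotate(X):
    """Rotate centered data into its PCA basis, decorrelating the dimensions."""
    Xc = X - X.mean(axis=0)
    _, V = np.linalg.eigh(np.cov(Xc.T))  # eigenvectors of the covariance
    return Xc @ V[:, ::-1]               # columns ordered by decreasing variance

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 1.0], [0.0, 1.0]])  # correlated
Xr = pca_rotate(X)
```

PCA only removes linear correlations, which is why the later "Are Dimensions Independent?" slides check the residual mutual information between dimension pairs.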

Page 21:

Overall Algorithm

1. Rotate data to maximize separability (currently using PCA)
2. For each dimension:
– Construct a 1-D histogram
– Solve numerically for eigenfunctions/eigenvalues
3. Order the eigenfunctions from all dimensions by increasing eigenvalue and take the first k
4. Interpolate the data into the k eigenfunctions
– Yields approximate eigenvectors of the normalized Laplacian
5. Solve a k × k least-squares system to give the label function

Page 22:

Experiments on Toy Data

Page 23:

Comparison of Approaches

[Panels: data; exact eigenvector; eigenfunction]

Page 24:

Exact vs. Approximate Eigenvalues

Exact     Approximate
0.0531    0.0535
0.1920    0.1928
0.2049    0.2068
0.2480    0.5512
0.3580    0.7979

[Panels: data; exact eigenvectors; approximate eigenvectors]

Page 25:

Nyström Comparison

• Too few landmark points results in highly unstable eigenvectors

Page 26:

Nyström Comparison

• Eigenfunctions fail when the data has significant dependencies between dimensions

Page 27:

Experiments on Real Data

Page 28:

Experiments

• Images from 126 classes (e.g. dump truck, emu) downloaded from Internet search engines, 63,000 images in total
• Labels (correct/incorrect) provided by Geoff Hinton, Alex Krizhevsky, and Vinod Nair (U. Toronto and CIFAR)

Page 29:

Input Image Representation

• Pixels are not a convenient representation
• Use the Gist descriptor (Oliva & Torralba, 2001)
• PCA down to 64 dimensions
• L2 distance between Gist vectors is a rough substitute for human perceptual distance

Page 30:

Are Dimensions Independent?

[Joint histograms for pairs of dimensions from the raw 384-dimensional Gist, and after PCA. MI is the mutual information score; 0 = independent]

Page 31:

Real 1-D Eigenfunctions of PCA'd Gist Descriptors

[Plot: eigenfunction value vs. histogram bin (1–50, spanning xmin to xmax), for eigenfunctions 1–256; color = input dimension (1–64)]

Page 32:

Protocol

• Task is to re-rank the images of each class
• Measure precision @ 15% recall
• Vary the # of labeled examples

Page 33:

[Plot: mean precision at 15% recall, averaged over 16 classes (y-axis, 0.25–0.7), vs. log₂ number of positive training examples per class (x-axis, −Inf to 7). Curves: least-squares, SVM, chance]

Page 34:

[Plot: mean precision at 15% recall, averaged over 16 classes, vs. log₂ number of positive training examples per class. Curves: Nyström, least-squares, SVM, chance]

Page 35:

[Plot: mean precision at 15% recall, averaged over 16 classes, vs. log₂ number of positive training examples per class. Curves: eigenfunction, Nyström, least-squares, SVM, chance]

Page 36:

[Plot: mean precision at 15% recall, averaged over 16 classes, vs. log₂ number of positive training examples per class. Curves: eigenfunction, Nyström, least-squares, eigenvector, SVM, NN, chance]

Page 37:

80 Million Images

Page 38:

Running on 80 Million Images

• PCA to 32 dims, k = 48 eigenfunctions
• Precompute the approximate eigenvectors (~20 GB)
• For each class, labels propagate through all 80 million images

Page 39:

Page 40:

Summary

• A semi-supervised scheme that can scale to very large problems
• Rather than sub-sampling the data, we take the limit of infinite unlabeled data
• Assumes the input data distribution is separable
• Can propagate labels in a graph with 80 million nodes in a fraction of a second

Page 41:

Page 42:

Future Work

• Can potentially use 2-D or 3-D histograms instead of 1-D
– Requires more data
• Consider diagonal eigenfunctions
• Sharing of labels between classes

Page 43:

[Repeat of the Page 30 "Are Dimensions Independent?" PCA figure]

Page 44:

Are Dimensions Independent?

[Joint histograms for pairs of dimensions from the raw 384-dimensional Gist, and after ICA. MI is the mutual information score; 0 = independent]

Page 45:

Page 46:

Overview of Our Approach

• Existing large-scale SSL methods try to reduce the # of points
• We consider what happens as n → ∞
• Eigenvectors → eigenfunctions
• Assume the input distribution is separable
• Make a crude numerical approximation to the eigenfunctions
• Interpolate the data in these approximate eigenfunctions to give approximate eigenvectors

Page 47:

Eigenfunctions

• Eigenfunctions are the limit of eigenvectors as n → ∞ [Coifman et al. 05, Nadler et al. 06, Belkin & Niyogi 07]:

(1/n²) f^T L f → L_p(F)

• Analytical forms of eigenfunctions exist only in a few cases: uniform, Gaussian
• Instead, we calculate a numerical approximation to the eigenfunctions [Nadler et al. 06, Weiss et al. 08]

Page 48:

Complexity Comparison

Nyström (polynomial in # landmarks):
1. Select m landmark points
2. Get the smallest k eigenvectors of an m × m system
3. Interpolate the n points into the k eigenvectors
4. Solve a k × k linear system

Eigenfunction (linear in # data points):
1. Rotate the n points
2. Form d 1-D histograms
3. Solve d linear systems, each b × b
4. Do k 1-D interpolations of the n points
5. Solve a k × k linear system

Key: n = # data points (big, >10⁶); l = # labeled points (small, <100); m = # landmark points; d = # input dims (~100); k = # eigenvectors (~100); b = # histogram bins (~50)

Page 49:

Key Assumption: Separability of Input Data [Nadler et al. 06, Weiss et al. 08]

• Can't build accurate high-dimensional histograms
– Would need too many points
• Currently just use 1-D histograms
– 2-D or 3-D ones possible with enough data
• This assumes the distribution is separable: p(x) = p(x₁) p(x₂) … p(x_d)
• For separable distributions, the eigenfunctions are also separable

Page 50:

Varying # Training Examples

[Plot: mean precision at 15% recall, averaged over 16 classes, vs. log₂ number of positive training examples per class. Curves: eigenfunction, Nyström, least-squares, eigenvector, SVM, NN, chance]