
Learning to Hash for Big Data

Wu-Jun Li

LAMDA Group, National Key Laboratory for Novel Software Technology, Department of Computer Science and Technology, Nanjing University

[email protected]

May 09, 2015


Outline

1 Introduction
   Problem Definition
   Existing Methods

2 Scalable Graph Hashing with Feature Transformation
   Motivation
   Model and Learning
   Experiment

3 Conclusion

4 References


Introduction


Nearest Neighbor Search (Retrieval)

Given a query point q, return the points closest (most similar) to q in the database (e.g., images).

Underlies many machine learning, data mining, and information retrieval problems.

Challenges in big data applications:

Curse of dimensionality

Storage cost

Query speed


Similarity Preserving Hashing

Goal: map similar points in the original space to binary codes that are close in Hamming distance, and dissimilar points to codes that are far apart.


Reduce Dimensionality and Storage Cost

Representing each point with a short binary code, instead of a high-dimensional real-valued vector, reduces both the dimensionality and the storage cost.


Querying

Hamming distance:

||01101110, 00101101||_H = 3
||11011, 01011||_H = 1
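Hamming distance is just the number of differing bit positions; a two-line check of the examples above (illustrative only; real systems pack codes into machine words and use XOR plus popcount):

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

print(hamming("01101110", "00101101"))  # 3
print(hamming("11011", "01011"))        # 1
# For codes packed into integers: bin(x ^ y).count("1")
```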


Fast Query Speed

By using a hashing-based index, we can achieve constant or sub-linear search time complexity.

Exhaustive search is also acceptable, because the distance computation cost is now cheap.


Hash Function Learning

Easy or hard?

Hard: a discrete optimization problem.

Easy by approximation: two stages of hash function learning (sketched in code below):

Projection stage (dimensionality reduction): given a point x, each projected dimension i is associated with a real-valued projection function f_i(x) (e.g., f_i(x) = w_i^T x).

Quantization stage: turn the real values into binary codes.
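A minimal sketch of the two-stage pipeline in NumPy, with a random Gaussian projection standing in for the learned projection functions (the matrix W here is illustrative, not the output of any particular learning method):

```python
import numpy as np

rng = np.random.default_rng(0)
d, c = 384, 64                      # input dimensionality, code length

# Projection stage: c real-valued functions f_i(x) = w_i^T x
W = rng.standard_normal((c, d))

def encode(X):
    """Two-stage hashing: real-valued projection, then sign quantization."""
    projections = X @ W.T                                      # stage 1: reduce to c real values
    return np.where(projections >= 0, 1, -1).astype(np.int8)   # stage 2: quantize

codes = encode(rng.standard_normal((5, d)))  # each row: a c-bit code in {-1, +1}
```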

However, there exist essential differences between metric learning (dimensionality reduction) and learning to hash; simply adapting traditional metric learning is not enough.


Data-Independent Methods

The hash function family is defined independently of the training dataset:

Locality-sensitive hashing (LSH) (Gionis et al., 1999; Andoni and Indyk, 2008) and its extensions (Datar et al., 2004; Kulis and Grauman, 2009; Kulis et al., 2009).

SIKH: shift-invariant kernel hashing (Raginsky and Lazebnik, 2009).

Hash functions: random projections.
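For concreteness, a single hash function from the p-stable family of Datar et al. (2004), h(x) = floor((w·x + b)/r), sketched with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r = 384, 4.0                    # input dimensionality, quantization width

w = rng.standard_normal(d)         # entries from a 2-stable (Gaussian) distribution
b = rng.uniform(0, r)              # random offset in [0, r)

def h(x):
    """One p-stable LSH function: nearby points collide with high probability."""
    return int(np.floor((w @ x + b) / r))

x = rng.standard_normal(d)
print(h(x), h(x + 0.01 * rng.standard_normal(d)))  # likely the same bucket
```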


Data-Dependent Methods

Hash functions are learned from a given training dataset (learning to hash).

Relatively short codes

Seminal papers: (Salakhutdinov and Hinton, 2007, 2009; Torralba et al., 2008; Weiss et al., 2008)

Two categories:

Unimodal:
  Supervised methods: given the labels y_i or triplets (x_i, x_j, x_k)
  Unsupervised methods

Multimodal:
  Supervised methods
  Unsupervised methods


(Unimodal) Unsupervised Methods

No labels to denote the categories of the training points.

PCAH: principal component analysis.

SH: eigenfunctions computed from the data similarity graph (Weiss et al., 2008).

ITQ: orthogonal rotation matrix to refine the initial projection matrix learned by PCA (Gong and Lazebnik, 2011).

AGH: graph-based hashing (Liu et al., 2011).

IsoHash: projected dimensions with isotropic variances (Kong and Li, 2012b).

DGH: discrete graph hashing (Liu et al., 2014).

etc.


(Unimodal) Supervised (semi-supervised) Methods

Class labels or pairwise constraints:

SSH: semi-supervised hashing exploits both labeled data and unlabeled data for hash function learning (Wang et al., 2010a,b).

MLH: minimal loss hashing based on the latent structural SVM framework (Norouzi and Fleet, 2011).

KSH: kernel-based supervised hashing (Liu et al., 2012).

LDAHash: linear discriminant analysis based hashing (Strecha et al., 2012).

LFH: supervised hashing with latent factor models (Zhang et al., 2014).

etc.

Triplet-based methods:

Hamming distance metric learning (HDML) (Norouzi et al., 2012)

Column generation based hashing (CGHash) (Li et al., 2013)


Multimodal Methods

Multi-source hashing

Cross-modal hashing


Multi-Source Hashing

Aims to learn better codes than unimodal hashing by leveraging auxiliary views.

Assumes that all views are provided for a query, which is typically not feasible in many multimedia applications.

Multiple feature hashing (Song et al., 2011)

Composite hashing (Zhang et al., 2011)

etc.


Cross-Modal Hashing

Given an image or text query, return images or texts similar to it.

Cross View Hashing (CVH) (Kumar and Udupa, 2011)

Multimodal Latent Binary Embedding (MLBE) (Zhen and Yeung, 2012a)

Co-Regularized Hashing (CRH) (Zhen and Yeung, 2012b)

Inter-Media Hashing (IMH) (Song et al., 2013)

Relation-aware Heterogeneous Hashing (RaHH) (Ou et al., 2013)

Semantic Correlation Maximization (SCM) (Zhang and Li, 2014)

etc.


Our Contribution

Unsupervised hashing:
(1) Isotropic hashing [NIPS 2012]
(2) Scalable graph hashing with feature transformation [IJCAI 2015]

Supervised hashing [SIGIR 2014]: supervised hashing with latent factor models

Multimodal hashing [AAAI 2014]: large-scale supervised multimodal hashing with semantic correlation maximization

Multiple-bit quantization:
(1) Double-bit quantization [AAAI 2012]
(2) Manhattan quantization [SIGIR 2012]

Survey: "Learning to Hash for Big Data: Current Status and Future Trends", Chinese Science Bulletin, 2015


Scalable Graph Hashing with Feature Transformation


(Unsupervised) Graph Hashing

Guide the hash code learning procedure by directly exploiting the pairwise similarity (neighborhood structure).

\min \sum_{ij} S_{ij} \|b_i - b_j\|^2 = \mathrm{tr}(B^T L B)

subject to: b_i \in \{-1, 1\}^c, \quad \sum_i b_i = 0, \quad \frac{1}{n} \sum_i b_i b_i^T = I
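A quick numerical check of the Laplacian identity behind this objective, with L = D - S (summing over all ordered pairs yields twice tr(B^T L B); the constant does not affect the minimizer). All names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c = 6, 4
S = rng.uniform(size=(n, n)); S = (S + S.T) / 2    # toy symmetric similarity matrix
B = rng.choice([-1.0, 1.0], size=(n, c))           # toy binary codes

lhs = sum(S[i, j] * np.sum((B[i] - B[j]) ** 2)
          for i in range(n) for j in range(n))
L = np.diag(S.sum(axis=1)) - S                     # graph Laplacian L = D - S
assert np.isclose(lhs, 2 * np.trace(B.T @ L @ B))  # identity up to the constant 2
```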

Graph hashing should be expected to achieve better performance than non-graph-based methods if the learning algorithms are effective.


Motivation

However, it is difficult to design effective algorithms because both the memory and time complexity are at least O(n^2).

SH (Weiss et al., 2008): uses an eigenfunction solution of the 1-D Laplacian under a uniform-distribution assumption.

BRE (Kulis and Darrell, 2009): subsamples a small subset for training.

AGH (Liu et al., 2011) and DGH (Liu et al., 2014): use an anchor graph to approximate the similarity graph.


Contribution

How can we utilize the whole graph while simultaneously avoiding O(n^2) complexity?

Scalable graph hashing (SGH):

A feature transformation method (Shrivastava and Li, 2014) to effectively approximate the whole graph without explicitly computing it.

A sequential method for bit-wise complementary learning.

Linear complexity.

Outperforms the state of the art in terms of both accuracy and scalability.


Notation

X = \{x_1, \dots, x_n\}^T \in R^{n \times d}: n data points.

b_i \in \{+1, -1\}^c: binary code of point x_i.

b_i = [h_1(x_i), \dots, h_c(x_i)]^T, where h_k(\cdot) denotes a hash function.

Pairwise similarity metric: S_{ij} = e^{-\|x_i - x_j\|_F^2 / \rho} \in (0, 1].


Objective Function

\min \sum_{ij} \left( \tilde{S}_{ij} - \frac{1}{c} b_i^T b_j \right)^2

\tilde{S}_{ij} = 2 S_{ij} - 1 \in (-1, 1].

Hash function: h_k(x) = \mathrm{sgn}\left( \sum_{j=1}^{m} W_{kj}\, \phi(x, x_j) + \mathrm{bias} \right)

\forall x_i, map it into the Hamming space as b_i = [h_1(x_i), \cdots, h_c(x_i)]^T.

\forall x, define:

K(x) = \left[ \phi(x, x_1) - \sum_{i=1}^{n} \phi(x_i, x_1)/n, \;\dots,\; \phi(x, x_m) - \sum_{i=1}^{n} \phi(x_i, x_m)/n \right]

Objective function with the parameter W \in R^{c \times m}:

\min_W \left\| c \tilde{S} - \mathrm{sgn}(K(X) W^T)\, \mathrm{sgn}(K(X) W^T)^T \right\|_F^2
\quad \text{s.t.}\quad W K(X)^T K(X) W^T = I
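The kernel \phi is left abstract on the slide; a common choice in kernel-based hashing, and an assumption here, is a Gaussian kernel over m basis points sampled from the training set. A sketch of the centered kernel features K(X):

```python
import numpy as np

def kernel_features(X, anchors, sigma):
    """Centered kernel features K(x), assuming a Gaussian kernel phi.

    X: (n, d) data; anchors: (m, d) basis points sampled from X.
    Each column is centered by subtracting sum_i phi(x_i, x_j) / n.
    """
    sq = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)  # (n, m) squared distances
    Phi = np.exp(-sq / (2 * sigma ** 2))                       # phi(x_i, x_j)
    return Phi - Phi.mean(axis=0, keepdims=True)

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 16))
anchors = X[rng.choice(len(X), size=32, replace=False)]  # m = 32 basis points
K = kernel_features(X, anchors, sigma=1.0)               # shape (1000, 32)
```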


Feature Transformation

\forall x, define P(x) and Q(x):

P(x) = \left[ \sqrt{\tfrac{2(e^2 - 1)}{e \rho}}\, e^{-\|x\|_F^2 / \rho}\, x;\;\; \sqrt{\tfrac{e^2 + 1}{e}}\, e^{-\|x\|_F^2 / \rho};\;\; 1 \right]

Q(x) = \left[ \sqrt{\tfrac{2(e^2 - 1)}{e \rho}}\, e^{-\|x\|_F^2 / \rho}\, x;\;\; \sqrt{\tfrac{e^2 + 1}{e}}\, e^{-\|x\|_F^2 / \rho};\;\; -1 \right]

\forall x_i, x_j \in X:

P(x_i)^T Q(x_j) = 2 \left[ \frac{e^2 - 1}{2e} \cdot \frac{2 x_i^T x_j}{\rho} + \frac{e^2 + 1}{2e} \right] e^{-\frac{\|x_i\|_F^2 + \|x_j\|_F^2}{\rho}} - 1

\approx 2\, e^{-\frac{\|x_i\|_F^2 + \|x_j\|_F^2 - 2 x_i^T x_j}{\rho}} - 1 = 2\, e^{-\frac{\|x_i - x_j\|_F^2}{\rho}} - 1 = \tilde{S}_{ij}


Feature Transformation (cont.)

Here we use the approximation \frac{e^2 - 1}{2e}\, t + \frac{e^2 + 1}{2e} \approx e^t for t \in [-1, 1] (the chord of e^t through t = \pm 1).

We assume -1 \le \frac{2}{\rho} x_i^T x_j \le 1. It is easy to prove that \rho = 2 \max\{\|x_i\|_F^2\}_{i=1}^{n} makes -1 \le \frac{2}{\rho} x_i^T x_j \le 1.

Then we have \tilde{S} \approx P(X)^T Q(X).
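A direct NumPy translation of P and Q (rows here are P(x_i)^T and Q(x_i)^T), followed by a check that P(X) Q(X)^T tracks S-tilde; the helper names are illustrative:

```python
import numpy as np

def transform(X, rho, sign):
    """Map each row x to P(x) (sign=+1) or Q(x) (sign=-1); returns (n, d+2)."""
    e = np.e
    decay = np.exp(-(X ** 2).sum(axis=1) / rho)              # e^{-||x||^2 / rho}
    first = np.sqrt(2 * (e ** 2 - 1) / (e * rho)) * X * decay[:, None]
    second = np.sqrt((e ** 2 + 1) / e) * decay
    return np.hstack([first, second[:, None],
                      np.full((len(X), 1), float(sign))])

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 8))
rho = 2 * (X ** 2).sum(axis=1).max()      # guarantees |2 x_i^T x_j / rho| <= 1

P, Q = transform(X, rho, +1), transform(X, rho, -1)
S_tilde = 2 * np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / rho) - 1
print(np.abs(P @ Q.T - S_tilde).max())    # bounded error; exact at 2x_i^Tx_j/rho = +/-1
```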


Sequential Learning Strategy

Direct relaxation may lead to poor performance.

We adopt a sequential learning strategy in a bit-wise complementary manner.

Residual definition:

R_t = c\tilde{S} - \sum_{i=1}^{t-1} \mathrm{sgn}(K(X) w_i)\, \mathrm{sgn}(K(X) w_i)^T, \qquad R_1 = c\tilde{S}

Objective function:

\min_{w_t} \left\| R_t - \mathrm{sgn}(K(X) w_t)\, \mathrm{sgn}(K(X) w_t)^T \right\|_F^2
\quad \text{s.t.}\quad w_t^T K(X)^T K(X) w_t = 1

By relaxation (dropping the sgn), we get:

\min_{w_t} -\mathrm{tr}\left( w_t^T K(X)^T R_t K(X) w_t \right)
\quad \text{s.t.}\quad w_t^T K(X)^T K(X) w_t = 1


Sequential Learning Strategy (cont.)

Then we obtain a generalized eigenvalue problem:

K(X)^T R_t K(X)\, w_t = \lambda\, K(X)^T K(X)\, w_t

Define A_t = K(X)^T R_t K(X) \in R^{m \times m}; then:

A_t = A_{t-1} - K(X)^T \mathrm{sgn}(K(X) w_{t-1})\, \mathrm{sgn}(K(X) w_{t-1})^T K(X)

Key component:

A_1 = c\, K(X)^T \tilde{S} K(X) \approx c\, [K(X)^T P(X)^T]\, [Q(X) K(X)]

where \tilde{S} \approx P(X)^T Q(X) is substituted and the products are reassociated, so the n \times n matrix \tilde{S} is never formed.
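The reassociated product in code: both factors involve only m and d + 2, so the n x n matrix S-tilde is never materialized (illustrative helper, with K, P, Q as in the earlier sketches):

```python
import numpy as np

def compute_A1(K, P, Q, c):
    """A_1 = c K^T S_tilde K via S_tilde ~ P Q^T; cost O(n m (d + 2)).

    K: (n, m) kernel features; P, Q: (n, d+2) transformed features.
    """
    A1 = c * (K.T @ P) @ (Q.T @ K)   # (m, d+2) @ (d+2, m): no n x n product
    return (A1 + A1.T) / 2           # symmetrize for the eigensolver
```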


Sequential Learning Strategy (cont.)

Adopting the residual matrix

R_t = c\tilde{S} - \sum_{i=1, i \ne t}^{c} \mathrm{sgn}(K(X) w_i)\, \mathrm{sgn}(K(X) w_i)^T,

we can relearn all of W = \{w_i\}_{i=1}^{c} for multiple rounds.

This can further improve the accuracy.

We continue for one more round to get a good tradeoff between accuracy and speed.


Sequential Learning Algorithm
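A compact, hedged sketch of one sequential pass, assuming K, P, Q are computed as in the earlier sketches; it re-solves a full m x m generalized eigenproblem per bit and omits details such as the bias term, so it is illustrative rather than the paper's exact algorithm:

```python
import numpy as np
from scipy.linalg import eigh

def sgh_train(K, P, Q, c, eps=1e-6):
    """One sequential pass over the c bits; returns W whose rows are the w_t."""
    m = K.shape[1]
    B = K.T @ K + eps * np.eye(m)    # constraint matrix; small ridge for stability
    A = c * (K.T @ P) @ (Q.T @ K)    # A_1 = c K^T S_tilde K via the factorization
    A = (A + A.T) / 2                # symmetrize for the eigensolver
    W = np.zeros((c, m))
    for t in range(c):
        _, vecs = eigh(A, B)         # generalized eigenproblem A w = lambda B w
        w = vecs[:, -1]              # eigenvector of the largest eigenvalue
        W[t] = w
        v = K.T @ np.sign(K @ w)     # K(X)^T sgn(K(X) w_t)
        A -= np.outer(v, v)          # residual update: A_{t+1} = A_t - v v^T
    return W
```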


Complexity Analysis

Initialization:

P(X) and Q(X): O(dn)

Kernel initialization: O(dmn + mn)

A_1 and Z: O((d + 2)mn)

Main procedure: O(c(mn + m^2) + m^3)

Typically, m, d, and c are much smaller than n. Hence the time complexity is O(n), and the storage complexity is also O(n).


Datasets

TINY-1M

One million images from the set of 80M tiny images.

Each tiny image is represented by a 384-dim GIST descriptor.

MIRFLICKR-1M

One million Flickr images.

512-dim features are extracted from each image.


Evaluation Protocols and Baselines

Evaluation Protocols:

Ground truth: top 2% nearest neighbors in the training set in terms of Euclidean distance

Hamming ranking: Top-K precision

Training time for scalability
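Under this protocol, Top-K precision can be computed as in the following sketch (codes in {-1, +1}; true_neighbors holds precomputed sets of top-2% Euclidean neighbors; names are illustrative):

```python
import numpy as np

def topk_precision(query_codes, base_codes, true_neighbors, K=1000):
    """Mean Top-K precision under Hamming ranking."""
    precisions = []
    for q, truth in zip(query_codes, true_neighbors):
        # For codes in {-1, +1}: Hamming distance = (c - <b_i, b_j>) / 2
        ham = (base_codes.shape[1] - base_codes @ q) / 2
        retrieved = np.argsort(ham)[:K]
        precisions.append(len(set(retrieved) & truth) / K)
    return float(np.mean(precisions))
```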

Baselines:

LSH (Andoni and Indyk, 2008)

PCAH (Gong and Lazebnik, 2011)

ITQ (Gong and Lazebnik, 2011)

AGH (Liu et al., 2011)

DGH (Liu et al., 2014): DGH-I and DGH-R


Top-1k Precision

Top-1k precision @TINY-1M:

Method   32 bits   64 bits   96 bits   128 bits   256 bits
SGH      0.4697    0.5742    0.6299    0.6737     0.7357
ITQ      0.4289    0.4782    0.4947    0.4986     0.5003
AGH      0.3973    0.4402    0.4577    0.4654     0.4767
DGH-I    0.3974    0.4536    0.4737    0.4874     0.4969
DGH-R    0.3793    0.4554    0.4871    0.4989     0.5276
PCAH     0.2457    0.2203    0.2000    0.1836     0.1421
LSH      0.2507    0.3575    0.4122    0.4529     0.5212

Top-1k precision @MIRFLICKR-1M:

Method   32 bits   64 bits   96 bits   128 bits   256 bits
SGH      0.4919    0.6041    0.6677    0.6985     0.7584
ITQ      0.5177    0.5776    0.5999    0.6096     0.6228
AGH      0.4299    0.4741    0.4911    0.4998     0.5060
DGH-I    0.4299    0.4806    0.5001    0.5111     0.5253
DGH-R    0.4121    0.4776    0.5054    0.5196     0.5428
PCAH     0.2720    0.2384    0.2141    0.1950     0.1508
LSH      0.2597    0.3995    0.4660    0.5160     0.6072


Top-k Precision @TINY-1M

[Figure: Top-k precision on TINY-1M at (a) 64 bits and (b) 128 bits; precision vs. number of returned samples (200-1000) for SGH, ITQ, AGH, DGH-I, DGH-R, PCAH, and LSH.]


Top-k Precision @MIRFLICKR-1M

[Figure: Top-k precision on MIRFLICKR-1M at (a) 64 bits and (b) 128 bits; precision vs. number of returned samples (0-1000) for SGH, ITQ, AGH, DGH-I, DGH-R, PCAH, and LSH.]


Training Time (in seconds)

Training time @TINY-1M (t1 = 1438.60):

Method   32 bits        64 bits        96 bits        128 bits        256 bits
SGH      34.49          52.37          71.53          89.65           164.23
ITQ      31.72          60.62          89.01          149.18          322.06
AGH      18.60 + t1     19.40 + t1     20.08 + t1     22.48 + t1      25.09 + t1
DGH-I    187.57 + t1    296.99 + t1    518.57 + t1    924.08 + t1     1838.30 + t1
DGH-R    217.06 + t1    360.18 + t1    615.74 + t1    1089.10 + t1    2300.10 + t1
PCAH     4.29           4.54           4.75           5.85            6.49
LSH      1.68           1.77           1.84           2.55            3.76

Training time @MIRFLICKR-1M (t2 = 1564.86):

Method   32 bits        64 bits        96 bits        128 bits        256 bits
SGH      41.51          59.02          74.86          97.25           168.35
ITQ      36.17          64.61          89.50          132.71          285.10
AGH      17.99 + t2     18.80 + t2     20.30 + t2     19.87 + t2      21.60 + t2
DGH-I    85.81 + t2     143.68 + t2    215.41 + t2    352.73 + t2     739.56 + t2
DGH-R    116.25 + t2    206.24 + t2    308.32 + t2    517.97 + t2     1199.44 + t2
PCAH     7.65           7.90           8.47           9.23            10.42
LSH      2.44           2.43           2.71           3.38            4.21


Sensitivity to Parameter ρ

[Figure: sensitivity of performance to the parameter ρ on (a) TINY-1M and (b) MIRFLICKR-1M.]


Sensitivity to Parameter m

[Figure: sensitivity of performance to the parameter m on (a) TINY-1M and (b) MIRFLICKR-1M.]


Conclusion

Hashing can significantly improve retrieval speed and reduce storage cost.

It has become a hot research topic in big learning, with wide applications.

We have proposed a series of hashing methods, including unsupervised, supervised, and multimodal methods. Furthermore, some quantization strategies (Kong and Li, 2012a; Kong et al., 2012) have also been designed.

In particular, the details of SGH with feature transformation (Jiang and Li, 2015) were introduced in this talk.


Future Trends

Discrete optimization and learning

Scalable training for supervised hashing

Quantization

New applications

Survey: "Learning to Hash for Big Data: Current Status and Future Trends", Chinese Science Bulletin, 2015 (Li and Zhou, 2015)


Q & A

Thanks!

Questions?

Code available at http://cs.nju.edu.cn/lwj


References

A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117–122, 2008.

M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry, 2004.

A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proceedings of International Conference on Very Large Data Bases, 1999.

Y. Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In Proceedings of Computer Vision and Pattern Recognition, 2011.

Q.-Y. Jiang and W.-J. Li. Scalable graph hashing with feature transformation. In IJCAI, 2015.

W. Kong and W.-J. Li. Double-bit quantization for hashing. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2012a.


W. Kong and W.-J. Li. Isotropic hashing. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012b.

W. Kong, W.-J. Li, and M. Guo. Manhattan hashing for large-scale image retrieval. In The 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2012.

B. Kulis and T. Darrell. Learning to hash with binary reconstructive embeddings. In Proceedings of Neural Information Processing Systems, 2009.

B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In Proceedings of International Conference on Computer Vision, 2009.

B. Kulis, P. Jain, and K. Grauman. Fast similarity search for learned metrics. IEEE Trans. Pattern Anal. Mach. Intell., 31(12):2143–2157, 2009.

S. Kumar and R. Udupa. Learning hash functions for cross-view similarity search. In IJCAI, pages 1360–1365, 2011.


W.-J. Li and Z.-H. Zhou. Learning to hash for big data: current status and future trends. Chinese Science Bulletin, 60:485–490, 2015.

X. Li, G. Lin, C. Shen, A. van den Hengel, and A. R. Dick. Learning hash functions using column generation. In ICML, 2013.

W. Liu, J. Wang, S. Kumar, and S. Chang. Hashing with graphs. In Proceedings of International Conference on Machine Learning, 2011.

W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang. Supervised hashing with kernels. In CVPR, pages 2074–2081, 2012.

W. Liu, C. Mu, S. Kumar, and S. Chang. Discrete graph hashing. In Proceedings of Neural Information Processing Systems, 2014.

M. Norouzi and D. J. Fleet. Minimal loss hashing for compact binary codes. In Proceedings of International Conference on Machine Learning, 2011.

M. Norouzi, D. J. Fleet, and R. Salakhutdinov. Hamming distance metric learning. In NIPS, pages 1070–1078, 2012.

M. Ou, P. Cui, F. Wang, J. Wang, W. Zhu, and S. Yang. Comparing apples to oranges: a scalable solution with heterogeneous hashing. In KDD, pages 230–238, 2013.


M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In Proceedings of Neural Information Processing Systems, 2009.

R. Salakhutdinov and G. Hinton. Semantic hashing. In SIGIR Workshop on Information Retrieval and Applications of Graphical Models, 2007.

R. Salakhutdinov and G. E. Hinton. Semantic hashing. Int. J. Approx. Reasoning, 50(7):969–978, 2009.

A. Shrivastava and P. Li. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). In NIPS, 2014.

J. Song, Y. Yang, Z. Huang, H. T. Shen, and R. Hong. Multiple feature hashing for real-time large scale near-duplicate video retrieval. In ACM Multimedia, pages 423–432, 2011.

J. Song, Y. Yang, Y. Yang, Z. Huang, and H. T. Shen. Inter-media hashing for large-scale retrieval from heterogeneous data sources. In SIGMOD Conference, pages 785–796, 2013.

C. Strecha, A. A. Bronstein, M. M. Bronstein, and P. Fua. LDAHash: Improved matching with smaller descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 34(1):66–78, 2012.


A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In Proceedings of Computer Vision and Pattern Recognition, 2008.

J. Wang, S. Kumar, and S.-F. Chang. Sequential projection learning for hashing with compact codes. In Proceedings of International Conference on Machine Learning, 2010a.

J. Wang, S. Kumar, and S.-F. Chang. Semi-supervised hashing for large-scale image retrieval. In Proceedings of Computer Vision and Pattern Recognition, 2010b.

Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Proceedings of Neural Information Processing Systems, 2008.

D. Zhang and W.-J. Li. Large-scale supervised multimodal hashing with semantic correlation maximization. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2014.

D. Zhang, F. Wang, and L. Si. Composite hashing with multiple information sources. In Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011.


P. Zhang, W. Zhang, W.-J. Li, and M. Guo. Supervised hashing with latent factor models. In SIGIR, 2014.

Y. Zhen and D.-Y. Yeung. A probabilistic model for multimodal hash function learning. In KDD, pages 940–948, 2012a.

Y. Zhen and D.-Y. Yeung. Co-regularized hashing for multimodal data. In NIPS, pages 1385–1393, 2012b.
