Transitive Hashing Network for Heterogeneous Multimedia Retrieval
Zhangjie Cao1, Mingsheng Long1, Jianmin Wang1, and Qiang Yang2
1 School of Software, Tsinghua University
2 Department of Computer Science, Hong Kong University of Science and Technology
The Thirty-first AAAI Conference on Artificial Intelligence, 2017
Z. Cao et al. (Tsinghua University) Transitive Hashing Network AAAI 2017 1 / 18
Motivation / Large-Scale Multimedia Retrieval

Cross-modal Retrieval: Nearest Neighbor (NN) similarity retrieval across modalities.

Database: X^{img} = {x^{img}_1, ..., x^{img}_N} and Query: q^{txt}
Cross-modal NN: NN(q^{txt}) = \arg\min_{x^{img} \in X^{img}} d(x^{img}, q^{txt})

Figure: Cross-modal retrieval: similarity retrieval across media modalities.
(a) I → T (image query ['lake'] on text DB): top 16 returned tags, precision 0.625.
(b) T → I (text query ['sky sun'] on image DB): top 16 returned images, precision 0.625.
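Over binary hash codes, the NN rule above reduces to Hamming-distance search, since for b-bit codes in {-1, +1} the Hamming distance equals (b - inner product) / 2. A minimal sketch (function name is my own, not from the paper):

```python
import numpy as np

def hamming_nn(query_code, db_codes):
    """Index of the database code nearest to the query in Hamming distance.

    Codes are {-1, +1} vectors; for b-bit codes,
    Hamming(h1, h2) = (b - <h1, h2>) / 2.
    """
    b = db_codes.shape[1]
    dists = (b - db_codes @ query_code) / 2
    return int(np.argmin(dists))
```

Because the inner product is a dot product of short binary vectors, this search is far cheaper than distance computation on dense float descriptors.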
Motivation / Hashing Methods

Hashing Methods

[Figure: hashing pipeline. Images are mapped to descriptors (SIFT, DeCAF) and
texts to descriptors (Word2Vec, TF-IDF); each is then encoded into binary hash
codes in {-1, 1}, which support approximate nearest neighbor retrieval.]
Superiorities
- Memory: 128-d float: 512 bytes → 16 bytes; 1 billion items: 512 GB → 16 GB
- Time: computation 10x-100x faster; transmission (disk/web) ~30x faster

Applications
- Approximate nearest neighbor search
- Compact representation, feature compression for large datasets
- Distributing and transmitting data online
- Constructing indexes for large-scale databases
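The memory figures above can be checked directly: packing a 128-bit code gives a 32x reduction over a 128-d float32 descriptor. The sign-thresholding below is a toy hash for illustration only, not the learned THN codes:

```python
import numpy as np

# A 128-d float32 descriptor occupies 128 * 4 = 512 bytes.
desc = np.random.randn(128).astype(np.float32)

# Sign-threshold to a 128-bit code (illustrative, not a learned hash),
# then pack the bits: 128 bits -> 16 bytes, a 32x reduction.
bits = desc > 0
packed = np.packbits(bits)

print(desc.nbytes, packed.nbytes)  # 512 16
```

Scaled to 1 billion items, the same ratio turns 512 GB of descriptors into 16 GB of codes, matching the slide's numbers.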
Motivation / Hashing Methods

Traditional vs. Transitive

[Figure: traditional cross-modal hashing requires a direct heterogeneous
relationship between the image modality (query/database) and the text modality
(database/query). Transitive cross-modal hashing instead learns the
heterogeneous relationship on an auxiliary image-text dataset (tag clouds such
as 'animals', 'leaves', 'tigers', 'zoos', 'sunset', 'lake', 'athletes',
'people') and transfers it to the unrelated query and database sets.]
Method / Model

Network Architecture

[Figure: Transitive Hashing Network. An image network and a text network each
end in a hash layer producing codes in {-1, 1}. On the auxiliary dataset, the
two networks are trained with a pairwise cross-entropy loss and pairwise
quantization losses; MMD penalties align the auxiliary data with the
query/database set of each modality.]
Method / Loss

Heterogeneous Relationship Learning

[Figure: the heterogeneous relationship is observed only on the auxiliary
dataset (image and text modalities), not between the image modality
(query/database) and the text modality (database/query).]
Given the heterogeneous relationship S = {s_{ij}}, the logarithm Maximum a
Posteriori (MAP) estimation is

\log p(H^x, H^y | S) \propto \log p(S | H^x, H^y) p(H^x) p(H^y)
  = \sum_{s_{ij} \in S} \log p(s_{ij} | h^x_i, h^y_j) p(h^x_i) p(h^y_j),   (1)

where p(S | H^x, H^y) is the likelihood function, and p(H^x) and p(H^y) are
prior distributions.
Method / Loss

Heterogeneous Relationship Learning

Likelihood function: for each pair of points x_i and y_j, p(s_{ij} | h^x_i, h^y_j)
is the conditional probability of their relationship s_{ij} given their hash
codes h^x_i and h^y_j, which can be defined using the pairwise logistic function:

p(s_{ij} | h^x_i, h^y_j)
  = \begin{cases}
      \sigma(\langle h^x_i, h^y_j \rangle),     & s_{ij} = 1 \\
      1 - \sigma(\langle h^x_i, h^y_j \rangle), & s_{ij} = 0
    \end{cases}
  = \sigma(\langle h^x_i, h^y_j \rangle)^{s_{ij}}
    \left( 1 - \sigma(\langle h^x_i, h^y_j \rangle) \right)^{1 - s_{ij}},   (2)

where \sigma(x) = 1/(1 + e^{-x}) is the sigmoid function, h^x_i = sgn(z^x_i),
and h^y_j = sgn(z^y_j).
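A minimal sketch of the pairwise logistic likelihood in Eq. (2) (function names are my own):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def pair_likelihood(hx_i, hy_j, s_ij):
    """p(s_ij | h^x_i, h^y_j) per Eq. (2):
    sigma(<hx, hy>) if s_ij = 1, else 1 - sigma(<hx, hy>)."""
    p_similar = sigmoid(float(np.dot(hx_i, hy_j)))
    return p_similar if s_ij == 1 else 1.0 - p_similar
```

Pairs with a large inner product get likelihood near 1 when s_{ij} = 1, so maximizing the likelihood pulls the codes of related cross-modal points together and pushes unrelated ones apart.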
Method / Loss

Heterogeneous Relationship Learning

Prior: for ease of optimization, the continuous relaxation h^x_i = z^x_i and
h^y_j = z^y_j is applied to the binary constraints. Then, to control the
quantization error and close the gap between the Hamming distance and its
surrogate for learning accurate hash codes, a new cross-entropy prior is
proposed over the continuous activations {z^*_i}:

p(z^*_i) \propto \exp\left( -\lambda H\!\left( \frac{1}{b}, \frac{|z^*_i|}{b} \right) \right),   (3)

where * \in {x, y}, and \lambda is the parameter of the exponential
distribution.
Method / Loss

Heterogeneous Relationship Learning

Optimization problem for heterogeneous relationship learning:

\min_\Theta J = L + \lambda Q,   (4)

where \lambda is the trade-off parameter between the pairwise cross-entropy
loss L and the pairwise quantization loss Q, and \Theta denotes the network
parameters. Specifically,

L = \sum_{s_{ij} \in S} \log\left( 1 + \exp\left( \langle z^x_i, z^y_j \rangle \right) \right)
    - s_{ij} \langle z^x_i, z^y_j \rangle,   (5)

Q = \sum_{s_{ij} \in S} \sum_{k=1}^{b} \left( -\log |z^x_{ik}| - \log |z^y_{jk}| \right).   (6)
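Losses (5) and (6) can be sketched in NumPy as follows (function names are my own; in the paper these are minimized end-to-end over the network activations):

```python
import numpy as np

def cross_entropy_loss(zx, zy, pairs):
    """Pairwise cross-entropy loss L (Eq. 5).

    zx: (n, b) image activations, zy: (m, b) text activations,
    pairs: iterable of (i, j, s_ij) with s_ij in {0, 1}.
    """
    loss = 0.0
    for i, j, s in pairs:
        ip = float(np.dot(zx[i], zy[j]))
        # log(1 + exp(ip)) - s * ip, via logaddexp for numerical stability
        loss += np.logaddexp(0.0, ip) - s * ip
    return loss

def quantization_loss(zx, zy, pairs):
    """Pairwise quantization loss Q (Eq. 6): -sum_k log|z_k| over both codes,
    minimized when every activation saturates at |z_k| = 1."""
    loss = 0.0
    for i, j, _ in pairs:
        loss += -np.sum(np.log(np.abs(zx[i]))) - np.sum(np.log(np.abs(zy[j])))
    return loss
```

Note that Q vanishes exactly when the activations are already binary (|z_k| = 1), which is why minimizing it closes the gap between the continuous surrogate and the signed hash codes.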
Method / Loss

Homogeneous Distribution Alignment

[Figure: the heterogeneous relationship is available only on the auxiliary
dataset; the query/database set of each modality must be aligned to it.]

Families of distribution distances:
- Integral probability metrics: Wasserstein, MMD, total variation (TV)
- f-divergences: Hellinger, KL, Pearson \chi^2, TV (TV belongs to both families)
Method / Loss

Homogeneous Distribution Alignment

Maximum Mean Discrepancy (MMD) [JMLR '12]: MMD is a nonparametric distance
measure for comparing different distributions P_q and P_x in a reproducing
kernel Hilbert space (RKHS) H endowed with feature map \phi and kernel k,
formally defined as

D_q \triangleq \left\| E_{h^q \sim P_q}[\phi(h^q)] - E_{h^x \sim P_x}[\phi(h^x)] \right\|^2_H,

where P_q and P_x are the distributions of the query set X^q and the auxiliary
set \bar{X}.

Empirical MMD between the auxiliary dataset \bar{X} and the query set X^q:

D_q = \sum_{i=1}^{\hat{n}} \sum_{j=1}^{\hat{n}} \frac{k(z^q_i, z^q_j)}{\hat{n}^2}
    + \sum_{i=1}^{\bar{n}} \sum_{j=1}^{\bar{n}} \frac{k(z^x_i, z^x_j)}{\bar{n}^2}
    - 2 \sum_{i=1}^{\hat{n}} \sum_{j=1}^{\bar{n}} \frac{k(z^q_i, z^x_j)}{\hat{n}\bar{n}},   (7)

where k(z_i, z_j) = \exp(-\gamma \|z_i - z_j\|^2) is the Gaussian kernel.
Method / Loss

Transitive Hashing Network: unified optimization problem

\min_\Theta C = J + \mu (D_q + D_d),   (8)

where \mu is a trade-off parameter between the MAP loss J and the MMD penalty
(D_q + D_d).

Extensions: when the auxiliary set is small, we assume it is related to the
database and query sets. When the auxiliary set is large, this requirement can
be dropped, since a large auxiliary set commonly shares relationships with
other sets; thus a large set can serve as the auxiliary set in any task. We can
even use a relationship model pre-trained on large-scale datasets and fine-tune
it with our homogeneous alignment method. This widely used pre-training and
fine-tuning strategy makes our method easier to deploy.
Experiments / Setup

Experiments Setup

Datasets: NUS-WIDE serves as the auxiliary dataset (images with tags such as
'sunset', 'lake', 'animals', 'people'); ImageNet provides the image
query/database and YahooQA the text database/query.

Protocols: MAP, precision-recall curves.
Parameter selection: cross-validation.
Methods compared: two unsupervised methods, Cross-View Hashing (CVH) and
Inter-Media Hashing (IMH); two supervised methods, Quantized Correlation
Hashing (QCH) and Heterogeneous Translated Hashing (HTH); and one deep hashing
method, Deep Cross-Modal Hashing (DCMH).
Experiments / Results

Results and Discussion

Table: MAP comparison of cross-modal retrieval tasks on NUS-WIDE and
ImageNet-YahooQA (8 / 16 / 24 / 32 bits)

Task   Method  NUS-WIDE                         ImageNet-YahooQA
I → T  IMH     0.5821  0.5794  0.5804  0.5776   0.0855  0.0686  0.0999  0.0889
       CVH     0.5681  0.5606  0.5451  0.5558   0.1229  0.1180  0.0941  0.0865
       QCH     0.6463  0.6921  0.7019  0.7127   0.2563  0.2494  0.2581  0.2590
       HTH     0.5232  0.5548  0.5684  0.5325   0.2931  0.2694  0.2847  0.2663
       DCMH    0.7887  0.7397  0.7210  0.7460   0.5133  0.5109  0.5321  0.5087
       THN     0.8252  0.8423  0.8495  0.8572   0.5451  0.5507  0.5803  0.5901
T → I  IMH     0.5579  0.5593  0.5528  0.5457   0.1105  0.1044  0.1183  0.0909
       CVH     0.5261  0.5193  0.5097  0.5045   0.0711  0.0728  0.1116  0.1008
       QCH     0.6235  0.6609  0.6685  0.6773   0.2761  0.2847  0.2795  0.2665
       HTH     0.5603  0.5910  0.5798  0.5812   0.2172  0.1702  0.3122  0.2873
       DCMH    0.7882  0.7912  0.7921  0.7718   0.5163  0.5510  0.5581  0.5444
       THN     0.7905  0.8137  0.8245  0.8268   0.6032  0.6097  0.6232  0.6102
Experiments / Results

Results and Discussion

[Figure: Precision-recall curves of Hamming ranking @ 24-bit codes on NUS-WIDE
(a)-(b) and ImageNet-YahooQA (c)-(d), comparing CVH, IMH, HTH, QCH, DCMH, and
THN on the I → T and T → I tasks.]

Key Observations
- THN is a new state-of-the-art method for the more conventional cross-modal
  retrieval problem, where the relationship between query and database is
  available for training, as in the NUS-WIDE dataset.
- The homogeneous distribution alignment module of THN effectively closes the
  distribution shift between the auxiliary dataset and the query/database sets
  by matching the corresponding data distributions with the maximum mean
  discrepancy.
Experiments / Results

Empirical Analysis

Variants:
- THN-ip: uses the pairwise inner-product loss instead of the pairwise
  cross-entropy loss
- THN-D: does not use the unsupervised training data
- THN-Q: does not use the quantization loss

Table: MAP of THN variants on ImageNet-YahooQA (8 / 16 / 24 / 32 bits)

Method  I → T                            T → I
THN-ip  0.2976  0.3171  0.3302  0.3554   0.3443  0.3605  0.3852  0.4286
THN-D   0.5192  0.5123  0.5312  0.5411   0.5423  0.5512  0.5602  0.5489
THN-Q   0.4821  0.5213  0.5352  0.4947   0.5731  0.5592  0.5849  0.5612
THN     0.5451  0.5507  0.5803  0.5901   0.6032  0.6097  0.6232  0.6102
Experiments / Results

Empirical Analysis

Key Observations
- THN outperforms THN-ip by very large margins, confirming the importance of a
  well-defined loss function for heterogeneous relationship learning.
- THN outperforms THN-D, showing that THN can further exploit the unsupervised
  training data to bridge the Hamming spaces of the auxiliary dataset
  (NUS-WIDE) and the query/database sets (ImageNet-YahooQA), so that the
  auxiliary dataset can be leveraged as a bridge to transfer knowledge between
  query and database.
- THN outperforms THN-Q, indicating that the pairwise quantization loss reduces
  the quantization error incurred when binarizing continuous representations
  into hash codes.
Summary

Summary
- We define a new transitive deep hashing problem for heterogeneous multimedia
  retrieval without a direct relationship between the query set and the
  database set.
- We propose a pairwise cross-entropy loss and a pairwise quantization loss for
  heterogeneous relationship learning.
- We achieve homogeneous distribution alignment by minimizing the MMD loss
  between the query/database sets and the auxiliary set.
- In the future, we plan to extend the method to social media problems.