Transitive Hashing Network for Heterogeneous Multimedia Retrieval
Zhangjie Cao1, Mingsheng Long1, Jianmin Wang1, and Qiang Yang2
1 School of Software, Tsinghua University
2 Department of Computer Science, Hong Kong University of Science and Technology
The Thirty-first AAAI Conference on Artificial Intelligence, 2017
Z. Cao et al. (Tsinghua University) Transitive Hashing Network AAAI 2017 1 / 18
Motivation / Large-Scale Multimedia Retrieval

Cross-modal Retrieval: Nearest Neighbor (NN) similarity retrieval across modalities.

Database: X^{img} = {x^{img}_1, ..., x^{img}_N} and Query: q^{txt}
Cross-modal NN: NN(q^{txt}) = \arg\min_{x^{img} \in X^{img}} d(x^{img}, q^{txt})

Figure: Cross-modal retrieval: similarity retrieval across media modalities.
(a) I → T (image query ['lake'] on text DB): top 16 returned tags, precision 0.625.
(b) T → I (text query ['sky sun'] on image DB): top 16 returned images, precision 0.625.
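Over binary hash codes, the NN rule above reduces to Hamming-distance search, since for b-bit codes in {-1, +1} the Hamming distance equals (b - inner product) / 2. A minimal sketch (function name is my own, not from the paper):

```python
import numpy as np

def hamming_nn(query_code, db_codes):
    """Index of the database code nearest to the query in Hamming distance.

    Codes are {-1, +1} vectors; for b-bit codes,
    Hamming(h1, h2) = (b - <h1, h2>) / 2.
    """
    b = db_codes.shape[1]
    dists = (b - db_codes @ query_code) / 2
    return int(np.argmin(dists))
```

Because the inner product is a dot product of short binary vectors, this search is far cheaper than distance computation on dense float descriptors.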
Motivation / Hashing Methods

Hashing Methods

[Figure: hashing pipeline. Images are mapped to descriptors (SIFT, DeCAF) and
texts to descriptors (Word2Vec, TF-IDF); each is then encoded into binary hash
codes in {-1, 1}, which support approximate nearest neighbor retrieval.]
Superiorities
- Memory: 128-d float: 512 bytes → 16 bytes; 1 billion items: 512 GB → 16 GB
- Time: computation 10x-100x faster; transmission (disk/web) ~30x faster

Applications
- Approximate nearest neighbor search
- Compact representation, feature compression for large datasets
- Distributing and transmitting data online
- Constructing indexes for large-scale databases
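The memory figures above can be checked directly: packing a 128-bit code gives a 32x reduction over a 128-d float32 descriptor. The sign-thresholding below is a toy hash for illustration only, not the learned THN codes:

```python
import numpy as np

# A 128-d float32 descriptor occupies 128 * 4 = 512 bytes.
desc = np.random.randn(128).astype(np.float32)

# Sign-threshold to a 128-bit code (illustrative, not a learned hash),
# then pack the bits: 128 bits -> 16 bytes, a 32x reduction.
bits = desc > 0
packed = np.packbits(bits)

print(desc.nbytes, packed.nbytes)  # 512 16
```

Scaled to 1 billion items, the same ratio turns 512 GB of descriptors into 16 GB of codes, matching the slide's numbers.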
Motivation / Hashing Methods

Traditional vs. Transitive

[Figure: traditional cross-modal hashing requires a direct heterogeneous
relationship between the image modality (query/database) and the text modality
(database/query). Transitive cross-modal hashing instead learns the
heterogeneous relationship on an auxiliary image-text dataset (tag clouds such
as 'animals', 'leaves', 'tigers', 'zoos', 'sunset', 'lake', 'athletes',
'people') and transfers it to the unrelated query and database sets.]
Method / Model

Network Architecture

[Figure: Transitive Hashing Network. An image network and a text network each
end in a hash layer producing codes in {-1, 1}. On the auxiliary dataset, the
two networks are trained with a pairwise cross-entropy loss and pairwise
quantization losses; MMD penalties align the auxiliary data with the
query/database set of each modality.]
Method / Loss

Heterogeneous Relationship Learning

[Figure: the heterogeneous relationship is observed only on the auxiliary
dataset (image and text modalities), not between the image modality
(query/database) and the text modality (database/query).]
Given the heterogeneous relationship S = {s_{ij}}, the logarithm Maximum a
Posteriori (MAP) estimation is

\log p(H^x, H^y | S) \propto \log p(S | H^x, H^y) p(H^x) p(H^y)
  = \sum_{s_{ij} \in S} \log p(s_{ij} | h^x_i, h^y_j) p(h^x_i) p(h^y_j),   (1)

where p(S | H^x, H^y) is the likelihood function, and p(H^x) and p(H^y) are
prior distributions.
Method / Loss

Heterogeneous Relationship Learning

Likelihood function: for each pair of points x_i and y_j, p(s_{ij} | h^x_i, h^y_j)
is the conditional probability of their relationship s_{ij} given their hash
codes h^x_i and h^y_j, which can be defined using the pairwise logistic function:

p(s_{ij} | h^x_i, h^y_j)
  = \begin{cases}
      \sigma(\langle h^x_i, h^y_j \rangle),     & s_{ij} = 1 \\
      1 - \sigma(\langle h^x_i, h^y_j \rangle), & s_{ij} = 0
    \end{cases}
  = \sigma(\langle h^x_i, h^y_j \rangle)^{s_{ij}}
    \left( 1 - \sigma(\langle h^x_i, h^y_j \rangle) \right)^{1 - s_{ij}},   (2)

where \sigma(x) = 1/(1 + e^{-x}) is the sigmoid function, h^x_i = sgn(z^x_i),
and h^y_j = sgn(z^y_j).
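A minimal sketch of the pairwise logistic likelihood in Eq. (2) (function names are my own):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def pair_likelihood(hx_i, hy_j, s_ij):
    """p(s_ij | h^x_i, h^y_j) per Eq. (2):
    sigma(<hx, hy>) if s_ij = 1, else 1 - sigma(<hx, hy>)."""
    p_similar = sigmoid(float(np.dot(hx_i, hy_j)))
    return p_similar if s_ij == 1 else 1.0 - p_similar
```

Pairs with a large inner product get likelihood near 1 when s_{ij} = 1, so maximizing the likelihood pulls the codes of related cross-modal points together and pushes unrelated ones apart.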
Method / Loss

Heterogeneous Relationship Learning

Prior: for ease of optimization, the continuous relaxation h^x_i = z^x_i and
h^y_j = z^y_j is applied to the binary constraints. Then, to control the
quantization error and close the gap between the Hamming distance and its
surrogate for learning accurate hash codes, a new cross-entropy prior is
proposed over the continuous activations {z^*_i}:

p(z^*_i) \propto \exp\left( -\lambda H\!\left( \frac{1}{b}, \frac{|z^*_i|}{b} \right) \right),   (3)

where * \in {x, y}, and \lambda is the parameter of the exponential
distribution.
Method / Loss

Heterogeneous Relationship Learning

Optimization problem for heterogeneous relationship learning:

\min_\Theta J = L + \lambda Q,   (4)

where \lambda is the trade-off parameter between the pairwise cross-entropy
loss L and the pairwise quantization loss Q, and \Theta denotes the network
parameters. Specifically,

L = \sum_{s_{ij} \in S} \log\left( 1 + \exp\left( \langle z^x_i, z^y_j \rangle \right) \right)
    - s_{ij} \langle z^x_i, z^y_j \rangle,   (5)

Q = \sum_{s_{ij} \in S} \sum_{k=1}^{b} \left( -\log |z^x_{ik}| - \log |z^y_{jk}| \right).   (6)
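Losses (5) and (6) can be sketched in NumPy as follows (function names are my own; in the paper these are minimized end-to-end over the network activations):

```python
import numpy as np

def cross_entropy_loss(zx, zy, pairs):
    """Pairwise cross-entropy loss L (Eq. 5).

    zx: (n, b) image activations, zy: (m, b) text activations,
    pairs: iterable of (i, j, s_ij) with s_ij in {0, 1}.
    """
    loss = 0.0
    for i, j, s in pairs:
        ip = float(np.dot(zx[i], zy[j]))
        # log(1 + exp(ip)) - s * ip, via logaddexp for numerical stability
        loss += np.logaddexp(0.0, ip) - s * ip
    return loss

def quantization_loss(zx, zy, pairs):
    """Pairwise quantization loss Q (Eq. 6): -sum_k log|z_k| over both codes,
    minimized when every activation saturates at |z_k| = 1."""
    loss = 0.0
    for i, j, _ in pairs:
        loss += -np.sum(np.log(np.abs(zx[i]))) - np.sum(np.log(np.abs(zy[j])))
    return loss
```

Note that Q vanishes exactly when the activations are already binary (|z_k| = 1), which is why minimizing it closes the gap between the continuous surrogate and the signed hash codes.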
Method / Loss

Homogeneous Distribution Alignment

[Figure: the heterogeneous relationship is available only on the auxiliary
dataset; the query/database set of each modality must be aligned to it.]

Families of distribution distances:
- Integral probability metrics: Wasserstein, MMD, total variation (TV)
- f-divergences: Hellinger, KL, Pearson \chi^2, TV (TV belongs to both families)
Method / Loss

Homogeneous Distribution Alignment

Maximum Mean Discrepancy (MMD) [JMLR '12]: MMD is a nonparametric distance
measure for comparing different distributions P_q and P_x in a reproducing
kernel Hilbert space (RKHS) H endowed with feature map \phi and kernel k,
formally defined as

D_q \triangleq \left\| E_{h^q \sim P_q}[\phi(h^q)] - E_{h^x \sim P_x}[\phi(h^x)] \right\|^2_H,

where P_q and P_x are the distributions of the query set X^q and the auxiliary
set \bar{X}.

Empirical MMD between the auxiliary dataset \bar{X} and the query set X^q:

D_q = \sum_{i=1}^{\hat{n}} \sum_{j=1}^{\hat{n}} \frac{k(z^q_i, z^q_j)}{\hat{n}^2}
    + \sum_{i=1}^{\bar{n}} \sum_{j=1}^{\bar{n}} \frac{k(z^x_i, z^x_j)}{\bar{n}^2}
    - 2 \sum_{i=1}^{\hat{n}} \sum_{j=1}^{\bar{n}} \frac{k(z^q_i, z^x_j)}{\hat{n}\bar{n}},   (7)

where k(z_i, z_j) = \exp(-\gamma \|z_i - z_j\|^2) is the Gaussian kernel.
Method / Loss

Transitive Hashing Network: unified optimization problem

\min_\Theta C = J + \mu (D_q + D_d),   (8)

where \mu is a trade-off parameter between the MAP loss J and the MMD penalty
(D_q + D_d).

Extensions: when the auxiliary set is small, we assume it is related to the
database and query sets. When the auxiliary set is large, this requirement can
be dropped, since a large auxiliary set commonly shares relationships with
other sets; thus a large set can serve as the auxiliary set in any task. We can
even use a relationship model pre-trained on large-scale datasets and fine-tune
it with our homogeneous alignment method. This widely used pre-training and
fine-tuning strategy makes our method easier to deploy.
Experiments / Setup

Experiments Setup

Datasets: NUS-WIDE serves as the auxiliary dataset (images with tags such as
'sunset', 'lake', 'animals', 'people'); ImageNet provides the image
query/database and YahooQA the text database/query.

Protocols: MAP, precision-recall curves.
Parameter selection: cross-validation.
Methods compared: two unsupervised methods, Cross-View Hashing (CVH) and
Inter-Media Hashing (IMH); two supervised methods, Quantized Correlation
Hashing (QCH) and Heterogeneous Translated Hashing (HTH); and one deep hashing
method, Deep Cross-Modal Hashing (DCMH).
Experiments / Results

Results and Discussion

Table: MAP comparison of cross-modal retrieval tasks on NUS-WIDE and
ImageNet-YahooQA (8 / 16 / 24 / 32 bits)

Task   Method  NUS-WIDE                         ImageNet-YahooQA
I → T  IMH     0.5821  0.5794  0.5804  0.5776   0.0855  0.0686  0.0999  0.0889
       CVH     0.5681  0.5606  0.5451  0.5558   0.1229  0.1180  0.0941  0.0865
       QCH     0.6463  0.6921  0.7019  0.7127   0.2563  0.2494  0.2581  0.2590
       HTH     0.5232  0.5548  0.5684  0.5325   0.2931  0.2694  0.2847  0.2663
       DCMH    0.7887  0.7397  0.7210  0.7460   0.5133  0.5109  0.5321  0.5087
       THN     0.8252  0.8423  0.8495  0.8572   0.5451  0.5507  0.5803  0.5901
T → I  IMH     0.5579  0.5593  0.5528  0.5457   0.1105  0.1044  0.1183  0.0909
       CVH     0.5261  0.5193  0.5097  0.5045   0.0711  0.0728  0.1116  0.1008
       QCH     0.6235  0.6609  0.6685  0.6773   0.2761  0.2847  0.2795  0.2665
       HTH     0.5603  0.5910  0.5798  0.5812   0.2172  0.1702  0.3122  0.2873
       DCMH    0.7882  0.7912  0.7921  0.7718   0.5163  0.5510  0.5581  0.5444
       THN     0.7905  0.8137  0.8245  0.8268   0.6032  0.6097  0.6232  0.6102
Experiments / Results

Results and Discussion

[Figure: Precision-recall curves of Hamming ranking @ 24-bit codes on NUS-WIDE
(a)-(b) and ImageNet-YahooQA (c)-(d), comparing CVH, IMH, HTH, QCH, DCMH, and
THN on the I → T and T → I tasks.]

Key Observations
- THN is a new state-of-the-art method for the more conventional cross-modal
  retrieval problem, where the relationship between query and database is
  available for training, as in the NUS-WIDE dataset.
- The homogeneous distribution alignment module of THN effectively closes the
  distribution shift between the auxiliary dataset and the query/database sets
  by matching the corresponding data distributions with the maximum mean
  discrepancy.
Experiments / Results

Empirical Analysis

Variants:
- THN-ip: uses the pairwise inner-product loss instead of the pairwise
  cross-entropy loss
- THN-D: does not use the unsupervised training data
- THN-Q: does not use the quantization loss

Table: MAP of THN variants on ImageNet-YahooQA (8 / 16 / 24 / 32 bits)

Method  I → T                            T → I
THN-ip  0.2976  0.3171  0.3302  0.3554   0.3443  0.3605  0.3852  0.4286
THN-D   0.5192  0.5123  0.5312  0.5411   0.5423  0.5512  0.5602  0.5489
THN-Q   0.4821  0.5213  0.5352  0.4947   0.5731  0.5592  0.5849  0.5612
THN     0.5451  0.5507  0.5803  0.5901   0.6032  0.6097  0.6232  0.6102
Experiments / Results

Empirical Analysis

Key Observations
- THN outperforms THN-ip by very large margins, confirming the importance of a
  well-defined loss function for heterogeneous relationship learning.
- THN outperforms THN-D, showing that THN can further exploit the unsupervised
  training data to bridge the Hamming spaces of the auxiliary dataset
  (NUS-WIDE) and the query/database sets (ImageNet-YahooQA), so that the
  auxiliary dataset can be leveraged as a bridge to transfer knowledge between
  query and database.
- THN outperforms THN-Q, indicating that the pairwise quantization loss reduces
  the quantization error incurred when binarizing continuous representations
  into hash codes.
Summary

Summary
- We define a new transitive deep hashing problem for heterogeneous multimedia
  retrieval without a direct relationship between the query set and the
  database set.
- We propose a pairwise cross-entropy loss and a pairwise quantization loss for
  heterogeneous relationship learning.
- We achieve homogeneous distribution alignment by minimizing the MMD loss
  between the query/database sets and the auxiliary set.
- In the future, we plan to extend the method to social media problems.