

Context-aware Image Tweet Modelling and Recommendation Tao Chen, Xiangnan He and Min-Yen Kan

School of Computing, National University of Singapore

Introduction

Research Questions
• How to represent the semantics of an image tweet? Are visual objects sufficient?
• How to utilize such representations for the personalized image tweet recommendation task?

ACM MM 2016, Amsterdam, The Netherlands

Personalized Image Tweet Recommendation

CITING: Context-aware Image Tweet Modelling

Tao Chen: [email protected]

Experiments

[Example image tweet: "Coming soon!!! imdb.to/IGxE9f"]

CITING extracts four types of contextual text for each image tweet:
1. Hashtag-enhanced text (segmented with Word Breaker)
2. Text in image (extracted with an OCR tool)
3. Extended context (pages linked by the tweet's URLs)
4. Similar contexts (via Google Image Search)

Dataset from Twitter:

            Twitter Users   Retweets   All Tweets    Ratings
  Training            926    174,765    1,316,645  1,592,837
  Test                         9,021       77,061     82,743

#  Method     Features                 P@1      P@3    P@5    MAP
1  Random     --                       0.114**  0.115  0.115  0.156**
2  Length     Post's text              0.176**  0.158  0.150  0.173**
3  Profiling  Post's text              0.336**  0.227  0.197  0.202**
4  FAMF       Visual objects           0.211**  0.205  0.192  0.211**
5  FAMF       Post's text              0.359*   0.325  0.287  0.275**
6  FAMF       Non-filtered context     0.413    0.352  0.319  0.296
7  FAMF       CITING                   0.419    0.355  0.319  0.298
8  FAMF       CITING + Visual objects  0.425    0.350  0.313  0.298

[Figure: user-item retweet matrix illustrating cold start; I5 and I6 are new items with no observed feedback]

         I1  I2  I3  I4  I5  I6
    U1    1   1   0   0   ?   ?
    U2    1   1   0   0   ?   ?
    U3    0   1   1   0   ?   ?
    U4    1   1   0   ?   ?   ?

Our Contributions
• We propose the CITING framework to model an image tweet by its contextual text.
• We develop a feature-aware matrix factorization model to capture users' personal interests.
• We have released our code and datasets: https://github.com/kite1988/famf

• Hashtag-enhanced text: conflate the variants of hashtags, e.g., #icebucket → "ice bucket" and #ALSIceBucketChallenge → "ALS ice bucket challenge" (a word-segmentation sketch follows this list).
• Text in image: overlaid text and screenshots of pure text.
• Similar contexts: Google Image Search returns a "best guess" label (70.6% are named entities) and the pages that contain the image.
[Chart labels: 82% contain the image; 22.7%; 24.6%; 14.3%]
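As an illustration of the hashtag segmentation step (the poster uses Microsoft's Word Breaker; this stand-in is only a minimal dictionary-based sketch with a toy vocabulary):

    def segment(hashtag, vocab):
        """Split a hashtag into dictionary words via dynamic programming.

        Stand-in for Word Breaker: best[i] holds a segmentation of the
        first i characters of the hashtag body, or None if none exists.
        """
        s = hashtag.lstrip("#").lower()
        best = [None] * (len(s) + 1)
        best[0] = []
        for i in range(1, len(s) + 1):
            for j in range(i):
                if best[j] is not None and s[j:i] in vocab:
                    best[i] = best[j] + [s[j:i]]
                    break
        return best[len(s)]

    vocab = {"ice", "bucket", "als", "challenge"}
    print(segment("#icebucket", vocab))              # ['ice', 'bucket']
    print(segment("#ALSIceBucketChallenge", vocab))  # ['als', 'ice', 'bucket', 'challenge']

Segmenting both hashtags into the same underlying words is what lets the variants be conflated.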

We propose filtered rules to fuse the four types of contextual text:
• Text quality: 1 > 3 > 2 > 4 (numbers refer to the four sources above).
• The filtered rules save acquisition cost by 18% and improve representation quality.
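The poster does not spell out the filtered rules themselves; a plausible sketch, assuming a simple priority-based fusion that follows the quality ordering 1 > 3 > 2 > 4 and stops once enough text is collected (source names here are hypothetical):

    # Hypothetical source names; the ordering follows the observed
    # text quality 1 > 3 > 2 > 4 (hashtag text, extended context,
    # text in image, similar contexts).
    PRIORITY = ["hashtag_text", "extended_context",
                "text_in_image", "similar_contexts"]

    def fuse_contexts(contexts, max_sources=2):
        """contexts: dict mapping source name -> extracted text.

        Keep the highest-quality non-empty sources and stop early;
        skipping the noisier (and costlier) sources entirely is where
        an acquisition-cost saving would come from.
        """
        kept = []
        for name in PRIORITY:
            text = contexts.get(name, "").strip()
            if text:
                kept.append(text)
            if len(kept) >= max_sources:
                break
        return " ".join(kept)

For example, fuse_contexts({"hashtag_text": "ice bucket", "text_in_image": "ALS"}) returns "ice bucket ALS" without ever touching the similar-contexts source.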

• The traditional matrix factorization model does not work well here due to serious cold-start problems.
• To alleviate this, we propose feature-aware matrix factorization (FAMF) to model users' interests: decompose the user-item interaction into user-feature interactions.
• Pairwise learning to rank:
• Positive tweets (retweets) should be ranked higher than negative tweets (non-retweets).
• Bayesian Personalized Ranking [Rendle et al. 2009].
• Infer the parameters via stochastic gradient descent (SGD).

FAMF scores an item by pairing the user's latent factor with the latent factors of the item's features:

    \hat{y}_{ui} = p_u^\top q_i,  where  q_i = \sum_{n=1}^{N} \sum_{f \in F_n(i)} v_f

Here p_u is the user's latent factor, q_i is the item's latent factor, F_n(i) is the item's n-th type of features (N types in total, e.g., CITING text), and v_f is a feature's latent factor.
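A minimal numpy sketch of FAMF trained with BPR via SGD, under the assumptions above; the dimension K, the variable names, and the uniform (unweighted) feature aggregation are illustrative only, and the released code at https://github.com/kite1988/famf is the authoritative implementation:

    import numpy as np

    rng = np.random.default_rng(42)
    K = 16                                    # latent dimension (assumed)
    n_users, n_features = 926, 100_000        # e.g., CITING text terms
    P = rng.normal(0, 0.01, (n_users, K))     # user latent factors
    V = rng.normal(0, 0.01, (n_features, K))  # feature latent factors

    def famf_score(u, feats):
        """FAMF: the item factor is the sum of its feature factors, so a
        cold-start item is scored through features it shares with seen items."""
        return P[u] @ V[feats].sum(axis=0)

    def bpr_step(u, pos_feats, neg_feats, lr=0.05, reg=0.01):
        """One SGD step of Bayesian Personalized Ranking: push a retweeted
        item above a sampled non-retweeted one. Assumes each feature id
        appears at most once per item."""
        q_pos, q_neg = V[pos_feats].sum(0), V[neg_feats].sum(0)
        pu = P[u].copy()
        x = pu @ (q_pos - q_neg)          # pairwise score margin
        g = 1.0 / (1.0 + np.exp(x))       # sigma(-x), gradient weight
        P[u] += lr * (g * (q_pos - q_neg) - reg * pu)
        V[pos_feats] += lr * (g * pu - reg * V[pos_feats])
        V[neg_feats] -= lr * (g * pu + reg * V[neg_feats])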


Evaluation
• For each user, keep the 10 most recent retweets as the test set and the rest as the training set.
• Report mean average precision (MAP) and precision at top positions (P@k).
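These metrics follow their standard definitions; a small self-contained sketch:

    def precision_at_k(ranked_rels, k):
        """ranked_rels: 0/1 relevance of the ranked test tweets
        (1 = the user actually retweeted it)."""
        return sum(ranked_rels[:k]) / k

    def average_precision(ranked_rels):
        hits, total = 0, 0.0
        for rank, rel in enumerate(ranked_rels, start=1):
            if rel:
                hits += 1
                total += hits / rank
        return total / hits if hits else 0.0

    # MAP is average_precision averaged over all test users;
    # P@1/P@3/P@5 are precision_at_k averaged the same way.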

Findings (row numbers refer to the results table):
• 4-8 vs. 1-3: FAMF is effective in modelling user interest.
• 4 vs. 5: Visual objects alone are not sufficient to model a Twitter image's semantics.
• 7 vs. others: Our CITING text significantly outperforms the other representations.
• 7 vs. 6: The filtered fusion rules improve text quality.
• 8 vs. 7: Further incorporating visual objects does not consistently improve performance.

**: p<0.01, *: p<0.05