

Context-aware Image Tweet Modelling and Recommendation Tao Chen, Xiangnan He and Min-Yen Kan

School of Computing, National University of Singapore

Introduction

Research Questions
• How to represent the semantics of an image tweet? Are visual objects sufficient?
• How to utilize such representations for the personalized image tweet recommendation task?

ACM MM 2016, Amsterdam, The Netherlands

Personalized Image Tweet Recommendation

CITING: Context-aware Image Tweet Modelling

Tao Chen: [email protected]

Experiments

[Example image tweet: "Coming soon!!! imdb.to/IGxE9f"]

CITING extracts four types of contextual text for each image tweet:
1. Hashtag-enhanced text (segmented with Word Breaker)
2. Text in image (extracted with an OCR tool)
3. Extended context (pages linked by the tweet's URLs)
4. Similar contexts (via Google Image Search)

Dataset from Twitter:

            Twitter Users   Retweets   All Tweets    Ratings
  Training            926    174,765    1,316,645  1,592,837
  Test                         9,021       77,061     82,743

#  Method     Features                 P@1      P@3    P@5    MAP
1  Random     --                       0.114**  0.115  0.115  0.156**
2  Length     Post's text              0.176**  0.158  0.150  0.173**
3  Profiling  Post's text              0.336**  0.227  0.197  0.202**
4  FAMF       Visual objects           0.211**  0.205  0.192  0.211**
5  FAMF       Post's text              0.359*   0.325  0.287  0.275**
6  FAMF       Non-filtered context     0.413    0.352  0.319  0.296
7  FAMF       CITING                   0.419    0.355  0.319  0.298
8  FAMF       CITING + Visual objects  0.425    0.350  0.313  0.298

[Figure: user-item retweet matrix illustrating cold start; I5 and I6 are new items with no observed feedback]

         I1  I2  I3  I4  I5  I6
    U1    1   1   0   0   ?   ?
    U2    1   1   0   0   ?   ?
    U3    0   1   1   0   ?   ?
    U4    1   1   0   ?   ?   ?

Our Contributions
• We propose the CITING framework to model an image tweet by its contextual text.
• We develop a feature-aware matrix factorization model to capture users' personal interests.
• We have released our code and datasets: https://github.com/kite1988/famf

• Hashtag-enhanced text: conflate the variants of hashtags, e.g., #icebucket → "ice bucket" and #ALSIceBucketChallenge → "ALS ice bucket challenge" (a word-segmentation sketch follows this list).
• Text in image: overlaid text and screenshots of pure text.
• Similar contexts: Google Image Search returns a "best guess" label (70.6% are named entities) and the pages that contain the image.
[Chart labels: 82% contain the image; 22.7%; 24.6%; 14.3%]
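As an illustration of the hashtag segmentation step (the poster uses Microsoft's Word Breaker; this stand-in is only a minimal dictionary-based sketch with a toy vocabulary):

    def segment(hashtag, vocab):
        """Split a hashtag into dictionary words via dynamic programming.

        Stand-in for Word Breaker: best[i] holds a segmentation of the
        first i characters of the hashtag body, or None if none exists.
        """
        s = hashtag.lstrip("#").lower()
        best = [None] * (len(s) + 1)
        best[0] = []
        for i in range(1, len(s) + 1):
            for j in range(i):
                if best[j] is not None and s[j:i] in vocab:
                    best[i] = best[j] + [s[j:i]]
                    break
        return best[len(s)]

    vocab = {"ice", "bucket", "als", "challenge"}
    print(segment("#icebucket", vocab))              # ['ice', 'bucket']
    print(segment("#ALSIceBucketChallenge", vocab))  # ['als', 'ice', 'bucket', 'challenge']

Segmenting both hashtags into the same underlying words is what lets the variants be conflated.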

We propose filtered rules to fuse the four types of contextual text:
• Text quality: 1 > 3 > 2 > 4 (numbers refer to the four sources above).
• The filtered rules save acquisition cost by 18% and improve representation quality.
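The poster does not spell out the filtered rules themselves; a plausible sketch, assuming a simple priority-based fusion that follows the quality ordering 1 > 3 > 2 > 4 and stops once enough text is collected (source names here are hypothetical):

    # Hypothetical source names; the ordering follows the observed
    # text quality 1 > 3 > 2 > 4 (hashtag text, extended context,
    # text in image, similar contexts).
    PRIORITY = ["hashtag_text", "extended_context",
                "text_in_image", "similar_contexts"]

    def fuse_contexts(contexts, max_sources=2):
        """contexts: dict mapping source name -> extracted text.

        Keep the highest-quality non-empty sources and stop early;
        skipping the noisier (and costlier) sources entirely is where
        an acquisition-cost saving would come from.
        """
        kept = []
        for name in PRIORITY:
            text = contexts.get(name, "").strip()
            if text:
                kept.append(text)
            if len(kept) >= max_sources:
                break
        return " ".join(kept)

For example, fuse_contexts({"hashtag_text": "ice bucket", "text_in_image": "ALS"}) returns "ice bucket ALS" without ever touching the similar-contexts source.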

• The traditional matrix factorization model does not work well here due to serious cold-start problems.
• To alleviate this, we propose feature-aware matrix factorization (FAMF) to model users' interests: decompose the user-item interaction into user-feature interactions.
• Pairwise learning to rank:
• Positive tweets (retweets) should be ranked higher than negative tweets (non-retweets).
• Bayesian Personalized Ranking [Rendle et al. 2009].
• Infer the parameters via stochastic gradient descent (SGD).

FAMF scores an item by pairing the user's latent factor with the latent factors of the item's features:

    \hat{y}_{ui} = p_u^\top q_i,  where  q_i = \sum_{n=1}^{N} \sum_{f \in F_n(i)} v_f

Here p_u is the user's latent factor, q_i is the item's latent factor, F_n(i) is the item's n-th type of features (N types in total, e.g., CITING text), and v_f is a feature's latent factor.
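A minimal numpy sketch of FAMF trained with BPR via SGD, under the assumptions above; the dimension K, the variable names, and the uniform (unweighted) feature aggregation are illustrative only, and the released code at https://github.com/kite1988/famf is the authoritative implementation:

    import numpy as np

    rng = np.random.default_rng(42)
    K = 16                                    # latent dimension (assumed)
    n_users, n_features = 926, 100_000        # e.g., CITING text terms
    P = rng.normal(0, 0.01, (n_users, K))     # user latent factors
    V = rng.normal(0, 0.01, (n_features, K))  # feature latent factors

    def famf_score(u, feats):
        """FAMF: the item factor is the sum of its feature factors, so a
        cold-start item is scored through features it shares with seen items."""
        return P[u] @ V[feats].sum(axis=0)

    def bpr_step(u, pos_feats, neg_feats, lr=0.05, reg=0.01):
        """One SGD step of Bayesian Personalized Ranking: push a retweeted
        item above a sampled non-retweeted one. Assumes each feature id
        appears at most once per item."""
        q_pos, q_neg = V[pos_feats].sum(0), V[neg_feats].sum(0)
        pu = P[u].copy()
        x = pu @ (q_pos - q_neg)          # pairwise score margin
        g = 1.0 / (1.0 + np.exp(x))       # sigma(-x), gradient weight
        P[u] += lr * (g * (q_pos - q_neg) - reg * pu)
        V[pos_feats] += lr * (g * pu - reg * V[pos_feats])
        V[neg_feats] -= lr * (g * pu + reg * V[neg_feats])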


Evaluation
• For each user, keep the 10 most recent retweets as the test set and the rest as the training set.
• Report mean average precision (MAP) and precision at top positions (P@k).
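These metrics follow their standard definitions; a small self-contained sketch:

    def precision_at_k(ranked_rels, k):
        """ranked_rels: 0/1 relevance of the ranked test tweets
        (1 = the user actually retweeted it)."""
        return sum(ranked_rels[:k]) / k

    def average_precision(ranked_rels):
        hits, total = 0, 0.0
        for rank, rel in enumerate(ranked_rels, start=1):
            if rel:
                hits += 1
                total += hits / rank
        return total / hits if hits else 0.0

    # MAP is average_precision averaged over all test users;
    # P@1/P@3/P@5 are precision_at_k averaged the same way.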

Findings (row numbers refer to the results table):
• 4-8 vs. 1-3: FAMF is effective in modelling user interest.
• 4 vs. 5: Visual objects alone are not sufficient to model a Twitter image's semantics.
• 7 vs. others: Our CITING text significantly outperforms the other representations.
• 7 vs. 6: The filtered fusion rules improve text quality.
• 8 vs. 7: Further incorporating visual objects does not consistently improve performance.

**: p<0.01, *: p<0.05