Web Image Prediction Using Multivariate Point Processes
1
Web Image Prediction Using Multivariate Point Processes
Gunhee Kim1 Li Fei-Fei2 Eric P. Xing1
1: School of Computer Science, Carnegie Mellon University
2: Computer Science Department, Stanford University
August 14, 2012
2
Outline
• Problem Statement
• Method
    Multivariate Point Process + Poisson Regression
    Full model of Intensity Function
    Learning and Prediction
    Personalization
• Experiments
• Conclusion
4
Problem Statement - Web Image Prediction
A photo stream of world+cup from Flickr up to 12/31/2008.
Each image is associated with meta-data (timestamp, owner ID).
Can we guess what photos will appear on Flickr at tq = 6/6/2009?
Collective image prediction: the actual images at tq
Personalized image prediction: the actual images by user uq at tq
5
Why is Image Prediction Interesting? Predicting User Behaviors on the Web
User behavior on the Web changes over time.
(1) Keyword search
(2) News recommendation
(3) Product search
• What query terms are popular?
• What documents are most relevant?
• What documents are likely to be clicked?
Little previous work on what images people are interested in.
• [D08] Dakka et al., CIKM 2008
• [M09] Metzler et al., SIGIR 2009
• [K10] Kulkarni et al., WSDM 2011
• [V11] Amodeo et al., CIKM 2011
• [R12] Radinsky et al., WWW 2012
Why is Image Prediction Interesting? Time-sensitive Image Reranking
Submit the term world+cup to the Google/Bing/Flickr search engines.
[Figure panels: Bing results, Flickr results]
• Severely redundant. Almost identical all year long.
• Any meaningful order?
Increase diversity by temporal trends
Ranking by temporal suitability
Why is Image Prediction Interesting? Time-sensitive Image Reranking
Time-sensitive image reranking
For tq = Jun. 23 (summer)
For tq = Feb. 5 (winter)
Personalized time-sensitive image reranking
For tq = Aug. 23 and uq = 15655191
8
Relation to Previous Work
Web Content Dynamics
• Text based method [A11, W06]
• Image-based method [K10]
No image prediction. No personalization.
Similar Image Retrieval
• Semantic meaning of keyword + feature-wise similarity [D11, P08, T08]
Temporal trends + user histories
Image-based Collaborative Filtering
• Social trends in politics and market [J10]
• Spatio-temporal events [S10]
Images: source of prediction, not subject of prediction
Leveraging Web Photos to Infer Missing Information
• Scene completion [H07]
• 3D models of landmarks [SN10]
• Semantic image hierarchy [L10]
Future images: not studied as missing info to be inferred.
References
• [A11] Ahmed et al., AISTATS11
• [W06] Wang et al., KDD06
• [K10] Kim et al., ECCV10
• [D11] Deng et al., CVPR11
• [P08] Philbin et al., CVPR08
• [T08] Torralba et al., PAMI08
• [J10] Jin et al., MM10
• [S10] Singh et al., MM10
• [H07] Hays et al., SIGGRAPH07
• [SN10] Snavely et al., IEEE10
• [L10] Li et al., CVPR10
9
Summary of Contribution
Collective and personalized Web image prediction
(1) Predicting user behaviors on the Web
(2) Time-sensitive image reranking
Little previous work exists for large-scale Web images.
Algorithm based on multivariate point process
Novel in the image retrieval literature
Flexibility, optimality, scalability, and prediction accuracy
More than 10 million images of 40 topics
Outperforms baselines (PageRank-based IR, topic modeling)
11
Multivariate Point Process (MPP)
A stochastic process that consists of a series of random events in time and space.
Statistical model for spatio-temporal events
• Neural spiking modeling [Brown et al., Nat. Neuro. 04]
• Ecology: locations of Lauraceae trees [Moller et al. 2008]
• Computer vision: crowd counting [Ge et al., CVPR08], events in video [Prabhakar et al., CVPR10]
• Geology: micro-earthquake data [Schoenberg]
12
MPP for Image Streams
A point in time and image space = an occurrence of a particular image at a particular time
A short stream of penguin images
Each image is associated with (visual cluster, timestamp)
v1: ice hockey
v2: animal penguin
v3: snowy mountain
Discrete-time trivariate PP
13
Mathematical Formulation for MPP
A short stream of penguin images
Intensity function for VC i at t: the infinitesimal expected occurrence rate of visual cluster i at time t
The intensity function is the exponential of a linear combination of covariate functions, weighted by a parameter set.
Covariates: any factors likely to be associated with image occurrences (e.g., time, season, and other external events)
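Under assumed notation (lambda_i for the intensity of visual cluster i, theta_ij for its parameters, f_j for the covariate functions, and J for the number of covariates; none of these symbols is preserved in the transcript), the exponential-of-linear form described above can be sketched as:

```latex
\lambda_i(t) = \exp\!\Big( \sum_{j=1}^{J} \theta_{ij}\, f_j(t) \Big)
```

The exponential keeps the intensity positive for any real-valued parameters, which is what lets the covariates enter linearly.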
14
MLE solution for MPP
A short stream of penguin images
Parametric form of intensity functions with covariates
Log-likelihood of an observed stream
The MLE solution can tell which covariates contribute to the occurrence of visual cluster i.
Poisson regression: globally optimal solution
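A hedged sketch of the discrete-time Poisson log-likelihood this slide refers to, with assumed notation: n_i(t) is the observed count of visual cluster i at time step t, T is the number of time steps, and lambda_i, theta_ij, f_j are as in the intensity function.

```latex
\log L(\theta_i) = \sum_{t=1}^{T} \Big[ n_i(t)\, \log \lambda_i(t) - \lambda_i(t) - \log\big(n_i(t)!\big) \Big],
\qquad \log \lambda_i(t) = \sum_{j=1}^{J} \theta_{ij}\, f_j(t)
```

Because log lambda_i(t) is linear in theta_i and the term -lambda_i(t) is concave in theta_i, the whole expression is concave, which is why a local maximum is also the global one.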
15
Sparse MLE solution for MPP
A short stream of penguin images
Log-likelihood of an observed stream
For each visual cluster, only a small number of strong factors affect image occurrence.
A sparse solution is encouraged: L1 (Lasso) penalty
MLE solution: Cyclic coordinate descent [Friedman et al. 2010].
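The slide names cyclic coordinate descent [Friedman et al. 2010] as the solver. As a simpler stand-in, the sketch below minimizes the same kind of L1-penalized Poisson objective with proximal gradient descent (ISTA with backtracking); it is not the authors' solver. All function names, the averaging of the likelihood, and the choice to leave the bias column unpenalized are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of the L1 norm (elementwise shrinkage toward zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def poisson_nll(theta, X, y):
    """Average negative Poisson log-likelihood, dropping the log(y!) constant."""
    eta = X @ theta
    with np.errstate(over="ignore"):  # an overflowing trial step just evaluates to inf
        return float(np.mean(np.exp(eta) - y * eta))

def fit_l1_poisson(X, y, lam, n_iter=2000, tol=1e-8):
    """L1-penalized Poisson regression; column 0 of X is an unpenalized bias."""
    n, d = X.shape
    theta = np.zeros(d)
    penalty = np.full(d, float(lam))
    penalty[0] = 0.0                      # assumption: do not shrink the bias
    step = 1.0
    for _ in range(n_iter):
        grad = X.T @ (np.exp(X @ theta) - y) / n
        f_old = poisson_nll(theta, X, y)
        for _ in range(60):               # backtracking line search
            cand = soft_threshold(theta - step * grad, step * penalty)
            diff = cand - theta
            bound = f_old + grad @ diff + diff @ diff / (2.0 * step)
            if poisson_nll(cand, X, y) <= bound:
                break
            step *= 0.5
        theta = cand
        if np.max(np.abs(diff)) < tol:    # stop once the update is negligible
            break
    return theta
```

With a very large lam every penalized coefficient is shrunk exactly to zero and only the bias is fitted; smaller values keep the few covariates that matter most for a given visual cluster.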
16
A Toy Example of Image Prediction
Covariates: only year and months (1 + 7 + 12 = 20 parameters)
Shark example
• (Sea tour): peaked in summer
• (Ice hockey): peaked in January
[Figure: observed occurrence data, shown for every year and every month]
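A minimal sketch of how the 20 toy covariates (1 bias + 7 year indicators + 12 month indicators) could be laid out as a design matrix. The function name and the first_year parameter are hypothetical; the slide does not say which seven years are covered.

```python
import numpy as np

def year_month_covariates(years, months, first_year, n_years=7):
    """Build rows of [bias | one-hot year | one-hot month] covariates."""
    years = np.asarray(years)
    months = np.asarray(months)       # months are 1..12
    rows = np.arange(len(years))
    X = np.zeros((len(years), 1 + n_years + 12))
    X[:, 0] = 1.0                                     # bias term
    X[rows, 1 + (years - first_year)] = 1.0           # year indicator
    X[rows, 1 + n_years + (months - 1)] = 1.0         # month indicator
    return X

# Example with a placeholder first year (an assumption, not from the slide).
X = year_month_covariates([2002, 2008], [1, 12], first_year=2002)
print(X.shape)
```

A visual cluster that peaks every summer would end up with large weights on the summer month indicators, while a steadily growing one would show increasing weights on the year indicators.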
18
Full Model of Intensity Functions
History component
Correlation component
External component
Any probable factors can be included without performance loss because we encourage a sparse solution.
19
Full Model of Intensity Functions
History component
Correlation component
External component
History component: linear autoregressive (AR) process of order P
Typical pattern of annual periodicity
Biphasic = bursty occurrence
20
Full Model of Intensity Functions
History component
Correlation component
External component
Correlation component: existence or absence of a VC can be a strong clue.
Example patterns: synchronized; 4 months lag
21
Full Model of Intensity Functions
History component
Correlation component
External component
Month covariate, user covariate
Note
1. Flexibly add or remove covariate functions according to the characteristics of image topics.
2. AR can be replaced by a more general temporal model such as ARMA.
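One way the three components could be assembled into a covariate matrix for a single visual cluster, as a sketch under stated assumptions: the history component uses the cluster's own counts at lags 1..P, the correlation component uses the other clusters' counts at the same lags, and the external component is a one-hot month indicator. The function names and the exact feature layout are hypothetical, not taken from the slides.

```python
import numpy as np

def lag_features(counts, P):
    """lags[t, m, p-1] = counts[t-p, m]; zero where t-p is before the start.

    Assumes P is smaller than the number of time steps.
    """
    counts = np.asarray(counts, dtype=float)
    T, M = counts.shape
    lags = np.zeros((T, M, P))
    for p in range(1, P + 1):
        lags[p:, :, p - 1] = counts[:T - p, :]
    return lags

def covariates_for_cluster(counts, months, i, P):
    """Stack [bias | history | correlation | external] covariates for VC i."""
    lags = lag_features(counts, P)
    T, M, _ = lags.shape
    history = lags[:, i, :]                              # own past counts, (T, P)
    others = [m for m in range(M) if m != i]
    correlation = lags[:, others, :].reshape(T, -1)      # other VCs, (T, (M-1)*P)
    external = np.eye(12)[np.asarray(months) - 1]        # month one-hot, (T, 12)
    bias = np.ones((T, 1))
    return np.hstack([bias, history, correlation, external])
```

Because the fit is sparse, including every other cluster's lags is affordable: most of those weights are driven to zero, and the survivors indicate which clusters are synchronized with, or lead, cluster i.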
23
Learning and Prediction
Learning: O(MJT), only once offline
For each visual cluster (VC) i,
1. Figure out covariates for the intensity function.
2. Observe the actual occurrence of VC i.
3. Compute the MLE solution using cyclic coordinate descent.
Running time: 30 min (soccer topic of 810K images; M = 200, J = 118, T = 1,500)
Prediction: O(MJ), for each tq
Given a topic keyword and tq,
1. Gather covariate information for tq.
2. Compute the intensity function for each VC i.
3. Sample L images according to the computed intensities.
Running time: << 1 sec
M: no. of VCs; J: no. of covariates; T: no. of time steps
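The prediction steps can be sketched as follows, assuming the learned parameters are stored as an M x J matrix and that "sampling L images" means drawing visual clusters in proportion to their intensities at tq (the sampling formula itself is not preserved in the transcript). The function name is hypothetical.

```python
import numpy as np

def predict_cluster_counts(theta, f_tq, L, rng):
    """Return per-cluster intensities at tq and how many of L images to draw from each."""
    lam = np.exp(theta @ f_tq)          # one dot product per cluster: O(MJ)
    probs = lam / lam.sum()             # normalize intensities to a distribution
    picks = rng.multinomial(L, probs)   # number of images to take from each VC
    return lam, picks

rng = np.random.default_rng(0)
theta = np.array([[0.0, 1.0], [0.0, -1.0], [0.5, 0.0]])   # toy M = 3, J = 2
f_tq = np.array([1.0, 2.0])                                # toy covariates at tq
lam, picks = predict_cluster_counts(theta, f_tq, L=10, rng=rng)
print(lam, picks)
```

Once the per-cluster counts are drawn, actual images would be taken from each cluster in the image database; that retrieval step is outside this sketch.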
25
Personalization
Idea of locally-weighted learning [Atkeson et al. 97]
Collective image prediction: each image is equally weighted.
Personalized image prediction: for a user u6, each image is weighted according to the user similarity with u6.
Learning is more biased.
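One simple way to realize the per-image weighting, as a sketch: replace the raw occurrence counts with similarity-weighted counts before fitting. The function name, the integer index encoding of images, and the user_sim vector are all hypothetical; the slides do not specify how user similarity is computed or exactly how weights enter the likelihood.

```python
import numpy as np

def occurrence_counts(time_idx, cluster_idx, T, M, weights=None):
    """Accumulate (optionally weighted) image occurrences into a T x M table."""
    time_idx = np.asarray(time_idx)
    cluster_idx = np.asarray(cluster_idx)
    if weights is None:
        weights = np.ones(len(time_idx))      # collective: every image counts once
    counts = np.zeros((T, M))
    np.add.at(counts, (time_idx, cluster_idx), weights)  # handles repeated cells
    return counts

# Three toy images: (time step, visual cluster, owner).
time_idx, cluster_idx, owner_idx = [0, 0, 1], [0, 0, 1], np.array([0, 1, 1])
user_sim = np.array([1.0, 0.2])   # hypothetical similarity of each owner to the query user

collective = occurrence_counts(time_idx, cluster_idx, T=2, M=2)
personalized = occurrence_counts(time_idx, cluster_idx, T=2, M=2,
                                 weights=user_sim[owner_idx])
```

Images from users who resemble the query user dominate the weighted counts, so the fitted intensities lean toward that user's preferences.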
27
Flickr Dataset
10,284,945 images of 40 topic keywords
7 groups: Nations, Places, Animals, Objects, Activities, Abstract, Hot topics
Ex. Soccer dataset: seasonal variation, Zipf’s law
28
Experimental Tasks
Split the dataset into training/test sets
Timeline: training data + image DB up to 12/31/2008; tq is picked at random from the later period (timeline marked through 2010)
Positive test images: actual images within ±1 day of tq
L predicted images
Collective image prediction: randomly chosen 20 tq per topic
Personalized image prediction: randomly chosen 20 (tq, uq) pairs
29
Evaluation Measures
Actual images and predicted images each number more than hundreds. How can we compare them?
(1) Two distance metrics: lower is better.
• L2 distance on tiny images [Torralba et al. 2008]: resize images to 32x32
• SIFT/HOG
(2) Average precision: higher is better.
• Using predicted images, rank positive/negative test images
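A minimal sketch of the average-precision measure, assuming each test image gets a score from its distance to the nearest predicted image in some feature space. The scoring rule and both function names are assumptions for illustration; the slides only state that predicted images are used to rank positive and negative test images.

```python
import numpy as np

def score_by_nearest_prediction(test_feats, predicted_feats):
    """Score each test image by negative L2 distance to its nearest predicted image."""
    d = np.linalg.norm(test_feats[:, None, :] - predicted_feats[None, :, :], axis=2)
    return -d.min(axis=1)

def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which a positive image appears."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)   # 1 = positive test image, 0 = negative
    order = np.argsort(-scores, kind="stable")
    ranked = labels[order]
    hits = np.cumsum(ranked)
    precision_at_k = hits / np.arange(1, len(ranked) + 1)
    return float((precision_at_k * ranked).sum() / ranked.sum())
```

For example, ranking four test images with labels [1, 0, 1, 0] in score order gives precisions 1/1 and 2/3 at the two positives, so the average precision is 5/6.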
30
Quantitative Results
Baselines
• Sampling from ImageNet: semantic meaning only
• PageRank based IR: state-of-the-art retrieval
• Author-Time topic model: generative topic model
Results for collective image prediction and personalized image prediction
7~8% higher than the best baseline.
31
Examples of Collective Image Prediction
World+cup: panels (a) Jan., (b) May, (c) Sep.
Themes shown: Ski+skating; Bicycle+kayak+soccer; Soccer world cup
Cardinals: panels (a) Jan., (b) May, (c) Sep.
Themes shown: Football / Snow; Baseball / Leafy, Eggs; Baseball / Leafy
32
Examples of Personalized Image Prediction
Class
Fine+art: panels (a) User1, (b) User2, (c) User3
Themes shown: Painting; Photography; Flower
Brazilian: panels (a) User1, (b) User2, (c) User3
Themes shown: Dance; Auto-racing
34
Conclusion
What’s done
• Web image prediction: (1) user behavior prediction, (2) time-sensitive image reranking
• Poisson regression on multivariate point process
Observations
• Many topics are associated with predictable periodic events.
• Image-based personalization is important.
    Ex. What styles of painting does user A like?
    Images carry finer-grained information about user preference than text.
Example code will be available!