Enhanced Semantic TV-Shows Representation for Personalized Electronic Program Guides
UMAP 2012 - Industrial Track Montréal (Canada), 19.07.2012
Enhanced Semantic TV-Show Representation for Personalized Electronic Program Guides
Cataldo Musto, Fedelucio Narducci, Pasquale Lops, Giovanni Semeraro, Marco de Gemmis (University of Bari, Aldo Moro)
Mauro Barbieri, Jan Korst, Verus Pronk and Ramon Clout (Philips Research, Eindhoven, The Netherlands)
exponential growth of available TV assets
C. Musto, F. Narducci, G. Semeraro, P. Lops, M. de Gemmis, M. Barbieri, J. Korst, V. Pronk, R. Clout. Enhanced Semantic TV-Shows Representation for Personalized Electronic Program Guides - UMAP 2012 - 19.07.12
Some stats
4 hours watched every day
out of 3000 hours of broadcast TV shows
0.13% ratio
source: Nielsen Survey, 2011
Information Overload
what TV shows should I watch?
industrial scenario
how does Philips cope with the overload of TV shows?
personalization.
solution
recommender systems
content-based recommenders: key concepts
• Each item (TV show) has to be described through a set of features
• Description of TV shows, plot of the movie and so on.
• Each user is described through the features that occur in TV shows she watched (liked) in the past
• Recommendations are provided by calculating the overlap between the textual description of the TV show and the features stored in the user profile
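A minimal sketch of the overlap idea in the bullets above, assuming features are plain keyword sets. The function names (`build_profile`, `overlap_score`, `recommend`), the scoring formula and the toy shows are illustrative assumptions, not the system described in the talk.

```python
def build_profile(liked_shows):
    """Union of the features of every show the user liked."""
    profile = set()
    for features in liked_shows:
        profile |= set(features)
    return profile


def overlap_score(show_features, profile):
    """Fraction of the show's features that also occur in the profile."""
    show_features = set(show_features)
    if not show_features:
        return 0.0
    return len(show_features & profile) / len(show_features)


def recommend(candidates, profile):
    """Rank candidate shows (name -> features) by decreasing overlap, dropping zero scores."""
    scored = [(overlap_score(f, profile), name) for name, f in candidates.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ordered if score > 0]


# Toy data (invented): two liked shows and three candidates.
liked = [{"basketball", "nba", "playoffs"}, {"basketball", "highlights"}]
profile = build_profile(liked)
candidates = {
    "nba finals": {"nba", "basketball", "finals"},
    "nature documentary": {"documentary", "wildlife"},
    "football weekly": {"football", "league"},
}
ranking = recommend(candidates, profile)
```

Only shows sharing at least one keyword with the profile survive, which is also why short descriptions hurt this kind of matching.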
content-based recommenders: example of TV show recommendations
[Figure, shown in three steps: a user profile with two liked (♥) shows next to a list of candidate recommendations; labels: documentary, basketball, football, nba (basketball); in the later steps one candidate is marked with an X]
‘in vitro’ experiments: personal channels concept
Idea: combining boolean filters to filter TV shows and recommenders to rank them.
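The filter-then-rank idea can be sketched as follows. This is an illustration under assumptions: `personal_channel`, the show dictionaries and the keyword-overlap scorer are hypothetical stand-ins, not the Watchmi implementation.

```python
def personal_channel(shows, boolean_filter, score, top_n=3):
    """Keep the shows that pass the boolean filter, then rank the survivors by score."""
    kept = [show for show in shows if boolean_filter(show)]
    return sorted(kept, key=score, reverse=True)[:top_n]


# Toy programme guide and profile (invented data).
shows = [
    {"title": "MotoGP highlights", "type": "sport", "keywords": {"motogp", "race"}},
    {"title": "Evening news", "type": "news", "keywords": {"politics"}},
    {"title": "Basketball night", "type": "sport", "keywords": {"basketball", "nba"}},
]
profile = {"motogp", "race", "superbike"}

channel = personal_channel(
    shows,
    boolean_filter=lambda s: s["type"] == "sport",     # hard constraint
    score=lambda s: len(s["keywords"] & profile),      # recommender-style ranking
)
titles = [s["title"] for s in channel]
```

The filter decides what is eligible for the channel; the scorer only decides the order within it.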
‘in vitro’ experiments
Watchmi plug-in, developed by Aprico.tv
problem
descriptions of TV shows are often too short or not informative enough to feed a content-based recommendation algorithm
solution: feature generation techniques based on open knowledge sources
explicit semantic analysis
• Explicit Semantic Analysis (ESA) (Gabrilovich and Markovitch, 2006)
• Goals
• To introduce a methodology for representing the knowledge stored in Wikipedia
• To define a relationship between terms in natural language and Wikipedia articles
• Insights
• ESA provides a vector-space representation for each term
• Terms are represented as rows in a matrix (called ESA matrix) where each column is a Wikipedia concept (article)
ESA representation: term/document matrix
[Figure: matrix with terms t1 to t4 as rows and articles a1 to a9 as columns; a check mark (✔) means the term occurs in the article]
ESA representation: term/Wikipedia articles matrix
[Figure: the same matrix, with column a3 labelled MotoGP to show that each column is a Wikipedia article]
ESA representation
Every Wikipedia article is a concept
Each concept is represented through the TF-IDF scores of the terms that occur in the article
Example: Cat [0.92], Leopard [0.84], Roar [0.77]
MotoGP: Superbike (0.92), grand prix (0.76), valentino rossi (0.59)
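A small sketch of how an ESA-style matrix could be built: TF-IDF is computed per Wikipedia article, and the result is stored term by term so that each term maps to the concepts it occurs in. The three "articles" are invented toy token lists, and this particular TF-IDF variant (relative frequency times natural-log IDF) is an assumption; the talk does not say which weighting was used.

```python
import math
from collections import Counter, defaultdict


def build_esa_matrix(articles):
    """Return {term: {concept: tf-idf weight}} from {concept: list of tokens}."""
    n_articles = len(articles)
    # Document frequency: in how many articles each term occurs.
    df = Counter(term for tokens in articles.values() for term in set(tokens))
    matrix = defaultdict(dict)
    for concept, tokens in articles.items():
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_articles / df[term])
            if idf > 0:  # a term found in every article carries no signal
                matrix[term][concept] = tf * idf
    return dict(matrix)


# Toy "Wikipedia": three articles reduced to a handful of tokens (invented data).
articles = {
    "MotoGP": ["superbike", "grand", "prix", "rossi", "race"],
    "Max Biaggi": ["superbike", "rider", "race"],
    "Basketball": ["nba", "team", "ball"],
}
esa = build_esa_matrix(articles)
superbike_vector = esa["superbike"]  # one row of the matrix
```

Each row such as `superbike_vector` plays the role of a term's vector over Wikipedia concepts.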
ESA representation: term/Wikipedia articles matrix
[Figure: matrix with terms (Superbike, t2, t3, t4) as rows and Wikipedia articles (Politics, MotoGP, Basketball, M. Biaggi, V. Rossi) as columns; Superbike and t4 have three check marks each, t2 and t3 one each]
ESA representation
Each term can be defined upon the Wikipedia concepts it occurs in
the whole vector is called Semantic Interpretation Vector
“the semantics of a term is the vector of its associations with Wikipedia articles”
Cat: Cat [0.95], Jane Fonda [0.07], Panthera [0.92]
Superbike: MotoGP (0.92), Bridgestone (0.43), Max Biaggi (0.63)
ESA representation: semantics of text fragments
calculated as the centroid vector of the semantic interpretation vectors of the terms that compose the fragment
button: Dick Button [0.84], Button [0.93], Game Controller [0.32], Mouse (computing) [0.81]
mouse: Mouse (computing) [0.89], Mouse (rodent) [0.91], John Steinbeck [0.17], Mickey Mouse [0.81]
mouse button: Drag-and-drop [0.32], Mouse (rodent) [0.46], Mouse (computing) [0.85], IBM PS/2 [0.35]
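The centroid computation can be sketched with the "button" and "mouse" weights shown on the slide, stored as sparse dictionaries. The function name `centroid` is an assumption. Averaging the two vectors gives 0.85 for Mouse (computing) and about 0.46 for Mouse (rodent), which agrees with the "mouse button" values on the slide; other entries differ, presumably because the slide shows only a few entries of each vector.

```python
def centroid(vectors):
    """Average a list of sparse vectors given as {concept: weight} dicts."""
    total = {}
    for vector in vectors:
        for concept, weight in vector.items():
            total[concept] = total.get(concept, 0.0) + weight
    return {concept: weight / len(vectors) for concept, weight in total.items()}


# Semantic interpretation vectors as shown on the slide (truncated to four entries each).
button = {"Dick Button": 0.84, "Button": 0.93, "Game Controller": 0.32, "Mouse (computing)": 0.81}
mouse = {"Mouse (computing)": 0.89, "Mouse (rodent)": 0.91, "John Steinbeck": 0.17, "Mickey Mouse": 0.81}

mouse_button = centroid([button, mouse])
top = sorted(mouse_button, key=mouse_button.get, reverse=True)[:2]
```

A concept missing from one of the vectors simply contributes zero for that term, so concepts shared by both terms rise to the top.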
Research Question
How can we exploit ESA for performing feature generation in the scenario of EPG personalization?
ESA has already been adopted for text classification, information retrieval and semantic relatedness computation
From BOW to eBOW
Given a description of a TV show, we exploit ESA to obtain an enhanced representation
The original set of features is enriched with the set of Wikipedia articles most related to the TV show
From BOW to eBOW: algorithm
The centroid vector of the whole description of the TV show is calculated
The n most related Wikipedia concepts are extracted
Concepts are added to the original BOW to obtain an enhanced BOW (eBOW)
[Figure: BOW → centroid vector → Concept n [0.32], Concept 47 [0.46], Concept 1 [0.85], Concept 50 [0.35]]
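The three steps (centroid, top-n concepts, append to the BOW) can be sketched as below. The helper name `enhance_bow` and the tiny ESA slice are hypothetical: the "superbike" weights are the ones shown on an earlier slide, while the "prix" weights are invented for illustration.

```python
def enhance_bow(bow, esa, n=2):
    """Return the eBOW: the original terms plus the n concepts closest to the centroid."""
    vectors = [esa[term] for term in bow if term in esa]
    if not vectors:
        return list(bow)
    # Step 1: centroid of the semantic interpretation vectors of the description.
    centroid = {}
    for vector in vectors:
        for concept, weight in vector.items():
            centroid[concept] = centroid.get(concept, 0.0) + weight / len(vectors)
    # Step 2: the n most related Wikipedia concepts.
    top_concepts = sorted(centroid, key=centroid.get, reverse=True)[:n]
    # Step 3: append them to the original bag of words.
    return list(bow) + [c for c in top_concepts if c not in bow]


# Hypothetical slice of an ESA matrix: term -> {Wikipedia concept: weight}.
esa = {
    "superbike": {"MotoGP": 0.92, "Bridgestone": 0.43, "Max Biaggi": 0.63},
    "prix": {"MotoGP": 0.76, "Formula 1": 0.70},  # invented weights
}
bow = ["2012", "superbike", "italian", "grand", "prix"]
ebow = enhance_bow(bow, esa, n=2)
```

Terms without an ESA row are ignored when computing the centroid, which is one possible design choice; the slides do not say how out-of-vocabulary terms are handled.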
From BOW to eBOW: example
TV show: Rad an Rad: Die besten Duelle der MotoGP (Wheel to wheel: The best duels in the MotoGP)
Wikipedia articles: großer preis von italien (motorrad), großer preis von malaysia (motorrad), großer preis von tschechien (motorrad), scuderia ferrari, valentino rossi, motorrad-wm-saison 2005, motorrad-wm-saison 2006, max biaggi, großer preis der usa (motorrad), motorrad-wm-saison 2008, rad (heraldik), loris capirossi, shin’ya nakano, motogp
what about the advantages?
example: BOW representation
user profile: motogp, sports, motorbike, ..., competition
tv show: 2012 Superbike Italian Grand Prix
X No matching!
example: eBOW representation
user profile: motogp, superbike, sports, motorbike, formula 1, ..., competition
tv show: 2012 Superbike Italian Grand Prix
✔ Matching!
ESA advantages
knowledge is fluid.
it is necessary to exploit open and always updated knowledge sources
concept: example
‘American Politics’
Year  Enrichment
2000  Clinton
2005  Bush
2011  Obama
concept: (counter)example
‘Italian Politics’
Year  Enrichment
2000  Berlusconi
2005  Berlusconi
2011  Berlusconi
experiments.
design of the experiments: task
• retrieval task
• Given a set of program types and a repository of TV shows
• We want to retrieve the shows that belong to a specific program type (e.g., Movie)
dataset: Aprico.tv data
• Dataset
• 47 German-language channels provided by Axel Springer
• 133k TV shows, 17 program types
• Textual features: title, synopsis, description, program type
• Explicit Semantic Analysis
• Dump: October 2010
• 814,013 terms (rows) and 484,218 articles (columns)
design of the experiments: learning methods
• Two state-of-the-art learning methods have been compared
• Random Indexing
• Vector Space Model (VSM)-based representation
• Incremental approach to compress the representation in an effective way
• Both TV shows and user profile are points in a vector space
• Logistic Regression
• Supervised learning method, state of the art for text classification
• Each TV show is classified as relevant or not relevant for the user, according to the user profile
• TV shows can be ranked according to their probability scores
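To make the "compressed vector space" idea concrete, here is a deliberately simplified sketch in the spirit of Random Indexing: every term gets a fixed sparse random vector, and a show or profile is the sum of the vectors of its terms. The dimension, the number of non-zero entries and all names are assumptions; the configuration used in the experiments is not given in the slides, and full Random Indexing usually also accumulates context vectors.

```python
import math
import random


def index_vector(term, dim=64, nonzeros=4):
    """Sparse ternary random vector for a term, reproducible across runs."""
    rng = random.Random(term)  # seeding with the term string makes the vector deterministic
    vector = [0] * dim
    for position in rng.sample(range(dim), nonzeros):
        vector[position] = rng.choice((-1, 1))
    return vector


def embed(terms, dim=64):
    """Represent a bag of words as the sum of its terms' index vectors."""
    total = [0] * dim
    for term in terms:
        for i, value in enumerate(index_vector(term, dim)):
            total[i] += value
    return total


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Profile and shows live in the same 64-dimensional space (toy keywords).
profile = embed(["motogp", "superbike", "race", "sport"])
show_a = embed(["superbike", "race", "italian", "grand", "prix"])
show_b = embed(["cooking", "recipe", "kitchen"])
scores = {"show_a": cosine(profile, show_a), "show_b": cosine(profile, show_b)}
```

The vocabulary can grow without changing the dimension, which is the sense in which the representation is built incrementally.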
design of the experiments: research questions
1. Which learning method provides the best recommendations?
2. Does enriching the BOWs with ESA improve the accuracy of the suggestions?
experiment 1: results
[Figure: precision (y axis, 50 to 100) at P@5%, P@10%, P@25%, P@50%, P@75%, P@100% for Logistic Regression and Random Indexing]
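A sketch of the metric on the x axis, assuming that P@k% denotes precision over the top k% of the ranked list; the slides do not define it explicitly. The function name and the relevance labels are hypothetical.

```python
import math


def precision_at_percent(ranked_relevance, percent):
    """Precision over the top `percent`% of a ranked list of 0/1 relevance labels."""
    if not ranked_relevance:
        return 0.0
    cutoff = max(1, math.ceil(len(ranked_relevance) * percent / 100))
    top = ranked_relevance[:cutoff]
    return sum(top) / len(top)


# Hypothetical ranking of 10 shows: 1 = belongs to the target program type.
ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
curve = {p: precision_at_percent(ranked, p) for p in (10, 50, 100)}
```

P@100% is the share of relevant shows in the whole list, so a good ranker shows up as higher precision at the small cutoffs.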
experiment 2: results
[Figure: precision (y axis, 74 to 95) at P@5%, P@10%, P@25%, P@50%, P@75%, P@100% for BOW, eBOW (+20), eBOW (+40), eBOW (+60)]
Differences between BOW and eBOW (+40, +60) are statistically significant (Mann-Whitney test, p < 0.005)
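For readers unfamiliar with the test, a sketch of the Mann-Whitney U statistic with a two-sided p-value from the normal approximation (no tie or continuity correction, a simplification). The two samples are invented placeholder values, NOT the precision scores measured in the study.

```python
import math


def mann_whitney_u(xs, ys):
    """U statistic of xs versus ys and a two-sided p-value (normal approximation)."""
    # Count the pairs where x beats y; ties count one half.
    u = sum((x > y) + 0.5 * (x == y) for x in xs for y in ys)
    n1, n2 = len(xs), len(ys)
    mean = n1 * n2 / 2
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u, p_value


# Hypothetical precision values per query, NOT the results of the study.
bow = [0.80, 0.82, 0.85, 0.81, 0.79, 0.83, 0.84, 0.78]
ebow = [0.88, 0.90, 0.91, 0.89, 0.93, 0.92, 0.87, 0.94]
u, p = mann_whitney_u(ebow, bow)
```

The test compares ranks and not raw values, so it makes no normality assumption about the precision scores.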
Recap
• Content-based Personalization Techniques for Electronic Program Guides
• Joint work: Philips Research - Aprico.tv - University of Bari
• Feature generation to enrich textual descriptions of TV shows
• Exploitation of ESA: Explicit Semantic Analysis
• Introducing eBOW for content representation
• BOW + Wikipedia concepts related to the textual description
Conclusions
• Logistic Regression can provide good accuracy in retrieving related TV shows
• Almost 90% in precision.
• Feature generation techniques based on Wikipedia can improve the precision of a content-based recommendation approach
• eBOW representation outperforms the classical BOW representation
• Good results: 94% in precision
questions?