Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems -...
Transcript of Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems -...
![Page 1: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/1.jpg)
Cataldo Musto and Pasquale Lops
Dept. of Computer Science
University of Bari “Aldo Moro”, Italy
semantics-aware techniques for
social media analysis
user modelling
and recommender systems
tutorial@UMAP 2016 Halifax, Canada – July 16, 2016
![Page 3: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/3.jpg)
in this tutorial
how to represent content
to improve information access and build a
new generation of services for social media
analysis, user modeling and
recommender systems?
![Page 4: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/4.jpg)
Agenda
Why?
How?
What?
Why do we need intelligent information access?
Why do we need content?
Why do we need semantics?
How to introduce semantics?
Basics of Natural Language Processing
Encoding exogenous semantics (top-down approaches)
Encoding endogenous semantics (bottom-up approaches)
Semantics-aware Recommender Systems
Cross-lingual Recommender Systems
Explaining Recommendations
Semantic User Profiles based on Social Data
Semantic Analysis of Social Streams
![Page 5: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/5.jpg)
Why?
Why do we need intelligent information access?
![Page 6: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/6.jpg)
![Page 7: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/7.jpg)
physiologically
impossible
to follow the information flow
in real time
![Page 8: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/8.jpg)
(Source: Adrian C. Ott, The 24-Hour Customer)
we can handle
126 bits of
information/day
we deal with
393 bits of
information/day
ratio: 3x
![Page 9: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/9.jpg)
Information overload (Appeared for the first time in «Future Shock» by Alvin Toffler, 1970)
![Page 10: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/10.jpg)
![Page 11: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/11.jpg)
![Page 12: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/12.jpg)
Information overload
“It is not information overload. It is filter failure”
Clay Shirky, talk @Web 2.0 Expo
![Page 13: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/13.jpg)
Challenge
To effectively cope with
information overload
we need to filter the information flow
We need technologies and algorithms for intelligent information access
… and we already have some evidence!
![Page 14: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/14.jpg)
Intelligent Information Access
Information Retrieval (Search Engines)
![Page 15: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/15.jpg)
Intelligent Information Access
Information Filtering (Recommender Systems)
![Page 16: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/16.jpg)
Why? Why do we need content?
![Page 17: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/17.jpg)
Search engines need content
Why do we need content?
Trivial: search engines can’t work without content
![Page 18: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/18.jpg)
Why do we need content?
Recommender Systems: not trivial!
![Page 19: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/19.jpg)
Why do we need content?
Recommender Systems can work without content
![Page 20: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/20.jpg)
Why do we need content?
Several Recommender Systems work perfectly well without using any content (e.g. Amazon)!
Collaborative Filtering and Matrix Factorization are state-of-the-art techniques for implementing Recommender Systems
(ACM RecSys 2009, by Netflix Challenge winners)
![Page 21: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/21.jpg)
Why do we need content?
Content can tackle some issues of collaborative filtering
![Page 22: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/22.jpg)
Why do we need content?
Collaborative Filtering issues: sparsity
![Page 23: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/23.jpg)
Why do we need content?
Collaborative Filtering issues: new item problem
![Page 24: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/24.jpg)
Why do we need content?
Collaborative Filtering: lack of transparency!
![Page 25: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/25.jpg)
Why do we need content?
Collaborative Filtering: poor explanations!
Who knows the «customers who bought…»?
![Page 26: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/26.jpg)
Why do we need content?
User Modeling based on a simple graph-based representation of social connections is quite poor.
User Models can benefit from information about the items the user has consumed (news content, hashtags contained in the tweets she liked, etc.)
To enrich and improve user modeling
![Page 27: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/27.jpg)
Why do we need content?
Because a relevant part of the information spread
on social media is content!
And social media really matter
![Page 28: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/28.jpg)
Because a relevant part of the information spread
on social media is content!
![Page 29: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/29.jpg)
can be considered as novel data silos
Social Media
![Page 30: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/30.jpg)
Social Media
information about preferences
![Page 31: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/31.jpg)
information about people's feelings and connections
Social Media
![Page 32: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/32.jpg)
changed the rules for
user modeling and
personalization
Social Media
![Page 33: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/33.jpg)
Recap #1
Why do we need content?
In general: to extend and improve user modeling
To exploit the information spread on social media
To overcome typical issues of collaborative filtering
and matrix factorization
Because search engines simply can’t work without content
![Page 34: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/34.jpg)
Why? Why do we need semantics?
![Page 35: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/35.jpg)
Why do we need semantics?
A deep comprehension of the information conveyed by
textual content is crucial to improve the quality of user
profiles and the effectiveness of intelligent information
access platforms.
![Page 36: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/36.jpg)
Why do we need semantics?
…some scenarios can be more convincing
(But we need some basics first)
![Page 37: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/37.jpg)
Basics: Content-based RecSys (CBRS)
Suggest items similar to those the
user liked in the past
Recommendations generated by matching
the description of items with the
profile of the user’s interests
use of specific features
Recommender Systems Handbook,
The Adaptive Web
![Page 38: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/38.jpg)
Basics: Content-based RecSys (CBRS)
user profile items
![Page 39: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/39.jpg)
Basics: Content-based RecSys (CBRS)
user profile items
Recommendations are
generated by
matching the features
stored in the user
profile with those
describing the items
to be recommended.
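The matching step described above can be sketched as follows. This is a minimal illustration, not the method of any specific system: it assumes a bag-of-features profile and cosine similarity, and the function names (`cosine`, `build_profile`, `recommend`) and the toy catalog are invented for the example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse feature -> weight mappings."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(liked_items):
    """Sum the feature counts of every item the user liked."""
    profile = Counter()
    for features in liked_items:
        profile.update(features)
    return profile

def recommend(profile, catalog, k=2):
    """Return up to k item names, best match first, skipping items with no overlap."""
    scored = sorted(
        ((cosine(profile, Counter(features)), name) for name, features in catalog.items()),
        reverse=True,
    )
    return [name for score, name in scored[:k] if score > 0]

# Toy data, invented for illustration only.
liked = [["science", "fiction", "machine"], ["dystopian", "future", "machine"]]
catalog = {
    "Movie A": ["machine", "future", "rebellion"],
    "Movie B": ["romance", "comedy"],
    "Movie C": ["science", "documentary"],
}
profile = build_profile(liked)
print(recommend(profile, catalog))  # ['Movie A', 'Movie C']
```

"Movie B" is never suggested because it shares no feature with the profile: the approach depends entirely on literal feature overlap.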
![Page 40: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/40.jpg)
Basics: Content-based RecSys (CBRS)
user profile items
Recommendations are
generated by
matching the features
stored in the user
profile with those
describing the items
to be recommended. X
![Page 41: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/41.jpg)
Lack of Semantics in User Models
“I love turkey. It’s my choice
for these #holidays!”
Social Media can be helpful to avoid cold start
![Page 42: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/42.jpg)
Lack of Semantics in User Models
“I love turkey. It’s my choice
for these #holidays!”
..but pure content-based representations
can’t handle polysemy
![Page 43: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/43.jpg)
Lack of Semantics in User Models
“I love turkey. It’s my choice
for these #holidays!”
Pure content-based representations can easily drive a recommender system towards failure!
?
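A small sketch of why the tweet is a problem for keyword matching. The catalog and the helper `keyword_matches` are hypothetical, made up for this example; the point is only that the string "turkey" matches both senses.

```python
# Keywords from the tweet after basic preprocessing (toy example).
profile = {"love", "turkey", "choice", "holidays"}

# Hypothetical catalog: item title -> keyword set.
catalog = {
    "Roast turkey recipes": {"turkey", "roast", "recipe"},
    "Istanbul travel guide": {"turkey", "istanbul", "travel"},
    "Alpine ski resorts": {"ski", "snow", "alps"},
}

def keyword_matches(profile, catalog):
    """Titles sharing at least one keyword with the profile, in alphabetical order."""
    return sorted(title for title, keywords in catalog.items() if profile & keywords)

print(keyword_matches(profile, catalog))
# ['Istanbul travel guide', 'Roast turkey recipes']
```

Both items are retrieved with the same evidence, so a keyword-only model cannot tell whether the user means the food or the country.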
![Page 44: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/44.jpg)
Lack of Semantics in Social Media Analysis
?
What are people worried about?
Are they worried about the eagle
or about the city of L’Aquila?
![Page 45: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/45.jpg)
Lack of Semantics in User Models
AI
Artificial
Intelligence
apple
multi-word concepts
?
Book recommendation
![Page 46: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/46.jpg)
AI
Artificial
Intelligence
apple
synonymy
Lack of Semantics in User Models
…is not only about polysemy
?
Book recommendation
Most of the preferences regard AI,
but due to synonymy «apple» is the
most relevant feature in the profile
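The synonymy effect can be reproduced with a few lines. The keyword list and the `CONCEPT_OF` mapping are assumptions invented for this sketch; obtaining such a mapping is precisely what a semantics-aware representation has to provide.

```python
from collections import Counter

# Keywords extracted from the books a user liked (toy data).
keywords = ["AI", "Artificial Intelligence", "apple", "AI",
            "Artificial Intelligence", "apple", "apple"]

# Hypothetical surface-form -> concept mapping; a real system would need a
# knowledge source or a learned model to obtain it.
CONCEPT_OF = {
    "AI": "artificial_intelligence",
    "Artificial Intelligence": "artificial_intelligence",
    "apple": "apple",
}

keyword_profile = Counter(keywords)
concept_profile = Counter(CONCEPT_OF[k] for k in keywords)

print(keyword_profile.most_common(1))  # [('apple', 3)]
print(concept_profile.most_common(1))  # [('artificial_intelligence', 4)]
```

With plain keywords the evidence for AI is split across two strings, so «apple» wins; once both forms map to one concept, AI becomes the dominant feature.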
![Page 47: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/47.jpg)
italian
english
Lack of Semantics in CBRS
![Page 48: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/48.jpg)
Lack of Semantics in CBRS
user profile items
![Page 49: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/49.jpg)
Lack of Semantics in CBRS
user profile items
It is likely that the
algorithm will not be able
to suggest a
(relevant) English
news item, since there is no
overlap between
the features!
X
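The missing overlap can be made concrete with a toy comparison. The word lists and the `CONCEPT_OF` dictionary are hypothetical and only serve the illustration; Jaccard overlap is used here as one simple way to measure shared features.

```python
def jaccard(a, b):
    """Share of features two collections have in common (0.0 when both are empty)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

italian_profile = ["calcio", "partita", "squadra"]
english_news = ["football", "match", "team"]

# Hypothetical language-independent concept identifiers (assumption).
CONCEPT_OF = {
    "calcio": "C_FOOTBALL", "football": "C_FOOTBALL",
    "partita": "C_MATCH", "match": "C_MATCH",
    "squadra": "C_TEAM", "team": "C_TEAM",
}

def to_concepts(words):
    """Replace each word by its concept id, keeping unknown words as they are."""
    return [CONCEPT_OF.get(w, w) for w in words]

print(jaccard(italian_profile, english_news))                            # 0.0
print(jaccard(to_concepts(italian_profile), to_concepts(english_news)))  # 1.0
```

At the keyword level the Italian profile and the English item share nothing; at the concept level they describe the same topic.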
![Page 50: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/50.jpg)
![Page 51: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/51.jpg)
Recap #2
In general: to improve
content representation in
intelligent information
access platforms
To avoid typical issues of
natural language
representations (polysemy,
synonymy, etc.)
To better model user
preferences
To better understand the
information spread on social
media
To provide multilingual
recommendations
Why do we need semantics?
Because language is
inherently ambiguous
![Page 52: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/52.jpg)
How?
How to introduce semantics?
![Page 53: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/53.jpg)
Information Retrieval and Filtering
Two sides of the same coin (Belkin & Croft, 1992)
Information
Retrieval
information need expressed
through a query
goal: retrieve information which
might be relevant to a
user
Information
Filtering
information need expressed
through a
user profile
goal: expose users to only the
information that is
relevant to them,
according to personal profiles
[Belkin & Croft, 1992] Belkin, Nicholas J., and W. Bruce Croft. "Information filtering and information retrieval: Two sides of the same coin?" Communications of the ACM 35.12 (1992): 29-38.
It’s all about searching!
![Page 54: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/54.jpg)
Search (and Content-based Recommendation)
is not as simple as it might seem
Meno: and how will you enquire, Socrates, into that
which you do not know? What will you put forth
as the subject of enquiry? And if you find what
you want, how will you know that this is the
thing you did not know?
Socrates: I know, Meno, what you mean; but just
see what a tiresome dispute you are introducing. You argue that a man cannot search either
for what he knows or for what he does not
know; if he knows it, there is no need to search;
and if not, he cannot; he does not know the very
subject about which he is to search.
Plato Meno 80d-81a
http://www.gutenberg.org/etext/1643
Meno’s Paradox of Inquiry
![Page 55: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/55.jpg)
Meno’s question in our times:
the “vocabulary mismatch” problem (revisited)
How to discover the concepts that connect us to
the information we are seeking (search task) or want
to be exposed to (recommendation and user modeling
tasks)?
![Page 56: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/56.jpg)
Meno’s question in our times:
the “vocabulary mismatch” problem (revisited)
How to discover the concepts that connect us to
the information we are seeking (search task) or want
to be exposed to (recommendation and user modeling
tasks)?
We need some «intelligent» support
(such as intelligent information access
technologies)
![Page 57: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/57.jpg)
Meno’s question in our times:
the “vocabulary mismatch” problem (revisited)
How to discover the concepts that connect us to
the information we are seeking (search task) or want
to be exposed to (recommendation and user modeling
tasks)?
We need to better understand
and represent the content
We need some «intelligent» support
(such as intelligent information access
technologies)
![Page 58: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/58.jpg)
![Page 59: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/59.jpg)
…before semantics
some basics
of Natural Language Processing (NLP)
![Page 60: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/60.jpg)
How?
basics of NLP and keyword-based representations
![Page 61: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/61.jpg)
Scenario
Pasquale really loves the movie «The Matrix», and he asks a content-based
recommender system for some suggestions.
How can we feed the algorithm with some textual features related to the movie
to build a (content-based) profile and provide recommendations?
?
Question
Recommendation
Engine
![Page 62: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/62.jpg)
Scenario
the plot can be a rich source of content-based features
![Page 63: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/63.jpg)
Scenario
the plot can be a rich source of content-based features
…but we need to properly process it through a pipeline of
Natural Language Processing techniques
![Page 64: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/64.jpg)
Basic NLP operations
o normalization: strip unwanted characters/markup (e.g. HTML/XML tags, punctuation, numbers, etc.)
o tokenization: break text into tokens
o stopword removal: exclude common words having little semantic content
o lemmatization: reduce inflectional/variant forms to base form (lemma in the dictionary), e.g. am, are, is → be
o stemming: reduce terms to their “roots”, e.g. automate(s), automatic, automation → automat
vocabulary
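The operations listed above can be chained into a small pipeline. What follows is a sketch under strong assumptions: the stopword list and the lemma lookup are tiny hand-made stand-ins, tokens are lowercased (the slides keep the original case), digits are kept, and `stem` is a crude suffix stripper written for this example, not the Porter algorithm.

```python
import re

# Toy resources (assumptions); real pipelines use full stopword lists and a
# dictionary-based lemmatizer.
STOPWORDS = {"the", "is", "a", "an", "and", "by", "in", "which", "as", "to", "are", "it"}
LEMMAS = {"am": "be", "are": "be", "is": "be",
          "machines": "machine", "humans": "human", "created": "create"}

def normalize(text):
    """Strip markup tags, then replace punctuation and other symbols with spaces."""
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"[^\w\s]", " ", text)

def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def lemmatize(tokens):
    """Look each token up in the toy lemma table; unknown tokens are unchanged."""
    return [LEMMAS.get(t, t) for t in tokens]

def stem(token):
    """Very crude suffix stripping, for illustration only."""
    for suffix in ("ion", "ic", "es", "e", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return lemmatize(remove_stopwords(tokenize(normalize(text))))

text = "<p>The Matrix is a 1999 film, created by sentient machines.</p>"
print(preprocess(text))
# ['matrix', '1999', 'film', 'create', 'sentient', 'machine']
print({stem(w) for w in ["automate", "automates", "automatic", "automation"]})
# {'automat'}
```

Lemmatization and stemming are alternatives rather than consecutive steps: the pipeline applies the former, and `stem` is shown separately to reproduce the "automat" example.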
![Page 65: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/65.jpg)
Example
The Matrix is a 1999 American-Australian neo-noir
science fiction action film written and directed by the
Wachowskis, starring Keanu Reeves, Laurence
Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe
Pantoliano. It depicts a dystopian future in which reality
as perceived by most humans is actually a simulated
reality called "the Matrix", created by sentient machines
to subdue the human population, while their bodies' heat
and electrical activity are used as an energy source.
Computer programmer "Neo" learns this truth and is
drawn into a rebellion against the machines, which
involves other people who have been freed from the
"dream world".
![Page 66: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/66.jpg)
The Matrix is a 1999 American-Australian neo-noir
science fiction action film written and directed by the
Wachowskis, starring Keanu Reeves, Laurence
Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe
Pantoliano. It depicts a dystopian future in which reality
as perceived by most humans is actually a simulated
reality called "the Matrix", created by sentient machines
to subdue the human population, while their bodies' heat
and electrical activity are used as an energy source.
Computer programmer "Neo" learns this truth and is
drawn into a rebellion against the machines, which
involves other people who have been freed from the
"dream world".
Example
normalization
![Page 67: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/67.jpg)
Example
The Matrix is a 1999 American Australian neo noir
science fiction action film written and directed by the
Wachowskis starring Keanu Reeves Laurence Fishburne
Carrie Anne Moss Hugo Weaving and Joe Pantoliano It
depicts a dystopian future in which reality as perceived
by most humans is actually a simulated reality called the
Matrix created by sentient machines to subdue the
human population while their bodies heat and electrical
activity are used as an energy source Computer
programmer Neo learns this truth and is drawn into a
rebellion against the machines which involves other
people who have been freed from the dream world
tokenization
![Page 68: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/68.jpg)
Tokenization issues
compound words
o science-fiction: break up hyphenated sequence?
o Keanu Reeves: one token or two? How do you decide it is one
token?
numbers and dates
o 3/20/91 Mar. 20, 1991 20/3/91
o 55 B.C.
o (800) 234-2333
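To see how much the tokenizer's rules matter, the sketch below compares a naive word split with a regular expression that keeps hyphenated words, numeric dates and one phone-number format together. The pattern `TOKEN_RE` is an assumption written for this example and covers only the formats shown; it does not address multi-word names such as "Keanu Reeves".

```python
import re

TOKEN_RE = re.compile(r"""
    \(\d{3}\)\s\d{3}-\d{4}      # phone number such as (800) 234-2333
  | \d{1,2}/\d{1,2}/\d{2,4}     # numeric date such as 3/20/91
  | \w+(?:-\w+)*                # word, optionally hyphenated
""", re.VERBOSE)

text = "science-fiction film released 3/20/91 call (800) 234-2333"

naive = re.findall(r"\w+", text)
careful = TOKEN_RE.findall(text)

print(naive)
# ['science', 'fiction', 'film', 'released', '3', '20', '91', 'call', '800', '234', '2333']
print(careful)
# ['science-fiction', 'film', 'released', '3/20/91', 'call', '(800) 234-2333']
```

Neither output is "the" correct one: whether "science-fiction" should be one token or two depends on how the features will be matched later.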
![Page 69: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/69.jpg)
Tokenization issues
language issues
o German noun compounds not segmented
Lebensversicherungsgesellschaftsangestellter means
life insurance company employee
o Chinese and Japanese have no spaces between words (a unique
tokenization is not always guaranteed)
莎拉波娃现在居住在美国东南部的佛罗里达
o Arabic (or Hebrew) is basically written right to left, but with certain items like
numbers written left to right
Algeria achieved its independence in 1962 after 132 years of French
occupation
![Page 70: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/70.jpg)
Example
stopword removal
The Matrix is a 1999 American Australian neo noir
science fiction action film written and directed by the
Wachowskis starring Keanu Reeves Laurence Fishburne
Carrie Anne Moss Hugo Weaving and Joe Pantoliano It
depicts a dystopian future in which reality as perceived
by most humans is actually a simulated reality called the
Matrix created by sentient machines to subdue the
human population while their bodies heat and electrical
activity are used as an energy source Computer
programmer Neo learns this truth and is drawn into a
rebellion against the machines which involves other
people who have been freed from the dream world
![Page 71: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/71.jpg)
Example
The Matrix is a 1999 American Australian neo noir
science fiction action film written and directed by the
Wachowskis starring Keanu Reeves Laurence Fishburne
Carrie Anne Moss Hugo Weaving and Joe Pantoliano It
depicts a dystopian future in which reality as perceived
by most humans is actually a simulated reality called the
Matrix created by sentient machines to subdue the
human population while their bodies heat and electrical
activity are used as an energy source Computer
programmer Neo learns this truth and is drawn into a
rebellion against the machines which involves other
people who have been freed from the dream world
stopword removal
![Page 72: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/72.jpg)
Example
The Matrix is a 1999 American Australian neo noir
science fiction action film written and directed by the
Wachowskis starring Keanu Reeves Laurence Fishburne
Carrie Anne Moss Hugo Weaving and Joe Pantoliano It
depicts a dystopian future in which reality as perceived
by most humans is actually a simulated reality called the
Matrix created by sentient machines to subdue the
human population while their bodies heat and electrical
activity are used as an energy source Computer
programmer Neo learns this truth and is drawn into a
rebellion against the machines which involves other
people who have been freed from the dream world
lemmatization
![Page 73: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/73.jpg)
Example
Matrix 1999 American Australian neo noir science fiction
action film write direct Wachowskis star Keanu Reeves
Laurence Fishburne Carrie Anne Moss Hugo Weaving
Joe Pantoliano depict dystopian future reality perceived
human simulate reality call Matrix create sentient
machine subdue human population body heat electrical
activity use energy source Computer programmer Neo
learn truth draw rebellion against machine involve people
free dream world
next step: to give a weight to each feature
(e.g. through TF-IDF)
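The two preprocessing steps shown on the previous slides can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the tutorial: the stopword list and the lemma table below are tiny hand-made assumptions, whereas a real system would rely on a full lexicon and a proper lemmatizer.

```python
import re

# Tiny illustrative resources (assumptions, not a real lexicon).
STOPWORDS = {"the", "is", "a", "an", "and", "by", "in", "which",
             "as", "to", "of", "are", "it"}
LEMMAS = {"written": "write", "directed": "direct", "starring": "star",
          "depicts": "depict", "machines": "machine", "bodies": "body"}

def preprocess(text):
    """Tokenize, drop stopwords, then map each remaining token to its lemma."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    kept = [t for t in tokens if t.lower() not in STOPWORDS]
    # Tokens without an entry in the lemma table are kept unchanged.
    return [LEMMAS.get(t.lower(), t) for t in kept]

features = preprocess(
    "The Matrix is a 1999 film written and directed by the Wachowskis")
print(features)
```

The resulting list is the bag of features that the weighting step operates on.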
![Page 74: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/74.jpg)
Weighting features: TF-IDF
term frequency-inverse document frequency:
the best-known weighting scheme in information retrieval.
The weight of a term is the product of its tf weight and its idf weight
tf: number of times the term occurs in the document
idf: depends on the rarity of the term in the collection
tf-idf increases with the number of occurrences within a
document, and with the rarity of the term in the collection.
$$w_{t,d} = (1 + \log \mathrm{tf}_{t,d}) \times \log(N / \mathrm{df}_t)$$
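A minimal sketch of this weighting scheme, assuming base-10 logarithms (the slide does not state the base) and a toy three-document collection invented for illustration. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with
    weight = (1 + log10 tf) * log10(N / df)."""
    n_docs = len(docs)
    # df: number of documents in which each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (1 + math.log10(c)) * math.log10(n_docs / df[t])
                        for t, c in tf.items()})
    return weights

# Toy collection (assumption): each document is a list of preprocessed features.
docs = [["matrix", "machine", "machine", "film"],
        ["penguin", "film"],
        ["hugo", "novel", "film"]]
w = tf_idf(docs)
print(w[0])
```

In this toy collection "film" occurs in every document, so its idf (and weight) is zero, while "machine" is both rare and repeated and therefore gets the highest weight in the first document.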
![Page 75: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/75.jpg)
Example
Matrix 1999 American Australian neo noir science fiction
action film write direct Wachowskis star Keanu Reeves
Laurence Fishburne Carrie Anne Moss Hugo Weaving
Joe Pantoliano depict dystopian future reality
perceived human simulate reality call Matrix create
sentient machine subdue human population body heat
electrical activity use energy source Computer
programmer Neo learn truth draw rebellion against
machine involve people free dream world
green=high IDF
red=low IDF
![Page 76: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/76.jpg)
The Matrix representation
Matrix
1999
American
Australian
fiction
world
keywords
a portion of Pasquale’s
content-based profile
given a content-based profile, we can easily build a basic
recommender system through
Vector Space Model and
similarity measures
science
Hugo
![Page 77: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/77.jpg)
Vector Space Model (VSM)
given a set of n features (vocabulary)
$f = \{f_1, f_2, \ldots, f_n\}$
given a set of M items, each item I is
represented as a point in an n-dimensional vector space
$I = (w_{f_1}, \ldots, w_{f_n})$
$w_{f_i}$ is the weight of feature i in the item
![Page 78: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/78.jpg)
Similarity between vectors
cosine similarity
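$$\cos(I, J) = \frac{I \cdot J}{|I|\,|J|} = \frac{I}{|I|} \cdot \frac{J}{|J|} = \frac{\sum_{i=1}^{|V|} I_i J_i}{\sqrt{\sum_{i=1}^{|V|} I_i^2}\,\sqrt{\sum_{i=1}^{|V|} J_i^2}}$$
$I \cdot J$: dot product
$I/|I|$, $J/|J|$: unit vectors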
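Cosine similarity can be sketched over sparse vectors stored as `{feature: weight}` dictionaries. This representation is an assumption chosen for brevity; features missing from a dictionary are treated as having weight zero.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two sparse vectors given as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention used here for empty vectors
    return dot / (norm_u * norm_v)

item_i = {"science": 1.0, "fiction": 1.0}
item_j = {"science": 1.0}
print(cosine(item_i, item_j))
```

Vectors with no feature in common score 0, and a vector compared with itself scores 1.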
![Page 79: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/79.jpg)
Basic Content-based Recommendations
o documents represented as vectors
o features identified through NLP operations
o features weighted using tf-idf
o cosine measure for computing similarity
between vectors
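Putting the four ingredients together gives a very small keyword-based recommender. The sketch below is illustrative only: the item descriptions and the profile are invented, logarithms are assumed to be base 10, and the function names are hypothetical.

```python
import math
from collections import Counter

def idf_table(docs):
    """idf_t = log10(N / df_t), computed over the item collection."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log10(n_docs / count) for term, count in df.items()}

def vectorize(tokens, idf):
    """Sparse tf-idf vector; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {t: (1 + math.log10(c)) * idf[t] for t, c in tf.items() if t in idf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def recommend(profile_tokens, items):
    """Rank item titles by cosine similarity to the user profile."""
    idf = idf_table(list(items.values()))
    profile = vectorize(profile_tokens, idf)
    scored = [(cosine(profile, vectorize(tokens, idf)), title)
              for title, tokens in items.items()]
    return [title for _, title in sorted(scored, reverse=True)]

# Invented toy catalogue and profile (assumptions).
items = {
    "Blade Runner": ["science", "fiction", "future", "machine", "human"],
    "Notre Dame de Paris": ["novel", "paris", "hugo", "cathedral"],
    "A nature documentary": ["documentary", "penguin", "science", "world"],
}
profile = ["matrix", "science", "fiction", "machine", "human",
           "future", "hugo", "world"]
ranking = recommend(profile, items)
print(ranking)
```

Every item receives a non-zero score here, including the two that share only isolated keywords with the profile.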
![Page 80: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/80.jpg)
Drawbacks
a portion of Pasquale’s
content-based profile
Recommendation:
Notre Dame de Paris,
by Victor Hugo
Basic Content-based Recommendations
Why?
Entities such as «Hugo
Weaving» were not
modeled
Matrix
1999
American
Australian
fiction
world
science
Hugo
![Page 81: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/81.jpg)
Drawbacks
Basic Content-based Recommendations
Why?
More complex concepts
such as «science fiction» were
not modeled as single
features
Recommendation:
March of the Penguins
Matrix
1999
American
Australian
fiction
world
science
Hugo
a portion of Pasquale’s
content-based profile
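Both drawbacks come from matching isolated keywords. The sketch below uses invented feature lists for the two recommended items (assumptions) and compares the overlap with the profile before and after multi-word expressions are kept as single features.

```python
def overlap(profile, item):
    """Features shared by a profile and an item description."""
    return sorted(set(profile) & set(item))

# Keyword-level features: multi-word expressions are split into single tokens.
profile_keywords = ["matrix", "1999", "american", "australian",
                    "science", "fiction", "world", "hugo"]
notre_dame = ["notre", "dame", "paris", "victor", "hugo", "novel"]
penguins = ["march", "penguins", "science", "documentary", "world"]

print(overlap(profile_keywords, notre_dame))   # spurious match on "hugo"
print(overlap(profile_keywords, penguins))     # spurious match on "science"

# Same content with entities and complex concepts kept as single features.
profile_concepts = ["matrix", "1999", "american", "australian",
                    "science fiction", "world", "hugo weaving"]
notre_dame_concepts = ["notre dame de paris", "victor hugo", "novel"]

print(overlap(profile_concepts, notre_dame_concepts))
```

Once «Hugo Weaving» and «science fiction» are single features, the accidental match with Victor Hugo disappears.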
![Page 82: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/82.jpg)
Vision
Basic Content-based Recommendations
Bad recommendations
![Page 84: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/84.jpg)
Recap #3
Natural Language Processing
techniques necessary to build a
content-based profile
basic content-based
algorithms can be easily built
through TF-IDF
keyword-based representation is
too poor and can lead to bad
modeling of preferences (and
bad recommendations)
we need to shift from keywords to concepts
basics of NLP and keyword-based representation
![Page 85: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/85.jpg)
How?
Semantics-aware Content Representation
![Page 86: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/86.jpg)
Semantic representations
![Page 87: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/87.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
![Page 88: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/88.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
top-down
approaches based on the
integration of external
knowledge for
representing content. Able to provide the linguistic,
cultural and background
knowledge in the
content representation
![Page 89: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/89.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
top-down
approaches based on the
integration of external
knowledge for
representing content. Able to provide the linguistic,
cultural and background
knowledge in the
content representation
bottom-up
approaches that determine
the meaning of a word
by analyzing the rules of its usage in the context of
ordinary and concrete
language behavior
![Page 90: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/90.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
![Page 91: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/91.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Word Sense
Disambiguation
Entity
Linking …….
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the item to
a knowledge graph
![Page 92: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/92.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Ontologies Linked
Open Data
Introduce semantics
by linking
the Item to
a knowledge graph
…….
Introduce semantics by
mapping the features
describing the item with
semantic concepts
![Page 93: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/93.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
![Page 94: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/94.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Explicit
Semantic
Analysis
Random
Indexing
…… Word2Vec
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
![Page 95: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/95.jpg)
How? Encoding exogenous semantics
(top-down approaches)
![Page 96: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/96.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Word Sense
Disambiguation
Entity
Linking …….
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking the
Item to a
knowledge graph
![Page 97: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/97.jpg)
Word Sense Disambiguation (WSD)
using linguistic ontologies
WSD selects the proper meaning, i.e. sense, for a word in
a text by taking into account the context in which it occurs
![Page 99: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/99.jpg)
Sense Repository
WordNet groups words into sets of synonyms called synsets
It contains nouns, verbs, adjectives, adverbs
Word Meanings
Word Forms
F1 F2 F3 … … Fn
M1 V(1,1) V(2,1)
M2 V(2,2) V(3,2)
M3
M…
Mm V(m,n)
Synonym
word forms
(synset)
polysemous word:
disambiguation needed
WordNet linguistic ontology [*]
[*] Miller, George A. "WordNet: a lexical database for
English." Communications of the ACM 38.11 (1995): 39-41.
![Page 100: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/100.jpg)
an example of synset
Sense Repository
WordNet linguistic ontology
![Page 101: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/101.jpg)
Sense Repository
WordNet Hierarchies
WordNet linguistic ontology
![Page 102: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/102.jpg)
Word Sense Disambiguation
State of the art: JIGSAW algorithm [*]
Input
o $D = \{w_1, w_2, \ldots, w_h\}$ document
Output
o $X = \{s_1, s_2, \ldots, s_k\}$ $(k \le h)$
Each $s_i$ is obtained by disambiguating $w_i$ based on the context
of each word
Some words not recognized by WordNet
Groups of words recognized as a single concept
[*] Basile, P., de Gemmis, M., Gentile, A. L., Lops, P., & Semeraro, G. (2007, June). UNIBA:
JIGSAW algorithm for word sense disambiguation. In Proceedings of the 4th International
Workshop on Semantic Evaluations (pp. 398-401). Association for Computational Linguistics.
![Page 103: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/103.jpg)
How to use WordNet for WSD?
Semantic similarity between synsets inversely proportional to their distance in the WordNet IS-A hierarchy
Path length similarity between synsets used to assign scores to synsets of a polysemous word in order to choose the correct sense
Placental mammal
Carnivore Rodent
Feline, felid
Cat (feline mammal)
Mouse (rodent)
1
2
3 4
5
JIGSAW WSD algorithm
![Page 104: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/104.jpg)
SINSIM(cat,mouse) =
-log(5/32)=0.806
Placental mammal
Carnivore Rodent
Feline, felid
Cat (feline mammal)
Mouse (rodent)
1
2
3 4
5
Synset semantic similarity
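The value on the slide can be reproduced with a small path-length computation on the toy hierarchy. Two assumptions are made here: the logarithm is base 10, and the denominator 32 is read as twice a maximum hierarchy depth of 16 (the slide shows only the number 32). The function names are hypothetical.

```python
import math

# Toy IS-A hierarchy from the slide, stored as child -> parent.
PARENT = {
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "placental mammal",
    "mouse": "rodent",
    "rodent": "placental mammal",
}

def ancestors(node):
    """Chain [node, parent, grandparent, ...] up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def path_length(a, b):
    """Number of IS-A edges on the path between a and b via their
    lowest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    for steps_a, node in enumerate(up_a):
        if node in up_b:
            return steps_a + up_b.index(node)
    raise ValueError("no common ancestor")

def sinsim(a, b, max_depth=16):
    """-log10(path length / (2 * max_depth)); assumes a != b."""
    return -math.log10(path_length(a, b) / (2 * max_depth))

print(path_length("cat", "mouse"), round(sinsim("cat", "mouse"), 3))
```

Shorter paths give larger scores, which is the inverse relation between distance and similarity stated on the previous slide.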
![Page 105: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/105.jpg)
w
C
JIGSAW WSD algorithm
![Page 106: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/106.jpg)
w
C
0.107
0.0
0.0
0.806 0.806 0.806
JIGSAW WSD algorithm
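The selection step can be sketched as follows: each candidate sense of the target word w is scored against the senses in the context C, and the best-scoring candidate is kept. This is a deliberate simplification, not the JIGSAW algorithm as published; the sense labels and the similarity table are toy assumptions (only the 0.806 value for cat and mouse comes from the slides).

```python
def disambiguate(candidate_senses, context_senses, similarity):
    """Return the candidate sense with the highest total similarity
    to the senses found in the context."""
    def score(sense):
        return sum(similarity(sense, c) for c in context_senses)
    return max(candidate_senses, key=score)

# Toy similarity table (assumption); unlisted pairs score 0.0.
TOY_SIM = {frozenset(("mouse (rodent)", "cat (feline mammal)")): 0.806}

def toy_similarity(a, b):
    return TOY_SIM.get(frozenset((a, b)), 0.0)

best = disambiguate(
    ["mouse (computer device)", "mouse (rodent)"],
    ["cat (feline mammal)"],
    toy_similarity,
)
print(best)
```

Any synset similarity function, such as a path-length measure over the WordNet hierarchy, could be passed in place of `toy_similarity`.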
![Page 107: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/107.jpg)
through WSD we can obtain a semantics-aware representation
of textual content
![Page 108: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/108.jpg)
Synset-based representation
{09596828} American -- (a native or inhabitant of the United States)
{06281561} fiction -- (a literary work based on the imagination and not necessarily on fact)
{06525881} movie, film, picture, moving picture, moving-picture show, motion picture,
motion-picture show, picture show, pic, flick -- (a form of entertainment that enacts a story…
{02605965} star -- (feature as the star; "The movie stars Dustin
Hoffman as an autistic man")
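Once each word has been disambiguated, keywords can be replaced by synset identifiers. The sketch below uses a fixed lookup table as a stand-in for the WSD step, which is a strong simplification (a real system picks the sense from the context); the identifiers are the ones shown on the slide.

```python
# Word form -> synset id. A fixed table stands in for the output of WSD
# (assumption); the ids are those listed on the slide.
SENSE_OF = {
    "movie": "06525881",
    "film": "06525881",
    "flick": "06525881",
    "fiction": "06281561",
    "american": "09596828",
}

def to_synsets(keywords):
    """Replace each keyword by its synset id; unknown words are kept as-is."""
    return [SENSE_OF.get(k.lower(), k) for k in keywords]

print(to_synsets(["film", "fiction"]))
print(to_synsets(["movie", "fiction"]))
```

Synonyms such as "film" and "movie" collapse onto the same feature, so two descriptions that use different words for the same concept become comparable.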
![Page 109: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/109.jpg)
synsets
{09596828} American -- (a native or
inhabitant of the United States)
{06281561} fiction -- (a literary
work based on the imagination
and not necessarily on fact)
{06525881} movie, film, picture,
moving picture, moving-picture
show, motion picture,
motion-picture show, picture show,
pic, flick -- (a form of entertainment
that enacts a story…
{02605965} star -- (feature as the
star; "The movie stars Dustin
Hoffman as an autistic man")
The Matrix representation
through WSD we process the textual
description of the item and we obtain a semantics-aware
representation of the item as
output
keyword-based features replaced
with the concepts (in this
case WordNet synsets) they refer to
Matrix
1999
American
Australian
fiction
world
keywords
science
Hugo
![Page 110: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/110.jpg)
synsets
{09596828} American -- (a native or
inhabitant of the United States)
{06281561} fiction -- (a literary
work based on the imagination
and not necessarily on fact)
{06525881} movie, film, picture,
moving picture, moving-picture
show, motion picture,
motion-picture show, picture show,
pic, flick -- (a form of entertainment
that enacts a story…
{02605965} star -- (feature as the
star; "The movie stars Dustin
Hoffman as an autistic man")
The Matrix representation
Word Sense Disambiguation recap
polysemy and synonymy
effectively handled
classical NLP techniques helpful to
remove further noise (e.g.
stopwords)
potentially language-independent
(later)
entities (e.g. Hugo Weaving)
still not recognized
Matrix
1999
American
Australian
fiction
world
keywords
science
Hugo
![Page 111: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/111.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Word Sense
Disambiguation
Entity
Linking
Introduce semantics
by linking the item
itself to a
knowledge graph
…….
Introduce semantics by
mapping the features
describing the item with
semantic concepts
![Page 112: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/112.jpg)
• Basic Idea
• Input: free text
• e.g. Wikipedia
abstract
• Output:
identification of
the entities
mentioned in the
text.
Entity Linking Algorithms
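A toy version of this input/output contract is a dictionary lookup that prefers the longest matching mention. The gazetteer below is a three-entry illustration, and real entity linkers also disambiguate between candidate entities, which this sketch does not attempt.

```python
# Mention (as a tuple of lowercase tokens) -> Wikipedia page.
# A three-entry gazetteer used only for illustration.
GAZETTEER = {
    ("hugo", "weaving"): "http://en.wikipedia.org/wiki/Hugo_Weaving",
    ("keanu", "reeves"): "http://en.wikipedia.org/wiki/Keanu_Reeves",
    ("wachowskis",): "http://en.wikipedia.org/wiki/The_Wachowskis",
}
MAX_LEN = max(len(mention) for mention in GAZETTEER)

def link_entities(text):
    """Scan the text left to right, linking the longest known mention
    starting at each position."""
    tokens = text.lower().split()
    links, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            mention = tuple(tokens[i:i + n])
            if mention in GAZETTEER:
                links.append(GAZETTEER[mention])
                i += n
                break
        else:
            i += 1  # no mention starts here, move on
    return links

print(link_entities(
    "directed by the Wachowskis starring Keanu Reeves and Hugo Weaving"))
print(link_entities("a novel by Victor Hugo"))
```

Because "hugo" alone is not a mention, a text about Victor Hugo is not linked to Hugo Weaving.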
![Page 113: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/113.jpg)
Why Entity Linking?
because we need to identify the entities mentioned in the textual description
to better capture user preferences and information needs.
… and many more
Several state-of-the-art implementations are already available
![Page 114: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/114.jpg)
Entity Linking Algorithms
OpenCalais
![Page 115: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/115.jpg)
synsets
{09596828} American -- (a native or
inhabitant of the United States)
{06281561} fiction -- (a literary
work based on the imagination
and not necessarily on fact)
{06525881} movie, film, picture,
moving picture, moving-picture
show, motion picture,
motion-picture show, picture show,
pic, flick -- (a form of entertainment
that enacts a story…
{02605965} star -- (feature as the
star; "The movie stars Dustin
Hoffman as an autistic man")
The Matrix representation
Matrix
1999
American
Australian
neo
science
fiction
world
keywords entities
entities are correctly
recognized and modeled
partially multilingual
(entities are inherently multilingual,
but other concepts aren’t)
common sense and abstract
concepts now ignored.
![Page 117: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/117.jpg)
very transparent and human-readable content representation
non-trivial NLP tasks automatically performed
(stopword removal, n-gram identification, named entity recognition and
disambiguation)
Entity Linking Algorithms
Tag.me
Output
![Page 118: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/118.jpg)
each entity identified in the content can be a feature of a
semantics-aware content representation
based on entity linking
Entity Linking Algorithms
Tag.me
Output
![Page 119: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/119.jpg)
Advantage #1: several common sense
concepts are now identified
Entity Linking Algorithms
Tag.me
Output
![Page 120: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/120.jpg)
Output
Advantage #2: each entity is a reference
to a Wikipedia page
http://en.wikipedia.org/wiki/The_Wachowskis
not a simple textual feature!
Entity Linking Algorithms
Tag.me
![Page 121: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/121.jpg)
We can enrich this entity-based representation
by exploiting the Wikipedia categories’ tree
Entity Linking Algorithms
Tag.me + Wikipedia Categories
![Page 122: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/122.jpg)
final representation of items obtained by
merging entities identified in the text with
the (most relevant)
Wikipedia
categories each
entity is linked to
entities + Wikipedia categories = features
Entity Linking Algorithms
Tag.me + Wikipedia Categories
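The merge step can be sketched as a simple list extension. The entity-to-category table is a hypothetical stand-in for a lookup in the Wikipedia category tree, and "most relevant" is approximated by assuming each category list is already sorted by relevance.

```python
# Hypothetical entity -> categories table, assumed sorted by relevance.
CATEGORIES = {
    "The_Wachowskis": ["Science fiction film directors",
                       "American film directors"],
    "Keanu_Reeves": ["Canadian male film actors"],
}

def enrich(entities, categories, max_per_entity=1):
    """Features = linked entities + top categories of each entity."""
    features = list(entities)
    for entity in entities:
        for category in categories.get(entity, [])[:max_per_entity]:
            if category not in features:
                features.append(category)
    return features

features = enrich(["The_Wachowskis", "Keanu_Reeves", "The_Matrix"], CATEGORIES)
print(features)
```

Entities without a category entry are simply kept as they are.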
![Page 123: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/123.jpg)
synsets
{09596828} American -- (a native or
inhabitant of the United States)
{06281561} fiction -- (a literary
work based on the imagination
and not necessarily on fact)
{06525881} movie, film, picture,
moving picture, moving-picture
show, motion picture,
motion-picture show, picture show,
pic, flick -- (a form of entertainment
that enacts a story…
{02605965} star -- (feature as the
star; "The movie stars Dustin
Hoffman as an autistic man")
The Matrix representation
Matrix
1999
American
Australian
neo
science
fiction
world
keywords Wikipedia pages
entities recognized and
modeled (as in OpenCalais)
Wikipedia-based representation:
some common sense terms
included, and new interesting
features (e.g. «science-fiction film
director») can be generated
terms without a Wikipedia
mapping are ignored
![Page 125: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/125.jpg)
traditional
resources
collaborative
resources
o manually curated by experts
o available for a few languages
o difficult to maintain and update
o collaboratively built by the crowd
o highly multilingual
o up-to-date
Entity Linking Algorithms
Babelfy
![Page 126: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/126.jpg)
![Page 127: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/127.jpg)
Entity Linking Algorithms
Babelfy
we have both Named Entities and Concepts!
![Page 130: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/130.jpg)
synsets
{09596828} American -- (a native or
inhabitant of the United States)
{06281561} fiction -- (a literary
work based on the imagination
and not necessarily on fact)
{06525881} movie, film, picture,
moving picture, moving-picture
show, motion picture,
motion-picture show, picture show,
pic, flick -- (a form of entertainment
that enacts a story…
{02605965} star -- (feature as the
star; "The movie stars Dustin
Hoffman as an autistic man")
The Matrix representation
Matrix
1999
American
Australian
neo
science
fiction
world
keywords Babel synsets
entities recognized and
modeled (as in OpenCalais
and Tag.me)
Wikipedia-based representation:
some common sense terms
included, and new interesting
features (e.g. «science-fiction film
director») can be generated
includes linguistic knowledge
and is able to disambiguate terms
also multilingual!
![Page 132: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/132.jpg)
Recap #4
o «Exogenous» techniques use
external knowledge sources to inject
semantics
o Word Sense Disambiguation
algorithms process the textual
description and replace keywords with
semantic concepts (as synsets)
o Entity Linking algorithms focus on
the identification of the entities. Some
recent approaches are also able to identify
common sense terms
o Combination of both
approaches is potentially the
best strategy
encoding exogenous semantics
by processing textual descriptions
![Page 133: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/133.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Ontologies Linked
Open Data
Introduce semantics
by linking the
Item to a
knowledge graph
…….
Introduce semantics by
mapping the features
describing the item with
semantic concepts
![Page 134: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/134.jpg)
Ontologies
o used to describe domain-specific
knowledge
o hierarchies of
concepts with
attributes and relations
o “An ontology is a formal,
explicit specification of
a shared conceptualization”
Guarino, Nicola. "Understanding, building and using ontologies." International Journal of Human-Computer
Studies 46.2 (1997): 293-310.
![Page 135: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/135.jpg)
Exogenous Semantics through Ontologies
why do we need an ontology?
to share common understanding of the structure of information
o among people
o among software agents
to enable reuse of domain knowledge
o to avoid “re-inventing the wheel”
o to introduce standards to allow interoperability
![Page 136: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/136.jpg)
…let’s have an example!
![Page 137: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/137.jpg)
Exogenous Semantics through Ontologies
A Movie Ontology
![Page 138: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/138.jpg)
Exogenous Semantics through Ontologies
A Movie Ontology
(a small portion, actually)
we formally encode relevant aspects and the relationships among them
![Page 139: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/139.jpg)
Exogenous Semantics through Ontologies
A Movie Ontology
(a small portion, actually)
every item is formally modeled following this structure
and encoded through a Semantic Web language (e.g. OWL, RDF)
![Page 140: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/140.jpg)
![Page 141: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/141.jpg)
Exogenous Semantics through Ontologies
A Movie Ontology
(a small portion, actually)
why is it useful?
![Page 142: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/142.jpg)
Exogenous Semantics through Ontologies
A Movie Ontology
why is it useful? each feature has a non-ambiguous «meaning»
![Page 143: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/143.jpg)
Exogenous Semantics through Ontologies
A Movie Ontology
why is it useful?
we don’t need to process unstructured content
![Page 144: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/144.jpg)
Exogenous Semantics through Ontologies
A Movie Ontology
(a small portion, actually)
why is it useful? we can perform some «reasoning» on user preferences. How?
![Page 145: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/145.jpg)
Exogenous Semantics through Ontologies
The Movie Ontology
We can reason on the preferences and infer
that a user interested in The Matrix
(SciFi_and_Fantasy genre) is interested in
Imaginational_Entertainment and potentially
in Logical_Thrilling
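The inference described above can be read as a walk over a class hierarchy. The following is a minimal Python sketch, assuming a hypothetical fragment of the hierarchy in which SciFi_and_Fantasy and Logical_Thrilling are both subclasses of Imaginational_Entertainment; the real Movie Ontology may organise these classes differently, and the helper names are illustrative only.

```python
# Hypothetical fragment of a genre hierarchy (child -> parent).
# The real Movie Ontology may organise these classes differently.
PARENT = {
    "SciFi_and_Fantasy": "Imaginational_Entertainment",
    "Logical_Thrilling": "Imaginational_Entertainment",
    "Imaginational_Entertainment": "Genre",
}

def ancestors(cls):
    """Return all superclasses of cls, nearest first."""
    result = []
    while cls in PARENT:
        cls = PARENT[cls]
        result.append(cls)
    return result

def siblings(cls):
    """Return the other classes that share the direct parent of cls."""
    parent = PARENT.get(cls)
    return sorted(c for c, p in PARENT.items() if p == parent and c != cls)

liked_genre = "SciFi_and_Fantasy"        # genre of The Matrix in the slide
inferred = ancestors(liked_genre)[:1]    # direct superclass: inferred interest
potential = siblings(liked_genre)        # sibling classes: potential interest

print(inferred)
print(potential)
```

Under these assumptions the direct superclass plays the role of the inferred interest, while sibling classes are only candidates, which mirrors the "interested in" versus "potentially in" distinction on the slide.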
![Page 146: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/146.jpg)
The Matrix representation
from a flat representation
toward a graph-based
representation
![Page 147: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/147.jpg)
The Matrix representation
from a flat representation
toward a graph-based
representation
semantics explicitly encoded
explicit relations between
concepts exist: reasoning to infer
interesting information
ontologies typically
domain-dependent
huge effort to build and
maintain an ontology
even greater effort to
populate an ontology!
![Page 148: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/148.jpg)
Vision
![Page 149: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/149.jpg)
Semantic representations
Explicit (Exogenous) Semantics
o Introduce semantics by mapping the features describing the item with semantic concepts
o Introduce semantics by linking the item to a knowledge graph (Ontologies, Linked Open Data)
Implicit (Endogenous) Semantics
o …
![Page 150: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/150.jpg)
Linked Open Data
the giant global graph
![Page 151: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/151.jpg)
Linked Open Data (cloud)
what is it?
(large) set of interconnected
semantic datasets
![Page 152: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/152.jpg)
Linked Open Data (cloud)
statistics
149 billion triples, 3,842 datasets (http://stats.lod2.eu)
![Page 153: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/153.jpg)
Linked Open Data (cloud)
DBpedia
core of the LOD cloud
RDF mapping of Wikipedia
![Page 154: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/154.jpg)
Linked Open Data
cornerstones
methodology to publish, share and link
structured data on the Web
1. use of RDF
o every resource/entity/relation identified by a (unique) URI
o URI: http://dbpedia.org/resource/Halifax
2. re-use of existing properties to express an
agreed semantics and connect data sources
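A minimal sketch of the two cornerstones, written with plain Python tuples rather than an RDF library. The resource URIs follow the pattern shown on the slides and the property reuses the Dublin Core terms namespace; the two triples themselves are illustrative assumptions, not statements copied from DBpedia.

```python
# Each statement is a (subject, predicate, object) triple; every element
# that is not a literal is identified by a URI.
DBR = "http://dbpedia.org/resource/"
DCT = "http://purl.org/dc/terms/"   # existing vocabulary, re-used as is

triples = [
    # Illustrative triples; not copied from a DBpedia dump.
    (DBR + "The_Matrix", DCT + "subject", DBR + "1999_films"),
    (DBR + "The_Matrix", DCT + "subject", DBR + "Films_about_Rebellions"),
]

def objects(subject, predicate):
    """Return every object linked to subject through predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

for uri in objects(DBR + "The_Matrix", DCT + "subject"):
    print(uri)
```

Because the predicate comes from a shared vocabulary, any other dataset that uses the same property can be queried with the same function.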
![Page 155: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/155.jpg)
Linked Open Data (cloud)
![Page 156: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/156.jpg)
Linked Open Data (cloud)
representation
[Figure: The Matrix node with properties dbpedia-owl:director, dbo:runtime and dcterms:subject (twice)]
interesting non-trivial features come into play
![Page 157: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/157.jpg)
The Matrix representation
from a flat representation
toward a (richer) graph-based
representation
![Page 158: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/158.jpg)
The Matrix representation
we have the advantage of formal semantics defined in RDF, with
interesting features coming from Wikipedia,
without the need to build and manually populate an ontology
![Page 159: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/159.jpg)
another advantage
graph-based data models can be exploited to define more semantic
features based on graph topology
![Page 160: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/160.jpg)
Graph-based Data Model (bipartite graph)
users = nodes
items = nodes
preferences = edges
Very intuitive representation!
[Figure: bipartite graph linking users u1 to u4 with items i1 to i4]
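The bipartite model can be sketched in a few lines of Python. The preference pairs below are hypothetical (the slide does not list which user liked which item), and `co_raters` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical preferences: which user liked which item.
preferences = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"),
               ("u3", "i3"), ("u4", "i3"), ("u4", "i4")]

# Undirected bipartite graph stored as an adjacency map:
# users and items are nodes, each preference is an edge.
graph = defaultdict(set)
for user, item in preferences:
    graph[user].add(item)
    graph[item].add(user)

def co_raters(user):
    """Users who share at least one liked item with the given user."""
    return sorted({other for item in graph[user]
                   for other in graph[item] if other != user})

print(co_raters("u1"))
```

Two-step walks over this structure (user, item, user) already give a simple neighbourhood signal without any content features.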
![Page 161: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/161.jpg)
Semantic Graph-based Data Model
[Figure: user-item graph (users u1 to u4; items i1, i3, i4)]
![Page 162: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/162.jpg)
Semantic Graph-based Data Model
[Figure: user-item graph (users u1 to u4; items i1, i3, i4); items are linked to DBpedia through a mapping]
![Page 163: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/163.jpg)
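Semantic Graph-based Data Model (1-hop)
[Figure: user-item graph (users u1 to u4; items i1, i3, i4) extended with DBpedia nodes one hop away from the items: Films about Rebellions, Quentin Tarantino, 1999 films (http://dbpedia.org/resource/1999_films). Edge label shown: dcterms:subject]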
![Page 164: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/164.jpg)
Semantic Graph-based Data Model (2-hop)
[Figure: the 1-hop graph (nodes Films about Rebellions, Quentin Tarantino, 1999 films at http://dbpedia.org/resource/1999_films) extended by one more hop with American film directors and Lynne Thigpen (http://dbpedia.org/resource/Lynne_Thigpen). Edge labels shown: dcterms:subject, dbo:award]
![Page 165: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/165.jpg)
Semantic Graph-based Data Model (n-hop)
[Figure: the same graph expanded up to n hops; nodes shown: Films about Rebellions, Quentin Tarantino, 1999 films (http://dbpedia.org/resource/1999_films), American film directors, Lynne Thigpen (http://dbpedia.org/resource/Lynne_Thigpen). Edge labels shown: dcterms:subject, dbo:award]
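The 1-hop, 2-hop and n-hop expansions amount to a bounded breadth-first visit of the knowledge graph. A minimal Python sketch follows; node labels are taken from the slides, but which node links to which is an assumption made for illustration.

```python
from collections import deque

# Hypothetical directed links from items to DBpedia entities.
# Labels follow the slides; the links themselves are illustrative.
edges = {
    "i1": ["1999 films", "Films about Rebellions"],
    "i3": ["Quentin Tarantino"],
    "Quentin Tarantino": ["American film directors"],
}

def n_hop(start, n):
    """Return {node: distance} for nodes reachable from start within n hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == n:          # do not expand beyond the hop limit
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    del seen[start]                  # the start node is not its own feature
    return seen

print(n_hop("i3", 1))
print(n_hop("i3", 2))
```

Raising `n` adds more distant entities as candidate features, at the cost of a larger and noisier graph.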
![Page 166: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/166.jpg)
PageRank
Spreading activation
Average Neighbors
Degree Centrality
…
Semantic Graph-based Data Model
(Feature Generation)
new semantic features describing the
item can be inferred by mining the
structure of the tripartite graph
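As an illustration of topological feature generation, the sketch below computes degree centrality and a power-iteration PageRank for the items of a small graph. The edges are hypothetical, and the plain-Python PageRank is a simplified stand-in that assumes every node has at least one neighbour; it is not the specific algorithm used by the authors.

```python
# Undirected toy graph with users, items and one DBpedia entity
# (hypothetical links).
edges = [("u1", "i1"), ("u2", "i1"), ("u2", "i3"), ("u3", "i3"),
         ("i1", "1999 films"), ("i3", "1999 films")]

neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def degree_centrality(node):
    """Fraction of the other nodes that node is directly connected to."""
    return len(neighbors[node]) / (len(neighbors) - 1)

def pagerank(damping=0.85, iterations=50):
    """Power-iteration PageRank; assumes no node is isolated."""
    n = len(neighbors)
    rank = {node: 1.0 / n for node in neighbors}
    for _ in range(iterations):
        rank = {
            node: (1 - damping) / n
            + damping * sum(rank[m] / len(neighbors[m]) for m in neighbors[node])
            for node in neighbors
        }
    return rank

scores = pagerank()
# One (degree centrality, PageRank) feature pair per item.
features = {item: (degree_centrality(item), scores[item]) for item in ("i1", "i3")}
print(features)
```

The resulting pairs can be appended to the item representation as extra numeric features.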
![Page 167: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/167.jpg)
Recap #5
o Ontologies enrich the representation
by introducing formal semantics, but
they are very costly to build
and maintain
o Linked Open Data merges the
advantages of ontologies with the
simplicity of a collaborative knowledge
source such as Wikipedia
o Such approaches build a
graph-based representation
that triggers the generation of
semantic topological features
o Inherently multilingual!
encoding exogenous semantics through Knowledge Graphs
![Page 168: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/168.jpg)
How? Encoding endogenous semantics
(bottom-up approaches)
![Page 169: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/169.jpg)
Insight
Huge availability of textual content
![Page 170: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/170.jpg)
Insight
We can use this huge amount of content to
directly learn a representation of words
![Page 171: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/171.jpg)
Insight
What is «Peroni» ?
Pass me a Peroni!
I like Peroni
Football and Peroni, what a perfect Saturday!
![Page 172: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/172.jpg)
Insight
What is «Budweiser» ?
Pass me a Budweiser!
I like Budweiser
Football and Budweiser, what a perfect Saturday!
![Page 173: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/173.jpg)
![Page 174: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/174.jpg)
![Page 175: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/175.jpg)
Insight
Distributional Hypothesis
«Terms used in similar contexts
share a similar meaning»
![Page 176: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/176.jpg)
Insight
The semantics learnt according to
terms usage is called «distributional»
![Page 177: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/177.jpg)
Distributional Semantics
L.Wittgenstein
(Austrian philosopher)
![Page 178: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/178.jpg)
Distributional Semantics
by analyzing large corpora of
textual data it is possible to
infer information about the
usage (about the meaning) of
the terms
[Figure labelled "Definition": terms linked by co-occurrence]
(*) Firth, J.R. A synopsis of linguistic theory
1930-1955. In Studies in Linguistic Analysis,
pp. 1-32, 1957.
![Page 179: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/179.jpg)
Distributional Semantics
by analyzing large corpora of
textual data it is possible to
infer information about the
usage (about the meaning) of
the terms
[Figure labelled "Definition": terms linked by co-occurrence]
Beer and wine share a similar meaning since
they are often used in similar contexts
(*) Firth, J.R. A synopsis of linguistic theory
1930-1955. In Studies in Linguistic Analysis,
pp. 1-32, 1957.
![Page 180: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/180.jpg)
Distributional Semantics
Term-Contexts Matrix
A vector-space representation is learnt
by encoding in which context each term is used
(This representation is called WordSpace)
![Page 181: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/181.jpg)
Distributional Semantics
Term-Contexts Matrix
A vector-space representation is learnt
by encoding in which context each term is used
Each row of the matrix is a vector!
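A minimal Python sketch of a term-context matrix, built from the Peroni and Budweiser sentences used earlier in the tutorial. Here the context of a term is assumed to be the set of other words in the same sentence, the dog sentence is a hypothetical addition for contrast, and rows are stored as sparse counters rather than a dense matrix.

```python
import math
import re
from collections import Counter, defaultdict

sentences = [
    "Pass me a Peroni!",
    "I like Peroni",
    "Football and Peroni, what a perfect Saturday!",
    "Pass me a Budweiser!",
    "I like Budweiser",
    "Football and Budweiser, what a perfect Saturday!",
    "My dog barks at the postman",   # hypothetical extra sentence
]

# Row = term; columns = words co-occurring with it in the same sentence.
rows = defaultdict(Counter)
for sentence in sentences:
    tokens = re.findall(r"[a-z]+", sentence.lower())
    for i, term in enumerate(tokens):
        for j, context in enumerate(tokens):
            if i != j:
                rows[term][context] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

print(cosine(rows["peroni"], rows["budweiser"]))  # identical contexts
print(cosine(rows["peroni"], rows["dog"]))        # no shared context
```

The two beer brands never appear together, yet their rows coincide because they are used in the same contexts, which is the distributional hypothesis in miniature.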
![Page 182: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/182.jpg)
Distributional Semantics
Term-Contexts Matrix
beer vs wine: good overlap
Similar!
![Page 183: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/183.jpg)
Distributional Semantics
Term-Contexts Matrix
beer vs wine: no overlap
Not Similar!
![Page 184: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/184.jpg)
WordSpace
beer wine
mojito
dog
A vector space representation (called WordSpace)
is learnt according to terms usage in contexts
![Page 185: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/185.jpg)
WordSpace
beer wine
mojito
dog
A vector space representation (called WordSpace)
is learnt according to terms usage in contexts
Terms sharing a
similar usage
are very close
in the space
![Page 186: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/186.jpg)
Distributional Semantics
Term-Contexts Matrix
Key question: what is the context?
![Page 187: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/187.jpg)
Distributional Semantics
Term-Contexts Matrix
Key question: what is the context?
These approaches are very flexible since the «context» can
be set according to the granularity required by the
representation
![Page 188: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/188.jpg)
Distributional Semantics
Term-Contexts Matrix
Key question: what is the context?
Coarse-grained granularity:
context=whole document
![Page 189: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/189.jpg)
Distributional Semantics
Term-Contexts Matrix = Term-Document Matrix
Key question: what is the context?
(This is Vector Space Model!)
Vector Space Model is a Distributional Model
![Page 190: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/190.jpg)
Distributional Semantics
Term-Contexts Matrix
Key question: what is the context?
Fine-grained granularities:
context=paragraph, sentence, window of words
![Page 191: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/191.jpg)
Distributional Semantics
Term-Contexts Matrix
Fine-grained granularities:
PROs: the more fine-grained the representation, the more precise the vectors
CONs: the more fine-grained the representation, the bigger the matrix
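The trade-off can be made concrete by counting how many contexts (matrix columns) each granularity produces. A small Python sketch on a hypothetical two-document corpus; the sentence splitter and the window size of 2 are simplifying assumptions.

```python
import re

# Hypothetical two-document corpus.
corpus = [
    "I like beer. Beer and football, what a perfect Saturday!",
    "I like wine. Wine goes well with cheese.",
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def contexts(granularity, window=2):
    """Return the list of contexts (token lists) at the given granularity."""
    if granularity == "document":
        return [tokenize(doc) for doc in corpus]
    sentences = [tokenize(s) for doc in corpus
                 for s in re.split(r"[.!?]", doc) if s.strip()]
    if granularity == "sentence":
        return sentences
    # "window": one context per token position, spanning +/- window tokens
    return [s[max(0, i - window): i + window + 1]
            for s in sentences for i in range(len(s))]

sizes = {g: len(contexts(g)) for g in ("document", "sentence", "window")}
print(sizes)  # {'document': 2, 'sentence': 4, 'window': 18}
```

Even on this tiny corpus the number of columns grows from 2 to 18 when moving from documents to word windows.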
![Page 192: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/192.jpg)
Distributional Semantics
Term-Contexts Matrix
The flexibility of distributional semantics models
also extends to the rows of the matrix
![Page 193: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/193.jpg)
Distributional Semantics
Term-Contexts Matrix
The flexibility of distributional semantics models
also extends to the rows of the matrix
Keywords can be replaced with concepts
(such as synsets or entities!)
![Page 194: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/194.jpg)
![Page 195: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/195.jpg)
Distributional Semantics
Term-Contexts Matrix
Keanu Reeves and Al Pacino
are «connected» because they
both acted in Drama Films
![Page 196: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/196.jpg)
Distributional Semantics
Representing Documents
Given a WordSpace, a vector space representation of
documents (called DocSpace) is typically built as the
centroid vector of word representations
![Page 197: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/197.jpg)
![Page 198: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/198.jpg)
DocSpace
Given a WordSpace, a vector space representation of
documents (called DocSpace) is typically built as the
centroid vector of word representations
[Figure: items The Matrix, Matrix Revolutions, Donnie Darko and Up! placed in the DocSpace; the semantic representation enables similarity calculations between items]
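A minimal sketch of how a DocSpace can be derived from a WordSpace by averaging word vectors. The 3-dimensional word vectors and the words assigned to each movie are hypothetical values chosen for illustration; real spaces have far more dimensions.

```python
import math

# Hypothetical 3-dimensional WordSpace.
word_space = {
    "hacker":   [0.9, 0.1, 0.0],
    "machines": [0.8, 0.2, 0.1],
    "reality":  [0.7, 0.3, 0.0],
    "balloon":  [0.0, 0.2, 0.9],
    "house":    [0.1, 0.3, 0.8],
}

def centroid(words):
    """Average the vectors of the known words (assumes at least one is known)."""
    vectors = [word_space[w] for w in words if w in word_space]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical descriptions: each movie is reduced to a few words.
doc_space = {
    "The Matrix": centroid(["hacker", "machines", "reality"]),
    "Matrix Revolutions": centroid(["machines", "reality"]),
    "Up!": centroid(["balloon", "house"]),
}

print(cosine(doc_space["The Matrix"], doc_space["Matrix Revolutions"]))
print(cosine(doc_space["The Matrix"], doc_space["Up!"]))
```

With these toy vectors The Matrix ends up much closer to Matrix Revolutions than to Up!, which is the behaviour the DocSpace figure illustrates.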
![Page 199: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/199.jpg)
Distributional Semantics
• We can exploit the (big) corpora of
data to directly learn a semantic
vector-space representation of
the terms of a language
• Lightweight semantics, not
formally defined
• High flexibility: everything is a
vector, so term/term, doc/term and
doc/doc similarities can all be computed
• Context can have different
granularities
• Huge amount of content is needed
• Matrices are particularly huge and
difficult to build
• Too many features: need for
dimensionality reduction
![Page 200: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/200.jpg)
Semantic representations
Explicit (Exogenous) Semantics
o Introduce semantics by mapping the features describing the item with semantic concepts
o Introduce semantics by linking the item to a knowledge graph
Implicit (Endogenous) Semantics
o Distributional semantic models: Explicit Semantic Analysis, Random Indexing, …, Word2Vec
Distributional Semantics models share the same insight but have important distinguishing aspects
![Page 201: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/201.jpg)
![Page 202: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/202.jpg)
Explicit Semantic Analysis (ESA)
ESA builds a vector-space semantic representation of natural language texts in a
high-dimensional space of comprehensible concepts derived from Wikipedia [Gabri06]
[Figure: examples of Wikipedia concepts: Panthera, World War II, Jane Fonda, Island]
[Gabri06] E. Gabrilovich and S. Markovitch. Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text
Categorization with Encyclopedic Knowledge. In Proceedings of the 21st National Conf. on Artificial Intelligence and the
18th Innovative Applications of Artificial Intelligence Conference, pages 1301–1306. AAAI Press, 2006.
![Page 203: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/203.jpg)
Explicit Semantic Analysis (ESA)
ESA matrix (rows: terms; columns: Wikipedia articles)

| ESA | Concept 1 | … | Concept n |
|---|---|---|---|
| term 1 | TF-IDF | TF-IDF | TF-IDF |
| … | TF-IDF | TF-IDF | TF-IDF |
| term k | TF-IDF | TF-IDF | TF-IDF |

ESA is a Distributional
Semantic model which
uses Wikipedia
articles as context
![Page 204: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/204.jpg)
Explicit Semantic Analysis (ESA)
ESA matrix (rows: terms; columns: Wikipedia articles)

| ESA | Concept 1 | … | Concept n |
|---|---|---|---|
| term 1 | TF-IDF | TF-IDF | TF-IDF |
| … | TF-IDF | TF-IDF | TF-IDF |
| term k | TF-IDF | TF-IDF | TF-IDF |

TF-IDF score = semantic relatedness
between a word and a concept
![Page 205: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/205.jpg)
Explicit Semantic Analysis (ESA)
Every Wikipedia article represents a concept
Article words are associated with the concept (TF-IDF)
Each Wikipedia page can be described in terms of
the words with the highest TF-IDF score (this is a
column of the ESA matrix)
![Page 206: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/206.jpg)
Explicit Semantic Analysis (ESA)
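ESA matrix (rows: terms; columns: Wikipedia articles)

| ESA | Concept 1 | … | Concept n |
|---|---|---|---|
| term 1 | TF-IDF | TF-IDF | TF-IDF |
| … | TF-IDF | TF-IDF | TF-IDF |
| term k | TF-IDF | TF-IDF | TF-IDF |

The vector-space representation of each term (a row of the matrix) is called a
semantic interpretation vector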
![Page 207: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/207.jpg)
Explicit Semantic Analysis (ESA)
Every Wikipedia article represents a concept
Article words are associated with the concept (TF-IDF)
The semantics of a word is the
vector of its associations with
Wikipedia concepts
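A minimal sketch of how an ESA-style matrix could be assembled. The three short texts are hypothetical stand-ins for Wikipedia articles, and the weighting is a plain TF-IDF (raw term frequency times log inverse document frequency); the exact weighting and pruning used in the original ESA work may differ.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for Wikipedia articles (concept -> text).
articles = {
    "Panthera": "the leopard and the tiger are big cats of the genus panthera",
    "World War II": "the war involved many countries and the tiger tank",
    "Island": "an island is land surrounded by water",
}

# Term frequencies per concept, and number of articles containing each term.
tf = {concept: Counter(re.findall(r"[a-z]+", text))
      for concept, text in articles.items()}
df = Counter(term for counts in tf.values() for term in counts)

def tfidf(term, concept):
    """Plain TF-IDF weight of term in the article describing concept."""
    if term not in tf[concept]:
        return 0.0
    return tf[concept][term] * math.log(len(articles) / df[term])

def interpretation_vector(term):
    """Row of the ESA matrix: weight of term for every concept."""
    return {concept: tfidf(term, concept) for concept in articles}

print(interpretation_vector("tiger"))
print(interpretation_vector("island"))
```

In this toy setting "tiger" is associated with both Panthera and World War II, while "island" is tied to a single concept and therefore receives a higher weight there.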
![Page 208: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/208.jpg)
Explicit Semantic Analysis (ESA)
Important: the semantics of the words is not static.
It changes as Wikipedia articles are modified or
new articles are introduced.
ESA provides a representation which evolves over time!
![Page 209: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/209.jpg)
Explicit Semantic Analysis (ESA)
«web» in 1980 «web» in 2000
Important: the semantics of the words is not static.
It changes as Wikipedia articles are modified or
new articles are introduced.
ESA provides a representation which evolves over time!
![Page 210: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/210.jpg)
![Page 211: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/211.jpg)
The semantics of a text fragment is the
centroid of the semantics of its words
Game Controller
[0.32]
Mickey Mouse [0.81]
Game Controller
[0.64]
Explicit Semantic Analysis (ESA)
![Page 212: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/212.jpg)
Explicit Semantic Analysis (ESA)
A semantic representation of an item can be built as the
centroid vector of the semantic interpretation vectors of
the terms in the plot.
![Page 213: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/213.jpg)
![Page 214: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/214.jpg)
Explicit Semantic Analysis (ESA)
Representation can be further improved and enriched by
processing content through exogenous techniques (e.g.
entity linking) in order to catch complex concepts
![Page 215: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/215.jpg)
Explicit Semantic Analysis (ESA)
semantic
relatedness
of a pair of text fragments
(e.g. description of two
items) computed by
comparing their
semantic
interpretation
vectors using the
cosine metric
Matrix Revolutions
Donnie Darko
Up!
The Matrix
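The centroid and cosine steps described above can be sketched on a toy scale. This is only an illustration: the three "articles" below are hypothetical stand-ins for Wikipedia pages, and the smoothed IDF formula is an assumption, not necessarily the weighting used in the tutorial.

```python
import numpy as np

# Toy stand-ins for Wikipedia articles (hypothetical concept names and texts).
concepts = {
    "Artificial Intelligence": "machine intelligence robot computer agent",
    "Virtual Reality": "simulation computer virtual world reality",
    "Animation": "cartoon animated family adventure balloon",
}
names = list(concepts)
docs = [text.split() for text in concepts.values()]
vocab = sorted({w for d in docs for w in d})

def tfidf_matrix(docs, vocab):
    """Rows = words, columns = concepts, cells = TF-IDF weights."""
    tf = np.array([[d.count(w) for d in docs] for w in vocab], dtype=float)
    df = (tf > 0).sum(axis=1)
    idf = np.log(len(docs) / df) + 1.0  # smoothed IDF (an assumption)
    return tf * idf[:, None]

M = tfidf_matrix(docs, vocab)
row = {w: i for i, w in enumerate(vocab)}

def interpret(text):
    """Semantic interpretation vector: centroid of the word vectors."""
    rows = [M[row[w]] for w in text.split() if w in row]
    return np.mean(rows, axis=0) if rows else np.zeros(len(names))

def relatedness(a, b):
    """Cosine similarity between two interpretation vectors."""
    va, vb = interpret(a), interpret(b)
    denom = np.linalg.norm(va) * np.linalg.norm(vb)
    return float(va @ vb / denom) if denom else 0.0

close = relatedness("robot computer simulation", "virtual world computer")
far = relatedness("robot computer simulation", "cartoon balloon family")
```

Here `close` is positive because both fragments activate the same concepts, while `far` is zero because the fragments share no concept.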
![Page 216: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/216.jpg)
Explicit Semantic Analysis (ESA)
Another advantage: ESA can be also used to generate a set
of relevant extra concepts describing the items.
How?
![Page 217: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/217.jpg)
Explicit Semantic Analysis (ESA)
Another advantage: ESA can be also used to generate a set
of relevant extra concepts describing the items.
The Wikipedia pages with the highest TF-IDF score in the
semantic interpretation vector of the item!
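Selecting the extra concepts amounts to a top-k lookup over the interpretation vector. A minimal sketch follows; the concept names and weights are made up for illustration, and `top_concepts` is a hypothetical helper.

```python
import numpy as np

def top_concepts(vector, concept_names, k=3):
    """Return the k concepts with the highest weight, best first."""
    order = np.argsort(vector)[::-1][:k]
    return [(concept_names[i], float(vector[i])) for i in order if vector[i] > 0]

# Hypothetical interpretation vector of a movie plot (illustrative weights).
concept_names = ["Artificial Intelligence", "Virtual Reality", "Cooking", "Hacker"]
plot_vector = np.array([0.61, 0.48, 0.0, 0.35])

extra = top_concepts(plot_vector, concept_names, k=2)
```

The resulting concepts can then be appended to the item description as additional features.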
![Page 218: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/218.jpg)
Explicit Semantic Analysis (ESA)
Artificial Intelligence
[0.61]
![Page 219: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/219.jpg)
Explicit Semantic Analysis (ESA)
![Page 220: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/220.jpg)
Explicit Semantic Analysis (ESA)
Distributional Model which uses
Wikipedia articles as context
Very Transparent representation
(columns have an explicit meaning)
Representation can evolve over time!
Also language-independent, thanks to
cross-language Wikipedia links
The whole matrix is huge
«Empirical» tuning of the parameters:
how many articles? How many terms?
What is the thresholding?
![Page 221: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/221.jpg)
When transparency is not so important,
it is possible to learn a more compact
vector-space representation of terms and items
![Page 222: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/222.jpg)
When transparency is not so important,
it is possible to learn a more compact
vector-space representation of terms and items
Dimensionality Reduction techniques
![Page 223: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/223.jpg)
When transparency is not so important,
it is possible to learn a more compact
vector-space representation of terms and items
a.k.a. Word embedding techniques (more recent, equivalent buzzword)
Embedding = a smaller representation of words
![Page 224: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/224.jpg)
When transparency is not so important,
it is possible to learn a more compact
vector-space representation of terms and items
a.k.a. Word embedding techniques
Embedding = a smaller representation of words
Is this new?
![Page 225: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/225.jpg)
Dimensionality reduction techniques
Latent Semantic Analysis (LSA) is a widespread
distributional semantics model which builds
a term/context matrix and calculates SVD over that matrix.
Dumais, Susan T. "Latent semantic
analysis." Annual review of information science
and technology 38.1 (2004): 188-230.
![Page 226: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/226.jpg)
Dimensionality reduction techniques
Dumais, Susan T. "Latent semantic
analysis." Annual review of information science
and technology 38.1 (2004): 188-230.
Truncated Singular Value Decomposition
induces higher-order (paradigmatic) relations through the truncated SVD
Latent Semantic Analysis (LSA) is a widespread
distributional semantics model which builds
a term/context matrix and calculates SVD over that matrix.
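A small sketch of how truncated SVD can surface higher-order relations. The term/context matrix is a made-up toy: "car" and "automobile" never share a context, but both co-occur with "engine". The function name `lsa_term_vectors` is hypothetical.

```python
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def lsa_term_vectors(term_context, k):
    """Truncated SVD: keep only the k largest singular values."""
    U, s, _ = np.linalg.svd(term_context, full_matrices=False)
    return U[:, :k] * s[:k]  # one k-dimensional row per term

terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0],  # car appears in context 1
    [0, 1, 0],  # automobile appears in context 2
    [1, 1, 0],  # engine appears in contexts 1 and 2
    [0, 0, 1],  # flower appears in context 3
    [0, 0, 1],  # petal appears in context 3
], dtype=float)

T = lsa_term_vectors(A, k=2)
raw = cosine(A[0], A[1])     # no shared context in the original space
latent = cosine(T[0], T[1])  # related through "engine" in the reduced space
```

In the original space the two terms are orthogonal; after keeping two dimensions they collapse onto the same direction, which is the kind of indirect relation LSA is meant to capture.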
![Page 227: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/227.jpg)
Singular Value Decomposition
PROBLEM
the huge co-occurrence matrix
SOLUTION
don’t build the huge co-occurrence matrix!
Use incremental and scalable techniques
Dimensionality reduction techniques
![Page 228: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/228.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Explicit
Semantic
Analysis
Random
Indexing
…… Word2Vec
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
![Page 229: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/229.jpg)
Dimensionality reduction
Random Indexing
It is an incremental and scalable technique
for dimensionality reduction.
M. Sahlgren. The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations
between Words in High-dimensional Vector Spaces. PhD thesis, Stockholm University, 2006.
![Page 230: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/230.jpg)
Dimensionality reduction
Random Indexing
It is an incremental and scalable technique
for dimensionality reduction.
Insight
Assign a vector to each context (word, document, etc.). The
vector can be as big as you want.
Fill the vector with (almost) randomly assigned values.
Given a word, collect the contexts where that word appears.
Update the representation according to term co-occurrences.
The final representation is the «sum» of the contexts.
Obtain a (smaller but equivalent) vector space representation of
the terms
M. Sahlgren. The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations
between Words in High-dimensional Vector Spaces. PhD thesis, Stockholm University, 2006.
![Page 231: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/231.jpg)
![Page 232: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/232.jpg)
Random Indexing
Algorithm
Step 1 - definition of the context granularity:
Document? Paragraph? Sentence? Word?
Step 2 – building the random matrix R
each ‘context’ (e.g. sentence) is assigned a
context vector
dimension = k
allowed values = {-1, 0, +1}
small # of non-zero elements, i.e. sparse vectors
values distributed in a random way
![Page 233: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/233.jpg)
Random Indexing
Context vectors of dimension k = 8
Each row is a «context»
![Page 234: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/234.jpg)
Random Indexing
Algorithm
Step 3 – building the reduced space B
the vector space representation of a term t
obtained by combining the random vectors
of the contexts in which it occurs
t1 ∈ {c1, c2, c5}
![Page 235: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/235.jpg)
Random Indexing
Algorithm
Step 3 – building the reduced space B
Random matrix R (one context vector per row, k = 8):
r1 = [0, 0, -1, 1, 0, 0, 0, 0]
r2 = [1, 0, 0, 0, 0, 0, 0, -1]
r3 = [0, 0, 0, 0, 0, -1, 1, 0]
r4 = [-1, 1, 0, 0, 0, 0, 0, 0]
r5 = [1, 0, 0, -1, 1, 0, 0, 0]
…
rn = …
t1 ∈ {c1, c2, c5}
t1 = r1 + r2 + r5 = [2, 0, -1, 0, 1, 0, 0, -1]
Output: WordSpace
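The worked example above can be reproduced in a few lines. The vectors are the ones shown on the slide; the dictionary layout and the `term_vector` helper are assumptions made for illustration.

```python
import numpy as np

# Context vectors r1..r5 from the slide (k = 8, values in {-1, 0, +1}).
R = {
    "c1": np.array([0, 0, -1, 1, 0, 0, 0, 0]),
    "c2": np.array([1, 0, 0, 0, 0, 0, 0, -1]),
    "c3": np.array([0, 0, 0, 0, 0, -1, 1, 0]),
    "c4": np.array([-1, 1, 0, 0, 0, 0, 0, 0]),
    "c5": np.array([1, 0, 0, -1, 1, 0, 0, 0]),
}

# t1 occurs in contexts c1, c2 and c5.
occurrences = {"t1": ["c1", "c2", "c5"]}

def term_vector(term):
    """Sum the random vectors of the contexts in which the term occurs."""
    return np.sum([R[c] for c in occurrences[term]], axis=0)

t1 = term_vector("t1")
```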
![Page 236: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/236.jpg)
Random Indexing
Algorithm
Step 4 – building the document space
the vector space representation of a
document d obtained by
combining the vector space representation
of the terms that occur in the document
Output: DocSpace
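Steps 1 to 4 can be put together in a compact sketch. It assumes that each context is a tokenized sentence and that a term is counted once per context; the number of non-zero entries and the function names are illustrative choices, not values from the tutorial.

```python
import numpy as np

def random_index_vector(k, nonzeros, rng):
    """Sparse ternary vector: a few +1/-1 entries, zeros elsewhere."""
    v = np.zeros(k, dtype=int)
    positions = rng.choice(k, size=nonzeros, replace=False)
    v[positions] = rng.choice([-1, 1], size=nonzeros)
    return v

def build_word_space(contexts, k=16, nonzeros=4, seed=0):
    """Steps 2-3: one random vector per context, summed into its terms."""
    rng = np.random.default_rng(seed)
    word_space = {}
    for tokens in contexts:
        r = random_index_vector(k, nonzeros, rng)
        for term in set(tokens):
            word_space[term] = word_space.get(term, np.zeros(k, dtype=int)) + r
    return word_space

def doc_vector(tokens, word_space, k=16):
    """Step 4: a document is the sum of the vectors of its terms."""
    vectors = [word_space[t] for t in tokens if t in word_space]
    return np.sum(vectors, axis=0) if vectors else np.zeros(k, dtype=int)

# Step 1: context granularity = sentence (toy corpus).
contexts = [["beer", "glass"], ["beer", "glass", "dog"], ["dog", "spoon"]]
ws = build_word_space(contexts)
d = doc_vector(["beer", "dog"], ws)
```

Because "beer" and "glass" occur in exactly the same contexts, they receive identical vectors, and both WordSpace and DocSpace live in the same k-dimensional space.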
![Page 237: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/237.jpg)
WordSpace and DocSpace
WordSpace: m × k matrix (rows = terms t1 … tm, columns = dimensions c1 … ck)
DocSpace: n × k matrix (rows = documents d1 … dn, columns = dimensions c1 … ck)
Uniform representation
k is a simple
parameter
of the model
![Page 238: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/238.jpg)
Dimensionality reduction
..even if it sounds weird
theory: Johnson-Lindenstrauss’ lemma [*]
$B_{m,k} \approx A_{m,n} R_{n,k}$, with $k \ll n$
distances between the points in the reduced space approximately preserved if
context vectors are nearly orthogonal
(and they are)
[*] Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings
into a Hilbert space. Contemporary Mathematics, 26, 189-206.
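The claim can be checked empirically with a quick sketch. The matrix sizes, the sparse ternary distribution (2/3 zeros) and the scaling factor sqrt(3/k) are assumptions chosen so that squared lengths are preserved in expectation; they are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n, k = 10, 2000, 512

A = rng.random((m, n))  # toy "full" matrix: m points in n dimensions

# Sparse ternary random matrix: -1 and +1 with prob. 1/6 each, 0 otherwise.
R = rng.choice([-1.0, 0.0, 1.0], size=(n, k), p=[1 / 6, 2 / 3, 1 / 6])

# Each entry of R has variance 1/3, hence the sqrt(3/k) rescaling.
B = (A @ R) * np.sqrt(3.0 / k)

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

upper = np.triu_indices(m, 1)
ratios = pairwise_distances(B)[upper] / pairwise_distances(A)[upper]
```

All entries of `ratios` stay close to 1, i.e. pairwise distances in the reduced space approximate those in the original space even though k is much smaller than n.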
![Page 239: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/239.jpg)
Random Indexing
…. is also multilingual!
the same concept, expressed in different languages,
assumes the same position in language-based geometric
spaces
the position of beer in a geometric space based on English
and the position of birra in a geometric space based on
Italian are (almost) the same
![Page 240: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/240.jpg)
Random Indexing
…. is also multilingual!
Italian WordSpace English WordSpace
glass
spoon
dog
beer
cucchiaio
cane
birra
bicchiere
The position in the space can be slightly different, but the
similarity relations between terms still hold
![Page 241: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/241.jpg)
Random Indexing
Incremental and Scalable technique for
learning word embeddings
![Page 242: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/242.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Explicit
Semantic
Analysis
Random
Indexing
…… Word2Vec
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
![Page 243: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/243.jpg)
Word2Vec
• Distributional Model to learn Word Embeddings.
• Uses a two-layer neural network
• Training based on the Skip-Gram methodology
• Update of the network through Mini-batch or Stochastic Gradient
Descent
In a nutshell
![Page 244: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/244.jpg)
Word2Vec
(Partial) Structure of the network
Input Layer:
• Vocabulary V
• |V| number of terms
• |V| nodes
• Each term is
represented through
a «one hot
representation»
![Page 245: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/245.jpg)
Word2Vec
(Partial) Structure of the network
Input Layer:
• Vocabulary V
• |V| number of terms
• |V| nodes
• One-hot representation
Hidden Layer:
• N nodes
• N = size of the embeddings
• Parameter of the model
![Page 246: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/246.jpg)
Word2Vec
(Partial) Structure of the network
Hidden Layer:
• N nodes
• N = size of the embeddings
• Parameter of the model
Weight of the network:
• Randomly set (initially)
• Updated through the training
Input Layer:
• Vocabulary V
• |V| number of terms
• |V| nodes
• One-hot representation
![Page 247: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/247.jpg)
Word2Vec
Hidden Layer:
• N nodes
• N = size of the embeddings
• Parameter of the model
Weight of the network:
• Randomly set (initially)
• Updated through the training
Final Representation for term t_k
• Weights Extracted from the network
• t_k = [w_{t_k,v_1}, w_{t_k,v_2}, …, w_{t_k,v_n}]
Input Layer:
• Vocabulary V
• |V| number of terms
• |V| nodes
• One-hot representation
(Partial) Structure of the network
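Why the embedding of a term can simply be read off the weights: multiplying a one-hot input by the input-to-hidden weight matrix selects one row of that matrix. A minimal sketch, with a hypothetical four-word vocabulary and randomly initialized (untrained) weights:

```python
import numpy as np

vocab = ["the", "quick", "brown", "fox"]
index = {term: i for i, term in enumerate(vocab)}
N = 3  # size of the embeddings (a parameter of the model)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(len(vocab), N))  # |V| x N weights, random init

def one_hot(i, size):
    v = np.zeros(size)
    v[i] = 1.0
    return v

def embedding(term):
    """Hidden-layer activation for a one-hot input = one row of W."""
    return one_hot(index[term], len(vocab)) @ W

fox = embedding("fox")
```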
![Page 248: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/248.jpg)
Word2Vec
Training Procedure: how to create training examples?
Skip-Gram Methodology
Given a word w(t), predict its
context w(t-2), w(t-1) … w(t+1), w(t+2)
Continuous Bag-of-Words Methodology
Given a context w(t-2), w(t-1) …
w(t+1), w(t+2), predict word w(t)
![Page 249: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/249.jpg)
Word2Vec
Training Procedure: how to create training examples?
Skip-Gram Methodology
Given a word w(t), predict its
context w(t-2), w(t-1) … w(t+1), w(t+2)
Example Input: ”the quick brown fox
jumped over the lazy dog”
Window Size: 1
Contexts:
• ([the, brown], quick)
• ([quick, fox], brown)
• ([brown, jumped], fox) ...
Training Examples:
• (quick, the)
• (quick, brown)
• (brown, quick)
• (brown, fox) ...
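The generation of Skip-Gram training examples can be sketched as follows. `skipgram_pairs` is a hypothetical helper; unlike the slide, it also emits pairs for the first and last word, using a truncated window at the sentence boundaries.

```python
def skipgram_pairs(tokens, window=1):
    """Return (word, context word) pairs for every position in the sentence."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the quick brown fox jumped over the lazy dog".split()
pairs = skipgram_pairs(sentence, window=1)
```

With a window of size 1, `pairs` contains (quick, the), (quick, brown), (brown, quick), (brown, fox), and so on, matching the training examples listed above.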
![Page 250: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/250.jpg)
Word2Vec
Training Procedure: how to optimize the model?
Given a corpus, we create a set of training examples through Skip-Gram.
The model tries to maximize the
probability of predicting a context C
given a word w
The probability is calculated through softmax
![Page 251: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/251.jpg)
Word2Vec
Training Procedure: how to optimize the model?
Given a corpus, we create a set of training examples through Skip-Gram.
The model tries to maximize the
probability of predicting a context C
given a word w
The probability is calculated through softmax
Intuitively, the probability is high when the scalar product
is close to 1, i.e. when the vectors are similar!
![Page 252: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/252.jpg)
Word2Vec
Training Procedure: how to optimize the model?
Given a corpus, we create a set of training examples through Skip-Gram.
The model tries to maximize the
probability of predicting a context C
given a word w
The probability is calculated through softmax
Intuitively, the probability is high when the scalar product
is close to 1, i.e. when the vectors are similar!
Word2Vec is a distributional model since it learns a
representation such that (word, context) pairs
appearing together have similar vectors
The error is collected and the weights in the network are updated
accordingly. Stochastic Gradient Descent or Mini-Batch Gradient Descent
(every 128 or 512 training examples) is typically used
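A single training step can be sketched with plain NumPy. This is a simplified illustration that uses a full softmax over the vocabulary and one (word, context) pair per update; the matrix names `W_in` and `W_out`, the sizes and the learning rate are assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def sgd_step(W_in, W_out, w, c, lr=0.1):
    """One update for the training example (word index w, context index c).

    Returns P(c | w) as computed before the update.
    """
    v = W_in[w].copy()
    p = softmax(W_out @ v)           # distribution over all context words
    err = p.copy()
    err[c] -= 1.0                    # gradient of -log p[c] w.r.t. the scores
    W_in[w] -= lr * (W_out.T @ err)  # update the word embedding
    W_out -= lr * np.outer(err, v)   # update the output weights
    return p[c]

rng = np.random.default_rng(0)
V, N = 6, 4                          # vocabulary size, embedding size
W_in = rng.normal(scale=0.1, size=(V, N))
W_out = rng.normal(scale=0.1, size=(V, N))

p_before = sgd_step(W_in, W_out, w=1, c=2)
p_after = softmax(W_out @ W_in[1])[2]
```

After the step, `p_after` is higher than `p_before`: the scalar product between the word vector and the observed context vector has grown, which is the intuition stated above.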
![Page 253: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/253.jpg)
Representation can be really really
small (size<100, typically)
Trending - Recent and Very Hot
technique
Word2Vec
Learning Word Embeddings
through Neural Networks: it is not
based on «counting» co-occurrences.
It relies on «predicting»
the distribution
Not transparent anymore
Needs more computational resources
![Page 254: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/254.jpg)
…Let’s put everything together
![Page 255: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/255.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
Build a
Graph-based Data
Model
![Page 256: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/256.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
Build a
Graph-based Data
Model
Work on
Vector Space Model Work on
Vector Space Model
![Page 257: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/257.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
Build a
Graph-based Data
Model
Work on
Vector Space Model Work on
Vector Space Model
Can Exogenous and Endogenous approaches be combined?
![Page 258: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/258.jpg)
Exogenous approaches such as Entity Linking and WSD
work on the rows of the matrix
…Let’s put everything together
![Page 259: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/259.jpg)
Exogenous approaches such as Entity Linking and WSD
work on the rows of the matrix
…Let’s put everything together
Endogenous approaches such as ESA or Word2Vec
work on the columns of the matrix
![Page 260: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/260.jpg)
Exogenous approaches such as Entity Linking and WSD
work on the rows of the matrix
…Let’s put everything together
Endogenous approaches such as ESA or Word2Vec
work on the columns of the matrix
Both approaches can be combined to obtain richer
and more precise semantic representations
(e.g. Word2Vec over textual description processed with WSD)
![Page 261: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/261.jpg)
What?
semantics-aware recommender systems
![Page 262: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/262.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Word Sense
Disambiguation
Entity
Linking …….
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking the
Item to a
knowledge graph
![Page 263: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/263.jpg)
synsets
{09596828} American -- (a native or
inhabitant of the United States)
{06281561} fiction -- (a literary
work based on the imagination
and not necessarily on fact)
{06525881} movie, film, picture,
moving picture, moving-picture
show, motion picture,
motion-picture show, picture show,
pic, flick -- (a form of entertainment
that enacts a story…
{02605965} star -- (feature as the
star; "The movie stars Dustin
Hoffman as an autistic man")
The Matrix representation
through WSD we process the
textual description of the item
and we obtain a semantics-
aware representation of the
item as output.
In this case, keyword-based
features are replaced with the
concepts (in this case, a
WordNet synset) they refer to.
Matrix
1999
American
Australian
fiction
world
keywords
science
Hugo
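The switch from keywords to synsets can be illustrated with a toy mapping. The dictionary below merely stands in for the output of a real WSD algorithm on this specific text (it performs no disambiguation itself); the synset identifiers are the ones shown on the slide, and keeping unmapped words as plain keywords is an assumption.

```python
from collections import Counter

# Hypothetical WSD output for this description: surface word -> synset id.
SENSE_OF = {
    "american": "09596828",
    "fiction": "06281561",
    "movie": "06525881",
    "film": "06525881",
    "star": "02605965",
}

def bag_of_synsets(tokens):
    """Replace each keyword with its synset id; keep unknown words as-is."""
    return Counter(SENSE_OF.get(t.lower(), t.lower()) for t in tokens)

bow = Counter(["movie", "film", "American", "fiction"])
bos = bag_of_synsets(["movie", "film", "American", "fiction"])
```

In the keyword-based representation "movie" and "film" are two unrelated features; in the synset-based one they collapse into a single concept with frequency 2.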
![Page 264: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/264.jpg)
Synset-based representation
AI
Artificial
Intelligence
apple
M. Degemmis, P. Lops, and G. Semeraro. A Content-collaborative Recommender that Exploits WordNet-based User Profiles for
Neighborhood Formation. User Modeling and User-Adapted Interaction: The Journal of Personalization Research (UMUAI), 17(3):217–255,
Springer Science + Business Media B.V., 2007.
G. Semeraro, M. Degemmis, P. Lops, and P. Basile. Combining Learning and Word Sense Disambiguation for Intelligent User Profiling. In M.
M. Veloso, editor, IJCAI 2007, Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad, India, January 6-
12, 2007 , pages 2856–2861. Morgan Kaufmann, 2007.
M. Degemmis, P. Lops, G. Semeraro, and P. Basile. Integrating Tags in a Semantic Content-based Recommender.
ACM Conference on Recommender Systems, RecSys 2008: 163-170
![Page 265: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/265.jpg)
Keywords- vs synsets-based profiles
EachMovie dataset
o 1,628 movies grouped into 10 categories
o Users who rated between 30 and 100 movies
o Movie content crawled from IMDb
![Page 266: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/266.jpg)
In the context of cultural heritage personalization, does the integration of UGC and textual description of artwork collections cause an increase of the prediction accuracy in the process of recommending artifacts to users?
Results in a cultural heritage scenario
In RecSys ’08, Proceed. of the 2nd ACM Conference on Recommender Systems, pages 163–170, October 23-25, 2008,
Lausanne, Switzerland, ACM, 2008.
![Page 267: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/267.jpg)
Results in a cultural heritage scenario
5-point rating scale
Textual description of items (static content)
Personal Tags
Social Tags (from other users): caravaggio, deposition, christ, cross, suffering, religion
Social Tags
passion
![Page 268: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/268.jpg)
Results in a cultural heritage scenario
Personal Tags
Static Content
Social Tags
![Page 269: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/269.jpg)
Results in a cultural heritage scenario
o Artwork representation
o Artist
o Title
o Description
o Tags
o change of text representation from vectors of words (BOW) into vectors of WordNet synsets (BOS)
o From tags to semantic tags
o supervised Learning
o Bayesian Classifier learned from artworks labeled with user ratings and tags
![Page 270: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/270.jpg)
Results in a cultural heritage scenario
* Results averaged over the 30 study subjects
Augmented Profiles
Content-based Profiles
Tag-based Profiles
Overall accuracy F1 ≈ 85%
![Page 271: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/271.jpg)
Results in a cultural heritage scenario
personalized museum tours by arranging the most interesting
items for the active user
step forward to take into account spatial layout & time constraint
L. Iaquinta, M. de Gemmis, P. Lops, G. Semeraro: Recommendations toward Serendipitous Diversions. ISDA 2009: 1049-1054
![Page 272: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/272.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Ontologies Linked
Open Data
Introduce semantics
by linking the
Item to a
knowledge graph
…….
Introduce semantics by
mapping the features
describing the item with
semantic concepts
![Page 273: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/273.jpg)
Semantic Analysis using Ontologies
Quickstep & Foxtrot
o on-line academic research
papers recommenders
o items and user profiles represented through a research
topic ontology
o is-a relationships exploited to
infer general interests when specific
topics are observed
o match based on the correlation
between the topics in the user
profile and topics in papers
ACM Transactions
on Information Systems, 22(1):54–88, 2004
![Page 274: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/274.jpg)
Semantic Analysis using Ontologies
. In W. Nejdl, J. Kay,
P. Pu, and E. Herder, editors, Adaptive Hypermedia and AdaptiveWeb-Based Systems, volume 5149 of Lecture Notes in Computer
Science, pages 279–283. Springer, 2008.
News@hand
![Page 275: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/275.jpg)
Semantic Analysis using Ontologies
o user interests propagation from concepts which received the user feedback to
other related ones through spreading activation
o contextualized propagation strategies of user interests
horizontal propagation among siblings
anisotropic vertical propagation, i.e. user interests propagated differently
upward and downward
. Inf. Sci.,
250:40–60, 2013.
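The propagation strategies listed on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the toy ontology, the function names and the three propagation factors (`up`, `down`, `sideways`) are hypothetical and are not taken from the cited paper.

```python
# Toy is-a ontology: child -> parent. Concept names are illustrative only.
PARENT = {
    "football": "sport",
    "tennis": "sport",
    "sport": "topic",
}


def children(concept):
    """Return the direct sub-concepts of a concept."""
    return [c for c, p in PARENT.items() if p == concept]


def propagate(feedback, up=0.5, down=0.25, sideways=0.1):
    """One propagation step starting from concepts with explicit feedback.

    up and down differ on purpose (anisotropic vertical propagation);
    sideways reaches siblings (horizontal propagation).
    All three factors are assumed values chosen for the example.
    """
    profile = dict(feedback)
    for concept, weight in feedback.items():
        parent = PARENT.get(concept)
        if parent is not None:
            # Upward propagation to the more general concept.
            profile[parent] = profile.get(parent, 0.0) + up * weight
            # Horizontal propagation among siblings.
            for sibling in children(parent):
                if sibling != concept:
                    profile[sibling] = profile.get(sibling, 0.0) + sideways * weight
        # Downward propagation to more specific concepts.
        for child in children(concept):
            profile[child] = profile.get(child, 0.0) + down * weight
    return profile


profile = propagate({"football": 1.0})
```

With a single positive feedback on `football`, the parent `sport` and the sibling `tennis` receive a share of the interest, while `topic` is untouched because only one step is performed.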
![Page 276: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/276.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Ontologies Linked
Open Data
Introduce semantics
by linking the
Item to a
knowledge graph
…….
Introduce semantics by
mapping the features
describing the item with
semantic concepts
![Page 277: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/277.jpg)
Linked Open Data & RecSys
structured information source
for item descriptions
![Page 278: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/278.jpg)
LOD & Recommender Systems
Vector Space Model for LOD
Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based
Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS) - 2012 (Best Paper Award)
![Page 279: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/279.jpg)
LOD & Recommender Systems
Vector Space Model for LOD
Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based
Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS) - 2012 (Best Paper Award)
(starring, directors,
subject, etc.)
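A minimal sketch of a Vector Space Model over LOD properties, assuming each item is stored as one sparse vector per property (starring, director, subject, etc.) and that the overall similarity is a weighted average of per-property cosine similarities. The item data, the property weights and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lod_similarity(item_a, item_b, property_weights):
    """Weighted average of the per-property cosine similarities."""
    total = sum(property_weights.values())
    return sum(
        w * cosine(item_a.get(p, {}), item_b.get(p, {}))
        for p, w in property_weights.items()
    ) / total


# Hypothetical items: one sparse vector per LOD property.
film_a = {
    "starring": {"Actor_1": 1.0, "Actor_2": 1.0},
    "director": {"Director_1": 1.0},
    "subject": {"Category_X": 1.0},
}
film_b = {
    "starring": {"Actor_1": 1.0, "Actor_3": 1.0},
    "director": {"Director_1": 1.0},
    "subject": {"Category_Y": 1.0},
}
weights = {"starring": 1.0, "director": 1.0, "subject": 1.0}
score = lod_similarity(film_a, film_b, weights)
```

Keeping one vector per property makes it easy to switch individual properties on or off, which is the kind of property-subset comparison reported on the next slide.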
![Page 280: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/280.jpg)
LOD & Recommender Systems
Property subset evaluation
subject+broader solution
better than only subject or
subject+more broaders
too many broaders
introduce noise
best solution achieved
with
subject+broader+genres
Tommaso Di Noia, Roberto Mirizzi, Vito Claudio Ostuni, Davide Romito, Markus Zanker. Linked Open Data to support Content-based
Recommender Systems. 8th International Conference on Semantic Systems (I-SEMANTICS) - 2012 (Best Paper Award)
![Page 281: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/281.jpg)
Graph-based RecSys
Recommendations
obtained by mining
the graph
![Page 282: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/282.jpg)
Graph-based RecSys
Recommendations
obtained by mining
the graph
Identification of the
most relevant (target)
nodes, according to
the recommendation
scenario
![Page 283: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/283.jpg)
Graph-based RecSys
Recommendations
obtained by mining
the graph
Identification of the
most relevant (target)
nodes, according to
the recommendation
scenario
PageRank
Spreading Activation
Personalized PageRank
…
Cataldo Musto, Pasquale Lops, Pierpaolo Basile, Marco De
Gemmis, Giovanni Semeraro.
.
UMAP 2016
![Page 284: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/284.jpg)
Graph-based RecSys
Personalized PageRank [*] to identify
the most relevant nodes in the graph
[*] T. H. Haveliwala. Topic-
Sensitive PageRank: A
Context-Sensitive Ranking
Algorithm for Web Search.
IEEE Trans. Knowl. Data
Eng., 15(4):784–796, 2003.
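As a rough illustration of how Personalized PageRank can rank nodes for one user, the sketch below runs a plain power iteration in which the teleport probability is concentrated on a set of seed nodes. The toy graph, the node names, the damping factor of 0.85 and the number of iterations are assumptions made for the example.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration for Personalized PageRank on an undirected graph.

    graph: dict node -> list of neighbours (every node needs at least one).
    seeds: nodes that receive the teleport probability, for example the
           target user or the items that user liked.
    """
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Teleport step: jump back to the seed nodes.
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        # Walk step: spread the current score evenly over the neighbours.
        for node in nodes:
            share = damping * rank[node] / len(graph[node])
            for neighbour in graph[node]:
                new_rank[neighbour] += share
        rank = new_rank
    return rank


# Hypothetical tripartite graph: users (u*), items (i*) and one LOD entity (e1).
graph = {
    "u1": ["i1"],
    "u2": ["i1", "i2"],
    "i1": ["u1", "u2", "e1"],
    "i2": ["u2"],
    "i3": ["e1"],
    "e1": ["i1", "i3"],
}
ranks = personalized_pagerank(graph, seeds={"u1"})
# Candidate items for u1 are the ones u1 has not rated yet.
candidates = sorted(["i2", "i3"], key=ranks.get, reverse=True)
```

Item `i3` is reachable from `u1` only through the LOD entity `e1`, so it can be scored only when LOD nodes are added to the user-item graph.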
![Page 285: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/285.jpg)
Graph-based RecSys
MovieLens 100K dataset
G = Personalized PageRank on Bipartite User-Item Graph
G+LOD = Tripartite Graph modeling also Linked Open Data
![Page 286: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/286.jpg)
Graph-based RecSys
is it necessary to inject all the properties available in LOD cloud?
![Page 287: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/287.jpg)
Graph-based RecSys
is it necessary to inject all the properties available in LOD cloud?
![Page 288: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/288.jpg)
Graph-based RecSys
is it necessary to inject all the properties available in LOD cloud?
![Page 289: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/289.jpg)
Graph-based RecSys
what are the most
promising properties
to include?
manual selection
o domain-specific properties
o most frequent properties
o …
automatic selection
o more difficult to
implement
Cataldo Musto, Pasquale Lops, Pierpaolo Basile, Marco De
Gemmis, Giovanni Semeraro.
.
UMAP 2016
![Page 290: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/290.jpg)
Feature selection
selecting the most promising subset of LOD-based
properties
possible techniques
PageRank
Principal Component Analysis
Information Gain
Information Gain Ratio
Minimum Redundancy Maximum Relevance
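To make one of the listed techniques concrete, here is a small sketch of Information Gain used to rank LOD properties. It assumes that each property takes a single categorical value per item and that items carry a like/dislike label; the property names and the data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical data: four items, two candidate LOD properties.
labels = ["like", "like", "dislike", "dislike"]
properties = {
    "director": ["d1", "d1", "d2", "d2"],  # separates the classes perfectly
    "language": ["en", "en", "en", "en"],  # carries no information
}
ranked = sorted(
    properties,
    key=lambda p: information_gain(properties[p], labels),
    reverse=True,
)
```

Only the top-ranked properties would then be injected into the graph; how many to keep is a tuning choice.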
![Page 291: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/291.jpg)
Graph-based RecSys
tradeoff between accuracy and diversity
MovieLens 100K dataset
![Page 292: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/292.jpg)
Graph-based RecSys
Comparison to state of the art
MovieLens 100K dataset
![Page 293: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/293.jpg)
we can further build some
extra features by
mining the paths
occurring in the graph
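path: acyclic sequence of relations $(s, \ldots, r_l, \ldots, r_L)$
$\#path_{ux}(j)$: frequency of path $j$ in the sub-graph related to $u$ and $x$
Example: $u_3 \xrightarrow{s} i_2 \xrightarrow{p_2} e_1 \xrightarrow{p_1} i_1$ gives the path $(s, p_2, p_1)$
$$w_{ux}(j) = \frac{\#path_{ux}(j)}{\sum_{j} \#path_{ux}(j)}$$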
Semantic Graph-based Data Model
(Path Based Features)
Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi: Top-N recommendations from implicit feedback leveraging
linked open data. RecSys 2013: 85-92
![Page 294: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/294.jpg)
LOD & Recommender Systems
SPrank system
o analysis of complex relations between user
preferences and the target item
o extraction of path-based features
Vito Claudio Ostuni, Tommaso Di Noia, Eugenio Di Sciascio, Roberto Mirizzi: Top-N recommendations from implicit feedback leveraging
linked open data. RecSys 2013: 85-92
![Page 295: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/295.jpg)
$w_{u_3 x_1}$ = ?
Semantic Graph-based Data Model
(Path Based Features)
![Page 296: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/296.jpg)
path1 (s, s, s) : 1
Semantic Graph-based Data Model
(Path Based Features)
![Page 297: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/297.jpg)
path1 (s, s, s) : 2
Semantic Graph-based Data Model
(Path Based Features)
![Page 298: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/298.jpg)
path1 (s, s, s) : 2
path2 (s, p2, p1) : 1
Semantic Graph-based Data Model
(Path Based Features)
![Page 299: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/299.jpg)
path1 (s, s, s) : 2
path2 (s, p2, p1) : 2
Semantic Graph-based Data Model
(Path Based Features)
![Page 300: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/300.jpg)
path1 (s, s, s) : 2
path2 (s, p2, p1) : 2
path3 (s, p2, p3, p1) : 1
Semantic Graph-based Data Model
(Path Based Features)
![Page 301: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/301.jpg)
path1 (s, s, s) : 2
path2 (s, p2, p1) : 2
path3 (s, p2, p3, p1) : 1
$w_{u_3 x_1}(1) = \frac{2}{5}$
$w_{u_3 x_1}(2) = \frac{2}{5}$
$w_{u_3 x_1}(3) = \frac{1}{5}$
Semantic Graph-based Data Model
(Path Based Features)
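The worked example on this slide can be reproduced with a few lines of code: each path-based feature is the count of one path type divided by the total number of paths between the user and the item. The function name is hypothetical; the counts are the ones shown on the slide.

```python
from fractions import Fraction


def path_feature_weights(path_counts):
    """Normalise path counts so that the weights sum to 1."""
    total = sum(path_counts.values())
    return {path: Fraction(count, total) for path, count in path_counts.items()}


# Path counts between u3 and x1 as listed on the slide.
counts = {
    ("s", "s", "s"): 2,
    ("s", "p2", "p1"): 2,
    ("s", "p2", "p3", "p1"): 1,
}
weights = path_feature_weights(counts)
```

`Fraction` is used here only to keep the result exact (2/5, 2/5, 1/5); a real feature vector would more likely hold floats.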
![Page 302: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/302.jpg)
Evaluation:
The LOD-based approach outperforms the state of the art
![Page 303: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/303.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Explicit
Semantic
Analysis
Random
Indexing
…… Word2Vec
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
![Page 304: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/304.jpg)
Word2Vec
• Empirical Comparison of Word Embedding Techniques
for Content-based Recommender Systems [*]
• Methodology
• Build a WordSpace using different Word Embedding
techniques (and different sizes)
• Build a DocSpace as the centroid vectors of term vectors
• Build User Profiles as centroid of the items they liked
• Provide Users with Recommendations
• Compare the approaches
European Conference on Information Retrieval
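The methodology above can be sketched end to end on toy data. The two-dimensional word vectors below are hand-made stand-ins for a trained WordSpace, and the item names and texts are hypothetical; only the pipeline (term vectors, centroid document vectors, centroid profile, cosine ranking) follows the slide.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dims = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical WordSpace: in practice these vectors come from a trained model.
word_space = {
    "space": [1.0, 0.0],
    "alien": [0.9, 0.1],
    "love": [0.0, 1.0],
    "wedding": [0.1, 0.9],
}


def doc_vector(text):
    """DocSpace vector: centroid of the vectors of the known terms."""
    return centroid([word_space[t] for t in text.split() if t in word_space])


items = {
    "film_scifi": doc_vector("alien space"),
    "film_romance": doc_vector("love wedding"),
    "film_mixed": doc_vector("space love"),
}
liked = ["film_scifi"]
# User profile: centroid of the items the user liked.
profile = centroid([items[i] for i in liked])
# Recommendations: unseen items ranked by cosine similarity to the profile.
ranking = sorted(
    (i for i in items if i not in liked),
    key=lambda i: cosine(profile, items[i]),
    reverse=True,
)
```

Swapping the embedding technique only changes how `word_space` is built, which is what makes the comparison between techniques straightforward.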
![Page 305: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/305.jpg)
Word2Vec
Results on DBbook and MovieLens data
European Conference on Information Retrieval
![Page 306: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/306.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Explicit
Semantic
Analysis
Random
Indexing
…… Word2Vec
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
![Page 307: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/307.jpg)
eVSM
• Enhanced Vector Space Model [*]
• Content-based Recommendation Framework
• Cornerstones
• Semantics modeled through Distributional Models
• Random Indexing for Dimensionality Reduction
• Negative Preferences modeled through Quantum Negation [^]
• User Profiles as centroid vectors of items representation
• Recommendations through Cosine Similarity
[*] Musto, Cataldo. "Enhanced vector space models for content-based recommender systems." Proceedings of the fourth
ACM conference on Recommender systems. ACM, 2010.
Mathematics of language
![Page 308: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/308.jpg)
eVSM
Musto, Cataldo. "Enhanced vector space models for content-based recommender systems." Proceedings of the fourth ACM
conference on Recommender systems. ACM, 2010.
Distributional Models
to build DocSpace of the items
(whole document used as context)
Random Indexing for
Dimensionality Reduction
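A minimal sketch of Random Indexing with the whole document as context, as described on this slide. Each document gets a sparse ternary random index vector, a term vector is the sum of the index vectors of the documents containing the term, and a document vector is the sum of its term vectors. The dimension, the number of non-zero entries, the seed and the toy corpus are assumptions.

```python
import random


def index_vector(dim, nonzero, rng):
    """Sparse ternary random vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def random_indexing(docs, dim=20, nonzero=4, seed=0):
    """Build term vectors and document vectors in a reduced space."""
    rng = random.Random(seed)
    # One random index vector per context (here: per document).
    doc_index = {doc_id: index_vector(dim, nonzero, rng) for doc_id in docs}
    # Term vector: sum of the index vectors of the documents it occurs in.
    term_vectors = {}
    for doc_id, text in docs.items():
        for term in text.split():
            vec = term_vectors.setdefault(term, [0] * dim)
            for d, value in enumerate(doc_index[doc_id]):
                vec[d] += value
    # Document vector: sum of the vectors of its terms.
    doc_space = {}
    for doc_id, text in docs.items():
        vec = [0] * dim
        for term in text.split():
            for d, value in enumerate(term_vectors[term]):
                vec[d] += value
        doc_space[doc_id] = vec
    return term_vectors, doc_space


docs = {"d1": "space alien", "d2": "space love"}
term_vectors, doc_space = random_indexing(docs)
```

The dimension is fixed up front and does not grow with the vocabulary, which is where the dimensionality reduction comes from.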
![Page 309: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/309.jpg)
eVSM
Cornerstones
• Given two vectors a and b
• Through Quantum Negation we can define a vector (a ∧¬b)
• Formally:
• Projection of vector a on the subspace orthogonal to that generated
by vector b
• Intuitively:
• Vector «a» models «positive» preferences
• Vector «b» models «negative» preferences
• Through quantum negation we get a unique vector modeling both
aspects
• Close to vectors containing as many features as possible from «a»
and as few features as possible from «b»
Mathematics of language
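The projection described above has a simple closed form: $a \wedge \neg b = a - \frac{a \cdot b}{b \cdot b}\, b$. The sketch below applies it to two toy vectors; the function names and the vectors are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def quantum_negation(a, b):
    """Return a AND NOT b: a minus its projection onto b."""
    bb = dot(b, b)
    if bb == 0.0:
        # Nothing to negate when b is the zero vector.
        return list(a)
    scale = dot(a, b) / bb
    return [x - scale * y for x, y in zip(a, b)]


positive = [1.0, 1.0, 0.0]  # features of liked items
negative = [0.0, 1.0, 1.0]  # features of disliked items
profile = quantum_negation(positive, negative)
```

The resulting profile is orthogonal to the negative vector, so an item made only of disliked features gets a cosine similarity of zero.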
![Page 310: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/310.jpg)
eVSM
• User Profiles
• Calculated as centroid vectors of the items the user liked/disliked
Random Indexing-
based Profiles (RI)
Musto, Cataldo. "Enhanced vector space
models for content-based recommender
systems." Proceedings of the fourth ACM
conference on Recommender systems. ACM,
2010.
![Page 311: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/311.jpg)
eVSM
Cornerstones
• User Profiles
• Calculated as centroid vectors of the items the user liked/disliked
Random Indexing-
based Profiles (W-RI)
Quantum Negation- based
Profiles (W-QN)
Musto, Cataldo. "Enhanced vector space
models for content-based recommender
systems." Proceedings of the fourth ACM
conference on Recommender systems. ACM,
2010.
![Page 312: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/312.jpg)
eVSM
Cornerstones
• Recommendations
• Similarity Calculations on DocSpace
![Page 313: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/313.jpg)
eVSM
Experiments
The size of the embeddings does not significantly affect the overall
accuracy of eVSM (MovieLens data)
![Page 314: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/314.jpg)
eVSM
Experiments
Quantum Negation improves the accuracy of the model
(MovieLens data, embedding size=100)
![Page 315: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/315.jpg)
eVSM
Experiments
eVSM significantly outperformed all the baselines.
(MovieLens data, embedding size=400)
![Page 316: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/316.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Explicit
Semantic
Analysis
Random
Indexing
…… Word2Vec
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
Word Sense
Disambiguation
Entity
Linking
![Page 317: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/317.jpg)
C-eVSM
• Contextual Enhanced Vector Space Model [*]
• Extension of eVSM: context-aware Framework
• Cornerstones
• Entity Linking of the content through Tag.me
• Semantics modeled through Distributional Models
• Random Indexing for Dimensionality Reduction
• Distributional Models also used to build a representation of
the context
• Context-aware User Profiles as centroid vectors
• Recommendations through Cosine Similarity
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
![Page 318: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/318.jpg)
C-eVSM
• Context-aware User Profiles
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
Let u be the target user
Let ck be a contextual variable (e.g. task, mood, etc.)
Let vj be its value (e.g. task=running, mood=sad, etc.)
![Page 319: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/319.jpg)
C-eVSM
• Context-aware User Profiles
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
eVSM
profile
![Page 320: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/320.jpg)
C-eVSM
• Context-aware User Profiles
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
$\text{C-WRI}(u, c_k, v_j) = \alpha \cdot \text{WRI}(u) + (1-\alpha) \cdot \text{context}(u, c_k, v_j)$
eVSM
profile
A Vector representing value v for
context c is introduced (e.g.
company=friends)
![Page 321: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/321.jpg)
C-eVSM
• Context-aware User Profiles
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
$\text{C-WRI}(u, c_k, v_j) = \alpha \cdot \text{WRI}(u) + (1-\alpha) \cdot \text{context}(u, c_k, v_j)$
eVSM
profile
A Vector representing value v for
context c is introduced (e.g.
company=friends)
Linear Combination
(α = 1: plain eVSM, no context taken into account)
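The linear combination is easy to check numerically. In the sketch below the two input vectors are toy values and the function name is hypothetical; only the formula itself comes from the slide.

```python
def contextual_profile(wri, context, alpha):
    """C-WRI = alpha * WRI(u) + (1 - alpha) * context(u, c_k, v_j)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(wri, context)]


wri_profile = [1.0, 0.0]     # hypothetical eVSM profile of the user
context_vector = [0.0, 1.0]  # hypothetical vector for company=friends

no_context = contextual_profile(wri_profile, context_vector, alpha=1.0)
balanced = contextual_profile(wri_profile, context_vector, alpha=0.5)
```

With `alpha=1.0` the result is the plain eVSM profile; lower values pull the profile toward the context vector.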
![Page 322: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/322.jpg)
C-eVSM
• Context-aware User Profiles
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
$\text{C-WRI}(u, c_k, v_j) = \alpha \cdot \text{WRI}(u) + (1-\alpha) \cdot \text{context}(u, c_k, v_j)$
eVSM
profile
A Vector representing value v for
context c is introduced (e.g.
company=friends)
Linear Combination
(α = 1: plain eVSM, no context taken into account)
![Page 323: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/323.jpg)
C-eVSM
• Context-aware User Profiles
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
$\text{C-WRI}(u, c_k, v_j) = \alpha \cdot \text{WRI}(u) + (1-\alpha) \cdot \text{context}(u, c_k, v_j)$
eVSM
profile
A Vector representing value v for
context c is introduced (e.g.
company=friends)
Why this
formula?
![Page 324: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/324.jpg)
C-eVSM
• Why this formula?
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
Insight: there exists a set of terms that is more descriptive
of the items relevant in that specific context,
e.g. candlelight, seaview, violin for a romantic dinner
![Page 325: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/325.jpg)
C-eVSM
• Why this formula?
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
![Page 326: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/326.jpg)
C-eVSM
• Why this formula?
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
Thanks to Distributional Semantics Models it is
possible to build a vector-space representation of the
context that emphasizes the importance of those
terms, since they are used more often (and are thus more
important) in that specific contextual setting.
![Page 327: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/327.jpg)
C-eVSM
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
![Page 328: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/328.jpg)
C-eVSM
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
![Page 329: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/329.jpg)
C-eVSM
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
![Page 330: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/330.jpg)
C-eVSM
Musto, Cataldo, et al. "Combining distributional semantics and entity linking for context-aware content-based
recommendation." International Conference on User Modeling, Adaptation, and Personalization. UMAP 2014
Entities are better than simple keywords!
Selection of Results
![Page 331: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/331.jpg)
C-eVSM
Compared to the Context-aware Collaborative Filtering (CACF) algorithm [*]: better in 7 contextual segments
[*] Incorporating contextual information in recommender systems using a multi-dimensional approach
![Page 332: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/332.jpg)
Semantic representations
Explicit (Exogenous)
Semantics
Implicit (Endogenous)
Semantics
Explicit
Semantic
Analysis
Random
Indexing
…… Word2Vec
Distributional
semantic models
Introduce semantics by
mapping the features
describing the item with
semantic concepts
Introduce semantics
by linking
the Item to
a knowledge graph
![Page 333: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/333.jpg)
ESA effectively used for
• Text Categorization [Gabri09]: experiments on diverse datasets
• Semantic relatedness of words and texts [Gabri09]: cosine similarity between vectors of ESA concepts
• Information Retrieval [Egozi08, Egozi11]: ESA-based IR algorithm enriching documents and queries
[Gabri09] E. Gabrilovich and S. Markovitch. Wikipedia-based Semantic Interpretation for Natural Language Processing. Journal of Artificial
Intelligence Research 34:443-498, 2009.
[Egozi08] Ofer Egozi, Evgeniy Gabrilovich, Shaul Markovitch: Concept-Based Feature Generation and Selection for Information Retrieval.
AAAI 2008, 1132-1137, 2008.
[Egozi11] Ofer Egozi, Shaul Markovitch, Evgeniy Gabrilovich. Concept-Based Information Retrieval using Explicit Semantic Analysis.
ACM Transactions on Information Systems 29(2), April 2011.
what about ESA for Information Filtering?
![Page 334: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/334.jpg)
Electronic Program Guides
problem
descriptions of TV shows are too short or
not informative enough to feed a
content-based recommendation algorithm
solution
Explicit Semantic Analysis exploited to obtain an
enhanced representation
[Musto12] C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, and R. Clout.
Enhanced semantic tv-show representation for personalized electronic program guides. UMAP 2012, pp. 188–199.
Springer, 2012
![Page 335: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/335.jpg)
Electronic Program Guides
![Page 336: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/336.jpg)
Electronic Program Guides
![Page 337: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/337.jpg)
Electronic Program Guides
Wikipedia Articles related to the
TV show are added to the
description
![Page 338: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/338.jpg)
Electronic Program Guides
user profile: motogp, sports, motorbike, competition, ...
tv show: 2012 Superbike Italian Grand Prix
![Page 339: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/339.jpg)
Electronic Program Guides
user profile vs. tv show: X No matching!
![Page 340: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/340.jpg)
Electronic Program Guides
user profile tv show
2012 Superbike
Italian Grand Prix
motogp
superbike
sports
motorbike
formula 1
…
competition
Through ESA we can add new features to the profile
and improve the overlap between textual descriptions
![Page 341: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/341.jpg)
Electronic Program Guides
user profile vs. tv show: Matching! ✔
![Page 342: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/342.jpg)
Electronic Program Guides
results on Aprico.tv data
The more Wikipedia concepts are added to the textual description
of the items (eBOW+60), the better the precision of the algorithm
eBOW = Bag of Words + Wikipedia Concepts
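The eBOW idea can be sketched in a few lines. This is a minimal illustration, not the implementation evaluated in [Musto12]: the three "articles" are a toy stand-in for Wikipedia, and the names `ARTICLES`, `build_esa_index`, `top_concepts` and `ebow` are hypothetical.

```python
import math
from collections import Counter

# Toy stand-in for Wikipedia: concept title -> article text (illustrative only).
ARTICLES = {
    "superbike": "superbike racing motorbike championship production",
    "motogp": "motogp grand prix motorbike racing",
    "cooking": "pasta pizza food recipe",
}

def tokenize(text):
    return text.lower().split()

def build_esa_index(articles):
    """Return term -> {concept: TF-IDF weight}, i.e. the rows of a small ESA matrix."""
    n = len(articles)
    tf = {concept: Counter(tokenize(text)) for concept, text in articles.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    index = {}
    for concept, counts in tf.items():
        for term, freq in counts.items():
            index.setdefault(term, {})[concept] = freq * math.log(n / df[term])
    return index

def top_concepts(text, index, k):
    """Sum the concept weights of the terms in the text and keep the k best concepts."""
    scores = Counter()
    for term in tokenize(text):
        for concept, weight in index.get(term, {}).items():
            scores[concept] += weight
    return [concept for concept, score in scores.most_common(k) if score > 0]

def ebow(text, index, k):
    """Enhanced bag of words: original tokens plus the k most related concepts."""
    return tokenize(text) + top_concepts(text, index, k)

index = build_esa_index(ARTICLES)
show = "2012 Superbike Italian Grand Prix"
profile = {"motogp", "sports", "motorbike", "competition"}
plain_overlap = set(tokenize(show)) & profile
enhanced_overlap = set(ebow(show, index, k=2)) & profile
```

With this toy data the plain description shares no keyword with the profile, while the enhanced one shares the added concept `motogp`.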
![Page 343: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/343.jpg)
Semantic User Profiles
• Research Question
• Is it possible to exploit semantic representation
techniques to improve the quality of user profiles?
• Methodology
• Build User Profiles by extracting data available from social
networks
• Process User Profiles through Semantics-aware
Techniques
• Evaluate the effectiveness of the profiles
• Accuracy
• Transparency
• Serendipity
Narducci, F., Musto, C., Semeraro, G., Lops, P., & de Gemmis, M.
(2013, August). Exploiting big data for enhanced representations
in content-based recommender systems. In International
Conference on Electronic Commerce and Web Technologies (pp. 182-
193). Springer Berlin Heidelberg.
![Page 344: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/344.jpg)
Semantic User Profiles
• Techniques
• Keyword-based Profiles
• Entity Linking (Tag.me)
• Explicit Semantic Analysis
• Scenario
• I’m in trepidation for my first riding lesson!
• I’m really anxious for the soccer match :(
• This summer I will flight by Ryanair to London!
• Ryanair really cheapest company!
• Ryanair lost my luggage :(
• This summer holidays are really amazing!
• Total relax during these holidays!
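As a point of reference for the three techniques, the keyword-based profile for the scenario above can be sketched as follows. The stopword list and the function name `keyword_profile` are assumptions made for this example; entity linking and ESA need external resources and are not reproduced here.

```python
import re
from collections import Counter

# The posts from the scenario above (kept as written by the user).
POSTS = [
    "I'm in trepidation for my first riding lesson!",
    "I'm really anxious for the soccer match :(",
    "This summer I will flight by Ryanair to London!",
    "Ryanair really cheapest company!",
    "Ryanair lost my luggage :(",
    "This summer holidays are really amazing!",
    "Total relax during these holidays!",
]

# Small hand-made stopword list (an assumption for this sketch).
STOPWORDS = {"i", "m", "in", "for", "my", "the", "this", "will", "by",
             "to", "are", "these", "during", "really"}

def keyword_profile(posts, stopwords=STOPWORDS):
    """Count the non-stopword tokens occurring in the user's posts."""
    counts = Counter()
    for post in posts:
        for token in re.findall(r"[a-z]+", post.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts

profile = keyword_profile(POSTS)
```

A profile of this kind records that `ryanair` occurs often, but not that it is an airline; that gap is what entity linking and ESA are meant to fill.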
![Page 345: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/345.jpg)
Semantic User Profiles
Results – Experiment 1
• User Study involving 63 users • Twitter and Facebook as Social Networks
• Users were provided with a set of personalized news
• Answers gathered through User Feedback on recommended news
![Page 346: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/346.jpg)
Semantic User Profiles
Results – Experiment 1
No relevance feedback
Avg. rating   Min rating   Max rating   Std. Dev.
1.49          0            5            1.12
1.89          0            5            1.47
2.86          1            5            1.30
2.59          0            5            1.37
![Page 347: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/347.jpg)
Semantic User Profiles
Results – Experiment 2
• User Study involving 51 users • Twitter and Facebook as Social Networks
• Users were provided with a tag cloud describing their profile
• Answers gathered through Questionnaires
            Transparency                  Serendipity
            Avg.  Min  Max  Std. Dev.     Avg.  Min  Max  Std. Dev.
KEYWORDS    1.33  0    3    0.65          0.42  0    2    0.57
TAG.ME      3.88  2    5    0.82          0.54  0    2    0.61
ESA         1.16  0    4    1.00          3.24  0    5    1.24
![Page 348: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/348.jpg)
What?
Cross-lingual recommender systems
![Page 349: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/349.jpg)
Cross-lingual access: motivations
relevant information exists in different languages
Web is becoming more and more multilingual
users are becoming increasingly polyglot
more than half of the world population is bilingual
cross-language recommender systems
can likely increase the number of tail products suggested (25-30% of sales for online stores)
![Page 350: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/350.jpg)
Cross-lingual access: problems
Vocabulary mismatch
use of different languages
extreme case of vocabulary mismatch
![Page 351: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/351.jpg)
Semantic analytics for cross-lingual access
Sense-based representations
inherently multilingual
terms in each specific language change, while
concepts (word meanings) remain the same
across different languages
match between
items and user profiles at a conceptual level
![Page 352: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/352.jpg)
(Cross-lingual) Concept-based representations
ESA
![Page 353: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/353.jpg)
MultiWordNet
Sense-based representations
Word Sense Disambiguation (JIGSAW) based on MultiWordNet as sense repository
MultiWordNet: multilingual lexical database that supports English, Italian, Spanish, Portuguese, Hebrew, Romanian, Latin
alignment between synsets in the different languages
semantic relations imported and preserved
![Page 354: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/354.jpg)
MultiWordNet
![Page 355: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/355.jpg)
Bag of MultiWordNet synsets
[Figure: two bags of MultiWordNet synsets; match between senses]
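Matching at the level of senses can be sketched as below. This is only an illustration: the synset identifiers and the tiny `SYNSETS` inventory are made up, each lemma has a single sense (so the disambiguation step performed by JIGSAW is skipped), and Jaccard overlap is used as a stand-in similarity.

```python
# Hypothetical aligned sense inventory: lemmas of different languages that
# express the same concept share one synset id (ids are invented).
SYNSETS = {
    ("en", "dog"): "syn:001", ("it", "cane"): "syn:001",
    ("en", "beer"): "syn:002", ("it", "birra"): "syn:002",
    ("en", "glass"): "syn:003", ("it", "bicchiere"): "syn:003",
    ("en", "spoon"): "syn:004", ("it", "cucchiaio"): "syn:004",
}

def bag_of_synsets(text, lang):
    """Map each known token of the text to its (language-independent) synset id."""
    tokens = text.lower().split()
    return {SYNSETS[(lang, tok)] for tok in tokens if (lang, tok) in SYNSETS}

def jaccard(a, b):
    """Set overlap between two bags; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

profile_en = bag_of_synsets("dog beer glass", "en")
item_it = bag_of_synsets("un cane e una birra", "it")
keyword_overlap = set("dog beer glass".split()) & set("un cane e una birra".split())
sense_overlap = jaccard(profile_en, item_it)
```

The English profile and the Italian item share no keyword, yet two of the three synsets coincide.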
![Page 356: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/356.jpg)
Some results
cross-language movie recommendation scenario
profiles learned from ENG/ITA descriptions
recommendation provided on ITA/ENG descriptions
MovieLens dataset, F1 measure, Wikipedia source for descriptions
![Page 357: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/357.jpg)
Cross-lingual representation: Tagme
![Page 358: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/358.jpg)
Cross-language links
![Page 359: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/359.jpg)
Cross-lingual representation: Tagme
Italian entities: Film d’azione, Lana e Lilly Wachowskis, Laurence Fishburne, Keanu Reeves, distopia, …
English entities: action film, The Wachowskis, Keanu Reeves, Laurence Fishburne, Dystopia, Perception, Carrie-Anne Moss, Cyberspace, …
![Page 360: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/360.jpg)
Cross-lingual representation: ESA
Translation-based ESA
[Figure: texts in languages L1, L2, …, Ln go through a TRANSLATION PROCESS into the PIVOT LANGUAGE; the translated text is represented through the ESA MATRIX of the pivot language (rows: terms t1 … tk occurring in Wikipedia articles; columns: Wikipedia articles C1 … Cn; cells: TF-IDF)]
![Page 361: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/361.jpg)
Cross-lingual representation: ESA
Cross-language ESA
[Figure: texts in languages L1, L2, …, Ln, each represented through the ESA matrix of its own language]
ESA MATRIX L1: C1-L1 C2-L1 … Cn-L1
ESA MATRIX L2: C1-L2 C2-L2 … Cn-L2
…
ESA MATRIX LN: C1-LN C2-LN … Cn-LN
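A minimal sketch of the cross-language variant follows, assuming that the concept columns of the language-specific matrices are aligned (for instance through Wikipedia cross-language links). The matrices `ESA_EN` and `ESA_IT` and their weights are invented for illustration.

```python
import math

# Concepts shared by both matrices, in the same column order (assumed aligned).
CONCEPTS = ["Dystopia", "Action film", "Cooking"]

# Toy ESA matrices: term -> weight for each concept (values are made up).
ESA_EN = {"dystopia": [1.0, 0.1, 0.0], "action": [0.2, 1.0, 0.0], "recipe": [0.0, 0.0, 1.0]}
ESA_IT = {"distopia": [0.9, 0.1, 0.0], "azione": [0.1, 1.0, 0.0], "ricetta": [0.0, 0.0, 1.0]}

def concept_vector(text, esa):
    """Sum the concept weights of every known term of the text."""
    vec = [0.0] * len(CONCEPTS)
    for tok in text.lower().split():
        for i, weight in enumerate(esa.get(tok, [])):
            vec[i] += weight
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

english_profile = concept_vector("action dystopia", ESA_EN)
italian_film = concept_vector("film di azione e distopia", ESA_IT)
italian_recipe = concept_vector("ricetta", ESA_IT)
```

Each text is mapped with the matrix of its own language, and the resulting vectors can be compared directly because they live in the same concept space; no translation step is involved.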
![Page 362: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/362.jpg)
Cross-lingual representation: Babelfy
![Page 363: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/363.jpg)
Preliminary results
effectiveness of knowledge-based strategies to provide
cross-lingual recommendations
MovieLens and DBbook datasets
F1 measure, Wikipedia source for descriptions
English and Italian languages
![Page 364: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/364.jpg)
Cross-lingual representation: Distributional models
the distribution of the terms is
(almost) the same in different languages
o cross-lingual representation comes at
no cost thanks to the distributional hypothesis
![Page 365: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/365.jpg)
Distributional Semantics
…. is also multilingual! (Recap)
Italian WordSpace: bicchiere, cucchiaio, cane, birra
English WordSpace: glass, spoon, dog, beer
The positions in the space can be slightly different, but the
similarity relations between terms still hold
![Page 366: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/366.jpg)
Distributional Semantics
Multilingual DocSpace
Italian DocSpace: D1_L1, D2_L1, D3_L1, D4_L1
English DocSpace: D5_L2, D6_L2, D7_L2, D8_L2
By following the same procedure we can
obtain a multilingual DocSpace
Different documents in different languages are represented in a uniform space
![Page 367: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/367.jpg)
Distributional Semantics
…. is also multilingual!
How to build a cross-lingual recommender?
![Page 368: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/368.jpg)
Distributional Semantics
…. is also multilingual!
We build a user profile P1 in L1 (Italian DocSpace)
How to build a cross-lingual recommender?
![Page 369: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/369.jpg)
Distributional Semantics
…. is also multilingual!
We can «move» the profile P1 to L2 (English DocSpace)
How to build a cross-lingual recommender?
![Page 370: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/370.jpg)
Distributional Semantics
…. is also multilingual!
We can use similarity measures to suggest items in a different language
How to build a cross-lingual recommender?
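The three steps can be sketched as follows. The sketch assumes that the two DocSpaces share the same dimensions, so that "moving" the profile amounts to reusing its vector in the other space; the document vectors are invented, and the profile is taken as the centroid of the liked documents, which is one common choice rather than the method confirmed by the slides.

```python
import math

# Toy document vectors (made up); both spaces are assumed to share dimensions.
ITALIAN_DOCSPACE = {
    "D1_L1": [0.9, 0.1, 0.0], "D2_L1": [0.8, 0.2, 0.1],
    "D3_L1": [0.0, 0.1, 0.9], "D4_L1": [0.1, 0.9, 0.1],
}
ENGLISH_DOCSPACE = {
    "D5_L2": [0.85, 0.15, 0.05], "D6_L2": [0.05, 0.1, 0.95],
    "D7_L2": [0.1, 0.8, 0.2], "D8_L2": [0.5, 0.5, 0.1],
}

def centroid(vectors):
    """Component-wise mean of a list of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(profile, docspace, k):
    """Rank the documents of a space by cosine similarity with the profile."""
    ranked = sorted(docspace, key=lambda doc: cosine(profile, docspace[doc]), reverse=True)
    return ranked[:k]

# Step 1: profile P1 built from the Italian documents the user liked.
p1 = centroid([ITALIAN_DOCSPACE["D1_L1"], ITALIAN_DOCSPACE["D2_L1"]])
# Steps 2 and 3: reuse P1 in the English DocSpace and rank its documents.
suggestions = recommend(p1, ENGLISH_DOCSPACE, k=2)
```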
![Page 371: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/371.jpg)
Some results
effectiveness of knowledge-based strategies to provide
cross-lingual recommendations
MovieLens dataset
F1 measure
comparable results (gap not statistically significant)
C. Musto, F. Narducci, P. Basile, P. Lops, M. de Gemmis, G. Semeraro:"Cross-language information filtering: Word sense disambiguation
vs. distributional models." AI*IA 2011: 250-261
![Page 372: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/372.jpg)
What?
Explaining recommendations
![Page 373: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/373.jpg)
Graph-based RecSys: explanations
are properties useful for providing explanations?
advantages
o readability of properties
o …
disadvantages
o difficult to generate natural language explanations
o selection of the properties to include in the explanation
![Page 374: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/374.jpg)
Graph-based RecSys: explanations
main intuition of the EXPLOD system
o graph connecting user preferences and
recommendations through a set of
LOD-based properties
o scoring and ranking properties
Cataldo Musto, Fedelucio Narducci , Pasquale Lops, Marco de Gemmis, Giovanni Semeraro:
. RecSys 2016, to appear
![Page 375: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/375.jpg)
Scoring properties in EXPLOD
[Figure: graph connecting the items in the user profile and the items in the recommendation list through a property c]
number of edges connecting the property c with the items in the user profile
number of edges connecting the property c with the items in the recommendation set
Graph-based RecSys: explanations
![Page 376: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/376.jpg)
Scoring properties in EXPLOD
weighting factors
adaptation of the Inverse Document Frequency
Graph-based RecSys: explanations
![Page 377: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/377.jpg)
Scoring properties in EXPLOD
a higher score is given to properties that are highly connected to the items in
both the user profile and the recommendation list,
and that are not common.
Graph-based RecSys: explanations
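The formula itself did not survive in this transcript, so the sketch below is one scoring function consistent with the description (edge counts on both sides, weighting factors, an IDF-like term), not necessarily the exact EXPLOD formula. The weights `alpha` and `beta`, the normalization by set size, the property identifiers and the catalog entries are assumptions.

```python
import math

# Toy catalog: item -> LOD-style properties (hypothetical identifiers and data).
CATALOG = {
    "Inception": {"location:United Kingdom", "producer:Christopher Nolan", "type:Film"},
    "Forrest Gump": {"location:United Kingdom", "type:Film"},
    "The Dark Knight": {"location:United Kingdom", "producer:Christopher Nolan", "type:Film"},
    "Movie A": {"type:Film"},
    "Movie B": {"type:Film"},
}

def idf(prop, catalog):
    """IDF-like term: properties shared by many items get a lower weight."""
    df = sum(1 for props in catalog.values() if prop in props)
    return math.log(len(catalog) / df)

def score_properties(profile, recs, catalog, alpha=0.5, beta=0.5):
    """Score the properties that connect profile items with recommended items."""
    in_profile = set().union(*(catalog[item] for item in profile))
    in_recs = set().union(*(catalog[item] for item in recs))
    scores = {}
    for prop in in_profile & in_recs:
        n_profile = sum(1 for item in profile if prop in catalog[item])
        n_recs = sum(1 for item in recs if prop in catalog[item])
        connectivity = alpha * n_profile / len(profile) + beta * n_recs / len(recs)
        scores[prop] = connectivity * idf(prop, catalog)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

ranked = score_properties(["Inception", "Forrest Gump"], ["The Dark Knight"], CATALOG)
```

In this toy catalog `type:Film` is attached to every item, so its IDF-like term is zero and it never reaches the top of the ranking, which mirrors the "not common" requirement.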
![Page 378: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/378.jpg)
From properties to Natural Language explanation
I recommend you The Dark Knight because you often like Films
shot in the United Kingdom as Inception and Forrest Gump. In
addition, you sometimes like Films produced by Christopher
Nolan as Inception and Screenplays by Christopher Nolan as
Inception
Graph-based RecSys: explanations
![Page 379: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/379.jpg)
From properties to Natural Language explanation
o top-ranked properties to fill in a template-based structure of
the explanation
o LOD cloud-properties mapped to natural language
expressions
o adverbs obtained by mapping the normalized occurrence
of that property to a different range of the score
Graph-based RecSys: explanations
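A template-based generator along these lines might look as follows. The threshold that separates "often" from "sometimes", the `PROPERTY_EXPRESSIONS` mapping and the function names are assumptions made for this sketch.

```python
# Hypothetical mapping from LOD properties to natural language expressions.
PROPERTY_EXPRESSIONS = {
    "location:United Kingdom": "Films shot in the United Kingdom",
    "producer:Christopher Nolan": "Films produced by Christopher Nolan",
}

def adverb(normalized_occurrence):
    """Map the share of profile items having the property to an adverb (assumed ranges)."""
    return "often" if normalized_occurrence >= 0.66 else "sometimes"

def explain(recommended, ranked_properties, profile_items_by_property, profile_size):
    """Fill the template with the top-ranked properties, in ranking order."""
    clauses = []
    for prop in ranked_properties:
        items = profile_items_by_property[prop]
        frequency = adverb(len(items) / profile_size)
        clauses.append(f"you {frequency} like {PROPERTY_EXPRESSIONS[prop]} as {' and '.join(items)}")
    text = f"I recommend you {recommended} because {clauses[0]}."
    if len(clauses) > 1:
        text += " In addition, " + " and ".join(clauses[1:]) + "."
    return text

explanation = explain(
    "The Dark Knight",
    ["location:United Kingdom", "producer:Christopher Nolan"],
    {
        "location:United Kingdom": ["Inception", "Forrest Gump"],
        "producer:Christopher Nolan": ["Inception"],
    },
    profile_size=2,
)
```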
![Page 380: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/380.jpg)
Graph-based RecSys: explanations
preliminary results
aim: question
transparency: I understood why this movie was recommended to me
o topic o director o distributor o music composer
persuasion: The explanation made the recommendation more convincing
o awards o director o location o producer
engagement: The explanation helped me discover new information about this movie
o writer o director o producer o distributor
trust: The explanation increased my trust in the recommender system
o awards o composer o producer o topic
effectiveness: I like this recommendation
o director o writer o location o composer
user study involving 308 subjects
![Page 381: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/381.jpg)
What?
Semantic analysis of social streams
![Page 382: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/382.jpg)
Social Networks
novel data silos
![Page 383: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/383.jpg)
Goal
To design a unique framework implementing
a pipeline of semantic analysis techniques
![Page 384: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/384.jpg)
Our Contribution: CrowdPulse
Musto, Cataldo, et al. CrowdPulse: A framework for real-time semantic analysis of social streams
Information Systems, 54 (2015): 127-146.
![Page 385: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/385.jpg)
Our Contribution: CrowdPulse
![Page 386: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/386.jpg)
CrowdPulse Workflow
![Page 387: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/387.jpg)
CrowdPulse Workflow
Each «project» is an instance of such a workflow
![Page 388: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/388.jpg)
CrowdPulse Workflow
Each «project» is an instance of such a workflow
Every module is set according to the needs of the scenario
![Page 389: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/389.jpg)
CrowdPulse
![Page 390: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/390.jpg)
CrowdPulse
Solution:
Entity Linking Algorithms
Input: textual content
Output: identification and
disambiguation of the
entities mentioned in the
text.
![Page 391: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/391.jpg)
CrowdPulse
Solution:
![Page 392: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/392.jpg)
CrowdPulse
Solution:
Overall sentiment: :-(
We implemented:
• A lexicon-based approach, which assigns a sentiment
according to the words in the social content
• A supervised classification algorithm, which exploits
labeled examples to learn a classification model
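The first of the two approaches can be sketched as below. The lexicon entries and their polarity values are invented for illustration, and summing word polarities is only one simple way to aggregate them; the lexicons and rules actually used in CrowdPulse are not shown in these slides.

```python
import re

# Tiny hand-made polarity lexicon (values are assumptions for this sketch).
LEXICON = {
    "amazing": 1.0, "cheapest": 0.5, "relax": 0.5,
    "lost": -1.0, "anxious": -0.5, ":(": -1.0, ":-(": -1.0,
}

def sentiment_score(text, lexicon=LEXICON):
    """Sum the polarity of every word or emoticon found in the lexicon."""
    tokens = re.findall(r"[:;]-?[()]|[a-z']+", text.lower())
    return sum(lexicon.get(token, 0.0) for token in tokens)

def sentiment_label(text):
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For example, `sentiment_label("Ryanair lost my luggage :(")` yields `"negative"` with this lexicon, since both `lost` and the emoticon carry negative polarity.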
![Page 393: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/393.jpg)
CrowdPulse
Step 4: Domain-Specific Processing
Supervised learning: classification, regression tasks
Unsupervised learning: clustering
Linguistic Analysis (Distributional Models): building word spaces, similarity between concepts, analysis of term usage, etc.
CrowdPulse natively supports all these methodologies
The choice is typically scenario-dependent
![Page 394: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/394.jpg)
CrowdPulse
![Page 395: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/395.jpg)
CrowdPulse Workflow (recap)
![Page 396: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/396.jpg)
![Page 397: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/397.jpg)
Use Case
L’Aquila Social Urban Network
Research Question:
Is it possible to extract and process social
media content to monitor in real time people's feelings,
opinions and sentiments about the current
state of the social capital of L’Aquila?
![Page 398: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/398.jpg)
Use Case
L’Aquila Social Urban Network
![Page 399: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/399.jpg)
Use Case
L’Aquila Social Urban Network
Heuristics:
- Twitter users (local newspapers, mentions of politicians)
- Twitter content+geo (50 km around L’Aquila and/or specific hashtags such as #laquila,
#earthquake, etc.)
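The content+geo heuristic can be sketched as a simple filter. The coordinates of L’Aquila are approximate, and the tweet representation (a text plus optional latitude/longitude) is an assumption; the way CrowdPulse actually queries Twitter is not covered here.

```python
import math

LAQUILA = (42.35, 13.40)  # approximate latitude/longitude, assumed for this sketch
HASHTAGS = {"#laquila", "#earthquake"}

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def is_relevant(text, coords=None, radius_km=50.0):
    """Keep a tweet posted within the radius and/or carrying one of the hashtags."""
    tags = {tok.lower().rstrip(".,!?") for tok in text.split() if tok.startswith("#")}
    near = coords is not None and haversine_km(coords, LAQUILA) <= radius_km
    return near or bool(tags & HASHTAGS)

ROME = (41.9028, 12.4964)  # outside the 50 km radius
```

With these values a tweet geolocated in Rome is kept only if it carries one of the hashtags, while any geolocated tweet a few kilometres from the city centre is kept regardless of its text.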
![Page 400: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/400.jpg)
Use Case
L’Aquila Social Urban Network
Extracted Content
![Page 401: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/401.jpg)
Use Case
L’Aquila Social Urban Network
Semantic and Sentiment Analysis of Extracted Content
![Page 402: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/402.jpg)
Use Case
L’Aquila Social Urban Network
Domain-specific Task
Given a piece of social content,
we have to classify it against the social indicators
![Page 403: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/403.jpg)
Use Case
L’Aquila Social Urban Network
Classification Task
Build a supervised classification model
for each social indicator
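A minimal sketch of "one supervised model per social indicator" follows. The slides do not name the learning algorithm or the features at this point, so the tiny Naive Bayes classifier, the bag-of-words input, and the indicator names (`solidarity`, `institutions`) are all assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter


class BinaryNaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative)."""

    def fit(self, docs, labels):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = Counter(labels)
        for tokens, label in zip(docs, labels):
            self.counts[label].update(tokens)
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _log_score(self, tokens, label):
        total = sum(self.counts[label].values())
        # Smoothed class prior, then smoothed per-token likelihoods.
        score = math.log((self.n_docs[label] + 1) / (sum(self.n_docs.values()) + 2))
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.counts[label][tok] + 1)
                                  / (total + len(self.vocab)))
        return score

    def predict(self, tokens):
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def train_per_indicator(docs, labels_by_indicator):
    """Train one independent binary model for each indicator."""
    return {ind: BinaryNaiveBayes().fit(docs, labels)
            for ind, labels in labels_by_indicator.items()}


# Toy training data: tokenized tweets and one label list per indicator.
docs = [
    ["neighbours", "helped", "rebuild"],
    ["thanks", "neighbours", "volunteers"],
    ["council", "ignores", "citizens"],
    ["council", "meeting", "postponed"],
]
labels_by_indicator = {
    "solidarity": [True, True, False, False],
    "institutions": [False, False, True, True],
}
models = train_per_indicator(docs, labels_by_indicator)

tweet = ["volunteers", "helped", "neighbours"]
predicted = {ind for ind, model in models.items() if model.predict(tweet)}
```

Because each indicator has its own binary model, a single Tweet can be assigned to zero, one, or several indicators.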
![Page 404: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/404.jpg)
Use Case
L’Aquila Social Urban Network
Input: Tweet + Social Indicators Classification Model
![Page 405: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/405.jpg)
Use Case
L’Aquila Social Urban Network
Output: Social Indicators and Sentiment conveyed
![Page 406: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/406.jpg)
Use Case
L’Aquila Social Urban Network
The «score» of a social indicator is the sum of the
sentiment conveyed by the Tweets classified under it
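The scoring rule can be sketched as a plain aggregation. The sketch assumes that each Tweet carries a numeric sentiment value (here -1, 0, or +1) and a set of indicators assigned by the classifiers; the actual sentiment scale and the indicator names are not given on the slide and are hypothetical.

```python
from collections import defaultdict


def indicator_scores(classified_tweets):
    """Sum the sentiment of the Tweets assigned to each indicator.

    classified_tweets: iterable of (set_of_indicators, sentiment) pairs.
    A Tweet assigned to several indicators contributes to each of them.
    """
    scores = defaultdict(float)
    for indicators, sentiment in classified_tweets:
        for indicator in indicators:
            scores[indicator] += sentiment
    return dict(scores)


classified = [
    ({"solidarity"}, 1.0),
    ({"solidarity", "institutions"}, -1.0),
    ({"institutions"}, -1.0),
]
scores = indicator_scores(classified)
```

With this rule, positive and negative Tweets cancel out, so a score near zero can mean either little discussion or a balanced one.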
![Page 407: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/407.jpg)
Use Case
L’Aquila Social Urban Network
Overall score of the social indicators between March and August 2014
![Page 408: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/408.jpg)
Use Case
L’Aquila Social Urban Network
COMMUNITY PROMOTER
DEFINES SOME INITIATIVES TO EMPOWER THE SOCIAL CAPITAL
MONITORS THE STATE OF THE SOCIAL INDICATORS
Connecting Real Communities
With Virtual Communities
![Page 409: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/409.jpg)
Use Case
The Italian Hate Map
Research Question
Is it possible to exploit
techniques for semantic
analysis of social media to
detect intolerant content
posted on social networks and
identify the most at-risk areas
of Italy?
Inspired by the Hate Map built by Humboldt University:
http://users.humboldt.edu/mstephens/hate/hate_map.html
![Page 410: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/410.jpg)
Use Case
The Italian Hate Map
Heuristics: Twitter content
- 76 intolerant seed terms, defined by the team of psychologists
- 5 intolerance dimensions: violence (against women), racism,
homophobia, disability, anti-semitism
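A seed-term heuristic of this kind can be sketched as a lookup from terms to intolerance dimensions. The real lexicon of 76 terms is not reproduced here; the entries below are neutral placeholders, and the function name is hypothetical.

```python
import re

# Placeholder seed terms mapped to intolerance dimensions (not the real lexicon).
SEED_TERMS = {
    "seedterm_a": "racism",
    "seedterm_b": "homophobia",
    "seedterm_c": "disability",
}


def match_dimensions(text):
    """Return the dimensions whose seed terms appear in the text."""
    tokens = re.findall(r"\w+", text.lower())
    return {SEED_TERMS[tok] for tok in tokens if tok in SEED_TERMS}
```

The match is purely lexical: it fires on any occurrence of a seed term, whatever the sense or intent of the Tweet.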
![Page 411: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/411.jpg)
Use Case
The Italian Hate Map
Many non-intolerant Tweets are extracted!
![Page 412: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/412.jpg)
Use Case
The Italian Hate Map
Solution: replace a simple keyword-based approach
with a supervised classification model
![Page 413: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/413.jpg)
Use Case
The Italian Hate Map
Entities and Wikipedia categories used as features
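One possible way to build such features is sketched below. Entity linking is replaced by a toy lookup table, so `ENTITY_INDEX`, its entries, and the feature naming scheme are assumptions made for illustration; a real pipeline would call an entity linker instead.

```python
from collections import Counter

# Toy stand-in for an entity linker: surface form -> (entity, categories).
# The names are illustrative and not taken from the source.
ENTITY_INDEX = {
    "juventus": ("Juventus_F.C.", ["Football_clubs_in_Italy"]),
}


def extract_features(text):
    """Bag of words enriched with linked entities and their categories."""
    features = Counter()
    for tok in text.lower().split():
        features["word=" + tok] += 1
        if tok in ENTITY_INDEX:
            entity, categories = ENTITY_INDEX[tok]
            features["entity=" + entity] += 1
            for category in categories:
                features["category=" + category] += 1
    return features


features = extract_features("Forza Juventus")
```

Category features let Tweets that mention different entities of the same kind share evidence, which plain keywords cannot do.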
![Page 414: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/414.jpg)
Use Case
The Italian Hate Map
A sample of Tweets is manually labeled to build the classification models
![Page 415: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/415.jpg)
Use Case
The Italian Hate Map
![Page 416: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/416.jpg)
Use Case
The Italian Hate Map
Tweet about an Italian minister
Tweet about an Italian football player
Violence against women Disability
Racism Homophobia
![Page 417: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/417.jpg)
Use Case
The Italian Hate Map
Given the maps and the output of the linguistic analysis of
intolerant Tweets (co-occurrences between terms, time lapse, etc.), the
team of psychologists defined some guidelines to tackle and prevent
intolerant behaviors.
These guidelines were freely distributed to public
administrations in early 2015.
![Page 418: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/418.jpg)
CrowdPulse
Lessons Learned
REAL-TIME SEMANTIC CONTENT ANALYSIS
Pipeline of state-of-the-art techniques: Entity Linking, Sentiment Analysis, Machine Learning, Data Visualization
Use Cases:
1. L’Aquila Social Urban Network
2. The Italian Hate Map
The outcomes of both use cases showed that very
complex phenomena can be analyzed in a totally new
way, thanks to the huge availability of textual data
![Page 419: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/419.jpg)
Readings
Semantics-aware Recommender Systems
o C. Musto, G. Semeraro, M. de Gemmis, P. Lops: Learning Word Embeddings from Wikipedia for Content-
Based Recommender Systems. ECIR 2016: 729-734
o M. de Gemmis, P. Lops, C. Musto, F. Narducci, G. Semeraro: Semantics-Aware Content-Based
Recommender Systems. Recommender Systems Handbook 2015: 119-159
o C. Musto, G. Semeraro, M. de Gemmis, P. Lops: Word Embedding Techniques for Content-based
Recommender Systems: An Empirical Evaluation. RecSys Posters 2015
o C. Musto, P. Basile, M. de Gemmis, P. Lops, G. Semeraro, S. Rutigliano: Automatic Selection of Linked
Open Data Features in Graph-based Recommender Systems. CBRecSys@RecSys 2015: 10-13
o P. Basile, C. Musto, M. de Gemmis, P. Lops, F. Narducci, G. Semeraro: Content-Based Recommender
Systems + DBpedia Knowledge = Semantics-Aware Recommender Systems. SemWebEval@ESWC
2014: 163-169
o C. Musto, P. Basile, P. Lops, M. de Gemmis, G. Semeraro: Linked Open Data-enabled Strategies for Top-N
Recommendations. CBRecSys@RecSys 2014: 49-56
o C. Musto, G. Semeraro, P. Lops, M. de Gemmis: Combining Distributional Semantics and Entity Linking
for Context-Aware Content-Based Recommendation. UMAP 2014: 381-392
o C. Musto, G. Semeraro, P. Lops, M. de Gemmis: Contextual eVSM: A Content-Based Context-Aware
Recommendation Framework Based on Distributional Semantics. EC-Web 2013: 125-136
o C. Musto, F. Narducci, P. Lops, G. Semeraro, M. de Gemmis, M. Barbieri, J. H. M. Korst, V. Pronk, R. Clout:
Enhanced Semantic TV-Show Representation for Personalized Electronic Program Guides. UMAP
2012: 188-199
o M. Degemmis, P. Lops, G. Semeraro: A content-collaborative recommender that exploits WordNet-based
user profiles for neighborhood formation. User Model. User-Adapt. Interact. 17(3): 217-255 (2007)
o G. Semeraro, M. Degemmis, P. Lops, P. Basile: Combining Learning and Word Sense Disambiguation for
Intelligent User Profiling. IJCAI 2007: 2856-2861
![Page 420: Semantics-aware Techniques for Social Media Analysis, User Modeling and Recommender Systems - Tutorial @UMAP 2016](https://reader031.fdocuments.net/reader031/viewer/2022030315/5883f1a71a28ab34428b654b/html5/thumbnails/420.jpg)
Readings
Cross-language Recommender Systems
o C. Musto, F. Narducci, P. Basile, P. Lops, M. de Gemmis, G. Semeraro: Cross-Language Information
Filtering: Word Sense Disambiguation vs. Distributional Models. AI*IA 2011: 250-261
o P. Lops, C. Musto, F. Narducci, M. de Gemmis, P. Basile, G. Semeraro: Cross-Language Personalization
through a Semantic Content-Based Recommender System. AIMSA 2010: 52-60
Explanations
o C. Musto, F. Narducci, P. Lops, M. de Gemmis, G. Semeraro: ExpLOD: a framework for Explaining
Recommendations based on the Linked Open Data cloud. RecSys 2016, to appear
Semantic Analysis of Social Streams
o C. Musto, G. Semeraro, P. Lops, M. de Gemmis: CrowdPulse: A framework for real-time semantic
analysis of social streams. Inf. Syst. 54: 127-146 (2015)