Semantics-Based News Recommendation with SF-IDF+
International Conference on Web Intelligence, Mining, and Semantics (WIMS 2013)
June 13, 2013
Marnix Moerland
Michel Capelle
Frederik Hogenboom
Flavius Frasincar
Erasmus University Rotterdam
PO Box 1738, NL-3000 DR Rotterdam, the Netherlands
Introduction (1)
• Recommender systems help users to plough through a massive and increasing amount of information
• Recommender systems:
  – Content-based
  – Collaborative filtering
  – Hybrid
• Content-based systems are often term-based
• Common measure: Term Frequency – Inverse Document Frequency (TF-IDF) as proposed by Salton and Buckley [1988]
Introduction (2)
• One could take semantics into account:
  – Semantic Similarity (SS) recommenders:
    • Jiang & Conrath [1997]
    • Leacock & Chodorow [1998]
    • Lin [1998]
    • Resnik [1995]
    • Wu & Palmer [1994]
  – Concepts instead of terms → Concept Frequency – Inverse Document Frequency (CF-IDF):
    • Reduces noise caused by non-meaningful terms
    • Yields fewer terms to evaluate
    • Allows for semantic features, e.g., synonyms
    • Relies on a domain ontology
    • Published at WIMS 2011
Introduction (3)
• One could take semantics into account:
  – Synsets instead of concepts → Synset Frequency – Inverse Document Frequency (SF-IDF):
    • Similar to CF-IDF
    • Does not rely on a domain ontology
    • Published at WIMS 2012
  – Research has shown that relationships like synonymy, hyponymy, … provide structure and contribute to an improved level of interpretability
  – Hence, we propose SF-IDF+, which additionally accounts for the semantic relationships between synsets
Introduction (4)
• Implementations in Ceryx (as a plug-in for Hermes [Frasincar et al., 2009], a news processing framework)
• What is the performance of semantic recommenders?
  – SF-IDF+ vs. SF-IDF
  – SF-IDF+ vs. TF-IDF
  – SF-IDF+ vs. SS
Framework: User Profile
• User profile consists of all read news items
• Implicit preference for specific topics
Framework: Preprocessing
• Before recommendations can be made, each news item is parsed using:
  – Tokenizer
  – Sentence splitter
  – Lemmatizer
  – Part-of-Speech tagger
Framework: Synsets
• We make use of the WordNet dictionary and Word Sense Disambiguation (WSD)
• Each word has a set of senses, and each sense has a set of semantically equivalent synonyms (a synset):
  – Turkey:
    • turkey, Meleagris gallopavo (animal)
    • Turkey, Republic of Turkey (country)
    • joker, turkey (annoying person)
    • turkey, bomb, dud (failure)
  – Fly:
    • fly, aviate, pilot (operate airplane)
    • flee, fly, take flight (run away)
• Synsets are linked using semantic pointers
  – Hypernym, hyponym, …
Framework: TF-IDF
• Term Frequency: the occurrence of a term $t_i$ in a document $d_j$, i.e.,

$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

• Inverse Document Frequency: the occurrence of a term $t_i$ in a set of documents $D$, i.e.,

$$idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$$

• And hence

$$\text{tf-idf}_{i,j} = tf_{i,j} \times idf_i$$
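The TF-IDF definition on this slide can be illustrated with a minimal Python sketch. It assumes documents are already tokenized and lemmatized, and it uses the natural logarithm because the slides do not state the log base; the function name `tf_idf` and the toy documents are hypothetical and not taken from Ceryx.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: tf-idf score} dict per document.

    documents: list of token lists (assumed tokenized and lemmatized).
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))

    scores = []
    for doc in documents:
        counts = Counter(doc)          # n_{i,j} for every term in this document
        total = sum(counts.values())   # sum over k of n_{k,j}
        scores.append({
            # tf * idf; natural log is an assumption (base not given).
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus, for illustration only.
docs = [
    ["turkey", "election", "vote"],
    ["turkey", "recipe", "oven"],
    ["election", "poll", "vote", "vote"],
]
for doc_scores in tf_idf(docs):
    print(doc_scores)
```

A term that occurs in every document receives a score of zero, which is the intended effect of the inverse document frequency factor.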
Framework: SF-IDF
• Synset Frequency: the occurrence of a synset $s_i$ in a document $d_j$, i.e.,

$$sf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

• Inverse Document Frequency: the occurrence of a synset $s_i$ in a set of documents $D$, i.e.,

$$idf_i = \log \frac{|D|}{|\{j : s_i \in d_j\}|}$$

• And hence

$$\text{sf-idf}_{i,j} = sf_{i,j} \times idf_i$$
Framework: SF-IDF+
• Synset Frequency: the occurrence of a synset $s_i$ and its related synsets $r_i$ in a document $d_j$, i.e.,

$$sf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

• Inverse Document Frequency: the occurrence of synsets $s_i$ and $r_i$ in a set of documents $D$, i.e.,

$$idf_i = \log \frac{|D|}{|\{j : s_i, r_i \in d_j\}|}$$

• Weighting is applied depending on relations, and hence

$$\text{sf-idf+}_{i,j,r} = sf_{i,j} \times idf_i \times w_r$$
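One possible reading of the SF-IDF+ weighting is sketched below in Python. Everything specific here is an assumption made for illustration: the relation table `RELATED` stands in for WordNet's semantic pointers, the weights in `WEIGHTS` are invented (the slides do not list the values used), and keeping the highest score when a synset is reached in more than one way is a design choice of this sketch, not something the slides confirm.

```python
import math
from collections import Counter

# Hypothetical relation weights w_r; the slides do not give actual values.
WEIGHTS = {"self": 1.0, "hypernym": 0.5, "hyponym": 0.3}

# Hypothetical toy relation table standing in for WordNet semantic pointers.
RELATED = {
    "turkey.bird": [("hypernym", "bird")],
    "chicken.bird": [("hypernym", "bird")],
}


def extend(doc):
    """Add related synsets to a document; returns (synset, relation) pairs."""
    extended = [(s, "self") for s in doc]
    for s in doc:
        extended += [(r, rel) for rel, r in RELATED.get(s, [])]
    return extended


def sf_idf_plus(documents):
    """Return one {synset: score} dict per document (highest score kept)."""
    extended_docs = [extend(doc) for doc in documents]
    n_docs = len(extended_docs)
    # Document frequency over the extended synset sets.
    doc_freq = Counter(
        s for doc in extended_docs for s in {syn for syn, _ in doc}
    )
    results = []
    for doc in extended_docs:
        counts = Counter(s for s, _ in doc)
        total = sum(counts.values())
        scores = {}
        for synset, relation in doc:
            sf = counts[synset] / total
            idf = math.log(n_docs / doc_freq[synset])
            score = sf * idf * WEIGHTS[relation]  # sf * idf * w_r
            scores[synset] = max(score, scores.get(synset, 0.0))
        results.append(scores)
    return results


# Toy documents given as lists of (already disambiguated) synset identifiers.
docs = [
    ["turkey.bird", "oven"],
    ["chicken.bird", "farm"],
    ["election", "vote"],
]
for doc_scores in sf_idf_plus(docs):
    print(doc_scores)
```

In this toy corpus the first two documents share no synset directly, but both obtain a non-zero score for the related synset `bird`, which is how relationships can let otherwise disjoint items match.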
Framework: SS (1)
• TF-IDF and SF-IDF(+) use cosine similarity:
  – Two vectors:
    • User profile item scores
    • News message item scores
  – Measures the cosine of the angle between the vectors
• Semantic Similarity (SS):
  – Two vectors:
    • User profile synsets
    • News message synsets
  – Jiang & Conrath [1997], Resnik [1995], and Lin [1998]: information content of synsets
  – Leacock & Chodorow [1998] and Wu & Palmer [1994]: path length between synsets
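The cosine similarity used to compare a user profile with a news message can be sketched as follows. The sketch assumes both vectors are stored as sparse dictionaries that map a term or synset to its score; the function name and the zero-vector convention are choices made here, not details from the slides.

```python
import math


def cosine_similarity(profile, item):
    """Cosine of the angle between two sparse score vectors (dicts)."""
    # Only keys present in both vectors contribute to the dot product.
    dot = sum(score * item[key] for key, score in profile.items() if key in item)
    norm_profile = math.sqrt(sum(v * v for v in profile.values()))
    norm_item = math.sqrt(sum(v * v for v in item.values()))
    if norm_profile == 0.0 or norm_item == 0.0:
        return 0.0  # convention chosen here for empty or all-zero vectors
    return dot / (norm_profile * norm_item)


# Toy score vectors, for illustration only.
user_profile = {"election": 0.4, "vote": 0.3}
news_item = {"election": 0.2, "poll": 0.5}
print(cosine_similarity(user_profile, news_item))
```

Because the result depends only on the angle between the vectors, a long article and a short one with the same relative score distribution are rated equally similar to the profile.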
Framework: SS (2)
• SS score is calculated by computing the pair-wise similarities between synsets in the unread document u and the user profile r:

$$rank(u) = \frac{\sum_{(u,r) \in W} sim(u,r)}{|W|}$$

where W is a vector with all combinations of synsets from r and u that have a common Part-of-Speech, and where sim(u,r) is any of the mentioned SS measures.
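The SS ranking described on this slide can be sketched in Python as an average over all synset pairs that share a Part-of-Speech. The representation of synsets as `(name, pos)` tuples and the placeholder measure `toy_sim` are assumptions for illustration; in practice `sim` would be one of the five measures named earlier, computed over WordNet.

```python
def ss_rank(unread, profile, sim):
    """Average pairwise similarity over synset pairs with a common POS.

    unread, profile: lists of (synset, pos) tuples.
    sim: function taking two synsets and returning a similarity score.
    """
    # W: all combinations of synsets that have the same Part-of-Speech.
    pairs = [
        (u, r)
        for u, u_pos in unread
        for r, r_pos in profile
        if u_pos == r_pos
    ]
    if not pairs:
        return 0.0  # convention chosen here when no pair shares a POS
    return sum(sim(u, r) for u, r in pairs) / len(pairs)


def toy_sim(a, b):
    """Placeholder similarity; NOT one of the measures from the slides."""
    return 1.0 if a == b else 0.25


unread_item = [("turkey", "n"), ("fly", "v")]
user_profile = [("turkey", "n"), ("bird", "n"), ("flee", "v")]
print(ss_rank(unread_item, user_profile, toy_sim))
```

Passing the similarity measure as a parameter keeps the ranking step identical across the five SS variants, so only `sim` changes between them.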
Implementation: Hermes
• The Hermes framework is used for building a news personalization service for RSS
• Its implementation is the Hermes News Portal (HNP):
  – Programmed in Java
  – Uses OWL / SPARQL / Jena / GATE / WordNet
Implementation: Ceryx
• Ceryx is a plug-in for HNP
• Uses WordNet / Stanford POS Tagger / JAWS lemmatizer / Lesk WSD
• Main focus is on recommendation support
• User profiles are constructed
• Computes TF-IDF, SF-IDF, SF-IDF+, and SS
Evaluation (1)
• Experiment:
  – We let 19 participants evaluate 100 news items
  – We use 8 different user profiles focusing on various topics
  – Ceryx computes TF-IDF, SF-IDF, SF-IDF+, and SS for various cut-off values
  – F1 scores are evaluated
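How an F1 score could be computed for one cut-off value is sketched below. The sketch assumes that an item is recommended when its similarity score is at least the cut-off and that participant judgments are available as a set of relevant item identifiers; both the thresholding rule and the toy data are assumptions, since the slides do not spell out the procedure.

```python
def f1_at_cutoff(scores, relevant, cutoff):
    """F1 of recommending every item whose score is at least `cutoff`.

    scores: {item_id: similarity score} for the unread items.
    relevant: set of item_ids judged interesting by the user.
    """
    recommended = {item for item, score in scores.items() if score >= cutoff}
    true_pos = len(recommended & relevant)
    if true_pos == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = true_pos / len(recommended)
    recall = true_pos / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Toy scores and judgments, for illustration only.
scores = {"n1": 0.9, "n2": 0.6, "n3": 0.4, "n4": 0.1}
relevant = {"n1", "n3"}
for cutoff in (0.3, 0.5, 0.95):
    print(cutoff, f1_at_cutoff(scores, relevant, cutoff))
```

Sweeping the cut-off this way exposes the trade-off between precision and recall, which is why the evaluation reports results for various cut-off values rather than a single threshold.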
Evaluation (2)
• Results:
[Figure: results for TF-IDF, SF-IDF+, and SS]
Conclusions
• Recommendation is commonly performed using TF-IDF
• Semantics can be taken into account by considering synsets and their relations
• Semantics-based recommendation outperforms the classic term-based recommendation
• Future work:
  – Also employ the similarity of words (e.g., named entities) missing from WordNet (e.g., based on the Google Distance)
  – Compare SF-IDF, SF-IDF+, and SS with LDA (Latent Dirichlet Allocation) and ESA (Explicit Semantic Analysis)
Questions