Semantic Recommendation Systems for Research 2.0

SEMANTIC RECOMMENDATION SYSTEMS FOR RESEARCH 2.0 OR A Conceptual Prototype for a Twitter based Recommender System for Research 2.0 by Patrick Thonhauser Thursday, October 11, 12

Transcript of Semantic Recommendation Systems for Research 2.0

Page 1: Semantic Recommendation Systems for Research 2.0

SEMANTIC RECOMMENDATION SYSTEMS FOR RESEARCH 2.0

OR

A Conceptual Prototype for a Twitter-Based Recommender System for Research 2.0

by Patrick Thonhauser


Page 2

OUTLINE

• Motivation

• Basics (Semantic Web, Recommender Systems, Natural Language Processing)

• Conceptual Prototype

• Test Results and Discussion

• Questions


Page 3

MOTIVATION

• Is Twitter useful for discovering new connections between researchers in similar subject areas? (And why Twitter?)

• How much information can we extract from 140-character strings?

• Is it possible to separate useful information from noise?

• Are there any appropriate classifiers and metrics to measure the significance of Twitter users and Tweets?


Page 4

SEMANTIC WEB

• An additional layer of machine-readable information on top of the existing Web

• Linked Data (use URIs as names for things, use HTTP URIs so that those names can be looked up, provide useful information using standards such as RDF, include links to other URIs)

• RDF (based on subject, predicate, object triples) plays the same role for the Semantic Web that HTML plays for the classic Web

• Nearly all Semantic Web standards are based on RDF (e.g. FOAF, the Friend of a Friend project)
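To make the triple model concrete, here is a minimal sketch in plain Python (no RDF library) that writes a few FOAF statements about Twitter accounts as N-Triples, one `<subject> <predicate> <object> .` line per triple. Only the FOAF namespace (`http://xmlns.com/foaf/0.1/`) and the RDF `type` URI are real; the account namespace `http://example.org/twitter/` and the helper names `account`, `term` and `to_ntriples` are hypothetical, and a production system would more likely rely on an RDF library.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/twitter/"  # hypothetical namespace for Twitter accounts


def account(handle: str) -> str:
    """Mint a (hypothetical) URI for a Twitter handle."""
    return BASE + handle


def term(value: str) -> str:
    """Serialize one RDF term: URIs in angle brackets, anything else as a plain literal.

    Simplification for this sketch: any string starting with http(s):// is treated as a URI.
    """
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Each (subject, predicate, object) triple becomes one N-Triples line ending in ' .'."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


triples = [
    (account("alice"), RDF_TYPE, FOAF + "Person"),
    (account("alice"), FOAF + "name", "Alice"),
    (account("alice"), FOAF + "knows", account("bob")),
]
print(to_ntriples(triples))
```

The `foaf:knows` triple is the kind of statement that can record a connection between two accounts.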


Page 5

RECOMMENDER SYSTEMS

• Collaborative Filtering (user-based / item-based)

• Content Based Recommendation

• Knowledge Based Recommendation

• Hybrid Recommendations
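As an illustration of the first approach in the list (not of the prototype itself), the following is a minimal sketch of user-based collaborative filtering: a user's rating for an item is predicted as the similarity-weighted average of the ratings given by the k most similar users. Cosine similarity (treating unrated items as zero), the parameter `k` and the toy ratings are assumptions chosen for the example.

```python
from math import sqrt


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse rating vectors (unrated items count as 0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict(ratings: dict, user: str, item: str, k: int = 2):
    """Similarity-weighted average of the item's ratings by the k most similar users.

    Returns None when no other user has rated the item.
    """
    neighbours = sorted(
        ((cosine(ratings[user], r), r[item]) for u, r in ratings.items() if u != user and item in r),
        reverse=True,
    )[:k]
    weight = sum(abs(sim) for sim, _ in neighbours)
    return sum(sim * rating for sim, rating in neighbours) / weight if weight else None


# Toy data (hypothetical): user -> {item: rating}
ratings = {
    "alice": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "carol": {"a": 1, "b": 5, "c": 2},
}
print(round(predict(ratings, "alice", "c"), 2))
```

Because alice's ratings resemble bob's more than carol's, the prediction lands closer to bob's rating of 4 than to carol's 2. An item-based variant computes the similarities between item columns instead of user rows.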


Page 6

NATURAL LANGUAGE PROCESSING (NLP)

• Classification of Microtext Artefacts (This presentation is killer!)

• Applying NLP pipelines:

• End-of-Sentence Detection

• Tokenization

• POS (Part-of-Speech) Tagging

• Chunking

• Extraction
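The pipeline stages listed above can be sketched with the standard library alone. This is a toy under stated assumptions: a small hypothetical lexicon (`LEXICON`, using Brown-style tags such as `AT` and `JJ`) stands in for a trained POS tagger, unknown words default to `NN`, and chunking uses the assumed noun-phrase pattern "adjectives followed by nouns" (JJ* NN+). A real system would typically use an NLP toolkit with trained models for each stage.

```python
import re

# Hypothetical toy lexicon standing in for a trained POS tagger (Brown-style tags).
LEXICON = {
    "the": "AT", "a": "AT", "grand": "JJ", "jury": "NN", "commented": "VBD",
    "on": "IN", "number": "NN", "of": "IN", "cases": "NNS",
}


def split_sentences(text):
    """End-of-sentence detection: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Tokenization: words, with punctuation marks as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def pos_tag(tokens):
    """POS tagging by lexicon lookup; unknown words default to NN, punctuation to '.'."""
    return [(t, LEXICON.get(t.lower(), "NN" if t[0].isalnum() else ".")) for t in tokens]


def chunk_noun_phrases(tagged):
    """Chunking with the pattern JJ* NN+ (optional adjectives followed by nouns)."""
    # Map each tag to one character so the chunk grammar becomes a regular expression.
    codes = "".join("N" if tag.startswith("NN") else "J" if tag == "JJ" else "x" for _, tag in tagged)
    return [" ".join(word for word, _ in tagged[m.start():m.end()]) for m in re.finditer(r"J*N+", codes)]


def extract(text):
    """Extraction: run all stages and collect the noun phrases of every sentence."""
    phrases = []
    for sentence in split_sentences(text):
        phrases.extend(chunk_noun_phrases(pos_tag(tokenize(sentence))))
    return phrases


print(extract("The grand jury commented on a number of cases. A jury commented."))
```

For the sample text this prints `['grand jury', 'number', 'cases', 'jury']`.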


Page 7

THE CONCEPT OF THOUGHT BUBBLES

Let’s imagine that every Twitter user belongs to several different topic-related bubbles.


Page 8

LET’S SUMMARIZE

• A user is part of topic-related bubbles

• Twitter users within a topic-related bubble don’t necessarily know each other

• Connections of the service user’s existing connections (friends of friends) lead to new information

• Non-bidirectional connections are preferred

So how can we find such potentially interesting users?


Page 9

PROOF OF CONCEPT SYSTEM

(1) Preselection of the set of users to be analyzed in depth

(2) Apply an NLP pipeline to measure user similarity

(3) Categorize the top-n best-scoring users according to the idea of Thought Bubbles

(4) Recommend the top-n best-scoring users of a category to the user

(5) Analyze the acceptance of recommendations

[Architecture diagram: a service user's thought bubbles (e.g. iOS Dev, Social Media, Sports), the Twitter REST API, a Thought Bubbles API, and a server with NLP pre-filtering, clustering, categorisation and recommendation-analysis components backed by a DB]


Page 10

(1) PRE-SELECTION/FILTERING

Friends-of-friends Twitter accounts

-> Filter accounts that are already connected to you

-> Filter accounts where: follower_count < 300, status_count < 1000

-> Filter non-English-speaking accounts

-> Identify people by using a simple NLP pipeline

-> Set of Twitter accounts for further processing

• The set of friends-of-friends Twitter accounts changes from iteration to iteration

• Filters are added after analyzing the acceptance of recommendations
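The pre-selection steps above can be sketched as follows, under several assumptions. The follow graph is a plain dict (handle to set of followed handles) rather than data fetched from the Twitter API; "already connected" is taken to mean "already followed"; an account is dropped if either threshold is not met (the slide does not say whether the two conditions are combined with "and" or "or"); and `is_person` is a hypothetical hook standing in for the NLP-based people check. The `Account` record and its field names mirror the slide, not a specific API.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical profile record; field names follow the slide, not a specific API."""
    handle: str
    follower_count: int
    status_count: int
    lang: str


def friends_of_friends(follows, me):
    """Accounts followed by the accounts `me` follows, minus `me` and its direct connections."""
    direct = follows.get(me, set())
    candidates = set()
    for friend in direct:
        candidates |= follows.get(friend, set())
    return candidates - direct - {me}


def preselect(follows, profiles, me, is_person=lambda account: True):
    """Apply the filters in order; `is_person` stands in for the NLP-based people check."""
    selected = []
    for handle in sorted(friends_of_friends(follows, me)):
        account = profiles.get(handle)
        if account is None:
            continue
        # Assumption: an account is dropped if EITHER threshold is not met.
        if account.follower_count < 300 or account.status_count < 1000:
            continue
        if account.lang != "en":  # filter non-English-speaking accounts
            continue
        if not is_person(account):
            continue
        selected.append(handle)
    return selected


# Toy data (hypothetical)
follows = {"me": {"a", "b"}, "a": {"c", "d", "me"}, "b": {"a", "d", "e"}}
profiles = {
    "c": Account("c", 500, 2000, "en"),
    "d": Account("d", 100, 5000, "en"),  # too few followers
    "e": Account("e", 800, 5000, "de"),  # not English
}
print(preselect(follows, profiles, "me"))  # ['c']
```

Since new filter predicates are added after analyzing the acceptance of recommendations, each check could equally be kept in a list of predicate functions that grows over time.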


Page 11

(2) NLP PIPELINE

The 400 most recent Tweets of a potential recommendation are used to calculate the similarity measure.

Raw Tweets: "@testuser The grand jury commented on a number of…"

-> Tokenization and stripping of @mentions and URLs

-> POS tagging

POS-tagged Tweets: [('The', 'AT'), ('grand', 'JJ'), ('jury', 'NN'), ('commented', 'VBD'), ('on', 'IN'), ('a', 'AT'), ('number', 'NN'), ... ('.', '.')]

-> Neglect the 200 most used English words

-> Chunking

Mined nouns and phrases: [('jury', 'NN'), ('number', 'NN'), ('social dayly', 'NP'), ...]

-> Frequency distribution: [('jury', 34), ('social', 23), ('test case', 16), ...]

-> Filter top n words

-> DB: set of frequency-distributed mined nouns and phrases
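A minimal sketch of the frequency-distribution part of this pipeline, under stated assumptions: POS tagging and chunking are skipped (every remaining word is counted, not only mined nouns and phrases), a short hypothetical `STOPWORDS` set replaces the list of the 200 most used English words, tweets are assumed to be ordered newest first, and cosine similarity is used as the similarity measure because the slides do not name one. The parameter names `n` and `limit` are illustrative.

```python
import re
from collections import Counter
from math import sqrt

MENTION_OR_URL = re.compile(r"@\w+|https?://\S+")
# Hypothetical, heavily abridged stand-in for "the 200 most used English words".
STOPWORDS = {"the", "a", "an", "of", "on", "in", "and", "to", "is", "for", "with"}


def clean_tokens(tweet):
    """Lowercase, strip @mentions and URLs, tokenize, and drop stopwords."""
    text = MENTION_OR_URL.sub(" ", tweet.lower())
    return [t for t in re.findall(r"[a-z][a-z0-9']*", text) if t not in STOPWORDS]


def profile(tweets, n=50, limit=400):
    """Frequency distribution of the top-n terms in the `limit` most recent tweets.

    Assumes `tweets` is ordered newest first.
    """
    counts = Counter()
    for tweet in tweets[:limit]:
        counts.update(clean_tokens(tweet))
    return dict(counts.most_common(n))


def cosine(p, q):
    """Cosine similarity between two term-frequency profiles (an assumed measure)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


me = profile(["@testuser The grand jury commented on a number of cases http://t.co/x", "Jury duty again"])
candidate = profile(["The jury is out", "Grand jury news"])
print(me["jury"], round(cosine(me, candidate), 3))
```

Storing the `profile` dict per account corresponds to the "set of frequency-distributed mined nouns and phrases" written to the DB.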


Page 12

• Calculate the top-n users by applying single-linkage clustering

• Determine whether a user belongs to one of the service user’s specific bubbles

• Present recommendation lists to users

• Analyze the acceptance of recommendations (connecting user accounts with FOAF) and add a new filter predicate if necessary
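The clustering and categorisation steps can be sketched as below. `single_linkage` is a straightforward agglomerative single-linkage clustering: it repeatedly merges the two clusters whose closest members are nearest and stops when that distance exceeds a threshold. For readability the example clusters numbers; applied to users, the distance could be something like 1 minus the similarity of their term profiles (an assumption). The slides do not say how bubble membership is decided, so `assign_bubbles`, its keyword sets and `min_shared` are purely hypothetical.

```python
def single_linkage(items, distance, threshold):
    """Agglomerative single-linkage clustering.

    Repeatedly merges the two clusters whose closest members are nearest,
    and stops once that smallest distance exceeds `threshold`.
    """
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best = None  # (distance, i, j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def assign_bubbles(terms, bubbles, min_shared=2):
    """Names of the bubbles whose keyword set shares at least `min_shared` terms with a user."""
    return sorted(name for name, keywords in bubbles.items() if len(terms & keywords) >= min_shared)


print(single_linkage([1, 2, 3, 10, 11], lambda a, b: abs(a - b), threshold=1.5))
bubbles = {"iOS Dev": {"swift", "xcode", "ios"}, "Sports": {"football", "nba", "match"}}
print(assign_bubbles({"swift", "xcode", "football"}, bubbles))
```

The first call yields `[[1, 2, 3], [10, 11]]`; the second yields `['iOS Dev']`, since only one term overlaps with the Sports keywords.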


Page 13

SUPERVISED TEST RUN

[Chart: scores (axis from 0 to 0.300) of the candidate Twitter accounts analyzed in the supervised test run, e.g. @SebastianThrun, @elearning, @BarackObama, @ladygaga; recommendations are framed]


Page 14

UNSUPERVISED TEST RESULTS

The probability that a recommended item is relevant (precision) is 64.4%, with a standard deviation of 31.5%.
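The precision figure can be illustrated with a short calculation: per-user precision is the share of recommended accounts that the user accepted, and the reported numbers are the mean and standard deviation of those values across users. The data below is made up and is not the study's data; the slides also do not state whether a population or sample standard deviation was used, so `pstdev` is an assumption (`statistics.stdev` would give the sample version).

```python
from statistics import mean, pstdev


def precision(recommended, accepted):
    """Fraction of recommended accounts that the user judged relevant (accepted)."""
    if not recommended:
        return 0.0
    return len(set(recommended) & set(accepted)) / len(recommended)


# Hypothetical per-user results: (recommended accounts, accounts the user accepted)
results = [
    (["a", "b", "c", "d"], ["a", "c"]),
    (["e", "f"], ["e", "f"]),
]
scores = [precision(rec, acc) for rec, acc in results]
print(f"mean precision {mean(scores):.1%}, standard deviation {pstdev(scores):.1%}")
```

With these toy values the mean precision is 75.0% and the standard deviation is 25.0%.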


Page 15

DISCUSSION

Twitter IS useful for discovering new information in the sense of Research 2.0, but:

• Recommendations reflect the Twitter behavior of the user

• Automated tweets harm recommendation results (a single sentence gets an enormous weight because it occurs very often)

• Twitter's request limits (API rate limiting) are a showstopper

• Comparison with similar systems (content-based and collaborative filtering)


Page 16

THANK YOU!

ANY QUESTIONS?
