Semantic Recommendation Systems for Research 2.0

SEMANTIC RECOMMENDATION SYSTEMS FOR RESEARCH 2.0

OR

A Conceptual Prototype for a Twitter-based Recommender System for Research 2.0

by Patrick Thonhauser

Thursday, October 11, 12

OUTLINE

• Motivation

• Basics (Semantic Web, Recommender Systems, Natural Language Processing)

• Conceptual Prototype

• Test results and Discussion

• Questions


MOTIVATION

• Is Twitter useful for discovering new connections between researchers in similar subject areas (and why Twitter)?

• How much information can we extract from 140-character strings?

• Is it possible to separate useful information from noise?

• Are there any appropriate classifiers and metrics to measure the significance of Twitter users and Tweets?


SEMANTIC WEB

• Additional Layer of Information

• Linked Data (use URIs as names, use HTTP URIs, use standards to provide Information, include links to other URIs)

• RDF (based on triples -> subject, predicate, object) is like HTML for the classic web

• Nearly all semantic web standards are based on RDF (like FOAF - Friend of a Friend Project)
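
As a rough illustration of the triple model, the sketch below stores statements as plain Python tuples instead of using an RDF library. The example.org URIs and the `objects` helper are made up for illustration; only the FOAF namespace and its `name` and `knows` properties are taken from the FOAF vocabulary.

```python
# Minimal sketch: RDF-style statements as (subject, predicate, object) tuples.
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical resource URIs, used only for this example.
alice = "http://example.org/people/alice"
bob = "http://example.org/people/bob"

triples = [
    (alice, FOAF + "name", "Alice"),
    (bob, FOAF + "name", "Bob"),
    (alice, FOAF + "knows", bob),  # a link from one URI to another
]


def objects(subject, predicate):
    """Return the objects of all triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(alice, FOAF + "knows"))
```

Because subjects and objects are URIs, following a `knows` link leads to another resource that can be described in the same way.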


RECOMMENDER SYSTEMS

• Collaborative Filtering (user based/ item based)

• Content Based Recommendation

• Knowledge Based Recommendation

• Hybrid Recommendations


NATURAL LANGUAGE PROCESSING (NLP)

• Classification of Microtext Artefacts (e.g. “This presentation is killer!”)

• Applying NLP - Pipelines

• End of Sentence Detection

• Tokenization

• POS Tagging

• Chunking

• Extraction
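
The first two pipeline stages can be sketched with regular expressions alone. This is a deliberately naive version; the function names are hypothetical, and POS tagging and chunking are left out because they need a trained tagger from an NLP toolkit.

```python
import re


def split_sentences(text):
    """Naive end-of-sentence detection: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive tokenization: words (optionally with an apostrophe) or single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


text = "This presentation is killer! The grand jury commented on a number of issues."
for sentence in split_sentences(text):
    print(tokenize(sentence))
```

Real microtext breaks these rules often (abbreviations, emoticons, missing punctuation), which is one reason the stages are usually delegated to a toolkit.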


THE CONCEPT OF THOUGHT BUBBLES

Let’s imagine every Twitter user belongs to several different topic-related bubbles


• A user is part of topic related bubbles

• Twitter users within topic related bubbles don’t necessarily know each other

• Following the connections of the service user’s existing connections leads to new information

• Non-bidirectional connections are preferred

LET’S SUMMARIZE

So how can we find such potentially interesting users?


PROOF OF CONCEPT SYSTEM

(1) Preselection of the user set that will be analyzed in depth

(2) Apply NLP-Pipeline for measuring user similarity

(3) Categorize the top-n best scoring users according to the idea of Thought Bubbles

(4) Recommend top-n best scoring users of a category to the user

(5) Analyze acceptance of recommendations

[Architecture diagram. Labels: service user; a user's thought bubble (iOS Dev, Social Media, Sports); Twitter REST API; Thought Bubbles API; server; NLP pre-filtering; categorisation; clustering; analyze recs; DB]


(1) PRE-SELECTION/FILTERING

Friends-of-friends Twitter accounts

-> Filter accounts that are already connected to you

-> Filter accounts where: follower_count < 300, status_count < 1000

-> Filter non-English-speaking accounts

-> Identify people by using a simple NLP pipeline

-> Set of Twitter accounts for further processing

• The set of friends-of-friends Twitter accounts changes from iteration to iteration

• Filters are added after analyzing the acceptance of recommendations
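
The filter chain above could look roughly like the following sketch. It assumes that accounts below the two thresholds are the ones dropped, and that each account is a dict with the hypothetical keys `screen_name`, `follower_count`, `status_count` and `lang`. The "identify people" step is omitted because it depends on the NLP pipeline.

```python
def preselect(candidates, already_connected, min_followers=300, min_statuses=1000):
    """Return the candidate accounts that pass all pre-selection filters."""
    selected = []
    for account in candidates:
        if account["screen_name"] in already_connected:
            continue  # already connected to the service user
        if account["follower_count"] < min_followers:
            continue  # too few followers (assumed reading of the threshold)
        if account["status_count"] < min_statuses:
            continue  # too few tweets to analyze (assumed reading of the threshold)
        if account["lang"] != "en":
            continue  # non-English account
        selected.append(account)
    return selected


# Illustrative accounts only.
candidates = [
    {"screen_name": "a", "follower_count": 500, "status_count": 2000, "lang": "en"},
    {"screen_name": "b", "follower_count": 900, "status_count": 5000, "lang": "en"},
    {"screen_name": "c", "follower_count": 120, "status_count": 4000, "lang": "en"},
    {"screen_name": "d", "follower_count": 800, "status_count": 300, "lang": "en"},
    {"screen_name": "e", "follower_count": 800, "status_count": 3000, "lang": "de"},
]

print([a["screen_name"] for a in preselect(candidates, already_connected={"b"})])
```

Keeping each filter as a separate check makes it straightforward to add a new predicate after a round of recommendations has been evaluated.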


(2) NLP PIPELINE

Raw Tweets: @testuser The grand jury commented on a number of…

-> Tokenization and stripping @mentions and URLs

-> POS tagging

POS tagged Tweets: [('The', 'AT'), ('grand', 'JJ'), ('jury', 'NN'), ('commented', 'VBD'), ('on', 'IN'), ('a', 'AT'), ('number', 'NN'), ... ('.', '.')]

-> Chunking

-> Neglect 200 most used English words

Mined nouns and phrases: [('jury', 'NN'), ('number', 'NN'), ('social dayly', 'NP'), ...]

-> Frequency Distribution: [('jury', 34), ('social', 23), ('test case', 16), ...]

-> Filter top n words

-> DB: Set of frequency-distributed mined nouns and phrases

The 400 most recent Tweets of a potential recommendation are used for calculating the similarity measure
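
A simplified version of the counting part of this pipeline is sketched below. It counts all remaining words instead of POS-tagged nouns and chunked phrases, and the small `STOPWORDS` set is only a stand-in for the 200 most used English words; both are simplifications made for this example, as are the function names.

```python
import re
from collections import Counter

# Stand-in for the list of the 200 most used English words.
STOPWORDS = {"the", "a", "on", "of", "is", "and", "to", "in"}


def clean(tweet):
    """Strip @mentions and URLs, then lowercase and split into alphabetic tokens."""
    tweet = re.sub(r"@\w+|https?://\S+", " ", tweet)
    return re.findall(r"[a-z]+", tweet.lower())


def top_terms(tweets, n):
    """Return the n most frequent non-stopword tokens with their counts."""
    counts = Counter(
        token for tweet in tweets for token in clean(tweet) if token not in STOPWORDS
    )
    return counts.most_common(n)


tweets = [
    "@testuser The grand jury commented on a number of issues http://example.org/x",
    "The jury is out on the test case",
]
print(top_terms(tweets, 1))  # [('jury', 2)]
```

The resulting list of (term, count) pairs has the same shape as the frequency distribution stored per account.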


• Calculate top-n users by applying Single-Linkage-Clustering

• Categorize if user belongs to user specific bubbles

• Present recommendation lists to users

• Analyze acceptance of recommendations (connect user accounts with FOAF) and add a new filter predicate if necessary.
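
Single-linkage clustering over term-frequency profiles can be sketched as follows. The slides do not state which distance is used, so cosine distance is an assumption here, as are the distance threshold, the function names and the toy profiles.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two term-frequency dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def single_linkage(profiles, threshold):
    """Agglomerative clustering; cluster distance = distance of the closest member pair."""
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        return min(cosine_distance(profiles[x], profiles[y]) for x in c1 for y in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break  # remaining clusters are too far apart to merge
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


# Toy profiles for illustration only.
profiles = {
    "service_user": {"ios": 5, "swift": 3},
    "dev_a": {"ios": 4, "swift": 2, "xcode": 1},
    "sports_b": {"football": 6, "league": 2},
}
print(single_linkage(profiles, threshold=0.5))
```

Accounts that end up in the same cluster as the service user are the natural candidates for a topic-related bubble.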


SUPERVISED TEST RUN

[Chart: Twitter accounts plotted against a scale from 0 to 0.300 (ticks at 0, 0.075, 0.150, 0.225, 0.300); recommendations are framed. Accounts as labelled:]

@gargamit100*, @selvers*, @UpsideLearning*, @poposkidimitar*, @jkalten*, @cpappas*, @pfidalgo1*, @timbuckteeth*, @starsandrobots*, @TheJ Russ, @cliveshepherd*, @Microsoft, @jtcobb*, @MichaelPhelps, @SebastianThrun*, @elearning*, @elvaandrade, @BarackObama, @SteveVictor, @AnwarRichardson, @pabaker55*, @jamesmclynn, @DrEvanHarris, @mstrohm*, @AmyFrearson, @gekitz, @Hhaitch, @sclater*, @TheRock, @MCeraWeakBaby, @fatcharlesh, @FrankViola, @timbarker, @AnnaOscarsson, @WithDrake, sabrinaVanessa, @charliesheen, @WWEDanielBryan, @cmccosky, @kaitlyntrigger, @judithsei*, @atsc*, @melaniedaveid, @Emmadw*, @ladygaga, @marcusfairs, @lucyheartsTW, @PeterSmith, @MikeVick, @meadd cameron


UNSUPERVISED TEST RESULTS

The probability that a recommended item is relevant is 64.4%. Standard deviation: 31.5%
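
One plausible way to arrive at figures of this kind is to compute the share of accepted recommendations per test user and then take the mean and standard deviation across users. The slides do not say how the numbers were aggregated, so this is an assumption, and the counts below are invented for illustration and are not the study's data.

```python
from statistics import mean, pstdev

# Hypothetical (accepted, recommended) counts per test user; not the study's data.
feedback = {"user_1": (8, 10), "user_2": (5, 10), "user_3": (2, 10)}

# Precision per user: share of recommended accounts the user judged relevant.
precisions = [accepted / recommended for accepted, recommended in feedback.values()]

print(f"mean precision: {mean(precisions):.1%}")
print(f"standard deviation: {pstdev(precisions):.1%}")
```

Whether the population or the sample standard deviation is appropriate depends on how the test users were chosen; `pstdev` is used here only to keep the sketch short.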


DISCUSSION

Twitter IS useful for discovering new information in the sense of Research 2.0, but:

• Recommendations reflect the Twitter behavior of the user

• Automated tweets harm recommendation results (one sentence gets an enormous weight because it occurs very often)

• Twitter’s request limitation is a show stopper

• Comparison to similar systems (Content and collaborative filtering)


THANK YOU!

ANY QUESTIONS?

