Delft University of Technology

Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter

ISWC, Bonn, Germany, Oct 27th 2011

Fabian Abel¹, Ilknur Celik¹, Geert-Jan Houben¹, Patrick Siehndel²

¹ Web Information Systems, TU Delft, the Netherlands; ² L3S Research Center, Hannover, Germany

2 Adaptive Faceted Search on Twitter

What we do: Science and Engineering for the Personal Web

Diagram elements: Social Web; user/usage data; Analysis and User Modeling; Semantic Enrichment, Linkage and Alignment; Adaptive Systems; Personalized Recommendations; Personalized Search

Domains: news, social media, cultural heritage, public data, e-learning

3 Adaptive Faceted Search on Twitter

200,000,000: number of tweets published per day

4 Adaptive Faceted Search on Twitter

1: number of tweets that are interesting for me now

5 Adaptive Faceted Search on Twitter

Searching on Twitter

6 Adaptive Faceted Search on Twitter

Issues with Multiple-Keyword Search

7 Adaptive Faceted Search on Twitter

Let’s try searching with one keyword

8 Adaptive Faceted Search on Twitter

Page 1

9 Adaptive Faceted Search on Twitter

Page 2

10 Adaptive Faceted Search on Twitter

Page 3

11 Adaptive Faceted Search on Twitter

Page 60!!

tweet I was looking for

12 Adaptive Faceted Search on Twitter


Next Saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my hometwon Eindhoven. #realliveshit #iwillspinrecords (about 9 hours ago via Blackberry)

Annotated facet types: Music Artist, Location

13 Adaptive Faceted Search on Twitter

Is there an easier way?

Expand Query:
  Locations more...
  Events more...
  Music Artists: + Guilty Simpson + Bryan Adams + Elton John + Golden Earring + Rihanna + The eagles + 3 Doors Down more...
Current Query: Eindhoven, Music
Results:
  1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords
  2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
  3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents

Faceted search can help (hypothesis)
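A minimal sketch of how selecting facet-value pairs narrows a hit list, assuming every tweet has already been annotated with (facet type, facet value) pairs. The toy data and the names `TWEETS`, `matching_tweets` and `expansion_options` are hypothetical, not taken from the presented system: a tweet matches when it contains every selected pair, and the pairs left over in the hits are the candidates for expanding the query.

```python
from collections import Counter

# Hypothetical toy data: tweet id -> set of (facet type, facet value) pairs.
TWEETS = {
    "t1": frozenset({("location", "Eindhoven"), ("topic", "Music"),
                     ("music artist", "Guilty Simpson")}),
    "t2": frozenset({("location", "Eindhoven"), ("topic", "Music")}),
    "t3": frozenset({("person", "Julian Assange")}),
}


def matching_tweets(tweets, query):
    """Ids of tweets that contain every facet-value pair in the current query."""
    return sorted(tid for tid, fvps in tweets.items() if query <= fvps)


def expansion_options(tweets, hit_ids, query):
    """FVPs that could still expand the query, with the number of hits containing each."""
    counts = Counter()
    for tid in hit_ids:
        counts.update(tweets[tid] - query)
    return counts


query = frozenset({("location", "Eindhoven"), ("topic", "Music")})
hits = matching_tweets(TWEETS, query)
print(hits)                                    # ['t1', 't2']
print(expansion_options(TWEETS, hits, query))  # Counter({('music artist', 'Guilty Simpson'): 1})
```

Selecting ('music artist', 'Guilty Simpson') next would narrow the hits to the single tweet t1.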

14 Adaptive Faceted Search on Twitter

Challenges

15 Adaptive Faceted Search on Twitter

Facets of a Tweet

@bob: Julian Assange got arrested

Facet type | Facet value
Creator | @bob
Location | Delft, the Netherlands
Creation time | Nov 29th 2011

16 Adaptive Faceted Search on Twitter

Challenge 1: How to infer facets that describe the content of a tweet?

17 Adaptive Faceted Search on Twitter

Faceted Search: selecting facet-value pairs

Expand Query:
  Locations + Aachen + Aalborg + Aalesund + Aarhus + Aasiaat + Abaiang + Abakan more...
  Events more...
  Music Artists more...
Current Query: Music
Results:
  1. Yskiddd: Next saturday @thatsimpsonguy aka Guilty Simpson will be performing at Area51 in my homeytown Eindhoven. #realliveshit #iwillspinrecords
  2. Usee123: Cool #EV3door7980 !!! http://bit.ly/igyyRhL
  3. sanmiquelmusic: This Saturday I'm joining @KrusadersMusic to Intents

18 Adaptive Faceted Search on Twitter

Number of selectable facet values may be very high!

19 Adaptive Faceted Search on Twitter

Challenge 2: How to adapt the faceted search interface to the current demands of a user?

20 Adaptive Faceted Search on Twitter

Adaptive Faceted Search Framework

21 Adaptive Faceted Search on Twitter

Components: Twitter posts; Semantic Enrichment; User and Context Modeling; Adaptive Faceted Search; user

Facet extraction: How to represent the content of a tweet?

Adaptation: How to adapt the facet-value pair ranking to the current demands of the user?

22 Adaptive Faceted Search on Twitter

Facet Extraction and Semantic Enrichment

@bob: Julian Assange got arrested

Tweet-based enrichment: Julian Assange

23 Adaptive Faceted Search on Twitter

Link-based enrichment: linked article "Julian Assange arrested: Julian Assange, the founder of WikiLeaks, is under arrest in London…"

24 Adaptive Faceted Search on Twitter

Extracted entities: Julian Assange (from the tweet); Julian Assange, London, WikiLeaks (from the linked article)
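The two enrichment steps can be sketched as follows. A tiny hard-coded gazetteer stands in for a real entity-recognition service, and the facet types assigned to the entities (for example "organization" for WikiLeaks) are assumptions; `GAZETTEER`, `extract_fvps` and `enrich` are hypothetical names. Link-based enrichment simply adds the entities found in the text of linked Web pages to those found in the tweet itself.

```python
# Hypothetical gazetteer standing in for a real entity-recognition service.
GAZETTEER = {
    "Julian Assange": "person",
    "WikiLeaks": "organization",
    "London": "location",
}


def extract_fvps(text):
    """Return (facet type, facet value) pairs for gazetteer entries found in text."""
    return {(ftype, name) for name, ftype in GAZETTEER.items() if name in text}


def enrich(tweet_text, linked_texts=()):
    """Tweet-based enrichment, optionally extended by link-based enrichment."""
    fvps = extract_fvps(tweet_text)
    for page in linked_texts:  # text of the Web pages the tweet links to
        fvps |= extract_fvps(page)
    return fvps


tweet = "@bob: Julian Assange got arrested"
article = "Julian Assange, the founder of WikiLeaks, is under arrest in London"
print(sorted(enrich(tweet)))
# [('person', 'Julian Assange')]
print(sorted(enrich(tweet, [article])))
# [('location', 'London'), ('organization', 'WikiLeaks'), ('person', 'Julian Assange')]
```

Even in this toy version, the linked article contributes facets (London, WikiLeaks) that the tweet text alone does not contain.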

25 Adaptive Faceted Search on Twitter

Impact of Link-based enrichment

26 Adaptive Faceted Search on Twitter

Representation of tweets: link-based enrichment yields significantly more facets per tweet

27 Adaptive Faceted Search on Twitter

Faceted Search Strategies

• Challenge: the most relevant facet-value pair (FVP) should appear at the top of the ranking

• Baseline: hashtag-based keyword search

Locations (alphabetical): 1. Aachen 2. Aalborg 3. Aalesund 4. Aarhus … 2145. Eindhoven

Locations (relevance-ranked): 1. Eindhoven 2. Delft 3. Amsterdam 4. Rotterdam 5. London …

28 Adaptive Faceted Search on Twitter

• Faceted Search Strategies:
1. Occurrence frequency: count the occurrence frequency of each FVP (baseline)

Ranking criterion: for each facet-value pair, the number of tweets in the current hit list of matching tweets that contain the FVP
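A sketch of the occurrence-frequency baseline, assuming the current hit list is given as one set of FVPs per matching tweet. Excluding already-selected FVPs and breaking ties alphabetically are choices of this sketch, and `rank_by_frequency` is a hypothetical name.

```python
from collections import Counter


def rank_by_frequency(hits, selected=frozenset()):
    """Rank FVPs by the number of tweets in the hit list that contain them.

    hits: list of FVP sets, one per matching tweet; selected: FVPs already in the query.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for fvps in hits:
        counts.update(fvps - selected)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


hits = [
    {("location", "Eindhoven"), ("genre", "Jazz")},
    {("location", "Eindhoven")},
    {("location", "Delft"), ("genre", "Jazz")},
]
for fvp, count in rank_by_frequency(hits):
    print(fvp, count)
# ('genre', 'Jazz') 2
# ('location', 'Eindhoven') 2
# ('location', 'Delft') 1
```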

29 Adaptive Faceted Search on Twitter

2. Personalization: adapt the ranking to the user profile (different user modeling strategies are possible; here: the entire tweeting history of the user)

Personalized FVP ranking strategy: the rank of an FVP combines the number of tweets in the current hit list that contain the FVP with the weight of the FVP in the user profile

User profile (built from the user's tweets, timeline June 27 to July 4):
FVP | weight
(location, Delft) | 6
(event, JazzBaltica) | 4
(person, ChetBaker) | 3
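The slide names the ingredients of the personalized ranking (the FVP's frequency in the current hit list and its weight in the user profile), but the formula itself did not survive in this transcript. The sketch below therefore assumes one simple combination, frequency × (1 + profile weight), and assumes profile weights are counts of the FVP in the user's own tweets; `build_profile` and `rank_personalized` are hypothetical names.

```python
from collections import Counter


def build_profile(user_tweets):
    """Assumed profile: weight = number of the user's own tweets that contain the FVP."""
    profile = Counter()
    for fvps in user_tweets:
        profile.update(fvps)
    return profile


def rank_personalized(hits, profile, selected=frozenset()):
    """Rank FVPs in the hit list, boosting those that carry weight in the user profile."""
    counts = Counter()
    for fvps in hits:
        counts.update(fvps - selected)

    def score(fvp):
        # Assumed combination: hit-list frequency times (1 + profile weight), so FVPs
        # absent from the profile (weight 0) are still ordered by frequency.
        return counts[fvp] * (1 + profile[fvp])

    return sorted(counts, key=lambda fvp: (-score(fvp), fvp))


# Weights copied from the example profile on the slide.
profile = Counter({("location", "Delft"): 6, ("event", "JazzBaltica"): 4,
                   ("person", "ChetBaker"): 3})
hits = [
    {("location", "Eindhoven"), ("event", "JazzBaltica")},
    {("location", "Eindhoven")},
    {("location", "Eindhoven")},
    {("location", "Delft")},
]
print(rank_personalized(hits, profile))
# [('location', 'Delft'), ('event', 'JazzBaltica'), ('location', 'Eindhoven')]
```

With an empty profile every weight is 0, so the ranking falls back to plain frequency ordering.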

30 Adaptive Faceted Search on Twitter

3. Diversification: increase variety among the top-ranked FVPs

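Diversified FVP ranking: considers the number of tweets that contain the FVP and minimizes overlaps among the top-ranked FVPs, e.g.:
Without diversification: Genre + Blues + Jazz + JazzMusic + Rock more...
With diversification: Genre + Blues + Jazz + Rock + Classic more...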
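One way to minimize overlaps, sketched under stated assumptions: walk the FVPs in frequency order and skip any FVP whose set of matching tweets overlaps too much (Jaccard similarity at or above a hypothetical threshold of 0.5) with an FVP already chosen. The greedy procedure, the Jaccard measure, the threshold and the names `jaccard` and `diversify` are illustrative choices, not necessarily those of the presented system.

```python
def jaccard(a, b):
    """Overlap of two tweet-index sets (1.0 = identical, 0.0 = disjoint)."""
    return len(a & b) / len(a | b)


def diversify(hits, k, max_overlap=0.5):
    """Greedily pick up to k FVPs in frequency order, skipping redundant ones."""
    # Map each FVP to the positions of the tweets in the hit list that contain it.
    covers = {}
    for i, fvps in enumerate(hits):
        for fvp in fvps:
            covers.setdefault(fvp, set()).add(i)
    # Frequency order, ties broken alphabetically for a deterministic result.
    candidates = sorted(covers, key=lambda fvp: (-len(covers[fvp]), fvp))
    chosen = []
    for fvp in candidates:
        if len(chosen) == k:
            break
        if all(jaccard(covers[fvp], covers[c]) < max_overlap for c in chosen):
            chosen.append(fvp)
    return chosen


jazz, jazz_music = ("genre", "Jazz"), ("genre", "JazzMusic")
hits = [
    {jazz, jazz_music}, {jazz, jazz_music}, {jazz},
    {("genre", "Blues")}, {("genre", "Blues")},
    {("genre", "Rock")}, {("genre", "Rock")},
    {("genre", "Classic")},
]
print(diversify(hits, k=4))
# [('genre', 'Jazz'), ('genre', 'Blues'), ('genre', 'Rock'), ('genre', 'Classic')]
```

JazzMusic is skipped because its tweets are a subset of the Jazz tweets (Jaccard 2/3), which lets Classic into the top four, mirroring the Genre example on the slide.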

31 Adaptive Faceted Search on Twitter

4. Time-sensitivity: adapt FVP ranking to temporal context

Time-sensitive FVP ranking: takes into account how the occurrence frequency of an FVP (the number of tweets in the current hit list that contain it) develops over time (timeline: June 20, June 27, July 4) up to the moment of the search

Example: (event, JazzBaltica) vs. (event, FrenchOpen), ranked as Event + JazzBaltica + FrenchOpen more...
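A sketch of time-sensitive ranking, assuming each occurrence of an FVP in the hit list is weighted by how recent the tweet is relative to the search time, using exponential decay with a hypothetical half-life of 7 days. The decay function, the half-life, the name `rank_time_sensitive` and the example dates (including the year) are assumptions for illustration.

```python
from datetime import date


def rank_time_sensitive(hits, search_day, half_life_days=7.0):
    """Rank FVPs by a recency-weighted occurrence frequency.

    hits: list of (creation date, FVP set) for the matching tweets.
    Each occurrence counts 0.5 ** (age / half_life_days), an assumed decay.
    """
    scores = {}
    for created, fvps in hits:
        age = (search_day - created).days
        if age < 0:  # ignore tweets posted after the search
            continue
        weight = 0.5 ** (age / half_life_days)
        for fvp in fvps:
            scores[fvp] = scores.get(fvp, 0.0) + weight
    return sorted(scores, key=lambda fvp: (-scores[fvp], fvp))


baltica, french_open = ("event", "JazzBaltica"), ("event", "FrenchOpen")
hits = [
    (date(2011, 6, 5), {french_open}),
    (date(2011, 6, 5), {french_open}),
    (date(2011, 6, 5), {french_open}),
    (date(2011, 7, 2), {baltica}),
    (date(2011, 7, 3), {baltica}),
]
print(rank_time_sensitive(hits, search_day=date(2011, 7, 4)))
# [('event', 'JazzBaltica'), ('event', 'FrenchOpen')]
```

By plain frequency FrenchOpen (3 tweets) would beat JazzBaltica (2 tweets); weighting by recency reverses the order at search time.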

32 Adaptive Faceted Search on Twitter

• Semantic enrichment: (i) tweet-based and (ii) link-based enrichment

33 Adaptive Faceted Search on Twitter

Research Questions

1. How well does faceted search supported by semantic enrichment perform compared to keyword search?

2. Which strategy performs best in ranking facet-value pairs that allow users to find relevant tweets on Twitter?

3. How do the different building blocks of the faceted search framework influence the performance?

34 Adaptive Faceted Search on Twitter

Dataset

More than 20,000 Twitter users

More than 30,000,000 tweets

4 months (timeline: Nov 15, Dec 15, Jan 15, Feb 15; Egyptian revolution marked on Jan 25)

35 Adaptive Faceted Search on Twitter

Evaluation Framework

• User Simulation Model [cf. Koren et al., WWW’08]:
  • Input: search settings = { (user who searches, relevant target tweet) }
  • Drill down the search result list until no more FVPs can be applied or fewer than 10 tweets match the query
  • Simulated click behavior: the first FVP in the ranking that matches the target tweet is selected (the user knows the target resource)

• Ground truth: relevant target tweet = tweet that has been re-tweeted by the user

• Metrics:
  • Success@k: probability that the relevant FVP appears in the top k (the higher Success@k, the faster the search and the lower the user effort)
  • MRR: mean reciprocal rank of the target tweet at the moment the user selects it
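The simulated search described above can be sketched as follows, under several assumptions: FVP sets per tweet are given, the simulated user clicks the first FVP in the ranking that the target tweet contains, and a plain frequency ranking stands in for any of the strategies. All names and the toy collection are hypothetical. The slide does not say exactly which ranks are aggregated into Success@k and MRR (per click or per search setting), so the metric functions simply take a list of ranks, with None meaning "not found".

```python
from collections import Counter


def rank_by_frequency(hits, selected):
    """Stand-in ranking: FVPs ordered by how many hits contain them."""
    counts = Counter()
    for fvps in hits:
        counts.update(fvps - selected)
    return sorted(counts, key=lambda fvp: (-counts[fvp], fvp))


def simulate_search(tweets, target_id, rank_fvps, min_hits=10):
    """Drill down until no FVP applies or fewer than min_hits tweets match.

    tweets: tweet id -> frozenset of FVPs. The simulated user knows the target
    tweet and clicks the first FVP in the ranking that the target contains.
    Returns the 1-based ranks of the clicked FVPs and the final hit list.
    """
    target = tweets[target_id]
    selected = frozenset()
    hits = dict(tweets)
    clicked_ranks = []
    while len(hits) >= min_hits:
        ranking = rank_fvps(list(hits.values()), selected)
        rank = next((r for r, fvp in enumerate(ranking, start=1) if fvp in target), None)
        if rank is None:  # no applicable FVP left
            break
        clicked_ranks.append(rank)
        selected = selected | {ranking[rank - 1]}
        hits = {tid: fvps for tid, fvps in hits.items() if selected <= fvps}
    return clicked_ranks, hits


def success_at_k(ranks, k):
    """Fraction of observed ranks that fall within the top k (None = not found)."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank, counting a missing item (None) as 0."""
    return sum(1 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical collection: 12 music tweets, half in Eindhoven, half in Delft.
tweets = {f"t{i}": frozenset({("topic", "Music"),
                              ("location", "Eindhoven" if i < 6 else "Delft")})
          for i in range(12)}
tweets["t0"] |= {("music artist", "Guilty Simpson")}
ranks, final_hits = simulate_search(tweets, "t0", rank_by_frequency)
print(ranks, len(final_hits))  # [1, 2] 6
print(success_at_k([1, 3, None, 2], k=1), success_at_k([1, 3, None, 2], k=3))  # 0.25 0.75
```

The drill-down stops after two clicks because only 6 tweets, fewer than the 10-tweet threshold, still match.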

36 Adaptive Faceted Search on Twitter

Faceted-search vs. hashtag-based (keyword) search

37 Adaptive Faceted Search on Twitter

Faceted search based on semantic enrichment of tweets significantly outperforms hashtag-based search.

38 Adaptive Faceted Search on Twitter

Results: Overview

39 Adaptive Faceted Search on Twitter

Personalized strategy achieves ~12% better performance than the other semantic strategies (and 2× better performance than hashtag-based search)

40 Adaptive Faceted Search on Twitter

Impact of link-based enrichment

41 Adaptive Faceted Search on Twitter

Personalized strategy significantly outperforms the baseline

Link-based enrichment improves quality for both strategies

42 Adaptive Faceted Search on Twitter

Impact of time-sensitivity

43 Adaptive Faceted Search on Twitter

Time-sensitivity-based ranking improves quality for both the frequency and the diversification strategies

44 Adaptive Faceted Search on Twitter

Application of the Faceted Search Framework

45 Adaptive Faceted Search on Twitter

Twitcident.com: Twitter-based crisis management system

Semantic enrichment allows for:
1. Grouping tweets into incidents
2. Faceted search
3. Thematic views
4. Analysis

46 Adaptive Faceted Search on Twitter

47 Adaptive Faceted Search on Twitter

48 Adaptive Faceted Search on Twitter

49 Adaptive Faceted Search on Twitter

Conclusions

What we did:

• Adaptive Faceted Search on Twitter + Evaluation Framework

• Analysis and Evaluation (+ Application in Twitcident)

Findings:

1. Semantic enrichment allows for a structured representation of the content of tweets, which is the basis for faceted search

2. Faceted search performs significantly better than hashtag-based keyword search

3. Different building blocks for making faceted search on Twitter adaptive improve the search quality:

a) Link-based enrichment: more discoverable tweets, better search performance

b) Personalization leads to significant improvements

c) Time-sensitivity improves performance as well

50 Adaptive Faceted Search on Twitter

Thank you!

Twitter: @fabianabel

http://wis.ewi.tudelft.nl/iswc2011/