Fine Grained Location Extraction from Tweets with Temporal Awareness

38
+ Fine-Grained Location Extraction from Tweets with Temporal Awareness 2014/3/13 (Fri.) Chang Wei-Yuan @ MakeLab Group Meeting Chenliang Li, Aixin Sun SIGIR‘14

Transcript of Fine Grained Location Extraction from Tweets with Temporal Awareness

+

Fine-Grained Location Extraction from Tweets with Temporal Awareness

2014/3/13 (Fri.)�Chang Wei-Yuan @ MakeLab Group Meeting

Chenliang Li, Aixin Sun �SIGIR‘14

+Outline

n Introduction �

n Problem Definition �

n Method:PETAR �n  POI inventory�n  Data analysis and observations �n  Time-aware POI tagger�

n Experiment �

n Conclusion �

n Thought

2

+Outline

n Introduction �

n Problem Definition �

n Method:PETAR �n  POI inventory�n  Data analysis and observations �n  Time-aware POI tagger�

n Experiment �

n Conclusion �

n Thought

3

+ Introduction

n Twitter has Large volume of timely data �n Social and business value.

n Users casually or implicitly reveal their locations and short term visiting plans. �

4

+Outline

n Introduction �

n Problem Definition �

n Method:PETAR �n  POI inventory�n  Data analysis and observations �n  Time-aware POI tagger�

n Experiment �

n Conclusion �

n Thought

5

+Problem Definition

n Given: �n a tweet t published by a user from a

predefined geographical region �n a set of temporal awareness label �

n  c ∈ { past, present, future, non-temporal } �

n Goal: �n extract all POI and temporal awareness label

pairs from t. �n  ( POI name, temporal awareness label )�

6

+Challenges

n Predominate usage of short names or informal abbreviations

n Many POI names are ambiguous �

7

+Outline

n Introduction �

n Problem Definition �

n Method:PETAR �n  POI inventory�n  Data analysis and observations �n  Time-aware POI tagger�

n Experiment �

n Conclusion �

n Thought

8

+Overview 9

+POI inventory�

n The POI inventory is constructed by extracting the POI names mentioned �n Foursquare check-in from tweets.

10

+POI inventory�

n 239,499 Foursquare check-in tweets made by Singaporean users �

n POI names are extracted by applying handcrafted rules, Regexp �

11

+POI inventory�

n Partial POI names are extracted by taking all the sub-sequences of the names. �

12

+Data analysis and observations �

n Data set �n 4.33M tweets from 19,256 unique Singapore-

based users during June 2010 �n 222,201 tweets mentions at least one

candidate POI name from by 13,758 unique users.

13

+Data analysis and observations �

n Observation 1 �Many users reveal their fine-grained locations in their tweets. � n 71.4% of all users in the dataset �n 91.3% of the users who had published at

least 20 tweets �

14

+Data analysis and observations �

n Observation 2 �The candidate POI mentions are mostly very short with one or two words. Many of the mentions are partial location names. ��n 46.7% of the candidate POI names are

unigrams�n 41.6%+ of the candidate POI names are

partial POI names �

15

+Data analysis and observations �

n Observation 3 �About half of the candidate POI mentions indeed refer to locations and their associated temporal awareness can be determined. �

16

+Data analysis and observations �

n Observation 3 �About half of the candidate POI mentions indeed refer to locations and their associated temporal awareness can be determined. �

17

+Data analysis and observations �

n Observation 4 �Among all POIs that were visited, or to be visited, about 90% of the visits to these POIs happen within a day. �

18

+Time-aware POI tagger�

n Simultaneously disambiguate the candidate POI mentions and resolve the temporal awareness

19

+Time-aware POI tagger�

n Lexical features �n Basic: word itself, lowercase form, all-cap … n Contextual: context window, preceding,

following ��n Off to jp now ! Hope it DOESN’t rain �n Off to jp now ! Hope it DOESN’t rain �n Off to jp now ! Hope it DOESN’t rain �

20

+Time-aware POI tagger�

n Grammatical Features n Part-of-speech (POS)�n The time-trend score T(t) of the tweet �n The closest verb/time-trend word �

n  its time-trend score, distance, left/right-hand side indicator �

21

+Time-aware POI tagger�

n a dictionary D of time-trend words, containing commonly used words with manually assigned time-trend scores �n D = { “future”:1, “present”:0, “past”:-1, “is”:0,

“was”:-1 … } �

n a tweet t: “Soccer fever at mac now!”�n Is w ∈ t matching an entry in D? �n Considering tags of Verbs.

22

+Time-aware POI tagger�

n Geographical Features n Spatial randomness. �

n Location name confidence. �

n Multiple candidate POI mention.

23

+Time-aware POI tagger�

n Geographical Features n Spatial randomness : POI name entorpy��

�n Location name confidence. �

n Multiple candidate POI mention.

24

+Time-aware POI tagger�

n Geographical Features n Spatial randomness. �

n Location name confidence. �

n Multiple candidate POI mention.

25

+Time-aware POI tagger�

n Geographical Features n Spatial randomness. – entorpy/identifiability�

n Location name confidence. – count �

n Multiple candidate POI mention.

26

+Time-aware POI tagger�

n BILOU Schema Features �n Identify Beginning, Inside and Last word of a

multi-word POI name, and Unit-length POI name, and the words Outside of any POI names �

27

+Outline

n Introduction �

n Problem Definition �

n Method:PETAR �n  POI inventory�n  Data analysis and observations �n  Time-aware POI tagger�

n Experiment �

n Conclusion �

n Thought

28

+Experiment

n Experiment Setup �n manually annotated 4,000 tweets as ground truth �

n Comparative Methods �n Random Annotation (RA) �n K-Nearest Neighbor (KNN) �n StanfordNER (CRF-Classifier) �

29

+Experiment: results

n POI extraction with temporal awareness �

n Disambiguating POIs (ignoring temporal awareness) �

30

+Experiment: feature analysis

n Lex features are better for POI mention disambiguation �

n Gra features are better for resolving temporal awareness �

31

+Experiment

n Lex + Gra features are better in most cases �

32

+Outline

n Introduction �

n Problem Definition �

n Method:PETAR �n  POI inventory�n  Data analysis and observations �n  Time-aware POI tagger�

n Experiment �

n Conclusion �

n Thought

33

+Conclusion

n PETAR facilitate the fine-grained location-based services/marketing and personalization. �

n a POI inventory exploits the crowd wisdom of Foursquare community to enable fine-grained location extraction. �

n a time-aware POI tagger conducts the location extraction and temporal awareness resolution in an effective and efficient way. �

34

+Outline

n Introduction �

n Problem Definition �

n Method:PETAR �n  POI inventory�n  Data analysis and observations �n  Time-aware POI tagger�

n Experiment �

n Conclusion �

n Thought

35

+Thought 1

n Location Extraction with Temporal Awareness �

n Activities Extraction with Temporal Awareness

36

+Thought 2 37

+Thanks for listening. 2014 / 3 / 13 (Fri.) @ MakeLab Group Meeting �[email protected]