Fine Grained Location Extraction from Tweets with Temporal Awareness
-
Upload
chang-wei-yuan -
Category
Data & Analytics
-
view
68 -
download
0
Transcript of Fine Grained Location Extraction from Tweets with Temporal Awareness
+
Fine-Grained Location Extraction from Tweets with Temporal Awareness
2014/3/13 (Fri.)�Chang Wei-Yuan @ MakeLab Group Meeting
Chenliang Li, Aixin Sun �SIGIR‘14
+Outline
n Introduction �
n Problem Definition �
n Method:PETAR �n POI inventory�n Data analysis and observations �n Time-aware POI tagger�
n Experiment �
n Conclusion �
n Thought
2
+Outline
n Introduction �
n Problem Definition �
n Method:PETAR �n POI inventory�n Data analysis and observations �n Time-aware POI tagger�
n Experiment �
n Conclusion �
n Thought
3
+ Introduction
n Twitter has Large volume of timely data �n Social and business value.
n Users casually or implicitly reveal their locations and short term visiting plans. �
4
+Outline
n Introduction �
n Problem Definition �
n Method:PETAR �n POI inventory�n Data analysis and observations �n Time-aware POI tagger�
n Experiment �
n Conclusion �
n Thought
5
+Problem Definition
n Given: �n a tweet t published by a user from a
predefined geographical region �n a set of temporal awareness label �
n c ∈ { past, present, future, non-temporal } �
n Goal: �n extract all POI and temporal awareness label
pairs from t. �n ( POI name, temporal awareness label )�
6
+Challenges
n Predominate usage of short names or informal abbreviations
n Many POI names are ambiguous �
7
+Outline
n Introduction �
n Problem Definition �
n Method:PETAR �n POI inventory�n Data analysis and observations �n Time-aware POI tagger�
n Experiment �
n Conclusion �
n Thought
8
+POI inventory�
n The POI inventory is constructed by extracting the POI names mentioned �n Foursquare check-in from tweets.
10
+POI inventory�
n 239,499 Foursquare check-in tweets made by Singaporean users �
n POI names are extracted by applying handcrafted rules, Regexp �
11
+POI inventory�
n Partial POI names are extracted by taking all the sub-sequences of the names. �
12
+Data analysis and observations �
n Data set �n 4.33M tweets from 19,256 unique Singapore-
based users during June 2010 �n 222,201 tweets mentions at least one
candidate POI name from by 13,758 unique users.
13
+Data analysis and observations �
n Observation 1 �Many users reveal their fine-grained locations in their tweets. � n 71.4% of all users in the dataset �n 91.3% of the users who had published at
least 20 tweets �
14
+Data analysis and observations �
n Observation 2 �The candidate POI mentions are mostly very short with one or two words. Many of the mentions are partial location names. ��n 46.7% of the candidate POI names are
unigrams�n 41.6%+ of the candidate POI names are
partial POI names �
15
+Data analysis and observations �
n Observation 3 �About half of the candidate POI mentions indeed refer to locations and their associated temporal awareness can be determined. �
16
+Data analysis and observations �
n Observation 3 �About half of the candidate POI mentions indeed refer to locations and their associated temporal awareness can be determined. �
17
+Data analysis and observations �
n Observation 4 �Among all POIs that were visited, or to be visited, about 90% of the visits to these POIs happen within a day. �
18
+Time-aware POI tagger�
n Simultaneously disambiguate the candidate POI mentions and resolve the temporal awareness
19
+Time-aware POI tagger�
n Lexical features �n Basic: word itself, lowercase form, all-cap … n Contextual: context window, preceding,
following ��n Off to jp now ! Hope it DOESN’t rain �n Off to jp now ! Hope it DOESN’t rain �n Off to jp now ! Hope it DOESN’t rain �
20
+Time-aware POI tagger�
n Grammatical Features n Part-of-speech (POS)�n The time-trend score T(t) of the tweet �n The closest verb/time-trend word �
n its time-trend score, distance, left/right-hand side indicator �
21
+Time-aware POI tagger�
n a dictionary D of time-trend words, containing commonly used words with manually assigned time-trend scores �n D = { “future”:1, “present”:0, “past”:-1, “is”:0,
“was”:-1 … } �
n a tweet t: “Soccer fever at mac now!”�n Is w ∈ t matching an entry in D? �n Considering tags of Verbs.
22
+Time-aware POI tagger�
n Geographical Features n Spatial randomness. �
n Location name confidence. �
n Multiple candidate POI mention.
23
+Time-aware POI tagger�
n Geographical Features n Spatial randomness : POI name entorpy��
�n Location name confidence. �
n Multiple candidate POI mention.
24
+Time-aware POI tagger�
n Geographical Features n Spatial randomness. �
n Location name confidence. �
n Multiple candidate POI mention.
25
+Time-aware POI tagger�
n Geographical Features n Spatial randomness. – entorpy/identifiability�
n Location name confidence. – count �
n Multiple candidate POI mention.
26
+Time-aware POI tagger�
n BILOU Schema Features �n Identify Beginning, Inside and Last word of a
multi-word POI name, and Unit-length POI name, and the words Outside of any POI names �
27
+Outline
n Introduction �
n Problem Definition �
n Method:PETAR �n POI inventory�n Data analysis and observations �n Time-aware POI tagger�
n Experiment �
n Conclusion �
n Thought
28
+Experiment
n Experiment Setup �n manually annotated 4,000 tweets as ground truth �
n Comparative Methods �n Random Annotation (RA) �n K-Nearest Neighbor (KNN) �n StanfordNER (CRF-Classifier) �
29
+Experiment: results
n POI extraction with temporal awareness �
n Disambiguating POIs (ignoring temporal awareness) �
�
30
+Experiment: feature analysis
n Lex features are better for POI mention disambiguation �
�
n Gra features are better for resolving temporal awareness �
31
+Outline
n Introduction �
n Problem Definition �
n Method:PETAR �n POI inventory�n Data analysis and observations �n Time-aware POI tagger�
n Experiment �
n Conclusion �
n Thought
33
+Conclusion
n PETAR facilitate the fine-grained location-based services/marketing and personalization. �
n a POI inventory exploits the crowd wisdom of Foursquare community to enable fine-grained location extraction. �
n a time-aware POI tagger conducts the location extraction and temporal awareness resolution in an effective and efficient way. �
34
+Outline
n Introduction �
n Problem Definition �
n Method:PETAR �n POI inventory�n Data analysis and observations �n Time-aware POI tagger�
n Experiment �
n Conclusion �
n Thought
35
+Thought 1
n Location Extraction with Temporal Awareness �
n Activities Extraction with Temporal Awareness
36
+Thanks for listening. 2014 / 3 / 13 (Fri.) @ MakeLab Group Meeting �[email protected]�