Thesis oral defense 2015 elvis saravia
Inferring User Interests from Microblog Data through Opinion Mining
Student: Elvis Saravia
Advisor: Prof. Yi-Shin Chen
Institution: National Tsing Hua University
Program: International Master Program in Information Systems and Applications (IMPISA)
Our Journey...
→ Introduction
→ Related Work
→ Objectives
→ Framework
→ Experiment & Results
→ Conclusion & Future Work
→ Q & A
Introduction
→ Rapid Growth of the Web
○ Web 2.0 (user-generated content)
○ Data generated rapidly
○ Social sharing platforms (Facebook & Twitter)
→ Online User-Behaviour Data
○ Introduced research opportunities
○ The most valuable asset that a company possesses
Online User Behaviors
[Figure: Interests and Emotions]
Objectives
→ This work aims to develop a behavior-based user interests identification model.
→ The algorithms proposed combine both contextual and emotion analysis to obtain better performance on user interests extraction.
Motivation
→ Economic value
○ Recommendation services (dating sites & ad targeting)
○ Personalized systems (e-commerce & search engines)
→ Personalization
○ We love to be uniquely identified
○ Reduce extraction of ambiguous interests
“I may be exactly the same demographic as my neighbor, but that has nothing to do with what I eat.” - Lesperance VP of Digital Marketing and CRM for GrubHub
Related Work
→ Ontology [Mylonas et al. 2008] [Bakalov et al. 2009]
○ Search logs and contextual information to build ontology
→ Social Structure [Bao et al., WWW 2010] [Wen et al., SIGKDD 2010]
○ Focuses on the user social graph (friends and follows)
Related Work
→ Contextual Information [Piao et al., 2011] [Yang et al., JCIS 2012]
○ Natural Language Processing (NLP) and latent Dirichlet allocation (LDA)
→ Behavior-Based [Zhou et al. 2008; Xing et al., WWW 2010]
○ Collaborative filtering and social actions
○ User interactions (printing, copying and saving)
Interest Definition
→ Considerations:
○ Not everything we say or write interests us
○ Our interests shouldn’t be ambiguous
○ Ranking interests is challenging
→ Observations:
○ Interests ← Motivation ← Positive Emotions [Silvia et al., 2002]
○ Our personal interests are interlinked with our positive emotions
I am in New York.
I cannot wait for the Facebook Developer Conference
Framework
Contextual analysis + Emotion analysis
[Figure: framework pipeline, from Twitter Corpus through Pre-processing, Interest Candidates Extraction (POS Tagging, Keyword Extraction, Rule-Based Extraction), Emotion Analysis (Emotion Classification, Emotion Tagging & Filtering) and Interest Identification to the Output file]
Pre-processing
Filter out information that doesn’t provide any knowledge or value to user interest identification
Pre-processing
→ Filter out non-English posts and Re-Tweets
→ Remove useless punctuation marks
Example post: I am loving Jeremy Lin!
For every post (P) in a collection of Tweets (T)
Pre-processing
→ Remove tweets containing hyperlinks (no emotion)
→ Remove repeated tweets (same emotion)
Linsanity comes to LA. http://espn.com
...
Linsanity comes to LA. http://espn.com
For every post (P) in a collection of Tweets (T)
Pre-processing
→ Remove terms less than 3 characters long and terms containing the “@” symbol
○ (e.g. “to” and “@jason”)
Example post: @jason I love to go to New York
For every post (P) in a collection of Tweets (T)
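The pre-processing steps listed on these slides can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function name `preprocess`, the "RT " prefix test for Re-Tweets, the ASCII-only test as a crude stand-in for language detection, and stripping all leading and trailing punctuation are assumptions made here.

```python
import re
import string

URL_RE = re.compile(r"https?://\S+")


def preprocess(posts):
    """Apply the slide's filters to every post P in a collection T."""
    seen = set()
    cleaned = []
    for post in posts:
        # Assumption: Re-Tweets start with "RT "; non-ASCII text is
        # treated as non-English (a crude stand-in for language detection).
        if post.startswith("RT ") or not post.isascii():
            continue
        # Tweets containing hyperlinks are dropped (no emotion).
        if URL_RE.search(post):
            continue
        # Repeated tweets are dropped (same emotion).
        key = post.strip().lower()
        if key in seen:
            continue
        seen.add(key)
        terms = []
        for term in post.split():
            if "@" in term:
                continue  # terms containing "@"
            # Assumption: "useless punctuation" means leading/trailing marks.
            term = term.strip(string.punctuation)
            if len(term) >= 3:  # terms shorter than 3 characters are removed
                terms.append(term)
        if terms:
            cleaned.append(" ".join(terms))
    return cleaned


posts = [
    "@jason I love to go to New York",
    "I am loving Jeremy Lin!",
    "I am loving Jeremy Lin!",
    "Linsanity comes to LA. http://espn.com",
    "RT @bob great game tonight",
]
print(preprocess(posts))
```

Note that stripping punctuation this way also removes the "#" from hashtags; whether the original system kept hashtags intact is not stated on the slides.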
Interest Candidates Extraction
A 3-phase algorithm to extract as many interest candidates as possible
Interest Candidates Extraction (1)
→ POS-tagging
○ Part-of-speech tagging
○ Nouns, Proper Nouns and Named entities
○ Limitation: Naïve interest candidates
Example post: I cannot wait for the Facebook Developer Conference
For every post (P) in a collection of Tweets (T)
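As a rough sketch of phase 1, the snippet below groups consecutive noun tokens into interest candidates. The slides do not name the tagger used, so the input here is already tagged by hand with Penn Treebank style tags; both the tag set and the function name `noun_candidates` are assumptions for illustration.

```python
from itertools import groupby

# Assumption: Penn Treebank noun tags (common and proper, singular and plural).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_candidates(tagged_tokens):
    """Join each maximal run of noun-tagged tokens into one candidate."""
    candidates = []
    for is_noun, group in groupby(tagged_tokens, key=lambda pair: pair[1] in NOUN_TAGS):
        if is_noun:
            candidates.append(" ".join(word for word, _ in group))
    return candidates


# Hand-tagged version of the example post (tags assigned for illustration).
tagged = [
    ("I", "PRP"), ("can", "MD"), ("not", "RB"), ("wait", "VB"),
    ("for", "IN"), ("the", "DT"),
    ("Facebook", "NNP"), ("Developer", "NNP"), ("Conference", "NNP"),
]
print(noun_candidates(tagged))
```

Because every noun run becomes a candidate regardless of whether the user cares about it, this phase alone yields the naïve candidates the slide mentions.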
Interest Candidates Extraction (2)
→ Keyword Extraction (RAKE) [Rose et al., 2009]
○ Extract keywords from posts
○ Limitation: phrase boundaries
Example post: I cannot wait for the Facebook Developer Conference
Example post: I enjoyed watching Mr. Bean
For every post (P) in a collection of Tweets (T)
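The idea behind RAKE can be sketched in a few lines: split the text into candidate phrases at stopwords and punctuation, score each word by degree divided by frequency, and score each phrase by the sum of its word scores. The tiny stopword list and the function names below are assumptions for illustration; this is a simplified sketch, not the configuration used in the thesis.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stopword list, not a standard RAKE stoplist.
STOPWORDS = {"i", "am", "the", "for", "cannot", "a", "an", "to", "of", "and", "in", "is"}


def candidate_phrases(text):
    """Split text into word lists at punctuation and stopwords."""
    phrases = []
    for fragment in re.split(r"[.!?,;:]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                    current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, including the word itself
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in phrases
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


print(rake("I cannot wait for the Facebook Developer Conference"))
print(rake("I enjoyed watching Mr. Bean"))
```

The second call illustrates the phrase-boundary limitation: the period in "Mr." ends a phrase, so "bean" is split from "mr" and "Mr. Bean" is never produced as a single keyword.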
Interest Candidates Extraction (3)
→ Previous Phases: Unreliable and Inconsistent
→ Emerging Interest Concepts?
○ Previous phases cannot extract them
○ Provide better insights about users’ current interests
Example: Wimbledon 2015
Interest Candidates Extraction (3)
→ Rule-Based Concept Extraction [Hsu et al., 2015]
○ Extract frequent emerging concepts based on “wisdom of the crowd”
○ 80,000,000 tweets (3,000,000 users)
○ 6 patterns were defined
Interest Candidates Extraction (3)
Example post: I am loving Wimbledon 2015 #WC2015
Concept extracted through crowd wisdom: Wimbledon 2015
Interest Candidates Extraction
→ Combine the results of the 3-phase interest candidates extraction algorithm
○ Duplicate interest candidates were removed
For every post (P) in a collection of Tweets (T)
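Combining the three phases amounts to an order-preserving union with duplicates removed. A minimal sketch, assuming candidates are plain strings and that duplicates are matched case-insensitively (the slides do not say how duplicates were detected; `combine_candidates` is a hypothetical name):

```python
def combine_candidates(*phase_results):
    """Merge candidate lists from all phases, keeping the first spelling seen."""
    seen = set()
    combined = []
    for results in phase_results:
        for candidate in results:
            key = candidate.casefold()  # assumption: case-insensitive matching
            if key not in seen:
                seen.add(key)
                combined.append(candidate)
    return combined


pos_phase = ["Facebook Developer Conference"]
keyword_phase = ["facebook developer conference", "wait"]
rule_phase = ["Wimbledon 2015"]
print(combine_candidates(pos_phase, keyword_phase, rule_phase))
```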
Emotion Analysis
Tagging interest candidates with their corresponding emotion
Emotion Classification
→ Pattern-based approach
○ Appropriate for the grammatical informality of tweets
○ Effective for multilingual applications
○ Contribution Degree
→ Why positive emotions?
○ Anticipation, Joy and Trust
○ Highly related to motivation and interests
[Argueta et al., 2015]
Emotions: Anticipation, Joy, Trust, Surprise, Sadness, Disgust, Anger, Fear
Emotion Analysis
→ Tag interests with emotion
○ Every interest candidate is tagged with its corresponding emotion
○ The original post is classified (no pre-processing)
→ Only positive emotions considered:
○ Anticipation, Joy and Trust
○ Negative emotions were not considered in this work
Emotion Filtering
→ Filtering process
○ Posts bearing no emotion
○ Short posts
○ Posts that bear opposite emotions (ambiguous)
Emotion Classification
Example posts:
I am loving Jeremy Lin right now
The traffic today is okay!
...
Feeling excited for the ASONAM Conference. #feelingblessed
Tagged interest candidates:
joy: Jeremy Lin
trust: ASONAM Conference
For every post P in a collection of Tweets T
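The tagging and filtering logic can be sketched as below. The pattern-based classifier itself is not reproduced: `toy_classify` is a hypothetical keyword stand-in, and the cue words, the `min_words` threshold for "short" posts, and the pairs treated as opposite emotions (taken from Plutchik's wheel) are all assumptions made for this illustration.

```python
POSITIVE = {"anticipation", "joy", "trust"}
# Assumption: "opposite" emotions follow the opposing pairs of Plutchik's wheel.
OPPOSITES = [("joy", "sadness"), ("trust", "disgust"),
             ("anticipation", "surprise"), ("fear", "anger")]


def toy_classify(post):
    """Hypothetical stand-in for the emotion classifier: keyword cues only."""
    cues = {"loving": "joy", "excited": "trust", "sad": "sadness"}
    words = post.lower().split()
    return {emotion for cue, emotion in cues.items() if cue in words}


def is_ambiguous(emotions):
    return any(a in emotions and b in emotions for a, b in OPPOSITES)


def tag_candidates(posts_with_candidates, classify, min_words=3):
    """Tag each post's candidates with the post's positive emotions."""
    tagged = {emotion: [] for emotion in sorted(POSITIVE)}
    for post, candidates in posts_with_candidates:
        if len(post.split()) < min_words:  # short posts are filtered out
            continue
        emotions = classify(post)  # the original post is classified
        if not emotions or is_ambiguous(emotions):
            continue  # no emotion, or opposite emotions (ambiguous)
        for emotion in emotions & POSITIVE:
            tagged[emotion].extend(candidates)
    return tagged


data = [
    ("I am loving Jeremy Lin right now", ["Jeremy Lin"]),
    ("The traffic today is okay!", ["traffic"]),
    ("Feeling excited for the ASONAM Conference. #feelingblessed", ["ASONAM Conference"]),
    ("I am loving this but so sad", ["this"]),
]
print(tag_candidates(data, toy_classify))
```

Passing the classifier in as a function keeps the filtering rules separate from whatever model actually assigns the emotions.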
Interest Identification
Ranking interest candidates in each emotion set
Interest Identification
→ Repetitive Interest Candidates
○ Interest candidates found under several emotions are kept
→ Ambiguity
○ Remove interests that are ambiguous
○ The emotion classifier helps considerably with this
Interest Identification
→ Occurrence
○ Calculate the frequency of each interest candidate (ws)
○ Frequency (f) is based on occurrence
Anticipation: ws1 (f), ws2 (f), ..., wsn (f)
Joy: Jeremy Lin (f), ws2 (f), ..., wsn (f)
Trust: ACM Conference (f), ws2 (f), ..., wsn (f)
Interest Identification
→ Ranking
○ Calculate the weight of each interest candidate (ws)
○ Rank them by weight (w)
Anticipation: ws1 (w), ws2 (w), ..., wsn (w)
Joy: Jeremy Lin (w), ws2 (w), ..., wsn (w)
Trust: ACM Conference (w), ws2 (w), ..., wsn (w)
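The occurrence and ranking steps can be illustrated with a frequency count per emotion set. The transcript does not preserve the weight formula, so the weight used below (relative frequency within the emotion set) is purely an assumption for illustration, as is the function name `rank_interests`.

```python
from collections import Counter


def rank_interests(tagged, top_k=15):
    """For each emotion set, return (candidate, frequency, weight) by rank."""
    ranked = {}
    for emotion, candidates in tagged.items():
        counts = Counter(candidates)
        total = sum(counts.values())
        # Assumption: weight w = f / total occurrences in this emotion set.
        ranked[emotion] = [
            (ws, f, f / total) for ws, f in counts.most_common(top_k)
        ]
    return ranked


tagged = {
    "joy": ["Jeremy Lin", "Jeremy Lin", "NBA"],
    "trust": [],
}
print(rank_interests(tagged))
```

With this particular weight, ranking by weight and ranking by frequency give the same order; a different weighting scheme would plug into the same structure.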
Experiments and Results
Two different types of experiments were conducted
Experiment (1)
→ Experimental Setup
○ 3 active Twitter users (A, B, C)
○ The latest 3000+ English posts crawled from each user's feed
○ The top-15 most frequent interests per emotion
○ Results rated by the users
Evaluation
Rating scale: 0 = not related; 1~4 = related; 5~10 = highly related
[Figure: User A, top 15 frequent interests per emotion]
[Figure: User B, top 15 frequent interests per emotion]
[Figure: User C, top 15 frequent interests per emotion]
[Figure: results for User A, User B and User C]
Experiment (2)
→ Experimental Setup
○ Online surveys
○ 7 Users (A,B,C,D,E,F)
○ Top 5 interests (including 5 sub-category interests)
○ The latest 3000+ English posts crawled from feed
○ Interests are categorized (ConceptNet)
Categorizing Interests
→ Hierarchical Interests Extraction
○ Top 15 interests in the 3 emotion sets are combined and categorized
○ ConceptNet API
○ 2-level “is-a” relationship
○ Observation: top interest candidates were highly related
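The 2-level “is-a” lookup can be sketched as a walk up a parent map. No ConceptNet call is made here: the `IS_A` dictionary is hypothetical toy data invented for illustration and does not reflect actual ConceptNet output, and `categorize` is a made-up helper name.

```python
# Hypothetical toy "is-a" edges, standing in for relations fetched from ConceptNet.
IS_A = {
    "Jeremy Lin": "basketball player",
    "basketball player": "athlete",
    "athlete": "person",
    "Wimbledon 2015": "tennis tournament",
}


def categorize(interest, is_a, levels=2):
    """Follow "is-a" edges upward for at most `levels` steps."""
    path = []
    current = interest
    for _ in range(levels):
        parent = is_a.get(current)
        if parent is None:
            break  # no further category known
        path.append(parent)
        current = parent
    return path


print(categorize("Jeremy Lin", IS_A))
print(categorize("Wimbledon 2015", IS_A))
```

Interests that share a category at either level can then be grouped under it, which is one way to arrive at top-level interests with sub-category interests.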
Evaluation
Precision of system on raw data (Twitter feed)
Evaluation
● Precision of system when including ambiguous tweets
● Ambiguous tweets bear opposite or no emotion
Evaluation
● Precision of the full system when considering positive emotions
● Average precision of approx. 81% as top performance (top-10).
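For reference, precision at top-k can be computed as the share of the k highest-ranked interests that the user confirms as relevant. This is a generic sketch: the slides do not spell out how survey answers were turned into relevance judgments, so the `relevant` set and the sample data below are assumptions for illustration.

```python
def precision_at_k(ranked_interests, relevant, k):
    """Fraction of the top-k ranked interests found in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_interests[:k]
    hits = sum(1 for interest in top_k if interest in relevant)
    return hits / k  # standard definition: divide by k, not by len(top_k)


def average_precision_over_users(per_user, k):
    """Mean precision@k over (ranked_interests, relevant) pairs, one per user."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in per_user]
    return sum(scores) / len(scores)


# Hypothetical data for two users.
user_a = (["Jeremy Lin", "traffic", "NBA", "coffee", "Wimbledon 2015"],
          {"Jeremy Lin", "NBA", "Wimbledon 2015", "tennis"})
user_b = (["ASONAM Conference", "data mining"], {"ASONAM Conference", "data mining"})
print(precision_at_k(*user_a, k=5))
print(average_precision_over_users([user_a, user_b], k=2))
```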
Evaluation
Performance of components per user
Evaluation
Precision comparison of all components evaluated
Conclusion
→ Positive emotions contribute substantially to user interest identification, as seen in the experiments (approx. 81% average precision at top-10).
→ Emotion analysis is an important component for effectively ranking users’ interests and removing ambiguous information.
Future Work
→ Analyze emotion distribution to observe if there are patterns in the change of interests.
→ Adopt machine learning techniques to automate feature extraction for interest identification.
→ Improve approach by considering temporal information and negative emotions as a weighting factor.
→ Improve Interest categorization.
Thanks for listening...
Q & A