I am here…
Lu Chen is in the Kno.e.sis Center, Wright State University.
Amit Sheth is the director of the Kno.e.sis Center and the advisor of Lu Chen (Amit Sheth has PhD student Lu Chen).
Subjective Information Extraction, Lu Chen 2
Extracting What We Think and How We Feel from What We Say in Social Media: Subjective Information Extraction
Lu Chen
Kno.e.sis Center
Wright State University
http://cdryan.com/blog/think-feel/
Subjectivity refers to the subject and his or her perspective, feelings, beliefs, and desires. In philosophy, the term is usually contrasted with objectivity. [1]
[1] Block, Ned; Flanagan, Owen J.; & Güzeldere, Güven (Eds.) The Nature of Consciousness: Philosophical Debates. Cambridge, MA: MIT Press.
http://fineartamerica.com/featured/its-all-subjective-john-crowther.html
Extraction of subjective information:
• Extracting structured subjective information from unstructured content
• Allowing computation to be done on “what people think” and “how people feel”
Directions
• From coarse-grained to fine-grained
– Document level -> sentence level -> expression level
– General sentiment -> domain-dependent sentiment -> target-dependent sentiment
– Sentiment -> subjective information
  • Sentiment (positive/negative/neutral) -> emotion (happy, sad, angry, surprise, etc.)
  • Other types of subjective information: intent, suggestion/recommendation, wish/expectation, outlook, viewpoint, etc.
• From static to dynamic
– Our attitudes can change during social communication.
  • Modeling, detecting, and tracking the change of attitude
  • What leads to the change of attitude? E.g., a persuasion campaign
[Figure: subjective information along two axes: coarse-grained to fine-grained, and static to dynamic]
Progress
[Timeline figure: Aug, 2011 -> Jan, 2012 -> May, 2012]
Discovering Fine-grained Sentiment in Suicide Notes
Extracting Sentiment Expressions from
Electoral Prediction
Understanding and Modeling Emotions with Tweets
• Extracting a diverse and richer set of sentiment-bearing expressions, including formal and slang words/phrases
• Assessing the target-dependent polarity of each sentiment expression
• A novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus
Extracting Diverse Sentiment Expressions With Target-dependent Polarity from Twitter
Lu Chen, Wenbo Wang, Meenakshi Nagarajan, Shaojun Wang, and Amit P. Sheth
Challenges
• Sentiment expressions in tweets can be very diverse.
Quantitative Study of 3000 Tweets: Distributions of N-grams and Part-of-speech of the Sentiment Expressions
Challenges
• The polarity of a sentiment expression is sensitive to its target.
– predictable: predictable movie vs. predictable stock
– long: long river vs. long battery life vs. long time for downloading
Approach
Extracting Candidate Expressions
Identifying Inter-Expression Relations
Assessing Target-dependent Polarity
Extracting Candidate Expressions
• Root word: a word that is considered sentiment-bearing in a general sense.
• Collecting root words from
– General-purpose sentiment lexicons: MPQA, General Inquirer, and SentiWordNet
– Slang dictionary: Urban Dictionary
• For each tweet, selecting the “on-target” root words, and extracting all the n-grams that contain at least one selected root word as candidates
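The n-gram extraction step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `extract_candidates`, the `max_n` cutoff, and whitespace tokenization are assumptions, and the "on-target" selection of root words is taken as already done (the slides do not say how it is performed).

```python
def extract_candidates(tokens, root_words, max_n=3):
    """Return every n-gram (1 <= n <= max_n) that contains a root word.

    `root_words` is assumed to hold only the root words already judged
    "on-target" for this tweet; that selection is not modeled here.
    """
    candidates = []
    seen = set()
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            ngram = tuple(tokens[start:start + n])
            # Keep the n-gram only if at least one token is a root word.
            if ngram not in seen and any(tok in root_words for tok in ngram):
                seen.add(ngram)
                candidates.append(" ".join(ngram))
    return candidates


tokens = "it was very good".split()
print(extract_candidates(tokens, {"good"}))
# ['good', 'very good', 'was very good']
```

Longer n-grams such as "was very good" are noisy candidates; in the approach described here they are only candidates, and their polarity is assessed in a later step.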
Identifying Inter-Expression Relations
• Connecting the candidate expressions via two types of inter-expression relations: the consistency relation and the inconsistency relation
• Basic ideas:
– A sentiment expression is inconsistent with its negation; two sentiment expressions linked by contrasting conjunctions are likely to be inconsistent.
– Two adjacent expressions are consistent if they do not overlap, no extra negation is applied to them, and no contrasting conjunction connects them.
An Example
1. I saw The Avengers yesterday evening. It was long but it was very good!
2. I do enjoy The Avengers, but it's both overrated and problematic.
3. Saw the avengers last night. Mad overrated. Cheesy lines and horrible writing. Very predictable.
4. The avengers was good but the plot was just simple minded and predictable.
5. The Avengers was good. I was not disappointed.
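The relation rules can be tried on these tweets with a small sketch. It is only an approximation under stated assumptions: the cue-word sets `CONTRAST` and `NEGATION` are hypothetical (the slides do not list them), expressions are given as token spans, and a negated expression is simply left undecided rather than handled.

```python
# Hypothetical cue-word lists; the slides do not enumerate them.
CONTRAST = {"but", "however", "although", "though", "yet"}
NEGATION = {"not", "no", "never"}


def relation(tokens, span_a, span_b):
    """Classify two expressions given as (start, end) token spans, end exclusive.

    Returns "inconsistent", "consistent", or None when this sketch
    cannot decide (overlapping or negated expressions).
    """
    (a_start, a_end), (b_start, _b_end) = sorted([span_a, span_b])
    if b_start < a_end:
        return None  # overlapping expressions are not linked

    def negated(start):
        # Assumption: a negation word directly before the expression negates it.
        return start > 0 and tokens[start - 1].lower() in NEGATION

    if negated(a_start) or negated(b_start):
        return None  # negation handling is left out of this sketch
    between = {t.lower() for t in tokens[a_end:b_start]}
    if between & CONTRAST:
        return "inconsistent"  # linked by a contrasting conjunction
    return "consistent"


tweet1 = "It was long but it was very good".split()
print(relation(tweet1, (2, 3), (6, 8)))  # "long" vs. "very good"

tweet3 = "Cheesy lines and horrible writing".split()
print(relation(tweet3, (0, 1), (3, 4)))  # "Cheesy" vs. "horrible"
```

Counting how often each pair of expressions is found consistent or inconsistent across the corpus would give the edge weights of the two relation networks.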
Assessing Target-dependent Polarity
• For each candidate expression $c_i$,
– P-Probability $\Pr^{P}(c_i)$: the probability that $c_i$ indicates positive sentiment
– N-Probability $\Pr^{N}(c_i)$: the probability that $c_i$ indicates negative sentiment
– $\Pr^{P}(c_i) + \Pr^{N}(c_i) = 1$
• For each pair of candidate expressions $c_i$ and $c_j$,
– Consistency probability: the probability that $c_i$ and $c_j$ have the same polarity:
$\Pr^{cons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{P}(c_j) + \Pr^{N}(c_i)\Pr^{N}(c_j)$
– Inconsistency probability: the probability that $c_i$ and $c_j$ have different polarities:
$\Pr^{incons}(c_i, c_j) = \Pr^{P}(c_i)\Pr^{N}(c_j) + \Pr^{N}(c_i)\Pr^{P}(c_j)$
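As a worked illustration with made-up numbers (not taken from the slides), suppose one candidate $c_i$ has P-Probability 0.9 and another candidate $c_j$ has P-Probability 0.2, so their N-Probabilities are 0.1 and 0.8. The probability that the two share a polarity, and the probability that they differ, then come out as:

```latex
\begin{align*}
\Pr^{cons}(c_i, c_j)   &= \Pr^{P}(c_i)\Pr^{P}(c_j) + \Pr^{N}(c_i)\Pr^{N}(c_j)
                        = 0.9 \cdot 0.2 + 0.1 \cdot 0.8 = 0.26 \\
\Pr^{incons}(c_i, c_j) &= \Pr^{P}(c_i)\Pr^{N}(c_j) + \Pr^{N}(c_i)\Pr^{P}(c_j)
                        = 0.9 \cdot 0.8 + 0.1 \cdot 0.2 = 0.74
\end{align*}
```

The two values sum to 1, as they must whenever the P-Probability and N-Probability of each candidate sum to 1.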
An Optimization Model
• We want the consistency and inconsistency probabilities derived from the P-Probabilities and N-Probabilities of the candidates to be as close as possible to the expectations suggested by the relation networks.
• Objective Function:
$\text{minimize} \sum_{i=1}^{n-1} \sum_{j>i}^{n} \left[ w^{cons}_{ij} \left(1 - \Pr^{cons}(c_i, c_j)\right)^2 + w^{incons}_{ij} \left(1 - \Pr^{incons}(c_i, c_j)\right)^2 \right]$
where $w^{cons}_{ij}$ and $w^{incons}_{ij}$ are the weights of the edges (the frequency of the relations) between $c_i$ and $c_j$ in the consistency and inconsistency relation networks, and n is the total number of candidate expressions.
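A minimal sketch of this kind of optimization is given below, assuming each candidate is described by a single P-Probability p (its N-Probability being 1 - p) and that the objective sums, over the edges of the two relation networks, the weighted squared gap between 1 and the consistency or inconsistency probability. Plain projected gradient descent is swapped in here for simplicity; the slides do not say which solver was used. The function names, learning rate, step count, and the toy edge weights are all assumptions.

```python
def cons_prob(p_i, p_j):
    """Probability that two candidates share a polarity."""
    return p_i * p_j + (1 - p_i) * (1 - p_j)


def objective(p, cons_edges, incons_edges):
    total = 0.0
    for (i, j), w in cons_edges.items():
        total += w * (1 - cons_prob(p[i], p[j])) ** 2
    for (i, j), w in incons_edges.items():
        # 1 - Pr_incons equals Pr_cons because the two sum to 1.
        total += w * cons_prob(p[i], p[j]) ** 2
    return total


def optimize(p_init, cons_edges, incons_edges, lr=0.05, steps=500):
    """Projected gradient descent keeping every P-Probability in [0, 1]."""
    p = list(p_init)
    for _ in range(steps):
        grad = [0.0] * len(p)
        for (i, j), w in cons_edges.items():
            residual = 1 - cons_prob(p[i], p[j])
            grad[i] += -2 * w * residual * (2 * p[j] - 1)
            grad[j] += -2 * w * residual * (2 * p[i] - 1)
        for (i, j), w in incons_edges.items():
            c = cons_prob(p[i], p[j])
            grad[i] += 2 * w * c * (2 * p[j] - 1)
            grad[j] += 2 * w * c * (2 * p[i] - 1)
        # Gradient step followed by projection onto the box [0, 1].
        p = [min(1.0, max(0.0, x - lr * g)) for x, g in zip(p, grad)]
    return p


# Toy network (hypothetical weights): 0 = "good", 1 = "long", 2 = "very good".
init = [0.9, 0.5, 0.5]        # "good" starts positive, the others undecided
cons = {(0, 2): 3}            # "good" and "very good" seen as consistent
incons = {(1, 2): 2}          # "long" and "very good" seen as inconsistent
result = optimize(init, cons, incons)
print([round(x, 3) for x in result])
```

In this toy run the positive seed pulls "very good" toward positive through the consistency edge, which in turn pushes "long" toward negative through the inconsistency edge. A candidate that starts at exactly 0.5 and has only undecided neighbors receives a zero gradient, which is one reason the choice of initial P-Probabilities matters.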
The Example
Evaluation
• Datasets:
– 168,005 tweets about movies
– 258,655 tweets about persons
• Gold standard:
– 1,500 tweets labeled with sentiment expressions and overall polarities for the movie targets
– 1,500 tweets labeled with sentiment expressions and overall polarities for the person targets
• Baseline methods:
– MPQA, GI, SWN: For each extracted root word regarding the target, simply look up its polarity in MPQA, General Inquirer and SentiWordNet, respectively.
– PROP: a propagation approach proposed by Qiu et al. (2009)
– COM-const: Assign 0.5 to all the candidates as their initial P-Probabilities.
– COM-gelex: Initialize the candidates’ polarities according to the root word set.
Reference: Qiu, G.; Liu, B.; Bu, J.; and Chen, C. 2009. Expanding domain sentiment lexicon through double propagation. In Proc. of IJCAI.
Application
Relevance of User Groups Based on Demographics and Participation to Social Media Based Prediction: A Case Study of 2012 U.S. Republican Presidential Primaries
Lu Chen, Wenbo Wang, and Amit P. Sheth
• Existing studies on predicting election results assume that all users should be treated equally.
• How do different groups of users differ in predicting election results?
1. Providing a detailed analysis of the social media users on different dimensions
2. Estimating the “vote” of each user by analyzing his/her tweets, and predicting the results based on “vote-counting”
3. Examining the predictive power of different user groups in predicting the results of Super Tuesday races in 10 states
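The "vote-counting" idea can be sketched as below. This is a hedged illustration only: the slides do not specify how a user's vote is estimated from tweets, so the per-user, per-candidate scores, the function name `predict_shares`, the tie-skipping rule, and the candidate labels "A" and "B" are all assumptions.

```python
from collections import Counter


def predict_shares(user_scores):
    """Turn per-user candidate scores into predicted vote shares.

    `user_scores` maps a user id to a dict {candidate: score}, where the
    score is some (hypothetical) measure of the user's support derived
    from his/her tweets. Each user casts one vote for the top-scoring
    candidate; users with a tie for first place cast no vote.
    """
    votes = Counter()
    for scores in user_scores.values():
        if not scores:
            continue
        best = max(scores.values())
        leaders = [cand for cand, s in scores.items() if s == best]
        if len(leaders) == 1:
            votes[leaders[0]] += 1
    total = sum(votes.values())
    return {cand: n / total for cand, n in votes.items()} if total else {}


# Hypothetical scores for four users and two candidates.
user_scores = {
    "u1": {"A": 3, "B": 1},
    "u2": {"A": 0, "B": 2},
    "u3": {"A": 5, "B": 4},
    "u4": {"A": 1, "B": 1},  # tie: no vote counted
}
shares = predict_shares(user_scores)
print(max(shares, key=shares.get))  # predicted winner
```

Restricting `user_scores` to one user group at a time (for example by engagement degree or political preference) is one way to compare the predictive power of different groups.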
User Categorization
Engagement Degree
Tweet Mode
Content Type
Political Preference
Location
Electoral Prediction with Different User Groups
Revealing the challenge of identifying the vote intent of the “silent majority”
Retweets do not necessarily reflect users' attitudes.
Predicting a user’s vote from more opinion tweets is not necessarily more accurate than predicting it from more information tweets.
The right-leaning user group provides the most accurate prediction result. In the best case (56-day time window), it correctly predicts the winners in 8 out of 10 states with an average prediction error of 0.1.
To some extent, this demonstrates the importance of identifying likely voters in electoral prediction.
Emotion
• Discovering Fine-grained Sentiment in Suicide Notes: classifying each sentence from suicide notes into 15 emotional categories, e.g., love, pride, guilt, blame, and hopelessness
• Emotion Identification from Twitter Data: 7 emotion categories, including joy, sadness, anger, love, fear, thankfulness, and surprise
– Can we automatically create a large emotion dataset with high quality labels from Twitter? How?
– What features can effectively improve the performance of supervised machine learning algorithms?
– How much performance will be gained by increasing the size of the training data?
– Can the system developed on Twitter data be directly applied to identify emotions from other datasets?
What’s next?
[Figure: subjective information along two axes: coarse-grained to fine-grained, and static to dynamic]
• Detecting the change of attitude during persuasive communication
• Discriminating other types of subjective information from sentiment, e.g., wish, intent
Thank you!