Opinion mining
DOCUMENT SENTIMENT CLASSIFICATION
SUBMITTED BY: Heena Gupta (2013EMS02)
DEFINITION:
• Classify an opinion document as expressing a positive or negative opinion or sentiment.
• The whole document is treated as the basic information unit.
PROBLEM DEFINITION
Given an opinion document d evaluating an entity, determine the overall sentiment s of the opinion holder about the entity, i.e., determine s expressed on aspect GENERAL in the quintuple
(_, GENERAL, s, _, _),
where the entity e, opinion holder h, and time of opinion t are assumed known or irrelevant (do not care).
• If s takes categorical values, e.g., positive and negative, then it is a classification problem.
• If s takes numeric values or ordinal scores within a given range, e.g., 1 to 5, the problem becomes regression.
ASSUMPTION
“The opinion document d expresses opinions on a single entity e and contains opinions from a single opinion holder h.”
Sentiment Classification Using Supervised Learning
• Usually a two-class classification problem: positive vs. negative
• If ratings are used (1-5 stars): 1-2 stars are negative, 4-5 stars are positive, and 3 stars is neutral
• Essentially a text classification problem
• Many supervised learning techniques can be applied, e.g., naïve Bayes classification and support vector machines (SVM)
Key features used in sentiment classification
• Terms and their frequency
• Part of speech (POS)
• Sentiment words and phrases
• Rules of opinions
• Sentiment shifters
• Syntactic dependency
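As a rough illustration of treating sentiment classification as text classification, the sketch below trains a tiny multinomial naïve Bayes classifier on term frequencies and applies the star-rating convention above. The function names (`stars_to_label`, `train_naive_bayes`, `predict`), the add-one smoothing, and the four toy reviews are assumptions made for this example and do not come from the slides.

```python
import math
from collections import Counter, defaultdict

def stars_to_label(stars):
    """Map a 1-5 star rating to a class: 1-2 negative, 4-5 positive, 3 neutral."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return "neutral"

def train_naive_bayes(docs):
    """docs is a list of (text, label) pairs; features are term frequencies."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)
        total_terms = sum(term_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore terms never seen in training
            logp += math.log((term_counts[label][tok] + 1) / (total_terms + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Toy data (hypothetical): review text with a star rating.
reviews = [
    ("great sound and excellent picture", 5),
    ("excellent battery great value", 4),
    ("poor sound and terrible battery", 1),
    ("terrible picture poor value", 2),
]
# 3-star (neutral) reviews are dropped to keep this a two-class problem.
train = [(text, stars_to_label(s)) for text, s in reviews if s != 3]
model = train_naive_bayes(train)
print(predict(model, "great picture"))
print(predict(model, "terrible sound"))
```

A real system would add the other feature types listed above (POS tags, sentiment words, shifters, dependencies) rather than raw terms alone.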
Sentiment Classification Using Unsupervised Learning
Algorithm
• Two consecutive words are extracted if their POS tags conform to any of a set of predefined patterns.
Example: This piano produces beautiful sounds
DT NN VBZ JJ NNS
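The phrase-extraction step can be sketched as follows. The pattern table itself did not survive in this transcript, so the five patterns below are an assumption: they are the two-word POS patterns published by Turney (2002), which this style of algorithm usually follows. The input is assumed to be already tagged with Penn Treebank tags, and `extract_phrases` is a hypothetical helper name.

```python
NOUN = {"NN", "NNS"}
ADJ = {"JJ"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

# Each pattern: (tags allowed for word 1, tags allowed for word 2,
#                tags the FOLLOWING third word must not have).
# Assumed from Turney (2002); an empty set means "third word can be anything".
PATTERNS = [
    (ADJ, NOUN, set()),
    (ADV, ADJ, NOUN),
    (ADJ, ADJ, NOUN),
    (NOUN, ADJ, NOUN),
    (ADV, VERB, set()),
]

def extract_phrases(tagged):
    """tagged is a list of (word, pos_tag) pairs for one sentence."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Tag of the third word, or None at the end of the sentence.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        for first, second, banned_third in PATTERNS:
            if t1 in first and t2 in second and t3 not in banned_third:
                phrases.append(f"{w1} {w2}")
                break
    return phrases

tagged = [("This", "DT"), ("piano", "NN"), ("produces", "VBZ"),
          ("beautiful", "JJ"), ("sounds", "NNS")]
print(extract_phrases(tagged))  # ['beautiful sounds']
```

Only "beautiful sounds" matches (adjective followed by noun), which is the phrase whose orientation the next step would estimate.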
• The algorithm estimates the sentiment orientation (SO) of each extracted phrase using the pointwise mutual information (PMI) measure:
$$\mathrm{PMI}(term_1, term_2) = \log_2 \frac{\Pr(term_1 \wedge term_2)}{\Pr(term_1)\,\Pr(term_2)}$$
• PMI measures the degree of statistical dependence between two terms.
• Pr(term1 ∧ term2) is the actual co-occurrence probability of term1 and term2.
• Pr(term1)Pr(term2) is the co-occurrence probability of the two terms if they are statistically independent.
• The SO of a phrase is computed from its association with the reference words "excellent" and "poor":
$$\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"})$$
• With the probabilities estimated from hit counts, this becomes:
$$\mathrm{SO}(phrase) = \log_2 \frac{hits(phrase \text{ NEAR "excellent"}) \cdot hits(\text{"poor"})}{hits(phrase \text{ NEAR "poor"}) \cdot hits(\text{"excellent"})}$$
• Given a review, the algorithm computes the average SO of all phrases in the review and classifies the review as positive if the average SO is positive and negative otherwise.
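A minimal sketch of the SO computation and the review-level decision, assuming the four hit counts for a phrase are already available. The counts used here are invented for illustration, and the small `smoothing` constant added to the NEAR counts (to avoid division by zero) is an assumption of this sketch.

```python
import math

def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) = log2( hits(phrase NEAR excellent) * hits(poor)
                        / (hits(phrase NEAR poor) * hits(excellent)) )."""
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)

def classify_review(phrase_so_values):
    """Positive if the average SO over the review's phrases is positive."""
    average = sum(phrase_so_values) / len(phrase_so_values)
    return "positive" if average > 0 else "negative"

# Hypothetical hit counts for two phrases extracted from one review.
so_values = [
    semantic_orientation(80, 10, 1000, 1000),   # e.g. "beautiful sounds"
    semantic_orientation(5, 20, 1000, 1000),    # e.g. "heavy lid"
]
print(so_values)
print(classify_review(so_values))
```

With equal counts for "excellent" and "poor", the sign of SO depends only on whether the phrase appears more often near "excellent" or near "poor".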
Sentiment Rating Prediction (Regression Problem)
Rating prediction was modeled as a graph-based semi-supervised learning problem, which used
• labeled reviews (with ratings)
• unlabeled reviews (without ratings).
The unlabeled reviews were also the test reviews whose ratings needed to be predicted.
In the graph,
• each node is a document (review), and
• the link between two nodes is weighted by the similarity between the two documents.
The algorithm assumes that a separate learner has already predicted numerical ratings for the unlabeled documents. The graph-based method then revises these ratings by solving an optimization problem that forces the ratings to be smooth over the graph with regard to both the ratings and the link weights.
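The graph-based revision step might look roughly like the sketch below. It is not the exact objective of the original method, which the slides do not give. It assumes a simple quadratic loss: each unlabeled rating should stay close to the separate learner's prediction while also agreeing with its neighbours in proportion to the link weight, and labeled ratings stay fixed. The names `smooth_ratings` and `alpha` are hypothetical.

```python
def smooth_ratings(initial, labeled, edges, alpha=1.0, iterations=100):
    """
    initial: dict node -> rating predicted by a separate learner (unlabeled nodes)
    labeled: dict node -> known rating (kept fixed)
    edges:   dict (i, j) -> similarity weight, one entry per undirected pair

    Minimises, over the unlabeled ratings f,
        sum_i (f_i - initial_i)^2 + alpha * sum_(i,j) w_ij * (f_i - f_j)^2
    by repeatedly setting each f_i to its closed-form optimum given its neighbours.
    """
    neighbors = {}
    for (i, j), w in edges.items():
        neighbors.setdefault(i, []).append((j, w))
        neighbors.setdefault(j, []).append((i, w))

    ratings = {**initial, **labeled}
    for _ in range(iterations):
        for node, prior in initial.items():
            if node in labeled:
                continue  # known ratings are never revised
            nbrs = neighbors.get(node, [])
            weight_sum = sum(w for _, w in nbrs)
            pulled = sum(w * ratings[n] for n, w in nbrs)
            ratings[node] = (prior + alpha * pulled) / (1.0 + alpha * weight_sum)
    return ratings

# Toy graph: review "a" has a known 5-star rating; review "b" was predicted
# as 1 star by the separate learner but is very similar to "a".
result = smooth_ratings(initial={"b": 1.0}, labeled={"a": 5.0},
                        edges={("a", "b"): 1.0})
print(result)
```

In this toy case the revised rating of "b" moves from 1 toward its similar, labeled neighbour, which is the smoothing effect described above.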
Cross Domain Sentiment Classification
Sentiment classification is highly sensitive to the domain from which the training data is extracted.
Two types of domains
Source domain: the original domain with labeled training data
Target domain: the new domain that is used for testing
Four Strategies
1. Training on a mixture of labeled reviews from other domains where such data are available and testing on the target domain
2. Training a classifier as above, but limiting the set of features to those observed in the target domain
3. Using ensembles of classifiers from domains with available labeled data and testing on the target domain
4. Combining small amounts of labeled data with large amounts of unlabeled data in the target domain
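Strategy 2 can be illustrated with a short sketch: before training on source-domain reviews, drop every feature that never occurs in the (unlabeled) target-domain data. Whitespace tokenisation, the helper name `restrict_to_target_vocabulary`, and the toy movie and phone reviews are assumptions made for this example.

```python
def restrict_to_target_vocabulary(source_docs, target_docs):
    """
    source_docs: list of (text, label) pairs from the source domain
    target_docs: list of unlabeled texts from the target domain
    Returns the source documents with only target-domain terms kept,
    plus the target vocabulary itself.
    """
    target_vocab = set()
    for text in target_docs:
        target_vocab.update(text.lower().split())

    restricted = []
    for text, label in source_docs:
        kept = [tok for tok in text.lower().split() if tok in target_vocab]
        restricted.append((kept, label))
    return restricted, target_vocab

# Source domain: movie reviews (labeled). Target domain: phone reviews (unlabeled).
source = [
    ("the plot was gripping and great", "positive"),
    ("boring plot and poor acting", "negative"),
]
target = ["great battery and sharp screen", "poor battery life"]

restricted, vocab = restrict_to_target_vocabulary(source, target)
print(restricted)
```

Domain-specific source terms such as "plot" and "acting" are removed, while general sentiment words shared by both domains ("great", "poor") remain as training features.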
Cross Language Sentiment Classification
Cross-language sentiment classification means performing sentiment classification of opinion documents in multiple languages.
Example: To classify Chinese reviews using sentiment resources in English, the following algorithm is used:
• Each Chinese review is translated into English using multiple translators, which produce different English versions.
• A lexicon-based approach is then used to classify each translated English version. The lexicon consists of a set of positive terms, a set of negative terms, a set of negation terms, and a set of intensifiers.
• The algorithm sums up the sentiment scores of the terms in the review, taking negations and intensifiers into account.
• If the final score is less than 0, the review is negative; otherwise it is positive.
• For the final classification of each review, the scores of the different translated versions are combined using various ensemble methods, e.g., average, max, weighted average, and voting.
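The lexicon-based scoring and two of the ensemble methods (average and voting) can be sketched as below. The tiny lexicon, the intensifier weights, and the rule that a negation or intensifier applies to the next sentiment term are all assumptions made for this illustration. The translated versions are written by hand here, since no real translator is called.

```python
# Hypothetical toy lexicon; a real one would be far larger.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "extremely": 3.0}

def lexicon_score(text):
    """Sum term scores; negations flip and intensifiers scale the next sentiment term."""
    score, negate, boost = 0.0, False, 1.0
    for tok in text.lower().split():
        if tok in NEGATIONS:
            negate = True
        elif tok in INTENSIFIERS:
            boost *= INTENSIFIERS[tok]
        elif tok in POSITIVE or tok in NEGATIVE:
            value = 1.0 if tok in POSITIVE else -1.0
            if negate:
                value = -value
            score += value * boost
            negate, boost = False, 1.0  # modifiers are consumed
    return score

def label(score):
    """Score below 0 is negative, otherwise positive."""
    return "negative" if score < 0 else "positive"

def combine_by_average(translations):
    scores = [lexicon_score(t) for t in translations]
    return label(sum(scores) / len(scores))

def combine_by_vote(translations):
    votes = [label(lexicon_score(t)) for t in translations]
    return "negative" if votes.count("negative") > votes.count("positive") else "positive"

# Three hand-written stand-ins for different machine translations of one review.
versions = [
    "this phone is very good",
    "the phone is not bad",
    "this phone is poor",
]
print([lexicon_score(v) for v in versions])
print(combine_by_average(versions), combine_by_vote(versions))
```

Combining several translations makes the final label less sensitive to a single poor translation, such as the third version here.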
SENTENCE SUBJECTIVITY AND SENTIMENT CLASSIFICATION
INTRODUCTION
Sentences are short documents. Sentence-level analysis classifies the sentiment expressed in each sentence.
ASSUMPTION
Researchers often assume that a sentence usually contains a single opinion.
PROBLEM DEFINITION
Given a sentence x, determine whether x expresses a positive, negative, or neutral (or no) opinion.
SENTENCE SENTIMENT CLASSIFICATION CAN BE SOLVED AS
• Two separate classification problems:
1. Classify whether a sentence expresses an opinion or not (subjectivity classification)
2. Classify the opinion sentences into positive and negative classes
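The two-step formulation amounts to chaining two classifiers, as in the sketch below. The two toy classifiers passed in are placeholders invented for this example; in practice each step would be a trained model.

```python
def classify_sentence(sentence, is_subjective, polarity):
    """Step 1: filter out sentences with no opinion. Step 2: classify polarity."""
    if not is_subjective(sentence):
        return "neutral"
    return polarity(sentence)

# Placeholder classifiers (hypothetical keyword rules, for illustration only).
def toy_is_subjective(sentence):
    return any(w in {"love", "hate"} for w in sentence.lower().split())

def toy_polarity(sentence):
    return "positive" if "love" in sentence.lower().split() else "negative"

for s in ["I love this camera", "I hate the strap", "The camera has two lenses"]:
    print(s, "->", classify_sentence(s, toy_is_subjective, toy_polarity))
```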
SUBJECTIVITY CLASSIFICATION
Sentences are classified into two types
• Subjective (gives personal views and opinions)
• Objective (gives factual information)
• Subjectivity classification is based on supervised learning.
• Gradability is a semantic property that enables a word to appear in a comparative construct and to accept modifying expressions that act as intensifiers or diminishers.
Example: a small planet is usually much larger than a large house
• Sentence similarity was measured based on shared words and phrases.
One of the bottlenecks in applying supervised learning is the manual effort involved in annotating a large number of training examples.
Solution:
A bootstrapping approach was proposed to label training data automatically.
• The algorithm works by first using two high-precision classifiers to automatically identify some subjective and objective sentences.
• The high-precision classifiers use lists of lexical items (single words or n-grams) that are good subjectivity clues.
• HP-Subj classifies a sentence as subjective if it contains two or more strong subjective clues.
• HP-Obj classifies a sentence as objective if it contains no strong subjective clues.
• The extracted sentences are then added to the training data to learn patterns.
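The seeding step of this bootstrapping approach can be sketched as follows, using only the two rules stated above. The clue list is a small invented stand-in for a real subjectivity lexicon, and `hp_label` and `seed_training_data` are hypothetical names. Sentences with exactly one strong clue are left unlabeled, since neither high-precision rule fires.

```python
# Hypothetical strong subjectivity clues (single words only, for brevity).
STRONG_CLUES = {"hate", "love", "terrible", "wonderful", "outrageous"}

def count_strong_clues(sentence):
    """Count tokens that appear in the strong-clue list, ignoring punctuation."""
    return sum(1 for tok in sentence.lower().split()
               if tok.strip(".,!?") in STRONG_CLUES)

def hp_label(sentence):
    """HP-Subj: two or more strong clues. HP-Obj: no strong clues. Else undecided."""
    n = count_strong_clues(sentence)
    if n >= 2:
        return "subjective"
    if n == 0:
        return "objective"
    return None

def seed_training_data(sentences):
    """Split sentences into automatically labeled seeds and still-unlabeled ones."""
    labeled, unlabeled = [], []
    for s in sentences:
        tag = hp_label(s)
        if tag is None:
            unlabeled.append(s)
        else:
            labeled.append((s, tag))
    return labeled, unlabeled

corpus = [
    "I love this wonderful camera.",
    "The camera weighs 300 grams.",
    "I hate the strap.",
]
labeled, unlabeled = seed_training_data(corpus)
print(labeled)
print(unlabeled)
```

The labeled seeds would then serve as training data for learning further patterns, which can in turn label more of the remaining sentences.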
SENTENCE SENTIMENT CLASSIFICATION
ASSUMPTION
A sentence expresses a single sentiment from a single opinion holder.
METHOD
• For sentiment classification of subjective sentences, a large set of seed adjectives is used.
• A modified log-likelihood ratio is used to determine the positive or negative orientation of each adjective, adverb, noun, and verb.
• An orientation is assigned to each sentence based on the average log-likelihood score of its words.
• Two thresholds are chosen using the training data and applied to determine whether the sentence has a positive, negative, or neutral orientation.
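The final decision step of this method (average word score compared against two thresholds) can be sketched as below. The word scores and the threshold values are made up for illustration. In the method described above, the scores would come from the modified log-likelihood ratio and the thresholds would be tuned on training data.

```python
def sentence_orientation(sentence, word_scores, lower=-0.5, upper=0.5):
    """
    Average the scores of the scored words in the sentence, then apply two
    thresholds: above `upper` is positive, below `lower` is negative,
    anything in between is neutral.
    """
    tokens = [tok.strip(".,!?") for tok in sentence.lower().split()]
    scores = [word_scores[tok] for tok in tokens if tok in word_scores]
    if not scores:
        return "neutral"  # no scored words at all
    average = sum(scores) / len(scores)
    if average > upper:
        return "positive"
    if average < lower:
        return "negative"
    return "neutral"

# Hypothetical per-word orientation scores.
WORD_SCORES = {"great": 2.0, "awful": -2.0, "fine": 0.3}

print(sentence_orientation("The screen is great.", WORD_SCORES))
print(sentence_orientation("The battery is awful.", WORD_SCORES))
print(sentence_orientation("The case is fine.", WORD_SCORES))
```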
DEALING WITH CONDITIONAL SENTENCES
• Conditional sentences are sentences that describe implications or hypothetical situations and their consequences.
• Such a sentence typically contains two clauses that are dependent on each other: the condition clause and the consequent clause.
• Their relationship has a significant impact on whether the sentence expresses a positive or negative sentiment.
• EXAMPLE: “If someone makes a reliable car, I will buy it.” Although “reliable” is a positive word, the sentence expresses no opinion about any particular car.
CROSS LANGUAGE SUBJECTIVITY CLASSIFICATION
Three translation-based strategies:
• Translate test sentences in the target language into the source language and classify them using a source language classifier.
• Translate a source language training corpus into the target language and build a corpus-based classifier in the target language.
• Translate a sentiment or subjectivity lexicon in the source language to the target language and build a lexicon-based classifier in the target language.
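The lexicon-translation strategy can be sketched as follows. A small hand-written English-to-Spanish dictionary stands in for a real bilingual resource or translation system, and the rule "subjective if at least one translated clue appears" is an assumption of this sketch. All names here are hypothetical.

```python
def translate_lexicon(source_lexicon, bilingual_dict):
    """Map each source-language clue to its target-language translations."""
    target_lexicon = {}
    for word, strength in source_lexicon.items():
        for translation in bilingual_dict.get(word, []):
            # Keep the first strength seen if two source words share a translation.
            target_lexicon.setdefault(translation, strength)
    return target_lexicon

def is_subjective(sentence, target_lexicon, min_clues=1):
    """Lexicon-based classifier in the target language."""
    tokens = [tok.strip(".,!?") for tok in sentence.lower().split()]
    return sum(1 for tok in tokens if tok in target_lexicon) >= min_clues

# Toy source-language (English) subjectivity lexicon and bilingual dictionary.
english_lexicon = {"excellent": "strong", "terrible": "strong"}
en_to_es = {"excellent": ["excelente"], "terrible": ["terrible", "pésimo"]}

spanish_lexicon = translate_lexicon(english_lexicon, en_to_es)
print(spanish_lexicon)
print(is_subjective("El servicio es excelente.", spanish_lexicon))
print(is_subjective("La mesa es de madera.", spanish_lexicon))
```

The other two strategies follow the same pattern but translate the test sentences or the training corpus instead of the lexicon.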