Opinion mining

DOCUMENT SENTIMENT CLASSIFICATION

SUBMITTED BY: Heena Gupta (2013EMS02)

DEFINITION:

• To classify an opinion document as expressing a positive or negative opinion or sentiment.

• It considers the whole document as the basic information unit.

PROBLEM DEFINITION

Given an opinion document d evaluating an entity, determine the overall sentiment s of the opinion holder about the entity, i.e., determine s expressed on aspect GENERAL in the quintuple

(_, GENERAL, s, _, _),

where the entity e, opinion holder h, and time of opinion t are assumed known or irrelevant (do not care).

• If s takes categorical values, e.g., positive and negative, then it is a classification problem.

• If s takes numeric values or ordinal scores within a given range, e.g., 1 to 5, the problem becomes regression.

ASSUMPTION

“The opinion document d expresses opinions on a single entity e and contains opinions from a single opinion holder h.”

Sentiment Classification Using Supervised Learning

• Usually a two-class classification problem: positive vs. negative.

• If ratings are used (1-5 stars): 1-2 stars are negative, 4-5 stars are positive, and 3 stars are neutral.

• Essentially a text classification problem.

• Many supervised learning techniques can be applied, e.g., naïve Bayes classification and support vector machines (SVM).
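The star-rating mapping and a basic supervised classifier can be sketched as follows. This is a minimal multinomial naive Bayes with add-one (Laplace) smoothing over whitespace-separated tokens, written from scratch to stay self-contained; the function names and toy reviews are hypothetical, SVMs are not shown, and a practical system would use a proper tokenizer and a library implementation.

```python
import math
from collections import Counter, defaultdict


def rating_to_label(stars):
    """Map a 1-5 star rating to a sentiment class (3 stars = neutral)."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return "neutral"


def train_nb(docs, labels):
    """Collect class frequencies and per-class word counts."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in doc.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


reviews = [("great sound great value", 5), ("awful battery terrible screen", 1),
           ("it is okay", 3)]
# Keep only reviews that map to the two classes being learned.
train = [(text, rating_to_label(r)) for text, r in reviews
         if rating_to_label(r) != "neutral"]
model = train_nb([t for t, _ in train], [lab for _, lab in train])
print(predict_nb(model, "great value"))  # positive
```

Dropping the 3-star (neutral) reviews, as in the usage lines, is one common choice when training a two-class positive/negative model.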

Key features used in sentiment classification:

• Terms and their frequency

• Part of speech (POS)

• Sentiment words and phrases

• Rules of opinions

• Sentiment shifters

• Syntactic dependency

Algorithm

• Two consecutive words are extracted if their POS tags conform to any of the patterns.

Example: This piano produces beautiful sounds

DT NN VBZ JJ NNS

Here “beautiful sounds” (an adjective followed by a plural noun) is extracted.
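The pattern table did not survive in this transcript. The sketch below assumes the five two-word patterns commonly cited for this unsupervised method (a condition on the third word can block a match, but the third word itself is never extracted) and takes Penn Treebank-tagged (word, tag) pairs as input, so no tagger is needed. The pattern encoding and the name `extract_phrases` are illustrative.

```python
# Tag groups (Penn Treebank tags) used by the assumed patterns.
JJ = {"JJ"}
NN = {"NN", "NNS"}
RB = {"RB", "RBR", "RBS"}
VB = {"VB", "VBD", "VBN", "VBG"}

# (tags of first word, tags of second word, tags the third word must NOT have)
PATTERNS = [
    (JJ, NN, set()),  # 1. JJ + NN/NNS, anything may follow
    (RB, JJ, NN),     # 2. RB/RBR/RBS + JJ, next word not NN/NNS
    (JJ, JJ, NN),     # 3. JJ + JJ, next word not NN/NNS
    (NN, JJ, NN),     # 4. NN/NNS + JJ, next word not NN/NNS
    (RB, VB, set()),  # 5. RB/RBR/RBS + VB/VBD/VBN/VBG, anything may follow
]


def extract_phrases(tagged):
    """Return two-word phrases whose tags match any pattern.

    `tagged` is a list of (word, tag) pairs for one sentence.
    """
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        for first, second, banned_third in PATTERNS:
            if t1 in first and t2 in second and t3 not in banned_third:
                phrases.append(f"{w1} {w2}")
                break
    return phrases


sentence = [("This", "DT"), ("piano", "NN"), ("produces", "VBZ"),
            ("beautiful", "JJ"), ("sounds", "NNS")]
print(extract_phrases(sentence))  # ['beautiful sounds']
```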

Sentiment Classification Using Unsupervised Learning

• Estimates the sentiment orientation (SO) of the extracted phrases using the pointwise mutual information (PMI) measure:

$$\mathrm{PMI}(term_1, term_2) = \log_2\left(\frac{\Pr(term_1 \wedge term_2)}{\Pr(term_1)\,\Pr(term_2)}\right)$$

PMI measures the degree of statistical dependence between two terms.

Pr(term1 ∧ term2) is the actual co-occurrence probability of term1 and term2.

Pr(term1)Pr(term2) is the co-occurrence probability of the two terms if they are statistically independent.
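To make the formula concrete, the probabilities can be estimated as relative frequencies over a corpus of N units (documents, sentences, or windows). A minimal sketch under that assumption; `pmi` and the counts in the usage lines are illustrative. Because every probability shares the denominator N, the ratio simplifies to count_both · N / (count_1 · count_2).

```python
import math


def pmi(count_both, count_1, count_2, n):
    """Base-2 PMI of two terms estimated from corpus counts.

    count_both: number of units containing both terms
    count_1, count_2: number of units containing each term
    n: total number of units in the corpus
    Undefined (math domain error) when the terms never co-occur.
    """
    # Pr(t1 and t2) / (Pr(t1) Pr(t2)) with Pr(x) = count(x) / n
    ratio = (count_both * n) / (count_1 * count_2)
    return math.log2(ratio)


print(pmi(20, 100, 50, 1000))  # 2.0: co-occur 4x more often than under independence
print(pmi(5, 100, 50, 1000))   # 0.0: exactly as often as under independence
```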

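$$SO(phrase) = \mathrm{PMI}(phrase, \text{“excellent”}) - \mathrm{PMI}(phrase, \text{“poor”})$$

Estimating each probability from search-engine hit counts, with co-occurrence measured by the NEAR operator, gives:

$$SO(phrase) = \log_2\left(\frac{hits(phrase \text{ NEAR “excellent”}) \cdot hits(\text{“poor”})}{hits(phrase \text{ NEAR “poor”}) \cdot hits(\text{“excellent”})}\right)$$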

• Given a review, the algorithm computes the average SO of all phrases in the review and classifies the review as positive if the average SO is positive and negative otherwise.
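The hit-count form of SO and the review-level decision rule can be sketched as below. The hit counts would come from search-engine or corpus NEAR queries; here they are plain numbers supplied by the caller. The small smoothing constant added to the NEAR counts is an assumption, used to avoid division by zero (and the log of zero) when a phrase never occurs near one of the reference words; the function names are hypothetical.

```python
import math


def semantic_orientation(near_excellent, near_poor, hits_excellent, hits_poor,
                         smoothing=0.01):
    """SO(phrase) from hit counts.

    near_excellent: hits(phrase NEAR "excellent")
    near_poor:      hits(phrase NEAR "poor")
    hits_excellent, hits_poor: hits("excellent"), hits("poor")
    smoothing: assumed constant added to the NEAR counts to avoid zeros
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def classify_review(phrase_sos):
    """Positive if the average SO of the review's phrases is positive, else negative.

    Expects SO values for at least one extracted phrase.
    """
    average = sum(phrase_sos) / len(phrase_sos)
    return "positive" if average > 0 else "negative"


so = semantic_orientation(40, 10, 1000, 1000, smoothing=0)
print(so)                           # 2.0
print(classify_review([so, -0.5]))  # positive
```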

Rating prediction was modeled as a graph-based semi-supervised learning problem, which uses:

• labeled reviews (with ratings)

• unlabeled reviews (without ratings)

The unlabeled reviews are also the test reviews whose ratings need to be predicted.

In the graph:

• each node is a document (review), and

• the link between two nodes is the similarity value between the two documents.

The algorithm assumes that a separate learner has already predicted the numerical ratings of the unlabeled documents. The graph-based method only improves these predictions by revising them through an optimization problem that forces the ratings to be smooth throughout the graph with regard to both the ratings and the link weights.
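The optimization problem itself is not spelled out here. The sketch below assumes a simplified quadratic objective, not necessarily the original formulation: labeled reviews keep their known ratings, and each unlabeled review's rating is pulled both toward the separate learner's prediction and toward the ratings of similar reviews, weighted by link similarity. Setting the derivative with respect to one rating to zero gives a closed-form update that is applied repeatedly (coordinate descent). The names `smooth_ratings`, `alpha`, and `beta` are hypothetical.

```python
def smooth_ratings(weights, labeled, initial, alpha=1.0, beta=1.0, iters=100):
    """Revise predicted ratings so they vary smoothly over a similarity graph.

    weights: node -> {neighbour: similarity}; symmetric, and every neighbour
             must appear in `labeled` or `initial`
    labeled: node -> known rating (kept fixed)
    initial: node -> rating predicted by a separate learner (unlabeled nodes only)
    Assumed objective, minimized over the unlabeled ratings f:
        alpha * sum_j (f_j - initial_j)^2 + beta * sum_edges w_jk (f_j - f_k)^2
    """
    f = {**initial, **labeled}
    for _ in range(iters):
        for j in initial:
            nbrs = weights.get(j, {})
            # Zero of the derivative w.r.t. f_j, holding the other ratings fixed.
            num = alpha * initial[j] + beta * sum(w * f[k] for k, w in nbrs.items())
            den = alpha + beta * sum(nbrs.values())
            f[j] = num / den
    return {j: f[j] for j in initial}


graph = {"u": {"a": 1.0, "b": 0.0}, "a": {"u": 1.0}, "b": {"u": 0.0}}
print(smooth_ratings(graph, labeled={"a": 5.0, "b": 1.0}, initial={"u": 3.0}))
# {'u': 4.0}
```

In the usage lines, the unlabeled review starts at 3 and is linked only to a 5-star review, so with alpha = beta = 1 its revised rating is (3 + 5) / 2 = 4.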

Sentiment Rating Prediction (Regression Problem)

Sentiment classification is highly sensitive to the domain from which the training data is extracted.

Two types of domains

Source domain: the original domain, with labeled training data

Target domain: the new domain, which is used for testing

Four Strategies

1. Training on a mixture of labeled reviews from other domains where such data are available and testing on the target domain

2. Training a classifier as above, but limiting the set of features to only those observed in the target domain

3. Using ensembles of classifiers from domains with available labeled data and testing on the target domain

4. Combining small amounts of labeled data with large amounts of unlabeled data in the target domain.
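Strategy 2 amounts to filtering the feature set before training: features are still extracted from labeled reviews of other domains, but only words that also occur in (unlabeled) target-domain text are kept. A minimal sketch, assuming whitespace tokenization and bag-of-words count features; the function names and example texts are hypothetical.

```python
from collections import Counter


def target_vocabulary(target_docs):
    """All tokens observed in the target-domain documents."""
    vocab = set()
    for doc in target_docs:
        vocab.update(doc.lower().split())
    return vocab


def restrict_features(source_docs, vocab):
    """Bag-of-words counts for source documents, keeping only target-domain words."""
    return [Counter(t for t in doc.lower().split() if t in vocab)
            for doc in source_docs]


vocab = target_vocabulary(["the battery lasts long", "long battery life"])
print(restrict_features(["the plot is long and boring"], vocab))
```

The filtered feature dictionaries can then be passed to any supervised classifier; words such as “plot” and “boring” that never occur in the target domain simply stop contributing.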

Cross Domain Sentiment Classification

Cross-language sentiment classification means performing sentiment classification of opinion documents in multiple languages.

Example: to classify Chinese reviews using sentiment resources in English, the following algorithm is used:

• Translate each Chinese review into English using multiple translators, which produce different English versions.

• Use a lexicon-based approach to classify each translated English version. The lexicon consists of a set of positive terms, a set of negative terms, a set of negation terms, and a set of intensifiers.

• Sum up the sentiment scores of the terms in the review, taking negations and intensifiers into account.

• If the final score is less than 0, the review is negative; otherwise, it is positive.

• For the final classification of each review, combine the scores of the different translated versions using various ensemble methods, e.g., average, max, weighted average, or voting.
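The scoring and ensemble steps can be sketched as follows, assuming the translations are already available as English strings (no translation service is called). The tiny lexicon, the intensifier multipliers, and the rule that a negation or intensifier applies to the next sentiment term are illustrative assumptions, not the original system's resources.

```python
from statistics import mean

# Hypothetical toy lexicon; a real system would load a full sentiment lexicon.
POSITIVE = {"good", "great", "reliable"}
NEGATIVE = {"bad", "poor", "noisy"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "extremely": 3.0}


def lexicon_score(text):
    """Sum term scores, flipping the sign after a negation and scaling after an intensifier.

    Assumption: a pending negation or intensifier affects only the next sentiment term.
    """
    score = 0.0
    negate, weight = False, 1.0
    for token in text.lower().split():
        if token in NEGATIONS:
            negate = True
        elif token in INTENSIFIERS:
            weight *= INTENSIFIERS[token]
        elif token in POSITIVE or token in NEGATIVE:
            value = 1.0 if token in POSITIVE else -1.0
            if negate:
                value = -value
            score += weight * value
            negate, weight = False, 1.0  # reset after use
    return score


def classify_translations(translations, combine=mean):
    """Combine the scores of several translated versions; a score below 0 is negative."""
    final = combine([lexicon_score(t) for t in translations])
    return "negative" if final < 0 else "positive"


versions = ["the car is not reliable", "the car is very noisy", "the car is good"]
print(classify_translations(versions))  # negative (mean of -1, -2, 1)
```

Passing `combine=max` instead of the default `mean` switches the ensemble from averaging to taking the maximum score; voting would instead count the positive and negative decisions of the individual versions.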

Cross Language Sentiment Classification

SENTENCE SUBJECTIVITY AND SENTIMENT CLASSIFICATION


INTRODUCTION

Sentences are short documents. Sentence-level analysis classifies the sentiment expressed in each sentence.

ASSUMPTION

One assumption that researchers often make is that a sentence usually contains a single opinion.

PROBLEM DEFINITION

Given a sentence x, determine whether x expresses a positive, negative, or neutral (or no) opinion.

SENTENCE SENTIMENT CLASSIFICATION CAN BE SOLVED AS

• Two separate classification problems:

1. Classify whether a sentence expresses an opinion or not (subjectivity classification)

2. Classify those opinion sentences into positive and negative classes
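The two-stage decomposition can be expressed as a small pipeline in which the polarity classifier only sees sentences that the subjectivity classifier accepts. In this sketch both classifiers are passed in as functions; the keyword-based `toy_subjective` and `toy_polarity` are hypothetical stand-ins for trained models, and sentences without an opinion are labeled “neutral”, following the problem definition.

```python
# Hypothetical clue words standing in for trained classifiers.
OPINION_WORDS = {"love", "hate", "great", "awful"}
POSITIVE_WORDS = {"love", "great"}


def toy_subjective(sentence):
    """Stage 1 stand-in: subjective if any opinion word appears."""
    return any(w in OPINION_WORDS for w in sentence.lower().split())


def toy_polarity(sentence):
    """Stage 2 stand-in: positive if any positive word appears, else negative."""
    words = sentence.lower().split()
    return "positive" if any(w in POSITIVE_WORDS for w in words) else "negative"


def classify_sentence(sentence, is_subjective, polarity):
    """Run polarity classification only on sentences judged subjective."""
    if not is_subjective(sentence):
        return "neutral"  # no opinion expressed
    return polarity(sentence)


for s in ["The phone was released in 2013", "I love this phone", "The screen is awful"]:
    print(s, "->", classify_sentence(s, toy_subjective, toy_polarity))
```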

Sentences are classified into two types:

• Subjective (expresses personal views and opinions)

• Objective (expresses factual information)

• Subjectivity classification is based on supervised learning.

• Gradability is a semantic property that enables a word to appear in a comparative construct and to accept modifying expressions that act as intensifiers or diminishers. Example: a small planet is usually much larger than a large house, because gradable adjectives such as “small” and “large” are interpreted relative to the noun they modify.

• Sentence similarity was measured based on shared words and phrases.

SUBJECTIVITY CLASSIFICATION

One of the bottlenecks in applying supervised learning is the manual effort involved in annotating a large number of training examples.

Solution:

A bootstrapping approach was proposed to label training data automatically.

• The algorithm works by first using two high precision classifiers to automatically identify some subjective and objective sentences.

• The high-precision classifiers use lists of lexical items (single words or n-grams) that are good subjectivity clues.

• HP-Subj classifies a sentence as subjective if it contains two or more strong subjective clues.

• HP-Obj classifies a sentence as objective if there are no strong subjective clues.

• The extracted sentences are then added to the training data to learn patterns.
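The initial labeling step of the bootstrapping approach can be sketched with the two high-precision rules stated above. The clue list is a hypothetical stand-in for a curated list of strong subjectivity clues, only single-word clues are handled, and the later pattern-learning step is not shown. Sentences with exactly one strong clue are left unlabeled, because neither high-precision rule is confident about them.

```python
# Hypothetical stand-in for a curated list of strong subjectivity clues.
STRONG_SUBJ_CLUES = {"amazing", "terrible", "love", "hate", "horrible", "wonderful"}


def count_strong_clues(sentence):
    return sum(1 for token in sentence.lower().split() if token in STRONG_SUBJ_CLUES)


def hp_subj(sentence):
    """High-precision subjective: two or more strong subjective clues."""
    return count_strong_clues(sentence) >= 2


def hp_obj(sentence):
    """High-precision objective: no strong subjective clues."""
    return count_strong_clues(sentence) == 0


def bootstrap_labels(unlabeled):
    """Automatically label only the sentences the high-precision rules are sure about."""
    labeled = []
    for s in unlabeled:
        if hp_subj(s):
            labeled.append((s, "subjective"))
        elif hp_obj(s):
            labeled.append((s, "objective"))
    return labeled


sentences = ["I love this amazing camera", "The camera has a 3x zoom", "The zoom is terrible"]
print(bootstrap_labels(sentences))
```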

ASSUMPTION

A sentence expresses a single sentiment from a single opinion holder.

METHOD

• For sentiment classification of subjective sentences, a large set of seed adjectives is used.

• A modified log-likelihood ratio is used to determine the positive or negative orientation of each adjective, adverb, noun, and verb.

• An orientation is assigned to each sentence by averaging the log-likelihood scores of its words.

• Two thresholds are chosen using the training data and applied to determine whether the sentence has a positive, negative, or neutral orientation.
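The sentence-level decision rule can be sketched as an average of word scores compared against two thresholds. The sketch assumes the word orientation scores (e.g., from the modified log-likelihood ratio) have already been computed, averages only over words that have a score, and uses made-up scores and thresholds; in practice both thresholds would be tuned on training data.

```python
def sentence_orientation(words, word_scores, lower, upper):
    """Assign an orientation from the average score of the sentence's scored words.

    word_scores: word -> precomputed orientation score (positive values = positive)
    lower, upper: the two thresholds, normally chosen on training data
    """
    scores = [word_scores[w] for w in words if w in word_scores]
    if not scores:
        return "neutral"  # no scored words in the sentence
    avg = sum(scores) / len(scores)
    if avg > upper:
        return "positive"
    if avg < lower:
        return "negative"
    return "neutral"


toy_scores = {"excellent": 2.0, "slow": -1.5, "battery": 0.1}  # hypothetical values
print(sentence_orientation("the battery is excellent".split(), toy_scores, -0.5, 0.5))
```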

SENTENCE SENTIMENT CLASSIFICATION

DEALING WITH CONDITIONAL SENTENCES

• Conditional sentences are sentences that describe implications or hypothetical situations and their consequences.

Such a sentence typically contains two clauses, the condition clause and the consequent clause, that are dependent on each other. Their relationship has a significant impact on whether the sentence expresses a positive or negative sentiment.

• EXAMPLE:

“If someone makes a reliable car, I will buy it”

This sentence expresses no positive or negative opinion about any particular car.

• Translate test sentences in the target language into the source language and classify them using a source language classifier.

• Translate a source language training corpus into the target language and build a corpus-based classifier in the target language.

• Translate a sentiment or subjectivity lexicon in the source language to the target language and build a lexicon-based classifier in the target language.

CROSS LANGUAGE SUBJECTIVITY CLASSIFICATION
