
ACL01 Workshop on Collocation

Identifying Collocations for Recognizing Opinions

Janyce Wiebe, Theresa Wilson, Matthew Bell
University of Pittsburgh

Office of Naval Research grant N00014-95-1-0776


Introduction

Subjectivity: aspects of language used to express opinions and evaluations (Banfield 1982)

Relevant for many NLP applications, such as information extraction and text categorization

This paper: identifying collocational clues of subjectivity


Outline

Subjectivity
Data and annotation
Unigram features
N-gram features
Generalized N-gram features
Document classification


Subjectivity Tagging

Recognizing opinions and evaluations (Subjective sentences) as opposed to material objectively presented as true (Objective sentences)

Banfield 1982, Fludernik 1993, Wiebe 1994, Stein & Wright 1995


Examples

At several different levels, it’s a fascinating tale. subjective

Bell Industries Inc. increased its quarterly to 10 cents from 7 cents a share. objective


Subjectivity

“Complained”  “You Idiot!”  “Terrible product”

“Speculated”  “Maybe”

“Enthused”  “Wonderful!”  “Great product”


Examples

Strong addressee-oriented negative evaluation
Recognizing flames (Spertus 1997)
Personal e-mail filters (Kaufer 2000)

I had in mind your facts, buddy, not hers.

Nice touch. “Alleges” whenever facts posted are not in your persona of what is “real.”


Examples

Opinionated, editorial language
IR, text categorization (Kessler et al. 1997)
Do the writers purport to be objective?

Look, this is a man who has great numbers.

We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.


Examples

Belief and speech reports
Information extraction, summarization, intellectual attribution (Teufel & Moens 2000)

Northwest Airlines settled the remaining lawsuits, a federal judge said.

“The cost of health care is eroding our standard of living and sapping industrial strength”, complains Walter Maher.


Other Applications

Review mining (Terveen et al. 1997)

Clustering documents by ideology (Sack 1995)

Style in machine translation and generation (Hovy 1987)


Potential Subjective Elements

"The cost of health care is eroding standards of living and sapping industrial strength,” complains Walter Maher.

“sap”: potential subjective element (PSE)

“complains”: subjective element


Subjectivity

Multiple types, sources, and targets

We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.

Somehow grown-ups believed that wisdom adhered to youth.


Annotations

Three levels:
expression level
sentence level
document level

Manually tagged + existing annotations


Expression Level Annotations

[Perhaps you’ll forgive me] for reposting his response

They promised [e+ 2 yet] more for [e+ 3 really good][e? 1 stuff]


Expression Level Annotations

Difficult for manual and automatic tagging:
detailed
no predetermined classification unit

To date: used for training and bootstrapping

Probably the most natural level


Expression Level Data

1000 WSJ sentences (2J)
462 newsgroup messages (2J)
15413 words newsgroup data (1J)

Single round of tagging; results promising

Used to generate features, not for evaluation


Sentence Level Annotations

“The cost of health care is eroding our standard of living and sapping industrial strength,’’ complains Walter Maher.

“What an idiot,’’ the idiot presumably complained.

A sentence is labeled subjective if any significant expression of subjectivity appears


Document Level Annotations

This work: Opinion Pieces in the WSJ: editorials, letters to the editor, arts & leisure reviews

+ Free source of data
+ More directly related to applications

Other work: flames, 1-star to 5-star reviews


Document Level Annotations

Opinion pieces contain objective sentences
Non-opinion pieces contain subjective sentences

Editorials contain facts supporting the argument

News reports present reactions (van Dijk 1988): “Critics claim …”, “Supporters argue …”

Reviews contain information about the product


Class Proportions in WSJ Sample

Non-Opinion Pieces: 57% objective, 43% subjective sentences (plus noise)

Opinion Pieces: 70% subjective sentences, 30% objective (plus noise)


Word Distribution

13-17% of words are in opinion pieces
83-87% of words are in non-opinion pieces


Evaluation Metric for Feature S with Respect to Opinion Pieces

Baseline for comparison: # words in opinions / total # words

Precision(S) = # instances of S in opinions / total # instances of S

Given the distributions, precisions of even perfect subjectivity clues would be low

Improvement over baseline taken as evidence of promising PSEs
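As a minimal sketch of this metric: the `instances`/`in_opinion` data layout below is hypothetical, invented only to make the two ratios concrete.

```python
def feature_precision(instances, in_opinion):
    """Precision(S): fraction of S's instances that fall in opinion pieces.

    instances: list of (token, doc_id) pairs for clue S.
    in_opinion: set of doc_ids classified as opinion pieces.
    """
    if not instances:
        return 0.0
    hits = sum(1 for _, doc in instances if doc in in_opinion)
    return hits / len(instances)

def baseline(words_in_opinions, total_words):
    # Baseline for comparison: # words in opinions / total # words
    return words_in_opinions / total_words

# Toy numbers in the 13-17% range reported for the WSJ samples
print(baseline(150, 1000))  # 0.15
print(feature_precision([("sap", 1), ("sap", 2), ("sap", 1)], {1}))
```

A clue is promising when its precision clearly exceeds the baseline, since even a perfect clue's precision is capped by how much of the corpus is opinionated.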


Data

Opinion Pieces
Non-Opinion Pieces


Document Level Data

Existing opinion-piece annotations used for training

Manually refined classifications used for testing:
identified editorials not marked as such
3 hours/edition
Kappa = .93 for 2 judges

3 WSJ editions, each more than 150K words


Automatically Generated Unigram Features

Adjective and verb features were generated using distributional similarity (Lin 1998, Wiebe 2000)

Existing opinion-piece annotations used for training

Manually refined annotations used for testing


Unigram Feature Results

        WSJ-10 (baseline 17%)   WSJ-33 (baseline 13%)
        +prec/freq              +prec/freq
Adjs    +21/373                 +09/2137
Verbs   +16/721                 +07/3193


Example Adjective Feature

conclusive, undiminished, brute, amazing, unseen, draconian, insurmountable, unqualified, poetic, foxy, vintage, jaded, tropical, distributional, discernible, adept, paltry, warm, reprehensible, astonishing, surprising, commonplace, crooked, dreary, virtuoso, trashy, sandy, static, virulent, desolate, ours, proficient, noteworthy, insistent, daring, unforgiving, agreeable, uncritical, homicidal, comforting, erotic, resonant, ephemeral, believable, epochal, dense, exotic, topical, …


Unique Words (hapax legomena)

Subjective elements contain more single-instance words than expected

Unique-1-gram feature: all words that appear once in the test data

Precision is 1.5 times baseline precision

Frequent feature!
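A sketch of the unique-1-gram feature (hapax extraction); the function name and toy sentence are illustrative only.

```python
from collections import Counter

def unique_1_grams(tokens):
    """All words that appear exactly once in the test data (hapax legomena)."""
    counts = Counter(tokens)
    return {w for w, c in counts.items() if c == 1}

tokens = "the tale is fascinating and the ending is sublime".split()
# 'the' and 'is' occur twice, so the hapaxes are the other five words
print(unique_1_grams(tokens))
```

Because so many word types occur only once, this single feature fires very frequently, which is why its modest precision gain is still useful.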


Unigram Feature Results

              WSJ-10 (baseline 17%)   WSJ-33 (baseline 13%)
Adjs          +21/373                 +09/2137
Verbs         +16/721                 +07/3193
Unique-1-gram +10/6065                +06/6048

Results are consistent, even with different identification procedures (similarly for WSJ-22)


Collocational PSEs

get out, what a, for the last time, just as well, here we go again

Started with the observation that low-precision words often compose higher-precision collocations


Identifying Collocational PSEs

Searching for 2-grams, 3-grams, 4-grams
No grammatical generalizations or constraints yet

Train on the data annotated with subjective elements (expression level)

Test on the manually-refined opinion-piece data(document level)


Identifying Collocational PSEs: Training Data (reminder)

1000 WSJ sentences (2J)
462 newsgroup messages (2J)
15413 words newsgroup data (1J)

[Perhaps you’ll forgive me] for reposting his response

They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff]


N-Grams

Each position is filled by a word|POS pair

in|prep the|det air|noun


Identifying Collocational PSEs: Training, Step 1

Precision(n-gram) = # subjective instances of n-gram / total # instances of n-gram

Precision with respect to subjective elements calculated for all 1,2,3,4-grams in the training data

An instance of an n-gram is subjective if each word in the instance is in a subjective element
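A sketch of this training step. It assumes tokens carry a boolean "inside a subjective element" flag; that flag layout is an assumed encoding of the expression-level annotations, not the paper's actual data format.

```python
def ngram_precision(tagged_tokens, n):
    """Precision of each n-gram with respect to subjective elements.

    tagged_tokens: list of (word, in_subjective_element) pairs.
    An n-gram instance counts as subjective only if every word
    in the instance lies inside a subjective element.
    """
    total, subj = {}, {}
    for i in range(len(tagged_tokens) - n + 1):
        window = tagged_tokens[i:i + n]
        gram = tuple(w for w, _ in window)
        total[gram] = total.get(gram, 0) + 1
        if all(flag for _, flag in window):
            subj[gram] = subj.get(gram, 0) + 1
    return {g: subj.get(g, 0) / c for g, c in total.items()}

# "They promised [yet] more for [really good] [stuff]"
toks = [("they", False), ("promised", False), ("yet", True),
        ("more", False), ("for", False), ("really", True),
        ("good", True), ("stuff", True)]
print(ngram_precision(toks, 2)[("really", "good")])  # 1.0
```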


Identifying Collocational PSEs: Training

[Perhaps you’ll forgive me] for reposting his response

They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff]

An instance of an n-gram is subjective if each word in the instance is in a subjective element


Identifying Collocational PSEs: Training, Step 2

N-gram PSEs selected based on their precisions, using two criteria:
1. Precision >= 0.1
2. Precision >= maximum precision of its constituents


Identifying Collocational PSEs: Training, Step 2

prec(w1,w2) >= max(prec(w1), prec(w2))

prec(w1,w2,w3) >= max(prec(w1,w2), prec(w3)) or
prec(w1,w2,w3) >= max(prec(w1), prec(w2,w3))

Precision >= maximum precision of its constituents
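The constituent test above can be sketched as follows; the `prec` table and its toy numbers are invented for illustration.

```python
def passes_constituent_test(gram, prec):
    """Criterion 2: keep an n-gram only if its precision meets or exceeds
    the max precision of its constituents, under at least one split.

    prec maps word tuples (1-, 2-, or 3-grams) to training precisions.
    """
    p = prec[gram]
    if len(gram) == 2:
        return p >= max(prec[gram[:1]], prec[gram[1:]])
    # trigram: prec(w1,w2,w3) >= max(prec(w1,w2), prec(w3))
    #       or prec(w1,w2,w3) >= max(prec(w1), prec(w2,w3))
    return (p >= max(prec[gram[:2]], prec[gram[2:]]) or
            p >= max(prec[gram[:1]], prec[gram[1:]]))

prec = {("here",): 0.05, ("we",): 0.04, ("go",): 0.06,
        ("here", "we"): 0.10, ("we", "go"): 0.12,
        ("here", "we", "go"): 0.20}
gram = ("here", "we", "go")
keep = prec[gram] >= 0.1 and passes_constituent_test(gram, prec)  # criteria 1 and 2
print(keep)  # True
```

This is the formal version of the motivating observation: the individual words are low precision, but the whole collocation is not.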


Results

              WSJ-10 (baseline 17%)   WSJ-33 (baseline 13%)
Adjs          +21/373                 +09/2137
Verbs         +16/721                 +07/3193
Unique-1-gram +10/6065                +06/6048
2-grams       +07/2182                +04/2080
3-grams       +09/271                 +06/262
4-grams       +05/32                  -03/30


Generalized Collocational PSEs

Replace each single-instance word in the training data with “UNIQUE”

Rerun the same training procedure, finding collocations such as highly|adverb UNIQUE|adj

To test the new collocations on test data, first replace each single-instance word in the test data with “UNIQUE”
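The UNIQUE substitution can be sketched as below. The word|POS string encoding follows the slides; the function name and example tokens are illustrative.

```python
from collections import Counter

def generalize_uniques(tokens):
    """Replace each single-instance word with the placeholder UNIQUE,
    keeping its POS tag (tokens are word|POS strings)."""
    counts = Counter(t.rsplit("|", 1)[0] for t in tokens)
    out = []
    for t in tokens:
        word, pos = t.rsplit("|", 1)
        out.append("UNIQUE|" + pos if counts[word] == 1 else t)
    return out

toks = ["highly|adverb", "unorthodox|adj", "and|conj",
        "highly|adverb", "erotic|adj"]
print(generalize_uniques(toks))
# ['highly|adverb', 'UNIQUE|adj', 'UNIQUE|conj', 'highly|adverb', 'UNIQUE|adj']
```

The same substitution must be applied to the test data before matching, so that a learned pattern like highly|adverb UNIQUE|adj can fire on hapaxes unseen in training.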


Results

              WSJ-10 (baseline 17%)   WSJ-33 (baseline 13%)
Adjs          +21/373                 +09/2137
Verbs         +16/721                 +07/3193
Unique-1-gram +10/6065                +06/6048
2-grams       +07/2182                +04/2080
3-grams       +09/271                 +06/262
4-grams       +05/32                  -03/30
U-2-grams     +24/294                 +14/288
U-3-grams     +27/132                 +13/144
U-4-grams     +83/3                   +15/27


Example

highly|adverb UNIQUE|adj

highly unsatisfactory

highly unorthodox

highly talented

highly conjectural

highly erotic


Example

UNIQUE|verb out|IN
farm out, chuck out, ruling out, crowd out, flesh out, blot out, spoken out, luck out


Examples

UNIQUE|adj to|TO UNIQUE|verb
impervious to reason, strange to celebrate, wise to temper

UNIQUE|noun of|IN its|pronoun
sum of its, usurpation of its, proprietor of its

they|pronoun are|verb UNIQUE|noun
they are fools, they are noncontenders


How do Fixed and U-Collocations Compare?

Recall the original motivation for investigating fixed n-gram PSEs: the observation that low-precision words often compose higher-precision collocations

But unique words are probably not low precision

Are we finding the same collocations two different ways? Or are we finding new PSEs?


Comparison

WSJ-10                  2-grams   3-grams   4-grams
Intersecting instances  4         2         0
% overlap               0.0016    0.0049    0

WSJ-33: all 0s


Opinion-Piece Recognition using Linear Regression

                              %correct   TP   FP
Adjs, verbs                   .896       5    4
Ngrams                        .899       5    3
Adjs, verbs, ngrams           .909       9    4
All features (+ max density)  .912       11   4

Max density: the maximum feature count in an 11-word window


Future Work

Methods for recognizing non-compositional phrases (e.g., Lin 1999)

Mutual bootstrapping (Riloff & Jones 1999) to alternately recognize sequences and subjective fillers


Sentence Classification

Binary features: pronoun, adjective, number, modal ¬“will”, adverb ¬“not”, new paragraph

Lexical feature: good for subj; good for obj; good for neither

Probabilistic classifier

10-fold cross validation; 51% baseline
72% average accuracy across folds
82% average accuracy on sentences rated certain
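The slides do not spell out the probabilistic classifier's form. As one plausible instantiation over the binary features, here is a Naive Bayes sketch with Laplace smoothing; it is not necessarily the model the authors used.

```python
from math import log

def train_nb(examples):
    """Naive Bayes over binary features (an illustrative stand-in for the
    slide's probabilistic classifier, not the authors' actual model).

    examples: list of (feature_vector, label), label in {'subj', 'obj'}.
    """
    labels = ("subj", "obj")
    n = len(examples)
    prior = {c: sum(1 for _, y in examples if y == c) for c in labels}
    n_feats = len(examples[0][0])
    ones = {c: [0] * n_feats for c in labels}
    for x, y in examples:
        for i, v in enumerate(x):
            ones[y][i] += v

    def classify(x):
        best, best_lp = None, float("-inf")
        for c in labels:
            lp = log(prior[c] / n)
            for i, v in enumerate(x):
                p1 = (ones[c][i] + 1) / (prior[c] + 2)  # Laplace smoothing
                lp += log(p1 if v else 1 - p1)
            if lp > best_lp:
                best, best_lp = c, lp
        return best
    return classify

# features: [pronoun, adjective, modal ¬"will"] -- toy training set
train = [([1, 1, 1], "subj"), ([1, 1, 0], "subj"),
         ([0, 0, 0], "obj"), ([0, 1, 0], "obj")]
clf = train_nb(train)
print(clf([1, 1, 1]))  # subj
```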


Test for Bias: Marginal Homogeneity

The worse the fit, the greater the bias

[4x4 contingency table over categories C1-C4, with row marginals X1+ … X4+ and column marginals X+1 … X+4]

Null hypothesis: p(i+) = p(+i) for all i


Test for Symmetric Disagreement: Quasi-Symmetry

[4x4 contingency table over categories C1-C4; the model fits the off-diagonal counts]

Tests relationships among the off-diagonal counts

The better the fit, the higher the correlation


Unigram PSEs

Adjectives and Verbs identified using Lin’s distributional similarity (Lin 1998)

Distributional similarity is often used in NLP to find synonyms

Motivating hypothesis: words may be similar because they have similar subjective usages


Unigram Feature Generation

AdjFeature = {}
For all Adjectives A in the training data:
    S = A + N most similar words to A
    P = precision(S) in the training data
    if P > T: AdjFeature += S

Many runs with various settings for N and T

Choose values of N and T on a validation set

Evaluate on a new test set
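A runnable version of the loop above; `most_similar` and `precision` are hypothetical stand-ins for Lin's similarity ranking and the training-set precision computation, and the toy data is invented.

```python
def grow_adj_feature(adjectives, most_similar, precision, n, t):
    """Seed each adjective with its N most similar words; keep the cluster
    if its training-set precision clears threshold T (a sketch of the
    slide's loop, with the two helpers supplied by the caller)."""
    feature = set()
    for adj in adjectives:
        cluster = {adj} | set(most_similar(adj, n))
        if precision(cluster) > t:
            feature |= cluster
    return feature

# Toy similarity lists and cluster precisions, for illustration only
sims = {"fascinating": ["amazing", "astonishing"], "brown": ["sandy", "gray"]}
precs = {frozenset({"fascinating", "amazing", "astonishing"}): 0.4,
         frozenset({"brown", "sandy", "gray"}): 0.05}
out = grow_adj_feature(["fascinating", "brown"],
                       lambda w, k: sims[w][:k],
                       lambda s: precs[frozenset(s)],
                       n=2, t=0.1)
print(sorted(out))  # ['amazing', 'astonishing', 'fascinating']
```

Sweeping N and T on a validation set, then freezing them for the test set, matches the procedure described on the slide.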


Lin’s Distributional Similarity

Lin 1998

Example: “I have a brown dog”, yielding dependency triples (Word, R, W):

I      R1  have
have   R2  dog
brown  R3  dog
. . .


Filtering

Seed Words → Words + Clusters → Filtered Set

Word + cluster removed if precision on training set < threshold


Parameters

Seed Words → Words + Clusters

Parameters: cluster size, threshold


Lin’s Distributional Similarity

For each word, collect the set T(Word) of (R, W) pairs statistically correlated with that word

sim(Word1, Word2) =
    Sum over (R,W) in T(Word1) ∩ T(Word2) of [ I(Word1,R,W) + I(Word2,R,W) ]
    divided by
    Sum over T(Word1) of I(Word1,R,W) + Sum over T(Word2) of I(Word2,R,W)
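The ratio above can be computed directly; the dict-of-triples layout below is an assumed encoding of T(Word) with its mutual-information weights, and the toy values are invented.

```python
def lin_similarity(t1, t2):
    """Lin (1998) similarity from mutual-information weighted triples.

    t1, t2: dicts mapping (relation, word) -> I(word, relation, word')
    for the two words being compared. Shared (R,W) pairs contribute to
    the numerator; all pairs of both words form the denominator.
    """
    shared = set(t1) & set(t2)
    num = sum(t1[rw] + t2[rw] for rw in shared)
    den = sum(t1.values()) + sum(t2.values())
    return num / den if den else 0.0

# Toy triples: "dog" and "cat" share the obj-of-"have" context
dog = {("obj-of", "have"): 2.0, ("mod", "brown"): 1.0}
cat = {("obj-of", "have"): 1.5, ("mod", "striped"): 0.5}
print(lin_similarity(dog, cat))  # 0.7
```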