Creating Subjective and Objective Sentence Classifiers from Unannotated Texts


Page 1: Creating Subjective and Objective Sentence Classifiers from Unannotated Texts

Creating Subjective and Objective Sentence Classifiers from Unannotated Texts

Ellen Riloff, University of Utah

(Joint work with Janyce Wiebe at the University of Pittsburgh)

Page 2

What is Subjectivity?

• Subjective language includes opinions, rants, allegations, accusations, suspicions, and speculation.

• Distinguishing factual information from subjective information could benefit many applications, including:

– information extraction
– question answering
– summarization
– spam filtering

Page 3

Previous Work on Subjectivity Classification

• Document-level subjectivity classification (e.g., [Turney 2002; Pang et al. 2002; Spertus 1997])

But most documents contain subjective and objective sentences. [Wiebe et al. 01] reported that 44% of sentences in their news corpus were subjective!

• Sentence-level subjectivity classification [Dave et al. 2003; Yu et al. 2003; Riloff, Wiebe, & Wilson 2003]

Page 4

Goals of our research

• Create classifiers that label sentences as subjective or objective.

• Learn subjectivity and objectivity clues from unannotated corpora.

• Use information extraction techniques to learn subjective nouns.

• Use information extraction techniques to learn subjective and objective patterns.

Page 5

Outline of Talk

• Learning subjective nouns with extraction patterns

• Automatically generating training data with high-precision classifiers

• Learning subjective and objective extraction patterns

• Naïve Bayes classification and self-training

Page 6

Information Extraction

• Information extraction (IE) systems identify facts related to a domain of interest.

• Extraction patterns are lexico-syntactic expressions that identify the role of an object. For example:

<subject> was killed

assassinated <dobj>

murder of <np>
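As a concrete illustration of how such lexico-syntactic patterns can pull out role fillers, here is a minimal sketch. It is an assumption of this example that patterns are matched with regular expressions over raw sentences; real IE systems apply the templates to parsed text.

```python
import re

# Hypothetical minimal matcher: each extraction pattern is anchored on a
# trigger phrase, and the parenthesized group is the extracted role filler.
# A regex over raw text is only a sketch; real systems use a parser.
PATTERNS = {
    r"(\w[\w ]*?) was killed": "victim",   # <subject> was killed
    r"assassinated (\w[\w ]*)": "victim",  # assassinated <dobj>
    r"murder of (\w[\w ]*)": "victim",     # murder of <np>
}

def extract(sentence):
    """Return (role, filler) pairs found by any pattern."""
    found = []
    for pat, role in PATTERNS.items():
        for m in re.finditer(pat, sentence):
            found.append((role, m.group(1).strip()))
    return found

print(extract("the mayor was killed"))
```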

Page 7

Learning Subjective Nouns

Goal: to learn subjective nouns from unannotated texts.

Method: apply IE-based bootstrapping algorithms that were designed to learn semantic categories.

Hypothesis: extraction patterns can identify subjective contexts that co-occur with subjective nouns.

Example: “expressed <dobj>” concern, hope, support

Page 8

Extraction Examples

expressed <dobj> condolences, hope, grief, views, worries

indicative of <np> compromise, desire, thinking

inject <dobj> vitality, hatred

reaffirmed <dobj> resolve, position, commitment

voiced <dobj> outrage, support, skepticism, opposition, gratitude, indignation

show of <np> support, strength, goodwill, solidarity

<subj> was shared anxiety, view, niceties, feeling

Page 9

Meta-Bootstrapping [Riloff & Jones 99]

[Diagram: the meta-bootstrapping loop. Unannotated texts feed pattern learning; the best extraction pattern (e.g., "expressed <DOBJ>") is selected; its extractions (nouns such as hope, grief, joy, concern, worries) are added to the lexicon alongside earlier entries (e.g., happiness, relief, condolences), and the cycle repeats.]
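The bootstrapping loop can be sketched in a few lines. The toy corpus, pattern names, and round count below are assumptions for illustration; the real system derives extractions from parsed, unannotated text and runs for hundreds of cycles.

```python
from math import log2

# Toy corpus: pattern -> set of nouns it extracts (illustrative only).
extractions = {
    "expressed <dobj>":  {"hope", "grief", "concern", "support"},
    "voiced <dobj>":     {"outrage", "support", "skepticism"},
    "shipment of <np>":  {"goods", "oil", "weapons"},
}

def bootstrap(seeds, rounds=2):
    """Minimal mutual-bootstrapping sketch: repeatedly pick the pattern whose
    extractions overlap the growing lexicon most (an RlogF-style score), then
    absorb everything that pattern extracts."""
    lexicon = set(seeds)
    used = set()
    for _ in range(rounds):
        def score(p):
            hits = len(extractions[p] & lexicon)   # F_i: known members extracted
            total = len(extractions[p])            # N_i: all nouns extracted
            return (hits / total) * log2(hits) if hits else -1.0
        candidates = [p for p in extractions if p not in used]
        if not candidates:
            break
        best = max(candidates, key=score)          # best pattern this cycle
        used.add(best)
        lexicon |= extractions[best]               # absorb its extractions
    return lexicon

lex = bootstrap({"hope", "outrage"})
```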

Page 10

Basilisk [Thelen & Riloff 02]

[Diagram: the Basilisk bootstrapping loop. Seed words initialize a semantic lexicon; extraction patterns and their extractions from the corpus are scored; the best patterns enter a Pattern Pool; their extractions form a Candidate Word Pool; the 5 best candidate words are added to the lexicon, and the cycle repeats.]

Page 11

Subjective Seed Words

cowardice, crap, delight, disdain, dismay, embarrassment, fool, gloom, grievance, happiness, hatred, hell, hypocrisy, love, nonsense, outrage, slander, sigh, twit, virtue

Page 12

Subjective Noun Results

• Bootstrapping corpus: 950 unannotated FBIS documents (English-language foreign news)

• We ran each bootstrapping algorithm for 400 cycles, generating ~2000 words.

• We manually reviewed the words and labeled them as strongly subjective or weakly subjective.

• Together, they learned 1052 subjective nouns (454 strong, 598 weak).

Page 13

Examples of Strong Subjective Nouns

anguish, antagonism, apologist, atrocities, barbarian, belligerence, bully, condemnation, denunciation, devil, diatribe, exaggeration, exploitation, evil, fallacies, genius, goodwill, humiliation, ill-treatment, injustice, innuendo, insinuation, liar, mockery, pariah, repudiation, revenge, rogue, sanctimonious, scum, smokescreen, sympathy, tyranny, venom

Page 14

Examples of Weak Subjective Nouns

aberration, allusion, apprehensions, assault, beneficiary, benefit, blood, controversy, credence, distortion, drama, eternity, eyebrows, failures, inclination, intrigue, liability, likelihood, peaceful, persistent, plague, pressure, promise, rejection, resistant, risk, sincerity, slump, spirit, success, tolerance, trick, trust, unity

Page 15

Outline of Talk

• Learning subjective nouns with extraction patterns

• Automatically generating training data with high-precision classifiers

• Learning subjective and objective extraction patterns

• Naïve Bayes classification and self-training

Page 16

Initial Training Data Creation

[Diagram: subjective clues feed a rule-based subjective sentence classifier and a rule-based objective sentence classifier; applied to unlabeled texts, they produce labeled subjective and objective sentences.]

Page 17

Subjective Clues

• entries from manually developed resources [Levin 93; Ballmer & Brennenstuhl 81]

• FrameNet lemmas with frame element "experiencer" [Baker et al. 98]

• adjectives manually annotated for polarity [Hatzivassiloglou & McKeown 97]

• n-grams learned from corpora [Dave et al. 03; Wiebe et al. 01]

• words distributionally similar to subjective seed words [Wiebe 00]

• subjective nouns learned from extraction pattern bootstrapping [Riloff et al. 03]

Page 18

Creating High-Precision Rule-Based Classifiers

• a sentence is subjective if it contains ≥ 2 strong subjective clues

• a sentence is objective if:

– it contains no strong subjective clues

– the previous and next sentence together contain ≤ 1 strong subjective clue

– the current, previous, and next sentence together contain ≤ 2 weak subjective clues

GOAL: use subjectivity clues from previous research to build a high-precision (low-recall) rule-based classifier
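The rules above can be sketched directly; the function below is a hedged rendering, with clue counts assumed to be computed elsewhere from the clue lists.

```python
def label_sentence(strong, weak):
    """Sketch of the high-precision rules. `strong` and `weak` are the counts
    of strong/weak subjective clues in (previous, current, next) sentences."""
    prev_s, cur_s, next_s = strong
    prev_w, cur_w, next_w = weak
    if cur_s >= 2:
        return "subjective"                       # >= 2 strong clues
    if (cur_s == 0                                # no strong clues here
            and prev_s + next_s <= 1              # <= 1 strong clue nearby
            and prev_w + cur_w + next_w <= 2):    # <= 2 weak clues in window
        return "objective"
    return "unlabeled"   # everything else is left for the learned classifiers
```

Sentences labeled "unlabeled" here are exactly the ones the high-precision (low-recall) design deliberately abstains on.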

Page 19

Data Set

• The MPQA Corpus contains 535 FBIS texts that have been manually annotated for subjectivity.

• Our test set consisted of 9,289 sentences from the MPQA corpus.

• We consider a sentence to be subjective if it has at least one private state of strength medium or higher.

• 54.9% of the sentences in our test set are subjective.

Page 20

Accuracy of Rule-Based Classifiers

          SubjRec  SubjPrec  SubjF
Subj RBC  34.2     90.4      46.6

          ObjRec   ObjPrec   ObjF
Obj RBC   30.7     82.4      44.7

Page 21

Generated Data

We applied the rule-based classifiers to 298,809 sentences from (unannotated) FBIS documents.

52,918 were labeled subjective

47,528 were labeled objective

training set of over 100,000 labeled sentences!

Page 22

Outline of Talk

• Learning subjective nouns with extraction patterns

• Automatically generating training data with high-precision classifiers

• Learning subjective and objective extraction patterns

• Naïve Bayes classification and self-training

Page 23

Representing Subjective Expressions with Extraction Patterns

• Extraction patterns can represent linguistic expressions that are not fixed word sequences.

drove [NP] up the wall
- drove him up the wall
- drove George Bush up the wall
- drove George Herbert Walker Bush up the wall

step on [modifiers] toes
- step on her toes
- step on the mayor's toes
- step on the newly elected mayor's toes

gave [NP] a [modifiers] look
- gave his annoying sister a really really mean look

Page 24

The Extraction Pattern Learner

• Used AutoSlog-TS [Riloff 96] to learn extraction patterns.

• AutoSlog-TS needs relevant and irrelevant texts as input.

• Statistics are generated measuring each pattern’s association with the relevant texts.

• The subjective sentences were called relevant, and the objective sentences were called irrelevant.

Page 25

<subject> passive-vp                <subj> was satisfied
<subject> active-vp                 <subj> complained
<subject> active-vp dobj            <subj> dealt blow
<subject> active-vp infinitive      <subj> appears to be
<subject> passive-vp infinitive     <subj> was thought to be
<subject> auxiliary dobj            <subj> has position

active-vp <dobj>                    endorsed <dobj>
infinitive <dobj>                   to condemn <dobj>
active-vp infinitive <dobj>         get to know <dobj>
passive-vp infinitive <dobj>        was meant to show <dobj>
subject auxiliary <dobj>            fact is <dobj>

passive-vp prep <np>                was worried about <np>
active-vp prep <np>                 agrees with <np>
infinitive prep <np>                to resort to <np>
noun prep <np>                      opinion on <np>

Page 26

AutoSlog-TS (Step 1)

[Diagram: the corpus is divided into relevant and irrelevant texts. A parser applies syntactic templates to each sentence; for example, "[The World Trade Center], [an icon] of [New York City], was intentionally attacked very early on [September 11, 2001]" yields the extraction patterns: <subj> was attacked, icon of <np>, was attacked on <np>.]

Page 27

AutoSlog-TS (Step 2)

[Diagram: the extraction patterns are applied to the relevant and irrelevant texts, and statistics are computed for each:]

Extraction Pattern     Freq   Prob
<subj> was attacked    100    .90
icon of <np>           5      .20
was attacked on <np>   80     .79

Page 28

Identifying Subjective and Objective Patterns

AutoSlog-TS generates 2 statistics for each pattern:
F = pattern frequency
P = relevant frequency / pattern frequency

We call a pattern subjective if F ≥ 5 and P ≥ .95
(6364 subjective patterns were learned)

We call a pattern objective if F ≥ 5 and P ≤ .15
(832 objective patterns were learned)
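Applying these thresholds is a one-pass filter over the pattern statistics. The counts below are hypothetical; only the thresholds come from the slide.

```python
# Hypothetical AutoSlog-TS statistics: pattern -> (pattern frequency F,
# frequency in the subjective/"relevant" sentences).
stats = {
    "<subj> was convinced": (40, 39),
    "occurred on <np>":     (100, 8),
    "icon of <np>":         (3, 3),     # F < 5: the frequency filter drops it
}

subjective, objective = [], []
for pattern, (freq, rel_freq) in stats.items():
    p = rel_freq / freq                 # P = relevant freq / pattern freq
    if freq >= 5 and p >= 0.95:
        subjective.append(pattern)      # slide threshold: F >= 5, P >= .95
    elif freq >= 5 and p <= 0.15:
        objective.append(pattern)       # slide threshold: F >= 5, P <= .15
```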

Page 29

Examples of Learned Extraction Patterns

Subjective Patterns
<subj> believes
<subj> was convinced
aggression against <np>
to express <dobj>
support for <np>

Objective Patterns
<subj> increased production
<subj> took effect
delegation from <np>
occurred on <np>
plans to produce <dobj>

Page 30

Patterns with Interesting Behavior

PATTERN                  FREQ   P(Subj | Pattern)
<subj> asked             128    .63
<subj> was asked         11     1.0

<subj> was expected      45     .42
was expected from <np>   5      1.0

<subj> talk              28     .71
talk of <np>             10     .90
<subj> is talk           5      1.0

<subj> put               187    .67
<subj> put end           10     .90

<subj> is fact           38     1.0
fact is <dobj>           12     1.0

Page 31

Augmenting the Rule-Based Classifiers with Extraction Patterns

                     SubjRec  SubjPrec  SubjF
Subj RBC             34.2     90.4      46.6
Subj RBC w/Patterns  58.6     80.9      68.0

                     ObjRec   ObjPrec   ObjF
Obj RBC              30.7     82.4      44.7
Obj RBC w/Patterns   33.5     82.1      47.6

Page 32

Outline of Talk

• Learning subjective nouns with extraction patterns

• Automatically generating training data with high-precision classifiers

• Learning subjective and objective extraction patterns

• Naïve Bayes classification and self-training

Page 33

Naïve Bayes Classifier

• We created an NB classifier using the initial training set and several set-valued features:

– strong & weak subjective clues from RBCs

– subjective & objective extraction patterns

– POS tags (pronouns, modals, adjectives, cardinal numbers, adverbs)

– separate features for each of the current, previous, and next sentences
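A hand-rolled Naïve Bayes over binary "clue present" features illustrates the idea. The feature names (e.g. "cur:strong_clue") and the tiny training set are an assumed encoding for this sketch, standing in for the set-valued features listed above.

```python
from collections import defaultdict
from math import log

def train(docs):
    """Count feature occurrences per class. docs: [(feature_set, label)]."""
    counts = {c: defaultdict(int) for c in ("subjective", "objective")}
    totals = defaultdict(int)
    for feats, label in docs:
        totals[label] += 1
        for f in feats:
            counts[label][f] += 1
    return counts, totals

def classify(feats, counts, totals, vocab):
    """Pick the class maximizing log P(class) + sum of log P(feature|class),
    with add-one smoothing over present/absent."""
    n = sum(totals.values())
    best, best_lp = None, float("-inf")
    for c in counts:
        lp = log(totals[c] / n)
        for f in vocab:
            p = (counts[c][f] + 1) / (totals[c] + 2)
            lp += log(p) if f in feats else log(1 - p)
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Toy training data using hypothetical clue/pattern features:
docs = [({"cur:strong_clue"}, "subjective"),
        ({"cur:strong_clue", "prev:weak_clue"}, "subjective"),
        ({"cur:obj_pattern"}, "objective"),
        (set(), "objective")]
counts, totals = train(docs)
vocab = {"cur:strong_clue", "prev:weak_clue", "cur:obj_pattern"}
```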

Page 34

Naïve Bayes Training

[Diagram: the extraction pattern learner produces subjective and objective patterns from the training set; these, together with POS features and subjective clues, feed Naïve Bayes training.]

Page 35

Naïve Bayes Results

                    SubjRec  SubjPrec  SubjF
Naïve Bayes         70.6     79.4      74.7
RWW03 (supervised)  77       81        79

                    ObjRec   ObjPrec   ObjF
Naïve Bayes         77.6     68.4      72.7
RWW03 (supervised)  74       70        72

Page 36

Self-Training Process

[Diagram: the Naïve Bayes classifier, trained as above from subjective/objective extraction patterns, POS features, and subjective clues, labels unlabeled sentences; the best N sentences are added to the training set, and the whole process repeats.]
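The loop itself is simple to sketch. `train_model` and `predict_proba` below are hypothetical stand-ins for the Naïve Bayes machinery; the toy demo represents each "sentence" as a number just to exercise the control flow.

```python
def self_train(train_set, unlabeled, train_model, predict_proba,
               n_best=2, rounds=2):
    """Self-training sketch: label the pool, move the most confidently
    labeled n_best items into the training set, retrain, repeat."""
    for _ in range(rounds):
        model = train_model(train_set)
        scored = [(predict_proba(model, x), x) for x in unlabeled]
        scored.sort(key=lambda t: max(t[0].values()), reverse=True)
        for probs, x in scored[:n_best]:
            label = max(probs, key=probs.get)   # most probable class
            train_set.append((x, label))
            unlabeled.remove(x)
        if not unlabeled:
            break
    return train_model(train_set)

# Toy demonstration with stand-in model functions:
def train_model(ts):
    return len(ts)                     # "model" = training-set size
def predict_proba(model, x):
    return {"subjective": x, "objective": 1 - x}

train_set, unlabeled = [], [0.9, 0.2, 0.8, 0.1]
final_model = self_train(train_set, unlabeled, train_model, predict_proba)
```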

Page 37

Self-Training Results

                     SubjRec  SubjPrec  SubjF
Subj RBC w/Patts 1   58.6     80.9      68.0
Subj RBC w/Patts 2   62.4     80.4      70.3
Naïve Bayes 1        70.6     79.4      74.7
Naïve Bayes 2        86.3     71.3      78.1
RWW03 (supervised)   77       81        79

                     ObjRec   ObjPrec   ObjF
Obj RBC w/Patts 1    33.5     82.1      47.6
Obj RBC w/Patts 2    34.8     82.6      49.0
Naïve Bayes 1        77.6     68.4      72.7
Naïve Bayes 2        57.6     77.5      66.1
RWW03 (supervised)   74       70        72

Page 38

Conclusions

• We can build effective subjective sentence classifiers using only unannotated texts.

• Extraction pattern bootstrapping can learn subjective nouns.

• Extraction patterns can represent richer subjective expressions.

• Learning methods can discover subtle distinctions between very similar expressions.

Page 39

THE END

Thank you!

Page 40

Related Work

• Genre classification (e.g., [Karlgren and Cutting 1994; Kessler et al. 1997; Wiebe et al. 2001])

• Learning adjectives, adjective phrases, verbs, and N-grams [Turney 2002; Hatzivassiloglou & McKeown 1997; Wiebe et al. 2001]

• Semantic lexicon learning [Hearst 1992; Riloff & Shepherd 1997; Roark & Charniak 1998; Caraballo 1999]

– Meta-Bootstrapping [Riloff & Jones 99]

– Basilisk [Thelen & Riloff 02]

Page 41

What is Information Extraction?

Extracting facts relevant to a specific topic from narrative text.

Example Domains

Terrorism: perpetrator, victim, target, date, location

Management succession: person fired, successor, position, organization, date

Infectious disease outbreaks: disease, organism, victim, symptoms, location, date

Page 42

Information Extraction from Narrative Text

• Role relationships define the information of interest; keywords and named entities are not sufficient.

Researchers have discovered how anthrax toxin destroys cells and rapidly causes death ...

Troops were vaccinated against anthrax, cholera, …

Page 43

Ranking and Manual Review

• The patterns are ranked using the metric:

RlogF(pattern_i) = (F_i / N_i) * log2(F_i)

where F_i is the number of instances of pattern_i in relevant texts and N_i is the number of instances of pattern_i in all texts.

• A domain expert reviews the top-ranked patterns and assigns thematic roles to the good ones.
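A quick rendering of the ranking metric, using hypothetical pattern counts (F_i, N_i):

```python
from math import log2

def rlogf(rel_freq, total_freq):
    """RlogF(pattern_i) = (F_i / N_i) * log2(F_i), where F_i is the count in
    relevant texts and N_i the count in all texts (per the slide)."""
    if rel_freq == 0:
        return float("-inf")   # never seen in relevant texts: rank last
    return (rel_freq / total_freq) * log2(rel_freq)

# Hypothetical counts (F_i, N_i) for two patterns:
patterns = {"<subj> was attacked": (64, 80), "icon of <np>": (1, 5)}
ranked = sorted(patterns, key=lambda p: rlogf(*patterns[p]), reverse=True)
```

Note how the metric favors patterns that are both reliable (high F_i / N_i) and frequent (high log2 F_i).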

Page 44

Semantic Lexicons

• A semantic lexicon assigns categories to words, e.g.:

politician → human
truck → vehicle
grenade → weapon

• Semantic dictionaries are hard to come by, especially for specialized domains.

• WordNet [Miller 90] is popular but is not always sufficient. [Roark & Charniak 98] found that 3 of every 5 words learned by their system were not present in WordNet.

Page 45

The Bootstrapping Era

[Diagram: Unannotated Texts + … = KNOWLEDGE!]

Page 46

Meta-Bootstrapping

[Diagram: the meta-bootstrapping loop again, in the disease domain. The best extraction pattern "outbreak of <NP>" extracts nouns such as anthrax, ebola, cholera, flu, plague; its extractions (e.g., smallpox, tularemia, botulism) are added to the lexicon, and the cycle repeats over the unannotated texts.]

Page 47

Semantic Lexicon (NP) Results

Iter  Company (Web)  Location (Web)  Title (Web)   Location (Terror)  Weapon (Terror)
1     5/5 (1.0)      5/5 (1.0)       0/1 (0)       5/5 (1.0)          4/4 (1.0)
10    25/32 (.78)    46/50 (.92)     22/31 (.71)   32/50 (.92)        31/44 (.70)
20    52/65 (.80)    88/100 (.88)    63/81 (.78)   66/100 (.66)       68/94 (.72)
30    72/113 (.64)   129/150 (.86)   86/131 (.66)  100/150 (.67)      85/144 (.59)

Page 48

Basilisk

[Diagram: the Basilisk loop. Seed words initialize the semantic lexicon; extraction patterns and their extractions from the corpus are scored; the best patterns enter a Pattern Pool; their extractions form a Candidate Word Pool; the 5 best candidate words are added to the lexicon, and the cycle repeats.]

Page 49

The Pattern Pool

Every extraction pattern is scored and the best patterns are put into a Pattern Pool. The scoring function is:

RlogF(pattern_i) = (F_i / N_i) * log2(F_i)

where F_i is the number of category members extracted by pattern_i, and N_i is the total number of nouns extracted by pattern_i.

Page 50

Scoring Candidate Words

Each candidate word is scored by:

1. collecting all patterns that extracted it

2. computing the average number of category members extracted by those patterns.

AvgLog(word_i) = ( Σ_{j=1}^{N_i} log2(F_j + 1) ) / N_i

where N_i is the number of patterns that extracted word_i, and F_j is the number of category members extracted by pattern j.
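The candidate-word score is a short computation; the member counts below are hypothetical:

```python
from math import log2

def avglog(member_counts):
    """AvgLog(word_i): average of log2(F_j + 1) over the N_i patterns that
    extracted word_i, where F_j is the number of known category members
    extracted by pattern j."""
    return sum(log2(f + 1) for f in member_counts) / len(member_counts)

# A word extracted by three patterns that hit 3, 1, and 7 known members:
score = avglog([3, 1, 7])
```

The +1 inside the log keeps patterns with zero known members from contributing negative infinity, and the averaging rewards words whose patterns consistently extract known category members.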

Page 51

[Plots: Correct Lexicon Entries vs. Total Lexicon Entries (0 to 1200) for BA-1 and MB-1 on six categories: Building, Event, Human, Location, Time, Weapon.]

Page 52

Bootstrapping a Single Category

Page 53

Bootstrapping Multiple Categories

Page 54

A Smarter Scoring Function

We incorporated knowledge about competing semantic categories directly into the scoring function. The modified scoring function computes the difference between the score for the target category and the best score among competing categories:

diff(w_i, c_a) = AvgLog(w_i, c_a) - max_{b ≠ a} AvgLog(w_i, c_b)
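The category-aware score is a one-liner over per-category AvgLog values; the scores below are hypothetical:

```python
def diff_score(avglog_by_cat, target):
    """diff(w_i, c_a) = AvgLog(w_i, c_a) minus the best AvgLog among the
    competing categories b != a."""
    best_other = max(v for c, v in avglog_by_cat.items() if c != target)
    return avglog_by_cat[target] - best_other

# Hypothetical AvgLog scores for one candidate word across three categories:
scores = {"weapon": 2.5, "location": 1.0, "human": 0.5}
margin = diff_score(scores, "weapon")
```

A positive margin means the word is a better fit for the target category than for any competitor; a negative margin flags a word that some other category claims more strongly.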