Identifying Comparative Sentences in Text Documents

Identifying Comparative Sentences in Text Documents Nitin Jindal and Bing Liu University of Illinois SIGIR 2006


Transcript of Identifying Comparative Sentences in Text Documents

Page 1: Identifying Comparative Sentences in Text Documents

Identifying Comparative Sentences in Text Documents

Nitin Jindal and Bing Liu

University of Illinois

SIGIR 2006

Page 2: Identifying Comparative Sentences in Text Documents

Introduction

• Comparisons are one of the most convincing ways of evaluation.

• Much of this information is available on the Web in customer reviews, forum discussions, and blogs.

• Useful for product manufacturers and potential customers (to make purchasing decisions).

Page 3: Identifying Comparative Sentences in Text Documents

Comparisons vs. Opinions

• Comparisons can be either objective or subjective.

• Comparative sentences have different language constructs from typical opinion sentences.

• Comparative sentences may contain some indicators.

Car X is much better than Car Y

Car X is two feet longer than Car Y

Page 4: Identifying Comparative Sentences in Text Documents

Related Work

• Linguistics: based on grammars (syntax and semantics) and logic (gradability), which is more for human consumption than for automatic identification.

• Opinion tasks: opinion extraction and classification, which are quite different problems from comparative sentence identification.

Page 5: Identifying Comparative Sentences in Text Documents

Comparatives (Linguistic)

• Comparatives are used to express explicit orderings between objects with respect to the degree or amount to which they possess some gradable property.

John is taller than he was

=>

John is tall to degree d

Page 6: Identifying Comparative Sentences in Text Documents

Comparatives (Linguistic)

• Two broad types:

– Metalinguistic Comparatives: compare properties of one entity.

Ronaldo is angrier than upset.

– Propositional Comparatives: compare between two propositions. Three subcategories:

Page 7: Identifying Comparative Sentences in Text Documents

Comparatives (Propositional)

• Nominal Comparatives: (two sets of entities)

Paul ate more grapes than bananas.

• Adjectival Comparatives: (than, as good as)

Ford is cheaper than Volvo.

• Adverbial Comparatives: (occur after a verb phrase)

Tom ate more quickly than Jane.

Page 8: Identifying Comparative Sentences in Text Documents

Superlatives

• Adjectival Superlatives:

John is the tallest person.

• Adverbial Superlatives:

Jill did her homework most frequently.

• Equality: conjunctions like and, or, …

John and Sue both like sushi.

Page 9: Identifying Comparative Sentences in Text Documents

POS involved

• NN: Noun

• NNP: Proper noun

• VBZ: Verb, present tense, 3rd person singular

• JJ: Adjective

• RB: Adverb

• JJR: Adjective, comparative

• JJS: Adjective, superlative

• RBR: Adverb, comparative

• RBS: Adverb, superlative

Page 10: Identifying Comparative Sentences in Text Documents

Limitations of linguistic classification.

• Non-comparatives with comparative words: many non-comparatives contain comparative words.

In the context of speed, faster means better.

John has to try his best to win this game.

• Limited coverage: many comparatives contain no comparative words.

In market capital, Intel is way ahead of AMD.

Nokia, Samsung, both cell phones perform badly on heat dissipation index.

The M7500 earned a World bench score of 85, whereas Asus A3V posted a mark of 89.

Page 11: Identifying Comparative Sentences in Text Documents

Enhancements

• First limitation: machine learning methods to distinguish comparatives and non-comparatives.

• Second limitation:

– User preferences:

I prefer Intel to AMD = Intel is better than AMD

– Implicit comparatives:

Camera X has 2 MP, whereas camera Y has 5 MP.

Page 12: Identifying Comparative Sentences in Text Documents

Types of Comparatives

• Non-Equal Gradable: greater or less than type, including user preferences.

• Equative (Gradable): equal to type

• Superlative (Gradable): greater or less than all others type

• Non-Gradable:

– A is similar to B; A has feature F1 while B has F2; A has feature F but B doesn’t

Page 13: Identifying Comparative Sentences in Text Documents

Tasks

• Identifying comparative sentences from a given text data set.

• Extracting comparative relations from sentences. (Mining comparative sentences and relations, AAAI 2006)

Page 14: Identifying Comparative Sentences in Text Documents

Class Sequential Rules with Multiple Minimum Supports

• In a class sequential rule (CSR), the sequential pattern is on the left and the class is on the right.

• Select patterns by keywords: POS tags (JJR, RBR, JJS, RBS) + words (favor, prefer, win, beat, but, …) + phrases (number one, up against)

• Using keywords alone gives precision P = 32% and recall R = 94%.
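The keyword filter described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `word/TAG` input format, the helper names, and the short word and phrase lists (only the items shown on the slide) are assumptions.

```python
# POS tags, words and phrases are the ones listed on the slide; the real
# keyword list is longer. Input format "word/TAG word/TAG ..." is assumed.
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}
KEYWORD_WORDS = {"favor", "prefer", "win", "beat", "but"}
KEYWORD_PHRASES = {"number one", "up against"}


def parse_tagged(sentence):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word.lower(), tag))
    return pairs


def has_keyword(sentence):
    """Return True if the tagged sentence contains any comparative keyword."""
    pairs = parse_tagged(sentence)
    words = [word for word, _ in pairs]
    if any(tag in COMPARATIVE_TAGS for _, tag in pairs):
        return True
    if any(word in KEYWORD_WORDS for word in words):
        return True
    # Pad with spaces so phrases only match on whole-word boundaries.
    text = " " + " ".join(words) + " "
    return any(" " + phrase + " " in text for phrase in KEYWORD_PHRASES)


print(has_keyword("ford/NNP is/VBZ cheaper/JJR than/IN volvo/NNP"))
print(has_keyword("john/NNP has/VBZ to/TO try/VB his/PRP$ best/JJS to/TO win/VB this/DT game/NN"))
```

Both calls return True, although the second sentence is not a comparison. A filter of this kind keeps almost every comparative sentence but also lets through many non-comparatives, which is consistent with the high recall and low precision reported above.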

Page 15: Identifying Comparative Sentences in Text Documents

Support and Confidence

• Using the minimum support of 20% and minimum confidence of 40%, one of the discovered CSRs is:
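For reference, the two thresholds can be computed for any candidate rule X → y as sketched below. The sketch assumes the usual class sequential rule definitions: support is the fraction of all sequences that contain X and carry class y, and confidence is the fraction of sequences containing X that carry class y. The toy database and the function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order
    (not necessarily adjacent)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support_confidence(pattern, label, database):
    """Support and confidence of the rule pattern -> label.

    database: list of (sequence, class_label) pairs.
    """
    covered = [y for seq, y in database if is_subsequence(pattern, seq)]
    satisfied = sum(1 for y in covered if y == label)
    support = satisfied / len(database) if database else 0.0
    confidence = satisfied / len(covered) if covered else 0.0
    return support, confidence


# Hypothetical toy database of POS sequences with class labels.
toy_db = [
    (["NN", "VBZ", "moreJJR", "NN", "IN"], "comparative"),
    (["NN", "VBZ", "RB", "moreJJR", "NN"], "comparative"),
    (["NN", "moreJJR", "NN"], "non-comparative"),
    (["DT", "NN", "VBZ", "JJ"], "non-comparative"),
    (["NN", "VBZ", "JJ", "NN"], "non-comparative"),
]

support, confidence = support_confidence(["NN", "moreJJR", "NN"], "comparative", toy_db)
print(support, confidence)
```

In this toy data the pattern occurs in three of five sequences, two of which are comparative, so the support is 2/5 and the confidence is 2/3. The rule would pass both thresholds named on this slide.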

Page 16: Identifying Comparative Sentences in Text Documents

Building the Sequence DB

this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD

{NN}{VBZ}{RB}{moreJJR}{NN}{IN}{NN} -> comparative

• Sequences that exceed the 60% confidence threshold become rules. Minimum support = 10%.

• 13 manual rules with conjunctions such as whereas/IN, but/CC, however/RB, while/IN, though/IN, although/IN, etc.
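The sequence shown on this slide can be reproduced with a short sketch. It assumes a window of three tokens on each side of the keyword, which is inferred from the example (this/DT, four tokens before more/JJR, does not appear in the sequence), and it treats only the four comparative and superlative POS tags as keywords. The function name and the `radius` parameter are hypothetical.

```python
KEYWORD_TAGS = {"JJR", "RBR", "JJS", "RBS"}


def build_sequences(tagged_sentence, radius=3):
    """Build one sequence per keyword found in a 'word/TAG ...' sentence.

    Words inside the window are replaced by their POS tags; the keyword
    itself is kept as word + tag (for example 'moreJJR').
    """
    pairs = []
    for token in tagged_sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))

    sequences = []
    for i, (_, tag) in enumerate(pairs):
        if tag not in KEYWORD_TAGS:
            continue
        start = max(0, i - radius)
        window = pairs[start:i + radius + 1]
        keyword_pos = i - start  # index of the keyword inside the window
        sequence = [
            w + t if j == keyword_pos else t
            for j, (w, t) in enumerate(window)
        ]
        sequences.append(sequence)
    return sequences


sentence = ("this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN "
            "at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD")
print(build_sequences(sentence))
# [['NN', 'VBZ', 'RB', 'moreJJR', 'NN', 'IN', 'NN']]
```

Each sequence, paired with the class label of its sentence, forms one entry of the sequence database from which rules are mined.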

Page 17: Identifying Comparative Sentences in Text Documents

Classification Learning

• Machine learning methods:

Feature Set = {X | X is the sequential pattern in CSR X → y} ∪ {Z | Z is the pattern in a manual rule Z → y}
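One way to turn this feature set into classifier input is a binary vector with one entry per rule pattern, set to 1 when the sentence's sequence contains that pattern. The sketch below assumes that encoding; the slide does not state how features are valued, and the example patterns are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def feature_vector(sequence, rule_patterns):
    """One binary feature per pattern: 1 if the sequence contains it, else 0."""
    return [1 if is_subsequence(p, sequence) else 0 for p in rule_patterns]


# Hypothetical patterns: the first two stand in for mined CSRs,
# the last one for a manual rule built on a conjunction.
rule_patterns = [
    ["NN", "moreJJR"],
    ["RBS"],
    ["whereasIN"],
]

sequence = ["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"]
print(feature_vector(sequence, rule_patterns))
# [1, 0, 0]
```

The resulting vectors, together with the sentence labels, can be passed to any classifier that accepts binary features.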

Page 18: Identifying Comparative Sentences in Text Documents

Data Preparation

• Consumer reviews on products such as digital cameras, DVD players, MP3 players and cellular phones.

• Forum discussions on topics such as Intel vs. AMD, Coke vs. Pepsi, and Microsoft vs. Google.

• News articles on topics such as automobiles, iPods, and soccer vs. football.

Page 19: Identifying Comparative Sentences in Text Documents

Number of Sentences in Data Sets

Page 20: Identifying Comparative Sentences in Text Documents

Experimental Results (1)

Page 21: Identifying Comparative Sentences in Text Documents

Experimental Results (2)

• Reviews: low recall, high precision -> sentences are short, so patterns are hard to find.

• Articles and Forums: high recall, low precision -> sentences are long, so patterns are found too easily or too many patterns are found.

Page 22: Identifying Comparative Sentences in Text Documents

Conclusion and Future Work

• Identifying comparative sentences.

• Analyzing different types of comparative sentences.

• Studying how to automatically classify subjective and objective comparisons.