
ACL01 Workshop on Collocation

Identifying Collocations for Recognizing Opinions

Janyce Wiebe, Theresa Wilson, Matthew Bell
University of Pittsburgh

Office of Naval Research grant N00014-95-1-0776


Introduction

Subjectivity: aspects of language used to express opinions and evaluations (Banfield 1982)

Relevant for many NLP applications, such as information extraction and text categorization

This paper: identifying collocational clues of subjectivity


Outline

Subjectivity
Data and annotation
Unigram features
N-gram features
Generalized N-gram features
Document classification


Subjectivity Tagging

Recognizing opinions and evaluations (Subjective sentences) as opposed to material objectively presented as true (Objective sentences)

Banfield 1982, Fludernik 1993, Wiebe 1994, Stein & Wright 1995


Examples

At several different levels, it’s a fascinating tale. subjective

Bell Industries Inc. increased its quarterly to 10 cents from 7 cents a share. objective


Subjectivity

“Complained”  “You Idiot!”  “Terrible product”

“Speculated”  “Maybe”

“Enthused”  “Wonderful!”  “Great product”


Examples

Strong addressee-oriented negative evaluation
Recognizing flames (Spertus 1997)
Personal e-mail filters (Kaufer 2000)

I had in mind your facts, buddy, not hers.

Nice touch. “Alleges” whenever facts posted are not in your persona of what is “real.”


Examples

Opinionated, editorial language
IR, text categorization (Kessler et al. 1997)
Do the writers purport to be objective?

Look, this is a man who has great numbers.

We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.


Examples

Belief and speech reports
Information extraction, summarization, intellectual attribution (Teufel & Moens 2000)

Northwest Airlines settled the remaining lawsuits, a federal judge said.

“The cost of health care is eroding our standard of living and sapping industrial strength”, complains Walter Maher.


Other Applications

Review mining (Terveen et al. 1997)

Clustering documents by ideology (Sack 1995)

Style in machine translation and generation (Hovy 1987)


Potential Subjective Elements

"The cost of health care is eroding standards of living and sapping industrial strength,” complains Walter Maher.

“sap”: potential subjective element (PSE)

“complains”: subjective element


Subjectivity

Multiple types, sources, and targets

We stand in awe of the Woodstock generation’s ability to be unceasingly fascinated by the subject of itself.

Somehow grown-ups believed that wisdom adhered to youth.


Annotations

Three levels:
expression level
sentence level
document level

Manually tagged + existing annotations


Expression Level Annotations

[Perhaps you’ll forgive me] for reposting his response

They promised [e+ 2 yet] more for [e+ 3 really good][e? 1 stuff]


Expression Level Annotations

Difficult for manual and automatic tagging:
detailed
no predetermined classification unit

To date: used for training and bootstrapping

Probably the most natural level


Expression Level Data

1000 WSJ sentences (2J)
462 newsgroup messages (2J)
15413 words newsgroup data (1J)

Single round of tagging; results promising

Used to generate features, not for evaluation


Sentence Level Annotations

“The cost of health care is eroding our standard of living and sapping industrial strength,’’ complains Walter Maher.

“What an idiot,’’ the idiot presumably complained.

A sentence is labeled subjective if any significant expression of subjectivity appears


Document Level Annotations

This work: Opinion Pieces in the WSJ: editorials, letters to the editor, arts & leisure reviews

+ Free source of data
+ More directly related to applications

Other work: flames, 1-star to 5-star reviews


Document Level Annotations

Opinion pieces contain objective sentences
Non-opinion pieces contain subjective sentences

Editorials contain facts supporting the argument

News reports present reactions (van Dijk 1988): “Critics claim …”, “Supporters argue …”

Reviews contain information about the product


Class Proportions in WSJ Sample

Non-Opinion Pieces: 57% objective, 43% subjective sentences (plus noise)

Opinion Pieces: 70% subjective sentences, 30% objective (plus noise)


Word Distribution

13-17% of words are in opinion pieces
83-87% of words are in non-opinion pieces


Evaluation Metric for Feature S with Respect to Opinion Pieces

Baseline for comparison: # words in opinions / total # words

Precision(S) = # instances of S in opinions / total # instances of S

Given the distributions, precisions of even perfect subjectivity clues would be low

Improvement over baseline taken as evidence of promising PSEs
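As a minimal sketch of this metric: the `instances`/`in_opinion` data layout below is hypothetical, invented only to make the two ratios concrete.

```python
def feature_precision(instances, in_opinion):
    """Precision(S): fraction of S's instances that fall in opinion pieces.

    instances: list of (token, doc_id) pairs for clue S.
    in_opinion: set of doc_ids classified as opinion pieces.
    """
    if not instances:
        return 0.0
    hits = sum(1 for _, doc in instances if doc in in_opinion)
    return hits / len(instances)

def baseline(words_in_opinions, total_words):
    # Baseline for comparison: # words in opinions / total # words
    return words_in_opinions / total_words

# Toy numbers in the 13-17% range reported for the WSJ samples
print(baseline(150, 1000))  # 0.15
print(feature_precision([("sap", 1), ("sap", 2), ("sap", 1)], {1}))
```

A clue is promising when its precision clearly exceeds the baseline, since even a perfect clue's precision is capped by how much of the corpus is opinionated.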


Data

Opinion Pieces
Non-Opinion Pieces


Document Level Data

Existing opinion-piece annotations used for training

Manually refined classifications used for testing:
identified editorials not marked as such
3 hours/edition
Kappa = .93 for 2 judges

3 WSJ editions, each more than 150K words


Automatically Generated Unigram Features

Adjective and verb features were generated using distributional similarity (Lin 1998, Wiebe 2000)

Existing opinion-piece annotations used for training

Manually refined annotations used for testing


Unigram Feature Results

        WSJ-10 (baseline 17%)   WSJ-33 (baseline 13%)
        +prec/freq              +prec/freq
Adjs    +21/373                 +09/2137
Verbs   +16/721                 +07/3193


Example Adjective Feature

conclusive, undiminished, brute, amazing, unseen, draconian, insurmountable, unqualified, poetic, foxy, vintage, jaded, tropical, distributional, discernible, adept, paltry, warm, reprehensible, astonishing, surprising, commonplace, crooked, dreary, virtuoso, trashy, sandy, static, virulent, desolate, ours, proficient, noteworthy, insistent, daring, unforgiving, agreeable, uncritical, homicidal, comforting, erotic, resonant, ephemeral, believable, epochal, dense, exotic, topical, …


Unique Words (hapax legomena)

Subjective elements contain more single-instance words than expected

Unique-1-gram feature: all words that appear once in the test data

Precision is 1.5 times baseline precision

Frequent feature!
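A sketch of the unique-1-gram feature (hapax extraction); the function name and toy sentence are illustrative only.

```python
from collections import Counter

def unique_1_grams(tokens):
    """All words that appear exactly once in the test data (hapax legomena)."""
    counts = Counter(tokens)
    return {w for w, c in counts.items() if c == 1}

tokens = "the tale is fascinating and the ending is sublime".split()
# 'the' and 'is' occur twice, so the hapaxes are the other five words
print(unique_1_grams(tokens))
```

Because so many word types occur only once, this single feature fires very frequently, which is why its modest precision gain is still useful.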


Unigram Feature Results

              WSJ-10 (baseline 17%)   WSJ-33 (baseline 13%)
Adjs          +21/373                 +09/2137
Verbs         +16/721                 +07/3193
Unique-1-gram +10/6065                +06/6048

Results are consistent, even with different identification procedures (similarly for WSJ-22)


Collocational PSEs

get out, what a, for the last time, just as well, here we go again

Started with the observation that low-precision words often compose higher-precision collocations


Identifying Collocational PSEs

Searching for 2-grams, 3-grams, 4-grams
No grammatical generalizations or constraints yet

Train on the data annotated with subjective elements (expression level)

Test on the manually-refined opinion-piece data(document level)


Identifying Collocational PSEs: Training Data (reminder)

1000 WSJ sentences (2J)
462 newsgroup messages (2J)
15413 words newsgroup data (1J)

[Perhaps you’ll forgive me] for reposting his response

They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff]


N-Grams

Each position is filled by a word|POS pair

in|prep the|det air|noun


Identifying Collocational PSEs: Training, Step 1

Precision(n-gram) = # subjective instances of n-gram / total # instances of n-gram

Precision with respect to subjective elements calculated for all 1,2,3,4-grams in the training data

An instance of an n-gram is subjective if each word in the instance is in a subjective element
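A sketch of this training step. It assumes tokens carry a boolean "inside a subjective element" flag; that flag layout is an assumed encoding of the expression-level annotations, not the paper's actual data format.

```python
def ngram_precision(tagged_tokens, n):
    """Precision of each n-gram with respect to subjective elements.

    tagged_tokens: list of (word, in_subjective_element) pairs.
    An n-gram instance counts as subjective only if every word
    in the instance lies inside a subjective element.
    """
    total, subj = {}, {}
    for i in range(len(tagged_tokens) - n + 1):
        window = tagged_tokens[i:i + n]
        gram = tuple(w for w, _ in window)
        total[gram] = total.get(gram, 0) + 1
        if all(flag for _, flag in window):
            subj[gram] = subj.get(gram, 0) + 1
    return {g: subj.get(g, 0) / c for g, c in total.items()}

# "They promised [yet] more for [really good] [stuff]"
toks = [("they", False), ("promised", False), ("yet", True),
        ("more", False), ("for", False), ("really", True),
        ("good", True), ("stuff", True)]
print(ngram_precision(toks, 2)[("really", "good")])  # 1.0
```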


Identifying Collocational PSEs: Training

[Perhaps you’ll forgive me] for reposting his response

They promised [e+ 2 yet] more for [e+ 3 really good] [e? 1 stuff]

An instance of an n-gram is subjective if each word in the instance is in a subjective element


Identifying Collocational PSEs: Training, Step 2

N-gram PSEs selected based on their precisions, using two criteria:
1. Precision >= 0.1
2. Precision >= maximum precision of its constituents


Identifying Collocational PSEs: Training, Step 2

prec(w1,w2) >= max(prec(w1), prec(w2))

prec(w1,w2,w3) >= max(prec(w1,w2), prec(w3)) or
prec(w1,w2,w3) >= max(prec(w1), prec(w2,w3))

Precision >= maximum precision of its constituents
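The constituent test above can be sketched as follows; the `prec` table and its toy numbers are invented for illustration.

```python
def passes_constituent_test(gram, prec):
    """Criterion 2: keep an n-gram only if its precision meets or exceeds
    the max precision of its constituents, under at least one split.

    prec maps word tuples (1-, 2-, or 3-grams) to training precisions.
    """
    p = prec[gram]
    if len(gram) == 2:
        return p >= max(prec[gram[:1]], prec[gram[1:]])
    # trigram: prec(w1,w2,w3) >= max(prec(w1,w2), prec(w3))
    #       or prec(w1,w2,w3) >= max(prec(w1), prec(w2,w3))
    return (p >= max(prec[gram[:2]], prec[gram[2:]]) or
            p >= max(prec[gram[:1]], prec[gram[1:]]))

prec = {("here",): 0.05, ("we",): 0.04, ("go",): 0.06,
        ("here", "we"): 0.10, ("we", "go"): 0.12,
        ("here", "we", "go"): 0.20}
gram = ("here", "we", "go")
keep = prec[gram] >= 0.1 and passes_constituent_test(gram, prec)  # criteria 1 and 2
print(keep)  # True
```

This is the formal version of the motivating observation: the individual words are low precision, but the whole collocation is not.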


Results

              WSJ-10 (baseline 17%)   WSJ-33 (baseline 13%)
Adjs          +21/373                 +09/2137
Verbs         +16/721                 +07/3193
Unique-1-gram +10/6065                +06/6048
2-grams       +07/2182                +04/2080
3-grams       +09/271                 +06/262
4-grams       +05/32                  -03/30


Generalized Collocational PSEs

Replace each single-instance word in the training data with “UNIQUE”

Rerun the same training procedure, finding collocations such as highly|adverb UNIQUE|adj

To test the new collocations on test data, first replace each single-instance word in the test data with “UNIQUE”
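The UNIQUE substitution can be sketched as below. The word|POS string encoding follows the slides; the function name and example tokens are illustrative.

```python
from collections import Counter

def generalize_uniques(tokens):
    """Replace each single-instance word with the placeholder UNIQUE,
    keeping its POS tag (tokens are word|POS strings)."""
    counts = Counter(t.rsplit("|", 1)[0] for t in tokens)
    out = []
    for t in tokens:
        word, pos = t.rsplit("|", 1)
        out.append("UNIQUE|" + pos if counts[word] == 1 else t)
    return out

toks = ["highly|adverb", "unorthodox|adj", "and|conj",
        "highly|adverb", "erotic|adj"]
print(generalize_uniques(toks))
# ['highly|adverb', 'UNIQUE|adj', 'UNIQUE|conj', 'highly|adverb', 'UNIQUE|adj']
```

The same substitution must be applied to the test data before matching, so that a learned pattern like highly|adverb UNIQUE|adj can fire on hapaxes unseen in training.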


Results

              WSJ-10 (baseline 17%)   WSJ-33 (baseline 13%)
Adjs          +21/373                 +09/2137
Verbs         +16/721                 +07/3193
Unique-1-gram +10/6065                +06/6048
2-grams       +07/2182                +04/2080
3-grams       +09/271                 +06/262
4-grams       +05/32                  -03/30
U-2-grams     +24/294                 +14/288
U-3-grams     +27/132                 +13/144
U-4-grams     +83/3                   +15/27


Example

highly|adverb UNIQUE|adj

highly unsatisfactory

highly unorthodox

highly talented

highly conjectural

highly erotic


Example

UNIQUE|verb out|IN
farm out, chuck out, ruling out, crowd out, flesh out, blot out, spoken out, luck out


Examples

UNIQUE|adj to|TO UNIQUE|verb
impervious to reason, strange to celebrate, wise to temper

UNIQUE|noun of|IN its|pronoun
sum of its, usurpation of its, proprietor of its

they|pronoun are|verb UNIQUE|noun
they are fools, they are noncontenders


How do Fixed and U-Collocations Compare?

Recall the original motivation for investigating fixed n-gram PSEs: the observation that low-precision words often compose higher-precision collocations

But unique words are probably not low precision

Are we finding the same collocations two different ways? Or are we finding new PSEs?


Comparison

WSJ-10                  2-grams   3-grams   4-grams
Intersecting instances  4         2         0
% overlap               0.0016    0.0049    0

WSJ-33: all 0s


Opinion-Piece Recognition using Linear Regression

                              %correct   TP   FP
Adjs, verbs                   .896       5    4
Ngrams                        .899       5    3
Adjs, verbs, ngrams           .909       9    4
All features (+ max density)  .912       11   4

Max density: the maximum feature count in an 11-word window


Future Work

Methods for recognizing non-compositional phrases (e.g., Lin 1999)

Mutual bootstrapping (Riloff & Jones 1999) to alternately recognize sequences and subjective fillers


Sentence Classification

Binary features: pronoun, adjective, number, modal ¬“will”, adverb ¬“not”, new paragraph

Lexical feature: good for subj; good for obj; good for neither

Probabilistic classifier

10-fold cross validation; 51% baseline
72% average accuracy across folds
82% average accuracy on sentences rated certain
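The slides do not spell out the probabilistic classifier's form. As one plausible instantiation over the binary features, here is a Naive Bayes sketch with Laplace smoothing; it is not necessarily the model the authors used.

```python
from math import log

def train_nb(examples):
    """Naive Bayes over binary features (an illustrative stand-in for the
    slide's probabilistic classifier, not the authors' actual model).

    examples: list of (feature_vector, label), label in {'subj', 'obj'}.
    """
    labels = ("subj", "obj")
    n = len(examples)
    prior = {c: sum(1 for _, y in examples if y == c) for c in labels}
    n_feats = len(examples[0][0])
    ones = {c: [0] * n_feats for c in labels}
    for x, y in examples:
        for i, v in enumerate(x):
            ones[y][i] += v

    def classify(x):
        best, best_lp = None, float("-inf")
        for c in labels:
            lp = log(prior[c] / n)
            for i, v in enumerate(x):
                p1 = (ones[c][i] + 1) / (prior[c] + 2)  # Laplace smoothing
                lp += log(p1 if v else 1 - p1)
            if lp > best_lp:
                best, best_lp = c, lp
        return best
    return classify

# features: [pronoun, adjective, modal ¬"will"] -- toy training set
train = [([1, 1, 1], "subj"), ([1, 1, 0], "subj"),
         ([0, 0, 0], "obj"), ([0, 1, 0], "obj")]
clf = train_nb(train)
print(clf([1, 1, 1]))  # subj
```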


Test for Bias: Marginal Homogeneity

The worse the fit, the greater the bias

[4x4 contingency table over categories C1-C4, with row marginals X1+ … X4+ and column marginals X+1 … X+4]

Null hypothesis: p(i+) = p(+i) for all i


Test for Symmetric Disagreement: Quasi-Symmetry

[4x4 contingency table over categories C1-C4; the model fits the off-diagonal counts]

Tests relationships among the off-diagonal counts

The better the fit, the higher the correlation


Unigram PSEs

Adjectives and Verbs identified using Lin’s distributional similarity (Lin 1998)

Distributional similarity is often used in NLP to find synonyms

Motivating hypothesis: words may be similar because they have similar subjective usages


Unigram Feature Generation

AdjFeature = {}
For all Adjectives A in the training data:
    S = A + N most similar words to A
    P = precision(S) in the training data
    if P > T: AdjFeature += S

Many runs with various settings for N and T

Choose values of N and T on a validation set

Evaluate on a new test set
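A runnable version of the loop above; `most_similar` and `precision` are hypothetical stand-ins for Lin's similarity ranking and the training-set precision computation, and the toy data is invented.

```python
def grow_adj_feature(adjectives, most_similar, precision, n, t):
    """Seed each adjective with its N most similar words; keep the cluster
    if its training-set precision clears threshold T (a sketch of the
    slide's loop, with the two helpers supplied by the caller)."""
    feature = set()
    for adj in adjectives:
        cluster = {adj} | set(most_similar(adj, n))
        if precision(cluster) > t:
            feature |= cluster
    return feature

# Toy similarity lists and cluster precisions, for illustration only
sims = {"fascinating": ["amazing", "astonishing"], "brown": ["sandy", "gray"]}
precs = {frozenset({"fascinating", "amazing", "astonishing"}): 0.4,
         frozenset({"brown", "sandy", "gray"}): 0.05}
out = grow_adj_feature(["fascinating", "brown"],
                       lambda w, k: sims[w][:k],
                       lambda s: precs[frozenset(s)],
                       n=2, t=0.1)
print(sorted(out))  # ['amazing', 'astonishing', 'fascinating']
```

Sweeping N and T on a validation set, then freezing them for the test set, matches the procedure described on the slide.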


Lin’s Distributional Similarity

Lin 1998

Example: “I have a brown dog”, yielding dependency triples (Word, R, W):

I      R1  have
have   R2  dog
brown  R3  dog
. . .


Filtering

Seed Words → Words + Clusters → Filtered Set

Word + cluster removed if precision on training set < threshold


Parameters

Seed Words → Words + Clusters

Parameters: cluster size, threshold


Lin’s Distributional Similarity

For each word, collect the set T(Word) of (R, W) pairs statistically correlated with that word

sim(Word1, Word2) =
    Sum over (R,W) in T(Word1) ∩ T(Word2) of [ I(Word1,R,W) + I(Word2,R,W) ]
    divided by
    Sum over T(Word1) of I(Word1,R,W) + Sum over T(Word2) of I(Word2,R,W)
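The ratio above can be computed directly; the dict-of-triples layout below is an assumed encoding of T(Word) with its mutual-information weights, and the toy values are invented.

```python
def lin_similarity(t1, t2):
    """Lin (1998) similarity from mutual-information weighted triples.

    t1, t2: dicts mapping (relation, word) -> I(word, relation, word')
    for the two words being compared. Shared (R,W) pairs contribute to
    the numerator; all pairs of both words form the denominator.
    """
    shared = set(t1) & set(t2)
    num = sum(t1[rw] + t2[rw] for rw in shared)
    den = sum(t1.values()) + sum(t2.values())
    return num / den if den else 0.0

# Toy triples: "dog" and "cat" share the obj-of-"have" context
dog = {("obj-of", "have"): 2.0, ("mod", "brown"): 1.0}
cat = {("obj-of", "have"): 1.5, ("mod", "striped"): 0.5}
print(lin_similarity(dog, cat))  # 0.7
```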