CIS630, Penn: Different Sense Granularities. Martha Palmer, Olga Babko-Malaya. September 20, 2004.



Statistical Machine Translation results

CHINESE TEXT
The japanese court before china photo trade huge & lawsuit.
A large amount of the proceedings before the court dismissed workers.
japan’s court, former chinese servant industrial huge disasters lawsuit.
Japanese Court Rejects Former Chinese Slave Workers’ Lawsuit for Huge Compensation.

Outline

MT example
Sense tagging
Issues highlighted by Senseval1
Senseval2
Groupings, impact on ITA
Automatic WSD, impact on scores

WordNet - Princeton

On-line lexical reference (dictionary)
Words organized into synonym sets <=> concepts
Hypernyms (ISA), antonyms, meronyms (PART)
Useful for checking selectional restrictions (doesn’t tell you what they should be)
Typical top nodes - 5 out of 25:
(act, action, activity)
(animal, fauna)
(artifact)
(attribute, property)
(body, corpus)
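As a rough illustration of how ISA links can be used to check a selectional restriction, the sketch below walks a hypernym chain upward until it either reaches the required class or runs out of parents. The tiny `HYPERNYM` table and the function names are made up for this example and are not taken from WordNet itself.

```python
# Toy ISA hierarchy. These entries are illustrative assumptions,
# not copied from WordNet.
HYPERNYM = {
    "dog": "canine",
    "canine": "animal",
    "hammer": "tool",
    "tool": "artifact",
}


def isa(word, ancestor):
    """Return True if `ancestor` is reachable from `word` via hypernym links."""
    node = word
    while node is not None:
        if node == ancestor:
            return True
        node = HYPERNYM.get(node)  # None once we pass a top node
    return False


def satisfies(argument, restriction):
    """Check one argument against one selectional restriction (a class name)."""
    return isa(argument, restriction)
```

Note that the hierarchy only answers whether an argument belongs to a class; which class a verb should require still has to come from somewhere else, as the slide points out.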

WordNet – president, 6 senses

1. president -- (an executive officer of a firm or corporation) --> CORPORATE EXECUTIVE, BUSINESS EXECUTIVE … LEADER
2. President of the United States, President, Chief Executive -- (the person who holds the office of head of state of the United States government; "the President likes to jog every morning") --> HEAD OF STATE, CHIEF OF STATE
3. president -- (the chief executive of a republic) --> HEAD OF STATE, CHIEF OF STATE
4. president, chairman, chairwoman, chair, chairperson -- (the officer who presides at the meetings of an organization; "address your remarks to the chairperson") --> PRESIDING OFFICER --> LEADER
5. president -- (the head administrative officer of a college or university) --> ACADEMIC ADMINISTRATOR … LEADER
6. President of the United States, President, Chief Executive -- (the office of the United States head of state; "a President is elected every four years") --> PRESIDENCY, PRESIDENTSHIP --> POSITION

Limitations to WordNet

Poor inter-annotator agreement (73%)
Just sense tags - no representations
Very little mapping to syntax
No predicate argument structure
No selectional restrictions
No generalizations about sense distinctions
No hierarchical entries

SIGLEX98/SENSEVAL

Workshop on Word Sense Disambiguation
54 attendees, 24 systems, 3 languages
34 words (nouns, verbs, adjectives)
Both supervised and unsupervised systems
Training data, test data
Hector senses - very corpus based (mapping to WordNet)
Lexical samples - instances, not running text
Replicability over 90%, ITA 85%

ACL-SIGLEX98, SIGLEX99, CHUM00

Hector - bother, 10 senses

1. intransitive verb, - (make an effort), after negation, usually with to infinitive; (of a person) to take the trouble or effort needed (to do something). Ex. “About 70 percent of the shareholders did not bother to vote at all.”
1.1 (can't be bothered), idiomatic, be unwilling to make the effort needed (to do something). Ex. “The calculations needed are so tedious that theorists cannot be bothered to do them.”
2. vi; after neg; with ‘about’ or ‘with’; rarely cont – (of a person) to concern oneself (about something or someone). “He did not bother about the noise of the typewriter because Danny could not hear it above the sound of the tractor.”
2.1 v-passive; with ‘about’ or ‘with’ - (of a person) to be concerned about or interested in (something). “The only thing I'm bothered about is the well-being of the club.”

Mismatches between lexicons: Hector - WordNet, shake

VERBNET

VerbNet/WordNet

Mapping WN-Hector via VerbNet

SIGLEX99, LREC00

SENSEVAL2 – ACL’01

Adam Kilgarriff, Phil Edmonds and Martha Palmer

All-words task: Czech, Dutch, English, Estonian
Lexical sample task: Basque, Chinese, English, Italian, Japanese, Korean, Spanish, Swedish

English Lexical Sample - Verbs

Preparation for Senseval 2
manual tagging of 29 highly polysemous verbs (call, draw, drift, carry, find, keep, turn, ...)
WordNet (pre-release version 1.7)
To handle unclear sense distinctions
detect and eliminate redundant senses
detect and cluster closely related senses

NOT ALLOWED

WordNet – call, 28 senses

1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") -> LABEL
2. call, telephone, call up, phone, ring -- (get or try to get into communication (with someone) by telephone; "I tried to call you all night"; "Take two aspirin and call me in the morning") -> TELECOMMUNICATE
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") -> LABEL

WordNet – call, 28 senses

4. call, send for -- (order, request, or command to come; "She was called into the director's office"; "Call the police!") -> ORDER
5. shout, shout out, cry, call, yell, scream, holler, hollo, squall -- (utter a sudden loud cry; "she cried with pain when the doctor inserted the needle"; "I yelled to her from the window but she couldn't hear me") -> UTTER
6. visit, call in, call -- (pay a brief visit; "The mayor likes to call on some of the prominent citizens") -> MEET

Groupings Methodology

Double blind groupings, adjudication
Syntactic Criteria (VerbNet was useful)
Distinct subcategorization frames
call him a bastard
call him a taxi
Recognizable alternations – regular sense extensions:
play an instrument
play a song
play a melody on an instrument

Groupings Methodology (cont.)

Semantic Criteria
Differences in semantic classes of arguments
Abstract/concrete, human/animal, animate/inanimate, different instrument types, …
Differences in entailments
Change of prior entity or creation of a new entity?
Differences in types of events
Abstract/concrete/mental/emotional/…
Specialized subject domains

WordNet: call, 28 senses

WN2 WN13 WN28 WN15 WN26
WN3 WN19 WN4 WN7 WN8 WN9
WN1 WN22
WN20 WN25
WN18 WN27
WN5 WN16 WN6 WN23
WN12
WN17 WN11 WN10 WN14 WN21 WN24

WordNet: call, 28 senses, groups

WN2 WN13 WN28 WN15 WN26
WN3 WN19 WN4 WN7 WN8 WN9
WN1 WN22
WN20 WN25
WN18 WN27
WN5 WN16 WN6 WN23
WN12
WN17 WN11 WN10 WN14 WN21 WN24

Group labels: Phone/radio; Label; Loud cry; Bird or animal cry; Request; Call a loan/bond; Visit; Challenge; Bid

WordNet – call, 28 senses, Group 1

1. name, call -- (assign a specified, proper name to; "They named their son David"; "The new school was named after the famous Civil Rights leader") --> LABEL
3. call -- (ascribe a quality to or give a name of a common noun that reflects a quality; "He called me a bastard"; "She called her children lazy and ungrateful") --> LABEL
19. call -- (consider or regard as being; "I would not call her beautiful") --> SEE
22. address, call -- (greet, as with a prescribed form, title, or name; "He always addresses me with `Sir'"; "Call me Mister"; "She calls him by first name") --> ADDRESS

Sense Groups: verb ‘develop’

WN1 WN2 WN3 WN4
WN6 WN7 WN8 WN5 WN9 WN10
WN11 WN12 WN13 WN14
WN19 WN20

Groups 1 and 2 of Develop

Group             Sense No.  Gloss                                    Hypernym
1 – Abstract      WN1        Products, or mental creations            create
                  WN2        Mental creations – “new theory”          create
                             Gradually unfold – “the plot …”
2 – New           WN3        Personal attribute – “a passion for …”   change
    (property)    WN4        Physical characteristic – “a beard”      change

Group 3 of Develop

Group             Sense No.  Gloss                                    Hypernym
3 – New (self)    WN5        Originate – “new religious movement”     become
                  WN9        Gradually unfold – “the plot …”          occur
                  WN10       Grow – “a flower developed …”            grow
                  WN14       Mature – “The child developed …”         change
                  WN20       Happen – “report the news as it …”       occur

Group 4 of Develop

Group             Sense No.  Gloss                                    Hypernym
4 – Improve item  WN6        Resources – “natural resources”          improve
                  WN7        Ideas – “ideas in your thesis”           theorize
                  WN8        Train animate beings – “violinists”      teach
                  WN11       Civilize – “developing countries”        change
                  WN12       Make, grow – “develop the grain”         change
                  WN13       Business – “develop the market”          generate
                  WN19       Music – “develop the melody”             complicate

Maximum Entropy WSD
Hoa Dang (in progress)

Maximum entropy framework
combines different features with no assumption of independence
estimates conditional probability that W has sense X in context Y (where Y is a conjunction of linguistic features)
feature weights are determined from training data
weights produce a maximum entropy probability distribution
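A minimal sketch of the conditional model described on this slide, assuming binary features and one weight per (feature, sense) pair: the score of a sense is the sum of the weights of the active features, and the scores are normalized with a softmax to give p(sense | context). The function name, feature strings, and weight values are invented for illustration; in practice the weights would be estimated from training data.

```python
import math


def maxent_probs(active_features, weights, senses):
    """Return p(sense | context) for each sense.

    `weights` maps (feature, sense) pairs to real-valued weights;
    pairs that are absent contribute 0.
    """
    scores = {
        s: sum(weights.get((f, s), 0.0) for f in active_features)
        for s in senses
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}


# Hypothetical weights for two senses of "call".
WEIGHTS = {
    ("obj=taxi", "request"): 2.0,
    ("obj=taxi", "label"): -1.0,
    ("has_complement", "label"): 1.5,
}

probs = maxent_probs(["obj=taxi"], WEIGHTS, ["request", "label"])
```

Because the features are simply summed inside the exponent, overlapping features can be combined without assuming they are independent.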

Features used

Topical contextual linguistic feature for W:
presence of automatically determined keywords in S
Local contextual linguistic features for W:
presence of subject, complements
words in subject, complement positions, particles, preps
noun synonyms and hypernyms for subjects, complements
named entity tag (PERSON, LOCATION, ..) for proper Ns
words within +/- 2 word window
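The simplest of these features, the words within a +/- 2 word window, can be sketched as follows. The feature-string format (`w[-1]=she`) and the function name are assumptions made for this example; the syntactic and semantic features would additionally need a parser, WordNet lookups, and a named entity tagger.

```python
def window_features(tokens, index, size=2):
    """Collect the words within +/- `size` positions of tokens[index].

    Each feature records the relative offset and the lowercased word.
    Positions that fall outside the sentence are skipped.
    """
    feats = []
    for offset in range(-size, size + 1):
        if offset == 0:
            continue  # skip the target word W itself
        pos = index + offset
        if 0 <= pos < len(tokens):
            feats.append(f"w[{offset:+d}]={tokens[pos].lower()}")
    return feats


tokens = "She called her children lazy".split()
feats = window_features(tokens, 1)  # target word: "called"
```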

Maximum Entropy WSD
Hoa Dang, Senseval2 Verbs (best)

Maximum entropy framework, p(sense|context)
Contextual Linguistic Features
Topical feature for W: +2.5%
keywords (determined automatically)
Local syntactic features for W: +1.5 to +5%
presence of subject, complements, passive?
words in subject, complement positions, particles, preps, etc.
Local semantic features for W: +6%
Semantic class info from WordNet (synsets, etc.)
Named Entity tag (PERSON, LOCATION, ..) for proper Ns
words within +/- 2 word window

Results - first 5 Senseval2 verbs

Verb         Begin   Call    Carry   Develop  Draw    Dress
WN/corpus    10/9    28/14   39/22   21/16    35/21   15/8
Grp/corp     10/9    11/7    16/11   9/6      15/9    7/4
Entropy      1.76    3.68    3.97    3.17     4.60    2.89
ITA-fine     .812    .693    .607    .678     .767    .865
ITA-coarse   .814    .892    .753    .852     .825    1.00
MX-fine      .832    .470    .379    .493     .366    .610
MX-coarse    .832    .636    .485    .681     .512    .898
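The slide does not say how the Entropy row was computed. Assuming it is the Shannon entropy, in bits, of a verb's sense distribution over the tagged corpus, it could be computed along these lines. The function name is invented, and the slide's actual figures cannot be reproduced without the underlying tag counts.

```python
import math
from collections import Counter


def sense_entropy(tags):
    """Shannon entropy (base 2) of the sense tags observed for one verb."""
    counts = Counter(tags)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this reading, a verb whose instances are spread evenly over many senses gets a high value, and a verb dominated by one sense gets a value near zero.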

Results – averaged over 28 verbs

             Total
WN/corpus    16.28/10.83
Grp/corp     8.07/5.90
Entropy      2.81
ITA-fine     71%
ITA-coarse   82%
MX-fine      59%
MX-coarse    69%

Grouping improved sense identification for MxWSD

75% with training and testing on grouped senses vs. 43% with training and testing on fine-grained senses
Most commonly confused senses suggest grouping:
(1) name, call -- assign a specified proper name to; "They called their son David"
(2) call -- ascribe a quality to or give a name that reflects a quality; "He called me a bastard"
(3) call -- consider or regard as being; "I would not call her beautiful"
(4) address, call -- greet, as with a prescribed form, title, or name; "Call me Mister"; "She calls him by his first name"
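To illustrate why coarse-grained scoring is more forgiving, the sketch below scores the same predictions twice: once on raw WordNet sense tags, and once after mapping each tag to its group. The mapping covers only the four senses listed above (WN1, WN3, WN19, WN22); calling their group "Label" is an assumption, and the gold and predicted tags are invented.

```python
# Fine sense -> group, for the four confusable senses of "call" only.
# The group name "Label" is an assumption made for this example.
GROUP_OF = {"WN1": "Label", "WN3": "Label", "WN19": "Label", "WN22": "Label"}


def accuracy(gold, predicted, mapping=None):
    """Fraction of matching tags; with `mapping`, compare groups instead."""
    def key(tag):
        return mapping.get(tag, tag) if mapping else tag

    hits = sum(key(g) == key(p) for g, p in zip(gold, predicted))
    return hits / len(gold)


# Hypothetical tagger output on four instances.
gold = ["WN1", "WN3", "WN19", "WN22"]
pred = ["WN3", "WN3", "WN1", "WN22"]

fine = accuracy(gold, pred)               # exact sense must match
coarse = accuracy(gold, pred, GROUP_OF)   # any sense in the same group counts
```

The same comparison applies to inter-annotator agreement: two annotators who pick different senses inside one group disagree at the fine level but agree at the coarse level.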

Criteria to split Framesets

Semantic classes of arguments, such as animacy vs. inanimacy
Serve 01. Act, work
Group 1: function (His freedom served him well)
Group 2: work (He served in Congress)

Criteria to split Framesets

Semantic type of event (abstract vs. concrete)
See 01. View
Group 1: Perceive by sight (Can you see the bird?)
Group 5: determine, check (See whether it works)

Overlap with PropBank Framesets

WN5 WN16 WN12 WN15 WN26
WN3 WN19 WN4 WN7 WN8 WN9
WN1 WN22
WN20 WN25
WN18 WN27
WN2 WN13 WN6 WN23
WN28
WN17 WN11 WN10 WN14 WN21 WN24

Group labels: Loud cry; Label; Phone/radio; Bird or animal cry; Request; Call a loan/bond; Visit; Challenge; Bid

Overlap between Senseval2 Groups and Framesets – 95%

develop

WN1 WN2 WN3 WN4
WN6 WN7 WN8 WN5 WN9 WN10
WN11 WN12 WN13 WN14
WN19 WN20

Frameset1
Frameset2

Framesets → Groups → WordNet

drop

WN1 WN2
WN9 WN8
WN3 WN4 WN12 WN5 WN16
WN18 WN14 WN7
WN15
WN10 WN6 WN13
WN11

Frameset1 Frameset2 Frameset3


Translations of Develop groups

Group  Sense No.            Portuguese      German
G4     WN13 markets         desenvolver     entwickeln
G1     WN1 products         desenvolver     entwickeln
G1     WN2 ways             desenvolver     entwickeln
G2     WN2 theory           desenvolver     entwickeln
G2     WN3 understanding    desenvolver     bilden
G4     WN2 character        desenvolver     ausbilden
G3     WN10 bacteria        desenvolver-se  bilden sich
G3     WN5 movements        desenvolver-se  bilden

Translations of Develop groups

Group  Sense No.            Chinese          Korean
G4     WN13 markets         kai1-fa1         hyengsengha-ta
G1     WN1 products         kai1-fa1         kaypalha-ta
G1     WN2 ways             fa1-zhan3        palcensikhi-ta
G2     WN2 theory           pei2-yang3-chu1  palcensikhi-ta
G2     WN3 understanding    pei2-yang3       yangsengha-ta
G4     WN2 character        pei2-yang3       yangsengha-ta
G3     WN10 bacteria        fa1-yu4          paltalha-ta
G3     WN5 movements        xing2-cheng2     hyengsengtoy-ta

An Example of Mapping: verb ‘serve’
Assignment: Do you agree?

Frameset id = serve.01
serve 01: Act, work
Roles:
Arg0: worker
Arg1: job, project
Arg2: employer

Sense Groups
GROUP 1: WN1 (function), WN3 (contribute to), WN12 (answer)
GROUP 2: WN2 (do duty), WN13 (do military service)
GROUP 3: WN4 (be used by), WN8 (serve well), WN14 (service)
GROUP 5: WN7 (devote one’s efforts), WN10 (attend to)
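One possible way to hold this mapping in code is a nested dictionary from Frameset to sense group to WordNet senses, with a lookup that walks back up from a fine-grained sense. The group memberships below are copied from the slide; the dictionary layout and the `locate` helper are assumptions made for illustration.

```python
# Frameset -> sense group -> WordNet senses, as listed for serve.01.
HIERARCHY = {
    "serve.01": {
        "GROUP 1": ["WN1", "WN3", "WN12"],
        "GROUP 2": ["WN2", "WN13"],
        "GROUP 3": ["WN4", "WN8", "WN14"],
        "GROUP 5": ["WN7", "WN10"],
    },
}


def locate(sense):
    """Return (frameset, group) for a WordNet sense, or None if unmapped."""
    for frameset, groups in HIERARCHY.items():
        for group, senses in groups.items():
            if sense in senses:
                return frameset, group
    return None
```

With such a table, a fine-grained tag such as WN13 can be backed off to GROUP 2 or to serve.01 when a coarser distinction is enough.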

Frameset Tagging Results: overall accuracy 90%* (baseline 73.5%)

Verb     Framesets  Instances  Accuracy
call     11         522        0.835
carry    4          195        0.933
develop  2          240        0.938
draw     3          94         0.926
leave    3          147        0.762
pull     6          88         0.784
serve    2          150        0.967
use      2          820        0.988
work     7          398        0.955

* Gold Standard parses
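The per-verb figures can be combined in two ways: a micro-average, which weights each verb by its number of instances, and a macro-average, which weights every verb equally. The sketch below computes both over the nine verbs in the table. The slide does not say which average the overall 90% refers to, and that figure covers verbs not listed here, so neither number is expected to match it exactly.

```python
# (verb, instances, accuracy) rows copied from the table.
ROWS = [
    ("call", 522, 0.835),
    ("carry", 195, 0.933),
    ("develop", 240, 0.938),
    ("draw", 94, 0.926),
    ("leave", 147, 0.762),
    ("pull", 88, 0.784),
    ("serve", 150, 0.967),
    ("use", 820, 0.988),
    ("work", 398, 0.955),
]


def micro_average(rows):
    """Instance-weighted accuracy: frequent verbs dominate."""
    return sum(n * acc for _, n, acc in rows) / sum(n for _, n, _ in rows)


def macro_average(rows):
    """Unweighted mean of per-verb accuracies."""
    return sum(acc for _, _, acc in rows) / len(rows)


micro = micro_average(ROWS)
macro = macro_average(ROWS)
```

For these nine rows the micro-average comes out near 0.92 and the macro-average near 0.90; the gap is mostly due to the 820 instances of use, which has the highest accuracy.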

Sense Hierarchy

PropBank Framesets – ITA 94%, coarse grained distinctions
20 Senseval2 verbs w/ > 1 Frameset
Maxent WSD system, 73.5% baseline, 90% accuracy
Sense Groups (Senseval-2) – ITA 82% (now 89%)
Intermediate level (includes Levin classes) – 69%
WordNet – ITA 71%, fine grained distinctions, 60.2%

Summary of WSD

Choice of features is more important than choice of machine learning algorithm
Importance of syntactic structure (English WSD but not Chinese)
Importance of dependencies
Importance of a hierarchical approach to sense distinctions, and quick adaptation to new usages