CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 4– Wordnet and Word...

42
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 4– Wordnet and Word Sense Disambiguation, cntd) Pushpak Bhattacharyya CSE Dept., IIT Bombay 11 th Jan, 2011

description

CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 4– Wordnet and Word Sense Disambiguation, cntd ). Pushpak Bhattacharyya CSE Dept., IIT Bombay 11 th Jan , 2011. Foundation of Wordnet: Lexical Matrix. Meaning-form relationship. Meanings container. - PowerPoint PPT Presentation

Transcript of CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 4– Wordnet and Word...

Page 1: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 4– Wordnet and Word Sense Disambiguation, cntd)

Pushpak BhattacharyyaCSE Dept., IIT Bombay 11th Jan, 2011

Page 2: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Foundation of Wordnet: Lexical Matrix

Page 3: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Meaning-form relationship

M1M2M3

Mk-1Mk

.

.

.

W1W2W3

Wk-1Wk

.

.

.

Many to many relationship

Meanings container

Word forms container

Page 4: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Meaning-formMeaning-form

Homonymy (accidental identity or word

borrowing)

Polysemy(shades of meaning)Eg: Fall

1. The kingdom fell

2. The fruit fellHomography(same picture)Eg: river bank and financial

bank

Homophony(same sound)Eg: write and

right

Page 5: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Distributional Similarity Words which are semantically

similar tend to appear in syntactically similar contexts.

The neighbors of the words tend to be the same.

Technically known as Distributional Similarity.

Page 6: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Synset+Gloss+ExampleCrucially needed for concept explication, wordnet building

using another wordnet and wordnet linking.

English Synset: {earthquake, quake, temblor, seism} -- (shaking and vibration at the surface of the earth resulting from underground movement along a fault plane of from volcanic activity)

Hindi Synset: {भूकंप, भूचाल, भूडोल, जलजला, भूकम्प, भू-कंप, भू-कम्प, ज़लज़ला, भूमि�कंप, भूमि�कम्प  - प्राकृति�क कारणों से पृथ्वी के भी�री भाग में कुछ उथल-पुथल होने से ऊपरी भाग के सहसा तिहलने की ति या   "२००१ में गुज़रा� में आये भूकंप में काफ़ी लोग मारे गये थे"

(shaking of the surface of earth; many were killed in the earthquake in Gujarat)

Marathi Synset: धरणीकंप,  भूकंप -  पृथ्वीच्या पोटात द्रव्यक्षोभ होऊन पृष्ठभाग हालण्याची क्रि%या " २००१ साली गुजरात�ध्ये झालेल्या धरणीकंपात अनेक लोक

�ृत्यु�ुखी पडले"

Page 7: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Semantic Relations Hypernymy and Hyponymy

Relation between word senses (synsets)

X is a hyponym of Y if X is a kind of Y Hyponymy is transitive and

asymmetrical Hypernymy is inverse of Hyponymy (lion->animal->animate entity->entity)

Page 8: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Semantic Relations (continued) Meronymy and Holonymy

Part-whole relation, branch is a part of tree

X is a meronymy of Y if X is a part of Y

Holonymy is the inverse relation of Meronymy

{kitchen} ………………………. {house}

Page 9: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Lexical Relation Antonymy

Oppositeness in meaning Relation between word forms Often determined by phonetics, word

length etc. ({rise, ascend} vs. {fall, descend})

Page 10: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Gloss

study

Hyponymy

Hyponymy

Dwelling,abode

bedroom

kitchen

house,homeA place that serves as the living quarters of one or mor efamilies

guestroom

veranda

bckyard

hermitage cottage

Meronymy

Hyponymy

Meronymy

Hypernymy

WordNet Sub-Graph

Page 11: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Troponym and Entailment Entailment

{snoring – sleeping}

Troponym {limp, strut – walk} {whisper – talk}

Page 12: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

EntailmentSnoring entails sleeping.Buying entails paying.

Proper Temporal Inclusion. Inclusion can be in any way.

Sleeping temporally includes snoring.Buying temporally includes paying.

Co-extensiveness. (Troponymy)Limping is a manner of walking.

Page 13: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Opposition among verbs. {Rise,ascend} {fall,descend} Tie-untie (do-undo)

Walk-run (slow,fast)Teach-learn (same activity different perspective)Rise-fall (motion upward or downward)

Opposition and Entailment. Hit or miss (entail aim) . Backward presupposition. Succeed or fail (entail try.)

Page 14: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

The causal relationship.Show- see.Give- have.

Causation and Entailment. Giving entails having. Feeding entails eating.

Page 15: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )
Page 16: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Kinds of AntonymySize Small - BigQuality Good – BadState Warm – CoolPersonality Dr. Jekyl- Mr. HydeDirection East- WestAction Buy – SellAmount Little – A lotPlace Far – NearTime Day - NightGender Boy - Girl

Page 17: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Kinds of MeronymyComponent-object

Head - Body

Staff-object Wood - TableMember-

collectionTree - Forest

Feature-Activity Speech - Conference

Place-Area Palo Alto - California

Phase-State Youth - LifeResource-

processPen - Writing

Actor-Act Physician - Treatment

Page 18: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

GradationState Childhood, Youth, Old

age

Temperature Hot, Warm, Cold

Action Sleep, Doze, Wake

Page 19: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Metonymy Associated with Metaphors which

are epitomes of semantics Oxford Advanced Learners

Dictionary definition: “The use of a word or phrase to mean something different from the literal meaning”

Does it mean Careless Usage?!

Page 20: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Insight from Sanskritic Tradition Power of a word

Abhidha, Lakshana, Vyanjana Meaning of Hall:

The hall is packed (avidha) The hall burst into laughing

(lakshana) The Hall is full (unsaid: and so we

cannot enter) (vyanjana)

Page 21: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Metaphors in Indian Tradition upamana and upameya

Former: object being compared Latter: object being compared with Puru was like a lion in the battle with

Alexander (Puru: upameya; Lion: upamana)

Page 22: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Upamana, rupak, atishayokti upamana: Explicit comparison

Puru was like a lion in the battle with Alexander

rupak: Implicit comparison Puru was a lion in the battle with

Alexander Atishayokti (exaggeration):

upamana and upameya dropped Puru’s army fled. But the lion fought

on.

Page 23: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Modern study (1956 onwards, Richards et. al.) Three constituents of metaphor

Vehicle (items used metaphorically) Tenor (the metaphorical meaning of the

former) Ground (the basis for metaphorical

extension) “The foot of the mountain”

Vehicle: :foot” Tenor: “lower portion” Ground: “spatial parallel between the

relationship between the foot to the human body and the lower portion of the mountain with the rest of the mountain”

Page 24: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Interaction of semantic fields(Haas)

Core vs. peripheral semantic fields Interaction of two words in

metonymic relation brings in new semantic fields with selective inclusion of features

Leg of a table Does not stretch or move Does stand and support

Page 25: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Lakoff’s (1987) contribution Source Domain Target Domain Mapping Relations

Page 26: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Mapping Relations: ontological correspondences Anger is

heat of fluid in container

Heat(i) Container(ii) Agitation of fluid(iii) Limit of resistence(iv) Explosion

AngerBodyAgitation of mindLimit of ability to suppressLoss of control

Page 27: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Image Schemas Categories: Container Contained Quantity

More is up, less is down: Outputs rose dramatically; accidents rates were lower

Linear scales and paths: Ram is by far the best performer

Time Stationary event: we are coming to exam

time Stationary observer: weeks rush by

Causation: desperation drove her to extreme steps

Page 28: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Patterns of Metonymy Container for contained

The kettle boiled (water) Possessor for possessed/attribute

Where are you parked? (car) Represented entity for

representative The government will announce new

targets Whole for part

I am going to fill up the car with petrol

Page 29: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Patterns of Metonymy (contd)

Part for whole I noticed several new faces in the

class Place for institution

Lalbaug witnessed the largest Ganapati

Question: Can you have part-part metonymy

Page 30: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Purpose of Metonymy More idiomatic/natural way of

expression More natural to say the kettle is boiling as

opposed to the water in the kettle is boiling Economy

Room 23 is answering (but not *is asleep) Ease of access to referent

He is in the phone book (but not *on the back of my hand)

Highlighting of associated relation The car in the front decided to turn right

(but not *to smoke a cigarette)

Page 31: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Feature sharing not necessary In a restaurant:

Jalebii ko abhi dudh chaiye (no feature sharing)

The elephant now wants some coffee (feature sharing)

Page 32: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Proverbs Describes a specific event or state

of affairs which is applicable metaphorically to a range of events or states of affairs provided they have the same or sufficiently similar image-schematic structure

Page 33: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

WSD APPROACHES

Page 34: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

WSD – Problem Definition Obtain the sense of

A set of target words, or of All words (all word WSD, more

difficult) Against a

Sense repository (like the wordnet), or A thesaurus (not same as wordnet,

does not have semantic relations) Using the

Context in which the word appears.

Page 35: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

Example word: operation operation ((computer science) data processing in which the result is

completely specified by a rule (especially the processing that results from a single instruction)) "it can perform millions of operations per second"

operation, military operation (activity by a military or naval force (as a maneuver or campaign)) "it was a joint operation of the navy and air force"

operation, surgery, surgical operation, surgical procedure, surgical process (a medical procedure involving an incision with instruments; performed to repair damage or arrest disease in a living body) "they will schedule the operation as soon as an operating room is available"; "he died while undergoing surgery"

mathematical process, mathematical operation, operation ((mathematics) calculation by mathematical methods) "the problems at the end of the chapter demonstrated the mathematical processes involved in the derivation"; "they were learning the basic operations of arithmetic"

Page 36: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

36

KNOWLEDEGE BASED v/s MACHINE LEARNING BASED v/s HYBRID APPROACHESKnowledge Based Approaches

Rely on knowledge resources like WordNet, Thesaurus etc.

May use grammar rules for disambiguation. May use hand coded rules for disambiguation.

Machine Learning Based Approaches Rely on corpus evidence. Train a model using tagged or untagged corpus. Probabilistic/Statistical models.

Hybrid ApproachesUse corpus evidence as well as semantic relations

form WordNet.

Page 37: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

37

SELECTIONAL PREFERENCES (INDIAN TRADITION)

“Desire” of some words in the sentence (“aakaangksha”).

I saw the boy with long hair. The verb “saw” and the noun “boy” desire an object here.

“Appropriateness” of some other words in the sentence to fulfil that desire (“yogyataa”).

I saw the boy with long hair. The PP “with long hair” can be appropriately connected only to

“boy” and not “saw”.

In case, the ambiguity is still present, “proximity” (“sannidhi”) can determine the meaning.

E.g. I saw the boy with a telescope. The PP “with a telescope” can be attached to both “boy” and

“saw”, so ambiguity still present. It is then attached to “boy” using the proximity check.

37

Page 38: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

38

SELECTIONAL PREFERENCES (RECENT LINGUISTIC THEORY)

There are words which demand arguments, like, verbs, prepositions, adjectives and sometimes nouns. These arguments are typically nouns.

Arguments must have the property to fulfil the demand. They must satisfy selectional preferences.

Example Give (verb)

agent – animate obj – direct obj – indirect

I gave him the book I gave him the book (yesterday in the school) ->

adjunct How does this help in WSD?

One type of contextual information is the information about the type of arguments that a word takes.

38

Page 39: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

39

WSD USING SELECTIONAL PREFERENCES AND ARGUMENTS

39

This airlines serves dinner in the evening flight.

serve (Verb) agent object – edible

This airlines serves the sector between Agra & Delhi.

serve (Verb) agent object – sector

Sense 1 Sense 2

Requires exhaustive enumeration of: Argument-structure of verbs. Selectional preferences of arguments. Description of properties of words such that meeting the selectional

preference criteria can be decided.E.g. This flight serves the “region” between Mumbai and DelhiHow do you decide if “region” is compatible with “sector”

Page 40: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

40

OVERLAP BASED APPROACHES Require a Machine Readable Dictionary (MRD).

Find the overlap between the features of different senses of an ambiguous word (sense bag) and the features of the words in its context (context bag).

These features could be sense definitions, example sentences, hypernyms etc.

The features could also be given weights.

The sense which has the maximum overlap is selected as the contextually appropriate sense.

40

Page 41: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

41

LESK’S ALGORITHM

41

Sense 1Trees of the olive family with pinnate leaves, thin furrowed bark and gray branches.

Sense 2The solid residue left when combustible material is thoroughly burned or oxidized.

Sense 3To convert into ash

Sense 1A piece of glowing carbon or burnt wood.

Sense 2charcoal.

Sense 3A black solid combustible substance formed by the partial decomposition of vegetable matter without free access to air and under the influence of moisture and often increased pressure and temperature that is widely used as a fuel for burning

Ash Coal

Sense Bag: contains the words in the definition of a candidate sense of the ambiguous word.Context Bag: contains the words in the definition of each sense of each context word.

E.g. “On burning coal we get ash.”

In this case Sense 2 of ash would be the winner sense.

Page 42: CS460/626 : Natural Language  Processing/Speech, NLP and the Web (Lecture  4– Wordnet and Word Sense Disambiguation,  cntd )

42

WALKER’S ALGORITHM A Thesaurus Based approach. Step 1: For each sense of the target word find the thesaurus

category to which that sense belongs. Step 2: Calculate the score for each sense by using the context

words. A context words will add 1 to the score of the sense if the thesaurus category of the word matches that of the sense.

E.g. The money in this bank fetches an interest of 8% per annum

Target word: bank Clue words from the context: money, interest, annum,

fetch Sense1: Finance Sense2: LocationMoney +1 0

Interest +1 0Fetch 0 0Annum +1 0Total 3 0

Context wordsadd 1 to thesense when the topic of theword matches thatof the sense