Lecture 5: Lexical Relations & WordNet

Page 1: Lecture 5: Lexical Relations & WordNet

2003.09.09 - SLIDE 1 - IS 202 – FALL 2003

Lecture 5: Lexical Relations & WordNet

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2003

http://www.sims.berkeley.edu/academics/courses/is202/f03/

SIMS 202: Information Organization and Retrieval

Page 2: Lecture 5: Lexical Relations & WordNet

Lecture Overview

• Review

• Lexical Relations

• WordNet

• Demo

• Discussion Questions

• Action Items for Next Time

Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

Page 4: Lecture 5: Lexical Relations & WordNet

Definition of AI

“... artificial intelligence [AI] is the science of making machines do things that would require intelligence if done by [humans]” (Minsky, 1963)

Page 5: Lecture 5: Lexical Relations & WordNet

The Goals of AI Are Not New

• Ancient Greece
  – Daedalus’ automata

• Judaism’s myth of the Golem

• 18th century automata
  – Singing, dancing, playing chess?

• Mechanical metaphors for mind
  – Clock
  – Telegraph/telephone network
  – Computer

Page 6: Lecture 5: Lexical Relations & WordNet

Some Areas of AI

• Knowledge representation
• Programming languages
• Natural language understanding
• Speech understanding
• Vision
• Robotics
• Planning
• Machine learning
• Expert systems
• Qualitative simulation

Page 7: Lecture 5: Lexical Relations & WordNet

AI or IA?

• Artificial Intelligence (AI)
  – Make machines as smart as (or smarter than) people

• Intelligence Amplification (IA)
  – Use machines to make people smarter

Page 8: Lecture 5: Lexical Relations & WordNet

Furnas: The Vocabulary Problem

• People use different words to describe the same things
  – “If one person assigns the name of an item, other untutored people will fail to access it on 80 to 90 percent of their attempts.”
  – “Simply stated, the data tell us there is no one good access term for most objects.”

Page 9: Lecture 5: Lexical Relations & WordNet

The Vocabulary Problem

• How is it that we come to understand each other?
  – Shared context
  – Dialogue

• How can machines come to understand what we say?
  – Shared context?
  – Dialogue?

Page 10: Lecture 5: Lexical Relations & WordNet

Vocabulary Problem Solutions?

• Furnas et al.
  – Make the user memorize precise system meanings
  – Have the user and system interact to identify the precise referent
  – Provide infinite aliases to objects

• Minsky and Lenat
  – Give the system “commonsense” so it can understand what the user’s words can mean

Page 11: Lecture 5: Lexical Relations & WordNet

CYC

• Decades-long effort to build a commonsense knowledge base

• Storied past

• 100,000 basic concepts

• 1,000,000 assertions about the world

• The validity of Cyc’s assertions is context-dependent (default reasoning)

Page 12: Lecture 5: Lexical Relations & WordNet

Cyc Examples

• Cyc can find the match between a user's query for "pictures of strong, adventurous people" and an image whose caption reads simply "a man climbing a cliff"

• Cyc can notice if an annual salary and an hourly salary are inadvertently being added together in a spreadsheet

• Cyc can combine information from multiple databases to guess which physicians in practice together had been classmates in medical school

• When someone searches for "Bolivia" on the Web, Cyc knows not to offer a follow-up question like "Where can I get free Bolivia online?"

Page 13: Lecture 5: Lexical Relations & WordNet

Cyc Applications

• Applications currently available or in development
  – Integration of Heterogeneous Databases
  – Knowledge-Enhanced Retrieval of Captioned Information
  – Guided Integration of Structured Terminology (GIST)
  – Distributed AI
  – WWW Information Retrieval

• Potential applications
  – Online brokering of goods and services
  – "Smart" interfaces
  – Intelligent character simulation for games
  – Enhanced virtual reality
  – Improved machine translation
  – Improved speech recognition
  – Sophisticated user modeling
  – Semantic data mining

Page 14: Lecture 5: Lexical Relations & WordNet

Cyc’s Top-Level Ontology

• Fundamentals • Top Level • Time and Dates • Types of Predicates • Spatial Relations • Quantities • Mathematics • Contexts • Groups • "Doing" • Transformations • Changes Of State • Transfer Of Possession • Movement • Parts of Objects

• Composition of Substances • Agents • Organizations • Actors • Roles • Professions • Emotion • Propositional Attitudes • Social • Biology • Chemistry • Physiology • General Medicine

• Materials • Waves • Devices • Construction • Financial • Food • Clothing • Weather • Geography • Transportation • Information • Perception • Agreements • Linguistic Terms • Documentation

http://www.cyc.com/cyc-2-1/toc.html

Page 15: Lecture 5: Lexical Relations & WordNet

Lecture Overview

• Review

• Lexical Relations

• WordNet

• Demo

• Discussion Questions

• Action Items for Next Time

Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

Page 16: Lecture 5: Lexical Relations & WordNet

Syntax

• The syntax of a language is to be understood as a set of rules which accounts for the distribution of word forms throughout the sentences of a language

• These rules codify permissible combinations of classes of word forms

Page 17: Lecture 5: Lexical Relations & WordNet

Semantics

• Semantics is the study of linguistic meaning

• Two standard approaches to lexical semantics (cf. sentential semantics and logical semantics):
  – (1) compositional
  – (2) relational

Page 18: Lecture 5: Lexical Relations & WordNet

Lexical Semantics: Compositional Approach

• Compositional lexical semantics, introduced by Katz & Fodor (1963), analyzes the meaning of a word in much the same way a sentence is analyzed into semantic components. The semantic components of a word are not themselves considered to be words, but are abstract elements (semantic atoms) postulated in order to describe word meanings (semantic molecules) and to explain the semantic relations between words. For example, the representation of bachelor might be ANIMATE and HUMAN and MALE and ADULT and NEVER MARRIED. The representation of man might be ANIMATE and HUMAN and MALE and ADULT; because all the semantic components of man are included in the semantic components of bachelor, it can be inferred that bachelor → man. In addition, there are implicational rules between semantic components, e.g. HUMAN → ANIMATE, which also look very much like meaning postulates.
  – George Miller, “On Knowing a Word,” 1999
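
The inclusion argument in the quotation can be sketched with Python sets. This is a toy illustration and not Katz & Fodor's formalism: the feature sets are the ones quoted above, while the names `FEATURES` and `entails` are assumptions made for this example.

```python
# Toy model of compositional lexical semantics: each word is a set of
# semantic components. The feature sets come from the quotation; the
# dictionary and function names are illustrative assumptions.
FEATURES = {
    "man": {"ANIMATE", "HUMAN", "MALE", "ADULT"},
    "bachelor": {"ANIMATE", "HUMAN", "MALE", "ADULT", "NEVER MARRIED"},
}


def entails(specific: str, general: str) -> bool:
    """Return True if every component of `general` is also in `specific`."""
    return FEATURES[general] <= FEATURES[specific]


print(entails("bachelor", "man"))  # True
print(entails("man", "bachelor"))  # False
```

The subset test runs in one direction only: every component of man is found in bachelor, but NEVER MARRIED is missing from man.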

Page 19: Lecture 5: Lexical Relations & WordNet

Lexical Semantics: Relational Approach

• Relational lexical semantics was first introduced by Carnap (1956) in the form of meaning postulates, where each postulate stated a semantic relation between words. A meaning postulate might look something like dog → animal (if x is a dog then x is an animal) or, adding logical constants, bachelor → man and never married [if x is a bachelor then x is a man and not(x has married)] or tall → not short [if x is tall then not(x is short)]. The meaning of a word was given, roughly, by the set of all meaning postulates in which it occurs.
  – George Miller, “On Knowing a Word,” 1999
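
The last sentence of the quotation, that a word's meaning is roughly the set of postulates it occurs in, can be sketched as a lookup over a small postulate list. The postulates are the three from the quotation; the `Postulate` record, the `negated` flag, and the `meaning` function are assumptions made for this illustration, not Carnap's notation.

```python
from typing import NamedTuple


class Postulate(NamedTuple):
    """If x is `antecedent`, then x is (or is not) `consequent`."""
    antecedent: str
    consequent: str
    negated: bool = False


# The three example postulates from the quotation, split into single
# implications (bachelor -> man, bachelor -> not married, and so on).
POSTULATES = [
    Postulate("dog", "animal"),
    Postulate("bachelor", "man"),
    Postulate("bachelor", "married", negated=True),
    Postulate("tall", "short", negated=True),
]


def meaning(word: str) -> list[Postulate]:
    """Return every postulate in which `word` occurs, on either side."""
    return [p for p in POSTULATES if word in (p.antecedent, p.consequent)]


print(len(meaning("bachelor")))  # 2
print(meaning("short"))
```

Note that "short" gets a meaning here even though it never appears on the left of an arrow: it occurs in the postulate for "tall".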

Page 20: Lecture 5: Lexical Relations & WordNet

Pragmatics

• Deals with the relation between signs or linguistic expressions and their users

• Deixis (literally “pointing out”)
  – E.g., “I’ll be back in an hour” depends upon the time of the utterance

• Conversational implicature
  – A: “Can you tell me the time?”
  – B: “Well, the milkman has come.” [I don’t know exactly, but perhaps you can deduce it from some extra information I give you.]

• Presupposition
  – “Are you still such a bad driver?”

• Speech acts
  – Constatives vs. performatives
  – E.g., “I second the motion.”

• Conversational structure
  – E.g., turn-taking rules

Page 21: Lecture 5: Lexical Relations & WordNet

Language

• Language only hints at meaning

• Most meaning of text lies within our minds and common understanding
  – “How much is that doggy in the window?”
    • “How much”: social system of barter and trade (not the size of the dog)
    • “doggy” implies childlike, plaintive, probably cannot do the purchasing on their own
    • “in the window” implies behind a store window, not really inside a window, requires notion of window shopping

Page 22: Lecture 5: Lexical Relations & WordNet

Semantics: The Meaning of Symbols

• Semantics versus Syntax
  – add(3,4)
  – 3 + 4
  – (different syntax, same meaning)

• Meaning versus Representation
  – What a person’s name is versus who they are
    • A rose by any other name...
  – What the computer program “looks like” versus what it actually does
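
The add(3,4) versus 3 + 4 contrast can be checked directly in Python. The sketch below parses both forms with the standard `ast` module to show that their syntax trees differ, then evaluates them to show that they denote the same value. Binding the name `add` to `operator.add` is an assumption made here so that the prefix form can be evaluated.

```python
import ast
import operator

# The two surface forms from the slide.
PREFIX = "add(3, 4)"
INFIX = "3 + 4"


def syntax_kind(source: str) -> str:
    """Name the top-level syntax-tree node Python builds for `source`."""
    return type(ast.parse(source, mode="eval").body).__name__


def value(source: str) -> int:
    """Evaluate `source` with `add` as the only name available."""
    return eval(source, {"__builtins__": {}, "add": operator.add})


print(syntax_kind(PREFIX), syntax_kind(INFIX))  # Call BinOp
print(value(PREFIX) == value(INFIX))            # True
```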

Page 23: Lecture 5: Lexical Relations & WordNet

Semantics

• Semantics: assigning meanings to symbols and expressions
  – Usually involves defining:
    • Objects
    • Properties of objects
    • Relations between objects
  – More detailed versions include
    • Events
    • Time
    • Places
    • Measurements (quantities)

Page 24: Lecture 5: Lexical Relations & WordNet

The Role of Context

• The concept associated with the symbol “21” means different things in different contexts
  – Examples?

• The question “Is there any salt?”
  – Asked of a waiter at a restaurant
  – Asked of an environmental scientist at work

Page 25: Lecture 5: Lexical Relations & WordNet

What’s in a Sentence?

“A sentence is not a verbal snapshot or movie of an event. In framing an utterance, you have to abstract away from everything you know, or can picture, about a situation, and present a schematic version which conveys the essentials. In terms of grammatical marking, there is not enough time in the speech situation for any language to allow for the marking of everything which could possibly be significant to the message.”

Dan Slobin, in Language Acquisition: The state of the art, 1982

Page 26: Lecture 5: Lexical Relations & WordNet

Lexical Relations

• Conceptual relations link concepts
  – Goal of Artificial Intelligence

• Lexical relations link words
  – Goal of Linguistics

Page 27: Lecture 5: Lexical Relations & WordNet

Major Lexical Relations

• Synonymy

• Polysemy

• Metonymy

• Hyponymy/Hypernymy

• Meronymy/Holonymy

• Antonymy

Page 28: Lecture 5: Lexical Relations & WordNet

Synonymy

• Different ways of expressing related concepts

• Examples
  – cat, feline, Siamese cat

• Overlaps with basic and subordinate levels

• Synonyms are almost never truly substitutable
  – Used in different contexts
  – Have different implications
    • This is a point of contention

Page 29: Lecture 5: Lexical Relations & WordNet

Polysemy

• Most words have more than one sense
  – Homonym: same sound and/or spelling, different meaning (http://www.wikipedia.org/wiki/Homonym)
    • bank (river)
    • bank (financial)
  – Polysemy: different senses of same word (http://www.wikipedia.org/wiki/Polysemy)
    • That dog has floppy ears.
    • She has a good ear for jazz.
    • bank (financial) has several related senses
      – the building, the institution, the notion of where money is stored
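
One way to keep the two ideas apart is to store homonyms as separate entries that share a form, and polysemy as several senses inside one entry. The sketch below is a hypothetical data layout, not how any particular dictionary or WordNet stores its data; the `Lexeme` class and the gloss for the river sense are assumptions, while the financial senses are the ones listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """One dictionary entry: a word form plus its related senses."""
    form: str
    label: str
    senses: list[str] = field(default_factory=list)


LEXICON = [
    # Assumed gloss; the slide only names this entry "bank (river)".
    Lexeme("bank", "river", ["the land beside a river"]),
    Lexeme("bank", "financial", ["the building", "the institution",
                                 "the notion of where money is stored"]),
]


def homonyms(form: str) -> list[Lexeme]:
    """Unrelated entries that happen to share the same form."""
    return [entry for entry in LEXICON if entry.form == form]


def is_polysemous(entry: Lexeme) -> bool:
    """An entry is polysemous when it carries several related senses."""
    return len(entry.senses) > 1


for entry in homonyms("bank"):
    print(entry.label, is_polysemous(entry))
```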

Page 30: Lecture 5: Lexical Relations & WordNet

Metonymy

• Use one aspect of something to stand for the whole
  – The building stands for the institution of the bank.
  – Newscast: “The White House released new figures today.”
  – Waitperson: “The ham sandwich spilled his drink.”

Page 31: Lecture 5: Lexical Relations & WordNet

Hyponymy/Hypernymy

• ISA relation

• Related to Superordinate and Subordinate level categories
  – hyponym(robin,bird)
  – hyponym(emu,bird)
  – hyponym(bird,animal)
  – hypernym(animal,bird)

• A is a hypernym of B if B is a type of A

• A is a hyponym of B if A is a type of B
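
The two definitions above can be sketched over the slide's example facts. The code treats ISA as transitive and gives each word a single parent; both are simplifying assumptions for this illustration (WordNet itself allows more than one hypernym), and the names `HYPERNYM_OF`, `hypernyms`, `is_hyponym`, and `hyponyms` are invented here.

```python
# Direct hyponym -> hypernym links taken from the slide's examples.
HYPERNYM_OF = {"robin": "bird", "emu": "bird", "bird": "animal"}


def hypernyms(word: str) -> list[str]:
    """Walk up the ISA chain, nearest hypernym first."""
    chain = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        chain.append(word)
    return chain


def is_hyponym(a: str, b: str) -> bool:
    """A is a hyponym of B if A is a type of B, directly or indirectly."""
    return b in hypernyms(a)


def hyponyms(word: str) -> list[str]:
    """Inverse lookup: every word that is a type of `word`."""
    return sorted(w for w in HYPERNYM_OF if is_hyponym(w, word))


print(hypernyms("robin"))           # ['bird', 'animal']
print(is_hyponym("emu", "animal"))  # True
print(hyponyms("animal"))           # ['bird', 'emu', 'robin']
```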

Page 32: Lecture 5: Lexical Relations & WordNet

Basic-Level Categories (Review)

• Brown 1958, 1965, Berlin et al., 1972, 1973

• Folk biology:
  – Unique beginner: plant, animal
  – Life form: tree, bush, flower
  – Generic name: pine, oak, maple, elm
  – Specific name: Ponderosa pine, white pine
  – Varietal name: Western Ponderosa pine

• No overlap between levels

• Level 3 is basic
  – Corresponds to genus
  – Folk biological categories correspond accurately to scientific biological categories only at the basic level

Page 33: Lecture 5: Lexical Relations & WordNet

Psychologically Primary Levels

SUPERORDINATE animal furniture

BASIC LEVEL dog chair

SUBORDINATE terrier rocker

• Children take longer to learn superordinate

• Superordinate not associated with mental images or motor actions

Page 34: Lecture 5: Lexical Relations & WordNet

Meronymy/Holonymy

• Part/Whole relation
  – meronym(beak,bird)
  – meronym(bark,tree)
  – holonym(tree,bark)

• Transitive conceptually but not lexically
  – The knob is a part of the door.
  – The door is a part of the house.
  – ? The knob is a part of the house ?

• Holonyms are (approximately) the inverse of meronyms
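
The inverse relationship and the transitivity question can both be sketched over the slide's part/whole pairs. The names below are invented for this illustration, and the `transitive` switch is only a way to contrast the two readings of the knob/house example, not a claim about how WordNet answers it.

```python
# (part, whole) pairs taken from the slide's examples.
PART_OF = {
    ("beak", "bird"),
    ("bark", "tree"),
    ("knob", "door"),
    ("door", "house"),
}


def meronyms(whole: str) -> list[str]:
    """Parts directly recorded for `whole`."""
    return sorted(part for part, w in PART_OF if w == whole)


def holonyms(part: str) -> list[str]:
    """Wholes directly recorded for `part` (the inverse lookup)."""
    return sorted(whole for p, whole in PART_OF if p == part)


def is_part_of(part: str, whole: str, transitive: bool = False) -> bool:
    """Check the relation, optionally following chains of parts."""
    if (part, whole) in PART_OF:
        return True
    if not transitive:
        return False
    return any(is_part_of(w, whole, True) for p, w in PART_OF if p == part)


print(holonyms("bark"))                             # ['tree']
print(is_part_of("knob", "house"))                  # False
print(is_part_of("knob", "house", transitive=True)) # True
```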

Page 35: Lecture 5: Lexical Relations & WordNet

Antonymy

• Lexical opposites
  – antonym(large, small)
  – antonym(big, small)
  – antonym(big, little)
  – but not large, little

• Many antonymous relations can be reliably detected by looking for statistical correlations in large text collections (Justeson & Katz, 1991)
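
To give a feel for what "statistical correlations" could mean here, the sketch below compares how often two words share a sentence with how often they would if they were independent. This is a bare observed-to-expected co-occurrence ratio, not Justeson & Katz's actual procedure, and the six-sentence corpus is invented purely for illustration.

```python
def cooccurrence_ratio(sentences: list[str], a: str, b: str) -> float:
    """Observed / expected count of sentences containing both words."""
    tokenised = [set(s.lower().split()) for s in sentences]
    total = len(tokenised)
    with_a = sum(a in tokens for tokens in tokenised)
    with_b = sum(b in tokens for tokens in tokenised)
    both = sum(a in tokens and b in tokens for tokens in tokenised)
    # Expected joint count if the two words occurred independently.
    expected = with_a * with_b / total if total else 0.0
    return both / expected if expected else 0.0


# Invented toy corpus; a real study would use a large text collection.
CORPUS = [
    "the big dog chased the little cat",
    "a big house on a little street",
    "the large box sat beside the small box",
    "a little bird sang",
    "the large truck was loud",
    "a big storm is coming",
]

print(cooccurrence_ratio(CORPUS, "big", "little") > 1)     # True
print(cooccurrence_ratio(CORPUS, "large", "little") == 0)  # True
```

A ratio above 1 means the pair shares sentences more often than chance would predict, which matches the slide's contrast between big/little and large/little in this toy data.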

Page 36: Lecture 5: Lexical Relations & WordNet

Thesauri and Lexical Relations

• Polysemy: same word, different senses of meaning
  – Slightly different concepts expressed similarly

• Synonyms: different words, related senses of meanings
  – Different ways to express similar concepts

• Thesauri help draw all these together

• Thesauri also commonly define a set of relations between terms that is similar to lexical relations
  – BT, NT, RT

• More on Thesauri next week…

Page 37: Lecture 5: Lexical Relations & WordNet

What is an Ontology?

• From Merriam-Webster’s Collegiate
  – A branch of metaphysics concerned with the nature and relations of being
  – A particular theory about the nature of being or the kinds of existence

• More prosaically
  – A carving up of the world’s meanings
  – Determine what things exist, but not how they interrelate

• Related terms
  – Taxonomy, dictionary, category structure

• Commonly used now in CS literature to describe structures that function as Thesauri

Page 38: Lecture 5: Lexical Relations & WordNet

Lecture Overview

• Review

• Lexical Relations

• WordNet

• Demo

• Discussion Questions

• Action Items for Next Time

Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

Page 39: Lecture 5: Lexical Relations & WordNet

WordNet

• Started in 1985 by George Miller, students, and colleagues at the Cognitive Science Laboratory, Princeton University
  – Miller also known as the author of the paper “The Magical Number Seven, Plus or Minus Two: Some Limits on our Capacity for Processing Information” (1956)

• Can be downloaded for free:
  – www.cogsci.princeton.edu/~wn/

Page 40: Lecture 5: Lexical Relations & WordNet

Miller on WordNet

• “In terms of coverage, WordNet’s goals differ little from those of a good standard college-level dictionary, and the semantics of WordNet is based on the notion of word sense that lexicographers have traditionally used in writing dictionaries. It is in the organization of that information that WordNet aspires to innovation.”
  – (Miller, 1998, Chapter 1)

Page 41: Lecture 5: Lexical Relations & WordNet

Presuppositions of WordNet Project

• Separability hypothesis
  – The lexical component of language can be separated and studied in its own right

• Patterning hypothesis
  – People have knowledge of the systematic patterns and relations between word meanings

• Comprehensiveness hypothesis
  – Computational linguistics programs need a store of lexical knowledge that is as extensive as that which people have

Page 42: Lecture 5: Lexical Relations & WordNet

WordNet: Size

POS         Unique Strings   Synsets
Noun        107930           74488
Verb        10806            12754
Adjective   21365            18523
Adverb      4583             3612
Totals      144684           109377

WordNet uses “synsets” – sets of synonymous terms
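
A synset can be pictured as a plain set of word forms, with an index from each form to the synsets it belongs to. The sketch below is a hand-made toy, not data loaded from WordNet: the synset identifiers and member lists are illustrative assumptions and are not copied from any WordNet release.

```python
from collections import defaultdict

# Hypothetical synsets: identifier -> set of synonymous word forms.
SYNSETS = {
    "vehicle_sense": {"car", "auto", "automobile", "motorcar"},
    "river_sense": {"bank", "riverbank"},
    "money_sense": {"bank", "banking company"},
}

# Index each word form by the synsets that contain it.
INDEX = defaultdict(set)
for synset_id, members in SYNSETS.items():
    for form in members:
        INDEX[form].add(synset_id)


def synsets_for(form: str) -> list[str]:
    """All synsets a form belongs to; more than one means several senses."""
    return sorted(INDEX.get(form, set()))


def synonyms(form: str) -> set[str]:
    """Every other form that shares at least one synset with `form`."""
    shared = set()
    for synset_id in INDEX.get(form, set()):
        shared |= SYNSETS[synset_id]
    return shared - {form}


print(synsets_for("bank"))      # ['money_sense', 'river_sense']
print(sorted(synonyms("car")))  # ['auto', 'automobile', 'motorcar']
```

Under this layout one string can sit in several synsets, which is why the table counts unique strings and synsets separately.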

Page 43: Lecture 5: Lexical Relations & WordNet

Structure of WordNet

Page 44: Lecture 5: Lexical Relations & WordNet

Structure of WordNet

Page 45: Lecture 5: Lexical Relations & WordNet

Structure of WordNet

Page 46: Lecture 5: Lexical Relations & WordNet

Unique Beginners

• Entity, something
  – (anything having existence (living or nonliving))

• Psychological_feature
  – (a feature of the mental life of a living organism)

• Abstraction
  – (a general concept formed by extracting common features from specific examples)

• State
  – (the way something is with respect to its main attributes; "the current state of knowledge"; "his state of health"; "in a weak financial state")

• Event
  – (something that happens at a given place and time)

Page 47: Lecture 5: Lexical Relations & WordNet

Unique Beginners

• Act, human_action, human_activity
  – (something that people do or cause to happen)

• Group, grouping
  – (any number of entities (members) considered as a unit)

• Possession
  – (anything owned or possessed)

• Phenomenon
  – (any state or process known through the senses rather than by intuition or reasoning)

Page 48: Lecture 5: Lexical Relations & WordNet

Lecture Overview

• Review

• Lexical Relations

• WordNet

• Demo

• Discussion Questions

• Action Items for Next Time

Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

Page 49: Lecture 5: Lexical Relations & WordNet

WordNet Demo

• Available online (from Unix) if you wish to try it…
  – Log in to irony and type “wn word” for any word you are interested in
  – Demo…

Page 50: Lecture 5: Lexical Relations & WordNet

Lecture Overview

• Review

• Lexical Relations

• WordNet

• Demo

• Discussion Questions

• Action Items for Next Time

Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

Page 51: Lecture 5: Lexical Relations & WordNet

Discussion Questions

• Joe Hall on Lexical Relations and WordNet
  – Which method of linguistic analysis do you think will be more fruitful... the painstaking process involved with building WordNet or the relatively easy output afforded by Church et al.'s computational method that, however, requires much work to decipher the results?

Page 52: Lecture 5: Lexical Relations & WordNet

Discussion Questions

• Joe Hall on Lexical Relations and WordNet
  – What are the problems/advantages of using the World Wide Web itself as a "corpus"? (If you were to incorporate the current digital copies of all newspapers, journals, etc., wouldn't you very quickly exceed the 15 million words of the largest corpus in the Church article?)

Page 53: Lecture 5: Lexical Relations & WordNet

Discussion Questions

• Joe Hall on Lexical Relations and WordNet
  – With the diversity of dialects of the English language, how much does this type of computational analysis get confused by phrases such as "What up?" (i.e., slang)? Aren't these some of the more interesting parts of language (i.e., how language evolves)?

Page 54: Lecture 5: Lexical Relations & WordNet

Lecture Overview

• Review

• Lexical Relations

• WordNet

• Demo

• Discussion Questions

• Action Items for Next Time

Credit for some of the slides in this lecture goes to Marti Hearst and Warren Sack

Page 55: Lecture 5: Lexical Relations & WordNet

Homework

• Read Chapters 3 and 5 of The Organization of Information (Textbook)

• Discussion Question volunteers?
  – Tu Tran
  – Hong Qu

Page 56: Lecture 5: Lexical Relations & WordNet

Next Time

• Introduction to Metadata