INTRODUCTION TO ARTIFICIAL INTELLIGENCE Massimo Poesio LAB 9: WORDNET.


INTRODUCTION TO ARTIFICIAL INTELLIGENCE

Massimo Poesio

LAB 9: WORDNET

COMMONSENSE KNOWLEDGE SOURCES FOR AI / NLP

• There are now several sources of commonsense knowledge that we can use to study its role in reasoning and to develop systems able to use it

• The best known is WordNet, a lexical database based on semantic networks, developed by George Miller and his collaborators at Princeton

2004/05 ANLE 3

A LEXICAL RESOURCE BUILT ON SEMANTIC NETWORK PRINCIPLES

• WordNet is a LEXICAL DATABASE created at Princeton
– Freely available for research from the Princeton site

• It contains information about a variety of SEMANTIC RELATIONS

• Three sub-databases (supported by psychological research as early as (Fillenbaum and Jones, 1965))
– NOUNS
– VERBS
– ADJECTIVES and ADVERBS

• Each database organized around SYNSETS


The noun database

• About 90,000 forms, 116,000 senses

• Relations:

hypernym      breakfast -> meal
hyponym       meal -> lunch
has-member    faculty -> professor
member-of     copilot -> crew
has-part      table -> leg
part-of       course -> meal
antonym       leader -> follower
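
The relations above come in inverse pairs (hypernym / hyponym, has-member / member-of, has-part / part-of), and antonymy is symmetric. As a rough illustration only, the slide's examples can be stored as triples and each link read in both directions. This is a toy model, not WordNet's actual storage format; the names `RELATIONS`, `INVERSE`, `with_inverses` and `related` are made up for the sketch.

```python
# Toy model: the slide's examples as (relation, source, target) triples.
# All names here are illustrative; this is not WordNet's real data format.
RELATIONS = [
    ("hypernym", "breakfast", "meal"),
    ("hyponym", "meal", "lunch"),
    ("has-member", "faculty", "professor"),
    ("member-of", "copilot", "crew"),
    ("has-part", "table", "leg"),
    ("part-of", "course", "meal"),
    ("antonym", "leader", "follower"),
]

# Each relation paired with the relation that holds in the opposite direction.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "has-member": "member-of",
    "member-of": "has-member",
    "has-part": "part-of",
    "part-of": "has-part",
    "antonym": "antonym",
}


def with_inverses(triples):
    """Return the triples plus the inverse of every triple."""
    closed = set(triples)
    for relation, source, target in triples:
        closed.add((INVERSE[relation], target, source))
    return closed


def related(word, relation, triples=RELATIONS):
    """Sorted list of words that `word` points to via `relation`."""
    return sorted(
        target
        for rel, source, target in with_inverses(triples)
        if rel == relation and source == word
    )


if __name__ == "__main__":
    print(related("meal", "hyponym"))     # kinds of meal
    print(related("meal", "has-part"))    # parts of a meal
    print(related("follower", "antonym"))
```

Storing each link once and deriving the inverse keeps the two directions from drifting apart, which is the main reason for the design in this sketch.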

USING WN ONLINE

• http://wordnetweb.princeton.edu/perl/webwn

• Example: CELL PHONE

EXERCISE 1

• Find the entry for ROBOT

LEXICAL RELATIONS IN WORDNET

• WordNet contains information about lexical relations between MEANINGS (more on meanings below)

• The type of lexical relations depends on the type of lexical entry

LEXICAL RELATIONS FOR NOUNS

• ISA (hypernymy)

• PART-OF (meronymy)

TAXONOMIC INFORMATION IN WORDNET

• WordNet is a very rich source of taxonomic information

• This information can be found by following HYPERNYMY links


Hypernyms

2 senses of robin

Sense 1
robin, redbreast, robin redbreast, Old World robin, Erithacus rubecola -- (small Old World songbird with a reddish breast)
  => thrush -- (songbirds characteristically having brownish upper plumage with a spotted breast)
    => oscine, oscine bird -- (passerine bird having specialized vocal apparatus)
      => passerine, passeriform bird -- (perching birds mostly small and living near the ground with feet having 4 toes arranged to allow for gripping the perch; most are songbirds; hatchlings are helpless)
        => bird -- (warm-blooded egg laying vertebrates characterized by feathers and forelimbs modified as wings)
          => vertebrate, craniate -- (animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium)
            => chordate -- (any animal of the phylum Chordata having a notochord or spinal column)
              => animal, animate being, beast, brute, creature, fauna -- (a living organism characterized by voluntary movement)
                => organism, being -- (a living thing that has (or can develop) the ability to act or function independently)
                  => living thing, animate thing -- (a living (or once living) entity)
                    => object, physical object --
                      => entity, physical thing --
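
Following hypernym links upward, as in the robin listing above, amounts to walking a chain of parent pointers. The sketch below hard-codes the first word of each synset from that listing; it assumes a single hypernym per entry, which is a simplification (real WordNet synsets can have more than one). The names `HYPERNYM_OF`, `hypernym_chain` and `is_a` are illustrative.

```python
# First word of each synset in the robin listing, mapped to its hypernym.
# Simplifying assumption: exactly one hypernym per entry.
HYPERNYM_OF = {
    "robin": "thrush",
    "thrush": "oscine",
    "oscine": "passerine",
    "passerine": "bird",
    "bird": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
    "animal": "organism",
    "organism": "living thing",
    "living thing": "object",
    "object": "entity",
}


def hypernym_chain(word, hypernym_of=HYPERNYM_OF):
    """Return [word, its hypernym, that hypernym's hypernym, ...] up to the top."""
    chain = [word]
    seen = {word}
    while chain[-1] in hypernym_of:
        parent = hypernym_of[chain[-1]]
        if parent in seen:
            raise ValueError("cycle in hypernym links at " + parent)
        chain.append(parent)
        seen.add(parent)
    return chain


def is_a(word, ancestor, hypernym_of=HYPERNYM_OF):
    """True if `ancestor` is reachable from `word` by hypernym links."""
    return ancestor in hypernym_chain(word, hypernym_of)[1:]


if __name__ == "__main__":
    print(" => ".join(hypernym_chain("robin")))
    print(is_a("robin", "bird"), is_a("bird", "robin"))
```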

EXERCISE 2

• Find the hypernyms of ROBOT

UPPER ONTOLOGY IN WORDNET

• The noun hierarchy is divided into distinct hierarchies, each with its own top element


• {act, action, activity}
• {animal, fauna}
• {artifact}
• {attribute, property}
• {body, corpus}
• {cognition, knowledge}
• {communication}
• {event, happening}
• {feeling, emotion}
• {food}
• {group, collection}
• {location, place}
• {motive}
• {natural object}
• {natural phenomenon}
• {person, human being}
• {plant, flora}
• {possession}
• {process}
• {quantity, amount}
• {relation}
• {shape}
• {state, condition}
• {substance}
• {time}

MERONYMY IN WORDNET

• WordNet contains information about PARTS

• Stored as information about MERONYMS

• Example: TREE
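
Part-whole information of this kind can be pictured as a has-part table that is queried in either direction. The sketch below is a toy model: the part lists are illustrative and are not copied from WordNet, and treating part-of as transitive (a part of a part counts as a part) is a simplifying assumption of the example. The names `HAS_PART`, `direct_parts`, `all_parts` and `wholes_of` are hypothetical.

```python
# Illustrative has-part table (not copied from WordNet).
HAS_PART = {
    "tree": ["trunk", "crown", "limb"],
    "trunk": ["bark"],
    "limb": ["twig"],
}


def direct_parts(whole, has_part=HAS_PART):
    """Parts listed directly for `whole`."""
    return sorted(has_part.get(whole, []))


def all_parts(whole, has_part=HAS_PART):
    """Direct parts plus parts of parts (assumes part-of is transitive)."""
    found = set()
    stack = [whole]
    while stack:
        current = stack.pop()
        for part in has_part.get(current, []):
            if part not in found:
                found.add(part)
                stack.append(part)
    return sorted(found)


def wholes_of(part, has_part=HAS_PART):
    """Inverse lookup: which wholes list `part` directly."""
    return sorted(whole for whole, parts in has_part.items() if part in parts)


if __name__ == "__main__":
    print(direct_parts("tree"))
    print(all_parts("tree"))
    print(wholes_of("bark"))
```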

EXERCISE 3

• Find the parts of house

• Find the parts of building

• Find the parts of car

EXERCISE 4

• Find the entry for BANK


THE ORGANIZATION OF THE LEXICON

[Diagram with three columns]

WORD-FORMS: “eat”, “eats”, “ate”, “eaten”

LEXEMES: EAT-LEX-1

SENSES: eat0600, eat0700
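
One way to read the three layers in this diagram is as two lookups: a word-form maps to a lexeme, and a lexeme maps to its senses. The sketch below assumes that reading (the links between the columns are an interpretation of the diagram, not something the slide states in text); the identifiers are the ones on the slide, and the function name `senses_for_form` is hypothetical.

```python
# Layer 1: inflected word-forms -> lexeme (assumed links, read off the diagram).
LEXEME_OF = {
    "eat": "EAT-LEX-1",
    "eats": "EAT-LEX-1",
    "ate": "EAT-LEX-1",
    "eaten": "EAT-LEX-1",
}

# Layer 2: lexeme -> sense identifiers shown on the slide.
SENSES_OF = {
    "EAT-LEX-1": ["eat0600", "eat0700"],
}


def senses_for_form(word_form):
    """Resolve a word-form to its lexeme, then to that lexeme's senses."""
    lexeme = LEXEME_OF.get(word_form)
    if lexeme is None:
        return []
    return list(SENSES_OF.get(lexeme, []))


if __name__ == "__main__":
    for form in ("ate", "eaten", "drank"):
        print(form, senses_for_form(form))
```

All four forms reach the same senses because the inflection is resolved at the lexeme layer before any sense is looked up.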


The organization of the lexicon

[Diagram with three columns]

WORD-STRINGS: “stock”

LEXEMES: STOCK-LEX-1, STOCK-LEX-2, STOCK-LEX-3

SENSES: stock0100, stock0200, stock0600, stock0700, stock0900, stock1000


Synonymy

[Diagram with three columns]

WORD-STRINGS: “cheap”, “inexpensive”

LEXEMES: CHEAP-LEX-1, CHEAP-LEX-2, INEXP-LEX-3

SENSES: cheap0100, …., ……, cheapXXXX, inexp0900, inexpYYYY


Synsets

• Senses (or ‘lexicalized concepts’) are represented in WordNet by the set of words that can be used in AT LEAST ONE CONTEXT to express that sense / lexicalized concept: the SYNSET

• E.g.,

{chump, fish, fool, gull, mark, patsy, fall guy, sucker, shlemiel, soft touch, mug}

(gloss: person who is gullible and easy to take advantage of)
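
As a data structure, a synset is just a set of words plus a gloss, and a word has as many senses as there are synsets containing it. The sketch below uses the synset from the example above; the second entry is a made-up placeholder added only to show a word with two senses, and the names `SYNSETS`, `senses_of` and `synonyms` are illustrative.

```python
# Each synset: the words that can express the sense, plus a gloss.
SYNSETS = [
    {
        "words": ["chump", "fish", "fool", "gull", "mark", "patsy",
                  "fall guy", "sucker", "shlemiel", "soft touch", "mug"],
        "gloss": "person who is gullible and easy to take advantage of",
    },
    {
        # Placeholder entry (not from the slide) so that "fish" has two senses.
        "words": ["fish"],
        "gloss": "placeholder gloss for a second, unrelated sense of fish",
    },
]


def senses_of(word, synsets=SYNSETS):
    """All synsets that contain `word`; one synset per sense."""
    return [synset for synset in synsets if word in synset["words"]]


def synonyms(word, synsets=SYNSETS):
    """Words sharing at least one synset with `word`, excluding `word` itself."""
    result = set()
    for synset in senses_of(word, synsets):
        result.update(synset["words"])
    result.discard(word)
    return sorted(result)


if __name__ == "__main__":
    print(len(senses_of("fish")), "senses of fish")
    print(synonyms("patsy"))
```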

EXERCISE 5

• Find the senses of hand, palm, slick, and stock.

EXERCISE 6

• Find the hypernyms of LAW


The verb database

• About 10,000 forms, 20,000 senses

• Relations between verb meanings:

hypernym      fly -> travel
troponym      walk -> stroll
entails       snore -> sleep
antonym       increase -> decrease


Relations between verbal meanings

V1 ENTAILS V2 when “Someone V1” (logically) entails “Someone V2”
– e.g., snore entails sleep

V1 is a TROPONYM of V2 when “to do V1” is “to do V2” in some manner
– e.g., limp is a troponym of walk
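
The two definitions above can be combined in a small reachability check: if someone snores they sleep, and if someone limps they walk, but not the other way round. The sketch below uses only the slide's verb examples; treating a troponym link as licensing the same kind of inference as an entailment link, and following links transitively, are assumptions of this example. The names `ENTAILS`, `TROPONYM_OF` and `implies` are illustrative.

```python
# Verb links taken from the slide's examples.
ENTAILS = {"snore": {"sleep"}}
TROPONYM_OF = {"limp": {"walk"}, "stroll": {"walk"}}


def implies(v1, v2):
    """True if doing v1 implies doing v2 via entailment or troponym links.

    Assumption of this sketch: both link types are followed, transitively.
    """
    frontier = [v1]
    seen = {v1}
    while frontier:
        verb = frontier.pop()
        successors = ENTAILS.get(verb, set()) | TROPONYM_OF.get(verb, set())
        for nxt in successors:
            if nxt == v2:
                return True
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


if __name__ == "__main__":
    print(implies("snore", "sleep"), implies("sleep", "snore"))
    print(implies("limp", "walk"), implies("walk", "limp"))
```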

EXERCISE 7

• Find the antonyms of accelerate


The adjective and adverb database

• About 20,000 adjective forms, 30,000 senses

• 4,000 adverbs, 5,600 senses

• Relations:

antonym (adjective)    heavy <-> light
antonym (adverb)       quickly <-> slowly

EXERCISE 8

• Find the antonyms of dangerous

WORDNET FOR OTHER LANGUAGES

• MultiWordNet (multiwordnet.fbk.eu)
– A multilingual WordNet
– Italian WordNet
– Synsets aligned with English WordNet (1.6) whenever possible
– Compatible versions developed for Hebrew, Portuguese, Romanian and Spanish

OTHER SOURCES OF COMMONSENSE KNOWLEDGE

• OpenCyc:
– http://www.opencyc.org/

• ConceptNet:
– http://conceptnet.media.mit.edu/
– Discussed next week

• DBPedia:
– http://dbpedia.org/

READINGS

• C. Fellbaum (ed.), 1998. WordNet: An Electronic Lexical Database. MIT Press