Using resources WordNet and the BNC. WordNet: History 1985: a group of psychologists and linguists...

Date posted: 21-Dec-2015


Using resources

WordNet and the BNC


WordNet: History

• 1985: a group of psychologists and linguists start to develop a “lexical database”
  – Princeton University
  – theoretical basis: results from psycholinguistics and psycholexicology
  – What are properties of the “mental lexicon”?


Global organisation

• division of the lexicon into five categories:
  – Nouns
  – Verbs
  – Adjectives
  – Adverbs
  – function words (“probably stored separately as part of the syntactic component of language” [Miller et al.])


Global organization

• nouns: organized as topical hierarchies
• verbs: entailment relations
• adjectives: N-dimensional hyperspaces
• adverbs: N-dimensional hyperspaces
• [Miller et al.]: “Each of these lexical structures reflects a different way of categorizing experience; attempts to impose a single organizing principle on all syntactic categories would badly misrepresent the psychological complexity of lexical knowledge.”


Basic principles

• organize lexical information in terms of word meanings, rather than word forms
  – “In this respect, WordNet resembles a thesaurus more than a dictionary, ...” [Miller et al.]
• “ ... a word is a conventional association between a lexicalized concept and an utterance that plays a syntactic role.”
  – word form: refers to physical utterance or inscription
  – word meaning: refers to the lexicalized concept that a form can be used to express


Lexical semantics

• How are word meanings represented in WordNet?
  – synsets (synonym sets) as basic units
  – a word meaning is represented by simply listing the word forms that can be used to express it
• example: senses of board
  – a piece of lumber vs. a group of people assembled for some purpose
  – synsets as unambiguous designators:
  – {board, plank} vs. {board, committee}
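
The synset idea can be sketched as a small data structure. This is a toy illustration, not WordNet's actual storage format: the two synsets are the ones from the board example, and the names `SYNSETS`, `build_index`, `senses` and `is_polysemous` are made up for this sketch.

```python
from collections import defaultdict

# Toy synsets from the "board" example; real WordNet holds many more.
SYNSETS = [
    frozenset({"board", "plank"}),
    frozenset({"board", "committee"}),
]


def build_index(synsets):
    """Map each word form to the list of synsets (meanings) it can express."""
    index = defaultdict(list)
    for synset in synsets:
        for form in synset:
            index[form].append(synset)
    return index


INDEX = build_index(SYNSETS)


def senses(form):
    """Return the synsets a word form belongs to, each as a sorted list."""
    return [sorted(s) for s in INDEX.get(form, [])]


def is_polysemous(form):
    """A form is polysemous here if it appears in more than one synset."""
    return len(INDEX.get(form, [])) > 1


if __name__ == "__main__":
    for sense in senses("board"):
        print(sense)
```

The word form "board" alone is ambiguous, but each synset it belongs to picks out one meaning, which is what makes synsets usable as designators.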


Synsets

• synsets often sufficient for differential purposes
  – if an appropriate synonym is not available a short gloss may be used
  – e.g. {board, (a person’s meals, provided regularly for money)}


Lexical Relations in WordNet

• “WordNet is organized by semantic relations.”
  – It is characteristic of semantic relations that they are reciprocated;
  – if there is a semantic relation R between meaning {x, x’, ...} and meaning {y, y’, ...}, then there is a relation R’ between {y, y’, ...} and {x, x’, ...}.
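
One way to picture reciprocated relations is a store that records R’ whenever R is added. The sketch below is illustrative only: `RelationStore` and `INVERSE` are hypothetical names, and the relation pairs are the hyponymy/hypernymy and meronymy/holonymy pairs that WordNet defines.

```python
# Each relation name is paired with its inverse (R and R').
INVERSE = {
    "hyponym": "hypernym",
    "hypernym": "hyponym",
    "meronym": "holonym",
    "holonym": "meronym",
}


class RelationStore:
    """Stores triples meaning 'source is a <relation> of target'."""

    def __init__(self):
        self.edges = set()  # (source synset, relation, target synset)

    def add(self, source, relation, target):
        # Record R(source, target) and the reciprocal R'(target, source).
        self.edges.add((source, relation, target))
        self.edges.add((target, INVERSE[relation], source))

    def related(self, source, relation):
        """All targets t such that 'source is a <relation> of t'."""
        return {t for (s, r, t) in self.edges if s == source and r == relation}


if __name__ == "__main__":
    store = RelationStore()
    maple, tree = frozenset({"maple"}), frozenset({"tree"})
    store.add(maple, "hyponym", tree)
    print(store.related(tree, "hypernym"))
```

Adding only "maple is a hyponym of tree" is enough for the store to answer that tree is a hypernym of maple.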


Lexical relations: synonymy

• similarity of meaning
  – Leibniz: two expressions are synonymous if the substitution of one for the other never changes the truth value of a sentence in which the substitution is made
• such global synonymy is rare (it would be redundant)
  – synonymy relative to a context: two expressions are synonymous in a linguistic context C if the substitution of one for the other in C does not alter the truth value
  – consequence of defining synonymy in terms of substitutability: words in different syntactic categories cannot be synonyms


Lexical relations: antonymy

• antonym of a word x is sometimes not-x, but not always
  – rich and poor are antonyms
  – but: not rich does not imply poor
  – (because many people consider themselves neither rich nor poor)
• antonymy is a lexical relation between word forms, not a semantic relation between word meanings
  – meanings {rise, ascend} and {fall, descend} are conceptual opposites, but they are not antonyms; [rise/fall] and [ascend/descend] are pairs of antonyms
  – $\{w_1, w_2\} \subseteq S_1 \;\wedge\; \{w_3, w_4\} \subseteq S_2 \;\wedge\; \mathrm{ant}(w_1, w_3) \;\not\Rightarrow\; \mathrm{ant}(w_2, w_4)$
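
The distinction between antonymy on word forms and opposition between meanings can be sketched as follows. The data comes from the rise/fall example; the function names, and the rule used in `are_conceptual_opposites`, are assumptions made for this illustration.

```python
RISE = frozenset({"rise", "ascend"})
FALL = frozenset({"fall", "descend"})

# Antonymy is stored on pairs of word forms, not on synsets.
ANTONYM_PAIRS = {
    frozenset({"rise", "fall"}),
    frozenset({"ascend", "descend"}),
}


def are_antonyms(w1, w2):
    """True only if this exact pair of word forms is listed as antonyms."""
    return frozenset({w1, w2}) in ANTONYM_PAIRS


def are_conceptual_opposites(s1, s2):
    """Assumed rule: two synsets count as opposites when at least one
    form in the first is an antonym of some form in the second."""
    return any(are_antonyms(a, b) for a in s1 for b in s2)


if __name__ == "__main__":
    print(are_antonyms("rise", "fall"))
    print(are_antonyms("rise", "descend"))
    print(are_conceptual_opposites(RISE, FALL))
```

Here rise/descend is not an antonym pair even though both words belong to synsets that are conceptual opposites, which is the point of keeping antonymy at the level of word forms.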


Lexical relations: hyponymy

• hyponymy is a semantic relation between word meanings
  – {maple} is a hyponym of {tree}
• inverse: hypernymy
  – {tree} is a hypernym of {maple}
• also called: subordination/superordination; subset/superset; ISA relation
• test for hyponymy:
  – native speaker must accept sentences built from the frame “An x is a (kind of) y”


Lexical relations: meronymy

• A concept represented by the synset {x, x’, ...} is a meronym of a concept represented by the synset {y, y’, ...} if native speakers of English accept sentences constructed from such frames as “A y has an x (as a part)”, “An x is a part of y”.
• inverse relation: holonymy
• HAS-AS-PART
  – part hierarchy
  – part-of is asymmetric and (with caution) transitive


Lexical relations: meronymy

• failures of transitivity caused by different part-whole relations, e.g.
  – A musician has an arm.
  – An orchestra has a musician.
  – but: ? An orchestra has an arm.
• Types of meronymy in WordNet:
  – component [most frequently found]
  – member
  – composition
  – phase process
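
The musician/orchestra example suggests one cautious policy: follow part-of links transitively only while the type of meronymy stays the same. The sketch below assumes that policy; it is not how WordNet itself resolves the issue. The finger/hand/arm edges are added purely for illustration, and `PART_OF` and `parts_of` are hypothetical names.

```python
# Typed part-of edges: (part, whole, type of meronymy).
PART_OF = [
    ("arm", "musician", "component"),
    ("musician", "orchestra", "member"),
    ("hand", "arm", "component"),      # illustrative addition
    ("finger", "hand", "component"),   # illustrative addition
]


def parts_of(whole, kind, edges=PART_OF):
    """All parts reachable from `whole` by following only edges of one type."""
    found = set()
    frontier = [whole]
    while frontier:
        current = frontier.pop()
        for part, parent, edge_kind in edges:
            if parent == current and edge_kind == kind and part not in found:
                found.add(part)
                frontier.append(part)
    return found


if __name__ == "__main__":
    print(parts_of("musician", "component"))
    print(parts_of("orchestra", "member"))
    print(parts_of("orchestra", "component"))
```

Under this policy an orchestra has musicians as members, and a musician has an arm as a component, but no arm is derived as a part of the orchestra because the chain mixes two types.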


WordNet’s noun hierarchy

• noun hierarchy partitioned into separate hierarchies with unique top hypernyms

• vague abstractions would be semantically empty, e.g. {entity} with immediate hyponyms {object, thing} and {idea}


• {act, action, activity}
• {animal, fauna}
• {artifact}
• {attribute, property}
• {body, corpus}
• {cognition, knowledge}
• {communication}
• {event, happening}
• {feeling, emotion}
• {food}
• {group, collection}
• {location, place}
• {motive}
• {natural object}
• {natural phenomenon}
• {person, human being}
• {plant, flora}
• {possession}
• {process}
• {quantity, amount}
• {relation}
• {shape}
• {state, condition}
• {substance}
• {time}


Nouns in WordNet

• noun hierarchy as lexical inheritance system
  – “... seldom goes more than ten levels deep, and the deepest examples usually contain technical levels that are not part of everyday vocabulary.”
  – Shetland pony → pony → horse → equid → odd-toed ungulate → herbivore → mammal → vertebrate → animal


Nouns in WordNet

• man-made artifacts: sometimes six or seven levels deep
  – roadster → car → motor vehicle → wheeled vehicle → vehicle → conveyance → artifact
• hierarchy of persons: about three or four levels
  – televangelist → evangelist → preacher → clergyman → spiritual leader → person
• As in all thesaurus structures, words can have multiple hypernyms
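
The hypernym chains above can be walked with a few lines of code. This is a minimal sketch under simplifying assumptions: each synset is abbreviated to a single word form, the graph is assumed to be acyclic, and `HYPERNYMS`, `hypernym_paths` and `is_a` are names invented here. Storing a list of hypernyms per entry allows for multiple hypernyms.

```python
# Hypernym links copied from two of the chains on the slides.
HYPERNYMS = {
    "Shetland pony": ["pony"],
    "pony": ["horse"],
    "horse": ["equid"],
    "equid": ["odd-toed ungulate"],
    "odd-toed ungulate": ["herbivore"],
    "herbivore": ["mammal"],
    "mammal": ["vertebrate"],
    "vertebrate": ["animal"],
    "roadster": ["car"],
    "car": ["motor vehicle"],
    "motor vehicle": ["wheeled vehicle"],
    "wheeled vehicle": ["vehicle"],
    "vehicle": ["conveyance"],
    "conveyance": ["artifact"],
}


def hypernym_paths(word, graph=HYPERNYMS):
    """Return every path from `word` up to a top node (assumes no cycles)."""
    parents = graph.get(word, [])
    if not parents:
        return [[word]]
    return [[word] + path
            for parent in parents
            for path in hypernym_paths(parent, graph)]


def is_a(x, y, graph=HYPERNYMS):
    """ISA test: y appears somewhere above x in the hierarchy."""
    return any(y in path[1:] for path in hypernym_paths(x, graph))


if __name__ == "__main__":
    for path in hypernym_paths("roadster"):
        print(" → ".join(path))
    print(is_a("pony", "mammal"))
```

A word with two hypernyms simply yields two paths, so depth is a property of a path rather than of the word.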


WordNets for other languages

• Idea has been widely copied
• Sometimes by “translating” Princeton WordNet
  – Lexical relations in general are universal ...
  – But are they in practice?
  – Are synsets universal?
• EuroWordNet: combining multilingual WordNets to include cross-language equivalence
  – Inherent difficulties, as above


BNC

• One of the most widely used corpora (esp. in Britain, but also elsewhere)
• A balanced synchronic text corpus containing 100 million words (POS tagged)
• Compiled in the early 1990s (1991–1994)
• 90% text, 10% transcribed speech
• Encoded according to TEI standards
• Associated tools (mainly for searching), but many users write their own (e.g. in Perl)
• http://www.natcorp.ox.ac.uk/


Using the BNC

• Just looking up words

• More interesting to construct queries that exploit the mark-up (see Allan’s slides)

• Already becoming dated (e.g. “numpty”)

• Results often contradict “authorities” such as dictionaries, especially in revealing primary senses/uses of words.
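
A query that exploits the mark-up might look like the sketch below. The sample fragment is invented, and its element and attribute names (`s`, `w`, `c`, `c5`, `hw`, `pos`) are modelled on the word-level markup of the BNC XML edition; treat them as assumptions and check the corpus documentation before running this on real files. `tag_counts` and `sentences_with` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented two-sentence fragment in BNC-like markup (assumed tag names).
SAMPLE = """<text>
<s n="1"><w c5="AT0" hw="the" pos="ART">The </w><w c5="NN1" hw="board" pos="SUBST">board </w><w c5="VVD" hw="meet" pos="VERB">met</w><c c5="PUN">.</c></s>
<s n="2"><w c5="PNP" hw="he" pos="PRON">He </w><w c5="VVD" hw="saw" pos="VERB">sawed </w><w c5="AT0" hw="the" pos="ART">the </w><w c5="NN2" hw="board" pos="SUBST">boards</w><c c5="PUN">.</c></s>
</text>"""


def tag_counts(xml_text, headword):
    """Count the POS tags (c5) of all tokens whose headword (hw) matches."""
    root = ET.fromstring(xml_text)
    return Counter(w.get("c5") for w in root.iter("w")
                   if w.get("hw") == headword)


def sentences_with(xml_text, headword):
    """Return the numbers (n) of sentences containing the headword."""
    root = ET.fromstring(xml_text)
    return [s.get("n") for s in root.iter("s")
            if any(w.get("hw") == headword for w in s.iter("w"))]


if __name__ == "__main__":
    print(tag_counts(SAMPLE, "board"))
    print(sentences_with(SAMPLE, "meet"))
```

Searching on the headword attribute rather than the surface string finds both "board" and "boards", which a plain word lookup would treat as different items.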


WWW as a corpus

• A standard Google search on individual words does not always give good word collocations: after all, Google is a document retrieval system

• Try: http://labs1.google.com/sets


Lexical research

• Use a corpus resource such as the BNC together with WordNet to get interesting results

• → Allan’s slides