Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical...

45
Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn ÖFAI Vienna

Transcript of Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical...

Page 1: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Computational Approaches to Lexical

Collocations

Stefan Evert

IMS

Stuttgart

Brigitte Krenn

ÖFAI

Vienna

Page 2: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Collocations

• What kind of animal are they?

• What are they good for?

• How can we find them?

• How likely are we to find them?

Page 3: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Schedule

• Monday: Introduction to Collocations

• Tuesday: Extraction of Coocurrence Data

• Wednesday: Association Measures (AMs)

• Thursday: Evaluation of AMs

• Friday: Significance of Result Differences

Page 4: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

CollocationsTerminology & Definitions

• Firth's Notion of Collocation

``Meaning by collocation is an abstraction at the syntagmatic level and is not directly concerned with the conceptual or idea approach to the meaning of words.''

``One of the meanings of night is its collocability with dark, and of dark, of course, its collocation with night.'‘ (Firth 1957)

Page 5: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

CollocationsTerminology & Definitions

• Choueka's Notion of Collocation

"A collocation is defined as a sequence of two or more consecutive words, that has characteristics of a syntactic and semantic unit, and whose exact and unambiguous meaning cannot be derived directly from the meaning or connotation of its components." (Choueka 1988)

Page 6: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

CollocationsTerminology & Definitions

Summing up, we have 2 different views1. Firth: collocations as lexical proximities in text2. Choueka: collocations as syntactic and semantic units, semantic

irregularity

Page 7: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Terminology

• Idiomspreferably used in the English literature, e.g. Bar-Hillel:55, Hockett:58, Katz;Postal:63, Healey:68,Makkai:72.

Page 8: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Terminology

• Phraseological Units, (Ge.: Phraseologismen) a widely used generic term in the German literature, e.g. BurgerEA:82, Fleischer:82.

Page 9: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Terminology

• Light-verb Constructions, Support-verb Constructionsrefer to very particular phenomena, cross-categorisation with idioms

Page 10: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Terminology

Terms stemming from Computational Linguistics, e.g.

• multi-word lexemes, e.g. Tschichold:97, BreidtEA:96.

• multi-word expressions, e.g. Segond;Tapanainen:95

• non-compositional compounds, e.g. Melamed:97

Page 11: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Terminology

• Collocation e.g, red herring, kick the bucket, dark-night, dog-bark

• Base, Collocateelements if a collocation,the base selects its collocates,it is not always clear to define what the base is,e.g. red-herring, kick-bucket

Page 12: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Terminology

• Collocation Phrasegrammatical unit which is constituted by a base and its collocates or by two or more collocates, e.g. a red herring, kick the bucket, unexpectedly kick the bucket

Page 13: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Summing up

• Terminology and definitions are influenced by • different linguistic traditions• computational linguistic

applications

Page 14: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Summing up

• The phenomena covered are manifold:• lexical proximities in texts• syntactic and semantic units• semantic irregularity• syntactic rigidity

Page 15: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Observations: Generativity

• Collocations range from completely fixed to syntactically flexible constructions.

• Syntactic restrictions usually coincide with semantic restrictions and thus are indicators for the degree of lexicalization of a particular word combination.

Page 16: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Observations: Generativity

• Particular word combinations are associated with specific restrictions that cannot be inferred from standard rules of grammar and thus need to be stored together with the collocation.

Page 17: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Observations: Recurrence

• Within corpora, the proportion of collocations is larger among highly recurrent word combination than among infrequent ones.

• Recurrence is an effect of lexicalization.

Page 18: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Observations: Idiomaticity

• Semantic opacity is not sufficient for the definition of collocations as there exists a variety of conventionalized word combinations that range from• fully compositional ones like Hut

aufsetzen (`put on a hat'), Jacke anziehen (`put on a jacket')

to • semantically opaque ones like ins Gras

beissen (`bite into the grass' literal meaning, `die' idiomatic meaning).

Page 19: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Observations: Words, Multi-words or Phrases

• Collocations can be • word level phenomena • phrase level phenomena

(collocation phrase)

• Collocation phrases consist of the lexically determined words (collocates) only or contain additional lexically underspecified material.

Page 20: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Word-level Collocations

• Adjective- and Adverb-Like Collocations• nichts desto trotz (`nonetheless') adverb • fix und fertig (`exhausted') adjective

• Preposition-Like Collocations• im Lauf(e), im Zuge (`during')• an Hand (`with the help of')

Page 21: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Word-level Collocations

• Noun-Like Collocations• Rotes Kreuz (Red Cross)• Wiener Sängerknaben (Vienna choir

boys)• Hinz und Kunz (`every Tom, Dick and

Harry')

• Sequences where the nouns are duplicated• Schulter an Schulter (shoulder to

shoulder), • Kopf an Kopf (neck and neck)

Page 22: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Phrase-level Collocations

• Modal constructions• sich (nicht) lumpen lassen (`to splash

out')

• Verb-object combinations• übers Ohr hauen (`take somebody for a

ride')• unter die Lupe nehmen (`take a close

look at')• zum Vorschein bringen (`bring something

to the light')• des Weges kommen (`to approach')• Lügen strafen (`prove somebody a liar')

Page 23: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Phrase-level Collocations

• Copula constructions• guten Glaubens sein (`be in good faith')• auf Draht sein (`be on the ball')

• Proverbs• Morgenstund hat Gold im Mund

(morning hour has gold in the mouth, ‘the early bid catches the worm’)

• wissen, wo der Barthel den Most holt (know where the Barthel the cider fetches, `know every trick in the book')

Page 24: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Summing up

• Structural dependency • the collocates of a collocation are

syntactic dependents, thus knowledge of syntactic structure is a precondition for accurate collocation identification.

• Syntactic context • may help to discriminate literal and

collocational readings, see for instance im Lauf, im Zug where a genitive to the right is a strong indicator for collocational reading.

Page 25: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Summing up

• Markedness • morphologically or syntactically

marked constructions like seemingly incomplete syntactic structure or archaic e-suffix are suitable indicators for collocations, see im Laufe, im Zuge for e-suffix and zu Recht, an Hand for incomplete syntactic structures.

Page 26: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Summing up

• Single-word versus multi-word units • single-word occurrences of word

combinations indicate word-level collocations, see for instance zu Recht, zurecht.

• Syntactic rigidity• is an important indicator for

collocations see for instance Hinz und Kunz, an und für sich, fix und fertig, Kopf an Kopf.

Page 27: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Identification of Collocations: Operationalizable Criteria

• over proportionally high recurrence of collocational word combinations compared to noncollocational word combinations in corpora

• lexical determination of the collocates of a collocation

• collocations constitute grammatical units

• grammatical restrictions in the collocation phrases

Page 28: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Positional N-Grams

• numerical span• typically AMs work on bi-grams

, i.e., all <wi, wj> pairs within a certain span are considered

• <wi,wi> pairs are not considered!

• e.g.: wi-3 ... wi-1 wi wi+1 ... wi+3

(span size is 6)

Page 29: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Positional N-Grams

• grammatical span• e.g. we assume that collocational

relations do not exceed sentence boundaries

i.e., span size = sentence size

Page 30: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Positional N-Grams:Problems

• huge amounts of cooccurrence data need to be managed this requires special algorithms

(e.g. Yamamoto,Church 2001)

Page 31: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Positional N-Grams:Problems

• definition of span size is crucial If the span size is kept small, it is

unlikely to properly cover nonadjacent collocates of structurally flexible collocations.

Enlarging the span size leads to an increase of candidate collocations including an increase of noisy data which need to be discarded in a further processing step.

Page 32: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Positional N-Grams:Problems

High amount of noise in the collocation data due to

• inappropriate span size

Page 33: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Positional N-Grams:Problems

High amount of noise in the collocation data due to

• over-proportional frequency of function words within texts use stop word lists

Page 34: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Positional N-Grams:Problems

High amount of noise in the collocation data due to

• insensitivity to punctuation use a sentence as the largest unit within

which the collocates of a collocation may occur

Page 35: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Positional N-Grams:Problems

High amount of noise in the collocation data due to

• insensitivity to parts-of-speech knowing parts-of-speech allows a large

number of syntactically invalid n-grams to be excluded beforehand

Page 36: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Positional N-Grams:Problems

High amount of noise in the collocation data due to

• insensitivity to syntactic structure further improvement of the

appropriateness of the collocation candidates selected is achieved by the availability of structural and/or dependency information

Page 37: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Proposal

• (Partial) replacement of • positional n-grams

by • relational n-grams

• What does it imply?

• In which cases do we want/need it?

Page 38: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

From Positional to Relational N-Grams

• positional with part-of-speech, eg. (Evert,Kermes 2003)• <Adj,N> pairs within certain span

Page 39: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

From Positional to Relational N-Grams

• positional with part-of-speech and lexical knowledge, e.g. Breidt 93• sentence final <noun,past

participle> pairs • past participles must be part of a

list of 16 German support verbs e.g. kommen, bringen, stehen, stellen, ...

Page 40: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

From Positional to Relational N-Grams

• partially relational, e.g. Krenn 2000• extraction of PN-combinations from

PPs• extraction of main verbs• combination of PN-pairs and verbs

co-occurring in a sentence

Page 41: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

From Positional to Relational N-Grams

• Result• a theoretical maximum of PNV

combinations, i.e., • verbs are duplicated in sentences

that contain more than one PP,• PPs are duplicated in sentences

where more than one main verb is found.

This has effects on counting!

Page 42: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Relational N-Grams

Assumption

The collocates of a collocation are syntactically related!

Page 43: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Relational N-Grams

Requirements

• part-of-speech tagging

• partial parsing (phrase chunking)

• full parsingEvert,Kermes 2003

Page 44: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Relational N-Grams

Problems

• full parsing of free text is error prone

• there are cases where collocation extraction from fully parsed text is less accurate

Page 45: Stefan Evert, IMS - Uni Stuttgart Brigitte Krenn, ÖFAI Wien IMS Computational Approaches to Lexical Collocations Stefan Evert IMS Stuttgart Brigitte Krenn.

Ste

fan

Eve

rt,

IMS

- U

ni

Stu

ttg

art

Bri

git

te K

ren

n,

ÖF

AI

Wie

n

IMS

Summary

• Use application dependent notion of collocation as opposed to Firth/Choueka

• Extract collocations from relational data (relational n-grams)

• Interested in collocation phrases and their collocates

• Consider grammatically homogenous data (e.g. Adj-N, PP-V)

• Recurrence/cooccurrence frequency is main criterion for collocation extraction (statistical approaches)