Multi-layer Annotation in Dependency Structure

Stages annual meeting at Columbia 2011.

Page 1: Multi-layer Annotation in Dependency Structure

Multi-layer Annotation in Dependency Structure

University of Colorado at Boulder

October 20th, 2011

Stages Annual Meeting at Columbia University

Martha Palmer, James Martin, Jinho D. Choi, Shumin Wu

Friday, October 21, 2011

Page 2: Multi-layer Annotation in Dependency Structure

Contents

• Data

- OntoNotes v4.0.

• Components

- Tool developments.

• Multi-layer annotation in dependency structure

- Dependency trees, semantic roles, word senses, named entities (and possibly coreferences).

• Chinese VerbNet

- Manual inspection.

Page 3: Multi-layer Annotation in Dependency Structure

Data

• OntoNotes v4.0

- Arabic, Chinese, English.

- Treebank, PropBank, word sense, named entity, coreference.

# of sentences in English corpora

Genre   Parse    Prop     Sense    Name     Coref
BC      16,107   16,107   16,107   16,107   14,412
BN      13,287   13,241   13,287   13,287   12,147
MZ       8,333    8,333    8,333    8,333    8,333
NW      42,640   42,542   42,268   41,876   19,240
WB      15,103   11,941   13,340    9,868    8,420
Total   95,470   92,164   93,335   89,471   62,552
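The English counts can be sanity-checked with a few lines of Python. This is a minimal sketch: the numbers are copied from the table, while the names `ENGLISH`, `column_totals`, and `layer_coverage` are illustrative and not part of any OntoNotes tooling.

```python
# Sentence counts per annotation layer, copied from the English table.
LAYERS = ["Parse", "Prop", "Sense", "Name", "Coref"]
ENGLISH = {
    "BC": [16107, 16107, 16107, 16107, 14412],
    "BN": [13287, 13241, 13287, 13287, 12147],
    "MZ": [8333, 8333, 8333, 8333, 8333],
    "NW": [42640, 42542, 42268, 41876, 19240],
    "WB": [15103, 11941, 13340, 9868, 8420],
}
REPORTED_TOTAL = [95470, 92164, 93335, 89471, 62552]


def column_totals(table):
    """Sum each annotation layer over all genres."""
    return [sum(column) for column in zip(*table.values())]


def layer_coverage(table):
    """Share of parsed sentences that also carry each annotation layer."""
    totals = column_totals(table)
    return {layer: totals[i] / totals[0] for i, layer in enumerate(LAYERS)}


if __name__ == "__main__":
    assert column_totals(ENGLISH) == REPORTED_TOTAL
    for layer, share in layer_coverage(ENGLISH).items():
        print(f"{layer}: {share:.1%}")
```

The coverage figures show how much of the treebank is available when all layers are required at once; coreference is the layer that limits it most.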

Page 4: Multi-layer Annotation in Dependency Structure

Data

• OntoNotes v4.0

- Arabic, Chinese, English.

- Treebank, PropBank, word sense, named entity, coreference.

# of sentences in Chinese corpora

Genre   Parse    Prop     Sense    Name     Coref
BC      12,049   10,960   10,960   10,960   10,960
BN      10,083   10,083    9,491    4,764    9,483
MZ       4,801    4,801    4,801    4,749    4,801
NW       4,510    4,510    4,175    4,180    4,180
WB      10,181    5,589        -        -    8,044
Total   41,624   35,943   29,427   24,653   37,468

Page 5: Multi-layer Annotation in Dependency Structure

Data

• Parallel corpora in the OntoNotes v4.0

- c2e: Chinese to English, e2c: English to Chinese

# of sentences in parallel corpora

        c2e                 e2c
Genre   English   Chinese   English   Chinese
BC       1,190     1,089         -         -
BN         642       592         -         -
MZ       8,333     4,801         -         -
NW       4,894     4,510         -         -
WB       3,239     2,854     3,412     3,565
Total   18,298    13,846     3,412     3,565

Page 6: Multi-layer Annotation in Dependency Structure

Components

• Tool development

- Dependency parser

• English.

- Semantic role labelers

• English (dependency, constituency), Chinese (constituency)

- Predicate argument aligner

• Between English and Chinese.

- Constituent-to-dependency converter

• English, Chinese (coming soon).

Page 7: Multi-layer Annotation in Dependency Structure

Components

• Dependency parser (Choi & Palmer, ACL’11)

- Transition-based, non-projective dependency parser.

- Takes 2 milliseconds per sentence.

- Available at: http://code.google.com/p/clearparser/

• Dependency converter (Choi & Palmer, TLT’10)

- Available at: http://code.google.com/p/clearparser/

Converter   Non-projective   Function tags   New format   Maintenance
Penn2Malt   -                -               -            -
Stanford    -                -               -            O
LTH         O                O               -            -
Clear       O                O               O            O
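To make "non-projective" concrete, the sketch below tests whether a dependency tree contains crossing arcs. It is a generic check, not code from ClearParser; the function name and the two toy sentences are assumptions made for illustration.

```python
from itertools import combinations
from typing import List


def is_projective(heads: List[int]) -> bool:
    """heads[i] is the head of token i+1; 0 stands for the artificial root.

    A tree is projective when no two arcs cross. The root arc is included,
    with the artificial root placed at position 0.
    """
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (a, b), (c, d) in combinations(arcs, 2):
        if a < c < b < d or c < a < d < b:
            return False
    return True


# "The dog barked loudly": every arc nests, so the tree is projective.
PROJECTIVE = [2, 3, 0, 3]

# "A hearing is scheduled on the issue today": "on" attaches to "hearing"
# across "scheduled", which crosses the arc from "scheduled" to "today".
NON_PROJECTIVE = [2, 3, 0, 3, 2, 7, 5, 4]

if __name__ == "__main__":
    print(is_projective(PROJECTIVE), is_projective(NON_PROJECTIVE))
```

A converter or parser that only produces projective trees cannot represent the second sentence without re-attaching "on".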

Page 8: Multi-layer Annotation in Dependency Structure

Components

• Semantic role labelers

            Dependency-based                          Constituent-based
Input       Dependency trees: from our parser         Constituent trees: from Berkeley parser
Output      Semantic roles on head words              Semantic roles on phrases
Available   http://code.google.com/p/clearparser/     http://code.google.com/p/clearsrl/
Reference   Choi & Palmer, RELMS’11                   Wu & Palmer, SSST’11

Page 9: Multi-layer Annotation in Dependency Structure

Multi-layer Annotation in Dependency Structure

• Dependency trees

- Automatically converted from the OntoNotes Treebank.

- Including empty categories.

• Semantic roles

- Annotated on head words instead of phrases.

• Word senses

- Annotated on nouns and verbs.

• Named entities

- Annotated on head words (or spans of words?).

Page 10: Multi-layer Annotation in Dependency Structure

Phrase Structure

• The car John wanted to buy was sold last night.

[Figure: phrase-structure tree of the sentence, labelled "PropBank" on the slide. Node labels: S, SBAR, WHNP, NP, VP. The tree includes the empty categories 0, *T*, *PRO*, and *.]

Page 11: Multi-layer Annotation in Dependency Structure

Dependency Trees

• Constituent-to-dependency conversion

- We need a richer representation for showing the hidden dependencies.

[Figure: dependency tree for "The car John wanted to buy was sold last night". Arc labels shown: root, SBJ, OBJ, NMOD, OPRD, IM, VC, TMP.]
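The basic idea behind constituent-to-dependency conversion can be sketched with head rules: pick a head child in every phrase, then attach the lexical heads of the other children to it. The code below is a simplified illustration, not the converter from the slides; the head rules, tree encoding, and function names are hypothetical, and it produces unlabeled arcs with no function tags or empty categories.

```python
from typing import Dict, List, Tuple

# Toy head rules (hypothetical): for each phrase label, the child labels
# to look for, in priority order.
HEAD_RULES: Dict[str, List[str]] = {
    "S": ["VP", "NP"],
    "VP": ["VBD", "VBZ", "VB", "VP"],
    "NP": ["NN", "NNP", "NP"],
}


def is_leaf(node: Tuple) -> bool:
    # A leaf is (POS tag, 1-based token index); a phrase is (label, child, ...).
    return len(node) == 2 and isinstance(node[1], int)


def head_child(label: str, children: Tuple) -> int:
    """Index of the head child, falling back to the rightmost child."""
    for wanted in HEAD_RULES.get(label, []):
        for i, child in enumerate(children):
            if child[0] == wanted:
                return i
    return len(children) - 1


def attach(node: Tuple, heads: Dict[int, int]) -> int:
    """Fill heads (dependent -> head) and return the node's lexical head."""
    if is_leaf(node):
        return node[1]
    label, children = node[0], node[1:]
    lexical = [attach(child, heads) for child in children]
    h = head_child(label, children)
    for i, dependent in enumerate(lexical):
        if i != h:
            heads[dependent] = lexical[h]
    return lexical[h]


def convert(tree: Tuple) -> Dict[int, int]:
    heads: Dict[int, int] = {}
    root = attach(tree, heads)
    heads[root] = 0  # 0 marks the artificial root
    return heads


# "John bought a car" as (S (NP John) (VP bought (NP a car))).
TREE = ("S", ("NP", ("NNP", 1)),
        ("VP", ("VBD", 2), ("NP", ("DT", 3), ("NN", 4))))

if __name__ == "__main__":
    print(convert(TREE))
```

Hidden dependencies, such as the object of "buy" in the slide's example, are not recoverable from head rules alone, which is why a richer representation is needed.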

Page 12: Multi-layer Annotation in Dependency Structure

Empty Categories

• Empty categories in dependency structure

[Figure: two dependency trees for "The car John wanted to buy was sold last night". One tree includes the empty categories *PRO*, *, and 0 as additional nodes; the other is the tree without empty categories. Arc labels shown: root, SBJ, OBJ, NMOD, OPRD, IM, VC, TMP.]

Page 13: Multi-layer Annotation in Dependency Structure

Semantic Roles

• Semantic roles in dependency structure with empty categories (EC)

- Can avoid some long-distance dependencies.

- Represents the true spans of semantic arguments.

[Figure: dependency tree with empty categories (*PRO*, *, 0) for "The car John wanted to buy was sold last night", with semantic roles added to the arcs. Combined labels shown: SBJ/A0, OPRD/A1, OBJ/A1, TMP/AM-TMP. Other arc labels: root, NMOD, IM, VC.]
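When a role is annotated on a head word, the argument's span can be recovered as the head word plus all of its descendants. The sketch below illustrates this. The head indices in `HEADS` are an assumed reading of the flattened figure (without empty categories), not data recovered from the slide, and `subtree_span` is an illustrative name.

```python
from typing import Dict, List

WORDS = "The car John wanted to buy was sold last night".split()

# Assumed heads (dependent -> head, 1-based, 0 = root). This is one plausible
# reading of the figure, given here only to make the example runnable.
HEADS: Dict[int, int] = {
    1: 2, 2: 7, 3: 4, 4: 2, 5: 4, 6: 5, 7: 0, 8: 7, 9: 10, 10: 8,
}


def subtree_span(heads: Dict[int, int], head_word: int) -> List[int]:
    """Token ids of head_word and all of its descendants, in sentence order."""
    children: Dict[int, List[int]] = {}
    for dependent, head in heads.items():
        children.setdefault(head, []).append(dependent)
    span, stack = [], [head_word]
    while stack:
        node = stack.pop()
        span.append(node)
        stack.extend(children.get(node, []))
    return sorted(span)


def span_text(heads: Dict[int, int], head_word: int) -> str:
    return " ".join(WORDS[i - 1] for i in subtree_span(heads, head_word))


if __name__ == "__main__":
    # AM-TMP of "sold" is annotated on the head word "night" (token 10).
    print(span_text(HEADS, 10))
    # A1 of "wanted" is annotated on the head word "to" (token 5).
    print(span_text(HEADS, 5))
```

Without empty categories, an argument that has been moved out of its clause is not inside the subtree of any head word under the predicate, which is the motivation for keeping them in the tree.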

Page 14: Multi-layer Annotation in Dependency Structure

Word Senses & Named Entities

• Word senses and named entities

• Coreferences?

- Annotation across sentences within a document.

[Figure: the same dependency tree with empty categories and semantic roles, with word senses added to the tokens ("The car.2 John wanted.1 to buy.1 was sold.1 last night.1") and the entity labels NAME and TIME.]
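One way to hold all layers on a single token is a record with one field per layer, as in the sketch below. The sense numbers come from the slide. The heads, dependency labels, role attachments, and the placement of NAME and TIME are an assumed reading of the flattened figure, and the `Token` class and row layout are hypothetical rather than the project's actual format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Token:
    form: str
    head: int                      # 1-based head index, 0 = root
    deprel: str                    # dependency label
    sense: Optional[str] = None    # word sense id (nouns and verbs)
    entity: Optional[str] = None   # named entity label on the head word
    roles: List[Tuple[int, str]] = field(default_factory=list)  # (predicate id, role)


# Assumed annotation for the example sentence; empty categories are omitted.
TOKENS = [
    Token("The", 2, "NMOD"),
    Token("car", 7, "SBJ", sense="2"),
    Token("John", 4, "SBJ", entity="NAME", roles=[(4, "A0")]),
    Token("wanted", 2, "NMOD", sense="1"),
    Token("to", 4, "OPRD", roles=[(4, "A1")]),
    Token("buy", 5, "IM", sense="1"),
    Token("was", 0, "ROOT"),
    Token("sold", 7, "VC", sense="1"),
    Token("last", 10, "NMOD"),
    Token("night", 8, "TMP", sense="1", entity="TIME", roles=[(8, "AM-TMP")]),
]


def to_row(index: int, tok: Token) -> str:
    """Render one token as a tab-separated row, with '_' for empty layers."""
    roles = "|".join(f"{pred}:{label}" for pred, label in tok.roles) or "_"
    columns = [str(index), tok.form, str(tok.head), tok.deprel,
               tok.sense or "_", tok.entity or "_", roles]
    return "\t".join(columns)


if __name__ == "__main__":
    for i, tok in enumerate(TOKENS, start=1):
        print(to_row(i, tok))
```

Coreference does not fit this per-sentence layout directly, because its links cross sentence boundaries within a document.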

Page 15: Multi-layer Annotation in Dependency Structure

Chinese VerbNet

• Motivation

- Verb classes give back-off information for unknown verbs.

- There is no lexicon specifying verb classes in Chinese.

- Manual annotation of Chinese verb classes is expensive.

- Can we automatically generate Chinese verb classes from English VerbNet?

• Manual inspection

- Given an English verb class, find a similar class in Chinese.

- English verb class: spray-9.7.

Page 16: Multi-layer Annotation in Dependency Structure

Manual Inspection of Chinese Verbs

• Spray-9.7

- brush, drizzle, hang, plaster, pump, rub, scatter, seed, sew, shower, smear, smudge, sow, spatter, splash, splatter, spray, spread, sprinkle, spritz, spurt, squirt, stick, strew, string, swab, wrap

- Thematic roles associated with the class: agent, theme, destination

• Examples

- [Agent Jessica] squirted [Theme water].

- [Agent Jessica] sprayed [Destination the wall].
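The bracketed examples above can be read into (role, text) pairs with a small regular expression. This is a sketch for this bracket notation only; `parse_roles` is an illustrative helper, not part of VerbNet.

```python
import re
from typing import List, Tuple

# Matches "[Role some text]" and captures the role name and the text.
ROLE_PATTERN = re.compile(r"\[(\w+) ([^\]]+)\]")


def parse_roles(example: str) -> List[Tuple[str, str]]:
    """Extract (thematic role, argument text) pairs from a bracketed example."""
    return ROLE_PATTERN.findall(example)


if __name__ == "__main__":
    print(parse_roles("[Agent Jessica] squirted [Theme water]."))
    print(parse_roles("[Agent Jessica] sprayed [Destination the wall]."))
```

Both sentences use verbs from the same class but realize different roles in object position, which is the kind of alternation a verb class records.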

Page 17: Multi-layer Annotation in Dependency Structure

Manual Inspection of Chinese Verbs

• Chinese verbs associated with spray-9.7.

- 刷 (brush), 滴 (drip), 挂 (hang), 抹 (wipe), 抽 (pump), 擦 (rub), 撒 (sprinkle), 缝 (sew), 涂 (smear), 种 (plant), 播种 (sow), 洒 (spray), 喷 (splatter), 喷洒 (spray), 飙 (spritz), 喷射 (squirt), 贴 (stick), 串 (string), 包 (wrap), 裹 (wrap), 挤 (squeeze), 塞 (stuff), 装 (pack)

Page 18: Multi-layer Annotation in Dependency Structure

Manual Inspection of Chinese Verbs

他 把 涂料 刷 在 天花板 上
He PART paint brush on ceiling LOC
"He brushes paint on the ceiling"

他 刷 了 涂料
He brush ASP paint
"He brushed the paint"

涂料 刷 在 天花板 上
Paint brush on ceiling LOC
"Paint is brushed on the ceiling"

天花板 上 刷 了 涂料
Ceiling LOC brush ASP paint
"Paint is brushed on the ceiling"

Color key on the slide: predicate, agent, theme, destination

Page 19: Multi-layer Annotation in Dependency Structure

Automatic Generation of Chinese VerbNet

[Figure: pipeline diagram. Components shown: English Data, VerbNet Classifier, English Data + Verb Classes, Predicate Argument Aligner, Chinese Data, Chinese Data + Verb Classes.]
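The projection step suggested by the diagram can be sketched as follows: once English predicates carry verb classes and are aligned to Chinese predicates, each Chinese verb collects the classes of the English verbs it aligns with. This is a minimal sketch under assumptions; the function name, the vote threshold, and the aligned pairs are hypothetical, and the slides do not describe the actual method at this level of detail.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def project_verb_classes(
    english_classes: Dict[str, List[str]],
    aligned_predicates: Iterable[Tuple[str, str]],
    min_count: int = 1,
) -> Dict[str, List[str]]:
    """Assign each Chinese verb the classes of its aligned English verbs.

    aligned_predicates holds (English verb, Chinese verb) pairs, one per
    aligned predicate instance. Classes seen fewer than min_count times
    for a Chinese verb are dropped.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for en_verb, zh_verb in aligned_predicates:
        for verb_class in english_classes.get(en_verb, ()):
            votes[zh_verb][verb_class] += 1
    return {
        zh_verb: [c for c, n in counts.most_common() if n >= min_count]
        for zh_verb, counts in votes.items()
    }


# English verbs from spray-9.7; the aligned pairs below are made up.
ENGLISH_CLASSES = {"brush": ["spray-9.7"], "spray": ["spray-9.7"]}
ALIGNED = [("brush", "刷"), ("brush", "刷"), ("spray", "喷洒"), ("say", "说")]

if __name__ == "__main__":
    print(project_verb_classes(ENGLISH_CLASSES, ALIGNED))
```

Raising `min_count` trades coverage for precision, since a single noisy alignment is then not enough to place a Chinese verb in a class.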