
JHU WORKSHOP - 2003, July 30th, 2003

Semantic Annotation – Week 3

Team: Louise Guthrie, Roberto Basili, Fabio Zanzotto, Hamish Cunningham, Kalina Bontcheva, Jia Cui, Klaus Macherey, David Guthrie, Martin Holub, Marco Cammisa, Cassia Martin, Jerry Liu, Kris Haralambiev

Fred Jelinek


Our Hypotheses

● A transformation of a corpus to replace words and phrases with coarse semantic categories will help overcome the data sparseness problem encountered in language modeling

● Semantic category information will also help improve machine translation

● An initially noun-centric approach will allow bootstrapping to other syntactic categories


An Example

● Astronauts aboard the space shuttle Endeavor were forced to dodge a derelict Air Force satellite Friday

● Humans aboard space_vehicle dodge satellite timeref.
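To make the transformation concrete, here is a minimal sketch in Python; the lexicon and the category strings below are invented for the example and are not the workshop's actual LDOCE/GATE mapping.

```python
# Minimal sketch of the transformation: replace known head nouns/phrases
# with coarse semantic categories.  The lexicon entries and category names
# are illustrative only; the real mapping comes from LDOCE via the GATE
# preprocessing described in the following slides.
LEXICON = {
    "astronauts": "human",
    "space shuttle endeavor": "space_vehicle",
    "satellite": "satellite",
    "friday": "timeref",
}

def transform(sentence):
    """Replace lexicon entries (longest match first) with their category."""
    text = sentence.lower()
    for phrase in sorted(LEXICON, key=len, reverse=True):
        text = text.replace(phrase, LEXICON[phrase])
    return text

print(transform("Astronauts aboard the space shuttle Endeavor were forced "
                "to dodge a derelict Air Force satellite Friday"))
# -> human aboard the space_vehicle were forced to dodge a derelict
#    air force satellite timeref
```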


Our Progress – Preparing the Data (Pre-Workshop)

● Identify a tag set

● Create a human-annotated corpus

● Create a doubly annotated corpus

● Process all data for named entity and noun phrase recognition using GATE Tools

● Develop algorithms for mapping target categories to WordNet synsets to support the tag set assessment
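Purely as a sketch of what such a mapping algorithm might look like, assuming the NLTK WordNet interface; the anchor synsets chosen here are illustrative and not the mapping actually developed for the workshop.

```python
# Sketch: test whether a noun falls under a coarse target category by
# checking whether an "anchor" WordNet synset dominates one of its senses.
# The anchor synsets below are illustrative assumptions.
from nltk.corpus import wordnet as wn   # requires the NLTK WordNet data

ANCHORS = {
    "Human":  wn.synset("person.n.01"),
    "Animal": wn.synset("animal.n.01"),
    "Plant":  wn.synset("plant.n.02"),
}

def coarse_categories(noun):
    """Return the target categories whose anchor synset dominates some sense of the noun."""
    categories = set()
    for sense in wn.synsets(noun, pos=wn.NOUN):
        hypernyms = {node for path in sense.hypernym_paths() for node in path}
        for category, anchor in ANCHORS.items():
            if anchor in hypernyms:
                categories.add(category)
    return categories

print(coarse_categories("astronaut"))   # expected: {'Human'}
```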


The Semantic Classes for Annotators

● A subset of the classes available in the Longman Dictionary of Contemporary English (LDOCE), electronic version

● Rationale:

The number of semantic classes is small

The classes are somewhat reliable, since they were used by a team of lexicographers to code noun senses, adjective preferences, and verb preferences


Semantic Classes

[Diagram: tree of the LDOCE semantic classes used as target classes, together with the annotated evidence. Classes shown: Abstract (T); Concrete (C), which splits into Animate (Q), covering Plant (P), Animal (A), and Human (H), and Inanimate (I), covering Liquid (L), Gas (G), and Solid (S); Movable (N) and Non-movable (J); the additional codes B, D, F, and M; and PhysQuant (4) and Organic (5).]


More Categories

● U: Collective
● K: Male
● R: Female
● W: Not animate
● X: Not concrete or animal
● Z: Unmarked

We allowed annotators to choose “none of the above” (? in the slides that follow)


Our Progress – Data Preparation

● Assess the annotation format, define uniform descriptions for irregular phenomena, and normalize them

● Determine the distribution of the tag set in the training corpus

● Analyze inter-annotator agreement (a minimal sketch of the computation appears after this list)

● Determine a reliable set of tags – T

● Parse all training data
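For the inter-annotator agreement step above, a minimal sketch of the computation; the input format, a pair of labels for each doubly annotated instance, is a hypothetical simplification, and "?" stands for the annotators' "none of the above".

```python
# Sketch of the agreement computation over doubly annotated head nouns.
# Each item is (label_from_annotator_1, label_from_annotator_2); "?" is
# the "none of the above" choice.  The input format is hypothetical.

def agreement(pairs, include_question_marks=False):
    """Fraction of instances on which the two annotators chose the same label."""
    if not include_question_marks:
        pairs = [(a, b) for a, b in pairs if a != "?" and b != "?"]
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / len(pairs)

pairs = [("H", "H"), ("T", "T"), ("A", "T"), ("?", "H"), ("S", "S")]
print(agreement(pairs))                              # ignoring "?" instances
print(agreement(pairs, include_question_marks=True)) # keeping the "?" instances
```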


Doubly Annotated Data

● Instances (headwords): 10,960

● 8,950 instances without question marks

● 8,446 of those are marked the same

● Inter-annotator agreement is 94% (8,446 / 8,950); including question marks it is 83%

Recall that these are non-named-entity noun phrases


Distribution of Doubly Annotated Data

agree w/o ?: 77.06%
agree with ?: 5.79%
disagree with ?: 12.55%
disagree w/o ?: 4.60%


Agreement of doubly marked instances

agree w/o ?: 94%
disagree w/o ?: 6%


Inter-annotator agreement – for each category

[Bar chart: per-category inter-annotator agreement, on a 0 to 1 scale, for the categories A, C, G, H, I, J, K, L, N, P, Q, R, S, 4, 5, T, U, W, X]


Category distribution among agreed part

[Pie chart: distribution of categories among the instances on which the annotators agreed, covering W, K, Q, G, I, P, C, L, X, R, 5, 4, A, U, N, J, S, H, T; the only value legible in the transcript is 69%]


A few statistics on the human-annotated data

● Total annotated: 262,230 instances; 48,175 with ?

● 214,055 with a category; of those:

Z: 0.5%

W and X: 0.5%

4, 5: 1.6%


Our Progress – Baselines

● Determine baselines for automatic tagging of noun phrases

● Baselines for tagging observed words in new contexts (new instances of known words)

● Baselines for tagging unobserved words

Unseen words – not in the training material, but in the dictionary

Novel words – not in the training material nor in the dictionary/WordNet


Overlap of dictionary and head nouns (in the BNC)

● 85% of NPs covered

● only 33% of the vocabulary (both in LDOCE and in WordNet) in the NPs covered


Preparation of the test environment

● Selected the blind portion of the human-annotated data for later evaluation

● Divided the remaining corpus into training and held-out portions

Random division of files

Unambiguous words for training – ambiguous for testing
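A minimal sketch of these two division strategies; the file list and the instance records with their sets of possible tags are hypothetical placeholders.

```python
# Sketch of the two ways the remaining corpus was divided; the data
# structures here (file names, instance dicts with a "possible_tags" set)
# are hypothetical placeholders.
import random

def split_files(files, held_out_fraction=0.1, seed=0):
    """Random division of files into training and held-out portions."""
    files = list(files)
    random.Random(seed).shuffle(files)
    cut = int(len(files) * (1 - held_out_fraction))
    return files[:cut], files[cut:]

def split_by_ambiguity(instances):
    """Unambiguous words (a single possible tag) for training, ambiguous ones for testing."""
    train = [inst for inst in instances if len(inst["possible_tags"]) == 1]
    test = [inst for inst in instances if len(inst["possible_tags"]) > 1]
    return train, test
```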


Baselines using only (target) words

Error rate | Unseen words marked with | Method | Valid training instances | Blame
15.1% | the first class | MaxEntropy | count 3 | Klaus
12.6% | most frequent class | MaxEntropy | count 3 | Jerry
16% | most frequent class | VFI | all | Fabio
13% | most frequent class | NaiveBayes | all | Fabio
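To make the "most frequent class" rows above concrete, here is a minimal sketch of that word-only baseline; the training-data format and the fallback for unseen words are assumptions, and the MaxEntropy, VFI and NaiveBayes runs themselves are not reproduced here.

```python
# Sketch of the word-only "most frequent class" baseline: each observed
# word is tagged with the class it carried most often in training, and
# unseen words fall back to the overall most frequent class.  The
# (word, tag) training format is a hypothetical simplification.
from collections import Counter, defaultdict

def train_most_frequent(pairs):
    per_word = defaultdict(Counter)
    overall = Counter()
    for word, tag in pairs:
        per_word[word][tag] += 1
        overall[tag] += 1
    model = {word: counts.most_common(1)[0][0] for word, counts in per_word.items()}
    fallback = overall.most_common(1)[0][0]   # used for unseen words
    return model, fallback

def tag(word, model, fallback):
    return model.get(word, fallback)

model, fallback = train_most_frequent(
    [("shuttle", "N"), ("shuttle", "N"), ("satellite", "N"), ("idea", "T")])
print(tag("shuttle", model, fallback), tag("unicorn", model, fallback))   # N N
```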


Baselines using only (target) words and preceding adjectives

Error rate | Unseen words marked with | Method | Valid training instances | Blame
13% | most frequent class | MaxEntropy | count 3 | Jerry
13.2% | most frequent class | MaxEntropy | all | Jerry
12.7% | most frequent class | MaxEntropy | count 3 | Jerry


Baselines using multiple knowledge sources

● Experiments in Sheffield

● Unambiguous tagger (assign only available semantic categories)

● bag-of-words tagger (IR-inspired): window size of 50 words, nouns and verbs (see the sketch after this list)

● Frequency-based tagger (assign the most frequent semantic category)
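A minimal sketch of the IR-inspired bag-of-words tagger described in this list; the category profiles and the plain-overlap scoring are assumptions about one reasonable realisation, not the Sheffield implementation itself.

```python
# Sketch of an IR-inspired bag-of-words tagger: each semantic category
# keeps a "profile" of the context words (nouns and verbs, 50-word window)
# seen with it in training; a new instance gets the category whose profile
# overlaps its own window most.  Data structures are hypothetical.
from collections import Counter

def build_profiles(training_instances):
    """training_instances: iterable of (category, context_words)."""
    profiles = {}
    for category, words in training_instances:
        profiles.setdefault(category, Counter()).update(words)
    return profiles

def bow_tag(window_words, profiles):
    """Pick the category whose profile best overlaps the context window."""
    scores = {category: sum(profile[w] for w in window_words)
              for category, profile in profiles.items()}
    return max(scores, key=scores.get)

profiles = build_profiles([("H", ["spoke", "walked", "said"]),
                           ("T", ["failed", "argued", "proposal"])])
print(bow_tag(["said", "proposal", "walked"], profiles))   # H (overlap 2 vs 1)
```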


Baselines using multiple knowledge sources (cont’d)

● Frequency-based tagger: 16-18% error rate

● bag-of-words tagger: 17% error rate

● Combined architecture: 14.5-15% error rate


Bootstrapping to Unseen Words

● Problem: Automatically identify the semantic class of words in LDOCE whose behavior was not observed in the training data

● Basic Idea: We use the unambiguous words (unambiguous with respect to our semantic tag set) to learn contexts for tagging unseen words
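As a sketch of one way to realise this idea, assuming a Naive Bayes classifier over simple context features (the previous and following content word, as in the baselines reported below); everything here, including the feature encoding, is illustrative.

```python
# Sketch of the bootstrapping idea: train a classifier on instances of
# unambiguous words (whose semantic tag is known from the lexicon) and
# apply it to unseen words.  Features are just the previous and following
# content word; all names and formats are illustrative.
from collections import Counter, defaultdict
import math

class NaiveBayesTagger:
    def __init__(self):
        self.tag_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, instances):
        # instances: iterable of (semantic_tag, feature_list), where the tag
        # is known because the head word is unambiguous in the lexicon
        for tag, feats in instances:
            self.tag_counts[tag] += 1
            for f in feats:
                self.feat_counts[tag][f] += 1
                self.vocab.add(f)

    def classify(self, feats):
        total = sum(self.tag_counts.values())
        def log_score(tag):
            s = math.log(self.tag_counts[tag] / total)
            denom = sum(self.feat_counts[tag].values()) + len(self.vocab)
            for f in feats:
                s += math.log((self.feat_counts[tag][f] + 1) / denom)  # add-one smoothing
            return s
        return max(self.tag_counts, key=log_score)

tagger = NaiveBayesTagger()
tagger.train([("H", ["prev=young", "next=spoke"]),
              ("T", ["prev=abstract", "next=failed"])])
print(tagger.classify(["prev=young", "next=arrived"]))   # H
```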


Bootstrapping: statistics

6,656 different unambiguous lemmas in the (visible) human-tagged corpus

...these contribute 166,249 instances of data

...134,777 of these instances were considered correct by the annotators

Observation: unambiguous words can be used in the corpus in an “unforeseen” way


Bootstrapping baselines

Method | % correctly labelled instances
Assigning the most frequent semantic tag (i.e. Abstract) | 52%
One previous word (adjective, noun, or verb), Naive Bayes classifier | 45% (reliably tagged instances) / 44.3% (all instances)
One previous and one following word (adjective, noun, or verb), Naive Bayes classifier | 46.8% (reliably tagged instances) / 44.5% (all instances)

● Test instances (instances of ambiguous words): 62,853


Metrics for Intrinsic Evaluation

● Need to take into account the hierarchical structure of the target semantic categories

● Two fuzzy measures based on:

dominance between categories

edge distance in the category tree/graph (sketched after this list)

● Results w.r.t. inter-annotator agreement are almost identical to exact match
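A minimal sketch of an edge-distance-based fuzzy score over the category tree; the small tree fragment and the 1/(1+distance) scoring below are illustrative assumptions, not the exact measures used here.

```python
# Sketch of a fuzzy evaluation score based on edge distance in the
# category tree.  The tree below is a small illustrative fragment of the
# LDOCE-style hierarchy, and the 1/(1+d) scoring is an assumption.
PARENT = {                       # child -> parent
    "Q": "C", "I": "C",          # Animate / Inanimate under Concrete
    "P": "Q", "A": "Q", "H": "Q",
    "L": "I", "G": "I", "S": "I",
}

def path_to_root(cat):
    path = [cat]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def edge_distance(a, b):
    pa, pb = path_to_root(a), path_to_root(b)
    common = set(pa) & set(pb)
    if not common:
        return len(pa) + len(pb)             # no common ancestor in this fragment
    return min(pa.index(c) + pb.index(c) for c in common)

def fuzzy_score(gold, predicted):
    """1.0 for an exact match, decreasing with tree distance."""
    return 1.0 / (1.0 + edge_distance(gold, predicted))

print(fuzzy_score("H", "H"), fuzzy_score("H", "A"), fuzzy_score("H", "L"))
```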


What’s next

● Investigate the respective contributions of the (independent) features

● Incorporate syntactic information

● Refine some coarse categories

Using subject codes

Using genus terms

Re-mapping via WordNet


What’s next (cont’d)

● Reduce the number of features/values via external resources:

lexical vs. semantic models of the context

use selectional preferences

● Concentrate on complex cases (e.g. unseen words)

● Prepare test data for extrinsic evaluation (MT)