Asma Naseer. Shallow Parsing or Partial Parsing At first proposed by Steven Abney (1991) Breaking...

Asma Naseer CHUNKING SHALLOW PARSING
Date posted: 19-Dec-2015


Asma Naseer

CHUNKING: SHALLOW PARSING

INTRODUCTION

• Shallow parsing, also called partial parsing
• First proposed by Steven Abney (1991)
• Breaks text up into small pieces
• Each piece is parsed separately [1]

INTRODUCTION (CONTINUED)

• Words are not arranged flatly in a sentence but are grouped into smaller parts called phrases
  ○ The girl was playing in the street
  ○ اس نے احمد کو کتاب دی (Urdu: "He/She gave the book to Ahmad")

INTRODUCTION (CONTINUED)

• Chunks are non-recursive: a chunk does not contain a phrase of the same category as itself
• NP → D? AdjP? AdjP? N
  ○ The big red balloon
  ○ [NP [D The] [AdjP [Adj big]] [AdjP [Adj red]] [N balloon]] [1]
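The rule NP → D? AdjP? AdjP? N can be tried out with an ordinary regular expression over category labels. The following is a minimal sketch, not the parser from [1]: the one-letter codes, the `np_chunks` helper and the simplification of each AdjP to a single Adj word are assumptions made for illustration.

```python
import re

# Hypothetical one-letter codes for the categories used in the rule.
# Each AdjP is simplified to a single adjective (assumption).
CODES = {"D": "d", "Adj": "a", "N": "n"}
NP_RULE = re.compile(r"d?a?a?n")  # NP -> D? AdjP? AdjP? N


def np_chunks(tagged):
    """Return the word lists of all non-overlapping NP chunks."""
    # Encode the category sequence as a string, one character per word
    # ("x" stands for any category the rule does not mention).
    encoded = "".join(CODES.get(cat, "x") for _, cat in tagged)
    return [[word for word, _ in tagged[m.start():m.end()]]
            for m in NP_RULE.finditer(encoded)]


sentence = [("The", "D"), ("big", "Adj"), ("red", "Adj"),
            ("balloon", "N"), ("burst", "V")]
print(np_chunks(sentence))
```

Because the pattern contains no NP inside itself, the chunks it finds are non-recursive by construction.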

INTRODUCTION (CONTINUED)

• Each phrase is dominated by a head h
  ○ A man proud of his son.
  ○ A proud man
• The root of the chunk has h as s-head (semantic head)
• The head of a noun phrase is usually a noun or pronoun [1]

CHUNK TAGGING (CONTINUED)

• Tagging schemes: IOBE, IOB, IO

CHUNK TAGGING

• IOB (Inside, Outside, Begin)
  ○ I-NP, O-NP, B-NP
  ○ I-VP, O-VP, B-VP
• Example: قائد اعظم محمد علی جناح نے قوم سے خطاب کیا (Urdu: "Quaid-e-Azam Muhammad Ali Jinnah addressed the nation")
  ○ [B-NP قائد] [I-NP اعظم] [I-NP محمد] [I-NP علی] [I-NP جناح]
  ○ [O-NP نے] [B-NP قوم] [O-NP سے] [B-NP خطاب] [O-NP کیا]
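Reading chunks back out of an IOB-tagged sequence takes one pass over the tags. The sketch below is illustrative only: `iob_to_chunks` is a hypothetical helper, and it treats any tag starting with "O" (plain "O" or the "O-NP" notation of the slide) as outside a chunk.

```python
def iob_to_chunks(tokens, tags):
    """Group tokens into (chunk_type, words) pairs from IOB-style tags."""
    chunks, current = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, chunk_type = tag.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[0] != chunk_type))
        if starts_new:
            # "B" always opens a chunk; a stray "I" is tolerated and opens one too.
            current = (chunk_type, [token])
            chunks.append(current)
        elif prefix == "I":
            current[1].append(token)  # continue the open chunk
        else:
            current = None  # an outside tag closes any open chunk
    return chunks


# The example sentence, in reading order, with the tags shown on the slide.
tokens = ["قائد", "اعظم", "محمد", "علی", "جناح", "نے", "قوم", "سے", "خطاب", "کیا"]
tags = ["B-NP", "I-NP", "I-NP", "I-NP", "I-NP",
        "O-NP", "B-NP", "O-NP", "B-NP", "O-NP"]
for chunk_type, words in iob_to_chunks(tokens, tags):
    print(chunk_type, len(words), "word(s)")
```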

RESEARCH WORK

• Rule Based vs. Statistical Based Chunking [2]
• Use of Support Vector Learning for Chunk Identification [5]
• A Context Based Maximum Likelihood Approach to Chunking [6]
• Chunking with Maximum Entropy Models [7]
• Single-Classifier Memory-Based Phrase Chunking [8]
• Hybrid Text Chunking [9]
• Shallow Parsing as POS Tagging [3]

RULE BASED VS STATISTICAL BASED CHUNKING

• Two techniques are used
  • Regular expression rules
    ○ Shallow parse based on regular expressions
  • N-gram statistical tagger (machine based chunking)
    ○ NLTK (Natural Language Toolkit) based on TnT Tagger (Trigrams'n'Tags)
    ○ Basic idea: reuse a POS tagger for chunking

RULE BASED VS STATISTICAL BASED CHUNKING (CONTINUED)

• Regular expression rules
  ○ The regular expressions must be developed manually
• N-gram statistical tagger
  ○ Can be trained on gold-standard chunked data

RULE BASED VS STATISTICAL BASED CHUNKING (CONTINUED)

• Focus is on verb and noun phrase chunking
• Noun phrases
  • A noun or pronoun is the head
  • Also contain
    ○ Determiners, i.e. articles, demonstratives, numerals, possessives and quantifiers
    ○ Adjectives
    ○ Complements (adpositional phrases, relative clauses)
• Verb phrases
  • The verb is the head
  • Often one or two complements
  • Any number of adjuncts

RULE BASED VS STATISTICAL BASED CHUNKING (CONTINUED)

• Training NLTK on chunk data
  • Starts with an empty rule set
    ○ 1. Define or refine a rule
    ○ 2. Execute the chunker on the training data
    ○ 3. Compare the results with the previous run
  • Repeat steps 1, 2 and 3 until performance no longer improves significantly
• Issues: the data contains 211,727 phrases in total; a subset of 1,000 phrases was taken

RULE BASED VS STATISTICAL BASED CHUNKING (CONTINUED)

• Training TnT on chunk data
  • Chunking is treated as statistical tagging
  • Two steps
    ○ Parameter generation: create model parameters from the training corpus
    ○ Tagging: tag each word with a chunk label
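The two steps can be sketched with a small hidden Markov model. This is a deliberately simplified illustration and not TnT itself: it uses bigram (not trigram) transitions, add-one smoothing, POS tags as the observations, and a toy training set; all function names are hypothetical.

```python
import math
from collections import Counter


def generate_parameters(tagged_sentences):
    """Step 1: count label transitions and (label, observation) emissions."""
    trans, emit, labels = Counter(), Counter(), set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo label
        for obs, label in sentence:
            trans[prev, label] += 1
            emit[label, obs] += 1
            labels.add(label)
            prev = label
    return trans, emit, sorted(labels)


def log_prob(counts, context, event, n_events):
    """Add-one smoothed log P(event | context)."""
    total = sum(c for (ctx, _), c in counts.items() if ctx == context)
    return math.log((counts[context, event] + 1) / (total + n_events))


def tag(observations, trans, emit, labels):
    """Step 2: Viterbi search for the most probable chunk-label sequence."""
    n_obs = len({o for _, o in emit}) + 1  # +1 slot for unseen observations
    # best[label] = (log probability, path) of the best path ending in label
    best = {"<s>": (0.0, [])}
    for obs in observations:
        step = {}
        for label in labels:
            e = log_prob(emit, label, obs, n_obs)
            step[label] = max(
                (score + log_prob(trans, prev, label, len(labels)) + e,
                 path + [label])
                for prev, (score, path) in best.items())
        best = step
    return max(best.values())[1]


# Toy training data: (POS tag, chunk label) pairs, invented for illustration.
train = [
    [("DT", "B-NP"), ("NN", "I-NP"), ("VBD", "O"), ("DT", "B-NP"), ("NN", "I-NP")],
    [("PRP", "B-NP"), ("VBD", "O"), ("DT", "B-NP"), ("JJ", "I-NP"), ("NN", "I-NP")],
]
params = generate_parameters(train)
print(tag(["DT", "JJ", "NN", "VBD"], *params))
```

Keeping parameter generation and tagging as separate functions mirrors the two-step workflow described above: the counts are produced once and reused for every sentence that is tagged.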

RULE BASED VS STATISTICAL BASED CHUNKING (CONTINUED)

• Data set
  • WSJ: Wall Street Journal newspaper, NY
    ○ US
    ○ International business
    ○ Financial news
  • Training: sections 15-18
  • Testing: section 20
  • Both are tagged with POS and IOB
  • Special characters are treated as other POS; punctuation is tagged as O

RULE BASED VS STATISTICAL BASED CHUNKING (CONTINUED)

• Results
  ○ Precision: $P = \frac{|\text{reference} \cap \text{test}|}{|\text{test}|}$
  ○ Recall: $R = \frac{|\text{reference} \cap \text{test}|}{|\text{reference}|}$
  ○ F-Measure: $F_{\alpha} = \frac{1}{\alpha / P + (1 - \alpha) / R}$, with $\alpha = 0.5$
  ○ F-Rate: $F = \frac{2 \cdot P \cdot R}{R + P}$
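These measures are straightforward to compute once chunks are represented as set elements. The sketch below assumes each chunk is a hypothetical `(start, end, type)` tuple, so a proposed chunk only counts as correct when both its boundaries and its type match the reference; the sample data is invented.

```python
def evaluate(reference, test, alpha=0.5):
    """Return (precision, recall, F_alpha) for two collections of chunks."""
    reference, test = set(reference), set(test)
    correct = len(reference & test)
    precision = correct / len(test) if test else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0
    # With alpha = 0.5 this reduces to 2 * P * R / (P + R).
    f_measure = 1.0 / (alpha / precision + (1.0 - alpha) / recall)
    return precision, recall, f_measure


# Invented example: chunks as (start, end, type) spans over token positions.
reference = {(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP"), (5, 6, "VP")}
test = {(0, 2, "NP"), (2, 3, "VP"), (3, 4, "NP")}
p, r, f = evaluate(reference, test)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```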

RULE BASED VS STATISTICAL BASED CHUNKING (CONTINUED)

Results

          P          R          F-Measure
NLTK
  VP      79.3 %     80.1 %     79.7 %
  NP      76.5 %     84.4 %     80.3 %
TnT
  VP      79.59 %    82.35 %    80.95 %
  NP      78.36 %    76.76 %    77.55 %

USE OF SUPPORT VECTOR LEARNING FOR CHUNK IDENTIFICATION

• SVM (large margin classifiers)
• Introduced by Vapnik (1995)
• Two-class pattern recognition problem
• Good generalization performance
• High accuracy in text categorization without overfitting (Joachims, 1998; Taira and Haruno, 1999)

USE OF SUPPORT VECTOR LEARNING FOR CHUNK IDENTIFICATION (CONTINUED)

• Training data: $(x_1, y_1), \ldots, (x_l, y_l)$ with $x_i \in \mathbb{R}^n$, $y_i \in \{+1, -1\}$
  ○ $x_i$ is the i-th sample, represented by an n-dimensional vector
  ○ $y_i$ is the class label (positive or negative) of the i-th sample
• In SVM
  ○ Positive and negative examples are separated by a hyperplane
  ○ SVM finds the optimal hyperplane

USE OF SUPPORT VECTOR LEARNING FOR CHUNK IDENTIFICATION (CONTINUED)

[Figure: two possible hyperplanes]

USE OF SUPPORT VECTOR LEARNING FOR CHUNK IDENTIFICATION (CONTINUED)

• Chunks in the CoNLL-2000 shared task are IOB tagged
• Each chunk type belongs to either I or B
  ○ I-NP or B-NP
• 22 types of chunks are found in CoNLL-2000
• The chunking problem is the classification of these 22 types
• SVM is a binary classifier, so it is extended to k classes
  • One class vs. all others
  • Pairwise classification
    ○ k * (k-1) / 2 classifiers: 22 * 21 / 2 = 231 classifiers
    ○ The majority decides the final class
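Pairwise classification with majority voting can be sketched independently of the SVMs themselves. In the illustration below, `decide` is a hypothetical stand-in for a trained binary classifier per class pair; the toy nearest-prototype rule and its numbers are invented and are not part of the method in [5].

```python
from collections import Counter
from itertools import combinations


def pairwise_predict(classes, decide, x):
    """Vote over all k*(k-1)/2 class pairs; decide(a, b, x) returns a or b."""
    votes = Counter(decide(a, b, x) for a, b in combinations(classes, 2))
    return votes.most_common(1)[0][0]


# Toy stand-in for the binary classifiers: pick the class whose
# (invented) prototype value lies closer to the input.
PROTOTYPES = {"B-NP": 0.0, "I-NP": 1.0, "O": 2.0}


def nearest_prototype(a, b, x):
    return a if abs(x - PROTOTYPES[a]) <= abs(x - PROTOTYPES[b]) else b


print(pairwise_predict(list(PROTOTYPES), nearest_prototype, 0.9))

# The number of pairwise classifiers for k = 22 chunk classes.
print(len(list(combinations(range(22), 2))))
```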

USE OF SUPPORT VECTOR LEARNING FOR CHUNK IDENTIFICATION (CONTINUED)

• The feature vector consists of
  ○ Words: w
  ○ POS tags: t
  ○ Chunk tags: c
• To identify chunk $c_i$ at the i-th word
  ○ $w_j, t_j$ (j = i-2, i-1, i, i+1, i+2)
  ○ $c_j$ (j = i-2, i-1)
• All features are expanded to binary values, either 0 or 1
• The total dimension of the feature vector becomes 92,837
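The window of words, POS tags and previous chunk tags, and its expansion into 0/1 values, might look as follows. This is a sketch under assumptions: the feature-name format, the `<pad>` symbol for positions outside the sentence, and both helper names are invented for illustration.

```python
def window_features(words, pos_tags, chunk_tags, i):
    """Features for position i: w_j, t_j for j in i-2..i+2; c_j for j in i-2, i-1."""
    def at(seq, j):
        return seq[j] if 0 <= j < len(seq) else "<pad>"

    features = set()
    for offset in range(-2, 3):
        features.add(f"w[{offset:+d}]={at(words, i + offset)}")
        features.add(f"t[{offset:+d}]={at(pos_tags, i + offset)}")
    for offset in (-2, -1):
        # Only chunk tags to the left are available while tagging left to right.
        features.add(f"c[{offset:+d}]={at(chunk_tags, i + offset)}")
    return features


def to_binary_vector(features, index):
    """Expand a feature set into a 0/1 vector given a name -> position index."""
    vector = [0] * len(index)
    for name in features:
        if name in index:
            vector[index[name]] = 1
    return vector


words = ["The", "big", "red", "balloon"]
pos_tags = ["DT", "JJ", "JJ", "NN"]
chunk_tags = ["B-NP", "I-NP", "I-NP"]  # tags already assigned to words 0..2
feats = window_features(words, pos_tags, chunk_tags, 3)
index = {name: k for k, name in enumerate(sorted(feats))}
print(sorted(feats))
print(to_binary_vector(feats, index))
```

Each distinct (position, value) pair becomes its own dimension, which is why the real feature space grows to tens of thousands of dimensions.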

USE OF SUPPORT VECTOR LEARNING FOR CHUNK IDENTIFICATION (CONTINUED)

• Results
  • It took about 1 day to train the 231 classifiers on PC-Linux (Celeron 500 MHz, 512 MB)
  • ADJP, ADVP, CONJP, INTJ, LST, NP, PP, PRT, SBAR, VP
    ○ Precision = 93.45 %
    ○ Recall = 93.51 %
    ○ Fβ=1 = 93.48 %

A CONTEXT BASED MAXIMUM LIKELIHOOD APPROACH TO CHUNKING

• Training
  • Based on POS tags
  • Construct symmetric n-contexts from the training corpus
    ○ 1-context: the most common chunk label for each tag
    ○ 3-context: the tag together with the tags before and after it [t-1, t0, t+1]
    ○ 5-context: [t-2, t-1, t0, t+1, t+2]
    ○ 7-context: [t-3, t-2, t-1, t0, t+1, t+2, t+3]

A CONTEXT BASED MAXIMUM LIKELIHOOD APPROACH TO CHUNKING (CONTINUED)

• Training
  • For each context, find the most frequent label
    ○ CC → [O CC]
    ○ PRP CC RP → [B-NP CC]
  • To save storage space, an n-context is added only if its label differs from that of its nearest lower-order context

A CONTEXT BASED MAXIMUM LIKELIHOOD APPROACH TO CHUNKING (CONTINUED)

• Testing
  • Construct the maximum context for each tag
  • Look it up in the database of most likely patterns
  • If the largest context is not found, the context is diminished step by step
  • The only rule for chunk labeling is to look up [t-3, t-2, t-1, t0, t+1, t+2, t+3] ... [t0] until the context is found
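The training and look-up procedure can be sketched in a few lines. This is an illustrative reading of the description, not the implementation from [6]: the `<pad>` symbol at sentence edges, the default label for unseen tags and the toy corpus are assumptions, and the storage-saving pruning step is left out for brevity.

```python
from collections import Counter, defaultdict


def context(tags, i, width):
    """Symmetric context of `width` tags centred on position i, padded at the edges."""
    half = width // 2
    return tuple(tags[j] if 0 <= j < len(tags) else "<pad>"
                 for j in range(i - half, i + half + 1))


def train(corpus, widths=(1, 3, 5, 7)):
    """Map every observed n-context to its most frequent chunk label."""
    counts = defaultdict(Counter)
    for tags, labels in corpus:
        for i, label in enumerate(labels):
            for width in widths:
                counts[context(tags, i, width)][label] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def label(tags, table, widths=(7, 5, 3, 1), default="O"):
    """Label each tag, backing off from the largest context to the smallest."""
    result = []
    for i in range(len(tags)):
        for width in widths:
            found = table.get(context(tags, i, width))
            if found is not None:
                result.append(found)
                break
        else:
            result.append(default)  # tag never seen in training (assumption)
    return result


# Toy corpus of (POS tags, chunk labels), invented for illustration.
corpus = [
    (["DT", "NN", "VBD", "DT", "NN"], ["B-NP", "I-NP", "O", "B-NP", "I-NP"]),
    (["PRP", "CC", "PRP", "VBD"], ["B-NP", "O", "B-NP", "O"]),
]
table = train(corpus)
print(label(["DT", "NN", "VBD"], table))
```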

A CONTEXT BASED MAXIMUM LIKELIHOOD APPROACH TO CHUNKING (CONTINUED)

• Results
  • The best results are achieved for the 5-context
  • ADJP, ADVP, CONJP, INTJ, LST, NP, PP, PRT, SBAR, VP
    ○ Precision = 86.24%
    ○ Recall = 88.25%
    ○ Fβ=1 = 87.23%

CHUNKING WITH MAXIMUM ENTROPY MODELS

• Maximum entropy models are exponential models
• Collect as much information as possible
  ○ Frequencies of events relevant to the process
• The MaxEnt model has the form

$$P(w \mid h) = \frac{1}{Z(h)} \exp\Big(\sum_i \lambda_i f_i(h, w)\Big)$$

  ○ $f_i(h, w)$ is a binary-valued feature function describing an event
  ○ $\lambda_i$ describes how important $f_i$ is
  ○ $Z(h)$ is a normalization factor
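Evaluating the model for a given history is a direct transcription of the formula. The sketch below only computes P(w | h) from weights that are assumed to be already trained; the two feature functions, their weights and the history dictionary are invented for illustration.

```python
import math


def maxent_probability(history, outcomes, features, weights):
    """Return P(w | h) for every outcome w under the exponential model."""
    scores = {
        w: math.exp(sum(lam * f(history, w) for f, lam in zip(features, weights)))
        for w in outcomes
    }
    z = sum(scores.values())  # normalization factor Z(h)
    return {w: s / z for w, s in scores.items()}


# Two hypothetical binary features f_i(h, w) with made-up weights lambda_i.
features = [
    lambda h, w: 1 if h["pos"] == "DT" and w == "B-NP" else 0,
    lambda h, w: 1 if h["prev_chunk"] == "B-NP" and w == "I-NP" else 0,
]
weights = [1.5, 2.0]

history = {"pos": "NN", "prev_chunk": "B-NP"}
probs = maxent_probability(history, ["B-NP", "I-NP", "O"], features, weights)
print(max(probs, key=probs.get), probs)
```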

CHUNKING WITH MAXIMUM ENTROPY MODELS (CONTINUED)

• Attributes used
  • Information in the WSJ corpus
    ○ Current word
    ○ POS tag of the current word
    ○ Surrounding words
    ○ POS tags of the surrounding words
  • Context
    ○ Left context: 3 words
    ○ Right context: 2 words
  • Additional information
    ○ Chunk tags of the previous 2 words

CHUNKING WITH MAXIMUM ENTROPY MODELS (CONTINUED)

• Results
  • Tagging accuracy = 95.5%
    ○ (# of correctly tagged words) / (total # of words)
  • Recall = 91.86%
    ○ (# of correct proposed base NPs) / (number of correct base NPs)
  • Precision = 92.08%
    ○ (# of correct proposed base NPs) / (number of proposed base NPs)
  • Fβ=1 = 91.97%
    ○ $F_{\beta} = \frac{(\beta^2 + 1) \cdot \text{Recall} \cdot \text{Precision}}{\beta^2 \cdot \text{Precision} + \text{Recall}}$

HYBRID TEXT CHUNKING

• Context-based lexicon and HMM-based chunker
• Statistics were used for chunking by Church (1988)
  ○ Corpus frequencies were used
  ○ Non-recursive noun phrases were identified
• Skut & Brants (1998) modified Church's approach and used a Viterbi tagger

HYBRID TEXT CHUNKING (CONTINUED)

• Error-driven HMM-based text chunker
  ○ Memory is decreased by keeping only positive lexical entries
• HMM-based text chunker with context-dependent lexicon
  ○ Given $G_1^n = g_1, g_2, \ldots, g_n$
  ○ Find the optimal sequence $T_1^n = t_1, t_2, \ldots, t_n$
  ○ Maximize $\log P(T_1^n \mid G_1^n)$

$$\log P(T_1^n \mid G_1^n) = \log P(T_1^n) + \log \frac{P(T_1^n, G_1^n)}{P(T_1^n)\, P(G_1^n)}$$
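The decomposition of the objective follows from the definition of conditional probability: multiplying and dividing by $P(T_1^n)$ splits it into a prior over tag sequences and a mutual-information term between the tag sequence and the input sequence. A short worked derivation, using the same symbols:

```latex
\begin{align*}
\log P(T_1^n \mid G_1^n)
  &= \log \frac{P(T_1^n, G_1^n)}{P(G_1^n)} \\
  &= \log \left( P(T_1^n) \cdot \frac{P(T_1^n, G_1^n)}{P(T_1^n)\, P(G_1^n)} \right) \\
  &= \log P(T_1^n) + \log \frac{P(T_1^n, G_1^n)}{P(T_1^n)\, P(G_1^n)}
\end{align*}
```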

SHALLOW PARSING AS POS TAGGING

• CoNLL-2000: used for testing and training
• Ratnaparkhi's maximum entropy based POS tagger
  ○ No change in internal operation
  ○ Information for training is increased

SHALLOW PARSING AS POS TAGGING (CONTINUED)

• Shallow parsing vs. POS tagging
  ○ Shallow parsing requires more of the surrounding POS/lexical syntactic environment
• Training configurations
  ○ Words: w1 w2 w3
  ○ POS tags: t1 t2 t3
  ○ Chunk types: c1 c2 c3
  ○ Suffixes or prefixes

SHALLOW PARSING AS POS TAGGING (CONTINUED)

• The amount of information is gradually increased
  • Word w1
  • Tag t1
  • Word, tag, chunk label (w1 t1 c1)
    ○ The current chunk label is accessed through another model with configurations of words and tags (w1 t1)
  • To deal with sparseness
    ○ t1, t2
    ○ c1
    ○ c2 (last two letters)
    ○ w1 (first two letters)

SHALLOW PARSING AS POS TAGGING (CONTINUED)

[Figure: Word w1]

[Figure: Tag t1]

[Figure: (w1 t1 c1)]

[Figure: Sparseness Handling]

SHALLOW PARSING AS POS TAGGING (CONTINUED)

Overall results

Configuration          Precision   Recall    Fβ=1
Word w1                88.06%      88.71%    88.38%
Tag t1                 88.15%      88.07%    88.11%
(w1 t1 c1)             89.79%      90.70%    90.24%
Sparseness handling    91.65%      92.23%    91.94%

SHALLOW PARSING AS POS TAGGING (CONTINUED)

• Error analysis: three groups of errors
  • Difficult syntactic constructs
    ○ Punctuation
    ○ Treating ditransitive VPs and transitive VPs
    ○ Adjective vs. adverbial phrases
  • Mistakes made in training or testing by the annotator
    ○ Noise
    ○ POS errors
    ○ Odd annotation decisions
  • Errors peculiar to the approach
    ○ The exponential distribution assigns non-zero probability to all events
    ○ The tagger may assign illegal chunk labels (I-NP while w is not in an NP)

SHALLOW PARSING AS POS TAGGING (CONTINUED)

• Comments
  • PPs are easy to identify
  • ADJP and ADVP are hard to identify correctly (more syntactic information is required)
  • Performance on NPs can be further improved
  • Performance using w1 or t1 alone is almost the same; using both features together enhances performance

REFERENCES

[1] Philip Brooks, “A Simple Chunk Parser”, May 8, 2003.
[2] Igor Boehm, “Rule Based vs. Statistical Chunking of CoNLL Data Set”.
[3] Miles Osborne, “Shallow Parsing as POS Tagging”.
[4] Hans van Halteren, “Chunking with WPDV Models”.
[5] Taku Kudoh and Yuji Matsumoto, “Use of Support Vector Learning for Chunk Identification”, in Proceedings of CoNLL-2000 and LLL-2000, pages 142-144, Portugal, 2000.
[6] Christer Johansson, “A Context Sensitive Maximum Likelihood Approach to Chunking”.
[7] Rob Koeling, “Chunking with Maximum Entropy Models”.
[8] Jorn Veenstra and Antal van den Bosch, “Single-Classifier Memory-Based Phrase Chunking”.
[9] GuoDong Zhou, Jian Su and TongGuan Tey, “Hybrid Text Chunking”.