
Named-Entity Recognition with Character-Level Models

Dan Klein, Joseph Smarr, Huy Nguyen, and Christopher D. Manning

Stanford University

CoNLL-2003: Seventh Conference on Natural Language Learning

klein@cs.stanford.edu jsmarr@stanford.edu htnguyen@stanford.edu manning@cs.stanford.edu

Unknown Words are a Central Challenge for NER

Recognizing known named-entities (NEs) is relatively simple and accurate

Recognizing novel NEs requires using context and/or word-internal features

External context and frequent internal words (e.g. “Inc.”) are the most commonly used features

The internal composition of NEs alone provides surprisingly strong evidence for classification (Smarr & Manning, 2002), e.g. Staffordshire, Abdul-Karim al-Kabariti, CentrInvest

Are Names Self-Describing?

NO: names can be opaque/ambiguous

Word-Level: “Washington” occurs as LOC, PER, and ORG

Char-Level: “-ville” suggests LOC, but there are exceptions like “Neville”

YES: names can be highly distinctive/descriptive

Word-Level: “National Bank” is a bank (i.e. ORG)

Char-Level: “Cotramoxazole” is clearly a drug

Question: Overall, how informative are names alone?

How Internally Descriptive are Isolated Named Entities?

Classification accuracy of pre-segmented CoNLL NEs without context is ~90%

Using character n-grams as features instead of words yields a 25% error reduction

On single-word unknown NEs, the word model is at chance; the char n-gram model fixes 38% of its errors
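The error-reduction figures on this slide are relative: they measure what fraction of the baseline's errors the new model removes, not a gain in accuracy points. A tiny sketch of that arithmetic; the accuracies in the example are made up for illustration and are not the paper's numbers.

```python
def error_reduction(acc_baseline: float, acc_new: float) -> float:
    """Fraction of the baseline's errors that the new model removes."""
    err_baseline = 1.0 - acc_baseline
    err_new = 1.0 - acc_new
    if err_baseline <= 0.0:
        raise ValueError("baseline has no errors to reduce")
    return (err_baseline - err_new) / err_baseline


# Hypothetical accuracies, chosen only to illustrate the arithmetic:
# going from 12% error to 9% error removes a quarter of the errors.
print(round(error_reduction(0.88, 0.91), 2))  # 0.25
```

A 3-point accuracy gain therefore counts as a 25% error reduction when the baseline error is 12%, which is why relative reductions look large near the top of the accuracy scale.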

[Figure: NE classification accuracy (%) for Words vs. Char N-Grams, on All NEs and on Single-word UNKs (not the CoNLL task)]

Exploiting Word-Internal Features

Many existing systems use some word-internal features (suffix, capitalization, punctuation, etc.)

e.g. Mikheev 97, Wacholder et al. 97, Bikel et al. 97

Features are usually language-dependent (e.g. morphology)

Our approach: use char n-grams as the primary representation

Use all substrings as classification features:

Char n-grams subsume word features

Features are language-independent (assuming an alphabetic script)

Similar in spirit to Cucerzan and Yarowsky (99), but uses ALL char n-grams vs. just prefixes/suffixes

Example (# marks the word boundaries): #Tom# → #Tom#, #Tom, Tom#, #To, Tom, om#, #T, To, om, m#, T, o, m
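The substring features can be generated by padding the word with a boundary symbol and enumerating every substring except the bare boundary. A minimal sketch; it reproduces the #Tom# list above, and the optional max_len cap is a hypothetical knob not mentioned on the slides.

```python
from typing import Optional


def char_ngrams(word: str, boundary: str = "#", max_len: Optional[int] = None) -> list[str]:
    """All substrings of the boundary-padded word, longest first.

    The lone boundary symbol is skipped, matching the #Tom# example.
    max_len is a hypothetical cap on n-gram length (None = no cap).
    """
    padded = f"{boundary}{word}{boundary}"
    longest = len(padded) if max_len is None else min(max_len, len(padded))
    grams: list[str] = []
    seen: set[str] = set()
    for n in range(longest, 0, -1):
        for start in range(len(padded) - n + 1):
            gram = padded[start:start + n]
            if gram == boundary or gram in seen:
                continue  # skip the bare boundary and repeated substrings
            seen.add(gram)
            grams.append(gram)
    return grams


print(char_ngrams("Tom"))
# ['#Tom#', '#Tom', 'Tom#', '#To', 'Tom', 'om#', '#T', 'To', 'om', 'm#', 'T', 'o', 'm']
```

Because the whole padded word (#Tom#) is itself one of the substrings, the word-identity feature comes for free, which is the sense in which char n-grams subsume word features.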

Character-Feature Based Classifier

Model I: Independent classification at each word

Maxent classifiers, trained using conjugate gradient

Equal-scale Gaussian priors for smoothing

Trained models with >800K features in ~2 hrs
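A minimal sketch of a Model I style classifier, assuming sparse binary string features and a single prior variance sigma2 shared by all weights (our reading of "equal-scale Gaussian priors"). For brevity it swaps in plain batch gradient descent where the slides use conjugate gradient; the function names, feature strings, and default hyperparameters are hypothetical.

```python
import math
from collections import defaultdict


def train_maxent(data, labels, sigma2=1.0, lr=0.2, epochs=200):
    """Multiclass maxent over sparse binary features with a Gaussian prior.

    data: list of (feature_list, gold_label) pairs.
    sigma2: one prior variance shared by every weight.
    Minimizes -sum log P(gold | feats) + sum_k w_k^2 / (2 * sigma2)
    with plain batch gradient descent (not conjugate gradient).
    """
    w = defaultdict(float)  # weight for each (feature, label) pair
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in data:
            scores = {y: sum(w[(f, y)] for f in feats) for y in labels}
            top = max(scores.values())  # subtract max for numerical stability
            z = sum(math.exp(s - top) for s in scores.values())
            for y in labels:
                p = math.exp(scores[y] - top) / z
                target = 1.0 if y == gold else 0.0
                for f in feats:
                    grad[(f, y)] += p - target  # expected minus observed count
        for k in set(grad) | set(w):
            grad[k] += w[k] / sigma2  # gradient of the Gaussian prior term
        for k, g in grad.items():
            w[k] -= lr * g
    return w


def predict(w, feats, labels):
    """Highest-scoring label; unseen (feature, label) pairs count as 0."""
    return max(labels, key=lambda y: sum(w.get((f, y), 0.0) for f in feats))


# Hypothetical toy data with made-up feature strings.
LABELS = ["LOC", "ORG", "PER"]
toy = [(["suf=ville"], "LOC"), (["suf=shire"], "LOC"),
       (["word=Inc."], "ORG"), (["pre=Mr."], "PER")]
w = train_maxent(toy, LABELS)
print(predict(w, ["suf=shire"], LABELS))  # LOC
```

The prior term keeps every weight pulled toward zero, so a smaller sigma2 gives smaller weights; with hundreds of thousands of sparse n-gram features, that shrinkage is what keeps rare features from being overweighted.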

POS tags and contextual features complement n-grams

Description: Added Features (scored by Overall F1 on English Dev.)

Words: w0

Official Baseline

Char N-Grams: n(w0)

POS Tags: t0

Simple Context: w-1, w0, t-1, t1

More Context: ‹w-1, w0›, ‹w0, w1›, ‹t-1, t0›, ‹t0, w1›
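The feature templates in this table can be turned into feature strings for one token roughly as follows. This is a sketch under assumptions: the feature-name format and the "<S>" padding token for positions outside the sentence are hypothetical choices, not taken from the slides.

```python
def token_features(words, tags, i):
    """Feature strings for token i, following the templates listed above.

    Feature-name formats and the "<S>" padding token are hypothetical.
    """
    def w(k):  # word at offset k from i, padded past the sentence edges
        j = i + k
        return words[j] if 0 <= j < len(words) else "<S>"

    def t(k):  # POS tag at offset k from i
        j = i + k
        return tags[j] if 0 <= j < len(tags) else "<S>"

    feats = [f"w0={w(0)}"]                                      # Words
    padded = f"#{w(0)}#"                                        # Char N-Grams n(w0)
    grams = {padded[a:b] for a in range(len(padded))
             for b in range(a + 1, len(padded) + 1)} - {"#"}
    feats += sorted(f"n={g}" for g in grams)
    feats.append(f"t0={t(0)}")                                  # POS Tags
    # Simple Context (w0 is already included above)
    feats += [f"w-1={w(-1)}", f"t-1={t(-1)}", f"t1={t(1)}"]
    # More Context: conjunctions of neighboring words and tags
    feats += [f"w-1,w0={w(-1)},{w(0)}", f"w0,w1={w(0)},{w(1)}",
              f"t-1,t0={t(-1)},{t(0)}", f"t0,w1={t(0)},{w(1)}"]
    return feats


print(token_features(["Grace", "Road", "is"], ["NNP", "NNP", "VBZ"], 1))
```

Each string is treated as a binary feature that fires for the token, so a maxent classifier can weigh, for example, the suffix n-gram "ad#" and the context pair "w-1,w0=Grace,Road" independently.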

Character-Based CMM

Model II: Joint classifications along the sequence

Previous classification decisions are clearly relevant: “Grace Road” is a single location, not a person + location

Include neighboring classification decisions as features

Perform joint inference across the chain of classifiers

Conditional Markov Model (CMM, a.k.a. maxent Markov model): Borthwick 1999, McCallum et al. 2000
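In a CMM each local classifier gives P(current label | previous label, observations), and joint inference looks for the label sequence with the highest product of these local probabilities. Viterbi decoding is one standard way to do this; the slides do not say which search the authors used, so treat this as a sketch. The local_prob callable stands in for the trained maxent classifier (a hypothetical interface), and the probabilities are invented to mirror the “Grace Road” example.

```python
import math


def viterbi(n, states, local_prob, start="<S>"):
    """Most probable label sequence under a first-order CMM.

    local_prob(prev_state, i) -> {state: P(state | prev_state, observations at i)}
    is a stand-in for the trained maxent classifier (hypothetical interface).
    """
    # best[s] = (log-probability, path) of the best sequence ending in state s
    best = {start: (0.0, [])}
    for i in range(n):
        new = {}
        for prev, (score, path) in best.items():
            dist = local_prob(prev, i)
            for s in states:
                p = dist.get(s, 0.0)
                if p <= 0.0:
                    continue  # impossible transition
                cand = score + math.log(p)
                if s not in new or cand > new[s][0]:
                    new[s] = (cand, path + [s])
        best = new
    return max(best.values(), key=lambda entry: entry[0])[1]


# Toy local distributions (made-up numbers) for the two tokens "Grace Road".
TABLE = {
    ("<S>", 0): {"PER": 0.5, "LOC": 0.4, "O": 0.1},
    ("PER", 1): {"PER": 0.2, "LOC": 0.5, "O": 0.3},
    ("LOC", 1): {"PER": 0.01, "LOC": 0.95, "O": 0.04},
    ("O", 1): {"PER": 0.1, "LOC": 0.6, "O": 0.3},
}
print(viterbi(2, ["PER", "LOC", "O"], lambda prev, i: TABLE[(prev, i)]))
# ['LOC', 'LOC']
```

A greedy left-to-right pass would commit to PER for “Grace” (0.5 > 0.4) and end with PER LOC (probability 0.25), while joint decoding finds LOC LOC (0.4 × 0.95 = 0.38).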

Character-Based CMM

Final extra features:

Letter-type patterns for each word

e.g. United → Xx, 12-month → d-x, etc.

Conjunction features

e.g. previous state and current signature

Repeated last words of multi-word names

e.g. Jones after having seen Doug Jones

… and a few more
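The letter-type patterns can be approximated by mapping each character to a class (uppercase X, lowercase x, digit d, anything else kept as-is) and collapsing repeated classes. This reproduces the two examples above (United → Xx, 12-month → d-x), but the exact class inventory the authors used is an assumption here, as are the feature-string format and the earlier_names input for the repeated-last-word check.

```python
def letter_pattern(word: str) -> str:
    """Map characters to classes and collapse consecutive repeats.

    Uppercase -> X, lowercase -> x, digit -> d; other characters are kept.
    Assumed scheme: it reproduces "United" -> "Xx" and "12-month" -> "d-x".
    """
    out: list[str] = []
    for ch in word:
        if ch.isupper():
            cls = "X"
        elif ch.islower():
            cls = "x"
        elif ch.isdigit():
            cls = "d"
        else:
            cls = ch
        if not out or out[-1] != cls:
            out.append(cls)
    return "".join(out)


def signature_conjunction(prev_state: str, word: str) -> str:
    """Conjoin the previous label with the current word's letter pattern."""
    return f"s-1={prev_state}&sig={letter_pattern(word)}"


def repeated_last_word(word: str, earlier_names: list[list[str]]) -> bool:
    """True if word ended a multi-word name seen earlier in the document."""
    return any(len(name) > 1 and name[-1] == word for name in earlier_names)


print(letter_pattern("United"), letter_pattern("12-month"))  # Xx d-x
print(signature_conjunction("PER", "Jones"))  # s-1=PER&sig=Xx
print(repeated_last_word("Jones", [["Doug", "Jones"]]))  # True
```

Collapsing runs keeps the pattern vocabulary small, so capitalized words of any length share the signature Xx, which generalizes to unseen names much better than the raw word.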

Description: Added Features (scored by Overall F1 on English Dev.)

More Context: ‹w-1, w0›, ‹w0, w1›, ‹t-1, t0›, ‹t0, w1›

Simple Sequence: s-1, ‹s-1, t-1, t0›

More Sequence: ‹s-2, s-1›, ‹s-2, s-1, t-1, t0›

Final: misc. extra features
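The sequence features in this table condition on labels already assigned to earlier tokens. A sketch of the four templates, assuming prev_labels holds the labels for positions 0..i-1 and "<S>" pads positions before the sentence start (both hypothetical conventions). With s-2 features, exact joint decoding has to track pairs of previous labels rather than single labels.

```python
def sequence_features(prev_labels, tags, i):
    """Label-history features for position i of a CMM.

    prev_labels holds the labels already assigned to positions 0..i-1;
    "<S>" pads positions before the sentence start (hypothetical convention).
    """
    def s(k):  # label at offset k (k < 0) from position i
        j = i + k
        return prev_labels[j] if 0 <= j < len(prev_labels) else "<S>"

    def t(k):  # POS tag at offset k from position i
        j = i + k
        return tags[j] if 0 <= j < len(tags) else "<S>"

    return [
        f"s-1={s(-1)}",                                    # Simple Sequence
        f"s-1,t-1,t0={s(-1)},{t(-1)},{t(0)}",
        f"s-2,s-1={s(-2)},{s(-1)}",                        # More Sequence
        f"s-2,s-1,t-1,t0={s(-2)},{s(-1)},{t(-1)},{t(0)}",
    ]


print(sequence_features(["LOC"], ["NNP", "NNP"], 1))
```

For “Road” in “Grace Road” with “Grace” already labeled LOC, the feature s-1=LOC lets the classifier learn that a capitalized word following a location label tends to continue the same entity.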

Final Results

The drop from English dev to test is largely due to inconsistent labeling

The lack of capitalization cues in German hurts recall more than precision, because the maxent classifier is precision-biased when faced with weak evidence

[Figure: Precision, Recall, and F1 on Eng Dev, Eng Test, Ger Dev, and Ger Test]

Conclusions

Character substrings are valuable and underexploited model features

Named entities are internally quite descriptive

25-30% error reduction vs. word-level models

Discriminative maxent models allow productive feature engineering

30% error reduction vs. basic model

What distinguishes our approach?

More and better features

Regularization is crucial for preventing overfitting
