#3 Information extraction from news to conversations

Will Radford – Xerox Research Centre Europe
15th October 2015

Outline

1.  Named entity recognition using Conditional Random Fields (CRFs)
    •  Features
    •  Learning and evaluation
2.  Knowledge Base Tags for NER
3.  Introduction to dialogue state tracking and why it’s hard

NAMED ENTITY RECOGNITION USING CONDITIONAL RANDOM FIELDS

The task

• Given a sequence of tokens
    •  X is a sequence of tokens
    •  xi is a token (e.g. Will) at the ith position in the sequence

• Return a structured label
    •  Y is a sequence of tags
    •  yi is a tag (e.g. PER) at the ith position in the sequence

X: I saw Paris in New York with her dog .

Y: O O B-PER O B-LOC I-LOC O O O O
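
The tags in Y follow a B-/I-/O scheme, so entity spans can be read back from the tag sequence. A minimal sketch, assuming a helper named `bio_to_spans` (the name and the tuple layout are illustrative, not from the talk), applied to the example above:

```python
def bio_to_spans(tokens, tags):
    """Collect (type, start, end, text) spans from B-/I-/O tags; end is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # a sentinel "O" closes any final span
        # An I- tag continues the open span only if its type matches.
        inside = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not inside:
            spans.append((etype, start, i, " ".join(tokens[start:i])))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


tokens = "I saw Paris in New York with her dog .".split()
tags = "O O B-PER O B-LOC I-LOC O O O O".split()
spans = bio_to_spans(tokens, tags)
print(spans)
```

An I- tag with no matching open span is simply ignored here; other conventions exist for repairing such sequences.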

The model

• Conditional Random Fields
    •  Linear chain, undirected graphical model
• P(Y|X) → Probability of tags given tokens
• Features of token xi can be joined with
    •  yi → This token, this tag
    •  (yi, yi-1) → This token, this and the last tag

• Convex optimization allows us to learn optimal feature weights
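
To make P(Y|X) concrete, the toy sketch below scores a tag sequence by summing the weights of the features that fire, then normalizes by brute force over every possible tag sequence. The tag set, the weights and the feature names are invented for illustration. A real CRF learns the weights from training data and replaces the enumeration with dynamic programming.

```python
import itertools
import math

TAGS = ["O", "B-PER"]

# Hypothetical weights, keyed by (token feature, tag) and (previous tag, tag).
EMIT = {("init_cap", "B-PER"): 2.0, ("lower", "O"): 1.5}
TRANS = {("B-PER", "B-PER"): -1.0}


def token_feature(token):
    """A single toy feature per token: does it start with a capital?"""
    return "init_cap" if token[:1].isupper() else "lower"


def score(tokens, tags):
    """Unnormalized log-score: the sum of the weights of the features that fire."""
    total, prev = 0.0, "<s>"
    for token, tag in zip(tokens, tags):
        total += EMIT.get((token_feature(token), tag), 0.0)  # this token, this tag
        total += TRANS.get((prev, tag), 0.0)                 # last tag, this tag
        prev = tag
    return total


def prob(tokens, tags):
    """P(Y|X) = exp(score(X, Y)) divided by the sum of exp(score(X, Y')) over all Y'."""
    z = sum(math.exp(score(tokens, list(y)))
            for y in itertools.product(TAGS, repeat=len(tokens)))
    return math.exp(score(tokens, tags)) / z


tokens = ["saw", "Paris"]
for y in itertools.product(TAGS, repeat=len(tokens)):
    print(y, round(prob(tokens, list(y)), 3))
```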

Data

• Assumption: the data we train on reflects our application setting

Note: Your Mileage May Vary…

Train Development Test

FEATURES

What’s in a name?

• Particular words
• Particular contexts
• Particular appearance
• …

We want discriminative, general features

Content and context

Token   Feature   Value
I       xi-2      xi-2_I
saw     xi-1      xi-1_saw
Paris   xi        xi_Paris
in      xi+1      xi+1_in
New     xi+2      xi+2_New
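
The window features in this table can be generated mechanically. A minimal sketch, assuming a helper named `window_features` and a window of two tokens either side (both are illustrative choices that reproduce the table for the focus token Paris):

```python
def window_features(tokens, i, size=2):
    """Name each token in a +/- size window by its offset from position i."""
    feats = []
    for offset in range(-size, size + 1):
        j = i + offset
        if 0 <= j < len(tokens):  # skip positions that fall outside the sentence
            name = "xi" if offset == 0 else f"xi{offset:+d}"
            feats.append(f"{name}_{tokens[j]}")
    return feats


tokens = "I saw Paris in New York with her dog .".split()
print(window_features(tokens, 2))
```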

Word “shapes”

Token   Value
I       is_cap, init_cap
saw     (none)
Paris   init_cap
in      (none)
New     init_cap
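
A short sketch of how such shape features might be computed. It assumes that `is_cap` means "every letter is a capital" and `init_cap` means "the first letter is a capital", which is consistent with the table but is not spelled out on the slide:

```python
def shape_features(token):
    """Return the capitalization features that fire for one token."""
    feats = []
    if token.isupper():       # assumed meaning of is_cap: all letters are capitals
        feats.append("is_cap")
    if token[:1].isupper():   # assumed meaning of init_cap: first letter is a capital
        feats.append("init_cap")
    return feats


for token in ["I", "saw", "Paris", "in", "New"]:
    print(token, shape_features(token))
```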

Matching lists of words (gazetteers)

Token   Value
I       (none)
saw     (none)
Paris   LOC_match, PER_match
in      (none)
New     LOC_match, ORG_match
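
Gazetteer matching amounts to a set lookup per entity type. In the sketch below the word lists are made up so that the output reproduces the table. Real gazetteers would be much larger and are often matched as multi-word phrases.

```python
# Hypothetical word lists, chosen only to reproduce the table above.
GAZETTEERS = {
    "LOC": {"Paris", "New", "York"},
    "PER": {"Paris"},
    "ORG": {"New"},
}


def gazetteer_features(token):
    """Emit one <TYPE>_match feature for every list that contains the token."""
    return [f"{etype}_match" for etype, words in GAZETTEERS.items() if token in words]


for token in ["I", "saw", "Paris", "in", "New"]:
    print(token, gazetteer_features(token))
```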

Other representations of words

Token   Brown cluster
I       111000111
saw     10110110
Paris   01110
in      11010110
New     101001

Note: word2vec embeddings here too…

Brown clusters at different levels

Cluster prefix   Most frequent words
101011           new, U.S., European, public, major, local, former, current, British, German
1010.*           new, other, U.S., European, foreign, economic, most, President, financial, political, …
101.*            this, more, new, two, last, than, first, other, next, U.S., …
10.*             the, a, its, an, this, their, his, more, new, two, …
1.*              the, ., , , to, of, in, a, and, ", said, …
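
Because a shorter prefix of the bit string names a coarser cluster, one word can contribute a feature at several granularities. A sketch, assuming the bit strings from the earlier Brown cluster table, a hypothetical feature naming scheme and an arbitrary choice of prefix lengths:

```python
# Bit strings copied from the token table earlier in the talk.
BROWN = {
    "I": "111000111",
    "saw": "10110110",
    "Paris": "01110",
    "in": "11010110",
    "New": "101001",
}


def brown_prefix_features(token, lengths=(1, 2, 4, 6)):
    """Emit one feature per prefix length; prefix lengths here are an assumption."""
    path = BROWN.get(token)
    if path is None:  # word not seen when the clusters were built
        return []
    return [f"brown_pref_{n}_{path[:n]}" for n in lengths if n <= len(path)]


print(brown_prefix_features("New"))
print(brown_prefix_features("Paris"))
```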

Putting it together

Combine different features and tags
• xi=New:
    •  yi-1, yi
        •  O, B-LOC
    •  features:
        •  LOC_match
        •  init_cap
        •  brown_cluster_i-1_pref_1_value_1
        •  xi-1_in
        •  …
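
One way to picture the combination: each observed feature is paired with the current tag, and with the previous and current tags together, and each resulting key gets its own weight in the model. A sketch with a hypothetical helper `joined_features`:

```python
def joined_features(observed, prev_tag, tag):
    """Pair each observed feature with the tag and with the tag bigram."""
    keys = []
    for feat in observed:
        keys.append((feat, tag))            # this token, this tag
        keys.append((feat, prev_tag, tag))  # this token, this and the last tag
    return keys


# Features observed at xi=New, with yi-1=O and yi=B-LOC as on the slide.
observed = ["LOC_match", "init_cap", "brown_cluster_i-1_pref_1_value_1"]
keys = joined_features(observed, "O", "B-LOC")
for key in keys:
    print(key)
```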

LEARNING AND EVALUATING

When to use each dataset

• Learn parameters (feature weights) on train
• Scan for optimal hyper-parameters on development
    •  regularization type
    •  regularization hyper-parameter
• Report evaluation and analyze development results
• Very infrequently evaluate on test

Regularize to combat overfitting

L1 → force features to zero, sparser model
L2 → trim all weights

[Figure: two plots, each with a vertical axis from 0 to 1 and categories a, b, c, d]
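
The contrast between the two penalties can be seen in the update each one induces on a weight vector. The sketch below uses the standard proximal steps: soft-thresholding for an L1 penalty and uniform scaling for a squared L2 penalty. The weights for a, b, c, d and the strength value are made up, and this is an illustration of the effect rather than the training procedure used in the talk.

```python
def l1_step(weights, strength):
    """Soft-thresholding: move each weight toward zero by `strength`, stopping at zero."""
    out = {}
    for name, w in weights.items():
        magnitude = abs(w) - strength
        if magnitude <= 0:
            out[name] = 0.0  # small weights are removed entirely
        else:
            out[name] = magnitude if w > 0 else -magnitude
    return out


def l2_step(weights, strength):
    """Scale every weight by the same factor, so none becomes exactly zero."""
    return {name: w / (1.0 + strength) for name, w in weights.items()}


weights = {"a": 0.9, "b": 0.15, "c": -0.05, "d": 0.6}  # hypothetical values
print({k: round(v, 3) for k, v in l1_step(weights, 0.2).items()})
print({k: round(v, 3) for k, v in l2_step(weights, 0.2).items()})
```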

Evaluation

• Strict
    •  Span correct
    •  Type correct
• Relaxed
    •  Per-token accuracy

• Precision
• Recall
• F1
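
The difference between strict and relaxed scoring shows up on a single mistake. In the sketch below the prediction tags Paris as LOC instead of PER. Spans are written as (type, start, end) tuples, and the helper names are illustrative.

```python
def prf(gold_spans, pred_spans):
    """Strict scoring: a predicted span counts only if boundaries and type both match."""
    gold, pred = set(gold_spans), set(pred_spans)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


def token_accuracy(gold_tags, pred_tags):
    """Relaxed scoring: the fraction of tokens whose tag matches."""
    matches = sum(g == p for g, p in zip(gold_tags, pred_tags))
    return matches / len(gold_tags)


gold_tags = "O O B-PER O B-LOC I-LOC O O O O".split()
pred_tags = "O O B-LOC O B-LOC I-LOC O O O O".split()  # Paris mistyped as LOC
gold_spans = [("PER", 2, 3), ("LOC", 4, 6)]
pred_spans = [("LOC", 2, 3), ("LOC", 4, 6)]

print(prf(gold_spans, pred_spans))
print(token_accuracy(gold_tags, pred_tags))
```

Per-token accuracy stays high because most tokens are O, while the strict span scores drop to one half.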

More data, better score…

[Figure: plot for this slide; only the axis ticks survive (0 to 80 and 0 to 250)]

KNOWLEDGE BASE TAG GAZETTEERS

What if you can cheat a little?

Document metadata tells you what entities are there
•  Standardized name
•  Type
•  Pointer to knowledge base
•  Not aligned to the text
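
Since the tags are not aligned to the text, one simple way to use them is as a document-specific gazetteer: look for each tag's name in the token sequence and mark the tokens it covers. The sketch below is an assumption about how this could be done, not the method from the talk. The tag records, the `kb_` feature names and the KB pointers are hypothetical, and exact string matching is a simplification because a standardized name may differ from the surface form in the text.

```python
def kb_tag_features(tokens, kb_tags):
    """Mark tokens covered by a KB tag name; longer names are matched first."""
    feats = [[] for _ in tokens]
    covered = set()
    by_length = sorted(kb_tags, key=lambda tag: -len(tag["name"].split()))
    for tag in by_length:
        words = tag["name"].split()
        n = len(words)
        for start in range(len(tokens) - n + 1):
            span = range(start, start + n)
            if tokens[start:start + n] == words and not covered & set(span):
                for k in span:
                    prefix = "B" if k == start else "I"
                    feats[k].append(f"kb_{prefix}-{tag['type']}")
                covered.update(span)
    return feats


tokens = "I saw Paris in New York with her dog .".split()
kb_tags = [  # hypothetical document-level metadata
    {"name": "Paris", "type": "PER", "kb": "kb:hypothetical-1"},
    {"name": "New York", "type": "LOC", "kb": "kb:hypothetical-2"},
]
feats = kb_tag_features(tokens, kb_tags)
for token, f in zip(tokens, feats):
    print(token, f)
```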

KB tags at training and test time

Where do they come from?

• Existing indexing systems (e.g. New York Times, Bloomberg)
• Document-level annotation for busy knowledge workers

CoNLL 2003 English NER testa results

How many sentences do annotators have to check for KB tags?

INFORMATION EXTRACTION FOR CONVERSATIONS

Why conversational systems?

One interaction loop

1.  Natural language understanding
    •  What are you talking about?
    •  What is your intent?
2.  Dialogue management
    •  What should the machine do next?
3.  Natural language generation
    •  How do we express what we want to say?

Dialogue State Tracking Challenge

• To hold a conversation, we need to follow one
• Can we track what the user is talking about as a conversation evolves?
• 4th challenge:
    •  Human-human dialogues
    •  Fixed KB
    •  Fairly open domain

Strictly Named Entity Linking, since we need to identify the KB entity

Data

[Figure: a timeline of utterances grouped into segments with a given topic; the state data to track is a set of (Topic, Slot, Value) triples]

Example

Transcript: And then when the people in the past come to Singapore.

Transcript: %Uh they all live near the Singapore River.
Detected state:
    (Attraction, NEIGHBOURHOOD, Singapore River)
    (Attraction, PLACE, Singapore River)

Transcript: So Chinatown is actually just next to the Singapore River.
Detected state:
    (Attraction, NEIGHBOURHOOD, Chinatown)
    (Attraction, NEIGHBOURHOOD, Singapore River)
    (Attraction, PLACE, Chinatown)
    (Attraction, PLACE, Singapore River)

Expected:
    (Attraction, INFO, History)
    (Attraction, NEIGHBOURHOOD, Chinatown)
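
Treating each state as a set of (Topic, Slot, Value) triples makes the gap between detected and expected state easy to quantify. The sketch below compares the final detected state with the expected one using set precision and recall. This is one straightforward comparison, not necessarily the challenge's official metric.

```python
detected = {
    ("Attraction", "NEIGHBOURHOOD", "Chinatown"),
    # Written as NEIGHBOURHOOD/Singapore River on the slide; read here as a
    # (Topic, Slot, Value) triple, which is an assumption.
    ("Attraction", "NEIGHBOURHOOD", "Singapore River"),
    ("Attraction", "PLACE", "Chinatown"),
    ("Attraction", "PLACE", "Singapore River"),
}
expected = {
    ("Attraction", "INFO", "History"),
    ("Attraction", "NEIGHBOURHOOD", "Chinatown"),
}

correct = detected & expected
precision = len(correct) / len(detected)
recall = len(correct) / len(expected)
missed = expected - detected      # expected triples the tracker did not produce
spurious = detected - expected    # detected triples that were not expected

print("precision", precision, "recall", recall)
print("missed", sorted(missed))
print("spurious", sorted(spurious))
```

The missed triple is the INFO slot, which has no explicit mention in the transcript.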

Challenges: text → ontology

•  Explicit mentions:
    •  Typos, aliases
•  Explicit referring mentions:
    •  “that place”, “the first one”
•  Implicit mentions:
    •  Info slots are “topical”
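
For explicit mentions, typos and aliases can be partly handled with an alias table plus approximate string matching against the ontology values. The sketch below uses `difflib.get_close_matches` from the Python standard library. The ontology values, the alias table and the cutoff are hypothetical choices for illustration.

```python
import difflib

ONTOLOGY = ["Chinatown", "Singapore River"]   # hypothetical ontology values
ALIASES = {"the river": "Singapore River"}    # hypothetical alias table


def link_mention(mention, cutoff=0.8):
    """Map a mention to an ontology value via aliases, then fuzzy matching."""
    key = mention.lower()
    if key in ALIASES:
        return ALIASES[key]
    by_lower = {value.lower(): value for value in ONTOLOGY}
    close = difflib.get_close_matches(key, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None


for mention in ["Chinatwon", "the river", "that place"]:
    print(mention, "->", link_mention(mention))
```

A referring mention such as "that place" finds no match, since resolving it depends on the dialogue history rather than on the string itself.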

With history at the same time…

What does the future hold?

• Conversational interactions everywhere
• Smart virtual assistants
    •  Personal
    •  Enterprise

Thanks

Questions?

Abstract

One part of information extraction is the recognition of named entities in text. This has long been studied in the research community and there are mature approaches to solving the problem. This talk is a deeper dive into the topic. There won’t be any equations, but you should get an idea of what kinds of features people use and why. I’ll then discuss a novel application setting that shows how you can effectively use other data for supervision. Finally, I’ll briefly discuss the challenges of information extraction in a conversational context, and talk about the shared tasks the research community is using to explore natural language understanding.
