#3 Information extraction from news to conversations

Information extraction from news to conversations
Will Radford – Xerox Research Centre Europe
15th October 2015

Transcript of #3 Information extraction from news to conversations

Page 1: #3 Information extraction from news to conversations

Information extraction from news to conversations Will Radford – Xerox Research Centre Europe 15th October 2015

Page 2: #3 Information extraction from news to conversations

Outline

1.  Named entity recognition using Conditional Random Fields (CRFs)
  •  Features
  •  Learning and evaluation
2.  Knowledge Base Tags for NER
3.  Introduction to dialogue state tracking and why it’s hard

Page 3: #3 Information extraction from news to conversations

NAMED ENTITY RECOGNITION USING CONDITIONAL RANDOM FIELDS

Page 4: #3 Information extraction from news to conversations

The task

• Given a sequence of tokens
  •  X is a sequence of tokens
  •  xi is a token (e.g. Will) at the ith position in the sequence
• Return a structured label
  •  Y is a sequence of tags
  •  yi is a tag (e.g. PER) at the ith position in the sequence

Page 5: #3 Information extraction from news to conversations

X: I saw Paris in New York with her dog .

Y: O O B-PER O B-LOC I-LOC O O O O
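The B-/I-/O tags above encode entity spans: B- opens a span, I- continues it, O is outside any entity. A minimal sketch of reading the spans back out of a tag sequence follows; the function name `bio_to_spans` is made up for illustration and is not from the talk.

```python
def bio_to_spans(tokens, tags):
    """Collect (type, text) spans from a BIO-tagged token sequence."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type
        )
        if starts_new:
            # Close any open span, then open a new one.
            # A stray I- tag with no matching open span is treated as a start.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:
            # "O": close any open span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


tokens = "I saw Paris in New York with her dog .".split()
tags = "O O B-PER O B-LOC I-LOC O O O O".split()
spans = bio_to_spans(tokens, tags)
print(spans)  # [('PER', 'Paris'), ('LOC', 'New York')]
```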

Page 6: #3 Information extraction from news to conversations

The model

• Conditional Random Fields
  •  Linear chain, undirected graphical model
• P(Y|X) → Probability of tags given tokens
• Features of token xi can be joined with
  •  yi → This token, this tag
  •  (yi, yi-1) → This token, this and the last tag

• Convex optimization allows us to learn optimal feature weights
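To make the two kinds of feature concrete, here is a rough sketch of how a linear-chain model scores and decodes a tag sequence. It is not the talk's implementation: the weights are hand-set and hypothetical (a real CRF learns them), the tag set is cut down to three tags, and the code only finds the highest-scoring sequence with Viterbi rather than computing the normalized probability P(Y|X).

```python
from collections import defaultdict

TAGS = ["O", "B-PER", "B-LOC"]

# Hypothetical weights for (feature, tag) pairs: "this token, this tag".
state_w = defaultdict(float, {
    ("lower", "O"): 2.0,
    ("init_cap", "B-PER"): 1.0,
    ("init_cap", "B-LOC"): 1.0,
    ("prev_saw", "B-PER"): 1.5,
    ("prev_in", "B-LOC"): 1.5,
})
# Hypothetical weights for (previous tag, tag) pairs: "this and the last tag".
trans_w = defaultdict(float, {
    ("B-PER", "B-PER"): -1.0,
    ("B-LOC", "B-LOC"): -1.0,
})


def features(tokens, i):
    feats = ["init_cap" if tokens[i][0].isupper() else "lower"]
    if i > 0:
        feats.append("prev_" + tokens[i - 1])
    return feats


def viterbi(tokens):
    """Return the tag sequence with the highest total score."""
    # best[i][tag] = (score of the best path ending in tag at i, previous tag)
    best = [{t: (sum(state_w[f, t] for f in features(tokens, 0)), None)
             for t in TAGS}]
    for i in range(1, len(tokens)):
        column = {}
        for tag in TAGS:
            emit = sum(state_w[f, tag] for f in features(tokens, i))
            prev = max(TAGS, key=lambda p: best[i - 1][p][0] + trans_w[p, tag])
            score = best[i - 1][prev][0] + trans_w[prev, tag] + emit
            column[tag] = (score, prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(tokens) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


decoded = viterbi(["we", "saw", "Paris", "in", "London"])
print(decoded)  # ['O', 'O', 'B-PER', 'O', 'B-LOC']
```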

Page 7: #3 Information extraction from news to conversations

Data

• Assumption: the data we train on reflects our application setting

Note: Your Mileage May Vary…

[Figure: data split into Train, Development and Test]

Page 8: #3 Information extraction from news to conversations

FEATURES

Page 9: #3 Information extraction from news to conversations

What’s in a name?

• Particular words • Particular contexts • Particular appearance • …

We want discriminative, general features

Page 10: #3 Information extraction from news to conversations

Content and context

Token   Feature   Value
I       xi-2      xi-2_I
saw     xi-1      xi-1_saw
Paris   xi        xi_Paris
in      xi+1      xi+1_in
New     xi+2      xi+2_New
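The window features in this table can be generated mechanically. A small sketch, assuming a window of two tokens either side; the helper name `context_features` is illustrative only.

```python
def context_features(tokens, i, window=2):
    """Features for the token at position i and its neighbours."""
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):  # skip positions past the sentence edges
            name = "xi" if offset == 0 else f"xi{offset:+d}"
            feats.append(f"{name}_{tokens[j]}")
    return feats


tokens = "I saw Paris in New York with her dog .".split()
paris_feats = context_features(tokens, 2)
print(paris_feats)
# ['xi-2_I', 'xi-1_saw', 'xi_Paris', 'xi+1_in', 'xi+2_New']
```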

Page 11: #3 Information extraction from news to conversations

Word “shapes”

Token   Value
I       is_cap, init_cap
saw     -
Paris   init_cap
in      -
New     init_cap
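A sketch of how these shape features might be computed. The slide does not define them, so this assumes `is_cap` means the whole token is upper case and `init_cap` means the first character is upper case, which reproduces the table above.

```python
def shape_features(token):
    feats = []
    if token.isupper():       # assumption: is_cap = every cased letter is upper case
        feats.append("is_cap")
    if token[:1].isupper():   # assumption: init_cap = first character is upper case
        feats.append("init_cap")
    return feats


for token in ["I", "saw", "Paris", "in", "New"]:
    print(token, shape_features(token))
```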

Page 12: #3 Information extraction from news to conversations

Matching lists of words (gazetteers)

Token   Value
I       -
saw     -
Paris   LOC_match, PER_match
in      -
New     LOC_match, ORG_match
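One simple way to get token-level matches like these is to index every word of every gazetteer entry by entity type. The lists below are tiny, hypothetical stand-ins chosen to reproduce the table; real gazetteers are far larger and the talk does not say how matching was done.

```python
# Hypothetical gazetteers, for illustration only.
GAZETTEERS = {
    "LOC": ["Paris", "New York"],
    "PER": ["Paris Hilton"],
    "ORG": ["New York Times"],
}


def build_index(gazetteers):
    """Map each word to the set of entity types whose lists contain it."""
    index = {}
    for entity_type, entries in gazetteers.items():
        for entry in entries:
            for word in entry.split():
                index.setdefault(word, set()).add(entity_type)
    return index


def gazetteer_features(token, index):
    return [f"{entity_type}_match" for entity_type in sorted(index.get(token, ()))]


index = build_index(GAZETTEERS)
for token in ["I", "saw", "Paris", "in", "New"]:
    print(token, gazetteer_features(token, index))
```

Matching single words is deliberately loose: "New" fires for both the location and the organisation entry, and the model is left to weigh that evidence against the other features.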

Page 13: #3 Information extraction from news to conversations

Other representations of words

Token   Brown cluster
I       111000111
saw     10110110
Paris   01110
in      11010110
New     101001

Note: word2vec embeddings here too…

Page 14: #3 Information extraction from news to conversations

Brown clusters at different levels

Cluster prefix   Most frequent words
101011           new, U.S., European, public, major, local, former, current, British, German
1010.*           new, other, U.S., European, foreign, economic, most, President, financial, political,
101.*            this, more, new, two, last, than, first, other, next, U.S.,
10.*             the, a, its, an, this, their, his, more, new, two,
1.*              the, ., , , to, of, in, a, and, ", said,
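Because a Brown cluster is a path in a binary tree, shorter prefixes of the bit string give coarser clusters. A sketch of turning one cluster ID into features at several levels; the bit strings are the ones shown for the example tokens earlier, while the prefix lengths and the feature naming are assumptions.

```python
# Cluster IDs copied from the example tokens; a real table covers a full vocabulary.
BROWN = {
    "I": "111000111",
    "saw": "10110110",
    "Paris": "01110",
    "in": "11010110",
    "New": "101001",
}


def brown_prefix_features(token, lengths=(1, 2, 3, 4)):
    """One feature per prefix length; unknown tokens get no features."""
    bits = BROWN.get(token)
    if bits is None:
        return []
    return [f"brown_pref_{n}_value_{bits[:n]}" for n in lengths if n <= len(bits)]


new_feats = brown_prefix_features("New")
print(new_feats)
# ['brown_pref_1_value_1', 'brown_pref_2_value_10',
#  'brown_pref_3_value_101', 'brown_pref_4_value_1010']
```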

Page 15: #3 Information extraction from news to conversations

Putting it together

Combine different features and tags
• xi=New:
  •  yi-1, yi
    •  O, B-LOC
  •  features:
    •  LOC_match
    •  init_cap
    •  brown_cluster_i-1_pref_1_value_1
    •  xi+1_in
    •  …

Page 16: #3 Information extraction from news to conversations

LEARNING AND EVALUATING

Page 17: #3 Information extraction from news to conversations

When to use each dataset

• Learn parameters (feature weights) on train
• Scan for optimal hyper-parameters on development
  •  regularization type
  •  regularization hyper-parameter
• Report evaluation and analyze development results
• Very infrequently evaluate on test

Page 18: #3 Information extraction from news to conversations

Regularize to combat overfitting

L1 → force features to zero, sparser model
L2 → trim all weights

[Figure: two plots, each over items a, b, c, d with an axis from 0 to 1]
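The difference between the two penalties shows up in the shrinkage step each one induces. A sketch with made-up weights for four features a to d: the L1 step is soft-thresholding (the proximal operator of lam * |w|) and the L2 step is uniform scaling (the proximal operator of (lam / 2) * w^2). The weights and `lam` are invented for illustration and are not read off the figure.

```python
def l1_shrink(w, lam):
    """Soft-threshold: move w towards zero by lam, stopping at zero."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def l2_shrink(w, lam):
    """Scale w towards zero; it never becomes exactly zero unless it started there."""
    return w / (1.0 + lam)


weights = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.05}  # hypothetical
lam = 0.1

l1 = {name: l1_shrink(w, lam) for name, w in weights.items()}
l2 = {name: l2_shrink(w, lam) for name, w in weights.items()}
print("L1:", l1)  # c and d are forced to exactly zero
print("L2:", l2)  # every weight is trimmed, none is zero
```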

Page 19: #3 Information extraction from news to conversations

Evaluation

• Strict
  •  Span correct
  •  Type correct
• Relaxed
  •  Per-token accuracy
• Precision
• Recall
• F1
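A sketch contrasting the strict and relaxed views on the earlier example sentence, with a hypothetical system output that finds both spans but mislabels Paris as LOC. Spans are (start, end, type) with an exclusive end; that representation is an assumption, not something the slides specify.

```python
def precision_recall_f1(gold, predicted):
    """Strict scoring: a predicted span counts only if span and type both match."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def token_accuracy(gold_tags, predicted_tags):
    """Relaxed scoring: fraction of tokens whose tag is correct."""
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


gold_spans = [(2, 3, "PER"), (4, 6, "LOC")]
pred_spans = [(2, 3, "LOC"), (4, 6, "LOC")]  # hypothetical output, wrong type for Paris

gold_tags = "O O B-PER O B-LOC I-LOC O O O O".split()
pred_tags = "O O B-LOC O B-LOC I-LOC O O O O".split()

strict = precision_recall_f1(gold_spans, pred_spans)
relaxed = token_accuracy(gold_tags, pred_tags)
print(strict)   # (0.5, 0.5, 0.5)
print(relaxed)  # 0.9
```

One wrong type costs half the strict score here but only a tenth of the per-token accuracy, since most tokens are O.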

Page 20: #3 Information extraction from news to conversations

More data, better score…

[Figure: plot with axis ticks running 0 to 80 and 0 to 250]

Page 21: #3 Information extraction from news to conversations

KNOWLEDGE BASE TAG GAZETTEERS

Page 22: #3 Information extraction from news to conversations

What if you can cheat a little?

Document metadata tells you what entities are there
•  Standardized name
•  Type
•  Pointer to knowledge base
•  Not aligned to the text

Page 23: #3 Information extraction from news to conversations

KB tags at training and test time

Page 24: #3 Information extraction from news to conversations

Where do they come from?

• Existing indexing systems (e.g. New York Times, Bloomberg)
• Document-level annotation for busy knowledge workers

Page 25: #3 Information extraction from news to conversations

CoNLL 2003 English NER testa results

Page 26: #3 Information extraction from news to conversations

How many sentences do annotators have to check for KB tags?

Page 27: #3 Information extraction from news to conversations

INFORMATION EXTRACTION FOR CONVERSATIONS

Page 28: #3 Information extraction from news to conversations

Why conversational systems?

Page 29: #3 Information extraction from news to conversations

One interaction loop

1.  Natural language understanding
  •  What are you talking about?
  •  What is your intent?
2.  Dialogue management
  •  What should the machine do next?
3.  Natural language generation
  •  How do we express what we want to say?

Page 30: #3 Information extraction from news to conversations

Dialogue State Tracking Challenge

• To hold a conversation, we need to follow one
• Can we track what the user is talking about as a conversation evolves?
• 4th challenge:
  •  Human-human dialogues
  •  Fixed KB
  •  Fairly open domain

Strictly Named Entity Linking, since we need to identify the KB entity

Page 31: #3 Information extraction from news to conversations

Data

[Figure: a timeline of utterances grouped into segments with a given topic; the state data to track are (Topic, Slot, Value) triples attached to utterances]

Page 32: #3 Information extraction from news to conversations

Example

Transcript: And then when the people in the past come to Singapore.
Transcript: %Uh they all live near the Singapore River.
Detected state:
  (Attraction, NEIGHBOURHOOD, Singapore River)
  (Attraction, PLACE, Singapore River)

Transcript: So Chinatown is actually just next to the Singapore River.
Detected state:
  (Attraction, NEIGHBOURHOOD, Chinatown)
  (Attraction, NEIGHBOURHOOD, Singapore River)
  (Attraction, PLACE, Chinatown)
  (Attraction, PLACE, Singapore River)

Expected:
  (Attraction, INFO, History)
  (Attraction, NEIGHBOURHOOD, Chinatown)
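Treating each state as a set of (Topic, Slot, Value) triples makes the gap between detected and expected easy to inspect. A sketch using the triples from the last utterance of the example; the set-based comparison is only an illustration and is not claimed to be the challenge's official scoring.

```python
# Triples transcribed from the example; this assumes the second detected entry
# means slot NEIGHBOURHOOD with value Singapore River.
detected = {
    ("Attraction", "NEIGHBOURHOOD", "Chinatown"),
    ("Attraction", "NEIGHBOURHOOD", "Singapore River"),
    ("Attraction", "PLACE", "Chinatown"),
    ("Attraction", "PLACE", "Singapore River"),
}
expected = {
    ("Attraction", "INFO", "History"),
    ("Attraction", "NEIGHBOURHOOD", "Chinatown"),
}

correct = detected & expected
spurious = detected - expected  # detected but not wanted
missed = expected - detected    # wanted but not detected

precision = len(correct) / len(detected)
recall = len(correct) / len(expected)

print("correct:", sorted(correct))
print("spurious:", sorted(spurious))
print("missed:", sorted(missed))
print(precision, recall)  # 0.25 0.5
```

The missed triple is the implicit one: nothing in the utterance says "history", which is why matching mention strings against the ontology is not enough.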

Page 33: #3 Information extraction from news to conversations

Challenges: text → ontology

•  Explicit mentions:
  •  Typos, aliases
•  Explicit referring mentions:
  •  “that place”, “the first one”
•  Implicit mentions:
  •  Info slots are “topical”

With history at the same time…

Page 34: #3 Information extraction from news to conversations

What does the future hold?

• Conversational interactions everywhere
• Smart virtual assistants
  •  Personal
  •  Enterprise

Page 35: #3 Information extraction from news to conversations

Thanks

Questions?


Abstract

One part of information extraction is the recognition of named entities in text. This has long been studied in the research community and there are mature approaches to solving the problem. This talk is a deeper dive into the topic. There won’t be any equations, but you should get an idea of what kinds of features people use and why. I’ll then discuss a novel application setting that shows how you can effectively use other data for supervision. Finally, I’ll briefly discuss the challenges of information extraction in a conversational context, and talk about the shared tasks the research community is using to explore natural language understanding.