#3 Information extraction from news to conversations
Uploaded by berlin-language-technology (Category: Technology)
Transcript of #3 Information extraction from news to conversations
Information extraction from news to conversations
Will Radford, Xerox Research Centre Europe
15th October 2015
Outline
1. Named entity recognition using Conditional Random Fields (CRFs)
   • Features
   • Learning and evaluation
2. Knowledge Base Tags for NER
3. Introduction to dialogue state tracking and why it’s hard
NAMED ENTITY RECOGNITION USING CONDITIONAL RANDOM FIELDS
The task
• Given a sequence of tokens
  • X is a sequence of tokens
  • x_i is a token (e.g. Will) at the ith position in the sequence
• Return a structured label
  • Y is a sequence of tags
  • y_i is a tag (e.g. PER) at the ith position in the sequence
X: I saw Paris in New York with her dog .
Y: O O B-PER O B-LOC I-LOC O O O O
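The BIO scheme in Y (B- begins an entity, I- continues it, O is outside any entity) can be decoded back into typed entity spans. A minimal Python sketch; the helper name `bio_to_spans` and the lenient handling of an I- tag with no matching B- before it are our own assumptions, not something the talk specifies.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end, type); end is exclusive


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode a BIO tag sequence into typed entity spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        # Close the open entity on O, on a new B-, or on an I- of another type.
        if etype is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            spans.append((start, i, etype))
            start, etype = None, None
        # Open a new entity on B-, or (leniently) on an I- with nothing open.
        if tag.startswith("B-") or (tag.startswith("I-") and etype is None):
            start, etype = i, tag[2:]
    if etype is not None:  # entity running to the end of the sentence
        spans.append((start, len(tags), etype))
    return spans


tokens = "I saw Paris in New York with her dog .".split()
tags = "O O B-PER O B-LOC I-LOC O O O O".split()
for s, e, t in bio_to_spans(tags):
    print(t, " ".join(tokens[s:e]))  # PER Paris, then LOC New York
```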
The model
• Conditional Random Fields
  • Linear-chain, undirected graphical model
  • P(Y|X) → probability of the tags given the tokens
• Features of token x_i can be joined with
  • y_i → this token, this tag
  • (y_{i-1}, y_i) → this token, this tag and the last tag
• Training is a convex optimization problem, so we can learn globally optimal feature weights
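Training the weights is too long for a short example, but decoding shows how the pieces fit together. In a linear-chain model the score of a tag sequence is a sum of (feature, y_i) weights and (y_{i-1}, y_i) transition weights; since P(Y|X) is proportional to the exponential of that score, the Viterbi algorithm finds the most probable Y. A sketch under stated assumptions: the weights below are hand-set and hypothetical (a real CRF learns them), and `token_features` uses only two toy features.

```python
from typing import Dict, List, Tuple

TAGS = ["O", "B-LOC", "I-LOC"]

# Hypothetical, hand-set weights. A real CRF learns these from training data.
EMIT_W: Dict[Tuple[str, str], float] = {      # (token feature, y_i) -> weight
    ("w=in", "O"): 2.0,
    ("init_cap", "O"): -0.5,
    ("init_cap", "B-LOC"): 1.0,
    ("w=york", "I-LOC"): 1.5,
}
TRANS_W: Dict[Tuple[str, str], float] = {     # (y_{i-1}, y_i) -> weight
    ("<S>", "I-LOC"): -5.0,                   # "<S>" marks the sentence start
    ("O", "I-LOC"): -5.0,
    ("B-LOC", "I-LOC"): 1.0,
    ("I-LOC", "B-LOC"): -1.0,
}


def token_features(tok: str) -> List[str]:
    feats = ["w=" + tok.lower()]
    if tok[:1].isupper():
        feats.append("init_cap")
    return feats


def viterbi(tokens: List[str], emit_w, trans_w, tags=TAGS) -> List[str]:
    """Highest-scoring tag sequence, where a sequence scores the sum of
    emission weights (feature, y_i) and transition weights (y_{i-1}, y_i)."""
    if not tokens:
        return []
    best: List[Dict[str, float]] = []   # best[i][t]: best score ending in t at i
    back: List[Dict[str, str]] = []     # back-pointers to the previous tag
    for i, tok in enumerate(tokens):
        feats = token_features(tok)
        best.append({})
        back.append({})
        for t in tags:
            emit = sum(emit_w.get((f, t), 0.0) for f in feats)
            if i == 0:
                best[i][t] = trans_w.get(("<S>", t), 0.0) + emit
                continue
            prev, score = max(
                ((p, best[i - 1][p] + trans_w.get((p, t), 0.0)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + emit
            back[i][t] = prev
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


print(viterbi(["in", "New", "York"], EMIT_W, TRANS_W))  # ['O', 'B-LOC', 'I-LOC']
```

The large negative weights on transitions into I-LOC from "<S>" or O are what stop the decoder from starting an entity with an I- tag.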
Data
• Assumption: the data we train on reflects our application setting (note: your mileage may vary…)
[Figure: the data split into Train, Development and Test sets]
FEATURES
What’s in a name?
• Particular words
• Particular contexts
• Particular appearance
• …
We want discriminative, general features
Content and context
Context features for x_i = Paris:
Token   Feature   Value
I       x_{i-2}   x_{i-2}_I
saw     x_{i-1}   x_{i-1}_saw
Paris   x_i       x_i_Paris
in      x_{i+1}   x_{i+1}_in
New     x_{i+2}   x_{i+2}_New
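Content and context features of this kind can be generated with a sliding window. A minimal sketch; the function name and the choice to skip positions outside the sentence (rather than pad with boundary tokens, which many systems do) are our assumptions.

```python
from typing import List


def context_features(tokens: List[str], i: int, window: int = 2) -> List[str]:
    """Word-identity features for every token within `window` positions of i,
    named by relative position (x_i, x_{i-1}, x_{i+2}, ...)."""
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        if not 0 <= j < len(tokens):
            continue  # simply skip positions outside the sentence
        pos = "x_i" if offset == 0 else f"x_{{i{offset:+d}}}"
        feats.append(f"{pos}_{tokens[j]}")
    return feats


tokens = "I saw Paris in New York with her dog .".split()
print(context_features(tokens, 2))
# ['x_{i-2}_I', 'x_{i-1}_saw', 'x_i_Paris', 'x_{i+1}_in', 'x_{i+2}_New']
```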
Word “shapes”
Token   Value
I       is_cap, init_cap
saw
Paris   init_cap
in
New     init_cap
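The two shape features in the table are simple string tests. A sketch assuming is_cap means "all cased characters are upper case" and init_cap means "the first character is upper case", which reproduces the table. `word_shape` is an extra, commonly used shape feature that is not on the slide, included only for illustration.

```python
from typing import List


def shape_features(token: str) -> List[str]:
    """is_cap: all cased characters are upper case; init_cap: first character
    is upper case (assumed definitions that reproduce the table)."""
    feats = []
    if token.isupper():
        feats.append("is_cap")
    if token[:1].isupper():
        feats.append("init_cap")
    return feats


def word_shape(token: str) -> str:
    """Extra, not from the talk: a collapsed shape string, e.g. Paris -> Xx."""
    out: List[str] = []
    for ch in token:
        c = "X" if ch.isupper() else "x" if ch.islower() else "d" if ch.isdigit() else ch
        if not out or out[-1] != c:  # collapse runs of the same character class
            out.append(c)
    return "".join(out)


for tok in ["I", "saw", "Paris", "U.S.", "2015"]:
    print(tok, shape_features(tok), word_shape(tok))
```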
Matching lists of words (gazetteers)
Token   Value
I
saw
Paris   LOC_match, PER_match
in
New     LOC_match, ORG_match
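A gazetteer feature fires on every token covered by a list entry, and an ambiguous name fires for several types at once. A sketch assuming exact, case-sensitive matching of multi-token entries; the gazetteer contents are hypothetical and chosen only to reproduce the table for the example sentence.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical gazetteers: type -> names, each name stored as a tuple of tokens.
# Ambiguous names can appear in more than one list.
GAZETTEERS: Dict[str, Set[Tuple[str, ...]]] = {
    "LOC": {("Paris",), ("New", "York")},
    "PER": {("Paris",), ("Paris", "Hilton")},
    "ORG": {("New", "York")},
}


def gazetteer_features(tokens: List[str], gazetteers) -> List[List[str]]:
    """Mark every token covered by an exact (case-sensitive) match of a
    gazetteer entry with a TYPE_match feature."""
    feats: List[Set[str]] = [set() for _ in tokens]
    for etype, names in gazetteers.items():
        for name in names:
            n = len(name)
            for start in range(len(tokens) - n + 1):
                if tuple(tokens[start:start + n]) == name:
                    for j in range(start, start + n):
                        feats[j].add(f"{etype}_match")
    return [sorted(f) for f in feats]


tokens = "I saw Paris in New York with her dog .".split()
for tok, f in zip(tokens[:5], gazetteer_features(tokens, GAZETTEERS)):
    print(tok, f)
```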
Other representations of words
Token   Brown cluster
I       111000111
saw     10110110
Paris   01110
in      11010110
New     101001
Note: word2vec embeddings here too…
Brown clusters at different levels
Cluster prefix   Most frequent words
101011           new, U.S., European, public, major, local, former, current, British, German
1010.*           new, other, U.S., European, foreign, economic, most, President, financial, political, …
101.*            this, more, new, two, last, than, first, other, next, U.S., …
10.*             the, a, its, an, this, their, his, more, new, two, …
1.*              the, ".", ",", to, of, in, a, and, '"', said, …
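Because Brown clusters form a binary hierarchy, prefixes of a word's bit string give word classes at different granularities, and each prefix can become its own feature. A sketch: the bit strings are copied from the Brown cluster table, the prefix lengths 1, 2, 3, 4 and 6 mirror the levels in the table above, and the feature-name format follows the one used in "Putting it together". The window size and skipping unknown words are assumptions.

```python
from typing import Dict, List

# Bit strings taken from the Brown cluster table; in practice clusters are
# induced from a large unlabelled corpus and loaded from a file.
BROWN: Dict[str, str] = {
    "I": "111000111", "saw": "10110110", "Paris": "01110",
    "in": "11010110", "New": "101001",
}


def brown_features(tokens: List[str], i: int, clusters: Dict[str, str],
                   prefix_lengths=(1, 2, 3, 4, 6), window: int = 1) -> List[str]:
    """Cluster-prefix features for tokens around position i. Short prefixes
    give coarse word classes, longer ones finer classes; a prefix longer than
    the code is just the whole code."""
    feats = []
    for offset in range(-window, window + 1):
        j = i + offset
        if not 0 <= j < len(tokens) or tokens[j] not in clusters:
            continue  # outside the sentence, or no cluster for this word
        bits = clusters[tokens[j]]
        pos = "i" if offset == 0 else f"i{offset:+d}"
        for k in prefix_lengths:
            feats.append(f"brown_cluster_{pos}_pref_{k}_value_{bits[:k]}")
    return feats


tokens = "I saw Paris in New York with her dog .".split()
feats = brown_features(tokens, 4, BROWN)   # features for x_i = New
print(feats[0])   # brown_cluster_i-1_pref_1_value_1
```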
Putting it together
Combine different features and tags, e.g. for x_i = New:
• (y_{i-1}, y_i) = (O, B-LOC)
• Features:
  • LOC_match
  • init_cap
  • brown_cluster_i-1_pref_1_value_1
  • x_{i-1}_in
  • …
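The features listed for x_i = New only become CRF features once they are joined with tags. A sketch of that conjunction step: each observation feature is paired with y_i and, optionally, with (y_{i-1}, y_i), plus a plain transition feature. The `|`-separated naming scheme and the `conjoin` helper are hypothetical; toolkits generate these combinations internally.

```python
from typing import List


def conjoin(obs_feats: List[str], prev_tag: str, tag: str,
            with_tag_pair: bool = False) -> List[str]:
    """Turn observation features of x_i into CRF features by joining them with
    y_i and, optionally, with the pair (y_{i-1}, y_i)."""
    feats = [f"{f}|y_i={tag}" for f in obs_feats]
    if with_tag_pair:
        feats += [f"{f}|y_{{i-1}}={prev_tag}|y_i={tag}" for f in obs_feats]
    # A plain transition feature (y_{i-1}, y_i), independent of the token.
    feats.append(f"y_{{i-1}}={prev_tag}|y_i={tag}")
    return feats


obs = ["LOC_match", "init_cap", "brown_cluster_i-1_pref_1_value_1", "x_{i-1}_in"]
for feat in conjoin(obs, "O", "B-LOC"):
    print(feat)
```

Joining with the tag pair multiplies the number of features by roughly the number of tag pairs, which is one reason regularization matters.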
LEARNING AND EVALUATING
When to use each dataset
• Learn parameters (feature weights) on train
• Scan for optimal hyper-parameters on development
  • Regularization type
  • Regularization hyper-parameter
• Report evaluation and analyze development results
• Very infrequently evaluate on test
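The division of labour between the datasets can be written as a small grid search. A toolkit-agnostic sketch: `train_fn` and `dev_score_fn` are hypothetical callables standing in for whatever CRF trainer and development-set scorer you use, and the development scores in the usage lines are made up purely to exercise the search.

```python
import itertools
from typing import Any, Callable, Iterable, Optional, Tuple


def scan_hyperparameters(
    train_fn: Callable[[str, float], Any],   # fits weights on the TRAIN set
    dev_score_fn: Callable[[Any], float],    # e.g. F1 on the DEVELOPMENT set
    reg_types: Iterable[str],
    reg_strengths: Iterable[float],
) -> Optional[Tuple[float, str, float, Any]]:
    """Grid search over regularization type and strength. The test set is
    deliberately not an argument: it is only used once, at the very end."""
    best = None
    for reg_type, strength in itertools.product(reg_types, reg_strengths):
        model = train_fn(reg_type, strength)
        score = dev_score_fn(model)
        if best is None or score > best[0]:
            best = (score, reg_type, strength, model)
    return best


# Toy usage with made-up development scores, just to exercise the search.
fake_dev_f1 = {("L1", 0.1): 0.80, ("L1", 1.0): 0.78, ("L2", 0.1): 0.82, ("L2", 1.0): 0.79}
score, reg_type, strength, _ = scan_hyperparameters(
    train_fn=lambda r, c: (r, c),
    dev_score_fn=lambda model: fake_dev_f1[model],
    reg_types=["L1", "L2"], reg_strengths=[0.1, 1.0])
print(reg_type, strength)
```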
Regularize to combat overfitting
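• L1 → forces many feature weights to exactly zero, giving a sparser model
• L2 → trims (shrinks) all weights towards zero
[Figure: weights of features a, b, c and d, one panel for L1 and one for L2 regularization]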
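The different effect of the two penalties can be seen in a single proximal step on a weight vector. This is not how a CRF trainer optimizes the full regularized objective; it only isolates what each penalty does to the weights. The weights for features a to d are made up.

```python
import math
from typing import List


def l1_step(weights: List[float], lam: float) -> List[float]:
    """Proximal step for an L1 penalty lam * |w| (soft-thresholding):
    shrink each weight by lam, and set it to exactly zero if it would cross zero."""
    out = []
    for w in weights:
        shrunk = max(abs(w) - lam, 0.0)
        out.append(math.copysign(shrunk, w) if shrunk else 0.0)
    return out


def l2_step(weights: List[float], lam: float) -> List[float]:
    """Proximal step for an L2 penalty (lam / 2) * w**2: every weight is
    scaled towards zero by the same factor, but none becomes exactly zero."""
    return [w / (1.0 + lam) for w in weights]


# Made-up weights for four features a, b, c, d.
w = [0.9, 0.3, -0.1, 0.05]
print("L1:", [round(x, 3) for x in l1_step(w, 0.2)])
print("L2:", [round(x, 3) for x in l2_step(w, 0.2)])
```

With L1 the small weights for c and d drop to zero (a sparser model), while L2 keeps all four features and trims each one.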
Evaluation
• Strict: an entity counts as correct only if
  • its span is correct, and
  • its type is correct
• Relaxed
  • Per-token accuracy
• Precision, recall, F1
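Strict and relaxed scoring can disagree sharply on the same output. A sketch of strict span-level precision/recall/F1 and relaxed per-token accuracy; the prediction is hypothetical and makes two errors on the example sentence (Paris typed as LOC, New York truncated to New). Note that per-token accuracy is inflated by the many easy O tokens.

```python
from typing import List, Set, Tuple


def spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """(start, end, type) entity spans from BIO tags; end is exclusive."""
    out, start, etype = set(), 0, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        if etype is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            out.add((start, i, etype))
            etype = None
        if tag.startswith("B-") or (tag.startswith("I-") and etype is None):
            start, etype = i, tag[2:]
    return out


def strict_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Strict scoring: a predicted entity counts only if span AND type match."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Relaxed scoring: fraction of tokens whose tag is correct."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


gold = "O O B-PER O B-LOC I-LOC O O O O".split()
pred = "O O B-LOC O B-LOC O O O O O".split()   # wrong type, truncated span
print(strict_prf(gold, pred), token_accuracy(gold, pred))  # (0.0, 0.0, 0.0) 0.8
```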
More data, better score…
[Figure: learning curve, score against amount of training data]
KNOWLEDGE BASE TAG GAZETTEERS
What if you can cheat a little?
Document metadata tells you what entities are there:
• Standardized name
• Type
• Pointer to knowledge base
• Not aligned to the text
KB tags at training and test time
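One way to use KB tags at both training and test time is as a per-document gazetteer. The talk does not spell out the matching procedure, so this is a sketch under assumptions: tags arrive as (standardized name, type) pairs with the KB pointer ignored, and because tags are not aligned to the text, each capitalized token of a name is matched on its own, which also catches partial mentions such as a surname. The document tags are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def kb_tag_features(tokens: List[str], kb_tags: List[Tuple[str, str]]) -> List[List[str]]:
    """Per-document gazetteer built from KB tags given as (standardized name, type).
    Tags are not aligned to the text, so each capitalized token of a name is
    matched on its own; this also catches partial mentions such as a surname."""
    lookup: Dict[str, Set[str]] = {}
    for name, etype in kb_tags:
        for part in name.split():
            if part[:1].isupper():  # skip function words such as "of", "the"
                lookup.setdefault(part, set()).add(f"KB_{etype}_match")
    return [sorted(lookup.get(tok, set())) for tok in tokens]


# Hypothetical tags for one document; the KB pointer is omitted here.
doc_tags = [("Paris Hilton", "PER"), ("New York City", "LOC")]
tokens = "I saw Paris in New York with her dog .".split()
print(kb_tag_features(tokens, doc_tags)[2])   # ['KB_PER_match']
```

Unlike a generic gazetteer, where Paris matches both LOC and PER, the document's own tags point to the PER reading for this sentence.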
Where do they come from?
• Existing indexing systems (e.g. New York Times, Bloomberg)
• Document-level annotation for busy knowledge workers
CoNLL 2003 English NER results on testa (the development set)
How many sentences do annotators have to check for KB tags?
INFORMATION EXTRACTION FOR CONVERSATIONS
Why conversational systems?
One interaction loop
1. Natural language understanding
   • What are you talking about?
   • What is your intent?
2. Dialogue management
   • What should the machine do next?
3. Natural language generation
   • How do we express what we want to say?
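The three stages of the loop can be wired together in a few functions. A deliberately toy sketch: keyword matching stands in for real NLU, a single `topic` slot stands in for the dialogue state, and templates stand in for NLG. All names, intents and places here are hypothetical. The second turn ("Where is it?") only works because the state carries the topic across turns.

```python
from typing import Dict, Optional

PLACES = ("chinatown", "singapore river")  # hypothetical mini ontology


def understand(utterance: str) -> Dict[str, Optional[str]]:
    """1. NLU: what is the user talking about, and what is their intent?"""
    text = utterance.lower()
    intent = "ask_location" if "where" in text else "inform"
    topic = next((p for p in PLACES if p in text), None)
    return {"intent": intent, "topic": topic}


def decide(frame: Dict[str, Optional[str]], state: Dict[str, str]) -> Dict[str, str]:
    """2. Dialogue management: update the tracked state, pick the next action."""
    if frame["topic"]:
        state["topic"] = frame["topic"]
    if "topic" not in state:
        return {"act": "ask_topic"}
    if frame["intent"] == "ask_location":
        return {"act": "give_location", "topic": state["topic"]}
    return {"act": "acknowledge", "topic": state["topic"]}


def generate(action: Dict[str, str]) -> str:
    """3. NLG: express the chosen action with a template."""
    if action["act"] == "ask_topic":
        return "Which place are you interested in?"
    place = action["topic"].title()
    if action["act"] == "give_location":
        return f"Let me look up where {place} is."
    return f"OK, let's talk about {place}."


if __name__ == "__main__":
    state: Dict[str, str] = {}
    for user in ["Tell me about Chinatown.", "Where is it?"]:
        print(generate(decide(understand(user), state)))
```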
Dialogue State Tracking Challenge
• To hold a conversation, we need to follow one
• Can we track what the user is talking about as a conversation evolves?
• 4th challenge (DSTC4):
  • Human-human dialogues
  • Fixed KB
  • Fairly open domain
Strictly, this is named entity linking, since we need to identify the KB entity, not just find the mention.
Data
[Figure: a dialogue as a timeline of utterances, divided into time segments with a given topic; each segment is annotated with (Topic, Slot, Value) state data to track]
Example
Transcript, followed by the detected state:
And then when the people in the past come to Singapore.
%Uh they all live near the Singapore River.
→ (Attraction, NEIGHBOURHOOD, Singapore River) (Attraction, PLACE, Singapore River)
So Chinatown is actually just next to the Singapore River.
→ (Attraction, NEIGHBOURHOOD, Chinatown) (Attraction, NEIGHBOURHOOD, Singapore River) (Attraction, PLACE, Chinatown) (Attraction, PLACE, Singapore River)
Expected: (Attraction, INFO, History) (Attraction, NEIGHBOURHOOD, Chinatown)
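To make the gap concrete, the detected and expected states can be compared as sets of (Topic, Slot, Value) triples. A sketch assuming the four triples after the last utterance are the tracker's state for the segment; set-based precision/recall plus an exact-match check is one plausible way to score it, not necessarily the challenge's official metric. One of four detected triples is expected (precision 0.25) and one of two expected triples is found (recall 0.5); the miss is the implicit INFO slot.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (Topic, Slot, Value)

# Tracker output after the last utterance, and the expected state.
DETECTED: Set[Triple] = {
    ("Attraction", "NEIGHBOURHOOD", "Chinatown"),
    ("Attraction", "NEIGHBOURHOOD", "Singapore River"),
    ("Attraction", "PLACE", "Chinatown"),
    ("Attraction", "PLACE", "Singapore River"),
}
EXPECTED: Set[Triple] = {
    ("Attraction", "INFO", "History"),
    ("Attraction", "NEIGHBOURHOOD", "Chinatown"),
}


def score_state(detected: Set[Triple], expected: Set[Triple]):
    """Set-based precision, recall and F1 over (Topic, Slot, Value) triples,
    plus whether the whole state matches exactly."""
    tp = len(detected & expected)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(expected) if expected else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1, detected == expected


print(score_state(DETECTED, EXPECTED))
```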
Challenges: text → ontology
• Explicit mentions:
  • Typos, aliases
• Explicit referring mentions:
  • “that place”, “the first one”
• Implicit mentions:
  • Info slots are “topical”
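The first challenge, explicit mentions with typos and aliases, can be partly handled with an alias table plus fuzzy string matching. A sketch using Python's standard `difflib.get_close_matches`; the alias table and the 0.8 similarity cutoff are hypothetical. As the last example shows, string matching does nothing for referring mentions like "that place", let alone implicit ones.

```python
import difflib
from typing import Dict, Optional

# Hypothetical alias table: surface form (lower-cased) -> ontology value.
ALIASES: Dict[str, str] = {
    "chinatown": "Chinatown",
    "china town": "Chinatown",
    "singapore river": "Singapore River",
    "marina bay sands": "Marina Bay Sands",
    "mbs": "Marina Bay Sands",
}


def link_mention(mention: str, aliases: Dict[str, str] = ALIASES,
                 cutoff: float = 0.8) -> Optional[str]:
    """Map an explicit mention to an ontology value: exact alias lookup first,
    then the closest alias by string similarity to absorb typos."""
    key = mention.lower().strip()
    if key in aliases:
        return aliases[key]
    close = difflib.get_close_matches(key, list(aliases), n=1, cutoff=cutoff)
    return aliases[close[0]] if close else None


for m in ["MBS", "Chinatwon", "that place"]:
    print(m, "->", link_mention(m))
```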
And all of this while tracking the dialogue history at the same time…
What does the future hold?
• Conversational interactions everywhere
• Smart virtual assistants
  • Personal
  • Enterprise
Thanks! Questions?
Abstract
One part of information extraction is the recognition of named entities in text. This has long been studied in the research community, and there are mature approaches to solving the problem. This talk is a deeper dive into the topic. There won’t be any equations, but you should get an idea of what kinds of features people use and why. I’ll then discuss a novel application setting that shows how you can effectively use other data for supervision. Finally, I’ll briefly discuss the challenges of information extraction in a conversational context, and talk about the shared tasks the research community is using to explore natural language understanding.