UMass and Learning for CALO Andrew McCallum Information Extraction & Synthesis Laboratory Department of Computer Science University of Massachusetts

Page 1:

UMass and Learning for CALO

Andrew McCallum

Information Extraction & Synthesis Laboratory

Department of Computer Science

University of Massachusetts

Page 2:

Outline

• CC-Prediction
  – Learning in the wild from user email usage
• DEX
  – Learning in the wild from user correction... as well as KB records filled by other CALO components
• Rexa
  – Learning in the wild from user corrections to coreference... propagating constraints in a Markov-Logic-like system that scales to ~20 million objects
• Several new topic models
  – Discover interesting useful structure without the need for supervision... learning from newly arrived data on the fly

Page 3:

CC Prediction Using Various Exponential Family Factor Graphs

Learning to keep an org. connected & avoid stove-piping.

First steps toward ad-hoc team creation.

Learning in the wild from user’s CC behavior, and from other parts of the CALO ontology.

Page 4:

Graphical Models for Email

[Figure: factor graph. The target variable y (Recipient of Email, plate Nr) is connected to body words xb (plate Nb), subject words xs (plate Ns), and other recipients xr (plate Nr-1). Legend: function nodes, random-variable nodes, and plates marked N denoting N replications.]

• Compute P(y|x) for CC prediction

• Local functions facilitate system engineering through modularity

Email Model: Nb words in the body, Ns words in the subject, Nr recipients

The graph describes the joint distribution of random variables in terms of the product of local functions
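The product-of-local-functions view can be made concrete with a small log-linear sketch. This is only an illustration, not the model used in CALO: the function name `cc_probabilities`, the three factor types, and all weights are assumptions chosen for the example.

```python
import math
from collections import defaultdict

def cc_probabilities(candidates, body, subject, other_recipients, weights):
    """Return P(y | x) over candidate recipients under a log-linear model.

    Each local function adds a score in log space, so the unnormalized
    probability of a candidate is a product of local factors.
    """
    scores = {}
    for y in candidates:
        s = 0.0
        s += sum(weights[("body", w, y)] for w in body)
        s += sum(weights[("subject", w, y)] for w in subject)
        s += sum(weights[("recipient", r, y)] for r in other_recipients)
        scores[y] = s
    # Normalize with a numerically stable softmax.
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {y: math.exp(s - m) / z for y, s in scores.items()}

# Hypothetical weights; in practice these would be learned from CC behavior.
weights = defaultdict(float)
weights[("body", "budget", "carol")] = 1.5
weights[("subject", "review", "carol")] = 0.5
weights[("recipient", "bob", "dave")] = 1.0

probs = cc_probabilities(
    candidates=["carol", "dave"],
    body=["budget", "meeting"],
    subject=["review"],
    other_recipients=["bob"],
    weights=weights,
)
print(probs)
```

Because each factor type is a separate sum, a new source of evidence can be added as one more term without touching the others, which is the modularity the slide refers to.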

Page 5:

Document Models

[Figure: factor graph for documents. The target variable y (Author of Document) is connected to plates for Title, Abstract, Body, Co-authors (Na-1 replications), and References.]

• Models may include relational attributes

• We can optimize P(y|x) for classification performance and P(x|y) for model interpretability and parameter transfer (to other models)

Page 6:

CC Prediction and Relational Attributes

[Figure: factor graph. The target variable y (Target Recipient) is connected to thread relations xtr (plate Ntr), body words xb (plate Nb), subject words xs (plate Ns), other recipients xr (plate Nr-1), and recipient relations xr’.]

Thread Relations – e.g. Was a given recipient ever included on this email thread?

Recipient Relationships – e.g. Does one of the other recipients report to the target recipient?

Page 7:

CC-Prediction Learning in the Wild

• As documents are added to Rexa, models of expertise for authors grow

• As DEX obtains more contact information and keywords, organizational relations emerge

• Model parameters can be adapted on-line

• Priors on parameters can be used to transfer learned information between models

• New relations can be added on-line

• Modular model construction and intelligent model optimization enable these goals
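One way to read "adapted on-line" together with "priors on parameters" is a stochastic gradient update with a Gaussian prior centered on weights taken from another model. The sketch below assumes a binary logistic model; `online_update`, the learning rate, and the prior strength are hypothetical choices, not details from the slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_update(w, x, y, prior_mean, lr=0.1, prior_strength=0.01):
    """One stochastic gradient step on a single example (x, y), y in {0, 1}.

    The Gaussian prior is centered on prior_mean (for example, weights
    learned for another model), so it pulls w toward the transferred values.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    return [
        wi + lr * ((y - p) * xi - prior_strength * (wi - mi))
        for wi, xi, mi in zip(w, x, prior_mean)
    ]

# Start from transferred weights and adapt as new examples arrive.
prior_mean = [0.5, -0.5]
w = list(prior_mean)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1)]
for x, y in stream * 50:
    w = online_update(w, x, y, prior_mean)
print(w)
```

Each update touches only one example, so the model can keep learning as the user sends mail, while the prior term keeps sparse features close to what was learned elsewhere.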

Page 8:

CC Prediction: Upcoming work on Multi-Conditional Learning

A discriminatively-trained topic model, discovering low-dimensional representations for transfer learning and improved regularization & generalization.

Page 9:

Objective Functions for Parameter Estimation

Traditional
• Traditional, joint training (e.g. naive Bayes, most topic models)
• Traditional mixture model (e.g. LDA)
• Traditional, conditional training (e.g. MaxEnt classifiers, CRFs)
• Conditional mixtures (e.g. Jebara’s CEM, McCallum CRF string edit distance, ...)

New, multi-conditional
• Multi-conditional (mostly conditional, generative regularization)
• Multi-conditional (for semi-sup)
• Multi-conditional (for transfer learning, 2 tasks, shared hiddens)

Page 10:

“Multi-Conditional Learning” (Regularization) [McCallum, Pal, Wang, 2006]
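The slide's formula did not survive extraction. As a hedged sketch of the general idea only, a multi-conditional objective can be written as a weighted sum of conditional log-likelihoods; the weights $\alpha$ and $\beta$ and the exact form below are assumptions for illustration, not a transcription of the slide.

```latex
\mathcal{O}(\theta) \;=\; \alpha \sum_i \log P_\theta(y_i \mid x_i) \;+\; \beta \sum_i \log P_\theta(x_i \mid y_i)
```

With $\beta = 0$ this reduces to traditional conditional training; a small $\beta > 0$ corresponds to the "mostly conditional, generative regularization" setting, where the $P(x \mid y)$ term acts as a regularizer.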

Page 11:

Predictive Random Fields: mixture of Gaussians on synthetic data

[Figure panels: Data, classify by color; Generatively trained; Conditionally-trained [Jebara 1998]; Multi-Conditional]

[McCallum, Wang, Pal, 2005]

Page 12:

Multi-Conditional Mixtures vs. Harmonium on document retrieval task

[Figure curves:]
• Harmonium, joint with words, no labels
• Harmonium, joint, with class labels and words
• Conditionally-trained, to predict class labels
• Multi-Conditional, multi-way conditionally trained

[McCallum, Wang, Pal, 2005]

Page 13:

DEX

Beginning with a review of previous work, then new work on record extraction, with the ability to leverage new KBs in the wild, and for transfer

Page 14:

System Overview

[Figure: pipeline diagram. Sources: Email, WWW. Components: Person Name Extraction, Name Coreference, Homepage Retrieval, Contact Info and Person Name Extraction, Social Network Analysis, Keyword Extraction. Other labels: CRF, names.]

Page 15:

An Example

To: “Andrew McCallum” [email protected]
Subject ...

First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governor’s Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis,…
Key Words: Information extraction, social network,…

Search for new people

Page 16:

Summary of Results

Contact info and name extraction performance (25 fields)

        Token Acc   Field Prec   Field Recall   Field F1
CRF     94.50       85.73        76.33          80.76

Example keywords extracted

Person               Keywords
William Cohen        Logic programming, Text categorization, Data integration, Rule learning
Daphne Koller        Bayesian networks, Relational models, Probabilistic models, Hidden variables
Deborah McGuinness   Semantic web, Description logics, Knowledge representation, Ontologies
Tom Mitchell         Machine learning, Cognitive states, Learning apprentice, Artificial intelligence

1. Expert Finding: When solving some task, find friends-of-friends with relevant expertise. Avoid “stove-piping” in large org’s by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!)

2. Social Network Analysis: Understand the social structure of your organization. Suggest structural changes for improved efficiency.
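The Field F1 reported for the CRF on this page is consistent with the standard harmonic mean of the reported field precision and recall. A small check, where the helper name `f1` is ours:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Field-level precision and recall for the CRF, as reported on the slide.
crf_f1 = f1(85.73, 76.33)
print(round(crf_f1, 2))
```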

Page 17:

Importance of accurate DEX fields in IRIS

• Information about
  – people
  – contact information
  – email
  – affiliation
  – job title
  – expertise
  – ...
are key to answering many CALO questions... both directly, and as supporting inputs to higher-level questions.

Page 18:

Learning Field Compatibilities in DEX

Professor Jane Smith
University of California
209-555-5555

Professor Smith chairs the Computer Science Department. She hails from Boston, …her administrative assistant …

John Doe
Administrative Assistant
University of California
209-444-4444

Extracted Record

Name: Jane Smith, John Doe
JobTitle: Professor, Administrative Assistant
Company: U of California
Department: Computer Science
Phone: 209-555-5555, 209-444-4444
City: Boston

Compatibility Graph

[Figure: graph over the extracted field values Jane Smith, Professor, University of California, 209-555-5555, Computer Science, Boston, John Doe, Administrative Assistant, University of California, 209-444-4444. Edge weights shown: -.5, -.4, -.6, .4, .8, .4, -.5.]

Page 19:

Learning Field Compatibilities in DEX

(Same source text and extracted record as on the previous page.)

[Figure: the same field values shown without edge weights: Jane Smith, Professor, University of California, 209-555-5555, Computer Science, Boston; John Doe, Administrative Assistant, University of California, 209-444-4444.]

Page 20:

Learning Field Compatibilities in DEX

• ~35% error reduction over transitive closure
• Qualitatively better than heuristic approach
• Mine Knowledge Bases from other parts of IRIS for learning compatibility rules among fields
  – “Professor” job title co-occurs with “University” company
  – Area code / city compatibility
  – “Senator” job title co-occurs with “Washington, D.C.” location
• In the wild
  – As the user adds new fields & makes corrections, DEX learns from this KB data
• Transfer learning
  – between departments/industries
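To illustrate how pairwise compatibility scores can split field values into records, here is a minimal greedy sketch. It is not DEX's actual algorithm: the `partition` function, the merge rule, and every weight are assumptions made for the example (the weights are illustrative and are not read off the slide's figure).

```python
def partition(nodes, weight):
    """Greedily merge clusters while some pair has positive total compatibility.

    weight maps frozenset({a, b}) to a score; missing pairs count as 0.
    """
    clusters = [{n} for n in nodes]

    def between(c1, c2):
        return sum(weight.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)

    while True:
        best, pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = between(clusters[i], clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if pair is None:
            return clusters
        i, j = pair
        clusters[i] |= clusters[j]
        del clusters[j]

nodes = ["Jane Smith", "Professor", "209-555-5555",
         "John Doe", "Administrative Assistant", "209-444-4444"]
# Hypothetical learned compatibilities between field values.
weight = {
    frozenset(("Jane Smith", "Professor")): 0.8,
    frozenset(("Jane Smith", "209-555-5555")): 0.4,
    frozenset(("John Doe", "Administrative Assistant")): 0.4,
    frozenset(("John Doe", "209-444-4444")): 0.4,
    frozenset(("Jane Smith", "John Doe")): -0.5,
    frozenset(("Jane Smith", "Administrative Assistant")): -0.5,
    frozenset(("John Doe", "Professor")): -0.6,
    frozenset(("Professor", "Administrative Assistant")): -0.4,
}
records = partition(nodes, weight)
print(records)
```

Unlike transitive closure, which joins anything connected by a chain of positive links, this rule sums positive and negative evidence between whole clusters, so a negative score can keep two people's fields apart.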

Page 21:

Rexa

A knowledge base of publications, grants, people, their expertise, topics, and inter-connections

Learning for information extraction and coreference.

Incrementally leveraging multiple sources of information for improved coreference

Gathering information about people’s expertise and co-author, citation relations

First a tour of Rexa, then slides about learning

Page 22:

Previous Systems

Page 23:

Previous Systems

[Figure: entity Research Paper with relation Cites]

Page 24:

More Entities and Relations

[Figure: entities Research Paper, Person, University, Venue, Grant, Groups, Expertise; relation Cites]

Page 37:

Learning in Rexa

Extraction, coreference

In the wild: Re-adjusting KB after corrections from a user

Also, learning research topics/expertise, and their interconnections

Page 38:

(Linear Chain) Conditional Random Fields
[Lafferty, McCallum, Pereira 2001] (500 citations)

Undirected graphical model, trained to maximize conditional probability of output sequence given input sequence

[Figure: finite state model and graphical model. FSM states y_{t-1}, y_t, y_{t+1}, y_{t+2}, y_{t+3}, . . . above observations x_{t-1}, x_t, x_{t+1}, x_{t+2}, x_{t+3}, . . .]

output seq: OTHER PERSON OTHER ORG TITLE …
input seq:  said Jones a Microsoft VP …

$$p(y \mid x) = \frac{1}{Z_x} \prod_t \Phi(y_t, y_{t-1}, x, t)$$

where

$$\Phi(y_t, y_{t-1}, x, t) = \exp\left( \sum_k \lambda_k f_k(y_t, y_{t-1}, x, t) \right)$$

Wide-spread interest, positive experimental results in many applications.

Noun phrase, Named entity [HLT’03], [CoNLL’03]
Protein structure prediction [ICML’04]
IE from Bioinformatics text [Bioinformatics ‘04],…
Asian word segmentation [COLING’04], [ACL’04]
IE from Research papers [HLT’04]
Object classification in images [CVPR ‘04]
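The linear-chain CRF definition on this page can be checked numerically on the slide's example sentence. The sketch below is pure Python and computes log Z_x with the forward algorithm; the feature templates in `features`, the weights in `lam`, and the START symbol for y_{-1} are hypothetical choices for illustration.

```python
import math
from itertools import product

LABELS = ["OTHER", "PERSON", "ORG", "TITLE"]

def features(y_t, y_prev, x, t):
    """Hypothetical binary features f_k(y_t, y_{t-1}, x, t), returned by name."""
    word = x[t]
    active = [("word", word.lower(), y_t), ("trans", y_prev, y_t)]
    if word[0].isupper():
        active.append(("capitalized", y_t))
    return active

def log_phi(y_t, y_prev, x, t, lam):
    # log Phi = sum_k lambda_k * f_k(y_t, y_{t-1}, x, t)
    return sum(lam.get(f, 0.0) for f in features(y_t, y_prev, x, t))

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_partition(x, lam):
    """log Z_x via the forward algorithm; y_{-1} is a fixed START symbol."""
    alpha = {y: log_phi(y, "START", x, 0, lam) for y in LABELS}
    for t in range(1, len(x)):
        alpha = {
            y: logsumexp([alpha[yp] + log_phi(y, yp, x, t, lam) for yp in LABELS])
            for y in LABELS
        }
    return logsumexp(list(alpha.values()))

def log_prob(y, x, lam):
    """log p(y | x) = sum_t log Phi(y_t, y_{t-1}, x, t) - log Z_x."""
    prev, total = "START", 0.0
    for t, y_t in enumerate(y):
        total += log_phi(y_t, prev, x, t, lam)
        prev = y_t
    return total - log_partition(x, lam)

x = ["said", "Jones", "a", "Microsoft", "VP"]
lam = {  # hand-set weights; a trained CRF would learn these
    ("word", "said", "OTHER"): 2.0,
    ("word", "jones", "PERSON"): 2.0,
    ("word", "a", "OTHER"): 2.0,
    ("word", "microsoft", "ORG"): 2.0,
    ("word", "vp", "TITLE"): 2.0,
    ("capitalized", "OTHER"): -1.0,
    ("trans", "ORG", "TITLE"): 0.5,
}
gold = ["OTHER", "PERSON", "OTHER", "ORG", "TITLE"]
p_gold = math.exp(log_prob(gold, x, lam))

# Brute-force sanity check: probabilities of all label sequences sum to 1.
all_seqs = list(product(LABELS, repeat=len(x)))
total = sum(math.exp(log_prob(list(y), x, lam)) for y in all_seqs)
best = max(all_seqs, key=lambda y: log_prob(list(y), x, lam))
print(p_gold, total, best)
```

The forward recursion costs O(T·|Y|²), while the brute-force sum enumerates |Y|^T sequences and is included only to confirm the normalization on a tiny input.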

Page 39:

IE from Research Papers [McCallum et al ‘99]

Page 40:

IE from Research Papers

                                   Field-level F1
Hidden Markov Models (HMMs)        75.6   [Seymore, McCallum, Rosenfeld, 1999]
Support Vector Machines (SVMs)     89.7   [Han, Giles, et al, 2003]
Conditional Random Fields (CRFs)   93.9   [Peng, McCallum, 2004]

Error reduced by 40% (CRFs relative to SVMs)

(Word-level accuracy is >99%)
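The 40% error figure appears to compare CRFs with SVMs; that reading is our assumption, but it matches the table if error is taken as 100 minus field-level F1:

```latex
\frac{(100 - 89.7) - (100 - 93.9)}{100 - 89.7} \;=\; \frac{10.3 - 6.1}{10.3} \;\approx\; 0.41
```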

Page 41:

Joint segmentation and co-reference
[Wellner, McCallum, Peng, Hay, UAI 2004]
see also [Marthi, Milch, Russell, 2003]

Extraction from and matching of research paper citations.

Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.

Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.

[Figure: graphical model with node types o, s, c, y, p. Labeled parts: Segmentation, Citation attributes, Co-reference decisions, Database field values, World Knowledge.]

Inference: Variant of Iterated Conditional Modes [Besag, 1986]

35% reduction in co-reference error by using segmentation uncertainty.

6-14% reduction in segmentation error by using co-reference.
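Iterated Conditional Modes, the inference method named on this page, repeatedly sets one variable to its best value while holding the rest fixed. The sketch below shows the basic algorithm on a toy joint problem, not the variant used in the cited paper; the variable names and all scores are invented for illustration.

```python
def icm(assignment, domains, score, max_sweeps=100):
    """Iterated Conditional Modes: coordinate-wise maximization of score."""
    assignment = dict(assignment)
    for _ in range(max_sweeps):
        changed = False
        for var, values in domains.items():
            best_value = max(values, key=lambda v: score({**assignment, var: v}))
            if score({**assignment, var: best_value}) > score(assignment):
                assignment[var] = best_value
                changed = True
        if not changed:
            break  # no single-variable change improves the score
    return assignment

# Toy joint problem: label the first field of two citations and decide
# whether the citations co-refer. All scores are hypothetical.
domains = {
    "c1_first_field": ["author", "title"],
    "c2_first_field": ["author", "title"],
    "coref": [True, False],
}

def score(a):
    s = 0.0
    s += 3.0 if a["c1_first_field"] == "author" else 0.0  # strong local evidence
    s += 0.2 if a["c2_first_field"] == "title" else 0.0   # weak, misleading evidence
    if a["coref"]:
        # Co-referent citations should be segmented consistently.
        s += 1.5 if a["c1_first_field"] == a["c2_first_field"] else -1.0
    return s

start = {"c1_first_field": "title", "c2_first_field": "title", "coref": True}
result = icm(start, domains, score)
print(result)
```

In this toy run the co-reference decision pulls the second citation's segmentation toward "author", mirroring the slide's point that each task reduces error in the other. ICM only guarantees a local optimum, so the outcome depends on the starting assignment.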

Page 42:

Rexa Learning in the Wild from User Feedback

• Coreference will never be perfect.
• Rexa allows users to enter corrections to coreference decisions
• Rexa then uses this feedback to
  – re-consider other inter-related parts of the KB
  – automatically make further error corrections by propagating constraints

• (Our coreference system uses underlying ideas very much like Markov Logic, and scales to ~20 million mention objects.)
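A very small illustration of propagating a user correction: a union-find structure over mentions, with user-declared "these are different" pairs blocking later merges through transitivity. This is a toy under our own assumptions (the class `CoreferenceKB` and its methods are hypothetical), not Rexa's coreference system.

```python
class CoreferenceKB:
    """Toy mention clustering with user-supplied cannot-link constraints."""

    def __init__(self, mentions):
        self.parent = {m: m for m in mentions}
        self.cannot = set()  # frozensets of mention pairs declared distinct

    def find(self, m):
        while self.parent[m] != m:
            self.parent[m] = self.parent[self.parent[m]]  # path halving
            m = self.parent[m]
        return m

    def allowed(self, a, b):
        """True if merging the clusters of a and b violates no constraint."""
        ra, rb = self.find(a), self.find(b)
        return not any(
            {self.find(p), self.find(q)} == {ra, rb}
            for p, q in map(tuple, self.cannot)
        )

    def merge(self, a, b):
        """Merge two clusters unless a user constraint forbids it."""
        if self.find(a) != self.find(b) and self.allowed(a, b):
            self.parent[self.find(a)] = self.find(b)
            return True
        return False

    def user_says_different(self, a, b):
        """Record a correction: mentions a and b (a != b) are distinct entities."""
        self.cannot.add(frozenset((a, b)))

kb = CoreferenceKB(["A. McCallum", "Andrew McCallum", "A. MacCallum"])
merged_first = kb.merge("A. McCallum", "Andrew McCallum")
kb.user_says_different("A. McCallum", "A. MacCallum")
# The correction propagates: "Andrew McCallum" shares a cluster with
# "A. McCallum", so it can no longer be merged with "A. MacCallum".
merged_second = kb.merge("Andrew McCallum", "A. MacCallum")
print(merged_first, merged_second)
```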

Page 43:

Finding Topics in 1 million CS papers

200 topics & keywords automatically discovered.

Page 44:

Topical Transfer

Citation counts from one topic to another. Map “producers and consumers”

Page 45:

Topical Diversity

Find the topics that are cited by many other topics, measuring diversity of impact.

Entropy of the topic distribution among papers that cite this paper (this topic).

[Figure labels: Low Diversity, High Diversity]
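The diversity measure described above is the Shannon entropy of the topic distribution over citing papers. A minimal sketch, where the function name `topical_diversity` and the topic labels are made up for the example:

```python
import math
from collections import Counter

def topical_diversity(citing_topics):
    """Entropy (in bits) of the topic distribution among citing papers."""
    counts = Counter(citing_topics)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# All citations come from one topic: lowest possible diversity.
low = topical_diversity(["NLP"] * 8)
# Citations spread evenly over four topics: higher diversity.
high = topical_diversity(["NLP", "Vision", "Theory", "Databases"] * 2)
print(low, high)
```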

Page 46:

Some New Work on Topic Models

Robustly capturing topic correlations: Pachinko Allocation Model

Capturing phrases in topic-specific ways: Topical N-Gram Model

Page 47:

Pachinko Machine

Page 48:

Pachinko Allocation Model [Li, McCallum, 2005]

Model structure, not the graphical model

[Figure: directed acyclic graph with root node 11; nodes 21, 22; nodes 31, 32, 33; nodes 41, 42, 43, 44, 45; and leaves word1 through word8.]

Distributions over distributions over topics...

Distributions over topics; mixtures, representing topic correlations

Distributions over words (like “LDA topics”)

Some interior nodes could contain one multinomial, used for all documents. (i.e. a very peaked Dirichlet)
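The layered structure on this page suggests a generative process in which each word follows a path from the root down to a word-emitting topic. The sketch below samples from a simplified four-level hierarchy (root, super-topics, sub-topics, words); all parameter values and the function names are assumptions, and the structure is simpler than the general DAG on the slide.

```python
import random

def dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_document(n_words, root_alpha, super_alphas, topic_word, vocab, rng):
    """Sample one document from a four-level PAM-style hierarchy.

    root -> super-topic -> sub-topic -> word. The root and every super-topic
    draw a fresh multinomial per document from their Dirichlet; sub-topics
    hold fixed distributions over words shared by all documents.
    """
    theta_root = dirichlet(root_alpha, rng)
    theta_super = [dirichlet(a, rng) for a in super_alphas]
    words = []
    for _ in range(n_words):
        s = rng.choices(range(len(theta_super)), weights=theta_root)[0]
        z = rng.choices(range(len(topic_word)), weights=theta_super[s])[0]
        words.append(rng.choices(vocab, weights=topic_word[z])[0])
    return words

vocab = ["estimation", "bayesian", "hidden", "units"]
topic_word = [  # three sub-topics, each a distribution over vocab
    [0.7, 0.3, 0.0, 0.0],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.0, 0.5, 0.5],
]
rng = random.Random(0)
doc = sample_document(
    n_words=20,
    root_alpha=[1.0, 1.0],
    super_alphas=[[5.0, 5.0, 0.1], [0.1, 0.1, 5.0]],
    topic_word=topic_word,
    vocab=vocab,
    rng=rng,
)
print(doc)
```

The super-topic Dirichlet parameters encode which sub-topics tend to co-occur, which is how this kind of hierarchy represents topic correlations.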

Page 49:

Topic Coherence Comparison

LDA 20 (“models, estimation, stopwords”)
models, model, parameters, distribution, bayesian, probability, estimation, data, gaussian, methods, likelihood, em, mixture, show, approach, paper, density, framework, approximation, markov

LDA 100 (“estimation, some junk”)
estimation, likelihood, maximum, noisy, estimates, mixture, scene, surface, normalization, generated, measurements, surfaces, estimating, estimated, iterative, combined, figure, divisive, sequence, ideal

PAM 100 (“estimation”)
estimation, bayesian, parameters, data, methods, estimate, maximum, probabilistic, distributions, noise, variable, variables, noisy, inference, variance, entropy, models, framework, statistical, estimating

Example super-topic
33  input hidden units function number
27  estimation bayesian parameters data methods
24  distribution gaussian markov likelihood mixture
11  exact kalman full conditional deterministic
1   smoothing predictive regularizers intermediate slope

Page 50:

Topic Correlations in PAM

5000 research paper abstracts, from across all CS

Numbers on edges are supertopics’ Dirichlet parameters

Page 51:

Likelihood Comparison

Varying number of topics

Page 52:

Want to Model Trends over Time

• Is prevalence of topic growing or waning?

• Pattern appears only briefly
  – Capture its statistics in focused way
  – Don’t confuse it with patterns elsewhere in time

• How do roles, groups, influence shift over time?

Page 53:

Topics over Time (TOT) [Wang, McCallum 2006]

[Figure: two plate diagrams of the TOT model. Labels: Dirichlet prior; multinomial over topics (plate D); topic index z; word w and time stamp t (plate Nd); Multinomial over words (plate T) with Dirichlet prior; Beta over time (plate T) with Uniform prior; distribution on time stamps.]
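Reading the labels of the plate diagram, each token gets a topic index, then a word from that topic's multinomial and a time stamp from that topic's Beta distribution. A hedged sketch of that per-token step follows; the function name `sample_tot_tokens` and all parameter values are hypothetical, and time is assumed normalized to [0, 1].

```python
import random

def sample_tot_tokens(n_tokens, theta, phi, psi, vocab, rng):
    """Draw (word, timestamp) pairs for one document, TOT-style.

    theta:  the document's multinomial over topics
    phi[z]: topic z's multinomial over vocab
    psi[z]: (a, b) parameters of topic z's Beta over normalized time
    """
    tokens = []
    for _ in range(n_tokens):
        z = rng.choices(range(len(theta)), weights=theta)[0]
        w = rng.choices(vocab, weights=phi[z])[0]
        t = rng.betavariate(*psi[z])
        tokens.append((w, t))
    return tokens

vocab = ["tariff", "treasury", "internet", "terrorism"]
phi = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
psi = [(2.0, 8.0), (9.0, 1.5)]  # topic 0 peaks early, topic 1 peaks late
rng = random.Random(0)
tokens = sample_tot_tokens(10, theta=[0.5, 0.5], phi=phi, psi=psi, vocab=vocab, rng=rng)
print(tokens)
```

Because each topic has its own Beta over time, a topic that appears only briefly gets a narrow, peaked time distribution instead of being blurred across the whole collection.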

Page 54:

State of the Union Address

208 Addresses delivered between January 8, 1790 and January 29, 2002.

To increase the number of documents, we split the addresses into paragraphs and treated them as ‘documents’. One-line paragraphs were excluded. Stopping was applied.

• 17,156 ‘documents’
• 21,534 words
• 669,425 tokens

Our scheme of taxation, by means of which this needless surplus is taken from the people and put into the public Treasury, consists of a tariff or duty levied upon importations from abroad and internal-revenue taxes levied upon the consumption of tobacco and spirituous and malt liquors. It must be conceded that none of the things subjected to internal-revenue taxation are, strictly speaking, necessaries. There appears to be no just complaint of this taxation by the consumers of these articles, and there seems to be nothing so well able to bear the burden without hardship to any portion of the people.

1910

Page 55:

Comparing TOT against LDA

Page 56:

Topic Distributions Conditioned on Time

[Figure: x-axis: time; y-axis: topic mass (in vertical height); data: NIPS vol1-14]