Bayesian models of cross-situational word learning Michael C. Frank Noah Goodman Josh Tenenbaum...

Bayesian models of cross-situational word learning

Michael C. Frank, Noah Goodman, Josh Tenenbaum (MIT)

Thanks to Kathy Hirsh-Pasek and Roberta Golinkoff for valuable discussion. Also thanks to Vikash Mansinghka, Ted Gibson, tedlab, and cocosci for comments and the Jacob Javits Foundation for funding.

Word-learning in action

The problem of word learning

words: “blue rings”
objects: rings, big bird

words: “and green rings”
objects: rings, big bird

words: “and yellow rings”
objects: rings, big bird

words: “Bigbird! Do you want to hold the rings?”
objects: big bird

In any one situation, children hear many words and see many objects

One possible solution

Apply a cross-situational strategy to learn mappings (but this is harder than it looks)
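To see why this is harder than it looks, here is a minimal sketch of the most naive cross-situational strategy: for each word, keep only the objects present in every situation where the word was heard. It uses the four situations from the slide; the tokenization (lowercasing, dropping punctuation) and the function name `intersect_candidates` are assumptions made for illustration. This is not Siskind's algorithm, nor the Bayesian model presented later.

```python
# The four situations from the slide: (words heard, objects present).
# Tokenization is an assumption made for this sketch.
situations = [
    ({"blue", "rings"}, {"rings", "big bird"}),
    ({"and", "green", "rings"}, {"rings", "big bird"}),
    ({"and", "yellow", "rings"}, {"rings", "big bird"}),
    ({"bigbird", "do", "you", "want", "to", "hold", "the", "rings"}, {"big bird"}),
]


def intersect_candidates(situations):
    """For each word, intersect the object sets of all situations it occurs in."""
    candidates = {}
    for words, objects in situations:
        for word in words:
            if word in candidates:
                candidates[word] &= objects      # drop objects absent this time
            else:
                candidates[word] = set(objects)  # first time the word is heard
    return candidates


candidates = intersect_candidates(situations)
print(candidates["rings"])  # {'big bird'}
```

Because the word “rings” is also spoken in the last situation, where only big bird is present, strict intersection wrongly maps “rings” to big bird, while a function word like “blue” keeps both objects as candidates.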


Techniques for cross-situational word learning

• Deductive inference: Siskind (1996)
• Translation model: Yu, Ballard, & Aslin (2005); Yu & Ballard (in press)

Outline

• Some facts of word learning
– Mutual exclusivity
– Fast-mapping
– Use of social cues

• Our model: Bayesian word-learner

• Extension: Learning social cues

• Experimental coverage

Three facts of word learning

Mutual exclusivity: By 18-24 months, children will map a novel word onto a novel referent (Markman, 1992; Mervis & Bertrand, 1994). Example: “Give me the dax!”

Fast mapping: Three- and four-year-olds can learn words from one situation (Carey, 1978; Markson & Bloom, 1997). Example: “This one is a koba!”

Use of social cues: By 18 months, children distinguish referents from one another using social cues (Hollich, Hirsh-Pasek, & Golinkoff, 2001). Example: “Look at the modi!”

Outline

• The facts of word learning

• Our model: Bayesian word-learner
– Model
– Corpus
– Comparison models
– Results



Generative model

[Diagram of the generative model. Labels: lexicon (ℓ); situations; objects (O, observed); words (W, observed); intention (I, the things you intend to refer to; unobserved).]

Generative model: example

[Diagram of one example situation. Labels: words (W): “look pretty”; objects (O); intention (I): ball; lexicon (ℓ): ball, bike.]

Inference

Bayes’ rule

Parsimony prior on lexicons
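The equations on this slide were images and did not survive transcription. As a hedged reconstruction, Bayes' rule applied to the lexicon (written ℓ here), given the observed words W and objects O, would take roughly the following form. The sum over intentions I, the exponential form of the parsimony prior, and the parameter α are assumptions, not taken from the slides.

```latex
P(\ell \mid W, O) \;\propto\; P(W \mid O, \ell)\, P(\ell),
\qquad
P(W \mid O, \ell) = \sum_{I} P(W \mid I, \ell)\, P(I \mid O),
\qquad
P(\ell) \;\propto\; e^{-\alpha |\ell|}
```

Here |ℓ| is the number of word-object pairs in the lexicon, so under this assumed prior each additional pair must pay for itself by explaining the observed words better.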

Inference technique

• Stochastic search with simulated tempering
• Data-driven proposals drawn from the mutual information of word-object pairings

Corpus

• 2x10 min clips from CHILDES-Rollins

• Interaction between mom and infant (~6mo)

• 2528 word tokens of 420 words in 623 sentences

• 24 objects, all toys

Model comparison

• Co-occurrence frequency

• Point-wise mutual information

$$\mathrm{MI}(W, O) = \frac{p(W, O)}{p(W)\,p(O)}$$

• Translation model, based on IBM model 1 (Yu & Ballard, in press)
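The first two comparison models can be sketched in a few lines. The toy situations below are hypothetical, and estimating each probability as the fraction of situations in which a word, an object, or a pair occurs is an assumption; the slides do not say how the probabilities were estimated. The score follows the ratio as written on the slide (pointwise mutual information is often reported as the logarithm of this ratio).

```python
from collections import Counter

# Hypothetical toy data: (words heard, objects present) per situation.
situations = [
    (["look", "ball"], ["ball"]),
    (["look", "bike"], ["bike"]),
    (["the", "ball"], ["ball", "bike"]),
    (["look", "dog"], ["dog"]),
]


def cooccurrence_counts(situations):
    """Count the situations in which each word, object, and word-object pair occurs."""
    pair, word, obj = Counter(), Counter(), Counter()
    for words, objects in situations:
        for o in set(objects):
            obj[o] += 1
        for w in set(words):
            word[w] += 1
            for o in set(objects):
                pair[(w, o)] += 1
    return pair, word, obj


def mi_ratio(pair, word, obj, n):
    """MI(W, O) = p(W, O) / (p(W) p(O)), with probabilities as situation fractions."""
    return {
        (w, o): (count / n) / ((word[w] / n) * (obj[o] / n))
        for (w, o), count in pair.items()
    }


pair, word, obj = cooccurrence_counts(situations)
scores = mi_ratio(pair, word, obj, len(situations))
print(pair[("ball", "ball")], scores[("ball", "ball")])  # 2 2.0
```

On this toy data the frequent word “look” co-occurs with the ball once, yet its ratio with the ball is below 1 because “look” appears in most situations regardless of what is present.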

Results: model comparison

$$\text{precision} = \frac{\text{correct pairs in lexicon}}{\text{total pairs in lexicon}}$$

$$\text{recall} = \frac{\text{correct pairs in lexicon}}{\text{total correct pairs}}$$
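As a small illustration of the two measures, the sketch below scores a learned lexicon against a gold-standard set of word-object pairs. Both sets and the function name `precision_recall` are hypothetical, chosen only to make the arithmetic concrete.

```python
def precision_recall(lexicon, gold):
    """Score a learned lexicon (set of (word, object) pairs) against gold pairs."""
    correct = len(lexicon & gold)
    precision = correct / len(lexicon) if lexicon else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical learned lexicon and gold standard.
learned = {("ball", "ball"), ("bike", "bike"), ("look", "ball")}
gold = {("ball", "ball"), ("bike", "bike"), ("dog", "dog"), ("cat", "cat")}

precision, recall = precision_recall(learned, gold)
```

Here two of the three learned pairs are correct (precision 2/3), and two of the four gold pairs were found (recall 1/2).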

Results: intuitive analysis

Best lexicon found by search:

Word      Object
baby      book
bigbird   bird
bird      rattle
birdie    duck
book      book
hand      hand
hat       hat
meow      kitty
moocow    cow
oink      pig
on        ring
ring      ring
sheep     sheep

[Figure panel: most likely intentions]

Also: unlike baseline models, our model is extremely extensible

Outline



• Extension: Learning social cues
– Corpus
– Model
– Preliminary results


Social corpus coding

Coded social cues for each utterance: infant’s hands, eyes, mouth, and touch; mom’s hands, eyes, and touch.

How it works

Object   I’m looking   Mom looking
Ball     0             1
Bike     1             0
…        …             …
Bag      0             0

Each cue could be caused by base rate or by relevance, combined through a noisy-OR process.
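A noisy-OR over two causes can be sketched as follows. This assumes the standard noisy-OR form, in which a cue fires on an object if either its base rate or (when the object is the intended referent) its relevance independently switches it on. The function name and the parameter values are hypothetical, and the slides do not give the exact parameterization.

```python
def cue_probability(intended, base_rate, relevance):
    """P(cue = 1 on an object) under an assumed standard noisy-OR.

    base_rate: chance the cue fires regardless of the speaker's intention.
    relevance: extra chance the cue fires because the object is intended.
    """
    p_off = 1.0 - base_rate          # the base rate fails to trigger the cue
    if intended:
        p_off *= 1.0 - relevance     # relevance also fails to trigger it
    return 1.0 - p_off


# Hypothetical parameters for a cue such as "Mom looking".
p_intended = cue_probability(True, base_rate=0.1, relevance=0.8)
p_other = cue_probability(False, base_rate=0.1, relevance=0.8)
```

With these values the cue fires on the intended object with probability 1 - 0.9 × 0.2 = 0.82, and on any other object with the base rate of 0.1, so a highly relevant cue is strong evidence about the intended referent.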

Social model framework

[Diagram of the social model. Labels: social cues (S); relevance and base rate of social cues (r, b); lexicon (ℓ); situations; objects (O); words (W); intention (I, the things you intend to refer to); “unobserved” marker.]

Preliminary Results



Model finds appropriate features

Social features allow finding intent in situations without referential words

Outline




• Experimental coverage
– Mutual exclusivity
– Fast-mapping
– Use of social cues

Mutual exclusivity

model shows soft mutual exclusivity

Fast-mapping

model can fast-map: learn a word from a single instance

ruled out on account of “light syntax”: penalty for using a referring word in a non-referring way

Use of social cues

model can learn word meanings based on social cues alone

Conclusions

• Bayesian model of cross-situational word-learning
– Performed best over a corpus
– Allows parsing of sentences and interpretation of speaker’s intent

• Social model
– Model can learn which social cues are relevant to reference

• Experimental coverage
– Mutual exclusivity
– Fast-mapping
– Learning words from social cues
