Page 1

Bayesian models of cross-situational word learning

Michael C. Frank, Noah Goodman, Josh Tenenbaum (MIT)

Thanks to Kathy Hirsh-Pasek and Roberta Golinkoff for valuable discussion. Also thanks to Vikash Mansinghka, Ted Gibson, tedlab, and cocosci for comments, and to the Jacob Javits Foundation for funding.

Page 2

Word-learning in action

Page 3

The problem of word learning


words: “blue rings”; objects: rings, big bird
words: “and green rings”; objects: rings, big bird
words: “and yellow rings”; objects: rings, big bird
words: “Bigbird! Do you want to hold the rings?”; objects: big bird

In any one situation, children hear many words and see many objects

Page 4

One possible solution


Apply a cross-situational strategy to learn mappings (but this is harder than it looks)
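As an illustration, here is a minimal Python sketch of the simple intersective strategy (in the spirit of Siskind's deductive approach, not the Bayesian model presented below): a word's candidate referents are whittled down to the objects present every time that word is heard. The situations are loosely based on the Big Bird example above, with the scene coding assumed.

import itertools  # not required; stdlib only below

# A minimal sketch (not the authors' model): for each word, intersect
# the sets of objects present whenever that word is heard.

def cross_situational(situations):
    """situations: list of (words, objects) pairs, each a set."""
    candidates = {}
    for words, objects in situations:
        for w in words:
            if w in candidates:
                candidates[w] &= objects        # keep only objects seen every time
            else:
                candidates[w] = set(objects)
    return candidates

situations = [
    ({"blue", "rings"}, {"rings", "big_bird"}),
    ({"and", "green", "rings"}, {"rings", "big_bird"}),
    ({"bigbird", "hold", "rings"}, {"big_bird"}),  # only Big Bird coded here
]
print(cross_situational(situations)["rings"])      # {'big_bird'} -- wrong!

Because the last utterance mentions “rings” while only Big Bird is coded in the scene, strict intersection eliminates the correct referent entirely. This brittleness under noisy scene coding is one reason to prefer a probabilistic approach.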

Page 5


Techniques for cross-situational word learning
• Deductive inference: Siskind (1996)
• Translation model: Yu, Ballard, & Aslin (2005); Yu & Ballard (in press)

Page 6

Outline

• Some facts of word learning
  – Mutual exclusivity
  – Fast-mapping
  – Use of social cues

• Our model: Bayesian word-learner

• Extension: Learning social cues

• Experimental coverage


Page 7

Three facts of word learning

Mutual exclusivity: by 18-24 months, children will map a novel word onto a novel referent (Markman, 1992; Mervis & Bertrand, 1994). “Give me the dax!”

Fast mapping: three- and four-year-olds can learn words from one situation (Carey, 1978; Markson & Bloom, 1997). “This one is a koba!”

Use of social cues: by 18 months, children distinguish referents from one another using social cues (Hollich, Hirsh-Pasek, & Golinkoff, 2001). “Look at the modi!”

Page 8

Outline

• The facts of word learning

• Our model: Bayesian word-learner
  – Model
  – Corpus
  – Comparison models
  – Results

• Extension: Learning social cues

• Experimental coverage

Page 9

Generative model

[Graphical model: within each situation, the unobserved lexicon l and the observed objects O give rise to an unobserved intention I (the things you intend to refer to), which in turn generates the observed words W.]

Page 10

Generative model: example

In one example situation:
W (words): “look pretty”
O (objects): [pictures; a ball and a bike]
I (intention): the ball
l (lexicon): entries mapping “ball” and “bike” to their objects
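To make the generative story concrete, here is a hedged Python sketch. The number of words per utterance, the p_refer parameter, and the non-referential word list are illustrative assumptions, not the authors' exact formulation: in each situation an intention I is drawn from the objects O, and each word is produced either referentially through the lexicon l or non-referentially.

import random

def generate_situation(objects, lexicon, nonref_words, p_refer=0.5, n_words=4):
    """objects: objects in the scene; lexicon: dict object -> word."""
    intention = random.choice(objects)             # I: choose a referent from O
    words = []
    for _ in range(n_words):
        if random.random() < p_refer and intention in lexicon:
            words.append(lexicon[intention])       # referential word, via l
        else:
            words.append(random.choice(nonref_words))  # non-referential word
    return intention, words

lexicon = {"ball": "ball", "bike": "bike"}
print(generate_situation(["ball", "bike"], lexicon, ["look", "pretty", "wow"]))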

Page 11

Inference

Bayes’ rule: P(l | W, O) ∝ P(W | O, l) · P(l)

Parsimony prior on lexicons: P(l) favors smaller lexicons

Inference technique:
• Stochastic search with simulated tempering
• Data-driven proposals drawn from the mutual information of word-object pairings
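A hedged sketch of the search, simplified to a plain Metropolis sampler at a fixed temperature (the actual inference used simulated tempering and data-driven proposals based on word-object mutual information; log_likelihood is a placeholder for the model's P(W | O, l)):

import math
import random

def log_prior(lexicon, alpha=1.0):
    return -alpha * len(lexicon)       # parsimony: smaller lexicons score higher

def search(data, words, objects, log_likelihood, steps=10000, temp=1.0):
    lex = set()                        # lexicon as a set of (word, object) pairs
    score = log_likelihood(lex, data) + log_prior(lex)
    for _ in range(steps):
        pair = (random.choice(words), random.choice(objects))
        prop = lex ^ {pair}            # propose toggling one pairing in or out
        prop_score = log_likelihood(prop, data) + log_prior(prop)
        # Metropolis acceptance rule, softened by the temperature:
        if math.log(random.random()) < (prop_score - score) / temp:
            lex, score = prop, prop_score
    return lex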

Page 12

Corpus

• Two 10-minute clips from the Rollins corpus (CHILDES)

• Interaction between a mother and her infant (~6 months old)

• 2528 word tokens of 420 word types in 623 sentences

• 24 objects, all toys


Page 13

Model comparison

• Co-occurrence frequency

• Point-wise mutual information (see the sketch after this list):

  MI(W, O) = log [ p(W, O) / ( p(W) p(O) ) ]

• Translation model, based on IBM model 1 (Yu & Ballard, in press)
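As a sketch of how the mutual-information baseline can be computed (the counting scheme is an assumption; the original implementations may differ): every (word, object) co-occurrence within a situation adds one count, and the marginals are taken from the same joint counts so the probabilities stay consistent.

import math
from collections import Counter

def pmi_scores(situations):
    """situations: list of (words, objects) pairs."""
    pair_n, word_n, obj_n, total = Counter(), Counter(), Counter(), 0
    for words, objects in situations:
        for w in words:
            for o in objects:
                pair_n[(w, o)] += 1
                word_n[w] += 1
                obj_n[o] += 1
                total += 1
    # MI(w, o) = log[ p(w, o) / (p(w) p(o)) ], estimated from counts
    return {(w, o): math.log(n * total / (word_n[w] * obj_n[o]))
            for (w, o), n in pair_n.items()}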

Page 14

Results: model comparison

precision = (correct pairs in lexicon) / (total pairs in lexicon)

recall = (correct pairs in lexicon) / (total correct pairs)
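These definitions translate directly into code; a small sketch, with lexicons represented as sets of (word, object) pairs (the example pairs below are hypothetical):

def precision_recall(learned, gold):
    correct = learned & gold
    precision = len(correct) / len(learned) if learned else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall

learned = {("ball", "ball"), ("bird", "rattle")}
gold = {("ball", "ball"), ("bird", "bird")}
print(precision_recall(learned, gold))   # (0.5, 0.5)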

Page 15

Results: intuitive analysis

Best lexicon found by search:

  Word      Object
  baby      book
  bigbird   bird
  bird      rattle
  birdie    duck
  book      book
  hand      hand
  hat       hat
  meow      kitty
  moocow    cow
  oink      pig
  on        ring
  ring      ring
  sheep     sheep

[Figure: most likely intentions inferred for sample situations]

Also: unlike the baseline models, our model is highly extensible

Page 16

Outline

• The facts of word learning

• Our model: Bayesian word-learner

• Extension: Learning social cues
  – Corpus
  – Model
  – Preliminary results

• Experimental coverage

Page 17

Social corpus coding

Coded social cues for each utterance: infant’s hands, eyes, mouth, and touch; mom’s hands, eyes, and touch.

Page 18

How it works

Example coding of two cues across objects:

  Object   I’m looking   Mom looking
  Ball     0             1
  Bike     1             0
  …        …             …
  Bag      0             0

A cue can be produced for an object either by its base rate or by its relevance to the intended referent: a noisy-OR process with base-rate and relevance parameters.
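A minimal sketch of the noisy-OR generation of a single cue: the cue fires for an object either from its base rate b or, when the object is the intended referent, from its relevance r. The parameter values below are illustrative, not fitted.

def p_cue_on(is_intended_referent, base_rate, relevance):
    """P(cue = 1 | object) under a noisy-OR of base rate and relevance."""
    p_off = 1.0 - base_rate
    if is_intended_referent:
        p_off *= 1.0 - relevance     # relevance applies only to the referent
    return 1.0 - p_off

# e.g. "Mom looking" with base rate 0.1 and relevance 0.8:
print(p_cue_on(True, 0.1, 0.8))      # 0.82 for the intended referent
print(p_cue_on(False, 0.1, 0.8))     # 0.10 for any other object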

Page 19

Social model framework

[Graphical model: the word-learning model above, extended with observed social cues S generated from the intention I according to the relevance and base rate (r, b) of each social cue; as before, the lexicon l, intention I, and the parameters r and b are unobserved, while words W, objects O, and cues S are observed within each situation.]

Page 20

Preliminary Results


The model finds appropriate features: social features allow it to infer the speaker’s intent in situations without referential words.

Page 21

Outline

• The facts of word learning

• Our model: Bayesian word-learner

• Extension: Learning social cues

• Experimental coverage
  – Mutual exclusivity
  – Fast-mapping
  – Use of social cues

Page 22

Mutual exclusivity

The model shows soft mutual exclusivity: the parsimony prior makes it prefer, though not require, one-to-one word-object mappings.

Page 23

Fast-mapping

The model can fast-map: it learns a word from a single exposure. Alternative hypotheses are ruled out on account of “light syntax”: a penalty for using a referring word in a non-referring way.

Page 24

Use of social cues

The model can learn word meanings based on social cues alone.

Page 25

Conclusions

• Bayesian model of cross-situational word learning
  – Performed best over a corpus
  – Allows parsing of sentences and interpretation of the speaker’s intent

• Social model
  – Can learn which social cues are relevant to reference

• Experimental coverage
  – Mutual exclusivity
  – Fast-mapping
  – Use of social cues