Bayesian models as a tool for revealing inductive biases


Page 1: Bayesian models as a tool for revealing inductive biases

Bayesian models as a tool for revealing inductive biases

Tom Griffiths, University of California, Berkeley

Page 2: Bayesian models as a tool for revealing inductive biases

Inductive problems

blicket toma

dax wug

blicket wug

S → X Y

X → {blicket, dax}

Y → {toma, wug}

Learning languages from utterances

Learning functions from (x,y) pairs

Learning categories from instances of their members
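The toy grammar on the slide generates a four-utterance language; as a quick sketch (illustrative code, not part of the talk):

```python
from itertools import product

# Toy grammar from the slide: S -> X Y, X -> {blicket, dax}, Y -> {toma, wug}
X = ["blicket", "dax"]
Y = ["toma", "wug"]

# The language generated by S: every X followed by every Y
language = [f"{x} {y}" for x, y in product(X, Y)]
print(language)  # ['blicket toma', 'blicket wug', 'dax toma', 'dax wug']
```

The observed utterances ("blicket toma", "dax wug", "blicket wug") are all members of this language, so the data are consistent with the grammar.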

Page 3: Bayesian models as a tool for revealing inductive biases

Revealing inductive biases

• Many problems in cognitive science can be formulated as problems of induction
– learning languages, concepts, and causal relations

• Such problems are not solvable without bias (e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995)

• What biases guide human inductive inferences?

How can computational models be used to investigate human inductive biases?

Page 4: Bayesian models as a tool for revealing inductive biases

Models and inductive biases

• Transparent

Page 5: Bayesian models as a tool for revealing inductive biases


Reverend Thomas Bayes

Bayesian models

Page 6: Bayesian models as a tool for revealing inductive biases

Bayes’ theorem

P(h | d) = P(d | h) P(h) / Σ_{h′ ∈ H} P(d | h′) P(h′)

Posterior probability ∝ Likelihood × Prior probability; the denominator sums over the space of hypotheses

h: hypothesis

d: data
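As a concrete illustration of the theorem over a discrete hypothesis space (the priors and likelihoods here are made-up numbers, not from the talk):

```python
def posterior(priors, likelihoods):
    """Compute P(h | d) for each hypothesis h via Bayes' theorem.

    priors:      dict mapping hypothesis -> P(h)
    likelihoods: dict mapping hypothesis -> P(d | h)
    """
    # Numerator of Bayes' theorem for each hypothesis
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    # Denominator: sum over the space of hypotheses
    evidence = sum(joint.values())
    return {h: joint[h] / evidence for h in joint}

# Two hypotheses with equal priors but unequal likelihoods
post = posterior(priors={"h1": 0.5, "h2": 0.5},
                 likelihoods={"h1": 0.8, "h2": 0.2})
print(post)  # {'h1': 0.8, 'h2': 0.2}
```

With equal priors the posterior simply mirrors the likelihoods; an unequal prior would pull the posterior toward the favored hypothesis.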

Page 7: Bayesian models as a tool for revealing inductive biases

Three advantages of Bayesian models

• Transparent identification of inductive biases through hypothesis space, prior, and likelihood

• Opportunity to explore a range of biases expressed in terms that are natural to the problem at hand

• Rational statistical inference provides an upper bound on human inferences from data

Page 8: Bayesian models as a tool for revealing inductive biases

Two examples

Causal induction from small samples(Josh Tenenbaum, David Sobel, Alison Gopnik)

Statistical learning and word segmentation(Sharon Goldwater, Mark Johnson)

Page 9: Bayesian models as a tool for revealing inductive biases

Two examples

Causal induction from small samples(Josh Tenenbaum, David Sobel, Alison Gopnik)

Statistical learning and word segmentation(Sharon Goldwater, Mark Johnson)

Page 10: Bayesian models as a tool for revealing inductive biases

Blicket detector (Dave Sobel, Alison Gopnik, and colleagues)

See this? It’s a blicket machine. Blickets make it go.

Let’s put this one on the machine.

Oooh, it’s a blicket!

Page 11: Bayesian models as a tool for revealing inductive biases

– Two objects: A and B
– Trial 1: A B on detector – detector active
– Trial 2: B on detector – detector inactive
– 4-year-olds judge whether each object is a blicket

• A: a blicket (100% say yes)

• B: almost certainly not a blicket (16% say yes)

“One cause” (Gopnik, Sobel, Schulz, & Glymour, 2001)

[Diagram: AB trial, then B trial]

Page 12: Bayesian models as a tool for revealing inductive biases

Hypotheses: causal models

Defines a probability distribution over variables (for both observation and intervention)

[Diagrams: the four causal models over A, B, and E (no causes; B → E; A → E; both A → E and B → E)]

(Pearl, 2000; Spirtes, Glymour, & Scheines, 1993)

Page 13: Bayesian models as a tool for revealing inductive biases

Prior and likelihood: causal theory

• Prior probability an object is a blicket is q
– defines a distribution over causal models

• Detectors have a deterministic “activation law”
– always activate if a blicket is on the detector
– never activate otherwise

(Tenenbaum & Griffiths, 2003; Griffiths, 2005)

Page 14: Bayesian models as a tool for revealing inductive biases

Prior and likelihood: causal theory

                     h00   h01   h10   h11
P(E=1 | A=0, B=0):    0     0     0     0
P(E=1 | A=1, B=0):    0     0     1     1
P(E=1 | A=0, B=1):    0     1     0     1
P(E=1 | A=1, B=1):    0     1     1     1

(P(E=0 | ·, ·) = 1 – P(E=1 | ·, ·) in each row)

[Diagrams: the four causal models h00 (no causes), h01 (B → E), h10 (A → E), h11 (A → E and B → E)]

P(h00) = (1 – q)²    P(h10) = q(1 – q)
P(h01) = (1 – q)q    P(h11) = q²

Page 15: Bayesian models as a tool for revealing inductive biases

Modeling “one cause”

                     h00   h01   h10   h11
P(E=1 | A=0, B=0):    0     0     0     0
P(E=1 | A=1, B=0):    0     0     1     1
P(E=1 | A=0, B=1):    0     1     0     1
P(E=1 | A=1, B=1):    0     1     1     1

(P(E=0 | ·, ·) = 1 – P(E=1 | ·, ·) in each row)

[Diagrams: the four causal models h00, h01, h10, h11]

P(h00) = (1 – q)²    P(h10) = q(1 – q)
P(h01) = (1 – q)q    P(h11) = q²

Page 16: Bayesian models as a tool for revealing inductive biases

Modeling “one cause”

                     h00   h01   h10   h11
P(E=1 | A=0, B=0):    0     0     0     0
P(E=1 | A=1, B=0):    0     0     1     1
P(E=1 | A=0, B=1):    0     1     0     1
P(E=1 | A=1, B=1):    0     1     1     1

(P(E=0 | ·, ·) = 1 – P(E=1 | ·, ·) in each row)

[Diagrams: the three causal models h01, h10, h11 remaining after the AB trial]

P(h10) = q(1 – q)    P(h01) = (1 – q)q    P(h11) = q²

Page 17: Bayesian models as a tool for revealing inductive biases

Modeling “one cause”

                     h00   h01   h10   h11
P(E=1 | A=0, B=0):    0     0     0     0
P(E=1 | A=1, B=0):    0     0     1     1
P(E=1 | A=0, B=1):    0     1     0     1
P(E=1 | A=1, B=1):    0     1     1     1

(P(E=0 | ·, ·) = 1 – P(E=1 | ·, ·) in each row)

[Diagram: the single remaining causal model h10 (A → E)]

P(h10) = q(1 – q)

A is definitely a blicket
B is definitely not a blicket
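The "one cause" inference can be sketched numerically (a minimal illustration, not the authors' code; q = 0.3 is an arbitrary choice):

```python
def blicket_posterior(q, trials):
    """Posterior over the four causal models, with prior q per object
    and the deterministic activation law."""
    def active(a_on, b_on, a_bl, b_bl):
        # Deterministic law: active iff a blicket is on the detector
        return (a_on and a_bl) or (b_on and b_bl)

    post = {}
    for a_bl in (False, True):
        for b_bl in (False, True):
            # Prior: each object is independently a blicket with probability q
            prior = (q if a_bl else 1 - q) * (q if b_bl else 1 - q)
            # Likelihood: 1 if every trial matches the model's prediction, else 0
            lik = all(active(a, b, a_bl, b_bl) == obs
                      for (a, b), obs in trials)
            post[(a_bl, b_bl)] = prior * lik
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

# "One cause": AB trial -> detector active, B trial -> inactive
post = blicket_posterior(q=0.3, trials=[((True, True), True),
                                        ((False, True), False)])
p_a = sum(p for (a, b), p in post.items() if a)  # P(A is a blicket)
p_b = sum(p for (a, b), p in post.items() if b)  # P(B is a blicket)
print(p_a, p_b)  # 1.0 0.0
```

Only h10 survives both trials, so A is a blicket with probability 1 and B with probability 0, whatever the value of q.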

Page 18: Bayesian models as a tool for revealing inductive biases

– Two objects: A and B
– Trial 1: A B on detector – detector active
– Trial 2: B on detector – detector inactive
– 4-year-olds judge whether each object is a blicket

• A: a blicket (100% say yes)

• B: almost certainly not a blicket (16% say yes)

“One cause” (Gopnik, Sobel, Schulz, & Glymour, 2001)

[Diagram: AB trial, then B trial]

Page 19: Bayesian models as a tool for revealing inductive biases

Building on this analysis

• Transparent

Page 20: Bayesian models as a tool for revealing inductive biases

Other physical systems

From stick-ball machines…

…to lemur colonies

(Kushnir, Schulz, Gopnik, & Danks, 2003; Griffiths, Baraff, & Tenenbaum, 2004)

(Griffiths & Tenenbaum, 2007)

Page 21: Bayesian models as a tool for revealing inductive biases

Two examples

Causal induction from small samples(Josh Tenenbaum, David Sobel, Alison Gopnik)

Statistical learning and word segmentation(Sharon Goldwater, Mark Johnson)

Page 22: Bayesian models as a tool for revealing inductive biases

Bayesian segmentation

• In the domain of segmentation, we have:
– Data: unsegmented corpus (transcriptions).
– Hypotheses: sequences of word tokens.

• Optimal solution is the segmentation with highest prior probability

• Likelihood: = 1 if concatenating the hypothesized words forms the corpus, = 0 otherwise.

• Prior: encodes assumptions about the structure of language

Page 23: Bayesian models as a tool for revealing inductive biases

Brent (1999)

• Describes a Bayesian unigram model for segmentation.
– Prior favors solutions with fewer words, shorter words.

• Problems with Brent’s system:
– Learning algorithm is approximate (non-optimal).
– Difficult to extend to incorporate bigram info.

Page 24: Bayesian models as a tool for revealing inductive biases

A new unigram model (Dirichlet process)

Assume word wi is generated as follows:

1. Is wi a novel lexical item?

   P(yes) = α / (n + α)
   P(no) = n / (n + α)

Fewer word types = Higher probability

Page 25: Bayesian models as a tool for revealing inductive biases

A new unigram model (Dirichlet process)

Assume word wi is generated as follows:

2. If novel, generate phonemic form x1…xm:

   P(wi = x1…xm) = ∏_{i=1..m} P(xi)

   If not, choose lexical identity of wi from previously occurring words:

   P(wi = l) = count(l) / n

Shorter words = Higher probability

Power law = Higher probability
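The two-step generative process above can be sketched as a single predictive probability (hypothetical code; the uniform phoneme distribution and toy corpus are illustrative):

```python
from collections import Counter

def dp_word_prob(word, previous_words, alpha, phoneme_probs):
    """Predictive probability of the next word under the Dirichlet
    process unigram model: a mixture of the 'novel' and 'not novel' routes."""
    n = len(previous_words)
    counts = Counter(previous_words)
    # Novel route: P(yes) = alpha / (n + alpha), times the phonemic
    # form probability, a product over the word's phonemes
    p_form = 1.0
    for ph in word:
        p_form *= phoneme_probs[ph]
    p_novel = (alpha / (n + alpha)) * p_form
    # Old route: P(no) = n / (n + alpha), times count(word) / n
    p_old = (n / (n + alpha)) * (counts[word] / n) if n > 0 else 0.0
    return p_novel + p_old

# Frequent, short words get high predictive probability
uniform = {ph: 0.25 for ph in "abcd"}
corpus = ["ab", "ab", "ab", "cd"]
p = dp_word_prob("ab", corpus, alpha=1.0, phoneme_probs=uniform)
print(p)
```

The "old" route dominates for frequent words, which is what yields the rich-get-richer (power-law) behavior the slide mentions.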

Page 26: Bayesian models as a tool for revealing inductive biases

Unigram model: simulations

• Same corpus as Brent (Bernstein-Ratner, 1987):
– 9790 utterances of phonemically transcribed child-directed speech (19-23 months).
– Average utterance length: 3.4 words.
– Average word length: 2.9 phonemes.

• Example input:
yuwanttusiD6bUklUkD*z6b7wIThIzh&t&nd6dOgiyuwanttulUk&tDIs...

Page 27: Bayesian models as a tool for revealing inductive biases

Example results

Page 28: Bayesian models as a tool for revealing inductive biases

What happened?

• Model assumes (falsely) that words have the same probability regardless of context.

• Positing amalgams allows the model to capture word-to-word dependencies.

P(D&t) = .024
P(D&t | WAts) = .46
P(D&t | tu) = .0019

Page 29: Bayesian models as a tool for revealing inductive biases

What about other unigram models?

• Brent’s learning algorithm is insufficient to identify the optimal segmentation.
– Our solution has higher probability under his model than his own solution does.
– On a randomly permuted corpus, our system achieves 96% accuracy; Brent gets 81%.

• Formal analysis shows undersegmentation is the optimal solution for any (reasonable) unigram model.

Page 30: Bayesian models as a tool for revealing inductive biases

Bigram model (hierarchical Dirichlet process)

Assume word wi is generated as follows:

1. Is (wi-1, wi) a novel bigram?

   P(yes) = β / (n(wi-1) + β)
   P(no) = n(wi-1) / (n(wi-1) + β)

   (n(wi-1): number of bigrams seen starting with wi-1)

2. If novel, generate wi using unigram model (almost).

   If not, choose lexical identity of wi from words previously occurring after wi-1:

   P(wi = l | wi-1 = l′) = count(l′, l) / count(l′)
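A sketch of the bigram step (hypothetical code; a fixed `base_prob` stands in for the shared unigram model that "almost" alludes to):

```python
from collections import Counter

def bigram_word_prob(word, prev, bigrams, beta, base_prob):
    """Predictive probability of `word` after `prev` under the
    hierarchical Dirichlet process bigram model (simplified)."""
    # Tokens previously seen immediately after `prev`
    following = [w2 for (w1, w2) in bigrams if w1 == prev]
    n_prev = len(following)
    counts = Counter(following)
    # Novel bigram: beta / (n_prev + beta), then back off to the
    # (stand-in) unigram model
    p_novel = (beta / (n_prev + beta)) * base_prob(word)
    # Seen bigram: n_prev / (n_prev + beta), times count(prev, word) / count(prev)
    p_old = ((n_prev / (n_prev + beta)) * (counts[word] / n_prev)
             if n_prev > 0 else 0.0)
    return p_novel + p_old

seen = [("you", "want"), ("you", "want"), ("you", "look")]
p = bigram_word_prob("want", "you", seen, beta=1.0, base_prob=lambda w: 0.01)
print(p)
```

Conditioning the counts on the previous word is what lets the model capture word-to-word dependencies instead of positing amalgams.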

Page 31: Bayesian models as a tool for revealing inductive biases

Example results

Page 32: Bayesian models as a tool for revealing inductive biases

Conclusions

• Both adults and children are sensitive to the nature of mechanisms in using covariation

• Both adults and children can use covariation to make inferences about the nature of mechanisms

• Bayesian inference provides a formal framework for understanding how statistics and knowledge interact in making these inferences
– how theories constrain hypotheses, and are learned

Page 34: Bayesian models as a tool for revealing inductive biases

A probabilistic mechanism?

• Children in Gopnik et al. (2001) who said that B was a blicket had seen evidence that the detector was probabilistic
– one block activated the detector 5/6 times

• Replace the deterministic “activation law”…
– activate with probability 1 – ε if a blicket is on the detector
– never activate otherwise
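Swapping the probabilistic law into the Bayesian analysis shows why B becomes a plausible blicket (a sketch, not the authors' code; q = 0.3 is arbitrary, and ε = 1/6 is an illustrative value echoing the 5/6 activation rate):

```python
def trial_prob(active, a_on, b_on, a_bl, b_bl, eps):
    """Probability of the observed detector state under the probabilistic
    activation law: activates with probability 1 - eps if a blicket is
    on the detector, never otherwise."""
    p_active = (1 - eps) if ((a_on and a_bl) or (b_on and b_bl)) else 0.0
    return p_active if active else 1 - p_active

def p_b_blicket(q, eps):
    """P(B is a blicket | AB trial active, B-alone trial inactive)."""
    total = b_mass = 0.0
    for a_bl in (False, True):
        for b_bl in (False, True):
            # Prior over the four causal models
            prior = (q if a_bl else 1 - q) * (q if b_bl else 1 - q)
            # Likelihood of the two "one cause" trials
            lik = (trial_prob(True, True, True, a_bl, b_bl, eps) *
                   trial_prob(False, False, True, a_bl, b_bl, eps))
            total += prior * lik
            if b_bl:
                b_mass += prior * lik
    return b_mass / total

print(p_b_blicket(q=0.3, eps=0.0))   # deterministic law: 0.0
print(p_b_blicket(q=0.3, eps=1/6))   # probabilistic law: well above zero
```

With ε = 0 the inactive B trial rules out every model in which B is a blicket; with ε > 0 those models merely become less likely, so B retains substantial posterior probability.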

Page 35: Bayesian models as a tool for revealing inductive biases

Deterministic vs. probabilistic

[Plot: probability of being a blicket in the “one cause” condition, under deterministic and probabilistic mechanisms]

Mechanism knowledge affects interpretation of contingency data

Page 36: Bayesian models as a tool for revealing inductive biases

At end of the test phase, adults judge the probability that each object is a blicket

[Diagram: AB trial, then B trial]

I. Familiarization phase: Establish nature of mechanism

II. Test phase: one cause

Manipulating mechanisms

same block

Page 37: Bayesian models as a tool for revealing inductive biases

Manipulating mechanisms (n = 12 undergraduates per condition)

[Plot: probability of being a blicket, “one cause” condition; Bayes vs. people, under deterministic and probabilistic mechanisms]

Page 38: Bayesian models as a tool for revealing inductive biases

Manipulating mechanisms (n = 12 undergraduates per condition)

[Plot: probability of being a blicket, “one cause”, “one control”, and “three control” conditions; Bayes vs. people, under deterministic and probabilistic mechanisms]

Page 39: Bayesian models as a tool for revealing inductive biases

At end of the test phase, adults judge the probability that each object is a blicket

[Diagram: AB trial, then B trial]

I. Familiarization phase: Establish nature of mechanism

II. Test phase: one cause

Acquiring mechanism knowledge

same block

Page 40: Bayesian models as a tool for revealing inductive biases

Results with children

• Tested 24 four-year-olds (mean age 54 months)
• Instead of ratings, yes-or-no responses
• Significant difference in “one cause” B responses
– deterministic: 8% say yes
– probabilistic: 79% say yes

• No significant difference in “one control” trials
– deterministic: 4% say yes
– probabilistic: 21% say yes

(Griffiths & Sobel, submitted)

Page 42: Bayesian models as a tool for revealing inductive biases

Comparison to previous results

• Proposed boundaries are more accurate than Brent’s, but fewer proposals are made.

• Result: word tokens are less accurate.

         Boundary Precision   Boundary Recall
Brent          .80                 .85
GGJ            .92                 .62

         Token F-score
Brent          .68
GGJ            .54

Precision: #correct / #found [= hits / (hits + false alarms)]

Recall: #correct / #true [= hits / (hits + misses)]

F-score: the harmonic mean of precision and recall.
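These definitions can be checked numerically; for example, the boundary F-scores implied by the table above (illustrative computation only; token F-scores are computed over word tokens and are not derivable from the boundary numbers):

```python
def f_score(precision, recall):
    # F-score: harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Boundary F-scores implied by the boundary precision/recall table
print(round(f_score(0.80, 0.85), 2))  # Brent
print(round(f_score(0.92, 0.62), 2))  # GGJ
```

Brent's boundary F-score comes out higher than GGJ's, mirroring the point that GGJ proposes fewer but more precise boundaries.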

Page 43: Bayesian models as a tool for revealing inductive biases

Quantitative evaluation

• Compared to unigram model, more boundaries are proposed, with no loss in accuracy:

• Accuracy is higher than previous models:

                  Boundary Precision   Boundary Recall
GGJ (unigram)           .92                 .62
GGJ (bigram)            .92                 .84

                  Token F-score   Type F-score
Brent (unigram)         .68            .52
GGJ (bigram)            .77            .63

Page 44: Bayesian models as a tool for revealing inductive biases

Two examples

Causal induction from small samples(Josh Tenenbaum, David Sobel, Alison Gopnik)

Statistical learning and word segmentation(Sharon Goldwater, Mark Johnson)