Bayesian models as a tool for revealing inductive biases
Tom Griffiths, University of California, Berkeley

Page 1:

Bayesian models as a tool for revealing inductive biases

Tom Griffiths, University of California, Berkeley

Page 2:

Inductive problems

Example utterances: "blicket toma", "dax wug", "blicket wug"

Grammar: S → X Y,  X → {blicket, dax},  Y → {toma, wug}

Learning languages from utterances

Learning functions from (x,y) pairs

Learning categories from instances of their members

Page 3:

Revealing inductive biases

• Many problems in cognitive science can be formulated as problems of induction
  – learning languages, concepts, and causal relations

• Such problems are not solvable without bias (e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995)

• What biases guide human inductive inferences?

How can computational models be used to investigate human inductive biases?

Page 4:

Models and inductive biases

• Transparent

Page 5:


Reverend Thomas Bayes

Bayesian models

Page 6:

Bayes’ theorem

P(h \mid d) = \frac{P(d \mid h)\, P(h)}{\sum_{h' \in H} P(d \mid h')\, P(h')}

Posterior probability = Likelihood × Prior probability, normalized by a sum over the space of hypotheses H.

h: hypothesis, d: data
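To make the computation concrete, here is a minimal Python sketch (not from the talk) of Bayes' rule over a discrete hypothesis space; the function and argument names are illustrative.

```python
def posterior(hypotheses, prior, likelihood, data):
    """Compute P(h | d) for every h in a discrete hypothesis space.

    hypotheses: iterable of hypothesis labels
    prior:      dict mapping hypothesis -> P(h)
    likelihood: function (data, hypothesis) -> P(d | h)
    """
    # Unnormalized posterior: P(d | h) * P(h) for each hypothesis
    scores = {h: likelihood(data, h) * prior[h] for h in hypotheses}
    # Normalizing constant: sum over the whole hypothesis space H
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}
```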

Page 7:

Three advantages of Bayesian models

• Transparent identification of inductive biases through hypothesis space, prior, and likelihood

• Opportunity to explore a range of biases expressed in terms that are natural to the problem at hand

• Rational statistical inference provides an upper bound on human inferences from data

Page 8:

Two examples

Causal induction from small samples (Josh Tenenbaum, David Sobel, Alison Gopnik)

Statistical learning and word segmentation (Sharon Goldwater, Mark Johnson)

Page 9:

Two examples

Causal induction from small samples (Josh Tenenbaum, David Sobel, Alison Gopnik)

Statistical learning and word segmentation (Sharon Goldwater, Mark Johnson)

Page 10:

Blicket detector (Dave Sobel, Alison Gopnik, and colleagues)

See this? It’s a blicket machine. Blickets make it go.

Let’s put this one on the machine.

Oooh, it’s a blicket!

Page 11:

“One cause” (Gopnik, Sobel, Schulz, & Glymour, 2001)

– Two objects: A and B
– Trial 1: A and B on detector – detector active
– Trial 2: B on detector – detector inactive
– 4-year-olds judge whether each object is a blicket

• A: a blicket (100% say yes)
• B: almost certainly not a blicket (16% say yes)

[Figure: the A B trial and the B trial]

Page 12:

Hypotheses: causal models

Defines a probability distribution over the variables (for both observation and intervention)

[Figure: four candidate causal graphs over A, B, and E: no cause, A → E, B → E, and both A → E and B → E]

(Pearl, 2000; Spirtes, Glymour, & Scheines, 1993)

Page 13:

Prior and likelihood: causal theory

• Prior probability an object is a blicket is q
  – defines a distribution over causal models

• Detectors have a deterministic “activation law”
  – always activate if a blicket is on the detector
  – never activate otherwise

(Tenenbaum & Griffiths, 2003; Griffiths, 2005)

Page 14:

Prior and likelihood: causal theory

                       h00 (no cause)   h01 (B → E)   h10 (A → E)   h11 (A → E, B → E)
P(E=1 | A=0, B=0):           0               0             0               0
P(E=1 | A=1, B=0):           0               0             1               1
P(E=1 | A=0, B=1):           0               1             0               1
P(E=1 | A=1, B=1):           0               1             1               1
(In every case P(E=0 | A, B) = 1 – P(E=1 | A, B).)

Priors:  P(h00) = (1 – q)^2   P(h01) = (1 – q) q   P(h10) = q(1 – q)   P(h11) = q^2
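A minimal sketch (not from the talk) of how the deterministic causal theory above turns data into posteriors; the hypothesis encoding and the value q = 1/3 are illustrative choices.

```python
from itertools import product

q = 1/3  # illustrative prior probability that an object is a blicket

def detector(a_on, b_on, a_blicket, b_blicket):
    """Deterministic activation law: activate iff a blicket is on the detector."""
    return int((a_on and a_blicket) or (b_on and b_blicket))

# Hypotheses: (A is a blicket, B is a blicket)
hypotheses = list(product([0, 1], repeat=2))
prior = {(a, b): (q if a else 1 - q) * (q if b else 1 - q) for a, b in hypotheses}

# "One cause" data: Trial 1 (A and B on, detector active), Trial 2 (B on, inactive)
data = [((1, 1), 1), ((0, 1), 0)]

def likelihood(h):
    a_blicket, b_blicket = h
    p = 1.0
    for (a_on, b_on), outcome in data:
        p *= 1.0 if detector(a_on, b_on, a_blicket, b_blicket) == outcome else 0.0
    return p

scores = {h: likelihood(h) * prior[h] for h in hypotheses}
z = sum(scores.values())
posterior = {h: s / z for h, s in scores.items()}

# Only h10 survives: P(A is a blicket | data) = 1, P(B is a blicket | data) = 0
print(sum(p for (a, b), p in posterior.items() if a))  # 1.0
print(sum(p for (a, b), p in posterior.items() if b))  # 0.0
```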

Page 15:

Modeling “one cause”

Before seeing any data, all four hypotheses (h00, h01, h10, h11) remain, with the conditional probabilities and priors shown in the table above.

Page 16:

Modeling “one cause”

After Trial 1 (A and B on the detector, detector active), h00 is ruled out by the deterministic activation law; h01, h10, and h11 remain, with priors (1 – q)q, q(1 – q), and q^2.

Page 17:

Modeling “one cause”

After Trial 2 (B alone on the detector, detector inactive), h01 and h11 are also ruled out. Only h10 remains, with prior q(1 – q):

A is definitely a blicket
B is definitely not a blicket

Page 18:

“One cause” (Gopnik, Sobel, Schulz, & Glymour, 2001)

– Two objects: A and B
– Trial 1: A and B on detector – detector active
– Trial 2: B on detector – detector inactive
– 4-year-olds judge whether each object is a blicket

• A: a blicket (100% say yes)
• B: almost certainly not a blicket (16% say yes)

[Figure: the A B trial and the B trial]

Page 19:

Building on this analysis

• Transparent

Page 20:

Other physical systems

From stick-ball machines…

…to lemur colonies

(Kushnir, Schulz, Gopnik, & Danks, 2003)
(Griffiths, Baraff, & Tenenbaum, 2004)

(Griffiths & Tenenbaum, 2007)

Page 21:

Two examples

Causal induction from small samples (Josh Tenenbaum, David Sobel, Alison Gopnik)

Statistical learning and word segmentation (Sharon Goldwater, Mark Johnson)

Page 22:

Bayesian segmentation

• In the domain of segmentation, we have:
  – Data: unsegmented corpus (transcriptions).
  – Hypotheses: sequences of word tokens.

• P(h | d) ∝ P(d | h) P(h)
  – Likelihood P(d | h) = 1 if concatenating the hypothesized words forms the corpus, 0 otherwise.
  – Prior P(h) encodes assumptions about the structure of language.

• The optimal solution is therefore the consistent segmentation with the highest prior probability.

Page 23:

Brent (1999)

• Describes a Bayesian unigram model for segmentation.
  – Prior favors solutions with fewer words, shorter words.

• Problems with Brent’s system:
  – Learning algorithm is approximate (non-optimal).
  – Difficult to extend to incorporate bigram info.

Page 24:

A new unigram model (Dirichlet process)

Assume word wi is generated as follows:

1. Is wi a novel lexical item?

P(\text{yes}) = \frac{\alpha}{n + \alpha} \qquad P(\text{no}) = \frac{n}{n + \alpha}

(n = number of word tokens generated so far)

Fewer word types = Higher probability

Page 25:

A new unigram model (Dirichlet process)

Assume word wi is generated as follows:

2. If novel, generate phonemic form x_1 … x_m:

   P(w_i = x_1 \ldots x_m) = \prod_{j=1}^{m} P(x_j)

   Shorter words = Higher probability

   If not, choose the lexical identity of w_i from previously occurring words:

   P(w_i = l) = \frac{\text{count}(l)}{n}

   Power law = Higher probability
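A minimal generative sketch of the unigram (Dirichlet process) model in its Chinese-restaurant form; the concentration parameter, toy phoneme inventory, and word-ending probability below are illustrative assumptions, not values from the talk.

```python
import random

random.seed(0)

ALPHA = 1.0                    # concentration parameter (illustrative value)
PHONEMES = list("abdgiktuw")   # toy phoneme inventory (illustrative)
P_END = 0.3                    # per-position probability of ending the word (assumption)

def new_word():
    """Generate the phonemic form of a novel word, one phoneme at a time.
    Longer forms get geometrically lower probability, so shorter words are favored."""
    form = random.choice(PHONEMES)
    while random.random() > P_END:
        form += random.choice(PHONEMES)
    return form

def generate_corpus(n_words):
    """Chinese-restaurant-process view of the Dirichlet process unigram model."""
    tokens = []
    for _ in range(n_words):
        n = len(tokens)
        if random.random() < ALPHA / (n + ALPHA):
            # Novel lexical item, with probability alpha / (n + alpha)
            tokens.append(new_word())
        else:
            # Otherwise reuse an existing word with probability count(l) / n:
            # sampling uniformly from the previous tokens does exactly this
            tokens.append(random.choice(tokens))
    return tokens

corpus = generate_corpus(50)
print(" ".join(corpus))
```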

Page 26:

Unigram model: simulations

• Same corpus as Brent (Bernstein-Ratner, 1987):
  – 9790 utterances of phonemically transcribed child-directed speech (19–23 months).
  – Average utterance length: 3.4 words.
  – Average word length: 2.9 phonemes.

• Example input:
  yuwanttusiD6bUklUkD*z6b7wIThIzh&t&nd6dOgiyuwanttulUk&tDIs...

Page 27:

Example results

Page 28:

What happened?

• Model assumes (falsely) that words have the same probability regardless of context.

• Positing amalgams allows the model to capture word-to-word dependencies.

P(D&t) = .024 P(D&t|WAts) = .46 P(D&t|tu) = .0019

Page 29:

What about other unigram models?

• Brent’s learning algorithm is insufficient to identify the optimal segmentation.
  – Our solution has higher probability under his model than his own solution does.
  – On a randomly permuted corpus, our system achieves 96% accuracy; Brent gets 81%.

• Formal analysis shows undersegmentation is the optimal solution for any (reasonable) unigram model.

Page 30:

Bigram model (hierarchical Dirichlet process)

Assume word w_i is generated as follows:

1. Is (w_{i-1}, w_i) a novel bigram?

   P(\text{yes}) = \frac{\beta}{n_{w_{i-1}} + \beta} \qquad P(\text{no}) = \frac{n_{w_{i-1}}}{n_{w_{i-1}} + \beta}

2. If novel, generate w_i using the unigram model (almost).

   If not, choose the lexical identity of w_i from words previously occurring after w_{i-1}:

   P(w_i = l \mid w_{i-1} = l') = \frac{\text{count}(l', l)}{\text{count}(l')}
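A minimal sketch of the bigram predictive probabilities above; BETA, the function names, and the base-distribution argument p_unigram are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict

BETA = 1.0  # concentration parameter for the bigram process (illustrative value)

# count(l', l): how often word l has followed word l' so far
bigram_counts = defaultdict(lambda: defaultdict(int))

def p_next(prev, word, p_unigram):
    """Predictive probability of `word` following `prev`, mixing reuse of
    previously seen bigrams with the (almost-)unigram base distribution."""
    n_prev = sum(bigram_counts[prev].values())   # plays the role of n_{w_{i-1}} = count(l')
    p_novel = BETA / (n_prev + BETA)              # probability the bigram is novel
    p_reuse = bigram_counts[prev][word] / n_prev if n_prev else 0.0
    return p_novel * p_unigram(word) + (1 - p_novel) * p_reuse

def observe(prev, word):
    """Update the bigram counts after generating `word` following `prev`."""
    bigram_counts[prev][word] += 1
```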

Page 31:

Example results

Page 32:

Conclusions

• Both adults and children are sensitive to the nature of mechanisms in using covariation

• Both adults and children can use covariation to make inferences about the nature of mechanisms

• Bayesian inference provides a formal framework for understanding how statistics and knowledge interact in making these inferences
  – how theories constrain hypotheses, and how theories are themselves learned

Page 33:
Page 34:

A probabilistic mechanism?

• Children in Gopnik et al. (2001) who said that B was a blicket had seen evidence that the detector was probabilistic
  – one block activated the detector 5/6 times

• Replace the deterministic “activation law”…
  – activate with probability p = 1 – ε if a blicket is on the detector
  – never activate otherwise
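Extending the earlier sketch, here is what a probabilistic activation law does to the same "one cause" data; q = 1/3 and eps = 1/6 are illustrative choices (eps = 1/6 loosely matching the "5/6 times" familiarization evidence), not values from the talk.

```python
q, eps = 1/3, 1/6  # illustrative prior and detector failure rate

def p_activate(a_on, b_on, a_blicket, b_blicket):
    """Probabilistic activation law: activates with probability 1 - eps
    if at least one blicket is on the detector, never otherwise."""
    return (1 - eps) if ((a_on and a_blicket) or (b_on and b_blicket)) else 0.0

def likelihood(a_blicket, b_blicket):
    # Trial 1: A and B on the detector, detector active
    p1 = p_activate(1, 1, a_blicket, b_blicket)
    # Trial 2: B alone on the detector, detector inactive
    p2 = 1 - p_activate(0, 1, a_blicket, b_blicket)
    return p1 * p2

hypotheses = [(a, b) for a in (0, 1) for b in (0, 1)]
prior = {(a, b): (q if a else 1 - q) * (q if b else 1 - q) for a, b in hypotheses}
scores = {h: likelihood(*h) * prior[h] for h in hypotheses}
z = sum(scores.values())
posterior = {h: s / z for h, s in scores.items()}

print("P(A is a blicket | data) =", sum(p for (a, b), p in posterior.items() if a))
print("P(B is a blicket | data) =", sum(p for (a, b), p in posterior.items() if b))
# With a noisy detector, trial 2 no longer rules B out completely.
```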

Page 35:

Deterministic vs. probabilistic

[Figure: probability of being a blicket in the “one cause” condition, under the deterministic and the probabilistic activation law]

Mechanism knowledge affects interpretation of contingency data

Page 36:

Manipulating mechanisms

I. Familiarization phase: establish nature of mechanism (same block placed on the detector repeatedly)

II. Test phase: one cause (A B trial, then B trial)

At the end of the test phase, adults judge the probability that each object is a blicket

Page 37:

Manipulating mechanisms (n = 12 undergraduates per condition)

[Figure: probability of being a blicket in the “one cause” condition, people vs. Bayes, under the deterministic and probabilistic mechanisms]

Page 38:

Manipulating mechanisms (n = 12 undergraduates per condition)

[Figure: probability of being a blicket in the “one cause”, “one control”, and “three control” conditions, people vs. Bayes, under the deterministic and probabilistic mechanisms]

Page 39:

Acquiring mechanism knowledge

I. Familiarization phase: establish nature of mechanism (same block placed on the detector repeatedly)

II. Test phase: one cause (A B trial, then B trial)

At the end of the test phase, adults judge the probability that each object is a blicket

Page 40:

Results with children

• Tested 24 four-year-olds (mean age 54 months)
• Instead of a rating, a yes or no response
• Significant difference in “one cause” B responses
  – deterministic: 8% say yes
  – probabilistic: 79% say yes
• No significant difference in “one control” trials
  – deterministic: 4% say yes
  – probabilistic: 21% say yes

(Griffiths & Sobel, submitted)

Page 41:
Page 42:

Comparison to previous results

• Proposed boundaries are more accurate than Brent’s, but fewer proposals are made.

• Result: word tokens are less accurate.

          Boundary Precision   Boundary Recall   Token F-score
  Brent         .80                 .85               .68
  GGJ           .92                 .62               .54

Precision: #correct / #found  [= hits / (hits + false alarms)]

Recall: #correct / #true  [= hits / (hits + misses)]

F-score: the harmonic mean of precision and recall.
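A minimal sketch of the boundary-based metrics defined above; the helper name and the toy example are illustrative.

```python
def boundary_scores(proposed, true):
    """Precision, recall, and F-score for proposed vs. true word-boundary positions,
    each given as a set of indices into the unsegmented string."""
    hits = len(proposed & true)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(true) if true else 0.0
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f

# Example: "youwantto" with true boundaries after "you" and "want",
# against a proposal that only places the second boundary.
print(boundary_scores(proposed={7}, true={3, 7}))  # (1.0, 0.5, ~0.67)
```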

Page 43:

Quantitative evaluation

• Compared to the unigram model, more boundaries are proposed, with no loss in accuracy:

                   Boundary Precision   Boundary Recall
  GGJ (unigram)          .92                 .62
  GGJ (bigram)           .92                 .84

• Accuracy is higher than previous models:

                   Token F-score   Type F-score
  Brent (unigram)       .68             .52
  GGJ (bigram)          .77             .63

Page 44:

Two examples

Causal induction from small samples (Josh Tenenbaum, David Sobel, Alison Gopnik)

Statistical learning and word segmentation (Sharon Goldwater, Mark Johnson)