Analyzing cultural evolution by iterated learning
-
Upload
lars-valenzuela -
Category
Documents
-
view
43 -
download
2
description
Transcript of Analyzing cultural evolution by iterated learning
Analyzing cultural evolution by iterated learning
Tom GriffithsDepartment of Psychology
Cognitive Science Program
UC Berkeley QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
Inductive problems
blicket toma
dax wug
blicket wug
S X Y
X {blicket,dax}
Y {toma, wug}
Learning languages from utterances
Learning functions from (x,y) pairs
Learning categories from instances of their members
Learning
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.data
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
Iterated learning(Kirby, 2001)
What are the consequences of learners learning from other learners?
Outline
Part I: Formal analysis of iterated learning
Part II: Iterated learning in the lab
Outline
Part I: Formal analysis of iterated learning
Part II: Iterated learning in the lab
Objects of iterated learning
How do constraints on learning (inductive biases) influence cultural universals?
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.QuickTime™ and a
TIFF (Uncompressed) decompressorare needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
Language
• The languages spoken by humans are typically viewed as the result of two factors– individual learning – innate constraints (biological evolution)
• This limits the possible explanations for different kinds of linguistic phenomena
Linguistic universals
• Human languages possess universal properties– e.g. compositionality
(Comrie, 1981; Greenberg, 1963; Hawkins, 1988)
• Traditional explanation:– linguistic universals reflect strong innate constraints
specific to a system for acquiring language(e.g., Chomsky, 1965)
Cultural evolution
• Languages are also subject to change via cultural evolution (through iterated learning)
• Alternative explanation:– linguistic universals emerge as the result of the fact that
language is learned anew by each generation (using general-purpose learning mechanisms, expressing weak constraints on languages)
(e.g., Briscoe, 1998; Kirby, 2001)
Analyzing iterated learning
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
PL(h|d): probability of inferring hypothesis h from data d
PP(d|h): probability of generating data d from hypothesis h
PL(h|d)
PP(d|h)
PL(h|d)
PP(d|h)
• Variables x(t+1) independent of history given x(t)
• Converges to a stationary distribution under easily checked conditions (i.e., if it is ergodic)
x x x x x x x x
Transition matrixP(x(t+1)|x(t))
Markov chains
Analyzing iterated learning
d0 h1 d1 h2PL(h|d) PP(d|h) PL(h|d)
d2 h3PP(d|h) PL(h|d)
d PP(d|h)PL(h|d)h1 h2d PP(d|h)PL(h|d)
h3
A Markov chain on hypotheses
d0 d1h PL(h|d) PP(d|h)d2h PL(h|d) PP(d|h) h PL(h|d) PP(d|h)
A Markov chain on data
Bayesian inference
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
Reverend Thomas Bayes
Bayes’ theorem
P(h | d) P(d | h)P(h)
P(d | h )P( h )h H
Posteriorprobability
Likelihood Priorprobability
Sum over space of hypothesesh: hypothesis
d: data
A note on hypotheses and priors
• No commitment to the nature of hypotheses– neural networks (Rumelhart & McClelland, 1986)
– discrete parameters (Gibson & Wexler, 1994)
• Priors do not necessarily represent innate constraints specific to language acquisition– not innate: can reflect independent sources of data– not specific: general-purpose learning algorithms also
have inductive biases expressible as priors
Iterated Bayesian learning
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
PL(h|d)
PP(d|h)
PL(h|d)
PP(d|h)
PL (h | d) PP (d | h)P(h)
PP (d | h )P( h )h H
Assume learners sample from their posterior distribution:
Stationary distributions
• Markov chain on h converges to the prior, P(h)
• Markov chain on d converges to the “prior predictive distribution”
P(d) P(d | h)h
P(h)
(Griffiths & Kalish, 2005)
Explaining convergence to the prior
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
PL(h|d)
PP(d|h)
PL(h|d)
PP(d|h)
• Intuitively: data acts once, prior many times
• Formally: iterated learning with Bayesian agents is a Gibbs sampler on P(d,h)
(Griffiths & Kalish, 2007)
Gibbs sampling
For variables x = x1, x2, …, xn
Draw xi(t+1) from P(xi|x-i)
x-i = x1(t+1), x2
(t+1),…, xi-1(t+1)
, xi+1(t)
, …, xn(t)
Converges to P(x1, x2, …, xn)
(a.k.a. the heat bath algorithm in statistical physics)
(Geman & Geman, 1984)
Gibbs sampling
(MacKay, 2003)
Explaining convergence to the prior
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
PL(h|d)
PP(d|h)
PL(h|d)
PP(d|h)
When target distribution is P(d,h) = PP(d|h)P(h), conditional distributions are PL(h|d) and PP(d|h)
Implications for linguistic universals
• When learners sample from P(h|d), the distribution over languages converges to the prior– identifies a one-to-one correspondence between
inductive biases and linguistic universals
Iterated Bayesian learning
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
PL(h|d)
PP(d|h)
PL(h|d)
PP(d|h)
PL (h | d) PP (d | h)P(h)
PP (d | h )P( h )h H
Assume learners sample from their posterior distribution:
PL (h | d) PP (d | h)P(h)
PP (d | h )P( h )h H
r
r = 1 r = 2 r =
From sampling to maximizing
• General analytic results are hard to obtain– (r = is Monte Carlo EM with a single sample)
• For certain classes of languages, it is possible to show that the stationary distribution gives each hypothesis h probability proportional to P(h)r
– the ordering identified by the prior is preserved, but not the corresponding probabilities
(Kirby, Dowman, & Griffiths, 2007)
From sampling to maximizing
Implications for linguistic universals
• When learners sample from P(h|d), the distribution over languages converges to the prior– identifies a one-to-one correspondence between inductive
biases and linguistic universals
• As learners move towards maximizing, the influence of the prior is exaggerated– weak biases can produce strong universals– cultural evolution is a viable alternative to traditional
explanations for linguistic universals
Infinite populations in continuous time
• “Language dynamical equation”
• “Neutral model” (fj(x) constant)
• Stable equilibrium at first eigenvector of Q, which is our stationary distribution
dx i
dt qij f j (x)
j
x j (x)x i
(Nowak, Komarova, & Niyogi, 2001)
dx i
dt qij
j
x j x i
(Komarova & Nowak, 2003)
dx
dt(Q I)x
Analyzing iterated learning
• The outcome of iterated learning is strongly affected by the inductive biases of the learners– hypotheses with high prior probability ultimately
appear with high probability in the population
• Clarifies the connection between constraints on language learning and linguistic universals…
• …and provides formal justification for the idea that culture reflects the structure of the mind
Outline
Part I: Formal analysis of iterated learning
Part II: Iterated learning in the lab
Inductive problems
blicket toma
dax wug
blicket wug
S X Y
X {blicket,dax}
Y {toma, wug}
Learning languages from utterances
Learning functions from (x,y) pairs
Learning categories from instances of their members
Revealing inductive biases
• Many problems in cognitive science can be formulated as problems of induction– learning languages, concepts, and causal relations
• Such problems are not solvable without bias(e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995)
• What biases guide human inductive inferences?
If iterated learning converges to the prior, then it may provide a method for investigating biases
Serial reproduction(Bartlett, 1932)
General strategy
• Step 1: use well-studied and simple tasks for which people’s inductive biases are known– function learning– concept learning
• Step 2: explore learning problems where effects of inductive biases are controversial– frequency distributions– systems of color terms
Iterated function learning
• Each learner sees a set of (x,y) pairs
• Makes predictions of y for new x values
• Predictions are data for the next learner
data hypotheses
(Kalish, Griffiths, & Lewandowsky, 2007)
Function learning experiments
Stimulus
Response
Slider
Feedback
Examine iterated learning with different initial data
1 2 3 4 5 6 7 8 9
IterationInitialdata
Iterated concept learning
• Each learner sees examples from a species
• Identifies species of four amoebae
• Species correspond to boolean concepts
data hypotheses
(Griffiths, Christian, & Kalish, 2006)
Types of concepts(Shepard, Hovland, & Jenkins, 1961)
Type I
Type II
Type III
Type IV
Type V
Type VI
shape
size
color
Results of iterated learningPr
obab
ilit
y
Iteration
Prob
abil
ity
Iteration
Human learners Bayesian model
Frequency distributions
• Each learner sees objects receiving two labels
• Produces labels for those objects at test
• First learner: one label {0,1,2,3,4,5}/10 times
data hypotheses
(Reali & Griffiths, submitted)
5 x “DUP”
5 x “NEK”P(“DUP”| ) =
(Vouloumanos, 2008)
Results after one generation
Fre
quen
cy o
f ta
rget
labe
l
Condition
Results after five generations
Frequency of target label
The Wright-Fisher model
• Basic model: x copies of gene A in population of N
• With mutation…
x(t1) ~ Binomial(N,) =x(t )
N
x(t1) ~ Binomial(N,) =x(t )(1 u) (N x(t ))v
N
Iterated learning and Wright-Fisher
• Basic model is MAP with uniform prior, mutation model is MAP with Beta prior
• Extends to other models of genetic drift…– connection between drift models and inference
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
x x x
Nv
1 u v,
Nu
1 u v
with Florencia Reali
Cultural evolution of color terms
with Mike Dowman and Jing Xu
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
Identifying inductive biases
• Formal analysis suggests that iterated learning provides a way to determine inductive biases
• Experiments with human learners support this idea– when stimuli for which biases are well understood are used,
those biases are revealed by iterated learning
• What do inductive biases look like in other cases?– continuous categories– causal structure– word learning– language learning
Conclusions
• Iterated learning provides a lens for magnifying the inductive biases of learners– small effects for individuals are big effects for groups
• When cognition affects culture, studying groups can give us better insight into individuals
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.data
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
CreditsJoint work with…
Brian ChristianMike Dowman
Mike KalishSimon Kirby
Steve LewandowskyFlorencia Reali
Jing Xu
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
Computational Cognitive Science Labhttp://cocosci.berkeley.edu/