Much ado about nothing: A social network model of Russian...
Learning defectiveness: Russian verbal gaps and implicit negative evidence
Andrea D. Sims, Robert T. Daland & Janet B. Pierrehumbert
Northwestern University
{andreasims, rdaland, jbp}@northwestern.edu
2nd Annual Meeting of the Slavic Linguistics Society
Berlin, Germany (ZAS)
August 23, 2007
Outline
1) The phenomenon: Russian paradigmatic gaps
2) The mystery: How do children learn inflectional defectiveness?
  – Previous accounts: Assumption that children don’t learn from negative evidence
3) Our account: Morphosyntactic learning as Bayesian estimation over the paradigm
  – A.k.a., in defense of learning defectiveness via implicit negative evidence
4) Testing the account: A multigenerational social network model
5) Conclusions
1) The phenomenon: Russian paradigmatic gaps
Russian paradigmatic gaps
      pylesosit’ ‘to vacuum’   ubedit’ ‘to convince’    sprosit’ ‘to ask’
1s    *                        *                        sprošu
2s    pylesosiš’               ubediš’                  sprosiš’
3s    pylesosit                ubedit                   sprosit
1p    pylesosim                ubedim                   sprosim
2p    pylesosite               ubedite                  sprosite
3p    pylesosjat               ubedjat                  sprosjat
• 2nd-conjugation dental stems have an alternation in the 1SG NONPAST (tʲ ~ č/šʲ, dʲ ~ ž, sʲ ~ š, zʲ ~ ž).
• Approximately 100 verbs in this subclass lack 1SG NONPAST forms (Halle 1973).
• Many Russian 1SG gaps originated in the mid-19th century (Baerman 2007); some entered the language in the 20th century (e.g., pylesosit’ ‘to vacuum’).
• Attested in:
  – Dictionaries & grammars (e.g., Ožegov 1972, Švedova 1982)
  – Production experiment data (Alley, Sims & Brookes 2006)
  – Corpus data
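Because the alternation is exceptionless in its environment, the expected 1SG form of any verb in this subclass can be computed mechanically; the gap is not a matter of uncertainty about the form. The Python sketch below illustrates this with a hypothetical helper, `expected_1sg`, written only for this illustration. It assumes the transliteration used on the slides, takes tʲ ~ č (ignoring the šʲ variant), and rejects stems ending in clusters such as st or zd rather than modeling them.

```python
# Hypothetical helper for illustration: predict the 1SG NONPAST form of a
# 2nd-conjugation verb in -it’ whose stem ends in a single dental consonant.
# Only the alternations listed on the slide are modeled, with tʲ -> č
# (the šʲ variant and clusters such as st or zd are not handled).

ALTERNATION = {"t": "č", "d": "ž", "s": "š", "z": "ž"}
SUFFIX = "it’"  # infinitive ending, typographic apostrophe as on the slides


def expected_1sg(infinitive: str) -> str:
    """Return the predicted (possibly unattested) 1SG NONPAST form."""
    if not infinitive.endswith(SUFFIX):
        raise ValueError(f"not an -it’ verb: {infinitive}")
    stem = infinitive[: -len(SUFFIX)]
    if not stem or stem[-1] not in ALTERNATION:
        raise ValueError(f"stem does not end in a dental: {infinitive}")
    if len(stem) > 1 and stem[-2] in "sz":
        raise ValueError(f"consonant clusters are not modeled: {infinitive}")
    # Replace the stem-final dental with its alternant and add the ending -u.
    return stem[:-1] + ALTERNATION[stem[-1]] + "u"


for verb in ["sprosit’", "ubedit’", "pylesosit’"]:
    print(verb, "->", expected_1sg(verb))
```

For the three verbs in the table this yields the attested sprošu together with ubežu and pylesošu, the regular forms that speakers leave unused.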
2) The mystery: How do children learn inflectional defectiveness?
The mystery
• Observation: Inflectional morphology is highly productive. Children easily produce inflected forms that they have never heard (Berko 1958).
• The mystery: How do children learn that certain verbs have paradigmatic gaps?
  – Gaps are inflected forms that speakers have never heard, or hear very infrequently. Why don’t children (and adults) fill them in?
The mystery
• Two good possibilities
  – Learned from implicit negative evidence
  – Not “learned” directly: Gaps are an epiphenomenal result of some other aspect of the system, such as competing / incompatible rules
• Many studies of gaps inherit from the generativist tradition the belief that children do not learn via (explicit or) implicit negative evidence (e.g., Baronian 2005, Pertsova 2005, Rice 2003)
• Resulting conclusion: Synchronic grammar conflict / competition must cause the gaps
  – Hudson (2000:298): “... neither [conservative learning nor learning from usage statistics] will turn out to lead to a solution…There must be something about the grammar of English that causes the [*amn’t] gap in a way that speakers don’t need any evidence for it and don’t try to fill it.”
Competition won’t work in Russian
• The 1SG form is predictable. In Contemporary Standard Russian (CSR), the alternation applies absolutely in its conditioning environment (compare Albright 2003 on grammar competition as the cause of Spanish gaps).
• No competition, but gaps are transmitted anyway.
  – See Alley et al. (2006) for some complications in nonstandard Russian that don’t change this fundamental point.
Negative evidence revisited
• The child language acquisition literature suggests that children can and do learn from implicit negative evidence (cf. Sokolov & Snow 1994)
  – Sensitivity to the absence of an expected structure (Tenenbaum & Griffiths 2001, Regier & Gahl 2004)
• Application to the Russian verbal gaps
  – Cannot be grammar conflict (Alley et al. 2006)
  – Our argument: Speakers are sensitive to the absence of a given combination of lemma + inflectional property set (IPS) (i.e., a paradigm cell)
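A simple way to see why absence can carry information is to ask how surprising the ubedit’ data would be if the verb behaved like an average lemma. The Python sketch below uses the counts reported later in the talk (453 ubedit’ tokens, only one of them 1SG NONPAST) and the all-lemma 1SG share of 9.6%, and computes a binomial tail probability. It is an illustration of implicit negative evidence under an assumption of independent tokens, not the learning model of the talk.

```python
from math import comb


def prob_at_most(k: int, n: int, p: float) -> float:
    """Binomial probability of at most k hits among n independent tokens."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


N_TOKENS = 453             # ubedit’ tokens in the corpus sample
OBSERVED_1SG = 1           # ubedit’ tokens in the 1SG NONPAST cell
AVERAGE_1SG_SHARE = 0.096  # 1SG share averaged over all lemmas

expected_count = N_TOKENS * AVERAGE_1SG_SHARE
p_low = prob_at_most(OBSERVED_1SG, N_TOKENS, AVERAGE_1SG_SHARE)
print(f"expected 1SG tokens: {expected_count:.1f}")
print(f"P(at most {OBSERVED_1SG} 1SG token | average behavior): {p_low:.1e}")
```

About 43.5 1SG tokens would be expected, and seeing at most one has a probability on the order of 10^-19, so the near-absence is itself strong evidence about the lemma.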
3) Our account: Morphosyntactic learning as Bayesian estimation over the paradigm
Morphosyntactic learning as Bayesian estimation
• The learner infers the probability distribution over inflectional property sets (IPSs) for each lemma (cf. Baayen 2007)
• Application to UBEDIT’
  – The learner hears many UBEDIT’ tokens, but no or few tokens of UBEDIT’ + 1SG.NONPAST
  – The learner infers that the relative absence of UBEDIT’ + 1SG.NONPAST is a property of UBEDIT’
         ubedit’                      All lemmas
Cell     Raw #     Relative freq (%)  Relative freq (%)
1s       1         0.2                9.6
2s       53        11.7               7.2
3s       210       46.4               48.4
1p       27        6.0                7.2
2p       71        15.7               6.4
3p       91        20.1               21.2
Sum      453       100                100
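The relative frequencies in the table follow directly from the raw counts. The Python sketch below recomputes them for ubedit’ and divides each by the all-lemma value, which makes the deviation in the 1s cell explicit; the ratio is an illustrative measure of deviation, not one defined in the talk.

```python
CELLS = ["1s", "2s", "3s", "1p", "2p", "3p"]

# Raw counts for ubedit’ and all-lemma relative frequencies (%), from the table above.
UBEDIT_COUNTS = {"1s": 1, "2s": 53, "3s": 210, "1p": 27, "2p": 71, "3p": 91}
ALL_LEMMAS_PCT = {"1s": 9.6, "2s": 7.2, "3s": 48.4, "1p": 7.2, "2p": 6.4, "3p": 21.2}


def relative_freq(counts: dict) -> dict:
    """Convert raw counts per cell into percentages of the lemma's tokens."""
    total = sum(counts.values())
    return {cell: 100 * n / total for cell, n in counts.items()}


ubedit_pct = relative_freq(UBEDIT_COUNTS)
for cell in CELLS:
    ratio = ubedit_pct[cell] / ALL_LEMMAS_PCT[cell]
    print(f"{cell}: ubedit’ {ubedit_pct[cell]:5.1f}%  "
          f"all lemmas {ALL_LEMMAS_PCT[cell]:4.1f}%  ratio {ratio:.2f}")
```

The 1s cell comes out at about 0.02 of the average share, while every other cell stays within a factor of about 2.5 of it.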
• Learning a gap involves inferring a deviation from the expected (average) probability
• However, the relative frequency of a lemma + IPS cannot be reliably estimated from a small sample
• The learner looks to lexical neighbors as needed to fill in the missing information
  – If the learner hears many tokens of a lemma (e.g., UBEDIT’), the distribution of those tokens is more influential
  – If the learner hears few tokens of a lemma (e.g., ERUNDIT’ ‘to do stupid or funny things’), the distribution of lexical neighbors is more influential
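One standard way to let token counts decide between a lemma's own distribution and its neighbors' is a posterior mean under a Dirichlet prior centered on the neighborhood average. The Python sketch below assumes exactly that; the pseudo-count `alpha` is an arbitrary illustrative value, and the talk does not state that this is the mixing rule of its model (see Daland et al. 2007 for the actual model). The neighborhood numbers are the averaged ubedit’ neighborhood shown later in the talk.

```python
NEIGHBORHOOD_PCT = {"1s": 7.9, "2s": 7.0, "3s": 48.3, "1p": 6.3, "2p": 6.6, "3p": 23.9}
UBEDIT_COUNTS = {"1s": 1, "2s": 53, "3s": 210, "1p": 27, "2p": 71, "3p": 91}


def estimate_pct(counts: dict, prior_pct: dict, alpha: float = 10.0) -> dict:
    """Posterior-mean estimate (%) of a lemma's distribution over cells.

    Assumption for illustration: the neighborhood average acts as a Dirichlet
    prior with total pseudo-count `alpha`, so a lemma's own n tokens outweigh
    the neighbors once n is large relative to alpha.
    """
    n = sum(counts.values())
    return {
        cell: 100 * (counts.get(cell, 0) + alpha * share / 100) / (n + alpha)
        for cell, share in prior_pct.items()
    }


frequent = estimate_pct(UBEDIT_COUNTS, NEIGHBORHOOD_PCT)  # 453 tokens: own data dominate
unheard = estimate_pct({}, NEIGHBORHOOD_PCT)              # 0 tokens: neighborhood only
print(f"1s for ubedit’: {frequent['1s']:.2f}%")
print(f"1s for an unheard lemma: {unheard['1s']:.2f}%")
```

With 453 tokens, ubedit’ keeps a 1s estimate below 0.5%, whereas a lemma that has not been heard at all (assuming the same neighborhood) simply inherits the neighborhood's 7.9%.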
Morphosyntactic learning as Bayesian estimation
• Hypothesis: Two ways to learn gaps
  – Word-specific learning for highly frequent lemmas
  – Analogically driven learning from lexical neighbors for lower-frequency lemmas
• For lower-frequency lemmas, gaps are learned to the extent that they form a morphophonologically coherent group
4) Testing the account: A multigenerational social network model
Testing our account
• Behaviors we want to capture
  – Word-specific learning of gaps in frequent verbs
  – Learning gaps “analogically” from morphophonologically similar neighbors
• Our account captures these behaviors with a computational model of multiple generations of agents embedded in a social network (see Daland et al. 2007 for details)
• Major results
  – Many existing gaps persist for multiple generations
  – New gaps are created
  – Evidence of both word-specific and analogically driven learning of gaps
Target: ubedit’

1) Lexical neighborhood
   baraxlit’, begat’, bespokoit’, brodit’, vzdut’, vesit’, vzvintit’, vil’nut’, dobavit’, dobit’, …, pobedit’, sudit’, ekonomit’, jutit’sja, javit’sja

   Relative freq (%)      1s     2s     3s     1p     2p     3p
   sudit’                 11.8   5.8    41.9   6.2    6.0    28.5
   pobedit’               5.2    5.7    54.4   5.1    5.4    24.2
   vil’nut’               7.1    8.3    51.6   6.9    6.7    19.4
   vzdut’                 7.3    9.1    46.3   7.2    7.1    23.0
   begat’                 8.1    6.2    47.6   6.1    7.7    24.2

2) Average relative freq for lexical neighborhood (%)
                          7.9    7.0    48.3   6.3    6.6    23.9

3) Mix attested relative freq & neighborhood relative freq (ubedit’)
   Raw tokens             1      53     210    27     71     91
   Attested (%)           0.2    11.7   46.4   6.0    15.7   20.1
   Predicted (%)          0.3    11.6   46.4   6.0    15.6   20.2
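Step 2 can be checked directly: averaging, cell by cell, the five neighbor distributions shown reproduces the slide's averaged row to within 0.1 percentage point (for 3s, 48.36 against 48.3). The Python sketch below does this for the unweighted condition and also accepts per-neighbor weights as a stand-in for the morphophonologically weighted condition; the weights in the example are hypothetical, since the weighting function is not given here.

```python
# Relative frequencies (%) of the neighbors shown for ubedit’, cells 1s..3p.
CELLS = ["1s", "2s", "3s", "1p", "2p", "3p"]
NEIGHBORS = {
    "sudit’": [11.8, 5.8, 41.9, 6.2, 6.0, 28.5],
    "pobedit’": [5.2, 5.7, 54.4, 5.1, 5.4, 24.2],
    "vil’nut’": [7.1, 8.3, 51.6, 6.9, 6.7, 19.4],
    "vzdut’": [7.3, 9.1, 46.3, 7.2, 7.1, 23.0],
    "begat’": [8.1, 6.2, 47.6, 6.1, 7.7, 24.2],
}


def neighborhood_average(neighbors, weights=None):
    """Average the neighbors' distributions cell by cell.

    With weights=None every neighbor counts equally (the unweighted condition);
    a dict of per-neighbor weights gives a weighted average instead.
    """
    if weights is None:
        weights = {name: 1.0 for name in neighbors}
    total = sum(weights[name] for name in neighbors)
    return [
        sum(weights[name] * dist[i] for name, dist in neighbors.items()) / total
        for i in range(len(CELLS))
    ]


unweighted = neighborhood_average(NEIGHBORS)
print(dict(zip(CELLS, (round(x, 2) for x in unweighted))))

# Hypothetical similarity weights, favoring the -dit’ neighbors of ubedit’.
weighted = neighborhood_average(
    NEIGHBORS,
    {"sudit’": 1.0, "pobedit’": 1.0, "vil’nut’": 0.5, "vzdut’": 0.5, "begat’": 0.5},
)
print(f"weighted 1s: {weighted[0]:.2f}")
```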
Testing our account
• Adults talk (100,000 verb tokens each), children listen
• End of generational cycle: adults die off; children learn the grammar, mature, and reproduce
• Speech of the new adults is based on the grammar that they learned
• 10 generations
• 50 adults and 50 children per generation
• Each child connected to 10 adults on average (random network)
• First generation seeded by random sampling from the Russian National Corpus (RNC)
• Conditions
  – Two types of analogical influence from the lexical neighborhood: unweighted vs. morphophonologically weighted
  – Four levels of analogical influence
• Evaluation questions
  – Do existing gaps persist for multiple generations?
  – Are new gaps created?
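The network and the listening step can be sketched in a few lines of Python. This sketch assumes a random bipartite network in which each child-adult pair is linked independently with probability 10/50, which gives 10 adults per child on average, and it tracks a single lemma; `tokens_per_adult` is a hypothetical small value rather than the 100,000 verb tokens per adult of the actual simulation. It is not the authors' code.

```python
import random

N_ADULTS = 50
N_CHILDREN = 50
MEAN_ADULTS_PER_CHILD = 10


def random_network(n_children, n_adults, mean_degree, rng):
    """Link each child-adult pair independently with p = mean_degree / n_adults.

    Returns one list per child: the adults that child listens to.
    """
    p = mean_degree / n_adults
    return [[a for a in range(n_adults) if rng.random() < p] for _ in range(n_children)]


def listen(adults, adult_grammars, tokens_per_adult, rng):
    """Pool the cell counts one child hears for a single lemma.

    adult_grammars[a] is adult a's probability distribution over the cells;
    each linked adult produces tokens_per_adult tokens of the lemma.
    """
    n_cells = len(adult_grammars[0])
    counts = [0] * n_cells
    for a in adults:
        heard = rng.choices(range(n_cells), weights=adult_grammars[a], k=tokens_per_adult)
        for cell in heard:
            counts[cell] += 1
    return counts


rng = random.Random(2007)
network = random_network(N_CHILDREN, N_ADULTS, MEAN_ADULTS_PER_CHILD, rng)
mean_degree = sum(len(adults) for adults in network) / N_CHILDREN
print(f"mean adults per child: {mean_degree:.2f}")

# Hypothetical: every adult uses the ubedit’ distribution (cells 1s..3p).
UBEDIT = [0.002, 0.117, 0.464, 0.060, 0.157, 0.201]
grammars = [UBEDIT] * N_ADULTS
print("child 0 heard:", listen(network[0], grammars, tokens_per_adult=20, rng=rng))
```

Under this construction some children are linked to more adults than others (and, rarely, to none); the talk specifies only the average.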
Definition of a paradigmatic gap
• Corpus data (RNC)
• Gap criteria
  – Remove sampling errors: raw lemma frequency > 1 per 2 million words
  – No impersonal verbs: 3sg + 3pl < 85% relative frequency
  – 1sg < 2% relative frequency (2% = valley of the bimodal distribution)
• 56 gaps in Gen 0 (out of 808 candidate lemmas)
  – Reasonable agreement with prescriptive sources
         ubedit’                      All lemmas
Cell     Raw #     Relative freq (%)  Relative freq (%)
1s       1         0.2                9.6
2s       53        11.7               7.2
3s       210       46.4               48.4
1p       27        6.0                7.2
2p       71        15.7               6.4
3p       91        20.1               21.2
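The three criteria translate into a simple predicate. The Python sketch below applies them to the ubedit’ profile and to the all-lemma average. The corpus size is a hypothetical placeholder, since the RNC sample size is not given here, and reading the impersonal-verb criterion as "3sg + 3pl relative frequency below 85%" is an interpretation of the slide.

```python
def meets_gap_criteria(lemma_tokens, corpus_words, pct):
    """Apply the three gap criteria to one lemma.

    pct maps cells ("1s", ..., "3p") to relative frequencies in percent.
    """
    frequent_enough = lemma_tokens / corpus_words > 1 / 2_000_000  # > 1 per 2M words
    not_impersonal = pct["3s"] + pct["3p"] < 85.0
    rare_1sg = pct["1s"] < 2.0
    return frequent_enough and not_impersonal and rare_1sg


UBEDIT_PCT = {"1s": 0.2, "2s": 11.7, "3s": 46.4, "1p": 6.0, "2p": 15.7, "3p": 20.1}
ALL_LEMMAS_PCT = {"1s": 9.6, "2s": 7.2, "3s": 48.4, "1p": 7.2, "2p": 6.4, "3p": 21.2}
CORPUS_WORDS = 10_000_000  # hypothetical placeholder, not the actual RNC size

print(meets_gap_criteria(453, CORPUS_WORDS, UBEDIT_PCT))      # → True
print(meets_gap_criteria(453, CORPUS_WORDS, ALL_LEMMAS_PCT))  # → False
```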
Persistence of gaps
• For each lemma, count how many generations it met the gap criteria
• Histogram: For each number of generations n, how many lemmas had a 1sg gap for n generations (out of 10 possible)?
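The persistence measure amounts to counting, for each lemma, the generations in which it met the gap criteria and then tabulating those counts. A minimal Python sketch, assuming each generation's gaps are available as a set of lemma names; the toy input uses placeholder names and is not simulation output.

```python
from collections import Counter


def gap_lifetime_histogram(gaps_by_generation):
    """Map n -> number of lemmas that met the gap criteria in n generations.

    gaps_by_generation holds one set of gap lemmas per simulated generation.
    Lemmas that were never gaps do not appear (n = 0 is not counted).
    """
    generations_as_gap = Counter()
    for gaps in gaps_by_generation:
        generations_as_gap.update(gaps)
    return Counter(generations_as_gap.values())


# Toy input with placeholder lemma names, not simulation output.
toy_run = [{"A", "B"}, {"A"}, {"A", "C"}]
print(sorted(gap_lifetime_histogram(toy_run).items()))  # → [(1, 2), (3, 1)]
```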
Gap lifetimes: Unweighted influence from lexical neighbors
[Figure: histograms of gap lifetimes, with panels ranging from least to most analogical force of the lexical neighborhood (τ = 4.95, τ = 3.46, τ = 1.91, τ = 1.46)]
• Some gaps persist for as many as 10 generations
  – Word-specific learning
• Influence from the lexical neighborhood shortens the lifetimes of gaps
Gap lifetimes: MP-weighted influence from lexical neighbors
• Weighting by morphophonological (MP) similarity increases the lifetime of a given 1sg gap
• The relationship between dental stems and gaps is self-reinforcing
• The interaction is much like a lexical gang effect, only at the morphosyntactic level
  – It can draw in new gaps
New gaps
Number of gaps by generation: Unweighted vs. MP-weighted influence of the lexical neighborhood (LN)
• Weighting by morphophonological similarity increases the number of gaps in a generation when analogical influence is strong
• The number of gaps per generation increases, then reaches a point of local stability
Conclusions
• Bayesian morphosyntactic learning can explain the persistence of Russian paradigmatic gaps
  – Existing gaps persist and new gaps are created
• Predicting gaps from probability distributions = learning from implicit negative evidence
  – Contra most previous accounts of paradigmatic gaps, but cf. Johansson (1999)
• It is not random that gaps follow the distribution of the alternation
  – The morphosyntactic distribution (low 1sg relative frequency) is promoted by morphophonological coherence
Acknowledgments
• Luis Amaral and his Network Theory class (Fall ’06)
• Phonatics & NICO at Northwestern University
• Andrew W. Mellon Postdoctoral Fellowship
References
• Albright, Adam (2003). “A quantitative study of Spanish paradigm gaps”. WCCFL 22, 1–14.
• Alley, Maria, Andrea D. Sims and Bryan Brookes (2006). “On Russian verbal gaps and nonoptimality in language”. Paper presented at the First Annual Meeting of the Slavic Linguistics Society in Bloomington, IN, September 8–10, 2006.
• Baayen, R. Harald (2007). Paradigmatic structure in speech production. Paper presented at the 43rd Annual Meeting of the Chicago Linguistic Society in Chicago, IL, May 3–5, 2007.
• Baerman, Matthew (2007). “The diachrony of defectiveness”. Paper presented at the 43rd Annual Meeting of the Chicago Linguistic Society in Chicago, IL, May 3–5, 2007.
• Baronian, Luc V. (2005). North of phonology. Ph.D. thesis: Linguistics Department, Stanford University.
• Berko, Jean (1958). “The Child's Learning of English Morphology”. Word 14, 150–177.
• Brown, Roger and Camille Hanlon (1970). “Derivational complexity and order of acquisition in child speech”. In Cognition and the development of language, John R. Hayes, ed. New York: Wiley, 11–53.
• Daland, Robert, Andrea D. Sims and Janet Pierrehumbert (2007). “Much ado about nothing: A social network model of Russian paradigmatic gaps”. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics in Prague, Czech Republic, June 24–29, 2007. Annie Zaenen and Antal van den Bosch, eds. Prague: Association for Computational Linguistics, 936–943.
• Gold, E. Mark (1967). “Language identification in the limit”. Information and control 10(5), 447–474.
• Halle, Morris (1973). “Prolegomena to a theory of word inflection”. Linguistic inquiry 4, 3–16.
• Hudson, Richard (2000). “*I amn’t”. Language 76(2), 297–323.
• Johansson, Christer (1999). “Learning what cannot be by failing expectations”. Nordic journal of linguistics 22(1), 61–76.
• Morgan, James L. and Lisa L. Travis (1989). “Limits on negative information”. Journal of child language 16(3), 531–552.
• Ožegov, S.I. (1972). Slovar’ russkogo jazyka. Moskva: Sov. Enciklopedija.
• Pertsova, Katya (2005). “How lexical conservatism can lead to paradigm gaps”. UCLA working papers in linguistics 11, 13–38.
• Regier, Terry and Susanne Gahl (2004). “Learning the unlearnable: The role of missing evidence”. Cognition 93, 147–155.
• Rice, Curt (2003). “Dialectal variation in Norwegian imperatives”. Nordlyd 31, 372–384.
• Sims, Andrea (2006). Minding the gaps: Inflectional defectiveness in paradigmatic morphology. Ph.D. thesis: Linguistics Department, The Ohio State University.
• Sokolov, Jeffrey L. and Catherine E. Snow (1994). “The changing role of negative evidence in theories of language development”. In Input and interaction in language development. Clare Gallaway and Brian J. Richards, eds. Cambridge: Cambridge University Press, 38–55.
• Švedova, Ju. (1982). Grammatika sovremennogo russkogo literaturnogo jazyka. Moskva: Nauka.
• Tenenbaum, Joshua B. and Thomas L. Griffiths (2001). “Generalization, similarity, and Bayesian inference”. Behavioral and brain sciences 24, 629–640.