Much ado about nothing: A social network model of Russian...

Learning defectiveness: Russian verbal gaps and implicit negative evidence

Andrea D. Sims, Robert T. Daland & Janet B. Pierrehumbert
Northwestern University

{andreasims, rdaland, jbp}@northwestern.edu

2nd Annual Meeting of the Slavic Linguistics Society
Berlin, Germany (ZAS)

August 23, 2007

Outline

1) The phenomenon: Russian paradigmatic gaps
2) The mystery: How do children learn inflectional defectiveness?
   – Previous accounts: Assumption that children don’t learn from negative evidence
3) Our account: Morphosyntactic learning as Bayesian estimation over the paradigm
   – A.k.a., in defense of learning defectiveness via implicit negative evidence
4) Testing the account: A multigenerational social network model
5) Conclusions


Russian paradigmatic gaps

      sprosit’ ‘to ask’    ubedit’ ‘to convince’    pylesosit’ ‘to vacuum’
1s    sprošu               *                        *
2s    sprosiš’             ubediš’                  pylesosiš’
3s    sprosit              ubedit                   pylesosit
1p    sprosim              ubedim                   pylesosim
2p    sprosite             ubedite                  pylesosite
3p    sprosjat             ubedjat                  pylesosjat

• 2nd conjugation dental stems have an alternation in the 1SG NONPAST (tj ~ č/šj, dj ~ ž, sj ~ š, zj ~ ž).

• Approx. 100 verbs in this subclass lack 1SG NONPAST forms (Halle 1973).

• Many Russian 1SG gaps originated in the mid-19th century (Baerman 2007); some entered the language in the 20th century (e.g. pylesosit’ ‘to vacuum’).

• Attested in...
   – Dictionaries & grammars (e.g., Ožegov 1972, Švedova 1982)
   – Production experiment data (Alley, Sims & Brookes 2006)
   – Corpus data


The mystery

• Observation: Inflectional morphology is highly productive. Children easily produce inflected forms that they have never heard (Berko 1958).

• The mystery: How do children learn that certain verbs have paradigmatic gaps?
   – Gaps are inflected forms that speakers have never heard, or hear very infrequently. Why don’t children (and adults) fill them in?

The mystery

• Two good possibilities
   – Learned from implicit negative evidence
   – Not “learned” directly: Gaps are an epiphenomenal result of another aspect of the system, such as competing / incompatible rules

• Many studies of gaps inherit from the generativist tradition the belief that children do not learn via (explicit or) implicit negative evidence (e.g., Baronian 2005, Pertsova 2005, Rice 2003)

• Resulting conclusion: Synchronic grammar conflict / competition must cause the gaps
   – Hudson (2000:298): “... neither [conservative learning nor learning from usage statistics] will turn out to lead to a solution… There must be something about the grammar of English that causes the [*amn’t] gap in a way that speakers don’t need any evidence for it and don’t try to fill it.”

Competition won’t work in Russian

• The 1SG form is predictable. In Contemporary Standard Russian (CSR), the alternation applies absolutely in its conditioning environment (compare Albright 2003 on grammar competition as the cause of Spanish gaps)

• No competition, but gaps are transmitted anyway.
   – See Alley et al. (2006) for some complications in non-standard Russian that don’t change this fundamental point.

Negative evidence revisited

• Child language acquisition literature suggests that children can and do learn from implicit negative evidence (cf. Sokolov & Snow 1994)
   – Sensitivity to the absence of an expected structure (Tenenbaum & Griffiths 2001, Regier & Gahl 2004)

• Application to the Russian verbal gaps
   – Cannot be grammar conflict (Alley et al. 2006)
   – Our argument: Speakers are sensitive to the absence of a given combination of lemma + inflectional property set (IPS) (i.e., paradigm cell)


Morphosyntactic learning as Bayesian estimation

• Learner infers the probability distribution over inflectional property sets (IPSs) for each lemma (cf. Baayen 2007)

• Application to UBEDIT’
   – Learner hears many UBEDIT’ tokens, but no or few tokens of UBEDIT’ + 1SG.NONPAST
   – Learner infers that the relative absence of UBEDIT’ + 1SG.NONPAST is a property of UBEDIT’

        ubedit’                        All lemmas
        Raw #    Relative freq (%)     Relative freq (%)
1s        1         0.2                   9.6
2s       53        11.7                   7.2
3s      210        46.4                  48.4
1p       27         6.0                   7.2
2p       71        15.7                   6.4
3p       91        20.1                  21.2
Sum     453       100                   100
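The comparison in the table can be sketched in a few lines of Python. This is only an illustration using the counts and percentages given on the slide; the function names `relative_freq_pct` and `deviation_ratio` are made up here, and the observed/expected ratio is an assumed way of expressing "deviation", not a measure taken from the talk.

```python
# Raw nonpast token counts for ubedit' as given on the slide.
UBEDIT_COUNTS = {"1s": 1, "2s": 53, "3s": 210, "1p": 27, "2p": 71, "3p": 91}
# Relative frequencies (%) over all lemmas, as given on the slide.
ALL_LEMMAS_PCT = {"1s": 9.6, "2s": 7.2, "3s": 48.4, "1p": 7.2, "2p": 6.4, "3p": 21.2}


def relative_freq_pct(counts):
    """Convert raw cell counts to percentages of the lemma's total."""
    total = sum(counts.values())
    return {cell: 100.0 * n / total for cell, n in counts.items()}


def deviation_ratio(counts, baseline_pct):
    """Observed share divided by expected share, per paradigm cell.

    Values far below 1 mark cells that are much rarer than expected.
    (Assumed illustration; the talk does not name a specific measure.)
    """
    observed = relative_freq_pct(counts)
    return {cell: observed[cell] / baseline_pct[cell] for cell in counts}


if __name__ == "__main__":
    for cell, pct in relative_freq_pct(UBEDIT_COUNTS).items():
        print(f"{cell}: {pct:.1f}%")
    for cell, ratio in deviation_ratio(UBEDIT_COUNTS, ALL_LEMMAS_PCT).items():
        print(f"{cell}: observed/expected = {ratio:.2f}")
```

For ubedit’ the 1sg cell comes out at a small fraction of its expected share, while the other cells stay within a factor of about 2.5 of the all-lemma average.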

• Learning a gap involves inferring deviation from the expected (average) probability

• However, the relative frequency of a lemma + IPS cannot be reliably estimated from a small sample

• Learner looks to lexical neighbors as needed to fill in the missing information
   – If learner hears many tokens of a lemma (e.g. UBEDIT’), the distribution of those tokens is more influential
   – If learner hears few tokens of a lemma (e.g. ERUNDIT’ ‘to do stupid or funny things’), the distribution of lexical neighbors is more influential
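One common way to get this trade-off is to treat the neighborhood as a fixed number of pseudo-tokens, so that the lemma's own data gains weight as its token count grows. The sketch below assumes that scheme; `own_data_weight` and the value `prior_strength=5.0` are hypothetical and are not taken from the talk.

```python
def own_data_weight(n_tokens, prior_strength=5.0):
    """Share of the estimate that comes from the lemma's own tokens.

    prior_strength is a hypothetical number of pseudo-tokens contributed by
    the lexical neighborhood; it is not a value reported in the talk.
    """
    return n_tokens / (n_tokens + prior_strength)


if __name__ == "__main__":
    # 453 is the ubedit' token total given on the slide.
    print(f"frequent lemma: {own_data_weight(453):.3f}")
    # 3 is a made-up token count standing in for a rare lemma.
    print(f"rare lemma:     {own_data_weight(3):.3f}")
```

Under this assumption a lemma heard hundreds of times is estimated almost entirely from its own tokens, while a lemma heard a handful of times is estimated mostly from its neighbors.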


• Hypothesis: Two ways to learn gaps
   – Word-specific learning for highly frequent lemmas
   – Analogically-driven learning from lexical neighbors for lower frequency lemmas

• For lower frequency lemmas, gaps are learned to the extent that they form a morphophonologically coherent group








Testing our account

• Behaviors we want to capture
   – Word-specific learning of gaps in frequent verbs
   – Learning gaps “analogically” from morphophonologically similar neighbors

• Our account captures these behaviors in a computational model of multiple generations of agents embedded in a social network (see Daland et al. 2007 for details)

• Major results
   – Many existing gaps persist for multiple generations
   – New gaps are created
   – Evidence of both word-specific and analogically-driven learning of gaps

Worked example (target: ubedit’); relative frequencies in %

1) Lexical nbhd: baraxlit’, begat’, bespokoit’, brodit’, vzdut’, vesit’, vzvintit’, vil’nut’, dobavit’, dobit’, …, pobedit’, sudit’, ekonomit’, jutit’sja, javit’sja

neighborhood    1s      2s     3s      1p     2p     3p
begat’           8.1    6.2    47.6    6.1    7.7    24.2
vzdut’           7.3    9.1    46.3    7.2    7.1    23.0
vil’nut’         7.1    8.3    51.6    6.9    6.7    19.4
pobedit’         5.2    5.7    54.4    5.1    5.4    24.2
sudit’          11.8    5.8    41.9    6.2    6.0    28.5

2) Average relative freq for lexical neighborhood

                                              1s     2s     3s      1p     2p     3p
Average relative freq (lexical neighborhd)    7.9    7.0    48.3    6.3    6.6    23.9

3) Mix attested relative freq & nbhd relative freq

                                       1s     2s      3s      1p     2p      3p
Raw tokens (ubedit’)                   1      53      210     27     71      91
Attested relative freq (ubedit’)       0.2    11.7    46.4    6.0    15.7    20.1
Predicted relative freq (ubedit’)      0.3    11.6    46.4    6.0    15.6    20.2

w (weight values as extracted from the slide): 11, 1/31, 2/3
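The three steps can be walked through with the numbers on the slide. The sketch below averages the five neighbor rows that the slide shows and mixes that average with the raw ubedit’ counts as pseudo-counts. The mixing formula and the value `prior_strength=5.0` are assumptions made for illustration; the slide's actual weighting scheme is not recoverable from the extracted text.

```python
CELLS = ("1s", "2s", "3s", "1p", "2p", "3p")

# Neighbor rows shown on the slide (relative frequencies in %).
NEIGHBORS = {
    "begat'":   (8.1, 6.2, 47.6, 6.1, 7.7, 24.2),
    "vzdut'":   (7.3, 9.1, 46.3, 7.2, 7.1, 23.0),
    "vil'nut'": (7.1, 8.3, 51.6, 6.9, 6.7, 19.4),
    "pobedit'": (5.2, 5.7, 54.4, 5.1, 5.4, 24.2),
    "sudit'":   (11.8, 5.8, 41.9, 6.2, 6.0, 28.5),
}
# Raw ubedit' token counts from the slide, in CELLS order.
UBEDIT_RAW = (1, 53, 210, 27, 71, 91)


def neighborhood_average(rows):
    """Step 2: unweighted mean of the neighbors' relative frequencies."""
    rows = list(rows)
    return tuple(sum(r[i] for r in rows) / len(rows) for i in range(len(CELLS)))


def mix(raw_counts, nbhd_pct, prior_strength):
    """Step 3 (assumed form): add neighborhood pseudo-counts, renormalize to %.

    prior_strength is a hypothetical number of pseudo-tokens.
    """
    nbhd_total = sum(nbhd_pct)  # rounded rows do not sum to exactly 100
    total = sum(raw_counts) + prior_strength
    return tuple(
        100.0 * (n + prior_strength * p / nbhd_total) / total
        for n, p in zip(raw_counts, nbhd_pct)
    )


if __name__ == "__main__":
    avg = neighborhood_average(NEIGHBORS.values())
    predicted = mix(UBEDIT_RAW, avg, prior_strength=5.0)
    for cell, a, p in zip(CELLS, avg, predicted):
        print(f"{cell}: nbhd avg {a:5.1f}   predicted {p:5.1f}")
```

With these inputs the neighborhood averages and the mixed estimates come out close to the slide's rows, but the prior strength was picked by hand and should not be read as the model's actual parameter.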


Testing our account

• Adults talk (100,000 verbs each), children listen

• End of generational cycle: adults die off, children learn grammar, mature, reproduce

• Speech of new adults based on the grammar that they learned

• 10 generations

• 50 adults and 50 children per generation

• Each child connected to 10 adults on average (random network)

• First generation seeded by random sampling from the Russian National Corpus (RNC)

• Conditions
   – Two types of analogical influence from lexical neighborhood: unweighted vs. morphophonologically weighted
   – Four levels of analogical influence

• Evaluation questions
   – Do existing gaps persist for multiple generations?
   – Are new gaps created?
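The generational cycle described above can be sketched as a short simulation loop. This is a toy illustration and not the model of Daland et al. (2007): the learner smooths toward the pooled distribution of everything it heard rather than toward a lexical neighborhood, each child listens to exactly `n_links` adults rather than 10 on average, lemmas are chosen uniformly, and `TOY_SEED`, `speak`, `learn`, `simulate` and `prior_strength` are all hypothetical names and values.

```python
import random

CELLS = ("1s", "2s", "3s", "1p", "2p", "3p")

# Toy seed grammar: slide percentages for ubedit' and for the all-lemma average.
TOY_SEED = {
    "ubedit'": [0.2, 11.7, 46.4, 6.0, 15.7, 20.1],
    "average_verb": [9.6, 7.2, 48.4, 7.2, 6.4, 21.2],
}


def speak(grammar, n_tokens, rng):
    """An adult produces n_tokens (lemma, cell) pairs from its grammar."""
    lemmas = list(grammar)
    tokens = []
    for _ in range(n_tokens):
        lemma = rng.choice(lemmas)  # assumption: uniform lemma choice
        cell = rng.choices(CELLS, weights=grammar[lemma])[0]
        tokens.append((lemma, cell))
    return tokens


def learn(tokens, lemmas, prior_strength=5.0):
    """A child estimates one cell distribution per lemma from what it heard."""
    counts = {lemma: [0] * len(CELLS) for lemma in lemmas}
    for lemma, cell in tokens:
        counts[lemma][CELLS.index(cell)] += 1
    # Pooled distribution over all lemmas (add-one to avoid empty cells).
    pooled = [sum(c[i] for c in counts.values()) + 1 for i in range(len(CELLS))]
    pooled_total = sum(pooled)
    grammar = {}
    for lemma, c in counts.items():
        total = sum(c) + prior_strength
        grammar[lemma] = [
            (c[i] + prior_strength * pooled[i] / pooled_total) / total
            for i in range(len(CELLS))
        ]
    return grammar


def simulate(seed_grammar, generations=10, n_agents=50, n_links=10,
             tokens_per_adult=100_000, seed=0):
    """Run the talk/listen/learn/replace cycle; return grammars per generation."""
    rng = random.Random(seed)
    adults = [seed_grammar] * n_agents
    history = []
    for _ in range(generations):
        speech = [speak(g, tokens_per_adult, rng) for g in adults]
        children = []
        for _ in range(n_agents):
            heard = []
            for idx in rng.sample(range(n_agents), n_links):
                heard.extend(speech[idx])
            children.append(learn(heard, seed_grammar))
        adults = children  # adults die off, children mature
        history.append(adults)
    return history


if __name__ == "__main__":
    # Small settings so the demo runs quickly; the talk's settings are the defaults.
    hist = simulate(TOY_SEED, generations=3, n_agents=5, n_links=2,
                    tokens_per_adult=500)
    for gen, agents in enumerate(hist, start=1):
        mean_1sg = sum(a["ubedit'"][0] for a in agents) / len(agents)
        print(f"generation {gen}: mean 1sg share for ubedit' = {mean_1sg:.4f}")
```

The default arguments mirror the numbers on the slide (10 generations, 50 agents, 10 links, 100,000 verbs per adult), but a pure-Python run at that scale is slow, so the demo uses much smaller values.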

Definition of a paradigmatic gap

• Corpus data (RNC)

• Gap criteria
   – Remove sampling errors: raw lemma frequency > 1 per 2 million words
   – No impersonal verbs: 3sg + 3pl < 85% of relative freq
   – 1sg < 2% relative frequency (2% = valley of bimodal distribution)

• 56 gaps in Gen 0 (out of 808 candidate lemmas)
   – Reasonable agreement with prescriptive sources
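The three criteria translate directly into a predicate over a lemma's cell counts. In the sketch below the thresholds come from the slide, while the function name `is_gap`, the use of the summed cell counts as the lemma frequency, and the corpus size in the demo are assumptions made for illustration.

```python
def is_gap(counts, corpus_size_words, min_per_words=2_000_000,
           max_third_person_share=0.85, max_1sg_share=0.02):
    """Return True if a lemma meets all three gap criteria from the slide.

    counts maps cell labels ("1s", "2s", "3s", "1p", "2p", "3p") to raw counts.
    Assumption: the summed cell counts stand in for raw lemma frequency.
    """
    total = sum(counts.values())
    if total == 0:
        return False
    # Criterion 1: frequent enough (more than 1 token per 2 million words).
    if total / corpus_size_words <= 1 / min_per_words:
        return False
    # Criterion 2: not an impersonal verb (3sg + 3pl below 85%).
    if (counts["3s"] + counts["3p"]) / total >= max_third_person_share:
        return False
    # Criterion 3: 1sg relative frequency below 2%.
    return counts["1s"] / total < max_1sg_share


if __name__ == "__main__":
    ubedit = {"1s": 1, "2s": 53, "3s": 210, "1p": 27, "2p": 71, "3p": 91}
    # Hypothetical corpus size, chosen only so that criterion 1 is satisfied.
    print(is_gap(ubedit, corpus_size_words=100_000_000))
```

With the ubedit’ counts from the slide the 1sg share is about 0.2% and the third-person share about 66%, so the lemma passes the second and third criteria; whether it passes the first depends on the corpus size supplied.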


Persistence of gaps

• For each lemma, count how many generations it met the gap criteria

• Histogram: For each number of generations n, how many lemmas had a 1sg gap for n generations (out of 10 possible)?

[Figure: Gap lifetimes: Unweighted influence from lexical neighbors. Panel labels: Least analogical force of lexical neighborhood; Most analogical force of lexical neighborhood]

• Some gaps persist for as many as 10 generations
   – Word-specific learning

• Influence from the lexical neighborhood shortens the lifetimes of gaps
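The lifetime count and its histogram can be sketched with `collections.Counter`. The input format (one set of gap lemmas per generation) and the function names are assumptions, and the demo data are made up; only the counting rule follows the description above.

```python
from collections import Counter


def gap_lifetimes(gaps_by_generation):
    """Count, for each lemma, how many generations it met the gap criteria.

    gaps_by_generation is a list with one set of lemmas per generation.
    """
    lifetimes = Counter()
    for gaps in gaps_by_generation:
        lifetimes.update(gaps)
    return lifetimes


def lifetime_histogram(lifetimes):
    """Map each lifetime n to the number of lemmas with a gap for n generations."""
    return Counter(lifetimes.values())


if __name__ == "__main__":
    # Made-up example with three generations.
    generations = [
        {"ubedit'", "pylesosit'"},
        {"ubedit'"},
        {"ubedit'", "erundit'"},
    ]
    lifetimes = gap_lifetimes(generations)
    for n, n_lemmas in sorted(lifetime_histogram(lifetimes).items()):
        print(f"{n_lemmas} lemma(s) had a gap for {n} generation(s)")
```

Note that this counts generations in which a lemma was a gap, whether or not they were consecutive, which matches the wording of the slide.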

[Plot annotations: τ = 4.95, τ = 3.46, τ = 1.91, τ = 1.46]

[Figure: Gap lifetimes: MP-weighted influence from lexical neighbors]

• Weighting by morphophonological similarity increases the lifetime of a given 1sg gap

• Relationship between dental stems and gaps is self-reinforcing

• Interaction much like a lexical gang effect, only at the morphosyntactic level
   – Can draw in new gaps
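Morphophonological weighting can be pictured as a similarity-weighted average over the neighborhood. The sketch below assumes that form. The neighbor rows are the ones shown earlier in the talk, but the similarity weights are invented for illustration (higher for the two neighbors that share the stem-final d of ubedit’) and the talk's actual similarity measure is not given in this transcript.

```python
def weighted_neighborhood_average(neighbors, similarity):
    """Similarity-weighted mean of the neighbors' cell distributions.

    neighbors maps lemma -> tuple of relative frequencies (one per cell).
    similarity maps lemma -> nonnegative weight (hypothetical values).
    """
    total_weight = sum(similarity[lemma] for lemma in neighbors)
    n_cells = len(next(iter(neighbors.values())))
    return tuple(
        sum(similarity[lemma] * dist[i] for lemma, dist in neighbors.items())
        / total_weight
        for i in range(n_cells)
    )


if __name__ == "__main__":
    # Relative frequencies in % (1s, 2s, 3s, 1p, 2p, 3p) from the worked example.
    neighbors = {
        "sudit'":   (11.8, 5.8, 41.9, 6.2, 6.0, 28.5),
        "pobedit'": (5.2, 5.7, 54.4, 5.1, 5.4, 24.2),
        "begat'":   (8.1, 6.2, 47.6, 6.1, 7.7, 24.2),
    }
    unweighted = {lemma: 1.0 for lemma in neighbors}
    # Invented weights: d-final stems count fully, begat' only a little.
    mp_weighted = {"sudit'": 1.0, "pobedit'": 1.0, "begat'": 0.2}
    print(weighted_neighborhood_average(neighbors, unweighted)[0])
    print(weighted_neighborhood_average(neighbors, mp_weighted)[0])
```

Setting all weights equal recovers the unweighted condition, so the two conditions differ only in the `similarity` argument. If the similar neighbors themselves have low 1sg shares, the weighted average pulls the target's 1sg estimate down with them, which is one way to read the self-reinforcing effect described above.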

New gaps

[Figure: Number of gaps by generation: MP-weighted influence of lexical neighborhood (LN)]

[Figure: Number of gaps by generation: Unweighted influence of LN]

• Weighting by morphophonological similarity increases the number of gaps in a generation when analogical influence is strong

• Number of gaps per generation increases, then reaches a point of local stability








Conclusions

• Bayesian morphosyntactic learning can explain the persistence of Russian paradigmatic gaps
   – Existing gaps persist and new gaps are created

• Predicting gaps from probability distributions = learning from implicit negative evidence
   – Contra most previous accounts of paradigmatic gaps, but cf. Johansson (1999)

• Not random that gaps follow the distribution of the alternation
   – Morphosyntactic distribution (low 1sg relative frequency) promoted by morphophonological coherence

Acknowledgments

• Luis Amaral and his Network Theory class (Fall ’06)
• Phonatics & NICO at Northwestern University
• Andrew W. Mellon Postdoctoral Fellowship

References

• Albright, Adam (2003). “A quantitative study of Spanish paradigm gaps”. WCCFL 22, 1-14.

• Alley, Maria, Andrea D. Sims and Bryan Brookes (2006). “On Russian verbal gaps and non-optimality in language”. Paper presented at the First Annual Meeting of the Slavic Linguistics Society in Bloomington, IN, September 8-10, 2006.

• Baayen, R. Harald (2007). “Paradigmatic structure in speech production”. Paper presented at the 43rd Annual Meeting of the Chicago Linguistic Society in Chicago, IL, May 3-5, 2007.

• Baerman, Matthew (2007). “The diachrony of defectiveness”. Paper presented at the 43rd Annual Meeting of the Chicago Linguistic Society in Chicago, IL, May 3-5, 2007.

• Baronian, Luc V. (2005). North of phonology. Ph.D. thesis: Linguistics Department, Stanford University.

• Berko, Jean (1958). “The Child's Learning of English Morphology”. Word 14, 150-77.

• Brown, Roger and Camille Hanlon (1970). “Derivational complexity and order of acquisition in child speech”. In Cognition and the development of language, John R. Hayes, ed. New York: Wiley, 11-53.

• Daland, Robert, Andrea D. Sims and Janet Pierrehumbert (2007). “Much ado about nothing: A social network model of Russian paradigmatic gaps”. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics in Prague, Czech Republic, June 24-29, 2007. Annie Zaenen and Antal van den Bosch, eds. Prague: Association for Computational Linguistics, 936-943.

• Gold, E. Mark (1967). “Language identification in the limit”. Information and control 10(5), 447-474.

• Halle, Morris (1973). “Prolegomena to a theory of word inflection”. Linguistic inquiry 4, 3-16.

• Hudson, Richard (2000). “*I amn’t”. Language 76(2), 297-323.

• Johansson, Christer (1999). “Learning what cannot be by failing expectations”. Nordic journal of linguistics 22(1), 61-76.

• Morgan, James L. and Lisa L. Travis (1989). “Limits on negative information”. Journal of child language 16(3), 531-552.

• Ožegov, S.I. (1972). Slovar’ russkogo jazyka. Moskva: Sov. Enciklopedija.

• Pertsova, Katya (2005). “How lexical conservatism can lead to paradigm gaps”. UCLA working papers in linguistics 11, 13-38.

• Regier, Terry and Susanne Gahl (2004). “Learning the unlearnable: The role of missing evidence”. Cognition 93, 147-155.

• Rice, Curt (2003). “Dialectal variation in Norwegian imperatives”. Nordlyd 31, 372-384.

• Sims, Andrea (2006). Minding the gaps: Inflectional defectiveness in paradigmatic morphology. Ph.D. thesis: Linguistics Department, The Ohio State University.

• Sokolov, Jeffrey L. and Catherine E. Snow (1994). “The changing role of negative evidence in theories of language development”. In Input and interaction in language development. Clare Gallaway and Brian J. Richards, eds. Cambridge: Cambridge University Press, 38-55.

• Švedova, Ju. (1982). Grammatika sovremennogo russkogo literaturnogo jazyka. Moskva: Nauka.

• Tenenbaum, Joshua B. and Thomas L. Griffiths (2001). “Generalization, similarity, and Bayesian inference”. Behavioral and brain sciences 24, 629-640.

Baronian (2005:1312): “Only a specific instruction to the effect of the sort ‘do not use this form’ could justify [retreat from the default setting [+lexical insertion]]. As we know, this is the kind of negative evidence that most theories of language acquisition do not recognize as valid.”
