Average Conditional Entropy of the Tlingit Verbal Inflection Paradigm: A Brief Report


Seth Cable

In the year of our Lord 2000 (or was it 2001?), Alan taught the very first field methods course that I ever took. The language was Hungarian. We constructed target sentences poking fun at Trent Lott, referencing Bartók and goulash. Alan spoke excitedly of our having discovered noun incorporation ON THE HOOF! My final paper was absurdly long, but the comments back which I received only a few days later were nearly as long themselves. They made me feel like I could really do this kind of work, which I found myself having grown in love with over the course of the term. In light of this, I thought the following brief report might be a suitable contribution to this venue. So that there is no misunderstanding, I do believe that there is far more to morphological learning than is suggested here. However, having witnessed the intricacies of Tlingit verbal inflection reduce to slag several leading theories of morphology, I am excited by any perspective that can shed light onto how a system of this sort could ever be learned by earthly beings.

1. Overview

With assistance from PhD student Presley Pizzo, I employed a program written and made available to me by Rob Malouf to explore the consequences that the inflectional system of Tlingit (Na-Dene; Alaska, British Columbia, Yukon) may have for the proposals of Ackerman & Malouf (2013), in particular their Low Conditional Entropy Conjecture (LCEC).¹

Three different paradigmatic representations of the Tlingit verbal inflectional system were examined. For two of the three, the average conditional entropy was within the range predicted by the LCEC. I explain below how this accords well with specialists' understanding of the exceptional status of imperfective aspect within the Tlingit inflectional system. Interestingly and importantly, for all three paradigmatic representations, the average conditional entropy was much lower than that of the comparable, artificial systems randomly generated by the bootstrap simulation (Ackerman & Malouf 2013). This supports the view that, even though the average conditional entropy of the Tlingit verbal inflectional system may be relatively high, the system is organized in a way that minimizes such entropy.

2. Basic Background on the Tlingit Verbal Inflectional System

I assume Leer's (1991) analysis of the Tlingit verbal inflectional system, though my own presentation below differs slightly in superficial respects, largely following the notational and terminological changes advanced by Crippen (2013). According to Leer's analysis, in order to inflect a Tlingit verb (or verbal theme), a speaker must know the value of the following three parameters.

1 Given my unfamiliarity with Python, Pizzo did the work of actually running the program on the input paradigms I constructed.


(1) Three Parameters Defining the Inflectional Classes of Tlingit
a. The Primary Imperfective Type (if any) of the verb [22 possible values]
b. The Conjugation Class of the verb [5 possible values]
c. The Root Type of the verbal root [9 possible values]

With this information, a speaker can project the surface forms of the verb for each of the following 18 Tense-Aspect-Mood (TAM) categories.

(2) The Inflectional Categories of the Tlingit Verb
a. Imperfective
b. Irrealis Imperfective
c. Perfective
d. Irrealis Perfective
e. Future
f. Irrealis Future
g. Potential
h. Irrealis Potential
i. Habitual
j. Irrealis Habitual
k. Imperative
l. Hortative
m. Admonitive
n. Consecutive
o. Conditional
p. Contingent
q. Progressive
r. Repetitive

For more information on the morphosyntax and semantics of each of these TAM forms, I refer the reader to Leer (1991).

Returning to the key parameters in (1), the Primary Imperfective Type (1a) of the verb determines the realization of the verb in the imperfective and irrealis imperfective. According to Leer (1991), there are at least 22 Primary Imperfective Types; it should be noted, however, that other specialists count as many as 27 such types (Crippen 2013). For the purposes of this study, I assume the following 22 Primary Imperfective Types.

(3) The Primary Imperfective Types of Tlingit (Leer 1991, Crippen 2013)
a. y-Stative Imperfective
b. :-Stative Imperfective
c. h-Stative Imperfective
d. n-Positional Imperfective
e. :-Positional Imperfective
f. -Positional Imperfective
g. :-Processive Imperfective
h. -Processive Imperfective
i. h-Processive Imperfective
j. n-Processive Imperfective
k. xh-Processive Imperfective
l. g-Processive Imperfective
m. (I)g-Processive Imperfective
n. yoo(I)g-Processive Imperfective
o. ch-Processive Imperfective
p. t-Processive Imperfective
q. s-Processive Imperfective
r. l-Processive Imperfective
s. x-Processive Imperfective
t. t-Processive Imperfective
u. h-Extensional Imperfective
v. y-Extensional Imperfective

Next, the Conjugation Class (1b) determines the affixal realizations of the other 16 TAM values in (2), as well as the stem variant appearing in those forms. To illustrate, a verb in the so-called ga-conjugation will have in its future form the affixes and stem variant in (4a), while a verb in the 0-conjugation will have the affixes and stem variant in (4b).

(4) Morphological Realization of Future in Two Conjugation Classes
a. Ga-Conjugation    kei # ga-w-gha-[ :-Stem Variant ]
b. 0-Conjugation     ga-w-gha-[ :-Stem Variant ]

Similarly, a ga-conjugation verb will have in its perfective form the affixes and stem variant in (5a), while a 0-conjugation verb will have the affixes and stem variant in (5b).

(5) Morphological Realization of Perfective in Two Conjugation Classes
a. Ga-Conjugation    wu-[I]-[ h-Stem Variant ]
b. 0-Conjugation     wu-[I]-[ y-Stem Variant ]

Under one method of counting, there are five conjugation classes in Tlingit, which receive some version of the following mnemonic labels.

(6) Conjugation Classes of Tlingit (Crippen 2013)
a. 0-Conjugation
b. 0/y-Conjugation²
c. na-Conjugation
d. ga-Conjugation
e. gha-Conjugation

2 Most authors view the 0/y-Conjugation as a subtype of 0-Conjugation verbs, rather than its own conjugation class. Such a distinction is not important for present purposes. Since 0/y-Conjugation verbs sometimes take different stem variants from (plain) 0-Conjugation verbs, the present study views them as a separate conjugation class.


Finally, the Root Type (1c) determines the surface appearance of the stem variant in a given verbal form. For example, if the Root Type of the verbal root is CVC-Varying, then the h-Stem variant will surface as CVVC, with a long low-toned vowel nucleus. However, if the Root Type is CVC-Glottal, then the h-Stem variant will surface as CV́VC, with a long high-toned vowel nucleus. By one method of counting, there are nine different root types in Tlingit.

(7) The Root Types of Tlingit (Crippen 2013)
a. CV Varying
b. CV Fading
c. CVC Varying
d. CVC Glottal
e. CV́C Invariant
f. CVVC Invariant
g. CV́VC Invariant
h. CV́V Invariant
i. CVV Invariant

Note that five of the Root Types are invariant; this means that every stem built from the root has the same surface form. For example, a CV́VC-Invariant root always surfaces with a long high-toned vowel in every form, no matter what the underlying stem variant specification is.

Given that all three parameters in (1) must be known in order to inflect a Tlingit verb, one could view the possible valuations of these parameters as defining the inflectional classes (ICs) of Tlingit. Given that there are not any substantive grammatical constraints on the possible combinations of those parameters, the ICs of Tlingit would seem to be all 990 (22 × 5 × 9) logically possible combinations. However, due to the morphophonology of the language, the contrasts between certain Primary Imperfective Types collapse with particular Root Types. Bearing this in mind, the total number of possible ICs of Tlingit reduces to 790. Note, however, that given the relative rarity of certain Primary Imperfective Types, many of these logically possible ICs do not seem to be attested. Nevertheless, the logical structure of the Tlingit system entails their grammatical possibility, and it is reasonable to assume that a competent speaker of Tlingit would know how to inflect a verb from every logically possible class.
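The count of logically possible combinations can be checked with a minimal Python sketch that enumerates every valuation of the three parameters in (1). The integer labels are placeholders rather than the actual type and class names, and the function name is hypothetical. The sketch does not attempt to reproduce the reduction to 790, since the particular morphophonological collapses are not enumerated in this report.

```python
from itertools import product

# Parameter sizes as stated in (1); integer labels are placeholders only.
N_IMPERFECTIVE_TYPES = 22
N_CONJUGATION_CLASSES = 5
N_ROOT_TYPES = 9


def logical_classes(n_imperfective, n_conjugation, n_root):
    """List every (imperfective type, conjugation class, root type) triple."""
    return list(product(range(n_imperfective),
                        range(n_conjugation),
                        range(n_root)))


all_classes = logical_classes(N_IMPERFECTIVE_TYPES,
                              N_CONJUGATION_CLASSES,
                              N_ROOT_TYPES)
print(len(all_classes))  # 990
```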
Thus, in this study, I will ignore such accidental gaps within the overall system of Tlingit inflectional classes, and so will assume the existence of all 790 of the aforementioned ICs.

3. Basic Background on the Low Conditional Entropy Conjecture

This report will assume familiarity with the work of Ackerman & Malouf (2013) on conditional entropy in inflectional paradigms. I will, however, for rhetorical purposes, briefly and superficially summarize the key ideas. The reader is referred to the original paper for precise technical definitions and equations.

Briefly put, the conditional entropy of Y given X, H(Y|X), is a measure of the uncertainty in the value of Y given knowledge of the value of X. The higher the value of H(Y|X), the less information X provides regarding Y. If H(Y|X) = 0, then the value of X completely determines the value of Y. However, if H(Y|X) = 1, then knowing X gives one a 50/50 chance of picking the correct value for Y; if H(Y|X) = 2, then knowing X gives a 1/4 chance of picking the correct value for Y; if H(Y|X) = 3, then it is a 1/8 chance, etc. Thus, as H(Y|X) approaches infinity, the information that the value of X provides regarding the value of Y diminishes.

This notion of conditional entropy can provide a tool for measuring the learnability of a given inflectional system. To see this, let us imagine the task of the language learner as being to produce novel inflectional forms of a given verb V based upon the inflected forms of V that they've already heard. Intuitively, if hearing a single inflected form of V allowed the learner to correctly predict all other inflected forms of V, then that inflectional system would be rather easy to learn. However, if hearing a single inflected form of V only gave the learner on average a 1/4 chance of correctly picking the other inflected forms of V, then that system would intuitively be harder to learn than the former system.

With this in mind, let us assume that the variable X ranges over the possible surface realizations of a particular verb V in a particular inflectional form F. Now, let us suppose that Y ranges over the possible surface realizations of the same verb V in a different inflectional form F′. Thus, the conditional entropy H(Y|X) would represent how much information form F provides regarding the realization of form F′. If this value is averaged across all possible pairs of inflectional forms in the inflectional system, we thereby have a concrete measure of how confidently learners can, on average, predict novel inflectional forms on the basis of forms they've already encountered.

Building upon these ideas, Ackerman & Malouf (2013) calculate the average conditional entropy for the inflectional paradigms of several languages. To illustrate, the nominal inflectional system of Modern Greek is represented as in (8), where each row is a different nominal inflectional class (or declension) while each column is the suffixal realization of a different combination of case and number features.

(8) Nominal Inflectional System of Modern Greek (Ackerman & Malouf 2013)

From representations such as these, Ackerman & Malouf are able to calculate, for every ordered pair of distinct Case-Number feature combinations, the conditional entropy of the realization of one given the realization of the other. The table in (9) presents their calculations for the Modern Greek system in (8).


(9) Conditional Entropies for Modern Greek (Ackerman & Malouf 2013)

For example, we find in the chart above that the conditional entropy of the Genitive Singular given the Nominative Singular is 1, meaning that knowing the Nominative Singular of a particular noun N gives one a 50/50 shot at correctly choosing the Genitive Singular of N. However, the conditional entropy of the Nominative Plural given the Accusative Plural is 0, meaning that the form of the Accusative Plural completely determines the form of the Nominative Plural, as can be seen from a quick glance at the table in (8).
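To make the calculation concrete, here is a minimal Python sketch of conditional entropy computed over a paradigm table. It is not Malouf's program. The toy paradigm, the cell names `F1` and `F2`, and the function names are invented for illustration, and every inflectional class is assumed to be equally likely. The toy is built so that one direction yields 1 bit and the other 0 bits, mirroring the two Greek cases just discussed.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(paradigm, given, target):
    """H(target | given) = H(given, target) - H(given), classes equiprobable."""
    xs = [row[given] for row in paradigm]
    pairs = [(row[given], row[target]) for row in paradigm]
    return entropy(pairs) - entropy(xs)


def guess_probability(bits):
    """Rule of thumb from the text: 1 bit ~ 1/2, 2 bits ~ 1/4, and so on."""
    return 2 ** -bits


# Hypothetical four-class paradigm: each F1 suffix is shared by two classes,
# while each F2 suffix is unique to one class.
toy = [
    {"F1": "-a", "F2": "-w"},
    {"F1": "-a", "F2": "-x"},
    {"F1": "-b", "F2": "-y"},
    {"F1": "-b", "F2": "-z"},
]

h_f2_given_f1 = conditional_entropy(toy, given="F1", target="F2")
h_f1_given_f2 = conditional_entropy(toy, given="F2", target="F1")
print(h_f2_given_f1, h_f1_given_f2, guess_probability(h_f2_given_f1))
```

In the toy, knowing F1 leaves two equally likely candidates for F2, while knowing F2 settles F1 outright.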

Finally, averaging together all the conditional entropies in the chart above, one finds that the overall average conditional entropy for the Modern Greek nominal inflectional system is 0.644. Thus, for a learner of Modern Greek, knowing any particular inflected form F provides on average a better than 50/50 chance of correctly choosing another inflected form F′. Bearing in mind that there can be as many as 5 different possible realizations of a given case/number combination, this suggests that the inflectional system in (8) is organized in a way that facilitates efficient learning, i.e., the prediction of novel inflected forms on the basis of encountered forms.

Ackerman & Malouf (2013) perform similar calculations for the inflectional systems of a number of typologically distinct languages. Interestingly, as shown in the chart below, in nearly all cases, the overall average conditional entropy for the inflectional system is at or below 0.7.

(10) Average Conditional Entropies for Ten Languages (Ackerman & Malouf 2013)


Note that even for a language such as Mazatec, with 109 different inflectional classes, and where there can be as many as 94 different possible realizations of a particular inflectional feature, the overall average conditional entropy for the system is still just 0.709. Moreover, looking down the rightmost column of (10), we find that for 9/10 of these languages, knowing just one inflected form for a given verb gives the learner on average a better than 50/50 chance at choosing any other inflected form.

To highlight the surprising nature of this finding, Ackerman & Malouf (2013) compare the observed entropy averages in (10) to those of randomly generated languages of comparable superficial complexity. To illustrate, they generate alternative versions of (e.g.) Mazatec by randomly constructing 109 different inflectional classes, where each class is created by randomly selecting for each inflectional form one of the possible surface realizations of that form. Importantly, the conditional entropy for such randomized variants of Mazatec is on average 1.1, significantly higher than the observed average of 0.709. This lends further support to the view that the true Mazatec inflectional system is organized to facilitate the prediction of novel inflected forms from observed forms.

Taking this all together, Ackerman and Malouf put forth the Low Conditional Entropy Conjecture (LCEC), which I paraphrase as in (11).

(11) The Low Conditional Entropy Conjecture (LCEC)

The average conditional entropy (ACE) of a natural language inflectional system will tend to be low (i.e., at or below 0.7; Robert Malouf, p.c.), or will be lower than the average ACE of randomized variants of that system.
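The comparison behind (11) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used by Ackerman & Malouf: classes are taken to be equally likely, each randomized class draws each cell uniformly from the realizations attested for that cell, and the toy paradigm, function names, sample count and seed are all hypothetical.

```python
import random
from collections import Counter
from itertools import permutations
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def average_conditional_entropy(paradigm):
    """Mean of H(target | given) over all ordered pairs of distinct cells."""
    cells = list(paradigm[0].keys())
    total, count = 0.0, 0
    for given, target in permutations(cells, 2):
        xs = [row[given] for row in paradigm]
        pairs = [(row[given], row[target]) for row in paradigm]
        total += entropy(pairs) - entropy(xs)
        count += 1
    return total / count


def randomized_variant(paradigm, rng):
    """Same number of classes; each cell drawn from that cell's attested forms."""
    options = {cell: sorted({row[cell] for row in paradigm})
               for cell in paradigm[0]}
    return [{cell: rng.choice(options[cell]) for cell in options}
            for _ in paradigm]


def bootstrap_mean_ace(paradigm, samples=200, seed=0):
    """Average ACE over `samples` randomized variants (assumed procedure)."""
    rng = random.Random(seed)
    return sum(average_conditional_entropy(randomized_variant(paradigm, rng))
               for _ in range(samples)) / samples


# Hypothetical, fully implicative paradigm: any one cell identifies the class.
toy = [{"F1": f"-a{i}", "F2": f"-b{i}", "F3": f"-c{i}"} for i in range(4)]

observed = average_conditional_entropy(toy)
baseline = bootstrap_mean_ace(toy)
print(observed, baseline)
```

The toy is deliberately extreme: its observed ACE is zero because every cell predicts every other, whereas shuffling the same realizations across classes destroys those implications and raises the average.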

Readers who are interested in the relationship between the LCEC above and the traditional notion of a system of principal parts are referred to Ackerman & Malouf's (2013) paper.

4. The Conditional Entropy of the Tlingit Verbal Inflectional System

Given its 790 logically possible inflectional classes, the Tlingit verbal system seems to provide an interesting test case for the LCEC in (11). Similarly, if the ACE of the Tlingit verbal inflectional system is relatively low, that may provide some insight into how such a remarkably complex system has been so diachronically stable. For these reasons, I sought to calculate the ACE of the full Tlingit verbal inflectional system, as described in Section 2.

As mentioned in Section 1, in order to make these calculations, PhD student Presley Pizzo ran a Python program written by Rob Malouf on paradigm tables that I constructed.³ The first inflectional paradigm table we examined contained all 790 of the aforementioned verbal inflectional classes and all 18 of the TAM forms in (2). When run on this table, the program output the table of conditional entropies given in (12) below.

3 The Python code is available at https://github.com/rmalouf/morphology/blob/master/paradigms/entropy.py.


(12) Conditional Entropies of the 18 TAM Forms in (2)

(13) Conditional Entropies of the Reorganized and Simplified Tlingit Verbal Inflectional System


The first thing to note regarding the table in (12) is that the ACE for the verbal inflectional system as described in Section 2 is 1.229, higher than any of the ACEs recorded in table (10). Put crudely, this figure indicates that for a Tlingit-learner, knowing a particular inflected form gives them, on average, between a 1/4 and 50/50 chance of predicting another inflected form. Although such predictive power is not as strong as that found for the languages in (10), it is important to compare this to the average ACE of randomized variants of the Tlingit system. To this end, the Python program written by Malouf also generates such randomized variants and averages their ACEs. The result of these calculations appears under ** Bootstrap in (12). There, we find that the average ACE of randomly generated variants of the Tlingit system with similar superficial complexity is 3.513, significantly higher than the observed value of 1.229. Thus, although an ACE of 1.229 may seem at first to challenge the LCEC, the contrast between the observed ACE and the average ACE of the randomized systems lends credence to the view that the Tlingit verbal inflectional system is structured to facilitate the prediction of novel inflectional forms.

Probing further, however, a closer examination of the table in (12) can yield some insight into why the ACE of the Tlingit system in Section 2 is so relatively high. Note the rather high values throughout the first two rows and the first two columns. This, of course, reflects the fact that for Tlingit, knowing the imperfective (and irrealis imperfective) form of a given verb does not help one to predict the other TAM forms of the verb, and vice versa. This is because the imperfective forms of a verb can, at most, only determine the Primary Imperfective Type (1a) and Root Type (1c) of the verb; the imperfective form provides no information regarding the Conjugation Class of the verb, which determines 16 of the 18 TAM forms in (2).
Similarly, the other 16 TAM forms in (2) provide no information regarding the Primary Imperfective Type of the verb, and so provide no information regarding the realization of the imperfective (and irrealis imperfective) forms. This issue is well known to Tlingit specialists, and creates a special problem for language documentarians and lexicographers (Edwards 2009, Eggleston 2013).

In this sense, the imperfective forms stand outside of the larger TAM system of Tlingit. After all, as the summary in Section 2 makes clear, the imperfective form of a Tlingit verb is essentially stipulated as part of its lexical entry (Leer 1991, Edwards 2009, Eggleston 2013). For this reason, one would be warranted in the view that the features imperfective and irrealis imperfective are separate dimensions of the verbal inflectional paradigm in Tlingit. Under this alternate view, the primary inflectional categories of the Tlingit TAM system are simply (2c)-(2r), and the primary inflectional classes are solely defined by the parameters in (1b) and (1c). Consequently, under this view, there are merely 45 (9 × 5) possible (primary) inflectional classes.

Given the plausibility of this alternate view of the Tlingit verbal inflectional system, I sought to calculate the ACE of the Tlingit verbal system under this reorganization. Consequently, a second paradigm table was input to Malouf's Python program, one containing only the aforementioned 45 inflectional classes and the 16 remaining TAM forms in (2). The resulting calculations were output as table (13) above. Note that in (13), we find that removing the imperfective forms from the Tlingit verbal inflectional system dramatically reduces the system's ACE; the ACE of the simplified system is 0.739, well within the range predicted by the LCEC. Furthermore, it should also be noted that the average ACE of the randomized variants of the simplified inflectional system is 1.663, significantly higher than the observed value.
Thus, isolating the imperfective forms from the rest of the verbal inflectional system increases the learnability of the resulting system.

Interestingly, we can push this initial result even further. Note that the values in the first two rows of (13) are also relatively high. This reflects the fact that the perfective form of a Tlingit verb provides relatively little information regarding the realizations of the other TAM categories. This is due to the fact that in the perfective, there is no contrast between three of the five Conjugation Classes.⁴ Interestingly, for this reason, the non-perfective (and non-imperfective) forms of a verb do provide good information regarding the realization of the perfective form, a fact reflected in the relatively low conditional entropy values in the first two columns of (13). In sum, the perfective form of a Tlingit verb provides little information regarding the realizations of the other TAM forms, an issue that is again already well known to Tlingit specialists (Edwards 2009, Eggleston 2013).

This raises the question of how ACE may be affected by separating out both perfective and imperfective from the verbal inflectional system. For this reason, I sought to calculate the ACE of a verbal inflectional system identical to the one in (13), but lacking the categories perfective and irrealis perfective; such a paradigm table was thus input to Malouf's Python program. The resulting calculations were output as table (14) below. Note that (14) shows that the ACE of the resulting system is just 0.575, a relatively low number, comparable to that of Fur and Russian in table (10). Further, note again that the average ACE of the randomized variants of this system is still quite high by comparison, at 1.573.
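The three observed ACEs, and the corresponding bootstrap averages, can be put on the "chance of a correct guess" scale used informally in Section 3 (1 bit for 50/50, 2 bits for 1/4). The following is a minimal sketch, assuming that rule of thumb (2 raised to minus the entropy), which is exact only when the remaining candidates are equally likely. The function name is hypothetical, and the figures are simply those reported for tables (12)-(14).

```python
def guess_probability(bits):
    """Rule-of-thumb chance of a correct guess at a given entropy in bits."""
    return 2 ** -bits


# (observed ACE, bootstrap average ACE) as reported for tables (12)-(14).
reported = {
    "(12) all 18 TAM forms": (1.229, 3.513),
    "(13) without imperfectives": (0.739, 1.663),
    "(14) without (im)perfectives": (0.575, 1.573),
}

for label, (observed, bootstrap) in reported.items():
    print(f"{label}: observed ~{guess_probability(observed):.2f}, "
          f"randomized ~{guess_probability(bootstrap):.2f}")
```

On this scale the full system falls between 1/4 and 1/2, while both reduced systems come out above 1/2, and in every case the observed system does better than its randomized counterparts.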

4 That is, verbs of the na-, ga- and gha-conjugations all have the same form in the perfective, while verbs of the 0- and 0/y-conjugations receive a distinct form.


(14) Conditional Entropies of the Simplified Tlingit Verbal Inflectional System, Without Perfective


We find, then, that by removing just four of the eighteen TAM categories in (2), the ACE of the verbal inflectional system drops considerably. That is, as is well known to Tlingit grammarians, the non-(im)perfective verbal forms constitute a morphological subsystem within which relatively reliable predictions of form can be made. This, however, raises the question of how reliably a learner will encounter such non-(im)perfective forms. Here, we paradoxically find that these most informative verbal inflections are actually rather infrequent in natural speech, especially as compared to the perfective and imperfective forms. This is a widely shared impression amongst Tlingit scholars, and can be corroborated via formal textual counts. For example, we find in (15) below that in a 40-minute conversation between two fluent elders, Shgonde Walter Soboleff and Keiheenk'w John Martin, approximately 75% of the verbs were either perfective or imperfective.

(15) Count of Inflected Verbal Forms in Recorded Tlingit Conversation⁵
a. Imperfective             156
b. Irrealis Imperfective      5
c. Perfective               224
d. Irrealis Perfective       13
e. Future                    35
f. Irrealis Future            2
g. Potential                  1
h. Irrealis Potential         1
i. Habitual                  51
j. Irrealis Habitual          5
k. Imperative                 5
l. Hortative                  2
m. Admonitive                 0
n. Consecutive                1
o. Conditional                1
p. Contingent                 9
q. Progressive               12
r. Repetitive                 7

Perfective / Irrealis Perfective: 237/530 (45%)
Imperfective / Irrealis Imperfective: 161/530 (30%)
Non-(Im)perfective: 132/530 (25%)
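The percentages in (15) follow directly from the raw counts. A short Python check, with the counts transcribed from (15) and the variable names chosen purely for illustration:

```python
# Counts of inflected verbal forms, transcribed from (15).
counts = {
    "Imperfective": 156, "Irrealis Imperfective": 5,
    "Perfective": 224, "Irrealis Perfective": 13,
    "Future": 35, "Irrealis Future": 2,
    "Potential": 1, "Irrealis Potential": 1,
    "Habitual": 51, "Irrealis Habitual": 5,
    "Imperative": 5, "Hortative": 2, "Admonitive": 0,
    "Consecutive": 1, "Conditional": 1, "Contingent": 9,
    "Progressive": 12, "Repetitive": 7,
}

total = sum(counts.values())
perfective = counts["Perfective"] + counts["Irrealis Perfective"]
imperfective = counts["Imperfective"] + counts["Irrealis Imperfective"]
other = total - perfective - imperfective


def percent(n):
    """Share of all counted verb forms, rounded to a whole percent."""
    return round(100 * n / total)


print(total, perfective, imperfective, other)
print(percent(perfective), percent(imperfective), percent(other))
```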

It should be noted, however, that although each of the individual verbal forms in (15e)-(15r) is rather infrequent, when taken as a whole, the non-(im)perfective forms are by no means rare, accounting for a full quarter of the spoken verbal forms. Furthermore, this itself accords well with the relatively low ACE of the forms in (15e)-(15r). Given that each of the forms in (15e)-(15r) is rather infrequent, it would be of much benefit to learners if a single such form allowed reliable prediction of the others.

5 This conversation was recorded as part of Alice Taff's Tlingit Conversation Documentation Project, http://www.uas.alaska.edu/arts_sciences/humanities/alaska-languages/cuped/video-conv/index.html. It is listed as Conversation 16: Shgonde Walter Soboleff and Keiheenk'w John Martin.


As one final remark, it would be interesting if a learning simulation could be designed to explore the diachronic stability of a system with the properties above, where the lowest ACE within the system holds between relatively infrequent surface forms.

References

Ackerman, Farrell and Robert Malouf. 2013. Morphological Organization: The Low Conditional Entropy Conjecture. Language 89(3): 429-464.

Crippen, James A. 2013. Tlingitology Seminar Notes: Background and Morphology. Manuscript. University of British Columbia.

Edwards, Keri M. 2009. Dictionary of Tlingit. Juneau, AK: Sealaska Heritage Institute.

Eggleston, Keri M. 2013. 575 Tlingit Verbs: A Study of Tlingit Verb Paradigms. PhD Dissertation. University of Alaska Fairbanks.

Leer, Jeff. 1991. The Schetic Categories of the Tlingit Verb. PhD Dissertation. University of Chicago.
