THE SOURCES OF PHONOLOGICAL MARKEDNESS
A Dissertation Presented
By
KATHRYN GILBERT FLACK
Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment
of the requirements for the degree of
DOCTOR OF PHILOSOPHY
September 2007
Department of Linguistics
© Copyright by Kathryn Gilbert Flack 2007
All Rights Reserved
THE SOURCES OF PHONOLOGICAL MARKEDNESS
A Dissertation Presented
by
KATHRYN GILBERT FLACK
Approved as to style and content by:

John J. McCarthy, Chair
John Kingston, Member
Joe Pater, Member
Andrew McCallum, Member

Elisabeth O. Selkirk, Department Head
Department of Linguistics
ACKNOWLEDGEMENTS
I am enormously grateful to my chair, John McCarthy, for guiding me through the
writing of this dissertation, as well as every other aspect of my academic life for the past
five years. His probing questions and sage advice have made this work much stronger
and richer than it could have been otherwise. The other members of my committee also
deserve great thanks and appreciation. John Kingston has shaped and clarified my
thinking on phonetics and its interface with phonology, taught me about every aspect of
experiments, and has been great fun to debate and brainstorm with throughout my time at
UMass. Joe Pater is consistently enthusiastic about exploring new theoretical directions,
and his insightful challenges have made my work more solid. Andrew McCallum has
offered constant support and enthusiasm, as well as plenty of helpful suggestions and
questions. I have learned a great deal from classes and conversations with many other
UMass faculty members, especially Ellen Woolford, Lisa Selkirk, Lyn Frazier, and Gaja
Jarosz. Thanks also to Paul de Lacy, Alan Prince, Jason Riggle, Nathan Sanders, Donca
Steriade, and Colin Wilson, and audiences at HUMDRUM 2006 and LSA 2007 meetings
for helpful discussions.
I would not have survived graduate school without my classmates Michael Becker
and Shigeto Kawahara. Ever since our first homework assignments in our first semester,
they’ve talked through ideas with me, debated theories, provided data, suggested
references, and made me laugh (even as the phoenixes burn). I’ve also enjoyed and
appreciated my time with the other UMass grad students, including Leah Bateman, Tim
Beechey, Angela Carpenter, Della Chambless, Andries Coetzee, Shai Cohen, Emily
Elfner, Maria Gouskova, Elena Innes, Karen Jesney, Mike Key, Wendell Kimper,
Kathryn Pruitt, Taka Shinya, Anne-Michelle Tessier, Adam Werle, and Matt Wolf. For
their assistance with the experiments reported in chapter 3, I am grateful to Dan Mash for
making everything run more smoothly, and Marianne McKenzie for her patience in
recording hundreds of semi-French sentences.
Mike Flynn convinced me that I wanted to be a linguist during my first term at
Carleton College, and my enthusiasm only grew through another 10 or so classes with
him over the next four years. Also at Carleton, I learned a great deal from the teaching
and friendship of Laurie Zaring. I’m grateful to Ehren Reilly for all of our conversations
over the years. I also appreciate the education and support I’ve received from Gavan
Breen, Jenny Green, Robert Hoogenraad, and Rob Pensalfini. Alex Barron deserves
special mention for providing an excellent sounding board on all issues academic and
otherwise.
Finally, Mike, Meg, and Dan Flack have enthusiastically supported everything I’ve
done, and Chris Potts has made everything much more fun.
ABSTRACT
THE SOURCES OF PHONOLOGICAL MARKEDNESS
SEPTEMBER 2007
KATHRYN GILBERT FLACK, B.A. CARLETON COLLEGE
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST
Directed by: Professor John J. McCarthy
A great deal of current work in phonology discusses the functional grounding of
phonological patterns. This dissertation proposes that functional factors can motivate
phonological constraints in two ways. ‘Functionally grounded’ constraints are induced
from learners’ immediate linguistic experience. ‘Formally grounded’ constraints
generalize beyond literal functional facts; as learners do not have direct evidence for
these constraints, they must be innate. As this proposal distinguishes between constraints
which are and are not induced, questions about how learners induce constraints are also
central. The dissertation describes a computational model in which virtual learners hear
acoustically realistic segments, learn to identify these segments in a realistic way, and
induce attested phonotactic constraints from this experience.
Chapter 1 gives an overview of the proposed distinction between functionally and
formally grounded constraints.
Chapter 2 explores a novel class of functionally grounded constraints which impose
parallel phonotactic restrictions on the edges of all prosodic domains. Restrictions on
domain-initial ŋ, ʔ, and h are discussed in particular detail. While these tend to reflect
perceptual facts, individual constraints on marked domain-initial onsets cannot all be
induced from learners’ perceptual experience. For this reason, these domain-edge
constraint schemata and all constraints belonging to the schemata are formally grounded.
Chapters 3 and 4 turn to functionally grounded constraints. The empirical focus is a
restriction on word-initial p found in languages including Cajonos Zapotec, Ibibio, and
Moroccan Arabic. Chapter 3 presents experimental results showing that initial p is
uniquely perceptually difficult and uniquely acoustically similar to initial b. These
phonetic facts are taken to be the basis for initial p’s phonological markedness.
In order to show that the constraint *#P can be consistently induced by all learners,
chapter 4 describes a computational model based on the acoustic and perceptual data
collected in these experiments. Virtual learners are exposed to either pseudo-French,
where word-initial p is attested, or pseudo-Cajonos Zapotec, where there is no initial p.
With only very conservative assumptions about the nature of learners’ perceptual
experience, the model consistently induces the constraint *#P from realistic input.
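The modelling pipeline described in this paragraph can be illustrated with a toy simulation. To be clear about what is and is not from the dissertation: the category inventory, the VOT means and variances, the single-cue classifier, and the 70% accuracy threshold below are all invented for illustration; the actual model in chapter 4 uses multiple acoustic cues with parameters estimated from the experimental data.

```python
# Toy sketch of constraint induction from perceptual accuracy.
# All numbers here are illustrative assumptions, not the dissertation's
# actual parameters: a virtual learner hears VOT values drawn from
# per-category normal distributions, identifies each token by maximum
# likelihood, and induces a markedness constraint *#X against any
# word-initial category identified with accuracy below a threshold.
import math
import random

random.seed(0)

# Hypothetical word-initial categories: (mean VOT in ms, standard deviation).
# Initial p is given a VOT distribution that overlaps heavily with initial b.
CATS = {"p#": (15.0, 12.0), "b#": (5.0, 10.0), "t#": (30.0, 8.0)}

def likelihood(x, mean, sd):
    """Normal density of hearing VOT x under a category's distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * sd * sd)) / (sd * math.sqrt(2 * math.pi))

def identify(x):
    """Identify a heard VOT value as the most likely category."""
    return max(CATS, key=lambda c: likelihood(x, *CATS[c]))

def accuracy(cat, n=2000):
    """Proportion of tokens of `cat` that the learner identifies correctly."""
    mean, sd = CATS[cat]
    return sum(identify(random.gauss(mean, sd)) == cat for _ in range(n)) / n

# Induce *#X for categories identified below the (invented) 70% threshold;
# p#'s heavy VOT overlap with b# drags its accuracy down.
induced = [f"*#{c[0].upper()}" for c in CATS if accuracy(c) < 0.7]
print(induced)
```

In the dissertation's actual model, induction is driven by accuracy and false-alarm scores accumulated over many simulated rounds (chapter 4); this sketch compresses that into a single pass over a single cue.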
Chapter 5 concludes, emphasizing the importance of testing these proposals empirically.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

CHAPTER 1. FORMAL AND FUNCTIONAL ASPECTS OF MARKEDNESS
1.1. Constraint universality
1.2. Formal vs. functional grounding and language learners
1.3. Formal vs. functional grounding and constraint schemata
1.4. Formally grounded constraints and functional aspects of phonology
1.5. Outline of the dissertation

CHAPTER 2. FORMALLY GROUNDED MARKEDNESS CONSTRAINTS
2.1. Formally grounded domain-edge markedness constraints
2.2. Marked domain-initial onset segments
2.2.1. Marked onset segments
2.2.2. Parallel restrictions on marked word-initial segments
2.2.3. Generalized domain-initial markedness
2.2.3.1. Parallel restrictions on marked utterance-initial segments
2.2.3.2. Parallel restrictions on marked foot-initial segments
2.2.4. Summary of the domain-initial onset restrictions
2.3. A constraint schema for marked onsets: *X(Onset/PCat)
2.3.1. The *X(Onset/PCat) constraint schema
2.3.2. Factorial typology: General faithfulness and *X(Onset/PCat) constraints
2.3.3. Factorial typology II: Positional faithfulness and *X(Onset/PCat) constraints
2.3.4. Implicational restrictions and free ranking of *X(Onset/PCat) constraints
2.3.5. *X(Onset/PCat) constraints are formally grounded
2.3.5.1. The phonetics of marked onsets
2.3.5.2. Comparison: Phonetics and phonotactics of retroflexes
2.3.5.3. *X(Onset/PCat) constraints are formally grounded
2.4. Generalized domain-edge markedness constraints
2.4.1. MOnset(Onset/PCat): Onset restrictions across prosodic domains
2.4.2. MCoda(Coda/PCat): Coda restrictions across prosodic domains
2.4.2.1. MCoda(Coda/σ): Syllable coda restrictions
2.4.2.2. MCoda(Coda/Word): Word-final coda restrictions
2.4.2.3. MCoda(Coda/Phrase): Phrase-final coda restrictions
2.4.2.4. MCoda(Coda/Utterance): Utterance-final coda restrictions
2.4.3. Summary of the argument
2.5. Domain-edge markedness constraints and strict layering
2.5.1. Marked structures become extraprosodic: Banawa stress
2.5.1.1. Basic analysis of Banawa
2.5.1.2. Alternative analysis: ONSET/σ
2.5.1.3. ONSET/PCat constraints must be freely rankable
2.5.2. Tolerance of marked ‘initial’ structures: Tzutujil clitics
2.5.3. Domain-edge markedness constraints and non-strict layering
2.6. Conclusion

CHAPTER 3. FUNCTIONALLY GROUNDED PHONOTACTIC RESTRICTIONS
3.1. Functional grounding in phonology
3.2. Phonological restrictions on word-initial p
3.3. The perceptual difficulty of word-initial p
3.3.1. Methods
3.3.1.1. Stimuli
3.3.1.2. Recording
3.3.1.3. Stimulus construction and acoustic manipulation
3.3.1.4. Participants
3.3.1.5. Procedure
3.3.1.6. Analysis
3.3.2. Results
3.3.2.1. Reaction time
3.3.2.2. Response accuracy
3.3.2.3. Ruling out alternative explanations of the effect
3.3.3. Discussion
3.4. Acoustic similarity between word-initial p and b
3.4.1. Methods
3.4.1.1. Acoustic analysis
3.4.1.2. Statistical analysis
3.4.2. Results
3.4.2.1. Maximum burst intensity
3.4.2.2. Voice onset time
3.4.3. Discussion
3.5. Summary and general discussion

CHAPTER 4. MODELLING CONSTRAINT INDUCTION
4.1. The nature of functional grounding and constraint induction
4.2. Modelling production and perception
4.2.1. How the model works
4.2.1.1. Production and phonetic representations
4.2.1.2. Perception: Hearing, phoneme identification, and category learning
4.2.2. Results and discussion
4.2.2.1. General results: Initial p is perceptually difficult
4.2.2.2. Justifying assumptions in the model
4.2.2.3. The source of initial p’s perceptual difficulty: VOT variances
4.2.2.4. Summary and discussion
4.3. Modelling constraint induction
4.3.1. Desiderata for a constraint inducer
4.3.2. How the model works
4.3.2.1. The structure of functionally grounded constraint schemata
4.3.2.2. Induction from accuracy scores: Pseudo-French
4.3.2.3. Induction from false alarm scores: Pseudo-Cajonos Zapotec
4.3.3. Summary of the constraint induction model
4.4. Conclusion
4.4.1. Summary
4.4.2. Future directions: Elaborating the model

CHAPTER 5. CONCLUSION
5.1. Summary of the dissertation
5.2. Broader issues
5.2.1. Constraint universality?
5.2.2. Empirical investigations of constraint induction

APPENDIX A. EXPERIMENTAL STIMULI RECORDINGS
APPENDIX B. SUBJECTS ANALYSES OF PERCEPTUAL RESULTS
BIBLIOGRAPHY
LIST OF TABLES

Table 1. Reaction time analyses of stimuli trimmed from word and nonword recordings, with p values from preplanned two-sample t-tests (from items analysis).
Table 2. Percent correct analyses of stimuli trimmed from word and nonword recordings, with p values from preplanned two-sample t-tests (from items analysis).
Table 3. Reaction time analyses of stimuli in which consonants are followed by the same vowels (/ i o y/) in all eight conditions, and those in which the following vowels are not shared across all conditions. p values are from preplanned two-sample t-tests, using items analyses.
Table 4. Frequency measures, from the Lexique corpus. Type frequency is a count of the occurrences of a consonant in the words contained in Lexique; a consonant’s token frequency is derived from word frequency data given in Lexique.
Table 5. Maximum release burst intensity measures for initial and medial p, b, t, and d, with differences and p values (from preplanned two-sample t-tests) for pairs of stops differing in voicing.
Table 6. Reaction time analyses, with p values from preplanned two-sample t-tests.
Table 7. Percent correct analyses, with p values from preplanned two-sample t-tests.
Table 8. Reaction time analyses of stimuli trimmed from word and nonword recordings, with p values from preplanned two-sample t-tests.
Table 9. Percent correct analyses of stimuli trimmed from word and nonword recordings, with p values from preplanned two-sample t-tests.
LIST OF FIGURES

Figure 1. Spectrogram and waveform for robotique, with edges of the target stop b and the flanking vowels labeled.
Figure 2. Waveform for robotique (from Figure 1), after windowing removes everything but the target consonant and the inner three-quarters of each flanking vowel.
Figure 3. Average reaction times (ms) in each condition, with 95% confidence intervals (from items analysis).
Figure 4. Average percent correct in each condition, with 95% confidence intervals (from items analysis).
Figure 5. Average maximum burst intensity in each condition, within 5 ms of release, with 95% confidence intervals.
Figure 6. VOT of initial and medial voiceless labial and coronal stops followed by non-high vowels, with 95% confidence intervals.
Figure 7. Model accuracy for each initial (a) and medial (b) consonant, averaged across 20,000 simulations of 300 rounds each. The lines represent moving averages across 15-round windows.
Figure 8. Model accuracy for each initial consonant, where place cues are dropped in 5% of heard utterances and closure voicing, VOT, and burst cues each dropped in 10% (a), 25% (b), or 50% (c) of heard utterances. Results are averaged across 20,000 simulations of 300 rounds each; lines represent moving averages across 15-round windows.
Figure 9. Model accuracy for each initial consonant based on actual acoustic measurements (a) and on values selected from normal distributions whose means and variances are based on actual acoustic measurements (b). Results are averaged across 20,000 simulations of 300 rounds each; lines represent moving averages across 15-round windows.
Figure 10. Model accuracy for each medial consonant based on actual acoustic measurements (a) and on values selected from normal distributions whose means and variances are based on actual acoustic measurements (b). Results are averaged across 20,000 simulations of 300 rounds each; lines represent moving averages across 15-round windows.
Figure 11. Model accuracy for each initial consonant where place, VOT, and burst cues are always heard, and closure voicing is never heard (a); also for the basic version of the model (b). Values are averaged over 20,000 300-round simulations.
Figure 12. Confusion matrix for initial consonants where place, VOT, and burst cues are always heard, and closure voicing is never heard (a); also for the basic version of the model (b). Data collected from the last 15 rounds of each of 20,000 simulations.
Figure 13. Model accuracy for each initial consonant where place and VOT cues are always heard; closure voicing and burst cues are never heard. Values are averaged over 20,000 300-round simulations.
Figure 14. Confusion matrix for initial consonants where place and VOT cues are always heard; burst and closure voicing are never heard. Data collected from the last 15 rounds of each of 20,000 simulations.
Figure 15. Model accuracy for each initial consonant where place and burst cues are always heard; closure voicing and VOT cues are never heard. Values are averaged over 20,000 300-round simulations.
Figure 16. Confusion matrix for initial consonants where place and burst cues are always heard; VOT and closure voicing are never heard. Data collected from the last 15 rounds of each of 20,000 simulations.
Figure 17. Model accuracy for each initial consonant where the VOT variance of each initial segment is 10 (a) and in the basic model (b). Values are averaged over 20,000 300-round simulations.
Figure 18. VOT probability distributions for initial labial (a) and coronal (b) segments in the perceptual model.
Figure 19. Constraints induced in each of 250 pseudo-French simulations of 40,000 rounds each.
Figure 20. Model accuracy for each initial pseudo-CZ consonant, averaged across 20,000 simulations of 300 rounds each. The lines represent moving averages across 15-round windows.
Figure 21. Confusion matrix for initial pseudo-CZ consonants. Data collected from the last 15 rounds of each of 20,000 simulations.
Figure 22. Constraints induced in each of 250 pseudo-Cajonos Zapotec simulations of 40,000 rounds each.
Figure 23. Constraints induced in each of 250 simulations of 40,000 rounds each.
Figure 24. Average reaction times (ms) in each condition, with 95% confidence intervals.
Figure 25. Average percent correct in each condition, with 95% confidence intervals.
Chapter 1. Formal and functional aspects of markedness
Phonologists have long been concerned with finding phonetic properties which allow
phonological patterns to be seen as ‘natural’ or ‘grounded’ (see e.g. Stampe (1973),
Hooper [Bybee] (1976), Ohala (1990), Archangeli and Pulleyblank (1994)). More
recently, with the advent of Optimality Theory (Prince and Smolensky, 1993/2004), a
great deal of work has focused on identifying functional grounding for specific OT
constraints (see e.g. Hayes (1999), Smith (2002), Steriade (1999; 2001a), and papers in
Hayes et al. (eds.) (2004)). A wide range of markedness constraints against particular
segments (e.g. voiced geminates (Kawahara, 2006b; Ohala, 1983)), sequences (e.g. nasals
followed by heterorganic stops (Hamilton, 1996; Steriade, 2001b)), structures (e.g.
stressed high vowels (Smith, 2002)) and contrasts (as in most vowel inventories
(Flemming, 2004; Lindblom, 1986)) are frequently argued to follow from articulatory,
perceptual, or psycholinguistic properties which cause them to be phonologically
marked.1
Theoretical discussions of constraint grounding generally agree that functionally
grounded constraints are those which prefer more perceptually or psycholinguistically
salient, or less articulatorily challenging, forms to those with less prominence or greater
difficulty. Beyond this, however, there is very little agreement about what it means for
constraints to be functionally grounded, what sort of connection exists between phonetic
facts and constraints, or whether all constraints reflect functional aspects of language.
Most work is agnostic on this matter, exploring functional grounding by finding phonetic
facts which correlate with constraint activity while remaining uncommitted to a particular
relationship between phonetics and constraints.

1 A similar drive to ground sound patterns in phonetics also motivates frameworks like
Evolutionary Phonology (Blevins, 2004, 2006) and bidirectional phonetic optimization (Boersma and Hamann, 2007a; Boersma, to appear), in which phonetics is removed from phonology entirely, providing extraphonological accounts of these patterns.
Prince and Smolensky (1993/2004) originally proposed that all constraints in the
universal constraint inventory CON are innate. Under this assumption, any functional
factors which determine the shape of the constraint inventory must have done so at an
earlier stage of evolution, rather than affecting individual learners’ constraint inventories.
Alternatively, Hayes (1999), Smith (2002), and Steriade (1999; 2001a) discuss various
means by which learners could induce functionally grounded constraints directly from
their individual linguistic experience.
This dissertation will propose that all constraints are either functionally grounded
or formally grounded, that a constraint’s grounding reflects the directness of the
connection between functional motivations and the constraint itself, and that the
distinction is an empirically testable one. Following Hayes, Smith, and Steriade,
functionally grounded constraints are those which can be induced from individual
learners’ linguistic (articulatory, perceptual, etc.) experience. Formally grounded
constraints, on the other hand, cannot be induced from experience and so are instead
entirely innate, referring to formal linguistic elements rather than literal articulatory or
perceptual properties. This work will proceed from the premise that constraints are
universal, and present in the grammar of each speaker of each language; section 1.1 will
motivate this premise. Section 1.2 will then argue that the perspective of the learner
should be taken in determining whether each of these universal constraints is formally or
functionally grounded – that is, we should ask whether each learner has the ability to
consistently induce some constraint. Section 1.3 then argues that formal vs. functional
grounding should not be determined uniquely for each individual constraint, but rather
for schemata – for sets of formally similar constraints.
The distinction between functionally and formally grounded constraints proposed
here is by no means the same as a distinction between all functional and formal aspects of
phonology. Formally grounded constraints may very often reflect phonetic and
psycholinguistic facts. The connection is, however, an indirect one, as facts which
(evolutionarily) motivated formally grounded constraints are not directly available to
learners. The sonority scale, for example, closely corresponds to segments’ relative
intensity; however, the correspondence between intensity and sonority is imperfect
(Parker, 2002). Learners could not derive the sonority scale from their phonetic
experience, and so Parker concludes that this must be a formal, innate linguistic
primitive. Section 1.4 argues that constraints which (like the sonority scale) appear to be
functionally motivated, but whose functional motivations are not available to all learners,
must similarly be innate rather than induced. The functional motivation for these
constraints has been grammaticalized. The constraints themselves are synchronically
formally rather than functionally grounded, innate rather than induced. Section 1.5
describes the structure of the rest of the dissertation as it relates to the issues raised here.
1.1. Constraint universality
The goal of this dissertation is to investigate the relationship between constraints and
their functional motivations. In doing this, I will proceed from the premise that the
grammar of each speaker of each language contains the same set of universal constraints.
This is supported by cases of ‘the emergence of the unmarked’ (McCarthy and Prince,
1994) in which speakers’ preferences correlate with cross-linguistic typological
preferences rather than with the speakers’ own linguistic experience. In these cases,
preferences (which follow from constraints) emerge, despite a lack of evidence for these
preferences or constraints in ambient language. Given the lack of language-specific
motivation for these constraints, they appear to be universal, and so present even in
grammars where they are otherwise inactive.
Constraints can emerge in child phonology and second language acquisition, as
well as in adult phonology. Adam (2002: ch. 3) demonstrates that Hebrew-speaking children first
pronounce (truncated) words with consistent initial stress, though the corresponding
adult forms have initial, medial, or final stress. Children thus
prefer unmarked forms while having evidence only for lexical stress. A similar bias for
trochaic forms is found in French learners, despite the fact that adult French words nearly
always have final stress (Archibald and Carson, 2000). In both of these cases, the
typologically attested constraint ALIGN-L(σ́, Word) emerges in learners of languages
where it is generally inactive. This constraint thus appears to be universal.
Constraints expressing segmental markedness can also emerge in this way.
Mandarin has no codas at all, so speakers have no experience with segments’ markedness
(or phonetic properties) in coda position. When adult Mandarin speakers learn English,
they initially prefer voiceless to voiced obstruent codas in English words (Broselow et al.,
1998), mirroring the typological markedness of voiced obstruent codas. In phonological
terms, these speakers show a preference for forms satisfying *VOIOBSCODA.
Knowledge of fixed rankings among constraints can also emerge from speakers
whose language makes no use of this ranking. In mimetic reduplication of novel forms,
Japanese speakers show a preference for geminate stops over fricatives, and geminate
fricatives over nasals, though these are attested with equal frequency in Japanese
(Kawahara, 2006a; Kawahara and Akashi, 2006). These preferences correspond with the
cross-linguistic typology of geminates. Their emergence shows that speakers whose
grammar reveals only the ranking FAITH » *GEMNASAL, *GEMFRIC, *GEMSTOP are
nevertheless aware of the fixed ranking *GEMNASAL » *GEMFRIC » *GEMSTOP.
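The effect of such a fixed ranking under strict domination can be sketched computationally. The following Python fragment is purely illustrative: the candidate names and violation profiles are invented for the example, and the function is a minimal caricature of OT's EVAL, not part of any proposal in the text.

```python
# Illustrative sketch of OT evaluation under strict domination.
# Constraint names follow the text; candidates and violations are hypothetical.

def evaluate(candidates, ranking):
    """Return the candidate(s) surviving strict-domination evaluation.

    candidates: dict mapping candidate name -> dict of constraint violations
    ranking:    list of constraint names, highest-ranked first
    """
    survivors = list(candidates)
    for constraint in ranking:
        # Keep only candidates with the fewest violations of this constraint.
        best = min(candidates[c].get(constraint, 0) for c in survivors)
        survivors = [c for c in survivors
                     if candidates[c].get(constraint, 0) == best]
        if len(survivors) == 1:
            break
    return survivors

# Hypothetical reduplicated candidates, each with one kind of geminate.
candidates = {
    "gem-nasal":     {"*GEMNASAL": 1},
    "gem-fricative": {"*GEMFRIC": 1},
    "gem-stop":      {"*GEMSTOP": 1},
}

# The fixed ranking *GEMNASAL >> *GEMFRIC >> *GEMSTOP selects the
# geminate-stop candidate, mirroring the Japanese speakers' preference.
ranking = ["*GEMNASAL", "*GEMFRIC", "*GEMSTOP"]
print(evaluate(candidates, ranking))  # → ['gem-stop']
```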
Speakers therefore reveal knowledge of constraints, and rankings among
constraints, for which they have no evidence from their immediate linguistic experience.
These emergent constraints and rankings correspond to typologically attested constraints
and rankings. The presence of these constraints even in languages where they are
typically inactive and unmotivated suggests that they are universal.
As the following chapters investigate the grounding of particular constraints, I
will assume that each constraint is universally present in all speakers’ grammars. This
claim that constraints are universal – in effect, that they are part of Universal Grammar –
is not a claim that constraints are all innate. An explicit distinction between universality
and innateness has been made by Tesar and Smolensky, as well as Kiparsky:
The set Con of constraints is universal: the same constraints are present in all languages. The simplest interpretation of this is that the constraints are innately specified, but that is not required by the theory itself: OT only requires that Con be universal. (Tesar and Smolensky, 2000: 130)

Blevins calls my approach [searching for true linguistic universals] “innatist”, but this is incorrect for two reasons. First, while the criteria I propose serve to distinguish intrinsic properties of language (“universals”) from historically contingent ones (“typological generalizations”), they do not and cannot tell us whether a putative universal, in this sense, is innate, grounded in language use, or both (it is good for us to be predisposed to learn the kinds of languages that are good for us – an instance of the so-called Baldwin effect). Secondly, I make no prior commitments to an innate faculty of language. I happen to find some of the arguments for it quite persuasive but the program can just as well be pursued by those who do not, and indeed it may well turn out to undermine innatist assumptions. (Kiparsky, 2006: 221)
Throughout this work I will hold that some, but explicitly not all, universal
constraints are innate. The next section will argue in more detail that formal and
functional grounding are distinguished in terms of the directness of the connection
between constraints and their functional motivations: functionally grounded constraints
can be induced from learners’ experience, while formally grounded constraints have
generalized beyond phonetic experience and are innate. If each constraint is universal,
and is consistently either innate or induced, the linguistic experience of all learners of all
languages must be considered in determining whether some constraint can be consistently
induced by each learner. In order for induced constraints to be in all learners’ constraint
inventories, all learners must have sufficient access to perceptual evidence for these
constraints. If only some learners would have sufficient perceptual information to induce
a constraint, the constraint must instead be innate in order to be universal.
1.2. Formal vs. functional grounding and language learners
A fundamental question about functional properties of language (e.g. articulatory,
perceptual, and psycholinguistic properties) and the constraints they motivate
concerns how the two are related. One
possibility is that functional factors could determine aspects of individual learners’
constraint inventories. Alternatively, at an earlier stage of evolution, these factors could
have determined properties of the constraint inventory; the constraints encoding these
properties could now be innate in all learners.
I take the position that functional factors shape the constraint inventory in both of
these ways: individual learners induce some constraints based on aspects of their
linguistic experience, while other constraints which are motivated by, but generalize
beyond, these functional factors are innate in all learners. ‘Functionally grounded’
constraints are those which each learner induces based on their phonetic experience with
the ambient language and their own productions. ‘Formally grounded’ constraints, on the
other hand, are those whose functional motivations are evolutionary and which are now
innate.2 Whether a particular constraint is functionally grounded and induced or formally
grounded and innate, it is assumed throughout this dissertation to be present in the
constraint inventory of each speaker of each language, for the reasons given in section
1.1.
The primary argument that learners actively induce some of the constraints in
their grammars comes from cognitive economy. Assume for the moment that phonetic
data demonstrating segments’ or features’ relative perceptual salience, their articulatory
difficulty, and so on in particular contexts is available to learners via their immediate
linguistic experience. Further assume that there exists a reliable mechanism by which
learners can evaluate their linguistic experience and induce constraints which are
grounded in these functional factors.
The independent existence of this information, and its availability to learners,
makes any innate specifications of phonetically grounded markedness redundant. Under
the assumption that innate mechanisms for language acquisition should contain only
specifications which are absolutely necessary, learners must use as much information as
possible from their experience. Innate specifications should only be posited when
externally-available information is insufficient to the learning task.
While the substantive properties of functionally grounded constraints may be
induced from external phonetic information, a learner’s complete acquisition of a
2 See section 1.4 and chapter 2 for further discussion of functional motivations for formally
grounded constraints. These motivations have been grammaticalized and are no longer directly available to learners.
constraint inventory must also rely on a number of innate specifications. The most
immediately relevant of these is the mechanism for inducing functionally grounded
constraints from a learner’s experience. Hayes (1999) argues that phonetic experience
cannot be directly, literally mapped to constraints, so some (innate) procedure must exist
for deriving constraints from raw phonetic data. The induction of functionally grounded
constraints must be guided by innate constraint schemata. Formally grounded constraints,
which cannot be induced from experience, must instead be entirely innate.
The argument here for including learner-induced functionally grounded
constraints in the grammar depends on the premise that each learner is capable of
inducing a consistent set of functionally grounded constraints from their own linguistic
experience. If learners have similar linguistic experience, they will emerge from language
acquisition with a consistent set of induced constraints. All learners of the same language
will have essentially the same linguistic experience; learners of different languages,
however, will have fundamentally different linguistic experiences. The constraint
induction procedure must be robust in the face of these differences, allowing a single
universal set of constraints to be induced by any learner of any language.
The ultimate division between innate and induced constraints is a matter for
empirical investigation. A constraint can only plausibly be induced from a learner’s
experience if it can be shown that all learners have access to sufficient perceptual or
articulatory experience to induce that constraint, as well as some mechanism for reliably
mapping this experience to the constraint. In order to come to a deeper understanding of
functionally grounded constraints, we must understand how constraints could be induced
from phonetic data. A basic understanding of this mechanism will allow evaluation of
whether individual, arguably functionally grounded constraints can in fact be induced
from learners’ perceptual, articulatory, and psycholinguistic experience.
1.3. Formal vs. functional grounding and constraint schemata
The constraint inventory CON is structured around constraint schemata: templates for
sets of formally similar constraints. This section will argue that if any aspect of CON is
innate in all learners, it must be constraint schemata rather than individual constraints.
For this reason, formally and functionally grounded constraints should be distinguished at
the level of schemata rather than at the level of individual constraints.
A familiar example of a constraint schema is the general definition of alignment
constraints in (1) (McCarthy and Prince, 1993b). All alignment constraints are formally
similar in requiring pairs of constituent edges to coincide; individual alignment
constraints differ in which edges of which morphological or prosodic categories are
targeted. In this way a single schema defines a class of logically possible constraints of a
particular form.

(1) ALIGN(Cat1, Edge1, Cat2, Edge2)
    The element standing at the Edge1 of any Cat1 also stands at the Edge2 of some Cat2, where Cat1 and Cat2 are grammatical or prosodic constituents and Edge1 and Edge2 are left or right.
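The generative character of such a schema can be made concrete with a small Python sketch. The category inventory below is a hypothetical fragment chosen for illustration; the point is only that one compact template defines a constraint family whose size grows with the product of the category and edge inventories.

```python
from itertools import product

# Hypothetical fragments of the prosodic and morphological category inventories.
categories = ["Syllable", "Foot", "Word", "Stem", "Suffix"]
edges = ["L", "R"]

# The ALIGN(Cat1, Edge1, Cat2, Edge2) schema generates one constraint for
# each combination of two (category, edge) pairs.
align_constraints = [
    f"ALIGN({c1}, {e1}, {c2}, {e2})"
    for (c1, e1), (c2, e2) in product(product(categories, edges), repeat=2)
    if c1 != c2  # alignment of a category with itself is trivially satisfied
]

# 5 categories x 2 edges = 10 (Cat, Edge) pairs; 10 x 10 combinations minus
# the 20 same-category pairs leaves 80 constraints from one schema.
print(len(align_constraints))  # → 80
```

With a realistic category inventory the listed family is larger still, which is the cognitive-economy point: the schema states in one line what an exhaustive list must state once per constraint.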
While the set of individual constraints generated by a schema can be relatively
large, the schema itself needs to contain comparatively little information. Although
the schema and the list of constraints both describe learners’ knowledge, the schema is a
much more efficient way of expressing this body of information. Therefore the
considerations of cognitive economy which suggest that functionally grounded
constraints are induced rather than innate also suggest that learners’ innate knowledge of
CON is characterized by these relatively economical constraint schemata, rather than by
exhaustive lists of individual constraints.
Returning to the distinction between functionally grounded and formally
grounded constraint schemata, the grounding of a particular schema is often clear. For
example, in the Inductive Grounding framework (Hayes, 1999), the formal complexity
and ‘effectiveness’ of possible *F constraints (where F is some feature or set of features)
is evaluated, where effective constraints prefer articulatorily simple structures to
articulatorily complex ones. Knowledge of articulatory difficulty is collected from
learners’ experience, and effective, formally simple constraints are admitted into learners’
constraint inventories. These *F constraints are functionally grounded because the
schema from which they arise makes direct reference to learners’ articulatory experience.
Similar perceptual, rather than articulatory, information informs constraint
induction in the Schema/Filter model of CON (Smith, 2002) and the Licensing by Cue
framework (Steriade, 1999, 2001a). In the Schema/Filter model, learners evaluate
possible constraints’ ability to prefer perceptually salient structures; those constraints
which satisfy this condition become part of a learner’s inventory. Within the Licensing by
Cue framework, learners induce fixed hierarchies of faithfulness constraints in which
constraints protecting more salient perceptual contrasts are ranked above those protecting
less salient contrasts. In both of these cases, learners’ perceptual experience provides the
necessary information about the perceptual salience of contrasts in various phonotactic
contexts, again producing functionally grounded constraints.
These functionally grounded constraint schemata are fundamentally different
from formally grounded schemata which refer to formal phonological primitives rather
than explicitly to learners’ experience. Formally grounded markedness constraint
schemata include the alignment schema discussed above and schemata for creating
constraints through harmonic alignment and constraint alignment (Prince and Smolensky,
1993/2004). These schemata allow learners to create all logically possible sets of
constraints from formal phonological primitives, rather than from literal aspects of
learners’ experience.3
A universal inventory of formally grounded markedness constraints can be
created by all learners with identical innate formal schemata and features, categories,
scales, etc. A universal inventory of functionally grounded markedness constraints, on the
other hand, can be created by all learners with identical innate functional schemata and
comparable linguistic experience.
While an intuitive distinction between functional and formal constraint schemata
can be made by examining the elements referred to by constraint schemata, this alone is
insufficient justification for assigning schemata to either category. For a given
markedness constraint, any claim that the constraint is functionally grounded must be
supported by showing that its substantive properties – and those of all other constraints
defined by the same schema – are consistently induced from all learners’ immediate
linguistic experience. If a schema contains constraints which are not consistently
inducible, all constraints in that schema are instead formally grounded.
Hayes (1999) demonstrates that a set of typologically attested, articulatorily
motivated constraints is inducible by virtual learners in a simulation of articulatory
difficulty. These constraints are thus functionally grounded. The constraints *[+nasal,
3 McCarthy and Prince (1993b) suggest that alignment constraints are psycholinguistically
motivated, as their effects could be informative as to the location of prosodic and morphological constituent edges. While this is a likely functional motivation for the alignment schema, it seems unlikely that a learner could induce the full set of alignment constraints from the highly variable tendencies towards edge alignment seen in individual languages. For this reason, I consider the alignment constraint schemata to be functionally motivated but crucially formally grounded.
–voice] (*NT) and *[LAB, –voice] (*p) can be induced via Inductive Grounding. These
constraints are functionally grounded because Inductive Grounding, by definition,
requires learners to induce markedness constraints from basic articulatory facts.
Assuming that the articulatory measures available to the simulated learner are those
available to actual learners, all learners with comparable articulatory apparatus will
induce these same constraints. Because the substantive aspects of each of these formally
similar constraints are consistently inducible from all learners’ experience, each of the
constraints is functionally grounded, as is the schema itself.
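The effectiveness criterion at issue can be caricatured in a few lines of Python. This is a drastic simplification of Hayes's actual Inductive Grounding procedure, and the difficulty scores below are invented for illustration, not measured values.

```python
# Toy sketch of the 'effectiveness' test behind Inductive Grounding
# (Hayes 1999). Difficulty scores are hypothetical placeholders.

articulatory_difficulty = {
    "nt": 1.0,   # voiceless stop after nasal: hypothetically hard
    "nd": 0.3,   # voiced stop after nasal: hypothetically easier
    "p":  0.8,   # voiceless labial stop: hypothetically hard
    "b":  0.4,   # voiced labial stop: hypothetically easier
}

def is_effective(banned, permitted, difficulty):
    """A markedness constraint counts as 'effective' if the structures it
    bans are on average harder than comparable structures it permits."""
    mean = lambda xs: sum(difficulty[x] for x in xs) / len(xs)
    return mean(banned) > mean(permitted)

# *[+nasal, -voice] (*NT) bans nt while permitting nd; *[LAB, -voice] (*p)
# bans p while permitting b. Both pass the effectiveness test here.
print(is_effective(["nt"], ["nd"], articulatory_difficulty))  # → True
print(is_effective(["p"], ["b"], articulatory_difficulty))    # → True
```

On this view, any learner whose experience yields comparable difficulty estimates will admit the same constraints, which is what consistent inducibility requires.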
Individual alignment constraints, on the other hand, must be formally grounded
because they are not similarly consistently inducible. For example, section 1.1 discussed
cases where learners of languages with lexical or word-final stress (Hebrew and French,
respectively) produce forms with initial stress. It is extremely unlikely that learners of
these languages (especially French) induce the constraint ALIGN-L(σ́, Word) from their
immediate linguistic experience. If this suggestion that not all alignment constraints are
universally inducible is correct, then the Align schema itself must be innate and formally
grounded.4
1.4. Formally grounded constraints and functional aspects of phonology
This dissertation proposes that functionally and formally grounded constraint schemata
are distinguished by whether all constraints in a schema can be induced from learners’
linguistic experience, or whether the constraints’ substantive aspects must instead be
innately defined. Distinguishing between formally and functionally grounded constraints
in this way is very importantly not the same as distinguishing between all formal and
4 See chapter 5 for further discussion of the importance of empirically investigating claims about what is and is not inducible.
functional aspects of phonology. The latter distinction is much harder to make, as many
(if not most, or even all) aspects of formally grounded constraints are shaped to some
extent by functional factors. The phonological primitives referred to by formally
grounded markedness constraints, while innate, may be ultimately (evolutionarily)
motivated by functional factors without being inducible from each learner’s experience.
For this reason, it can be difficult to determine whether a particular constraint schema is
formally or functionally grounded without experimental justification.
An illustration of phonological elements which are formal and necessarily innate
though ultimately rooted in functional factors is given in Parker’s study of the phonetic
correlates of phonological sonority (Parker, 2002). Phonological differences between
segments in a fine-grained universal sonority scale correlate closely with phonetic
distinctions in the relative acoustic intensity of these segments. Despite this close
connection between sonority values and their functional motivation, Parker concludes
that “[s]onority is a scalar phonological feature which classifies all speech sounds into an
autonomous hierarchy accessible to the constraint component CON. It is thus a
theoretical primitive of Universal Grammar.” (p. 295) The sonority scale is a formal
phonological entity, rather than a literal representation of listeners’ perceptual experience
of intensity.
Despite the overall strength of the correlation between sonority and intensity,
there are two kinds of mismatches which demonstrate that the phonological sonority scale
cannot be induced from phonetic data by each learner. First of all, not all typologically
motivated sonority distinctions correspond to intensity distinctions. English r is more
sonorous than l: r can precede l in codas (as in Carl), but l cannot precede r (*calr).
Phonetically, however, r and l are of equivalent intensity. In addition, segments can
consistently differ in intensity without being correspondingly distinguished
phonologically. The intensity of voiced stops is greater than that of voiceless fricatives;
however, the phonological sonority of voiceless fricatives can be higher than that of
voiced stops, as in Imdlawn Tashlhiyt Berber (Dell and Elmedlaoui, 1985).
These mismatches make sense in terms of the formal features of the segments
involved, thus supporting the claim that functional factors can be generalized and
grammaticalized, giving rise to linguistic objects whose properties are ultimately
formally defined. For example, r and l are distinguished by their manner of articulation,
though not by intensity. As all other manner distinctions are reflected in the sonority
scale, it is unsurprising that this formal feature difference is also reflected in the formal
scale. Similarly, the variable relative sonority of voiced stops and voiceless fricatives can
be accounted for if the innate definition of the scale specifies voiced segments as more
sonorous than voiceless ones, and fricatives as more sonorous than stops, but does not
privilege either feature over the other. Tellingly, all of the distinctions supported by
intensity can also be cast in terms of formal phonological features, and there are no
sonority distinctions between segments not distinguished by formal features (e.g. n > m, y
> w).
The example of the sonority scale demonstrates in a general way that not all
phonological elements with roots in functional phenomena can be consistently induced
by each learner directly from experience with these functional sources. Functional factors
can be generalized and grammaticalized such that innate phonological elements (features,
scales, prosodic categories, etc.) generally reflect, but are no longer literal mappings
from, their functional sources. Formally grounded constraint schemata may refer to these
innate, formal representations of phonetic (or other functional) phenomena. Functionally
grounded constraint schemata, on the other hand, refer to the phonetic phenomena
themselves.
1.5. Outline of the dissertation
The remainder of the dissertation explores the distinction between formally and
functionally grounded constraint schemata and the nature of the schemata themselves.
Chapter 2 proposes a novel schema for markedness constraints which impose parallel
phonotactic restrictions on the edges of all prosodic domains. Segmental restrictions on
domain-initial onset ŋ, ʔ, h, and high-sonority segments are discussed in particular detail.
While these restrictions tend to reflect perceptual facts, individual constraints on marked
domain-initial onsets (*ʔ(Onset/σ), *h(Onset/Word), *ŋ(Onset/Utterance), etc.) cannot all
be induced from learners’ perceptual experience. For this reason, these domain-edge
constraint schemata and all constraints belonging to the schemata are formally grounded.
Chapters 3 and 4 turn to functionally grounded constraint schemata. The empirical
focus of these chapters is a restriction on word-initial p found in languages such as
Cajonos Zapotec (Nellis and Hollenbach, 1980), Ibibio (Essien, 1990), and Moroccan
Arabic (Heath, 1989). Chapter 3 presents the results of perceptual and acoustic
experiments showing that initial p is uniquely perceptually difficult, and also uniquely
acoustically similar to initial b. These phonetic facts are taken to be the basis for initial
p’s phonological markedness.
In order to show that the constraint *#P can be consistently induced by all learners
of all languages, chapter 4 describes a computational model based on the acoustic and
perceptual data collected in these experiments. Virtual learners are exposed to either
pseudo-French, where perceptually difficult word-initial p is attested, or pseudo-Cajonos
Zapotec, where there is no initial p. With only very conservative assumptions about the
nature of learners’ perceptual experience, the model consistently induces the constraint
*#P from acoustically and perceptually realistic input. Chapter 5 concludes.
Chapter 2. Formally grounded markedness constraints
2.1. Formally grounded domain-edge markedness constraints
This chapter will focus on constraints whose grounding, from the perspective of a learner,
is formal rather than functional. Chapter 1 proposed that formally grounded constraint
schemata are innate, whereas functionally grounded constraints are induced by learners from
functional properties of ambient language (as proposed by Hayes (1999), Smith (2002),
and Steriade (1999; 2001a)). Formally grounded schemata therefore refer to innate,
formal linguistic primitives (features, scales, prosodic constituents, etc.) while
functionally grounded schemata refer to literal phonetic and psycholinguistic properties.
For this reason, the grounding of a constraint is often suggested by its definition
and that of the schema to which it belongs. The ultimate distinction between formally and
functionally grounded constraints is an empirical one: if sufficient information to
motivate the induction of all constraints in some schema is available in the experience of
all learners of all languages, those constraints are functionally grounded. If learners’
experience provides insufficient information for the induction of all constraints in a
particular schema, those constraints can only be universally present in all speakers’
grammars if the schema is instead innate. These innate constraints are formally grounded.
Chapter 1 suggests that Alignment constraints, whose general form is repeated in (2)
(McCarthy and Prince, 1993b), are formally grounded rather than induced from learners’
experience. This schema includes constraints which align edges of any pair of
morphological or prosodic categories. McCarthy and Prince discuss the effects of various
attested Alignment constraints: ALIGN(σ́, R, Word, R) penalizes stressed syllables which
are not final in words, while ALIGN(Word, L, σ́, L) penalizes words without initial stress;
ALIGN(Suffix, L, Word, R) licenses suffixes only after prosodically minimal words in
Axininca Campa, while ALIGN(-um-, L, Word, L) places the Tagalog affix -um- near the
left edge of a word.

(2) ALIGN(Cat1, Edge1, Cat2, Edge2)
    The element standing at the Edge1 of any Cat1 also stands at the Edge2 of some Cat2, where Cat1 and Cat2 are grammatical or prosodic constituents and Edge1 and Edge2 are left or right.
The categories and edges referred to by Alignment constraints are formal
elements, and these categories and edges are free to combine in any logically possible
manner. These constraints may be ultimately (evolutionarily) motivated by
psycholinguistic concerns, as their effects may enhance speakers’ ability to identify the
edges of prosodic and morphological constituents. These constraints can emerge in the
grammars of speakers of languages where they are generally inactive: Hebrew and
French learners produce forms with initial stress, despite a lack of evidence for
productive initial stress in either language. It is unlikely that these learners could induce
the appropriate Alignment constraint, ALIGN(Word, L, σ́, L), from their linguistic
experience. Instead these constraints appear to be formally grounded, constructed by
learners from the innate ALIGN(Cat1, Edge1, Cat2, Edge2) schema and other formal, innate
elements. This chapter will examine a novel set of segmental ‘domain-edge markedness
constraints’ which I argue are similarly formally grounded and emerge from innate
schemata.
Domain-edge markedness constraints account for phonotactic parallels among
prosodic domains, explaining the typological generalization that restrictions on syllable
onsets and codas can also hold on edges of any larger prosodic domain. For example, a
segment like ŋ which can be banned in all syllable onsets in a language like Mongolian
(and very nearly in English) can also be banned in strictly word-initial position in West
Greenlandic, while being licensed in medial onsets. In Kunwinjku, ŋ is licensed word-
initially, but tends to be dropped from utterance-initial position. Similar parallel
restrictions hold across final codas of prosodic domains. Mascaró and Wetzels (2001)
demonstrate that languages can implement final devoicing at the ends of all syllables (as
in German), or only at the ends of words (Russian) or phrases (Yiddish). In the
phonotactic patterns of interest here, a smaller set of segments is licensed at the edge of a
prosodic domain than in domain-medial positions. The opposite pattern, in which more
segments are licensed at domain edges, is also attested and has been analyzed within the
framework of positional faithfulness (Beckman, 1999).
This typological generalization regarding the phonotactics of prosodic domain
edges must be accounted for by any theory of phonology. Within Optimality Theory
(Prince and Smolensky, 1993/2004), these parallel phonotactic restrictions must be
accounted for by formal parallels among the constraints in speakers’ grammars – that is,
by schemata for sets of formally parallel constraints. This chapter proposes that in order
to impose parallel restrictions on the edges of all prosodic domains, all markedness
constraints on onsets and codas are part of one of the domain-edge markedness constraint
schemata defined in (3). These schemata give rise to parallel constraints referring to each
level of the prosodic hierarchy.

(3) Domain-edge markedness constraint schemata
    a. MOnset(Onset/PCat)
       Where MOnset is some markedness constraint which targets onsets, and PCat is some prosodic domain, assign one violation for each instance of PCat in which the initial syllable incurs a violation of MOnset.
    b. MCoda(Coda/PCat)
       Where MCoda is some markedness constraint which targets codas, and PCat is some prosodic domain, assign one violation for each instance of PCat in which the final syllable incurs a violation of MCoda.
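The violation-assignment logic of the onset schema in (3a) can be sketched as follows. The flat prosodic representation (an utterance as a list of words, a word as a list of syllables, a syllable as an (onset, rest) pair) and the marked-onset predicate are illustrative assumptions, not claims about phonological representation.

```python
# Sketch of violation assignment for the schema in (3a).

def onset_violations(utterance, is_marked_onset, pcat):
    """Count violations of M_Onset(Onset/PCat): one violation per instance
    of PCat whose initial syllable has a marked onset."""
    if pcat == "Syllable":
        domains = [syl for word in utterance for syl in word]
    elif pcat == "Word":
        domains = [word[0] for word in utterance]      # each word-initial syllable
    elif pcat == "Utterance":
        domains = [utterance[0][0]]                    # the utterance-initial syllable
    else:
        raise ValueError(f"unknown prosodic category: {pcat}")
    return sum(1 for syl in domains if is_marked_onset(syl[0]))

# Hypothetical two-word utterance with ŋ in utterance-initial, word-initial,
# and word-medial onsets: [ŋa.ŋi] [ŋu.ta]
utt = [[("ŋ", "a"), ("ŋ", "i")], [("ŋ", "u"), ("t", "a")]]
bans_velar_nasal = lambda onset: onset == "ŋ"

print(onset_violations(utt, bans_velar_nasal, "Syllable"))   # → 3
print(onset_violations(utt, bans_velar_nasal, "Word"))       # → 2
print(onset_violations(utt, bans_velar_nasal, "Utterance"))  # → 1
```

The nesting of the violation counts (every utterance-initial onset is also word-initial, and every word-initial onset is also a syllable onset) reflects the parallelism across prosodic levels that the schemata are designed to capture.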
While the substantive aspects of domain-edge markedness constraints are likely
motivated in some way by functional factors, I will demonstrate that they are
synchronically like Alignment constraints in being constructed by learners from schemata
which combine formal elements in all logically possible ways. Further, I will argue that
these constraints must be constructed from innate schemata rather than induced from
learners’ linguistic experience, simply because there is no consistent perceptual or
articulatory difficulty associated with the segments and structures penalized in each of
these prosodic positions by each of these constraints. For this reason, these constraints are
formally grounded, rather than functionally grounded, from the perspective of a learner.5
In order to motivate the general schemata in (3), this chapter will provide a
detailed survey of the parallel phonotactic restrictions which target the edges of various
prosodic domains. Specifically, section 2.2 will begin by motivating the more modest
claim that all segments which are marked in syllable onsets are also marked in word-
initial, foot-initial, and utterance-initial onsets. This generalization allows the formulation
of a preliminary constraint schema for marked onsets, *X(Onset/PCat). The factorial
typology and ranking requirements of these constraints are explored in section 2.3, along
with the case for formal rather than functional grounding. Section 2.4 motivates
generalizing the *X(Onset/PCat) schema to the full set of domain-edge markedness
constraint schemata by providing cross-linguistic evidence for a wide range of parallel
phonotactic constraints on onsets and codas of syllables, words, phrases, and utterances.
5 See section 1.4 in chapter 1 for discussion of the difference between functionally motivated,
innate phonological primitives and constraint schemata and functionally grounded constraints which are induced from learners’ experience.
Section 2.5 explores the factorial typology of these constraints more broadly, looking at
their interactions with constraints on prosodic strict layering. Finally, section 2.6 returns
to the issue of why these constraints must be formally rather than functionally grounded.
2.2. Marked domain-initial onset segments
As a first step towards the general schemata for domain-edge markedness constraints, this
section will present cross-linguistic evidence for parallels among marked segments in
syllable onsets, word-initial onsets, and foot-initial and utterance-initial onsets. This data
demonstrates that any segment which is marked in syllable onsets is also marked (and
thus can be banned) in the initial onset of any other prosodic domain. These parallel
restrictions will be used to motivate the preliminary *X(Onset/PCat) constraint schema in
section 2.3, which will then be generalized to the full domain-edge markedness constraint
schemata in section 2.4.
2.2.1. Marked onset segments
While most languages tend to either license identical sets of segments in onset and coda
positions or else license more segments in onsets than in codas (Beckman, 1999: 121-3;
Goldsmith, 1990; Hooper [Bybee], 1976), there is a set of segments which languages can
ban in onsets and which can thus surface exclusively in codas. These include the velar
nasal ŋ, the glottals ʔ and h, and high-sonority segments like glides, rhotics, and laterals.
First, the languages in (4) license ŋ in codas but not in onsets.
(4) ŋ codas; not onsets6
    Doyayo (Wiering and Wiering, 1986)
    Lower Grand Valley Dani (Bromley, 1961)
    Mixe (Van Haitsma and Van Haitsma, 1976: 16)
    Mongolian (Poppe, 1970)
    Mundang (Elders, 2000)
    More examples are discussed by Anderson (2004: 221-2)
In Mixe, for example, m, n, and ŋ are contrastive in medial and final codas, as in
(5a), but only m and n can appear in onsets (initially or medially), as shown in (5b).

(5) Mixe
    a. mu:m ‘somewhere’     kom.ha.bo:m ‘next day’
       tu:n ‘he worked’     wyi:n.c.kly ‘they are skittish’
       tu:ŋ ‘work (N)’      ni:.ha.du’n ‘also’
    b. mac ‘he grabs it’    ci:n.mah ‘mature pine tree’
       na ‘to pass’         muc.n.dy ‘are small’
       *ŋa                  *muc.
Similarly, the glottals ʔ and h can be banned in all of a language’s onsets but
licensed in codas. Languages where these restrictions hold are listed in (6).

(6) a. ʔ codas; not onsets
Balantak (Broselow, 2003: 187; Busenitz and Busenitz, 1991)
Chamicuro, Tiriyo (Parker, 2001: 362)
Finnish (Branch, 1987: 597)
Many Top End Australian languages: Gamu, Gunwinjgu, Jawoyn, Manggarrayi, Ngalakan, Ngandi, Rembarrnga, Wagiman, Warray, Yolngu (Harvey, 1991: 224)
b. h codas; not onsets
Chamicuro (Parker, 2001)
Macushi (Abbott, 1991)
Wiyot (Teeter, 1964)
Evidence from Chamicuro demonstrates that glottals are not simply rare overall and thus
perhaps accidentally missing from onset positions. Instead, in this language, glottals are
6 Languages will be described as lacking some segment x in a particular prosodic position if x is
completely absent from the position; or if x appears in that position only in non-native words, or only in interjections, ideophones, or function words; or if there is a productive process of dropping or changing underlying x in that position.
strongly preferred to other consonants in codas. Sixteen of Chamicuro’s eighteen
consonants are attested in coda position (two consonants are unattested as codas), but 351 of the 375
coda consonants (93.6%) in a 700-word corpus are either ʔ or h (Parker, 1994, 2001). As
glottals are otherwise so frequent, the categorical ban on glottal onsets is a
phonological restriction rather than simply an accidental gap.
Further evidence for the productivity of glottal onset restrictions is found in
Macushi, where a ban on h onsets can induce metathesis. As shown in (7), h and the high
vowels i and u metathesize in morphologically complex words where h would otherwise
be syllabified as an onset. In this situation h thus surfaces as a coda to an onsetless
syllable, rather than as an onset to an otherwise less-marked CV syllable.

(7) Macushi
/koneka-sah-i-ya/ [ko.ne.ka.sa.ih.ya] ‘he made it’
*[ko.ne.ka.sa.hi.ya]
/kuh-toh-u-ya/ [kuh.to.uh.ya] ‘what I did’
*[kuh.to.hu.ya]
A final class of marked onset segments consists of those of high sonority. While no
languages are known to impose absolute phonotactic restrictions against high-sonority
syllable onsets, evidence from patterns of cluster reduction in reduplication and child
phonology suggests that high-sonority segments are more marked in onset position than
are lower-sonority segments. In Sanskrit reduplication, for example, onset clusters
simplify by deleting the higher-sonority member of a cluster, preferentially preserving
low-sonority onsets as in pa-prach and a-ti-trasam (Gnanadesikan, 2004; Kiparsky,
1979; Steriade, 1988; Whitney, 1889).
Similar preferences for low-sonority onsets can be found in children’s cluster
reduction patterns (Gnanadesikan, 2004; Goad and Rose, 2004; Pater and Barlow, 2003).
Gnanadesikan reports that Gitanjali (age 2;3 to 2;9) reduces s-stop and stop-liquid
clusters to stops as in (8a,b) and fricative-sonorant clusters to fricatives as in (8c). She
consistently preserves the lowest-sonority member of an underlying onset cluster. This
typically occurs word-initially, but can also occur medially as in umbrella [fiby] where
the initial unstressed syllable is overwritten with Gitanjali’s productive dummy syllable
fi-.

(8) a. s + stop	b. Stop + liquid	c. Fricative + sonorant
   star [d]	draw [d]	snookie [ski]
   spoon [bun]	please [piz]	sleep [sip]
   straw [d]	umbrella [fiby]	friend [fn]
A final example of children’s preference for low-sonority onsets can be found in
their systematic replacement of high-sonority onsets with homorganic stops. Fikkert
(1994: 57-63) reports that the Dutch-speaking child Jarmo (age 1;8 to 2;2) goes through a
stage of avoiding high-sonority onsets. One of his repair strategies is to replace
underlying fricatives, nasals, liquids, and glides with plosives.

(9) a. Fricative → stop
fiets /fi:ts/ [ti:ts] ‘bicycle’ (2;0.4)
gevallen /valn/ [kal] ‘fallen’ (2;0.28)
b. Nasal → stop
nu /ny:/ [ty:] ‘now’ (1;11.20)
mais /majs/ [pis] ‘mealies’ (2;2.6)
c. Rhotic → stop
regen /re:n/ [te:] ‘rain’ (1;11.20)
d. Glide → stop
wortel /ortl/ [tatw] ‘carrot’ (2;1.8)
High-sonority segments and ŋ, ʔ, and h are thus marked, and can be subject to
restrictions, in syllable onset position (see section 2.3.5 for discussion of why these
segments might be dispreferred in this position). In languages where these segments are
banned syllable-initially, they also typically fail to surface word-initially. This follows
from the implicational nature of prosodic structure, as shown in (10): words typically
begin with syllables, and so a word-initial consonant (C1) is also syllable-initial. Thus if a
segment is banned in all syllable onsets, it will typically never appear word-initially
either.7

(10)         Word
            /    \
           σ      σ
          /|     /|\
        C1 V   C2 V C3
2.2.2. Parallel restrictions on marked word-initial segments
Languages discussed in the previous section ban marked onset segments in all syllables;
they ban these segments in word-initial position simply because this is also a syllable-
initial position. The languages surveyed in this section ban marked onsets in only word-
initial position, while licensing them in the onsets of medial syllables. All marked onsets
which can be banned syllable-initially can also be banned strictly word-initially. These
parallel restrictions on syllable onsets and word-initial onsets contribute to the
generalization that marked syllable onsets are marked in all prosodic domain-initial onset
positions.
In the languages listed in (11), the marked onset may occur in medial onsets and
often in codas as well, but is banned in strictly word-initial onsets.

(11) ŋ codas, medial onsets; not word-initial
Barua (Lloyd and Healey, 1970: 11)
Bashkir (Poppe, 1962)
Bhojpuri (Shukla, 1981)
Bhumij (Ramaswami, 1992)
Ewondo (Abega, 1969)
Gadaba (Bhaskararao, 1998: 328)
Gbeya (Samarin, 1966)
Gumbaingar (Smythe, 1948: 7)
Ijo (Williamson, 1969)
Kapau (Healey, 1981b: 97)
Kobon (Davies, 1981)
Kolami (Subrahmanyam, 1998: 303)
Koṇḍa (Krishnamurti and Benham, 1998: 243)
Kristang (Baxter, 1988)
Limbu (van Driem, 1987: 16)
Mansi (Keresztes, 1998: 394-5)
Santali (Ghosh, 1994: 17)
Selkup (Helimski, 1998a: 554)
Southern Sierra Miwok (Broadbent, 1964)
Sri Lankan Portuguese Creole (Hume and Tserdanelis, 2002: 4)
Telefol (Healey, 1981a; Healey and Healey, 1977: xvi)
Tumpisa Shoshone (Dayley, 1989: 388)
Ura (Crowley, 1998)
Uyghur (Hahn and Ibrahim, 1991)
Wori (Hagège, 1967: 25)
West Greenlandic (Fortescue, 1984)
Yamphu (Rutgers, 1998: 33)
More examples are discussed by Anderson (2004)

7 This is true in prosodic structures which obey strict layering; see section 2.5 for discussion of examples where strict layering is violated.
In a number of these languages, underlying word-initial ŋ can surface as n. In
Yamphu, “[t]he velar nasal /ŋ/ occurs in word-initial position only in a small number of
words, especially in the speech of elderly people. In word-initial position, the velar nasal
may always be replaced with the apico-alveolar nasal /n/” (Rutgers, 1998: 33). Words
with this variation between initial ŋ and n in (12a) contrast with the invariant n-initial
words in (12b). The variable words thus have underlying ŋ, while the invariant words
have underlying n. When ŋ is not word-initial, it does not alternate with n, as in (12c).
Only word-initial ŋ is marked and is avoided in favor of n.
(12) Yamphu
a. ŋa ~ na ‘fish’
ŋa:kma ~ na:kma ‘to request’
b. nema *ŋema ‘to count’
nitci *ŋitci ‘two’
c. nindaŋa *nindana ‘head’
cwæŋdo *cwændo ‘sizzling’
parleŋ *parlen ‘tale’
Languages may also license the glottals ʔ and h in medial onsets (and often in
codas as well) but ban them in word-initial onsets. This occurs in the languages in (13).

(13) a. ʔ medial onsets; not word-initial
Awa (McKaughan, 1973)
Barua (Lloyd and Healey, 1970: 11)
Bhumij (Ramaswami, 1992)
Chepang (Caughley, 2000)
Djinang and Djinba (Waters, 1989)
Fefe Bamileke (Hyman, 1978)
Koṇḍa (Krishnamurti and Benham, 1998: 243)
Lower Grand Valley Dani (Bromley, 1961)
Luiseño (Kroeber and Grace, 1960)
Nahuatl (Sullivan, 1988)
Nganasan (Helimski, 1998b: 484)
Timugon Murut (Prentice, 1971)
Western Shoshoni (Crum and Dayley, 1993: 233)
b. h medial onsets, codas; not word-initial
Carib (Peasgood, 1972: 36)
Sierra Nahuat (Key and Key, 1953: 54)
Ura (Crowley, 1998: 4)
The claim that these marked syllable onset segments may be licensed in medial
onsets but banned in word-initial onsets depends on the medial occurrences of these
segments being true onsets, rather than ambisyllabic. Evidence for the prosodic position
of medial glottal stop is found in Koṇḍa, where ʔ is banned word-initially. Medially, ʔ
can occur at the end of an intervocalic sequence of consonants with decreasing sonority,
as in (14); this is canonically an onset, rather than coda or ambisyllabic, position.
(14) Koṇḍa
ig.ʔa ‘get off’
dork.ʔi.a ‘is found’
panz.ʔi ‘because’
Similarly in Gumbaingar, ʔ can occur freely word-medially, but can be dropped
from word-initial onsets. As in Koṇḍa, medial ʔ can be the second of two heterorganic
intervocalic consonants, as in (15), indicating that it is an onset rather than ambisyllabic.

(15) Gumbaingar
bal.ʔan ‘gristle, sinew, cartilage’
djil.ʔu:jn.ga ‘Australian cedar’
mu.ʔu:r.ʔa.in ‘bloodshot’
Finally, languages can ban high-sonority segments in only word-initial position.
The languages in (16) ban various classes of word-initial segments, effectively setting
upper limits on the acceptable sonority of word-initial segments. Many of these cases are
discussed by Smith (2002: 131-157).

(16) a. No word-initial glides
Dhangar-Kurux (Gordon, 1976: 52)
Malay (Prentice, 1990: 918)
Nung (Saul and Wilson, 1980)
Nunggubuyu (Heath, 1984)
Puluwat (Elbert, 1974: 8)
b. No word-initial glides or rhotics
Sestu Campidanian Sardinian (Bolognesi, 1998; Smith, 2002: 133-138)
c. No word-initial glides, rhotics, or laterals
Chalcatongo Mixtec (Macaulay, 1996)
West Greenlandic (Fortescue, 1984)
The Sestu dialect of Campidanian Sardinian avoids word-initial rhotic and glide onsets
via epenthesis of word-initial vowels. Latin rosa has become Sestu ar:za; Italian radio
has been borrowed as ar:ik:u; other Campidanian dialects use jaju for ‘grandfather’ while
Sestu uses ajaju.
In addition to these languages which impose literal sonority thresholds on word-
initial segments, a number of languages license word-initial glides while banning other
high-sonority segments (i.e. rhotics; rhotics and laterals; rhotics, laterals, and nasals).
Languages of this type are listed in (17).

(17) a. No word-initial rhotics
Mbabaram (Dixon, 1991)
b. No word-initial rhotics or laterals
Kuman (Smith, 2002: 140; Trefry, 1969: 2-5)
Mongolian (Poppe, 1970; Ramsey, 1987: 205-209)
Piro (Matteson, 1965: 29)
Telefol (Healey, 1981a; Healey and Healey, 1977: xvi)
Many Australian languages (Hamilton, 1996)
c. No word-initial rhotics, laterals, or nasals
Turkish (Kornfilt, 1997)
Smith (2002) and Flack (2006) argue that these restrictions are also due to sonority-based
restrictions. Despite the general prohibition against word-initial segments whose sonority
is equal to or higher than that of e.g. laterals, these initial glides could surface either
because they are part of the nucleus and so not truly onsets or because of high-ranking
glide-specific faithfulness constraints.
A particularly severe sonority-based restriction is found in Turkish, where the
phoneme inventory is as in (18). Kornfilt (1997: 492) reports that “[w]ords of the native
vocabulary don’t, in general, begin with the following segments: [dʒ], [f], [ʒ], [l], [m],
[n], [ɾ], or [z].” Turkish bans its rhotic, lateral, and nasals word-initially (and neutralizes
the voice contrast in fricatives), while licensing word-initial glides.
(18) Turkish phoneme inventory (*x = banned word-initially)

            Labial  Labiodental  Alveolar  Palatal  Velar  Glottal
 Stop       p b                  t d                k g
 Fricative          *f v         s *z      ʃ *ʒ            h
 Affricate                                 tʃ *dʒ
 Nasal      *m                   *n
 Lateral                         *l
 Rhotic                          *ɾ
 Glide                                     j
Kornfilt notes that exceptions to this generalization are found in onomatopoeic
words and a small number of function words, like the interrogative clitic [mɯ] and the
particle [ne] ‘what’. Function words may be subject to different phonotactic restrictions
than lexical words; setting these aside, the Turkish lexicon is heavily restricted with
respect to the sonority of word-initial segments.
2.2.3. Generalized domain-initial markedness
Each segment which is marked (and can be banned) in syllable onsets is also marked (and
so can also be banned) in strictly word-initial onsets. These parallel restrictions suggest
that the markedness of ŋ, ʔ, h, and high-sonority segments in syllable-initial and word-
initial positions stems from the shared prosodic properties of these two positions: each is
the initial onset of a prosodic domain.
The parallels among syllable-initial and word-initial restrictions can be unified
under the proposal that ŋ, ʔ, h, and high-sonority segments are generally marked in
prosodic domain-initial positions. Syllable-initial and word-initial restrictions are specific
instances of this general fact. This generalization makes the prediction that restrictions on
these same segments should be found initially in other prosodic domains. The following
sections will demonstrate that this prediction is accurate: marked syllable-initial and
word-initial segments are also marked in utterance-initial position, and possibly in foot-
initial position as well.
2.2.3.1. Parallel restrictions on marked utterance-initial segments
Segments which can be banned in syllable-initial and word-initial positions can also be
banned strictly utterance-initially. For example, in Kaiwa, ʔ is reportedly licensed word-
medially and initially, but banned in strictly utterance-initial position (Bridgeman, 1961:
332). West and Welch (1967: 14) similarly describe h in Tucano as failing to appear only
utterance-initially.
A dispreference for utterance-initial ŋ is found in the Kunwinjku dialect of Bininj
Gun-Wok. Evans (2003: 94-5) observes that word-initial ŋ is variably deleted in
Kunwinjku. Unlike similar processes of word-initial ŋ-drop in other Australian languages
(e.g. Gumbaingar (Smythe, 1948), Innamincka Yandruwandha (Breen, 2004)), however,
the tendency to delete word-initial ŋ in Kunwinjku is strongest utterance-initially. Evans
describes Kunwinjku as having “[a] large number of words which freely drop the initial ŋ
found in their cognates in other dialects, particularly when coming at the beginning of a
breath group.” (p. 94)

(19) Kunwinjku
ŋanabbau ~ anabbau ‘buffalo’
ŋan-bebe ~ an-bebe ‘ghost gum’
ŋaje ~ aje ‘I, me’
ŋokko ~ okko ‘already’
ŋuniwam ~ uniwam ‘you two went’
Evans argues that the ŋ-initial, rather than vowel-initial, variants are underlying, noting
that the ŋ-initial pronunciations are considered to be more correct: “I have heard
Kunwinjku speakers also make this vowel-initial pronunciation, but then correct my
repetitions by restoring the ŋ-. They also standardise toward the ŋ-initial spelling when
writing in Kunwinjku.” (p. 94)
Reports of this sort of utterance-initial restriction are rare, and in fact there are no
reports at all of languages with restrictions against strictly utterance-initial high-sonority
segments. This should not be taken as evidence against the existence or productivity of
such restrictions; instead, it is a natural consequence of the fact that most language
descriptions focus on word-level phonology. There are relatively few reports of any sort
of phonological phenomena in domains larger than the word, though careful study has
identified a great deal of phonological activity at higher prosodic levels (Nespor and
Vogel, 1986; Selkirk, 1981, 1984). Despite the scarcity of reported utterance-initial
restrictions, the restrictions described here parallel those which hold at the left edge of
smaller prosodic domains. The existence of these parallels supports the claim that ŋ, ʔ, h,
and high-sonority segments can be banned at the left edge of any prosodic domain.
2.2.3.2. Parallel restrictions on marked foot-initial segments
The claim that marked syllable onsets can be banned initially in any prosodic domain
predicts that restrictions against ŋ, ʔ, h, and high-sonority segments should be found foot-
initially, as well as word- and utterance-initially. Purely foot-oriented phonotactic
restrictions are extremely difficult to distinguish from stress-based phonotactic
restrictions (see e.g. Smith (2002: 97-115)). For this reason, feet will be set aside through
most of the discussion of prosodic domains in this chapter. There are, however, languages
in which the marked onsets discussed above are absent in foot-initial position, suggesting
that these parallel restrictions do exist.
The distribution of ŋ in English and German is consistent with a foot-initial
restriction. The phonotactics of ŋ are identical in these two languages: ŋ is licensed in
codas (as in sing and blanket, and German Ding ‘thing’ and dunkel ‘dark’), and in
intervocalic onsets following stressed syllables (as in dinghy and orangutan, and German
ringen ‘to struggle’). ŋ is banned in the onsets of syllables which are word-initial or
stressed. Wiese unifies these positions where ŋ is banned as both foot-initial,
“presupposing…that initial syllables with non-primary stress are dominated by their own
foot.” (Wiese, 1996: 59)8
There are also suggestions that high-sonority segments are avoided in foot-initial
position. Waters (1989) reports that Djinang words are composed of series of trochees (in
his words, ‘rhythmic units’) in which the initial consonant tends to be a nasal or oral stop
while the medial consonant tends to be a liquid or nasal. Foot-initial consonants in
Djinang thus tend to be less sonorous than foot-medial consonants. This tendency
towards a sonority threshold for foot-initial segments is parallel to (though less
categorical than) the sonority thresholds which languages can impose on syllable-initial
or word-initial segments.
2.2.4. Summary of the domain-initial onset restrictions
This section has demonstrated that those segments which can be banned in all of a
language’s onsets, while being licensed in codas, can also be banned initially in larger
prosodic domains: feet, words, and utterances. These restrictions are summarized in
(20).9
8 McCarthy (2001) describes the distribution of English ŋ as occurring only after short vowels; this
condition also holds of German ŋ. This leads McCarthy to propose that ŋ must head a mora, thus imposing a further restriction beyond non-foot-initiality on ŋ.
9 Many of the marked domain-initial onsets (, h, and glides) are also frequently epenthesized domain-initially when domains would otherwise be without initial onsets. See section 2.4.1 for discussion of these cases.
(20) Summary of restricted prosodic domain-initial segments

                         Syllable        Foot      Word     Utterance
 ŋ                       Mixe            English   Yamphu   Kunwinjku
 ʔ                       Chamicuro                 Nahuatl  Kaiwa
 h                       Macushi                   Carib    Tucano
 High-sonority segments  child language  Djinang   Turkish
Rather than stipulating the markedness of these segments in each prosodic domain
via arbitrary sets of independent markedness constraints, these parallels invite a
theoretical mechanism which imposes the same set of restrictions on initial onsets of all
prosodic domains. Such a mechanism would render these parallels predictable and
expected, rather than arbitrary or accidental. The next section will propose a constraint
schema which does just this, and will demonstrate that the schema is formally, rather than
functionally, grounded. This schema will then be generalized to account for the fact that
all phonotactic restrictions on syllable onsets and codas can hold on the edges of all
larger prosodic domains.
2.3. A constraint schema for marked onsets: *X(Onset/PCat)
Any restriction on syllable onset segments can also target the initial onset of any other
prosodic domain. This generalization motivates the proposal that any markedness
constraint which encodes a restriction on syllable onset segments is part of a constraint
schema composed of individual constraints enforcing that restriction on the initial onsets
of each prosodic domain.
After the proposed schema is defined in section 2.3.1, sections 2.3.2–2.3.3
examine the factorial typology predicted by these constraints’ interaction with general
and positional faithfulness constraints. This discussion demonstrates that the constraints
make accurate predictions regarding possible domain-specific onset restrictions. Section
2.3.4 continues to investigate the formal properties of this constraint schema, showing
that no fixed rankings or stringency relations are imposed among parallel domain-specific
constraints. Finally, section 2.3.5 discusses the issue of formal vs. functional grounding,
determining that these constraints cannot be induced from phonetic facts of a learner’s
experience and so must be innate and formally grounded.
2.3.1. The *X(Onset/PCat) constraint schema
The discussion above demonstrated that for each segmental restriction on syllable onsets,
there are parallel restrictions on initial onsets of words, and also typically on initial onsets
of utterances and feet. Taken together, these correspondences among phonotactic
restrictions demonstrate a fundamental similarity among initial onsets of all prosodic
domains: any marked onset segment can be banned in any domain-initial onset.
This section proposes that a constraint schema is responsible for these parallels. In
order to formulate the schema, the notion of onset must first be slightly reformulated such
that all prosodic domains have onsets. This new general definition of the onset of a
prosodic domain is based on the traditional notion of onset, i.e. all of the consonants in a
syllable which precede the head. The general definition is given in (21), and examples of
specific versions of this definition as it applies to syllables, words, and utterances are in
(22). While the examples in (22) refer only to some prosodic domains, the general
definition assumes that all other prosodic domains (e.g. feet, phrases) have onsets as well.
(21) Onset/PCat
The onset of PCat, where PCat is some prosodic domain (e.g. syllable, word, utterance):
All consonants in PCat which belong to the leftmost syllable of PCat and which precede that syllable’s head.10

(22) a. Onset/σ
The onset of a syllable.
All consonants in a syllable (which belong to the leftmost syllable of the syllable and) which precede that syllable’s head.11
b. Onset/Word
The onset of a word.
All consonants in a word which belong to the leftmost syllable of the word and which precede that (leftmost) syllable’s head.
c. Onset/Utterance
The onset of an utterance.
All consonants in an utterance which belong to the leftmost syllable of the utterance and which precede that (leftmost) syllable’s head.
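Definition (21) can also be read procedurally: descend to the leftmost syllable of a prosodic constituent and collect the consonants preceding that syllable’s head. The following sketch is purely illustrative — the nested-list representation of prosodic structure and the toy vowel inventory are assumptions of the example, not part of the proposal:

```python
# Illustrative sketch of Onset/PCat (definition (21)): the onset of a
# prosodic domain is the consonants of its leftmost syllable that
# precede that syllable's head (here, its vowel).
# Assumed toy representation: a syllable is a list of segment strings;
# a larger domain is a list of its daughter constituents.
VOWELS = {"a", "e", "i", "o", "u"}  # toy vowel inventory

def leftmost_syllable(pcat):
    """Descend through nested constituents to the leftmost syllable."""
    while pcat and isinstance(pcat[0], list):
        pcat = pcat[0]
    return pcat

def onset(pcat):
    """Onset/PCat: consonants preceding the head of PCat's leftmost syllable."""
    result = []
    for seg in leftmost_syllable(pcat):
        if seg in VOWELS:  # reached the syllable's head; stop collecting
            break
        result.append(seg)
    return result

# A strictly layered two-word utterance: [[ta.ʔa] [ʔa.ta]]
word1 = [["t", "a"], ["ʔ", "a"]]
word2 = [["ʔ", "a"], ["t", "a"]]
utterance = [word1, word2]

print(onset(word1))      # ['t']  -- Onset/Word of the first word
print(onset(word2))      # ['ʔ']  -- a word-initial ʔ is also syllable-initial
print(onset(utterance))  # ['t']  -- Onset/Utterance = onset of the leftmost syllable
```

Because the leftmost syllable of a word (or utterance) is shared with every domain it begins, a single segment can simultaneously be the onset of several nested domains — the property the constraint schema in section 2.3 exploits.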
Given these definitions, a constraint schema which gives rise to all of the onset
restrictions discussed in the previous section can now be formulated. The schema consists
of constraints of the general form *X(Onset/PCat) as defined in (23), which refer to each
level of the prosodic hierarchy. Some examples of *X(Onset/PCat) constraints, which
will be referred to generally as ‘domain-specific onset markedness constraints’, are given
in (24).
10 This definition is deliberately ambiguous as to whether onsets are actual syllable constituents or
not.
11 The “leftmost syllable” belonging to a syllable is, of course, that syllable itself given proper containment. Thus the onset of a syllable under this revised definition refers to the same portion of a syllable as do earlier definitions of ‘onset’.
(23) *X(Onset/PCat)
Where X is some segment or (set of) feature(s) and PCat is some prosodic domain, assign one violation for each instance of X in an onset of PCat.
‘X cannot be the (leftmost) onset of PCat.’

(24) *X(Onset/Utterance) *X(Onset/Word) *X(Onset/σ)

Specific restrictions against the marked domain-initial onset segments discussed
above are imposed by particular instantiations of the *X(Onset/PCat) constraint schema.
Restrictions against onset ŋ, ʔ, and h are enforced by the constraints in (25).

(25) a. *ŋ(Onset/Utterance) *ŋ(Onset/Word) *ŋ(Onset/σ)
b. *ʔ(Onset/Utterance) *ʔ(Onset/Word) *ʔ(Onset/σ)
c. *h(Onset/Utterance) *h(Onset/Word) *h(Onset/σ)
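Read as an evaluation procedure, each instantiation of (23) simply counts occurrences of X in the relevant domain-initial onset. A minimal sketch — the representation of a candidate as segments annotated with the domains whose onsets they occupy is an assumption of the example, not part of the proposal:

```python
# Sketch of the *X(Onset/PCat) schema in (23): assign one violation for
# each instance of X in an onset of PCat. Assumed toy representation: a
# candidate is a list of (segment, domains-whose-onset-it-occupies) pairs.
def star(x, pcat):
    """Construct the markedness constraint *x(Onset/pcat)."""
    def constraint(candidate):
        return sum(1 for seg, onset_of in candidate
                   if seg == x and pcat in onset_of)
    constraint.label = f"*{x}(Onset/{pcat})"
    return constraint

# ʔa.ʔa parsed as a one-word utterance under strict layering: the first
# ʔ is simultaneously syllable-, word-, and utterance-initial; the
# second is only syllable-initial.
candidate = [("ʔ", {"syll", "word", "utt"}), ("a", set()),
             ("ʔ", {"syll"}), ("a", set())]

for pcat in ("utt", "word", "syll"):
    c = star("ʔ", pcat)
    print(c.label, c(candidate))  # 1, 1, and 2 violations respectively
```

These counts reproduce the violation profile of the fully faithful candidate in the tableaux of the next section.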
The sets of constraints in (26) are those in which the *X(Onset/PCat) constraint schema is
aligned with the sonority scale. The fact that sets of high-sonority segments can be
banned domain-initially can be accounted for by either fixed rankings among constraints
targeting each prosodic domain, as shown in (26), or by stringency relations among the
constraints. In either case, the relationships between these sonority-based constraints on
domain-initial onsets are inherited from the sonority scale.
(26) *Glide(Ons/Utt) » *Rho(Ons/Utt) » *Lat(Ons/Utt) » *Nasal(Ons/Utt) » *Fric(Ons/Utt)
*Glide(Ons/Wd) » *Rho(Ons/Wd) » *Lat(Ons/Wd) » *Nasal(Ons/Wd) » *Fric(Ons/Wd)
*Glide(Ons/σ) » *Rho(Ons/σ) » *Lat(Ons/σ) » *Nasal(Ons/σ) » *Fric(Ons/σ)
2.3.2. Factorial typology: General faithfulness and *X(Onset/PCat) constraints
In an OT grammar, interactions among *X(Onset/PCat) constraints and faithfulness
constraints account for the domain-specific restrictions on marked onsets described
above. When all of the relevant faithfulness constraints are ranked below the constraints
against ʔ in all onset positions, the pattern of total avoidance of all onset ʔ emerges, as in
Chamicuro.

(27) Chamicuro: No syllable onset ʔ12

 ʔaʔa        *ʔ(Onset/Utt)  *ʔ(Onset/Wd)  *ʔ(Onset/σ)  IDENT
 a. ʔa.ʔa    *!             *             **
 b. ☞ ta.ta                                            **
The issue of fixed vs. free ranking of *X(Onset/PCat) constraints will be addressed in
section 2.3.4; for now, tableaux will follow the convention of arranging constraints such
that constraints on larger domains are to the left of constraints on smaller domains.
When faithfulness constraints are ranked below *ʔ(Onset/Utterance) and
*ʔ(Onset/Word) but above *ʔ(Onset/σ) as in (28), the Nahuatl pattern emerges: ʔ can
surface in medial but not word-initial or utterance-initial onsets.

(28) Nahuatl: No word onset ʔ

 ʔaʔa        *ʔ(Onset/Utt)  *ʔ(Onset/Wd)  IDENT  *ʔ(Onset/σ)
 a. ʔa.ʔa    *!             *                    **
 b. ☞ ta.ʔa                                *     *
 c. ta.ta                                  **!
If faithfulness is ranked below *ʔ(Onset/Utterance) but above *ʔ(Onset/Word)
and *ʔ(Onset/σ), glottal stops may surface in word-initial and word-medial onsets but not
utterance-initially, as in Kaiwa.
12 In this and other hypothetical tableaux I assume the familiar OT idea of Richness of the Base
(Prince and Smolensky, 1993/2004), under which there are no restrictions on inputs; any imaginable input will have some winning output form in each language. Additionally, in these tableaux illustrating phonotactic restrictions, the winning unfaithful mappings are themselves hypothetical. That is, in (27), the crucial point is simply that onset ʔ does not surface faithfully; the /ʔ/ → [t] mapping is hypothetical.
(29) Kaiwa: No utterance onset ʔ

a. Utterance-medial
 [Utt … ʔaʔa        *ʔ(Onset/Utt)  IDENT  *ʔ(Onset/Wd)  *ʔ(Onset/σ)
 a. ☞ [Utt … ʔa.ʔa                        *             **
 b. [Utt … ta.ʔa                   *!                   *
 c. [Utt … ta.ta                   **!

b. Utterance-initial
 [Utt ʔaʔa          *ʔ(Onset/Utt)  IDENT  *ʔ(Onset/Wd)  *ʔ(Onset/σ)
 a. [Utt ʔa.ʔa      *!                    *             **
 b. ☞ [Utt ta.ʔa                   *                    *
 c. [Utt ta.ta                     **!
Finally, when faithfulness constraints dominate all of the constraints against
domain-initial glottal stop onsets, a language (like Arabic, among others) allows glottal
stop in all onsets.

(30) Arabic: No restrictions on onset ʔ

 ʔaʔa        IDENT  *ʔ(Onset/Utt)  *ʔ(Onset/Wd)  *ʔ(Onset/σ)
 a. ☞ ʔa.ʔa         *              *             **
 b. ta.ʔa    *!                                  *
 c. ta.ta    **!
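The four patterns in (27)–(30) differ only in where faithfulness falls among the three markedness constraints, and this can be checked mechanically with the standard lexicographic comparison of violation profiles. The sketch below hard-codes the violation counts for an utterance-initial input ʔaʔa, as in the tableaux; the abbreviated constraint names are mine:

```python
# Sketch of the factorial typology in (27)-(30): a candidate wins if its
# violation profile, read off in ranking order, is lexicographically
# least -- the standard OT evaluation. Counts are for an utterance-
# initial input ʔaʔa, as in the tableaux above.
CANDIDATES = {            # (Utt, Wd, Syll, Ident) violations
    "ʔa.ʔa": (1, 1, 2, 0),
    "ta.ʔa": (0, 0, 1, 1),
    "ta.ta": (0, 0, 0, 2),
}
INDEX = {"Utt": 0, "Wd": 1, "Syll": 2, "Ident": 3}

def winner(ranking):
    """Return the candidate with the lexicographically least profile."""
    return min(CANDIDATES,
               key=lambda cand: [CANDIDATES[cand][INDEX[c]] for c in ranking])

print(winner(["Utt", "Wd", "Syll", "Ident"]))  # ta.ta  -- Chamicuro (27)
print(winner(["Utt", "Wd", "Ident", "Syll"]))  # ta.ʔa  -- Nahuatl (28)
print(winner(["Utt", "Ident", "Wd", "Syll"]))  # ta.ʔa  -- Kaiwa (29b)
print(winner(["Ident", "Utt", "Wd", "Syll"]))  # ʔa.ʔa  -- Arabic (30)
```

Note that the Nahuatl and Kaiwa rankings select the same winner here because the input is utterance-initial; the two grammars are distinguished by utterance-medial inputs, as in (29a).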
This section has shown that domain-specific onset markedness constraints can
give rise to the restrictions exemplified above, through their interaction with general
faithfulness constraints. The next section will show that these constraints’ ranking with
respect to positional faithfulness constraints also gives rise to attested phonotactic
patterns.
2.3.3. Factorial typology II: Positional faithfulness and *X(Onset/PCat) constraints
Domain-specific onset markedness constraints penalize marked segments or structures at
the beginnings of prosodic domains. Conversely, positional faithfulness constraints
penalize unfaithful mappings in various positions, including the beginnings of many
prosodic domains (Beckman, 1999). Both positional constraint frameworks target word-
initial position: *X(Onset/Word) constraints can ban particular segments from word-
initial onsets, resulting in patterns where word-initial onsets license a subset of the onsets
which may occur word-medially. IDENT/σ1 constraints, on the other hand, preserve
contrasts in word-initial syllables, and so can give rise to patterns where word-initial
onsets license a superset of the segments which may occur in word-medial onsets.
In OT terms, a direct conflict between domain-edge positional markedness
constraints and positional faithfulness constraints is therefore possible. Rankings like that
in the hypothetical tableau in (31) are of particular interest in fully understanding the
factorial typology of domain-edge markedness constraints.

(31) Marked onsets are only licensed word-initially

a.
 ʔaga        IDENT/σ1  *ʔ(Onset/Wd)  *ʔ(Onset/σ)  IDENT
 a. ☞ ʔa.ga            *             *
 b. ga.ga    *!                                   *

b.
 gaʔa        IDENT/σ1  *ʔ(Onset/Wd)  *ʔ(Onset/σ)  IDENT
 a. ga.ʔa                            *!
 b. ☞ ga.ga                                       *
Here, IDENT/σ1 dominates *ʔ(Onset/Word) and *ʔ(Onset/σ), which themselves dominate
the general faithfulness constraint IDENT. The result of this ranking is a predicted
language in which a marked onset (here, ʔ) is banned in medial onsets due to *ʔ(Onset/σ)
» IDENT, but permitted in word-initial onsets because of IDENT/σ1 » *ʔ(Onset/Word).
This sort of pattern in which marked onset segments are preferentially licensed in
word-initial position, rather than banned word-initially as discussed elsewhere in this
chapter, occurs in a number of languages as shown in (32).

(32) Marked onsets in word-initial, not medial onsets
ŋ: Lango (Noonan, 1992: 10, 16-7)
h: Lamani (Trail, 1970)
   Lele (Frajzyngier, 2001)
   Mbay (Keegan 1997)
   Songhay (Prost, 1956)
   Tsisaath Nootka (Stonham, 1999)
   Wiyot (Teeter, 1964)
   Yana (Sapir and Swadesh, 1960)
In Lango, for example, ŋ can be a word-initial onset, as in (33a), but not a medial
onset. When a morphologically complex word would be expected to have a medial onset
ŋ, e.g. when an ŋ-final word is followed by a vowel-initial suffix as in (33b), ŋ deletes
and the flanking vowels are nasalized.

(33) Lango
a. ŋec ‘back’
ŋwe ‘smelly’
ŋwcc ‘to run from’
ŋu: ‘beast of prey’
b. /cIŋ-e/ [cIe] ‘hands’
/c-e/ [ce] ‘knees’
/-e/ [e] ‘crocodiles’
/tyaŋ-e/ [tyae] ‘durra stalks’
The distribution of ŋ in Lango can be accounted for by the ranking IDENT/σ1 »
*ŋ(Onset/Word), *ŋ(Onset/σ) » IDENT in (31) above. Similar rankings, where positional
faithfulness constraints dominate domain-edge markedness constraints which in turn
dominate general faithfulness, are responsible for other familiar patterns in which more
contrasts are licensed at domain edges than domain-medially; see e.g. Beckman (1999)
and Broselow (2003) for discussion of such patterns.
2.3.4. Implicational restrictions and free ranking of *X(Onset/PCat) constraints
As noted in section 2.3.1 above, within a language, restrictions against marked domain-
initial onsets are typically implicational along the prosodic hierarchy. Within a given
language, a prosodic restriction which holds in a small prosodic domain typically holds in
larger domains as well: no syllable onset ʔ generally implies no word onset ʔ (as in
Chamicuro), and no word onset ʔ generally implies no utterance onset ʔ (as in Chamicuro
and Nahuatl). Restrictions can, however, hold on larger domains without holding in
smaller domains: again within a language, no word onset ʔ does not imply no (medial)
syllable onset ʔ.
These implicational relations among restrictions follow from the structure of the
prosodic hierarchy, rather than from any fixed ranking or stringency relations among the
domain-edge markedness constraints. When a language obeys prosodic strict layering
(like the languages discussed above), prosodic structures are implicational in nature:
utterances begin with words, and words begin with syllables. Utterance-initial segments
are thus also word-initial, and word-initial segments are also syllable-initial. When this
type of language bans ʔ in syllable onsets, it will also lack word-onset ʔ and utterance-onset
ʔ simply because word and utterance onsets are also syllable onsets. Because the
implicational nature of these phonotactic restrictions emerges from the implicational
nature of prosodic structures in this way, the constraints defined in (23) above are
effectively stringent when strict layering holds, without explicitly stringent formulations
or fixed ranking.
This can also be seen by considering the implicational restrictions in terms of the
effects of each constraint. *ʔ(Onset/σ) explicitly bans syllable-onset ʔ. Assuming strict
layering, all word-initial segments are also syllable-initial; *ʔ(Onset/σ) therefore also
bans all word-onset ʔ. *ʔ(Onset/Word), on the other hand, bans word-onset ʔ without
imposing any restriction on non-word-initial syllable onsets. So when strict layering
holds, it is a consequence of the implicational nature of prosodic structure that constraints
on onsets of small prosodic domains (e.g. syllables) are more stringent than constraints
on onsets of larger prosodic domains (e.g. words). That is, constraints on prosodic
domains automatically stand in specific-general relationships: *X(Onset/σ) constraints
have more general effects than the parallel, more specific *X(Onset/Word) constraints.
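This specific–general relationship can be made concrete with a small sketch (hypothetical representation, with ‘?’ standing in for the glottal stop and words as lists of syllable strings): under strict layering, the sites penalized by *ʔ(Onset/Word) are automatically a subset of those penalized by *ʔ(Onset/σ).

```python
# Hypothetical sketch: '?' stands in for the glottal stop. Under strict
# layering, every word-initial onset is also a syllable onset, so the sites
# penalized by *?(Onset/Word) are a subset of those penalized by *?(Onset/s).

def glottal_onset_sites(words):
    syll_sites, word_sites = set(), set()
    for wi, word in enumerate(words):
        for si, syll in enumerate(word):
            if syll.startswith("?"):
                syll_sites.add((wi, si))      # penalized by *?(Onset/s)
                if si == 0:                   # word begins with this syllable
                    word_sites.add((wi, si))  # also penalized by *?(Onset/Word)
    return syll_sites, word_sites

s, w = glottal_onset_sites([["?a", "ta"], ["ta", "?a"]])
assert w <= s                # the general constraint covers the specific one
assert s - w == {(1, 1)}     # a medial glottal onset: only *?(Onset/s) applies
```
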
The ‘pseudo-stringent’ behavior of domain-specific marked onset constraints
under strict layering allows these constraints to be freely ranked with respect to each
other while predicting only the attested patterns of implicational restrictions described
above. This can be shown by considering the general factorial typology predicted by the
interactions of specific and general marked onset constraints with faithfulness. The
following discussion will consider the interaction of specific *ʔ(Onset/Word) and general
*ʔ(Onset/σ); the results of this discussion can be generalized to all domain-specific
marked onset constraints.
There are six possible rankings of *ʔ(Onset/σ) and *ʔ(Onset/Word) with respect
to a faithfulness constraint like IDENT. These six rankings allow three different winning
output forms for input /ʔaʔa/, as summarized in (34). These three patterns are all
attested, as described above: the marked onset ʔ can be banned in all syllable onsets, in
only word-initial onsets, or it can be licensed everywhere.
(34) a. ʔ banned in all onsets: Balantak /ʔaʔa/ → [ta.ta]
        *ʔ(Onset/Word) » *ʔ(Onset/σ) » IDENT
        *ʔ(Onset/σ) » *ʔ(Onset/Word) » IDENT
        *ʔ(Onset/σ) » IDENT » *ʔ(Onset/Word)
     b. ʔ banned in word-initial onsets: Nahuatl /ʔaʔa/ → [ta.ʔa]
        *ʔ(Onset/Word) » IDENT » *ʔ(Onset/σ)
     c. ʔ licensed in all positions: Arabic /ʔaʔa/ → [ʔa.ʔa]
        IDENT » *ʔ(Onset/Word) » *ʔ(Onset/σ)
        IDENT » *ʔ(Onset/σ) » *ʔ(Onset/Word)
The comparative tableau in (35) (in which constraints are unranked) shows that
any of the three rankings in (34a) will map input /ʔaʔa/ to output [ta.ta].13 *ʔ(Onset/σ)
must dominate IDENT, as *ʔ(Onset/σ) is the only constraint which favors the winner over
the loser [ta.ʔa]. This ranking also favors the winner over the loser [ʔa.ʔa], thus
guaranteeing that [ta.ta] wins. The ranking of *ʔ(Onset/Word) is therefore irrelevant to
the outcome, so any of the three possible rankings in which *ʔ(Onset/σ) dominates IDENT
will produce this mapping.

(35) *ʔ(Onset/σ) » IDENT; *ʔ(Onset/Word) irrelevant

     /ʔaʔa/ (no syllable-initial ʔ) | *ʔ(Onset/Word) | IDENT | *ʔ(Onset/σ)
     a. ta.ta                       |                | **    |
     b. ta.ʔa                       |                | * L   | * W
     c. ʔa.ʔa                       | * W            | L     | ** W
13 A comparative tableau (Prince, 2002) shows constraints’ favoring relations among candidates. For each constraint, and each candidate other than the winner, the tableau shows whether the constraint favors the winner over the loser (“W”), the loser over the winner (“L”), or neither (empty cell). While these tableaux do not indicate constraint ranking by left-to-right ordering, as traditional violation tableaux do, they can be used to determine necessary ranking conditions: for a ranking to map the input to the desired output, each constraint which favors some loser over the desired winner must be dominated by a constraint that favors the winner over that loser. That is, when constraints are ordered according to their ranking, each L in a row must be dominated by a W in the same row.
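Footnote 13’s W/L annotation is itself a simple computation. A sketch (hypothetical helper, not Prince’s own formulation, with ‘?’ for the glottal stop): compare the winner’s and each loser’s violation counts constraint by constraint.

```python
# Hypothetical helper, a sketch of footnote 13: for one loser row, mark each
# constraint W if it favors the winner (the loser incurs more violations),
# L if it favors the loser, and '' if neither.

def comparative_row(winner_viols, loser_viols, constraints):
    row = {}
    for c in constraints:
        if loser_viols[c] > winner_viols[c]:
            row[c] = "W"
        elif loser_viols[c] < winner_viols[c]:
            row[c] = "L"
        else:
            row[c] = ""
    return row

# Row (c) of tableau (35): winner [ta.ta] vs. loser [?a.?a]
cons = ["*?(Onset/Word)", "IDENT", "*?(Onset/s)"]
row = comparative_row({cons[0]: 0, cons[1]: 2, cons[2]: 0},
                      {cons[0]: 1, cons[1]: 0, cons[2]: 2}, cons)
assert row == {cons[0]: "W", cons[1]: "L", cons[2]: "W"}
```
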
The ranking *ʔ(Onset/Word) » IDENT » *ʔ(Onset/σ) maps input /ʔaʔa/ to output
[ta.ʔa], as in (34b). Tableau (36) shows that, in order for this mapping to occur, IDENT
must dominate *ʔ(Onset/σ), as IDENT is the only constraint favoring winning [ta.ʔa]
over the losing candidate [ta.ta]. IDENT favors the loser [ʔa.ʔa] over the winner,
however, and so IDENT must be ranked below *ʔ(Onset/Word), which favors the winner
over this loser.

(36) *ʔ(Onset/Word) » IDENT » *ʔ(Onset/σ)

     /ʔaʔa/ (no word-initial ʔ) | *ʔ(Onset/Word) | IDENT | *ʔ(Onset/σ)
     a. ta.ʔa                   |                | *     | *
     b. ʔa.ʔa                   | * W            | L     | ** W
     c. ta.ta                   |                | ** W  | L
Finally, the two rankings in (34c) both map input /ʔaʔa/ to output [ʔa.ʔa].
Tableau (37) shows that this mapping occurs under either ranking in which IDENT – which
favors this winner over both losers – outranks *ʔ(Onset/σ) and *ʔ(Onset/Word), as each
markedness constraint favors both losers over the winner.

(37) IDENT » *ʔ(Onset/σ), *ʔ(Onset/Word)

     /ʔaʔa/ (ʔ licensed everywhere) | *ʔ(Onset/Word) | IDENT | *ʔ(Onset/σ)
     a. ʔa.ʔa                       | *              |       | **
     b. ta.ʔa                       | L              | * W   | * L
     c. ta.ta                       | L              | ** W  | L
In both (35) and (37), the relative ranking of domain-specific marked onset
constraints does not affect the outcome. The desired mapping occurs whether *ʔ(Onset/σ)
dominates *ʔ(Onset/Word) or vice versa. This demonstrates that allowing these
constraints to be ranked freely with respect to each other predicts only the attested range
of typological possibilities. The results of this investigation of the relationship between
specific *ʔ(Onset/Word) and general *ʔ(Onset/σ) can be generalized to all marked onset
constraints: in languages where strict layering holds, neither fixed ranking nor stringent
definitions are necessary for domain-specific marked onset constraints.
Justification for the stronger claim that free ranking of domain-specific onset
markedness constraints is required to explain the full range of attested patterns of
domain-edge restrictions will be delayed until the discussion of prosodic structures which
violate strict layering, in section 2.5.1.3. In this situation, prosodic structures are no
longer implicational, and utterance-initial segments may be extrametrical rather than
word-initial (Nespor and Vogel, 1986; Selkirk, 1981, 1984). When this occurs, the
domain-edge restrictions may not be in an implicational relationship either: marked
onsets may be banned word-initially but tolerated utterance-initially. It will be shown
below that free ranking is necessary to account for these patterns.
2.3.5. *X(Onset/PCat) constraints are formally grounded
The *X(Onset/PCat) constraint schema accounts for the typological generalization that
any marked syllable onset segment can also be banned in the onset of larger prosodic
domains. According to the definitions of formal and functional grounding in chapter 1, if
the schema were functionally grounded, all learners of all languages would be able to
induce all *X(Onset/PCat) constraints from the phonetic properties of each marked onset
in each domain-initial position. In this case the schema would define constraints in terms
of phonetic, rather than formal, properties of segments and positions.
This section will argue that these constraints cannot be consistently induced from
learners’ experience. To show this, the phonetic properties of these marked onset
segments (particularly ʔ and h) will be reviewed, demonstrating that they are particularly
difficult to perceive utterance- and word-initially. The argument that the constraints are
not induced from these phonetic facts then emerges from a comparison with other
segments (e.g. retroflexes) which are perceptually weak in these same positions.
Retroflexes’ phonotactic distribution very closely reflects their perceptibility, while the
restrictions on marked onsets like ʔ and h have instead been generalized beyond only
those positions where they are difficult to perceive. Because marked onsets’ phonotactics
do not correlate with their phonetics, the *X(Onset/PCat) schema is formally grounded
and innate.
2.3.5.1. The phonetics of marked onsets
Many marked onset segments are generally perceptually weak, and appear to be
particularly difficult to perceive in some domain-initial onset positions. Their perceptual
salience therefore correlates with the phonological preferences expressed by some
*X(Onset/PCat) constraints. This correlation is, however, imperfect: it is not the case that
all marked onsets are perceptually difficult in all and only the positions targeted by these
constraints, as the following discussion of the glottal segments ʔ and h shows.
Perceptual cues for ʔ and h tend to be inherently weak, and these segments may
be particularly difficult to perceive word-initially. ʔ is rarely realized as a full glottal
closure; more often, “a very compressed form of creaky voice or some less extreme form
of stiff phonation may be superimposed on the vocalic stream.” (Ladefoged and
Maddieson, 1996: 75) The pronunciation of h also ranges from fairly strongly articulated
to extremely lenited, and in the latter case it is acoustically quite vowel-like. Further, as h
has no oral specifications of its own, the vocal tract has the shape of surrounding sounds
during its articulation, making h potentially extremely similar to its context (Keating,
1988; Pierrehumbert and Talkin, 1992). The similarity between ʔ and h and surrounding
segments makes them generally difficult to perceive. Further, as a major perceptual cue
for each is their interruption of modal voicing in e.g. intervocalic position, these segments
are most perceptually difficult in post-pausal utterance-initial or word-initial position.14
While glottals in domain-final positions also fail to interrupt voicing, the tendency for
utterance- and word-final glottalization bolsters glottals’ perceptibility in domain-final
codas, leaving them asymmetrically perceptually weak in onsets.
While it is beyond the scope of the present work to undertake an exhaustive
exploration of the phonetics of all marked domain-initial onsets, it is likely that ŋ and
high-sonority segments are, like the glottals, perceptually difficult in some or all domain-
initial onsets. For example, Smith (2002: 50-2) argues that syllables with high-sonority
onsets are less perceptually prominent than syllables with low-sonority onsets, because
the neural response of the auditory system is stronger in cases of strong acoustic contrast
(Delgutte, 1997).
The phonetic properties of ʔ and h correlate with the attested constraints against
these segments in utterance and word onsets. Constraints against syllable-onset glottals,
however, have no such correlation with phonetics. Constraints like *ʔ(Onset/σ) target
both intervocalic and postconsonantal ʔ onsets, but no unified perceptual context
characterizes these restrictions. Further, glottal onsets are phonologically dispreferred
relative to glottal codas, but no phonetic data correlates with this preference: no data
suggests that glottal onsets are less perceptible than glottal codas.
14 The perceptual salience of word-initial and utterance-initial h is complicated by domain-initial
strengthening processes (Byrd, 2000; Cho and Jun, 2000; Fougeron and Keating, 1996; Keating et al., 1999). Domain-initial strengthening occurs variably at the edges of words, phrases, and utterances, and tends to have a greater effect at the edges of larger prosodic domains. Pierrehumbert and Talkin (1992) show that h tends to be pronounced with greater strength and a longer VOT in word-initial and especially phrase-initial positions. This suggests a further mismatch between the perceptual and phonotactic properties of h: it is perceptually strongest in domain-initial positions, where it is phonologically dispreferred.
While some *X(Onset/PCat) constraints penalize glottal onsets in positions where
their perceptibility is compromised, the full set of these constraints generalizes beyond
their perceptual motivations. Glottals are perceptually difficult in some domain-initial
onsets, and *X(Onset/PCat) constraints penalize all domain-initial glottal onsets. The next section
will demonstrate that retroflexes are perceptually marked in the same utterance- and
word-initial positions; unlike glottals, however, retroflexes are phonologically banned in
exactly the positions where their perceptibility is compromised. In order to explain how
segments with comparable perceptual properties can have either of two phonotactic
patterns, I argue that constraints against retroflexes can be induced from phonetic data,
while *X(Onset/PCat) constraints against domain-initial glottals must instead be innate
and formally grounded.
2.3.5.2. Comparison: Phonetics and phonotactics of retroflexes
Retroflex segments like ʈ, ɖ, ɳ, and ɭ, like glottals, are difficult to perceive at the
beginnings of words and utterances. This is because the retroflex articulation is unique
(relative to other alveolars) at its closure, but not at its release, as follows. In pronouncing
a retroflex consonant, the tongue achieves its characteristic postalveolar contact when it
first makes contact with the roof of the mouth. This results in an acoustic distinction
between the onsets of retroflex and non-retroflex alveolars, realized primarily in low third
and fourth formants during the transitions from preceding vowels into retroflexes
(Stevens and Blumstein, 1975). During the retroflex closure, however, the tongue moves
forward such that at release it is in essentially the same position as the target for apico-
alveolar t, d, n, l, etc. (Butcher, 1993; Henderson, 1998). The transition from a retroflex
into a following vowel is essentially identical to the transition from an apico-alveolar into
a following vowel. Retroflexes are therefore distinguished primarily by their anticipatory
transitions.
These articulatory facts compromise retroflexes’ perceptibility word- and
utterance-initially. In both of these positions, the lack of a preceding vowel means that
retroflexes will not have audible third and fourth formants prior to their closure.15
Steriade (1999; 2001b) observes that phonotactic restrictions on retroflexes mirror these
perceptual facts. Retroflexes are banned word-initially in a number of Australian and
Dravidian languages, as shown in (38a). In the additional languages in (38b), a subset of
the retroflexes are banned initially: retroflex stops may occur in this position, while
retroflex sonorants may not.16 (38) a. No word-initial retroflexes
Tamil (Annamalai and Steever, 1998) Many Australian languages:
Alawa, Anindilyakwa, Arrernte, Bularnu, Dhuwaya, Diyari, Djambarrpuyungu, Djaru, Gaalpu, Gooniyandi, Guugu-Yimidhirr, Kalkatungu, Kayardild, Kitja, Kukatj, Lardil, Madhimadhi, Mangarrayi, Mantjiltjarra, Marra, Ngalakan, Miriwung, Muruwari, Ngandi, Ngawun, Nyigina, Nyungar, Pitta-Pitta, Ritharrngu, Tiwi, Walmatjarri, Wambaya, Warlmanpa, Warlpiri, Warluwarra, Warndarrang, Warumungu, Watjarri, Wergaia, Yankuntjatjarra, Yirr-Yorront (Hamilton, 1996: 215-6)
15 For the same reason, retroflexes are also difficult to perceive postconsonantally. This is also
reflected in their cross-linguistic phonotactics: they very rarely occur following non-retroflex consonants. Glottals presumably have similarly diminished postconsonantal (and preconsonantal) perceptibility. While this chapter has focused on onset restrictions on glottals, it does not exclude the likely possibility that languages may also impose further perceptually grounded restrictions on glottals parallel to those on postconsonantal retroflexes. Even if glottals could be banned in all of the perceptually motivated positions where retroflexes are banned, however, the existence of e.g. *ʔ(Onset/σ) must still be explained as below.
16 While retroflexes are doubtless particularly difficult to perceive in utterance-initial position, there are no reports of restrictions holding only in this position. This is most likely a consequence of the fact that phonotactic restrictions above the word level are rarely reported, as discussed in section 2.2.3.1.
b. No word-initial retroflex sonorants (initial ʈ, ɖ licensed)
   Kannada (Steever, 1998)
   Telugu (Krishnamurti, 1998)
   Koɳɖa (Krishnamurti and Benham, 1998)
   Gadaba (Bhaskararao, 1998)
A fundamental difference between the phonotactics of retroflexes and those of
glottals lies in the directness with which their perceptibility is reflected in their
phonology. Retroflexes can be banned only in positions where they are difficult to
perceive, due to a lack of anticipatory transitions: there are no languages in which
retroflexes are banned in all syllable onsets. Therefore, while restrictions on both
retroflexes and glottals appear to be ultimately motivated by issues of perceptibility,
retroflexes are banned in exactly those positions where they are difficult to perceive,
while glottals (and other marked onsets) are banned in all positions formally similar to
those where their perceptibility is compromised. The following section will argue that
this difference indicates that restrictions on retroflexes are functionally grounded and
induced by learners, while restrictions on glottals and other marked onsets are formally
grounded and innate.
2.3.5.3. *X(Onset/PCat) constraints are formally grounded
Chapter 1 argued that formal vs. functional grounding (and so innateness vs. induction)
should be considered from the perspective of a learner. A learner exposed to retroflex
consonants can observe that these segments are difficult to perceive word-initially (and
also utterance-initially, as well as postconsonantally), and so can map this perceptual
experience directly to the attested constraints. Because these constraints can be induced
directly from a learner’s experience, they are functionally grounded.
When a learner is exposed to ʔ, however, the learner will similarly observe that it
is difficult to perceive utterance-initially and word-initially. If this perceptual experience
52
were mapped directly to a set of perceptually grounded constraints, the learner would
crucially fail to induce a constraint against syllable-onset ʔ. *ʔ(Onset/σ) targets both
intervocalic and postconsonantal ʔ onsets, and the learner has no evidence for ʔ being
perceptually difficult intervocalically – in fact, this is the position in which ʔ is most
easily perceived.
So while the learner makes comparable observations about the positions where
retroflexes and ʔ are perceptually difficult, learners must end up with distinct sets of
constraints against these segments. Crucially, perceptual data offers learners no way of
determining that ʔ should be penalized in all onsets. Only if the *X(Onset/PCat) constraint
schema is innate – if it is formally, rather than functionally, grounded – will each
learner’s grammar contain the full set of attested constraints against domain-initial ʔ
onsets. This difference in constraint grounding accounts for the fact that retroflexes are
penalized only in positions where learners have direct knowledge of their diminished
perceptibility, while glottals are penalized in a set of positions which generalize beyond
their actual relative perceptibility.
This conclusion can be generalized to the claim that all domain-initial onset
constraints are formally grounded and innate. This follows from the argument in
chapter 1 that formal vs. functional grounding is a property of constraint schemata, rather
than individual constraints. If all constraints in a schema can be induced from learners’
experience, that schema is functionally grounded. Otherwise, if some or all constraints in
a schema cannot be consistently induced by learners, the schema itself – and so all
individual constraints defined by the schema – must instead be innate and formally
grounded. Because *ʔ(Onset/σ) cannot be induced from learners’ experience, the
53
*X(Onset/PCat) constraint schema, and all constraints defined by this schema, must be
innate and formally grounded.
Chapter 1 also proposed that formally grounded constraints are defined in terms
of formal elements, while functionally grounded constraints are defined instead in
phonetic and/or psycholinguistic terms. The segments and positions targeted by
*X(Onset/PCat) constraints further support the claim that these constraints are
formally grounded. The positions are defined in terms of the formal prosodic
hierarchy, and the segments fall into two formally definable classes. High-sonority
segments are, of course, defined by the formal sonority scale. ʔ, h, and ŋ can all be
considered placeless (de Lacy, 2002a: 278-9; Parker, 2001; Trigo, 1988, 1991), and so
this class of segments can also be formally defined; further, the onset restrictions against
ʔ, h, and ŋ could potentially be described as general HAVEPLACE(Onset/PCat) constraints,
as Parker proposes for Chamicuro.17 This mirrors the frequent use of CODACOND
constraints which prevent codas from independently licensing place features (Goldsmith,
1990; Ito, 1986, 1989; Lombardi, 2001; McCarthy and Prince, 1993b).
This section has shown that *X(Onset/PCat) constraints correlate closely, but not
perfectly, with patterns of perceptual difficulty. Chapter 1 discussed Parker’s similar
conclusions about the sonority scale, which correlates very closely but ultimately
imperfectly with segments’ relative intensities: some acoustic distinctions are not
reflected by sonority distinctions, and some sonority distinctions have no corresponding
acoustic distinctions (Parker, 2002). The sonority scale and the *X(Onset/PCat)
constraint schema therefore similarly reflect functional aspects of phonology without
17 A HAVEPLACE(Onset/PCat) analysis would need to account for the fact that languages can vary in which placeless segments they allow vs. ban word-initially. For example, Lower Grand Valley Dani allows h but bans ʔ and ŋ word-initially; Tümpisa Shoshone and Kapau allow word-initial ʔ and h but ban ŋ; many Australian languages allow word-initial ŋ but not ʔ (and don’t have h).
directly encoding them. In both of these cases, acoustic and perceptual tendencies have
been generalized, grammaticalized, and formalized. From the perspective of a learner,
these phonetic facts now appear to be represented as innate phonological primitives rather
than being literally induced from linguistic experience.
2.4. Generalized domain-edge markedness constraints
The previous sections have established that each constraint on a marked onset segment is
part of a set of *X(Onset/PCat) constraints against that segment in the initial onsets of all
prosodic domains. The ranking of these constraints with respect to general faithfulness
produces the attested typology of domain-specific restrictions on marked onsets; the
ranking of domain-specific marked onset constraints with respect to positional
faithfulness constraints (e.g. IDENT/σ1) also accurately predicts attested phonotactic
patterns. These constraints appear to be freely rankable, and their grounding is formal,
rather than functional, from the perspective of a learner.
Now that the properties of the *X(Onset/PCat) constraints are understood, we can
ask whether the *X(Onset/PCat) constraint schema can be further generalized, and so
whether there are additional constraints with similar formal properties. One possible
generalization of this schema would claim that any constraint on onsets can target the
onsets of all prosodic domains, rather than just constraints on marked onset segments.
This would predict that a constraint like ONSET, which requires syllables to have onsets,
could apply at any prosodic level.
Another possible generalization would capitalize on the formal parallel between
onsets and codas, and propose that any coda constraint can also target the final coda of
any prosodic domain. If this were true, constraints on codas like NOCODA (‘syllables may
not have codas’), *VOIOBSCODA (‘voiced obstruents may not appear in codas’),
*COMPLEXCODA (‘codas may not contain more than one segment’), and CODACOND
constraints (which impose place and/or manner restrictions on coda segments) would also
be able to target the final codas of words, phrases, utterances, and other prosodic
domains.
These predictions together comprise a strong hypothesis regarding the inventories
of markedness constraints on syllable edges: every markedness constraint which targets
onsets or codas is part of a formally grounded domain-edge markedness constraint
schema, composed of either MOnset(Onset/PCat) or MCoda(Coda/PCat) constraints as
defined in (39) and (41). These schemata, like the more limited *X(Onset/PCat) schema,
include individual constraints which impose restrictions on syllable constituents at the
relevant edge of each prosodic domain, as exemplified in (40) and (42). (39) MOnset(Onset/PCat) Where MOnset is some markedness constraint which targets
onsets, and PCat is some prosodic domain, assign one violation for each instance of PCat in which there is a violation of MOnset.
(40) MOns(Onset/Utt) MOns(Onset/Phr) MOns(Onset/Wd) MOns(Onset/σ) (41) MCoda(Coda/PCat) Where MCoda is some markedness constraint which targets
codas, and PCat is some prosodic domain, assign one violation for each instance of PCat in which there is a violation of MCoda.
(42) MCoda(Coda/Utt) MCoda(Coda/Phr) MCoda(Coda/Wd) MCoda(Coda/σ)
The remainder of this section will explore the specific predictions of this general
hypothesis for first onset constraints, then coda constraints, demonstrating that the
predicted phonotactic parallels are attested.
2.4.1. MOnset(Onset/PCat): Onset restrictions across prosodic domains
The generalized hypothesis above proposes that all markedness constraints which target
onsets are, like constraints against marked onset segments, part of constraint schemata
composed of parallel constraints on the onsets of each prosodic domain. This makes the
specific prediction that the constraint ONSET, which requires all syllables to have onsets
in languages like Cairene Arabic, Sedang, and Klamath (Blevins, 1995), is part of the
MOnset(Onset/PCat) constraint schema as in (43) and (44). This schema predicts that
words, utterances, and other prosodic domains can be required to have onsets.

(43) ONSET/PCat (EXIST(Onset/PCat))
     Where PCat is some prosodic domain, assign one violation for each instance of
     PCat which lacks an onset. ‘PCat must have an (initial) onset.’

(44) ONSET/Utterance, ONSET/Phrase, ONSET/Word, ONSET/σ
As Bell (1971), McCarthy (1998), and Smith (2002: 126-31), among others, have
observed, languages can require all word-initial syllables to have onsets while tolerating
onsetless syllables in word-medial position (that is, while tolerating medial hiatus). A
number of languages which require only word-initial onsets are listed in (45); this
requirement is enforced by the domain-edge markedness constraint ONSET/Word.

(45) ONSET/Word: Onsets are required of (all and only) word-initial syllables18
     Babungo (Schaub, 1985: 272)
     Bashkir (Poppe, 1962: 7)
     Bininj Gun-Wok (Evans, 2003: 94-5)
     Brahui (Elfenbein, 1998: 393)
     Camling (Ebert, 1997: 12)
     Doyayo (Wiering and Wiering, 1986)
     Guarani (Gregores and Suarez, 1967)
     Guhang Ifugao (Newell, 1956: 536)
18 Conversely, there are a number of languages in which marked onsets or onsetless syllables are
tolerated only word-initially (Beckman, 1999), or in which codas are tolerated only word-finally (see e.g. Broselow, 2003). These patterns are predicted to occur given positional faithfulness to word edges, as discussed in section 2.3.3.
     Gwandara (Matsushita, 1972)
     Heiltsuk (Rath, 1981)
     Hausa (Greenberg, 1941)
     Luiseno (Kroeber and Grace, 1960)
     Leti (Engelenhoven, 2004)
     Madi (Tucker, 1967)
     Mam (England, 1983)
     Manam (Lichtenberk, 1983)
     Mangap-Mbula (Bugenhagen, 1995: 76)
     Maricopa (Gordon, 1986)
     Mundang (Elders, 2000)
     Northern Arapaho (Salzmann, 1956: 51)
     Northwest River Montagnais (Clarke, 1982)
     Squamish (Kuipers, 1967)
     Tabukang Sangir (Maryott, 1961)
     Ulithian (Sohn, 1973: 39)
     Wiyot (Teeter, 1964)
     Woleaian (Sohn, 1975)
     Wolof (Ka, 1994)
     Yagua (Payne and Payne, 1990)
     See also many examples in Bell (1971: 36)
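The ONSET/PCat schema can be illustrated with a violation counter over a strictly layered parse (hypothetical representation: an utterance is a list of words, a word a list of syllables, and a syllable an (onset, rest) pair, with an empty onset string meaning onsetless).

```python
# Hypothetical sketch: counting ONSET/PCat violations over a strictly layered
# parse. An utterance is a list of words; a word, a list of syllables; a
# syllable, an (onset, rest) pair where an empty onset string means onsetless.

def onset_violations(utterance):
    v = {"ONSET/s": 0, "ONSET/Word": 0, "ONSET/Utt": 0}
    for wi, word in enumerate(utterance):
        for si, (onset, _rest) in enumerate(word):
            if onset == "":
                v["ONSET/s"] += 1
                if si == 0:                 # word-initial syllable
                    v["ONSET/Word"] += 1
                    if wi == 0:             # utterance-initial word
                        v["ONSET/Utt"] += 1
    return v

# [a.pa] [ta.a]: word-initial hiatus in word 1, medial hiatus in word 2
v = onset_violations([[("", "a"), ("p", "a")], [("t", "a"), ("", "a")]])
assert v == {"ONSET/s": 2, "ONSET/Word": 1, "ONSET/Utt": 1}
```

A language enforcing only ONSET/Word ignores the medial hiatus in word 2 here, while one enforcing only ONSET/Utterance penalizes just the first syllable of word 1.
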
The languages in (45) provide onsets to underlyingly vowel-initial words in
various ways. Most languages epenthesize ʔ before word-initial vowels; three other
processes are also attested, as in (46).19

(46) a. h epenthesis
        Yagua (Payne and Payne, 1990)
        Madi (Tucker, 1967: 107)
        Manam (Lichtenberk, 1983)
     b. Glide epenthesis
        Woleaian (Sohn, 1975: 33-4)
     c. Initial short vowel deletion
        Northwest River Montagnais (Clarke, 1982)
19 Those consonants epenthesized at the beginning of vowel-initial words – ʔ, h, and glides – can also be banned word-initially; they are both marked onsets and unmarked with respect to epenthesis. See Gouskova (2003: 191) for a similar observation about schwa: it is both marked, and thus prone to deletion, and also unmarked, and thus optimal for epenthesis.
In Madi, a variable process of h epenthesis may occur in underlyingly vowel-initial
words.

(47) Madi
     ja ~ hja ‘war’
     ini ~ hini ‘black’
In Woleaian, by contrast, glides are epenthesized before underlying word-initial vowels.
Epenthetic initial glides agree in rounding with following vowels: unrounded vowels are
preceded by j– and rounded vowels are preceded by w–, as shown in (48a).20 Only initial
syllables are required to have onsets; sequences of vowels are permitted word-medially as
in (48b). (48) Woleaian a. /oro-oro/ [worooro] ‘fence’
/epe-epe/ [jepeepe] ‘lee platform of canoe’ /aremata/ [jaremate] ‘person’
b. temwaaiu ‘sickness’ meloufeiu ‘a part of a men’s house’
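The glide-quality generalization in (48a) can be sketched as a rule (a deliberate simplification: it ignores the bare-i/u exceptions noted in footnote 20 and the unrelated final-vowel change visible in [jaremate]).

```python
# Sketch of the Woleaian pattern in (48a), simplified: epenthesize w- before
# rounded word-initial vowels, j- before unrounded ones, and leave
# consonant-initial words and medial vowel sequences alone. Ignores the bare
# i/u exceptions of footnote 20 and final-vowel changes.

VOWELS = set("aeiou")
ROUNDED = set("ou")

def epenthesize_glide(word):
    if word and word[0] in VOWELS:
        return ("w" if word[0] in ROUNDED else "j") + word
    return word

assert epenthesize_glide("orooro") == "worooro"     # cf. /oro-oro/ 'fence'
assert epenthesize_glide("epeepe") == "jepeepe"     # cf. /epe-epe/
assert epenthesize_glide("temwaaiu") == "temwaaiu"  # consonant-initial: no change
```
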
Moving to the utterance level, languages may require only utterance-initial
syllables to have onsets, while tolerating hiatus within words and across word boundaries.
Languages in which this restriction is enforced by the constraint ONSET/Utterance are
listed in (49).
20 Woleaian i and u can surface word-initially, rather than with initial glides ji and wu, and are thus
exceptions to the ban on initial vowels. There are a number of languages which ban initial ji and wu sequences, e.g. Yana (Sapir and Swadesh, 1960) and Duunidjawu (Kite and Wurm, 2004). The exceptional licensing of these initial vowels in Woleaian thus results from constraints banning initial ji and wu which dominate ONSET/Word.
(49) ONSET/Utterance: Onsets are required of (all and only) utterance-initial syllables
     Anejom (Lynch, 2000)
     Hawaiian (Elbert and Pukui, 1979: 10)
     Koya (Tyler, 1969)
     Kunjen (Sommer, 1969: 28)
     Lango (Noonan, 1992)
     Menomini (Bloomfield, 1962: 3)
     Sanuma (Borgman, 1990: 223)
     Selayarese (Mithun and Basri, 1986: 242)
     Tuvalu (Milner, 1958: 370)
As was the case for ONSET/Word, potential violations of ONSET/Utterance can
also be avoided in a variety of ways. Glottal stop epenthesis is common. In Selayarese,
ʔ is epenthesized before vowel-initial words only when they occur in isolation or otherwise
in utterance-initial position (Mithun and Basri, 1986: 242).

(50) Selayarese
     ʔa:pa ‘what?’
     ʔinn ‘this’
     ʔa:pa inn, *ʔa:pa ʔinn ‘what is this?’
A glottal stop is similarly epenthesized before utterance-initial vowels in Hawaiian, as
described by Elbert and Pukui (where ‘ represents glottal stop):
[Glottal stop] is always heard before utterance-initial a, e, and i, but this is not considered significant because its occurrence in this position is predictable. A Hawaiian greets a friend ‘Aloha, but if he uses this word within a sentence the glottal stop is no longer heard: ua aloha ‘[he] did [or does] have compassion’. (p. 10)
Languages like Menomini satisfy ONSET/Utterance via epenthesis of h, while Koya
epenthesizes an initial glide which is homorganic with the underlyingly initial vowel. In
Kunjen, utterance-initial vowels are deleted.
2.4.2. MCoda(Coda/PCat): Coda restrictions across prosodic domains
The general hypothesis described above predicts that all markedness constraints targeting
syllable codas are also part of MCoda(Coda/PCat) constraint schemata which can impose
parallel restrictions on final codas of all prosodic domains.
(51) MCoda(Coda/PCat) Where MCoda is some markedness constraint which targets
codas, and PCat is some prosodic domain, assign one violation for each instance of PCat in which there is a violation of MCoda.
(52) MCoda(Coda/Utt) MCoda(Coda/Phr) MCoda(Coda/Wd) MCoda(Coda/σ)
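As a computational gloss (not part of the original analysis), the schema in (51) can be read as a function from a base coda-markedness condition to one constraint per prosodic domain, as in (52). The Python sketch below assumes a flat, pre-computed representation of each candidate's final codas per domain; all names are illustrative.

```python
# Sketch: instantiating an MCoda(Coda/PCat) schema as in (51)-(52).
# A "constraint" here is a function from a candidate to a violation
# count; the schema maps one base coda-markedness check to a family of
# domain-indexed constraints. Representation and names are illustrative.

DOMAINS = ["utterance", "phrase", "word", "syllable"]

def instantiate_schema(m_coda):
    """Return {domain: constraint}: one violation per instance of the
    domain whose (final) coda violates the base condition m_coda."""
    def make(domain):
        def constraint(candidate):
            # candidate[domain] lists the final coda of each instance
            # of that domain occurring in the candidate
            return sum(1 for coda in candidate[domain] if m_coda(coda))
        return constraint
    return {d: make(d) for d in DOMAINS}

# Base constraint: NOCODA penalizes any non-empty coda
no_coda = lambda coda: len(coda) > 0

family = instantiate_schema(no_coda)  # NOCODA/Utt, /Phr, /Wd, /sigma

# A toy one-word utterance [u.ak]: word-final coda k
cand = {"utterance": ["k"], "phrase": ["k"],
        "word": ["k"], "syllable": ["", "k"]}
print(family["word"](cand))      # word-final coda -> 1 violation
print(family["syllable"](cand))  # one of two syllables has a coda -> 1
```

On this reading, the four constraints in (52) differ only in which domain's final codas they inspect; everything else is fixed by the schema.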
This makes specific predictions about a number of known constraints on syllable codas,
including NOCODA, *VOIOBSCODA, *COMPLEXCODA, and CODACOND.
Before these specific predictions may be explored, the notion of ‘coda of a
prosodic domain’ (Coda/PCat) must be explicitly defined so that these positions can be
discussed in a unified way. This definition is very similar to the definition of the onset of
a prosodic domain. Intuitively, the coda of some prosodic domain is the coda of the final
syllable in that prosodic domain. This definition is formalized in (53).

(53) Coda/PCat   The (final) coda of PCat. Where PCat is some prosodic domain (e.g. syllable, word, phrase, utterance): all consonants in PCat which belong to the rightmost syllable of PCat and which follow that syllable’s head.
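The definition in (53) is effectively an extraction procedure: find the rightmost syllable of the domain, then take everything after its head. A minimal Python sketch (the representation — nested lists bottoming out in syllable strings — and the simplified vowel inventory are illustrative assumptions, not from the dissertation):

```python
# Sketch of definition (53): the coda of a prosodic domain is the
# material of its rightmost syllable that follows that syllable's head
# (here simplified to its last vowel). Illustrative representation.

VOWELS = set("aeiou")

def syllable_coda(syll):
    """Consonants after the last vowel (the syllable head)."""
    for i in range(len(syll) - 1, -1, -1):
        if syll[i] in VOWELS:
            return syll[i + 1:]
    return syll  # vowelless syllable: treat everything as coda

def coda_of(domain):
    """Coda/PCat: the coda of the rightmost syllable in the domain.
    A domain is a (possibly nested) list bottoming out in syllables."""
    rightmost = domain
    while isinstance(rightmost, list):
        rightmost = rightmost[-1]
    return syllable_coda(rightmost)

word = ["tim", "ber"]                # a word as a list of syllables
phrase = [["tim", "ber"], ["wolf"]]  # a phrase as a list of words
print(coda_of(word))    # "r"
print(coda_of(phrase))  # "lf"
```

Because the same recursion applies at every level, a single definition yields Coda/σ, Coda/Word, Coda/Phrase, and Coda/Utterance, which is exactly the unification the text aims at.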
The following sections will survey a range of known constraints on syllable codas and
explore the specific predictions that codas of higher prosodic domains be subject to
parallel requirements.
2.4.2.1. MCoda(Coda/σ): Syllable coda restrictions
Languages can ban codas in all syllables; Mazateco (Pike and Pike, 1947) and Hua and
Cayuvava (Blevins, 1995) are a few of the many languages with this restriction. The
MCoda(Coda/PCat) proposal predicts that this requirement is enforced by NOCODA/σ,
which is a part of the NOCODA/PCat constraint schema defined in (54). As this schema
includes constraints which target prosodic domains above the syllable level, there should
be languages where final codas are banned in only words, phrases, or utterances; these
predictions will be discussed below.

(54) NOCODA/PCat (NO(Coda/PCat))   Where PCat is some prosodic domain, assign one violation for each instance of PCat which has a coda. ‘PCat cannot have a (final) coda.’

(55) NOCODA/Utterance   NOCODA/Phrase   NOCODA/Word   NOCODA/σ
Additional known constraints on syllable codas ban particular segments or
structures which are marked in coda position. One such class of marked coda segments
are the voiced obstruents. In languages including German (Mascaró and Wetzels, 2001)
and Malayu Ambong (van Minde, 1997), any underlyingly voiced obstruents which
surface in coda position are devoiced. Languages can also license fewer place or manner
contrasts in codas than in onsets. For example, Japanese codas are restricted to nasals
(which must be homorganic with following onsets) and the initial moras of geminate
consonants. These coda conditions are typically expressed in OT via cover constraints
like CODACOND (Ito and Mester, 1994; McCarthy and Prince, 1993b). Finally, languages
can restrict the size of codas to a single segment, banning complex codas.
Under the hypothesis that all syllable coda constraints are part of
MCoda(Coda/PCat) constraint schemata, all of these constraints against marked codas are
instantiations of *X(Coda/PCat) constraints as defined in (56). These constraints are
formally parallel to the *X(Onset/PCat) constraints against marked onsets discussed
above. The specific constraint schemata which by hypothesis account for the syllable
coda restrictions discussed here are given in (57)–(59).
(56) *X(Coda/PCat) Where X is some segment or (set of) feature(s) and PCat is some
prosodic domain, assign one violation for each instance of X in a coda of PCat.
‘X cannot be the (final) coda of PCat.’
(57) *VOIOBS(Co/Utt) *VOIOBS(Co/Phr) *VOIOBS(Co/Wd) *VOIOBS(Co/σ)
(58) CODACOND/Utt   CODACOND/Phr   CODACOND/Wd   CODACOND/σ

(59) *COMPLEX(Co/Utt)   *COMPLEX(Co/Phr)   *COMPLEX(Co/Wd)   *COMPLEX(Co/σ)
2.4.2.2. MCoda(Coda/Word): Word-final coda restrictions
All of the restrictions on syllable codas discussed in the previous section have parallels at
the word level. Broselow (2003) and Wiltshire (2003) discuss languages which allow
medial codas to surface freely, licensing their own place and voicing features, but ban
codas only in word-final syllables. Some representative languages exhibiting these
word-level NOCODA effects are listed in (60).

(60) NOCODA/Word: Codas are banned in (all and only) word-final syllables
     Chamicuro (Parker, 2001: 365-6)
     Italian, Telugu (Harris and Gussmann, 1998)
     Many Australian languages (Dixon, 2002: 644-8; Hamilton, 1996: 228)
In Chamicuro, for example, when a prosodic word is underlyingly consonant-
final, –i is epenthesized at the end of the word. This is shown in (61), where /ak/ is the
root ‘dance’. When this root surfaces without suffixes, as in (61a), the word is vowel-
final. The final –i is epenthetic rather than underlying, however, as it does not surface
when the root is followed by a vowel-final suffix, as in (61b) (where a consonant deletes to avoid a complex coda).
(61) Chamicuro
     a. /u-ak/ [u.a.ki] ‘I dance’
        1SG-dance
     b. /i-ak-kana/ [i.ak.ka.na] *[i.a.ki.ka.na] ‘they dance’
        3-dance-PL
Other consonant-final Chamicuro words also end in epenthetic –i when they appear
without suffixes, e.g. /timil/ [timili] ‘wind (N)’, /ahkot/ [akoti] ‘house’.
This sort of pattern, in which codas are avoided only at word edges, can be
accounted for by the constraint ranking in (62). NOCODA/Word crucially dominates DEP,
allowing epenthesis in order to avoid word-final codas. DEP itself dominates NOCODA/σ,
and so medial codas are realized faithfully.

(62) Chamicuro: Codas banned only word-finally
        /u-ak/          NOCODA/Word   DEP    NOCODA/σ
    ☞ a. u.a.ki                         *
      b. u.ak               *!                   *
      c. u.a.i.ki                      **!
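The evaluation in (62) can be checked mechanically: under strict domination, candidates are compared on their violation vectors ordered by the ranking. A minimal sketch, with the violation counts entered by hand from the tableau (candidate spellings and names are illustrative):

```python
# Sketch: evaluating tableau (62) under the ranking
# NOCODA/Word >> DEP >> NOCODA/sigma. Strict domination reduces to
# lexicographic comparison of ranked violation vectors.

RANKING = ["NOCODA/Word", "DEP", "NOCODA/syll"]

CANDIDATES = {
    "u.a.ki":   {"NOCODA/Word": 0, "DEP": 1, "NOCODA/syll": 0},
    "u.ak":     {"NOCODA/Word": 1, "DEP": 0, "NOCODA/syll": 1},
    "u.a.i.ki": {"NOCODA/Word": 0, "DEP": 2, "NOCODA/syll": 0},
}

def optimal(candidates, ranking):
    """Standard OT evaluation: minimize the ranked violation vector."""
    return min(candidates,
               key=lambda c: [candidates[c][k] for k in ranking])

print(optimal(CANDIDATES, RANKING))  # u.a.ki: epenthesis beats a coda
```

Reranking DEP above NOCODA/Word in the same sketch instead selects the faithful u.ak, i.e. a language that tolerates word-final codas — the factorial-typology point the surrounding discussion relies on.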
Throughout the following discussion of coda restrictions at different prosodic levels, the
emergence of these restrictions from constraint rankings will be illustrated through
examples using NOCODA/PCat constraints.
Marked voiced obstruent codas can also be tolerated word-internally but avoided
in word-final syllables, due to *VOIOBS(Coda/Word). In Russian, for example, voiced
obstruents are devoiced word-finally, as in kniga ‘book (nom. sg.)’ versus knik ‘book
(gen. pl.)’ and kluba ‘club (gen. sg.)’ versus klup ‘club (nom. sg.)’. This phenomenon is
discussed in detail by Mascaró and Wetzels (2001).
(63) *VOIOBS(Coda/Wd): Voiced obstruent codas banned in word-final syllables
     Polish, Walloon (Mascaró and Wetzels, 2001)
     Russian (Halle, 1959; Mascaró and Wetzels, 2001)
     Mideastern (Polish) Yiddish (Katz, 1987: 39; Mascaró and Wetzels, 2001)
CODACOND-type restrictions can hold in only word-final codas in Garawa (Furby,
1974; Hamilton, 1996: 257). Here, only four consonants may occur in word-final
syllables: n, l, , and . Many more consonants are freely licensed in medial codas;
heterorganic clusters in which the medial coda consonants , c, n, , , l, , and occur are
listed in (64).

(64) Garawa medial clusters
     Onsets:  p  c  k  m  w
     Codas:
        .p  .c
     c: c.p
     n: n.p  n.k  n.
        .p  .c  .k  .m  .
        .p  .k  .m
     l: l.p  l.k  l.m  l.  l.w
        .p  .k  .w
        .p  .k  .m  .  .w
Garawa’s word-final codas ban low-sonority stops (*, *c) and also fail to license the full
set of place contrasts in nasals which are licensed word-medially (n, *, *).
Finally, complex codas are licensed medially but banned word-finally in
Dongolese Nubian. While medial codas may be composed of two consonants as in (65a),
word-final codas must be simple (Armbruster, 1960: 43, 48-9). Underlying word-final
codas are simplified via epenthesis of I, as shown in (65b).
(65) Dongolese Nubian
     a. Medial complex codas
        mat.bahn.tu:r ‘inside the kitchen’
        di gm.ba:.dIr ‘after five’
        wln.di ‘canine’
     b. Final simple codas
        /gi ns/ [gi n.sI] ~ [gi .nIs] ‘sort, kind’
        /br-k/ [br.kI] ‘(the) wood (obj.)’
        /to:g-n/ [to:.gIn] ‘she strikes’
2.4.2.3. MCoda(Coda/Phrase): Phrase-final coda restrictions
Restrictions parallel to those on syllable and word codas can also target phrase codas.
Wiltshire (2003: 258-60) observes that a ban on phrase-final codas in Leti is similar to
more common bans on word-final codas. In Leti, codas are licensed word-medially and
word-finally, while being banned only at the ends of phonological phrases
(Engelenhoven, 2004; Hume, 1998). Consonants at the ends of phonological phrases
(which are described by Hume as being roughly equivalent to major syntactic XPs)
metathesize with preceding vowels. Syllables and words may thus end in consonants, but
phrases may not; examples of this are shown in (66).

(66) Leti
     Non-phrase-final C#   /urun ma/      [urun ma]      ‘Moanese breadfruit’
     Phrase-final C#       /urun/         [urnu]         ‘beautiful’
     Non-phrase-final C#   /msar lavna/   [msar lavna]   ‘teacher, big’
     Phrase-final C#       /msar/         [msra]         ‘teacher’
This sort of pattern, where marked structures are banned only at phrase edges,
emerges from OT constraint rankings as in (67). NOCODA/Phrase » LINEARITY allows
metathesis in order to avoid phrase-final codas (as in (67a)), but LINEARITY »
NOCODA/Word, NOCODA/σ prevents metathesis from similarly avoiding word-final or
syllable-final codas when they are not also phrase-final (as in (67b)).
(67) Leti: Codas banned only phrase-finally
a.      msar ]Phr           NOCODA/Phr   LINEARITY   NOCODA/Wd   NOCODA/σ
    ☞ a. ms.ra ]Phr                          *                       *
      b. m.sar ]Phr             *!                       *           *

b.      msar … ]Phr         NOCODA/Phr   LINEARITY   NOCODA/Wd   NOCODA/σ
      a. ms.ra … ]Phr                        *!                      **
    ☞ b. m.sar … ]Phr                                    *           **
Marked coda segments may also be avoided only at the ends of phrases. In the
variety of Yiddish described by Birnbaum (1979: 211), coda devoicing occurs only
phrase-finally. Voiced obstruents are devoiced when they are “followed by a break in
speaking, even a short one, and, of course, at the end of a sentence”, as illustrated in (68).
Note that underlyingly voiced obstruents which are word-final but not phrase final, like
the z in [er iz miit, bin ex…], remain voiced.

(68) Yiddish
     /my meig, ober…/        [my meik, ober…]        ‘one may – but…’
     /zaan vaab, demlt…/     [zaan vaap, demlt…]     ‘his wife, at that time…’
     /er iz miid, bin ex…/   [er iz miit, bin ex…]   ‘he is tired, so I…’
     /di maaz, er vet…/      [di maas, er vet…]      ‘the mice, he will…’
Finally, CODACOND restrictions can apply only to phrasal codas as well. In
Koromfe, “[p]hrase-medially, word-final consonants can occur freely; in phrase-final
position only the consonants [m, n, , l] are permitted; after all other consonants an
‘epenthetic’ vowel must be ‘inserted’.” (Rennison, 1997: 422) This restriction is similar
to the syllable coda place and manner restrictions found in Japanese, as well as the word-final restrictions described above in Garawa, and is imposed by CODACOND/Phrase.21
2.4.2.4. MCoda(Coda/Utterance): Utterance-final coda restrictions
Finally, languages may ban codas in only utterance-final syllables. In the languages in
(69), codas may occur word-medially, and also word-finally except in utterance-final
words.

(69) NOCODA/Utterance: No utterance-final syllables have codas
     Arrernte (Tabain et al., 2004: 178)
     Sardinian (Ferrer, 1994: 43)
     Western Shoshoni (Crum and Dayley, 1993: 235, 248)
Utterance-final consonants are followed by epenthetic vowels in Arrernte and in
Sardinian as in (70), where each word is given with its utterance-final pronunciation.

(70) Sardinian
     medas [maza]    sun [suni]
     fit [fii]       fut [fui]
In Western Shoshoni, utterance-final codas are deleted.
This avoidance of codas (and other marked structures) only at utterance edges
follows from rankings like NOCODA/Utterance » DEP » NOCODA/Phrase, NOCODA/Word,
NOCODA/σ, as in (71). In (71a), the input word is utterance-final and also phrase-final; in
(71b) it is phrase-final but not utterance-final.

(71) Sardinian: Codas banned only utterance-finally
a.      mdas ]Phr ]Utt         NOCODA/Utt   DEP   NOCODA/Phr   NOCODA/Wd   NOCODA/σ
    ☞ a. m.a.za ]Phr ]Utt                    *
      b. m.as ]Phr ]Utt            *!               *            *           *

b.      mdas ]Phr … ]Utt       NOCODA/Utt   DEP   NOCODA/Phr   NOCODA/Wd   NOCODA/σ
      a. m.a.za ]Phr … ]Utt                  *!
    ☞ b. m.as ]Phr … ]Utt                           *            *           *

21 As Rennison provides no examples of these restrictions, it is not entirely certain that he uses ‘phrase’ to refer to a prosodic unit smaller than the utterance; this may instead be an instance of CODACOND/Utterance.
CODACOND restrictions may also hold utterance-finally, as in Pintupi:
The consonants [n, ɲ, ɳ, l, ʎ, ɭ] may occur in word-final positions while utterance-medial.…Except for…two morphemes ending in [n], no consonant is permitted to occur pre-pause. Therefore the juncture syllable [–pa] is added to any stem which could otherwise occur final in the utterance. (Hansen and Hansen, 1978: 39-40)
As noted in section 2.2.3.1 above, because most language descriptions focus on
word phonology there are extremely few reported cases of phrase- and utterance-level
phonotactic restrictions. The languages discussed here and in the previous section
demonstrate that NOCODA, *VOIOBSCODA, and CODACOND can target only phrase-final
syllables (via NOCODA/Phrase, *VOIOBS(Coda/Phrase), and CODACOND(Coda/Phrase));
NOCODA and CODACOND can also target utterance-final codas (via NOCODA/Utterance
and CODACOND/Utterance). This analysis predicts that further investigation of phrase-
and utterance-level phonology would reveal additional coda restrictions at these prosodic
levels.
2.4.3. Summary of the argument
The preceding discussion has shown that any restriction which can hold on syllable
onsets or codas can also hold of (initial) onsets or (final) codas of larger prosodic
domains. These correspondences among phonotactic restrictions across domains are
summarized in the elaborated table in (72).
(72) Attested phonotactic restrictions on prosodic domains

     RESTRICTION     Syllable         Word                          Phrase     Utterance
     */ONSET         Mongolian        West Greenlandic                         Kunwinjku
     */ONSET         Balantak         Nahuatl                                  Kaiwa
     *h/ONSET        Chamicuro        Carib                                    Tucano
     *Glide/ONSET    child language   Sestu Campidanian Sardinian
     ONSET           Klamath          Wiyot                                    Selayarese
     NOCODA          Hua              Italian                       Leti       Sardinian
     *VOIOBSCODA     German           Russian                       Yiddish
     *Complex/CODA   Sedang           Dongolese Nubian
     CODACOND        Japanese         Garawa                        Koromfe    Pintupi
These restrictions are therefore general prosodic domain phenomena, rather than
strictly syllable phenomena: onset restrictions can apply to initial onsets in any domain,
and coda restrictions can apply to final codas in any domain.
This generalization has led to a redefinition of ‘onset’ and ‘coda’, as in (73), such
that all prosodic domains – utterances, phrases, and words, as well as syllables – now
have onsets and codas. This allows for a simpler typological statement: onset restrictions
can apply to onsets of any domain, and likewise for codas.

(73) a. Onset/PCat   The onset of PCat, where PCat is some prosodic domain (e.g. syllable, word, phrase, utterance): all consonants in PCat which belong to the leftmost syllable of PCat and which precede that syllable’s head.
     b. Coda/PCat   The coda of PCat, where PCat is some prosodic domain (e.g. syllable, word, phrase, utterance): all consonants in PCat which belong to the rightmost syllable of PCat and which follow that syllable’s head.
The observed correspondences among domain-edge restrictions provide evidence
for the proposal that all prosodic restrictions on onsets and codas are enforced by
constraints belonging to a MOnset(Onset/PCat) or MCoda(Coda/PCat) constraint schema, as defined in (74) and exemplified in (75).

(74) a. MOnset(Onset/PCat)   Where MOnset is some markedness constraint which targets onsets, and PCat is some prosodic domain, assign one violation for each instance of PCat in which there is a violation of MOnset.
     b. MCoda(Coda/PCat)   Where MCoda is some markedness constraint which targets codas, and PCat is some prosodic domain, assign one violation for each instance of PCat in which there is a violation of MCoda.
(75) a. MOns(Onset/Utt) MOns(Onset/Phr) MOns(Onset/Wd) MOns(Onset/σ)
b. MCoda(Coda/Utt) MCoda(Coda/Phr) MCoda(Coda/Wd) MCoda(Coda/σ)
The argument that the attested restrictions should be unified via the
MOnset(Onset/PCat) and MCoda(Coda/PCat) schemata has strong empirical support.
Despite the relative scarcity of reported phonotactic restrictions on the edges of prosodic
domains above the word (as discussed in section 2.2.3.1), most of the logically possible
constraints predicted by these schemata are attested, as shown in (72) above. Further, the
factorial typology predicted by the free ranking of these constraints with respect to
faithfulness constraints also makes accurate predictions, as discussed in sections 2.3.2
and 2.3.3.
The discovery of these constraint schemata enriches our understanding of the
nature of both prosodic domains in general and also of OT’s constraint inventory CON.
Specifically, this provides evidence for a great deal of structure within CON: CON
cannot contain arbitrary sets of constraints targeting the edges of different prosodic
domains; instead, these positions are subject to parallel sets of markedness constraints.
Section 2.3.5 showed that there are no consistent correlations between marked
onsets’ phonetic properties and the prosodic positions in which they are dispreferred,
indicating that *X(Onset/PCat) constraints are formally grounded. Similarly, there is no
consistent correlation between the phonetic properties of the additional marked onset and
coda segments and structures and the positions where the phonological restrictions can be
imposed, as described here. Segments can be banned in prosodic positions where their
phonetic properties are not particularly compromised. For example, voiced obstruents can
be banned in utterance-final, phrase-final, word-final, or syllable-final codas. These
segments are somewhat articulatorily difficult in utterance-final position; however, at the
other end of the prosodic hierarchy, there is no comparable difficulty for word-medial
voiced obstruent codas. Learners would not have sufficient information to induce each of
the constraints in these domain-edge markedness schemata. Therefore the general
MOnset(Onset/PCat) and MCoda(Coda/PCat) constraint schemata are not functionally
grounded but rather formally grounded and innate.
2.5. Domain-edge markedness constraints and strict layering
Having motivated the existence of formally grounded domain-edge markedness
constraints through the cross-linguistic phonotactic parallels among edges of prosodic
domains, the predicted phonological consequences of these constraints can be explored
more deeply. One of their defining properties is that they are crucially sensitive to details
of prosodic structure. That is, the assessment of ONSET/Word violations incurred by a
form depends on the precise location of prosodic word edges in that form. Thus far, the
discussion has focused on cases in which output prosodic word edges fall at the edges of
underlying lexical words, and so where onsetless words epenthesize a consonant or delete
a vowel at the left edge of the prosodic word in order to satisfy ONSET/Word. These are
cases in which prosodic strict layering is obeyed and thus in which all prosodic structures
are as in (76), where all segments are syllabified, all syllables are in prosodic words, all
prosodic words are in prosodic phrases, etc. (Nespor and Vogel, 1986; Selkirk, 1981,
1984).

(76) [Utt [Phr [Wd [σ x x] [σ x x]] [Wd [σ x x] [σ x x]]] [Phr [Wd [σ x x] [σ x x]] [Wd [σ x x] [σ x x]]]]
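Strict layering as in (76) amounts to a recursive well-formedness check: every constituent at level n contains only constituents of level n+1. A minimal Python sketch (the labeled-tree encoding is an illustrative assumption):

```python
# Sketch: checking prosodic strict layering as in (76). A node is a
# (label, children) pair; segments are leaves. Encoding is illustrative.

LEVELS = ["utt", "phr", "wd", "syll", "seg"]

def strictly_layered(node):
    """True iff every child of a level-n node is at level n+1."""
    label, children = node
    n = LEVELS.index(label)
    if label == "seg":
        return children == []
    return all(LEVELS.index(c[0]) == n + 1 and strictly_layered(c)
               for c in children)

x = ("seg", [])
syll = ("syll", [x, x])
wd = ("wd", [syll, syll])

# a fully layered (76)-style structure
utt_ok = ("utt", [("phr", [wd, wd])])
# a syllable attached directly to the phrase, skipping the word level
utt_bad = ("utt", [("phr", [syll, wd])])

print(strictly_layered(utt_ok))   # True
print(strictly_layered(utt_bad))  # False
```

The `utt_bad` structure is exactly the kind of level-skipping configuration that the following sections show to be possible when domain-edge markedness constraints outrank strict layering.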
In structures like (76), an utterance-initial segment is also always phrase-initial,
word-initial, and syllable-initial. Considering only prosodic structures of this sort has
allowed us to assume the within-language implication that any language which bans some
marked structure word-initially also bans it utterance-initially (as is typically the case). As every
utterance-initial segment is also word-initial, segments licensed utterance-initially must also be licensed word-initially.
While strict layering is typically observed, it by no means always holds. Syllables
can be attached directly to prosodic words rather than to feet, and clitics and function
words can be attached directly to phrases rather than to prosodic words (Ito and Mester,
2003; Selkirk, 1995). For a complete understanding of the typology of domain-edge
markedness constraints, it is important to consider their effects in prosodic structures in
which strict layering is not observed.
The positions of prosodic domain edges, including word edges, have been argued
to follow from constraint interaction (Selkirk, 1995; Truckenbrodt, 1999). The location of
these domain edges is thus variable within and across languages. Because of this and the
fact that not all segments are necessarily inside prosodic words, complex interactions
between segmental and prosodic structures are predicted to result from the interaction of
domain-edge markedness constraints and constraints enforcing strict layering. Two such
predictions will be discussed in detail in the following sections. First, it will be shown
that faithful realization of marked structures can force outputs to have prosodic structures
that violate strict layering. An example of this is found in Banawa, where underlyingly
word-initial onsetless syllables surface outside prosodic words. Second, structures which
are banned at domain edges can be tolerated when they appear, for independent reasons,
in extraprosodic positions; an example of this is found in Tzutujil, where extraprosodic
proclitics can lack initial onsets despite the fact that lexical (and prosodic) words must
have initial onsets.
2.5.1. Marked structures become extraprosodic: Banawa stress
Because domain-edge markedness constraints are positional markedness constraints, two
types of repair are possible when one of these constraints is potentially violated. As
discussed above, the marked structure itself can be repaired. For example, a violation of
ONSET/Word can be avoided by epenthesizing a consonant before a word-initial vowel, or
by deleting the word-initial vowel entirely. The second type of repair moves the marked
structure out of a word onset position; this is predicted by OT rankings like the one in the
hypothetical tableau in (77), where ONSET/Word and faithfulness constraints dominate
the constraint enforcing prosodic strict layering.22
22 Here, a cover constraint simply called STRICTLAYER is used; see e.g. Selkirk (1995) for specific
constraints that have been proposed to enforce strict layering.
(77) Onsetless initial syllables become extraprosodic
        /V.CV.CV/            FAITH   ONSET/Word   STRICTLAYER
    ☞ a. V [Wd CV.CV ]                                 *
      b. [Wd V.CV.CV ]                   *!
      c. [Wd CV.CV.CV ]        *!
      d. [Wd CV.CV ]           *!
This ranking predicts that it should also be possible to avoid an ONSET/Word
violation by moving a (fully faithful) onsetless syllable to a position outside the prosodic
word, where it no longer violates the domain-edge markedness constraint. In this type of
‘prosodic’ repair, the presence of a marked structure in the input is predicted to be able to
cause violations of strict layering. The winning structure in (77), with an extraprosodic
initial vowel, is very similar to structures proposed by Spring (1990) to account for the
fact that onsetless initial syllables in Axininca Campa do not participate in reduplication.
Downing (1998) also proposes similar structures for a number of languages in which
onsetless word-initial syllables fail to bear stress or high tone, or to participate in
reduplication. The present analysis builds on the central insight of Spring’s and
Downing’s proposals – that initial onsetless syllables can be extraprosodic and thus
exceptional.23
In this manner, ONSET/Word induces onsetless initial syllables to surface outside
of prosodic words in Banawa (Buller et al., 1993; Downing, 1998; Everett, 1990).
Banawa prosodic words must have initial onsets; onsetless (underlyingly) word-initial
syllables surface outside the prosodic word. The extraprosodic position of these marked
initial onsetless syllables is indicated by the fact that they cannot be stressed, unlike all
23 See Downing (1998) for arguments against Spring’s derivational analysis, and Smith (2002:
104-5) for arguments against Downing’s constraint-conjunction approach.
other (prosodic word) initial syllables. After an analysis of Banawa based on
ONSET/Word is presented, an alternative analysis based on a requirement that stressed
syllables have onsets will be shown to be insufficient to account for the data. Further
details of the Banawa data will then be used to show that the ONSET/PCat constraints
must be freely rankable, as proposed in section 2.3.4.
2.5.1.1. Basic analysis of Banawa
The default Banawa stress pattern is illustrated in (78). Initial syllables, and every second
syllable thereafter, are stressed; feet are trochaic and start at the left edge of the word.
Main stress is typically (though not consistently) on the penultimate foot; for the
purposes of the present discussion, the distinction between primary and secondary stress
is irrelevant.

(78) té.me ‘foot’
     má.ka.rì ‘cloth’
     tá.ti.kù.ne ‘hair’
     tì.na.rí.fa.bù.ne ‘you are going to work’
Banawa syllables are either CV or V.24 Medial onsetless syllables may be either unstressed
(as in 79a) or stressed (as in 79b). Postvocalic word-final i is extraprosodic, as in
sayie<i> and jauma<i>, and is never stressed.

(79) a. fu.a ‘lose’              b. ba.du.e ‘species of deer’
        fu.a.na ‘lost’              sa.yi.e.i ‘sound out’
                                    ja.u.ma.i ‘pig’
                                    ke.re.we.du.a.ma ‘turn end over end’
                                    tì.a.sí.a.nì ‘acquire’
24 Buller et al. claimed that syllables are CV(V) (with the exception of initial V syllables), and that
stress is on alternate moras. I follow Hayes (1995: 121-3) in assuming that syllables, not moras, are stress-bearing units, and therefore that only word-internal V syllables can produce a contrast between CV́.V and CV.V́.
The only initial syllables which are not stressed are those which are onsetless, as
in (80). When a word has an initial onsetless syllable, its second syllable and every
second syllable thereafter is stressed. That is, in these cases, words are stressed according
to the normal pattern, but the first stress occurs on the second syllable.

(80) u.wá.re.ì *ú.wa.rè.i ‘make noise’
     u.fá.bu.nè *ú.fa.bù.ne ‘I drink’
     a.tì.ke.í.ya.rì.ne *á.ti.kè.i.yà.ri.nè ‘happy’
The avoidance of stress on initial onsetless syllables can be straightforwardly
accounted for by the constraint ranking in (81), which forces such syllables to fall outside
the prosodic word.25

(81) Initial onsetless syllables are extraprosodic in Banawa
        /ufabune/                 ALIGN-L(Wd,Ft)   FAITH   ONSET/Word   STRICTLAYER
    ☞ a. u [Wd (fá.bu)(nè) ]                                                 *
      b. [Wd u (fá.bu)(nè) ]            *!                      *
      c. [Wd (ú.fa)(bù.ne) ]                                    *!
      d. [Wd (ʔú.fa)(bù.ne) ]                        *!
Here, winning candidate (81a) represents a word in which the initial u is attached directly
to some larger prosodic constituent, e.g. a phonological phrase, in order to avoid violating
ONSET/Word (though at the cost of violating the lower-ranked STRICTLAYER). Losing
candidates (81b-c) surface with initial onsetless syllables, thus violating ONSET/Word;
(81b) also violates ALIGN-L(Wd,Ft), due to the fact that the initial u is instead unfooted
and attached directly to the prosodic word. Finally, (81d) loses as it epenthesizes a glottal
stop in order to provide an onset to the initial syllable, violating high-ranking DEP.
ONSET/Word is therefore capable of accounting for the avoidance of word-initial
onsetless syllables in Banawa.
25 I assume that TROCHEE rules out iambic candidates, e.g. *[Wd (u.fá)(bu.né) ].
Finally, ONSET/Word is violable in Banawa vowel-initial disyllabic words, where
the initial vowels receive stress: á.ba *a.bá ‘fish’; á.wa *a.wá ‘wood’; á.wi *a.wí ‘tapir’.
This follows from the ranking FTBIN » ONSET/Word, where FTBIN requires that feet be
binary:

(82) FTBIN » ONSET/Word allows initial onsetless syllables in disyllabic words
        /aba/            FTBIN   ONSET/Word   STRICTLAYER
      a. a [Wd (bá) ]      *!                      *
      b. [Wd a (bá) ]      *!         *
    ☞ c. [Wd (á.ba) ]                 *
FTBIN is itself violable as the result of the ranking (FAITH ») PARSE-σ » FTBIN, where
PARSE-σ requires all syllables inside a prosodic word to be parsed into feet. This allows
word-final feet to be unary rather than binary.
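The surface generalization of this section — trochees from the left edge of the prosodic word, with an initial onsetless syllable extraprosodic except in disyllables, where FTBIN keeps it in — can be stated as a small procedure. This Python sketch describes the output pattern only, not the constraint interaction that derives it; the ASCII orthography and vowel inventory are simplifying assumptions.

```python
# Sketch of the Banawa stress pattern in (78)-(82): stress the first
# syllable of each left-aligned trochee. An initial onsetless syllable
# is extraprosodic (unstressed) unless the word is disyllabic, where
# FTBIN >> ONSET/Word keeps it inside the word. Illustrative only.

VOWELS = set("aeiou")

def stressed_syllables(sylls):
    """Return 0-based indices of stressed syllables."""
    start = 0
    # initial onsetless syllable is extraprosodic in words of 3+ sylls
    if sylls[0][0] in VOWELS and len(sylls) > 2:
        start = 1
    # trochaic feet: stress the first syllable of each binary foot
    return list(range(start, len(sylls), 2))

print(stressed_syllables(["ma", "ka", "ri"]))       # [0, 2]  (78)
print(stressed_syllables(["u", "fa", "bu", "ne"]))  # [1, 3]  (80)
print(stressed_syllables(["a", "ba"]))              # [0]     (82)
```

The procedure reproduces the attested forms cited above, e.g. a.tì.ke.í.ya.rì.ne ‘happy’ with stresses on the second, fourth, and sixth syllables, and á.ba with a stressed initial vowel.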
2.5.1.2. Alternative analysis: ONSET/σ
An alternative explanation for the avoidance of stressed onsetless syllables is suggested
by Smith (2002: 97ff.) and de Lacy (2000). They suggest many languages exhibiting this
type of pattern in fact have stress systems which are sensitive to the presence of onsets,
and that these patterns are due to a constraint which attracts stress to syllables with
onsets: ONSET/σ. Given the ranking in (83), undominated ONSET/σ and STRICTLAYER
cause feet to be displaced from the left edge of the prosodic word, leaving the initial
onsetless syllable inside the prosodic word (unlike in (81)) but unfooted and thus
unstressed. Under this analysis, ONSET/σ would need to penalize onsetless syllables with
either primary or secondary stress in order to explain both the avoidance of *ú.fa.bù.ne in
favor of u.fá.bu.nè and also the avoidance of *á.ti.kè.i.yà.ri.nè in favor of a.tì.ke.í.ya.rì.ne
‘happy’.
(83) ONSET/σ also avoids onsetless initial stressed syllables
        /ufabune/                 STRICTLAYER   FAITH   ONSET/σ   ALIGN-L(Wd,Ft)
      a. u [Wd (fá.bu)(nè) ]           *!
    ☞ b. [Wd u (fá.bu)(nè) ]                                            *
      c. [Wd (ú.fa)(bù.ne) ]                              *!
      d. [Wd (ʔú.fa)(bù.ne) ]                     *!
An analysis based on ONSET/σ therefore cannot provide a complete account of
Banawa stress, because in Banawa only initial onsetless syllables avoid stress. As
ONSET/σ cannot distinguish between initial and medial onsetless syllables, the ranking
from (84) incorrectly leaves initial CV syllables unfooted (and thus unstressed) when
doing so prevents stress from appearing on medial onsetless syllables. This is illustrated
in (84), where candidate (84b) is incorrectly chosen as the winner; the actual winner is
(84c), where the final onsetless syllable is stressed.26

(84) Bad prediction: No onsetless syllables are stressed
        /badue/                STRICTLAYER   FAITH   ONSET/σ   ALIGN-L(Wd,Ft)
      a. ba [Wd (dú.e) ]            *!
    ☞ b. *[Wd ba (dú.e) ]                                          *
      c. [Wd (bá.du)(è) ]                               *!
      d. [Wd (bá.du)(ʔè) ]                      *!
Because of its direct reference to word-initial onsets, ONSET/Word
straightforwardly explains the difference in behavior between initial and medial onsetless
syllables in Banawa: as shown in (85), a medial stressed onsetless syllable does not incur
a violation of ONSET/Word, and thus does not disrupt the normal pattern of stress
assignment. Unlike ONSET/σ, which directly penalizes any and all stressed onsetless
26 It would be possible to account for the Banawa data using the locally conjoined constraint
[ONSET/σ & ONSET/σ1] (Smolensky, 1995, 1997); however, see McCarthy (1999: 365-6) and Padgett (2002) for arguments against local conjunction.
syllables, ONSET/Word simply requires prosodic words to begin with an onset. The repair
chosen in Banawa, where onsetless initial syllables are removed from prosodic words and
stress is therefore shifted away from these initial syllables, is a consequence of the
ranking of other constraints relative to ONSET/Word.

(85) ONSET/Word allows only initial onsetless syllables to avoid stress
        /badue/                ALIGN-L(Wd,Ft)   FAITH   ONSET/Word   STRICTLAYER
      a. ba [Wd (dú.e) ]                                                  *!
      b. [Wd ba (dú.e) ]             *!
    ☞ c. [Wd (bá.du)(è) ]
      d. [Wd (bá.du)(ʔè) ]                        *!
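The difference between the two constraints on ba.du.e can be made concrete: ONSET/σ assigns a violation to any stressed onsetless syllable, while ONSET/Word inspects only the word-initial syllable. A sketch with a hand-entered parse of the attested form bá.du.è (representation and names are illustrative):

```python
# Sketch: ONSET/sigma vs ONSET/Word on the attested Banawa form
# ba'.du.e' of (84)-(85). A parse is a list of (syllable, stressed)
# pairs inside the prosodic word. Illustrative representation.

VOWELS = set("aeiou")

def onset_sigma(parse):
    """One violation per stressed onsetless syllable, anywhere."""
    return sum(1 for syll, stressed in parse
               if stressed and syll[0] in VOWELS)

def onset_word(parse):
    """One violation iff the word-initial syllable is onsetless."""
    return 1 if parse[0][0][0] in VOWELS else 0

badue = [("ba", True), ("du", False), ("e", True)]  # ba'.du.e'
print(onset_sigma(badue))  # 1: wrongly penalizes the attested form
print(onset_word(badue))   # 0: the attested form is fine
```

This is the asymmetry driving the argument: the attested winner of (85) is a fatal ONSET/σ violator, so only the domain-edge constraint ONSET/Word can leave medial onsetless syllables free to bear stress.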
2.5.1.3. ONSET/PCat constraints must be freely rankable
Because strict layering is not consistently observed in Banawa, ONSET/PCat constraints
are not necessarily pseudo-stringent (as was the case in section 2.3.4). This provides a
situation in which the relationship among these constraints can be explored to see
whether stringent formulations or fixed ranking is required. This section will show that
the constraints must be nonstringent and freely rankable.
When a form like ufabune occurs utterance-initially, the initial u is utterance-
initial, but is extraprosodic and so not word-initial for the reasons discussed above. The
word is pronounced without an initial onset, as in utterance-medial position. Thus the
prosodic structure is [Utt u [Wd fa.bu.ne ] ]. Utterances are therefore unlike words in that
they license initial onsetless syllables.
In Banawa, the typical within-language implication regarding the distribution of
marked domain-edge structures does not hold. Prosodic words always have initial onsets,
while utterances may be vowel-initial. Onsetless syllables are banned word-initially, but
tolerated utterance-initially. The general discussion of Banawa stress shows that FAITH
and ONSET/Word must dominate STRICTLAYER, as in (86), explaining why candidates
(86b) and (86c) lose. For candidate (86a) to win, STRICTLAYER must dominate
ONSET/Utterance. By transitivity, then, FAITH and ONSET/Word must also dominate
ONSET/Utterance.

(86) ONSET/Word » ONSET/Utterance
ufabune    FAITH    ONSET/Word    STRICTLAYER    ONSET/Utt
a. [Utt u [Wd (fa.bu)(ne) ] ] * *
b. [Utt ʔu [Wd (fa.bu)(ne) ] ] *! *
c. [Utt [Wd (u.fa)(bu.ne) ] ] *! *
d. u [Utt [Wd (fa.bu)(ne) ] ] **!
Recall that the typical case, where some marked onset is banned e.g. utterance-
initially but licensed word-initially, follows from a ranking like *X(Onset/Utterance) »
FAITH » *X(Onset/Word), as shown in section 2.3.2. Thus in order for the conflicting
rankings *X(Onset/Utterance) » *X(Onset/Word) and *X(Onset/Word) »
*X(Onset/Utterance) to both be possible, constraints in the MOnset(Onset/PCat) schema
must be freely rankable.
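The free-ranking argument can be made concrete with a small evaluation sketch. The following Python snippet is illustrative only: the candidate strings and violation profiles are transcribed from tableau (86) (with "?" standing in for the epenthetic glottal stop), and the winner is selected by comparing violation profiles lexicographically in ranking order.

```python
# Minimal sketch of OT evaluation (EVAL): each candidate carries a
# violation profile, and the winner is the candidate whose profile,
# read off in ranking order, is lexicographically smallest.

def eval_ot(candidates, ranking):
    """Return the winning candidate under the given constraint ranking."""
    return min(candidates,
               key=lambda c: tuple(candidates[c].get(con, 0) for con in ranking))

# Violation profiles for /ufabune/, transcribed from tableau (86);
# '?' stands in for the epenthetic glottal stop.
candidates = {
    "[Utt u [Wd (fa.bu)(ne)]]":  {"StrictLayer": 1, "Onset/Utt": 1},
    "[Utt ?u [Wd (fa.bu)(ne)]]": {"Faith": 1, "StrictLayer": 1},
    "[Utt [Wd (u.fa)(bu.ne)]]":  {"Onset/Wd": 1, "Onset/Utt": 1},
    "u [Utt [Wd (fa.bu)(ne)]]":  {"StrictLayer": 2},
}

# Banawa ranking: FAITH, ONSET/Word >> STRICTLAYER >> ONSET/Utt
banawa = ["Faith", "Onset/Wd", "StrictLayer", "Onset/Utt"]
print(eval_ot(candidates, banawa))

# Reranking ONSET/Utt above STRICTLAYER selects a different winner,
# which is why the constraints must be freely rankable.
reranked = ["Faith", "Onset/Wd", "Onset/Utt", "StrictLayer"]
print(eval_ot(candidates, reranked))
```

Under the Banawa ranking the attested candidate with extraprosodic u wins; promoting ONSET/Utt instead selects the candidate with u outside the utterance, illustrating that the two rankings yield genuinely different typological predictions.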
2.5.2. Tolerance of marked ‘initial’ structures: Tzutujil clitics
The previous section demonstrated that domain-edge markedness constraints can force
marked structures to surface outside prosodic words. This section will show that material
which surfaces outside prosodic words for independent reasons (e.g. clitics) is not
evaluated by domain-edge markedness constraints, and so can have marked initial
structures which are banned prosodic word-initially throughout the language. That is,
clitics can begin with structures which are never initial in lexical words, because lexical
words are always inside prosodic words and thus subject to domain-edge markedness
constraints. This occurs in Tzutujil (Dayley, 1985), where prosodic words (and thus all
roots) must have initial onsets, while proclitics may be onsetless.
When underlyingly vowel-initial Tzutujil roots occur in their bare forms, they
receive epenthetic glottal stop onsets.27 The appearance of this epenthetic onset satisfies
ONSET/Word, as shown in (88).

(87) /ak’/ [ʔak’] ‘chicken’        /axq’i:x/ [ʔaxq’i:x] ‘diviner’
     /o:x/ [ʔo:x] ‘avocado’        /oxqat/ [ʔoxqat] ‘deerhunter’
     /utz/ [ʔutz] ‘good’           /utzi:l/ [ʔutzi:l] ‘goodness’
(88) epenthesis satisfies ONSET/Word
ak’    ONSET/Word    DEP
a. [Wd ʔak’ ] *
b. [Wd ak’ ] *!
The only Tzutujil words which regularly surface without initial onsets are the
vowel-initial absolutive and ergative proclitics; the clitic paradigms are given in (89). As
shown in (90), ʔ is never epenthesized before these clitics.

(89) a. Absolutive proclitics              b. Ergative proclitics
     1Sg in–     1Pl oq–                   1Sg nu:–/w–     1Pl qa:–/q–
     2Sg at–     2Pl ix–                   2Sg a:–/a:w–    2Pl e:–/e:w–
     3Sg ∅       3Pl e:–/e–28              3Sg ru:–/r–     3Pl ke:–/k–
(90) in=winak    *ʔin=winak    ‘I am a person’
     oq=winak    *ʔoq=winak    ‘we are people’
     a:=tz’i:    *ʔa:=tz’i:    ‘your dog’
     a:w=ak’     *ʔa:w=ak’     ‘your chicken’
27 Epenthetic ʔ is obligatory on monosyllabic words and optional on longer forms; the
crucial point here is that all vowel-initial words can take epenthetic initial ʔ, unlike the clitics discussed below, which never receive initial ʔ.
28 When two forms occur, the first is for consonant-initial stems and the second for vowel-initial stems.
The difference between bare roots and clitics can be derived by assuming,
following Selkirk (1995), that Tzutujil clitics surface outside of prosodic words, attaching
directly to phonological phrases or higher prosodic constituents; the formal motivation
for this prosodic structure will be discussed below. ONSET/Word requires all and only
prosodic words to have initial onsets. As shown in (91), ONSET/Word is indifferent as to
whether the extraprosodic clitics in candidates (91a) and (91b) have onsets and so rules
out only candidate (91c) where the clitic is fully incorporated into an onsetless prosodic
word. DEP prefers candidates without epenthesis, eliminating candidates (91b) and (91d)
and allowing the STRICTLAYER-violating candidate (91a) to win. Thus clitics, unlike
roots, may surface with initial onsetless syllables.

(91) Clitics don’t receive epenthetic onsets
a:w=ak’    ONSET/Word    DEP    STRICTLAYER
a. a: [Wd wak’ ] *
b. ʔa: [Wd wak’ ] *! *
c. [Wd a:wak’ ] *!
d. [Wd ʔa:wak’ ] *!
In the winning candidate (91a), epenthesis does not occur root-initially because
the final consonant of the clitic resyllabifies and provides an onset to the root. Allowing
this, while preventing various other unattested misalignments of root and prosodic word
edges, is crucial to a full analysis of prosodic word edges in Tzutujil.
Two types of undesirable outputs must be avoided: those in which onsetless root-initial
syllables surface outside the prosodic word (as in Banawa), e.g. *a[xq’i:x], cf.
[ʔaxq’i:x] ‘diviner’, and those in which clitics fully incorporate into prosodic words (thus
satisfying STRICTLAYER), e.g. *[a:w=ak’], *[ʔa:w=ak’], cf. a:[w=ak’] ‘your chicken’. A
traditional alignment constraint like ALIGN-L(Root, PrWd) (McCarthy and Prince,
1993a), which demands that the left edge of each root cooccur with the left edge of a
prosodic word, would prevent both of these undesirable results. Problematically,
however, it would also inappropriately rule out surface structures in which clitic-final
consonants resyllabify to satisfy ONSET/Word as in (92), where the actual output is (92a);
bare roots with epenthetic word-initial ʔ would also be wrongly eliminated.

(92) ALIGN-L(Root, PrWd) wrongly prevents clitic consonant resyllabification
a:w=ak’    ALIGN-L(Root, PrWd)    ONSET/Word    DEP    STRICTLAYER
a. a: [Wd wak’ ] *! *
b. *a:w [Wd ak’ ] * *
c. a:w [Wd ʔak’ ] *! * *
Something weaker than ALIGN-L(Root, PrWd) must therefore mediate the
relationship between Tzutujil root and word edges. The necessary constraint must force
the beginning of the root to fall inside the prosodic word, and must allow epenthetic or
clitic consonants but not clitic vowels to intervene between root and word edges. A
constraint which aligns the edges of root-headed syllables with edges of prosodic words,
ROOTHEADL, can account for this pattern.

(93) ROOTHEADL
The left edge of the leftmost syllable whose morphological domain is the root must be
aligned with the left edge of a prosodic word.
This constraint crucially refers to the notion of ‘morphological domain’
introduced by van Oostendorp (2004) in a discussion of differences in the syllabification
of prefix vs. suffix segments in Dutch. A segment’s morphological domain is defined as
the smallest word containing the segment, and a syllable inherits its morphological
domain from its head; that is, the morphological domain of a syllable is the
morphological domain of the segment heading it. As vowels are syllable heads, they are
the source of syllables’ morphological affiliations; non-head consonants thus do not
transmit their morphological domains to syllables.
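This inheritance rule lends itself to a direct procedural statement. The sketch below is a hypothetical formalization (the dictionary-based syllable representation is my own, not van Oostendorp's): each syllable inherits its morphological domain from its head vowel, and ROOTHEADL assigns a violation when the leftmost root-headed syllable is not the leftmost syllable of the prosodic word.

```python
# Sketch of ROOTHEADL evaluation. Assumed representation: a prosodic
# word is a list of syllables, each carrying the morphological domain
# of its head vowel (van Oostendorp's inheritance rule).

def roothead_l_violations(prosodic_word):
    """One violation if the leftmost root-headed syllable is not the
    leftmost syllable of the prosodic word; zero otherwise."""
    root_sylls = [i for i, s in enumerate(prosodic_word)
                  if s["domain"] == "root"]
    if not root_sylls:
        return 0
    return 0 if root_sylls[0] == 0 else 1

# a:[wak'] -- clitic vowel outside the word; 'wak' is headed by the
# root vowel, so the word-initial syllable is root-headed: no violation.
winner = [{"syll": "wak'", "domain": "root"}]

# *[a:.wak'] -- clitic vowel incorporated: the leftmost syllable is
# clitic-headed, so ROOTHEADL is violated.
loser = [{"syll": "a:", "domain": "clitic"},
         {"syll": "wak'", "domain": "root"}]

print(roothead_l_violations(winner), roothead_l_violations(loser))
```

Because only the head vowel's affiliation matters, a clitic consonant resyllabified into the root-headed syllable (as in a:[wak’]) is invisible to the constraint, exactly as the text requires.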
If alignment constraints can refer to morphological domains, ROOTHEADL can
therefore require the leftmost vowel in a root to surface in the leftmost syllable of a
prosodic word, while failing to penalize non-head material in that leftmost syllable with
morphological affiliations other than root. Clitic and epenthetic consonants can thus
appropriately appear before root-initial vowels inside Tzutujil prosodic words in order to
satisfy ONSET/Word, while clitic vowels must remain outside prosodic words and root
vowels must remain inside them. These results are shown in (94) and (95).

(94) Clitic vowels cannot be inside prosodic words; clitic consonants can resyllabify
a:w=ak’    ROOTHEADL    ONSET/Word    DEP    STRICTLAYER
a. a: [Wd wak’ ] *
b. a:w [Wd ak’ ] *! *
c. [Wd a:.wak’ ] *! *
d. [Wd ʔa:.wak’ ] *! *

(95) Root vowels cannot surface outside prosodic words; initial epenthesis is possible
axq’i:x    ROOTHEADL    ONSET/Word    DEP    STRICTLAYER
a. a [Wd xq’i:x ] *! *
b. [Wd ax.q’i:x ] *!
c. [Wd ʔax.q’i:x ] *
Without reference to segments’ morphological affiliation, it is impossible to force
all clitic vowels (but not all clitic consonants) to surface outside prosodic words and thus
allow these clitic-initial vowels to escape from prosodic requirements on prosodic word-
initial onsets. In general, morphologically-mediated alignment constraints like
ROOTHEADL firmly link edgemost vowels to prosodic domain edges, while allowing
consonants to enter or leave prosodic domains without penalty. ROOTHEADL thus
captures consonants’ tendency to be more free than vowels in terms of prosodic
alignment and resyllabification across word boundaries. Effects of this tendency are often
seen in languages other than Tzutujil. de Lacy (2002b) argues that in Maori, a single
prosodic word must contain all vocalic elements of a root, but not necessarily all
consonantal elements; final consonants surface in a distinct prosodic word when suffixes
are added (and are otherwise deleted). Similarly, Cairene Arabic allows the initial
consonant of a complex onset to resyllabify and become a coda to a preceding word,
though vowels can never change their prosodic affiliations.
2.5.3. Domain-edge markedness constraints and non-strict layering
This section has considered the effects of domain-edge markedness constraints in
prosodic structures which violate strict layering. These positional markedness constraints
were predicted to be sensitive to, and also to be able to affect, prosodic structure. Both of
these predictions are attested. In Banawa, ONSET/Word forces onsetless initial syllables to
surface in extraprosodic positions; in Tzutujil, onsetless clitic-initial syllables (which
independently surface in extraprosodic positions due to ROOTHEADL) surface faithfully,
as they are not penalized by ONSET/Word. This discussion also demonstrated that these
constraints must be nonstringent and freely rankable in order to account for cases where
restrictions hold on smaller but not larger domains.
Given the formal parallels among domain-edge markedness constraints, all such
constraints should interact in similar ways with candidates’ prosodic structures. While
this section considered only the sensitivity of ONSET/Word to prosodic word edges, all
other constraints are predicted to show similar effects. For example, *X(Onset/PCat)
constraints should be able to license marked onsets only in extraprosodic clitics, or force
root-initial marked onsets to surface in extraprosodic position.
Coda constraints should also be sensitive to prosodic structure, licensing (marked
or all) codas only in clitics or forcing them to surface outside the prosodic word. More
specifically, a ranking like FAITH, NOCODA/Word » STRICTLAYER should force
underlying word-final consonants to surface faithfully but in a position outside of
prosodic words, rather than as word-final codas. In other words, this ranking accurately
predicts that languages may require final consonants to be extrametrical.
2.6. Conclusion
This chapter has examined the properties of constraints in the formally grounded
MOnset(Onset/PCat) and MCoda(Coda/PCat) schemata. These schemata are motivated by
the cross-linguistic generalization that parallel phonotactic restrictions can hold on the
edges of all prosodic domains. These generalizations allow a proper characterization of
syllable onset and coda restrictions as properties of any prosodic domain; syllable edges
are simply particular instances of prosodic domains in which these restrictions may hold.
Domain-edge markedness constraints must target prosodic positions. ONSET/Word targets
the initial syllables of prosodic, rather than morphological, words, as in Tzutujil. Further,
ONSET/Word can also drive an onsetless underlyingly word-initial syllable to surface in
an extraprosodic position, in order to satisfy the requirement that prosodic word-initial
syllables have onsets.
Returning to the central topic of this dissertation, domain-edge markedness
constraints must be formally, rather than functionally, grounded. Learners could not
induce each of the constraints in these schemata from their immediate linguistic
experience, as there are no consistent phonetic difficulties associated with each marked
onset and coda segment (or structure) in each of the prosodic positions where it is
dispreferred. While some of these constraints penalize marked segments or structures in
phonotactic contexts where they are explicitly perceptually difficult, formal or functional
grounding is a property of an entire constraint schema (as argued in chapter 1), rather
than of individual constraints. For this reason, all domain-edge markedness constraints
must emerge from innate schemata, as the full set of constraints cannot be induced by
learners from their immediate linguistic experience.
Finally, these general properties of domain-edge markedness constraints raise
questions about whether there are other formal similarities among prosodic domains
which could be captured by similar formally grounded constraint schemata. A natural
extension of this proposal would hypothesize that parallel sets of positional faithfulness
constraints target the edges of all prosodic domains. This hypothesis is tentatively
supported by observations that positional faithfulness can protect both syllable onsets and
word-initial onsets (Beckman, 1999). Wiltshire has also observed that similar supersets of
codas may be licensed exclusively at the ends of various prosodic domains (Wiltshire,
2003), and Côté notes that segments at the ends of larger prosodic domains are
neutralized less frequently than those at the ends of smaller domains (Côté, 1999). Given
that domain-edge markedness constraints are defined in terms of the formal prosodic
hierarchy, it would be unsurprising if positional faithfulness constraints also had parallel
instantiations in these same prosodic domains.
Chapter 3. Functionally grounded phonotactic restrictions
3.1. Functional grounding in phonology
This chapter and the next will investigate phonological patterns whose functional
motivations may be discovered by learners. A common theme in phonology is the search
for phonetic properties which make phonological patterns ‘natural’ or ‘grounded’ (see
e.g. Stampe (1973), Hooper [Bybee] (1976), Ohala (1990), and Archangeli and
Pulleyblank (1994)). Within Optimality Theory (Prince and Smolensky, 1993/2004),
much of this work has turned to the search for functional grounding of specific OT
constraints (see e.g. Hayes (1999), Smith (2002), Steriade (1999; 2001a), and papers in
Hayes et al. (2004)). Within this work on constraint grounding, there is widespread
agreement that functionally grounded constraints are those which prefer more
perceptually salient or less articulatorily challenging forms to those with less perceptual
prominence or greater articulatory difficulty.
Beyond this basic consensus, however, there is relatively little explicit discussion
of the relationship between functionally grounded constraints and their phonetic
motivations. A great deal of work identifies phonetic facts which correlate with constraint
activity while remaining uncommitted to a particular relationship between the phonetics
and the constraints. Chapter 1 proposed that functionally grounded constraints are
induced by learners based on their immediate perceptual, articulatory, and
psycholinguistic experience of the language surrounding them. In this and the following
chapter, I follow Steriade (2001a), Hayes (1999), and Hayes and Wilson (to appear) in
explicitly investigating the mechanism by which learners induce functionally grounded
constraints.
This chapter will explore the perceptual and acoustic correlates of a phonotactic
restriction on word-initial unaspirated p found in Cajonos Zapotec (Nellis and
Hollenbach, 1980), Ibibio (Essien, 1990), and Moroccan Arabic (Heath, 1989). Chapter 4
describes a computational model in which a virtual learner ‘hears’ these naturalistic
acoustic properties, perceives and identifies them as actual speakers do, and induces the
phonologically attested constraints from this acoustic and perceptual data.
This chapter is structured as follows. Section 3.2 presents the basic phonotactics
of Cajonos Zapotec and Ibibio: p and b contrast in non-initial positions, while only b
occurs initially; coronal and dorsal stops contrast in voicing initially (and in other
positions). A similar dispreference for initial p holds in Moroccan Arabic loanwords. This
restriction follows from listeners’ propensity to misidentify initial p as b – that is, from
initial p’s unique perceptual difficulty, as discussed in section 3.3. A perceptual
experiment demonstrates that word-initial p is significantly more difficult for French
speakers to perceive than initial b. This perceptual difficulty is unique to initial p: medial
p is no more difficult to identify than medial b, and t is no more difficult than d in either
position. Section 3.4 finds acoustic sources for this perceptual difficulty in the similarity
of initial p and b’s VOTs and maximum burst intensities. Initial labials are significantly
more similar along these acoustic dimensions than are medial labials, or coronals in any
position. Taken together, these results indicate that the phonological restrictions on initial
p are due to its unique perceptual difficulty, and that this difficulty in turn follows from
initial p’s similarity to b. Chapter 4 will explore the proposed connections between these
factors.
3.2. Phonological restrictions on word-initial p
In Cajonos Zapotec (Nellis and Hollenbach, 1980), Ibibio (Akinlabi and Urua, 2002;
Connell, 1994; Essien, 1990), and Moroccan Arabic (Heath, 1989), unaspirated p contrasts with b
in non-initial positions, but only b may surface initially. These languages allow other
pairs of segments contrasting in voicing (e.g. t and d, k and g) in initial position.
In Cajonos Zapotec, coronal and velar stops contrast for voicing initially,
medially, and finally. Labials contrast for voicing only medially and finally. Native
words can begin with b but not p. This restriction was phonologically productive until
recently. Older Spanish loans borrowed initial Spanish /p/ as [b], as in bej ‘sash’ (Sp.
pano) and bed (Sp. Pedro). Newer loans faithfully retain initial /p/, as in pat ‘duck’ (Sp.
pato).

(96)  p ~ b                     t ~ d                    k ~ g
      *pèn                      to ‘one’                 koc ‘pig’
      ben ‘do!’                 do ‘string’              goc ‘gunny sack’
      gopee ‘fog’               yi ta ‘the squash’       wake ‘it can’
      dobee ‘feather’           yi da ‘the leather’      wage ‘firewood’
      jap ‘will care for’       yet ‘tortilla’           wak ‘it can’
      jab ‘will weave’          zed ‘disturbance’        wag ‘firewood’
A similar restriction against word-initial p is found in Ibibio (data presented here
is from Essien (1990)). The distribution of voicing and length contrasts in Ibibio stops is
complex. Intervocalic stops are typically geminates, as singleton intervocalic stops are
generally lenited and medial clusters are banned. Ibibio has no voiced velar stop, and
coronals are devoiced syllable-finally and in geminates. See Essien (1990) and especially
Akinlabi and Urua (2002) for further discussion of Ibibio morphophonology.
Most interestingly for the present discussion, Ibibio licenses b but not p word-
initially. p and b contrast medially in dɨ́ppé ‘lift up’ versus dɨ́bbé ‘hide oneself’, and
finally in bɔ́p ‘build (something)’ versus bɔ́ɔb ‘build many things’. While there are b-
initial words like bàt ‘count’, there are no p-initial words like *pàt. Unlike labials,
coronals t and d contrast word-initially as in tàppa ‘call someone’s attention’ versus
dàppa ‘remove something from a fire’.

(97)  p ~ b                        t ~ d                               k (~ *g)
      *pàt                         tàppa ‘call someone’s attention’    kárá ‘govern’
      bàt ‘count’                  dàppa ‘remove s.t. from a fire’     *gárá
      dɨ́ppé ‘lift up’              sɨtté ‘uncork’                      dàkká ‘move away’
      dɨ́bbé ‘hide oneself’         *sɨddé29                            *dàggá
      bɔ́p ‘build (something)’      wèt ‘write’                         sák ‘laugh’
      bɔ́ɔb ‘build many things’     *wèd                                *ság
Moroccan Arabic shows a similar dispreference for word-initial p. The native
Moroccan Arabic stop inventory is *p, b, t, d, tˤ, dˤ, k, g, q: the voicing contrast is
neutralized in labials such that there is b but no p (Heath, 1989). Recent loanwords from
Spanish and French have introduced p, as in diparˤ (Fr. depart ‘departure’) and diplˤum
(Fr. diplôme ‘diploma’). Word-initially, some loans like pasˤ ‘passport’ (Fr. passe) and
purˤ ‘port’ (Fr. port) borrow initial French p faithfully; however, this is a relatively recent
development. Heath reports, “[i]t would appear that formerly /b/ or /bˤ/ was the regular
borrowed form of p in stem-initial position.” (p. 91) Many more frequent examples of #p
→ #b borrowings are seen in bakiy-a (Fr. paquet ‘packet’) and blˤasˤ-a ‘place’ (Fr.
place). Many p-initial borrowings can also be pronounced with initial b, typically in rural
dialects, as in piniz ~ biniz ‘thumbtacks’ (Fr. punaise) and plˤay-a ~ blˤay-a ‘beach’ (Sp.
playa). This variation is not symmetrical: b-initial borrowings do not have p-initial
29 Essien reports the possibility of free variation between voiced and voiceless medial and final
coronal stops.
variants, indicating a specific dispreference for initial p. Intervocalic p is never borrowed
as b (though there are five examples in Heath’s corpus where VpV → VbbV ), showing
that p is avoided primarily in initial position.
In all of these languages, initial p is penalized by a positional markedness
constraint, *#P. Unlike restrictions on word-initial segments which follow from the
*X(Onset/Word) constraints introduced in chapter 2, this restriction against word-initial p
has no parallels in syllable onsets, or onsets of phrases or utterances. Many such
positional restrictions reflect the fact that segments are uniquely difficult to articulate or
perceive in the positions where they are banned. This chapter argues that perceptual
factors are the basis for these restrictions against word-initial p.
In order to support this claim, section 3.3 will present the results of a perceptual
study in which French participants identified word-initial unaspirated p significantly
more slowly than word-initial b; the identification of medial p was not similarly different
from that of medial b. Parallel patterns of perceptual difficulty did not emerge in initial
and medial t and d. Together, these results indicate that word-initial p is uniquely
perceptually difficult. An acoustic study, reported in section 3.4, found an explanation for
these perceptual results in the fact that initial p and b are more similar to each other in
terms of the intensities of their release bursts and VOTs than medial p and b, or either
initial or medial t and d. These results thus support the claim that the constraint *#P
which bans initial p in Cajonos Zapotec and Ibibio is grounded in the perceptual
difficulty of initial p. The nature of this phonetic grounding is the topic of chapter 4.
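The similarity measure at the heart of this argument can be illustrated with a small sketch. The code below is hypothetical: the VOT and burst-intensity values are made-up placeholders, not the dissertation's measurements; it simply shows one way to quantify "similarity" as the distance between category means in a z-scored two-dimensional acoustic space.

```python
# Illustrative acoustic-similarity measure: Euclidean distance between
# two stop categories' means in normalized (VOT, burst intensity)
# space. Smaller distance = more confusable categories. All token
# values below are invented placeholders.
import statistics as st

def category_distance(tokens_a, tokens_b):
    """Distance between the mean (VOT, intensity) of two categories,
    with each dimension z-scored over the pooled tokens."""
    pooled = tokens_a + tokens_b
    norms = []
    for dim in (0, 1):
        vals = [t[dim] for t in pooled]
        norms.append((st.mean(vals), st.pstdev(vals)))

    def norm_mean(tokens, dim):
        mu, sd = norms[dim]
        return st.mean((t[dim] - mu) / sd for t in tokens)

    dx = norm_mean(tokens_a, 0) - norm_mean(tokens_b, 0)
    dy = norm_mean(tokens_a, 1) - norm_mean(tokens_b, 1)
    return (dx ** 2 + dy ** 2) ** 0.5

# (VOT in ms, burst intensity in dB) -- placeholder values only
initial_p = [(18, 52), (20, 50), (16, 53)]
initial_b = [(-85, 49), (-90, 48), (-80, 50)]
print(category_distance(initial_p, initial_b))
```

On this measure, the claim of section 3.4 amounts to the prediction that the p/b distance computed over initial labial tokens is smaller than the corresponding distance for medial labials or for coronals in either position.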
3.3. The perceptual difficulty of word-initial p
In order to test the hypothesis that word-initial p is perceptually difficult, a perceptual
experiment was conducted in which participants were asked to identify tokens of initial
and intervocalic p, b, t, and d. The specific hypothesis tested is that initial p is more
difficult to accurately identify than word-initial b, that no similar asymmetry emerges
word-medially, and that this is a particular property of p rather than a general property of
voiceless stops (that is, that t is not similarly more difficult than d).
The results of this experiment showed that while speakers were able to identify p
and b with essentially equal accuracy, reaction times for initial p were on average 35 ms
slower than those for initial b. Reaction times for medial p were not similarly slower than
those for medial b; nor were reaction times for initial t slower than initial d. Thus word-
initial p appears to be uniquely perceptually difficult. This property of word-initial p
supports the claim made above, that the cross-linguistic restrictions on this segment in
initial position follow from its perceptual properties.
3.3.1. Methods
Voiceless stops in Cajonos Zapotec, Ibibio, and Moroccan Arabic are unaspirated. In
order to investigate whether initial p could be banned in these languages because of its
perceptual difficulty, a perceptual experiment was conducted using French participants
and stimuli. Voiceless stops are similarly unaspirated in French, and so the perceptibility
of initial and medial unaspirated p and t can be compared to that of b and d.
3.3.1.1. Stimuli
Participants were presented with fragments of words containing one of four consonants
(p, b, t, or d) in either initial prevocalic (#CV) or intervocalic (VCV) position. This
provided a measure of the relationship between, and relative difficulty of, initial p and b –
by hypothesis, initial p should be more difficult to identify than initial b. This relationship
could be compared to the relationship between medial p and b, to determine whether any
difficulty is unique to initial p, and also to the coronal stops to determine whether any
observed difficulty is particular to p rather than a general property of voiceless stops. The
stimuli used in this experiment were extracted from materials recorded for a previous
experiment, the results of which are not reported here.
These recorded materials were French words and nonwords corresponding to the
real words, all of which contained either initial or medial p, b, t, or d. The nonwords
differed from the real words only in the voicing of the target stop: a real word with initial
or medial p corresponded to a nonword in which the p was replaced by a b (e.g.
paragraphe ~ *baragraphe, capuccino ~ *cabuccino); real words with b similarly had
corresponding nonwords where b was replaced by p (e.g. bordeaux ~ *pordeaux,
robotique ~ *ropotique), and similarly for t and d (e.g. therapie ~ *derapie, itineraire ~
*idineraire; declaration ~ *teclaration, comedie ~ *comettie).
The sets of real words were balanced such that ps and bs, and ts and ds, were in
similar segmental contexts. Specifically, each p-initial word was paired with a b-initial
word in which p and b were followed by the same vowel. Similarly, pairs of words had
identical vowels following the target consonants in the medial p and b, initial t and d, and
medial t and d conditions. As each nonword corresponded to a real word, the nonwords
were thus balanced for following vowels as well. For words (and nonwords) with medial
target consonants, preceding vowels were also identical where possible. The experiment
for which these words were originally recorded also required that words be matched for
frequency, neighborhood densities, and uniqueness point location (these analyses were
conducted using data from the Lexique corpus (New and Pallier, 2005)), so perfect
correspondence between preceding vowels was not always possible.
3.3.1.2. Recording
A female native speaker of French was recruited to record the stimuli. The speaker lived
in France for 23 years, in and around Clermont-Ferrand, and speaks a standard Parisian
dialect of French. The speaker has been living in western Massachusetts (continuing to
speak French at home) for the past three years.
The recording task was designed to capture casual, naturalistic pronunciations of
the target consonants. All stimulus words and nonwords (generally, ‘strings’) were
recorded in frame sentences. Vowel-initial strings followed the consonant-final phrase
J’ai dit au mec [ʒe di o mɛk] ‘I said to the guy’ and consonant-initial strings followed the
vowel-final phrase J’ai dit [ʒe di] ‘I said’. Following each target string was a randomly-
chosen stop-initial adverb or adverbial phrase from the following set: deux fois ‘two
times’, trois fois ‘three times’, quatre fois ‘four times’, dix fois ‘ten times’, quelquefois
‘sometimes’, pour toi ‘for you’, gravement ‘gravely’, or doucement ‘sweetly’. Sentences
were thus of the form J’ai dit “bordeaux” quatre fois ‘I said “bordeaux” four
times’ or J’ai dit au mec “idineraire” pour toi ‘I said to the guy “idineraire” for you’.
The adverbs prevented phrase-final accent and lengthening from falling on the target
string. They also provided contrastive elements other than the strings themselves within
the set of recorded sentences, such that any contrastive focus present would not fall
entirely on the target string. As the target strings were of various parts of speech, these
frame sentences (which could be understood as telling someone a password in a specified
manner) allowed all target strings to appear in identical prosodic positions. A complete
list of stimulus words and nonwords in their frame sentences can be found in Appendix 1.
Sentences were recorded through a head-mounted microphone (MicroMic II C420
by AKG) onto an iMac using Adobe Audition software. Sentences of four basic types
were recorded in four separate sessions (typically on separate days), for words with initial
target consonants, nonwords with initial targets, words with medial targets, and nonwords
with medial targets. All stimuli of each type were recorded together, in a single
randomized list. The speaker was instructed to read the sentences quickly and casually,
without sentence-internal pauses. Each set of sentences was recorded twice, and some
sentences were recorded more than twice to correct mistaken pronunciations (especially
of nonwords) or hyperarticulations of the target phonemes.
3.3.1.3. Stimulus construction and acoustic manipulation
Target words and nonwords were spliced out of the sentences to create #CV and VCV
stimuli for the identification task. From the multiple recordings of each sentence, a single
token was selected from which a stimulus was created. Stimuli were created from
sentences which were fluent, without pauses or hesitations, and were relatively rapid and
without hyperarticulated target consonants. The target consonants were further evaluated
to ensure that voiceless stops had clear voiceless intervals and voiced stops had complete
closure voicing, and that stop releases were not fricated or followed by devoiced vowels.
Finally, when the rest of the criteria were met, the stimulus string in which a target
voiceless consonant had the shortest VOT and weakest release burst was chosen. This
was done in hopes that such voiceless stops would be more confusable with their voiced
counterparts than would those with stronger release bursts.
From the selected strings, the target consonants and portions of flanking vowels
(following vowels only for word-initial stimuli; preceding and following vowels for
medial stimuli) were extracted. All stimulus processing was done using Praat (Boersma
and Weenink, 2006). Each stimulus ultimately consisted of a target consonant plus a
portion of any flanking vowels – preceding and following vowels for VCV stimuli;
following vowels only for #CV stimuli.
To obtain this segment of a recorded string, the edges of the target consonants and
flanking vowels were first manually labeled according to the following criteria. The
beginning of a word-initial stop was taken to be the beginning of the consonant closure,
after the final vowel of the preceding word. Due to the prosodic structure of the frame
sentence, there was frequently a very short break between the final vowel of the
preceding word and the initial consonant of the target string. The beginning of a medial
stop was similarly labeled at the beginning of the consonant closure, at the peak of the
first waveform period in which the first formant of the preceding vowel was attenuated in
the spectrogram. All stops’ endpoints were labeled after the stop burst, at the beginning
of the steady state of the following vowel, at the peak of the first period of the waveform
in which the spectrogram showed identifiable vowel formants. The beginnings of
preceding vowels and the ends of following vowels were similarly labeled at peaks of the
outermost periods where vowel formant structure emerged or disappeared. A
representative sample of this labeling is shown in Figure 1.
Figure 1. Spectrogram and waveform for robotique, with edges of the target stop b and
the flanking vowels labeled
The stimulus interval was then extracted, using a window with a 5 ms on-ramp
and off-ramp. The window removed the outer quarter of each flanking vowel in order to
render stimulus vowels as free of irrelevant coarticulation as possible, while preserving as
much of the vowel as possible. The windowed stimulus extracted from robotique is
shown in Figure 2. The zero-amplitude portions of the files were then removed.
Figure 2. Waveform for robotique (from Figure 1), after windowing removes everything but the target consonant and the inner three-quarters of each flanking vowel.
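The windowing step just described can be sketched as follows. This is a hypothetical NumPy reimplementation, not the Praat script actually used; the function name and the boundary arguments (given in seconds) are illustrative. For #CV stimuli, which have no preceding vowel, the left edge of the window simply coincides with the consonant onset.

```python
import numpy as np

def window_stimulus(signal, sr, v1_start, c_start, c_end, v2_end, ramp_ms=5.0):
    """Zero out everything except the target consonant and the inner
    three-quarters of each flanking vowel, applying 5 ms raised-cosine
    on- and off-ramps at the window edges (argument names illustrative)."""
    # Drop the outer quarter of each flanking vowel.
    keep_start = v1_start + 0.25 * (c_start - v1_start)
    keep_end = v2_end - 0.25 * (v2_end - c_end)
    i0, i1 = int(keep_start * sr), int(keep_end * sr)
    ramp = int(ramp_ms / 1000.0 * sr)
    win = np.zeros(len(signal))
    win[i0:i1] = 1.0
    # 5 ms on-ramp (0 -> 1) and off-ramp (1 -> 0).
    t = np.linspace(0.0, np.pi / 2, ramp)
    win[i0:i0 + ramp] = np.sin(t) ** 2
    win[i1 - ramp:i1] = np.cos(t) ** 2
    return signal * win
```

The zero-amplitude stretches outside the window can then be trimmed away, leaving only the audible stimulus.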
3.3.1.4. Participants
Fifteen native speakers of French were recruited through UMass to participate in the
perceptual experiment; two additional participants were also recruited but ultimately
excluded, as discussed below. Of the 15 reported participants, 10 were from France, 4
from Canada, and 1 from
Switzerland. Participants’ native dialects were presumed to be irrelevant, as all speakers
had been exposed to large amounts of standard Parisian French. All participants were
presently living in Amherst, and were students (undergraduate, graduate, or recent
alumni) or professors at the University of Massachusetts ranging in age from 18 to 40.
All participants whose results are reported below speak French on a daily basis.
Participants had normal hearing and were free of speaking disorders. An informed
consent form was obtained from each participant, and participants were paid $10 for their
participation.
3.3.1.5. Procedure
The experiment was conducted using Superlab Pro 2.0.4 software for PC, Sennheiser
HD280 pro headphones, and a Cedrus RB-834 response pad. The response pad had
colored buttons labeled P, B, T, and D; the position of the buttons was rotated randomly
across participants. There were 8 conditions: 4 stops (p, b, t, d) × 2 positions (initial #CV
or medial VCV). In each condition, each participant heard approximately 45 unique
stimuli once each, for a total of 360 stimuli. A list of words from which stimuli in each
condition were extracted can be found in Appendix 1. The entire experiment, including
instructions and breaks, took approximately 15 minutes.
Participants were first presented with written and auditory instructions. All
materials presented during the experiment were in French, in order to encourage
participants to perceive the stimuli as French sounds. Participants were told that in each
trial, they would hear a stimulus containing either p, b, t, or d. They were to respond by
pressing the button indicating the consonant that they heard as quickly as possible. The
instructions were followed by a training period of 32 trials, in which participants heard 16
stimuli (2 in each condition) twice each, in random order. Stimuli used for training were
distinct from those used during testing, and are identified in Appendix 1.
During the presentation of each stimulus, the screen showed the four possible
responses. The position and color of each response on the screen corresponded with its
position and button color on the response pad. In order to increase the difficulty of the
task, participants had only a one-second interval following the offset of the stimulus in
which they could respond. Participants’ reaction times were recorded (as measured from
the end of the stimulus). After a participant’s response to each training trial, the correct
answer appeared on the screen. If the participant failed to respond during the interval
allowed, a message appeared saying that they had not responded quickly enough and
asking them to respond more quickly in the future. After either this message or the
correct response appeared on the screen, the next trial began after a 750 ms intertrial
interval.
After the 32 training trials, a written and auditory message told participants that
the correct responses would not appear on the screen in future trials. The testing phase of
the experiment began after this message. Each trial of the testing phase was identical to
the training trials except for the absence of correct answers after participant responses.
The experiment consisted of three blocks of 120 trials each. Trials were randomly
assigned to blocks such that blocks consisted of equal numbers of trials from each
condition; blocks were separated by self-timed breaks. The order of the blocks was
randomized across participants, and stimuli were presented in random order within a
block. No stimuli were repeated during the testing phase, and all participants heard the
same stimuli.
3.3.1.6. Analysis
Participants’ responses were initially evaluated to check for outliers. This initial analysis
revealed six stimuli which were acoustically problematic, and thus for which participants
had significant difficulty performing the identification task. In these six stimuli, the
percentage of correct responses was more than three standard deviations below the
average percent correct for items in that condition. Each of these stimuli proved to have
some clear perceptual difficulty, stemming either from the windowing process (which
occasionally left traces of consonants in the initial portion of vowels which preceded
target consonants) or from a devoiced vowel following a voiceless target consonant.
These excluded stimuli are identified in Appendix 1. All responses to each of these six
stimuli were disregarded in the analyses presented below.
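The exclusion criterion for items can be sketched as follows. This is a hypothetical Python sketch, not the actual analysis script; the dictionary-based data structures are illustrative.

```python
import numpy as np

def flag_outlier_items(pct_correct, condition, n_sd=3.0):
    """Flag items whose percent-correct score falls more than n_sd standard
    deviations below the mean for their condition. Both arguments are dicts
    keyed by item (score and condition label, respectively)."""
    flagged = []
    for cond in set(condition.values()):
        items = [i for i in pct_correct if condition[i] == cond]
        scores = np.array([pct_correct[i] for i in items])
        # Per-condition cutoff: mean minus n_sd population standard deviations.
        cutoff = scores.mean() - n_sd * scores.std()
        flagged.extend(i for i in items if pct_correct[i] < cutoff)
    return flagged
```

The same logic, with a two-standard-deviation cutoff applied to participants' per-condition mean accuracies, underlies the participant exclusions described next.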
Additionally, some participants’ descriptions of their language background left
uncertainty regarding their French competence. An evaluation of participants’
mean accuracy across conditions revealed that for two participants, there were two or
more conditions in which those participants’ mean accuracies were more than two
standard deviations below the mean for all participants. For one of these participants, it
had been unclear whether he was a native speaker of French in addition to his two other
native languages. While the other was a native speaker, it had been over a decade since
she spoke French regularly, as opposed to the other participants who all presently speak
French on a daily basis. These two participants’ responses were therefore disregarded in
the analyses presented below, which show data from the remaining 15 participants.
After excluding invalid stimuli and participants from the experiment, the
remaining data were analyzed as follows. The general idea that word-initial p is more
perceptually difficult than initial b leads to the specific hypothesis that responses to initial
p are slower and/or less accurate than those for initial b. As only initial p is predicted to
be perceptually difficult, no similar difference in perceptibility should be found between
medial p and b or initial t and d. Analysis therefore proceeded by performing two-
sample one-tailed t-tests on reaction time and percent-correct data for pairs of segments
matched for place and position but differing in voicing – i.e. initial p and b, and t and d;
medial p and b, and t and d. The results of this analysis are presented in the next section.
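The comparison for one such pair can be sketched as follows. This is a hypothetical sketch using SciPy, not the analysis actually run; the input lists stand for per-item mean reaction times, and the Bonferroni-corrected α of 0.05/4 = 0.0125 follows the correction applied to all t-tests in this chapter.

```python
from scipy import stats

def compare_pair(rt_voiceless, rt_voiced, alpha=0.05, n_comparisons=4):
    """Two-sample t-test for the directional prediction that the voiceless
    member of a pair is responded to more slowly than the voiced member,
    evaluated against a Bonferroni-corrected alpha."""
    t, p_two = stats.ttest_ind(rt_voiceless, rt_voiced)
    # Convert the two-tailed p to a one-tailed p for voiceless > voiced.
    p_one = p_two / 2 if t > 0 else 1 - p_two / 2
    return t, p_one, bool(p_one < alpha / n_comparisons)
```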
The results presented in this chapter are all from items analyses. Each condition
had approximately 45 items; as only 15 participants took part, an items analysis was
therefore more powerful than a subjects analysis. Both analyses showed the same trends,
but significance was often only attained in the items analysis. With more participants and
thus more power, significance is expected to emerge from the subjects analysis as well.
Results from the subjects analysis can be found in Appendix 2.
3.3.2. Results
The primary result of the perceptual experiment is that word-initial p was identified
significantly more slowly than initial b, while neither medial p nor initial or medial
t shows the same delay in recognition with respect to its voiced counterpart. There was
no corresponding difference between initial p and b in terms of the accuracy with which
they were identified; for this reason reaction time results will be presented first, and then
accuracy results will be discussed.
3.3.2.1. Reaction time
By hypothesis, initial p is uniquely perceptually difficult. This difficulty could reveal
itself either in inaccurate identification of initial p or slow reaction times to initial p
(Pisoni and Tash, 1974). This leads to the specific prediction that recognition of
initial p should be slower and/or less accurate than recognition of initial b, and that
neither initial t and d nor medial p and b should differ in this way. To test this hypothesis,
preplanned two-sample one-tailed t-tests were performed on reaction times for subjects’
correct identifications of each segment. Accurate identification of initial p proved to be
significantly slower than identification of initial b. The average reaction time for accurate
initial p responses was 588 ms. This is marginally significantly greater than the average
response time for initial b responses (555 ms; t(88) = 2.445, p = 0.016).30 Reaction times
for medial p (599 ms) are not significantly different from reaction times for medial b (592
ms; t(82) = 0.485, p = 0.629), showing that only initial p is more slowly recognized than
b.

30 Because these comparisons cover four conditions (p, b, t, and d in a given context), a Bonferroni correction is applied to α = 0.05 such that α here is equal to 0.05/4 = 0.0125 for all t-tests described in this chapter.
Turning to the coronal stops, response times for initial t (495 ms) were quicker
than those for initial d (538 ms; t(85) = 3.742, p < 0.001), indicating that slow responses
are not a general property of initial voiceless stops but rather a unique property of initial
p. Medial t (548 ms) is also significantly more rapidly recognized than medial d (593 ms;
t(87) = 3.772, p < 0.001), suggesting that whatever the source of t’s speed, it is a general
property of t rather than a specific property of word-initial t. Figure 3 summarizes these
results, showing average reaction times for each condition with 95% confidence intervals.
Figure 3. Average reaction times (ms) in each condition, with 95% confidence intervals
(from items analysis).
3.3.2.2. Response accuracy
Participants’ slow reaction times for initial p do not correlate with any difference in the
accuracy of responses to initial p versus b stimuli. Initial p stimuli were identified
correctly 93% of the time, and initial b stimuli were similarly identified correctly 94% of
the time. A two-sample t-test indicates that this difference is not significant (t(88) =
0.314, p = 0.754). There was also no difference between participants’ identification of
medial p (91%) and b (91%; t(82) = 0.121, p = 0.904).
Participants’ accurate responses to t stimuli were consistently faster than those to
d stimuli, and there are also trends (though not significant ones) indicating that t
responses are more accurate than d responses. Initial t was identified correctly in 96% of
the trials, and initial d was identified correctly only 94% of the time (t(85) = 1.524, p =
0.131). The same trend exists in medial t and d tokens: t responses are 89% accurate,
while d responses are 87% accurate (t(87) = 0.949, p = 0.345). These results are
summarized in Figure 4.
Figure 4. Average percent correct in each condition, with 95% confidence intervals
(from items analysis).
3.3.2.3. Ruling out alternative explanations of the effect
Word/Nonword stimuli
As mentioned above, half of the #CV and VCV stimuli were extracted from real words,
while the other half were extracted from nonwords. A comparison of responses to word-
105
derived and nonword-derived stimuli reveals that participants respond similarly to both
sets of stimuli. The effects appear to be basic phonetic properties of initial and medial
stops, rather than simply lexical effects which would surface in real-word stimuli only.
A series of two-sample t-tests comparing p–b and t–d responses within the sets of
word and nonword stimuli shows that the patterns of results reported above are found in
participants’ responses to both word and nonword stimuli. These results are summarized
in Tables 1 and 2. None of the results which were significant above are significant here,
as the amount of data considered in each case is half of that pooled above. The trends are
strongly similar, however, and so with more participants, significance would presumably
be found here as well.
Looking first at the reaction time results shown in Table 1, initial p responses tend
to be slower than b responses in words (32 ms) and nonwords (34 ms), while medial p
and b responses show much less of a difference in both words (14 ms) and nonwords (3
ms). None of these p–b differences is significant, though response times for both initial
and medial t are again significantly faster than those for d in words as well as nonwords.
Turning to the accuracy results given in Table 2, neither words nor nonwords
show a difference in the accurate identification of initial p vs. b or medial p vs. b. As
above, the only noteworthy difference in accuracy is that t can occasionally be more
accurately recognized than d.
            Words: Reaction time                    Nonwords: Reaction time

            Initial          Medial                 Initial          Medial
b (ms)      555              601                    555              582
p (ms)      587              615                    589              585
p value     0.079            0.439                  0.096            0.883
            t(44) = 1.799    t(39) = 0.781          t(44) = 1.701    t(41) = 0.149

d (ms)      550              605                    526              582
t (ms)      504              556                    486              537
p value     0.009            0.006                  0.011            0.011
            t(41) = 2.750    t(43) = 2.888          t(42) = 2.675    t(43) = 2.663

Table 1. Reaction time analyses of stimuli trimmed from word and nonword recordings, with p values from preplanned two-sample t-tests (from items analysis).
            Words: Percent correct                  Nonwords: Percent correct

            Initial          Medial                 Initial          Medial
b (%)       94               91                     93               91
p (%)       95               92                     92               91
p value     0.641            0.679                  0.782            0.843
            t(44) = 0.469    t(39) = 0.417          t(44) = 0.278    t(41) = 0.200

d (%)       95               85                     94               88
t (%)       96               91                     97               86
p value     0.647            0.077                  0.120            0.481
            t(41) = 0.461    t(43) = 1.812          t(42) = 1.587    t(43) = 0.711

Table 2. Percent correct analyses of stimuli trimmed from word and nonword recordings, with p values from preplanned two-sample t-tests (from items analysis).
Flanking vowel effects
Another property which was not perfectly controlled across stimuli is the identity of the
vowels flanking the consonants. As discussed above, flanking vowels were controlled
within pairs of consonants differing in voicing. That is, each initial-p stimulus is paired
with an initial-b stimulus with the same following vowel. Likewise for initial t and d, and
for medial p–b and t–d. However, the vowels following initial p and b are not necessarily
the same as those following initial t and d. The following analysis reveals that these
differences in flanking vowels are not responsible for the results observed here.
There are four vowels which follow target consonants in each of the four
conditions: / i o y/. To determine whether vowels which followed consonants in only a
subset of the conditions skewed the results, items were separated into those with the
shared following vowels and those with unshared following vowels. Reaction times for
these two subsets of items were then analyzed as above, using two-sample t-tests. These
analyses showed that regardless of whether the following vowels were shared across all
conditions, initial p was always slower than initial b, while t was faster than d both
initially and medially, and medial p and b had relatively similar reaction times. The
differences between initial p and b are not significant; this is again presumably because
there are relatively few of each such item. With more items, this measure would be
expected to prove significant as well.
            Shared following vowels                 Non-shared following vowels

            Initial          Medial                 Initial          Medial
b (ms)      549              597                    559              585
p (ms)      587              603                    589              593
p value     0.145            0.749                  0.047            0.740
            t(34) = 1.491    t(44) = 0.322          t(54) = 2.030    t(36) = 0.344

d (ms)      551              588                    528              600
t (ms)      489              542                    499              552
p value     0.001            0.004                  0.061            0.017
            t(33) = 3.700    t(45) = 3.047          t(50) = 1.913    t(41) = 2.491

Table 3. Reaction time analyses of stimuli in which consonants are followed by the same vowels (/ i o y/) in all eight conditions, and those in which the following vowels are not shared across all conditions. p values are from preplanned two-sample t-tests, using items analyses.
Segmental frequency
A final alternative explanation would link the differences among reaction times to the
relative frequencies of p, b, t, and d in French. Speakers could simply respond more
slowly to less frequent segments. By this reasoning, the slow responses to initial p should
follow from a comparative scarcity of initial p in the lexicon. An analysis of the lexical
and corpus frequencies of these four segments, however, does not support such an
explanation.
Type and token frequencies for each of the four stops in initial and medial
position are provided in Table 4. This information is calculated using the Lexique
database (New and Pallier, 2005), which includes pronunciation and frequency
information for 135,000 words of French. Each type frequency is the total number of
occurrences of the segment in the position in the Lexique corpus; token frequencies were
calculated by multiplying the number of ps, bs, etc. in the relevant position in each word
by the word’s frequency (per million words)31 and then summing these values for all
words including p.32
            Word-initial consonants                 Word-medial consonants

            Type           Token frequency          Type           Token frequency
            frequency      (per million words)      frequency      (per million words)
b           6,986          18,222                   11,404         12,554
p           11,797         77,175                   17,893         24,403
d           11,657         77,086                   15,767         24,364
t           6,531          55,816                   48,505         107,928

Table 4. Frequency measures, from the Lexique corpus. Type frequency is a count of the occurrences of a consonant in the words contained in Lexique; a consonant’s token frequency is derived from word frequency data given in Lexique.
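The token-frequency computation just described can be sketched as follows. This is a hypothetical Python sketch over a toy lexicon; Lexique’s actual field names and phonemic coding are not reproduced here.

```python
def token_frequency(lexicon, segment, position):
    """Sum over all words of (occurrences of `segment` in `position`) times
    the word's per-million frequency. `lexicon` is a toy list of
    (phonemic_form, per_million_frequency) pairs."""
    total = 0.0
    for phones, freq in lexicon:
        if position == "initial":
            count = 1 if phones.startswith(segment) else 0
        else:  # word-medial: occurrences between the first and last segment
            count = phones[1:-1].count(segment)
        total += count * freq
    return total
```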
These results do not explain the observed differences in reaction times.
Specifically, if frequency were to explain participants’ delay in responding to word-initial
p relative to initial b, p should be less frequent than b initially; instead, there are nearly
twice as many p-initial words as b-initial words, and the p-initial words are
approximately four times as frequent as the b-initial words. If anything, the frequency of
initial p should facilitate its identification; no such effect was found.

31 The Lexique database includes frequency data based on a corpus of text from novels and also based on a corpus of text from contemporary film subtitles. As the film corpus includes more contemporary vocabulary, and is also more representative of spoken (rather than literary) vocabulary, the film frequencies were used here.

32 The negative particle pas ‘not’ is extremely frequent (token frequency per million = 9,132). However, initial p still has type and token frequencies greater than those of initial b when this frequent particle is set aside.
3.3.3. Discussion
The experimental results indicate that word-initial p is, as predicted, uniquely
perceptually difficult. While p and b are recognized with equal accuracy in both initial
and medial position, initial p is significantly slower than initial b while medial p and b are
identified with equal speed.33 Responses to t and d follow a different pattern: both initial
and medial t are recognized more quickly than d, and t responses tend to be more
accurate overall than d responses as well. The experiment was originally motivated by the
cross-linguistic observation that in a language where p, b, t, and d all contrast medially,
only p can be banned word-initially. These results demonstrate that this typological
generalization correlates with the fact that listeners have more difficulty identifying
initial p than any of the other three sounds, and suggest that the constraint *#P is
functionally grounded in these perceptual facts.
This perceptual difficulty is observed both in segments extracted from real words,
all judged to be commonly known and used by the speaker, and nonce forms which the
speaker often had difficulty producing correctly. Word-initial p is therefore difficult to
perceive both when it is pronounced in a familiar word with which the speaker has
articulatory experience, and also when it is pronounced as part of a novel word for which
more explicit attention to articulation is required. The perceptual difficulty is thus
independent of a speaker’s experience (or lack thereof) with a word, and so independent
of any effects of articulatory attention or rehearsal.

33 While the significance of initial p’s slowness relative to initial b is only marginal, no other pair of stops shows a comparable, uniquely word-initial pattern with anything even approaching significance; therefore the pattern appears to be unique and reliable within the data given.
A surprising aspect of these results is the fact that participants can typically
identify initial segments more easily than medial segments. That is, initial segments are
generally identified both more quickly and more accurately than medial segments. This is
unexpected, as medial VCV stimuli include information about stop voicing (and place) in
the transitions from preceding vowels; this information is absent in initial #CV stimuli.
Further investigation is needed in order to explain this result.
3.4. Acoustic similarity between word-initial p and b
The results of the perceptual experiment indicated that listeners identify word-initial p
more slowly (though no less accurately) than other initial stops b, t, or d. Listeners’
reaction time to medial p, however, is no slower than their reaction time to other medial
stops. This result is consistent with the hypothesis that languages like Cajonos Zapotec
and Ibibio ban p in strictly word-initial position, while no language bans only word-initial
b, t, or d, because initial p is uniquely perceptually difficult.34
This section explores the acoustic basis for this perceptual difficulty. It seems
likely that listeners’ difficulty in perceiving initial p stems from the fact that initial p is
acoustically more similar to b than initial t is to d, or medial p to b or t to d. Further, this
similarity is presumably asymmetric: more initial p tokens are b-like than vice versa. In
order to investigate the acoustic similarity between pairs of stops in various positions, the
stops’ release burst intensities and voice onset times (VOT) were measured. Both of these
properties are important cues for voicing (Lisker and Abramson, 1964; Repp, 1979), and
so the attenuation of either distinction between initial p and b could render these two
perceptually similar.

34 This perceptual difficulty is found only in reaction time, and not in measures of accuracy. See section 3.5 for discussion of why these reaction time effects alone may be sufficient to trigger phonological restrictions on word-initial segments.
3.4.1. Methods
Burst intensity and VOT were calculated for the 376 French stops used as training and
testing stimuli in the perceptual experiment. All tokens were produced by a single female
speaker of French, and were extracted from real words and nonwords recorded in
sentence-medial position; see section 3.3.1 for further details about the recordings.
3.4.1.1. Acoustic analysis
In order to measure stops’ burst intensity and VOT, the time of each stop’s release and
voicing onset were manually labeled. All acoustic analysis was conducted in Praat
(Boersma and Weenink, 2006). For each token, the release was identified at the first
instance of aperiodic noise following the closure (which was either silent or had periodic
voicing).
In order to obtain comparable burst intensity values for voiced and voiceless
stops, any voicing which was present during the release of a voiced stop needed to be
removed from the signal before the intensity of the burst itself could be calculated. To
ensure that only the intensity of the bursts themselves was considered, all of the recordings
were bandpass filtered such that only noise between 2000 and 12000 Hz was included.
These values were chosen based on the acoustic properties of the tokens used here. The
bottom of this band is high enough to ensure that voicing is disregarded, and the band’s
upper edge is high enough to ensure that no high-frequency burst energy is ignored. The
filter used was a Hanning window with 100-Hz smoothing skirts.
After filtering, intensity was calculated at 1-ms intervals across a window
beginning 5 ms before the manually labeled release point and ending 5 ms after the
release. The maximum intensity value within this interval was regarded as that token’s
maximum release intensity.
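The filtering and intensity measurement can be sketched as follows. This is a hypothetical Python reimplementation, not the Praat analysis itself: a Butterworth bandpass stands in for the Hanning-band filter described above, the dB reference of 20 µPa assumes a calibrated signal, and the sampling rate must exceed 24 kHz for the 12 kHz band edge to be valid.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def max_burst_intensity(signal, sr, release_time, half_win_ms=5):
    """Band-limit the signal to 2000-12000 Hz, compute intensity (dB) over
    1 ms frames in a window from 5 ms before to 5 ms after the labeled
    release, and return the maximum."""
    sos = butter(4, [2000, 12000], btype="bandpass", fs=sr, output="sos")
    filtered = sosfiltfilt(sos, signal)
    step = int(sr / 1000)                       # 1 ms in samples
    center = int(release_time * sr)
    best = -np.inf
    for offset in range(-half_win_ms, half_win_ms + 1):
        lo = center + offset * step
        frame = filtered[max(lo, 0):lo + step]
        if len(frame):
            rms = np.sqrt(np.mean(frame ** 2)) + 1e-12
            best = max(best, 20 * np.log10(rms / 2e-5))  # dB re 20 uPa
    return best
```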
The voiceless stops’ VOTs were also measured. Voiced stops in French are
prevoiced, and all stimuli selected for the perceptual experiment had full closure voicing.
VOT was not measured for voiced stops, as it is simply equivalent to the fully voiced
stops’ closure durations. French voiceless stops have very short positive VOTs (that is,
voicing tends to start very shortly after release). After each voiceless stop’s voicing onset
was manually labeled as described below, its VOT was calculated as the difference
between the time of voicing onset and the time of release.
Each voiceless stop’s voicing onset was labeled at the point where periodic
voicing first appeared in the waveform following the stop closure, in addition to aperiodic
burst noise which typically began shortly before the onset of voicing. High vowels (/i/,
/u/, and /y/) were often partially or fully devoiced after voiceless stops, giving stops
preceding these vowels significantly delayed VOTs. In order to calculate VOT
independently of vowel effects, only stops followed by nonhigh vowels were considered
in the VOT analysis.
3.4.1.2. Statistical analysis
As in the perceptual experiment, by hypothesis, initial p–b are more similar than medial
p–b or initial or medial t–d. To test this hypothesis, preplanned two-sample t-tests were
performed on the intensity measures, comparing the values for initial and medial p–b and
t–d.
All voiced stop tokens used in the perceptual experiment were fully voiced
throughout closure as well as during and after release, while voiceless stops had no
closure voicing and relatively short positive VOTs. Hay (2005) shows that in languages
where stops are prevoiced, listeners tend not to distinguish among stimuli with varying
amounts of prevoicing. Instead, they use the presence of prevoicing to categorically
indicate that a stop is voiced without making fine-grained temporal distinctions. Listeners
do, however, distinguish among stops with varying positive VOTs.
Following this, a shorter positive VOT (i.e. one closer to 0) was considered more
similar to that of a voiced stop than a longer positive VOT. That is, while the VOTs of
voiced and voiceless stops were never directly compared, those of voiceless stops were
compared to each other. If one voiceless stop had a significantly shorter VOT than
another, the short-VOT stop was considered more similar to its voiced counterpart than
the long-VOT stop is to its voiced counterpart. VOT similarity was evaluated via
preplanned t-tests comparing initial and medial p, to determine whether initial p’s VOT is
significantly shorter than medial p’s, and similarly initial p and t, as well as initial and
medial t for comparison. The results of these tests are reported below.
3.4.2. Results
As predicted, initial p–b are revealed to be more similar than medial p–b or initial or
medial t–d in terms of both the maximum intensities of their release bursts and also their
VOTs. As section 3.3.2.3 showed that there was no perceptual difference between stimuli
extracted from words vs. nonwords, or those with shared vs. non-shared flanking vowels,
the acoustic effects of these conditions were not investigated.
3.4.2.1. Maximum burst intensity
Looking first at the intensity measures, the maximum burst intensities of initial p and b
are not significantly different. The burst intensities of all other pairs of voiced and
voiceless segments are significantly different.
The average maximum intensity of a release burst which follows a word-initial b
is 53 dB, and the average maximum intensity of a burst following initial p is 55 dB. The 2
dB difference between these two values is not significant (all statistical results are given
in Table 5). A burst following a medial b, on the other hand, is 52 dB while that
following medial p is 56 dB. This 4 dB difference is significant, indicating that initial p–b
are more similar than medial p–b. Initial p–b are also more similar than initial t–d: an
average initial d burst is 60 dB while an average initial t burst is 64 dB, and this 4 dB
difference is significant. The similarity between initial p and b is thus not simply a
general property of initial stops, but rather a specific fact about initial labials. These
results are summarized in Table 5 and in Figure 5, which shows the average maximum
intensity in each condition with 95% confidence intervals.
            Maximum burst intensity

            Initial                                 Medial
            Mean (dB)  Difference  p value          Mean (dB)  Difference  p value
b           53                                      52
p           55         2           0.129            56         4           <0.001
                                   t(93) = 1.530                           t(93) = 4.495

d           60                                      57
t           64         4           <0.001           62         5           <0.001
                                   t(91) = 5.937                           t(93) = 7.969

Table 5. Maximum release burst intensity measures for initial and medial p, b, t, and d, with differences and p values (from preplanned two-sample t-tests) for pairs of stops differing in voicing.
Figure 5. Average maximum burst intensity in each condition, within 5 ms of release,
with 95% confidence intervals.
3.4.2.2. Voice onset time
Turning to the VOT measures, initial p and b are again more similar than medial p–b or
initial or medial t–d. This follows from the fact that initial p has the shortest VOT of all
four voiceless stops; recall that here, similarity between voiced and voiceless stops is
indicated by the shortness of voiceless stops’ VOTs.
Initial p’s 16 ms VOT is significantly shorter than that of medial p (22 ms; t(63) =
2.719, p = 0.008), rendering initial p and b significantly more similar than medial p and b.
Initial p’s VOT is also significantly shorter than initial t’s VOT (34 ms; t(64) = 7.995, p <
0.001). Initial p and b are thus also more similar than initial t and d. Further, while initial
labial stops are more similar than medial labial stops, the pattern is reversed for coronals:
medial t has a significantly shorter VOT (29 ms) than initial t (34 ms; t(57) = 2.432, p =
0.018). The tendency for initial p and b to have relatively similar VOTs compared to
medial p and b is thus a specific property of labials, rather than a general property of all
stops. These results are summarized in Figure 6, which shows the VOT of each voiceless
stop with 95% confidence intervals.
Figure 6. VOT of initial and medial voiceless labial and coronal stops followed by non-
high vowels, with 95% confidence intervals.
Figure 6 also shows that the average VOTs of both initial and medial p are
consistently shorter than those of initial and medial t, respectively, as expected. The
difference between initial p and t is significant, as reported above. Similarly, medial p’s
VOT is 22 ms, which is significantly shorter than medial t’s 29 ms VOT (t(56) = 3.135, p
= 0.003).
3.4.3. Discussion
The acoustic properties measured here provide an explanation for the unique perceptual
difficulty of word-initial p, as described in the preceding section. Word-initial p and b are
on average more acoustically similar than are medial p and b, or initial t and d. Initial p
has a shorter VOT than either initial t or medial p. As French voiced stops have
consistently negative VOTs, initial p’s short VOT indicates that it is more similar to
initial b than either initial t or medial p are to their voiced counterparts. Further, while
medial p and initial t both have significantly stronger bursts than their voiced
counterparts, initial p and b are not reliably distinguishable based on their bursts; this cue
to voicing is thus unavailable for initial labial stops.
While the differences between initial and medial stops which make initial p and b
uniquely similar have not been previously observed, the majority of the acoustic
properties of these stops observed here are as expected given what is known about VOT
and burst intensity in French stops. First, this experiment determined the average VOT of
initial p and t to be 16 ms and 34 ms, respectively. These measurements are consistent
with those of Kessinger and Blumstein (1997). Labial stops had, on average, less intense
release bursts than did coronal stops. This is also expected given the larger oral cavity
volume and thus lower oral air pressure in a labial stop compared to a coronal stop.
Likewise, the fact that voiceless stops tend to have more intense bursts than do voiced
stops is expected, as vocal fold vibration during voiced stops impedes air flow into the
oral cavity and results in a lower oral air pressure.
Turning to acoustic distinctions between initial and medial stops, word-initial
stops are generally expected to be more strongly articulated – and thus to have longer
VOTs and stronger bursts – than medial stops, as word-initial segments are subject to
domain-initial strengthening (Byrd, 2000; Keating et al., 1999). Evidence of this strength
can be found in t, which tends to have both a stronger burst and also a longer VOT when
it is word-initial. Initial t’s burst has an average maximum intensity of 64 dB, compared
to 62 dB for medial t; similarly, initial t’s average VOT is 34 ms, compared to 29 ms for
medial t. These are consistent with reports from Keating et al. (1999: 156) that in Korean,
the VOT of word-initial t is consistently longer than that of medial t.35 Initial p is thus
atypical in its relative acoustic weakness compared to medial p. This weakness is the
source of initial p’s acoustic similarity to b, which in turn is the source of its perceptual
difficulty.36
These acoustic results have shown that initial p and b are similar. But the
perceptual study indicated that these two segments are not symmetrically confusable;
instead, initial p is uniquely perceptually difficult. The source of this asymmetry must be
some property of p itself, rather than simply the similarities between p and b that have
been discussed so far.
A likely acoustic source of this asymmetrical perceptual difficulty lies in p’s
greater acoustic variability. The standard deviation of initial p’s burst is greater than that
of initial b: 6.0 and 4.7 dB, respectively.37 Similarly, while VOT was not measured for
voiced stops, initial p’s VOT is the most variable of all voiceless stops: its standard
deviation is 9.4 ms, while that of initial t is 8.7 ms, medial p 7.1 ms, and medial t 8.9 ms.
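Under a normality assumption, the asymmetry created by these standard deviations can be quantified directly: using the mean burst intensities for initial p and b given later in (99) (55 and 53 dB) together with the standard deviations above (6.0 and 4.7 dB), the proportion of p tokens at or below b's mean burst exceeds the proportion of b tokens at or above p's mean. A minimal stdlib-only sketch:

```python
import math

def norm_cdf(x, mean, sd):
    """CDF of a normal distribution N(mean, sd) evaluated at x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Initial p: mean burst 55 dB, sd 6.0; initial b: mean 53 dB, sd 4.7.
p_b_like = norm_cdf(53, 55, 6.0)      # p tokens at or below b's mean burst
b_p_like = 1 - norm_cdf(55, 53, 4.7)  # b tokens at or above p's mean burst
# p_b_like comes out near 0.37, b_p_like near 0.34:
# more b-like initial ps than p-like initial bs.
```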
Because initial p’s burst is more variable than initial b’s, there are more initial p
tokens with burst intensities equal to, or even less than, the mean burst intensity for initial
b than there are initial b tokens with burst intensities equal to or greater than the average
for initial p. Put more simply, there are more b-like initial ps than there are p-like initial bs, in terms of burst intensity. Initial p’s VOT is also more variable than, and shorter than, that of any other voiceless stop. This suggests that there are also more tokens of initial p with extremely short VOTs similar to that of a voiced stop than there are of other voiceless segments: there are again more b-like initial ps than b-like medial ps, or d-like ts in any position. It could thus be the case that while the general similarity between initial p and b makes them difficult to distinguish, the great variability of initial p makes it more b-like than initial b is p-like, thus accounting for initial p’s unique perceptual difficulty.

35 Keating et al. (1999: 154) failed to find a consistent effect of VOT on word-initial vs. syllable-initial t in French. They also failed to find a difference between initial and medial French t in terms of seal duration or linguopalatal contact. They did find that intonational phrase-initial and utterance-initial t has more and stronger contact, and tends to have a longer VOT, than word-initial or syllable-initial t.

36 The finding here that initial t is stronger than medial t also suggests that the comparative weakness of initial p is not simply due to a general property of French such as word-final stress. If this were the case, all initial segments would be consistently weaker than medial segments, as the latter fall closer to the stressed syllable.

37 The standard deviations of other consonants’ bursts: initial d = 3.4; initial t = 3.4; medial b = 4.3; medial p = 4.0; medial d = 3.1; medial t = 3.5 (all dB).
3.5. Summary and general discussion
These experiments have shown that French word-initial p is significantly more
acoustically similar to initial b (in terms of its VOT and maximum burst intensity) than
initial t is to d, and than medial p is to b. Further, initial p is acoustically more b-like than
initial b is p-like: the similarity is asymmetric. These acoustic findings are consistent with
the perceptual observation that French speakers find initial p uniquely perceptually
difficult: listeners take on average 35 ms longer to accurately identify initial p than initial
b, while no other voiceless stop shows a comparable delay relative to its voiced counterpart.
Despite these acoustic similarities, voiced and voiceless stimuli differ consistently
in the presence vs. absence of closure voicing. The presence of this reliable cue to
voicing makes initial p’s perceptual difficulty somewhat surprising, given Hay’s (2005)
finding that listeners typically make binary judgments about a segment’s voicing based
on the presence vs. absence of prevoicing. The fact that participants in the present
experiment could ever misidentify voiced segments as voiceless (and vice versa) suggests
that segments’ closure voicing (or the absence thereof) is simply not always perceived by
a listener. If further investigation of this matter shows this to be the case, these two
findings could be reconciled: listeners make binary judgments about a segment’s voicing
based on the presence vs. absence of prevoicing, when that segment’s closure voicing is
accurately perceived. When listeners misperceive this cue, however, they are prone to
delayed or ultimately incorrect decisions about voicing.
While the results of the perceptual experiment reported above correlate with the
observed restrictions on initial p, they also demonstrate that initial p’s perceptual
difficulty is fairly subtle, affecting participants’ reaction time but not their overall
accuracy.38 The modest nature of initial p’s perceptual difficulty invites the question of
whether it is ultimately sufficient to trigger the attested phonotactic restrictions.
Even when accuracy isn’t affected, delayed identification of a word-initial
segment could cause significant difficulty in word recognition. Accurate identification of
word-initial segments is uniquely important in recognizing words. Once a listener
identifies a word boundary by some means (see e.g. Cutler and Norris (1988), Marslen-
Wilson and Welch (1978), and Luce and Pisoni (1988) for theories of word
segmentation), identification of the word following this boundary is particularly reliant
on accurate identification of the word-initial segment (Marslen-Wilson, 1975; Marslen-
Wilson and Welsh, 1978; Nooteboom, 1981; Pitt and Samuel, 1995).
Identification of p-initial words could thus be affected by listeners’ delayed
identification of initial p. With an average reaction time of 588 ms, recognition of initial
p is significantly slower than that of initial b (555 ms), and also slower than initial t and
d. p-initial words should therefore be identified overall more slowly than words with other initial segments. Assuming that both speed and accuracy are important in word recognition, this delay alone could be sufficient to motivate the induction of a constraint against word-initial p.

38 The consistent presence/absence of closure voicing may allow listeners to compensate for the perceptual difficulty imposed by initial p and b’s acoustically similar VOTs and burst intensities, ultimately allowing them to correctly identify these stops.
These findings about word-initial p’s unique acoustic and perceptual properties
correlate with the cross-linguistic observation that languages like Cajonos Zapotec,
Ibibio, and Moroccan Arabic may ban p while licensing b word-initially, and also while
allowing a medial contrast between p and b. The initial p–b contrast is always neutralized
in favor of b, and never p; neither t nor d is ever banned in strictly word-initial position.
Thus the unique perceptual difficulty associated with initial p most likely is the source of
languages’ restrictions on this segment, via the functionally grounded constraint *#P.
The following chapter will investigate the nature of functional grounding by
exploring the relationship between these acoustic and perceptual facts and the constraint
*#P. To do this, chapter 4 describes a computational model in which a virtual learner
induces the phonological *#P constraint from its experience with realistic representations
of the acoustic and perceptual properties of initial and medial p, b, t, and d. The acoustic
input and perceptual output of the model are taken from the acoustic and perceptual
results found in this chapter.
Chapter 4. Modelling constraint induction
4.1. The nature of functional grounding and constraint induction
Word-initial p is phonologically marked. Evidence of this is found in languages like
Cajonos Zapotec (Nellis and Hollenbach, 1980), Ibibio (Essien, 1990), and Moroccan
Arabic (Heath, 1989). Only b is licensed word-initially, though p and b contrast in other
positions. These languages allow a contrast between other pairs of voiced and voiceless
stops in all positions, including word-initially, indicating that it is specifically initial p
which is dispreferred.
This phonological markedness correlates with initial p’s acoustic and perceptual
properties. As shown in the previous chapter, French speakers find word-initial p
significantly more perceptually difficult (as indicated by reaction time in an identification
experiment) than initial b. Medial p is no more difficult than medial b, nor is initial t
more difficult than initial d. Word-initial p is therefore uniquely perceptually difficult.
Word-initial p and b are also uniquely acoustically similar. Initial p has a much shorter
VOT than medial p or initial t; further, the maximum burst intensities of initial p and b
are not significantly different, while other voiced and voiceless stop pairs can generally
be distinguished in terms of this measure. Initial p is more variable than b, which may
explain its greater likelihood of being misidentified as b than vice versa.
Taken together, these results suggest that the acoustic similarity between initial p
and b, perhaps along with p’s more variable acoustics, are the source of listeners’
difficulty in identifying initial p. The cross-linguistic dispreference for p in word-initial
position is likely the result of its perceptual difficulty in this position. Identifying this sort
of connection between a phonological pattern and phonetic properties which make it
‘natural’ or ‘grounded’ is a central concern of phonologists (see e.g. Stampe (1973),
Hooper [Bybee] (1976), Archangeli and Pulleyblank (1994)). Recently, work in this area
has often sought functional grounding for specific Optimality Theoretic constraints (see
e.g. Hayes (1999), Smith (2002), Steriade (1999; 2001a), and papers in Hayes et al. (eds.)
(2004)).
Within this work on constraint grounding, there is general agreement that
functionally grounded constraints are those which prefer more perceptually or
psycholinguistically salient, or less articulatorily challenging, forms to those with less
salience or greater difficulty. From this perspective, the constraint which penalizes word-
initial p (‘*#P’) could be functionally grounded in initial p’s perceptual difficulty. Beyond
the basic consensus that functionally grounded constraints disprefer perceptually or
articulatorily difficult structures, however, there is relatively little discussion of what it
means for constraints to be functionally grounded; existing discussions of this issue take
various positions.
Functionally grounded constraints may be individually induced (or ‘projected’) by
a learner based on aspects of phonetic experience (Hayes, 1999; Smith, 2002; Steriade,
2001a). Prince and Smolensky (1993/2004), on the other hand,
originally proposed that all constraints are innate. If this is the case, any functional
motivation for individual constraints had its effect in the distant evolutionary past. A
great deal of other work is agnostic on this matter, searching for phonetic facts which
correlate with constraint activity while remaining uncommitted to a particular
relationship between phonetics and constraints.
The position taken here is fundamentally similar to those of Steriade, Hayes, and
Smith: functionally grounded constraints are induced by each learner, based on that
learner’s experience. Specifically, each learner determines a portion of their own
constraint inventory based on their phonetic experience with the language surrounding
them. These induced constraints are functionally grounded. The motivation for this
proposal is discussed in detail in chapter 1; briefly, the primary argument comes from
cognitive economy, as follows.
Assume for the moment that phonetic data demonstrating segments’ or features’
relative perceptual salience, their articulatory difficulty, and so on is available to learners
via their immediate linguistic experience. Further assume that there exists a reliable
mechanism for evaluating learners’ linguistic experience and inducing a set of constraints
motivated by these functional factors. The independent existence of this information, and
its availability to learners, makes any innate specifications of phonetically grounded
markedness redundant. Under the assumption that innate mechanisms for language
acquisition should contain only those specifications which are absolutely necessary,
learners should use as much information as possible from their experience. Innate
specifications should only be posited when externally-available information is
insufficient for the learning task. While the induction of functionally grounded
constraints relies on innate constraint schemata which provide the learner with
instructions for mapping perceptual or articulatory experience to phonological
constraints, the substance of the constraints themselves need not be innately encoded if it
is available in learners’ experience. Formally grounded constraints, on the other hand, are
those which cannot be consistently induced by all learners, and so must instead be innate.
The division between formally grounded (innate) and functionally grounded
(induced) constraints is therefore a matter for empirical scrutiny. A constraint can only be
universally induced from learners’ perceptual and articulatory experience if it can be
shown that all learners of all languages have consistent access to experience from which
they can induce the relevant constraint, regardless of differences in their linguistic
experience.
As argued in chapter 1, functionally grounded constraints must be induced by all
learners of all languages in order to maintain a consistent, universal constraint inventory.
In order for induced constraints to be in all learners’ constraint inventories, all learners
must have sufficient access to perceptual or articulatory evidence for these constraints. If
only some learners would have sufficient perceptual or articulatory information to induce
a constraint, the constraint must instead be innate in order to be universal.
Returning to the arguably functionally grounded constraint *#P, the claim that this
constraint can be universally induced through learners’ experience with initial p’s
perceptual difficulty depends on learners having consistent experience of this perceptual
difficulty, and having some mechanism for reliably translating this perceptual experience
to the appropriate constraint. A computational model of a learner’s perceptual experience
can be used to evaluate whether *#P can be induced from this experience. This chapter
describes such a model, which induces *#P from learners’ perceptual experience in
languages where initial p is attested and also in languages where there is no initial p. The
model produces realistic patterns of perception based on realistic acoustic representations
of initial and medial p, b, t, and d. When a constraint induction algorithm evaluates this
perceptual experience in either type of language, the constraint *#P can be consistently
induced.
The model has three components, each of which is a realistic representation of a
learner’s experience. First, in the production component, a virtual adult speaker
pronounces words with initial and medial stops whose acoustic properties are those
measured in chapter 3. This is the input to the perception component, where a virtual
learner hears these segments and develops acoustic criteria for identifying them in initial
and medial position. At the end of phonetic learning, the learner’s perceptual behavior is
equivalent to that of subjects in the perceptual experiment reported in chapter 3: it finds
word-initial p uniquely perceptually difficult. Finally, in the induction component, the
learner uses its own perceptual experience to induce constraints against segments which
meet ‘innate’ criteria for being perceptually difficult. The learner reliably induces the
attested constraint *#P without inducing other, unattested constraints; this occurs whether
the learner is exposed to pseudo-French, where initial p occurs, or pseudo-Cajonos
Zapotec (‘pseudo-CZ’), where it does not occur.
The model of perception and constraint induction described here provides
evidence for the argument that *#P is functionally grounded, as it demonstrates that this
constraint can be consistently induced by learners from realistic representations of
perceptual experience. The remainder of the chapter will describe the structure and
results of the model in detail. Section 4.2 will describe the perception and production
components. As described above, the realistic results of the perceptual model form the
basis for constraint induction, which is discussed in section 4.3.
4.2. Modelling production and perception
The production component of the model represents an adult speaker’s acoustically
realistic productions of initial and medial stops. The output of production is the input to
perception; the perception component represents a learner who listens to adult speech,
develops acoustic prototypes of segments, and learns to identify stops based on their
acoustic properties. The model also has realistic perceptual properties: it behaves
similarly to subjects in the perceptual experiment reported in chapter 3 in that it finds
word-initial p uniquely perceptually difficult. These realistic properties of the perception
model make it a reliable foundation for the perceptually-based model of constraint
induction discussed in section 4.3.
4.2.1. How the model works
This section will describe the structure of first the production model and then the
perception model. The following section will present the results of the perception model,
which demonstrate that it faithfully represents human perception.
4.2.1.1. Production and phonetic representations
In a single cycle of the model, the virtual speaker produces an utterance of the form
CaCa, where each C is either p, b, t, or d. The virtual learner’s task is to learn to identify
the consonants that it hears. The speaker is represented by the production component of
the model described here. During production, initial and medial segments are selected
from the inventory of known segments, and each consonant is ‘pronounced’ with
appropriate acoustic properties which are randomly chosen from the possible properties
of each consonant. This section will first describe the way in which stops’ acoustic
properties are represented in the model, and will then describe the speaker’s procedure
for selecting particular stops with particular acoustic properties.
Acoustic representations of stops
Each consonant in the model has four acoustic properties: place, closure voicing
(voicing), VOT, and maximum burst intensity (burst). A numeric value for each of these
properties is chosen by the speaker each time a stop is produced, and the learner uses
these acoustic values to categorize the stops it hears. In a single cycle of the model, as
will be described below, the speaker randomly chooses an initial and medial consonant to
produce. The speaker then randomly chooses values for each of the four acoustic
properties for each consonant, where the possible values for each property are taken from
the acoustic experiments described in the previous chapter. These sets of acoustic values
constitute the spoken utterance, as shown in (98). The learner hears these sets of acoustic
values and uses its developing knowledge of prototypical acoustic values for each
consonant to guess which consonants were spoken.

(98) SPOKEN: "tada!"39
     Initial t: Place = 100  Voicing = 0    VOT = 47  Burst = 61
     Medial d:  Place = 100  Voicing = 100  VOT = 18  Burst = 54
Each of the four acoustic values of a consonant in a given utterance is randomly
chosen from normal distributions with specified means and variances. Each acoustic
value is an integer between 0 and 100. Possible maximum burst intensity and VOT values
were taken directly from the experimental data reported in chapter 3. Maximum burst
intensity was measured for each consonant in each position, and the means and variances
of these distributions are used directly in the model as shown in (99).

(99)          p                b                t                d
    INITIAL   mean  variance   mean  variance   mean  variance   mean  variance
    BURST     55    36         53    22         64    11         60    11

    MEDIAL    mean  variance   mean  variance   mean  variance   mean  variance
    BURST     56    16         52    19         62    12         57    10
Within the model, the ‘VOT’ property reflects only the positive portion of each stop’s voice onset time. This is distinct from the model’s ‘closure voicing’ property described below. This distinction reflects speakers’ tendency to process the presence vs. absence of prevoicing differently from fine-grained distinctions between positive VOTs (Hay, 2005). For the voiceless stops p and t, VOT means and variances were taken directly from the experimental measures. As voiced stops do not have positive voice onset times, the VOT means for b and d were set to 0, and these stops’ variances were set to the averaged variance of all voiceless stops’ VOTs.40 Stops’ possible VOT values in the model are summarized in (100).

(100)         p                b                t                d
    INITIAL   mean  variance   mean  variance   mean  variance   mean  variance
    VOT       16    89         0     74         34    76         0     74

    MEDIAL    mean  variance   mean  variance   mean  variance   mean  variance
    VOT       22    51         0     74         29    80         0     74

39 Throughout this chapter, this font will be used to show data from the model.
Hay (2005) has shown that the presence vs. absence of prevoicing is perceived
categorically. To represent this aspect of perception in the model, the possible closure
voicing values of each stop are distributed such that there is a robust binary distinction
between voiced and voiceless stops. Voiceless stops have closure voicing values of
essentially zero (their mean is zero, and the variance is extremely small), while voiced
stops have closure voicing values of essentially 100.

(101) CLOSURE VOICING:   Voiceless mean = 0    Voiced mean = 100
                         Variance = 2          Variance = 2
Finally, the ‘place’ cue also produces a binary distinction between labial stops
(with values at or near 0) and coronal stops (with values at or near 100). As this model is concerned with voicing distinctions within a single place rather than the perception of place distinctions themselves, these extremely simple values are placeholders for more realistic sets of detailed acoustic cues to place.

(102) PLACE:   Labial mean = 0    Coronal mean = 100
               Variance = 2       Variance = 2

40 These variances for voiced stops’ VOTs are almost certainly too large; however, section 4.2.2.4 suggests that smaller, more realistic variances for these stops would make initial p even more uniquely perceptually difficult, and so would make the model behave increasingly realistically overall.
The ranges of possible acoustic values for each phonetic property are summarized
in (103). Acoustic values occurring in the model range between 0 and 100. If a normal distribution with a specified mean and variance would allow some chance for values below 0 or above 100, those values were replaced by additional 0 values or 100 values, respectively.

(103)         p                b                t                d
    INITIAL   mean  variance   mean  variance   mean  variance   mean  variance
    PLACE     0     2          0     2          100   2          100   2
    VOICING   0     2          100   2          0     2          100   2
    VOT       16    89         0     74         34    76         0     74
    BURST     55    36         53    22         64    11         60    11

    MEDIAL    mean  variance   mean  variance   mean  variance   mean  variance
    PLACE     0     2          0     2          100   2          100   2
    VOICING   0     2          100   2          0     2          100   2
    VOT       22    51         0     74         29    80         0     74
    BURST     56    16         52    19         62    12         57    10
Because these acoustic properties are represented by normal distributions, the
overall structure of the model can be easily investigated. For example, the mean or
variance of a particular property can be changed in order to investigate the perceptual
consequences of such a change. This sort of exploration of the model will be discussed in
detail in section 4.2.2. Acoustic properties could also be represented by sets of actual
values of burst intensity and VOT for each consonant as determined in the acoustic
experiments. Section 4.2.2 will also show that generalizing from these individual values
to normal distributions based on these values has essentially no consequence for the
overall performance of the model, and so that this generalization is justified. Production: Choosing particular values to pronounce
In each round of the model, the speaker randomly chooses consonants to produce in
initial and medial position. The output of each round of production is thus a single word
of the form CaCa. For each consonant, the speaker then chooses values for place, closure
voicing, VOT, and burst intensity from normal distributions with the means and variances
specified above.
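This sampling procedure can be sketched as follows (Python; the table encodes the initial-position parameters from (103), and random.gauss takes a standard deviation, hence the square root of each variance):

```python
import random

# Initial-position (mean, variance) pairs per cue, from (103).
INITIAL = {
    "p": {"place": (0, 2),   "voicing": (0, 2),   "vot": (16, 89), "burst": (55, 36)},
    "b": {"place": (0, 2),   "voicing": (100, 2), "vot": (0, 74),  "burst": (53, 22)},
    "t": {"place": (100, 2), "voicing": (0, 2),   "vot": (34, 76), "burst": (64, 11)},
    "d": {"place": (100, 2), "voicing": (100, 2), "vot": (0, 74),  "burst": (60, 11)},
}

def produce(stop, params=INITIAL):
    """Sample one token of `stop`: draw each cue from a normal distribution
    with the specified mean and variance, then clip to the model's
    0-100 integer range (out-of-range draws become 0 or 100)."""
    token = {}
    for cue, (mean, variance) in params[stop].items():
        value = random.gauss(mean, variance ** 0.5)
        token[cue] = min(100, max(0, round(value)))
    return token
```

Over many draws, the sampled VOTs of an initial t cluster around the 34 ms mean in (103), as intended.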
Because the acoustic properties of ‘spoken’ stops are taken from acoustic
measurements of naturally produced stops, the production component of the model
accurately represents the acoustic properties of a learner’s linguistic experience. The
result of a round of production is repeated in (104).

(104) SPOKEN: "tada!"
      Initial t: Place = 100  Voicing = 0    VOT = 47  Burst = 61
      Medial d:  Place = 100  Voicing = 100  VOT = 18  Burst = 54
4.2.1.2. Perception: Hearing, phoneme identification, and category learning
The perception component of the model is more complex than the production component.
After a set of acoustic values is produced, the learner ‘hears’ these acoustic values
somewhat imperfectly. The learner guesses which stop was produced in each position of
the CaCa word by comparing the heard values to prototypical values (which emerge from
the learner’s experience) for each stop, looking for the prototype most similar to the
heard segment. The model assumes that the learner receives feedback as to which stop
was actually produced. This feedback is used to adjust the prototype for the heard stop
based on the new acoustic information. Finally, the model stores information about
whether it accurately identified the stop, for use in constraint induction. These processes
are described in more detail below.
Overall, this procedure allows the perception component of the model to map
realistic acoustic data to realistic patterns of perception. It takes experimentally-
determined acoustic properties as its input, and produces a pattern of perceptual accuracy
consistent with experimental results. Crucially, the model finds word-initial p more
perceptually difficult than initial b, without similarly finding medial p more difficult than
medial b, or initial t more difficult than initial d; these results are presented in section
4.2.2. This realistic model of a learner’s perceptual experience can then be used to test the
model of constraint induction described in section 4.3.

Hearing: Not all spoken acoustic values are perceived accurately
In order to model imperfect perception (as in a noisy environment), the transmission of
the spoken acoustic values is imperfect in two ways: some acoustic values are not heard
at all, and those which are heard may be perturbed slightly.
First, each acoustic property has some fractional likelihood of being heard,
causing the learner to fail to hear some acoustic properties. The motivation for
occasionally dropping cues comes from speakers’ imperfect performance in the
perceptual experiment. In the experimental materials, the closure portion of each voiced
stop was fully voiced, and that of each voiceless stop was entirely voiceless. If this cue
were consistently perceived, its acoustically binary nature should have allowed listeners to
perfectly categorize stimuli for voicing. As listeners regularly misidentified voiced
segments as voiceless and vice versa, they must not have been consistently able to hear,
or make perceptual use of, this acoustically reliable cue. In the model, each cue to voicing
(closure voicing, VOT, and burst intensity) is heard for 75% of stops, and the single place
cue is heard in 95% of consonants. Place is heard more frequently than any of the voicing
cues because subjects in the perceptual experiment made fewer place mistakes than
voicing mistakes.41
When some property of a consonant is heard by the learner, its perception can
also be slightly imperfect. A spoken acoustic value is transformed into a heard acoustic
value by randomly choosing a value from a normal distribution whose mean is the spoken
value and whose variance is very small; the variance for all such distributions here is 2.
The spoken properties given above can thus be heard as a subset of imperfectly-
transmitted values as in (105), where the learner fails to hear the medial d’s closure
voicing and burst cues at all and its VOT property is inaccurately heard as 17 rather than
the spoken 18.

(105) SPOKEN: "tada!"
      Initial t: Place = 100  Voicing = 0    VOT = 47  Burst = 61
      Medial d:  Place = 100  Voicing = 100  VOT = 18  Burst = 54

      HEARD:
      Initial:   Place = 100  Voicing = 0    VOT = 47  Burst = 61
      Medial:    Place = 100  Voicing =      VOT = 17  Burst =
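These two transmission steps (occasional cue loss, slight perturbation) might be sketched as:

```python
import random

# Transmission probabilities and perturbation variance, as given in the text.
HEAR_PROB = {"place": 0.95, "voicing": 0.75, "vot": 0.75, "burst": 0.75}
NOISE_VAR = 2

def hear(spoken):
    """Return the subset of cues the learner hears. Each cue survives with
    its transmission probability; heard values are drawn from a normal
    distribution centered on the spoken value, with variance 2."""
    heard = {}
    for cue, value in spoken.items():
        if random.random() < HEAR_PROB[cue]:
            heard[cue] = random.gauss(value, NOISE_VAR ** 0.5)
    return heard
```

On average, place survives transmission in 95% of tokens and each voicing cue in 75%, mirroring subjects' lower rate of place errors.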
Identification: Comparing heard values to prototypes
In order to identify the spoken consonant from its acoustic properties, the learner
compares the set of heard acoustic properties to prototypes of each of the possible
consonants. The learner guesses that the prototype most similar to the set of heard properties is the consonant produced by the speaker.

41 Section 4.2.2.2 discusses other possible rates of cue transmission, showing that these arbitrary values are not crucial to the ultimate perceptual results of the model.
A prototype of a particular consonant is a four-dimensional vector whose
coordinates represent the average value for each of that consonant’s four acoustic
properties, based on all tokens of that consonant heard by the learner thus far. Examples
of these prototype coordinates are given in (106).

(106) PROTOTYPES:
      Initial   Place   Voicing   VOT    Burst
      p:        3.9     4.3       16.5   54.1
      b:        3.5     96.1      4.5    53.4
      t:        94.8    7.4       30.2   65.3
      d:        96.8    96.2      3.7    60.1

      Medial    Place   Voicing   VOT    Burst
      p:        4.1     5.6       20.9   55.4
      b:        4.3     94.0      5.6    52.7
      t:        95.5    5.6       22.9   63.0
      d:        96.9    95.7      5.2    57.3
In order to guess which stops were heard in a particular CaCa word, the listener
calculates the distance between the points represented by the heard property values and
those of each prototype. When all four cues are heard, as for the initial stop in (107)
(repeated from above), distance is calculated by the formula in (108). As the model is
concerned with the details of voicing identification but not with place identification, there
are three cues to voicing but only one for place. In order to give equal weight to place and
voicing, the single place cue is more heavily weighted in the distance calculation.

(107) SPOKEN: "tada!"
      Initial t:  Place = 100   Voicing = 0     VOT = 47   Burst = 61
      Medial d:   Place = 100   Voicing = 100   VOT = 18   Burst = 54

      HEARD:
      Initial:    Place = 100   Voicing = 0     VOT = 47   Burst = 61
      Medial:     Place = 100   Voicing = (not heard)   VOT = 17   Burst = (not heard)
(108) distance = sqrt( 3*(placeHeard – placeC)² + (voiHeard – voiC)²
                     + (votHeard – votC)² + (burstHeard – burstC)² )
The distance between each prototype and the set of heard values is calculated,
producing a set of distances between the heard stop and each prototype as in (109). The
learner guesses that the heard stop is in the category of the nearest prototype. In this
example, the learner guesses correctly that the initial stop was t. If two or more
prototypes are equidistant from the heard stop, the learner guesses randomly among these
equally likely possibilities.

(109) Initial   Distance          Prototype coordinates
                (prototype ~ X)   Place   Voicing   VOT    Burst
      p:        169               3.9     4.3       16.5   54.1
      b:        198               3.5     96.1      4.5    53.4
      t:        21                94.8    7.4       30.2   65.3
      d:        106               96.8    96.2      3.7    60.1
      Guess: t (correct)
When the listener fails to hear all of the acoustic properties of some stop, as is the
case for the medial stop in example (107) above, distance is calculated based on only
those properties heard. For example, in this case where the learner heard only place and
VOT values, distance between this stop and each prototype is calculated using only the
prototype values for place and VOT. The reduced equation which calculates distance in
this case is given in (110). This produces the set of distances in (111), and the shortest of
these prompts the learner to incorrectly guess that the medial consonant was t.

(110) When only Place and VOT cues are heard:
      distance = sqrt( 3*(placeHeard – placeC)² + (votHeard – votC)² )

(111) Medial    Distance          Prototype coordinates
                (prototype ~ X)   Place   Voicing   VOT    Burst
      p:        166               4.1     5.6       20.9   55.4
      b:        166               4.3     94.0      5.6    52.7
      t:        10                95.5    5.6       22.9   63.0
      d:        13                96.9    95.7      5.2    57.3
      Guess: t (incorrect)
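The identification procedure in (108) and (110) amounts to a single weighted nearest-prototype search in which unheard cues are simply skipped; the place term carries weight 3 to balance the three voicing cues. A sketch (all names are mine), using the medial prototype coordinates from (106):

```python
import math
import random

WEIGHT = {"place": 3.0, "voicing": 1.0, "vot": 1.0, "burst": 1.0}

def distance(heard, prototype):
    """Weighted Euclidean distance over only the cues that were heard."""
    total = 0.0
    for cue, value in heard.items():
        if value is not None:  # unheard cues contribute nothing
            total += WEIGHT[cue] * (value - prototype[cue]) ** 2
    return math.sqrt(total)

def identify(heard, prototypes, rng=random):
    """Guess the nearest prototype; ties are broken randomly."""
    dists = {seg: distance(heard, proto) for seg, proto in prototypes.items()}
    best = min(dists.values())
    return rng.choice([seg for seg, d in dists.items() if d == best])

medial_prototypes = {
    "p": {"place": 4.1, "voicing": 5.6, "vot": 20.9, "burst": 55.4},
    "b": {"place": 4.3, "voicing": 94.0, "vot": 5.6, "burst": 52.7},
    "t": {"place": 95.5, "voicing": 5.6, "vot": 22.9, "burst": 63.0},
    "d": {"place": 96.9, "voicing": 95.7, "vot": 5.2, "burst": 57.3},
}
heard = {"place": 100, "voicing": None, "vot": 17, "burst": None}
print(identify(heard, medial_prototypes))  # → t, reproducing the guess in (111)
```

Rounding the four distances reproduces the values 166, 166, 10, and 13 given in (111).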
In this model, in addition to guessing which segments were heard based on the
learner’s own acoustic representations, the learner is also assumed to know which
segments were actually produced by the speaker. In this assumption, the model is similar
to many learning algorithms for OT constraint rankings which assume that the learner
compares observed surface forms to the underlying representations from which they were
derived (see e.g. Tesar and Smolensky (1994 et seq.), Boersma and Hayes (2001)). In
giving this phonetic learner access to the ‘underlying representation’ of the speaker’s
utterance, in addition to the surface acoustic values, this model focuses on learning the
relationship between a given set of categories and their possible acoustic realizations. Just
as elaborated models of phonological learning have been proposed in which learners
discover underlying representations as well as constraint rankings (Jarosz, 2006;
Merchant and Tesar, to appear), this model could be elaborated such that the learner
would discover the phonetic categories themselves, as in de Boer’s model of vowel
inventories (2001).
The learner uses its knowledge of the segments actually produced by the speaker
to track its own rates of perceptual difficulty, and uses this knowledge of perceptual
difficulty as the basis for constraint induction. Knowing which segments are heard in a
given utterance is also important in learning the coordinates of the prototypes as
described below. This knowledge is crucially not available, however, to the component of
the model which compares prototype coordinates and heard acoustic values. That is, the
learner attempts to identify segments based only on their acoustic properties, and
effectively ‘finds out’ which segment was actually produced only after it has guessed the
segment’s identity.
Once the learner has attempted to identify a segment and received feedback about
the segment’s actual identity, the learner calculates and tracks two aspects of perceptual
difficulty: accuracy and false alarms. Accuracy measures the learner’s ability to correctly
identify tokens of a particular consonant; in other words, accuracy scores address the
question, ‘Of all the tokens of initial p the learner has heard, how many have been
correctly identified?’ False alarms measure the learner’s ability to guess that some
particular consonant was heard only when this is true. That is, false alarm scores address
the question, ‘Of all the times the learner guessed that it heard initial p, how many of
those guesses were wrong?’ The formulas for calculating these two scores are given in
(112), and sample accuracy and false alarm rates are given in (113).

(112) For some segment x:
      Accuracy(x)   = [ # x tokens correctly identified ] ÷ [ # x tokens heard ]
      FalseAlarm(x) = [ # incorrect x responses ] ÷ [ # x responses ]

(113) Initial   Accuracy          False alarm
      p:        80% (12 of 15)    8% (1 of 13)
      b:        88% (15 of 17)    12% (2 of 17)
      t:        92% (11 of 12)    21% (3 of 14)
      d:        94% (17 of 18)    6% (1 of 18)

      Medial    Accuracy          False alarm
      p:        93% (13 of 14)    19% (3 of 16)
      b:        86% (12 of 14)    8% (1 of 13)
      t:        77% (10 of 13)    17% (2 of 12)
      d:        90% (19 of 21)    10% (2 of 21)
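The two scores in (112) can be computed from a record of (actual, guessed) pairs; the sketch below and its trial representation are my own, not the dissertation's:

```python
from collections import Counter

def scores(trials):
    """Accuracy and false-alarm rates per segment, from (actual, guess) pairs."""
    heard = Counter(actual for actual, _ in trials)
    responses = Counter(guess for _, guess in trials)
    correct = Counter(actual for actual, guess in trials if actual == guess)
    # Accuracy(x): correct identifications of x out of all x tokens heard
    accuracy = {x: correct[x] / heard[x] for x in heard}
    # FalseAlarm(x): wrong 'x' guesses out of all 'x' responses
    false_alarm = {x: (responses[x] - correct[x]) / responses[x]
                   for x in responses}
    return accuracy, false_alarm

trials = [("p", "p"), ("p", "b"), ("b", "b"), ("t", "t"), ("d", "t")]
acc, fa = scores(trials)
print(acc["p"])  # 1 of 2 p tokens correctly identified → 0.5
print(fa["t"])   # 1 of 2 't' responses was a wrong guess → 0.5
```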
Learning: Prototypes are adjusted based on new acoustic information
Finally, the learner adjusts the coordinates of its prototypes based on the acoustic values
that it hears in a particular round. In this way, the prototype coordinates change over
time. As the learner has more perceptual experience, the prototypes come to represent
each consonant’s acoustic properties with increasing accuracy.
Each of a prototype’s coordinates represents one of the stop’s four acoustic
properties. The value of each coordinate is the average of all acoustic values of that
property for the stop which the learner has heard. At the beginning of the simulation, the
coordinates of each prototype are set to the default values given in (114). These defaults
are identical for all four initial stops and for all four medial stops. Each of these values is
the average of the four consonants’ mean values for the particular acoustic property. For
example, each word-initial stop prototype has an initial ‘place’ value of 50 because this is
the average of the mean place values of initial p (0), b (0), t (100), and d (100). These
simulation-initial values give the model no inherent bias towards any of the four stops.

(114) INITIAL DEFAULTS:
      Initial   Place   Voicing   VOT    Burst
      p:        50      50        12.5   58
      b:        50      50        12.5   58
      t:        50      50        12.5   58
      d:        50      50        12.5   58

      Medial    Place   Voicing   VOT    Burst
      p:        50      50        12.8   56.8
      b:        50      50        12.8   56.8
      t:        50      50        12.8   56.8
      d:        50      50        12.8   56.8
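As a quick check on (114): each default is just the average of the four consonants' means for that cue. The sketch below is mine, with initial place means from the text and initial VOT means from (116).

```python
# Per-cue means of the four initial stops (place from the text; VOT from (116)).
initial_means = {
    "place": {"p": 0, "b": 0, "t": 100, "d": 100},
    "vot":   {"p": 16, "b": 0, "t": 34, "d": 0},
}
# Default prototype coordinate = average of the four means for that cue.
defaults = {cue: sum(m.values()) / len(m) for cue, m in initial_means.items()}
print(defaults)  # place → 50.0 and VOT → 12.5, matching (114)
```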
As the learner gains linguistic experience, it adjusts the coordinates of these
prototypes so that they come to reflect the segments’ actual acoustic properties. For
example, in the round of the simulation discussed here, the initial t is the twelfth initial t
heard by the learner. The most recent set of acoustic values for t is averaged with the
previous 11 sets of values (plus the initial default values) to get a new set of coordinates
for the initial t prototype, given in (115), which represents all of the learner’s experience
with initial t tokens to date.

(115) ADJUSTED PROTOTYPES:
                      Place   Voicing   VOT    Burst
      Initial t -->   95.2    6.6       31.8   64.8
      Medial d  -->   97.0    95.7      5.9    57.3
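The adjustment is an incremental per-cue running mean in which the simulation-initial default counts as a first observation. A sketch (class and attribute names are mine):

```python
class Prototype:
    """Running per-cue mean of all heard values, seeded with the defaults."""

    def __init__(self, defaults):
        self.coords = dict(defaults)
        self.n = {cue: 1 for cue in defaults}  # default = first 'observation'

    def update(self, heard):
        for cue, value in heard.items():
            if value is not None:  # unheard cues leave the mean unchanged
                self.n[cue] += 1
                # incremental mean: m_new = m_old + (x - m_old) / n
                self.coords[cue] += (value - self.coords[cue]) / self.n[cue]

t_proto = Prototype({"place": 50, "voicing": 50, "vot": 12.5, "burst": 58})
t_proto.update({"place": 100, "voicing": 0, "vot": 47, "burst": 61})
print(t_proto.coords["vot"])  # mean of the default 12.5 and heard 47 → 29.75
```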
4.2.2. Results and discussion
The production and perception components of the model described above represent an
extremely basic picture of phonetic perception. As the following discussion will show,
this basic model ‘perceives’ the four stops p, b, t, and d in initial and medial positions
very much like subjects in the perceptual experiment reported in the previous chapter.
Section 4.2.2.1 presents results which show that the model, like human listeners, finds
word-initial p uniquely perceptually difficult.
One analytical possibility offered by any perceptual model is that the model itself
can be modified, and the consequences of these adjustments used to illuminate the inner
workings of the model. In this model, the values of individual parameters can be varied:
acoustic properties’ means and variances can be changed, properties can be heard more or
less frequently, or individual acoustic properties can be removed from the model entirely.
Section 4.2.2.2 will justify some of the model’s arbitrary parameter settings in this way.
Section 4.2.2.3 will then demonstrate that the source of the model’s unique perceptual
difficulty with word-initial p follows from the variability of initial p’s VOT values. In
this way, the model can be used to generate hypotheses for future perceptual experiments.
4.2.2.1. General results: Initial p is perceptually difficult
In general, the perceptual model behaves very much like subjects in the perceptual
experiment: it finds word-initial p uniquely perceptually difficult. Initial p is more
frequently misidentified than its voiced counterpart (initial b), while no similar relationship holds between medial p and b or between initial t and d. A difference between the
model and the subjects is found in the specific indicators of perceptual difficulty, which
is measured in the model only by accuracy scores. This, like many aspects of the model,
is a simplification of real behavior, where perceptual difficulty can be indicated by either
accuracy or reaction times. The perceptual experiment reported in chapter 3 found
indications of initial p’s perceptual difficulty in subjects’ reaction times, but not in their
overall accuracy. This is likely a consequence of the specific task. As speed and accuracy
are both indicators of the same fundamental perceptual difficulty (Ashby and Maddox,
1994; Pisoni and Lazarus, 1973), simplification to an accuracy-only model is justified.
The model’s overall rates of accurately identifying various segments are
determined by averaging the results of many simulations, much like subjects’ overall
ability to perceive segments is typically evaluated by averaging experimental results from
many subjects. Each simulation represents the progress of a single learner towards stable
phonetic categories, which allow the learner to accurately identify segments at stable
rates. Figure 7 represents the model’s changing ability to accurately identify each initial
and medial consonant, averaged over 20,000 simulations. The accuracy measure for each
consonant begins at chance. As the learner gains experience with the segments, its ability
to accurately identify each first increases and then stabilizes over the course of a 300-
round simulation.
In Figure 7 (and throughout this chapter), the percent correct for some segment in
some round is measured as follows: out of 20,000 simulations, given some segment (e.g.
initial p) and some point in time (e.g. round 200), the percent correct measure compares
the number of accurate identifications of that segment in that round to the total number of
times that segment was randomly selected and produced in that round. That is, out of
20,000 simulations, the model ‘heard’ initial p approximately 5,000 times; initial p was
accurately identified in 91% of those instances. As the data are somewhat noisy, even
when averaged over 20,000 simulations, the lines in the graphs in this section (as in
Figure 7) are moving averages over 15-round windows. The percent correct shown for
initial p at round 200 is actually the average percent correct for initial p in rounds 193
through 207. Lines are labeled by the segments in boxes at the right of each graph, which
show segments’ order from most to least accurate.
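The smoothing just described can be sketched as a centered moving average whose window is truncated at the ends of the series (this implementation is mine; with window=15, the value for round 200 averages rounds 193 through 207, as in the text):

```python
def moving_average(series, window=15):
    """Centered moving average; the window shrinks at the series' edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

print(moving_average([0, 0, 1, 1, 1], window=3))
```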
Figure 7. Model accuracy for each initial (a) and medial (b) consonant, averaged across 20,000 simulations of 300 rounds each. The lines represent moving averages across 15-round windows.
The perceptual model gives the same primary result as the perceptual experiment
reported in chapter 3. Word-initial p is consistently more difficult for the model to
identify than word-initial b. This difference is unique to labials in initial position, as no
comparable accuracy difference is seen in medial p and b. There is also no comparable
difference between the accuracies of initial t and d, indicating that word-initial p is
uniquely perceptually difficult.
4.2.2.2. Justifying assumptions in the model
This section will examine the consequences of various simplifying assumptions made in
designing the model, demonstrating that none of these are central to the overall pattern of
results. These assumptions include the rates at which the learner fails to hear acoustic
cues, and the generalization from actual acoustic values to those from normal distributions.

Voicing cue transmission rates
In the ‘basic’ version of the model described in section 4.2.1, the learner doesn’t always
hear every acoustic property of every segment. Specifically, learners in the basic model
have a 5% chance of failing to hear a segment’s place cue, and a 25% chance of failing to
hear each of the voicing cues (closure voicing, VOT, and maximum burst intensity).
Without some rate of dropping each cue, the place and closure voicing cues would allow
the learner to identify all segments with perfect accuracy. This is because these cues have
widely different means and very small variances, and so give rise to perfect
categorization when they are heard. Therefore, for segments to be occasionally
misidentified as in the perceptual experiment, there must be some chance that each cue
goes unheard. Because subjects make fewer place mistakes than voicing mistakes, place
cues are heard more frequently than voicing cues.
The rates of cue dropping in the basic model satisfy the requirements that cues
must be dropped at some rate, and that place cues must be dropped less frequently than
voicing cues. However, versions of the model in which voicing cues are dropped in
something other than 25% of the cases show that other rates of dropping voicing cues
would preserve the core perceptual properties of the model. As shown in Figure 8, the
ordinal and proportional relationships between initial segments’ identification accuracies
are preserved under different rates of cue dropping.
Figure 8. Model accuracy for each initial consonant, where place cues are dropped in 5% of heard utterances and closure voicing, VOT, and burst cues each dropped in 10% (a), 25% (b), or 50% (c) of heard utterances. Results are averaged across 20,000 simulations of 300 rounds each; lines represent moving averages across 15-round windows.
Assuming that a learner has a consistent 5% chance of failing to hear the place
cue, any stable rate of dropping the voicing cues (here, 10%, 25%, or 50%) results in a
perceptual pattern where initial p is more difficult to perceive than b but no such
difference exists between t and d. Higher rates of dropping voicing cues simply
exaggerate the accuracy differences among segments. While this exaggeration is most
evident in the differences between p and b’s accuracies, the difference between t and d’s
accuracies also expands with more cues dropped, particularly in early rounds of the
model.
Some arbitrary choice of rates at which cues are dropped is necessary. For this
reason, all versions of the model discussed in this chapter drop place cues from 5% of
segments and each other cue from 25% of segments. However, as the relative perceptual
difficulty of p does not depend on these particular values, the model could be fine-tuned
to more closely resemble human perception with more information about listeners’ ability
to hear and make use of each voicing cue.

Generalizing from real acoustic values to normally-distributed acoustic values
The production component of the model produces segments with realistic acoustic values
by choosing each of a segment’s four acoustic properties from normal distributions with
particular means and variances. For the (positive) VOT and maximum burst intensity
cues, these means and variances match those experimentally identified for each segment.
In choosing these values from normal distributions rather than simply selecting them
from the sets of measured values, the production component generalizes slightly beyond
the data obtained experimentally.
By comparing this basic version of the model with a version where VOT and
burst values are selected from exactly the sets of acoustic values measured
experimentally, it can be shown that this generalization is valid. Either method of
choosing burst and VOT values gives the same central perceptual results, namely lower
accuracy for initial p than b, but no similar lower accuracy for initial t than d or for
medial p than b. The results of these two versions of the model are compared in Figures 9
and 10.
Figure 9. Model accuracy for each initial consonant based on actual acoustic measurements (a) and on values selected from normal distributions whose means and variances are based on actual acoustic measurements (b). Results are averaged across 20,000 simulations of 300 rounds each; lines represent moving averages across 15-round windows.

Figure 10. Model accuracy for each medial consonant based on actual acoustic measurements (a) and on values selected from normal distributions whose means and variances are based on actual acoustic measurements (b). Results are averaged across 20,000 simulations of 300 rounds each; lines represent moving averages across 15-round windows.
There are three primary differences between a model which uses real acoustic
values and the basic model, in which values are chosen from constructed normal
distributions. First, while initial p is less accurately recognized than initial b in either
case, its accuracy (and that of initial b) is generally lower in the basic model. Second, a
model using real values identifies initial t somewhat more accurately than initial d, while
the basic model fails to consistently make this distinction. Finally, the model which uses
real values recognizes medial coronals more accurately than labials, while the basic
model recognizes each medial voiced segment slightly more accurately than its voiceless
counterpart; in both cases, however, there is very little difference in the accuracy of the
four medial segments.
The basic model thus exaggerates the contrast of interest here (initial b vs. p),
while neutralizing irrelevant differences (initial t vs. d; medial labials vs. coronals). This
generalization allows individual acoustic properties’ means and variances to be changed
and the perceptual consequences of these changes used to illuminate the inner workings
of the model.
4.2.2.3. The source of initial p’s perceptual difficulty: VOT variances
The acoustic experiments reported in chapter 3 found a correlation between the acoustics
of initial p and its perceptual difficulty. Initial p’s VOT and burst intensity are more
acoustically similar to those of initial b than other voiceless segments are to their voiced
counterparts. Initial p is also more acoustically variable than initial b. Together, these
properties are likely the source of listeners’ difficulty in accurately identifying word-
initial p.
Because the perceptual model accurately represents both the acoustics and relative
perceptibility of the segments in question, it can be used to develop more detailed
hypotheses about the relationship between particular acoustic properties and the
perceptual difficulty of word-initial p. Specifically, by removing individual acoustic
features (e.g. VOT and burst intensity) from the model, and by changing individual
segments’ means and variances for these properties, we can identify those specific
acoustic features which cause the model to identify initial p less accurately than initial b.
These results can then suggest the direction of further perceptual experiments.
The relative importance of burst intensity and VOT can be explored in versions of
the model where each is the sole cue to voicing. If such a model produces patterns of
perception similar to the basic version of the model, where initial p is uniquely
perceptually difficult, the single voicing cue in that model contributes significantly to this
perceptual result. If, instead, the presence of only a single voicing cue changes the
perceptual results dramatically, then that cue is not responsible for the basic pattern.
In order to focus exclusively on the effects of burst and VOT in making voicing
distinctions, the model will be simplified somewhat from the basic version discussed
above. The binary closure voicing cue will be removed from the model entirely, as it
provides the learner with a perfect cue to voicing. All other cues which are present will
be heard in 100% of the heard utterances, so that the only perceptual difficulty in the
model comes from the inherent properties of VOT and burst cues.
Figure 11 compares this ‘place-VOT-burst’ model, where the closure voicing cue
is never heard and place, VOT, and burst are always heard, to the basic model. The two
are qualitatively very similar: initial p is accurately identified much less frequently than
initial b, and both are less accurately identified than t and d, whose accuracies are fairly
similar. This similarity is also seen in the confusion matrices in Figure 12. As the place
cue is always heard here, the place-VOT-burst model never makes a place mistake;
otherwise, both models produce similar relative patterns of results. Initial p is mistaken
for b more often than initial b is mistaken for p, and t is mistaken for d slightly more
often than d is mistaken for t.
Figure 11. Model accuracy for each initial consonant where place, VOT, and burst cues are always heard, and closure voicing is never heard (a); also for the basic version of the model (b). Values are averaged over 20,000 300-round simulations.
(a) Place-VOT-burst model:
                 Response
  Segment      p      b      t      d
     p        73%    27%
     b        12%    88%
     t                     96%     4%
     d                      0%   100%

(b) Basic model:
                 Response
  Segment      p      b      t      d
     p        91%     8%     1%     0%
     b         5%    93%     0%     2%
     t         1%     0%    97%     3%
     d         0%     1%     2%    97%

Figure 12. Confusion matrix for initial consonants where place, VOT, and burst cues are always heard, and closure voicing is never heard (a); also for the basic version of the model (b). Data collected from the last 15 rounds of each of 20,000 simulations.
Using the place-VOT-burst model as a baseline, the perceptual results of models
which make voicing decisions based on only VOT or only burst cues can now be
explored. Figure 13 shows the results of a model in which burst cues are never heard and
so identification is based on only place and VOT cues. Figure 14 shows a confusion
matrix for the same data. The results of this place-VOT model are extremely similar to
those of the place-VOT-burst model given in Figures 11 and 12. Removing burst cues has
very little effect on the overall patterns of recognition. Initial p’s perceptual difficulty
therefore appears to be due largely to VOT cues, and not to burst cues.
Figure 13. Model accuracy for each initial consonant where place and VOT cues are
always heard; closure voicing and burst cues are never heard. Values are averaged over 20,000 300-round simulations.
                 Response
  Segment      p      b      t      d
     p        73%    27%
     b        13%    87%
     t                     96%     4%
     d                      1%    99%

Figure 14. Confusion matrix for initial consonants where place and VOT cues are always heard; burst and closure voicing are never heard. Data collected from the last 15 rounds of each of 20,000 simulations.
This is confirmed by a version of the model in which place and burst cues are
always heard, while VOT and closure voicing are never heard. This place-burst model,
whose results are shown in Figures 15 and 16, produces a pattern of perception quite
unlike that described above. While initial p is still somewhat less accurately identified
than initial b, this difference is comparable in size to the difference between initial d and
t. Unlike in the place-VOT-burst model, the place-VOT model, the basic version of the model, and human perceptual results, initial p's perceptual difficulty is not particularly unique here. This indicates that VOT cues, rather than burst cues, are responsible
for the general perceptual behavior of the model.
Figure 15. Model accuracy for each initial consonant where place and burst cues are
always heard; closure voicing and VOT cues are never heard. Values are averaged over 20,000 300-round simulations.
                 Response
  Segment      p      b      t      d
     p        56%    44%
     b        42%    58%
     t                     72%    28%
     d                     29%    71%

Figure 16. Confusion matrix for initial consonants where place and burst cues are always heard; VOT and closure voicing are never heard. Data collected from the last 15 rounds of each of 20,000 simulations.
The discussion of experimental results in chapter 3 hypothesized that initial p’s
greater variability, as compared to initial b, is the source of its asymmetric perceptual
difficulty. That is, initial p and b are confusable with each other (and so both somewhat
inaccurate) because of their similarity. p is perhaps more confusable with b than vice
versa because its acoustics are more variable, resulting in more b-like pronunciations of p
than vice versa. Variations on the model have already demonstrated that p’s perceptual
difficulty (within the model) results from properties of the VOT cue. The model can be
further varied to test the hypothesis that segments’ VOT variances are the primary source
of this asymmetry.
All four segments have quite large VOT variances in the model; these values are repeated in (116). When the basic model (with all four cues present, at their default rates
of dropping) is modified such that each segment’s VOT variance is 10, the resulting
pattern of perception is quite different; this is shown in Figure 17. All four initial
segments are identified with very nearly equal accuracy. Crucially, the asymmetry
between p and b disappears.
(116) INITIAL VOT:   mean   variance
      p               16       89
      b                0       74
      t               34       76
      d                0       74
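The effect of these variances can be illustrated with a small Monte Carlo sketch outside the full model (my simplification, not the dissertation's simulation): classify initial p and b from the VOT cue alone, using the means and variances in (116), with negative draws replaced by 0 as described in footnote 42.

```python
import random

random.seed(1)  # fixed seed for reproducibility
MEAN = {"p": 16, "b": 0}
VAR = {"p": 89, "b": 74}
N = 200_000

def sample_vot(seg):
    # negative draws are replaced with 0, as in the model (footnote 42)
    return max(0.0, random.gauss(MEAN[seg], VAR[seg] ** 0.5))

def classify(vot):
    # nearest-mean decision on the single VOT cue
    return min(MEAN, key=lambda seg: abs(vot - MEAN[seg]))

p_err = sum(classify(sample_vot("p")) == "b" for _ in range(N)) / N
b_err = sum(classify(sample_vot("b")) == "p" for _ in range(N)) / N
print(p_err > b_err)  # p's larger variance makes p→b confusions more common
```

Because p's VOT distribution is wider, more of its mass falls on b's side of the decision boundary than vice versa, mirroring the asymmetry between the 27% and 13% confusion rates in Figure 14.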
Figure 17. Model accuracy for each initial consonant where the VOT variance of each initial segment is 10 (a) and in the basic model (b). Values are averaged over 20,000 300-round simulations.
This suggests that it is in fact segments’ large VOT variances – and initial p’s
particularly large variance – which give rise to initial p’s unique perceptual difficulty.
The segments’ VOT distributions (in the basic model) are shown in Figure 18; segments’
confusability can be explained by reference to these distributions.
Figure 18. VOT probability distributions for initial labial (a) and coronal (b) segments in the perceptual model.42
Considering again the results of the place-VOT model in figures 13 and 14 above,
initial t and d are rarely confused because of the relatively great distance between their
VOT means, as shown in Figure 18. t’s slightly greater variance accounts for the fact that
it is misidentified as d (4%) slightly more often than d is misidentified as t (1%). b’s VOT
variance is the same as that of d, but b is more frequently misidentified as p (13%)
because of the smaller distance between the VOT means of p and b. Finally, the model’s
greater rate of misidentifying p as b (27%) than vice versa, like the asymmetry between t
and d, follows from p’s larger VOT variance. These trends are all present in the basic
42 All acoustic values in the model are between 0 and 100. For this reason, if the probability
distribution for some feature would include values below 0, those values are replaced with additional 0 values. This is why the VOT distributions for initial b, p, and d contain disproportionate numbers of 0. Further, because of the discrete nature of the model, tails of the distributions are not infinite but rather end when response probabilities are less than 1%.
model, though its accuracy is generally better than in the place-VOT model due to the
presence of closure voicing (and burst) cues.
4.2.2.4. Summary and discussion
This model is based on realistic acoustic representations of voicing cues for the initial and
medial stops p, b, t, and d. From this data, the virtual learner develops criteria for
identifying each stop, ultimately presenting a pattern of perception very similar to that
found in subjects in the perceptual experiment reported in chapter 3. Word-initial p is
more perceptually difficult (as indicated by the model’s lower rate of accurate
identification of initial p) than initial b. This is not a general property of labials: the
model, like humans, finds medial p no more difficult than medial b. Neither is it a general
property of initial voiceless stops: initial t is no more difficult than initial d.
In order to understand more deeply the relationship between acoustic and
perceptual properties of the model, simulations can be run with slightly different
parameters. This reveals, for example, that particular values for the frequency with which
the learner fails to hear individual acoustic cues are not crucial to the perceptual results of
interest. Adjusting the model such that individual acoustic cues are consistently present
or absent, or have different means or variances, provides a detailed picture of the
relationship between particular acoustic features and features of the overall pattern of
perception. Within the model, word-initial p’s unique perceptual difficulty is due to both
the similarity between the VOT means for initial p and b and especially p’s greater VOT
variance.
As explained in section 4.2.1.1, the VOT variances for voiced stops were
arbitrarily set to the average of the VOT variances of initial and medial p and t. This
gives the voiced stops quite large VOT variances, effectively allowing some of the
model’s voiced stops to have voiceless intervals following their release. This sort of brief
post-release voiceless interval for voiced stops is reported by Mikuteit (2006),
justifying this basic representation. However, the assumption that the variance in voiced
stops’ (positive) VOT is as large as that of voiceless stops is entirely arbitrary. If
anything, it seems likely that these post-release periods of voicelessness are shorter and
rarer than assumed in this model, and so the variance of voiced stops’ positive VOTs is
actually smaller than the variances given in the present model. As it has been shown here
that initial p’s perceptual difficulty follows primarily from the fact that its VOT variance
is simply larger than that of initial b, any revised version of the model in which voiced
stops’ VOT variance were smaller would also produce this asymmetry between initial p
and b.
These results about the source of initial p’s perceptual difficulty are, of course,
necessarily true of only the model. Whether humans process these acoustic features in the
same way as the model, and thus whether humans’ perceptual difficulty with initial p also
follows primarily from its short, variable VOT, is a matter for further experimental study.
The model is useful in that it allows this sort of acoustically motivated hypothesis to be
developed and explored in a preliminary way much more rapidly than actual
experimentation allows.
The overall structure of this perceptual model is similar to an Expectation
Maximization (EM) model (Dempster et al., 1977), in that it alternates between
identifying tokens and learning about prototypes. At present, however, the model does
not use a true EM algorithm. This model’s procedure for learning prototype coordinates
is supervised – the model is told which stop was produced in each round – and
incremental – in each round, the model adjusts prototypes based only on data acquired
during that round. An EM model is typically unsupervised, and learns from an entire data
set at once.43 An EM model also makes probabilistic identifications: it guesses that some
surface form has some probability of belonging to one category, and another probability
of belonging to a different category. This model, on the other hand, makes categorical
identifications: in a given state of the model, a surface form is identified as belonging to
exactly one category. In the future, the model could be relatively straightforwardly
revised to incorporate EM.
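The contrast just drawn can be sketched roughly in code. The following is an illustrative Python sketch, not the dissertation's actual implementation: the prototype representation, the acoustic dimensions, and the learning rate are all assumptions. It shows only the supervised, incremental update the text describes, with the EM alternative noted in a comment.

```python
# Sketch of supervised, incremental prototype learning: after each round
# the model is told which stop was produced, and that category's prototype
# moves a small step toward the token just heard. (Hypothetical names;
# the two acoustic dimensions and the rate of 0.05 are assumptions.)

def update_prototype(prototype, token, rate=0.05):
    """Move a prototype a small step toward one labeled token."""
    return tuple(p + rate * (t - p) for p, t in zip(prototype, token))

# Prototypes in a toy two-dimensional acoustic space (e.g. VOT, some
# second cue), purely for illustration.
prototypes = {"p": (60.0, 0.5), "b": (10.0, 0.5)}

# Each round supplies (supervised label, observed acoustic token).
rounds = [("p", (50.0, 0.4)), ("b", (5.0, 0.6))]
for label, token in rounds:
    prototypes[label] = update_prototype(prototypes[label], token)

# A true EM learner would instead assign each token fractionally to every
# category and re-estimate all prototypes from the whole data set at once.
```

The incremental step makes each round cheap and order-sensitive, which is what distinguishes it from batch EM over the full data set.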
Overall, because the model produces realistic mappings between attested patterns
of acoustics and perception, it can provide the basis for a model of constraint induction in
which a learner’s perceptual experience gives rise to constraints against perceptually
difficult segments. The structure of this constraint induction component of the model is
the topic of section 4.3.
4.3. Modelling constraint induction
Having constructed a model of learners’ perceptual experience with initial p, b, t, and d,
we can use this model to explore the induction of functionally grounded constraints.
Section 4.3.1 considers the varieties of linguistic experience which must be considered in
order to ensure that functionally grounded constraints are induced by all learners, and
also argues that constraint induction cannot refer to certain kinds of perceptual experience
(like infants’ perception of their own speech).
Constraint induction is governed by innate schemata for functionally grounded
constraints which provide the learner with instructions for mapping their perceptual
experience to phonological constraints. Section 4.3.2 discusses the general properties of
43 Semi-supervised EM models have also been proposed (Nigam et al., 2006).
these constraint schemata and presents the specific properties of the schema used in this
model to induce the constraint *#P. The model’s success in consistently inducing *#P is
summarized in section 4.3.3.
4.3.1. Desiderata for a constraint inducer
The general goal of constraint induction is for all learners to induce a consistent set of
functionally grounded constraints from their immediate linguistic experience. A realistic
model of constraint induction depends on a precise, realistic characterization of this
immediate linguistic experience. Learners’ experience can vary along two dimensions.
Different learners can be exposed to languages with different phonological properties,
and so can have differential exposure to individual segments or structures. Each learner
also has various kinds of experience with language: learners perceive adult speech, and
also articulate and perceive their own babbling and early speech. I argue that for a
constraint to be functionally grounded, it must be consistently induced from any learner’s
experience with any language. However, perceptually grounded constraints can only be
induced from learners’ immediate experience with the perception of adult speech.
This chapter and the last have discussed two phonotactic possibilities for word-
initial p, which give learners fundamentally different information about this segment.
Initial p can be phonotactically present and difficult to identify accurately, as in French
and the model of pseudo-French. Initial p can also be absent, as in Cajonos Zapotec. If
*#P is functionally grounded, it can only be consistently induced by learners of either
type of language if the constraint induction mechanism is able to identify initial p as
perceptually difficult in either situation.
Returning to the structure of a constraint inducer, in a language like French where
initial p is licensed, its perceptual properties are readily available to learners as they
induce perceptually grounded constraints. This general process is fairly straightforward to
implement in a model of pseudo-French. If the inducer has a way of tracking the relative
perceptual difficulty of segments in particular phonotactic positions, it will observe that p
is more difficult to accurately identify than other word-initial segments, and this
information can be used to induce the constraint *#P. In a language like Cajonos Zapotec,
however, this literal experience of initial p’s acoustic properties and their perceptual
consequences is unavailable to the learner. A learner of Cajonos Zapotec (or a virtual
learner of pseudo-CZ) must, like the (pseudo-)French learner, consistently induce *#P
from its perceptual experience; however, a different aspect of perceptual experience must
be able to motivate the constraint in this case.
It is frequently assumed that constraints like *#P, which, when highly ranked,
prevent learners of some languages from ever being exposed to perceptually difficult
segments, are nonetheless universally grounded in the perceptual difficulty of the
marked structure. After all, if languages like Cajonos Zapotec ban initial p because it is
perceptually difficult, the constraint responsible for this restriction should arguably
represent each speaker’s knowledge of initial p’s perceptual difficulty.
This perspective is difficult to reconcile with the claim that functionally grounded
constraints are induced rather than innate, as learners of Cajonos Zapotec have no
perceptual experience of adult word-initial p from which *#P could be induced. For this
reason, it is often tacitly assumed that learners inducing constraints through a mechanism
like Inductive Grounding (Hayes, 1999) or the Schema/Filter model of CON (Smith,
2002) refer to something other than perceptual experience of adult forms of the ambient
language (or their own articulations of these same forms).
One possible way in which infants could acquire knowledge of unattested
segments’ articulatory and perceptual properties is through their own early productions.
Hayes’ discussion of Inductive Grounding focuses on learners’ induction of articulatorily
grounded constraints. If learners could have experience of segments’ articulatory
properties in phonotactic positions where they are not attested in the adult language (like
initial p) through babbling, they could perhaps use this information to induce a full range
of typologically attested, articulatorily grounded constraints.
In Smith’s Schema/Filter model of CON, perceptually grounded constraints
emerge from a mechanism similar to Inductive Grounding. She discusses the example of
functionally grounded perceptual augmentation constraints, which prefer perceptually
salient candidates to minimally different, less perceptually salient candidates.44 For
example, the augmentation constraint HEAVYσ prefers more salient long vowels to less
salient short vowels. In order to determine the relative perceptual salience of segments or
structures unattested in learners’ target languages, learners could again examine the
psycholinguistic consequences of their own early productions of unattested structures.
With respect to the perceptually grounded constraint *#P, however, it is unlikely
that infants’ own early productions of p-initial forms could provide learners with the
same perceptual data as French-speaking adult pronunciations. First of all, evidence of
such forms would be relatively rare, and highly inconsistent across learners. While
children’s early babbling occasionally includes unattested segments and phonotactic
structures, later stages of babbling quickly come to reflect the segmental frequency and
phonotactics of the target language (Jusczyk, 1997: 177-9). Various child phonology
processes such as truncation, consonant harmony, and other unfaithful mappings can also
44 Perceptual salience is a psychoacoustic measure, perhaps of neural response magnitude.
give rise to phonotactic structures unattested in adult language (Vihman, 1996: 218-21),
but children vary widely in their use of these processes (as they do in the phonetic
inventories and structures used in babbling). So while it is likely that many children
learning languages without word-initial p could occasionally produce word-initial p, it is
unlikely that this experience would be frequent enough, or consistent enough across
learners, for universal induction of *#P.
A further reason why infants’ early productions would provide perceptual data
unlike that garnered from adult speech is that infants’ speech is much more articulatorily
variable than adult speech (Jusczyk, 1997: 181). In fact, while the articulations of very
young children may be impressionistically similar to various adult segments, children
only very rarely produce adult-like segments before approximately 6 months, at which
point the segmental content of babbling very quickly comes to resemble that of early
child speech (Oller, 2000). The acoustic experiments discussed in chapter 3, and also the
production model discussed above, suggest that the perceptual difficulty of word-initial p
follows from its relatively fine-grained acoustic properties. As infants’ articulations are
much more variable than those of adults, it is unlikely that an infant’s own rare
productions of unattested segments would be articulatorily and acoustically similar
enough to those of adult speakers to trigger the same patterns of perception as those adult
productions. For these reasons, I argue that learners should refer only to their perceptual
experience with adult productions of the ambient language in inducing perceptually
grounded constraints.45
45 Learners’ articulatory experience of their own productions poses similar difficulties for
constraint induction. In addition to children’s articulatory inaccuracy and the scarcity of unattested segments and phonotactic structures, the size and shape of an infant’s mouth (along with the initial absence of teeth) may give infants substantially different experience of articulatory difficulty than that found in adult speech, which is typically assumed to shape adult phonology.
A learner of Cajonos Zapotec therefore cannot induce *#P from the same
knowledge of the relative difficulty of accurately identifying initial p and b that a
learner of French uses. Cajonos Zapotec and French learners’ knowledge about initial p is
fundamentally different: a French learner knows that initial p is dispreferred – and so
induces *#P – based on the knowledge that initial p is difficult to accurately identify. A
Cajonos Zapotec learner instead knows that initial p is dispreferred simply because it is
unattested in adult language.
Reflecting the diverse knowledge about initial p possessed by learners of
phonotactically different languages, I propose an induction mechanism for perceptually
grounded constraints which refers to correspondingly diverse aspects of perceptual
difficulty. In general, the inducer tracks segments’ perceptual properties, identifies
segments which are relatively perceptually difficult in particular phonotactic positions,
and generates constraints against these segments in these positions. In order to induce
constraints against segments with which learners have actual perceptual experience and
also those which are absent from a particular phonotactic position, the inducer tracks two
measures of perceptual difficulty: accuracy (which reflects correct identification of a
segment) and false alarms (which reflect incorrect guesses that a segment was heard).
The precise mapping from perceptual data to induced constraints is governed by
schemata for perceptually grounded constraints. These constraint schemata provide the
criteria for identifying segments whose accuracy and false alarm measures label them as
perceptually difficult, and for inducing constraints against these segments. The basic
definitions of perceptual difficulty, as understood by the inducer, are given in (117).
(117) Some segment x is perceptually difficult in some context ContextZ if either:
      a. Accuracy(x/ContextZ) < threshold and
         Accuracy(x/ContextZ) < Accuracy(y/ContextZ) → Constraint *x/ContextZ
         (This difference must be significant; α = 0.01.)
      b. Accuracy(x/ContextZ) < FalseAlarm(x/ContextZ) → Constraint *x/ContextZ
In (117a), the relative accuracy of two segments (x and y) in a shared phonotactic
context (ContextZ) is evaluated. If one segment’s accuracy is inherently lower than some
threshold, and also significantly less than some other segment’s accuracy given some
level of significance (here, α = 0.01), a constraint against the poorly perceived segment is
induced. This measure will trigger induction of *#P in languages like French, where
learners have actual experience with the relative perceptibility of initial p and b.
The comparison of individual segments’ accuracy and false alarm rates in (117b)
reveals cases where a learner expects to hear some segment (like initial p), and so
occasionally misidentifies other segments (like initial b) as the expected but unattested
segment. In this case, the unattested segment’s false alarms will outnumber its
(nonexistent) accurate identifications. This measure triggers the induction of a constraint
against the missing segment – here, *#P – in languages like Cajonos Zapotec, where a
learner’s experience with unattested initial p is limited to false alarms.
The remainder of this section will describe the constraint induction component of
the computational model. The perceptually grounded constraint *#P is consistently
induced by learners of either pseudo-French or pseudo-CZ based on the comparison of
segments’ accuracy and false alarm rates, which are determined by the perceptual
component of the model.
4.3.2. How the model works
The perception component of the model, described in section 4.2, hears acoustically
realistic representations of initial and medial p, b, t, and d and perceives them
realistically, finding word-initial p uniquely perceptually difficult. The output of this
production model is the input to the model of constraint induction described here, which
induces the constraint *#P from this perceptual experience. To accomplish this, the
inducer tracks segments’ accuracy and false alarm scores; positional markedness
constraints are induced against segments which are perceptually difficult in particular
phonotactic contexts.
In phonological terms, a functionally grounded constraint schema defines the
phonotactic positions which can be targeted by these positional markedness constraints,
as well as what exactly is meant by “significantly more difficult to perceive.” Schemata
for these functionally grounded constraints, which are induced from each learner’s
experience, are thus sets of phonotactic and perceptual (as well as articulatory,
psycholinguistic, etc.) criteria for constraint induction. The model of constraint induction
in this section provides an example of such a schema at work.
Section 4.3.2.1 first describes the general structure of functional constraint
schemata, and of the particular constraint schema which governs the assessment and
comparison of accuracy and false alarm scores, ultimately leading to consistent induction
of *#P. Section 4.3.2.2 presents the specific criteria for inducing a constraint against *#P
from a comparison of segments’ accuracy scores, which provides a model of induction in
a French-type language where learners hear tokens of initial p. Induction of the same
constraint from false alarm scores, as in a Cajonos Zapotec-type language where learners
never hear initial p, is discussed in section 4.3.2.3.
4.3.2.1. The structure of functionally grounded constraint schemata
The goal of the induction mechanism is to consistently induce the constraint *#P from
word-initial p’s unique perceptual difficulty. The two measures of perceptual difficulty
which will be used by the inducer are accuracy and false alarms; the definitions of these
two scores are repeated in (118).

(118) For some segment x:
      Accuracy(x) = [# x tokens correctly identified] ÷ [# x tokens heard]
      FalseAlarm(x) = [# x tokens incorrectly identified] ÷ [# x responses]
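The two scores in (118) amount to simple ratios over a learner's log of (heard, guessed) pairs. The sketch below is illustrative only: the log format and function names are assumptions, not the model's actual code, but the arithmetic follows (118) directly.

```python
# Sketch of the scores in (118), computed over a hypothetical response
# log of (segment_heard, segment_guessed) pairs.

def accuracy(segment, log):
    """Correct identifications / tokens of the segment actually heard."""
    guesses_for_heard = [g for h, g in log if h == segment]
    if not guesses_for_heard:
        return None  # no tokens heard: no accuracy score (cf. pseudo-CZ p)
    return sum(1 for g in guesses_for_heard if g == segment) / len(guesses_for_heard)

def false_alarm(segment, log):
    """Incorrect guesses of the segment / all guesses of the segment."""
    heard_for_guess = [h for h, g in log if g == segment]
    if not heard_for_guess:
        return 0.0
    return sum(1 for h in heard_for_guess if h != segment) / len(heard_for_guess)

# Toy log: p heard twice, identified once; b once misheard as p.
log = [("p", "p"), ("p", "b"), ("b", "b"), ("b", "p")]
```

On this toy log, `accuracy("p", log)` and `false_alarm("p", log)` are both 0.5, and a segment never heard (like `t` here) gets no accuracy score at all, a fact the false-alarm criterion in section 4.3.2.3 exploits.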
In the early part of a simulation, before robust criteria for identifying segments are
available, a learner has low accuracy scores and high false alarm scores for all segments.
For this reason, the inducer does not begin tracking accuracy or false alarm scores until
phonetic categories and accuracy rates have stabilized. In the simulations reported here,
induction begins after 150 rounds of production and perception. A time where prototype
coordinates have stabilized (and so when induction should begin) could also be
dynamically identified in each simulation.
As described above, the induction of functionally grounded constraints is
governed by constraint schemata. Functionally grounded schemata specify four basic
elements of the induction of perceptually grounded constraints, which are summarized,
along with the particular parameters instantiated in the constraint induction model
described here, in (119).

(119) Schemata specify four basic features of perceptual constraint induction:
      • What kind of phonological element could be perceptually difficult.
        Here: Individual segments.
      • Phonotactic positions where perceptual difficulty is considered.
        Here: Word-initial position.
      • What makes a segment perceptually difficult: a procedure for comparing
        perceptibility measures.
        o How many recent tokens’ accuracy/false alarm scores are considered.
          Here: 400 recent tokens of each segment.
        o Properties of segments’ relative accuracy and false alarm scores that
          trigger induction.
          Here: See sections 4.3.2.2 and 4.3.2.3.
      • Definition of the induced constraints.
        Here: If a segment x is relatively perceptually difficult in ContextZ:
        *x/ContextZ   Assign one violation mark for each instance of x in ContextZ.
First, a constraint schema defines the type of phonological element which could
be found perceptually difficult. In the present model, individual segments are judged
perceptually difficult; features or sets of segments all sharing a feature or features could
presumably be usefully judged perceptually difficult as well.
A schema for constraint induction must also specify the phonotactic positions in
which segments’ perceptual difficulty will be evaluated. With no such specifications,
learners would need to track perceptual difficulty in all phonotactic positions. This is
undesirable, as some logically possible positions have no known phonological relevance.
For example, no attested phonotactic constraint targets third-syllable onsets. Schemata
provide learners with innate information about which positions are phonologically
interesting, allowing them to ignore this sort of irrelevant position. Schemata may also
provide learners with minimal innate information about where segments can be banned
for perceptual reasons. For example, segments are generally not banned for perceptual
reasons in intervocalic position. This is frequently assumed to be the position in which
acoustic cues for consonant identification are most perceptually salient, so it is
unnecessary to track intervocalic segments’ relative perceptual difficulty.46 As utterances
in the model are restricted to the form C1aC2a, the induction component of the model
evaluates perceptual difficulty only in word-initial (C1) position.47
The third element specified by a constraint schema is the set of criteria for
identifying a segment as ‘perceptually difficult’ based on its accuracy and false alarm
scores. In order for a learner to determine segments’ relative perceptibility, the learner
must first calculate accuracy and false alarm scores for each segment, then compare the
scores using specified criteria. The comparison mechanism will be the topic of sections
4.3.2.2 and 4.3.2.3.
In order to calculate the accuracy and false alarm scores themselves, a learner
must know how much of its experience to take into consideration. For the sake of
efficiency, a learner does not consider every token of every segment in its entire
experience. A learner must also not consider too small a sample of its experience. In
order to be resilient in the face of noisy data (particularly in the early stages of learning)
and induce constraints only from persistent patterns of perceptual difficulty, the learner
here considers the accuracy and false alarm scores of the most recent 400 tokens of each
initial segment.
After a learner’s phonetic categories are stable, the inducer begins collecting
accuracy and false alarm data for each segment. For a given round, each segment heard
46 This is a simplifying, rather than a crucial, assumption in the model; a more complex model
could do without this limitation.
47 Individual consonants can be banned intervocalically for articulatory reasons, as this is a common position for lenition.
by the learner gets an accuracy score of 1 if it is correctly identified and 0 otherwise, as
shown in (120). Similarly, each segment which the learner guesses it heard gets a false
alarm score of 0 if the guess was correct and 1 if the guess was incorrect. When these
scores are averaged across tokens, segments which are typically accurately identified
have average accuracy scores of close to 1, and segments which the learner typically
guesses were heard only when they were actually heard have false alarm scores of close
to 0.

(120) Heard: Initial p → Initial p accuracy = 0
      Guess: Initial b → Initial b false alarm = 1
In order to maintain a consistent window of perceptual experience, no segment is
judged more or less perceptible than any other segment until the learner has heard each
segment 400 times. That is, segments’ accuracy and false alarm scores are not compared
until each represents the average of 400 tokens’ scores. After the learner has sufficient
data to begin comparing segments’ accuracy and false alarm scores, it continues to
consider only the most recent 400 tokens of each segment for the sake of efficiency.
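The 400-token window just described can be sketched with per-segment bounded queues, so that the oldest scores fall out automatically as new tokens arrive. The data structure is an illustrative implementation choice, not necessarily that of the actual model.

```python
from collections import deque

# Sketch of the sliding window: one bounded queue of 0/1 accuracy scores
# per initial segment. WINDOW = 400 is from the text; everything else
# here is an illustrative assumption.
WINDOW = 400

accuracy_scores = {s: deque(maxlen=WINDOW) for s in "pbtd"}

def record_round(heard, guessed):
    """Score one round per (120): 1 if the heard segment was identified."""
    accuracy_scores[heard].append(1 if guessed == heard else 0)

def ready_to_compare():
    """Scores are compared only once every segment has 400 tokens."""
    return all(len(d) == WINDOW for d in accuracy_scores.values())

# Simulated rounds: every segment heard 400 times, identified correctly,
# then one p token misheard; its 0 score displaces the oldest 1.
for _ in range(WINDOW):
    for s in "pbtd":
        record_round(s, s)
record_round("p", "b")
```

Because `deque(maxlen=400)` discards the oldest entry on each append once full, a segment's average accuracy always reflects exactly its 400 most recent tokens, as the text requires.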
Given these criteria, the learner evaluates sets of accuracy and false alarm scores like
those in (121). Details of this evaluation are discussed in sections 4.3.2.2 and 4.3.2.3.

(121)                p      b      t      d
      ACCURACY:      0.913  0.920  0.957  0.970
      FALSE ALARMS:  0.080  0.093  0.027  0.040
Finally, after defining the elements that can be judged perceptually difficult, the
phonotactic positions in which these elements’ perceptibility is evaluated, and the criteria
for finding particular segments perceptually difficult, a functionally grounded constraint
schema also defines the constraints that are induced when elements are appropriately
found perceptually difficult. Here, the induced constraint is a positional markedness
constraint of the form defined in (122).
(122) *x/ContextZ Assign one violation mark for each instance of x in ContextZ.
According to the schema, if some segment x is found to be relatively perceptually
difficult in some position ContextZ, the constraint *x/ContextZ is induced, and so becomes
part of the learner’s constraint inventory. The constraint *#P is of this form; the name
‘*#P’ is an abbreviation for *p/#__.
There is one final property of the model described here which is a significant
simplification of any actual learners’ induction processes.48 This model is only concerned
with the relative perceptibility of pairs of voiced and voiceless homorganic stops. For this
reason, the acoustic and perceptual differences between p and b, and t and d, are
accurately represented. Differences between other pairs of segments, however, are not.
Therefore while the model can accurately assess the relative perceptual difficulty of p and
b or t and d, any judgment it would make about the relative perceptibility of b and d, p
and t, or other heterorganic pairs does not accurately reflect speakers’ judgments about
these segments’ perceptibility. Because of this limitation, the model never compares the
perceptual difficulty of a segment to anything other than its homorganic counterpart.
By using two comparisons of accuracy and false alarm scores to obtain measures
of segments’ relative perceptibility, the model described here can consistently induce the
constraint *#P from either pseudo-French data or pseudo-Cajonos Zapotec data. In the
production component of the model, the virtual speaker can speak either pseudo-French
or pseudo-Cajonos Zapotec (pseudo-CZ). The only difference between the models of
these two languages is whether or not they allow word-initial p, as shown in (123).

(123)                          Initial Cs   Medial Cs
      pseudo-French:           p b t d      p b t d
      pseudo-Cajonos Zapotec:  b t d        p b t d
48 Restricting the universe of discourse within the model to CaCa words is another such necessary
simplification.
A learner of either pseudo-language compares accuracy and false alarm scores
using both of the methods described below. Section 4.3.2.2 describes a comparison of
accuracy scores that allows pseudo-French learners to induce *#P, and section 4.3.2.3
describes a comparison of accuracy and false alarm scores that allows pseudo-CZ
learners to also induce *#P.
4.3.2.2. Induction from accuracy scores: Pseudo-French
The virtual learner induces constraints against segments which it finds perceptually
difficult. From the perspective of a pseudo-French learner, initial p is more perceptually
difficult than initial b simply because initial p is recognized less accurately than initial b.
The constraint schema must provide the learner with an explicit procedure for identifying
this sort of pattern of perceptual difficulty, which is persistent and significant enough to
merit being encoded in a phonological constraint.
The constraint induction model imposes both absolute and relative criteria for the
evaluation of segments’ accuracy scores. In order for a segment’s accuracy score to
identify the segment as perceptually difficult for the purposes of constraint induction, the
accuracy score (as measured over the last 400 tokens of the segment) must be lower than
the absolute threshold of 0.9. In addition to this absolute measure of difficulty, the model
also requires segments’ accuracy scores to be significantly different from those of their
homorganic counterparts.
(124) Some segment x is perceptually difficult in ContextZ if:
      Accuracy(x/ContextZ) < 0.9
      and
      Accuracy(x/ContextZ) < Accuracy(y/ContextZ)
      (The two accuracy measures must be significantly different; α = 0.01.)
By these measures, a constraint against initial p can be induced only if p is
accurately identified less than 90% of the time, and if this accuracy score is significantly
lower than initial b’s accuracy score (as determined by a t-test, where α = 0.01). The
absolute difficulty measure ensures that only significant, persistent perceptual problems
will be penalized by induced constraints. The relative measure further captures the
inherently comparative character of markedness constraints: constraints are induced only
against segments which are demonstrably more difficult than others.49
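The combined absolute-plus-relative criterion can be sketched as follows. The threshold of 0.9 and α = 0.01 come from the text; for self-containment this sketch substitutes a one-sided two-proportion z-test for the t-test (with 400 tokens per segment the two give nearly identical results), so the statistical details are an assumption, not the model's actual test.

```python
from math import sqrt
from statistics import NormalDist

# Sketch of criterion (124). THRESHOLD, ALPHA, and the window size N are
# from the text; the z-test approximation is an illustrative assumption.
THRESHOLD = 0.9
ALPHA = 0.01
N = 400  # tokens per segment in the comparison window

def induce_from_accuracy(acc_x, acc_y, n=N):
    """Induce *x/Context iff x's accuracy is below the absolute threshold
    AND significantly below its homorganic counterpart y's accuracy."""
    if acc_x >= THRESHOLD:
        return False  # absolute criterion fails
    pooled = (acc_x + acc_y) / 2
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False
    z = (acc_y - acc_x) / se
    p_value = 1 - NormalDist().cdf(z)  # one-sided: is x really worse?
    return p_value < ALPHA
```

On pseudo-French-like scores, `induce_from_accuracy(0.80, 0.92)` returns `True` (triggering *#P), while a pair like `(0.95, 0.97)` fails the absolute threshold and a pair like `(0.88, 0.89)` fails the significance test, mirroring why *#T and *#D are never induced.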
The results of pseudo-French simulations where constraints are induced through
these accuracy criteria are summarized in Figure 19. The graph shows the sets of
constraints induced when 250 pseudo-French simulations (of 40,000 rounds each) are
run. Because initial p’s accuracy is consistently both sufficiently low and significantly
lower than that of initial b, the inducer consistently observes that initial p is perceptually
difficult and so induces *#P in nearly every simulation. The very small number of
simulations in which *#P is not induced would disappear if simulations were slightly
longer. Because initial p is so much more perceptually difficult than initial b, the inducer
has evidence for the opposite constraint *#B only extremely rarely.
49 See Gouskova (2003) for discussion of the comparative nature of markedness.
Figure 19. Constraints induced in each of 250 pseudo-French simulations of 40,000
rounds each.
Unlike initial p and b, initial t and d are essentially equally perceptually difficult.
While either may occasionally be significantly less accurately identified than the other,
and either may very occasionally have an accuracy score below 0.9, these aspects of
perceptual difficulty consistently fail to coincide, leaving learners with no evidence for
the induction of either *#T or *#D.
4.3.2.3. Induction from false alarm scores: Pseudo-Cajonos Zapotec
In this model, a learner may only compare two segments’ accuracy scores when the
learner has collected enough accuracy scores for each segment that these scores present a
reliable picture of the segments’ overall perceptibility. This is enforced through the
requirement that two segments’ accuracy scores are not compared until 400 tokens of
each segment have been heard. Learners who never hear any tokens of initial p never
develop comparable accuracy scores for p and b, so comparison of accuracy scores will
never allow a pseudo-CZ learner to induce *#P. For this reason, learners must be able to
use something other than comparison of accuracy scores in order for all learners of all
languages to identify initial p as perceptually difficult and induce this functionally
grounded constraint.
The perception component of the model assumes that learners identify the overall
set of segments which occur in the ambient language, then expect to hear each of these
segments in each phonotactic position.50 For this reason, while a pseudo-CZ learner never
perceives an actual token of word-initial p, it does know that p is one of pseudo-CZ’s
segments. Consequently, in attempting to learn the acoustic properties of each inventory
segment in each position, the learner occasionally misidentifies another initial segment as
p. In this way, pseudo-CZ learners acquire false alarms for unattested initial p.
Learners therefore have a unique kind of perceptual experience with phonotactic
gaps: segments which are missing in a particular position incur more false alarms than
accurate identifications in that position. These false alarms are relatively rare but they do
consistently occur, as illustrated in figures 20 and 21. Figure 20 shows the pseudo-CZ
learner’s accuracy as it learns to identify the three attested initial stops b, t, and d. This
learner and the pseudo-French learner described in section 4.2.1.2 identify initial t and d
with roughly comparable accuracy. The pseudo-CZ learner is overall more accurate in its
identification of initial b than the pseudo-French learner, as this learner’s lack of
knowledge of the detailed acoustic properties of initial p makes it less likely to misidentify
initial b as p. There is, however, a small but consistent chance that the learner will make
exactly this mistake. As the confusion matrix in figure 21 shows, 0.2% of initial b tokens
are misidentified as initial p.
50 This initial inventory is stipulated in the present model; it could also be learned from the
statistical properties of the segments that it hears, as proposed by Maye (2000) and Boersma and Hamann (2007b), and as modelled by de Boer (2000). This initial inventory does not necessarily correspond to the language’s actual phoneme inventory but instead is simply the learner’s initial hypothesis space for early categorization.
Figure 20. Model accuracy for each initial pseudo-CZ consonant, averaged across 20,000
simulations of 300 rounds each. The lines represent moving averages across 15-round windows.
                        Response
                  p       b       t       d
   Segment   p
             b    0.2%    98.4%   0.0%    1.4%
             t    0.1%    0.0%    97.1%   2.8%
             d    0.2%    1.2%    2.2%    96.4%

Figure 21. Confusion matrix for initial pseudo-CZ consonants. Data collected from the
last 15 rounds of each of 20,000 simulations.
In order for the model to identify segments like pseudo-CZ initial p as
perceptually difficult, it can compare segments’ false alarm and accuracy scores. If some
segment’s false alarm score is not lower than its accuracy score (that is, if the false alarm
score is higher than the accuracy score, or if there is a false alarm score but no accuracy
score), the false-alarm-prone segment qualifies as perceptually difficult. This perceptual
fact then triggers the induction of a constraint against the missing segment. Using this
measure of perceptual difficulty, every simulated pseudo-CZ learner observes that initial
p is prone to false alarms, and so induces the constraint *#P, as shown in figure 22.
(125) Some segment x is perceptually difficult in ContextZ if:
Accuracy(x/ContextZ) < FalseAlarm(x/ContextZ)
Figure 22. Constraints induced in each of 250 pseudo-Cajonos Zapotec simulations of
40,000 rounds each.
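Stated procedurally, criterion (125) is a comparison in which a missing accuracy score counts as maximally bad. A minimal sketch, assuming the convention (mine, not the model's) that a segment with no pronounced tokens has an accuracy score of None:

```python
def perceptually_difficult(accuracy_score, false_alarm_score):
    """Criterion (125): x is difficult in ContextZ if
    Accuracy(x/ContextZ) < FalseAlarm(x/ContextZ).

    A segment that is never pronounced has no accuracy score (None);
    if it nonetheless accrues false alarms, it qualifies as difficult."""
    if not false_alarm_score:      # no false alarms: nothing to flag
        return False
    if accuracy_score is None:     # false alarms but no accuracy score
        return True
    return accuracy_score < false_alarm_score

# Pseudo-CZ initial p: false alarms but no accuracy score -> difficult.
assert perceptually_difficult(None, 0.002) is True
# Pseudo-CZ initial b: high accuracy, negligible false alarms -> not difficult.
assert perceptually_difficult(0.984, 0.002) is False
```

Note that a learner of a language whose inventory lacks p altogether (the pseudo-Arabic case discussed in section 4.4.2) would register no false alarms for p, so the criterion would not fire.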
4.3.3. Summary of the constraint induction model
A learner of any (pseudo-)language considers accuracy and false alarm scores for the
400 most recent tokens of each segment.51 Learners of all languages examine individual
segments’ accuracy scores, testing those which are below 0.9 to see whether they are
significantly lower than those of their homorganic counterparts. At the same time,
learners also compare the accuracy and false alarm scores of individual segments. A
segment is labeled perceptually difficult either if its accuracy score is below 0.9 and
significantly lower than that of a homorganic counterpart, or if its false alarm score is
greater than its accuracy score.
Whenever any segment x is found to be perceptually difficult in some phonotactic
context ContextZ, a positional markedness constraint of the form *x/ContextZ is induced.
51 In terms of accuracy scores, a token of a segment is an actual pronounced instance of that
segment. In terms of false alarm scores, a token is instead an incorrect guess that a segment was pronounced.
Learners of languages like pseudo-French, where initial p is present but less perceptible
than initial b, consistently induce the constraint *#P through comparison of accuracy
scores, while learners of languages like pseudo-CZ, where initial p is absent, consistently
induce the same constraint *#P through comparison of accuracy and false alarm scores.
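Combining the two routes just summarized, the per-context induction test might be sketched as below. The significance test is left as a caller-supplied black box, and all names here are illustrative rather than the model's own.

```python
def induced_constraints(scores, homorganic, significantly_below):
    """scores maps each segment to (accuracy or None, false_alarm_score)
    for one phonotactic context; homorganic maps each segment to its
    homorganic counterparts. Returns segments x for which *x/Context
    would be induced."""
    induced = set()
    for seg, (acc, fa) in scores.items():
        # Route 1 (pseudo-French-style): accuracy below 0.9 and
        # significantly below a homorganic counterpart's accuracy.
        if acc is not None and acc < 0.9:
            for other in homorganic.get(seg, ()):
                other_acc = scores[other][0]
                if other_acc is not None and significantly_below(acc, other_acc):
                    induced.add(seg)
        # Route 2 (pseudo-CZ-style): false alarms with no accuracy score,
        # or a false alarm score exceeding the accuracy score.
        if fa > 0 and (acc is None or acc < fa):
            induced.add(seg)
    return induced

# In a pseudo-CZ-like context, only unattested initial p is flagged.
scores = {"p": (None, 0.005), "b": (0.984, 0.002),
          "t": (0.971, 0.001), "d": (0.964, 0.002)}
homorganic = {"p": ["b"], "b": ["p"], "t": ["d"], "d": ["t"]}
flagged = induced_constraints(scores, homorganic,
                              lambda a, b: a < b)  # stand-in significance test
assert flagged == {"p"}
```

The same function flags p on the pseudo-French pattern, e.g. with `scores = {"p": (0.80, 0.01), "b": (0.98, 0.002)}`, where p's accuracy is attested but significantly below b's; the two language types converge on inducing the same constraint through different routes.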
Figure 23. Constraints induced in each of 250 simulations of 40,000 rounds each.
4.4. Conclusion
4.4.1. Summary
The constraint induction component of this model has demonstrated that a perceptually
grounded positional markedness constraint against word-initial p can be consistently
induced from the diverse perceptual experiences of learners who hear this perceptually
difficult segment, as well as of those learning languages where p is banned word-initially.
This is possible because the inducer makes use of two measures of perceptual difficulty.
The relative accuracy of initial p and b demonstrates p’s perceptual difficulty in
languages like French. In languages like Cajonos Zapotec, where learners mistakenly
expect to hear initial p and so occasionally misidentify initial b as p, p’s false alarm score
is much higher than its accuracy score; this also indicates that initial p is difficult to
accurately identify. The constraint *#P can be induced from perceptual difficulty in either
case.
This case study of constraint induction illuminates the structure and role of
constraint schemata in the induction of functionally grounded constraints. Both formally
grounded and functionally grounded markedness constraint schemata provide skeletal
definitions of constraints which disprefer particular marked segments, structures, or
features in particular phonotactic positions. Formally grounded constraint schemata
specify the sets of positions and marked elements targeted by the constraints.
Functionally grounded schemata, on the other hand, provide instructions for how the
learner should determine the sets of positions and marked elements targeted by the
constraints. In other words, while formal schemata provide a learner with all information
necessary to assemble a complete set of constraints, functional schemata instead provide
directions for finding much of this information in a learner’s linguistic experience.
The functionally grounded schema governing the perceptual induction process
described here provides a learner with three pieces of information in addition to
definitions of the induced constraints. The schema tells the learner which kinds of
phonological elements may be considered perceptually difficult, which phonotactic
positions segments’ perceptual difficulty should be assessed in, and exactly how to
compute and compare perceptual difficulty measures (specifically, accuracy and false
alarm scores). Other functional constraint schemata for constraints grounded in facts of
articulation, psycholinguistics, or other aspects of perception presumably make similar
specifications.
4.4.2. Future directions: Elaborating the model
While this model has made explicit proposals about many aspects of constraint induction,
it still simplifies learners’ actual perception and induction tasks in a number of ways. In
order to develop more explicit hypotheses about learners’ behavior, the model should be
enriched and made more realistic.
One basic simplification concerns the acoustics of the stops produced by the
virtual speaker. The acoustic values of these stops are realistic, but represent the speech
of only a single speaker of French. This speaker was described by participants in the
perceptual experiment as a typical speaker of Parisian French; still, learners’ experience
would be more accurately modelled if the virtual learner were exposed to acoustic data
from more than one speaker. Additionally, explicit acoustic measurements of these stops
in languages other than French would support the claim that this French data is truly
representative of cross-linguistic learners’ experience.
Another simplification which should be removed in future versions of the model
is the current focus on accurate modelling of only voicing features. The present model
includes three acoustically realistic features for voicing, but only a single binary feature
for place distinctions. The virtual learner is consequently restricted to comparing the
accuracy scores only for pairs of homorganic stops differing in voicing, as described in
section 4.3.2.1. If acoustically accurate place features were included, the virtual learner’s
task would be much more realistic.
In modelling only pseudo-French and pseudo-Cajonos Zapotec, the model
assumes that all languages have unaspirated p in their stop inventories. Before the model
can be rid of this assumption by modelling constraint induction in languages without p
(like Standard Arabic) as well, we must determine empirically what constraints should be
induced in this situation. While it has been assumed throughout this dissertation that all
constraints are universal, there is no conclusive evidence showing whether learners who
never hear p have a specific dispreference for initial p.52 If the present virtual learner
were exposed to pseudo-Arabic, it would not induce the constraint *#P; however, without
empirical investigation of speakers of Arabic-type languages, we cannot know whether or
not this is the desired outcome.
Finally, even a model enriched in these ways will still induce only the single
constraint *#P. An extremely diverse set of phonological restrictions (against sonorous
geminates or NT sequences, favoring place assimilation, and many more) has been
argued to be grounded in perceptual factors. Exploring additional restrictions in an
explicit, phonetically realistic model will shed further light on the relationship between
constraints and their functional motivations. This chapter made claims about the general
structure of schemata for perceptually grounded constraints. The model instantiated a
particular schema, which provided sufficient instructions for inducing *#P. By modelling
the induction of additional constraints, we can investigate whether similar schemata can
actually account for the induction of a variety of constraints.
52 Chapter 3 suggests that speakers of languages without p may have *#P, as speakers of Moroccan
Arabic have borrowed p in many loanwords from French and Spanish, but tend to avoid borrowing initial p faithfully. It is unclear, however, whether these speakers had *#P before hearing p at all, or whether it has developed in learners exposed to perceptually difficult French and Spanish p.
Chapter 5. Conclusion
5.1. Summary of the dissertation
This work began from the premise that phonological constraints can reflect functional
tendencies, and investigated the nature of the relationship between constraints and their
functional motivations. I have characterized this relationship in terms of two kinds of
knowledge that learners can have about constraints’ functional sources. Functionally
grounded constraints – like *#P, discussed in chapters 3 and 4 – are those which learners
induce from their immediate linguistic experience. Learners thus have direct knowledge
of the functional motivations of these constraints. Formally grounded constraints, on the
other hand – like the domain-edge markedness constraints discussed in chapter 2 – are
those which cannot be consistently induced, and so must be innately specified in all
learners. In this case, learners have no direct knowledge of any functional factors which
may have (originally, evolutionarily) motivated these constraints.
Chapter 1 motivated the induction-based distinction between formally and
functionally grounded constraints by first motivating the premise that all constraints are
universally present in the grammars of all speakers of all languages. Given this, if all
learners of all languages have consistent access to perceptual (or acoustic,
psycholinguistic, etc.) data from which some typologically attested constraint can be
consistently induced, then that constraint should be induced. Any innate definitions of
induceable constraints would be redundant, and would add unnecessary complexity to the
innate language endowment. Chapters 3 and 4 focus on the induction of *#P, which is
active in Cajonos Zapotec, Ibibio, and Moroccan Arabic. After characterizing the
acoustic and perceptual properties of initial p (and other initial and medial stops), these
phonetic facts are instantiated in a model of perception and constraint induction. Virtual
learners of phonotactically diverse pseudo-languages are exposed to these realistic stops
and induce *#P. Because this constraint can be consistently induced, its functional
motivations are accessible to learners and so it is functionally grounded.
Chapter 1 also argued that formal vs. functional grounding is not determined by
whether an individual constraint can be induced, but rather by whether all constraints
generated by a particular schema can be induced. Schemata, including those for the
domain-edge markedness constraints proposed in chapter 2 (MOnset(Onset/PCat) and
MCoda(Coda/PCat)), are templates for sets of formally similar constraints. If individual
constraints belonging to a particular schema cannot be induced, then the schema defining
these constraints – and so the constraints themselves – must instead be innate. This gives
the diagnostic for formal vs. functional grounding an ‘all or nothing’ character: if all
constraints in some schema can be induced, then the schema and all constraints in it are
functionally grounded. If any constraint in a schema cannot be induced, however, then
the schema and all constraints in it must instead be innate and so are formally grounded.
Returning to the domain-edge markedness constraints, chapter 2 argued that
individual constraints (e.g. *ʔ(Onset/σ)) do not appear to be induceable from their
phonetic properties. This suggests that all domain-edge markedness constraints are
formally grounded. This is not to say that these constraints have no functional
motivations. Many domain-edge markedness constraints do reflect phonetic facts. These
constraints cannot all be induced, however, because the perceptual difficulty of utterance-
and word-initial ʔ has been generalized and grammaticalized. ʔ is phonologically marked
in sets of formally similar positions – prosodic domain onsets – which go beyond strictly
those positions where it is phonetically marked. Formally grounded *ʔ(Onset/σ) cannot
be induced precisely because of this mismatch between the phonological and phonetic
properties of ʔ. Other domain-edge markedness constraints similarly generalize beyond
literal phonetic markedness.
While this dissertation has focused on learners’ knowledge of Optimality
Theoretic constraints, the central result informs any theory of phonological grammar.
Whether a grammar consists of ranked or weighted constraints, rules, parameters, or
other objects, I argue that (1) these should reflect functional tendencies, and (2) those
aspects of grammar which directly encode functional facts should be created anew by
each learner, while those which cannot be consistently induced must instead be part of
each learner’s innate endowment.
5.2. Broader issues
This final section looks at phonological grammars from a broader perspective and raises
two issues whose resolution is central to the claims made in the preceding chapters. First,
there is a great deal of debate as to whether there are universal constraints (or other
aspects of phonological grammar). While this dissertation has assumed that constraints
are universal, section 5.2.1 demonstrates that the proposals made here remain relevant
even if the premise of universality ultimately turns out not to hold. Second, while this
dissertation has offered a proposal for distinguishing formally from functionally
grounded constraints and identified particular constraints which may belong to each class,
section 5.2.2 observes that further empirical investigation of whether and how learners
actually induce constraints is necessary in order to definitively evaluate these claims.
5.2.1. Constraint universality?
The diagnostics for formal and functional constraint grounding proposed here assumed
that each constraint is present in the grammar of each speaker of each language, and so
that each learner’s knowledge of each universal constraint must be accounted for.
Chapter 1 presented evidence that constraints can ‘emerge’ in languages where they are
otherwise inactive. These cases indicate that various constraints are present in grammars
beyond those where they appear to be strictly necessary, suggesting that they – and so
perhaps all constraints – may in fact be universal.
There have been a number of challenges to the idea of a universal constraint set.
Pater (to appear) proposes that learners may apply morphological indexes to innate,
universal constraints in order to account for the morphophonology of the language being
learned. Kawahara (2007) argues that phonetically natural constraints are universal, while
phonetically unnatural constraints are created by learners on a language-specific basis.
Hayes and Wilson (to appear) demonstrate that learners’ knowledge of both lexical and
nonce forms can be realistically modelled if grammars are composed entirely of ad hoc
constraints created by individual learners. Blevins (2004) follows Ohala (1981) in
suggesting that phonetically motivated, listener-driven sound changes, rather than a
universal constraint inventory, or indeed any sort of independent phonological grammar,
are responsible for typological generalizations.
A developing body of recent work (Albright, 2007; Berent et al., 2007; Moreton,
2002; Wilson, 2003) argues that listeners have phonological knowledge which cannot be
reduced to the lexical or phonetic properties of their linguistic experience. This work,
along with that surveyed in chapter 1, thus supports the claim that
there is an innate, universal component of phonological grammar. Many questions remain
unresolved: Does the phonological grammar consist of ranked or weighted constraints,
ordered rules, or something else entirely? Are all of these constraints (or rules, etc.)
universal? If not, what distinguishes universal constraints from language-specific ones?
The work presented here addresses a question which remains relevant regardless
of the ultimate resolution of these issues. In all languages whose grammars include some
constraint, how do all speakers of those languages come by the constraint? For each
universal constraint, the question remains as framed here: can all speakers of all
languages induce it from their immediate linguistic experience? If so, the constraint is
functionally grounded. If not, the constraint must instead be innate and formally
grounded. Constraints which are not universal are presumably induced rather than
innate.53 For these constraints we must still ask, however, how induction proceeds, and
what sort of schemata guide this process.
5.2.2. Empirical investigations of constraint induction
A major goal of this dissertation is to examine the relationship between constraints and
functional factors in an explicit, rigorous manner, considering the detailed nature of the
phonetic data available to listeners and determining from this what sort of induction
mechanism would allow learners to create the appropriate constraints. Careful
consideration of the parallels between constraints and functional factors, and
computational implementations of realistic phonetic data, are valuable tools for showing
whether constraint induction and other aspects of phonological learning are possible in
principle. But ultimately, these provide only hypotheses about what actual learners do
53 The innate component of the linguistic endowment should be universal; if constraints are
demonstrably absent from the grammars of some speakers, these constraints are thus presumably not innate. A possible alternative is that all constraints could be universal, but that learners’ experience instructs them to include particular subsets of this universal constraint set in their grammars. In this case, a combination of innateness and induction results in grammars with different constraint sets.
during language acquisition. All of the discussion and proposals regarding the grounding
and ultimate source of particular constraints offered in the preceding chapters can be
confirmed only by further empirical work.
Considering first functionally grounded constraints, the question of whether (and
how) these are induced from phonetic data is ultimately an empirical one. Induction is
only possible if learners observe phonetic facts which could motivate particular
constraints, and if there is some mechanism which could induce the attested set of
constraints from this data. But, of course, demonstrating that constraint induction is
possible is not the same as showing that it actually occurs. Any proposed induction
mechanism is at best a hypothesis about learners’ behavior, and must be tested against
actual speakers. While chapter 4 has shown that the constraint *#P can be consistently
induced from learners’ immediate linguistic experience, this cannot be the end of the
story. The question of whether real learners actually do induce *#P using a mechanism
like the one proposed here is left for future investigation.
The arguments for the formal grounding and innateness of domain-edge
markedness constraints can similarly be confirmed only empirically. Chapter 2 suggested
that domain-edge markedness constraints don’t correlate well enough with their apparent
phonetic motivations to be consistently induceable by learners, and therefore must instead
be innate and formally grounded. While phonetic data supports the claim that glottal
stops are not consistently difficult to perceive in syllable onsets, a conclusive answer to
the larger question of whether or not these constraints can be induced cannot be provided
without more information about exactly what acoustic and perceptual data learners
consider during induction.
The approach pursued here suggests that learners induce constraints whenever
possible, and only uninduceable constraints are innate. While the definitions of innate,
formally grounded constraints also tend to refer to classes of segments and positions
which can be defined in terms of formal features like placeless and high-sonority
segments, the prosodic hierarchy, etc., the definition of this class of constraints is
primarily negative: they are those which learners fail to induce. At present, we can only
speculate as to what learners may be capable of. Arguments for learners’ inability to learn
a given linguistic fact from available data are notoriously difficult to make (Pullum and
Scholz, 2002). Language acquisition is an enormously complex task, and the suggestion
that some constraints are uninduceable may underestimate learners’ cleverness in the face
of massive quantities of data. On the other hand, acquisition could be hugely simplified if
learners came equipped with constraint inventories, rather than needing to induce these
constraints at all. This work provides explicit hypotheses regarding the induction process
which can be used to tease apart these possibilities in future work.
Appendix A. Experimental stimuli recordings.
P initial words
J’ai dit “pacha” trois fois. J’ai dit “païen” deux fois. J’ai dit “paragraphe” quelquefois. J’ai dit “parcelle” quelquefois. J’ai dit “parcourir” doucement. J’ai dit “parfum” trois fois. J’ai dit “partante” quelquefois. J’ai dit “passerelle” trois fois. J’ai dit “patelin” deux fois. J’ai dit “pèlerin” trois fois. J’ai dit “perime” quatre fois. J’ai dit “perroquet” quatre fois. J’ai dit “pillage” quelquefois. J’ai dit “piston” gravement. J’ai dit “podium” gravement. J’ai dit “polaire” doucement. J’ai dit “portion” pour toi. J’ai dit “poterie” pour toi. J’ai dit “poucettes” gravement. J’ai dit “poumon” gravement. J’ai dit “poussin” pour toi. J’ai dit “pudeur” dix fois. J’ai dit “putois” quelquefois. J’ai dit “puzzle” gravement. (training)

B initial words
J’ai dit “bagatelle” doucement. J’ai dit “balcon” deux fois. J’ai dit “banquier” pour toi. J’ai dit “baptême” trois fois. J’ai dit “baratin” deux fois. J’ai dit “baril” deux fois. J’ai dit “barjo” trois fois. J’ai dit “baron” dix fois. J’ai dit “bassin” trois fois. J’ai dit “bavure” pour toi. J’ai dit “béguin” pour toi. J’ai dit “bercail” pour toi. J’ai dit “beurre” doucement. (training) J’ai dit “bijoutier” pour toi. J’ai dit “bison” quelquefois. J’ai dit “bolide” trois fois. J’ai dit “bordeaux” quelquefois. J’ai dit “borgne” pour toi. J’ai dit “bougie” quelquefois. J’ai dit “bouilloire” doucement. J’ai dit “bourreau” quelquefois. J’ai dit “boyard” pour toi. J’ai dit “bûcheron” quatre fois. J’ai dit “buveur” dix fois.
“(training)” = stimulus used during training only. “(delete)” = stimulus discarded after the perceptual experiment; not included in results.

B initial nonwords
J’ai dit “bacha” trois fois. J’ai dit “baïen” deux fois. J’ai dit “baragraphe” quelquefois. J’ai dit “barcelle” quelquefois. J’ai dit “barcourir” doucement. J’ai dit “barfum” trois fois. J’ai dit “bartante” quelquefois. J’ai dit “bassarelle” trois fois. J’ai dit “batelin” deux fois. J’ai dit “bèlerin” trois fois. J’ai dit “berime” quatre fois. J’ai dit “berroqué” quatre fois. J’ai dit “billage” quelquefois. J’ai dit “biston” gravement. J’ai dit “bodium” gravement. J’ai dit “bolaire” doucement. J’ai dit “bortion” pour toi. J’ai dit “boterie” pour toi. J’ai dit “boucette” gravement. J’ai dit “boumon” gravement. J’ai dit “boussin” pour toi. J’ai dit “budeur” dix fois. J’ai dit “butois” quelquefois. J’ai dit “buzzle” gravement. (training)

P initial nonwords
J’ai dit “pagatelle” doucement. J’ai dit “palcon” deux fois. J’ai dit “panquier” pour toi. J’ai dit “patême” trois fois. J’ai dit “paratin” deux fois. J’ai dit “paril” deux fois. J’ai dit “parjot” trois fois. J’ai dit “paron” dix fois. J’ai dit “passin” trois fois. J’ai dit “pavure” pour toi. J’ai dit “péguin” pour toi. J’ai dit “percail” pour toi. J’ai dit “peurre” doucement. (training) J’ai dit “pijoutier” pour toi. J’ai dit “pison” quelquefois. J’ai dit “polide” trois fois. J’ai dit “pordeau” quelquefois. J’ai dit “porgne” pour toi. J’ai dit “pougie” quelquefois. J’ai dit “pouilloire” doucement. J’ai dit “pourreau” quelquefois. J’ai dit “poyard” pour toi. J’ai dit “pûcheron” quatre fois. J’ai dit “puveur” dix fois.
P medial words
J’ai dit au mec “anthropologue” dix fois. J’ai dit au mec “apitoyé” deux fois. J’ai dit au mec “arrière-grand-père” pour toi. (training) J’ai dit au mec “apogée” deux fois. J’ai dit au mec “apparence” quelquefois. J’ai dit au mec “auparavant” deux fois. J’ai dit “capillaire” trois fois. J’ai dit “cappuccino” quelquefois. J’ai dit “capuchon” gravement. J’ai dit “champignon” dix fois. J’ai dit “composition” pour toi. J’ai dit “corrompu” pour toi. J’ai dit au mec “hépatite” pour toi. J’ai dit au mec “imperfection” pour toi. J’ai dit “manipulation” pour toi. J’ai dit au mec “opportun” quatre fois. J’ai dit “rapidité” quelquefois. J’ai dit “remporté” quelquefois. J’ai dit “ripou” gravement. J’ai dit “supercherie” deux fois. J’ai dit “superviseur” pour toi. (delete) J’ai dit “trappeur” dix fois. J’ai dit “vaporisé” quelquefois.

B medial words
J’ai dit au mec “abomination” pour toi. J’ai dit au mec “ambigu” doucement. J’ai dit au mec “aubergine” deux fois. J’ai dit “cabaret” trois fois. J’ai dit “cabinet” quatre fois. J’ai dit “chamboulé” dix fois. J’ai dit “cobalt” dix fois. J’ai dit “cobaye” pour toi. J’ai dit “combustion” pour toi. J’ai dit “cubaine” dix fois. J’ai dit “débâcle” deux fois. (delete) J’ai dit “flambé” dix fois. (training) J’ai dit “grabuge” quelquefois. J’ai dit au mec “inhabituelle” trois fois. J’ai dit “labourer” dix fois. J’ai dit “lambeau” gravement. J’ai dit “lavabo” quatre fois. J’ai dit au mec “obèse” trois fois. J’ai dit “robotique” deux fois. J’ai dit “sabotage” quatre fois. J’ai dit “stabilité” dix fois. J’ai dit “tombeur” deux fois. J’ai dit “tribunaux” doucement. J’ai dit “trombone” quatre fois.
B medial nonwords
J’ai dit au mec “anthrobologue” dix fois. J’ai dit au mec “abitoyé” deux fois. J’ai dit au mec “arrière-grand-bère” pour toi. (training) J’ai dit au mec “abogée” deux fois. J’ai dit au mec “abarence” quelquefois. J’ai dit au mec “aubaravant” deux fois. J’ai dit “cabillaire” trois fois. J’ai dit “cabuccino” quelquefois. J’ai dit “cabuchon” gravement. J’ai dit “chambignon” dix fois. J’ai dit “combosition” pour toi. J’ai dit “corrombu” pour toi. J’ai dit au mec “hébatite” pour toi. J’ai dit au mec “imberfection” pour toi. J’ai dit “manibulation” pour toi. J’ai dit au mec “obortun” quatre fois. J’ai dit “rabidité” quelquefois. J’ai dit “remborté” quelquefois. J’ai dit “ribou” gravement. J’ai dit “subercherie” deux fois. (delete) J’ai dit “suberviseur” pour toi. J’ai dit “trabeur” dix fois. J’ai dit “vaborisé” quelquefois.

P medial nonwords
J’ai dit au mec “apomination” pour toi. J’ai dit au mec “ampigu” doucement. J’ai dit au mec “aupergine” deux fois. J’ai dit “caparet” trois fois. J’ai dit “capinet” quatre fois. J’ai dit “champoulé” dix fois. J’ai dit “copalt” dix fois. J’ai dit “copaye” pour toi. J’ai dit “compustion” pour toi. J’ai dit “cupaine” dix fois. J’ai dit “dépâcle” deux fois. J’ai dit “flampé” dix fois. (training) J’ai dit “grapuge” quelquefois. J’ai dit au mec “inhapituelle” trois fois. J’ai dit “labourer” dix fois. J’ai dit “lampeau” gravement. J’ai dit “lavapo” quatre fois. J’ai dit au mec “opèse” trois fois. J’ai dit “ropotique” deux fois. J’ai dit “sapotage” quatre fois. J’ai dit “stapilité” dix fois. J’ai dit “tompeur” deux fois. J’ai dit “tripunaux” doucement. J’ai dit “trompone” quatre fois.
T initial words
J’ai dit “tangible” dix fois. J’ai dit “tango” doucement. (training) J’ai dit “technicien” deux fois. J’ai dit “techniquement” quelquefois. J’ai dit “télégramme” dix fois. J’ai dit “télévision” quelquefois. J’ai dit “ténèbre” pour toi. J’ai dit “tenir” dix fois. J’ai dit “ténor” quatre fois. J’ai dit “terminal” gravement. J’ai dit “théoriquement” dix fois. J’ai dit “thérapie” gravement. J’ai dit “timbre” dix fois. J’ai dit “tirage” gravement. J’ai dit “tirelire” trois fois. J’ai dit “tireur” deux fois. J’ai dit “tisonnier” deux fois. (delete) J’ai dit “tolérance” dix fois. J’ai dit “torchon” doucement. J’ai dit “tourelle” dix fois. J’ai dit “tourisme” quatre fois. J’ai dit “tourné” doucement. J’ai dit “turbulence” quelquefois. J’ai dit “typhon” deux fois.

D initial words
J’ai dit “dangereusement” pour toi. J’ai dit “danseuse” doucement. (training) J’ai dit “dauphin” deux fois. J’ai dit “débâcle” deux fois. (delete) J’ai dit “décevoir” quelquefois. J’ai dit “déclaration” doucement. J’ai dit “déesse” trois fois. J’ai dit “démocratie” gravement. J’ai dit “démoniaque” deux fois. J’ai dit “dépanneuse” quelquefois. J’ai dit “dernierement” gravement. J’ai dit “description” dix fois. J’ai dit “devenir” gravement. J’ai dit “diffusion” trois fois. J’ai dit “dilemme” deux fois. J’ai dit “dingo” doucement. J’ai dit “divorcé” quelquefois. J’ai dit “dormeur” pour toi. J’ai dit “doublement” pour toi. J’ai dit “doublure” trois fois. J’ai dit “douleur” quelquefois. J’ai dit “duchesse” gravement. J’ai dit “dynamique” dix fois.
D initial nonwords
J’ai dit “dangible” dix fois. J’ai dit “dango” doucement. (training) J’ai dit “dechnicien” deux fois. J’ai dit “dechniquement” quelquefois. J’ai dit “délégramme” dix fois. J’ai dit “délévision” quelquefois. J’ai dit “dénèbre” pour toi. J’ai dit “denir” dix fois. J’ai dit “dénor” quatre fois. J’ai dit “derminal” gravement. J’ai dit “déoriquement” dix fois. J’ai dit “dérapie” gravement. J’ai dit “dimbre” dix fois. J’ai dit “dirage” gravement. J’ai dit “direlire” trois fois. J’ai dit “direur” deux fois. J’ai dit “disonnier” deux fois. J’ai dit “dolérance” dix fois. J’ai dit “dorchon” doucement. J’ai dit “dourelle” dix fois. J’ai dit “dourisme” quatre fois. J’ai dit “dourné” doucement. J’ai dit “durbulence” quelquefois. J’ai dit “diphon” deux fois.

T initial nonwords
J’ai dit “tangereusement” pour toi. J’ai dit “tanseuse” doucement. (training) J’ai dit “tauphin” deux fois. J’ai dit “tébâcle” deux fois. J’ai dit “técevoir” quelquefois. J’ai dit “téclaration” doucement. J’ai dit “téesse” trois fois. J’ai dit “thémocrasie” gravement. J’ai dit “témoniaque” deux fois. J’ai dit “tépanneuse” quelquefois. J’ai dit “ternierement” gravement. J’ai dit “tescription” dix fois. J’ai dit “tevenir” gravement. J’ai dit “tiffusion” trois fois. (delete) J’ai dit “tilemme” deux fois. J’ai dit “tingo” doucement. J’ai dit “tivorcé” quelquefois. J’ai dit “tormeur” pour toi. J’ai dit “doublement” pour toi. J’ai dit “toublure” trois fois. J’ai dit “toleur” quelquefois. J’ai dit “tuchesse” gravement. J’ai dit “thinamique” dix fois.
T medial words
J’ai dit au mec “atomique” deux fois. J’ai dit au mec “atténué” dix fois. J’ai dit “châtier” quelquefois. J’ai dit “châtiment” dix fois. J’ai dit “contamination” trois fois. J’ai dit “contemplé” doucement. J’ai dit “contusion” pour toi. J’ai dit “côtière” doucement. J’ai dit “cratère” doucement. J’ai dit au mec “échantillon” dix fois. J’ai dit “fautif” trois fois. J’ai dit “génétiquement” dix fois. J’ai dit au mec “intensément” deux fois. J’ai dit au mec “intimement” gravement. J’ai dit au mec “inventaire” deux fois. J’ai dit au mec “itinéraire” dix fois. J’ai dit “littéralement” doucement. J’ai dit “matériau” deux fois. J’ai dit “mentalement” quatre fois. (training) J’ai dit au mec “observateur” quatre fois. J’ai dit “rotation” doucement. J’ai dit “satisfaire” trois fois. J’ai dit “vétéran” pour toi. J’ai dit “volontairement” pour toi.

D medial words
J’ai dit au mec “abondance” quelquefois. J’ai dit au mec “acadien” trois fois. J’ai dit au mec “addition” quelquefois. J’ai dit au mec “adhésif” trois fois. J’ai dit au mec “adoption” quelquefois. J’ai dit au mec “ambassadeur” doucement. J’ai dit “chaudière” quelquefois. J’ai dit “comédie” doucement. J’ai dit “considération” pour toi. (delete) J’ai dit “condamnation” dix fois. J’ai dit au mec “édifice” trois fois. J’ai dit au mec “évidence” trois fois. J’ai dit “fédération” dix fois. J’ai dit “godasse” gravement. J’ai dit au mec “indication” pour toi. J’ai dit au mec “irlandaise” gravement. J’ai dit “juridiction” trois fois. J’ai dit “mandarin” quatre fois. (training) J’ai dit “modification” deux fois. J’ai dit “mondaine” dix fois. J’ai dit “reduction” quelquefois. J’ai dit “refroidissement” pour toi. J’ai dit “rondelle” quatre fois. J’ai dit “sidéré” dix fois.
D medial nonwords
J’ai dit au mec “adomique” deux fois.
J’ai dit au mec “adénué” dix fois.
J’ai dit “châdier” quelquefois.
J’ai dit “châdiment” dix fois.
J’ai dit “condaminasion” trois fois.
J’ai dit “condemplé” doucement.
J’ai dit “condusion” pour toi.
J’ai dit “côdière” doucement.
J’ai dit “cradère” doucement.
J’ai dit au mec “échandillon” dix fois.
J’ai dit “faudif” trois fois.
J’ai dit “génédiquement” dix fois.
J’ai dit au mec “indensément” deux fois.
J’ai dit au mec “indimement” gravement.
J’ai dit au mec “invendaire” deux fois.
J’ai dit au mec “idinéraire” dix fois.
J’ai dit “lidéralement” doucement.
J’ai dit “madériaux” deux fois.
J’ai dit “mendalement” quatre fois. (training)
J’ai dit au mec “observadeur” quatre fois.
J’ai dit “rodation” doucement.
J’ai dit “sadisfaire” trois fois.
J’ai dit “védéran” pour toi.
J’ai dit “volondairement” pour toi.

T medial nonwords
J’ai dit au mec “abontance” quelquefois.
J’ai dit au mec “acattien” trois fois.
J’ai dit au mec “atission” quelquefois.
J’ai dit au mec “atésif” trois fois.
J’ai dit au mec “atoption” quelquefois.
J’ai dit au mec “ambassateur” doucement.
J’ai dit “chautière” quelquefois.
J’ai dit “cométtie” doucement.
J’ai dit “consitération” pour toi.
J’ai dit “contemnasion” dix fois.
J’ai dit au mec “étifice” trois fois.
J’ai dit au mec “évitence” trois fois.
J’ai dit “fétérasion” dix fois.
J’ai dit “gotasse” gravement.
J’ai dit au mec “inticasion” pour toi.
J’ai dit au mec “irlantaise” gravement.
J’ai dit “juritiction” trois fois.
J’ai dit “mantarin” quatre fois. (training)
J’ai dit “motification” deux fois.
J’ai dit “montaine” dix fois.
J’ai dit “retuction” quelquefois.
J’ai dit “refroidissement” pour toi.
J’ai dit “rontelle” quatre fois.
J’ai dit “sitéré” dix fois.
Appendix B. Subjects analyses of perceptual results
Overall reaction times
Figure 24. Average reaction times (ms) in each condition, with 95% confidence intervals.
Reaction time

Initial
  b 554 ms vs. p 587 ms:  p = 0.279, t(28) = 1.103
  d 536 ms vs. t 494 ms:  p = 0.109, t(28) = 1.656

Medial
  b 591 ms vs. p 598 ms:  p = 0.812, t(28) = 0.240
  d 594 ms vs. t 547 ms:  p = 0.122, t(28) = 1.596
Table 6. Reaction time analyses, with p values from preplanned two-sample t-tests.
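The p values in Tables 6 through 9 are two-tailed probabilities for t statistics with 28 degrees of freedom. As an illustrative cross-check (not part of the dissertation's original analysis pipeline), a reported t value can be converted back to its two-tailed p value via the standard identity p = I_x(ν/2, 1/2) with x = ν/(ν + t²), where I_x is the regularized incomplete beta function. The helper names below (`two_tailed_p`, `reg_inc_beta`, `_betacf`) are illustrative, not from the source; this is a stdlib-only sketch:

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-12):
    """Continued-fraction evaluation for the incomplete beta (Lentz's method)."""
    fpmin = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = fpmin if abs(d) < fpmin else d
    d = 1.0 / d
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = fpmin if abs(d) < fpmin else d
        c = 1.0 + aa / c
        c = fpmin if abs(c) < fpmin else c
        d = 1.0 / d
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = fpmin if abs(d) < fpmin else d
        c = 1.0 + aa / c
        c = fpmin if abs(c) < fpmin else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    ln_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
             + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(ln_bt)
    # Use the symmetry relation on whichever side converges faster
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def two_tailed_p(t, df):
    """Two-tailed p value for a Student-t statistic with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))

# Cross-check the first comparison in Table 6: t(28) = 1.103 for initial b vs. p
print(f"p = {two_tailed_p(1.103, 28):.3f}")
```

Running `two_tailed_p(1.103, 28)` reproduces the 0.279 reported for the initial b/p comparison, up to rounding of the published t statistic; the other table entries check out the same way.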
Overall percent correct
Figure 25. Average percent correct in each condition, with 95% confidence intervals.
Percent correct

Initial
  b 94% vs. p 94%:  p = 0.967, t(28) = 0.042
  d 95% vs. t 97%:  p = 0.212, t(28) = 1.278

Medial
  b 92% vs. p 93%:  p = 0.802, t(28) = 0.253
  d 87% vs. t 89%:  p = 0.533, t(28) = 0.631
Table 7. Percent correct analyses, with p values from preplanned two-sample t-tests.
Word/Nonword stimuli: Reaction time and percent correct
Words: Reaction time

Initial
  b 554 ms vs. p 586 ms:  p = 0.316, t(28) = 1.022
  d 548 ms vs. t 502 ms:  p = 0.083, t(28) = 1.800

Medial
  b 601 ms vs. p 616 ms:  p = 0.683, t(28) = 0.413
  d 607 ms vs. t 557 ms:  p = 0.145, t(28) = 1.499

Nonwords: Reaction time

Initial
  b 554 ms vs. p 588 ms:  p = 0.271, t(28) = 1.123
  d 526 ms vs. t 485 ms:  p = 0.178, t(28) = 1.382

Medial
  b 580 ms vs. p 583 ms:  p = 0.923, t(28) = 0.098
  d 581 ms vs. t 537 ms:  p = 0.148, t(28) = 1.490
Table 8. Reaction time analyses of stimuli trimmed from word and nonword recordings, with p values from preplanned two-sample t-tests.
Words: Percent correct

Initial
  b 95% vs. p 96%:  p = 0.590, t(28) = 0.545
  d 96% vs. t 97%:  p = 0.535, t(28) = 0.629

Medial
  b 92% vs. p 93%:  p = 0.702, t(28) = 0.387
  d 87% vs. t 92%:  p = 0.138, t(28) = 1.528

Nonwords: Percent correct

Initial
  b 94% vs. p 93%:  p = 0.723, t(28) = 0.358
  d 94% vs. t 97%:  p = 0.169, t(28) = 1.411

Medial
  b 92% vs. p 92%:  p = 0.900, t(28) = 0.126
  d 88% vs. t 87%:  p = 0.748, t(28) = 0.325
Table 9. Percent correct analyses of stimuli trimmed from word and nonword recordings, with p values from preplanned two-sample t-tests.
BIBLIOGRAPHY
Abbott, Miriam. 1991. Macushi. In Handbook of Amazonian Languages, eds. D.C. Derbyshire and G.K. Pullum, 23-160. Berlin & New York: Mouton de Gruyter.
Abega, P. 1969. La Grammaire de l'Ewondo. Yaounde: Section de Linguistique Appliquée, Université fédérale du Cameroun.
Adam, Galit. 2002. From Variable to Optimal Grammar: Evidence from Language Acquisition and Language Change, Tel-Aviv University: Doctoral dissertation.
Akinlabi, Akinbiyi, and Eno E. Urua. 2002. Foot structure in the Ibibio verb. Journal of African Languages and Linguistics 23:119-160.
Albright, Adam. 2007. Natural classes are not enough: Biased generalization in novel onset clusters. Ms. Cambridge, MA.
Anderson, Gregory D. S. 2004. Areal and phonotactic distribution of ŋ. In The Internal Organization of Phonological Segments, eds. M. van Oostendorp and J. van de Weijer, 217-234. Berlin: Mouton de Gruyter.
Annamalai, E., and S. B. Steever. 1998. Modern Tamil. In The Dravidian Languages, ed. S.B. Steever, 100-128. London & New York: Routledge.
Archangeli, Diana, and Douglas Pulleyblank. 1994. Grounded Phonology. Cambridge, MA: MIT Press.
Archibald, John, and Jana Carson. 2000. Acquisition of Quebec French Stress: University of Calgary.
Armbruster, Charles Hubert. 1960. Dongolese Nubian: A Grammar. Cambridge: Cambridge University Press.
Ashby, F. Gregory, and W. Todd Maddox. 1994. A response time theory of separability and integrality in speeded classification. Journal of Mathematical Psychology 38:423-466.
Baxter, Alan N. 1988. A Grammar of Kristang (Malacca Portuguese Creole): Pacific Linguistics B-95. Canberra: Australian National University.
Beckman, Jill. 1999. Positional Faithfulness: An Optimality Theoretic Treatment of Phonological Asymmetries: Outstanding Dissertations in Linguistics. New York & London: Garland.
Bell, Alan. 1971. Some patterns of occurrence and formation of syllable structures. In Working Papers on Language Universals 6, 23-137. Stanford, CA: Department of Linguistics, Stanford University.
Berent, Iris, Donca Steriade, Tracy Lennertz, and Vered Vaknin. 2007. What we know about what we have never heard: Evidence from perceptual illusions. Cognition 104:591-630.
Bhaskararao, Peri. 1998. Gadaba. In The Dravidian Languages, ed. S.B. Steever, 328-358. London & New York: Routledge.
Birnbaum, Solomon A. 1979. Yiddish: A Survey and a Grammar. Toronto: University of Toronto Press.
Blevins, Juliette. 1995. The syllable in phonological theory. In The Handbook of Phonological Theory, ed. J.A. Goldsmith, 206-244. Cambridge, MA, & Oxford: Blackwell.
Blevins, Juliette. 2004. Evolutionary Phonology: The Emergence of Sound Patterns. Cambridge: Cambridge University Press.
Blevins, Juliette. 2006. A theoretical synopsis of Evolutionary Phonology. Theoretical Linguistics 32:117-166.
Bloomfield, Leonard. 1962. The Menomini Language. New Haven: Yale University Press.
Boersma, Paul, and Bruce Hayes. 2001. Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry 32:45-86.
Boersma, Paul, and David Weenink. 2006. Praat: Doing phonetics by computer.
Boersma, Paul, and Silke Hamann. 2007a. The evolution of auditory contrast: Universiteit van Amsterdam and Utrecht University.
Boersma, Paul, and Silke Hamann. 2007b. The evolution of auditory contrast. Ms.
Boersma, Paul. to appear. Some listener-oriented accounts of h-aspiré in French. Lingua.
Bolognesi, Roberto. 1998. The Phonology of Campidanian Sardinian: A Unitary Account of a Self-Organizing Structure, University of Amsterdam: Doctoral dissertation.
Borgman, Donald M. 1990. Sanuma. In Handbook of Amazonian Languages, eds. D.C. Derbyshire and G.K. Pullum, 15-248. Berlin & New York: Mouton de Gruyter.
Branch, Michael. 1987. Finnish. In The World's Major Languages, ed. B. Comrie, 593-618. Oxford: Oxford University Press.
Breen, Gavan. 2004. Innamincka Words: Yandruwandha Dictionary and Stories. Canberra: Pacific Linguistics, Research School of Pacific and Asian Studies, Australian National University.
Bridgeman, Loraine I. 1961. Kaiwa (Guarani) phonology. International Journal of American Linguistics 27:329-334.
Broadbent, S. M. 1964. The Southern Sierra Miwok Language. Berkeley & Los Angeles: University of California Press.
Bromley, H. Myron. 1961. The Phonology of Lower Grand Valley Dani: A Comparative Structural Study of Skewed Phonemic Patterns. 's-Gravenhage: M. Nijhoff.
Broselow, Ellen, Su-I Chen, and Chilin Wang. 1998. The emergence of the unmarked in second language phonology. Studies in Second Language Acquisition 20:261-280.
Broselow, Ellen. 2003. Marginal phonology: Phonotactics on the edge. The Linguistic Review 20:159-193.
Bugenhagen, Robert D. 1995. A Grammar of Mangap-Mbula: An Austronesian Language of Papua New Guinea: Pacific Linguistics C-101. Canberra: Australian National University.
Buller, Barbara, Ernest Buller, and Daniel L. Everett. 1993. Stress placement, syllable structure, and minimality in Banawa. International Journal of American Linguistics 59:280-293.
Busenitz, Robert L., and Marilyn J. Busenitz. 1991. Balantak phonology and morphophonemics. In Studies in Sulawesi Linguistics, Part II, ed. J.J. Sneddon, 29-47: NUSA.
Byrd, Dani. 2000. Articulatory vowel lengthening and coordination at phrasal junctures. Phonetica 57:3-16.
Caughley, Ross Charles. 2000. Dictionary of Chepang: A Tibeto-Burman Language of Nepal. Canberra: Australian National University.
Cho, Taehong, and Sun Ah Jun. 2000. Domain-initial strengthening as an enhancement of laryngeal features: Aerodynamic evidence from Korean. UCLA Working Papers in Phonetics 99:57-79.
Clarke, Sandra. 1982. North-West River (Sheshātshīt) Montagnais: A Grammatical Sketch. Ottawa: National Museums of Canada.
Connell, Bruce. 1994. The Lower Cross languages: A prolegomena to the classification of the Cross River languages. Journal of West African Linguistics 24:3-46.
Côté, Marie-Hélène. 1999. Edge effects and the prosodic hierarchy: Evidence from stops and affricates in Basque. In Proceedings of the 29th Annual Meeting of the North Eastern Linguistic Society, eds. P. Tamanji, M. Hirotani and N. Hall, 51-65. Amherst, MA: GLSA.
Crowley, Terry. 1998. Ura. Munchen: Lincom Europa.
Crum, Beverly, and Jon P. Dayley. 1993. Western Shoshoni Grammar. Boise, Idaho: Dept. of Anthropology, Boise State University.
Cutler, A., and D. G. Norris. 1988. The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance 14:113-121.
Davies, John. 1981. Kobon. Amsterdam: North-Holland.
Dayley, Jon P. 1985. Tzutujil Grammar. Berkeley: University of California Press.
Dayley, Jon P. 1989. Tümpisa (Panamint) Shoshone Grammar. Berkeley: University of California Press.
de Boer, Bart. 2001. The Origins of Vowel Systems. Oxford: Oxford University Press.
de Boer, Bart G. 2000. Self-organization in vowel systems. Journal of Phonetics 28:441-465.
de Lacy, Paul. 2000. Markedness in prominent positions. In Proceedings of HUMIT (MIT Working Papers in Linguistics), ed. A. Szcegielniak. Cambridge, MA: Department of Linguistics and Philosophy, MIT.
de Lacy, Paul. 2002a. The Formal Expression of Markedness, University of Massachusetts, Amherst: Doctoral dissertation.
de Lacy, Paul. 2002b. Maximal words and the Maori passive. In Proceedings of the Austronesian Formal Linguistics Association (AFLA) VIII, ed. N. Richards. Cambridge, MA: MIT Working Papers in Linguistics.
Delgutte, Bertrand. 1997. Auditory neural processing of speech. In The Handbook of Phonetic Sciences, eds. W.J. Hardcastle and J. Laver, 507-538. Oxford: Blackwell.
Dell, François, and Mohamed Elmedlaoui. 1985. Syllabic consonants and syllabification in Imdlawn Tashlhiyt Berber. Journal of African Languages and Linguistics 7:105-130.
Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39:1-38.
Dixon, R. M. W. 1991. Mbabaram. In The Handbook of Australian Languages, eds. Robert M. W. Dixon and B. Blake. Melbourne: Oxford University Press Australia.
Dixon, R. M. W. 2002. Australian Languages. Cambridge: Cambridge University Press.
Downing, Laura J. 1998. On the prosodic misalignment of onsetless syllables. Natural Language and Linguistic Theory 16:1-52.
Ebert, Karen H. 1997. Camling (Chamling). Munchen: Lincom Europa.
Elbert, Samuel H. 1974. Puluwat Grammar: Pacific Linguistics B-29. Canberra: Australian National University.
Elbert, Samuel H., and Mary Kawena Pukui. 1979. Hawaiian Grammar. Honolulu: University of Hawaii Press.
Elders, Stefan. 2000. Grammaire Mundang. Leiden: Research School of Asian, African, and Amerindian Studies, Universiteit Leiden.
Elfenbein, Josef. 1998. Brahui. In The Dravidian Languages, ed. S.B. Steever, 388-414. London & New York: Routledge.
Engelenhoven, Aone Thomas Pieter Gerrit van. 2004. Leti: A Language of Southwest Maluku. Leiden: KITLV Press.
England, Nora C. 1983. A Grammar of Mam, a Mayan Language. Austin: University of Texas Press.
Essien, Okon E. 1990. A Grammar of the Ibibio Language. Ibadan: University Press Limited.
Evans, Nicholas. 2003. Bininj Gun-Wok: A Pan-Dialectal Grammar of Mayali, Kunwinjku and Kune. Canberra: Australian National University.
Everett, Daniel. 1990. Minimality in Kama and Bawana. Ms. Philadelphia.
Ferrer, Eduardo Blasco. 1994. Ello Ellus: Grammatica Sarda. Nuoro: Poliedro Edizioni.
Fikkert, Paula. 1994. On the Acquisition of Prosodic Structure. The Hague: Holland Academic Graphics.
Flack, Kathryn. 2006. Lateral phonotactics in Australian languages. In Proceedings of NELS 35, eds. L. Bateman and C. Ussery, 187-199. Amherst, MA: GLSA.
Flemming, Edward. 2004. Contrast and perceptual distinctiveness. In Phonetically Based Phonology, eds. Bruce Hayes, Robert Kirchner and Donca Steriade. Cambridge: Cambridge University Press.
Fortescue, Michael. 1984. West Greenlandic. London: Croom Helm.
Fougeron, Cecile, and Patricia A. Keating. 1996. Articulatory strengthening in prosodic domain-initial position. UCLA Working Papers in Phonetics 92:61-87.
Frajzyngier, Zygmunt. 2001. A Grammar of Lele. Stanford CA: CSLI.
Furby, Christine. 1974. Garawa Phonology. Canberra: Australian National University.
Ghosh, Arun. 1994. Santali: A Look into Santal Morphology. New Delhi: Gyan Publishing House.
Gnanadesikan, Amalia. 2004. Markedness and faithfulness constraints in child phonology. In Constraints in Phonological Acquisition, eds. R. Kager, J. Pater and W. Zonneveld, 73-108. Cambridge: Cambridge University Press.
Goad, Heather, and Yvan Rose. 2004. Input elaboration, head faithfulness, and evidence for representation in the acquisition of left-edge clusters in West Germanic. In Constraints in Phonological Acquisition, eds. R. Kager, J. Pater and W. Zonneveld, 109-157. Cambridge: Cambridge University Press.
Goldsmith, John. 1990. Autosegmental and Metrical Phonology. Cambridge, MA & Oxford: Blackwell.
Gordon, Kent H. 1976. Phonology of Dhangar-Kurux. Kathmandu: Institute of Nepal and Asian Studies, Tribhuvan University.
Gordon, Lynn. 1986. Maricopa Morphology and Syntax. Berkeley: University of California Press.
Gouskova, Maria. 2003. Deriving Economy: Syncope in Optimality Theory, University of Massachusetts Amherst: Doctoral dissertation.
Greenberg, Joseph. 1941. Some problems in Hausa phonology. Language 17:316-323.
Gregores, Emma, and Jorge A. Suarez. 1967. A Description of Colloquial Guarani. The Hague: Mouton.
Hagège, Claude. 1967. Description phonologique du parler Wori. Journal of West African Linguistics 4:15-34.
Hahn, Reinhard F., and Ablahat Ibrahim. 1991. Spoken Uyghur. Seattle: University of Washington Press.
Halle, Morris. 1959. The Sound Pattern of Russian. The Hague: Mouton.
Hamilton, Philip J. 1996. Phonetic Constraints and Markedness in the Phonotactics of Australian Languages, University of Toronto: Doctoral dissertation.
Hansen, K. C., and L. E. Hansen. 1978. The Core of Pintupi Grammar. Alice Springs, Northern Territory: Institute for Aboriginal Development.
Harris, John, and Edmund Gussmann. 1998. Final codas: Why the west was wrong. In Structure and Interpretation: Studies in Phonology, ed. E. Cyran, 139-162. Lublin: Folium.
Harvey, Mark. 1991. Glottal stop, underspecification, and syllable structures among the Top End languages. Australian Journal of Linguistics 11:67-105.
Hay, Jessica. 2005. How Auditory Discontinuities and Linguistic Experience Affect the Perception of Speech and Non-Speech in English- and Spanish-Speaking Listeners, University of Texas Austin: Doctoral dissertation.
Hayes, Bruce. 1995. Metrical Stress Theory: Principles and Case Studies. Chicago: The University of Chicago Press.
Hayes, Bruce, Robert Kirchner, and Donca Steriade eds. 2004. Phonetically Based Phonology. Cambridge: Cambridge University Press.
Hayes, Bruce, and Colin Wilson. to appear. A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry.
Hayes, Bruce P. 1999. Phonetically driven phonology: The role of Optimality Theory and inductive grounding. In Formalism and Functionalism in Linguistics, vol. 1, eds. M. Darness, E. A. Moravcsik, F. Newmeyer, M. Noonan and K. M. Wheatley, 243-285. Amsterdam: Benjamins.
Healey, Alan. 1981a. Telefol Phonology: Pacific Linguistics B-3. Canberra: Australian National University.
Healey, Alan. 1981b. The phonological complexity of Kapau. In Angan Languages are Different: Four Phonologies, ed. P. M. Healey, 5-16. Karumpa, E.J.P., Papua New Guinea: SIL.
Healey, Phyllis, and Alan Healey. 1977. Telefol Dictionary: Pacific Linguistics C-46. Canberra: Australian National University.
Heath, Jeffrey. 1984. A Functional Grammar of Nunggubuyu. Canberra: AIAS.
Heath, Jeffrey. 1989. From Code-Switching to Borrowing: Foreign and Diglossic Mixing in Moroccan Arabic. London: Kegan Paul International.
Helimski, Eugene. 1998a. Selkup. In The Uralic Languages, ed. D. Abondolo, 548-579. London & New York: Routledge.
Helimski, Eugene. 1998b. Nganasan. In The Uralic Languages, ed. D. Abondolo, 480-515. London & New York: Routledge.
Hooper [Bybee], Joan. 1976. An Introduction to Natural Generative Phonology. New York: Academic Press.
Hume, Elizabeth. 1998. Metathesis in phonological theory: The case of Leti. Lingua 104:147-186.
Hume, Elizabeth V., and Georgios Tserdanelis. 2002. Labial unmarkedness in Sri Lankan Portuguese Creole. Phonology 19:441-458.
Hyman, Larry M. 1978. Word demarcation. In Universals of Human Language, Vol. 2: Phonology, ed. J. H. Greenberg, 443-470. Palo Alto: Stanford University Press.
Ito, Junko. 1986. Syllable Theory in Prosodic Phonology, University of Massachusetts, Amherst: Doctoral dissertation.
Ito, Junko. 1989. A prosodic theory of epenthesis. Natural Language and Linguistic Theory 7:217-259.
Ito, Junko, and Armin Mester. 1994. Reflections on CodaCond and Alignment. In Phonology at Santa Cruz, eds. Jason Merchant, Jaye Padgett and Rachel Walker, 27-46. Santa Cruz: Linguistics Research Center, UC Santa Cruz.
Ito, Junko, and Armin Mester. 2003. Weak layering and word binarity. In A New Century of Phonology and Phonological Theory: A Festschrift for Professor Shosuke Haraguchi on the Occasion of His Sixtieth Birthday, eds. Takeru Honma, Masao Okazaki, Toshiyuki Tabata and Shin-ichi Tanaka, 26-65. Tokyo: Kaitakusha.
Jarosz, Gaja. 2006. Rich Lexicons and Restrictive Grammars - Maximum Likelihood Learning in Optimality Theory, Johns Hopkins University: Doctoral dissertation.
Jusczyk, Peter W. 1997. The Discovery of Spoken Language. Cambridge, MA: MIT Press.
Ka, Omar. 1994. Wolof Phonology and Morphology. Lanham MD: University Press of America.
Katz, D. 1987. The Grammar of the Yiddish Language. London: Duckworth.
Kawahara, Shigeto. 2006a. Mimetic gemination in Japanese: A challenge for Evolutionary Phonology. Theoretical Linguistics 32:411-424.
Kawahara, Shigeto. 2006b. A faithfulness ranking projected from a perceptibility scale: The case of [+ voice] in Japanese. Language 82:536-574.
Kawahara, Shigeto, and Kaori Akashi. 2006. The markedness hierarchy of geminates and mimetic gemination in Japanese: University of Massachusetts Amherst.
Kawahara, Shigeto. 2007. The Emergence of Phonetic Naturalness, University of Massachusetts Amherst: Doctoral dissertation.
Keating, Patricia. 1988. Underspecification in phonetics. Phonology 5:275-292.
Keating, Patricia A., Taehong Cho, Cecile Fougeron, and Chai-Shune Hsu. 1999. Domain-initial strengthening in four languages. UCLA Working Papers in Phonetics 97:137-151.
Keresztes, László. 1998. Mansi. In The Uralic Languages, ed. D. Abondolo, 387-427. London & New York: Routledge.
Kessinger, Rachel H., and Sheila E. Blumstein. 1997. Effects of speaking rate on voice-onset time in Thai, French, and English. Journal of Phonetics 25:143-168.
Key, Harold, and Mary Key. 1953. The phonemes of Sierra Nahuat. International Journal of American Linguistics 19:53-56.
Krishnamurti, Bh. 1998. Telugu. In The Dravidian Languages, ed. Sanford B. Steever, 202-240. London & New York: Routledge.
Kiparsky, Paul. 1979. Metrical structure assignment is cyclic. Linguistic Inquiry 10:421-441.
Kiparsky, Paul. 2006. Amphichronic linguistics vs. Evolutionary Phonology. Theoretical Linguistics 32:217-236.
Kite, Suzanne, and Stephen Wurm. 2004. The Duunidjawu Language of the Southeast Queensland: Grammar, Texts and Vocabulary. Canberra: Australian National University.
Kornfilt, Jaklin. 1997. Turkish. London & New York: Routledge.
Krishnamurti, Bh., and Brett A. Benham. 1998. Konda. In The Dravidian Languages, ed. S.B. Steever, 241-269. London & New York: Routledge.
Kroeber, A. L., and George William Grace. 1960. The Sparkman Grammar of Luiseño. Berkeley: University of California Press.
Kuipers, Aert Hendrik. 1967. The Squamish Language: Grammar, Texts, Dictionary. The Hague & Paris: Mouton & Co.
Ladefoged, Peter, and Ian Maddieson. 1996. The Sounds of the World's Languages. Oxford and Malden, MA: Blackwell.
Lichtenberk, Frantisek. 1983. A Grammar of Manam. Honolulu: University of Hawaii Press.
Lindblom, Björn. 1986. Phonetic universals in vowel systems. In Experimental Phonology, eds. John J. Ohala and Jeri J. Jaeger, 13-44. Orlando: Academic Press.
Lisker, Leigh, and Arthur Abramson. 1964. A cross-language study of voicing in initial stops: Acoustical measurements. Word 20:384-422.
Lloyd, J., and A. Healey. 1970. Barua phonemes: A problem in interpretation. Linguistics 60:33-48.
Lombardi, Linda. 2001. Why Place and Voice are different: Constraint-specific alternations in Optimality Theory. In Segmental Phonology in Optimality Theory: Constraints and Representations, ed. L. Lombardi, 13-45. Cambridge: Cambridge University Press.
Luce, Paul A., and David B. Pisoni. 1988. Recognizing spoken words: The neighborhood activation model. Ear and Hearing 19:1-36.
Lynch, John. 2000. A Grammar of Anejom. Canberra: Research School of Pacific and Asian Studies, Australian National University.
Macaulay, Monica Ann. 1996. A Grammar of Chalcatongo Mixtec. Berkeley: University of California Press.
Marslen-Wilson, William D. 1975. Sentence perception as an interactive parallel process. Science 189:226-228.
Marslen-Wilson, William D., and Alan Welsh. 1978. Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology 10:29-63.
Maryott, Kenneth R. 1961. The phonology and morphophonemics of Tabukang Sangir. Philippine Social Sciences and Humanities Review 26:111-126.
Mascaró, Joan, and Leo Wetzels. 2001. The typology of voicing and devoicing. Language 77:207-244.
Matteson, Esther. 1965. The Piro (Arawakan) Language. Berkeley: University of California Press.
Maye, Jessica. 2000. Learning Speech Sound Categories from Statistical Information, University of Arizona: Doctoral dissertation.
McCarthy, John J., and Alan Prince. 1993a. Generalized alignment. In Yearbook of Morphology 1993, 79-153.
McCarthy, John J., and Alan Prince. 1993b. Generalized alignment. In Yearbook of Morphology, eds. G. Booij and J. van Marle, 79-153. Dordrecht: Kluwer.
McCarthy, John J., and Alan Prince. 1994. The emergence of the unmarked: Optimality in prosodic morphology. In Proceedings of the North East Linguistic Society 24, ed. M. Gonzàlez, 333-379. Amherst, MA: GLSA Publications.
McCarthy, John J. 1998. Constraints on word edges. Handout for a talk presented at Johns Hopkins University, March 12, 1998.
McCarthy, John J. 1999. Sympathy and phonological opacity. Phonology 16:331-399.
McCarthy, John J. 2001. ŋ. Handout for a talk given at the University of Massachusetts, Amherst Summer Phonology Group, July 18, 2001.
McKaughan, H. 1973. Introduction. In The Languages of the Eastern Family of the East New Guinea Highland Stock, ed. H. McKaughan. Seattle: University of Washington Press.
Merchant, Nazarré, and Bruce Tesar. to appear. Learning underlying forms by searching restricted subspaces. In The Proceedings of CLS 41. Chicago: Chicago Linguistics Society.
Mikuteit, Simone. 2006. A Cross Linguistic Inquiry on Voice, Quantity and Aspiration, Universität Konstanz: Doctoral dissertation.
Milner, G. B. 1958. Aspiration in two Polynesian languages. Bulletin of the School of Oriental and African Studies 21:368-375.
Mithun, Marianne, and Hasan Basri. 1986. The phonology of Selayarese. Oceanic Linguistics 25:210-254.
Moreton, Elliott. 2002. Structural constraints in the perception of English stop-sonorant clusters. Cognition 84:55-71.
Nellis, Donald G., and Barbara E. Hollenbach. 1980. Fortis versus lenis in Cajonos Zapotec phonology. International Journal of American Linguistics 46:92-105.
Nespor, Marina, and Irene Vogel. 1986. Prosodic Phonology. Dordrecht: Foris.
New, Boris, and Christophe Pallier. 2005. LEXIQUE 3.02: Une base de données lexicales libre.
Newell, Leonard E. 1956. Phonology of the Guhang Ifugao dialect. Philippine Journal of Science 85:523-539.
Nigam, Kamal, Andrew McCallum, and Tom Mitchell. 2006. Semi-supervised text classification using EM. In Semi-Supervised Learning, eds. O. Chapelle, A. Zien and B. Scholkopf, 33-56. Cambridge, MA: MIT Press.
Noonan, Michael. 1992. A Grammar of Lango. Berlin & New York: Mouton de Gruyter.
Nooteboom, Sieb G. 1981. Lexical retrieval from fragments of spoken words: Beginnings vs. endings. Journal of Phonetics 9:407-424.
Ohala, John. 1981. The listener as a source of sound change. In Papers from the Parasession on Language and Behavior, eds. Carrie S. Masek, Roberta A. Hendrick and Mary Frances Miller, 178-203. Chicago: Chicago Linguistic Society.
Ohala, John. 1983. The origin of sound patterns in vocal tract constraints. In The Production of Speech, ed. Peter MacNeilage, 189-216. New York: Springer-Verlag.
Ohala, John J. 1990. There is no interface between phonology and phonetics: A personal view. Journal of Phonetics 18:153-171.
Oller, D. Kimbrough. 2000. The Emergence of the Speech Capacity. Mahwah, N.J.: Lawrence Erlbaum Associates.
Padgett, Jaye. 2002. Constraint conjunction versus grounded constraint subhierarchies in Optimality Theory. Ms. Santa Cruz, CA.
Parker, Stephen G. 2002. Quantifying the Sonority Hierarchy, University of Massachusetts Amherst: Doctoral dissertation.
Parker, Steve. 1994. Laryngeal codas in Chamicuro. International Journal of American Linguistics 60:261-271.
Parker, Steve. 2001. Non-optimal onsets in Chamicuro: An inventory maximised in coda position. Phonology 18:361-386.
Pater, Joe, and Jessica Barlow. 2003. Constraint conflict in cluster reduction. Journal of Child Language 30:487-526.
Pater, Joe. to appear. Morpheme-specific phonology: Constraint indexation and inconsistency resolution. In Phonological Argumentation: Essays on Evidence and Motivation, ed. Stephen G. Parker. London: Equinox.
Payne, Doris L., and Thomas E. Payne. 1990. Yagua. In Handbook of Amazonian Languages, eds. D.C. Derbyshire and G.K. Pullum, 249-474. Berlin & New York: Mouton de Gruyter.
Peasgood, Edward T. 1972. Carib phonology. In Languages of the Guianas, ed. J.E. Grimes, 35-41. Norman: Summer Institute of Linguistics of the University of Oklahoma.
Pierrehumbert, J., and D. Talkin. 1992. Lenition of /h/ and glottal stop. In Papers in Laboratory Phonology II: Gesture, Segment, Prosody, eds. G.J. Docherty and D.R. Ladd, 90-117. Cambridge: Cambridge University Press.
Pike, Kenneth, and E. Pike. 1947. Immediate constituents of Mazatec syllables. International Journal of American Linguistics 13:78-91.
Pisoni, David B., and Joan House Lazarus. 1973. Categorical and noncategorical modes of speech perception along the voicing continuum. Journal of the Acoustical Society of America 55:328-333.
Pisoni, David B., and J. Tash. 1974. Reaction times to comparisons within and across phonetic categories. Perception and Psychophysics 15:285-290.
Pitt, Mark A., and Arthur G. Samuel. 1995. Lexical and sublexical feedback in auditory word recognition. Cognitive Psychology 29:149-188.
Poppe, N. 1962. Bashkir Manual: Uralic and Altaic Series, vol. 36. Bloomington: Indiana University.
Poppe, Nikolaus. 1970. Mongolian Language Handbook. Washington, D.C.: Center for Applied Linguistics.
Prentice, D. J. 1971. The Murut Languages of Sabah: Pacific Linguistics C-18. Canberra: Australian National University.
Prentice, D. J. 1990. Malay (Indonesian and Malaysian). In The World's Major Languages, ed. B. Comrie. Oxford: Oxford University Press.
Prince, Alan, and Paul Smolensky. 1993/2004. Optimality Theory: Constraint Interaction in Generative Grammar. Malden, MA & Oxford: Blackwell.
Prince, Alan. 2002. Arguing optimality. In Papers in Optimality Theory II (= University of Massachusetts Occasional Papers 26), eds. Angela Carpenter, Andries Coetzee and Paul de Lacy, 269-304. Amherst, MA: GLSA.
Prost, André. 1956. La Langue Soŋay et ses Dialectes. Dakar: IFAN.
Pullum, Geoffrey K., and Barbara C. Scholz. 2002. Empirical assessment of stimulus poverty arguments. The Linguistic Review 19:9-50.
Ramaswami, N. 1992. Bhumij grammar. Mysore: Central Institute of Indian Languages.
Ramsey, S. Robert. 1987. The Languages of China. Princeton, NJ: Princeton University Press.
Rath, John C. 1981. A Practical Heiltsuk-English Dictionary with a Grammatical Introduction. Ottawa: National Museums of Canada.
Rennison, John R. 1997. Koromfe. London & New York: Routledge.
Repp, Bruno. 1979. Relative amplitude of aspiration noise as a voicing cue for syllable-initial stop consonants. Language and Speech 22:173-189.
Rutgers, Roland. 1998. Yamphu: Grammar, Texts & Lexicon. Leiden: Research School CNWS, School of Asian, African, and Amerindian Studies.
Salzmann, Zdenek. 1956. Arapaho I: Phonology. International Journal of American Linguistics 22:49-56.
Samarin, William J. 1966. The Gbeya Language: Grammar, Texts, and Vocabularies. Berkeley: University of California Press.
Sapir, Edward, and Morris Swadesh. 1960. Yana Dictionary: University of California Publications in Linguistics, v. 22. Berkeley: University of California Press.
Saul, Janice E., and Nancy F. Wilson. 1980. Nung Grammar. Dallas: Summer Institute of Linguistics.
Schaub, Willi. 1985. Babungo. London & Dover NH: Croom Helm.
Selkirk, Elisabeth. 1981. On the nature of phonological representation. In The Cognitive Representation of Speech, eds. J. Anderson, J. Laver and T. Meyers. Amsterdam: North Holland.
Selkirk, Elisabeth. 1984. Phonology and Syntax: The Relation between Sound and Structure. Cambridge, MA: MIT Press.
Selkirk, Elisabeth. 1995. The prosodic structure of function words. In Papers in Optimality Theory, eds. J. Beckman, L. Walsh Dickey and S. Urbanczyk, 439-470. Amherst, MA: GLSA Publications.
Shukla, Shaligram. 1981. Bhojpuri Grammar. Washington, DC: Georgetown University Press.
Smith, Jennifer L. 2002. Phonological Augmentation in Prominent Positions, University of Massachusetts Amherst: Doctoral dissertation.
Smolensky, Paul. 1995. On the internal structure of the constraint component Con of UG. Handout from talk, University of Arizona.
Smolensky, Paul. 1997. Constraint interaction in generative grammar II: Local conjunction, or random rules in Universal Grammar. Paper presented at Hopkins Optimality Theory Workshop/Maryland Mayfest '97, Baltimore, MD.
Smythe, W. E. 1948. Elementary Grammar of the Gumbaiŋgar Language (North Coast, N. S. W.). Sydney: Australian National Research Council.
Sohn, Ho-min. 1973. A Ulithian Grammar: Pacific Linguistics C-27. Canberra: Australian National University.
Sohn, Ho-min. 1975. Woleaian Reference Grammar. Honolulu: University of Hawaii Press.
Sommer, B. A. 1969. Kunjen Phonology: Synchronic and Diachronic: Pacific Linguistics B-11. Canberra: Australian National University.
Spring, Cari. 1990. Implications of Axininca Campa for Prosodic Morphology and Reduplication. Doctoral dissertation, University of Arizona.
Stampe, David. 1973. A Dissertation on Natural Phonology. Doctoral dissertation, University of Chicago.
Steever, Sanford B. 1998. Kannada. In The Dravidian Languages, ed. S.B. Steever, 129-157. London & New York: Routledge.
Steriade, Donca. 1988. Reduplication and syllable transfer in Sanskrit and elsewhere. Phonology 5:73-155.
Steriade, Donca. 1999. Alternatives to the syllabic interpretation of consonantal phonotactics. In Proceedings of the 1998 Linguistics and Phonetics Conference, eds. O. Fujimura, B. Joseph and B. Palek, 205-242. Prague: The Karolinum Press.
Steriade, Donca. 2001a. The phonology of perceptibility effects: The P-map and its consequences for constraint organization. Ms. Los Angeles.
Steriade, Donca. 2001b. Directional asymmetries in place assimilation. In The Role of Speech Perception in Phonology, eds. Elizabeth Hume and Keith Johnson, 219-250. San Diego: Academic Press.
Stonham, John T. 1999. Aspects of Tsishaat Nootka Phonetics and Phonology. München: Lincom Europa.
Subrahmanyam, P. S. 1998. Kolami. In The Dravidian Languages, ed. S.B. Steever, 301-327. London & New York: Routledge.
Sullivan, Thelma D. 1988. Compendio de la Gramática Náhuatl. Salt Lake City: University of Utah Press.
Tabain, Marija, Gavan Breen, and Andrew Butcher. 2004. VC vs. CV syllables: A comparison of Aboriginal languages with English. Journal of the International Phonetic Association 34:175-200.
Teeter, Karl V. 1964. The Wiyot Language. Berkeley: University of California Press.
Tesar, Bruce, and Paul Smolensky. 1994. The learnability of Optimality Theory. In Proceedings of the Thirteenth West Coast Conference on Formal Linguistics, eds. Raul Aranovich, William Byrne, Susanne Preuss and Martha Senturia, 122-137. Stanford, CA: CSLI Publications.
Tesar, Bruce, and Paul Smolensky. 2000. Learnability in Optimality Theory. Cambridge, MA: MIT Press.
Trail, Ronald L. 1970. The Grammar of Lamani. Norman, OK: Summer Institute of Linguistics of the University of Oklahoma.
Trefry, D. 1969. A Comparative Study of Kuman and Pawaian: Pacific Linguistics B-13. Canberra: Australian National University.
Trigo, Rosario L. 1988. On the Phonological Derivation and Behavior of Nasal Glides. Doctoral dissertation, MIT.
Trigo, Rosario L. 1991. On pharynx-larynx interactions. Phonology 8:113-136.
Truckenbrodt, Hubert. 1999. On the relation between syntactic phrases and phonological phrases. Linguistic Inquiry 30:219-256.
Tucker, Archibald Norman. 1967. The Eastern Sudanic Languages. London: International African Institute.
Tyler, S. A. 1969. Koya: An Outline Grammar: University of California Publications in Linguistics no. 54. Berkeley: University of California Press.
van Driem, George. 1987. A Grammar of Limbu. Berlin & New York: Mouton de Gruyter.
Van Haitsma, J. D., and W. Van Haitsma. 1976. A Hierarchical Sketch of Mixe as Spoken in San José El Paraíso: SIL Publications 44. Norman, OK: Summer Institute of Linguistics of the University of Oklahoma.
van Minde, Don. 1997. Malayu Ambong: Phonology, Morphology, Syntax. Leiden: Research School CNWS.
van Oostendorp, Marc. 2004. Crossing morpheme boundaries in Dutch. Lingua 114:1367-1400.
Vihman, Marilyn May. 1996. Phonological Development: The Origins of Language in the Child. Oxford: Blackwell.
Waters, Bruce E. 1989. Djinang and Djinba: A Grammatical and Historical Perspective: Pacific Linguistics C-114. Canberra: Australian National University.
West, Birdie, and Betty Welch. 1967. Phonemic system of Tucano. In Phonemic Systems of Colombian Languages, ed. V.G. Waterhouse, 11-24. Norman, OK: Summer Institute of Linguistics of the University of Oklahoma.
Whitney, William D. 1889. Sanskrit Grammar, Including Both the Classical Language and the Older Dialects, of Veda and Brahmana. Cambridge, MA: Harvard University Press.
Wiering, Elisabeth, and Marinus Wiering. 1986. The Doyayo Language: Selected Studies. Dallas: Summer Institute of Linguistics.
Wiese, Richard. 1996. The Phonology of German: The Phonology of the World's Languages. Oxford: Clarendon Press.
Williamson, Kay. 1969. A Grammar of the Kolokuma Dialect of Ijo. Cambridge: Cambridge University Press.
Wilson, Colin. 2003. Experimental investigation of phonological naturalness. In Proceedings of the 22nd West Coast Conference on Formal Linguistics, eds. G. Garding and M. Tsujimura, 533-546. Somerville, MA: Cascadilla Press.
Wiltshire, Caroline. 2003. Beyond codas: Word and phrase-final alignment. In The Syllable in Optimality Theory, eds. C. Féry and R. van de Vijver, 254-268. Cambridge: Cambridge University Press.