Connectionist Models and Linguistic Theory
8/11/2019 Connectionist Models and Linguistic Theory
1/50
COGNITIVE SCIENCE 18, 1-50 (1994)

Connectionist Models and Linguistic Theory:
Investigations of Stress Systems in Language

PRAHLAD GUPTA
DAVID S. TOURETZKY
Carnegie Mellon University
We question the widespread assumption that linguistic theory should guide the formulation of mechanistic accounts of human language processing. We develop a pseudo-linguistic theory for the domain of linguistic stress, based on observation of the learning behavior of a perceptron exposed to a variety of stress patterns. There are significant similarities between our analysis of perceptron stress learning and metrical phonology, the linguistic theory of human stress. Both approaches attempt to identify salient characteristics of the stress systems under examination without reference to the workings of the underlying processor. Our theory and computer simulations exhibit some strikingly suggestive correspondences with metrical theory. We show, however, that our high-level pseudo-linguistic account bears no causal relation to processing in the perceptron, and provides little insight into the nature of this processing. Because of the persuasive similarities between the nature of our theory and linguistic theorizing, we suggest that linguistic theory may be in much the same position. Contrary to the usual assumption, it may not provide useful guidance in attempts to identify processing mechanisms underlying human language.
The fundamental questions to be answered about language are, as Noam Chomsky has pointed out: (1) What is the system of knowledge in the mind/brain of the speaker of English or Spanish or Japanese? (2) How does this knowledge come into being? (3) How is this knowledge used? and (4) What are the physical mechanisms serving as the basis for this system of knowledge, and for its use? (Chomsky, 1988, p. 3).
We would like to thank Deirdre Wheeler for introducing us to the literature on stress and providing guidance in the early stages of this work. We also thank Gary Cottrell, Elan Dresher, Jeff Elman, Dan Everett, Michael Gasser, Brian MacWhinney, Jay McClelland, Eric Nyberg, Brad Pritchett, Steve Small, Deirdre Wheeler, and an anonymous reviewer for helpful comments on earlier versions of this paper, and David Evans for access to computing facilities at Carnegie Mellon's Laboratory for Computational Linguistics. Of course, none of them is responsible for any errors in this work, nor do they necessarily agree with the opinions expressed here.
The second author was supported by a grant from Hughes Aircraft Corporation, and by the Office of Naval Research under contract number N00014-86-K-0678.
Correspondence and requests for reprints should be sent to Prahlad Gupta, Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213.
Linguistic theory is viewed, in the Chomskyan tradition, as illuminating the nature of mind, and as guiding the search for lower-level processing. Thus, according to Chomsky, to the extent that the linguist can provide answers to the first three questions above, the brain scientist can begin to explore answers to the fourth question, that is, can begin to explore the physical mechanisms that exhibit the properties revealed in the linguist's abstract theory. In the absence of answers to the first three questions, brain scientists do not know what they are searching for (Chomsky, 1988, p. 6).
Chomsky distinguishes three levels of analysis in linguistic inquiry: observation, description, and explanation; these are the tasks for linguistics (Chomsky, 1988, pp. 60-62). Explanation is via the formulation of Universal Grammar, which is viewed as providing a genuine explanation of the phenomena (Chomsky, 1988, p. 62). Universal Grammar reveals the principles of the mind (Chomsky, 1988, p. 91), and thus constitutes the abstract theory that is needed to guide the search for actual brain mechanisms.
This view can be stated more generally as the claim that the abstract formulations of linguistic theory are the appropriate framework within which to organize the search for the lower-level processing mechanisms of language. Our aim in this paper is to draw attention to the fact that this is an assumption, and one with important consequences. To the extent that this claim is true, it suggests that formulation of accounts of language processing, as well as the search for the neural substrates of language, should be guided by the constructs of linguistic theory. To the extent that this claim is false, the use of linguistic constructs as the basis for processing accounts and neural investigation may be misguided, and misleading.
Our goal, then, is to examine whether it is necessarily true that the constructs of linguistic theory play a valuable role in formulation of accounts of lower-level processing. In this paper, we will develop a pseudo-linguistic theory for the domain of linguistic stress, based on observation of the learning behavior of a perceptron exposed to a variety of stress patterns. We will then compare this theory with metrical phonology, the actual linguistic theory of stress, showing how our theory exhibits many of the kinds of regularities and predictions of metrical theory. Thus, our theory will be analogous, in important ways, to metrical theory. It predicts, for example, that certain stress systems will be unlearnable. We will then analyze the behavior of the underlying perceptron model in terms of its actual processing. This will show that our theoretical predictions of non-learnability bear no relation to the low-level, causal explanation we develop as part of the processing account.
We call our theory pseudo-linguistic only because it is based on data generated by the perceptron rather than humans.
In short, we will demonstrate that a lower-level account may not be isomorphic (or even similar) to a higher-level account, and that under these circumstances, the higher-level account may provide little insight into the underlying phenomena. Furthermore, since the nature and the mode of development of our higher-level account are closely analogous to the nature and development of bodies of linguistic theory, this constitutes a strong argument against assuming that the high-level accounts of linguistic theory will necessarily illuminate lower-level processing mechanisms, and against assuming that linguistic theory is the appropriate conceptual framework within which to organize the search for these mechanisms. Such arguments have of course been made before (Pylyshyn, 1973; Rumelhart & McClelland, 1987; Smolensky, 1988). However, the present work substantiates the argument with a concrete demonstration of how an analytical framework that bears a close resemblance to linguistic theory can fail to have an analogue in the underlying processing mechanisms.
The organization of the paper is as follows. The first section provides some background information about the domain of linguistic stress, while the second section describes our perceptron simulations of stress learning. The third section develops our pseudo-linguistic theory, and discusses correspondences between aspects of the perceptron simulations, our theory, and metrical theory. In the fourth section, we show that our theory in reality offers only an abstract account that is unrelated to low-level processing characteristics. Having shown that a high-level account need not illuminate underlying processing, we turn in the fifth section to a consideration of some other reasons why the analyses of metrical theory might not be appropriate primitives for a processing model. In view of all this, we conclude that the Chomskyan assumption that linguistic theory should guide investigation of processing mechanisms may be unwarranted.
1. BACKGROUND: STRESS SYSTEMS IN LANGUAGE
1.1. Motivation for Choice of Domain
In order to examine a subsystem of linguistic theory in terms of its implications for lower-level processing, a prerequisite, of course, is that the theory be clearly specified. According to Dresher and Kaye (1990), stress systems are an attractive domain for investigation because: (a) the linguistic theory is well-developed, so that compared with syntax, there is a relatively complete description of the observed phenomena, and (b) stress systems can be studied relatively independently of other aspects of language (Dresher & Kaye, 1990, p. 138). We therefore chose to construct our pseudo-linguistic theory with respect to linguistic stress, since it is regarded as a well-defined domain for which theoretical analyses provide good coverage.
1.2. Evolution of the Linguistic Theory
The analysis of stress has evolved through a number of phases:

1. Linear analyses presented stress as a phonemic feature of individual vowels, with different levels of stress representing different levels of absolute prominence. This approach was taken in Trager & Smith (1951), and culminated in Chomsky and Halle's seminal The Sound Pattern of English (1968).
2. Metrical theory, as developed by Liberman & Prince (Liberman, 1975; Liberman & Prince, 1977), introduced both a non-linear analysis of stress patterns (in terms of metrical trees), and the treatment of stress as a relative property rather than an absolute one; however, the stress feature was retained in the analysis.
3. In subsequent developments (Prince, 1976; Selkirk, 1980), reliance on this feature was eliminated by incorporation of the idea that subtrees of metrical trees had an independent status (metrical feet), so that stress assignment rules could make reference to them.
4. The positing of internal structure for syllables (McCarthy, 1979a; McCarthy, 1979b; Vergnaud & Halle, 1978) provided a means of distinguishing light and heavy syllables, a distinction to which stress patterns are widely sensitive, but which had been problematic under previous analyses.
5. An analysis of metrical tree geometries (Hayes, 1980) provided an account of many aspects of stress systems in terms of a small number of parameters.
Through the development of metrical theory, there has been debate over whether the autosegmental representations for stress are metrical trees only (Hayes, 1980), metrical grids only (Prince, 1983; Selkirk, 1984), or some combination of the two (Halle & Vergnaud, 1987a; Halle & Vergnaud, 1987b; Hayes, 1984a; Hayes, 1984b; Liberman, 1975; Liberman & Prince, 1977).
1.3. Syllable Structure
A syllable is analyzed as being comprised of an onset, which contains the material before the vowel, and a rime. The rime is comprised of a nucleus, which contains the vocalic material, and a coda, which contains any remaining (non-vocalic) material. For further discussion, see Kaye (1989, pp. 54-58).
A syllable may be open (it ends in a vowel) or closed (it ends in a consonant). In terms of syllable structure, an open syllable has a non-branching rime (the rime has a nucleus, but not a coda), and a closed syllable has a branching rime (the rime has both a nucleus and a coda).
In many languages, stress tends to be placed on certain kinds of syllables rather than on others; the former are termed heavy syllables, and the latter light syllables. What counts as a heavy or a light syllable may differ across languages in which such a distinction is present, but, most commonly, a heavy syllable is one that can be characterized as having a branching rime, and a light syllable can be characterized as having a non-branching rime (Goldsmith, 1990, p. 113). Languages that involve such a distinction between heavy and light syllables (i.e., between the weights of syllables) are termed quantity-sensitive, and languages that do not, quantity-insensitive. Note that in quantity-insensitive languages syllables can occur both with and without branching rimes, but the distinction between these kinds of syllables has no relevance for the placement of stress.
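As an illustrative sketch of the rime-based weight distinction just described (the helper names are ours, not the paper's):

```python
# Illustrative sketch (hypothetical helper names): a rime is a nucleus plus an
# optional coda. A syllable whose rime branches (i.e., has a coda) is closed;
# under the most common convention, a branching rime makes a syllable heavy.

def make_rime(nucleus, coda=""):
    """Represent a rime as a dict with a vocalic nucleus and optional coda."""
    return {"nucleus": nucleus, "coda": coda}

def is_branching(rime):
    """A rime branches iff it has a coda in addition to its nucleus."""
    return bool(rime["coda"])

def syllable_weight(rime, quantity_sensitive=True):
    """Heavy iff the rime branches; in a quantity-insensitive language the
    structural distinction exists but is irrelevant to stress placement."""
    if not quantity_sensitive:
        return "irrelevant to stress"
    return "heavy" if is_branching(rime) else "light"
```

For example, a closed syllable like "pan" (rime "an") comes out heavy, while an open syllable like "pa" (rime "a") comes out light.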
1.4. Metrical Phonology
There seems to be theoretical agreement that stress patterns are sensitive to information about syllable structure, and in particular, to the structure of the syllable rime, and not the syllable onset. We follow this assumption. Thus rime structure is taken to be the basic level at which accounts of stress systems are formulated. (For an overview of metrical theory, see Goldsmith (1990, Chapter 4), Kaye (1989, pp. 139-145), van der Hulst & Smith (1982), or Dresher & Kaye (1990, pp. 1-8).) Stress patterns are controlled by metrical structures built on top of rime structures. The version of metrical structure adopted here is metrical feet. We assume the parameters formulated by Dresher and Kaye (1990, p. 142):

(P1) The word-tree is strong on the [Left/Right]
(P2) Feet are [Binary/Unbounded]
(P3) Feet are built from the [Left/Right]
(P4) Feet are strong on the [Left/Right]
(P5) Feet are Quantity-Sensitive (QS) [Yes/No]
(P6) Feet are QS to the [Rime/Nucleus]
(P7) A strong branch of a foot must itself branch [No/Yes]
(P8) There is an extrametrical syllable [Yes/No]
(P9) It is extrametrical on the [Left/Right]
(P10) A weak foot is defooted in clash [No/Yes]
(P11) Feet are non-iterative [No/Yes]
As an example of the application of these parameters, consider the stress pattern of Maranungku, in which primary stress falls on the first syllable of the word and secondary stress on alternate succeeding syllables. Figure 1
See, for example, Dresher & Kaye (1990, p. 141) or Goldsmith (1990, p. 170). Note, however, that some researchers have presented evidence that onsets may in fact be relevant to the placement of stress (Davis, 1988; Everett & Everett, 1984).
Figure 1. Metrical structures (feet and word-tree) for a six-syllable word in Maranungku.
shows an abstract representation of a six-syllable word, with each syllable represented as σ. The assignment of stress is characterized as follows. Binary, quantity-insensitive, left-dominant feet are constructed iteratively from the left edge of the word. Each foot has a "strong" and a "weak" branch (labeled S and W, respectively, in the figure). The strong, or dominant, branch assigns stress to the syllable it dominates. Since the feet are left-dominant, odd-numbered syllables are assigned stress. Over the roots of these metrical feet, a left-dominant word-tree is constructed, which assigns stress to the structure dominated by its leftmost branch. The third and fifth syllables are each dominated by the dominant branch of one metrical structure (a foot), while the first syllable is dominated by the dominant branches of two structures (a foot, and the word-tree). Even-numbered syllables are dominated only by nondominant branches of feet. The result is that even-numbered syllables receive no stress; the third and fifth syllables receive one degree of stress (secondary stress); and the first syllable receives two degrees of stress (primary stress). The parameter settings characterizing Maranungku are [P1 Left], [P2 Binary], [P3 Left], [P4 Left], [P5 No], [P7 No], [P8 No], [P10 No], [P11 No]. Parameters P6 and P9 do not apply because of the settings of parameters P5 and P8, respectively.
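The Maranungku derivation just described can be mimicked by a small procedural sketch (ours, not the paper's; it reproduces the outcome of foot and word-tree construction rather than building the trees themselves):

```python
def maranungku_stress(n):
    """Stress contour for an n-syllable word under the Maranungku pattern:
    2 = primary, 1 = secondary, 0 = unstressed. Binary left-dominant feet
    built from the left edge stress odd-numbered syllables; the left-dominant
    word-tree promotes the leftmost foot's strong syllable to primary."""
    contour = []
    for i in range(n):            # i = 0 is the first syllable
        if i % 2 != 0:
            contour.append(0)     # weak branch of a foot: no stress
        elif i == 0:
            contour.append(2)     # strong in its foot AND in the word-tree
        else:
            contour.append(1)     # strong in its foot only
    return contour
```

For the six-syllable word of Figure 1, this yields primary stress on the first syllable and secondary stress on the third and fifth, matching the description above.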
1.5. Principles and Parameters
Metrical theory illustrates the principles and parameters approach to language, one of whose central hypotheses is that language learning proceeds through the discovery of appropriate parameter settings. Every possible human language can be characterized in terms of parameter settings; once these settings are determined, the nature of structure-sensitive operations and the structures on which they operate is known, so that the details of language processing are automatically determined (at an abstract level). Subsequently, the assignment of stress in the actual production or processing of language is assumed to involve neural processes that correspond quite directly with the abstract process of application of these parameter settings as guidelines to the construction and manipulation of metrical feet.
1.6. Previous Computational Models of Stress Learning
Computational models of stress systems in language have been developed by Dresher & Kaye (1990) and by Nyberg (1990, 1992). The focus of these models is on the learning of the parameters specified by metrical theory; they therefore take as a starting point the constructs of that theory, and incorporate its assumptions. What they add to the linguistic theory is what Dresher & Kaye term a learning theory: a specification of how the data the language learner encounters in its environment are to be used to set parameters. The following features characterize these models:

1. They assume the existence of processes explicitly corresponding to the linguistic notion of parameter setting.
2. They propose a learning theory as an account of that parameter-setting process.
3. They assume that the process of production (i.e., of producing appropriate stress contours for input words after learning has occurred) involves explicit representational structures and structure-sensitive operations directly corresponding to metrical-theoretic trees and operations on those trees.
4. They assume no necessary relationship between the processing mechanisms involved in learning vs. those involved in production. That is, learning is accomplished by supplying values for the parameters defined by metrical theory; these values then form a knowledge base for stress assignment, whose processing involves, for example, the construction of binary trees from right to left, an operation having no necessary correspondence with those by which the parameter values were acquired.

The above models exemplify the assumption we are interested in examining in the present work: that abstract linguistic formulations are the appropriate starting point for processing models.
Processing in the perceptron will differ from the Dresher & Kaye and Nyberg models with regard to the above-noted characteristics:

1. There is no explicit incorporation of parameters in the perceptron model.
2. The learning theory employed consists of one of the general learning algorithms common in connectionist modeling, and is not an account of parameter-setting.
3. The process of production does not involve explicitly structured representations in the classical sense.
4. The processing mechanisms and structures involved in production are essentially the same as those involved in learning.

Thus, to the extent that we are successful in constructing an abstract, pseudo-linguistic theory that predicts behavior of the perceptron model, it will not be merely by virtue of having built in the constructs of metrical theory to begin with.
2. OVERVIEW OF MODEL AND SIMULATIONS
2.1. Nineteen Stress Systems
Nine quantity-insensitive (QI) languages and ten quantity-sensitive (QS) languages were examined in our experiments. The data, summarized in Table 1, were taken primarily from Hayes (1980). Note that the QI stress patterns of Latvian and French, Maranungku and Weri, Lakota and Polish, and Paiute and Warao are mirror images of each other. (The QS stress patterns of Malayalam and Yapese, Ossetic and Rotuman, Eastern Permyak Komi and Eastern Cheremis, and Khalkha Mongolian and Aguacatec Mayan are also mirror images.)
2.2. Structure of the Model
In separate experiments, we taught a perceptron to produce the stress pattern of each of the nineteen languages. The domain was limited to single words, as in the previous learning models of metrical phonology developed by Dresher and Kaye (1990) and Nyberg (1990). Again as in the other models, the effects of morpho-syntactic information such as lexical category were ignored, and the simplifying assumption was made that the only relevant information about syllables was their weight.
Two input representations were used. In the syllabic representation, used for QI patterns only, a syllable was represented as a [1, 1] vector, and [0, 0] represented no syllable. In the weight-string representation, which was necessary for QS languages, the input patterns used were [1, 0] for a light syllable, [0, 1] for a heavy syllable, and [0, 0] for no syllable. For stress systems with up to two levels of stress, the output targets used in training were 1.0 for primary stress, 0.5 for secondary stress, and 0 for no stress.
Descriptions somewhat more complex than ours have been reported for Polish (Halle & Vergnaud, 1987a, pp. 57-58) and Malayalam (Hayes, 1980, pp. 66, 109). However, this does not detract from our discussion in any way, as stress systems corresponding to our simplifications are reported to exist: Swahili (Halle & Clements, 1983, p. 17) and Gurkhali (Hayes, 1980, p. 66), corresponding to Polish and Malayalam, respectively.
TABLE 1
Stress Patterns of the Nineteen Languages Examined

[Table 1 is not recoverable from this transcription. Per the text, it summarizes the stress patterns of the nine QI and ten QS languages examined, with descriptions of primary and secondary stress placement and examples, taken primarily from Hayes (1980).]
Figure 2. Perceptron model used in simulations: a single output unit (the perceptron) over an input layer of 2 x 13 units, with buffer columns numbered -6 through +6.
For stress systems with three levels of stress, the output targets were 1.0 for primary stress, 0.6 for secondary, 0.35 for tertiary, and 0 for no stress. The input data set for all stress systems consisted of all word-forms of up to seven syllables. With the syllabic input representation there are seven of these, and with the weight-string input representation, there are 254 distinct patterns. The perceptron's input array was a buffer of 13 syllables; each word was processed one syllable at a time by sliding it through the buffer (see Figure 2). The desired output at each step was the stress level of the middle syllable of the buffer. Connection weights were adjusted at each step using the back-propagation learning algorithm (Rumelhart, Hinton, & Williams,
In practice, we used weight-string training sets in which there were an equal number of input patterns of each length. Thus, there was one instance of each of the 128 (= 2^7) seven-syllable patterns, and 64 instances of each of the two monosyllabic patterns. This length balancing was necessary for certain languages for successful training. We do not know how well it agrees with the actual frequency of forms in these languages, though as a general rule high-frequency forms are probably shorter.
Note that although the architecture of the model is two-layered, with a single output unit, as in a simple perceptron, we used the back-propagation algorithm (BP) rather than the Widrow-Hoff algorithm (WH) (Widrow & Hoff, 1960). BP adds to WH the scaling of the error for each output unit by a function (the derivative) of the activation of that output unit, and thus performs a more sensitive and example-tuned weight adjustment than WH. Note that BP and WH algorithms for two layers are guaranteed by the perceptron convergence procedure to be equivalent in terms of learning capabilities, for binary-valued outputs. However, outputs in the present simulations are not always binary; they sometimes take on intermediate values. As a result, the different computation of the error term in BP turned out to provide better learning than WH.
1986). One epoch consisted of one presentation of the entire training set. The network was trained for as many epochs as necessary to ensure that the stress value produced by the perceptron was within 0.1 of the target value, for each syllable of the word, for all words in the training set. A learning rate of 0.05 and momentum of 0.90 were used in all simulations. Initial weights were uniformly distributed random values in the range of +/- 0.5. Each simulation was run at least three times, and the learning times averaged.
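The input codes and training targets described above can be collected into a small sketch (the function names are ours; the vector codes and target values are those stated in the text):

```python
# Input codes from the text: syllabic representation marks any syllable as
# [1, 1]; the weight-string representation distinguishes light ([1, 0]) from
# heavy ([0, 1]); empty buffer positions are [0, 0].
SYLLABIC = {"S": [1, 1]}
WEIGHT_STRING = {"L": [1, 0], "H": [0, 1]}
NO_SYLLABLE = [0, 0]

def encode(word, codes):
    """Map each syllable symbol of `word` to its 2-unit input vector."""
    return [codes[s] for s in word]

def target(level, three_levels=False):
    """Training targets: 1.0 primary / 0.5 secondary / 0 none for systems
    with up to two stress levels; 1.0 / 0.6 / 0.35 / 0 for three levels."""
    two = {"primary": 1.0, "secondary": 0.5, "none": 0.0}
    three = {"primary": 1.0, "secondary": 0.6, "tertiary": 0.35, "none": 0.0}
    return (three if three_levels else two)[level]
```

A light-heavy-light word is thus presented as [[1, 0], [0, 1], [1, 0]], with one target value per syllable.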
2.3. Training the Perceptron
To enable precise description, let the input buffer be regarded as a 2 x 13 array as shown in Figure 2. Let the 7th, that is, center, column be numbered 0, let the columns to the left of the center be numbered negatively -1 through -6 going outwards from the center, and the columns to the right of the center be numbered positively +1 through +6, going outwards from the center.
As an example of processing, suppose that the next word in the training set for some quantity-insensitive language is the four-syllable pattern S S S S, and the associated stress contour (i.e., target) is [0 0 0 1], indicating that the final syllable receives stress, and all other syllables are unstressed. The first syllable enters the rightmost element of the input buffer (element +6). This is the first time step of processing. At the next time step, the first syllable (in the rightmost input buffer position, that is, element +6) is shifted left to buffer element +5, and the second syllable enters the input buffer in element +6.
After two further time steps, elements +3, +4, +5, and +6 of the input buffer contain the four syllables of the current word. At the next two time steps, the leftward flow of syllables through the input buffer continues, until at time step 6, the word's four syllables are in elements +1 through +4. At time step 7, the first syllable moves into element 0 of the input buffer, and the four syllables of the word occupy elements 0 through +3 of the buffer. At this time step, training of the network occurs for the first time: the perceptron is trained to associate the pattern in its input buffer (the four syllables of the word, in buffer elements 0 through +3) with the target stress level for the syllable currently in buffer element 0. At the next time step, the syllables of the word are shifted left again, and occupy buffer elements -1 through +2. The network is trained to associate this pattern with the target stress level for the second syllable, which is now in element 0. Similarly, at the next two time steps, the perceptron is trained to produce the stress levels appropriate for the third and fourth syllables of the word, as they come to occupy element 0 of the input buffer.
To simplify discussion here, each syllable is represented as an "S" token, rather than as an "H" or "L". For quantity-insensitive systems, this information suffices to determine placement of stress.
At this point, the buffer is flushed (all elements reset to zero), and one trial is over. The next word in the training set can now enter, beginning the next trial. Thus, the processing of one word, syllable by syllable, constitutes one trial. One pass through all the words in the training set constitutes an epoch.
It should be noted that sequential processing of syllables is not a necessary part of this model. Exactly the same effect can be obtained by a parallel scheme of 13 perceptron-like units whose weight vectors are tied together by weight-sharing (Hertz, Krogh, & Palmer, 1991, p. 140). All 13 units would learn in unison, and words could be processed in one parallel step.
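The trial procedure above, together with the learning parameters from Section 2.2 (learning rate 0.05, initial weights in +/- 0.5) and the derivative-scaled BP error term described in the footnote, can be sketched as follows. This is our own reconstruction under stated assumptions (a sigmoid output unit with a bias; momentum omitted for brevity), not the authors' code:

```python
import math
import random

BUFFER = 13   # columns of the 2 x 13 input array; column index 6 is "element 0"
LR = 0.05     # learning rate reported in the simulations

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_trial(w, syllables, targets, lr=LR):
    """One trial: slide the word leftward through the buffer, and at each step
    where syllable k occupies the center column, train the output unit on that
    syllable's target stress level. `w` is a flat weight list of length
    2*BUFFER + 1 (last entry is the bias). Returns the summed squared error."""
    err = 0.0
    for k in range(len(syllables)):      # syllable k is in the center column
        x = [0.0] * (2 * BUFFER)         # empty positions are [0, 0]
        for i, syl in enumerate(syllables):
            col = 6 - k + i              # word occupies columns 6-k ... onwards
            if 0 <= col < BUFFER:
                x[2 * col], x[2 * col + 1] = syl
        y = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])
        # BP error term: (target - output) scaled by the sigmoid derivative,
        # the "more sensitive" adjustment contrasted with Widrow-Hoff above.
        delta = (targets[k] - y) * y * (1.0 - y)
        for i in range(2 * BUFFER):
            w[i] += lr * delta * x[i]
        w[-1] += lr * delta              # bias update
        err += (targets[k] - y) ** 2
    return err
```

As in the text, no weight updates occur during the initial buffer-filling steps; training happens only once a syllable reaches the center column, so the loop ranges over syllables rather than raw time steps.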
3. PERCEPTRON STRESS LEARNING:
A PSEUDO-LINGUISTIC ACCOUNT
In this section, we analyze the characteristics of the various stress patterns to which the perceptron is exposed, and relate them to the perceptron's learning behavior, with a view to developing a pseudo-linguistic theory. In the same way that metrical phonology attempts to provide an account of linguistic stress in humans, our pseudo-linguistic theory will attempt to provide an account of the learning of stress in a perceptron.
We first present an analysis of factors affecting learnability for the QI stress systems. We develop an analytic scheme that enables us both to characterize the patterns and to predict the ease or difficulty of their learning. We then include QS systems and show that one analytical framework can take into account both QI and QS systems. This overall framework constitutes our pseudo-linguistic theory.
3.1. Learnability of QI Systems
We begin by noting that learning times differ considerably for {Latvian, French}, {Maranungku, Weri}, {Lakota, Polish} and Garawa, as shown in the last column of Table 2. Moreover, Paiute and Warao were unlearnable with this model.
These are learning times with the syllabic input representation.
A case could perhaps be made that monosyllables are special (Elan Dresher, personal communication, March 16, 1992), so that it might be plausible to assume that there are no stressed monosyllables in the data. We investigated this possibility in two ways. First, if monosyllables are treated as receiving no stress, then the stress patterns of Paiute and Warao become learnable within the current architecture: It takes 45 epochs for the network to learn the stress pattern of Paiute, and 36 epochs for Warao. Second, if monosyllables are removed from the data set, the learning times are 44 epochs for Paiute, and 46 epochs for Warao. Note also that both stress patterns were learnable, even with the stressed monosyllables, using either a two-layer architecture with two output units (targets were [0, 0] for no stress, [1, 0] for secondary stress, and [1, 1] for primary stress), or a three-layer architecture. We present the finding of nonlearnability within the present model in order to maintain a consistency of analysis in one model, and since the two-layer model with a single output unit facilitates analysis of connection weights and their contribution to processing.
TABLE 2
Prellminory Analysis of Learning Times for QI Stress Systems
IPS SCA Ah Mt's MSL QI LANGUAGES&F EPCXHS
(syllabic)
00 00 0 Latvian Ll 17
French L2 16
00 10 1 Maranungku L3 37
Weri L4 34
01 10 1 Garawa L5 165
10 00 0 Lakota L6 255
Polish L7
254
10 10 1 Paiute L8 **
Warao L9 **
Note. These learning times were obtained using the syllabic input representation.
IPS = Inconsistent Prlmory Stress; SCA = Stress Clash Avoidance; Alt = Alternotion ;
MP S = Mult iple Primary Stresses: ML = Mult iple Stress Levels. Symbols Ll-L9 refer to
Table 1.
Examination of the inherent features of these stress patterns suggests various factors as being relevant to learning:
Alternation of stresses (as opposed to a single stress) is suggested by the difference between learning times for {Latvian, French} and {Maranungku, Weri}, which also suggests that the number of stress levels may be relevant. Recall from Table 1 that, in Garawa, primary stress is placed on the first syllable, secondary stress on the penultimate syllable, and tertiary stress on alternate syllables preceding the penultimate, but that no stress appears on the second syllable. The primary, secondary, and tertiary stress patterns potentially lead to stress appearing on both the first and the second syllables; however, this is avoided (stress is never placed on the second syllable). This exemplifies the tendency in human languages to avoid the appearance of stress on adjacent syllables. The greater learning time for Garawa suggests that such stress clash avoidance is computationally expensive.
In languages such as Latvian, French, Maranungku, Weri, and Garawa, primary stress is always on a syllable at the edge of the word. In Lakota and Polish, for which learning times are substantially greater than those of the
CONNECTIONIST MODELS AND LINGUISTIC THEORY 15

other languages, primary stress is always at a non-edge syllable, except in mono- and disyllables. (Paiute and Warao are identical with respect to the placement of primary stress to Lakota and Polish, respectively, but are unlearnable.) Thus, placement of primary stress seems computationally relevant. In particular, it appears more difficult to learn patterns in which primary stress is assigned at the edges inconsistently.
To test these assumptions, and to determine what features of Paiute and Warao led to their nonlearnability, we constructed hypothetical stress patterns that exhibit more combinations of the above features than may actually exist. These stress patterns are described in Table 3. We trained networks on these variations in order to evaluate the effects of the various features. The following factors emerged as determinants of learnability for the range of QI patterns considered:
(I) Inconsistent Primary Stress (IPS): it is computationally expensive to learn the pattern if neither edge receives primary stress except in mono- and disyllables; this can be regarded as an index of computational complexity that takes the values {0, 1}: 1 if an edge receives primary stress inconsistently, and 0 otherwise.

(II) Stress Clash Avoidance (SCA): if the components of a stress pattern can potentially lead to stress clash, then the language may either actually permit such stress clash, or it may avoid it. This index takes the values {0, 1}: 0 if stress clash is permitted, and 1 if stress clash is avoided.

(III) Alternation (Alt): an index of learnability with value 0 if there is no alternation, and value 1 if there is. Alternation means a pattern of some kind that repeats on alternate syllables.

(IV) Multiple Primary Stresses (MPS): has value 0 if there is exactly one primary stress, and value 1 if there is more than one primary stress. It has been assumed that a repeating pattern of primary stresses will be on alternate, rather than adjacent, syllables. Thus, [Alternation = 0] implies [MPS = 0]. The hypothetical stress patterns examined include some with more than one primary stress; however, as far as is known, no actually occurring QI stress pattern has more than one primary stress.

(V) Multiple Stress Levels (MSL): has value 0 if there is a single level of stress (primary stress only), and value 1 otherwise.
It is possible to order these factors with respect to each other to form a five-digit binary string characterizing the ease/difficulty of learning. That is, the computational complexity of learning a stress pattern can be characterized as a 5-bit binary number whose bits represent the five factors above, in decreasing order of significance. Table 2 shows that this characterization captures the learning times of the QI patterns quite accurately. As an example
TABLE 3
Descriptions of Hypothetical QI Stress Patterns

[The body of this table is not legible in this copy. It gives, for each hypothetical pattern (h1-h30), a reference label, a name, and a description of the stress pattern; the hypothetical patterns are variants of the Latvian, French, Maranungku, Weri, Garawa, and Lakota patterns named in Table 4.]
of how to read Table 2, note that Garawa takes longer to learn than Latvian (165 vs. 17 epochs). This is reflected in the parameter setting for Garawa, 01101, being lexicographically greater than that for Latvian, 00000.
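The complexity characterization can be written out as a small function (a sketch under our own naming assumptions; the lexicographic comparison of bit strings is exactly the ordering the text describes):

```python
def complexity(ips, sca, alt, mps, msl):
    """5-bit complexity string, most significant factor first:
    IPS, SCA, Alt, MPS, MSL (each 0 or 1)."""
    return f"{ips}{sca}{alt}{mps}{msl}"

garawa = complexity(0, 1, 1, 0, 1)    # "01101"
latvian = complexity(0, 0, 0, 0, 0)   # "00000"

# Lexicographic comparison of the strings predicts that Garawa
# (165 epochs) should take longer to learn than Latvian (17 epochs).
assert garawa > latvian
```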
The analysis of learnability is summarized in Table 4 for all the QI stress patterns, both actual and hypothetical. It can be seen that the 5-bit characterization fits the learning times of various actual and hypothetical patterns reasonably well; there are, however, exceptions, indicating that this 5-bit characterization is only a heuristic. For example, the hypothetical stress patterns with reference numbers h21 through h25 have a higher 5-bit characterization than some other stress patterns, but lower learning times.
The effect of stress clash avoidance is seen in consistent learning time differentials between stress patterns of complexity less than or greater than binary 1000. Learning times with complexity 0001 are in the range 10 to 25 epochs, while complexity 1001 patterns are of the order of 170 epochs; complexity 0010 is of the order of 30 epochs, and 1010, 190 epochs, for patterns the only difference between which is the absence/presence of stress clash avoidance (Latvian2edge and Latvian2edgeSCA, references h5 and h19 respectively). A pattern with complexity 0011 (Latvian2edge2stress, reference h6) has a learning time of 37 epochs, while a pattern differing only in the addition of SCA (Latvian2edge2stressSCA, reference h20) takes 206 epochs. Complexity 0101 patterns are in the range 30 to 60 epochs, while complexity 1101 patterns are in the range 70 to 170 epochs; in particular, while Garawa (reference L5) has a learning time of 165 epochs, the same pattern without SCA has a learning time of 36 epochs (Garawa-SC, reference h10). A stress pattern of complexity 0111 takes 65 epochs to learn (Latvian2edge2stress-1alt, reference h16), while addition of stress clash avoidance results in a learning time of 129 epochs (Latvian2edge2stress-1alt-SCA, reference h25).

The effect of alternation is seen in a contrast between learning times for patterns of complexity 001 (range: 10 to 25 epochs) and 101 (range: 30 to 60 epochs); 010 and 110 (30 epochs vs. a range of 60 to 90 epochs); 011 and 111 (37 vs. 65 epochs).

The effect of multiple primary stresses is seen in the contrast between the stress patterns: Latvian2stress (reference h1, complexity 001, 21 epochs) and Latvian2edge2stress (reference h6, complexity 011, 37 epochs); Latvian2edge2stress-alt (reference h9, complexity 101, 56 epochs) and Latvian2edge2stress-1alt (reference h16, complexity 111, 65 epochs).

The effect of inconsistent primary stress is considerable: stress patterns with the most significant bit 1 are learnable in the perceptron model only if all the other bits are 0; such patterns (Lakota, Polish, references L6, L7, complexity 10000, 255 epochs) have a higher learning time than any of the patterns with most significant bit 0. All the examined stress patterns of complexity greater than 10000 were unlearnable in the perceptron model. Recall that Paiute and Warao were unlearnable; the present framework is consistent with that result, since under the present analysis, these two patterns have complexity 10101. Recall also from footnote 6 that if we assume no stressed monosyllables in Paiute and Warao, then these stress patterns become learnable, in approximately 35-45 epochs. Under these circumstances, the complexity index of the two stress systems is 00101. As can be seen from Table 2 and Table 4, a learning time of approximately 40 epochs is completely consistent with the learning times obtained for other stress patterns with this index. That is, our scheme for indexing complexity is consistent with the results of Paiute and Warao, whether or not there are assumed to be stressed monosyllables.
TABLE 4
Analysis of Quantity-Insensitive Learning

IPS SCA Alt MPS MSL   Language                             Ref   Epochs (syllabic)
 0   0   0   0   0    Latvian                              L1    17
                      French                               L2    16
 0   0   0   0   1    Latvian2stress                       h1    21
                      Latvian3stress                       h2    11
                      French2stress                        h3    23
                      French3stress                        h4    14
 0   0   0   1   0    Latvian2edge                         h5    30
 0   0   0   1   1    Latvian2edge2stress                  h6    37
 0   0   1   0   0    (impossible)
 0   0   1   0   1    Maranungku                           L3    37
                      Weri                                 L4    34
                      Paiute (unstressed monosyllables)    L8    45
                      Warao (unstressed monosyllables)     L9    36
                      Maranungku2stress                    h7    43
                      Weri3stress                          h8    41
                      Latvian2edge2stress-alt              h9    53
 0   0   1   1   0    Garawa-SC                            h10   33
                      Garawa2stressSC                      h11   50
                      Maranungku1stress                    h12   61
                      Weri1stress                          h13   65
                      Latvian2edge-alt                     h14   70
                      Garawa1stress-SC                     h15   33
 0   0   1   1   1    Latvian2edge2stress-1alt             h16   85
 0   1   0   0   0    (impossible)
 0   1   0   0   1    Garawa-non-alt                       h17   164
                      Latvian3stress2edgeSCA               h18   163
 0   1   0   1   0    Latvian2edgeSCA                      h19   194
 0   1   0   1   1    Latvian2edge2stressSCA               h20   206
 0   1   1   0   1    Garawa                               L5    165
                      Garawa2stress                        h21   71
 0   1   1   1   0    Latvian2edge2stress-alt-SCA          h22   91
                      Garawa1stress                        h23   121
                      Latvian2edge-alt-SCA                 h24   126
 0   1   1   1   1    Latvian2edge2stress-1alt-SCA         h25   129
 1   0   0   0   0    Lakota                               L6    255
                      Polish                               L7    254
 1   0   0   0   1    Lakota2stress                        h26   **
 1   0   0   1   0    Lakota2edge                          h27   **
 1   0   0   1   1    Lakota2edge2stress                   h28   **
 1   0   1   0   1    Paiute                               L8    **
                      Warao                                L9    **
 1   0   1   1   0    Lakota-alt                           h29   **
 1   0   1   1   1    Lakota1stress-alt                    h30   **

Note. These learning times were obtained using the syllabic input representation. ** indicates a pattern that was not learnable. IPS = Inconsistent Primary Stress; SCA = Stress Clash Avoidance; Alt = Alternation; MPS = Multiple Primary Stresses; MSL = Multiple Stress Levels. Symbols L1-L9 refer to Table 1, and h1-h30 to Table 3.
The impact of multiple stress levels is relatively smaller, and less uniform, both of which motivate this factor's being treated as least significant. Thus, though there are several instances where a stress pattern with a greater number of stress levels has a higher learning time (h1 vs. L1; h3 vs. L2; h6 vs. h5; h7 vs. L3; h8 vs. L4; h21 vs. L5), there are also cases in which a stress pattern with a higher number of stress levels has a lower learning time than one with fewer stress levels (h2 vs. h1; h4 vs. h3; h11 vs. h15; h21 vs. h23).

The effect of a particular factor seems to be reduced when a higher-order bit has a nonzero value. Thus, the effects of alternation are less clear when there is stress clash avoidance: without SCA, the range of learning times for patterns without alternation is 10 to 40 epochs, and with alternation 30 to 90 epochs; but with SCA, the range without alternation is 160 to 210 epochs, and with alternation 70 to 170 epochs.

In summary, the complexity measure suggested here appears to identify a number of factors relevant to the learnability of QI stress patterns within a minimal connectionist architecture. It also assesses their relative impacts. The analysis is undoubtedly a simplification, but it provides a framework within which to relate the various learning results.
3.2. Incorporating QS Systems

We now turn to a consideration of quantity-sensitive (QS) stress systems. For QS patterns, information about syllable weight needs to be included in the input representation: the input has to consist of (encoded) sequences of H and L tokens. A purely syllabic input representation is, by definition of quantity-sensitivity, inadequate. Weight-string representations were therefore adopted, as discussed in Section 2.2. To maintain consistency of analysis across QI and QS stress patterns, simulations for the QI languages were re-run using the weight-string representation. Note that the stress patterns for all possible weight strings of length n are the same for a QI language.
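The enumeration of weight-string inputs can be sketched as follows (an illustration; the generator and its maximum-length parameter are our own, not the authors' code). Since there are 2^n weight strings of length n, this training set is much larger than the corresponding set of unweighted syllabic inputs:

```python
from itertools import product

def weight_strings(max_len):
    """All words of heavy (H) / light (L) syllables, up to max_len syllables."""
    for n in range(1, max_len + 1):
        for s in product("HL", repeat=n):
            yield "".join(s)

# For a QI language the target stress pattern is identical for every
# weight string of a given length; for a QS language it can differ.
inputs = list(weight_strings(3))   # 2 + 4 + 8 = 14 input strings
```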
Learning times are shown in Table 5 for simulations for all QI and QS stress systems. The section on the left shows QI learning times, while the section on the right shows QS learning times. Note that in this table, learning times for all languages are reported in terms of the weight-string representation rather than the unweighted syllabic representation used for the previous QI studies.

The differences in learning times across QI patterns are less marked than the differentials in Table 2. This is the result of the increased training set size with the weight-string representation as compared with the syllabic representation. However, learning times are completely consistent with the overall learnability analysis developed in the previous section, so that the analysis of QI systems is in no way affected. The reader may wish to compare the learning times for QI systems in Table 5 (weight-string input representations) with those in Table 2 (syllabic input representations).
In order to incorporate QS systems, the learnability analysis proposed in Section 3.1 on the basis of QI patterns turns out to require some refinement (although the analysis of QI systems themselves will remain unchanged; see below). Inconsistent Primary Stress (IPS) was previously hypothesized as taking binary values; a value of 1 for IPS indicated that primary stress was assigned inconsistently at the edge of words; a value of 0 indicated that this was not the case. If this measure is modified so that its value indicates the proportion of cases in which primary stress is not assigned at the edge of a word, the learning results for both QI and QS patterns can be integrated, to a large extent, into a unified account. Recall that learning times for QS systems are those shown in the right-hand section of Table 5.

The learning times for Malayalam and Yapese are approximately 20 epochs, while those for Ossetic and Rotuman are approximately 30 epochs. The difference between these pairs of stress patterns is: for Malayalam and Yapese, primary stress is placed at the end except when the edge vowel is short and the next vowel long (i.e., except 0.25 of the time); for Ossetic and Rotuman, primary stress falls at the edge except when the edge vowel is short, that is, except in 0.5 of the cases.

The five factors discussed earlier were: Inconsistent Primary Stress (IPS); Stress Clash Avoidance (SCA); Alternation (Alt); Multiple Primary Stresses (MPS); and Multiple Stress Levels (MSL). The values of these indices, respectively, for both Malayalam and Yapese, are [0.25 0 0 1 0], and for both Ossetic and Rotuman, [0.5 0 0 1 0]. The difference between learning times for these pairs of otherwise identical patterns can then be accounted for in terms of differing values of the IPS measure. Note that the earlier analysis of QI languages remains unchanged: stress patterns that had IPS value 0 still do, and those that had IPS value 1 still do as well.
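The refined IPS measure can be computed by enumerating weight contexts (a sketch; the two edge-stress rules below are our paraphrases of the descriptions in the text, and the function names are ours):

```python
from itertools import product

def ips(edge_gets_stress, n=2):
    """Proportion of length-n weight contexts in which primary stress
    is NOT assigned at the edge of the word."""
    contexts = ["".join(c) for c in product("HL", repeat=n)]
    return sum(not edge_gets_stress(c) for c in contexts) / len(contexts)

# Malayalam/Yapese: edge stressed except when the edge vowel is short (L)
# and the next vowel long (H).
malayalam = lambda c: not (c[0] == "L" and c[1] == "H")
# Ossetic/Rotuman: edge stressed except when the edge vowel is short.
ossetic = lambda c: c[0] == "H"

assert ips(malayalam) == 0.25
assert ips(ossetic) == 0.5
```

The two assertions reproduce the 0.25 and 0.5 proportions cited in the text.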
The learning times of Komi and Cheremis are substantially higher than those of Koya, Eskimo, Malayalam, Yapese, Ossetic, and Rotuman. Recall the stress pattern of Komi: stress the first heavy syllable, or the last syllable if there are no heavy syllables. That is, for Komi, a particular syllable S receives primary stress under the following conditions: (1) there are no heavy syllables to the left of S in the syllable string; and (2) S is heavy or S is the last syllable. The second clause of the conditional involves single-positional information: information either about the syllable S itself (S is heavy), or about the absence/presence of a syllable right-adjacent to S in the weight-string. (If there is no syllable to the right of S in the weight-string, then S is the last syllable; if there is a syllable right-adjacent to S, then S is not the last syllable.) The first clause of the conditional, however, involves aggregative information: information about all the syllables to the left of S in the weight-string. We see that in assigning stress to syllable strings in Komi, one must somehow "scan" the input string to extract this aggregative information; similarly for Cheremis.
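The two clauses of the Komi conditional can be written out directly (a sketch under our own naming; `weights` is a string of H/L tokens as in the weight-string representation):

```python
def komi_stressed(weights, i):
    """True iff syllable i receives primary stress in Komi:
    no heavy syllable to its left (aggregative, leftward scan),
    and it is heavy or word-final (single-positional)."""
    no_heavy_left = "H" not in weights[:i]                 # aggregative information
    heavy_or_last = weights[i] == "H" or i == len(weights) - 1
    return no_heavy_left and heavy_or_last

# The first heavy syllable is stressed; all-light words stress the last syllable.
assert [komi_stressed("LLHLH", i) for i in range(5)] == [False, False, True, False, False]
assert [komi_stressed("LLL", i) for i in range(3)] == [False, False, True]
```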
TABLE 5
Summary Analysis of QI and QS Languages

[The body of this table is not legible in this copy. It lists, for each QI and QS stress system, the complexity indices IPS, SCA, Alt, MPS, and MSL (with the refined proportional IPS measure and, for QS systems, the Aggregative Information index), together with learning times in epochs; QI learning times appear in the left-hand section and QS learning times in the right-hand section.]

Note. These learning times were obtained using the weight-string input representation. IPS = Inconsistent Primary Stress; SCA = Stress Clash Avoidance; Alt = Alternation; MPS = Multiple Primary Stresses; MSL = Multiple Stress Levels. Symbols refer to Table 1.
Komi and Cheremis can therefore be analyzed as stress patterns that require aggregative information for the determination of stress placement; none of the other stress patterns require such information. For example, for Koya, a syllable S should receive stress if it is the first syllable (which can be determined from information about the presence/absence of a syllable in the left-adjacent weight-string position), or if it is heavy, both of which are single-positional kinds of information. For Ossetic, a syllable S should be stressed if (a) it is the first syllable and it is heavy (which requires single-positional information about the left-adjacent weight-string position, and about S itself); or if (b) it is the second syllable (single-positional information about the weight-string element two positions to the left of S) and the syllable in the left-adjacent position is light (also single-positional information).

The difference in learning times between Komi and Cheremis on the one hand, and Koya, Eskimo, Malayalam, Yapese, Ossetic, and Rotuman, on the other, can now be analyzed in terms of differing informational requirements. Whether or not aggregative information is needed therefore seems to be a further factor relevant to the learnability of stress patterns.

We further need to consider the fact that the patterns of Mongolian and Mayan have very much higher learning times than those of any other stress patterns, including Komi and Cheremis. Recall the stress pattern of Mongolian: stress the first heavy syllable, or the first syllable if there are no heavy syllables. Notice how it differs from that of Komi, whose pattern is: stress the first heavy syllable, or the last syllable if there are no heavy syllables.

Thus, for Mongolian, if the current syllable S is heavy it should receive stress if it is the first heavy syllable. If the current syllable S is light, then it should receive stress only if (a) there is no syllable to its left in the weight-string (which indicates that it is the first syllable), and (b) there is no heavy syllable to its right in the weight-string. That is, for Mongolian, there is aggregative information required about heavy syllables both to the left of the current syllable and to its right. Information about heavy syllables to the left of S is relevant if S is heavy; information about heavy syllables to the right is relevant if S is light. This requirement for keeping track of dual kinds of aggregative information seems to be what makes the pattern so difficult to learn. This contrasts with Komi, for which aggregative information is required only about syllables to the left of S.

A parallel analysis can be made for Mayan. For both Mongolian and Mayan, then, the very high learning times result from the requirement for two kinds of aggregative information, in contrast with only one kind for Komi and Cheremis. Acquiring this information requires bidirectional scanning of the input string for Mongolian and Mayan, as compared with unidirectional scanning, as discussed above, for Komi and Cheremis.
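The contrast with Komi can be made concrete (a sketch under our own naming; note the rightward scan in the light-syllable clause, which the Komi rule does not need):

```python
def mongolian_stressed(weights, i):
    """True iff syllable i receives primary stress in Mongolian:
    the first heavy syllable, or the first syllable if no syllable is heavy."""
    if weights[i] == "H":
        return "H" not in weights[:i]               # leftward aggregative scan
    # Light syllable: must be word-initial, with no heavy syllable to its right.
    return i == 0 and "H" not in weights[i + 1:]    # rightward aggregative scan

# Same word as the Komi example: the first heavy syllable is stressed,
# but an all-light word stresses the FIRST syllable rather than the last.
assert [mongolian_stressed("LLHLH", i) for i in range(5)] == [False, False, True, False, False]
assert [mongolian_stressed("LLL", i) for i in range(3)] == [True, False, False]
```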
The results from Komi, Cheremis, Mongolian, and Mayan thus suggest an additional factor that is relevant for the determination of learnability, but that comes into play only in the case of QS patterns: how much scanning is required for aggregative information. This can be treated as a sixth index of computational complexity that takes on the values {0, 1, 2}. We therefore have the following factor, in addition to the five previously discussed:

(VI) Aggregative Information (Agg): has value 0 if no aggregative information is required (single-positional information suffices); value 1 if unidirectional scanning is required (Komi, Cheremis); and 2 if bidirectional scanning is required (Mongolian, Mayan).

With these modifications (viz., refinement of the IPS measure, and addition of the Aggregative measure), the same parameter scheme can be used for both the QI and QS language classes, with good learnability predictions within each class, as shown in Table 5.

We note, moreover, that learning times for the QS stress patterns of Koya and Eskimo fit right in with the analysis. The characterization of Koya in terms of our complexity measure is 000001, while that of Eskimo is 000010. Learning times for these systems, accordingly, fit in between the learning times for the QI systems of Latvian and French (complexity index 000000) and Maranungku and Weri (complexity index 000101).

As can also be seen, differences in learning times between QS stress patterns also fit in more generally with the analysis developed earlier, and with the analysis of single-positional vs. aggregative informational requirements developed in this section.

Thus, both the QI and QS results fall into a single analysis within this generalized parameter scheme and weight-string representation, although with a less perfect fit than the within-class results. For example, the QI stress patterns of Lakota and Polish have higher complexity indexes than the QS stress patterns of Malayalam, Yapese, Ossetic, and Rotuman, but lower learning times. Quantity-sensitivity thus appears to affect learning times, as seems reasonable to expect, due to the distribution of the weight-string training set (see discussion in Section 2.2). However, no measure of its effect will be offered here. The analytical framework developed thus far appears to hold within QI languages, and within QS languages; further analysis would be needed to relate learning results across the two kinds of stress patterns.

Finally, it is worth noting that learning times for these stress patterns have also been obtained with a three-layer architecture. Differences between learning times for different stress patterns were somewhat reduced. However, learning times were ordered exactly as in the two-layer model results shown in Table 5, suggesting that these learning results are robust
with regard to architectural differences. In the discussion throughout this paper, however, we focus on results from the two-layer model only, since the two-layer model with a single output unit facilitates analysis of connection weights and their contribution to processing.
3.3. Markedness and Learning Times: Correspondences with Metrical Theory

So far, we have used our perceptron learning results to devise an analytical framework for stress. It should be clear how our theory is analogous to conventional linguistic theory: it is an account of stress systems based on observation of the behavior of a perceptron exposed to those systems, in very much the same way that metrical theory is an account of linguistic stress, based on observations of what human beings learn. The stress pattern descriptions given to the perceptron are at the same level of abstraction as those that form the starting point for metrical theory. But since our account is based on learning time rather than surface forms and distributional data, we refer to it as a pseudo-linguistic theory.

In this section, we will discuss the ways in which our learning time results are in good agreement with some of the markedness predictions of metrical theory. That is, our results exhibit a correspondence with theoretical predictions, strengthening the analogy between our theory and metrical theory. As we will show, learning results such as these could also provide an additional source of data for choosing between theoretical alternatives.9

Within the dominant linguistic tradition, a universal grammar of stress should incorporate a theory of markedness, so as to predict which features of stress systems are at the core of the human language faculty and which are at the periphery. The distributional approach to markedness treats as unmarked those linguistic forms that occur more frequently in the world's languages. Another approach to markedness is learnability theory, which examines the logical process of language acquisition.10 Thus, for example, Dresher & Kaye take iteration to be the default or unmarked setting for parameter P11, because there is evidence that can cause revision of this default if it turns out to be the incorrect setting: the absence of any secondary stresses serves as a diagnostic that feet are not iterative (Dresher &

9 Nyberg's parameter-based model also provides such data, but in terms of how many examples of a stress pattern have to be presented to the stress learner (Nyberg, 1990).

10 As an example, the Subset Principle (Berwick, 1985; Wexler & Manzini, 1987) has implications for markedness. Suppose that two possible settings a and b for parameter P result in the learner respectively accepting sets Sa and Sb of linguistic forms. If Sa is a subset of Sb, then, once P has been set to value b, no positive evidence can ever re-set it to a, even if that was the correct setting. Unmarked values for parameters should therefore be the ones yielding the most constrained system.
TABLE 6
Learning Times for QI and QS Stress Patterns, Grouped by Theoretical Analysis

Group   Language             Characterization                                Epochs
1       Latvian, French      Word-tree, no feet                              2
2       Koya                 Word-tree, iterative unbounded QS feet          2
3       Eskimo               No word-tree, iterative unbounded QS feet       3
4       Maranungku, Weri     Word-tree, iterative binary QI feet             3
5       Lakota, Polish       Word-tree, non-iterative binary QI feet         10
6       Malayalam, Yapese    Word-tree, non-iterative binary QS feet,        19
                             dominant node branches
7       Ossetic, Rotuman     Word-tree, non-iterative binary QS feet         29
8       Komi, Cheremis       Word-tree, non-iterative unbounded QS feet      214
9       Mongolian, Mayan     Word-tree, non-iterative unbounded QS feet,     2302
                             dominant node branches

Note. These learning times were obtained using weight-string input representations for both QI and QS patterns. Each figure is the average learning time for the languages in the group.
Kaye, 1990, p. 191). If non-iteration were the default, their learning system might not encounter evidence that would enable it to correct this default setting, if it were in fact incorrect. It should be noted that, while this is a representative application of subset theory, the choice of default parameter values depends on the particular learning algorithm employed.

Table 6 shows the stress systems grouped by their theoretical analyses in terms of the parameter scheme discussed in Section 1.4. The last column of the table shows the average learning time in epochs for each group of stress patterns (these are the same learning times shown in Table 5). As can be seen, there appears to be a fairly systematic differentiation of learning times for groups of stress patterns with different clusters of parameter settings.
11 As noted in Section 2.2, weight-string representations are necessary for QS stress patterns. For QI systems, syllabic representations are sufficient. Of course, weight-string representations can be used for QI systems, although the information about syllable weight will be redundant. To obtain learning times that might reflect differences between the stress patterns themselves, rather than merely reflecting differing input representations and training set sizes, we ran simulations for both QS and QI patterns using weight-string input representations. The learning times in Table 6 are based on use of this weight-string representation for both QI and QS patterns.

12 The stress system of Garawa, as described in Hayes (1980, pp. N-55), cannot be characterized in terms of the parameter scheme adopted here. The stress systems of Paiute and Warao cannot be learned by a perceptron, as already discussed in Section 3.1, and as will be discussed in more detail in Section 4.2. (Recall, however, that they can be learned (a) using a two-layer architecture with two output units, or (b) a three-layer architecture with hidden units, or (c) within the present perceptron architecture, if monosyllabic stress is not included.) Learning results for these three stress systems have therefore been excluded from the present analysis, as either the parameterized characterization or learning times cannot be established.
Learning times appear to be significantly higher for stress systems in groups 5 through 9, which have non-iterative feet, than for those in groups 1 through 4, which either do not have metrical feet at all, or else have iterative feet. This makes the interesting prediction that non-iterative feet are more difficult to learn, and hence marked. This prediction corresponds with both Halle & Vergnaud's Exhaustivity Condition,13 and with the choice of marked and unmarked settings in Dresher & Kaye's parameter scheme (Parameter P11).14

Comparison of learning times for group 1 vis-a-vis groups 2, 3, and 4 also suggests that a stress system with only a word-tree (i.e., with no metrical feet) is easier to learn than one with (iterative) metrical feet.

The dramatic difference in learning times between groups 8 and 9 suggests that it is marked for the dominant node to be obligatorily branching.15 Group 8 differs from group 9 only in not having obligatory branching, and average learning times were 214 epochs vs. 2302 epochs. This prediction agrees with the distributional view that obligatory branching is relatively marked,16 but runs counter to Dresher & Kaye's choice of default values (parameter P7).17

However, comparison of group 6 with group 7 suggests that systems with obligatory branching may be more easily learned: group 6, with obligatory branching, has a learning time of 19 epochs, compared with group 7, without obligatory branching, which has a learning time of 29 epochs. This runs counter to the distributional argument, but agrees with the learnability view.

Two points are worth noting. First, it is interesting that where there is a conflict between the distributional and learnability theory predictions of markedness, there is also conflicting evidence from the perceptron simulation. Second, these conflicting perceptron results highlight the fact that it may be infeasible to analyze the effects of different settings for individual parameters; it may only be possible to make broader analyses of the effects of clusters of parameter settings. Strong interactions between parameters have also been observed in other computational learning models of metrical phonology (Eric Nyberg, personal communication, August, 1991).
¹³ "The rules of constituent boundary construction apply exhaustively…" (Halle & Vergnaud, 1987a, p. 15).
¹⁴ In Dresher & Kaye's model, iteration is the default or unmarked parameter setting because there is evidence that can cause revision of this default. The absence of any secondary stresses serves as a diagnostic that feet are not iterative (Dresher & Kaye, 1990, p. 191).
¹⁵ This means that the strong branch of a foot must dominate a heavy syllable, and cannot dominate a light one.
¹⁶ Thus, Hayes (1980, p. 113): "the maximally unmarked labeling convention is that which makes all dominant nodes strong … the convention that wins second place is: label dominant nodes as strong if and only if they branch."
¹⁷ Obligatory branching is the default because evidence (the presence of any stressed light syllables that do not receive stress from the word-tree) can force its revision (Dresher & Kaye, 1990, p. 193).
In view of the greater differential in learning times between Groups 8 and 9 than between Groups 6 and 7, we conclude that the effect of obligatory branching is to increase learning time. That is, we view our learning results as supporting the markedness of obligatory branching. This raises the interesting possibility that learning results such as those from the present perceptron simulations can provide a new source of insight into questions of markedness. The distributional view of the markedness of obligatory branching (Hayes, 1980, p. 113) seems to conflict with the learnability view (Dresher & Kaye, 1990, p. 193). The present simulations seem to agree with the distributional view, and could perhaps serve as a fresh source of evidence.
This potential contribution to theoretical analysis can be further illustrated for the stress systems of Lakota and Polish, which are mirror images. Recall that in Lakota, primary stress falls on the second syllable of the word. The analysis so far adopted for Lakota is that it has non-iterative binary right-dominant QI feet constructed from left to right, with a left-dominant word-tree¹⁸. Let us call this Analysis A. As illustrated in Figure 3, this leads to the construction of one binary right-dominant QI foot at the left edge of the word. This, together with the left-dominant word-tree, results in the assignment of primary stress to the second syllable. As has been shown, under this analysis the perceptron learning results support the markedness of non-iteration (recall the differing learning times of Groups 1 through 4 vs. Groups 5 through 9).
However, an alternative analysis is that Lakota has a left-dominant word-tree with no metrical feet, and the first syllable is extrametrical (Dresher & Kaye, 1990, p. 143). Let us call this Analysis B. As illustrated in Figure 3, the leftmost syllable is treated as "invisible" to the stress rules, and the word-tree assigns primary stress to the leftmost of the "visible" syllables. The result is that the second syllable receives primary stress. Under this analysis, Lakota and Polish (Group 5 in Table 6) differ from Latvian and French (Group 1 in Table 6) only in having an extrametrical syllable. The differing learning times for the two groups (1 epoch vs. 10 epochs) then suggest that extrametricality is marked. However, this runs counter to both the distributional view (Hayes, 1980, p. 82)¹⁹ and the learnability theory view (Dresher & Kaye, 1990, pp. 189, 191)²⁰.
¹⁸ This is based on Hayes' analysis of penultimate stress (Hayes, 1980, p. 55).
¹⁹ Hayes argues for the importance of the device of extrametricality to the theory because of its role in accounting for a variety of stress phenomena. Thus, extrametricality is unmarked in the sense previously referred to: that of being attested in a fair variety of languages.
²⁰ In Dresher & Kaye's formulation, the presence of stress on a leftmost or rightmost syllable can rule out extrametricality. However, there is no positive cue that unambiguously determines the presence of extrametricality. Hence, the default value for parameter P8 is that there is extrametricality (PE[yes]). If the default were PE[No], but this was an incorrect setting, there would be no cue that could lead to detection that this is incorrect.
[Figure 3 diagram: tree structures for Analysis A (a binary right-dominant foot under a left-dominant word-tree) and Analysis B (a left-dominant word-tree over the visible syllables, with the first syllable extrametrical).]

Figure 3. Two metrical analyses for a four-syllable word in Lakota. Strong branches are labeled S, and weak branches W. Analysis A: construction of one binary right-dominant QI foot at the left edge of the word, together with a left-dominant word-tree, results in the assignment of primary stress to the second syllable. Analysis B: the leftmost syllable is treated as "invisible" to the stress rules (extrametrical), and the word-tree assigns primary stress to the leftmost of the "visible" syllables. The result is that the second syllable receives primary stress.
To summarize, Analysis A views Lakota and Polish as having non-iterative feet, which both the distributional/theoretical and learnability approaches treat as marked. Analysis B views these stress patterns as having an extrametrical syllable, which both approaches treat as unmarked. So far, there is nothing theory-external to help choose between the analyses. Simulation results such as these provide such a means: since the learning results are consistent with the theoretical markedness of non-iteration, but not with the unmarkedness of extrametricality, they provide at least weak support for preferring Analysis A over Analysis B.
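The sense in which the two analyses are surface-equivalent can be sketched computationally. The following fragment is a toy rendering of our own, not the notation of either theory: the function names, zero-based syllable indices, and the restriction to words of at least two syllables are all assumptions made here for illustration.

```python
def analysis_a(n_syllables):
    """Analysis A: one binary right-dominant QI foot at the left edge,
    plus a left-dominant word-tree."""
    foot = (0, 1)                 # foot spans syllables 1-2; right node strong
    strong_syllable = foot[1]
    # The left-dominant word-tree makes the leftmost constituent (the foot)
    # strong, so primary stress falls on the foot's strong syllable.
    return [1 if i == strong_syllable else 0 for i in range(n_syllables)]

def analysis_b(n_syllables):
    """Analysis B: no feet; the first syllable is extrametrical, and a
    left-dominant word-tree stresses the leftmost visible syllable."""
    visible = range(1, n_syllables)   # syllable 1 (index 0) is invisible
    primary = min(visible)
    return [1 if i == primary else 0 for i in range(n_syllables)]

for n in (2, 3, 4, 5):
    assert analysis_a(n) == analysis_b(n)
print(analysis_a(4))   # second-syllable stress: [0, 1, 0, 0]
```

Both functions yield identical stress assignments for every word length, which is precisely why nothing in the surface data alone can choose between the analyses.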
4. LOWER LEVEL PROCESSING: THE CAUSAL ACCOUNT
In the previous section, we developed our abstract pseudo-linguistic theory of the learning and assignment of stress by a perceptron. This high-level account was based on observation of perceptron learning results, without knowledge of processing mechanisms within the perceptron.
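The flavor of the lower-level account can be anticipated with a toy perceptron: once trained, the learned weights and bias are the entire causal story, with no mention of feet, word-trees, or markedness. This sketch is illustrative only (the toy rule "stress a syllable iff it is heavy" and the two-bit encoding are assumptions made here, not the encoding used in our simulations).

```python
def train(examples, n_inputs, max_epochs=100):
    """Perceptron rule; returns the learned weights and bias."""
    w, b = [0.0] * n_inputs, 0.0
    for _ in range(max_epochs):
        converged = True
        for x, t in examples:
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if o != t:
                converged = False
                w = [wi + (t - o) * xi for wi, xi in zip(w, x)]
                b += (t - o)
        if converged:
            break
    return w, b

# Toy rule: stress syllable 1 iff it is heavy. Inputs are the weights
# (1 = heavy) of syllables 1 and 2.
data = [((h1, h2), h1) for h1 in (0, 1) for h2 in (0, 1)]
w, b = train(data, 2)
print(w, b)   # the whole "causal account" is just these numbers
```

At this level of description, the question "does the network use extrametricality?" has no answer: there are only weighted sums compared against a threshold.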
In this section, we open up the "black box" of the perceptron and ex-
a