Connectionist Models and Linguistic Theory



COGNITIVE SCIENCE 18, 1-50 (1994)

Connectionist Models and Linguistic Theory: Investigations of Stress Systems in Language

    PRAHLAD GUPTA

    DAVID S. TOURETZKY

    Carnegie Mellon University

We question the widespread assumption that linguistic theory should guide the formulation of mechanistic accounts of human language processing. We develop a pseudo-linguistic theory for the domain of linguistic stress, based on observation of the learning behavior of a perceptron exposed to a variety of stress patterns. There are significant similarities between our analysis of perceptron stress learning and metrical phonology, the linguistic theory of human stress. Both approaches attempt to identify salient characteristics of the stress systems under examination without reference to the workings of the underlying processor. Our theory and computer simulations exhibit some strikingly suggestive correspondences with metrical theory. We show, however, that our high-level pseudo-linguistic account bears no causal relation to processing in the perceptron, and provides little insight into the nature of this processing. Because of the persuasive similarities between the nature of our theory and linguistic theorizing, we suggest that linguistic theory may be in much the same position. Contrary to the usual assumption, it may not provide useful guidance in attempts to identify processing mechanisms underlying human language.

The fundamental questions to be answered about language are, as Noam Chomsky has pointed out: (1) What is the system of knowledge in the mind/brain of the speaker of English or Spanish or Japanese? (2) How does this knowledge come into being? (3) How is this knowledge used? and (4) What are the physical mechanisms serving as the basis for this system of knowledge, and for its use? (Chomsky, 1988, p. 3).

We would like to thank Deirdre Wheeler for introducing us to the literature on stress and providing guidance in the early stages of this work. We also thank Gary Cottrell, Elan Dresher, Jeff Elman, Dan Everett, Michael Gasser, Brian MacWhinney, Jay McClelland, Eric Nyberg, Brad Pritchett, Steve Small, Deirdre Wheeler, and an anonymous reviewer for helpful comments on earlier versions of this paper, and David Evans for access to computing facilities at Carnegie Mellon's Laboratory for Computational Linguistics. Of course, none of them is responsible for any errors in this work, nor do they necessarily agree with the opinions expressed here.

The second author was supported by a grant from Hughes Aircraft Corporation, and by the Office of Naval Research under contract number N00014-86-K-0678.

Correspondence and requests for reprints should be sent to Prahlad Gupta, Department of Psychology, Carnegie Mellon University, Pittsburgh, PA 15213.


Linguistic theory is viewed, in the Chomskyan tradition, as illuminating the nature of mind, and as guiding the search for lower-level processing. Thus, according to Chomsky, to the extent that the linguist can provide answers to the first three questions above, the brain scientist can begin to explore answers to the fourth question, that is, can begin to explore the physical mechanisms that exhibit the properties revealed in the linguist's abstract theory. In the absence of answers to the first three questions, brain scientists do not know what they are searching for (Chomsky, 1988, p. 6).

Chomsky distinguishes three levels of analysis in linguistic inquiry: observation, description, and explanation; these are the tasks for linguistics (Chomsky, 1988, pp. 60-62). Explanation is via the formulation of Universal Grammar, which is viewed as providing a genuine explanation of the phenomena (Chomsky, 1988, p. 62). Universal Grammar reveals the principles of the mind (Chomsky, 1988, p. 91), and thus constitutes the abstract theory that is needed to guide the search for actual brain mechanisms.

This view can be stated more generally as the claim that the abstract formulations of linguistic theory are the appropriate framework within which to organize the search for the lower-level processing mechanisms of language. Our aim in this paper is to draw attention to the fact that this is an assumption, and one with important consequences. To the extent that this claim is true, it suggests that formulation of accounts of language processing, as well as the search for the neural substrates of language, should be guided by the constructs of linguistic theory. To the extent that this claim is false, the use of linguistic constructs as the basis for processing accounts and neural investigation may be misguided, and misleading.

Our goal, then, is to examine whether it is necessarily true that the constructs of linguistic theory play a valuable role in formulation of accounts of lower-level processing. In this paper, we will develop a pseudo-linguistic theory for the domain of linguistic stress, based on observation of the learning behavior of a perceptron exposed to a variety of stress patterns. We will then compare this theory with metrical phonology, the actual linguistic theory of stress, showing how our theory exhibits many of the kinds of regularities and predictions of metrical theory. Thus, our theory will be analogous, in important ways, to metrical theory. It predicts, for example, that certain stress systems will be unlearnable. We will then analyze the behavior of the underlying perceptron model in terms of its actual processing. This will show that our theoretical predictions of non-learnability bear no relation to the low-level, causal explanation we develop as part of the processing account.

We call our theory pseudo-linguistic only because it is based on data generated by the perceptron rather than by humans.


In short, we will demonstrate that a lower-level account may not be isomorphic (or even similar) to a higher-level account, and that under these circumstances, the higher-level account may provide little insight into the underlying phenomena. Furthermore, since the nature and the mode of development of our higher-level account are closely analogous to the nature and development of bodies of linguistic theory, this constitutes a strong argument against assuming that the high-level accounts of linguistic theory will necessarily illuminate lower-level processing mechanisms, and against assuming that linguistic theory is the appropriate conceptual framework within which to organize the search for these mechanisms. Such arguments have of course been made before (Pylyshyn, 1973; Rumelhart & McClelland, 1987; Smolensky, 1988). However, the present work substantiates the argument with a concrete demonstration of how an analytical framework that bears a close resemblance to linguistic theory can fail to have an analogue in the underlying processing mechanisms.

The organization of the paper is as follows. The first section provides some background information about the domain of linguistic stress, while the second section describes our perceptron simulations of stress learning. The third section develops our pseudo-linguistic theory, and discusses correspondences between aspects of the perceptron simulations, our theory, and metrical theory. In the fourth section, we show that our theory in reality offers only an abstract account that is unrelated to low-level processing characteristics. Having shown that a high-level account need not illuminate underlying processing, we turn in the fifth section to a consideration of some other reasons why the analyses of metrical theory might not be appropriate primitives for a processing model. In view of all this, we conclude that the Chomskyan assumption that linguistic theory should guide investigation of processing mechanisms may be unwarranted.

    1. BACKGROUND: STRESS SYSTEMS IN LANGUAGE

    1.1. Motivation for Choice of Domain

In order to examine a subsystem of linguistic theory in terms of its implications for lower-level processing, a prerequisite is, of course, that the theory be clearly specified. According to Dresher and Kaye (1990), stress systems are an attractive domain for investigation because: (a) the linguistic theory is well-developed, so that compared with syntax, there is a relatively complete description of the observed phenomena, and (b) stress systems can be studied relatively independently of other aspects of language (Dresher & Kaye, 1990, p. 138). We therefore chose to construct our pseudo-linguistic theory with respect to linguistic stress, since it is regarded as a well-defined domain for which theoretical analyses provide good coverage.


    1.2. Evolution of the Linguistic Theory

The analysis of stress has evolved through a number of phases:

1. Linear analyses presented stress as a phonemic feature of individual vowels, with different levels of stress representing different levels of absolute prominence. This approach was taken in Trager and Smith (1951), and culminated in Chomsky and Halle's seminal The Sound Pattern of English (1968).

2. Metrical theory, as developed by Liberman and Prince (Liberman, 1975; Liberman & Prince, 1977), introduced both a non-linear analysis of stress patterns (in terms of metrical trees), and the treatment of stress as a relative property rather than an absolute one; however, the stress feature was retained in the analysis.

3. In subsequent developments (Prince, 1976; Selkirk, 1980), reliance on this feature was eliminated by incorporation of the idea that subtrees of metrical trees had an independent status (metrical feet), so that stress assignment rules could make reference to them.

4. The positing of internal structure for syllables (McCarthy, 1979a, 1979b; Vergnaud & Halle, 1978) provided a means of distinguishing light and heavy syllables, a distinction to which stress patterns are widely sensitive, but which had been problematic under previous analyses.

5. An analysis of metrical tree geometries (Hayes, 1980) provided an account of many aspects of stress systems in terms of a small number of parameters.

Throughout the development of metrical theory, there has been debate over whether the autosegmental representations for stress are metrical trees only (Hayes, 1980), metrical grids only (Prince, 1983; Selkirk, 1984), or some combination of the two (Halle & Vergnaud, 1987a, 1987b; Hayes, 1984a, 1984b; Liberman, 1975; Liberman & Prince, 1977).

    1.3. Syllable Structure

A syllable is analyzed as being comprised of an onset, which contains the material before the vowel, and a rime. The rime is comprised of a nucleus, which contains the vocalic material, and a coda, which contains any remaining (non-vocalic) material. For further discussion, see Kaye (1989, pp. 54-58).

A syllable may be open (it ends in a vowel) or closed (it ends in a consonant). In terms of syllable structure, an open syllable has a non-branching rime (the rime has a nucleus, but not a coda), and a closed syllable has a branching rime (the rime has both a nucleus and a coda).
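As an illustration of our own (not from the paper), the onset/nucleus/coda decomposition and the open/closed distinction can be sketched as follows; the flat vowel inventory and greedy segmentation are simplifying assumptions that ignore diphthongs, glides, and language-specific phonotactics.

```python
VOWELS = set("aeiou")  # simplified vowel inventory (an assumption)

def split_syllable(syl):
    """Split a syllable string into (onset, nucleus, coda).

    The onset is the material before the vowel; the rime is the
    nucleus (vocalic material) plus the coda (remaining material).
    """
    i = 0
    while i < len(syl) and syl[i] not in VOWELS:
        i += 1
    j = i
    while j < len(syl) and syl[j] in VOWELS:
        j += 1
    return syl[:i], syl[i:j], syl[j:]

def is_closed(syl):
    """A closed syllable ends in a consonant, i.e. has a branching rime."""
    _, _, coda = split_syllable(syl)
    return len(coda) > 0
```

For example, `split_syllable("pat")` yields `("p", "a", "t")`, a branching (closed) rime, while `"pa"` has a non-branching (open) rime.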


In many languages, stress tends to be placed on certain kinds of syllables rather than on others; the former are termed heavy syllables, and the latter light syllables. What counts as a heavy or a light syllable may differ across languages in which such a distinction is present, but, most commonly, a heavy syllable is one that can be characterized as having a branching rime, and a light syllable can be characterized as having a non-branching rime (Goldsmith, 1990, p. 113). Languages that involve such a distinction between heavy and light syllables (i.e., between the weights of syllables) are termed quantity-sensitive, and languages that do not, quantity-insensitive. Note that in quantity-insensitive languages syllables can occur both with and without branching rimes, but the distinction between these kinds of syllables has no relevance for the placement of stress.

    1.4. Metrical Phonology

There seems to be theoretical agreement that stress patterns are sensitive to information about syllable structure, and in particular, to the structure of the syllable rime, and not the syllable onset. We follow this assumption. Thus rime structure is taken to be the basic level at which accounts of stress systems are formulated. (For an overview of metrical theory, see Goldsmith (1990, Chapter 4), Kaye (1989, pp. 139-145), van der Hulst and Smith (1982), or Dresher and Kaye (1990, pp. 1-8).) Stress patterns are controlled by metrical structures built on top of rime structures. The version of metrical structure adopted here is metrical feet. We assume the parameters formulated by Dresher and Kaye (1990, p. 142):

(P1) The word-tree is strong on the [Left/Right]
(P2) Feet are [Binary/Unbounded]
(P3) Feet are built from the [Left/Right]
(P4) Feet are strong on the [Left/Right]
(P5) Feet are Quantity-Sensitive (QS) [Yes/No]
(P6) Feet are QS to the [Rime/Nucleus]
(P7) A strong branch of a foot must itself branch [No/Yes]
(P8) There is an extrametrical syllable [Yes/No]
(P9) It is extrametrical on the [Left/Right]
(P10) A weak foot is defooted in clash [No/Yes]
(P11) Feet are non-iterative [No/Yes]
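Viewed computationally, these settings form a small configuration space. As an illustration of our own (not from Dresher and Kaye), the parameters can be written as a plain mapping; the key names are our paraphrases of the parameter statements, and each entry lists the legal values, with the unmarked value first.

```python
# The Dresher & Kaye (1990) stress parameters as a configuration mapping.
STRESS_PARAMETERS = {
    "P1_word_tree_strong_on": ("Left", "Right"),
    "P2_feet": ("Binary", "Unbounded"),
    "P3_feet_built_from": ("Left", "Right"),
    "P4_feet_strong_on": ("Left", "Right"),
    "P5_quantity_sensitive": ("Yes", "No"),
    "P6_qs_to_the": ("Rime", "Nucleus"),
    "P7_strong_branch_must_branch": ("No", "Yes"),
    "P8_extrametrical_syllable": ("Yes", "No"),
    "P9_extrametrical_on_the": ("Left", "Right"),
    "P10_weak_foot_defooted_in_clash": ("No", "Yes"),
    "P11_feet_noniterative": ("No", "Yes"),
}

# A language is then a choice of values; P6 and P9 are left out when the
# settings of P5 and P8 make them inapplicable.
maranungku = {"P1": "Left", "P2": "Binary", "P3": "Left", "P4": "Left",
              "P5": "No", "P7": "No", "P8": "No", "P10": "No", "P11": "No"}
```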

As an example of the application of these parameters, consider the stress pattern of Maranungku, in which primary stress falls on the first syllable of the word and secondary stress on alternate succeeding syllables. Figure 1

See, for example, Dresher and Kaye (1990, p. 141) or Goldsmith (1990, p. 170). Note, however, that some researchers have presented evidence that onsets may in fact be relevant to the placement of stress (Davis, 1988; Everett & Everett, 1984).


Figure 1. Metrical structures (feet and word-tree) for a six-syllable word in Maranungku.

shows an abstract representation of a six-syllable word, with each syllable represented as σ. The assignment of stress is characterized as follows. Binary, quantity-insensitive, left-dominant feet are constructed iteratively from the left edge of the word. Each foot has a strong and a weak branch (labeled S and W, respectively, in the figure). The strong, or dominant, branch assigns stress to the syllable it dominates. Since the feet are left-dominant, odd-numbered syllables are assigned stress. Over the roots of these metrical feet, a left-dominant word-tree is constructed, which assigns stress to the structure dominated by its leftmost branch. The third and fifth syllables are each dominated by the dominant branch of one metrical structure (a foot), while the first syllable is dominated by the dominant branches of two structures (a foot, and the word-tree). Even-numbered syllables are dominated only by non-dominant branches of feet. The result is that even-numbered syllables receive no stress; the third and fifth syllables receive one degree of stress (secondary stress); and the first syllable receives two degrees of stress (primary stress). The parameter settings characterizing Maranungku are [P1 Left], [P2 Binary], [P3 Left], [P4 Left], [P5 No], [P7 No], [P8 No], [P10 No], [P11 No]. Parameters P6 and P9 do not apply because of the settings of parameters P5 and P8, respectively.
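As a concrete gloss on this derivation, the Maranungku contour can be computed directly. The following sketch is ours, and the perceptron model described later does not work this way; it simply mimics the foot-based rule, encoding primary stress as 2, secondary as 1, and no stress as 0.

```python
def maranungku_stress(n_syllables):
    """Return the Maranungku stress contour for a word of n syllables.

    Binary left-dominant feet are built from the left edge, so every
    syllable at a 0-based even index heads a foot and gets stress; the
    left-strong word tree promotes the first foot's head to primary.
    """
    contour = [0] * n_syllables
    for i in range(0, n_syllables, 2):  # heads of left-dominant binary feet
        contour[i] = 1                  # secondary stress
    contour[0] = 2                      # word tree strong on the left
    return contour
```

For a six-syllable word this yields [2, 0, 1, 0, 1, 0]: primary stress on the first syllable and secondary stress on the third and fifth, as in Figure 1.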

1.5. Principles and Parameters

Metrical theory illustrates the principles and parameters approach to language, one of whose central hypotheses is that language learning proceeds through the discovery of appropriate parameter settings. Every possible human language can be characterized in terms of parameter settings; once


these settings are determined, the nature of structure-sensitive operations and the structures on which they operate is known, so that the details of language processing are automatically determined (at an abstract level). Subsequently, the assignment of stress in the actual production or processing of language is assumed to involve neural processes that correspond quite directly with the abstract process of application of these parameter settings as guidelines to the construction and manipulation of metrical feet.

    1.6. Previous Computational Models of Stress Learning

Computational models of stress systems in language have been developed by Dresher and Kaye (1990) and by Nyberg (1990, 1992). The focus of these models is on the learning of the parameters specified by metrical theory; they therefore take as a starting point the constructs of that theory, and incorporate its assumptions. What they add to the linguistic theory is what Dresher and Kaye term a learning theory: a specification of how the data the language learner encounters in its environment are to be used to set parameters. The following features characterize these models:

1. They assume the existence of processes explicitly corresponding to the linguistic notion of parameter setting.

2. They propose a learning theory as an account of that parameter-setting process.

3. They assume that the process of production (i.e., of producing appropriate stress contours for input words after learning has occurred) involves explicit representational structures and structure-sensitive operations directly corresponding to metrical-theoretic trees and operations on those trees.

4. They assume no necessary relationship between the processing mechanisms involved in learning vs. those involved in production. That is, learning is accomplished by supplying values for the parameters defined by metrical theory; these values then form a knowledge base for stress assignment, whose processing involves, for example, the construction of binary trees from right to left, an operation having no necessary correspondence with those by which the parameter values were acquired.

The above models exemplify the assumption we are interested in examining in the present work: that abstract linguistic formulations are the appropriate starting point for processing models.

Processing in the perceptron will differ from the Dresher and Kaye and Nyberg models with regard to the above-noted characteristics:

1. There is no explicit incorporation of parameters in the perceptron model.


2. The learning theory employed consists of one of the general learning algorithms common in connectionist modeling, and is not an account of parameter-setting.

3. The process of production does not involve explicitly structured representations in the classical sense.

4. The processing mechanisms and structures involved in production are essentially the same as those involved in learning.

Thus, to the extent that we are successful in constructing an abstract, pseudo-linguistic theory that predicts behavior of the perceptron model, it will not be merely by virtue of having built in the constructs of metrical theory to begin with.

2. OVERVIEW OF MODEL AND SIMULATIONS

    2.1. Nineteen Stress Systems

Nine quantity-insensitive (QI) languages and ten quantity-sensitive (QS) languages were examined in our experiments. The data, summarized in Table 1, were taken primarily from Hayes (1980). Note that the QI stress patterns of Latvian and French, Maranungku and Weri, Lakota and Polish, and Paiute and Warao are mirror images of each other. (The QS stress patterns of Malayalam and Yapese, Ossetic and Rotuman, Eastern Permyak Komi and Eastern Cheremis, and Khalkha Mongolian and Aguacatec Mayan are also mirror images.)

    2.2. Structure of the Model

In separate experiments, we taught a perceptron to produce the stress pattern of each of the nineteen languages. The domain was limited to single words, as in the previous learning models of metrical phonology developed by Dresher and Kaye (1990) and Nyberg (1990). Again as in the other models, the effects of morpho-syntactic information such as lexical category were ignored, and the simplifying assumption was made that the only relevant information about syllables was their weight.

Two input representations were used. In the syllabic representation, used for QI patterns only, a syllable was represented as a [1, 1] vector, and [0, 0] represented no syllable. In the weight-string representation, which was necessary for QS languages, the input patterns used were [1, 0] for a light syllable, [0, 1] for a heavy syllable, and [0, 0] for no syllable. For stress systems with up to two levels of stress, the output targets used in training were 1.0 for primary stress, 0.5 for secondary stress, and 0 for no stress. For

Descriptions somewhat more complex than ours have been reported for Polish (Halle & Vergnaud, 1987a, pp. 57-58) and Malayalam (Hayes, 1980, pp. 66, 109). However, this does not detract from our discussion in any way, as stress systems corresponding to our simplifications are reported to exist: Swahili (Halle & Clements, 1983, p. 17) and Gurkhali (Hayes, 1980, p. 66), corresponding to Polish and Malayalam, respectively.


[Table 1. Descriptions and example stress patterns of the nine quantity-insensitive and ten quantity-sensitive stress systems examined; the table's content is garbled beyond recovery in this transcript.]


Figure 2. Perceptron model used in simulations. [The figure shows a single output unit (the perceptron) above an input layer of 2 x 13 units, with buffer positions labeled -6 through +6.]

stress systems with three levels of stress, the output targets were 1.0 for primary stress, 0.6 for secondary, 0.35 for tertiary, and 0 for no stress. The input data set for all stress systems consisted of all word-forms of up to seven syllables. With the syllabic input representation there are seven of these, and with the weight-string input representation, there are 254 distinct patterns. The perceptron's input array was a buffer of 13 syllables; each word was processed one syllable at a time by sliding it through the buffer (see Figure 2). The desired output at each step was the stress level of the middle syllable of the buffer. Connection weights were adjusted at each step using the back-propagation learning algorithm (Rumelhart, Hinton, & Williams,

In practice, we used weight-string training sets in which there were an equal number of input patterns of each length. Thus, there was one instance of each of the 128 (= 2^7) seven-syllable patterns, and 64 instances of each of the two monosyllabic patterns. This length balancing was necessary for certain languages for successful training. We do not know how well it agrees with the actual frequency of forms in these languages, though as a general rule high-frequency forms are probably shorter.

Note that although the architecture of the model is two-layered, with a single output unit, as in a simple perceptron, we used the back-propagation algorithm (BP) rather than the Widrow-Hoff algorithm (WH) (Widrow & Hoff, 1960). BP adds to WH the scaling of the error for each output unit by a function (the derivative) of the activation of that output unit, and thus performs a more sensitive and example-tuned weight adjustment than WH. Note that BP and WH algorithms for two layers are guaranteed by the perceptron convergence procedure to be equivalent in terms of learning capabilities, for binary-valued outputs. However, outputs in the present simulations are not always binary; they sometimes take on intermediate values. As a result, the different computation of the error term in BP turned out to provide better learning than WH.


1986). One epoch consisted of one presentation of the entire training set. The network was trained for as many epochs as necessary to ensure that the stress value produced by the perceptron was within 0.1 of the target value, for each syllable of the word, for all words in the training set. A learning rate of 0.05 and momentum of 0.90 were used in all simulations. Initial weights were uniformly distributed random values in the range of ±0.5. Each simulation was run at least three times, and the learning times averaged.
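The stopping criterion can be sketched like this (a minimal sketch; `model`, with its `train_epoch` and `predict` methods, is a hypothetical stand-in for the perceptron, not the authors' code):

```python
# Train until every syllable's stress output is within 0.1 of its target
# for all words in the training set, as described above.
def train_until_converged(model, training_set, tol=0.1, max_epochs=10000):
    for epoch in range(1, max_epochs + 1):
        model.train_epoch(training_set)   # lr = 0.05, momentum = 0.90 assumed
        converged = all(
            abs(out - tgt) <= tol
            for word, targets in training_set
            for out, tgt in zip(model.predict(word), targets)
        )
        if converged:
            return epoch
    return None  # pattern not learnable within the epoch budget
```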

    2.3. Training the Perceptron

To enable precise description, let the input buffer be regarded as a 2 × 13 array as shown in Figure 2. Let the 7th, that is, center, column be numbered 0, let the columns to the left of the center be numbered negatively -1 through -6 going outwards from the center, and the columns to the right of the center be numbered positively +1 through +6, going outwards from the center.

As an example of processing, suppose that the next word in the training set for some quantity-insensitive language is the four-syllable pattern S S S S, and the associated stress contour (i.e., target) is [0 0 0 1], indicating that the final syllable receives stress, and all other syllables are unstressed.6 The first syllable enters the rightmost element of the input buffer (element +6). This is the first time step of processing. At the next time step, the first syllable (in the rightmost input buffer position, that is, element +6) is shifted left to buffer element +5, and the second syllable enters the input buffer in element +6.

After two further time steps, elements +3, +4, +5, and +6 of the input buffer contain the four syllables of the current word. At the next two time steps, the leftward flow of syllables through the input buffer continues, until at time step 6, the word's four syllables are in elements +1 through +4. At time step 7, the first syllable moves into element 0 of the input buffer, and the four syllables of the word occupy elements 0 through +3 of the buffer. At this time step, training of the network occurs for the first time: the perceptron is trained to associate the pattern in its input buffer (the four syllables of the word, in buffer elements 0 through +3) with the target stress level for the syllable currently in buffer element 0. At the next time step, the syllables of the word are shifted left again, and occupy buffer elements -1 through +2. The network is trained to associate this pattern with the target stress level for the second syllable, which is now in element 0. Similarly, at the next two time steps, the perceptron is trained to produce the stress levels appropriate for the third and fourth syllables of the word, as they come to occupy element 0 of the input buffer.

6 To simplify discussion here, each syllable is represented as an S token, rather than as an H or L. For quantity-insensitive systems, this information suffices to determine placement of stress.


At this point, the buffer is flushed (all elements reset to zero), and one trial is over. The next word in the training set can now enter, beginning the next trial. Thus, the processing of one word, syllable by syllable, constitutes one trial. One pass through all the words in the training set constitutes an epoch.
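The buffer mechanics just described can be sketched as follows (an illustrative reconstruction, not the authors' code; empty buffer slots are shown as None):

```python
# Generate the (buffer, target) training pairs for one word, mimicking the
# left-shifting 13-element input buffer: training occurs once per syllable,
# when that syllable occupies the center of the buffer (element 0).
def training_examples(word, targets, buffer_width=13):
    center = buffer_width // 2          # array index 6 is "element 0"
    for i, target in enumerate(targets):
        buffer = [None] * buffer_width  # None marks an empty slot
        for j, syllable in enumerate(word):
            pos = center + (j - i)      # syllable i sits at the center
            if 0 <= pos < buffer_width:
                buffer[pos] = syllable
        yield buffer, target
```

For the four-syllable word in the example, the first training pair has the word in elements 0 through +3 (array indices 6 through 9) with target 0, and the last has it in elements -3 through 0 with target 1.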

It should be noted that sequential processing of syllables is not a necessary part of this model. Exactly the same effect can be obtained by a parallel scheme of 13 perceptron-like units whose weight vectors are tied together by weight-sharing (Hertz, Krogh, & Palmer, 1991, p. 140). All 13 units would learn in unison, and words could be processed in one parallel step.
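The parallel scheme can be sketched as a sliding window applying a single shared weight vector (an illustrative sketch under the simplifying assumption that each buffer column holds one scalar feature; `parallel_stress` is a hypothetical helper, not the authors' code):

```python
# One parallel step: each of the word's positions applies the SAME shared
# 13-element weight vector to a window of the zero-padded input, so all
# stress values are computed at once.
def parallel_stress(word_features, weights, bias=0.0):
    half = len(weights) // 2
    padded = [0.0] * half + list(word_features) + [0.0] * half
    return [
        bias + sum(w * padded[i + j] for j, w in enumerate(weights))
        for i in range(len(word_features))
    ]
```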

    3. PERCEPTRON STRESS LEARNING:

    A PSEUDO-LINGUISTIC ACCOUNT

In this section, we analyze the characteristics of the various stress patterns to which the perceptron is exposed, and relate them to the perceptron's learning behavior, with a view to developing a pseudo-linguistic theory. In the same way that metrical phonology attempts to provide an account of linguistic stress in humans, our pseudo-linguistic theory will attempt to provide an account of the learning of stress in a perceptron.

We first present an analysis of factors affecting learnability for the QI stress systems. We develop an analytic scheme that enables us both to characterize the patterns and to predict the ease or difficulty of their learning. We then include QS systems and show that one analytical framework can take into account both QI and QS systems. This overall framework constitutes our pseudo-linguistic theory.

3.1. Learnability of QI Systems
We begin by noting that learning times differ considerably for {Latvian, French}, {Maranungku, Weri}, {Lakota, Polish}, and Garawa, as shown in the last column of Table 2. Moreover, Paiute and Warao were unlearnable with this model.*

These are learning times with the syllabic input representation.

* A case could perhaps be made that monosyllables are special (Elan Dresher, personal communication, March 16, 1992), so that it might be plausible to assume that there are no stressed monosyllables in the data. We investigated this possibility in two ways. First, if monosyllables are treated as receiving no stress, then the stress patterns of Paiute and Warao become learnable within the current architecture: it takes 45 epochs for the network to learn the stress pattern of Paiute, and 36 epochs for Warao. Second, if monosyllables are removed from the data set, the learning times are 44 epochs for Paiute, and 46 epochs for Warao. Note also that both stress patterns were learnable, even with the stressed monosyllables, using either a two-layer architecture with two output units (targets were [0, 0] for no stress, [1, 0] for secondary stress, and [1, 1] for primary stress), or a three-layer architecture. We present the finding of nonlearnability within the present model in order to maintain a consistency of analysis in one model, and since the two-layer model with a single output unit facilitates analysis of connection weights and their contribution to processing.


TABLE 2
Preliminary Analysis of Learning Times for QI Stress Systems

IPS SCA Alt MPS MSL   QI Language   Ref   Epochs (syllabic)
 0   0   0   0   0    Latvian       L1     17
                      French        L2     16
 0   0   1   0   1    Maranungku    L3     37
                      Weri          L4     34
 0   1   1   0   1    Garawa        L5    165
 1   0   0   0   0    Lakota        L6    255
                      Polish        L7    254
 1   0   1   0   1    Paiute        L8     **
                      Warao         L9     **

Note. These learning times were obtained using the syllabic input representation. ** = unlearnable. IPS = Inconsistent Primary Stress; SCA = Stress Clash Avoidance; Alt = Alternation; MPS = Multiple Primary Stresses; MSL = Multiple Stress Levels. Symbols L1-L9 refer to Table 1.

Examination of the inherent features of these stress patterns suggests various factors as being relevant to learning:

Alternation of stresses (as opposed to a single stress) is suggested by the difference between learning times for {Latvian, French} and {Maranungku, Weri}, which also suggests that the number of stress levels may be relevant.

Recall from Table 1 that, in Garawa, primary stress is placed on the first syllable, secondary stress on the penultimate syllable, and tertiary stress on alternate syllables preceding the penultimate, but that no stress appears on the second syllable. The primary, secondary, and tertiary stress patterns potentially lead to stress appearing on both the first and the second syllables; however, this is avoided (stress is never placed on the second syllable). This exemplifies the tendency in human languages to avoid the appearance of stress on adjacent syllables. The greater learning time for Garawa suggests that such stress clash avoidance is computationally expensive.

In languages such as Latvian, French, Maranungku, Weri, and Garawa, primary stress is always on a syllable at the edge of the word. In Lakota and Polish, for which learning times are substantially greater than those of the other languages, primary stress is always at a non-edge syllable, except in mono- and disyllables. (Paiute and Warao are identical with respect to the placement of primary stress to Lakota and Polish, respectively, but are unlearnable.) Thus, placement of primary stress seems computationally relevant. In particular, it appears more difficult to learn patterns in which primary stress is assigned at the edges inconsistently.

To test these assumptions, and to determine what features of Paiute and Warao led to their nonlearnability, we constructed hypothetical stress patterns that exhibit more combinations of the above features than may actually exist. These stress patterns are described in Table 3. We trained networks on these variations in order to evaluate the effects of the various features. The following factors emerged as determinants of learnability for the range of QI patterns considered:

(I) Inconsistent Primary Stress (IPS): it is computationally expensive to learn the pattern if neither edge receives primary stress except in mono- and disyllables; this can be regarded as an index of computational complexity that takes the values (0, 1): 1 if an edge receives primary stress inconsistently, and 0 otherwise.

(II) Stress Clash Avoidance (SCA): if the components of a stress pattern can potentially lead to stress clash, then the language may either actually permit such stress clash, or it may avoid it. This index takes the values (0, 1): 0 if stress clash is permitted, and 1 if stress clash is avoided.

(III) Alternation (Alt): an index of learnability with value 0 if there is no alternation, and value 1 if there is. Alternation means a pattern of some kind that repeats on alternate syllables.

(IV) Multiple Primary Stresses (MPS): has value 0 if there is exactly one primary stress, and value 1 if there is more than one primary stress. It has been assumed that a repeating pattern of primary stresses will be on alternate, rather than adjacent, syllables. Thus, [Alternation = 0] implies [MPS = 0]. The hypothetical stress patterns examined include some with more than one primary stress; however, as far as is known, no actually occurring QI stress pattern has more than one primary stress.

(V) Multiple Stress Levels (MSL): has value 0 if there is a single level of stress (primary stress only), and value 1 otherwise.

It is possible to order these factors with respect to each other to form a five-digit binary string characterizing the ease/difficulty of learning. That is, the computational complexity of learning a stress pattern can be characterized as a 5-bit binary number whose bits represent the five factors above, in decreasing order of significance. Table 2 shows that this characterization captures the learning times of the QI patterns quite accurately. As an example


TABLE 3
Descriptions of Hypothetical QI Stress Patterns
[The body of Table 3, which describes each of the hypothetical stress patterns h1-h30, is garbled beyond recovery in this copy.]


of how to read Table 2, note that Garawa takes longer to learn than Latvian (165 vs. 17 epochs). This is reflected in the parameter setting for Garawa, 01101, being lexicographically greater than that for Latvian, 00000.
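The 5-bit characterization can be sketched as follows (the factor values for Garawa and Latvian are taken from Table 2; the helper itself is an illustrative sketch, not the authors' code):

```python
# Most-significant to least-significant factor, as ordered in the text.
FACTORS = ("IPS", "SCA", "Alt", "MPS", "MSL")

def complexity(pattern):
    """5-bit string whose lexicographic order predicts learning difficulty."""
    return "".join(str(pattern[f]) for f in FACTORS)

garawa = {"IPS": 0, "SCA": 1, "Alt": 1, "MPS": 0, "MSL": 1}
latvian = {"IPS": 0, "SCA": 0, "Alt": 0, "MPS": 0, "MSL": 0}
```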

    The analysis of learnability is summarized in Table 4 for all the QI stress

    patterns, both actual and hypothetical. It can be seen that the 5-bit charac-

    terization fits the learning times of various actual and hypothetical patterns

    reasonably well; there are, however, exceptions, indicating that this 5-bit

    characterization is only a heuristic. For example, the hypothetical stress pat-

    terns with reference numbers h21 through h25 have a higher 5-bit character-

    ization than some other stress patterns, but lower learning times.

The effect of stress clash avoidance is seen in consistent learning time differentials between stress patterns of complexity less than or greater than binary 1000. Learning times with complexity 0001 are in the range 10 to 25 epochs, while complexity 1001 patterns are of the order of 170 epochs; complexity 0010 is of the order of 30 epochs, and 1010, 190 epochs, for patterns the only difference between which is the absence/presence of stress clash avoidance (Latvian2edge and Latvian2edgeSCA, references h5 and h19 respectively). A pattern with complexity 0011 (Latvian2edge2stress, reference h6) has a learning time of 37 epochs, while a pattern differing only in the addition of SCA (Latvian2edge2stressSCA, reference h20) takes 206 epochs. Complexity 0101 patterns are in the range 30 to 60 epochs, while complexity 1101 patterns are in the range 70 to 170 epochs; in particular, while Garawa (reference L5) has a learning time of 165 epochs, the same pattern without SCA has a learning time of 36 epochs (Garawa-SC, reference h10). A stress pattern of complexity 0111 takes 65 epochs to learn (Latvian2edge2stress-1alt, reference h16), while addition of stress clash avoidance results in a learning time of 129 epochs (Latvian2edge2stress-1alt-SCA, reference h25).

The effect of alternation is seen in a contrast between learning times for patterns of complexity 001 (range: 10 to 25 epochs) and 101 (range: 30 to 60 epochs); 010 and 110 (30 epochs vs. a range of 60 to 90 epochs); 011 and 111 (37 vs. 65 epochs).

The effect of multiple primary stresses is seen in the contrast between the stress patterns: Latvian2stress (reference h1, complexity 001, 21 epochs) and Latvian2edge2stress (reference h6, complexity 011, 37 epochs); Latvian2edge2stress-alt (reference h9, complexity 1011, 56 epochs) and Latvian2edge2stress-1alt (reference h16, complexity 111, 65 epochs).

The effect of inconsistent primary stress is considerable: stress patterns with the most significant bit 1 are learnable in the perceptron model only if all the other bits are 0; such patterns (Lakota, Polish, references L6, L7, complexity 10000, 255 epochs) have a higher learning time than any of the patterns with most significant bit 0. All the examined stress patterns of complexity greater than 10000 were unlearnable in the perceptron model. Recall that Paiute and Warao were unlearnable; the present framework is consistent with that result, since under the present analysis, these two patterns have complexity 10101. Recall also from footnote 6 that if we assume no stressed monosyllables in Paiute and Warao, then these stress patterns become learnable, in approximately 35-45 epochs. Under these circumstances, the complexity index of the two stress systems is 00101. As can be seen from Table 2 and Table 4, a learning time of approximately 40 epochs is completely consistent with the learning times obtained for other stress patterns with this index. That is, our scheme for indexing complexity is consistent with the results of Paiute and Warao, whether or not there are assumed to be stressed monosyllables.


TABLE 4
Analysis of Quantity-Insensitive Learning

IPS SCA Alt MPS MSL   Language                            Ref   Epochs (syllabic)
 0   0   0   0   0    Latvian                             L1     17
                      French                              L2     16
 0   0   0   0   1    Latvian2stress                      h1     21
                      Latvian3stress                      h2     11
                      French2stress                       h3     23
                      French3stress                       h4     14
 0   0   0   1   0    Latvian2edge                        h5     30
 0   0   0   1   1    Latvian2edge2stress                 h6     37
 0   0   1   0   0    Impossible
 0   0   1   0   1    Maranungku                          L3     37
                      Weri                                L4     34
                      Paiute (unstressed monosyllables)   L8     45
                      Warao (unstressed monosyllables)    L9     36
                      Maranungku2stress                   h7     43
                      Weri2stress                         h8     41
                      Latvian2edge2stress-alt             h9     53
                      Garawa-SC                           h10    36
                      Garawa2stress-SC                    h11    50
 0   0   1   1   0    Maranungku1stress                   h12    61
                      Weri1stress                         h13    65
                      Latvian2edge-alt                    h14    70
                      Garawa1stress-SC                    h15    33
 0   0   1   1   1    Latvian2edge2stress-1alt            h16    65
 0   1   0   0   0    Impossible
 0   1   0   0   1    Garawa-non-alt                      h17   164
                      Latvian3stress2edgeSCA              h18   163
 0   1   0   1   0    Latvian2edgeSCA                     h19   194
 0   1   0   1   1    Latvian2edge2stressSCA              h20   206
 0   1   1   0   1    Garawa                              L5    165
                      Garawa2stress                       h21    71
 0   1   1   1   0    Latvian2edge2stress-alt-SCA         h22    91
                      Garawa1stress                       h23   121
 0   1   1   1   1    Latvian2edge-alt-SCA                h24   126
                      Latvian2edge2stress-1alt-SCA        h25   129
 1   0   0   0   0    Lakota                              L6    255
                      Polish                              L7    254
 1   0   0   0   1    Lakota2stress                       h26    **
 1   0   0   1   0    Lakota2edge                         h27    **
 1   0   0   1   1    Lakota2edge2stress                  h28    **
 1   0   1   0   1    Paiute                              L8     **
                      Warao                               L9     **
 1   0   1   1   0    Lakota-alt                          h29    **
 1   0   1   1   1    Lakota1stress-alt                   h30    **

Note. These learning times were obtained using the syllabic input representation. ** = unlearnable. IPS = Inconsistent Primary Stress; SCA = Stress Clash Avoidance; Alt = Alternation; MPS = Multiple Primary Stresses; MSL = Multiple Stress Levels. Symbols L1-L9 refer to Table 1, and h1-h30 to Table 3.


The impact of multiple stress levels is relatively smaller, and less uniform, both of which motivate this factor's being treated as least significant. Thus, though there are several instances where a stress pattern with a greater number of stress levels has a higher learning time (h1 vs. L1; h3 vs. L2; h6 vs. h5; h7 vs. L3; h8 vs. L4; h21 vs. L5), there are also cases in which a stress pattern with a higher number of stress levels has a lower learning time than one with fewer stress levels (h2 vs. h1; h4 vs. h3; h11 vs. h15; h21 vs. h23).

The effect of a particular factor seems to be reduced when a higher-order bit has a nonzero value. Thus, the effects of alternation are less clear when there is stress clash avoidance: without SCA, the range of learning times for patterns without alternation is 10 to 40 epochs, and with alternation 30 to 90 epochs; but with SCA, the range without alternation is 160 to 210 epochs, and with alternation 70 to 170 epochs.

In summary, the complexity measure suggested here appears to identify a number of factors relevant to the learnability of QI stress patterns within a minimal connectionist architecture. It also assesses their relative impacts. The analysis is undoubtedly a simplification, but it provides a framework within which to relate the various learning results.

    3.2. Incorporating QS Systems

We now turn to a consideration of quantity-sensitive (QS) stress systems. For QS patterns, information about syllable weight needs to be included in the input representation: the input has to consist of (encoded) sequences of H and L tokens. A purely syllabic input representation is, by definition of quantity-sensitivity, inadequate. Weight-string representations were therefore adopted, as discussed in Section 2.2. To maintain consistency of analysis across QI and QS stress patterns, simulations for the QI languages were re-run using the weight-string representation. Note that the stress patterns for all possible weight strings of length n are the same for a QI language.

Learning times are shown in Table 5 for simulations of all QI and QS stress systems. The section on the left shows QI learning times, while the section on the right shows QS learning times. Note that in this table, learning times for all languages are reported in terms of the weight-string representation rather than the unweighted syllabic representation used for the previous QI studies.

The differences in learning times across QI patterns are less marked than the differentials in Table 2. This is the result of the increased training set size with the weight-string representation as compared with the syllabic representation. However, learning times are completely consistent with the overall learnability analysis developed in the previous section, so that the analysis of QI systems is in no way affected. The reader may wish to compare the learning times for QI systems in Table 5 (weight-string input representations) with those in Table 2 (syllabic input representations).
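The point that a QI contour depends only on word length can be sketched as follows (an illustrative sketch, not the authors' code; the final-stress rule stands in for a French-like QI pattern):

```python
from itertools import product

# Build the weight-string training set for a QI language: every one of the
# 2**n weight strings of length n receives the SAME stress contour.
def qi_weight_string_set(n, contour_for_length):
    contour = contour_for_length(n)
    return [("".join(w), contour) for w in product("HL", repeat=n)]

def final_stress(n):
    # French-like QI rule: only the final syllable is stressed.
    return [0] * (n - 1) + [1]
```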


In order to incorporate QS systems, the learnability analysis proposed in Section 3.1 on the basis of QI patterns turns out to require some refinement (although the analysis of QI systems themselves will remain unchanged; see below). Inconsistent Primary Stress (IPS) was previously hypothesized as taking binary values; a value of 1 for IPS indicated that primary stress was assigned inconsistently at the edge of words; a value of 0 indicated that this was not the case. If this measure is modified so that its value indicates the proportion of cases in which primary stress is not assigned at the edge of a word, the learning results for both QI and QS patterns can be integrated, to a large extent, into a unified account. Recall that learning times for QS systems are those shown in the right-hand section of Table 5.

The learning times for Malayalam and Yapese are approximately 20 epochs, while those for Ossetic and Rotuman are approximately 30 epochs. The difference between these pairs of stress patterns is: for Malayalam and Yapese, primary stress is placed at the end except when the edge vowel is short and the next vowel long (i.e., except 0.25 of the time); for Ossetic and Rotuman, primary stress falls at the edge except when the edge vowel is short, that is, except in 0.5 of the cases.

The five factors discussed earlier were: Inconsistent Primary Stress (IPS); Stress Clash Avoidance (SCA); Alternation (Alt); Multiple Primary Stresses (MPS); and Multiple Stress Levels (MSL). The values of these indices, respectively, for both Malayalam and Yapese, are [0.25 0 0 1 0], and for both Ossetic and Rotuman, [0.5 0 0 1 0]. The difference between learning times for these pairs of otherwise identical patterns can then be accounted for in terms of differing values of the IPS measure. Note that the earlier analysis of QI languages remains unchanged: stress patterns that had IPS value 0 still do, and those that had IPS value 1 still do as well.
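The graded IPS measure can be sketched as the fraction of weight strings whose primary stress misses the edge (an illustrative sketch; the two edge rules below paraphrase the text's descriptions of Ossetic-type and Yapese-type patterns and are assumptions, not the authors' code):

```python
from itertools import product

def ips(stress_on_edge, length=7):
    """Proportion of weight strings in which primary stress is NOT at the edge."""
    words = list(product("HL", repeat=length))
    return sum(not stress_on_edge(w) for w in words) / len(words)

# Ossetic/Rotuman-type: edge stress unless the edge vowel is short.
def ossetic_edge(word):
    return word[0] == "H"

# Malayalam/Yapese-type: edge stress unless the edge vowel is short
# and the next one long.
def yapese_edge(word):
    return not (word[-1] == "L" and word[-2] == "H")
```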

The learning times of Komi and Cheremis are substantially higher than those of Koya, Eskimo, Malayalam, Yapese, Ossetic, and Rotuman. Recall the stress pattern of Komi: stress the first heavy syllable, or the last syllable if there are no heavy syllables. That is, for Komi, a particular syllable S receives primary stress under the following conditions: (1) there are no heavy syllables to the left of S in the syllable string; and (2) S is heavy or S is the last syllable. The second clause of the conditional involves single-positional information: information either about the syllable S itself (S is heavy), or about the absence/presence of a syllable right-adjacent to S in the weight-string. (If there is no syllable to the right of S in the weight-string, then S is the last syllable; if there is a syllable right-adjacent to S, then S is not the last syllable.) The first clause of the conditional, however, involves aggregative information: information about all the syllables to the left of S in the weight-string. We see that in assigning stress to syllable strings in Komi, one must somehow "scan" the input string to extract this aggregative information; similarly for Cheremis.


TABLE 5
Summary of Results: Analysis of QI and QS Learning
[The body of Table 5, which lists IPS, SCA, Alt, MPS, and MSL values together with learning times in epochs for the QI languages (left section) and the QS languages (right section), is garbled beyond recovery in this copy.]
Note. These learning times were obtained using the weight-string input representation. IPS = Inconsistent Primary Stress; SCA = Stress Clash Avoidance; Alt = Alternation; MPS = Multiple Primary Stresses; MSL = Multiple Stress Levels. Symbols refer to Table 1.


Komi and Cheremis can therefore be analyzed as stress patterns that require aggregative information for the determination of stress placement; none of the other stress patterns requires such information. For example, for Koya, a syllable S should receive stress if it is the first syllable (which can be determined from information about the presence/absence of a syllable in the left-adjacent weight-string position), or if it is heavy, both of which are single-positional kinds of information. For Ossetic, a syllable S should be stressed if (a) it is the first syllable and it is heavy (which requires single-positional information about the left-adjacent weight-string position, and about S itself); or if (b) it is the second syllable (single-positional information about the weight-string element two positions to the left of S) and the syllable in the left-adjacent position is light (also single-positional information).
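These two rule statements can be made concrete with a short sketch. The code below is our illustration, not the simulation itself: it applies the Koya and Ossetic rules to weight strings ('H' = heavy, 'L' = light), using only the single-positional information just described.

```python
def stress_koya(weights):
    # Koya: stress syllable S if it is the first syllable (no syllable in
    # the left-adjacent weight-string position) or if S itself is heavy.
    return [i == 0 or w == 'H' for i, w in enumerate(weights)]

def stress_ossetic(weights):
    # Ossetic: stress S if (a) it is the first syllable and heavy, or
    # (b) it is the second syllable and the left-adjacent syllable is light.
    return [(i == 0 and w == 'H') or (i == 1 and weights[0] == 'L')
            for i, w in enumerate(weights)]

print(stress_koya(list("LHLL")))     # [True, True, False, False]
print(stress_ossetic(list("LLHL")))  # [False, True, False, False]
```

Each decision inspects at most two weight-string positions (S itself and its left neighbor), which is what makes the information single-positional rather than aggregative.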

The difference in learning times between Komi and Cheremis on the one hand, and Koya, Eskimo, Malayalam, Yapese, Ossetic and Rotuman on the other, can now be analyzed in terms of differing informational requirements. Whether or not aggregative information is needed therefore seems to be a further factor relevant to the learnability of stress patterns.

We further need to consider the fact that the patterns of Mongolian and Mayan have very much higher learning times than those of any other stress patterns, including Komi and Cheremis. Recall the stress pattern of Mongolian: stress the first heavy syllable, or the first syllable if there are no heavy syllables. Notice how it differs from that of Komi, whose pattern is: stress the first heavy syllable, or the last syllable if there are no heavy syllables.

Thus, for Mongolian, if the current syllable S is heavy it should receive stress if it is the first heavy syllable. If the current syllable S is light, then it should receive stress only if (a) there is no syllable to its left in the weight-string (which indicates that it is the first syllable), and (b) there is no heavy syllable to its right in the weight-string. That is, for Mongolian, aggregative information is required about heavy syllables both to the left of the current syllable and to its right. Information about heavy syllables to the left of S is relevant if S is heavy; information about heavy syllables to the right is relevant if S is light. This requirement for keeping track of dual kinds of aggregative information seems to be what makes the pattern so difficult to learn. This contrasts with Komi, for which aggregative information is required only about syllables to the left of S.

A parallel analysis can be made for Mayan. For both Mongolian and Mayan, then, the very high learning times result from the requirement for two kinds of aggregative information, in contrast with only one kind for Komi and Cheremis. Acquiring this information requires bidirectional scanning of the input string for Mongolian and Mayan, as compared with unidirectional scanning, as discussed above, for Komi and Cheremis.
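The contrast between unidirectional and bidirectional scanning can be sketched as follows (our illustration; 'H' = heavy, 'L' = light). Komi needs only a running record of heavy syllables already seen to the left, while Mongolian's rule for a light first syllable must also look rightward through the string.

```python
def stress_komi(weights):
    # Unidirectional: one left-to-right pass, tracking whether a heavy
    # syllable has occurred to the left of the current position.
    seen_heavy, out = False, []
    for i, w in enumerate(weights):
        first_heavy = (w == 'H' and not seen_heavy)
        seen_heavy = seen_heavy or w == 'H'
        # The last syllable is stressed if the word has no heavy syllable.
        out.append(first_heavy or (i == len(weights) - 1 and not seen_heavy))
    return out

def stress_mongolian(weights):
    # Bidirectional: deciding whether a light first syllable is stressed
    # requires knowing that no heavy syllable occurs anywhere to its right.
    seen_heavy, out = False, []
    for i, w in enumerate(weights):
        if w == 'H':
            out.append(not seen_heavy)
            seen_heavy = True
        else:
            out.append(i == 0 and 'H' not in weights)  # rightward scan
    return out

print(stress_komi(list("LLL")))       # no heavies: last syllable stressed
print(stress_mongolian(list("LLL")))  # no heavies: first syllable stressed
```

The two functions agree whenever a heavy syllable is present; they diverge exactly on all-light words, where Komi falls back to the last syllable and Mongolian to the first.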


The results from Komi, Cheremis, Mongolian, and Mayan thus suggest an additional factor that is relevant for the determination of learnability, but that comes into play only in the case of QS patterns: how much scanning is required for aggregative information. This can be treated as a sixth index of computational complexity that takes on the values {0, 1, 2}. We therefore have the following factor, in addition to the five previously discussed:

(VI) Aggregative Information (Agg): has value 0 if no aggregative information is required (single-positional information suffices); value 1 if unidirectional scanning is required (Komi, Cheremis); and 2 if bidirectional scanning is required (Mongolian, Mayan).

With these modifications (viz., refinement of the IPS measure, and addition of the Aggregative measure), the same parameter scheme can be used for both the QI and QS language classes, with good learnability predictions within each class, as shown in Table 5.

We note, moreover, that learning times for the QS stress patterns of Koya and Eskimo fit right in with the analysis. The characterization of Koya in terms of our complexity measures is 000001, while that of Eskimo is 000010. Learning times for these systems, accordingly, fit in between the learning times for the QI systems of Latvian and French (complexity index 000000) and Maranungku and Weri (complexity index 000101).

As can also be seen, differences in learning times between QS stress patterns also fit in more generally with the analysis developed earlier, and with the analysis of single-positional vs. aggregative informational requirements developed in this section.
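As a rough illustration of how the complexity indexes order these patterns, the fragment below uses the four indexes quoted above and reads each index as a total cost by summing its six digits. The summing rule is our assumption, offered only to make the "fits in between" claim concrete; the paper itself does not commit to a single scalar reading.

```python
# Complexity indexes quoted in the text (our arrangement).
indexes = {
    "Latvian, French":  "000000",
    "Koya":             "000001",
    "Eskimo":           "000010",
    "Maranungku, Weri": "000101",
}

def cost(index):
    # One crude scalar reading of a six-digit complexity index:
    # the sum of its per-factor values.
    return sum(int(d) for d in index)

for name in sorted(indexes, key=lambda n: cost(indexes[n])):
    print(name, indexes[name], cost(indexes[name]))
```

Under this reading, Latvian/French (cost 0) come first, Koya and Eskimo tie (cost 1), and Maranungku/Weri (cost 2) come last, matching the observed ordering of learning times.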

Thus, both the QI and QS results fall into a single analysis within this generalized parameter scheme and weight-string representation, although with a less perfect fit than the within-class results. For example, the QI stress patterns of Lakota and Polish have higher complexity indexes than the QS stress patterns of Malayalam, Yapese, Ossetic, and Rotuman, but lower learning times. Quantity-sensitivity thus appears to affect learning times, as seems reasonable to expect, due to the distribution of the weight-string training set (see discussion in Section 2.2). However, no measure of its effect will be offered here. The analytical framework developed thus far appears to hold within QI languages, and within QS languages; further analysis would be needed to relate learning results across the two kinds of stress patterns.

Finally, it is worth noting that learning times for these stress patterns have also been obtained with a three-layer architecture. Differences between learning times for different stress patterns were somewhat reduced. However, learning times were ordered exactly as in the two-layer model results shown in Table 5, suggesting that these learning results are robust


with regard to architectural differences. In the discussion throughout this paper, however, we focus on results from the two-layer model only, since the two-layer model with a single output unit facilitates analysis of connection weights and their contribution to processing.
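To make the two-layer setup concrete, here is a minimal sketch of a single-output perceptron trained on weight-string windows. This is our reconstruction, not the authors' code: the window size, feature encoding, and training constants are illustrative choices. Each syllable position is presented as a window of (syllable-present, syllable-heavy) features, and the classic perceptron update is used; the target here is a Koya-like pattern (stress the first syllable and every heavy syllable).

```python
def features(weights, i, halfwin=1):
    # Encode a window of positions around syllable i as
    # (syllable-present, syllable-heavy) pairs; absent positions are (0, 0).
    x = []
    for j in range(i - halfwin, i + halfwin + 1):
        if 0 <= j < len(weights):
            x += [1.0, 1.0 if weights[j] == 'H' else 0.0]
        else:
            x += [0.0, 0.0]
    return x

def train(words, halfwin=1, lr=0.5, epochs=50):
    n = 2 * (2 * halfwin + 1)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for weights, stresses in words:
            for i, t in enumerate(stresses):
                x = features(weights, i, halfwin)
                y = 1.0 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0.0
                # Classic perceptron update on the single output unit.
                w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
                b += lr * (t - y)
    return w, b

def predict(w, b, weights, halfwin=1):
    return [1.0 if sum(wi * xi for wi, xi in
                       zip(w, features(weights, i, halfwin))) + b > 0 else 0.0
            for i in range(len(weights))]

# Koya-like targets: stress the first syllable and every heavy syllable.
data = [(ws, [1.0 if (i == 0 or s == 'H') else 0.0 for i, s in enumerate(ws)])
        for ws in ["LLLL", "LHLL", "HLHL", "LLHH"]]
w, b = train(data)
print(predict(w, b, "LHLL"))  # reproduces the training targets
```

Because this rule is linearly separable over the window features (stress iff the current syllable is heavy or the left-adjacent position is absent), the perceptron convergence theorem guarantees the training set is eventually classified perfectly, and the learned weights can then be inspected directly, which is the analytical convenience the single-output model offers.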

    3.3. Markedness and Learning Times:

    Correspondences with Metrical Theory

So far, we have used our perceptron learning results to devise an analytical framework for stress. It should be clear how our theory is analogous to conventional linguistic theory: it is an account of stress systems based on observation of the behavior of a perceptron exposed to those systems, in very much the same way that metrical theory is an account of linguistic stress, based on observations of what human beings learn. The stress pattern descriptions given to the perceptron are at the same level of abstraction as those that form the starting point for metrical theory. But since our account is based on learning time rather than surface forms and distributional data, we refer to it as a pseudo-linguistic theory.

In this section, we will discuss the ways in which our learning time results are in good agreement with some of the markedness predictions of metrical theory. That is, our results exhibit a correspondence with theoretical predictions, strengthening the analogy between our theory and metrical theory. As we will show, learning results such as these could also provide an additional source of data for choosing between theoretical alternatives.9

Within the dominant linguistic tradition, a universal grammar of stress should incorporate a theory of markedness, so as to predict which features of stress systems are at the core of the human language faculty and which are at the periphery. The distributional approach to markedness treats as unmarked those linguistic forms that occur more frequently in the world's languages. Another approach to markedness is learnability theory, which examines the logical process of language acquisition.10 Thus, for example, Dresher & Kaye take iteration to be the default or unmarked setting for parameter P11, because there is evidence that can cause revision of this default if it turns out to be the incorrect setting: the absence of any secondary stresses serves as a diagnostic that feet are not iterative (Dresher &

9 Nyberg's parameter-based model also provides such data, but in terms of how many exemplars of a stress pattern have to be presented to the stress learner (Nyberg, 1990).

10 As an example, the Subset Principle (Berwick, 1985; Wexler & Manzini, 1987) has implications for markedness. Suppose that two possible settings a and b for parameter P result in the learner respectively accepting sets Sa and Sb of linguistic forms. If Sa is a subset of Sb, then, once P has been set to value b, no positive evidence can ever re-set it to a, even if that was the correct setting. Unmarked values for parameters should therefore be the ones yielding the most constrained system.


TABLE 6
Learning Times for QI and QS Stress Patterns, Grouped by Theoretical Analysis

Group  Language           Characterization                               Epochs
1      Latvian, French    Word-tree, no feet                                  1
2      Koya               Word-tree, iterative unbounded QS feet              2
3      Eskimo             No word-tree, iterative unbounded QS feet           2
4      Maranungku, Weri   Word-tree, iterative binary QI feet                 3
5      Lakota, Polish     Word-tree, non-iterative binary QI feet            10
6      Malayalam, Yapese  Word-tree, non-iterative binary QS feet,           19
                          dominant node branches
7      Ossetic, Rotuman   Word-tree, non-iterative binary QS feet            29
8      Komi, Cheremis     Word-tree, non-iterative unbounded QS feet        214
9      Mongolian, Mayan   Word-tree, non-iterative unbounded QS feet,      2302
                          dominant node branches

Note. These learning times were obtained using weight-string input representations for both QI and QS patterns. Each figure is the average learning time, in epochs, for the languages in the group. [Some epoch values are reconstructed from the surrounding text; the printed column is garbled in this copy.]

Kaye, 1990, p. 191). If non-iteration were the default, their learning system might not encounter evidence that would enable it to correct this default setting, if it were in fact incorrect. It should be noted that, while this is a representative application of subset theory, the choice of default parameter values depends on the particular learning algorithm employed.

Table 6 shows the stress systems grouped by their theoretical analyses in terms of the parameter scheme discussed in Section 1.4. The last column of the table shows the average learning time in epochs for each group of stress patterns (these are the same learning times shown in Table 5). As can be seen, there appears to be a fairly systematic differentiation of learning times for groups of stress patterns with different clusters of parameter settings.

11 As noted in Section 2.2, weight-string representations are necessary for QS stress patterns. For QI systems, syllabic representations are sufficient. Of course, weight-string representations can be used for QI systems, although the information about syllable weight will be redundant. To obtain learning times that might reflect differences between the stress patterns themselves, rather than merely reflecting differing input representations and training set sizes, we ran simulations for both QS and QI patterns using weight-string input representations. The learning times in Table 6 are based on use of this weight-string representation for both QI and QS patterns.

12 The stress system of Garawa, as described in Hayes (1980, pp. N-55), cannot be characterized in terms of the parameter scheme adopted here. The stress systems of Paiute and Warao cannot be learned by a perceptron, as already discussed in Section 3.1, and as will be discussed in more detail in Section 4.2. (Recall, however, that they can be learned (a) using a two-layer architecture with two output units, (b) a three-layer architecture with hidden units, or (c) within the present perceptron architecture, if monosyllabic stress is not included.) Learning results for these three stress systems have therefore been excluded from the present analysis, as either the parameterized characterization or the learning times cannot be established.


Learning times appear to be significantly higher for stress systems in groups 5 through 9, which have non-iterative feet, than for those in groups 1 through 4, which either do not have metrical feet at all, or else have iterative feet. This makes the interesting prediction that non-iterative feet are more difficult to learn, and hence marked. This prediction corresponds with both Halle & Vergnaud's Exhaustivity Condition13 and with the choice of marked and unmarked settings in Dresher & Kaye's parameter scheme (Parameter P11).14

Comparison of learning times for group 1 vis-a-vis groups 2, 3 and 4 also suggests that a stress system with only a word-tree (i.e., with no metrical feet) is easier to learn than one with (iterative) metrical feet.

The dramatic difference in learning times between groups 8 and 9 suggests that it is marked for the dominant node to be obligatorily branching.15 Group 8 differs from group 9 only in not having obligatory branching, and average learning times were 214 epochs vs. 2302 epochs.

This prediction agrees with the distributional view that obligatory branching is relatively marked,16 but runs counter to Dresher & Kaye's choice of default values (parameter P7).17

However, comparison of group 6 with group 7 suggests that systems with obligatory branching may be more easily learned: group 6, with obligatory branching, has a learning time of 19 epochs, compared with group 7, without obligatory branching, but with a learning time of 29 epochs. This runs counter to the distributional argument, but agrees with the learnability view.

Two points are worth noting. First, it is interesting that where there is a conflict between the distributional and learnability theory predictions of markedness, there is also conflicting evidence from the perceptron simulation. Second, these conflicting perceptron results highlight the fact that it may be infeasible to analyze the effects of different settings for individual parameters; it may only be possible to make broader analyses of the effects of clusters of parameter settings. Strong interactions between parameters have also been observed in other computational learning models of metrical phonology (Eric Nyberg, personal communication, August, 1991).

13 "The rules of constituent boundary construction apply exhaustively..." (Halle & Vergnaud, 1987a, p. 15).

14 In Dresher & Kaye's model, iteration is the default or unmarked parameter setting because there is evidence that can cause revision of this default: the absence of any secondary stresses serves as a diagnostic that feet are not iterative (Dresher & Kaye, 1990, p. 191).

15 This means that the strong branch of a foot must dominate a heavy syllable, and cannot dominate a light one.

16 Thus, Hayes (1980, p. 113): "the maximally unmarked labeling convention is that which makes all dominant nodes strong... the convention that wins second place is: label dominant nodes as strong if and only if they branch."

17 Obligatory branching is the default because evidence (the presence of any stressed light syllables that do not receive stress from the word-tree) can force its revision (Dresher & Kaye, 1990, p. 193).


In view of the greater differential in learning times between Groups 8 and 9 than between Groups 6 and 7, we conclude that the effect of obligatory branching is to increase learning time. That is, we view our learning results as supporting the markedness of obligatory branching. This raises the interesting possibility that learning results such as those from the present perceptron simulations can provide a new source of insight into questions of markedness. The distributional view of the markedness of obligatory branching (Hayes, 1980, p. 113) seems to conflict with the learnability view (Dresher & Kaye, 1990, p. 193). The present simulations seem to agree with the distributional view, and could perhaps serve as a fresh source of evidence.

This potential contribution to theoretical analysis can be further illustrated for the stress systems of Lakota and Polish, which are mirror images. Recall that in Lakota, primary stress falls on the second syllable of the word. The analysis so far adopted for Lakota is that it has non-iterative binary right-dominant QI feet constructed from left to right, with a left-dominant word-tree.18 Let us call this Analysis A. As illustrated in Figure 3, this leads to the construction of one binary right-dominant QI foot at the left edge of the word. This, together with the left-dominant word-tree, results in the assignment of primary stress to the second syllable. As has been shown, under this analysis the perceptron learning results support the markedness of non-iteration (recall the differing learning times of Groups 1 through 4 vs. Groups 5 through 9).

However, an alternative analysis is that Lakota has a left-dominant word-tree with no metrical feet, and the first syllable is extrametrical (Dresher & Kaye, 1990, p. 143). Let us call this Analysis B. As illustrated in Figure 3, the leftmost syllable is treated as "invisible" to the stress rules, and the word-tree assigns primary stress to the leftmost of the "visible" syllables. The result is that the second syllable receives primary stress. Under this analysis, Lakota and Polish (Group 5 in Table 6) differ from Latvian and French (Group 1 in Table 6) only in having an extrametrical syllable. The differing learning times for the two groups (1 epoch vs. 10 epochs) then suggest that extrametricality is marked. However, this runs counter to both the distributional view (Hayes, 1980, p. 82)19 and the learnability theory view (Dresher & Kaye, 1990, pp. 189, 191).20

18 This is based on Hayes' analysis of penultimate stress (Hayes, 1980, p. 55).

19 Hayes argues for the importance of the device of extrametricality to the theory because of its role in accounting for a variety of stress phenomena. Thus, extrametricality is unmarked in the sense previously referred to: that of being attested in a fair variety of languages.

20 In Dresher & Kaye's formulation, the presence of stress on a leftmost or rightmost syllable can rule out extrametricality. However, there is no positive cue that unambiguously determines the presence of extrametricality. Hence, the default value for parameter P8 is that there is extrametricality (PE[yes]). If the default were PE[no], but this was an incorrect setting, there would be no cue that could lead to detection that this is incorrect.


[Figure 3 diagram: metrical tree structures for ANALYSIS A (word-tree and foot over the syllable string) and ANALYSIS B (word-tree over the visible syllables, with the first syllable extrametrical). The tree graphics are not reproducible in this copy; see the caption.]

Figure 3. Two metrical analyses for a four-syllable word in Lakota. Strong branches are labeled S, and weak branches W. Analysis A: construction of one binary right-dominant QI foot at the left edge of the word, together with a left-dominant word-tree, results in the assignment of primary stress to the second syllable. Analysis B: the leftmost syllable is treated as invisible to the stress rules (extrametrical), and the word-tree assigns primary stress to the leftmost of the visible syllables. The result is that the second syllable receives primary stress.

To summarize, Analysis A views Lakota and Polish as having non-iterative feet, which both the distributional/theoretical and learnability approaches treat as marked. Analysis B views these stress patterns as having an extrametrical syllable, which both approaches treat as unmarked. So far, there is nothing theory-external to help choose between the analyses. Simulation results such as these provide such a means: since the learning results are consistent with the theoretical markedness of non-iteration, but not with the unmarkedness of extrametricality, they provide at least weak support for preferring Analysis A over Analysis B.
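The theory-internal nature of the ambiguity is easy to see in a sketch (our illustration, with function names of our choosing): for words of any length, both analyses assign primary stress to the same syllable, so surface forms alone cannot decide between them.

```python
def lakota_analysis_a(n):
    # Analysis A: one binary right-dominant QI foot at the left edge plus a
    # left-dominant word-tree; the foot's strong (right) branch, i.e. the
    # second syllable, receives primary stress.
    return [i == 1 for i in range(n)]

def lakota_analysis_b(n):
    # Analysis B: the first syllable is extrametrical; the left-dominant
    # word-tree stresses the leftmost visible syllable, again syllable 2.
    return [i == min(range(1, n)) for i in range(n)]

# Identical surface stress assignments for words of 2 to 6 syllables.
for n in range(2, 7):
    assert lakota_analysis_a(n) == lakota_analysis_b(n)
print(lakota_analysis_a(4))  # [False, True, False, False]
```

It is precisely because the two analyses are surface-equivalent that an external criterion, such as the learning-time evidence above, is needed to choose between them.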

4. LOWER LEVEL PROCESSING: THE CAUSAL ACCOUNT

In the previous section, we developed our abstract "pseudo-linguistic" theory of the learning and assignment of stress by a perceptron. This high-level account was based on observation of perceptron learning results, without knowledge of processing mechanisms within the perceptron.


    In this section, we open up the black box of the perceptron and ex-

    a