Weighting the Constraints on Word-order
Variation in German
Markus Bader & Jana Haussler
University of Konstanz
QITL 2 - Osnabruck 2006 – p. 1
Introduction
The expression of focus – syntax vs. phonology
Sentence-focus (Context: What happened?):
Rightmost stress and canonical word-order in English and Italian
(1) a. English: [F John has LAUGHED]
    b. Italian: [F Gianni ha RISO]
Subject-focus (Context: Who has laughed?):
English retains canonical word-order, Italian retains rightmost stress
(2) a. English: [F JOHN] has laughed
    b. Italian: Ha riso [F GIANNI]
Introduction
Sentence-focus (Context: What happened?):
Rightmost stress and canonical word-order
(3) [F Jan hat GELACHT]
    John has laughed
Subject-focus (Context: Who has laughed?):
Either canonical word-order is retained, or rightmost stress
(4) a. [F JAN] hat gelacht
       John has laughed
    b. Gelacht hat [F JAN]
       laughed has John
Introduction
Strategies to express focus
English: fixed word-order and flexible focus assignment
Italian: flexible syntax and fixed focus structure
German: flexible syntax and flexible focus assignment
Flexibility in German includes the ordering between subject and object:
Who called the minister?
(5) a. [F Der VATER] hat den Pfarrer angerufen
       the father-NOM has the minister-ACC called
    b. Den Pfarrer hat [F der VATER] angerufen
       the minister-ACC has the father-NOM called
Introduction
Questions of the current study
What are the relevant factors determining the order between subject and object in German? (Are all factors suggested in the literature indeed necessary to determine word-order in German?)
What is the weight of each of these factors?
Do the factors (or their weight) differ for embedded vs. main clauses?
Introduction
Outline of the talk
The grammar of word-order variation
Previous corpus studies of word-order variation in German
The current corpus study
A comparison with results from language comprehension
General discussion
The grammar of word-order variation
‘Word-order’ freedom involving the prefield (SpecCP):
(Only possible in main clauses)
(6) a. Der Vater hat den Pfarrer besucht
       the father-NOM has the minister-ACC visited
    b. Den Pfarrer hat der Vater besucht
       the minister-ACC has the father-NOM visited
The grammar of word-order variation
‘Word-order’ freedom involving the middlefield
(between C° and the clause-final verb(s)):
(7) a. Sicher hat der Vater den Pfarrer besucht
       Surely has the father-NOM the minister-ACC visited
    b. Sicher hat den Pfarrer der Vater besucht
       Surely has the minister-ACC the father-NOM visited
(8) a. dass der Vater den Pfarrer besucht hat
       that the father-NOM the minister-ACC visited has
    b. dass den Pfarrer der Vater besucht hat
       that the minister-ACC the father-NOM visited has
The grammar of word-order variation
For German, SO is considered to be the canonical word-order.
Two sources have been proposed for using OS:
Information structure:
  In order to have focus in its default pre-verb position, a non-focused object may be moved to the left.
  In order to have the topic in its preferred initial position, an object in topic function might be moved to the left.
Argument structure: Particular verbs (e.g. psych-verbs, unaccusative verbs) license the use of OS.
The grammar of word-order variation
Factors that have been suggested to affect word-order
discourse-related properties:
  definiteness: definite < indefinite
  focus/background
  topichood: topic first
semantic properties:
  agency: agent < non-agent
  animacy: animate < inanimate
length: favoring OS when the object is shorter than the subject
Previous corpus studies
Hoberg (1981): Base word order
(N–D–A)_pron – ((N–D–A)_+ani – (N–D–A)_−ani)_nom – (N,D,A)_FN
pronominal argument < non-pronominal argument
ordering of pronouns: NOM < ACC < DAT
non-pronominal arguments: animate < inanimate
Semantically opaque arguments are adjacent to the verb (‘Funktionsverbgefüge’, i.e. light-verb constructions)
The current corpus study
Data base
Newspaper corpus of the IDS (Mannheim)
We queried for den (object introduced by the definite article den) with further restrictions (see below).
Motivation for den: combination of corpus and comprehension studies
Focus of comprehension studies: Syntactic ambiguity resolution
Depending on the noun, objects with den are ambiguous between dative and accusative
The current corpus study
Sentences were randomly sampled by the COSMAS system
Four sets: 2 clause types x 2 positional restrictions
  dass (...) den: embedded clauses, unconstrained with respect to the position of the object
  dass den: embedded clauses, object-initial (immediately following the complementizer)
  den: main clauses, unconstrained with respect to the position of the object
  Den: main clauses, object-initial
Sentences in which den was not a verbal argument were subsequently removed (e.g. den within PPs)
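The four query sets can be pictured as a clause type plus an object-initial flag. The following is only a rough sketch in Python with plain regular expressions; it is not COSMAS query syntax, the function name `classify` is hypothetical, and it does not attempt the later manual step of removing cases where den is not a verbal argument.

```python
import re

# Hypothetical patterns, assumed for illustration only (not COSMAS syntax).
# \b prevents matches inside longer words such as "denn" or "denen".
DEN = re.compile(r"\bden\b", re.IGNORECASE)
DASS_BEFORE_DEN = re.compile(r"\bdass\b.*?\bden\b")
DASS_DEN = re.compile(r"\bdass den\b")


def classify(sentence):
    """Return (clause_type, object_initial), or None if the sentence has no 'den'."""
    if not DEN.search(sentence):
        return None
    if DASS_BEFORE_DEN.search(sentence):
        # Embedded clause; object-initial if 'den' directly follows 'dass'.
        return ("embedded", bool(DASS_DEN.search(sentence)))
    # Main clause; object-initial if the sentence starts with 'Den'.
    return ("main", sentence.startswith("Den "))


for s in [
    "dass der Vater den Pfarrer besucht hat",
    "dass den Pfarrer der Vater besucht hat",
    "Der Vater hat den Pfarrer besucht",
    "Den Pfarrer hat der Vater besucht",
]:
    print(classify(s), s)
```

The object-initial sets are subsets of the unconstrained ones, which is why the sketch reports a flag rather than four mutually exclusive labels.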
The current corpus study
The sentence sets were annotated for the following properties:
case
voice
animacy
definiteness
pronominality
length of subject and object (number of words)
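One possible way to hold such annotations is a small record per sentence. The sketch below is an assumption about layout, not the authors' actual annotation format: the class and field names are hypothetical, and recording animacy, definiteness and pronominality for both arguments is a guess. Only the length difference "object minus subject" follows the definition used later in the talk.

```python
from dataclasses import dataclass


@dataclass
class Argument:
    animate: bool
    definite: bool
    pronominal: bool
    length: int  # number of words


@dataclass
class AnnotatedSentence:
    object_case: str  # "ACC" or "DAT"
    voice: str        # "active" or "passive"
    subject: Argument
    obj: Argument

    @property
    def delta_length(self) -> int:
        """Length difference in words, object minus subject."""
        return self.obj.length - self.subject.length


# Example (8a): dass der Vater den Pfarrer besucht hat
example = AnnotatedSentence(
    object_case="ACC",
    voice="active",
    subject=Argument(animate=True, definite=True, pronominal=False, length=2),
    obj=Argument(animate=True, definite=True, pronominal=False, length=2),
)
print(example.delta_length)
```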
The current corpus study
Table 1: Overview of embedded clauses
               Unconstrained   Object-initial
Total               1178            930
S=Nominal            835            838
S=Pronominal         335              0
no S                   8             91
1 NP Arg               8             91
2 NP Args           1019            828
3 NP Args            151             11
4 NP Args              0              0
The current corpus study
Table 2: Overview of main clauses
               Unconstrained   Object-initial
Total                668            827
S=Nominal            518            602
S=Pronominal         146            198
no S                   4             27
1 NP Arg               4             27
2 NP Args            559            719
3 NP Args            104             80
4 NP Args              1              1
The current corpus study
The position of subject pronouns:
Embedded clauses: In our data, a pronominal subject precedes the non-pronominal den-object without exception.
  This confirms earlier observations of a categorical constraint “S[+pron] < O[−pron]” in the middlefield.
Main clauses: Both orderings are attested, but when the object precedes a pronominal subject, the object is always in the prefield.
  This is expected given that the prefield and the middlefield-initial position have different syntactic characteristics.
Focus of this Presentation
Factors determining word-order in German
Comparison of word-order in the middlefield and word-order involving the prefield
  Word-order variation in the middlefield: corpora containing embedded clauses (possible differences between the middlefield in main and embedded clauses won’t be discussed)
  Word-order variation involving the prefield: corpora containing main clauses, with all sentences in which both subject and object are in the middlefield removed
Note: In the following, we will present data for sentences with a non-pronominal subject only.
Word-order: Basic Results
Table 3: Percentages of SO-sentences by case and nr. of arguments
           Middlefield (emb.)       Prefield (main)
           Accusative   Dative      Accusative   Dative
2 Args.      99.25      50.93          89.2       49.2
3 Args.      98.65      96.10          87.5       94.3
Word-order: Basic Results
Table 4: Percentages of accusative-sentences by order and nr. of arguments
           Middlefield (emb.)        Prefield (main)
           Sub>Obj   Obj>Sub         Sub>Obj   Obj>Sub
2 Args.     88.0      5.4 (6.4)       91.6     56.2 (76.0)
3 Args.     49.7     25.0 (0)         48.5     71.4 (67.0)
Word-order: Basic Results
Summary for 2-argument sentences:
OS-order in the middlefield occurs mainly with dative objects
For dative objects, there is no difference between middlefield and prefield
For accusative objects, prefield OS is much more frequent than middlefield OS
Summary for 3-argument sentences:
Sentences with three arguments were almost always realized with SO-order.
Lexical-Conceptual Factors
Animacy in accusative sentences
[Figure: Distribution of Animacy Features. Percentage of sentences (0 to 100) for S[+an]O[+an], S[+an]O[−an], S[−an]O[+an], S[−an]O[−an]; legend: SO Middlefield, SO Prefield, OS Middlefield, OS Prefield.]
Lexical-Conceptual Factors
Animacy in dative sentences
[Figure: Distribution of Animacy Features. Percentage of sentences (0 to 100) for S[+an]O[+an], S[+an]O[−an], S[−an]O[+an], S[−an]O[−an]; legend: SO Middlefield, SO Prefield, OS Middlefield, OS Prefield.]
Lexical-Conceptual Factors
Animacy in OS sentences
[Figure: Distribution of Animacy Features. Percentage of sentences (0 to 100) for S[+an]O[+an], S[+an]O[−an], S[−an]O[+an], S[−an]O[−an]; legend: Acc Middle, Acc Pre, Dat Middle, Dat Pre.]
Lexical-Conceptual Factors
Summary:
In the middlefield: OS-order occurs mainly with S[−animate] and O[+animate]
Animate subjects occur mainly in the order SO.
For dative objects, middlefield OS and prefield OS pattern together
For accusative objects, prefield OS and middlefield SO pattern together
Verb-Related Factors
What kind of constructions occur with inanimate subject and animate object?
Passivized ditransitive verbs (cf. (9)) → dative object
Unaccusative verbs (cf. (10)) → dative object
(9) . . . dass dem Kind ein Bär geschenkt wurde
    that the child-DAT a bear-NOM given was
(10) . . . dass dem Kind ein Witz eingefallen ist
     that the child-DAT a joke-NOM come-to-mind is
Verb-Related Factors
What kind of constructions occur with inanimate subject and animate object?
Object-experiencer verbs take either an accusative (cf. (11)) or a dative object (cf. (12)).
(11) . . . dass das Kind der Witz gelangweilt hat
     that the child-ACC the joke-NOM bored has
(12) . . . dass dem Kind der Witz gefallen hat
     that the child-DAT the joke-NOM pleased has
Verb-Related Factors
Table 5: Percentages of passive usage and sein-auxiliary by order
           Middlefield (emb.)        Prefield (main)
           Sub>Obj   Obj>Sub         Sub>Obj   Obj>Sub
% Pass.      9.1     46.7 (41.0)      18.0     21.0 (37.0)
% ‘sein’    27.1     69.6 (66.7)      27.8     33.3 (43.9)
Verb-Related Factors
Summary:
Passivized ditransitive verbs and unaccusative verbs account for a substantial amount of OS sentences with dative objects.
These constructions are not compatible with accusative case. This accounts for the finding that OS-order in the middlefield is rare with accusative objects.
Topics under current investigation:
Can the factor ‘animacy’ and the verb-related factors (passivization, unaccusativity) be reduced to the lexical semantics of verbs?
Constituent Length
Question: Does constituent weight affect the ordering between subject and object?
So far, we have only computed weight in terms of constituent length measured by number of words.
Phenomena like extraposition have not yet been taken into account.
Constituent Length
Table 6: Mean difference between length of object and length of subject (measured in words)
              Middlefield (emb.)     Prefield (main)
              Sub>Obj   Obj>Sub      Sub>Obj   Obj>Sub
Accusative      0.84     -0.45        -0.02      0.34
Dative          0.48      0.05        -0.45      0.02
Constituent Length
[Figure: Two density plots of the difference in length (O−S), one for middlefield SO sentences and one for middlefield OS sentences; x-axis from −10 to 25, y-axis: density.]
Constituent Length
[Figure: Two density plots of the difference in length (O−S), one for prefield SO sentences and one for prefield OS sentences; x-axis from −40 to 20, y-axis: density.]
Constituent Length
Table 7: Mean length of subject (measured in words)
              Middlefield (emb.)     Prefield (main)
              Sub>Obj   Obj>Sub      Sub>Obj   Obj>Sub
Accusative      2.67      3.91         3.94      4.34
Dative          2.82      3.16         3.32      4.43
Constituent Length
Table 8: Mean length of object (measured in words)
              Middlefield (emb.)     Prefield (main)
              Sub>Obj   Obj>Sub      Sub>Obj   Obj>Sub
Accusative      3.51      3.45         3.92      4.68
Dative          3.3       3.21         2.86      4.45
Constituent Length
[Figure: Two density plots of object length, one for middlefield OS sentences and one for prefield OS sentences; x-axis from 0 to 30 words, y-axis: density (0.0 to 0.6).]
Constituent Length
Summary:
In the middlefield, there is a slight tendency for initial subjects to be shorter than non-initial subjects, but no effect for objects.
For prefield sentences, no clear tendency is visible. Subjects are somewhat shorter when in the prefield, but the reverse is true for objects.
Overall, there is a tendency for “short before long” in the middlefield but “long before short” when S or O is in the prefield.
A Logistic Regression Model
A logistic regression model with the following factors was fitted to the sentence set “embedded clauses with order unconstrained”:
object case              accusative or dative
animacy of subject       animate or inanimate
animacy of object        animate or inanimate
voice                    active or passive
perfect auxiliary        ‘haben’ (have) or ‘sein’ (be)
determiner of subject    indefinite or definite
nr of arguments          2 or 3
∆ length                 (O minus S)
A Logistic Regression Model
                        Estimate   Pr(>|z|)
(Intercept)              0.7332    0.59955
object case = DAT       -2.1063    3.4e-05   ***
sAni = inanimate        -1.8847    1.1e-05   ***
oAni = inanimate         1.9801    7.8e-06   ***
voice = passive         -1.3986    0.00842   **
aux = ‘sein’ (be)       -1.4053    0.00067   ***
sDet = definite          1.5183    5.0e-05   ***
nr of arguments          1.2614    0.03964   *
∆ length (O minus S)    -0.0443    0.42250
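To see how such estimates turn into a prediction, the sketch below plugs them into the logistic function. Several things are assumptions rather than facts from the slides: that the modelled outcome is SO, that each categorical factor is coded 0/1 with the unlisted level (accusative, animate, active, ‘haben’, indefinite) as reference, and that "nr of arguments" is coded 0 for two and 1 for three arguments. The feature names are hypothetical, and the 0.5 cut-off follows the classification rule reported with the model results.

```python
import math

# Estimates copied from the table; key names are hypothetical labels.
COEF = {
    "intercept": 0.7332,
    "dative": -2.1063,
    "subj_inanimate": -1.8847,
    "obj_inanimate": 1.9801,
    "passive": -1.3986,
    "aux_sein": -1.4053,
    "subj_definite": 1.5183,
    "three_args": 1.2614,    # assumed coding: 0 = two arguments, 1 = three
    "delta_length": -0.0443,  # object length minus subject length, in words
}


def prob_so(**features):
    """Probability of SO order under the assumed 0/1 coding (omitted features = 0)."""
    logit = COEF["intercept"]
    for name, value in features.items():
        logit += COEF[name] * value
    return 1.0 / (1.0 + math.exp(-logit))


def predict(**features):
    """Apply the 0.5 cut-off: p >= 0.5 counts as SO, otherwise OS."""
    return "SO" if prob_so(**features) >= 0.5 else "OS"


# Reference levels only: accusative, animate S and O, active, 'haben'.
print(predict())
# Passivized ditransitive with dative object and inanimate subject.
print(predict(dative=1, subj_inanimate=1, passive=1, aux_sein=1))
```

Under these assumptions, the negative estimates for dative case, inanimate subjects, passive voice and ‘sein’ all push the prediction toward OS, which matches the direction of the corpus findings reported earlier.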
A Logistic Regression Model
The sentence set contains 86% SO-sentences
The model correctly classifies 96% of all sentences (with p < 0.5 taken as OS and p ≥ 0.5 as SO).
Correctness broken down by order and case:
        SO      OS
ACC    1.00    0.0
DAT    0.85    0.9
When applied to the set ‘embedded clauses with OS-sentences only’, the model correctly classified 80% of all sentences (0% ACC, 86% DAT).
When applied to the set ‘main clauses with OS-sentences only’, the model correctly classified 15% of all sentences (0% ACC, 61% DAT).
A Note on Comprehension
Method:
Procedure: Speeded grammaticality judgments
Material: Ambiguous and unambiguous sentences of various syntactic structures with den-objects. Only results for unambiguous embedded clauses are shown.
Table 9: Selected Results (% correct) from experiments with den-objects
              S[+an]/O[+an]          S[-an]/O[+an]
              Sub>Obj   Obj>Sub      Sub>Obj   Obj>Sub
Accusative      90        70           93        95
Dative          93        86           90        94
A Note on Comprehension
For comparison: Difficult garden-path sentences like (13) receive mean judgments of about 50%.
(13) . . . dass Maria die Kinder besucht haben
     that Maria-ACC the children-NOM visited have
Conclusion:
Even the rarest kinds of OS-sentences are clearly comprehensible, as shown in particular by the comparison to garden-path sentences.
To some degree, the rarity of OS-structures is reflected in the comprehension results (in particular, ACC with S[+an]/O[+an])
Summary
Our corpus results are compatible with a rather conventional syntactic analysis which claims . . .
The grammar can base-generate both SO- and OS-structures in the middlefield.
The particular order is determined by argument structure properties of the verb.
The prefield has to be filled by movement.
Movement allows the deviation from the base-generated order.
  Movement within the middlefield (‘scrambling’)
  Fronting to SpecCP
Summary
The source of OS in our corpus
In the middlefield, OS is by and large restricted to base-generation. Scrambling is rare (although people clearly can comprehend scrambled sentences).
OS with O in the prefield has two sources:
  An OS structure is base-generated in the middlefield and O, as the highest argument, is fronted to SpecCP by default.
  An SO structure is base-generated in the middlefield and O is fronted to SpecCP for discourse reasons.