Kriq 75, August 18/19, 2014 1/47
Peter Grzybek
www.peter-grzybek.eu
Estonian Proverbs:
Searching for regularities
How long is a proverb?
How long are words in proverbs?
Does word length depend on proverb length?
Is word length independent of within-text position?
How to measure the length of linguistic units and entities?
Memo: "There are no positive facts in language." (Saussure)
There is always more than one definition.
I. Define the entity you want to measure. If you want to measure sentence length, define 'sentence'. If you want to measure word length, define 'word'.
II. Determine the measuring units in which you want to measure.
E.g., sentence length: number of clauses, phrases, words, syllables, morphemes, …?
E.g., word length: number of syllables, morphemes, letters, graphemes, or phonemes, …?
III. Define the measuring units: define 'clause', 'phrase', 'syllable', 'morpheme', 'phoneme', 'grapheme', 'letter', …
Rule in Quantitative Linguistics:
Take direct constituents as measuring units.
How long are proverbs?
Sentence length: one proverb = one sentence
How many XY per proverb?
clauses, phrases
  + syntactic analysis
  - dependent on syntactic theory; reduced number of clauses/phrases in proverbs (lack of variation)
words, stems
  + sufficient variation
  - dependent on lexical theory (orthographic word, phonological word, etc.)
syllables, morphemes
  + lexical analysis; rhythmic structure
  - dependent on morphology and phonotactics; high degree of variation
In agglutinative languages …
… stems do not change, affixes do not fuse with other affixes, and affixes do not change form conditioned by other affixes.
Orthographic problems:
Mother-in-law: isn't that a problem?
В этом доме. ['in this house']
в кратцу - вкратце
Phonological word (tact group):
Ná mostu. ['on the bridge']
How long are words?
How many XY per word?
letters, graphemes
  + easy (automatic) analysis
  - high degree of alphabetic arbitrariness; high degree of variation
phonemes
  + better linguistic justification
  - dependent on phonological theory; high degree of variation; neglect of quantity
syllables, morphemes
  + lexical analysis; rhythmic structure
  - high degree of variation
Estonian phonemes:
Three degrees of phonemic length (consonants and vowels)
[o] (short o): koli = 'rubbish'
[oˈ] (long o): kooli = 'school'
[oː] (extra long o): kooli" = 'to school'
Decisions / Definitions (in accordance with Kriq 1967)
Linguistic unit: Definition
Sentence: one proverb; length = number of words / stems
Word: orthographic word; length = number of syllables
Üks riisub rihaga, teine pühib luuaga. (EV 15016)
['One rakes with the rake, the other sweeps with the broom.']
Wo:6 – St:6 – Sy:13 (Wo = words, St = stems, Sy = syllables)
Üks rii-sub ri-ha-ga, tei-ne pü-hib luu-a-ga.
Isi puu, isi puuke. (EV 2245)
['The one is the tree, the other is the little tree.']
Wo:4 – St:4 – Sy:7
I-si puu, i-si puu-ke.
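The word and syllable counts above can be reproduced mechanically once a proverb has been syllabified by hand. The following Python sketch assumes that syllable boundaries are marked with hyphens (as in the two examples) and that the text contains no orthographic hyphens; stem counts (St) are not computed, since they would require a morphological analysis. The function name `length_profile` is hypothetical.

```python
# Assumption: syllable boundaries are marked with "-" and the text contains no
# orthographic hyphens; punctuation attached to words is stripped.
PUNCTUATION = ".,;:!?"


def length_profile(syllabified):
    """Return (number of words, number of syllables) for a syllabified proverb."""
    words = [w.strip(PUNCTUATION) for w in syllabified.split()]
    words = [w for w in words if w]  # drop tokens that were pure punctuation
    syllables = sum(len(w.split("-")) for w in words)
    return len(words), syllables


examples = {
    "EV 15016": "Üks rii-sub ri-ha-ga, tei-ne pü-hib luu-a-ga.",
    "EV 2245": "I-si puu, i-si puu-ke.",
}

for ref, text in examples.items():
    wo, sy = length_profile(text)
    print(f"{ref}: Wo:{wo} Sy:{sy}")  # EV 15016: Wo:6 Sy:13 / EV 2245: Wo:4 Sy:7
```

Orthographic words are simply whitespace-separated tokens here, which matches the decision to count orthographic words.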
[Figure: frequencies of proverb length (3–21 words)]
Length  Frequency
3       93
4       619
5       518
6       710
7       511
8       518
9       218
10      175
11      81
12      62
13      40
14      13
15      5
16      5
17      2
18      4
19      0
20      1
21      1
Erna Normann (1955), Valimik eesti vanasõnu
3576 proverbs, ca. end of the 19th / early 20th century
Frequencies by proverb length (words)
Length  Normann  Old  New
3       93       21   26
4       619      103  89
5       518      80   52
6       710      129  39
7       511      86   17
8       518      96   44
9       218      39   12
10      175      21   9
11      81       21   1
12      62       14   5
13      40       4    0
14      13       3    1
15      5        0    0
16      5        1    0
17      2        0
18      4        1
19      0
20      1
21      1
Total   3576     618  294
[Figure: relative frequency of proverb length (2–22 words) for the Contemporary, Normann, and Old samples]
Bimodal distributions: additional peaks (6 / 8)
Comparisons: Old (17th/18th century) and Contemporary
Question: Does the word-stem distinction explain the bi-modality?
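To make the bimodality concrete, the following sketch computes relative frequencies for the Normann sample (counts taken from the proverb-length table on the previous slide) and lists the lengths whose frequency exceeds both neighbours. The cut-off `min_count=50`, used to ignore the sparse tail, is an arbitrary assumption, and the helper names are hypothetical.

```python
# Normann (1955) sample: proverb length in words -> number of proverbs
normann = {3: 93, 4: 619, 5: 518, 6: 710, 7: 511, 8: 518, 9: 218, 10: 175,
           11: 81, 12: 62, 13: 40, 14: 13, 15: 5, 16: 5, 17: 2, 18: 4,
           19: 0, 20: 1, 21: 1}


def relative_frequencies(freqs):
    """Frequencies divided by the sample size, ordered by length."""
    n = sum(freqs.values())
    return {x: f / n for x, f in sorted(freqs.items())}


def local_peaks(freqs, min_count=50):
    """Lengths whose frequency is at least min_count and exceeds both neighbours."""
    xs = sorted(freqs)
    return [
        xs[i]
        for i in range(1, len(xs) - 1)
        if freqs[xs[i]] >= min_count
        and freqs[xs[i]] > freqs[xs[i - 1]]
        and freqs[xs[i]] > freqs[xs[i + 1]]
    ]


rel = relative_frequencies(normann)
print(sum(normann.values()))  # 3576
print(round(rel[6], 3))       # share of six-word proverbs
print(local_peaks(normann))   # [4, 6, 8]
```

For the Normann data the sketch reports local maxima at 4, 6 and 8 words, i.e., the distribution is not a smooth single-peaked curve.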
Eesti vanasõnad (12921 proverbs)
[Figure, three panels: number of proverbs by stems per proverb; number of proverbs by (orthographic) words per proverb; stems per proverb vs. words per proverb]
Words vs. stems: linear relation!
Concentration on words
Some in-between conclusions
1. Bi-modality seems to originate in the characteristics of the proverb material; this phenomenon needs more detailed study.
2. It seems reasonable to assume that the overall picture results from differences between syntactically different proverbs: e.g., "simple" (unipartite proverbs without hypotaxis) vs. "complex" (n-partite proverbs with hypotaxis).
3. As long as we do not have the relevant data available, pooling the data seems an appropriate procedure: it makes the forest visible before the trees.
[Figure: number of proverbs by (orthographic) words per proverb]
Pooling data: Intervals
2-3, 4-5, 6-7,…
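Pooling adjacent length classes is a simple regrouping of the frequencies. The sketch below applies the intervals 2-3, 4-5, 6-7, … to the Normann counts listed earlier; the slide applies pooling to the Eesti vanasõnad data, which are not tabulated here, so the choice of data set is an illustrative assumption, as is the helper name `pool_pairs`.

```python
# Normann (1955) sample: proverb length in words -> number of proverbs
normann = {3: 93, 4: 619, 5: 518, 6: 710, 7: 511, 8: 518, 9: 218, 10: 175,
           11: 81, 12: 62, 13: 40, 14: 13, 15: 5, 16: 5, 17: 2, 18: 4,
           19: 0, 20: 1, 21: 1}


def pool_pairs(freqs):
    """Pool length classes into the intervals 2-3, 4-5, 6-7, ..."""
    pooled = {}
    for x, f in freqs.items():
        lower = 2 * (x // 2)  # 3 -> 2, 4 -> 4, 5 -> 4, 6 -> 6, ...
        key = (lower, lower + 1)
        pooled[key] = pooled.get(key, 0) + f
    return dict(sorted(pooled.items()))


pooled = pool_pairs(normann)
for (lo, hi), f in pooled.items():
    print(f"{lo}-{hi}: {f}")
```

With the Normann counts, the pooled frequencies rise to a single maximum in the 6-7 class and then decrease steadily, so the secondary peak at 8 words is absorbed.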
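Is there a way to find a theoretical model for sentence length frequencies?
Assumptions:
1. The distribution of length is organized in a law-like manner.
2. It is sufficient to make assumptions about the relative difference D of two neighboring frequencies (probabilities):
$D = \frac{P_x - P_{x-1}}{P_{x-1}}$, i.e., $P_x = (1 + D)\,P_{x-1}$
Which factors influence D?
a: language-specific factors
b: production-specific factors
c: norming forces
d: level-specific factors (words vs. phrases)
$D = \frac{a + bx}{cx + d}$, hence $P_x = \left(1 + \frac{a + bx}{cx + d}\right) P_{x-1}$
With $q = \frac{b + c}{c}$, $k = 1 + \frac{a + d}{b + c}$ and $m = 1 + \frac{d}{c}$ this becomes
$P_x = \frac{k + x - 1}{m + x - 1}\, q\, P_{x-1}$, with the solution
$P_x = \frac{\binom{k+x-1}{x}}{\binom{m+x-1}{x}}\, q^x P_0, \quad x = 0, 1, 2, \ldots$
Hyperpascal distribution (Beta-binomial d.)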
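The recurrence can also be evaluated numerically. The sketch below builds hyperpascal probabilities directly from $P_x = \frac{k+x-1}{m+x-1}\,q\,P_{x-1}$ and normalises them over a finite range. A second helper maps the factors a, b, c, d onto (k, m, q) via $q = (b+c)/c$, $k = 1 + (a+d)/(b+c)$, $m = 1 + d/c$, which follows from writing $1 + D$ as the ratio $\frac{(b+c)x + a + d}{cx + d}$. It assumes 0 < q < 1 and applies no displacement (the slides do not state whether the fitted model was shifted to a minimal proverb length); both function names are hypothetical.

```python
def hyperpascal_pmf(k, m, q, x_max=200):
    """Hyperpascal probabilities P_0, ..., P_x_max.

    Built from the recurrence P_x = (k + x - 1) / (m + x - 1) * q * P_{x-1}
    and normalised numerically over 0..x_max (assumes 0 < q < 1, k > 0, m > 0,
    so that the tail beyond x_max is negligible).
    """
    weights = [1.0]  # unnormalised P_0
    for x in range(1, x_max + 1):
        weights.append(weights[-1] * (k + x - 1) / (m + x - 1) * q)
    total = sum(weights)
    return [w / total for w in weights]


def params_from_factors(a, b, c, d):
    """Map D = (a + b x) / (c x + d) onto hyperpascal parameters (k, m, q)."""
    q = (b + c) / c
    k = 1 + (a + d) / (b + c)
    m = 1 + d / c
    return k, m, q


# Parameter values as reported for Eesti vanasõnad (no displacement applied here)
probs = hyperpascal_pmf(k=1.21, m=0.07, q=0.39)
print([round(p, 4) for p in probs[:6]])
```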
Eesti vanasõnad
Testing the hyperpascal distribution
$P_x = \frac{\binom{k+x-1}{x}}{\binom{m+x-1}{x}}\, q^x P_0, \quad x = 0, 1, 2, \ldots$
k = 1.21, m = 0.07, q = 0.39
C = X²/N = 0.0193
1. The length of Estonian proverbs is regularly organized.
2. The well-known hyperpascal distribution is a good model.
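The goodness-of-fit criterion used throughout, C = X²/N, is Pearson's chi-square statistic divided by the sample size. A minimal sketch, assuming that observed and expected frequencies are given for the same length classes; the verdict threshold C < 0.02 is a rule of thumb commonly used in quantitative linguistics and is an assumption here. Both function names are hypothetical.

```python
def discrepancy_coefficient(observed, expected):
    """C = X^2 / N, with X^2 the Pearson chi-square statistic and N = sum(observed)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    n = sum(observed)
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return x2 / n


def rate_fit(c, good_below=0.02):
    """Rule-of-thumb verdict; the 0.02 cut-off is an assumed convention."""
    return "good" if c < good_below else "poor"


print(rate_fit(0.0193))  # good  (hyperpascal, proverb length)
print(rate_fit(0.08))    # poor  (1-displaced Poisson, word length)
```

Because C divides by N, it stays comparable across samples of very different size, which matters when comparing, say, the Normann and Eesti vanasõnad data.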
Is there a regularity of word length in Estonian proverbs?
Normann (21038 words)
Syllables per word (x)  Number of words (f_x)
1                       6648
2                       10573
3                       2730
4                       920
5                       149
6                       16
7                       2
[Figure: number of words by syllables per word (1–7)]
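The mean word length follows directly from the frequency table. A minimal sketch using the Normann counts above; the variable names are illustrative.

```python
# Normann sample: syllables per word (x) -> number of words (f_x)
word_lengths = {1: 6648, 2: 10573, 3: 2730, 4: 920, 5: 149, 6: 16, 7: 2}

n_words = sum(word_lengths.values())
n_syllables = sum(x * f for x, f in word_lengths.items())
mean_length = n_syllables / n_words                        # syllables per word
shares = {x: f / n_words for x, f in word_lengths.items()}  # relative frequencies

print(n_words, n_syllables, round(mean_length, 3))  # 21038 40519 1.926
print({x: round(s, 3) for x, s in shares.items()})
```

Disyllabic words form the most frequent class, and the mean lies just below two syllables per word.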
In search of a word length model
$P_x = g(x)\,P_{x-1}$, with $g(x) = \frac{a}{x}$
Poisson distribution: $P_x = \frac{e^{-a} a^{x}}{x!}, \quad x = 0, 1, 2, \ldots$
1-displaced Poisson distribution ("Fucks distribution"): $P_x = \frac{e^{-a} a^{x-1}}{(x-1)!}, \quad x = 1, 2, 3, \ldots$
[Figure: number of words by syllables per word (1–7)]
C = X²/N = 0.08: not a good model!
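To see why the Poisson family is rejected, one can fit the 1-displaced Poisson distribution to the Normann word-length data. This sketch estimates a by the method of moments (mean = 1 + a) and lets the last class absorb the upper tail; both choices are assumptions, since the slides do not say how the parameter was estimated or how classes were grouped. The helper name is hypothetical.

```python
from math import exp

# Normann sample: syllables per word (x) -> number of words (f_x)
word_lengths = {1: 6648, 2: 10573, 3: 2730, 4: 920, 5: 149, 6: 16, 7: 2}


def displaced_poisson_probs(a, x_max):
    """P_1..P_x_max of the 1-displaced Poisson distribution.

    Uses P_1 = e^{-a} and P_x = P_{x-1} * a / (x - 1); the last class
    absorbs the whole tail P(X >= x_max) so the probabilities sum to 1.
    """
    probs = [exp(-a)]
    for x in range(2, x_max + 1):
        probs.append(probs[-1] * a / (x - 1))
    probs[-1] = 1.0 - sum(probs[:-1])
    return probs


xs = sorted(word_lengths)  # assumes contiguous classes 1, 2, ..., x_max
observed = [word_lengths[x] for x in xs]
n = sum(observed)
a = sum(x * f for x, f in zip(xs, observed)) / n - 1  # method of moments: mean = 1 + a
expected = [n * p for p in displaced_poisson_probs(a, xs[-1])]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
c = x2 / n
print(f"a = {a:.4f}, C = {c:.3f}")
```

With these choices a ≈ 0.926 and C comes out at about 0.08, in line with the value reported above. Most of the discrepancy comes from the first two classes: the model expects considerably more monosyllables and fewer disyllables than observed.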
An alternative model for word length in Estonian (proverbs)
Geometric distribution: $P_x = p\,q^{x}, \quad x = 0, 1, 2, \ldots$ (q = 1 − p)
1-displaced geometric distribution: $P_x = p\,q^{x-1}, \quad x = 1, 2, 3, \ldots$
1-displaced Shenton-Skees geometric distribution (parameters p and a; x = 1, 2, 3, …)
General approach: $P_x = g(x)\,P_{x-1}$
Orthographic words: p = 0.88, a = 4.71, C = 0.0023
Word stems: p = 0.85, a = 3.49, C = 0.0062
Word length in Eesti vanasõnad (88296 words)
Syllables per word (x)  Number of words (f_x)
1                       27272
2                       43696
3                       12127
4                       4185
5                       822
6                       165
7                       32
8                       6
9                       1
1-displaced Shenton-Skees geometric distribution: p = 0.84, a = 3.30, C = 0.0074
Proverb length → word length (Normann)
Proverb length:                       T3      T4      T5      T6      T7      T8      T9      T10
Word length (syllables per word):     2.2652  1.9939  1.9830  1.9554  1.9642  1.8434  1.8507  1.8217
Menzerath-Altmann law (Altmann 1980)
»The longer (more complex) a linguistic construct, the shorter (less complex) its constituents.«
Example: The longer a sentence, the shorter the clauses constituting it.
NB: Direct relations (in the classical structuralist paradigm) only, i.e., the relation of a construct to its immediate constituents; the relation between entities from indirectly related levels (e.g., between sentences and words, leapfrogging the intermediate level of sub-sentential constructs like clauses or phrases) is expected to show different (more complex) tendencies.
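Basic form:
$y = K x^{a}$, i.e., $\frac{y'}{y} = \frac{a}{x}$
y: constituent length (dependent variable), x: construct length (independent variable), K: integration constant, a: parameter determining the steepness of the decrease (for a < 0).
Applied to proverbs: $WoL = K \cdot SeL^{a}$ (word length WoL as a function of sentence length SeL)
Full form:
$y = K x^{a} e^{bx}$, i.e., $\frac{y'}{y} = \frac{a}{x} + b$
Extended form (Wimmer-Altmann law):
$y = K x^{a} e^{bx} e^{c/x}$, i.e., $\frac{y'}{y} = \frac{a}{x} + b - \frac{c}{x^{2}}$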
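The basic form can be fitted by ordinary least squares after taking logarithms, ln y = ln K + a ln x. The sketch below does this for the Normann values of mean word length by proverb length (T3–T10) from the previous slide. Log-linear least squares is a swapped-in, simple estimation method; the fits reported on the following slides use other forms of the law (with $e^{c/x}$) and their estimation procedure is not stated, so the numbers are not expected to match. The function name is hypothetical.

```python
from math import exp, log

# Mean word length (syllables per word) by proverb length T3..T10, Normann sample
proverb_length = [3, 4, 5, 6, 7, 8, 9, 10]
word_length = [2.2652, 1.9939, 1.9830, 1.9554, 1.9642, 1.8434, 1.8507, 1.8217]


def fit_power_law(xs, ys):
    """Least-squares fit of ln y = ln K + a ln x (basic form y = K x^a)."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    a = sxy / sxx             # slope in log-log space = exponent a
    k = exp(my - a * mx)      # intercept back-transformed to K
    return k, a


k, a = fit_power_law(proverb_length, word_length)
print(f"K = {k:.3f}, a = {a:.3f}")
```

The estimated exponent a is negative (about −0.15): word length decreases as proverb length increases, as the law predicts.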
Proverb length → word length
$y = K e^{c/x}$: K = 1.68, c = –0.84, R² = 0.90
$y = K x^{a} e^{c/x}$: K = 1.71, a = 0.18, c = –1.05, R² = 0.98
[Figures: Normann: word length vs. proverb length (0–25); Eesti vanasõnad: word length (syllables per word) vs. proverb length (words per sentence, 2–10)]
Word length → syllable length (Eesti vanasõnad)
$y = K e^{c/x}$: K = 2.02, c = 0.42, R² = 0.96
[Figure: syllable length (letters per syllable) vs. word length (syllables per word)]
Positional aspects of word length
Within-proverbial position:  Pos1    Pos2    Pos3    Pos4    Pos5    Pos6    Pos7    Pos8    Pos9    Pos10
Word length:                 1.8852  1.7980  1.9765  1.9608  1.8943  1.9756  2.0373  1.9704  1.9771  2.1714
Fourier series: $f(x) = k + a\sin(bx) + c\cos(dx) + e\sin(fx) + g\cos(hx)$, R² = 0.99
[Figure: mean word length by position (1–10) with the fitted Fourier series]
In the two approaches discussed above, the analyses concerned:
• the dependence of word length on sentence length, paying no attention to within-sentence position;
• the dependence of word length on within-proverb position, ignoring the specific proverb length.
[Figure: mean word lengths by position, separately for each sentence length (T3–T10)]
Unipartite proverbs with length T3–T5:
decrease – increase; minimum at 2nd position; maximum at last position
Bipartite proverbs with length T6–T10
Cycle I: unipartite proverbs (T6)
Cycle II:
T7, T9, and T10; T6, T8
unipartite proverbs = monotonic increase
What causes proverbs to be long(er) or short(er)?
From internal synergetic to external factors
... Tänan teid kannatlikkuse ja tähelepanu ... [Thank you for your patience and attention]
[Figures: familiarity (PTP) vs. corpus-based frequency; familiarity vs. frequency, observed and theoretical; German data and American data; sentence length (SL) vs. familiarity (FAM)]
Sentence length and familiarity (German data: N = 11,355; excluding zero familiarity, f > 100)
$SeL = 8.40 \cdot FRQ^{-0.09}$, R² = 0.89
“It seems preposterous even to ask where the 'variants of one proverb' end and the 'variants of another proverb' begin, or how many 'different proverbs' could be found within such a thicket.”
Desiderata for Estonian Paremiology
Variants vs. Types
Frequency
Familiarity
1. Linguistic forms of variants
2. Frequency
   2.1 of variants
   2.2 of types
3. Familiarity
   3.1 of variants
   3.2 of types
Frequency distribution of 'variants' (unreliable data for f > 10)
Zipf distribution: a = 2.08, C = X²/N = 0.06
Right-truncated Zipf distribution: a = 1.91, R = 9, C = X²/N = 0.0032
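For reference, the right-truncated Zipf distribution normalises $x^{-a}$ over the classes $x = 1, \ldots, R$: $P_x = x^{-a} / \sum_{j=1}^{R} j^{-a}$. A minimal sketch with the reported parameters a = 1.91 and R = 9; it assumes the classes start at x = 1, and the function name is hypothetical.

```python
def truncated_zipf_pmf(a, r):
    """Right-truncated Zipf distribution: P_x = x^-a / sum_{j=1..R} j^-a, x = 1..R."""
    norm = sum(j ** -a for j in range(1, r + 1))
    return [x ** -a / norm for x in range(1, r + 1)]


probs = truncated_zipf_pmf(1.91, 9)
for x, p in enumerate(probs, start=1):
    print(x, round(p, 4))
```

Truncating at R = 9 removes the long tail that the unreliable data for f > 10 would otherwise have to support, which is one plausible reading of why the truncated variant fits better.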
$y = K e^{c/x}$: K = 6.52, c = 0.07, R² = 0.96
[Figure: proverb length vs. number of variants]
July 21, 1939: Arvo Arnol'dovič Krikmann
July 21, 1940: President Konstantin Päts affirmed the government of Johannes Vares (appointed by Andrej Ždanov), accompanied by the arrival of Soviet demonstrators and Red Army troops, the replacement of the Flag of Estonia by the Red flag on Pikk Hermann, and the meeting of the newly elected parliament Riigikogu on July 21.
July 21, 1944: Graf Claus von Stauffenberg and his fellow conspirators were executed in Berlin for the plot to assassinate Adolf Hitler.
July 21, 1949: The United States Senate ratifies the North Atlantic Treaty.
Village Pudivere (German: Poidifer): Estonian writer Eduard Vilde (1865-1933)
Simuna Parish: important point in F.G.W. Struve's Geodetic Arc, a chain of triangulations (1827)
Belgian National Day