1/47 Peter Grzybek Kriq 75, August 18/19, 2014 www.peter-grzybek.eu Peter Grzybek Estonian Proverbs: Searching for regularities

Page 1:

Kriq 75, August 18/19, 2014 1/47

Peter Grzybek

www.peter-grzybek.eu

Peter Grzybek

Estonian Proverbs:

Searching for regularities

Page 2:

How long is a proverb ?

How long are words in proverbs ?

Does word length depend on proverb length ?

Is word length independent of within-text position ?

Page 3:

Memo: „There are no positive facts in language.“ (Saussure)

There is always more than one definition.

I. Define the entity you want to measure. If you want to measure sentence length, define ‘sentence’. If you want to measure word length, define ‘word’.

II. Determine the measuring units in which you want to measure.

E.g., sentence length: number of clauses, phrases, words, syllables, morphemes, … ?

E.g., word length: number of syllables, morphemes, letters, graphemes, phonemes, … ?

III. Define the measuring units. Define ‘clause’, ‘phrase’, ‘syllable’, ‘morpheme’, ‘phoneme’, ‘grapheme’, ‘letter’, …

How to measure the length of linguistic units and entities ?

Rule in Quantitative Linguistics:

Take direct constituents as measuring units

Page 4:

How long are proverbs ?

Sentence length: one proverb = one sentence

How many XY per proverb ?

Clauses, phrases
+ syntactic analysis
- dependent on syntax theory; reduced number of clauses/phrases in proverbs (lack of variation)

Words, stems
+ sufficient variation
- dependent on lexical theory (orthographic word, phonological word, etc.)

Syllables, morphemes
+ lexical analysis; rhythmic structure
- dependent on morphology and phonotactics; high degree of variation

Page 5:

In agglutinative languages …

… stems do not change, … affixes do not fuse with other affixes, … affixes do not change form conditioned by other affixes.

Orthographic problems:

Mother-in-law - Isn't that a problem ?

В этом доме.

в кратцу - вкратце

Phonological word (tact group):

Ná mostu.

Page 6:

How long are words ?

How many XY per word ?

Letters, graphemes
+ easy (automatic) analysis
- high degree of alphabetic arbitrariness; high degree of variation

Phonemes
+ better linguistic justification
- dependent on phonological theory; high degree of variation; neglect of quantity

Syllables, morphemes
+ lexical analysis; rhythmic structure
- high degree of variation

Page 7:

Estonian phonemes:

Three degrees of phonemic length (consonants and vowels)

[o] (short o) koli = „rubbish“
[oˈ] (long o) kooli = „school“
[oː] (extra long o) kooli" = „to school“

Page 8:

Decisions / Definitions (in accordance with Kriq 1967)

Linguistic unit    Definition
Sentence           One proverb
  Length           Number of words / stems
Word               Orthographic
  Length           Number of syllables

Page 9:

Üks riisub rihaga, teine pühib luuaga. (EV 15016)
[One rakes with the rake, the other sweeps with the broom.]
Wo: 6 – St: 6 – Sy: 13
Üks rii-sub ri-ha-ga, tei-ne pü-hib luu-a-ga.

Isi puu, isi puuke. (EV 2245)
[The one is the tree, the other is the little tree.]
Wo: 4 – St: 4 – Sy: 7
I-si puu, i-si puu-ke.
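The two counts used here (orthographic words per proverb, syllables per word) can be reproduced mechanically once a proverb is written with hyphens between syllables, as in the examples above. The following is only a minimal sketch: the function name `word_and_syllable_counts` is hypothetical, and it assumes the syllabification is already given in the input, since it does not syllabify Estonian itself.

```python
import re


def word_and_syllable_counts(syllabified: str) -> tuple[int, int]:
    """Return (orthographic words, syllables) for a hyphen-syllabified proverb."""
    # Drop punctuation; keep letters (including õ, ä, ö, ü), whitespace and hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", syllabified)
    words = cleaned.split()
    # Each hyphen separates two syllables inside one orthographic word.
    syllables = sum(len(word.split("-")) for word in words)
    return len(words), syllables


print(word_and_syllable_counts("Üks rii-sub ri-ha-ga, tei-ne pü-hib luu-a-ga."))  # (6, 13)
print(word_and_syllable_counts("I-si puu, i-si puu-ke."))  # (4, 7)
```

Both results agree with the Wo and Sy values given for EV 15016 and EV 2245.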

Page 10:

Erna Normann (1955) Valimik eesti vanasõnu

3576 proverbs, ca. end of 19th, early 20th century

[Figure: frequencies by proverb length, 3 to 21 words]

Length  Frequency
3       93
4       619
5       518
6       710
7       511
8       518
9       218
10      175
11      81
12      62
13      40
14      13
15      5
16      5
17      2
18      4
19      0
20      1
21      1
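As a quick consistency check, the Normann frequency table can be summed directly. This is a small sketch using only the counts read off the slide; the variable names are illustrative.

```python
# Proverb length in words -> number of proverbs (Normann 1955, as listed on the slide).
normann = {
    3: 93, 4: 619, 5: 518, 6: 710, 7: 511, 8: 518, 9: 218, 10: 175,
    11: 81, 12: 62, 13: 40, 14: 13, 15: 5, 16: 5, 17: 2, 18: 4,
    19: 0, 20: 1, 21: 1,
}

total_proverbs = sum(normann.values())
total_words = sum(length * count for length, count in normann.items())
mean_length = total_words / total_proverbs

# Words contained in proverbs of length 3 to 10 only.
words_3_to_10 = sum(length * count for length, count in normann.items() if length <= 10)

print(total_proverbs)         # 3576
print(total_words)            # 23677
print(round(mean_length, 2))  # 6.62
print(words_3_to_10)          # 21038
```

The total reproduces the 3576 proverbs stated on the slide. The word count for lengths 3 to 10 equals the 21038 words reported later for the Normann word-length data, which suggests (as an inference, not something the slides state) that the word-length analysis is based on proverbs of these lengths.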

Page 11:

Comparisons: Old (17th/18th century) and Contemporary

Length  Normann  Old  New
3       93       21   26
4       619      103  89
5       518      80   52
6       710      129  39
7       511      86   17
8       518      96   44
9       218      39   12
10      175      21   9
11      81       21   1
12      62       14   5
13      40       4    0
14      13       3    1
15      5        0    0
16      5        1    0
17      2        0
18      4        1
19      0
20      1
21      1
Total   3576     618  294

[Figure: relative frequency (0,00 to 0,35) by proverb length (2 to 22) for Contemporary, Normann and Old]

Bimodal distributions: additional peaks (6 / 8)

Question: Does the word-stem distinction explain the bi-modality?

Page 12:

Eesti vanasõnad (12921 proverbs)

[Figure: number of proverbs by stems per proverb]

[Figure: number of proverbs by (orthographic) words per proverb]

[Figure: stems per proverb against orthographic words per proverb]

Words vs. stems: linear relation !

Concentration on words

Page 13:

Some interim conclusions

1. Bi-modality seems to originate in the characteristics of the proverb material; this phenomenon needs more detailed study.

2. It seems reasonable to assume that the overall picture results from differences between syntactically different proverbs: e.g., „simple“ (uni-partite proverbs without hypotaxis) vs. „complex“ (n-partite proverbs with hypotaxis).

3. As long as we do not have relevant data available, data pooling seems to be an appropriate procedure, to make the forest visible before the trees.

[Figure: number of proverbs by (orthographic) words per proverb]

Pooling data: intervals 2-3, 4-5, 6-7, …
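Pooling into intervals of width two can be sketched as follows. The slide pools the Eesti vanasõnad counts, which are not listed in this transcript, so the example uses the Normann table from the earlier slide purely as illustrative input; the function name `pool_frequencies` is hypothetical.

```python
def pool_frequencies(freqs, width=2, start=2):
    """Pool a {length: count} table into intervals start..start+width-1, and so on."""
    pooled = {}
    for length, count in sorted(freqs.items()):
        # Lower bound of the interval that contains this length.
        lower = start + ((length - start) // width) * width
        interval = (lower, lower + width - 1)
        pooled[interval] = pooled.get(interval, 0) + count
    return pooled


# Normann (1955) proverb-length frequencies, used here only as sample input.
normann = {
    3: 93, 4: 619, 5: 518, 6: 710, 7: 511, 8: 518, 9: 218, 10: 175,
    11: 81, 12: 62, 13: 40, 14: 13, 15: 5, 16: 5, 17: 2, 18: 4,
    19: 0, 20: 1, 21: 1,
}

pooled = pool_frequencies(normann)
for (low, high), count in pooled.items():
    print(f"{low}-{high}: {count}")
```

For this sample input the pooled counts are 93, 1137, 1221, 736, 256, 102, 18, 7, 4, 2, so the separate peaks at 4, 6 and 8 words merge into a single peak in the interval 6-7.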

Page 14:

Is there a way to find a theoretical model for sentence length frequencies ?

Assumptions:
1. The distribution of length is organized in a law-like manner.
2. It is sufficient to make assumptions about the difference D of two neighboring frequencies (probabilities):

$\Delta P_{x-1} = P_x - P_{x-1}$

$D = \frac{\Delta P_{x-1}}{P_{x-1}} = \frac{P_x - P_{x-1}}{P_{x-1}}$

Which factors influence D ?

a language-specific factors
b production-specific factors
c norming forces
d level-specific factors (words vs. phrases)

$D = \frac{a + bx}{cx + d}$

$P_x = \left(1 + \frac{a + bx}{cx + d}\right) P_{x-1}$

$P_x = \frac{k + x - 1}{m + x - 1}\, q\, P_{x-1}$, with $q = \frac{b + c}{c}$, $k = \frac{a + d}{b + c} + 1$, $m = \frac{d}{c} + 1$

$P_x = \frac{\binom{k + x - 1}{x}}{\binom{m + x - 1}{x}}\, q^{x}\, P_0$

Hyperpascal distribution (Beta-binomial d.)

Page 15:

Testing the hyperpascal distribution

Eesti vanasõnad

$P_x = \frac{\binom{k + x - 1}{x}}{\binom{m + x - 1}{x}}\, q^{x}\, P_0$

k = 1.21, m = 0.07, q = 0.39

C = X²/N = 0.0193

1. Length of Estonian proverbs is regularly organized.
2. The well-known hyperpascal distribution is a good model.
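The hyperpascal probabilities can be generated from the recurrence P_x = (k + x - 1)/(m + x - 1) · q · P_(x-1), which is equivalent to the binomial-coefficient form of the distribution. The sketch below makes two assumptions that are not taken from the slides: the support starts at x = 0, and P_0 is obtained by normalising over a long but finite range (`x_max` is an arbitrary cut-off). How x maps onto the pooled length intervals is not stated in the transcript.

```python
def hyperpascal_pmf(k, m, q, x_max=200):
    """Hyperpascal probabilities P_0..P_x_max via the recurrence, normalised to sum to 1."""
    weights = [1.0]  # unnormalised P_0
    for x in range(1, x_max + 1):
        # P_x = (k + x - 1) / (m + x - 1) * q * P_(x-1)
        weights.append(weights[-1] * (k + x - 1) / (m + x - 1) * q)
    total = sum(weights)
    return [w / total for w in weights]


# Parameter values reported on the slide for the pooled Eesti vanasõnad data.
probs = hyperpascal_pmf(k=1.21, m=0.07, q=0.39)
for x in range(6):
    print(x, round(probs[x], 4))
```

Because q < 1, the terms decay geometrically for large x, so the truncation at `x_max` has no practical effect on the normalisation.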

Page 16:

Is there a regularity of word length in Estonian proverbs ?

Normann (21038 words)

Syllables per word (x)   Number of words (fx)
1                        6648
2                        10573
3                        2730
4                        920
5                        149
6                        16
7                        2

[Figure: number of words by syllables per word]

Page 17:

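In search of a word length model

$P_x = g(x)\, P_{x-1}$

$g(x) = \frac{a}{x} \;\Rightarrow\; P_x = \frac{a}{x}\, P_{x-1}$

Poisson distribution:
$P_x = \frac{e^{-a} a^{x}}{x!}, \quad x = 0, 1, 2, \ldots$

1-displaced Poisson distribution („Fucks distribution“):
$P_x = \frac{e^{-a} a^{x-1}}{(x-1)!}, \quad x = 1, 2, 3, \ldots$

[Figure: number of words by syllables per word, observed and fitted]

C = X²/N = 0.08: no good model !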
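The discrepancy coefficient C = X²/N for the 1-displaced Poisson distribution can be recomputed from the Normann word-length counts. The sketch below assumes the simplest estimate a = mean - 1 and no pooling of small classes; the slides do not say how the parameter was actually fitted, so this is one plausible procedure rather than the author's.

```python
import math

# Syllables per word -> number of words (Normann, 21038 words).
observed = {1: 6648, 2: 10573, 3: 2730, 4: 920, 5: 149, 6: 16, 7: 2}

n = sum(observed.values())
mean = sum(x * f for x, f in observed.items()) / n
a = mean - 1.0  # assumed estimator: mean of a 1-displaced Poisson is a + 1


def displaced_poisson(x, a):
    """P_x = exp(-a) * a**(x - 1) / (x - 1)!  for x = 1, 2, 3, ..."""
    return math.exp(-a) * a ** (x - 1) / math.factorial(x - 1)


# Pearson chi-square over the observed classes, then the discrepancy coefficient.
chi_square = sum(
    (f - n * displaced_poisson(x, a)) ** 2 / (n * displaced_poisson(x, a))
    for x, f in observed.items()
)
discrepancy = chi_square / n

print(round(a, 4), round(discrepancy, 3))
```

Under these assumptions the coefficient comes out close to the C = 0.08 shown on the slide, well above the values reported for the models on the following slides.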

Page 18:

An alternative model for word length in Estonian (proverbs)

$P_x = g(x)\, P_{x-1}$

Geometric distribution:
$P_x = p\, q^{x}, \quad x = 0, 1, 2, \ldots$

1-displaced geometric distribution:
$P_x = p\, q^{x-1}, \quad x = 1, 2, 3, \ldots$

1-displaced Shenton-Skees geometric distribution:
$P_x = p\, q^{x-1} \left[ 1 + a \left( x - \frac{1}{p} \right) \right], \quad x = 1, 2, 3, \ldots$

Orthographic words: p = 0.88, a = 4.71, C = 0.0023

Word stems: p = 0.85, a = 3.49, C = 0.0062
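The 1-displaced Shenton-Skees geometric distribution, P_x = p · q^(x-1) · [1 + a(x - 1/p)] with q = 1 - p, is easy to evaluate directly. The sketch below plugs in the parameter values reported for orthographic words and compares the result with the Normann relative frequencies; pairing these parameters with the Normann counts is an assumption, since the slide does not name the data set.

```python
def shenton_skees_geometric(x, p, a):
    """1-displaced Shenton-Skees geometric probability for x = 1, 2, 3, ..."""
    q = 1.0 - p
    return p * q ** (x - 1) * (1.0 + a * (x - 1.0 / p))


# Parameters reported on the slide for orthographic words.
p, a = 0.88, 4.71
probs = [shenton_skees_geometric(x, p, a) for x in range(1, 60)]

# Normann word-length counts (syllables per word), assumed to be the fitted data.
observed = {1: 6648, 2: 10573, 3: 2730, 4: 920, 5: 149, 6: 16, 7: 2}
n = sum(observed.values())

for x, f in observed.items():
    print(x, round(f / n, 4), round(probs[x - 1], 4))
```

The correction term a(x - 1/p) sums to zero under the geometric weights, so the probabilities still add up to one. With the rounded parameters the first class gives about 0.315 against an observed share of about 0.316.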

Page 19:

Word length in Eesti vanasõnad (88296 words)

Syllables per word (x)   Number of words (fx)
1                        27272
2                        43696
3                        12127
4                        4185
5                        822
6                        165
7                        32
8                        6
9                        1

$P_x = p\, q^{x-1} \left[ 1 + a \left( x - \frac{1}{p} \right) \right], \quad x = 1, 2, 3, \ldots$

p = 0.84, a = 3.30

C = 0.0074

Page 20:

Proverb length → word length (Normann)

Proverb length     T3      T4      T5      T6      T7      T8      T9      T10
Mean word length   2.2652  1.9939  1.9830  1.9554  1.9642  1.8434  1.8507  1.8217

Page 21:

Menzerath-Altmann law (Altmann 1980)

»The longer (more complex) a linguistic construct, the shorter (less complex) its constituents.«

Example: The longer a sentence the shorter the clauses constituting the sentence.

NB: Direct relations (in the classical structuralist paradigm) only, i.e., the relation of a construct to its immediate constituents; the relation between entities from indirectly related levels (e.g., between sentences and words, leapfrogging the intermediate level of sub-sentential constructs like clauses or phrases) is expected to show different (more complex) tendencies.

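Basic form:

$y = K x^{a}$, from $\frac{y'}{y} = \frac{a}{x}$

y: constituent = dependent variable, x: construct = independent variable, K: integration constant, a: parameter determining the steepness of the decrease (for a < 0).

For word length (WoL) and sentence length (SeL): $WoL = K \cdot SeL^{a}$

Full form:

$y = K x^{a} e^{bx}$, from $\frac{y'}{y} = \frac{a}{x} + b$

Extended form (Wimmer-Altmann law):

$y = K x^{a} e^{bx} e^{-c/x}$, from $\frac{y'}{y} = \frac{a}{x} + b + \frac{c}{x^{2}}$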

Page 22:

Proverb length → word length

$y = K e^{-c/x}$: K = 1.68, c = –0.84, R² = 0.90

$y = K x^{a} e^{-c/x}$: K = 1.71, a = 0.18, c = –1.05, R² = 0.98

[Figures: word length (syllables per word) against proverb length (words per sentence); panels labelled Normann and Eesti vanasõnad]
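The one-parameter-pair model y = K · e^(-c/x) can be fitted to the Normann means for T3 to T10 by ordinary least squares after taking logarithms, since ln y = ln K - c · (1/x). Fitting in log space is an assumption of this sketch; the slides do not say which fitting procedure was used, and a nonlinear fit would give slightly different numbers.

```python
import math

# Mean word length (syllables per word) by proverb length T3..T10 (Normann).
lengths = [3, 4, 5, 6, 7, 8, 9, 10]
mean_word_length = [2.2652, 1.9939, 1.9830, 1.9554, 1.9642, 1.8434, 1.8507, 1.8217]

# Linearise: v = ln y, u = 1/x, so that v = ln K - c * u.
u = [1.0 / x for x in lengths]
v = [math.log(y) for y in mean_word_length]
n = len(u)
u_bar = sum(u) / n
v_bar = sum(v) / n

s_uv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
s_uu = sum((ui - u_bar) ** 2 for ui in u)
s_vv = sum((vi - v_bar) ** 2 for vi in v)

slope = s_uv / s_uu
K = math.exp(v_bar - slope * u_bar)
c = -slope
r_squared = s_uv ** 2 / (s_uu * s_vv)  # R² of the log-linear fit

print(round(K, 2), round(c, 2), round(r_squared, 2))
```

With this procedure the estimates land close to K = 1.68, c = –0.84 and R² = 0.90 as reported on the slide, which suggests that these values belong to the Normann data.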

Page 23:

Word length → syllable length

Eesti vanasõnad

$y = K e^{-c/x}$: K = 2.02, c = 0.42, R² = 0.96

[Figure: syllable length (letters per syllable) against word length (syllables per word)]

Page 24:

Positional aspects of word length

Within-proverbial position   Pos1    Pos2    Pos3    Pos4    Pos5    Pos6    Pos7    Pos8    Pos9    Pos10
Mean word length             1.8852  1.7980  1.9765  1.9608  1.8943  1.9756  2.0373  1.9704  1.9771  2.1714

$f(x) = k + a \sin(bx) + c \cos(dx) + e \sin(fx) + g \cos(hx)$

[Figure: mean word length by position (1 to 10)]

Fourier series: R² = 0.99

Page 25:

In the two approaches discussed above, the analyses concerned:
• the dependence of word length on sentence length, with no attention to within-sentence position,
• the dependence of word length on within-proverb position, ignoring the specific proverb length.

[Figure: mean word lengths by position (sentence-length specific), for proverb lengths 3 to 10]

Unipartite proverbs with length T3–T5: decrease – increase; minimum at 2nd position; maximum at last position

Bipartite proverbs with length T6–T10

Cycle I: unipartite proverbs (T6)

Cycle II: T7, T9, and T10; T6, T8

unipartite proverbs = monotonous increase

Page 26:

What causes proverbs to be long(er) or short(er) ?

From internal synergetic to external factors

Page 27:

... Thank you for your patience and attention ...

Page 28:

Familiarity and frequency

[Figures: familiarity (PTP) against corpus-based frequency, with observed and theoretical values; panels labelled German data and American data]

Sentence Length and Familiarity (German data: N = 11.355; excluding zero-familiarity, f > 100)

[Figure: sentence length (SL) against familiarity (FAM)]

$SeL = 8.40 \cdot FRQ^{-0.09}$

R² = 0.89
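The fitted power law, read here as SeL = 8.40 · FRQ^(-0.09), can be evaluated for a few values of the predictor to see how slowly sentence length falls. The function name and the chosen input values are illustrative only, and the reading of the exponent as negative is an assumption based on the axis range of the plot.

```python
def predicted_sentence_length(frq, k=8.40, a=-0.09):
    """Power-law prediction SeL = k * frq**a with the coefficients read off the slide."""
    return k * frq ** a


for frq in (1, 5, 10, 20):
    print(frq, round(predicted_sentence_length(frq), 2))
```

At a predictor value of 1 the model returns 8.40 words, and at 20 roughly 6.4 words, which matches the range of the sentence-length axis (6,00 to 8,50) in the figure.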

Page 29:

“It seems preposterous even to ask where the 'variants of one proverb' end and the 'variants of another proverb' begin, or how many 'different proverbs' could be found within such a thicket.”

Desiderata for Estonian Paremiology

Variants vs. Types

Frequency

Familiarity

1. Linguistic forms of variants

2. Frequency
   1. of variants
   2. of types

3. Familiarity
   1. of variants
   2. of types

Page 30:

Frequency distribution of ‚variants‘ (unreliable data for f > 10)

Right-truncated Zipf distribution: a = 1.91, R = 9, C = X²/N = 0.0032

Zipf distribution: a = 2.08, C = X²/N = 0.06
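A right-truncated Zipf (zeta) distribution assigns P_x proportional to x^(-a) for x = 1, …, R and renormalises over that finite range. The sketch below uses this common definition with the reported a = 1.91 and R = 9; whether the slide's fitting software uses exactly this parametrisation is an assumption.

```python
def right_truncated_zipf(a, r):
    """Probabilities P_1..P_r with P_x proportional to x**(-a)."""
    weights = [x ** (-a) for x in range(1, r + 1)]
    total = sum(weights)
    return [w / total for w in weights]


probs = right_truncated_zipf(a=1.91, r=9)
for x, pr in enumerate(probs, start=1):
    print(x, round(pr, 4))
```

Truncating at R = 9 is consistent with the remark that the data are unreliable for f > 10: the tail is cut off instead of being modelled.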

Page 31:

$y = K e^{-c/x}$: K = 6.52, c = 0.07, R² = 0.96

[Figure: proverb length against number of variants]


Page 33:

July 21, 1939: Arvo Arnol'dovič Krikmann

July 21, 1940: President Konstantin Päts affirmed the government of Johannes Vares (appointed by Andrej Ždanov), accompanied by the arrival of Soviet demonstrators and Red Army troops, replacement of the Flag of Estonia by the Red flag on Pikk Hermann, meeting of the newly elected parliament Riigikogu on July 21.

July 21, 1944: Graf Claus von Stauffenberg and his fellow conspirators were executed in Berlin for the plot to assassinate Adolf Hitler.

July 21, 1949: The United States Senate ratifies the North Atlantic Treaty.

Village Pudivere (German: Poidifer); Estonian writer Eduard Vilde (1865-1933)

Simuna Parish: important point in F.G.W. Struve's Geodetic Arc, a chain of triangulations (1827)

Belgian National Day
