15 lines representing a bull Traditional statistics Assumes data is independent Comparative methods.

Andrew Meade, University of Reading


15 lines representing a bull


Traditional statistics: assumes data is independent

Comparative methods


English: Fish
Danish: Fisk
Dutch: Visch

Fish / Ryba

Czech: Ryba
Russian: Ryba
Bulgarian: Riba

23 other languages / 34 other languages


Range: 1 to 35; average 17

1: “Who”, “Three”

35: “Person”, “Dirty”


Meanings (columns) by language (rows)

Language   here       sea                water    when
English    here       sea (A)            water    when
German     hier       see, meer (A,B)    wasser   wenn
French     ici        mer (B)            eau      quand
Italian    qui, qua   mare (B)           acqua    quando
Greek      edo        thalasa (C)        nero     pote
Hittite    ka         aruna- (D)         watar    kuwapi

Binary coding of the cognate classes for "sea"

Language   sea (A)   meer (B)   thalasa (C)   aruna- (D)
English    1         0          0             0
German     1         1          0             0
French     0         1          0             0
Italian    0         1          0             0
Greek      0         0          1             0
Hittite    0         0          0             1
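
The step from the word table to the 0/1 table above can be sketched in a few lines of Python. This is only an illustration of the coding scheme shown on the slide, not the software used in the talk; the names SEA_CLASSES and binary_code are made up for the example.

```python
# Cognate classes for the meaning "sea", copied from the slide.
# German has two words (see, meer), so it belongs to classes A and B.
SEA_CLASSES = {
    "English": {"A"},
    "German": {"A", "B"},
    "French": {"B"},
    "Italian": {"B"},
    "Greek": {"C"},
    "Hittite": {"D"},
}


def binary_code(classes_by_language):
    """Return (sorted class labels, {language: list of 0/1, one per class})."""
    labels = sorted(set().union(*classes_by_language.values()))
    matrix = {
        language: [1 if label in classes else 0 for label in labels]
        for language, classes in classes_by_language.items()
    }
    return labels, matrix


labels, matrix = binary_code(SEA_CLASSES)
print("Language", *labels)
for language, row in matrix.items():
    print(f"{language:8s}", *row)
```

Each column is then a separate presence/absence character, which is why a language with two words for one meaning (German here) can carry a 1 in two columns.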


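Two-state model of cognate change

State 0: non-cognate
State 1: cognate

Q01: rate of change from state 0 to state 1
Q10: rate of change from state 1 to state 0

Time scale: 1000 years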
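
The Q01 and Q10 labels above suggest a standard two-state continuous-time Markov model. Assuming that is what the slide shows, the probability of ending in each state after time t has a simple closed form. The sketch below uses hypothetical rate values; transition_probs is an illustrative name, not a function from any package used in the talk.

```python
import math


def transition_probs(q01, q10, t):
    """Two-state Markov model: return [[P00, P01], [P10, P11]] after time t.

    q01 is the rate of change 0 -> 1, q10 the rate of change 1 -> 0.
    """
    total = q01 + q10
    decay = math.exp(-total * t)
    p01 = q01 / total * (1.0 - decay)  # start in 0, end in 1
    p10 = q10 / total * (1.0 - decay)  # start in 1, end in 0
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Hypothetical rates per 1000 years, chosen only for illustration.
P = transition_probs(q01=0.05, q10=0.20, t=1.0)
for row in P:
    print([round(p, 4) for p in row])
```

At t = 0 the matrix is the identity (no change has happened yet), and for very long times both rows approach the equilibrium frequencies q10/(q01+q10) and q01/(q01+q10).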


Results = Data + Method


Most probable; Random tree; -58204 log units; 4.1 x 10^14107

Infinite number of poor trees


Tree clades: Outgroup, Greek, Indo-Iranian, Slavic, Germanic, Celtic, Romance


“Name”, 3 cognate classes
Class A: Gypsy (Alav), Persian (Esm)
Class B: Latvian (Vards), Lithuanian (Vardas)
Class C: all the rest, including Hindi (Nam), Greek (Onoma), Italian (Nome)

Transitions between classes: B -> A, A -> B, C -> A, A -> C, B -> C, C -> B

Each transition (B -> A, C -> B, etc.) has its own estimated instantaneous transition rate

Too many parameters, not enough data
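
A quick count shows why this model runs out of data. With one rate per ordered pair of classes, k classes need k x (k - 1) rates, which matches the six transitions listed for the three "Name" classes. The sketch below is plain arithmetic; the value 35 is used only as a hypothetical large number of classes.

```python
def n_transition_rates(n_classes):
    """Number of off-diagonal rates when every class can change into every other."""
    return n_classes * (n_classes - 1)


for k in (2, 3, 35):  # 35 is a hypothetical large case
    print(f"{k} classes -> {n_transition_rates(k)} rates")
```

Two classes need only two rates, which is the appeal of reducing each cognate class to a binary present/absent character.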


2 cognate classes

Slow rate Fast rate

Class 1

Class 2


“Red”, “Salt”, “Five”


Mean rates for the 200 words

Mean = 3.05 ± 1.82
Median = 2.74
Min. = 0.09
Max. = 9.27

100-fold difference

Slow: ‘two’, ‘who’, ‘one’, ‘night’, ‘to die’

Fast: ‘dirty’, ‘to turn’, ‘to stab’


Word half-life: the time after which there is a 50% chance of the word being replaced by a non-cognate form

          Years
Mean      5260
Median    2530
Min       750
Max       76530

Based on Indo-European (IE) being 8000 years old
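
The half-lives above follow from the replacement rates quoted earlier if replacement is treated as a constant-rate (exponential) process, so that half-life = ln(2) / rate. The sketch below assumes the rates are expressed per 10,000 years, as stated on the tree slide; half_life_years is an illustrative name.

```python
import math


def half_life_years(rate_per_10k_years):
    """Time until a 50% chance of replacement, assuming a constant rate."""
    return math.log(2) / rate_per_10k_years * 10_000


# Rates taken from the "Mean rates for the 200 words" slide.
for label, rate in [("median rate", 2.74), ("fastest", 9.27), ("slowest", 0.09)]:
    print(f"{label}: {half_life_years(rate):.0f} years")
```

The median and fastest rates give roughly 2530 and 750 years, in line with the table. The slowest rate gives a value near 77,000 rather than 76,530, presumably because 0.09 is a rounded figure. The mean half-life of 5260 years is not ln(2) divided by the mean rate; it would have to be averaged over the individual word half-lives.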


I-E tree showing variation in rates of lexical replacement, per 10k years

“One” 0.43; “Ear” 0.88; “Sand” 4.5

Clades labelled on the tree: Romance, Germanic, Greek, Slavic, Indo-Iranian


Spoken word frequency, British National Corpus

Histogram: count (0 to 350) against log(10) of spoken word frequency per million (1 to 4.5)

N = 4840 words; mean = 194; geometric mean = 35.94; median = 25
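
The gap between the mean (194) and the geometric mean (35.94) is what a strongly right-skewed distribution produces: a few very frequent words pull the arithmetic mean up, while the geometric mean is the back-transformed mean of the log values plotted on the histogram axis. The sketch below uses four made-up frequencies purely to show the effect; it does not use the corpus data.

```python
import math
import statistics

# Made-up frequencies per million, spanning three orders of magnitude.
freqs = [1, 10, 100, 1000]

arithmetic = statistics.fmean(freqs)
geometric = statistics.geometric_mean(freqs)
median = statistics.median(freqs)

# The geometric mean equals 10 ** (mean of the log10 values).
via_logs = 10 ** statistics.fmean(math.log10(f) for f in freqs)

print(arithmetic, geometric, median, via_logs)
```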


Distribution of frequency of word use (20-100 million words)

Most words used < 100 times per million


r = 0.87, r = 0.88, r = 0.87

Frequency of use is very stable throughout IE


Frequency vs rate of lexical evolution

r = -0.37, r = -0.35, r = -0.41, r = -0.32
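
The r values above are correlation coefficients; a negative r means that more frequently used words tend to have lower rates of lexical replacement. As a minimal sketch of how such a coefficient is computed, the example below implements Pearson's r by hand. The four data points are invented for illustration and are not taken from the talk.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values: log10 frequency of use and rate of lexical replacement.
log_freq = [1.0, 2.0, 3.0, 4.0]
rate = [4.0, 3.0, 2.5, 0.5]

r = pearson_r(log_freq, rate)
print(round(r, 3))
```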


Parts of speech (plot legend): conjunctions, prepositions, adjectives, verbs, nouns, special adverbs, pronouns, numbers

R^2 = 0.50, R^2 = 0.48, R^2 = 0.48, R^2 = 0.48

Numbers, pronouns, special adverbs

Stronger selection?


Attribute | Genetic systems | Languages
discrete units | nucleotides, genes, individuals | words and other linguistic elements
replication | transcription | teaching, learning, imitation
dominant mode(s) of inheritance | parent-offspring | parent-offspring, generational (including teaching)
horizontal transmission | many mechanisms (e.g., hybridisation, viruses, transposons, insects) | borrowing
mutation | many mechanisms (e.g., slippage, unequal crossing over, point mutations and faulty repair) | mistakes, vowel shifts, innovation
selection of favoured variants | fitness differences among alleles | societal trends

Some similarities between linguistic and genetic systems