15 lines representing a bull Traditional statistics Assumes data is independent Comparative methods.

Andrew [email protected] of Reading

15 lines representing a bull

Traditional statistics: assumes data is independent

Comparative methods

http://upload.wikimedia.org/wikipedia/en/4/45/Phylogeny_Star_vs_Hierarchical.jpg

Fish: English (Fish), Danish (Fisk), Dutch (Visch)

Ryba: Czech (Ryba), Russian (Ryba), Bulgarian (Riba)

23 other languages; 34 other languages

Range: 1 (“Who”, “Three”) to 35 (“Person”, “Dirty”); average 17

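Languages (rows) and meanings (columns); letters in brackets mark cognate classes for "sea"

Language | here | sea | water | when
English | here | sea (A) | water | when
German | hier | see, meer (A,B) | wasser | wenn
French | ici | mer (B) | eau | quand
Italian | qui, qua | mare (B) | acqua | quando
Greek | edo | thalasa (C) | nero | pote
Hittite | ka | aruna- (D) | watar | kuwapi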

Language | sea (A) | meer (B) | thalasa (C) | aruna- (D)
English | 1 | 0 | 0 | 0
German | 1 | 1 | 0 | 0
French | 0 | 1 | 0 | 0
Italian | 0 | 1 | 0 | 0
Greek | 0 | 0 | 1 | 0
Hittite | 0 | 0 | 0 | 1
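
The presence/absence coding above can be sketched in a few lines of Python. This is only an illustration of the coding step, not the software used for the talk; the dictionary `sea_classes` and the helper `binary_matrix` are hypothetical names, and the class letters are copied from the "sea" column.

```python
# Cognate classes for the meaning "sea", copied from the slide.
# A language can hold more than one class (German has both A and B).
sea_classes = {
    "English": {"A"},
    "German": {"A", "B"},
    "French": {"B"},
    "Italian": {"B"},
    "Greek": {"C"},
    "Hittite": {"D"},
}


def binary_matrix(classes_by_language):
    """Return (sorted class labels, {language: list of 0/1, one per class})."""
    labels = sorted(set().union(*classes_by_language.values()))
    rows = {
        language: [1 if label in classes else 0 for label in labels]
        for language, classes in classes_by_language.items()
    }
    return labels, rows


labels, rows = binary_matrix(sea_classes)
print("Language", *labels)
for language, row in rows.items():
    print(language, *row)
```

Each cognate class becomes one binary character, so a meaning with several classes contributes several columns to the data matrix.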

Q01, Q10

0 = Non-cognate

1 = Cognate

Time: 1000 years
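
The slide names only the two states and the two rates. As a minimal sketch, assuming these are the rates of a standard two-state continuous-time Markov chain (Q01 for a change from 0 to 1, Q10 for a change from 1 to 0), the probability of each state after time t has a closed form. The function name and the numeric rates below are hypothetical and chosen only for illustration.

```python
import math


def transition_probabilities(q01, q10, t):
    """P(t) for a two-state continuous-time Markov chain.

    P[i][j] is the probability of being in state j after time t,
    given state i at time 0. Requires q01 + q10 > 0.
    """
    total = q01 + q10
    decay = math.exp(-total * t)
    p01 = (q01 / total) * (1.0 - decay)  # 0 -> 1 (non-cognate to cognate)
    p10 = (q10 / total) * (1.0 - decay)  # 1 -> 0 (cognate to non-cognate)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


# Hypothetical rates per unit of time (one unit = 1000 years here).
P = transition_probabilities(q01=0.05, q10=0.20, t=1.0)
for row in P:
    print([round(p, 4) for p in row])
```

With t = 0 the matrix is the identity, and for large t both rows approach the stationary frequencies q10/(q01+q10) and q01/(q01+q10).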

Results = Data + Method

Most probable tree; random tree: -58204 log units; 4.1 x 10^14107

Infinite number of poor trees

Tree labels: Outgroup, Greek, Indo-Iranian, Slavic, Germanic, Celtic, Romance

“Name”, 3 cognate classes

Class A: Gypsy (Alav), Persian (Esm)

Class B: Latvian (Vards), Lithuanian (Vardas)

Class C: all the rest, e.g. Hindi (Nam), Greek (Onoma), Italian (Nome)

Transitions among Class A, Class B and Class C: B → A, A → B, C → A, A → C, B → C, C → B, etc.

The estimated instantaneous transition rates

Too many parameters, not enough data
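
A quick count shows why the parameters run out of control. The sketch below assumes an unrestricted model in which every ordered pair of cognate classes has its own rate; the function name is hypothetical. The value 35 is used only on the reading that it is the largest number of cognate classes for a single meaning.

```python
def n_rate_parameters(n_classes):
    """Number of rates when each ordered pair of classes has its own rate."""
    return n_classes * (n_classes - 1)


for k in (2, 3, 35):
    print(k, "classes:", n_rate_parameters(k), "rates")
```

Three classes already need six rates (the six transitions among A, B and C), whereas a two-class coding needs only two.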

2 cognate classes

Slow rate Fast rate

Class 1

Class 2

“Red”, “Salt”

“Five”

Mean = 3.05 ± 1.82; Median = 2.74; Min. = 0.09; Max = 9.27

100-fold difference

Mean rates for the 200 words

Slow: ‘two’, ‘who’, ‘one’, ‘night’, ‘to die’

Fast: ‘dirty’, ‘to turn’, ‘to stab’

Word half-life: 50% chance of the word being replaced by a non-cognate form

Years

Mean 5260

Median 2530

Min 750

Max 76530

Based on Indo-European (IE) being 8000 years old
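
The half-lives can be related to the replacement rates with a short calculation. This is a sketch under two assumptions not stated on the slide: that replacement happens at a constant rate (so survival of a word is exponential), and that the rates are expressed per 10,000 years. The function name is hypothetical.

```python
import math


def half_life_years(rate_per_10k_years):
    """Time after which a word has a 50% chance of having been replaced.

    Assumes a constant replacement rate, given per 10,000 years.
    """
    return math.log(2) / rate_per_10k_years * 10_000


# Rates quoted on the slides: fastest, median and slowest word.
for label, rate in (("max rate", 9.27), ("median rate", 2.74), ("min rate", 0.09)):
    print(label, rate, "->", round(half_life_years(rate)), "years")
```

Under these assumptions the fastest rate (9.27) gives a half-life of roughly 750 years and the median rate (2.74) roughly 2530 years, in line with the figures quoted. The mean half-life is not obtained from the mean rate, because the average of a reciprocal differs from the reciprocal of an average.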

I-E tree showing variation in rates of lexical replacement, per 10k years

“One” 0.43 “Ear” 0.88 “Sand” 4.5

Clade labels on the tree: Romance, Germanic, Greek, Slavic, Indo-Iranian

Spoken word frequency, British National Corpus

[Histogram: count (0 to 350) against log(10) of spoken word frequency per million (1 to 4.5)]

N = 4840 words; mean = 194; geometric mean = 35.94; median = 25

Distribution of frequency of word use (20-100 million words)

Most words used < 100 times per million

r = 0.87, r = 0.88, r = 0.87

Frequency of use is very stable throughout IE

Frequency vs rate of lexical evolution

r=-0.37 r=-0.35

r=-0.41 r=-0.32

Parts of speech (legend): conjunctions, prepositions, adjectives, verbs, nouns, special adverbs, pronouns, numbers

R² = 0.50, R² = 0.48, R² = 0.48, R² = 0.48

Numbers, pronouns, special adverbs

Stronger selection?

Attribute | Genetic systems | Languages
discrete units | nucleotides, genes, individuals | words and other linguistic elements
replication | transcription | teaching, learning, imitation
dominant mode(s) of inheritance | parent-offspring | parent-offspring, generational (including teaching)
horizontal transmission | many mechanisms (e.g., hybridisation, viruses, transposons, insects) | borrowing
mutation | many mechanisms (e.g., slippage, unequal crossing over, point mutations and faulty repair) | mistakes, vowel shifts, innovation
selection of favoured variants | fitness differences among alleles | societal trends

Some similarities between linguistic and genetic systems
