Molecular Evolution and Population Genetics with MATLAB ® James J. Cai.


Outline

  • Introduction
  • Data Manipulation
  • Phylogenetic Inference
  • Nonneutrality Detection

Introduction

MBEToolbox and PGEToolbox

MBEToolbox (Molecular Biology & Evolution)
  • Since March 2003
  • 248 functions (version 2.20)
  • Published:
      BMC Bioinformatics 2005, 6:64 (22 Mar 2005): version 1.0
      Evolutionary Bioinformatics Online, in press: version 2.0
  • m-source code released

PGEToolbox (Population Genetics & Evolution)
  • Since October 2005
  • 227 functions (version 1.37)
  • Published in Journal of Heredity, 2008 Feb 29
  • m-source code released

MBEToolbox
  • Broad classes of substitution models (nt, aa and codon)
  • Wide range of evolutionary distances
  • Synonymous & nonsynonymous substitution rate calculation
  • Model/tree parameter optimization
  • Site-specific rate estimation (ML and EB methods)

PGEToolbox
  • Sequence and SNP (genotype and haplotype) data manipulation
  • Neutrality tests
  • Coalescent simulations
  • Recombination, LD, and long haplotype tests

MBEToolbox GUI

Figure 1. MBEToolbox GUI. (a) Sequences submenu; (b) Distances submenu; (c) Phylogeny submenu and DNAML dialog; (d) Polymorphism submenu and DPRS table dialog.

MBEToolbox Outputs

Figure 2. MBEToolbox Output Examples. (a) Graph submenu; (b) Alignment shading; (c) Enhanced sliding window analysis; (d) Tajima’s test; (e) JTT matrix; (f) Distance vs. transition and transversion; (g) Genetic drift simulation; (h) Distance matrix; (i) NJ tree; (j) 3D Z-curve; (k) MCMC estimation of JC distance.

Plot panels:
  • Sliding Windows Analysis and Enhanced Sliding Windows Analysis: substitution rate (syn, nonsyn) vs. codon site
  • Transitions & Transversions vs. Distance: transitions and transversions vs. distance (HKY)
  • Change in allele frequency for population (diploid) of size N=100: allele frequency p vs. generations
  • JTT matrix over the amino acids A R N D C Q E G H I L K M F P S T W Y V
  • NJ tree of human-ECP, chimp-ECP, gorla-ECP, orang-ECP, macaq-ECP, human-EDN, chimp-EDN, gorla-EDN, orang-EDN, macaq-EDN and tamar-EDN

PGEToolbox GUI

snptool

Data Manipulation

human-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
chimp-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
gorla-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
orang-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTAGTGGTG
macaq-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
human-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCTG
chimp-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCTG
gorla-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCAG
orang-EDN ATGGTTCCAAAACTGTTCACTTCTCAAATTTCCCTGCTTCTTCTGTTGGGGCTTCTGGCTG
macaq-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
tamar-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGCGTGCTTCTTCTTTTCGGGCTTTTGAGTG

human-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
chimp-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
gorla-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
orang-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
macaq-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
human-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
chimp-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
gorla-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
orang-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 4 2 1 1 1 4 4 4 2 2 2 4 3 2 4 4 2 4 4 2 4 3 4 4
macaq-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
tamar-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 2 3 4 3 2 4 4 2 4 4 2 4 4 4 4
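The numeric rows are consistent with the coding A=1, C=2, G=3, T=4 (for example, ATGG becomes 1 4 3 3). A minimal sketch of that conversion using only base MATLAB follows; the variable names are illustrative, and the toolbox's own encoding routine is not shown in the slides.

```matlab
% Encode aligned nucleotide sequences as integers (A=1, C=2, G=3, T=4).
% Rows are the first 15 sites of human-ECP and chimp-ECP from the alignment.
aln = ['ATGGTTCCAAAACTG';
       'ATGGTTCCAAAACTG'];
alphabet = 'ACGT';
[known, S] = ismember(upper(aln), alphabet);  % S is 0 where a symbol is not in the alphabet
if ~all(known(:))
    error('Alignment contains characters other than A, C, G, T.');
end
decoded = alphabet(S);                        % map the integers back to letters
```

Gaps and ambiguity codes would need their own integer value; how the toolbox codes them is not stated here.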

>> S = [1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 4 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1]

>> S(1,:)
>> S([3,4],:)
>> S(:,[1:3:end])
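Because rows are sequences and columns are sites, ordinary matrix indexing is enough to pick out sequences, codon positions, or variable sites. The sketch below uses a small made-up 3 x 9 matrix (not data from the slides) so that the result of each expression is easy to check by eye.

```matlab
% Hypothetical encoded alignment: 3 sequences x 9 sites (A=1, C=2, G=3, T=4).
S = [1 4 3 3 4 4 2 2 1;
     1 4 3 3 4 4 2 2 1;
     1 4 3 3 4 2 2 4 1];

firstSeq = S(1, :);          % one sequence (a row)
twoSeqs  = S([2 3], :);      % several sequences at once
pos1     = S(:, 1:3:end);    % first codon positions (columns 1, 4, 7)
pos3     = S(:, 3:3:end);    % third codon positions (columns 3, 6, 9)

% A site is variable if any two neighbouring rows differ in that column.
isVariable   = any(diff(S, 1, 1) ~= 0, 1);
variableCols = find(isVariable);
```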

==========================================
 Genotype View
==========================================
Idv_1   CT TT GG GG CT TT AG GG TT CC GG TT TT GG AA
Idv_2   CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_3   TT TT GG GG TT CT GG GG TT CC GG TT AT GG AA
Idv_4   CT TT GT AG CT CT GG GG TT CC GG TT AT GG AA
Idv_5   TT TT GT AG TT CT GG GG TT CC GG TT AT GG AA
Idv_6   TT TT GG GG TT TT AG GG TT CC GG TT TT GG AA
Idv_7   CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_8   TT TT GT AG TT CT GG GG TT CC GG TT AT GG AA
Idv_9   TT TT GG GG TT TT GG GG CT CC GG TT TT AG AA
Idv_10  CT TT GG GG CT TT GG GG CT CC GG TT TT GG AA
Idv_11  CT TT GT AG CT CT GG GG TT CC GG TT AT GG AA
Idv_12  TT TT GG GG TT TT AG GG TT CC GG TT TT GG AA
Idv_13  CT TT GG GG CT TT AG GG TT CC GG TT TT GG AA
Idv_14  CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_15  TT TT GG GG TT TT GG AG TT CT AG CT AT GG AG
Idv_16  CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_17  TT TT GT AG TT CT GG GG TT CC GG TT AT GG AA
Idv_18  TT CT GT AG TT CT AG GG TT CC GG TT AT GG AA
Idv_19  TT TT GG GG TT TT GG GG TT CC GG TT TT GG AA
Idv_20  CT CT GG GG CT TT GG GG TT CC GG TT TT AG AA

Human Genome Diversity Project (HGDP)

  • 1,043 individuals x 650,000 SNP loci
  • 51 different populations from Africa, Europe, the Middle East, South and Central Asia, East Asia, Oceania and the Americas
  • Using uint8 and memory-map (memmapfile) functions
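At 1,043 x 650,000 genotypes, one byte per genotype is about 0.68 GB, against about 5.4 GB for the default double type, which is the motivation for uint8. A minimal sketch of the idea with a toy matrix is given below; the genotype coding (0, 1, 2), the temporary file, and the field name `geno` are assumptions made for illustration, not details taken from the toolbox.

```matlab
% Store a genotype matrix as uint8 on disk and access it through a memory map.
nInd = 4;  nSnp = 6;                        % toy sizes; HGDP would be 1043 x 650000
G = uint8(randi([0 2], nInd, nSnp));        % hypothetical coding: allele copy number

fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, G, 'uint8');                    % data are written column by column
fclose(fid);

% Map the file as one nInd x nSnp uint8 array named 'geno'.
m = memmapfile(fname, 'Format', {'uint8', [nInd nSnp], 'geno'});
snp3 = m.Data.geno(:, 3);                   % only the requested part is read

clear m                                     % release the map before deleting the file
delete(fname);
```

Since MATLAB stores arrays column by column, reading all individuals at one SNP touches a contiguous block of the file, which suits per-locus analyses.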

Phylogenetic Inferences

Phylogenetic Tree

  • Binary tree
  • Tree topology
  • Branch lengths (evolutionary time)

Computing the likelihood of a tree model

Assumption
  • Given a correct multiple alignment of n sequences of length L
  • X = {x_{i,j}}, where x_{i,j} is the jth character in the ith sequence
  • X_j = the jth column of the alignment

Tree model
  • Q: substitution rate matrix (4 × 4 for nucleotides)
  • τ: tree topology
  • t: a vector of branch lengths
  • π: a vector of equilibrium base frequencies

Computing the likelihood of a tree model

The likelihood of a given tree model:
  • With the assumption of site independence, the problem reduces to computing the likelihood of each column X_j.
  • This in turn reduces to a summation over all possible labelings of the ancestral nodes of the tree.
  • L is a labeling of the n-1 ancestral nodes of the tree with elements from the sequence alphabet (A, C, G, T for nucleotides).
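The two reductions described on this slide can be written out as follows. The notation is an assumption, since the slide's own formulas did not survive: M stands for the tree model, Σ for the alphabet, and ℓ for a labeling of the ancestral nodes (written ℓ rather than L to avoid a clash with the alignment length L).

```latex
% Site independence: the alignment likelihood is a product over columns.
\[
P(X \mid M) \;=\; \prod_{j=1}^{L} P(X_j \mid M),
\qquad
\ln P(X \mid M) \;=\; \sum_{j=1}^{L} \ln P(X_j \mid M)
\]
% Each column likelihood is a sum over all labelings of the n-1 ancestral nodes.
\[
P(X_j \mid M) \;=\; \sum_{\ell \,\in\, \Sigma^{\,n-1}} P(X_j, \ell \mid M)
\]
```

For nucleotides the inner sum has 4^(n-1) terms, so a tree with 3 ancestral nodes gives 4^3 = 64 labelings.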

Computing the likelihood of a tree model

The probability must be summed over all possible combinations of ancestral nucleotides. (Here we have 3 internal nodes, giving 4^3 = 64 possible combinations.)

Assume the ancestral states were all ‘A’. The computation can start at any internal or external node:

\[
\Pr = \pi_G \cdot P_{GA}(t_1) \cdot P_{AA}(t_2) \cdot P_{AA}(t_3) \cdots P_{AC}(t_6)
\]
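The sum over 64 ancestral combinations can be checked directly on a small case. The sketch below is an illustration under stated assumptions, not the toolbox's implementation: it assumes a rooted four-leaf tree ((1,2),(3,4)), a Jukes-Cantor rate matrix, and made-up branch lengths and leaf states. It computes the likelihood of one site by brute force and again by Felsenstein's pruning recursion, which gives the same value without enumerating the labelings.

```matlab
% Jukes-Cantor rate matrix: off-diagonal 1/3, diagonal -1, so rows sum to zero.
Q = ones(4) / 3 - (4 / 3) * eye(4);
piFreq = [0.25 0.25 0.25 0.25];             % equilibrium frequencies (row vector)

% Hypothetical branch lengths: 1-4 lead to the leaves, 5 and 6 to nodes u and v.
t = [0.10 0.20 0.15 0.30 0.05 0.08];
P = cell(1, 6);
for b = 1:6
    P{b} = expm(Q * t(b));                  % substitution probabilities on branch b
end
x = [1 1 2 3];                              % hypothetical leaf states: A A C G

% Brute force: sum over all 4^3 = 64 labelings of root r and internal nodes u, v.
lik = 0;
for r = 1:4
    for u = 1:4
        for v = 1:4
            lik = lik + piFreq(r) * P{5}(r, u) * P{6}(r, v) ...
                * P{1}(u, x(1)) * P{2}(u, x(2)) ...
                * P{3}(v, x(3)) * P{4}(v, x(4));
        end
    end
end

% Pruning: conditional likelihoods are combined from the leaves up to the root.
Lu = P{1}(:, x(1)) .* P{2}(:, x(2));        % 4 x 1, one entry per state at u
Lv = P{3}(:, x(3)) .* P{4}(:, x(4));        % 4 x 1, one entry per state at v
Lr = (P{5} * Lu) .* (P{6} * Lv);            % 4 x 1, one entry per state at the root
likPruned = piFreq * Lr;
```

The brute-force loop grows as 4^(n-1) with the number of sequences n, while the pruning version grows linearly in the number of nodes.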

Models of DNA Substitution

Probabilistic model parameters (simplest case):
  • Continuous-time Markov rate matrix: Q = {q_{i,j}}, where q_{i,j} is the nucleotide-to-nucleotide substitution rate
  • P(b|a,t): probability that a base b is substituted for a base a over a branch of length t

  • Instantaneous Rate Matrix
  • Equilibrium Frequencies
  • Transition Rate Matrix
  • Probability Matrix (function of time t)

Define a Substitution Model

>> model=modeljc
model =
    name: 'jc'
       R: [4x4 double]
    freq: [0.2500 0.2500 0.2500 0.2500]

>> Q=model.R*diag(model.freq)
>> P=expm(Q*0.5)
P =
    1.0027    0.0435    0.0435    0.0435
    0.0435    1.0027    0.0435    0.0435
    0.0435    0.0435    1.0027    0.0435
    0.0435    0.0435    0.0435    1.0027

>> model.R
ans =
         0    0.3333    0.3333    0.3333
    0.3333         0    0.3333    0.3333
    0.3333    0.3333         0    0.3333
    0.3333    0.3333    0.3333         0
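The rows of the displayed P sum to about 1.13 (1.0027 + 3 x 0.0435), which is what the matrix exponential gives when Q has a zero diagonal. For P to be a matrix of substitution probabilities, the usual convention sets each diagonal entry of Q to minus the sum of the other entries in its row. The sketch below applies that convention with base MATLAB only; it rebuilds R and freq by hand from the values shown on the slide rather than calling `modeljc`, and whether the toolbox applies this step elsewhere is not stated in the slides.

```matlab
% Rebuild the Jukes-Cantor inputs shown on the slide.
R = (ones(4) - eye(4)) / 3;         % exchangeabilities: 0 on the diagonal, 1/3 elsewhere
freq = [0.25 0.25 0.25 0.25];       % equilibrium base frequencies

Q = R * diag(freq);                 % off-diagonal rates, diagonal still zero
Q = Q - diag(sum(Q, 2));            % set the diagonal so that every row sums to zero

P = expm(Q * 0.5);                  % substitution probabilities for branch length 0.5
rowSums = sum(P, 2);                % each entry should equal 1
```

With this Q the off-diagonal entries of P equal (1 - exp(-1/6))/4, the closed-form Jukes-Cantor result for these rates.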

Find a Better Tree

>> tree1='((gorla-ECP:0.00664,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.00542,chimp-EDN:0.00312):0.00277,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);'

>> tree2='((gorla-ECP:0.00173,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.02671,chimp-EDN:0.00312):0.00177,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);'

>> model=modeljc;
>> lnL1=treelike(aln,tree1,model)
>> lnL2=treelike(aln,tree2,model)

Find a Better Model

>> tree='((gorla-ECP:0.00664,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.00542,chimp-EDN:0.00312):0.00277,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);'

>> model1=modeljc;
>> model2=modelk2p(2);
>> lnL1=treelike(aln,tree,model1)
>> lnL2=treelike(aln,tree,model2)
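One common way to decide whether the richer model is "better" is a likelihood-ratio test, since JC is the special case of K2P with equal transition and transversion rates. The sketch below is only an illustration: the two log-likelihood values are made up (the slides do not show the output of `treelike`), and the test assumes both log-likelihoods are maximized over the free parameters, which is not the case when kappa is simply fixed at 2.

```matlab
% Hypothetical log-likelihoods; not results from the slides.
lnL1 = -1520.4;                  % simpler model (JC)
lnL2 = -1498.7;                  % richer model (K2P)
df   = 1;                        % K2P has one extra free parameter (kappa)

stat = 2 * (lnL2 - lnL1);        % likelihood-ratio statistic

% Upper-tail probability of a chi-square variable with 1 degree of freedom,
% written with erfc so that no toolbox function is required.
pValue = erfc(sqrt(stat / 2));
```

A small pValue suggests the extra parameter improves the fit by more than chance would explain. For df greater than 1 the erfc shortcut no longer applies.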

Goldman and Yang’s Codon Model (GY94)

\[
q_{ij} =
\begin{cases}
0, & \text{if } i \text{ and } j \text{ differ by two or more positions} \\
\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transversion} \\
\kappa \pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transition} \\
\omega \kappa \pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transition} \\
\omega \pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transversion}
\end{cases}
\]

Here $q_{ij}$ is the substitution rate from codon $i$ to codon $j$, $\pi_j$ is the equilibrium frequency of codon $j$, $\kappa$ is the transition/transversion rate ratio, and $\omega$ is the nonsynonymous/synonymous rate ratio.

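Modified Codon Model (GY94m)

\[
q_{ij} =
\begin{cases}
0, & \text{if } i \text{ and } j \text{ differ by more than one nucleotide} \\
\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transversion} \\
\kappa_R \pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transition between purines} \\
\kappa_Y \pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transition between pyrimidines} \\
\omega \pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transversion} \\
\omega \kappa_R \pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transition between purines} \\
\omega \kappa_Y \pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transition between pyrimidines}
\end{cases}
\]

Zhang et al. (2006)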

>> [dS,dN,dN_dS,lnL] = dc_gy94(aln,i,j)
>> [dS,dN,dN_dS,lnL] = dc_gy94m(aln,i,j)

Nonneutrality Detection

Methods for Detecting Selection

  • Phylogenetic Likelihood Method (with Divergence Data), e.g., dN/dS Test
  • SFS-based Methods (with Polymorphism Data), e.g., Fay and Wu H Test
  • LD-based Tests, e.g., EHH Test
  • Methods Using both Polymorphism and Divergence Data, e.g., McDonald and Kreitman Test

PGEToolbox GUI

Locus under positive selection

MK test GUI

Availability

MBEToolbox: http://www.bioinformatics.org/mbetoolbox

PGEToolbox: http://www.bioinformatics.org/pgetoolbox

Thank You!

Polymorphism vs. Divergence Method

Neutral Theory of Molecular Evolution (M. Kimura): most genomic regions are thought to be evolving neutrally; that is, they accumulate mutations (by random genetic drift) that do not influence the fitness of the organism.

Neutral theory predicts that the ratio of replacement to silent substitutions should be the same both within and between species (the “null” model).

When a gene is compared between species, a greater proportion of replacement substitutions between species (“fixed” differences) than within species would indicate positive selection for divergence.

McDonald-Kreitman Test

Subset of Data from McDonald & Kreitman (1991)

Results: McDonald & Kreitman (1991)

                 Within spp.     Between spp.
                 (polymorphic)   (fixed)
  Replacement          2              7
  Synonymous          42             17

  • Polymorphic vs. fixed: 2/7 << 42/17 (replacement vs. synonymous)
  • Replacement vs. synonymous: 7/17 >> 2/42 (fixed vs. polymorphic), which indicates positive selection
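The comparison of ratios can be turned into a significance test on the 2 x 2 table. The sketch below uses the four counts quoted on the slide (7 and 17 fixed, 2 and 42 polymorphic) and computes a neutrality index and a two-sided Fisher's exact test from the hypergeometric distribution with base MATLAB only. It is an illustration of the idea, not the code behind the MK test GUI, and the original paper's choice of test statistic may differ.

```matlab
% Rows: replacement, synonymous. Columns: fixed (between spp.), polymorphic (within spp.).
T = [ 7  2;
     17 42];

% Neutrality index: (Pn/Ps) / (Dn/Ds); values below 1 point towards an excess
% of fixed replacement differences.
NI = (T(1, 2) / T(2, 2)) / (T(1, 1) / T(2, 1));

% Fisher's exact test: probability of every table with the same margins.
r1 = sum(T(1, :));                       % total replacement changes
c1 = sum(T(:, 1));                       % total fixed differences
N  = sum(T(:));
k  = max(0, r1 + c1 - N):min(r1, c1);    % possible counts of fixed replacements
pk = arrayfun(@(a) nchoosek(c1, a) * nchoosek(N - c1, r1 - a), k) / nchoosek(N, r1);

pObs    = pk(k == T(1, 1));              % probability of the observed table
pFisher = sum(pk(pk <= pObs * (1 + 1e-7)));   % two-sided: tables no more likely than observed
```

The small tolerance factor guards against floating-point ties when comparing table probabilities.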

Limitations

Parallelism in MATLAB

There are four main approaches to adding parallel functionality to MATLAB:

1. Provide communication routines (MPI/PVM) in MATLAB.
2. Provide routines to split up work among multiple MATLAB sessions.
3. Provide a parallel backend to MATLAB.
4. Compile MATLAB scripts into native parallel code.

MATLAB Pointers Library

Site-Frequency Spectrum

Site               163  975 1972 2188 3529 4424 4961 5286 7019
Sequence 1           A    G    G    C    T    T    A    A    A
Sequence 2           A    T    G    C    T    C    G    A    A
Sequence 3           G    T    G    T    T    C    A    C    G
Sequence 4           A    G    G    C    T    C    A    A    G
Sequence 5           A    G    A    C    C    C    G    A    A
Frequency class      1    2    1    1    1    4    2    1    3

Alleles are marked as ancestral or derived; the frequency class of a site is the number of sequences carrying the derived allele.

Figure: The frequency spectrum (count of sites vs. frequency class, classes 1 to 4).
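Given the frequency class of each site, the spectrum is simply a tally of how many sites fall into each class. A minimal sketch using the nine frequency classes listed on the slide is shown below; the variable names are illustrative.

```matlab
% Frequency class (number of derived alleles) for each of the nine sites.
derivedCount = [1 2 1 1 1 4 2 1 3];
n = 5;                                   % number of sequences in the sample

% sfs(k) = number of sites whose derived allele appears in exactly k sequences.
sfs = accumarray(derivedCount(:), 1, [n - 1, 1])';
```

Here `sfs` has n-1 entries because a polymorphic site in a sample of n sequences can carry between 1 and n-1 copies of the derived allele.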

Comparing frequency spectra for different classes of mutation

Figure: Observed frequency spectra (count vs. frequency class, classes 1 to 9) for putatively neutral and potentially selected mutations.

Tests of selection based on estimates of Θ: Tajima’s D

\[
D = \frac{\pi - K/a}{\sqrt{V(\pi - K/a)}}
\]

where K is the number of segregating sites, $a = \sum_{i=1}^{n-1} 1/i$ for a sample of n sequences, and V denotes the variance.

Tajima’s D tests whether the estimate of Θ from π differs significantly from the estimate from K:
  • If there are many polymorphisms at very low frequency (as expected under purifying selection), the estimate from K will be higher than the estimate from π, so D will be negative.
  • If allele frequencies are being increased by overdominant (balancing) selection, π will be increased without any effect on K, so D will be positive.
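As a worked illustration, the statistic can be computed for the five-sequence, nine-site example from the Site-Frequency Spectrum slide. The sketch below uses base MATLAB and the standard constants from Tajima (1989) for the variance term; it is not the toolbox's own routine, and with so few sequences the value is only a demonstration of the arithmetic.

```matlab
% The nine polymorphic sites from the five example sequences.
aln = ['AGGCTTAAA';
       'ATGCTCGAA';
       'GTGTTCACG';
       'AGGCTCAAG';
       'AGACCCGAA'];
n = size(aln, 1);

% K: number of segregating sites (columns where neighbouring rows differ).
K = sum(any(diff(double(aln), 1, 1) ~= 0, 1));

% pi: average number of differences over all pairs of sequences.
pairDiff = 0;
for i = 1:n-1
    for j = i+1:n
        pairDiff = pairDiff + sum(aln(i, :) ~= aln(j, :));
    end
end
piHat = pairDiff / nchoosek(n, 2);

% Constants of the variance term (Tajima 1989).
a1 = sum(1 ./ (1:n-1));
a2 = sum(1 ./ (1:n-1).^2);
b1 = (n + 1) / (3 * (n - 1));
b2 = 2 * (n^2 + n + 3) / (9 * n * (n - 1));
c1 = b1 - 1 / a1;
c2 = b2 - (n + 2) / (a1 * n) + a2 / a1^2;
e1 = c1 / a1;
e2 = c2 / (a1^2 + a2);

D = (piHat - K / a1) / sqrt(e1 * K + e2 * K * (K - 1));
```

For these data piHat is 4.2 and K/a1 is 4.32, so D is slightly negative, reflecting the excess of singleton sites in the example.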