Molecular Evolution and Population Genetics with MATLAB®
James J. Cai
Posted 18-Dec-2015
Outline
Introduction
Data Manipulation
Phylogenetic Inference
Nonneutrality Detection
Introduction
MBEToolbox and PGEToolbox
MBEToolbox (Molecular Biology & Evolution)
Since March 2003; 248 functions (version 2.20)
Published: BMC Bioinformatics 2005, 6:64 (22 Mar 2005), version 1.0; Evolutionary Bioinformatics Online, in press, version 2.0
M-source code released
PGEToolbox (Population Genetics & Evolution)
Since October 2005; 227 functions (version 1.37)
Published in Journal of Heredity (29 Feb 2008)
M-source code released
MBEToolbox
Broad classes of substitution models (nucleotide, amino acid and codon)
Wide range of evolutionary distances
Synonymous and nonsynonymous substitution rate calculation
Model/tree parameter optimization
Site-specific rate estimation (ML and EB methods)
PGEToolbox
Sequence and SNP (genotype and haplotype) data manipulation
Neutrality tests
Coalescent simulations
Recombination, LD, and long-haplotype tests
MBEToolbox GUI
Figure 1. MBEToolbox GUI. (a) Sequences submenu; (b) Distances submenu; (c) Phylogeny submenu and DNAML dialog; (d) Polymorphism submenu and DPRS table dialog.
MBEToolbox Outputs
[Figure 2 images not reproduced. Recoverable panel titles and axes: "Sliding Windows Analysis" and "Enhanced Sliding Windows Analysis" (substitution rate vs. codon site; syn, nonsyn); "Change in allele frequency for population (diploid) of size N=100" (allele frequency p vs. generations); JTT amino acid matrix; tree of the human, chimp, gorla, orang and macaq ECP and EDN sequences plus tamar-EDN; "Transitions & Transversions vs. Distance" (transitions and transversions vs. distance (HKY)); a 3D plot with X, Y and Z axes.]
Figure 2. MBEToolbox Output Examples. (a) Graph submenu; (b) Alignment shading; (c) Enhanced sliding window analysis; (d) Tajima’s test; (e) JTT matrix; (f) Distance vs. transition and transversion; (g) Genetic drift simulation; (h) Distance matrix; (i) NJ tree; (j) 3D Z-curve; (k) MCMC estimation of JC distance.
PGEToolbox GUI
snptool
Data Manipulation
human-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
chimp-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
gorla-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
orang-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTAGTGGTG
macaq-ECP ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
human-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCTG
chimp-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCTG
gorla-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTCTGGCAG
orang-EDN ATGGTTCCAAAACTGTTCACTTCTCAAATTTCCCTGCTTCTTCTGTTGGGGCTTCTGGCTG
macaq-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGTCTGCTTCTTCTGTTGGGGCTTATGGGTG
tamar-EDN ATGGTTCCAAAACTGTTCACTTCCCAAATTTGCGTGCTTCTTCTTTTCGGGCTTTTGAGTG
Numeric view (A = 1, C = 2, G = 3, T = 4; first 47 sites)
human-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
chimp-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
gorla-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
orang-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
macaq-ECP 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
human-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
chimp-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
gorla-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
orang-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 4 2 1 1 1 4 4 4 2 2 2 4 3 2 4 4 2 4 4 2 4 3 4 4
macaq-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 4 2 4 3 2 4 4 2 4 4 2 4 3 4 4
tamar-EDN 1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1 1 1 4 4 4 3 2 3 4 3 2 4 4 2 4 4 2 4 4 4 4
>> S = [1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 4 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1;
        1 4 3 3 4 4 2 2 1 1 1 1 2 4 3 4 4 2 1 2 4 4 2 2 2 1]
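>> S(1,:)            % the first sequence (row 1)
>> S([3,4],:)        % sequences 3 and 4
>> S(:,[1:3:end])    % every third site starting at site 1, i.e., first codon positions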
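The numeric view above uses A = 1, C = 2, G = 3, T = 4. A minimal sketch of that encoding and of the indexing commands, using only base MATLAB (`ismember`, `reshape`) rather than MBEToolbox functions; the variable names are illustrative.

```matlab
% First 26 sites of human-ECP (from the alignment above)
seq = 'ATGGTTCCAAAACTGTTCACTTCCCA';

% Encode nucleotides as integers: A=1, C=2, G=3, T=4
[~, s] = ismember(seq, 'ACGT');

% Every third site starting at 1 = first codon positions (the reading frame starts at ATG)
firstPos = s(1:3:end);

% Group the first 24 sites into 8 codons, one codon per row
codons = reshape(seq(1:24), 3, []).';
disp(codons(1, :))   % ATG
```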
==========================================
Genotype View
==========================================
Idv_1  CT TT GG GG CT TT AG GG TT CC GG TT TT GG AA
Idv_2  CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_3  TT TT GG GG TT CT GG GG TT CC GG TT AT GG AA
Idv_4  CT TT GT AG CT CT GG GG TT CC GG TT AT GG AA
Idv_5  TT TT GT AG TT CT GG GG TT CC GG TT AT GG AA
Idv_6  TT TT GG GG TT TT AG GG TT CC GG TT TT GG AA
Idv_7  CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_8  TT TT GT AG TT CT GG GG TT CC GG TT AT GG AA
Idv_9  TT TT GG GG TT TT GG GG CT CC GG TT TT AG AA
Idv_10 CT TT GG GG CT TT GG GG CT CC GG TT TT GG AA
Idv_11 CT TT GT AG CT CT GG GG TT CC GG TT AT GG AA
Idv_12 TT TT GG GG TT TT AG GG TT CC GG TT TT GG AA
Idv_13 CT TT GG GG CT TT AG GG TT CC GG TT TT GG AA
Idv_14 CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_15 TT TT GG GG TT TT GG AG TT CT AG CT AT GG AG
Idv_16 CT TT GG GG CT TT GG GG TT CC GG TT TT GG AA
Idv_17 TT TT GT AG TT CT GG GG TT CC GG TT AT GG AA
Idv_18 TT CT GT AG TT CT AG GG TT CC GG TT AT GG AA
Idv_19 TT TT GG GG TT TT GG GG TT CC GG TT TT GG AA
Idv_20 CT CT GG GG CT TT GG GG TT CC GG TT TT AG AA
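A minimal sketch of a basic computation on genotype data like the view above: the allele frequency and observed heterozygosity at the first locus, computed with base MATLAB on genotypes typed in by hand (PGEToolbox's own data structures and functions are not shown here).

```matlab
% Genotypes of Idv_1 ... Idv_20 at the first locus of the Genotype View
g = {'CT','CT','TT','CT','TT','TT','CT','TT','TT','CT', ...
     'CT','TT','CT','CT','TT','CT','TT','TT','TT','CT'};

alleles = [g{:}];                           % 40 allele calls, two per diploid individual
pC = mean(alleles == 'C');                  % frequency of allele C
isHet = cellfun(@(x) x(1) ~= x(2), g);      % true for heterozygous individuals
hetObs = mean(isHet);                       % observed heterozygosity
fprintf('p(C) = %.3f, observed heterozygosity = %.3f\n', pC, hetObs);
```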
Human Genome Diversity Project (HGDP)
1,043 individuals × 650,000 SNP loci
51 different populations from Africa, Europe, the Middle East, South and Central Asia, East Asia, Oceania and the Americas
Using uint8 and memorymap functions
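Why uint8 matters at this scale, as back-of-the-envelope arithmetic: one byte per genotype instead of the eight bytes of a double. The 0/1/2 genotype coding below is a hypothetical example. MATLAB's built-in memory-mapping function is `memmapfile`; the slide does not say how `memorymap` relates to it.

```matlab
nInd = 1043;  nSnp = 650000;        % HGDP dimensions quoted above

bytesDouble = nInd * nSnp * 8;      % 8 bytes per genotype stored as double
bytesUint8  = nInd * nSnp * 1;      % 1 byte per genotype stored as uint8
fprintf('double: %.2f GB, uint8: %.2f GB\n', bytesDouble / 1e9, bytesUint8 / 1e9);

% Hypothetical coding: number of copies of one allele (0, 1 or 2) per individual and SNP
G = uint8([0 1 2; 2 1 0]);
info = whos('G');                   % whos reports the memory actually used by G
fprintf('%d genotypes occupy %d bytes\n', numel(G), info.bytes);
```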
Phylogenetic Inferences
Phylogenetic Tree
Binary tree; tree topology; branch lengths (evolutionary time)
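The number of possible topologies is what makes tree search hard. A small sketch using the standard counts of (2n − 5)!! unrooted and (2n − 3)!! rooted binary topologies for n sequences. Here n = 11 matches the ECP/EDN examples; a rooted binary tree with 11 leaves has 2n − 2 = 20 branches.

```matlab
n = 11;                             % number of sequences in the ECP/EDN examples
nUnrooted = prod(1:2:(2*n - 5));    % (2n-5)!! = 1*3*5*...*(2n-5) unrooted binary topologies
nRooted   = prod(1:2:(2*n - 3));    % (2n-3)!! rooted binary topologies
nBranches = 2*n - 2;                % branches in a rooted binary tree with n leaves
fprintf('%d unrooted, %d rooted topologies; %d branches\n', nUnrooted, nRooted, nBranches);
```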
Computing the likelihood of a tree model
Assumption: we are given a correct multiple alignment of n sequences of length L.
$X = \{x_{i,j}\}$, where $x_{i,j}$ is the jth character in the ith sequence
$X_j$ = the jth column of the alignment
Tree model:
$Q$: substitution rate matrix of dimension $|\Sigma| \times |\Sigma|$
$T$: tree topology
$t$: a vector of branch lengths
$\pi$: a vector of equilibrium base frequencies
Computing the likelihood of a tree model
The likelihood of a given tree model is $P(X \mid T, t, Q, \pi)$.
Assuming that sites evolve independently, this reduces to computing the likelihood of each column $X_j$:
$$ P(X \mid T, t, Q, \pi) = \prod_{j=1}^{L} P(X_j \mid T, t, Q, \pi) $$
Each column likelihood is, in turn, a sum over all possible labelings of the ancestral nodes of the tree:
$$ P(X_j \mid T, t, Q, \pi) = \sum_{\mathcal{L}} P(X_j, \mathcal{L} \mid T, t, Q, \pi) $$
where $\mathcal{L}$ is a labeling of the n − 1 ancestral nodes of the tree with elements from $\Sigma = \{A, C, G, T\}$.
Computing the likelihood of a tree model
The probability must be summed over all possible combinations of ancestral nucleotides.
(Here there are 3 internal nodes, giving 4^3 = 64 possible combinations.)
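A brute-force sketch of this sum for one alignment column on a hypothetical rooted four-taxon tree ((1,2)u,(3,4)v)r, which has 3 internal nodes and 6 branches. It assumes the JC69 model in closed form; the branch lengths and leaf states are made-up values for illustration.

```matlab
% JC69 transition probabilities for branch length t (expected substitutions per site)
Pjc = @(t) 0.25 + (eye(4) - 0.25) * exp(-4 * t / 3);

% Hypothetical rooted tree ((1,2)u,(3,4)v)r with six branches
Pu = Pjc(0.05);                   % branch r -> u (hypothetical length)
Pv = Pjc(0.08);                   % branch r -> v (hypothetical length)
P1 = Pjc(0.10); P2 = Pjc(0.20);   % branches u -> leaf 1, u -> leaf 2
P3 = Pjc(0.15); P4 = Pjc(0.30);   % branches v -> leaf 3, v -> leaf 4
freq = [0.25 0.25 0.25 0.25];     % equilibrium base frequencies
x = [3 1 1 2];                    % hypothetical observed column G A A C (A=1,C=2,G=3,T=4)

siteLik = 0; nTerms = 0;
for r = 1:4                       % state at the root
    for u = 1:4                   % state at internal node u
        for v = 1:4               % state at internal node v
            siteLik = siteLik + freq(r) * Pu(r,u) * Pv(r,v) ...
                * P1(u,x(1)) * P2(u,x(2)) * P3(v,x(3)) * P4(v,x(4));
            nTerms = nTerms + 1;
        end
    end
end
fprintf('%d labelings, site likelihood = %.6g\n', nTerms, siteLik);
```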
Computing the likelihood of a tree model
For example, assume all ancestral states are A. The computation can start at any internal or external node:
$$ \Pr = \pi_G \cdot P_{GA}(t_1) \cdot P_{AA}(t_2) \cdot P_{AA}(t_3) \cdots P_{AC}(t_6) $$
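Summing all 4^(n−1) labelings grows exponentially with the number of sequences. A standard alternative, Felsenstein's pruning algorithm (not described on the slides), computes the same sum by working from the leaves towards the root. A sketch on a hypothetical four-taxon JC69 tree ((1,2)u,(3,4)v)r, with a check that the likelihoods of all 256 possible columns sum to 1:

```matlab
% JC69 transition probabilities for branch length t
Pjc = @(t) 0.25 + (eye(4) - 0.25) * exp(-4 * t / 3);
Pu = Pjc(0.05); Pv = Pjc(0.08);                                   % hypothetical internal branches
P1 = Pjc(0.10); P2 = Pjc(0.20); P3 = Pjc(0.15); P4 = Pjc(0.30);   % hypothetical leaf branches
freq = [0.25 0.25 0.25 0.25];

% Conditional likelihoods at u and v (4x1 vectors over their states) are combined
% at the root r and then weighted by the equilibrium frequencies
prune = @(x) freq * ((Pu * (P1(:,x(1)) .* P2(:,x(2)))) .* (Pv * (P3(:,x(3)) .* P4(:,x(4)))));

lik = prune([3 1 1 2]);          % hypothetical column G A A C

% Sanity check: the likelihoods of all 4^4 = 256 possible columns sum to 1
total = 0;
for a = 1:4
    for b = 1:4
        for c = 1:4
            for d = 1:4
                total = total + prune([a b c d]);
            end
        end
    end
end
fprintf('site likelihood = %.6g, sum over all columns = %.12f\n', lik, total);
```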
Models of DNA Substitution
Probabilistic model parameters (simplest case): a continuous-time Markov rate matrix
$Q = \{q_{i,j}\}$, where $q_{i,j}$ is the instantaneous rate of substitution from nucleotide i to nucleotide j;
$P(b \mid a, t)$: the probability that base a is replaced by base b over a branch of length t.
Components of the model:
Transition rate matrix R
Equilibrium frequencies π
Instantaneous rate matrix $Q = R \cdot \mathrm{diag}(\pi)$, with diagonal entries set so that each row sums to zero
Probability matrix (a function of time t): $P(t) = e^{Qt}$
Define a Substitution Model
>> model = modeljc
model =
    name: 'jc'
       R: [4x4 double]
    freq: [0.2500 0.2500 0.2500 0.2500]
>> Q = model.R*diag(model.freq);
>> Q = Q - diag(sum(Q,2));   % set the diagonal so that each row of Q sums to zero
>> P = expm(Q*0.5)
P =
    0.8849    0.0384    0.0384    0.0384
    0.0384    0.8849    0.0384    0.0384
    0.0384    0.0384    0.8849    0.0384
    0.0384    0.0384    0.0384    0.8849
>> model.R
ans =
         0    0.3333    0.3333    0.3333
    0.3333         0    0.3333    0.3333
    0.3333    0.3333         0    0.3333
    0.3333    0.3333    0.3333         0
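A self-contained sketch that builds a JC rate matrix in base MATLAB (without `modeljc`), sets the diagonal so that each row of Q sums to zero, and checks `expm(Q*t)` against the JC69 closed form P_ii(t) = 1/4 + 3/4·exp(−4μt/3), where μ is the total rate of leaving a base. It also checks that each row of P sums to 1 and that the equilibrium frequencies are stationary (πP = π).

```matlab
R = (ones(4) - eye(4)) / 3;      % JC exchangeabilities: 1/3 off the diagonal
freq = [0.25 0.25 0.25 0.25];    % equilibrium base frequencies

Q = R * diag(freq);
Q = Q - diag(sum(Q, 2));         % diagonal entries make every row of Q sum to zero

t = 0.5;
P = expm(Q * t);                 % transition probability matrix P(t) = e^(Qt)

mu = -Q(1, 1);                   % total rate of leaving a base (0.25 here)
pSame = 1/4 + 3/4 * exp(-4 * mu * t / 3);   % JC69 closed form for P_ii(t)
fprintf('P(1,1) = %.4f, closed form = %.4f\n', P(1, 1), pSame);
```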
Find a Better Tree
>> tree1='((gorla-ECP:0.00664,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.00542,chimp-EDN:0.00312):0.00277,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);';
>> tree2='((gorla-ECP:0.00173,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.02671,chimp-EDN:0.00312):0.00177,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);';   % same topology; three branch lengths differ
>> model=modeljc;
>> lnL1=treelike(aln,tree1,model)
>> lnL2=treelike(aln,tree2,model)   % the tree with the higher log-likelihood fits the alignment better
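The two trees differ only in branch lengths. A sketch that checks this with regular expressions (base MATLAB `regexp`/`regexprep`, not an MBEToolbox parser): it strips the branch lengths to compare topologies and lists which branch lengths changed. It assumes plain Newick strings with no internal node labels and no scientific-notation lengths.

```matlab
tree1 = '((gorla-ECP:0.00664,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.00542,chimp-EDN:0.00312):0.00277,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);';
tree2 = '((gorla-ECP:0.00173,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.02671,chimp-EDN:0.00312):0.00177,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);';

% Branch lengths are the numbers that follow a colon
bl1 = str2double(regexp(tree1, '(?<=:)[0-9.]+', 'match'));
bl2 = str2double(regexp(tree2, '(?<=:)[0-9.]+', 'match'));

% Removing the branch lengths leaves only the topology and the leaf names
sameTopology = strcmp(regexprep(tree1, ':[0-9.]+', ''), regexprep(tree2, ':[0-9.]+', ''));
changed = find(bl1 ~= bl2);      % positions (in Newick order) of branch lengths that differ
fprintf('same topology: %d, branches changed: %d, tree lengths: %.5f vs %.5f\n', ...
        sameTopology, numel(changed), sum(bl1), sum(bl2));
```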
Find a Better Model
>> tree='((gorla-ECP:0.00664,(chimp-ECP:0.00578,((orang-ECP:0.02515,(((((human-EDN:0.00542,chimp-EDN:0.00312):0.00277,gorla-EDN:0.00365):0.01918,orang-EDN:0.02427):0.01979,macaq-EDN:0.07058):0.02561,tamar-EDN:0.11203):0.05274):0.02151,macaq-ECP:0.04586):0.00935):0.00064):0.00095,human-ECP:0.00095);';
>> model1=modeljc;
>> model2=modelk2p(2);
>> lnL1=treelike(aln,tree,model1)
>> lnL2=treelike(aln,tree,model2)
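JC is nested within K2P (K2P with equal transition and transversion rates), so a likelihood ratio test is a common way to decide whether the extra parameter is worth it. A sketch under two assumptions: the log-likelihood values are hypothetical placeholders, and κ is estimated from the data. If `modelk2p(2)` instead fixes κ at 2, both models have the same number of free parameters and the log-likelihoods can simply be compared.

```matlab
% Hypothetical log-likelihoods; replace with the lnL1 and lnL2 returned by treelike
lnL1 = -1250.0;                  % JC (model1)
lnL2 = -1240.0;                  % K2P (model2)

LR = 2 * (lnL2 - lnL1);          % likelihood ratio statistic
df = 1;                          % assumes K2P estimates one extra parameter (kappa)
pValue = erfc(sqrt(LR / 2));     % upper tail of a chi-square distribution with 1 df
fprintf('LR = %.2f, df = %d, p = %.3g\n', LR, df, pValue);
```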
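Goldman and Yang’s Codon Model (GY94)
$$
q_{ij} =
\begin{cases}
0, & \text{if } i \text{ and } j \text{ differ at two or more positions} \\
\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transversion} \\
\kappa\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transition} \\
\omega\kappa\pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transition} \\
\omega\pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transversion}
\end{cases}
$$
where $q_{ij}$ is the instantaneous rate from codon i to codon j (i ≠ j), $\pi_j$ is the equilibrium frequency of codon j, κ is the transition/transversion rate ratio and ω is the nonsynonymous/synonymous rate ratio.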
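Modified Codon Model (GY94m)
$$
q_{ij} =
\begin{cases}
0, & \text{if } i \text{ and } j \text{ differ by more than one nucleotide} \\
\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transversion} \\
\kappa_R\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transition between purines} \\
\kappa_Y\pi_j, & \text{if } i \text{ and } j \text{ differ by a synonymous transition between pyrimidines} \\
\omega\pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transversion} \\
\omega\kappa_R\pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transition between purines} \\
\omega\kappa_Y\pi_j, & \text{if } i \text{ and } j \text{ differ by a nonsynonymous transition between pyrimidines}
\end{cases}
$$
where $\kappa_R$ and $\kappa_Y$ are separate transition/transversion rate ratios for purine (A↔G) and pyrimidine (C↔T) transitions.
Zhang et al. (2006)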
>> [dS,dN,dN_dS,lnL] = dc_gy94(aln,i,j)
>> [dS,dN,dN_dS,lnL] = dc_gy94m(aln,i,j)
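To make the case definitions of GY94 and GY94m concrete, a sketch that evaluates one off-diagonal rate q_ij of GY94m for a pair of sense codons using the standard genetic code; GY94 is the special case κ_R = κ_Y = κ. The codon frequency and parameter values are hypothetical, and this illustrates the rate definitions only; it is not the code behind `dc_gy94` or `dc_gy94m`.

```matlab
% Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (bases in TCAG order)
geneticCode = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
bases = 'TCAG';
aa = @(c) geneticCode(16*(find(bases == c(1)) - 1) + 4*(find(bases == c(2)) - 1) + find(bases == c(3)));

% Kind of transition (if any) at the differing site of two codons
isPurineTs     = @(ci, cj) strcmp(sort([ci(ci ~= cj) cj(ci ~= cj)]), 'AG');
isPyrimidineTs = @(ci, cj) strcmp(sort([ci(ci ~= cj) cj(ci ~= cj)]), 'CT');

% Off-diagonal rate q_ij of GY94m: zero unless exactly one site differs;
% stop codons are excluded from these models and are not handled here
q = @(ci, cj, piJ, kappaR, kappaY, omega) (sum(ci ~= cj) == 1) * piJ ...
    * (1 + (kappaR - 1) * isPurineTs(ci, cj) + (kappaY - 1) * isPyrimidineTs(ci, cj)) ...
    * (1 + (omega - 1) * (aa(ci) ~= aa(cj)));

piJ = 1/61; kappa = 2; omega = 0.5;                       % hypothetical values
qSynTs    = q('CTT', 'CTC', piJ, kappa, kappa, omega);    % Leu -> Leu, C<->T transition
qNonsynTv = q('CTT', 'ATT', piJ, kappa, kappa, omega);    % Leu -> Ile, C<->A transversion
qNonsynTs = q('CTT', 'TTT', piJ, kappa, kappa, omega);    % Leu -> Phe, C<->T transition
qTwoDiffs = q('AAA', 'GGA', piJ, kappa, kappa, omega);    % two differences, so the rate is 0
```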
Nonneutrality Detection
Methods for Detecting Selection
Phylogenetic likelihood methods (with divergence data), e.g., the dN/dS test
SFS-based methods (with polymorphism data), e.g., Fay and Wu's H test
LD-based tests, e.g., the EHH test
Methods using both polymorphism and divergence data, e.g., the McDonald-Kreitman test
PGEToolbox GUI
Locus under positive selection
MK test GUI
Availability
MBEToolbox http://www.bioinformatics.org/mbetoolbox
PGEToolbox http://www.bioinformatics.org/pgetoolbox
Thank You!
Polymorphism vs. Divergence Method
Neutral Theory of Molecular Evolution: most genomic regions are thought to evolve neutrally; that is, they accumulate mutations that do not influence the fitness of the organism and whose frequencies change only by random genetic drift.
Neutral Theory of Molecular Evolution (M. Kimura)
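Random genetic drift can be sketched with Wright-Fisher sampling (an assumed textbook model; the slides do not name one): each generation, the 2N gene copies of a diploid population of size N are drawn at random from the previous generation. N = 100 follows the drift panel in Figure 2; the starting frequency and number of generations are hypothetical, and each run gives a different trajectory.

```matlab
N = 100;          % diploid population size, as in the drift panel of Figure 2
nGen = 100;       % hypothetical number of generations
p = zeros(1, nGen + 1);
p(1) = 0.5;       % hypothetical initial allele frequency

for g = 1:nGen
    % Each of the 2N gene copies in the next generation is drawn at random
    % from the current generation (binomial sampling, no selection)
    p(g + 1) = sum(rand(2 * N, 1) < p(g)) / (2 * N);
end
fprintf('final allele frequency after %d generations: %.3f\n', nGen, p(end));
```

Plotting `p` against `0:nGen` gives a trajectory like the one in the drift panel; frequencies 0 and 1 are absorbing (loss and fixation).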
Polymorphism vs. Divergence Method
Neutral theory predicts that the ratio of replacement (nonsynonymous) to silent (synonymous) changes should be the same within species (polymorphisms) and between species (fixed differences); this is the “null” model.
When a gene is compared between species, a greater proportion of replacement changes among the fixed differences between species than among the polymorphisms within species indicates positive selection for divergence.
McDonald-Kreitman Test
Subset of Data from McDonald & Kreitman (1991)
              Within species (polymorphic)   Between species (fixed)
Replacement   2                              7
Synonymous    42                             17
Results: McDonald & Kreitman (1991)
Polymorphic/fixed: replacement 2/7 << synonymous 42/17
Replacement/synonymous: fixed 7/17 >> polymorphic 2/42, which indicates positive selection
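A minimal sketch of the calculation behind this comparison. It takes the counts from the table, computes the two ratios and their quotient (often called the neutrality index), and runs a two-sided Fisher's exact test on the 2 × 2 table using only base MATLAB (`gammaln`). The slides do not say which test was applied, so Fisher's exact test here is a common choice rather than the original analysis.

```matlab
% Counts from McDonald & Kreitman (1991), as in the table above
Dn = 7;  Ds = 17;   % fixed (between species): replacement, synonymous
Pn = 2;  Ps = 42;   % polymorphic (within species): replacement, synonymous

ratioFixed = Dn / Ds;               % 7/17
ratioPoly  = Pn / Ps;               % 2/42
NI = ratioPoly / ratioFixed;        % < 1 means an excess of fixed replacement differences

% Two-sided Fisher's exact test on the 2x2 table [Dn Ds; Pn Ps]
rowFixed = Dn + Ds;  colRepl = Dn + Pn;  N = Dn + Ds + Pn + Ps;
logC = @(n, k) gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1);   % log binomial coefficient
a = max(0, rowFixed + colRepl - N) : min(rowFixed, colRepl);             % possible values of Dn
prob = exp(logC(colRepl, a) + logC(N - colRepl, rowFixed - a) - logC(N, rowFixed));
pObs = prob(a == Dn);
pValue = sum(prob(prob <= pObs * (1 + 1e-7)));   % tables at least as extreme as observed
fprintf('NI = %.3f, Fisher exact p = %.4f\n', NI, pValue);
```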
Limitations
Parallelism in MATLAB
There are four main approaches to providing parallel functionality in MATLAB:
1. Provide communication routines (MPI/PVM) in MATLAB.
2. Provide routines to split up work among multiple MATLAB sessions.
3. Provide a parallel backend to MATLAB.
4. Compile MATLAB scripts into native parallel code.
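Independent simulation replicates are an easy case for splitting work among sessions (approach 2). A sketch using `parfor` from MathWorks' Parallel Computing Toolbox, which distributes loop iterations among worker sessions; without the toolbox or an open pool, the loop runs serially. The workload is a hypothetical coalescent simulation of the time to the most recent common ancestor (TMRCA) of n sequences, in units of 2N generations, whose expectation is 2(1 − 1/n).

```matlab
n = 10;                       % hypothetical sample size (number of sequences)
nRep = 2000;                  % hypothetical number of independent replicates
k = n:-1:2;                   % number of lineages in each coalescent interval
rates = k .* (k - 1) / 2;     % coalescence rate with k lineages (time in units of 2N generations)

tmrca = zeros(1, nRep);
parfor r = 1:nRep
    % Each interval is exponential with mean 1/rate; the TMRCA is their sum
    tmrca(r) = sum(-log(rand(1, n - 1)) ./ rates);
end
fprintf('mean TMRCA = %.3f (expected %.3f)\n', mean(tmrca), 2 * (1 - 1/n));
```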
MATLAB Pointers Library
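Site-Frequency Spectrum
Site:            163  975  1972  2188  3529  4424  4961  5286  7019
Sequence 1:      A    G    G     C     T     T     A     A     A
Sequence 2:      A    T    G     C     T     C     G     A     A
Sequence 3:      G    T    G     T     T     C     A     C     G
Sequence 4:      A    G    G     C     T     C     A     A     G
Sequence 5:      A    G    A     C     C     C     G     A     A
Frequency class: 1    2    1     1     1     4     2     1     3
(Frequency class = number of sequences carrying the derived allele at the site; the slide marks ancestral and derived alleles.)
[Figure: the frequency spectrum, a histogram of the count of sites in each frequency class (1 to 4).]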
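A sketch that recomputes the frequency classes and the spectrum from the five sequences. The ancestral alleles are an assumption, chosen so that the derived-allele counts reproduce the frequency classes on the slide; in practice they come from an outgroup.

```matlab
% The five sequences at the nine segregating sites (rows = sequences)
seqs = ['AGGCTTAAA'; 'ATGCTCGAA'; 'GTGTTCACG'; 'AGGCTCAAG'; 'AGACCCGAA'];
% Hypothetical ancestral alleles, chosen to match the frequency classes above
anc = 'AGGCTTAAG';

n = size(seqs, 1);
derived = sum(seqs ~= repmat(anc, n, 1), 1);      % derived-allele count at each site
sfs = accumarray(derived(:), 1, [n - 1, 1]).';    % number of sites in classes 1..n-1
disp(sfs)                                         % counts for classes 1..4: 5 2 1 1
```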
Comparing frequency spectra for different classes of mutation
[Figure: observed frequency spectra (count vs. frequency class 1 to 9) for putatively neutral and potentially selected mutations.]
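Tests of selection based on estimates of Θ: Tajima’s D
$$ D = \frac{\pi - K/a}{\sqrt{V(\pi - K/a)}}, \qquad a = \sum_{i=1}^{n-1} \frac{1}{i} $$
where π is the mean number of pairwise differences, K is the number of segregating sites and n is the number of sequences.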
Tajima’s D tests whether the estimate of Θ from π differs significantly from the estimate from K. K counts every segregating site equally, whereas rare variants contribute little to π. If there are many polymorphisms at very low frequency (as expected under purifying selection), the estimate from K will be high relative to π, and D will be negative.
Conversely, if balancing (e.g., overdominant) selection holds alleles at intermediate frequencies, π increases without a corresponding increase in K, and D will be positive.
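A sketch of Tajima's D for the five-sequence site-frequency example, using its frequency classes as derived-allele counts and the variance constants of Tajima (1989). With only nine segregating sites the value is illustrative rather than a meaningful test.

```matlab
n = 5;                                   % number of sequences
derived = [1 2 1 1 1 4 2 1 3];           % derived-allele counts (frequency classes)

K = numel(derived);                                         % number of segregating sites
piHat = sum(derived .* (n - derived)) / nchoosek(n, 2);     % mean pairwise differences

% Constants from Tajima (1989)
a1 = sum(1 ./ (1:n-1));
a2 = sum(1 ./ (1:n-1).^2);
b1 = (n + 1) / (3 * (n - 1));
b2 = 2 * (n^2 + n + 3) / (9 * n * (n - 1));
c1 = b1 - 1 / a1;
c2 = b2 - (n + 2) / (a1 * n) + a2 / a1^2;
e1 = c1 / a1;
e2 = c2 / (a1^2 + a2);

D = (piHat - K / a1) / sqrt(e1 * K + e2 * K * (K - 1));
fprintf('pi = %.3f, K/a = %.3f, D = %.4f\n', piHat, K / a1, D);
```

The excess of singletons (five of the nine sites) makes K/a slightly larger than π, so D comes out slightly negative.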