ACCELERATING EVOLUTIONARY MOLECULAR PHYLOGENETIC ANALYSES ON THE NUS TCG GRID
Hu Yongli
Department of Biochemistry, Yong Loo Lin School of Medicine
WHAT IS PHYLOGENY?
The science of estimating the evolutionary past, using:
Fossil data
Morphological data
Protein sequence data
DNA sequence data
Etc.
Baldauf, S.L., 2003, Trends Genet. 19(6):345-51
http://www.clarifyingchristianity.com/images/philotr1.gif, retrieved on 21 Nov 09
WHAT IS MOLECULAR PHYLOGENY?
Maurer-Stroh, S. et al., 2009, Biol. Direct 4:18
WHICH SOFTWARE TO USE?
PHYLIP
MEGA
PAUP*
PHYLO_WIN
VOSTROG
MAC_CLADE
TURBOTREE
EVOMONY
PHYLIP
Developed in the 1980s
Most commonly used package for inferring phylogenies
Most widely distributed phylogeny package
Used to build the largest number of published phylogenetic trees
Contains a large number of methods and can handle many types of data
Open source
http://evolution.genetics.washington.edu/phylip/general.html, retrieved on 21 Nov 09
Abdennadher, N. and Boesch, R., 2007, Stud Health Technol Inform. 126:55-64
BUILDING A PROTEIN PHYLOGENETIC TREE
seqboot → protdist → neighbor → consense → drawgram
(Figure: example tree with leaves protein_1, protein_2, protein_3 and protein_4)
>protein_1
GJYWLKADWWGGMD…
>protein_2
KKLLDWGGJWGGMD…
>protein_3
KKLLDWGKJWGGME…
>protein_4
GJYWLAADWWGGMS…
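PHYLIP programs such as seqboot read PHYLIP-format input rather than the FASTA shown above: a header line giving the number of sequences and the alignment length, then each sequence preceded by a name field 10 characters wide. A minimal Python sketch of that conversion, assuming the sequences are already aligned to equal length; the visible 14-residue fragments from the slide stand in as demo data (the "…" continuations are not filled in), and `parse_fasta` / `to_phylip` are hypothetical helper names, not part of PHYLIP.

```python
from __future__ import annotations


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs."""
    records: list[tuple[str, str]] = []
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records: list[tuple[str, str]]) -> str:
    """Format aligned records as sequential PHYLIP text with 10-character names."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all of the same aligned length")
    header = f"{len(records)} {lengths.pop()}"
    # Classic PHYLIP reads the first 10 columns of each line as the name.
    body = [f"{name[:10]:<10}{seq}" for name, seq in records]
    return "\n".join([header, *body]) + "\n"


fasta = """>protein_1
GJYWLKADWWGGMD
>protein_2
KKLLDWGGJWGGMD
>protein_3
KKLLDWGKJWGGME
>protein_4
GJYWLAADWWGGMS
"""
print(to_phylip(parse_fasta(fasta)), end="")
```

Names longer than 10 characters are truncated here, so real datasets with long identifiers would need a renaming table kept alongside the alignment.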
WHY PROTDIST?
It is the most time-consuming step. Building a tree with 178 protein sequences*:
protdist: ~9 hours and 6 minutes
seqboot, neighbor and consense: ~2 minutes each
It can be parallelized to run on the grid: each of the 100 seqboot output datasets can be used independently to calculate protein distances in protdist
* Sun Fire 6800 server with 16 CPUs at 900 MHz and 16 GB RAM
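The timings above make the case for distributing protdist. A small Python sketch of the arithmetic: protdist's share of the serial runtime, plus an Amdahl's-law upper bound on whole-pipeline speedup if the protdist work were split perfectly across the 100 seqboot datasets. The perfect split with zero grid overhead is an idealizing assumption, not a measurement; drawgram is left out because its time is not given, and this whole-pipeline bound is a separate quantity from the measured speedups reported later, which concern the distributed protdist step.

```python
def amdahl_speedup(serial: float, parallel: float, workers: int) -> float:
    """Amdahl's law: total time divided by the time once the parallel part is split evenly."""
    return (serial + parallel) / (serial + parallel / workers)


protdist_min = 9 * 60 + 6   # protdist: ~9 h 6 min (slide figure)
other_min = 3 * 2           # seqboot, neighbor, consense: ~2 min each (slide figure)

share = protdist_min / (protdist_min + other_min)  # 546 / 552
bound = amdahl_speedup(other_min, protdist_min, workers=100)  # assumes zero grid overhead
print(f"protdist share of serial runtime: {share:.1%}")
print(f"ideal whole-pipeline speedup with 100 jobs: {bound:.1f}x")
```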
ENABLING PHYLIP ON NUS TCG
STEPS TAKEN TO PLACE META-PHYLIP ON NUS TCG
Preparing the protdist program in meta‐PHYLIP
Data and Parameter Files Preparation
Running meta‐PHYLIP on the NUS TCG
PREPARING THE PROTDIST PROGRAM IN META‐PHYLIP
Downloading PHYLIP 3.68
Compiling the source code on a Linux server*
Testing the functionality of meta‐PHYLIP on the NUS atlas‐4 Linux computer cluster
* Intel Pentium 4 CPU at 3.00 GHz, 4 GB of RAM, running Slackware 10.0
DATA AND PARAMETER FILE PREPARATION
(DATA FILES = INPUT1.DAT)
seqboot → protdist → neighbor → consense → drawgram
>protein_1
GJYWLKADWWGGMD…
>protein_2
KKLLDWGGJWGGMD…
>protein_3
KKLLDWGKJWGGME…
>protein_4
GJYWLAADWWGGMS…
seqboot output: Seqboot_1, Seqboot_2, Seqboot_3, …, Seqboot_99, Seqboot_100
Each seqboot dataset is saved as a separate data file (input1.dat)
(Figure: individual datasets such as Seqboot_4, Seqboot_89, Seqboot_23, Seqboot_38, Seqboot_8, Seqboot_54, Seqboot_88, Seqboot_13 and Seqboot_75, shown in no particular order)
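seqboot writes all of its replicate datasets one after another into a single outfile, so they must be split before each can become its own input1.dat. A minimal Python sketch, assuming sequential (non-interleaved) PHYLIP output in which each replicate is a header line `n m` followed by exactly n sequence lines; the `seqboot_N/input1.dat` directory layout is a hypothetical choice for illustration, not the naming used in the actual grid setup.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def split_replicates(text: str) -> list[str]:
    """Split concatenated sequential-PHYLIP datasets into one string per replicate."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    replicates: list[str] = []
    i = 0
    while i < len(lines):
        n_seqs = int(lines[i].split()[0])  # header line: "<sequences> <sites>"
        block = lines[i : i + 1 + n_seqs]
        if len(block) != n_seqs + 1:
            raise ValueError(f"truncated replicate starting at data line {i + 1}")
        replicates.append("\n".join(block) + "\n")
        i += 1 + n_seqs
    return replicates


def write_job_inputs(replicates: list[str], root: Path) -> list[Path]:
    """Write replicate k to <root>/seqboot_k/input1.dat (hypothetical layout)."""
    paths: list[Path] = []
    for k, data in enumerate(replicates, start=1):
        job_dir = root / f"seqboot_{k}"
        job_dir.mkdir(parents=True, exist_ok=True)
        path = job_dir / "input1.dat"
        path.write_text(data)
        paths.append(path)
    return paths


if __name__ == "__main__":
    demo = "2 3\nseqA      ABC\nseqB      ABD\n2 3\nseqA      AAC\nseqB      ABB\n"
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_job_inputs(split_replicates(demo), Path(tmp)):
            print(p.relative_to(tmp), repr(p.read_text()))
```

Interleaved output would need a different splitter, since a replicate then spans more than n lines.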
DATA AND PARAMETER FILE PREPARATION
(PARAMETER FILES = INPUT2.DAT)
Parameter file:
input1.dat
F
output1.dat
Y
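The parameter file holds the answers protdist would otherwise read interactively, so a job can feed it on standard input (for example `protdist < input2.dat`, the usual way of scripting PHYLIP's menus; the slides do not show the exact invocation). Because every job's data file has the same name, input1.dat, the same answers appear reusable for all 100 jobs. A small Python sketch that writes the answers shown on the slide (input1.dat, F, output1.dat, Y), one per line, into a hypothetical `seqboot_N` directory per job.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# protdist menu answers in the order shown on the slide, one per line.
PROTDIST_ANSWERS = ["input1.dat", "F", "output1.dat", "Y"]


def write_parameter_files(root: Path, n_jobs: int = 100) -> list[Path]:
    """Write an identical input2.dat into each hypothetical <root>/seqboot_k directory."""
    text = "\n".join(PROTDIST_ANSWERS) + "\n"
    paths: list[Path] = []
    for k in range(1, n_jobs + 1):
        job_dir = root / f"seqboot_{k}"
        job_dir.mkdir(parents=True, exist_ok=True)
        path = job_dir / "input2.dat"
        path.write_text(text)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = write_parameter_files(Path(tmp), n_jobs=3)
        print(len(files), repr(files[0].read_text()))
```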
RUNNING META‐PHYLIP ON THE NUS TCG
Download the parametric study program
Prepare the zipped input file “input.zip” (data + parameter files)
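Packaging the job files is a plain zip step. A minimal Python sketch using the standard-library `zipfile` module that bundles each job's input1.dat and input2.dat into input.zip. The per-job `seqboot_N/` folders inside the archive are an assumption: the slides do not say how the GridMP parametric study program expects the archive to be organised.

```python
from __future__ import annotations

import tempfile
import zipfile
from pathlib import Path

JOB_FILES = {"input1.dat", "input2.dat"}  # data file + parameter file per job


def build_input_zip(root: Path, zip_path: Path) -> list[str]:
    """Zip every job's data and parameter file, keeping job folders as archive paths."""
    members = sorted(p for p in root.rglob("*.dat") if p.is_file() and p.name in JOB_FILES)
    names = [p.relative_to(root).as_posix() for p in members]
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path, arcname in zip(members, names):
            zf.write(path, arcname=arcname)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "jobs"
        for k in (1, 2):
            job = root / f"seqboot_{k}"
            job.mkdir(parents=True)
            (job / "input1.dat").write_text("data\n")
            (job / "input2.dat").write_text("input1.dat\nF\noutput1.dat\nY\n")
        print(build_input_zip(root, Path(tmp) / "input.zip"))
```

The archive is written outside the job tree so it never tries to include itself.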
DATA PROCESSING ON GRID
Input.zip (100 seqboot output files + 100 parameter files) is sent to Koala1 (GridMP server)
On Koala1 the archive holds the seqboot files (Seqboot_1, Seqboot_2, Seqboot_3, …, Seqboot_99, Seqboot_100) and the parameter files (Param_1, Param_2, Param_3, …, Param_99, Param_100); each Seqboot_N/Param_N pair is processed by its own Meta-PHYLIP run
Each run returns two output files: Output1.dat.000001 and Output2.dat.000001, Output1.dat.000002 and Output2.dat.000002, …, Output1.dat.000099 and Output2.dat.000099, Output1.dat.000100 and Output2.dat.000100
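Once the 100 results return, the distance matrices have to be recombined into one file for neighbor, which can analyse multiple data sets from a single input file. A minimal Python sketch, assuming Output1.dat.NNNNNN holds job NNNNNN's protdist result (the parameter file directs protdist's output to output1.dat) and that jobs should be concatenated in job-number order; `collect_distance_matrices` is a hypothetical helper, and Output2.dat files are deliberately ignored.

```python
from __future__ import annotations

import re
import tempfile
from pathlib import Path

# Result files as returned by the grid, e.g. Output1.dat.000001 ... Output1.dat.000100.
OUTPUT_RE = re.compile(r"^output1\.dat\.(\d{6})$", re.IGNORECASE)


def collect_distance_matrices(result_dir: Path, expected: int = 100) -> str:
    """Concatenate Output1.dat.NNNNNN files in job order, failing if any job is missing."""
    found: dict[int, Path] = {}
    for path in result_dir.iterdir():
        match = OUTPUT_RE.match(path.name)
        if match:
            found[int(match.group(1))] = path
    missing = sorted(set(range(1, expected + 1)) - set(found))
    if missing:
        raise FileNotFoundError(f"missing results for jobs: {missing}")
    parts = []
    for job in sorted(found):  # numeric order, independent of filesystem listing order
        text = found[job].read_text()
        parts.append(text if text.endswith("\n") else text + "\n")
    return "".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "Output1.dat.000002").write_text("second matrix\n")
        (d / "Output1.dat.000001").write_text("first matrix\n")
        print(collect_distance_matrices(d, expected=2), end="")
```

Checking for missing jobs matters on a volunteer-style grid, where a client can drop out before returning its result.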
LOG FILES
Parameter file:
input1.dat
F
output1.dat
Y
EVALUATING THE SPEEDUP OF META-PHYLIP
EVALUATION OF SPEEDUP
Speedup is explored with:
Datasets of the same protein length but different numbers of protein sequences
Real-life biological datasets
Speedup = Tp / RT100
RT100: time (in seconds) from job creation to the return of the last output to the grid server
Tp: total CPU time (in seconds) required to run the program in serial
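Speedup here is serial CPU time divided by grid turnaround time, Tp / RT100, so values above 1 mean the grid finished sooner than a serial run would. A small Python sketch of the calculation; the timings in the usage line are made-up round numbers for illustration, not measurements from this study.

```python
def grid_speedup(tp_seconds: float, rt100_seconds: float) -> float:
    """Speedup = Tp / RT100: total serial CPU time over time until the last grid result returns."""
    if rt100_seconds <= 0:
        raise ValueError("RT100 must be positive")
    return tp_seconds / rt100_seconds


# Hypothetical example: 10 hours of serial CPU time, last grid result back after 20 minutes.
print(grid_speedup(10 * 3600, 20 * 60))
```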
SPEEDUP ACHIEVED WITH DATASETS OF DIFFERENT NUMBERS OF SEQUENCES
Speedup achieved ranges from 14.1 to 65.0 times
Speedup for small datasets is lower than for larger datasets
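One plausible contributor to the lower speedup on small datasets is that fixed per-run costs, such as job dispatch and file transfer, do not shrink with the dataset. The toy Python model below illustrates that effect under stated assumptions: the serial work divides evenly over 100 jobs and a fixed overhead is added. The 300-second overhead and the serial times are hypothetical values, not figures measured on the NUS TCG grid.

```python
def modelled_speedup(tp_seconds: float, jobs: int = 100, overhead_seconds: float = 300.0) -> float:
    """Toy model: grid time = even share of the serial work + a fixed overhead."""
    grid_time = tp_seconds / jobs + overhead_seconds
    return tp_seconds / grid_time


# Hypothetical serial times from 10 minutes to 10 hours.
for tp in (600, 3600, 36000):
    print(f"Tp = {tp:>6} s -> modelled speedup {modelled_speedup(tp):.1f}x")
```

In this model the speedup approaches the job count only when Tp / jobs is much larger than the overhead, which matches the direction of the trend on the slide without claiming to explain the measured values.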
SPEEDUP ACHIEVED WITH REAL BIOLOGICAL DATA
Speedup achieved ranges from 25.0 to 58.1 times
Speedup for small datasets is lower than for larger datasets
(Figure: bar chart of speedup, axis 0 to 60, for HIV-1 Clade D vif, HIV-1 Clade D vpr, HIV-1 Clade D gag, HIV-1 Clade D pol, DENV Envelope, HIV-1 Clade B gag and Influenza A Hemagglutinin)
DISCUSSION AND CONCLUSION
Advances in sequencing technology have brought about an explosion of sequence data
Phylogenetic analyses of these data can no longer be carried out within an acceptable time frame
Placing PHYLIP on the grid can greatly increase the rate of molecular phylogenetic analyses
Acceleration depends on the availability of idle computer cycles on grid clients
This is important in the study of disease outbreaks and emerging pandemics, especially for disease treatment and pandemic containment
Future challenge: enhance distribution, generality and efficiency
Sanderson, M.J. and Driskell, A.C., 2003, Trends Plant Sci. 8(8):374-379
Maurer-Stroh, S. et al., 2009, Biol. Direct 4:18
ACKNOWLEDGEMENTS
A/Prof Tan Tin Wee, Mark De Silva, Lim Kuan Siong, Wang Jun Hong, Mohammad Asif Khan, Heiny Tan, all members of BIC
THANK YOU