A High-Throughput Bioinformatics Distributed Computing Platform
Institute of Information Technology (IIT), University of Dhaka
19-09-2012
Presented by:
Md. Habibur Rahman
BIT 0216
Institute of Information Technology
University of Dhaka
Bangladesh
The contributors of the paper
Thomas M. Keane, Andrew J. Page, James O. McInerney, and Thomas J. Naughton
Bioinformatics and Pharmacogenomics Laboratory, National University of Ireland, Maynooth, Co. Kildare, Ireland
Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, Ireland
Homepage: http://www.cs.nuim.ie/distibuted
Publication
18th IEEE Symposium on Computer-Based Medical Systems (CBMS’05)
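Suitability of Bioinformatics to Distributed Computing
- Many bioinformatics problems exhibit a class of algorithmic parallelism referred to as coarse-grained parallelism: the work splits into large, independent tasks that need little communication with each other.
- They have a high compute-to-data ratio: each task performs a lot of computation relative to the amount of data that must be sent to and from the machine running it.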
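A rough way to check whether a task fits this model is to compare the time a work unit spends computing with the time needed to ship its data. The sketch below is only an illustration. The function names, the 100:1 threshold and the example figures are hypothetical and do not come from the paper.

```python
def compute_to_data_ratio(cpu_seconds: float, bytes_moved: int,
                          bandwidth_bytes_per_s: float) -> float:
    """Seconds of computation per second spent transferring a work unit's data."""
    transfer_seconds = bytes_moved / bandwidth_bytes_per_s
    return cpu_seconds / transfer_seconds


def suits_desktop_grid(ratio: float, threshold: float = 100.0) -> bool:
    """Hypothetical rule of thumb: distribute only when computing dominates transfer."""
    return ratio >= threshold


# Illustrative figures only: 10 minutes of CPU for a 1 MB work unit over a 1 MB/s link.
ratio = compute_to_data_ratio(600, 1_000_000, 1_000_000)
print(ratio, suits_desktop_grid(ratio))  # 600.0 True
```

A task that moves 10 MB for one second of work would score 0.1 and is better run locally.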
Topic and Problem Overview
- Demand for high-performance computing in bioinformatics has increased dramatically because of the rapid growth of genomic databases.
- Traditional database search algorithms cannot perform a full search of a large database in a reasonable time.
- Heuristic algorithms make such searches feasible, but at the cost of reduced search sensitivity.
- A similar problem arises in evolutionary biology, where reconstructing phylogenetic trees relies on greedy heuristic algorithms.
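The cost of exhaustive tree search explains why greedy heuristics are used. The sketch below assumes the greedy heuristic is stepwise addition, which is an assumption because the slides do not name it. In stepwise addition, each new taxon is tried on every branch of the current best unrooted tree, and an unrooted binary tree with k leaves has 2k - 3 branches. The candidate trees within one step can be scored independently, which is the kind of coarse-grained work unit a desktop grid can hand out. The function names are hypothetical.

```python
def insertion_points(k: int) -> int:
    """Branches in an unrooted binary tree with k >= 3 leaves (places to attach the next taxon)."""
    return 2 * k - 3


def unrooted_tree_count(n: int) -> int:
    """Distinct unrooted binary topologies for n >= 3 taxa, i.e. (2n - 5)!!."""
    count = 1
    for k in range(3, n):
        count *= insertion_points(k)
    return count


def stepwise_addition_evaluations(n: int) -> int:
    """Candidate trees scored by greedy stepwise addition (no branch swapping) for n taxa."""
    return sum(insertion_points(k) for k in range(3, n))


for n in (5, 10, 20):
    print(n, unrooted_tree_count(n), stepwise_addition_evaluations(n))
```

For 10 taxa there are already 2,027,025 possible topologies, while stepwise addition scores only 63 candidate trees.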
Proposed solution
o According to the authors of the paper:
“We present a general-purpose programmable distributed computing platform suitable for deployment in a typical university environment where many semi-idle desktop PC’s are connected via a network”
o The system is fully cross-platform. Two distributed bioinformatics applications are built on it:
i) DSEARCH
ii) DPRml
Proposed solution (cont.)
o Java distributed computing platform
- Client-server model.
- The server controls the resources (database, algorithm, or computer hardware).
- The system is divided into three separate pieces of software: server, client, and remote interface.
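The client-server split above can be illustrated with a small in-process simulation. It assumes a pull model in which clients ask the server for work and send back results, although the slides do not describe the protocol. `Server`, `request_work`, `submit_result`, `client_loop` and `run_simulation` are hypothetical names. Threads stand in for the networked desktop PCs, and networking is omitted. For brevity the algorithm is handed to the clients directly instead of being supplied by the server.

```python
import queue
import threading


class Server:
    """Hypothetical server: owns the work units and collects the results."""

    def __init__(self, work_units):
        self._pending = queue.Queue()
        for unit_id, payload in enumerate(work_units):
            self._pending.put((unit_id, payload))
        self._results = {}
        self._lock = threading.Lock()

    def request_work(self):
        """Hand out the next (unit_id, payload), or None when no work is left."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None

    def submit_result(self, unit_id, result):
        with self._lock:
            self._results[unit_id] = result

    def results(self):
        """Results in the original work-unit order."""
        with self._lock:
            return [self._results[k] for k in sorted(self._results)]


def client_loop(server, algorithm):
    """Hypothetical client: pull work until none is left, run it, report back."""
    while (unit := server.request_work()) is not None:
        unit_id, payload = unit
        server.submit_result(unit_id, algorithm(payload))


def run_simulation(work_units, algorithm, n_clients=3):
    server = Server(work_units)
    clients = [threading.Thread(target=client_loop, args=(server, algorithm))
               for _ in range(n_clients)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    return server.results()


print(run_simulation([1, 2, 3, 4, 5], lambda x: x * x))  # [1, 4, 9, 16, 25]
```

Because clients pull work rather than having it pushed to them, a faster or idler PC simply ends up processing more units.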
Proposed solution (cont.)
Fig: Diagram of the complete system
Proposed solution (cont.)
Installation and Deployment
- Consists of three executable JAR files, corresponding to the server, client, and remote interface.
- Run the client as a low-priority background service.
- Hardware specification: at least a Pentium IV processor.
- OS compatibility: Windows, Sun Solaris, Mac OS X, and Linux.
DPRml - Distributed Phylogeny Reconstruction by Maximum Likelihood
Previous situation:
- Maximum likelihood (ML) is one of the most accurate techniques for reconstructing phylogenies.
- Parallel ML programs had been developed for reconstructing large and accurate phylogenetic trees.
- However, they were implemented in platform-specific languages.
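To make "maximum likelihood" concrete, here is a minimal sketch for the simplest possible case: estimating the branch length between two aligned DNA sequences under the Jukes-Cantor (JC69) substitution model. This is not DPRml's model or search procedure, since DPRml reconstructs whole trees. It only shows the principle of choosing the parameter value that maximizes the likelihood. The sketch assumes equal-length, gap-free sequences. The numerical estimate is checked against the known closed-form JC69 distance d = -3/4 ln(1 - 4p/3), where p is the fraction of differing sites. The function names are hypothetical.

```python
import math


def jc69_log_likelihood(t: float, n_same: int, n_diff: int) -> float:
    """Log-likelihood of a pairwise alignment under JC69 at branch length t (substitutions/site)."""
    decay = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * decay   # P(site unchanged along the branch)
    p_diff = 0.25 - 0.25 * decay   # P(change to one particular other base)
    n = n_same + n_diff
    # 0.25 is the JC69 equilibrium frequency of the starting base.
    return n * math.log(0.25) + n_same * math.log(p_same) + n_diff * math.log(p_diff)


def count_sites(seq_a: str, seq_b: str):
    """Return (identical sites, differing sites) for two equal-length sequences."""
    n_diff = sum(a != b for a, b in zip(seq_a, seq_b))
    return len(seq_a) - n_diff, n_diff


def ml_branch_length(seq_a, seq_b, lo=1e-6, hi=10.0, iters=200):
    """Numerical ML estimate; ternary search works because the likelihood is unimodal in t."""
    n_same, n_diff = count_sites(seq_a, seq_b)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if jc69_log_likelihood(m1, n_same, n_diff) < jc69_log_likelihood(m2, n_same, n_diff):
            lo = m1   # the maximum lies to the right of m1
        else:
            hi = m2   # the maximum lies to the left of m2
    return (lo + hi) / 2


def jc69_distance(seq_a, seq_b):
    """Closed-form JC69 distance, valid while the proportion of differences p < 0.75."""
    n_same, n_diff = count_sites(seq_a, seq_b)
    p = n_diff / (n_same + n_diff)
    return -0.75 * math.log(1 - 4.0 * p / 3.0)


a, b = "ACGTACGTAC", "ACGTACGTTT"
print(ml_branch_length(a, b), jc69_distance(a, b))
```

Both values come out at about 0.233 substitutions per site. Scoring whole trees requires the same kind of likelihood summed over all branches and sites, which is why ML tree search is so compute-intensive.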
DPRml (cont.) - Distributed Phylogeny Reconstruction by Maximum Likelihood
After the development of the distributed computing platform:
- DPRml is one of the most general and powerful likelihood-based phylogenetic tree-building programs.
- It uses a proven tree-building algorithm and the Phylogenetic Analysis Library.
- Multiple phylogenetic computations can run at the same time.
- It is a platform-independent ML program.
DPRml (cont.) - Distributed Phylogeny Reconstruction by Maximum Likelihood
Speedup testing:
Fig. Speedup achieved by running 6 simultaneous DPRml problems using between 1-40 semi-idle processors.
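Speedup curves such as this one are usually read with the standard definitions. With p processors, speedup is S_p = T_1 / T_p and efficiency is E_p = S_p / p. Amdahl's law gives an upper bound on speedup when only a fraction f of the work can be parallelized. The timings below are made-up illustrations, not measurements from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S_p = T_1 / T_p."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, processors: int) -> float:
    """E_p = S_p / p; 1.0 would mean perfectly linear speedup."""
    return speedup(t_serial, t_parallel) / processors


def amdahl_bound(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: maximum speedup when only parallel_fraction of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


# Made-up timings, not results from the paper.
print(speedup(1000, 50), efficiency(1000, 50, 40))  # 20.0 0.5
```

On semi-idle desktops, efficiency below 1.0 reflects both serial overhead and the owners' own use of the machines.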
DSEARCH
- Fully cross-platform parallel database search program.
- Operates in a master-slave environment.
- Splits the database into fixed-size units that are then searched on the donor machines.
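Here is a minimal sketch of this split-and-search pattern under stated assumptions. Units are measured in number of sequences, since the slides say only "fixed sized units". A plain Smith-Waterman local alignment score with hypothetical scoring parameters (match +2, mismatch -1, gap -2) stands in for whatever search algorithm DSEARCH actually runs. All function names are hypothetical. Every database sequence is scored independently, so merging the per-unit hits gives the same ranking as searching the whole database on one machine.

```python
from typing import List, Tuple

Record = Tuple[str, str]   # (identifier, sequence)
Hit = Tuple[int, str]      # (score, identifier)


def split_into_units(records: List[Record], unit_size: int) -> List[List[Record]]:
    """Cut the database into consecutive fixed-size units (the last one may be shorter)."""
    return [records[i:i + unit_size] for i in range(0, len(records), unit_size)]


def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best local alignment score with a linear gap penalty (stand-in scorer)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def search_unit(query: str, unit: List[Record]) -> List[Hit]:
    """Donor-machine side: score every sequence in one unit against the query."""
    return [(smith_waterman(query, seq), ident) for ident, seq in unit]


def merge_hits(per_unit_hits: List[List[Hit]], top_n: int) -> List[Hit]:
    """Master side: pool all hits and keep the best top_n (ties broken by identifier)."""
    pooled = [hit for hits in per_unit_hits for hit in hits]
    return sorted(pooled, key=lambda h: (-h[0], h[1]))[:top_n]


db = [("s1", "ACGTTGCA"), ("s2", "TTTTTTTT"), ("s3", "ACGTACGT"),
      ("s4", "GGGACGTA"), ("s5", "CCCCCCCC")]
query = "ACGTACGT"
units = split_into_units(db, unit_size=2)
print(merge_hits([search_unit(query, u) for u in units], top_n=3))
```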
DSEARCH (cont.)
Speedup testing:
Fig. Speedup achieved by DSEARCH running on between 1 and 80 semi-idle processors.
Using:
- a FASTA database file,
- a FASTA query sequence file,
- a searching scheme, and
- a configuration file.
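Both the database and the query are FASTA files. Below is a minimal FASTA reader offered as a sketch. The name `parse_fasta` is hypothetical. The slides do not describe the format of the searching scheme or the configuration file, so those are not modelled here.

```python
def parse_fasta(text: str):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)           # a sequence may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>query1 hypothetical test sequence
ACGTACGT
ACGT
>query2
TTGGCCAA
"""
print(parse_fasta(example))
```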
My criticism and future work
- There is no detailed description of how the applications work on the distributed computing platform.
- If spare clock cycles on the semi-idle PCs are not available, the system will not deliver its best performance.
- A failure of the network interconnecting the desktop PCs will reduce performance.
- Future work: improve and expand the range of bioinformatics applications for the system.
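A common way to address the network-failure concern is to reissue any work unit whose result has not come back within a timeout. The slides do not say whether the platform does this. The sketch below is a hypothetical illustration in which `WorkTracker`, `assign`, `complete` and the timeout value are all assumptions. Time is passed in explicitly so the behaviour is easy to follow.

```python
from collections import deque


class WorkTracker:
    """Hypothetical scheduler that reissues work units whose results are overdue."""

    def __init__(self, unit_ids, timeout: float):
        self.timeout = timeout
        self.total = len(unit_ids)
        self.pending = deque(unit_ids)
        self.in_flight = {}   # unit_id -> (client, time issued)
        self.done = set()

    def _requeue_expired(self, now: float):
        for unit, (_client, issued) in list(self.in_flight.items()):
            if now - issued >= self.timeout:
                del self.in_flight[unit]
                self.pending.append(unit)  # assume the client or its link has failed

    def assign(self, client: str, now: float):
        """Give the client a unit to work on, or None if nothing is available yet."""
        self._requeue_expired(now)
        if not self.pending:
            return None
        unit = self.pending.popleft()
        self.in_flight[unit] = (client, now)
        return unit

    def complete(self, unit) -> bool:
        """Record a result; returns False for duplicates of already-finished units."""
        if unit in self.done:
            return False
        self.done.add(unit)
        self.in_flight.pop(unit, None)
        if unit in self.pending:   # a late result arrived after the unit was requeued
            self.pending.remove(unit)
        return True

    def all_done(self) -> bool:
        return len(self.done) == self.total


tracker = WorkTracker([0, 1], timeout=10)
tracker.assign("pc-A", now=0)    # unit 0
tracker.assign("pc-B", now=0)    # unit 1
tracker.complete(1)              # pc-B reports back; pc-A goes silent
tracker.assign("pc-C", now=11)   # unit 0 is overdue, so it is reissued to pc-C
```

The cost of this approach is duplicated work whenever a slow but healthy PC is mistaken for a failed one.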
Conclusion
“There should be no conclusion to research work. It is a continual process, and it will be continued for the betterment of human beings.”
Any questions?