Apply Bioinformatics Applications on Parallel and Grid Computing Environment
Chao-Tung Yang (楊朝棟), Yu-Lun Kuo (郭育倫)
High-Performance Computing Laboratory, Department of Computer Science and Information Engineering, Tunghai University
Presenter: Yu-Ming Wang, Graduate Institute of Electrical Engineering, National Kaohsiung University of Applied Sciences
Outline
1. Introduction
2. Bioinformatics, BioGrid
3. Parallel Bioinformatics
4. System Environment
5. Experimental Results
6. Conclusions
Introduction
Bioinformatics tools can speed up the analysis of large-scale sequence data, especially sequence alignment.
Hardware:
• PC cluster: one master node and seven slave nodes (16 processors in total)
• Sun Fire 6800 server
• Grid system
Bioinformatics tools:
• mpiBLAST (MPI)
• FASTA (MPI)
• HMMs (PVM, Parallel Virtual Machine)
Bioinformatics
1. Creation of databases that allow storage and management of large biological data sets.
2. Development of algorithms and statistics to determine relationships among members of those data sets.
3. Use of these tools for the analysis and interpretation of biological data.
Grid Computing
• Makes more effective use of computer resources.
• Offers a way to solve problems that require an enormous amount of computing power.
• The resources of many computers can be directed toward a common objective.
BioGrid
Constructing a BioGrid system is necessary for research in order to reduce sequence alignment time.
PC Cluster
Local BioGrid
Global BioGrid
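Parallel Bioinformatics I (BLAST)
Basic Local Alignment Search Tool: an alignment tool for nucleic acid and protein sequences.
[blastall]: runs the basic BLAST comparisons:
• nucleotide sequence against a nucleotide database (blastn)
• protein sequence against a protein database (blastp)
• nucleotide sequence against a protein database (blastx)
• protein sequence against a translated nucleotide database (tblastn)
• nucleotide sequence against a translated nucleotide database (tblastx)
[blastpgp]: runs PSI-BLAST (Position-Specific Iterated BLAST), a BLAST program that searches a protein database with a protein query sequence to determine whether the query belongs to a protein family.
[bl2seq]: compares two nucleotide or protein sequences.
[formatdb]: formats FASTA sequence data into a BLAST database.
mpiBLAST is based on MPI.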
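The slide only states that mpiBLAST is based on MPI. As an illustration of the database-segmentation idea behind it, the following minimal sketch splits a sequence database into fragments, searches each fragment with a separate worker, and merges the hits. This is not mpiBLAST code: threads stand in for MPI processes, the k-mer count stands in for BLAST scoring, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_database(records, n_fragments):
    """Round-robin split of (name, sequence) records into n_fragments lists."""
    return [records[i::n_fragments] for i in range(n_fragments)]


def kmer_set(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search_fragment(query, fragment, k=3):
    """Toy scoring (assumption, not BLAST): k-mers shared by query and subject."""
    query_kmers = kmer_set(query, k)
    return [(name, len(query_kmers & kmer_set(seq, k))) for name, seq in fragment]


def parallel_search(query, records, n_workers=2, k=3):
    """Search every fragment with its own worker, then merge and rank the hits."""
    fragments = split_database(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(lambda frag: search_fragment(query, frag, k), fragments))
    hits = [hit for part in partial for hit in part]
    # Highest score first; ties broken by name so the order is deterministic.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    database = [("a", "ACGTACGT"), ("b", "TTTTTTTT"), ("c", "ACGTTTTT")]
    for name, score in parallel_search("ACGTAC", database, n_workers=2):
        print(name, score)
```

Because each fragment is searched independently, the merged result does not depend on the number of workers; only the wall-clock time does.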
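Parallel Bioinformatics II (FASTA)
FASTA is a set of sequence search programs similar to the BLAST modes, with the exception of PSI-BLAST, and likewise provides very fast searches of sequence databases (DNA and protein).
[fasta]: uses the FASTA algorithm to compare a DNA sequence against a DNA database, or a protein sequence against a protein database
[ssearch]: performs the same comparisons using the Smith-Waterman algorithm
[fastx/fasty]: compares a DNA sequence against a protein database, translating the DNA sequence
[tfastx/tfasty]: compares a protein sequence against a DNA database, translating the DNA database sequences
[align]: computes a global alignment between two DNA or protein sequences
[lalign]: computes local alignments between two DNA or protein sequences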
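For readers unfamiliar with the Smith-Waterman algorithm used by ssearch, here is a minimal sketch of its scoring recurrence. It returns only the best local alignment score, uses a simple linear gap penalty, and the scoring values are illustrative assumptions rather than the defaults of ssearch.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty)."""
    cols = len(b) + 1
    prev = [0] * cols          # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0: that is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("GGACGTGG", "CCACGTCC"))
```

The running time grows with the product of the two sequence lengths, which is why exhaustive Smith-Waterman searches of large databases benefit from parallel execution.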
Parallel Bioinformatics III (HMMs)
HMMs (Hidden Markov Models) can be used to search databases using statistical descriptions of a sequence family.
[hmmpfam]: searches an HMM database with a query sequence, in order to annotate an unknown sequence
[hmmindex]: creates a binary SSI index for an HMM database
[hmmsearch]: searches a sequence database with an HMM to find more similar sequences
[hmmalign]: aligns multiple sequences to an HMM
[hmmbuild]: builds an HMM from a multiple sequence alignment
[hmmcalibrate]: reads an HMM and calibrates its search statistics
[hmmemit]: generates a consensus sequence from an HMM
[hmmfetch]: retrieves an HMM from an HMM database
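To illustrate how an HMM assigns a score to a sequence, here is a minimal sketch of the Viterbi algorithm on a toy two-state model. The state names, probabilities, and function name are hypothetical. The HMMER programs listed above use profile HMMs with match, insert, and delete states, which this sketch does not model.

```python
import math


def viterbi_log_prob(seq, states, start_p, trans_p, emit_p):
    """Log-probability of the most likely state path for a non-empty seq.

    Assumes every probability used is strictly positive.
    """
    # Initialisation: start in each state and emit the first symbol.
    v = {s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        # Recurrence: best predecessor for each state, then emit the symbol.
        v = {
            s: max(v[p] + math.log(trans_p[p][s]) for p in states)
            + math.log(emit_p[s][symbol])
            for s in states
        }
    return max(v.values())


# Toy model (assumption): the "family" state prefers A, "background" is uniform.
STATES = ("family", "background")
START = {"family": 0.5, "background": 0.5}
TRANS = {
    "family": {"family": 0.9, "background": 0.1},
    "background": {"family": 0.1, "background": 0.9},
}
EMIT = {
    "family": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

if __name__ == "__main__":
    for candidate in ("AAAA", "CGTC"):
        print(candidate, viterbi_log_prob(candidate, STATES, START, TRANS, EMIT))
```

Sequences that resemble the family's emission preferences receive a higher log-probability, which is the basic idea behind ranking database hits with a family model.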
Our System Environment (I)
Linux PC cluster
One server node
• AMD Athlon MP 2000+ processors
• 1 GB shared memory
Seven slave nodes
• AMD Athlon MP 1800+ processors
• 512 MB shared memory
100 Mbps Ethernet switches
Sun Fire 6800 server
• 8 UltraSPARC III Cu 1.2 GHz processors
• 8 GB main memory
• Solaris 8 operating system
Our System Environment (II)
The Grid System
• Each cluster has one master node and two slave nodes
• 3COM 3C9051 10/100 Fast Ethernet card
• AC-EX3016B switch hub
• Globus Toolkit v2.4
Experimental Results (I)
The Experimental Results on PC Cluster
The Performance of mpiBLAST
• speedup: nearly two times
The Performance of HMMs
• saved about half the time
• speedup: nearly two times
Experimental Results (II)
The Experimental Results on Sun Fire 6800
The Performance of FASTA
• speedup: nearly two times
The Performance of HMMs
• saved about half the time
• speedup: nearly two times
Experimental Results (III)
The Experimental Results on Grid System
Performance improves noticeably, saving about one-third of the time.
(Figure labels: 250 sec, 160 sec)
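As a quick check of these numbers, the sketch below computes the speedup and the fraction of time saved. It assumes that 250 sec is the run time without the grid system and 160 sec the run time with it; the slide does not label the two values, so this reading is an assumption.

```python
def speedup(t_before, t_after):
    """How many times faster the improved run is."""
    return t_before / t_after


def time_saved_fraction(t_before, t_after):
    """Fraction of the original run time that was saved."""
    return (t_before - t_after) / t_before


if __name__ == "__main__":
    t_before, t_after = 250.0, 160.0  # seconds; assumed ordering, see above
    print(f"speedup: {speedup(t_before, t_after):.2f}x")
    print(f"time saved: {time_saved_fraction(t_before, t_after):.0%}")
```

Under that assumption the saving is 90 of 250 seconds, or 36%, which is consistent with "about one-third". The two measures are linked by time saved = 1 - 1/speedup, so "saved about half the time" and "speedup: nearly two times" on the earlier slides describe the same result.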
Conclusions
Parallel computers and grid systems can save considerable time in sequence analysis.
Parallel bioinformatics tools can therefore reduce the waiting time for alignment and improve the performance of sequence alignment.
Life is too short &
DNA is too long!!