EMBOSS over a Grid 1. 1st EELA Grid School December 4th of 2006 Eduardo MURRIETA LEON Romualdo...
1st EELA Grid School, December 4th, 2006
Eduardo MURRIETA LEON
Romualdo ZAYAS-LAGUNAS
Pierre-Alain BRANGER
Jérôme VERLEYEN
Roberto RODRIGUEZ
César BONAVIDES
Alfredo HERNANDEZ
EMBOSS over a Grid
Index
• Bioinformatics
• EMBOSS
• Objectives
What is Bioinformatics?
• State of the art
- Analysis of gene expression
- Need to predict protein structure
- Sequence analysis
- A huge amount of knowledge to store
• Bioinformatics as a solution
- Supports the analysis of life-science data
- Used in many domains (e.g. the Human Genome Project)
Types of Tools
• Searching (knowledge extraction)
- BLAST (nucleotides, proteins)
• Alignment
- Clustal
• Phylogeny
- PHYLIP
Databases
• Various organizations
- NCBI: United States
- EMBL: Europe
- DDBJ: Japan
Overview
• The European Molecular Biology Open Software Suite
- From EMBnet
• A software package:
- A set of sequence analysis programs
- A toolkit for creating robust bioinformatics applications or workflows
- Database searching
- Motif identification
- Presentation tools for publication
Technical Characteristics
• Software requirements
- Linux distribution
- gcc compiler and graphics libraries
• Hardware requirements
- 100 to 400 MB of free disk space
- 512 MB of RAM
• Execution requirements
- Input data size: from 20 KB to 100 MB
- Output size: from 20 KB to 1 MB
EMBOSS Architecture
• Main parts
- ACD Files
- Programs (API)
- Inputs / Outputs (sequences, databases)
ACD Files
• ACD Files
- AJAX Command Definition files
- Stored in $EMBOSS_DIR/acd
application: intconv [
  documentation: "Convert ints to ajints"
  groups: "Test"
]
section: input [
  information: "Input section"
  type: "page"
]
infile: infile [
  parameter: "Y"
  knowntype: "integer long data"
  information: "Standard format information"
]
endsection: input
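To make the structure of the ACD example concrete, here is a minimal Python sketch that reads `type: name [ attribute: "value" ... ]` blocks into dictionaries. It handles only the quoted-attribute subset shown on this slide, not the full ACD language, and the function name `parse_acd` is hypothetical (it is not part of EMBOSS).

```python
import re

# Matches one ACD block: `type: name [ ...attributes... ]`.
BLOCK_RE = re.compile(r'(\w+)\s*:\s*(\w+)\s*\[(.*?)\]', re.DOTALL)
# Matches one attribute inside a block: `key: "value"`.
ATTR_RE = re.compile(r'(\w+)\s*:\s*"([^"]*)"')


def parse_acd(text):
    """Return a list of (type, name, attributes) tuples, one per ACD block."""
    blocks = []
    for kind, name, body in BLOCK_RE.findall(text):
        blocks.append((kind, name, dict(ATTR_RE.findall(body))))
    return blocks


# The ACD text from the slide. `endsection: input` has no bracketed body,
# so this simplified parser skips it.
ACD_TEXT = '''
application: intconv [ documentation: "Convert ints to ajints" groups: "Test" ]
section: input [ information: "Input section" type: "page" ]
infile: infile [ parameter: "Y" knowntype: "integer long data"
                 information: "Standard format information" ]
endsection: input
'''

blocks = parse_acd(ACD_TEXT)
for kind, name, attrs in blocks:
    print(kind, name, attrs)
```

Each block declares either the application itself, a grouping section, or one parameter (here the input file `infile`) together with its attributes.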
Programs
• Programs
- Binary files written in C and stored in $EMBOSS_DIR/bin
- Built on two libraries:
  AJAX (core library: general-purpose data types, file and sequence I/O, ACD processing)
  NUCLEUS (functions specific to molecular sequence analysis)
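As an illustration of how ACD files and binaries pair up, the following Python sketch lists the programs that have both a definition file and an executable. It assumes the layout given on these slides ($EMBOSS_DIR/acd and $EMBOSS_DIR/bin), that ACD files carry a `.acd` extension, and that a program shares its name with its ACD file. The function `list_emboss_programs` is hypothetical, and the demonstration runs on a throwaway directory rather than a real installation.

```python
import os
import tempfile


def list_emboss_programs(emboss_dir):
    """Return sorted program names that have both an ACD file and a binary."""
    acd_dir = os.path.join(emboss_dir, "acd")
    bin_dir = os.path.join(emboss_dir, "bin")
    # Program name = ACD file name without its ".acd" extension (assumption).
    described = {
        os.path.splitext(f)[0]
        for f in os.listdir(acd_dir)
        if f.endswith(".acd")
    }
    binaries = set(os.listdir(bin_dir))
    return sorted(described & binaries)


# Demonstration on a temporary directory tree that mimics the layout.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "acd"))
    os.makedirs(os.path.join(root, "bin"))
    for name in ("seqret", "emma"):
        open(os.path.join(root, "acd", name + ".acd"), "w").close()
        open(os.path.join(root, "bin", name), "w").close()
    # An ACD file without a matching binary is not reported.
    open(os.path.join(root, "acd", "orphan.acd"), "w").close()
    demo_programs = list_emboss_programs(root)

print(demo_programs)
```

On a real installation the call would be `list_emboss_programs(os.environ["EMBOSS_DIR"])`, provided that variable is set.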
Input/Output
• Sequences
- A succession of letters representing the primary structure of a real or hypothetical DNA molecule or protein
- ASCII text extracted from large databases
• EMBOSS can read various database formats
- EMBL, FASTA, GenBank, Swiss-Prot …
- Access by gene ID, by description keywords …
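To show what "a succession of letters" stored as ASCII text looks like in practice, here is a minimal Python sketch that parses the FASTA format, one of the formats listed above. The records are made up for illustration and are not real database entries. The function name `parse_fasta` is hypothetical, and EMBOSS itself does this work in its C libraries.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence ID to sequence string."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the ID is the first word after ">".
            words = line[1:].split()
            current_id = words[0] if words else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated without line breaks.
            records[current_id] += line
    return records


# Made-up records for illustration only.
FASTA_TEXT = """>seq1 hypothetical DNA fragment
ATGGCC
TTAA
>seq2 hypothetical peptide
MKVLA
"""

sequences = parse_fasta(FASTA_TEXT)
print(sequences)
```

The header line carries the identifier and a free-text description, which is what makes access by ID or by keyword possible.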
GUIs for EMBOSS
• wEMBOSS, Jemboss …
Use of EMBOSS
• Study of the haptoglobin protein in different species
- Extraction from the Swiss-Prot database ("seqret")
- 10 mammalian species (including human, rat, mouse, rabbit)
- Alignment ("emma")
- Computation of the phylogenetic tree
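The first two steps of this study can be sketched as a small Python script that builds the `seqret` and `emma` command lines. The database query `swissprot:hpt_*` and the output file names are placeholders, since the database name depends on how the local EMBOSS installation is configured. The helper names `build_workflow` and `run_workflow` are hypothetical. The slides do not name the program used for the final tree, so that step is left out, although `emma` does write a guide-tree dendrogram alongside the alignment.

```python
import subprocess


def build_workflow(usa, prefix):
    """Return the argv lists for the extraction and alignment steps."""
    fasta = prefix + ".fasta"
    return [
        # Step 1: extract the sequences named by the query into one file.
        ["seqret", "-sequence", usa, "-outseq", fasta, "-auto"],
        # Step 2: align them; emma also writes a guide-tree dendrogram.
        ["emma", "-sequence", fasta, "-outseq", prefix + ".aln",
         "-dendoutfile", prefix + ".dnd", "-auto"],
    ]


def run_workflow(commands):
    """Run each step in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Placeholder query and file prefix (assumptions, not from the slides).
commands = build_workflow("swissprot:hpt_*", "haptoglobin")
for argv in commands:
    print(" ".join(argv))
```

Calling `run_workflow(commands)` would execute the steps on a machine where EMBOSS and the queried database are installed. Here the commands are only printed.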
• Example of a generated tree
Objectives
« Get EMBOSS running over a Grid »
- Execute EMBOSS jobs on a grid from the command line
- Retrieve job results
- Execute a complete sequence analysis workflow / pipeline (i.e. the "Use of EMBOSS" example)
Complementary functions
• Searching EMBOSSed databases
• Wrapping applications for EMBOSS over a Grid
• Web interface and project manager for EMBOSS
• A BioGrid portal
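As a rough idea of what command-line job execution and result retrieval could involve, the Python sketch below writes a job description for one EMBOSS command. The slides do not say which middleware is used. The sketch assumes a gLite-style grid that accepts JDL (Job Description Language) files, with input and output files travelling in sandboxes. The wrapper script `run_emboss.sh`, the file names, and the function `make_jdl` are all hypothetical.

```python
def make_jdl(wrapper, argv, inputs, outputs):
    """Render a JDL job description that runs one command via a wrapper script."""
    def jdl_list(names):
        # JDL lists are written as {"a", "b", ...}.
        return "{" + ", ".join('"%s"' % n for n in names) + "}"

    lines = [
        'Executable = "%s";' % wrapper,
        'Arguments = "%s";' % " ".join(argv),
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        # Files shipped to the worker node with the job.
        "InputSandbox = %s;" % jdl_list([wrapper] + inputs),
        # Files brought back when the job results are retrieved.
        "OutputSandbox = %s;" % jdl_list(["std.out", "std.err"] + outputs),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical alignment job: the wrapper is assumed to locate EMBOSS on the
# worker node and then run the command passed as its arguments.
jdl_text = make_jdl(
    "run_emboss.sh",
    ["emma", "-sequence", "haptoglobin.fasta", "-outseq", "haptoglobin.aln", "-auto"],
    inputs=["haptoglobin.fasta"],
    outputs=["haptoglobin.aln"],
)
print(jdl_text)
```

Under these assumptions, chaining several such jobs so that the outputs of one become the inputs of the next would give the complete workflow named in the objectives.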