INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
High Throughput Bioinformatics analysis on the Grid
EMBnet/CNB http://www.es.embnet.org/
Scientific Workshop, AGM'06
Helsinki, Finland
Grid Workshop, SC '05, Seattle WA, USA 2
Summary
HT analysis on the Grid
GROCK architecture
GROCK as Web Service
Thanks
Lessons Learnt
So long and thanks for all the fish!
Why do we want HT?
• The short answer
  – To perform many analyses efficiently
• The long answer
  – To run multi-process jobs
      Evolutionary bootstraps
      Docking
      Image processing...
  – To run many processes
      High number of users
      High number of problems
        • Modelling
        • Function prediction
        • Structure prediction...
GROCK goal
• Why do we want High-Throughput docking?
  ● find best matches between two molecular structures
  ● for a probe molecule against all molecules in a database
  ● drug against protein
    ● Identify drug function, predict secondary effects
  ● protein against proteins
    ● Identify protein interactions, build interaction networks
  ● protein against drugs
    ● Identify candidate drugs for therapy
  ● Beyond a single organism
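
The screening idea on this slide can be sketched in a few lines. This is only an illustration, assuming a hypothetical `toy_score` function that stands in for a real docking program and a hypothetical dictionary that stands in for a molecular database; none of these names come from GROCK itself.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a docking program: a higher score is a better match.
# A real run would call an external docker instead of this toy function.
def toy_score(probe: str, target: str) -> float:
    return float(len(set(probe) & set(target)))

def screen(probe: str,
           database: Dict[str, str],
           score: Callable[[str, str], float] = toy_score,
           keep: int = 10) -> List[Tuple[float, str]]:
    """Score one probe against every database entry, return the `keep` best."""
    scored = ((score(probe, structure), name)
              for name, structure in database.items())
    return heapq.nlargest(keep, scored)

if __name__ == "__main__":
    # Hypothetical three-entry database, for illustration only.
    db = {"protA": "ACDE", "protB": "XYZ", "protC": "ACXX"}
    for value, name in screen("ACD", db, keep=2):
        print(name, value)
```

Swapping the direction of the search (drug against proteins, protein against drugs) only changes what is used as probe and what is used as database; the loop stays the same.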
So, what? Is it any good?
To tell you the truth:
In and of itself
it is of limited interest
Beg to disagree
Pharmaceutical companies have been using something like 3D-QSAR for years
With considerable success
Come on! Do I need to tell you this? Really?
• You should never blindly trust a computer.
  – Predictions must be verified
  – Predictions must be put in perspective
  – Predictions are but a small part of a larger protocol
• It is difficult to get access to pharmacological data
  – Unless you are a Pharma
• GROCK should be part of a larger ensemble
GROCK in context
• Predicting protein interaction networks
  – HT protein interaction predictions (HT-GROCK? Whoa!)
  – Experimental validation
      Proteomic analysis of experimental results
  – Systems Biology modelling
  – Analyze macromolecular assemblies (e.g. 3D-EM)
• Predicting new drugs
  – Build protein models / Analyze protein structure
  – Identify putative targets (3D-QSAR, GROCK, WISDOM)
  – Screen using QSAR
  – Predict possible effects (GROCK, HT-GROCK? Re-Whoa!)
  – Experimental validation
Attacking current needs
• GROCK is a tool that makes 3D molecular screening:
  ● Easy, through a simple, intuitive web interface
  ● More reliable than pharmacophores: uses 3D docking methods
  ● Versatile: uses standard software and data
  ● Efficient: thanks to the Grid (EGEE)
  ● Integrable in other programs as a Web Service (SOAP or XML-RPC)
  ● And is GPL!
A Real Time example
• Just for fun: let's run a screening of aspirin against a small test database
  ● Connect to GROCK server
  ● Upload aspirin
  ● Select options
  ● Run
GROCK: match explorer
For each pair
● show 10 best
● 3D coords
● PNG
● JPEG
● PS
● PDF
● VRML1
● VRML2
● Jmol
Aspirin (Acetyl salicylic acid)
● Induces its effect through phospholipase A2
● Which is not on the search subset itself (sic)
● But has many other effects
  ● on Protein G signalling
  ● modulates hormone stimulated cyclic AMP production
  ● protects against neurotoxicity
  ● is used in dyslipidaemias
  ● affects pulmonary surfactant
  ● etc... (check PubMed).
Caveats
● Molecular databases are noisy
  – Plenty of room for enhancement
  – ...by Biology/Chemistry Structuralists
● Meaningless molecules are included
  – E.g. irrelevant molecules from uninteresting organisms
  – Data reduction by representative clustering
● Meaningful molecules may be excluded
  – E.g. by substitution of a relevant protein by an irrelevant relative
● 3D matching is approximate
  – E.g. meaningful info not included (like water or ion molecules)
● Users MUST exercise thoughtful criticism
  – Just like with any other theoretical tool
Next: Architecture
✔ GROCK: HT docking on the Grid
GROCK architecture
GROCK as Web Service
Lessons learnt
Thanks
So long and thanks for all the fish!
GROCK: architecture
• Design:
  – User
  – Web Server
  – Web service
  – Grid front-end
  – Grid back-end
• Advantages:
  – Secure
  – Fail safe
  – Efficient
  – GENERIC
• To be done:
  – Make restartable
      Avoiding “Death eaters”
GROCK design
• Command line application
• WS wrapper
• WWW interface
• Provision for easy expansion
  – Plugin mechanism to add new databases (PDB, HIC-UP, ZINC)
  – Plugin mechanism to add new methods (GRAMM, 3D-DOCK)
  – Well defined plugin interfaces (roll your own)
• GROCK builds on other tools
  – Result browser relies on remote WS for generating output
  – Generic docking methods
• GROCK may be used to build other tools
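
One way to picture the plugin mechanism is a pair of registries keyed by name. The sketch below is an assumption about how such a design could look, not GROCK's actual interfaces: the registry names, the `register` decorator, and the `toy-db` and `toy-dock` plugins are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical plugin interfaces (not GROCK's real ones):
#   a database plugin yields (name, structure) pairs,
#   a method plugin scores a probe against one structure (higher is better).
DatabasePlugin = Callable[[], Iterable[Tuple[str, str]]]
MethodPlugin = Callable[[str, str], float]

DATABASES: Dict[str, DatabasePlugin] = {}
METHODS: Dict[str, MethodPlugin] = {}

def register(registry: dict, name: str):
    """Return a decorator that stores a plugin in `registry` under `name`."""
    def decorator(plugin):
        registry[name] = plugin
        return plugin
    return decorator

@register(DATABASES, "toy-db")
def toy_db() -> Iterable[Tuple[str, str]]:
    # Toy entries standing in for a real molecular database.
    yield "protA", "ACDE"
    yield "protB", "XYZ"

@register(METHODS, "toy-dock")
def toy_dock(probe: str, target: str) -> float:
    # Toy score: number of shared characters, standing in for a real docker.
    return float(len(set(probe) & set(target)))

def run(probe: str, db_name: str, method_name: str) -> List[Tuple[float, str]]:
    """Look both plugins up by name and score the probe against every entry."""
    entries = DATABASES[db_name]()
    score = METHODS[method_name]
    return sorted(((score(probe, s), n) for n, s in entries), reverse=True)
```

Adding a new database or docking method then means writing one function and registering it; `run` does not change.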
Next: WS
✔ GROCK: HT docking on the Grid
✔ GROCK architecture
GROCK as Web Service
Lessons learnt
Thanks
So long and thanks for all the fish!
GROCK as a Web Service
– Callable using SOAP or XML-RPC
– Provides its own description and WSDL when invoked with no parameters
    User-friendly, human readable
– Provides meta-data about itself
    Source code
    Usage info
    Bibliography
– Job monitoring
    Asynchronous Web Service
    Dynamic
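
The "describe yourself when invoked with no parameters" behaviour can be illustrated with Python's standard XML-RPC modules. This is a sketch under assumptions: the method name `screen`, the `DESCRIPTION` fields, and the returned values are hypothetical, and the calls are routed through the dispatcher in-process (via its `_marshaled_dispatch` helper) so that no network or real GROCK server is needed.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher

# Hypothetical service metadata, for illustration only.
DESCRIPTION = {
    "name": "toy-docking-service",
    "usage": "screen(probe, database)",
}

def screen(probe: str = "", database: str = ""):
    """With no parameters, return the service description instead of running."""
    if not probe:
        return DESCRIPTION
    return {"status": "accepted", "probe": probe, "database": database}

dispatcher = SimpleXMLRPCDispatcher(allow_none=True)
dispatcher.register_function(screen, "screen")

def call(method: str, *params):
    """Round-trip one XML-RPC call through the dispatcher, without a network."""
    request = xmlrpc.client.dumps(params, methodname=method)
    response = dispatcher._marshaled_dispatch(request.encode("utf-8"))
    (result,), _ = xmlrpc.client.loads(response)
    return result

if __name__ == "__main__":
    print(call("screen"))                     # the self-description
    print(call("screen", "aspirin", "test"))  # a normal request
```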
An asynchronous WS
When invoked, GROCK returns an opaque key that may be used to query it for status and output info:
Keys are generated at random with enough entropy to make them difficult to guess
The key is actually a ‘session ID’ that uniquely identifies a given job request in the file store.
GROCK uses the key to retrieve job status and output
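
The key-based protocol described on this slide can be sketched as follows. It is a minimal illustration, assuming an in-memory dictionary in place of the file store and hypothetical function names (`submit`, `finish`, `status`, `output`); the key length is also an assumption.

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory stand-in for the file store mentioned above.
JOBS: Dict[str, Dict[str, Optional[str]]] = {}

def submit(probe: str) -> str:
    """Register a job and return an opaque, hard-to-guess key."""
    key = secrets.token_urlsafe(32)  # 32 random bytes from the OS generator
    JOBS[key] = {"probe": probe, "status": "queued", "output": None}
    return key

def finish(key: str, result: str) -> None:
    """Called by the back-end when the job is done."""
    JOBS[key]["status"] = "done"
    JOBS[key]["output"] = result

def status(key: str) -> str:
    """Report job status; unknown keys reveal nothing about other jobs."""
    job = JOBS.get(key)
    return job["status"] if job else "unknown"

def output(key: str) -> Optional[str]:
    """Return the job output, or None if it is missing or not ready."""
    job = JOBS.get(key)
    return job["output"] if job else None
```

The caller returns immediately after `submit` and polls `status` with the key later, which is what makes the service asynchronous.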
Next: Lessons Learnt
✔ GROCK: HT docking on the Grid
✔ GROCK architecture
✔ GROCK as Web Service
Lessons learnt
Thanks
So long and thanks for all the fish!
Future directions
• Add support for additional docking methods
  – DOCK5 (MPI), AutoDock, others
• Add support for other databases
  – HIC-Up
  – ZINC subsets
• Exploit Grid distributed storage system
  – Needed for truly massive jobs (e.g. drug screening)
• Apply architecture to other problems (evolution, 3D reconstruction, high-throughput *)
Next steps
• Extend pharmainformatics work
  – Molecular modelling (YaMI: MODELLER)
      Already on its way
  – Molecular Dynamics (AMBER, TINKER, NAMD)
      In collaboration with Raul Isea (RIB), Paulino Gomez-Puertas (CBM)...
  – Cheminformatics (MPQC, NWChem, Car-Parrinello, DFT)
      If still needed
• Extend interactions work
  – 3D-EM analysis of macromolecular assemblies (analysis restarted in February 2006)
  – Xmipp (in-house open source package)
  – In collaboration with 3D-EM NoE
  – Start easy, with the heaviest and most used applications
Lessons learned
• YaMI v7 (Yet another Modeller Interface)
• GridGRAMM
  – Running a single process takes longer
  – But may be worth the wait
  – Don't let anybody mislead you: The Grid is a source of raw computing power. Dot.
• HT Docking
  – All you need is a tight loop, et voilà!
  – Really!
  – However...
Component Based Architecture
• Extending GROCK to use additional dockers
• Extending GROCK to use distributed storage
• Extending GROCK to run in non-EGEE environments
• Shows the relevance of choosing appropriate interfaces
• GROCK, YaMI, GridGRAMM themselves require NEW, well thought out interfaces
• Job execution DOES NOT
  – DRMAA-WG is a standard for a batch submission API
  – Joined DRMAA-WG in February 2006
  – Goal: Define a DRMAA binding for PHP
  – Build a DRMAA binding for EGEE
Our Advice
• Program using a standard API: DRMAA
  – Do it once, run on SGE, Condor, GridWay, etc...
• Use third party work whenever possible
  – To save effort and increase portability
  – Remember: Don't overdo it! KISS!
• Define plugin interfaces (and document them)
  – For extensibility
• Define WS invocation interface (and document it)
  – For integration into other frameworks
• And finally program a trivial loop (always document)
  – Don't be too worried about performance
  – It will be simple, fast and short
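
The advice above (program once against a job API, then write a trivial loop) can be sketched like this. The `JobSession` interface is hypothetical: it is only loosely modeled on the run/wait pair of a DRMAA-style batch API and is not the real DRMAA binding. `LocalSession` is an assumed test backend that runs each job as a local subprocess; a Grid or cluster backend would implement the same two methods.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List

class JobSession(ABC):
    """Hypothetical batch API, loosely modeled on a DRMAA-style run/wait pair."""

    @abstractmethod
    def run_job(self, command: List[str]) -> str:
        """Submit one job and return its identifier."""

    @abstractmethod
    def wait(self, job_id: str) -> int:
        """Block until the job ends and return its exit code."""

class LocalSession(JobSession):
    """Assumed backend for testing: runs jobs as local subprocesses."""

    def __init__(self) -> None:
        self._procs: Dict[str, subprocess.Popen] = {}

    def run_job(self, command: List[str]) -> str:
        proc = subprocess.Popen(command)
        job_id = str(proc.pid)
        self._procs[job_id] = proc
        return job_id

    def wait(self, job_id: str) -> int:
        return self._procs.pop(job_id).wait()

def run_all(session: JobSession, commands: Iterable[List[str]]) -> List[int]:
    """The trivial loop: submit everything, then collect exit codes in order."""
    job_ids = [session.run_job(command) for command in commands]
    return [session.wait(job_id) for job_id in job_ids]

if __name__ == "__main__":
    jobs = [[sys.executable, "-c", "pass"] for _ in range(3)]
    print(run_all(LocalSession(), jobs))
```

Because `run_all` only sees the abstract interface, moving from a workstation to another scheduler means supplying a different `JobSession`, not rewriting the loop.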
Current work and next steps
• Build DRMAA API for EGEE
  – So that next steps are easier
• Think about best architecture for data distribution
  – So it is intuitive, effective and simple
• Go ahead
  – Molecular modelling
  – Molecular dynamics
  – Molecular reconstruction
  – Macromolecular assembly analysis by 3D-EM
  – Cheminformatics (if not done yet)
Next: Middleware classes
✔ GROCK: HT docking on the Grid
✔ GROCK architecture
✔ GROCK as Web Service
✔ Lessons learnt
Thanks
So long and thanks for all the fish!
We wish to thank
• YOU ALL
  – for being here, your help, encouragement, feedback and support
  – and not falling asleep
• The TEAM at CNB
  – Biocomputing
      José M. Carazo, Carlos Pérez-Roca, Enrique de Andrés, Natalia Jiménez, Sjors Schëres, Alfredo
  – Bioinformatics
      José R. Valverde, David J. García
• THE EU for EGEE
Next: That's all folks!
✔ GROCK: HT docking on the Grid
✔ GROCK architecture
✔ GROCK as Web Service
✔ PHP middleware
✔ LCG middleware
✔ Thanks
So long and thanks for all the fish!
Any questions?