Post on 26-Mar-2015
Future Challenges in Bioinformatics
RRXPharma
Introduction
• Introduction: How RRX got involved …
• Life sciences context: How bioinformatics came to be important…
• The past half century: How bioinformatics has “evolved”…
• Categories of Bioinformatics Tools
• Why We Need Supercomputers
• Software Development Issues
• Future Challenges
• Tools for Biotech Projects
• Summary
How RRX got involved …
• Submitted a Canada Foundation for Innovation (CFI) proposal for Advanced Bioinformatics Collaborative Computing (ABioCC)
• Developed an SVG-based visualization front end
• Paper will be presented at SVG Open 2003 in Vancouver on July 17th
How bioinformatics came to be important…
• After the structure of DNA was reverse engineered with X-ray diffraction in 1953, focus shifted to nucleic acid sequence analysis
• DNA/RNA/protein sequence data accumulated, with computer programs used for storage and analysis
• Bioinformatics algorithms in development for the last half century came into widespread use by researchers
• The ability to compare sequences created a homology context for unknown sequences of interest leading to advances…
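As an illustration of what "comparing sequences" means computationally, here is a minimal sketch of a Needleman-Wunsch global alignment score. The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example, and tools such as BLAST and FASTA use faster heuristic methods rather than this exact procedure.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best global alignment score of a and b (Needleman-Wunsch).

    Only the previous row of the dynamic-programming table is kept,
    so memory use is proportional to len(b).
    """
    # Row 0: aligning an empty prefix of a against j characters of b costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: i characters of a against nothing
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap       # a[i-1] aligned to a gap
            gap_in_a = curr[j - 1] + gap   # b[j-1] aligned to a gap
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

A higher score between an unknown sequence and a characterized one suggests, but does not prove, a homologous relationship.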
• Improved sequencing technology enabled the complete deciphering of the human genome >>> 1999
• About 3.18 billion base pairs
• Celera used 300 PE Biosystems ABI Prism 3700 DNA Analysers
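To give a feel for the data volume behind the 3.18 billion base pair figure, the following sketch estimates raw storage for a single copy of the sequence. The two encodings (one byte per base as plain text, two bits per base when packed) are assumptions for illustration; real files also carry headers, quality scores and redundant reads.

```python
BASE_PAIRS = 3_180_000_000  # figure quoted on the slide


def genome_size_bytes(base_pairs: int, bits_per_base: int) -> int:
    """Bytes needed to store base_pairs bases at bits_per_base bits each, rounded up."""
    return (base_pairs * bits_per_base + 7) // 8


if __name__ == "__main__":
    as_text = genome_size_bytes(BASE_PAIRS, 8)   # one ASCII character per base
    packed = genome_size_bytes(BASE_PAIRS, 2)    # A, C, G, T packed into 2 bits
    print(f"plain text: {as_text / 1e9:.2f} GB")
    print(f"2-bit packed: {packed / 1e6:.0f} MB")
```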
How bioinformatics has “evolved”…
• Central dogma of molecular biology
– DNA sequences are transcribed into mRNA sequences
– mRNA sequences are translated into protein sequences, which fold into 3D structures with functions
– Functions are statistically selected for survival >>> affecting the prevalence of the underlying DNA sequences in a population
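The first two steps of the central dogma can be sketched in a few lines of code. This is a simplified illustration: it assumes the input is the coding strand (so transcription reduces to replacing T with U), uses the standard genetic code, and stops at the first stop codon. The function names are made up for the example.

```python
# Standard genetic code, listed with bases in the order U, C, A, G
# for the first, second and third codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: amino
    for (first, second, third), amino in zip(
        ((x, y, z) for x in BASES for y in BASES for z in BASES),
        AMINO_ACIDS,
    )
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> one-letter protein sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[i:i + 3]]
        if amino == "*":  # stop codon
            break
        protein.append(amino)
    return "".join(protein)


if __name__ == "__main__":
    dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
    print(translate(transcribe(dna)))
```

The folding and selection steps have no comparably short sketch; they are where the structure prediction and modelling tools come in.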
• This created a supporting information flow
– Organization and control of genes in the DNA sequence
– Identification of transcriptional units in the DNA sequence
– Prediction of protein structure from sequence
– Analysis of molecular function
• Another covariant information flow was created based on the scientific method
– Create hypothesis wrt biological activity
– Design experiments to test the hypothesis
– Evaluate resulting data for compatibility with the hypothesis
– Extend/modify hypothesis in response
• IT used to handle explosion of data from high throughput techniques, too complex for manual analysis
– X-ray diffraction
– Automated DNA sequencing
  • Amersham Biosciences
  • Applied Biosystems
  • Beckman Coulter
  • LI-COR
  • SpectruMedix Corp.
  • Visible Genetics Corp.
– Microarray expression analysis
• Rapid emergence of 3D macromolecular structure databases
– New sub discipline: structural bioinformatics
  • Atomic and sub cellular spatial scales
– Representation/physics
– Storage/retrieval/source data correlation/interpretation
– Analysis/simulation
– Display/visualization
Categories of Bioinformatics Tools…
• Databases >>> search/compare
• Sequence Analysis - Clusters
• Genomics
• Phylogenetics
• Structure Prediction
• Molecular Modelling
• Microarrays
• Packages, Misc Apps, Graphics, Scripts
Database >>> search/compare
• aceperl
• BLAST
• Blastall
• Blastpgp
• BLAT
• Blimps
• Entrez
• FASTA
• fastacmd
• formatdb
• getz
• HMMER
• IMPALA
• InterProScan
• PHI-BLAST
• ProSearch
• PSI-BLAST
• PSI-BLASTN
• Sequin
• Swat
• tace
• xace
Sequence Analysis
• Artemis
• Bl2seq
• BLAST
• Clustal W, X
• consed/autofinish
• Cross_match
• Dotter
• EMBOSS
• FASTA
• Glimmer
• HMMER
• InterProScan
• MEME
• View
• Paracel Transcript Assem
• Phrap
• Phred
• Primers
• ProSearch
• Readseq2
• Rnabob
• RRTree
• SAPS
• seals
• Seqsblast
• STADEN
• Swat
• T-Coffee
Genomics
• Calc_primers
• Cross_match
• FPC
• GENSCAN
• Glimmer
• Image
• Mzef
• Phrap
• Phred
• STADEN
• Swat
• tace
• tace_celegans
• tRNAscan-SE
• xace
• xace_celegans
Phylogenetics
• Clustal W
• Clustal X
• MOLPHY
• MrBayes
• PHYLIP
• RRTree
• T-Coffee
• TREE-PUZZLE
• TreeViewX
Structure Prediction
• EMBOSS
• MEME
• Modeller
• Mzef
• PHI-BLAST
Molecular Modelling
• Modeller
– homology modeling from an alignment of the sequence to be modeled with known related structures
• Rasmol
– a molecular graphics program intended for 3D visualisation of proteins and nucleic acids
• Raster3D (publishing images)
• X3DNA
– analyzing and rebuilding 3D structures
Microarrays
• Dapple
– a program for quantitating spots on a two-colour DNA microarray image
• OligoArray
– a program that computes gene specific oligonucleotides that are free of secondary structure for genome-scale oligonucleotide microarray construction
Packages, Useful Scripts/Source Code, Graphics, PERL
• BioPerl
• BioJava
• boxshade
• mvscf
• seg
• Split_fasta
• POV-Ray
• Raster3D
• MOLPHY
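Small utility scripts of this kind are often only a few dozen lines. The sketch below shows one way a FASTA-splitting helper could work: it parses records from any iterable of lines and groups them into fixed-size chunks, for example to spread a search across cluster nodes. It is a hypothetical illustration, not the actual Split_fasta script, and the function names are assumptions.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)


def split_records(records: Iterable[Record], per_chunk: int) -> Iterator[List[Record]]:
    """Group records into lists of at most per_chunk records each."""
    chunk: List[Record] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


if __name__ == "__main__":
    example = [">seq1 demo", "ACGT", "ACGT", ">seq2", "GGCC", ">seq3", "TTAA"]
    for number, group in enumerate(split_records(read_fasta(example), 2), start=1):
        print(number, [header for header, _ in group])
```

Because `read_fasta` accepts any iterable of lines, it can be fed an open file handle directly without loading a large database into memory.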
Why We Need Supercomputers…
• Some commercial packages run on “supercomputers”
– Accelrys: modeling and simulation
  • Materials Studio
  • Cerius2 (SGI Unix only)
    – Homology modeling to catalyst design
  • Insight II (SGI Unix only)
    – 3D graphical environment for physics based molecular modeling
  • Catalyst (high end Unix servers)
    – database management valuable in drug discovery research
  • QUANTA (high end Unix servers)
    – crystallographic 2D/3D protein structure solution
  • Discovery Studio
• Supercomputer advantages
– Multiple processors
– Large shared memory
– Handle very large files
– Large/fast RAID arrays
– Terabyte tape backup systems
– Power backup systems
– High performance networks
• Common bioinformatics requirements
– Computationally intensive tasks
– Large memory models
– Intensive/complex database searches
– Large experimental database sets
– Large derived database sets
– Large persistent intermediate data structures
– Teamwork data sharing and visualization
• Network requirements
– Driving gigE/10gigE NICs
  • Moving large files/data sets rapidly
  • Visualization streams/Access GRID
  • Coordinating Cluster/GRID computing
  • Dynamic provisioning of light paths
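A quick calculation shows why link speed matters for moving large data sets. The sketch below assumes an idealized link running at its full nominal rate unless an efficiency factor is given; the 100 GB data set size is an arbitrary example, not a figure from the talk.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float,
                     efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link, given the usable fraction of its bandwidth."""
    return size_bytes * 8 / (link_bits_per_second * efficiency)


GIGE = 1e9       # 1 Gbit/s
TEN_GIGE = 1e10  # 10 Gbit/s

if __name__ == "__main__":
    data_set = 100e9  # hypothetical 100 GB data set
    for name, rate in (("gigE", GIGE), ("10gigE", TEN_GIGE)):
        minutes = transfer_seconds(data_set, rate) / 60
        print(f"{name}: {minutes:.1f} minutes")
```

Real transfers are slower than this lower bound because of protocol overhead, disk speed and competing traffic.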
[Figure: layered architecture diagram. Recoverable labels: Research Objects; Object Messaging System; Java Server Page Handler; Java Servlet Handler; Host Java VM(s); Database Server(s); DAO; IRR Server Wrapper; Request Analyzer; RtConfig Handler; Whois Handler; GRID Cluster Network Object Model; GRID Resource Coordination Model; N x N OXC; Router OS; OXC MIB(s)/Session(s); Device Config File(s); Research Lab; Collaboration Object Models; transport layer labels Multicast, OBGP, MPLS, IPv6, GRID]
Software Development Issues…
• Collaboration contexts/barriers
– Team work … collaboration spaces
– Standards development … DTDs
– Integration issues…
  • experimental data to homology to 3D model
  • platform issues…
  • network issues: 9k MTU (jumbo frames)
– Licensing issues: public vs. private
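The jumbo frame point can be made concrete with a little arithmetic. The sketch below compares a standard 1500-byte MTU with a 9000-byte MTU, assuming TCP over IPv4 without options (40 header bytes per packet) and 38 bytes of Ethernet framing per frame (preamble, header, frame check sequence and inter-frame gap). These overhead figures are assumptions for a typical setup; VLAN tags or TCP options would change them slightly.

```python
ETHERNET_OVERHEAD = 38  # bytes: preamble/SFD 8 + header 14 + FCS 4 + inter-frame gap 12
TCP_IP_HEADERS = 40     # bytes: IPv4 header 20 + TCP header 20, no options


def payload_efficiency(mtu: int) -> float:
    """Fraction of on-the-wire bytes that carry application data."""
    return (mtu - TCP_IP_HEADERS) / (mtu + ETHERNET_OVERHEAD)


def frames_needed(size_bytes: int, mtu: int) -> int:
    """Number of full-size frames needed to carry size_bytes of application data."""
    payload = mtu - TCP_IP_HEADERS
    return -(-size_bytes // payload)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, f"{payload_efficiency(mtu):.4f}", frames_needed(10**9, mtu))
```

The gain in raw efficiency is modest; the larger benefit is the roughly sixfold drop in frames, and therefore in per-packet processing, on the hosts at each end. Jumbo frames only help when every device on the path supports the larger MTU, which is why they show up here as an integration issue.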
Future Challenges…
• Creating developer infrastructure for building up structural models from component parts …
– components from macromolecule libraries ported to object models
• Understanding the design principles of systems of macromolecules and harnessing them to create new functions …
– specialized molecular machines
• Learning to design drugs efficiently and cost effectively based on knowledge of the target …
– target generation automation
– validation automation
• Development of enhanced simulation models that give insight into context based function from knowledge of structure …
– possible use of artificial intelligence to limit scope of search
How Tools might be used for Industry Biotech Projects
Summary
• Bioinformatics
– well positioned to assist with application development
– exploring novel bioinformatics software development
– proceeding with supporting Access GRID and optical switching technology
Questions/Comments… ? ;-)