Bioinformatics – The Premier Biological Research Tool?

Warren Kaplan

Peter Wills Bioinformatics Centre, Garvan Institute of Medical Research



The Structure of Most Biological Projects

[Diagram: Bench Scientist, Bench Scientist, Bioinformatics]

Can we reverse the status quo?

[Diagram: Bioinformatics, Bioinformatics, Bench Scientist]

TIME Magazine's Invention of the Year: Apple iTunes

“One of the major goals of Bioinformatics is to understand the relationship between amino acid sequence and three-dimensional structure in proteins. If this relationship were known, then the structure of a protein could be reliably predicted from the amino acid sequence.”

David W. Mount

Structure of an interesting machine

K+ channel from MacKinnon!

1PRC

Current Status for Protein Structures

23,382 experimentally solved structures in the PDB

~7,000 unique structures in the non-redundant set (NRPDB)

Mainly soluble, easily crystallized proteins

Compared to GenBank, we have a long way to go

Comparative Homology Modelling

Threading/Fold Recognition

Ab initio folding (Anfinsen)

IBM Blue Gene

~65,000 processors

200,000 gigabytes of RAM

10^15 FLOPS

Accurate force field
• loops
• misfolded protein diseases
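A force field is the energy function that an ab initio folding simulation evaluates at every step, so its accuracy limits what even a machine like Blue Gene can get right. As a minimal sketch of one ingredient only, the non-bonded 12-6 Lennard-Jones term is shown below. The epsilon and sigma values are placeholder reduced units, not parameters from any real force field, and real force fields also include bonded terms, electrostatics, solvent and distance cutoffs.

```python
import math

def lennard_jones(r: float, epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """12-6 Lennard-Jones energy for one atom pair at separation r (reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def pairwise_lj_energy(coords: list[tuple[float, float, float]],
                       epsilon: float = 1.0, sigma: float = 1.0) -> float:
    """Sum the LJ term over all atom pairs: O(n^2), no cutoff, no bonded terms."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total += lennard_jones(math.dist(coords[i], coords[j]), epsilon, sigma)
    return total

# The pair energy is zero at r = sigma and reaches its minimum, -epsilon, at r = 2**(1/6) * sigma.
print(lennard_jones(1.0))                      # 0.0
print(round(lennard_jones(2 ** (1 / 6)), 6))   # -1.0
```

The double loop over all pairs is what makes force-field evaluation expensive: the cost grows with the square of the atom count, which is part of why folding simulations need so many processors.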

Experimentalists know about this:

• Postdoc

• Protein Structure Initiative (PSI)

3 Year Runtime!

From a folding perspective, we still need the bench scientists!

But if protein structures contain so much information, how can this be?

225 Structures of Unknown Function

1FUX: 1.8 Å resolution

INFERENCE

How would we typically go about assigning function to a structure?

Experimental validation, e.g. functional assays

Bioinformatics
• BLAST
• VAST/CE

While these tools have failed for these 225 structures, historically we have been very successful in seeing what was not clearly evident at first.
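BLAST does not fill in a full dynamic-programming matrix; it seeds on short word hits and extends them, trading some sensitivity for speed. The exact local-alignment recurrence it approximates is Smith-Waterman, sketched below. The scoring scheme (match +2, mismatch -1, linear gap -2) is a hypothetical choice for illustration, not BLAST's defaults, which use substitution matrices such as BLOSUM62 for proteins.

```python
# Minimal Smith-Waterman local alignment score (linear gap penalty).
# Scoring values are hypothetical; real tools use matrices such as BLOSUM62.

def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # a[i-1] aligned to a gap
            left = curr[j - 1] + gap   # b[j-1] aligned to a gap
            # Clamping at 0 is what makes the alignment local: a poor prefix is dropped.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best

print(smith_waterman("TTACGTAA", "GGACGTCC"))  # 8: the shared ACGT, 4 matches x 2
```

A score near zero, as for two unrelated sequences, is the sequence-level version of the failure described above: the method finds nothing to transfer function from.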

"Is ditchwater dull? Naturalists with microscopes have told me that it teems with quiet fun."
G. K. Chesterton, The Spice of Life, 1936

‘Shrek, that’s nothing but a bunch of little dots’

Many of us also seem to spend a lot of time looking at ‘a bunch of dots’

That’s not to say that we can’t drive biological projects

Glycine Receptor

Absalom, N. L., Lewis, T. M., Kaplan, W., Pierce, K. D., & Schofield, P. R. (2003). Role of charged residues in coupling ligand binding and channel activation in the extracellular domain of the glycine receptor. J Biol Chem.

Binding Partners

Extensive annotation

In silico site-directed mutagenesis

Pattern matching with error tolerance

siRNA

gi|942577|gb|U22445.1|MMU22445 Mus musculus serine/threonine kinase (Akt2) mRNA, complete cds
For triplet filtering there were 34 matches, 0 were removed, there are now 34 matches
For percentage filtering there were 34 matches, 31 were removed, there are now 3 matches
331 AAGGAGAGGCCCGAGGCCCC 351 75.0 __**_*_******_****** 80.0
1384 AAGCAGAGGCTCGGCGGAGG 1404 70.0 __**_*_***_******_** 80.0
547 AAGCAGCGGGGCCCAGGTGA 567 70.0 __**_*********_**_*_ 90.0
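In the siRNA output above, the 75.0 and 70.0 columns match each 20-mer's GC percentage (AAGGAGAGGCCCGAGGCCCC has 15 G/C bases out of 20), and the `*`/`_` strings mark G/C versus A/T positions. As a minimal sketch, assuming error tolerance means a Hamming-distance (substitution-only) mismatch limit and using a hypothetical "AA plus 18 wildcards" pattern, the matching step and those two GC statistics could look like this. The tool's triplet and percentage filters are not described, so they are left out; this is not the original program's code.

```python
# Hypothetical sketch: mismatch-tolerant (Hamming distance) pattern matching on DNA,
# plus the GC statistics shown in the siRNA output. Not the original tool's code.

def mismatches(pattern: str, window: str) -> int:
    """Count positions where window differs from pattern; 'N' in pattern matches any base."""
    return sum(1 for p, w in zip(pattern, window) if p != "N" and p != w)

def find_approx(seq: str, pattern: str, max_mismatches: int) -> list[tuple[int, str]]:
    """Return (1-based start, window) for every window within max_mismatches of pattern."""
    k = len(pattern)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if mismatches(pattern, window) <= max_mismatches:
            hits.append((i + 1, window))
    return hits

def gc_percent(window: str) -> float:
    """Percentage of G or C bases in the window."""
    return 100.0 * sum(base in "GC" for base in window) / len(window)

def gc_mask(window: str) -> str:
    """'*' marks a G/C position, '_' marks an A/T position."""
    return "".join("*" if base in "GC" else "_" for base in window)

# Hypothetical siRNA-style pattern: "AA" followed by 18 wildcard positions (a 20-mer).
pattern = "AA" + "N" * 18
seq = "TTAAGGAGAGGCCCGAGGCCCCTT"  # toy sequence containing one 20-mer from the output above
for start, window in find_approx(seq, pattern, max_mismatches=0):
    print(start, window, gc_percent(window), gc_mask(window))
# prints: 3 AAGGAGAGGCCCGAGGCCCC 75.0 __**_*_******_******
```

Hamming distance tolerates substitutions only; tolerating insertions and deletions as well would need an edit-distance (dynamic-programming) comparison instead of a simple position-by-position count.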

PyView


Lai, P., Kaplan, W., Church, W. B., & Wong, R. K. (2004). Informative 3D Visualization of Multiple Protein Structures. The Second Asia-Pacific Bioinformatics Conference (APBC2004), Dunedin, New Zealand. Conferences in Research and Practice in Information Technology (Conf. Res. Prac. Inf. Tech.).

But how are we doing on the communication front?

Why is the protein folding problem so attractive?

2,000 to 2,500 papers per year

Fundamentally a very easy problem to define

Maybe we aren’t spending enough time understanding what the real problems are

Can we rely solely on biologists to write specifications?

Why are there cases where someone spends 6 months learning about microarrays before realising that the data simply represent genes and their associated levels of expression?
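How simple that underlying model is can be shown directly. As a minimal sketch, assuming a two-channel (two-colour) array design where each spot yields a sample and a reference intensity, processed microarray data reduce to a table of gene to log2 expression ratio. The class and field names, the averaging of replicate spots, and the example values are all hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Spot:
    """One hypothetical two-channel microarray spot after background correction."""
    gene: str
    sample: float     # intensity in the sample channel (e.g. Cy5)
    reference: float  # intensity in the reference channel (e.g. Cy3)

    def log2_ratio(self) -> float:
        # Positive: higher expression in the sample; negative: lower.
        return math.log2(self.sample / self.reference)

def expression_by_gene(spots: list[Spot]) -> dict[str, float]:
    """Collapse spots to gene -> mean log2 ratio (replicate spots are averaged)."""
    grouped: dict[str, list[float]] = {}
    for spot in spots:
        grouped.setdefault(spot.gene, []).append(spot.log2_ratio())
    return {gene: sum(vals) / len(vals) for gene, vals in grouped.items()}

spots = [Spot("geneA", 800.0, 200.0), Spot("geneA", 400.0, 200.0), Spot("geneB", 100.0, 200.0)]
print(expression_by_gene(spots))  # {'geneA': 1.5, 'geneB': -1.0}
```

The log2 scale is used so that a two-fold increase and a two-fold decrease are symmetric (+1 and -1) around zero.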

Specifications can’t be written in a day, but being close enough to the problem can eliminate the need for extensive specs

Great software in the past was written by an individual or a few individuals:

Stephen Wolfram wrote Mathematica in a summer

Ken Thompson wrote UNIX

We have to be true experts in the tools we use and develop, but we also need to be intimately familiar with the biology that we’re hoping to contribute to.

This is the great challenge

The real Apple story