Structural Genomics and the Protein Folding Problem
George N. Phillips, Jr., University of Wisconsin-Madison
February 15, 2006
From DNA to biological function
• High-throughput DNA Sequencing
• Gene Model
• Functional Assignments
• Basic Understanding/Applications (e.g. therapeutics)
• Structure Determination & Experimental Analysis
• Modeling & Inference
Developing a gene model
• Glimmer (Gene Locator and Interpolated Markov ModelER)
• GlimmerHMM for eukaryotic genomes (more advanced)
Genome sequencing, genome assembly, regulatory elements, identification of ORFs
All but the simplest genomes are works in progress: an estimated 80% of gene models currently contain errors! Comparative genomics should help, as will sequencing of expressed sequence tags (ESTs) and other genomics projects.
Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 21:9 (2005), 1782-88.
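The "identification of ORFs" step can be illustrated with a minimal sketch. It scans only the three forward reading frames for ATG...stop stretches using the standard genetic code's stop codons, and `find_orfs` and its `min_codons` cutoff are hypothetical names chosen for illustration. This is not how Glimmer works (Glimmer scores candidate ORFs with interpolated Markov models), and because eukaryotic genes are interrupted by introns, a plain ORF scan is a much weaker gene model there, which is one reason tools such as GlimmerHMM exist.

```python
# Minimal forward-strand ORF scanner (illustrative sketch, not Glimmer).
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) 0-based, end-exclusive spans of ATG...stop ORFs.

    Only the three forward frames are scanned; a real gene finder would also
    scan the reverse complement. `min_codons` (stop codon included) is a
    hypothetical length cutoff.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through every complete codon in this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # resume looking for the next start codon
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))  # [(2, 14)]
```

The toy input contains a single four-codon ORF (ATG AAA TTT TAG) in the third frame; genuine gene models need far longer cutoffs and statistical scoring to separate real genes from chance ORFs.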
Pfam, many others…
HYSIELNASLLERGV…
HLNIEDNPSCNAMGV…
PLNIELNASLNEPGV…
WERIELNASLNER--…
HQRIEL--SLMMRG-…
The “sequence-space” of proteins
Universe of all protein sequences
PSI-BLAST, HMM
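PSI-BLAST and profile HMMs both build position-specific models from aligned family members. The sketch below computes only the simplest ingredient of such a model, the most frequent residue in each column and the fraction of sequences carrying it, for the five aligned fragments shown on the slide. It ignores the sequence weighting and pseudocounts that real profile methods apply, and `column_profile` is a hypothetical helper, not part of any of these tools.

```python
from collections import Counter

# The five aligned fragments from the slide (shown there as truncated fragments).
ALIGNMENT = [
    "HYSIELNASLLERGV",
    "HLNIEDNPSCNAMGV",
    "PLNIELNASLNEPGV",
    "WERIELNASLNER--",
    "HQRIEL--SLMMRG-",
]


def column_profile(seqs: list[str]) -> list[tuple[str, float]]:
    """For each column, return (most common residue, fraction of all rows carrying it).

    Gaps ('-') are never chosen as the residue but still count in the denominator.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs if s[i] != "-")
        if not counts:  # an all-gap column has no residue to report
            profile.append(("-", 0.0))
            continue
        residue, n = counts.most_common(1)[0]
        profile.append((residue, n / len(seqs)))
    return profile


if __name__ == "__main__":
    profile = column_profile(ALIGNMENT)
    conserved = [i for i, (_, frac) in enumerate(profile) if frac == 1.0]
    print(conserved)  # [3, 4, 8]
    print([profile[i][0] for i in conserved])  # ['I', 'E', 'S']
```

Columns 3, 4 and 8 (0-based) are invariant across all five fragments; conserved positions like these are the kind of signal that profile searches exploit to recognize distant family members.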
PFAM “domains”
Alex Bateman, Lachlan Coin, Richard Durbin, Robert D. Finn, Volker Hollich, Sam Griffiths-Jones, Ajay Khanna, Mhairi Marshall, Simon Moxon, Erik L. L. Sonnhammer, David J. Studholme, Corin Yeats and Sean R. Eddy, Nucleic Acids Research (2004) Database Issue 32:D138-D141
X-ray Laboratory
Crystallography reveals the locations of the electron ‘clouds’ of the atoms, and the polypeptide chain can be traced through space.
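The idea that diffraction data reveal electron density can be illustrated in one dimension. This sketch assumes identical point atoms with unit scattering power at hypothetical fractional positions in a periodic cell, computes their structure factors F(h), and sums the Fourier series back into a density whose peaks sit at the atom positions. It takes a shortcut a real experiment cannot: diffraction measures only the amplitudes |F(h)|, and the phases must be recovered separately (for example, from the anomalous scattering of selenium in Se-Met-labeled protein). The function names are illustrative.

```python
import cmath
import math


def structure_factors(positions: list[float], h_max: int) -> dict[int, complex]:
    """F(h) = sum_j exp(2*pi*i*h*x_j) for identical point atoms at fractional positions x_j."""
    return {
        h: sum(cmath.exp(2j * math.pi * h * x) for x in positions)
        for h in range(-h_max, h_max + 1)
    }


def density(factors: dict[int, complex], x: float) -> float:
    """Fourier synthesis rho(x) = sum_h F(h) exp(-2*pi*i*h*x).

    Here F(-h) is the complex conjugate of F(h), so the sum is real up to rounding.
    """
    return sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items()).real


def density_peaks(factors: dict[int, complex], grid: int = 200) -> list[float]:
    """Grid points that are local maxima above half the global maximum (periodic cell)."""
    xs = [k / grid for k in range(grid)]
    rho = [density(factors, x) for x in xs]
    top = max(rho)
    return [
        xs[k]
        for k in range(grid)
        # rho[k - 1] wraps to the last point when k == 0, matching the periodic cell
        if rho[k] > 0.5 * top and rho[k] > rho[k - 1] and rho[k] >= rho[(k + 1) % grid]
    ]


if __name__ == "__main__":
    atoms = [0.2, 0.55]  # hypothetical two-atom 1D cell
    F = structure_factors(atoms, h_max=10)
    print(density_peaks(F))  # [0.2, 0.55]
```

Truncating the series at `h_max` plays the role of finite resolution: fewer terms give broader, rippled peaks, which is why high-resolution data make chain tracing easier.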
SCOP, CATH
The “fold-space” of proteins
Universe of all protein structures
Murzin et al. http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html
Glimpses of the “fold space” of proteins
Hou, Sims, Zhang, and Kim, PNAS 100:2386 (2003)
Connections between sequence and structure
Universe of sequences Universe of structures
?
At what level of homology can one trust a structural inference?
Redfern, Orengo et al., J. Chromatography B 815:97 (2005)
What is structural genomics?
• Experimental determination of key structures (target selection is a key part of the idea)
• Modeling of family members
• Inferring function (note “infer”)
• Making direct use of the new structures
Protein Sequences and Folds
• ~100,000 protein families cannot be reliably modeled at present (i.e., they share <30% sequence identity over a large fraction of their length with any known structure)
• ~50% of all domain families can be assigned to a structure under CATH
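The 30% identity criterion can be made concrete. The sketch below computes percent identity over the columns where both aligned sequences have a residue, plus the fraction of the query's residues that are aligned, and applies the 30% cutoff. The minimum coverage of 0.8 is a hypothetical stand-in for "a large fraction", and percent identity has several competing definitions (some divide by the shorter sequence length instead), so numbers from different tools may not agree. Both function names are illustrative.

```python
def identity_and_coverage(query_aln: str, template_aln: str) -> tuple[float, float]:
    """Percent identity over gap-free aligned columns, and the fraction of query residues aligned.

    Both strings are rows of the same pairwise alignment, with '-' marking gaps.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(q, t) for q, t in zip(query_aln, template_aln) if q != "-" and t != "-"]
    query_residues = sum(1 for q in query_aln if q != "-")
    if not pairs or query_residues == 0:
        return 0.0, 0.0
    identical = sum(1 for q, t in pairs if q == t)
    return 100.0 * identical / len(pairs), len(pairs) / query_residues


def looks_modelable(query_aln: str, template_aln: str,
                    min_identity: float = 30.0, min_coverage: float = 0.8) -> bool:
    """Apply the 30% identity rule; min_coverage is a hypothetical reading of 'large fraction'."""
    identity, coverage = identity_and_coverage(query_aln, template_aln)
    return identity >= min_identity and coverage >= min_coverage


if __name__ == "__main__":
    # Two of the aligned fragments from the sequence-space slide.
    print(identity_and_coverage("HYSIELNASLLERGV", "PLNIELNASLNEPGV"))  # (66.66666666666667, 1.0)
    print(looks_modelable("ACDEFGHIKL", "ACDEF-----"))  # False: only half the query is aligned
```

The second call shows why coverage matters: a short, perfectly identical match says little about the rest of the protein.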
Protein Structure Initiative (PSI) Mission Statement
“To make the three-dimensional atomic level structures of most proteins easily available from knowledge of their corresponding DNA sequences.”
Generation of new structures
Chandonia and Brenner, Science 311:347 (2006)
Center for Eukaryotic Structural Genomics
Exclusively eukaryotic targets:
• 60% fold-space targets (emphasis on eukaryote-only families)
• 20% disease-relevant
• 20% outreach: targets from the community
The overall goal is to reduce the cost of determining structures of eukaryotic proteins by refining every step in the pipeline.
Supported by the National Institutes of Health; John Markley, PI; George Phillips and Brian Fox, co-PIs
University of Wisconsin’s Center for Eukaryotic Structural Genomics
(~75 total, 3/4 unique)
How does one clone, express, purify, and solve structures of proteins not previously studied?
An industry-style pipeline
• PCR cloning -> DNA
• Flexi® Vector plasmids
• Construct design
• Protein from E. coli cells
• Protein from cell-free
• Screening: yield, MS
• Functional assays
• 1-5 mg scale
• Fluidigm chip crystallization screening (+)
• NMR 15N-1H HSQC or 1H screening (+)
• 10-100 mg scale: 13C,15N for NMR, Se-Met for X-ray
• 2-10 mg scale: 13C,15N for NMR, Se-Met for X-ray
Pipeline details: cell-based and cell-free protein production for X-ray and NMR
Note: project involves sequencing, which aids gene modeling!
Sesame—integrated LIMS in use at CESG
Open access to the public—structures, protocols, reagents, progress… http://www.uwstructuralgenomics.org
Zolnai et al., J. Struct. Func. Genomics 4:11 (2003)
At1g18200
Mis-annotated prior to our work, but structure led to discovery of function.
>>Alignment of GalP_UDP_transf vs 1Z84:A|PDBID|CHAIN|SEQUENCE/15-196
*->kkfsplDhvhrrynpLtlvwilVsphrakRPikqsqsLidlkkeLwq
++ ++ + +r p t +w+ sp+rakRP
1Z84:A|PDB 15 GDSVENQSPELRKDPVTNRWVIFSPARAKRP---------------- 45
gavetpkvptdplhdp.dcysakLcpg........atratgevNPdyest
+ ++k p+ p p++c+ c g++++ ++ r++ ++ P +
1Z84:A|PDB 46 -TDFKSKSPQNPNPKPsSCP---FCIGreqecapeLFRVP-DHDPNWKLR 90
yvLkspkkftndFyalseDnpyikvsvSNeaIaknplfqlksvrGhelci
+ +n ++als+ +++ +++++ G +++
1Z84:A|PDB 91 VI-------ENLYPALSRN---LETQ------------STQPETG--TSR 116
VI...CF......SKPehDptlpalakeeirevvdaWqlcteelGyegre
+I + F++ +S P h+ l + i+ ++ a + +
1Z84:A|PDB 117 TIvgfGFhdvvieS-PVHSIQLSDIDPVGIGDILIAYKKRINQIA----- 160
nhpayqnvqIFEmNkGaemGcsnpHPYaYFnEHGQvwatsfiP<-*
h + + q+F N Ga G s H H Q a++ +P
1Z84:A|PDB 161 QHDSINYIQVFK-NQGASAGASMSHS------HSQMMALPVVP 196
Pfam-B: 13 and 136 matches to entries 7198 and 11634
http://www.sanger.ac.uk/Software/Pfam/
Blind prediction of structure: CASP and At5g18200
Function space of proteins
KEGG = Kyoto Encyclopedia of Genes and Genomes
The Gene Ontology project (GO)
Metabolism
Cellular Processes
Signal Processing
Enzymes
Don’t forget that protein-protein interactions exist, too!
At2g17340
Related to a human protein associated with Hallervorden-Spatz syndrome, a neurological disorder?
Parallel Enzyme Activity Testing (Collaboration with the University of Toronto)
81 protein samples sent to Toronto: 8 solved CESG structures, 73 randomly chosen
Generalized assays for: phosphatase, esterase, phosphodiesterase, protease, amino acid dehydrogenase, alcohol dehydrogenase, organic acid dehydrogenase, amino acid oxidase, alcohol oxidase, organic acid oxidase, beta-lactamase, beta-galactosidase, arylsulfatase, lipase.
Results:
• Solid hits: 3 phosphatases, 5 esterases
• Weaker hits: 9 more esterases, 6 phosphodiesterases
• No hits: all others
A. Yakunin et al., Current Opinion in Chemical Biology 8:42 (2004)
Initial Assay: Wide-spectrum
Target: At2g17340/JR5670

| Activity | Assay substrate | JR5670 absorbance |
|---|---|---|
| Phosphodiesterase | bis-pNPP | 0.016 |
| Dehydrogenase | Amino Acids | 0.032 |
| Dehydrogenase | Acids | 0.016 |
| Dehydrogenase | Alcohols | 0.022 |
| Dehydrogenase | Aldehyde | -0.045 |
| Dehydrogenase | Sugars | 0.003 |
| Thioesterase | palmitoyl-CoA | 0.108 |
| Oxidase | NAD(P)H Ox | -0.115 |
| Protease | Protease Mix | 0.118 |
| Phosphatase | pNPP | > 1 |

• Absorbance >0.25 is a tentative signal; >0.5 is a strong signal.
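The decision rule in the footnote is simple enough to write down. This sketch encodes the table's readings for At2g17340/JR5670 and the 0.25 and 0.5 absorbance thresholds; storing the phosphatase entry as exactly 1.0 is an assumption, since the slide only reports "> 1", and `READINGS` and `classify` are illustrative names.

```python
# Wide-spectrum assay readings for At2g17340/JR5670, transcribed from the table above.
# The phosphatase reading was reported only as "> 1"; 1.0 is stored here as a lower bound.
READINGS = [
    ("Phosphodiesterase", "bis-pNPP", 0.016),
    ("Dehydrogenase", "Amino Acids", 0.032),
    ("Dehydrogenase", "Acids", 0.016),
    ("Dehydrogenase", "Alcohols", 0.022),
    ("Dehydrogenase", "Aldehyde", -0.045),
    ("Dehydrogenase", "Sugars", 0.003),
    ("Thioesterase", "palmitoyl-CoA", 0.108),
    ("Oxidase", "NAD(P)H Ox", -0.115),
    ("Protease", "Protease Mix", 0.118),
    ("Phosphatase", "pNPP", 1.0),
]


def classify(absorbance: float, tentative: float = 0.25, strong: float = 0.5) -> str:
    """Map an absorbance to the slide's categories: >0.5 strong, >0.25 tentative."""
    if absorbance > strong:
        return "strong"
    if absorbance > tentative:
        return "tentative"
    return "none"


if __name__ == "__main__":
    for activity, substrate, a in READINGS:
        call = classify(a)
        if call != "none":
            print(f"{activity} ({substrate}): {call}")
    # Phosphatase (pNPP): strong
```

Only the phosphatase (pNPP) reading crosses either threshold, matching the single clear signal on the slide.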
At2g17340
Enzyme of unknown specificity.
A functional annotation lesson
Functional Annotation by Inference
From raw DNA sequences, one looks for genomic features such as promoters, alternative splicing of mRNAs, retrotransposons, pseudogenes, tandem duplications, synteny, and homology.
It is homology, of both sequence and structure, that allows functional inferences to be made.
Prosite, Dali, VAST, FFAS03
Some tools integrate knowledge from many sources into one place, acting as meta-servers of clues.
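How a meta-server might combine clues can be sketched as a weighted vote. This is a toy illustration, not the method of Prosite, Dali, VAST, FFAS03, or any particular meta-server: each source (hypothetical names) contributes one predicted function label or none, optional weights stand in for how much each source is trusted, and the result is the top label with its share of the total vote. `consensus_annotation` is a hypothetical helper.

```python
from collections import Counter
from typing import Optional


def consensus_annotation(predictions: dict[str, Optional[str]],
                         weights: Optional[dict[str, float]] = None) -> tuple[Optional[str], float]:
    """Weighted vote over per-source function labels.

    Returns (winning label, its share of the total weight), or (None, 0.0) if no
    source made a call. Ties go to the label encountered first.
    """
    weights = weights or {}
    tally: Counter = Counter()
    for source, label in predictions.items():
        if label is not None:
            tally[label] += weights.get(source, 1.0)  # default weight is 1.0
    if not tally:
        return None, 0.0
    label, score = tally.most_common(1)[0]
    return label, score / sum(tally.values())


if __name__ == "__main__":
    # Hypothetical sources and labels, for illustration only.
    preds = {"source_a": "phosphatase", "source_b": "phosphatase",
             "source_c": "esterase", "source_d": None}
    print(consensus_annotation(preds))  # ('phosphatase', 0.6666666666666666)
    print(consensus_annotation(preds, weights={"source_c": 5.0}))  # esterase now wins
```

The weighted call shows the weakness of any such scheme: the answer depends heavily on how much each clue is trusted, which is why inferred functions still need experimental checks like the enzyme assays above.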
Connections between structure and function
Universe of structures    Universe of functions
Convergent evolution
Divergent evolution
At1g18200
Misleading annotation prior to our work, but structure led to
discovery of function.
Summary
Structural genomics efforts are gaining momentum, helping to assign new functions to ORFs and to fill in the space of all possible protein folds.
Administration: Madison (Primm, Troestler, Markley, Phillips, Fox)
Cloning/sequencing pipeline: Madison (Wrobel, Fox)
Expression pipeline: Madison (Frederick, Fox, Riters)
E. coli cell growth pipeline: Madison (Sreenath, Burns, Seder, Fox)
Cell-free system: Madison (Vinarov, Markley, Newman)
Protein purification pipeline: Madison (Vojtik, Phillips, Fox, Ellefson, Jeon)
Mass spectrometry: Madison (Aceti, Sabat, Sussman)
NMR spectroscopy: Madison NMRFAM (Song, Tyler, Cornilescu, Markley); Milwaukee MCW (Peterson, Volkman, Lytle)
Crystallization/crystallography: Madison (Bingman, Phillips, Bitto, Han, Bae, Meske); Argonne (Advanced Photon Source)
Bioinformatics: Madison (Bingman, Sun, Phillips, Wesenberg); Indianapolis (Dunker); Milwaukee MCW (Twigger, de la Cruz)
Computational support: Madison (Bingman, Ramirez, Phillips)
Sesame: Madison (Zolnai, Markley, Lee)
The Center for Eukaryotic Structural Genomics (supported by NIH GM64598 and GM074901)