Computational Biology at Carnegie Mellon University
A Quick Tour
Jaime Carbonell
Carnegie Mellon University
December 2008
Computational Biology at CMU: Educational History
1987: Undergraduate program in Computational Biology established
1991: Howard Hughes Medical Institute grant to build undergrad curriculum
2000: M.S. Program in Computational Biology established
2005: Joint CMU & U. of Pittsburgh Ph.D. Program in Computational Biology
Computational Biology at CMU: History
2002: NSF large ITR grant (CMU PIs: Reddy & Carbonell) with U. Pitt, MIT, Boston U., NRC Canada: Computational Biolinguistics
2003: NSF large ITR grant (CMU PI: Murphy) with UCSB, Berkeley, MIT: Bioimage Informatics
2004-2008: 10 small grants from NSF, NIH, Merck, Gates on computational proteomics, viral evolution, HIV-human interactome, …
Joint CMU-Pitt Ph.D. Program in Computational Biology
Curriculum for Comp Bio PhD: Core Graduate Courses
Molecular Biology
Biochemistry
Biophysics
Advanced Algorithms & Language Tech.
Machine Learning Methods
Computational Genomics
Computational Structural Biology
Cellular and Systems Modeling
Curriculum: Elective Courses
Computational Genomics
Computational Structural Biology
Cellular and Systems Modeling
Bioimage Informatics
Computational Neurobiology
Advanced Statistical Learning Methods
Example Books Used
Teaching & Advising Faculty
30 faculty from CMU: 11 Computer Science; 11.5 Biology and Chemistry; 3.5 Bio-Engineering; 3 Statistics and Mathematics; 1 Business School
36 faculty from Pitt: 19 Medical School; 17 Biology, Chemistry, Physics
Faculty: Computational Genomics
Ziv Bar-Joseph*, Jaime Carbonell, Marie Dannie Durand*, Jonathan Minden, Ramamoorthi Ravi, Kathryn Roeder, Roni Rosenfeld, Larry Wasserman, Eric Xing*
Linguistics methods for elucidating sequence-structure-function relations
Machine Learning methods for annotation
Modeling genome evolution through duplication
* = Primary research area
Faculty: Computational Structural Biology (Proteomics)
Michael Erdmann, Maria Kurnikova*, Chris Langmead*, John Nagle, Gordon Rule, Robert Swendsen, Jaime Carbonell*
Homologous structure determination by NMR
Improving determination of protein structure and dynamics using sparse data
Molecular dynamics of proteins and nucleic acids
Faculty: Cellular and Systems Modeling
Ziv Bar-Joseph*, Omar Ghattas, Philip LeDuc, Russell Schwartz*, Joel Stiles*, Shlomo Ta’asan, Yiming Yang, Eric Xing
Computational modeling of mechanical properties of cells and tissues
Modeling of formation of protein complexes
Multi-scale modeling of excitable membranes
Discovery of large-scale gene regulatory networks
Faculty: Bioimage Informatics
William Cohen, Bill Eddy, Christos Faloutsos, Jelena Kovacevic, Tom Mitchell*, Robert Murphy*, Eric Xing
Determining subcellular location from microscope images
Machine learning of patterns of brain activity
Statistical analysis of gel images for proteomics
Generative models of protein traffic
Faculty: Computational Neurobiology
Justin Crowley, Tom Mitchell, Joel Stiles*, David Touretzky*, Nathan Urban
Multi-scale modeling of excitable membranes
Machine learning of patterns of brain activity
Development of structure of neuronal circuits
Proteomics: Things to Learn about Proteins
Sequence, activity, partners, structure, functions, expression level, location/motility
Examples of Cool Research: Computational Biolinguistics
Sequence (DNA, Protein) -> Structure -> Function
Language (Speech, Text) -> Syntax -> Semantics
GPCRs (sensor/channel proteins; Klein, CMU/Pitt): 60% of all targeted drugs affect GPCRs; language (information-theoretic) analysis
Evolutionary Analysis (of genes, proteins, …): conservation, replication, poly-functionality (Rosenberg)
Immune System Modeling (just starting…): domain/fold polymorphic modeling (Langmead)
Cross-species Interactome (just starting…): human-HIV protein-protein (Carbonell, Klein)
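One way to picture the language-style, information-theoretic analysis mentioned for GPCRs is to treat a protein sequence as text, count its n-grams, and measure the entropy of the resulting distribution. The sketch below is only an illustration under that assumption, not the method used in the work cited on the slide; the function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter

def ngram_counts(sequence, n):
    """Count overlapping n-grams (length-n substrings) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))

def entropy_bits(counts):
    """Shannon entropy, in bits, of the empirical distribution in `counts`."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy amino-acid string chosen for illustration only (hypothetical data).
toy_protein = "MKTAYIAKQRMKTAYLLL"

unigrams = ngram_counts(toy_protein, 1)
bigrams = ngram_counts(toy_protein, 2)

print("most common residues:", unigrams.most_common(3))
print("unigram entropy (bits):", round(entropy_bits(unigrams), 3))
print("bigram entropy (bits):", round(entropy_bits(bigrams), 3))
```

Lower entropy means a more repetitive, more predictable "vocabulary"; comparing such statistics across protein families is one simple way the language analogy can be made quantitative.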
Evolutionary Methods for Discovering Sequence Function Mapping (Rosenfeld)
Human, Monkey, Mouse, Rat, Cow, Dog, Fly, Worm, Yeast
A Multiple Sequence Alignment
Distribution of amino acids
Conserved Properties across Rhodopsin
Subtask: Identifying Chemical Properties Conserved at each Protein Position
A Single Position
Results for All Rhodopsin Positions
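The subtask above, finding chemical properties conserved at each position of a multiple sequence alignment, can be sketched roughly as follows. This is a minimal illustration, not the method from the slides: the four-way grouping of amino acids into chemical classes, the `threshold` parameter, and the toy alignment are all assumptions made for the example.

```python
from collections import Counter

# Coarse chemical classes for the 20 amino acids (an illustrative assumption;
# other groupings are equally defensible).
CLASSES = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
PROPERTY_OF = {aa: prop for prop, residues in CLASSES.items() for aa in residues}

def column_distribution(alignment, pos):
    """Distribution of amino acids in one alignment column, ignoring gaps ('-')."""
    return Counter(seq[pos] for seq in alignment if seq[pos] != "-")

def conserved_property(alignment, pos, threshold=1.0):
    """Return the chemical class shared by at least `threshold` of the
    non-gap residues in the column, or None if no class is that common."""
    residues = column_distribution(alignment, pos)
    total = sum(residues.values())
    if total == 0:
        return None  # column contains only gaps
    by_property = Counter()
    for aa, count in residues.items():
        by_property[PROPERTY_OF[aa]] += count
    prop, count = by_property.most_common(1)[0]
    return prop if count / total >= threshold else None

# Toy alignment of equal-length rows (hypothetical data, not rhodopsin).
alignment = ["MKLDS", "MRIDT", "MKVEA", "MRLE-"]
profile = [conserved_property(alignment, i) for i in range(len(alignment[0]))]
print(profile)
```

In this toy alignment the second column mixes K and R, so no single amino acid is conserved there, yet the positive charge is; that distinction between residue conservation and property conservation is the point of the subtask.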
Five Classifiers in Gene Identification for Cancer/H5 (Yang)
New Field: Location Proteomics (Langmead)
Can use CD-tagging (developed by Jonathan Jarvik and Peter Berget) to randomly tag many proteins
Isolate separate clones, each of which produces one tagged protein
Use RT-PCR to identify tagged gene in each clone
Collect many live cell images for each clone using spinning disk confocal fluorescence microscopy
Cluster proteins by their location patterns (automatically)
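The final step, automatically clustering proteins by location pattern, can be illustrated with a small agglomerative clustering sketch. It assumes each clone has already been reduced to a numeric feature vector computed from its images; the feature values, clone names, and the choice of average linkage with Euclidean distance are assumptions for the example, not details taken from the slides.

```python
import math
from itertools import combinations

def average_linkage(a, b, features):
    """Mean Euclidean distance between all members of clusters a and b."""
    total = sum(math.dist(features[i], features[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(features, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    `features` maps a clone name to its feature vector; n_clusters must be >= 1.
    """
    clusters = [[name] for name in features]
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], features),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters

# Hypothetical per-clone image features (e.g. two texture or shape scores).
features = {
    "clone_A": (0.90, 0.10),
    "clone_B": (0.85, 0.15),
    "clone_C": (0.10, 0.90),
    "clone_D": (0.20, 0.80),
}
groups = agglomerate(features, n_clusters=2)
print(groups)
```

Recording the order of merges instead of stopping at a fixed number of clusters would give a tree of clones grouped by similarity of location pattern.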
Quaternary Fold Predictions (Carbonell & Liu)
Triple beta-spirals [van Raaij et al. Nature 1999]: virus fibers in adenovirus, reovirus and PRD1
Double barrel trimer [Benson et al. 2004]: coat protein of adenovirus, PRD1, STIV, PBCV
Model Organism: Bacterial Phage T4: (Ultimate targets are HIV, etc.)
Clone isolation and image collection by Jonathan Jarvik; CD-tagged gene identification by Peter Berget; computational analysis of patterns by Xiang Chen and Robert F. Murphy
Dendritic Clustering for Clone (Murphy)
New Challenge: Functional Genomics
The various genome projects have yielded the complete DNA sequences of many organisms, e.g. human, mouse, yeast, fruit fly, etc.
Human: 3 billion base pairs, 30-40 thousand genes.
Challenge: go from sequence to function, i.e., define the role of each gene and understand how the genome functions as a whole.
Classical Analysis of Transcription Regulation Interactions
“Gel shift”: electrophoretic mobility shift assay (“EMSA”) for DNA-binding proteins
[Figure labels: free DNA probe; protein-DNA complex]
Advantage: sensitive
Disadvantage: requires stable complex; little “structural” information about which protein is binding
Modern Analysis of Transcription Regulation Interactions
Genome-wide Location Analysis
Advantage: high throughput
Disadvantage: inaccurate
Gene Regulatory Network Induction (Xing et al.)
Gene Regulation and Carcinogenesis
[Figure: cell-cycle regulation diagram. Recoverable labels: cell-cycle phases G0 or G1, S, G2, M; E, A, B; PCNA (not cycle specific); Gadd45 (DNA repair); Rb, E2F, Rb-P; Cyclin/Cdk phosphorylation; p14, p15, p16, p21, p53; Fas, TNF, TGF-...; apoptosis; oncogenetic stimuli (i.e. Ras); extracellular stimuli (TGF-); edge labels "promotes", "inhibits", "activates", "transcriptional activation"; cell damage, time required for DNA repair, severe DNA damage; outcome "Cancer!"]
The Pathogenesis of Cancer
[Figure labels: Normal, BCH, CIS, DYS, SCC]