Introduction to Systems Biology. 詹鎮熊, Postdoctoral Researcher, Department of Computer Science and Information Engineering, National Taiwan University.
Introduction to Systems Biology
詹鎮熊, Postdoctoral Researcher, Department of Computer Science and Information Engineering, National Taiwan University
What is a system?
Features of a system: components, interrelated components, boundary, purpose, environment, interfaces, input, output, constraints
Examples of Systems
Life's Complexity Pyramid
Pyramid levels: components, building blocks, functional modules, system
Z. N. Oltvai and A.-L. Barabási, Science 298, 763 (2002)
Biosphere
Ecosystem
Community
Population
Organism
Organ system
Tissue
Cell
Molecule
Atom
Organism – Cell – Organelle – Molecules
The human body is made up of trillions of cells. Each cell has:
46 chromosomes
2 meters of DNA
3 billion bases (A, T, G, C)
20,000 to 30,000 genes
The Central Dogma
Bottom-up
From genes to phenotypes
If the genome can be fully sequenced, can we uncover all the secrets hidden in the DNA?
The -omics (-ome) era
Genomics (Genome)
Human Genome Project
Other genome projects: mouse, fly, dog, worm, bacteria, …; most recently, cat
Human Genome Project
Sequence the whole genomes of several individuals
Competition between Celera and NIH
Took over a decade
Draft in 2000, complete in 2003
The next stage: HapMap
HapMap is a catalog of common genetic variants that occur in human beings. It describes:
what these variants are
where they occur in our DNA
how they are distributed among people within populations and among populations in different parts of the world
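As one way to picture "how they are distributed among people within populations and among populations", here is a minimal sketch that computes the frequency of one allele at a single SNP in two populations. The population names and genotype lists are made up for illustration; they are not HapMap data.

```python
from collections import Counter


def allele_frequency(genotypes, allele):
    """Fraction of chromosomes carrying `allele`.

    Each genotype is a two-letter string such as "AG" (one letter per chromosome).
    """
    counts = Counter("".join(genotypes))
    total = sum(counts.values())
    return counts[allele] / total


# Hypothetical genotypes at one SNP (illustrative only, not real HapMap data)
populations = {
    "pop_1": ["AA", "AG", "AG", "GG"],
    "pop_2": ["GG", "GG", "AG", "GG"],
}

for name, genotypes in populations.items():
    print(name, allele_frequency(genotypes, "A"))
```

The same allele can be common in one population and rare in another, which is the kind of difference a variant catalog records.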
Single Nucleotide Polymorphism (SNP)
Personalized genome
James Watson (454 Life Sciences)
Craig Venter (Venter Institute)
23andMe (backed by Google, focus on social/family relationships)
Navigenics (focus on medical conditions)
Personal Genome Project (PGP, Harvard)
Proteomics (Proteome)
Catalog all proteins (and their relationships) in a temporally and spatially confined system
Identities of these proteins
Quantities
Variants of these proteins: alternative splicing forms, post-translational modifications (phosphorylation, methylation, ubiquitination, …)
Proteomics
Mass Spectrometry
Fluorescence Resonance Energy Transfer (FRET)
Co-localization (interaction) between protein-protein, protein-DNA pairs
Transcriptome
Identify all transcription factors (TFs) functioning in a specific temporally and spatially confined system
Identify all genes regulated by specific TFs
ChIP-chip
TRANSFAC database
Chromatin immunoprecipitation (ChIP): a well-established procedure used to investigate interactions between DNA-binding proteins and DNA in vivo
ChIP-chip
Transcription Factor Binding Motifs
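Binding motifs are commonly summarized as a position weight matrix (PWM). The sketch below builds a log-odds PWM from a few aligned sites and scans a sequence for the best-scoring window. The sites, the sequence, the pseudocount and the uniform background are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds position weight matrix from equal-length aligned binding sites."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in BASES
        })
    return pwm


def best_match(pwm, sequence):
    """Return (score, offset) of the highest-scoring window in `sequence`."""
    width = len(pwm)
    scores = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores)


# Hypothetical aligned binding sites (illustrative only)
sites = ["TATAAA", "TATAAT", "TATATA", "TACAAA"]
pwm = build_pwm(sites)
score, offset = best_match(pwm, "GGCGTATAAAGGC")
print(offset, round(score, 2))
```

A positive score means the window looks more like the motif than like background sequence.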
Interactome
Catalog all interactions (protein-protein or protein-DNA) within an organism
Yeast two-hybrid
Co-immunoprecipitation (co-IP)
Mass spectrometry
FRET
…
Yeast Two-hybrid
Metabolomics (Metabolome)
“systematic study of the unique chemical fingerprints that specific cellular processes leave behind”
Collection of all metabolites in a biological organism
Analytical methods for metabolomics
Separation: gas chromatography (GC), high-performance liquid chromatography (HPLC), capillary electrophoresis (CE)
Detection: mass spectrometry, nuclear magnetic resonance (NMR) spectroscopy
Glycomics
Oligosaccharide
Glycoprotein/proteoglycan: proteins attached to oligosaccharides
Important to cell recognition: cancer targeting, influenza
Model Organisms
Yeast (S. cerevisiae), worm (C. elegans), fruit fly (D. melanogaster), mouse (M. musculus)
Monitoring the System
High-throughput monitoring of gene expression
Microarray
Protein microarray
GC/HPLC/MS/tandem MS
Phenotype/Disease
Microarray
Protein Microarray
Phenotypes
Lethality
Synthetic lethal
Developmental
Morphological
Behavioral
Diseases
Genotypes and Phenotypes
genotype + environment → phenotype
genotype + environment + random variation → phenotype
Importance of Computer Models
Interactions in the cell are too complex to handle with pen and paper
With high-throughput tools, biology shifts from descriptive to predictive
Computers are required to store, process, assemble, and model all high-throughput data into networks
Types of Computer Models
Chemical kinetic model:
Defined by the concentrations of different molecular species in the cell
Represented by a set of equations
Some processes may be stochastic
Simplified discrete circuit:
Network with nodes and arrows
Nodes represent quantities or other attributes
Directed edges represent the effects of nodes on other nodes
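One common form of a simplified discrete circuit is a Boolean network, where each node is on or off and each directed edge becomes part of an update rule. The sketch below is a minimal example under that assumption; the three genes and their rules are hypothetical.

```python
def step(state, rules):
    """Synchronously update every node from the current state."""
    return {node: rule(state) for node, rule in rules.items()}


# Hypothetical three-gene circuit: A activates B, B activates C, C represses A
rules = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["B"],
}

state = {"A": True, "B": False, "C": False}
trajectory = [state]
for _ in range(6):
    state = step(state, rules)
    trajectory.append(state)

# Print each time step as a bit string in the order A, B, C
for t, s in enumerate(trajectory):
    print(t, "".join("1" if s[name] else "0" for name in "ABC"))
```

With these rules the negative feedback loop makes the circuit cycle through its states rather than settle, which is the kind of qualitative behavior a discrete model can expose without any rate constants.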
Different Mathematical Formulations
Differential equations: linear (ordinary), partial, stochastic
S-systems: power-law formulation; captures complicated dynamics; parameter estimation is computationally intensive
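In the S-system power-law formulation, each variable follows dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij. The sketch below integrates a two-variable S-system with forward Euler. All parameter values are hypothetical, and forward Euler is chosen only for brevity; a real model would typically use a proper ODE solver.

```python
import math


def s_system_rhs(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j]."""
    n = len(x)
    return [
        alpha[i] * math.prod(x[j] ** g[i][j] for j in range(n))
        - beta[i] * math.prod(x[j] ** h[i][j] for j in range(n))
        for i in range(n)
    ]


def simulate(x0, alpha, beta, g, h, dt=0.01, steps=5000):
    """Forward-Euler integration; adequate for a sketch, not for stiff systems."""
    x = list(x0)
    for _ in range(steps):
        dx = s_system_rhs(x, alpha, beta, g, h)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return x


# Hypothetical parameters: production of X1 is inhibited by X2,
# X2 is produced from X1, and both are degraded at first order.
alpha = [2.0, 1.0]
beta = [1.0, 1.0]
g = [[0.0, -0.5], [1.0, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]

x_final = simulate([1.0, 1.0], alpha, beta, g, h)
print([round(v, 3) for v in x_final])
```

For these parameters the steady state can be checked by hand: X1 = X2 and 2 * X2^(-0.5) = X1, so both equal 2^(2/3). Estimating alpha, beta, g and h from data is the computationally intensive part noted above.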
Model details
Selection of genes, gene products, and other molecules to be included
Cellular compartments: nucleus, Golgi, or other organelles
Too much detail may introduce more noise
A minimal model able to predict system properties (mRNA level, growth rate, etc.) is sufficient
Construct Model from Global Patterns
Microarray gene expression patterns: Up-regulated/down-regulated
Gene expression profiles under different conditions: Tumor/normal, cell cycle, drug treatment, …
Methods: Bayesian inference, machine learning (clustering, classification), …
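As a small illustration of clustering expression profiles, the sketch below groups genes whose profiles correlate strongly across conditions. It uses a simple greedy grouping on Pearson correlation, swapped in for brevity in place of standard methods such as hierarchical clustering or k-means. Gene names, values and the 0.9 threshold are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)


def cluster(profiles, threshold=0.9):
    """Greedy grouping: a gene joins the first cluster whose seed profile
    (its first member) correlates with it at or above `threshold`."""
    clusters = []
    for gene, profile in profiles.items():
        for members in clusters:
            if pearson(profile, profiles[members[0]]) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


# Hypothetical log-ratio expression values across five conditions
profiles = {
    "gene_a": [0.1, 0.8, 1.5, 0.9, 0.2],
    "gene_b": [0.0, 0.9, 1.4, 1.0, 0.1],
    "gene_c": [1.2, 0.4, -0.3, 0.5, 1.1],
    "gene_d": [1.0, 0.5, -0.2, 0.4, 1.2],
}

print(cluster(profiles))
```

Genes that rise and fall together land in the same group, which is the starting point for asking whether they share a regulator.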
Framework for Systems Biology
Tools for Simulation
E-Cell, Cell Illustrator, Virtual Cell
Standardizing efforts: BioJAKE, SBML (Systems Biology Markup Language)
Facilitate the exchange of models
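Because SBML is an XML format, a model written by one tool can be read by another. The sketch below lists the species and reactions of a tiny SBML-style document using only Python's standard library. The model itself is hypothetical and deliberately incomplete (no kinetic laws), and real projects would more likely use a dedicated SBML library.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal model; element names follow SBML Level 2
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="S1" compartment="cell" initialConcentration="10"/>
      <species id="S2" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="conversion" reversible="false">
        <listOfReactants><speciesReference species="S1"/></listOfReactants>
        <listOfProducts><speciesReference species="S2"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}

root = ET.fromstring(SBML_TEXT)

# Collect species identifiers in document order
species = [s.get("id") for s in root.findall(".//sbml:species", NS)]

# Map each reaction id to its (reactants, products)
reactions = {}
for reaction in root.findall(".//sbml:reaction", NS):
    reactants = [
        ref.get("species")
        for ref in reaction.findall("sbml:listOfReactants/sbml:speciesReference", NS)
    ]
    products = [
        ref.get("species")
        for ref in reaction.findall("sbml:listOfProducts/sbml:speciesReference", NS)
    ]
    reactions[reaction.get("id")] = (reactants, products)

print(species)
print(reactions)
```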
E-Cell System
Software for constructing object models equivalent to a cell system or a part of one
Employing Structured Variable-Process model (previously called the Substance-Reactor model, or SRM)
Objects: Variables, Processes, Systems
Cell Illustrator
Computational Databases
Protein-protein interaction: DIP, BIND, MIPS, MINT, IntAct, POINT, BioGRID
Protein-DNA interaction: TRANSFAC, SCPD
Metabolic pathways: KEGG, EcoCyc, WIT, Reactome
Gene expression: GEO, ArrayExpress, GNF, NCI60, commercial
Gene Ontology
Network Biology
The entities within a system form intertwined complex networks
Genes
Proteins
Metabolites
External factors
…
Gene (Transcription) Regulatory Network
Protein-Protein Interaction Network
Metabolic Pathways
KEGG metabolic pathway
Gene Ontology
The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism
Annotations: Molecular Function, Cellular Component, Biological Process
Challenges of Databases
Provide information other than simple entries (e.g. PPI with functional annotation or binding strength)
Data maintenance and updates
Integration with other databases
Applications
Target identification and drug discovery
Disease Gene Identification
From networks
From literature
From microarray
Quantitative Trait Loci (QTL)
Genome-Wide Association Study (GWAS)
Endeavour
Systems biology (integrated) approaches?
Drug Targets
Gene identification from network
Nodes: hubs
Edges (interactions): define critical genes from connected edges? Shortest path, alternative path? Weights
Metabolic pathways as well
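The node and edge questions above map onto simple graph operations. The sketch below finds hubs by degree and a shortest path by breadth-first search in a small unweighted interaction network. The protein names and edges are hypothetical, and edge weights are left out.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from a list of (node, node) interactions."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hubs(graph, min_degree):
    """Nodes with at least `min_degree` interaction partners."""
    return sorted(node for node, nbrs in graph.items() if len(nbrs) >= min_degree)


def shortest_path(graph, source, target):
    """Breadth-first shortest path in an unweighted graph; None if unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in sorted(graph[node]):
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None


# Hypothetical protein-protein interactions
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P4", "P5"), ("P5", "P6")]
graph = build_graph(edges)

print(hubs(graph, 3))
print(shortest_path(graph, "P2", "P6"))
```

Removing a node and re-running the path search is one simple way to ask whether an alternative path exists.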
Gene identification from literature
OMIM (Online Mendelian Inheritance in Man)
Single-gene disease
Complex disease
Defects identified, target for drugs and cures
Gene identification from microarray
Up-regulated genes
Down-regulated genes
Too many? Clusters of genes
Regulators (transcription factors) for the important clusters
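Up- and down-regulation is often called from the log2 fold change between two conditions. The sketch below labels genes that way. The intensities are made up, and the two-fold cutoff (|log2 FC| >= 1) is an assumption; real analyses also need replicates and a statistical test.

```python
import math


def classify(expression, threshold=1.0):
    """Label each gene by log2(tumor / normal).

    A gene counts as changed when |log2 fold change| >= threshold.
    """
    result = {}
    for gene, (normal, tumor) in expression.items():
        log_fc = math.log2(tumor / normal)
        if log_fc >= threshold:
            label = "up"
        elif log_fc <= -threshold:
            label = "down"
        else:
            label = "unchanged"
        result[gene] = (log_fc, label)
    return result


# Hypothetical (normal, tumor) intensities
expression = {
    "gene_a": (100.0, 400.0),
    "gene_b": (200.0, 50.0),
    "gene_c": (150.0, 180.0),
}

calls = classify(expression)
for gene, (log_fc, label) in calls.items():
    print(gene, round(log_fc, 2), label)
```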
Quantitative Trait Loci (QTL)
Region of DNA that is associated with a particular phenotypic trait
The phenotypic characteristic varies in degree and is attributed to the interaction of two or more genes
A QTL may not be the gene itself, but a stretch of DNA closely linked to the target gene
Quantitative Trait Loci
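LOD (logarithm of the odds) score: the log10 ratio of the likelihood of the data if a locus is linked to the trait (phenotype) to the likelihood if it is unlinked
Expression QTL (e-QTL): combines QTL mapping with microarray gene expression data (identifies transcription regulatory elements as QTL)
cM: centimorgan, a unit of genetic distance equal to a 1% recombination frequency; roughly 1,000,000 bases on average in the human genome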
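For the simplest textbook case of phase-known meioses, the LOD score at recombination fraction theta is log10 of L(theta) / L(0.5), with L(theta) = theta^k * (1 - theta)^(n - k) for k recombinants in n meioses. The sketch below evaluates it over a few theta values. The counts are hypothetical, and QTL mapping software uses more elaborate likelihoods than this.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD for phase-known meioses: log10 of L(theta) / L(0.5).

    `theta` must lie strictly between 0 and 1.
    """
    non_recombinants = meioses - recombinants
    log_l_theta = (
        recombinants * math.log10(theta)
        + non_recombinants * math.log10(1 - theta)
    )
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


# Hypothetical data: 2 recombinants observed in 20 informative meioses
thetas = [0.05, 0.1, 0.2, 0.3, 0.4]
scores = {theta: lod_score(2, 20, theta) for theta in thetas}
best_theta = max(scores, key=scores.get)

for theta in thetas:
    print(theta, round(scores[theta], 3))
print("best theta:", best_theta)
```

A LOD of 3 or more is the conventional threshold for evidence of linkage, corresponding to odds of 1000 to 1.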
Genome-Wide Association Studies (GWAS)
Genome-wide association studies (GWAS) rely on newly available research tools and technologies to rapidly and cost-effectively analyze genetic differences between people with specific illnesses, such as diabetes or heart disease, compared to healthy individuals.
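At a single SNP, the comparison between cases and controls can be sketched as a chi-square test on allele counts. The counts below are hypothetical, and the test is the plain allelic 2x2 Pearson chi-square with one degree of freedom. A real GWAS repeats this for a very large number of SNPs and must correct for multiple testing.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 d.f.) on a 2x2 table of allele counts.

    Each argument is (risk_allele_count, other_allele_count).
    Returns (chi-square statistic, p-value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square distribution with one degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts at one SNP
chi2, p_value = allelic_chi_square(case_counts=(60, 40), control_counts=(40, 60))
print(round(chi2, 3), round(p_value, 5))
```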
Keys to success of GWAS
Population resource: large sample size required for significant detection
SNP map and genotyping: high-throughput genotyping
IT and analysis tools: storage and analysis (1000 microarrays for billions of data points)
What have GWAS found?
Genes associated with risks of:
type 2 diabetes
Parkinson's disease
heart disorders
obesity
prostate cancer
…
An integrated approach: Endeavour
Genes can obtain various scores regarding their association with disease
These scores include those mentioned above
The various ranks of these genes according to different scores are determined
With a consensus scoring scheme (data fusion), the resulting prediction accuracy could be improved
Aerts, et al. (2006)
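The idea of ranking genes per data source and then fusing the ranks can be sketched as follows. This is not Endeavour's own fusion method; it swaps in a plain mean-rank average to keep the example short. The source names, gene names and scores are hypothetical.

```python
def ranks(scores):
    """Map gene -> rank (1 = best) for one data source; higher score is better."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def fuse(sources):
    """Average each gene's rank across all sources, best (lowest) mean rank first."""
    per_source = [ranks(scores) for scores in sources.values()]
    genes = per_source[0].keys()
    mean_rank = {
        gene: sum(r[gene] for r in per_source) / len(per_source) for gene in genes
    }
    return sorted(mean_rank.items(), key=lambda item: item[1])


# Hypothetical per-source association scores for three candidate genes
sources = {
    "network": {"gene_a": 0.9, "gene_b": 0.4, "gene_c": 0.7},
    "literature": {"gene_a": 0.6, "gene_b": 0.2, "gene_c": 0.8},
    "expression": {"gene_a": 0.8, "gene_b": 0.5, "gene_c": 0.3},
}

fused = fuse(sources)
for gene, mean_rank in fused:
    print(gene, round(mean_rank, 2))
```

Working on ranks rather than raw scores lets sources with very different scales contribute equally to the consensus.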
Toward personalized medicine
Targeted therapy
Using antibodies against biomarkers (of cancer or infectious agents)
Requires prior knowledge of patient response (through lab tests or biochips)
Gene therapy
Replace or inhibit genes in patients
Vectors: adeno-associated virus (AAV)
Silencing the disease gene: RNAi, microRNA
RNA interference
Putting It All Together
Network of Networks
Gene regulation (protein-DNA)
Protein-protein interaction
Metabolic pathway
How…?
Questions?