TOPICS IN (NANO) BIOTECHNOLOGY Genomics & Proteomics Lecture 13 21st June, 2006 PhD Course.

Page 2: What next?

• High-quality genome sequencing and annotation (2003)

• Complete sequencing of the genomes of other model organisms (e.g. mouse)

• The next step: functional genomics

• Determine what our genes do through systematic studies of function on a large scale

– Transcriptomics: comparative analysis of mRNA expression/splicing

– Proteomics: comparative analysis of protein expression and post-translational modifications

– Structural genomics: determine the 3-D structures of key family members

– Intervention studies: effects of inhibiting gene expression

– Comparative genomics: analysis of DNA sequence patterns of humans and well-studied model organisms

Page 3: Beyond Genomics – Systems Biology

• Human Genome = 30,000 to 60,000 genes

• Human Proteome = 300,000 to 1,200,000 protein variants

• Human Metabolome = metabolic products of the organism (lipids, carbohydrates, amino acids, peptides, prostaglandins, etc.)

Page 4: Functional Genomics

• Whole genome: once the whole genome is truly known and its sequence becomes available for an organism, the challenge turns from identifying parts to understanding function

• Functional genomics: the post-genomic era is defined by functional genomics

– Assignment of function to identified genes

– Organisation and control of the genetic pathways that come together to make up the physiology of an organism

Page 5: Functional Genomics

• 42% of the genes identified in the human genome are of unknown function

• Assigning function to these genes requires systematic, high-throughput methods

Page 6: The Periodic Table: functional grouping of chemical elements

Page 7: Biologist’s Periodic Table

A system for classifying an organism’s genes:

• Will not be two-dimensional

• Will reflect similarities at diverse levels

– Primary DNA sequence in coding and regulatory regions

– Polymorphic variation within a species or subgroup

– Time and place of expression of RNAs during development, physiological response and disease

– Subcellular localisation and intermolecular interaction of protein products
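Co-expression is one of the levels of similarity listed above. A common way to exploit it is "guilt by association": an unannotated gene whose expression profile closely tracks that of a well-characterised gene is provisionally suspected to share its function. The Python sketch below scores similarity with the Pearson correlation; the gene names and expression values are hypothetical, and a real classification would combine several of the levels listed rather than expression alone.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def most_similar(query, annotated):
    """Return (gene, r) for the annotated gene whose profile best correlates with query."""
    return max(
        ((gene, pearson(query, profile)) for gene, profile in annotated.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical expression levels of each gene in four tissues or conditions.
annotated = {
    "geneA_cell_cycle": [1.0, 4.0, 8.0, 2.0],
    "geneB_stress_response": [6.0, 1.0, 1.5, 7.0],
}
unknown_gene = [1.2, 3.5, 7.9, 2.5]

best_gene, r = most_similar(unknown_gene, annotated)
print(best_gene, round(r, 3))
```

A high correlation is only a hint: co-expressed genes often, but not always, act in the same process.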

Page 8: Gene Expression analysis

• Array of hope? Arrays offer hope for global views of biological processes

– A systematic way to study DNA and RNA variation

– A standard tool for molecular biology research and clinical diagnostics

– Labelled nucleic acid molecules can be used to interrogate nucleic acid molecules attached to a solid support (remember Southern blotting?)

(Refer to January 1999, Nature Genetics Supplement, Volume 21)

Page 9: Gene Expression analysis

• DNA chips: also known as gene chips, biochips or microarrays; basically DNA-covered pieces of glass (or plastic) capable of analysing thousands of genes simultaneously. They can be high-density arrays of oligonucleotides or cDNA

• Chips allow mRNA expression to be monitored on a large scale (i.e. many genes at the same time)

Pre-1995, Northern blots were used to look at gene expression
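On two-colour cDNA arrays, each spot is commonly summarised as the log2 ratio of the test and reference channel intensities, so a two-fold increase or decrease appears as +1 or -1. The Python sketch below assumes background-subtracted intensities, corrects for an overall brightness difference between the channels by median-centring (which assumes most genes are unchanged), and flags genes beyond a two-fold cut-off. The gene names, intensities and threshold are hypothetical; single-channel oligonucleotide arrays such as Affymetrix chips are summarised differently.

```python
import math
from statistics import median


def normalized_log_ratios(test, reference):
    """Median-centred log2(test/reference) for each gene on a two-channel array."""
    raw = {gene: math.log2(test[gene] / reference[gene]) for gene in test}
    centre = median(list(raw.values()))  # assumes most genes are unchanged
    return {gene: m - centre for gene, m in raw.items()}


def changed_genes(log_ratios, min_fold=2.0):
    """Keep genes whose ratio is at least min_fold up or down."""
    cutoff = math.log2(min_fold)
    return {gene: m for gene, m in log_ratios.items() if abs(m) >= cutoff}


# Hypothetical background-subtracted intensities; the test channel is
# globally about twice as bright as the reference channel.
test = {"g1": 2000.0, "g2": 1000.0, "g3": 2000.0, "g4": 16000.0, "g5": 2000.0}
reference = {"g1": 1000.0, "g2": 1000.0, "g3": 1000.0, "g4": 1000.0, "g5": 1000.0}

m_values = normalized_log_ratios(test, reference)
print(changed_genes(m_values))
```

Without the median-centring step, every gene would appear two-fold up simply because one channel is brighter.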

Page 10: Gene Expression analysis (Incyte)

Page 11: Gene Expression analysis (Affymetrix)

Page 12

Nanogen_Movie_1, Nanogen_Movie_2, Nanogen_Movie_3, Affymetrix_Movie_3

http://www.learner.org/channel/courses/biology/units/genom/images.html

Page 14: Determining gene function

• Sequence homology

• Sequence motif

• Tissue distribution

• Chromosome localisation

• Expression in disease

• Biochemical assays

• Proteomics

• Expression in models

Page 15: Protein synthesis

Page 16: RNA synthesis and processing

Page 17: Alternatively spliced mRNA

Page 18: The transcriptome

• DEFINITION: The collection of mRNAs present at any given moment in a cell or a tissue, and its behaviour over time and across cell states (Adam Sartel, COMPUGEN).

The complete collection of mRNAs and their alternative splice forms is sometimes referred to as the transcriptome. The transcriptome is the set of instructions for creating all of the different proteins found in an organism.

(From Genome to Transcriptome, Incyte)

Page 19: Genome, proteome and transcriptome

The Genome

- Index to a range of possible proteins

- Useful as a map and for inter-organism analysis

The Proteome

- Describes what actually happens in the cell

- Complex tools, partial results

Page 20: Use of transcriptome analysis

• Discovery of new proteins:

– that are present in specific tissues

– that have specific cell locations

– that respond to specific cell states

• Discovery of new variants:

– of important genes

– that work to increase/decrease the activity of the ‘native’ protein

• The transcriptome reflects tissue source (cell type, organ) and also tissue activity and state, such as the stage of development, growth and death, cell cycle, diseased or healthy status, and response to therapy or stress.

Page 21: Beyond genomics…proteomics

• Proteomics: where the genome hits the road

– Proteomics refers to the simultaneous, large-scale analysis of all (or many) of the proteins made in a cell at one time, to get a global picture of which proteins are made in cells and when

– Hopefully we can then determine the ‘whys’ and what we can do about them, which is very important for drug development

– The proteome is the protein complement encoded by a genome; the term was first proposed by an Australian post-doc, Marc Wilkins, in 1994

Page 22: Beyond the genome: Proteomics

• Genomics involves the study of mRNA expression: the full set of genetic information in an organism contains the recipes for making proteins

• Proteins constitute the “bricks and mortar” of cells and do most of the work

• Proteins distinguish the various types of cells: since all cells have essentially the same genome, their differences are dictated by which genes are active and which corresponding proteins are made

• Similarly, diseased cells may produce proteins that differ from those of healthy cells

• However, studying proteins is often more difficult than studying genes (e.g. post-translational modifications can dramatically alter protein function)

Page 23: Beyond the genome: Proteomics

• Identification of all the proteins made in a given cell, tissue or organism

• Identification of the intracellular networks associated with these proteins

• Identification of the precise 3-D structure of relevant proteins, to enable researchers to identify potential drug targets that can turn proteins “on or off”

• Proteomics very much requires a coordinated effort involving physicists, chemists, biologists and computer scientists

Page 24: Beyond the genome: Proteomics

• Major challenge: how do we get from the treasure chest of information yielded by genomics to an understanding of cellular function?

• Genomics-based approaches initially use computer-based similarity searches against proteins of known function

• The results may allow some broad inferences to be made about possible function

• However, a significant percentage (>30%) of the sequences ascertained so far seem to code for proteins that are unrelated, at this level, to proteins of known function
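The computer-based similarity searches mentioned above are usually run with heuristic tools such as BLAST. The Python sketch below instead computes an exact Smith-Waterman local alignment score, the dynamic-programming method that such heuristics approximate, and ranks a small database of proteins by that score. The scoring scheme (match +2, mismatch -1, linear gap -2) and all sequences are hypothetical simplifications; real searches use substitution matrices such as BLOSUM62, affine gap penalties and statistical significance (E-values).

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b (linear gap penalty)."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell score never drops below zero.
            current[j] = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            best = max(best, current[j])
        previous = current
    return best


def rank_by_similarity(query, database):
    """Sort database entries (name -> sequence) by local-alignment score to query."""
    scores = {name: smith_waterman(query, seq) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical query and database of proteins with known function.
query = "MKTAYIAKQR"
database = {
    "known_protein_1": "GGMKTAYIAKQRGG",
    "known_protein_2": "WWWWWWWWWW",
}
print(rank_by_similarity(query, database))
```

A best score near zero, as for the second entry, corresponds to the situation above where a sequence is unrelated, at this level, to any protein of known function.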

Page 25: Beyond the genome: Proteomics

• Beyond the genetic make-up of an individual or organism, many other factors determine gene and ultimately protein expression, and therefore affect proteins directly

• These include environmental factors such as pH, hypoxia and drug treatment, to name a few

• Examination of the genome alone cannot take into account complex multigenic processes such as ageing, stress or disease, nor the fact that the cellular phenotype is influenced by the networks created by interactions between pathways that are regulated in a coordinated way or that overlap

Page 26: Beyond the genome: Proteomics

• Genomic analysis has certainly provided us with much insight into the possible role of particular genes in disease

• However, proteins are the functional output of the cell, and their dynamic behaviour in specific biological contexts is critical

• The expression or function of proteins is modulated at many diverse points, from transcription to post-translation, and very little of this can be predicted from a simple analysis of nucleic acids alone

• There is generally poor correlation between the abundance of the mRNA transcribed from DNA and that of the corresponding proteins translated from that mRNA

• Furthermore, transcript splicing can yield different protein forms

• Proteins can undergo extensive modifications such as glycosylation, acetylation and phosphorylation, which can lead to multiple protein products from the same gene
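The agreement between mRNA and protein abundance mentioned above is often summarised with a rank correlation, which is insensitive to the very different scales of the two measurements. The Python sketch below computes Spearman's rho with the formula 1 - 6Σd²/(n(n² - 1)), which assumes there are no tied values; the five paired measurements are invented for illustration and are not data from this lecture.

```python
def ranks(values):
    """Rank positions, 1 = smallest value; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical paired measurements for five genes (arbitrary units).
mrna_abundance = [10.0, 20.0, 30.0, 40.0, 50.0]
protein_abundance = [5.0, 100.0, 8.0, 60.0, 20.0]

rho = spearman_rho(mrna_abundance, protein_abundance)
print(round(rho, 2))
```

A value of +1 would mean the two orderings agree perfectly; values well below that are what "poor correlation" looks like in practice.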

Page 27: Proteomics Tools

• The core methodologies for displaying the proteome combine advanced separation techniques, principally two-dimensional gel electrophoresis (2D-GE), with mass spectrometry

http://www.learner.org/channel/courses/biology/units/proteo/images.html

http://www.childrenshospital.org/cfapps/research/data_admin/Site602/mainpageS602P0.html

Page 28: 2D-GE: basic methodology

• The sample (tissue, serum, cell extract) is solubilized and the proteins are denatured into their polypeptide components

• This mixture is separated by isoelectric focusing (IEF): when a current is applied, the charged polypeptide subunits migrate in a polyacrylamide gel strip containing an immobilized pH gradient (IPG) until they reach the pH at which their overall charge is neutral (the isoelectric point, or pI), producing a gel strip with distinct protein bands along its length

• This strip is applied to the edge of a rectangular slab of polyacrylamide gel containing SDS. The focused polypeptides migrate under an electric current into the second gel and are separated on the basis of their molecular size
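The isoelectric point described above can be estimated from a sequence: each ionisable group contributes a fractional charge given by the Henderson-Hasselbalch equation, and the pI is the pH at which these contributions sum to zero. Because the net charge falls monotonically as pH rises, a simple bisection finds that pH. The pKa values in this Python sketch are one approximate set chosen for illustration (published sets differ, and so do the resulting pIs), and modifications such as phosphorylation, which shift spots on a real gel, are ignored.

```python
# Assumed, approximate pKa values; published sets vary.
PKA_POSITIVE = {"N_term": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_term": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of an unmodified polypeptide at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    positive = 1 / (1 + 10 ** (ph - PKA_POSITIVE["N_term"]))
    positive += sum(seq.count(aa) / (1 + 10 ** (ph - PKA_POSITIVE[aa])) for aa in "KRH")
    negative = 1 / (1 + 10 ** (PKA_NEGATIVE["C_term"] - ph))
    negative += sum(seq.count(aa) / (1 + 10 ** (PKA_NEGATIVE[aa] - ph)) for aa in "DECY")
    return positive - negative


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisect on pH 0-14 for the point where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


for peptide in ("DDDDD", "KKKKK"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

Acidic sequences focus near the low-pH end of the IPG strip and basic ones near the high-pH end, which is why the pH range of the strip limits which proteins are seen.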

Page 29: 2D-GE: basic methodology

• The resultant gel is stained (Coomassie, silver or fluorescent stains) and the spots are visualized by eye or with an imager. Typically 1000-3000 spots can be visualized with silver staining. Complementary techniques, e.g. immunoblotting, allow greater sensitivity for specific molecules.

• Multiple forms of individual proteins can be visualized; the particular subset of the proteome examined is determined by factors such as the initial solubilization conditions, the pH range of the IPG and the gel gradient

Page 30: General schematic of 2D-PAGE for protein identification in toxicology

Page 31: General strategy for proteomic analysis

• Sample growth

• Sample solubilization

• Isoelectric focusing (IPG)

• 2D-PAGE

• Image analysis / immunoblot (Western)

• Isolation of spots of interest

• Trypsin digestion of proteins

• MS analysis of tryptic fragments

• Identification of proteins
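The trypsin-digestion step in this workflow can be simulated in silico, which is also how the theoretical peptide masses used for database matching are generated. The Python sketch below applies the simplified textbook rule (cleave after K or R unless the next residue is P), allows no missed cleavages or modifications, and sums standard monoisotopic residue masses plus one water and one proton; the protein sequence is hypothetical, and real search software offers many more options.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def trypsin_digest(protein):
    """Cleave after K or R unless followed by P; no missed cleavages."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide not ending in K/R
    return peptides


def peptide_mh(peptide):
    """Singly protonated monoisotopic mass [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


# Hypothetical protein fragment used only to exercise the cleavage rule.
for pep in trypsin_digest("AKPGRLLKSAW"):
    print(pep, round(peptide_mh(pep), 4))
```

Note that the K in "AKP" is not cleaved because it is followed by proline.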

Page 32: The nature of the IPG determines spot location on 2D-PAGE

Page 33: Limitations of 2D-GE

• In large-scale proteomic analysis, 2D-GE has been the major workhorse over the last 20 years: it is uniquely able to distinguish post-translational modifications and is analytically quantitative

• However, despite significant improvements to the technique (e.g. immobilized pH gradients) and its coupling with MS analysis, it is still difficult to automate

• Although at first glance the resolution of 2D-GE seems very impressive, it still lags behind the enormous diversity of proteins, so comigrating protein spots are not uncommon

• This is of particular concern when trying to distinguish between highly abundant proteins, e.g. actin (10^8 molecules/cell), and low-abundance proteins such as transcription factors (100-1000 molecules/cell); this is beyond the dynamic range of 2D-GE

• Enrichment or prefractionation can often overcome such discrepancies
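Using the figures on this slide, the gap between actin and a transcription factor spans five to six orders of magnitude:

```latex
\log_{10}\frac{10^{8}}{10^{3}} = 5, \qquad \log_{10}\frac{10^{8}}{10^{2}} = 6
```

That ratio is the dynamic range a single gel would need to cover to detect both proteins.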

Page 34: Limitations of 2D-GE

• The chemical heterogeneity of proteins also presents a major limitation

• Thus the full range of pIs and MWs of proteins exceeds what can routinely be analyzed by 2D-GE. However, improvements to IPGs are expected to overcome some of these constraints and greatly improve coverage of the entire proteome of the cell

• Problems linked with the extraction and solubilization of proteins prior to 2D-GE present an even greater challenge, especially for extremely hydrophobic proteins such as membrane and nuclear proteins. Again, recent advances in buffer composition have diminished the scale of this problem

Page 35: Differential Gel Electrophoresis (DiGE)

Page 36: Protein identification and characterization

• Specialized imaging software allows more detailed spot identification and comparison between gels and treatments

• By a process of subtraction, differences (e.g. presence, absence or intensity of proteins or of their different forms) between healthy and diseased samples can be revealed

• Cross-referencing to protein databases allows assignment by known pI and apparent molecular size. Definitive protein identification requires (enzymatic) digestion of the spot and analysis of charge and mass by mass spectrometry (MS)

• Spot-cutter tools coupled to image-analysis tools, together with in-gel tryptic digestion in 96- or 384-well format, can greatly reduce the bottleneck in sample identification by MS
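The "process of subtraction" described above can be sketched as a comparison of matched spot volumes between two gels. This Python sketch assumes spots have already been detected and matched by imaging software (the spot IDs and volumes are hypothetical), normalises each spot to a percentage of its gel's total spot volume, and reports presence/absence and changes beyond a two-fold threshold. It is a simplified stand-in: DiGE itself labels the samples with different fluorescent dyes and runs them, together with a pooled internal standard, on the same gel rather than normalising across separate gels.

```python
def percent_volumes(spots):
    """Express each spot volume as a percentage of the total spot volume on its gel."""
    total = sum(spots.values())
    return {spot: 100.0 * volume / total for spot, volume in spots.items()}


def compare_gels(healthy, diseased, fold=2.0):
    """Report spots that appear, disappear, or change at least `fold`-fold."""
    h, d = percent_volumes(healthy), percent_volumes(diseased)
    report = {}
    for spot in sorted(set(h) | set(d)):
        if spot not in h:
            report[spot] = "present only in diseased"
        elif spot not in d:
            report[spot] = "absent in diseased"
        else:
            ratio = d[spot] / h[spot]
            if ratio >= fold:
                report[spot] = f"up {ratio:.1f}-fold"
            elif ratio <= 1 / fold:
                report[spot] = f"down {1 / ratio:.1f}-fold"
    return report


# Hypothetical raw spot volumes after spot detection and matching.
healthy = {"s1": 100.0, "s2": 200.0, "s3": 300.0, "s4": 400.0}
diseased = {"s1": 100.0, "s2": 500.0, "s3": 100.0, "s5": 300.0}

for spot, change in compare_gels(healthy, diseased).items():
    print(spot, change)
```

Spots flagged this way are the candidates that would then be excised and identified by MS.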

Page 37: Protein analysis by MS

• Compared with protein sequencing, MS is more sensitive (femtomole to attomole amounts) and has higher throughput

• Digestion of an excised spot with trypsin yields a mixture of peptides. These are ionized either by electrospray ionization (ESI) from the liquid state or by matrix-assisted laser desorption ionization (MALDI) from the solid state, and the masses of the ions are measured by various coupled analyzers (e.g. a time-of-flight (TOF) analyzer measures the time the ions take to travel from the source to the detector), giving a peptide mass fingerprint

• The resulting signature is compared with the peptide masses predicted from theoretical digestion of the protein sequences found in databases, which identifies the protein

• Tandem MS provides actual protein sequence information: discrete peptide ions can be selected and further fragmented, and complex algorithms are used to correlate the experimental data with database-derived peptide sequences
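Peptide mass fingerprinting, as described above, reduces to comparing observed peptide masses with the theoretical masses of each candidate protein. The Python sketch below counts matches within a parts-per-million tolerance and picks the protein with the most matches; the 50 ppm tolerance, the protein names and all masses are hypothetical, and real search tools use probability-based scores rather than raw match counts.

```python
def count_matches(observed, theoretical, tol_ppm=50.0):
    """Number of observed masses lying within tol_ppm of any theoretical mass."""
    return sum(
        1
        for mass in observed
        if any(abs(mass - t) / t * 1e6 <= tol_ppm for t in theoretical)
    )


def identify(observed, database, tol_ppm=50.0):
    """Return the best-matching protein and the match count for every candidate."""
    scores = {
        name: count_matches(observed, masses, tol_ppm)
        for name, masses in database.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical theoretical [M+H]+ masses from in-silico digests of two proteins.
database = {
    "protein_X": [1000.50, 1200.60, 1500.70, 2000.90],
    "protein_Y": [1100.40, 1350.70, 1700.80],
}
observed_peaks = [1000.51, 1200.58, 2000.93, 1800.00]

best, scores = identify(observed_peaks, database)
print(best, scores)
```

Unmatched peaks (here 1800.00) are normal in practice and may come from modifications, missed cleavages or contaminants.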

Page 38: MS analysis

Page 39: MS analysis
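For the tandem MS step described on page 37, the expected fragments of a candidate peptide can be predicted and compared with the measured spectrum. Cleavage along the peptide backbone gives N-terminal b ions and C-terminal y ions; for singly charged ions, a b ion's mass is the sum of its residue masses plus a proton, and a y ion's mass additionally includes one water. The Python sketch below assumes singly charged, unmodified fragments and uses standard monoisotopic masses; the peptide "SAMPLER" is hypothetical.

```python
# Standard monoisotopic residue masses (Da) of the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def b_and_y_ions(peptide):
    """Singly charged b ions (b1..b(n-1)) and y ions (y1..y(n-1)) of a peptide."""
    residues = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_ions.append(sum(residues[:i]) + PROTON)  # first i residues
        y_ions.append(sum(residues[-i:]) + WATER + PROTON)  # last i residues
    return b_ions, y_ions


b, y = b_and_y_ions("SAMPLER")
for i, (b_mass, y_mass) in enumerate(zip(b, y), start=1):
    print(f"b{i} {b_mass:.4f}   y{i} {y_mass:.4f}")
```

Each complementary pair b(i) + y(n-i) adds up to the peptide mass plus two protons, a useful consistency check when reading a spectrum.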

Page 40: Antibody arrays

• Good for low-abundance proteins

• The problem is antibody specificity

Page 41: Protein microarrays

Page 42: Caveats

• The technology of proteomics is not as mature as that of genomics, owing to the lack of amplification schemes akin to PCR: only proteins from a natural source can be analyzed

• The complexity of the proteome arises because most proteins seem to be processed and modified in complex ways and can be the products of differential splicing

• In addition, protein abundance spans a range estimated at 5 to 6 orders of magnitude in yeast and 10 orders of magnitude in humans

Page 43: Challenges

• Complexity: some proteins have >1000 variants

• Need for a general technology for targeted manipulation of gene expression

• Limited throughput of today's proteomic platforms

• Lack of a general technique for absolute quantitation of proteins