CO 10. Genome: The entire collection of genes encoded by a particular organism. Determination of a...

Page 1: CO 10. Genome: The entire collection of genes encoded by a particular organism. Determination of an entire genome sequence is a prerequisite to understanding.

CO 10

Page 2

Genome:

The entire collection of genes encoded by a particular organism.

Determination of an entire genome sequence is a prerequisite to understanding the complete biology of an organism.

Page 3

Genomics:

Structural: construction of sequence data and gene maps.

Functional: functions of genes, their regulation, and their products.

Comparative: comparison of genes from different genomes to elucidate functional and evolutionary relationships.

Page 4

History

1990: The international Human Genome Project begins, with these goals:

1. To generate physical, genetic, and sequence maps of the human genome.

2. To sequence the genomes of a variety of model organisms.

3. To develop improved technologies for mapping and sequencing.

4. To develop computational tools for capturing, storing, analyzing, displaying, and distributing map and sequence information.

Page 5

History

1990: The international Human Genome Project begins (goals, continued).

5. To sequence EST (expressed sequence tag) fragments of cDNA, and eventually full-length cDNAs, in different cell types of humans and mice.

6. To consider the ethical, social, and legal challenges posed by genomic information.

Page 6

Fig. 10.1

Page 7

What is in this chapter?

• Challenges and strategies of genome analysis

• Major insights emerging from complete genome sequences

• High-throughput tools for analyzing genomes and their products

Page 8

Table 10.1

The genomes of living organisms vary enormously in size

Page 9

Challenges and strategies of genome analysis

Sequences and polymorphisms

• Sequence error rate: 1% per sequence read

• Error rate of good genomic sequence: 1/10,000

• Polymorphisms: 1/500 bp

• Repeated sequences may be hard to place

• Unclonable DNA cannot be sequenced

Page 10

Fig. 10.2

A divide-and-conquer strategy

Page 11

10-fold sequence coverage

Sequencing every chromosomal region from 10 independent inserts can bring the error rate below 1/10,000.

Random sequence error: seen in 1/10 sequence fragments

Polymorphism: seen in 5/10 sequence fragments
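
The reasoning behind 10-fold coverage can be sketched in code. This is an illustration only, not part of the lecture: the function takes the bases that independent reads report at one position, labels a rare disagreement as a likely sequencing error, and labels a roughly even split as a possible polymorphism. The name `classify_position` and the 20% cutoff are assumptions chosen for the example.

```python
from collections import Counter

def classify_position(bases, error_max_fraction=0.2):
    """Classify one aligned position covered by several independent reads.

    bases: one character per read, e.g. "AAAAAAAAAG" for 10 reads.
    error_max_fraction: hypothetical cutoff; a minority base at or below
    this fraction of the reads is treated as a random sequencing error.
    """
    counts = Counter(bases)
    ranked = counts.most_common()
    major_base = ranked[0][0]
    if len(ranked) == 1:
        return major_base, "consensus"
    minor_count = ranked[1][1]
    if minor_count / len(bases) <= error_max_fraction:
        return major_base, "likely sequencing error"
    return major_base, "possible polymorphism"

# One read in ten disagrees: consistent with a random error.
print(classify_position("AAAAAAAAAG"))
# Five reads in ten disagree: consistent with a polymorphism.
print(classify_position("AAAAAGGGGG"))
```

A fixed cutoff keeps the sketch short; a real base caller would also weigh per-base quality scores.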

Page 12

Major techniques in genome characterization

Cloning

Hybridization

PCR amplification

Sequencing

Computational tools

Page 13

Three types of maps used in the analysis of human genome

• Linkage map (DNA markers)

• Physical map (divide and conquer)

• Sequence map

Human genome: 3 × 10^9 bp

Page 14

Fig. 10.3

The making of large-scale linkage maps

Two common types of polymorphisms used for mapping

DNA markers

(expand or contract during replication)

Page 15

Genome-wide identification of genetic markers

Identification of SSRs by specific pairs of PCR primers
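
As a rough illustration of why a fixed primer pair can reveal an SSR allele, the sketch below counts consecutive copies of a repeat unit and computes the PCR product length, which grows with the copy number. The sequences, the `CA` repeat unit, and both function names are hypothetical; the primers are assumed to sit at the outer ends of the flanking sequences.

```python
import re

def longest_repeat_run(sequence, unit="CA"):
    """Return the largest number of consecutive copies of `unit` in `sequence`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)

def pcr_product_length(left_flank, right_flank, unit, copies):
    """Length of the product amplified across an SSR with `copies` repeats."""
    return len(left_flank) + len(unit) * copies + len(right_flank)

# Hypothetical allele: 4 copies of CA between two short flanks.
allele = "GGT" + "CA" * 4 + "TTG"
print(longest_repeat_run(allele))                 # 4
print(pcr_product_length("GGT", "TTG", "CA", 4))  # 14
print(pcr_product_length("GGT", "TTG", "CA", 6))  # 18
```

Alleles with different repeat counts give products of different lengths, which is what makes the SSR usable as a marker.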

Page 16

Human Linkage Map

• 20,000 SSRs, 4 million SNPs.

Page 17

Fig. 10.4

In humans: 1 cM ≈ 1 Mb

In mice: 1 cM ≈ 2 Mb
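
The two conversion factors can be applied as a simple lookup. This is a minimal sketch that treats the slide's figures as genome-wide averages; the function name and the species keys are assumptions made for the example, and local recombination rates vary along a chromosome.

```python
# Average physical distance per centimorgan, as stated on the slide.
MB_PER_CM = {"human": 1.0, "mouse": 2.0}

def approx_physical_distance_mb(genetic_distance_cm, species):
    """Convert a genetic distance (cM) to an approximate physical distance (Mb)."""
    return genetic_distance_cm * MB_PER_CM[species]

print(approx_physical_distance_mb(5, "human"))  # 5.0
print(approx_physical_distance_mb(5, "mouse"))  # 10.0
```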

Physical maps

Overlapping DNA fragments that are ordered and oriented and span each of the chromosomes in a genome

The molecular counterparts of linkage maps

Page 18

How to build long-range physical maps:

Bottom-up and top-down approaches

Page 19

A hypothetical physical map generated by the analysis of sequence-tagged sites

STS: sequence-tagged site
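
One way to picture how STSs order clones: two clones that contain the same STS must overlap. The sketch below finds such overlapping pairs. The clone names, the STS names, and the function `overlapping_clones` are hypothetical, and this is only an illustration of the idea, not the mapping procedure used in the chapter.

```python
from itertools import combinations

def overlapping_clones(sts_content):
    """Map each pair of clones that share at least one STS to the shared STSs."""
    overlaps = {}
    for a, b in combinations(sorted(sts_content), 2):
        shared = sts_content[a] & sts_content[b]
        if shared:
            overlaps[(a, b)] = shared
    return overlaps

# Hypothetical STS content of three clones.
clones = {
    "cloneA": {"STS1", "STS2"},
    "cloneB": {"STS2", "STS3"},
    "cloneC": {"STS3", "STS4"},
}
contig_links = overlapping_clones(clones)
for pair, shared in sorted(contig_links.items()):
    print(pair, sorted(shared))
```

Here cloneA overlaps cloneB and cloneB overlaps cloneC, so the three clones can be chained into one contig in the order A, B, C.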

Page 20

Fig. 10.5

Dark band: gene poor, AT rich

Light band: gene rich, GC rich

metaphase

Page 21

Chromosome 7 at three levels of banding resolution

Page 22

Fig. 10.6

FISH (fluorescent in situ hybridization)

Page 23

Advantages of FISH compared to linkage mapping

1. All clones can be mapped by FISH, but only those that detect polymorphisms can be mapped by linkage analysis.

2. FISH can be done on any cloned locus in isolation, but linkage requires the analysis of one locus in relation to another.

3. FISH requires only a single sample, whereas linkage requires genotype information from a large cohort of individuals.

Disadvantage: low resolution, 4-8 Mb

Page 24

A sequence map is the highest-resolution genomic map

Hierarchical shotgun approach

Whole-genome shotgun approach

Page 25

Fig. 10.12

Hierarchical shotgun approach

Minimally overlapping BACs

10× coverage across the BAC insert

200 kb × 10 / 2 kb = 1,000

Page 26

Fig. 10.13

Whole genome shotgun approach

10-fold sequence coverage

3 × 10^9 × 6 / 2,000
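
The arithmetic on the hierarchical and whole-genome shotgun slides follows one pattern: target length × fold coverage ÷ insert length. The sketch below evaluates both expressions. The function name `inserts_needed` is an assumption, and reading each result as a number of 2 kb inserts is an interpretation rather than something the slides state. The factor of 6 is copied from the whole-genome formula as written, even though that slide is labeled 10-fold coverage.

```python
def inserts_needed(target_bp, fold_coverage, insert_bp):
    """Number of inserts of length insert_bp that together cover
    target_bp at the requested fold coverage."""
    return target_bp * fold_coverage // insert_bp

# Hierarchical shotgun: one 200 kb BAC, 10-fold coverage, 2 kb inserts.
print(inserts_needed(200_000, 10, 2_000))     # 1000

# Whole-genome shotgun: 3 x 10^9 bp genome, factor of 6, 2 kb inserts.
print(inserts_needed(3 * 10**9, 6, 2_000))    # 9000000
```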

Page 27

Whole-genome shotgun approach

Advantage: no physical map needs to be constructed.

Disadvantage: some genomic sequences cannot be cloned.

Page 28

The Human Genome Project has changed the practice of biology, genetics, and genomics

Gene finding and gene-function analyses:

• Through comparative genomics, identification of genes and gene functions in a second genome is facilitated by sequence homology.

• Genes often encode one or more protein domains. This information provides insight into the functions of a protein.

Page 29

Fig. 10.14

Page 30

Fig. 10.15

Syntenic blocks

Page 31

Major insights from the human and model organism genome sequences

1. There are approximately 30,000 human genes.

2. Genes encode either noncoding RNAs or proteins.

Noncoding RNAs: tRNA, rRNA, snoRNA (small nucleolar RNA), snRNA (small nuclear RNA)

Page 32

3. Higher complexity of the human proteome: more genes, more paralogs, and alternative splicing.

Homologous genes: genes with enough sequence similarity to be evolutionarily related.

Orthologous genes: genes in two different species that arose from the same gene in the two species' common ancestor; they are identified by their sequence similarity.

Paralogous genes: genes that arise by duplication within the same species.

Page 33

Major insights from the human and model organism genome sequences

4. More domain architectures

Page 34

5. Chemical modification of proteins

• 400 different chemical modifications

• 1 million different proteins

Page 35

Major insights from the human and model organism genome sequences

6. Repeated sequences constitute more than 50% of the human genome.

Transposon-derived repeats, pseudogenes, or simple sequence repeats

Page 36

Major insights from the human and model organism genome sequences

7. The genome contains distinct types of gene organization

A) Gene families: multiple related genes, such as the olfactory receptor gene family (1,000 genes), histones, and hemoglobins

Page 37

Fig. 10.19 Olfactory receptor gene family

1. One gene undergoes duplication to generate 20 paralogs.

2. Massive duplication created 30 sites of the original 20-paralog family.

Page 38

Fig. 10.20

B) Gene-rich regions: 60 genes/700 kb; 70% of the DNA is transcribed

C) Gene deserts: 82 gene deserts, with no identifiable gene within a megabase

Page 39

Fig. 10.21

Combinatorial strategies may amplify genetic information and generate diversity

At the DNA level

Antibody or T-cell receptor genes: VDJ recombination

Page 40

Fig. 10.22

Combinatorial strategies may amplify genetic information and generate diversity

At the RNA level

Page 41

High-throughput genomic and proteomic platforms permit the global analysis of gene products

Page 42

Fig. 10.23

Sanger sequencing scheme

Page 43

DNA arrays

Macroarray: cDNA on a nylon membrane

Microarray: PCR-amplified products on a glass slide

Oligonucleotide array: chemically synthesized DNA or RNA, 20-60 nt long

Page 44

Fig. 10.25

Two-color DNA microarray

Samples: normal and tumor
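
A two-color experiment is usually summarized per spot as a log ratio of the two signals. The sketch below is an assumption-laden illustration, not taken from the figure: the intensities are invented, the function name is hypothetical, and the transcript does not say which dye labels which sample.

```python
import math

def log2_ratio(tumor_signal, normal_signal):
    """log2 of tumor over normal intensity for one spot.

    Positive: higher signal in tumor. Negative: higher in normal.
    Zero: equal signal in both samples.
    """
    return math.log2(tumor_signal / normal_signal)

# Hypothetical spot intensities (arbitrary units).
print(log2_ratio(800, 200))  # 2.0  -> 4-fold higher in tumor
print(log2_ratio(100, 400))  # -2.0 -> 4-fold higher in normal
print(log2_ratio(300, 300))  # 0.0  -> unchanged
```

The log scale makes a 4-fold increase and a 4-fold decrease symmetric around zero.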

Page 45

Fig. 10.27

Mass/charge ratios

Protein analyses

Page 46

Fig. 10.28

MPSS (massively parallel signature sequencing): a method to identify the transcriptome

Page 47

Fig. 10.31

Protein-protein interaction: affinity purification and mass spectrometry

Page 48

Fig. 10.32

The yeast two-hybrid

Page 49

Systems biology

Global study of multiple components of biological systems and their simultaneous interactions

Page 50

Systems biology approaches

1. Formulate a computer-based model based on current understanding.

2. Define as many of the system's elements as possible by discovery science.

3. Perturb the system either genetically or environmentally and measure changes.

Page 51

Fig. 10.33

Perturb the system and measure changes

Page 52

Fig. 10.34

Page 53

Fig. 10.35

4. Integrate the biological information, and compare these data against the predictions of the model.

Page 54

5. Formulate hypotheses to explain disparities between experimental data and the model, and use these hypotheses as the basis for a second round of perturbation.

6. Refine the model until model and experiment are in accord with one another.

Page 55

TABLES

Page 56

Table 10.2a

Page 57

Table 10.2b

Page 58

Table 10.3

Page 59

Fig. 10.9b

Page 60

Fig. 10.11

Page 61

Fig. 10.16

Page 62

Fig. 10.26

Page 63

Fig. 10.36

Page 64

Fig. 10.29

Page 65

Fig. 10.30

Page 66

Fig. 10.7

Basic procedures in building a whole chromosome physical map