Lecture 8 Plant Genomics I Genome sequencing and...

Page 1: Lecture 8 Plant Genomics I Genome sequencing and …science.umd.edu/classroom/BSCI411/Liu/lecture8.pdfLecture 8 Plant Genomics I Genome sequencing and analyses 1. Sequencing methods

Lecture 8 Plant Genomics I

Genome sequencing and analyses
1. Sequencing methods
2. Sequence annotation and analyses
3. Genome structure
4. Arabidopsis genome sequencing
5. Other plant genome sequencing efforts

Assigned reading
-Chapter 7, 322-325, 328-329
-Nature vol. 408, pages 792-795 (Dec. 14, 2000): “Now for the hard ones” and “A green chapter in the book of life”

Page 2

Hierarchical sequencing
-Generate & align large BAC or P1 clones
-Fragment and sequence a subset of clones

Shotgun sequencing
-Fragment and sequence entire genome

Adapted from Fig. 2.7 Gibson and Muse
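
The shotgun strategy (fragment everything, then reassemble from overlaps) can be illustrated with a toy example. This is a minimal Python sketch, assuming error-free reads from one strand and no repeats. The function names `overlap` and `greedy_assemble` and the toy sequence are illustrative and not from the lecture; real assemblers are considerably more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Toy "genome" (arbitrary sequence) and four overlapping reads in scrambled order.
toy_genome = "CGAGTCCTTAGGCATACAGCTCAG"
reads = ["GGCATACAGC", "CGAGTCCTTA", "TACAGCTCAG", "CCTTAGGCAT"]
contigs = greedy_assemble(reads)
print(contigs)
```

In hierarchical sequencing the same kind of assembly would be done separately within each BAC or P1 clone, which keeps repeats from different parts of the genome apart.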

Page 3

Sanger sequencing method: chain termination with a specific ddNTP (dideoxynucleotide)

Template: CGAGTCCTTAGGCATACA
Primer:   GCTCAG

Reaction with dNTPs, DNA polymerase, and ddT gives fragments that end at each T:

GCTCAGGAAddT
GCTCAGGAATCCGddT
GCTCAGGAATCCGTAddT
GCTCAGGAATCCGTATGddT

Four reactions are run, one per terminator: +ddA +ddT +ddG +ddC (lanes A T G C)
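
The chain-termination logic can be sketched in a few lines of Python. This is a toy model, assuming the template on the slide is written 3' to 5' so that the new strand is its base-by-base complement starting with the primer. The function names are illustrative, not from the lecture.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template, primer):
    """Full extension product, assuming the template is written 3'->5'."""
    strand = "".join(COMPLEMENT[base] for base in template)
    if not strand.startswith(primer):
        raise ValueError("primer does not anneal to the start of the template")
    return strand


def terminated_fragments(template, primer, dd_base):
    """Fragments produced in the reaction containing one ddNTP (e.g. ddT)."""
    strand = new_strand(template, primer)
    return [strand[: i + 1]
            for i in range(len(primer), len(strand))
            if strand[i] == dd_base]


def read_gel(template, primer):
    """Read the four lanes from shortest to longest fragment."""
    bands = [(len(frag), base)
             for base in "ATGC"
             for frag in terminated_fragments(template, primer, base)]
    return "".join(base for _, base in sorted(bands))


template = "CGAGTCCTTAGGCATACA"
primer = "GCTCAG"
print(terminated_fragments(template, primer, "T"))
print(read_gel(template, primer))
```

Each fragment length occurs in exactly one lane, so sorting the bands by length recovers the sequence added after the primer.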

Page 4

[Figure: sequencing lanes A, T, G, C (animation)]

Page 5

ATGTTCTT CAACAAAGCC GGCGAGACAA ACTTCGTATT CCATCTCTCG ATTCCCACTT CCACTTTCAC CCTCCTCCTCCTCCTTCCTC CGGCGGCGGA GGTGGCGTCT TTCCTCTCGC TGATTCCGAT TTCCTCGCAG CCGGTGGCTT TCACTCCAACAACAACAACA ACCACATATC TAACCCTAGC TACAGTAATT TCATGGGATT TCTCGGTGGC CCTTCTTCTT CTTCATCCACCGCAGTCGCC GTCGCCGGAG ATCATTCCTT TAACGCCGGA CTTTCTTCCG GAGACGTTCT TGTCTTCAAA CCCGAGCCTCTATCTCTATC TTTGTCCTCT CACCCTAGAC TCGCTTACGA TCTAGTCGTT CCCGGTGTTG TTAACTCCGG ATTCTGTAGATCTGCCGGTG AAGCCAACGC CGCCGCCGTC ACCATCGCGT CTAGAAGCTC TGGTCCTCTC GGACCTTTCA CGGGCTACGCGTCGATTCTT AAAGGATCAA GGTTCTTGAA ACCAGCACAG ATGCTTCTTG ATGAGTTTTG TAATGTGGGT CGTGGGATTTACACCGACAA AGTCATCGAC GACGATGATT CTTCTCTGCT TTTTGATCCG ACGGTTGAGA ATCTCTGCGG TGTTTCTGATGGCGGCGGAG GAGATAATGG AAAGAAAAAA TCAAAACTCA TCTCCATGCT CGACGAGGTT TACAAGAGGT ATAAGCAATACTATGAGCAG CTACAAGCTG TGATGGGATC ATTCGAATGC GTTGCAGGTC TCGGGCACGC TGCTCCGTAC GCTAACTTAGCCTTGAAAGC ATTGTCTAAG CATTTCAAGT GTTTGAAGAA TGCTATAACG GACCAGCTTC AATTCAGCCA CAACAACAAGATCCAACAAC AACAACAATG TGGTCATCCG ATGAACTCTG AGAATAAGAC TGATTCTTTA AGATTTGGAG GAAGTGATAGTTCTAGAGGC TTATGTTCTG CTGGTCAAAG ACATGGATTT CCTGATCATC ATGCTCCTGT TTGGAGACCG CACCGTGGCCTACCCGAACG TGCTGTTACT GTTCTAAGGG CTTGGCTCTT CGATCATTTC TTGCATCCTT ATCCAACAGA TACAGACAAACTCATGCTGG CTAAGCAGAC AGGTCTCTCC AGAAATCAGG TATCGAATTG GTTCATAAAC GCAAGAGTTA GGGTTTGGAAGCCGATGGTG GAAGAGATTC ACATGCTGGA GACTCGACAA TCTCAGAGAT CTTCTTCTTC CTCTTGGAGA GACGAACGTACTAGCACCAC CGTCTTCCCT GACAACAACA ACAACAACCC ATCTTCGTCC TCGGCACAGC AAAGACCTAA CAACTCATCTCCGCCTAGAC GGGCACGAAA CGACGACGTT CATGGCACAA ACAACAACAA CAGCTATGTA AACAGTGGGA GCGGCTGCGGTAGTGCGGTT GGTTTCTCGT ATGGAATTGG GTCGTCGAAT GTGCCGGTGA TGAATAGCAG CACAAACGGA GGAGTGTCTTTGACGTTAGG GCTTCATCAT CAGATTGGGT TACCGGAGCC TTTTCCGATG ACAACTGCTC AGAGGTTTGG TCTTGATGGTGGTAGTGGCG ATGGTGGTGG TGGGTATGAA GGGCAAAATC GTCAGTTTGG GAGAGATTTT ATTGGTGGTA GTAATCATCAGTTTCTACAT GATTTTGTAG GTTGAGATTA TTTGTGTGGA AAGGAAAAAA TATGTTTGAC GTTTGGGTAT GTATAAGAAGATATGGGGGA ATTGAAATGC ATATGATGTG TATATTAGAA TGTTTCTTC

BLAST: Basic Local Alignment Search Tool. Performs pairwise comparisons of sequences, seeking regions of local similarity rather than an optimal global alignment between two sequences

Blast Search

NCBI (http://www.ncbi.nlm.nih.gov/)
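
To make "local similarity" concrete, here is a minimal Python sketch of local alignment. BLAST itself is a fast heuristic; this sketch instead uses the Smith-Waterman dynamic-programming algorithm, the exact method for the same problem. The scoring values (match +2, mismatch -1, gap -2) and the two short test sequences are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix; scores never drop below 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(i, j),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score reaches 0.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Two sequences that share only an internal region.
result = smith_waterman("TTTGGCATACAGTTT", "CCCGGCATACAGCCC")
print(result)
```

A global alignment would be forced to pair the unrelated flanks as well; the local method reports only the shared region.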

Page 6

Searching sequence databases

Blast result

Query: the submitted sequence

GenBank accession number: every sequence submitted to GenBank is assigned a unique number

E-value: the number of hits with a score at least as good as this one that would be expected by chance in a database of this size; the lower the E-value, the more significant the match

Scores: based on a scoring matrix that rewards matches and penalizes mismatches and gaps according to the rules of the sequence alignment
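
The relationship between score and E-value can be sketched with the standard Karlin-Altschul formula, E = K·m·n·e^(-λS), where m is the query length, n the database length, and S the raw score. The values of λ and K below are placeholder assumptions for illustration; in practice they depend on the scoring system and are reported in the BLAST output.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalized (bit) score: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


LAMBDA, K = 0.318, 0.134          # illustrative placeholder parameters (assumption)
QUERY_LEN, DB_LEN = 100, 1_000_000  # hypothetical query and database sizes

for score in (40, 80, 160):
    e = expect_value(score, QUERY_LEN, DB_LEN, LAMBDA, K)
    print(score, round(bit_score(score, LAMBDA, K), 1), f"{e:.2e}")
```

Doubling the database size doubles the E-value for the same score, which is why the same alignment can look more or less significant depending on what was searched.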

Page 7

Genome Annotation

EST (Expressed Sequence Tag) sequencing

Ab initio gene discovery
-GeneFinder
-Grail
-Genie
-Genscan
-HMMgene
-FGENES
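
The simplest form of ab initio gene discovery is a scan for open reading frames. The sketch below is a deliberately naive stand-in, not how the programs listed above work: those use statistical models of splice sites, codon usage, and similar signals, and handle introns. It assumes the forward strand only and an ATG start codon; the function name and test sequence are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, sequence) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based, end-exclusive; min_codons counts the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first in-frame start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None                   # look for the next ORF in this frame
    return orfs


orfs = find_orfs("CCATGAAACCCGGGTAGTT")
print(orfs)
```

EST sequencing complements this kind of prediction: an EST that matches a predicted ORF is direct evidence that the region is transcribed.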

Page 8

Some of the genome sizes

E. coli       4.5 x 10^3 kb
Yeast         1.2 x 10^4 kb
C. elegans    9.7 x 10^4 kb
Arabidopsis   1.2 x 10^5 kb
Drosophila    1.8 x 10^5 kb
Mung bean     4.5 x 10^5 kb
Rice          5.0 x 10^5 kb
Tomato        1.0 x 10^6 kb
Potato        1.8 x 10^6 kb
Human         3.2 x 10^6 kb
Soya bean     1.1 x 10^6 kb
Maize         6.6 x 10^6 kb
Wheat         1.6 x 10^7 kb

Page 9

-Gene number varies little compared with genome size: 25,000 in Arabidopsis, 30,000-40,000 in human, 19,099 in C. elegans, 13,600 in Drosophila.

-The bigger a genome, the more repetitive DNA: the C-value paradox

Genome size (% repetitive DNA):
Arabidopsis: 1 x 10^5 kb (14%)
Tomato: 1 x 10^6 kb (15-20%)
Mung bean: 4.5 x 10^5 kb (30%)
Pea: 4.1 x 10^6 kb (70%)
Wheat, corn: 10^7 kb (60-80%)

Fig. 7.23

-Adh1 gene in maize:
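
The gap between genome size and gene number is easy to quantify as gene density. This small Python sketch uses the genome sizes and gene counts quoted in these slides; the human gene count of 35,000 is an assumption (the midpoint of the 30,000-40,000 range), and the function name is illustrative.

```python
# (genome size in kb, approximate gene number) as quoted in the lecture
genomes = {
    "C. elegans": (9.7e4, 19099),
    "Arabidopsis": (1.2e5, 25000),
    "Drosophila": (1.8e5, 13600),
    "Human": (3.2e6, 35000),  # assumed midpoint of the 30,000-40,000 range
}


def genes_per_mb(size_kb, n_genes):
    """Gene density in genes per megabase."""
    return n_genes / (size_kb / 1000)


density = {name: genes_per_mb(size, genes) for name, (size, genes) in genomes.items()}
for name, value in sorted(density.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {value:7.1f} genes/Mb")
```

The human genome is roughly 27 times larger than the Arabidopsis genome by these figures, yet the gene counts differ by less than a factor of two, so most of the extra DNA is not genes.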

Page 10

Arabidopsis thaliana (www.arabidopsis.org)

Genome sequence completed in 2000, published in 5 installments. See “Arabidopsis Genome Initiative, 2000 (pdf)”

-115 Mb, 25,500 predicted genes
-Whole genome duplication (2X) followed by extensive shuffling of chromosomal regions and gene loss
-The majority of the genes can be assigned to just 11,000 families, which might represent the minimal complexity or “toolkit” needed to support complex multicellularity. Animal and plant genomes might have evolved from this toolkit

-Distinctive features of the plant genome:
 ~800 genes are of plastid descent
 ~10% of the genome is transposable elements
 plant-specific genes:
  Enzymes for cell wall biosynthesis, photosynthesis, secondary metabolites
  Phototropism, gravitropism
  Transport proteins for moving nutrients, ions, toxic compounds, and metabolites between cells
  Pathogen resistance genes

Page 11

Synteny: Colinearity of loci (genes) among different plant species

i.e. Evolutionarily conserved organization and arrangement of single-copy genes

Also see Fig. 7.28 of our textbook

20 of the 54 genes in a 340 kb stretch of the rice genome (top) retain the same order in five different 80-200 kb regions of the Arabidopsis genome

Figure legend: genes on different strands; interspersed, unrelated genes
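
Counting how many genes "retain the same order" between two regions can be framed as finding the longest chain of shared genes whose positions increase in both regions. The following Python sketch assumes each gene has already been assigned an ortholog identifier and appears once per region; the gene names are hypothetical and the function name is illustrative.

```python
def colinear_genes(region_a, region_b):
    """Longest list of shared genes that appear in the same order in both regions."""
    pos_in_b = {gene: i for i, gene in enumerate(region_b)}
    shared = [gene for gene in region_a if gene in pos_in_b]
    if not shared:
        return []

    # Longest increasing subsequence of positions in region_b (simple O(n^2) DP).
    n = len(shared)
    best_len = [1] * n
    prev = [-1] * n
    for i in range(n):
        for j in range(i):
            if (pos_in_b[shared[j]] < pos_in_b[shared[i]]
                    and best_len[j] + 1 > best_len[i]):
                best_len[i] = best_len[j] + 1
                prev[i] = j

    # Walk back from the end of the longest chain.
    i = max(range(n), key=best_len.__getitem__)
    chain = []
    while i != -1:
        chain.append(shared[i])
        i = prev[i]
    return chain[::-1]


# Hypothetical gene orders; x1 and x2 are interspersed, unrelated genes.
rice_like = ["g1", "g2", "g3", "g4", "g5", "g6"]
arabidopsis_like = ["x1", "g1", "g3", "x2", "g2", "g4", "g6"]
chain = colinear_genes(rice_like, arabidopsis_like)
print(chain)
```

Interspersed unrelated genes and local rearrangements shorten the chain without destroying it, which matches the picture of partial colinearity described above.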

Page 12

Grasses, Legumes, and Solanaceae

-Whole genome sequencing: rice, maize, and alfalfa
-Comparative genome methods: synteny
-EST projects: for the rest of the crop plants
-High-resolution genetic maps: for the rest of the crop plants

Grasses: rice, maize, barley, wheat, sorghum
Legumes: soybean, alfalfa
Solanaceae: tomato, potato

US ARS: http://ars-genome.cornell.edu

MaizeDB: http://www.agron.missouri.edu

Cropnet http://uk-crop.net

Page 13

90 different angiosperm genome projects are listed at the USDA site: http://www.nal.usda.gov/pgdic/map_proj

Forest genomics: http://dendrome.ucdavis.edu