Stratton Nature 458: 719, 2009 Evolution of DNA sequencing technologies - 1980 to present day DNA SEQUENCING & ASSEMBLY

Page 1:

Stratton Nature 458: 719, 2009

Evolution of DNA sequencing technologies - 1980 to present day

DNA SEQUENCING & ASSEMBLY

Page 2:

“The X Prize Foundation of Playa Vista, California, is offering a $10-million prize to the first team to accurately sequence the genomes of 100 people aged 100 or older, for $1,000 or less apiece and within 30 days [beginning September 5, 2013].”

… with an accuracy of < 1 error per 1 million bases

$$$ Motivation: to “spur DNA sequencing technologies, boost accuracy and drive down costs”

(see Nature 487:417, July 26, 2012)

Page 3:

Sanger chain termination method (Fred Sanger, 1977)

- enzymatic synthesis of DNA strand complementary to “template” of interest

… but it stops when a dideoxynucleotide is incorporated

4 parallel sets of reactions: ddATP + 4 dNTPs, ddCTP + 4 dNTPs, etc. (Fig. 4.2)

Nobel Prizes: Sanger 1958 (protein structure), 1980 (DNA sequencing)

Page 4:

Fig. 4.2

Ratio of ddATP:dATP is important to get an appropriate size range of products

- set of products each terminating with ddA

- their sizes reflect positions of T in template DNA
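The ddA reaction described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of real polymerase kinetics: the template is written 3'->5' so that position i pairs with the i-th nucleotide added, primer length is ignored, and `p_terminate` is a hypothetical per-site termination probability standing in for the effect of the ddATP:dATP ratio.

```python
def termination_lengths(template_3to5, dd_base="A"):
    """Lengths of chain-terminated products for one ddNTP reaction.

    template_3to5 is the template written 3'->5' (assumed convention), so
    index i pairs with the i-th nucleotide added to the growing strand.
    """
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    lengths = []
    for i, base in enumerate(template_3to5):
        # A dideoxynucleotide can only be incorporated opposite its partner base.
        if complement[base] == dd_base:
            lengths.append(i + 1)
    return lengths


def product_fractions(n_sites, p_terminate):
    """Fraction of chains ending at the 1st, 2nd, ... eligible site.

    p_terminate is an assumed probability that a dideoxynucleotide (rather
    than the normal dNTP) is incorporated at any one eligible site.
    """
    return [p_terminate * (1 - p_terminate) ** k for k in range(n_sites)]


# T occurs at template positions 1, 5, 6 and 10, so ddA products have those lengths.
print(termination_lengths("TGCATTGCAT"))   # [1, 5, 6, 10]
print(product_fractions(3, 0.5))           # [0.5, 0.25, 0.125]
```

With a high termination probability nearly all chains stop at the first few sites, and with a very low one few chains stop at all, which is why the ratio has to be tuned to the size range of interest.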

Page 5:

Products (each differing in length by 1 nt) are resolved on denaturing polyacrylamide gels ...

… or by capillary electrophoresis

Automated sequencing profile; autoradiograph (Fig. 4.1, Fig. 4.3)
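Reading a sequence from the four reactions amounts to sorting all products by length and noting which reaction each came from. A minimal sketch, assuming each lane is given as a list of product lengths (the `lanes` dictionary is an invented example, not data from the slides):

```python
def read_ladder(lanes):
    """Read the synthesized strand (5'->3') from four chain-termination lanes.

    lanes maps the dideoxy base of each reaction to the lengths of the
    products seen in that lane.
    """
    base_at_length = {}
    for base, lengths in lanes.items():
        for n in lengths:
            base_at_length[n] = base
    # Shortest product = first nucleotide added, and so on up the gel.
    return "".join(base_at_length[n] for n in sorted(base_at_length))


lanes = {"A": [1, 5, 6, 10], "C": [2, 7], "G": [3, 8], "T": [4, 9]}
print(read_ladder(lanes))   # ACGTAACGTA
```

The result is the newly synthesized strand, so the template is its complement.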

Page 6:

PRIMERS FOR SEQUENCING (Fig. 4.5)

1. “Universal” - forward & reverse

2. Custom-designed “internal”

- use new sequence info to design a primer to sequence the next stretch

If the insert is too long to sequence completely using “universal” primers, this strategy can be used to close a “sequencing gap”
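The custom "internal" primer step can be illustrated as picking a short stretch near the 3' end of the sequence just obtained. The sketch below is an assumption-laden illustration: the 20 nt primer length and the 50 nt setback from the end of the read (to avoid the less reliable tail) are illustrative defaults, and real primer design would also check melting temperature and uniqueness.

```python
def next_walking_primer(read, primer_len=20, offset_from_end=50):
    """Choose an internal primer from near the 3' end of a finished read.

    The primer has the same sequence as the read, so extending it continues
    synthesis in the same direction into the unsequenced part of the insert.
    primer_len and offset_from_end are illustrative values, not a standard.
    """
    if len(read) < primer_len + offset_from_end:
        raise ValueError("read too short for the requested primer position")
    start = len(read) - offset_from_end - primer_len
    return read[start:start + primer_len]


read = "A" * 30 + "ACGTACGTACGTACGTACGT" + "C" * 50
print(next_walking_primer(read))   # ACGTACGTACGTACGTACGT
```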

Page 7:

… or can find another clone in library that has overlap, and sequence it using “universal” primers

Fig. 3.35

Page 8:

What if there is a “physical gap”? (Fig. 4.17)

- i.e. if a particular region of the genome is not represented in the clone library (maybe the region was unstable in the first vector)

- can use a different vector to prepare a second clone library

- then use probes (e.g. oligomers) mapping to ends of contigs from the first library to screen the second library

Page 9:

Example of closing a “physical gap” (Fig. 4.11)

You have 9 contigs & design oligomers mapping close to their ends (#1-18)

Which contigs are adjacent?

- screening by hybridization

… or by PCR
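The bookkeeping behind "which contigs are adjacent?" can be sketched as follows. The numbering convention is an assumption made for illustration (oligomers 2k-1 and 2k map to the two ends of contig k), and the positive pairs in the example are invented, not taken from the figure.

```python
def contig_of(oligo):
    """Contig that an end oligomer maps to, assuming oligomers 2k-1 and 2k
    sit at the two ends of contig k (hypothetical numbering)."""
    return (oligo + 1) // 2


def adjacent_contigs(positive_pairs):
    """Infer adjacent contigs from oligomer pairs that gave a signal.

    positive_pairs: pairs of oligomer numbers that hybridized to the same
    clone, or that gave a PCR product when used together as primers.
    """
    links = set()
    for a, b in positive_pairs:
        ca, cb = contig_of(a), contig_of(b)
        if ca != cb:                      # ignore pairs within one contig
            links.add(tuple(sorted((ca, cb))))
    return sorted(links)


# Invented result: oligomers 2 and 15 give a product, as do 16 and 3.
print(adjacent_contigs([(2, 15), (16, 3)]))   # [(1, 8), (2, 8)]
```

In this made-up case contig 8 would lie between contigs 1 and 2.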

Page 10:

What if “physical gap” is very short (< 10 kb or so)?

- could use oligomers mapping to ends of contigs in PCR reactions with uncloned DNA template

5’… … 3’
3’… … 5’

- then sequence the PCR product directly

- this slide also illustrates a method for finding overlapping clones

Page 11:

ASSEMBLING INFO FROM CLONES INTO CONTIGS

1. CHROMOSOME WALKING by hybridization (Fig. 4.12)

- sequence from one clone is used as probe to screen library of clones to find an overlapping one

- repeat to “walk along” genome

Page 12:

But what if probe contains repeated sequences? (Fig. 3.34)

- then it hybridizes to multiple clones

Problem avoided if use short unique-sequence probe (e.g. oligomer) mapping close to end of clone

… or if pre-hybridize with repeat sequence

Page 13:

2. CHROMOSOME WALKING by PCR (Fig. 4.13)

- design primer pairs based on sequence at end of clone

- use other clones in library for template DNA

- will get PCR amplicon for any new clones with that sequence

- reactions can be carried out as pools for more rapid screening (combinatorial screening) (Fig. 4.14)
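One common way to pool reactions is by row and column of a microtitre plate, which the sketch below assumes (the slides do not specify the pooling scheme). `has_target` is a hypothetical stand-in for "this clone gives a PCR amplicon".

```python
def pooled_screen(plate, has_target):
    """Screen a plate of clones using row pools and column pools.

    plate: list of rows, each a list of clone names.
    has_target: function returning True if a clone would give an amplicon
                (a stand-in for the actual PCR result).
    Returns (candidate clones, number of pooled reactions run).
    """
    n_cols = len(plate[0])
    # A pool is positive if any clone in it carries the target sequence.
    positive_rows = [r for r, row in enumerate(plate)
                     if any(has_target(clone) for clone in row)]
    positive_cols = [c for c in range(n_cols)
                     if any(has_target(row[c]) for row in plate)]
    candidates = [plate[r][c] for r in positive_rows for c in positive_cols]
    return candidates, len(plate) + n_cols


# 96-well plate: rows A-H, columns 1-12.
plate = [[f"{row}{col}" for col in range(1, 13)] for row in "ABCDEFGH"]
print(pooled_screen(plate, lambda clone: clone == "C7"))   # (['C7'], 20)
```

Twenty pooled reactions replace 96 individual ones. When more than one clone is positive, the row and column intersections include false candidates, which then need individual confirmation.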

Page 14:

3. CLONE FINGERPRINTING

To identify overlapping clones: by finding features that they share

- restriction profile fingerprint (Fig. 4.15A)

- or clones having STS in common (Fig. 4.15D)
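Comparing restriction fingerprints comes down to counting fragments of matching size in two clones. A minimal sketch, assuming fragment sizes in bp and a 2% sizing tolerance (both the tolerance and the example sizes are invented for illustration):

```python
def shared_fragments(sizes_a, sizes_b, tolerance=0.02):
    """Count restriction fragments of matching size in two clones.

    Two fragments match if their sizes differ by no more than `tolerance`
    (as a fraction of the first fragment's size). Each fragment in clone B
    can be matched only once.
    """
    remaining = sorted(sizes_b)
    shared = 0
    for size in sorted(sizes_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance * size:
                shared += 1
                del remaining[i]   # safe: we leave the inner loop immediately
                break
    return shared


clone_a = [500, 1200, 2300, 4100]
clone_b = [1210, 2300, 3500, 4090]
print(shared_fragments(clone_a, clone_b))   # 3
```

Many shared fragments suggest overlap. How many are needed before calling two clones overlapping is a judgment call that depends on the enzyme and the insert size.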

Page 15:

Haemophilus genome project 1995 (1.8 Mbp) (Fig. 4.10)

1. DNA sonicated, fragments (1.6 – 2 kb) cloned in plasmid vectors

2. Shotgun sequencing of insert ends: ~ 20,000 clones analyzed, 11 Mbp of sequence; scaffolds with sequencing gaps & physical gaps

3. Assembled into 140 contigs

4. Screened for overlapping clones – reduced to 42 contigs

5. Assumed gaps represented genome regions unstable in plasmid vector - switched to lambda vector

6. Probed library with oligomers from contig ends or used PCR with primer pairs from contig ends
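The numbers on this slide give the sequencing depth directly: 11 Mbp of reads over a 1.8 Mbp genome is roughly 6-fold coverage. The sketch below also applies the standard Poisson approximation for the fraction of the genome left unsequenced. That model assumes reads land uniformly at random, which is an idealization: as step 5 notes, some regions were missing from the plasmid library altogether.

```python
import math


def coverage(total_sequenced_bp, genome_bp):
    """Average number of times each base of the genome was sequenced."""
    return total_sequenced_bp / genome_bp


def expected_unsequenced_fraction(cov):
    """Poisson estimate of the fraction of bases hit by no read at all,
    assuming reads are placed uniformly at random (an idealization)."""
    return math.exp(-cov)


c = coverage(11e6, 1.8e6)
gap_fraction = expected_unsequenced_fraction(c)
print(f"coverage: {c:.1f}x")
print(f"expected unsequenced: {gap_fraction * 1.8e6:.0f} bp")
```

Under this model only a few kilobases would be expected to be missed by chance, which is consistent with the slide's assumption that the remaining gaps reflected cloning bias instead of insufficient sequencing.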

Page 16:

“Next generation” sequencing technologies

“Cost per Megabase of DNA Sequence (or Why biologists panic about computing)” (National Human Genome Research Institute)

- major challenge to correctly assemble the massive amount of sequence data generated … and to interpret it!

Page 17:

1. Pyrosequencing (Fig. 4.9)

- one dNTP is added at a time + enzyme (apyrase) that degrades the dNTP if it is not incorporated into the new strand, then the next dNTP is added

- incorporation releases pyrophosphate (PPi), which is detected by chemiluminescence

Genome Res 11:3, 2001

www.youtube.com/watch?v=kYAGFrbGl6E&feature=related
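The cycle of "add one dNTP, detect light, degrade, add the next" can be simulated in idealized form. In this sketch the signal for each dispensation is simply the number of nucleotides incorporated, so a run of identical bases gives a proportionally stronger signal. The flow order `"TACG"` and the noise-free signals are assumptions made for illustration.

```python
def flowgram(synthesized, flow_order="TACG", n_cycles=10):
    """Idealized pyrosequencing signals for a strand being synthesized.

    synthesized: sequence of the new strand, 5'->3'.
    Returns a list of (nucleotide, signal) per dispensation, where signal is
    the number of bases incorporated (0 = nothing incorporated, dNTP degraded).
    """
    signals = []
    pos = 0
    for _ in range(n_cycles):
        for nt in flow_order:
            run = 0
            # The polymerase adds this nucleotide for as long as it matches.
            while pos < len(synthesized) and synthesized[pos] == nt:
                run += 1
                pos += 1
            signals.append((nt, run))
            if pos == len(synthesized):
                return signals
    return signals


def decode(signals):
    """Rebuild the sequence from (nucleotide, signal) pairs."""
    return "".join(nt * n for nt, n in signals)


flows = flowgram("TTAGC")
print(flows)          # [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
print(decode(flows))  # TTAGC
```

In practice signal strength becomes hard to resolve for long homopolymer runs, which this noise-free sketch does not capture.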

Page 18:

“Massively-parallel” pyrosequencing (on beads or chips)

454 technology

- DNA sheared, adaptors ligated, attached to bead & PCR amplified

- beads captured in wells & pyrosequencing carried out in parallel on each DNA fragment

- average read of ~ 700 (?) bp

... but “up to 1.6 million reactions can be carried out in parallel on a 6.4 cm² slide”

“expect ~ 500 million nucleotides of sequence data per 10 hour run” (July 2010)

[Figure: sample preparation (genomic DNA, PCR) and pyrosequencing (enzymes on beads and primer; polymerase, PPi, light)]

Medini Nat Rev Microbiol. 6:419, 2008

Page 19:

2. Illumina sequencing (parallel microchip)

SOLEXA technology

- add adaptors to sheared DNA, attach to chip, then PCR “bridge amplification”

- denature clusters of ~ 1000 copies of DNA molecules & sequential sequencing using four fluorophore-labelled nts

- average read of ~ 40-100 bp (short-read)

[Figure: sample preparation; sequencing by synthesis]

Medini Nat Rev Microbiol. 6:419, 2008

www.youtube.com/watch?v=HtuUFUnYB9Y&feature=related

                          HiSeq 2000      HiSeq 1000
Output (2 × 100 bp)       600 Gb          300 Gb
Run Time (2 × 100 bp)     ~11 days        ~8.5 days
Paired-end Reads          6 Billion       3 Billion
Single Reads              3 Billion       1.5 Billion
Maximum Read Length**     2 × 100 bp      2 × 100 bp
Bases Above Q30***        > 85% (2 × 50 bp), > 80% (2 × 100 bp)

(Illumina website Sept. 2012)
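Two of the figures in this table can be unpacked with simple arithmetic. Q30 is a Phred quality score, defined as Q = -10 log10(P), where P is the probability that a base call is wrong. Output is the number of reads times the read length. The sketch below works through both; the function names are illustrative only.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Phred score corresponding to an error probability p."""
    return -10 * math.log10(p)


def output_gb(n_reads, read_length_bp):
    """Total sequence output in gigabases."""
    return n_reads * read_length_bp / 1e9


# Q30 means about 1 error in 1,000 bases.
print(phred_to_error_probability(30))
# 6 billion paired-end reads of 100 bp match the 600 Gb output for the HiSeq 2000.
print(output_gb(6e9, 100))
# The X Prize target of 1 error per million bases corresponds to about Q60.
print(error_probability_to_phred(1e-6))
```

Seen this way, the X Prize accuracy target on Page 2 is roughly a thousand times stricter than a single Q30 base call, which is one reason deep coverage is needed.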

Page 20:

3. Single molecule real-time sequencing (Helicos, Pacific Biosciences)

- continuous monitoring of nt incorporation (rather than termination as in Sanger method…) and no amplification

- formation of phosphodiester bond releases fluorophore

- nanoscale wells on chip so ~ one DNA polymerase molecule per well

- read length 25 to 55 bases, 21-35 Gigabases per run (Helicos website Sept. 2012)

Metzker Nature Reviews Genetics 11:31, 2010

Page 21:

Chin et al. New Eng J Med 364:33, 2011

Press release, Dec 9, 2010: “PacBio & Harvard Use Fast Gene Sequencer to Crack DNA Code of Haitian Cholera Strain”

H1 and H2 strains were sequenced in < 24 hr with enough “reads” to cover the genomes 60 and 32 times, respectively.