Stratton Nature 458:719, 2009

Evolution of DNA sequencing technologies - 1980 to present day

DNA SEQUENCING & ASSEMBLY

“The X Prize Foundation of Playa Vista, California, is offering a $10-million prize to the first team to accurately sequence the genomes of 100 people aged 100 or older, for $1,000 or less apiece and within 30 days [beginning September 5, 2013].”

see Nature 487:417, July 26, 2012

$$$ Motivation to “spur DNA sequencing technologies, boost accuracy and drive down costs”

…with an accuracy of <1 error per 1 million bases

http://genomics.xprize.org/100-over-100

Sanger chain termination method (Fred Sanger, 1977)

- enzymatic synthesis of DNA strand complementary to “template” of interest

… but it stops when a dideoxynucleotide is incorporated

4 parallel sets of reactions: ddATP + 4 dNTPs, ddCTP + 4 dNTPs, etc. Fig. 4.2

Nobel Prizes: Sanger 1958 (protein structure), 1980 (DNA sequencing)

Fig. 4.2

Ratio of ddATP:dATP important to get appropriate size range of products

- set of products each terminating with ddA

- their sizes reflect positions of T in template DNA

Products (each differing in length by 1 nt) resolved on denaturing polyacrylamide gels...

Automated sequencing profile, autoradiograph (Fig. 4.1, Fig. 4.3)

or by capillary electrophoresis …

Fig. 4.5
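The chain termination logic can be sketched in a few lines of Python. This is a toy model, not a description of any real instrument: the template sequence and function names are made up for illustration, product lengths ignore the primer, and every position is assumed to yield a detectable terminated product.

```python
# Toy model of Sanger chain termination; sequence and names are illustrative only.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def termination_lengths(template_3to5, dd_base):
    """Lengths of products ending with the given dideoxynucleotide.

    The new strand grows 5'->3' while the template is read 3'->5', so
    template position i (1-based) corresponds to a product of length i
    (primer length is ignored in this sketch).
    """
    return [i for i, t in enumerate(template_3to5, start=1)
            if COMPLEMENT[t] == dd_base]

def read_gel(template_3to5):
    """Run the four ddNTP reactions and read the bands from shortest to longest."""
    lanes = {b: termination_lengths(template_3to5, b) for b in "ACGT"}
    by_length = {n: b for b, lengths in lanes.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))

template = "TACGGTCA"                       # hypothetical template, written 3'->5'
ddA_products = termination_lengths(template, "A")
new_strand = read_gel(template)             # sequence of the synthesized strand, 5'->3'
```

The ddA lane contains one product for each T in the template, which is the point made on the slide; reading all four lanes in order of size gives the complementary strand.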

PRIMERS FOR SEQUENCING

1. “Universal” - forward & reverse

2. Custom-designed “internal”

- use new sequence info to design primer to sequence next stretch

If insert is too long to completely sequence using “universal” primers, can use this strategy to close a “sequencing gap”

… or can find another clone in library that has overlap, and sequence it using “universal” primers

Fig. 3.35

What if there is a “physical gap”?

- i.e. if a particular region of genome is not represented in clone library

- can use a different vector to prepare a second clone library (maybe region was unstable in first vector)

- then use probes (e.g. oligomers) mapping to ends of contigs from first library to screen second library

Fig. 4.17

Fig. 4.11

Example of closing a “physical gap”

You have 9 contigs & design oligomers mapping close to their ends (#1-18)

Which contigs are adjacent?

- screening by hybridization … or by PCR
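The reasoning behind the screen can be written out as a short Python sketch: if a single clone from the second library is hit by end-oligomers from two different contigs, those two contig ends must lie next to each other. The numbering scheme (contig k carries oligomers 2k-1 and 2k) and the hybridization results below are assumptions made up for illustration; they are not taken from the figure.

```python
def contig_of(oligo):
    """Map oligomers 1-18 to contigs 1-9, assuming contig k carries oligomers 2k-1 and 2k."""
    return (oligo + 1) // 2

def adjacent_contig_ends(hits_per_clone):
    """Return pairs of end-oligomers (from different contigs) that hit the same clone."""
    links = set()
    for oligos in hits_per_clone.values():
        oligos = sorted(oligos)
        for i, a in enumerate(oligos):
            for b in oligos[i + 1:]:
                if contig_of(a) != contig_of(b):
                    links.add((a, b))
    return sorted(links)

# Hypothetical screening result: end-oligomers that hybridize to each clone
hits = {"cloneA": {2, 15}, "cloneB": {16, 5}, "cloneC": {7}, "cloneD": {3, 4}}
links = adjacent_contig_ends(hits)
```

With these made-up hits, cloneA bridges contigs 1 and 8 and cloneB bridges contigs 8 and 3; cloneD only carries both ends of one short contig, so it says nothing about adjacency.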

What if “physical gap” is very short? (< 10 kb or so)

- could use oligomers mapping to ends of contigs in PCR reactions with uncloned DNA template

- then sequence the PCR product directly

5’… … 3’

3’… … 5’

- this slide also illustrates a method for finding overlapping clones

ASSEMBLING INFO FROM CLONES INTO CONTIGS

1. CHROMOSOME WALKING by hybridization

- sequence from one clone is used as probe to screen library of clones to find overlapping one

- repeat to “walk along” genome

Fig. 4.12

But what if probe contains repeated sequences?

- then it hybridizes to multiple clones

Problem avoided if use short unique-sequence probe (e.g. oligomer) mapping close to end of clone

… or if pre-hybridize with repeat sequence

Fig. 3.34

2. CHROMOSOME WALKING by PCR

Fig. 4.13

- design primer pairs based on sequence at end of clone

- use other clones in library for template DNA

- will get PCR amplicon for any new clones with that sequence

- reactions can be carried out as pools for more rapid screening (combinatorial screening) Fig. 4.14
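Why pooling speeds up the screen can be shown with a small Python sketch. It assumes the library is arrayed in a plate and pooled by row and by column, which is one common layout but not necessarily the one in Fig. 4.14; the plate size, clone names and the `has_target` stand-in for a PCR result are all hypothetical.

```python
def pooled_screen(plate, has_target):
    """Screen row pools and column pools instead of individual clones.

    plate: list of rows, each a list of clone names.
    has_target: function standing in for "this clone gives a PCR amplicon".
    Returns the candidate clones and the number of PCR reactions used.
    """
    n_rows, n_cols = len(plate), len(plate[0])
    # A pool is positive if any clone in it carries the target sequence
    row_pos = [r for r in range(n_rows) if any(has_target(c) for c in plate[r])]
    col_pos = [c for c in range(n_cols)
               if any(has_target(plate[r][c]) for r in range(n_rows))]
    # Candidates sit at the intersections of positive rows and positive columns
    candidates = [plate[r][c] for r in row_pos for c in col_pos]
    return candidates, n_rows + n_cols

plate = [[f"{row}{col}" for col in range(1, 13)] for row in "ABCDEFGH"]
positives = {"C7"}   # hypothetical clone that overlaps the starting clone
candidates, n_reactions = pooled_screen(plate, lambda clone: clone in positives)
```

For an 8 × 12 plate this takes 20 reactions rather than 96. When more than one clone is positive, the intersections can include false candidates, so those would need to be confirmed individually.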

3. CLONE FINGERPRINTING

To identify overlapping clones: by finding features that they share

- restriction profile fingerprint (Fig. 4.15A)

- or clones having STS in common (Fig. 4.15D)
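A minimal Python sketch of comparing restriction fingerprints follows. The fragment sizes, the 2% size tolerance and the function name are assumptions for illustration; real fingerprinting software uses more careful statistics to decide whether a number of shared bands is significant.

```python
def shared_fragments(sizes_a, sizes_b, tolerance=0.02):
    """Count fragments of clone A that match a fragment of clone B.

    Two fragments match if their sizes (bp) differ by no more than
    `tolerance` (relative), to allow for gel measurement error.
    Each fragment of B can be matched only once.
    """
    remaining = sorted(sizes_b)
    shared = 0
    for a in sorted(sizes_a):
        for b in remaining:
            if abs(a - b) <= tolerance * max(a, b):
                remaining.remove(b)   # safe: we leave the inner loop immediately
                shared += 1
                break
    return shared

# Hypothetical restriction fragment sizes (bp) for three clones
clone1 = [5200, 3100, 1800, 900, 450]
clone2 = [3120, 1790, 900, 2600, 700]
clone3 = [4000, 2200, 1100]

overlap_12 = shared_fragments(clone1, clone2)
overlap_13 = shared_fragments(clone1, clone3)
```

Here clone1 and clone2 share three bands and are candidates for overlap, while clone1 and clone3 share none. STS content can be compared the same way, with a set intersection of the STS markers detected in each clone.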

Fig. 4.10

Haemophilus genome project 1995 (1.8 Mbp)

1. DNA sonicated, fragments (1.6 – 2 kb) cloned in plasmid vectors

2. Shotgun sequencing of insert ends: ~ 20,000 clones analyzed, 11 Mbp of sequence; scaffolds with sequencing gaps & physical gaps

3. Assembled into 140 contigs

4. Screened for overlapping clones – reduced to 42 contigs

5. Assumed gaps represented genome regions unstable in plasmid vector - switched to lambda vector

6. Probed library with oligomers from contig ends or used PCR with primer pairs from contig ends
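The numbers on this slide give a rough feel for why gaps remain after shotgun sequencing. The sketch below computes the fold coverage from the slide's figures and then applies the Lander-Waterman style idealisation that reads fall uniformly at random, so a given base is missed with probability e^-c. That model is an assumption added here, not something stated on the slide, and it ignores cloning bias (which step 5 suggests was the real problem).

```python
import math

genome_bp = 1.8e6       # Haemophilus genome size, from the slide
sequenced_bp = 11e6     # total shotgun sequence, from the slide

# Fold coverage: how many times, on average, each base was sequenced
coverage = sequenced_bp / genome_bp

# Idealised model (assumption): reads land uniformly at random,
# so the chance that a given base is never sequenced is e^-coverage
fraction_missed = math.exp(-coverage)
expected_missed_bp = fraction_missed * genome_bp

print(f"coverage ~ {coverage:.1f}x, expected unsequenced ~ {expected_missed_bp:.0f} bp")
```

Under these assumptions the coverage is roughly 6-fold and only a few kb would be expected to be missed by chance, which is consistent with the slide's point that the remaining physical gaps needed a different vector rather than simply more reads.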

“Cost per Megabase of DNA Sequence (or Why biologists panic about computing)”

“Next generation” sequencing technologies

National Human Genome Research Institute

- major challenge to correctly assemble the massive amount of sequence data generated … and to interpret it!

Genome Res 11:3, 2001

1. Pyrosequencing

- one dNTP is added at a time + enzyme (apyrase) that degrades dNTP if not incorporated into new strand, then next dNTP added

- incorporation detected by chemiluminescence triggered by the released pyrophosphate (PPi)

Fig. 4.9
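The one-dNTP-at-a-time cycle can be sketched in Python as below. This is a toy model: the template, the flow order "TACG" and the function names are assumptions for illustration, the light signal is taken to be exactly proportional to the number of bases incorporated, and unincorporated dNTP is assumed to be fully removed (the apyrase step) before the next flow.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def flowgram(template_3to5, flow_order="TACG", max_cycles=50):
    """Return a list of (dNTP, signal) pairs, one per flow.

    signal = number of bases incorporated in that flow; a homopolymer run
    gives a proportionally larger signal, and 0 means no incorporation.
    """
    to_add = [COMPLEMENT[t] for t in template_3to5]   # bases the polymerase must add, in order
    pos, flows = 0, []
    for _ in range(max_cycles):
        for nt in flow_order:
            n = 0
            while pos < len(to_add) and to_add[pos] == nt:
                n += 1
                pos += 1
            flows.append((nt, n))
            if pos == len(to_add):
                return flows
    return flows

def call_bases(flows):
    """Convert the flowgram back into the sequence of the new strand."""
    return "".join(nt * n for nt, n in flows)

flows = flowgram("CCGTAAC")     # hypothetical template, written 3'->5'
read = call_bases(flows)
```

In this example the first G flow gives a signal of 2 because two G's are added in a row. Telling a run of, say, 7 from a run of 8 by signal height alone is the kind of call that is hard to make in practice.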

www.youtube.com/watch?v=kYAGFrbGl6E&feature=related

Medini Nat Rev Microbiol. 6:419, 2008

“Massively-parallel” pyrosequencing (on beads or chips)

454 technology

- DNA sheared, adaptors ligated, attached to bead & PCR amplified

- beads captured in wells & pyrosequencing carried out in parallel on each DNA fragment

- average read of ~ 700 (?) bp

... but “up to 1.6 million reactions can be carried out in parallel on a 6.4 cm² slide”

“expect ~ 500 million nucleotides of sequence data per 10 hour run” (July 2010)

[Figure panels: Sample preparation (genomic DNA, PCR); Pyrosequencing (enzymes on beads and primer, polymerase, PPi, light)]

2. Illumina sequencing (parallel microchip)

SOLEXA technology

- add adaptors to sheared DNA, attach to chip, then PCR “bridge amplification”

- denature clusters of ~ 1000 copies of DNA molecules & sequential sequencing using four fluorophore-labelled nts

- average read of ~ 40-100 bp (short-read)

[Figure panels: Sample preparation; Sequencing by synthesis]

Medini Nat Rev Microbiol. 6:419, 2008

www.youtube.com/watch?v=HtuUFUnYB9Y&feature=related

                          HiSeq 2000     HiSeq 1000
Output (2 × 100 bp)       600 Gb         300 Gb
Run Time (2 × 100 bp)     ~11 days       ~8.5 days
Paired-end Reads          6 Billion      3 Billion
Single Reads              3 Billion      1.5 Billion
Maximum Read Length**     2 × 100 bp     2 × 100 bp
Bases Above Q30***        > 85% (2 × 50 bp); > 80% (2 × 100 bp)

(Illumina website Sept. 2012)
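Two of the table entries can be unpacked with a little arithmetic. The sketch below assumes the standard Phred definition of a quality score, Q = -10 log10(error probability), which the slide does not spell out, and checks that the HiSeq 2000 output figure is consistent with the read count and read length given in the table. Function and variable names are illustrative.

```python
import math

def phred_to_error_prob(q):
    """Error probability for a Phred quality score (Q = -10 * log10(p))."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Phred quality score for a given per-base error probability."""
    return -10 * math.log10(p)

# "Bases above Q30" means bases with at most a 1-in-1000 chance of being wrong
q30_error = phred_to_error_prob(30)

# Consistency check on the HiSeq 2000 column: reads x read length = output
paired_end_reads = 6e9
read_length_bp = 100
output_gb = paired_end_reads * read_length_bp / 1e9

print(f"Q30 error probability: {q30_error:g}")
print(f"HiSeq 2000 output: {output_gb:g} Gb")
```

On the same scale, an accuracy of 1 error per 1 million bases corresponds to Q60, which shows how much stricter that target is than the per-base Q30 threshold quoted here.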

3. Single molecule real-time sequencing (Helicos, Pacific Biosciences)

Metzker Nature Reviews Genetics 11:31, 2010

- continuous monitoring of nt incorporation (rather than termination as in Sanger method…) and no amplification

- formation of phosphodiester bond releases fluorophore

- nanoscale wells on chip so ~ one DNA polymerase molecule per well

- read length 25 to 55 bases, 21-35 Gigabases per run (Helicos website Sept. 2012)

Chin et al. New Eng J Med 364:33, 2011

Press release, Dec 9, 2010: “PacBio & Harvard Use Fast Gene Sequencer to Crack DNA Code of Haitian Cholera Strain”

H1 and H2 strains were sequenced in < 24 hr with enough “reads” to cover the genomes 60 and 32 times, respectively.
