Multi-Centre Genome Sequencing and Analysis of the Dutch Elm Disease Fungus using the Roche/454 Titanium System

Ken Dewar and Vince Forgetta
Department of Human Genetics, McGill University
McGill University and Genome Quebec Innovation Centre
[email protected]


Acknowledgements

Study Design and Leadership
ABRF DSRG:
Chair (outgoing): Jan Kieleczawa, Pfizer Research
Chair (incoming): Michael Zianni, Ohio State University
Robert Steen, Harvard Medical School
Deborah Grove, Penn State University
Anoja Perera, Stowers Institute for Medical Research
Robert Lyons Jr., University of Michigan
Sushmita Singh, University of Minnesota
Doug Bintzler, University of Cincinnati
Scottie Adams, Trudeau Institute
Ken Dewar, McGill University

Sequencing Labs
Deborah Grove, Gregory Grove, Penn State University
Robert Lyons Jr., Suzanne Genik, University of Michigan
Chris Wright, Alvaro Hernandez, Sharon Bachman, Lorie Hetrick, U. Illinois at Urbana-Champaign
Sushmita Singh, Nichole Peterson, University of Minnesota
Vince Forgetta, Gary Leveque, Joana Dias, McGill University

Roche 454
Clotilde Teiling, Tim Harkins


Rationale of the Study

To evaluate the reproducibility of Roche/454 instrumentation, protocols, and reagents.

Experiment
1) Single DNA sample → ssDNA (MID) fragment library
2) Aliquots distributed to each of 5 centers
3) Common protocol followed for sequencing
4) Data returned for centralized analysis

Evaluation
1) Sequence yield (Mb, reads, read lengths)
2) Sequence accuracy
3) Effects on assembly


Choice of sample

Experimental considerations
5 sites performing minimum ½ run
Target output = 5 * 250 Mb = 1.25 Gb
>300X coverage of a typical 4 Mb bacterium
<0.5X coverage of a typical 3 Gb mammal

Sequence a known genome (easier analysis) versus sequence a new genome (contribution to new knowledge)

Fungal genome, the fungus responsible for Dutch elm disease
Pros: 30-50 Mb genome (25-40X coverage)
Cons: AT or GC skew? Repetitive elements?
Bad genome or bad assembly?
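The coverage figures on this slide follow from dividing the target output by the genome size. A minimal sketch of that arithmetic, assuming 1 Mb = 1,000,000 bases; the function name and the dictionary labels are illustrative and not from the slides:

```python
# Expected fold coverage = total sequenced bases / genome size.
MB = 1_000_000  # assumption: 1 Mb = 1,000,000 bases


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Return the average fold coverage for a given yield and genome size."""
    return total_bases / genome_size


# 5 sites x 250 Mb per half run = 1.25 Gb target output.
target_output = 5 * 250 * MB

# Genome sizes quoted on the slide (labels are illustrative).
genomes = {
    "bacterium (4 Mb)": 4 * MB,
    "fungus, low estimate (30 Mb)": 30 * MB,
    "fungus, high estimate (50 Mb)": 50 * MB,
    "mammal (3 Gb)": 3000 * MB,
}

coverage = {name: fold_coverage(target_output, size) for name, size in genomes.items()}
for name, cov in coverage.items():
    print(f"{name}: {cov:.1f}X")
```

The fungal estimates bracket the 25-40X range quoted above, while the bacterial and mammalian cases land on either side of what is useful for a de novo assembly test.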


Location of Participating Sequencing Cores

Site  Size of Team  Team Experience  Instrument Runs
A     2             22               22
B     5             172              163
C     3             128              128
D     2             5                5
E     7             345              197

The participating sites encompassed a variety of levels of experience.


The Ideal Titanium Run

A run is advertised as ~1.2 million reads @ ~420 nt = >500 Mb
(½ run = 600K reads and 250 Mb)

Random sequence
200 cycles
600K reads (½ run)
310 Mb
516 nt read length
99% reads between 472-560 nt

The more skew from random (i.e. higher/lower GC), the more the length should increase (more homopolymers).
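The yield figures above are the product of read count and read length. A short sketch of that check, using only the numbers quoted on the slide; the helper name is illustrative:

```python
def yield_mb(reads: int, mean_length_nt: float) -> float:
    """Return sequence yield in Mb for a read count and mean read length."""
    return reads * mean_length_nt / 1e6


# Advertised full run: ~1.2 million reads at ~420 nt.
advertised_full_run = yield_mb(1_200_000, 420)

# Ideal half run on random sequence: 600K reads at 516 nt.
ideal_half_run = yield_mb(600_000, 516)

print(f"advertised full run: {advertised_full_run:.1f} Mb")
print(f"ideal half run:      {ideal_half_run:.1f} Mb")
```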


The Actual Titanium Runs

All sites used an aliquot from a single ssDNA library.
Not all sites used the same lots of reagents.

Site  Reads (K)  Mb
A     602        239
B     626        243
C     714        228
D     743        308
E     547        215

C: long shoulder of short reads

No site met the “standard” 600K reads and 250 Mb (although several were very close, considering we trimmed MIDs).
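Dividing each site's yield by its read count gives a mean read length, which makes the short-read site easy to spot. A minimal sketch, assuming the Reads column is in thousands (as the 600K standard implies); the variable names are illustrative:

```python
# (reads in thousands, yield in Mb) per site, copied from the table above.
runs = {
    "A": (602, 239),
    "B": (626, 243),
    "C": (714, 228),
    "D": (743, 308),
    "E": (547, 215),
}

# Mean read length in nt = total bases / number of reads.
mean_length = {
    site: mb * 1e6 / (reads_k * 1e3) for site, (reads_k, mb) in runs.items()
}

# Site with the shortest mean read length.
shortest = min(mean_length, key=mean_length.get)

for site, length in mean_length.items():
    print(f"Site {site}: {length:.0f} nt")
print(f"Shortest mean read length: site {shortest}")
```

Site C has the most reads after D but the lowest mean length, which is consistent with the long shoulder of short reads noted for that run.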


Fluidics problems cause short reads


This is what the results looked like.

This is what the lab looked like.


The Actual Titanium Runs

Site  Reads (K)  Mb
A     602        239
B     626        243
C     714        228
D     743        308
E     547        215

Site  Reads (K)  Mb
A     602        239
B     626        243
C     791        297
D     743        308
E     547        215


Ready for Genome Assembly

All sites attempted ½ run under control conditions, and a second ½ run under conditions of their choice.

In total, >6.4 million reads, >2.53 Gb generated.

78.2% of the data in reads >400 nt


Dutch elm disease is a vascular wilt.


DED has killed >1 billion elms in <100 years.


Dutch elm disease

Host: elm trees (varying susceptibility)
Pathogen: fungus (varying pathogenicity)
Vector: beetle

[Figure labels: 1920’s, 1960’s, 1990]

Multiple invasions
3 sub-species
Ophiostoma ulmi, novo-ulmi, and himal-ulmi


Ophiostoma novo-ulmi genetics and genomics

ascomycete fungus (Neurospora, Saccharomyces)
mycelial or budding cell growth
vegetatively haploid
can be cultured and crossed
pathogenicity under polygenic control

genetic linkage maps
chromosomes separated by PFGE

Estimated genome 25-35 Mb
>6 linear chromosomes


Genome Sequencing Pilot

Single run performed for QC/QA before samples sent to other sites

~7X coverage and assembly

31 Mb assembly
contigs >50 Kb
low repeat content

[Figure labels: mtDNA, 18S rDNA, nuclear genome]


Genome Sequencing Pilot

In addition to the ssDNA library sent to all sites:
Generated an 8 Kb paired-end library
½ run sequenced at a single site

Total reads                     181,162
Reads >50nt-adaptor->50nt       78,026
Unique >50nt-adaptor->50nt      61,209
Est. paired-end distance        7.5 Kb
Est. genome template coverage   15X
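The template coverage estimate can be reproduced by multiplying the number of unique read pairs by the paired-end distance and dividing by the genome size. A minimal sketch, assuming the 31 Mb pilot assembly size as the genome size (the slides do not state which size was used):

```python
# Values from the paired-end library summary above.
unique_pairs = 61_209
paired_end_distance = 7_500  # 7.5 Kb, in bases

# Assumption: genome size taken from the 31 Mb pilot assembly.
genome_size = 31_000_000

# Template (physical) coverage = bases spanned by all pairs / genome size.
template_coverage = unique_pairs * paired_end_distance / genome_size

print(f"Estimated template coverage: {template_coverage:.1f}X")
```

Template coverage counts the whole span between the two reads of a pair, not only the sequenced bases, which is why a modest number of pairs still spans the genome many times over.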


Genome Assembly Statistics

re-analyse PE to place contigs in scaffolds
re-assemble high coverage contigs

              WGS          WGS/PE   WGS/PE/analysis
Runs
  Reads       6.4 million  6.6      6.6
  Bases       2.53 Gb      2.6      2.6
Assembly
  Size        31.7 Mb      31.8     31.8
  Coverage    80X          81X      81X
  Contigs     252          160      160
  N50 contig  262 Kb       368      368
  Scaffolds   0 (+252)     8 (+9)   8 (+1)

In this assembly we estimate 151 gaps
22 >1 kb
85 between 400 bp and 1 kb
44 <400 bp

DiGuistini et al., 2009, Genome Biology
Sequencing the genome of Grosmannia clavigera
Combination of
GS20/FLX ~8.3X
Illumina 42 bp PE >50X
Fosmid ends >18,000 reads

Assembly
3295 contigs N50=32 Kb
163 scaffolds N50=558 Kb
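The N50 values quoted for both assemblies use the usual definition: the length of the contig (or scaffold) at which the running total, taken from longest to shortest, first reaches half of the total assembled length. A minimal sketch of that calculation; the contig lengths in the example are made up for illustration and are not from either assembly:

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length reaches half the total.
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one length")


# Hypothetical contig lengths (in Kb), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)
print(f"N50 = {example_n50} Kb")
```

A higher N50 with a similar total size means the same sequence sits in fewer, longer pieces, which is the comparison the table above is drawing.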


Genome Assembly Quality Assessment
What proportion of reads accounted for in the assembly?

Total Reads           6,425,438
Genome (incl. rDNA)   5,670,075   88.20%
mtDNA                 633,519     9.90%
unassembled           121,844     1.90%


Genome Assembly Quality Assessment
How do scaffolds compare to chromosomes?


Genome Assembly Quality Assessment
Does the assembly contain known genes?

3309 EST sequences available
3277 (99%) align to the assembly


Genome Assembly Quality Assessment
Does the mtDNA assembly make sense?

mtDNA: 66 kb, circular (confirmed by reads and PE)


Genome Assembly Quality Assessment
What’s up with the single contig not in a scaffold?

Smallest scaffold is 2.53 Mb
One non-scaffolded contig of 4.1 kb.
dsDNA plasmid


Site Performance versus Assembly
Deviations in genome coverage?

99.9% of all assembled bases are present in the read set of each site.


Site Performance versus Assembly
Deviations in homopolymer counting?

Length  Number   Bases
4       294,261  1,177,044
5       79,130   395,650
6       20,974   125,844
7       8,090    56,630
8       3,381    27,048
9       1,654    14,886
10      825      8,250

Site  Reads (K)  Mb
A     602        239
B     626        243
C     791        297
D     743        308
E     547        215

Site  Size of Team  Team Experience  Instrument Runs
A     2             22               22
B     5             172              163
C     3             128              128
D     2             5                5
E     7             345              197
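In the homopolymer table on this slide, the Bases column is Length times Number. A table of that shape can be built by scanning a sequence for runs of identical bases. The sketch below is one way to do it and is not the authors' pipeline; the function name, the minimum length of 4 (chosen to match the table) and the toy sequence are assumptions:

```python
from collections import Counter
from itertools import groupby


def homopolymer_table(sequence, min_length=4):
    """Map run length -> (number of runs, bases in those runs)."""
    counts = Counter()
    # groupby yields one group per maximal run of identical characters.
    for _base, run in groupby(sequence.upper()):
        run_length = sum(1 for _ in run)
        if run_length >= min_length:
            counts[run_length] += 1
    return {
        length: (number, number * length)
        for length, number in sorted(counts.items())
    }


# Toy sequence for illustration only: runs of A x4, T x5, C x4 and A x6.
toy = "ACGT" + "A" * 4 + "T" * 5 + "GG" + "C" * 4 + "AG" + "A" * 6 + "T"
table = homopolymer_table(toy)

print("Length  Number  Bases")
for length, (number, bases) in table.items():
    print(f"{length:<7} {number:<7} {bases}")
```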


Site Performance versus Assembly
Deviations in substitution error rates?

[Figure: substitution rates by site (A-E), vertical axis 0 to 0.0025; series A>C, A>G, A>T, C>A, C>G, C>T, G>A, G>C, G>T, T>A, T>C, T>G]

Across the entirety of the data, a read has a non-homopolymer associated change at a rate of 1/430 to 1/450.

Substitution errors are slightly biased, C>T four times more common than anything else.
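Per-type substitution counts like those in the figure can be tallied from pairwise alignments of reads against the assembly. The sketch below is illustrative only and is not the authors' method: it assumes gapped alignment strings of equal length with "-" marking indels, labels each change as reference>read, and does not filter out homopolymer-associated changes as the slide's figures do. The toy alignment is made up.

```python
from collections import Counter


def substitution_counts(ref_aln, read_aln):
    """Count substitutions by type over the aligned (non-gap) columns.

    Returns (Counter of 'X>Y' labels, number of compared columns).
    """
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have equal length")
    counts = Counter()
    compared = 0
    for ref_base, read_base in zip(ref_aln.upper(), read_aln.upper()):
        # Skip insertion/deletion columns; only substitutions are counted.
        if ref_base == "-" or read_base == "-":
            continue
        compared += 1
        if ref_base != read_base:
            counts[f"{ref_base}>{read_base}"] += 1
    return counts, compared


# Hypothetical alignment for illustration only.
ref_toy = "ACGTCCGTAC-GT"
read_toy = "ACGTTCGTACAGA"
counts, compared = substitution_counts(ref_toy, read_toy)
rate = sum(counts.values()) / compared

print(dict(counts), f"rate = 1/{1 / rate:.0f}")
```

Running the same tally separately on each site's reads gives one rate per site, which is the comparison the figure makes.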


Summary and Conclusions

1. Roche/454 instrumentation, reagents and protocols are robust and reproducible across sites.
The fluidic systems need to be in top shape for best results.

2. A combination of Titanium reads and 8 kb paired ends has led to a very high quality assembly of the O. novo-ulmi genome.
It’s time for the O. ulmi biologists to get back to work… we’ve done our part.

Thanks ABRF and Roche/454