ECCMID Meet-the-Expert

What bioinformatic tools should I use for analysis of high-throughput sequencing data for molecular diagnostics? Nick Loman

Page 1: Eccmid meet the-expert

What bioinformatic tools should I use for analysis of high-throughput sequencing data for molecular diagnostics?

Nick Loman

Page 2: Eccmid meet the-expert

• Read QC: FastQC, Qualimap, Kraken, BLAST

• Adaptor/quality trimming: Trimmomatic

• Reference-based approach

– Alignment: BWA

– Variant calling: Samtools/VarScan, GATK

– SNP extraction: Python script, Snippy

– Recombination filtering: Gubbins

– Tree building: FastTree, RAxML

– MLST/Antibiogram: SRST2

• De novo approach

– Assembly: SPAdes

– Whole-genome alignment: Mauve, Parsnp

– Annotation: Prokka

– MLST/Antibiogram: mlst, abricate

– Tree building: Harvest

– Pan-genome: LS-BSR

• Population genomics: BIGSdb, PHYLOViZ

Page 3: Eccmid meet the-expert

Quality Control: Questions to Ask

• Did my sequencing work?

• What are the fragment lengths?

• Is my sample what I think it is?

• Is my sample contaminated?

Page 4: Eccmid meet the-expert

Did my sequencing work?

• FastQC:
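
To make the "did my sequencing work?" question concrete, here is a minimal sketch of one of the checks a tool like FastQC reports: mean base quality at each read position. It is not a substitute for FastQC. It assumes plain four-line FASTQ records with Phred+33 quality encoding, and the function names are hypothetical.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of input (or a blank line)
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_quality_per_position(handle, offset=33):
    """Return the mean Phred score at each read position.

    Assumption: qualities are Phred+33 encoded (offset=33).
    """
    totals, counts = [], []
    for _name, _seq, qual in read_fastq(handle):
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach position i
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Two toy reads: 'I' encodes Q40 and '5' encodes Q20 under Phred+33.
example = StringIO("@r1\nACGT\n+\nIIII\n@r2\nACG\n+\nII5\n")
print(mean_quality_per_position(example))  # [40.0, 40.0, 30.0, 40.0]
```

A steady drop in mean quality towards the 3' end is normal; a collapse early in the read suggests the run did not work well.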

Page 5: Eccmid meet the-expert

What are the fragment lengths?

• Qualimap (or just BWA)

– Bad: fragment length < read length

– OK: fragment length > read length

– Good: fragment length > 2x read length

Will affect: genome coverage, de novo assembly performance, alignment performance
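
The bad/OK/good rule of thumb above can be sketched as a small function. This assumes you already have a list of insert sizes (for example, taken from an aligner's output) and summarises them by their median; the function names are hypothetical, and treating a fragment exactly equal to the read length as "ok" is an assumption, since the slide does not cover that case.

```python
from statistics import median


def classify_fragment_length(fragment_length, read_length):
    """Apply the slide's rule of thumb to a single fragment length."""
    if fragment_length < read_length:
        return "bad"
    if fragment_length > 2 * read_length:
        return "good"
    return "ok"  # between 1x and 2x read length (boundary handling is an assumption)


def classify_library(insert_sizes, read_length):
    """Classify a library by its median insert size."""
    return classify_fragment_length(median(insert_sizes), read_length)


# Toy insert sizes for a 250 bp read length; the median here is 240.
print(classify_library([180, 220, 240, 260, 300], read_length=250))  # bad
```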

Page 6: Eccmid meet the-expert

Is my sample what I think it is?

• BLASTing a few reads is usually very efficient
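
One way to pick "a few reads" is to draw a small random sample from the FASTQ file and write it out as FASTA, which can then be pasted into a BLAST search. The sketch below uses reservoir sampling so the file is read only once. It assumes four-line FASTQ records; the function name and default sample size are hypothetical.

```python
import random
from io import StringIO


def sample_reads_as_fasta(handle, n=10, seed=0):
    """Return up to n randomly chosen reads from a FASTQ handle as FASTA text."""
    rng = random.Random(seed)
    reservoir = []
    index = 0  # zero-based index of the current record
    while True:
        header = handle.readline().rstrip()
        if not header:
            break
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line, not needed for BLAST
        record = (header[1:].split()[0], seq)
        if len(reservoir) < n:
            reservoir.append(record)
        else:
            # Keep each record with probability n / (index + 1).
            j = rng.randint(0, index)
            if j < n:
                reservoir[j] = record
        index += 1
    return "".join(f">{name}\n{seq}\n" for name, seq in reservoir)


example = StringIO("@r1 extra\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
print(sample_reads_as_fasta(example, n=10), end="")
```

With fewer reads than `n`, every read is returned in its original order.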

Page 7: Eccmid meet the-expert

Is my sample contaminated?

Page 8: Eccmid meet the-expert

Adaptor trim reads

• With Nextera libraries, failing to adaptor trim will KILL your assemblies.

• Particularly important when mean fragment length < read length.

• Many trimmers available: I like to use Trimmomatic

For more explanation: http://nickloman.github.io/high-throughput%20sequencing/genomics/bioinformatics/2013/04/17/adaptor-trim-or-die-experiences-with-nextera-libraries/
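
A quick way to see whether adaptor read-through is a problem is to count how many reads contain the start of the adaptor. The sketch below does this with exact substring matching, so it undercounts reads where the adaptor has sequencing errors or is cut off at the read end; it is a diagnostic, not a trimmer. The adaptor sequence shown is the one commonly given for Nextera libraries and is an assumption here, so check it against your kit documentation. The function name and `min_match` parameter are hypothetical.

```python
# Assumption: commonly cited Nextera transposase adaptor sequence; verify for your kit.
NEXTERA_ADAPTER = "CTGTCTCTTATACACATCT"


def fraction_with_adapter(reads, adapter=NEXTERA_ADAPTER, min_match=12):
    """Return the fraction of reads containing the first min_match bases of the adaptor."""
    probe = adapter[:min_match]
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for read in reads if probe in read.upper())
    return hits / len(reads)


reads = [
    "ACGTACGT" + NEXTERA_ADAPTER + "AAAA",  # fragment shorter than the read
    "ACGTACGTACGTACGTACGTACGTACGT",         # no adaptor present
]
print(fraction_with_adapter(reads))  # 0.5
```

A high fraction suggests many fragments are shorter than the read length, which is the situation where trimming matters most.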

Page 10: Eccmid meet the-expert

Reference-based or de novo?

Page 11: Eccmid meet the-expert

Reference-based or de novo?

• Reference-based

– Implies ALIGNMENT to reference

– Implies you HAVE a reference

– Allows exquisitely sensitive and specific SNP calling (forensic SNP calling to single mutation precision)

– Important for looking at CHAINS OF TRANSMISSION

– Can only call in parts of the genome COMMON between your SAMPLES and REFERENCE
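
The last point can be illustrated with a toy SNP extraction step. Given sequences already aligned to the reference, the sketch below reports only columns where every sample has an unambiguous base, so positions missing from any sample (gaps or N) are skipped. It is a simplified illustration, not the script used in the workflow; the function name and input layout (a dict of equal-length aligned strings) are assumptions.

```python
def core_snp_sites(aligned, valid="ACGT"):
    """Return (position, column) for variable sites present in all sequences.

    aligned: dict mapping sample name to an aligned sequence; all the same length.
    Positions are zero-based. Columns containing gaps or ambiguous bases are skipped.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(aligned[name]) != length for name in names):
        raise ValueError("all aligned sequences must have the same length")
    sites = []
    for pos in range(length):
        column = [aligned[name][pos].upper() for name in names]
        if any(base not in valid for base in column):
            continue  # not part of the genome common to all samples
        if len(set(column)) > 1:
            sites.append((pos, "".join(column)))
    return sites


alignment = {
    "reference": "ACGTACGT",
    "sample1":   "ACGTACGA",
    "sample2":   "ACCTA-GT",
}
print(core_snp_sites(alignment))  # [(2, 'GGC'), (7, 'TAT')]
```

Position 5 is dropped because sample2 has a gap there, even though the other two sequences are covered.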

Page 12: Eccmid meet the-expert

Reference-based or de novo?

• De novo

– Implies de novo assembly

– Does NOT require a reference

– Gives access to the entire PAN-genome

– E.g. unexpected antibiotic resistance genes, virulence factors

– Can give misleading results in REPEAT sequences

– Not suitable for very fine-resolution SNP analysis

Page 13: Eccmid meet the-expert

In practice

• Most people will want to do both.

• And if you have no reference, you can use a draft de novo assembly AS your reference.

Page 14: Eccmid meet the-expert

Acknowledgements

• Twitter comments:

– Tom Connor, Alan McNally, Torsten Seemann, C. Titus Brown, Heng Li, Christoffer Flensburg, Matt MacManes, Rachel Glover, Willem van Schaik