ECCMID Meet-the-Expert
What bioinformatic tools should I use for analysis of high-throughput sequencing data
for molecular diagnostics?
Nick Loman
Workflow overview (tools named on the slide, grouped by step):
• Read QC: FastQC, Qualimap, Kraken, BLAST!
• Adaptor/quality trimming: Trimmomatic
• Reference-based approach
– Alignment: BWA
– Variant calling: Samtools/VarScan, GATK
– SNP extraction: Python script! Snippy
– Recombination filtering: Gubbins
– Tree building: FastTree, RAxML
• De novo approach
– Assembly: SPAdes
– Whole-genome alignment: Mauve, Parsnp
– Tree building: Harvest
– Annotation: Prokka
– Pan-genome: LS-BSR
• MLST/Antibiogram: mlst, abricate, SRST2
• Population genomics: BIGSdb, PHYLOViZ
Quality Control: Questions to Ask
• Did my sequencing work?
• What are the fragment lengths?
• Is my sample what I think it is?
• Is my sample contaminated?
Did my sequencing work?
• FastQC
What are the fragment lengths?
• Qualimap (or just BWA)
• Bad: fragment length < read length
• OK: fragment length > read length
• Good: fragment length > 2x read length
Will affect: genome coverage, de novo assembly performance, alignment performance
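The three categories above can be written as a small check. This is only a sketch: the function name is hypothetical, the inputs are assumed to be a mean fragment (insert) length and a read length in bases, and the handling of the exact boundaries (fragment length equal to 1x or 2x the read length) is an assumption, since the slide does not say.

```python
def classify_fragment_length(mean_fragment_length, read_length):
    """Rate a library by comparing fragment length with read length.

    Thresholds follow the slide: < read length is bad, > read length is OK,
    > 2x read length is good. Boundary values are treated as "ok" here,
    which is an assumption.
    """
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    if mean_fragment_length < read_length:
        return "bad"
    if mean_fragment_length > 2 * read_length:
        return "good"
    return "ok"


# Example with 250-base reads (illustrative numbers, not real data).
for fragment in (200, 400, 600):
    print(fragment, classify_fragment_length(fragment, 250))
```

In practice the mean fragment length would come from the insert-size estimate reported after aligning the reads, for example by Qualimap.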
Is my sample what I think it is?
• BLASTing a few reads is usually very efficient
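One way to get "a few reads" ready for BLAST is to pull a small random sample out of a FASTQ file and write it as FASTA. The sketch below assumes plain four-line FASTQ records; the function names are hypothetical, and the resulting FASTA text would be pasted into a BLAST search by hand.

```python
import random


def fastq_records(lines):
    """Yield (read_name, sequence) pairs from four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"record at line {i + 1} does not start with '@'")
        # Keep only the read name, dropping any description after a space.
        yield header[1:].split()[0], sequence


def sample_reads_as_fasta(lines, n=10, seed=0):
    """Return up to n randomly chosen reads as FASTA-formatted text."""
    records = list(fastq_records(lines))
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    chosen = rng.sample(records, min(n, len(records)))
    return "".join(f">{name}\n{sequence}\n" for name, sequence in chosen)


# Usage (file name is a placeholder):
# with open("reads.fastq") as handle:
#     print(sample_reads_as_fasta(handle, n=5))
```

Loading every record into memory is fine for a quick check on a subsample, but for a full run it would be better to stream the file.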
Is my sample contaminated?
Adaptor trim reads
• With Nextera libraries, failing to adaptor trim will KILL your assemblies.
• Particularly important when mean fragment length < read length.
• Many trimmers available: I like to use Trimmomatic
For more explanation: http://nickloman.github.io/high-throughput%20sequencing/genomics/bioinformatics/2013/04/17/adaptor-trim-or-die-experiences-with-nextera-libraries/
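To see why short fragments matter: when the fragment is shorter than the read, the sequencer runs off the end of the insert and into the adaptor, so adaptor sequence ends up at the 3' end of the read. The sketch below illustrates the idea with exact matching only. It is not how Trimmomatic works (real trimmers tolerate mismatches and use quality scores and read pairing), the function name is hypothetical, and the adaptor string is the commonly documented Nextera transposase sequence, which should be checked against the kit actually used.

```python
# Commonly documented Nextera transposase adaptor; verify for your kit.
NEXTERA_ADAPTOR = "CTGTCTCTTATACACATCT"


def trim_adaptor(read, adaptor=NEXTERA_ADAPTOR, min_overlap=3):
    """Remove adaptor read-through from the 3' end of a read (exact match)."""
    # Case 1: the whole adaptor is inside the read; cut from where it starts.
    start = read.find(adaptor)
    if start != -1:
        return read[:start]
    # Case 2: the read ends partway through the adaptor; look for the
    # longest adaptor prefix (at least min_overlap bases) at the read end.
    max_k = min(len(adaptor) - 1, len(read))
    for k in range(max_k, min_overlap - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:len(read) - k]
    return read


insert = "ACGTACGTAC"  # made-up 10-base fragment
print(trim_adaptor(insert + NEXTERA_ADAPTOR + "GGG"))  # full adaptor present
print(trim_adaptor(insert + NEXTERA_ADAPTOR[:5]))      # partial adaptor at end
```

Untrimmed adaptor bases like these do not belong to the genome, which is why they cause trouble for an assembler.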
Reference-based or de novo?
• Reference-based
– Implies ALIGNMENT to a reference
– Implies you HAVE a reference
– Allows exquisitely sensitive and specific SNP calling (forensic SNP calling to single mutation precision)
– Important for looking at CHAINS OF TRANSMISSION
– Can only call in parts of the genome COMMON between your SAMPLES and REFERENCE
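The last point can be shown with a toy example. The sketch below assumes the sample has already been aligned and reduced to a consensus string in reference coordinates, with "N" or "-" wherever the sample has no confident base call. The function name and that input convention are assumptions for illustration; it is not a substitute for a real variant caller.

```python
def call_snps(reference, sample_consensus):
    """List (position, ref_base, sample_base) for positions called in both.

    Positions are 1-based. Anything other than A, C, G or T in either
    sequence (for example N or -) is skipped, because no call can be made
    where the sample and the reference do not share sequence.
    """
    if len(reference) != len(sample_consensus):
        raise ValueError("sequences must be in the same coordinates")
    snps = []
    pairs = zip(reference.upper(), sample_consensus.upper())
    for position, (ref_base, sample_base) in enumerate(pairs, start=1):
        if ref_base not in "ACGT" or sample_base not in "ACGT":
            continue  # not callable at this position
        if ref_base != sample_base:
            snps.append((position, ref_base, sample_base))
    return snps


# Made-up sequences: position 5 is uncovered in the sample, so it is skipped.
print(call_snps("ACGTACGT", "ACCTNCGA"))  # [(3, 'G', 'C'), (8, 'T', 'A')]
```

Counting such differences between pairs of isolates is the basic quantity used when looking at chains of transmission.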
Reference-based or de novo?
• De novo
– Implies de novo assembly
– Does NOT require a reference
– Gives access to the entire PAN-genome
– E.g. unexpected antibiotic resistance genes, virulence factors
– Can give misleading results in REPEAT sequences
– Not suitable for very fine-resolution SNP analysis
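As a rough illustration of what "access to the pan-genome" means, suppose each assembly has been annotated and reduced to a set of gene names. Core genes are then those found in every isolate, and accessory genes are everything else, which is where unexpected resistance genes or virulence factors would show up. The function name, isolate labels and gene assignments below are made up for the example, and real pan-genome tools cluster genes by sequence similarity rather than by name.

```python
def pan_genome(gene_sets):
    """Split genes into pan, core and accessory sets.

    gene_sets maps an isolate name to the set of genes found in its assembly.
    """
    per_isolate = [set(genes) for genes in gene_sets.values()]
    pan = set().union(*per_isolate)
    core = set.intersection(*per_isolate) if per_isolate else set()
    return {"pan": pan, "core": core, "accessory": pan - core}


# Hypothetical isolates with invented gene content.
isolates = {
    "isolate_A": {"gyrA", "rpoB", "blaCTX-M-15"},
    "isolate_B": {"gyrA", "rpoB"},
    "isolate_C": {"gyrA", "rpoB", "stx2"},
}
result = pan_genome(isolates)
print(sorted(result["core"]))       # ['gyrA', 'rpoB']
print(sorted(result["accessory"]))  # ['blaCTX-M-15', 'stx2']
```

A reference-based analysis against isolate_B alone would miss both accessory genes, since neither is present in that genome.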
In practice
• Most people will want to do both.
• And if you have no reference, you can use a draft de novo assembly AS your reference.
Acknowledgements
• Twitter comments:
– Tom Connor, Alan McNally, Torsten Seemann, C. Titus Brown, Heng Li, Christoffer Flensburg, Matt MacManes, Rachel Glover, Willem van Schaik