ASM workshop handout

ASM2015 Workshop 20: Typing of Bacterial Pathogens in 2015 May 30, 2015

SNP CALLING & OUTBREAK RECONSTRUCTION IN CLONAL PATHOGENS

Tips for successful sequencing

1. Choose the right platform and read length. For identifying SNPs in a monomorphic pathogen, mapping short reads against a reference genome works well. 150-300 bp paired-end reads on an Illumina platform are common. There may be cases in which you want a de novo assembly or want to cover long repetitive regions. Running a single isolate of interest on a long-read platform can give you the scaffold against which short reads can then be mapped.

2. Multiplex appropriately, aiming for at least 50x coverage, if not more.

3. When sequencing outbreaks, it can be helpful to include a few non-outbreak cases to ensure the sample labels did not get mixed up during the run.
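The 50x target in tip 2 comes down to simple arithmetic: mean coverage is roughly total sequenced bases divided by genome size. Here is a minimal Python sketch of that calculation. The function names and the run yield in the example are hypothetical, and the sketch assumes isolates are pooled evenly and every read maps, so in practice leave some headroom.

```python
def expected_coverage(read_pairs, read_length_bp, genome_size_bp):
    """Mean depth = total sequenced bases / genome size.

    Paired-end data contributes two reads per pair.
    """
    return 2 * read_pairs * read_length_bp / genome_size_bp


def max_samples_per_run(run_read_pairs, read_length_bp, genome_size_bp,
                        target_coverage=50):
    """Largest number of evenly pooled isolates that still reach the target depth."""
    pairs_needed = target_coverage * genome_size_bp / (2 * read_length_bp)
    return int(run_read_pairs // pairs_needed)


if __name__ == "__main__":
    # Hypothetical run: 12 million read pairs, 2 x 150 bp, 4.4 Mb genome.
    n = max_samples_per_run(12_000_000, 150, 4_400_000)
    print(n)  # 16
    print(round(expected_coverage(12_000_000 // n, 150, 4_400_000), 1))  # 51.1
```

Uneven pooling, duplicate reads, and contamination all eat into the real depth, which is one reason to aim above 50x rather than exactly at it.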

Bioinformatics advice (with thanks to Nick Loman)

1. First, a little quality control. FastQC can tell you if the reads look good. Trimmomatic can remove sequencing adaptors, but trimming for base quality isn’t generally needed. BLAST, Kraken, or MetaPhlAn will let you know if what you sequenced is what you think it should be.

2. Decide on your goal: mapping your reads to a reference (perfect for outbreak reconstruction, SNP calling, and building a phylogeny of isolates) or creating a de novo genome assembly (useful for trying to describe a genome in which there may be some interesting biology).

3. A good reference mapping pipeline would be BWA for alignment, SAMtools or VarScan to call SNPs and indels, a custom script for extracting and filtering SNPs, and then a phylogenetic tree from a SNP alignment using RAxML or BEAST. For non-clonal pathogens, a recombination filtering step could use Gubbins or ClonalFrame. There are tools that integrate several steps too, like Snippy or breseq.

4. A good reference mapping also depends on having a good reference: it should be closely related to your organism and of high quality. Early Sanger genomes work well.

5. If you are just interested in typing from your reads using an existing MLST scheme, or matching reads to a set of genes (say, antibiotic resistance genes from your organism of interest), use SRST2.

6. Quality SNPs will have decent coverage, will occur on both forward and reverse reads, won’t be near the ends of reads or contigs, and won’t be densely clumped. Inspecting your SNPs visually is a good practice.
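The "custom script for extracting and filtering SNPs" in point 3 can apply most of the criteria in point 6 directly. Here is a rough Python sketch, not a definitive implementation: the `SnpCall` record, its field names, and every threshold are assumptions made for illustration, and in a real pipeline the values would be parsed from your variant caller's output. Position within reads is not modelled here, only distance from contig ends.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpCall:
    # Hypothetical record; fill it from your variant caller's output.
    contig: str
    pos: int       # 1-based position on the contig
    depth: int     # reads covering the site
    alt_fwd: int   # variant-supporting reads on the forward strand
    alt_rev: int   # variant-supporting reads on the reverse strand


def filter_snps(calls, contig_lengths, min_depth=10, min_per_strand=2,
                end_margin=50, cluster_window=10):
    """Return the calls passing depth, strand, contig-end, and clumping checks."""
    kept = []
    for c in calls:
        if c.depth < min_depth:
            continue  # not enough coverage
        if c.alt_fwd < min_per_strand or c.alt_rev < min_per_strand:
            continue  # variant not seen on both strands
        if c.pos <= end_margin or c.pos > contig_lengths[c.contig] - end_margin:
            continue  # too close to a contig end
        kept.append(c)

    # Clumping: drop any SNP with a neighbour within cluster_window bp.
    kept.sort(key=lambda c: (c.contig, c.pos))
    result = []
    for i, c in enumerate(kept):
        near_prev = (i > 0 and kept[i - 1].contig == c.contig
                     and c.pos - kept[i - 1].pos <= cluster_window)
        near_next = (i + 1 < len(kept) and kept[i + 1].contig == c.contig
                     and kept[i + 1].pos - c.pos <= cluster_window)
        if not (near_prev or near_next):
            result.append(c)
    return result


if __name__ == "__main__":
    # Made-up calls, one per failure mode plus one good site.
    calls = [
        SnpCall("chr", 1000, 40, 18, 20),  # passes everything
        SnpCall("chr", 2000, 5, 2, 3),     # low depth
        SnpCall("chr", 3000, 40, 38, 0),   # forward strand only
        SnpCall("chr", 20, 40, 20, 19),    # near contig start
        SnpCall("chr", 5000, 40, 20, 20),  # clumped with the next call
        SnpCall("chr", 5004, 40, 19, 20),
    ]
    print([c.pos for c in filter_snps(calls, {"chr": 10_000})])  # [1000]
```

The clumping check here runs only on calls that already passed the per-site checks. Running it on all raw calls is stricter and also defensible. Either way, a script like this does not replace looking at the surviving SNPs in a viewer.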

Interpreting genomic data through the epidemiological lens

1. Nothing in biology makes sense except in light of evolution.

2. Identical sequences don’t necessarily imply transmission from person to person - there could be exposure from a common source, or within-host genetic diversity might make it difficult to identify an infector.

3. Sometimes it’s easier to rule out transmission than rule it in.

4. Information to consider besides shared contact includes: date of symptom onset, date of diagnosis, date put on treatment, infectiousness, hospitalizations, and duration of infectious period.
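Points 2 and 3 can be turned into a simple first pass: compute pairwise SNP distances from the SNP alignment and use them only to rule pairs out. The Python sketch below assumes a dictionary of equal-length SNP-alignment strings. The isolate names and sequences are made up, and the `rule_out_above` cut-off is a hypothetical parameter that has to be chosen per organism and setting. Pairs under the cut-off are labelled "not ruled out" rather than "linked", because a small distance is also what a common source would produce.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, missing="N-"):
    """Count differing sites, skipping any site that is missing in either isolate."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in missing and b not in missing
    )


def classify_pairs(alignment, rule_out_above):
    """Map each isolate pair to (distance, label); distance can only rule pairs out."""
    results = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(alignment.items()), 2):
        d = snp_distance(seq_a, seq_b)
        label = ("direct transmission unlikely" if d > rule_out_above
                 else "not ruled out")
        results[(name_a, name_b)] = (d, label)
    return results


if __name__ == "__main__":
    # Made-up SNP alignment for four isolates.
    alignment = {
        "A": "ACGTACGTAC",
        "B": "ACGTACGTAC",
        "C": "ACGTTCGNAC",
        "D": "TCGAACCTAG",
    }
    for pair, (d, label) in classify_pairs(alignment, rule_out_above=3).items():
        print(pair, d, label)
```

Pairs that are not ruled out still need the epidemiological data from point 4 (dates, infectious periods, hospitalizations) before anyone draws an arrow between two cases.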

By Jennifer Gardy, BC Centre for Disease Control