SnapperDB ASM NGS conference

SnapperDB: A Scalable Database for Routine Sequencing of Bacterial Isolates

Philip Ashton, Bioinformatician, Gastrointestinal Bacteria Reference Unit

Github + CLIMB image

• Download the code from https://github.com/PHE-GIDIS/SnapperDB.git

• Spin up an image (if you have a CLIMB account) http://birmingham.climb.ac.uk/ ashton-phe-snpdb-client

• And, if you have no idea what I mean by spin up an image:
  • http://bitsandbugs.org/2015/05/13/climb-hackathon-outcome/
  • Google 'bits and bugs + climb'
  • CLIMB google group

2 SnapperDB





Challenges:
• Many different eBURST groups (of STs), which have to be analysed separately
• Hundreds of strains a week
• Rapid, hands-off analysis

Solution – SnapperDB:

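[Diagram: sample FASTQs (with ST) go into SnapperDB, which sends each sample to the database (db) for its eBURST group: EBG 1 - Typhimurium, EBG 4 - Enteritidis, EBG 13 - Typhi, EBG 3 - Newport, EBG 11 - Paratyphi A, … Timings on the diagram: 30 mins - parallel; 5 min - 1 hour.]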
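The routing idea (one database per eBURST group, chosen from the sample's ST) can be sketched roughly as below. This is not SnapperDB's own code: the function names, the database names and the small ST-to-EBG lookup are hypothetical placeholders for illustration, and a real lookup would cover many more STs.

```python
# Hypothetical lookup tables for illustration only; not SnapperDB's real configuration.
ST_TO_EBG = {19: 1, 11: 4, 1: 13}

EBG_TO_DB = {
    1: "ebg_1_typhimurium",
    4: "ebg_4_enteritidis",
    13: "ebg_13_typhi",
    3: "ebg_3_newport",
    11: "ebg_11_paratyphi_a",
}


def route_sample(st):
    """Return the per-EBG database name for a sequence type, or None if unknown."""
    ebg = ST_TO_EBG.get(st)
    if ebg is None:
        return None
    return EBG_TO_DB.get(ebg)


def group_by_database(samples):
    """Group (sample_name, st) pairs by target database.

    Each group is independent of the others, so the groups could be
    processed in parallel. Samples with an unknown ST end up under None.
    """
    batches = {}
    for name, st in samples:
        batches.setdefault(route_sample(st), []).append(name)
    return batches
```

Samples whose ST is not in the lookup are collected under `None` rather than dropped, so they can be reviewed by hand.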


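[Workflow diagram: FASTQs, then SNPdb (PostgreSQL database), then SNP alignments, then Trees]

• FASTQs to SNPdb: SnapperDB.py fastq_to_db (fastq_to_vcf, vcf_to_db)

• SNPdb to SNP alignments: SnapperDB.py get_the_snps

• SNP alignments to trees: RAxML, FastTree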

SNPdb Schema


Parse VCF into:

• Ignored positions (ambiguous mapping, low coverage)

• Variants

Strain SNPs

id   name         variants_id      ignored_pos
1    H123456789   [1,2,3,4,5,…]    [9985, 856142, …]

Variants

id   position   ref_base   var_base
1    235214     A          T
2    455544     T          C
…
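To make the two tables concrete, here is a minimal in-memory sketch of how a strain's base at a given position might be looked up from them. The class and function names are hypothetical, the field names follow the slide's columns, and the real SNPdb is a PostgreSQL database rather than Python objects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variant:
    id: int
    position: int
    ref_base: str
    var_base: str


@dataclass
class StrainSNPs:
    id: int
    name: str
    variants_id: List[int]   # ids of rows in the variants table
    ignored_pos: List[int]   # positions with ambiguous mapping or low coverage


def base_at(strain: StrainSNPs, variants: Dict[int, Variant],
            position: int, ref_base: str) -> str:
    """Return the strain's base at a position.

    'N' if the position is ignored for this strain, the variant base if
    the strain carries a variant there, otherwise the reference base.
    """
    if position in strain.ignored_pos:
        return "N"
    for vid in strain.variants_id:
        variant = variants[vid]
        if variant.position == position:
            return variant.var_base
    return ref_base
```

Storing each variant once and referencing it by id from every strain that carries it keeps the per-strain rows small.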

Running SnapperDB


[Diagram: FASTQs go in; trees, a pairwise distance matrix and a SNP address come out. Picture of SNP distance matrix to be added.]

SNP address looks something like 1.1.24.36.48.128.2013

7 Revolutionising Salmonella reference microbiology

http://benfry.com/zipdecode/

SNP Address

[Diagram: strains grouped into nested clusters, with a cluster number at each level giving example SNP addresses 1.1.1, 1.2.2, 1.2.3, 1.2.4, 2.3.5 and 2.3.6]
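One way to build a hierarchical address like this is to cluster the strains at a series of decreasing SNP distance thresholds and join the cluster numbers with dots. The sketch below assumes single-linkage clustering and uses seven default thresholds to match the seven fields of the example address; both the clustering method and the threshold values are assumptions here, not something stated on the slides. Cluster numbers are assigned per run, whereas a database would need to keep them stable as strains are added.

```python
def single_linkage(names, dist, threshold):
    """Label strains so that any two within `threshold` SNPs share a cluster.

    `dist` maps (x, y) to a distance for every pair with x before y in `names`.
    """
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, x in enumerate(names):
        for y in names[i + 1:]:
            if dist[(x, y)] <= threshold:
                parent[find(x)] = find(y)

    labels, out = {}, {}
    for n in names:
        root = find(n)
        labels.setdefault(root, len(labels) + 1)  # number clusters in order seen
        out[n] = labels[root]
    return out


def snp_addresses(names, dist, thresholds=(250, 100, 50, 25, 10, 5, 0)):
    """Return {name: 'a.b.c...'} with one cluster number per threshold."""
    levels = [single_linkage(names, dist, t) for t in thresholds]
    return {n: ".".join(str(level[n]) for level in levels) for n in names}
```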

Update SNP address & detect mixed


As you add strains to a cluster, it calculates the mean pairwise distance and stdev within that cluster.

Then, if a strain has a z-score of >1.75, the strain is quality assessed.

Look at the alignment of that strain and its near neighbours (with Ns), and exclude the strain if it introduces a large number of unique Ns to the alignment.

Otherwise, these strains can blunt the tips of the tree and reduce resolution.

SnapperDB.py update_clusters
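The z-score check can be sketched as below. This is one possible reading of the description, not the code behind update_clusters: each strain's mean distance to the other cluster members is compared with the mean and (population) stdev of those per-strain means. The function name and the `frozenset` keying of distances are hypothetical.

```python
from statistics import mean, pstdev


def flag_outliers(names, dist, z_cutoff=1.75):
    """Return strains whose mean distance to the rest of the cluster is unusually high.

    `dist` maps frozenset({x, y}) to the SNP distance between x and y.
    """
    if len(names) < 3:
        return []  # too few strains for a meaningful spread
    # Mean distance from each strain to every other member of the cluster.
    avg = {
        n: mean(dist[frozenset((n, m))] for m in names if m != n)
        for n in names
    }
    mu = mean(avg.values())
    sd = pstdev(avg.values())
    if sd == 0:
        return []  # all strains equally distant, nothing stands out
    return [n for n in names if (avg[n] - mu) / sd > z_cutoff]
```

Flagged strains would then go on to the manual check of unique Ns in the alignment described above, rather than being excluded automatically.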

Results


Acknowledgments

Tim Dallman

Anthony Underwood

Aleksy Jironkin

Jon Green

Ali Al-Shabib
