SnapperDB: A Scalable Database for Routine Sequencing of Bacterial Isolates
Philip Ashton, Bioinformatician, Gastrointestinal Bacteria Reference Unit
GitHub + CLIMB image
• Download the code from https://github.com/PHE-GIDIS/SnapperDB.git
• Spin up an image (if you have a CLIMB account): http://birmingham.climb.ac.uk/ (image: ashton-phe-snpdb-client)
• If you have no idea what I mean by "spin up an image":
  • http://bitsandbugs.org/2015/05/13/climb-hackathon-outcome/
  • Google 'bits and bugs + climb'
  • CLIMB google group
2 SnapperDB
Challenges:
• Many different eBURST groups (eBGs) of sequence types (STs), each of which has to be analysed separately
• Hundreds of strains a week
• Rapid, hands-off analysis
Solution – SnapperDB:
[Figure: SnapperDB routes sample FASTQs (with ST) to a separate SNP database (db) for each eBURST group: EBG 1 (Typhimurium), EBG 4 (Enteritidis), EBG 13 (Typhi), EBG 3 (Newport), EBG 11 (Paratyphi A), … Timings shown: 30 mins (parallel); 5 min – 1 hour.]
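Because each eBURST group gets its own database, the first step is sending each sample to the right one based on its ST. A minimal Python sketch of that routing, assuming a simple lookup table from ST to eBG; the table contents and the `route_samples` helper are hypothetical, and the ST numbers are illustrative only, not SnapperDB's actual configuration:

```python
from collections import defaultdict

# Hypothetical ST -> eBURST group lookup; the ST numbers are illustrative only.
ST_TO_EBG = {
    19: "EBG1_Typhimurium",
    11: "EBG4_Enteritidis",
    1: "EBG13_Typhi",
}


def route_samples(samples):
    """Group (sample_name, st) pairs by the eBG database they should go to.

    STs missing from the lookup are collected under "unassigned"
    so they can be reviewed instead of being silently dropped.
    """
    batches = defaultdict(list)
    for name, st in samples:
        batches[ST_TO_EBG.get(st, "unassigned")].append(name)
    return dict(batches)


if __name__ == "__main__":
    print(route_samples([("S1", 19), ("S2", 11), ("S3", 19), ("S4", 9999)]))
    # {'EBG1_Typhimurium': ['S1', 'S3'], 'EBG4_Enteritidis': ['S2'], 'unassigned': ['S4']}
```

Each batch can then be processed independently, which is what allows the per-eBG work to run in parallel.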
[Figure: FASTQs → SnapperDB.py fastq_to_db (fastq_to_vcf, vcf_to_db) → SNPdb (PostgreSQL database) → SnapperDB.py get_the_snps → SNP alignments → Trees (RAxML, FastTree)]
SNPdb Schema
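Conceptually, `get_the_snps` turns the per-strain records in the SNPdb into a SNP alignment for RAxML or FastTree. The sketch below is a hypothetical illustration of that idea, not SnapperDB's implementation: it assumes each strain is stored as a map of variant positions to bases plus a set of ignored positions, keeps every position that is variant in at least one strain, fills in the reference base where a strain has no variant, and writes ignored positions as N.

```python
def build_snp_alignment(reference, strains):
    """Build a variable-sites alignment from per-strain variant calls.

    reference: reference sequence; positions are 1-based.
    strains: {name: {"variants": {pos: base}, "ignored": {pos, ...}}}
    Returns {name: sequence} over every position that is variant in at least
    one strain, plus a "reference" row.
    """
    positions = sorted({pos for s in strains.values() for pos in s["variants"]})
    alignment = {"reference": "".join(reference[p - 1] for p in positions)}
    for name, s in strains.items():
        seq = []
        for p in positions:
            if p in s["ignored"]:
                seq.append("N")  # low coverage / ambiguous mapping in this strain
            else:
                # No variant recorded here, so the strain carries the reference base.
                seq.append(s["variants"].get(p, reference[p - 1]))
        alignment[name] = "".join(seq)
    return alignment


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    strains = {
        "A": {"variants": {2: "T", 5: "G"}, "ignored": {9}},
        "B": {"variants": {5: "G", 9: "T"}, "ignored": {2}},
    }
    # Print as FASTA, ready to hand to a tree builder.
    for name, seq in build_snp_alignment(ref, strains).items():
        print(f">{name}\n{seq}")
```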
Parse VCF into:
• Ignored positions (ambiguous mapping, low coverage)
• Variants

Strain SNPs table:
id | name | variants_id | ignored_pos
1 | H123456789 | [1,2,3,4,5,…] | [9985, 856142, …]

Variants table:
id | position | ref base | var_base
1 | 235214 | A | T
2 | 455544 | T | C
…
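The schema splits each VCF into variants and ignored positions and stores each distinct variant once, referenced from the strain row by id. A minimal sketch of that normalisation, with in-memory Python structures standing in for the PostgreSQL tables; `parse_vcf` and `SnpDb` are hypothetical names, and the rule that any FILTER value other than PASS (such as a hypothetical `LowCov` filter) marks a position as ignored is an assumption about how low-coverage and ambiguously mapped sites are flagged:

```python
def parse_vcf(lines):
    """Split VCF records into variants and ignored positions.

    Assumption: a FILTER value other than PASS (e.g. a hypothetical "LowCov"
    filter) marks the position as ignored; PASS records whose ALT differs
    from REF are variants.
    """
    variants, ignored = [], []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        pos, ref, alt, filt = int(fields[1]), fields[3], fields[4], fields[6]
        if filt != "PASS":
            ignored.append(pos)
        elif alt not in (".", ref):
            variants.append((pos, ref, alt))
    return variants, ignored


class SnpDb:
    """In-memory stand-in for the Strain SNPs and Variants tables."""

    def __init__(self):
        self.variants = {}     # (position, ref_base, var_base) -> variant id
        self.strain_snps = []  # rows with id, name, variants_id, ignored_pos

    def add_strain(self, name, variants, ignored):
        ids = []
        for key in variants:
            # Each distinct variant is stored once; strains share its id.
            if key not in self.variants:
                self.variants[key] = len(self.variants) + 1
            ids.append(self.variants[key])
        row = {"id": len(self.strain_snps) + 1, "name": name,
               "variants_id": ids, "ignored_pos": ignored}
        self.strain_snps.append(row)
        return row


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr\t9985\t.\tG\t.\t.\tLowCov\t.",
        "chr\t235214\t.\tA\tT\t.\tPASS\t.",
        "chr\t455544\t.\tT\tC\t.\tPASS\t.",
    ]
    db = SnpDb()
    print(db.add_strain("H123456789", *parse_vcf(vcf)))
```

Storing variants once keeps the strain table compact when many strains in an eBG share the same SNPs.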
Running SnapperDB
[Figure: FASTQs → trees and a pairwise SNP distance matrix]
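A pairwise SNP distance can be derived from the same per-strain records. This sketch assumes that a position ignored in either strain is left out of the comparison, and that a strain with no variant at a position carries the reference base; `snp_distance` and `distance_matrix` are hypothetical helpers, not SnapperDB functions:

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions where two strains differ.

    a, b: {"variants": {pos: base}, "ignored": {pos, ...}}.
    Positions ignored in either strain are excluded; a strain with no
    variant at a position is assumed to carry the reference base.
    """
    masked = a["ignored"] | b["ignored"]
    positions = (a["variants"].keys() | b["variants"].keys()) - masked
    # .get() returns None for "reference base", so ref-vs-variant counts as a difference.
    return sum(1 for p in positions if a["variants"].get(p) != b["variants"].get(p))


def distance_matrix(strains):
    """Return (names, matrix) where matrix[x][y] is the SNP distance."""
    names = sorted(strains)
    matrix = {x: {y: 0 for y in names} for x in names}
    for x, y in combinations(names, 2):
        matrix[x][y] = matrix[y][x] = snp_distance(strains[x], strains[y])
    return names, matrix


if __name__ == "__main__":
    strains = {
        "A": {"variants": {2: "T", 5: "G"}, "ignored": {9}},
        "B": {"variants": {5: "G", 9: "T"}, "ignored": {2}},
        "C": {"variants": {}, "ignored": set()},
    }
    names, m = distance_matrix(strains)
    for x in names:
        print(x, [m[x][y] for y in names])
```

Masking ignored positions means a low-coverage site is never counted as a SNP difference, at the cost of slightly underestimating distances between poorly covered strains.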
SNP address
Looks something like
1.1.24.36.48.128.2013
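Each number in the address is a cluster ID at a successively stricter SNP threshold. A minimal sketch using single-linkage clustering at each threshold; the seven cut-offs (250, 100, 50, 25, 10, 5 and 0 SNPs) are an assumption for illustration, and clusters are numbered 1, 2, 3… in order of first appearance across each whole level. Since addresses are updated as strains are added, a production system would presumably assign IDs incrementally so they stay stable; this batch sketch simply recomputes them. `single_linkage` and `snp_addresses` are hypothetical names.

```python
# Assumed SNP cut-offs, coarsest first (illustrative, one per address level).
THRESHOLDS = (250, 100, 50, 25, 10, 5, 0)


def single_linkage(names, dist, t):
    """Return a find() function mapping each name to its cluster root at threshold t."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[a][b] <= t:  # single linkage: one close pair is enough to merge
                parent[find(a)] = find(b)
    return find


def snp_addresses(names, dist, thresholds=THRESHOLDS):
    """Return {name: "c1.c2...."} with one cluster ID per threshold."""
    levels = []
    for t in thresholds:
        find = single_linkage(names, dist, t)
        ids, level = {}, {}
        for n in names:
            # Number clusters 1, 2, 3... in order of first appearance.
            level[n] = ids.setdefault(find(n), len(ids) + 1)
        levels.append(level)
    return {n: ".".join(str(lv[n]) for lv in levels) for n in names}


if __name__ == "__main__":
    names = ["s1", "s2", "s3", "s4"]
    pairs = {("s1", "s2"): 3, ("s1", "s3"): 8, ("s2", "s3"): 7,
             ("s1", "s4"): 50, ("s2", "s4"): 50, ("s3", "s4"): 50}
    dist = {a: {b: 0 for b in names} for a in names}
    for (a, b), d in pairs.items():
        dist[a][b] = dist[b][a] = d
    print(snp_addresses(names, dist, thresholds=(10, 5, 0)))
    # {'s1': '1.1.1', 's2': '1.1.2', 's3': '1.2.3', 's4': '2.3.4'}
```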
7 Revolutionising Salmonella reference microbiology
http://benfry.com/zipdecode/
SNP Address
[Figure: a tree whose strains are labelled with three-level SNP addresses: 1.1.1; 1.2.2, 1.2.4, 1.2.3; 2.3.5, 2.3.6]
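Because cluster IDs are numbered across the whole tree at each level, two strains that share a leading run of numbers sit in the same clusters down to that level: 1.2.2 and 1.2.4 agree at the first two levels, while 1.1.1 and 2.3.5 share none. A small hypothetical helper that reads this off a pair of addresses:

```python
def shared_levels(addr_a, addr_b):
    """Number of leading address levels (clusters) two SNP addresses share."""
    shared = 0
    for x, y in zip(addr_a.split("."), addr_b.split(".")):
        if x != y:
            break
        shared += 1
    return shared


if __name__ == "__main__":
    print(shared_levels("1.2.2", "1.2.4"))  # 2
    print(shared_levels("1.1.1", "2.3.5"))  # 0
    print(shared_levels("2.3.5", "2.3.6"))  # 2
```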
Update SNP address & detect mixed
As strains are added to a cluster, SnapperDB calculates the mean pairwise SNP distance and its standard deviation within that cluster.
If a strain then has a z-score above 1.75, it is quality assessed: the alignment of that strain and its near neighbours (including Ns) is inspected, and the strain is excluded if it introduces a disproportionately large number of unique Ns to the alignment.
Left in, such strains can blunt the tips of the tree and reduce resolution.
SnapperDB.py update_clusters
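The quality check can be sketched as follows. Only the 1.75 cut-off comes from the slide; the rest is assumption: the new strain's score is taken as its mean distance to existing cluster members, standardised by the mean and standard deviation of the cluster's pairwise distances, and "unique Ns" are alignment columns where only that strain has an N. The names `needs_qc`, `unique_ns` and `should_exclude`, and the `max_unique_ns` cut-off, are hypothetical.

```python
from statistics import mean, stdev

Z_CUTOFF = 1.75  # z-score above which a strain is quality assessed (from the slide)


def needs_qc(cluster_dists, new_to_members):
    """Flag a strain whose distances to a cluster are unusually large.

    cluster_dists: pairwise SNP distances among existing cluster members.
    new_to_members: SNP distances from the new strain to each member.
    Assumption: the score is the new strain's mean distance to the members,
    standardised by the mean and standard deviation of cluster_dists.
    """
    if len(cluster_dists) < 2:
        return False  # sample standard deviation needs at least two values
    mu, sd = mean(cluster_dists), stdev(cluster_dists)
    if sd == 0:
        # Assumed handling: with no spread, any larger mean distance is unusual.
        return mean(new_to_members) > mu
    return (mean(new_to_members) - mu) / sd > Z_CUTOFF


def unique_ns(alignment, strain):
    """Count alignment columns where `strain` has an N and no other sequence does."""
    seq = alignment[strain]
    others = [s for name, s in alignment.items() if name != strain]
    return sum(1 for i, base in enumerate(seq)
               if base == "N" and all(o[i] != "N" for o in others))


def should_exclude(alignment, strain, max_unique_ns):
    """Hypothetical exclusion rule: too many Ns that only this strain introduces."""
    return unique_ns(alignment, strain) > max_unique_ns


if __name__ == "__main__":
    print(needs_qc([2, 3, 4, 3], [5, 6, 5, 6]))  # True (z is about 3.06)
    print(needs_qc([2, 3, 4, 3], [3, 4, 3, 3]))  # False
    aln = {"new": "ANNT", "n1": "ACNT", "n2": "ACGT"}
    print(unique_ns(aln, "new"))  # 1
```

Only Ns unique to the new strain are counted, because columns already masked in a neighbour do not reduce resolution any further.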
Results
Acknowledgments
Tim Dallman
Anthony Underwood
Aleksy Jironkin
Jon Green
Ali Al-Shabib