SnapperDB ASM NGS conference

SnapperDB: A Scalable Database for Routine Sequencing of Bacterial Isolates Philip Ashton Bioinformatician Gastrointestinal Bacteria Reference Unit

Transcript of SnapperDB ASM NGS conference

Page 1: SnapperDB ASM NGS conference

SnapperDB: A Scalable Database for Routine Sequencing of Bacterial Isolates

Philip Ashton, Bioinformatician, Gastrointestinal Bacteria Reference Unit

Page 2: SnapperDB ASM NGS conference

GitHub + CLIMB image

• Download the code from https://github.com/PHE-GIDIS/SnapperDB.git

• Spin up an image (if you have a CLIMB account): http://birmingham.climb.ac.uk/ ashton-phe-snpdb-client

• And, if you have no idea what I mean by "spin up an image":
  • http://bitsandbugs.org/2015/05/13/climb-hackathon-outcome/
  • Google 'bits and bugs + climb'
  • CLIMB Google group

2 SnapperDB

Page 3: SnapperDB ASM NGS conference


Challenges:
• Many different eBURST groups (of STs), which have to be analysed separately
• Hundreds of strains a week
• Rapid, hands-off analysis

Solution – SnapperDB:

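[Diagram: sample FASTQs (with ST) pass through SnapperDB, which sends each sample to the database (db) for its eBURST group (EBG):
• EBG 1 - Typhimurium
• EBG 4 - Enteritidis
• EBG 13 - Typhi
• EBG 3 - Newport
• EBG 11 - Paratyphi A
Timings shown on the diagram: "30 mins - parallel" and "5 min – 1 hour".]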
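The routing step in the diagram can be sketched as a lookup from ST to EBG to database. This is a minimal illustration, not SnapperDB's own code: the function name, the database names, and the ST values in the lookup table are all assumptions made for the example; only the EBG numbers and serovar names come from the slide.

```python
from collections import defaultdict

# Illustrative ST -> EBG lookup. The ST values are placeholders for this
# sketch; a real table would come from the MLST scheme in use.
ST_TO_EBG = {19: 1, 11: 4, 2: 13, 45: 3, 85: 11}

# Hypothetical database names, one per EBG listed on the slide.
EBG_TO_DB = {
    1: "ebg_1_typhimurium",
    4: "ebg_4_enteritidis",
    13: "ebg_13_typhi",
    3: "ebg_3_newport",
    11: "ebg_11_paratyphi_a",
}


def route_samples(samples):
    """Group (sample_name, st) pairs by the database they belong to.

    Samples whose ST is not in the lookup go to an "unassigned" batch so
    that nothing is dropped silently.
    """
    batches = defaultdict(list)
    for name, st in samples:
        ebg = ST_TO_EBG.get(st)
        db = EBG_TO_DB.get(ebg, "unassigned")
        batches[db].append(name)
    return dict(batches)
```

Each batch can then be processed independently, which is what makes the per-EBG analysis easy to run in parallel.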

Page 4: SnapperDB ASM NGS conference


[Diagram: FASTQs → SnapperDB.py fastq_to_db (fastq_to_vcf, vcf_to_db) → SNPdb (PostgreSQL database) → SnapperDB.py get_the_snps → SNP alignments → RAxML / FastTree → trees]

Page 5: SnapperDB ASM NGS conference

SNPdb Schema


Parse VCF

Ignored positions
• Ambiguous mapping
• Low coverage

Variants

Strain SNPs
id  name        variants_id      ignored_pos
1   H123456789  [1,2,3,4,5,…]    [9985, 856142, …]

Variants
id  position  ref_base  var_base
1   235214    A         T
2   455544    T         C
…
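To show how the two tables work together, here is a small in-memory sketch of the same layout. It uses plain Python objects rather than PostgreSQL, and the class and function names are assumptions for illustration. The example rows reuse the values shown on the slide, with the strain's variant list cut down to the two variants that are actually listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One row of the variants table."""
    id: int
    position: int
    ref_base: str
    var_base: str


@dataclass
class StrainSNPs:
    """One row of the strain SNPs table."""
    id: int
    name: str
    variants_id: list  # ids pointing into the variants table
    ignored_pos: list  # positions with ambiguous mapping or low coverage


def base_at(strain, variants_by_id, position, ref_base):
    """Return the base call for a strain at a reference position.

    Ignored positions are reported as "N"; positions carrying one of the
    strain's variants return the variant base; anything else is assumed
    to match the reference base passed in.
    """
    if position in strain.ignored_pos:
        return "N"
    for vid in strain.variants_id:
        variant = variants_by_id[vid]
        if variant.position == position:
            return variant.var_base
    return ref_base


VARIANTS = {
    1: Variant(1, 235214, "A", "T"),
    2: Variant(2, 455544, "T", "C"),
}
STRAIN = StrainSNPs(1, "H123456789", [1, 2], [9985, 856142])
```

Because each strain stores only variant ids and ignored positions, a variant shared by many strains is stored once.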

Page 6: SnapperDB ASM NGS conference

Running SnapperDB


FASTQs

Trees

Pairwise distance matrix

[Placeholder: picture of SNP distance matrix]

SNP address: looks something like 1.1.24.36.48.128.2013
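A pairwise SNP distance matrix can be built from the per-strain variant calls and ignored positions. The sketch below is one plausible way to do it, assuming that a position ignored in either strain is excluded from that pair's comparison; the function names and data layout are illustrative, not taken from SnapperDB.

```python
from itertools import combinations


def snp_distance(var_a, ign_a, var_b, ign_b):
    """Count positions where two strains have different calls.

    var_a / var_b map position -> variant base (absence means reference).
    ign_a / ign_b are sets of ignored positions; a position ignored in
    either strain is skipped for this pair.
    """
    positions = (var_a.keys() | var_b.keys()) - (ign_a | ign_b)
    return sum(1 for p in positions if var_a.get(p) != var_b.get(p))


def pairwise_matrix(strains):
    """Return {(name_a, name_b): distance} for every unordered pair.

    strains maps name -> (variants dict, ignored set).
    """
    matrix = {}
    for a, b in combinations(sorted(strains), 2):
        var_a, ign_a = strains[a]
        var_b, ign_b = strains[b]
        matrix[(a, b)] = snp_distance(var_a, ign_a, var_b, ign_b)
    return matrix
```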

Page 7: SnapperDB ASM NGS conference

7 Revolutionising Salmonella reference microbiology

http://benfry.com/zipdecode/

Page 8: SnapperDB ASM NGS conference


http://benfry.com/zipdecode/

Page 9: SnapperDB ASM NGS conference


http://benfry.com/zipdecode/

Page 10: SnapperDB ASM NGS conference


http://benfry.com/zipdecode/

Page 11: SnapperDB ASM NGS conference


http://benfry.com/zipdecode/

Page 12: SnapperDB ASM NGS conference


http://benfry.com/zipdecode/

Page 13: SnapperDB ASM NGS conference

SNP Address

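[Diagram: strains grouped into nested clusters at three levels. The first level has clusters 1 and 2, the second has clusters 1 to 3, and the third has clusters 1 to 6. The resulting SNP addresses are 1.1.1, 1.2.2, 1.2.3, 1.2.4, 2.3.5 and 2.3.6.]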
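One way to produce addresses like those in the diagram is to cluster the strains at a series of decreasing SNP distance thresholds and join the cluster numbers with dots. The sketch below assumes single-linkage clustering and numbers clusters in order of first appearance; both are assumptions for illustration, as are the function names. The slides do not state the thresholds, so they are passed in by the caller.

```python
def single_linkage(names, dist, threshold):
    """Cluster names so that any pair within threshold shares a cluster.

    dist maps frozenset({a, b}) -> SNP distance. Returns name -> cluster
    number, numbering clusters in order of first appearance in names.
    """
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dist[frozenset((a, b))] <= threshold:
                parent[find(a)] = find(b)

    labels, clusters = {}, {}
    for n in names:
        root = find(n)
        labels.setdefault(root, len(labels) + 1)
        clusters[n] = labels[root]
    return clusters


def snp_addresses(names, dist, thresholds):
    """Build a dotted address per strain, one field per threshold.

    thresholds should be listed from loosest to tightest, so the address
    reads from the broadest cluster to the narrowest.
    """
    levels = [single_linkage(names, dist, t) for t in thresholds]
    return {n: ".".join(str(level[n]) for level in levels) for n in names}
```

With three thresholds this yields three-field addresses like those in the diagram; a seven-field address such as 1.1.24.36.48.128.2013 would correspond to seven thresholds. A production system would also need cluster numbers to stay stable as strains are added, which this sketch does not attempt.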

Page 14: SnapperDB ASM NGS conference

Update SNP address & detect mixed


As you add strains to a cluster, SnapperDB calculates the mean pairwise distance and standard deviation within that cluster.

Then, if a strain has a z-score of >1.75, the strain is quality assessed.

Look at the alignment of that strain and its near neighbours (with Ns), and exclude the strain if it introduces a large number of unique Ns to the alignment.

Otherwise, these can blunt the tips of the tree and reduce resolution.
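The z-score check can be sketched as follows. The slide does not say exactly which quantity the z-score is computed on, so this is one reading: compare the candidate strain's mean distance to the cluster members against the mean and standard deviation of the cluster's existing pairwise distances. The function name and the handling of a zero standard deviation are assumptions for the example.

```python
from statistics import mean, pstdev


def needs_quality_check(cluster_pairwise, candidate_distances, cutoff=1.75):
    """Return True if the candidate strain should be quality assessed.

    cluster_pairwise: pairwise SNP distances among existing members.
    candidate_distances: distances from the candidate to each member.
    cutoff: z-score above which the strain is flagged (1.75 per the slide).
    """
    mu = mean(cluster_pairwise)
    sigma = pstdev(cluster_pairwise)
    candidate_mean = mean(candidate_distances)
    if sigma == 0:
        # Assumed behaviour: with no spread in the cluster, flag any
        # candidate that sits further out than the existing members.
        return candidate_mean > mu
    return (candidate_mean - mu) / sigma > cutoff
```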

SnapperDB.py update_clusters
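The "unique Ns" check described above can be illustrated with a short function that counts alignment columns where only the candidate strain has an N. This is a sketch under the assumption that the alignment is available as equal-length strings keyed by strain name; the function name is illustrative and not part of SnapperDB.py.

```python
def unique_n_count(alignment, candidate):
    """Count columns where the candidate has N and no other strain does.

    alignment maps strain name -> aligned sequence (all the same length).
    """
    cand_seq = alignment[candidate]
    others = [seq for name, seq in alignment.items() if name != candidate]
    return sum(
        1
        for i, base in enumerate(cand_seq)
        if base == "N" and all(seq[i] != "N" for seq in others)
    )
```

A strain with a high count relative to its near neighbours removes otherwise informative columns from the alignment, which is the loss of resolution the slide warns about.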

Page 15: SnapperDB ASM NGS conference

Results


Page 16: SnapperDB ASM NGS conference


Results

Page 17: SnapperDB ASM NGS conference

Acknowledgments

Tim Dallman

Anthony Underwood

Aleksy Jironkin

Jon Green

Ali Al-Shabib
