6/27/2016

Fostering Collaboration for Public Health: The Role of NCBI

William Klimke, APHL 2016

The Interagency Collaboration on Genomics and Food Safety (Gen‐FS) established by this Charter represents a substantial effort to strengthen collaboration and coordination of Federal public health and regulatory food safety responsibilities of the Centers for Disease Control and Prevention (CDC), the Food and Drug Administration (FDA), and the National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH) of the US Department of Health and Human Services, and the Food Safety and Inspection Service (FSIS) of the US Department of Agriculture. 

Gen‐FS will strengthen Federal collaboration by addressing cross‐cutting priorities for molecular sequencing of foodborne and other pathogens causing human illness, and data collection, analysis and use, as outlined in the key findings of the Report of the Real Time Whole Genome Sequencing Surveillance Multi‐Agency Collaboration Meeting, September 22‐23, 2014, Natcher Center, NIH, Bethesda, Maryland.

Interagency Collaboration on Genomics and Food Safety (Gen-FS)

FDA/CDC Real Time Listeria Project

FDA & CDC could leverage existing systems & work flows…

Could NCBI play a role?

Listeria Annual Stats (CDC)
• ~1600 cases
• ~260 deaths
• ~$230 million (USDA ERS)

Whole Genome Sequencing (WGS) Listeria Pilot Project

Started September 2013

Goal: Sequence all Listeria monocytogenes isolates

Near real‐time (<1 week for patient isolates)

Public Health Agency of Canada

http://www.globalmicrobialidentifier.org/

Vision and objectives

The vision is to develop a global system to aggregate, share, mine and use microbiological genomic data to address global public health and clinical challenges, a high impact area in need of focused effort. Such a system should be deployed in a manner which promotes equity in access and use of the current technology worldwide, enabling cost-effective improvements in plant, animal, environmental and human health.

Global Microbial Identifier

sample_name

organism

strain/isolate

Category (attribute_package)

1a) Clinical/Host‐associated

1a1) specific_host

1a2) isolation_source

1a3) host‐disease

OR 

1b) Environmental/Food/Other

1b1) isolation_source

collection_date

Geographic location

6a) geo_loc_name

OR

6b) lat_lon

collected by

Where

When

Who

What

minimal metadata 

NCBI Biosample – Pathogen Template (Foodborne Outbreaks)
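The minimal metadata listed on this slide can be read as a small validation rule set. The sketch below is only an illustration of that reading: the underscore spellings (`collected_by`, `host_disease`), the package labels, and the function name are assumptions made for the example, not the field names enforced by the NCBI Submission Portal.

```python
# Always-required fields, spelled as on the slide (underscore forms are assumed).
REQUIRED = ("sample_name", "organism", "collection_date", "collected_by")

# Fields required by each category (attribute_package); labels are hypothetical.
PACKAGE_FIELDS = {
    "clinical/host-associated": ("specific_host", "isolation_source", "host_disease"),
    "environmental/food/other": ("isolation_source",),
}


def missing_minimal_metadata(record):
    """Return a sorted list of minimal-metadata requirements the record fails."""
    def present(key):
        value = record.get(key)
        return value is not None and str(value).strip() != ""

    missing = [key for key in REQUIRED if not present(key)]

    # "strain/isolate": either one is enough.
    if not (present("strain") or present("isolate")):
        missing.append("strain/isolate")

    # Geographic location: geo_loc_name OR lat_lon.
    if not (present("geo_loc_name") or present("lat_lon")):
        missing.append("geo_loc_name/lat_lon")

    # Category decides which source/host fields are needed.
    package = record.get("attribute_package")
    if package in PACKAGE_FIELDS:
        missing += [k for k in PACKAGE_FIELDS[package] if not present(k)]
    else:
        missing.append("attribute_package")

    return sorted(missing)
```

A record with a name, organism, strain, date, collector, location and the fields of its category returns an empty list; anything else returns the names of the missing pieces, which maps onto the What / Where / When / Who grouping on the slide.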

NCBI Biosample – Pathogen Template Total Submissions (May 2016)

Type                                  Submissions
pathogen                              117406
pathogen: clinical/host‐associated    68458
pathogen: food/environmental/other    48948
with publicly available SRA data      83243
Salmonella                            48967
Listeria                              12116
Campylobacter                         2978
Escherichia and Shigella              13011
Other                                 40334

NCBI Biosample – Pathogen Template Other pathogens (May 2016)

Type              Submissions
Klebsiella        1815
Acinetobacter     1906
Enterobacter      822
Staphylococcus    1960
Streptococcus     4337
Legionella        296
Viruses           8589
Serratia          125
Pseudomonas       1133
Mycobacterium     6161
Vibrio            1149
Bordetella        205
Bacillus          332
Neisseria         985

NCBI Pathogen Detection Pipeline: Submissions and Analysis

NCBI Submission Portal
• BioSamples
• SRA
• GenBank
• BioProject

NCBI Pathogen Pipeline
• Kmer analysis
• Genome Assembly
• Genome Annotation
• Genome Placement
• Clustering
• SNP analysis
• Tree Construction
• Reports
• QC
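The slides do not say how the k-mer analysis stage measures distance between genomes. As a rough illustration of the general idea only, the sketch below computes a Jaccard distance between the k-mer sets of two sequences. The function names and the tiny default `k` are assumptions for the example; production tools typically use much longer k-mers, account for reverse complements, and work on sketches rather than full sets.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=4):
    """1 - |A intersect B| / |A union B| over the two k-mer sets."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        # Both sequences are shorter than k: treat them as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

Identical sequences get distance 0, sequences that share no k-mers get distance 1, and a distance of this kind is cheap enough to compare every new isolate against a large collection before any alignment is attempted.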

NCBI Pathogen Detection Pipeline Submissions (Jan – May, 2016)

[Chart; series: USA, UK, Aus, Clinical]

Type                Total targets in k‐mer tree    Targets in clusters (single linkage <= 50 SNPs)
Salmonella          45297                          38794
Listeria            9621                           8135
E. coli & Shigella  13144                          6046
Campylobacter       2234                           1569
Acinetobacter       2179                           1299
Elizabethkingia     89                             74
Serratia            336                            227
Klebsiella          1194                           677
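To compare organisms of very different collection sizes, it can help to express the clustered targets as a share of all targets. The sketch below does that arithmetic on the counts from the table above; the dictionary and function names are made up for the example.

```python
# (total targets in k-mer tree, targets in clusters), copied from the table.
COUNTS = {
    "Salmonella": (45297, 38794),
    "Listeria": (9621, 8135),
    "E. coli & Shigella": (13144, 6046),
    "Campylobacter": (2234, 1569),
    "Acinetobacter": (2179, 1299),
    "Elizabethkingia": (89, 74),
    "Serratia": (336, 227),
    "Klebsiella": (1194, 677),
}


def clustered_percent(counts):
    """Percentage of targets that fall in a cluster, rounded to one decimal."""
    return {
        name: round(100.0 * clustered / total, 1)
        for name, (total, clustered) in counts.items()
    }


if __name__ == "__main__":
    for name, percent in clustered_percent(COUNTS).items():
        print(f"{name}: {percent}% of targets in clusters")
```

By this measure most Salmonella and Listeria targets sit within 50 SNPs of at least one other target, while fewer than half of the E. coli & Shigella targets do.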

Contributions of enteric pathogens for food safety

http://www.ncbi.nlm.nih.gov/pathogens/contributors/

Contributions of clinical pathogens

http://www.ncbi.nlm.nih.gov/pathogens/

Results Available Now

NCBI’s Role in Combating Antibiotic Resistant Bacteria

“Create a repository of resistant bacterial strains (an “isolate bank”) and maintain a well‐curated reference database that describes the characteristics of these strains.”

“Develop and maintain a national sequence database of resistant pathogens.”

Clin Infect Dis. 2014 Aug 1;59(3):390‐7. doi: 10.1093/cid/ciu319. Epub 2014 May 1.

MBio. 2015 Jul 28;6(4):e01030. doi: 10.1128/mBio.01030‐15.

AMR efforts at NCBI

• With collaborators, build a database of sequenced isolates with standardized AMR metadata (i.e., accept antibiograms): 2019 samples as of May 16 (http://www.ncbi.nlm.nih.gov/biosample/?term=antibiogram[filter])
• Collaborators include CDC, WRAIR, FDA, B&W
• Stable, up‐to‐date database of AMR genes with standardized nomenclature
• Collaborators (CARD)
• RefSeq set released by June 2016
• Implement and validate tools for identifying AMR genes in new isolates

Antibiogram Fields
• Fields designed to find balance between comprehensiveness and ease of submission
• Data dictionaries based on outside expertise (ASM, CLSI) standardize input and minimize ‘data drift’
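A data dictionary of the kind mentioned above maps the many spellings submitters use onto one controlled term, which is what keeps values from drifting over time. The sketch below shows the mechanism for a single field. The vocabulary, the synonyms, and the function name are invented for illustration and are not the terms defined by NCBI, ASM or CLSI.

```python
# Hypothetical controlled vocabulary: canonical term -> accepted spellings.
PHENOTYPE_TERMS = {
    "resistant": {"r", "res", "resistant"},
    "intermediate": {"i", "int", "intermediate"},
    "susceptible": {"s", "sus", "susceptible", "sensitive"},
}


def normalize_phenotype(value):
    """Map a free-text phenotype onto its canonical term, or reject it."""
    key = value.strip().lower()
    for term, synonyms in PHENOTYPE_TERMS.items():
        if key in synonyms:
            return term
    # Rejecting unknown values, rather than storing them, is what limits drift.
    raise ValueError(f"unrecognized phenotype: {value!r}")
```

Rejecting unknown input at submission time is a design choice: it costs the submitter a correction now, but every stored value can then be searched and counted without later cleanup.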

NCBI Outputs

Kmer tree

ftp://ftp.ncbi.nlm.nih.gov/pathogen/Results/Listeria/latest/

• Genome Workbench
• full SNP reports
• Integrated web‐based interactive system*
• AMR reports*
• wgMLST*

Acknowledgements

Joshua Cherry, Michael DiCuccio, William Klimke, Aleksandr Morgulis, Eyal Mozes, Arjun Prasad, Kirill Rotmistrovsky, Alejandro Schaffer, Sergey Shiryev, Martin Shumway, Alexander Souvorov, Lukas Wagner, Alexander Zasypkin

CDC, FDA/CFSAN, USDA‐FSIS, PHE/FERA, NIAID, WRAIR, Broad, Wadsworth/MDH

pd‐help@ncbi.nlm.nih.gov

This research was supported by the Intramural Research Program of the NIH, National Library of Medicine. http://www.ncbi.nlm.nih.gov

National Center for Biotechnology Information – National Library of Medicine – Bethesda MD 20892 USA

David Lipman, James Ostell

SRA team, Systems group, Submission Portal

Automated Bacterial Assembly

SRA Reads sample 1

Trim reads (Ns, adaptor)

Reference distance tree

Find closest reference genome(s)

ArgoCA (Combined Assembly)

Argo (Reference assisted assembly)

De novo assembly panel: SOAP denovo, GS‐assembler (newbler), MaSuRCA, Celera Assembler, SPAdes

Reads remapped to combined assembly

Outputs: Contig fasta, Read placements (bam), Quality profile

NCBI Pathogen Detection SNP Pipeline Web viewer (coming soon): example 3 – Elizabethkingia outbreak

R&D: Tree Building

704 Salmonella Enteritidis

7102 columns (filtered)

Compatibility, Parsimony

Data plus “noise”: 7402 total columns

Add “noise”: 300 columns that had been removed by filtering

No changes to topology

Many changes and conflicts (47 + 43)

More conflicts

(5 + 6 branches, of ~1100 total)

Few conflicts between topologies

wgMLST approach
• Complementary to SNP analysis, e.g. consistency check
• Efficient for initial clustering of all isolates in species
• Generate loci using “essentially complete” RefSeq genomes

Organism       Number of loci  Genome in loci  Number of genomes  Major species
Acinetobacter  2420            58.25%          43/47              baumannii
Campylobacter  1257            68.36%          90/132             jejuni
Escherichia    2896            52.97%          159/165            coli
Klebsiella     4004            82.54%          67/82              pneumoniae
Listeria       2364            73.88%          73/81              monocytogenes
Salmonella     3469            66.98%          137/147            enterica

R&D: wgMLST

• Fast & relatively simple
• Epidemiologists are familiar with it
• Good for initial clustering
• Different heuristics
• Can use special markers for e.g. serovars
• Still need to deal with assembly errors
• Recombination can still be a problem…

wgMLST – a complementary method

Loci are not independent

R&D: wgMLST
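For initial clustering, wgMLST reduces each genome to one allele identifier per locus, so two isolates can be compared by counting the loci where those identifiers differ. The sketch below assumes each profile is a dictionary from locus name to allele identifier, with `None` for a locus that could not be called. That representation and the function name are assumptions for the example, not the NCBI implementation.

```python
def allele_distance(profile_a, profile_b):
    """Return (differing loci, loci compared) for two wgMLST allele profiles.

    Only loci called in both profiles are compared, so a missing call caused
    by an assembly gap is not counted as a difference.
    """
    shared = [
        locus for locus in profile_a
        if locus in profile_b
        and profile_a[locus] is not None
        and profile_b[locus] is not None
    ]
    differing = sum(1 for locus in shared if profile_a[locus] != profile_b[locus])
    return differing, len(shared)
```

Returning the number of loci compared alongside the difference count makes it visible when a small distance simply reflects a poorly assembled genome with few callable loci. Because loci are not independent (recombination can change several neighbouring loci at once), a count like this is better suited to a first-pass grouping than to fine-scale inference.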

1. Initial partition of isolates within each species by kmer distances

2. Within each partition, blast comparison of all pairs of genomes

3. Single linkage clusters with at most 50 SNPs

4. Within clusters, SNPs with respect to one reference

5. Generate final SNP list and phylogenetic trees

Filtering:
• Base level
• Repeat
• Density

Problematic genomes are eliminated at various points along the way

SNP pipeline
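Step 3 of the pipeline, single linkage clustering with at most 50 SNPs, can be sketched with a union-find structure: any two isolates whose pairwise distance is within the threshold are merged, and merges chain transitively. The input format (a dictionary of pairwise SNP counts keyed by isolate pairs) and the function name are assumptions made for the example.

```python
def single_linkage_clusters(isolates, pairwise_snps, max_snps=50):
    """Group isolates so that each is within max_snps of some cluster member."""
    parent = {name: name for name in isolates}

    def find(name):
        # Follow parent links to the cluster representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), snps in pairwise_snps.items():
        if snps <= max_snps:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in isolates:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    distances = {("A", "B"): 10, ("B", "C"): 45, ("A", "C"): 55, ("C", "D"): 300}
    print(single_linkage_clusters(["A", "B", "C", "D"], distances))
```

In the usage example A and C differ by 55 SNPs yet end up together because both are within 50 SNPs of B; that chaining is the defining property of single linkage. The function also returns isolates with no close neighbour as one-member groups, which a report of "targets in clusters" would presumably leave out.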

High SNP density

Cumulative count of differences

Iterative density filtering (Richa Agarwala modification of Science. 2011 Jan 28;331(6016):430‐4.)
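The slide names iterative density filtering without giving its details, so the following is a generic illustration of the idea rather than the modification cited above. It repeatedly removes SNP positions that fall in windows holding far more SNPs than the current genome-wide average would predict, then recomputes that average from what remains. The window size, the `factor` multiplier, the `min_count` floor, and the function name are all assumed values chosen for the example.

```python
def density_filter(positions, genome_length, window=1000, factor=5.0, min_count=3):
    """Iteratively drop SNP positions located in unusually dense windows."""
    kept = sorted(positions)
    while kept:
        # Expected SNPs per window if the remaining SNPs were spread evenly.
        expected = len(kept) * window / genome_length
        limit = max(min_count, factor * expected)

        dense = set()
        start = 0
        for end in range(len(kept)):
            # Advance `start` until the window spans fewer than `window` bases.
            while kept[end] - kept[start] >= window:
                start += 1
            if end - start + 1 > limit:
                dense.update(kept[start:end + 1])

        if not dense:
            break  # nothing exceeded the limit: the filter has converged
        kept = [p for p in kept if p not in dense]
    return kept
```

Recomputing the expected density on each pass is what makes the procedure iterative: once a large dense block (for example a recombined region) is removed, the baseline drops and smaller dense regions that were previously masked can be detected.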

Carbapenem‐resistant beta lactamase alleles

Number of RefSeq genomes with AMR hits  Organism                 GES  KPC  NDM  OXA   IMP  VIM  IMI
3221                                    Escherichia coli         0    74   32   2     6    0    0
1096                                    Acinetobacter baumannii  0    2    32   2861  6    0    0
1081                                    Pseudomonas aeruginosa   0    6    0    0     0    234  0
781                                     Klebsiella pneumoniae    2    930  96   10    6    0    0
314                                     Enterobacter cloacae     0    278  8    0     6    0    1
74                                      Enterobacter aerogenes   0    16   0    0     0    0    0
72                                      Klebsiella oxytoca       0    20   4    0     0    0    0
70                                      Serratia marcescens      0    2    0    0     0    0    0
30                                      Citrobacter              0    24   4    0     0    3    0

NCBI Pathogen Detection: Carbapenem resistant beta lactamase alleles found

Organism                Submitter     Number of genomes with carbapenemases  KPC  NDM  OXA
Salmonella              CDC           2                                      1    1    0
Salmonella              PHE           12                                     0    1    11
Serratia marcescens     B&W Hospital  1                                      2    0    0
Pseudomonas aeruginosa  B&W Hospital  2                                      0    0    3
Escherichia coli        B&W Hospital  1                                      0    0    2
Klebsiella pneumoniae   B&W Hospital  10                                     10   0    0
Enterobacter cloacae    B&W Hospital  7                                      7    0    0
Acinetobacter           B&W Hospital  6                                      0    0    10