The path to implementation of Whole Genome Sequencing (WGS) in PulseNet
The path to implementation of WGS in PulseNet
National Center for Emerging and Zoonotic Infectious Diseases
Division of Foodborne, Waterborne, and Environmental Diseases
Peter Gerner-Smidt, MD, DSc
Enteric Diseases Laboratory Branch
GMI9
Rome, Italy, May 23-25, 2016
PulseNet International
The international subtyping network of national and regional networks for foodborne disease surveillance
“Saving Lives Since 2000”
http://www.pulsenetinternational.org/
Whole Genome Sequencing (WGS) is a Transforming and REPLACING Technology
Consolidating multiple laboratory workflows into one:
o Identification, serotyping, virulence profiling, antimicrobial resistance characterization, plasmid characterization, subtyping
Replacing, NOT supplementing, current methods
More precise, more informative, more cost-efficient
WGS in Public Health:
The analytical tools must be
• Simple
– Public health microbiologists are NOT bioinformaticians
– Standard desktop software
• Comprehensive
– All characterization, including analysis, in one workflow
• Working in a network of laboratories, i.e. STANDARDIZED
– Free sharing and comparison of data between labs
– Central and local analysis
MLST vs SNP
Feature | SNP | MLST
Epidemiological concordance | High | High
Stable nomenclature | (No) | Yes
Reference characterization (identification, serotyping, virulence & resistance markers) | No | Yes
Speed | Slow SNP calling, slow analysis | Slow allele calling, fast analysis
Local computing requirements | Medium-High | Low
Local bioinformatics expertise | Yes | No
Reference used to perform analysis | Sequence of closely related annotated strain | Allele database
Requires curation | No | (Yes)
MLST is the primary approach for public health surveillance; SNP is used if more detail is needed or MLST fails
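Once alleles have been called, comparing two isolates reduces to counting the loci at which they differ, which is why the analysis step is fast on the MLST side of the table. A minimal Python sketch, assuming profiles are stored as lists of allele numbers in a fixed locus order and that loci without a call in either isolate are simply skipped (an assumed convention, not necessarily PulseNet's); `allele_distance` is a hypothetical helper:

```python
from typing import Optional, Sequence

# An allele profile: one allele number per locus (fixed locus order),
# or None where no allele could be called for that locus.
Profile = Sequence[Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Number of loci at which two profiles carry different allele calls.

    Loci without a call in either profile are skipped (an assumed convention)."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(
        1 for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


# Hypothetical five-locus profiles, for illustration only.
isolate_1 = [2, 18, 20, 41, 11]
isolate_2 = [2, 18, None, 41, 12]
print(allele_distance(isolate_1, isolate_2))  # 1: locus 3 is skipped, locus 5 differs
```

Each comparison is a single pass over the loci, so an all-against-all distance matrix is cheap once the slow allele-calling step is done.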
Listeria 1403MLGX6-1WGS
wgMLST and hqSNP Are Equally Discriminatory and Phylogenetic Trees Are Concordant
[Figure: hqSNP tree and wgMLST tree of the same isolates (State 1 isolate, State 2 isolates 1 to 3, State 3 isolate, and the 2013 isolate that is the nearest neighbor), with wgMLST allele calls at a subset of loci (LMO_1 to LMO_18)]
Trees ~ Tables
Key SourceState Serotype PFGE-XbaI-pattern PFGE-XbaI-status PFGE-BlnI-pattern PFGE-BlnI-status Outbreak SourceCounty SourceCity SourceCountry SourceType SourceSite PatientAge PatientSex IsolatDate ReceivedDate UploadDate
M18340 M Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 DeKalb Dawsonville USA Human Stool 54 UNKNOWN 6/26/2015 7/15/2015 8/4/2015
X150951 X Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 Gwinnett Key West USA Human Stool 33 MALE 7/5/2015 7/15/2015 8/4/2015
D108427 D Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 Fulton Miami USA Human Blood 50 FEMALE 7/7/2015 7/15/2015 8/4/2015
A15054-1 A Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 Pickens USA Human Stool 28 FEMALE 7/7/2015 7/27/2015 8/7/2015
D508583 D Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 Dawson Philadelphia USA Human Stool 24 FEMALE 7/21/2015 8/11/2015
M088433 M Enteritidis JEGX01.0009 Confirmed Unconfirmed 1507MLJEG-3 Forsyth USA Human Stool 44 FEMALE 7/16/2015 7/24/2015 8/13/2015
P110964-1 P Enteritidis JEGX01.0009 Confirmed Unconfirmed Forsyth USA Human Blood 72 MALE 8/3/2015 8/10/2015 8/17/2015
A09461 A Enteritidis JEGX01.0009 Confirmed Unconfirmed Cabbagetown USA Human Blood 43 FEMALE 7/30/2015 8/5/2015 8/26/2015
A109320 A Enteritidis JEGX01.0009 Confirmed Unconfirmed Bismarck USA Human Stool 28 UNKNOWN 7/25/2015 8/6/2015 8/27/2015
T509961 T Enteritidis JEGX01.0009 Confirmed Unconfirmed Forsyth Decatur USA Human Stool 57 UNKNOWN 7/31/2015 8/13/2015 9/10/2015
A110203 A Enteritidis JEGX01.0009 Confirmed Unconfirmed DeKalb Hollywood USA Human Other 14 FEMALE 8/11/2015 8/25/2015 9/22/2015
A151664 A Enteritidis JEGX01.0009 Confirmed Unconfirmed Talking Rock USA Human Stool 62 MALE 8/26/2015 9/8/2015 9/28/2015
DA159061 K Enteritidis JEGX01.0009 Confirmed Unconfirmed Pickens Pierre USA Human Stool 6 FEMALE 8/29/2015 9/9/2015 9/29/2015
M150130-1 P Enteritidis JEGX01.0009 Confirmed Unconfirmed Dawson USA Human Stool 6 MALE 9/20/2015 9/28/2015 10/1/2015
C15-0445058 N Enteritidis JEGX01.0009 Confirmed Unconfirmed Charlotte USA Human Stool 5 MALE 9/2/2015 9/25/2015 10/9/2015
A122326 L Enteritidis JEGX01.0009 Confirmed Unconfirmed Gwinnett NYC USA Human Blood 88 FEMALE 9/30/2015 10/7/2015 10/15/2015
A151248 A Enteritidis JEGX01.0009 Confirmed Unconfirmed Atlanta USA Human Stool 37 MALE 10/4/2015 10/13/2015 10/21/2015
A125223 D Enteritidis JEGX01.0009 Confirmed Unconfirmed Hall L.A. USA Human Stool FEMALE 9/26/2015 10/14/2015 10/22/2015
[Figure: tree of related isolates (tip labels such as FDA00009433, PNUSAS000907 and 2015K-0962), shown beside the metadata table]
Definitive phylogenetically relevant naming of WGS profiles
“SNP Address”
Courtesy Tim Dallman, PHE
[Figure: hierarchical clustering of strains labeled with example SNP addresses 1.1.1, 1.2.2, 1.2.3, 1.2.4, 2.3.5 and 2.3.6]
• Hierarchical clustering based on the full pairwise distances between genomes
• Used to assign a SNP address to a strain based on a specified set of thresholds, e.g. 50:25:10:5:0
• Can be used for surveillance purposes
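The address construction described above can be sketched in a few lines. This is a minimal Python sketch, not PHE's implementation: it assumes a precomputed symmetric matrix of pairwise SNP distances, single-linkage clustering at each threshold (two strains share a cluster if a chain of distances ≤ the threshold links them), and cluster numbers assigned globally per level in order of each cluster's first strain. The thresholds 50:25:10:5:0 come from the slide; the function names and the distance matrix are hypothetical.

```python
from typing import Dict, List, Sequence


def single_linkage_labels(dist: Sequence[Sequence[int]], cutoff: int) -> List[int]:
    """Cluster number for each strain at one threshold (single linkage).

    Two strains share a cluster if they are linked by a chain of pairwise
    distances <= cutoff. Clusters are numbered 1, 2, ... in order of their
    first strain, so numbering is global within a level."""
    n = len(dist)
    parent = list(range(n))  # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)

    numbering: Dict[int, int] = {}
    labels = []
    for i in range(n):
        root = find(i)
        numbering.setdefault(root, len(numbering) + 1)
        labels.append(numbering[root])
    return labels


def snp_addresses(dist: Sequence[Sequence[int]],
                  thresholds: Sequence[int] = (50, 25, 10, 5, 0)) -> List[str]:
    """Dot-separated address per strain, one cluster number per threshold."""
    levels = [single_linkage_labels(dist, t) for t in thresholds]
    return [".".join(str(level[i]) for level in levels) for i in range(len(dist))]


# Hypothetical pairwise SNP distances between four strains.
snps = [
    [0, 3, 30, 60],
    [3, 0, 28, 58],
    [30, 28, 0, 55],
    [60, 58, 55, 0],
]
print(snp_addresses(snps))  # ['1.1.1.1.1', '1.1.1.1.2', '1.2.2.2.3', '2.3.3.3.4']
```

Strains that share a longer prefix are closer: the first two strains agree down to the 5-SNP level and split only at 0.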
PulseNet International will use MLST: “Allele Code”
Considerations for a phylogenetically relevant strain nomenclature system
• Must be simple
– Sequence of numbers
• Stability of system
– Fit new sequences into an existing tree?
– Recalculate the clusters with every new entry?
• No matter which method is used, stability can be controlled
• < 2% risk that a new sequence cannot be fitted unambiguously into the nomenclature system
• Cutoffs between levels
• Clustering algorithm
– Single linkage? UPGMA?
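The choice of clustering algorithm matters because the two methods can split the same strains differently at the same cutoff. A small pure-Python sketch (hypothetical helper and illustrative distances) that stops agglomerative clustering at a fixed cutoff, using either single linkage (`min`, nearest members) or UPGMA (`statistics.mean`, the unweighted average of all cross-cluster distances):

```python
from itertools import combinations
from statistics import mean
from typing import Callable, FrozenSet, List, Sequence

Linkage = Callable[[List[float]], float]


def flat_clusters(dist: Sequence[Sequence[float]], cutoff: float,
                  linkage: Linkage) -> List[FrozenSet[int]]:
    """Agglomerative clustering stopped at `cutoff`.

    linkage=min gives single linkage; linkage=statistics.mean gives UPGMA
    (unweighted average of all cross-cluster strain distances)."""

    def between(a: FrozenSet[int], b: FrozenSet[int]) -> float:
        return linkage([dist[i][j] for i in a for j in b])

    clusters = [frozenset([i]) for i in range(len(dist))]
    while len(clusters) > 1:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: between(clusters[p[0]], clusters[p[1]]))
        if between(clusters[x], clusters[y]) > cutoff:
            break  # the closest pair is already too far apart
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters, key=min)


# Hypothetical allele distances: a chain 0-1-2-3 of close neighbours.
d = [
    [0, 8, 15, 20],
    [8, 0, 9, 16],
    [15, 9, 0, 8],
    [20, 16, 8, 0],
]
print(flat_clusters(d, 10, min))   # single linkage chains all four together
print(flat_clusters(d, 10, mean))  # UPGMA keeps {0, 1} and {2, 3} apart
```

Single linkage joins the two pairs through the single 9-difference link between strains 1 and 2, while UPGMA sees an average of 15 between the pairs and keeps them separate. Chaining makes single linkage names more sensitive to new "bridging" strains.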
WGS Data Workflow
• Sequencer: raw sequences; LIMS
• Temporary storage, QA/QC, data extraction: trimming, mapping, de novo assembly, SNP detection, allele detection (NO metadata)
• Allele & Allele code databases: allele names, Allele code (strain names) (NO metadata)
• Public health databases: extensive metadata; database managers and end users
• External storage: NCBI, ENA (limited metadata)
• Typing schemes: 7-gene MLST, cgMLST, wgMLST (SNPs); profile names: allelic profile, ST, Allele Code
Acknowledgements
Public Health Agency of Canada
Institut Pasteur, S. Brisse, M. Lecuit
Center for Genomic Epidemiology, DTU
University of Oxford, M. Maiden, K. Jolley
Public Health England, T. Dallman
Disclaimers:
“The findings and conclusions in this presentation are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention.”
“Use of trade names is for identification only and does not imply endorsement by the Centers for Disease Control and Prevention or by the U.S. Department of Health and Human Services.”
Hierarchical Nomenclature Is Inherently Unstable
• Because we use approximate matching to group strains, equality is no longer transitive.
Given strains A, B and C with the distances indicated, at a distance cutoff of 21, A, B and C would be in the same cluster, because B lies within 21 of both A and C.
• However, if B has not been sampled yet, A and C would not be in the same cluster.
• How bad is it?
[Figure: strains A, B and C; B is 13 and 17 away from A and C, while A and C are 28 apart]
Courtesy: Hannes Pouseele, Applied Maths
Cutoff determination (case: PulseNet Listeria cgMLST database, N = 3,652)
Test procedure: starting from an empty database and adding strains chronologically, find the cutoff points that cause the fewest name changes
Thresholds: 150:100:63:41:21:11
Courtesy: Hannes Pouseele, Applied Maths
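The test procedure above can be sketched for a single threshold. How Applied Maths counted a "name change" is not stated on the slide, so this Python sketch assumes single linkage, merged clusters keeping the oldest (lowest) name, and one name change per already-named strain that loses its name. The distance matrix must be ordered by date of arrival; `name_change_rate` is a hypothetical helper:

```python
from typing import List, Sequence


def name_change_rate(dist: Sequence[Sequence[int]], cutoff: int) -> float:
    """Share of strains renamed while strains are added in index order.

    Assumed rules (hypothetical, not Applied Maths' published procedure):
    - a new strain joins every existing cluster that has a member within `cutoff`;
    - if it bridges several clusters, they merge under the oldest (lowest) name;
    - each already-named strain that loses its name counts as one name change."""
    names: List[int] = []  # current cluster name of every strain added so far
    next_name = 1
    changes = 0
    for new in range(len(dist)):
        linked = sorted({names[old] for old in range(new) if dist[new][old] <= cutoff})
        if not linked:
            names.append(next_name)  # a brand-new cluster
            next_name += 1
            continue
        keep, absorbed = linked[0], set(linked[1:])
        for old in range(new):
            if names[old] in absorbed:
                names[old] = keep
                changes += 1
        names.append(keep)
    return changes / len(dist) if dist else 0.0


# Same three strains (A, B, C from the previous slide), two arrival orders.
a_c_b = [[0, 28, 13], [28, 0, 17], [13, 17, 0]]  # B arrives last and merges two clusters
a_b_c = [[0, 13, 28], [13, 0, 17], [28, 17, 0]]  # B arrives before C: no renaming
print(name_change_rate(a_c_b, 21), name_change_rate(a_b_c, 21))
```

Running the function once per threshold in 150:100:63:41:21:11 yields one percentage per level, comparable in form to the Stability Assessment table.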
Stability Assessment
• Test 1: starting from nothing, add samples chronologically
• Test 2: starting from a random subset (50%), add samples chronologically
• Starting from a precalculated strain nomenclature structure based on what is known today reduces nomenclature instability beyond what is expected (in this case, a 50% reduction)
• The 21 allelic changes cutoff might not be stable enough
Threshold | Name changes, Test 1 | Name changes, Test 2
11 | 1.01% | 0.30%
21 | 2.51% | 1.64%
41 | 2.51% | 0.57%
63 | 1.37% | 0.27%
100 | 2.52% | 0.03%
150 | 0.22% | 0%
Courtesy: Hannes Pouseele, Applied Maths
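A quick check of the table, reading "beyond what is expected" as a reduction in name changes of more than 50% from Test 1 to Test 2 (our interpretation of the slide). The percentages are copied from the table; `relative_reduction` is a hypothetical helper:

```python
# Percentage of name changes per threshold, from the table above: (Test 1, Test 2).
RESULTS = {
    11: (1.01, 0.30),
    21: (2.51, 1.64),
    41: (2.51, 0.57),
    63: (1.37, 0.27),
    100: (2.52, 0.03),
    150: (0.22, 0.0),
}


def relative_reduction(test1: float, test2: float) -> float:
    """Fraction by which name changes drop from Test 1 to Test 2."""
    return 1 - test2 / test1


for threshold, (t1, t2) in sorted(RESULTS.items()):
    r = relative_reduction(t1, t2)
    note = "" if r > 0.5 else "  (below a 50% reduction)"
    print(f"{threshold:>3}: {r:.0%}{note}")
```

Every threshold except 21 more than halves its name changes; at 21 they drop by only about 35%, consistent with the caution about that cutoff.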
Stability Assessment
Conclusions: MLST-based hierarchical strain nomenclature is feasible
• Stability is good
– Without the 21 allelic changes cutoff, fewer than 1.17% name changes
• Stability can be further increased by defining a broad starting set
– Using a more international collection of strains
– Using biological knowledge about the population structure of L. monocytogenes
• Computational feasibility
– Names can be assigned one sample at a time, with no need for complete recalculations
• wgMLST instead of cgMLST yields extremely similar results
Courtesy: Hannes Pouseele, Applied Maths
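Assigning names one sample at a time only needs the distances from the new sample to the samples already named, not a full recalculation. A minimal Python sketch under stated assumptions: hierarchical codes are tuples with one cluster number per threshold (150:100:63:41:21:11 from the cutoff slide), and at each level the new sample copies the number of its nearest named neighbour that shares the code so far and lies within the cutoff, otherwise it opens a new number. The rule and `assign_code` are hypothetical, not the PulseNet Allele Code algorithm.

```python
from typing import Dict, List, Sequence, Tuple

Code = Tuple[int, ...]
THRESHOLDS = (150, 100, 63, 41, 21, 11)  # allele-difference cutoffs from the slides


def assign_code(new_dists: Dict[str, int], named: Dict[str, Code],
                thresholds: Sequence[int] = THRESHOLDS) -> Code:
    """Name one new sample from its distances to already-named samples.

    Hypothetical rule: at each level, copy the cluster number of the nearest
    named sample that shares the code so far and lies within the cutoff;
    otherwise open a new cluster number at that level."""
    code: List[int] = []
    for level, cutoff in enumerate(thresholds):
        prefix = tuple(code)
        candidates = [s for s, c in named.items()
                      if c[:level] == prefix and new_dists[s] <= cutoff]
        if candidates:
            nearest = min(candidates, key=lambda s: new_dists[s])
            code.append(named[nearest][level])
        else:
            code.append(max((c[level] for c in named.values()), default=0) + 1)
    return tuple(code)


named = {"S1": (1, 1, 1, 1, 1, 1), "S2": (1, 1, 1, 1, 2, 2)}
print(assign_code({"S1": 30, "S2": 5}, named))   # joins S2's clusters at every level
print(assign_code({"S1": 30, "S2": 25}, named))  # new cluster numbers at the 21 and 11 levels
```

This rule never renames existing samples; a new sample within the cutoff of two differently named clusters simply takes the nearest one, which is the ambiguous case the "< 2% risk" consideration refers to.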