2015 06-12-beiko-irida-big data
1
2
“All of your answers are approximate, you might as well live with it…”
Andrew Rau-Chaplin, 1½ hours ago
Integrated Rapid Infectious Disease Analysis
www.irida.ca
Rob Beiko
Faculty of Computer Science
Dalhousie University
June 12
Microbial genomics for rapid investigation of infectious disease
Image © Kenneth Todar
4
2009 and Influenza A
5
6
7
VIRUS: Influenza A
RNA genome (14,000 nucleotides), eight segments
(Image: Tao and Zheng, Science 2012)
BACTERIUM: S. Typhi CT18
DNA genome (~5,100,000 nucleotides), one chromosome + two plasmids
(Science, 2001)
8
Outbreak investigation
Similarities: place, time, genetics
fda.gov
2014
2010-2013
Inns et al. (2015)
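The three similarity axes on this slide (place, time, genetics) can be combined into a simple case-linking rule. The following is a minimal sketch, assuming each case record carries a location, a collection date, and a genotype label (for example a PFGE pattern name); the `Case` fields and the 7-day window are hypothetical choices, and real investigations weigh these signals with far more judgment. Linked pairs are merged into clusters with a union-find.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Case:
    case_id: str
    location: str
    collected: date
    genotype: str  # hypothetical: e.g. a PFGE pattern name


def linked(a: Case, b: Case, max_days: int = 7) -> bool:
    """Two cases are linked only if they agree on place, time, and genetics."""
    same_place = a.location == b.location
    close_in_time = abs((a.collected - b.collected).days) <= max_days
    same_genotype = a.genotype == b.genotype
    return same_place and close_in_time and same_genotype


def cluster_cases(cases: list[Case], max_days: int = 7) -> list[set[str]]:
    """Clusters are connected components of the 'linked' relation (union-find)."""
    parent = {c.case_id: c.case_id for c in cases}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(cases, 2):
        if linked(a, b, max_days):
            parent[find(a.case_id)] = find(b.case_id)

    groups: dict[str, set[str]] = {}
    for c in cases:
        groups.setdefault(find(c.case_id), set()).add(c.case_id)
    return list(groups.values())


if __name__ == "__main__":
    demo = [
        Case("1", "Vancouver", date(2014, 3, 21), "X1"),
        Case("2", "Vancouver", date(2014, 3, 23), "X1"),
        Case("3", "Victoria", date(2014, 3, 22), "X1"),
    ]
    print(cluster_cases(demo))
```

Requiring all three criteria at once is deliberately strict; relaxing the rule to "any two of three" is a one-line change in `linked`.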
9
Outbreak investigation in Canada
NATIONAL MICROBIOLOGY LABORATORY
PROVINCIAL PUBLIC HEALTH LABORATORIES
CLINICAL ISOLATES
SENTINEL SURVEILLANCE (FoodNet Canada)
CLINICAL, FOOD, ENVIRONMENTAL
CANADIAN FOOD INSPECTION AGENCY (Regulatory)
FOOD ISOLATES
LISTERIA - E. COLI O157:H7 - SALMONELLA - SHIGELLA
PFGE/MLVA
PUBLIC HEALTH ACTION
10
Pulsed Field Gel Electrophoresis: Serratia, NICU
Gel lanes: Hospital cases; Handwashes; Environmental (doors, etc.); Control (elsewhere in hospital)
Jang et al., J Hosp Infect (2001)
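PFGE isolates are compared by their band patterns; a common choice is a band-based Dice coefficient with a size tolerance, followed by clustering. The sketch below assumes band sizes are already measured (e.g. in kilobases) and uses an assumed 1.5% relative tolerance; the greedy band pairing and the function names are illustrative and do not reproduce the analysis in Jang et al. (2001).

```python
from typing import Sequence


def count_matching_bands(a: Sequence[float], b: Sequence[float],
                         tolerance: float = 0.015) -> int:
    """Greedily pair each band in `a` with the closest unused band in `b`
    whose size differs by at most `tolerance` (a fraction of the larger band)."""
    unused = sorted(b)
    matches = 0
    for size in sorted(a):
        within = [c for c in unused if abs(c - size) <= tolerance * max(c, size)]
        if within:
            closest = min(within, key=lambda c: abs(c - size))
            unused.remove(closest)  # each band can be matched at most once
            matches += 1
    return matches


def dice_similarity(a: Sequence[float], b: Sequence[float],
                    tolerance: float = 0.015) -> float:
    """Dice coefficient: 2 * shared bands / (bands in a + bands in b)."""
    if not a and not b:
        return 1.0
    return 2 * count_matching_bands(a, b, tolerance) / (len(a) + len(b))


if __name__ == "__main__":
    case_lane = [50, 100, 200, 400]
    handwash_lane = [50.5, 100, 205, 400]
    print(dice_similarity(case_lane, handwash_lane))
```

Greedy pairing is simple but not guaranteed optimal when many bands crowd together; a bipartite matching would be the stricter choice.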
11
DNA Sequencing
1977: Sanger sequencing
< 1 kilobase per run; $2 / run, 1-3 hours (96 in parallel)
Tiny pieces (600 – 1000 bases)
2011: Illumina MiSeq
15 gigabases per run; $1000 - $1500 / run, 1 day
Tinier pieces (150 – 400 bases)
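A back-of-the-envelope calculation ties these run figures to the genome sizes on the virus/bacterium slide. Assuming (hypothetically) a target of 50x mean coverage and that all 15 gigabases of a MiSeq run are usable, one run covers about 58 genomes the size of S. Typhi CT18 (~5.1 Mb), or roughly $17 to $26 per genome at the slide's $1000 to $1500 run cost. Real yields after quality filtering and uneven barcode balance will be lower.

```python
def genomes_per_run(run_yield_bases: float, genome_size_bases: float,
                    coverage: float) -> int:
    """Whole genomes that fit in one run at the requested mean coverage."""
    return int(run_yield_bases // (genome_size_bases * coverage))


def cost_per_genome(run_cost: float, n_genomes: int) -> float:
    return run_cost / n_genomes


MISEQ_YIELD = 15e9     # 15 gigabases per run (slide figure)
TYPHI_GENOME = 5.1e6   # ~5.1 Mb S. Typhi CT18 genome (slide figure)
TARGET_COVERAGE = 50   # assumption, not taken from the slides

n = genomes_per_run(MISEQ_YIELD, TYPHI_GENOME, TARGET_COVERAGE)
low, high = cost_per_genome(1000, n), cost_per_genome(1500, n)
print(f"{n} genomes per run, ${low:.0f} to ${high:.0f} per genome")
```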
10/10/2013 VanBUG 12
13
MiSeq projects at Dalhousie
• Bedford Basin microbial monitoring
• Pediatric Crohn’s disease samples
• Global microbial air sampling
• Mink genomes
• Sequencing Lactobacillus genomes from the poop of old mice
• Wastewater diversity and function in the Arctic
• Verifying ingredients in dog food
• Exercise and the Microbiome
14
Integrated Rapid Infectious Disease Analysis
www.irida.ca
1.56M, 3-year Genome Canada Large-Scale Applied Platform Grant
SFU / BCCDC / PHAC-NML / Dalhousie
DNA sequencing and downstream applications
• data management / federation
• analysis workflows
• ontologies
• APIs
• 3rd-party applications
Implementation in provincial public health labs
Training
15
Five Pillars of IRIDA
16
Ontologies and data standards
NCBI, MIxS, vegetables
• Metadata
• Data provenance
• Data quality
• Environmental information
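A sample record that carries metadata, provenance, and a basic quality check can be sketched as a small data structure. In this minimal sketch the field names are loosely modeled on MIxS-style terms (`collection_date`, `geo_loc_name`); the required-field set, the `provenance` list, and the validation rule are hypothetical illustrations, not IRIDA's data model or an official MIxS checklist.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class SampleMetadata:
    """Hypothetical sample record; field names loosely follow MIxS-style terms."""
    sample_id: str
    organism: str
    collection_date: Optional[date] = None
    geo_loc_name: Optional[str] = None       # e.g. "Canada: Vancouver"
    isolation_source: Optional[str] = None   # environmental information
    provenance: list[str] = field(default_factory=list)  # processing steps applied

    # Hypothetical minimal checklist (a plain class attribute, not a dataclass field)
    REQUIRED = ("collection_date", "geo_loc_name")

    def missing_fields(self) -> list[str]:
        """Data-quality check: which required fields are still empty."""
        return [name for name in self.REQUIRED if getattr(self, name) is None]

    def record_step(self, step: str) -> None:
        """Append a processing step so the record keeps its own provenance."""
        self.provenance.append(step)
```

Keeping provenance on the record itself means a downstream user can see how a sample was processed without consulting a separate log.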
17
Data sharing!
• BIG challenges – different jurisdictions, “ownership” of epi data. Privacy!
• Health service providers – concerns about privacy and data breach
• Technology outstrips policy
• What digital records could we get TODAY?
• Canada lagging in data sharing
18
Calling isolates based on genetic variation
Traditional: Pulsed-field, Multi-locus (standards! mlst.net)
Whole genomes: Lots of information! Too much information! Lots of filtering and quality control required
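Multi-locus sequence typing reduces each isolate to a profile of allele numbers at a fixed set of housekeeping loci, and a shared database (such as those at mlst.net) maps each profile to a sequence type (ST). In this minimal sketch the seven locus names follow the widely used Salmonella enterica scheme, but the allele numbers and the ST table are made up for illustration. Counting allele differences also hints at why whole genomes give finer resolution: two isolates can share all seven alleles yet still differ elsewhere in the genome.

```python
LOCI = ("aroC", "dnaN", "hemD", "hisD", "purE", "sucA", "thrA")

# Hypothetical ST table: allele profile -> sequence type label
ST_TABLE = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 2, 1, 1, 1, 1): "ST-B",
}


def profile(alleles: dict[str, int]) -> tuple[int, ...]:
    """Allele numbers in scheme order; raises KeyError if a locus is missing."""
    return tuple(alleles[locus] for locus in LOCI)


def sequence_type(alleles: dict[str, int]) -> str:
    """Look up the ST; unseen profiles would be submitted as new types."""
    return ST_TABLE.get(profile(alleles), "novel")


def allele_differences(a: dict[str, int], b: dict[str, int]) -> int:
    """Number of loci at which two isolates carry different alleles."""
    return sum(x != y for x, y in zip(profile(a), profile(b)))
```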
19
Workflow management
REST-like API (3rd-party applications)
Security: authentication / authorization
Data models & implementation
20
IRIDA’s Federated Design
List Samples
Local Storage
Remote APIs
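In a federated design, a single "List Samples" request can draw on local storage and on remote instances reached through their APIs. This is a minimal sketch, assuming each source exposes a `list_samples()` method; the `SampleSource` protocol, the source names, and the first-source-wins de-duplication rule are hypothetical, and a real remote source would wrap an authenticated REST call rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Sample:
    sample_id: str
    origin: str  # name of the source that reported this sample


class SampleSource(Protocol):
    name: str

    def list_samples(self) -> list[Sample]: ...


class LocalStorage:
    """Samples held in this instance's own storage."""

    def __init__(self, sample_ids: Iterable[str]):
        self.name = "local"
        self._ids = list(sample_ids)

    def list_samples(self) -> list[Sample]:
        return [Sample(i, self.name) for i in self._ids]


class RemoteAPI:
    """Stand-in for a remote instance; a real one would make authenticated HTTP calls."""

    def __init__(self, name: str, sample_ids: Iterable[str], reachable: bool = True):
        self.name = name
        self._ids = list(sample_ids)
        self.reachable = reachable

    def list_samples(self) -> list[Sample]:
        if not self.reachable:
            raise ConnectionError(f"{self.name} unavailable")
        return [Sample(i, self.name) for i in self._ids]


def federated_list(sources: Iterable[SampleSource]) -> tuple[list[Sample], list[str]]:
    """Merge samples from all sources; the first source to report an ID wins."""
    merged: dict[str, Sample] = {}
    errors: list[str] = []
    for source in sources:
        try:
            samples = source.list_samples()
        except ConnectionError as exc:
            errors.append(str(exc))  # one site being down should not block the query
            continue
        for sample in samples:
            merged.setdefault(sample.sample_id, sample)
    return sorted(merged.values(), key=lambda s: s.sample_id), errors
```

Listing local storage first means local copies take precedence over remote duplicates, and unreachable sites are reported rather than failing the whole request.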
21
Each pipeline is implemented as a Galaxy workflow
Internal analysis pipelines
• Assembly and annotation
• Phylogenetics
• “Line list” management
3rd-party applications
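A Galaxy workflow is, at heart, a directed acyclic graph of tool steps. The sketch below uses Python's standard-library `graphlib` to order the steps of a hypothetical assembly-and-phylogenetics pipeline so that each runs only after its inputs exist. The step names are illustrative, and this is not Galaxy's API: real pipelines would be defined and invoked through Galaxy itself.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step lists the steps whose outputs it needs.
PIPELINE: dict[str, set[str]] = {
    "quality_control": {"raw_reads"},
    "assembly": {"quality_control"},
    "annotation": {"assembly"},
    "snv_calling": {"quality_control"},
    "phylogeny": {"snv_calling"},
}


def execution_order(pipeline: dict[str, set[str]]) -> list[str]:
    """Order steps so each runs after all of its inputs; raises CycleError on loops."""
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    for step in execution_order(PIPELINE):
        print(step)
```

Independent branches (here, annotation and SNV calling) have no ordering constraint between them, which is what lets a workflow engine run them in parallel.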
22
Single-Nucleotide Variant Phylogenetic Pipeline (SNVPhyl)
• Sampled genomes
• Quality control
• Tree generation / visualization
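The core idea behind a SNV-based phylogeny is a set of pairwise SNV distances computed only over positions that pass quality control. The following minimal sketch assumes the genomes are already aligned to a common reference and drops any column containing an `N` or gap in any genome. The masking rule and function names are illustrative assumptions; SNVPhyl's actual filtering and tree-building steps differ.

```python
from itertools import combinations


def core_positions(alignment: dict[str, str]) -> list[int]:
    """Columns where every genome has an unambiguous base (A, C, G or T)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    (length,) = lengths
    return [i for i in range(length)
            if all(seq[i] in "ACGT" for seq in alignment.values())]


def snv_distances(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise count of differing bases over the quality-filtered core positions."""
    keep = core_positions(alignment)
    dist: dict[tuple[str, str], int] = {}
    for a, b in combinations(sorted(alignment), 2):
        d = sum(alignment[a][i] != alignment[b][i] for i in keep)
        dist[(a, b)] = dist[(b, a)] = d  # symmetric lookup in either order
    return dist
```

A distance matrix like this can feed a tree-building method or a simple SNV threshold for calling isolates part of the same cluster.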
23
GenGIS
Data from Haiti cholera outbreak, 2010
http://kiwi.cs.dal.ca/GenGIS
24
IslandViewer
http://www.pathogenomics.sfu.ca/islandviewer/browse
25
Interfaces / environment
Personas:
• Researchers
• Epidemiologists
• Clinical microbiologists / lab technicians
Workflow design and execution
Full Privileges

| Cluster | Line List ID | Patient Name | Prov. Health No. | Age | Sex | Location | Sample ID | Collection Date | Culture Result |
|---|---|---|---|---|---|---|---|---|---|
| A | 1 | John Smith | 4513253244 | 26 | M | Vancouver | F14231 | 14/03/21 | Salmonella sp. |
| A | 2 | Sally Smith | 4519567458 | 24 | F | Vancouver | F14235 | 14/03/21 | Salmonella sp. |
| B | 3 | Tom Jones | 4517543216 | 35 | M | Vancouver | M6542 | 14/03/24 | Salmonella sp. |
| B | 4 | Helen Jones | 9856321124 | 35 | F | Vancouver | S1245 | 14/03/22 | Salmonella sp. |
| C | 5 | Jennifer Lee | 4516853122 | 29 | F | Vancouver | S5642 | 14/03/22 | Salmonella sp. |
| C | 6 | Michael Brown | 9456534561 | 45 | M | Victoria | T68954 | 14/03/25 | Salmonella sp. |

Phylogenetic Tree
Genetic Distance
Limited Privileges

| Cluster | Line List ID | Patient Name | Prov. Health No. | Age | Sex | Location | Sample ID | Collection Date | Culture Result |
|---|---|---|---|---|---|---|---|---|---|
| A | 1 | John Smith | 4513253244 | 26 | M | Vancouver | F14231 | 14/03/21 | Salmonella sp. |
| A | 2 | Sally Smith | 4519567458 | 24 | F | Vancouver | F14235 | 14/03/21 | Salmonella sp. |
| B | 3 | Tom Jones | 4517543216 | 35 | M | Vancouver | M6542 | 14/03/24 | Salmonella sp. |
| B | 4 | Helen Jones | 9856321124 | 35 | F | Vancouver | S1245 | 14/03/22 | Salmonella sp. |
| C | 5 | Jennifer Lee | 4516853122 | 29 | F | Vancouver | S5642 | 14/03/22 | Salmonella sp. |
| C | 6 | Michael Brown | 9456534561 | 45 | M | Victoria | T68954 | 14/03/25 | Salmonella sp. |

Phylogenetic Tree
Genetic Distance
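The full and limited views of the line list differ only in what the viewer is authorized to see. This minimal sketch of role-based redaction assumes, hypothetically, that limited-privilege users lose the direct identifiers (patient name and provincial health number) while keeping the epidemiological and laboratory columns. The role names, column keys, and hidden-column set are illustrative, not IRIDA's actual access policy.

```python
from enum import Enum


class Privilege(Enum):
    FULL = "full"
    LIMITED = "limited"


# Hypothetical policy: which line-list columns each privilege level may not see.
HIDDEN_COLUMNS = {
    Privilege.FULL: frozenset(),
    Privilege.LIMITED: frozenset({"patient_name", "health_no"}),
}


def view_line_list(rows: list[dict[str, str]], privilege: Privilege,
                   mask: str = "[redacted]") -> list[dict[str, str]]:
    """Return a copy of the line list with columns hidden according to privilege."""
    hidden = HIDDEN_COLUMNS[privilege]
    return [{col: (mask if col in hidden else value) for col, value in row.items()}
            for row in rows]
```

In a real deployment the redaction would be applied on the server before data leaves it, so a limited-privilege client never receives the hidden values at all.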
28
Large-scale sequencing initiatives
en.wikipedia.org
29
FDA GenomeTrakr
http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/ucm363134.htm
30
Public Health England project (>10,000 Salmonella so far)
• As of 2015, sequencing every Salmonella isolate collected in England
• Over 10,000 sequenced to date
• 8000 already available for download in the public databases
31
Gary van Domselaar, NML
The Global Microbial Identifier
32
What’s next?
2015: Oxford Nanopore MinION
??? per run; $900 / run, 6 hours
Huge pieces (max so far – 200-300 kilobases)
Can stop / restart using same disposable flowcell
15 cm (-ish)
thehightechsociety.com
33
Quick et al. (2015)
“Using a novel streaming phylogenetic placement method samples can be assigned to a serotype in 40 minutes and determined to be part of the outbreak in less than 2 h.”
34
Ebola monitoring
blogs.biomedcentral.com
Joshua Quick, Nick Loman
35
Example workflow
Load DNA
Samples evaluated against reference in real time
Positive ID / placement (confidence)
Change flowcell (6 hrs)
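The workflow evaluates reads against references as they arrive and stops once an identification is confident enough. This minimal sketch of such a stopping rule assumes that an upstream aligner has already reported, for each read, which candidate references (e.g. serotypes) it is consistent with. The per-read error rate of 0.05, the 0.99 threshold, and the simple likelihood model are hypothetical, and this is not the streaming phylogenetic placement method of Quick et al. (2015).

```python
import math
from typing import Iterable, Optional


def streaming_assign(
    read_hits: Iterable[set[str]],
    references: list[str],
    err: float = 0.05,
    threshold: float = 0.99,
) -> tuple[Optional[str], int, float]:
    """Update a posterior over references read by read; stop when one is confident.

    Each element of read_hits is the set of references a read is consistent with.
    A read supports a reference with probability 1 - err, otherwise err.
    Returns (best reference or None, reads consumed, posterior of the best reference).
    """
    log_like = {ref: 0.0 for ref in references}  # uniform prior
    posterior, n_reads = 1.0 / len(references), 0
    for hits in read_hits:
        n_reads += 1
        for ref in references:
            log_like[ref] += math.log(1 - err) if ref in hits else math.log(err)
        # Normalize with the log-sum-exp trick to avoid underflow on long runs.
        top = max(log_like.values())
        total = sum(math.exp(v - top) for v in log_like.values())
        best = max(log_like, key=log_like.get)
        posterior = math.exp(log_like[best] - top) / total
        if posterior >= threshold:
            return best, n_reads, posterior  # confident: stop consuming reads
    return None, n_reads, posterior
```

Because the function consumes an iterable, it can be fed from a generator that yields results while the run is still in progress, which is what makes an early "positive ID" possible.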
36
Challenges
• Sample extraction: getting DNA from stuff
• Clinical-grade evaluation
• Training
• Equipment reliability
• Sequencing errors
• Quality of reference data / attribution algorithms
• Database updates in real time
• Ethics / privacy (Genomes Sequenced While U Wait)
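Sequencing errors are usually reported per base as Phred quality scores, Q = -10 log10(p), where p is the probability that the base call is wrong. This minimal sketch assumes FASTQ quality strings in the common Phred+33 encoding; the mean-quality cutoff of Q20 is a hypothetical filter setting rather than a recommendation.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer Q scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call is wrong: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_errors(quality: str, offset: int = 33) -> float:
    """Expected number of wrong base calls in a read (sum of per-base p)."""
    return sum(error_probability(q) for q in phred_scores(quality, offset))


def passes_filter(quality: str, min_mean_q: float = 20, offset: int = 33) -> bool:
    """Hypothetical read filter: keep reads whose mean Q is at least min_mean_q."""
    scores = phred_scores(quality, offset)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q
```

For example, the character `I` decodes to Q40 (a 1-in-10,000 error chance), while `#` decodes to Q2.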
37
The Point
Comprehensive monitoring
Accurate typing
Rapid identification
Real-time decision making
38
Acknowledgements
PIs: Fiona Brinkman – SFU; Will Hsiao – PHMRL; Gary Van Domselaar – NML; Morag Graham – NML; Rob Beiko – Dalhousie
University of Lisbon: João Carriço
National Microbiology Laboratory (NML): Franklin Bristow, Aaron Petkau, Thomas Matthews, Josh Adam, Adam Olsen, Tara Lynch, Shaun Tyler, Philip Mabon, Philip Au, Celine Nadon, Matthew Stuart-Edwards, Chrystal Berry, Lorelee Tschetter
Laboratory for Foodborne Zoonoses (LFZ): Eduardo Taboada, Peter Kruczkiewicz, Chad Laing, Vic Gannon, Matthew Whiteside, Ross Duncan, Steven Mutschall
Simon Fraser University (SFU): Melanie Courtot, Emma Griffiths, Geoff Winsor, Julie Shay, Matthew Laird, Bhav Dhillon, Raymond Lo
BC Public Health Microbiology & Reference Laboratory (PHMRL) and BC Centre for Disease Control (BCCDC): Judy Isaac-Renton, Patrick Tang, Natalie Prystajecky, Jennifer Gardy, Damion Dooley, Linda Hoang, Kim MacDonald, Yin Chang, Eleni Galanis, Marsha Taylor, Cletus D’Souza, Ana Paccagnella
University of Maryland: Lynn Schriml
Canadian Food Inspection Agency (CFIA): Burton Blais, Catherine Carrillo, Dominic Lambert
Dalhousie University: Alex Keddy
McMaster University: Andrew McArthur, Daim Sardar
European Nucleotide Archive: Guy Cochrane, Petra ten Hoopen, Clara Amid
European Food Safety Authority: Leibana Criado Ernesto, Vernazza Francesco, Rizzi Valentina
39
Seminar from Will Hsiao, BC Centre for Disease Control
40
Materials to be available on http://bioinformatics.ca/
June 24-26, 2015
41
The Bioinformatics Exam of the Future
tagc.com.au
commons.wikimedia.org/wiki/File:DNA_ahelatest_moodustunud_niit_katsuti_korgil..JPG
http://omicfrontiers.com/2014/06/11/diaryofaminion_part2/
42
2009 was a long time ago
J. Craig Venter Institute
43
Photo credit: Emma Allen-Vercoe
Some slides courtesy of Gary Van Domselaar, NML
FIN