ECCMID 2016 - How to build actionable virulome databases
Assessing virulence from genomic data - which virulome database?
João André Carriço, Microbiology Institute and Instituto de Medicina Molecular, Faculty of Medicine, University of [email protected] twitter: @jacarrico
Session SY024: Controversies in interpreting whole genome sequence data
26th ECCMID, Amsterdam, Netherlands, 7-12 April 2016
How can we design actionable virulome databases
What is a virulence factor?
Virulence Factors: a class of gene products that
• Help pathogens to invade the host and evade specific host defensive mechanisms
• Enhance the pathogen's potential to cause disease
What is a virulence factor?
Virulence Factors (examples):
• Bacterial toxins (Endotoxins and Exotoxins)
• Adherence factors (Pili)
• Cell surface carbohydrates and proteins that protect a bacterium (Streptococcal M Protein)
• Hydrolytic enzymes that may contribute to the pathogenicity of the bacterium (hyaluronidase)
• Factors to compete with host nutrient uptake (Siderophores)
Sources: VFDB / Medical Microbiology. 4th edition. (http://www.ncbi.nlm.nih.gov/books/NBK7627/)
Too much -ome will kill you…
[Figure labels: Virulome, Core genome, Accessory genome, Mobilome]
"Virulome" Databases
• VFDB (http://www.mgc.ac.cn/VFs/main.htm)
• Pathosystems Resource Integration Center (PATRIC) VF (https://www.patricbrc.org/)
• Victors (http://www.phidias.us/victors/)
• PHI-base (http://www.phi-base.org/)
• MvirDB (http://mvirdb.llnl.gov/)
Criteria for choice: databases focused mainly on virulence factors (as defined in the first slide). This excludes antibiotic resistance databases (CARD, ARDB, ARGO, RAC, …)
VFDB
* Created to facilitate the screening of HTS data
Database last update: Tue Feb 23 22:05:25 2016
PATRIC VF (Pathosystems Resource Integration Center)
• 6 NIAID priority genera:
  • Mycobacterium
  • Salmonella
  • Escherichia
  • Shigella
  • Listeria
  • Bartonella
• 1572 VFs
• 1071 articles
• Use of controlled vocabulary
• Integrates VFDB and Victors VF information
• PATRIC supports:
  • Genome annotation
  • Comparative Genomics
  • Transcriptomics
  • Pathways
  • Host-pathogen interaction
  • Disease-related information
• Database last update: March 2016
Victors
• 5177 Virulence Factors
• 126 Pathogens:

Class            # species   # VFs
Gram-positive    15          1160
Gram-negative    36          3488
Virus            54          179
Parasites        13          105
Fungi            8           245

• Last DB update: 27/8/2014
PHI-base
• Pathogenicity, virulence and effector genes from:
  • Fungal pathogens
  • Oomycete pathogens
  • Bacterial pathogens
• Hosts:
  • Animal
  • Plant
  • Fungal
  • Insect
mVirDB
• Biodefense focused
• Last update 2007??
• Data still available for download
Greatest strengths
All the databases have:
• manually curated data
• links to the original publication
However, manual curation is also a major caveat, because the process is hard to sustain
How to use these resources
• Querying annotation in the website
• Selecting species of interest and browsing the website
• BLAST query for DNA or Protein
How to use these resources
• Download the gene/protein databases and use them as templates for searching your own data
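As a rough illustration of this workflow, the sketch below assumes the downloaded database is a FASTA file of virulence gene nucleotide sequences and reports which of them occur in one's own contigs, on either strand. Exact substring matching is used here only to keep the example self-contained; in practice an alignment tool such as BLAST would take its place. All function names, gene names and sequences are hypothetical toy values.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {identifier: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the identifier.
            header = line[1:].split()[0]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_exact_hits(vf_genes, contigs):
    """List (gene, contig, strand) for every exact occurrence of a gene."""
    hits = []
    for gene, gene_seq in vf_genes.items():
        rc_seq = reverse_complement(gene_seq)
        for contig, contig_seq in contigs.items():
            if gene_seq in contig_seq:
                hits.append((gene, contig, "+"))
            elif rc_seq in contig_seq:
                hits.append((gene, contig, "-"))
    return hits


# Toy data standing in for a downloaded VF database and an assembly.
VF_FASTA = ">toyA hypothetical toxin\nATGAAACCC\n>toyB hypothetical adhesin\nATGGGGTTTTAA\n"
CONTIG_FASTA = ">contig1\nTTTATGAAACCCGG\n>contig2\nCCTTAAAACCCCATGG\n"

HITS = find_exact_hits(parse_fasta(VF_FASTA), parse_fasta(CONTIG_FASTA))
```

Exact matching misses allelic variants, which is why a similarity search with identity and coverage thresholds is the usual choice for real data.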
How to use these resources
MVLST/MLST-v
How to use these resources
With HTS, several core genome / whole genome MLST schemas are becoming available or being developed:
• Neisseria sp.
• Campylobacter sp.
• Staphylococcus aureus
• Legionella pneumophila
• Listeria monocytogenes
• Enterococcus faecium
• Mycobacterium tuberculosis
• Acinetobacter baumannii
• Salmonella enterica
• E. coli
• …
Loci in these schemas can be annotated / linked to the Virulence Factor DBs for automatic allele annotation through these systems
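One way to picture this linking step is a lookup table from schema loci to virulence factor entries, applied to an isolate's allele profile. The sketch below is only an assumption about how such a link could be represented; the locus names, VF identifiers and field names are hypothetical and do not come from any of the systems listed.

```python
def annotate_profile(allele_profile, locus_to_vf):
    """Attach VF annotation (if any) to each locus of an allele profile."""
    annotated = []
    for locus, allele in sorted(allele_profile.items()):
        vf = locus_to_vf.get(locus)  # None when the locus has no VF link
        annotated.append({
            "locus": locus,
            "allele": allele,
            "vf_id": vf["vf_id"] if vf else None,
            "vf_name": vf["name"] if vf else None,
        })
    return annotated


def virulence_loci(annotated):
    """Return the loci of a profile that are linked to a VF entry."""
    return [row["locus"] for row in annotated if row["vf_id"] is not None]


# Hypothetical link table: schema locus -> entry in a VF database.
LOCUS_TO_VF = {
    "LOCUS_0002": {"vf_id": "VF_EXAMPLE_1", "name": "hypothetical toxin"},
}

# Hypothetical cg/wgMLST allele calls for one isolate.
PROFILE = {"LOCUS_0001": 12, "LOCUS_0002": 3, "LOCUS_0003": 7}

ANNOTATED = annotate_profile(PROFILE, LOCUS_TO_VF)
```

Because the link is made once at the schema level, every new allele called at an annotated locus inherits the virulence factor annotation automatically.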
Seqsphere+
http://pubmlst.org/
http://bigsdb.web.pasteur.fr/
https://enterobase.warwick.ac.uk/
Bionumerics 7.5
Back to the title
So far we have seen what is available
How can we design actionable virulome databases?
Actionable: able to be done or acted on; having practical value (New Oxford American Dictionary)
Bioinformatics needs
Available databases still lack interfaces for programmatic access. RESTful APIs would allow:
▪ easy automatic querying from scripts without the need for web interfaces or downloads
▪ database updates by authorized groups (distributed curation effort)
APIs: Application Programming Interfaces
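To make the idea of scripted querying concrete, here is a minimal sketch of what a client for such an API could look like. None of the databases above is claimed to offer this interface: the base URL, the resource name, the query parameters and the JSON layout are all hypothetical. The example only builds the request URL and parses a response body, so it runs without network access.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; example.org is a reserved placeholder domain.
BASE_URL = "https://virulome.example.org/api/v1"


def build_query_url(base_url, resource, **filters):
    """Build a GET URL for a resource, with filters as query parameters."""
    url = f"{base_url.rstrip('/')}/{resource}"
    query = urlencode(sorted(filters.items()))
    return f"{url}?{query}" if query else url


def parse_vf_records(payload):
    """Extract (id, gene) pairs from a JSON response body (assumed layout)."""
    data = json.loads(payload)
    return [(rec["id"], rec["gene"]) for rec in data.get("results", [])]


URL = build_query_url(
    BASE_URL, "virulence-factors",
    species="Escherichia coli", category="toxin",
)

# A made-up response body, in the shape this sketch assumes.
SAMPLE_RESPONSE = '{"results": [{"id": "VF_EXAMPLE_1", "gene": "toyA"}]}'
RECORDS = parse_vf_records(SAMPLE_RESPONSE)
```

With a real service, the body would be fetched with something like `urllib.request.urlopen(URL)`, and the same parsing step would apply to whatever layout that service documents.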
Bioinformatics needs
Existing DBs reuse each other's datasets without true database interoperability: there is a need for common ontologies (controlled vocabularies already exist but are not used by all)
Ontologies and computer-readable data formats (JSON-LD or RDF) can enable true database interoperability, allowing bioinformaticians to extract the targeted information from a single query reaching multiple databases
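The sketch below hints at why a shared vocabulary matters. Two records use different local field names, but each carries a JSON-LD style `@context` that maps those names to the same term identifiers, so they can be merged on a common term. This is a deliberately simplified key-to-IRI expansion, not a full JSON-LD processor, and the vocabulary IRIs, field names and records are hypothetical.

```python
# Hypothetical shared vocabulary terms (IRIs on a placeholder domain).
GENE = "https://vocab.example.org/virulence#geneSymbol"
ROLE = "https://vocab.example.org/virulence#role"
HOST = "https://vocab.example.org/virulence#host"


def expand(record):
    """Replace local keys with the IRIs declared in the record's @context."""
    context = record.get("@context", {})
    return {context.get(key, key): value
            for key, value in record.items() if key != "@context"}


def merge_by_term(records, term_iri):
    """Group expanded records by the value they hold for a shared term."""
    merged = {}
    for record in records:
        expanded = expand(record)
        key = expanded.get(term_iri)
        if key is not None:
            merged.setdefault(key, []).append(expanded)
    return merged


# Two made-up records, as two different databases might export them.
RECORD_DB_A = {"@context": {"gene": GENE, "role": ROLE},
               "gene": "toyA", "role": "toxin"}
RECORD_DB_B = {"@context": {"symbol": GENE, "host": HOST},
               "symbol": "toyA", "host": "human"}

MERGED = merge_by_term([RECORD_DB_A, RECORD_DB_B], GENE)
```

Without the shared term, a script would need a hand-written mapping for every pair of databases. With it, the field names each database chooses locally stop mattering.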
Controlled vocabularies and Ontologies
Trends Microbiol 17, 279–285 (2009).
Sustainability needs
Major problems of databases
• Manual curation is still a necessity
• Academic model for sustainability of a resource: lack of funding leads to "dead" databases
Take home messages
• Existing virulome databases provide a wealth of data on VFs
• A large part of the available VF data overlaps between DBs. The overlap largely depends on the last database update and what was included.
• They are always a work in progress, relying heavily on manual curation
• Novel HTS-based techniques such as cg/wgMLST can use these databases to annotate schemas and provide a much richer picture of VF diversity at DNA/Protein level.
Acknowledgments
UMMI Members: Mário Ramirez, José Melo-Cristino
EFSA INNUENDO Project (https://sites.google.com/site/innuendocon/): Mirko Rossi
FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/):
• Dag Harmsen (Univ. Muenster)
• Stefan Niemann (Research Center Borstel)
• Keith Jolley, James Bray and Martin Maiden (Univ. Oxford)
• Joerg Rothganger (RIDOM)
• Hannes Pouseele (Applied Maths)
Genome Canada IRIDA (Integrated Rapid Infectious Disease Analysis) project (www.irida.ca):
• Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NML, PHAC)
• Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC)
• Fiona Brinkman (SFU)
• William Hsiao (BCCDC)