OntoGene/BioMeXT - files.ifi.uzh.ch
Projects Tools BLAH proposal Conclusion
OntoGene/BioMeXT
The Bio Term Hub and OGER
Lenz Furrer, Nico Colic, Fabio Rinaldi
University of Zurich and Swiss Institute of Bioinformatics
January 10, 2018
Outline
Projects
Tools
BLAH proposal
Conclusion
VetMine
PsyMine
Goal: discover causal interactions
From disorders to etiological factors
Creation of a reference corpus
An application: temporal trends
Author name disambiguation
SwissMADE: The challenge of clinical text
[http://carecentra.com/clinical-notes-mining/]
• SwissMADE (Monitoring of Adverse Drug Event)
• older patients (aged ≥ 65 years)
• antithrombotic drugs
• using structured and unstructured parts of the EHRs
• involves five hospitals
MedMon
• Mining the web and social networks for mentions of Adverse Drug Reactions
• Collaboration with a major Pharma Company and another Swiss University
• I urgently need a junior PostDoc!
  • PhD in Computational Linguistics, Computer Science or a related field
  • Good programming skills, and proven expertise in Python
  • Experience with Information Extraction and Text Mining
  • Experience with machine learning approaches
Assisted curation
• The OntoGene/BioMeXT group has been active in assisted curation since 2010 with the SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature).
• Since 2013 we have been collaborating with the RegulonDB database in a project aimed at testing and gradually introducing assisted curation techniques in their curation pipeline.
• RegulonDB is a database of the regulatory network of Escherichia coli K-12.
Example
We additionally found that expression of the mntP gene is upregulated by manganese through MntR.
• Given: MntR [+] mntP
• To identify: condition [manganese]
OxyR experiment
• TOPIC: oxidative stress by OxyR
• CORPUS: 46 papers, curated in RegulonDB
• METHODS: automated annotation of entities via OntoGene, selection of sentences via ODIN filters, manual validation
• RESULTS: 100% of regulatory interactions (RIs) retrieved, including transcription factor (TF), effect, and target gene (TG)
• Identified the growth conditions for 15 of the 20 RIs of OxyR by checking only a limited set of sentences (about 10% of each article is read)
[Gama-Castro et al., 2014]
Bio Term Hub
BTH: Term Statistics
BTH: Term confusion matrix
OGER
http://www.ontogene.org/resources/oger
OGER: annotation service
OntoGene's Biomedical Entity Recogniser (OGER)
• RESTful web service, using BTH terminologies
• Allows annotation of a collection of documents.
• Evaluated in the BioCreative/TIPS challenge for biomedical text mining services: best results according to several of the evaluation metrics.
http://www.ontogene.org/resources/oger
OGER: annotation service
• Annotates input text with entities from the BTH (except EntrezGene)
• Can be used as a web demo (for annotation of single articles) or as a web service (batch).
• Input: PubMed, PubMed Central, Free Text
• Formats: text (I), BioC (I/O), pxml (I), tsv (O), brat (O), odin-xml (O)
http://www.ontogene.org/resources/oger
Note: user-provided terminologies can be used, but this is not yet supported by the interface and web service.
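To make the idea of annotating text with entities from a terminology concrete, here is a minimal sketch of dictionary-based matching that returns character spans and concept IDs. It is not OGER's actual algorithm or API: the `TERMS` table, its IDs, and the `annotate` function are hypothetical and only illustrate the general technique (case-insensitive, longest-match-first lookup).

```python
import re

# Hypothetical mini-terminology: normalised term -> (concept ID, entity type).
# The IDs are invented for illustration; a real BTH export is far larger.
TERMS = {
    "mntp": ("GENE:mntP", "gene"),
    "mntr": ("GENE:mntR", "gene"),
    "manganese": ("CHEM:manganese", "chemical"),
    "oxidative stress": ("PROC:oxidative_stress", "process"),
}

TOKEN = re.compile(r"\w+")
MAX_WORDS = max(len(term.split()) for term in TERMS)


def annotate(text):
    """Return (start, end, surface form, concept ID, type) for each match."""
    tokens = [(m.start(), m.end()) for m in TOKEN.finditer(text)]
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so that multi-word terms win.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            key = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if key in TERMS:
                concept_id, entity_type = TERMS[key]
                hits.append((start, end, text[start:end], concept_id, entity_type))
                i += n  # skip past the matched tokens
                break
        else:
            i += 1  # no term starts at this token
    return hits
```

Applied to the RegulonDB example sentence, `annotate` would report spans for mntP, manganese and MntR. Such a lookup assigns every matching ID without choosing between ambiguous readings.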
BioCreative V.5 / TIPS
OGER in TIPS
Term ambiguity
CRAFT corpus
Results
[Anna Jancso, Lenz Furrer, Fabio Rinaldi, in preparation]
Previous history...
• [2006] BioCreative II: PPI (3rd), IMT (best)
• [2009] BioCreative II.5: PPI (best results); BioNLP
• [2010] BioCreative III: ACT, IMT, IAT
• [2011] CALBC (large scale entity extraction), BioNLP
• [2012] CTD task at BioCreative 2012
• [2013] BioCreative IV: BioC, CTD, IAT
http://www.biomext.org/
Use BTH/OGER through web API
• integration of BTH in another dictionary-based annotation platform
http://www.ontogene.org/resources/termdb
• usage of OGER web services
http://www.ontogene.org/resources/oger
Suggestion: integration with PubDictionaries/PubAnnotations
BTH: REST API
• The Bio Term Hub can currently be accessed publicly through a web interface
• or (if installed locally) used through a command-line interface.
• To ease integration into automatic workflows, a REST API should be added.
http://www.ontogene.org/resources/termdb
BTH: JSON output
• The Bio Term Hub currently produces plain-text output (a TSV table).
• This format is dense (compared to e.g. XML) and straightforward to parse in text-based processing environments.
• For some applications, however, JSON might be more suitable.
We propose to evaluate different possible JSON representations and implement the best one.
https://github.com/OntoGene/BioTermHub
License: BSD 2-clause
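As an illustration of what "different possible JSON representations" could mean, the sketch below converts a TSV term table into two candidate layouts: a flat list with one record per term, and a grouped list with one record per concept holding all its synonyms. The column names (`cui`, `resource`, `original_id`, `term`, `preferred_term`, `entity_type`), the presence of a header line, and both function names are assumptions made for this example, not a confirmed description of the Bio Term Hub output.

```python
import csv
import io
import json


def read_rows(tsv_text):
    """Parse a TSV table whose first line is assumed to be a header."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)  # flat layout: one dict per term


def group_by_concept(rows):
    """Grouped layout: one record per (resource, original_id) pair."""
    concepts = {}
    for row in rows:
        key = (row["resource"], row["original_id"])
        concept = concepts.setdefault(key, {
            "resource": row["resource"],
            "id": row["original_id"],
            "preferred_term": row["preferred_term"],
            "entity_type": row["entity_type"],
            "terms": [],
        })
        concept["terms"].append(row["term"])
    return list(concepts.values())


if __name__ == "__main__":
    # Invented rows, for illustration only.
    sample = (
        "cui\tresource\toriginal_id\tterm\tpreferred_term\tentity_type\n"
        "C1\tExampleDB\tX:001\tmanganese\tmanganese\tchemical\n"
        "C1\tExampleDB\tX:001\tMn\tmanganese\tchemical\n"
        "C2\tExampleDB\tX:002\tmntP\tmntP\tgene\n"
    )
    rows = read_rows(sample)
    print(json.dumps(rows, indent=2))
    print(json.dumps(group_by_concept(rows), indent=2))
```

The flat layout mirrors the TSV line by line and is trivial to stream, while the grouped layout avoids repeating concept-level fields for every synonym; which one is preferable depends on the consuming application.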
OGER: BioC/JSON
• OGER supports BioC XML as both input and output.
• Recently, a JSON version of the BioC format has been defined.
We propose to add support for this new format. While one possible approach would be to use the converter provided by the NCBI, a solution with less overhead in speed and memory consumption is preferable.
https://github.com/ncbi-nlp/BioC-JSON
https://github.com/OntoGene/PyBioC
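One way to keep memory overhead low is to stream the BioC XML and convert one document at a time instead of loading the whole collection. The following is a minimal sketch of that idea using only the Python standard library; it is neither the NCBI converter nor OGER's implementation. It covers a simplified subset (document IDs, infons, passage offsets and text, passage-level annotations with locations) and ignores sentences and relations. The function names are hypothetical, and the output keys follow the BioC JSON layout only as far as this subset goes.

```python
import io
import json
import xml.etree.ElementTree as ET


def infons(elem):
    """Collect <infon key="...">value</infon> children into a dict."""
    return {i.get("key"): i.text or "" for i in elem.findall("infon")}


def annotation(elem):
    return {
        "id": elem.get("id"),
        "infons": infons(elem),
        "text": elem.findtext("text", default=""),
        "locations": [
            {"offset": int(loc.get("offset")), "length": int(loc.get("length"))}
            for loc in elem.findall("location")
        ],
    }


def passage(elem):
    return {
        "offset": int(elem.findtext("offset", default="0")),
        "infons": infons(elem),
        "text": elem.findtext("text", default=""),
        "annotations": [annotation(a) for a in elem.findall("annotation")],
    }


def document(elem):
    return {
        "id": elem.findtext("id", default=""),
        "infons": infons(elem),
        "passages": [passage(p) for p in elem.findall("passage")],
    }


def iter_documents(xml_file):
    """Yield one JSON-ready dict per <document>, freeing each subtree afterwards."""
    for _, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag == "document":
            yield document(elem)
            elem.clear()  # drop the converted subtree to limit memory use


def write_json_lines(xml_file, out):
    """Write each document as one JSON line (a choice made for this sketch)."""
    for doc in iter_documents(xml_file):
        out.write(json.dumps(doc) + "\n")
```

Writing one document per line keeps the output streamable as well; producing a single enclosing collection object would require emitting the surrounding brackets and separators by hand.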
OGER: Format options in the web UI
• Only a fraction of the API's options are exposed in the web interface.
• The interface only allows specifying input documents through an ID or by typing/pasting plain text into a text box.
• Output is always an embedded HTML fragment with the annotations highlighted in color, which cannot easily be downloaded.
We propose to extend the available choices to the full range of input and output formats.
http://www.ontogene.org/resources/oger
Conclusions
• Bio Term Hub: a one-stop site for obtaining up-to-date biomedicalterminological resources
• OGER: an efficient text annotation tool using the BTH terminologies. Provides spans and IDs (NER and CR)
• OGER-CR shown to be state-of-the-art
• But: disambiguation not yet included in web services
Thank you / どうもありがとうございます