OntoGene/BioMeXT - files.ifi.uzh.ch · RESTful web service, using BTH terminologies Allows...

Projects Tools BLAH proposal Conclusion OntoGene/BioMeXT

OntoGene/BioMeXT

The Bio Term Hub and OGER

Lenz Furrer, Nico Colic, Fabio Rinaldi

University of Zurich and Swiss Institute of Bioinformatics

January 10, 2018

Outline

Projects

Tools

BLAH proposal

Conclusion

Topic: Projects

VetMine

PsyMine

Goal: discover causal interactions

From disorders to etiological factors

Creation of a reference corpus

An application: temporal trends

Author name disambiguation

SwissMADE: The challenge of clinical text

[http://carecentra.com/clinical-notes-mining/]

• SwissMADE (Monitoring of Adverse Drug Event)

• older patients (aged ≥ 65 years)

• antithrombotic drugs

• using structured and unstructured parts of the EHRs

• involves five hospitals

MedMon

• Mining the web and social networks for mentions of Adverse Drug Reactions

• Collaboration with a major Pharma Company and another Swiss University

• I urgently need a junior PostDoc!

  • PhD in Computational Linguistics, Computer Science or a related field

  • Good programming skills, and proven expertise in Python

  • Experience with Information Extraction and Text Mining

  • Experience with machine learning approaches

Assisted curation

• The OntoGene/BioMeXT group has been active in assisted curation since 2010 with the SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature).

• Since 2013 we have been collaborating with the RegulonDB database in a project aimed at testing and gradually introducing assisted curation techniques in their curation pipeline.

• RegulonDB is a database of the regulatory network of Escherichia coli K-12.

Example

We additionally found that expression of the mntP gene is upregulated by manganese through MntR.

• Given: MntR [+] mntP

• To identify: condition [manganese]
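
The curation task on this slide can be pictured as a sentence filter: given a known interaction (transcription factor, effect, target gene), keep only the sentences that mention both partners, so that a curator can look for the growth condition there. The sketch below is a deliberately naive illustration of that idea and not the ODIN filters themselves; the class and function names are hypothetical, and the first sentence of the sample text is made-up filler.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RegulatoryInteraction:
    """A known interaction: transcription factor, effect sign, target gene."""
    tf: str
    effect: str  # "+" for activation, "-" for repression
    target: str


def mentions(sentence: str, name: str) -> bool:
    """Case-sensitive whole-word match (case matters: MntR vs. mntP)."""
    return re.search(rf"(?<!\w){re.escape(name)}(?!\w)", sentence) is not None


def candidate_sentences(text: str, ri: RegulatoryInteraction) -> list:
    """Return the sentences that mention both the TF and the target gene."""
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if mentions(s, ri.tf) and mentions(s, ri.target)]


text = (
    "MntR is a manganese-responsive regulator. "  # invented filler sentence
    "We additionally found that expression of the mntP gene is "
    "upregulated by manganese through MntR."
)
ri = RegulatoryInteraction(tf="MntR", effect="+", target="mntP")
hits = candidate_sentences(text, ri)
for sentence in hits:
    print(sentence)
```

Only the second sentence names both MntR and mntP, so it is the one handed to the curator, who then reads off the condition (manganese).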

OxyR experiment

• TOPIC: oxidative stress by OxyR

• CORPUS: 46 papers, curated in RegDB

• METHODS: automated annotation of entities via OntoGene, selection of sentences via ODIN filters, manual validation

• RESULTS: 100% of RIs (regulatory interactions) retrieved, including TF, EFFECT and their TG

• Identified the growth conditions for 15 of the 20 RIs of OxyR by checking only a limited set of sentences (about 10% of the article is read)

[Gama-Castro et al., 2014]

Topic: Tools

Bio Term Hub

BTH: Term Statistics

BTH: Term confusion matrix

OGER

http://www.ontogene.org/resources/oger

OGER: annotation service

OntoGene’s Biomedical Entity Recogniser (OGER)

• RESTful web service, using BTH terminologies

• Allows annotation of a collection of documents.

• Evaluated in the Bio Text Mining services challenge BioCreative/TIPS

  • best results according to several of the evaluation metrics.

http://www.ontogene.org/resources/oger
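
To make "annotation of a collection of documents" through a RESTful service concrete, here is a minimal sketch that only builds one request URL per document. The base URL, the `fetch/<source>/<format>/<id>` path layout and the document IDs are assumptions chosen for illustration; the actual endpoints should be taken from the OGER documentation.

```python
from urllib.parse import quote, urljoin

# Hypothetical base URL and path layout, for illustration only.
BASE_URL = "https://example.org/oger/"


def fetch_url(source: str, out_format: str, doc_id: str) -> str:
    """Build the URL asking the service to fetch and annotate one document."""
    parts = ("fetch", source, out_format, doc_id)
    path = "/".join(quote(part, safe="") for part in parts)
    return urljoin(BASE_URL, path)


def batch_urls(source: str, out_format: str, doc_ids: list) -> list:
    """One request URL per document in the collection."""
    return [fetch_url(source, out_format, doc_id) for doc_id in doc_ids]


# Placeholder document IDs, not real PubMed records.
urls = batch_urls("pubmed", "tsv", ["12345678", "23456789"])
for url in urls:
    print(url)
```

Each URL could then be requested with any HTTP client; the sketch stops before the network call so that it runs without access to a server.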

OGER: annotation service

• Annotates input text with entities from the BTH

  • Except EntrezGene

• Can be used as a web demo (for annotation of single articles) or as a web service (batch).

• Input: PubMed, PubMed Central, Free Text

• Formats: text (I), BioC (I/O), pxml (I), tsv (O), brat (O), odin-xml (O)

http://www.ontogene.org/resources/oger

Note: user-provided terminologies can be used, but this is not yet supported by the interface and web service.
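
The format list above reads naturally as a small capability table. The sketch below encodes the slide's I/O markers and checks whether a requested conversion is possible; the lower-case keys and the function names are illustrative choices, not identifiers taken from OGER.

```python
# Format capabilities as listed on the slide: I = input, O = output.
FORMATS = {
    "text": {"I"},
    "bioc": {"I", "O"},
    "pxml": {"I"},
    "tsv": {"O"},
    "brat": {"O"},
    "odin-xml": {"O"},
}


def supported(direction: str) -> list:
    """Names of all formats usable in the given direction ('I' or 'O')."""
    return sorted(name for name, dirs in FORMATS.items() if direction in dirs)


def check_conversion(in_format: str, out_format: str) -> None:
    """Raise ValueError if the input/output combination is not available."""
    if "I" not in FORMATS.get(in_format, set()):
        raise ValueError(f"not an input format: {in_format}")
    if "O" not in FORMATS.get(out_format, set()):
        raise ValueError(f"not an output format: {out_format}")


print(supported("I"))  # ['bioc', 'pxml', 'text']
print(supported("O"))  # ['bioc', 'brat', 'odin-xml', 'tsv']
check_conversion("pxml", "brat")  # passes silently
```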

BioCreative V.5 / TIPS

OGER in TIPS

Term ambiguity

CRAFT corpus

Results

[Anna Jancso, Lenz Furrer, Fabio Rinaldi, in preparation]

Previous history...

• [2006] BioCreative II: PPI (3rd), IMT (best)

• [2009] BioCreative II.5: PPI (best results); BioNLP

• [2010] BioCreative III: ACT, IMT, IAT

• [2011] CALBC (large scale entity extraction), BioNLP

• [2012] CTD task at BioCreative 2012

• [2013] BioCreative IV: BioC, CTD, IAT

http://www.biomext.org/

Topic: BLAH proposal

Use BTH/OGER through web API

• integration of BTH in another dictionary-based annotation platform
  http://www.ontogene.org/resources/termdb

• usage of OGER web services
  http://www.ontogene.org/resources/oger

Suggestion: integration with PubDictionaries/PubAnnotations

BTH: REST API

• The Bio Term Hub can currently be accessed publicly through a web interface

• or (if installed locally) used through a command-line interface.

• To ease integration into automatic workflows, a REST API should be added.

http://www.ontogene.org/resources/termdb
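
Since the REST API is only proposed here, the following is a purely speculative sketch of what parsing a request to such an API might involve. The `/terms` endpoint, the `resources` and `format` query parameters, and the resource identifiers are all invented for illustration and do not describe an existing BTH interface.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical set of output formats a future API might offer.
KNOWN_FORMATS = {"tsv", "json"}


def parse_term_request(url: str) -> dict:
    """Turn a request URL into a selection of resources and an output format."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") != "/terms":
        raise ValueError(f"unknown endpoint: {parts.path}")
    query = parse_qs(parts.query)
    # 'resources' may be repeated or comma-separated; flatten both styles.
    resources = [
        name
        for value in query.get("resources", [])
        for name in value.split(",")
        if name
    ]
    out_format = query.get("format", ["tsv"])[-1]  # default mirrors current TSV output
    if out_format not in KNOWN_FORMATS:
        raise ValueError(f"unsupported format: {out_format}")
    return {"resources": resources, "format": out_format}


request = parse_term_request("/terms?resources=mesh,entrezgene&format=json")
print(request)
```

Keeping the parsing step free of any server framework means it can be tested in isolation and reused by both the web interface and a command-line wrapper.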

BTH: JSON output

• The Bio Term Hub currently produces plain-text output (a TSV table).

• dense (as compared to e.g. XML), and straight-forward to parse in text-based processing environments

• for some applications JSON might be more suitable.

We propose to evaluate different possible JSON representations and implement the best one.

https://github.com/OntoGene/BioTermHub

License: BSD 2-clause
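
As a rough illustration of what "different possible JSON representations" could mean, the sketch below converts a tiny TSV table into two candidate layouts: a flat list of records that mirrors the table, and a structure grouped by concept. The column names, IDs and resource names are made up for the example and are not the actual BTH header.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical column layout and invented rows; the real BTH table may differ.
TSV = (
    "concept_id\tresource\tterm\tentity_type\n"
    "C001\tResourceA\tmanganese\tchemical\n"
    "C001\tResourceA\tMn\tchemical\n"
    "G042\tResourceB\tmntP\tgene\n"
)

rows = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))

# Representation 1: flat list of records, one object per TSV row.
flat = rows

# Representation 2: one object per concept, with its synonyms collected.
grouped = defaultdict(lambda: {"terms": []})
for row in rows:
    entry = grouped[row["concept_id"]]
    entry["resource"] = row["resource"]
    entry["entity_type"] = row["entity_type"]
    entry["terms"].append(row["term"])

print(json.dumps(flat, indent=2))
print(json.dumps(grouped, indent=2))
```

The flat form is trivial to stream and stays close to the current output; the grouped form avoids repeating per-concept fields but requires the whole table to be read before anything can be written.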

OGER: BioC/JSON

• OGER supports BioC XML as both input and output.

• recently a JSON version of the BioC format has been defined.

We propose to add support for this new format. While a possible approach would be to use the converter provided by the NCBI, it is preferable to use a solution with less overhead with respect to speed and memory consumption.

https://github.com/ncbi-nlp/BioC-JSON

https://github.com/OntoGene/PyBioC
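
One low-overhead route would be to serialise the in-memory annotation objects straight to JSON with the standard library, without an XML round trip. The sketch below assumes the commonly described BioC JSON layout (collection, documents, passages, annotations with `infons` and `locations`); the field names should be checked against the NCBI repository, and the classes and the concept ID are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Annotation:
    id: str
    text: str
    offset: int  # character offset within the document
    entity_type: str
    concept_id: str

    def to_dict(self) -> dict:
        return {
            "id": self.id,
            "infons": {"type": self.entity_type, "identifier": self.concept_id},
            "text": self.text,
            "locations": [{"offset": self.offset, "length": len(self.text)}],
        }


@dataclass
class Passage:
    offset: int
    text: str
    annotations: list = field(default_factory=list)

    def to_dict(self) -> dict:
        return {
            "offset": self.offset,
            "infons": {},
            "text": self.text,
            "sentences": [],
            "annotations": [a.to_dict() for a in self.annotations],
            "relations": [],
        }


def collection_to_json(source: str, doc_id: str, passages: list) -> str:
    """Serialise one document as a BioC-style JSON collection (assumed layout)."""
    document = {
        "id": doc_id,
        "infons": {},
        "passages": [p.to_dict() for p in passages],
        "relations": [],
    }
    collection = {"source": source, "date": "", "key": "", "infons": {},
                  "documents": [document]}
    return json.dumps(collection, ensure_ascii=False)


text = "Expression of the mntP gene is upregulated by manganese through MntR."
passage = Passage(offset=0, text=text, annotations=[
    Annotation("1", "mntP", text.index("mntP"), "gene", "EXAMPLE:0001"),  # made-up ID
])
bioc_json = collection_to_json("example", "doc-1", [passage])
print(bioc_json)
```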

OGER: Format options in the web UI

• only a fraction of the API’s options is exposed in the web interface

• only allows specifying input documents through an ID or by typing/pasting plain text into a text box.

• output is always an embedded HTML fragment with the annotations highlighted in color, which cannot easily be downloaded.

We propose to extend the available choices to the full range of input and output formats.

http://www.ontogene.org/resources/oger

Topic: Conclusion

Conclusions

• Bio Term Hub: a one-stop site for obtaining up-to-date biomedical terminological resources

• OGER: an efficient text annotation tool using the BTH terminologies. Provides spans and IDs (NER and CR)

• OGER-CR shown to be state-of-the-art

• But: disambiguation not yet included in web services
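
To illustrate what "spans and IDs" means for a dictionary-based annotator, and why disambiguation matters, here is a toy lookup over single tokens. It is a simplified stand-in and not OGER's matching algorithm; the terminology entries and concept IDs are invented.

```python
import re
from collections import defaultdict

# Toy terminology: (term, concept ID, entity type). All IDs are made up.
TERMS = [
    ("manganese", "CHEM:1", "chemical"),
    ("MntR", "PROT:1", "protein"),
    ("mntP", "GENE:1", "gene"),
    ("mntP", "GENE:2", "gene"),  # same name, second concept: an ambiguous term
]


def build_index(terms):
    """Map each lower-cased term to every (concept ID, type) it may denote."""
    index = defaultdict(list)
    for term, concept_id, entity_type in terms:
        index[term.lower()].append((concept_id, entity_type))
    return index


def annotate(text, index):
    """Return (start, end, surface, concept_id, type) for each dictionary hit.

    Only single-token terms are matched; an ambiguous term yields one
    annotation per candidate concept, since no disambiguation is applied.
    """
    hits = []
    for match in re.finditer(r"\w+", text):
        for concept_id, entity_type in index.get(match.group().lower(), []):
            hits.append((match.start(), match.end(), match.group(),
                         concept_id, entity_type))
    return hits


text = "Expression of the mntP gene is upregulated by manganese through MntR."
hits = annotate(text, build_index(TERMS))
for hit in hits:
    print(hit)
```

The span (start, end) covers the recognition part and the concept ID the linking part; the two annotations on the same "mntP" span are exactly the cases a disambiguation step would have to resolve.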

Thank you / どうもありがとうございます