Projects Tools BLAH proposal Conclusion

OntoGene/BioMeXT

The Bio Term Hub and OGER

Lenz Furrer, Nico Colic, Fabio Rinaldi
University of Zurich and Swiss Institute of Bioinformatics

January 10, 2018

Outline

Projects

Tools

BLAH proposal

Conclusion

Topic: Projects

VetMine

PsyMine

Goal: discover causal interactions

From disorders to etiological factors

Creation of a reference corpus

An application: temporal trends

Author name disambiguation

SwissMADE: The challenge of clinical text

[http://carecentra.com/clinical-notes-mining/]

• SwissMADE (Monitoring of Adverse Drug Events)

• older patients (aged ≥ 65 years)

• antithrombotic drugs

• using structured and unstructured parts of the EHRs

• involves five hospitals

MedMon

• Mining the web and social networks for mentions of Adverse Drug Reactions

• Collaboration with a major Pharma Company and another Swiss University

• I urgently need a junior PostDoc!

  • PhD in Computational Linguistics, Computer Science or a related field

  • Good programming skills, and proven expertise in Python

  • Experience with Information Extraction and Text Mining

  • Experience with machine learning approaches

Assisted curation

• The OntoGene/BioMeXT group has been active in assisted curation since 2010 with the SASEBio project (Semi-Automated Semantic Enrichment of the Biomedical Literature).

• Since 2013 we have been collaborating with the RegulonDB database in a project aimed at testing and gradually introducing assisted curation techniques in their curation pipeline.

• RegulonDB is a database of the regulatory network of Escherichia coli K-12.

Example

We additionally found that expression of the mntP gene is upregulated by manganese through MntR.

• Given: MntR [+] mntP

• To identify: condition [manganese]
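
The curation step in this example can be illustrated with a minimal sketch. It assumes that each sentence has already been annotated with entity mentions and that the regulatory interaction (MntR, mntP) is known. The `Mention` class, the entity-type labels and the `candidate_conditions` helper are hypothetical names chosen for illustration; they are not the OntoGene or ODIN implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One annotated entity mention in a sentence (hypothetical structure)."""
    text: str
    entity_type: str  # assumed labels, e.g. "TF", "gene", "chemical"


def candidate_conditions(mentions, tf, target_gene, condition_types=("chemical",)):
    """Return condition candidates from a sentence that mentions both
    the transcription factor and its target gene; otherwise return []."""
    texts = {m.text for m in mentions}
    if tf not in texts or target_gene not in texts:
        return []
    return [m.text for m in mentions if m.entity_type in condition_types]


# Mentions for the example sentence above (annotations written by hand).
mentions = [
    Mention("mntP", "gene"),
    Mention("manganese", "chemical"),
    Mention("MntR", "TF"),
]

print(candidate_conditions(mentions, tf="MntR", target_gene="mntP"))
# prints ['manganese']
```

A curator would then only need to validate the returned candidates instead of reading the full article.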

OxyR experiment

• TOPIC: oxidative stress by OxyR

• CORPUS: 46 papers, curated in RegDB

• METHODS: automated annotation of entities via OntoGene, selection of sentences via ODIN filters, manual validation

• RESULTS: 100% of RIs retrieved, including TF, EFFECT and their TG

• Identified the growth conditions for 15 of the 20 RIs of OxyR by checking only a limited set of sentences (about 10% of the article is read)

[Gama-Castro et al., 2014]

Topic: Tools

Bio Term Hub

BTH: Term Statistics

BTH: Term confusion matrix

OGER

http://www.ontogene.org/resources/oger

OGER: annotation service

OntoGene’s Biomedical Entity Recogniser (OGER)

• RESTful web service, using BTH terminologies

• Allows annotation of a collection of documents.

• Evaluated in the Bio Text Mining services challenge BioCreative/TIPS

  • best results according to several of the evaluation metrics.

http://www.ontogene.org/resources/oger

OGER: annotation service

• Annotates input text with entities from the BTH

  • Except EntrezGene

• Can be used as a web demo (for annotation of single articles) or as a web service (batch).

• Input: PubMed, PubMed Central, Free Text

• Formats (I = input, O = output): text (I), BioC (I/O), pxml (I), tsv (O), brat (O), odin-xml (O)

http://www.ontogene.org/resources/oger

Note: user-provided terminologies can be used, but this is not yet supported by the interface and web service.
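
The format list on this slide can be read as a small capability matrix. The sketch below encodes it as a Python dictionary and checks whether a given input/output pair is possible. The names `FORMATS` and `can_convert` are illustrative only and are not part of OGER; the lower-case format keys are an assumption about spelling, not documented identifiers.

```python
# Direction support per format, transcribed from the slide.
FORMATS = {
    "text": {"input"},
    "bioc": {"input", "output"},
    "pxml": {"input"},
    "tsv": {"output"},
    "brat": {"output"},
    "odin-xml": {"output"},
}


def supports(fmt, direction):
    """True if the format is listed for the given direction ('input' or 'output')."""
    return direction in FORMATS.get(fmt.lower(), set())


def can_convert(in_fmt, out_fmt):
    """True if in_fmt is a listed input format and out_fmt a listed output format."""
    return supports(in_fmt, "input") and supports(out_fmt, "output")


print(can_convert("pxml", "tsv"))   # prints True
print(can_convert("tsv", "bioc"))   # prints False: tsv is output-only
```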

BioCreative V.5 / TIPS

OGER in TIPS

Term ambiguity

CRAFT corpus

Results

[Anna Jancso, Lenz Furrer, Fabio Rinaldi, in preparation]

Previous history...

• [2006] BioCreative II: PPI (3rd), IMT (best)

• [2009] BioCreative II.5: PPI (best results); BioNLP

• [2010] BioCreative III: ACT, IMT, IAT

• [2011] CALBC (large scale entity extraction), BioNLP

• [2012] CTD task at BioCreative 2012

• [2013] BioCreative IV: BioC, CTD, IAT

http://www.biomext.org/

Topic: BLAH proposal

Use BTH/OGER through web API

• integration of BTH in another dictionary-based annotation platform
http://www.ontogene.org/resources/termdb

• usage of OGER web services
http://www.ontogene.org/resources/oger

Suggestion: integration with PubDictionaries/PubAnnotations

BTH: REST API

• The Bio Term Hub can currently be accessed publicly through a web interface

• or (if installed locally) used through a command-line interface.

• To ease integration into automatic workflows, a REST API should be added.

http://www.ontogene.org/resources/termdb

BTH: JSON output

• The Bio Term Hub currently produces plain-text output (a TSV table).

• This format is dense (compared to e.g. XML) and straightforward to parse in text-based processing environments.

• For some applications, JSON might be more suitable.

We propose to evaluate different possible JSON representations and implement the best one.
https://github.com/OntoGene/BioTermHub

License: BSD 2-clause
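
To make the design question concrete, the following sketch converts a small TSV table into two candidate JSON representations: a flat list of records and a mapping grouped by concept. The column names (`concept_id`, `term`, `entity_type`) and the sample rows are assumptions made for illustration; the actual BTH column layout may differ.

```python
import csv
import io
import json

# Hypothetical sample in the spirit of a terminology TSV table.
SAMPLE_TSV = (
    "concept_id\tterm\tentity_type\n"
    "C1\tmanganese\tchemical\n"
    "C1\tMn\tchemical\n"
    "G1\tmntP\tgene\n"
)


def flat_records(tsv_text):
    """Candidate 1: one JSON object per TSV row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


def grouped_by_concept(tsv_text):
    """Candidate 2: one JSON object per concept, with its terms collected."""
    grouped = {}
    for row in flat_records(tsv_text):
        entry = grouped.setdefault(
            row["concept_id"],
            {"entity_type": row["entity_type"], "terms": []},
        )
        entry["terms"].append(row["term"])
    return grouped


print(json.dumps(flat_records(SAMPLE_TSV), indent=2))
print(json.dumps(grouped_by_concept(SAMPLE_TSV), indent=2))
```

The flat variant stays close to the TSV and is easy to stream; the grouped variant avoids repeating concept-level fields but needs the whole table in memory before it can be written.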

OGER: BioC/JSON

• OGER supports BioC XML as both input and output.

• Recently a JSON version of the BioC format has been defined.

We propose to add support for this new format. While a possible approach would be to use the converter provided by the NCBI, it is preferable to use a solution with less overhead with respect to speed and memory consumption.
https://github.com/ncbi-nlp/BioC-JSON

https://github.com/OntoGene/PyBioC
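
As a rough illustration of what a lightweight conversion could look like, the sketch below maps a simplified BioC XML collection to a JSON structure using only the Python standard library. It is not the NCBI converter and not PyBioC. The JSON key names reflect our reading of the BioC JSON layout and should be checked against the NCBI repository; sentences and relations are omitted for brevity, and the sample document is made up.

```python
import json
import xml.etree.ElementTree as ET

SAMPLE_XML = """<collection>
  <source>example</source>
  <date>2018-01-10</date>
  <key>example.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>MntR regulates mntP.</text>
      <annotation id="T1">
        <infon key="type">gene</infon>
        <location offset="0" length="4"/>
        <text>MntR</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def infons(elem):
    """Collect the <infon key="..."> children of an element into a dict."""
    return {i.get("key"): i.text or "" for i in elem.findall("infon")}


def annotation_to_dict(anno):
    return {
        "id": anno.get("id"),
        "infons": infons(anno),
        "text": anno.findtext("text", default=""),
        "locations": [
            {"offset": int(loc.get("offset")), "length": int(loc.get("length"))}
            for loc in anno.findall("location")
        ],
    }


def passage_to_dict(passage):
    return {
        "offset": int(passage.findtext("offset", default="0")),
        "infons": infons(passage),
        "text": passage.findtext("text", default=""),
        "annotations": [annotation_to_dict(a) for a in passage.findall("annotation")],
    }


def bioc_xml_to_json(xml_text):
    """Convert a (simplified) BioC XML collection to a JSON string."""
    root = ET.fromstring(xml_text)
    collection = {
        "source": root.findtext("source", default=""),
        "date": root.findtext("date", default=""),
        "key": root.findtext("key", default=""),
        "infons": infons(root),
        "documents": [
            {
                "id": doc.findtext("id", default=""),
                "infons": infons(doc),
                "passages": [passage_to_dict(p) for p in doc.findall("passage")],
            }
            for doc in root.findall("document")
        ],
    }
    return json.dumps(collection)


print(bioc_xml_to_json(SAMPLE_XML))
```

This version parses the whole collection into memory; for large collections, an incremental parser such as `xml.etree.ElementTree.iterparse` would be a natural next step towards the low-overhead goal stated above.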

OGER: Format options in the web UI

• Only a fraction of the API’s options is exposed in the web interface.

• The interface only allows specifying input documents through an ID or by typing/pasting plain text into a text box.

• Output is always an embedded HTML fragment with the annotations highlighted in color, which cannot easily be downloaded.

We propose to extend the available choices to the full range of input and output formats.
http://www.ontogene.org/resources/oger

Topic: Conclusion

Conclusions

• Bio Term Hub: a one-stop site for obtaining up-to-date biomedical terminological resources

• OGER: an efficient text annotation tool using the BTH terminologies; provides spans and IDs (NER and CR)

• OGER-CR shown to be state-of-the-art

• But: disambiguation not yet included in web services

Thank you / どうもありがとうございます