Linguamatics – David Milward - ChemAxon

Chemically Informed Text Mining David Milward Linguamatics ChemAxon UGM Budapest 2013 © Linguamatics 2013

Transcript of Linguamatics – David Milward - ChemAxon

Page 1: Linguamatics – David Milward - ChemAxon

Chemically Informed Text Mining

David Milward

Linguamatics

ChemAxon UGM Budapest 2013

© Linguamatics 2013

Page 2: Linguamatics – David Milward - ChemAxon

Linguamatics: Agile Text Mining

Boston Cambridge

I2E: agile, scalable, real-time NLP-based text mining

Fact extraction and knowledge synthesis

Fortune 500

• Pharma/Biotech: including 9 of the top 10

• Healthcare: including Kaiser Permanente

• Government: including FDA


Page 3: Linguamatics – David Milward - ChemAxon

Chemical Searching combined with Text Searching

• Melting points for exemplified compounds in patents

Patent Data from IFI Claims Direct
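To make the melting-point use case concrete, here is a minimal sketch of pulling melting-point values out of patent-style text with a plain regular expression. It is not how I2E performs the extraction; the pattern, the accepted abbreviations ("m.p.", "mp", "melting point") and the function name `melting_points` are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern: a melting-point cue, then a value or a range, then "°C".
MP = re.compile(
    r"\b(?:m\.?\s?p\.?|melting point)\s*[:=]?\s*"   # cue such as "m.p." or "melting point"
    r"(\d+(?:\.\d+)?)"                               # lower (or only) value
    r"(?:\s*(?:-|–|to)\s*(\d+(?:\.\d+)?))?"          # optional upper value of a range
    r"\s*°\s*C",                                     # unit
    re.IGNORECASE,
)

def melting_points(text):
    """Return (low, high) pairs in degrees Celsius; low == high for single values."""
    results = []
    for match in MP.finditer(text):
        low = float(match.group(1))
        high = float(match.group(2)) if match.group(2) else low
        results.append((low, high))
    return results
```

Calling `melting_points("Example 3: white solid, m.p. 142-144 °C.")` yields one range. Linking each value back to the exemplified compound it belongs to is the harder part and is not attempted here.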


Page 4: Linguamatics – David Milward - ChemAxon

A Versatile Toolbox for Finding Information …

Terminologies

• Search for e.g. cancer and get synonyms and children:

• Malignant neoplasms, Malignant tumor …

• Leukaemia, Lymphoma, Astrocytoma …
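The terminology lookup above can be sketched as a small expansion over a thesaurus. This is only an illustration with a toy in-memory vocabulary built from the terms on the slide; the `SYNONYMS` and `CHILDREN` tables and the `expand` function are hypothetical and say nothing about how I2E stores or queries its terminologies.

```python
# Toy thesaurus using only the terms named on the slide (hypothetical structure).
SYNONYMS = {"cancer": ["malignant neoplasms", "malignant tumor"]}
CHILDREN = {"cancer": ["leukaemia", "lymphoma", "astrocytoma"]}

def expand(term):
    """Return the term, its synonyms, and all descendant terms, without duplicates."""
    seen = []
    stack = [term.lower()]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.append(current)
        # Synonyms are alternative names for the same concept.
        seen.extend(s for s in SYNONYMS.get(current, []) if s not in seen)
        # Children are narrower concepts; visit them in their listed order.
        stack.extend(reversed(CHILDREN.get(current, [])))
    return seen
```

A query for "cancer" would then search for every string returned by `expand("cancer")` rather than for the single word.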

Linguistics

Regular Expressions

• e.g. microRNA: let-?\d+.* mirn?a?-?\d+.*
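The two microRNA patterns above can be tried out directly. The sketch below assumes they are alternatives, that matching is case-insensitive, and that a pattern must cover a whole whitespace-delimited token; none of these choices is stated on the slide, and the function name `find_microrna_tokens` is hypothetical.

```python
import re

# The two patterns from the slide, joined as alternatives.
# Case-insensitive, whole-token matching is an assumption made for this sketch.
MICRORNA = re.compile(r"(?:let-?\d+.*|mirn?a?-?\d+.*)", re.IGNORECASE)

def find_microrna_tokens(text):
    """Return tokens that fully match either microRNA pattern."""
    # Split on whitespace and trim surrounding punctuation from each token.
    tokens = (token.strip(".,;:()") for token in text.split())
    return [token for token in tokens if MICRORNA.fullmatch(token)]
```

The trailing `.*` in each pattern lets suffixes such as the "a" in "let-7a" through, which is why whole-token matching is used here instead of searching inside running text.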

Chemical Substructure

High Throughput

• Simultaneous processing of large numbers of items, e.g. 500 genes from a microarray experiment
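As a rough illustration of processing many items at once, the sketch below checks a whole gene list against each document in a single pass instead of running one search per gene. It is a simplification that assumes exact, case-sensitive symbol matching; the function name `index_mentions` and the document structure are hypothetical and unrelated to I2E's own indexing.

```python
import re
from collections import defaultdict

def index_mentions(genes, documents):
    """Map each gene symbol to the ids of the documents that mention it."""
    wanted = set(genes)
    hits = defaultdict(set)
    for doc_id, text in documents.items():
        # Tokenise once per document, then intersect with the whole gene list.
        tokens = set(re.findall(r"[A-Za-z0-9-]+", text))
        for gene in wanted & tokens:
            hits[gene].add(doc_id)
    return dict(hits)
```

With a list of 500 symbols the cost per document stays close to one tokenisation plus one set intersection, which is the point of batching the items.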


Page 5: Linguamatics – David Milward - ChemAxon

… and Presenting it Efficiently

Identify Extract Synthesize Analyze

• Pie charts for drill down

• Trending over time

• Interaction networks

• Mind maps with clustering

• Clustered results table

• RDF/BEL for network modelling

[Figure: BEL network example. Node expressions: bp(apoptosis), p(C), taof(p(A)), microRNA(Q), kaof(p(D)), p(D, P@Y), p(B), catof(p(R)). Legend labels: catalytic activity, kinase activity, microRNA abundance, phosphorylation at unspecified tyrosine, protein abundance, direct causation, transcriptional activity, biological process.]

Page 6: Linguamatics – David Milward - ChemAxon


ChemAxon Integration

Mol files

Mol conversion with Filtering

5.7 g (56.7 mmol) of triethylamine in 20 ml methylene chloride are added dropwise at room temperature to a solution of 10 g (56.7 mmol) 2-hydroxymethyl-6-methylene-1,4-dithiepane

I2E Index

Name-to-Structure

I2E Query with Substructure/Similarity

Page 7: Linguamatics – David Milward - ChemAxon

I2E Web Services API (WSAPI)

Client

• I2E Client

• Pipeline Pilot Components

• WSAPI Web View

• Sample Web GUI

• YOUR APPLICATION HERE!

I2E WSAPI

Server

• I2E Server

• Indexing tasks

• Querying tasks

• Class matching

• Index/Query Publishing

• Administration Tasks

Page 8: Linguamatics – David Milward - ChemAxon

I2E WSAPI Examples


Page 9: Linguamatics – David Milward - ChemAxon

Thank You!

For more information…

Please visit our table or www.linguamatics.com

Webinars:

www.linguamatics.com/welcome/events/webinars.html

Contact: Phil Hastings

Email: [email protected]
