Chemical name interpretations & Molecular time lines -

Post on 24-Jan-2016

35 views 0 download

description

Chemical name interpretations & Molecular time lines -. This shows detailed record view – with molecular links -. This shows the chemicals report with molecular timeline & mouse over of chemical names. Exploring co-table analysis of Molecules with Gene ID ’ s. - PowerPoint PPT Presentation

Transcript of Chemical name interpretations & Molecular time lines -

11

Chemical name interpretations & Molecular time lines -

22

This shows detailed record view – with molecular links -

33

This shows the chemicals report with molecular timeline & mouse over of chemical names

44

Exploring co-table analysis of Molecules with Gene ID’s

For example – show me all of the co-occurrences of these (x) molecules with these (any / all) gene’s !

55

From the main menu select the Analyze tab 11

66

22 From the analyze menu select the Cotable tab !

77

33 Now Enter the Inchi keys for the molecules of interest -

Click here to enter a sample (test) set of molecules

88

44 Now select - patent field – to explore “patents” !

These are the molecules of interest – (Inchi keys to explore)

Select Patent field here

99

55 Now select - facet = patent field + Gene then click analyze

Molecules

Facet = Patents + Genes

1010

These are the NCBI Gene ID #’s

To transpose the charts or export the data – click here

This shows the “cotable” results = co-occurrences of molecules + NCBI –Gene ID’s

1111

This shows the transposed chart – of co-occurrences of molecules + NCBI –Gene ID’s

Click here to see the patents containing this molecule + this particular gene

1212

Co-table Analysis

For example : Show me all documents where imitrex wasMentioned with “any” …..sign and / or symptoms

(note: these are terms such as headache, vomiting, nausea ..etc ..there are > 680 of them).

1313

Draw a compound of interest Draw a compound of interest 11

22Click – view compound in co-table Click – view compound in co-table

1414

Draw a compound of interest Draw a compound of interest 11

22Click – view compound in co-table Click – view compound in co-table

1515

33 Select a MeSH category for Co-occurance analysis Select a MeSH category for Co-occurance analysis

44 Click analyzeClick analyze

1616

This shows the number of documents that contained the source molecule andANY of the MeSH – C23 terms

This shows the number of documents that contained the source molecule andANY of the MeSH – C23 terms

Click on the numbers to “link to ” the documents

Click on the numbers to “link to ” the documents

1717

Type in a new MeSH code to change the analysis from ‘signs & symptoms’ (C23) to diseases (C01)

Type in a new MeSH code to change the analysis from ‘signs & symptoms’ (C23) to diseases (C01)

1818

This shows the number of documents that contained the source molecule andANY of the MeSH – disease (C01) terms

This shows the number of documents that contained the source molecule andANY of the MeSH – disease (C01) terms

1919

This shows the comparison of 2 drugs and the co-occurrence of MeSH Symptoms (C23) terms

This shows the comparison of 2 drugs and the co-occurrence of MeSH Symptoms (C23) terms

2020

Medline co-occurrence of Statin structures vs. MeSH –

Chemical Structures vs. Signs and Symptoms

This shows the comparison of different statins and the co-occurrence of MeSh terms

This shows the comparison of different statins and the co-occurrence of MeSh terms

2121

Screen shoots from our SIMPLE / SIIP Web application

2222

Search Chemical Search using ChemAxon w/ DB2

Proximal Search Nearest Neighbor Search

2323

BioTerm Analysis

Clustering Claims Originality

Discovery

2424

Landscape Analysis

Visualization

Networks

2525

IBM’s - Massively Parallel Probabilistic Architecture

Question/Topic

Analysis

Question

Hypothesis & Evidence Scoring

Answer, Confidence

SynthesisFinal Merging

& Ranking

QueryDecompositio

n

Hypothesis Generation

Hypothesis & Evidence Scoring

Soft Filtering

Hypothesis Generation

Hypothesis & Evidence Scoring

Soft Filtering

Hypothesis Generation

Trained Models

Primary Search

Candidate Answer Generati

on

A. Sources

SupportingEvidenceRetrieval

Deep Evidence Scoring

Answer Scoring

E. Sources

Evidence

Retrieval

DeepEvidenceScoring

25

Watson generates and scores many hypotheses using an extensible collection of Natural Language Processing, Machine Learning and Reasoning Algorithms. These gather and weigh evidence over both unstructured and

structured content to determine the answer with the best confidence.

Source – J Kreulen

2626

DeepQA Application (Java/C++)

Watson Infrastructure• 90 Power 750 Servers• Each Server 3.5GHz POWER7 8 Core Processor with

4 threads/core• Total: 2880 POWER7 Cores with 16TB RAM• Processing speed: 500Gb/sec; 80 TeraFLOPS• 94th on Top 500 Supercomputers• Note: This hardware is for Jeopardy. Any other

application of Watson will require appropriate sizing and optimization for purpose.

SUSE Linux Enterprise Server 11

Apace Hadoop + Apache UIMA

Nature of Domain: Open vs. ClosedClosed domain implies all knowledge is contained within a specific domain characterized by ontologies and there is no need to go outside the domain.Jeopardy is an open-domain example where it is general knowledge.

Knowledge/Data Sources: AvailabilityQA systems are natural language search engines. Watson goes beyond NL search. If knowledge sources are incomplete, unavailable, insufficient or inadequate then it is not possible for the system to provide an answer. In some cases one would need to envisage Interactive QA that require human interaction to guide the search. Another very important consideration is the availability of sufficient sample data for training (i.e. training corpus).

Need for multi-modalityIs there a need for Transcription from Speech to Text before a question is answered? This would require integration of Speech to Text capabilities that are not really ready for real-time applications.

LatencyWatson is capable of processing 500GB of information per second with 3 sec response to questions and used most of its knowledge source in memory (as opposed to disk) for speed. What is the latency requirement for the application?

Multi-Lingual or Cross-Lingual SupportWatson can support only English at this time; with language-specific parsers other languages can be supported . If knowledge sources or QA is required in multiple languages then that would not be a good candidate. Additionally if cultural context have to be accommodated in the answer then it would not be prudent to deploy QA systems directly interacting with users.

Question TypeDecomposition and classification of the question is critical to how QA systems work. Bulk of the question types in Jeopardy were Factoid questions. Watson did not include 2 question categories: One is Audio/Video type questions that require looking at a video to answer and another are questions that require special instructions (e.g. verbal instructions to explain a question.)

Answer TypesWatson is not designed to curate a task-oriented system. It can handle temporal and geo-spatial reasoning in its answers. As it stands it cannot handle business process type of reasoning (to do task B tasks A, C must be completed etc.)

Technical Issues to consider when applying QA systems like Watson

2727

I would like to acknowledge the IBM Almaden Research – team

Jeff Kreulen Ying Chen Scott Spangler Alfredo AlbaTom GriffinEric Louie Su Yan Issic Cheng Prasad Ramachandran Bin HeAna Lelescu

Qi HeLinda KatoAna Lelescu Brad Wade John Colino Meenakshi NagarajanTimothy J Bethea German Attanasio Laura AndersonRobert Prill

+ a host of folks from IBM China Labs -

2828

Back-up slides

2929

• Challenges ahead –

• Access to full – text

• Language issues • Chinese• Japanese• Korean • Other

• Legal issues

• Web data

• Integration with Medical content

3030

Chemicals from Chinese Patents -

Attempts to process Chinese Patent Documents

Extracting chemical structures form Chinese patents…

3131

Dat

a So

urce

s

View selected

Documents & Reports

U.S.Patents(1976 -—

2009)

U.S. Pre-

Grants (All)

PCT &EPO

Apps

Medline Abstracts

(>18 M)

SelectedInternet Content

User Applications

In-House

Content

Knime or Pipeline Pilot

BIW

SIMPLE

Chem Axon Search

Cognos/DDQB/Other Apps

Parse & Extract

data

Annotator 1

Annotator 2

Database

+compu ted Meta Data

e Classifier & OtherData Associations

Annotation Factory

Computational Analytics

(SemanticAssociations)

Computer Curation Process Overview & integration with our collaborators -

IP Database(e.g. DB2)

ADU*ADU*

* ADU = Automated Data Update

* ADU = Automated Data Update

ChemVersedb

ChemVerse

Services Hosted at IBM Almaden