Chemical name interpretations & Molecular time lines -
This shows the detailed record view, with molecular links.
This shows the chemicals report, with a molecular timeline and mouse-over of chemical names.
Exploring co-table analysis of molecules with Gene IDs
For example: show me all of the co-occurrences of these (x) molecules with these (any / all) genes.
1. From the main menu, select the Analyze tab.
2. From the Analyze menu, select the Cotable tab.
3. Enter the InChI keys for the molecules of interest.
   Click here to enter a sample (test) set of molecules.
4. Select the patent field to explore “patents”.
   These are the molecules of interest (InChI keys to explore).
   Select the patent field here.
5. Select facet = patent field + Gene, then click Analyze.
   Molecules
   Facet = Patents + Genes
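Before a list of keys is submitted to the co-table, it can help to check that each entry at least has the shape of a standard InChIKey (14 letters, a hyphen, 10 letters, a hyphen, 1 letter). The sketch below is only an illustration of that format check; the function name `split_valid_keys` is hypothetical and is not part of the application shown in these slides. It does not verify that a key corresponds to a real compound.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def split_valid_keys(raw_text):
    """Split comma- or whitespace-separated input into (valid, rejected) lists.

    Input is upper-cased first, so lower-case pastes are accepted (a design
    choice of this sketch, not a rule of the InChIKey standard).
    """
    tokens = [t.upper() for t in re.split(r"[,\s]+", raw_text) if t]
    valid = [t for t in tokens if INCHIKEY_RE.match(t)]
    rejected = [t for t in tokens if not INCHIKEY_RE.match(t)]
    return valid, rejected


# The first token is the InChIKey of aspirin; the second is deliberately malformed.
valid, rejected = split_valid_keys("BSYNRYMUTXBXSQ-UHFFFAOYSA-N, not-a-key")
print(valid, rejected)
```

A shape check like this only catches paste errors early; whether a key is actually present in the patent index is still decided by the search itself.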
These are the NCBI Gene IDs.
To transpose the charts or export the data, click here.
This shows the “cotable” results: co-occurrences of molecules and NCBI Gene IDs.
This shows the transposed chart of co-occurrences of molecules and NCBI Gene IDs.
Click here to see the patents containing this molecule and this particular gene.
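The co-table idea can be sketched in a few lines: for every document, pair each molecule of interest it mentions with each gene it mentions, and keep the document IDs per (molecule, gene) cell so that a count can link back to its patents. This is a minimal sketch under stated assumptions, not the application's implementation: the record layout, the function names `build_cotable` and `transpose`, and the sample keys, gene numbers and patent IDs are all hypothetical.

```python
from collections import defaultdict


def build_cotable(documents, molecules, gene_ids=None):
    """Map (molecule, gene) -> list of IDs of documents mentioning both.

    gene_ids=None means "any gene"; otherwise only the listed genes count.
    """
    cells = defaultdict(list)
    for doc in documents:
        genes = doc["genes"] if gene_ids is None else gene_ids & doc["genes"]
        for mol in sorted(molecules & doc["molecules"]):
            for gene in sorted(genes):
                cells[(mol, gene)].append(doc["id"])
    return dict(cells)


def transpose(cells):
    """Swap rows and columns: (molecule, gene) becomes (gene, molecule)."""
    return {(gene, mol): ids for (mol, gene), ids in cells.items()}


# Hypothetical annotated patents; keys and gene numbers are placeholders.
docs = [
    {"id": "P1", "molecules": {"KEY-A", "KEY-B"}, "genes": {101}},
    {"id": "P2", "molecules": {"KEY-A"}, "genes": {101, 102}},
    {"id": "P3", "molecules": {"KEY-C"}, "genes": {102}},
]
cells = build_cotable(docs, {"KEY-A", "KEY-B"}, {101, 102})
counts = {pair: len(ids) for pair, ids in cells.items()}
print(counts)
print(transpose(cells)[(101, "KEY-A")])  # patents behind one cell
```

Keeping document IDs rather than bare counts is what makes the "click to see the patents" step possible without a second query.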
Co-table Analysis
For example: show me all documents where Imitrex was mentioned with “any” sign and/or symptom.
(Note: these are terms such as headache, vomiting, and nausea; there are more than 680 of them.)
1. Draw a compound of interest.
2. Click “view compound in co-table”.
3. Select a MeSH category for co-occurrence analysis.
4. Click Analyze.
This shows the number of documents that contained the source molecule and ANY of the MeSH C23 terms.
Click on the numbers to link to the documents.
Type in a new MeSH code to change the analysis from ‘signs & symptoms’ (C23) to diseases (C01).
This shows the number of documents that contained the source molecule and ANY of the MeSH disease (C01) terms.
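Switching the analysis from C23 to C01 amounts to changing a prefix on MeSH tree numbers. The sketch below counts, per term, the documents that mention a given molecule together with any term under the chosen category. It is an illustration only: the record layout, the function name `count_by_mesh_category`, the drug names and the tree numbers in the sample data are hypothetical placeholders, not real MeSH entries.

```python
from collections import defaultdict


def count_by_mesh_category(documents, molecule, category):
    """Map term -> sorted IDs of documents that mention `molecule` and that term.

    A term belongs to `category` when its tree number equals the category code
    or starts with the code followed by a dot (e.g. "C23.").
    """
    prefix = category + "."
    per_term = defaultdict(set)
    for doc in documents:
        if molecule not in doc["molecules"]:
            continue
        for term, tree_number in doc["mesh"]:
            if tree_number == category or tree_number.startswith(prefix):
                per_term[term].add(doc["id"])
    return {term: sorted(ids) for term, ids in per_term.items()}


# Hypothetical annotated abstracts; tree numbers are placeholders.
docs = [
    {"id": "D1", "molecules": {"drug-x"},
     "mesh": [("headache", "C23.100"), ("nausea", "C23.200")]},
    {"id": "D2", "molecules": {"drug-x"},
     "mesh": [("nausea", "C23.200"), ("infection", "C01.300")]},
    {"id": "D3", "molecules": {"drug-y"},
     "mesh": [("headache", "C23.100")]},
]
symptoms = count_by_mesh_category(docs, "drug-x", "C23")
diseases = count_by_mesh_category(docs, "drug-x", "C01")
print(symptoms)
print(diseases)
```

Running the same function for two drugs and placing the results side by side gives the kind of comparison shown on the following slides.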
This shows the comparison of 2 drugs and the co-occurrence of MeSH Symptoms (C23) terms.
Medline co-occurrence of statin structures vs. MeSH:
chemical structures vs. signs and symptoms
This shows the comparison of different statins and the co-occurrence of MeSH terms.
Screenshots from our SIMPLE / SIIP web application
Search: Chemical Search using ChemAxon w/ DB2, Proximal Search, Nearest Neighbor Search
Discovery: BioTerm Analysis, Clustering, Claims Originality
Visualization: Landscape Analysis, Networks
IBM’s Massively Parallel Probabilistic Architecture
[Figure: DeepQA pipeline. Question → Question/Topic Analysis → Query Decomposition → Hypothesis Generation (Primary Search over Answer Sources, Candidate Answer Generation) → Soft Filtering → Hypothesis & Evidence Scoring (Supporting Evidence Retrieval over Evidence Sources, Deep Evidence Scoring, Answer Scoring) → Synthesis → Final Merging & Ranking (Trained Models) → Answer, Confidence. The Hypothesis Generation, Soft Filtering, and Hypothesis & Evidence Scoring stages are drawn several times in parallel.]
Watson generates and scores many hypotheses using an extensible collection of Natural Language Processing, Machine Learning and Reasoning Algorithms. These gather and weigh evidence over both unstructured and structured content to determine the answer with the best confidence.
Source – J Kreulen
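The generate, filter, score, merge and rank flow described above can be illustrated with a deliberately tiny sketch. It is a toy under stated assumptions and is not DeepQA code: the two scorers, the hand-set `WEIGHTS` and `BIAS` (standing in for trained models), the `min_overlap` threshold and the sample passages are all hypothetical.

```python
import math


def words(text):
    return set(text.lower().split())


def generate_candidates(question, passages):
    """Hypothesis generation: any passage sharing a word with the question proposes its answer."""
    q = words(question)
    return sorted({answer for text, answer in passages if q & words(text)})


def overlap_score(question, candidate, passages):
    """Best fraction of question words found in one passage supporting the candidate."""
    q = words(question)
    hits = [len(q & words(text)) / len(q) for text, answer in passages if answer == candidate]
    return max(hits, default=0.0)


def support_score(question, candidate, passages):
    """Fraction of all passages that support the candidate."""
    return sum(1 for _, answer in passages if answer == candidate) / len(passages)


SCORERS = [overlap_score, support_score]
WEIGHTS = [3.0, 2.0]  # hand-set stand-in for trained models
BIAS = -2.0


def answer(question, passages, min_overlap=0.2):
    """Return (confidence, candidate) pairs, best first."""
    ranked = []
    for cand in generate_candidates(question, passages):
        scores = [scorer(question, cand, passages) for scorer in SCORERS]
        if scores[0] < min_overlap:  # soft filtering of weak hypotheses
            continue
        z = BIAS + sum(w * s for w, s in zip(WEIGHTS, scores))
        ranked.append((1 / (1 + math.exp(-z)), cand))  # logistic merge into a confidence
    return sorted(ranked, reverse=True)


# Hypothetical evidence passages, each tagged with the answer it supports.
passages = [
    ("sumatriptan is used to treat migraine headache", "sumatriptan"),
    ("sumatriptan relieves migraine", "sumatriptan"),
    ("aspirin is used to treat fever", "aspirin"),
]
results = answer("which drug is used to treat migraine", passages)
print(results)
```

The point of the sketch is the shape of the computation: many cheap hypotheses, several independent evidence scores per hypothesis, and a learned combination that turns those scores into one confidence per answer.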
DeepQA Application (Java/C++)
Watson Infrastructure
• 90 Power 750 servers
• Each server: four 3.5 GHz POWER7 8-core processors with 4 threads/core
• Total: 2880 POWER7 cores with 16 TB RAM
• Processing speed: 500 GB/sec; 80 TeraFLOPS
• 94th on Top 500 Supercomputers
• Note: This hardware is for Jeopardy. Any other application of Watson will require appropriate sizing and optimization for purpose.
SUSE Linux Enterprise Server 11
Apache Hadoop + Apache UIMA
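The hardware totals can be cross-checked with simple arithmetic. The sketch assumes four 8-core processors per server, which is the configuration that reconciles 90 servers with 2880 cores; the variable names are illustrative only.

```python
servers = 90
processors_per_server = 4  # assumption: four 8-core POWER7 chips per server
cores_per_processor = 8
threads_per_core = 4
ram_tb = 16

total_cores = servers * processors_per_server * cores_per_processor
total_threads = total_cores * threads_per_core
ram_gb_per_core = ram_tb * 1024 / total_cores

print(total_cores, total_threads, round(ram_gb_per_core, 2))
```

The same three multiplications are a reasonable starting point when sizing hardware for a workload other than Jeopardy.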
Technical issues to consider when applying QA systems like Watson
Nature of Domain: Open vs. Closed
A closed domain implies that all knowledge is contained within a specific domain characterized by ontologies, and there is no need to go outside the domain. Jeopardy is an open-domain example, where the questions draw on general knowledge.
Knowledge/Data Sources: Availability
QA systems are natural language search engines; Watson goes beyond NL search. If knowledge sources are incomplete, unavailable, insufficient or inadequate, the system cannot provide an answer. In some cases one would need to envisage interactive QA, which requires human interaction to guide the search. Another very important consideration is the availability of sufficient sample data for training (i.e. a training corpus).
Need for Multi-Modality
Is there a need for transcription from speech to text before a question is answered? This would require integrating speech-to-text capabilities that are not really ready for real-time applications.
Latency
Watson is capable of processing 500 GB of information per second, with a 3-second response to questions, and kept most of its knowledge sources in memory (as opposed to on disk) for speed. What is the latency requirement for the application?
Multi-Lingual or Cross-Lingual Support
Watson can support only English at this time; with language-specific parsers, other languages can be supported. An application that requires knowledge sources or QA in multiple languages would not be a good candidate. Additionally, if cultural context has to be accommodated in the answer, it would not be prudent to deploy QA systems that interact directly with users.
Question Type
Decomposition and classification of the question are critical to how QA systems work. The bulk of the question types in Jeopardy were factoid questions. Watson did not include 2 question categories: audio/video questions, which require looking at a video to answer, and questions that require special instructions (e.g. verbal instructions to explain a question).
Answer Types
Watson is not designed to curate a task-oriented system. It can handle temporal and geo-spatial reasoning in its answers. As it stands, it cannot handle business-process reasoning (e.g. to do task B, tasks A and C must be completed first).
I would like to acknowledge the IBM Almaden Research team:
Jeff Kreulen, Ying Chen, Scott Spangler, Alfredo Alba, Tom Griffin, Eric Louie, Su Yan, Issic Cheng, Prasad Ramachandran, Bin He, Ana Lelescu, Qi He, Linda Kato, Brad Wade, John Colino, Meenakshi Nagarajan, Timothy J Bethea, German Attanasio, Laura Anderson, Robert Prill
+ a host of folks from IBM China Labs
Back-up slides
• Challenges ahead
• Access to full text
• Language issues: Chinese, Japanese, Korean, other
• Legal issues
• Web data
• Integration with medical content
Chemicals from Chinese Patents
Attempts to process Chinese patent documents
Extracting chemical structures from Chinese patents…
Computer Curation Process Overview & integration with our collaborators
[Figure: process overview diagram. Data Sources: U.S. Patents (1976-2009), U.S. Pre-Grants (All), PCT & EPO Apps, Medline Abstracts (>18 M), Selected Internet Content, In-House Content. Other labels in the diagram: Annotation Factory (Parse & Extract Data, Annotator 1, Annotator 2); Classifier & Other Data Associations; Computational Analytics (Semantic Associations); Database + computed Meta Data; IP Database (e.g. DB2); ADU*; User Applications; View Selected Documents & Reports; Knime or Pipeline Pilot; BIW; SIMPLE; ChemAxon Search; Cognos/DDQB/Other Apps; ChemVerse; ChemVerse db; Services Hosted at IBM Almaden.]
* ADU = Automated Data Update