Uploaded by Antony Williams (ChemConnector), ORCID 0000-0002-2668-4821
ChemSpider as a Chemical Term Resolver
Antony Williams, Valery Tkachenko, Sean Ekins and Andy Fant
ACS San Diego March 2012
The Web of Chemistry – VERY BIG!
Online Databases are “Linking”
It is so difficult to navigate…
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known pathways?
Working on now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
Open PHACTS Project
Develop a set of robust standards…
Implement the standards in a semantic integration hub
Deliver services to support drug discovery programs in pharma and public domain
22 partners, 8 pharmaceutical companies, 3 biotechs
36-month project
Guiding principle is open access, open usage, open source: key to standards adoption
What is the Structure of Vitamin K?
MeSH
A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K 1 (phytomenadione) derived from plants, VITAMIN K 2 (menaquinone) from bacteria, and synthetic naphthoquinone provitamins, VITAMIN K 3 (menadione).
What is the Structure of Vitamin K1?
Create an Online “Resolver” as a path to chemistry
Search all forms of structure IDs:
Systematic name(s)
Trivial name(s)
SMILES
InChI strings
InChIKeys
Database IDs
Registry Number
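A resolver that accepts all of these identifier types first has to guess what kind of string it was given. The following is a minimal sketch of that step, not ChemSpider's actual logic: the function names and the returned labels are hypothetical, names and SMILES are deliberately left in one bucket because they cannot be told apart reliably by pattern alone, and database IDs are not handled.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")
# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_checksum_ok(text: str) -> bool:
    """Return True if text has the CAS layout and a valid check digit."""
    match = CAS_RE.fullmatch(text)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify_identifier(query: str) -> str:
    """Guess the identifier type of a query string (hypothetical labels)."""
    text = query.strip()
    if text.startswith("InChI="):
        return "inchi"
    if INCHIKEY_RE.fullmatch(text):
        return "inchikey"
    if cas_checksum_ok(text):
        return "registry_number"
    # Names and SMILES need a dictionary lookup or a parser to separate.
    return "name_or_smiles"
```

Anything that falls through to the last bucket would then be passed to a name dictionary or a structure parser, which is where most of the quality problems discussed in the following slides arise.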
ChemSpider
Available Information…
Linked to vendors, safety data, toxicity, metabolism
Available Information….
Vitamin K1 Names
Vitamin K1 on ChemSpider CORRECT
Resolving Names for QUALITY
Searching chemical identifiers should resolve to the correct chemical as much as possible
Validated Name-Structure Dictionaries
Chemical name dictionaries are used for:
Text-mining (publications, patents)
Used to index PubMed and link to Google Patents
Linking to other databases – think Biology!
When structures are not available drug names link
Searching the web
Names link to structures link to InChIs
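A name dictionary of this kind can be sketched as a lookup table keyed on normalized names. The example below is only an illustration: the synonym groups are taken from the MeSH entry for vitamin K quoted earlier in this transcript, the preferred name stands in for what would really be a structure record, and the normalization rule (lowercase, drop spaces, hyphens and underscores) is an assumption that would be too aggressive for systematic names.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase and drop spaces, hyphens and underscores so variants collide."""
    return re.sub(r"[\s\-_]+", "", name.lower())


# Synonym groups from the MeSH text quoted in the transcript.
SYNONYM_GROUPS = {
    "vitamin K1": ["Vitamin K 1", "phytomenadione"],
    "vitamin K2": ["Vitamin K 2", "menaquinone"],
    "vitamin K3": ["Vitamin K 3", "menadione"],
}


def build_dictionary(groups: dict) -> dict:
    """Map every normalized synonym to its preferred name."""
    lookup = {}
    for preferred, synonyms in groups.items():
        for name in [preferred, *synonyms]:
            lookup[normalize_name(name)] = preferred
    return lookup


NAME_DICTIONARY = build_dictionary(SYNONYM_GROUPS)


def lookup_name(name: str):
    """Return the preferred name for a synonym, or None if it is unknown."""
    return NAME_DICTIONARY.get(normalize_name(name))
```

In this sketch `lookup_name("VITAMIN K 1")` and `lookup_name("Phytomenadione")` land on the same entry, while the bare class name "Vitamin K" is not in the table at all, which mirrors the ambiguity raised by the question "What is the Structure of Vitamin K?".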
I want to know about “Vincristine”
Vincristine: Identifiers
Vincristine: Patents Linked by Name
Many Names, One Structure
Top 200 Drugs on Wikipedia
http://en.wikipedia.org/wiki/List_of_bestselling_drugs
The Project Challenge PART ONE
Agree on the set of chemical names to work with
Independently create an SDF file in each “lab”
Compare differences and agree on final structures
Issue “Gold Standard” SDF file to team
RSC Process
Relative accuracy of groups against final master list
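The comparison of each group's file against the final master list can be sketched as follows. This assumes each SDF file has already been reduced to a mapping from drug name to a structure key such as an InChIKey; the function names and the data layout are hypothetical and are not the process the groups actually used.

```python
def accuracy_against_master(submission: dict, master: dict) -> float:
    """Fraction of master-list names for which the submission has the same key."""
    if not master:
        raise ValueError("master list is empty")
    correct = sum(1 for name, key in master.items() if submission.get(name) == key)
    return correct / len(master)


def disagreements(submissions: dict) -> dict:
    """Names on which the groups did not all submit the same structure key.

    submissions maps a group label to that group's name -> key mapping.
    A name missing from one group's file counts as a disagreement.
    """
    names = set().union(*(s.keys() for s in submissions.values()))
    result = {}
    for name in sorted(names):
        answers = {group: s.get(name) for group, s in submissions.items()}
        if len(set(answers.values())) > 1:
            result[name] = answers
    return result
```

The output of `disagreements` is the list of names that need manual review before a "Gold Standard" file can be issued; `accuracy_against_master` can then score each group once the master list is agreed.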
The Project Challenge PART TWO
Use Gold Standard SDF File to investigate data quality on these compounds in Internet Databases
Two checks:
Search chemical name – does it return the correct compound? If not correct, how is it different?
Search “structure” – SMILES, Molfile, InChIString or InChIKey
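One rough way to answer "how is it different?" is to compare the InChIKey a database returns with the gold-standard InChIKey block by block. The sketch below relies on the usual reading of the standard InChIKey layout (first block: connectivity, second block: stereochemistry, isotopes and other remaining layers, final character: protonation); the function name and the returned labels are hypothetical, and a real comparison would also inspect the structures themselves.

```python
def compare_inchikeys(found: str, expected: str) -> str:
    """Classify the difference between two InChIKeys (hypothetical labels)."""
    found_blocks = found.strip().upper().split("-")
    expected_blocks = expected.strip().upper().split("-")
    if len(found_blocks) != 3 or len(expected_blocks) != 3:
        raise ValueError("expected InChIKeys with three hyphen-separated blocks")
    if found_blocks == expected_blocks:
        return "match"
    # The first block is a hash of the connectivity layers.
    if found_blocks[0] != expected_blocks[0]:
        return "different skeleton"
    # The second block hashes the remaining layers (stereo, isotopes, ...).
    if found_blocks[1] != expected_blocks[1]:
        return "same skeleton, different stereo/isotope layers"
    # Only the final protonation character differs.
    return "same skeleton, different protonation"
```

A result of "same skeleton, different stereo/isotope layers" typically points to missing or wrong stereocentres, which is a common way for a name search to return a compound that is close to, but not the same as, the gold-standard structure.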
“The First 10”
Performance on 150 Drug Names
NPC Browser Set
Standardize
Use the SRS (FDA Substance Registration System) as a guidance document for standardization
Adjust as necessary to our needs
Nitro groups
Salt and Ionic Bonds
One dictionary look up is never enough…
ChemSpider does not contain all chemistry
We are not the only ones curating data
New chemistry expands daily and goes online
Federation is key….
Check ChemSpider first, if not found then:
Check PubChem
Check NCI resolver
Check ChEBI
Check … the “network” of open interfaces
Each resolver will have its own “quantitative confidence”.
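The "check one source, then the next" strategy can be sketched as an ordered chain of resolver functions, each tagged with its own confidence. Everything below is an assumption made for illustration: the resolvers are in-memory stubs rather than calls to the real services, and the confidence numbers are placeholders, not measured values.

```python
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]


def resolve_federated(query: str, resolvers: list) -> Optional[dict]:
    """Try (source, confidence, resolver) entries in order; return the first hit."""
    for source, confidence, resolver in resolvers:
        try:
            structure = resolver(query)
        except Exception:
            # A failing service should not stop the chain; try the next one.
            continue
        if structure is not None:
            return {
                "query": query,
                "structure": structure,
                "source": source,
                "confidence": confidence,
            }
    return None


# Hypothetical stand-ins for the real services, with placeholder structures.
STUB_CHEMSPIDER = {"compound one": "structure-1"}
STUB_PUBCHEM = {"compound one": "structure-1b", "compound two": "structure-2"}

RESOLVER_CHAIN = [
    ("ChemSpider", 0.9, STUB_CHEMSPIDER.get),
    ("PubChem", 0.7, STUB_PUBCHEM.get),
]
```

With these stubs, `resolve_federated("compound two", RESOLVER_CHAIN)` falls through the first source and is answered by the second, and the returned record keeps track of which source answered and how much that source is trusted.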
Chemical Identifier Resolver (CIR)
http://cactus.nci.nih.gov/chemical/structure
Converts a given structure identifier into another representation or structure identifier.
Resolve names, identifiers etc
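A request to CIR is a plain URL of the form base/identifier/representation. The sketch below builds such a URL from the base address given above and, optionally, fetches the result. The function names are hypothetical, the representation names (`smiles`, `stdinchikey`) are assumed from the service's public URL scheme and should be checked against its documentation, and treating HTTP 404 as "not resolved" is likewise an assumption.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Base address as given on the slide.
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def build_cir_url(identifier: str, representation: str) -> str:
    """Build a CIR request URL, percent-encoding the identifier."""
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


def resolve_with_cir(identifier: str, representation: str = "smiles",
                     timeout: float = 10.0) -> Optional[str]:
    """Fetch the requested representation; None if the service answers 404.

    This performs a network call and depends on the service being available.
    """
    url = build_cir_url(identifier, representation)
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except HTTPError as error:
        if error.code == 404:
            return None
        raise
```

For example, `build_cir_url("Vitamin K1", "stdinchikey")` yields a URL ending in `/Vitamin%20K1/stdinchikey`. Encoding the identifier with `safe=''` matters for SMILES, which may contain `/`, `#` and other characters that have a meaning in URLs.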
What can become a resolver?
We are building….
A central federated resolver utilizing available services
Dictionary lookups, systematic name conversions (multiple tools – ACD/Labs, Lexichem, OPSIN)
“Consensus” decisions and guidance
BUT chemicals have timelines!
[Figure: ORIGINAL and FINAL structures]
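A "consensus" decision across several name-to-structure sources can be sketched as a simple vote. This is a hypothetical illustration rather than the design of the resolver being built: it assumes every tool's output has already been converted to the same canonical string (for example a standard InChI), and the agreement threshold and tie rule are arbitrary choices.

```python
from collections import Counter
from typing import Optional


def consensus(results: dict, min_agreement: int = 2) -> Optional[tuple]:
    """Pick the structure most tools agree on.

    results maps a tool label to its structure string, or None if the tool
    gave no answer. Returns (structure, supporting tools), or None when
    there is a tie for first place or too little agreement.
    """
    votes = Counter(s for s in results.values() if s is not None)
    ranked = votes.most_common(2)
    if not ranked:
        return None
    structure, count = ranked[0]
    if count < min_agreement:
        return None
    if len(ranked) == 2 and ranked[1][1] == count:
        # Two different structures have equal support: no consensus.
        return None
    supporters = sorted(tool for tool, s in results.items() if s == structure)
    return structure, supporters
```

A call such as `consensus({"dictionary": "A", "OPSIN": "A", "Lexichem": "B"})` returns `("A", ["OPSIN", "dictionary"])`; cases that return None are the ones that would need the "guidance" step, i.e. manual review.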
Thank you
Email: [email protected]
Twitter: ChemConnector
Personal Blog: www.chemconnector.com
SLIDES: www.slideshare.net/AntonyWilliams