How can the International Chemical Identifier (InChI) be extended to non-trivial chemicals?
V. Tkachenko, A.J. Williams, Y. Borodina, F. Switzer, T. Peryea, L. Callahan
ACS Philly August 2012
What is InChI
InChI Examples
CH3CH2OH (ethanol)
InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3
L-ascorbic acid
InChI=1S/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
InChI Structure
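The layered structure can be illustrated with a small parsing sketch. It assumes the simple case seen in the examples above: an `InChI=` prefix, a version, a formula, then layers introduced by a one-letter prefix (`c` connectivity, `h` hydrogens, `t`/`m`/`s` stereo). The function name `split_inchi_layers` is hypothetical, and the sketch ignores cases such as a missing formula layer or repeated prefixes in non-standard InChIs.

```python
def split_inchi_layers(inchi):
    """Split an InChI string into a dict of layers (simplified sketch)."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len(prefix):].split("/")
    # First field is the version ("1S" = version 1, standard InChI).
    layers = {"version": parts[0]}
    # Assumption: the second field is the chemical formula.
    if len(parts) > 1:
        layers["formula"] = parts[1]
    # Remaining fields start with a one-letter layer prefix.
    for part in parts[2:]:
        layers[part[0]] = part[1:]
    return layers


ethanol = split_inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3")
print(ethanol["formula"], ethanol["c"], ethanol["h"])
```

For ethanol this yields only a formula, a connectivity layer and a hydrogen layer; the L-ascorbic acid example additionally carries the `t`, `m` and `s` stereo layers.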
InChIKey
The condensed, 27-character standard InChIKey is a hashed version of the full standard InChI (using the SHA-256 algorithm)
Designed to allow for easy web searches of chemical compounds
A standard InChIKey consists of
14 characters resulting from a hash of the connectivity information of the InChI
followed by a hyphen and 8 characters resulting from a hash of the remaining layers of the InChI
followed by a single character flagging a standard (S) or non-standard (N) InChI
followed by a single character indicating the version of InChI used (A for version 1)
followed by a hyphen and a single character indicating protonation (N for neutral)
InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3/t10-,11+,13-,16-,17-/m0/s1
BQJCRHHNABKAKU-KBQPJGBKSA-N
Unlike InChI, an InChIKey can be converted back to a connection table (CT) only by lookup
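A minimal sketch of what "only by lookup" means in practice: the key can be checked for shape and split into its hyphen-separated blocks, but getting back to a structure requires a table in which the full InChI was stored beforehand. The names `split_inchikey`, `resolve` and `KEY_TO_INCHI` are hypothetical, and the table holds just the single example from this slide.

```python
import re

# Assumed layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters).
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{10})-([A-Z])$")


def split_inchikey(key):
    """Return the three hyphen-separated blocks of an InChIKey."""
    match = INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError("not a 27-character InChIKey: %r" % key)
    return match.groups()


# Hypothetical lookup table: a hash cannot be inverted, so the InChI
# must have been registered together with its key.
KEY_TO_INCHI = {
    "BQJCRHHNABKAKU-KBQPJGBKSA-N": (
        "InChI=1S/C17H19NO3/c1-18-7-6-17-10-3-5-13(20)16(17)21-15-12(19)"
        "4-2-9(14(15)17)8-11(10)18/h2-5,10-11,13,16,19-20H,6-8H2,1H3"
        "/t10-,11+,13-,16-,17-/m0/s1"
    ),
}


def resolve(key):
    """Return the stored InChI for a key, or None if it was never registered."""
    split_inchikey(key)  # validate the shape first
    return KEY_TO_INCHI.get(key)


print(split_inchikey("BQJCRHHNABKAKU-KBQPJGBKSA-N"))
print(resolve("BQJCRHHNABKAKU-KBQPJGBKSA-N") is not None)
```

Because the first block depends only on connectivity, grouping keys by that block is a cheap way to collect records that share a skeleton but differ in the remaining layers.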
Proliferation of InChI
Search by InChI
ChemSpider Google Search
http://www.chemspider.com/google/
What’s the catch?
InChI has limitations
InChI is ideal for simple, static, well-defined graphs
Real chemical substances can only be approximated by such graphs
Limitations
Non-trivial stereo (e.g. axial, planar)
Non-trivial tautomers (e.g. ring-chain)
Mixtures – full stereo is rarely known
Polymers
Markush structures
Organometallics
Inorganics
Materials
Reactions
Etc
Chemical data complexity
Work in progress
InChI Extensions: Under the guidance of IUPAC, several sub-teams are now working on expanding InChI to new areas of chemical representation:
Reaction InChI (RInChI): the reaction working group has completed its recommendations, and work is ready to begin.
Polymers/Mixtures: The polymers/mixtures working group also has submitted its recommendations, and work to incorporate the new representations should begin once version 1.04 is released.
Markush: This project is the most complex undertaken to date. The initial recommendations have been submitted, but financing of the work still needs to be sorted out.
But what do we do NOW???
Deposition Process
[Figure: deposition pipeline. Labels: Data, Validation, Standardization, Filtering, Deduplication, Componentization, Mapping, Non-redundant data]
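The deduplication stage can be sketched as a merge keyed on the structure identifier. This is an illustration only, not the ChemSpider implementation: the record layout (`inchi`, `names`) and the function name `deduplicate` are assumptions, and records are taken to be already validated and standardized.

```python
def deduplicate(records):
    """Merge deposited records that share the same InChI (sketch)."""
    merged = {}
    for record in records:
        # One entry per distinct InChI; dicts keep first-seen order.
        entry = merged.setdefault(
            record["inchi"], {"inchi": record["inchi"], "names": []}
        )
        for name in record.get("names", []):
            if name not in entry["names"]:
                entry["names"].append(name)
    return list(merged.values())


deposited = [
    {"inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "names": ["ethanol"]},
    {"inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "names": ["ethyl alcohol"]},
]
print(deduplicate(deposited))
```

Keying on the identifier only works as well as the identifier does, which is why the limitations listed earlier matter for this step.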
ChemSpider Data Model
Organometallics
Mixtures or unknown stereo
Accelrys Enhanced Stereo
MOL V3000
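In a V3000 molfile, enhanced stereo groups are recorded in the collection block as `MDLV30/STEABS`, `MDLV30/STERACn` ("and" groups, drawn as &n) and `MDLV30/STERELn` ("or" groups). The sketch below pulls those groups out of a molblock; it is a simplified reader that assumes each collection entry fits on one line, and the function name `parse_enhanced_stereo` is hypothetical.

```python
import re

# One collection entry, e.g. "M  V30 MDLV30/STERAC1 ATOMS=(2 2 5)".
# The first number inside the parentheses is the atom count.
STEREO_RE = re.compile(
    r"^M\s+V30\s+MDLV30/STE(ABS|REL|RAC)(\d*)\s+ATOMS=\((\d+)((?:\s+\d+)*)\)$"
)
KINDS = {"ABS": "abs", "REL": "or", "RAC": "and"}


def parse_enhanced_stereo(molblock):
    """Return a list of (kind, group_number, atom_indices) tuples."""
    groups = []
    for line in molblock.splitlines():
        match = STEREO_RE.match(line.strip())
        if match is None:
            continue  # not a stereo collection line
        kind, number, count, atoms = match.groups()
        atom_list = [int(a) for a in atoms.split()]
        if len(atom_list) != int(count):
            raise ValueError("atom count mismatch in: %r" % line)
        groups.append((KINDS[kind], int(number) if number else None, atom_list))
    return groups


example = """M  V30 BEGIN COLLECTION
M  V30 MDLV30/STERAC1 ATOMS=(2 2 5)
M  V30 MDLV30/STEREL2 ATOMS=(1 7)
M  V30 END COLLECTION"""
print(parse_enhanced_stereo(example))
```

The group information read this way has no counterpart in a standard InChI string, so it has to be carried alongside the identifier rather than inside it.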
Enhanced stereo and InChI…
Unfortunately not supported
Is it important?
Now real-world examples…
FDA Substance Registration System
Stoichiometric and non-stoichiometric mixtures
[Figures: structure drawings of example SRS substances, each broken down into moieties. Labels recoverable from the slides:]
Substance with Moiety 1 and Moiety 2, drawn with "AND Enantiomer"
Sodium salt substance with Moieties 1 to 4, enhanced stereo groups &1 and &2, and a "Mixed" label
Calcium salt substance with Moiety 1 and water (OH2) as Moiety 2, amount undefined
Iron oxide substance with Moiety 1 (A) and Moiety 2 (B), containing Fe2+, Fe3+ and O2−
D-glucose, drawn in several forms
SRS standardization approach
Substance description
Standardization module
Moieties generator
Normalization
InChI[Key] generator
Hash function f(InChIKeys, moieties)
Unique ID
Standard description
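The slides do not specify the hash function f, so the following is only one possible sketch of the idea: describe a substance as a set of moieties, each given by its InChIKey and an amount, put that description into a canonical order, and hash it. The function name `substance_id`, the `key|amount` text format, the use of SHA-256 and the 32-character truncation are all assumptions made for illustration.

```python
import hashlib


def substance_id(moieties):
    """Derive an order-independent ID from (inchikey, amount) pairs (sketch)."""
    # Canonical form: one "key|amount" line per moiety, sorted so that the
    # order in which moieties were drawn or listed does not matter.
    canonical = sorted("%s|%s" % (key, amount) for key, amount in moieties)
    digest = hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()
    return digest[:32]


# Example: a two-moiety substance with a second moiety of undefined amount.
hydrate = [
    ("BQJCRHHNABKAKU-KBQPJGBKSA-N", "1"),
    ("XLYOFNOQVPJJNP-UHFFFAOYSA-N", "undefined"),  # water
]
print(substance_id(hydrate) == substance_id(list(reversed(hydrate))))
```

Including the amount in the hashed text keeps stoichiometric and non-stoichiometric forms of the same components apart, which a plain list of InChIKeys would not do.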
SRS TBD
Markush
Polymers
Proteins
Inorganics
Materials
OpenPHACTS
Open PHACTS is a 3-year Innovative Medicines Initiative (IMI) project
To reduce the barriers to drug discovery in industry, academia and for small businesses
To build an open platform, integrating chemistry and biology data from public domain resources
Semantic web platform
Open Standards, Open Data and Open Source
OpenPHACTS specifics
Active/inactive ingredient
Parent/child
Sample/substance
Misreferences (!!!)
ChemSpider Reactions
ChemSpider Reaction Challenges
Deduplication
Identification
Deposition
Conclusions
InChI is The Identifier
InChI has its limitations
InChI is a work in progress
InChI deficiencies can be hot-fixed
Acknowledgements
RSC Cheminformatics group
FDA SRS group
OpenPHACTS consortium
Software: InChI, GGA Software
Thank you
Email: [email protected]
Blog: www.chemspider.com/blog
Slides: http://www.slideshare.net/valerytkachenko16