Comparison Standard InChI/InChIKeys - NCI/CADD Structure Identifiers
InChI/InChIKey vs. NCI/CADD Structure Identifiers: A Comparison
Markus Sitzmann
Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS
The Adaption and Use of the IUPAC InChI/InChIKey
Chemical Structure Lookup Service: NCI/CADD Identifiers (FICTS, FICuS, uuuuu) and Standard InChI/InChIKey
74 million structure records – 46 million unique structures
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
• based on hashcodes calculated by the chemoinformatics toolkit CACTVS
• CACTVS hashcodes:
  – represent a chemical structure uniquely as a 16-digit hexadecimal number (64-bit unsigned integer)
  – are highly sensitive to the structural features of a compound
  – change if the connectivity changes
[Figure: example compound; CACTVS hashcode 9850FD9F9E2B4E25]
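CACTVS computes its hashcodes with its own algorithm, which is not reproduced here. As a minimal sketch of the output format only, the snippet below takes an 8-byte BLAKE2b digest of a structure string (an assumption standing in for the CACTVS hash; the function name `toy_hashcode` is hypothetical) and writes the resulting 64-bit unsigned integer as a 16-digit uppercase hexadecimal number.

```python
import hashlib


def toy_hashcode(structure_key: str) -> str:
    """Format a 64-bit digest as a 16-digit hexadecimal string.

    Stand-in only: BLAKE2b is NOT the CACTVS hashcode algorithm; it merely
    illustrates the 64-bit / 16-hex-digit output format.
    """
    digest = hashlib.blake2b(structure_key.encode("utf-8"), digest_size=8).digest()
    value = int.from_bytes(digest, "big")  # unsigned integer in [0, 2**64)
    return f"{value:016X}"                 # zero-padded, 16 uppercase hex digits


# Hypothetical input; a real system hashes the normalized molecular graph,
# not a text string.
code = toy_hashcode("CC(=O)O")
print(code, len(code))  # 16 hex digits
```

If one assumes, only for this estimate, that the hash behaves like a uniformly random 64-bit function, the expected number of colliding pairs among n = 46 million structures is about n²/2⁶⁵ ≈ 6 × 10⁻⁵, which is why a 64-bit code can serve as a practically unique key.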
[Figure: the example compound drawn as a charged form, as tautomers, as a salt, as stereoisomers, and with isotope “errors”; each drawing is labeled with its CACTVS hashcode: A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9, 9850FD9F9E2B4E25]
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw CDX, PDB) → structure normalization → parent structure (MDL SDF, SMILES, database) → hashcode calculation (CACTVS E_HASHISY) → NCI/CADD Identifier
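The workflow above (normalize the input, obtain the parent structure, hash it, and attach the level code) can be sketched as a small composable pipeline. This is a toy illustration under stated assumptions: the dict-based `Structure`, the BLAKE2b stand-in for the CACTVS hashcode, and the names `toy_hashcode`, `make_identifier`, and `drop_counterion` are all hypothetical, not the NCI/CADD implementation.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# Hypothetical toy "structure": a dict of labeled features. Real systems read a
# full molecular graph (Molfile, SDF, SMILES, ...), not this.
Structure = Dict[str, Any]


def toy_hashcode(parent: Structure) -> str:
    """64-bit digest of a canonical JSON serialization (stand-in, not E_HASHISY)."""
    canonical = json.dumps(parent, sort_keys=True)
    digest = hashlib.blake2b(canonical.encode("utf-8"), digest_size=8).digest()
    return digest.hex().upper()  # 16 uppercase hex digits


def make_identifier(structure: Structure,
                    normalize: Callable[[Structure], Structure],
                    level_code: str) -> str:
    parent = normalize(structure)                  # structure normalization -> parent structure
    return f"{toy_hashcode(parent)}-{level_code}"  # hashcode calculation -> identifier


def drop_counterion(s: Structure) -> Structure:
    """Hypothetical normalization step: discard a counterion fragment."""
    return {k: v for k, v in s.items() if k != "counterion"}


salt = {"skeleton": "example", "counterion": "Na+"}
free = {"skeleton": "example"}
print(make_identifier(salt, drop_counterion, "uuuuu") ==
      make_identifier(free, drop_counterion, "uuuuu"))  # True
```

Passing the normalization as a function keeps the hashing step independent of which features a given identifier level chooses to ignore.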
NCI/CADD Structure Identifiers: Structure Normalization
• adjustable levels of sensitivity; each feature is handled as either sensitive or insensitive:
  – Fragments: sensitive, or insensitive (keep only the largest organic fragment)
  – Isotopes: sensitive, or insensitive (ignore isotope labels)
  – Charges: sensitive, or insensitive (uncharge)
  – Tautomers: sensitive, or insensitive (find the canonical tautomer)
  – Stereochemistry: sensitive, or insensitive (discard stereo information)
[Figure: example drawings for each feature, e.g. deuterium (D) labels, an ammonium/carboxylate (NH3+/O−) form, a sodium salt (Na+ O−), tautomeric forms, and amino acids (COOH, NH2)]
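The five on/off choices above map directly onto the five-character identifier codes used on the following slides (FICTS, FICuS, uuuuu, ...): a feature's letter means sensitive, a lowercase u means insensitive. A minimal sketch of that encoding and its inverse (the function names `level_code` and `sensitivities` are hypothetical):

```python
from typing import Dict

# Feature order as on the slides: Fragments, Isotopes, Charges, Tautomers, Stereochemistry.
FEATURES = ("F", "I", "C", "T", "S")


def level_code(fragments: bool, isotopes: bool, charges: bool,
               tautomers: bool, stereo: bool) -> str:
    """Build a code such as 'FICuS': letter = sensitive, 'u' = insensitive."""
    flags = (fragments, isotopes, charges, tautomers, stereo)
    return "".join(letter if sensitive else "u"
                   for letter, sensitive in zip(FEATURES, flags))


def sensitivities(code: str) -> Dict[str, bool]:
    """Decode a five-character code back into per-feature sensitivity flags."""
    if len(code) != 5 or any(c not in (f, "u") for c, f in zip(code, FEATURES)):
        raise ValueError(f"not a valid level code: {code!r}")
    return {f: c == f for c, f in zip(code, FEATURES)}


print(level_code(True, True, True, False, True))  # FICuS
print(sensitivities("uuuTS"))  # {'F': False, 'I': False, 'C': False, 'T': True, 'S': True}
```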
NCI/CADD Structure Identifiers: Structure Normalization
FICTS identifier: representation of the exact drawing
• F I C T S: sensitive to fragments, isotopes, charges, tautomers, and stereochemistry
[Figure: the example drawings for each feature compared with one another; = marks drawings that receive the same FICTS identifier, ≠ marks drawings that receive different ones]
NCI/CADD Structure Identifiers: Structure Normalization
FICuS identifier: comes closest to how a chemist perceives a compound
• F I C u S: sensitive to fragments, isotopes, charges, and stereochemistry; insensitive to tautomers (u)
[Figure: the same comparison for FICuS identifiers; = marks drawings that receive the same identifier, ≠ marks drawings that receive different ones]
NCI/CADD Structure Identifiers: Structure Normalization
uuuuu identifier: closely related forms of the same compound
• u u u u u: insensitive to fragments, isotopes, charges, tautomers, and stereochemistry
[Figure: every example drawing is marked =, i.e. all of them receive the same uuuuu identifier]
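The =/≠ marks on the FICTS, FICuS, and uuuuu slides can be mimicked with a toy model. In the sketch below (all names hypothetical), a drawing is a record of per-feature labels, and two drawings count as the same at a given level if they agree on every feature whose letter appears in the code. Real normalization computes, for example, a canonical tautomer from the molecular graph; this toy simply forgets the label for an insensitive feature.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyDrawing:
    """Hypothetical stand-in for one drawing of a compound (labels only)."""
    skeleton: str
    fragments: str = "single component"
    isotopes: str = "natural abundance"
    charges: str = "neutral"
    tautomer: str = "as drawn"
    stereo: str = "as drawn"


def key_at_level(d: ToyDrawing, code: str) -> tuple:
    """Keep a feature where the code has its letter, drop it where it has 'u'."""
    values = (d.fragments, d.isotopes, d.charges, d.tautomer, d.stereo)
    return (d.skeleton,) + tuple(v if c != "u" else None for c, v in zip(code, values))


def same_at_level(a: ToyDrawing, b: ToyDrawing, code: str) -> bool:
    return key_at_level(a, code) == key_at_level(b, code)


parent = ToyDrawing("example compound")
tautomer = replace(parent, tautomer="other tautomer")
salt = replace(parent, fragments="sodium salt")

for code in ("FICTS", "FICuS", "uuuuu"):
    print(code, same_at_level(parent, tautomer, code), same_at_level(parent, salt, code))
# FICTS False False
# FICuS True False
# uuuuu True True
```

The printed rows follow the pattern of the slides: FICTS distinguishes every variant, FICuS merges tautomers only, and uuuuu merges all closely related forms.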
NCI/CADD Structure Identifiers: Structure Normalization
input structure
• correct structure: add hydrogen atoms, correct functional groups, correct metal atom bonds
• define canonical resonance form / protonation state
• get largest fragment & uncharge: delete complex center, get largest organic fragment, delete radical center, uncharge structure
• discard isotope labels
• define canonical tautomer
• normalize or discard stereo information
parent structures: uuuuu, uuuuS, uuuTu, uuuTS, FICuu, FICuS, FICTS, FICTu
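The eight parent structures listed above are exactly the combinations of three binary choices: keep fragments, isotopes, and charges as drawn (FIC) or normalize all three (uuu); keep the drawn tautomer (T) or use the canonical one (u); keep normalized stereo (S) or discard it (u). The grouping of F, I, and C into a single choice is inferred from the codes listed, not stated on the slide. A short sketch that reproduces the list:

```python
from itertools import product

# Three binary branches inferred from the parent-structure codes on the slide.
FIC_BRANCH = ("FIC", "uuu")   # keep as drawn vs. largest fragment, no isotopes, uncharged
TAUTOMER_BRANCH = ("T", "u")  # drawn tautomer vs. canonical tautomer
STEREO_BRANCH = ("S", "u")    # normalized stereo vs. discarded stereo

parent_codes = ["".join(parts)
                for parts in product(FIC_BRANCH, TAUTOMER_BRANCH, STEREO_BRANCH)]
print(parent_codes)
# ['FICTS', 'FICTu', 'FICuS', 'FICuu', 'uuuTS', 'uuuTu', 'uuuuS', 'uuuuu']
```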
NCI/CADD Structure Identifiers
9850FD9F9E2B4E25-FICTS-01-57
9850FD9F9E2B4E25-FICuS-01-78
9850FD9F9E2B4E25-uuuuu-01-27
[Figure: example compound]
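The three identifiers above share one pattern: a 16-digit hexadecimal hashcode, a five-character level code, and two trailing two-digit fields. The slide does not explain the trailing fields ("01" and "57", "78", "27"), so the sketch below only splits them off and keeps them as opaque strings; the regular expression is inferred from these three examples and is an assumption, and `ParsedIdentifier`, `field3`, and `field4` are hypothetical names.

```python
import re
from typing import NamedTuple


class ParsedIdentifier(NamedTuple):
    hashcode: str  # 16 hex digits
    level: str     # five-character sensitivity code, e.g. "FICTS"
    field3: str    # e.g. "01"; meaning not stated on the slide
    field4: str    # e.g. "57"; meaning not stated on the slide


# Pattern inferred from the example identifiers (an assumption).
ID_PATTERN = re.compile(r"^([0-9A-F]{16})-([Fu][Iu][Cu][Tu][Su])-(\d{2})-(\d{2})$")


def parse_identifier(text: str) -> ParsedIdentifier:
    m = ID_PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized identifier: {text!r}")
    return ParsedIdentifier(*m.groups())


for s in ("9850FD9F9E2B4E25-FICTS-01-57",
          "9850FD9F9E2B4E25-FICuS-01-78",
          "9850FD9F9E2B4E25-uuuuu-01-27"):
    p = parse_identifier(s)
    print(p.hashcode, p.level, p.field3, p.field4)
```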
FICTS
[Figure: the example compound drawn as a charged form, as tautomers, as a salt, as stereoisomers, and with isotope “errors”, labeled with FICTS identifiers: A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS]
FICuS
[Figure: the same drawings labeled with FICuS identifiers: A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS]
[Figure: the same drawings labeled with uuuuu identifiers: 9850FD9F9E2B4E25-uuuuu (seven drawings) and 9850FD9F9E2B4E25-FICuS (one drawing)]