InChI/InChIKey vs. NCI/CADD Structure Identifiers: A Comparison



  • Comparison Standard InChI/InChIKeys - NCI/CADD Structure Identifiers

    InChI/InChIKey vs. NCI/CADD Structure Identifiers: A Comparison

    Markus Sitzmann

    Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS

  • The Adaption and Use of the IUPAC InChI/InChIKey

    Chemical Structure Lookup Service: 74 million structure records, 46 million unique structures

    NCI/CADD Identifiers: FICTS, FICuS, uuuuu

    InChI/InChIKey: Std. InChI/InChIKey

  • NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures

    •  based on hashcodes calculated by the chemoinformatics toolkit CACTVS

    •  CACTVS hashcodes:

        •  represent a chemical structure uniquely as a 16-digit hexadecimal number (64-bit unsigned)

        •  have a high sensitivity to structural features of a compound

        •  change if connectivity changes

    [Figure: example structure drawing with its hashcode, 9850FD9F9E2B4E25]
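The "16-digit hexadecimal number (64-bit unsigned)" format can be illustrated with a small round-trip conversion. This is only a sketch of the number format described on the slide. It does not compute a CACTVS hashcode, and the function names are hypothetical.

```python
import re

# A hashcode as shown on the slide: exactly 16 hexadecimal digits.
_HASHCODE_RE = re.compile(r"[0-9A-Fa-f]{16}")


def hashcode_to_int(hashcode: str) -> int:
    """Interpret a 16-digit hexadecimal hashcode as an unsigned 64-bit integer."""
    if _HASHCODE_RE.fullmatch(hashcode) is None:
        raise ValueError("expected exactly 16 hexadecimal digits")
    return int(hashcode, 16)


def int_to_hashcode(value: int) -> str:
    """Format an unsigned 64-bit integer as 16 upper-case hexadecimal digits."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit into 64 unsigned bits")
    return f"{value:016X}"


if __name__ == "__main__":
    example = "9850FD9F9E2B4E25"  # hashcode shown on the slide
    print(int_to_hashcode(hashcode_to_int(example)) == example)
```

Sixteen hexadecimal digits carry four bits each, which is where the 64-bit width comes from.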

  • [Figure: the example structure (hashcode 9850FD9F9E2B4E25) surrounded by related drawings, grouped as charged form, tautomers, isotope, “errors”, stereoisomers, and salt, each with its own hashcode]

    Hashcodes shown for the related drawings: A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9

  • NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures

    [Figure: workflow diagram]

    input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure (MDL SDF, SMILES, database) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier

  • NCI/CADD Structure Identifiers: Structure Normalization

    •  adjustable levels of sensitivity:

        •  Fragments: sensitive, or un-sensitive (keep only largest organic fragment)

        •  Isotopes: sensitive, or un-sensitive (ignore isotope labels)

        •  Charges: sensitive, or un-sensitive (uncharge)

        •  Tautomers: sensitive, or un-sensitive (find canonical tautomer)

        •  Stereochemistry: sensitive, or un-sensitive (discard stereo information)

    [Figure: example structure pairs for each of the five features]
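The five adjustable sensitivity levels can be modelled as a small settings object that reports which normalization operations a given setting implies. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, the step descriptions are copied from the slide, and the list order follows the slide's feature order rather than any documented execution order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sensitivity:
    """One flag per feature; True means the identifier is sensitive to it."""

    fragments: bool = True
    isotopes: bool = True
    charges: bool = True
    tautomers: bool = True
    stereochemistry: bool = True

    def normalization_steps(self) -> List[str]:
        """List the operations the slide associates with each un-sensitive feature."""
        steps = []
        if not self.fragments:
            steps.append("keep only largest organic fragment")
        if not self.isotopes:
            steps.append("ignore isotope labels")
        if not self.charges:
            steps.append("uncharge")
        if not self.tautomers:
            steps.append("find canonical tautomer")
        if not self.stereochemistry:
            steps.append("discard stereo information")
        return steps


if __name__ == "__main__":
    # Sensitive to everything except tautomers.
    print(Sensitivity(tautomers=False).normalization_steps())
```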


  • NCI/CADD Structure Identifiers: Structure Normalization

    FICTS identifier: representation of the exact drawing

    All five features are sensitive: F (Fragments), I (Isotopes), C (Charges), T (Tautomers), S (Stereochemistry)

    [Figure: example structure pairs for each feature, marked = or ≠]

  • NCI/CADD Structure Identifiers: Structure Normalization

    FICuS identifier: comes closest to how a chemist perceives a compound

    Sensitive: F (Fragments), I (Isotopes), C (Charges), S (Stereochemistry); un-sensitive: u (Tautomers)

    [Figure: example structure pairs for each feature, marked = or ≠]

  • NCI/CADD Structure Identifier: Structure Normalization

    uuuuu identifier: closely related forms of the same compound

    All five features are un-sensitive (u u u u u): Fragments, Isotopes, Charges, Tautomers, Stereochemistry

    [Figure: example structure pairs for each feature, all marked =]
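The three identifier names above follow one pattern: each of the five positions holds either the feature's letter (sensitive) or a `u` (un-sensitive). A minimal sketch of decoding such a name into per-feature flags follows. The function name and the dictionary layout are hypothetical choices for illustration.

```python
from typing import Dict

# Feature names and their letters, in the order used on the slides.
FEATURES = ("Fragments", "Isotopes", "Charges", "Tautomers", "Stereochemistry")
LETTERS = "FICTS"


def decode_identifier_type(identifier_type: str) -> Dict[str, bool]:
    """Map a type string such as 'FICuS' to {feature: is_sensitive}."""
    if len(identifier_type) != len(LETTERS):
        raise ValueError("identifier type must have exactly five characters")
    flags = {}
    for feature, letter, char in zip(FEATURES, LETTERS, identifier_type):
        if char == letter:
            flags[feature] = True   # position carries its letter: sensitive
        elif char == "u":
            flags[feature] = False  # position carries 'u': un-sensitive
        else:
            raise ValueError(f"unexpected character {char!r} for {feature}")
    return flags


if __name__ == "__main__":
    for name in ("FICTS", "FICuS", "uuuuu"):
        print(name, decode_identifier_type(name))
```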

  • NCI/CADD Structure Identifier: Structure Normalization

    input structure

    Normalization steps:

    •  correct structure: add hydrogen atoms, correct functional groups, correct metal atom bonds

    •  get largest fragment & uncharge: delete complex center, get largest organic fragment, delete radical center, uncharge structure

    •  define canonical resonance form/protonation state

    •  define canonical tautomer

    •  discard isotope labels

    •  normalize (n) or discard (d) stereo information

    parent structures: uuuuu, uuuuS, uuuTu, uuuTS, FICuu, FICuS, FICTS, FICTu
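The eight parent structure types on this slide are consistent with the fragment, isotope, and charge flags being switched together while the tautomer and stereo flags vary independently (2 × 2 × 2 = 8). This is an observation about the labels on the slide, not a documented rule, and the sketch below only checks that the enumeration reproduces them.

```python
from itertools import product

# Type labels as they appear on the slide.
SLIDE_VARIANTS = {
    "uuuuu", "uuuuS", "uuuTu", "uuuTS",
    "FICuu", "FICuS", "FICTS", "FICTu",
}


def enumerate_variants():
    """Combine a joint F/I/C block with independent T and S flags."""
    return [
        fic + tautomer + stereo
        for fic, tautomer, stereo in product(("uuu", "FIC"), ("u", "T"), ("u", "S"))
    ]


if __name__ == "__main__":
    variants = enumerate_variants()
    print(variants)
    print(set(variants) == SLIDE_VARIANTS)
```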

  • NCI/CADD Structure Identifier

    9850FD9F9E2B4E25-FICTS-01-57

    9850FD9F9E2B4E25-FICuS-01-78

    9850FD9F9E2B4E25-uuuuu-01-27

    [Figure: example structure drawing]
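The full identifiers shown here have four hyphen-separated parts: the 16-digit hashcode, the five-character type, and two trailing two-digit fields. The slide does not say what the trailing fields mean, so the sketch below keeps them as opaque strings under the placeholder names `suffix_1` and `suffix_2`. The parser itself is a hypothetical helper, not part of any NCI/CADD tool.

```python
import re
from typing import NamedTuple


class ParsedIdentifier(NamedTuple):
    hashcode: str  # 16 hexadecimal digits
    id_type: str   # e.g. FICTS, FICuS, uuuuu
    suffix_1: str  # meaning not stated on the slide
    suffix_2: str  # meaning not stated on the slide


# Each type position is either its own letter or 'u'.
_IDENTIFIER_RE = re.compile(
    r"([0-9A-F]{16})-([Fu][Iu][Cu][Tu][Su])-(\d{2})-(\d{2})"
)


def parse_identifier(text: str) -> ParsedIdentifier:
    """Split a full identifier string into its four visible parts."""
    match = _IDENTIFIER_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised identifier: {text!r}")
    return ParsedIdentifier(*match.groups())


if __name__ == "__main__":
    print(parse_identifier("9850FD9F9E2B4E25-FICuS-01-78"))
```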

  • FICTS

    [Figure: the example structure and its related drawings (charged form, tautomers, isotope, “errors”, stereoisomers, salt), each labeled with its FICTS identifier]

    Identifiers shown: A3DAE0788050DDE4-FICTS, E5F83F10C5DB080A-FICTS, B2FDA68AEDA06DB9-FICTS, 9850FD9F9E2B4E25-FICTS, E5F83F10C5DB080A-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS

  • FICuS

    [Figure: the example structure and its related drawings (charged form, tautomers, isotope, “errors”, stereoisomers, salt), each labeled with its FICuS identifier]

    Identifiers shown: A3DAE0788050DDE4-FICuS, E5F83F10C5DB080A-FICuS, B2FDA68AEDA06DB9-FICuS, 9850FD9F9E2B4E25-FICuS, E5F83F10C5DB080A-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-FICuS

  • uuuuu

    [Figure: the example structure and its related drawings, each labeled with its identifier]

    Identifiers shown: 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-FICuS, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu, 9850FD9F9E2B4E25-uuuuu
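The effect of lowering the sensitivity can be made concrete by counting how many distinct hashcodes appear among the labels on each of the three slides above. The lists below are transcribed from those slides in reading order. Which label belongs to which drawing is not recoverable from the transcript, and one label on the uuuuu slide carries a FICuS suffix, so only the hashcode parts are compared.

```python
# Hashcodes transcribed from the FICTS, FICuS and uuuuu slides.
LABELS = {
    "FICTS": [
        "A3DAE0788050DDE4", "E5F83F10C5DB080A", "B2FDA68AEDA06DB9",
        "9850FD9F9E2B4E25", "E5F83F10C5DB080A", "E92E4BA2869F3611",
        "8A7AD1EB498CC76A", "6C16DE2351F9FF50", "9850FD9F9E2B4E25",
    ],
    "FICuS": [
        "A3DAE0788050DDE4", "E5F83F10C5DB080A", "B2FDA68AEDA06DB9",
        "9850FD9F9E2B4E25", "E5F83F10C5DB080A", "E92E4BA2869F3611",
        "8A7AD1EB498CC76A", "9850FD9F9E2B4E25", "9850FD9F9E2B4E25",
    ],
    "uuuuu": ["9850FD9F9E2B4E25"] * 8,
}


def distinct_counts(labels):
    """Count the distinct hashcodes listed for each identifier type."""
    return {id_type: len(set(codes)) for id_type, codes in labels.items()}


if __name__ == "__main__":
    for id_type, count in distinct_counts(LABELS).items():
        print(f"{id_type}: {count} distinct hashcodes")
```

Fewer distinct values at a less sensitive level means more of the related drawings are grouped under the same identifier.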