Interaction fingerprint: 1D representation of 3D protein-ligand complexes


Description

The structural interaction fingerprint (IFP) was introduced to overcome the shortcomings of existing scoring functions. An IFP is a binary string that encodes the presence or absence of interactions between a ligand and the amino acids of a protein binding site. It is a convenient way to compare and analyze ligand binding poses.

Transcript of Interaction fingerprint: 1D representation of 3D protein-ligand complexes

Page 1: Interaction fingerprint: 1D representation of 3D protein-ligand complexes

Interaction fingerprints

Vladimir Chupakhin, UNISTRA, 2011


Chupakhin Vladimir, Laboratory of Chemoinformatics, Structural Chemogenomics Group, University of Strasbourg

December 2011

Page 2

Virtual screening approaches

Ligand-based (QSAR, similarity search, pharmacophores)

Structure-based (docking, pharmacophores)

?

Page 3

Lock-and-key paradigm

[Figure: lock, key, interactions]

Page 4

Molecular docking: main steps

1. Protein and ligand preparation
2. Binding site identification
3. Conformational search with scoring of the generated poses

Page 5

Geometry of interaction

H-bond length (3.0 Å)

H-bond angle (~175°)

Interactions are geometry!
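Because interactions are defined by geometry, an H-bond can be detected with a simple distance-plus-angle test. The sketch below is a minimal illustration, assuming Cartesian coordinates in Å for the donor (D), its hydrogen (H) and the acceptor (A). The cutoffs (D···A at most 3.5 Å, D-H···A angle at least 120°) are hypothetical tolerance values chosen for this example; the slide quotes the ideal geometry (about 3.0 Å and ~175°), not detection thresholds.

```python
import math

Vec = tuple[float, float, float]

def distance(a: Vec, b: Vec) -> float:
    """Euclidean distance between two points (in Å)."""
    return math.dist(a, b)

def angle_deg(a: Vec, vertex: Vec, b: Vec) -> float:
    """Angle a-vertex-b in degrees."""
    u = [a[i] - vertex[i] for i in range(3)]
    v = [b[i] - vertex[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

# Hypothetical tolerance cutoffs (assumptions, not taken from the slides).
MAX_DA_DISTANCE = 3.5   # donor-acceptor distance, Å
MIN_DHA_ANGLE = 120.0   # D-H...A angle, degrees

def is_hbond(donor: Vec, hydrogen: Vec, acceptor: Vec) -> bool:
    """True if the D-H...A geometry passes both the distance and the angle check."""
    return (distance(donor, acceptor) <= MAX_DA_DISTANCE
            and angle_deg(donor, hydrogen, acceptor) >= MIN_DHA_ANGLE)
```

For example, a linear arrangement with D at the origin, H at 1 Å and A at 3 Å along the same axis passes (3.0 Å, 180°), while moving A off-axis to form a 90° angle fails the angle test even though the distance is short.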

Page 6

Different types of interactions

- Hydrophobic
- H-bonds
- Ionic
- Aromatic
- Cation-π

Page 7

Self-docking

Extract ligand → Modify geometry → Dock to the same protein → Extract ligand → Calculate RMSD

[Figure: poses shown in blue, red and orange; RMSD values 1.1 Å and 4.3 Å]

Page 8

Docking quality: RMSD

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\delta_i^{2}}$$

where δi (i = 1, …, N) is the distance between the i-th of N pairs of equivalent atoms.
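A direct transcription of the RMSD formula, assuming the two poses are given as coordinate lists in which the i-th entries are equivalent atoms (atom matching and symmetry handling are outside this sketch), and that no superposition is applied, as when a docked pose and the X-ray pose share the same receptor frame:

```python
import math
from typing import Sequence

Coord = Sequence[float]

def rmsd(pose_a: Sequence[Coord], pose_b: Sequence[Coord]) -> float:
    """RMSD = sqrt(1/N * sum(delta_i^2)) over N pairs of equivalent atoms."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    # delta_i^2: squared distance between the i-th pair of equivalent atoms
    squared = [math.dist(a, b) ** 2 for a, b in zip(pose_a, pose_b)]
    return math.sqrt(sum(squared) / len(squared))
```

A docked pose is commonly counted as correct when its RMSD to the X-ray pose is below 2 Å.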

Page 9

Cross-docking

The procedure is the same as for self-docking, except that the ligand is docked into a structure of the protein taken from a different complex. But why? Robustness!

These fluctuations have a huge influence on the docking results.

Page 10

Scoring functions

1. Force-field scoring functions (DOCK, AutoDock, GOLD)

2. Empirical scoring functions (ChemScore, PLP, Glide SP/XP)

3. Knowledge-based scoring functions (PMF, DrugScore, ASP, SMoG)

[Figure: ligand atoms and protein atoms]

Page 11

Force-field scoring function

DOI: 10.1038/nrd1549

[Figure: protein-ligand interaction energy terms; ligand energy terms]

Algorithm (force-field based), for a given protein-ligand (PL) complex:
1. Calculate the interaction energies between ligand and protein atoms (E_vdW + E_H-bond) using a force field.
2. Calculate the internal energy of the ligand (E_vdW + E_torsion), optionally including internal H-bonds of the ligand.
3. Total energy = sum of the energy terms from steps 1 and 2.
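A toy version of these three steps, assuming a single 12-6 Lennard-Jones term with one hypothetical ε/σ pair for every atom pair stands in for E_vdW, and that the H-bond term and the ligand internal energy are supplied precomputed. Real force fields use atom-type-specific parameters plus explicit H-bond and electrostatic terms, so this only shows how the terms are summed.

```python
import math
from typing import Sequence

Coord = Sequence[float]

def lennard_jones(r: float, epsilon: float = 0.2, sigma: float = 3.5) -> float:
    """12-6 Lennard-Jones energy; epsilon (kcal/mol) and sigma (Å) are hypothetical."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def interaction_energy(ligand: Sequence[Coord], protein: Sequence[Coord],
                       e_hbond: float = 0.0) -> float:
    """Step 1: E_vdW summed over all ligand-protein atom pairs, plus E_H-bond."""
    e_vdw = sum(lennard_jones(math.dist(l, p)) for l in ligand for p in protein)
    return e_vdw + e_hbond

def total_energy(ligand, protein, e_hbond: float, e_internal: float) -> float:
    """Step 3: interaction energy (step 1) + ligand internal energy (step 2)."""
    return interaction_energy(ligand, protein, e_hbond) + e_internal
```

The Lennard-Jones term reaches its minimum of -ε at r = 2^(1/6)·σ and is zero at r = σ.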

Page 12

Empirical scoring function

Algorithm (additive scheme):
1. Define interaction types and geometries.
2. Look up the interaction energies in a database.
3. Total energy = sum of the contributions of every component (+ influence of a geometry term).

[Figure: LUDI scoring function]

DOI: 10.1038/nrd1549

Empirical scoring functions (ESFs) are fitted to reproduce binding energies or conformations, so an ESF depends on the training set used to develop it.
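The additive scheme can be sketched as a weighted sum over interaction terms, ΔG = ΔG0 + Σk wk·nk. The term names, weights and constant below are hypothetical placeholders, not the published LUDI or ChemScore coefficients; in a real empirical function they are fitted to a training set of complexes with known affinities, which is exactly why the result depends on that set.

```python
# Hypothetical fitted weights (kcal/mol per unit of each term).
WEIGHTS = {
    "hbond": -1.0,          # per ideal hydrogen bond
    "ionic": -2.0,          # per ionic interaction
    "lipophilic": -0.1,     # per unit of lipophilic contact (e.g., Å^2)
    "rotatable_bond": 0.3,  # entropy penalty per frozen rotatable bond
}
DELTA_G0 = 1.3  # hypothetical constant term

def empirical_score(terms: dict[str, float],
                    weights: dict[str, float] = WEIGHTS,
                    delta_g0: float = DELTA_G0) -> float:
    """Additive scheme: constant + sum of weight * term value.

    Term values may already be geometry-scaled (e.g., 0.5 for a distorted
    H-bond), which is one way to include the geometry-term influence.
    """
    unknown = set(terms) - set(weights)
    if unknown:
        raise KeyError(f"no weight for terms: {sorted(unknown)}")
    return delta_g0 + sum(weights[name] * value for name, value in terms.items())
```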

Page 13

Knowledge-based scoring function

DOI: 10.1038/nrd1549

Algorithm:
1. Define interaction types and geometries.
2. Look up ligand-protein (LP) atom-pair interactions in a database.
3. Total score (energy) = sum of the interaction scores (energies).

(ϒ: adjustable parameter; SAS0: solvent-accessible surface area in the solvated state)

Knowledge-based scoring functions (KBSFs) are developed to reproduce the binding pose rather than the binding energy.
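Knowledge-based potentials are commonly derived by inverse Boltzmann statistics: for an atom-type pair and a distance bin, w(r) = -kT·ln(g_obs(r)/g_ref(r)), and a pose is scored by summing w over its ligand-protein atom pairs. The sketch assumes the observed and reference pair frequencies were already counted from a structure database; the bin width, the pseudocount and kT ≈ 0.593 kcal/mol (about 298 K) are assumptions, and the solvation term with ϒ and SAS0 is omitted.

```python
import math

KT = 0.593  # kcal/mol at ~298 K (assumed temperature)

def pmf_potential(observed: float, reference: float, pseudocount: float = 1e-3) -> float:
    """Inverse Boltzmann: w = -kT * ln(g_obs / g_ref) for one pair type and distance bin."""
    return -KT * math.log((observed + pseudocount) / (reference + pseudocount))

def kb_score(pairs, potentials, bin_width: float = 0.5) -> float:
    """Sum pair potentials over (ligand_type, protein_type, distance) tuples.

    `potentials` maps (ligand_type, protein_type, bin_index) -> w; pairs whose
    bin is missing from the table contribute 0.
    """
    total = 0.0
    for lig_type, prot_type, dist in pairs:
        bin_index = int(dist // bin_width)
        total += potentials.get((lig_type, prot_type, bin_index), 0.0)
    return total
```

Pair distances observed more often than in the reference state get negative (favourable) potentials.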

Page 14

Scoring functions: the purposes

Docking = finding the correct binding pose

Scoring = predicting the activity of the compound (Ki, IC50, etc.)

Page 15

Scoring functions: docking

Docking

The average success rate for docking a compound within RMSD < 2 Å is around 70%.

Source: Comparative Assessment of Scoring Functions on a Diverse Test Set, Wang, 2009

Page 16

Scoring functions: scoring

Scoring

The average success rate for ranking compounds, measured by the correlation coefficient, is 55-64%.

Source: Comparative Assessment of Scoring Functions on a Diverse Test Set, Wang, 2009

Page 17

GOLD Score failure

            pose 1   pose 2
GOLD Score  59.19    59.30
RMSD, Å     1.10     4.27

Top-scored pose: pose 2 (higher GOLD Score), despite its RMSD of 4.27 Å.

Page 18

Molecular scoring functions: problems

1. Problems when the binding site is highly charged or highly hydrophobic/hydrophilic
2. Problems when the binding site contains waters, ions or cofactors
3. Fragment-like docking is very tricky
4. Even the input conformation can influence the docking results

Page 19

Interaction fingerprints

Page 20

Chemical fingerprint

Fingerprints encode the presence or absence of certain features in a compound, e.g., fragments.

0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0

KISS: Keep It Short and Simple! (Keep It Simple, Stupid)
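A dictionary-based (structural-key) fingerprint can be sketched in a few lines, assuming the fragments present in each compound have already been identified by a substructure search (not shown). The fragment dictionary below is a hypothetical example.

```python
# Hypothetical fragment dictionary: bit i is set if fragment i is present.
FRAGMENT_KEYS = ["benzene", "carboxylic_acid", "amide", "hydroxyl",
                 "primary_amine", "pyridine", "halogen", "sulfonamide"]

def key_fingerprint(fragments_present: set[str],
                    keys: list[str] = FRAGMENT_KEYS) -> list[int]:
    """0/1 list: 1 if the compound contains the fragment for that key.

    Fragments that are not in the dictionary are simply ignored.
    """
    return [1 if key in fragments_present else 0 for key in keys]

def to_bitstring(bits: list[int]) -> str:
    return "".join(str(b) for b in bits)
```

For a compound annotated with benzene, carboxylic acid and ester fragments, the ester is not in the dictionary, so the fingerprint is 11000000.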

Page 21

Structural Interaction Fingerprints

Zhan Deng, Claudio Chuaqui, and Juswinder Singh. Structural Interaction Fingerprint (SIFt): A Novel Method for Analyzing Three-Dimensional Protein-Ligand Binding Interactions (DOI: 10.1021/jm030331x), Biogen Inc.

Detect interactions of the ligand with every amino acid of the binding site

Page 22

Interaction Fingerprints: preparation

Interaction types (one bit each): hydrophobic; aromatic face to face; aromatic face to edge; H-bond (protein donor); H-bond (protein acceptor); ionic (protein cation); ionic (protein anion)

Bitstring for 1 residue: 1 0 0 0 1 0 0

Bitstring for the whole binding site = Interaction Fingerprint:
100100010000101000000100000010000001 … (residue 1, residue 2, residue 3, residue 4, residue 5, …, residue X)

Source: Optimizing Fragment and Scaffold Docking by Use of Molecular Interaction Fingerprints (2007)
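Building the whole-site fingerprint amounts to concatenating the 7-bit residue blocks in a fixed residue order. This sketch assumes interaction detection has already been done (a set of detected interaction types per residue). The bit order used here (hydrophobic, aromatic face to face, aromatic face to edge, H-bond protein donor, H-bond protein acceptor, ionic protein cation, ionic protein anion) is an assumption of the example; any fixed order works as long as all compared fingerprints use the same one.

```python
# Assumed bit order within each residue block (7 bits per residue).
INTERACTION_TYPES = [
    "hydrophobic",
    "aromatic_face_to_face",
    "aromatic_face_to_edge",
    "hbond_protein_donor",
    "hbond_protein_acceptor",
    "ionic_protein_cation",
    "ionic_protein_anion",
]

def residue_bits(detected: set[str]) -> str:
    """7-bit block for one residue: 1 where that interaction type was detected."""
    unknown = detected - set(INTERACTION_TYPES)
    if unknown:
        raise ValueError(f"unknown interaction types: {sorted(unknown)}")
    return "".join("1" if t in detected else "0" for t in INTERACTION_TYPES)

def interaction_fingerprint(site_residues: list[str],
                            interactions: dict[str, set[str]]) -> str:
    """Concatenate residue blocks in the fixed binding-site order.

    Residues without any detected interaction get an all-zero block, so every
    pose in the same binding site yields a fingerprint of the same length.
    """
    return "".join(residue_bits(interactions.get(res, set())) for res in site_residues)
```

Keeping the all-zero blocks is what makes fingerprints of different poses directly comparable bit by bit.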

Page 23

Molecular Interaction Fingerprints (IFP)

ILE10   1000000
VAL18   1000000
ALA31   1000000
LYS33   1000000
VAL64   1000000
PHE80   1010000
GLU81   0000100
PHE82   1100000
LEU83   1001000
HIS84   1000000
GLN85   1000000
ASP86   1000101
LEU134  1000000
ALA144  1000000
ASP145  1000000

Deng, Chuaqui and Singh, Structural Interaction Fingerprint (SIFt) (DOI: 10.1021/jm030331x), Biogen Inc.

1000000100000010000001000000100000010000001000101100000010000001000000

3D → 1D (bit string)

Page 24

Parameters of IFP

• interacting patterns (an amino acid can be represented as a residue or as a pharmacophoric point; the interacting fragment of the ligand can be encoded as an atom, a fragment or a pharmacophoric point);
• type of interaction (hydrogen bonds, hydrophobic interactions, etc.);
• direction of interaction (distinguishes, for example, whether the hydrogen-bond donor is the protein or the ligand);
• strength of interaction and distance between interacting patterns (these parameters are research-specific);
• number of bits per interaction point (one or many).

[Figure: Ligand ↔ Receptor]

Page 25

Gold scoring function failure: IFP wins!

Ligand A07 from a ligand-receptor (LR) complex (PDB ID: 3LFS), docked into the CDK2 binding site (PDB ID: 2A0C).

Pose 1, orange: TC(real vs. docked) = 0.75, RMSD = 1.10 Å, GoldScore = 59.20
Pose 2, blue: TC(real vs. docked) = 0.52, RMSD = 4.27 Å, GoldScore = 59.30
X-ray pose: brick red

TC: Jaccard (Tanimoto) coefficient between the IFPs of the X-ray (real) pose and the docked pose
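For binary fingerprints, the Jaccard (Tanimoto) coefficient is TC = c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits set in both. A minimal sketch on bit strings of equal length; treating two all-zero fingerprints as identical is a convention assumed here.

```python
def tanimoto(fp1: str, fp2: str) -> float:
    """Jaccard/Tanimoto coefficient of two equal-length '0'/'1' strings."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    a = fp1.count("1")
    b = fp2.count("1")
    c = sum(1 for x, y in zip(fp1, fp2) if x == y == "1")
    union = a + b - c
    return 1.0 if union == 0 else c / union
```

Ranking poses by TC against the X-ray IFP instead of by GoldScore puts pose 1 (TC 0.75) ahead of pose 2 (TC 0.52), which matches the RMSD verdict.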

Page 26

IFP usage
• store interactions in a useful format
• analyze experimental LR-complexes
• assess the quality of docking studies
• cluster results (even peptides and PPI)
• analyze docked LR-complexes (drug-like and fragment-like compounds)
• retrieve the correct binding pose
• retrieve a specific binding pose

Page 27

Use cases for IFP: storage

IFPs are a useful way to store interaction information from experimentally derived LR-complexes:

• scPDB database, Laboratory of Didier Rognan, UNISTRA, Illkirch (DOI: 10.1021/ci050372x)

• CREDO database (DOI: 10.1111/j.1747-0285.2008.00762.x)

Page 28

Use cases for IFP: X-ray LR analysis

[Figure: compounds, binding site, specific interactions]

DOI: 10.1021/jm030331x

Page 29

Use cases for IFP: pose retrieval (1)

RMSD is not a 100% reliable evaluation function!

DOI: 10.1021/ci600342e

Page 30

Use cases for IFP: virtual screening (VS)

Compare the IFP of the reference X-ray complex with the IFPs of the docked poses using the Tanimoto coefficient (TC).

[Figure: compounds database → virtual screening results]

Using a standard SF: X% of the real hits
Using a standard SF + TC: X% + up to 20%
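The screening step can be sketched as post-processing of docking output: for each compound, keep the pose whose IFP is most similar to the reference X-ray IFP, then rank compounds by that similarity. The input layout (compound id mapped to a list of pose IFP strings) and the optional TC cutoff are assumptions for illustration. The slide combines TC with the standard scoring function but does not say how, so this sketch uses TC alone.

```python
def tanimoto(fp1: str, fp2: str) -> float:
    """Jaccard/Tanimoto coefficient of two equal-length '0'/'1' strings."""
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")
    c = sum(1 for x, y in zip(fp1, fp2) if x == y == "1")
    union = fp1.count("1") + fp2.count("1") - c
    return 1.0 if union == 0 else c / union

def rank_by_ifp(reference_ifp: str, poses: dict[str, list[str]],
                min_tc: float = 0.0) -> list[tuple[str, float]]:
    """Rank compounds by the best TC between any of their poses and the reference.

    poses: compound id -> list of IFP bit strings, one per docked pose
    (hypothetical input layout). Compounds below min_tc are dropped.
    """
    best = {}
    for compound, ifps in poses.items():
        if ifps:
            best[compound] = max(tanimoto(reference_ifp, fp) for fp in ifps)
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(cpd, tc) for cpd, tc in ranked if tc >= min_tc]
```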

Page 31

Use cases for IFP: PPI

IFP is suitable even for the analysis of protein-protein interactions (PPI)!

Page 32

Use cases for IFP: agonists/antagonists

(A) Procaterol: agonist; (B) Carvedilol: antagonist

Selective Structure-Based Virtual Screening for Full and Partial Agonists of the β2 Adrenergic Receptor, DOI: 10.1021/jm800710x

Page 33

IFP modifications

Page 34

IFP modifications: r-SIFt (R-group IFP)

Standard IFP for LEU83: 1001000
r-SIFt for LEU83: 110 (one bit each for C, R1, R2)

Benefits: analysis of combinatorial libraries (~100,000 compounds)

Independent of interaction type! Only the fact of interaction is encoded.

DOI: 10.1021/jm050381x
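The r-SIFt idea (one bit per ligand part per residue, set whenever that part interacts with the residue, whatever the interaction type) can be sketched as follows. It assumes the ligand has already been split into a core (C) and R-groups, and that each detected contact is annotated with the ligand part involved; the input format is hypothetical.

```python
LIGAND_PARTS = ["C", "R1", "R2"]  # core and R-groups, as on the slide

def r_sift(site_residues: list[str],
           contacts: list[tuple[str, str]],
           parts: list[str] = LIGAND_PARTS) -> dict[str, str]:
    """Per-residue bit string with one bit per ligand part.

    contacts: (residue, ligand_part) pairs for every detected interaction of
    any type; the interaction type is deliberately ignored.
    """
    touched = {res: set() for res in site_residues}
    for residue, part in contacts:
        if residue in touched:
            touched[residue].add(part)
    return {res: "".join("1" if p in touched[res] else "0" for p in parts)
            for res in site_residues}
```

With LEU83 contacted by the core and by R1, this reproduces the 110 block shown on the slide.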

Page 35

IFP modifications: w-SIFt (weighted IFP)

+ Biological activity

+ Machine learning approach: find correlations between bit frequencies and activity

DOI: 10.1021/ci800466n

[Figure: compounds grouped as most active, moderate activity, less active]

Benefits:
• helps to find which interactions are critical for compound potency
• interpretable, position-dependent scoring function for ligand-protein interactions
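A weighted fingerprint score can be sketched as a sum of per-bit weights over the bits that are set. How w-SIFt actually derives its weights is described in the cited paper; the scheme below (weight = frequency of the bit among actives minus its frequency among inactives) is a simple stand-in chosen for illustration, not the published method.

```python
def bit_frequencies(fps: list[str]) -> list[float]:
    """Fraction of fingerprints in which each bit position is set."""
    n = len(fps)
    return [sum(fp[i] == "1" for fp in fps) / n for i in range(len(fps[0]))]

def learn_weights(actives: list[str], inactives: list[str]) -> list[float]:
    """Stand-in weighting: freq(active) - freq(inactive) per bit position."""
    fa = bit_frequencies(actives)
    fi = bit_frequencies(inactives)
    return [a - i for a, i in zip(fa, fi)]

def weighted_score(fp: str, weights: list[float]) -> float:
    """Position-dependent score: sum of the weights of the bits that are set."""
    return sum(w for bit, w in zip(fp, weights) if bit == "1")
```

Because each weight belongs to one residue and interaction type, large positive weights point directly at interactions that distinguish potent compounds.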

Page 36

Binding site independent IFP

Page 37

BS-independent IFP: APIF

APIF: A New Interaction Fingerprint Based on Atom Pairs and Its Application to Virtual Screening

[Figure: atom pair; distance → distance range]

Algorithm:
1. Detect interaction patterns (hydrophobic, HBA, HBD).
2. Define distance 1 and distance 2 for each quadruplet interaction.
3. Convert the distances to distance ranges.
4. Map distance ranges and types …

[Figure: quadruplet → IFP]
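The mapping from one quadruplet to one bit can be sketched as below. Several details are assumptions, not taken from the slides: three interaction types (hydrophobic, HBA, HBD) giving six unordered type pairs; distance 1 measured between the two ligand atoms and distance 2 between the two protein atoms; and a hypothetical set of distance-range edges. The published APIF defines its own ranges and bit layout.

```python
import math
from itertools import combinations, combinations_with_replacement

TYPES = ["hydrophobic", "HBA", "HBD"]
TYPE_PAIRS = list(combinations_with_replacement(range(len(TYPES)), 2))  # 6 pairs
# Hypothetical distance-range edges in Å: bin 0 is below EDGES[0], bin k covers
# EDGES[k-1] <= d < EDGES[k], and the last bin is beyond EDGES[-1].
EDGES = [4.0, 6.0, 8.0, 10.0, 12.0]
N_BINS = len(EDGES) + 1
N_BITS = len(TYPE_PAIRS) * N_BINS * N_BINS

def distance_bin(d: float) -> int:
    """Quantize a distance into a range index."""
    for k, edge in enumerate(EDGES):
        if d < edge:
            return k
    return len(EDGES)

def quadruplet_bit(type1: str, type2: str, lig1, lig2, prot1, prot2) -> int:
    """Bit index for two interactions (lig1-prot1 of type1, lig2-prot2 of type2)."""
    pair = tuple(sorted((TYPES.index(type1), TYPES.index(type2))))
    d1 = math.dist(lig1, lig2)    # assumed: ligand-ligand distance
    d2 = math.dist(prot1, prot2)  # assumed: protein-protein distance
    return (TYPE_PAIRS.index(pair) * N_BINS + distance_bin(d1)) * N_BINS + distance_bin(d2)

def apif(interactions) -> list[int]:
    """interactions: list of (type, ligand_xyz, protein_xyz); 1 bit per quadruplet."""
    bits = [0] * N_BITS
    for (t1, l1, p1), (t2, l2, p2) in combinations(interactions, 2):
        bits[quadruplet_bit(t1, t2, l1, l2, p1, p2)] = 1
    return bits
```

Since only types and distances enter the bit index, never residue identities, the fingerprint does not depend on a particular binding site.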

Page 38

BS-independent IFP: APIF - Quadruplet

[Figure: two interactions, each between a ligand atom and a protein atom, connected by distance 1 and distance 2; each such quadruplet sets 1 bit in the APIF]

Page 39

BS-independent IFP: APIF

Benefits:
• independent of the binding site
• comparable to current scoring functions

Page 40

BS-independent IFP: Pharm-IF

Algorithm:
1. Detect interaction patterns (hydrophobic, HBA, HBD).
2. Define ligand atom pairs using ONLY the ligand atoms that interact with the protein.
3. Measure their distances.
4. Map each distance to a range (quantization) = Pharm-IF.

Benefits:
• independent of the binding site
• comparable to current scoring functions

DOI: 10.1021/ci900382e
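The four Pharm-IF steps can be sketched in the same spirit: only ligand atoms that interact with the protein are kept, each labeled with its interaction type, and every pair of them increments the element indexed by its type pair and quantized distance. The distance-range edges are hypothetical, and counting pairs rather than setting bits is a choice made for this sketch.

```python
import math
from itertools import combinations, combinations_with_replacement

TYPES = ["hydrophobic", "HBA", "HBD"]
TYPE_PAIRS = list(combinations_with_replacement(TYPES, 2))  # 6 unordered pairs
EDGES = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical distance-range edges in Å
N_BINS = len(EDGES) + 1

def distance_bin(d: float) -> int:
    """Quantize a distance: first range whose upper edge exceeds d, else the last."""
    return next((k for k, edge in enumerate(EDGES) if d < edge), len(EDGES))

def pharm_if(interacting_atoms) -> list[int]:
    """interacting_atoms: (type, xyz) for ligand atoms that interact with the protein ONLY."""
    counts = [0] * (len(TYPE_PAIRS) * N_BINS)
    for (t1, x1), (t2, x2) in combinations(interacting_atoms, 2):
        pair = tuple(sorted((t1, t2), key=TYPES.index))
        idx = TYPE_PAIRS.index(pair) * N_BINS + distance_bin(math.dist(x1, x2))
        counts[idx] += 1
    return counts
```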

Page 41

IFP-based scoring functions

Page 42

IFP-based SF: AuPosSOM

• Dock decoys and compounds with known activity
• Generate a vector of interactions (H-bonds, hydrophobic interactions)
• Train a model of actives and inactives (the vector is the input)*

Automatic clustering of docking poses in virtual screening process using self-organizing map (AuPosSOM)

f(IFP) = 1 or 0, where 1 = binder and 0 = non-binder

*Simplified representation

Page 43

IFP-based SF: RF-Score

A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking (RF-Score), DOI: 10.1093/bioinformatics/btq112

• Vector of 36 features; each feature is the occurrence count of one j-i atom-type pair (protein atom type j, ligand atom type i)
• Feature generation: for each ligand atom, take all protein atoms within 12 Å (pairs beyond the cutoff are discarded) and sum the counts for each atom-type pair
• PDBbind was used to train a Random Forest model
• The model is trained with activity as output and the interaction counts as input
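The 36 features can be computed with a simple count. The sketch assumes the atom types used by RF-Score (protein: C, N, O, S; ligand: C, N, O, F, P, S, Cl, Br, I; 4 × 9 = 36 pairs) and the 12 Å cutoff from the slide; atoms are given as (element, xyz) tuples with hydrogens already removed. Training the Random Forest (for example with scikit-learn's RandomForestRegressor on PDBbind affinities) is not shown.

```python
import math

PROTEIN_TYPES = ["C", "N", "O", "S"]
LIGAND_TYPES = ["C", "N", "O", "F", "P", "S", "Cl", "Br", "I"]
CUTOFF = 12.0  # Å

def rf_score_features(protein_atoms, ligand_atoms, cutoff: float = CUTOFF) -> list[int]:
    """Occurrence counts x_(j,i) for protein type j and ligand type i within the cutoff.

    Atoms are (element, (x, y, z)) tuples; elements outside the type lists
    (e.g. metals) are skipped.
    """
    counts = {(j, i): 0 for j in PROTEIN_TYPES for i in LIGAND_TYPES}
    for lig_el, lig_xyz in ligand_atoms:
        if lig_el not in LIGAND_TYPES:
            continue
        for prot_el, prot_xyz in protein_atoms:
            if prot_el in PROTEIN_TYPES and math.dist(lig_xyz, prot_xyz) < cutoff:
                counts[(prot_el, lig_el)] += 1
    # Fixed feature order: protein type in the outer loop, ligand type inner.
    return [counts[(j, i)] for j in PROTEIN_TYPES for i in LIGAND_TYPES]
```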

Page 44

Literature overview: SVM-SP

Support Vector Regression Scoring of Receptor–Ligand Complexes for Rank-Ordering and Virtual Screening of Chemical Libraries, DOI: 10.1021/ci200078f

• Two types of vectors: SVR-KB (146 features) uses knowledge-based pairwise potentials (similar to those above, but trained with SVR), while SVR-EP is based on physico-chemical properties; the SVR-EP vector consists of features extracted from X-Score (polar/nonpolar SASA, MW, vdW energy, etc.)

• SVR-KB performs better than SVR-EP

The vector is unique! The vector is atom-pair based.

Page 45

Thanks a lot!