Interaction fingerprint: 1D representation of 3D protein-ligand complexes
Interaction fingerprints
Vladimir Chupakhin, UNISTRA, 2011
1NTERACT10N F1NGERPR1NTS
Chupakhin Vladimir, Laboratory of Chemoinformatics, Structural Chemogenomics Group, University of Strasbourg
December 2011
Virtual screening approaches
Ligand-based (QSAR, similarity search, pharmacophores)
Structure-based (docking, pharmacophores)
Lock-and-key paradigm
[Figure: a lock and a key, linked by interactions]
Molecular docking: main steps
1. Protein and ligand preparation
2. Binding site identification
3. Conformational search with scoring of the generated poses
Geometry of interaction
H-bond length (3.0 Å)
H-bond angle (~175°)
Interactions are geometry!
Different types of interactions
- Hydrophobic
- H-bonds
- Ionic
- Aromatic
- Cation-π
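Because interactions are defined by geometry, they can be detected directly from atomic coordinates. A minimal sketch for the H-bond case, assuming the donor, hydrogen and acceptor positions are known. The function name and the cutoffs (3.5 Å, 120°) are illustrative assumptions; the slide itself only quotes a typical length of 3.0 Å and an angle of ~175°.

```python
import math

def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=120.0):
    """Return True if D-H...A satisfies simple distance/angle cutoffs.

    max_dist (Å, donor-acceptor) and min_angle (degrees, D-H...A) are
    assumed illustrative cutoffs, not values from the slides.
    """
    dist = math.dist(donor, acceptor)
    # Angle at the hydrogen between the H->D and H->A vectors.
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))
    return dist <= max_dist and angle >= min_angle

# Linear geometry: D-H = 1.0 Å, D...A = 3.0 Å, angle = 180°.
print(is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)))
```

The other interaction types would follow the same pattern with their own distance and angle rules.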
Self-docking
1. Extract ligand
2. Modify geometry
3. Dock to the same protein
4. Extract ligand
5. Calculate RMSD
[Figure: poses colored blue, red and orange; RMSD values 1.1 Å and 4.3 Å]
Docking quality: RMSD
RMSD = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \delta_i^2}
where δ_i is the distance between the i-th of N pairs of equivalent atoms (δ_1 … δ_N).
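The RMSD is the square root of the mean squared distance δ over the N equivalent atom pairs. A minimal sketch, assuming both poses list the same atoms in the same order and share the protein's coordinate frame (no superposition, no handling of symmetric atoms); the function name is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally ordered lists of (x, y, z) atom positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))

xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]    # hypothetical reference pose
docked = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)]  # same atoms, shifted by 1 Å
print(rmsd(xray, docked))
```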
Cross-docking
The procedure is the same. But why? Robustness!
These fluctuations have a huge influence on the docking results.
Scoring functions
1. Force-field scoring functions (Dock, AutoDock, GOLD)
2. Empirical scoring functions (ChemScore, PLP, Glide SP/XP)
3. Knowledge-based scoring functions (PMF, DrugScore, ASP, SMoG)
[Figure labels: ligand atoms, protein atoms]
Force-field scoring function
DOI: 10.1038/nrd1549
[Equation: protein-ligand interaction energy terms + ligand energy terms]
Algorithm (force-field based), for a given protein-ligand (PL) complex:
1. Calculate the interaction energies between atoms of the ligand and the protein (E_vdW + E_H-bond) using a force field.
2. Calculate the internal energy of the ligand (E_vdW + E_torsion), optionally including internal H-bonds of the ligand.
3. Total energy = sum of the energy terms from steps 1 and 2.
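The three steps can be sketched as follows. This is a toy illustration only: it keeps just a van der Waals term with a single assumed parameter set (`epsilon`, `r_min`) for every atom pair, and it takes the ligand internal energy as a precomputed number. It is not the scoring function of Dock, AutoDock or GOLD.

```python
import math

def lennard_jones(r, epsilon=0.2, r_min=3.8):
    """12-6 Lennard-Jones energy; epsilon and r_min are assumed toy values."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2 * ratio ** 6)

def force_field_score(ligand, protein, ligand_internal=0.0):
    """Step 1: sum pairwise ligand-protein terms (only vdW here).
    Step 2: ligand internal energy, passed in precomputed.
    Step 3: total = step 1 + step 2."""
    interaction = sum(
        lennard_jones(math.dist(lig_atom, prot_atom))
        for lig_atom in ligand
        for prot_atom in protein
    )
    return interaction + ligand_internal

ligand = [(0.0, 0.0, 0.0)]                    # hypothetical coordinates
protein = [(3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]  # two contacts at r_min
print(force_field_score(ligand, protein))
```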
Empirical scoring function
Algorithm (additive scheme):
1. Define interaction types and geometries.
2. Look up the database of interaction energies.
3. Total energy = sum of the contributions of every component (plus the influence of the geometry term).
Example: LUDI (DOI: 10.1038/nrd1549)
Empirical scoring functions (ESF) are built to reproduce binding energies or conformations; an ESF depends on the training set used to develop it.
Knowledge-based scoring function
DOI: 10.1038/nrd1549
Algorithm:
1. Define interaction types and geometries.
2. Look up the database of ligand-protein (LP) atom interactions.
3. Total score (energy) = sum of the interaction scores (energies) (γ: adjustable parameter; SAS0: solvent-accessible surface in the solvated state).
Knowledge-based scoring functions (KBSF) are developed to reproduce the binding pose rather than the energy.
Scoring functions: the purposes
Docking = finding the correct binding pose
Scoring = predicting the activity of the compound (Ki, IC50, etc.)
Scoring functions: docking
Docking: the average success rate for docking a compound within RMSD < 2 Å is around 70%.
Source: Comparative Assessment of Scoring Functions on a Diverse Test Set, Wang, 2009
Scoring functions: scoring
Scoring: when ranking compounds, the correlation coefficient with experiment is around 55-64%.
Source: Comparative Assessment of Scoring Functions on a Diverse Test Set, Wang, 2009
GOLD Score failure
             pose1   pose2
GOLD Score   59.19   59.30
RMSD, Å      1.10    4.27
Pose 2 is the top-scored pose despite its much larger RMSD.
Molecular scoring functions: problems
1. Problems when the binding site is highly charged or highly hydrophobic/hydrophilic
2. Problems when the binding site contains waters, ions, cofactors
3. Fragment-like docking is very tricky
4. Even the input conformation can influence the docking results
Interaction fingerprints
Chemical fingerprint
Fingerprints encode the presence or absence of certain features in a compound, e.g., fragments.
0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0
KISS: Keep It Short and Simple! (Keep It Simple, Stupid)
Structural Interaction Fingerprints
Zhan Deng, Claudio Chuaqui, and Juswinder Singh Structural Interaction Fingerprint (SIFt): A Novel Method for Analyzing Three-Dimensional Protein−Ligand Binding Interactions (DOI: 10.1021/jm030331x), Biogen Inc.
Detect interactions of the ligand with every amino acid of the binding site.
Interaction types (one bit each):
- Hydrophobic
- Aromatic face to face
- Aromatic face to edge
- H-bond (protein donor)
- H-bond (protein acceptor)
- Ionic (protein cation)
- Ionic (protein anion)
Bitstring for 1 residue: 1 0 0 0 1 0 0
Bitstring for the whole binding site (Interaction Fingerprint):
100100010000101000000100000010000001 …
Residue 1 | Residue 2 | Residue 3 | Residue 4 | Residue 5 | … | Residue X
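Building the fingerprint amounts to writing seven bits per residue and concatenating them in a fixed residue order. A minimal sketch, assuming the interactions have already been detected; the bit order, the interaction names and the residue subset below are illustrative assumptions.

```python
# Assumed bit order for illustration; the slides do not state it explicitly.
INTERACTION_TYPES = (
    "hydrophobic",
    "aromatic_face_to_face",
    "aromatic_face_to_edge",
    "hbond_protein_donor",
    "hbond_protein_acceptor",
    "ionic_protein_cation",
    "ionic_protein_anion",
)

def residue_bits(interactions):
    """Seven-bit string for one residue from a set of interaction names."""
    return "".join("1" if t in interactions else "0" for t in INTERACTION_TYPES)

def interaction_fingerprint(binding_site, detected):
    """Concatenate per-residue bitstrings in a fixed residue order."""
    return "".join(residue_bits(detected.get(res, ())) for res in binding_site)

binding_site = ["ILE10", "PHE80", "LEU83"]  # hypothetical residue subset
detected = {
    "ILE10": {"hydrophobic"},
    "PHE80": {"hydrophobic", "aromatic_face_to_edge"},
    "LEU83": {"hydrophobic", "hbond_protein_donor"},
}
print(interaction_fingerprint(binding_site, detected))
```

Keeping the residue list fixed for a given binding site is what makes fingerprints of different ligands or poses directly comparable bit by bit.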
Interaction Fingerprints: preparation
Molecular Interaction Fingerprints (IFP): Optimizing Fragment and Scaffold Docking by Use of Molecular Interaction Fingerprints, 2007
Residue   Bitstring
ILE10     1000000
VAL18     1000000
ALA31     1000000
LYS33     1000000
VAL64     1000000
PHE80     1010000
GLU81     0000100
PHE82     1100000
LEU83     1001000
HIS84     1000000
GLN85     1000000
ASP86     1000101
LEU134    1000000
ALA144    1000000
ASP145    1000000
3D → 1D (bit string):
1000000100000010000001000000100000010000001000101100000010000001000000
Parameters of IFP
- Interacting patterns: the amino acid can be represented as a residue or a pharmacophoric point; the interacting fragment of the ligand can be encoded as an atom, a fragment or a pharmacophoric point.
- Type of interaction: hydrogen bonds, hydrophobic interactions, etc.
- Direction of interaction: this parameter distinguishes, for example, whether the hydrogen bond donor is the protein or the ligand.
- Strength of interaction and distance between interacting patterns: these parameters are research-specific.
- Number of bits per interaction point: one or many.
[Figure: Ligand ↔ Receptor]
Gold scoring function failure: IFP wins!
Ligand A07 from the ligand-receptor (LR) complex (PDB ID: 3LFS), docked into the CDK2 binding site (PDB ID: 2A0C).
Pose 1 (orange): Tc (real vs docked) = 0.75, RMSD = 1.10 Å, GoldScore = 59.20
Pose 2 (blue): Tc (real vs docked) = 0.52, RMSD = 4.27 Å, GoldScore = 59.30
X-ray pose: brick red
Tc: Jaccard (Tanimoto) coefficient
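The Jaccard (Tanimoto) coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in at least one, Tc = c / (a + b - c). A minimal sketch on bitstrings; the two fingerprints are hypothetical and are not the ones behind the values on this slide.

```python
def tanimoto(fp_a, fp_b):
    """Jaccard (Tanimoto) coefficient of two equal-length '0'/'1' strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(a == "1" and b == "1" for a, b in zip(fp_a, fp_b))
    either = sum(a == "1" or b == "1" for a, b in zip(fp_a, fp_b))
    return both / either if either else 0.0

reference = "1000000101000010010001000101"  # hypothetical x-ray IFP
docked = "1000000100000010010000000101"     # hypothetical docked-pose IFP
print(tanimoto(reference, docked))
```

Unlike the docking score, this value drops when a pose loses the interactions seen in the reference complex.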
IFP usage
• store interactions in a useful format
• analyze experimental LR-complexes
• quality of docking studies
• results clustering (even peptides and PPI)
• analyze docked LR-complexes (drug-like and fragment-like compounds)
• retrieve the correct binding pose
• retrieve a specific binding pose
Use cases for IFP: storage
A useful way to store interaction information from experimentally derived LR-complexes:
• scPDB database, Laboratory of Didier Rognan, UNISTRA, Illkirch (DOI: 10.1021/ci050372x)
• CREDO database (DOI: 10.1111/j.1747-0285.2008.00762.x)
Use cases for IFP: x-ray LR analysis
[Figure labels: compounds, binding site, specific interactions]
DOI: 10.1021/jm030331x
Use cases for IFP: pose retrieval (1)
RMSD is not a 100% correct evaluation function!
DOI: 10.1021/ci600342e
Use cases for IFP: VS
Compare the reference x-ray IFP with the IFPs of docked poses using the Tanimoto coefficient (TC).
[Figure: compounds database → virtual screening results]
Using a standard SF: X% of the real hits
Using a standard SF + TC: X% + up to 20%
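Combining a standard scoring function with the Tanimoto coefficient can be sketched as a post-docking filter: keep only poses whose IFP resembles the reference, then rank by docking score. The threshold `min_tc`, the pose names and all numbers below are hypothetical, and a higher score is assumed to be better.

```python
def tanimoto_bits(a, b):
    """Tanimoto coefficient of two fingerprints stored as Python ints."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0

def rescore(poses, reference_ifp, min_tc=0.6):
    """Keep poses whose IFP resembles the reference, best score first.

    poses: list of (name, docking_score, ifp_bitstring);
    min_tc is an assumed threshold, not a value from the slides.
    """
    ref = int(reference_ifp, 2)
    scored = [
        (name, score, tanimoto_bits(ref, int(ifp, 2)))
        for name, score, ifp in poses
    ]
    kept = [p for p in scored if p[2] >= min_tc]
    return sorted(kept, key=lambda p: p[1], reverse=True)

reference = "10010001000101"  # hypothetical reference IFP
poses = [  # hypothetical docking output
    ("cmpd1_pose1", 59.2, "10010001000101"),
    ("cmpd1_pose2", 59.3, "10000000000001"),
    ("cmpd2_pose1", 48.0, "10010001000001"),
]
for name, score, tc in rescore(poses, reference):
    print(name, score, round(tc, 2))
```

In this toy input the pose with the best docking score is discarded because it reproduces too few of the reference interactions.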
Use cases for IFP: PPI
IFP is suitable even for the analysis of protein-protein interactions (PPI)!
Use cases for IFP: agonists/antagonists
(A) Procaterol: agonist, (B) Carvedilol: antagonist
Selective Structure-Based Virtual Screening for Full and Partial Agonists of the β2 Adrenergic Receptor, DOI: 10.1021/jm800710x
IFP modifications
IFP modifications: r-SIFt (R-group IFP)
LEU83: 1001000 → 110 (bits for C, R1, R2)
Independent of interaction type! Just the fact of interaction!
Benefits: combinatorial library analysis (~100,000 compounds)
DOI: 10.1021/jm050381x
IFP modifications: w-SIFt (weighted IFP)
IFP + biological activity + machine learning approach: find the correlation between bit frequency and activity
DOI: 10.1021/ci800466n
[Figure labels: most active, moderate activity, less active]
Benefits:
• helps to find which interactions are critical for compound potency
• interpretable, position-dependent scoring function for ligand-protein interactions
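The idea of relating bit frequency to activity can be illustrated with a very simple weighting: for each bit, compare how often it is set among active and among less active compounds. This frequency difference is an assumed stand-in for the machine-learning step, not the weighting scheme of the w-SIFt paper, and the fingerprints are hypothetical.

```python
def bit_frequencies(fingerprints):
    """Fraction of fingerprints in which each bit position is set."""
    n = len(fingerprints)
    return [
        sum(fp[i] == "1" for fp in fingerprints) / n
        for i in range(len(fingerprints[0]))
    ]

def bit_weights(actives, inactives):
    """Per-bit weight = frequency among actives minus frequency among inactives.

    Assumed toy weighting, not the scheme used in the w-SIFt paper.
    """
    return [
        fa - fi
        for fa, fi in zip(bit_frequencies(actives), bit_frequencies(inactives))
    ]

actives = ["1001", "1011", "1001"]    # hypothetical IFPs, most active compounds
inactives = ["1000", "0010", "1000"]  # hypothetical IFPs, less active compounds
weights = bit_weights(actives, inactives)
print([round(w, 2) for w in weights])
```

A large positive weight flags an interaction that appears mostly in potent compounds, which is what makes the resulting score interpretable per position.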
Binding site independent IFP
BS-independent IFP: APIF
APIF: A New Interaction Fingerprint Based on Atom Pairs and Its Application to Virtual Screening
Algorithm:
1. Detect interaction patterns (hydrophobic, HBA, HBD).
2. Define distance 1 and distance 2 for a quadruplet interaction.
3. Convert the distances to distance ranges.
4. Map distance ranges and types …
[Figure labels: atom pair; distance = range; quadruplet; IFP]
BS-independent IFP: APIF - Quadruplet
[Figure: a quadruplet of two ligand atoms and two protein atoms, with two interactions, distance 1 and distance 2; each quadruplet maps to 1 bit in the APIF]
BS-independent IFP: APIF
Benefits:
• independent of the binding site
• comparable to current scoring functions
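A rough sketch of the quadruplet idea: every pair of detected interactions gives two distances, which are quantized and combined with the interaction types into one fingerprint position. Taking the two distances as ligand atom to ligand atom and protein atom to protein atom, the bin width, the number of bins and the input contacts are all assumptions made for illustration.

```python
import math
from itertools import combinations

def distance_bin(d, width=2.5, n_bins=7):
    """Quantize a distance (Å) into one of n_bins ranges (assumed binning)."""
    return min(int(d // width), n_bins - 1)

def apif_counts(interactions):
    """Count quadruplets formed by every pair of interactions.

    interactions: list of (type, ligand_xyz, protein_xyz).
    Key: (sorted type pair, ligand-ligand bin, protein-protein bin).
    """
    counts = {}
    for (t1, l1, p1), (t2, l2, p2) in combinations(interactions, 2):
        key = (
            tuple(sorted((t1, t2))),
            distance_bin(math.dist(l1, l2)),
            distance_bin(math.dist(p1, p2)),
        )
        counts[key] = counts.get(key, 0) + 1
    return counts

interactions = [  # hypothetical detected contacts
    ("HBA", (0.0, 0.0, 0.0), (0.0, 3.0, 0.0)),
    ("hydrophobic", (4.0, 0.0, 0.0), (6.0, 3.0, 0.0)),
]
print(apif_counts(interactions))
```

No residue names or numbers enter the key, which is why such a fingerprint does not depend on the binding site.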
BS-independent IFP: Pharm-IF
Algorithm:
1. Detect interaction patterns (hydrophobic, HBA, HBD).
2. Define ligand atom pairs based ONLY on ligand atoms interacting with the protein.
3. Measure their distance.
4. Map the distance to a range (quantization) = Pharm-IF
Benefits:
• independent of the binding site
• comparable to current scoring functions
DOI: 10.1021/ci900382e
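The Pharm-IF steps can be sketched in a few lines: take only the ligand atoms that interact with the protein, pair them up, and count each pair under its feature types and quantized distance. The bin width, the maximum bin and the example atoms are assumptions for illustration, not parameters from the paper.

```python
import math
from itertools import combinations

def pharm_if(ligand_features, bin_width=1.0, max_bin=10):
    """Count pairs of interacting ligand atoms by feature types and distance bin.

    ligand_features: list of (type, xyz) for ligand atoms that interact with
    the protein; bin_width and max_bin are assumed quantization settings.
    """
    counts = {}
    for (t1, x1), (t2, x2) in combinations(ligand_features, 2):
        d_bin = min(int(math.dist(x1, x2) // bin_width), max_bin)
        key = (*sorted((t1, t2)), d_bin)
        counts[key] = counts.get(key, 0) + 1
    return counts

features = [  # hypothetical interacting ligand atoms
    ("HBD", (0.0, 0.0, 0.0)),
    ("HBA", (3.5, 0.0, 0.0)),
    ("hydrophobic", (0.0, 6.2, 0.0)),
]
print(pharm_if(features))
```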
IFP-based scoring functions
IFP-based SF: AuPosSOM
Automatic clustering of docking poses in virtual screening process using self-organizing map - AuPosSOM
• Dock decoys and compounds with known activity
• Generate a vector of interactions (H-bonds, hydrophobic interactions)
• Train a model of the actives and inactives (the vector is the input)*
f(input (IFP)) = 1 or 0, where 1 is a binder and 0 is a non-binder
*Simplified representation
IFP-based SF: RF-Score
A machine learning approach to predicting protein–ligand binding affinity with applications to molecular docking (RF-Score), DOI: 10.1093/bioinformatics/btq112
• Vector of 36 features; each feature is the occurrence count for a j-i atom pair
• Generation mechanism: take all protein atoms within 12 Å of a selected ligand atom, filter out interactions beyond the cutoff range, and sum the result (for each interaction pair)
• PDBBind was used to train the Random Forest model
• The model is trained using activity as output and interactions as input
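The feature vector can be sketched as a table of element-pair counts within the distance cutoff. The element lists below are an assumption chosen to give 4 x 9 = 36 features, and the atoms are hypothetical; training the Random Forest on such vectors is not shown.

```python
import math
from itertools import product

PROTEIN_TYPES = ("C", "N", "O", "S")  # assumed protein element types
LIGAND_TYPES = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")  # assumed

def rf_score_features(ligand, protein, cutoff=12.0):
    """Occurrence counts of protein-ligand element pairs within the cutoff.

    ligand, protein: lists of (element, xyz). Returns 4 x 9 = 36 counts.
    """
    counts = {pair: 0 for pair in product(PROTEIN_TYPES, LIGAND_TYPES)}
    for (l_el, l_xyz), (p_el, p_xyz) in product(ligand, protein):
        if (p_el, l_el) in counts and math.dist(l_xyz, p_xyz) < cutoff:
            counts[(p_el, l_el)] += 1
    return [counts[pair] for pair in product(PROTEIN_TYPES, LIGAND_TYPES)]

ligand = [("C", (0.0, 0.0, 0.0)), ("N", (1.4, 0.0, 0.0))]    # hypothetical atoms
protein = [("O", (3.0, 0.0, 0.0)), ("C", (20.0, 0.0, 0.0))]  # hypothetical atoms
features = rf_score_features(ligand, protein)
print(len(features), sum(features))
```

Each complex yields one such vector, which serves as the input with the measured activity as the output.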
Literature overview: SVM-SP
Support Vector Regression Scoring of Receptor–Ligand Complexes for Rank-Ordering and Virtual Screening of Chemical Libraries, DOI: 10.1021/ci200078f
• Two types of vectors: SVR-KB (146 features) uses knowledge-based pairwise potentials (the same as mentioned above, but trained with SVR), while SVR-EP is based on physico-chemical properties. The SVR-EP vector consists of features extracted from X-score (polar/nonpolar SASA, MW, vdW energy, etc.)
• SVR-KB is better than SVR-EP
Vector is unique! Vector is atom-pair based.
Thanks a lot!