Knowledge-based chemical fragment analysis in protein binding sites
Transcript of Knowledge-based chemical fragment analysis in protein binding sites
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
1
Knowledge-based Chemical Fragment Analysis
in Protein Binding Sites
Edith Chan
Roman Laskowski, David Selwood
University College London, UK
19 June [email protected]://www.ucl.ac.uk/wibr/research/drug-discovery/edith-chan
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
2
A medicinal chemist’s problem
vFLIP – IKKgamma, 3cl3.pdb
A common question from medicinal
chemists - what compounds
should be made given a protein
target.
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
3
A medicinal chemist’s problem
vFLIP – IKKgamma, 3cl3.pb
• A common question from
medicinal chemists - what
compounds should be made
given a protein target.
• Chemists very good at
generating new ideas when
seeing other structures.
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
How can a medicinal chemist decide on a chemical strategy
– Screening to find leads
– Natural ligand
– Virtual screening (hit rate improvement of 10x)
– SAR studies
– Fragment based approaches
Med chem need
– A simple way to select likely binding molecules for a protein binding site
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Fragment-based drug discovery
5
• FBDD has become an established and
successful paradigm for the past 10
years.
• Small chemical structures are screened
to probe the binding site and then to
identify larger molecules to bind.
• Most platforms are laboratory based –
like X-ray, NMR.
Target Site
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Design of fragments: current status
• Physicochemical filtering Ro3 (Ro2 ½ ), small, soluble
• Screen out reactive groups
• Chemical handles for further manipulation
• Most groups use 500- 2000 compounds, some use 10,000 plus
• A good binding affinity - ligand efficiency often used to express binding
affinity of fragments
– Typical good LE is > 0.2
LE = -ΔGHAC
-RT ln(Kd)HAC
=
LE = ligand efficiency, HAC = heavy atom count
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Common computational approaches
• Fragment /property centered
– Analyzing drug-related databases for molecular
frameworks, property, diversity, and privileged
scaffolds, etc.
DrugBank, WDI
MW
HA
HD
PSA clogP
Diversity
Rule of3
N
N
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Common computational approaches
• 3D fragments experimental centered
– Most current fragments are based around
aromatic or heteroaromatic structures
– Use Diversity orientated synthesis to design
new fragment based libraries
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Common computational approaches
• Binding site centered
• FragFEATURE – predict fragments / functional groups based on structural environment of binding sites.
• LigFrag-RPM, preference of small fragments / functional groups and their amino acid environment
• Would be too complicated for chemists to understand
9
Tang et al. PLOS Computational Biology 2014, 10, e1003589Wang et al. Chemical Information & Modeling 2011, 51, 807-815
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Target centred
• Given a new or unknown target, what kind of scaffolds or fragments
chemists should try.
• Using X-ray structures to define fragments
• A cheminformatics database identifying the chemical motifs or fragments
that preferentially interact with particular protein side chains in the binding
site.
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Our approach
• Use the pdb as a source of ligand- protein information- deriving the interacting “fragment”
• High quality information
Definition of fragment
• the largest ring assembly containing the atoms involved in hydrogen bond(s) to one of the side chains
• Looked at Asp, Glu, Arg and His as the most common amino acids involved in H-bonding interactions
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
12
PDB
• The Protein Data Bank (PDB) has over 100K structures, with ~ 77K that have co-crystallized ligand.
• PDB 3D co-crystallized structures provide protein-ligand interaction.
• Analyses of specific ligand/side chain interactions tended to focus on single ligand atoms rather than chemical fragments.
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Protein-ligand interactions – 3D
13
Hydrogen Bonds
Interactions in 3D: Hbonds, hydrophobic, van der Waals, etc
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
14
Protein-ligand interactions – 2D
LigPlot
N
N
N
N
O
ON
F
F
N
O
O
O
NN
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
15
Fragment motif definition & extraction
N
N
N
N
O
ON
F
F
N
O
O
O
NN
• Interacting motif is the largest ring
assembly containing the atoms involved in
the hydrogen bond(s).
• Other substituents on the ring assembly
are removed if they are not involved in the
hydrogen bond to the relevant protein side
chain
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
16
Amino acids studied• Acidic and negatively charged
N
O
O
O
N
O
OO
N
N
C+
O
N NH
HH
H
H
N
O
NN
H
• Basic and positively charged arginine (Arg)
• Basic and polar histidine (His)
aspartic acid (Asp) glutamic acid (Glu)
Asp, Glu, Arg, and His – important in binding sites
• Account for 55% of all catalytic residues in enzyme actives sites
• Most frequently in contact with ligand in 50 diverse protein binding sites
Villar, FEBS Lett 1994 349 125Bartlett, J Mol Biol 2002 324 105
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
17
Ligands
Filter Criteria Used to Retrieve Relevant Ligands from the PDB
1. 100 < molecular weight (Da) < 800
2. Number of atoms > 5
3. Only atoms H, C, N, O, P, S, F, Cl, Br, I
4. Number of oxygen and nitrogen atoms < 16
5. Number of hydrogen donor atoms < 8
6. Number of rotatable bonds < 16
7. Not an metal ion or inorganic compound, such as AlF3
8. Not a common solvent used in X-ray, such as GOL (Glycerol), EDO (1,2-ethanediol), TRS (2-amino-2-hydroxymethyl-propane-1,3-diol)
9. Not an impurity or unknown, such as UNX, UNK, UNL, ARG, O, C, N
10. Not an negativeiIon, such as NO3 (Nitrate), SO4 (sulfate)
11. Not an amino acid
12. Not a common sugar or lipid
13. Not a cofactor, such as ATP, ADP, SAM, FAM
Chan, et al. J. Med. Chem. 2010, 53, 3086–3094
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
18
Data set from PDB• The data set for this study was compiled in April 2009
• Approximately 26,000 structures.
• Hbond between ligands and specific protein side chains were extracted from the data files in PDBsum
• PDBsum uses the HBPLUS program to calculate potential hydrogen bonds and non-bonded contacts.
• The 3D coordinates of the ligand and interacting side chain of interest were then extracted from the parent PDB file and translated into MDL SD format for further processing
PDB
Protein-ligand
complexes
X-ray only; R < 2.2Å
Hbond withD, E, R or H?
Protein LigandInteraction(Hbond)
N
N
O
ON
F
F
N
O
O
O
NN
Fragment
N
N
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
19
O-H•••O
N-H•••O
N-H •••N
Asp
Glu
Arg
His
0
5
10
15
20
25
30
35
40
1 2 3 4 5 6 7 8
Fre
q(%
)
2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 Bond Length (Å)
0
5
10
15
20
25
30
35
40
45
50
1 2 3 4 5 6 7 8
Fre
q (
%)
2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 Bond Length (Å)
0
5
10
15
20
25
30
35
40
45
50
1 2 3 4 5 6 7 8
Fre
q (
%)
2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 Bond Length (Å)
HBond• HBond length - distance between
• O-O, N-O, and N-N atoms
• Typical HBond values between 2
heteroatoms
• 2.5 – 3.5 Å (15 - 20 kcal/mol)
• The bond lengths of different type of
Hbond are in line with typical values
• – O-O shortest
• – N-N longest
• Due to atomic radii
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
20
N-H•••O
Asp
Glu
Arg
His
0
5
10
15
20
25
30
35
40
45
50
1 2 3 4 5 6 7 8
Fre
q (
%)
2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 Bond Length (Å)
Hbond: N-O category• In the N-O category, the mean bond
values for Asp, Glu, and Arg are around
• 2.8-3.0 Å
• However, in His, the mean value is slightly
shorter, around
• 2.6 to 2.8 Å
• bond value distribution is more
spread out.
• It may imply that His has the ability to form
a wider range (stronger to weaker) of
Hbond with ligands.
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
21
N-H •••N
Asp
Glu
Arg
His
0
5
10
15
20
25
30
35
40
1 2 3 4 5 6 7 8
Fre
q(%
)
2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 Bond Length (Å)
Hbond: N-N categoryThe N-N category,:
• Hbond between His and other N-
containing ligand tends to be longer
• weaker Hbond compared to those
formed by Arg.
• Most of PDB in this category are Zn
binding proteins.
• His(s) coordinate to Zn as well as
ligand. In this case, His is donating
electrons to the metal and forms a
weaker HB with ligand.
2q1qcarbonic anhydrase
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
22
Protein families
His
Glu
Arg
Asp
Hydrolase
Oxidoreductase
Transferase
Lyase
Isomerase
others
33%
18%15%6%
3%
26%
21%17%5%
4%
33%
18%
15%6%
3%
33%
18%15%
6%
3%
• Our data set contains structures from
– all enzyme classes
– various receptor families.
• 5 protein families dominate
– Hydrolase
– Oxidoreductase
– Transferase
– Lyase
– Isomerase
• In Asp and Glu, their family distribution profiles are similar.
• in Arg, the top protein family is oxidoreductase.
The domination of enzymatic classes over receptor families reflect the nature that almost in all the cases, hydrogen bonding interaction is a requirement in enzymatic active sites.
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
23
• Fragments showing two H-bonds to Asp and Glu side chains
Table 1. Fragments showing two hydrogen bonds to Asp and Glu side chains. Generic fragment
(fa) Highest frequency examplesb
nc Aryl amidines
Ar N
N
Asp (11) Glu ( 2)
N
N
72 2
N
N
N
N 30 0
N
N
12 0
Guanidines
RN N
N
Asp (2) Glu (5)
N
N
N
14 2
N
N
N
0 1
N
N
N
O 1 1
1-aza-2-aminoaryls
N
N
N
N
N
O
N
N
N
N
N
O
N
N
N
N
N
O Asp (8) Glu (6)
9 0
7 10
4 0
azaheteroaryl-7-amines N
N
Asp (2) Glu (1)
N
N N
NN
0 3
N
N N
NN
2 0
Dihydropyridazines
N
N
O
O
Asp (2) Glu (0)
N
N
O
O
2 0
NN
O
O
N
N
1 0
Cyclic diols
Wn
O
O Asp (20) Glu (13)
O
O
O 69 65
O
O
O 54 105
N
O
O 10 5
O
ON
N Asp or Glu
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
24
Asp vs Glu – O mediated motifs are similar
OO
O
O
O
O
O
O
N
N
O
O
O
Asp 34 30 8 7 10 1Glu 45 69 5 5 4 11
NO
OO
O
• O-mediated motifs are similar
O
O
Asp or Glu
O
OH
OH
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
25
Asp vs Glu: N-mediated motifs are different
Asp 72 30 14 12 11 2Glu 0 0 1 0 0 0
Asp 6 0 0 1 0 0Glu 13 11 4 4 3 1
NN
N
N
N
N
N
N
N
N
N
N N
N
N
N
N
N
N
N
N
N
N
S N
N
N S N
NN
N
Asp 6 1 1 2 1 1 Glu 0 0 0 0 2 0
S
NN
N
N
O
NN
NN
OH
H
N
N
O
NN
N N N
O
HH
N
N N
N
N
N
NN
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
26
cytochromE c peroxidase• Membrane-bound hemoproteins that are
essential for electron transport. They are capable of undergoing oxidation and reduction.
• Asp can HB to a variety of motifs – 15 (10%)
NN
N
N NN N
N N
O
NNN
S
NN
N
N
N N
N
N N
N
NS
N
N
N
N
N
2euu1dso1dsp1ds41dse1aej1aes1cmp
2eun2aqd2eut2eup2anz
2as2
2as4
2rbu 2as62as1
1ryc1kxm
2rc0 2rbx
1aeo1aeg
1aen
1aee
1aem 1aek
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
27
Arg – COOH motifs
• Arg is positively charged at pH values below their pKa (~12).
• Arg is mainly a HB donor.
• From our study, Arg almost exclusively forms HBond with O-mediated ligands
• The most frequent motif is acid.
O O O
O
S
O
O
O
O O
NO
N
O
O
N N
N
O O
OO
N
O
O
SO
OO
N+
O O
N
O
O
N
N
C+
O
N NH
HH
H
H
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
28
• Although acidic motif is most common, exceptions apply but not often
O
O
N
ARG – via N:
1yfx 1vfs
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Generic fragment (fa)
Highest frequency examplesb
nc
Aza heterocycle
WnN
N
N
N
S
N
8 14 7 4
Sulfonamides
NS
OO
S
OO
N
SO
O
NN
NS
OS
OO
N
9 5 1 1
Cyclic alcohols
Wn
O
O
O
O
N
O
O 24 4 3 3
Phenols O
O
O
O N N
O
20 88 13 2
Carboxylic acids
O
R O
O
O
O
OO
O
ON
18 17 8 2
Carbonyls
OR
R'
O
N
O
N
N O
O
O
28 5 4 4
Fragments showing H-bonds to His side chains
• pKa ~ 6
• High freq binds to metal
• Can be HB or HD
• Most freq frags are –OH, -C=O, -COOH
• N-heterocyclics
N
O
NN
H
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
30
Common and unique
OO
O
O
N
N
OO
N
N
O
O
N
N
Common in all 4 amino acids– universal Hbond partners
Unique in Asp but not others
N
N
NN
N
O
O
O
N
N
Most frequent
Asp Glu Arg His
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
31
Table 2. Analysis and statistics for the 4 amino acids
Asp Glu Arg His
Side chain moiety and property
carboxylic acid carboxylic acid guanidinium imidazole
negatively charged, acidic
negatively charged, acidic
positively charged, basic
polar, basic
PDBa 3851 2541 3764 2736
Nonreductant PDBb 2428 1043 1568 1019
Protein familyc 186 150 214 166
Unique ligandd 992 893 710 883
Unique motif 161 144 137 133
Diversity ratioe 0.16 0.16 0.19 0.15
Hydrogen bond mediated atom
N 106 (66%) 80 (56%) 7 (5%) 35 (26%)
O 53 (33%) 56 (39%) 124 (91%) 85 (63%)
F/Cl 0 0 5 2
S 2 2 1 5
Mixedf 0 6 0 6
• The diversity ratio measures the diversity of the fragment motifs for all the ligands
= number of unique motif / number of unique ligands
• Asp, Glu, and His have a similar ratio, 0.16, while Arg (0.19) is higher.
• More variety of fragment motifs that interact with Arg in the PDB even though the number of unique ligands is the smallest in Arg.
`
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
32
Table 2. Analysis and statistics for the 4 amino acids
Asp Glu Arg His
Hydrogen bond mediated atom
N 106 (66%) 80 (56%) 7 (5%) 35 (26%)
O 53 (33%) 56 (39%) 124 (91%) 85 (63%)
F/Cl 0 0 5 2
S 2 2 1 5
Mixedf 0 6 0 6
• Acidic side chains, Asp and Glu have a higher tendency to form hydrogen bond with N-mediated motifs
• Basic residues, Arg and His have a higher tendency with O-mediated ones.
• Most interesting is that Arg almost exclusively forms hydrogen bonds with O-mediated ligands (91%), suggesting it could be most effective to use O-mediated motifs to form hydrogen bonds with Arg.
• Another reassuring fact is that no F or Cl (hydrogen acceptor) is detected to form hydrogen bonds with Asp or Glu
• confirming that a binding site’s Asp and Glu are always hydrogen bond acceptors as well as negatively charged in protein structures.
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
33http://www.ucl.ac.uk/~rmgzawe/suppinfo/index.html
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Conclusions
• Data show a conserved number of fragments.
• The most common fragments should be represented in fragment screening
sets for given targets.
• We need to expand and refine the analyses e.g. missing amino acids
• Some fragments (3D) are under-represented and could be embedded in
novel scaffolds – new chemistry
Chan, et al. J. Med. Chem. 2010, 53, 3086–3094
The Wolfson Institute for Biomedical ResearchThe Cruciform Building, UCL
Acknowledgment
• Funding from Cancer Research UK
35