
Indian Journal of Chemistry Vol. 45A, January 2006, pp. 106-110

Variations in binding conformations of small molecules to proteins: A study using Protein Data Bank

Vijay Kumar, Amit Sharma & K V Radha Kishan *

Institute of Microbial Technology, Sector 39-A, Chandigarh 160036, India. Email: kishan@imtech.res.in

Received 25 November 2004; revised 23 November 2005

There are several databases incorporating the vast amount of data available on protein structures and small molecules. They are useful in many fields of chemistry and biology for understanding the chemical and biological significance of the database entries. Since biochemistry and the pharmaceutical sciences lie at the interface of chemistry and biology, understanding the effect of chemicals on biological molecules is an important issue. While vast amounts of data are being generated on this front, they are not compiled or organized for user-friendly retrieval. The biological activity of a molecule is directly related to the way the molecule is bound to its target protein. There are many examples in the Protein Data Bank where small molecules are bound to proteins in different conformations. Some features of ligand conformational flexibility are presented in this paper.

The Protein Data Bank (PDB)¹ contains not only the deposited coordinates of proteins whose structures have been determined by X-ray crystallography or NMR, but also a wealth of information associated with their functional features. Several databases have emerged from the PDB, such as SCOP², CATH³ and PDBsum⁴,⁵, along with many interactive online servers for structure visualization. The main function of these databases is to facilitate data retrieval and the understanding of the complex nature of protein structures. Many of the structures in the PDB contain other components besides the proteins. These include, mainly, structured water molecules inside and around the protein molecules, ions from the buffers used, metal ions bound to the proteins, DNA or RNA molecules, co-crystallized substrates, substrate analogues and inhibitor molecules. In the present investigation, we have studied the substrates, substrate analogues and inhibitor molecules: their type, nature and possible biological significance.

Kleywegt and Jones⁶ have compiled a database of all the molecules bound to proteins in the PDB except water molecules. This database, called HIC-Up (Hetero-compound Information Center - Uppsala), was started in 1997 with a total of 1100 compounds; as of the February 2004 release (HIC-Up 8.1), it has grown to 4669 compounds. The Protein Ligand Database (PLD), maintained by Puvanendrampillai and Mitchell⁷ at Cambridge, contains information on about 485 protein-ligand complexes. Of the many databases available, we chose HIC-Up because of the large number of ligands it contains. The nature and binding-site information of these ligands are of high value in the pharmaceutical context, since they provide first-hand information on the binding of the ligands to their target proteins. Although not all proteins in the PDB contain bound ligands, the available ligand information can throw light on the binding nature of the ligands. We report herein the nature and biological significance of these hetero-compounds, and a few examples of their heterogeneous binding to proteins.

Methodology

All compounds other than DNA/RNA, buffer ions and metal ions were considered in this study. Initially, the database was screened to remove all entries except chemical compounds. The compounds were further filtered on the basis of Lipinski's rule of five⁸. Of its four criteria, all except the one on LogP values were used to filter the compounds. The compounds were classified on the basis of their aromatic, aliphatic and heterocyclic nature. The biological significance of the compounds was identified by a thorough screening of the literature related to the compounds and their binding proteins. For analyzing the proteins and the ligands, programs from the CCP4 package⁹, MERCURY (downloadable from the Cambridge Crystallographic Data Centre home page, www.ccdc.cam.ac.uk) and ICMlite (www.molsoft.com) were used. The figures were made with MS Excel and ICMlite.
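The three LogP-independent criteria used for filtering can be written as a simple predicate. The sketch below is illustrative only: the `Compound` record, the strict reading that every criterion must be met (Lipinski's original alert is raised only when two or more criteria are violated), and the hand-counted descriptor values for DHAP, PQQ and idealized hexose oligomers are assumptions, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float   # molecular weight (Da)
    h_donors: int       # sum of OH and NH groups
    h_acceptors: int    # sum of N and O atoms


def passes_rule_of_five_without_logp(c: Compound) -> bool:
    """True if the compound meets all three LogP-independent criteria.

    Assumption: every criterion must be met (strict reading); LogP is skipped
    because it is not available for all hetero-compounds.
    """
    return c.mol_weight <= 500 and c.h_donors <= 5 and c.h_acceptors <= 10


def hexose_oligomer(n: int) -> Compound:
    """Approximate descriptors of a linear oligomer of n hexose units.

    Each glycosidic bond releases one water, giving C6nH(10n+2)O(5n+1);
    a free hexose has 5 OH groups and each glycosidic bond removes 2 of them.
    """
    water_lost = n - 1
    return Compound(
        name=f"hexose {n}-mer",
        mol_weight=round(180.16 * n - 18.02 * water_lost, 2),
        h_donors=5 * n - 2 * water_lost,
        h_acceptors=6 * n - water_lost,
    )


# Hand-counted descriptors (approximate, for illustration only).
examples = [
    Compound("DHAP", 170.06, 3, 6),
    Compound("PQQ", 330.21, 4, 10),
] + [hexose_oligomer(n) for n in (1, 2, 3)]

for c in examples:
    print(f"{c.name:12s} MW={c.mol_weight:7.2f} HBD={c.h_donors:2d} "
          f"HBA={c.h_acceptors:2d} pass={passes_rule_of_five_without_logp(c)}")
```

With these values, DHAP, PQQ and a single hexose pass, while the disaccharide (11 oxygen atoms, 8 OH groups) and the trisaccharide fail, which is the filtering effect on carbohydrates discussed in the Results.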

Results and Discussion

There are many libraries of chemical compounds available from various sources for screening based on their drug-like nature, or for selection as drugs by QSAR or CoMFA studies¹⁰,¹¹. However, such screening takes an enormous amount of computational time. Alternative methods are based on library-based virtual ligand screening (VLS), but the existing compound libraries for VLS are numerous, and screening all of their compounds again is computationally very time consuming. There is therefore a need to look for databases of compounds that are known to bind to proteins or enzymes, because such compounds, or their analogues, have an affinity for proteins and hence a higher chance of becoming drugs. The PDB, being the source of protein structure coordinates, also contains the coordinates of ligands and other additives bound to proteins. Several databases of these ligand and additive coordinates have evolved from the PDB, such as HIC-Up, PLD and Ligand Depot. These databases are appropriate for retrieving compound names and their structural information. Some of them, however, contain buffer ions, metal ions and solvents along with other compounds, which may not be very interesting for a researcher looking for drug-like molecules. There is, therefore, a need for a database of compounds that have affinity towards proteins and known biological activity on them. Prior to that, however, the chemical nature of these compounds, their conformational flexibility and the factors influencing the conformations of proteins have to be studied.

Based on Lipinski's rule of five, excluding the LogP criterion, all the entries of the HIC-Up database were screened and the compounds satisfying the rule were retained. LogP¹² is the logarithm of the octanol/water partition coefficient of a compound, and its value is not available for all the compounds. The February 2004 version of HIC-Up contains 4669 entries. About 64% of the entries (2998) were found to follow Lipinski's rule. These entries were further analyzed for their nature. Finding the biological activity of all the compounds was not easy, although attempts were made. About 40% of the 2998 compounds (1195) have been scanned so far, and further screening is in progress. Surprisingly, only 18% of the 1195 compounds (216) had reports of biological activity. This search was done with search engines available on the Internet, so it may not be an exhaustive record of the biological activity of these compounds. The 216 compounds are distributed as follows: heterocyclic 51.9%, aromatic 24.1%, aliphatic 16.2%, cyclic 6.5% and carbohydrates 1.4%. The predominance of heterocyclic compounds is understandable, since they are considered to be biologically more active. The small number of carbohydrates is due to the filtering effect of Lipinski's rule: only monosaccharides fit the rule, because any increase in the oligomeric state of a carbohydrate raises the number of O + N atoms beyond the limit of 10 (a hexose disaccharide already has 11 oxygen atoms). We are currently creating a database of compounds with biological activity.
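The screening funnel reported above can be checked with simple arithmetic. The sketch below recomputes the stated percentages from the reported counts and back-calculates approximate class sizes for the 216 biologically active compounds; these class counts are an inference from the rounded percentages and were not reported by the authors.

```python
# Counts reported in the text for the HIC-Up screening funnel.
total_entries = 4669      # HIC-Up 8.1, Feb 2004
lipinski_pass = 2998      # entries satisfying the LogP-independent criteria
scanned = 1195            # compounds scanned for biological activity so far
bioactive = 216           # compounds with reported biological activity

steps = [
    ("passed rule of five", lipinski_pass, total_entries),
    ("scanned for activity", scanned, lipinski_pass),
    ("reported bioactive", bioactive, scanned),
]
for label, part, whole in steps:
    print(f"{label:22s} {part:5d}/{whole:5d} = {100 * part / whole:5.1f}%")

# Class distribution of the 216 bioactive compounds (percentages from the text).
class_percent = {
    "heterocyclic": 51.9,
    "aromatic": 24.1,
    "aliphatic": 16.2,
    "cyclic": 6.5,
    "carbohydrate": 1.4,
}
# Back-calculated counts: an inference, not data reported in the paper.
implied_counts = {k: round(bioactive * p / 100) for k, p in class_percent.items()}
print(implied_counts, "sum =", sum(implied_counts.values()))
```

The recomputed ratios (64.2%, 39.9% and 18.1%) agree with the rounded values in the text, and the implied class sizes (112, 52, 35, 14 and 3) sum to 216, so the reported percentages are internally consistent.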

One important feature is that many compounds occur more than once in the database. This gives a chance to look at the conformational freedom of various compounds. A particular compound may experience different environments in different proteins, depending on its conformational flexibility. Since these compounds contain many single bonds, they are flexible and may exist in more than one conformation. These compounds can also induce conformational changes in the proteins to which they are bound in different contexts. It was also observed that the same ligand was bound to different proteins in different environments. For example, the ligand dihydroxyacetone phosphate (DHAP) is known to be associated with fructose 1,6-bisphosphate aldolase¹³ (FBA), fructose 1-phosphate aldolase¹⁴ (FPA), triosephosphate isomerase¹⁵ (TIM) and dihydroxyacetone kinase¹⁶ (DHK). Although all these enzymes except dihydroxyacetone kinase share a similar fold, the binding site and the orientation of DHAP in the binding pocket are not the same. The amino acid sequences of the enzymes are not similar; therefore, the environment in which DHAP is bound also differs. Figure 1A and Table 1 show the general scheme of DHAP binding and the rotational freedom around the individual bonds, which illustrates the conformational variation of DHAP. Two situations were observed with DHAP: in one, DHAP is covalently bound to FBA, while in the other it is bound to FBA through non-covalent interactions. The torsion angles of DHAP in the two PDB entries of FBA with covalently bound DHAP (1J4E and 1OK4) show that its conformation is not the same. Although Lys 229 is the residue covalently bound to C2 of DHAP in both 1J4E and 1OK4, the other residues surrounding DHAP differ. The conformational flexibility of DHAP may therefore be attributed to the variation in amino acid sequence between the FBA of Oryctolagus cuniculus (1J4E) and that of Thermoproteus tenax (1OK4). Table 1 also shows that DHAP adopts two different conformations within the same enzyme, in different molecules of the crystallographic asymmetric unit. In the non-covalent complex with FBA (1FDJ), DHAP exists in two conformations (Table 1, 1FDJ A and 1FDJ B). This is due to the flexible nature of the amino acids responsible for holding DHAP in the active site (Fig. 2). Overall, DHAP exists in at least 9 different environments within three different enzymes, depending on the variations in amino acid sequence and the nature of the binding (data not shown). This ligand flexibility needs to be taken into account when such ligands are used in docking studies. Flexibility in the proteins is also an important factor. However, not many commercial packages allow for flexibility in the target protein while carrying out docking studies.

Fig. 1: Molecular structures of (A) dihydroxyacetone phosphate and (B) pyrroloquinoline quinone, shown as stick drawings. The atom numbering has been taken from their entries in the PDB.

Table 1: Torsion angles (deg) of DHAP in different protein complexes (proteins are denoted by their PDB code with chain identifier)

PDB entry   O1P-P-O1-C1   P-O1-C1-C2   O1-C1-C2-C3   C1-C2-C3-O3
1FDJ A          -101.16      -126.66         95.24          6.43
1FDJ B          -163.27      -173.02         91.56        -78.84
1J4E              66.81      -138.21       -137.67        -97.71
1OK4             -67.74       137.84          3.72       -111.57

Fig. 2: Protein environment of PQQ in (A) quinoprotein glucose dehydrogenase and (B) quinohemoprotein alcohol dehydrogenase. Oxygen atoms are shown in red and nitrogens in blue. Carbon atoms are shown in cyan or gold (A) or green (B). The calcium ion position is shown in one of the molecules. Note the difference in the amino acids surrounding the two PQQ molecules.

Table 2: Torsion angles (deg) of PQQ in different protein complexes (proteins are denoted by their PDB code)

PDB entry   C8-C9-C9X-O9A   N1-C2-C2X-O2A   N6-C7-C7X-O7A
1CQ1                33.13           38.70          -10.42
1OTW               -13.86          -13.44          -21.01
1KB0                 4.46           -8.95            7.05
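The torsion angles in Tables 1 and 2 are dihedral angles defined by four bonded atoms. The sketch below shows the standard calculation (IUPAC sign convention, result between -180° and 180°) together with a wrapped angular difference for comparing two conformations. It is an illustration, not the CCP4 or MERCURY code used in this study; in practice the four coordinates would be read from the ATOM/HETATM records of the PDB entry.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def angle_difference(a: float, b: float) -> float:
    """Smallest signed difference b - a between two torsions, in (-180, 180]."""
    d = (b - a) % 360.0
    return d - 360.0 if d > 180.0 else d


# DHAP torsions from Table 1 (O1P-P-O1-C1, P-O1-C1-C2, O1-C1-C2-C3, C1-C2-C3-O3).
table1 = {
    "1FDJ A": (-101.16, -126.66, 95.24, 6.43),
    "1FDJ B": (-163.27, -173.02, 91.56, -78.84),
    "1J4E": (66.81, -138.21, -137.67, -97.71),
    "1OK4": (-67.74, 137.84, 3.72, -111.57),
}
for a, b in [("1FDJ A", "1FDJ B"), ("1J4E", "1OK4")]:
    diffs = [angle_difference(x, y) for x, y in zip(table1[a], table1[b])]
    print(a, "vs", b, [f"{d:.2f}" for d in diffs])
```

For the two covalent FBA complexes, the naive difference in the P-O1-C1-C2 torsion (137.84 - (-138.21) = 276.05°) wraps to -83.95°, so the wrapping step matters when deciding how different two conformations really are.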

Pyrroloquinoline quinone (PQQ), a planar aromatic heterocyclic compound containing three carboxylic acid groups and two quinone oxygens (hydroxyl groups in the reduced form) (Fig. 2B), is observed to be bound to three different proteins. Since the three carboxylic acid groups are the only groups that can rotate to give different conformations of PQQ, the corresponding dihedral angles were measured. Two sets of dihedrals were found (Table 2). The proteins to which PQQ is bound are different, and therefore the molecular environment around the binding site is also different. Nevertheless, some similarities were expected, as the planar aromatic region of PQQ should bind to a hydrophobic region of the target proteins. This was not the case, however, because the aromatic binding regions also differ. While in alcohol dehydrogenase¹⁷ (1KB0) and pyrroloquinoline quinone synthase¹⁸ (1OTW) aromatic residues of the protein make stacking as well as edge-to-plane interactions with PQQ, there are no such aromatic-aromatic interactions in glucose dehydrogenase¹⁹ (1CQ1). In aromatic molecular packing, stacking and edge-to-plane interactions are common and are believed to be stabilizing aromatic-aromatic interactions²⁰⁻²².
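The distinction between stacking and edge-to-plane aromatic interactions is geometric. A minimal sketch, assuming planar rings given as lists of atom coordinates, classifies a ring pair from the distance between ring centroids and the angle between ring normals. The cut-offs (7 Å, 30° and 60°) and the idealized test rings are illustrative assumptions, not criteria used by the authors.

```python
import math

Vec = tuple[float, float, float]


def centroid(ring: list[Vec]) -> Vec:
    n = len(ring)
    return (sum(a[0] for a in ring) / n,
            sum(a[1] for a in ring) / n,
            sum(a[2] for a in ring) / n)


def unit_normal(ring: list[Vec]) -> Vec:
    """Unit normal of the plane through the first three ring atoms (planar ring assumed)."""
    p0, p1, p2 = ring[0], ring[1], ring[2]
    u = (p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2])
    v = (p2[0] - p0[0], p2[1] - p0[1], p2[2] - p0[2])
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(n[0] ** 2 + n[1] ** 2 + n[2] ** 2)
    return (n[0] / length, n[1] / length, n[2] / length)


def classify_pair(ring_a, ring_b, max_dist=7.0, stack_max=30.0, edge_min=60.0):
    """Return (label, centroid distance in angstroms, interplanar angle in degrees)."""
    dist = math.dist(centroid(ring_a), centroid(ring_b))
    na, nb = unit_normal(ring_a), unit_normal(ring_b)
    cos_angle = min(1.0, abs(sum(x * y for x, y in zip(na, nb))))
    angle = math.degrees(math.acos(cos_angle))  # folded into 0-90 degrees
    if dist > max_dist:
        label = "none"
    elif angle <= stack_max:
        label = "stacking"
    elif angle >= edge_min:
        label = "edge-to-plane"
    else:
        label = "intermediate"
    return label, dist, angle


def hexagon(center: Vec, plane: str, r: float = 1.39) -> list[Vec]:
    """Idealized six-membered ring in the 'xy' or 'xz' plane (test geometry only)."""
    pts = []
    for k in range(6):
        c, s = r * math.cos(math.pi * k / 3), r * math.sin(math.pi * k / 3)
        offset = (c, s, 0.0) if plane == "xy" else (c, 0.0, s)
        pts.append((center[0] + offset[0], center[1] + offset[1], center[2] + offset[2]))
    return pts


ring = hexagon((0.0, 0.0, 0.0), "xy")
print(classify_pair(ring, hexagon((0.0, 0.0, 3.5), "xy")))  # parallel rings 3.5 A apart
print(classify_pair(ring, hexagon((0.0, 0.0, 5.0), "xz")))  # perpendicular ring, edge toward face
```

Real binding-site rings would need the actual ring atoms of PQQ and of the aromatic residues (for example Phe, Tyr or Trp side chains) taken from the PDB entries.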

Also, while a calcium ion is bound to N6, O7A and O5 of PQQ in 1KB0 and 1CQ1, such metal binding is absent in 1OTW. The carboxylic acid groups are bound mainly to basic residues in 1CQ1 and 1OTW. In 1KB0, however, the C2 carboxylic acid group of PQQ is hydrogen bonded to an acidic amino acid (Fig. 2A and B). This implies that either the carboxylic acid group of PQQ was protonated or the protein bound PQQ at acidic pH, say around 5. The C2 carboxylic acid group makes a strong hydrogen bond with Glu 70 (2.47 Å between Glu 70 OE2 and O2A of PQQ), suggesting that Glu 70 was protonated. The other two carboxylic acid groups of PQQ show interactions involving negatively charged oxygens, which leads to the assumption that the C2 carboxylic acid should also be negatively charged (Fig. 2B). Glu 70 should therefore be protonated, since a negatively charged Glu 70 carboxylate has no proton to donate and cannot form a hydrogen bond with a negatively charged PQQ carboxylate. In fact, alcohol dehydrogenase¹⁷,²³ was crystallized at pH 5.5. The pH is therefore also an important factor when considering docking interactions between highly charged species. These examples suggest that proteins change their protonation state depending on the changing environment of the cell (pH) and act accordingly. Such changes are not easy to predict, and they introduce uncertainty into computational docking calculations.
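The pH dependence of a carboxylic acid's protonation state can be made quantitative with the Henderson-Hasselbalch relation, in which the protonated fraction is 1/(1 + 10^(pH - pKa)). The sketch below is illustrative: the solution pKa of about 4.2 for a glutamate side chain is a textbook approximation, the shifted pKa values are hypothetical, and the pKa of Glu 70 in 1KB0 was not determined in this work.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of an acid group in the protonated (HA) form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


CRYSTALLIZATION_PH = 5.5  # pH reported for the alcohol dehydrogenase crystals

# Approximate solution pKa (~4.2) followed by hypothetical environment-shifted values.
for pka in (4.2, 5.5, 7.0):
    f = fraction_protonated(CRYSTALLIZATION_PH, pka)
    print(f"pKa {pka:.1f}: {100 * f:.1f}% protonated at pH {CRYSTALLIZATION_PH}")
```

With a solution-like pKa, only a small fraction of the side chains would be protonated at pH 5.5, whereas a pKa shifted upward by the protein environment would make the protonated form predominant; the outcome depends strongly on a quantity that is hard to estimate, which is one reason protonation states are difficult to handle in docking.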

References

1 Berman H M, Westbrook J, Feng Z, Gilliland G, Bhat T N, Weissig H, Shindyalov I N & Bourne P E, Nucleic Acids Res, 28 (2000) 235.

2 Andreeva A, Howorth D, Brenner S E, Hubbard T J P, Chothia C & Murzin A G, Nucleic Acids Res, 32 (2004) D226.

3 Pearl F M G, Bennett C F, Bray J E, Harrison A P, Martin N, Shepherd A, Sillitoe I, Thornton J & Orengo C A, Nucleic Acids Res, 31 (2003) 452.

4 Laskowski R A, Hutchinson E G, Michie A D, Wallace A C, Jones M L & Thornton J M, Trends Biochem Sci, 22 (1997) 488.

5 Laskowski R A, Nucleic Acids Res, 29 (2001) 221.

6 Kleywegt G J & Jones T A, Acta Cryst D, 54 (1998) 1119.

7 Puvanendrampillai D & Mitchell J B O, Bioinformatics, 19 (2003) 1856.

8 Lipinski C A, Lombardo F, Dominy B W & Feeney P J, Adv Drug Delivery Rev, 23 (1997) 3.

9 Collaborative Computational Project, Number 4, The CCP4 Suite: Programs for Protein Crystallography, Acta Crystallogr D, 50 (1994) 760.

10 Marshall G R, Barry C D, Bosshard H E, Dammkoehler R A & Dunn D A, ACS Symp Ser, 112, edited by E C Olson & R E Christoffersen (Am Chem Soc, Washington DC) 1979.

11 Cramer R D III, Patterson D E & Bunce J D, J Am Chem Soc, 110 (1988) 5959.

12 Hansch C, Acc Chem Res, 2 (1969) 232.

13 Choi K H, Shi J, Hopkins C E, Tolan D R & Allen K N, Biochemistry, 40 (2001) 13868.

14 Joerger A C, Mueller-Dieckmann C & Schulz G E, J Mol Biol, 303 (2000) 531.

15 Jogl G, Rozovsky S, McDermott A E & Tong L, Proc Natl Acad Sci (USA), 100 (2003) 50.

16 Garcia-Alles L F, Siebold C, Luthi-Nyffeler T, Flukiger-Bruhwiler K, Schneider P, Burgi H-B, Baumann U & Erni B, Biochemistry, 43 (2004) 13037.

17 Oubrie A, Rozeboom H J, Kalk K H, Huizinga E G & Dijkstra B W, J Biol Chem, 277 (2002) 3727.

18 Magnusson O T, Toyama H, Saeki M, Rojas A, Reed J C, Liddington R C, Klinman J P & Schwarzenbacher R, Proc Natl Acad Sci (USA), 101 (2004) 7913.

19 Oubrie A, Rozeboom H J, Kalk K H, Olsthoorn A J J, Duine J A & Dijkstra B W, EMBO J, 18 (1999) 5187.

20 Burley S K & Petsko G A, Science, 229 (1985) 23.

21 McGaughey G B, Gagne M & Rappe A K, J Biol Chem, 273 (1998) 15458.

22 Anderson D E, Hurley J H, Nicholson H, Baase W A & Matthews B W, Protein Sci, 2 (1993) 1285.

23 Oubrie A, Rozeboom H J, Kalk K H, Huizinga E G & Dijkstra B W, Acta Crystallogr D, 57 (2001) 1732.