
In studying protein-protein interactions it is important to accurately describe the surface of the proteins, as that is where the interactions occur. The most common surface representation in molecular visualization programs is the Lee-Richards (LR) surface, which is generated by rolling a probe representing a solvent molecule over the van der Waals surface of the protein. This approach is computationally slow because the program must consider two or three atoms at a time. The resulting molecular surface is like a van der Waals surface, but with reentrant surfaces bridging gaps between reasonably close atoms. Also, the LR surface algorithm sometimes incorrectly assigns atoms to the surface of the protein. We are developing a new algorithm with the hope that it will be both more accurate and faster than current approaches.

In the algorithm, atoms are first identified by their accessibility to an imaginary water molecule; the algorithm then finds the number of these atoms that are within distance epsilon of atom i. It then explores a sphere of radius epsilon around atom i. The sphere is divided into 8 sectors, and the algorithm determines how many of the n atoms contained in the sphere fall into each sector. If all 8 sectors contain at least one atom, the atom in question is considered an inside atom and given a value of 0. If at least one sector contains no atoms, the atom in question is considered outside and given a value of 1. This algorithm is currently implemented in Minitab as a global macro. Atomic coordinates were extracted using an awk script and saved to text files. Preliminary results show that the algorithm is flexible and can identify outside atoms across different protein shapes. A unique pattern also emerged: no matter the protein, there was a limit on the percentage of atoms found on the outside; the graphs show identical limits for all proteins studied thus far. The downside to this approach is that it is computationally burdensome. We are currently testing the algorithm with different proteins and are investigating the association of limits with protein size and shape. Our next step will be to program the algorithm in a different language such as Python, Java, or C++ to see if a more robust language can decrease the run time. In conclusion, this new algorithm is effective on a variety of proteins and reveals a unique aspect of proteins, the limits, which may lead to new insights into protein surfaces.
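
The sector test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Minitab macro or the authors' Python program: the names octant_index and classify_atoms are hypothetical, the 8 sectors are taken to be the octants defined by the signs of the coordinate offsets, "within distance epsilon" is read as less than or equal to epsilon, a zero offset is counted on the positive side, and the initial water-accessibility filter is left out.

```python
import itertools


def octant_index(dx, dy, dz):
    """Map an offset from the center atom to one of 8 sectors (0-7).

    Assumption: a zero offset is counted on the positive side.
    """
    return int(dx >= 0) + 2 * int(dy >= 0) + 4 * int(dz >= 0)


def classify_atoms(coords, epsilon):
    """Return one flag per atom: 0 = inside, 1 = outside.

    coords is a list of (x, y, z) tuples. The double loop compares every
    pair of atoms, so the cost grows with the square of the atom count.
    """
    eps_sq = epsilon * epsilon
    flags = []
    for i, (xi, yi, zi) in enumerate(coords):
        occupied = set()  # sectors that contain at least one neighbor
        for j, (xj, yj, zj) in enumerate(coords):
            if i == j:
                continue  # the center atom does not fill its own sectors
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            if dx * dx + dy * dy + dz * dz <= eps_sq:
                occupied.add(octant_index(dx, dy, dz))
        flags.append(0 if len(occupied) == 8 else 1)
    return flags


# Toy data: one atom at the origin surrounded by the 8 corners of a cube.
demo_points = [(0.0, 0.0, 0.0)] + [
    tuple(float(c) for c in corner)
    for corner in itertools.product((-1, 1), repeat=3)
]
demo_flags = classify_atoms(demo_points, 2.0)
# -> [0, 1, 1, 1, 1, 1, 1, 1, 1]: only the central atom is inside.
```

In the toy example the central atom sees a neighbor in every sector and is marked inside, while each corner atom has at least one empty sector and is marked outside.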

Abstract # 2624

A STATISTICAL ANALYSIS AND APPROACH TO PROTEIN SURFACE MODELING

Luticha Doucette†, James Halavin*, Paul Craig°, Herbert J. Bernstein‡

†RIT Life Sciences, *RIT Mathematical Sciences, °RIT Chemistry, ‡Dowling College, Mathematics and Computer Science

Results from Minitab

Figure 1. 3D Minitab Scatter Plot of 1UAQ

Figure 1 shows protein 1UAQ with epsilon = 40. Number of outside atoms found = 434.

Figure 2 shows how a change in epsilon changes how many atoms are found. Epsilon = 4.6. Number of outside atoms = 1593.

Conclusion

• The current algorithm is flexible and can handle many differently shaped proteins

• Proteins show similar asymptotic curves representing the number of atoms that are identified as being on the protein surface

• Once implemented in Python, the algorithm runs faster and does not crash as it did in Minitab

Future Plans

• Refine the algorithm in Python

• Test the refined algorithm with the proteins used in Minitab and compare results

• Once the algorithm is refined, expand tests to different types of proteins according to class and size

• Incorporate the algorithm into the existing ProMol extension as another tool

• Publish!

Goals

• Create a faster, more accurate representation of a protein surface

• Implement the algorithm in a more robust language such as Python

• Investigate the implications of limits in protein surfaces

Materials and Methods

• Created global macros and used an awk script to obtain the coordinates of atoms from PDB files

• Opened Minitab in Windows. A global macro was created and added to the Minitab file menu.

• In Python, reinterpreted the algorithm and tested it on the same PDB files that were tested in Minitab
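
The awk script used for coordinate extraction is not reproduced on the poster. As a rough Python equivalent, the sketch below reads x, y, z from the fixed columns of ATOM records in a PDB file (columns 31-38, 39-46, and 47-54). The function name read_atom_coords is hypothetical, and skipping HETATM records is an assumption; the original script may have treated them differently.

```python
def read_atom_coords(lines):
    """Collect (x, y, z) tuples from the ATOM records of a PDB file.

    lines is any iterable of text lines, for example an open file.
    Assumption: only ATOM records are kept; HETATM records are skipped.
    """
    coords = []
    for line in lines:
        # Record name occupies columns 1-6; coordinates end at column 54.
        if line[:6] == "ATOM  " and len(line) >= 54:
            x = float(line[30:38])  # columns 31-38
            y = float(line[38:46])  # columns 39-46
            z = float(line[46:54])  # columns 47-54
            coords.append((x, y, z))
    return coords
```

With a downloaded structure saved under a hypothetical name such as 1uaq.pdb, the call would be coords = read_atom_coords(open("1uaq.pdb")), after which coords can be passed to whatever routine classifies the atoms.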

Acknowledgments

Scott “JT” Mengel

Jon Schull

Eulas Boyd and NSF-LSAMP

The authors gratefully acknowledge the assistance of current and former students who have worked at Dowling College and at RIT on the SBEVSL project. Funding: This work has been supported in part by National Science Foundation Division of Undergraduate Education grant 0402408, National Institute of General Medical Sciences grants 2R15GM078077-02, 3R15GM078077-02S1. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.

Results From Python

Table 1 shows that as epsilon is increased, the number of outside atoms found reaches a limit.

Algorithm

Center (x(i), y(i), z(i))

An imaginary surface is created around the center atom and divided into 8 sectors.

If at least one sector is empty, value = 1.

If all sectors are filled, value = 0.
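
Stated a little more formally, and only as one reading of the diagram, the rule can be written as below, where r_i = (x_i, y_i, z_i) is the center atom and [P] equals 1 when P is true and 0 otherwise. Reading "within epsilon" as less than or equal to epsilon, and counting a zero offset on the positive side, are assumptions made here rather than details given on the poster.

```latex
\[
N_\varepsilon(i) = \{\, j \neq i : \lVert \mathbf{r}_j - \mathbf{r}_i \rVert \le \varepsilon \,\}
\]
\[
s(i,j) = [x_j \ge x_i] + 2\,[y_j \ge y_i] + 4\,[z_j \ge z_i] \in \{0, \dots, 7\}
\]
\[
v_i(\varepsilon) =
\begin{cases}
0 & \text{if } \{\, s(i,j) : j \in N_\varepsilon(i) \,\} = \{0, \dots, 7\} \quad \text{(inside)} \\
1 & \text{otherwise} \quad \text{(outside)}
\end{cases}
\]
```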

Figure 2. 3D Scatter Plot of 1UAQ with Change in Epsilon

Epsilon   Number Inside (0)   Number Outside (1)   Total Number of Atoms (N)
4         415                 2443                 2858
4.6       1265                1593                 2858
8         2302                556                  2858
15        2415                443                  2858
40        2424                434                  2858
75        2424                434                  2858

Table 1. Limiting Asymptotic Value for 1UAQ
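
To express the Table 1 counts as percentages, the short calculation below divides the number of outside atoms by the total for each epsilon. The numbers are copied from the table; the variable and function names are hypothetical.

```python
# (epsilon, number inside, number outside) rows copied from Table 1 for 1UAQ.
table_1 = [
    (4.0, 415, 2443),
    (4.6, 1265, 1593),
    (8.0, 2302, 556),
    (15.0, 2415, 443),
    (40.0, 2424, 434),
    (75.0, 2424, 434),
]


def percent_outside(n_inside, n_outside):
    """Share of atoms flagged as outside, as a percentage of all atoms."""
    return 100.0 * n_outside / (n_inside + n_outside)


percents = [
    (eps, round(percent_outside(n_in, n_out), 1))
    for eps, n_in, n_out in table_1
]

for eps, pct in percents:
    print(f"epsilon = {eps:5.1f}  outside = {pct:5.1f}%")
```

For 1UAQ the share of outside atoms drops from roughly 85% at epsilon = 4 to about 15.2% at epsilon = 40 and does not change at epsilon = 75.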

Figure 3. 3D Scatter Plot of 3CTK

Figure 3 shows protein 3CTK with epsilon set to 70. Maximum radius was found to be 62.927. Number of outside atoms found = 469

Figure 4. Plot of Maximum Radius vs Percent Outside

Proteins studied thus far show very similar asymptotes, as seen in Figure 4. All show a sharp decrease until the radius is about 20% of the total number of atoms, and then the curve levels off.
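
One possible reading of the leveling off, assuming the sector rule works as described in the abstract: enlarging epsilon can only add neighbors to a sphere, so an atom that is inside at one epsilon stays inside at every larger one, and the outside count can only fall. Once epsilon exceeds every interatomic distance, the only atoms still outside are those with a sector that contains no atoms at any distance. The sketch below, which is not the authors' code, computes for each atom the smallest epsilon at which all 8 sectors are filled. The function names are hypothetical, and the sector convention (octants by sign of offset, zero offsets on the positive side, distance less than or equal to epsilon) is an assumption.

```python
import itertools
import math


def critical_epsilons(coords):
    """Smallest epsilon at which each atom has all 8 sectors occupied.

    Returns math.inf for atoms with a sector that holds no atoms at any
    distance; such atoms remain outside however large epsilon becomes.
    """
    result = []
    for i, (xi, yi, zi) in enumerate(coords):
        nearest = [math.inf] * 8  # nearest neighbor distance per sector
        for j, (xj, yj, zj) in enumerate(coords):
            if i == j:
                continue
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            sector = int(dx >= 0) + 2 * int(dy >= 0) + 4 * int(dz >= 0)
            dist = math.sqrt(dx * dx + dy * dy + dz * dz)
            nearest[sector] = min(nearest[sector], dist)
        # All sectors are filled only once the farthest "nearest" is reached.
        result.append(max(nearest))
    return result


def outside_counts(coords, epsilons):
    """Number of outside atoms at each epsilon, from one pairwise pass."""
    crit = critical_epsilons(coords)
    return [sum(1 for c in crit if c > eps) for eps in epsilons]


# Toy data: one atom at the origin surrounded by the 8 corners of a cube.
toy_points = [(0.0, 0.0, 0.0)] + [
    tuple(float(c) for c in corner)
    for corner in itertools.product((-1, 1), repeat=3)
]
toy_counts = outside_counts(toy_points, [1.0, 2.0, 100.0])
# -> [9, 8, 8]: the count stops falling once epsilon covers the whole cube.
```

Under these assumptions the limiting count is simply the number of atoms whose critical epsilon is infinite, which would be one way to examine how the limit relates to protein size and shape.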

Figure 6. 1UAQ Implemented in vPython

From left to right: vPython places a boxel around each atom; each boxel is subdivided into 8 sectors; if other atoms are found within those sectors, Python returns false and that atom is eliminated. Only the outside atoms are left, with their corresponding boxels shown in yellow.

Results from Minitab Continued

Figure 5. 3D Scatter Plot of 1AV1

The new algorithm is flexible in that it can identify outside atoms in unusually shaped proteins, as seen in Figure 5. This plot is a different representation of a change in epsilon, as seen in Figures 1 and 2.

Program No. 978.9

Figure 7. Python vs. Minitab

[Two plots of Num Atoms versus Radius (0 to 80) for 1UAQ: "1UAQ in Python" (vertical axis 0 to 1400) and "1UAQ in Minitab" (vertical axis 0 to 3000).]

Figure 9. 1AV1 Python Implementation 3D scatter

On the left is 1AV1 in the next iteration of the algorithm, done in Python 2.7.5. Blue represents the maximum number of atoms found to be on the outside, while green represents interior atoms. 1649 out of 6588 atoms were found to be outside, which is 25% of the total, consistent with the asymptotic curves seen in Figure 4 as well as the 3D scatter plot in Figure 5. On the right are just the outside atoms, represented in blue.

In Figure 7, the plot on the left is 1UAQ from Python. The curve is not plotted as percent of maximum radius vs percent outside, but it shows some discrepancies between Minitab and Python. Further refinement of the Python implementation should resolve this issue.