
On a Generic Shape Complementarity Score

Morad Behandish, Horea T. Ilieş∗

Motivation. The ability to quantify shape complementarity (i.e., a measure for the ‘goodness of fit’) of geometric interfaces appears fundamental to applications as diverse as mechanical design and manufacturing automation, robot motion planning and navigation, protein docking and rational drug design, and in the broad scientific arena whenever the behavior/function of a system is dependent on proper geometric alignment (i.e., interfaceability) of the constituents. However, the current challenge lies in the lack of a generic mathematical formulation that applies to objects of arbitrary shape, and obtaining a general measure of interfaceability without simplifying assumptions on the shape domain remains an open problem.

In spite of the substantial amount of research on ad-hoc measures of shape complementarity for protein complexes (i.e., finite arrangements of spherical atoms) reviewed in [5, 8], the problem is scarcely studied for objects of arbitrarily complex surface features [1]. Here we propose a novel formulation and computational framework for objects of arbitrary shape in the Euclidean 3-space, potentially extensible to higher dimensions, built on a generalization of the ideas that are in use in the most recent protein docking systems [2, 4].

Formulation. Given two sets $S_1, S_2 \in \mathcal{S}$ in the 3-space, where $\mathcal{S} \subset E^3$ represents the collection of all ‘well-defined’ solid objects (here specified as compact regular semi-analytic subsets of the Euclidean metric space $E^3 = (\mathbb{R}^3, d)$ with the usual $L^2$-norm as the metric $d(x, y) = \|x - y\|_2$ for all $x, y \in \mathbb{R}^3$ [6]), the basic idea is to formulate the so-called shape complementarity score function $f : SE(3) \to \mathbb{R}$ as a cross-correlation of the form

$$ f(t; S_1, S_2) = (\rho_1 * \rho_2)(t) = \int_{\mathbb{R}^3} \rho_1(x)\, \rho_2(t^{-1}x)\, dx, \tag{1} $$

where $SE(3) \cong SO(3) \ltimes \mathbb{R}^3$ is the Special Euclidean group (i.e., the group of all rigid body transformations), $*$ is the convolution operator, and $dx$ is the infinitesimal volume element in $E^3$. The functions $\rho_{1,2} = \rho(x; S_{1,2})$ are shape descriptors (called affinity functions) that are invariant under rigid body motion, i.e., $\rho(x; tS) = \rho(t^{-1}x; S)$ for all $x \in \mathbb{R}^3$, $t \in SE(3)$, and $S \in \mathcal{S}$. How can one define the affinity function $\rho : (\mathbb{R}^3 - \partial S) \to \mathbb{R}$ (or $\mathbb{C}$)¹ for a given shape in such a way that the integral in (1) produces a higher score when there is a better geometric fit between the surface features of the stationary solid $S_1$ and the displaced solid $tS_2$? This raises a more fundamental question, which is, what exactly do we mean by the ‘goodness of fit’?
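To make the structure of (1) concrete, the following is a minimal sketch that evaluates a discretized version of the score for translations only, on a periodic voxel grid, using the FFT in the spirit of the grid-based correlation methods reviewed in [5]. It is an illustration under stated assumptions, not the authors' implementation: the function names `translational_score` and `direct_score` are hypothetical, the affinity functions are taken as already sampled on a common grid, rotations in SO(3) are not handled (they would have to be sampled separately), and the real part is taken as footnote 1 prescribes for complex-valued affinities.

```python
import numpy as np


def translational_score(rho1, rho2):
    """Discretized (1) for every cyclic grid translation t.

    Returns f with f[t] = Re sum_x rho1[x] * rho2[x - t], where indices wrap
    around (periodic grid, an assumption of this sketch). Note that (1) has
    no complex conjugate on rho2, hence the double conjugation below.
    """
    F1 = np.fft.fftn(rho1)
    # Transform of the index-reversed array y -> rho2[-y], without conjugating rho2.
    F2_reversed = np.conj(np.fft.fftn(np.conj(rho2)))
    return np.real(np.fft.ifftn(F1 * F2_reversed))


def direct_score(rho1, rho2, t):
    """Brute-force evaluation of the same sum for a single translation t."""
    shifted = np.roll(rho2, shift=t, axis=(0, 1, 2))  # shifted[x] = rho2[x - t]
    return float(np.real(np.sum(rho1 * shifted)))


# Usage with random complex-valued stand-ins for the sampled affinity functions.
rng = np.random.default_rng(0)
shape = (6, 6, 6)
rho1 = rng.normal(size=shape) + 1j * rng.normal(size=shape)
rho2 = rng.normal(size=shape) + 1j * rng.normal(size=shape)
scores = translational_score(rho1, rho2)
best_t = np.unravel_index(np.argmax(scores), scores.shape)
```

The FFT route computes the score for all grid translations at once, whereas `direct_score` costs one full sum per translation; the two agree up to floating-point error, which is a convenient sanity check.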

∗ Computational Design Lab, Departments of Mechanical Engineering and Computer Science and Engineering, University of Connecticut. Authors may be reached at [email protected] and [email protected].

¹ Our particular choice of the kernel used in the definition of the affinity function excludes the boundary from its domain, which does not affect (1) when dealing with solid objects that have nowhere-dense boundaries [7]. The range is changed to the complex plane for practical reasons to be explained shortly. In this case, the definition in (1) needs to be modified to $\mathrm{Re}\{\rho_1 * \rho_2\}$ to ensure an ordering on the range of the score function.

Figure 1: Shape complementarity score profiles for (a-b) assembly of two mechanical parts, with non-trivial fit correspondence between mating features; and (c-e) bound-bound docking of a part of Ran GTPase in complex with NTF-2 [PDB Code: 1A2K].

We start from an intuitive qualitative definition: A generic shape complementarity score model for objects of arbitrary shape can be obtained from a comparative overlapping of shape skeletons between the mutually complementary features, i.e., by overlapping the external skeleton of one object with the internal skeleton of its assembly partner. For a precise quantitative formulation, we use our novel concept of the Skeletal Density Function (SDF), which can be conceptualized as a continuous extension of the definition of the traditional shape skeletons:

$$ \rho(x; S) = \oint_{\partial S} \phi\big[ M(x, S)\, d(x, \partial S) + i\, d(x, y) \big]\, dy_\perp, \tag{2} $$

where $M : \mathbb{R}^3 \times \mathcal{S} \to \{-1, 0, +1\}$ is the Point Membership Classification (PMC) function [9], the three integer outcomes coding ‘in’, ‘on’, and ‘out’, respectively, yielding the signed distance function as the real part inside brackets; $dy_\perp$ is the projection of the surface element $dy$ on the plane normal to the vector $(y - x)$ for $x \in \mathbb{R}^3$ and $y \in \partial S$. The kernel $\phi : \mathbb{C} \to \mathbb{C}$ can be defined in a variety of ways, a proper candidate being $\phi(\zeta; \sigma) \propto (\sqrt{2\pi}\,\zeta^2)^{-1}\, g(|\tan\angle\zeta| - 1; \sigma)$, where $g(x; \sigma) = (\sqrt{2\pi}\,\sigma)^{-1} \exp[-\tfrac{1}{2}(x/\sigma)^2]$ is the isotropic Gauss function. This particular form is composed of a ‘medial’ component (the Gaussian term) that characterizes the skeletal density that extends an implicit definition of conventional skeletons, and a ‘proximal’ component (the inverse-square term) that obligates the skeletal branches to stronger densities near the object boundaries. The latter also adjusts the proper phase shift $\angle\phi = -2\angle\zeta$ of the integrand in (2), which induces (approximately) opposite phases between the high-density internal and external skeletal regions as a result of (2). This in turn results in meaningful contribution terms to the cross-correlation in (1); i.e., a positive real ‘award’ in (1) in case of external/internal skeletal overlap (i.e., proper fit), and a negative real ‘penalty’ in case of external/external overlap (i.e., separation) or internal/internal overlap (i.e., collision), the relative strength of each contribution being adjusted by the proportionality coefficient in the $\phi$-kernel that is chosen to be dependent on the sign of $\mathrm{Re}\{\zeta\}$ only.
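The sign behavior described above can be checked numerically on single kernel values. The sketch below is an illustration only: it evaluates the candidate kernel at two representative arguments rather than the full surface integral in (2), and the names `gauss`, `kernel`, `c_in`, and `c_out`, as well as the value of `sigma`, are assumptions introduced here (the text only states that the proportionality coefficient depends on the sign of Re{ζ}). The representative arguments use the fact that, for a point on a skeleton, the nearest boundary point satisfies d(x, y) = d(x, ∂S), so that |tan∠ζ| = 1 and the Gaussian term peaks.

```python
import math


def gauss(x, sigma):
    """Gauss function g(x; sigma) as defined in the text."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def kernel(zeta, sigma=0.5, c_in=1.0, c_out=1.0):
    """Candidate kernel phi(zeta; sigma) for a complex argument zeta.

    c_in and c_out are hypothetical proportionality coefficients, chosen by
    the sign of Re{zeta} (negative inside the solid, positive outside).
    """
    if zeta.real == 0.0:
        raise ValueError("the boundary (Re{zeta} = 0) is excluded from the domain")
    coeff = c_in if zeta.real < 0.0 else c_out
    tan_abs = abs(zeta.imag / zeta.real)  # |tan(angle(zeta))|
    return coeff * gauss(tan_abs - 1.0, sigma) / (math.sqrt(2.0 * math.pi) * zeta ** 2)


# Representative arguments at distance d from the boundary, with the nearest
# boundary point as y, so that d(x, y) = d(x, boundary).
d = 1.0
zeta_in = complex(-d, d)   # point on the internal skeleton (M = -1)
zeta_out = complex(d, d)   # point on the external skeleton (M = +1)

phi_in = kernel(zeta_in)
phi_out = kernel(zeta_out)

fit = (phi_in * phi_out).real        # internal/external overlap
collision = (phi_in * phi_in).real   # internal/internal overlap
separation = (phi_out * phi_out).real  # external/external overlap
```

With these inputs, `phi_in` and `phi_out` come out purely imaginary with opposite signs, so their product is a positive real number while each squared value is a negative real number, in line with the ‘award’ and ‘penalty’ terms described in the text.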

Validation. We will review several practical examples including those illustrated in Figure 1, and demonstrate the effectiveness of the method in mechanical assembly automation (a-b) [3], as well as ab initio protein docking (c-d). We will also investigate the theoretical and computational properties of the new formulation in comparison with the state-of-the-art in protein docking algorithms (e) [2].

Conclusion. Our proposed approach to modeling complementarity is generic: it applies to arbitrarily complex shapes; produces inherently robust results against small perturbations; is effective in steering both gradient-based and evolutionary optimization algorithms; possesses appealing computational properties that suggest efficient computational algorithms in the 3D Euclidean space; and subsumes the existing protein docking (of spherical atoms) approaches as special cases.


References Cited

[1] Pankaj K. Agarwal, Herbert Edelsbrunner, John Harer, and Yusu Wang. Extreme elevation on a 2-manifold. Discrete & Computational Geometry, 36(4):553–572, 2006.

[2] Chandrajit L. Bajaj, Rezaul Chowdhury, and Vinay Siddahanavalli. F2Dock: Fast Fourier protein-protein docking. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 8(1):45–58, 2011.

[3] Morad Behandish and Horea T. Ilieş. Peg-in-hole revisited: A generic force model for haptic assembly. In Proceedings of ASME Computers and Information in Engineering Conference (CIE), 2014.

[4] Rezaul Chowdhury, Muhibur Rasheed, Donald Keidel, Maysam Moussalem, Arthur Olson, Michel Sanner, and Chandrajit L. Bajaj. Protein-protein docking with F2Dock 2.0 and GB-rerank. PloS one, 8(3):e51307, 2013.

[5] Miriam Eisenstein and Ephraim Katchalski-Katzir. On proteins, grids, correlations, and docking. Comptes rendus biologies, 327(5):409–420, 2004.

[6] Aristides Requicha. Mathematical models of rigid solid objects. Production Automation Project, Technical Memo. No. 28, University of Rochester, 1977.

[7] Aristides Requicha and Robert B. Tilove. Mathematical foundations of constructive solid geometry: General topology of closed regular sets. Production Automation Project, Technical Memo. No. 27, University of Rochester, 1978.

[8] David W. Ritchie. Recent progress and future directions in protein-protein docking. Current Protein and Peptide Science, 9(1):1–15, 2008.

[9] Robert B. Tilove. Set membership classification: A unified approach to geometric intersection problems. IEEE Transactions on Computers, 100(10):874–883, 1980.
