NMR Protein Structure Calculation and Sphere Intersections

Nuclear Magnetic Resonance (NMR) experiments can be used to calculate 3D protein structures, and geometric properties of protein molecules allow us to solve the problem iteratively using a combinatorial method called Branch-and-Prune (BP). The main step of the BP algorithm is to intersect three spheres centered at the positions of atoms $i-3$, $i-2$, $i-1$, with radii given by the atomic distances $d_{i-3,i}$, $d_{i-2,i}$, $d_{i-1,i}$, respectively, to obtain the position of atom $i$. Because of uncertainty in NMR data, some of the distances $d_{i-3,i}$ should be represented as interval distances $[\underline{d}_{i-3,i}, \overline{d}_{i-3,i}]$, where $\underline{d}_{i-3,i} \le d_{i-3,i} \le \overline{d}_{i-3,i}$. In the literature, an extension of the BP algorithm was proposed to deal with interval distances, where the idea is to sample values from $[\underline{d}_{i-3,i}, \overline{d}_{i-3,i}]$. We present a new method, based on conformal geometric algebra, to reduce the size of $[\underline{d}_{i-3,i}, \overline{d}_{i-3,i}]$ before the sampling process. We also compare it with another approach proposed in the literature.


Introduction
Nuclear Magnetic Resonance (NMR) experiments provide short distance values between atoms of a protein molecule. The Molecular Distance Geometry Problem (MDGP) asks to realize the 3D protein structure using this partial distance information [4,5,31].
More precisely, the MDGP is defined on a graph $G = (V, E, d)$, where $V$ is a set of vertices representing the atoms and $E$ is a set of edges representing the atomic pairs for which a distance is available, given by the function $d : E \to (0, \infty)$. The problem is to find an embedding $x : V \to \mathbb{R}^3$ such that
$$\forall \{u, v\} \in E, \quad \|x_u - x_v\| = d_{u,v}, \qquad (1)$$
where $x_u = x(u)$, $x_v = x(v)$, $d_{u,v} = d(\{u, v\})$, and $\|x_u - x_v\|$ is the Euclidean norm.
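The classical approach to the MDGP is based on global optimization methods [29], where an MDGP solution is associated with the global minimizer of the problem
$$\min_{x_1, \dots, x_n \in \mathbb{R}^3} f(x_1, \dots, x_n),$$
where $f : \mathbb{R}^{3n} \to [0, \infty)$ is defined by
$$f(x_1, \dots, x_n) = \sum_{\{u, v\} \in E} \left( \|x_u - x_v\|^2 - d_{u,v}^2 \right)^2. \qquad (2)$$
Note that $x_1, \dots, x_n \in \mathbb{R}^3$ is an MDGP solution if, and only if, $f(x_1, \dots, x_n) = 0$.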
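As a minimal sketch of this equivalence, assuming the common penalty form in which each edge contributes the squared violation of $\|x_u - x_v\|^2 = d_{u,v}^2$, the function below evaluates such an objective. The function name and the dictionary-based input layout are illustrative choices, not part of the original formulation.

```python
import math

def mdgp_penalty(positions, distances):
    """Sum over all edges of (||x_u - x_v||^2 - d_uv^2)^2.

    positions: dict vertex -> (x, y, z)   (hypothetical input layout)
    distances: dict (u, v) -> d_uv        (hypothetical input layout)
    """
    total = 0.0
    for (u, v), d in distances.items():
        sq = sum((a - b) ** 2 for a, b in zip(positions[u], positions[v]))
        total += (sq - d * d) ** 2
    return total

# A right triangle in the plane z = 0 that meets all three distances.
positions = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0), 3: (0.0, 1.0, 0.0)}
distances = {(1, 2): 1.0, (1, 3): 1.0, (2, 3): math.sqrt(2.0)}
exact = mdgp_penalty(positions, distances)

# Moving vertex 3 violates two constraints, so the penalty becomes positive.
positions_bad = dict(positions)
positions_bad[3] = (0.0, 2.0, 0.0)
violated = mdgp_penalty(positions_bad, distances)
```

The penalty vanishes (up to rounding) exactly when every distance constraint is met, which is why a global minimizer with value zero is a realization of the graph.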
We may furnish the set $V$ of vertices with an ordering $V = \{v_1, \dots, v_n\}$ [9,15,23,25] so that the MDGP can be solved iteratively using a combinatorial method, namely the Branch-and-Prune (BP) method [8,28]. In this situation, the MDGP is called the Discretizable Molecular Distance Geometry Problem (DMDGP) [19,20], which can be stated as follows, where we use $x_i$ instead of $x_{v_i}$ and $d_{i,j}$ in place of $d_{v_i,v_j}$:

(DMDGP) Given a simple undirected graph $G = (V, E, d)$ in which the vertex set $V$ is ordered as $V = \{v_1, \dots, v_n\}$, whose edges are weighted by $d : E \to (0, \infty)$, subject to the following three constraints:
(i) there exist $x_1, x_2, x_3 \in \mathbb{R}^3$ satisfying the equations (1) for the first three vertices;
(ii) $\{v_{i-3}, v_i\}, \{v_{i-2}, v_i\}, \{v_{i-1}, v_i\} \in E$ for all $i > 3$; (3)
(iii) $d_{i-3,i-1} < d_{i-3,i-2} + d_{i-2,i-1}$ for all $i > 3$; (4)
find an embedding $x : V \to \mathbb{R}^3$ satisfying the equations (1).

Geometrically, the requirements (3) and (4) imply that, at each iteration of the BP algorithm, we intersect three spheres centered at the positions for vertices $v_{i-3}$, $v_{i-2}$, $v_{i-1}$, with radii $d_{i-3,i}$, $d_{i-2,i}$, $d_{i-1,i}$, respectively, resulting in two possible positions for $v_i$, $i > 3$. Distances $d_{i-1,i}$ and $d_{i-2,i}$ are considered precise values, known a priori, since they are related to bond lengths and bond angles of a protein [20]. However, distances $d_{i-3,i}$ may be obtained from NMR experiments and, instead of being represented by real numbers, they should be given as interval distances $[\underline{d}_{i-3,i}, \overline{d}_{i-3,i}]$. In this situation, we have the intersection of two spheres with a spherical shell, giving two arcs, instead of two points, in $\mathbb{R}^3$.
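The three-sphere intersection at the heart of each BP step can be sketched as follows. This is a standard trilateration written under the assumption that the three centers are not collinear (which the strict triangle inequality guarantees in the DMDGP setting); the function and variable names are illustrative and do not come from the BP literature.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(s, a):
    return tuple(s * x for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_three_spheres(c1, r1, c2, r2, c3, r3):
    """Return the two intersection points (mirror images with respect to the
    plane of the centers), or [] if the spheres have no common point."""
    d = norm(sub(c2, c1))
    ex = scale(1.0 / d, sub(c2, c1))          # unit vector from c1 to c2
    i = dot(ex, sub(c3, c1))
    ey_raw = sub(sub(c3, c1), scale(i, ex))   # part of c3 - c1 orthogonal to ex
    ey = scale(1.0 / norm(ey_raw), ey_raw)
    ez = cross(ex, ey)                        # normal to the plane of the centers
    j = dot(ey, sub(c3, c1))
    x = (r1 * r1 - r2 * r2 + d * d) / (2.0 * d)
    y = (r1 * r1 - r3 * r3 + i * i + j * j - 2.0 * i * x) / (2.0 * j)
    z2 = r1 * r1 - x * x - y * y
    if z2 < 0.0:
        return []
    z = math.sqrt(z2)
    base = add(c1, add(scale(x, ex), scale(y, ey)))
    return [add(base, scale(z, ez)), sub(base, scale(z, ez))]

# Hypothetical data: three centers and the distances to the point (0.5, 0.5, 1).
r = math.sqrt(1.5)
candidates = intersect_three_spheres((0.0, 0.0, 0.0), r,
                                     (1.0, 0.0, 0.0), r,
                                     (0.0, 1.0, 0.0), r)
```

The two returned points are the two candidate positions for $v_i$ that BP branches on; an interval radius on the third sphere turns each of them into an arc.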
In [21], an extension of the BP algorithm, called iBP, was proposed to deal with interval distances, where the idea is to sample values from $[\underline{d}_{i-3,i}, \overline{d}_{i-3,i}]$ [16]. Computational results presented in [10,11,32] confirm what should be expected: when many values are sampled, the search space grows exponentially, and when only a few values are sampled, a solution may not be found.
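The trade-off can be made concrete with a small sketch. It assumes equally spaced samples and a worst case in which every one of `levels` vertices carries an interval distance, so that each level branches into `s` sampled distances times two positions per distance; both helper names and this counting model are illustrative assumptions rather than the iBP implementation.

```python
def sample_interval(lo, hi, s):
    """Return s equally spaced values in [lo, hi] (the midpoint if s == 1)."""
    if s == 1:
        return [0.5 * (lo + hi)]
    step = (hi - lo) / (s - 1)
    return [lo + k * step for k in range(s)]

def worst_case_leaves(s, levels):
    """Leaves of the search tree under the assumed model: s samples per
    interval and two candidate positions per sampled distance."""
    return (2 * s) ** levels

samples = sample_interval(3.0, 4.0, 5)
leaves_small = worst_case_leaves(2, 10)    # 4 ** 10
leaves_large = worst_case_leaves(10, 10)   # 20 ** 10
```

Going from 2 to 10 samples per interval multiplies the worst-case tree size over ten levels by a factor of $5^{10}$, which illustrates why shrinking the intervals before sampling pays off.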
For a given vertex $v_i$, $i > 3$, if another distance $d_{j,i}$ ($j < i-3$) is detected by NMR, another spherical shell must be considered. This new information can be used to reduce the size of the interval distance $[\underline{d}_{i-3,i}, \overline{d}_{i-3,i}]$ before applying the sampling process.
Computational results presented in [14,26] confirm that the iBP algorithm improves when this kind of interval reduction is applied before sampling values. Without interval reduction, it is necessary to select a distance value from the interval $[\underline{d}_{i-3,i}, \overline{d}_{i-3,i}]$ in order to calculate a position for vertex $v_i$. From the positions for vertices $v_{i-2}$, $v_{i-1}$, $v_i$, we calculate a position for vertex $v_{i+1}$, making another selection from the interval $[\underline{d}_{i-2,i+1}, \overline{d}_{i-2,i+1}]$, and so on. A DMDGP solution is obtained when such selections allow us to reach the last vertex of the DMDGP order with all positions $x_1, \dots, x_n$ satisfying the equations (1). The main cost of the iBP algorithm is related to backtracking in the search tree when "wrong" distance values are selected. When interval distances are reduced, we also decrease the probability of selecting "wrong" distance values.
Using Conformal Geometric Algebra (CGA), we present a new way to make this reduction that simplifies the process considerably, compared to other approaches proposed in the literature.

Methods for reducing $[\underline{d}_{i-3,i}, \overline{d}_{i-3,i}]$
This section first describes a recent method proposed for reducing $[\underline{d}_{i-3,i}, \overline{d}_{i-3,i}]$ [26], which is an extension of the ideas presented in [14]. Then, we explain the new approach, motivated by the results given in [6,7].
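At each iteration, the position $x_i$ of vertex $v_i$ is determined by the system
$$\|x_i - x_{i-1}\| = d_{i-1,i}, \quad \|x_i - x_{i-2}\| = d_{i-2,i}, \quad \|x_i - x_{i-3}\| = d_{i-3,i}. \qquad (5)$$
When another distance $d_{j,i}$, $j < i-3$, is available, we have an extra equation to the system (5):
$$\|x_i - x_j\| = d_{j,i}. \qquad (6)$$
If the points $x_j$, $x_{i-3}$, $x_{i-2}$, $x_{i-1}$ are not coplanar, at most one of the two possible positions for $v_i$ satisfies (6). However, both positions for $v_i$ may fail to satisfy (6) and, in this case, they must be pruned. Then, we have to consider the other possible positions for $v_{i-1}$ and repeat the procedure until a DMDGP solution is found [20].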
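For all $i > 3$, the solution of the first two equations of the system (5) is a circle, the intersection of the two spheres centered at $x_{i-1}$ and $x_{i-2}$. Let $p$ be the center of this circle, let $c$ be the fixed point on the circle that lies in the plane defined by the points $x_{i-3}$, $x_{i-2}$, $x_{i-1}$ and is nearest to $x_{i-3}$, and define $\vec{u} = c - p$ and $\vec{v} = \vec{n} \times \vec{u}$, where $\vec{n}$ is the unit vector in the direction of $x_{i-1} - x_{i-2}$. The circle can then be described by (see Fig. 1)
$$x(t) = p + \cos(t)\,\vec{u} + \sin(t)\,\vec{v}, \quad t \in [-\pi, \pi]. \qquad (7)$$
To check this, replace $x(t)$ in the first equation of (5): since $\vec{u}$ and $\vec{v}$ are orthogonal to each other and to $p - x_{i-1}$, and $\|\vec{u}\| = \|\vec{v}\|$, we get $\|x(t) - x_{i-1}\|^2 = \|p - x_{i-1}\|^2 + \|\vec{u}\|^2 = d_{i-1,i}^2$ for every $t$, and the second equation is verified in the same way. From (7) and by the fact that $\vec{v}$ is orthogonal to $x_{i-3} - p$, we have
$$\|x(t) - x_{i-3}\|^2 = \|x_{i-3} - p\|^2 + \|\vec{u}\|^2 - 2\cos(t)\,(x_{i-3} - p) \cdot \vec{u},$$
implying that the solution for $\|x(t) - x_{i-3}\| = d_{i-3,i}$ is given by $t = \pm\tau(d_{i-3,i})$, where
$$\tau(d) = \arccos\left( \frac{\|x_{i-3} - p\|^2 + \|\vec{u}\|^2 - d^2}{2\,(x_{i-3} - p) \cdot \vec{u}} \right). \qquad (8)$$
Since $\tau$ is increasing in $d$, from (8) we obtain that the solution for $\underline{d}_{i-3,i} \le \|x(t) - x_{i-3}\| \le \overline{d}_{i-3,i}$ is given by
$$t \in [-\tau(\overline{d}_{i-3,i}), -\tau(\underline{d}_{i-3,i})] \cup [\tau(\underline{d}_{i-3,i}), \tau(\overline{d}_{i-3,i})].$$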
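Using the point $x_j$ instead of $x_{i-3}$, another parameterization of the circle defined by the first two equations of the system (5) must be defined, for $t \in [-\pi, \pi]$,
$$x'(t) = p + \cos(t)\,\vec{u}' + \sin(t)\,\vec{v}',$$
where $x'(0)$ is the point of the circle nearest to $x_j$, $\vec{u}' = x'(0) - p$ and $\vec{v}' = \vec{n} \times \vec{u}'$. To describe the solution of $\underline{d}_{j,i} \le \|x'(t) - x_j\| \le \overline{d}_{j,i}$ in terms of the first parameterization, it is necessary to obtain the coordinates of $x'(0) - p$ in terms of $\vec{u}$ and $\vec{v}$ [26], which results in (see Fig. 2)
$$x'(0) - p = \cos(\varphi)\,\vec{u} + \sin(\varphi)\,\vec{v}, \quad \text{so that} \quad x'(t) = x(t + \varphi),$$
where $\varphi$ is the angle between $x(0) - p$ and $x'(0) - p$.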
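Finally, the solution of the system formed by the first two equations of (5) together with the two interval constraints, on $\|x_i - x_{i-3}\|$ and on $\|x_i - x_j\|$, is given by the intersection of the two corresponding sets of parameter values, both expressed in the first parameterization.
The next subsection describes a new model for the 3D space, where spheres are basic objects, like points and planes. This model also provides a way to intersect spheres by defining a product among them.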
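The polar coordinates construction can be sketched numerically as follows. The code assumes that the two spheres do intersect and that the requested distances are attainable on the circle; the helper names, the orientation chosen for the second frame vector, and the sample coordinates are all illustrative assumptions.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(s, a):
    return tuple(s * x for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def circle_of_two_spheres(c1, r1, c2, r2):
    """Center p, radius rho and unit axis n of the circle where the spheres meet."""
    axis = sub(c2, c1)
    d = norm(axis)
    n = scale(1.0 / d, axis)
    h = (r1 * r1 - r2 * r2 + d * d) / (2.0 * d)  # distance from c1 to the circle plane
    rho = math.sqrt(r1 * r1 - h * h)
    return add(c1, scale(h, n)), rho, n

def frame_towards(q, p, rho, n):
    """u points from p to the circle point nearest to q; v = n x u."""
    w = sub(q, p)
    w_perp = sub(w, scale(dot(w, n), n))
    u = scale(rho / norm(w_perp), w_perp)
    return u, cross(n, u)

def angle_for_distance(q, p, u, d):
    """t in [0, pi] with ||x(t) - q|| = d for x(t) = p + cos(t) u + sin(t) v.
    The clamp only guards against rounding; d is assumed attainable."""
    w = sub(q, p)
    cos_t = (dot(w, w) + dot(u, u) - d * d) / (2.0 * dot(w, u))
    return math.acos(max(-1.0, min(1.0, cos_t)))

def point(p, u, v, t):
    return add(p, add(scale(math.cos(t), u), scale(math.sin(t), v)))

# Hypothetical data: the circle is the unit circle in the plane z = 0.
p, rho, n = circle_of_two_spheres((0.0, 0.0, -1.0), math.sqrt(2.0),
                                  (0.0, 0.0, 1.0), math.sqrt(2.0))
q = (2.0, 0.0, 0.0)                    # plays the role of the third center
u, v = frame_towards(q, p, rho, n)
t_lo = angle_for_distance(q, p, u, math.sqrt(3.0))
t_hi = angle_for_distance(q, p, u, math.sqrt(5.0))
t_solution = [(-t_hi, -t_lo), (t_lo, t_hi)]   # two symmetric arcs
```

Because the distance to the reference point depends on $t$ only through $\cos(t)$, an interval of distances always maps to two parameter intervals that are symmetric about $t = 0$, which are the two arcs discussed in the text.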

A conformal geometric algebra approach
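The Euclidean space $\mathbb{R}^3$ can be represented by the conformal model [27], adding two extra dimensions $e_0$ and $e_\infty$, where a point $x \in \mathbb{R}^3$ is represented in $\mathbb{R}^{4,1}$ by
$$X = x + e_0 + \tfrac{1}{2}\|x\|^2 e_\infty,$$
where the usual Euclidean metric still holds for $e_1, e_2, e_3$, while $e_0$ and $e_\infty$ are orthogonal to them and satisfy $e_0 \cdot e_0 = e_\infty \cdot e_\infty = 0$ and $e_0 \cdot e_\infty = -1$. An interesting property of the conformal model is that the inner product $X \cdot Y$ ($X, Y \in \mathbb{R}^{4,1}$) is the squared Euclidean distance between $x, y \in \mathbb{R}^3$, up to a constant factor:
$$X \cdot Y = \left( x + e_0 + \tfrac{1}{2}\|x\|^2 e_\infty \right) \cdot \left( y + e_0 + \tfrac{1}{2}\|y\|^2 e_\infty \right) = x \cdot y - \tfrac{1}{2}\|x\|^2 - \tfrac{1}{2}\|y\|^2 = -\tfrac{1}{2}\|x - y\|^2.$$
From this result, a sphere in $\mathbb{R}^3$ is encoded as a vector $S \in \mathbb{R}^{4,1}$ [12], given by
$$S = C - \tfrac{1}{2} r^2 e_\infty,$$
where $C$ is the conformal representation of the sphere center $c \in \mathbb{R}^3$ and $r \in \mathbb{R}$ is its radius. To see this, we use
$$X \cdot S = X \cdot C - \tfrac{1}{2} r^2 (X \cdot e_\infty) = -\tfrac{1}{2}\|x - c\|^2 + \tfrac{1}{2} r^2,$$
which implies that $X \cdot S = 0 \Leftrightarrow \|x - c\|^2 = r^2$.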
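These two properties can be checked with a few lines of code. The sketch below stores a conformal vector as the tuple of its coefficients on $(e_1, e_2, e_3, e_0, e_\infty)$ and assumes the usual convention $e_0 \cdot e_\infty = -1$, $e_0^2 = e_\infty^2 = 0$; the tuple layout and the function names are choices made for this illustration only.

```python
import math

def embed(x):
    """Conformal point x + e0 + 0.5*||x||^2 * e_inf as (e1, e2, e3, e0, e_inf)."""
    return (x[0], x[1], x[2], 1.0, 0.5 * sum(c * c for c in x))

def inner(A, B):
    """Inner product with e0.e_inf = -1 and e0.e0 = e_inf.e_inf = 0."""
    return (A[0] * B[0] + A[1] * B[1] + A[2] * B[2]
            - A[3] * B[4] - A[4] * B[3])

def sphere(c, r):
    """Sphere with center c and radius r: C - 0.5*r^2 * e_inf."""
    C = embed(c)
    return C[:4] + (C[4] - 0.5 * r * r,)

X = embed((1.0, 2.0, 2.0))
Y = embed((0.0, 0.0, 0.0))
half_neg_sq_dist = inner(X, Y)          # equals -0.5 * ||x - y||^2 = -4.5

S = sphere((0.0, 0.0, 0.0), 3.0)
on_sphere = inner(X, S)                 # x is at distance 3 from the center
inside = inner(embed((1.0, 0.0, 0.0)), S)
```

A positive value of `inner(P, S)` means the point is inside the sphere and a negative one means it is outside, since the product equals $\tfrac{1}{2}(r^2 - \|x - c\|^2)$.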
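A spherical shell with center $c \in \mathbb{R}^3$ and radii $\underline{r}, \overline{r} \in \mathbb{R}$, $0 < \underline{r} < \overline{r}$, is simply given by the family of spheres
$$S = C - \tfrac{1}{2} r^2 e_\infty, \quad \underline{r} \le r \le \overline{r},$$
where $C \in \mathbb{R}^{4,1}$ is the conformal representation of $c \in \mathbb{R}^3$. Sphere intersections can also be encoded in the conformal model if a more general product (associative and distributive), called geometric product [18], is introduced by
$$ab = a \cdot b + a \wedge b.$$
From the geometric product, the inner product defined above can be given by
$$a \cdot b = \tfrac{1}{2}(ab + ba),$$
and another product, called outer product [18], is defined by
$$a \wedge b = \tfrac{1}{2}(ab - ba),$$
for $a, b \in \mathbb{R}^{4,1}$. The intersection of distinct spheres is given by the outer product of their vector representations. For example, the intersection between two spheres, given in the conformal model by $\sigma_1, \sigma_2 \in \mathbb{R}^{4,1}$, is the circle given by [12]
$$\sigma = \sigma_1 \wedge \sigma_2. \qquad (9)$$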
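One way to see that the outer product of two spheres represents their common circle, without implementing the full geometric product, is to use the standard identity $X \cdot (a \wedge b) = (X \cdot a)\,b - (X \cdot b)\,a$ for vectors: for linearly independent $a$, $b$, the result vanishes exactly when $X \cdot a = X \cdot b = 0$. The sketch below relies on that identity; the coefficient layout $(e_1, e_2, e_3, e_0, e_\infty)$ and all names are assumptions of this illustration.

```python
import math

def embed(x):
    """Conformal point as coefficients on (e1, e2, e3, e0, e_inf)."""
    return (x[0], x[1], x[2], 1.0, 0.5 * sum(c * c for c in x))

def inner(A, B):
    """Inner product with e0.e_inf = -1 and e0.e0 = e_inf.e_inf = 0."""
    return (A[0] * B[0] + A[1] * B[1] + A[2] * B[2]
            - A[3] * B[4] - A[4] * B[3])

def sphere(c, r):
    C = embed(c)
    return C[:4] + (C[4] - 0.5 * r * r,)

def contract_with_wedge(X, s1, s2):
    """X . (s1 ^ s2) = (X . s1) s2 - (X . s2) s1, returned as a vector."""
    a, b = inner(X, s1), inner(X, s2)
    return tuple(a * q - b * p for p, q in zip(s1, s2))

# Hypothetical spheres whose intersection is the unit circle in the plane z = 0.
s1 = sphere((0.0, 0.0, -1.0), math.sqrt(2.0))
s2 = sphere((0.0, 0.0, 1.0), math.sqrt(2.0))

on_circle = contract_with_wedge(embed((1.0, 0.0, 0.0)), s1, s2)
off_circle = contract_with_wedge(embed((0.0, 0.0, 0.0)), s1, s2)
```

The first result is the zero vector up to rounding because the point lies on both spheres; the second is nonzero because the origin lies on neither.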
It is important to emphasize that, in the general case, the intersection of spheres in $\mathbb{R}^3$ computed with (9) may be an element geometrically interpreted as an imaginary circle (when no finite point is shared by the spheres), a tangent plane or point circle with attitude (when the spheres share a single finite point), or a real circle. Due to molecular structure restrictions, however, only point circles and real circles are expected as a result of the intersections.
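Let us analyze geometrically the solution of the system
$$\|x_i - x_{i-1}\| = d_{i-1,i}, \quad \|x_i - x_{i-2}\| = d_{i-2,i}, \quad \underline{d}_{i-3,i} \le \|x_i - x_{i-3}\| \le \overline{d}_{i-3,i}, \quad \underline{d}_{j,i} \le \|x_i - x_j\| \le \overline{d}_{j,i}. \qquad (10)$$
From Fig. 3, we see that this solution is a subset of the union of two arcs of the circle defined by the first two equations, as explained for the polar coordinates approach. Let us denote by $\underline{P}_i^0 \overline{P}_i^0$ and $\underline{P}_i^1 \overline{P}_i^1$ the arcs obtained from the intersection of the spheres $S_{i-1,i}$, $S_{i-2,i}$ with the spherical shell $S_{i-3,i}$ (see Fig. 3). Their endpoints are given by the point pairs
$$S_{i-1,i} \wedge S_{i-2,i} \wedge \underline{S}_{i-3,i} \quad \text{and} \quad S_{i-1,i} \wedge S_{i-2,i} \wedge \overline{S}_{i-3,i},$$
with
$$S_{i-1,i} = X_{i-1} - \tfrac{1}{2} d_{i-1,i}^2 e_\infty, \quad S_{i-2,i} = X_{i-2} - \tfrac{1}{2} d_{i-2,i}^2 e_\infty, \quad \underline{S}_{i-3,i} = X_{i-3} - \tfrac{1}{2} \underline{d}_{i-3,i}^2 e_\infty, \quad \overline{S}_{i-3,i} = X_{i-3} - \tfrac{1}{2} \overline{d}_{i-3,i}^2 e_\infty,$$
where $X_{i-1}, X_{i-2}, X_{i-3}$ are the conformal representations of the points $x_{i-1}, x_{i-2}, x_{i-3}$.

Motivated by the geometry of the problem (see Fig. 4), we define two more spherical shells, with the same center $x_j$, and with interval radii given by the distances between $x_j$ and $\underline{P}_i^0, \overline{P}_i^0$ and the distances between $x_j$ and $\underline{P}_i^1, \overline{P}_i^1$:
$$S_{j,i}^0 = X_j - \tfrac{1}{2} r^2 e_\infty, \quad \underline{r}^0 \le r \le \overline{r}^0, \qquad (11)$$
with
$$\underline{r}^0 = \min\left\{ \sqrt{-2\, X_j \cdot \underline{P}_i^0}, \sqrt{-2\, X_j \cdot \overline{P}_i^0} \right\}, \quad \overline{r}^0 = \max\left\{ \sqrt{-2\, X_j \cdot \underline{P}_i^0}, \sqrt{-2\, X_j \cdot \overline{P}_i^0} \right\},$$
and
$$S_{j,i}^1 = X_j - \tfrac{1}{2} r^2 e_\infty, \quad \underline{r}^1 \le r \le \overline{r}^1, \qquad (12)$$
with
$$\underline{r}^1 = \min\left\{ \sqrt{-2\, X_j \cdot \underline{P}_i^1}, \sqrt{-2\, X_j \cdot \overline{P}_i^1} \right\}, \quad \overline{r}^1 = \max\left\{ \sqrt{-2\, X_j \cdot \underline{P}_i^1}, \sqrt{-2\, X_j \cdot \overline{P}_i^1} \right\}.$$
These new spherical shells capture the essence of the problem: the solution of the system (10) is given by (see Fig. 5)
$$\left( S_{i-1,i} \wedge S_{i-2,i} \wedge S_{j,i}^0 \right) \cup \left( S_{i-1,i} \wedge S_{i-2,i} \wedge S_{j,i}^1 \right), \qquad (13)$$
where the radius of each shell $S_{j,i}^k$, $k = 0, 1$, is restricted to the interval $[\underline{r}^k, \overline{r}^k] \cap [\underline{d}_{j,i}, \overline{d}_{j,i}]$. The first part of the union in (13) is a subset of the arc $\underline{P}_i^0 \overline{P}_i^0$, and the second is a subset of the arc $\underline{P}_i^1 \overline{P}_i^1$ (see Fig. 5).
If one of these intervals is empty, we simply remove it from the calculation (see the example in the next section).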
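The interval comparison described above can be sketched in a few lines. The code assumes that the arc endpoints are already known as conformal points, that the distance to $x_j$ varies monotonically along each arc, and that vectors are stored as coefficients on $(e_1, e_2, e_3, e_0, e_\infty)$; the coordinates, the interval for $d_{j,i}$, and every function name are hypothetical values chosen for illustration.

```python
import math

def embed(x):
    """Conformal point as coefficients on (e1, e2, e3, e0, e_inf)."""
    return (x[0], x[1], x[2], 1.0, 0.5 * sum(c * c for c in x))

def inner(A, B):
    """Inner product with e0.e_inf = -1 and e0.e0 = e_inf.e_inf = 0."""
    return (A[0] * B[0] + A[1] * B[1] + A[2] * B[2]
            - A[3] * B[4] - A[4] * B[3])

def distance(X, Y):
    """Euclidean distance recovered from X . Y = -0.5 * ||x - y||^2."""
    return math.sqrt(max(0.0, -2.0 * inner(X, Y)))

def shell_radii(Xj, P_low, P_up):
    """Smallest and largest distance from x_j to the two endpoints of an arc."""
    a, b = distance(Xj, P_low), distance(Xj, P_up)
    return min(a, b), max(a, b)

def intersect(i1, i2):
    """Intersection of two closed intervals, or None if it is empty."""
    lo, hi = max(i1[0], i2[0]), min(i1[1], i2[1])
    return (lo, hi) if lo <= hi else None

# Hypothetical arcs on the unit circle in the plane z = 0 (angles 60 to 90
# degrees and their mirror images), with x_j on the y axis.
h = math.sqrt(3.0) / 2.0
arcs = {0: (embed((0.5, h, 0.0)), embed((0.0, 1.0, 0.0))),
        1: (embed((0.5, -h, 0.0)), embed((0.0, -1.0, 0.0)))}
Xj = embed((0.0, 2.0, 0.0))
d_j = (1.1, 2.0)            # hypothetical interval for d_{j,i}

reduced = {}
for k, (P_low, P_up) in arcs.items():
    r = intersect(shell_radii(Xj, P_low, P_up), d_j)
    if r is not None:       # an empty interval is dropped from the calculation
        reduced[k] = r
```

In this made-up configuration only the shell associated with the first arc survives, so all candidate positions lie on that arc and the second arc can be discarded without any sampling.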
The CGA approach has at least two advantages. The first is that the geometric interpretation of the problem to be solved, given by the system (10), can be described in the language of CGA, which gives a better "view" of the problem, in addition to solving it by just comparing distance values. The second advantage is the possibility of solving problems in higher dimensions, where sphere intersections are also involved [1].

Example
To illustrate the difference between the two approaches, let us consider a DMDGP instance defined by a graph $G = (V, E, d)$.
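The first four vertices can be fixed. Using the polar coordinates approach, we first calculate $p$, $\vec{u}$ and $\vec{v}$, which implies that the circle defined by the first two equations of the system can be described by $x(t)$, for $t \in [-\pi, \pi]$, and gives the solution of the interval constraint on $d_{i-3,i}$ as a union of two intervals in $t$. For the interval distance $d_{j,i}$, another parameterization is obtained, implying that the solution of the system is given by the intersection of the corresponding sets of parameter values.
Now, let us see how to solve the example using the CGA approach. From the intersections of spheres $S_{i-1,i} \wedge S_{i-2,i} \wedge \underline{S}_{i-3,i}$ and $S_{i-1,i} \wedge S_{i-2,i} \wedge \overline{S}_{i-3,i}$, with radii given by $d_{i-1,i}$, $d_{i-2,i}$, $\underline{d}_{i-3,i}$, $\overline{d}_{i-3,i}$, respectively, we obtain the arcs $\underline{P}_i^0 \overline{P}_i^0$ and $\underline{P}_i^1 \overline{P}_i^1$, defined by their endpoints. The radii of the spherical shells $S_{j,i}^0$ (11) and $S_{j,i}^1$ (12), centered at $X_j$, are obtained as the minimum and the maximum of $\sqrt{-2\, X_j \cdot \underline{P}_i^k}$ and $\sqrt{-2\, X_j \cdot \overline{P}_i^k}$, for $k = 0, 1$. Hence, the solution of the system (10) reduces to the part of (13) associated with $S_{j,i}^1$. The superscript 1 in $S_{j,i}^1$ indicates that the solution is a subset of the arc $\underline{P}_i^1 \overline{P}_i^1$. Doing the calculations, the interval associated with the interval distance is reduced accordingly.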

Conclusions
NMR experiments do not provide precise distances between atoms in a protein molecule, and dealing with interval distances is a major challenge for DMDGP solution methods [30].
Based on Conformal Geometric Algebra (CGA), we present a new approach that allows us to incorporate the geometry involved when uncertainties must be taken into account, in addition to simplifying the understanding of the problem.