Quantum algorithms for computing general discrete logarithms and orders with tradeoffs

In this paper, we generalize and bridge Shor’s groundbreaking works [13, 14] on computing orders and general discrete logarithms, our earlier works [3, 4, 5] on computing short discrete logarithms with tradeoffs and Seifert’s work [12] on computing orders with tradeoffs. In particular, we demonstrate how the idea of enabling tradeoffs may be extended to the case of computing general discrete logarithms. This yields a reduction by up to a factor of two in the number of group operations that need to be computed in each run of the quantum algorithm at the expense of having to run the algorithm multiple times. Combined with our earlier works [4, 5] this implies that the number of group operations that need to be computed in each run of the quantum algorithm equals the number of bits in the logarithm times a small constant factor that depends on the tradeoff factor. We comprehensively analyze the probability distribution induced by our quantum algorithm, describe how the algorithm may be simulated, and estimate the number of runs required to compute logarithms with a given minimum success probability for different tradeoff factors. Our algorithm does not require the group order to be known. If the order is unknown, it may be computed at no additional quantum cost from the same set of outputs as is used to compute the logarithm.


Introduction
As in [4, 7, 6], let G be a finite cyclic group of order r generated by g. The discrete logarithm problem is to compute d = log_g x given the group elements g and x = [d] g. In cryptographic applications, the group G is typically a subgroup of F*_p, for some prime p, or an elliptic curve group.
In the general discrete logarithm problem, 0 ≤ d < r, whereas in the short discrete logarithm problem d is smaller than r by some orders of magnitude.

Earlier works
In 1994, in a groundbreaking publication, Shor [22, 23] introduced polynomial time quantum algorithms for factoring integers and for computing general discrete logarithms in F*_p. Note that the latter algorithm may be trivially generalized to compute discrete logarithms in arbitrary finite cyclic groups, provided the group operation can be implemented efficiently on the quantum computer. Ekerå [4] initiated a line of research in 2016 by introducing a modified version of Shor's algorithm for computing discrete logarithms that more efficiently solves the short discrete logarithm problem. This work is of cryptographic significance, as the short discrete logarithm problem arises in many implementations of cryptographic schemes that are instantiated with safe-prime groups. A notable example is Diffie-Hellman key exchange [2] in TLS, IKE and NIST SP 800-56A [26, 25, 24].
In a follow-up work, Ekerå and Håstad [7] enabled tradeoffs in Ekerå's algorithm using ideas that directly parallel those of Seifert [21] in his work on enabling tradeoffs in Shor's order-finding algorithm, i.e. the quantum part of Shor's factoring algorithm. Ekerå and Håstad furthermore showed how the RSA integer factoring problem, which underpins the widely deployed RSA cryptosystem [18], may be reduced via [9] to a short discrete logarithm problem and attacked quantumly. This gives rise to a quantum algorithm that solves the RSA integer factoring problem more efficiently than Shor's original factoring algorithm when making tradeoffs.
Ekerå [6] subsequently refined the classical post-processing in [7] to render it more efficient. With this improved post-processing, the algorithm of Ekerå and Håstad was shown in [6] to outperform Shor's factoring algorithm when targeting RSA integers, irrespective of whether tradeoffs are made.
A key component of this result was the development of a classical simulator for the quantum algorithm for computing short discrete logarithms: for problem instances for which the solution is classically known, this simulator allows outputs to be generated that are representative of the outputs that would be produced by the quantum algorithm if executed on a quantum computer. This in turn allows the efficiency of the classical post-processing to be assessed experimentally.

Our contributions
We generalize and bridge our earlier works on computing short discrete logarithms with tradeoffs, Seifert's work on computing orders with tradeoffs and Shor's groundbreaking works on computing orders and general discrete logarithms. In particular, we enable tradeoffs when computing general discrete logarithms.
Compared to Shor's algorithm for computing general discrete logarithms, this yields a reduction by up to a factor of two in the number of group operations evaluated quantumly in each run, at the expense of having to perform multiple runs. Unlike Shor's algorithm, our algorithm does not require the group order to be known. It simultaneously computes both the order and the logarithm. This allows it to outperform Shor's original algorithms with respect to the number of group operations that need to be evaluated quantumly in some cases even when not making tradeoffs. One cryptographically relevant example of such a case is the computation of discrete logarithms in Schnorr groups of unknown order.
We analyze the probability distributions induced by our algorithm, and by Shor's and Seifert's order finding algorithms, describe how all of these algorithms may be simulated when the solution is known, and estimate the number of runs required for a given minimum success probability when making different tradeoffs.

On the cryptographic significance of this work
Virtually all currently widely deployed asymmetric cryptosystems are based on the intractability of the discrete logarithm problem or the integer factoring problem.
In this work, we further the understanding of how hard these two key problems are to solve quantumly when the problem instances are not on special forms. We hope that our results may prove useful when developing cost estimates for quantum attacks, and that they may inform decisions on when to mandate migration from the currently deployed asymmetric cryptosystems to post-quantum secure cryptosystems.

Further details and overview
Our algorithm for computing discrete logarithms consists of two algorithms:
• a quantum algorithm that, upon input of a generator g of order r, and an element x = [d] g where 0 ≤ d < r, outputs a pair (j, k), and
• a classical probabilistic post-processing algorithm that, upon input of a set of n pairs (j, k), produced by n runs of the quantum algorithm, computes d.
In addition to the above post-processing algorithm, we furthermore specify
• a classical probabilistic post-processing algorithm that, upon input of a set of n pairs (j, k), computes the order r.
Note that the same set of pairs may be used as input to both this and the above post-processing algorithm.
The quantum algorithm is identical to the algorithm in [7,6] for computing short discrete logarithms with tradeoffs. The key difference in this work is that we admit general discrete logarithms and comprehensively analyze the probability distribution that the algorithm induces for such logarithms. The post-processing algorithm for d is a tweaked version of the lattice-based algorithm in [6], whereas the algorithm for r is a natural generalization of the lattice-based algorithm in [6] first sketched in a pre-print of [7]. It is similar to the post-processing in [21].
The quantum algorithm is parameterized under a tradeoff factor s. This factor controls the tradeoff between the requirements that the algorithm imposes on the quantum computer, and the number of runs, n, required to attain a given minimum probability q of recovering d and r in the classical post-processing.
Following [6], we estimate n for a given problem instance, represented by d and r, and fixed s and q, by simulating the quantum algorithm. We first use the simulated output to heuristically estimate n, and then verify the estimate by executing the two post-processing algorithms with respect to simulated output.
The simulator is based on a high-resolution two-dimensional histogram of the probability distribution induced by the quantum algorithm. By sampling the histogram, we generate pairs (j, k) that very closely approximate the output that would be produced by the quantum algorithm if executed on a quantum computer.
To construct the histogram, we first derive a closed form expression that approximates the probability of the quantum algorithm yielding (j, k) as output, and an upper bound on the error in the approximation. We then integrate this expression and the error bound numerically in different regions of the plane.
Our simulations show that when not making tradeoffs, a single run suffices to compute d or r with ≥ 99% success probability. When making tradeoffs, slightly more than s runs are typically required to achieve a similar success probability. In appendix A we show that these results extend to order finding and factoring.
Note that the simulator requires d and r to be explicitly known: It cannot be used for problem instances represented by group elements g and x = [ d ] g.

Structure of this paper
The quantum algorithm is described in section 2. In section 3, we analyze the probability distribution it induces, and derive a closed form expression that approximates the probability of it yielding (j, k) as output. In sections 4 and 5, we describe how the high-resolution histogram is constructed by integrating the closed form expression, and how it is sampled to simulate the quantum algorithm.
In section 6, we describe the two post-processing algorithms for recovering d and r from a set of n pairs (j, k). In section 7, we use the simulator to estimate the number of runs n required to solve a given problem instance for d and r, with minimum success probability q, as a function of the tradeoff factor s.
We summarize past and new results, and discuss related applications, such as order finding and integer factoring, in sections 8 and 9, and in the appendices.

Notation
The below notation is used throughout this paper:
• u mod n denotes u reduced modulo n, constrained to 0 ≤ u mod n < n.
• ⌈u⌉, ⌊u⌋ and ⌊u⌉ denote u rounded up, down and to the closest integer, respectively.

Randomization
Given two group elements g and x = [d] g to be solved for d, the general discrete logarithm problem may be randomized as follows:
1. Select an integer t uniformly at random and compute x̃ = [d + t] g by composing x with [t] g.
2. Solve g and x̃ for d̃ ≡ d + t (mod r), and optionally for r.
3. Compute and return d = (d̃ − t) mod r.

Hence, we may assume without loss of generality that d is selected uniformly at random on 0 ≤ d < r in the analysis of the quantum algorithm. If r is known, t should be selected uniformly at random on 0 ≤ t < r, otherwise on 0 ≤ t < 2^(m+c), for c a sufficiently large integer constant for the selection of x̃ to be indistinguishable from a uniform selection from G. Solving for r in step (2) is only necessary if r is unknown and d must be on 0 ≤ d < r when returned.
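The randomization procedure can be sketched classically in a toy subgroup of F*_p, with a brute-force search standing in for the quantum algorithm and its post-processing. All parameters below are toy choices of our own:

```python
# Sketch of the randomization above in the order-r subgroup of Z_p^*,
# for p = 2r + 1 a safe prime (toy parameters, our own). A brute-force
# search stands in for the quantum algorithm and its post-processing.
import random

p, r = 2879, 1439   # p = 2r + 1; both prime
g = 4               # 4 is a quadratic residue mod p, so it has order r
d = 1000
x = pow(g, d, p)    # x = [d] g

# Step 1: re-randomize the instance by shifting the logarithm by t.
t = random.randrange(r)            # r known here; else sample on [0, 2^(m+c))
x_tilde = (x * pow(g, t, p)) % p   # x_tilde = [d + t] g

# Step 2: solve the randomized instance for d_tilde = (d + t) mod r.
d_tilde = next(e for e in range(r) if pow(g, e, p) == x_tilde)

# Step 3: undo the randomization to recover d.
assert (d_tilde - t) % r == d
```

Note that the shifted logarithm d̃ is uniform on 0 ≤ d̃ < r regardless of d, which is what justifies the uniformity assumption in the analysis.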

The quantum algorithm
In this section, we describe the quantum algorithm that, upon input of a generator g and an element x = [d] g, where 0 ≤ d < r, outputs a pair (j, k) and an element y.
As stated earlier, the algorithm is parameterized under a small integer constant s ≥ 1, referred to as the tradeoff factor, that controls the tradeoff between the number of runs required and the requirements imposed on the quantum computer.
1. Let m be the integer such that 2^(m−1) ≤ r < 2^m, let ℓ = ⌈m/s⌉, and initialize the first two registers to uniform superpositions of 2^(m+ℓ) values a and 2^ℓ values b, respectively.
2. Compute [a − bd] g to a third register, by composing [a] g with the inverse of [b] x.
3. Compute QFTs of size 2^(m+ℓ) and 2^ℓ of the first two registers.
4. Observe the system to obtain (j, k) and y = [e] g, where e = (a − bd) mod r.
The above steps may be interleaved, rather than executed sequentially, so as to allow the qubits in the first two registers to be recycled [8, 15, 16]. A single control qubit then suffices to implement the first two control registers. This is possible as the qubits in the control registers are not initially entangled; the registers are initialized to uniform superpositions of 2^(m+ℓ) and 2^ℓ values, respectively. In Shor's algorithm for computing general discrete logarithms, the two control registers are instead of length m qubits. Both registers are initialized to uniform superpositions of r values. This makes the single control qubit optimization less straightforward to apply, and the initial superpositions harder to induce. Apart from this difference, Shor's algorithm and our algorithm may be easily compared in terms of the difference in the total exponent length.
In practice, the exponentiation of group elements would typically be performed by computing a group operation controlled by each bit in the exponent. Hence, a total of 2m group operations are performed in Shor's algorithm, compared to m + 2ℓ = m + 2⌈m/s⌉ in our algorithm. As s increases, this count tends to m operations, providing an advantage over Shor's original algorithm by up to a factor of two at the expense of having to execute the algorithm multiple times. This reduction in the number of group operations translates into a corresponding reduction in the coherence time and circuit depth requirements of our quantum algorithm.
Note that our algorithm does not require r to be known. It suffices that the size of r is known, and that group operations and inverses may be efficiently computed. For comparison, Shor requires r to be known. This explains why Shor needs to perform only 2m operations, whilst we need 3m operations when not making tradeoffs. As we shall see, we do in fact compute both d and r simultaneously, whilst Shor computes d given r.
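The operation counts discussed above are easy to tabulate. A small helper of our own, comparing 2m for Shor's algorithm with m + 2⌈m/s⌉ for this algorithm:

```python
# Compare the number of controlled group operations per run: 2m for
# Shor's algorithm versus m + 2*ceil(m/s) here. When s = 1 this is 3m
# (the order is computed simultaneously); as s grows it tends to m.
from math import ceil

def operations(m, s):
    ell = ceil(m / s)
    return {"shor": 2 * m, "this_work": m + 2 * ell}

for s in (1, 2, 4, 8):
    print(s, operations(2048, s))
```

For m = 2048 this prints 6144 operations at s = 1, matching the 3m count stated above, and 2560 at s = 8.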
The probability of observing (j, k) and y

In step (4) in section 2, we obtain (j, k) and y = [e] g with probability

1/2^(2(m+2ℓ)) · | Σ_(a, b) exp( (2πi / 2^(m+ℓ)) (aj + 2^m bk) ) |²    (1)

where the sum is over all pairs (a, b), such that 0 ≤ a < 2^(m+ℓ) and 0 ≤ b < 2^ℓ, respecting the condition e ≡ a − bd (mod r). In this section, we seek a closed form error-bounded approximation to (1) summed over all y = [e] g ∈ G.
To this end, we first perform a variable substitution to obtain contiguous summation intervals. As a = e + bd + n_r r for n_r an integer, the index a is a function of b and n_r, where 0 ≤ a = e + bd + n_r r < 2^(m+ℓ), so

⌈−(e + bd)/r⌉ ≤ n_r < ⌈(2^(m+ℓ) − (e + bd))/r⌉.    (2)
Substituting e + bd + n_r r for a in (1) and adjusting the phase therefore yields (3). By introducing arguments α_d and α_r, where

α_d ≡ dj + 2^m k (mod 2^(m+ℓ))   and   α_r ≡ rj (mod 2^(m+ℓ)),

and corresponding angles θ_d = 2π α_d / 2^(m+ℓ) and θ_r = 2π α_r / 2^(m+ℓ), we may write (3) as a function of α_d and α_r, and e, as (4), or of θ_d and θ_r, and e, as ρ(θ_d, θ_r, e) in (5), with leading factor 1/2^(2(m+2ℓ)). This implies that the probability of observing the pair (j, k) and y = [e] g depends only on (α_d, α_r) and e, or equivalently on (θ_d, θ_r) and e. The probability is virtually independent of e in practice, as e can at most shift the endpoints of the summation interval in the inner sums in (4) and (5) by one step.
As was stated above, we seek a closed form approximation to ρ(θ_d, θ_r, e) summed over all r group elements y = [e] g ∈ G. Hereinafter, we denote this probability P(θ_d, θ_r), and we furthermore use angles and arguments interchangeably, depending on which representation best lends itself to analysis in each step of the process.
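For very small toy instances, the outcome probabilities can be computed by brute force directly from the amplitudes, which serves as a sanity check on the expressions above. A sketch under toy parameters of our own:

```python
# Brute-force computation of the observation probabilities for a toy
# instance: r = 7 (so m = 3), d = 3, s = 1 (so ell = 3). For each
# outcome (j, k) we accumulate one amplitude per e = (a - b*d) mod r
# and verify that the probabilities over all (j, k, e) sum to one.
from cmath import exp, pi
from math import ceil

r, d, s = 7, 3, 1
m = r.bit_length()               # 2^(m-1) <= r < 2^m
ell = ceil(m / s)
Na, Nb = 2 ** (m + ell), 2 ** ell  # control register sizes 2^(m+ell), 2^ell

total = 0.0
for j in range(Na):
    for k in range(Nb):
        amps = [0j] * r          # one amplitude per e = (a - b*d) mod r
        for b in range(Nb):
            for a in range(Na):
                amps[(a - b * d) % r] += \
                    exp(2j * pi * (a * j + 2 ** m * b * k) / Na)
        total += sum(abs(z) ** 2 for z in amps) / (Na * Nb) ** 2

print(round(total, 6))  # 1.0: the outcome probabilities sum to one
```

The normalization factor (Na · Nb)² = 2^(2(m+2ℓ)) matches the leading factor of the probability expression above.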

Preliminaries
To gain some intuition, we write ρ(θ_d, θ_r, e) on the form (6), and note that there are two obstacles to placing this expression on closed form: Firstly, the summation interval in the inner sum over n_r depends on the summation variable b of the outer sum. Secondly, the exponent of the summand in the outer sum over b contains a rounding operation that depends on b.
By using that ⌈(2^(m+ℓ) − (e + bd))/r⌉ − ⌈−(e + bd)/r⌉ ≈ 2^(m+ℓ)/r, we may remove the dependency between the inner and outer sums, and by using that ⌈−(e + bd)/r⌉ ≈ −(e + bd)/r, we may remove the rounding operation.
By making these two approximations, and by adjusting the phase, we may derive an approximation to ρ(θ_d, θ_r, e) that is independent of e, enabling us to sum ρ(θ_d, θ_r, e) over the r values of e, corresponding to the r group elements y = [e] g ∈ G, simply by multiplying by r. This yields (7), where we furthermore need to assume in (7) that θ_d − (d/r) θ_r ≠ 0 and θ_r ≠ 0. This closed form approximation captures the general characteristics of the probability distribution induced by the quantum algorithm. However, it is seemingly non-trivial to derive a good bound on the error in this approximation.
In what follows, we use techniques similar to those employed above to derive an error-bounded closed form approximation to ρ(θ d , θ r , e) such that the error is negligible in the regions of the plane where the probability mass is concentrated.
As was the case above, we will find that the error-bounded approximation of ρ(θ d , θ r , e) is independent of e, enabling us to approximate P (θ d , θ r ) simply by multiplying the closed form approximation to ρ(θ d , θ r , e) by r.

Constructive interference
Before we proceed to develop the closed form approximation, we note that for a fixed problem instance and fixed e, the sums in ρ(θ d , θ r , e) are over a constant number of unit vectors in the complex plane. For such sums, constructive interference arises when all vectors point in approximately the same direction.
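The effect is easy to see numerically. A small illustration, with parameters of our own, of how the magnitude of a sum of unit vectors e^(iθb) behaves as θ varies:

```python
# Constructive interference in a sum of N unit vectors exp(i*theta*b):
# the magnitude is close to N when theta is small, and the vectors
# cancel when theta is a nonzero multiple of 2*pi/N. Illustrative only.
from cmath import exp
from math import pi

N = 2 ** 10

def magnitude(theta):
    return abs(sum(exp(1j * theta * b) for b in range(N)))

print(round(magnitude(0.0)))           # 1024: all vectors aligned
print(round(magnitude(pi / N), 1))     # ~ 2N/pi: partial alignment
print(magnitude(2 * pi * 17 / N) < 1e-6)  # True: full cancellation
```

The probability mass hence concentrates where the relevant angles are close to zero, as discussed next.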
In regions of the plane where θ_r and θ_d − (d/r) θ_r are both small, we hence expect constructive interference to arise. The probability mass is expected to concentrate in regions where constructive interference arises, and where the concentration of pairs (θ_d, θ_r) yielded by the integer pairs (j, k) is high.
In what follows, we therefore seek to derive a closed form approximation to ρ(θ_d, θ_r, e), and an associated bound on the error in the approximation, such that the error is small when θ_r and θ_d − (d/r) θ_r are small.

Fig. 1: The lattice L_(a,b) for σ = 2. All red filled points are in R. The region A and its translated replicas are drawn as dashed rectangles. All blue outlined points are in A or in one of its replicas. The gray triangles outline the points that are in A or one of its replicas, but not in R, and vice versa.

Closed form approximation with error bounds
To derive a closed form approximation to ρ(θ_d, θ_r, e), we first observe that the sums in the expression for ρ(θ_d, θ_r, e) may be regarded as sums over the points in a region R in a lattice L_(a,b), as is illustrated in Fig. 1. Note that this figure also contains other elements to which we shall return as the analysis progresses.

Proof. The points in R are given by (a, b) = b(d, 1) + n_r (r, 0), for 0 ≤ b < 2^ℓ and n_r on (2), so that 0 ≤ a = e + bd + n_r r < 2^(m+ℓ), and so the claim follows.

In what follows, we derive a closed form approximation to ρ(θ_d, θ_r, e) = S_R, and an associated error bound, in three steps.

Preliminaries
Before proceeding as outlined above, we first introduce some preliminary claims.
Proof. First verify the stated identity, in which the overlines denote complex conjugates. The claim then follows.

Bounding |s_R|
Before proceeding to the first approximation step, we furthermore bound |s_R| in this section, as this bound is needed in the following analysis.
Proof. By Claim 3.1, the sum may be expanded so that the outer sum over b is over 2^ℓ values, and the inner sum over n_r is over at most 2^(ℓ+1) values by Claim 3.4 below. As s_R is a sum of at most 2^(2ℓ+1) complex unit vectors, it follows that |s_R| ≤ 2^(2ℓ+1), and so the lemma follows.

Approximating S_R by S_A T_A
In the first approximation step, we approximate S_R by summing over the points in a small region A in R, and then replicating and translating the points in A, and the associated sum over these points, so as to approximately cover R, see Fig. 1.
Claim 3.5. Proof. The points in A are given by (a, b) = b(d, 1) + n_r (r, 0), for 0 ≤ b < 2^σ and n_r on (2), so that 0 ≤ a = e + bd + n_r r < 2^(m+ℓ), in analogy with the analysis in section 3, but with b on 0 ≤ b < 2^σ as opposed to 0 ≤ b < 2^ℓ, and so the claim follows.
To replicate and translate the points in A so as to approximately cover R, we furthermore introduce t_A and T_A, as defined below. The error when approximating S_R by S_A T_A may now be bounded as follows: Proof. The exponential sum t_A replicates and translates the partial sum over A so as to approximately cover R, as is illustrated in Fig. 1. Every time the region is replicated, it is translated by e^(i(θ_d 2^σ + θ_r ⌈−2^σ d/r⌉)). This exponential function may be easily seen to correspond to a vector in L_(a,b). The error that arises when s_R is approximated by s_A t_A is hence due to points that are in R but excluded from the sum, and conversely to points not in R that are erroneously included in the sum. Hereinafter, these points will be referred to as the erroneous points.
The erroneous points fall within the two gray triangles in Fig. 1. Both triangles are of horizontal length 2^ℓ and vertical side length 2^(ℓ−σ) (2^σ d mod r), as the region A is replicated and translated 2^(ℓ−σ) times in total, and as it is shifted horizontally by 2^σ and vertically by 2^σ d mod r every time it is translated.
To upper-bound the number of lattice points in each triangle, note that the lattice points are on 2^ℓ vertical lines, evenly separated horizontally by a distance of one. The points on each vertical line are evenly separated vertically by a distance of r, with varying starting positions on each line. For h(b) = 2^(ℓ−σ) (2^σ d mod r) (b/2^ℓ) the height of each triangle at b, at most N(b) ≤ h(b)/r + 1 lattice points are then on the vertical line that cuts through the triangle at b, as may be seen by maximizing over all possible starting points. By summing N(b) over all 2^ℓ lines, we thus obtain an upper bound of 2^(2ℓ−σ−1) + 2^ℓ ≤ 2^(2ℓ−σ) on the number of points in each triangle, where we have used that 2^(2ℓ−σ−1) ≥ 2^ℓ as σ is an integer on 0 < σ < ℓ. As there are two triangles, the total number of erroneous points is upper-bounded by 2 · 2^(2ℓ−σ) = 2^(2ℓ−σ+1). Each erroneous point corresponds to a unit vector in the complex sum s_R − s_A t_A, which implies |s_R − s_A t_A| ≤ 2^(2ℓ−σ+1), and so the lemma follows.
Proof. By Claim 3.2, the stated bound holds, and so the lemma follows.
As t_A is a geometric series, T_A = |t_A|² may be placed on closed form. It remains to derive a closed form approximation to S_A in two more steps.
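For reference, the closed form in question is the standard one for the square norm of a geometric series of unit vectors. Writing φ for the per-translation angle θ_d 2^σ + θ_r ⌈−2^σ d/r⌉, and assuming t_A has 2^(ℓ−σ) terms (consistent with the bound T_A ≤ 2^(2(ℓ−σ)) used in the proof of Theorem 3.1) and φ ≢ 0 (mod 2π), the identity reads:

```latex
T_A = |t_A|^2
    = \left| \sum_{t = 0}^{2^{\ell - \sigma} - 1} e^{i \phi t} \right|^2
    = \frac{\sin^2\!\left( 2^{\ell - \sigma} \phi / 2 \right)}{\sin^2\!\left( \phi / 2 \right)}
```

This is a reconstruction of the standard Dirichlet-kernel identity, not a quotation of the paper's equation.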

Approximating S_A by S′_A
In the second approximation step, we derive a closed form approximation to S_A, by first approximating S_A by the product S′_A of two sums, such that the trailing sum may be placed on closed form directly, and such that the leading sum may be placed on closed form by means of a third approximation step.
Lemma 3.4. The error when approximating s_A by s′_A is bounded by |s_A − s′_A| ≤ 2^σ. Proof. As s_A and s′_A are sums of complex unit vectors, and as the sums differ by at most 2^σ vectors, as may be seen by comparing the summation intervals using Claim 3.4, it follows that |s_A − s′_A| ≤ 2^σ, and so the lemma follows.
Proof. In the expression for s′_A in Definition 3.7, the sum over b assumes 2^σ values, and the sum over n_r assumes at most 2^(ℓ+1) values, as the order r ≥ 2^(m−1).
As s′_A is a sum of at most 2^(ℓ+σ+1) complex unit vectors, it follows that |s′_A| ≤ 2^(ℓ+σ+1), and so the lemma follows.
Lemma 3.6. The error when approximating S_A by S′_A is upper-bounded as stated, since |s_A − s′_A| ≤ 2^σ by Lemma 3.4 and |s′_A| ≤ 2^(ℓ+σ+1) by Lemma 3.5.
From the above, and Definitions 3.5 and 3.7, the stated bound holds, and so the lemma follows.
The trailing sum in S′_A is the square norm of a geometric series. Hence, it may be trivially placed on closed form. Due to the rounding operation in the exponent, this approach is not valid for the leading sum; we need a third approximation step.

Approximating S′_A by S″_A
For θ_d and θ_r such that the angles involved are small, all 2^σ terms in the leading sum are approximately one. In the third and final step of the approximation, we bound the error when simply approximating all terms in the leading sum by one.
Lemma 3.7. The difference between s′_A and s″_A is upper-bounded as stated. Proof. First observe the identity below. By using Claim 3.3 and the triangle inequality, it follows that the stated bound holds, where we use that ⌊−x⌋ = −⌈x⌉ and that ⌈(e + bd)/r⌉ ≤ b. To verify the latter claim, note that f_1 = e/r ∈ [0, 1). By combining the above results, we now have the stated bound, and so the lemma follows.
Lemma 3.8. The error when approximating S′_A by S″_A is upper-bounded as stated. Proof. By Claim 3.2, combined with the above and Definitions 3.7 and 3.8, the stated bound holds, and so the lemma follows.
This yields an approximation S″_A to S_A that may be placed on closed form.

Main approximability result
By combining the above results, the main approximability result follows: Theorem 3.1. The probability P(θ_d, θ_r) of observing a specific pair (j, k) with angle pair (θ_d, θ_r), summed over all y ∈ G, may be approximated by P̃(θ_d, θ_r) = r S″_A T_A, assuming θ_d 2^σ + θ_r ⌈−2^σ d/r⌉ ≠ 0 and θ_r ≠ 0 when placing the expression on closed form. The approximation error is bounded as follows: Proof. The probability ρ(θ_d, θ_r, e) of observing a specific pair (j, k), with angle pair (θ_d, θ_r), and some group element y = [e] g ∈ G, is S_R by Claim 3.1.
The error when approximating S_R by S_A T_A is bounded above, as are the errors in the two subsequent approximation steps. None of these three error terms, nor the expression for S″_A T_A, depends on e.
Hence, we may sum over all r elements y = [e] g ∈ G by multiplying by r. It therefore follows that P̃(θ_d, θ_r) = r S″_A T_A is an approximation to P(θ_d, θ_r), and that the error that arises in this approximation is bounded as stated, where we use that r < 2^m, and that T_A ≤ 2^(2(ℓ−σ)) as it is the square norm of a sum of 2^(ℓ−σ) unit vectors by Definition 3.6, and so the theorem follows.
In appendix C we demonstrate the soundness of this approximation.

The distribution of pairs (α_d, α_r)
In this section, we identify and count all pairs (j, k) that yield a given (α_d, α_r), and analyze the distribution and density of pairs (α_d, α_r) in the plane.
Definition 4.1. An argument pair (α_d, α_r) is said to be admissible if there exists an integer pair (j, k), for j on 0 ≤ j < 2^(m+ℓ) and k on 0 ≤ k < 2^ℓ, such that α_d ≡ dj + 2^m k and α_r ≡ rj (mod 2^(m+ℓ)). Definition 4.2. Let κ_d denote the greatest integer such that 2^(κ_d) divides d, and let κ_r denote the greatest integer such that 2^(κ_r) divides r.
Lemma 4.1. Let L_α be the lattice generated by the rows of the stated matrix. There are 2^(m+2ℓ−κ_r+γ) distinct admissible argument pairs. Each admissible argument pair occurs with multiplicity 2^(κ_r−γ). Proof. As α_r ≡ rj (mod 2^(m+ℓ)), the set of integers j that yield α_r is given by (8), for t_r an integer on 0 ≤ t_r < 2^(κ_r). As α_d ≡ dj + 2^m k (mod 2^(m+ℓ)), we need a corresponding relation to hold, for k an integer on 0 ≤ k < 2^ℓ, to ensure compatibility. As 2^(m−γ) is the largest power of two to divide both 2^m and 2^(m+ℓ−κ_r+κ_d), by the definition of γ, the congruence relation α_d ≡ (α_r/2^(κ_r)) d (r/2^(κ_r))^(−1) (mod 2^(m−γ)) must hold.
As t_r and k run through all pairwise combinations, the set of 2^(ℓ+κ_r) arguments α_d generated by (8) is equal to that generated by (9), as t_γ, or equivalently t′_γ, runs through all integers on 0 ≤ t_γ, t′_γ < 2^(ℓ+κ_r).
To go from (8) to (9), first note that B runs through all values in [2^m, 2^(m+ℓ)). If γ = 0, term A introduces multiplicity by repeating the sequence generated by B with various offsets. These offsets are of no significance to this analysis, as we only account for which values occur in the set and with what multiplicity.
If γ > 0, term A runs through all values in [2^(m−γ), 2^(m−γ+κ_r)). As κ_r ≥ γ when γ > 0, term A runs through all values in the subrange [2^(m−γ), 2^m). When A assumes values greater than or equal to 2^m, it introduces multiplicity by repeating the sequence of all values on [2^(m−γ), 2^(m+ℓ)) generated by A and B with various offsets.
In Fig. 2, the distribution of arguments in the region of the plane where −2^(m+ℓ−1) ≤ α_d, α_r < 2^(m+ℓ−1) is depicted for various combinations of parameters.
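The counting argument of Lemma 4.1 can be checked exhaustively for small instances by tallying the argument pairs induced by all (j, k). A sketch with toy parameters of our own (r = 12, so κ_r = 2, and γ = 0 here):

```python
# Tally the argument pairs (alpha_d, alpha_r) induced by all (j, k) for
# a toy instance, and check that each admissible pair occurs with the
# same multiplicity 2^(kappa_r - gamma). Toy parameters are our own;
# alpha_d = dj + 2^m k and alpha_r = rj, both reduced mod 2^(m+ell).
from collections import Counter

r, d = 12, 5            # kappa_r = 2 since 2^2 divides 12; kappa_d = 0
m, ell = 4, 4           # 2^(m-1) <= r < 2^m; ell = m (tradeoff factor s = 1)
M = 2 ** (m + ell)

pairs = Counter(((d * j + 2 ** m * k) % M, (r * j) % M)
                for j in range(M) for k in range(2 ** ell))

print(len(pairs))          # 1024 = 2^(m + 2*ell - kappa_r + gamma) distinct pairs
print(set(pairs.values())) # every pair occurs with multiplicity 4 = 2^(kappa_r - gamma)
```

The distinct count times the multiplicity recovers the 2^(m+2ℓ) pairs (j, k), as the lemma requires.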

Pairs
In this section, we identify all pairs (j, k) that yield a given admissible pair (α_d, α_r).
Lemma 4.2. The set of integer pairs (j, k), for j on 0 ≤ j < 2^(m+ℓ) and k on 0 ≤ k < 2^ℓ, that yield the admissible argument pair (α_d, α_r) is given by the stated expression, as t_r runs through all integer multiples of 2^γ on 0 ≤ t_r < 2^(κ_r).
Proof. As α_r ≡ rj (mod 2^(m+ℓ)), solving for j yields the above, for t_r an integer on 0 ≤ t_r < 2^(κ_r). As α_d ≡ dj + 2^m k (mod 2^(m+ℓ)), for compatibility 2^m must divide 2^(m+ℓ−κ_r) d t_r for all t_r ≠ 0. As 2^(m+ℓ−κ_r+κ_d) is the greatest power of two to divide 2^(m+ℓ−κ_r) d, it follows that t_r must be a multiple of 2^γ, and so the lemma follows.

The density of pairs (α d , α r )
In this section we analyze the density of pairs (α d , α r ) in the argument plane.
Claim 4.1. The density of admissible argument pairs in the region of the plane where −2^(m+ℓ−1) ≤ α_d, α_r < 2^(m+ℓ−1) is 2^(−m) when accounting for multiplicity.
To construct the histogram for the probability distribution, the argument plane is divided into small rectangular subregions. The below lemma bounds the error when approximating the density in such subregions by 2 −m .
Lemma 4.3. Let D be the density of admissible argument pairs, when accounting for multiplicity, in a rectangle R of area A and circumference C in the region where −2^(m+ℓ−1) ≤ α_d, α_r < 2^(m+ℓ−1) of the argument plane. Then D may be bounded as stated, for λ_1 the norm of the shortest non-zero vector w_1 ∈ L_α, and λ_2 the norm of the shortest non-zero vector w_2 ∈ L_α that is linearly independent of w_1.
Proof. By Lemma 4.1, the admissible argument pairs (α_d, α_r) are vectors in L_α in the region of the argument plane where −2^(m+ℓ−1) ≤ α_d, α_r < 2^(m+ℓ−1). Each admissible argument pair occurs with multiplicity 2^(κ_r−γ). The fundamental parallelogram in L_α contains a single lattice vector. It is spanned by w_1 and w_2, and has area det L_α. To bound the number of argument pairs (α_d, α_r) ∈ R, we lower- and upper-bound the number of fundamental parallelograms that can at most fit into R, as described below, paying particular attention to the border areas: To upper-bound the number of vectors in R, we extend each side of R by 2λ_2 length units, to ensure that any parallelogram that is only partly in R is included in the count, and divide the area of the resulting rectangle by the area of the fundamental parallelogram. This yields (A + 2Cλ_2 + 4(2λ_2)²) / det L_α. Conversely, to lower-bound the number of vectors in R, we retract each side of R by 2λ_2 length units, to ensure that all parallelograms that are only partly in the rectangle are excluded from the count, and divide the area of the resulting rectangle by det L_α. By combining the upper and lower bounds, dividing by the area A of R, and multiplying by 2^(κ_r−γ) to account for multiplicity, the lemma follows. For known d and r, Lemma 4.3 above provides a bound on the error when approximating the density in a rectangle in L_α by 2^(−m), as λ_2 may then be computed. To bound the error for general problem instances, and when d and r are unknown, we introduce the following less tight lemma: Lemma 4.4. Let D be the density of admissible argument pairs (α_d, α_r), when accounting for multiplicity, in a rectangle of side lengths l_d and l_r in the α_d and α_r directions, respectively, in the region where −2^(m+ℓ−1) ≤ α_d, α_r < 2^(m+ℓ−1) of the argument plane. Then the stated bound on D holds. Proof. By Lemma 4.1, the admissible argument pairs are vectors in L_α.
The vectors in L_α are on horizontal lines (for fixed α_r) evenly separated by a vertical distance of 2^(κ_r). The number of such lines that intersect the rectangle is upper-bounded by ⌊l_r/2^(κ_r)⌋ + 1 ≤ l_r/2^(κ_r) + 1 and lower-bounded by ⌊l_r/2^(κ_r)⌋ ≥ l_r/2^(κ_r) − 1, as may be seen by positioning the rectangle to maximize or minimize the number of lines that intersect the rectangle.
On each line, the vectors in L_α are evenly spaced by a distance of 2^(m−γ), with varying starting positions. The number of vectors in L_α that fall within the rectangle on each line is upper-bounded by ⌊l_d/2^(m−γ)⌋ + 1 ≤ l_d/2^(m−γ) + 1 and lower-bounded by ⌊l_d/2^(m−γ)⌋ ≥ l_d/2^(m−γ) − 1, when not accounting for multiplicity, as may be seen by positioning the line to maximize or minimize the number of vectors that fall within the rectangle.
Hence the number of lattice vectors in the rectangle, when accounting for multiplicity, is upper-bounded by 2^(κ_r−γ) (⌊l_r/2^(κ_r)⌋ + 1)(⌊l_d/2^(m−γ)⌋ + 1) and lower-bounded by 2^(κ_r−γ) ⌊l_r/2^(κ_r)⌋ ⌊l_d/2^(m−γ)⌋, as each vector corresponds to a pair that occurs with multiplicity 2^(κ_r−γ). By combining the above bounds, and dividing by the area l_d l_r of the rectangle, the lemma follows.
For unknown d and r, the above lemma provides an error bound, assuming only some bounds on the parameters κ r and γ. Asymptotically, the error in the approximation tends to zero as the side lengths of the rectangle tend to infinity.
For rectangular subregions of specific dimensions, it may furthermore be shown that the error is zero, as is demonstrated in the following lemma: Lemma 4.5. The density of admissible argument pairs in a rectangle of side lengths positive integer multiples of 2^(m−γ) and 2^(m−γ+κ_r) in the α_d and α_r directions, respectively, in the region where −2^(m+ℓ−1) ≤ α_d, α_r < 2^(m+ℓ−1) of the argument plane, is 2^(−m) when accounting for multiplicity.
Proof. By Lemma 4.1, the admissible argument pairs are vectors in L_α in the region of the argument plane where −2^(m+ℓ−1) ≤ α_d, α_r < 2^(m+ℓ−1). From the definition of L_α in Lemma 4.1, it follows that the lattice is cyclic with period 2^(m−γ) in α_d and 2^(m−γ+κ_r) in α_r. This is illustrated in Fig. 2, where rectangular regions of these dimensions are highlighted in gray. The highlighted regions all extend from the origin in Fig. 2, but the starting point may of course be arbitrarily selected.

This implies that the lattice L_α may be generated by replicating and translating any rectangle of side lengths positive integer multiples of 2^(m−γ) and 2^(m−γ+κ_r) in α_d and α_r, respectively, see Fig. 2, throughout the plane. The same holds if the rectangle is replicated and translated cyclically throughout the region of the plane where −2^(m+ℓ−1) ≤ α_d, α_r < 2^(m+ℓ−1). The number of rectangles that fit in the region when replicated and translated cyclically is 2^(2(m+ℓ))/2^(2(m−γ)+κ_r) = 2^(2(ℓ+γ)−κ_r), as the area of the region is 2^(2(m+ℓ)) and the area of the rectangle is 2^(2(m−γ)+κ_r). The total number of lattice vectors in the region, when accounting for multiplicity, is 2^(m+2ℓ), so each rectangle contains 2^(m+2ℓ)/2^(2(ℓ+γ)−κ_r) = 2^(m−2γ+κ_r) vectors when accounting for multiplicity.

By dividing by the rectangle area, we see that the density of pairs in each rectangle is 2^(m−2γ+κ_r)/2^(2(m−γ)+κ_r) = 2^(−m), and so the lemma follows.
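As a sanity check, the periodicity argument used in the proof is easy to exercise numerically. The following sketch uses illustrative toy parameters m, γ and κ_r, and an arbitrary per-line offset shift c standing in for the detailed structure of L_α (none of these values are from the paper); it counts argument pairs, with multiplicity, in rectangles of the stated side lengths and verifies that the density is exactly 2^(−m):

```python
from fractions import Fraction

# Toy parameters (illustrative only); the model requires kappa_r >= gamma.
m, gamma, kappa_r = 8, 1, 2
c = 5  # arbitrary shift of the point offsets between consecutive lines

dx = 2 ** (m - gamma)          # horizontal spacing of points on each line
dy = 2 ** kappa_r              # vertical spacing between horizontal lines
mult = 2 ** (kappa_r - gamma)  # multiplicity of each admissible pair

def count_in_rect(x0, y0, w, h):
    """Count points of the toy lattice, with multiplicity, in [x0, x0+w) x [y0, y0+h)."""
    total = 0
    b = -(-y0 // dy)  # smallest line index b with b * dy >= y0
    while b * dy < y0 + h:
        x_off = (b * c) % dx              # points on line b sit at x ≡ b*c (mod dx)
        first = x0 + (x_off - x0) % dx    # first such x at or after x0
        if first < x0 + w:
            total += (x0 + w - 1 - first) // dx + 1
        b += 1
    return total * mult

# Rectangle with side lengths 2^(m - gamma) and 2^(m - gamma + kappa_r):
w, h = 2 ** (m - gamma), 2 ** (m - gamma + kappa_r)
density = Fraction(count_in_rect(17, -33, w, h), w * h)  # equals 2^(-m) exactly
```

The density is independent of where the rectangle is placed, matching the lemma.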

Simulating the quantum algorithm
In close analogy with [6], we now proceed to construct a high-resolution histogram for the probability distribution induced by the quantum algorithm, for given d and r, and to sample it to simulate the quantum algorithm.

Constructing the histogram
Except for the fact that the probability distribution is two-dimensional, and that we need to account for the closed form expression being an approximation, we exactly follow [6] to construct the high-resolution histogram: We subdivide the argument plane into regions and subregions, and integrate the closed form probability approximation and the associated error bound numerically in each subregion.
Then, we subdivide each region into 2^(2ν) rectangular subregions, each identified by an integer pair (ξ_d, ξ_r), for ν a resolution parameter adaptively selected as a function of the probability mass and variance in each region.

For each subregion, we compute the approximate probability mass contained within the subregion, and an associated error bound, by applying Simpson's rule in two dimensions, followed by Richardson extrapolation to cancel the linear error term, and division by 2^m to account for the density of pairs.

Simpson's rule is hence applied 2^(2ν)(1 + 2²) times in each region. Each application requires the approximate probability and associated error bound to be computed in up to nine points, for which purpose we use the closed form expressions in Theorem 3.1, with σ adaptively selected to suppress the bounded error.
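The integration step may be illustrated as follows. This is a minimal sketch of two-dimensional Simpson's rule with one level of Richardson extrapolation on a generic integrand; it is not the paper's implementation, and the actual integrand P and the error bookkeeping are omitted:

```python
import math

def simpson_2d(f, x0, x1, y0, y1):
    """Single-panel 2D Simpson's rule on [x0, x1] x [y0, y1] (nine points)."""
    xs = (x0, (x0 + x1) / 2, x1)
    ys = (y0, (y0 + y1) / 2, y1)
    w = (1, 4, 1)  # Simpson weights per dimension
    total = sum(w[i] * w[j] * f(xs[i], ys[j]) for i in range(3) for j in range(3))
    return (x1 - x0) * (y1 - y0) * total / 36

def simpson_richardson_2d(f, x0, x1, y0, y1):
    """One application on the subregion plus four on its quarters, combined by
    Richardson extrapolation to cancel the leading error term."""
    coarse = simpson_2d(f, x0, x1, y0, y1)
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    fine = (simpson_2d(f, x0, xm, y0, ym) + simpson_2d(f, xm, x1, y0, ym)
            + simpson_2d(f, x0, xm, ym, y1) + simpson_2d(f, xm, x1, ym, y1))
    return fine + (fine - coarse) / 15
```

The 1 + 2² = 5 applications per subregion match the operation count stated above.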
The optimal σ may be found by exhaustive search. A computationally more efficient method for selecting σ is to use the heuristic in appendix C.5.3. We use the heuristic in all cases except when s is large in relation to m, causing the error in the closed-form approximation to be large. For such m and s, we accept an extra computational burden to obtain a slightly better σ and slightly smaller errors.
In order to save space when storing the histogram, we discard regions that capture insignificant shares of the probability mass. Note furthermore that for m and s such that the total error in the closed form approximation is large, the error may often be reduced at the expense of capturing a smaller fraction of the probability mass by simply discarding selected regions where the error is large. The errors we report in this paper are without accounting for such additional filtering.
Note that this method of constructing the histogram assumes κ_d and κ_r to be small in relation to m. Note also that it follows from section 4.2 that it is sound to approximate the density by 2^(−m) in the four regions of interest in the plane. For the m and s that we consider, the error in the density approximation is negligible.

Understanding the probability distribution
To illustrate the distribution that arises, a histogram is plotted in the signed logarithmic argument plane in Fig. 4 for m = 2048 and s = 30, and for d and r selected as explained in section 7.3. It captures approximately 99.99% of the probability mass. The total approximation error is less than 10^(−3).

The histogram plotted in Fig. 4 captures the general characteristics of the probability distribution. Varying d and r on the interval 2^(m−1) < d < r < 2^m, for d and r not divisible by large powers of two, in general only slightly affects the distribution. Scaling m and s has virtually no effect on the distribution.

The probability mass is located in the regions where (|α_d|, |α_r|) ~ (2^m, 2^m), whereas for random outputs the arguments would be of size ~2^(m+ℓ). Hence, a single run yields ~ℓ ~ m/s bits of information on each of d and r.
The distribution is symmetric, in that the top right and lower left quadrants are mirrored, as are the top left and lower right quadrants. It hence suffices to compute only two quadrants to construct the histogram. To see why this is, note that flipping the signs of both arguments in the expression for P(θ_d, θ_r) in Theorem 3.1 has no effect. Flipping the sign of only one argument, on the other hand, may lead to cancellation, or lack of cancellation, in the angle θ_d 2^σ − θ_r 2^σ d/r. This explains the concentration of probability mass in the top right and lower left quadrants, and in the tail along the diagonal in Fig. 4, where this angle is small.

The marginal distribution along the α_d axis is virtually identical to the probability distribution induced by d when regarded as a short discrete logarithm, see [6] and Fig. 5 for comparison. Analogously, the marginal distribution along the α_r axis in Fig. 4 is virtually identical to the distribution induced by r when performing order finding, see appendix A and Fig. 6 for comparison. In appendix D, we show this analytically by summing P(θ_d, θ_r) over all admissible θ_d.
This implies that the lattice-based post-processing algorithm introduced in [6] may be used to solve sets of pairs (j, k) for both short and general d, with minor modifications, see section 6.1. An analogous lattice-based algorithm may be developed to solve sets of integers j for r, see section 6.2.

Sampling the probability distribution
Except for the fact that the probability distribution is two-dimensional, we exactly follow [6] to sample the distribution: To sample an argument pair (α_d, α_r), we first sample a subregion, and then sample (α_d, α_r) from this subregion.

To sample the subregion, we first order all subregions in the histogram by probability, and compute the cumulative probability up to and including each subregion in the resulting ordered sequence. Then, we sample a pivot uniformly at random from [0, 1), and return the first subregion in the ordered sequence for which the cumulative probability is greater than or equal to the pivot. Note that this procedure may fail: This occurs if the pivot is greater than the total cumulative probability.

To sample an argument pair from the subregion, we first sample a point (α̃_d, α̃_r) ∈ ℤ² uniformly at random from the subregion. Then, we map (α̃_d, α̃_r) to the closest admissible argument pair (α_d, α_r) ∈ L_α by reducing the basis for L_α given in Definition 4.3 and applying Babai's algorithm [1].

To sample an integer pair (j, k) from the distribution, we first sample (α_d, α_r) as described above, and then sample (j, k) uniformly at random from the set of all integer pairs yielding (α_d, α_r) using Lemma 4.2. More specifically, we first sample an integer t_r uniformly at random from the set of all admissible values for t_r, and then compute (j, k) from (α_d, α_r) and t_r as described in Lemma 4.2.
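The two-stage selection of a subregion can be sketched as follows, with the subregion probabilities as plain floats and the failure case returning None; the subsequent mapping to L_α via Babai's algorithm is omitted here:

```python
import bisect
import random

def build_sampler(probs):
    """Order subregions by probability (descending) and precompute cumulative sums."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    cum, running = [], 0.0
    for i in order:
        running += probs[i]
        cum.append(running)
    return order, cum

def sample_subregion(order, cum, pivot=None):
    """Return the first subregion (by original index) whose cumulative probability
    is greater than or equal to the pivot, or None if the pivot exceeds the total
    captured probability mass (the sampling operation fails)."""
    if pivot is None:
        pivot = random.random()
    idx = bisect.bisect_left(cum, pivot)
    return None if idx == len(cum) else order[idx]
```

Since the histogram captures slightly less than all of the probability mass, the failure branch occurs with small but non-zero probability, exactly as described above.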

The classical post-processing algorithms
In this section, we describe how d and r are classically recovered from a set {(j_1, k_1), …, (j_n, k_n)} of pairs produced by performing n independent runs.

Recovering d from a set of n pairs
To recover d, we exactly follow [6], and use the set of n pairs to form a vector v_{k,d} ∈ ℤ^D, where D = n + 1. For some constants m_1, …, m_n ∈ ℤ accounting for the modular reductions, the vector u_{j,d} = ({dj_1}_(2^(m+ℓ)), …, {dj_n}_(2^(m+ℓ)), d) ∈ L_j then lies close to v_{k,d}. To recover d, it hence suffices to find u_{j,d} by enumerating all vectors in L_j within a D-dimensional hypersphere of radius R_d centered on v_{k,d}. Its volume is V_D(R_d) = π^(D/2) R_d^D / Γ(D/2 + 1), where Γ is the Gamma function, whilst the fundamental parallelepiped in L_j, that by definition contains a single lattice vector, is of volume det L_j = 2^((m+ℓ)n). Heuristically, the hypersphere is hence expected to contain approximately v_d = V_D(R_d) / det L_j lattice vectors. The exact number depends on the placement of the hypersphere in ℤ^D, and on the shape of the fundamental parallelepiped in L_j.
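The heuristic vector count follows directly from these formulas. A minimal helper, with V_D(R) = π^(D/2) R^D / Γ(D/2 + 1) and det L_j = 2^((m+ℓ)n) as in the text:

```python
import math

def hypersphere_volume(D, R):
    """Volume V_D(R) = pi^(D/2) * R^D / Gamma(D/2 + 1) of a D-dimensional ball."""
    return math.pi ** (D / 2) * R ** D / math.gamma(D / 2 + 1)

def expected_lattice_vectors(R, m, ell, n):
    """Heuristic count v = V_D(R) / det L_j, with D = n + 1 and
    det L_j = 2^((m + ell) * n)."""
    return hypersphere_volume(n + 1, R) / 2.0 ** ((m + ell) * n)
```

As noted in the text, this is only a heuristic: the actual count depends on where the hypersphere sits relative to the lattice.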

Estimating the minimum n required to solve for d
The radius R_d depends on (j_i, k_i) via α_{d,i} for 1 ≤ i ≤ n. For fixed n and probability q_d, we exactly follow [6] and estimate the minimum radius R_d such that (11) holds, by sampling α_{d,i} from the probability distribution. For details on how the estimate is computed, see section 6.3. Equation (11) then implies a heuristic bound on the number of lattice vectors v_d that at most have to be enumerated to solve for d, and that holds with probability at least q_d.

Selecting n and solving for d
A simple strategy when solving for d is to select n as described in section 6.1.1 such that v_d is below a bound equal to the maximum number of vectors that it is computationally feasible to enumerate with probability q_d. This strategy minimizes n at the expense of performing potentially computationally expensive post-processing. Another strategy is to select n such that v_d < 2 with probability q_d, so that there is only one vector in the hypersphere by the heuristic. In theory, this enables us to find u_{j,d} with probability q_d by mapping v_{k,d} to the closest vector in L_j, without enumerating vectors in L_j.

In practice, however, the situation is a bit more complicated, as u_{j,r} = ({rj_1}_(2^(m+ℓ)), …, {rj_n}_(2^(m+ℓ)), r) ∈ L_j, and this vector is short in L_j by construction. To further complicate matters, u_{j,r}/z may be in L_j when r is composite, for z some factor of r, see section 6.2.1. To recover u_{j,d}, we therefore first map v_{k,d} to the closest vector in L_j, and then add or subtract small integer multiples of the shortest vector in the reduced basis for L_j to find u_{j,d}. This is efficient, except if r has very many small prime factors and n is close to one, in which case an additional classical post-processing step may be required, see section 6.2.5.
Note that this complication arises only for general discrete logarithms. It does not arise in [6] when post-processing short discrete logarithms, as the order then does not enter into the equation. Note furthermore that the fact that the order now does play a part may be leveraged in the post-processing, see the next sections.

Selecting n and solving for d by exhausting subsets
The greatest argument α_{d,i} essentially determines the bound on R_d, and hence on v_d. A plausible strategy is therefore to make n runs, but to independently post-process all subsets of n − t pairs from the resulting n pairs, for t a constant.

To select n when using this strategy, we specify a bound B on the number of vectors v_d that we accept to enumerate in each lattice of dimension n − t + 1, and follow section 6.1.1 to select the minimum n respecting this bound with probability at least q_d, including only the smallest n − t arguments α_{d,i} when bounding R_d.

With probability q_d, the post-processing then heuristically requires at most B lattice vectors to be enumerated in at most (n choose t) lattices of dimension n − t + 1. Note that t must be small, as the binomial coefficient grows rapidly in t.
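The subset strategy can be sketched as follows, with `solve` a hypothetical stand-in for the lattice-based post-processing of a single subset; the number of subsets processed is the binomial coefficient (n choose t):

```python
from itertools import combinations
from math import comb

def postprocess_subsets(pairs, t, solve):
    """Independently post-process every subset of n - t of the n pairs, returning
    the first successful result, or None if all subsets fail. `solve` stands in
    for the lattice-based post-processing applied to a single subset."""
    n = len(pairs)
    for subset in combinations(pairs, n - t):  # comb(n, t) subsets in total
        result = solve(subset)
        if result is not None:
            return result
    return None
```

Since comb(n, t) grows rapidly in t, the strategy is only practical for small t, as the text notes.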

Optimizations when r is known
Note that when r is known, the argument α_{r,i} = {rj_i}_(2^(m+ℓ)) is known for 1 ≤ i ≤ n, and α_{r,i} provides information on α_{d,i}, as the arguments are pairwise correlated.

When constructing subsets of n − t pairs from the n pairs (j_i, k_i), the pairs should be included in ascending order sorted by |α_{r,i}|. In general, pairs such that |α_{r,i}| exceeds some bound may be rejected, as large |α_{r,i}| identify erroneous runs.

Recovering r from a set of n pairs
To recover r, we instead use that u_{j,r} = ({rj_1}_(2^(m+ℓ)), …, {rj_n}_(2^(m+ℓ)), r) ∈ L_j is a short vector by construction. More specifically, we use that u_{j,r} is within a D-dimensional hypersphere in L_j of radius R_r centered at the origin. In close analogy with [6] and the previous section, we may recover u_{j,r}, and hence r, by enumerating all vectors in this hypersphere. Heuristically, we expect the hypersphere to contain v_r = V_D(R_r) / det L_j lattice vectors.
This generalization was hinted at in the pre-print of [7]. Furthermore, it is similar to the method employed by Seifert in [21], where he uses what he refers to as simultaneous Diophantine approximation techniques to generalize Shor's [22] continued fractions expansion-based post-processing to higher dimensions. In the case of Shor's original order finding algorithm, the fact that the problem of finding a continued fraction may be perceived as a lattice problem is observed in [11].
We prefer to describe the post-processing in terms of a shortest vector problem, as this gives us two lattice problems in the same lattice L j , and as we may re-use the above tools to estimate the number of runs n required to solve the problem.

Estimating the minimum n required to solve for r
The radius R_r depends on j_i via α_{r,i} for 1 ≤ i ≤ n. For fixed n and probability q_r, we proceed in analogy with [6] and estimate the minimum radius R_r such that (13) holds, by sampling α_{r,i} from the probability distribution. For details on how the estimate is computed, see section 6.3. Equation (13) then implies a heuristic bound on the number of lattice vectors v_r that at most have to be enumerated to solve for r, and that holds with probability at least q_r.

Selecting n and solving for r
One strategy when solving for r is to use the heuristic to select n such that v_r is below a bound equal to the maximum number of vectors that it is computationally feasible to enumerate, with probability q_r. This strategy minimizes n at the expense of performing potentially computationally expensive post-processing. Another strategy is to select n such that v_r < 2 with probability q_r, so that there is only one lattice vector in the hypersphere by the heuristic. In theory, this enables us to find u_{j,r} with probability q_r by computing the shortest non-zero vector.

In practice, the heuristic is good when r is prime, as is typically the case when computing discrete logarithms in cryptographic settings. If r is composite, the heuristic is still good, but it may be necessary to perform a small search to find r if r has one or more small prime factors, see section 6.2.3. If r has many small prime factors, and n is close to one, an additional classical post-processing step may be required to solve efficiently for r, as there may then exist an artificially short non-zero vector in L_j. This additional step is described in section 6.2.4.
A third strategy is to independently post-process subsets of the pairs output by the quantum computer, in analogy with the procedure described in section 6.1.4.

Handling composite r
Assume that r is composite. Let gcd(α_{r,1}, …, α_{r,n}, r) = 2^(κ_r) o for o odd, and let t be the greatest integer on [0, κ_r] for which u_{j,r}/(2^t o) ∈ L_j. Then u_{j,r}/(2^t o) and r/(2^t o) will likely be recovered in the post-processing instead of u_{j,r} and r.

For q an odd prime divisor of r, the probability of q also dividing α_{r,i} for all i ∈ [1, n] is approximately q^(−n). This implies that r may in general be recovered from r/(2^t o) by exhausting t and o, as the search space is expected to be small: It is only if r has very many small odd prime divisors, and if n is close to one, that problems may potentially arise. Such problematic cases may be handled efficiently by introducing an additional classical post-processing step, see the next section.
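The exhaustive recovery of r from r/(2^t o) can be sketched as follows, with `is_order` a hypothetical stand-in for a test of whether a candidate equals the order (e.g. via group exponentiations), and with o restricted to squarefree products of a few small odd primes for brevity:

```python
def recover_r(r_prime, is_order, max_t, small_primes):
    """Exhaust candidates r = 2^t * o * r_prime, with o a (here squarefree, for
    brevity) product of small odd primes. `is_order` stands in for a group-based
    test of whether a candidate is the sought order r."""
    candidates = set()
    for t in range(max_t + 1):
        for mask in range(1 << len(small_primes)):
            o = 1
            for i, q in enumerate(small_primes):
                if mask >> i & 1:
                    o *= q
            candidates.add((1 << t) * o * r_prime)
    for cand in sorted(candidates):
        if is_order(cand):
            return cand
    return None
```

For example, with r = 2² · 3 · 5 · 7 = 420 and the recovered value r′ = 7, exhausting t ≤ 3 and o over products of {3, 5} finds r. The search space only blows up when r has very many small odd prime divisors and n is close to one, which is the problematic case handled in the next section.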

Handling partially smooth r
Let P be the set of all primes ≤ cm, for c ≥ 1 some small constant, and let υ_q be the greatest integer such that q^(υ_q) < 2^m. Furthermore, let g̃ = [∏_{q ∈ P} q^(υ_q)] g, where bracket notation is used to denote generalized exponentiation. Computing g̃ requires at most 2m · #P ≤ 2cm² group operations to be evaluated classically, for #P the cardinality of P. It may hence be done efficiently.
As previously explained, when r is partially very smooth, the classical post-processing algorithm is likely to return r′ = r/(2^t o), where o may be large, but where all prime factors of o are small. Assume that all prime factors of 2^t o are ≤ cm. It must then be that [r′] g̃ ≡ 1, enabling us to quickly test whether r′ is on said form.

Once r′ = r/(2^t o) is found, it is easy to find r: For all f ∈ P, compute g̃_f = [r′ ∏_{q ∈ P, q ≠ f} q^(υ_q)] g and find the smallest non-negative integer e_f such that [f^(e_f)] g̃_f ≡ 1. Then 2^t o = ∏_{f ∈ P} f^(e_f), allowing r = 2^t o · r′ to be recovered. Computing g̃_f requires at most 2cm² group operations for each f ∈ P, for a total of at most 2cm² · #P ≤ 2c²m³ group operations. The recovery procedure is hence efficient.

Computing discrete logarithms when r is partially smooth
If r is partially very smooth, it may be hard to determine d, as there may then exist an artificially short vector u_{j,r}/(2^t o) ∈ L_j, where o is smooth. Note, however, that it is still possible to determine d mod r′, by reducing the last component of the vector u_{j,d} sought for in the classical post-processing algorithm modulo r′ = r/(2^t o).
Provided we classically solve the discrete logarithm problem in the residual subgroups of small prime power orders f^(e_f) dividing 2^t o, which can be done efficiently, the full logarithm d may then be found via the Chinese remainder theorem. This was originally observed by Pohlig and Hellman [17].
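The recombination step is ordinary CRT. A minimal sketch, with illustrative residues and moduli that are not values from the paper:

```python
from math import gcd

def crt(residues, moduli):
    """Combine x ≡ residues[i] (mod moduli[i]) for pairwise coprime moduli,
    returning the unique solution modulo the product of the moduli."""
    x, M = 0, 1
    for r_i, m_i in zip(residues, moduli):
        assert gcd(M, m_i) == 1
        inv = pow(M, -1, m_i)  # inverse of M modulo m_i (Python 3.8+)
        x = x + M * ((r_i - x) * inv % m_i)
        M *= m_i
    return x % M
```

Here d mod r′ would be combined with the logarithms in the residual subgroups of orders f^(e_f); the moduli below merely illustrate the arithmetic.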

Estimating R d and R r
To estimate R_d and R_r for given m, s and n, known d and r, and a given target success probability q_d or q_r, we exactly follow [6] and sample N sets of n argument pairs {(α_{d,1}, α_{r,1}), …, (α_{d,n}, α_{r,n})} from the probability distribution.

For each set, we compute R_d, sort the resulting list of values in increasing order, and select the value at index ⌊(N − 1) q_d⌋ to arrive at our estimate for R_d. The estimate of R_r is computed analogously. The constant N controls the accuracy. If N is sufficiently large in relation to q_d and q_r, and to the variance in the arguments, we expect this approach to yield sufficiently good estimates.

If we fail to sample one or more argument pairs in a set, we closely follow [6] and over-estimate R_d and R_r by letting R_d = R_r = ∞ for the set. The entries for the failed sets will then all be sorted to the end of the lists. If the value of R_d or R_r selected from the sorted lists is ∞, no estimate is produced.

Let p be the total probability mass captured by the histogram. The probability of all n pairs in a set being in regions covered by the histogram is then p^n. When sampling N sets, the expected number of sets with finite R_d and R_r is N p^n. As the first N q_d and N q_r entries, respectively, in the two lists must be finite for the algorithm to produce an estimate, it follows that it is required that q_d, q_r < p^n, with some margin to account for the sampling variance, for estimates to be produced.
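The estimation procedure, including the handling of failed sets, can be sketched as follows; `sample_set_radius` is a hypothetical callable that returns the radius computed from one sampled set of argument pairs, or infinity when sampling the set fails:

```python
import math

def estimate_radius(sample_set_radius, N, q):
    """Estimate the minimum radius attained with probability >= q over N sampled
    sets. Failed sets yield R = inf and hence sort to the end of the list.
    Returns None if the selected entry is infinite (no estimate produced)."""
    values = sorted(sample_set_radius() for _ in range(N))
    value = values[math.floor((N - 1) * q)]
    return None if math.isinf(value) else value
```

With coverage p per run, roughly N(1 − p^n) sets fail; if ⌊(N − 1)q⌋ points past the last finite entry, the function returns None, matching the requirement that q be below p^n with some margin.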

Estimating the number of runs required
We are now ready to estimate the number of runs n required to attain a given minimum success probability q when recovering both d and r for tradeoff factor s.

Estimating n
To estimate n for a problem instance given by d, r and s, we proceed as follows: For n = s + 1, s + 2, …, we first estimate R_d and R_r by sampling N = 10^6 sets of n argument pairs (α_d, α_r), as explained in section 6.3. We stop and record the smallest n for which the volume quotients v_d < 2 and v_r < 2 with probability q = q_d = q_r = 99%. As the volume quotients each decrease by approximately a constant factor for every increment in n, the minimum n may in practice be found efficiently by interpolation once a few quotients have been computed.
For selected problem instances, we verify the above initial estimate of n by simulating the quantum algorithm and post-processing the simulated output.
More specifically, with the initial estimate of n as our starting point, we sample M = 10^3 sets of n pairs (j, k), as explained in section 5.3, and test whether recovery of both d and r is successful for at least Mq sets when executing the post-processing algorithms in sections 6.1 and 6.2 without enumerating L_j.
Depending on the outcome of the test, we either increment or decrement n, and repeat the process, recursively, until the smallest n such that the test passes has been identified. We record this n alongside the initial estimate of n.
In practice, we compute the closest vector in L j by reducing the lattice basis and applying Babai's [1] nearest plane algorithm. The shortest non-zero vector in L j is the shortest non-zero vector in the reduced basis. Enumeration is performed using Kannan's [10] original approach, as this is sufficient for our purposes. Note however that there are more efficient approaches in the literature.
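For illustration, Babai's nearest plane algorithm may be sketched as follows, over a basis given as integer rows; this is a generic textbook version over the rationals, not the fpLLL-backed implementation used in the experiments:

```python
from fractions import Fraction

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(B):
    """Gram-Schmidt orthogonalization over the rationals (no normalization)."""
    Bs = []
    for b in B:
        v = [Fraction(x) for x in b]
        for bs in Bs:
            mu = dot(v, bs) / dot(bs, bs)
            v = [vi - mu * bi for vi, bi in zip(v, bs)]
        Bs.append(v)
    return Bs

def babai_nearest_plane(B, t):
    """Return a lattice vector of the lattice spanned by the rows of B that is
    close to the target t; B should preferably be a reduced basis."""
    Bs = gram_schmidt(B)
    rem = [Fraction(x) for x in t]
    result = [Fraction(0)] * len(t)
    for i in range(len(B) - 1, -1, -1):
        # Round off the component of the remainder along the i-th GS direction.
        c = round(dot(rem, Bs[i]) / dot(Bs[i], Bs[i]))
        rem = [x - c * y for x, y in zip(rem, B[i])]
        result = [x + c * y for x, y in zip(result, B[i])]
    return [int(x) for x in result]
```

The quality of the returned vector depends heavily on how well-reduced the basis is, which is why LLL or BKZ reduction precedes this step.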
To reduce the basis, we closely follow [6] and employ LLL and BKZ [12,13,19,20], as implemented in fpLLL v5.0, with default parameters and a block size of min(n + 1, 10) for all combinations of m, s and n. We first compute an LLL reduction. If it proves insufficient, we proceed to compute a BKZ reduction.

Selecting m and s
As the cost of estimating n for a given problem instance is non-negligible, we seek to minimize the number of problem instances considered, whilst capturing the problems that underpin most currently deployed asymmetric cryptologic schemes.
In terms of group size, the above choices of m capture most currently widely deployed elliptic curve groups, Schnorr groups and safe-prime groups.

Selecting d and r given m
For each value of m, we need to select d and r such that 2^(m−1) ≤ d < r < 2^m.

For as long as d and r do not have very special properties, such as being divisible by large powers of two or being otherwise smooth, the exact values of d and r are of no great significance, however. To avoid having to tabulate d and r for the m we consider, we read d and r from the decimal expansion of Catalan's constant.

Experiments and results
The estimates of n in Tab. 1 were produced by executing the above experiments. As may be seen in the table, n asymptotically tends to s + 1 as m tends to infinity for fixed s. For fixed m, it holds that n = s + 1 up to some cutoff point in s.
The estimates are for not enumerating the lattice L_j. By enumerating a bounded number of vectors in the lattice, n may potentially be further reduced. In particular, our experiments show that a single run suffices to solve with probability q ≥ 99% for s = 1, provided we accept to enumerate up to ~1.3 · 10^3 vectors.
As may furthermore be seen in the table, the initial estimates of n are in general verified by the simulations. In general v d > v r . Hence v d determines the initial estimate for n. Note however that when the heuristic estimate of v d is close to two, minor discrepancies between the initial estimates and the simulations may arise.
This phenomenon is discussed in [6]: For large tradeoff factors s in relation to m, increasing or decreasing n typically has a small effect on v_d and v_r. This may lead to slight instabilities in the estimates, as v_d may be close to two for several values of n. Discrepancies may also arise, especially for large n, if we fail to find the closest and shortest non-zero vectors in L_j, or if sampling fails. Such discrepancies may be amplified by the difference in the sample sizes N and M.

As v_d > v_r in general, we expect the estimates of n for computing general discrete logarithms to agree with the estimates of n for computing short discrete logarithms. This is indeed the case, see Tab. 4 on p. 41, where n is estimated for short discrete logarithms selected as in section 7.3. It is reasonable to presume that this pattern would continue if s were permitted to grow a bit past the point where the approximation error becomes non-negligible.
To produce Tab. 1, we had to fix some d and r such that d < r < 2^m. We did this by selecting d and r from Catalan's constant. However, the variation in each estimate of n as a function of d and r is fairly small, for as long as d and r are both of size ~2^m, and not divisible by large powers of two or otherwise smooth.

Experiments imply that, for d and r that fulfill these basic requirements, the larger d and r are permitted to grow in relation to 2^m, the harder it becomes to solve in the classical post-processing. For maximal d = 2^m − 1, we may hence claim to obtain "worst case" estimates of n, see Tab. 5 on p. 41, restricted to m and s such that the bound on the approximation error is negligible.

Order finding with tradeoffs
The algorithm for computing general discrete logarithms in this paper does not require the group order to be known, as neither the quantum algorithm nor the classical post-processing algorithm makes explicit use of the order. If the order of the group is unknown, it may be computed from the same set of pairs (j, k) output by the quantum computer as is used to compute the logarithm.
This implies that the algorithm may be used as an order finding algorithm. When only the order is of interest, only j needs to be computed, as k is not used by the post-processing algorithm that recovers the order. The second stage of the quantum algorithm, in which k is computed, need therefore not be executed when the goal is to perform order finding. If the second stage is removed, the quantum algorithm reduces to the algorithm proposed by Seifert [21]. For s = 1, this algorithm in turn reduces to Shor's order finding algorithm.
This provides a link between our works on computing discrete logarithms, Seifert's work on order finding, and Shor's original work. As for post-processing, Seifert generalizes Shor's continued fractions-based post-processing algorithm to higher dimensions. We instead use lattice-based post-processing.
In appendix A, we provide a description of Shor's and Seifert's quantum algorithms for order finding, a complete analysis of the probability distributions that they induce, and estimates for the number of runs n required to solve various problem instances for r when using our lattice-based post-processing algorithms.

Summary and conclusion
We generalize and bridge our earlier works on computing short discrete logarithms with tradeoffs, Seifert's work on computing orders with tradeoffs and Shor's groundbreaking works on computing orders and general discrete logarithms. In particular, we enable tradeoffs when computing general discrete logarithms.
Compared to Shor's algorithm for computing general discrete logarithms, this yields a reduction by up to a factor of two in the number of group operations evaluated quantumly in each run, at the expense of having to perform multiple runs. The runs are independent, and may hence be executed in parallel.
Unlike Shor's algorithm, our algorithm does not require the group order to be known. It simultaneously computes both the order and the logarithm. This allows it to outperform Shor's original algorithms with respect to the number of group operations that need to be evaluated quantumly in some cases even when not making tradeoffs. One cryptographically relevant example of such a case is the computation of discrete logarithms in Schnorr groups of unknown order.
We analyze the probability distributions induced by our algorithm, and by Shor's and Seifert's order finding algorithms, describe how all of these algorithms may be simulated when the solution is known, and estimate the number of runs required for a given minimum success probability when making different tradeoffs.
When solving using lattice-based post-processing without enumerating L j , the number of runs n required for a fixed tradeoff factor s tends to s + 1 asymptotically as m tends to infinity. By enumerating, n may be further reduced. Notably, when not making tradeoffs, a single run suffices to solve with at least 99% success probability, provided a small number of lattice vectors are enumerated.

A.1 The quantum algorithm
Given a generator g of a finite cyclic group of order r of length ~m bits, Shor's order finding algorithm [22] outputs an integer j that yields ~m bits of information on r.
Seifert [21] enabled tradeoffs in Shor's algorithm by modifying it to yield ~m/s bits of information on r in each run, for s a tradeoff factor. For s = 1, Seifert's algorithm reverts to Shor's algorithm. This allows us to conveniently describe both algorithms below:

1. Let m be the integer such that 2^(m−1) < r < 2^m, let ℓ = ⌈m/s⌉, and initialize the first register in a uniform superposition of all a on 0 ≤ a < 2^(m+ℓ).

2. Compute [a] g and store the result in the second register.

3. Compute a QFT of size 2^(m+ℓ) of the first register.

4. Observe the system to obtain j and y = [e] g, where e = a mod r.
Note that Seifert's interpretation of the advantage of his algorithms is that he saves control qubits. This is not the case when recycling control qubits; see the discussion in section 2 for a more modern interpretation of the advantage.
where the sum is over all a on 0 ≤ a < 2^(m+ℓ) such that a ≡ e (mod r).
In this section, we seek to place (15) on closed form. To this end, we first perform a variable substitution to obtain a contiguous summation interval. As all a that fulfill the condition a ≡ e (mod r) are on the form a = e + n_r r, where 0 ≤ n_r ≤ ⌊(2^(m+ℓ) − 1 − e)/r⌋, substituting a for e + n_r r in (15) and adjusting the phase yields the sought expression, for β such that β ≡ 2^(m+ℓ) (mod r): for all e on 0 ≤ e < β, the sum then has ⌊2^(m+ℓ)/r⌋ + 1 terms, whereas for all e on β ≤ e < r, it has ⌊2^(m+ℓ)/r⌋ terms. Note that β ≢ 0 (mod r), since we require r to be on 2^(m−1) < r < 2^m.
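The concentration implied by (15) can be checked numerically for toy parameters. The sketch below (illustrative values m = 5, ℓ = 2, r = 23; not from the paper) computes the distribution of the observed j directly from the sum, verifies that it is normalized, and checks that most of the probability mass falls where the argument {rj}_(2^(m+ℓ)) is of size at most ~2^m:

```python
import cmath

# Toy parameters (illustrative only), with 2^(m-1) < r < 2^m:
m, ell, r = 5, 2, 23
N = 2 ** (m + ell)  # size of the index register and of the QFT

def prob_j(j):
    """P(j) = sum over e of |(1/N) sum_{a ≡ e (mod r), 0 <= a < N} e^(2 pi i a j / N)|^2."""
    total = 0.0
    for e in range(r):
        amp = sum(cmath.exp(2j * cmath.pi * a * j / N) for a in range(e, N, r))
        total += abs(amp / N) ** 2
    return total

def alpha_r(j):
    """Signed residue {r j}_N on [-N/2, N/2)."""
    a = (r * j) % N
    return a - N if a >= N // 2 else a

probs = [prob_j(j) for j in range(N)]
# Mass on outputs with |alpha_r| at most 2^m (the "good" region):
mass_near = sum(p for j, p in enumerate(probs) if abs(alpha_r(j)) <= 2 ** m)
```

Even at this tiny scale, most of the mass lands in the good region, whereas a uniformly random j would have an argument of size ~2^(m+ℓ).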

A.2.1 Closed form expressions
Assuming θ_r ≠ 0, we may write (17) on closed form. Otherwise, if θ_r = 0, (17) admits a simpler closed form. This step of the analysis is similar to a step in the analysis of Einarsson [3].

A.3 Distribution of integers j
In this section, we analyze the distribution of integers j that yield α_r.

Definition A.1. Let κ_r denote the greatest integer such that 2^(κ_r) divides r.

Definition A.2. An argument α_r is said to be admissible if there exists an integer j on 0 ≤ j < 2^(m+ℓ) such that α_r = {rj}_(2^(m+ℓ)).

Proof. As 2^(κ_r) | r and the modulus is a power of two, the claim follows.

Lemma A.1. The set of integers j on 0 ≤ j < 2^(m+ℓ) that yield the admissible argument α_r is obtained by letting t_r run through all integers on 0 ≤ t_r < 2^(κ_r). Each admissible argument α_r hence occurs with multiplicity 2^(κ_r).

Proof. As α_r ≡ rj (mod 2^(m+ℓ)), the lemma follows by solving for j.

A.4 Simulating the quantum algorithm
In this section, we first construct a high-resolution histogram for the probability distribution induced by the quantum algorithm for known r. We then proceed to sample the histogram it to simulate the quantum algorithm.
For each subregion, we compute the approximate probability mass contained within the subregion by applying Simpson's rule, followed by Richardson extrapolation to cancel the linear error term. Simpson's rule is hence applied 2 ν (1 + 2) times in each region. Each application requires the probability to be computed in up to three points (the two endpoints and the midpoint), for which purpose we use the closed form expression developed in section A.2.1.
Note that we should furthermore multiply by the multiplicity of arguments 2 κr , see Lemma A.1 in section A.3, and divide by 2 κr to account for the density of distinct pairs in the region. However, these operations cancel. Note also that this method of constructing the histogram assumes κ r to be small in relation to m.
To obtain a high degree of accuracy in the tail, we fix ν = 11 for all regions. This enables us to use this histogram as a reference when adaptively selecting the resolution for the two-dimensional histogram in section 5.1, see Lemma D.1.

A.4.2 Understanding the probability distribution
The probability distribution is plotted on the signed logarithmic argument axis in Fig. 4 for m = 2048 and s = 30, and for r selected as explained in section 7.3. The regions form two contiguous symmetric areas on the argument axis, as is illustrated in Fig. 6. As expected, the distribution plotted is virtually identical to the marginal distribution along the vertical α r axis in Fig. 4.
The probability mass is located in the regions where | α_r | ∼ 2^m, whereas for random outputs the argument would be of size ∼ 2^(m+ℓ). Hence, a single run of the quantum algorithm yields ∼ ℓ ∼ m/s bits of information on r.

A.4.3 Sampling the probability distribution
To sample an argument α_r from the distribution, we exactly follow [6]: We first sample a subregion from the histogram, and then sample α_r uniformly at random from this subregion, with the restriction that 2^κ_r must divide α_r so that α_r is admissible. To sample a subregion from the histogram, we order all subregions in the histogram by probability, and compute the cumulative probability up to and including each subregion in the resulting ordered sequence, in analogy with section 5.3.
Then, we sample a pivot uniformly at random from [0, 1), and return the first subregion in the ordered sequence for which the cumulative probability is greater than or equal to the pivot. The sampling operation fails if the pivot is greater than the cumulative probability of the last subregion in the sequence.
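The subregion sampling procedure described above may be sketched as follows; the helper name sample_subregion is ours. The cumulative probabilities may sum to less than one, since the histogram only captures part of the distribution, in which case the sampling operation may fail.

```python
import bisect
import random

def sample_subregion(probs, rng=random):
    """Sample a subregion index from a list of subregion probabilities.

    The subregions are ordered by probability, cumulative probabilities are
    computed, a pivot is sampled uniformly from [0, 1), and the first
    subregion with cumulative probability >= the pivot is returned.
    Returns None if the pivot exceeds the total probability (failure)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    cumulative, total = [], 0.0
    for i in order:
        total += probs[i]
        cumulative.append(total)
    pivot = rng.random()
    k = bisect.bisect_left(cumulative, pivot)   # first cumulative >= pivot
    return order[k] if k < len(cumulative) else None
```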
To sample an integer j from the distribution, we first sample an argument α r and then select an integer j yielding α r uniformly at random from the set of all such integers using Lemma A.1. More specifically, we first sample an integer t r uniformly at random on the admissible interval for t r and then compute j from α r and t r as described in Lemma A.1.

A.5 Classical post-processing
The probability distribution induced by the quantum algorithm in section A.1 is virtually identical to the marginal distribution along the α r axis in section 5.1. Hence, the classical post-processing algorithm in section 6.2 may be used to solve sets of n integers j output by the quantum algorithm in section A.1 for r.

A.6 Estimating the number of runs required
To estimate n for a problem instance given by r, we exactly follow [6]: For n = s + 1, s + 2, . . ., we first estimate R_r by sampling N = 10^6 sets of n arguments α_r, as explained in sections A.4.3 and 6.3, and record the smallest n for which the volume quotient v_r < 2 with probability q = 99%.
With this estimate of n as our starting point, we then sample M = 10^3 sets of n integers j, as explained in section A.4.3, and test whether recovery of r is successful for at least M q sets when executing the post-processing algorithm in section 6.2 without enumerating L_j. Depending on the outcome of the test, we either increment or decrement n, and repeat the process, recursively, until the smallest n for which the test passes has been identified.
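The increment/decrement search for the smallest n may be sketched as below, assuming that the test is monotone in n (more runs never reduce the success rate); the function name and the callable test_passes are ours.

```python
def smallest_passing_n(test_passes, n_initial):
    """Find the smallest n for which test_passes(n) holds, starting from an
    initial estimate, by decrementing while the test still passes and
    incrementing while it fails (assumes monotonicity of the test in n)."""
    n = n_initial
    if test_passes(n):
        while n > 1 and test_passes(n - 1):
            n -= 1
    else:
        while not test_passes(n):
            n += 1
    return n
```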
Executing this procedure, for m and s selected as described in section 7.2, both for r selected as explained in section 7.3 and for maximal r = 2^m − 1, produced the estimates in Tab. 2 and Tab. 3, respectively. Note that, with A the initial estimate and B the simulated estimate, we print B / A, unless B = A, in which case we only print A. Note furthermore that we have excluded m = 384 to avoid breaking the page layout.
The tabulated estimates are for when the lattice L_j is not enumerated. By enumerating a bounded number of vectors in the lattice, n may potentially be further reduced. In particular, our experiments show that a single run suffices to solve for r with probability q ≥ 99% when s = 1, provided we accept to enumerate up to ∼ 3.5 · 10^2 vectors.

A.7 Applications of order finding to integer factoring
Quantum algorithms for order finding may be used to factor integers, as was first proposed by Shor [22] using a reduction due to Miller [14]. To factor a composite integer N that is odd and not a pure prime power, Shor proceeds as follows: Pick an integer g ∈ (1, N) and compute D = gcd(g, N). If D ≠ 1, then D is a non-trivial factor of N. In practice, small and moderate-size factors of N would typically be removed before attempting to factor N via order finding, so it is unlikely that factors would be found in this manner. If D = 1, then g may be perceived as a generator of a cyclic subgroup ⟨g⟩ ⊆ Z*_N, and its order r computed using a quantum algorithm for order finding.
As g^r ≡ 1 (mod N), it must be that g^r − 1 ≡ 0 (mod N). If r is even and g^(r/2) ≢ −1 (mod N), Miller [14] observed that as (g^(r/2) − 1)(g^(r/2) + 1) ≡ 0 (mod N), whilst g^(r/2) ± 1 ≢ 0 (mod N), non-trivial factors of N may be found by computing gcd((g^(r/2) mod N) ± 1, N). This reduces the integer factoring problem to an order finding problem.
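The reduction may be sketched as follows, with a classical order computation standing in for the quantum order-finding subroutine; the function names are ours, not the paper's.

```python
import math
import random

def multiplicative_order(g, N):
    """Classical stand-in for quantum order finding: the least r > 0 such
    that g^r = 1 (mod N). Assumes gcd(g, N) = 1."""
    r, x = 1, g % N
    while x != 1:
        x = x * g % N
        r += 1
    return r

def factor_via_order_finding(N, order=multiplicative_order):
    """Find a non-trivial factor of an odd composite N that is not a pure
    prime power, via Miller's reduction as used by Shor."""
    while True:
        g = random.randrange(2, N)
        D = math.gcd(g, N)
        if D != 1:
            return D                       # lucky: g shares a factor with N
        r = order(g, N)
        if r % 2 == 0 and pow(g, r // 2, N) != N - 1:
            for delta in (-1, 1):          # gcd((g^(r/2) mod N) +/- 1, N)
                f = math.gcd(pow(g, r // 2, N) + delta, N)
                if 1 < f < N:
                    return f
```

In line with Shor's original proposal, the sketch simply retries with a fresh g whenever the requirements on r are not fulfilled.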
Shor originally proposed to use this reduction, and to simply re-run the whole algorithm if any of the above requirements are not fulfilled, or if the order finding algorithm fails to yield r. In [22], Shor lower-bounds the probability of his order finding algorithm yielding r in a single run, and of non-trivial factors of N being found given r, so as to obtain a lower bound on the overall success probability.
A number of improvements have since been proposed. In this appendix, we have for instance shown that the probability of Shor's original order finding algorithm yielding r in a single run is very close to one. Furthermore, we have estimated the number of runs required to obtain a similarly high success probability when making tradeoffs in Seifert's order finding algorithm. In [5], it is shown that any integer N may be completely factored classically into all of its constituent prime factors with very high probability after a single call to an order finding algorithm. Hence, the estimates we provide of the number of runs required for Shor's and Seifert's order finding algorithms to yield r are also estimates of the number of runs required to completely factor N via these order finding algorithms.

A.7.1 Factoring RSA integers
Note that if N is an RSA [18] integer, as is typically the case in cryptographic applications, a more efficient approach to factoring N is to use the algorithm of Ekerå and Håstad [7]. This algorithm reduces the RSA integer factoring problem to a short discrete logarithm problem via [9] and solves this problem quantumly.
As is explained in [6] and in this appendix, the quantum part of Ekerå–Håstad's algorithm imposes fewer requirements on the quantum computer than Shor's or Seifert's order-finding algorithms, both in each run and overall, and both when making and when not making tradeoffs. The probability of recovering the logarithm d is very close to one. The two prime factors of N are recovered deterministically from d.

B Short discrete logarithms with tradeoffs
The experiments in appendix A for order finding are analogous to those for short discrete logarithms in [6]. For completeness, and so as to enable comparisons, we have run experiments for short discrete logarithms, both for maximal d = 2^m − 1 and for d selected as described in section 7.2. These experiments produced the estimates in Tab. 4 and Tab. 5, respectively. Note that, with A the initial estimate and B the simulated estimate, we print B / A, unless B = A, in which case we only print A.

C Soundness of the closed form approximation
In this appendix, we demonstrate the fundamental soundness of the closed form approximation to P (θ d , θ r ) that we derived in section 3. This appendix is rather technical and may be considered to constitute supplementary material.

C.1 Introduction and recapitulation
Recall that by Theorem 3.1 in section 3, the probability P(θ_d, θ_r) of the quantum algorithm in section 2 yielding (j, k), with associated angle pair (θ_d, θ_r), summed over all r group elements y = [e] g ∈ G, may be approximated by where we have introduced some new notation in the form of the two functions f(θ_r) and g(θ_d, θ_r) that we shall use throughout this appendix, and that may both be placed on closed form. The error in this approximation of P(θ_d, θ_r) is bounded by again by Theorem 3.1, where the function ẽ(θ_d, θ_r) is specified.

C.1.1 Overview of the soundness argument
In what follows, we demonstrate the fundamental soundness of the above closed form approximation, by summing P(θ_d, θ_r) analytically to show that virtually all probability mass is within a specific region of the plane, and by summing ẽ(θ_d, θ_r) analytically to show that the total approximation error in this region is negligible.
Asymptotically, in the limit as m tends to infinity for fixed s, we furthermore show that the fraction of the probability mass captured tends to one whilst the error tends to zero. This implies that P (θ d , θ r ) asymptotically captures the probability distribution completely and exactly.

C.2 Preliminaries
Before we proceed as outlined above, we first introduce some preliminaries.

C.2.1 Bounding tail regions
Claim C.1. For ∆ and N integers such that 1 < ∆ < N, it holds that
Proof. For z any integer such that z > 1, it follows that where, for ∆ and N integers on 1 < ∆ < N, it holds that and so the claim follows.
Proof. As φ = 0, we have by Claim C.3 below that and so the claim follows.

C.2.2 Intervals of admissible arguments and angles
To facilitate the analysis, we need notation to handle intervals of admissible angles:
Definition C.1. Let Θ_r(I) be the set of distinct admissible θ_r on the interval I.
Definition C.2. For a fixed admissible θ r , let Θ d (I, θ r ) be the set of distinct admissible θ d on the interval I.

C.2.3 Parameterizing the admissible arguments and angles
Furthermore, we need a convenient method for parameterizing the distinct admissible argument pairs (α d , α r ), or angle pairs (θ d , θ r ).
The parameterization takes u_r times the first row and u_d times the second row of the basis matrix for L_α. It furthermore uses the second row to reduce the starting point δ_r u_r modulo 2^(m−γ). The claim follows from this analysis.

C.3 Establishing a baseline
We begin by proving that the sum of P(θ_d, θ_r) over all admissible (θ_d, θ_r), with multiplicity, in the region where θ_r ∈ [−π, π) and θ_d ∈ [−π/2^σ, π/2^σ), tends to one asymptotically in the limit as m tends to infinity for fixed s.
Proof. The function g(θ_d, θ_r) is non-negative and periodic in θ_d for fixed θ_r. It cycles exactly 2^σ times on the interval θ_d ∈ [−π, π), as may be seen in Fig. 7 where g(θ_d, θ_r) is plotted continuously in θ_d for θ_r fixed to zero. Fixing a different value of θ_r shifts the graph cyclically along the θ_d axis. This implies that we may parameterize θ_d in u_d and u_r using Claim C.4, and sum θ_d(u_d, u_r) over any consecutive sequence of 2^(ℓ+γ−σ) values of u_d, for the fixed u_r given by θ_r, to sum over all θ_d ∈ Θ_d([−π/2^σ, π/2^σ), θ_r).
Proof. The function f(θ_r) is non-negative and periodic in θ_r. It cycles exactly once on the interval θ_r ∈ [−π, π). This implies that we may parameterize θ_r in u_r using Claim C.4, and sum over all 2^(m+ℓ−κ_r) values of u_r to sum over all θ_r ∈ Θ_r([−π, π)). By using this approach and Lemma C.1, we thus obtain Σ_{θ_r ∈ Θ_r([−π, π))} f(θ_r) = by using that we may shift the interval in u_r, and so the lemma follows.
It follows from the above lemma that the sum of P(θ_d, θ_r) over all admissible pairs (θ_d, θ_r) in the region where θ_d ∈ [−π/2^σ, π/2^σ) and θ_r ∈ [−π, π) tends to one as m tends to infinity for fixed s, when accounting for the fact that each distinct admissible pair (θ_d, θ_r) occurs with multiplicity 2^(κ_r−γ) by Lemma 4.1.
The total approximation error, as upper-bounded by summing ẽ(θ_d, θ_r) over all admissible (θ_d, θ_r), with multiplicity, in the region, is however non-negligible. In the next section, we address this problem by reducing the size of the region.

C.4 Adapting the limited region to reduce the error
In this section, we show that the sum of P(θ_d, θ_r) over all admissible (θ_d, θ_r), with multiplicity, in the central part of the limited region as defined below, captures a fraction of the probability mass that is parameterized in τ.
Definition C.3. The central region is the part of the limited region in the angle plane where | θ_d | ≤ B_d and | θ_r | ≤ B_r, for B_d = 2^(τ−ℓ+1) π and B_r = B_d / 2, and for τ an integer constant such that 1 < τ < ℓ − σ − 1.
In the next section, we describe how the approximation error, as upper-bounded by summing ẽ(θ_d, θ_r) over all admissible (θ_d, θ_r), with multiplicity, in the central region, depends on τ. For appropriate σ and τ, virtually all probability mass is in the central region, and the total approximation error in the region is negligible.
Note that by the above definition of B_d and B_r, all argument pairs (α_d, α_r) such that | α_d | ≤ 2^(m+τ) and | α_r | ≤ 2^(m+τ−1) are in the central region. Note furthermore that B_r < B_d = 2^(τ−ℓ+1) π ≤ 2^(−σ−1) π, so the central region is a subregion of the limited region we considered in the previous section. h(θ_d + {−2^σ d/r} θ_r / 2^σ), which is independent of θ_r, where we have used Claim C.2 to obtain (18), and where we have introduced h(θ_d) = 2^4 / (2^σ θ_d)^2, which is strictly decreasing in | θ_d |. The situation is depicted in Fig. 8, where g(θ_d, θ_r) for θ_r = 0 is plotted continuously in θ_d, for | θ_d | ≤ π in the top graph, and | θ_d | ≤ π/2^σ in the middle graph. Fixing a non-zero θ_r ∈ Θ_r([−B_r, B_r)) shifts the top and middle graphs in Fig. 8 cyclically by {−2^σ d/r} θ_r / 2^σ. As | {−2^σ d/r} θ_r / 2^σ | ≤ | θ_r | ≤ B_r, the maximum cyclic shift in θ_d is upper-bounded by B_r, see the bottom graph in Fig. 8 where g(θ_d + B_r, 0) is plotted in yellow and g(θ_d − B_r, 0) in green.
To upper-bound (18), it therefore suffices to sum over all distinct admissible θ_d on I_r = [−π/2^σ, −B_r] ∪ [B_r, π/2^σ], as this captures all distinct admissible θ_d in the left and right tail regions under any cyclic shift. We have that (18)

D Marginal distributions
By using results and notation from the soundness analysis in appendix C, we may immediately derive a closed form expression for the marginal distribution that arises when summing P (θ d , θ r ) over all admissible θ d with multiplicity.
The above expression for the marginal probability distribution is derived from the approximation P (θ d , θ r ). It corresponds to the exact expression derived in appendix A for the order finding algorithm with tradeoffs. Note that there are minor differences between the two expressions. These are explained by P (θ d , θ r ) being an approximation to P (θ d , θ r ), whilst the expression in appendix A is exact.
A closed form analytical expression for the marginal distribution that arises when summing over all admissible θ r is seemingly less straightforward to derive. Numerically, the marginal distribution may however be seen to correspond to that for short logarithms when 2 m−1 < d < r < 2 m as stated in section 5.2.