Integer factoring and compositeness witnesses

Abstract. We describe a reduction of the problem of factorization of integers n ≤ x in polynomial time (log x)^{M+O(1)} to computing Euler's totient function, with exceptions of at most x^{O(1/M)} composite integers that cannot be factored at all, and at most x exp(−c_M (log log x)^3 / (log log log x)^2) integers that cannot be factored completely. The problem of factoring square-free integers n is similarly reduced to that of computing a multiple D of ϕ(n), where D ≪ exp((log x)^{O(1)}), with the exception of at most x^{O(1/M)} integers that cannot be factored at all, in particular O(x^{1/M}) integers of the form n = pq that cannot be factored.


Introduction
The computational problems of factorization of a composite integer n and computation of discrete logarithms in Z_n^* play a significant role in current public-key cryptography. The security of many popular cryptosystems rests on the difficulty of the integer factorization problem. E.g., to break the RSA cryptosystem, it is enough to be able to factorize integers of the form n = pq. The reduction of the general factorization problem to computing the values of Euler's totient function ϕ(n), or to computing discrete logarithms in Z_n^*, has attracted much attention in the last decades. The existence of such a reduction (which is trivial in the special case n = pq) would, of course, render the cryptosystems in question insecure if somebody developed a method to quickly compute, e.g., ϕ(n) for large n. Even if the computation of ϕ(n) seems out of reach at present, the computation of a multiple of ϕ(n) seems more plausible. Obviously n! is one such multiple, but it is far too large for any practical purposes. If anyone came up with a fast method of computing such a multiple of reasonable size, then, as we show at the end of the paper, it would seriously impact the security of the RSA cryptosystem.
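The reduction mentioned above is indeed trivial for n = pq: knowing ϕ(n) = (p − 1)(q − 1) gives p + q = n − ϕ(n) + 1, so p and q are the roots of a quadratic. A minimal sketch of this well-known computation (the primes below are small illustrative values, not cryptographic parameters):

```python
from math import isqrt

def factor_pq(n, phi):
    """Recover p, q from n = p*q and phi = (p-1)*(q-1).

    p + q = n - phi + 1 and p*q = n, so p and q are the roots of
    X^2 - (n - phi + 1) X + n.
    """
    s = n - phi + 1                 # p + q
    d = s * s - 4 * n               # discriminant (p - q)^2
    r = isqrt(d)
    if r * r != d:
        return None                 # phi was not (p-1)(q-1) for this n
    p, q = (s - r) // 2, (s + r) // 2
    return (p, q) if p * q == n else None

# small illustrative example
print(factor_pq(1009 * 2003, 1008 * 2002))  # -> (1009, 2003)
```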
E. Bach (see [3]) showed a reduction of factoring n to solving the discrete logarithm problem in Z_n^* in probabilistic polynomial time and, assuming the Extended Riemann Hypothesis (ERH), also in deterministic polynomial time. The corresponding unconditional deterministic subexponential-time reduction was proved in [14]. Factoring integers given the values of Euler's totient function ϕ(n) can be done in probabilistic polynomial time due to the work [15]. The related deterministic polynomial-time algorithm is known only under the assumption of ERH and was given in [12]. The unconditional problem remains unsolved (see [1]): an unconditional, deterministic algorithm is known, but it runs in subexponential time, cf. [18]. It was also noticed there that almost all positive integers can be factorized in deterministic polynomial time with the oracle Φ, which is a hypothetical device for computing the value of ϕ(n) for any positive integer n (see the next section for the definitions of the oracles). Quite recently, analogous deterministic reductions were investigated in [6] and [13] with the oracles Φ and Dec Φ, the latter giving the complete prime factorization of ϕ(n).
The aim of this paper is to design algorithms that make use of these oracles, with improved upper bounds for the number of integers that we are unable to factorize. We approach the solution of the problem posed in [1], showing that the related reduction in polynomial time t = O(log^{M+5} x) holds for all but at most O(x^{c/M}) exceptions n ≤ x, where c = 1.34 and M ≥ 4. The value of the constant c can be decreased to c = 1 in the case of the (stronger) oracle Dec Φ. This extends and improves substantially upon the related bound for the possible exceptions proved in [6].
We also investigate the problem of complete factorization with the aid of the oracle O = Φ. In this case we were able to obtain the bound O(x/(log x)^{6.5M}) for the number of the related exceptions when the oracle is queried once, and x exp(−c_M (log log x)^3 / (log log log x)^2) when the oracle is queried multiple times.
The latter bound, while weaker than x^{c/M}, is also stronger than any bound of the form x/(log x)^c. The former bound depends on the current top results related to the Vinogradov-Linnik problem on the least character non-residue. Any improvement upon the result quoted here as Lemma 3.4 will translate to an appropriate improvement of this bound.
In the last section we discuss similar reductions using slightly weaker oracles, related to the multiples of Euler's totient function. In the special case of integers of the form n = pq we show that all except O(x^{1/M}) of them can be factored.

Our approach is based on the investigation of the corresponding "hard" numbers that may not be factored with the aid of the related Fermat-Euclid compositeness witnesses or power-difference compositeness witnesses of given order (see Section 2 for the definitions). We remark that in order for an algorithm to complete in polynomial time with probability 1, i.e. for almost all integers, it would be enough to know that there are no more than o(x) hard integers in [1, x]; in other words, the set of hard integers should have density zero. However, to say that a set has "density zero" is only a very rough estimate. For example, the set of primes, the set of integers of the form n = pq, the set of squares, and the set of cubes all have density zero, but the first two, with O(x/log x) and O(x log log x/log x) elements in [1, x] respectively, are still much "denser" than the other two, with O(x^{1/2}) and O(x^{1/3}) elements in [1, x]. The existence of a large set of hard integers might suggest that it is still possible to keep a cryptosystem secure by an appropriate choice of parameters. Having a tight estimate for the number of hard integers in [1, x] makes such a measure unlikely. The additional arithmetical properties of hard integers stated in some lemmas and theorems serve the purpose of showing which integers might, and which should not, be considered as parameters in order to keep a cryptosystem secure in the hypothetical event of someone developing one of the oracles considered here.
We transform the problem to the investigation of primitive Dirichlet characters χ mod n of given order and prove that hard numbers correspond to the exceptional conductors of such characters. The estimates for the number of such conductors are deduced from the bounds for the least character non-residue proved in [4] and [10], and from an enhanced analysis of the Hensel-Berlekamp method applied in [18].

Notations and basic definitions
Conventionally m, n stand for positive integers while p, q are prime numbers; A denotes a given deterministic algorithm and O the related oracle. We also employ the following notation throughout the paper.

lcm(m, n) — the least common multiple of m and n
gcd(m, n) — the greatest common divisor of m and n
ϕ — Euler's totient function
P^+(n), P^-(n) — the greatest and smallest prime divisors of n, respectively
ω(n) — the number of distinct prime divisors of n
ν_q(n) — the exponent of the highest power of q dividing n
ord_n b — the order of b mod n, where gcd(b, n) = 1
log x — the natural logarithm of x
LN(χ) — for a Dirichlet character χ (mod n), the least character non-residue, i.e. the least b such that χ(b) ∉ {0, 1}

We note that our algorithms, and thus the functions F and F*, implicitly depend on the choice of some auxiliary parameters, denoted B, y and z. We explicitly mention these parameters in Theorems 3.1, 3.5, 4.1, 5.1 and 6.1, which state lower bounds for F and F*. Optimized values for these parameters, all of order (log x)^{O(1)}, may be found in the proofs of the theorems.
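For concreteness, the arithmetic functions listed above can be realized by short brute-force routines; the following sketch is adequate only for small n, not for the sizes the paper targets, and is merely an illustration of the notation:

```python
from math import gcd

def nu(q, n):
    """nu_q(n): exponent of the highest power of the prime q dividing n."""
    e = 0
    while n % q == 0:
        n //= q
        e += 1
    return e

def prime_divisors(n):
    """Distinct prime divisors of n, by trial division."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps

def omega(n):
    """omega(n): number of distinct prime divisors."""
    return len(prime_divisors(n))

def P_plus(n):
    """P^+(n): greatest prime divisor."""
    return max(prime_divisors(n))

def P_minus(n):
    """P^-(n): smallest prime divisor."""
    return min(prime_divisors(n))

def order(b, n):
    """ord_n b for gcd(b, n) = 1, by brute force."""
    assert gcd(b, n) == 1
    k, x = 1, b % n
    while x != 1:
        x = x * b % n
        k += 1
    return k

print(nu(2, 48), omega(60), P_plus(60), P_minus(60), order(2, 9))
# -> 4 3 5 2 6
```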
The investigated factoring algorithms are based on two kinds of factorization witnesses. The first is the so-called Fermat-Euclid compositeness witness, i.e. an element b such that 1 < gcd(b^{ord_n b / r} − 1, n) < n for some prime r | ord_n b (more precisely, such b is called a Fermat-Euclid compositeness witness of order r for n). For r = 2 we call such b a Miller-Rabin compositeness witness. Let r | ϕ(n) be a prime, l = ν_r(ϕ(n)), k ∈ {1, …, l}, u = ϕ(n) r^{−l+k−1}, and b_0 = min{b ≥ 1 : ν_r(ord_n b) = k}. We call b a power-difference compositeness witness of order r and degree k if ν_r(ord_n b) = k and for some j = 1, …, r − 1 we have 1 < gcd(b^u − b_0^{ju}, n) < n. This notion, admittedly harder to employ than that of a Fermat-Euclid witness, is useful for small primes r ≥ 3, for which the iteration over j is feasible. The idea is that if b and b_0 are not Fermat-Euclid witnesses of order r, then for each p | n both b^u and b_0^u are of order r mod p, so one is the j-th power of the other (mod p) for some j = 1, …, r − 1. Unless j is the same for every p, the gcd will yield a nontrivial factorization of n. In the case r = 2 this would not be useful, because the only element of order 2 mod p is −1, so if b is a power-difference witness for a square-free n (or indeed any n not divisible by 8), then b or b_0 is a Miller-Rabin witness for n.
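The way a Fermat-Euclid witness is turned into a factor can be sketched as follows: if r | ord_n b, then c = b^{ord_n b / r} satisfies c^r ≡ 1 (mod n) but c ≢ 1, and gcd(c − 1, n) may be a nontrivial divisor. In the sketch below, `order_mod` is a brute-force stand-in; with the oracle Φ one would instead derive the order from ϕ(n):

```python
from math import gcd

def order_mod(b, n):
    """ord_n b by brute force (stand-in for the oracle-assisted computation)."""
    k, x = 1, b % n
    while x != 1:
        x = x * b % n
        k += 1
    return k

def fermat_euclid_split(n, b, r):
    """If b is a Fermat-Euclid witness of order r for n, return a
    nontrivial divisor of n; otherwise return None."""
    g = gcd(b, n)
    if g != 1:
        return g if 1 < g < n else None
    d = order_mod(b, n)
    if d % r != 0:
        return None
    g = gcd(pow(b, d // r, n) - 1, n)
    return g if 1 < g < n else None

# n = 91 = 7 * 13: b = 2 has ord_91(2) = 12, and 2^6 - 1 = 63 shares
# the factor 7 with 91
print(fermat_euclid_split(91, 2, 2))  # -> 7
```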
A number n is called (A, B, y)-hard if P^+(n) > y and the algorithm A does not find the complete factorization of n with the aid of Fermat-Euclid compositeness witnesses b ≤ B. The number n is called (A, B, y)*-hard if P^-(n) > y and the algorithm A does not find any nontrivial divisor of n with the aid of Fermat-Euclid compositeness witnesses b ≤ B.
In the following sections we present several algorithms factoring square-free positive integers. This limitation affects only a thin set of integers, because we always factor out small primes by brute force. By [17, Theorem I.4.2] the number of positive integers n ≤ x with all prime factors p > y is of order x/log y, assuming y ≤ x^{1/log log x} (in our case y is in fact much smaller). The number of such n divisible by the square of a prime is at most ≪ x/(y log y), provided y ≤ x^{1/(2 log log x)}. However, our sets of hard numbers will be much thinner than this, so it is useful to deal with non-square-free numbers. Given an oracle O = Φ we can easily reduce the general problem of factoring an integer to the square-free case via the familiar algorithm of Landau [9], of complexity O(log^3 n), which we denote by A_0. With A_0 every positive integer n can be represented as the product n = n_1 n_2^2 ⋯ n_s^s of powers of pairwise coprime, square-free numbers n_i, using O(ω(n)) calls to the Φ oracle. This reduces the problem of factoring n to that of factoring the square-free numbers n_i, i ≤ s. The complexity O(log^3 n) is negligible compared with the factoring of the square-free factors n_i | n. If A is a factorization algorithm for square-free numbers, we consider a composite algorithm using both A and A_0 to factorize general numbers. In the composite algorithm we run A_0 for a given n and, inside it, we immediately let A factorize every new square-free factor m | n found by A_0. This way we can factor out the prime factors of m from n and update the value of ϕ(n) without a new query to the oracle. We can therefore reduce the number of oracle queries to 1. We denote the resulting composite algorithm by A_0(A). We also append any required extra parameters to this notation, or skip them if they have been fixed. E.g., we define an algorithm A_3 that requires parameters B and y, and then we refer to A*-hard numbers, where A = (A_0(A_3), B, y).
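The bookkeeping that lets the composite algorithm avoid repeated oracle queries relies on the multiplicativity of ϕ: once a prime factor p of n is known, one can strip p^{ν_p(n)} from n and divide ϕ(n) by ϕ(p^{ν_p(n)}) = p^{ν_p(n)−1}(p − 1). The following sketch illustrates this update step only; it is not the paper's procedure A_0 itself:

```python
def strip_prime(n, phi, p):
    """Given n, phi = phi(n) and a known prime factor p | n, return
    (m, phi(m)) for the cofactor m = n / p^{nu_p(n)}.

    Uses multiplicativity: phi(n) = phi(p^a) * phi(m) when p^a || n.
    """
    a, m = 0, n
    while m % p == 0:
        m //= p
        a += 1
    phi_pa = p ** (a - 1) * (p - 1)   # phi(p^a)
    assert phi % phi_pa == 0
    return m, phi // phi_pa

# n = 2^3 * 5 * 11 = 440, phi(440) = 4 * 4 * 10 = 160; stripping p = 2
# leaves the cofactor 55 together with phi(55) = 40, with no new oracle call
print(strip_prime(440, 160, 2))  # -> (55, 40)
```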
For such algorithms we evaluate the related failure sets by careful use of estimates related to Dirichlet characters.

1. Factor out small prime divisors p | n, p ≤ y, using division with remainder.

Factoring based on witnesses of small order
If any d is a nontrivial factor of n, return to Step 2 with n′ ∈ {d, n/d}. Otherwise output n.

Lemma 3.2. Suppose that all prime divisors of n are greater than B and that there is no Fermat-Euclid factorization witness b ≤ B of order r for n. Then at least one of the following assertions holds:
(i) there exists a primitive Dirichlet character χ (mod n), whose order is a power of r, such that LN(χ) > B;
(ii) we have r = 2, 2 ∤ ω(n), and there exist numbers n_1, n_2 and primitive characters χ_1 (mod n_1) and χ_2 (mod n_2), whose orders are powers of 2, such that lcm(n_1, n_2) = n, n_1 < n^{2/ω(n)}, n_1 n_2 < n^{1+1/ω(n)}, and LN(χ_i) > B for i = 1, 2;
(iii) we have r ≥ 3 and there is a power-difference factorization witness b ≤ B of order r and degree k for n; or
(iv) we have r ≥ 3 and k = 0.
We have r | q − 1, because r^k | q − 1. Let u = ϕ(n) r^{−ν_r(ϕ(n))+k−1}. Let χ be any character of order r^{ν_r(p−1)−k+1} mod p, i.e. the u-th power of any character of order p − 1 mod p. Then for each b ≤ B we have ord χ(b) = r if ν_r(ord_p b) = k, and χ(b) = 1 otherwise. Moreover, there exists a smallest b_0 ≤ B such that ord χ(b_0) = r. Let ψ be any character of order r^{ν_r(q−1)−k+1} mod q, i.e. the u-th power of any character of order q − 1 mod q. Then ψ(b_0) is a primitive r-th root of unity as well, so ψ(b_0) = χ(b_0)^l for some l ∈ {1, …, r − 1}. If we had ψ(b) ≠ χ(b)^l for some b ≤ B, then b_0^{ju} and b^u would be congruent mod p and not mod q for some j, so b would be a power-difference witness of order r and degree k, contrary to our assumption. Hence we have ψ(b) = χ(b)^l for all b ≤ B. Therefore the character ψχ^{−l} mod pq is equal to 1 on all b ≤ B. It is primitive, as a product of primitive characters to relatively prime moduli.
In the case r ≥ 3 we can therefore split the set of prime divisors of n into groups of 2 or 3 factors, obtain the necessary primitive character mod the product of the factors in each group, and multiply them to obtain the primitive character mod n required in (i). When r = 2 and 2 | ω(n), we can just use groups of two factors and show (i) again.
In the case a ≥ 5/2 we have f_1(a, 2) ≥ f_2(a, 2). Step 2 contributes ≪ (log x)^{a+3+ε} in total to the overall complexity. The complexity of generating the primes ≤ y and factoring them out is ≪ y(log x)^{1+ε}. We put y = (log x)^{a+2}. Having in mind the complexity of algorithm A_0, the complexity of algorithm A is ≪ (log x)^{a+3+ε}.

Lemma 3.4 (Lau, Wu [10, cf. Theorem 1]). For every non-principal Dirichlet character χ (mod q), where q is cube-free, we have LN(χ) ≪_ε q^{1/(4√e)+ε}.
Let S denote the set of A*-hard integers, E the set of B-exceptional integers, and S(x), E(x) the corresponding counting functions. Every n ∈ S is square-free. By Lemma 3.2 every n ∈ S, n ≤ x, is either exceptional itself or it is determined by a pair of two exceptional integers, n_1 ≤ x^{2/3} and n_2 < min(x, x^{4/3}/n_1) (since 2 ∤ ω(n) implies ω(n) ≥ 3). Therefore

Algorithm 2 (A 2 ) Factorization based on witnesses of small orders and the Φ oracle
Input: Square-free positive integer n, a positive multiple D of ϕ(n), auxiliary parameters B, y, z, where B ≤ y. Output: Factorization of n.

For each
If any d is a nontrivial factor of n, return to Step 2 with n ′ ∈ {d, n/d}. Otherwise output n.
Algorithm A_2 exploits the notion of power-difference witnesses. It is a generalization of A_1, as it reduces to A_1 when z = 2. Accordingly, Theorem 3.5 generalizes Theorem 3.1. The sets of hard and *-hard numbers for A_2 are contained in the corresponding sets for A_1. Conjecturally (assuming ERH) all of these sets are empty in the range of parameters given, since LN(χ) ≪ (log q)^2 by the result of N. C. Ankeny [2]; we therefore do not aim to prove proper inclusion, although better unconditional bounds would be desirable. In the present paper we are only able to state additional arithmetical consequences of A_2-hardness, without improving upon the bounds. Nevertheless we include this algorithm, because power-difference witnesses potentially offer a new line of attack on the factorization problem. The estimates in the assertion follow from Theorem 3.1. The additional property of A*-hard numbers follows from Lemma 3.2 (in the case of k = 0 the exceptional character is any character of order r). Compute d = gcd(… + 1, n). If any d is a nontrivial factor of n, return to Step 2 with n′ ∈ {d, n/d}.
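A power-difference witness search can be sketched directly from the definitions given earlier (l = ν_r(ϕ(n)), u = ϕ(n) r^{k−1−l}, b_0 the least b with ν_r(ord_n b) = k). The search strategy below is a simplified illustration, not the algorithm A_2 itself; note that ν_r(ord_n b) is computable from ϕ(n) alone, without factoring ord_n b:

```python
from math import gcd

def nu_r_of_order(b, n, phi, r, l):
    """nu_r(ord_n b), given l = nu_r(phi): the element a = b^{phi/r^l}
    has order r^{nu_r(ord_n b)}, so count r-th-power steps down to 1."""
    a = pow(b, phi // r ** l, n)
    t = 0
    while a != 1:
        a = pow(a, r, n)
        t += 1
    return t

def power_difference_split(n, phi, r, k, B):
    """Search b <= B for a power-difference witness of order r and degree k
    for square-free n.  Returns a nontrivial divisor of n, or None."""
    l = 0
    while phi % r ** (l + 1) == 0:
        l += 1
    u = phi * r ** (k - 1) // r ** l
    candidates = [b for b in range(2, B + 1)
                  if gcd(b, n) == 1 and nu_r_of_order(b, n, phi, r, l) == k]
    if not candidates:
        return None
    b0 = candidates[0]                # the least b with nu_r(ord_n b) = k
    for b in candidates[1:]:
        for j in range(1, r):
            g = gcd((pow(b, u, n) - pow(b0, j * u, n)) % n, n)
            if 1 < g < n:
                return g
    return None

# n = 91 = 7 * 13, phi = 72, r = 3: witnesses of degree k = 1 split n
print(power_difference_split(91, 72, 3, 1, 10))  # -> 13
```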

Fermat-Euclid compositeness witnesses and nontrivial factorization
(w-distance prime detection) 5. Represent n in base m, i.e. n = 1 + a_1 m + ⋯ + a_k m^k, 0 ≤ a_i < m. Attempt to factorize g(X) = 1 + a_1 X + ⋯ + a_k X^k.

For x, y > 0 let ψ(x, y) denote the number of integers 1 ≤ n ≤ x with P^+(n) ≤ y, the so-called y-smooth integers. It does not include the (harmless) assumption b_i > s. The idea is to make sure that the coefficients of the polynomial product g(X) = 1 + a_1 X + ⋯ + a_s X^s = ∏(…) satisfy a_i < m for i = 1, …, s, so that they can be determined by expressing n = g(m) in base m. Suppose, as we may, that b_1 < b_2 < … < b_s. We observe that for each i = 1, …, s we have a_i < n/m^s ≤ m, where the first strict inequality follows from b_{s−i} > s.
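The base-m encoding used in Step 5 is just positional notation: writing n in base m recovers the coefficients a_i of any polynomial g with 0 ≤ a_i < m and g(m) = n. A minimal sketch of this round trip (the polynomial here is an arbitrary illustrative product):

```python
def to_base(n, m):
    """Digits a_0, a_1, ... with n = a_0 + a_1 m + a_2 m^2 + ..., 0 <= a_i < m."""
    digits = []
    while n > 0:
        digits.append(n % m)
        n //= m
    return digits

def from_base(digits, m):
    return sum(a * m ** i for i, a in enumerate(digits))

# g(X) = (1 + 2X)(1 + 3X) = 1 + 5X + 6X^2; with m = 10 the digits of
# n = g(10) = 651 recover the coefficients, since all of them are < 10
n = (1 + 2 * 10) * (1 + 3 * 10)
print(n, to_base(n, 10))  # -> 651 [1, 5, 6]
```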
p_s < n^{η/((η−1)(s+1))} < n^{1/(η−1)}, and n > exp

Proof. For every prime p_i | n and every b ≤ B we have ord_{p_i} b = ord_n b, since otherwise a factorization witness would be found. Therefore the exponent m of the subgroup generated by the positive integers b ≤ B is the same in Z_n^* and in each Z_{p_i}^*. Each B-smooth integer less than p_s belongs to the subgroup in Z_{p_s}^*, and for cyclic groups the exponent is equal to the group order, hence, where the second inequality follows from Lemma 4. Moreover, hence s < log n/log y < w < (p_1 − 1)/m, so we can apply Lemma 4.3 and obtain m^{s+1} < n.
Then we have log log n ≤ 3 log log B − 2 log log log B < 3 log log B, where the last inequality comes from log log log B = log log(η log log n) > log log(3 log log 15) > 0.
Proof of Theorem 4.1. Let x be large, B = (log x)^a, a > 3, y = B and w = (log x)^{a+2}, and let ε be small, given a. Let A = (A_0(A_3), B, y). Let S denote the set of A*-hard integers, E the set of B-exceptional integers, and S(x), E(x) the corresponding counting functions. It follows from Lemma 3.2, similarly to the proof of Theorem 3.1, that every n ∈ S, n ≤ x, is either B-exceptional itself or it is determined by a pair of two B-exceptional integers, n_1 ≤ x^{2/ω(n)} and n_2 < min(x, x^{1+1/ω(n)}/n_1). It follows from Lemma 4.4 that ω(n) > log B/log log n − 1 ≥ a − 1.
We obtain, using Lemma 3.3 as before, the desired bound on S(x). Every A-hard integer is a multiple of an element of S. By Lemma 4.4 the smallest element n_0 of S satisfies the inequality stated there. Hence the number of A-hard integers n ≤ x is bounded accordingly.

The number of passes through Steps 2-5 is, again, ≪ log n. The oracle Dec Φ only needs to be queried once, for the original value of n, to supply the algorithm input. For divisors n′ | n it suffices to have a factorization of a multiple of ϕ(n′) to perform Steps 2-3, and we do have ϕ(n′) | ϕ(n). Steps 2 and 3 each take ≪ B log n log log n multiplications mod n, which can be seen as follows. Let ϕ(n) = r_1 ⋯ r_q be the original prime decomposition of ϕ(n) (primes with multiplicity). Then ord_n b, computed in Step 2, is the smallest divisor d of r_1 ⋯ r_q such that b^d ≡ 1 (mod n), while Step 3 requires the computation of b^{d/r_i} for each r_i | d. We first show an upper bound T(r_1, …, r_q) for the number of multiplications mod n necessary to complete Step 2. For q = 1 we have T(r_1) ≪ log r_1. For q = 2q′ we can first consider all d of the form r_1 ⋯ r_{q′} d_2 with d_2 | r_{q′+1} ⋯ r_{2q′}, involving log(r_1 ⋯ r_{q′}) + T(r_{q′+1}, …, r_{2q′}) multiplications; then, having found the minimal d_2, consider all d of the form d_1 d_2 with d_1 | r_1 ⋯ r_{q′}, involving log(r_{q′+1} ⋯ r_{2q′}) + T(r_1, …, r_{q′}) multiplications; and then take d = d_1 d_2. Thus T(r_1, …, r_{2q′}) ≤ log(r_1 ⋯ r_{2q′}) + T(r_1, …, r_{q′}) + T(r_{q′+1}, …, r_{2q′}).
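Step 2, computing ord_n b from a factored multiple of ϕ(n), can be illustrated by the straightforward (non-recursive) method: start from d = D and strip each prime r_i while b^{d/r_i} ≡ 1 (mod n). The divide-and-conquer scheme described in the text improves the multiplication count, but it computes the same d; the sketch below is the simple variant only:

```python
def order_from_multiple(b, n, prime_factors):
    """ord_n b, given a list of primes (with multiplicity) whose product D
    is a multiple of ord_n b (e.g. a factorization of a multiple of phi(n)).
    Assumes gcd(b, n) = 1."""
    d = 1
    for r in prime_factors:
        d *= r
    assert pow(b, d, n) == 1, "D must be a multiple of ord_n b"
    # strip each prime r from d while b^{d/r} is still 1 mod n
    for r in prime_factors:
        while d % r == 0 and pow(b, d // r, n) == 1:
            d //= r
    return d

# phi(91) = 72 = 2^3 * 3^2, and ord_91(2) = 12
print(order_from_multiple(2, 91, [2, 2, 2, 3, 3]))  # -> 12
```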
Hence T(r_1, …, r_q) ≪ log(r_1 ⋯ r_q) log q ≪ log n log log n. We then obtain an upper bound for the complexity of Step 3 in the same way as for Step 2.
It suffices to set a = M + 7/4 and the proof of Theorem 4.1 is complete.

Oracles related to multiples of ϕ(n)
Three of the algorithms presented above can also be used with slightly weaker oracles. Algorithms A_1 and A_2 can be run for n given any multiple D of ϕ(n), in which case the computational complexity grows by a factor of log D/log x. Likewise, algorithm A_3 works given a prime factorization of a multiple D of ϕ(n), whence the computational complexity grows by a factor of (log D/log x)^{1+ε}. Therefore D can be larger than an arbitrarily large fixed power of x; in fact it can be as large as exp((log x)^{O(1)}) for the algorithms to finish in polynomial time. However, algorithms A_0 and A_4 do depend on having the exact value of ϕ(n). We can therefore state variants of Theorems 3.1, 3.5 and 4.1 referring to the "raw" algorithms A_1, A_2 and A_3 (without A_0) and using weaker oracles, at the cost of restricting them to square-free integers. This essentially follows from the proofs of Theorems 3.1, 3.5 and 4.1 and the well-known fact that the number of square-free integers n ≤ x is of the same order as x.
In the special case when n is a product of two primes, n = pq, the problem of factoring n with the oracle Dec Mul Φ is always solvable using algorithm A_3, by (2). With the oracle Φ this problem is trivial and the solution is well known. However, it is not trivial with the weaker oracle Mul Φ. We have the following: all but O(x^{1/M}) of the integers n = pq, n ≤ x, can be factored in polynomial time with the oracle Mul Φ. Again, this essentially follows from the proof of Theorem 3.1, except that we do not need to consider case (ii) of Lemma 3.2, and thus obtain a better exponent in the estimate of the number of hard integers.
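For intuition only (the paper's reductions are deterministic), the classical probabilistic method recovers p and q from any multiple D of ϕ(n) when n = pq: write D = 2^s t with t odd, and for random b look for a nontrivial square root of 1 among b^t, b^{2t}, b^{4t}, …. A sketch with small illustrative primes and a fixed seed for reproducibility:

```python
import random
from math import gcd

def factor_pq_from_multiple(n, D, tries=64):
    """Probabilistic factoring of n = p*q from a multiple D of phi(n).

    With D = 2^s * t, t odd, the sequence b^t, b^{2t}, ... reaches 1;
    the last term before 1 is a square root of 1 mod n, and if it is
    not -1 its gcd with n splits n.  Illustrative sketch only."""
    t = D
    while t % 2 == 0:
        t //= 2
    rng = random.Random(1)            # fixed seed: reproducible sketch
    for _ in range(tries):
        b = rng.randrange(2, n - 1)
        g = gcd(b, n)
        if g > 1:
            return g                  # lucky: b already shares a factor
        x = pow(b, t, n)
        y = None
        while x != 1:                 # ord(x) divides 2^s, so this ends
            y, x = x, x * x % n
        if y is not None and y != n - 1:
            g = gcd(y - 1, n)         # y^2 = 1, y != +-1 => nontrivial gcd
            if 1 < g < n:
                return g
    return None

n, D = 1009 * 2003, 3 * 1008 * 2002   # D = 3 * phi(n)
print(factor_pq_from_multiple(n, D) in (1009, 2003))  # -> True
```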