Stochastic methods defeat regular RSA exponentiation algorithms with combined blinding methods

Abstract: Extra-reductions occurring in Montgomery multiplications disclose side-channel information which can be exploited even in stringent contexts. In this article, we derive stochastic attacks to defeat Rivest-Shamir-Adleman (RSA) with Montgomery ladder regular exponentiation coupled with base blinding. Namely, we leverage precharacterized multivariate probability mass functions of extra-reductions between pairs of (multiplication, square) in one iteration of the RSA algorithm and that of the next one(s) to build a maximum likelihood distinguisher. The efficiency of our attack (in terms of required traces) is more than double compared to the state-of-the-art. In addition to this result, we also apply our method to the case of regular exponentiation, base blinding, and modulus blinding. Quite surprisingly, modulus blinding does not make our attack impossible, even for large sizes of the modulus randomizing element. At the cost of larger sample sizes our attacks tolerate noisy measurements. Fortunately, effective countermeasures exist.


Introduction
It has been noted by Kocher [13] as early as 1996 that asymmetric cryptographic algorithms are prone to side-channel attacks. Countermeasures have been developed with a view to making these attacks either impossible or at least much harder to perform. There are several countermeasure principles. A first class consists in balancing the control-flow so that execution traces perfectly superimpose whatever the value of the secrets. A second important class of countermeasures consists in deceiving correlation attempts by an attacker with side-channel traces. The strategy consists in randomizing algorithm inputs or internal parameters, so that the computation is carried out on unpredictable data. Obviously, the randomization is restricted, since it must be possible to unravel the injected randomness at the end of the computation.
In this article, we focus on the Rivest-Shamir-Adleman (RSA) cryptosystem while it uses its secret exponent. Despite the balancing and randomization countermeasures, attackers will persist in trying to recover this exponent. But in order to bypass protections, the attacker needs to resort to more evolved strategies.
We distinguish between attacks which can be carried out with one single trace and those which require multiple traces (since there is not enough information in a single trace). An attack which succeeds with one single trace can overcome any algorithmic countermeasure:¹ basically, against randomizing countermeasures, it will recover the randomized version of some sensitive value, but this randomized value is still sufficient for the adversary to behave as if he knows the secret. As an example, in the case of exponent blinding, instead of computing $m^d \bmod n$ (where $m$ is the base, $d$ is the secret exponent, and $n$ is the modulus), the side-channel protected RSA computes $m^{d + r\varphi(n)} \bmod n$ for a fresh random integer $r$ (where $\varphi$ is the Euler totient function). Those two quantities are equal, owing to Euler's theorem (the generalization of Fermat's little theorem); hence, it does not matter if the attacker recovers $d + r\varphi(n)$ in lieu of $d$: in both cases he can forge valid signatures or decrypt messages correctly. Indeed, $d + r\varphi(n)$ is equivalent to $d$ for the purpose of signature generation or decryption. When attacks require some kind of averaging, then randomization countermeasures do work in concealing the secret, at least if the randomness is refreshed at each new computation. However, the balancing countermeasures do not deceive an attacker who averages traces, because averaging repeated executions of the same computation allows the attacker to increase the signal-to-noise ratio (SNR).
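The equivalence of the blinded and unblinded exponents can be checked numerically. The following is a minimal sketch with toy parameters; the helper name `blinded_exponent`, the symbol names, and the 32-bit size of the random multiplier are assumptions made for illustration only and are far too small or simplistic for real use.

```python
import secrets

def blinded_exponent(d, phi_n, blind_bits=32):
    """Return d + r * phi(n) for a fresh random r (hypothetical helper)."""
    r = secrets.randbits(blind_bits)
    return d + r * phi_n

# Toy RSA parameters (illustration only).
p, q = 61, 53
n = p * q
phi_n = (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi_n)   # secret exponent (Python 3.8+ modular inverse)
m = 65                  # base, coprime to n

plain = pow(m, d, n)
blinded = pow(m, blinded_exponent(d, phi_n), n)

# Both exponents yield the same result, so recovering the blinded
# exponent is as good as recovering d itself.
assert plain == blinded
```

Because a fresh multiplier is drawn at every call, two executions use different exponent bit strings, which is what defeats averaging over traces.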
In practice, the attacks which succeed in a single trace are the most dangerous, and implementers defend against them in the first place. The so-called simple power analysis (SPA [14, §2]) introduced in 1999 allows us to read out the exponent in one trace. Therefore, the usual countermeasure consists in the implementation of a regular exponentiation algorithm. In RSA, the so-called "regular algorithm" is a method to compute the modular exponentiation using a key-independent sequence of squaring and multiplication operations. Examples of regular exponentiation algorithms are the Montgomery ladder (treated in this paper), the square and multiply always algorithm, or fixed window exponentiation with an explicit multiplication even if the exponent bits in the current window are all equal to zero [15, Algorithm 14.82].
Regular exponentiation is thus a protection against simple trace analysis, where the attacker attempts to derive the exponent by observing one computation (or several identical ones). The regular exponentiation countermeasure against SPA plugs the leak, but at the same time it properly aligns traces corresponding to various executions. This is to the advantage of the adversary, in that such unfortunate alignment opens the door to differential power analyses, as discussed in [14, §5], to template attacks [5], or to machine learning attacks [19]. Those attacks, provided they require collecting several traces from the same inputs (for averaging in order to increase the SNR), are countered by randomizing countermeasures. For instance, the input of the RSA (its base) can be randomized, while being consistently derandomized at the output. Another option to randomize the intermediate computations is to randomize the modulus (so-called "modular extension"). This second option also allows us to perform a sanity check for the computation, which is incidentally a countermeasure against fault injection attacks [7]. We stress that all three countermeasures might well be stacked on top of each other, so as to thwart simple power attacks, differential power attacks, and perturbation attacks altogether. As an alternative to a regular exponentiation algorithm, or even as a complement to it, the secret exponent can be protected by blinding, as explained earlier.
Previous work and our contributions

State-of-the-art
We analyze in this article possible remaining biases, namely, extra-reductions inherent to the modular multiplication algorithm.

¹ For the sake of accuracy, let us specify that this assertion holds true for most scenarios, but might become wrong for some pathological counterexamples where the overall attack requires some additional work (e.g., some search) which, e.g., increases in the exponent length so that an attack becomes infeasible when the exponent becomes longer by exponent blinding. However, such countermeasures are not realistic from an industrial standpoint owing to the excessive overhead they incur, thus they can safely be ignored in our argumentation.
Given two integers $a$ and $b$, the classical modular multiplication $a \times b \bmod p$ computes the multiplication $a \times b$ followed by the modular reduction by $p$. Montgomery Modular Multiplication (MMM) transforms $a$ and $b$ into special representations known as their Montgomery forms.
Definition 2.1. (Montgomery transformation [16]) For any modulus $p$, the Montgomery form of $a$ is $\phi(a) = a \times R \bmod p$ for some constant $R$ greater than $p$ and co-prime with $p$.
In order to ease the computation, $R$ is usually chosen as the smallest power of two greater than $p$, that is, $R = 2^{\lceil \log_2 p \rceil}$. Using the Montgomery form of integers, modular multiplications used in modular exponentiation algorithms can be carried out using the MMM: $\phi(a) \times \phi(b) \times R^{-1} \bmod p$. We set the extra-reduction indicator to 1 in the presence of the extra-reduction, and to 0 in its absence.
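To make the origin of the extra-reduction concrete, the following sketch implements one textbook variant of Montgomery multiplication on Python integers and reports whether the final conditional subtraction was taken. The function names, the word-free (single big integer) formulation, and the toy modulus are assumptions chosen for illustration; real implementations work word by word.

```python
def montgomery_constants(p):
    """Return (R, p_prime): R is the smallest power of two above the odd
    modulus p, and p_prime satisfies p * p_prime = -1 (mod R)."""
    R = 1 << p.bit_length()
    p_prime = (-pow(p, -1, R)) % R
    return R, p_prime

def mont_mul(a_bar, b_bar, p, R, p_prime):
    """Multiply two values in Montgomery form.

    Returns (result, er), where er is 1 if the final subtraction
    (the extra-reduction) was needed and 0 otherwise."""
    t = a_bar * b_bar
    q = (t * p_prime) % R
    u = (t + q * p) // R      # exact division: t + q*p is a multiple of R
    if u >= p:                # u lies in [0, 2p), so one subtraction suffices
        return u - p, 1
    return u, 0

# Toy example (illustration only).
p = 1000003
R, p_prime = montgomery_constants(p)
a, b = 123456, 654321
a_bar, b_bar = (a * R) % p, (b * R) % p
c_bar, extra_reduction = mont_mul(a_bar, b_bar, p, R, p_prime)
assert c_bar == (a * b * R) % p   # c_bar is the Montgomery form of a*b
```

The flag `extra_reduction` is the binary quantity that the attacks discussed here try to observe through timing, power, or cache side channels.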
As we shall explain, this side channel is induced by the choice of moduli represented on a bitwidth which is exactly divisible by the bitwidth of the computers; this bitwidth is typically a power of two, such as 16, 32, or 64. This bias has given rise to the so-called extra-reduction analysis (ERA). An overview of known ERAs is provided in Table 1. Specifically, this table shows which countermeasure can be bypassed by which attack. The classification criteria in Table 1 are listed as follows:
• the implementation uses the Chinese Remainder Theorem (CRT), i.e., the moduli and are unknown to the attacker,
• the protection against differential power analysis named the base blinding,
• the protection against SPA named the regular exponentiation algorithm,
• the compensation of the extra-reduction by a fake operation, which is named constant time nonstraight line algorithm (N-SLA), i.e., constant operations have their fixed values identified by software.² In principle (at least with a reasonable probability), these countermeasures might be detected and nullified by a suitable side-channel attack. In Table 1, we assume that such side-channel attacks exist,
• identical execution times are ensured by avoiding extra-reductions altogether, which is named constant time straight line algorithm (SLA). Obviously, the attacks listed in Table 1 cannot work in this case, see also Section 5,
• the protection against differential power analysis named the exponent blinding, and
• the fault and differential protection named modular extension.
The algorithms from ERA-1a, ERA-1b, and ERA-2 are pure (global) timing attacks. Of course, by definition, pure timing attacks cannot overcome constant time implementations. While the pure timing attacks are very different for CRT implementations and for non-CRT implementations, the local timing attacks from ERA-L1 and ERA-L2 work for the CRT and non-CRT implementations as well. More precisely, these local attacks are a little bit easier to perform on non-CRT implementations because the ratio (and sometimes also the value ) does not have to be estimated there. For these reasons, we did not distinguish between CRT and non-CRT there. The pioneering papers [9,30] are significantly less efficient than their successors in the respective ERA (up to factor 50) and less general [30]. The difference between ERA-L1 and ERA-L2 is that with ERA-L2, the attacker is capable of probing the cache to distinguish between two different execution paths of otherwise identical duration and power leakage, whereas with ERA-L1, the attacker is restricted to observing the duration or the power leakage. Arguably, this difference resides more in the side-channel collection than in its analysis.
Remark. The terminology in Table 1 shall be considered with attention. Indeed, historically, ERA-1a, ERA-1b, and ERA-2 are pure timing attacks discovered in this order. Similarly, ERA-L1 and ERA-L2 are local timing attacks, discovered in this order. But some papers about ERA-1b were published after the papers from ERA-L1 and vice versa.
In [10,11], side-channel attacks on RSA, with CRT and without CRT, were investigated using leakage information of the presence or absence of the extra-reductions in MMM. The side-channel information was used to identify which MMs require extra-reductions. Two exponentiation algorithms were considered, namely, the always square and multiply exponentiation and the Montgomery ladder. The overall attacks split into many individual decisions whether $k_i = k_{i-1}$ or $k_i \neq k_{i-1}$, where $k_i$ and $k_{i-1}$ denote subsequent key bits. The presented attacks were successful, but for these decisions only two (one squaring and one multiplication) out of four Montgomery operations (squaring or multiplication) were exploited. However, the approach is too complex: the derivation of the probability mass function (PMF) of values for multiple operations becomes mathematically intractable when the number of operations analyzed jointly is strictly greater than two.

² Both countermeasures rely on a test, hence a branching in the control flow, which can be detected by a cache-timing analysis (see [2]) or by a power/electromagnetic side-channel analysis (empowered by a two-class clustering algorithm; see Figure 7 of [10]).

Novel contributions
For these reasons, in this article, we resort to another way to estimate the distribution of the extra-reduction which does not need the estimation of PMF values. We leverage on a previous work of Schindler [21]: this paper simplifies the characterization of the extra-reduction distribution using two elegant properties of MMM.
Using sophisticated stochastic methods, we solve the problem and improve the efficiency of [10,11], in the presence of regular exponentiation and base blinding.
Moreover, we extend the results to the case where the modulus is itself randomized. We show that ERA remains a powerful side-channel despite the stacking of three protections, namely, regular exponentiation and base and modulus blinding. We performed our experiments on 1024-bit RSA moduli as this allows a fair comparison of the attack efficiency with the experimental results in [10,11].
This manuscript contains joint research work from the years 2016-2018. We mention that parts of an intermediate version of this paper have been incorporated into the PhD thesis of the lead author.

Outline
The rest of this paper is organized as follows. We start by giving our optimized attack in Section 3. Namely, we recapitulate in Section 3.1 the background to optimize the state-of-the-art when RSA uses a regular algorithm (we focus on the so-called Montgomery ladder) and base blinding. The core of our attack is presented in Section 3.2. Evaluation with both perfect and noisy measurements is conducted in Section 4, where we also consider the "modulus extension" as a third countermeasure on top of regular exponentiation and base blinding. Eventually, countermeasures are addressed in Section 5, and conclusions are derived in Section 6. Some formal computation results are given in Appendix A.

The optimized attack: the stochastic background
In this section, we optimize the attack from [10,11]. We begin with definitions and we formulate the target of our attack in Section 3.1. In Section 3.2, we analyze the stochastic properties of the MM, and in Lemma 3.4 we develop a formula for the joint probability of several extra-reductions. The following subsections treat the estimation of two parameters, which are usually unknown, and the maximum likelihood estimator is derived.

Definitions and target of the attack
In this paper, we only consider the Montgomery ladder (left-to-right), which is described in Algorithm 2. Unlike [10,11] we do not consider the square and always multiply algorithm (cf. Algorithm 1.1 in [11]). It is obvious how the applied mathematical methods can be transferred to the square and always multiply exponentiation algorithm.
We assume that the message has been blinded (message blinding, a.k.a. base blinding). The attack applies to both RSA with CRT and RSA without CRT. We further assume that the arithmetic operations use Montgomery's multiplication algorithm [17]. As in [10,11] we assume that a side-channel attack yields (possibly noisy) information about whether or not MMs need extra-reductions. The applied mathematical techniques are similar to those in [1,2,21], where attacks on different variants of fixed window exponentiation algorithms [2,21] and the sliding window exponentiation algorithm [1] were analyzed thoroughly.
To avoid clumsy formulations we always target RSA with CRT in the following, where denotes one prime factor of the RSA modulus . We note that the attack on RSA without CRT works identically and is even simpler since there is no need to estimate the ratio (which is the ratio of two public parameters). Definition 3.1 describes the notations, necessary to understand this paper.
, and , the term denotes the value of register after the key bit has been processed. Furthermore, stands for the normalized register values. For , we set if the first Montgomery operation for key bit ("multiplication") needs an extra-reduction (ER) and otherwise. Analogously, if the second Montgomery operation for key bit ("squaring," or "Quadrierung" in German; we apply "Q" in place of "S" to prevent confusion with the stochastic process defined below) needs an ER and otherwise. We recall that in the context of random variables the abbreviation "iid" stands for "independent and identically distributed." The indicator function assumes the value 1 if and 0 else. For , the term denotes the unique element in , which is congruent to modulo . The letter denotes the Montgomery constant for some integer . (Usually, .) When is a real number, the term denotes the real number . Finally, for we define (MM, as per Definition 2.2).

Algorithm 2. Left-to-right Montgomery ladder with MM algorithm
Input: Output: First Square 3 for down to 0 do

return
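For orientation, the following sketch shows one standard formulation of the left-to-right Montgomery ladder built on Montgomery multiplication, recording for each processed key bit the pair of extra-reduction flags (multiplication first, squaring second). It is an assumption that this matches Algorithm 2 in every detail; the function names, the register handling, and the toy modulus are illustrative choices.

```python
def mont_mul(a_bar, b_bar, p, R, p_prime):
    """Montgomery product and its extra-reduction flag (1 if taken)."""
    t = a_bar * b_bar
    u = (t + ((t * p_prime) % R) * p) // R
    return (u - p, 1) if u >= p else (u, 0)

def montgomery_ladder(m, k, p):
    """Compute m**k mod p (k >= 1, p odd) with a left-to-right ladder.

    Returns (result, trace), where trace[j] = (er_mult, er_square)
    for the j-th processed key bit, starting below the leading bit."""
    R = 1 << p.bit_length()
    p_prime = (-pow(p, -1, R)) % R
    m_bar = (m * R) % p
    r0 = m_bar                                        # leading key bit is 1
    r1, _ = mont_mul(m_bar, m_bar, p, R, p_prime)     # first square
    trace = []
    for i in range(k.bit_length() - 2, -1, -1):
        if (k >> i) & 1 == 0:
            r1, er_m = mont_mul(r0, r1, p, R, p_prime)   # multiplication
            r0, er_q = mont_mul(r0, r0, p, R, p_prime)   # squaring
        else:
            r0, er_m = mont_mul(r0, r1, p, R, p_prime)   # multiplication
            r1, er_q = mont_mul(r1, r1, p, R, p_prime)   # squaring
        trace.append((er_m, er_q))
    result, _ = mont_mul(r0, 1, p, R, p_prime)        # leave Montgomery form
    return result, trace

result, trace = montgomery_ladder(5, 0b1011011, 1000003)
assert result == pow(5, 0b1011011, 1000003)
```

Every iteration performs exactly one multiplication and one squaring regardless of the key bit, which is the regularity property; only the choice of destination register depends on the bit, and that dependence is what shapes the joint distribution of the extra-reduction flags.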
We note that and (cf. lines 1 and 6 of Algorithm 2). Besides, the key is chosen of full length (hence ) and must be coprime with , which is even (as is a prime number); therefore, is odd (hence ). This gives for free two bits of information to an attacker. The index may be determined by an SPA. Moreover, it suffices to recover the exponent for the exponentiation modulo : if denotes the secret RSA key and if , then , which factorizes the modulus (see, e.g., [21], Section 6).

The core of our attack
We interpret the as realizations of random variables , i.e., values taken on by , which assume values in . Analogously, we view and as realizations of -valued random variables and .
Lemmas 3.2(i) and (ii) collect known stochastic properties of Montgomery's multiplication algorithm, while Assertions (iii) and (iv) follow the strategy that has proven successful for fixed-window exponentiation in [2,21].
(ii) Assume that and that the random variable is uniformly distributed on . Furthermore, and denote independent random variables, which are uniformly distributed on . Then approximately The random variables may be viewed as iid uniformly distributed on .
(v) For the indicator functions, we obtain
Proof. Assertions (i) and (ii) are shown in [22] (see Lemma A.3 and its proof at page 209). The core idea of the approximate representations (3.2) and (3.3) is that a small deviation of the random variable (resp. of ) causes only a small deviation of the first summand but implies an "uncontrolled large" deviation of the second summand over the unit interval. We note that if and are independent, then and are independent, too. Since the base (Algorithm 2) has been base-blinded, we may assume that is a realization of a random variable , which is uniformly distributed on the unit interval . Following (3.3) we further assume that is also uniformly distributed on and that and are independent (see also  .3). In a strict sense, this claim is certainly not correct, e.g., because the normalized register values only assume values in the finite set , and, to mention just one missing number theoretical property, the cannot assume quadratic non-residues in . However, this is not relevant for our purposes since we are only interested in the (joint) probabilities of extra-reductions. These events can be characterized by "metric" conditions in (cf. (3.1), (3.2), (3.3)). It should be noted that the iid assumption on the normalized intermediate random variables of the exponentiation algorithm (here: the ) has proven successful, e.g., in [2,3,20-22], and it will turn out to be successful in the following, too.
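The uniform model for normalized operands can be probed empirically. The sketch below draws operands uniformly below the modulus and measures how often a Montgomery multiplication and a Montgomery squaring need an extra-reduction. Under the uniform model, the well-known approximations from the wider literature on Montgomery timing attacks predict frequencies close to (modulus / R) / 4 for multiplications of independent operands and (modulus / R) / 3 for squarings; quoting these closed forms here is an assumption, as are the arbitrary odd 64-bit modulus (not necessarily prime), the sample size, and all names.

```python
import random

def needs_extra_reduction(a, b, p, R, p_prime):
    """Return 1 if the Montgomery product of a and b needs the final subtraction."""
    t = a * b
    u = (t + ((t * p_prime) % R) * p) // R
    return 1 if u >= p else 0

def er_frequencies(p, samples, seed=0):
    """Empirical ER frequencies for multiplication and squaring.

    Returns (freq_mult, freq_square, p / R)."""
    rng = random.Random(seed)
    R = 1 << p.bit_length()
    p_prime = (-pow(p, -1, R)) % R
    mult = square = 0
    for _ in range(samples):
        a = rng.randrange(p)
        b = rng.randrange(p)
        mult += needs_extra_reduction(a, b, p, R, p_prime)
        square += needs_extra_reduction(a, a, p, R, p_prime)
    return mult / samples, square / samples, p / R

# Arbitrary odd 64-bit modulus; Montgomery reduction only needs oddness.
P_TOY = 0xD3A17F4C9B25E6F1
freq_mult, freq_square, ratio = er_frequencies(P_TOY, 20000)
ratio_estimate = 3 * freq_square   # crude estimate of modulus / R from squarings
```

Reading the relation backwards, as in the last line, gives a crude estimate of the ratio from observed squaring frequencies alone, which is the kind of estimation step needed when the modulus is an unknown prime factor.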
The overall attack consists of many independent decisions (which nevertheless influence each other). Each of these attack steps (decisions) considers all MM simultaneously, which are carried out when consecutive key bits are processed. Lemma 3.4 is the core of our attack. It provides the probabilities, which are needed later in Lemma 4.6 (maximum likelihood decision strategy).   with suitable integration boundaries . These integration boundaries follow immediately from Lemma 3.2(iv) and (ii) with in place of . This verifies the formula (3.9) to (3.12) for . The integral over can be transformed in the same way into a sequence of one-dimensional integrals. Since the integration boundaries depend only on the left-hand indicator functions, i.e., on the observations Lemma 3.4(i) can be verified by induction on . We first note that (swapping the right-hand indices from 0 to 1 and vice versa) defines a volume-preserving diffeomorphism on . As already pointed out above the probabilities (3.13) and (3.14) can be expressed by integrals over of indicator functions and respectively. The terms and indicate the hypotheses. From Lemma 3.2(iv), we conclude that and for all , which completes the proof of Assertion (ii). □

Lemma 3.4(ii) says that the information contained in the extra-reduction vectors
does not allow us to distinguish between the hypotheses and . This means that we can only determine the set , as depicted in Figure 1. In particular, it would be pointless to consider the case . For one can distinguish between the cases and , or equivalently, between and .
For , the parameter corresponds to (3.16) where " " denotes the addition modulo 2. For the sake of clarity, we note that the components of vector can also be written as for .
(i) Lemma 3.4 can be applied to all -tuples for . Combining the information from all -tuples only provides the vector . This information determines the whole key since is odd due to (where we recall that is Euler totient function).
(ii) The probabilities in Lemma 3.4 do not depend on the index . By Lemma 3.4(ii), it suffices to compute at most probabilities of type (3.8). (Note that different extra-reduction vectors exist and one has to distinguish between hypotheses.) Example 3.6 illustrates the calculation of one particular probability, and the appendix contains two tables with all probabilities for .
(iii) For , our attack aims at pairs of consecutive key bits . This is like the original attack in [10,11], but the original attack only exploits the extra-reductions while our attack considers . The probabilities, which are applied in the original attack, are the marginal probabilities of the probability (3.8) with regard to . Obviously, the original attack exploits less information than the new attack for , and experiments confirm that for our new attack reduces the number of queries by a factor greater than 2 (cf. Figure 3).
Corollary 3.7. For by applying the law of total probability on in (3.8), the joint probability for maximum likelihood described in [10,11, Theorem 2] can be recovered.
Remark 3.8. The two approaches in previous work [10,11] and this work are independent and both allow us to derive a maximum likelihood key distinguisher. Here, we are not interested in the values manipulated by the multiplication and square operations, but only in the necessary and sufficient conditions for the existence of extra-reductions, which allows an analysis of larger dimensions.
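The observation that the vector of sums modulo 2 of neighbouring key bits determines the whole key once one bit is known can be illustrated in a few lines. The sketch assumes that each component is the XOR of two consecutive key bits and that the least significant bit is known to be 1 (odd exponent); the function names and the most-significant-bit-first list layout are illustrative choices.

```python
def xor_differences(bits):
    """bits[0] is the most significant key bit; return XORs of neighbours."""
    return [bits[i] ^ bits[i + 1] for i in range(len(bits) - 1)]

def recover_key_bits(deltas, lsb=1):
    """Rebuild all key bits from neighbour XORs and the known last bit."""
    bits = [lsb]
    for delta in reversed(deltas):
        bits.append(bits[-1] ^ delta)   # previous bit = next bit XOR difference
    bits.reverse()
    return bits

key_bits = [1, 0, 1, 1, 0, 0, 1]        # toy exponent, odd and of full length
deltas = xor_differences(key_bits)
assert recover_key_bits(deltas) == key_bits
```

A single wrong difference flips every more significant bit in the reconstruction, which is why the overall success rate is defined with respect to the whole exponent.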

Perfect and noisy measurements
The attacker gets access to side-channel information about each bit ( ) of the exponent through the noised distribution of the pair of extra-reductions . The noise consists in two binary random variables . Additionally, the random variables and are assumed independent and identically distributed (iid), as is usually the case of measurement noise of different operations in a side-channel trace. Namely, we denote by the probability Thus, the attacker garners an iid sequence , where for each query and exponent index , and . This means that and are, respectively, the input and the output of a binary symmetric channel (BSC) of parameter . Similarly, and are also input and output of an independent identical BSC parallel to the first one.
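The noise model above amounts to passing each extra-reduction flag through its own binary symmetric channel. A minimal simulation sketch follows; the function name `observe`, the parameter name `flip_prob`, and the use of a seeded `random.Random` are assumptions for illustration.

```python
import random

def observe(er_pairs, flip_prob, rng):
    """Pass each (multiplication, squaring) ER flag through an independent
    binary symmetric channel that flips the bit with probability flip_prob."""
    noisy = []
    for er_m, er_q in er_pairs:
        flip_m = 1 if rng.random() < flip_prob else 0
        flip_q = 1 if rng.random() < flip_prob else 0
        noisy.append((er_m ^ flip_m, er_q ^ flip_q))
    return noisy

true_flags = [(0, 1), (1, 1), (0, 0), (1, 0)]
rng = random.Random(2024)
noisy_flags = observe(true_flags, 0.1, rng)
```

With `flip_prob = 0` the observation is perfect, and with `flip_prob = 0.5` the observed flags carry no information about the true ones.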
In practical cases, detecting an extra-reduction using only one acquisition can lead to errors. Let us model the attack setup, taking into account that the detection of presence/absence of extra-reductions is a random variable, due to some noise. The random variables Markov chain for index is given as follows:

Secret → Bias → Observable
The probabilities (3.8) depend on the unknown ratio . The crucial observation is that the attacker knows the position of all squarings and all multiplications. Lemma 4.2 provides concrete formulae, which allow us to estimate . Of course, this estimation step is only necessary for RSA with CRT but not for RSA without CRT. We begin with a lemma, which will be needed.
We note that the probability (4.2) was already verified in [20]³ and, for instance, in [11], respectively, the latter by other mathematical methods. Formula (4.3) follows directly from (4.1) and (4.2). □
The ER-values and are determined (or more precisely: guessed) on the basis of single-trace template attacks. In particular, their guesses and might be incorrect with some probability. We denote the corresponding random variables (referring to the guessed ER values) by and . In the following, we assume that (4.4) and similarly for the initialization of the registers and in Algorithm 2. In other words, the probability of guessing an ER value incorrectly is , independently of the true value. Of course, characterizes a perfect side-channel measurement. Lemma 4.2(iii) is the generalization of (4.3) for noisy measurements. As noted in Lemma 4.4, this allows the estimation of and .
Proof. The term quantifies the probability for the error vector . This fact and the definition of the conditional probability imply (4.7). Assertion (ii) follows immediately from (i) and Lemma 3.4(ii), applied to the particular right-hand probabilities in (4.7). □
The last lemma of this section explains how to estimate the ratio and the probability .
(i) maximizes the right-hand side of (4.11) iff maximizes the right-hand side of (4.11). It thus suffices to compute the right-hand term of (4.11) for all , or, without loss of generality, by fixing one arbitrary bit within .
(ii) The attacker decides for (4.12) This is the optimal decision strategy.

The optimal decision strategy
Proof. The first assertion of (i) follows from Lemma 3.4(ii), and the second is an immediate consequence of the first. With regard to the assumptions on and on the subkey we interpret the unknown subkey as a realization of random variable, which is uniformly distributed on . Then may be viewed as a realization of a random variable, which is uniformly distributed on . Furthermore, iff . Hence, (4.11) yields the maximum likelihood estimator for the transformed subkey . If we assume that each false decision is equally bad the optimal decision
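The structure of a maximum likelihood decision over noisy extra-reduction vectors can be sketched generically. In the code below, the per-hypothesis probability tables are toy placeholders supplied by the caller, not the probabilities of Lemma 3.4; the function names, the dictionary layout, and the treatment of noise as independent bit flips with a common probability are assumptions.

```python
import math

def noisy_pmf(clean_pmf, flip_prob):
    """Turn P(true vector | hypothesis) into P(observed vector | hypothesis)
    when every bit is flipped independently with probability flip_prob."""
    vectors = list(clean_pmf)
    result = {}
    for observed in vectors:
        total = 0.0
        for true, prob in clean_pmf.items():
            flips = sum(o != t for o, t in zip(observed, true))
            keeps = len(observed) - flips
            total += prob * flip_prob ** flips * (1.0 - flip_prob) ** keeps
        result[observed] = total
    return result

def ml_decide(observations, pmf_by_hypothesis, flip_prob):
    """Return the hypothesis maximizing the log-likelihood of the observations."""
    best, best_score = None, -math.inf
    for hypothesis, clean in pmf_by_hypothesis.items():
        pmf = noisy_pmf(clean, flip_prob)
        score = sum(math.log(max(pmf[obs], 1e-300)) for obs in observations)
        if score > best_score:
            best, best_score = hypothesis, score
    return best

# Toy placeholder tables (NOT the probabilities derived in the paper).
TOY_PMF = {
    "H0": {(0, 0): 0.6, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.1},
    "H1": {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.3, (1, 1): 0.3},
}
observed = [(0, 0)] * 6 + [(0, 1)] * 2 + [(1, 0)] + [(1, 1)]
decision = ml_decide(observed, TOY_PMF, 0.1)
```

Summing log-probabilities over traces reflects the assumption that different traces are independent, and a uniform prior over hypotheses makes the maximum likelihood choice coincide with the choice of highest posterior probability.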

Attack summary and success rate
The decision strategy in Lemma 4.6 is based on the observed extra-reductions for each multiply and square operation for calls of the cryptographic operation with a static key of -bit length ( and , as described in Algorithm 2). For each -tuple of (noisily) observed extra-reductions the attacker estimates the value using the maximum likelihood estimator as described in Lemma 4.6 using only the probabilities (for and the probabilities are given as polynomials in the ratio in the informative Appendix A). Algorithm 3 permits us to retrieve the key bit values. It is a windowed algorithm, which recovers an estimate of the secret key by tuples of bits. In Algorithm 3, takes values , , , etc. The first -bit window considers the Montgomery operations, which depend on the key bits . Due to Lemma 4.3, subsequent windows overlap in one bit position. Note that at lines 4 and 16 of Algorithm 3, the final value of must be , which might not be a multiple of depending on the values of and . Hence, the final value of is adjusted to be equal to . In this case, the last window consists of bits of indices , which overlaps the last but one window in more than one bit position. Alternatively, the final maximum likelihood can be computed for a smaller window (of length ). Our first proposal saves the computation of additional probabilities (step 3 of Algorithm 3), hence it is adopted in Algorithm 3, and put in force at lines 7 and 17.
The last steps of Algorithm 3 consist in putting together pieces of bits of the key guess. Simple error correction can be applied at this stage, to easily fix one or two errors while rebuilding the full bits of the secret exponent. For each trial only the loop from line 16 in Algorithm 3 has to be executed (with modified guesses for an index or for two indices ) and the Euclidean algorithm, which is not costly. We point out that in Definition 4.8 we do not allow any false decision for the particular -bit windows for the sake of a fair comparison with the attacks in [10,11]. If we did so this would increase our success rate to some extent (and those in [10,11]).
In order to compare the previous work and this optimized method, we compute the success rate of those attacks. In this article, we define the success rate with respect to a whole exponent value. For different exponents of 512-bit length, we estimate the success rate of the attack for the modulus (RSA-1024-q defined in [11, Section 2.2]), for different probabilities and different values of depending on the number of side-channel traces . Figure 3 shows a comparison between the attack described in [10,11] and our method for between 2 and 5. Here one can observe that our method for different values significantly increases the success rate compared to the state-of-the-art method described in [10,11]. The number of side-channel traces needed for a successful attack is divided by a factor greater than 2. More precisely, our new method recovers the key with probability using only 40% of the traces needed in [10,11]. This advantage does not depend on the size of the modulus .
The gain obtained by the increasing values is not significant.

The attack in the presence of several blinding techniques
We already know that base blinding (a.k.a. message blinding) does not prevent our attack. The reason is that our attack neither requires knowledge of any register values nor needs chosen input values. In this section, we analyze the situation when in addition to base blinding either modulus blinding or exponent blinding is applied.

The combination of base blinding with modulus blinding
In the first step, an odd modulus blinding factor is selected randomly, where for a suitable exponent , e.g., for . The modular exponentiation is calculated modulo (instead of modulo ), and the new Montgomery constant is the product in place of . The input value (base) is reduced modulo , yielding , and then the product is computed for some random value (base blinding). The result of the modular exponentiation, , is reduced modulo , which yields . Finally, the effect of the base blinding is annihilated by the multiplication with , providing the desired output .
(i) The modulus blinding factor needs to be odd because Montgomery's multiplication algorithm requires that the modulus is coprime to .
(ii) Of course, the annihilating term is not computed in a straightforward way. First of all, this would be extremely inefficient, and further, is a sensitive variable. Hence, it is better not to touch it more than necessary in computations, and a similar albeit less harmful strategy has already been recommended (cf. [13, §10]). If denotes the public RSA exponent, then , and thus for (with randomly selected ) we have . Such blinding, applied to Montgomery ladder regular exponentiation using MM (i.e., Algorithm 2), is illustrated in Algorithm 4. (The notation " " stands for uniformly random assignment.) Moreover, once a pair has been found it can easily be updated by squaring both components modulo [13, §10].
(iii) In this paper, we consider the case "first modulus blinding then base blinding." This countermeasure is represented in Algorithm 5. We point out that reversed order, "first base blinding then modulus blinding," can be attacked in the same way. In this subsection, the ratio between the modulus and the Montgomery constant is no longer constant but depends on the selected modulus blinding value . Hence, we extend the notation and write in place of if .
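The data flow of "first modulus blinding then base blinding" can be sketched with plain modular arithmetic. The code below uses Python's built-in `pow` instead of a Montgomery ladder, a non-CRT toy key, and the masking pair built from the public exponent as in the strategy of [13, §10]; all names, the 16-bit blinding factor, and the toy parameters are assumptions for illustration and do not reproduce Algorithms 4 and 5.

```python
import math
import secrets

def blinded_private_op(m, d, e, n, blind_bits=16):
    """Compute m**d mod n with modulus blinding and base blinding."""
    # Modulus blinding: work modulo b * n for a random odd factor b.
    b = secrets.randbits(blind_bits) | 1
    n_blind = b * n
    # Base blinding: pick r invertible modulo n and mask the base with r**e.
    while True:
        r = secrets.randbelow(n - 2) + 2
        if math.gcd(r, n) == 1:
            break
    masked = ((m % n_blind) * pow(r, e, n_blind)) % n_blind
    # Exponentiation on blinded data, then reduction to the real modulus.
    s = pow(masked, d, n_blind) % n
    # Unmask: (r**e)**d = r (mod n), so multiplying by r**-1 removes the mask.
    return (s * pow(r, -1, n)) % n

# Toy key (illustration only).
n, e = 61 * 53, 17
d = pow(e, -1, 60 * 52)
message = 1234
signature = blinded_private_op(message, d, e, n)
assert signature == pow(message, d, n)
```

The unmasking factor is obtained without ever exponentiating with the secret exponent a second time, which is the point of building the masking pair from the public exponent.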
For given modulus blinding factor one has with and . However, the applied modulus blinding factor is unknown. Relevant to our formulae is the normalized modulus blinding factor . We interpret as a realization of a random variable , which assumes values in the finite set . Then Usually the normalized blinding factors should be uniformly distributed on , i.e., each value in should occur with probability . For typical parameters (e.g., for ), the right-hand side of (4.13) can be replaced by (4.14) For reasonable parameter , the deviation of the right-hand term from the exact probability (4.13) is negligible, which should justify the "=" sign. The evaluation of the integral is fairly easy since the integrand is a polynomial in . In fact, for the integrand the integral equals . Another protection strategy would be to select modulus blinding factors uniformly in so that all blinding factors have identical (maximal) length. In this case, assumes each value in with probability , and (4.13) can be expressed by (4.15) In analogy to Section 4, the next step is to estimate and . The equivalents to (4.1) and (4.2) are Substituting and in the proof of Lemma 4.2 by the right-hand terms of (4.16) and of (4.17) (in place of (4.1) and (4.2)) yields equivalents to the formulae (4.5) and (4.6) for the modulus blinding case. Note that the conditional probabilities and depend only on but not on or . More precisely, a careful computation yields The right-hand side of (4.18) differs from (4.5) by the factor , while (4.19) coincides with (4.6) Above we have identified two strategies for the selection of modulus blinding factors, which are of particular interest. If is uniformly distributed on , then . Similarly, if is uniformly distributed on , then .
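The replacement of an average over discrete blinding factors by an integral can be checked numerically for monomials, which suffices because the integrands are polynomials. The sketch assumes odd blinding factors below a power of two, normalized by that power of two, and compares the exact average of the j-th power with the integral over the unit interval, or over its upper half with density 2 when all factors have full length. The function names and the parameter `lam` for the bit length are illustrative.

```python
def mean_power(lam, j, full_length=False):
    """Exact mean of (b / 2**lam)**j over odd b below 2**lam.

    With full_length=True only factors b >= 2**(lam - 1) are used."""
    low = (1 << (lam - 1)) if full_length else 0
    factors = range(low + 1, 1 << lam, 2)          # odd values only
    return sum((b / (1 << lam)) ** j for b in factors) / len(factors)

def integral_power(j, full_length=False):
    """Integral approximation of the same mean."""
    if full_length:
        # Uniform density 2 on [1/2, 1].
        return 2.0 * (1.0 - 0.5 ** (j + 1)) / (j + 1)
    return 1.0 / (j + 1)                            # uniform density on [0, 1]

for j in (1, 2, 3):
    assert abs(mean_power(12, j) - integral_power(j)) < 1e-6
    assert abs(mean_power(12, j, True) - integral_power(j, True)) < 1e-6
```

For j = 1 the two selection strategies give mean normalized factors of 1/2 and 3/4, respectively, and already for 12-bit factors the discrete average and the integral agree to within one part in a million in this sketch.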
Substituting (4.13) (resp., (4.14) or (4.15)) into Lemma 4.3(i) yields analogous assertions for the modulus blinding case. The estimation of and is done as in Lemma 4.4. For different power traces, the blinding factors are selected independently according to the same distribution so that the normalized blinding factors for the power traces may be interpreted as realizations of iid random variables , where is distributed as . With the aforementioned considerations and Lemma 4.6 also applies to the modulus blinding scenario when is calculated as in (4.7), combined with (4.13). Usually, the latter should coincide with (4.14) or (4.15).
Altogether, modulus blinding does not prevent our attack. For power trace it nevertheless reduces the efficiency since , which lowers the probability of extra-reductions. Moreover, the applied blinding factor is unknown, which results in averaged probabilities (4.13). Both effects can be compensated by increasing the sample size. is estimated only once at the beginning of the attack on the basis of all power traces. The intention is to reduce the loss of efficiency caused by the use of averaged probabilities (4.13). Lemma 4.6 could then be applied as in Sections 3.2-4.1 with individual parameters for each power trace. On the negative side, the estimates of the products are less precise than the estimate of in the scenario without modulus blinding since and depend only on the MMs of single power traces, which undermines the intention of this attack variant.
Figure 4 compares the success rate evolution of our attack, using (4.14), for the same three noise levels as in Figure 3, for with modulus randomization uniformly distributed in interval . It can be seen that the value of has little impact on the success rate of the attack, which is in line with (4.14) and (4.15). This is corroborated by the fact that the success rate does not change significantly when the modulus randomization factor is uniformly distributed in and the attack is adapted with (4.15). These success rates are shown in Figure 5. Note that the ratio is the same in the results from Figures 4 and 5, because the modulus (on 512 bits) is the same and the Montgomery constant is also the same, namely, .
The success rate for some modulus randomization factors could be derived from the exact formula (4.13). However, such small blinding factors should be of no practical relevance. For instance,
• when , there exist only two eligible random numbers, namely 1 and 3;
• when , the only four eligible random numbers are ;
• when , the only eight eligible random numbers are .

Experimental results with modulus blinding
If, furthermore, we demand that the blinding factors have full bit length (which corresponds to (4.15)), the situation is even worse. The sets then reduce to , , and , respectively. However, such small sets of admissible modulus blinding factors might allow other, even stronger attacks. Interestingly, the attacks work with about the same success rate as the original attacks [10,11] (before our improvement) in the absence of modulus blinding.
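The observation that modulus blinding lowers the probability of extra-reductions can be illustrated with a small simulation. The sketch below is not the attack itself: it only counts extra-reductions in Montgomery squarings of uniformly random operands and compares the observed rate with the classical stochastic-model estimate N/(3R) for a squaring. The 64-bit modulus, the 16-bit blinding factor, and the choice to enlarge R by the bit length of the blinding factor are illustrative assumptions, as is the helper name `extra_reduction_rate`.

```python
import random

def extra_reduction_rate(modulus, r_bits, trials, rng):
    """Empirical extra-reduction frequency of Montgomery squarings.

    modulus must be odd and smaller than R = 2**r_bits.
    """
    r = 1 << r_bits
    r_mask = r - 1
    # n_prime = -modulus^{-1} mod R, the usual Montgomery constant
    n_prime = (-pow(modulus, -1, r)) % r
    hits = 0
    for _ in range(trials):
        x = rng.randrange(modulus)           # uniformly random operand
        t = x * x
        m = ((t & r_mask) * n_prime) & r_mask
        u = (t + m * modulus) >> r_bits      # exact division by R
        hits += u >= modulus                 # extra-reduction needed?
    return hits / trials

if __name__ == "__main__":
    rng = random.Random(2024)
    n = 0xC96F3B1D5A7E9A3B                   # hypothetical odd 64-bit modulus
    blind = 0x8001                           # hypothetical odd 16-bit blinding factor
    plain = extra_reduction_rate(n, 64, 20000, rng)
    blinded = extra_reduction_rate(blind * n, 80, 20000, rng)
    print("unblinded:", plain, "model:", n / (3 * 2**64))
    print("blinded:  ", blinded, "model:", blind * n / (3 * 2**80))
```

With these assumed parameters the blinded modulus fills a smaller fraction of R, so fewer extra-reductions occur per trace, which is consistent with the need for a larger sample size.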

The combination of base blinding with exponent blinding
Assume that base blinding is combined with (additive) exponent blinding, which means that the exponent is replaced by for some randomly selected exponent blinding factor . Our attack cannot be transferred to this situation since (4.11) assumes that is the same for all power traces. It should be noted, however, that if, for example, single-trace template attacks provide a significant advantage over blind guessing of the exponent bits, a successful attack may be possible anyway; see [27,28] for details. The techniques developed in [27] obviously apply to the Montgomery ladder as well. The knowledge of the extra-reductions alone does not yet give sufficient advantage over blind guessing for single power traces. Sufficient advantage might be achieved by exploiting further features of the power traces, but this is not within the scope of this paper.
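As a minimal illustration of why exponent blinding breaks the assumption of a fixed exponent across traces, the sketch below uses the usual additive form d' = d + r·φ(N). That form, the toy RSA parameters, and the helper name `blind_exponent` are assumptions made for this example, not taken from the text above.

```python
import secrets

def blind_exponent(d, phi_n, blind_bits=64, r=None):
    """Return d + r * phi_n for a random (or caller-supplied) blinding factor r."""
    if r is None:
        r = secrets.randbits(blind_bits)     # fresh randomness per exponentiation
    return d + r * phi_n

if __name__ == "__main__":
    # Toy parameters only: N = 61 * 53, phi(N) = 60 * 52, e = 17, d = e^{-1} mod phi(N)
    n, phi_n, d = 3233, 3120, 2753
    c = 2790
    d1 = blind_exponent(d, phi_n)
    d2 = blind_exponent(d, phi_n)
    # Both blinded exponents give the same result as d ...
    print(pow(c, d1, n) == pow(c, d, n), pow(c, d2, n) == pow(c, d, n))
    # ... although their bit patterns, and hence the ladder's control data, differ.
    print(bin(d1)[:20], bin(d2)[:20])
```

Since each execution processes a different bit string, per-bit statistics cannot be accumulated over many traces in the way the attack above requires.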

Countermeasures
In Table 1 and in Section 4.3, several countermeasures were addressed and analyzed. In particular, even the combination of base blinding and modulus blinding does not prevent our attack. An option is to add exponent blinding, resulting in the combination (base blinding and exponent blinding) or (base blinding, modulus blinding, and exponent blinding). In the absence of additional leakage, to the best of our knowledge, no attack on these combinations is known (Section 4.3).
The most solid solution, of course, is to avoid extra-reductions altogether. Following an idea of C. Walter, one can dispense with extra-reductions entirely if the Montgomery constant is not only larger than but if ([29], Theorems 3 and 6). In this case, the intermediate values of the Montgomery operations within the exponentiation algorithm are always between but they do not "explode."
Currently, the OpenSSL library uses another strategy. Indeed, most security standards prescribe that be chosen with a size that is a multiple of the machine word size (typically 1024, 2048, 3072, and 4096 bits, which are all multiples of 32 and even of 64 bits). Therefore, the abovementioned strategy of C. Walter requires that an extra limb (machine word encoding on radix in the representation of a big number) be allocated for each intermediate variable, which is considered too high an overhead. For this reason, OpenSSL disguises the extra-reduction in a constant-time SLA, a technique already mentioned in Section 2.1. Namely, a mask of size bits ( is the size of the modulus) is computed to be equal to (i.e., 0xFF...FF in hexadecimal) when an extra-reduction is required or to (i.e., 0x00...00 in hexadecimal) when no extra-reduction is needed. Subsequently, the quantity (word obtained by bitwise logical AND of bits from and ) is subtracted from the result of the MM. This quantity is either 0 or , depending on whether an extra-reduction is needed. This strategy implements an SLA. Such a coding style is, as of today, believed to be secure against cache-timing attacks because the control flow is data independent. However, the authors warn that the strategy of OpenSSL might not perfectly hide the extra-reduction if the attacker is able to partition power or electromagnetic side-channel traces based on the value of , since the absence of an extra-reduction involves a conspicuous subtraction of a big number equal to zero.
Such a bias has already been exploited in the past by attacks such as the Refined Power Analysis [12] or the Zero Power Analysis [4]. Note that OpenSSL is nowadays used in embedded systems (microcontrollers, Internet of Things devices, smartphones [5,18], etc.), which are indeed attackable with power and electromagnetic side-channel analyses.
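The masked final subtraction described above can be sketched as follows. This is an illustration of the data flow only, not OpenSSL's code, and Python integers do not execute in constant time; the function name `masked_final_subtraction` and its parameters are hypothetical. The sketch assumes the MM result `u` lies in [0, 2N) and that N fits in `k_bits` bits.

```python
def masked_final_subtraction(u, modulus, k_bits):
    """Return u - modulus if u >= modulus, else u, with no branch on the comparison."""
    all_ones = (1 << k_bits) - 1
    # (u - modulus) >> (k_bits + 1) is -1 when the difference is negative, else 0
    # (Python's right shift on negative integers is arithmetic).
    needs_reduction = 1 + ((u - modulus) >> (k_bits + 1))   # 1 or 0
    mask = (-needs_reduction) & all_ones                    # 0xFF...FF or 0x00...00
    # The subtraction is always executed; only the subtrahend (modulus or 0) varies.
    return u - (modulus & mask)

if __name__ == "__main__":
    n = 13                                   # toy 4-bit modulus
    for u in (10, 13, 20):
        print(u, "->", masked_final_subtraction(u, n, 4))
```

The same sequence of operations runs in both cases, which addresses timing leakage. The subtrahend is, however, either the modulus or an all-zero word, and this data-dependent difference is exactly what a power or electromagnetic adversary may try to distinguish.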

Conclusion
In [10,11], ERA exploiting the dependency of two consecutive MMs was applied to attack RSA implementations that use the Montgomery ladder or the always square and multiply exponentiation algorithm. Base blinding does not prevent this attack. Although both attacks were successful, they did not exploit all the available information. In this paper, we followed the strategy in [1,2,21]: we formulated and analyzed a stochastic process tailored to the stochastic behavior of the extra-reductions in the Montgomery ladder. This sophisticated strategy allowed us to exploit all the given information in an optimal way. Practical experiments showed that the new method reduces the sample size by a factor greater than 2 (to 40% of the original sample size). Our new attack can be transferred directly to the always square and multiply algorithm. Moreover, we presented a generalization of our attack, which cannot be prevented even by the combination of base blinding with modulus blinding. This generalization of our attack is efficient, too.
Funding information: This work has benefited from partial funding via TeamPlay (https://teamplay-h2020.eu/), a project from the European Union's Horizon 2020 research and innovation program, under grant agreement no. 779882. The analysis methods have been integrated into Secure-IC Laboryzr tools (https://www.secure-ic.com/solutions/laboryzr/).

Conflict of interest:
Authors state no conflict of interest.