CC BY 4.0 license. Open Access. Published by De Gruyter, April 20, 2021.

Stochastic methods defeat regular RSA exponentiation algorithms with combined blinding methods

  • Margaux Dugardin, Werner Schindler and Sylvain Guilley

Abstract

Extra-reductions occurring in Montgomery multiplications disclose side-channel information which can be exploited even in stringent contexts. In this article, we derive stochastic attacks to defeat Rivest-Shamir-Adleman (RSA) protected by Montgomery-ladder regular exponentiation coupled with base blinding. Namely, we leverage precharacterized multivariate probability mass functions of extra-reductions between pairs of (multiplication, squaring) operations in one iteration of the RSA algorithm and the next one(s) to build a maximum likelihood distinguisher. The efficiency of our attack (in terms of required traces) is more than twice that of the state of the art. In addition, we also apply our method to the case of regular exponentiation combined with base blinding and modulus blinding. Quite surprisingly, modulus blinding does not make our attack impossible, even for large sizes of the modulus-randomizing element. At the cost of larger sample sizes, our attacks tolerate noisy measurements. Fortunately, effective countermeasures exist.

MSC 2010: 94A60; 60G99

1 Introduction

It has been noted by Kocher [13] as early as 1996 that asymmetric cryptographic algorithms are prone to side-channel attacks. Countermeasures have been developed with a view to making these attacks either impossible or at least much harder to perform. There are several countermeasure principles. A first class consists in balancing the control flow so that execution traces superimpose perfectly whatever the value of the secrets. A second important class of countermeasures consists in deceiving correlation attempts by the attacker on side-channel traces. The strategy consists in randomizing algorithm inputs or internal parameters, so that the computation is carried out on unpredictable data. Obviously, the randomization is restricted, since it must be possible to unravel the injected randomness at the end of the computation.

In this article, we focus on the Rivest-Shamir-Adleman (RSA) cryptosystem, specifically when it uses its secret exponent $k$. Despite the balancing and randomization countermeasures, attackers will persistently attempt to recover $k$. But in order to bypass protections, the attacker needs to resort to more evolved strategies. We make a difference between attacks which can be carried out in one single trace and those which require multiple traces (since there is not enough information in a single trace). An attack which succeeds with one single trace can overcome any algorithmic countermeasure:[1] basically, against randomizing countermeasures, it will recover the randomized version of some sensitive value, but this randomized value is still sufficient for the adversary to behave as if he knew the secret. As an example, in the case of exponent blinding, instead of computing $m^k \bmod N$ (where $m$ is the base, $k$ is the secret exponent, and $N$ is the modulus), the side-channel protected RSA computes $m^{k + \varphi(N)} \bmod N$ (where $\varphi$ is the Euler totient function). Those two quantities are equal owing to Euler's theorem; hence, it does not matter if the attacker recovers $k' = k + \varphi(N)$ in lieu of $k$: in both cases he can forge valid signatures or decrypt messages correctly. Indeed, $k'$ is equivalent to $k$ for the purpose of signature generation or decryption. When attacks require some kind of averaging, then randomization countermeasures do work in concealing the secret, at least if the randomness is refreshed at each new computation. However, the balancing countermeasures do not deceive an attacker who averages traces, because averaging always the same execution allows the attacker to increase the signal-to-noise ratio (SNR).
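As a quick sanity check of this equivalence, the following Python snippet (our own illustration, with toy parameters chosen purely for demonstration) verifies that the blinded exponent $k' = k + \varphi(N)$ yields the same result as $k$:

```python
# Toy illustration of exponent blinding: k' = k + phi(N) produces the same
# modular exponentiation result as k (all parameters below are illustrative).
p, q = 61, 53                # small primes, for demonstration only
N = p * q
phi = (p - 1) * (q - 1)      # Euler totient of N
k = 17                       # "secret" exponent, coprime with phi
k_blinded = k + phi

m = 42                       # base, coprime with N
assert pow(m, k, N) == pow(m, k_blinded, N)
```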

In practice, the attacks which succeed in a single trace are the most dangerous, and implementers defend their implementation against them in the first place. The so-called simple power analysis (SPA [14, §2]), introduced in 1999, allows an attacker to read out the exponent from one trace. Therefore, the usual countermeasure consists in the implementation of a regular exponentiation algorithm. In RSA, a so-called "regular algorithm" is a method to compute the modular exponentiation using a key-independent sequence of squaring and multiplication operations. Examples of regular exponentiation algorithms are the Montgomery ladder (treated in this paper), the square and multiply always algorithm, or fixed-window exponentiation with an explicit multiplication even if the exponent bits in the current window are all equal to zero [15, Algorithm 14.82].

Thus, regular exponentiation is a protection against simple trace analysis, where the attacker attempts to derive the exponent by observing one computation (or several identical ones). The regular exponentiation countermeasure against SPA plugs the leak, but in the meantime it keeps traces corresponding to various executions properly aligned. This works to the advantage of the adversary, in that such unfortunate alignment opens the door to differential power analyses, as discussed in [14, §5], to template attacks [5], or to machine learning attacks [19]. Since those attacks require collecting several traces from the same inputs (for averaging in order to increase the SNR), they are combated by randomizing countermeasures. For instance, the base of the RSA can be randomized at the input, while being consistently derandomized at the output. Another option to randomize the intermediate computations is to randomize the modulus (so-called "modular extension"). This second option also allows a sanity check of the computation, which is incidentally a countermeasure against fault injection attacks [7]. We insist that all three countermeasures might well be stacked one on top of the other, so as to thwart simple power attacks, differential power attacks, and perturbation attacks altogether. As an alternative to a regular exponentiation algorithm, or even as a complement to it, the secret exponent can be protected by blinding, as explained earlier.

2 Previous work and our contributions

2.1 State-of-the-art

In this article, we analyze possible remaining biases, namely extra-reductions, which are inherent to the modular multiplication algorithm.

Given two integers a and b , the classical modular multiplication a × b mod p computes the multiplication a × b followed by the modular reduction by p . Montgomery Modular Multiplication (MMM) transforms a and b into special representations known as their Montgomery forms.

Definition 2.1

(Montgomery transformation [16]) For any modulus $p$, the Montgomery form of $a \in \mathbb{F}_p$ is $\phi(a) = a \times R \bmod p$ for some constant $R$ greater than and coprime with $p$.

In order to ease the computation, $R$ is usually chosen as the smallest power of two greater than $p$, that is, $R = 2^{\lceil \log_2(p) \rceil}$. Using the Montgomery form of integers, modular multiplications used in modular exponentiation algorithms can be carried out using the MMM:

Definition 2.2

(MMM [16]) Let $\phi(a)$ and $\phi(b)$ be two elements of $\mathbb{F}_p$ in Montgomery form. The MMM of $\phi(a)$ and $\phi(b)$ is $\phi(a) \times \phi(b) \times R^{-1} \bmod p$.

Proposition 2.3

(MMM correction [15, §14.36]) The output of the MMM of $\phi(a)$ and $\phi(b)$ is $\phi(ab)$.

Algorithm 1 shows that the MMM can be implemented in two steps:

  1. compute D = ϕ ( a ) × ϕ ( b ) , then

  2. reduce D using Montgomery reduction which returns ϕ ( c ) .

In Algorithm 1, the pair $(R^{-1}, v)$ is such that $R R^{-1} - v p = 1$.

Algorithm 1

   Input: D = ϕ(a) × ϕ(b)
   Output: ϕ(c) = ϕ(a) × ϕ(b) × R⁻¹ mod p
1  m ← (D mod R) × v mod R
2  U ← (D + m × p) / R        // Invariant: 0 ≤ U < 2p
3  if U ≥ p then
4      C ← U - p              // Extra-reduction
5  else C ← U
6  return C

Montgomery reduction (Algorithm 14.32 of [15])

Definition 2.4

(Extra-reduction) In Algorithm 1, when the intermediate value $U$ is at least $p$, a subtraction named eXtra-reduction occurs so as to bring the result $C$ of the Montgomery multiplication (MM) back between $0$ and $p - 1$. We set $X = 1$ in the presence of the extra-reduction, and $X = 0$ in its absence.
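To make the extra-reduction concrete, here is a minimal Python sketch of Algorithm 1 (function and variable names are our own choices) which returns both the reduced value and the extra-reduction indicator $X$:

```python
import random

def montgomery_reduce(D, p, R, v):
    """Montgomery reduction (Algorithm 1): returns (C, X) with
    C = D * R^{-1} mod p and X = 1 iff an extra-reduction occurred."""
    m = ((D % R) * v) % R          # line 1: m <- (D mod R) * v mod R
    U = (D + m * p) // R           # line 2: exact division; 0 <= U < 2p
    if U >= p:                     # line 3
        return U - p, 1            # line 4: extra-reduction (X = 1)
    return U, 0                    # line 5: no extra-reduction (X = 0)

# Setup for a toy modulus: R is the smallest power of two above p,
# and v satisfies R*R^{-1} - v*p = 1, i.e., v = -p^{-1} mod R.
p = 2**31 - 1                      # toy prime modulus
R = 1 << p.bit_length()            # R = 2^ceil(log2(p))
v = (-pow(p, -1, R)) % R

# Example: reduce D = phi(a) * phi(b) for random a, b.
a, b = random.randrange(p), random.randrange(p)
phi_a, phi_b = a * R % p, b * R % p
C, X = montgomery_reduce(phi_a * phi_b, p, R, v)
assert C == (a * b * R) % p        # C = phi(a*b), cf. Proposition 2.3
```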

As we shall explain, this side channel is induced by the choice of moduli whose bitwidth is exactly divisible by the word size of the computer; namely, this word size is typically a power of two, such as 16, 32, or 64. This bias has given rise to the so-called extra-reduction analysis (ERA). An overview of known ERAs is provided in Table 1. Specifically, this table shows which countermeasure can be bypassed by which attack. The classification criteria in Table 1 are listed as follows:

  • the implementation uses the Chinese Remainder Theorem (CRT), i.e., the moduli p and q are unknown to the attacker,

  • the protection against differential power analysis named the base blinding,

  • the protection against SPA named the regular exponentiation algorithm,

  • the compensation of the extra-reduction by a fake operation, which is named constant time nonstraight line algorithm (N-SLA), i.e., constant operations have their fixed values identified by software.[2] In principle (at least with a reasonable probability), these countermeasures might be detected and nullified by a suitable side-channel attack. In Table 1, we assume that such side-channel attacks exist,

  • identical execution times are ensured by avoiding extra-reductions altogether, which is named constant time straight line algorithm (SLA). Obviously, the attacks listed in Table 1 cannot work in this case, see also Section 5,

  • the protection against differential power analysis named the exponent blinding, and

  • the fault and differential protection named modular extension.

Table 1

Summary of the capabilities of extra-reduction analyses published before December 2020

| Attack | With RSA-CRT | Base blinding | Regular algorithm | Constant time N-SLA | Constant time SLA | Exponent blinding | Modular extension |
|---|---|---|---|---|---|---|---|
| ERA-1a [9,13,22,25] | No | No | No | No | No | No | No |
| ERA-1b [3,6,8,20] | Yes | No | Yes | No | No | No | No |
| ERA-2 [23,24] | Yes | No | No | No | No | Yes | No |
| ERA-L1 [1,2,21,26,30] | Yes/No | Yes | Yes | No | No | No | No |
| ERA-L2 [10,11] | Yes/No | Yes | Yes | Yes | No | No | No |
| This work | Yes/No | Yes | Yes | Yes | No | No | Yes |

The algorithms from ERA-1a, ERA-1b, and ERA-2 are pure (global) timing attacks. Of course, by definition, pure timing attacks cannot overcome constant-time implementations. While the pure timing attacks are very different for CRT implementations and for non-CRT implementations, the local timing attacks from ERA-L1 and ERA-L2 work for CRT and non-CRT implementations alike. More precisely, these local attacks are a little easier to perform on non-CRT implementations because the ratio $p/R$ (and sometimes also the value $(R^2 \bmod p)/p$) does not have to be estimated there. For these reasons, we did not distinguish between CRT and non-CRT there. The pioneering papers [9,30] are significantly less efficient than their successors in the respective ERA (by up to a factor of 50) and less general [30]. The difference between ERA-L1 and ERA-L2 is that with ERA-L2, the attacker is capable of probing the cache to distinguish between two different execution paths of otherwise identical duration and power leakage, whereas with ERA-L1, the attacker is restricted to observing the duration or the power leakage. Arguably, this difference resides more in the side-channel collection than in its analysis.

Remark

The terminology in Table 1 deserves some attention. Historically, ERA-1a, ERA-1b, and ERA-2 are pure timing attacks, discovered in this order. Similarly, ERA-L1 and ERA-L2 are local timing attacks, discovered in this order. But some papers about ERA-1b were published after the papers from ERA-L1, and vice versa.

In [10,11], side-channel attacks on RSA, with CRT and without CRT, were investigated using leakage of the presence or absence of extra-reductions in the MMM. The side-channel information was used to identify which MMs require extra-reductions. Two exponentiation algorithms were considered, namely the square and multiply always exponentiation and the Montgomery ladder. The overall attacks split into many individual decisions whether $k_i = k_{i-1}$ or $k_i \neq k_{i-1}$, where $k_i$ and $k_{i-1}$ denote subsequent key bits. The presented attacks were successful, but for these decisions only two of the four Montgomery operations (one squaring and one multiplication) were exploited. However, extending this approach is too complex: the derivation of the probability mass function (PMF) of the values for multiple operations becomes mathematically intractable when the number of operations analyzed jointly is strictly greater than two.

2.2 Novel contributions

For these reasons, in this article, we resort to another way of estimating the distribution of the extra-reductions which does not need the estimation of PMF values. We leverage previous work of Schindler [21]: this paper simplifies the characterization of the extra-reduction distribution using two elegant properties of the MMM.

Using sophisticated stochastic methods, we solve the problem and improve the efficiency of [10,11] in the presence of regular exponentiation and base blinding.

Moreover, we extend the results to the case where the modulus is itself randomized. We show that ERA remains a powerful side-channel attack despite the stacking of three protections, namely, regular exponentiation and base and modulus blinding. We performed our experiments on 1024-bit RSA moduli as this allows a fair comparison of the attack efficiency with the experimental results in [10,11].

This manuscript contains joint research work from the years 2016-2018. We mention that parts of an intermediate version of this paper have been incorporated into the PhD thesis of the lead author.

2.3 Outline

The rest of this paper is organized as follows. We start by giving our optimized attack in Section 3. Namely, we recapitulate in Section 3.1 the background needed to optimize the state of the art when RSA uses a regular algorithm (we focus on the so-called Montgomery ladder) and base blinding. The core of our attack is presented in Section 3.2. Evaluation with both perfect and noisy measurements is conducted in Section 4, where we also consider the "modular extension" as a third countermeasure on top of regular exponentiation and base blinding. Finally, countermeasures are addressed in Section 5, and conclusions are drawn in Section 6. Some formal computation results are given in the Appendix (Section 7).

3 The optimized attack: the stochastic background

In this section, we optimize the attack from [10,11]. We begin with definitions and we formulate the target of our attack in Section 3.1. In Section 3.2, we analyze the stochastic properties of the MM, and in Lemma 3.4 we develop a formula for the joint probability of several extra-reductions. The following subsections treat the estimation of two parameters, which are usually unknown, and the maximum likelihood estimator is derived.

3.1 Definitions and target of the attack

In this paper, we only consider the (left-to-right) Montgomery ladder, which is described in Algorithm 2. Unlike [10,11], we do not consider the square and multiply always algorithm (cf. Algorithm 1.1 in [11]). It is obvious how the applied mathematical methods can be transferred to the square and multiply always exponentiation algorithm.

We assume that the message $m$ has been blinded (message blinding, a.k.a. base blinding). The attack applies to both RSA with CRT and RSA without CRT. We further assume that the arithmetic operations apply Montgomery's multiplication algorithm [17]. As in [10,11], we assume that a side-channel attack yields the (possibly noisy) information on which MMs need extra-reductions. The applied mathematical techniques are similar to those in [1,2,21], where attacks on different variants of fixed-window exponentiation algorithms [2,21] and the sliding-window exponentiation algorithm [1] were analyzed thoroughly.

To avoid clumsy formulations, we always target RSA with CRT in the following, where $p$ denotes one prime factor of the RSA modulus $n$. We note that the attack on RSA without CRT works identically and is even simpler since there is no need to estimate the ratio $n/R$ (which is the ratio of two public parameters).

Definition 3.1 introduces the notation necessary to understand this paper.

Definition 3.1

For $i = l-1, l-2, \ldots, 0$ and $j = 0, 1$, the term $r_{i,j}$ denotes the value of register $R_j$ after the key bit $k_i$ has been processed. Furthermore, $s_{i,j} := r_{i,j}/p \in [0,1)$ stands for the normalized register values. For $i = l-2, \ldots, 0$, we set $w_i^{(M)} = 1$ if the first Montgomery operation for key bit $k_i$ ("multiplication") needs an extra-reduction (ER) and $w_i^{(M)} = 0$ otherwise. Analogously, $w_i^{(Q)} = 1$ if the second Montgomery operation for key bit $k_i$ ("squaring," or "Quadrierung" in German; we apply "Q" in place of "S" to prevent confusion with the stochastic process $S_{i,j}$ defined below) needs an ER and $w_i^{(Q)} = 0$ otherwise. We recall that in the context of random variables the abbreviation "iid" stands for "independent and identically distributed." The indicator function $1_A(x)$ assumes the value $1$ if $x \in A$ and $0$ else. For $b \in \mathbb{Z}$, the term $b \pmod p$ denotes the unique element in $\mathbb{Z}_p = \{0, 1, \ldots, p-1\}$ which is congruent to $b$ modulo $p$. The letter $R$ denotes the Montgomery constant $R = 2^x$ for some integer $x \geq \log_2 p$ (usually, $x = \lceil \log_2 p \rceil$). When $b$ is a real number, the term $b \pmod p$ denotes the real number $b - \lfloor b/p \rfloor p$. Finally, for $a, b \in \mathbb{Z}_p$ we define $\mathrm{MM}(a, b; p) := a b R^{-1} \pmod p$ (MM, as per Definition 2.2).

Algorithm 2

   Input: m, k = (k_{l-1} k_{l-2} ... k_0)₂, p (with k_{l-1} = 1 and k_0 = 1)
   Output: m^k mod p
1  R_0 ← MM(m, R²; p)
2  R_1 ← MM(R_0, R_0; p)                  // First square
3  for i = l-2 down to 0 do
4      R_{¬k_i} ← MM(R_0, R_1; p)         // operation i(M)
5      R_{k_i} ← MM(R_{k_i}, R_{k_i}; p)  // operation i(Q)
6  return MM(R_0, 1; p)

Left-to-right Montgomery ladder with MM algorithm

We note that $\mathrm{MM}(m, R^2; p) \equiv mR \pmod p$ and $\mathrm{MM}(R_0, 1; p) \equiv R_0 R^{-1} \pmod p$ (cf. lines 1 and 6 of Algorithm 2). Besides, the key $k$ is chosen of full length (hence $k_{l-1} = 1$) and must be coprime with $p - 1$, which is even (as $p$ is an odd prime); therefore, $k$ is odd (hence $k_0 = 1$). This gives two bits of information to an attacker for free. The index $l$ may be determined by an SPA. Moreover, it suffices to recover the exponent $k$ for the exponentiation modulo $p$: if $d$ denotes the secret RSA key and if $y = x^d \pmod n$, then $\gcd((x^k - y) \bmod n, n) = p$, which factorizes the modulus $n$ (see, e.g., [21], Section 6).
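For simulation purposes, the following Python transcription of Algorithm 2 (our own sketch, reusing the Montgomery reduction logic of Algorithm 1) records the extra-reduction pairs $(w_i^{(M)}, w_i^{(Q)})$ which our attack exploits:

```python
def mm(a, b, p, R, v):
    """Montgomery multiplication MM(a, b; p) = a*b*R^{-1} mod p.
    Returns (result, extra_reduction_flag)."""
    D = a * b
    m = ((D % R) * v) % R
    U = (D + m * p) // R
    return (U - p, 1) if U >= p else (U, 0)

def ladder_with_er_trace(msg, k_bits, p):
    """Left-to-right Montgomery ladder (Algorithm 2); returns m^k mod p
    and the list of extra-reduction pairs (w_i^(M), w_i^(Q))."""
    R = 1 << p.bit_length()
    v = (-pow(p, -1, R)) % R
    R0, _ = mm(msg, R * R % p, p, R, v)       # line 1: R0 <- m*R mod p
    R1, _ = mm(R0, R0, p, R, v)               # line 2: first square
    regs, trace = [R0, R1], []
    for ki in k_bits[1:]:                     # bits k_{l-2} ... k_0
        regs[1 - ki], wM = mm(regs[0], regs[1], p, R, v)   # operation i(M)
        regs[ki], wQ = mm(regs[ki], regs[ki], p, R, v)     # operation i(Q)
        trace.append((wM, wQ))
    result, _ = mm(regs[0], 1, p, R, v)       # line 6: leave Montgomery form
    return result, trace

# Example: check against Python's built-in modular exponentiation.
p = 2**127 - 1                                # toy prime modulus
k_bits = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]       # k_{l-1} = 1 and k_0 = 1
k = int("".join(map(str, k_bits)), 2)
res, trace = ladder_with_er_trace(123456789, k_bits, p)
assert res == pow(123456789, k, p)
```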

3.2 The core of our attack

We interpret the $s_{i,j}$ as realizations of random variables $S_{i,j}$ (i.e., values taken on by $S_{i,j}$), which assume values in $[0,1)$. Analogously, we view $w_i^{(M)}$ and $w_i^{(Q)}$ as realizations of $\{0,1\}$-valued random variables $W_i^{(M)}$ and $W_i^{(Q)}$. Lemma 3.2(i) and (ii) collect known stochastic properties of Montgomery's multiplication algorithm, while assertions (iii) and (iv) follow the strategy that has proven successful for fixed-window exponentiation in [2,21].

Lemma 3.2

(MM)

  (i) $\mathrm{MM}(a, b; p)$ requires an extra-reduction iff

    (3.1) $\dfrac{a}{p} \cdot \dfrac{b}{p} \cdot \dfrac{p}{R} + \dfrac{(abv) \bmod R}{R} \geq 1$, or equivalently, iff $\dfrac{\mathrm{MM}(a, b; p)}{p} < \dfrac{a}{p} \cdot \dfrac{b}{p} \cdot \dfrac{p}{R}$.

  (ii) Assume that $a \in \mathbb{Z}_p$ and that the random variable $B$ is uniformly distributed on $\mathbb{Z}_p$. Furthermore, $U$ and $V$ denote independent random variables which are uniformly distributed on $[0,1)$. Then approximately

    (3.2) $P\!\left(\dfrac{a}{p} \cdot \dfrac{B}{p} \cdot \dfrac{p}{R} + \dfrac{(aBv) \bmod R}{R} \geq 1\right) = P\!\left(\dfrac{a}{p} \cdot \dfrac{p}{R}\, U + V \geq 1\right) = \dfrac{p}{2R} \cdot \dfrac{a}{p},$

    (3.3) $P\!\left(\dfrac{B}{p} \cdot \dfrac{B}{p} \cdot \dfrac{p}{R} + \dfrac{(B^2 v) \bmod R}{R} \geq 1\right) = P\!\left(\dfrac{p}{R}\, U^2 + V \geq 1\right) = \dfrac{p}{3R}.$

  (iii) The random variables $S_{l-1,0}, S_{l-1,1}, S_{l-2,0}, \ldots, S_{0,0}, S_{0,1}$ may be viewed as iid uniformly distributed on $[0,1)$.

  (iv) For $i = l-2, \ldots, 0$, we have

    (3.4) $W_i^{(M)} = \begin{cases} 1_{\{S_{i,1} < S_{i+1,0} S_{i+1,1}\, p/R\}} & \text{if } k_i = 0, \\ 1_{\{S_{i,0} < S_{i+1,0} S_{i+1,1}\, p/R\}} & \text{if } k_i = 1, \end{cases}$

    (3.5) $W_i^{(Q)} = \begin{cases} 1_{\{S_{i,0} < S_{i+1,0}^2\, p/R\}} & \text{if } k_i = 0, \\ 1_{\{S_{i,1} < S_{i+1,1}^2\, p/R\}} & \text{if } k_i = 1. \end{cases}$

  (v) For the indicator functions, we obtain

    (3.6) $1_{\{W_i^{(M)} = 1\}} = W_i^{(M)}, \qquad 1_{\{W_i^{(M)} = 0\}} = 1 - W_i^{(M)},$

    (3.7) $1_{\{W_i^{(Q)} = 1\}} = W_i^{(Q)}, \qquad 1_{\{W_i^{(Q)} = 0\}} = 1 - W_i^{(Q)}.$

Proof

Assertions (i) and (ii) are shown in [22] (see Lemma A.3 and its proof on page 209). The core idea of the approximate representations (3.2) and (3.3) is that a small deviation of the random variable $B$ (resp. of $B/p$) causes only a small deviation of the first summand but implies an "uncontrolled large" deviation of the second summand over the unit interval. We note that if $U$ and $V$ are independent, then $U$ and $(a/R)U + V \pmod 1$ are independent, too. Since the base $m$ (Algorithm 2) has been base-blinded, we may assume that $s_{l-1,0} = r_{l-1,0}/p$ is a realization of a random variable $S_{l-1,0}$ which is uniformly distributed on the unit interval $[0,1)$. Following (3.3), we further assume that $S_{l-1,1}$ is also uniformly distributed on $[0,1)$ and that $S_{l-1,0}$ and $S_{l-1,1}$ are independent (see also Remark 3.3). Now let us assume that the random variables $S_{l-1,0}, S_{l-1,1}, \ldots, S_{i+1,1}$ are iid uniformly distributed on $[0,1)$. If $(k_i, k_{i-1}) = (0,0)$, we may replace $(a/p)$, $U$ (approximation of $B/p$), and $V$ in (3.2) by $S_{i+1,0}$, $S_{i+1,1}$, and $V_{i,0}$, and analogously $U$ and $V$ in (3.3) by $S_{i+1,0}$ and $V_{i,1}$, where $V_{i,0}$ and $V_{i,1}$ are uniformly distributed on $[0,1)$ and independent of $S_{l-1,0}, \ldots, S_{i+1,1}$. Furthermore, the assumption that $V_{i,0}$ and $V_{i,1}$ are independent seems to be reasonable. This assumption finally implies that the random variables $S_{l-1,0}, \ldots, S_{i,1}$ are independent. Formula (3.4) follows from (3.1) if we replace the terms $(a/p)$ and $(B/p)$ by $S_{i+1,0}$ and $S_{i+1,1}$ (cf. (3.2)), and further $\mathrm{MM}(a, b; p)/p$ by $S_{i,1}$. Analogously, to verify (3.5) one replaces in (3.1) the terms $(B/p)$ and $\mathrm{MM}(a, b; p)/p$ by $S_{i+1,0}$ and $S_{i,0}$, respectively. The cases $(k_i, k_{i-1}) \in \{(1,0), (0,1), (1,1)\}$ are treated similarly. Assertion (v) follows immediately from the definition of indicator functions. This completes the proof of Lemma 3.2.□

Remark 3.3

(The independence assumption) A central assertion of Lemma 3.2, which is used in Lemma 3.4, is that the random variables $S_{i,j}$ may be viewed as iid uniformly distributed on $[0,1)$. This property has been deduced from the (approximate) stochastic representations (3.2) and (3.3). In a strict sense, this claim is certainly not correct, e.g., because the normalized register values $r_{i,j}/p$ only assume values in the finite set $\mathbb{Z}_p/p \subseteq [0,1)$, and, to mention just one missing number-theoretical property, the $r_{i,j}$ cannot assume non-quadratic residues in $\mathbb{Z}_p$. However, this is not relevant for our purposes since we are only interested in the (joint) probabilities of extra-reductions. These events can be characterized by "metric" conditions in $\mathbb{R}$ (cf. (3.1), (3.2), (3.3)). It should be noted that the iid assumption on the normalized intermediate random variables of the exponentiation algorithm (here: the $S_{i,j}$) has proven successful, e.g., in [2,3,20,21,22], and it will turn out to be successful in the following, too.

The overall attack consists of many individual decisions (which nevertheless influence each other). Each of these attack steps (decisions) considers simultaneously all MMs which are carried out when $u$ consecutive key bits $(k_i, \ldots, k_{i-u+1})$ are processed. Lemma 3.4 is the core of our attack. It provides the probabilities which are needed later in Lemma 4.6 (maximum likelihood decision strategy).

Lemma 3.4

Let $u \geq 2$ and $\theta = (\theta_1, \ldots, \theta_u) \in \{0,1\}^u$.

  (i) The term (3.8) quantifies the probability that the extra-reduction vector $(w_i^{(M)}, w_i^{(Q)}, \ldots, w_{i-u+1}^{(M)}, w_{i-u+1}^{(Q)})$ occurs if $(k_i, \ldots, k_{i-u+1}) = (\theta_1, \ldots, \theta_u)$. The probabilities are expressed by integrals over $[0,1]^{2u+2}$. The subscript $\theta$ indicates the dependency on the hypothesis $\theta$.

    (3.8) $P_\theta\!\left(W_i^{(M)} = w_i^{(M)}, W_i^{(Q)} = w_i^{(Q)}, \ldots, W_{i-u+1}^{(M)} = w_{i-u+1}^{(M)}, W_{i-u+1}^{(Q)} = w_{i-u+1}^{(Q)}\right) = \int_0^1 \int_0^1 \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_{2u-1}}^{b_{2u-1}} \int_{a_{2u}}^{b_{2u}} 1 \; ds_{i-u+1,1}\, ds_{i-u+1,0} \cdots ds_{i,1}\, ds_{i,0}\, ds_{i+1,1}\, ds_{i+1,0}.$

    Note: when the key bit $k_j$ (for $j \in \{i, i-1, \ldots, i-u+1\}$) is processed, the register value $R_v$ ($v \in \{0,1\}$) corresponds to the integration variable $s_{j,v}$. The integration boundaries $(a_{2j}, b_{2j})$ and $(a_{2j-1}, b_{2j-1})$ correspond to the integration with regard to the variables $s_{i-j+1,1}$ and $s_{i-j+1,0}$, respectively ($j = 1, \ldots, u$). The integration boundaries depend on the hypothesis $\theta = (\theta_1, \ldots, \theta_u)$ and the observed extra-reduction vector $(w_i^{(M)}, w_i^{(Q)}, \ldots, w_{i-u+1}^{(M)}, w_{i-u+1}^{(Q)})$. More precisely, for $j \in \{1, \ldots, u\}$ we have:

    If $\theta_j = 0$, then

    (3.9) $(a_{2j}, b_{2j}) = \begin{cases} \left(0, \; s_{i-j+2,0}\, s_{i-j+2,1}\, p/R\right) & \text{if } w_{i-j+1}^{(M)} = 1, \\ \left(s_{i-j+2,0}\, s_{i-j+2,1}\, p/R, \; 1\right) & \text{if } w_{i-j+1}^{(M)} = 0, \end{cases}$

    (3.10) $(a_{2j-1}, b_{2j-1}) = \begin{cases} \left(0, \; s_{i-j+2,0}^2\, p/R\right) & \text{if } w_{i-j+1}^{(Q)} = 1, \\ \left(s_{i-j+2,0}^2\, p/R, \; 1\right) & \text{if } w_{i-j+1}^{(Q)} = 0. \end{cases}$

    If $\theta_j = 1$, then

    (3.11) $(a_{2j}, b_{2j}) = \begin{cases} \left(0, \; s_{i-j+2,1}^2\, p/R\right) & \text{if } w_{i-j+1}^{(Q)} = 1, \\ \left(s_{i-j+2,1}^2\, p/R, \; 1\right) & \text{if } w_{i-j+1}^{(Q)} = 0 \end{cases}$

    and

    (3.12) $(a_{2j-1}, b_{2j-1}) = \begin{cases} \left(0, \; s_{i-j+2,0}\, s_{i-j+2,1}\, p/R\right) & \text{if } w_{i-j+1}^{(M)} = 1, \\ \left(s_{i-j+2,0}\, s_{i-j+2,1}\, p/R, \; 1\right) & \text{if } w_{i-j+1}^{(M)} = 0. \end{cases}$

  (ii) Let $\mathbf{1} := (1, \ldots, 1)$ (with $u$ components). For each hypothesis $\theta \in \{0,1\}^u$ and each extra-reduction vector $(w_i^{(M)}, w_i^{(Q)}, \ldots, w_{i-u+1}^{(M)}, w_{i-u+1}^{(Q)})$, we have

    (3.13) $P_\theta\!\left(W_i^{(M)} = w_i^{(M)}, \ldots, W_{i-u+1}^{(Q)} = w_{i-u+1}^{(Q)}\right)$

    (3.14) $\quad = P_{\mathbf{1}-\theta}\!\left(W_i^{(M)} = w_i^{(M)}, \ldots, W_{i-u+1}^{(Q)} = w_{i-u+1}^{(Q)}\right).$

Proof

By Lemma 3.2(iv), the random variables $W_i^{(M)}, W_i^{(Q)}, \ldots, W_{i-u+1}^{(M)}, W_{i-u+1}^{(Q)}$ can be expressed by indicator functions which depend on the random variables $S_{i+1,1}, S_{i+1,0}, \ldots, S_{i-u+1,1}, S_{i-u+1,0}$. This allows us to express the probability (3.8) as an integral over $[0,1]^{2u+2}$ of a product of indicator functions. Furthermore, for $j < u$ the indicator functions $1_{\{W_{i-j+1}^{(M)} = w_{i-j+1}^{(M)}\}}$ and $1_{\{W_{i-j+1}^{(Q)} = w_{i-j+1}^{(Q)}\}}$ actually only depend on $s_{i+1,1}, s_{i+1,0}, \ldots, s_{i-u+2,1}, s_{i-u+2,0}$, while $1_{\{W_{i-u+1}^{(M)} = w_{i-u+1}^{(M)}\}}$ and $1_{\{W_{i-u+1}^{(Q)} = w_{i-u+1}^{(Q)}\}}$ merely depend on $s_{i-u+2,1}, s_{i-u+2,0}, s_{i-u+1,1}, s_{i-u+1,0}$. This allows us to express (3.8) in the form

(3.15) $\int_{[0,1]^{2u}} \prod_{j=1}^{u-1} 1_{\{W_{i-j+1}^{(M)} = w_{i-j+1}^{(M)}\}}\, 1_{\{W_{i-j+1}^{(Q)} = w_{i-j+1}^{(Q)}\}} \left( \int_{a_{2u-1}}^{b_{2u-1}} \int_{a_{2u}}^{b_{2u}} 1 \; ds_{i-u+1,1}\, ds_{i-u+1,0} \right) ds_{i-u+2,1}\, ds_{i-u+2,0} \cdots ds_{i+1,1}\, ds_{i+1,0}$

with suitable integration boundaries $a_{2u-1}, b_{2u-1}, a_{2u}, b_{2u}$. These integration boundaries follow immediately from Lemma 3.2(iv) and (ii) with $i-u+1$ in place of $i$. This verifies formulae (3.9) to (3.12) for $j = u$. The integral over $[0,1]^{2u}$ can be transformed in the same way into a sequence of one-dimensional integrals. Since the integration boundaries $a_1, b_1, \ldots, a_{2u-2}, b_{2u-2}$ depend only on the left-hand indicator functions, i.e., on the observations $w_i^{(M)}, w_i^{(Q)}, \ldots, w_{i-u+2}^{(M)}, w_{i-u+2}^{(Q)}$, Lemma 3.4(i) can be verified by induction on $u$.

We first note that

$\phi : [0,1]^{2u+2} \to [0,1]^{2u+2}, \quad \phi(s_{i+1,1}, s_{i+1,0}, \ldots, s_{i-u+1,1}, s_{i-u+1,0}) := (s_{i+1,0}, s_{i+1,1}, \ldots, s_{i-u+1,0}, s_{i-u+1,1})$

(swapping the right-hand indices from 0 to 1 and vice versa) defines a volume-preserving diffeomorphism on $[0,1]^{2u+2}$. As already pointed out above, the probabilities (3.13) and (3.14) can be expressed by integrals over $[0,1]^{2u+2}$ of the indicator functions

$\prod_{j=1}^{u} 1^{[\theta]}_{\{W_{i-j+1}^{(M)} = w_{i-j+1}^{(M)}\}}\, 1^{[\theta]}_{\{W_{i-j+1}^{(Q)} = w_{i-j+1}^{(Q)}\}}$

and

$\prod_{j=1}^{u} 1^{[\mathbf{1}-\theta]}_{\{W_{i-j+1}^{(M)} = w_{i-j+1}^{(M)}\}}\, 1^{[\mathbf{1}-\theta]}_{\{W_{i-j+1}^{(Q)} = w_{i-j+1}^{(Q)}\}},$

respectively. The superscripts $\theta$ and $\mathbf{1}-\theta$ indicate the hypotheses. From Lemma 3.2(iv), we conclude that $1^{[\mathbf{1}-\theta]}_{\{W_{i-j+1}^{(M)} = w_{i-j+1}^{(M)}\}} = 1^{[\theta]}_{\{W_{i-j+1}^{(M)} = w_{i-j+1}^{(M)}\}} \circ \phi$ and $1^{[\mathbf{1}-\theta]}_{\{W_{i-j+1}^{(Q)} = w_{i-j+1}^{(Q)}\}} = 1^{[\theta]}_{\{W_{i-j+1}^{(Q)} = w_{i-j+1}^{(Q)}\}} \circ \phi$ for all $j \leq u$, which completes the proof of Assertion (ii).□

Lemma 3.4(ii) says that the information contained in the extra-reduction vectors $(w_i^{(M)}, \ldots, w_{i-u+1}^{(Q)})$ does not allow us to distinguish between the hypotheses $\theta$ and $\mathbf{1}-\theta$. This means that we can only determine the set $\{\theta, \mathbf{1}-\theta\}$, as depicted in Figure 1.

Figure 1: Information collected during the presented attack on $u$ pairs of extra-reductions.

In particular, it would be pointless to consider the case $u = 1$. For $u = 2$, one can distinguish between the cases $(k_i, k_{i-1}) \in \{(0,0), (1,1)\}$ and $(k_i, k_{i-1}) \in \{(0,1), (1,0)\}$, or equivalently, between $k_i = k_{i-1}$ and $k_i \neq k_{i-1}$. For $u \geq 2$, the parameter $\theta \in \{(\theta_1, \ldots, \theta_u), (1-\theta_1, \ldots, 1-\theta_u)\}$ corresponds to

(3.16) $\left(k_i \oplus k_{i-1} = \theta_1 \oplus \theta_2, \; \ldots, \; k_{i-u+2} \oplus k_{i-u+1} = \theta_{u-1} \oplus \theta_u\right),$

where "$\oplus$" denotes addition modulo 2. For the sake of clarity, we point out that the components of the vector $\mathbf{1}-\theta$ can also be written as $(1 - \theta_i) = \neg\theta_i$ for $1 \leq i \leq u$.

Remark 3.5

  1. Lemma 3.4 can be applied to all $u$-tuples $(k_i, \ldots, k_{i-u+1})$ for $i = l-1, \ldots, u-1$. Combining the information from all $u$-tuples only provides the vector $(k_{l-1} \oplus k_{l-2}, \ldots, k_1 \oplus k_0)$. This information determines the whole key $k = (k_{l-1}, k_{l-2}, \ldots, k_0)_2$ since $k$ is odd due to $\gcd(k, \varphi(p)) = 1$ (where we recall that $\varphi$ is the Euler totient function); see the sketch after this remark.

  2. The probabilities in Lemma 3.4 do not depend on the index $i$. By Lemma 3.4(ii), it suffices to compute at most $2^{3u-1}$ probabilities of type (3.8). (Note that $2^{2u}$ different extra-reduction vectors exist and one has to distinguish between $2^{u-1}$ hypotheses.) Example 3.6 illustrates the calculation of one particular probability, and the appendix contains two tables with all probabilities for $u = 2$.

  3. For $u = 2$, our attack aims at pairs of consecutive key bits $(k_i, k_{i-1})$. This is like the original attack in [10,11], but the original attack only exploits the extra-reductions $(w_i^{(Q)}, w_{i-1}^{(M)})$ while our attack considers $(w_i^{(M)}, w_i^{(Q)}, w_{i-1}^{(M)}, w_{i-1}^{(Q)})$. The probabilities which are applied in the original attack are the marginal probabilities of the probability (3.8) with regard to $(w_i^{(M)}, w_{i-1}^{(Q)})$. Obviously, the original attack exploits less information than the new attack for $u = 2$, and experiments confirm that for $u = 2$ our new attack reduces the number of queries by a factor greater than 2 (cf. Figure 3).
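The key reconstruction mentioned in item 1 is elementary; the following Python sketch (our own illustration) recovers the full exponent from the XOR vector and the known bit $k_0 = 1$:

```python
def rebuild_key(xor_vector):
    """Recover the key bits (k_{l-1}, ..., k_0) from
    (k_{l-1} XOR k_{l-2}, ..., k_1 XOR k_0), using that k_0 = 1 (k odd)."""
    bits = [1]                           # k_0 = 1
    for x in reversed(xor_vector):       # process k_1 XOR k_0 first
        bits.append(bits[-1] ^ x)        # k_{j+1} = (k_{j+1} ^ k_j) ^ k_j
    return bits[::-1]                    # most significant bit first

# Example: the XOR vector of k = (1,0,1,1,0,1,0,1,1,1)_2 determines k.
k_bits = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
xors = [k_bits[j] ^ k_bits[j + 1] for j in range(len(k_bits) - 1)]
assert rebuild_key(xors) == k_bits
```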

Example 3.6

Let $(\theta_1, \theta_2) = (0, 1)$ and $(w_i^{(M)}, w_i^{(Q)}, w_{i-1}^{(M)}, w_{i-1}^{(Q)}) = (1, 1, 0, 1)$. By Lemma 3.4(i),

(3.17) $P_\theta\!\left(W_i^{(M)} = 1, W_i^{(Q)} = 1, W_{i-1}^{(M)} = 0, W_{i-1}^{(Q)} = 1\right)$
$= \int_0^1 \int_0^1 \int_0^{s_{i+1,0} s_{i+1,1} \frac{p}{R}} \int_0^{s_{i+1,1}^2 \frac{p}{R}} \int_0^{s_{i,0}^2 \frac{p}{R}} \int_{s_{i,0} s_{i,1} \frac{p}{R}}^{1} 1 \; ds_{i-1,1}\, ds_{i-1,0}\, ds_{i,1}\, ds_{i,0}\, ds_{i+1,1}\, ds_{i+1,0}$
$= \int_0^1 \int_0^1 \int_0^{s_{i+1,0} s_{i+1,1} \frac{p}{R}} \int_0^{s_{i+1,1}^2 \frac{p}{R}} \left( s_{i,0}^2\, \frac{p}{R} - s_{i,0}^3 s_{i,1} \left(\frac{p}{R}\right)^2 \right) ds_{i,1}\, ds_{i,0}\, ds_{i+1,1}\, ds_{i+1,0}$
$= \int_0^1 \int_0^1 \left( \frac{1}{3}\, s_{i+1,0}^3 s_{i+1,1}^5 \left(\frac{p}{R}\right)^5 - \frac{1}{8}\, s_{i+1,0}^4 s_{i+1,1}^8 \left(\frac{p}{R}\right)^8 \right) ds_{i+1,1}\, ds_{i+1,0}$
$= \frac{1}{3 \cdot 4 \cdot 6} \left(\frac{p}{R}\right)^5 - \frac{1}{8 \cdot 5 \cdot 9} \left(\frac{p}{R}\right)^8 = \frac{1}{72} \left(\frac{p}{R}\right)^5 - \frac{1}{360} \left(\frac{p}{R}\right)^8.$
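The closed form in (3.17) is easy to check numerically. The following Monte Carlo sketch (our own verification code, not part of the attack) samples the iid uniform variables of Lemma 3.2 under the hypothesis $\theta = (0,1)$ and estimates the same probability:

```python
import random

def p_theta_mc(alpha, trials=1_000_000):
    """Monte Carlo estimate of
    P_theta(W_i^(M)=1, W_i^(Q)=1, W_{i-1}^(M)=0, W_{i-1}^(Q)=1)
    for theta = (0, 1), under the iid-uniform model of Lemma 3.2."""
    hits = 0
    for _ in range(trials):
        sp0, sp1 = random.random(), random.random()  # s_{i+1,0}, s_{i+1,1}
        si0, si1 = random.random(), random.random()  # s_{i,0},   s_{i,1}
        sm0, sm1 = random.random(), random.random()  # s_{i-1,0}, s_{i-1,1}
        # k_i = 0: multiplication writes R_1, squaring writes R_0, cf. (3.4)-(3.5)
        wM_i = si1 < sp0 * sp1 * alpha
        wQ_i = si0 < sp0 * sp0 * alpha
        # k_{i-1} = 1: multiplication writes R_0, squaring writes R_1
        wM_im1 = sm0 < si0 * si1 * alpha
        wQ_im1 = sm1 < si1 * si1 * alpha
        hits += wM_i and wQ_i and (not wM_im1) and wQ_im1
    return hits / trials

alpha = 0.75                                   # example ratio p/R
closed_form = alpha**5 / 72 - alpha**8 / 360   # right-hand side of (3.17)
print(p_theta_mc(alpha), closed_form)          # both approximately 0.0030
```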

Corollary 3.7

For $u = 2$, by applying the law of total probability to $P_\theta$ in (3.8), the joint probability for the maximum likelihood distinguisher described in [10,11, Theorem 2] can be recovered.

Remark 3.8

The two approaches in previous work [10,11] and in this work are independent, and both allow us to derive a maximum likelihood key distinguisher. Here, we are not interested in the values manipulated by the multiplication and square operations, but only in the necessary and sufficient conditions for the existence of extra-reductions, which allows an analysis of larger dimensions.

4 Perfect and noisy measurements

The attacker gets access to side-channel information about each bit $k_i$ ($l-2 \geq i \geq 0$) of the exponent $k$ through the noisy distribution of the pair of extra-reductions $(W_i^{(M)}, W_i^{(Q)})$. The noise consists of two binary random variables $(N_i^{(M)}, N_i^{(Q)})$. Additionally, the random variables $N_i^{(M)}$ and $N_i^{(Q)}$ are assumed independent and identically distributed (iid), as is usually the case for the measurement noise of different operations in a side-channel trace. Namely, we denote by $p_{\mathrm{noise}}$ the probability

$p_{\mathrm{noise}} = P(N_i^{(M)} = 1) = P(N_i^{(Q)} = 1)$ for all $i$.

Thus, the attacker garners an iid sequence $(y_i^{(M);n}, y_i^{(Q);n})_{n = 1, \ldots, N}$, where for each query $n$ and exponent index $i \in \{l-1, \ldots, 0\}$, $y_i^{(M);n} = w_i^{(M);n} \oplus n_i^{(M);n}$ and $y_i^{(Q);n} = w_i^{(Q);n} \oplus n_i^{(Q);n}$. This means that $W_i^{(M)}$ and $Y_i^{(M)}$ are, respectively, the input and the output of a binary symmetric channel (BSC) of parameter $p_{\mathrm{noise}}$. Similarly, $W_i^{(Q)}$ and $Y_i^{(Q)}$ are the input and output of an independent identical BSC parallel to the first one.

In practical cases, detecting an extra-reduction using only one acquisition can lead to errors. Let us model the attack setup, taking into account that the detection of the presence/absence of extra-reductions is a random variable, due to some noise. The Markov chain of random variables for index $i$ is given as follows:

Secret $\longrightarrow$ Bias $\longrightarrow$ Observable:

$K = k_i \;\longrightarrow\; (W_i^{(M)}, W_i^{(Q)}) \;\longrightarrow\; (Y_i^{(M)}, Y_i^{(Q)}) = (W_i^{(M)} \oplus N_i^{(M)}, W_i^{(Q)} \oplus N_i^{(Q)}).$
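A compact Python sketch of this observation model (our own illustration) applies two iid binary symmetric channels to an ER pair:

```python
import random

def bsc(bit, p_noise):
    """Binary symmetric channel: flips `bit` with probability p_noise."""
    return bit ^ (random.random() < p_noise)

def observe(w_M, w_Q, p_noise):
    """Noisy observation (y^(M), y^(Q)) of the true ER pair (w^(M), w^(Q));
    the two channels are independent and identically distributed."""
    return bsc(w_M, p_noise), bsc(w_Q, p_noise)

# Example: observe the pair (1, 0) through two BSCs with p_noise = 0.1.
print(observe(1, 0, 0.1))
```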

The probabilities (3.8) depend on the unknown ratio $p/R$. The crucial observation is that the attacker knows the position of all squarings and all multiplications. Lemma 4.2 provides a concrete formula which allows us to estimate $p/R$. Of course, this estimation step is only necessary for RSA with CRT, not for RSA without CRT. We begin with a lemma which will be needed below.

Lemma 4.1

It holds that

(4.1) $E(W_i^{(M)}) = P(W_i^{(M)} = 1) = \dfrac{1}{4} \cdot \dfrac{p}{R},$

(4.2) $E(W_i^{(Q)}) = P(W_i^{(Q)} = 1) = \dfrac{1}{3} \cdot \dfrac{p}{R},$

(4.3) $\dfrac{p}{R} = 3\, E(W_i^{(Q)}) = 2\, E(W_i^{(M)}) + 1.5\, E(W_i^{(Q)}).$

Proof

Since $W_i^{(M)}$ and $W_i^{(Q)}$ assume values in $\{0,1\}$, the left-hand equations in (4.1) and (4.2) are obvious, while the right-hand equations follow immediately from (3.4) and (3.5), respectively. For $k_i = 0$, for instance,

$P(W_i^{(M)} = 1) = \int_0^1 \int_0^1 \int_0^{s_{i+1,0} s_{i+1,1}\, p/R} 1 \; ds_{i,1}\, ds_{i+1,1}\, ds_{i+1,0} = \frac{1}{4} \cdot \frac{p}{R}.$

We note that the probability (4.2) was already verified in [20][3] and, for instance, in [11], the latter by other mathematical methods. Formula (4.3) follows directly from (4.1) and (4.2).□
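Under the iid-uniform model of Lemma 3.2, these marginal probabilities are straightforward to confirm by simulation; a small Monte Carlo sketch (our own check):

```python
import random

def marginal_er_rates(alpha, trials=1_000_000):
    """Estimate P(W^(M) = 1) and P(W^(Q) = 1) under the iid-uniform model
    of Lemma 3.2; Lemma 4.1 predicts alpha/4 and alpha/3, respectively."""
    er_mul = er_sqr = 0
    for _ in range(trials):
        s0, s1, t = random.random(), random.random(), random.random()
        er_mul += t < s0 * s1 * alpha   # multiplication, cf. (3.4)
        er_sqr += t < s0 * s0 * alpha   # squaring, cf. (3.5)
    return er_mul / trials, er_sqr / trials

print(marginal_er_rates(0.75))          # approximately (0.1875, 0.25)
```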

The ER values $w_i^{(M)}$ and $w_i^{(Q)}$ are determined (or, more precisely, guessed) on the basis of single-trace template attacks. In particular, their guesses $\tilde{w}_i^{(M)}$ and $\tilde{w}_i^{(Q)}$ might be incorrect with some probability. We denote the corresponding random variables (referring to the guessed ER values) by $\tilde{W}_i^{(M)}$ and $\tilde{W}_i^{(Q)}$. In the following, we assume that

(4.4) $P(\tilde{W}_i^{(M)} = v \mid W_i^{(M)} = 1 - v) = P(\tilde{W}_i^{(Q)} = v \mid W_i^{(Q)} = 1 - v) = p_{\mathrm{noise}} \quad \text{for } i \in \{0, \ldots, l-1\} \text{ and } v \in \{0,1\},$

and similarly for the initialization of the registers $R_0$ and $R_1$ in Algorithm 2. In other words, the probability of guessing an ER value incorrectly is $p_{\mathrm{noise}}$, independently of the true value. Of course, $p_{\mathrm{noise}} = 0$ characterizes a perfect side-channel measurement. Lemma 4.2 is the generalization of (4.3) to noisy measurements. As noted in Lemma 4.4, this allows the estimation of $p/R$ and $p_{\mathrm{noise}}$.

Lemma 4.2

(4.5) $\dfrac{p}{R} = \dfrac{12\, E(\tilde{W}_i^{(Q)}) - 12\, E(\tilde{W}_i^{(M)})}{1 + 6\, E(\tilde{W}_i^{(Q)}) - 8\, E(\tilde{W}_i^{(M)})},$

(4.6) $p_{\mathrm{noise}} = 4\, E(\tilde{W}_i^{(M)}) - 3\, E(\tilde{W}_i^{(Q)}).$

Proof

Since $\tilde{W}_i^{(Q)}$ is $\{0,1\}$-valued, we obtain

$E(\tilde{W}_i^{(Q)}) = P(\tilde{W}_i^{(Q)} = 1) = P(\tilde{W}_i^{(Q)} = 1 \mid W_i^{(Q)} = 1)\, P(W_i^{(Q)} = 1) + P(\tilde{W}_i^{(Q)} = 1 \mid W_i^{(Q)} = 0)\, P(W_i^{(Q)} = 0) = (1 - p_{\mathrm{noise}})\, \frac{p}{3R} + p_{\mathrm{noise}} \left(1 - \frac{p}{3R}\right),$

and similarly

$E(\tilde{W}_i^{(M)}) = P(\tilde{W}_i^{(M)} = 1) = P(\tilde{W}_i^{(M)} = 1 \mid W_i^{(M)} = 1)\, P(W_i^{(M)} = 1) + P(\tilde{W}_i^{(M)} = 1 \mid W_i^{(M)} = 0)\, P(W_i^{(M)} = 0) = (1 - p_{\mathrm{noise}})\, \frac{p}{4R} + p_{\mathrm{noise}} \left(1 - \frac{p}{4R}\right).$

Solving these two equations for $p/R$ and $p_{\mathrm{noise}}$ yields (4.5) and (4.6).□

In Lemma 4.3, $(e_1^{(M)}, e_1^{(Q)}, \ldots, e_u^{(M)}, e_u^{(Q)}) \in \{0,1\}^{2u}$ represents the "error vector" and $\mathrm{ham}(\cdot)$ denotes the Hamming weight. The nonzero entries give the positions at which the guessed extra-reduction vector $(\tilde{w}_i^{(M)}, \tilde{w}_i^{(Q)}, \ldots, \tilde{w}_{i-u+1}^{(M)}, \tilde{w}_{i-u+1}^{(Q)})$ is incorrect.

Lemma 4.3

  (i) (4.7) $P_\theta\!\left(\tilde{W}_{i-j+1}^{(M)} = \tilde{w}_{i-j+1}^{(M)},\; \tilde{W}_{i-j+1}^{(Q)} = \tilde{w}_{i-j+1}^{(Q)} \text{ for } j = 1, \ldots, u\right) = \sum_{\substack{0 \leq e_j^{(M)},\, e_j^{(Q)} \leq 1 \\ 1 \leq j \leq u}} P_\theta\!\left(W_{i-j+1}^{(M)} = \tilde{w}_{i-j+1}^{(M)} \oplus e_j^{(M)},\; W_{i-j+1}^{(Q)} = \tilde{w}_{i-j+1}^{(Q)} \oplus e_j^{(Q)} \text{ for } j = 1, \ldots, u\right) \times p_{\mathrm{noise}}^{\mathrm{ham}(e_1^{(M)}, \ldots, e_u^{(Q)})} (1 - p_{\mathrm{noise}})^{2u - \mathrm{ham}(e_1^{(M)}, \ldots, e_u^{(Q)})}.$

  (ii) For each hypothesis $\theta \in \{0,1\}^u$ and each (guessed) extra-reduction vector $(\tilde{w}_i^{(M)}, \tilde{w}_i^{(Q)}, \ldots, \tilde{w}_{i-u+1}^{(M)}, \tilde{w}_{i-u+1}^{(Q)})$, we have

    (4.8) $P_\theta(\tilde{W}_i^{(M)} = \tilde{w}_i^{(M)}, \ldots, \tilde{W}_{i-u+1}^{(Q)} = \tilde{w}_{i-u+1}^{(Q)}) = P_{\mathbf{1}-\theta}(\tilde{W}_i^{(M)} = \tilde{w}_i^{(M)}, \ldots, \tilde{W}_{i-u+1}^{(Q)} = \tilde{w}_{i-u+1}^{(Q)}).$

Proof

The term $p_{\mathrm{noise}}^{\mathrm{ham}(e_1^{(M)}, \ldots, e_u^{(Q)})} (1 - p_{\mathrm{noise}})^{2u - \mathrm{ham}(e_1^{(M)}, \ldots, e_u^{(Q)})}$ quantifies the probability of the error vector $(e_1^{(M)}, e_1^{(Q)}, \ldots, e_u^{(M)}, e_u^{(Q)})$. This fact and the definition of conditional probability imply (4.7). Assertion (ii) follows immediately from (i) and Lemma 3.4(ii), applied to the particular right-hand probabilities in (4.7).□

The last lemma of this section explains how to estimate the ratio p / R and the probability p noise .

Lemma 4.4

  (i) Assume that the attacker has observed $N$ side-channel traces. Then

    (4.9) $\tilde{\mu}_M := \dfrac{1}{N l} \sum_{n=1}^{N} \sum_{i=0}^{l-1} \tilde{w}_i^{(M);n}$

    provides an estimator for $E(\tilde{W}_i^{(M);n})$, and analogously

    (4.10) $\tilde{\mu}_Q := \dfrac{1}{N l} \sum_{n=1}^{N} \sum_{i=0}^{l-1} \tilde{w}_i^{(Q);n}$

    for $E(\tilde{W}_i^{(Q);n})$. The index $n$ refers to the numbering of the side-channel traces.

  (ii) Substituting $\tilde{\mu}_M$ and $\tilde{\mu}_Q$ for $E(\tilde{W}_i^{(M);n})$ and $E(\tilde{W}_i^{(Q);n})$ into (4.5) and (4.6) yields estimates $\widetilde{p/R}$ and $\tilde{p}_{\mathrm{noise}}$.

  (iii) For perfect measurements, alternatively (4.3) might be used to estimate $p/R$. Compared to the middle term, the right-hand term considers twice as many MMs and thus should provide a more precise estimate.

Proof

Straightforward.□
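Putting Lemmas 4.2 and 4.4 together yields a simple estimation procedure. The following Python sketch (our own simulation; the generator draws ER values with the marginal rates of Lemma 4.1 rather than from a real implementation) illustrates it:

```python
import random

def simulate_noisy_ers(alpha, p_noise, N, l):
    """Generate N traces of l noisy ER pairs (w~^(M), w~^(Q)), drawing the
    true ER values with the marginal rates alpha/4 and alpha/3 (Lemma 4.1)
    and flipping each observation with probability p_noise."""
    traces = []
    for _ in range(N):
        trace = []
        for _ in range(l):
            wM = random.random() < alpha / 4
            wQ = random.random() < alpha / 3
            trace.append((wM ^ (random.random() < p_noise),
                          wQ ^ (random.random() < p_noise)))
        traces.append(trace)
    return traces

def estimate(traces):
    """Estimators of Lemma 4.4 plugged into (4.5) and (4.6)."""
    n_ops = sum(len(t) for t in traces)
    mu_M = sum(wM for t in traces for (wM, _) in t) / n_ops     # (4.9)
    mu_Q = sum(wQ for t in traces for (_, wQ) in t) / n_ops     # (4.10)
    alpha_hat = 12 * (mu_Q - mu_M) / (1 + 6 * mu_Q - 8 * mu_M)  # (4.5)
    p_noise_hat = 4 * mu_M - 3 * mu_Q                           # (4.6)
    return alpha_hat, p_noise_hat

traces = simulate_noisy_ers(alpha=0.75, p_noise=0.1, N=500, l=512)
print(estimate(traces))    # approximately (0.75, 0.1)
```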

Example 4.5

(Estimation of $p/R$ and $p_{\mathrm{noise}}$) For different exponents of 512-bit length, we estimate $\widetilde{p/R}$ and $\tilde{p}_{\mathrm{noise}}$ for two moduli (RSA-1024-p and RSA-1024-q, defined in [11, Section 2.2]) and different values of $p_{\mathrm{noise}}$, depending on the number of side-channel traces $N$. For each value of $N$ between 0 and 500, we compute $\widetilde{p/R}$ using (4.5) and $\tilde{p}_{\mathrm{noise}}$ using (4.6) for the different exponents, and the resulting values are represented using a box plot (deciles/quartiles/median values) in Figure 2.

Figure 2: Statistical box plot to estimate the ratio $p/R$ and the probability $p_{\mathrm{noise}}$ as a function of the number of side-channel traces $N$, using 1,000 randomly selected exponent values.

Figure 3: Success rate for an entire exponent using 1,000 randomly selected exponent values, depending on the number of side-channel traces $N$, with different noise probabilities $p_{\mathrm{noise}}$: (a) $p_{\mathrm{noise}} = 0.00$, (b) $p_{\mathrm{noise}} = 0.10$, (c) $p_{\mathrm{noise}} = 0.20$.