Recovering Secrets From Prefix-Dependent Leakage

Abstract We discuss how to recover a secret bitstring given partial information obtained during a computation over that string, assuming the computation is a deterministic algorithm processing the secret bits sequentially. That abstract situation models certain types of side-channel attacks against discrete logarithm and RSA-based cryptosystems, where the adversary obtains information not on the secret exponent directly, but instead on the group or ring element that varies at each step of the exponentiation algorithm. Our main result shows that for a leakage of a single bit per iteration, under suitable statistical independence assumptions, one can recover the whole secret bitstring in polynomial time. We also discuss how to cope with imperfect leakage, extend the model to k-bit leaks, and show how our algorithm yields attacks on popular cryptosystems such as (EC)DSA.


Introduction
Many cryptographic algorithms iterate over a secret sequence of bits. This is the case for example in typical implementations of the exponentiation algorithms used in discrete logarithm and RSA-based cryptosystems. In such cases, one may hope to learn about the secret bits by observing side-channel leakage during computation. This is the basis of many side-channel attacks based on simple power analysis (SPA), including Kocher's seminal work [17,18] and many followups.
However, there are cases in which the relationship between computation and leakage is nontrivial. For instance, if we consider a double-and-add scalar multiplication on an elliptic curve, the leakage of one bit of the x-coordinate of each intermediate point in the computation should be enough in an information-theoretic sense to recover the secret scalar, but it seems hard to write down an expression of the scalar in terms of that leakage. We also note that such a leakage can in fact occur in concrete attack settings: we discuss one such setting in the full version of this paper [12], where an attacker does obtain this type of leakage using a so-called "hardware trojan horse".
Although the relationship between the leakage and the secret can be quite involved, the structure of the algorithm, which iterates over secret bits sequentially, ensures that the leakage at the i-th iteration depends only on the first i bits of the secret. If the algorithm is deterministic and the leakage is perfect, the leakage at the i-th iteration can therefore be viewed as a deterministic function of the i-bit prefix of the secret bitstring. This leads us to define the following general problem: recover a bitstring $s \in \{0,1\}^n$ given the leakage vector
$$f(s) = \bigl(f_1(s_{[1:1]}), f_2(s_{[1:2]}), \dots, f_n(s_{[1:n]})\bigr) \in \{0,1\}^n,$$
where $s_{[1:i]} \in \{0,1\}^i$ is the i-bit prefix of s, and the functions $f_i$ are regarded as known, independent random functions with values in $\{0,1\}$.
We show that this problem can be solved in expected polynomial time, in the sense that there is an algorithm with expected polynomial running time which outputs the list of all possible solutions to the problem (which is, in particular, of expected polynomial length). We also initiate the study of what happens when the leak is imperfect (e.g., due to noise considerations), and when more than a single bit of leakage is known to the attacker.
From a side-channel perspective, the algorithm we consider is a "single-trace" attack, in the sense that it recovers the secret from the leakage of a single execution of the target algorithm. In particular there is no notion of adaptive queries from the adversary in that context, which sets us apart from such questions as hardcore bits, and allows our attack to work in the presence of a large class of classical side-channel countermeasures, often designed to protect against multi-trace attacks like differential power analysis.

The secret prefix random leakage problem
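Let s be an n-bit secret. Let $f_1, \dots, f_n$ be functions $f_j : \{0,1\}^j \to \{0,1\}^k$, which are modeled as independent random oracles. Let $s_{[1:i]}$ denote the i-bit prefix of s for every i = 1, ..., n. This paper considers the following problem.
Definition 2.1. Given the functions $f_1, \dots, f_n$ and the leakage vector
$$f(s) = \bigl(f_1(s_{[1:1]}), f_2(s_{[1:2]}), \dots, f_n(s_{[1:n]})\bigr) \in \{0,1\}^{k \cdot n},$$
recover the value of s.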
We will also discuss harder variants of this problem where the adversary is not given the vector f (s) exactly, but only gets partial information about it. These are abstractions of a situation arising in certain real-world side-channel attack settings, as discussed in the full version of this paper [12].
The main concrete example of this problem that we consider throughout the paper is the case when the secret s is the secret exponent used in a discrete logarithm-based cryptosystem, which computes $g^s$ using a (possibly side-channel protected) square-and-multiply algorithm, for some known group generator g. The function $f_j$ is then some k-bit leakage function on the variable that contains the intermediate group element at iteration j of the square-and-multiply; for example, if the group is an elliptic curve, $f_j$ could be the k lowest-order bits of the x-coordinate of the intermediate curve point at iteration j.
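To make the leakage model concrete, here is a minimal Python sketch of a left-to-right square-and-multiply that records a k-bit leak at every iteration. The multiplicative group modulo a prime, the toy parameters `P` and `G`, and the choice of the k lowest-order bits of the intermediate value as leakage function are all assumptions made for illustration; they stand in for the elliptic-curve setting described above.

```python
def leak_trace(g, s_bits, p, k=1):
    """Compute g^s mod p by left-to-right square-and-multiply.

    Returns one k-bit leakage value per iteration: the k lowest-order bits
    of the intermediate group element (a hypothetical leakage function).
    """
    acc = 1
    trace = []
    for bit in s_bits:                 # most significant bit first
        acc = (acc * acc) % p          # square
        if bit:
            acc = (acc * g) % p        # multiply when the secret bit is 1
        trace.append(acc & ((1 << k) - 1))
    return trace


# Toy parameters, chosen only for illustration.
P = 2**61 - 1
G = 3
secret = [1, 0, 1, 1, 0, 0, 1, 0]      # the exponent 178, MSB first
trace = leak_trace(G, secret, P, k=1)
```

The leak at iteration j depends only on the first j bits of the exponent, so truncating `secret` to a prefix yields the corresponding prefix of `trace`.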

A polynomial time algorithm for 1-bit leaks
We first consider the special case of the problem described in Definition 2.1 when k = 1. In this case already, we expect the problem to be tractable in an information-theoretic sense, since we get n independent bits of information on the n-bit secret s. However, recovery is not a priori trivial.
A simple approach to solve the problem is to reconstruct the entire list of all possible i-bit prefixes of s compatible with the provided leakage f(s), successively for i = 1, 2, ..., n. This amounts to building a binary tree of possible prefixes as in Figure 1, starting from the empty string ε at the root, and with i-bit prefixes at level i. Going down one level, we double the set of candidate prefixes of the secret exponent by extending them either by 0 or 1, and then remove all the candidates that are incompatible with the new bit of information. Our key observation is that, under the right conditions, this pruning compensates for the tree's growth, so that the algorithm terminates in expected polynomial time.
Concretely, upon extending a candidate with a bit, the probability that the new candidate is compatible with the leakage is expected to be 1/2, independently for all candidates. In particular, a node in the tree of possible candidates should have 0, 1 or 2 children according to whether none, exactly one, or both of its extensions by 0 and 1 are compatible with the observations; this happens with probability 1/4, 1/2 and 1/4 respectively for all nodes, independently. Thus, our recovery algorithm is a search in a Galton-Watson tree with $p_1 = 1/2$, $p_0 = p_2 = 1/4$ and $p_k = 0$ for all k > 2, in the sense of the following definition.
Definition (Galton-Watson process). A Galton-Watson process with offspring distribution $\{p_k\}$ is a discrete-time Markov chain $(Z_n)_{n \ge 0}$ taking values in the set of non negative integers, and whose transition probabilities are as follows:
$$\Pr[Z_{n+1} = \ell \mid Z_n = m] = p^{*m}_{\ell},$$
where $\{p^{*m}_k\}$ denotes the m-th convolution power of the distribution $\{p_k\}$, i.e. the conditional distribution of $Z_{n+1}$ given that $Z_n = m$ is the distribution of the sum of m i.i.d. random variables each with distribution $\{p_k\}$.
A Galton-Watson tree with offspring distribution $\{p_k\}$ is a random tree in which each node independently has k offspring with probability $p_k$ for all k ≥ 0; the number $Z_n$ of nodes at level n is then given by a Galton-Watson process.
In our case, each node in the tree has on average $\mu = \sum_{k=0}^{\infty} k p_k = 1$ child: in other words, we have a critical Galton-Watson process, so that $E[Z_n] = \mu^n E[Z_0] = 1$. In particular, the size of the full search space, which is given by $Z_1 + \cdots + Z_n$, does not undergo a combinatorial explosion. There are several ways to implement the Galton-Watson simulation; one possibility is to keep a pool of candidates from which new candidates are generated and pruned. This gives the algorithm of Figure 2.
[Figure 2. Secret recovery from 1-bit leakage information. The final step of the algorithm returns the candidate pool $X_n$.]
We implemented this algorithm in the setting of square-and-multiply leaks in group exponentiations discussed in Section 2; implementation results are provided in (the first column of) Table 1. We can see in the table that the size of the search space increases quadratically rather than linearly with the bit length n of the exponent (despite the fact that $E[Z_1 + \cdots + Z_n] = n$). This is because we are looking at a Galton-Watson tree conditioned on having at least one node at depth n, and we can show that $E[Z_1 + \cdots + Z_n \mid Z_n \neq 0] = \Omega(n^2)$: see the discussion below. Nevertheless, our attack is polynomial and very practical for cryptographic sized problems, as exemplified by the simulations in Table 1.
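The pool-based search can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation behind Table 1: the leakage function `f` is a hypothetical stand-in that models the random oracles $f_j$ with SHA-256, and the 16-bit secret is a toy value.

```python
import hashlib


def f(prefix):
    """One pseudo-random leakage bit per prefix (assumption: SHA-256 of the
    prefix, tagged with its length, models the random function f_j)."""
    data = len(prefix).to_bytes(4, "big") + bytes(prefix)
    return hashlib.sha256(data).digest()[0] & 1


def recover(leakage):
    """Return every bitstring compatible with the leakage, and pool sizes."""
    pool = [()]                       # start from the empty prefix
    sizes = []
    for bit in leakage:
        # Extend each candidate by 0 and 1, keep the compatible extensions.
        pool = [c + (b,) for c in pool for b in (0, 1) if f(c + (b,)) == bit]
        sizes.append(len(pool))
    return pool, sizes


secret = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1)
leakage = [f(secret[:i]) for i in range(1, len(secret) + 1)]
candidates, sizes = recover(leakage)
```

The true secret always survives the pruning, so it is guaranteed to appear in `candidates`; `sizes` records the pool size $Z_i$ at each level and can be used to observe the growth of the search space.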
Our algorithm is reminiscent of the cold boot attack of Heninger and Shacham against factorization [14] and its numerous follow-ups, such as [13,19,23]. This is interesting, as these cold boot attacks do not really have a natural polynomial-time counterpart in the discrete logarithm setting (even the attack of Poettering and Sibborn [24] is basically exponential). Applied to a leaky exponentiation algorithm, our side-channel algorithm provides such a counterpart. Moreover, as in the extensions of the original attack of Heninger and Shacham, one can consider variants of our attack model in which the leak is altered (bit flips, or analog noise, rather than erasures). We discuss such generalizations below.

Galton-Watson conditioning and search space growth
We now explain why the search space for the critical Galton-Watson search described above is found to increase quadratically (see Table 1). This is due to the fact that our Galton-Watson tree is guaranteed to have a node at depth n. In other words, the average search space size that we want to estimate is E[Z 1 + · · · + Zn | Zn ≠ 0], where Z i is the number of nodes at depth i. Our analysis relies on the following result.

Proposition 3.2. Consider a Galton-Watson process $(Z_n)_{n \ge 0}$ with offspring distribution $\{p_k\}$, which is critical, in the sense that $\sum_{k \ge 0} k p_k = 1$, and non trivial, in the sense that $p_1 \neq 1$. Denote by f the generating function of the offspring distribution:
$$f(x) = \sum_{k \ge 0} p_k x^k,$$
and assume that $f''(1) < +\infty$. Then the following asymptotic estimate holds:
$$E[Z_n \mid Z_n \neq 0] \sim \frac{f''(1)}{2}\, n.$$
Proof. Note first that under the assumptions of the theorem, we have $f(1) = 1$ (because $\{p_k\}$ is a probability distribution) and $f'(1) = 1$ (because of criticality). As a result, we claim that $f''(1) > 0$. Indeed, $f''(1) = \sum_{k \ge 2} k(k-1) p_k$ is certainly non negative, and can only vanish if $p_k = 0$ for all k ≥ 2. But if that were the case, we would get $p_1 = f'(1) = 1$, contradicting non triviality.
As a consequence, we must have $f(x) > x$ for all $x \in [0, 1)$. Indeed, the function $g(x) = f(x) - x$ satisfies $g'' = f''$, and is thus strictly convex over [0, 1]. Hence $g' = f' - 1$ is strictly increasing, and it vanishes at 1, hence $g' < 0$ over [0, 1). By the same argument, since g is strictly decreasing and vanishes at 1, this implies $f(x) > x$ for all $x \in [0, 1)$ as required. Now, let $u_n = \Pr[Z_n = 0]$ and $v_n = 1 - u_n = \Pr[Z_n \neq 0]$. By definition of the Galton-Watson process, the sequence $(u_n)$ satisfies the recurrence relation $u_0 = 0$ (the process starts from a single root) and $u_{n+1} = f(u_n)$. By the argument above, the sequence $(u_n)$ is strictly increasing, and since it is bounded by 1, it must converge to the only fixed point of f in [0, 1], which is 1. In particular, $v_n$ tends to 0, and using the Taylor expansion of f at 1, we get:
$$v_{n+1} = v_n - \frac{f''(1)}{2}\, v_n^2 + o(v_n^2).$$
Raising this relation to the power −1 yields:
$$\frac{1}{v_{n+1}} = \frac{1}{v_n} + \frac{f''(1)}{2} + o(1).$$
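The behaviour of the survival probability can be checked numerically for the offspring distribution of the 1-bit attack ($p_0 = p_2 = 1/4$, $p_1 = 1/2$). The following Python sketch iterates the recurrence $u_{n+1} = f(u_n)$ starting from $u_0 = 0$ and compares $1/v_n$ with $f''(1)\,n/2$; here $f''(1) = 2p_2 = 1/2$, so the ratio $1/(n v_n)$ is expected to approach 1/4. The function name and the horizon of 2000 generations are arbitrary choices for illustration.

```python
def survival_probabilities(n):
    """Return [v_0, ..., v_n] with v_i = Pr[Z_i != 0] for the offspring
    distribution p_0 = p_2 = 1/4, p_1 = 1/2 of the 1-bit leakage attack."""
    def f(x):
        # Generating function f(x) = 1/4 + x/2 + x^2/4.
        return 0.25 + 0.5 * x + 0.25 * x * x

    u = 0.0                 # u_0 = Pr[Z_0 = 0] = 0: the tree has a root
    values = [1.0]
    for _ in range(n):
        u = f(u)            # u_{i+1} = f(u_i)
        values.append(1.0 - u)
    return values


N = 2000
v = survival_probabilities(N)
ratio = 1.0 / (N * v[N])    # should be close to f''(1)/2 = 0.25
```

Since $E[Z_n] = 1$, the quantity $1/v_n$ is also the expected number of nodes at depth n conditioned on survival, which grows linearly in n.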

Dealing with imperfect information
A central argument in the polynomial-time execution of the algorithm in the noiseless case is that every additional bit of information allows us to (safely) prune on average half of the candidates, exactly compensating for the tree's growth. This argument no longer holds under imperfect information, as pruning may eliminate correct candidates (or conversely keep incorrect ones in the pool), resulting in a blowup and/or incorrect results. If we have less than one bit of information at each generation, the population grows exponentially (i.e. µ > 1). In Table 1 we indicate the probability p of the leaked bit being correct, and simulate the corresponding blowup when p is slightly less than 100%.

Recovery with imperfect k-bit leakage
We now turn our attention to a more general setting, in which the leakage at each iteration consists of k-bit values, but is not always recovered exactly. More precisely, we consider two possible models: in the simpler "all-or-nothing" model, for each step j, the k-bit leakage at step j is recovered in its entirety with some probability p, and not at all otherwise; in the more involved "bitwise" model, each bit of leakage independently has probability p of being recovered. The algorithm of Figure 2 extends directly to both settings: pruning is simply done with all the available information at each step instead of just a single bit. The question is then to determine under which condition on k and p the number of candidates and the running time are expected to remain polynomial. We address this question below.
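Pruning with whatever information is available at each step can be sketched as follows in Python. Each observation is represented as a pair `(mask, value)`, where the bits set in `mask` are the leakage bits that were recovered: the all-or-nothing model uses only the empty and the full mask, while the bitwise model allows any mask. The SHA-256-based `leak` function, the fixed mask pattern and the 12-bit secret are hypothetical choices made for illustration.

```python
import hashlib


def leak(prefix, k):
    """Hypothetical k-bit leakage of a prefix, modelled with SHA-256."""
    data = len(prefix).to_bytes(4, "big") + bytes(prefix)
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << k) - 1)


def recover_partial(observations, k):
    """observations[i] = (mask, value); only the bits set in mask are known."""
    pool = [()]
    for mask, value in observations:
        # Keep an extension when it agrees with the observation on every
        # recovered bit; unknown bits are ignored through the mask.
        pool = [c + (b,) for c in pool for b in (0, 1)
                if (leak(c + (b,), k) ^ value) & mask == 0]
    return pool


K = 4
secret = (1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1)
masks = [0b1111, 0b0000, 0b0110] * 4          # full, erased, partial
observations = [(m, leak(secret[:i + 1], K) & m) for i, m in enumerate(masks)]
pool = recover_partial(observations, K)
```

An iteration with an empty mask doubles the pool, and an iteration with j known bits keeps each extension with probability $2^{-j}$, which is the trade-off analyzed in the two models.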

All-or-nothing model
In the all-or-nothing model, at each iteration, all k bits of leakage are recovered exactly with some probability p, and with probability 1 − p, no information is available at all.
Mathematically, this yields a generation-dependent Galton-Watson process: with probability p there will be an average of $2^{1-k}$ offspring, and with probability 1 − p there will be an average of 2 offspring. It turns out that the expected number of nodes at depth n is then exactly the product of the expected numbers of offspring at each generation [11, Proposition 4]. As a result, after n iterations, of which ℓ were successful extractions (we learned all k bits) and n − ℓ failed (we learned nothing), the expected number of nodes at depth n is $\mu_\ell = (2^{-k+1})^{\ell} \cdot 2^{n-\ell}$. Naturally, we do not know ℓ, so that we have to compute the weighted sum over all possible values:
$$\mu = \sum_{\ell=0}^{n} \binom{n}{\ell} p^{\ell} (1-p)^{n-\ell} \mu_\ell = \bigl(2^{1-k} p + 2(1-p)\bigr)^n = \mu_0^n, \quad \text{where } \mu_0 = 2^{1-k} p + 2(1-p). \qquad (2)$$
Our algorithm runs in polynomial time exactly when $\mu_0 \le 1$, i.e., if and only if $p \ge p_{crit}$ where $p_{crit} = 2^{k-1}/(2^k - 1)$. For instance, in the 4-leak model, this requires a success probability of at least 8/15 ≈ 53%. Simulation results for this scenario are given in Table 2.
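The threshold can be verified with exact rational arithmetic. The short Python sketch below computes the per-generation mean $2^{1-k} p + 2(1-p)$ and the critical probability $2^{k-1}/(2^k - 1)$; the function names are illustrative only.

```python
from fractions import Fraction


def mean_offspring_all_or_nothing(p, k):
    """Average number of children per node: 2^(1-k) with probability p
    (all k bits recovered), 2 with probability 1 - p (nothing recovered)."""
    return p * Fraction(2, 2**k) + 2 * (1 - p)


def p_crit_all_or_nothing(k):
    """Smallest p for which the mean offspring is at most 1."""
    return Fraction(2 ** (k - 1), 2**k - 1)


p4 = p_crit_all_or_nothing(4)                  # Fraction(8, 15)
mu_at_threshold = mean_offspring_all_or_nothing(p4, 4)
```

For k = 1 the threshold is p = 1, consistent with the observation that any imperfection in the 1-bit leak makes the generic search exponential.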

Bitwise model
In the bitwise model, at each iteration, each bit of leakage (among k bits in total) is recovered independently with probability p.
Assume that we recover $j_1$ bits at iteration 1, $j_2$ bits at iteration 2 and so on. The expected number of leaves in that case is then $2^{1-j_1} \cdots 2^{1-j_n}$. Now since we learn each bit with probability p, the probability of recovering j bits of leakage at any fixed iteration is exactly
$$\binom{k}{j} p^j (1-p)^{k-j}.$$
As a result, the expected number of leaves is
$$\Bigl(\sum_{j=0}^{k} \binom{k}{j} p^j (1-p)^{k-j}\, 2^{1-j}\Bigr)^n = \bigl(2 (1 - p/2)^k\bigr)^n.$$
The smallest value of p for which our attack runs in polynomial time is therefore $p_{crit} = 2(1 - 2^{-1/k})$. For example, when k = 4, the condition is p ≳ 32%. Simulation results for this attack are given in Table 3. Note that the critical probability is significantly lower in this case, despite the fact that we obtain the same number of bits of leakage per iteration on average. This is due to the fact that we obtain usable information more often, and as a result, we can prune the search tree earlier.
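The binomial sum and its closed form $2(1 - p/2)^k$ can be cross-checked numerically. This Python sketch (function names are illustrative) evaluates the per-iteration mean both ways and computes the critical probability $2(1 - 2^{-1/k})$.

```python
from math import comb


def mean_offspring_bitwise(p, k):
    """Average number of children per node when each of the k leakage bits
    is recovered independently with probability p."""
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) * 2.0 ** (1 - j)
               for j in range(k + 1))


def p_crit_bitwise(k):
    """Smallest p for which 2 * (1 - p/2)^k is at most 1."""
    return 2 * (1 - 2 ** (-1 / k))


closed_form = 2 * (1 - 0.3 / 2) ** 4           # closed form at p = 0.3, k = 4
summed = mean_offspring_bitwise(0.3, 4)        # binomial sum at the same point
threshold = p_crit_bitwise(4)                  # roughly 0.318
```

Comparing `threshold` with the all-or-nothing value 8/15 for the same k = 4 illustrates how much lower the bitwise threshold is.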

Application to (EC)DSA
What we have described so far is a generic attack against cryptographic schemes. Particular schemes among them, however, can be vulnerable to stronger attacks. For example, it is possible to efficiently break (EC)DSA (or more generally Schnorr-like signatures) in the 1-leak model, even if the presence of noise causes the corresponding leakage bit to be recoverable with probability p < 1 (this is in stark contrast with the generic attack above, in which a less than perfect recovery makes the search space in the 1-leak case exponentially large).
A first possible approach is to get a one-bit leak about the nonce. That bit of information can then be used to recover the secret signing key given sufficiently many signatures, using the statistical attack of Bleichenbacher [5,21]. The attack with a single bit of information requires many signatures, but Aranha et al. [1] have shown it to be practical at least against 160-bit groups. And if the leakage is recoverable only with probability p, the same attack can be mounted by simply increasing the number of signatures by a factor of 1/p and throwing away those for which the bit is unrecoverable.
A more efficient approach is to combine this leak with nonce-based lattice attacks on (EC)DSA [15,22]. In the 1-leak model, where we recover the leakage bit with probability p < 1, although we do not recover the entire nonce we are still able to learn a prefix of it (i.e. the MSBs of the nonce, for a left-to-right square-and-multiply) with good probability. And if we have sufficiently many signatures for which we know the MSBs of the nonce, standard lattice techniques will recover the signing key.
More precisely, consider the first bit of the nonce (the MSB), which may be 0 or 1. With probability p, we learn one leakage bit, which may itself be compatible (with equal probability) with that bit being 0, 1, or both (but not neither of them, because our search tree is conditioned on knowing that a solution actually exists). If it is both, we learn nothing, but otherwise we learn that MSB: this happens with probability 2p/3. And in that case, we can make the same argument for the second bit, and then the third, and so on. With probability at least $(2p/3)^k$, we will learn the k most significant bits of the nonce.
Now how many MSBs do we need to mount the lattice attack? This depends on the bit size n of the group, and the maximum lattice dimension $d_{\max}$ in which we can reliably find the shortest vector of a random lattice (nowadays, $d_{\max} = 100$ is a reasonable rule-of-thumb using reduction algorithms like BKZ 2.0 [7]; academic records on the SVP Hall of Fame go all the way to dimension 150 as of this writing). Indeed, it is standard that we can recover the signing key by computing the SVP in a lattice of dimension $d = n/(k - c)$, where $c = \log_2 \sqrt{\pi e/2}$, so the lowest usable k is given by $k = \lceil c + n/d_{\max} \rceil$. And we then need d signatures with k known nonce MSBs to carry out the attack. This can be obtained by collecting $m = \bigl(\tfrac{3}{2p}\bigr)^k \cdot d$ signatures with the leakage above, and keeping those for which the leakage is enough to learn the k most significant bits.
In a 256-bit group and with $d_{\max} = 100$, we have k = 4 and d ≈ 87. If the probability of learning a bit of the leakage is p = 1/2, we thus get m ≈ 7000: collecting 7000 signatures should yield enough signatures with 4 known MSBs to mount the attack and recover the signing key. This is much better than the Bleichenbacher approach, which would require billions of signatures at this group size, and a fortiori better than trying to apply the generic algorithm, since the expected size of the search space in that case is at least $\mu = (p + 2 - 2p)^n = (3/2)^{256} \approx 2^{150}$ by (2).
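The arithmetic behind these figures can be reproduced with a short Python sketch. It uses $c = \log_2 \sqrt{\pi e/2}$, $k = \lceil c + n/d_{\max} \rceil$, $d = n/(k - c)$, and divides d by the success probability $(2p/3)^k$ to obtain the number m of signatures to collect; the function name is illustrative.

```python
from math import ceil, e, log2, pi, sqrt


def lattice_attack_parameters(n, d_max, p):
    """Return (k, d, m): number of nonce MSBs needed, lattice dimension,
    and number of signatures to collect, for an n-bit group."""
    c = log2(sqrt(pi * e / 2))
    k = ceil(c + n / d_max)            # lowest usable number of known MSBs
    d = n / (k - c)                    # lattice dimension
    m = d / (2 * p / 3) ** k           # d divided by the chance of learning k MSBs
    return k, d, m


k, d, m = lattice_attack_parameters(n=256, d_max=100, p=0.5)
generic_log2 = 256 * log2(1.5)         # log2 of (3/2)^256, about 150
```

With these inputs the sketch gives k = 4, d between 86 and 87, and m slightly above 7000, in line with the rounded values quoted in the text.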
Remark 5.1. Bauer and Vergnaud [2] have recently cryptanalyzed (EC)DSA-type schemes in the presence of leakage on randomly-located bits of the nonces. Although seemingly relevant, this attack does not apply to our setting, because the one bit leakage will not reveal many randomly located bits of the nonce: to learn a bit with certainty, it should be the only bit compatible when extending all previous candidate prefixes, and this is exponentially unlikely to happen when the set of candidate prefixes is already large.

Countermeasures and perspectives
In this section, we discuss the effectiveness of various possible side-channel countermeasures used in implementations of discrete logarithm-based cryptosystems with respect to the attack considered in this paper. We also suggest possible perspectives for further work.

Protecting exponentiation algorithms
As we have seen, our algorithm gives rise to efficient side-channel attacks on exponentiation algorithms in cryptographic groups (assuming of course that the "prefix-dependent leakage" it relies on can be collected in practice). Interestingly, the corresponding side-channel attack is unaffected by several large classes of common side-channel countermeasures deployed in implementations of discrete logarithm-based cryptosystems. Indeed, a first important family of countermeasures used in that setting includes modifications to the exponentiation algorithm that make it regular, in the sense that the same types of operations are carried out at each iteration, regardless of whether the bit of the secret at that iteration is 0 or 1. This includes the square-and-multiply-always algorithm [8], the Montgomery ladder [20], and the use of elliptic curves with complete addition laws [3,4,25]. None of these approaches thwarts our attack, since it does not rely on the particular control flow of the algorithm but only on essentially arbitrary leakage information on intermediate values.
Another family of side-channel countermeasures used in exponentiation algorithms relies on randomizing the secret exponent. This includes exponent blinding [10, Section 5.1], where a random multiple of the group order is added to the exponent, and exponent splitting [6,9], where the exponent s is written as a random sum $s_0 + \cdots + s_d$ modulo the group order, and the exponentiation is computed as $g^s = g^{s_0} \cdots g^{s_d}$. Again, neither of these approaches thwarts our attack. Indeed, it is sufficient to recover the longer blinded exponent in full in the case of blinding, or all of the additive shares in the case of splitting; this increases the complexity of the attack slightly, but the recovery algorithm remains polynomial time (at least for a constant number of additive shares in the case of splitting).
A countermeasure that does work, however, consists in randomizing the base point of the exponentiation [16]. Indeed, to fix ideas, if for example $g^x$ is computed as $g_1^x \cdot g_2^x$ where $g = g_1 \cdot g_2$ is a random decomposition, then it is no longer possible for the attacker to recompute the leakage functions $f_j$ locally and therefore to build the corresponding Galton-Watson tree. This applies similarly to other base point randomization techniques using, e.g., isomorphic elliptic curves or field isomorphisms, as well as related techniques like the use of randomized projective coordinates [10, Section 5.3]. In all of these cases, our approach fails because, from the adversary's viewpoint, the leakage functions $f_j$ are no longer deterministic: they depend probabilistically on the randomness used to blind the base point of the exponentiation (resp. rerandomize projective coordinates).

Open problems
There are several generalizations of the problems considered in this paper that would be natural to consider and explore in further work.
A first one is the extension to algorithms that iterate over secrets a few bits at a time instead of one by one. This is typically the case for k-ary or window-based exponentiation algorithms. This also includes cryptographic computations based on non-binary secrets (such as the use of signed binary expansions like non-adjacent forms). Our approach should naturally generalize to those settings, but the leakage bounds to achieve polynomial time recovery are of course different.
It would also be interesting to consider more general noise models for the leakage. For example, one could ask how to solve the following problem, with an arbitrary noise distribution: with the same notation as Definition 2.1, recover the secret s from:
$$f(s) \oplus \vec{e} = \bigl(f_1(s_{[1:1]}) \oplus e_1, \dots, f_n(s_{[1:n]}) \oplus e_n\bigr),$$
where $\vec{e} = (e_1, \dots, e_n)$ is a vector of independent identically distributed noise values sampled from some fixed distribution χ over $\{0,1\}^k$. Of course, for some distributions, the problem is clearly intractable (e.g. when χ is uniform), and in general one can only hope to recover the secret with some probability, but the tree-based approach should generalize naturally, provided that it is combined with a suitable algorithm for pruning branches that have a low probability of being consistent with the leakage, instead of branches that are literally incompatible. Quantifying that intuition and obtaining concrete bounds to ensure expected polynomial time recovery with high probability are left as open problems for future work.