# Designing Efficient Dyadic Operations for Cryptographic Applications

Gustavo Banegas, Paulo S. L. M. Barreto, Edoardo Persichetti and Paolo Santini

# Abstract

Cryptographic primitives from coding theory are some of the most promising candidates for NIST’s Post-Quantum Cryptography Standardization process. In this paper, we introduce a variety of techniques to improve operations on dyadic matrices, a particular type of symmetric matrices that appear in the automorphism group of certain linear codes. Besides the independent interest, these techniques find an immediate application in practice. In fact, one of the candidates for the Key Exchange functionality, called DAGS, makes use of quasi-dyadic matrices to provide compact keys for the scheme.

MSC 2010: 11T71; 94A60

## 1 Introduction

Post-Quantum Cryptography is the area of research that investigates cryptographic primitives that are deemed secure against attackers equipped with quantum technology. These include schemes based on a variety of mathematical problems, such as finding short vectors in a lattice, or decoding random linear codes. The latter is known as Code-based Cryptography and it relies more or less directly on the Syndrome Decoding Problem [4], which shows no vulnerabilities to quantum attacks. The first code-based scheme was introduced by McEliece in 1978 [10] and has resisted cryptanalysis, in its original form, for nearly 40 years.

McEliece’s cryptosystem has often been ignored in favor of schemes based on number theory problems (such as RSA or El Gamal), mainly due to the size of its public key, which was deemed too large for practical use (especially at the time). However, Shor’s algorithm [14] shows that, once quantum computers of an appropriate size are available, the cryptosystems currently in use will become obsolete. It is therefore important to offer a credible alternative to current cryptography, and, with this in mind, NIST has recently launched a call for papers to standardize the public-key primitives of the future[1].

Among the code-based candidates for NIST’s call, DAGS [3] is a Key Encapsulation Mechanism (KEM) that uses Quasi-Dyadic (QD) matrices to considerably reduce the size of the public key, following a McEliece-like approach. The proposal builds on a line of work initiated by Misoczki and Barreto [11] and subsequently developed by Persichetti in [6, 12].

### Our Contribution

We analyze two separate aspects of dyadic operations. First, we present three different algorithms for that are aimed specifically at computing multiplication of dyadic matrices. These are, respectively, a “standard” approach that makes use of dyadic signatures, a specialized Karatsuba-like algorithm, and a procedure based on the Fast Walsh-Hadamard Transform (FWHT) [7], also called dyadic convolution. We analyze the performance of all three methods and report our timings.

As a second contribution, we describe a procedure that applies the LUP decomposition [5] to the dyadic case. The method effectively factors every quasi-dyadic matrix into a product of two triangular matrices and a permutation matrix. This leads to the possibility of a very efficient algorithm for computing the inverse of a matrix, which is particularly useful in code-based cryptography, for instance for computing the systematic form of a parity-check (or generator) matrix. According to our measurements, this improved inversion procedure is extremely fast, and provides a very large speedup during DAGS Key Generation.

### Organization of the Paper

This paper is organized as follows. We start with some preliminary definitions in Section 2. We then present our main contributions: the various multiplication techniques are described in Section 3 and the improved inversion algorithm is presented in Section 4. We conclude by showing the results obtained when applying our techniques to DAGS; this is done in Section 5.

## 2 Preliminaries

We introduce dyadic matrices and describe some of their general properties.

## Definition 2.1

Given a ring 𝓡 and a vectorh = (h0, h1, ⋯, hn−1) ∈ 𝓷R, withn = 2rfor somer ∈ ℕ, thedyadicmatrixΔ(h) ∈ 𝓡n×nis the symmetric matrix with componentsΔij = hijwherestands for bitwise exclusive-or. Such a matrix is said to haveorderr. The sequencehis calledsignatureof the matrixΔ(h), and corresponds to its first row. The set of dyadicn × nmatrices over 𝓡 is denotedΔ(𝓡n).

One can alternatively characterize a dyadic matrix recursively: any 1 × 1 matrix is dyadic of order 0, and any dyadic matrix M of order r > 0 has the form

M=ABBA(1)

where A and B are two dyadic matrices of order r − 1. In other words, Δ(𝓡n) = Δ(Δ(𝓡n/2)).

## Definition 2.2

A dyadic permutation is a dyadic matrixΠiΔ({0, 1}n) characterized by the signatureπi = (δij | j = 0, …, n − 1), whereδijis the Kronecker delta (henceπicorresponds to thei-th row or column of the identity matrix).

A dyadic permutation is clearly an involution, i.e. (Πi)2 = I. The i-th row, or equivalently the i-th column, of the dyadic matrix defined by a signature h can be written as Δ(h)i = hΠi.

A dyadic matrix can be efficiently represented by its signature; in particular, all the operations between dyadic matrices can be referred only to the corresponding signatures. Indeed, for any two length-n vectors a, b ∈ 𝓡, we have:

Δ(a)+Δ(b)=Δ(a+b)(2)

which means that, given two dyadic matrices A and B, with respective signatures a and b, their sum is the dyadic matrix described by the signature a+b.

In an analogous way, the multiplication between dyadic matrices can be done by considering only the corresponding signatures; we will discuss efficient ways for computing multiplications in Section 3.

## Algorithm 1

 INPUT: r ∈ ℕ, n = 2r and a, b ∈ 𝔽n. OUTPUT: c ∈ 𝔽n such that Δ(c) = Δ(a)Δ(b). 1: c ← vector of length n, initialized with null elements. 2: c0 ← a0 ⋅ b0 3: fori ← 1 ton − 1 do 4:  c0 ← c0 + aibi 5:  i(b) ← binary representation of i, using n bits. 6:  for {j = 0, 1, ⋯, n − 1} do 7:   j(b) ← binary representation of j, using n bits. 8:   π(b) ← i(b)⊕ j(b) 9:   π ← conversion of π(b) into an integer. 10:   ci ← ci + aibπ 11:   end for 12:   end for 13: return c

Moreover, it is easy to see that the inverse of a dyadic matrix is also a dyadic matrix; this can be easily computed using Sylvester-Hadamard matrices (see Section 3.2).We will expand on this in Section 4.

Finally, we introduce a relaxed notion of dyadicity, which will be useful throughout the paper.

## Definition 2.3

A quasi-dyadic matrix is a (possibly non-dyadic) block matrix whose elements are dyadic submatrices, i.e. an element ofΔ(𝓡n)d1×d2.

## 3 Multiplication of Dyadic Matrices

In this section we consider different methods for computing the multiplication between two dyadic matrices. In fact, we have just mentioned how some matrix operations, like the sum or the inversion, can be efficiently performed in the dyadic case just by considering the signatures. Multiplication can be strongly improved with similar methods, which exploit the particular structure of such matrices. In particular, we analyze three different algorithms and provide estimations for their complexities; we then compare the performance of the various algorithms.

For ease of notation, we will refer to the two n × n matrices that we want to multiply simply as A and B, with a = [a0, a1, ⋯, an−1] and b = [b0, b1, ⋯, bn−1] being the respective signatures. Maintaining the same notation, the product matrix C = AB, which is also dyadic, will have signature c = [c0, c1, ⋯, cn−1].

In particular, we focus on the special case of quasi-dyadic matrices with elements belonging to a field 𝔽 of characteristic 2.

### 3.1 Standard Multiplication

The first algorithm we analyze is described in Algorithm 1; we refer to it as the standard multiplication. The element of C in position (i, j) is obtained as the multiplication between the i-th row of A and the j-th column of B. Since dyadic matrices are symmetric, this is equivalent to the inner product between the i-th row of A and the j-th one of B. The signature c (i.e., the first row of C) is obtained by inner products involving only a (i.e., the first row of A). Thus, we can just construct the rows of B, by permutations of the elements in b, and then compute the inner products.

The complexity of the algorithm is due to two different types of operations:

1. In order to construct the rows of B, we need the indexes of the corresponding permutations. Each index is computed as the modulo 2 sum of two binary vectors of length r, so can be obtained with a complexity of r binary operations. Thus, considering that we need to repeat this operation for 2r − 1 rows (for the first one, no permutation is needed), the complexity of this procedure can be estimated as r ⋅ 2r ⋅ ( 2r − 1 ).

2. Each element of c is obtained as the inner product between two vectors of 2r elements, assuming values in 𝔽. This operation requires 2r multiplication and 2r − 1 sums in 𝔽. If we denote as Cmult and Csum the costs of, respectively, a multiplication and a sum in 𝔽, the total number of binary operations needed to compute 2r inner products can be estimated as 22rCmult + (22r − 2r)⋅ Csum.

The complexity of a standard multiplication between two dyadic signatures can be estimated as:

Cstd=r22r2r+22rCmult+(22r2r)Csum(3)

### Definition 3.1

The dyadic convolution of two vectorsa, b ∈ 𝓡, denoted byab, is the unique vector of 𝓡 such thatΔ(ab) = Δ(a) Δ(b).

Of particular interest to us is the case where 𝓡n is actually a field 𝔽. Dyadic matrices over 𝔽 form a commutative subring Δ(𝔽n) ⊂ 𝔽n×n, and this property gives rise to efficient arithmetic algorithms to compute the dyadic convolution. In particular, we here consider the fast Walsh-Hadamard transform (FWHT), which is well known [7] but seldom found in a cryptographic context. We describe it here for ease of reference. We firstly recall the FWHT for the case of a field 𝔽 such that char(𝔽) ≠ 2, and then describe how this technique can be generalized to consider also the case of char(𝔽) = 2 (which, again, is the one we are interested in).

### Definition 3.2

Let 𝔽 be a field with char(𝔽) ≠ 2. The Sylvester-Hadamard matrix Hr ∈ 𝔽nis recursively defined as

H0=1,Hr=Hr1Hr1Hr1Hr1,r>0.

One can show by straightforward induction that Hr2=2rIr and hence Hr1=2rHr, which can also be expressed recursively as

H01=1,Hr1=12Hr11Hr11Hr11Hr11,r>0.

### Lemma 3.1

Let 𝔽 be a field with char(𝔽) ≠ 2. IfM ∈ 𝔽n×nis dyadic, thenHr1MHris diagonal.

### Proof

The lemma clearly holds for r = 0. Now let r > 0, and write

M=ABBA

where A and B are dyadic. It follows that

Hr1MHr=12Hr11Hr11Hr11Hr11ABBAHr1Hr1Hr1Hr1=Hr11M+Hr1OOHr11MHr1,

and since both M+ = A + B and M = AB are dyadic, Hr11M+Hr1andHr11MHr1 are diagonal by induction, as is thus also Hr1MHr. □

Lemma 3.1 establishes that Sylvester-Hadamard matrices diagonalize all dyadic matrices. In particular, the factors in a product of dyadic matrices are thus simultaneously diagonalized, suggesting an efficient way to carry out the matrix multiplication, namely, computing Hr1(MN)Hr=(Hr1MHr)(Hr1rHr) given the diagonal forms Hr1MHrandHr1rHr of two dyadic matrices M and N requires only n multiplications of the diagonal elements.

In fact, it is not necessary to compute Hr1MHr in full to obtain the diagonal form of M, as indicated by the following result:

### Lemma 3.2

Let 𝔽 be a field with char(𝔽) ≠ 2. The diagonal form of a dyadic matrixM ∈ 𝔽n×nis the first line ofMHr. In other words, Hr1Δ(h) Hr = diag(hHr).

### Proof

The lemma clearly holds for r = 0. Now let r > 0, and with the notation of Lemma 3.1, the diagonal of Hr1MHr is the concatenation of the diagonals of Hr11M+Hr1andHr11MHr1. Similarly, since

MHr=ABBAHr1Hr1Hr1Hr1=M+Hr1MHr1M+Hr1MHr1,

the first line of MHr is the concatenation of the first lines of M+Hr−1 and MHr−1, which by induction are the diagonals of Hr11M+Hr1andHr11MHr1 respectively, yielding the claimed property. □

### Corollary 3.2.1

Computingcsuch thatΔ(a) Δ(b) = Δ(c) involves only three multiplications of vectors by Sylvester-Hadamard matrices.

### Proof

By Lemma 3.2, diag(aHr) diag(bHr) = (Hr1Δ(a)Hr)(Hr1Δ(b)Hr)=Hr1Δ(a)Δ(b)Hr=Hr1Δ(c)Hr = diag(cHr). Now simply retrieve c from z = cHr as c=zHr1=2rzHr.

The structure of Sylvester-Hadamard matrices leads to an efficient algorithm to compute aHr for a ∈ 𝔽n, which is known as the fast Walsh-Hadamard transform. Let [a0, a1] be the two halves of a. Thus

aHr=[a0,a1]Hr1Hr1Hr1Hr1=[(a0+a1)Hr1,(a0a1)Hr1].

This recursive algorithm, which can be easily written in purely sequential fashion (Algorithm 2), has complexity Θ(n log n), specifically, rn additions or subtractions in 𝔽. It is therefore somewhat more efficient than the fast Fourier transform, which involves multiplications by n-th roots of unity, when they are available at all (otherwise working in extension fields is unavoidable, and more expensive).

The product of two dyadic matrices Δ(a) and Δ(b), or equivalently the dyadic convolution ab, can thus be efficiently computed as described in Algorithm 3. The total cost is 3rn additions or subtractions and 2n multiplications (half of these by the constant 2r = 1/n) in 𝔽, with an overall complexity Θ(n log n). Notice that this is also the complexity of computing det Δ(a).

The fast Walsh-Hadamard transform itself is not immediately possible on fields of characteristic 2, since it depends on Sylvester-Hadamard matrices which must contain a primitive square root of unity. Yet the FWHT algorithm can be lifted to characteristic 0, namely, from 𝔽2 = ℤ/2ℤ to ℤ, or more generally from 𝔽2N = (ℤ/2ℤ)[x]/P(x) (for some irreducible P(x) of degree N) to ℤ[x]. Algorithm 3 can then be applied, and its output mapped back to the relevant binary field by modular reduction. This incurs a space expansion by a logarithmic factor, though. Each bit from 𝔽2 is mapped to intermediate values that can occupy as much as 3r + 1 bits; correspondingly, each element from 𝔽2N is mapped to intermediate values that can occupy as much as (3r + 1)N bits. Thus the component-wise multiplication in Algorithm 3 becomes more complicated to implement for large N. However, the method remains very efficient for the binary case as long as each expanded integer component fits a computer word. For a typical word size of 32 bits and each binary component being expanded by a factor of 3r + 1, this means that blocks as large as 1024 × 1024 can be tackled efficiently. On more restricted platforms where the maximum available word size is 16 bits, dyadic blocks of size 32 × 32 can still be handled with relative ease.

### Algorithm 2

 INPUT: r ∈ ℕ, n = 2r and a ∈ 𝔽n with char(𝔽) ≠ 2. OUTPUT: aHr. 1: v ← 1 2: forj ← 1 tondo 3:  w ← v 4:  v ← 2v 5:  fori ← 0 ton − 1 byvdo 6:   forl ← 0 tow − 1 do 7:    s ← ai+l 8:    q ← ai+l+w 9:    ai+l ← s + q 10:   ai+l+w ← s − q 11:   end for 12:  end for 13: end for 14: return a

### Algorithm 3

 INPUT: r ∈ ℕ, n = 2r and a, b ∈ 𝔽n with char(𝔽) ≠ 2. OUTPUT: a△b ∈ 𝔽n such that Δ(a) Δ(b) = Δ(a△b). 1: c ← vector of length n, initialized with null elements. 2: c̃ ← vector of length n, initialized with null elements. 3: Compute ã ← aHr via Algorithm 2. ▹ expansion 1 → r + 1 4: Compute b̃ ← bHr via Algorithm 2. ▹ expansion 1 → r + 1 5: forj ← 0 ton − 1 do 6:  c̃j ← ãjb̃j ▹ expansion r + 1 → 2r + 1 7: end for 8: Compute c ← c̃Hr via Algorithm 2. ▹ expansion 2r + 1 → 3r + 1 9: c ← 2−rc 10: return c

### 3.3 Karatsuba Multiplication

In this section we propose a method which is inspired by Karatsuba’s algorithm for the multiplication of two integers [9]. Let us denote by a0 and a1, respectively, the first and second halves of a, i.e.:

a0=a0,a1,,an21a1=an2,an2+1,,an1.(4)

The same notation is used for b0 and b1 and c0 and c1, corresponding to the halves of B and C. Some straightforward computations show that the following relations hold:

c0=a0b0+a1b1c1=a0+a1b0+b1+c0(5)

The iterative application of equation (5) allows to compute multiplications between dyadic matrices of any size. Let us denote as Cmul(2z)andCsum(2z) the complexities of a multiplication and a sum between two signatures of length 2z. For the sum of two dyadic signatures of size 2z we have:

Csum(2z)=2zCsum,(6)

where Csum again denotes the complexity of a sum in the finite field.

The complexity of this algorithm can thus be estimated as:

CKar=3Cmul(2r1)+4Csum(2r1)==3Cmul(2r1)+42r1Csum==33Cmul(2r2)+4Csum(2r2)+42r1Csum==33Cmul(2r2)+42r2Csum+42r1Csum==32Cmul(2r2)+432r2+2r1Csum==33Cmul(2r3)+4322r3+32r2+2r1Csum====3rCmul+4j=1r3j12rjCsum==3rCmul+432rj=1r32jCsum(7)

Taking into account the well known sum of a geometric series, we have:

j=1r32j=1+j=0r32j==1+1(32)r+1132=3r+12r3.

Considering this result, equation (7) leads to:

CKar=3rCmul+43r2rCsum(8)

In this section we propose an efficient algorithm for computing the inverse of quasi-dyadic matrices. The algorithm in principle is targeted to matrices that are not fully dyadic (even though, obviously, they have to be square). This is because, while it is of course possible to apply our procedure to fully dyadic matrices, these can in general be inverted much more easily, as we will see next.

To begin, remember that by definition of a quasi-dyadic matrix (Definition 2.3) we mean an element of Δ(𝓡n)d1×d2.

The inverse of a dyadic matrix (i.e. d1 = d2 = 1) can be efficiently computed, using only the signature, as described by the following Lemma.

### Lemma 4.1

Letn = 2rforr ∈ ℕ and letΔ(a) ∈ 𝓡n×nbe a dyadic matrix with signaturea. Then the inverseΔ(a)−1is the dyadic matrixb=12rb~Hr,whereis the vector such that diag() = [diag(aHr)−1.

### Proof

We have Δ(b)Δ(a) = In = Δ([1, 0, ⋯, 0]). The diagonal form of In corresponds to the first row of the product InHr, and so it is equal to the first row of Hr, that is the length-n vector made of all ones. According to Corollary 3.2.1, we can write:

diag(b)diag(a)=diag([1,1,,1]).

We then define aHr = [λ0, λ1, ⋯, λn−1], and obtain:

diag(b)=diag([1,1,,1])diag1(a)=diag([λ01,λ11,,λn11]).

Because of Lemma 3.2, we finally have:

b=diag1(a)Hr1=12rdiag1(a)Hr.

As we mentioned before, the above Lemma yields a very simple way for computing the inverse of a dyadic matrix: given a signature a, we just need to compute its diagonalized form as aHr, compute the reciprocals of its elements and put it in a vector . Finally, the inverse of Δ(a) can be obtained as 12rb~Hr. This property also leads to a very simple way to check the singularity of Δ(a): if its diagonalized form contains some null elements, then it is singular.

We now focus on the case of dyadic matrices over a field 𝔽 with characteristic 2. One can show by induction that in such a case a dyadic matrix Δ(a) of dimension n satisfies Δ(a)2 = (∑iai)2I, and hence its inverse, when it exists, is Δ(a)−1 = (∑iai)−2Δ(a), which can be computed in O(n) steps since it is entirely determined by its first row. It is equally clear that det Δ(a) = (∑iai)n, which can be computed with the same complexity (notice that raising to the power of n = 2r only involves r squarings). Basically, verifying whether a dyadic matrix has full rank or not can be easily done by checking whether the sum of the elements of the signature equals 0.

Consider a quasi-dyadic matrix M. Since the matrix has to be square, we have d1 = d2 = d, and the matrix has dimension dn × dn. Such a matrix can be compactly represented just by the signatures of the dyadic blocks. To simplify notation, we can denote the signature of the dyadic-block in position (i, j) as i,j, and store all such vectors in a matrix ∈ 𝓡d×dn:

M^=m^0,0m^0,1m^0,d1m^1,0m^1,1m^1,d1m^d1,0m^d1,1m^d1,d1.(9)

We focus again on the special case of quasi-dyadic matrices over a field 𝔽 with characteristic 2.

The LUP decomposition is a method which factorizes a matrix M as LUP, where L and U are lower triangular and upper triangular matrices, respectively, and P is a permutation.

Exploiting this factorization, the inverse of M can thus be expressed as:

M1=P1U1L1.(10)

The advantage of this method is that the inverses appearing in (10) can be easily computed, because of their particular structures. In fact, the inverse of an upper (lower) triangular matrix is obtained via a simple backward (forward) substitution procedure, while the inverse of P is its transpose.

In some cases, applying a block-wise LUP decomposition might lead to some complexity reduction; for instance, see [13] for the inversion of a sparse matrix. Here, we consider the case of a quasi-dyadic matrix; the corresponding procedure is shown in Algorithm 4.

### Algorithm 4

LUP Decomposition of a Quasi-Dyadic Matrix

 INPUT: d, r ∈ ℕ, n = 2r and M̂ ∈ 𝔽d×dn with char(𝔽) = 2. OUTPUT: M̂ ∈ 𝔽d×dn, P̂ ∈ ℕd. 1: P̂ ← [0, 1, ⋯, d − 1] 2: u ← 0 3: forj ← 0 tod − 1 do 4:  Update u, M̂ and P̂ via Algorithm 5. ▹ Pivoting of the signatures in the j-th column 5:  ifu = 0 then 6:   returnu ▹ M̂ is singular 7:  end if 8:  fori ← j + 1 toddo 9:   m̂i,j ← m^i,jm^j,j−1 10:  end for 11:  fori ← j + 1 tod − 1 do 12:   forl ← j + 1 tod − 1 do 13:    m̂i,l ← m̂i,l+m̂i,jm̂j,l 14:   end for 15:  end for 16: end for returnM̂, P̂

Our proposed procedure consists in using a block decomposition, which works directly on the signatures, in order to exploit the simple and efficient algebra of dyadic matrices. The operations in Algorithm 4 only refer to the signatures in : for instance, the expression i, ji, l means the product between the dyadics having as signatures i, j and i, l. This choice may result in some abuse of notation, but is useful to emphasize the fact that, as we have explained in the previous sections, operations with dyadics can be efficiently computed just by taking into account their signatures. It can be easily shown that, for a quasi-dyadic matrix, its factors L, U and P are in quasi-dyadic form as well: as we have done for the matrix M, we refer to their compact representations as , Û and , respectively.

The algorithm takes as input a matrix , as in (9), and computes its LUP factorization; outputs of the algorithm are the modified matrix , having as elements the ones of its factors and Û, and the permutation . As in (9), we denote as i,j the signature in position (i, j) in the output matrix . The matrices and Û can then be expressed as:

L^=1^0^0^0^m^1,01^0^0^m^2,0m^2,11^0^m^d1,0m^d1,1m^d1,21^,U^=m^0,0m^0,1m^0,2m^0,d10^m^1,1m^1,2m^1,d10^0^m^2,2m^2,d10^0^0^m^d1,d1(11)

where and denote, respectively, the signature of the identity matrix and the one of the null matrix (i.e. the length-k vectors [1, 0, ⋯, 0] and [0, 0, ⋯, 0]).

The matrix is represented through a length-d vector [p0, p1, ⋯, pd−1], containing a permutation of the integers [0, 1, ⋯, d − 1]; the rows of get permuted according to the elements of . In particular, the elements of are obtained through a block pivoting procedure, which is described in Algorithm 5.

### Algorithm 5

Block pivoting

 INPUT: d, j, r ∈ ℕ, n = 2r, P̂ ∈ ℕd and M̂ ∈ 𝔽d×dn with char(𝔽) = 2, . OUTPUT: u ∈ ℕ. 1: u ← 0 2: i ← j 3: whilei ≤ d − 1 4:  w ← sum(m̂i,j) ▹ Sum of the elements in m̂i,j 5:  ifw = 0 then 6:   z ← pj 7:   pj ← pi 8:   pi ← z 9:   forl ← 0 tod − 1 do 10:    z ← m̂j,l 11:    m̂j,l ← m̂i,l 12:    m̂i,l ← z 13:    i ← i + 1 14:  end for 15:  else 16:    i ← d 17:    u ← 1 18:  end if 19: end while 20: return u

This function takes as input , and an integer j, and searches for a pivot (i.e., a non singular signature) in the j-th column of , starting from j,j, and places it in position (j, j). As the procedure goes on, every time a singular signature is tested, the rows of get permuted; the elements of are accordingly modified. If the j-th column contains all singular blocks, this means that the matrix is singular; in such a case, this event is notified by setting u = 0.

We point out that, for the matrices we are considering, we expect Algorithm 4 to be particularly efficient. First of all, as we have already said, this is due to the possibility of efficiently performing operations involving dyadic matrices; in addition, the dyadic structure should also speed-up the pivoting procedure. In fact, we can consider a signature in as a collection of k random elements picked from GF(2N): thus, their sum can be assumed to be a random variable with uniform distribution among the elements of the field GF(2N). So, the probability of it being equal to 0, which corresponds to the probability of the corresponding signature to be singular, equals 2m. This probability gets lower as m increases: this fact means that the expected number of operations performed by Algorithm 5 should be particularly low. Basically, most of the times the function will just compute the sum of the elements in j,j and verify whether it is null or not.

Once the factorization of has been obtained, we just need to perform the computation of M−1 through (10). Since the inverse of a triangular matrix maintains the original triangular structure, the computation of the inverses −1 and Û−1 can be efficiently performed. A possible way for computing these matrices is to store the elements of both matrices in just one output matrix . We do this in Algorithm 6.

### Algorithm 6

Computation of

 INPUT: d, r ∈ ℕ, n = 2r and M̂ ∈ 𝔽d×dn with char(𝔽) = 2. OUTPUT: T̂ ∈ 𝔽d×dn. 1: T̂ ← Îd 2: forj ← 0 tod − 1 do 3:  fori ← j + 1 tod − 1 do 4:   forl ← jtoi − 1 5:    t̂i,j ← t̂i,j + m̂i,kt̂k,j 6:   end for 7:  end for 8:  fori ← jtod − 1 9:   forl ← jtoi − 1 do 10:    t̂j,i ← t̂j,i + m̂k,it̂j,k 11:   end for 12:   t^j,i←t^j,im^i,i−1 13:  end for 14: end for 15: return T̂

The matrix Îd is the compact representation of a dn × dn identity matrix, and so is composed of signatures δi,j, where δi,j denotes the Kronecker delta.

If we denote as i,j the signature in position (i, j), we have:

L^1=1^0^0^0^t^1,01^0^0^t^2,0t^2,11^0^t^d1,0t^d1,1t^d1,21^,U^1=t^0,0t^0,1t^0,2t^0,d10^t^1,1t^1,2t^1,d10^0^t^2,2t^2,d10^0^0^t^d1,d1.(12)

## 5 Performance Analysis: Application to DAGS

In this section we provide the results of the application of our techniques to DAGS. For completeness, we have included a specification of the three DAGS algorithms in Appendix A, but for our purpose, DAGS is essentially the McEliece cryptosystem, converted to a KEM via a standard transformation [8]. In particular, the Key Generation algorithm is the same as the QD-GS McEliece version described in [12]. In this algorithm, a key role is played by the systematization (i.e. reduced row echelon form) of a quasi-dyadic rectangular matrix, the result of which will in fact be the public key for the scheme. The cost of computing said systematic matrix dwarfs everything else in key generation: according to a static analysis, this takes over 98% of the total cost of key generation. Therefore, a fast procedure to compute the systematic form will have a substantial impact on the overall performance of the algorithm.

### Implementation Details

We developed a code in “C'' to implement our procedures. In all cases, we use no optimizations apart from the optimization from the GCC compiler (“-O3”). The GCC version used was 7.3.1 20180406, the code was compiled for the processor Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz with 16GB of memory and operating system Arch linux version 2018.05.01 with Kernel 4.16.5. We ran 100 times each piece of code and computed the average of all measurements; to obtain the number of cycles, we used the file “cpucycles.h” from supercop[1].

### Fast Multiplication

To compare our methods, we fix a dyadic order r and measure the cost of a multiplication of two matrices of size n = 2r. Relevant dyadic orders for DAGS are for instance r = 4 and r = 5. We do this over different fields to highlight the difference in performance when changing fields: we tested 𝔽25 and 𝔽26 which are the fields currently used by DAGS.

### Table 1

Cost of Multiplication between Dyadic Matrices

𝔽25r = 44, 8332, 1943, 899
r = 521, 2855, 90912, 045
𝔽26r = 45, 8332, 1944, 899
r = 523, 2316, 22313, 568

### Efficient Inversion

We report here the results of the improved inversion procedure (Algorithms 4, 5 and 6). We compared our procedure with the equivalent portion of the DAGS implementation that we extrapolated from the publicly available source code [2]. In particular, we measured the piece of code that begins with the creation of the Cauchy matrix and ends with the generation of the systematic matrix. Table 2 shows the comparison, measured in cpu cycles.

### Table 2

Comparison of Inversion Methods

DAGS ImplementationLUP InversionLUP + Karatsuba
DAGS 11, 318, 973, 209321, 771108, 117
DAGS 32, 211, 076, 311557, 822198, 199
DAGS 517, 925, 330, 712654, 713431, 890

# Acknowledgement

Edoardo Persichetti and Paolo Santini were supported by NSF grant CNS-1906360.

### References

[3] Gustavo Banegas, Paulo S. L. M. Barreto, Brice Odilon Boidje, Pierre-Louis Cayrel, Gilbert Ndollane Dione, Kris Gaj, Cheikh Thiecoumba Gueye, Richard Haeussler, Jean Belo Klamti, Ousmane Ndiaye, Duc Tri Nguyen, Edoardo Persichetti and Jefferson E. Ricardini, DAGS: Key Encapsulation using Dyadic GS Codes, IACR Cryptology ePrint Archive2017 (2017), 1037.Search in Google Scholar

[4] E. Berlekamp, R. McEliece and H. van Tilborg, On the inherent intractability of certain coding problems (Corresp.), Information Theory, IEEE Transactions on24 (1978), 384 – 386.Search in Google Scholar

[5] J. R. Bunch and J. E. Hopcroft, Triangular factorization and inversion by fast matrix multiplication, Mathematics of Computation28 (1974), 231–236.Search in Google Scholar

[6] Pierre-Louis Cayrel, Gerhard Hoffmann and Edoardo Persichetti, Efficient Implementation of a CCA2-Secure Variant of McEliece Using Generalized Srivastava Codes, in: Public Key Cryptography - PKC 2012 - 15th International Conference on Practice and Theory in Public Key Cryptography, Darmstadt, Germany, May 21-23, 2012. Proceedings (Marc Fischlin, Johannes A. Buchmann and Mark Manulis, eds.), Lecture Notes in Computer Science 7293, pp. 138–155, Springer, 2012.Search in Google Scholar

[7] M. N. Gulamhusein, Simple matrix-theory proof of the discrete dyadic convolution theorem, Electronics Letters9 (1973), 238–239.Search in Google Scholar

[8] Dennis Hofheinz, Kathrin Hövelmanns and Eike Kiltz, A Modular Analysis of the Fujisaki-Okamoto Transformation, Cryptology ePrint Archive, Report 2017/604, 2017, .Search in Google Scholar

[9] A. Karatsuba and Y. Ofman, Multiplication of Multidigit Numbers by Automata, 01 1963.Search in Google Scholar

[10] R. J. McEliece, A Public-Key Cryptosystem Based On Algebraic Coding Theory, Deep Space Network Progress Report44 (1978), 114–116.Search in Google Scholar

[11] R. Misoczki and P. S. L. M. Barreto, Compact McEliece Keys from Goppa Codes, in: Selected Areas in Cryptography, pp. 376–392, 2009.Search in Google Scholar

[12] E. Persichetti, Compact McEliece keys based on quasi-dyadic Srivastava codes, Journal of Mathematical Cryptology6 (2012), 149–169.Search in Google Scholar

[13] Lukas Polok and Pavel Smrz, Pivoting Strategy for Fast LU Decomposition of Sparse Block Matrices, in: Proceedings of the 25th High Performance Computing Symposium, HPC ’17, pp. 14:1–14:12, Society for Computer Simulation International, San Diego, CA, USA, 2017.Search in Google Scholar

[14] P. W. Shor, Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer, SIAM Journal on Computing26 (1997), 1484–1509.Search in Google Scholar

### A DAGS Algorithms

We briefly describe the three algorithms that define DAGS. Generalized Srivastava codes are defined by parameters s and t, where in our case log s is the dyadic order; the codes in use have length n = n0s and dimension k = k0s where n0 and k0 are the number of dyadic blocks. Other parameters are the cardinality of the base field q and the degree of the field extension m. In addition, we have k = k′ + k″, where k′ is arbitrary and set to be “small”.

The key generation process uses the following fundamental equation

1hij=1hi+1hj+1h0(13)

which guarantees we can build a dyadic matrix, with signature h = (h0, h1, …, hn−1), which is also a Cauchy matrix, i.e. a matrix C(u, v) with components Cij=1uivj. In [11] it is proved that we can use the fundamental equation to choose a support and polynomial for a Goppa code such that this dyadic Cauchy matrix is a parity-check matrix for the code.

#### Key Generation

1. Generate dyadic signature h according to the fundamental equation.

2. Build the vectors (u, v) that define the Cauchy matrix (again using the equation).

3. Form Cauchy matrix Ĥ1 = C(u, v).

4. Build Hi, i = 2, … t, by raising each element of Ĥ1 to the power of i.

5. Superimpose blocks Ĥi in ascending order to form matrix Ĥ.

6. Generate scaling vector z by sampling elements zi in 𝔽qm with zis+j = zis for i = 0, …, n0 − 1, j = 0, …, s − 1.

7. Set yj=zji=0s1(uivj)t for j = 0, …, n − 1 and y = (y0, …, yn−1).

8. Form H = Ĥ ⋅ Diag(z).

9. Project H onto 𝔽q using the co-trace function: call this Hbase.

10. Write Hbase in systematic form (A | Ink).

11. The public key is the generator matrix G = (Ik | AT).

12. The private key is the pair (v, y).

Note that all matrices involved in key generation are quasi-dyadic (with blocks of size s × s), namely Ĥ, H, Hbase and its systematic form, and the final matrix G which is the public key. Step 10 is the systematization process which is impacted by our improved inversion algorithm.

The encapsulation and decapsulation algorithms make use of three hash functions 𝓖 : FqkFqk, 𝓗 : FqkFqk and 𝓚 : {0, 1}* → {0, 1}, where is the desired length of the key to be shared.

#### Encapsulation

1. Choose m\$Fqk.

2. Compute r = 𝓖(m) and d = 𝓗(m).

3. Parse r as (ρσ) then set μ = (ρm).

4. Generate error vector e of length n and weight w from σ.

5. Compute c = μG + e.

6. Compute k = 𝓚(m).

7. Output ciphertext (c, d); the encapsulated key is k.

The decapsulation algorithm is essentially a run of the decoding algorithm to decode the noisy codeword received as part of the ciphertext, plus a number of integrity checks.

#### Decapsulation

1. Recover parity-check matrix H′ in alternant form from private key.

2. Use H′ to decode c and obtain codeword μG and error e′.

3. Output ⊥ if decoding fails or wt(e′) ≠ w

4. Recover μ′ and parse it as (ρ′ ∥ m′).

5. Compute r′ = 𝓖(m′) and d′ = 𝓗(m′).

6. Parse r′ as (ρ″ ∥ σ′).

7. Generate error vector e″ of length n and weight w from σ′.

8. If e′ ≠ e″ ∨ ρ′ ≠ ρ″∨ dd′ output ⊥.

9. Else compute k = 𝓚(m′).

10. The decapsulated key is k.