## 1 Introduction

Post-Quantum Cryptography is the area of research that investigates cryptographic primitives that are deemed secure against attackers equipped with quantum technology. These include schemes based on a variety of mathematical problems, such as finding short vectors in a lattice, or decoding random linear codes. The latter is known as Code-based Cryptography and it relies more or less directly on the Syndrome Decoding Problem [4], which shows no vulnerabilities to quantum attacks. The first code-based scheme was introduced by McEliece in 1978 [10] and has resisted cryptanalysis, in its original form, for nearly 40 years.

McEliece’s cryptosystem has often been ignored in favor of schemes based on number-theoretic problems (such as RSA or El Gamal), mainly due to the size of its public key, which was deemed too large for practical use (especially at the time). However, Shor’s algorithm [14] shows that, once quantum computers of an appropriate size are available, the cryptosystems currently in use will become obsolete. It is therefore important to offer a credible alternative to current cryptography, and, with this in mind, NIST has recently launched a call for proposals to standardize the public-key primitives of the future [1].

Among the code-based candidates for NIST’s call, DAGS [3] is a Key Encapsulation Mechanism (KEM) that uses Quasi-Dyadic (QD) matrices to considerably reduce the size of the public key, following a McEliece-like approach. The proposal builds on a line of work initiated by Misoczki and Barreto [11] and subsequently developed by Persichetti in [6, 12].

### Our Contribution

We analyze two separate aspects of dyadic operations. First, we present three different algorithms aimed specifically at computing the multiplication of dyadic matrices. These are, respectively, a “standard” approach that makes use of dyadic signatures, a specialized Karatsuba-like algorithm, and a procedure based on the Fast Walsh-Hadamard Transform (FWHT) [7], also called dyadic convolution. We analyze the performance of all three methods and report our timings.

As a second contribution, we describe a procedure that applies the LUP decomposition [5] to the dyadic case. The method effectively factors every quasi-dyadic matrix into a product of two triangular matrices and a permutation matrix. This leads to the possibility of a very efficient algorithm for computing the inverse of a matrix, which is particularly useful in code-based cryptography, for instance for computing the systematic form of a parity-check (or generator) matrix. According to our measurements, this improved inversion procedure is extremely fast, and provides a very large speedup during DAGS Key Generation.

### Organization of the Paper

This paper is organized as follows. We start with some preliminary definitions in Section 2. We then present our main contributions: the various multiplication techniques are described in Section 3 and the improved inversion algorithm is presented in Section 4. We conclude by showing the results obtained when applying our techniques to DAGS; this is done in Section 5.

## 2 Preliminaries

We introduce dyadic matrices and describe some of their general properties.

**Definition 2.1.** *Given a ring* 𝓡 *and a vector* **h** = (*h*_{0}, *h*_{1}, ⋯, *h*_{n−1}) ∈ 𝓡^{n}, *with* *n* = 2^{r} *for some* *r* ∈ ℕ, *the* dyadic *matrix* *Δ*(**h**) ∈ 𝓡^{n×n} *is the symmetric matrix with components* *Δ*_{ij} = *h*_{i⊕j}, *where* ⊕ *stands for bitwise exclusive-or. Such a matrix is said to have* order *r*. *The sequence* **h** *is called the* signature *of the matrix* *Δ*(**h**), *and corresponds to its first row. The set of dyadic* *n* × *n* *matrices over* 𝓡 *is denoted* *Δ*(𝓡^{n}).

One can alternatively characterize a dyadic matrix recursively: any 1 × 1 matrix is dyadic of order 0, and any dyadic matrix **M** of order *r* > 0 has the form

$\mathbf{M} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B} & \mathbf{A} \end{pmatrix},$

where **A** and **B** are two dyadic matrices of order *r* − 1. In other words, *Δ*(𝓡^{n}) = *Δ*(*Δ*(𝓡^{n/2})).

**Definition 2.2.** *A* dyadic permutation *is a dyadic matrix* *Π*^{i} ∈ *Δ*({0, 1}^{n}) *characterized by the signature* **π**^{i} = (*δ*_{ij} | *j* = 0, …, *n* − 1), *where* *δ*_{ij} *is the Kronecker delta (hence* **π**^{i} *corresponds to the* *i*-*th row or column of the identity matrix)*.

A dyadic permutation is clearly an involution, i.e. (*Π*^{i})^{2} = **I**. The *i*-th row, or equivalently the *i*-th column, of the dyadic matrix defined by a signature **h** can be written as *Δ*(**h**)_{i} = **h***Π*^{i}.

A dyadic matrix can be efficiently represented by its signature; in particular, all the operations between dyadic matrices can be referred only to the corresponding signatures. Indeed, for any two length-*n* vectors **a**, **b** ∈ 𝓡^{n}, we have:

$\mathit{\Delta}(\mathbf{a}) + \mathit{\Delta}(\mathbf{b}) = \mathit{\Delta}(\mathbf{a} + \mathbf{b}),$

which means that, given two dyadic matrices **A** and **B**, with respective signatures **a** and **b**, their sum is the dyadic matrix described by the signature **a** + **b**.

In an analogous way, the multiplication between dyadic matrices can be done by considering only the corresponding signatures; we will discuss efficient ways for computing multiplications in Section 3.
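As a quick illustration, the signature-level view can be exercised in a few lines of Python (an illustrative sketch, unrelated to the DAGS reference code):

```python
# Build the dyadic matrix Delta(h) from its signature h.
def dyadic(h):
    n = len(h)                     # n must be a power of 2
    return [[h[i ^ j] for j in range(n)] for i in range(n)]

a = [1, 2, 3, 4]                   # signature of A (order r = 2, n = 4)
b = [5, 6, 7, 8]                   # signature of B
A, B = dyadic(a), dyadic(b)

# Delta(h) is symmetric and its first row is the signature itself.
assert all(A[i][j] == A[j][i] for i in range(4) for j in range(4))
assert A[0] == a

# The sum of two dyadic matrices is dyadic, with signature a + b.
S = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]
assert S == dyadic([x + y for x, y in zip(a, b)])
```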

**Algorithm 1** Standard multiplication of dyadic matrices

INPUT: *r* ∈ ℕ, *n* = 2^{r} and **a**, **b** ∈ 𝔽^{n}.
OUTPUT: **c** ∈ 𝔽^{n} such that *Δ*(**c**) = *Δ*(**a**)*Δ*(**b**).
1: **c** ← vector of length *n*, initialized with null elements.
2: *c*_{0} ← *a*_{0} ⋅ *b*_{0}
3: **for** *i* ← 1 **to** *n* − 1 **do**
4:  *c*_{0} ← *c*_{0} + *a*_{i}*b*_{i}
5:  *i*^{(b)} ← binary representation of *i*, using *r* bits.
6:  **for** *j* ← 0 **to** *n* − 1 **do**
7:   *j*^{(b)} ← binary representation of *j*, using *r* bits.
8:   *π*^{(b)} ← *i*^{(b)} ⊕ *j*^{(b)}
9:   *π* ← conversion of *π*^{(b)} into an integer.
10:   *c*_{i} ← *c*_{i} + *a*_{j}*b*_{π}
11:  **end for**
12: **end for**
13: **return** **c**
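A direct transcription of Algorithm 1 into Python can be checked against schoolbook matrix multiplication. This is a sketch over the integers rather than a finite field, which is legitimate since the algorithm works over any commutative ring:

```python
# Algorithm 1, sketched in Python: multiply two dyadic matrices working on
# their signatures only. The index pi = i XOR j selects the permuted entry.
def dyadic_mul_standard(a, b):
    n = len(a)                          # n = 2**r
    c = [0] * n
    c[0] = a[0] * b[0]
    for i in range(1, n):
        c[0] += a[i] * b[i]             # running contribution to c_0
        for j in range(n):
            c[i] += a[j] * b[i ^ j]     # c_i = sum_j a_j * b_{i XOR j}
    return c

def dyadic(h):
    n = len(h)
    return [[h[i ^ j] for j in range(n)] for i in range(n)]

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
c = dyadic_mul_standard(a, b)
A, B = dyadic(a), dyadic(b)
AB = [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
      for i in range(4)]
assert AB == dyadic(c)                  # Delta(a) * Delta(b) = Delta(c)
```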

Moreover, it is easy to see that the inverse of a dyadic matrix is also a dyadic matrix; this can be easily computed using Sylvester-Hadamard matrices (see Section 3.2). We will expand on this in Section 4.

Finally, we introduce a relaxed notion of dyadicity, which will be useful throughout the paper.

**Definition 2.3.** *A* quasi-dyadic *matrix is a (possibly non-dyadic) block matrix whose elements are dyadic submatrices, i.e. an element of* *Δ*(𝓡^{n})^{d_{1}×d_{2}}.

## 3 Multiplication of Dyadic Matrices

In this section we consider different methods for computing the multiplication between two dyadic matrices. In fact, we have just mentioned how some matrix operations, like the sum or the inversion, can be efficiently performed in the dyadic case just by considering the signatures. Multiplication can be strongly improved with similar methods, which exploit the particular structure of such matrices. In particular, we analyze three different algorithms and provide estimations for their complexities; we then compare the performance of the various algorithms.

For ease of notation, we will refer to the two *n* × *n* matrices that we want to multiply simply as **A** and **B**, with **a** = [*a*_{0}, *a*_{1}, ⋯, *a*_{n−1}] and **b** = [*b*_{0}, *b*_{1}, ⋯, *b*_{n−1}] being the respective signatures. Maintaining the same notation, the product matrix **C** = **A****B**, which is also dyadic, will have signature **c** = [*c*_{0}, *c*_{1}, ⋯, *c*_{n−1}].

In particular, we focus on the special case of quasi-dyadic matrices with elements belonging to a field 𝔽 of characteristic 2.

### 3.1 Standard Multiplication

The first algorithm we analyze is described in Algorithm 1; we refer to it as the *standard multiplication*. The element of **C** in position (*i*, *j*) is obtained as the multiplication between the *i*-th row of **A** and the *j*-th column of **B**. Since dyadic matrices are symmetric, this is equivalent to the inner product between the *i*-th row of **A** and the *j*-th one of **B**. The signature **c** (i.e., the first row of **C**) is obtained by inner products involving only **a** (i.e., the first row of **A**). Thus, we can just construct the rows of **B**, by permutations of the elements in **b**, and then compute the inner products.

The complexity of the algorithm is due to two different types of operations:

- In order to construct the rows of **B**, we need the indices of the corresponding permutations. Each index is computed as the modulo-2 sum of two binary vectors of length *r*, so it can be obtained with a complexity of *r* binary operations, and each row requires 2^{r} such indices. Considering that this has to be repeated for 2^{r} − 1 rows (for the first one, no permutation is needed), the complexity of this procedure can be estimated as *r* ⋅ 2^{r} ⋅ (2^{r} − 1).
- Each element of **c** is obtained as the inner product between two vectors of 2^{r} elements, assuming values in 𝔽. This operation requires 2^{r} multiplications and 2^{r} − 1 sums in 𝔽. If we denote by *C*_{mult} and *C*_{sum} the costs of, respectively, a multiplication and a sum in 𝔽, the total number of binary operations needed to compute the 2^{r} inner products can be estimated as 2^{2r} ⋅ *C*_{mult} + (2^{2r} − 2^{r}) ⋅ *C*_{sum}.

The complexity of a standard multiplication between two dyadic signatures can thus be estimated as:

$C_{std} = r \cdot 2^{r} \cdot (2^{r} - 1) + 2^{2r} \cdot C_{mult} + (2^{2r} - 2^{r}) \cdot C_{sum}.$

### 3.2 Dyadic Convolution

**Definition 3.1.** *The* dyadic convolution *of two vectors* **a**, **b** ∈ 𝓡^{n}, *denoted by* **a** ⋆ **b**, *is the unique vector of* 𝓡^{n} *such that* *Δ*(**a** ⋆ **b**) = *Δ*(**a**)*Δ*(**b**).

Of particular interest to us is the case where 𝓡 is actually a field 𝔽. Dyadic matrices over 𝔽 form a commutative subring *Δ*(𝔽^{n}) ⊂ 𝔽^{n×n}, and this property gives rise to efficient arithmetic algorithms to compute the dyadic convolution. In particular, we here consider the fast Walsh-Hadamard transform (FWHT), which is well known [7] but seldom found in a cryptographic context. We describe it here for ease of reference. We first recall the FWHT for the case of a field 𝔽 such that char(𝔽) ≠ 2, and then describe how this technique can be generalized to the case of char(𝔽) = 2 (which, again, is the one we are interested in).

**Definition 3.2.** *Let* 𝔽 *be a field with* char(𝔽) ≠ 2. *The* Sylvester-Hadamard *matrix* **H**_{r} ∈ 𝔽^{n×n} *is recursively defined as*

$\mathbf{H}_{0} = \begin{pmatrix} 1 \end{pmatrix}, \qquad \mathbf{H}_{r} = \begin{pmatrix} \mathbf{H}_{r-1} & \mathbf{H}_{r-1} \\ \mathbf{H}_{r-1} & -\mathbf{H}_{r-1} \end{pmatrix}.$

One can show by straightforward induction that **H**_{r}^{2} = 2^{r}**I**_{n}, so that **H**_{r}^{−1} = 2^{−r}**H**_{r}.

**Lemma 3.1.** *Let* 𝔽 *be a field with* char(𝔽) ≠ 2. *If* **M** ∈ 𝔽^{n×n} *is dyadic, then* **H**_{r}**M****H**_{r} *is diagonal*.

The lemma clearly holds for *r* = 0. Now let *r* > 0, and write

$\mathbf{M} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B} & \mathbf{A} \end{pmatrix},$

where **A** and **B** are dyadic. It follows that

$\mathbf{H}_{r}\mathbf{M}\mathbf{H}_{r} = 2\begin{pmatrix} \mathbf{H}_{r-1}\mathbf{M}_{+}\mathbf{H}_{r-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{H}_{r-1}\mathbf{M}_{-}\mathbf{H}_{r-1} \end{pmatrix},$

and since both **M**_{+} = **A** + **B** and **M**_{−} = **A** − **B** are dyadic of order *r* − 1, the two diagonal blocks are themselves diagonal by induction. □

Lemma 3.1 establishes that Sylvester-Hadamard matrices diagonalize all dyadic matrices. In particular, the factors in a product of dyadic matrices are thus simultaneously diagonalized, suggesting an efficient way to carry out the matrix multiplication: once two dyadic matrices **M** and **N** are in diagonal form, computing their product requires only *n* multiplications of the diagonal elements.

In fact, it is not even necessary to compute the full conjugation **H**_{r}**M****H**_{r}, as indicated by the following result:

**Lemma 3.2.** *Let* 𝔽 *be a field with* char(𝔽) ≠ 2. *The diagonal form of a dyadic matrix* **M** = *Δ*(**h**) ∈ 𝔽^{n×n} *is given by the first line of* **M****H**_{r}. *In other words*, **H**_{r}*Δ*(**h**)**H**_{r} = 2^{r} diag(**h****H**_{r}).

The lemma clearly holds for *r* = 0. Now let *r* > 0, and with the notation of Lemma 3.1,

$\mathbf{H}_{r}\mathbf{M}\mathbf{H}_{r} = 2\begin{pmatrix} \mathbf{H}_{r-1}\mathbf{M}_{+}\mathbf{H}_{r-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{H}_{r-1}\mathbf{M}_{-}\mathbf{H}_{r-1} \end{pmatrix},$

while the first line of **M****H**_{r}, namely **h****H**_{r}, is the concatenation of the first lines of **M**_{+}**H**_{r−1} and **M**_{−}**H**_{r−1}. By induction, each block **H**_{r−1}**M**_{±}**H**_{r−1} equals 2^{r−1} times the diagonal matrix built from the first line of **M**_{±}**H**_{r−1}, and the claim follows. □

**Corollary 3.2.1.** *Computing* **c** *such that* *Δ*(**a**)*Δ*(**b**) = *Δ*(**c**) *involves only three multiplications of vectors by Sylvester-Hadamard matrices*.

By Lemma 3.2, diag(**a****H**_{r}) diag(**b****H**_{r}) = diag(**c****H**_{r}). Now simply retrieve **c** from **z** = **c****H**_{r} as **c** = 2^{−r}**z****H**_{r}, since **H**_{r}^{2} = 2^{r}**I**_{n}. □

The structure of Sylvester-Hadamard matrices leads to an efficient algorithm to compute **a****H**_{r} for **a** ∈ 𝔽^{n}, which is known as the fast Walsh-Hadamard transform. Let [**a**_{0}, **a**_{1}] be the two halves of **a**. Then

$\mathbf{a}\mathbf{H}_{r} = [(\mathbf{a}_{0} + \mathbf{a}_{1})\mathbf{H}_{r-1},\ (\mathbf{a}_{0} - \mathbf{a}_{1})\mathbf{H}_{r-1}].$

This recursive algorithm, which can be easily written in purely sequential fashion (Algorithm 2), has complexity *Θ*(*n* log *n*), specifically, *rn* additions or subtractions in 𝔽. It is therefore somewhat more efficient than the fast Fourier transform, which involves multiplications by *n*-th roots of unity, when they are available at all (otherwise working in extension fields is unavoidable, and more expensive).

The product of two dyadic matrices *Δ*(**a**) and *Δ*(**b**), or equivalently the dyadic convolution **a** ⋆ **b**, can thus be efficiently computed as described in Algorithm 3. The total cost is 3*rn* additions or subtractions and 2*n* multiplications (half of these by the constant 2^{−r} = 1/*n*) in 𝔽, with an overall complexity *Θ*(*n* log *n*). Notice that this is also the complexity of computing det *Δ*(**a**).

The fast Walsh-Hadamard transform itself is not immediately possible on fields of characteristic 2, since it depends on Sylvester-Hadamard matrices, which must contain a primitive square root of unity. Yet the FWHT algorithm can be lifted to characteristic 0, namely, from 𝔽_{2} = ℤ/2ℤ to ℤ, or more generally from 𝔽_{2^N} = (ℤ/2ℤ)[*x*]/*P*(*x*) (for some irreducible *P*(*x*) of degree *N*) to ℤ[*x*]. Algorithm 3 can then be applied, and its output mapped back to the relevant binary field by modular reduction. This incurs a space expansion by a logarithmic factor, though. Each bit from 𝔽_{2} is mapped to intermediate values that can occupy as much as 3*r* + 1 bits; correspondingly, each element from 𝔽_{2^N} is mapped to intermediate values that can occupy as much as (3*r* + 1)*N* bits. Thus the component-wise multiplication in Algorithm 3 becomes more complicated to implement for large *N*. However, the method remains very efficient for the binary case as long as each expanded integer component fits a computer word. For a typical word size of 32 bits and each binary component being expanded by a factor of 3*r* + 1, this means that blocks as large as 1024 × 1024 can be tackled efficiently. On more restricted platforms where the maximum available word size is 16 bits, dyadic blocks of size 32 × 32 can still be handled with relative ease.

**Algorithm 2** The fast Walsh-Hadamard transform (FWHT)

INPUT: *r* ∈ ℕ, *n* = 2^{r} and **a** ∈ 𝔽^{n} with char(𝔽) ≠ 2.
OUTPUT: **a****H**_{r}.
1: *v* ← 1
2: **for** *j* ← 1 **to** *r* **do**
3:  *w* ← *v*
4:  *v* ← 2*v*
5:  **for** *i* ← 0 **to** *n* − 1 **by** *v* **do**
6:   **for** *l* ← 0 **to** *w* − 1 **do**
7:    *s* ← *a*_{i+l}
8:    *q* ← *a*_{i+l+w}
9:    *a*_{i+l} ← *s* + *q*
10:    *a*_{i+l+w} ← *s* − *q*
11:   **end for**
12:  **end for**
13: **end for**
14: **return** **a**
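The FWHT can be rendered in Python as below; this is an illustrative sketch, not the paper's C implementation. Applying the transform twice rescales the input by *n*, reflecting the identity **H**_{r}^{2} = 2^{r}**I**_{n}:

```python
# Iterative in-place FWHT: multiply a signature by the Sylvester-Hadamard
# matrix H_r using r rounds of butterflies.
def fwht(a):
    a = list(a)
    n, w = len(a), 1
    while w < n:                        # r = log2(n) rounds
        for i in range(0, n, 2 * w):
            for l in range(w):
                s, q = a[i + l], a[i + l + w]
                a[i + l] = s + q        # butterfly: (s, q) -> (s+q, s-q)
                a[i + l + w] = s - q
        w *= 2
    return a

x = [3, 1, 4, 1, 5, 9, 2, 6]
# H_r * H_r = n * I, so transforming twice rescales by n = 8.
assert fwht(fwht(x)) == [8 * v for v in x]
```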

**Algorithm 3** Dyadic convolution via the FWHT

INPUT: *r* ∈ ℕ, *n* = 2^{r} and **a**, **b** ∈ 𝔽^{n} with char(𝔽) ≠ 2.
OUTPUT: **a** ⋆ **b** ∈ 𝔽^{n} such that *Δ*(**a**)*Δ*(**b**) = *Δ*(**a** ⋆ **b**).
1: **c** ← vector of length *n*, initialized with null elements.
2: **c̃** ← vector of length *n*, initialized with null elements.
3: Compute **ã** ← **a****H**_{r} via Algorithm 2. ▹ expansion 1 → *r* + 1
4: Compute **b̃** ← **b****H**_{r} via Algorithm 2. ▹ expansion 1 → *r* + 1
5: **for** *j* ← 0 **to** *n* − 1 **do**
6:  *c̃*_{j} ← *ã*_{j}*b̃*_{j} ▹ expansion *r* + 1 → 2*r* + 1
7: **end for**
8: Compute **c** ← **c̃****H**_{r} via Algorithm 2. ▹ expansion 2*r* + 1 → 3*r* + 1
9: **c** ← 2^{−r}**c**
10: **return** **c**
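The whole pipeline of Algorithm 3 can be sketched in Python over the integers, where the final division by *n* is exact; reducing the result mod 2 at the end also illustrates the lifting trick for characteristic 2 discussed earlier. (Illustrative code, not the paper's implementation.)

```python
# Dyadic convolution via the FWHT: transform, multiply component-wise,
# transform back, divide by n = 2**r (exact over the integers).
def fwht(a):
    a = list(a)
    n, w = len(a), 1
    while w < n:
        for i in range(0, n, 2 * w):
            for l in range(w):
                s, q = a[i + l], a[i + l + w]
                a[i + l], a[i + l + w] = s + q, s - q
        w *= 2
    return a

def dyadic_convolution(a, b):
    n = len(a)
    prod = [x * y for x, y in zip(fwht(a), fwht(b))]
    return [x // n for x in fwht(prod)]   # exact division by n

a, b = [3, 1, 4, 1], [2, 7, 1, 8]
c = dyadic_convolution(a, b)

# Compare against the definition: c_s = sum_j a_j * b_{s XOR j}.
assert c == [sum(a[j] * b[s ^ j] for j in range(4)) for s in range(4)]

# Lifting GF(2) signatures: compute over Z, then reduce mod 2.
a2, b2 = [1, 0, 1, 1], [0, 1, 1, 0]
c2 = [x % 2 for x in dyadic_convolution(a2, b2)]
assert c2 == [1, 0, 0, 1]
```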

### 3.3 Karatsuba Multiplication

In this section we propose a method which is inspired by Karatsuba’s algorithm for the multiplication of two integers [9]. Let us denote by **a**_{0} and **a**_{1}, respectively, the first and second halves of **a**, i.e.:

$\mathbf{a}_{0} = [a_{0}, a_{1}, \cdots, a_{n/2-1}], \qquad \mathbf{a}_{1} = [a_{n/2}, a_{n/2+1}, \cdots, a_{n-1}].$

The same notation is used for **b**_{0}, **b**_{1} and **c**_{0}, **c**_{1}, corresponding to the halves of the signatures of **B** and **C**. Some straightforward computations (recall that we work in characteristic 2, so sums and differences coincide) show that the following relations hold:

$\begin{array}{l} \mathbf{c}_{0} = \mathbf{a}_{0} \star \mathbf{b}_{0} + \mathbf{a}_{1} \star \mathbf{b}_{1}, \\ \mathbf{c}_{1} = (\mathbf{a}_{0} + \mathbf{a}_{1}) \star (\mathbf{b}_{0} + \mathbf{b}_{1}) + \mathbf{a}_{0} \star \mathbf{b}_{0} + \mathbf{a}_{1} \star \mathbf{b}_{1} = (\mathbf{a}_{0} + \mathbf{a}_{1}) \star (\mathbf{b}_{0} + \mathbf{b}_{1}) + \mathbf{c}_{0}. \end{array} \tag{5}$

The iterative application of equation (5) allows us to compute multiplications between dyadic matrices of any size, using three half-size multiplications per level. Let us denote by *C*_{mul}(2^{z}) the cost of a multiplication between two dyadic signatures of size 2^{z}. For the sum of two dyadic signatures of size 2^{z} we have:

$C_{add}(2^{z}) = 2^{z} \cdot C_{sum},$

where *C*_{sum} again denotes the complexity of a sum in the finite field.

The complexity of this algorithm can thus be estimated through the recursion (three half-size products, plus the half-size additions needed to form the inputs of the third product and to combine the results):

$C_{mul}(2^{z}) = 3 \cdot C_{mul}(2^{z-1}) + 4 \cdot 2^{z-1} \cdot C_{sum}, \qquad C_{mul}(1) = C_{mult}. \tag{7}$

Taking into account the well-known sum of a geometric series, we have:

$\sum_{i=0}^{r-1} 3^{i} \cdot 2^{r-1-i} = 2^{r-1} \sum_{i=0}^{r-1} \left(\frac{3}{2}\right)^{i} = 3^{r} - 2^{r}.$

Considering this result, equation (7) leads to:

$C_{mul}(2^{r}) = 3^{r} \cdot C_{mult} + 4 \cdot (3^{r} - 2^{r}) \cdot C_{sum}.$
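The relations in equation (5) translate into a short recursive routine. The sketch below works over GF(2), so sums are XORs and products are ANDs; this is a simplification of the actual DAGS fields, chosen so the example stays self-contained:

```python
# Karatsuba-style dyadic multiplication in characteristic 2: three
# half-size dyadic products per recursion level.
def kara_dyadic(a, b):
    n = len(a)
    if n == 1:
        return [a[0] & b[0]]            # base case: 1x1 product in GF(2)
    h = n // 2
    a0, a1, b0, b1 = a[:h], a[h:], b[:h], b[h:]
    m0 = kara_dyadic(a0, b0)
    m1 = kara_dyadic(a1, b1)
    m2 = kara_dyadic([x ^ y for x, y in zip(a0, a1)],
                     [x ^ y for x, y in zip(b0, b1)])
    c0 = [x ^ y for x, y in zip(m0, m1)]
    c1 = [x ^ y for x, y in zip(m2, c0)]
    return c0 + c1

a, b = [1, 0, 1, 1], [0, 1, 1, 0]
# Naive GF(2) dyadic product for comparison: c_s = XOR_j a_j & b_{s^j}.
naive = [0] * 4
for s in range(4):
    for j in range(4):
        naive[s] ^= a[j] & b[s ^ j]
assert kara_dyadic(a, b) == naive
```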

## 4 Efficient Inversion of Dyadic and Quasi-Dyadic Matrices

In this section we propose an efficient algorithm for computing the inverse of quasi-dyadic matrices. The algorithm in principle is targeted to matrices that are not fully dyadic (even though, obviously, they have to be square). This is because, while it is of course possible to apply our procedure to fully dyadic matrices, these can in general be inverted much more easily, as we will see next.

To begin, recall that by definition (Definition 2.3) a quasi-dyadic matrix is an element of *Δ*(𝓡^{n})^{d_{1}×d_{2}}.

### 4.1 Dyadic Matrices

The inverse of a dyadic matrix (i.e. *d*_{1} = *d*_{2} = 1) can be efficiently computed, using only the signature, as described by the following Lemma.

**Lemma 4.1.** *Let* *n* = 2^{r} *for* *r* ∈ ℕ *and let* *Δ*(**a**) ∈ 𝓡^{n×n} *be a dyadic matrix with signature* **a**. *Then the inverse* *Δ*(**a**)^{−1} *is the dyadic matrix* *Δ*(**b**) *with* **b** = 2^{−r}**b̃****H**_{r}, *where* **b̃** *is the vector such that* diag(**b̃**) = [diag(**a****H**_{r})]^{−1}.

We have *Δ*(**b**)*Δ*(**a**) = **I**_{n} = *Δ*([1, 0, ⋯, 0]). The diagonal form of **I**_{n} corresponds to the first row of the product **I**_{n}**H**_{r}, and so it is equal to the first row of **H**_{r}, that is, the length-*n* vector made of all ones. According to Corollary 3.2.1, we can write:

$\mathrm{diag}(\mathbf{b}\mathbf{H}_{r})\,\mathrm{diag}(\mathbf{a}\mathbf{H}_{r}) = \mathrm{diag}([1, 1, \cdots, 1]).$

We then define **a****H**_{r} = [*λ*_{0}, *λ*_{1}, ⋯, *λ*_{n−1}], and obtain:

$\mathbf{b}\mathbf{H}_{r} = [\lambda_{0}^{-1}, \lambda_{1}^{-1}, \cdots, \lambda_{n-1}^{-1}] = \tilde{\mathbf{b}}.$

Because of Lemma 3.2, we finally have:

$\mathbf{b} = 2^{-r}\,\tilde{\mathbf{b}}\,\mathbf{H}_{r}.$

□

As we mentioned before, the above Lemma yields a very simple way of computing the inverse of a dyadic matrix: given a signature **a**, we just need to compute its diagonalized form **a****H**_{r}, compute the reciprocals of its elements and collect them in a vector **b̃**; the inverse of *Δ*(**a**) is then *Δ*(2^{−r}**b̃****H**_{r}). This procedure also reveals whether a dyadic matrix *Δ*(**a**) is invertible: if its diagonalized form contains some null elements, then it is singular.
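The procedure of Lemma 4.1 can be sketched over ℚ with exact rational arithmetic (an illustration only, not the paper's code):

```python
# Invert a dyadic matrix via diagonalization: transform the signature,
# take component-wise reciprocals, transform back and divide by n.
from fractions import Fraction

def fwht(a):
    a = list(a)
    n, w = len(a), 1
    while w < n:
        for i in range(0, n, 2 * w):
            for l in range(w):
                s, q = a[i + l], a[i + l + w]
                a[i + l], a[i + l + w] = s + q, s - q
        w *= 2
    return a

def dyadic_inverse(a):
    n = len(a)
    lam = fwht(a)                            # diagonal form: a * H_r
    assert all(x != 0 for x in lam)          # a null entry => singular
    bt = [Fraction(1) / x for x in lam]      # reciprocals of the diagonal
    return [x / n for x in fwht(bt)]         # b = 2^(-r) * bt * H_r

a = [Fraction(x) for x in (2, 1, 0, 0)]
b = dyadic_inverse(a)

# Delta(a) * Delta(b) must be the identity: signature [1, 0, 0, 0].
c = [sum(a[j] * b[s ^ j] for j in range(4)) for s in range(4)]
assert c == [1, 0, 0, 0]
```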

We now focus on the case of dyadic matrices over a field 𝔽 of characteristic 2. One can show by induction that in this case a dyadic matrix *Δ*(**a**) of dimension *n* satisfies *Δ*(**a**)^{2} = (∑_{i} *a*_{i})^{2} **I**, and hence its inverse, when it exists, is *Δ*(**a**)^{−1} = (∑_{i} *a*_{i})^{−2} *Δ*(**a**), which can be computed in *O*(*n*) steps since it is entirely determined by its first row. It is equally clear that det *Δ*(**a**) = (∑_{i} *a*_{i})^{n}, which can be computed with the same complexity (notice that raising to the power of *n* = 2^{r} only involves *r* squarings). Basically, verifying whether or not a dyadic matrix has full rank can be easily done by checking whether the sum of the elements of the signature equals 0.
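A toy check of this observation over GF(2): there (∑ *a*_{i})^{−2} = 1 whenever the signature has odd parity, so an invertible dyadic matrix is an involution.

```python
# GF(2) sanity check: Delta(a)^2 = (sum a_i)^2 * I, so odd parity of the
# signature makes Delta(a) its own inverse.
a = [1, 1, 0, 1]                    # parity 1, so Delta(a) is invertible
assert sum(a) % 2 == 1

# Square the matrix through signatures: c_s = XOR_j a_j & a_{s^j}.
c = [0] * 4
for s in range(4):
    for j in range(4):
        c[s] ^= a[j] & a[s ^ j]
assert c == [1, 0, 0, 0]            # Delta(a) * Delta(a) = I
```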

### 4.2 Quasi-Dyadic Matrices

Consider a quasi-dyadic matrix **M**. Since the matrix has to be square, we have *d*_{1} = *d*_{2} = *d*, and the matrix has dimension *dn* × *dn*. Such a matrix can be compactly represented just by the signatures of the dyadic blocks. To simplify notation, we denote the signature of the dyadic block in position (*i*, *j*) as **m̂**_{i,j}, and store all such vectors in a matrix **M̂** ∈ 𝓡^{d×dn}:

$\hat{\mathbf{M}} = \begin{pmatrix} \hat{\mathbf{m}}_{0,0} & \hat{\mathbf{m}}_{0,1} & \cdots & \hat{\mathbf{m}}_{0,d-1} \\ \vdots & & & \vdots \\ \hat{\mathbf{m}}_{d-1,0} & \hat{\mathbf{m}}_{d-1,1} & \cdots & \hat{\mathbf{m}}_{d-1,d-1} \end{pmatrix}. \tag{9}$

We focus again on the special case of quasi-dyadic matrices over a field 𝔽 with characteristic 2.

The LUP decomposition is a method which factorizes a matrix **M** as **L****U****P**, where **L** and **U** are lower triangular and upper triangular matrices, respectively, and **P** is a permutation.

Exploiting this factorization, the inverse of **M** can thus be expressed as:

$\mathbf{M}^{-1} = \mathbf{P}^{T}\mathbf{U}^{-1}\mathbf{L}^{-1}. \tag{10}$

The advantage of this method is that the inverses appearing in (10) can be easily computed, because of their particular structures. In fact, the inverse of an upper (lower) triangular matrix is obtained via a simple backward (forward) substitution procedure, while the inverse of **P** is its transpose.

In some cases, applying a block-wise LUP decomposition might lead to some complexity reduction; for instance, see [13] for the inversion of a sparse matrix. Here, we consider the case of a quasi-dyadic matrix; the corresponding procedure is shown in Algorithm 4.

**Algorithm 4** LUP decomposition of a quasi-dyadic matrix

INPUT: *d*, *r* ∈ ℕ, *n* = 2^{r} and **M̂** ∈ 𝔽^{d×dn} with char(𝔽) = 2.
OUTPUT: **M̂** ∈ 𝔽^{d×dn}, **P̂** ∈ ℕ^{d}.
1: **P̂** ← [0, 1, ⋯, *d* − 1]
2: *u* ← 0
3: **for** *j* ← 0 **to** *d* − 1 **do**
4:  Update *u*, **M̂** and **P̂** via Algorithm 5. ▹ pivoting of the signatures in the *j*-th column
5:  **if** *u* = 0 **then**
6:   **return** *u* ▹ **M̂** is singular
7:  **end if**
8:  **for** *i* ← *j* + 1 **to** *d* − 1 **do**
9:   **m̂**_{i,j} ← **m̂**_{i,j}**m̂**_{j,j}^{−1}
10:  **end for**
11:  **for** *i* ← *j* + 1 **to** *d* − 1 **do**
12:   **for** *l* ← *j* + 1 **to** *d* − 1 **do**
13:    **m̂**_{i,l} ← **m̂**_{i,l} + **m̂**_{i,j}**m̂**_{j,l}
14:   **end for**
15:  **end for**
16: **end for**
17: **return** **M̂**, **P̂**
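The sketch below instantiates this elimination over GF(2) with a 2 × 2 grid of length-2 signatures (DAGS itself works over larger fields, and the helper names are ours): it factors a quasi-dyadic matrix in place and verifies, on the expanded bit matrices, that the row-permuted input equals **L** ⋅ **U**.

```python
import copy

def dmul(a, b):                     # GF(2) dyadic product of two signatures
    n = len(a)
    c = [0] * n
    for s in range(n):
        for j in range(n):
            c[s] ^= a[j] & b[s ^ j]
    return c

def dadd(a, b):
    return [x ^ y for x, y in zip(a, b)]

def lup(M, d):
    """In-place LUP elimination on a d x d grid of GF(2) dyadic signatures.
    Assumes a pivot exists in every column; over GF(2) a dyadic block is
    invertible iff its signature has odd parity, and is then its own inverse."""
    P = list(range(d))
    for j in range(d):
        piv = next(i for i in range(j, d) if sum(M[i][j]) % 2 == 1)
        M[j], M[piv] = M[piv], M[j]
        P[j], P[piv] = P[piv], P[j]
        for i in range(j + 1, d):
            M[i][j] = dmul(M[i][j], M[j][j])   # multiplier block of L
            for l in range(j + 1, d):
                M[i][l] = dadd(M[i][l], dmul(M[i][j], M[j][l]))
    return M, P

def expand(Mhat, d, n):
    """Blow a grid of signatures up into the full dn x dn 0/1 matrix."""
    F = [[0] * (d * n) for _ in range(d * n)]
    for bi in range(d):
        for bj in range(d):
            for i in range(n):
                for j in range(n):
                    F[bi * n + i][bj * n + j] = Mhat[bi][bj][i ^ j]
    return F

def matmul2(A, B):
    m = len(A)
    return [[sum(A[i][k] & B[k][j] for k in range(m)) & 1 for j in range(m)]
            for i in range(m)]

d, n = 2, 2
M0 = [[[1, 1], [1, 0]],
      [[0, 1], [1, 1]]]             # (0,0) block singular: forces a swap
F, P = lup(copy.deepcopy(M0), d)

one, zero = [1, 0], [0, 0]
L = [[F[i][j] if i > j else (one if i == j else zero) for j in range(d)]
     for i in range(d)]
U = [[F[i][j] if i <= j else zero for j in range(d)] for i in range(d)]

# The row-permuted original factors as L * U (all arithmetic over GF(2)).
PM = [M0[P[j]] for j in range(d)]
assert matmul2(expand(L, d, n), expand(U, d, n)) == expand(PM, d, n)
```

Note that the check is stated in the row-permutation form of the factorization; the paper writes the same decomposition as **M** = **L****U****P**.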

Our proposed procedure consists in using a block decomposition, which works directly on the signatures, in order to exploit the simple and efficient algebra of dyadic matrices. The operations in Algorithm 4 only refer to the signatures in **M̂**: for instance, the expression **m̂**_{i,j}**m̂**_{j,l} means the product between the dyadic matrices having **m̂**_{i,j} and **m̂**_{j,l} as signatures. This choice may result in some abuse of notation, but is useful to emphasize the fact that, as we have explained in the previous sections, operations with dyadic matrices can be efficiently computed just by taking into account their signatures. It can be easily shown that, for a quasi-dyadic matrix, the factors **L**, **U** and **P** are in quasi-dyadic form as well: as we have done for the matrix **M**, we refer to their compact representations as **L̂**, **Û** and **P̂**, respectively.

The algorithm takes as input a matrix **M̂**, as in (9), and computes its LUP factorization; the outputs of the algorithm are the modified matrix **M̂**, containing the elements of its factors **L̂** and **Û**, and the permutation **P̂**. As in (9), we denote by **m̂**_{i,j} the signature in position (*i*, *j*) of the output matrix **M̂**. The matrices **L̂** and **Û** can then be expressed as:

$\hat{\mathbf{L}} = \begin{pmatrix} \hat{\mathbf{1}} & \hat{\mathbf{0}} & \cdots & \hat{\mathbf{0}} \\ \hat{\mathbf{m}}_{1,0} & \hat{\mathbf{1}} & \cdots & \hat{\mathbf{0}} \\ \vdots & & \ddots & \vdots \\ \hat{\mathbf{m}}_{d-1,0} & \hat{\mathbf{m}}_{d-1,1} & \cdots & \hat{\mathbf{1}} \end{pmatrix}, \qquad \hat{\mathbf{U}} = \begin{pmatrix} \hat{\mathbf{m}}_{0,0} & \hat{\mathbf{m}}_{0,1} & \cdots & \hat{\mathbf{m}}_{0,d-1} \\ \hat{\mathbf{0}} & \hat{\mathbf{m}}_{1,1} & \cdots & \hat{\mathbf{m}}_{1,d-1} \\ \vdots & & \ddots & \vdots \\ \hat{\mathbf{0}} & \hat{\mathbf{0}} & \cdots & \hat{\mathbf{m}}_{d-1,d-1} \end{pmatrix},$

where **1̂** and **0̂** denote, respectively, the signature of the identity matrix and that of the null matrix (i.e. the length-*n* vectors [1, 0, ⋯, 0] and [0, 0, ⋯, 0]).

The matrix **P̂** is represented through a length-*d* vector [*p*_{0}, *p*_{1}, ⋯, *p*_{d−1}], containing a permutation of the integers [0, 1, ⋯, *d* − 1]; the rows of **M̂** get permuted according to the elements of **P̂**. In particular, the elements of **P̂** are obtained through a block pivoting procedure, which is described in Algorithm 5.

**Algorithm 5** Block pivoting

INPUT: *d*, *j*, *r* ∈ ℕ, *n* = 2^{r}, **P̂** ∈ ℕ^{d} and **M̂** ∈ 𝔽^{d×dn} with char(𝔽) = 2.
OUTPUT: *u* ∈ ℕ.
1: *u* ← 0
2: *i* ← *j*
3: **while** *i* ≤ *d* − 1 **do**
4:  *w* ← sum(**m̂**_{i,j}) ▹ sum of the elements in **m̂**_{i,j}
5:  **if** *w* = 0 **then** ▹ **m̂**_{i,j} is singular: move to the next row
6:   *i* ← *i* + 1
7:  **else** ▹ pivot found: swap rows *i* and *j*
8:   *z* ← *p*_{j}
9:   *p*_{j} ← *p*_{i}
10:   *p*_{i} ← *z*
11:   **for** *l* ← 0 **to** *d* − 1 **do**
12:    *z* ← **m̂**_{j,l}
13:    **m̂**_{j,l} ← **m̂**_{i,l}
14:    **m̂**_{i,l} ← *z*
15:   **end for**
16:   *u* ← 1
17:   *i* ← *d*
18:  **end if**
19: **end while**
20: **return** *u*

This function takes as input **M̂**, **P̂** and an integer *j*, and searches for a pivot (i.e., a non-singular signature) in the *j*-th column of **M̂**, starting from **m̂**_{j,j}. The first non-singular signature found is placed in position (*j*, *j*) by swapping the corresponding rows of **M̂**, and the elements of **P̂** are modified accordingly. If the *j*-th column contains only singular blocks, the matrix **M̂** is itself singular; in such a case, this event is notified by setting *u* = 0.

We point out that, for the matrices we are considering, we expect Algorithm 4 to be particularly efficient. First of all, as we have already said, this is due to the possibility of efficiently performing operations involving dyadic matrices; in addition, the dyadic structure should also speed up the pivoting procedure. In fact, we can consider a signature in **M̂** as a collection of *n* random elements picked from *GF*(2^{N}): thus, their sum can be assumed to be a random variable with uniform distribution among the elements of the field *GF*(2^{N}). So, the probability of it being equal to 0, which corresponds to the probability of the corresponding signature being singular, equals 2^{−N}. This probability gets lower as *N* increases: this fact means that the expected number of operations performed by Algorithm 5 should be particularly low. Basically, most of the time the function will just compute the sum of the elements in **m̂**_{j,j} and verify whether it is null or not.

Once the factorization of **M̂** has been obtained, we just need to perform the computation of **M**^{−1} through (10). Since the inverse of a triangular matrix maintains the original triangular structure, the computation of the inverses **L̂**^{−1} and **Û**^{−1} can be efficiently performed. A possible way for computing these matrices is to store the elements of both matrices in just one output matrix **T̂**. We do this in Algorithm 6.

**Algorithm 6** Computation of **T̂**

INPUT: *d*, *r* ∈ ℕ, *n* = 2^{r} and **M̂** ∈ 𝔽^{d×dn} with char(𝔽) = 2.
OUTPUT: **T̂** ∈ 𝔽^{d×dn}.
1: **T̂** ← **Î**_{d}
2: **for** *j* ← 0 **to** *d* − 1 **do**
3:  **for** *i* ← *j* + 1 **to** *d* − 1 **do**
4:   **for** *l* ← *j* **to** *i* − 1 **do**
5:    **t̂**_{i,j} ← **t̂**_{i,j} + **m̂**_{i,l}**t̂**_{l,j}
6:   **end for**
7:  **end for**
8:  **for** *i* ← *j* **to** *d* − 1 **do**
9:   **for** *l* ← *j* **to** *i* − 1 **do**
10:    **t̂**_{j,i} ← **t̂**_{j,i} + **m̂**_{l,i}**t̂**_{j,l}
11:   **end for**
12:   **t̂**_{j,i} ← **t̂**_{j,i}**m̂**_{i,i}^{−1}
13:  **end for**
14: **end for**
15: **return** **T̂**

The matrix **Î**_{d} is the compact representation of a *dn* × *dn* identity matrix, and so is composed of signatures *δ*_{i,j}**1̂**, where *δ*_{i,j} denotes the Kronecker delta.

If we denote by **t̂**_{i,j} the signature in position (*i*, *j*), we have:

$\hat{\mathbf{t}}_{i,j} = \begin{cases} (\hat{\mathbf{L}}^{-1})_{i,j} & \text{if } i > j, \\ (\hat{\mathbf{U}}^{-1})_{i,j} & \text{if } i \leq j. \end{cases}$

## 5 Performance Analysis: Application to DAGS

In this section we provide the results of the application of our techniques to DAGS. For completeness, we have included a specification of the three DAGS algorithms in Appendix A, but for our purpose, DAGS is essentially the McEliece cryptosystem, converted to a KEM via a standard transformation [8]. In particular, the Key Generation algorithm is the same as the QD-GS McEliece version described in [12]. In this algorithm, a key role is played by the systematization (i.e. reduced row echelon form) of a quasi-dyadic rectangular matrix, the result of which will in fact be the public key for the scheme. The cost of computing said systematic matrix dwarfs everything else in key generation: according to a static analysis, this takes over 98% of the total cost of key generation. Therefore, a fast procedure to compute the systematic form will have a substantial impact on the overall performance of the algorithm.

### Implementation Details

We developed our code in C. In all cases, we use no optimizations apart from those provided by the GCC compiler (“-O3”). The GCC version used was 7.3.1 20180406; the code was compiled for an Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz with 16GB of memory, running Arch Linux version 2018.05.01 with kernel 4.16.5. We ran each piece of code 100 times and computed the average of all measurements; to obtain the number of cycles, we used the file “cpucycles.h” from SUPERCOP^{1}.

### Fast Multiplication

To compare our methods, we fix a dyadic order *r* and measure the cost of a multiplication of two matrices of size *n* = 2^{r}. Relevant dyadic orders for DAGS are, for instance, *r* = 4 and *r* = 5. We do this over different fields to highlight the difference in performance when changing fields: we tested 𝔽_{2^5} and 𝔽_{2^6}, which are the fields currently used by DAGS.

**Table 1.** Cost of Multiplication between Dyadic Matrices

| | | Standard | Karatsuba | Dyadic Convolution |
|---|---|---|---|---|
| 𝔽_{2^5} | *r* = 4 | 4,833 | 2,194 | 3,899 |
| 𝔽_{2^5} | *r* = 5 | 21,285 | 5,909 | 12,045 |
| 𝔽_{2^6} | *r* = 4 | 5,833 | 2,194 | 4,899 |
| 𝔽_{2^6} | *r* = 5 | 23,231 | 6,223 | 13,568 |

### Efficient Inversion

We report here the results of the improved inversion procedure (Algorithms 4, 5 and 6). We compared our procedure with the equivalent portion of the DAGS implementation that we extrapolated from the publicly available source code [2]. In particular, we measured the piece of code that begins with the creation of the Cauchy matrix and ends with the generation of the systematic matrix. Table 2 shows the comparison, measured in cpu cycles.

**Table 2.** Comparison of Inversion Methods (in CPU cycles)

| | DAGS Implementation | LUP Inversion | LUP + Karatsuba |
|---|---|---|---|
| DAGS 1 | 1,318,973,209 | 321,771 | 108,117 |
| DAGS 3 | 2,211,076,311 | 557,822 | 198,199 |
| DAGS 5 | 17,925,330,712 | 654,713 | 431,890 |

Edoardo Persichetti and Paolo Santini were supported by NSF grant CNS-1906360.

## References

- [3] Gustavo Banegas, Paulo S. L. M. Barreto, Brice Odilon Boidje, Pierre-Louis Cayrel, Gilbert Ndollane Dione, Kris Gaj, Cheikh Thiecoumba Gueye, Richard Haeussler, Jean Belo Klamti, Ousmane Ndiaye, Duc Tri Nguyen, Edoardo Persichetti and Jefferson E. Ricardini, DAGS: Key Encapsulation using Dyadic GS Codes, *IACR Cryptology ePrint Archive* **2017** (2017), 1037.
- [4] E. Berlekamp, R. McEliece and H. van Tilborg, On the inherent intractability of certain coding problems (Corresp.), *IEEE Transactions on Information Theory* **24** (1978), 384–386.
- [5] J. R. Bunch and J. E. Hopcroft, Triangular factorization and inversion by fast matrix multiplication, *Mathematics of Computation* **28** (1974), 231–236.
- [6] Pierre-Louis Cayrel, Gerhard Hoffmann and Edoardo Persichetti, Efficient Implementation of a CCA2-Secure Variant of McEliece Using Generalized Srivastava Codes, in: *Public Key Cryptography - PKC 2012* (Marc Fischlin, Johannes A. Buchmann and Mark Manulis, eds.), Lecture Notes in Computer Science 7293, pp. 138–155, Springer, 2012.
- [7] M. N. Gulamhusein, Simple matrix-theory proof of the discrete dyadic convolution theorem, *Electronics Letters* **9** (1973), 238–239.
- [8] Dennis Hofheinz, Kathrin Hövelmanns and Eike Kiltz, *A Modular Analysis of the Fujisaki-Okamoto Transformation*, Cryptology ePrint Archive, Report 2017/604, 2017, http://eprint.iacr.org/2017/604.
- [10] R. J. McEliece, A Public-Key Cryptosystem Based On Algebraic Coding Theory, *Deep Space Network Progress Report* **44** (1978), 114–116.
- [11] R. Misoczki and P. S. L. M. Barreto, Compact McEliece Keys from Goppa Codes, in: *Selected Areas in Cryptography*, pp. 376–392, 2009.
- [12] E. Persichetti, Compact McEliece keys based on quasi-dyadic Srivastava codes, *Journal of Mathematical Cryptology* **6** (2012), 149–169.
- [13] Lukas Polok and Pavel Smrz, Pivoting Strategy for Fast LU Decomposition of Sparse Block Matrices, in: *Proceedings of the 25th High Performance Computing Symposium*, HPC ’17, pp. 14:1–14:12, Society for Computer Simulation International, San Diego, CA, USA, 2017.
- [14] P. W. Shor, Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer, *SIAM Journal on Computing* **26** (1997), 1484–1509.

## A DAGS Algorithms

We briefly describe the three algorithms that define DAGS. Generalized Srivastava codes are defined by parameters *s* and *t*, where in our case log_{2} *s* is the dyadic order; the codes in use have length *n* = *n*_{0}*s* and dimension *k* = *k*_{0}*s*, where *n*_{0} and *k*_{0} are the number of dyadic blocks. Other parameters are the cardinality of the base field *q* and the degree of the field extension *m*. In addition, we have *k* = *k*′ + *k*″, where *k*′ is arbitrary and set to be “small”.

The key generation process uses the following fundamental equation:

$\frac{1}{h_{i \oplus j}} = \frac{1}{h_{i}} + \frac{1}{h_{j}} + \frac{1}{h_{0}},$

which guarantees that we can build a dyadic matrix, with signature **h** = (*h*_{0}, *h*_{1}, …, *h*_{n−1}), which is also a Cauchy matrix, i.e. a matrix *C*(**u**, **v**) with components

$C(\mathbf{u}, \mathbf{v})_{ij} = \frac{1}{u_{i} - v_{j}}.$

### Key Generation

1. Generate a dyadic signature **h** according to the fundamental equation.
2. Build the vectors (**u**, **v**) that define the Cauchy matrix (again using the equation).
3. Form the Cauchy matrix *Ĥ*_{1} = *C*(**u**, **v**).
4. Build *Ĥ*_{i}, *i* = 2, …, *t*, by raising each element of *Ĥ*_{1} to the power of *i*.
5. Superimpose the blocks *Ĥ*_{i} in ascending order to form the matrix *Ĥ*.
6. Generate the scaling vector **z** by sampling elements *z*_{i} in 𝔽_{q^m} with *z*_{is+j} = *z*_{is} for *i* = 0, …, *n*_{0} − 1, *j* = 0, …, *s* − 1.
7. Set $y_{j} = \frac{z_{j}}{\prod_{i=0}^{s-1}(u_{i} - v_{j})^{t}}$ for *j* = 0, …, *n* − 1 and **y** = (*y*_{0}, …, *y*_{n−1}).
8. Form *H* = *Ĥ* ⋅ Diag(**z**).
9. Project *H* onto 𝔽_{q} using the co-trace function: call this *H*_{base}.
10. Write *H*_{base} in systematic form (*A* | **I**_{n−k}).
11. The public key is the generator matrix *G* = (**I**_{k} | *A*^{T}).
12. The private key is the pair (**v**, **y**).

Note that all matrices involved in key generation are quasi-dyadic (with blocks of size *s* × *s*), namely *Ĥ*, *H*, *H*_{base} and its systematic form, and the final matrix *G*, which is the public key. Step 10 is the systematization process, which is impacted by our improved inversion algorithm.

The encapsulation and decapsulation algorithms make use of three hash functions 𝓖, 𝓗 and 𝓚, the last of which maps to {0, 1}^{ℓ}, where *ℓ* is the desired length of the key to be shared.

### Encapsulation

- Choose $\mathbf{m} \stackrel{\$}{\leftarrow} \mathbb{F}_{q}^{k^{\prime}}$.
- Compute **r** = 𝓖(**m**) and **d** = 𝓗(**m**).
- Parse **r** as (*ρ* ∥ *σ*), then set *μ* = (*ρ* ∥ **m**).
- Generate error vector **e** of length *n* and weight *w* from *σ*.
- Compute **c** = *μ**G* + **e**.
- Compute **k** = 𝓚(**m**).
- Output ciphertext (**c**, **d**); the encapsulated key is **k**.

The decapsulation algorithm is essentially a run of the decoding algorithm to decode the noisy codeword received as part of the ciphertext, plus a number of integrity checks.

### Decapsulation

- Recover the parity-check matrix *H*′ in alternant form from the private key.
- Use *H*′ to decode **c** and obtain codeword *μ*′*G* and error **e**′.
- Output ⊥ if decoding fails or wt(**e**′) ≠ *w*.
- Recover *μ*′ and parse it as (*ρ*′ ∥ **m**′).
- Compute **r**′ = 𝓖(**m**′) and **d**′ = 𝓗(**m**′).
- Parse **r**′ as (*ρ*″ ∥ *σ*′).
- Generate error vector **e**″ of length *n* and weight *w* from *σ*′.
- If **e**′ ≠ **e**″ ∨ *ρ*′ ≠ *ρ*″ ∨ **d** ≠ **d**′, output ⊥.
- Else compute **k** = 𝓚(**m**′).
- The decapsulated key is **k**.