The Oribatida v1.3 Family of Lightweight Authenticated Encryption Schemes

Abstract: Permutation-based modes have been established for lightweight authenticated encryption, as can be seen from the high interest in the ongoing NIST lightweight competition. However, their security is upper bounded by $O(\sigma^2/2^c)$, where $\sigma$ is the number of calls and $c$ is the hidden capacity of the state. The development of more schemes that provide higher security bounds led to the CHES'18 proposal Beetle that raised the bound to $O(r\sigma/2^c)$, where $r$ is the public rate of the state. While authenticated encryption can be performed in an on-line manner, authenticated decryption assumes that the resulting plaintext is buffered and never released if the corresponding tag is incorrect. Since lightweight devices may lack the resources for buffering, additional robustness guarantees, such as integrity under release of unverified plaintexts (Int-RUP), are desirable. In this stronger setting, the security of the established schemes, including Beetle, is limited by $O(q_p q_d/2^c)$, where $q_d$ is the maximal number of decryption queries and $q_p$ that of off-line primitive queries, which motivates novel approaches. This work proposes Oribatida, a permutation-based AE scheme that derives $s$-bit masks from previous permutation outputs to mask ciphertext blocks. Oribatida can provide a security bound of $O(r\sigma^2/2^{c+s})$, which allows smaller permutations for the same level of security. It provides a security level dominated by $O(\sigma_d^2/2^c)$ under Int-RUP adversaries, which eliminates the dependency on primitive queries. We prove its security under nonce-respecting and Int-RUP adversaries. We show that our Int-RUP bound is tight and show general attacks on previous constructions.


Permutation-based Modes
The sponge [17] and duplex [15] modes transform an internal n-bit state iteratively with a public permutation. Both modes absorb an input stream block-wise to generate a pseudo-random output stream. While sponges separate the input (absorption) and output (squeezing) phases, the duplex mode generates the i-th output block directly after the i-th input block has been absorbed. In both modes, an n-bit permutation absorbs the data in r-bit chunks, where the outer part of the state r < n is called the rate. The hidden inner part of the state of c = n − r bits is called the capacity, where r and c are a trade-off between performance and security.
Keyed Sponge Variants were introduced by Bertoni et al. [18] and can be categorized into inner-keyed, outer-keyed, and full-keyed variants (cf. [48]); recently, Dobraunig and Mennink added the suffix-keyed sponge [35]. The inner-keyed sponge [28] initializes the inner part with the key, (0 ‖ K), whereas the outer-keyed sponge [18] (so-dubbed by [8]) concatenates key and message K ‖ M for the output. The full-keyed sponge [16] employs the full state in the absorption phase; the suffix-keyed sponge uses a keyed function only at the end.
Permutation-based modes are analyzed mostly in the ideal-permutation model. For authenticated encryption, an adversary A shall distinguish between two worlds consisting of two oracles: each world has (1) a construction oracle that A can ask encryption and verification queries to and (2) a primitive oracle that provides access to the internal permutation. The former oracle represents on-line queries, whereas the latter represents off-line queries. A can ask $q_e$ encryption queries, $q_v$ verification queries, and $q_p$ primitive queries; $\sigma$ usually denotes the number of blocks over all construction queries.
Sponge Modes for Authenticated Encryption started with the Duplex construction and the AE scheme SpongeWrap [15] and MonkeyDuplex [16], and led to a considerable corpus of analysis, e.g., [8,32,37,49,52]. Early on, Bertoni et al. [14] showed that the sponge is indifferentiable from a random oracle [47] for up to $O(2^{c/2})$ calls to the permutation. Their follow-up work [18] improved the bounds for the unkeyed sponge to $O(q_p\sigma/2^c + q_c/2^k)$ if $\sigma \ll 2^{c/2}$. For SpongeWrap, Bertoni et al. [15] had shown a privacy bound of $O(q/2^k + \sigma^2/2^c)$ and an authenticity bound of $O(q/2^k + \sigma^2/2^c + q/2^\tau)$. Jovanovic et al. [41] improved the asymptotic authenticity bound, although under the limitation of at most $\sigma \ll 2^{c/2}$ decryption queries. Summarizing many previous results, Mennink [48] showed that keyed sponges achieve PRF security of around $O((q_c^2 + q_c q_p)/2^c) + \mathbf{Adv}_{\Pi}$. He coined the final term the key-prediction security, e.g., in $O(q_p/2^k)$ for full-keyed sponges and $k < n$. The recent duplex-based AE scheme Beetle [25] added a transform to the output so that the plaintext input that is added to the inner part and to the ciphertext output block differ. As a result, Beetle offered a bound of $O((rq_p + r\sigma)/2^c + (q_v + q_p)/2^r + (\sigma^2 + q_p^2)/2^n)$. Table 1 summarizes some of the most noteworthy results in the past. Improvements to those general bounds appear hard, which motivates the search for novel constructions.
Correct Authenticated Decryption requires the entire plaintext to be buffered until the tag has been verified. On certain architectures, this requirement can exceed the available storage and induce unacceptable latency.
Remark 1. Finally, we acknowledge an observation by Rohit and Sarkar on the NIST lightweight mailing list [59]. We note that our proposal here represents a slightly updated variant of Oribatida compared to the NIST submission [21]; it addresses their observation by masking the authentication tag. We also call it Oribatida v1.3, but use Oribatida hereafter. We will discuss the effect of the slight update later.

Outline
After a brief recall of the necessary preliminaries, Section 3 motivates our proposal by showing Int-RUP attacks on the duplex mode and other existing schemes. Section 4 describes Oribatida in general. We return to Int-RUP attacks on Oribatida and other duplex-based modes in Section 5. We analyze the security of Oribatida for the standard nonce-based AE setting in Section 6 and in the Int-RUP setting in Section 7. Next, Section 8 compares it with those second-round NIST lightweight candidates that claim Int-RUP security. Section 9 discusses the slight update from [21] and the associated improvement. Subsequently, Section 10 specifies an instance with a Simon-based permutation, whose security is discussed in Section 11 based on previous works. Section 12 reports on the result of a hardware implementation of Oribatida before Section 13 concludes this work.

General Notations
We use uppercase letters (e.g., $X$, $Y$) for functions and variables, lowercase letters (e.g., $x$, $y$) for indices and lengths, as well as calligraphic uppercase letters (e.g., $\mathcal{X}$, $\mathcal{Y}$) for sets and spaces. We write $\mathbb{F}_2$ for the field of characteristic 2 and $\mathbb{F}_2^n = \{0,1\}^n$ for the set of vectors over $\mathbb{F}_2$, i.e., strings of $n$ bits. $|X|$ denotes the number of bits of $X$. Given $X \in \mathbb{F}_2^n$, we write $X[i]$ for the $i$-th bit of $X$, and define the bit order by $X = (X[n-1] \,\|\, \ldots \,\|\, X[1] \,\|\, X[0])$. For $t \le n$, we use $\mathsf{msb}_t(X) = (X[n-1] \ldots X[n-t])$ to return the $t$ leftmost (or most significant when interpreting $X$ as an integer) and $\mathsf{lsb}_t(X) = (X[t-1] \ldots X[1] X[0])$ to return the $t$ rightmost (or least significant when interpreting $X$ as an integer) bits of $X$. We write $\emptyset$ for the empty set and $\varepsilon$ for the empty string.
We denote by $X[x..y]$ the range of $X[x], \ldots, X[y]$ for non-negative integers $x$ and $y$. Given binary strings $X$ and $Y$, we denote their concatenation by $X \,\|\, Y$ and their bitwise XOR by $X \oplus Y$ when $|X| = |Y|$. For positive integers $x$ and $y$ and bit strings of different lengths $X \in \mathbb{F}_2^x$ and $Y \in \mathbb{F}_2^y$ with $x \ge y$, we define $X \oplus_y Y \stackrel{\text{def}}{=} X \oplus (0^{x-y} \,\|\, Y)$. We write $X \twoheadleftarrow \mathcal{X}$ to indicate that $X$ is chosen uniformly at random and independent from other variables from a set $\mathcal{X}$. We consider $\mathsf{Func}(\mathcal{X}, \mathcal{Y})$ to be the set of all mappings $F : \mathcal{X} \to \mathcal{Y}$, and $\mathsf{Perm}(\mathcal{X})$ to be the set of all permutations over $\mathcal{X}$. Given an event $E$, we denote the probability of $E$ by $\Pr[E]$. We denote the invalid symbol by $\bot$. Moreover, we denote by $(n)_k \stackrel{\text{def}}{=} \prod_{i=0}^{k-1} (n - i)$ the falling factorial. For $X \in \mathbb{F}_2^*$, we denote by $(X_1, X_2, \ldots, X_x) \xleftarrow{n} X$ the splitting of $X$ into $n$-bit strings $X_1, \ldots, X_{x-1}$, and $|X_x| \le n$, in form of $X_1 \,\|\, \ldots \,\|\, X_x = X$. For $Y \in \mathbb{F}_2^x$, we write $(X_1, X_2, \ldots, X_m) \xleftarrow{x_1, x_2, \ldots, x_m} Y$ for the splitting of $Y$ into strings $X_1, \ldots, X_m$ of $x_1, \ldots, x_m$ bits, respectively, where $x_1 + \cdots + x_m = x$. For a given set $\mathcal{X}$ and non-negative integer $x$, we write $\mathcal{X}^{\le x}$ for the union set $\bigcup_{i=0}^{x} \mathcal{X}^i$. For a natural number $x < 2^n$, we write $\langle x \rangle_n$ for its conversion to an $n$-bit binary string with the most significant bit left, e.g., $\langle 135 \rangle_8 = (10000111)$. We omit $n$ if clear from the context.
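As an illustration of the bit-string notation, the following minimal Python sketch models $n$-bit strings as non-negative integers. The helper names (`msb`, `lsb`, `xor_low`, `encode`) are ours and purely illustrative; they are not part of the specification.

```python
def msb(t: int, x: int, n: int) -> int:
    """msb_t(X): the t most significant (leftmost) bits of the n-bit string X."""
    assert 0 <= t <= n and 0 <= x < (1 << n)
    return x >> (n - t)


def lsb(t: int, x: int) -> int:
    """lsb_t(X): the t least significant (rightmost) bits of X."""
    return x & ((1 << t) - 1)


def xor_low(x: int, y: int, y_len: int) -> int:
    """X xor_y Y: XOR the y_len-bit string Y into the low end of X.

    Zero-extending Y on the left is implicit when strings are integers.
    """
    assert 0 <= y < (1 << y_len)
    return x ^ y


def encode(x: int, n: int) -> str:
    """<x>_n: n-bit binary encoding with the most significant bit on the left."""
    assert 0 <= x < (1 << n)
    return format(x, f"0{n}b")


x = 135
print(encode(x, 8))             # 10000111
print(encode(msb(3, x, 8), 3))  # 100
print(encode(lsb(3, x), 3))     # 111
```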

Nonce-based Authenticated Encryption
Let $\mathcal{K}$ be a set of keys, $\mathcal{N}$ be a set of nonces, $\mathcal{A}$ a set of associated data, $\mathcal{M}$ a set of messages, $\mathcal{C}$ a set of ciphertexts, and $\mathcal{T}$ a set of authentication tags. A nonce $N \in \mathcal{N}$ is an input that must be unique for each authenticated encryption query.
A nonce-based AE scheme (with associated data) $\Pi = (\mathcal{E}, \mathcal{D})$ is a tuple of a deterministic encryption algorithm $\mathcal{E} : \mathcal{K} \times \mathcal{N} \times \mathcal{A} \times \mathcal{M} \to \mathcal{C} \times \mathcal{T}$ and a deterministic decryption algorithm $\mathcal{D} : \mathcal{K} \times \mathcal{N} \times \mathcal{A} \times \mathcal{C} \times \mathcal{T} \to \mathcal{M} \cup \{\bot\}$ with associated key space $\mathcal{K}$. The encryption algorithm $\mathcal{E}$ takes a tuple $(K, N, A, M)$ and outputs $(C, T)$, where $C$ is a ciphertext and $T$ an authentication tag. We assume that $|C| = |M|$ holds for all inputs $(K, N, A, M)$ and their corresponding ciphertexts. The associated data is authenticated, but not encrypted. The decryption function $\mathcal{D}$ takes a tuple $(K, N, A, C, T)$ and outputs either the unique plaintext $M$ for which $\mathcal{E}_K(N, A, M) = (C, T)$ holds, or outputs $\bot$ if the input is invalid. We introduce $\mathcal{E}_K^{N,A}(M)$ as short form of $\mathcal{E}_K(N, A, M)$ and $\mathcal{D}_K^{N,A}(C, T)$ for $\mathcal{D}_K(N, A, C, T)$, respectively. We assume that AE schemes are correct, i.e., for all $(K, N, A, M) \in \mathcal{K} \times \mathcal{N} \times \mathcal{A} \times \mathcal{M}$, it holds that $\mathcal{D}_K^{N,A}(\mathcal{E}_K^{N,A}(M)) = M$. The ideal AE scheme provides two oracles $\$ : \mathcal{N} \times \mathcal{A} \times \mathcal{M} \to \mathcal{C} \times \mathcal{T}$ and $\bot : \mathcal{N} \times \mathcal{A} \times \mathcal{C} \times \mathcal{T} \to \mathcal{M} \cup \{\bot\}$ that offer access to encryption and verification. We overload the $\bot$ notation to mean both the oracle and the symbol of invalid decryption. Given $(N, A, M)$, the ideal encryption oracle outputs ciphertext-tag tuples $(C, T)$ that are random bits of the expected length. The ideal decryption oracle ensures correctness, i.e., given an input $(N, A, C, T)$ where $(C, T)$ had been the output to a previous encryption query $(N, A, M)$, the decryption oracle outputs the corresponding message $M$. Otherwise, the decryption oracle returns the invalid symbol $\bot$ for every new decryption query that had not been the answer to an earlier encryption query.
Since this work studies schemes based on public permutations, we employ the usual security notions in the ideal-permutation model. So, the adversary always has an additional oracle $\pi^{\pm}$ that provides access to the public permutation $\pi$ in forward and backward direction. We write $\Pi[\pi]$ and $\mathcal{E}[\pi]$, $\mathcal{D}[\pi]$, etc. to indicate that an authenticated-encryption scheme $\Pi$ and its algorithms are based on a primitive $\pi \twoheadleftarrow \mathsf{Perm}(\mathcal{B})$, where $\mathcal{B} = \{0,1\}^n$ is some block space. Note that in the analysis, the permutation is chosen uniformly at random from the set of all permutations over $\mathcal{B}$. The model is thus an adaptation of the ideal-cipher model [60], and not of the random-oracle model [12]. We write $\Delta_A(\mathcal{O}_1; \mathcal{O}_2)$ for the advantage of A to distinguish between $\mathcal{O}_1$ and $\mathcal{O}_2$.
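Let $\Pi[\pi]$ be a nonce-based authenticated encryption scheme and let A be a nonce-respecting adversary. Then, the AE advantage of A against $\Pi[\pi]$ is its advantage $\Delta_A(\mathcal{E}_K[\pi], \mathcal{D}_K[\pi], \pi^{\pm}; \$, \bot, \pi^{\pm})$ in distinguishing the real construction from the ideal oracles, for a uniformly random key $K$ and a uniformly random permutation $\pi$.
In the RUP model, the understanding of nonce-based AE differs slightly from the previous definition. Here, we use the notion that the adversary can always see the full resulting plaintext from a decryption query.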
To formulate the forgery goal, the oracles are adapted. A verification oracle outputs 1 iff the input is valid, and 0 otherwise. A nonce-based RUP authenticated encryption scheme

H-coefficient Technique
We will need a proof method in the later parts, where we opt for Patarin's well-suited H-coefficient technique in the variant by Chen and Steinberger [30,54]. The results of the interaction of an adversary A with its oracles are collected in a transcript $\tau$. The oracles can sample randomness before the interaction (often a key or an ideal primitive that is sampled beforehand), and are then deterministic throughout the experiment [30]. The challenge of A is to distinguish the real world $\mathcal{O}_{\text{real}}$ from the ideal world $\mathcal{O}_{\text{ideal}}$. We denote by $\Theta_{\text{real}}$ and $\Theta_{\text{ideal}}$ random variables that represent the distribution of transcripts in the real and the ideal world, respectively. In general, a transcript $\tau$ is called attainable if the probability to obtain $\tau$ in the ideal world (i.e., over $\Theta_{\text{ideal}}$) is non-zero. The Fundamental Lemma of the H-coefficient technique, the proof of which is given in [30,54], builds on a partition of the set of all attainable transcripts into two disjoint sets GoodT and BadT. Note that in all our analyses, we consider information-theoretic distinguishers A, where we assume idealized primitives. Thus, their resources are bounded only in terms of the maximal numbers of queries and blocks that they can ask to their available oracles. One can derive the computational counterparts easily by adding a parameter for the distinguishers' maximal computational effort.
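For reference, the Fundamental Lemma mentioned above is commonly stated in the following form. The symbols $\epsilon_1$ and $\epsilon_2$ are our naming for this sketch and may differ from the notation of [30,54].

```latex
\textbf{Lemma (Fundamental Lemma of the H-coefficient technique).}
Assume that there exist $\epsilon_1, \epsilon_2 \ge 0$ such that, for every
good transcript $\tau \in \textsc{GoodT}$,
\[
  \frac{\Pr[\Theta_{\mathrm{real}} = \tau]}{\Pr[\Theta_{\mathrm{ideal}} = \tau]}
  \ge 1 - \epsilon_1 ,
\]
and that $\Pr[\Theta_{\mathrm{ideal}} \in \textsc{BadT}] \le \epsilon_2$.
Then, for all distinguishers $A$,
\[
  \Delta_A(\mathcal{O}_{\mathrm{real}}; \mathcal{O}_{\mathrm{ideal}})
  \le \epsilon_1 + \epsilon_2 .
\]
```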

Int-RUP Attacks on Existing AE Schemes
This section shows attacks under Int-RUP adversaries on the duplex mode, as well as on the recent, more secure AE schemes Beetle and SPoC. For each construction, we briefly recall the necessary parts of its definition. As summarized in Table 2, the proposed attacks possess an advantage of $O(q_p q_d/2^c)$ on the previous constructions. Thus, the improved bounds of Beetle or SPoC do not carry over to the Int-RUP setting. For all attacks, we consider a random permutation $\pi \twoheadleftarrow \mathsf{Perm}(\mathcal{B})$ and a randomly chosen secret key $K \twoheadleftarrow \mathcal{K}$. We denote by A a nonce-respecting Int-RUP adversary against the individual schemes.
The main idea of all the attacks in this section is as follows: A asks $q_d$ decryption queries s.t. any predetermined $r$ bits (e.g., the first $r$ bits) of the input to one of the permutations of the construction are fixed and known (say $X$). The remaining $n - r = c$ bits may vary. Next, A asks $q_p$ primitive queries $Q_1, Q_2, \ldots, Q_{q_p}$ with the first $r$ bits fixed to $X$, but with pairwise distinct $c$ bits, and receives $R_1, R_2, \ldots, R_{q_p}$. When $q_d \cdot q_p \approx O(2^c)$, A can expect a state collision between an on-line input to the permutation and an (off-line) permutation query. This collision can be detected from the first $r$ bits of the outputs of the corresponding construction queries, which will be equal for the colliding inputs. Once A knows the full state at the input to the permutation of the construction, it can revert the permutation calls in the construction and finally recover the key. We adapt this strategy for the duplex mode and Beetle before we consider the differences in SPoC and hybrids.
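The matching step of this generic idea can be illustrated with a toy simulation. The sketch below is not an attack on any concrete scheme: it assumes a tiny 16-bit permutation given as a shuffled table, models the on-line queries simply as permutation inputs with a fixed rate and hidden capacities, and all names (`toy_state_recovery`, `hidden`, `candidates`) are hypothetical.

```python
import random


def toy_state_recovery(r=8, c=8, q_d=16, q_p=16, seed=1):
    """Match on-line outputs against off-line primitive queries (toy model).

    Returns (candidates, hidden): candidates is a set of pairs
    (index of on-line query, guessed capacity); hidden lists the true
    capacities, which the adversary does not see.
    """
    n = r + c
    rng = random.Random(seed)
    pi = list(range(1 << n))
    rng.shuffle(pi)                      # stand-in for the ideal permutation

    fixed_rate = 0                       # the r known input bits X
    # Hidden capacities of the q_d on-line permutation inputs.
    hidden = rng.sample(range(1 << c), q_d)
    # The adversary only observes the first r bits of each on-line output.
    observed = {}
    for i, cap in enumerate(hidden):
        rate_out = pi[(fixed_rate << c) | cap] >> c
        observed.setdefault(rate_out, []).append(i)

    # Off-line phase: q_p primitive queries, same rate, distinct capacities.
    candidates = set()
    for guess in rng.sample(range(1 << c), q_p):
        rate_out = pi[(fixed_rate << c) | guess] >> c
        for i in observed.get(rate_out, []):
            candidates.add((i, guess))   # possible full-state match
    return candidates, hidden


candidates, hidden = toy_state_recovery()
true_hits = [(i, g) for (i, g) in candidates if hidden[i] == g]
print(len(candidates), "candidate matches,", len(true_hits), "true state collisions")
```

A match on only $r$ output bits can be a false positive; in the attacks described here, the adversary would confirm a candidate with further output blocks before inverting the permutation calls.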

Int-RUP Attack on The Duplex Mode
Let us consider the SpongeWrap mode [15].
1. The adversary A asks $q_d$ decryption queries (N, …). The associated data $A_i$ consist of a single block; the ciphertexts $C = (C_1, C_2)$ are fixed to the same two blocks for each query.
2. Now A can follow the generic idea to complete the attack.
The attack complexity is $q_d q_p \approx O(2^c)$.
For Beetle (see Figure 1), the attack proceeds as follows:
1. The adversary A asks $q_d$ encryption queries $(N_1, A_1, M), (N_2, A_2, M), \ldots, (N_{q_d}, A_{q_d}, M)$ to the encryption oracle, and receives $C_1, C_2, \ldots, C_{q_d}$. The size of $M$ and $A_i$ is one block for each $i$.
2. Then, A asks $q_d$ decryption queries $(N_1, …)$. This ensures that the first $r$ bits of the input to the third permutation always equal zero.
3. Now A can follow the generic idea to complete the attack.
The attack complexity is again $q_d q_p \approx O(2^c)$.
Figure 1: The Beetle authenticated encryption scheme.

Int-RUP Attack on SPoC
SPoC (see Figure 2a) is a permutation-based NIST lightweight candidate [5] that uses the capacity to derive ciphertext outputs from, while it still absorbs the message in the rate. In SPoC, the adversary cannot fix any part of the state, in contrast to SpongeWrap and Beetle; however, there is a similar attack:
2. Now A can follow the generic idea to complete the attack.
So, the attack needs $q_d q_p \in O(2^c)$ to work, as before.
Figure 2: (a) SPoC. (b) A hybrid of Beetle and SPoC.

Int-RUP Attack on A Hybrid of Beetle and SPoC
We can generalize our attacks to hybrid modes of Beetle and SPoC as well. Such a hybrid would use both modes Beetle and SPoC in parallel to process the queries. We illustrate it in Figure 2b. Each message block (say $M$) is parsed into two sections (say $M_1$ and $M_2$), where $|M_1| = r_1$ and $|M_2| = r_2$. $M_1$ is processed with Beetle to a ciphertext block $C_1$; $M_2$ with SPoC to a ciphertext block $C_2$. The final ciphertext block becomes $C \leftarrow C_1 \,\|\, C_2$, and the associated-data blocks and the ciphertext blocks for decryption are treated in a similar manner. Note that the hybrid mode is parameterized by $r_1$, $r_2$, and $c$ with the condition $c \ge r_2$. The sizes of rate and capacity of the Beetle part are $r_1$ and $c - r_2$; the size of both rate and capacity of the SPoC part is $r_2$. As a result, the sizes of rate and capacity of the hybrid mode are $r = r_1 + r_2$ and $c$. When $r_2 = 0$, the hybrid mode translates to the Beetle mode. Similarly, when $r_1 = 0$, the hybrid mode is equivalent to the SPoC mode. An Int-RUP attack on such modes could be defined as follows:
1. A asks $q_d$ decryption queries to the decryption oracle and receives $M_1, M_2, \ldots, M_{q_d}$. The ciphertext $C$ and associated data $A_i$ consist of a single block for every $i$.
2. There exists at least one value of the last $r_2$ bits of the input to the third permutation which remains the same for at least $q = q_d/2^{r_2}$ queries. Suppose $q$ such queries are (N, …).
3. A can detect the previous step as it knows the value of the last $r_2$ bits of the input to the third permutation because that will be equal to the last $r_2$ bits of $C \oplus M_i$.
4. A retains those $q$ queries and discards the rest.
5. For each of the above queries, A updates the value of the first $r_1$ bits of the ciphertext to $Y_2 \oplus \mathsf{shuffle}(Y_2)$ and varies the remaining $r_2$ bits. This ensures that the first $r_1$ bits of the input to the third permutation always equal zero.
6. In the way mentioned above, A can ask $q_d$ more decryption queries to the decryption oracle. This time, a total of $r$ bits (first $r_1$ bits and last $r_2$ bits) of the input to the third permutation are fixed and known to A.
7. Then, A can follow the generic idea to complete the attack.
The attack again needs $q_d q_p \in O(2^c)$, as before.

Discussion
As a takeaway from this section, the unmasked sponge-based AE schemes allow Int-RUP attacks whose advantage can depend linearly on the number of off-line primitive queries. We think that any AE construction which uses linear feedback is vulnerable to such an attack ($q_d q_p \in O(2^c)$) unless it uses more state. The next section defines Oribatida, which masks its ciphertexts for higher Int-RUP resistance. After its definition, we will get back to attacks on it and on strengthened versions of the schemes sketched here to show that they can be similarly extended by a ciphertext masking.

Specification of Oribatida
At its core, Oribatida is a variant of the monkey-wrap design [16], but adds a ciphertext masking. This section considers a slightly updated version of Oribatida. Section 9 discusses the update from Oribatida (v1.2) from [21] to Oribatida v1.3 in this work.
In the following, let $P \in \mathsf{Perm}(\mathcal{B})$ be a permutation. We denote by $(X_i, Y_i)$ the inputs and by $(U_i, V_i)$ the outputs of the primitive(s). As in the classical sponge, Oribatida considers the state $S_i = (U_i \,\|\, V_i)$ as a rate $U_i$ of $r$ bits, to which inputs are XORed, and a capacity $V_i$ of $c = n - r$ bits. Unlike the usual sponge, an $s$-bit part of the capacity is used to mask the subsequent ciphertext block. The definition is given in Algorithm 1. We assume that the key size is at most the capacity, $k \le c$, and that the tag size is at most the rate, $\tau \le r$ bits.

Initialization
Each variant of Oribatida uses a fixed-size nonce $N$, whose length $\nu$ is such that $k + \nu = n$ bits. $N$ is concatenated with the key $K$ to initialize the state with $N \,\|\, K$.
Figure 3: Authenticated encryption of $a$-block associated data $A$ and $m$-block message $M$ with Oribatida.

Processing Associated Data
After the initialization, the associated data $A$ is split into $r$-bit blocks and is absorbed in the rate. If its length is not a multiple of $r$ bits (i.e., $|A| \bmod r \ne 0$), $A$ is padded with a $10^*$-padding such that its length becomes the next multiple of $r$ bits. If the associated data is empty, it is padded to one full block $10^{r-1}$. In this case, we also denote the length of the padded associated data in blocks as $a = 1$. The padded $A$ is split into $r$-bit blocks $(A_1, \ldots, A_a)$.
For all non-final blocks of $A$, the capacity of the permutation output, $V_i$, is simply forwarded to that of the subsequent input to the permutation $P$: $Y_i \leftarrow V_i$. The state is updated with $P$ afterwards, for all $1 \le i < a$ (that is, except the final $a$-th block of $A$): $S_{i+1} \leftarrow P(X_i \,\|\, Y_i)$. When the final block $A_a$ is processed, a domain $d_A$ is XORed to the least significant byte of the capacity.
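The padding and splitting of the associated data can be sketched as follows. This is a minimal illustration that represents bit strings as Python strings of `'0'` and `'1'`; the function names `pad_ad` and `split_blocks` are ours.

```python
def pad_ad(bits: str, r: int) -> str:
    """10*-padding for associated data.

    Non-empty inputs whose length is a multiple of r stay unchanged;
    the empty string becomes the single block 1 0^(r-1).
    """
    if bits and len(bits) % r == 0:
        return bits
    padded = bits + "1"
    return padded + "0" * (-len(padded) % r)


def split_blocks(bits: str, r: int) -> list:
    """Split a padded bit string into r-bit blocks (A_1, ..., A_a)."""
    assert len(bits) % r == 0
    return [bits[i:i + r] for i in range(0, len(bits), r)]


print(split_blocks(pad_ad("", 4), 4))       # ['1000']
print(split_blocks(pad_ad("10101", 4), 4))  # ['1010', '1100']
print(split_blocks(pad_ad("1010", 4), 4))   # ['1010']
```

Note that this helper covers the associated data only; an empty message is not padded, so the message path would need a separate case.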

Encryption
After $A$ has been processed, the message $M$ is encrypted. As for the associated data, if its length is not a multiple of $r$ bits, $M$ is padded with a $10^*$-padding such that its length after padding becomes the next multiple of $r$ bits. An empty message $M = \varepsilon$ will not be padded.
After $M$ is split into $r$-bit blocks $(M_1, \ldots, M_m)$ (after padding if necessary), the blocks $M_i$ are processed one after the other. Given the state value $(U_{a+i}, V_{a+i}) \xleftarrow{r,c} S_{a+i}$, the current block $M_i$ is XORed to the rate $U_{a+i}$: $X_{a+i} \leftarrow M_i \oplus U_{a+i}$. The capacity is simply forwarded: $Y_{a+i} \leftarrow V_{a+i}$. Then, $(X_{a+i} \,\|\, Y_{a+i})$ is used as input to a call to $P$ to derive the next state value $S_{a+i+1} \leftarrow P(X_{a+i} \,\|\, Y_{a+i})$.
The ciphertext blocks $C_i$ are computed from a sum of the current rate, the current plaintext block, and a (partial) earlier mask value from the capacity. The first ciphertext block is computed from $C_1 \leftarrow X_{a+1} \oplus_s \mathsf{lsb}_s(V_1)$. If $C_1$ is the final ciphertext block, it is computed as $C_1 \leftarrow \mathsf{msb}_{\ell_E}(X_{a+1} \oplus_s \mathsf{lsb}_s(V_1))$, where $\ell_E$ denotes the length of $M$ before padding. Non-final ciphertext blocks $C_i$, $1 < i < m$, are computed from $C_i \leftarrow X_{a+i} \oplus_s \mathsf{lsb}_s(V_{a+i-1})$. If $m > 1$, the final ciphertext block results from $C_m \leftarrow \mathsf{msb}_{\ell_E \bmod r}(X_{a+m} \oplus_s \mathsf{lsb}_s(V_{a+m-1}))$. For the final message block, a domain $d_E$ is XORed to the least significant byte of the capacity: $Y_{a+m} \leftarrow V_{a+m} \oplus_d \langle d_E \rangle_d$. Similarly to $d_A$, $d_E$ takes one of three pairwise distinct values depending on whether $M$ was empty, non-empty and required no padding, or non-empty and has been padded. $P$ is called another time to derive $S_{a+m+1} \leftarrow P(X_{a+m} \,\|\, Y_{a+m})$. Its rate is XORed with the mask $\mathsf{lsb}_s(V_{a+m})$ and, truncated to $\tau$ bits if necessary, is released as the authentication tag: $T \leftarrow \mathsf{msb}_\tau(S_{a+m+1}) \oplus_s \mathsf{lsb}_s(V_{a+m})$. Note that, for $s = \tau$ as in our instantiations, the tag is masked in the same way as the ciphertext output blocks, which unifies this process.

Decryption
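The decryption takes a tuple $(K, N, A, C, T)$. The initialization with $K$ and $N$ as well as the processing of the associated data $A$ is performed in the same manner as for encryption. If $|C| \bmod r \ne 0$, the decryption pads $C$ with a $10^*$-padding to the next multiple of $r$ bits. In all cases, it splits $C$ into $r$-bit blocks $(C_1, \ldots, C_{m-1})$ plus a final block $C_m$. If $m > 1$, each non-final plaintext block is computed by inverting the encryption step, i.e., $X_{a+i} \leftarrow C_i \oplus_s \mathsf{lsb}_s(V)$ and $M_i \leftarrow U_{a+i} \oplus X_{a+i}$, where $V$ is the same mask value as used in the encryption of that block. The capacity is simply forwarded to the next call of the permutation: $Y_{a+i} \leftarrow V_{a+i}$. The final plaintext block is computed from the padded ciphertext block $C_m$ as $X_{a+m} \leftarrow C_m \oplus_s \mathsf{lsb}_s(V)$ and $M_m \leftarrow \mathsf{lsb}_x(U_{a+m} \oplus X_{a+m})$, where $x \leftarrow \ell_E \bmod r$. For the final block, the domain $d_E$ is XORed to the least significant byte of the capacity: $Y_{a+m} \leftarrow V_{a+m} \oplus_d \langle d_E \rangle_d$. The tag $T'$ is then derived as for the encryption, using only its most significant $\tau$ bits: $T' \leftarrow \mathsf{msb}_\tau(T' \,\|\, Z)$. If $T = T'$, the ciphertext is considered valid, and $M = (M_1 \,\|\, \cdots \,\|\, M_m)$ is released as plaintext. Otherwise, the ciphertext is deemed invalid, and $\bot$ is returned.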
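The block-wise masking in encryption and decryption can be sketched as follows. This is a simplified illustration under several assumptions: all blocks are full $r$-bit blocks, domain separation, final-block truncation, and tag generation are omitted, and the permutation is a toy affine map that is not secure. The names `make_toy_permutation` and `process_blocks` are hypothetical.

```python
def make_toy_permutation(n: int):
    """Return an invertible toy map on n-bit integers (NOT secure)."""
    mask = (1 << n) - 1
    # Multiplication by an odd constant is a bijection modulo 2^n.
    return lambda x: (x * 0x9E3779B1 + 0x7F4A7C15) & mask


def process_blocks(state, v1, blocks, perm, c, s, decrypt=False):
    """Encrypt or decrypt full blocks with masking by an earlier capacity.

    state: state S_{a+1} after the associated data, as the integer (U || V).
    v1:    capacity V_1, whose s low bits mask the first ciphertext block.
    """
    cap_mask = (1 << c) - 1
    mask_src = v1
    out = []
    for block in blocks:
        u, v = state >> c, state & cap_mask      # rate U and capacity V
        m = mask_src & ((1 << s) - 1)            # lsb_s of the mask source
        if decrypt:
            x = block ^ m                        # X = C xor_s lsb_s(V)
            out.append(x ^ u)                    # M = X xor U
        else:
            x = block ^ u                        # X = M xor U
            out.append(x ^ m)                    # C = X xor_s lsb_s(V)
        mask_src = v                             # masks the next block
        state = perm((x << c) | v)               # capacity forwarded as is
    return out, state


r, c, s = 8, 8, 4
perm = make_toy_permutation(r + c)
state_after_ad, v1 = 0xBEEF, 0x5A                # hypothetical example values
message = [0x01, 0x02, 0x03]
ciphertext, s_enc = process_blocks(state_after_ad, v1, message, perm, c, s)
plaintext, s_dec = process_blocks(state_after_ad, v1, ciphertext, perm, c, s,
                                  decrypt=True)
print(plaintext == message and s_enc == s_dec)   # True
```

Both directions feed the same value $X$ into the permutation, so the states stay synchronized; this is why the final states compared in the last line agree.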

Domain Separation
For domain separation, Oribatida defines constants $d_N$, $d_A$, and $d_E$. The domains are XORed with the least significant byte of the state at three stages. Domains are encoded as $d$-bit strings, where $d = 4$ bits suffice in practice. The value depends on the presence of $A$ and $M$ and whether their final blocks are absent, partial, or integral to prevent trivial collisions of inputs to $P$ among blocks of $A$ and $M$. The constants are determined by four bits $(t_3, t_2, t_1, t_0)$ that reflect inputs in the hardware API, similar to, e.g., [25]:
- EOI: $t_3$ is the end-of-input control bit. This bit is set to 1 if the current data block is the final block of the input. Note that if the associated data is empty, then the created $10^{r-1}$-block is never treated as the final block of the input.
- EOT: $t_2$ is the end-of-type control bit. This bit is set to 1 if the current data block is the final block of the same type, i.e., it is the last block of the nonce/associated data/message. Note that if both the associated data and the message are empty, then neither the nonce nor the created $10^{r-1}$-block of the associated data is considered as the final block of its type.
- Partial: $t_1$ is the partial-control bit. This bit is set to 1 if the size of the current block is less than the block size. Note that if the associated data is empty and the message is non-empty, then the created $10^{r-1}$-block is treated as a partial block.
- Type: $t_0$ is the type-control bit, identifying the type of the current block. For the nonce and the final message block, $t_0 = 1$. If the associated data is empty and the message is non-empty, then $t_0 = 1$ for the created $10^{r-1}$-block of the associated data. For all other cases, $t_0 = 0$.
While processing a data block, the domains are set as the integer representation of $t_3 \,\|\, t_2 \,\|\, t_1 \,\|\, t_0$. For example, processing the nonce (which is always a complete $r$-bit block) with empty associated data and non-empty message yields $d_N = (t_3 t_2 t_1 t_0) = (0101)_2 = 5$. Details are provided in Algorithm 2; $\ell_A$ denotes the length of $A$ and $\ell_E$ that of $M$ in bits before padding. An overview is given in Table 3.
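The encoding of a domain from its four control bits can be sketched as below. The function names `domain` and `apply_domain` are ours; the rules that decide each bit for a given block are those listed above and in Algorithm 2 and are not reproduced in the code.

```python
def domain(eoi: bool, eot: bool, partial: bool, type_bit: bool) -> int:
    """Integer representation of t3 || t2 || t1 || t0."""
    return (int(eoi) << 3) | (int(eot) << 2) | (int(partial) << 1) | int(type_bit)


def apply_domain(capacity: int, d: int) -> int:
    """XOR a 4-bit domain into the least significant bits of the capacity."""
    assert 0 <= d < 16
    return capacity ^ d


# Nonce block with empty associated data and a non-empty message:
# not the end of the input, last block of its type, complete, type bit set.
d_n = domain(eoi=False, eot=True, partial=False, type_bit=True)
print(d_n, format(d_n, "04b"))  # 5 0101
```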

Int-RUP Attacks on Schemes with Masked Ciphertexts
The approach of Oribatida is to employ (a portion of) the capacity of the previous permutation output to mask the ciphertext outputs. This strategy is generic enough to also apply to other modes, such as Beetle or SPoC. We can informally define a masked variant of Beetle and SPoC. The masked Beetle uses $Z_{i-1}$ and XORs $\mathsf{lsb}_s(Z_{i-1})$ to the $s$ rightmost bits of the ciphertext block $C_i$, for $i > 1$ and $s \le c$. If there is no associated data present, we define $Z_0 = K_2$. A masked variant of SPoC would employ the rate (since it is the hidden part). Thus, for the masked SPoC, we define $s \le r$ and define that $\mathsf{lsb}_s(U_{i-1})$ is XORed to $C_i$ for $i > 0$. If no associated data is present, we define $U_0$ for the rate of the initial input to the permutation $P$.
For Oribatida, the masked Beetle, or the masked SPoC, the attacks in Section 3 do not apply directly. However, there exist attacks on each of them with advantage $O(q_d^2/2^c)$.

The Generic Int-RUP Attack on Oribatida (Masked Duplex)
Here, we consider an attack on Oribatida that shows that our Int-RUP bound of $O(q_d^2/2^c)$ is tight. The attack uses queries in which $N_i$ and $C_i$ are static for all queries. We assume that the associated data $A_i$ are pairwise distinct and consist of a single block for all queries. A obtains $M_i$ from the encryption oracle. A collision among the resulting states then leads to a full-state collision that can be detected when $M_i$ is a valid forgery and yields $M_j$.

Int-RUP Attack on The Masked Beetle
1. The adversary A asks $q_d$ encryption queries $(N_1, A_1, M), (N_2, A_2, M), \ldots, (N_{q_d}, A_{q_d}, M)$ to the encryption oracle, and receives $C_1, C_2, \ldots, C_{q_d}$. The associated data $A_i$ consist of a single block for each $i$; the message $M$ contains $\lceil c/r \rceil$ blocks.
2. A asks $q_d - 1$ decryption queries, one for each encryption query except the first encryption query, to the decryption oracle. The decryption query for the $i$-th encryption query is … with the definition of shuffle. So, the first $r$ bits of the input to the third permutation call always equal those of the first encryption query.
3. Afterwards, A repeats Steps 2 to 6 from Section 5.1 to complete the attack.
The attack is successful for $q_d^2 \approx O(2^c)$. However, the attack strategy differs for SPoC.

Int-RUP Attack on The Masked SPoC
Here, A has to perform the attack in two stages.
1. A asks $q_d$ decryption queries to the decryption oracle, and receives $M_1, M_2, \ldots, M_{q_d}$. The associated data $A_i$ consists of a single block for each $i$; the ciphertext $C$ consists of two blocks.
2. When $q_d \approx O(2^r)$, A expects a collision in the first $r$ bits of the input to the third permutation call. A can detect this collision by looking at the first message block because it will be equal only for the two colliding queries.
3. Suppose the associated data of the two colliding queries are $A_i$ and $A_j$.
4. A asks $q_1$ decryption queries $(N, A_i, C^x)$ and $q_2$ decryption queries $(N, A_j, C^y)$ to the decryption oracle.
5. When $q_1 \cdot q_2 \approx O(2^r)$, A expects a full state collision at the input to the third permutation, between one query with associated data $A_i$ and another query with associated data $A_j$.
6. Suppose the two ciphertexts corresponding to the two colliding queries are $C^p$ and $C^q$, and the corresponding messages are $M^p$ and $M^q$.
7. A identifies those pairs of queries $(N, A_i, C^x)$, $(N, A_j, C^y)$, $1 \le x \le q_1$ and $1 \le y \le q_2$, for which the sum of the second message blocks equals that of the first message blocks. For each such pair, A updates $C^x$ and $C^y$ by appending $\lceil c/r \rceil - 1$ ciphertext blocks to each s.t. …, and makes decryption queries with the updated ciphertexts. For $k > 2$, $M^p_k$ will equal $M^q_k$.
8. Next, A asks $(N, A_i, M^p)$ to the encryption oracle; suppose the tag is $T$.
9. Then, A successfully forges with the query $(N, A_j, C^q, T)$.
Again, the probability for forgeries becomes non-negligible when $q_d^2 \approx O(2^c)$.

AE Security Analysis
This section analyzes the AE security of Oribatida. In the following, let $K \twoheadleftarrow \mathcal{K}$ and $\pi \twoheadleftarrow \mathsf{Perm}(\mathcal{B})$. We use $\Pi[\pi, \pi]_K = \Pi[\pi]_K$ as short form of Oribatida, instantiated with $\pi$ for $P$ and for $P'$, and keyed by $K$. Let A be a nonce-respecting AE adversary w.r.t. $\Pi[\pi]_K$. We denote by $q_p$, $q_f$, $q_b$, $q_c$, $q_e$, $q_d$, $\sigma_c$, $\sigma_e$, $\sigma_d$ the number of primitive queries, forward primitive queries, backward primitive queries, construction queries, encryption queries, decryption queries, blocks summed over all construction queries, blocks summed over only all encryption queries, and blocks summed over all decryption queries, respectively. It holds that $q_p = q_f + q_b$, $q_c = q_e + q_d$, and $\sigma_c = \sigma_e + \sigma_d$. For simplicity, we define a function $\rho$ such that $V^i_{\rho(i,j)}$ denotes the block used for masking the ciphertext block $C^i_j$.
Recall the notion of a longest common prefix from [11]. Let $Q = (N, A, M, C, T)$ be a query of A with the response. Let $\mathcal{Q}$ denote a set of queries without $Q$, i.e., $Q \notin \mathcal{Q}$. We define the length of the longest common prefix of $M$ and another message $M'$ as $\mathsf{LCP}(M, M')$. Given $\mathcal{Q}$ and $M$, we overload the notation by considering the longest common prefix of $M$ with the queries in $\mathcal{Q}$. Collisions of chaining values are trivial if they occur in the longest common prefix and non-trivial otherwise.
Proof. We follow the strategy of the AE proof of Beetle [25]. The queries by A and their corresponding answers are collected in a transcript $\tau = (\tau_e, \tau_d, \tau_p)$. In that transcript, $\tau_e$ stores the encryption construction queries, $\tau_d$ the decryption construction queries, and $\tau_p$ the primitive queries.

Sampling.
We define the ideal oracle to consist of an on-line and an off-line phase. In the on-line phase, the ideal oracle samples the responses $(C^i, T^i)$ uniformly at random from the bit strings of expected lengths for encryption queries. For decryption queries, it always outputs $\bot$. For forward primitive queries $Q_i$, it forwards the result of $\pi(Q_i)$ to A; for backward primitive queries $R_i$, it forwards the result of $\pi^{-1}(R_i)$. Moreover, for construction queries whose plaintext or ciphertext length is not a multiple of $r$ bits, the oracle samples exactly the missing bits of $C^i_{m_i}$ that are not fixed from previous queries independently and uniformly at random, at most $r - |C^i_{m_i}|$ bits at a time. The so-sampled values for the final blocks $C^i_{m_i}$ are also stored in the transcript. Moreover, the random key $K$ is revealed to A after the off-line phase.

Bad Events.
We define the following bad events. If any of them occurs, the adversary aborts, and we define that it wins in this case.
bad 1 : Multi-collision on the rate X in encryption construction queries: for some w ≥ r, there exist distinct indices $(i_1, j_1), \ldots, (i_w, j_w)$ s. t. $X^{i_1}_{j_1} = \cdots = X^{i_w}_{j_w}$.
bad 2 : Collision of two permutation inputs in encryption construction queries.
bad 3 : Collision of permutation outputs in encryption construction queries.
bad 4 : Collision of permutation inputs between a construction and a primitive query.
bad 5 : Collision of permutation outputs between a construction and a primitive query.
bad 6 : Initial-state collision with a primitive query.
bad 7 : Multi-collision in the rate of w outputs of forward primitive queries: for some w ≥ r, there exist distinct indices $i_1, \ldots, i_w \in [1..q_p]$ s. t. $\mathsf{msb}_r(R^{i_1}) = \mathsf{msb}_r(R^{i_2}) = \cdots = \mathsf{msb}_r(R^{i_w})$.
bad 8 : Multi-collision in the rate of w outputs of backward primitive queries, for some w ≥ r.

We define that the adversary is provided with all internal chaining values $V^i_j$ and $U^i_j$ after its interaction, but before it outputs its decision bit, which only strengthens the adversary. We define the set of bad transcripts BadT to contain exactly those attainable transcripts τ for which at least one of the bad events occurred. The probability of bad transcripts in the ideal world is treated in the proof of Lemma 2. The ratio of good transcripts is bounded in Lemma 3. Our bound in Theorem 1 follows from them and the fundamental lemma of the H-coefficient technique [54], using w = r in the bounds.

Lemma 2.
Let w ≥ r be a positive integer. It holds that Proof. In the following, we upper bound the probabilities of the individual bad events.
bad 1 : Multi-collision on X in encryption construction queries.
In the ideal world, the ciphertext blocks are sampled independently and uniformly at random from the strings of expected length. The internal values X i j can be computed by A once it is given the transcript including the internal chaining values V i j . It must hold that . The random sampling of C implies that the probability of the values X i j to take any specific r-bit value is 1/2 r . Note that in the case of a padded ciphertext block, each padded bit of C i m i is also sampled once randomly and given in the transcript. Hence, the probability for X i a i +m i to take any r-bit value is also 2 −r in the ideal world. For fixed indices (i 1 , j 1 ), (i 2 , j 2 ), . . . , (iw , jw), it holds that Over all queries and blocks of τe, it follows that

bad 2 : Collision of two permutation inputs in encryption construction queries.
Here, we consider All ciphertext blocks and the internal chaining values V i ρ(i,j−a i ) , j > a i are sampled independently and uniformly at random. Moreover, padded bits of ciphertexts are sampled also independently and uniformly at random. Though, we have to consider two cases; j = 0 ∧ j ′ = 0: since X i 0 and X i ′ 0 contain nonces, and since we assume A to be nonce-respecting, the probability for a collision is zero in this case.
is derived from C i j ; so, both X i j and Y i j are chosen independently and uniformly at random, and the probability for a collision in this case is at most 2 −n .
Therefore, for fixed indices (i, j) ≠ (i ′ , j ′ ), the probability is Over all combinations of indices, it follows that bad 3 : Collision of two permutation outputs in encryption construction queries. This case is analogous to bad 2 . The permutation outputs V i j are sampled randomly. In all cases, it holds that Over all combinations of indices, it follows that bad 4 : Collision of permutation inputs between a construction and a primitive query.
where const is a public constant. The values C i j−a i and V i ρ(i,j−a i ) are sampled randomly, the values (X i j ‖ Y i j ) take any value with probability at most 2 −n .
1. Assume that the primitive query was asked before the construction query. If the construction query was in encryption direction, the collision probability for fixed queries is at most $2^{-n}$, for $q_p \cdot q_c$ combinations.
2. If the primitive query was asked after an encryption query, then the latter produced a tag. If the primitive query starts at any block other than the tag, A can see r − s bits. Hence, the probability is at most $2^{-(c+s)}$, for $q_p \cdot \sigma_e$ combinations. If the primitive query starts from the tag, the adversary sees c + s unmasked bits. Assuming that bad 1 does not occur, there are at most w equal tags over all encryption queries. So, the probability for a collision is $2^{-(n-\tau)}$, for at most $w \cdot q_p$ combinations.
Over all combinations of indices, it follows that bad 5 : Collision of permutation outputs between an encryption construction query and a primitive query. Again, U i j+a i can be derived from M i j ⊕ C i j ⊕s lsbs(V i ρ(i,j) ) and the values C i j and V i ρ(i,j) are sampled randomly. So, the values U i j+a i ‖ V i j+a i take any value with probability at most 2 −n . If the primitive query starts at any other block, A can see r − s bits. Hence, the probability is at most 2 −(c+s) for qp · σe combinations. Following a similar argument as for bounding bad 4 and excluding bad 1 , we obtain over all combinations of indices that bad 6 : Initial-state collision with a primitive query.
Here, we know that the key is chosen uniformly at random. We distinguish between collisions depending on whether the primitive query was a forward query or a backward query.
1. If the primitive query was a forward query, it must hit the correct value of K ⊕ d d N . So, the probability is at most qp /2 k to collide with encryption construction queries. Considering also decryption queries, a nonce can repeat but change d N . Since there exist at most three distinct values for d N , the probability is at most 3qp /2 k to collide. 2. If the primitive query was in backward direction, its response must hit any initial state of a construction query. If the construction query was asked before the primitive, A sees at best r − s bits of C 1 . Then, the probability is at most qc · qp /2 c+s .
3. If the primitive query was asked before the construction query, A can use the nonce part of the primitive query's result as a nonce. Though, a collision needs the key part to be correct, which holds with probability at most 3qp /2 k .
Over all possible options, we obtain bad 7 : Multi-collision in the rate of w outputs of forward primitive queries.
Since π is chosen randomly from the set of all permutations, the outputs are chosen randomly from a set of size 2 n − (i − 1) for the i-th primitive query. So, the probability for w distinct queries to collide in their rate is at most 1/2 r(w−1) as for bad 7 in the AE proof. Over all queries, the probability is upper bounded by bad 8 : Multi-collision in the rate of w outputs of backward primitive queries.
Following a similar argumentation as for bad 7 , we obtain Our bound in Lemma 2 follows from summing up all probabilities.
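To get a feeling for the multi-collision terms of bad 7 and bad 8, the following sketch evaluates a plain union bound over all w-subsets of q primitive queries. It assumes only the per-set collision probability of at most $2^{-r(w-1)}$ stated above; the function name is hypothetical, and the exact expression used in Lemma 2 may differ from this simplification.

```python
from math import comb, log2


def log2_multicollision_bound(q, w, r):
    """log2 of the union bound C(q, w) * 2^(-r(w-1)), capped at probability 1.

    q: number of primitive queries in one direction
    w: size of the multi-collision
    r: rate in bits
    """
    if q < w:
        # Fewer than w queries cannot form a w-multi-collision.
        return float("-inf")
    return min(0.0, log2(comb(q, w)) - r * (w - 1))
```

For a fixed number of queries well below $2^r$, increasing w makes the term shrink rapidly, which is why the bound is later evaluated for a comparatively large threshold w = r.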
Proof. It remains to lower bound the ratio of real and ideal probability of obtaining a good transcript τ. Let τ = (τe , τ d , τp) be an attainable transcript, where τ d = ⊥ all contains only ⊥ for all responses. Since all ciphertext-block outputs and all internal chaining values in encryption queries are sampled independently and uniformly at random, their probability is 1/2 per bit. We define σ distinct for the number of distinct calls to the permutation over all encryption and decryption queries. In the ideal world, it holds that since the outputs from encryption queries are sampled uniformly at random; so, the encryption and decryption transcripts τe and τ d are independent from τp.
In the real world, the probabilities for choosing K as key and π as permutation are equal to those of the ideal world. We can separate the probability into where we define The probability of primitive queries is given by the fraction of all permutations π that would produce τp, which is as in the ideal world. The ciphertext blocks C i j from encryption queries as well as the chaining values V i j are results from the permutation π and hence, depend on the permutation. Since τ is a good transcript, there are no undesired collisions, e.g., between primitive and construction queries. Hence, all internally computed values (U i j ‖ V i j ) -note that U i j can be derived from C i j−a i ⊕s lsbs(V i ρ(i,j−a i ) ) ⊕ M i j−a i by the adversary -are results of fresh values or predefined in decryption queries from the result of previous encryption queries. Then, the probabilities for the outputs of π in construction queries are given by 1/ . It is not difficult to see that for positive σ distinct , the ratio of the interpolation probabilities from Equation (1) It remains to upper bound ϵ. For this purpose, we upper bound the values ϵ i for transcripts that contain forgeries. Since τ is a good transcript, we assume that bad events do not hold. Hence, either ⊤ i does not hold, which yields ϵ i = 0; in the opposite case, we have to consider a few mutually exclusive cases in the following. We assume that there exists a decryption query In all cases, the tag can simply be guessed correctly if the block (X i a i +m i ‖ Y i a i +m i ) is fresh. Then, the probability for the tag to be correct is 2 −τ . So, we can concentrate on the cases where it is non-fresh in the following. Prior, we define (X i1 , X i2 , . . . , X iw+1 ) as a w-chain if there exist (Y i1 , Y i2 , . . . , Y iw+1 ) s. t. the following chain has been obtained from primitive queries: The cases are: , and no w-chain of primitive queries is hit. Clearly, the cases cover all possible options. 
We assume that no previous bad events occur, in particular, no w-multi-collisions or collisions with the primitive queries occurred.

Case (A).
We excluded bad 6 in this case. The probability that (N i ‖ K) ⊕ d d N hits any block (X i ′ j ‖ Y i ′ j ) from another construction query so that the final block is old is at most

Case (B).
Let p ≤ a i + m i denote the length of the longest common prefix of the i-th query with all other queries. In Case (B), the probability that any block (X i j ‖ Y i j ) with j ≥ p + 1 matches the permutation input of any other encryption-query block or primitive query can be upper bounded by

Case (C).
A similar argument as for Case (B) can be applied in Case (C). The probability that there exists i ′ ≠ i, s. t. for some block indices, it holds that (j, j ′ ):

Case (D).
This case needs that (X i a i +m i +1 ‖ Y i a i +m i +1 ) matches the permutation input of any other encryption-query block or primitive query. The probability can be upper bounded by

Case (E).
Assume that (X i p+1 ‖ Y i p+1 ) hits a w-chain of primitive queries. Under the assumption that no other bad events occurred, the probability is at most Over all decryption queries, we obtain Our claim in Lemma 3 follows.

Int-RUP Analysis
We use the same notations as in Section 6 but add some. Let $q_d$ and $\sigma_d$ be the number of decryption queries and of blocks over decryption queries, respectively, and $q_v$ and $\sigma_v$ the analogs for verification queries. We replace the permutation with $\pi \twoheadleftarrow \mathsf{Perm}(\mathcal{B})$, assume $K \twoheadleftarrow \mathcal{K}$, and write $\Pi[\pi]_K$ for Oribatida with $\pi$ and $K$.
Proof. The Int-RUP analysis of Oribatida follows a similar strategy as our AE analysis. However, this time, the adversary has access to three oracles for encryption, decryption, and verification. Moreover, the encryption and decryption oracles are the same in both the real and the ideal world. Both worlds differ only in the verification oracle. To alleviate the task, we replace the oracles for encryption and decryption, $\widetilde{E}[\pi]_K$ and $\widetilde{D}[\pi]_K$, with a pair of consistent pseudo-random oracles $\$_{\widetilde{E}}[\pi]$ and $\$_{\widetilde{D}}[\pi]$ (we define our intent of consistency for encryption and decryption in a moment). The advantage between both settings can be upper bounded. Note that the oracles $\$_{\widetilde{E}}$ and $\$_{\widetilde{D}}$ differ from the independent random oracles in the stronger RUPAE notion. In the RUPAE notion, they sample independently from each other without considering common prefixes between queries, which would be impossible to achieve for an on-line AE scheme.

Again, we consider the H-coefficient approach. So, we define several bad events and bad as well as good transcripts. If any of the bad events occurs, the adversary aborts and is defined to win. Next, we consider the probability of forgeries under those idealized oracles. So, we can exclude the previous bad events and study the probability of forgeries. Finally, we study the ratio of interpolation probabilities for good transcripts.

Sampling Consistently in the On-line Phase.
This on-line phase contains much from the off-line phase of the AE analysis. We define the ideal encryption oracle as in the AE proof: it samples the responses $(C^i, T^i)$ uniformly at random from all bit strings of expected lengths for encryption queries. The ideal decryption oracle, however, must sample plaintext outputs consistently. For this purpose, the ideal encryption oracle also has to sample the internal chaining values $V^i_j \twoheadleftarrow \{0,1\}^c$ and $U^i_j \twoheadleftarrow \{0,1\}^r$ uniformly at random for all construction queries already in the on-line phase. It stores the values of $C^i_j$, $V^i_j$, and $U^i_j$ internally as well, but does not release $U^i_j$ and $V^i_j$ in this phase.

On each input $(N^i, A^i, C^i, T^i)$, the ideal decryption oracle looks up the length of the longest common prefix $p \leftarrow \mathsf{LCP}_{N^i, A^i}(C^i, \mathcal{Q})$ of the query with all previous queries $\mathcal{Q}$. For all blocks in the common prefix, $1 \le j \le p$, it uses the same outputs $M^i_j$ that have been fixed from previous queries. Since the oracle has sampled $V^i_{p+1}$ for the (p + 1)-th block, it can deduce all bits not fixed from previous query outputs. The remaining blocks are sampled from $\{0,1\}^r$ uniformly and independently at random from the bit strings of expected lengths, for $p + 2 \le j \le a_i + m_i$. Note that queries whose ciphertext lengths are not multiples of r bits are answered consistently since the oracle samples $V^i_j$, and all bits fixed from previous queries are used.

For verification queries, the ideal verification oracle always outputs ⊥. For forward primitive queries $Q^i$, the ideal oracle forwards $\pi(Q^i)$; for backward primitive queries $R^i$, it returns $\pi^{-1}(R^i)$.
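The consistency requirement on the ideal decryption oracle can be illustrated by lazy sampling. The sketch below is a simplification under stated assumptions rather than the oracle of the proof: blocks are r-bit integers, a single sampled value per context $(N, A, C_1, \ldots, C_{j-1})$ stands in for the rate part and mask derived from the chaining values, and encryption queries and partial final blocks are left out. The names `lcp` and `ConsistentIdealDecryption` are hypothetical.

```python
import secrets


def lcp(blocks_a, blocks_b):
    """Length of the longest common prefix of two block sequences."""
    n = 0
    for a, b in zip(blocks_a, blocks_b):
        if a != b:
            break
        n += 1
    return n


class ConsistentIdealDecryption:
    """Toy on-line decryption oracle with prefix-consistent outputs."""

    def __init__(self, r_bits):
        self.r_bits = r_bits
        # (nonce, ad, C_1..C_{j-1}) -> r-bit value sampled for block j
        self.keystream = {}

    def decrypt(self, nonce, ad, cipher_blocks):
        out = []
        for j, c in enumerate(cipher_blocks):
            ctx = (nonce, ad, tuple(cipher_blocks[:j]))
            if ctx not in self.keystream:
                # Fresh context: sample once and remember the value.
                self.keystream[ctx] = secrets.randbits(self.r_bits)
            out.append(c ^ self.keystream[ctx])
        return out
```

With two queries that share p ciphertext blocks, the first p outputs coincide, the (p + 1)-th output is determined by the value sampled earlier, and all later blocks are fresh, which mirrors the three regimes described above.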

Off-line phase.
Here, the ideal oracle releases the internal chaining values $(U^i_j, V^i_j)$ after the adversary has made all its queries, but before it outputs its decision bit. The ideal oracle then also reveals a random key $K \twoheadleftarrow \mathcal{K}$.

Bad Events.
Whenever we consider a non-trivial collision between blocks or chaining values at block indices j, j ′ of two messages, we assume that at least one of them exceeds the longest common prefix.
bad 1 : Non-trivial collision of permutation inputs in construction queries.
bad 2 : Non-trivial collision of permutation outputs in construction queries.

We define BadT to contain exactly the attainable transcripts τ for which at least one bad event occurred. All other attainable transcripts are in GoodT. The probability of bad transcripts in the ideal world is treated in the proof of Lemma 4. The ratio of obtaining a good transcript is bounded in Lemma 5. Our bound in Theorem 1 follows from those and the fundamental lemma of the H-coefficient technique [54]. We apply w = r in the bound of Lemma 4.

Lemma 4. Let w ≥ r be a positive integer. It holds that
Proof. In the following, we upper bound the probabilities of the individual bad events. For most of them, we differentiate between encryption and decryption queries.
Since there exist (︀ σe 2 )︀ block combinations, we obtain (︀ σe 2 )︀ /2 n . 2. Dec-then-Enc: If we consider an encryption query block to collide with a block from a previous decryption query, the probability is at most 2 −(c+s) since A can see r − s bits that it can use as the nonce. We have qe σ d combinations of such blocks. For the remaining σe σ d blocks, the probability is 2 −n . 3. Among decryption queries only: w.l.o.g., we consider the first such collision. If A modifies the nonce in the later following query, the bound is the same as for encryption-only queries. So, we assume in the remainder of that the later query is a decryption query. Let j − 1 be the first modified block and assume it is in the message-processing part. If the block indices differ j ≠ j ′ , the probability is 2 −(c+s) . Otherwise, assume j = j ′ and A i = A i ′ . Then, the permutation output So, X i j = X i ′ j . With probability 2 −c , it also holds for the capacity V i j+1 = V i ′ j+1 and thus Y i j+1 = Y i ′ j+1 . Note that this approach holds only for the first differing block, for which which yields a term of (︀ q d 2 )︀ /2 c . If the collision does not hold, the masks beginning for the (j + 2)-th block will differ and the probability decreases to 2 −(c+s) , which produces a term of (︀ σ d 2 )︀ /2 c+s . 4. Enc-then-Dec: It remains to consider collisions between an encryption query, followed by a decryption query. If the block indices j ≠ j ′ differ, the probability is again 2 −(c+s) , for at most σe · σ d combinations.
Otherwise, if j = j ′ , A can apply the strategy above for a collision. Then, the probability is 2 −c ; though, the q d queries can collide at most with one encryption query each since we consider the first collision, producing a term of q d /2 c .
Over all cases, we obtain bad 2 : Collision of two permutation outputs in encryption construction queries. This case is analogous to bad 1 . Over all combinations of indices, it follows that bad 3 : Multi-collision on w tags from encryption queries.
Since the tags are sampled uniformly and independently at random in the ideal world, it holds that

bad 4 : Collision of permutation inputs between a construction and a primitive query.
Again, we consider where const is a public constant. The values C i j−a i and V i ρ(i,j−a i ) are sampled randomly, the values (X i j ‖ Y i j ) take any value with probability at most 2 −n .
1. Assume that the primitive query was asked before the construction query. If the construction query was in encryption direction, the collision probability for fixed queries is at most $2^{-n}$, for $q_p \cdot q_c$ combinations.
2. Otherwise, if the construction query was a decryption query, A can see r − s bits. Hence, the probability is at most $2^{-(c+s)}$, for $q_p \cdot q_c$ combinations.
3. The same argument can be applied in the case when the primitive query was asked after a decryption query. Then, the adversary can see r − s unmasked bits of the rate from $C^i_j$. Again, the probability is at most $2^{-(c+s)}$, and we have $q_p \cdot q_c$ combinations.
4. If the primitive query was asked after an encryption query, then the latter produced a tag. If the primitive query targets any other block, the argument is the same as in Case 3. If the primitive query starts from the tag, the adversary sees τ − s unmasked bits. Assuming that bad 3 does not occur, there are at most w equal tags over all encryption queries. So, the probability for a collision is $2^{-(c+s)}$, for $w \cdot q_p$ combinations.
Over all combinations of indices, it follows that Here, we distinguish between the cases whether the construction query was asked before or after the primitive query and whether the primitive query was in forward or backward direction.
1. Assume, the primitive query was asked after the construction query. If the primitive query was a forward query, it must hit the correct value of K ⊕ d d N . This probability is at most qp /2 k to collide when considering encryption construction queries. Considering also decryption queries, a nonce can repeat often; though, the initial state can take three different values for the same nonce, namely if the decryption query changes the length of associated data and message, affecting d N . Since there exist at most three distinct values for d N , the probability to collide is at most 3qp /2 k . 2. If the primitive query was in backward direction, its response must hit any initial state of a construction query. If the construction query was asked before the primitive, A sees at best r − s bits of C 1 . Then, the probability is at most qc · qp /2 c+s . 3. If the primitive query was asked before the construction query, A can use the nonce part of the primitive query's result as the nonce. However, a collision must hit the key part, which holds with probability at most 3qp /2 k . 4. If the primitive query was in backward direction, A sees at best r − s bits of C 1 . Then, there is at most one starting state, assuming bad 1 , which yields qp /2 c+s .
Over all possible options, we obtain bad 7 : Multi-collision in the rate of w outputs of forward primitive queries.
Since π is chosen randomly from the set of all permutations, the outputs are sampled uniformly at random from a set of size at least 2 n − (i − 1) for the i-th query. So, the probability for w distinct queries to collide in their rate is upper bounded by Over all primitive query indices, it holds that We consider the same five mutually exclusive cases as in the AE proof. In all cases, the tag can simply be guessed correctly if the block (X a i +m i ‖ Y a i +m i ) is fresh. Then, the probability for the tag to be correct is upper bounded by 2 −τ . We adopt the cases and the notions from the AE proof and assume that no previous bad events occur, in particular no w-multi-collisions described earlier or collisions with primitive queries.
-Case (C): , and no w-chain of primitive queries is hit.

is a prefix of another construction query. -Case (E): (N i , A i ) is old and there exists a w-chain of primitive queries that is hit.
Clearly, the cases cover all possible options.

Case (A).
We excluded bad 4 , i.e., collisions of permutation inputs between construction and primitive queries in this case. The probability that ( from another construction query so that the final block is old is at most

Cases (B)-(D).
Let p ≤ a i + m i denote the length of the longest common prefix of the i-th query with all other queries. The probability that any block (X i j ‖ Y i j ) with j ≥ p + 1 matches the permutation input of any other query block or primitive query can be upper bounded analogously as bad 1 and bad 4 : Over all verification queries, we obtain

Case (E).
Assume that (X i p+1 ‖ Y i p+1 ) hits a w-chain of primitive queries. Under the assumption that no other bad events occurred, the probability is at most Over all verification queries, we obtain Our bound in Lemma 4 follows from summing up all probabilities.
Proof. It remains to bound the ratio of the probabilities for obtaining a good transcript τ in the real and the ideal world, respectively. The bound is similar to that of Lemma 3. The difference to the AE proof is that the ideal decryption oracle also generates pseudorandom output blocks M i j beyond the longest common prefix. The AE transcript also contained the sampled internal values, as does the transcript τ here. Since we assume that no bad events have occurred, we revisit the following cases for forgeries: -Case (A): The final input to π, (X i a i +m i ‖ Y i a i +m i ) is fresh, i.e., has not occurred before. Then, the probability that the authentication tag τ i is valid is at most 1/2 τ . -Case (B): The final input to π, (X i a i +m i ‖ Y i a i +m i ) is old, but there exists some block index j ∈ [1..a i + m i ] s. t. (X i j ‖ Y i j ) is fresh. Since the input is old, the probability that the result of any of the next blocks is old is at most The probability that all of those blocks are old is at most It follows that Over all indices i ∈ [1..q d ], it follows that Our claim in Lemma 5 follows.

Comparison with Lightweight Int-RUP-secure Schemes
Among the submissions to the NIST lightweight competition [53], ESTATE [27], LAEM [61], LOTUS-AEAD and LOCUS-AEAD [24] claimed security in the Int-RUP model. Among these modes, ESTATE, LOTUS-AEAD, and LOCUS-AEAD were selected for the second round. This section compares our proposal to those; Table 4 gives a summary.

Brief Description
ESTATE follows SIV [57]: the associated data and message are authenticated using a variant of CBC-MAC with a tweakable block cipher before the tag is used as an initial vector of CBC-like encryption. The intermediate values are used as keystream and added to the message blocks. LOCUS-AEAD and LOTUS-AEAD employ a variant of PMAC [23] to process the associated data with the tweakable block cipher. For encryption, LOTUS-AEAD uses a variant of OTR [50], a two-round, two-branch Feistel structure to process the message in double blocks. LOCUS-AEAD employs an encryption similar to OCB [56] and EME/EME * [38]. Both LOCUS-AEAD and LOTUS-AEAD employ a single pass over the message for encryption, but two calls to the primitive per message block. The intermediate values are summed to the associated-data hash and the final message block; the encrypted sum yields the tag.

Efficiency
Oribatida processes 96- or 128-bit message blocks per primitive call, whereas the size of the message processed in one primitive call is 64 bits for ESTATE and 32 bits for LOTUS-AEAD and LOCUS-AEAD. Thus, Oribatida offers higher throughput; moreover, the state size of Oribatida (288 and 320 bits, respectively) is smaller than those of LOTUS-AEAD (388 bits) and LOCUS-AEAD (324 bits). ESTATE has a state size of 260 bits; all three must process the message with two calls to the primitive. LOCUS-AEAD requires the inverse operation of the underlying block cipher to be available for the decryption. In sum, Oribatida possesses a smaller state size than LOCUS-AEAD and LOTUS-AEAD, and higher AE security, as well as a higher rate, compared to its Int-RUP-secure competitors.
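The throughput comparison can be made concrete with a small calculation. The figures below are taken from the paragraph above; the dictionary keys and the helper `message_calls` are hypothetical names, and the count deliberately ignores primitive calls for initialization, associated data, and tag generation, so it is only a rough sketch.

```python
from math import ceil

# Message bits processed per primitive call, as stated in the text.
BITS_PER_CALL = {
    "Oribatida (96-bit blocks)": 96,
    "Oribatida (128-bit blocks)": 128,
    "ESTATE": 64,
    "LOTUS-AEAD": 32,
    "LOCUS-AEAD": 32,
}

# State sizes in bits, as stated in the text.
STATE_BITS = {
    "Oribatida (96-bit blocks)": 288,
    "Oribatida (128-bit blocks)": 320,
    "ESTATE": 260,
    "LOTUS-AEAD": 388,
    "LOCUS-AEAD": 324,
}


def message_calls(scheme, msg_bits):
    """Primitive calls needed for the message part only (rounded up)."""
    return ceil(msg_bits / BITS_PER_CALL[scheme])


if __name__ == "__main__":
    for name in BITS_PER_CALL:
        print(f"{name:28s} state={STATE_BITS[name]:3d} bits, "
              f"calls for 1 KiB message: {message_calls(name, 8192)}")
```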

Security
All three competitors are based on tweakable block ciphers, with Int-RUP claims limited by the birthday bound of the internal primitive. ESTATE inherits Int-RUP security until the birthday bound from SIV, which has been considered in [7, Sect. 6.2]. While LOCUS-AEAD and LOTUS-AEAD share similarities to OCB and OTR, they use intermediate checksums as in EME designs in the tag-generation process. Informally, modifying any message block will result in new pseudorandom internal values and therefore a pseudorandom input to the tag computation.

Discussion of the Updated Variant Oribatida v1.3
This section discusses the update from Oribatida (v1.2) from [21] to Oribatida v1.3 in this work, which addresses the observation by Rohit and Sarkar [59] in a straightforward manner. Here, we briefly discuss only the differences:
1. Oribatida v1.2 released the tag without masking. As a consequence, the adversary saw the full rate and had to guess only the (n − τ)-bit hidden part to be able to invert the encryption process. To thwart this attack, Oribatida v1.3 masks the tag such that the adversary sees τ − s bits if s ≤ τ, which restores the security from $q/2^{n-\tau}$ to $q/2^{c+s}$. Figure 4 illustrates both tag-generation processes for comparison. The masking of the authentication tag is performed exactly as for ciphertext blocks, which streamlines this process.
2. Oribatida v1.2 employed two permutations P and P′, where the latter was intended to be a more efficient variant of the former. In practice, P′ was instantiated with a round-reduced version of P, which was only used for processing intermediate blocks of associated data. This sufficed since security required only an upper bound on the probability of differentials, not pseudo-randomness. Oribatida v1.3 unifies the process and uses P at every location.
3. Oribatida v1.2 used a different starting value $V_0$ for masking the ciphertexts when the associated data was empty, and $V_1$ otherwise. The reason was simply efficiency since empty associated data did not yield a value $V_1$. In contrast, Oribatida v1.3 always pads the associated data such that there always exists an intermediate value $V_1$ that is not used as capacity in the message-processing step. This decision implies a slightly lower throughput for empty associated data but unifies the design.
4. As a result of Aspect (3), Oribatida v1.3 uses slightly different and additional domains to also properly address the case of empty associated data.

Among all changes, only Aspect (1) is crucial. All further aspects helped unify the design. The security effect of the additional tag masking is illustrated in Figure 5 for the maximum number of $q_c = 2^{50}$ construction queries as in the NIST guidelines. One can observe that it salvages the AE security of the 192-bit version of Oribatida v1.3. Note that the figure cannot illustrate that many primitive (off-line) queries to the permutation are in practice much easier to obtain than construction queries.
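The effect of Aspect (1) can be checked with simple arithmetic on the two terms $q/2^{n-\tau}$ and $q/2^{c+s}$. The helper names below are hypothetical, and the parameter values in the usage example are purely illustrative assumptions, not the parameters of a concrete Oribatida instance.

```python
def log2_term_unmasked_tag(log2_q, n, tau):
    """log2 of q / 2^(n - tau), capped at probability 1 (unmasked tag)."""
    return min(0, log2_q - (n - tau))


def log2_term_masked_tag(log2_q, c, s):
    """log2 of q / 2^(c + s), capped at probability 1 (masked tag)."""
    return min(0, log2_q - (c + s))


if __name__ == "__main__":
    # Illustrative assumption: n = 192, c = 96, s = 96, tau = 96, q = 2^80.
    print(log2_term_unmasked_tag(80, n=192, tau=96))
    print(log2_term_masked_tag(80, c=96, s=96))
```

Whenever c + s exceeds n − τ, the masked variant leaves the adversary with a strictly smaller term for the same number of queries.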

Instantiation of Oribatida
This section specifies the permutation SimP. From a high-level view, SimP is a variant of the domain extender Ψr by Coron et al. [31]. We define SimP to use a round-reduced variant of the Simon [10] block cipher and its key schedule through four such steps. We briefly recall Ψr before we describe the details of Simon, provide an overview of existing cryptanalysis, and close with a discussion of the implications on SimP.

The Ψ r Domain Extender
The Ψr family is a two-branch Feistel-like network that consists of r calls to (pairwise independent) block ciphers. An illustration of Ψ 4 is given at the top of Figure 6. Let BlockCipher(K, B) denote the set of all block ciphers with key space K and block space B. For Ψr, π 1 , π 2 , . . . , πr ∈ BlockCipher(F n 2 × F n 2 , F n 2 ) are independent block ciphers which use one branch R i as state input, and the other one, L i , as secret key. Coron et al. provide statements on the indifferentiability of their constructions.
Figure 6: Top: The construction $\Psi_4$ [31]. The blocks $\pi_i$ denote block ciphers over $\mathbb{F}_2^n$ with key space $\mathbb{F}_2^n$. Bottom: High-level view of the construction $\Phi_4$ as a variant of $\Psi_4$. The blocks $\varphi_i$ represent the key schedules that produce the subkeys and which are externalized from the block ciphers $\pi_i$ in $\Phi_4$. $\varphi_i$ feeds the subkeys to $\pi_i$ and outputs the final subkey $K_{r_s}$ to become the next value $R_{i r_s}$.
Intuitively, it follows that a four-step construction with a fourth independent permutation $\pi_4 \in \mathsf{BlockCipher}(\mathcal{K}, \mathbb{F}_2^n)$ inherits at least the security of the three-step construction.

Φ r : A Variant of Ψ r That Includes The Key Schedule
The Ψr construction has to store the state that is transformed through the block cipher π i 's state transformation, plus the key of the current step. Internally, the block ciphers π i also have to expand the secret key to subkeys that add to the total memory requirement. We propose a variant that avoids the need to store the current secret key input. For this purpose, we define the key-schedule permutation φ i : F n 2 → F n 2 that takes an initial key K as input and outputs the subkeys K 0 , . . . , K rs for fixed number of rounds rs of π i . An illustration is given at the bottom of Figure 6. Hereafter, we call the construction Φr when it consists of r steps in total. Note that Φr omits the final swap of the halves for simplicity and since it does not affect the security.

Simon
The Simon family of block ciphers [10] belongs to the lightest block ciphers in terms of hardware area and energy efficiency. Its round function consists of only an XOR, three bit-wise rotations, and a bit-wise AND, which renders it particularly lightweight and flexible. Moreover, Simon has been analyzed intensively since its proposal; among others, e.g., [3,29,44,55,64] studied the security of Simon-96-96 and Simon-128-128. Considerably more works targeted the smaller-state variants of Simon, which has recently been standardized as part of ISO/IEC 29167-21:2018 [40]. For concreteness, Simon-96-96 uses a word size w = 48 bits and employs 52 rounds, whereas Simon-128-128 uses w = 64 bits and 68 rounds.

The SimP-n-θ Family of Permutations
SimP is an instantiation of $\Phi_4$ that tries to adhere to the standard as closely as possible. SimP-192 employs the round-reduced Simon-96-96 as π and its key schedule as φ. To form a 256-bit permutation, SimP-256 uses Simon-128-128 with its key schedule. Figure 7 illustrates one iteration of the round function of Simon-2w-2w side by side with its key-update function, as used in SimP-n.

Internally, the state of SimP-n-θ consists of four w-bit words $(X^i_0, X^i_1, X^i_2, X^i_3)$, where the superscript index i indicates the state after Round i. We denote by $r_s$ the number of rounds per step, and index the steps from 1 to θ and the rounds from 1 to $\theta \cdot r_s$. The plaintext is denoted as $(X^0_0, X^0_1, X^0_2, X^0_3)$; the ciphertext is given as $(X^{\theta r_s}_0, X^{\theta r_s}_1, X^{\theta r_s}_2, X^{\theta r_s}_3)$. After Round $r_s$, the state halves $(X^{r_s}_0, X^{r_s}_1)$ and $(X^{r_s}_2, X^{r_s}_3)$ are swapped; similarly, they are swapped also after Round $2r_s, \ldots, \theta r_s$. Thus, SimP-192-θ uses Simon-96-96 and consists of four 48-bit words. SimP-256-θ employs the round function and the key-update function of Simon-128-128 as a 256-bit permutation; its state consists of four 64-bit words.

Figure 7: One iteration of the round function of SimP, which is equivalent to the key-update function (left) and the state-update function (right) of Simon-2w/2w, where w is the word size.

Round Function
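Let w be a positive integer for the word size: for SimP-192, w = 48 bits; for SimP-256, w = 64 bits. Let f : F_2^w → F_2^w and g : F_2^w → F_2^w be defined, as in the state-update and key-update functions of Simon [10], as
f(x) = ((x ⋘ 1) ∧ (x ⋘ 8)) ⊕ (x ⋘ 2),
g(y) = (y ⋙ 3) ⊕ (y ⋙ 4),
where ⋘ and ⋙ denote the rotation of a w-bit word to the left and to the right, respectively.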
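The two word functions can be sketched in code as follows. This is a minimal illustration that assumes f and g coincide with the word functions of the Simon specification [10] (left rotations by 1, 8, and 2 for the state update; right rotations by 3 and 4 for the two-word key update). The helper names rotl, rotr, simon_f, and simon_g are hypothetical and do not appear in the paper.

```python
def rotl(x: int, k: int, w: int) -> int:
    """Rotate the w-bit word x to the left by k positions."""
    k %= w
    return ((x << k) | (x >> (w - k))) & ((1 << w) - 1)


def rotr(x: int, k: int, w: int) -> int:
    """Rotate the w-bit word x to the right by k positions."""
    return rotl(x, (w - k) % w, w)


def simon_f(x: int, w: int) -> int:
    """Non-linear word function of the Simon state update (assumed to equal f)."""
    return (rotl(x, 1, w) & rotl(x, 8, w)) ^ rotl(x, 2, w)


def simon_g(y: int, w: int) -> int:
    """Linear word function of the two-word Simon key update (assumed to equal g)."""
    return rotr(y, 3, w) ^ rotr(y, 4, w)
```

Only f contains a bit-wise AND; g is linear over F_2, which is the property that the later analysis of the key-update part relies on.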

Round Constants
The sequence has a period of 62, so z_i = z_{i mod 62} for non-negative integers i. Note that the order of the bits z_i is reversed.

Number of Steps θ
We consider only the choice of θ = 4 everywhere in our proposed construction.

Number of Rounds
SimP-192-4 consists of rs = 26 rounds for each step, and therefore performs r = 4 · rs = 104 rounds in total. Analogously, SimP-256-4 uses rs = 34 rounds for each step, i.e., 4 · rs = 136 rounds in total.
For simplicity, we also denote SimP-n-4 as SimP-n. The algorithm for SimP-n-θ is given in Algorithm 3.
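The parameter choices can be summarized in a few lines of code. The sketch below assumes that the number of rounds per step is always half the number of rounds of the underlying Simon-2w-2w (52 and 68 rounds, as stated earlier); the names SimPParams and simp_params are hypothetical.

```python
from dataclasses import dataclass

# Rounds of the full Simon-2w-2w, indexed by word size w (values from the text).
SIMON_ROUNDS = {48: 52, 64: 68}


@dataclass(frozen=True)
class SimPParams:
    n: int      # permutation width in bits
    w: int      # word size in bits
    theta: int  # number of steps
    rs: int     # rounds per step

    @property
    def total_rounds(self) -> int:
        return self.theta * self.rs


def simp_params(n: int, theta: int = 4) -> SimPParams:
    """Derive SimP-n-theta parameters, assuming rs is half the Simon rounds."""
    w, rem = divmod(n, 4)
    if rem or w not in SIMON_ROUNDS:
        raise ValueError(f"unsupported permutation width: {n}")
    return SimPParams(n=n, w=w, theta=theta, rs=SIMON_ROUNDS[w] // 2)
```

For example, simp_params(192) yields rs = 26 and 104 rounds in total, and simp_params(256) yields rs = 34 and 136 rounds.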

Remark 2.
Instantiating a scheme proven in an idealized model such as indifferentiability with a symmetric-key primitive is almost always a heuristic: there simply exist few provably secure instantiations. Using the full Simon-2w-2w for each step would be an option for a more secure, but considerably less performant scheme. Concerning SimP, our approach follows the prove-then-prune strategy from AEZ [39]. However, since each step uses at least half of the number of rounds of the full cipher and we always use four steps, our approach is far less aggressive than that of AEZ and seems to provide a sufficient security margin.

Security of SimP
The number of steps and rounds of SimP was chosen to resist known cryptanalysis techniques. This section provides a rationale for our choices from the existing works.

Requirements
Oribatida with a random permutation aims at AE security of O(rσ_d / 2^{c+s}) and Int-RUP security of O(q_d^2 / 2^c) in the ideal-permutation model. The advantage of those bounds should be much higher than the complexity to recover or predict the key. An instantiation of P must be free of distinguishing properties that allow us to distinguish it from a random permutation with non-negligible advantage and ≪ 2^n queries. This strengthens the adversary compared to the use of P in Oribatida. There, it can inject nonce, associated data, or message blocks only into the rate and can observe ciphertext and tag outputs also only from that part, but masked. Concretely, we require from P the absence of (truncated, higher-order) differential characteristics with probability ≥ 2^{-n}, linear approximations with squared correlation ≥ 2^{-n}, or component functions of degree < n in SimP-4. Moreover, we require the absence of impossible-differential, zero-correlation, or integral distinguishers in SimP-4. However, we disregard rebound or other forms of inside-out attacks that are inapplicable in Oribatida, or splice-and-cut attacks when using SimP as a compression function.

Existing Cryptanalysis on Simon
Various works analyzed the Simon family of block ciphers since its proposal.

Differential Cryptanalysis
Early cryptanalysis after the proposal of Simon mainly followed heuristics for differential cryptanalysis: Abed et al. [3] followed a heuristic branch-and-bound approach that yielded differentials for up to 30 rounds of Simon-96. Biryukov et al. [22] studied more efficient heuristics, but considered only the small variants with state sizes up to 64 bits. Dinur et al. [33] showed that distinguishers on Simon with k key words can be extended by at least k rounds. Interestingly, boomerangs seemed to be less of a threat to Simon-like ciphers than pure differentials.
Kölbl et al. [42] redirected the research focus to the search for optimal characteristics. More recently, Liu et al. [44] employed a variant of Matsui's algorithm [46] to find optimal differential characteristics. They found that characteristics with probability higher than 2^{-96} covered at most 27 rounds. Moreover, they found at best 31-round differentials with an accumulated probability higher than 2^{-96}, i.e., of probability 2^{-95.34}. For Simon-128, they showed that optimal differential characteristics covered at most 37 rounds and found 41-round differentials with a cumulative probability of 2^{-123.74}.

Linear Cryptanalysis
Linear cryptanalysis is similarly effective for Simon-like ciphers as its differential counterpart. Alizadeh et al. [1,4] reported multi-trail linear distinguishers on all variants of Simon. For Simon-96-96, they proposed a distinguisher on up to 31 rounds that could be extended by two rounds in a key-recovery attack. Similarly, they reported a 37-round distinguisher for Simon-128-128 that could be extended by two rounds. Chen and Wang [29] proposed improved key-recovery attacks with the help of dynamic key guessing. To the best of our knowledge, their attacks are the most effective ones for our considered variants in terms of the number of covered rounds, with up to 37 rounds of Simon-96-96 and up to 49 rounds of Simon-128-128 in theory.
As for differentials, Liu et al. [45] also studied optimal linear approximations. They found that the optimal linear approximations can reach at most 28 rounds for Simon-96, and at most 37 rounds for Simon-128. Moreover, they determined linear hulls with a potential of 2^{-93.8} for 31 rounds of Simon-96, and 2^{-123.15} for 41 rounds of Simon-128.

Integral, Impossible-differential, and Zero-correlation Distinguishers
Integral attacks cover at most 22 rounds for Simon-96-96 and 26 rounds of Simon-128-128. Initially, Zhang et al. [65] found integral distinguishers on up to 21 and 25 rounds for Simon-96 and Simon-128. Their results were extended by one round each by Xiang et al. [64], and later by Todo and Morii [62]. The latter could show the absence of integrals for 25-round Simon-96, which was confirmed by Kondo et al. [43].
The maximal number of rounds that impossible-differential and zero-correlation distinguishers can cover is given by at most twice the length of the maximal diffusion. From the results by Kölbl et al. [42], full diffusion is achieved by 11 rounds for Simon-96 and 13 rounds for Simon-128-128. So, impossible-differential and zero-correlation distinguishers can cover at most 22 and 26 rounds in the single-key setting.

Related-key Distinguishers
Kondo et al. [43] searched for iterative key differences in Simon. This allowed them to extend previous results by four to 15 rounds. For Simon-96-96, the authors found iterative key differentials for up to 20 rounds. It remains unclear if this yields an impossible differential; in the best case, a key-iterated 20-round distinguisher could be extended by 2 + 2 + 2 wrapping rounds: two more blank rounds where one key word is not used, plus two rounds where the key difference can be canceled by the state differences, plus two outermost rounds since the result of the non-linear function is independent of the key and therefore predictable in Simon. So, an impossible-differential distinguisher could cover up to 26 rounds. However, Kondo et al. did not turn such an upper bound into an attack on the versions considered here; therefore, it is not contained in the overview in Table 6.

Algebraic Cryptanalysis
Algebraic attacks are unlikely to be a threat to Simon-like constructions for sufficiently many rounds. Raddum [55] pointed out that the large number of rounds is necessary. He demonstrated that the equation systems for up to 14 rounds of Simon-96-96 and up to 16 rounds of Simon-128 can be solved efficiently in a few minutes on an off-the-shelf laptop. Extensions to considerably more rounds are still unknown.

Meet-in-the-Middle Attacks
Meet-in-the-middle attacks are successful primarily on primitives that do not use parts of the key in sequences of several rounds. The Simon-2w-2w versions use every key bit in each sequence of two subsequent rounds, which limits the chances of meet-in-the-middle attacks drastically. Considering 3-subset meet-in-the-middle attacks, together with an initial structure and partial matching, the length of an attack is limited to roughly that of twice the full diffusion plus four rounds plus the maximal length of an initial structure plus two rounds for a splice-and-cut part, which yields 30 rounds as a rough upper bound. It is unlikely that such attacks cover 30 or more rounds on Simon-2w-2w.

Correlated Sequences
An interesting recent direction may be correlated sequences, introduced by Rohit and Gong [58]. Their technique requires only very few texts and claims to break 27 rounds of Simon-32 and Simeck-32; thus, it might outperform all previous attacks by at least three rounds. However, that approach needs further investigation and has so far been applied only to Simon-32-64.

Implications to SimP
Since the key schedule of Simon is fully linear, the two state words that are transformed by the key schedule allow the prediction of differences, linear and algebraic properties through a full step. In any case, SimP transforms each input word through at least 2rs rounds of Simon.

Related-key Differential Cryptanalysis
SimP requires an analysis of related-key differential and linear characteristics. The large state size renders such studies difficult for existing methods, such as the exhaustive search in [44] or SAT solvers [42], since the known tools do not scale appropriately. There exist peer-reviewed related-key results on Simon, e.g., by Wang et al. [63]. For the sake of feasibility, they restricted their search to related-key trails for the small variants, i.e., Simon-32, Simon-48, and Simon-64.
We conducted experiments using the SAT-based approach from [42] as well as the branch-and-bound approach from [44] to search for optimal differential characteristics on SimP. However, the related-key analysis of Simon-like constructions is computationally difficult because of the large state size. We obtained improved trails only for up to seven rounds of Simon-96; starting from eight rounds, the best characteristics found possessed a zero key difference for up to 10 rounds, which suggests that differences in the few key words do not improve the best single-key characteristics. It seems that the probabilities of the existing optimal differential characteristics and linear trails for Simon-96-96 and Simon-128-128 also hold for SimP-192-1 and SimP-256-1 beyond that point. Table 7 compares the probabilities of optimal single- and related-key differential characteristics.
In the second step, the difference β_0 is transformed to β_{rs} linearly, i.e., Pr[β_0 → β_{rs}] = 1. We can assume that β_0 ≠ 0 and that (β_0, α_{rs}) will be transformed to an output difference (β_{rs}, 0) after the second step with probability q ≈ 2^{-2w} by approximating π_2 by a random permutation and using the Markov assumption and the random-key hypothesis. In this case, the resulting differential holds with probability p · q ≈ p · 2^{-2w} > 2^{-4w}.
Since π_1 is a round-reduced variant of Simon with 26 or 34 rounds, it is possible to have such trails. This setting is illustrated in Figure 9. However, an adversary would have to find a multitude of related-key characteristics.
Figure 9: Setting of a differential attack with the step-reduced instance of SimP.
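The condition behind the estimate p · 2^{-2w} > 2^{-4w} can be made explicit. The following worked step is a sketch under the assumption that p denotes the probability of the characteristic through π_1 in the first step.

```latex
\begin{align*}
  p \cdot 2^{-2w} > 2^{-4w}
    &\iff p > 2^{-2w}, \\
  w = 48 \ (\text{SimP-192}):\quad & p > 2^{-96}, \\
  w = 64 \ (\text{SimP-256}):\quad & p > 2^{-128}.
\end{align*}
```

As an indication only, the single-key results of Liu et al. [44] quoted above report characteristics with probability above 2^{-96} for up to 27 rounds of Simon-96, which exceeds rs = 26, and optimal characteristics for up to 37 rounds of Simon-128, which exceeds rs = 34. This is in line with the observation that such trails can exist through a single step.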

Integral and Impossible-differential Distinguishers
Integral and impossible-differential distinguishers are possible for up to two steps of SimP in general. We study an integral distinguisher in the following: Consider a structure of texts (L_i^0, R_i^0) that use pairwise distinct values R_i^0 and leave L_i^0 constant. After the first step, the value R_i^{rs} is also constant for all texts and all values L_i^{rs} are pairwise distinct. This property is preserved through the linear key schedule to L_i^{2rs}. However, the values of R^{2rs} are, in general, unknown. Note that there is no word swap after the second step. The third step destroys that knowledge.
Impossible-differential distinguishers can use a similar strategy, by setting ΔL^0 = 0 and testing if ΔL^{2rs} = 0, which is impossible. Again, note that there is no word swap after the second step.

Cube-like Distinguishers
Cube (and integral) distinguishers exploit that the degree of some output-bit component functions is lower than the state size. As discussed above, the degree of each bit is at least w after more than 22 rounds for Simon-96 and more than 26 rounds for Simon-128. SimP transforms each input bit through at least two steps of Simon, each of which consists of 26 rounds for Simon-96 and 34 rounds for Simon-128. While two out of four steps do not increase the degree in the part that is used as the key-update function, each bit is transformed through full-round Simon-96 or Simon-128. Thus, cube-like and integral distinguishers are not expected to threaten the security of SimP.

Number of Steps and Rounds of SimP
SimP benefits from the intensive existing cryptanalysis of Simon. The usage of the key-update function of Simon does not seem to promote considerably more effective differential or linear distinguishers compared to the single-key results on Simon. The usage of the 2w-bit key appears to be exploitable neither by differential and linear characteristics nor by techniques that try to benefit from a larger state, such as meet-in-the-middle distinguishers. The reason seems to be mainly the diffusion in the key schedule together with the relatively large number of rounds.
The number of steps and the number of rounds in our employed instantiations of SimP have been chosen very conservatively, using the number of rounds per step rs as half the number of rounds in Simon. This choice guarantees that each bit passes at least once through the full-round cipher, and therefore is expected to possess at least the algebraic degree of the full-round cipher. Moreover, the diffusion properties of Simon render impossible-differential, zero-correlation, or integral distinguishers implausible.
The design of SimP is very close to the original design of Simon. So, any considerable improvement in the cryptanalysis of SimP would most likely also pose a greater threat to Simon-2w-2w. While such results are not impossible, the higher number of rounds in SimP provides an additional security margin.
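To make the margin concrete, the round numbers quoted in this section can be set against the number of rounds that each bit passes through. The sketch below is plain arithmetic on those numbers; counting rounds is only a crude heuristic, the key-recovery figures refer to the keyed block cipher rather than to the permutation, and the names KNOWN_ROUNDS and round_margin are hypothetical.

```python
# Longest reported distinguishers or attacks in rounds, as quoted in this section.
KNOWN_ROUNDS = {
    "Simon-96": {
        "optimal differential characteristic": 27,
        "differential": 31,
        "optimal linear approximation": 28,
        "linear hull": 31,
        "integral": 22,
        "impossible-differential / zero-correlation": 22,
        "linear key recovery (in theory)": 37,
    },
    "Simon-128": {
        "optimal differential characteristic": 37,
        "differential": 41,
        "optimal linear approximation": 37,
        "linear hull": 41,
        "integral": 26,
        "impossible-differential / zero-correlation": 26,
        "linear key recovery (in theory)": 49,
    },
}

# Rounds per step rs of SimP-192 and SimP-256, respectively.
ROUNDS_PER_STEP = {"Simon-96": 26, "Simon-128": 34}


def round_margin(cipher: str, steps: int = 2) -> int:
    """Rounds covered by `steps` steps minus the longest quoted result."""
    covered = steps * ROUNDS_PER_STEP[cipher]
    return covered - max(KNOWN_ROUNDS[cipher].values())
```

With the default of two steps, which is the minimum that every input word passes through, the margin is 15 rounds for Simon-96 and 19 rounds for Simon-128. A single step yields a negative margin, which matches the discussion of distinguishers on step-reduced variants.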

FPGA Implementations
This section reports on FPGA implementations of SimP and Oribatida.

SimP
SimP is lightweight since its transformations are exactly the round function and the key-update function of Simon-96-96 or Simon-128-128, respectively. Both transformations are based on simple operations such as rotations, XORs, and ANDs that consume only routing resources and bit-wise logical operations. The area in GEs is approximately that of Simon-96 plus some overhead, which is caused by the need for additional input to both transformations due to the swapping after rs rounds.
Unprotected implementations of Simon are vulnerable to differential power analysis attacks that use the leakage generated by the transitions in the state register; the Hamming-distance model captures such leakage. Masking, in particular Boolean masking (XORing a random value to the output of the round function), is one countermeasure that can be applied to Simon easily. The simple structure of the Simon components allows exploring other countermeasures such as unrolling rounds to achieve higher-order side-channel resistance.
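The reason why Boolean masking fits Simon well is that rotations and XORs are linear and can be applied to each share separately. The following software sketch illustrates this for two shares; it is a generic illustration and not the masking scheme of any particular implementation, and the names share, unshare, shared_rotl, and shared_xor are hypothetical. The bit-wise AND is not covered, since it requires a dedicated masked gadget with fresh randomness.

```python
import secrets

W = 48  # word size of Simon-96-96; use 64 for Simon-128-128
WORD_MASK = (1 << W) - 1


def rotl(x: int, k: int) -> int:
    """Rotate the W-bit word x to the left by k positions."""
    k %= W
    return ((x << k) | (x >> (W - k))) & WORD_MASK


def share(x: int) -> tuple[int, int]:
    """Split x into two Boolean shares (x ^ m, m) with a fresh random mask m."""
    m = secrets.randbits(W)
    return x ^ m, m


def unshare(s: tuple[int, int]) -> int:
    """Recombine two Boolean shares into the unmasked word."""
    return s[0] ^ s[1]


def shared_rotl(s: tuple[int, int], k: int) -> tuple[int, int]:
    """Rotation is linear, so it is applied to each share independently."""
    return rotl(s[0], k), rotl(s[1], k)


def shared_xor(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """XOR is linear, so the shares are combined component-wise."""
    return a[0] ^ b[0], a[1] ^ b[1]
```

Recombining the result of shared_rotl or shared_xor gives the same word as the unmasked operation, while no intermediate value equals the unmasked word except with negligible probability over the choice of the mask.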
SimP can be implemented in different levels of serialization, from fully serial implementations that update only a single bit per cycle up to round-based implementations that update the full state in one clock cycle. Depending on the choice, there is a broad implementation spectrum with a trade-off between throughput and area.

Oribatida
Hardware implementations of our proposed instance of Oribatida are relatively straightforward. It can be implemented efficiently with little extra cost compared to the duplex sponge. Additional costs result from the use of a module to generate the constants for the domain separation, which can be held in ROM. In modern FPGAs, this module takes only four look-up tables (LUTs). For domain separation, only a four-bit XOR is necessary at the input to the capacity of the permutation. In addition, a 64-bit register to store a mask and a 64-bit XOR to add the mask to the ciphertext are required.
The use of SimP as its main building block allows us to directly transfer the same strategy of using different data-path sizes to Oribatida. Thus, the implementer can choose among various trade-offs between throughput, latency, area, and power consumption.
In terms of side-channel resistance, the same aspects that hold for SimP also hold for Oribatida. Thus, Oribatida does not introduce additional side-channel weaknesses. Table 8 lists our implementation results obtained from Xilinx Vivado 2018 when optimizing for area. All results represent measurements after the place-and-route process.
In Table 8, we list two columns each for the number of clock cycles and the throughput: the former represents the results for the processing of associated data (with the step-reduced SimP), whereas the latter denotes the results for processing the message (with the non-reduced SimP). Our results still leave room for further improvements in the near future.

Conclusion
This work presented Oribatida, a nonce-based permutation-based AE scheme that masks the ciphertexts with values derived from preceding permutation outputs. As a result, the adversary cannot deduce the masked part of the internal state. Therefore, Oribatida can achieve O(q_d^2 / 2^c) security against forgeries under the release of unverified plaintexts. Oribatida improves the best known Int-RUP security bound while the permutation can be kept as small as 64 + 128 bits for 121-bit AE and 48-bit Int-RUP security.
We showed that even recent previous proposals with high AE security guarantees, such as Beetle or SPoC, succumb to attacks with complexity O(q_d q_p / 2^c). In contrast, the security bound of Oribatida does not depend primarily on the number of primitive queries. We showed that our bound is tight with a matching Int-RUP attack, generalized our masking approach by applying it also to Beetle or the NIST submission SPoC, and demonstrated its application with similar attacks on their masked variants.