In this paper, we show how to construct – from any linear code – a Proof of Retrievability (PoR) which features very low computation complexity on both the client and the server sides, as well as small client storage (typically 512 bits). We adapt the security model initiated by Juels and Kaliski [PoRs: Proofs of retrievability for large files, Proceedings of the 2007 ACM Conference on Computer and Communications Security—CCS 2007, ACM, New York 2007, 584–597] to fit into the framework of Paterson, Stinson and Upadhyay [A coding theory foundation for the analysis of general unconditionally secure proof-of-retrievability schemes for cloud storage, J. Math. Cryptol. 7 2013, 3, 183–216], from which our construction evolves. We thus provide a rigorous treatment of the security of our generic design; more precisely, we sharply bound the extraction failure of our protocol according to this security model. Next we instantiate our formal construction with codes built from tensor products as well as with Reed–Muller codes and lifted codes, yielding PoRs with moderate communication complexity and (server) storage overhead, in addition to the aforementioned features.
Cloud computing and storage have evolved quite spectacularly over the past decade. In particular, data outsourcing allows users and companies to lighten their storage burden and maintenance costs. However, it raises several issues: for example, how can someone check efficiently that he can retrieve, without any loss, a massive file that he uploaded to a distant server and erased from his personal system?
Proofs of retrievability (PoRs) address this issue. They are cryptographic protocols involving two parties: a client (or verifier) and a server (or prover). PoRs usually consist of the following phases. First, a key generation process creates secret material related to the file, meant to be kept by the client only. Then the file is initialised, that is, it is encoded and/or encrypted according to the secret data held by the client. This processed file is uploaded to the server. In order to check retrievability, the client can run a verification procedure, which is the core of the PoR. Finally, if the client is convinced that the server still holds his file, the client can proceed at any time to the extraction of the file.
Several parameters must be taken into account. First, the verification process has to feature a low communication complexity, as the main goal is to avoid downloading a large part of the file only to check its extractability. Second, the storage overhead induced by the protocol must be low, as a large server overhead would imply high fees for the customer. Third, the computation cost of the verification procedure must be low, both for the client (who is likely to own a lightweight device) and the server (whose computation work could also be expensive for the client).
Notice that proofs of data possession (PDPs) are protocols close to what is needed in PoRs. However, in PDPs, one does not require the client to be able to extract the file from the server. Instances of PDPs are given by Ateniese et al. Besides, the protocols of Lillibridge et al. and of Naor and Rothblum are very often seen as precursors of PoRs. For instance, the work of Naor and Rothblum considers a setting in which the client directly accesses the file stored by the prover/server (while the actual definition uses “an arbitrary program as opposed to a simple memory layout and this program may answer these questions in an arbitrary manner”).
1.2 Previous work
Juels and Kaliski gave the first formal definition of PoRs. They also proposed a first construction based on so-called sentinels (namely, random parts of the file to be checked during the verification step) that the client secretly keeps on his device. Additionally, an erasure code ensures the integrity of the file to be extracted. This seminal work also raised several interesting points. On the one hand, it revealed that (i) the client must store secret data to be used in the verification step and (ii) coding is needed in order to retrieve the file without erasures or errors. On the other hand, in Juels and Kaliski’s construction, the verification step can only be performed a finite number of times, since sentinels cannot be reused endlessly.
As a consequence, Shacham and Waters proposed to consider unbounded-use PoRs, building two kinds of schemes. The first one is based on linear combinations of authenticators produced via pseudo-random functions; its security was proved using cryptographic tools such as unforgeable MACs, semantically secure symmetric encryption and secure PRFs. The second one is a publicly verifiable scheme based on the Diffie–Hellman problem in bilinear groups.
Bowers, Juels and Oprea adopted a coding-theoretic approach (inner code, outer code) to compare variants of the Shacham–Waters and Juels–Kaliski schemes. They focused on the efficiency of the schemes, and showed that, despite bounded use, new variants of the Juels–Kaliski construction are highly competitive compared to other existing schemes.
Paterson, Stinson and Upadhyay provide a general framework for PoRs in the unconditional security model. They show that retrievability of the file can be expressed as error correction of a so-called response code. This allows them to precisely quantify the extraction success as a function of the success probability of a proving algorithm: indeed, in this setting, extraction can be naturally seen as nearest-neighbour decoding in the response code. They notably apply their framework to prove the security of a modified version of the Shacham–Waters scheme. Also, notice that, prior to this work, Dodis, Vadhan and Wichs proposed another coding-theoretic model for PoRs that allowed them to build efficient bounded-use and unbounded-use schemes.
With practicality in mind, other features have been added to PoRs. For instance, Wang et al. presented a construction based on Merkle hash trees, which allows efficient file updates on the server. Their scheme is provably secure under cryptographic assumptions (hardness of Diffie–Hellman in bilinear groups, unforgeable signatures, etc.) and has been improved by Mo, Zhou and Chen in order to prevent unbalanced trees. More recently, other features have been proposed for PoRs, such as multi-prover PoRs or public verifiability.
1.3 Our approach
As we remarked before, most schemes rely on two techniques: (i) the client locally stores secret data in order to check the integrity of the file, and (ii) the client encodes the file in order to repair a small number of erasures and errors that could have been missed during the verification step.
In this work, we propose to build PoR schemes using codes that fulfil the two previous goals, when equipped with a suitable family of efficiently computable random permutations. More precisely, our idea is the following. Given a file F, a code and a family of random permutations, the client sends to the server an encoded and scrambled version w of his file. Then the verification step consists in checking “short” relations among descrambled symbols of w, which come, for instance, from low-weight parity-check equations for the code. Moreover, during the extraction step, the code provides the redundancy necessary to repair erasures and potential unnoticed errors.
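The idea above can be sketched end to end with a deliberately tiny instantiation (ours, not one proposed in the paper): a binary 3-fold repetition code as the underlying code, equality of two symbols inside a repetition block as the weight-2 parity checks, and coordinatewise bit-flip masks as the secret permutations over the base field F₂. All function names are illustrative.

```python
import random

def keygen(n, seed=None):
    # Secret coordinatewise permutations over F_2: each sigma_i is either
    # the identity or the bit-flip, encoded as a mask bit.
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n)]

def encode(m, rep=3):
    # 3-fold repetition code: each message bit is repeated rep times.
    return [b for b in m for _ in range(rep)]

def init(m, key, rep=3):
    c = encode(m, rep)
    assert len(c) == len(key)
    # Scrambled codeword: this is the file uploaded to the server.
    return [ci ^ ki for ci, ki in zip(c, key)]

def verify(key, server, rep=3, seed=None):
    # Weight-2 parity check: two positions inside the same repetition block
    # must descramble to equal bits.
    rng = random.Random(seed)
    block = rng.randrange(len(key) // rep)
    i, j = rng.sample(range(block * rep, (block + 1) * rep), 2)
    ri, rj = server(i), server(j)
    return (ri ^ key[i]) == (rj ^ key[j])

def extract(key, w, rep=3):
    c = [wi ^ ki for wi, ki in zip(w, key)]
    # Majority vote inside each repetition block repairs sparse errors.
    return [int(sum(c[b*rep:(b+1)*rep]) * 2 > rep) for b in range(len(c)//rep)]

m = [1, 0, 1, 1]
key = keygen(3 * len(m), seed=7)
w = init(m, key)
honest = lambda i: w[i]
assert all(verify(key, honest, seed=s) for s in range(20))
assert extract(key, w) == m
```

An honest server passes every audit and extraction recovers the message; a server that lost part of w fails a proportional fraction of the weight-2 checks.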
In the present work, we develop a seminal idea that appeared in a previous work, where the authors proposed a construction of PoRs based on lifted codes. We here provide a more generic construction and give a deeper analysis of its security.
While our scheme features neither updatability nor public verifiability, we emphasise the genericity of our construction, which is based on well-studied algebraic and combinatorial structures, namely, codes and their parity-check equations. Moreover, since the code is public, the client must only store the secret material associated with the random permutations, which consists of a few bytes. Besides, an honest server simply needs to read pieces of w during the verification step, and therefore has a very low computational burden compared to many other schemes.
Section 2 is devoted to the definition and security model of proofs of retrievability. Despite the great disparity of models in the literature, we try to stay close to the definitions given in [6, 11] for the sake of uniformity.
Section 3 presents our PoR construction. Precisely, in Section 3.1, we introduce objects called verification structures for a code, which will be used in the definition of our scheme (Section 3.2). A rigorous analysis of our scheme is the purpose of the remainder of that section.
2 Proofs of retrievability
2.1 Definition of underlying protocols
We recall that, in proofs of retrievability, a user wants to estimate whether a message m can be retrieved from an encoded version w of the message stored on a server. In all that follows, the user will be known as the Verifier (who wants to verify the retrievability of the message), while the server is the Prover (who aims at proving the retrievability). The message space is the set of messages, while the (server) file space is the set of encoded versions of the messages. We also consider the set of secret values (or keys) kept by the Verifier, and the space of responses to challenges.
Throughout the paper, we notationally distinguish the outputs of randomised algorithms from those of deterministic algorithms.
A keyed proof of retrievability (PoR) is a tuple of algorithms (KeyGen, Init, Verify, Extract) running as follows:
The key generation algorithm KeyGen generates uniformly at random a key κ. The key κ is secretly kept by the Verifier.
The initialisation algorithm Init is a deterministic algorithm which takes, as input, a message m and a key κ, and outputs a file w. Init is run by the Verifier, who initially holds the message m. After the process, the file w is sent to the Prover, and the message m is erased on the Verifier’s side. Upon receipt of w, the Prover sets a deterministic algorithm that will be run during the verification procedure.
The verification algorithm Verify is a randomised algorithm initiated by the Verifier, who needs the secret key κ and interacts with the Prover. Verify is depicted in Figure 1 and works as follows:
the Verifier runs a random query generator that outputs a challenge u (drawn from the so-called query set);
the challenge u is sent to the Prover;
the Prover outputs a response;
the Verifier checks the validity of the response according to u and κ; the algorithm finally outputs a Boolean value.
The extraction algorithm Extract is run by the Verifier. It takes, as input, κ and the responses, and outputs either a message or a failure symbol ⊥. We say that extraction succeeds if the output is the initial message m.
The vector collecting the Prover’s responses to all possible challenges is called the response word associated to the Prover.
Note that, in assuming that the response algorithm is deterministic and non-adaptive, we follow the work of Paterson, Stinson and Upadhyay. The authors justify the determinism of response algorithms by the fact that any probabilistic prover can be replaced by a deterministic prover whose success probability is at least as good as that of the probabilistic one.
In Definition 2.1, we can see that a deterministic response algorithm can be represented by the vector of its outputs, called its response word. Therefore, we can assume that, before the verification step, the Prover produces a word r related to the file w he holds. In other words, we model provers as algorithms which, given w as input, return a word r.
Following previous work, we also assume in this paper that the extraction algorithm is deterministic, though, in general, it can be randomised. Finally, notice that proofs of retrievability aim at proving the extractability of a file. The extraction algorithm is therefore a tool to retrieve the whole file. Hence its computational efficiency is not a crucial feature.
|        | KeyGen | Init | Verify        | Extract |
| Input  | —      | m, κ | u, r, κ       | r, κ    |
| Output | κ      | w    | True or False | m or ⊥  |
2.2 Security models
One should first notice that, despite many efforts, proofs of retrievability lack a general agreement on the definition of their security model. Nevertheless, our definitions remain very close to the ones given in the original work of Juels and Kaliski.
For a response word r given by the Prover and a key κ kept by the Verifier, we first define the success of r according to κ as
where the probability is taken over the internal randomness of the verification algorithm. A first security model can be defined as follows.
Definition 2.2 (Security model, strong version).
Let ε, τ > 0. A proof of retrievability is strongly (ε, τ)-sound if, for every initial message m, every uploaded file w and every prover, we have
the probability being taken over the internal randomness of under the constraint that .
A remark concerning parameters ε and τ
In proofs of retrievability, we aim at making the extraction of the desired file m as certain as possible when the audit succeeds. Hence it is desirable to have τ small. On the other hand, the parameter ε measures the rate of unsuccessful audits which leads the Verifier to believe the extraction will fail. Therefore, one does not necessarily need to look for large values of ε, though, in practice, a large ε affords more flexibility, for instance, if communication errors occur between the Verifier and the Prover during the verification procedure.
Definition 2.2 provides a strong security model, in the sense that (i) it does not require any bound on the response algorithms given by the Prover and (ii) the probability in (2.1) is taken over fixed messages m (informally, it means the Prover knows m).
However, keyed proofs of retrievability are usually insecure according to the security model given in Definition 2.2. For instance, Paterson, Stinson and Upadhyay noticed that, in the Shacham–Waters scheme, given the knowledge of m and w, an unbounded Prover may be able to
compute (or at least randomly guess) a key κ′ consistent with the pair (m, w),
build a message m′ ≠ m,
set a response word which (a) successfully passes every audit and (b) leads to the extraction of m′ instead of m.
Hence we choose to use a weaker but still realistic security model, where, informally, the Prover only knows what he stores (that is, w) and has no information on the initial message m. The following security model thus remains consistent with the one given by Paterson, Stinson and Upadhyay.
Definition 2.3 (Security model, weak version).
Let ε, τ > 0. A proof of retrievability is weakly (ε, τ)-sound (or simply (ε, τ)-sound) if, for every polynomial-time prover and every uploaded file w, we have
In equation (2.2), the randomness comes from pairs (m, κ) picked uniformly at random among those satisfying the stated constraint.
Since we deal with values of τ very close to 0, we also say that a strongly (ε, 2^{-λ})-sound PoR admits λ bits of security against ε-adversaries.
Informally, saying that a PoR is not weakly sound amounts to finding a polynomial-time deterministic algorithm which
takes, as input, a file w and outputs a response word r,
makes the extraction fail with non-negligible probability (over messages m and keys κ such that the corresponding response words are successfully audited).
3 Our generic construction
Schematically, in the initialisation phase of our construction, the Verifier
encodes his file according to a code,
scrambles the resulting codeword using a tuple of permutations over the base field,
uploads the result to the Prover.
As we explained in the introduction, the verification step then consists in checking that the server is still able to give answers that, once descrambled, satisfy low-weight parity-check equations for the code.
For this purpose, we next introduce objects called verification structures for codes, which will be used in the definition of our generic scheme.
3.1 Verification structures: A tool for our PoR scheme
We here consider the finite field $\mathbb{F}_q$ with q elements. Following well-known coding-theoretic terminology, the support of a word $x \in \mathbb{F}_q^n$ is $\{i : x_i \neq 0\}$, and its weight is the cardinality of its support.
In this work, we need to consider codes whose alphabets are finite-dimensional spaces over $\mathbb{F}_q$. Precisely, a code of length n over an alphabet A is a subset of $A^n$. A code is $\mathbb{F}_q$-linear if it is a vector space over $\mathbb{F}_q$. When the alphabet is $\mathbb{F}_q$ itself, we get the usual definition of linear codes over finite fields. Unless stated otherwise, we only consider $\mathbb{F}_q$-linear codes, which we simply refer to as codes.
We usually denote by k the dimension of a code over $\mathbb{F}_q$. Its minimum distance d is the smallest Hamming distance between two distinct codewords. If n is the length of the code, then d/n is its relative minimum distance, while k/n represents its rate. The dual code consists of all words orthogonal to every codeword; its elements are also called parity-check equations for the code.
Definition 3.1 (Verification structure).
Let and be a code. Let also be a non-empty set of -subsets of . Set . We define the restriction map R associated to as
Given an integer and a map , we say that is a verification structure for if the following holds:
For all , there exists such that .
For all , the map given by is surjective and vanishes on the code . Explicitly,
The map V is then called a verification map for , and the set a query set for . By convention, for and , we define
Finally, the code is called the response code of .
Example 3.2 (Fundamental example).
Let be a code, and let be a set of parity-check equations for of Hamming weight , whose supports are pairwise distinct. Define the query set and, for any , to be the unique parity-check equation in whose support is u. Finally, we define a map V by
Notice that we set here. By construction, it is clear that is a verification structure for .
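The pattern of the fundamental example can be made concrete on a small, well-known code of our choosing: the [7,4] binary Hamming code, whose parity-check matrix supplies weight-4 parity-check equations with pairwise distinct supports. The verification map simply tests the unique parity-check equation supported on the challenged positions.

```python
from itertools import product

# Parity-check matrix of the [7,4] binary Hamming code: column j (1-indexed)
# is the binary expansion of j. Each row is a weight-4 parity-check equation.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]

# Query set Q: the supports of the rows of H; they are pairwise distinct.
Q = [frozenset(j for j in range(7) if row[j]) for row in H]
assert len(set(Q)) == len(Q)

def V(u, r):
    # Verification map: accept iff the restricted answer r (a dict indexed
    # by u) satisfies the unique parity-check equation supported on u.
    return sum(r[j] for j in u) % 2 == 0

# The code itself, recovered by brute force as the null space of H.
code = [c for c in product((0, 1), repeat=7)
        if all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)]
assert len(code) == 16  # dimension 4

# Every challenge accepts the honest restriction of any codeword.
assert all(V(u, {j: c[j] for j in u}) for c in code for u in Q)
```

By construction, V vanishes on restrictions of codewords, exactly as required of a verification map.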
Example 3.3 (Toy example).
Let be a binary Hadamard code of length and dimension . In other words, is defined by a parity-check matrix
According to Example 3.2, we define to be the set of supports of rows of H. In other words,
Then the verification map can be defined as follows. If and is indexed according to u, then we define
Now let . The message m can be encoded into
Hence the word is
For each vector-coordinate of , one can now check that . Hence we get , as expected.
From now on, we denote by the length of the response code of a code equipped with a verification structure .
3.2 Definition of our PoR scheme
Let be a verification structure for , and let , where denotes the set of permutations over . Any n-tuple of permutations naturally acts on by
and we define . Let finally
where . The map has been defined in order to satisfy
for every .
Based on this, our construction is given in Figure 2.
3.3.1 Preliminary results
We first give results concerning verification structures and response codes. The following two lemmata are straightforward to prove.
Let be a verification structure for a code . Then is a verification structure for .
Let be any query-set for a code whose elements have cardinality . Then its response code is an -linear code over the alphabet .
By considering instead of , we lose the -linearity, but one can check that verification structures still make sense and provide the result claimed in Lemma 3.4.
The next result states that the map does not modify the distance between codewords.
Let be a linear code, a verification structure for , and . Then it holds that
the distribution of distances in and are the same,
the distribution of distances in and are the same.
Since every is one-to-one, for any , we get
The proof for response codes relies on the same argument. ∎
Remark that these results imply that, if is linear, then the minimum distance of is the minimum weight of .
Let and be a verification structure for a code . We say is ε-close to if
Let now and . We say that is a β-liar for if
Bounded-distance error-and-erasure decoder
Let be any code of minimum distance d, and let be corrupted with b errors and e erasures, resulting in a word . Then it is well known that, as long as , it is possible to retrieve a from thanks to a so-called bounded-distance error-and-erasure decoding algorithm. This is precisely the decoding algorithm that we employ in Figure 3 on the code .
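For intuition, the bounded-distance error-and-erasure guarantee 2b + e < d is easy to exercise on the simplest possible code. The sketch below (our illustration, not the decoder used for the codes of Section 5) decodes the length-5 binary repetition code, of minimum distance d = 5, with erasures marked as None.

```python
def decode_rep5(r):
    """Bounded-distance error-and-erasure decoding for the length-5 binary
    repetition code (minimum distance d = 5). Erasures are marked None.
    Decoding is guaranteed whenever 2*b + e < d, with b errors, e erasures."""
    votes = [x for x in r if x is not None]
    if not votes:
        return None  # everything erased: e = 5 >= d, failure
    ones = sum(votes)
    if ones * 2 == len(votes):
        return None  # tie among unerased positions cannot be resolved
    # Majority vote on the unerased positions.
    return int(ones * 2 > len(votes))

# 1 error + 2 erasures: 2*1 + 2 = 4 < 5, decoding must succeed.
assert decode_rep5([0, 1, None, 0, None]) == 0
# 2 errors + 1 erasure: 2*2 + 1 = 5, beyond the guarantee.
assert decode_rep5([1, 1, None, 0, 0]) is None
```

For richer codes (e.g., Reed–Solomon components of the tensor products below), the same guarantee is achieved by standard algebraic error-and-erasure decoders.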
Our framework allows us to reformulate the extraction success in terms of a probability to decode corrupted codewords. More precisely:
Let , , and denote by d the minimum distance of the response code, of length N. Let also r be the response word, the output of a proving algorithm taking w as input. Finally, assume that r is ε-close to and a β-liar for , with . Then , where is defined in Figure 3.
Recall that represents the word we get from r after step (ii) of the algorithm given in Figure 3. Let us now translate our assumptions on r into coding-theoretic terminology:
r is ε-close to means that there are at most challenges for which we know that the coordinate is not authentic. This justifies that we assign erasure symbols to these coordinates.
r is a β-liar for means that there are at most other corrupted values , but we cannot identify them. Therefore, we can assimilate these coordinates to errors.
To sum up, we see as a corruption of with at most erasures and at most errors, where . Since we assume that , we know from the previous discussion that the decoding succeeds in retrieving m. ∎
3.3.2 Bounding the extraction failure
According to Definition 2.3, our scheme is weakly -sound if, for every polynomial-time algorithm outputting a response word from a file w, we have
Using Proposition 3.9, the security analysis of our scheme reduces to measuring the ability of the Prover to produce a response word r which is ε-close to and a β-liar for , with .
For fixed , and the authentic file given to the prover, we define three subsets of :
and . This represents challenges u on which the response word r differs from the authentic one .
and . These are challenges u on which the associated coordinate is not accepted by the verification map (it corresponds to erasures in the decoding process).
and . These are the challenges u on which the associated coordinate is accepted by the verification map, but differs from the authentic response (it corresponds to errors in the decoding process).
One can easily check that, for every σ, the sets and define a partition of . The probability of extraction failure can thus be written as
For , let us define the set of admissible permutations and messages
so that equation (3.1) rewrites
Later on, we will use the notation to refer to the fact that is uniformly drawn from . Similarly, we will use the corresponding notation for the expectation and for the variance.
Given , we also define
and , where are such that . The parameter is called the bias of the verification structure for . It corresponds to the maximum probability that a response is accepted but not authentic.
For all and , we have
A simple computation shows
Lemma 3.10 essentially means that, if an adversary to our scheme wants its response word to be (on average) ε-close to the verification structure, then he should modify at most responses. Below, we take advantage of this result, and we measure the probability of an extraction failure.
First, for , let
The probability represents the probability that the extraction fails for a response code of relative distance δ and an adversarial response word r associated to w, which is ε-close to the verification structure. Let us bound .
Let such that . Let also and . Then we have
We distinguish three cases.
(i) . The event never occurs since . Hence .
(ii) . The inequality implies
Hence, using Chebyshev’s inequality,
(iii) . In this case, implies
Therefore, similarly to the previous case, we obtain the claimed result. ∎
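For reference, the form of Chebyshev’s inequality applied in cases (ii) and (iii) above is the standard one: for a real random variable X with finite variance and any t > 0,

```latex
\Pr\bigl[\, |X - \mathbb{E}[X]| \ge t \,\bigr] \;\le\; \frac{\operatorname{Var}(X)}{t^{2}}.
```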
For any , denote by the -random variable “ ” when σ is uniformly drawn from . It holds that .
Recall that two real random variables are uncorrelated if . For instance, two independent random variables are uncorrelated.
Let and . If the random variables are pairwise uncorrelated, then
By assumption, are pairwise uncorrelated; hence
The trivial bound gives the result. ∎
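The variance computation in the proof above can be written out explicitly. Assuming, as in the surrounding argument, that the variables $X_u$ are $\{0,1\}$-valued and pairwise uncorrelated, with N challenges in total, bilinearity of the covariance gives

```latex
\operatorname{Var}\Bigl(\sum_{u} X_u\Bigr)
  \;=\; \sum_{u} \operatorname{Var}(X_u) \;+\; \sum_{u \neq u'} \operatorname{Cov}(X_u, X_{u'})
  \;=\; \sum_{u} \operatorname{Var}(X_u)
  \;\le\; \frac{N}{4},
```

the last inequality being the trivial bound $\operatorname{Var}(X) \le 1/4$ for a $\{0,1\}$-valued random variable.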
since . Moreover, if and , then .
Therefore, we end up with the following theorem.
Let be a verification structure for with bias α. Let , and let be the relative distance of the associated response code. Finally, assume that, for any and any , the variables are pairwise uncorrelated. Then, for any , the scheme associated to and is -sound, where
For asymptotically small α, a code equipped with a verification structure satisfying the conditions of Theorem 3.13 thus gives an -sound scheme for every and .
According to Theorem 3.13, we thus need to look for (sequences of) codes and associated verification structures such that
the response code admits a good relative distance ,
the bias α is small,
random variables are pairwise uncorrelated.
3.4 Estimating α
In this section, we prove that, assuming approximates the uniform distribution over in a sense that we make precise later, the bias α can be bounded according to parameters of the verification structure.
Let us fix , and . We recall that α is defined by
where randomness comes from . We notice that this is equivalent to writing .
For convenience, we will view as a vector indexed by , so that we can easily denote by its j-th coordinate, . We define the code , and up to re-indexing coordinates, . This allows us to write that, for every σ, we have if and only if . Finally, we denote by the set of coordinates of that are not authentic.
Let represent the event “ ”. Informally, the reason why we consider an event conditioned by is that the Prover is free to choose any support on which he can modify the original file. More formally, this constraint will help us to bound the probability in Lemma 3.14. We say that is sufficiently uniform if, for every , we have
when the file size . In other words, is sufficiently uniform if it is a good approximation of the whole set of n-tuples of permutations, when considering the probability that happens.
Let r, w, u and be defined as above. Let also . Then
For every σ such that , we know that , and we recall that if and only if . Since is linear, and up to considering instead, we can assume without loss of generality that for every . In other words, we assume that .
since counts the number of codewords in whose support is .
Therefore, we get
Let be the -vector space , and assume that . We have
We prove that, if for some integer , then , which clearly induces our result. If , then since . The Singleton bound then provides
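The Singleton bound invoked in this step is the classical statement that any code of length n, dimension k and minimum distance d (over any alphabet) satisfies

```latex
d \;\le\; n - k + 1.
```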
Finally, we get the following upper bound on α.
Let . Then
Remark that , defined in the previous lemma, is a subcode of shortened on . Hence
and we can apply previous results and obtain the desired bound
where . ∎
If every is sufficiently uniform, then, by definition, we have when the file size . This assumption is significant since we desire a small bias α, which is deeply linked to the soundness of PoRs (see Theorem 3.13). In Appendix A, we present experimental estimates of α, validating the assumption that is sufficiently uniform.
3.5 Pairwise uncorrelation of
This section is devoted to proving that variables are pairwise uncorrelated if the supports of challenges have small pairwise intersection. For this purpose, let us recall that, for fixed , w and , the random variable represents when σ is uniformly picked in .
We first state a technical lemma that will be useful to prove Proposition 3.18 below. For clarity, we denote by the minimum distance of the dual code of a linear code .
Let be a linear code and , , where . For , we define
is a linear subcode of ;
for every non-zero , there exists a non-zero such that ;
for every , , where .
(i) The fact that is actually the well-known definition of the shortening of a code. It is easy to prove that it defines a linear code.
(ii) Let be non-zero, and let us first prove that there exists such that . If it were not the case, then, by definition, we would have . But this is impossible since contains no non-zero codeword of weight less than t. It is then easy to check that .
(iii) First notice that if . Since
we get the expected result. ∎
If , then the random variables are pairwise uncorrelated.
Recall that and that, by definition of a verification structure, we have . For , let us prove that . First,
Denote , and let . We denote by the event
We first notice that . Indeed, we can here use an argument similar to the proof of Lemma 3.17: the constraint is ineffective on since for every . Therefore, for every , we have
and it follows that
Recall now that . Hence, for fixed and , the variables and are independent (once again, it is a consequence of the structure results of Lemma 3.17). Therefore,
and we conclude since
4.1 Efficient scrambling of the encoded file
In the scheme we propose, the storage cost of an n-tuple of permutations is excessive since it is superlinear in the original file size. In this subsection, we propose a storage-efficient way to scramble the codeword produced by the Verifier.
Precisely, we want to define a family of maps , where , , with the following requirements:
For every κ, the map is efficiently computable and requires a low storage.
For every κ and every , if , then, for every , the local inverse map is efficiently computable.
If κ is randomly generated but unknown, then, given the knowledge of and , it is hard to produce a response word such that, for many , both and hold. To be more specific and in light of the security analysis of Section 3.3, we require that it is hard to distinguish from a random , where symbols are picked independently and uniformly at random.
We here propose to derive from a suitable block cipher, yielding the explicit construction given below. Of course, other proposals can be envisioned.
Let denote a random initialisation vector for AES in CTR mode (it could be a nonce concatenated with a random value). The vector is kept secret by the Verifier, as well as a randomly chosen key κ for the cipher. Let also f be a permutation polynomial over of degree . For instance, one could choose with . Notice that the polynomial f can be made public.
Let be the number of -symbols one can store in a 256-bit word. Up to appending a few random bits to c, we assume that , and we define . Let us fix a partition of into s-tuples ; it can be, for instance, , . Notice that this partition does not need to be chosen at random. Given and i an element of the above partition, we now define
If , trailing zeroes can be added to evaluations of f. Finally, the pseudo-random permutation σ is defined by
AES is a natural choice when one needs a (secret-)keyed pseudo-random permutation. Also notice that, with this construction, one only needs to store the key κ and the vector since the other objects (the polynomial f, the partition) are made public. Hence our objectives in terms of storage are met.
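Since AES is not available in the Python standard library, the sketch below substitutes a SHA-256-based keyed PRF for the AES-CTR keystream; it is a stand-in illustrating the storage pattern of the construction, not the construction itself. Each local permutation σ_j of the alphabet is derived on demand from PRF_κ(j), so the client stores only κ; note that the coordinate index j enters the PRF input, as required by the discussion that follows. All names are ours.

```python
import hashlib
import random

q = 256  # alphabet size, e.g. symbols of F_q with q = 256

def local_perm(kappa: bytes, j: int):
    # sigma_j: a pseudo-random permutation of {0, ..., q-1} derived from a
    # keyed PRF evaluated at the coordinate index j.
    seed = hashlib.sha256(kappa + j.to_bytes(8, "big")).digest()
    table = list(range(q))
    random.Random(seed).shuffle(table)  # Fisher-Yates under a PRF-derived seed
    return table

def scramble(kappa, c):
    # Coordinatewise scrambling of a codeword c: w_j = sigma_j(c_j).
    return [local_perm(kappa, j)[x] for j, x in enumerate(c)]

def descramble_at(kappa, j, y):
    # Local inverse sigma_j^{-1}: only the challenged coordinates need it.
    return local_perm(kappa, j).index(y)

kappa = b"\x00" * 32  # the only secret the client stores
c = [17, 42, 251, 0]
w = scramble(kappa, c)
assert [descramble_at(kappa, j, w[j]) for j in range(len(c))] == c
```

The storage goal is met: everything except κ (the PRF choice, the shuffle procedure) can be public and recomputed per coordinate.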
We now point out the necessity of using i as a part of the input of the AES cipher. Assume that we do not. Then the local permutation , , would not depend on j. As a consequence, for a certain class of codes, the local verification map would not depend on u, and a malicious Prover would then be able to produce accepted answers while storing only a small piece of the file w (e.g., for only one ).
Another mandatory feature is the non-linearity of the permutation polynomial f. Indeed, assume, for instance, that f is linear. Then, given the knowledge of w, it would be very easy for a malicious Prover to produce a word that is always accepted by the Verifier: the Prover simply adds to w the scrambling of any non-zero codeword of the code. Hence one sees that the polynomial f must be non-linear in order to prevent this kind of attack.
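The attack against linear scrambling is easy to reproduce in a toy setting of our own: over F₂, take σ_j(x) = x ⊕ key_j (an additive, hence linear, map) and the [4,3] binary parity code. Adding a non-zero codeword to the uploaded file passes the parity check after descrambling, yet the server no longer stores the honest word.

```python
import random

# [4,3] binary parity code: codewords are exactly the even-weight words.
def check(key, w):
    # Descramble with the (linear!) masks, then test the parity-check equation.
    c = [wi ^ ki for wi, ki in zip(w, key)]
    return sum(c) % 2 == 0

random.seed(1)
key = [random.randint(0, 1) for _ in range(4)]  # linear scrambling masks
c = [1, 0, 1, 0]                                # even weight: a codeword
w = [ci ^ ki for ci, ki in zip(c, key)]         # honestly uploaded file

c_prime = [1, 1, 0, 0]                          # non-zero codeword
w_forged = [wi ^ cpi for wi, cpi in zip(w, c_prime)]

assert check(key, w)         # the honest file is accepted
assert check(key, w_forged)  # the forged file is accepted too...
assert w_forged != w         # ...although the stored word is corrupted
```

The forged file descrambles to c + c′, which still lies in the code, so every linear check passes; a non-linear σ breaks exactly this homomorphism.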
We here consider a PoR built upon a code with a verification structure satisfying the requirements given above. We also assume that we use an n-tuple of pseudo-random permutations as described in the previous subsection.
At each verification step, the client sends an -tuple of coordinates , . The server then answers with corresponding symbols . Therefore, the upload communication cost is bits, while the download communication cost is , thus a total of bits.
In the initialisation phase, following the encryption described in Section 4.1, the client essentially has
to compute the codeword associated to its message,
to make n evaluations of the permutation polynomial f over ,
to compute AES ciphertexts to produce the word w to be sent to the server.
Given a generator matrix of , the codeword c can be computed in operations over with a matrix–vector product. Notice that quasi-linear-time encoding algorithms exist for some classes of codes. Besides, if a monomial or a sparse permutation polynomial is used, then the cost of each evaluation is . If we denote by $c_{\mathrm{AES}}$ the bit-cost of an AES encryption, we get a total bit-cost of for the initialisation phase. Recall this is a worst-case scenario in which the encoding process is inefficient.
At each verification step, an honest server only needs to read symbols from the file it stores. Hence its computation complexity is . The client has to compute a matrix-vector product over , where the matrix has size and the vector has size , thus a computation cost of operations over .
The client stores only the secret material, namely κ and the key used in AES, typically 512 bits in total. The server storage overhead exactly corresponds to the redundancy of the linear code C, that is, (n − k) log₂(q) bits.
Our scheme is unbounded-use, since a challenge reveals nothing about the secret data held by the client. It does not feature dynamic updates of files. However, we must emphasise that the file w produced by the client can be split among several servers, and the verification step remains possible even if the servers do not communicate with each other: computing a response to a challenge does not require mixing distinct symbols of the uploaded file. Therefore, our scheme is well suited to the storage of large, static, distributed databases. Parameters of the schemes we propose are reported in Figure 4.
In this section, we present several instantiations of our construction. We first recall basics and notation from coding theory.
We denote by Rep_q(n) ⊆ F_q^n the length-n repetition code, and we recall that its dual is the parity code Par_q(n) = {c ∈ F_q^n : c_1 + ⋯ + c_n = 0}. Let C_1, C_2 be two linear codes over F_q of respective parameters [n_1, k_1, d_1] and [n_2, k_2, d_2]. Their tensor product C_1 ⊗ C_2 is the F_q-linear code generated by the words

(c_1 ⊗ c_2)[(i, j)] = c_1[i] · c_2[j],  c_1 ∈ C_1, c_2 ∈ C_2.

It has dimension k_1 k_2 and minimum distance d_1 d_2. We also denote by C^{⊗s} the s-fold tensor product of C with itself.
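A generator matrix of C_1 ⊗ C_2 is the Kronecker product of generator matrices of C_1 and C_2. This makes the dimension and minimum-distance claims easy to check on a toy example (a binary [3, 2, 2] parity code, chosen here purely for illustration):

```python
import itertools

import numpy as np

def min_distance(G):
    """Brute-force minimum distance of the binary code generated by G."""
    k, n = G.shape
    best = n
    for m in itertools.product((0, 1), repeat=k):
        if any(m):
            c = (np.array(m) @ G) % 2
            best = min(best, int(c.sum()))
    return best

G_par3 = np.array([[1, 0, 1],
                   [0, 1, 1]])          # [3, 2, 2] binary parity code
G_tensor = np.kron(G_par3, G_par3)      # generator matrix of the tensor product

# dimension k1 * k2 = 4, minimum distance d1 * d2 = 4
print(G_tensor.shape, min_distance(G_par3), min_distance(G_tensor))  # -> (4, 9) 2 4
```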
5.1 Tensor-product codes
The first subsection below illustrates our construction with a simple but non-practical instance. The next ones lead to practical instances.
5.1.1 A simple but non-practical instance
Let n = ℓN, and let {I_1, …, I_N} be the partition of {1, …, n} into N consecutive blocks of size ℓ. We define the code C as the set of words whose restriction to each block I_u lies in the parity code of length ℓ.
In other words, C is a direct sum of N parity codes, and a parity-check matrix H for C is given by the N × n block-diagonal matrix whose diagonal blocks are all-ones rows of length ℓ.
The verification map is defined by v_u(r) = r_1 + ⋯ + r_ℓ for every block u of the partition. By construction (see the fundamental Example 3.2), the pair (Q, V) defines a verification structure for C.
Let C be as above. Then the response code has minimum distance 1.
We see that the restriction map R sends a minimum-weight codeword of C, supported on a single block, to a word of weight 1. Besides, R is injective, so this word is a non-zero element of the response code, whose minimum distance is therefore 1. ∎
Since the length of the response code grows while its minimum distance stays equal to 1, its relative distance tends to 0 when N goes to infinity; an attempt to build a scheme from C thus cannot be practical.
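The lemma can be checked on a toy instance (parameters assumed for illustration): a minimum-weight codeword of C is supported on a single block, so its response vector is non-zero on exactly one query, and a server storing that block alone can answer that query correctly.

```python
ell, N = 3, 2  # toy parameters: N blocks of length ell over F_2

def in_C(w):
    """C: words whose restriction to each length-ell block has even parity."""
    return all(sum(w[u * ell:(u + 1) * ell]) % 2 == 0 for u in range(N))

c = [1, 1, 0, 0, 0, 0]   # a minimum-weight codeword, supported on block 0
assert in_C(c)

# Response map R: the answer to query u is the restriction of c to block u.
responses = [tuple(c[u * ell:(u + 1) * ell]) for u in range(N)]
weight = sum(1 for r in responses if any(r))
print("response weight:", weight)  # -> response weight: 1
```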
5.1.2 Higher order tensor-product codes
Let C be a non-degenerate [n, k, d] linear code over F_q, and define the code C^{⊗s} of length N = n^s, where s ≥ 2. Notice that it will be more convenient to index the coordinates of words of C^{⊗s} by elements of [n]^s.
For i ∈ {1, …, s} and b ∈ [n]^{s−1}, we define L_i(b), the “i-th axis-parallel line with basis b”, as

L_i(b) = {(b_1, …, b_{i−1}, t, b_i, …, b_{s−1}) : t ∈ [n]} ⊆ [n]^s.
By definition of C^{⊗s}, a word c lies in C^{⊗s} if and only if, for every axis-parallel line L, the restriction c_{|L} lies in C. This means that we can define
a set of queries Q = {L_i(b) : 1 ≤ i ≤ s, b ∈ [n]^{s−1}},
a verification map v_L : r ↦ H r,
where H is a parity-check matrix for whose columns are ordered according to the line L.
By the previous discussion, it is clear that c ∈ C^{⊗s} implies v_L(c_{|L}) = 0 for every L ∈ Q (in fact, these two assertions are equivalent). Hence (Q, V) defines a verification structure for C^{⊗s}, and we have |Q| = s n^{s−1}.
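The characterisation above is easy to test on a small case: with s = 2 and a binary [3, 2] parity code C (an assumed toy choice), a 3 × 3 array lies in C ⊗ C exactly when all of its rows and columns, i.e., all axis-parallel lines, lie in C.

```python
import itertools

import numpy as np

n = 3
G = np.array([[1, 0, 1],
              [0, 1, 1]])  # [3, 2, 2] binary parity code C
codewords = {tuple((np.array(m) @ G) % 2)
             for m in itertools.product((0, 1), repeat=2)}

def all_lines_in_C(word):
    """Check every axis-parallel line (row and column) of an n x n word."""
    return all(tuple(word[i, :]) in codewords and tuple(word[:, i]) in codewords
               for i in range(n))

c = np.kron(np.array([1, 0, 1]), np.array([0, 1, 1])).reshape(n, n)  # in C ⊗ C
assert all_lines_in_C(c)

bad = c.copy()
bad[0, 0] ^= 1            # corrupt a single symbol
assert not all_lines_in_C(bad)
print("tensor characterisation verified")
```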
Let as above. Then has minimum distance .
Let us first prove that the minimum distance of is larger than . Let , and assume . Then there exists such that . Therefore, for some . Consider the set
Very informally, the set corresponds to the hyperplane passing through and “orthogonal” to the i-th axis. By definition of , we know that for every . Let
with . Every defines a line on which is a non-zero codeword of . Equivalently, r is non-zero on index . Therefore,
Let us now build a word of weight . Let be a minimum-weight codeword of , and define . Define ; then . Let finally . We see that if and only if . Hence we get
since each line is counted times when runs over . ∎
Let , and let be an MDS code. Define and as above. If every is sufficiently uniform, then the scheme associated to and is -sound for and every , where when .
First, the relative distance of is according to Lemma 5.2. Then the random variables are pairwise uncorrelated because the inequality
We mainly focus on the download communication complexity of the verification step and on the server storage overhead, since these are the most crucial parameters depending on the family of codes in use. Besides, we consider it more relevant to analyse the ratio between these quantities and the file size than their absolute values.
Here, for an initial file of size bits, we get
a redundancy rate
a communication complexity rate
In Table 3, we present various parameters of such instances, for several file sizes. Here C is an MDS code (e.g., a Reed–Solomon code).
[Table 3: field size q, tensor order s, file size (bits), communication rate, redundancy rate.]
The previous example shows that, while the communication rate is reasonable for these instances over large files, the storage needs remain large.
5.2 Reed–Muller and related codes
Low-degree Reed–Muller codes are known to admit many distinct low-weight parity-check equations, whose supports correspond to affine subspaces of the ambient space. Therefore, they seem naturally adapted to our construction. Let us first consider the plane (or bivariate) Reed–Muller code case.
5.2.1 The plane Reed–Muller code
Let R_q be the plane Reed–Muller code of degree a = q − 2, that is, the evaluation code

R_q = {(f(x, y))_{(x, y) ∈ F_q^2} : f ∈ F_q[X, Y], deg f ≤ a}.
It is well known that R_q has length q^2 and dimension (a + 1)(a + 2)/2. Besides, for every affine line
L ⊂ F_q^2 and every c ∈ R_q, we can check that the restriction c_{|L} lies in a Reed–Solomon code of dimension a + 1. Indeed, let c be the evaluation of f ∈ F_q[X, Y] with deg f ≤ a. The restriction of f to an affine line L can be interpolated as a univariate polynomial of degree at most a. Our claim follows since this holds for every such f.
Therefore, we can define Q as the set of affine lines L of F_q^2, with V the associated Reed–Solomon syndrome maps. From the previous discussion, we see that (Q, V) is a verification structure for R_q. Also notice that there are q(q + 1) distinct affine lines in F_q^2; hence |Q| = q(q + 1).
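When a ≤ q − 2, one valid parity check on each line is simply that the symbols along the line sum to zero: the restriction of f to the line has degree at most a, and the sum of t^j over t ∈ F_q vanishes for every 0 ≤ j ≤ q − 2. A quick sanity check with toy parameters (q = 5, a = 3, and a randomly drawn polynomial):

```python
import random

q, a = 5, 3  # F_q and the degree bound a = q - 2 (toy parameters)
random.seed(1)

# Random bivariate polynomial of total degree <= a, as a coefficient dict.
f = {(i, j): random.randrange(q) for i in range(a + 1) for j in range(a + 1 - i)}

def ev(x, y):
    """Evaluate f at (x, y) in F_q^2."""
    return sum(c * pow(x, i, q) * pow(y, j, q) for (i, j), c in f.items()) % q

# Affine line {(x0 + t*u, y0 + t*v) : t in F_q}, direction (u, v) != (0, 0).
x0, y0, u, v = 1, 2, 3, 1
line_sum = sum(ev((x0 + t * u) % q, (y0 + t * v) % q) for t in range(q)) % q
print("sum over line:", line_sum)  # -> sum over line: 0
```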
Consider the code above, equipped with its verification structure (Q, V). Then the response code has minimum distance q^2 + 2.
Any non-zero codeword consists in the evaluation c of a non-zero polynomial f of degree at most a = q − 2. Denote by L_1, …, L_ℓ the affine lines on which f vanishes, i.e., f(P) = 0 for every P ∈ L_i. We claim that ℓ ≤ q − 2. Indeed, since f has total degree less than q − 1, it also vanishes on the closed lines \bar L_1, …, \bar L_ℓ, considered as affine lines in \bar F_q^2, where \bar F_q denotes the algebraic closure of F_q. Denote by h_i the monic polynomial of degree 1 which defines \bar L_i. From Hilbert’s Nullstellensatz, there exists g such that f = g · h_1 ⋯ h_ℓ. Since the h_i’s have degree 1 and are distinct, we get ℓ ≤ deg f ≤ q − 2. Hence the affine lines different from the L_i’s correspond to non-zero coordinates of the response word. There are at least q(q + 1) − (q − 2) = q^2 + 2 such lines, so the minimum distance is at least q^2 + 2.
Now we claim there exists a response word of weight q^2 + 2. Let L_1 and L_2 be two distinct parallel affine lines, respectively defined by x = α and x = β. We build the word c which is −1 on coordinates corresponding to points in L_1, 1 on those corresponding to points in L_2 and 0 elsewhere. One can check that c lies in the code; indeed, c corresponds to the evaluation of a univariate polynomial in x whose values sum to zero, hence of degree at most q − 2. Now, if we want to compute the weight of the associated response word, we only need to count the number of lines which intersect neither L_1 nor L_2. Clearly, there are only q − 2 such lines (the remaining lines of their common parallel class). Hence the weight is q(q + 1) − (q − 2) = q^2 + 2, and this concludes the proof. ∎
Let , and let be its associated verification structure. If every is sufficiently uniform, then the scheme associated to and is -sound for and , when .
One can check that the random variables are pairwise uncorrelated since
For an initial file of size bits, we get
a redundancy rate
a communication complexity rate
5.2.2 Storage improvements via lifted codes
The redundancy rate of the Reed–Muller codes presented above stays stuck above 2. Affine lifted codes, introduced by Guo, Kopparty and Sudan, allow one to break this barrier while keeping the same verification structure. Generically, they are defined as follows:
We refer to  for more details about the construction. Here we focus on since it can be compared to . Indeed, one sees that
and equation (5.1) turns into a proper inclusion as long as q is not a prime. Besides, by definition of lifted codes,
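A concrete instance of this proper inclusion, in the spirit of Guo, Kopparty and Sudan (the specific example is our illustration, not quoted from the paper): over F_4, the monomial x^2 y^2 has total degree 4, so it lies outside the degree-2 Reed–Muller code, yet its restriction to every affine line agrees with a univariate polynomial of degree at most 2 = q − 2, which the sum-over-the-line criterion confirms.

```python
from itertools import product

# F_4 = {0, 1, w, w+1} encoded as {0, 1, 2, 3}; addition is XOR.
MUL = [[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 3, 1],
       [0, 3, 1, 2]]

def mul(a, b):
    return MUL[a][b]

def f(x, y):
    """The monomial x^2 * y^2 over F_4: total degree 4, hence its
    evaluation table is NOT in the degree-2 Reed-Muller code."""
    return mul(mul(x, x), mul(y, y))

def line_sum(x0, y0, u, v):
    """Sum (XOR, in characteristic 2) of f over the affine line
    {(x0 + t*u, y0 + t*v) : t in F_4}."""
    s = 0
    for t in range(4):
        s ^= f(x0 ^ mul(t, u), y0 ^ mul(t, v))
    return s

# Every line sum vanishes: every restriction has degree <= q - 2 = 2,
# so the evaluation of x^2 * y^2 lies in the lifted code.
assert all(line_sum(x0, y0, u, v) == 0
           for x0, y0 in product(range(4), repeat=2)
           for u, v in product(range(4), repeat=2) if (u, v) != (0, 0))
print("x^2*y^2 passes every line check over F_4")
```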