There has been considerable recent interest in “cloud storage” wherein a user asks a server to store a large file. One issue is whether the user can verify that the server is actually storing the file, and typically a challenge-response protocol is employed to convince the user that the file is indeed being stored correctly. The security of these schemes is phrased in terms of an extractor which will recover the file given any “proving algorithm” that has a sufficiently high success probability. This forms the basis of proof-of-retrievability (PoR) systems. In this paper, we study multiple-server PoR (MPoR) systems. We formalize security definitions for two possible scenarios: (i) a threshold of servers succeeds with high enough probability (worst case), and (ii) the average of the success probability of all the servers is above a threshold (average case). We also motivate the study of confidentiality of the outsourced message. We give MPoR schemes which are secure under both these security definitions and provide reasonable confidentiality guarantees even when there is no restriction on the computational power of the servers. We also show how classical statistical techniques previously used by us can be extended to evaluate whether the responses of the provers are accurate enough to permit successful extraction. We also examine one specific instantiation of our construction, based on the unconditionally secure version of the Shacham–Waters scheme, which gives reasonable security and privacy guarantees. We show that, in the multi-server setting with computationally unbounded provers, one can overcome the limitation that the verifier needs to store as much secret information as the provers.
In the recent past, there has been a lot of activity on remote storage and the associated cryptographic problem of integrity of the stored data. This question becomes even more important when there are reasons to believe that the remote servers might misbehave, i.e., one or more servers can delete (whether accidentally or on purpose) a part of the data since there is a good chance that the data will never be accessed, and hence, the client would never find out! In order to assuage such concerns, one would prefer to have a simple auditing system that convinces the client if and only if the server has the data. Such audit protocols, called proof-of-retrievability (PoR) systems, were introduced by Juels and Kaliski, and closely related proof-of-data-possession (PDP) systems were introduced by Ateniese et al.
In a PoR protocol, a client stores a message m on a remote server and keeps only a short private fingerprint locally. At some later time, when the client wishes to verify the integrity of its message, it can run an audit protocol in which it acts as a verifier while the server proves that it has the client’s data. The formal security of a PoR protocol is expressed in terms of an extractor – there exists an extractor with (black-box or non-black-box) access to the proving algorithm used by the server to respond to the client’s challenge, such that the extractor retrieves the original message given any adversarial server which passes the audits with a threshold probability. Apart from this security requirement, two practical requirements of any PoR system would be to have a reasonable bound on the communication cost of every audit and small storage overhead on both the client and server.
PoR systems were originally defined for the single-server setting. However, in the real world, it is highly likely that a client would store its data on more than one server. This might be due to a variety of reasons. For example, a client might wish to have a certain degree of redundancy if one or more servers fails. In this case, the client is more likely to store multiple copies of the same data. Another possible scenario could be that the client does not trust a single server with all of its data. In this case, the client might distribute the data across multiple servers. Both of these settings have been studied previously in the literature.
The first such study was initiated by Curtmola et al. , who considered the first of the above two cases. They addressed the problem of storing copies of a single file on multiple servers. This is an attractive solution considering the fact that replication is a fundamental principle in ensuring the availability and durability of data. Their system allows the client to audit a subset of servers even if some of them collude.
On the other hand, Bowers, Juels and Oprea considered the second of the above two cases. They studied a system where the client’s data is distributed and stored on different servers. This ensures that none of the servers has the whole data.
Both of these systems covered one specific instance of the wide spectrum of possibilities when more than one server is involved. For example, none of the works mentioned above addresses the question of the privacy of data. Both of them argue that, for privacy, the client can encrypt its file before storing it on the servers. These systems are secure only in the computational setting and the privacy guarantee is dependent on the underlying encryption scheme. On the other hand, there are known primitives in the setting of distributed systems, like secret sharing schemes, that are known to be unconditionally secure. Moreover, we can also utilize cross-server redundancy to get more practical systems.
1.1 Our contributions
In Section 2, we give the formal description of multi-server PoR (MPoR) systems. We state the definitions for worst-case and the average-case secure MPoR systems. We also motivate the privacy requirement and state the privacy definition for MPoR systems. In Section 3, we define various primitives to the level required to understand this paper.
In Section 4, we give a construction of an MPoR scheme that achieves worst-case security when the malicious servers are computationally unbounded. Our construction is based on ramp schemes and a single-server PoR scheme. Our construction achieves confidentiality of the message. To exemplify our scheme, we instantiate this scheme with a specific form of ramp scheme.
In Section 5, we give a construction of an MPoR scheme that achieves average-case security against computationally unbounded adversaries. For an MPoR system that affords average-case security, we also show that an extension of classical statistical techniques previously used by us  can be used to provide a basis for estimating whether the responses of the servers are accurate enough to allow successful extraction.
One of the benefits of an MPoR system is that it provides cross-server redundancy. In the past, this feature has been used by Bowers, Juels and Oprea  to propose a multi-server system called HAIL. We first note that the constructions in Section 4 and Section 5 do not provide any improvement on the storage overhead of the server or the client. In Section 6, we give a construction based on the Shacham–Waters protocol  that allows significant reduction of the storage overhead of the client in the multi-server setting.
1.2 Related works
The concept of proof of retrievability is due to Juels and Kaliski . A PoR scheme incorporates a challenge-response protocol in which a verifier can check that a message is being stored correctly, along with an extractor that will actually reconstruct the message, given the algorithm of a “prover” who is able to correctly respond to a sufficiently high percentage of challenges.
There are also papers that describe the closely related (but slightly weaker) idea of a proof-of-data-possession scheme (PDP scheme). A PDP scheme permits the possibility that not all of the message blocks can be reconstructed. Ateniese et al. also introduced the idea of using homomorphic authenticators to reduce the communication complexity of the system. This scheme was improved in a follow-up work by Ateniese et al. Shacham and Waters later showed that the scheme of Ateniese et al. can be transformed into a PoR scheme by constructing an extractor that extracts the file from the responses of the prover on the audits.
Bowers, Juels and Oprea  extended the idea of Juels and Kaliski  and used error-correcting codes. The main difference in their construction is that they use the idea of an “outer” and an “inner” code (in the same vein as concatenated codes), to get a good balance between the extra storage overhead and computational overhead in responding to the audits. Dodis, Vadhan and Wichs  provided the first example of an unconditionally secure PoR scheme, also constructed from an error-correcting code, with extraction performed through list decoding in conjunction with the use of an almost-universal hash function. They also give different constructions depending on the computational capabilities of the server. Previously , we studied PoR schemes in the setting of unconditional security and showed some close connections to error-correcting codes.
Recently, Ateniese, Kamara and Katz defined the framework of proof-of-storage systems to understand PDP and PoR systems in a unified manner. They used homomorphic identification schemes to give efficient proof-of-storage systems in the random-oracle model, and they showed that existing PoR and PDP constructions are specific instantiations of their construction. Wang et al. gave the first privacy-preserving, publicly auditable proof-of-storage systems. We refer the readers to the survey by Kamara and Lauter regarding the architecture of proof-of-storage systems.
Distributed cloud computing.
All the constructions mentioned above consider a single-server system; such systems are prone to failures that can lead to catastrophic problems. Proof-of-storage systems have, however, also been studied in the setting where there is more than one server or more than one client. The first such setting was studied by Curtmola et al. They studied a multiple-replica PDP system, which is the natural generalization of a single-server PDP system to t servers.
Bowers, Juels and Oprea  introduced a distributed system that they called HAIL. Their system allows a set of provers to prove the integrity of a file stored by a client. The idea in HAIL is to exploit the cross-prover redundancy. They considered an active and mobile adversary that can corrupt the whole set of provers.
Recently, Ateniese et al.  considered the problem from the client side, where n clients store their respective files on a single prover in a manner such that the verification of the integrity of a single client’s file simultaneously gives the integrity guarantee of the files of all the participating clients. They called such a system an entangled cloud storage.
1.3 Comparison with Bowers, Juels and Oprea
The focus of this paper is PoR systems in the distributed setting; therefore, we only compare our work with existing works in the distributed setting. The scheme of Curtmola et al. only considers multiple replicas of the same underlying PDP system, while the construction of Ateniese et al. is for the multiple-client setting. The scheme of Bowers, Juels and Oprea is therefore the closest to ours. However, there are a few key differences.
The construction of Bowers, Juels and Oprea is secure only in the computational setting, while we provide security in the setting of unconditional security.
Bowers, Juels and Oprea  use various tools and algorithms to construct their systems, including error-correcting codes, pseudo-random functions, message authentication codes and universal hash function families. On the other hand, we only use ramp schemes in our constructions, making our schemes easier to state and analyze, and arguably simpler to implement.
We consider two types of security guarantees, namely, the worst-case scenario and the average-case scenario. On the other hand, Bowers, Juels and Oprea only consider the worst-case scenario.
We use the term Prover to identify any server that stores the file of a client. We use the term Verifier for any entity that verifies whether the file of a client is stored properly or not by the server. We also assume that a file is composed of message blocks of an appropriate fixed length. If the file consists of a single block, we simply call it the file.
2 Security model of multi-server PoR systems
The essential components of multi-server PoR (MPoR) systems are natural generalizations of single-server PoR systems. The first difference is that there are ρ provers and the Verifier might store different messages on each of them. Also, during an audit phase, the Verifier can pick a subset of provers on which it runs the audits. The last crucial difference is that the Extractor has (black-box or non-black-box) access to a subset of proving algorithms corresponding to the provers that the Verifier picked to audit. We detail them below for the sake of completeness.
Let be a set of ρ provers. The Verifier has a message from the message space which he redundantly encodes to .
In the keyed setting, the Verifier picks ρ different keys , one for each of the corresponding provers.
The Verifier gives to . In the case of a keyed scheme, may be also given an additional tag , generated using the key , and .
The Verifier stores some sort of information (say a fingerprint of the encoded message) which allows him to verify the responses made by the provers.
On receiving the encoded message , generates a proving algorithm , which it uses to generate its responses during the auditing phase.
At any time, the Verifier picks an index i, where , and engages in a challenge-response protocol with . In one execution of the challenge-response protocol, the Verifier picks a challenge c and gives it to , and the prover responds. The Verifier then verifies the correctness of the response (based on its fingerprint).
The success probability is the probability, computed over all the challenges, with which the Verifier accepts the response sent by .
The Extractor is given a subset S of the proving algorithms (and in the case of a keyed scheme, the corresponding subset of the keys, ) and outputs a message . The Extractor succeeds if .
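The notion of success probability used in the steps above can be illustrated with a minimal sketch. It assumes a toy challenge space of seven challenges and a hypothetical response function; `correct_response`, `is_accepted` and `lazy_prover` are illustrative names, not part of any scheme in this paper. In a real PoR scheme the Verifier would check responses against its short fingerprint rather than against a full table of correct answers.

```python
from fractions import Fraction

CHALLENGES = list(range(7))  # toy challenge space (assumption)


def correct_response(c):
    """Hypothetical response an honest prover would give to challenge c."""
    return (3 * c + 1) % 7


def is_accepted(c, response):
    """Stand-in for the Verifier's check of a single response."""
    return response == correct_response(c)


def success_probability(prover, challenges=CHALLENGES):
    """Fraction of all challenges on which the Verifier accepts the prover."""
    accepted = sum(1 for c in challenges if is_accepted(c, prover(c)))
    return Fraction(accepted, len(challenges))


def honest_prover(c):
    return correct_response(c)


def lazy_prover(c):
    """Answers correctly except on challenges 0 and 1."""
    return correct_response(c) if c >= 2 else 0
```

Here `success_probability(lazy_prover)` counts five accepted challenges out of seven, while the honest prover is accepted on all of them.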
The above framework does not restrict any provers from interacting with other provers when they receive the encoded message. However, we assume that they do not interact after they have generated a proving algorithm. If we do not include this restriction, then it is not hard to see that one cannot have any meaningful protocol. For example, if provers can interact after they have generated their proving algorithms, then it is possible that one prover stores the entire message and the other provers just relay the challenges to this specific prover and relay back its response to the Verifier.
In contrast to a single-prover PoR scheme, there are two possible ways in which one can define the security of a multiple-prover PoR system. We define them next.
The first security definition corresponds to the “worst case” scenario and is the natural generalization of a single-server PoR system.
A ρ-prover MPoR scheme is -threshold secure if there is an Extractor which, when given any τ proving algorithms, say , succeeds with probability at least ν whenever each of these τ proving algorithms has success probability at least η.
We note that when , we get a standard single-server PoR system. Moreover, the definition captures the worst-case scenario in the sense that it only guarantees extraction if there exists a set of τ proving algorithms, all of which succeed with high enough probability.
The above definition requires that all the τ servers succeed with high enough probability. On the other hand, it might not be the case that all the proving algorithms of the servers picked by the Verifier succeed with the required probability. In fact, even verifying whether or not all the τ proving algorithms have high enough success probability to allow successful extraction might be difficult (see, for example  for more details about this). However, it is possible that some of the proving algorithms succeed with high enough probability to compensate for the failure of the rest of the proving algorithms. For instance, since the provers are allowed to interact before they specify their proving algorithms, it might be the case that the colluding provers decide to store most of the message on a single prover. In this case, even a weaker guarantee that the average success probability is high enough might be sufficient to guarantee a successful extraction. In other words, it is possible to state (and as we show in this paper, achieve) a security guarantee with respect to the average case success probability over all the proving algorithms.
A ρ-prover MPoR scheme is -average secure if the Extractor succeeds with probability at least ν whenever the average of the success probabilities of the ρ proving algorithms is at least η.
Note that the average-case secure system reduces to the standard PoR scheme (with ) when . The following example illustrates that average-case security is possible even when an MPoR system is not possible as per Definition 2.1.
Suppose , and . Further, suppose that , and . Then the hypotheses of Definition 2.1 are not satisfied for . So even if the MPoR scheme is -threshold secure, we cannot conclude that the Extractor will succeed. On the other hand, for the assumed success probabilities, the hypotheses of Definition 2.2 are satisfied. Therefore, if the MPoR scheme is -average secure, the Extractor will succeed.
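The gap between the two definitions can be checked mechanically. The sketch below uses illustrative values only (three provers with success probabilities 0.9, 0.6 and 0.6, a threshold η = 0.7 and τ = 2); the numbers of the example above are not legible in this copy, so these are assumptions chosen to show the same phenomenon.

```python
from fractions import Fraction


def threshold_hypothesis(succ, eta, tau):
    """True if at least tau provers each succeed with probability >= eta."""
    return sum(1 for s in succ if s >= eta) >= tau


def average_hypothesis(succ, eta):
    """True if the average success probability over all provers is >= eta."""
    return sum(succ) / len(succ) >= eta


# Illustrative values (assumptions, not taken from the text).
succ = [Fraction(9, 10), Fraction(6, 10), Fraction(6, 10)]
eta = Fraction(7, 10)
tau = 2

worst_case_ok = threshold_hypothesis(succ, eta, tau)   # only one prover reaches eta
average_case_ok = average_hypothesis(succ, eta)        # the average is exactly 7/10
```

With these values the worst-case hypothesis fails while the average-case hypothesis holds, so only an average-case secure scheme would guarantee extraction. Exact rationals are used to avoid floating-point rounding at the boundary.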
We mentioned at the start of this section that PoR systems were introduced and studied to give assurance of the integrity of the data stored on remote storage. However, the confidentiality aspects of data have not been studied formally in the area of cloud-based PoR systems. There have been a couple of ad hoc solutions in which the messages are encrypted and then stored on the cloud. We believe that, in addition to the standard integrity requirement, privacy of the stored data when multiple provers are involved is also an important requirement. We model the privacy requirement as follows:
An MPoR system is called t-private if no set of adversarial provers of size at most t learns anything about the message stored by the Verifier.
Note that t = 0 corresponds to the case when the MPoR system does not provide any confidentiality to the message. The above definition captures the idea that, even if t provers collude, they do not learn anything about the message. We remark that we can achieve confidentiality without encrypting the message by using secret sharing techniques.
We fix the letter m for the original message, to denote the space from which the message m is picked and M to denote the encoded message. We fix ν to denote the failure probability of the extractor and η to denote the success probability of a proving algorithm. In this paper, we are mainly interested in the case when for both the worst-case and the average-case security. We use n to denote the number of message blocks, assuming the underlying PoR system breaks the message into blocks.
3 Primitives used in this paper
3.1 Ramp schemes
In our construction, we use a primitive related to secret sharing schemes known as ramp schemes. A secret sharing scheme allows a trusted dealer to share a secret between n players so that certain subsets of players can reconstruct the secret from the shares they hold [6, 17].
It is well known that the size of each player’s share in a secret sharing scheme must be at least the size of the secret. If the secret that is to be shared is large, then this constraint can be very restrictive. Schemes for which we can get a certain form of trade-off between share size and security are known as ramp schemes .
Definition 3.1 (Ramp scheme).
Let , and n be positive integers such that . A -ramp scheme is a pair of algorithms, say and , such that, on input a secret , generates n shares, one for each of the n players, such that the following two properties hold:
Reconstruction: Any subset of or more players can pool together their shares and use to compute the secret from the shares that they collectively hold.
Secrecy: No subset of or fewer players can determine any information about the secret .
Suppose the dealer wishes to set up a -ramp scheme with the secret . The dealer picks a finite field with such that . The dealer picks random elements independently from the field and constructs the following polynomial of degree at most 3 over the finite field : . The share for any player is generated by computing . It is easy to see that if two or fewer players come together, they do not learn any information about the secret, and if at least four players come together, they can use Lagrange’s interpolation formula to compute the function f as well as the secret. However, if three players pool together their shares, then they can learn some partial information about one of the other players’ shares. For concreteness, let . Then ; therefore, players , and can compute the value of .
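A polynomial-based ramp scheme of this kind can be sketched as follows. The sketch assumes one natural reading of the example: the secret is a pair of field elements used as the two low-order coefficients, two further coefficients are chosen at random, and player i receives the evaluation of the polynomial at the point i. The prime modulus and the function names are hypothetical choices made for illustration.

```python
import secrets

P = 2**61 - 1  # hypothetical prime field size, chosen only for illustration


def share(secret, n, p=P):
    """Split a 2-element secret into n shares (no information from <= 2 shares)."""
    a0, a1 = secret
    # Two secret coefficients followed by two uniformly random ones.
    coeffs = [a0 % p, a1 % p, secrets.randbelow(p), secrets.randbelow(p)]
    return [(x, sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p)
            for x in range(1, n + 1)]


def _times_linear(poly, root, p):
    """Multiply poly (coefficients, low order first) by (x - root) modulo p."""
    out = [0] * (len(poly) + 1)
    for k, c in enumerate(poly):
        out[k] = (out[k] - root * c) % p
        out[k + 1] = (out[k + 1] + c) % p
    return out


def reconstruct(shares, p=P):
    """Recover the secret from any four shares by Lagrange interpolation."""
    if len(shares) < 4:
        raise ValueError("at least four shares are required")
    pts = shares[:4]
    coeffs = [0, 0, 0, 0]
    for j, (xj, yj) in enumerate(pts):
        basis, denom = [1], 1
        for i, (xi, _) in enumerate(pts):
            if i != j:
                basis = _times_linear(basis, xi, p)
                denom = denom * (xj - xi) % p
        scale = yj * pow(denom, -1, p) % p  # modular inverse (Python 3.8+)
        for k in range(4):
            coeffs[k] = (coeffs[k] + scale * basis[k]) % p
    return coeffs[0], coeffs[1]
```

For instance, after `shares = share((123, 456), 6)`, any four of the six shares passed to `reconstruct` return `(123, 456)`.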
For completeness, we review some of the basic theory concerning the construction of ramp schemes. Linear codes have been used to construct ramp schemes for over thirty years since the work of McEliece and Sarwate . We will consider a construction from an arbitrary code in this paper. The following relation between an arbitrary code (linear or non-linear) and a ramp scheme was shown by Paterson and Stinson .
Let be a code of length N, distance and dual distance . Let . Then there is a -ramp scheme, where and .
Here is the rate of the ramp scheme. If is a generator matrix of a code C with dimension k, then . In other words, .
The construction of a ramp scheme from a code is as follows. Let and ρ be positive integers, and let be the message. Let C be a code of length defined over a finite field . We also require that the first entries of a codeword are the message to be encoded, i.e., the corresponding generator matrix is in standard form. Select a random codeword , and define the shares as .
One can use a Reed–Solomon code to construct a ramp scheme . Let q be a prime and . It is well known that, for a prime q, there is an Reed–Solomon code with . This implies a -ramp scheme over .
3.2 Single-prover PoR system
We start by fixing some notation for PoR schemes that we use throughout the paper. Let Γ be the challenge space, and let Δ be the response space. We denote by the size of a challenge space. Let be the space of all encoded messages. The response function computes the response given the encoded message M and the challenge c.
For an encoded message , we define the response vector that contains all the responses to all possible challenges for the encoded message M. Finally, define the response code of the scheme to be
The codewords in are just the response vectors that we defined above. Previously, we proved the following result for a single-prover PoR scheme.
Suppose that is a proving algorithm for a PoR scheme with response code . If the success probability of the corresponding proving algorithm satisfies , where is the Hamming distance of the code , and γ is the size of the challenge space, then the extractor described in Figure 1 always outputs the message m.
Suppose that is a proving algorithm for a single server PoR scheme with response code . Then there exists a -MPoR system, where is the Hamming distance of the code , and γ is the size of the challenge space Γ.
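The two quantities that drive the single-prover result, the Hamming distance of the response code and the size γ of the challenge space, can be computed directly for a toy scheme. The sketch below assumes a hypothetical response function (a linear polynomial over a field of size 5, evaluated at the challenge). The bound it prints is the standard unique-decoding radius, fewer than d/2 wrong responses, rewritten as a success probability; the exact inequality of the result above is not legible in this copy, so this is an assumption.

```python
from fractions import Fraction
from itertools import product

Q = 5                    # hypothetical toy field size
CHALLENGES = range(Q)    # one challenge per field element, so gamma = 5


def respond(M, c):
    """Toy response function: evaluate a + b*c modulo Q for M = (a, b)."""
    a, b = M
    return (a + b * c) % Q


def response_vector(M):
    """Responses to all possible challenges for the encoded message M."""
    return tuple(respond(M, c) for c in CHALLENGES)


def code_distance(messages):
    """Minimum Hamming distance between response vectors of distinct messages."""
    vectors = [response_vector(M) for M in messages]
    return min(sum(x != y for x, y in zip(u, v))
               for i, u in enumerate(vectors) for v in vectors[i + 1:])


MESSAGES = list(product(range(Q), repeat=2))
d = code_distance(MESSAGES)
gamma = len(CHALLENGES)
# Nearest-codeword decoding is unambiguous when fewer than d/2 responses are
# wrong, i.e. when the success probability exceeds 1 - d / (2 * gamma).
threshold = 1 - Fraction(d, 2 * gamma)
```

In this toy code two distinct linear polynomials agree on at most one challenge, so the distance is 4 and the threshold is 3/5.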
Previously, we gave a modified version of the Shacham–Waters scheme which we showed is secure in the unconditional security setting. We argued that, in the setting of unconditional security, any keyed PoR scheme should be considered to be secure when the success probability of the proving algorithm , denoted by , is defined as the average success probability of the prover over all possible keys (Theorem 3.8). The same reasoning extends to MPoR systems. Therefore, in what follows and in Section 6, when we say a scheme is an -threshold-secure scheme, the term η is the average success probability where the average is computed over all possible keys. We denote the average success probability of a prover over all possible keys by . Previously, we showed the following:
Let be the underlying field, and let be the Hamming weight of the challenges made by the Verifier. Let be the Hamming distance of the space of the encoded message . Suppose that
where is the size of the challenge space and is given by
Then there exists an Extractor that always outputs .
4 Worst-case MPoR based on ramp scheme
In this section, we give our first construction that achieves a worst-case security guarantee. The idea is to use a -ramp scheme in conjunction with a single-server PoR system. The intuition behind the construction is that the underlying PoR system along with the ramp scheme provides the retrievability guarantee and the ramp scheme provides the confidentiality guarantee.
Let . Suppose the Verifier and the provers use a PoR system Π. Let the message to be stored be . The Verifier picks and chooses two random elements to construct a polynomial . The Verifier picks an encoding function and stores on , on , on , on , on , and on .
Let us suppose that the PoR scheme is such that, for a random challenge vector of dimension ρ, say , where the i-th entry would be a challenge to , the corresponding responses of the provers form a vector , where is the correct response of . In other words, on challenge 5 to , the correct response is 3, and so on.
During the audit phase, the Verifier picks any four provers and sends the challenges to the provers. Once all the provers that he chose reply, he verifies their response. For example, suppose the Verifier picks , , and . The Verifier then sends the challenge 5 to , 9 to , 13 to and 6 to . If it gets the responses 3, 1, 13 and 14 back, it accepts; otherwise, it rejects.
We note one of the possible practical deployments of the Ramp-MPoR stated in Figure 3. Let m be a message that consists of sk elements from . The Verifier breaks the message into k blocks of length s each. It then invokes a -ramp scheme on each of these blocks to generate n shares of each of the k blocks. The Verifier then runs a PoR scheme Π to compute the encoded message to be stored on each of the servers by encoding its k shares, one corresponding to each of the k blocks.
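The block-wise deployment described above can be sketched as follows. The sharing function is passed in as a parameter; the `replicate` stand-in used here is only a placeholder (a replication code), and the final encoding of each server's shares by the PoR scheme Π is omitted. All function names are hypothetical.

```python
def split_blocks(message, s):
    """Break a message of s*k field elements into k blocks of length s."""
    if len(message) % s != 0:
        raise ValueError("message length must be a multiple of s")
    return [message[i:i + s] for i in range(0, len(message), s)]


def replicate(block, n):
    """Placeholder sharing function: every server receives the whole block."""
    return [tuple(block) for _ in range(n)]


def distribute(message, s, n, share_block=replicate):
    """Return, for each of the n servers, its list of k shares (one per block)."""
    per_server = [[] for _ in range(n)]
    for block in split_blocks(message, s):
        for server, block_share in zip(per_server, share_block(block, n)):
            server.append(block_share)
    return per_server


stores = distribute(list(range(6)), s=2, n=3)
```

Each entry of `stores` is what one server would hold before the PoR encoding step; a ramp scheme's share-generation algorithm can be substituted for `replicate` without changing `distribute`.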
We prove the following security result for the MPoR scheme presented in Figure 3.
Let Π be an -threshold-secure MPoR with a response code of Hamming distance and the size of challenge space γ. Let be a -ramp scheme. Then Ramp-MPoR, defined in Figure 3, is an MPoR system with the following properties:
Privacy: Ramp-MPoR is -private.
Security: Ramp-MPoR is -threshold secure, where .
The privacy guarantee of Ramp-MPoR is straightforward from the privacy property of the underlying ramp scheme.
For the security guarantee, we need to demonstrate an Extractor that outputs a message if at least t servers succeed with probability at least . The description of our Extractor is as follows:
The Extractor chooses provers and runs the extraction algorithm of the underlying single-server PoR system on each of these provers. In the end, it outputs for the corresponding provers . It defines .
The Extractor invokes the algorithm of the underlying ramp scheme with the elements of . It outputs whatever outputs.
Now note that the Verifier interacts with every independently. We know from the security of the underlying single-server PoR scheme (Theorem 3.6) that there is an extractor that always outputs the encoded message whenever . Therefore, if all the chosen proving algorithms succeed with probability at least η, then the set will have correct shares. From the correctness of the algorithm, we know that the message output in the end by the Extractor will be the message m. ∎
As a special case of the above, we get a simple MPoR system which uses a replication code. A replication code has an encoding function
This is the setting considered by Curtmola et al.
We call a Ramp-MPoR scheme based on a replication code a Rep-MPoR. The schematic description of the scheme is presented in Figure 4, and the scheme is presented in Figure 5. Since a ρ-replication code is a -ramp scheme, a simple corollary to Theorem 4.2 is the following:
Let Π be an -MPoR system with a response code of Hamming distance and the size of challenge space γ. Then Rep-MPoR, formed by instantiating Ramp-MPoR with the replication code based Ramp scheme, is an MPoR system with the following properties:
Privacy: It is 0-private.
Security: It is -threshold secure, where .
The issue with the Rep-MPoR scheme is that it provides no confidentiality for the file. We will come back to this issue later in Section 6.1.
5 Average-case secure MPoR system
In general, it is not possible to verify with certainty whether the success probability of a proving algorithm is above a certain threshold; it is therefore unclear how the Extractor would know which proving algorithms to use for extraction as described in Section 4. In this section, we analyze the average-case security properties of the replication-code-based scheme, Rep-MPoR, described in the last section. This yields an alternative guarantee under which extraction succeeds without the Extractor needing to know whether any particular proving algorithm succeeds with high enough probability.
Recall the scenario introduced in Example 2.3. Here we assumed , and for three provers. Suppose that successful extraction for a particular prover requires . Then extraction would work on only one of these three provers. On the other hand, suppose we have an average-case secure MPoR in which extraction is successful if the average success probability of the three provers is at least 0.7. Then the success probabilities assumed above would be sufficient to guarantee successful extraction.
Let Π be a single-server PoR system with a response code of Hamming distance and the size of challenge space γ. Then Rep-MPoR, defined in Figure 5, is an MPoR system with the following properties:
Privacy: Rep-MPoR is 0-private.
Security: Rep-MPoR is -average secure.
Since the message is stored in its entirety on each of the servers, there is no confidentiality.
For the security guarantee, we need to demonstrate an Extractor that outputs a message if the average success probability of all the provers is at least . The description of our Extractor is as follows:
For all , use to compute the vector , where for all (i.e., for every c, is the response computed by when it is given the challenge c).
Compute R as a concatenation of and find so that is minimized.
Now note that the Verifier interacts with each independently and the Extractor uses the challenge-response step with independent challenges. Let be the success probabilities of the ρ proving algorithms. Let be the average success probability over all the servers and challenges. Therefore, .
First note that, in the case of Figure 5, the response code is of the form
It is easy to see that the distance of the response code is and the length of a challenge is . From the definition of the extractor and Theorem 3.6, it follows that the extraction succeeds if
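The extractor used in this proof, which concatenates the responses of all provers and decodes to the nearest codeword of the replicated response code, can be sketched on a toy instance. The response function (a linear polynomial over a field of size 5) and the three provers below are hypothetical; the sketch searches all candidate messages by brute force, which is feasible only at toy sizes.

```python
from itertools import product

Q = 5                   # hypothetical toy field size
CHALLENGES = range(Q)   # gamma = 5 challenges


def respond(M, c):
    """Toy response function: evaluate a + b*c modulo Q for M = (a, b)."""
    a, b = M
    return (a + b * c) % Q


def extract(provers):
    """Nearest-codeword decoding over the concatenated responses of all provers."""
    observed = [p(c) for p in provers for c in CHALLENGES]

    def distance(M):
        replicated = [respond(M, c) for _ in provers for c in CHALLENGES]
        return sum(x != y for x, y in zip(observed, replicated))

    return min(product(range(Q), repeat=2), key=distance)


# The stored message is (2, 3); its correct responses are [2, 0, 3, 1, 4].
prover1 = [2, 0, 3, 1, 4].__getitem__   # always correct
prover2 = [1, 4, 2, 1, 4].__getitem__   # wrong on three of five challenges
prover3 = [2, 0, 3, 1, 0].__getitem__   # wrong on one challenge
```

Here `extract([prover1, prover2, prover3])` recovers `(2, 3)`: the four errors in total are fewer than half the distance of the replicated code (3 · 4 / 2 = 6). Decoding `prover2` alone returns the wrong message `(1, 3)`, which illustrates how a high average can compensate for one poorly performing prover.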
5.1 Hypothesis testing for Rep-MPoR
For the purposes of auditing whether a file is being stored appropriately, it is necessary to have a mechanism for determining whether the success probability of a prover is sufficiently high. In the case of the replication-code-based MPoR with worst-case security, we are interested in the success probabilities of individual provers, and the analysis can be carried out as in our previous work on single-prover PoR. In the case of Rep-MPoR, however, we wish to determine whether the average success probability of the set of provers is at least η. This amounts to distinguishing the null hypothesis
from the alternative hypothesis
Suppose we send c challenges to each server. If a given server has success probability , then the number of correct responses received follows the binomial distribution . If the success probabilities were the same for each server, then the sum of the number of successes over all the servers would also follow a binomial distribution. However, we are also interested in the case in which these success probabilities differ, in which case the total number of successes follows a Poisson binomial distribution, which is more complicated to work with. In order to establish a test that is conceptually and computationally easier to apply, we will instead rely on the observation that, in cases where the average success probability is high enough to permit extraction, the failure rates of the servers are relatively low.
For a given server , let denote the probability of failure. For r challenges, the number of failures follows the binomial distribution . Provided that r is sufficiently large and is sufficiently low, then can be approximated by the Poisson distribution. The Poisson distribution is used to model the scenario where discrete events are occurring independently within a given time period with an expected rate of λ events during that period. The probability of observing k events within that period is given by
$$\frac{\lambda^{k} e^{-\lambda}}{k!}.$$
The mean and variance of this distribution are both equal to λ. For our purposes, the advantage of using this approximation is that the sum of ρ independent Poisson-distributed variables is itself Poisson distributed, with parameter equal to the sum of the individual parameters, even when these parameters all differ. In the case where the average failure probability is low, this Poisson distribution should provide a reasonable approximation to the actual distribution of the total number of failed challenges.
To demonstrate the appropriateness of the Poisson approximation for this application, suppose we have five servers, whose failure probabilities are expressed as . Let t be the number of trials per server and b the total number of observed failures out of the 5t trials. Table 1 gives both the exact cumulative probability of observing up to b failures, and the Poisson approximation of this cumulative probability, for a range of values for .
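The comparison described above can be reproduced numerically. The following is a minimal sketch, not the computation used to produce Table 1: the failure probabilities and the number of trials are hypothetical values chosen only for illustration, and the function names are ours. It computes the exact cumulative probability of at most b failures (a Poisson binomial distribution, evaluated by dynamic programming) alongside the Poisson approximation whose parameter is the sum of the expected failure counts.

```python
import math

def exact_failure_cdf(failure_probs, trials_per_server, b):
    """Exact Pr[total failures <= b] when server i fails each trial with prob f_i."""
    # dist[k] = probability of exactly k failures so far, tracked only for k <= b.
    dist = [1.0] + [0.0] * b
    for f in failure_probs:
        for _ in range(trials_per_server):
            # Update in place from high k to low k so dist[k - 1] is still the old value.
            for k in range(b, 0, -1):
                dist[k] = dist[k] * (1.0 - f) + dist[k - 1] * f
            dist[0] *= 1.0 - f
    return sum(dist)

def poisson_cdf(lam, b):
    """Pr[X <= b] for X following the Poisson distribution with mean lam."""
    term = math.exp(-lam)
    total = term
    for k in range(1, b + 1):
        term *= lam / k
        total += term
    return total

def poisson_failure_cdf(failure_probs, trials_per_server, b):
    """Poisson approximation: the mean is the total expected number of failures."""
    lam = trials_per_server * sum(failure_probs)
    return poisson_cdf(lam, b)

if __name__ == "__main__":
    # Hypothetical failure probabilities for five servers (not taken from Table 1).
    failure_probs = [0.01, 0.02, 0.03, 0.04, 0.05]
    t = 200  # hypothetical number of trials per server
    for b in (20, 30, 40):
        exact = exact_failure_cdf(failure_probs, t, b)
        approx = poisson_failure_cdf(failure_probs, t, b)
        print(f"b={b}: exact={exact:.4f}, Poisson approximation={approx:.4f}")
```

With low failure probabilities such as these, the two cumulative probabilities should stay close to one another, which is the behaviour the approximation relies on.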
Let b denote the number of incorrect responses we have received from the cρ challenges given to the provers. Suppose that $H_0$ is true, so that the expected number of failures is at least $c\rho(1-\eta)$. Based on our approximation, the probability that the number of failures is at most b is at most
$$\sum_{k=0}^{b} \frac{e^{-c\rho(1-\eta)}\,\bigl(c\rho(1-\eta)\bigr)^{k}}{k!}.$$
If this probability is less than 0.05, we reject $H_0$ and accept the alternative hypothesis. However, if the probability is greater than 0.05, then there is not enough evidence to reject $H_0$ at the 5 % significance level, and so we continue to suspect that the file is not being stored appropriately.
We can express this test neatly using a confidence interval. We define a 95 % upper confidence bound by
$$\lambda_U = \inf\Bigl\{\lambda : \sum_{k=0}^{b} \frac{e^{-\lambda}\lambda^{k}}{k!} < 0.05\Bigr\}.$$
This represents the smallest parameter choice for the Poisson distribution for which the probability of obtaining b or fewer incorrect responses is less than 0.05. Then $[0, \lambda_U]$ is a 95 % confidence interval for the mean number of failures, so we reject $H_0$ whenever $c\rho(1-\eta)$ lies outside this interval. The value of $\lambda_U$ can be determined easily by exploiting a connection with the chi-squared distribution . We have
$$\lambda_U = \tfrac{1}{2}\,\chi^{2}_{2(b+1),\,0.95},$$
where $\chi^{2}_{2(b+1),\,0.95}$ denotes the 95th percentile of the chi-squared distribution with $2(b+1)$ degrees of freedom, and so the appropriate value of $\lambda_U$ can readily be obtained from a table for the chi-squared distribution.
As an example of using the given formula to calculate a confidence interval, suppose we do 200 trials on each of five servers (so there are 1000 trials in total), and we observe 50 failures in total. Then the resulting confidence interval is approximately $[0, 63.29]$. Suppose we wish to know whether the success probability is at least . We have . This is outside of that interval, and hence we conclude there is enough evidence to reject $H_0$ at the 5 % significance level. However, to test whether the success probability was greater than 0.95 we see that $1000 \times (1 - 0.95) = 50$. Since 50 lies within the interval, we conclude there is insufficient evidence to reject $H_0$ at the 5 % significance level.
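As an illustration of the test, the sketch below computes the 95 % upper confidence bound numerically rather than from a chi-squared table, by bisecting on the Poisson parameter. It is a sketch under stated assumptions: the function names are ours, the bound is found by direct search on the Poisson cumulative probability, and the threshold 0.9 in the usage lines is a hypothetical value chosen for illustration (0.95 and the counts of 200 trials, five servers and 50 failures come from the worked example in the text).

```python
import math

def poisson_cdf(lam, b):
    """Pr[X <= b] for X following the Poisson distribution with mean lam."""
    term = math.exp(-lam)
    total = term
    for k in range(1, b + 1):
        term *= lam / k
        total += term
    return total

def upper_confidence_bound(b, alpha=0.05, tol=1e-9):
    """Smallest Poisson mean for which Pr[X <= b] drops below alpha (by bisection)."""
    lo, hi = 0.0, float(b + 1)
    # Grow the bracket until the cumulative probability at hi is below alpha.
    while poisson_cdf(hi, b) >= alpha:
        hi *= 2.0
    # The cumulative probability is decreasing in the mean, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if poisson_cdf(mid, b) >= alpha:
            lo = mid
        else:
            hi = mid
    return hi

def reject_null(b, challenges_per_server, num_servers, eta, alpha=0.05):
    """Reject H0 (average success probability below eta) given b observed failures."""
    expected_failures = challenges_per_server * num_servers * (1.0 - eta)
    return poisson_cdf(expected_failures, b) < alpha

if __name__ == "__main__":
    bound = upper_confidence_bound(50)
    print(f"95% upper confidence bound for b=50: {bound:.2f}")
    # eta = 0.9 is a hypothetical threshold; eta = 0.95 is the one used in the text.
    print("reject H0 for eta=0.90:", reject_null(50, 200, 5, 0.90))
    print("reject H0 for eta=0.95:", reject_null(50, 200, 5, 0.95))
```

Comparing the expected number of failures with the bound and comparing the cumulative probability with 0.05 are two views of the same decision, so `reject_null` uses the latter directly.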
We give a comparison between exact cumulative probability and approximation by Poisson distribution in Table 1.
6 Optimization using the keyed Shacham–Waters scheme
In the last three sections, we gave constructions of MPoR schemes using ramp schemes, linear secret-sharing schemes, replication codes and a single-prover PoR system. In this section, we show a specific instantiation of our scheme using the keyed scheme of Shacham and Waters [15, 16] for a single-server PoR system.
6.1 Extension of the keyed Shacham–Waters scheme to MPoR
If we instantiate the Rep-MPoR scheme (described in Section 4) with the modified Shacham–Waters scheme of , then we need one key that consists of values in . However, in this case, we do not have any privacy. In particular, we have the following extension of Corollary 4.3.
Corollary 6.1. Let Π be an -PoR system of Shacham and Waters  with a response code of Hamming distance and the size of challenge space γ, where is given by equation (3.1). Then Rep-MPoR instantiated with the Shacham–Waters scheme is an MPoR system with the following properties:
Privacy: It is 0-private.
Security: It is -threshold secure, where .
Storage Overhead: The Verifier needs to store field elements, and every needs to store field elements.
The issue with the Rep-MPoR scheme is that there is no confidentiality of the file. In what follows, we improve the privacy guarantee of the MPoR scheme described above. Our starting point is an instantiation of the Ramp-MPoR scheme, defined in Figure 3, with the Shacham–Waters scheme. We then reduce the storage required of the Verifier in two steps.
6.2 Optimized version of the multi-server Shacham–Waters scheme
We follow two steps to get an MPoR scheme based on the Shacham–Waters scheme with a reduced storage requirement for the Verifier, while improving the confidentiality guarantee.
In the first step, stated in Theorem 6.2, we improve the privacy guarantee of the MPoR scheme to get a -private MPoR scheme (where is an integer). The Verifier in this scheme has to store field elements. When the underlying field is , the verifier has to store bits.
In the second step, stated in Theorem 6.3, we reduce the storage requirement of the Verifier from to field elements for some integer without affecting the privacy guarantee. When the underlying field is , the verifier has to store bits.
To improve the privacy guarantee of Corollary 6.1 to say, -private (as per Definition 2.4), we use a Ramp-MPoR scheme and ρ different keys, where each key consists of values in . The Verifier generates ρ shares of every message block using a ramp scheme, then encodes the shares, and finally computes the tag for each of these encoded shares.
We now give more details. Let be the message. The Verifier computes the shares of every message block using a -ramp scheme. It then encodes all the shares using the encoding scheme of the PoR scheme. Let the resulting encoded shares be for . In other words, the result of the above two steps is a set of ρ encoded shares, each of which is an n-tuple in . The Verifier now picks random values for and computes the tags as follows:
The Verifier gives each prover its tuple of encoded messages and the corresponding tags. We call this scheme the Basic-MPoR scheme. The following is straightforward from Theorem 4.2.
Theorem 6.2. Let Π be an -PoR scheme of Shacham and Waters  with a response code of Hamming distance and the size of challenge space , where is given by equation (3.1). Let be a -ramp scheme. Then Basic-MPoR defined above is an MPoR scheme with the following properties:
Privacy: Basic-MPoR is -private.
Security: Basic-MPoR is -threshold secure, where .
Storage Overhead: The Verifier needs to store field elements and every needs to store field elements.
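The tagging step of Basic-MPoR can be illustrated with a short sketch. It rests on assumptions that should be read as such: we take the keyed tag of an encoded block m_j under the key (a, b_1, ..., b_n) to have the linear form b_j + a·m_j over a prime field, and a response to a challenge vector to consist of the corresponding linear combinations of blocks and tags. The field size, the function names and the all-ones challenge are hypothetical choices for illustration, and the ramp-scheme sharing and encoding steps are omitted (the encoded shares are taken as given).

```python
import secrets

Q = 2**61 - 1  # a Mersenne prime, used here only as an illustrative field size

def keygen(n):
    """One key per prover: a value a and an n-tuple b, all uniform in F_Q."""
    a = secrets.randbelow(Q)
    b = [secrets.randbelow(Q) for _ in range(n)]
    return a, b

def tag(key, blocks):
    """Assumed tag form: tag_j = b_j + a * m_j (mod Q) for each encoded block m_j."""
    a, b = key
    return [(b_j + a * m_j) % Q for b_j, m_j in zip(b, blocks)]

def respond(blocks, tags, challenge):
    """Prover's response: the challenge-weighted sums of the blocks and of the tags."""
    mu = sum(v * m for v, m in zip(challenge, blocks)) % Q
    sigma = sum(v * s for v, s in zip(challenge, tags)) % Q
    return mu, sigma

def verify(key, challenge, response):
    """Verifier's check: sigma must equal a * mu + sum_j v_j * b_j (mod Q)."""
    a, b = key
    mu, sigma = response
    expected = (a * mu + sum(v * b_j for v, b_j in zip(challenge, b))) % Q
    return sigma == expected

def tag_all_provers(encoded_shares):
    """Basic-MPoR style tagging: an independent key for each prover's encoded share."""
    keys = [keygen(len(share)) for share in encoded_shares]
    tags = [tag(key, share) for key, share in zip(keys, encoded_shares)]
    return keys, tags

if __name__ == "__main__":
    shares = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # placeholder encoded shares
    keys, tags = tag_all_provers(shares)
    challenge = [1, 1, 1]
    for key, share, share_tags in zip(keys, shares, tags):
        print(verify(key, challenge, respond(share, share_tags, challenge)))
```

Under these assumptions each prover's key is as long as its encoded share plus one element, which is the storage cost for the Verifier that the subsequent step aims to reduce.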
In the construction mentioned above, the Verifier needs to store elements of , which is almost the same as the total storage requirements of all the provers. In , we encountered the same issue, where the Verifier has to store as much secret information as the size of the message. This seems to be the general drawback in the unconditionally secure setting. However, in the case of MPoR, we can improve the storage requirement of the Verifier as shown in the next step.
In this step, we improve the above-described MPoR scheme to achieve a considerable reduction in the storage requirement of the Verifier. The resulting scheme also provides unbounded audit capability against computationally unbounded adversarial provers, and it also ensures -privacy.
The main observation that results in the reduction in the storage requirements of the Verifier is the fact that we can partially derandomize the keys generated by the Verifier. We use one of the most common techniques in derandomization. The keys in this scheme are generated by -wise independent functions. Our construction works as follows: We pick random polynomials, , each of degree at most . Then the Verifier computes the secret key by evaluating the polynomials and on ρ different values, say
for and . The Verifier then computes the encoded shares and their corresponding tags as in Basic-MPoR, i.e.,
Theorem 6.3. Let be a -ramp scheme. Let Π be a single-prover Shacham–Waters scheme  with a response code of Hamming distance and the size of challenge space γ. Then SW-MPoR, defined in Figure 6, is an MPoR system with the following properties:
Privacy: SW-MPoR is -private.
Security: SW-MPoR is -threshold secure, where .
Storage Overhead: The Verifier needs to store field elements, and every (for ) needs to store field elements.
Proof. The privacy guarantee of SW-MPoR follows directly from the secrecy property of the underlying ramp scheme.
For the security guarantee, we have to show an explicit construction of the Extractor that, on input proving algorithms , outputs m if for at least proving algorithms. However, there is a subtle issue that we have to deal with before using the proof of Theorem 4.2, because of the relation between every message and tag pair. It was previously noted by us  that if the adversarial prover learns the secret key, then it can break the PoR scheme. We first argue that a set of colluding provers cannot have an undue advantage from exploiting the linear structure of the message-tag pairs.
We now prove that any set of provers does not learn anything about the keys generated using polynomials of degree at most . The idea is very similar to the single-prover case. Previously , we noted that in the single-prover case, for an n-tuple encoded message, the key is a tuple of uniformly random elements in . Further, from the point of view of a prover, there are q possible keys – the value of a determines the n-tuple uniquely, but a is completely undetermined. In the MPoR case, we have ρ keys. Each prover in a given set of provers has q possible keys, as discussed above. However, it is conceivable that they can use their collective knowledge to learn something about the keys. In what follows, we show that they cannot determine any additional information about their keys by combining the information they hold collectively.
Let be the indices of any arbitrary set of provers. Let denote the set of possible keys for , for . Consider any list of keys . Recall that (for ) has the form , where and (for ) are generated by random polynomials of degree . We first consider (for ). Note that the vector is defined uniquely by and the set of all encoded message-tag pairs. We have already shown that any set of provers cannot learn anything about the random polynomial used to generate the for all . We use the following well-known fact to show that any set of provers does not learn any additional information about the keys.
Fact 6.4. Let be an integer, let q be a prime number, and let be a finite field. Let be elements picked uniformly at random. Define for all . Then,
Since is uniformly distributed in , the probability computed in equation (6.1) is actually equal to .
By construction, is a random polynomial of degree at most . Fact 6.4 then implies that any combination of is equally likely. A similar argument, with the ’s replaced by the ’s (for all and ) and the polynomial replaced by (for ), gives that all sets of keys are equally likely. In other words, the set of provers in the set I cannot determine any additional information about their keys by combining the information they hold collectively.
We now complete the security proof by describing an Extractor that outputs the file if provers succeed with high enough probability. The description of the Extractor and its analysis are the same as those of Theorem 4.2. We give them for the sake of completeness.
The Extractor chooses provers and runs the extraction algorithm of the underlying single-server PoR system on each of these provers. In the end, it outputs for the corresponding provers . It defines . Note that the Extractor of the underlying PoR scheme has already computed on the set .
The Extractor invokes the algorithm of the underlying ramp scheme with the elements of to compute .
Now note that the Verifier interacts with every independently. We know from the security of the underlying PoR scheme of Shacham–Waters that there is an extractor that always outputs the encoded message whenever . Therefore, if all the chosen proving algorithms succeed with probability at least η over all possible keys, then the set will have correct shares. From the correctness of the algorithm and , we know that the message output in the end by the Extractor will be the message m.
For the storage requirement, the Verifier has to store the coefficients of all the random polynomials , which amounts to a total of field elements. ∎
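The key derandomization used in this proof can be sketched in a few lines. The sketch assumes that the Verifier draws one random polynomial for the value a and one for each of the n values b_j (so n + 1 polynomials in total), evaluates them at the distinct points 1, ..., ρ to obtain the ρ keys, and stores only the coefficients. The degree bound is left as a parameter, and all function names are ours. The second function checks, by exhaustive enumeration over a small prime field, the standard property behind Fact 6.4: the values of a uniformly random polynomial of degree at most d at up to d + 1 distinct points are uniformly distributed.

```python
import itertools
import secrets

def eval_poly(coeffs, x, q):
    """Evaluate sum_j coeffs[j] * x**j modulo q using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % q
    return acc

def derive_keys(num_provers, n, degree, q):
    """Derive one key (a, b_1..b_n) per prover from n + 1 random polynomials.

    Assumes q is a prime larger than num_provers so the points 1..rho are distinct.
    Returns the coefficient lists (all the Verifier stores) and the derived keys.
    """
    polys = [[secrets.randbelow(q) for _ in range(degree + 1)] for _ in range(n + 1)]
    keys = []
    for point in range(1, num_provers + 1):
        values = [eval_poly(p, point, q) for p in polys]
        keys.append((values[0], values[1:]))
    return polys, keys

def evaluations_are_uniform(q, degree, points):
    """Exhaustively check that every value tuple at `points` is equally likely."""
    counts = {}
    for coeffs in itertools.product(range(q), repeat=degree + 1):
        ys = tuple(eval_poly(coeffs, x, q) for x in points)
        counts[ys] = counts.get(ys, 0) + 1
    expected = q ** (degree + 1 - len(points))
    return len(counts) == q ** len(points) and all(
        c == expected for c in counts.values()
    )

if __name__ == "__main__":
    polys, keys = derive_keys(num_provers=4, n=3, degree=1, q=101)
    stored = sum(len(p) for p in polys)
    print("field elements stored by the Verifier:", stored)
    print("degree 2, three points uniform:", evaluations_are_uniform(5, 2, (1, 2, 3)))
```

In this sketch the Verifier's storage is (n + 1)(d + 1) field elements for degree bound d, independent of the number of provers, whereas storing ρ independent keys would take ρ(n + 1) elements.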
7 Conclusion and future works
In this paper, we studied PoR systems when multiple provers are involved (MPoR). We motivated and defined the security of MPoR in the worst-case (Definition 2.1) and the average-case (Definition 2.2) settings, and extended the hypothesis testing techniques used in the single-server setting  to the multi-server setting. We also motivated the study of confidentiality of the outsourced message. We gave MPoR schemes which are secure under both these security definitions and provide reasonable confidentiality guarantees even when there is no restriction on the computational power of the servers. At the end of this paper, we looked at an optimized version of the MPoR system when instantiated with the unconditionally secure version of the Shacham–Waters scheme . We showed that, in the multi-server setting with computationally unbounded provers, one can overcome the limitation that the verifier needs to store as much secret information as the provers.
Our paper leaves several open problems. We list two of them below:
Our approach works in the privately verifiable setting, i.e., the entity that wishes to verify the validity of stored data is the same entity that stored the data. It would be interesting to see if our schemes can be extended to the publicly verifiable setting.
We assume that the provers do not interact with each other after they receive the encoded files. There is a vast literature on mitigating collusion. An interesting direction is to see whether our schemes can be combined with recent advances in schemes that are secure against colluding players in the distributed setting, in order to remove this assumption.
Notation used in this paper
dual of a code C
distance of the response code
distance of a codeword
dual distance of a code
Hamming distance between two vectors
generator matrix of a code
length of a message
key (in a keyed scheme)
number of message-blocks
i-th message block
message outputted by the Extractor
i-th encoded message
i-th encoded message on
encoded message space
number of provers
proving algorithm of i-th Prover
order of underlying finite field
response vector for encoded message M
tag (in a keyed scheme)
success probability of proving algorithm
number of possible challenges
number of users
Thanks to Andris Abakuks and Simon Skene for some helpful discussions of statistics.
G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, O. Khan, L. Kissner, Z. N. J. Peterson and D. Song, Remote data checking using provable data possession, ACM Trans. Inform. Sys. Security 14 (2011), Paper No. 12.
G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, L. Kissner, Z. N. J. Peterson and D. X. Song, Provable data possession at untrusted stores, Proceedings of the 14th ACM Conference on Computer and Communications Security, ACM, New York (2007), 598–609.
G. Ateniese, Ö. Dagdelen, I. Damgård and D. Venturi, Entangled cloud storage, IACR Cryptology ePrint Archive (2012), .
G. Ateniese, R. Di Pietro, L. V. Mancini and G. Tsudik, Scalable and efficient provable data possession, Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, ACM, New York (2008), 1–9.
G. Ateniese, S. Kamara and J. Katz, Proofs of storage from homomorphic identification protocols, Advances in Cryptology—ASIACRYPT 2009, Springer, Berlin (2009), 319–333.
G. R. Blakley, Safeguarding cryptographic keys, Proceedings of the National Computer Conference, AFIPS, New York (1979), 313–317.
G. R. Blakley and C. Meadows, Security of ramp schemes, Advances in Cryptology—CRYPTO 1985, Springer, Berlin (1985), 242–268.
K. D. Bowers, A. Juels and A. Oprea, Proofs of retrievability: Theory and implementation, Proceedings of the 2009 ACM Workshop on Cloud Computing Security, ACM, New York (2009), 43–54.
R. Curtmola, O. Khan, R. C. Burns and G. Ateniese, MR-PDP: Multiple-replica provable data possession, The 28th International Conference on Distributed Computing Systems, IEEE Press, Piscataway (2008), 411–420.
Y. Dodis, S. P. Vadhan and D. Wichs, Proofs of retrievability via hardness amplification, Theory of Cryptography, Springer, Berlin (2009), 109–127.
A. Juels and B. S. Kaliski, Jr., PORs: Proofs of retrievability for large files, Proceedings of the 14th ACM Conference on Computer and Communications Security, ACM, New York (2007), 584–597.
S. Kamara and K. Lauter, Cryptographic cloud storage, Financial Cryptography and Data Security, Springer, Berlin (2010), 136–149.
M. B. Paterson and D. R. Stinson, A simple combinatorial treatment of constructions and threshold gaps of ramp schemes, Cryptogr. Commun. 5 (2013), 229–240. DOI: 10.1007/s12095-013-0082-1.
M. B. Paterson, D. R. Stinson and J. Upadhyay, A coding theory foundation for the analysis of general unconditionally secure proof-of-retrievability schemes for cloud storage, J. Math. Cryptol. 7 (2013), 183–216.
H. Shacham and B. Waters, Compact Proofs of Retrievability, Advances in Cryptology—ASIACRYPT 2008, Springer, Berlin (2009), 90–107.
K. Ulm, Simple method to calculate the confidence interval of a standardized mortality ratio (SMR), Amer. J. Epidemiology 131 (1990), 373–375. DOI: 10.1093/oxfordjournals.aje.a115507.
C. Wang, Q. Wang, K. Ren and W. Lou, Privacy-preserving public auditing for data storage security in cloud computing, IEEE Proceedings INFOCOM 2010, IEEE Press, Piscataway (2010), 1–9.
Summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region, .
Why is decentralized and distributed file storage critical for a better web?, .
© 2018 Walter de Gruyter GmbH, Berlin/Boston
This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.