# Multi-prover proof of retrievability

Maura B. Paterson, Douglas R. Stinson and Jalaj Upadhyay

# Abstract

There has been considerable recent interest in “cloud storage” wherein a user asks a server to store a large file. One issue is whether the user can verify that the server is actually storing the file, and typically a challenge-response protocol is employed to convince the user that the file is indeed being stored correctly. The security of these schemes is phrased in terms of an extractor which will recover the file given any “proving algorithm” that has a sufficiently high success probability. This forms the basis of proof-of-retrievability (PoR) systems. In this paper, we study multiple-server PoR (MPoR) systems. We formalize security definitions for two possible scenarios: (i) a threshold of servers succeeds with high enough probability (worst case), and (ii) the average of the success probability of all the servers is above a threshold (average case). We also motivate the study of confidentiality of the outsourced message. We give MPoR schemes which are secure under both these security definitions and provide reasonable confidentiality guarantees even when there is no restriction on the computational power of the servers. We also show how classical statistical techniques previously used by us can be extended to evaluate whether the responses of the provers are accurate enough to permit successful extraction. We also look at one specific instantiation of our construction, based on the unconditionally secure version of the Shacham–Waters scheme, which gives reasonable security and privacy guarantees. We show that, in the multi-server setting with computationally unbounded provers, one can overcome the limitation that the verifier needs to store as much secret information as the provers.

MSC 2010: 94A60

## 1 Introduction

In the recent past, there has been a lot of activity on remote storage and the associated cryptographic problem of integrity of the stored data. This question becomes even more important when there are reasons to believe that the remote servers might act maliciously, i.e., one or more servers can delete (whether accidentally or on purpose) a part of the data since there is a good chance that the data will never be accessed, and hence, the client would never find out! In order to assuage such concerns, one would prefer to have a simple auditing system that convinces the client if and only if the server has the data. Such audit protocols, called proof-of-retrievability (PoR) systems, were introduced by Juels and Kaliski [11], and closely related proof-of-data-possession (PDP) systems were introduced by Ateniese et al. [2].

In a PoR protocol, a client stores a message m on a remote server and keeps only a short private fingerprint locally. At some later time, when the client wishes to verify the integrity of its message, it can run an audit protocol in which it acts as a verifier while the server proves that it has the client’s data. The formal security of a PoR protocol is expressed in terms of an extractor – there exists an extractor with (black-box or non-black-box) access to the proving algorithm used by the server to respond to the client’s challenge, such that the extractor retrieves the original message given any adversarial server which passes the audits with a threshold probability. Apart from this security requirement, two practical requirements of any PoR system would be to have a reasonable bound on the communication cost of every audit and small storage overhead on both the client and server.

PoR systems were originally defined for the single-server setting. However, in the real world, it is highly likely that a client would store its data on more than one server. This might be due to a variety of reasons. For example, a client might wish to have a certain degree of redundancy if one or more servers fails. In this case, the client is more likely to store multiple copies of the same data. Another possible scenario could be that the client does not trust a single server with all of its data. In this case, the client might distribute the data across multiple servers. Both of these settings have been studied previously in the literature.

The first such study was initiated by Curtmola et al. [9], who considered the first of the above two cases. They addressed the problem of storing copies of a single file on multiple servers. This is an attractive solution considering the fact that replication is a fundamental principle in ensuring the availability and durability of data. Their system allows the client to audit a subset of servers even if some of them collude.

On the other hand, Bowers, Juels and Oprea [8] considered the second of the above two cases. They studied a system where the client’s data is distributed and stored on different servers. This ensures that none of the servers has the whole data.

Both of these systems covered one specific instance of the wide spectrum of possibilities when more than one server is involved. For example, none of the works mentioned above addresses the question of the privacy of data. Both of them argue that, for privacy, the client can encrypt its file before storing it on the servers. These systems are secure only in the computational setting and the privacy guarantee is dependent on the underlying encryption scheme. On the other hand, there are known primitives in the setting of distributed systems, like secret sharing schemes, that are known to be unconditionally secure. Moreover, we can also utilize cross-server redundancy to get more practical systems.

### 1.1 Our contributions

In Section 2, we give the formal description of multi-server PoR (MPoR) systems. We state the definitions for worst-case and the average-case secure MPoR systems. We also motivate the privacy requirement and state the privacy definition for MPoR systems. In Section 3, we define various primitives to the level required to understand this paper.

In Section 4, we give a construction of an MPoR scheme that achieves worst-case security when the malicious servers are computationally unbounded. Our construction is based on ramp schemes and a single-server PoR scheme. Our construction achieves confidentiality of the message. To exemplify our scheme, we instantiate this scheme with a specific form of ramp scheme.

In Section 5, we give a construction of an MPoR scheme that achieves average-case security against computationally unbounded adversaries. For an MPoR system that affords average-case security, we also show that an extension of classical statistical techniques previously used by us [15] can be used to provide a basis for estimating whether the responses of the servers are accurate enough to allow successful extraction.

One of the benefits of an MPoR system is that it provides cross-server redundancy. In the past, this feature has been used by Bowers, Juels and Oprea [8] to propose a multi-server system called HAIL. We first note that the constructions in Section 4 and Section 5 do not provide any improvement on the storage overhead of the server or the client. In Section 6, we give a construction based on the Shacham–Waters protocol [16] that allows significant reduction of the storage overhead of the client in the multi-server setting.

### 1.2 Related works

The concept of proof of retrievability is due to Juels and Kaliski [11]. A PoR scheme incorporates a challenge-response protocol in which a verifier can check that a message is being stored correctly, along with an extractor that will actually reconstruct the message, given the algorithm of a “prover” who is able to correctly respond to a sufficiently high percentage of challenges.

There are also papers that describe the closely related (but slightly weaker) idea of a proof-of-data-possession scheme (PDP scheme), e.g., [2]. A PDP scheme permits the possibility that not all of the message blocks can be reconstructed. Ateniese et al. [2] also introduced the idea of using homomorphic authenticators to reduce the communication complexity of the system. This scheme was improved in a follow-up work by Ateniese et al. [4]. Shacham and Waters [16] later showed that the scheme of Ateniese et al. [1] can be transformed into a PoR scheme by constructing an extractor that extracts the file from the responses of the prover on the audits.

Bowers, Juels and Oprea [8] extended the idea of Juels and Kaliski [11] and used error-correcting codes. The main difference in their construction is that they use the idea of an “outer” and an “inner” code (in the same vein as concatenated codes), to get a good balance between the extra storage overhead and computational overhead in responding to the audits. Dodis, Vadhan and Wichs [10] provided the first example of an unconditionally secure PoR scheme, also constructed from an error-correcting code, with extraction performed through list decoding in conjunction with the use of an almost-universal hash function. They also give different constructions depending on the computational capabilities of the server. Previously [15], we studied PoR schemes in the setting of unconditional security and showed some close connections to error-correcting codes.

Recently, Ateniese, Kamara and Katz [5] defined the framework of proof-of-storage systems to understand PDP and PoR systems in a unified manner. They used homomorphic identification schemes to give efficient proof-of-storage systems in the random-oracle model, and they showed that existing PoR [16] and PDP [2] schemes can be seen as specific instantiations of their framework. Wang et al. [19] gave the first privacy-preserving, publicly auditable proof-of-storage systems. We refer the readers to the survey by Kamara and Lauter [12] regarding the architecture of proof-of-storage systems.

#### Distributed cloud computing.

All the constructions mentioned above consider a single-server system; such systems are prone to failures that can lead to catastrophic problems [20]. However, proof-of-storage systems have also been studied in the setting where there is more than one server or more than one client. The first such setting was studied by Curtmola et al. [9]. They studied a multiple-replica PDP system, which is the natural generalization of a single-server PDP system to t servers.

Bowers, Juels and Oprea [8] introduced a distributed system that they called HAIL. Their system allows a set of provers to prove the integrity of a file stored by a client. The idea in HAIL is to exploit the cross-prover redundancy. They considered an active and mobile adversary that can corrupt the whole set of provers.

Recently, Ateniese et al. [3] considered the problem from the client side, where n clients store their respective files on a single prover in a manner such that the verification of the integrity of a single client’s file simultaneously gives the integrity guarantee of the files of all the participating clients. They called such a system an entangled cloud storage.

### 1.3 Comparison with Bowers, Juels and Oprea

The focus of this paper is PoR systems in the distributed setting; therefore, we only compare our work with existing works in the distributed setting. The scheme of Curtmola et al. [9] only considers multiple replicas of the same underlying PDP system, while the construction of Ateniese et al. [3] is for the multiple-client setting. Consequently, the scheme of Bowers, Juels and Oprea [8] is closest to ours. However, there are a few key differences.

1. (i)

The construction of Bowers, Juels and Oprea [8] is secure only in the computational setting, while we provide security in the setting of unconditional security.

2. (ii)

Bowers, Juels and Oprea [8] use various tools and algorithms to construct their systems, including error-correcting codes, pseudo-random functions, message authentication codes and universal hash function families. On the other hand, we only use ramp schemes in our constructions, making our schemes easier to state and analyze, and arguably simpler to implement.

3. (iii)

We consider two types of security guarantees, namely, the worst-case scenario and the average-case scenario. On the other hand, Bowers, Juels and Oprea [8] only consider the worst-case scenario.

4. (iv)

The construction of Bowers, Juels and Oprea [8] only aims to protect the integrity of the message, while we consider both the privacy and integrity of the message. Privacy of data has emerged as an important requirement in cloud storage due to recent attacks [21].

5. (v)

We work under a stronger requirement than [8] – we require extraction to succeed with probability equal to 1, whereas in [8], extraction succeeds with probability close to 1, depending in part on properties of a certain class of hash functions used in the protocol.

We use the term Prover to identify any server that stores the file of a client. We use the term Verifier for any entity that verifies whether the file of a client is stored properly or not by the server. We also assume that a file is composed of message blocks of an appropriate fixed length. If the file consists of a single block, we simply call it the file.

## 2 Security model of multi-server PoR systems

The essential components of multi-server PoR (MPoR) systems are natural generalizations of single-server PoR systems. The first difference is that there are ρ provers and the Verifier might store different messages on each of them. Also, during an audit phase, the Verifier can pick a subset of provers on which it runs the audits. The last crucial difference is that the Extractor has (black-box or non-black-box) access to a subset of proving algorithms corresponding to the provers that the Verifier picked to audit. We detail them below for the sake of completeness.

Let $\mathsf{Prover}_1,\dots,\mathsf{Prover}_\rho$ be a set of ρ provers. The Verifier has a message $m$ from the message space $\mathcal{M}$, which he redundantly encodes to $M_1,\dots,M_\rho$.

1. (i)

In the keyed setting, the Verifier picks ρ different keys $(K_1,\dots,K_\rho)$, one for each of the corresponding provers.

2. (ii)

The Verifier gives $M_i$ to $\mathsf{Prover}_i$. In the case of a keyed scheme, $\mathsf{Prover}_i$ may also be given an additional tag $S_i$, generated using the key $K_i$ and $M_i$.

3. (iii)

The Verifier stores some information (say, a fingerprint of the encoded message) which allows him to verify the responses made by the provers.

4. (iv)

On receiving the encoded message $M_i$, $\mathsf{Prover}_i$ generates a proving algorithm $\mathcal{P}_i$, which it uses to generate its responses during the auditing phase.

5. (v)

At any time, the Verifier picks an index $i$, where $1 \le i \le \rho$, and engages in a challenge-response protocol with $\mathsf{Prover}_i$. In one execution of the challenge-response protocol, the Verifier picks a challenge $c$ and gives it to $\mathsf{Prover}_i$, and the prover responds. The Verifier then verifies the correctness of the response (based on its fingerprint).

6. (vi)

The success probability $\mathsf{succ}(\mathcal{P}_i)$ is the probability, computed over all the challenges, with which the Verifier accepts the response sent by $\mathsf{Prover}_i$.

7. (vii)

The Extractor is given a subset $S$ of the proving algorithms $\mathcal{P}_1,\dots,\mathcal{P}_\rho$ (and, in the case of a keyed scheme, the corresponding subset of the keys, $\{K_i : i \in S\}$) and outputs a message $\hat{m}$. The Extractor succeeds if $\hat{m}=m$.

The above framework does not restrict any provers from interacting with other provers when they receive the encoded message. However, we assume that they do not interact after they have generated a proving algorithm. Without this restriction, it is not hard to see that one cannot have any meaningful protocol. For example, if provers could interact after they have generated their proving algorithms, then one prover could store the entire message while the other provers just relay the challenges to this specific prover and relay its responses back to the Verifier.

In contrast to a single-prover PoR scheme, there are two possible ways in which one can define the security of a multiple-prover PoR system. We define them next.

The first security definition corresponds to the “worst case” scenario and is the natural generalization of a single-server PoR system.

## Definition 2.1.

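A ρ-prover MPoR scheme is $(\eta,\nu,\tau,\rho)$-threshold secure if there is an Extractor which, when given any τ proving algorithms, say $\mathcal{P}_{i_1},\dots,\mathcal{P}_{i_\tau}$, succeeds with probability at least $1-\nu$ whenever

$$\mathsf{succ}(\mathcal{P}_j) \ge \eta \quad \text{for all } j \in I,$$

where $I=\{i_1,\dots,i_\tau\}$.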

We note that when ρ=τ=1, we get a standard single-server PoR system. Moreover, the definition captures the worst-case scenario in the sense that it only guarantees extraction if there exists a set of τ proving algorithms, all of which succeed with high enough probability.

The above definition requires that all the τ servers succeed with high enough probability. On the other hand, it might not be the case that all the proving algorithms of the servers picked by the Verifier succeed with the required probability. In fact, even verifying whether or not all the τ proving algorithms have high enough success probability to allow successful extraction might be difficult (see, for example [15] for more details about this). However, it is possible that some of the proving algorithms succeed with high enough probability to compensate for the failure of the rest of the proving algorithms. For instance, since the provers are allowed to interact before they specify their proving algorithms, it might be the case that the colluding provers decide to store most of the message on a single prover. In this case, even a weaker guarantee that the average success probability is high enough might be sufficient to guarantee a successful extraction. In other words, it is possible to state (and as we show in this paper, achieve) a security guarantee with respect to the average case success probability over all the proving algorithms.

## Definition 2.2.

A ρ-prover MPoR scheme is $(\eta,\nu,\rho)$-average secure if the Extractor succeeds with probability at least $1-\nu$ whenever

$$\frac{1}{\rho}\sum_{i=1}^{\rho}\mathsf{succ}(\mathcal{P}_i) \ge \eta.$$

Note that the average-case secure system reduces to the standard PoR scheme (with τ=ρ) when ρ=1. The following example illustrates that average-case security is possible even when an MPoR system is not possible as per Definition 2.1.

## Example 2.3.

Suppose η=0.7, ν=0 and ρ=3. Further, suppose that $\mathsf{succ}(\mathcal{P}_1)=0.9$, $\mathsf{succ}(\mathcal{P}_2)=0.6$ and $\mathsf{succ}(\mathcal{P}_3)=0.6$. Then the hypotheses of Definition 2.1 are not satisfied for τ=2, since only one prover has success probability at least 0.7. So even if the MPoR scheme is $(\eta,\nu,\tau,\rho)$-threshold secure, we cannot conclude that the Extractor will succeed. On the other hand, the average success probability is $(0.9+0.6+0.6)/3=0.7$, so the hypotheses of Definition 2.2 are satisfied. Therefore, if the MPoR scheme is $(0.7,\nu,\rho)$-average secure, the Extractor will succeed.
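The two hypotheses can be checked mechanically for the numbers in this example. The following is a minimal sketch; the function names are ours and purely illustrative, and exact fractions are used so that the boundary case (average exactly 0.7) is not affected by floating-point rounding.

```python
from fractions import Fraction


def threshold_hypothesis(succ, eta, tau):
    """Hypothesis of Definition 2.1: some set of tau provers all have succ >= eta."""
    return sum(1 for p in succ if p >= eta) >= tau


def average_hypothesis(succ, eta):
    """Hypothesis of Definition 2.2: the mean success probability is >= eta."""
    return sum(succ) / len(succ) >= eta


# Success probabilities and threshold from Example 2.3.
succ = [Fraction(9, 10), Fraction(6, 10), Fraction(6, 10)]
eta = Fraction(7, 10)

print(threshold_hypothesis(succ, eta, tau=2))  # False
print(average_hypothesis(succ, eta))           # True
```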

### Privacy guarantee.

We mentioned in the introduction that PoR systems were introduced and studied to give assurance of the integrity of the data stored on remote storage. However, the confidentiality aspects of data have not been studied formally in the area of cloud-based PoR systems. A couple of ad hoc solutions have been proposed in which the messages are encrypted and then stored on the cloud [9]. We believe that, in addition to the standard integrity requirement, privacy of the stored data when multiple provers are involved is also an important requirement. We model the privacy requirement as follows:

### Definition 2.4.

An MPoR system is called t-private if no set 𝒜 of adversarial provers of size at most t learns anything about the message stored by the Verifier.

Note that t=0 corresponds to the case when the MPoR system does not provide any confidentiality to the message. The above definition captures the idea that, even if t provers collude, they do not learn anything about the message. We remark that we can achieve confidentiality without encrypting the message by using secret sharing techniques.

### Notation.

We fix the letter $m$ for the original message, $\mathcal{M}$ to denote the space from which the message $m$ is picked and $M$ to denote the encoded message. We fix ν to denote the failure probability of the extractor and η to denote the success probability of a proving algorithm. In this paper, we are mainly interested in the case when ν=0 for both the worst-case and the average-case security. We use $n$ to denote the number of message blocks, assuming the underlying PoR system breaks the message into blocks.

## 3 Primitives used in this paper

### 3.1 Ramp schemes

In our construction, we use a primitive related to secret sharing schemes known as ramp schemes. A secret sharing scheme allows a trusted dealer to share a secret between n players so that certain subsets of players can reconstruct the secret from the shares they hold [6, 17].

It is well known that the size of each player’s share in a secret sharing scheme must be at least the size of the secret. If the secret that is to be shared is large, then this constraint can be very restrictive. Schemes for which we can get a certain form of trade-off between share size and security are known as ramp schemes [7].

### Definition 3.1 (Ramp scheme).

Let $\tau_1$, $\tau_2$ and $n$ be positive integers such that $\tau_1<\tau_2\le n$. A $(\tau_1,\tau_2,n)$-ramp scheme is a pair of algorithms, say $\mathsf{ShareGen}$ and $\mathsf{Reconstruct}$, such that, on input a secret $\mathsf{S}$, $\mathsf{ShareGen}(\mathsf{S})$ generates $n$ shares, one for each of the $n$ players, such that the following two properties hold:

1. (i)

Reconstruction: Any subset of $\tau_2$ or more players can pool together their shares and use $\mathsf{Reconstruct}$ to compute the secret $\mathsf{S}$ from the shares that they collectively hold.

2. (ii)

Secrecy: No subset of $\tau_1$ or fewer players can determine any information about the secret $\mathsf{S}$.

### Example 3.2.

Suppose the dealer wishes to set up a $(2,4,n)$-ramp scheme with the secret $(a_0,a_1)$. The dealer picks a finite field $\mathbb{F}_q$ with $q>n$ such that $a_0,a_1\in\mathbb{F}_q$. The dealer picks random elements $a_2,a_3$ independently from the field $\mathbb{F}_q$ and constructs the following polynomial of degree at most 3 over $\mathbb{F}_q$: $f(x)=a_0+a_1x+a_2x^2+a_3x^3$. The share for any player $\mathcal{P}_i$ is generated by computing $s_i=f(i)$. It is easy to see that if two or fewer players come together, they do not learn any information about the secret, and if at least four players come together, they can use Lagrange’s interpolation formula to compute the function $f$ as well as the secret. However, if three players pool their shares, then they can learn some partial information about the secret. For concreteness, let $q=17$. Then $5a_1\equiv 7s_3+9s_6+s_{15} \pmod{17}$; therefore, players $\mathcal{P}_3$, $\mathcal{P}_6$ and $\mathcal{P}_{15}$ can compute the value of $a_1$.
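The partial leakage in this example can be checked numerically. The sketch below is illustrative only: the function name `share_gen`, the choice $n=16$ and the sample secret $(15,3)$ are our assumptions, not part of the scheme's specification. It generates shares with a random cubic over $\mathbb{F}_{17}$ and shows that players 3, 6 and 15 recover $a_1$ from the stated linear combination, whatever the random coefficients are.

```python
import random

Q = 17  # field size used in Example 3.2


def share_gen(a0, a1, n, rng=random):
    """Shares s_i = f(i) of a (2,4,n)-ramp scheme with secret (a0, a1)."""
    a2, a3 = rng.randrange(Q), rng.randrange(Q)  # random masking coefficients

    def f(x):
        return (a0 + a1 * x + a2 * x**2 + a3 * x**3) % Q

    return {i: f(i) for i in range(1, n + 1)}


shares = share_gen(a0=15, a1=3, n=16)

# 7*s_3 + 9*s_6 + s_15 = 5*a_1 (mod 17): the a0, a2 and a3 terms cancel.
lhs = (7 * shares[3] + 9 * shares[6] + shares[15]) % Q
a1 = lhs * pow(5, -1, Q) % Q  # multiply by the inverse of 5 modulo 17
print(a1)  # 3
```

The modular inverse via `pow(5, -1, Q)` requires Python 3.8 or later.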

For completeness, we review some of the basic theory concerning the construction of ramp schemes. Linear codes have been used to construct ramp schemes for over thirty years since the work of McEliece and Sarwate [13]. We will consider a construction from an arbitrary code in this paper. The following relation between an arbitrary code (linear or non-linear) and a ramp scheme was shown by Paterson and Stinson [14].

### Theorem 3.3.

Let $C$ be a code of length $N$, distance $d$ and dual distance $d^\perp$. Let $1\le s<d^\perp-2$. Then there is a $(\tau_1,\tau_2,N-s)$-ramp scheme, where $\tau_1=d^\perp-s-1$ and $\tau_2=N-d+1$.

Here $s$ is the rate of the ramp scheme. If $\mathbf{G}$ is a generator matrix of a code $C$ with dimension $k$, then $|C|=q^k\ge q^{d^\perp-1}$. In other words, $k\ge d^\perp-1$.

### Construction 3.4.

The construction of a ramp scheme from a code is as follows. Let $s$ and ρ be positive integers, and let $(m_1,\dots,m_s)\in\mathbb{F}^s$ be the message. Let $C$ be a code of length $n=\rho+s$ defined over a finite field $\mathbb{F}$. We also require that the first $s$ entries of a codeword are the message to be encoded, i.e., the corresponding generator matrix is in standard form. Select a random codeword $(c_1=m_1,\dots,c_s=m_s,c_{s+1},\dots,c_{\rho+s})\in C$, and define the shares as $(c_{s+1},\dots,c_{\rho+s})$.

### Example 3.5.

One can use a Reed–Solomon code to construct a ramp scheme [13]. Let $q$ be a prime and $1\le s<t\le n<q$. It is well known that, for a prime $q$, there is an $[N,k,N-\tau+1]_q$ Reed–Solomon code with $d^\perp=\tau+1$. This implies a $(\tau-s,\tau,N)$-ramp scheme over $\mathbb{F}_q$.

### 3.2 Single-prover PoR system

We start by fixing some notation for PoR schemes that we use throughout the paper. Let Γ be the challenge space, and let Δ be the response space. We denote by $\gamma=|\Gamma|$ the size of the challenge space. Let $\mathcal{M}^*$ be the space of all encoded messages. The response function $\rho:\mathcal{M}^*\times\Gamma\to\Delta$ computes the response $r=\rho(M,c)$ given the encoded message $M$ and the challenge $c$.

For an encoded message $M\in\mathcal{M}^*$, we define the response vector $r^M$ that contains the responses to all possible challenges for the encoded message $M$. Finally, define the response code of the scheme to be

$$\mathcal{R}=\{r^M : M\in\mathcal{M}^*\}.$$

The codewords in $\mathcal{R}$ are just the response vectors that we defined above. Previously [15], we proved the following result for a single-prover PoR scheme.

### Theorem 3.6.

Suppose that $\mathcal{P}$ is a proving algorithm for a PoR scheme with response code $\mathcal{R}$. If the success probability of $\mathcal{P}$ satisfies $\mathsf{succ}(\mathcal{P})\ge 1-\tilde{d}/(2\gamma)$, where $\tilde{d}$ is the Hamming distance of the code $\mathcal{R}$, and γ is the size of the challenge space, then the extractor described in Figure 1 always outputs the message $m$.

### Figure 1

Extractor for Theorem 3.6.
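Figure 1 is not reproduced in this text. A bound of the form $1-\tilde{d}/(2\gamma)$ is what one expects from nearest-codeword decoding in the response code, so the following minimal sketch assumes the extractor works that way; it should be read as an illustration under that assumption rather than as the extractor of Figure 1. The toy scheme (index challenges, four encoded messages of length 5) and all names are hypothetical.

```python
def hamming(u, v):
    """Number of positions in which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def extract(prover, encoded_messages, response, challenges):
    """Query the prover on every challenge and return the encoded message
    whose response vector is closest to the observed responses."""
    observed = [prover(c) for c in challenges]
    return min(
        encoded_messages,
        key=lambda M: hamming(observed, [response(M, c) for c in challenges]),
    )


# Toy scheme: a challenge is an index c and the response is M[c], so the
# response code is the set of encoded messages (distance 3, gamma = 5).
ENCODED = [(0, 0, 0, 0, 0), (1, 1, 1, 0, 0), (0, 0, 1, 1, 1), (1, 1, 0, 1, 1)]
CHALLENGES = range(5)


def response(M, c):
    return M[c]


stored = ENCODED[1]


def cheating_prover(c):
    """Answers challenge 4 incorrectly: succ = 0.8 >= 1 - 3/(2*5) = 0.7."""
    return 1 - stored[c] if c == 4 else stored[c]


print(extract(cheating_prover, ENCODED, response, CHALLENGES))  # (1, 1, 1, 0, 0)
```

In this toy instance a single wrong answer leaves the observed vector strictly closer to the stored codeword than to any other, which is why extraction still succeeds.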

If we cast this in the security model defined in Section 2 (Definition 2.1 and Definition 2.2), then we have the following theorem.

### Theorem 3.7.

Suppose that $\mathcal{P}$ is a proving algorithm for a single-server PoR scheme with response code $\mathcal{R}$. Then there exists a $(1-\tilde{d}/(2\gamma),0,1,1)$-MPoR system, where $\tilde{d}$ is the Hamming distance of the code $\mathcal{R}$, and γ is the size of the challenge space Γ.

Previously [15], we gave a modified version of the Shacham–Waters scheme and showed that it is secure in the unconditional security setting. We argued that, in the setting of unconditional security, any keyed PoR scheme should be considered secure when the success probability of the proving algorithm $\mathcal{P}$, denoted by $\mathsf{succ}(\mathcal{P})$, is defined as the average success probability of the prover over all possible keys (Theorem 3.8). The same reasoning extends to MPoR systems. Therefore, in what follows and in Section 6, when we say a scheme is an $(\eta,\nu,\tau,\rho)$-threshold-secure scheme, the term η refers to the average success probability, where the average is computed over all possible keys. We denote the average success probability of a prover $\mathcal{P}$ over all possible keys by $\mathsf{succ}_{\mathsf{avg}}(\mathcal{P})$. Previously [15], we showed the following:

### Theorem 3.8.

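Let $\mathbb{F}_q$ be the underlying field, and let $\ell\ge 1$ be the Hamming weight of the challenges made by the Verifier. Let $d$ be the Hamming distance of the space of encoded messages $\mathcal{M}^*$. Suppose that

$$\mathsf{succ}_{\mathsf{avg}}(\mathcal{P})\ge 1-\frac{d^*(q-1)}{2\gamma q},$$

where $\gamma=q^n$ is the size of the challenge space and $d^*$ is given by

$$d^*\approx\binom{n}{\ell}(q-1)^{\ell}-\binom{n-d}{\ell}(q-1)^{\ell}-\sum_{w\ge 1}\binom{d}{w}\binom{n-d}{\ell-w}\frac{(q-1)^{\ell}}{q}. \tag{3.1}$$

Then there exists an Extractor that always outputs $\hat{m}=m$.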

## 4 Worst-case MPoR based on ramp scheme

In this section, we give our first construction that achieves a worst-case security guarantee. The idea is to use a $(\tau_1,\tau_2,\rho)$-ramp scheme in conjunction with a single-server PoR system. The intuition behind the construction is that the underlying PoR system along with the ramp scheme provides the retrievability guarantee and the ramp scheme provides the confidentiality guarantee.

We first present a schematic diagram of the working of an MPoR in Figure 2 and illustrate the scheme with the help of the following example. We provide the details of the construction in Figure 3.

### Figure 2

Schematic view of Ramp-MPoR system.

### Figure 3

Worst-case secure MPoR using a ramp scheme (Ramp-MPoR).

## Example 4.1.

Let ρ=6. Suppose the Verifier and the provers use a PoR system Π. Let the message to be stored be $(15,3)$. The Verifier picks $q=17$ and chooses two random elements $1,2\in\mathbb{F}_{17}$ to construct a polynomial $f(x)=15+3x+x^2+2x^3$. The Verifier picks an encoding function $e(\cdot)$ and stores $e(4)$ on $\mathsf{Prover}_1$, $e(7)$ on $\mathsf{Prover}_2$, $e(2)$ on $\mathsf{Prover}_3$, $e(1)$ on $\mathsf{Prover}_4$, $e(16)$ on $\mathsf{Prover}_5$, and $e(8)$ on $\mathsf{Prover}_6$.

Let us suppose that the PoR scheme is such that, for a random challenge vector of dimension ρ, say $(5,2,9,13,5,6)$, where the $i$-th entry is the challenge to $\mathsf{Prover}_i$, the corresponding responses of the provers form the vector $(3,14,1,13,12,14)$, whose $i$-th entry $\mathsf{Resp}_i$ is the correct response of $\mathsf{Prover}_i$. In other words, on challenge 5 to $\mathsf{Prover}_1$, the correct response is 3, and so on.

During the audit phase, the Verifier picks any four provers and sends the challenges to them. Once all the chosen provers reply, the Verifier verifies their responses. For example, suppose the Verifier picks $\mathsf{Prover}_1$, $\mathsf{Prover}_3$, $\mathsf{Prover}_4$ and $\mathsf{Prover}_6$. The Verifier then sends the challenge 5 to $\mathsf{Prover}_1$, 9 to $\mathsf{Prover}_3$, 13 to $\mathsf{Prover}_4$ and 6 to $\mathsf{Prover}_6$. If it gets the responses 3, 1, 13 and 14 back, it accepts; otherwise, it rejects.
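The share values in this example, and the fact that any four of them determine the message, can be reproduced with a short computation. The sketch below covers only the ramp-scheme layer; the encoding function $e(\cdot)$ and the challenge-response protocol of Π are not modeled, and the helper names are ours.

```python
Q = 17


def f(x):
    """Polynomial from Example 4.1: secret (15, 3), random coefficients (1, 2)."""
    return (15 + 3 * x + x**2 + 2 * x**3) % Q


shares = {i: f(i) for i in range(1, 7)}
print(shares)  # {1: 4, 2: 7, 3: 2, 4: 1, 5: 16, 6: 8}


def poly_mul_linear(p, r):
    """Multiply polynomial p (coefficients, lowest degree first) by (x - r) mod Q."""
    out = [0] * (len(p) + 1)
    for k, c in enumerate(p):
        out[k] = (out[k] - r * c) % Q
        out[k + 1] = (out[k + 1] + c) % Q
    return out


def reconstruct(points):
    """Lagrange interpolation mod Q; returns the coefficient list of f."""
    coeffs = [0] * len(points)
    for i, yi in points.items():
        basis, denom = [1], 1
        for j in points:
            if j != i:
                basis = poly_mul_linear(basis, j)
                denom = denom * (i - j) % Q
        scale = yi * pow(denom, -1, Q) % Q
        coeffs = [(a + scale * b) % Q for a, b in zip(coeffs, basis)]
    return coeffs


# Shares held by the four audited provers (1, 3, 4 and 6).
audited = {i: shares[i] for i in (1, 3, 4, 6)}
print(reconstruct(audited)[:2])  # [15, 3]
```

The first two interpolated coefficients are the message $(15,3)$; the remaining two are the random masking coefficients.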

We note one of the possible practical deployments of the Ramp-MPoR stated in Figure 3. Let $m$ be a message that consists of $sk$ elements from $\mathbb{F}_q$. The Verifier breaks the message into $k$ blocks of length $s$ each. It then invokes a $(\tau_1,\tau_2,n)$-ramp scheme on each of these blocks to generate $n$ shares of each of the $k$ blocks. The Verifier then runs a PoR scheme Π to compute the encoded message to be stored on each of the servers by encoding its $k$ shares, one corresponding to each of the $k$ blocks.

We prove the following security result for the MPoR scheme presented in Figure 3.

## Theorem 4.2.

Let Π be an $(\eta,0,1,1)$-threshold-secure MPoR with a response code of Hamming distance $\tilde{d}$ and a challenge space of size γ. Let $\mathsf{Ramp}=(\mathsf{ShareGen},\mathsf{Reconstruct})$ be a $(\tau_1,\tau_2,\rho)$-ramp scheme. Then Ramp-MPoR, defined in Figure 3, is an MPoR system with the following properties:

1. (i)

Privacy: Ramp-MPoR is $\tau_1$-private.

2. (ii)

Security: Ramp-MPoR is $(\eta,0,\tau_2,\rho)$-threshold secure, where $\eta=1-\tilde{d}/(2\gamma)$.

## Proof.

The privacy guarantee of Ramp-MPoR is straightforward from the privacy property of the underlying ramp scheme.

For the security guarantee, we need to demonstrate an Extractor that outputs a message $\hat{m}=m$ if at least $\tau_2$ servers succeed with probability at least $\eta=1-\tilde{d}/(2\gamma)$. The description of our Extractor is as follows:

1. (i)

The Extractor chooses $\tau_2$ provers and runs the extraction algorithm of the underlying single-server PoR system on each of these provers. For each chosen prover $\mathsf{Prover}_{i_j}$, it obtains an output $\hat{M}_{i_j}$. It defines $\mathcal{S}:=\{\hat{M}_{i_1},\dots,\hat{M}_{i_{\tau_2}}\}$.

2. (ii)

The Extractor invokes the $\mathsf{Reconstruct}$ algorithm of the underlying ramp scheme with the elements of $\mathcal{S}$. It outputs whatever $\mathsf{Reconstruct}$ outputs.

Now note that the Verifier interacts with every $\mathsf{Prover}_i$ independently. We know from the security of the underlying single-server PoR scheme (Theorem 3.6) that there is an extractor that always outputs the encoded message whenever $\mathsf{succ}(\mathcal{P}_i)\ge\eta$. Therefore, if all the $\tau_2$ chosen proving algorithms succeed with probability at least η, then the set $\mathcal{S}$ will have $\tau_2$ correct shares. From the correctness of the $\mathsf{Reconstruct}$ algorithm, we know that the message output in the end by the Extractor will be the message $m$. ∎
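The two steps of this Extractor compose a single-server extractor with the ramp scheme's $\mathsf{Reconstruct}$ algorithm. The following sketch shows only that composition; the function signature, the rule for choosing the $\tau_2$ provers (lowest indices first) and the stand-in components in the demo are our assumptions. The demo uses the degenerate scheme in which every share is the whole message, so that the block stays self-contained.

```python
def ramp_mpor_extract(provers, tau2, por_extract, reconstruct):
    """Two-step Extractor sketched in the proof of Theorem 4.2.

    provers:     dict mapping prover index -> proving algorithm
    por_extract: single-server extractor, (index, proving algorithm) -> share
    reconstruct: Reconstruct algorithm of the ramp scheme, dict of shares -> message
    """
    chosen = sorted(provers)[:tau2]  # step (i): choose tau2 provers (assumed rule)
    shares = {i: por_extract(i, provers[i]) for i in chosen}
    return reconstruct(shares)       # step (ii): run Reconstruct on the shares


# Demo with stand-in components: each prover simply holds the message.
message = (15, 3)
provers = {i: (lambda c, m=message: m) for i in range(1, 4)}


def por_extract(i, prover):
    """Stand-in for a real single-server PoR extractor."""
    return prover(None)


def reconstruct(shares):
    """Stand-in Reconstruct: every share already equals the message."""
    return next(iter(shares.values()))


print(ramp_mpor_extract(provers, 1, por_extract, reconstruct))  # (15, 3)
```

With a genuine ramp scheme, `reconstruct` would combine the $\tau_2$ extracted shares, and the guarantee applies only when every chosen prover meets the success threshold η.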

As a special case of the above, we get a simple MPoR system which uses a replication code. A replication code has an encoding function

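$$\mathsf{Enc}:\Lambda\to\Lambda^{\rho} \quad\text{such that}\quad \mathsf{Enc}(x)=(\underbrace{x,x,\dots,x}_{\rho\ \text{times}}) \quad\text{for any } x\in\Lambda.$$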

This is the setting considered by Curtmola et al. [9].

We call a Ramp-MPoR scheme based on a replication code a Rep-MPoR. The schematic description of the scheme is presented in Figure 4, and the scheme is presented in Figure 5. Since a ρ-replication code is a (0,1,ρ)-ramp scheme, a simple corollary to Theorem 4.2 is the following:

## Corollary 4.3.

Let Π be an $(\eta,0,1,1)$-MPoR system with a response code of Hamming distance $\tilde{d}$ and a challenge space of size γ. Then Rep-MPoR, formed by instantiating Ramp-MPoR with the replication code based ramp scheme, is an MPoR system with the following properties:

1. (i)

Privacy: It is 0-private.

2. (ii)

Security: It is $(\eta,0,1,\rho)$-threshold secure, where $\eta=1-\tilde{d}/(2\gamma)$.

The issue with the Rep-MPoR scheme is that there is no confidentiality of the file. We will come back to this issue later in Section 6.1.

### Figure 4

Schematic view of Rep-MPoR.

### Figure 5

Average-case secure MPoR (Rep-MPoR).

## 5 Average-case secure MPoR system

In general, it is not possible to verify with certainty whether the success probability of a proving algorithm is above a certain threshold; it is therefore unclear how the Extractor would know which proving algorithms to use for extraction as described in Section 4. In this section, we analyze the average-case security properties of the replication-code-based scheme, Rep-MPoR, described in the last section. This gives an alternative guarantee under which extraction succeeds without the Extractor needing to know whether any particular proving algorithm succeeds with high enough probability.

Recall the scenario introduced in Example 2.3. Here we assumed $\mathrm{succ}(\mathcal{P}_1) = 0.9$, $\mathrm{succ}(\mathcal{P}_2) = 0.6$ and $\mathrm{succ}(\mathcal{P}_3) = 0.6$ for three provers. Suppose that successful extraction for a particular prover $\mathcal{P}_i$ requires $\mathrm{succ}(\mathcal{P}_i) \geq 0.7$. Then extraction would work on only one of these three provers. On the other hand, suppose we have an average-case secure MPoR in which extraction is successful if the average success probability of the three provers is at least 0.7. Then the success probabilities assumed above, whose average is $(0.9 + 0.6 + 0.6)/3 = 0.7$, would be sufficient to guarantee successful extraction.
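The contrast between the two criteria can be checked numerically. The snippet below is a minimal sketch using the success probabilities of Example 2.3; the variable names are illustrative only, and the small tolerance is an assumption added to absorb floating-point rounding.

```python
success = [0.9, 0.6, 0.6]   # succ(P_1), succ(P_2), succ(P_3) from Example 2.3
threshold = 0.7

# Worst-case view: which individual provers meet the threshold?
qualifying = [i + 1 for i, s in enumerate(success) if s >= threshold]

# Average-case view: does the mean success probability meet the threshold?
average = sum(success) / len(success)
# A small tolerance guards against floating-point rounding of 2.1 / 3.
average_ok = average >= threshold - 1e-12
```

Only prover 1 qualifies individually, while the average criterion is met by the three provers taken together.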

## Theorem 5.1.

Let $\Pi$ be a single-server PoR system with a response code of Hamming distance $\tilde{d}$ and the size of challenge space $\gamma$. Then Rep-MPoR, defined in Figure 5, is an MPoR system with the following properties:

(i) Privacy: Rep-MPoR is 0-private.

(ii) Security: Rep-MPoR is $(1 - \tilde{d}/2\gamma, 0, \rho)$-average secure.

## Proof.

Since the message is stored in its entirety on each of the servers, there is no confidentiality.

For the security guarantee, we need to demonstrate an Extractor that outputs a message $\hat{m} = m$ if the average success probability of all the provers is at least $\eta = 1 - \tilde{d}/2\gamma$. The description of our Extractor is as follows:

(i) For all $1 \leq i \leq \rho$, use $\mathcal{P}_i$ to compute the vector $R_i = (r_c^{(i)} : c \in \Gamma)$, where $r_c^{(i)} = \mathcal{P}_i(c)$ for all $c \in \Gamma$ (i.e., for every $c$, $r_c^{(i)}$ is the response computed by $\mathcal{P}_i$ when it is given the challenge $c$).

(ii) Compute $R$ as a concatenation of $R_1, \ldots, R_\rho$ and find $\hat{M} := (\hat{M}_1, \ldots, \hat{M}_\rho)$ so that $\mathrm{dist}(R, r^{\hat{M}})$ is minimized.

(iii) Compute $m = e^{-1}(\hat{M})$.

Now note that the Verifier interacts with each $\mathsf{Prover}_i$ independently and the Extractor uses the challenge-response step with independent challenges. Let $\eta_1, \ldots, \eta_\rho$ be the success probabilities of the $\rho$ proving algorithms. Let $\bar{\eta}$ be the average success probability over all the servers and challenges. Therefore, $\bar{\eta} = \rho^{-1} \sum_{i=1}^{\rho} \eta_i$.

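First note that, in the case of Figure 5, the response code is of the form

$$\{(\underbrace{r, r, \ldots, r}_{\rho \text{ times}}) : r \text{ is a codeword of the response code of } \Pi\}.$$

It is easy to see that the distance of the response code is $\rho\tilde{d}$ and the length of a challenge is $\rho\gamma$. From the definition of the extractor and Theorem 3.6, extraction succeeds if the average success probability is at least $1 - \rho\tilde{d}/(2\rho\gamma)$, that is, if

$$\frac{\eta_1 + \cdots + \eta_\rho}{\rho} = \bar{\eta} \geq 1 - \frac{\tilde{d}}{2\gamma}.$$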
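The following toy sketch illustrates why joint nearest-codeword decoding over the concatenated responses can succeed even when one prover is individually too unreliable. Everything in it is an assumption made for illustration: the four-codeword response code with $\gamma = 8$ and $\tilde{d} = 4$, the choice $\rho = 3$, and the decision to search over response codewords directly rather than over encoded messages $\hat{M}$.

```python
def hamming(u, v):
    """Hamming distance between two equal-length strings."""
    return sum(a != b for a, b in zip(u, v))


# Assumed toy response code: gamma = 8 challenges, minimum distance 4.
RESPONSE_CODE = ["00000000", "11110000", "00001111", "11111111"]
GAMMA, D_TILDE, RHO = 8, 4, 3


def extract(responses):
    """Return the codeword r minimizing dist(R, (r, ..., r)), where R = R_1 || ... || R_rho."""
    return min(RESPONSE_CODE,
               key=lambda r: sum(hamming(R_i, r) for R_i in responses))


true_r = "11110000"
# Provers 1 and 2 answer every challenge correctly; prover 3 fails 4 of 8.
responses = ["11110000", "11110000", "00000000"]

eta = [1 - hamming(R_i, true_r) / GAMMA for R_i in responses]
eta_bar = sum(eta) / RHO                 # average success probability
bound = 1 - D_TILDE / (2 * GAMMA)        # 1 - d/(2*gamma) = 0.75

decoded_jointly = extract(responses)
decoded_from_prover_3 = extract([responses[2]])
```

Here the average success probability exceeds the bound, and joint decoding recovers `true_r`, whereas decoding from prover 3 alone returns a different codeword.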

### 5.1 Hypothesis testing for Rep-MPoR

For the purposes of auditing whether a file is being stored appropriately, it is necessary to have a mechanism for determining whether the success probability of a prover is sufficiently high. In the case of the replication-code-based MPoR with worst-case security, we are interested in the success probabilities of individual provers, and the analysis can be carried out as detailed in [15]. In the case of Rep-MPoR, however, we wish to determine whether the average success probability of the set of provers $\{\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_\rho\}$ is at least $\eta$. This amounts to distinguishing the null hypothesis

$$H_0\colon \text{avg-succ}(\mathcal{P}_i) < \eta$$

from the alternative hypothesis

$$H_1\colon \text{avg-succ}(\mathcal{P}_i) \geq \eta.$$

Suppose we send c challenges to each server. If a given server 𝒫i has success probability succ(𝒫i), then the number of correct responses received follows the binomial distribution B(c,succ(𝒫i)). If the success probabilities succ(𝒫i) were the same for each server, then the sum of the number of successes over all the servers would also follow a binomial distribution. However, we are also interested in the case in which these success probabilities differ, in which case the total number of successes follows a Poisson binomial distribution, which is more complicated to work with. In order to establish a test that is conceptually and computationally easier to apply, we will instead rely on the observation that, in cases where the average success probability is high enough to permit extraction, the failure rates of the servers are relatively low.

For a given server $\mathcal{P}_i$, let $f_i = 1 - \mathrm{succ}(\mathcal{P}_i)$ denote the probability of failure. For $c$ challenges, the number of failures follows the binomial distribution $B(c, f_i)$. Provided that $c$ is sufficiently large and $f_i$ is sufficiently low, $B(c, f_i)$ can be approximated by the Poisson distribution $\mathrm{Pois}(cf_i)$. The Poisson distribution $\mathrm{Pois}(\lambda)$ is used to model the scenario where discrete events are occurring independently within a given time period with an expected rate of $\lambda$ events during that period. The probability of observing $k$ events within that period is given by

$$P(k) = \frac{e^{-\lambda}\lambda^{k}}{k!}.$$

The mean and variance of $\mathrm{Pois}(\lambda)$ are both equal to $\lambda$. For our purposes, the advantage of using this approximation is that the sum of $\rho$ independent variables following the Poisson distributions $\mathrm{Pois}(\lambda_1), \mathrm{Pois}(\lambda_2), \ldots, \mathrm{Pois}(\lambda_\rho)$ is itself distributed according to the Poisson distribution $\mathrm{Pois}(\lambda_1 + \lambda_2 + \cdots + \lambda_\rho)$, even when the $\lambda_i$ all differ. In the case where the average failure probability is low, the distribution $\mathrm{Pois}(c(f_1 + f_2 + \cdots + f_\rho))$ should provide a reasonable approximation to the actual distribution of the total number of failed challenges.

### Example 5.2.

To demonstrate the appropriateness of the Poisson approximation for this application, suppose we have five servers, whose failure probabilities are expressed as $\mathbf{f} = (f_1, f_2, \ldots, f_5)$. Let $t$ be the number of trials per server and $b$ the total number of observed failures out of the $5t$ trials. Table 1 gives both the exact cumulative probability $\Pr[B \leq b]$ of observing up to $b$ failures, and the Poisson approximation $\Pr_{\mathrm{Pois}}[B \leq b]$ of this cumulative probability, for a range of values for $\mathbf{f}$.

### Table 1

Comparison between exact cumulative probability and approximation by Poisson distribution.

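| $t$ | $b$ | $\Pr[B \leq b]$ | $\Pr_{\mathrm{Pois}}[B \leq b]$ |
|---|---|---|---|
| $\mathbf{f} = (0.1, 0.1, 0.1, 0.1, 0.1)$ | | | |
| 200 | 5 | $2.556545692 \times 10^{-38}$ | $3.261456422 \times 10^{-36}$ |
| 200 | 10 | $1.450898832 \times 10^{-32}$ | $1.137687971 \times 10^{-30}$ |
| 200 | 50 | $5.995167631 \times 10^{-9}$ | $2.401592276 \times 10^{-8}$ |
| 200 | 100 | 0.5265990813 | 0.5265622074 |
| 100 | 0 | $1.322070819 \times 10^{-23}$ | $1.928749864 \times 10^{-22}$ |
| 100 | 5 | $6.272915577 \times 10^{-17}$ | $5.567756307 \times 10^{-16}$ |
| 100 | 10 | $1.135691814 \times 10^{-12}$ | $6.450152972 \times 10^{-12}$ |
| 100 | 15 | $1.662665039 \times 10^{-9}$ | $6.357982164 \times 10^{-9}$ |
| 100 | 20 | $4.557480806 \times 10^{-7}$ | 0.000001235187232 |
| 200 | 0 | $1.747871252 \times 10^{-46}$ | $3.720076039 \times 10^{-44}$ |
| 200 | 5 | $2.556545692 \times 10^{-38}$ | $3.261456422 \times 10^{-36}$ |
| 200 | 10 | $1.450898832 \times 10^{-32}$ | $1.137687971 \times 10^{-30}$ |
| 200 | 15 | $6.757345217 \times 10^{-28}$ | $3.340076418 \times 10^{-26}$ |
| 200 | 20 | $5.962487876 \times 10^{-24}$ | $1.905558774 \times 10^{-22}$ |
| 500 | 20 | $1.240463044 \times 10^{-84}$ | $1.084188102 \times 10^{-79}$ |
| 500 | 25 | $3.140367419 \times 10^{-79}$ | $1.697380630 \times 10^{-74}$ |
| 500 | 30 | $2.935666094 \times 10^{-74}$ | $9.912214279 \times 10^{-70}$ |
| 500 | 35 | $1.193158517 \times 10^{-69}$ | $2.542280876 \times 10^{-65}$ |
| 500 | 40 | $2.369596756 \times 10^{-65}$ | $3.218593843 \times 10^{-61}$ |
| $\mathbf{f} = (0.01, 0.01, 0.01, 0.01, 0.01)$ | | | |
| 200 | 5 | 0.06613951161 | 0.06708596299 |
| 200 | 10 | 0.5830408032 | 0.5830397512 |
| 200 | 20 | 0.9985035184 | 0.9984117410 |
| 200 | 50 | ≈1 | ≈1 |
| $\mathbf{f} = (0.2, 0.01, 0.02, 0.03, 0.04)$ | | | |
| 200 | 5 | $9.651421837 \times 10^{-22}$ | $6.180223643 \times 10^{-20}$ |
| 200 | 10 | $5.539867010 \times 10^{-17}$ | $1.744235672 \times 10^{-15}$ |
| 200 | 20 | 0.09020056729 | 0.1076778797 |
| 200 | 50 | 0.9999999198 | 0.9999991415 |
| $\mathbf{f} = (0.01, 0.01, 0.03, 0.04, 0.05)$ | | | |
| 200 | 5 | $8.312224722 \times 10^{-8}$ | $1.196952269 \times 10^{-7}$ |
| 200 | 10 | 0.00006809921297 | 0.00008550688580 |
| 200 | 20 | 0.06901537242 | 0.07274102693 |
| 200 | 50 | 0.9999582547 | 0.9999397284 |
| $\mathbf{f} = (0.1, 0.1, 0.1, 0.1, 0.1)$ | | | |
| 20 | 0 | 0.00002656139888 | 0.00004539992984 |
| 20 | 5 | 0.05757688648 | 0.06708596299 |
| 20 | 10 | 0.5831555123 | 0.5830397512 |
| 20 | 15 | 0.9601094730 | 0.9512595983 |
| 20 | 20 | 0.9991924263 | 0.9984117410 |
| 40 | 0 | $7.055079108 \times 10^{-10}$ | $2.061153629 \times 10^{-9}$ |
| 40 | 5 | 0.00003871193246 | 0.00007190884076 |
| 40 | 10 | 0.008071249954 | 0.01081171886 |
| 40 | 15 | 0.1430754340 | 0.1565131351 |
| 40 | 20 | 0.5591747822 | 0.5590925860 |
| 100 | 20 | $4.557480806 \times 10^{-7}$ | 0.000001235187232 |
| 100 | 25 | 0.00003540113222 | 0.00007160717427 |
| 100 | 30 | 0.001002549708 | 0.001594027332 |
| 100 | 35 | 0.01231948910 | 0.01621388016 |
| 100 | 40 | 0.07508928967 | 0.08607000083 |
| $\mathbf{f} = (0.01, 0.01, 0.01, 0.01, 0.01)$ | | | |
| 20 | 0 | 0.3660323413 | 0.3678794412 |
| 20 | 5 | 0.9994654657 | 0.9994058153 |
| 20 | 10 | 0.9999999939 | 0.9999999900 |
| 20 | 15 | 1.000000000 | 1.000000000 |
| 20 | 20 | 1.000000000 | 1.000000000 |
| 40 | 0 | 0.1339796748 | 0.1353352833 |
| 40 | 5 | 0.9839770930 | 0.9834363920 |
| 40 | 10 | 0.9999931182 | 0.9999916922 |
| 40 | 15 | 0.9999999996 | 1.000000000 |
| 40 | 20 | 0.9999999999 | 1.000000000 |
| 100 | 20 | 0.9999999367 | 0.9999999198 |
| 100 | 25 | 0.9999999999 | 1.000000001 |
| 100 | 30 | 0.9999999999 | 1.000000001 |
| 100 | 35 | 0.9999999999 | 1.000000001 |
| 100 | 40 | 0.9999999999 | 1.000000001 |
| $\mathbf{f} = (0.02, 0.0075, 0.0075, 0.0075, 0.0075)$ | | | |
| 20 | 0 | 0.08936904038 | 0.09536916225 |
| 20 | 5 | 0.9712600336 | 0.9672561739 |
| 20 | 10 | 0.9999843669 | 0.9999642885 |
| 20 | 15 | 0.9999999995 | 0.9999999958 |
| 20 | 20 | 1.000000000 | 1.000000000 |
| 40 | 0 | 0.007986825382 | 0.009095277109 |
| 40 | 5 | 0.6699740391 | 0.6684384858 |
| 40 | 10 | 0.9927425867 | 0.9909776597 |
| 40 | 15 | 0.9999835852 | 0.9999661876 |
| 40 | 20 | 0.9999999935 | 0.9999999715 |
| 100 | 20 | 0.9999999935 | 0.9999999715 |
| 100 | 25 | 0.9999999998 | 1.000000001 |
| 100 | 30 | 0.9999999998 | 1.000000001 |
| 100 | 35 | 0.9999999998 | 1.000000001 |
| 100 | 40 | 0.9999999998 | 1.000000001 |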
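A comparison of this kind can be reproduced with a short script. The sketch below is not the authors' code: it computes the exact cumulative probability by convolving the individual Bernoulli trials (which covers the Poisson binomial case where the $f_i$ differ) and the Poisson approximation by direct summation. The function names are hypothetical.

```python
import math


def exact_failure_cdf(fail_probs, trials, b):
    """Pr[B <= b] when server i fails each of `trials` challenges
    independently with probability fail_probs[i]."""
    dist = [1.0]  # dist[k] = probability of exactly k failures so far
    for f in fail_probs:
        for _ in range(trials):
            nxt = [0.0] * (len(dist) + 1)
            for k, p in enumerate(dist):
                nxt[k] += p * (1 - f)      # this challenge is answered correctly
                nxt[k + 1] += p * f        # this challenge fails
            dist = nxt
    return sum(dist[: b + 1])


def poisson_cdf(lam, b):
    """Pr[X <= b] for X ~ Pois(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, b + 1):
        term *= lam / i
        total += term
    return total


f = (0.1, 0.1, 0.1, 0.1, 0.1)
t = 20
exact = exact_failure_cdf(f, t, 10)
approx = poisson_cdf(t * sum(f), 10)
```

For these parameters the two values should agree with the row $t = 20$, $b = 10$ of Table 1 for $\mathbf{f} = (0.1, 0.1, 0.1, 0.1, 0.1)$ up to rounding.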

As an example of using the given formula to calculate a confidence interval, suppose we do 200 trials on each of five servers (so there are 1000 trials in total), and we observe 50 failures in total. Then the resulting confidence interval is $[0, 63.29)$. Suppose we wish to know whether the success probability is at least $\eta = 0.9$. We have $(1 - 0.9) \times 1000 = 100$. This is outside of that interval, and hence we conclude there is enough evidence to reject $H_0$ at the 5 % significance level. However, to test whether the success probability is at least 0.95 we see that $(1 - 0.95) \times 1000 = 50$. Since 50 lies within the interval, we conclude there is insufficient evidence to reject $H_0$ at the 5 % significance level.

Let $b$ denote the number of incorrect responses we have received from the $c\rho$ challenges given to the provers. Suppose that $H_0$ is true, so that the expected number of failures is at least $(1 - \eta)\rho c$. Based on our approximation, the probability that the number of failures is at most $b$ is at most

$$\sum_{i=0}^{b} \frac{e^{-(1-\eta)\rho c}\,\bigl((1-\eta)\rho c\bigr)^{i}}{i!}.$$

If this probability is less than 0.05, we reject $H_0$ and accept the alternative hypothesis. However, if the probability is greater than 0.05, then there is not enough evidence to reject $H_0$ at the 5 % significance level, and so we continue to suspect that the file is not being stored appropriately.

We can express this test neatly using a confidence interval. We define a 95 % upper confidence bound by

$$\lambda_U = \inf\Bigl\{\lambda \Bigm| \sum_{i=0}^{b} \frac{e^{-\lambda}\lambda^{i}}{i!} < 0.05\Bigr\}.$$

This represents the smallest parameter choice for the Poisson distribution for which the probability of obtaining $b$ or fewer incorrect responses is less than 0.05. Then $[0, \lambda_U)$ is a 95 % confidence interval for the mean number of failures, so we reject $H_0$ whenever $(1 - \eta)\rho c$ lies outside this interval. The value of $\lambda_U$ can be determined easily by exploiting a connection with the chi-squared distribution [18]. We have

$$\sum_{i=0}^{b} \frac{e^{-\lambda}\lambda^{i}}{i!} = \Pr\bigl(\chi^{2}_{2b+2} > 2\lambda\bigr),$$

and so the appropriate value of $\lambda_U$ can readily be obtained from a table for the chi-squared distribution.
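As an alternative to a chi-squared table, $\lambda_U$ can be found numerically. The sketch below uses plain bisection on the Poisson cumulative probability, which is decreasing in $\lambda$. It assumes, as in the worked example with 200 trials on each of five servers, that the quantity compared against the interval is the expected failure count $(1 - \eta)\rho c$; the function names are hypothetical.

```python
import math


def poisson_cdf(lam, b):
    """Pr[X <= b] for X ~ Pois(lam)."""
    term = math.exp(-lam)
    total = term
    for i in range(1, b + 1):
        term *= lam / i
        total += term
    return total


def upper_bound(b, alpha=0.05, tol=1e-9):
    """Smallest lambda (up to tol) with Pr[Pois(lambda) <= b] < alpha."""
    lo, hi = 0.0, float(b + 1)
    while poisson_cdf(hi, b) >= alpha:   # grow hi until the CDF drops below alpha
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poisson_cdf(mid, b) < alpha:
            hi = mid
        else:
            lo = mid
    return hi


def reject_h0(b, eta, rho, c, alpha=0.05):
    """Reject H0 when (1 - eta) * rho * c lies outside [0, lambda_U)."""
    return (1 - eta) * rho * c >= upper_bound(b, alpha)


lam_u = upper_bound(50)                       # 50 observed failures
reject_at_090 = reject_h0(50, 0.90, rho=5, c=200)
reject_at_095 = reject_h0(50, 0.95, rho=5, c=200)
```

With 50 observed failures, `lam_u` should come out close to the bound 63.29 quoted in the example, so $H_0$ is rejected for $\eta = 0.9$ but not for $\eta = 0.95$.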

We give a comparison between exact cumulative probability and approximation by Poisson distribution in Table 1.

## 6 Optimization using the keyed Shacham–Waters scheme

In the last three sections, we gave constructions of MPoR scheme using ramp schemes, linear secret-sharing schemes, replication codes and a single-prover PoR system. In this section, we show a specific instantiation of our scheme using the keyed scheme of Shacham and Waters [15, 16] for a single-server PoR system.

### 6.1 Extension of the keyed Shacham–Waters scheme to MPoR

If we instantiate the Rep-MPoR scheme (described in Section 4) with the modified Shacham–Waters scheme of [15], then we need one key that consists of n+1 values in 𝔽q. However, in this case, we do not have any privacy. In particular, we have the following extension of Corollary 4.3.

### Corollary 6.1.

Let $\Pi$ be an $(\eta, 0, 0, 1)$-PoR system of Shacham and Waters [16] with a response code of Hamming distance $\tilde{d}$ and the size of challenge space $\gamma$, where $\tilde{d}$ is given by equation (3.1). Then Rep-MPoR instantiated with the Shacham–Waters scheme is an MPoR system with the following properties:

(i) Privacy: It is 0-private.

(ii) Security: It is $(\eta, 0, 1, \rho)$-threshold secure, where $\eta = 1 - \frac{\tilde{d}(q-1)}{2\gamma q}$.

(iii) Storage Overhead: The Verifier needs to store $n + 1$ field elements, and every $\mathsf{Prover}_i$ needs to store $2n$ field elements.

### Proof.

The results follow by combining Theorem 3.8 with Corollary 4.3. ∎

The issue with the Rep-MPoR scheme is that there is no confidentiality of the file. In what follows, we improve the privacy guarantee of the MPoR scheme described above. Our starting point would be an instantiation of the Ramp-MPoR scheme, defined in Figure 3, with the Shacham–Waters scheme. We then reduce the storage on the Verifier through two steps.

### 6.2 Optimized version of the multi-server Shacham–Waters scheme

We follow two steps to get an MPoR scheme based on the Shacham–Waters scheme with a reduced storage requirement for the Verifier, while improving the confidentiality guarantee.

(i) In the first step, stated in Theorem 6.2, we improve the privacy guarantee of the MPoR scheme to get a $\tau_1$-private MPoR scheme (where $\tau_1 < \rho$ is an integer). The Verifier in this scheme has to store $\rho(n+1)$ field elements. When the underlying field is $\mathbb{F}_q$, the Verifier has to store $\rho(n+1)\log q$ bits.

(ii) In the second step, stated in Theorem 6.3, we reduce the storage requirement of the Verifier from $\rho(n+1)$ to $\tau_1(n+1)$ field elements for some integer $\tau_1 < \rho$ without affecting the privacy guarantee. When the underlying field is $\mathbb{F}_q$, the Verifier has to store $\tau_1(n+1)\log q$ bits.

#### Step 1.

To improve the privacy guarantee of Corollary 6.1 to, say, $\tau_1$-private (as per Definition 2.4), we use a Ramp-MPoR scheme and $\rho$ different keys, where each key consists of $n + 1$ values in $\mathbb{F}_q$. The Verifier generates $\rho$ shares of every message block using a ramp scheme, then encodes the shares, and finally computes the tag for each of these encoded shares.

We follow with more details. Let $m = (m[1], \ldots, m[k])$ be the message. The Verifier computes the shares of every message block $(m[1], \ldots, m[k])$ using a $(\tau_1, \tau_2, \rho)$-ramp scheme. It then encodes all the shares using the encoding scheme of the PoR scheme. Let the resulting encoded shares be $M_i[1], \ldots, M_i[n]$ for $1 \leq i \leq \rho$. In other words, the result of the above two steps is $\rho$ encoded shares, each of which is an $n$-tuple in $(\mathbb{F}_q)^n$. The Verifier now picks random values $a^{(i)}, b_1^{(i)}, \ldots, b_n^{(i)} \in \mathbb{F}_q$ for $1 \leq i \leq \rho$ and computes the tags as follows:

$$S_i[j] = b_j^{(i)} + a^{(i)} M_i[j] \quad \text{for } 1 \leq i \leq \rho,\ 1 \leq j \leq n.$$

The Verifier gives $\mathsf{Prover}_i$ the tuple of encoded messages $(M_i[1], \ldots, M_i[n])$ and the corresponding tags $(S_i[1], \ldots, S_i[n])$. We call this scheme the Basic-MPoR scheme. The following is straightforward from Theorem 4.2.
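The tagging step of Basic-MPoR can be sketched as follows. This is a minimal illustration under several assumptions: the field is the integers modulo a small prime `Q`, the encoded shares are random stand-ins rather than the output of an actual ramp scheme and encoding, and the audit check (a linear combination of blocks and tags over a challenge vector, in the style of the keyed Shacham–Waters scheme) is included for context even though this section does not spell it out.

```python
import random

Q = 101  # assumed small prime, so that F_q is the integers mod Q


def keygen(n, rng):
    """One key (a, b_1, ..., b_n) of n + 1 uniformly random field elements."""
    return rng.randrange(Q), [rng.randrange(Q) for _ in range(n)]


def tag(key, M):
    """S[j] = b_j + a * M[j] over F_q."""
    a, b = key
    return [(b_j + a * M_j) % Q for b_j, M_j in zip(b, M)]


def respond(challenge, M, S):
    """Honest prover: the same linear combination of blocks and of tags."""
    mu = sum(c * m for c, m in zip(challenge, M)) % Q
    sigma = sum(c * s for c, s in zip(challenge, S)) % Q
    return mu, sigma


def verify(key, challenge, response):
    """Accept iff sigma = a * mu + sum_j c_j * b_j over F_q."""
    a, b = key
    mu, sigma = response
    return sigma == (a * mu + sum(c * b_j for c, b_j in zip(challenge, b))) % Q


rng = random.Random(1)
rho, n = 4, 6
encoded_shares = [[rng.randrange(Q) for _ in range(n)] for _ in range(rho)]  # stand-ins for M_i
keys = [keygen(n, rng) for _ in range(rho)]       # rho independent keys
tags = [tag(keys[i], encoded_shares[i]) for i in range(rho)]

challenge = [rng.randrange(Q) for _ in range(n)]  # one of q^n possible challenges
audit_ok = all(
    verify(keys[i], challenge, respond(challenge, encoded_shares[i], tags[i]))
    for i in range(rho)
)
verifier_storage = sum(1 + len(b) for _, b in keys)   # rho * (n + 1) field elements
```

The last line makes the storage cost visible: with $\rho$ independent keys the Verifier holds $\rho(n+1)$ field elements.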

#### Theorem 6.2.

Let $\Pi$ be an $(\eta, 0, 0, 1)$-PoR scheme of Shacham and Waters [16] with a response code of Hamming distance $\tilde{d}$ and the size of challenge space $\gamma = q^n$, where $\tilde{d}$ is given by equation (3.1). Let Ramp be a $(\tau_1, \tau_2, \rho)$-ramp scheme. Then Basic-MPoR defined above is an MPoR scheme with the following properties:

(i) Privacy: Basic-MPoR is $\tau_1$-private.

(ii) Security: Basic-MPoR is $(\eta, 0, \tau_2, \rho)$-threshold secure, where $\eta = 1 - \frac{\tilde{d}(q-1)}{2\gamma q}$.

(iii) Storage Overhead: The Verifier needs to store $\rho(n+1)$ field elements and every $\mathsf{Prover}_i$ needs to store $2n$ field elements.

In the construction mentioned above, the Verifier needs to store ρ(n+1) elements of 𝔽q, which is almost the same as the total storage requirements of all the provers. In [15], we encountered the same issue, where the Verifier has to store as much secret information as the size of the message. This seems to be the general drawback in the unconditionally secure setting. However, in the case of MPoR, we can improve the storage requirement of the Verifier as shown in the next step.

#### Step 2.

In this step, we improve the above-described MPoR scheme to achieve a considerable reduction in the storage requirement of the Verifier. The resulting scheme also provides unbounded audit capability against computationally unbounded adversarial provers, and it also ensures $\tau_1$-privacy.

The main observation that results in the reduction in the storage requirements of the Verifier is the fact that we can partially derandomize the keys generated by the Verifier. We use one of the most common techniques in derandomization. The keys in this scheme are generated by $\tau_1$-wise independent functions.[1] Our construction works as follows: We pick $n + 1$ random polynomials, $f_1(x), \ldots, f_n(x), g(x) \in \mathbb{F}_q[x]$, each of degree at most $\tau_1 - 1$. Then the Verifier computes the secret key by evaluating the polynomials $f_j(x)$ and $g(x)$ on $\rho$ different values, say

$$b_j^{(i)} = f_j(i) \quad \text{and} \quad a^{(i)} = g(i)$$

for $1 \leq j \leq n$ and $1 \leq i \leq \rho$. The Verifier then computes the encoded shares and their corresponding tags as in Basic-MPoR, i.e.,

$$S_i[j] = b_j^{(i)} + a^{(i)} M_i[j] \quad \text{for } 1 \leq i \leq \rho,\ 1 \leq j \leq n.$$
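The key derivation can be sketched as below. The sketch assumes a small prime `Q` larger than $\rho$ (so that the evaluation points $1, \ldots, \rho$ are distinct in $\mathbb{F}_q$) and uses hypothetical helper names; it only shows how the $\rho$ keys are regenerated on demand from $\tau_1(n+1)$ stored coefficients.

```python
import random

Q = 101  # assumed small prime with Q > rho


def random_poly(tau1, rng):
    """Coefficients of a uniformly random polynomial of degree at most tau1 - 1."""
    return [rng.randrange(Q) for _ in range(tau1)]


def evaluate(poly, x):
    """Horner evaluation over F_q; poly[k] is the coefficient of x^k."""
    acc = 0
    for coeff in reversed(poly):
        acc = (acc * x + coeff) % Q
    return acc


def derive_key(f_polys, g_poly, i):
    """Key of Prover i: a^(i) = g(i) and b_j^(i) = f_j(i)."""
    return evaluate(g_poly, i), [evaluate(f_j, i) for f_j in f_polys]


rng = random.Random(2)
tau1, rho, n = 2, 5, 6
f_polys = [random_poly(tau1, rng) for _ in range(n)]
g_poly = random_poly(tau1, rng)

stored_elements = sum(len(p) for p in f_polys) + len(g_poly)  # tau1 * (n + 1)
keys = {i: derive_key(f_polys, g_poly, i) for i in range(1, rho + 1)}
basic_mpor_elements = rho * (n + 1)   # what rho independent keys would cost
```

For these toy parameters the Verifier keeps 14 coefficients instead of 35 key elements, while each derived key still has the form $(a^{(i)}, b_1^{(i)}, \ldots, b_n^{(i)})$ used in the tag computation.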

Figure 6 is the formal description of this scheme. For the scheme described in Figure 6, we prove the following result.

#### Theorem 6.3.

Let $\mathsf{Ramp} = (\mathsf{ShareGen}, \mathsf{Reconstruct})$ be a $(\tau_1, \tau_2, \rho)$-ramp scheme. Let $\Pi$ be a single-prover Shacham–Waters scheme [16] with a response code of Hamming distance $\tilde{d}$ and the size of challenge space $\gamma$. Then SW-MPoR, defined in Figure 6, is an MPoR system with the following properties:

(i) Privacy: SW-MPoR is $\tau_1$-private.

(ii) Security: SW-MPoR is $(\eta, 0, \tau_2, \rho)$-threshold secure, where $\eta = 1 - \frac{\tilde{d}(q-1)}{2\gamma q}$.

(iii) Storage Overhead: The Verifier needs to store $\tau_1(n+1)$ field elements, and every $\mathsf{Prover}_i$ (for $1 \leq i \leq \rho$) needs to store $2n$ field elements.

### Figure 6

MPoR using optimized Shacham–Waters scheme (SW-MPoR).

#### Proof.

The privacy guarantee of SW-MPoR is straightforward from the secrecy property of the underlying ramp scheme.

For the security guarantee, we have to show an explicit construction of the Extractor that, on input proving algorithms $\mathcal{P}_1, \ldots, \mathcal{P}_\rho$, outputs $m$ if $\mathrm{succ}(\mathcal{P}_i) > \eta$ for at least $\tau_2$ proving algorithms. However, there is a subtle issue that we have to deal with before using the proof of Theorem 4.2, because of the relation between every message and tag pair. It was previously noted by us [15] that if the adversarial prover learns the secret key, then it can break the PoR scheme. We first argue that a set of $\tau_1$ colluding provers cannot gain an undue advantage from exploiting the linear structure of the message-tag pairs.

We now prove that any set of $\tau_1$ provers does not learn anything about the keys generated using $n + 1$ polynomials of degree at most $\tau_1 - 1$. The idea is very similar to the single-prover case. Previously [15], we noted that in the single-prover case, for an $n$-tuple encoded message, the key is a tuple of $n + 1$ uniformly random elements $(a, b_1, \ldots, b_n)$ in $\mathbb{F}_q$. Further, from the point of view of a prover, there are $q$ possible keys – the value of $a$ determines the $n$-tuple $(b_1, \ldots, b_n)$ uniquely, but $a$ is completely undetermined. In the MPoR case, we have $\rho$ keys. Each prover in a given set of $\tau_1$ provers has $q$ possible keys, as discussed above. However, it is conceivable that they can use their collective knowledge to learn something about the keys. In what follows, we show that they cannot determine any additional information about their keys by combining the information they hold collectively.

Let $I = \{i_1, \ldots, i_{\tau_1}\}$ be the indices of any arbitrary set of $\tau_1$ provers. Let $S_i$ denote the set of possible keys for $\mathsf{Prover}_i$, for $i \in I$. Consider any list of $\tau_1$ keys $(K_{i_1}, K_{i_2}, \ldots, K_{i_{\tau_1}})$. Recall that $K_i$ (for $i \in I$) has the form $(a^{(i)}, b_1^{(i)}, \ldots, b_n^{(i)})$, where $a^{(i)}$ and $b_j^{(i)}$ (for $1 \leq j \leq n$) are generated by random polynomials of degree at most $\tau_1 - 1$. We first consider $a^{(i)}$ (for $i \in I$). Note that the vector $(b_1^{(i)}, \ldots, b_n^{(i)})$ is defined uniquely by $a^{(i)}$ and the set of all encoded message-tag pairs. We have already shown that any set of $\tau_1$ provers cannot learn anything about the random polynomial $g(x)$ used to generate the $a^{(i)}$ for all $i \in I$. We use the following well-known fact to show that any set of $\tau_1$ provers does not learn any additional information about the keys.

#### Fact 6.4.

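Let $t > 0$ be an integer, let $q$ be a prime number, and let $\mathbb{F}_q$ be the finite field of order $q$. Let $h_0, h_1, \ldots, h_{t-1} \in \mathbb{F}_q$ be elements picked independently and uniformly at random. Define $h(x) = \sum_{i=0}^{t-1} h_i x^i$ for all $x \in \mathbb{F}_q$. Then, for any distinct $x_1, \ldots, x_t \in \mathbb{F}_q$ and any $y_1, \ldots, y_t \in \mathbb{F}_q$,

$$\Pr[h(x_1) = y_1 \wedge \cdots \wedge h(x_t) = y_t] = \prod_{i=1}^{t} \Pr[h(x_i) = y_i]. \tag{6.1}$$

Since each $h(x_i)$ is uniformly distributed in $\mathbb{F}_q$, the probability computed in equation (6.1) is actually equal to $q^{-t}$.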

By construction, $g(x)$ is a random polynomial of degree at most $\tau_1 - 1$. Fact 6.4 then implies that any combination of $\{a^{(i)}\}_{i \in I}$ is equally likely. A similar argument, with the $a^{(i)}$'s replaced by the $b_j^{(i)}$'s (for all $i \in I$ and $1 \leq j \leq n$) and the polynomial $g(x)$ replaced by $f_j(x)$ (for $1 \leq j \leq n$), gives that all sets of $\tau_1$ keys are equally likely. In other words, the provers in the set $I$ cannot determine any additional information about their keys by combining the information they hold collectively.
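Fact 6.4 can be checked exhaustively for small parameters. The sketch below assumes toy values $q = 5$ and $t = 2$ and enumerates all $q^t$ polynomials of degree at most $t - 1$; it is an illustration of the statement, not part of the proof.

```python
from itertools import product

q, t = 5, 2  # assumed toy parameters


def evaluate(coeffs, x):
    """h(x) = sum_k h_k x^k over F_q."""
    return sum(h * pow(x, k, q) for k, h in enumerate(coeffs)) % q


# All q^t coefficient vectors (h_0, ..., h_{t-1}).
polys = list(product(range(q), repeat=t))


def joint_count(xs, ys):
    """Number of polynomials h with h(x_i) = y_i for every i."""
    return sum(all(evaluate(h, x) == y for x, y in zip(xs, ys)) for h in polys)


# For distinct points x_1 != x_2, every target (y_1, y_2) is hit by exactly
# one polynomial, so the joint probability is 1 / q^t.
uniform = all(
    joint_count((x1, x2), ys) == 1
    for x1 in range(q)
    for x2 in range(q)
    if x1 != x2
    for ys in product(range(q), repeat=t)
)
```

Each single evaluation $h(x_i) = y_i$ is satisfied by $q^{t-1}$ of the $q^t$ polynomials, so the product on the right-hand side of (6.1) is also $q^{-t}$, matching the joint count.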

We now complete the security proof by describing an Extractor that outputs the file if τ2 provers succeed with high enough probability. The description of the Extractor and its analysis is the same as that of Theorem 4.2. We give it for the sake of completeness.

(i) The Extractor chooses $\tau_2$ provers and runs the extraction algorithm of the underlying single-server PoR system on each of these provers. In the end, it outputs $\hat{M}_{i_j}$ for the corresponding provers $\mathsf{Prover}_{i_j}$. It defines $\mathcal{S} \leftarrow \{\hat{m}_{i_1}, \ldots, \hat{m}_{i_{\tau_2}}\}$. Note that the Extractor of the underlying PoR scheme has already computed $e^{-1}$ on the set $\{\hat{M}_{i_1}, \ldots, \hat{M}_{i_{\tau_2}}\}$.

(ii) The Extractor invokes the $\mathsf{Reconstruct}$ algorithm of the underlying ramp scheme with the elements of $\mathcal{S}$ to compute $m$.

Now note that the Verifier interacts with every $\mathsf{Prover}_i$ independently. We know from the security of the underlying PoR scheme of Shacham–Waters that there is an extractor that always outputs the encoded message whenever $\mathrm{succ}_{\mathrm{avg}}(\mathcal{P}_i) \geq \eta$. Therefore, if all the $\tau_2$ chosen proving algorithms succeed with probability at least $\eta$ over all possible keys, then the set $\mathcal{S}$ will have $\tau_2$ correct shares. From the correctness of the $\mathsf{Reconstruct}$ algorithm and $e^{-1}(\cdot)$, we know that the message output in the end by the Extractor will be the message $m$.

For the storage requirement, the Verifier has to store the coefficients of all the random polynomials $f_1(x), \ldots, f_n(x), g(x)$, which amounts to a total of $\tau_1(n+1) = \tau_1 n + \tau_1$ field elements. ∎

## 7 Conclusion and future works

In this paper, we studied PoR systems when multiple provers are involved (MPoR). We motivated and defined the security of MPoR in the worst-case (Definition 2.1) and the average-case (Definition 2.2) settings, and extended the hypothesis testing techniques used in the single-server setting [15] to the multi-server setting. We also motivated the study of confidentiality of the outsourced message. We gave MPoR schemes which are secure under both these security definitions and provide reasonable confidentiality guarantees even when there is no restriction on the computational power of the servers. At the end of this paper, we looked at an optimized version of MPoR system when instantiated with the unconditionally secure version of the Shacham–Waters scheme [16]. We exhibited that, in the multi-server setting with computationally unbounded provers, one can overcome the limitation that the verifier needs to store as much secret information as the provers.

Our paper leaves several open problems. We list two of them below:

(i) Our approach works in the privately verifiable setting, i.e., the entity that wishes to verify the validity of stored data is the same entity that stored the data. It would be interesting to see if our schemes can be extended to the publicly verifiable setting.

(ii) We assume that the provers do not interact with each other after they receive the encoded files. There is a vast literature on mitigating collusion. It would be interesting to see whether our schemes can be combined with recent advances in schemes secure against colluding players in the distributed setting, so as to remove this assumption.

## Notation used in this paper

1. $c$: challenge
2. $C^{\perp}$: dual of a code $C$
3. $\mathsf{d}^{*}$: distance of the response code
4. $\mathsf{d}$: distance of a codeword
5. $\mathsf{d}^{\perp}$: dual distance of a code
6. $\mathrm{dist}$: Hamming distance between two vectors
7. $\mathbf{G}$: generator matrix of a code
8. $k$: length of a message
9. $K$: key (in a keyed scheme)
10. number of message-blocks
11. $m$: message
12. $m[i]$: $i$-th message block
13. $\hat{m}$: message outputted by the Extractor
14. message space
15. $M$: encoded message
16. $M[i]$: $i$-th encoded message
17. $M_j[i]$: $i$-th encoded message on $\mathsf{Prover}_j$
18. \*: encoded message space
19. $n$: number of provers
20. $N$: codeword length
21. $\mathcal{P}_i$: proving algorithm of $i$-th Prover
22. $q$: order of underlying finite field
23. $r$: response
24. $r^{M}$: response vector for encoded message $M$
25. $S$: tag (in a keyed scheme)
26. $\mathrm{succ}(\mathcal{P})$: success probability of proving algorithm
27. \*: response code
28. $\Gamma$: challenge space
29. $\gamma$: number of possible challenges
30. $\Delta$: response space
31. $\rho$: number of users
32. $\tau$: privacy threshold

Communicated by Spyros Magliveras

# Acknowledgements

Thanks to Andris Abakuks and Simon Skene for some helpful discussions of statistics.

### References

[1] G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, O. Khan, L. Kissner, Z. N. J. Peterson and D. Song, Remote data checking using provable data possession, ACM Trans. Inform. Sys. Security 14 (2011), Paper No. 12.

[2] G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, L. Kissner, Z. N. J. Peterson and D. X. Song, Provable data possession at untrusted stores, Proceedings of the 14th ACM Conference on Computer and Communications Security, ACM, New York (2007), 598–609.

[3] G. Ateniese, Ö. Dagdelen, I. Damgård and D. Venturi, Entangled cloud storage, IACR Cryptology ePrint Archive (2012).

[4] G. Ateniese, R. Di Pietro, L. V. Mancini and G. Tsudik, Scalable and efficient provable data possession, Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, ACM, New York (2008), 1–9.

[5] G. Ateniese, S. Kamara and J. Katz, Proofs of storage from homomorphic identification protocols, Advances in Cryptology—ASIACRYPT 2009, Springer, Berlin (2009), 319–333.

[6] G. R. Blakley, Safeguarding cryptographic keys, Proceedings of the National Computer Conference, AFIPS, New York (1979), 313–317.

[7] G. R. Blakley and C. Meadows, Security of ramp schemes, Advances in Cryptology—CRYPTO 1985, Springer, Berlin (1985), 242–268.

[8] K. D. Bowers, A. Juels and A. Oprea, Proofs of retrievability: Theory and implementation, Proceedings of the 2009 ACM Workshop on Cloud Computing Security, ACM, New York (2009), 43–54.

[9] R. Curtmola, O. Khan, R. C. Burns and G. Ateniese, MR-PDP: Multiple-replica provable data possession, The 28th International Conference on Distributed Computing Systems, IEEE Press, Piscataway (2008), 411–420.

[10] Y. Dodis, S. P. Vadhan and D. Wichs, Proofs of retrievability via hardness amplification, Theory of Cryptography, Springer, Berlin (2009), 109–127.

[11] A. Juels and B. S. Kaliski, Jr., PORs: Proofs of retrievability for large files, Proceedings of the 14th ACM Conference on Computer and Communications Security, ACM, New York (2007), 584–597.

[12] S. Kamara and K. Lauter, Cryptographic cloud storage, Financial Cryptography and Data Security, Springer, Berlin (2010), 136–149.

[13] R. J. McEliece and D. V. Sarwate, On sharing secrets and Reed–Solomon codes, Comm. ACM 24 (1981), 583–584. doi:10.1145/358746.358762.

[14] M. B. Paterson and D. R. Stinson, A simple combinatorial treatment of constructions and threshold gaps of ramp schemes, Cryptogr. Commun. 5 (2013), 229–240. doi:10.1007/s12095-013-0082-1.

[15] M. B. Paterson, D. R. Stinson and J. Upadhyay, A coding theory foundation for the analysis of general unconditionally secure proof-of-retrievability schemes for cloud storage, J. Math. Cryptol. 7 (2013), 183–216.

[16] H. Shacham and B. Waters, Compact Proofs of Retrievability, Advances in Cryptology—ASIACRYPT 2008, Springer, Berlin (2009), 90–107.

[17] A. Shamir, How to share a secret, Comm. ACM 22 (1979), 612–613. doi:10.1145/359168.359176.

[18] K. Ulm, Simple method to calculate the confidence interval of a standardized mortality ratio (SMR), Amer. J. Epidemiology 131 (1990), 373–375. doi:10.1093/oxfordjournals.aje.a115507.

[19] C. Wang, Q. Wang, K. Ren and W. Lou, Privacy-preserving public auditing for data storage security in cloud computing, IEEE Proceedings INFOCOM 2010, IEEE Press, Piscataway (2010), 1–9.

[20] Summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region.

[21] Why is decentralized and distributed file storage critical for a better web?