## 1 Introduction

Recently, there has been considerable activity on remote storage and the associated cryptographic problem of ensuring the integrity of the stored data. This question becomes even more important when there are reasons to believe that the remote servers might act maliciously: one or more servers can delete (whether accidentally or on purpose) a part of the data, since there is a good chance that the data will never be accessed and, hence, the client would never find out! To assuage such concerns, one would prefer a simple auditing system that convinces the client if and only if the server has the data. Such audit protocols, called *proof-of-retrievability* (PoR) systems, were introduced by Juels and Kaliski [11], and the closely related *proof-of-data-possession* (PDP) systems were introduced by Ateniese et al. [2].

In a PoR protocol, a client stores a message *m* on a remote server and keeps only a short private *fingerprint* locally. At some later time, when the client wishes to verify the integrity of its message, it can run an audit protocol in which it acts as a verifier while the server proves that it has the client’s data. The formal security of a PoR protocol is expressed in terms of an *extractor* – there exists an extractor with (black-box or non-black-box) access to the proving algorithm used by the server to respond to the client’s challenge, such that the extractor retrieves the original message given any adversarial server which passes the audits with a threshold probability. Apart from this security requirement, two practical requirements of any PoR system are a reasonable bound on the communication cost of every audit and a small storage overhead on both the client and the server.
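To make the audit flow concrete, the following is a minimal toy sketch of a challenge-response audit, not the scheme of [11] or [2]: the client keeps a short key, tags each block with a MAC, and an audit checks a randomly chosen block against its tag. All names here (`tag_blocks`, `audit`, the block format) are our own illustrative choices.

```python
import hmac, hashlib, secrets

def tag_blocks(key: bytes, blocks: list[bytes]) -> list[bytes]:
    """Client-side setup: MAC each block together with its index."""
    return [hmac.new(key, i.to_bytes(8, "big") + blk, hashlib.sha256).digest()
            for i, blk in enumerate(blocks)]

def audit(key: bytes, n_blocks: int, server) -> bool:
    """One audit: challenge a random block index, verify the returned (block, tag)."""
    i = secrets.randbelow(n_blocks)
    blk, tag = server(i)  # the server returns its stored block and tag
    expected = hmac.new(key, i.to_bytes(8, "big") + blk, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

# An honest server (stores all blocks and tags) passes every audit.
key = secrets.token_bytes(32)
blocks = [b"block-%d" % i for i in range(16)]
tags = tag_blocks(key, blocks)
honest = lambda i: (blocks[i], tags[i])
assert all(audit(key, len(blocks), honest) for _ in range(20))
```

A server that deleted or altered a block fails every audit that challenges that block, so repeated audits detect data loss with high probability; a full PoR scheme additionally supplies an extractor, which this sketch omits.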

PoR systems were originally defined for the single-server setting. However, in the real world, it is highly likely that a client would store its data on more than one server. This might be due to a variety of reasons. For example, a client might wish to have a certain degree of redundancy if one or more servers fail. In this case, the client is more likely to store multiple copies of the same data. Another possible scenario could be that the client does not trust a single server with all of its data. In this case, the client might distribute the data across multiple servers. Both of these settings have been studied previously in the literature.

The first such study was initiated by Curtmola et al. [9], who considered the first of the above two cases. They addressed the problem of storing copies of a single file on multiple servers. This is an attractive solution considering the fact that replication is a fundamental principle in ensuring the availability and durability of data. Their system allows the client to audit a subset of servers even if some of them collude.

On the other hand, Bowers, Juels and Oprea [8] considered the second of the above two cases. They studied a system where the client’s data is distributed and stored on different servers. This ensures that none of the servers has the whole data.

Both of these systems covered one specific instance of the wide spectrum of possibilities when more than one server is involved. For example, neither of the works mentioned above addresses the question of the privacy of data. Both of them argue that, for privacy, the client can encrypt its file before storing it on the servers. These systems are secure only in the computational setting, and the privacy guarantee depends on the underlying encryption scheme. On the other hand, there are primitives in the setting of distributed systems, like secret sharing schemes, that are known to be unconditionally secure. Moreover, we can also utilize cross-server redundancy to obtain more practical systems.

### 1.1 Our contributions

In Section 2, we give the formal description of multi-server PoR (MPoR) systems. We state the definitions for worst-case and average-case secure MPoR systems. We also motivate the privacy requirement and state the privacy definition for MPoR systems. In Section 3, we define the primitives used in this paper to the level of detail required to understand our results.

In Section 4, we give a construction of an MPoR scheme that achieves worst-case security when the malicious servers are computationally unbounded. Our construction is based on ramp schemes and a single-server PoR scheme. Our construction achieves confidentiality of the message. To exemplify our scheme, we instantiate this scheme with a specific form of ramp scheme.

In Section 5, we give a construction of an MPoR scheme that achieves average-case security against computationally unbounded adversaries. For an MPoR system that affords average-case security, we also show that an extension of classical statistical techniques previously used by us [15] can be used to provide a basis for estimating whether the responses of the servers are accurate enough to allow successful extraction.

One of the benefits of an MPoR system is that it provides cross-server redundancy. In the past, this feature has been used by Bowers, Juels and Oprea [8] to propose a multi-server system called HAIL. We first note that the constructions in Section 4 and Section 5 do not provide any improvement on the storage overhead of the server or the client. In Section 6, we give a construction based on the Shacham–Waters protocol [16] that allows significant reduction of the storage overhead of the client in the multi-server setting.

### 1.2 Related works

The concept of *proof of retrievability* is due to Juels and Kaliski [11]. A PoR scheme incorporates a challenge-response protocol in which a verifier can check that a message is being stored correctly, along with an *extractor* that will actually reconstruct the message, given the algorithm of a “prover” who is able to correctly respond to a sufficiently high percentage of challenges.

There are also papers that describe the closely related (but slightly weaker) idea of a *proof-of-data-possession scheme* (PDP scheme), e.g., [2]. A PDP scheme permits the possibility that not all of the message blocks can be reconstructed. Ateniese et al. [2] also introduced the idea of using *homomorphic authenticators* to reduce the communication complexity of the system. This scheme was improved in a follow-up work by Ateniese et al. [4].
Shacham and Waters [16] later showed that the scheme of Ateniese et al. [1] can be transformed into a PoR scheme by constructing an extractor that extracts the file from the responses of the prover on the audits.
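The communication saving behind homomorphic authenticators can be illustrated with a small sketch in the spirit of the private-key Shacham–Waters scheme: each block gets a tag that is linear in the block, so the server can aggregate many challenged blocks into a constant-size response. The field size, the replacement of the PRF by fresh random values, and all function names below are simplifications of our own.

```python
import secrets

p = (1 << 61) - 1  # a Mersenne prime, used as the field F_p

def keygen(n):
    """Secret key: a scalar alpha and one pseudo-random value per block."""
    alpha = secrets.randbelow(p)
    betas = [secrets.randbelow(p) for _ in range(n)]  # stands in for PRF_K(i)
    return alpha, betas

def tag(alpha, betas, m):
    """One tag per block: sigma_i = alpha * m_i + beta_i (mod p)."""
    return [(alpha * mi + bi) % p for mi, bi in zip(m, betas)]

def respond(m, sigma, challenge):
    """Server aggregates: mu = sum nu_i * m_i, tau = sum nu_i * sigma_i."""
    mu = sum(nu * m[i] for i, nu in challenge) % p
    tau = sum(nu * sigma[i] for i, nu in challenge) % p
    return mu, tau

def verify(alpha, betas, challenge, mu, tau):
    """By linearity, tau must equal alpha * mu + sum nu_i * beta_i (mod p)."""
    return tau == (alpha * mu + sum(nu * betas[i] for i, nu in challenge)) % p

blocks = [secrets.randbelow(p) for _ in range(8)]      # message blocks m_i
alpha, betas = keygen(len(blocks))
sigma = tag(alpha, betas, blocks)
chal = [(i, secrets.randbelow(p)) for i in (1, 4, 6)]  # indices + coefficients
mu, tau = respond(blocks, sigma, chal)
assert verify(alpha, betas, chal, mu, tau)
```

Note that the response is two field elements regardless of how many blocks are challenged, which is exactly the communication-complexity reduction that homomorphic authenticators provide.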

Bowers, Juels and Oprea [8] extended the idea of Juels and Kaliski [11] and used error-correcting codes. The main difference in their construction is that they use the idea of an “outer” and an “inner” code (in the same vein as concatenated codes), to get a good balance between the extra storage overhead and computational overhead in responding to the audits.
Dodis, Vadhan and Wichs [10] provided the first example of an unconditionally secure PoR scheme, also constructed from an error-correcting code, with extraction performed through *list decoding* in conjunction with the use of an *almost-universal hash function*. They also give different constructions depending on the computational capabilities of the server.
Previously [15], we studied PoR schemes in the setting of unconditional security and showed some close connections to error-correcting codes.

Recently, Ateniese, Kamara and Katz [5] defined the framework of *proof-of-storage systems* to understand PDP and PoR systems in a unified manner. They argue that existing PoR [16] and PDP [2] schemes can be seen as instantiations of their framework.
They used *homomorphic identification schemes* to give efficient proof-of-storage systems in the *random-oracle model*. They further exhibited that existing constructions of PoR and PDP schemes are specific instantiations of their construction. Wang et al. [19] gave the first privacy-preserving publicly auditable proof-of-storage system. We refer the reader to the survey by Kamara and Lauter [12] regarding the architecture of proof-of-storage systems.

### Distributed cloud computing.

All the constructions mentioned above considered single-server systems; however, such systems are prone to failures leading to catastrophic problems [20].
Proof-of-storage systems have also been studied in the setting where there is more than one server or more than one client. The first such setting was studied by Curtmola et al. [9]. They studied a multiple-replica PDP system, which is the natural generalization of a single-server PDP system to *t* servers.

Bowers, Juels and Oprea [8] introduced a distributed system that they called HAIL. Their system allows a set of provers to prove the integrity of a file stored by a client. The idea in HAIL is to exploit the cross-prover redundancy. They considered an active and mobile adversary that can corrupt the whole set of provers.

Recently, Ateniese et al. [3] considered the problem from the client side, where *n* clients store their respective files on a single prover in a manner such that the verification of the integrity of a single client’s file simultaneously gives the integrity guarantee of the files of all the participating clients. They called such a system an *entangled cloud storage.*

### 1.3 Comparison with Bowers, Juels and Oprea

The focus of this paper is PoR systems in the distributed setting; therefore, we only compare our work with existing works in the distributed setting. The scheme of Curtmola et al. [9] only considers multiple replicas of the same underlying PDP system, while the construction of Ateniese et al. [3] is for the multiple-client setting. In other words, the scheme of Bowers, Juels and Oprea [8] is closest to ours. However, there are a few key differences.

- (i) The construction of Bowers, Juels and Oprea [8] is secure only in the computational setting, while we provide security in the setting of unconditional security.
- (ii) Bowers, Juels and Oprea [8] use various tools and algorithms to construct their systems, including error-correcting codes, pseudo-random functions, message authentication codes and universal hash function families. On the other hand, we only use ramp schemes in our constructions, making our schemes easier to state and analyze, and arguably simpler to implement.
- (iii) We consider two types of security guarantees, namely, the worst-case scenario and the average-case scenario. On the other hand, Bowers, Juels and Oprea [8] only consider the worst-case scenario.
- (iv) The construction of Bowers, Juels and Oprea [8] only aims to protect the integrity of the message, while we consider both the privacy and the integrity of the message. Privacy of data has emerged as an important requirement in cloud storage due to recent attacks [21].
- (v) We work under a stronger requirement than [8]: we require extraction to succeed with probability equal to 1, whereas in [8], extraction succeeds with probability close to 1, depending in part on properties of a certain class of hash functions used in the protocol.

We use the term Prover for any server that stores the file of a client. We use the term Verifier for any entity that verifies whether or not the file of a client is stored properly by the server. We also assume that a file is composed of message blocks of an appropriate fixed length. If the file consists of a single block, we simply refer to that block as the file.

## 2 Security model of multi-server PoR systems

The essential components of multi-server PoR (MPoR) systems are natural generalizations of single-server PoR systems. The first difference is that there are ρ provers and the Verifier might store different messages on each of them. Also, during an audit phase, the Verifier can pick a subset of provers on which it runs the audits. The last crucial difference is that the Extractor has (black-box or non-black-box) access to a subset of proving algorithms corresponding to the provers that the Verifier picked to audit. We detail them below for the sake of completeness.

Let *m* be the message that the Verifier wishes to store, and let $M_1, \dots, M_\rho$ denote the (possibly different) encoded messages intended for the ρ provers. An MPoR scheme then proceeds as follows.

- (i) In the keyed setting, the Verifier picks ρ different keys $(K_1, \dots, K_\rho)$, one for each of the corresponding provers.
- (ii) The Verifier gives $M_i$ to $\mathsf{Prover}_i$. In the case of a keyed scheme, $\mathsf{Prover}_i$ may also be given an additional tag $S_i$, generated using the key $K_i$ and $M_i$.
- (iii) The Verifier stores some sort of information (say, a *fingerprint* of the encoded message) which allows him to verify the responses made by the provers.
- (iv) On receiving the encoded message $M_i$, $\mathsf{Prover}_i$ generates a proving algorithm $\mathcal{P}_i$, which it uses to generate its responses during the auditing phase.
- (v) At any time, the Verifier picks an index *i*, where $1 \le i \le \ell$, and engages in a challenge-response protocol with $\mathsf{Prover}_i$. In one execution of the challenge-response protocol, the Verifier picks a challenge *c* and gives it to $\mathsf{Prover}_i$, and the prover responds. The Verifier then verifies the correctness of the response (based on its fingerprint).
- (vi) The success probability $\mathrm{succ}(\mathcal{P}_i)$ is the probability, computed over all the challenges, with which the Verifier accepts the response sent by $\mathsf{Prover}_i$.
- (vii) The Extractor is given a subset *S* of the proving algorithms $\mathcal{P}_1, \dots, \mathcal{P}_\rho$ (and, in the case of a keyed scheme, the corresponding subset of keys $\{K_i : i \in S\}$) and outputs a message $\hat{m}$. The Extractor succeeds if $\hat{m} = m$.

The above framework does not restrict any provers from interacting with other provers when they receive the encoded message. However, we assume that they do not interact *after* they have generated a proving algorithm. If we do not include this restriction, then it is not hard to see that one cannot have any meaningful protocol. For example, if provers can interact after they receive the encoded message, then it is possible that one prover stores the entire message and the other provers just relay the challenges to this specific prover and relay back its response to the verifier.
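To make steps (i)–(vii) above concrete, here is a minimal sketch of the audit flow in which each prover's algorithm is derived deterministically from its encoded message. The toy response function, the challenge space and all names are our own illustration, not a scheme from this paper.

```python
import secrets

CHALLENGES = list(range(32))

def make_prover(encoded_msg):
    """Step (iv): the prover fixes a proving algorithm P_i from its message M_i."""
    def proving_algorithm(c):
        return hash((encoded_msg, c)) & 0xFFFF  # toy response function r^{M_i}(c)
    return proving_algorithm

def success_probability(fingerprint_msg, prover):
    """Step (vi): fraction of challenges on which the Verifier would accept."""
    reference = make_prover(fingerprint_msg)  # Verifier recomputes the response
    ok = sum(prover(c) == reference(c) for c in CHALLENGES)
    return ok / len(CHALLENGES)

messages = [f"M_{i}" for i in range(5)]      # rho = 5 encoded messages
provers = [make_prover(M) for M in messages]

# Step (v): audit a randomly chosen prover with a random challenge.
i = secrets.randbelow(len(provers))
c = secrets.choice(CHALLENGES)
assert provers[i](c) == make_prover(messages[i])(c)

# Honest provers succeed on every challenge.
assert all(success_probability(M, P) == 1.0 for M, P in zip(messages, provers))
```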

In contrast to a single-prover PoR scheme, there are two possible ways in which one can define the security of a multiple-prover PoR system. We define them next.

The first security definition corresponds to the “worst case” scenario and is the natural generalization of a single-server PoR system.

A ρ-prover MPoR scheme is $(\eta, \nu, \tau, \rho)$-threshold secure if there is an Extractor which, when given any τ proving algorithms, say $\mathcal{P}_{i_1}, \dots, \mathcal{P}_{i_\tau}$, satisfying $\mathrm{succ}(\mathcal{P}_{i_j}) \ge \eta$ for all $1 \le j \le \tau$, outputs the message *m* with probability at least $1 - \nu$.

We note that when $\rho = \tau = 1$, this definition reduces to that of a single-server PoR scheme.

The above definition requires that all the τ servers succeed with high enough probability. On the other hand, it might not be the case that all the proving algorithms of the servers picked by the Verifier succeed with the required probability. In fact, even verifying whether or not all the τ proving algorithms have high enough success probability to allow successful extraction might be difficult (see, for example, [15] for more details). However, it is possible that some of the proving algorithms succeed with high enough probability to compensate for the failure of the rest of the proving algorithms. For instance, since the provers are allowed to interact before they specify their proving algorithms, it might be the case that the colluding provers decide to store most of the message on a single prover. In this case, even a weaker guarantee that the average success probability is high enough might be sufficient to guarantee a successful extraction. In other words, it is possible to state (and, as we show in this paper, achieve) a security guarantee with respect to the average-case success probability over all the proving algorithms.

A ρ-prover MPoR scheme is $(\eta, \nu, \rho)$-average secure if the Extractor succeeds with probability at least $1 - \nu$ whenever the average success probability satisfies $\frac{1}{\rho}\sum_{i=1}^{\rho} \mathrm{succ}(\mathcal{P}_i) \ge \eta$.

Note that the average-case secure system reduces to the standard PoR scheme when $\rho = 1$.

Suppose

### Privacy guarantee.

We mentioned at the start of this section that PoR systems were introduced and studied to give assurance of the integrity of the data stored on remote storage. However, the confidentiality aspects of data have not been studied formally in the area of cloud-based PoR systems. A couple of ad hoc solutions have been proposed in which the messages are encrypted and then stored on the cloud [9]. We believe that, in addition to the standard integrity requirement, privacy of the stored data when multiple provers are involved is also an important requirement. We model the privacy requirement as follows:

An MPoR system is called *t*-private if no set of at most *t* provers learns anything about the message stored by the Verifier.

Note that even if *t* provers collude, they do not learn anything about the message. We remark that we can achieve confidentiality without encrypting the message by using secret sharing techniques.

### Notation.

We fix the letter *m* for the original message and *M* for the encoded message.
We fix ν to denote the failure probability of the extractor and η to denote the success probability of a proving algorithm. We use *n* to denote the number of message blocks, assuming the underlying PoR system breaks the message into blocks.

## 3 Primitives used in this paper

### 3.1 Ramp schemes

In our construction, we use a primitive related to secret sharing schemes known as *ramp schemes*. A *secret sharing scheme* allows a trusted dealer to share a secret between *n* players so that certain subsets of players can reconstruct the secret from the shares they hold [6, 17].

It is well known that the size of each player’s share in a secret sharing scheme must be at least the size of the secret. If the secret that is to be shared is large, then this constraint can be very restrictive. Schemes for which we can get a certain form of trade-off between share size and security are known as *ramp schemes* [7].

Let $\tau_1 < \tau_2 \le n$ be positive integers. A $(\tau_1, \tau_2, n)$-*ramp scheme* is a pair of algorithms, say $\mathsf{Share}$ and $\mathsf{Reconstruct}$, where $\mathsf{Share}$ takes a secret $\mathsf{S}$ and produces *n* shares, one for each of the *n* players, such that the following two properties hold:

- (i) Reconstruction: Any subset of $\tau_2$ or more players can pool together their shares and use $\mathsf{Reconstruct}$ to compute the secret $\mathsf{S}$ from the shares that they collectively hold.
- (ii) Secrecy: No subset of $\tau_1$ or fewer players can determine any information about the secret $\mathsf{S}$.

Suppose the dealer wishes to set up a *f* as well as the secret. However, if three players pool together their shares, then they can learn some partial information about one of the other player’s share. For concreteness, let

For completeness, we review some of the basic theory concerning the construction of ramp schemes. Linear codes have been used to construct ramp schemes for over thirty years since the work of McEliece and Sarwate [13]. We will consider a construction from an arbitrary code in this paper. The following relation between an arbitrary code (linear or non-linear) and a ramp scheme was shown by Paterson and Stinson [14].

*Let $C$ be a code of length $N$, distance $d$ and dual distance ${d}^{\perp}$. Let $1\le s<{d}^{\perp}-2$. Then there is a $({\tau}_{1},{\tau}_{2},N-s)$-ramp scheme, where ${\tau}_{1}={d}^{\perp}-s-1$ and ${\tau}_{2}=N-d+1$.*

Here *rate* of the ramp scheme. If *C* with dimension *k*, then

The construction of a ramp scheme from a code is as follows. Let *C* be a code of length

One can use a Reed–Solomon code to construct a ramp scheme [13]. Let *q* be a prime and *q*, there is an
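The following is a minimal sketch of a $(\tau_1, \tau_2, n)$-ramp scheme built from polynomial interpolation, a "packed" variant of Shamir's scheme in the spirit of the Reed–Solomon construction of McEliece and Sarwate [13]. The field size, the choice of evaluation points and the function names are our own illustrative choices, not the paper's.

```python
import secrets

q = 257  # a small prime field F_q, for illustration

def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial over F_q through `points`."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * ((x - xj) % q) % q
                den = den * ((xi - xj) % q) % q
        total = (total + yi * num * pow(den, q - 2, q)) % q  # den^-1 via Fermat
    return total

def share(secret, tau1, n):
    """Split `secret` (a list of L field elements) into n shares; tau2 = tau1 + L."""
    L = len(secret)
    # Fix a degree-(tau2 - 1) polynomial by its values at L "secret points"
    # q-1, ..., q-L and at tau1 points carrying fresh random values.
    pts = [(q - 1 - j, secret[j]) for j in range(L)]
    pts += [(q - L - 1 - j, secrets.randbelow(q)) for j in range(tau1)]
    return [(x, lagrange_eval(pts, x)) for x in range(1, n + 1)]  # share x is f(x)

def reconstruct(shares, L, tau2):
    """Any tau2 shares determine the polynomial, hence the L secret values."""
    pts = list(shares)[:tau2]
    return [lagrange_eval(pts, q - 1 - j) for j in range(L)]

# A (2, 5, 8)-ramp scheme sharing a 3-element secret: any 5 shares reconstruct.
secret = [10, 20, 30]
shares = share(secret, tau1=2, n=8)
assert reconstruct(shares[2:7], L=3, tau2=5) == secret
assert reconstruct(shares[-5:], L=3, tau2=5) == secret
```

Each share is a single field element even though the secret consists of $L = \tau_2 - \tau_1$ elements, which is exactly the share-size versus security trade-off that distinguishes ramp schemes from ordinary secret sharing.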

### 3.2 Single-prover PoR system

We start by fixing some notation for PoR schemes that we use throughout the paper. Let Γ be the *challenge space*, and let Δ be the *response space*. We denote by $r^M(c)$ the *response function*, i.e., the correct response determined by the encoded message *M* and the challenge *c*.

For an encoded message *M*, define the *response vector* $r^M = (r^M(c) : c \in \Gamma)$. Finally, define the *response code*
of the scheme to be the set of all response vectors $r^M$, as *M* ranges over all valid encoded messages.

The codewords in

*Suppose that *

*m*.

If we cast this in the security model defined in Section 2 (Definition 2.1 and Definition 2.2), then we have the following theorem.

*Suppose that *

Previously [15], we gave a modified version of the Shacham–Waters scheme and showed that it is secure in the unconditional security setting. We argued that, in the setting of unconditional security, any keyed PoR scheme should be considered to be secure when the success probability of the proving algorithm

*Let *

*where *

*Then there exists an Extractor that always outputs *

## 4 Worst-case MPoR based on ramp scheme

In this section, we give our first construction that achieves a worst-case security guarantee. The idea is to use a

We first present a schematic diagram of the working of an MPoR in Figure 2 and illustrate the scheme with the help of the following example. We provide the details of the construction in Figure 3.

Let

Let us suppose that the PoR scheme is such that, for a random challenge vector of dimension ρ, say *i*-th entry would be a challenge to

During the audit phase, the Verifier picks any four provers and sends the challenges to the provers. Once all the provers that he chose reply, he verifies their response. For example, suppose the Verifier picks

We note one possible practical deployment of the Ramp-MPoR scheme stated in Figure 3. Let *m* be a message that consists of *sk* elements, viewed as *k* blocks of length *s* each. The Verifier invokes a ramp scheme to generate *n* shares of each of the *k* blocks. The Verifier then runs a PoR scheme Π to compute the encoded message to be stored on each of the servers by encoding its *k* shares, one corresponding to each of the *k* blocks.

We prove the following security result for the MPoR scheme presented in Figure 3.

*Let Π be an *

- (i) *Privacy: Ramp-MPoR is $\tau_1$-private.*
- (ii) *Security: Ramp-MPoR is $(\eta, 0, \tau_2, \rho)$-threshold secure, where $\eta = 1 - \tilde{\mathsf{d}}/2\gamma$.*

The privacy guarantee of Ramp-MPoR is straightforward from the privacy property of the underlying ramp scheme.

For the security guarantee, we need to demonstrate an Extractor that outputs the message *m* whenever the $\tau_2$ chosen servers succeed with probability at least η. The Extractor proceeds as follows.

- (i) The Extractor chooses $\tau_2$ provers and runs the extraction algorithm of the underlying single-server PoR system on each of these provers. In the end, it outputs $\hat{M}_{i_j}$ for the corresponding provers $\mathsf{Prover}_{i_j}$. It defines $\mathcal{S} \leftarrow \{\hat{M}_{i_1}, \dots, \hat{M}_{i_{\tau_2}}\}$.
- (ii) The Extractor invokes the $\mathsf{Reconstruct}$ algorithm of the underlying ramp scheme with the elements of $\mathcal{S}$. It outputs whatever $\mathsf{Reconstruct}$ outputs.

Now note that the Verifier interacts with every prover through the underlying single-server PoR scheme. Therefore, if each of the $\tau_2$ chosen provers succeeds with probability at least η, the extraction guarantee of the single-server scheme implies that every $\hat{M}_{i_j}$ is a correct share, and the reconstruction property of the ramp scheme then yields the original message *m*.
∎

As a special case of the above, we get a simple MPoR system which uses a *replication code*. A replication code has an encoding function

This is the setting considered by Curtmola et al. [9].

We call a Ramp-MPoR scheme based on a replication code a Rep-MPoR. The schematic description of the scheme is presented in Figure 4, and the scheme is presented in Figure 5. Since a ρ-replication code is a

*Let Π be an *

- (i) *Privacy: It is* 0*-private.*
- (ii) *Security: It is $(\eta, 0, 1, \rho)$-threshold secure, where $\eta = 1 - \tilde{\mathsf{d}}/2\gamma$.*

The issue with the Rep-MPoR scheme is that it provides no confidentiality of the file. We will come back to this issue later in Section 6.1.

## 5 Average-case secure MPoR system

In general, it is not possible to verify with certainty whether the success probability of a proving algorithm is above a certain threshold; in that case, it is unclear how the Extractor would know which proving algorithms to use for extraction as described in Section 4. In this section, we analyze the average-case security properties of the replication-code-based scheme, Rep-MPoR, described in the last section. This provides an alternative guarantee under which extraction succeeds without the Extractor needing to know whether any particular proving algorithm succeeds with high enough probability.

Recall the scenario introduced in Example 2.3. Here we assumed

*Let Π be a single-server PoR system with a response code of Hamming distance *

- (i) *Privacy: Rep-MPoR is* 0*-private.*
- (ii) *Security: Rep-MPoR is $(1 - \tilde{\mathsf{d}}/2\gamma, 0, \rho)$-average secure.*

Since the message is stored in its entirety on each of the servers, there is no confidentiality.

For the security guarantee, we need to demonstrate an Extractor that outputs the message *m* whenever the average success probability of the proving algorithms is at least $1 - \tilde{\mathsf{d}}/2\gamma$. The Extractor proceeds as follows.

- (i) For all $1 \le i \le \rho$, use $\mathcal{P}_i$ to compute the vector $R_i = (r_c^{(i)} : c \in \Gamma)$, where $r_c^{(i)} = \mathcal{P}_i(c)$ for all $c \in \Gamma$ (i.e., for every challenge *c*, $r_c^{(i)}$ is the response computed by $\mathcal{P}_i$ when it is given the challenge *c*).
- (ii) Compute *R* as the concatenation of $R_1, \dots, R_\rho$ and find $\hat{M} := (\hat{M}_1, \dots, \hat{M}_\rho)$ so that $\mathrm{dist}(R, r^{\hat{M}})$ is minimized.
- (iii) Compute $m = e^{-1}(\hat{M})$.

Now note that the Verifier interacts with each

First note that, in the case of Figure 5, the response code is of the form

It is easy to see that the distance of the response code is

### 5.1 Hypothesis testing for Rep-MPoR

For the purposes of auditing whether a file is being stored appropriately, it is necessary to have a mechanism for determining whether the success probability of a prover is sufficiently high. In the case of the replication-code-based MPoR with worst-case security, we are interested in the success probabilities of individual provers, and the analysis can be carried out as detailed in [15]. In the case of Rep-MPoR, however, we wish to determine whether the *average* success probability of the set of provers is sufficiently high. This corresponds to testing the null hypothesis

from the alternative hypothesis

Suppose we send *c* challenges to each server. If each server fails its challenges independently with its own failure probability, then the total number of failures follows a *Poisson binomial distribution*, which is more complicated to work with. In order to establish a test that is conceptually and computationally easier to apply, we will instead rely on the observation that, in cases where the average success probability is high enough to permit extraction, the failure rates of the servers are relatively low.

For a given server that fails each of *r* challenges independently with probability *p*, the number of failures follows the binomial distribution $\mathrm{Bin}(r, p)$. When *r* is sufficiently large and *p* is small, this is well approximated by the *Poisson distribution* with parameter $\lambda = rp$, for which the probability of observing *k* events is given by $e^{-\lambda}\lambda^k/k!$.

The mean and variance of the Poisson distribution are both equal to λ.
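The quality of this approximation is easy to check directly. The sketch below compares the exact binomial tail with its Poisson approximation ($\lambda = rp$); the concrete parameters (five servers, 200 challenges each, failure probability 0.01) are our own illustrative choices.

```python
from math import comb, exp, factorial

def binom_cdf(b, r, p):
    """P[X <= b] for X ~ Bin(r, p): at most b failures in r challenges."""
    return sum(comb(r, k) * p**k * (1 - p)**(r - k) for k in range(b + 1))

def poisson_cdf(b, lam):
    """P[X <= b] for X ~ Poisson(lam)."""
    return exp(-lam) * sum(lam**k / factorial(k) for k in range(b + 1))

# Five servers, 200 challenges each, every challenge failing with probability
# 0.01: the total failure count is Bin(1000, 0.01), approximated by Poisson(10).
r, p = 1000, 0.01
for b in (5, 10, 20):
    print(b, round(binom_cdf(b, r, p), 6), round(poisson_cdf(b, r * p), 6))
```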

To demonstrate the appropriateness of the Poisson approximation for this application, suppose we have five servers with given failure probabilities. Let *t* be the number of trials per server and *b* the total number of observed failures. Table 1 compares the exact cumulative probability of observing at most *b* failures with the Poisson approximation.

Table 1. Comparison between exact cumulative probability and approximation by Poisson distribution.

| t | b | Exact | Poisson |
|---|---|---|---|
| 200 | 5 | | |
| 200 | 10 | | |
| 200 | 50 | | |
| 200 | 100 | 0.5265990813 | 0.5265622074 |
| 100 | 0 | | |
| 100 | 5 | | |
| 100 | 10 | | |
| 100 | 15 | | |
| 100 | 20 | 0.000001235187232 | |
| 200 | 0 | | |
| 200 | 5 | | |
| 200 | 10 | | |
| 200 | 15 | | |
| 200 | 20 | | |
| 500 | 20 | | |
| 500 | 25 | | |
| 500 | 30 | | |
| 500 | 35 | | |
| 500 | 40 | | |
| 200 | 5 | 0.06613951161 | 0.06708596299 |
| 200 | 10 | 0.5830408032 | 0.5830397512 |
| 200 | 20 | 0.9985035184 | 0.9984117410 |
| 200 | 50 | | |
| 200 | 5 | | |
| 200 | 10 | | |
| 200 | 20 | 0.09020056729 | 0.1076778797 |
| 200 | 50 | 0.9999999198 | 0.9999991415 |
| 200 | 5 | | |
| 200 | 10 | 0.00006809921297 | 0.00008550688580 |
| 200 | 20 | 0.06901537242 | 0.07274102693 |
| 200 | 50 | 0.9999582547 | 0.9999397284 |
| 20 | 0 | 0.00002656139888 | 0.00004539992984 |
| 20 | 5 | 0.05757688648 | 0.06708596299 |
| 20 | 10 | 0.5831555123 | 0.5830397512 |
| 20 | 15 | 0.9601094730 | 0.9512595983 |
| 20 | 20 | 0.9991924263 | 0.9984117410 |
| 40 | 0 | | |
| 40 | 5 | 0.00003871193246 | 0.00007190884076 |
| 40 | 10 | 0.008071249954 | 0.01081171886 |
| 40 | 15 | 0.1430754340 | 0.1565131351 |
| 40 | 20 | 0.5591747822 | 0.5590925860 |
| 100 | 20 | 0.000001235187232 | |
| 100 | 25 | 0.00003540113222 | 0.00007160717427 |
| 100 | 30 | 0.001002549708 | 0.001594027332 |
| 100 | 35 | 0.01231948910 | 0.01621388016 |
| 100 | 40 | 0.07508928967 | 0.08607000083 |
| 20 | 0 | 0.3660323413 | 0.3678794412 |
| 20 | 5 | 0.9994654657 | 0.9994058153 |
| 20 | 10 | 0.9999999939 | 0.9999999900 |
| 20 | 15 | 1.000000000 | 1.000000000 |
| 20 | 20 | 1.000000000 | 1.000000000 |
| 40 | 0 | 0.1339796748 | 0.1353352833 |
| 40 | 5 | 0.9839770930 | 0.9834363920 |
| 40 | 10 | 0.9999931182 | 0.9999916922 |
| 40 | 15 | 0.9999999996 | 1.000000000 |
| 40 | 20 | 0.9999999999 | 1.000000000 |
| 100 | 20 | 0.9999999367 | 0.9999999198 |
| 100 | 25 | 0.9999999999 | 1.000000001 |
| 100 | 30 | 0.9999999999 | 1.000000001 |
| 100 | 35 | 0.9999999999 | 1.000000001 |
| 100 | 40 | 0.9999999999 | 1.000000001 |
| 20 | 0 | 0.08936904038 | 0.09536916225 |
| 20 | 5 | 0.9712600336 | 0.9672561739 |
| 20 | 10 | 0.9999843669 | 0.9999642885 |
| 20 | 15 | 0.9999999995 | 0.9999999958 |
| 20 | 20 | 1.000000000 | 1.000000000 |
| 40 | 0 | 0.007986825382 | 0.009095277109 |
| 40 | 5 | 0.6699740391 | 0.6684384858 |
| 40 | 10 | 0.9927425867 | 0.9909776597 |
| 40 | 15 | 0.9999835852 | 0.9999661876 |
| 40 | 20 | 0.9999999935 | 0.9999999715 |
| 100 | 20 | 0.9999999935 | 0.9999999715 |
| 100 | 25 | 0.9999999998 | 1.000000001 |
| 100 | 30 | 0.9999999998 | 1.000000001 |
| 100 | 35 | 0.9999999998 | 1.000000001 |
| 100 | 40 | 0.9999999998 | 1.000000001 |

As an example of using the given formula to calculate a confidence interval, suppose we do 200 trials on each of five servers (so there are 1000 trials in total), and we observe 50 failures in total. Then the resulting confidence interval is

Let *b* denote the total number of incorrect responses we have received from the servers. Under the null hypothesis, the Poisson approximation gives the probability that the number of failures is at most *b*.

If this probability is less than 0.05, we reject the null hypothesis.

We can express this test neatly using a *confidence interval*. We define a 95 % upper confidence bound by

This represents the smallest parameter choice for the Poisson distribution for which the probability of obtaining *b* or fewer incorrect responses is less than 0.05. Then

and so the appropriate value of
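The upper confidence bound can be computed numerically: it is the smallest Poisson parameter λ for which the probability of seeing *b* or fewer failures drops below 0.05, and since that probability is monotone decreasing in λ, bisection suffices. The function names and the worked numbers below are our own illustration, not values taken from the paper.

```python
from math import exp, factorial

def poisson_cdf(b, lam):
    """P[X <= b] for X ~ Poisson(lam)."""
    return exp(-lam) * sum(lam**k / factorial(k) for k in range(b + 1))

def upper_confidence_bound(b, alpha=0.05):
    """Smallest lam with P[X <= b] < alpha (the CDF is decreasing in lam)."""
    lo, hi = 0.0, 10.0 * (b + 10)
    for _ in range(100):
        mid = (lo + hi) / 2
        if poisson_cdf(b, mid) < alpha:
            hi = mid  # mid is already rejectable: the bound can shrink
        else:
            lo = mid
    return hi

# Example: 50 failures observed over 1000 challenges in total; the bound on
# the per-challenge failure rate is lambda_0.95(50) divided by the trial count.
b, trials = 50, 1000
lam = upper_confidence_bound(b)
print(f"lambda_0.95({b}) ~ {lam:.2f}; failure-rate bound ~ {lam / trials:.4f}")
assert poisson_cdf(b, lam) < 0.05 <= poisson_cdf(b, lam - 0.01)
```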

We give a comparison between exact cumulative probability and approximation by Poisson distribution in Table 1.

## 6 Optimization using the keyed Shacham–Waters scheme

In the last three sections, we gave constructions of MPoR schemes using ramp schemes, linear secret-sharing schemes, replication codes and a single-prover PoR system. In this section, we show a specific instantiation of our scheme using the keyed scheme of Shacham and Waters [15, 16] as the single-server PoR system.

### 6.1 Extension of the keyed Shacham–Waters scheme to MPoR

If we instantiate the Rep-MPoR scheme (described in Section 4) with the modified Shacham–Waters scheme of [15], then we need one key that consists of

*Let Π be an *

- (i) *Privacy: It is* 0*-private.*
- (ii) *Security: It is $(\eta, 0, 1, \rho)$-threshold secure, where $\eta = 1 - \frac{\tilde{\mathsf{d}}(q-1)}{2\gamma q}$.*
- (iii) *Storage Overhead: The* Verifier *needs to store $n+1$ field elements, and every $\mathsf{Prover}_i$ needs to store $2n$ field elements.*

The results follow by combining Theorem 3.8 with Corollary 4.3. ∎

The issue with the Rep-MPoR scheme is that it provides no confidentiality of the file. In what follows, we improve the privacy guarantee of the MPoR scheme described above. Our starting point is an instantiation of the Ramp-MPoR scheme, defined in Figure 3, with the Shacham–Waters scheme. We then reduce the storage on the Verifier in two steps.

### 6.2 Optimized version of the multi-server Shacham–Waters scheme

We follow two steps to get an MPoR scheme based on the Shacham–Waters scheme with a reduced storage requirement for the Verifier, while improving the confidentiality guarantee.

- (i) In the first step, stated in Theorem 6.2, we improve the privacy guarantee of the MPoR scheme to get a ${\tau}_{1}$-private MPoR scheme (where ${\tau}_{1}<\rho $ is an integer). The Verifier in this scheme has to store $\rho \left(n+1\right)$ field elements. When the underlying field is ${\mathbb{F}}_{q}$, the Verifier has to store $\rho \left(n+1\right)\mathrm{log}q$ bits.
- (ii) In the second step, stated in Theorem 6.3, we reduce the storage requirement of the Verifier from $\rho \left(n+1\right)$ to ${\tau}_{1}\left(n+1\right)$ field elements for some integer ${\tau}_{1}<\rho $, without affecting the privacy guarantee. When the underlying field is ${\mathbb{F}}_{q}$, the Verifier has to store ${\tau}_{1}\left(n+1\right)\mathrm{log}q$ bits.
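To make the savings concrete, here is a small sketch with hypothetical parameters (the values of $\rho$, ${\tau}_{1}$, $n$ and $q$ below are ours, not the paper's, and the function name is our own) comparing the Verifier's storage in the two steps:

```python
import math

def verifier_storage_bits(num_keys, n, q):
    # Each key is an (n+1)-tuple of F_q elements, each taking log2(q) bits.
    return num_keys * (n + 1) * math.log2(q)

# Hypothetical parameters: rho = 20 provers, privacy threshold tau1 = 3,
# n = 1000 message blocks, field size q = 2**61 - 1 (a Mersenne prime).
rho, tau1, n, q = 20, 3, 1000, 2**61 - 1

step1 = verifier_storage_bits(rho, n, q)   # rho*(n+1)*log q bits (Step 1)
step2 = verifier_storage_bits(tau1, n, q)  # tau1*(n+1)*log q bits (Step 2)
print(step1 / 8e3, step2 / 8e3)  # storage in kilobytes
```

The ratio of the two storage costs is exactly $\rho /{\tau}_{1}$, independent of $n$ and $q$.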

### Step 1.

To improve the privacy guarantee of Corollary 6.1 to say,

We now give more details. Let *n*-tuple in

The verifier gives

*Let Π be an *

- (i) *Privacy: Basic-MPoR is ${\tau}_{1}$-private.*
- (ii) *Security: Basic-MPoR is $(\eta ,0,{\tau}_{2},\rho )$-threshold secure, where $\eta =1-\frac{\tilde{\mathsf{d}}\left(q-1\right)}{2\gamma q}$.*
- (iii) *Storage Overhead: The Verifier needs to store $\rho \left(n+1\right)$ field elements and every $\mathsf{Prover}_{i}$ needs to store $2n$ field elements.*

In the construction mentioned above, the Verifier needs to store

### Step 2.

In this step, we improve the above-described MPoR scheme to achieve a considerable reduction in the storage requirement of the Verifier. The resulting scheme also provides unbounded audit capability against computationally unbounded adversarial provers, and it also ensures

The main observation behind the reduction in the Verifier's storage requirement is that we can partially derandomize the keys generated by the Verifier. We use one of the most common techniques in derandomization. The keys in this scheme are generated by^{1}
Our construction works as follows: We pick

for

Figure 6 is the formal description of this scheme. For the scheme described in Figure 6, we prove the following result.

*Let *

- (i) *Privacy: SW-MPoR is ${\tau}_{1}$-private.*
- (ii) *Security: SW-MPoR is $(\eta ,0,{\tau}_{2},\rho )$-threshold secure, where $\eta =1-\frac{\tilde{\mathsf{d}}\left(q-1\right)}{2\gamma q}$.*
- (iii) *Storage Overhead: The Verifier needs to store ${\tau}_{1}\left(n+1\right)$ field elements, and every $\mathsf{Prover}_{i}$ (for $1\le i\le \rho $) needs to store $2n$ field elements.*

The privacy guarantee of SW-MPoR is straightforward from the secrecy property of the underlying ramp scheme.

For the security guarantee, we have to show an explicit construction of the Extractor that, on input proving algorithms *m* if

We now prove that any set of *n*-tuple encoded message, the key is a tuple of *q* possible keys – the value of *a* determines the *n*-tuple *a* is completely undetermined. In the MPoR case, we have ρ keys. Each prover in a given set of *q* possible keys, as discussed above. However, it is conceivable that they can use their collective knowledge to learn something about the keys. In what follows, we show that they cannot determine any additional information about their keys by combining the information they hold collectively.

Let

Let *q* be a prime number, and let

Since

By construction, the provers in *I* cannot determine any additional information about their keys by combining the information they hold collectively.
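This derandomization argument can be illustrated with a toy computation. The sketch below assumes keys are derived by evaluating a random polynomial of degree ${\tau}_{1}-1$ over ${\mathbb{F}}_{q}$ at $\rho$ distinct points, the textbook ${\tau}_{1}$-wise independence construction; the paper's exact key-generation procedure is the one in Figure 6, and the function names here are ours. For simplicity each derived key is a single field element rather than an $(n+1)$-tuple. Any ${\tau}_{1}$ of the derived keys are jointly uniform, yet the Verifier stores only the ${\tau}_{1}$ coefficients, since any ${\tau}_{1}$ keys determine all the others by Lagrange interpolation:

```python
import random

def derive_keys(coeffs, q, rho):
    # Key for Prover x (x = 1..rho) is the polynomial evaluated at x mod q.
    return [sum(c * pow(x, j, q) for j, c in enumerate(coeffs)) % q
            for x in range(1, rho + 1)]

def lagrange_eval(points, x0, q):
    # Evaluate the unique degree-(len(points)-1) polynomial through
    # `points` at x0, over the prime field F_q (inverses via Fermat).
    total = 0
    for i, (xi, yi) in enumerate(points):
        num = den = 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x0 - xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, q - 2, q)) % q
    return total

q, rho, tau1 = 101, 5, 3
coeffs = [random.randrange(q) for _ in range(tau1)]  # tau1 stored coefficients
keys = derive_keys(coeffs, q, rho)                   # rho derived prover keys

# Any tau1 keys determine the rest: interpolate through keys 1..3, predict key 4.
predicted = lagrange_eval(list(zip(range(1, tau1 + 1), keys[:tau1])), tau1 + 1, q)
assert predicted == keys[tau1]
```

The same interpolation, run by a coalition of fewer than ${\tau}_{1}$ provers, is underdetermined: their evaluations are consistent with every possible value of any other key, which is the privacy claim above.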

We now complete the security proof by describing an Extractor that outputs the file if

- (i) The Extractor chooses ${\tau}_{2}$ provers and runs the extraction algorithm of the underlying single-server PoR system on each of these provers. In the end, it outputs ${\hat{M}}_{{i}_{j}}$ for the corresponding provers $\mathsf{Prover}_{{i}_{j}}$. It defines $\mathcal{S}\leftarrow \{{\hat{m}}_{{i}_{1}},\dots ,{\hat{m}}_{{i}_{{\tau}_{2}}}\}$. Note that the Extractor of the underlying PoR scheme has already computed ${e}^{-1}$ on the set $\{{\hat{M}}_{{i}_{1}},\dots ,{\hat{M}}_{{i}_{{\tau}_{2}}}\}$.
- (ii) The Extractor invokes the reconstruction algorithm of the ramp scheme on $\tilde{\mathcal{S}}$ to compute ${m}^{\prime}$.

Now note that the Verifier interacts with every *m*.

For the storage requirement, the Verifier has to store the coefficients of all the random polynomials

## 7 Conclusion and future works

In this paper, we studied PoR systems in which multiple provers are involved (MPoR). We motivated and defined the security of MPoR in the worst-case (Definition 2.1) and the average-case (Definition 2.2) settings, and extended the hypothesis-testing techniques used in the single-server setting [15] to the multi-server setting. We also motivated the study of confidentiality of the outsourced message. We gave MPoR schemes that are secure under both of these security definitions and provide reasonable confidentiality guarantees even when there is no restriction on the computational power of the servers. At the end of this paper, we looked at an optimized version of the MPoR system instantiated with the unconditionally secure version of the Shacham–Waters scheme [16]. We showed that, in the multi-server setting with computationally unbounded provers, one can overcome the limitation that the verifier needs to store as much secret information as the provers.

Our paper leaves several open problems. We list two of them below:

- (i) Our approach works in the privately verifiable setting, i.e., the entity that wishes to verify the validity of the stored data is the same entity that stored it. It would be interesting to see whether our schemes can be extended to the publicly verifiable setting.
- (ii) We assume that the provers do not interact with each other after they receive the encoded files. There is a vast literature on mitigating collusion. It would be interesting to see whether our schemes can be combined with recent advances in secure schemes against colluding players in the distributed setting to remove this assumption.

## Notation used in this paper

- *c*: challenge
- ${C}^{\perp}$: dual of a code *C*
- $\mathsf{d}$: distance of a code
- ${\mathsf{d}}^{*}$: distance of the response code
- ${\mathsf{d}}^{\perp}$: dual distance of a code
- $\mathrm{dist}$: Hamming distance between two vectors
- $\mathbf{G}$: generator matrix of a code
- *k*: length of a message
- *K*: key (in a keyed scheme)
- $\ell $: number of message-blocks
- *m*: message
- $m\left[i\right]$: *i*-th message block
- $\hat{m}$: message outputted by the Extractor
- $\mathcal{M}$: message space
- *M*: encoded message
- $M\left[i\right]$: *i*-th encoded message
- ${M}_{j}\left[i\right]$: *i*-th encoded message on $\mathsf{Prover}_{j}$
- ${\mathcal{M}}^{*}$: encoded message space
- *n*: number of provers
- *N*: codeword length
- ${\mathcal{P}}_{i}$: proving algorithm of the *i*-th Prover
- *q*: order of the underlying finite field
- *r*: response
- ${r}^{M}$: response vector for encoded message *M*
- *S*: tag (in a keyed scheme)
- $\mathrm{succ}\left(\mathcal{P}\right)$: success probability of proving algorithm $\mathcal{P}$
- ${\mathcal{R}}^{*}$: response code
- Γ: challenge space
- γ: number of possible challenges
- Δ: response space
- ρ: number of users
- τ: privacy threshold

Thanks to Andris Abakuks and Simon Skene for some helpful discussions of statistics.

## References

- [1]↑
G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, O. Khan, L. Kissner, Z. N. J. Peterson and D. Song, Remote data checking using provable data possession, ACM Trans. Inform. Sys. Security 14 (2011), Paper No. 12.

- [2]↑
G. Ateniese, R. C. Burns, R. Curtmola, J. Herring, L. Kissner, Z. N. J. Peterson and D. X. Song, Provable data possession at untrusted stores, Proceedings of the 14th ACM Conference on Computer and Communications Security, ACM, New York (2007), 598–609.

- [3]↑
G. Ateniese, Ö. Dagdelen, I. Damgård and D. Venturi, Entangled cloud storage, IACR Cryptology ePrint Archive (2012), https://eprint.iacr.org/2012/511.pdf.

- [4]↑
G. Ateniese, R. Di Pietro, L. V. Mancini and G. Tsudik, Scalable and efficient provable data possession, Proceedings of the 4th International Conference on Security and Privacy in Communication Networks, ACM, New York (2008), 1–9.

- [5]↑
G. Ateniese, S. Kamara and J. Katz, Proofs of storage from homomorphic identification protocols, Advances in Cryptology—ASIACRYPT 2009, Springer, Berlin (2009), 319–333.

- [6]↑
G. R. Blakley, Safeguarding cryptographic keys, Proceedings of the National Computer Conference, AFIPS, New York (1979), 313–317.

- [7]↑
G. R. Blakley and C. Meadows, Security of ramp schemes, Advances in Cryptology—CRYPTO 1985, Springer, Berlin (1985), 242–268.

- [8]↑
K. D. Bowers, A. Juels and A. Oprea, Proofs of retrievability: Theory and implementation, Proceedings of the 2009 ACM Workshop on Cloud Computing Security, ACM, New York (2009), 43–54.

- [9]↑
R. Curtmola, O. Khan, R. C. Burns and G. Ateniese, MR-PDP: Multiple-replica provable data possession, The 28th International Conference on Distributed Computing Systems, IEEE Press, Piscataway (2008), 411–420.

- [10]↑
Y. Dodis, S. P. Vadhan and D. Wichs, Proofs of retrievability via hardness amplification, Theory of Cryptography, Springer, Berlin (2009), 109–127.

- [11]↑
A. Juels and B. S. Kaliski, Jr., PORs: Proofs of retrievability for large files, Proceedings of the 14th ACM Conference on Computer and Communications Security, ACM, New York (2007), 584–597.

- [12]↑
S. Kamara and K. Lauter, Cryptographic cloud storage, Financial Cryptography and Data Security, Springer, Berlin (2010), 136–149.

- [13]↑
R. J. McEliece and D. V. Sarwate, On sharing secrets and Reed–Solomon codes, Comm. ACM 24 (1981), 583–584.

- [14]↑
M. B. Paterson and D. R. Stinson, A simple combinatorial treatment of constructions and threshold gaps of ramp schemes, Cryptogr. Commun. 5 (2013), 229–240.

- [15]↑
M. B. Paterson, D. R. Stinson and J. Upadhyay, A coding theory foundation for the analysis of general unconditionally secure proof-of-retrievability schemes for cloud storage, J. Math. Cryptol. 7 (2013), 183–216.

- [16]↑
H. Shacham and B. Waters, Compact proofs of retrievability, Advances in Cryptology—ASIACRYPT 2008, Springer, Berlin (2009), 90–107.

- [18]↑
K. Ulm, Simple method to calculate the confidence interval of a standardized mortality ratio (SMR), Amer. J. Epidemiology 131 (1990), 373–375.

- [19]↑
C. Wang, Q. Wang, K. Ren and W. Lou, Privacy-preserving public auditing for data storage security in cloud computing, IEEE Proceedings INFOCOM 2010, IEEE Press, Piscataway (2010), 1–9.

- [20]↑
Summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region, https://aws.amazon.com/message/41926/.

- [21]↑
Why is decentralized and distributed file storage critical for a better web?, https://coincenter.org/entry/why-is-decentralized-and-distributed-file-storage-critical-for-a-better-web.

## Footnotes

^{1}

A function is a *t*-wise independent function if every subset of *t* of its output values is uniformly and independently distributed.