
# Open Engineering

### formerly Central European Journal of Engineering

Editor-in-Chief: Ritter, William


Open Access
Online
ISSN
2391-5439
Volume 8, Issue 1

# Efficient Redundancy Techniques in Cloud and Desktop Grid Systems using MAP/G/c-type Queues

Srinivas R. Chakravarthy
• Corresponding author
• Department of Industrial and Manufacturing Engineering, Kettering University, Flint, MI 48504, USA
Alexander Rumyantsev
• Corresponding author
• Institute of Applied Mathematical Research, Karelian Research Centre of RAS, 11 Pushkinskaya Str., Petrozavodsk, 185910, Russia
• Petrozavodsk State University, 33 Lenina Pr., Petrozavodsk, 185910, Russia
Published Online: 2018-03-03 | DOI: https://doi.org/10.1515/eng-2018-0004

## Abstract

Cloud computing continues to prove its flexibility and versatility in helping industries, businesses, and academia obtain needed computing capacity. As an important alternative to cloud computing, desktop grids make it possible to utilize the idle computer resources of an enterprise/community by means of a distributed computing system, providing a more secure and controllable environment with lower operational expenses. Further, both cloud computing and desktop grids are meant to optimize limited resources and at the same time to decrease the expected latency for users. The crucial parameter for optimization both in cloud computing and in desktop grids is the level of redundancy (replication) for service requests/workunits. In this paper we study the optimal replication policies by considering three variations of Fork-Join systems in the context of a multi-server queueing system with a versatile point process for the arrivals. For services we consider phase type distributions as well as shifted exponential and Weibull distributions. We use both analytical and simulation approaches in our analysis and report some interesting qualitative results.

## 1 Introduction and Model Description

During the early stages of growth, small and medium enterprises face the problem of procuring the required computing power. The main alternatives to the most expensive option of owning and running a datacenter are cloud computing (CC) and desktop grids (DG).

According to NIST [25], CC “is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” According to Murugesan and Bojanova [26], CC “offers huge computing power, on-demand scalability, and utility-like availability at low cost”. While there are many types of CC [7], in the public cloud the users are provided with the requisite computing resources in a dynamic way over the Internet with the help of Web services. These Web services are generally provided by third-party vendors so as to offer low-cost access to users. Thus, the service providers are required to optimize the limited resources and at the same time decrease the expected latency for their users.

DG is a specifically designed, inexpensive, and powerful option for tasks that (a) require huge computational resources and (b) may be split into a large number of loosely coupled subtasks (known as workunits). The computational resources are harvested from the desktops, tablets, GP-GPUs, and servers owned by volunteers (in the case of the so-called volunteer computing [33]) or by the enterprise itself (the Enterprise DG [14]), utilizing the idle times of the aforementioned hosts. The diversity of computational resources makes the time to complete a task highly unpredictable. Further, computation is only one of many steps involved in solving applied research problems and hence reduction of the expected latency is an important aspect of DG computing.

A common mechanism used in CC as well as DG is replication. Each workunit is processed by multiple hosts until the quorum (the required number of valid results from these hosts) is obtained. Replication reduces expected latency [13, 15], whereas quorum reduces the probability of malicious activity and increases application turnaround [35]. Note that replication significantly complicates the model, and for this reason researchers have only recently started applying queueing theory to study CC [8, 9, 18] and DG [6].

One of the possible models to study the concepts of replication and quorum may be seen in the class of Fork-Join (FJ) systems. In a classical c-server FJ system an arriving job is split into c tasks and each of these is sent to one of the c (homogeneous) servers, each having a distinct queue. Service of a job is completed as soon as all c tasks of the job have completed their services (see e.g., [2, 8, 20, 27]). In [17] a generalization of the FJ queueing model is considered (referred to as the (c, k)-FJ system), wherein an arriving job, split into c tasks and sent to each of the c servers, is said to have completed service when any k out of the c tasks are completed. Note that (c, c)-FJ is the classical FJ system. In [17], the Fork-Early-Cancel variation of the FJ system (referred to as (c, k)-FEC) is also considered, wherein an arriving job, split into c tasks and sent to each of the c queues, waits until any k of the c tasks start service, and at that moment the remaining (c − k) redundant tasks are canceled. A detailed comparison of FJ and FEC models is performed in [18]. The important results from [16, 17, 18] may be summarized as follows:

1. The (c, 1)-FJ system is equivalent to a single server system with first-come-first-served (FCFS) discipline, in which the service times are given by the minimum of c i.i.d. service times;

2. (c, 1)-FEC system is equivalent to a c-server system with FCFS discipline;

3. The (c, k)-FJ system is upper-bounded by the (c, k)-Split-Merge (SM) system (in which the replicas of a job are not allowed to start service before the previous job completes its service), which is equivalent to a single server system in which the service times are obtained as the k-th order statistics of c i.i.d. random variables.

In [37] the authors study a multiserver FJ-system with Markovian arrival process (MAP), phase type (PH) services, task vacations and intercommunication times (modeled as PH random variables). An approximate solution based on stochastic decomposition is derived, and examples are presented comparing the approximation with simulation results under four dynamic scheduling policies for a few scenarios. We also note an interesting overview of FJ and related systems [39]. Specifically, in this survey paper, the author reviews the queueing systems related to FJ systems, the analysis of certain Markovian queues, methods to estimate the expected value of the maximum (which play an important role in CC and DG models) of different distributions, and approximations for the mean delay.

The focus of our paper is to study three FJ-type systems: (i) the (c, 1)-FJ system, (ii) the (c, 1)-FEC system, and (iii) the (c, k)-SM system, $1 \leqslant k \leqslant c$, under the assumption that the jobs arrive according to a MAP.

The MAP, introduced as a versatile Markovian point process by Neuts [28] and later reformulated with simpler notation by Lucantioni et al. [23], is described by an underlying (continuous-time) Markov chain with generator, say, $D$, of dimension $m$ such that $D = D_0 + D_1$. Note that the matrix $D_0$ governs transitions without arrivals and the matrix $D_1$ governs transitions inducing arrivals to the system. It should be pointed out that the MAP is a rich class of point processes and includes many classical processes such as the Poisson process, PH-renewal processes, and the Markov-modulated Poisson process. For more details on MAPs and their applications in stochastic modelling, we refer to [22, 23, 30, 31]. Further, we refer the reader to [1, 4, 5] for a review and recent work on MAPs.

We assume that all c servers are homogeneous and that the services they provide are PH with representation (β, S) of order n. In simulations, however, we consider general service time distribution.

In the sequel we need the following notation. By $e$ we denote a column vector of 1’s; by $e_i$, a unit column vector with 1 in the $i$th position and 0 elsewhere; and by $I$, an identity matrix (of appropriate dimension). Should there be a need to emphasize the dimension, we will do so by writing $I_m$ rather than $I$, and similarly for the others. The notation $S^0$ is such that $Se + S^0 = 0$. The symbols ⊗ and ⊕, respectively, stand for the Kronecker product and Kronecker sum of matrices. For details and properties of Kronecker products and Kronecker sums we refer the reader to [11, 24, 38].

Suppose λ and μ, respectively, denote the arrival and service rates. It is easy to verify that $\lambda = \delta D_1 e$ and $\mu = [\beta(-S)^{-1}e]^{-1}$, where δ is the stationary probability vector of the irreducible generator $D$ and is obtained as the unique (positive) probability vector satisfying $\delta D = 0$, $\delta e = 1$. The expected latency E(𝓛) (defined as the average sojourn time of a job in the system) and the expected cost of computing E(𝓒) (which is taken to be the average service time of a job) are two performance measures that play a key role in identifying an optimum (replication) strategy.
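As a minimal sketch of these two formulas (assuming NumPy; the function names are hypothetical and not from the paper), the rates can be computed directly from $(D_0, D_1)$ and $(\beta, S)$. The matrices used are the hyperexponential arrival process HEXA and the Erlang service ERLS of Section 3, before any normalization.

```python
import numpy as np

def stationary_vector(D):
    """Solve delta D = 0, delta e = 1 for an irreducible generator D."""
    m = D.shape[0]
    # Drop one (redundant) balance equation and add the normalizing equation.
    A = np.vstack([D.T[:-1], np.ones(m)])
    b = np.zeros(m)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def map_rate(D0, D1):
    """Arrival rate lambda = delta D1 e."""
    delta = stationary_vector(D0 + D1)
    return float(delta @ D1 @ np.ones(D1.shape[0]))

def ph_rate(beta, S):
    """Service rate mu = [beta (-S)^{-1} e]^{-1}."""
    mean = beta @ np.linalg.solve(-S, np.ones(S.shape[0]))
    return 1.0 / float(mean)

# HEXA arrivals and ERLS services as listed in Section 3
D0 = np.array([[-1.9, 0.0], [0.0, -0.19]])
D1 = np.array([[1.71, 0.19], [0.171, 0.019]])
beta = np.array([1.0, 0.0])
S = np.array([[-2.0, 2.0], [0.0, -2.0]])

delta = stationary_vector(D0 + D1)
lam = map_rate(D0, D1)
mu = ph_rate(beta, S)
print(delta, lam, mu)
```

Both rates come out equal to 1 for these unnormalized representations, so rescaling $(D_0, D_1)$ or $S$ by a constant gives any desired λ or μ.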

In this section we will study the FJ, SM and FEC systems as described in Section 1 in steady-state.

## 2.1 (c, 1)-FJ system

In this system, recall, an arriving job is split into c tasks that are served by each of the c servers until the earliest service completion of a task, at which point all other tasks of the job in service are removed from the system. This system is known to be equivalent to a MAP/PH/1 queue [18], where the service time, obtained as the minimum of c identical (β, S) PH-distributions, has a PH-distribution (α, T) of order $n^c$, given as follows [29]: $\alpha = \beta^{\otimes c} = \underbrace{\beta \otimes \cdots \otimes \beta}_{c}, \quad T = S^{\oplus c} = \underbrace{S \oplus \cdots \oplus S}_{c}.$
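The Kronecker construction can be sketched in a few lines, assuming NumPy (the helper names `kron_sum`, `min_of_ph`, and `ph_mean` are ours, introduced only for illustration). The mean of the resulting PH-distribution is $\alpha(-T)^{-1}e$.

```python
from functools import reduce
import numpy as np

def kron_sum(A, B):
    """Kronecker sum A (+) B = A (x) I + I (x) B."""
    return np.kron(A, np.eye(B.shape[0])) + np.kron(np.eye(A.shape[0]), B)

def min_of_ph(beta, S, c):
    """PH representation (alpha, T) of the minimum of c i.i.d. PH(beta, S) variables."""
    alpha = reduce(np.kron, [beta] * c)   # beta (x) ... (x) beta, length n**c
    T = reduce(kron_sum, [S] * c)         # S (+) ... (+) S, order n**c
    return alpha, T

def ph_mean(alpha, T):
    """Mean alpha (-T)^{-1} e of a PH(alpha, T) distribution."""
    return float(alpha @ np.linalg.solve(-T, np.ones(T.shape[0])))

# Erlang of order 2 with rate 2 per phase (mean 1), replicated on c = 2 servers
beta = np.array([1.0, 0.0])
S = np.array([[-2.0, 2.0], [0.0, -2.0]])
alpha, T = min_of_ph(beta, S, c=2)
print(T.shape, ph_mean(alpha, T))   # order n**c = 4, mean of the minimum

# Exponential with rate 3 on c = 4 servers collapses to the scalar T = -12
alpha_exp, T_exp = min_of_ph(np.array([1.0]), np.array([[-3.0]]), c=4)
```

Since the order grows as $n^c$, this direct construction is only practical for small n and c.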

Let $\mu_T = [\alpha(-T)^{-1}e]^{-1}$ denote the service rate. In order to study the model in this section as a continuous-time Markov chain (CTMC), we need to keep track of the number of jobs in the system, $N(t)$; the phases of the service times, $J_r(t)$, $1 \leqslant r \leqslant c$, if any; and the phase of the arrival process, $M(t)$, at time $t \geqslant 0$. The state space of the CTMC $\{(N(t), J_1(t), \ldots, J_c(t), M(t)) : t \geqslant 0\}$ is given by $\tilde{\Omega} = \{\underline{i},\ i \geqslant 0\},$

where

• the set $\underline{0} = \{k,\ 1 \leqslant k \leqslant m\}$ of dimension $m$ corresponds to the case where the system is idle and the MAP process is in one of the $m$ phases;

• the set $\underline{i} = \{(i, j_1, \cdots, j_c, k),\ 1 \leqslant i,\ 1 \leqslant j_r \leqslant n,\ 1 \leqslant k \leqslant m\}$ of dimension $mn^c$ corresponds to the case where a job is in service, $i − 1$ jobs are in the queue (if any), the service time at the $r$th server ($1 \leqslant r \leqslant c$) is in one of the $n$ phases, and the MAP process is in one of the $m$ phases.

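The generator (see e.g., [30]) of the CTMC governing the MAP/PH/1 queue is of the form $\tilde{Q} = \begin{pmatrix} D_0 & \alpha \otimes D_1 & & & & \\ T^0 \otimes I & \tilde{A}_1 & \tilde{A}_0 & & & \\ & \tilde{A}_2 & \tilde{A}_1 & \tilde{A}_0 & & \\ & & \tilde{A}_2 & \tilde{A}_1 & \tilde{A}_0 & \\ & & & \ddots & \ddots & \ddots \end{pmatrix},$ (1)

where $\tilde{A}_0 = I_{n^c} \otimes D_1, \quad \tilde{A}_1 = T \oplus D_0, \quad \tilde{A}_2 = T^0\alpha \otimes I_m.$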

## 2.1.1 The steady-state probability vector

Towards this end, we define $\tilde{x}$, partitioned as $\tilde{x} = (\tilde{x}(0), \tilde{x}(1), \cdots)$, to be the steady-state probability vector of the generator $\tilde{Q}$ given in (1). Note that $\tilde{Q}$ is of QBD type and thus possesses a matrix-geometric steady-state probability vector [29] (for details on QBD processes see [1, 4]). The specific structure of the generator given in (1) allows one to obtain the vector $\tilde{x}$ as follows.

#### Theorem 1

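Under the stability condition $\lambda < \mu_T$, the steady-state probability vector $\tilde{x}$ is of modified matrix-geometric type. Specifically, we have $\tilde{x}(0)D_0 + \tilde{x}(1)(T^0 \otimes I) = 0, \quad \tilde{x}(0)(\alpha \otimes D_1) + \tilde{x}(1)[\tilde{A}_1 + \tilde{R}\tilde{A}_2] = 0, \quad \tilde{x}(i) = \tilde{x}(1)\tilde{R}^{i-1},\ i \geqslant 1,$

where $\tilde{R}$ is the minimal non-negative solution to the matrix-quadratic equation $\tilde{R}^2\tilde{A}_2 + \tilde{R}\tilde{A}_1 + \tilde{A}_0 = 0,$ (2)

and the normalizing equation is given by $\tilde{x}(0)e + \tilde{x}(1)(I - \tilde{R})^{-1}e = 1.$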

#### Proof

The proof follows by applying the matrix-geometric results as seen in [29].

## 2.1.2 Expected latency and the cost of computing

The expected latency, E(𝓛), and the expected cost of computing, E(𝓒), for the FJ system are given by [29] $E(\mathcal{L}) = \frac{1}{\lambda}\tilde{x}(1)(I - \tilde{R})^{-2}e, \quad E(\mathcal{C}) = \frac{c}{\mu_T}.$
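The computation in Theorem 1 and the latency formula can be sketched numerically as follows. This is an illustrative implementation under our own assumptions, not the authors' code: it uses NumPy, a plain fixed-point iteration for $\tilde{R}$ started from zero (one of several possible iterative procedures), and a hypothetical function name `fj_expected_latency`. It assumes the stability condition $\lambda < \mu_T$ holds and does not check it.

```python
import numpy as np

def fj_expected_latency(D0, D1, alpha, T, tol=1e-12, max_iter=200_000):
    """E(L) of the MAP/PH/1 queue with arrivals (D0, D1) and service PH(alpha, T)."""
    m, n = D0.shape[0], T.shape[0]
    Im, In, e = np.eye(m), np.eye(n), np.ones(m * n)
    T0 = -T @ np.ones(n)                        # exit rates, T e + T0 = 0
    A0 = np.kron(In, D1)                        # arrival: one level up
    A1 = np.kron(T, Im) + np.kron(In, D0)       # T (+) D0: same level
    A2 = np.kron(np.outer(T0, alpha), Im)       # service completion: one level down

    # Fixed-point iteration R <- -(A0 + R^2 A2) A1^{-1}, started from R = 0
    A1_inv = np.linalg.inv(A1)
    R = np.zeros_like(A1)
    for _ in range(max_iter):
        R_next = -(A0 + R @ R @ A2) @ A1_inv
        converged = np.max(np.abs(R_next - R)) < tol
        R = R_next
        if converged:
            break

    # Boundary equations for (x(0), x(1)); one redundant balance equation is
    # replaced by the normalizing equation x(0) e + x(1) (I - R)^{-1} e = 1.
    M = np.block([[D0, np.kron(alpha.reshape(1, n), D1)],
                  [np.kron(T0.reshape(n, 1), Im), A1 + R @ A2]])
    IR = np.eye(m * n) - R
    M[:, 0] = np.concatenate([np.ones(m), np.linalg.solve(IR, e)])
    rhs = np.zeros(m + m * n)
    rhs[0] = 1.0
    x1 = np.linalg.solve(M.T, rhs)[m:]

    # Arrival rate lambda = delta D1 e, with delta D = 0 and delta e = 1
    D = D0 + D1
    b = np.zeros(m)
    b[-1] = 1.0
    delta = np.linalg.solve(np.vstack([D.T[:-1], np.ones(m)]), b)
    lam = delta @ D1 @ np.ones(m)

    mean_jobs = x1 @ np.linalg.solve(IR, np.linalg.solve(IR, e))
    return float(mean_jobs / lam)

# Sanity check on the M/M/1 special case (lambda = 0.8, mu = 1): E(L) = 1/(mu - lambda)
latency_mm1 = fj_expected_latency(np.array([[-0.8]]), np.array([[0.8]]),
                                  np.array([1.0]), np.array([[-1.0]]))
print(latency_mm1)
```

The M/M/1 case is used only because its sojourn time $1/(\mu - \lambda)$ is known in closed form; for a (c, 1)-FJ system the arguments `alpha` and `T` would be the Kronecker representation of the minimum of c service times.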

#### Remark 1

In a MAP/M/1-type system, the expected cost equals $E(\mathcal{C}) = \mu^{-1}$, since the minimum of c exponentials with rate μ is exponential with rate $\mu_T = c\mu$.

We note that in general E(𝓒) depends on c and the type of dependence is related to the so-called log-concavity (log-convexity) of the service time distribution, namely, the following lemma holds [18]:

#### Lemma 1

If the service time of a task has log-concave (log-convex) distribution, then E(𝓒) is non-decreasing (non-increasing) in c.
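As a small worked illustration of Lemma 1 (our own arithmetic, not taken from [18]), write $E(\mathcal{C}) = c\,E[\min(X_1,\ldots,X_c)] = c\int_0^\infty \bar{F}(x)^c\,dx$, where $\bar{F}$ is the survival function of a single task's service time, and compare c = 1 with c = 2 for the unnormalized Erlang (log-concave) and hyperexponential (log-convex) services of Section 3.

```latex
% Erlang of order 2, rate 2 per phase: \bar F(x) = (1 + 2x) e^{-2x}, mean 1
E(\mathcal{C})\big|_{c=2}
  = 2\int_0^\infty (1 + 4x + 4x^2)\,e^{-4x}\,dx
  = 2\left(\tfrac{1}{4} + \tfrac{1}{4} + \tfrac{1}{8}\right)
  = \tfrac{5}{4} \;>\; 1 = E(\mathcal{C})\big|_{c=1}.

% Hyperexponential: \bar F(x) = 0.9\,e^{-10x} + 0.1\,e^{-x}, mean 0.19
E(\mathcal{C})\big|_{c=2}
  = 2\int_0^\infty \left(0.81\,e^{-20x} + 0.18\,e^{-11x} + 0.01\,e^{-2x}\right)dx
  = 2\left(\tfrac{0.81}{20} + \tfrac{0.18}{11} + \tfrac{0.01}{2}\right)
  \approx 0.1237 \;<\; 0.19 = E(\mathcal{C})\big|_{c=1}.
```

The cost rises with replication in the log-concave case and falls in the log-convex case, in line with the lemma.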

## 2.1.3 Explicit solution for MAP(2)/M/1-type system

In general, the matrix $\tilde{R}$ has to be obtained numerically with any of the iterative procedures (see e.g. [12]). However, in the case when m is small, one can obtain $\tilde{R}$ explicitly using the complex variable approach presented in [10]. Suppose that the arrival process is a MAP with two phases (m = 2) and the service distribution of a single task of the job is exponential with rate μ. Thus, the system is equivalent to a MAP/M/1-type system, with the service time of the job being exponentially distributed with rate $\mu_T = c\mu$ (see Remark 1), and the stability criterion reduces to $\lambda < c\mu$. In this case, $\tilde{R}$ may be obtained explicitly and the details are as follows.

First, note that for the current special case n = 1, β = 1, S = −μ, which implies α = 1, T = −cμ, and $T^0 = c\mu$. This simplifies the matrices $\tilde{A}_i$, i = 0, 1, 2, as follows: $\tilde{A}_0 = D_1, \quad \tilde{A}_1 = D_0 - c\mu I_2, \quad \tilde{A}_2 = c\mu I_2.$

Now we briefly outline the procedure of obtaining $\tilde{R}$. The necessary details may be found in [10].

1. Write down the determinantal polynomial $\det(A(\xi)) := \det(\tilde{A}_0 + \xi\tilde{A}_1 + \xi^2\tilde{A}_2)$ for ξ complex.

2. Using the trigonometric solution, obtain the greatest root $\xi_3$ of the third degree polynomial $\det(A(\xi))/(\xi - 1) = a_3\xi^3 + a_2\xi^2 + a_1\xi + a_0$, whose roots are known to be real.

3. Find $b_0 = -a_0/(a_3\xi_3)$ and $b_1 = a_2/a_3 + \xi_3$.

4. Find $\tilde{R}$ as follows: $\tilde{R} = [b_0\tilde{A}_2 - \tilde{A}_0][\tilde{A}_1 - b_1\tilde{A}_2]^{-1}.$

After obtaining $\tilde{R}$ satisfying (2), it is easy to obtain the explicit solution for the steady-state probability vector $\tilde{x}$. Thus, the value E(𝓛) may also be obtained exactly, while $E(\mathcal{C}) = \mu^{-1}$ (see Remark 1). This makes it possible to evaluate the system performance for relatively large values of c.

To illustrate this approach, we evaluate the value E(𝓛) for c = 1,…, 1000, with an example of a MAP(2) with parameter matrices $(D_0, D_1)$ given by $D_0 = \begin{pmatrix} -4 & 2 \\ 2 & -5 \end{pmatrix}, \quad D_1 = \begin{pmatrix} 1 & 1 \\ 2 & 1 \end{pmatrix}.$

In this case, one can verify that the stationary vector is δ = (4/7, 3/7) and the fundamental rate is $\lambda = \delta D_1 e = \frac{4}{7}\cdot 2 + \frac{3}{7}\cdot 3 = 2\frac{3}{7}$. We also take μ = 3 so that the system is stable for any c ⩾ 1. Then, we obtain $\tilde{R}$ following the procedure above, and depict the dependence of E(𝓛) on c. The results are displayed in Fig. 1, with a logarithmic y-axis. It can be seen that E(𝓛) decreases rapidly with an increasing number of replicas c.
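The four-step procedure can be sketched for this MAP(2) example as follows. The sketch rests on our own assumptions: it uses NumPy, finds the cubic's roots numerically with `np.roots` instead of the trigonometric formula, and the names `explicit_R` and `latency` are hypothetical.

```python
import numpy as np

D0 = np.array([[-4.0, 2.0], [2.0, -5.0]])
D1 = np.array([[1.0, 1.0], [2.0, 1.0]])
MU = 3.0

def explicit_R(c):
    """Steps 1-4: explicit R for the MAP(2)/M/1-type system with rate c*MU."""
    I2 = np.eye(2)
    A0, A1, A2 = D1, D0 - c * MU * I2, c * MU * I2
    # Step 1: entries of A(xi) = A0 + xi A1 + xi^2 A2 as polynomials in xi
    p = [[np.poly1d([A2[i, j], A1[i, j], A0[i, j]]) for j in range(2)]
         for i in range(2)]
    det = p[0][0] * p[1][1] - p[0][1] * p[1][0]
    # Step 2: divide out the known root xi = 1, take the greatest root of the cubic
    cubic, _ = np.polydiv(det.coeffs, [1.0, -1.0])
    a3, a2, a1, a0 = cubic
    xi3 = max(np.roots(cubic).real)
    # Steps 3 and 4
    b0 = -a0 / (a3 * xi3)
    b1 = a2 / a3 + xi3
    return (b0 * A2 - A0) @ np.linalg.inv(A1 - b1 * A2)

def latency(c):
    """E(L) from R, the boundary equations of Theorem 1, and Little's law."""
    I2, e = np.eye(2), np.ones(2)
    R = explicit_R(c)
    A1, A2 = D0 - c * MU * I2, c * MU * I2
    M = np.block([[D0, D1], [c * MU * I2, A1 + R @ A2]])
    M[:, 0] = np.concatenate([e, np.linalg.solve(I2 - R, e)])  # normalization
    x1 = np.linalg.solve(M.T, np.array([1.0, 0.0, 0.0, 0.0]))[2:]
    delta = np.linalg.solve(np.array([(D0 + D1).T[0], e]), np.array([0.0, 1.0]))
    lam = delta @ D1 @ e
    return float(x1 @ np.linalg.solve(I2 - R, np.linalg.solve(I2 - R, e)) / lam)

for c in (1, 2, 5, 10, 100, 1000):
    print(c, latency(c))
```

A quick way to validate the result is to substitute `explicit_R(c)` back into equation (2) and check that the residual is numerically zero.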

Figure 1

Reduced expected latency in MAP(2)/M/1-type system: the expected latency E(𝓛) (log. scale) vs. number of servers (or replicas), c.

## 2.2 FEC system

Here, we consider a system in which an arriving job is split into c tasks and one task is sent to each of the c servers. As soon as any one of the c tasks starts service, all other (c − 1) redundant tasks waiting in the queue are canceled. The system corresponds to a queueing model of MAP/PH/c-type. To obtain the two steady-state performance measures, we reuse some of the notation from the FJ system. Let $N(t)$, $J_r(t)$, $1 \leqslant r \leqslant c$, and $M(t)$ denote, respectively, the number of jobs in the system, the phase of service of the $r$th server (ignored if that server is idle), and the phase of the arrival process, at time t. The process $\{(N(t), J_1(t), \cdots, J_c(t), M(t)) : t \geqslant 0\}$ is a CTMC with the state space given by $\Omega_2 = \{\underline{\star}\} \cup \{\underline{\hat{i}},\ 1 \leqslant i \leqslant c-1\} \cup \{\underline{i},\ i \geqslant 0\},$

where the set of states is defined as follows:

• The set $\underline{\star} = \{k,\ 1 \leqslant k \leqslant m\}$ of dimension $m$ corresponds to the case where the system is idle and the MAP process is in one of the $m$ phases.

• The set $\underline{\hat{i}} = \{(i, j_1, \cdots, j_i, k),\ 1 \leqslant j_r \leqslant n,\ 1 \leqslant k \leqslant m\}$ of dimension $mn^i$ corresponds to the case where $i$, $1 \leqslant i \leqslant c − 1$, servers are in service, with the $r$th ($r \leqslant i$) server busy in one of the $n$ phases, and the MAP process is in one of the $m$ phases.

• The set $\underline{i} = \{(i, j_1, \cdots, j_c, k),\ 1 \leqslant j_r \leqslant n,\ 1 \leqslant k \leqslant m\}$ of dimension $mn^c$ corresponds to the case where all c servers are busy with $i \geqslant 0$ jobs waiting in the queue; the MAP process is in one of the $m$ phases, and the $r$th ($r \leqslant c$) server is busy in one of the $n$ phases.

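It is easy to verify that the CTMC with the above state space has an infinitesimal generator matrix of the form $Q = \begin{pmatrix} D_0 & B_{0,1} & & & & & & \\ B_{1,0} & B_{1,1} & B_{1,2} & & & & & \\ & \ddots & \ddots & \ddots & & & & \\ & & B_{c-1,c-2} & B_{c-1,c-1} & B_{c-1,c} & & & \\ & & & B_{c,c-1} & A_1 & A_0 & & \\ & & & & A_2 & A_1 & A_0 & \\ & & & & & \ddots & \ddots & \ddots \end{pmatrix},$ (3)

where $B_{j,j} = \underbrace{S \oplus S \oplus \cdots \oplus S}_{j} \oplus D_0,\ 1 \leqslant j \leqslant c-1, \quad B_{j,j-1} = \sum_{k=0}^{j-1} I_{n^k} \otimes S^0 \otimes \hat{I}_{j-k},\ 1 \leqslant j \leqslant c, \quad B_{j,j+1} = I_{n^j} \otimes \beta \otimes D_1,\ 0 \leqslant j \leqslant c-1,$ $A_0 = I_{n^c} \otimes D_1, \quad A_1 = \underbrace{S \oplus S \oplus \cdots \oplus S}_{c} \oplus D_0, \quad A_2 = \sum_{k=0}^{c-1} I_{n^k} \otimes S^0\beta \otimes \hat{I}_{c-k}, \quad \hat{I}_j = I_{mn^{j-1}}.$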

## 2.2.1 The steady-state probability vector for FEC system

In this section we will look at the steady-state probability vector of the CTMC with generator given in (3). Towards this end, we define $x$, partitioned as $x = (x^{\star}, \hat{x}(1), \cdots, \hat{x}(c-1), x(0), x(1), x(2), \cdots)$, to be the steady-state probability vector of $Q$. That is, $x$ satisfies $xQ = 0, \quad xe = 1.$ (4)

Note that (a) $x^{\star}$ of dimension $m$ gives the steady-state probability vector that all c servers are idle with the arrival process in one of the $m$ phases; (b) $\hat{x}(j)$, $1 \leqslant j \leqslant c − 1$, of dimension $mn^j$ gives the steady-state probability vector that $j$ servers are busy with no job waiting in the queue, the arrival process is in one of the $m$ phases, and each one of the $j$ busy servers is in one of the $n$ phases; (c) $x(i)$, $i \geqslant 0$, of dimension $mn^c$ gives the steady-state probability vector that all c servers are busy with $i$ jobs waiting in the queue, the arrival process is in one of the $m$ phases, and each one of the c busy servers is in one of the $n$ phases.

It is easy to verify (see [29]) the following theorem.

#### Theorem 2

Under the stability condition that $\lambda < c\mu$, the steady-state probability vector is of modified matrix-geometric type. Specifically, we have $x^{\star}D_0 + \hat{x}(1)B_{1,0} = 0, \quad x^{\star}B_{0,1} + \hat{x}(1)B_{1,1} + \hat{x}(2)B_{2,1} = 0, \quad \hat{x}(j-1)B_{j-1,j} + \hat{x}(j)B_{j,j} + \hat{x}(j+1)B_{j+1,j} = 0,\ 2 \leqslant j \leqslant c-1,$ $\hat{x}(c-1)B_{c-1,c} + x(0)[A_1 + RA_2] = 0, \quad x(i) = x(0)R^i,\ i \geqslant 1,$ (5)

where $R$ is the minimal non-negative solution to the matrix-quadratic equation $R^2A_2 + RA_1 + A_0 = 0,$ (6)

and the normalizing equation is given by $x^{\star}e + \sum_{j=1}^{c-1}\hat{x}(j)e + x(0)(I - R)^{-1}e = 1.$ (7)

## 2.2.2 Expected latency and the cost of computing for FEC system

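The expected latency and the expected cost of computing for the FEC system are given by $E(\mathcal{L}) = \frac{1}{\lambda}\left[x(0)\left(R(I-R)^{-2} + c(I-R)^{-1}\right)e + \sum_{i=1}^{c-1} i\,\hat{x}(i)e\right], \quad E(\mathcal{C}) = \frac{1}{\mu}.$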
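For a quick sanity check of an implementation of the formula above, the special case of Poisson arrivals and exponential services is convenient, because the (c, 1)-FEC system is then an ordinary M/M/c queue. The sketch below uses the classical Erlang C formula for that special case only; it is not the matrix-analytic computation of Theorem 2, and the function name is hypothetical.

```python
from math import factorial

def fec_latency_mmc(lam, mu, c):
    """Mean sojourn time of an M/M/c queue (the (c, 1)-FEC system with
    Poisson arrivals of rate lam and exponential services of rate mu)."""
    a = lam / mu                      # offered load
    rho = a / c                       # utilization per server
    if rho >= 1.0:
        raise ValueError("unstable system: need lam < c * mu")
    tail = a ** c / (factorial(c) * (1.0 - rho))
    p_wait = tail / (sum(a ** k / factorial(k) for k in range(c)) + tail)  # Erlang C
    return 1.0 / mu + p_wait / (c * mu - lam)

# lambda = 0.8, mu = 1, as in the examples of Section 3
for c in range(1, 6):
    print(c, fec_latency_mmc(0.8, 1.0, c))
```

In this special case the expected cost is $1/\mu$ regardless of c, since each job is served by exactly one server.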

## 2.3 (c, k)-SM system

In this case we split each job into c tasks only at service (beginning) epochs. The jobs enter into service sequentially (in contrast to the FJ system, where the tasks of a job are dispatched to each of the c servers immediately, and each server has its own queue). Thus, the service is offered on an FCFS basis. After being processed, each task is routed to a station where it waits until a quorum of k, $1 \leqslant k \leqslant c$, processed tasks of the same job is obtained, at which point the job leaves the system. At that time the remaining (c − k) servers are preempted and all c servers become available for serving the next job. Thus, in this case, we study the (c, k)-SM system, $1 \leqslant k \leqslant c$, as a single server queueing system in which the service times are obtained as the k-th order statistics of c identically distributed random variables. It is known (see, e.g., [3]) that the kth order statistic of c identically distributed PH-distributions is again a PH-distribution. However, the dimension of the PH-distribution of the kth order statistic grows exponentially with k, and hence in this paper we will resort to simulation for the steady-state analysis of the model. We refer the reader to [36] for an extended discussion of these types of models.
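For exponential services the mean of the k-th order statistic has a simple closed form, which gives a feel for how the SM service time grows with the quorum k. The sketch below covers only this exponential special case (independent tasks with rate μ assumed); the function name is ours.

```python
def mean_kth_order_stat_exp(c, k, mu):
    """Mean of the k-th smallest of c i.i.d. exponential(mu) service times.

    By the memoryless property, the gap between the i-th and (i+1)-th
    completions is exponential with rate (c - i) * mu.
    """
    if not 1 <= k <= c:
        raise ValueError("need 1 <= k <= c")
    return sum(1.0 / ((c - i) * mu) for i in range(k))

# Mean SM service time for c = 5 servers and quorum k = 1, ..., 5 (mu = 1)
for k in range(1, 6):
    print(k, mean_kth_order_stat_exp(5, k, 1.0))
```

For k = 1 this reduces to $1/(c\mu)$, the mean of the minimum, and for k = c it is the mean of the maximum, which grows like the harmonic number of c.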

## 3 Illustrative Examples

The purpose of this section is to discuss a few illustrative examples to bring out the qualitative nature of the models under study. Towards this end, we consider five arrival processes and three service time distributions. These five MAPs and three PH-representations are as follows.

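1. Erlang (ERLA): $D_0 = \begin{pmatrix} -2 & 2 \\ 0 & -2 \end{pmatrix}, \quad D_1 = \begin{pmatrix} 0 & 0 \\ 2 & 0 \end{pmatrix}$

2. Exponential (EXPA): $D_0 = (-1), \quad D_1 = (1)$

3. Hyperexponential (HEXA): $D_0 = \begin{pmatrix} -1.90 & 0 \\ 0 & -0.19 \end{pmatrix}, \quad D_1 = \begin{pmatrix} 1.710 & 0.190 \\ 0.171 & 0.019 \end{pmatrix}$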

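4. MAP with negative correlation (MNCA): $D_0 = \begin{pmatrix} -1.00222 & 1.00222 & 0 \\ 0 & -1.00222 & 0 \\ 0 & 0 & -225.75 \end{pmatrix}, \quad D_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0.01002 & 0 & 0.9922 \\ 223.4925 & 0 & 2.2575 \end{pmatrix}$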

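5. MAP with positive correlation (MPCA): $D_0 = \begin{pmatrix} -1.00222 & 1.00222 & 0 \\ 0 & -1.00222 & 0 \\ 0 & 0 & -225.75 \end{pmatrix}, \quad D_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0.9922 & 0 & 0.01002 \\ 2.2575 & 0 & 223.4925 \end{pmatrix}.$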

Note first that the first three arrival processes, namely, ERLA, EXPA, and HEXA, have zero correlation between two successive inter-arrival times. The arrival processes labeled MNCA and MPCA, respectively, have negative and positive correlation (with values −0.4889 and 0.4889) between two successive inter-arrival times. The ratios of the standard deviations of the inter-arrival times of these five arrival processes to that of ERLA are, respectively, 1, 1.41421, 3.17451, 1.99336, and 1.99336. The above MAP processes will be normalized so as to have a specific arrival rate.
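The quoted correlation values can be reproduced with the standard moment formulas for a MAP, sketched below under our own assumptions: NumPy, a hypothetical function name, and the entry 2.2575 in the last row of $D_1$ for MNCA, chosen so that the rows of $D_0 + D_1$ sum to zero. Here $P = (-D_0)^{-1}D_1$ is the phase transition matrix embedded at arrival epochs and π its stationary vector.

```python
import numpy as np

def lag1_correlation(D0, D1):
    """Lag-1 correlation of successive inter-arrival times of a MAP (D0, D1)."""
    m = D0.shape[0]
    e = np.ones(m)
    U = np.linalg.inv(-D0)
    P = U @ D1                                   # embedded chain at arrival epochs
    A = np.vstack([(P.T - np.eye(m))[:-1], e])   # pi P = pi, pi e = 1
    b = np.zeros(m)
    b[-1] = 1.0
    pi = np.linalg.solve(A, b)
    mean = pi @ U @ e                            # E[X]
    second = 2.0 * pi @ U @ U @ e                # E[X^2]
    cross = pi @ U @ P @ U @ e                   # E[X_1 X_2]
    return float((cross - mean ** 2) / (second - mean ** 2))

D0 = np.array([[-1.00222, 1.00222, 0.0],
               [0.0, -1.00222, 0.0],
               [0.0, 0.0, -225.75]])
D1_mnca = np.array([[0.0, 0.0, 0.0],
                    [0.01002, 0.0, 0.9922],
                    [223.4925, 0.0, 2.2575]])
D1_mpca = np.array([[0.0, 0.0, 0.0],
                    [0.9922, 0.0, 0.01002],
                    [2.2575, 0.0, 223.4925]])

corr_mnca = lag1_correlation(D0, D1_mnca)
corr_mpca = lag1_correlation(D0, D1_mpca)
print(corr_mnca, corr_mpca)
```

Because the correlation is scale-invariant, normalizing the MAP to a different arrival rate does not change these values.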

For the service times we consider the following three (β, S) PH-distributions. These distributions will be normalized so as to arrive at a desired value for μ.

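1. Erlang (ERLS): $\beta = (1, 0), \quad S = \begin{pmatrix} -2 & 2 \\ 0 & -2 \end{pmatrix}.$

2. Exponential (EXPS): $\beta = (1), \quad S = (-1).$

3. Hyperexponential (HEXS): $\beta = (0.9, 0.1), \quad S = \begin{pmatrix} -10 & 0 \\ 0 & -1 \end{pmatrix}.$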

#### Example 1

The purpose of this example is to see the behavior of E(𝓛) and E(𝓒) under different arrival processes and service times. Towards this end, we fix λ = 0.8, μ = 1.0, vary c = 1,…, 5 and plot the measures ln(E(𝓛)) and E(𝓒) under various scenarios in Fig. 2.

Figure 2

ln(E(𝓛)) vs E(𝓒) as c is varied under different scenarios

First, we want to point out that in [18] for the case of Poisson arrivals, it was shown that, as c increases,

• (a) for log-concave type service distributions, E(𝓛) decreases and E(𝓒) increases;

• (b) for exponential service distributions, E(𝓛) decreases at no additional cost, that is, E(𝓛) decreases while E(𝓒) remains constant;

• (c) for log-convex type service distributions, E(𝓛) and E(𝓒) both decrease.

We notice from Fig. 2 that (for all arrival processes considered) for Erlang (log-concave) services, ln(E(𝓛)) decreases and E(𝓒) increases with increasing c. We also notice that MPCA arrivals appear to produce the largest E(𝓛) as compared to the other arrivals. In the case of hyperexponential (log-convex) services both ln(E(𝓛)) and E(𝓒) decrease with increasing c. Note that the results agree with those observed in [18]. It is also worth pointing out that among renewal arrivals, the one with the largest coefficient of variation, namely, hyperexponential, yields the largest E(𝓛).

#### Example 2

The purpose of this example is to compare the FJ and FEC systems. In order to do this, we vary both λ and c and plot E(𝓛) in Figs. 3 and 4, respectively, for ERLS and HEXS under various scenarios.

Figure 3

Plot of E(𝓛) under various scenarios for Erlang services

Figure 4

Plot of E(𝓛) under various scenarios for hyperexponential services

Looking at these two figures, we observe the following key points.

• Obviously, having a larger variability in services (HEXS) yields a higher value for E(𝓛) as compared to the ones with smaller variability (ERLS).

• When considering Erlang services (see Fig. 3), we notice, for both Erlang and hyperexponential arrivals, a pattern (similar to the one observed in [18]) with regard to E(𝓛) when comparing the two systems, FJ and FEC. Namely, starting from some input rate (respectively, with increasing load) the FJ system gives a smaller E(𝓛) compared to FEC. The crossing points of the two curves for these systems depend on the load (and hence, on the ratio of λ and c). Also, the crossing points for hyperexponential arrivals are smaller than the corresponding ones for Erlang arrivals. However, for positively correlated arrivals (MPCA) we see that the FJ system appears to yield a smaller value compared to that of the FEC system, and furthermore, for large λ values the difference is even more apparent.

• When looking at hyperexponential services (see Fig. 4), we notice that for all combinations, the FJ system appears to yield a much smaller expected latency as compared to the corresponding FEC system. This is intuitively clear, since highly variable services such as hyperexponential will yield a smaller service time for the FJ system compared to the FEC system. While this has been observed in [18] for the case of Poisson arrivals and for log-concave and log-convex type services, we notice here the same phenomenon for other types of arrival processes. More importantly, for non-Poisson arrivals and non-exponential services the measure E(𝓛) appears to differ significantly between the two systems, FJ and FEC.

It is worth mentioning here that when the services are exponential, we noticed that the FJ system yields a smaller value for E(𝓛) when compared to the corresponding FEC system for all combinations and for all values of λ. When λ is increased (while maintaining stability) E(𝓛) appears to approach the same value for both systems. To save space, we do not provide the figures here, since they do not provide additional insight.

## 4 Simulation

So far, we studied FJ and FEC systems analytically. However, the state space of the CTMC governing the system under study grows exponentially with c, and hence we turn to simulation to study the systems for large values of c. To complete the experiments, we validated our simulation models with the analytical ones. Towards this end, we consider the arrival processes and service time distributions defined in Section 3. We fix λ = 0.8, μ = 1.0, and vary c from 1 to 5. We compare the analytical results for FJ and FEC systems with the simulated results, for the same set of parameter values. The simulation models were implemented using ARENA [19], a powerful software used in applied stochastic modeling and other areas. We made five simulation runs with each run lasting 1,000,000 units of model time.

In Tables 1 and 2, respectively, we display the error percentages (i.e., error = |analytical − simulated|/analytical) for E(𝓛) and E(𝓒) (in parentheses), for the FJ and FEC systems. Generally the simulated results and the corresponding analytical results agree closely, with error percentages below 1%. We also identified the scenarios with error percentages from 1.09% to 6.65% and re-ran the simulations with 10,000,000 units of model time, which reduced the error significantly (e.g., for MPCA arrivals, EXPS services, and c = 1, the error percentage dropped from 6.65% to 0.92%).

Table 1

Error percentages for E(𝓛)(E(𝓒)) under various scenarios for FJ system

Table 2

Error percentages for E(𝓛)(E(𝓒)) under various scenarios for FEC system

Having validated the simulated models with the analytical models for the FJ and FEC systems, we now simulate these two models when dealing with large values of c.

#### Example 3

We fix λ= 0.9, μ = 1.0, take c = 1,…, 10, 15, 20, 50 and plot the measures E(𝓒) and E(𝓛) under various scenarios, in Fig. 5 and Fig. 6, respectively. It is clear from these figures that

Figure 5

Plot of E(𝓒) under various scenarios

Figure 6

Plot of E(𝓛) under various scenarios

• for the FJ system, E(𝓒) increases as the number of servers (or replication parameter) is increased only for the case of ERLS. It decreases to a constant for HEXS, and we stress the fast decrease for low values of c. Note that we plot the theoretical value 1/μ for EXPS, since the result follows analytically. In general, E(𝓒) depends on the type of service time distribution, which agrees with Remark 1 and Lemma 1, since ERLS is log-concave, while HEXS is log-convex.

• for the FEC system, E(𝓒) is given by $1/\mu$, confirming the theoretical result (see Section 2.2.2).

• we see an interesting pattern among different scenarios for the measure E(𝓛). This measure appears to be larger for FEC compared to that of FJ system for all scenarios except for MPCA arrivals and ERLS services.

In this case (i.e., for MPCA with ERLS) we see that FJ system yields a higher E(𝓛). This indicates that when services have a coefficient of variation less than one and for positively correlated type of arrivals, one may need a larger number of servers as compared to other scenarios.

## 4.1 Simulation of (c, k)-SM System

In this section we concentrate on simulating (c, k)-SM system under various scenarios. In addition to the exponential service time distribution EXPS mentioned earlier, we also consider a log-concave and a heavy-tailed one. The cumulative distribution functions (CDF) follow:

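D. Shifted exponential (SEXS): The shifted exponential with a shift of magnitude Δ > 0 is one with CDF

$F_{SE}(x) = \begin{cases} 1 - e^{-\mu(x - \Delta)}, & x \geqslant \Delta, \\ 0, & x < \Delta. \end{cases}$

E. Weibull (WEIB): The 2-parameter Weibull considered here has the CDF

$F_{WB}(x) = \begin{cases} 1 - e^{-(2x)^{0.5}}, & x \geqslant 0, \\ 0, & x < 0. \end{cases}$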
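Both distributions are easy to sample by inverting the CDF, which is one way such service times could be generated in a simulation. The sketch below is illustrative only (the paper's simulations were run in ARENA); the function names are ours, and the values of μ and Δ in the usage lines are placeholders, not the values used by the authors.

```python
import math
import random

def shifted_exp_cdf(x, mu, delta):
    return 1.0 - math.exp(-mu * (x - delta)) if x >= delta else 0.0

def shifted_exp_inverse(u, mu, delta):
    """Quantile function: solves F_SE(x) = u for 0 <= u < 1."""
    return delta - math.log(1.0 - u) / mu

def weibull_cdf(x):
    return 1.0 - math.exp(-math.sqrt(2.0 * x)) if x >= 0.0 else 0.0

def weibull_inverse(u):
    """Quantile function: (2x)^0.5 = -ln(1 - u), so x = 0.5 * ln(1 - u)^2."""
    return 0.5 * math.log(1.0 - u) ** 2

# Theoretical means: delta + 1/mu for SEXS, and scale * Gamma(1 + 1/shape)
# for the Weibull with shape 0.5 and scale 0.5.
weibull_mean = 0.5 * math.gamma(1.0 + 1.0 / 0.5)

rng = random.Random(2018)                     # fixed seed, placeholder parameters
sexs_sample = [shifted_exp_inverse(rng.random(), mu=2.0, delta=0.5) for _ in range(5)]
weib_sample = [weibull_inverse(rng.random()) for _ in range(5)]
print(weibull_mean, sexs_sample, weib_sample)
```

The Weibull above has mean 1 but a heavy right tail (shape parameter below 1), which is why its higher order statistics are large.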

First, note that the simulated mean ($\mu'_s$) and standard deviation ($\sigma_S$) when using the three service distributions are given in Table 3 below. In the example below, we simulated the models using three simulation runs, with each run covering 500,000 jobs leaving the system.

Table 3

Simulated $\mu'_s$ ($\sigma_S$) for (c, k) systems when μ = 1

#### Example 4

In this example we look at the (c, k)-SM system by considering c = 3, 4, 5, varying k = 1,…, c, and varying ρ = λ$\mu'_s$ = 0.2, 0.5, 0.9, 0.95, by choosing λ appropriately using the values of $\mu'_s$ displayed in Table 3 above. Note that the values of λ will depend on the type of arrival process as well as the service distribution. However, using ρ as a common parameter we can compare various scenarios to bring out the qualitative aspects of the system under study. The results of the experiment are presented in Fig. 7. It is clear from the results that the measure E(𝓛) appears

Figure 7

Plot of E(𝓛) under various scenarios when c = 3, 4, 5 and k = 1, …, c for ρ = 0.2, 0.5, 0.9, 0.95 (black, red, green, blue circles correspondingly). Radius of the circle is proportional to the E(𝓛).

• to decrease with increasing c for all scenarios and for all k (where comparison is valid). For example, we can compare k up to 3 when dealing with c = 3, 4, 5, and k up to 4 when dealing with c = 4, 5.

• to exhibit an interesting pattern. As k increases to c, we see that the expected latency for heavy-tailed services like Weibull appears to go from the least value (compared to the other service distributions) to the highest. This seems to be the case for higher c. This phenomenon can be explained as follows. For a heavy-tailed distribution, when k is closer to c, the mean service time will be much higher as compared to other distributions, resulting in a higher E(𝓛).

## 5 Concluding Remarks and future research work

In this paper we studied queueing models useful in the study of efficient redundancy techniques in CC and DG systems. We combined analytic and simulation approaches in the study of such queueing models under the assumption that the arrivals occur according to a versatile Markovian point process. Through simulation and numerical methods we showed the effect of redundancy on the expected latency as well as on the expected cost. The models studied in this paper can be extended in a number of ways. For example, heavy-tailed distributions for services [36] play a significant role in CC and DG areas. So, it will be of interest to explore this in the context of the classical FJ system as well as extensions of FJ and SM systems. This will shed additional light similar to [18] but for a more versatile arrival process. It is also worth considering, both from theoretical and algorithmic points of view, further extensions that compare the results with the model presented in [37]. It should be pointed out that some preliminary results for multiserver queues with log-convex type services are available in [36].

## Acknowledgement

The work of AR is partially supported by RFBR, projects 18-07-00147 and 18-07-00156, and by the RF President’s grant No. MK-1641.2017.1. The authors express their sincere thanks to the anonymous reviewers and the editor for their suggestions and for drawing attention to some key references, which improved the presentation of the paper.

## References

• [1]

Artalejo J.R., Gómez-Corral A., He Q.M., Markovian arrivals in stochastic modelling: a survey and some new results, Statistics and Operations Research Transactions, 2010, 34, 2, 101–144.

• [2]

Baccelli F., Makowski A.M., Shwartz A., The fork-join queue and related systems with synchronization constraints: stochastic ordering and computable bounds, Adv. Appl. Prob., 1989, 21, 629–660.

• [3]

Bladt M., Nielsen B. F., Matrix-Exponential Distributions in Applied Probability, Probability Theory and Stochastic Modelling, 81, Springer US, Boston, MA, 2017.

• [4]

Chakravarthy S.R., The batch Markovian arrival process: A review and future work, In: A. Krishnamoorthy et al. (Eds.), Advances in Probability Theory and Stochastic Processes, Notable Publications Inc., NJ, 2001, 21–39.

• [5]

Chakravarthy S.R., Markovian arrival processes, In: Wiley Encyclopedia of Operations Research and Management Science, Published Online: 15 JUN 2010.

• [6]

Chernov I., Nikitina N., Optimal Quorum for a Reliable Desktop Grid, In: Proceedings of the Second International Conference BOINC-based High Performance Computing: Fundamental Research and Development (BOINC:FAST 2015), CEUR Workshop Proceedings, Vol-1502, 2015, 31–36.

• [7]

Furht B., Escalante A. (Eds.), Handbook of Cloud Computing, Springer, New York, USA, 2010.

• [8]

Gardner K., Zbarsky Z., Doroudi S., Harchol-Balter M., Hyytiä E., Scheller-Wolf A., Queueing with redundant requests: exact analysis, Queueing Syst., 2016, 83, 227–259.

• [9]

Gardner K., Harchol-Balter M., Scheller-Wolf A., Van Houdt B., A Better Model for Job Redundancy: Decoupling Server Slowdown and Job Size, IEEE/ACM Transactions on Networking, 2017, PP, 99, 1–15.

• [10]

Rama Murthy Garimella, Rumyantsev A., On An Exact Solution Of The Rate Matrix Of Quasi-Birth-Death Process With Small Number Of Phases, In: ECMS 2017 Proceedings. European Council for Modeling and Simulation, 2017, 713–719.

• [11]

Graham A., Kronecker Products and Matrix Calculus with Applications, Ellis Horwood, Chichester, UK, 1981.

• [12]

He Qi-Ming, Fundamentals of Matrix-Analytic Methods, Springer, New York, 2014.

• [13]

Heien E. M., Anderson D. P., Hagihara K., Batches with Unreliable Workers in Volunteer Computing Environments, Journal of Grid Computing, 2009, 7, 4, 501–518.

• [14]

Ivashko E., Enterprise Desktop Grids, In: Proceedings of the Second International Conference BOINC-based High Performance Computing: Fundamental Research and Development (BOINC:FAST 2015), CEUR Workshop Proceedings, Vol-1502, 2015, 16–21.

• [15]

Joshi G., Soljanin E., Wornell G., Efficient replication of queued tasks for latency reduction in cloud systems, In: Proceedings of 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2015, 107–114.

• [16]

Joshi G., Soljanin E., Wornell G., Efficient redundancy techniques for latency reduction in cloud systems, arXiv:1508.03599 [cs.DC], Aug. 2015.

• [17]

Joshi G., Soljanin E., Wornell G., Queues with redundancy: Latency-Cost analysis, Performance Evaluation Review, 2015, 43, 2, 54–56.

• [18]

Joshi G., Efficient Redundancy Techniques to Reduce Delay in Cloud Systems, Ph.D. Thesis, MIT, June 2016.

• [19]

Kelton W.D., Sadowski R.P., Swets N.B., Simulation with ARENA, Fifth ed., McGraw-Hill, New York, 2010.

• [20]

Kim C., Agrawala A.K., Analysis of the fork-join queue, IEEE Trans. Comput., 1989, 38, 2, 1041–1053.

• [21]

Latouche G., Ramaswami V., Introduction to matrix analytic methods in stochastic modeling, SIAM, 1999.

• [22]

Lucantoni D., Meier-Hellstern K.S., Neuts M.F., A single-server queue with server vacations and a class of nonrenewal arrival processes, Advances in Applied Probability, 1990, 22, 676–705.

• [23]

Lucantoni D.M., New results on the single server queue with a batch Markovian arrival process, Stochastic Models, 1991, 7, 1–46.

• [24]

Marcus M., Minc H., A Survey of Matrix Theory and Matrix Inequalities, Allyn and Bacon, Boston, MA, 1964.

• [25]

Mell P., Grance T., The NIST Definition of Cloud Computing, Special publication 800-145, National Institute of Standards and Technology: U.S. Department of Commerce, 2011.

• [26]

Murugesan S., Bojanova I., Encyclopedia of cloud computing, John Wiley & Sons, Ltd., UK, 2016.

• [27]

Nelson R., Tantawi A.N., Approximate analysis of fork/join synchronization in parallel queues, IEEE Trans. Comput., 1988, 37, 6, 739–743.

• [28]

Neuts M.F., A versatile Markovian point process, Journal of Applied Probability, 1979, 16, 764–779.

• [29]

Neuts M.F., Matrix-geometric solutions in stochastic models: An algorithmic approach, The Johns Hopkins University Press, Baltimore, MD, 1981.

• [30]

Neuts M.F., Structured stochastic matrices of M/G/1 type and their applications, Marcel Dekker, NY, 1989.

• [31]

Neuts M.F., Models based on the Markovian arrival process, IEICE Transactions on Communications, 1992, E75B, 1255–1265.

• [32]

Neuts M.F., Algorithmic Probability: A collection of problems, Chapman and Hall, NY, 1995.

• [33]

Nouman Durrani M., Shamsi J. A., Volunteer computing: requirements, challenges, and solutions, Journal of Network and Computer Applications, 2014, 39, 369–380.

• [34]

Qiu Z., Perez J.F., Harrison P.G., Tackling latency via replication in distributed systems, In: Proceedings of ICPE’16, March 12-18, 2016, Delft, Netherlands, 2016.

• [35]

Rood B., Lewis M.J., Grid Resource Availability Prediction-Based Scheduling and Task Replication, Journal of Grid Computing, 2009, 7, 479–500.

• [36]

Rumyantsev A., Chakravarthy S. R., Split-Merge Model of Workunit Replication in Distributed Computing, In: Proceedings of the Third International Conference BOINC-based High Performance Computing: Fundamental Research and Development (BOINC:FAST 2017), CEUR Workshop Proceedings, Vol. 1973, 2017, 27–34.

• [37]

Squillante M. S., Zhang Y., Sivasubramaniam A., Gautam N., Generalized parallel-server fork-join queues with dynamic task scheduling, Annals of Operations Research, 2008, 160, 227–255.

• [38]

Steeb W-H., Hardy Y., Matrix Calculus and Kronecker Product, World Scientific Publishing, Singapore, 2011.

• [39]

Thomasian A., Analysis of Fork/Join and Related Queueing Systems, ACM Comput. Surv., 2014, 47, 2, Article 17, 71 p., http://dx.doi.org/10.1145/2628913

• [40]

Voas J., Zhang J., Cloud computing: New wine or just a new bottle? IEEE ITPro, 2009, 15–17.

• [41]

Vulimiri A., Godfrey P.B., Mittal R., Sherry J., Ratnasamy S., Shenker S., Low latency via redundancy, In: Proceedings of CoNEXT’13, Santa Barbara, California, USA, 2013, 283–294.

Accepted: 2017-11-16

Published Online: 2018-03-03

Citation Information: Open Engineering, Volume 8, Issue 1, Pages 17–31, ISSN (Online) 2391-5439.
