# Optimality and Complexity Analysis of a Branch-and-Bound Method in Solving Some Instances of the Subset Sum Problem

Roman Kolpakov and Mikhail Posypkin
From the journal Open Computer Science

## Abstract

In this paper we study the parallelization of a variant of the Branch-and-Bound method for solving the subset sum problem, which is a special case of the Boolean knapsack problem. The following natural approach is considered. At the first stage, one of the processors (the control processor) performs a number of steps of the algorithm on the given problem, generating a number of subproblems. At the second stage, the generated subproblems are sent to the other processors (one subproblem per processor). The processors solve the received subproblems completely and return their solutions to the control processor, which chooses the optimal solution of the initial problem from these solutions. For this approach we formally define a model of parallel computing (the frontal parallelization scheme) and the notion of complexity of the frontal scheme. We study the asymptotic behavior of the complexity of the frontal scheme for two special cases of the subset sum problem.

## 1 Introduction

We consider one of the possible parallel realizations of a variant of the Branch-and-Bound method for solving the subset sum problem, which is a particular case of the knapsack problem [1]. Along with the dynamic programming method [2], the Branch-and-Bound method is a basic method for solving this problem. The Branch-and-Bound method is based on step-by-step decomposition of a given problem into subproblems, removing from consideration the subproblems which certainly have no optimal solutions. The simplest way to parallelize such computations is as follows. One of the processors is chosen as a control processor that performs some number of decompositions at the first stage. At the second stage, the obtained subproblems are sent to the other processors, one subproblem to each processor. Then the processors solve the received subproblems completely using the Branch-and-Bound method, and the control processor collects all the obtained solutions and chooses the optimal solution from them. A more detailed description of this approach to parallelization can be found in [3]. The described parallelization scheme is ideally suited to distributed environments where the control processor interacts with the other processors only to send a new job or to receive calculation results. Many grid systems which are widely used today belong to this class of environments [4].

The subset sum problem (SSP) is a particular case of the knapsack problem where for each item the price is equal to the weight of the item. The subset sum problem is stated as follows:

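$$\text{maximize } f(\tilde{x}) = \sum_{i \in N} x_i w_i, \quad \text{subject to } g(\tilde{x}) = \sum_{i \in N} x_i w_i \le C, \quad x_i \in \{0,1\},\ i \in N, \tag{1}$$

where $N = \{1, \dots, n\}$ is the set of integers between 1 and $n$, and the capacity $C$ and the weights $w_i$ for $i \in N$ are positive numbers (without loss of generality we assume that $C < \sum_{i \in N} w_i$).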

The parallelization of solving SSP by the dynamic programming method [5] and by the Horowitz and Sahni (two-list) method [6] in the shared-memory computational model has been actively investigated in the literature. Theoretical bounds on the complexity of parallel solving of SSP are obtained in [7,8,9,10,11] for the dynamic programming method and in [12] for the two-list method. Empirical parallel algorithms for solving SSP by the dynamic programming method in CPU and GPU computational models are proposed in [11, 13, 14]. Empirical parallelizations of the two-list method for solving SSP in GPU and hybrid CPU/GPU computational models are proposed in [15,16,17]. A parallel GPU algorithm for solving a modification of SSP is proposed in [18]. More information on the state of the art in this field can be found in [19], where an efficient modification of the balanced algorithm [1] for SSP is proposed. New approaches for parallel solving of optimization problems were proposed in [20,21,22,23,24].

In this paper (an extended journal version of [25]) we investigate a variant of the parallelization scheme described above, which is called the frontal scheme. We give a definition of the complexity of this scheme and study the asymptotic behavior of this complexity for two series of subset sum problems. Comparing the asymptotic estimates of the frontal scheme complexity obtained for these two series of problems, we find that the complexity behaves qualitatively differently in the two series.

## 2 Preliminaries

A boolean tuple $\tilde{x} = (x_1, x_2, \dots, x_n)$ such that $g(\tilde{x}) \le C$ is called a feasible solution of the problem (1). A feasible solution $\tilde{x}$ of the problem (1) is called an optimal solution if for any other feasible solution $\tilde{y}$ of the problem (1) the inequality $f(\tilde{y}) \le f(\tilde{x})$ holds. Solving the problem (1) means finding at least one of its optimal solutions.

We define a map as the pair $(I, \theta)$ of a set $I \subseteq N$ and a mapping $\theta : I \to \{0, 1\}$. Any map $(I, \theta)$ defines a subproblem formulated as follows:

$$\text{maximize } f(\tilde{x}) = \sum_{i \in N} w_i x_i, \quad \text{subject to } g(\tilde{x}) = \sum_{i \in N} w_i x_i \le C, \quad x_i = \theta(i),\ i \in I, \quad x_i \in \{0,1\},\ i \in N \setminus I. \tag{2}$$

Variables $x_i$ such that $i \in I$ are called fixed variables of the subproblem (2), and variables $x_i$ such that $i \in N \setminus I$ are called free variables of this subproblem. We will refer to the subproblem (2) as the corresponding subproblem for the map $(I, \theta)$ and will refer to the map $(I, \theta)$ as the corresponding map for the subproblem (2). The elements of the map corresponding to a subproblem $P$ will be denoted by $I_P$ and $\theta_P$.

A boolean tuple $\tilde{x} = (x_1, x_2, \dots, x_n)$ such that

$$g(\tilde{x}) \le C, \quad x_i = \theta(i),\ i \in I,$$

is called a feasible solution of the subproblem (2). Clearly, any feasible solution of the subproblem (2) is a feasible solution of the problem (1) as well. A feasible solution $\tilde{x}$ of the subproblem (2) is called optimal if for any other feasible solution $\tilde{y}$ of this subproblem the inequality $f(\tilde{y}) \le f(\tilde{x})$ holds (for convenience, we will also call an optimal solution of a subproblem simply a solution).

Let $W = \sum_{i \in N} w_i$. We say that the subproblem (2) satisfies C0-condition if $\sum_{i \in I} \theta(i) w_i > C$ and satisfies C1-condition if $\sum_{i \in I} (1 - \theta(i)) w_i \ge W - C$. For subproblems satisfying C0- and C1-conditions we have the following obvious facts.

## Proposition 1

A subproblem satisfying C0-condition has no feasible solutions.

## Proposition 2

A subproblem satisfying C1-condition has a unique optimal solution $\tilde{x} = (x_1, x_2, \dots, x_n)$, namely

$$x_i = \begin{cases} \theta(i) & \text{if } i \in I, \\ 1 & \text{if } i \in N \setminus I. \end{cases}$$

## Corollary 1

A subproblem can not satisfy both C0-condition and C1-condition at the same time.

Moreover, it is easy to see that in the case of I = N the subproblem (2) satisfies either C0-condition or C1-condition.
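The two conditions can be checked directly from a map $(I, \theta)$. The following Python sketch is only an illustration: the function names and the representation of $\theta$ as a dictionary over 0-based indices are assumptions made here, not notation from the paper.

```python
from typing import Dict, Sequence


def satisfies_c0(weights: Sequence[int], capacity: int, fixed: Dict[int, int]) -> bool:
    """C0-condition: the items fixed to 1 already exceed the capacity."""
    return sum(weights[i] for i, v in fixed.items() if v == 1) > capacity


def satisfies_c1(weights: Sequence[int], capacity: int, fixed: Dict[int, int]) -> bool:
    """C1-condition: the items fixed to 0 weigh at least W - C, so setting
    every free variable to 1 gives a feasible (hence optimal) tuple."""
    total = sum(weights)  # W in the text
    excluded = sum(weights[i] for i, v in fixed.items() if v == 0)
    return excluded >= total - capacity


# Illustrative instance (hypothetical data): weights (3, 5, 7), C = 9, W = 15.
w, c = [3, 5, 7], 9
print(satisfies_c0(w, c, {0: 1, 2: 1}))  # fixed ones weigh 10 > 9
print(satisfies_c1(w, c, {2: 0}))        # excluded weight 7 >= 15 - 9
```

With every variable fixed, exactly one of the two functions returns `True`, in line with Corollary 1 and the remark about the case $I = N$.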

Let $x_i$ be a free variable of the subproblem (2). Then we can consider two subproblems $P_0$ and $P_1$ such that $I_{P_0} = I_{P_1} = I \cup \{i\}$ and

$$\theta_{P_k}(j) = \begin{cases} \theta(j) & \text{if } j \in I, \\ k & \text{if } j = i, \end{cases} \qquad k = 0, 1.$$

Note that the set of all feasible solutions of the subproblem (2) is the union of the sets of all feasible solutions of the subproblems $P_0$ and $P_1$, so any optimal solution of the subproblem (2) is an optimal solution of either the subproblem $P_0$ or the subproblem $P_1$. We will say that the subproblem (2) is decomposed into the subproblems $P_0$ and $P_1$ along the variable $x_i$ and call the subproblem $P_0$ ($P_1$) the 0-decomposition (1-decomposition) of the subproblem (2). Note that if the subproblem (2) satisfies neither C0-condition nor C1-condition then the 0-decomposition (the 1-decomposition) of this subproblem cannot satisfy C0-condition (C1-condition).

## 3 Branch-and-Bound algorithm

In this paper we consider one of the basic variants of the Branch-and-Bound method for solving SSP, which we call the consecutive Branch-and-Bound (cBnB) algorithm. To solve a given subset sum problem by this algorithm, we maintain a FIFO queue of subproblems which contains, at each moment, all subproblems of the given problem that are waiting for processing. At each moment we also keep the best feasible solution of the given problem found so far, which we call the incumbent solution. Initially the subproblems queue contains only the given problem. While the subproblems queue is not empty, we remove the next subproblem $P$ from this queue and process $P$ in the following way, depending on three possible cases.

1. Let P satisfy C0-condition. Then, by Proposition 1, P has no feasible solutions, so we don’t make any operations.

2. Let P satisfy C1-condition, so, by Proposition 2, P has a unique optimal solution $\tilde{x}$. Then we compare $\tilde{x}$ with the incumbent solution $\tilde{y}$ and, if $f(\tilde{x}) > f(\tilde{y})$, we replace $\tilde{y}$ by $\tilde{x}$.

3. Let P satisfy neither C0-condition nor C1-condition, so P has free variables. Then we decompose P along the free variable with the minimal index and put the obtained decompositions of P into the subproblems queue.

It is easy to see that after the cBnB algorithm terminates, the incumbent solution is an optimal solution of the given problem. The complexity of solving SSP by the cBnB algorithm is the number of subproblems processed during the procedure of solving. Note that, since the subproblems processed by the algorithm are decomposed along the free variable with the minimal index, for each processed subproblem $P$ we have $I_P = \{1, 2, \dots, s\}$ where $0 \le s \le n$ ($s = 0$ only for the given problem itself, for which $I_P = \emptyset$), and in the third case this subproblem is decomposed along the variable $x_{s+1}$. Further, we denote the number $s$ by $s_P$.
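The three cases above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `cbnb`, the encoding of a subproblem as the tuple of values fixed for $x_1, \dots, x_s$, and the choice of the all-zero tuple as the initial incumbent are all choices made here.

```python
from collections import deque
from typing import Sequence, Tuple


def cbnb(weights: Sequence[int], capacity: int) -> Tuple[int, Tuple[int, ...], int]:
    """Return (optimal value, optimal tuple, number of processed subproblems)."""
    n, total = len(weights), sum(weights)
    # Assumption: the incumbent starts as the all-zero tuple, which is feasible.
    best_value, best_x = 0, (0,) * n
    processed = 0
    queue = deque([()])  # FIFO queue; () is the given problem (no fixed variables)
    while queue:
        theta = queue.popleft()
        processed += 1
        ones = sum(w for w, t in zip(weights, theta) if t == 1)
        zeros = sum(w for w, t in zip(weights, theta) if t == 0)
        if ones > capacity:            # case 1: C0-condition, nothing to do
            continue
        if zeros >= total - capacity:  # case 2: C1-condition, free variables := 1
            value = total - zeros
            if value > best_value:
                best_value, best_x = value, theta + (1,) * (n - len(theta))
            continue
        # case 3: decompose along the free variable with the minimal index
        queue.append(theta + (0,))
        queue.append(theta + (1,))
    return best_value, best_x, processed


print(cbnb([3, 5, 7], 9))  # (8, (1, 1, 0), 13)
```

For the illustrative instance with weights (3, 5, 7) and capacity 9, the sketch processes 13 subproblems, 7 of which are leaves, and returns the tuple (1, 1, 0) of value 8.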

The process of solving the problem can be represented by a rooted binary tree which we call the cBnB-tree of the problem. The subproblems processed by the cBnB algorithm form the set of the cBnB-tree nodes, i.e. the number of the cBnB-tree nodes is equal to the complexity of solving the problem by the cBnB algorithm. The root of the cBnB-tree is the given subset sum problem. Each subproblem decomposed by the cBnB algorithm is connected by arcs with the two decompositions of this subproblem. Thus, leaves of the cBnB-tree are subproblems satisfying either C0-condition or C1-condition, and internal nodes of the cBnB-tree are decomposed subproblems. By a C0-leaf (C1-leaf) of the cBnB-tree we mean a leaf satisfying C0-condition (C1-condition). Since each internal node of the cBnB-tree has two children, the number of the cBnB-tree nodes is equal to $2L - 1$ where $L$ is the number of the cBnB-tree leaves.

Let P be a subproblem processed by the cBnB algorithm. Then by the level of P we will mean the number of distinct ancestors of P in the cBnB-tree, i.e. the given problem has level 0, the decompositions of the given problem have level 1, the decompositions of the decompositions of the given problem have level 2 and so on. In this paper we consider the cBnB algorithm which traverses all subproblems in the cBnB-tree by levels, using BFS search.

## 4 Frontal scheme of parallelization

We study questions concerning the parallelization of solving SSP by the cBnB algorithm. In particular, we consider the following scheme of parallelization which we call the frontal scheme. To apply this scheme, we assume that a potentially unrestricted number of processors can be used. Among the processors used we choose one, which is called the control processor. All other processors are called operable processors. The frontal scheme depends on a positive integer parameter $l$ which we call the parallelization level of the scheme. The scheme has two stages of computations. At the first stage the control processor solves the given problem by the cBnB algorithm as long as the subproblems queue contains subproblems of level less than $l$. Note that after this stage the subproblems queue contains all subproblems which are the cBnB-tree nodes of level $l$. We will call these subproblems candidate subproblems. Then the control processor saves the current incumbent solution of the given problem and sends all candidate subproblems to operable processors (one subproblem to each processor). At the second stage the operable processors which receive candidate subproblems solve these subproblems completely by the cBnB algorithm and send the obtained optimal solutions back to the control processor, which finds an optimal solution of the given problem by comparing the received solutions and the saved incumbent solution.

The first stage processing can be presented by the fragment of the cBnB-tree induced by all subproblems processed at the first stage and all candidate subproblems, i.e. all subproblems of level not greater than l in the cBnB-tree. Note that this fragment is a rooted binary tree of depth l. We will call it the first stage tree of level l. Note that internal nodes of the first stage tree are subproblems decomposed at the first stage, and leaves of the first stage tree are either the cBnB-tree C0- and C1-leaves of level less than l (which are also called respectively C0- and C1-leaves of the first stage tree) or candidate subproblems (which are called candidate leaves of the first stage tree).

By the first stage complexity we mean the number of subproblems processed by the control processor at the first stage, i.e. the total number of internal nodes and C0- and C1-leaves in the first stage tree. By the second stage complexity we mean the maximal complexity of solving by the cBnB algorithm the subproblem sent to an operable processor. The frontal scheme complexity is defined as the sum of the first stage and the second stage complexities (thus in our complexity model we don’t take into account the time required for communications between processors and for comparing the received solutions).
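The definitions above can be evaluated by direct simulation on small instances. The sketch below is an illustration only; the names `is_leaf`, `subtree_size` and `frontal_complexity` and the tuple encoding of subproblems are assumptions made here. Since the cBnB algorithm as described never discards a subproblem on the basis of the incumbent, the number of processed subproblems does not depend on the traversal order, so the helper counts a subtree with a stack instead of a FIFO queue.

```python
from typing import Sequence, Tuple

Theta = Tuple[int, ...]  # values fixed for x_1, ..., x_s


def is_leaf(weights: Sequence[int], capacity: int, theta: Theta) -> bool:
    """True if the subproblem satisfies C0-condition or C1-condition."""
    ones = sum(w for w, t in zip(weights, theta) if t == 1)
    zeros = sum(w for w, t in zip(weights, theta) if t == 0)
    return ones > capacity or zeros >= sum(weights) - capacity


def subtree_size(weights: Sequence[int], capacity: int, theta: Theta) -> int:
    """Complexity of solving the subproblem theta by the cBnB algorithm."""
    count, stack = 0, [theta]
    while stack:
        node = stack.pop()
        count += 1
        if not is_leaf(weights, capacity, node):
            stack += [node + (0,), node + (1,)]
    return count


def frontal_complexity(weights: Sequence[int], capacity: int, level: int) -> Tuple[int, int]:
    """Return (first stage complexity, second stage complexity) for a level >= 1."""
    first, frontier = 0, [()]
    for _ in range(level):  # process all subproblems of level < `level`
        next_frontier = []
        for node in frontier:
            first += 1
            if not is_leaf(weights, capacity, node):
                next_frontier += [node + (0,), node + (1,)]
        frontier = next_frontier
    # `frontier` now holds the candidate subproblems (nodes of level `level`)
    second = max((subtree_size(weights, capacity, c) for c in frontier), default=0)
    return first, second


for l in (1, 2, 3):
    print(l, frontal_complexity([3, 5, 7], 9, l))  # (1, 7), (3, 3), (7, 1)
```

For the illustrative instance with weights (3, 5, 7) and capacity 9, the frontal scheme complexity is 8, 6 and 8 for levels 1, 2 and 3, so level 2 is optimal for that instance.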

Note that the frontal scheme complexity depends on the chosen parallelization level. The problem we address in the paper is what is the optimal value of parallelization level for which the frontal scheme has the minimal complexity. We investigate this problem for the following particular case of subset sum problem:

$$\text{maximize } f(\tilde{x}) = \sum_{i \in N} a x_i, \quad \text{subject to } g(\tilde{x}) = \sum_{i \in N} a x_i \le ka + 1, \quad x_i \in \{0,1\},\ i \in N, \tag{3}$$

where $a > 1$ and $k \in \{0, 1, 2, \dots, n-1\}$. This case has been chosen because it is relatively simple and well studied in the literature [26, 27]. It is easy to see (see, e.g., [28]) that the C0-leaves of the cBnB-tree for the problem (3) are the subproblems $P$ such that $\sum_{i=1}^{s_P - 1} \theta_P(i) = k$ and $\theta_P(s_P) = 1$. These subproblems are in one-to-one correspondence with the boolean tuples $(\theta_P(1), \theta_P(2), \dots, \theta_P(s_P), 0, 0, \dots, 0) \in \{0,1\}^n$ of weight $k + 1$ (the weight of a boolean tuple is the number of its components equal to 1). So the cBnB-tree for the problem (3) has $\binom{n}{k+1}$ C0-leaves. It is also easy to see that the C1-leaves of the cBnB-tree for the problem (3) are the subproblems $P$ such that $\sum_{i=1}^{s_P - 1} \theta_P(i) = k - (n - s_P)$ and $\theta_P(s_P) = 0$, which are in one-to-one correspondence with the boolean tuples $(\theta_P(1), \theta_P(2), \dots, \theta_P(s_P), 1, 1, \dots, 1) \in \{0,1\}^n$ of weight $k$. So the cBnB-tree for the problem (3) has $\binom{n}{k}$ C1-leaves. Thus the total number of leaves in this tree is $\binom{n}{k+1} + \binom{n}{k} = \binom{n+1}{k+1}$, and the total number of nodes is $2\binom{n+1}{k+1} - 1$. Hence the complexity of solving the problem (3) by the cBnB algorithm is $2\binom{n+1}{k+1} - 1$, i.e. this complexity depends neither on $a$ nor on the set of variable indexes. So without loss of generality we denote the problem (3) by $P[n; k]$ and assume that the 0-decomposition (1-decomposition) of the problem $P[n; k]$ is the problem $P[n-1; k]$ ($P[n-1; k-1]$).
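The node count $2\binom{n+1}{k+1} - 1$ can be cross-checked with the decomposition rule $P[n;k] \to P[n-1;k],\ P[n-1;k-1]$. In the sketch below (an illustration; the function name and the convention for encoding leaves are assumptions made here), a 1-decomposition that drives $k$ below 0 is a C0-leaf, and a subproblem with $k \ge n$ is a C1-leaf because all remaining items fit into the capacity.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def cbnb_tree_size(n: int, k: int) -> int:
    """Number of nodes in the cBnB-tree of P[n; k], counted recursively."""
    if k < 0 or k >= n:  # C0-leaf (k < 0) or C1-leaf (k >= n)
        return 1
    # root + subtree of the 0-decomposition + subtree of the 1-decomposition
    return 1 + cbnb_tree_size(n - 1, k) + cbnb_tree_size(n - 1, k - 1)


# Compare with the closed form on a range of small parameters.
mismatches = [
    (n, k)
    for n in range(1, 13)
    for k in range(n)
    if cbnb_tree_size(n, k) != 2 * comb(n + 1, k + 1) - 1
]
print(mismatches)  # [] (the recursion agrees with the closed form)
```

The agreement is Pascal's rule in disguise: $(2\binom{n}{k+1} - 1) + (2\binom{n}{k} - 1) + 1 = 2\binom{n+1}{k+1} - 1$.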

By $L^{(1)}_{P[n;k]}(l)$ ($L^{(2)}_{P[n;k]}(l)$) we denote the first stage (the second stage) complexity of solving $P[n; k]$ by applying the frontal scheme with the parallelization level $l$, and by $L_{P[n;k]}(l) = L^{(1)}_{P[n;k]}(l) + L^{(2)}_{P[n;k]}(l)$ we denote the frontal scheme complexity of solving the problem $P[n; k]$. Define $L^*_{P[n;k]} = \min_l L_{P[n;k]}(l)$. A parallelization level $l^*$ is optimal for $P[n; k]$ if $L_{P[n;k]}(l^*) = L^*_{P[n;k]}$. Further, we investigate the asymptotic behavior of optimal parallelization levels for problems $P[n; k]$ as $n$ tends to infinity. This problem was considered earlier in [29, 30]. In particular, it is shown in [30] that in the case $n/3 \le k \le 2n/3$ the optimal parallelization level $l^*$ satisfies the relation $l^* = \frac{n}{2} - \frac{1}{4}\log_2 n + O(1)$ and $L^*_{P[n;k]} = \Theta\left(2^{n/2} / \sqrt[4]{n}\right)$. In this work we generalize this result to the case $n/4 < k < 3n/4$. Moreover, we consider for the first time the case of "small" values of $k$. In particular, we compute the optimal parallelization level $l^*$ and the complexity $L^*_{P[n;k]}$ for the case $k \le \frac{3 - \sqrt{5}}{4} n$ ($k \ge \left(1 - \frac{3 - \sqrt{5}}{4}\right) n$) and show in this way that the asymptotic behavior of the values $l^*$ and $L^*_{P[n;k]}$ in this case is different from their asymptotic behavior in the case $n/4 < k < 3n/4$. Thus, the main result of this paper is the detection of the difference in the asymptotic behavior of the values $l^*$ and $L^*_{P[n;k]}$ for the cases of "big" and "small" values of $k$.

## 5 Auxiliary results

We will call two problems P′ = P[n; k′] and P″ = P[n; k″] dual problems if k′ + k″ = n − 1. Let P′, P″ be dual problems which satisfy neither C0-condition nor C1-condition. It is easy to see that 0-decompositions (1-decompositions) of P′ satisfy C1-condition (C0-condition) if and only if 1-decompositions (0-decompositions) of P″ satisfy C0-condition (C1-condition). Moreover, if decompositions of P′ and P″ satisfy neither C0-condition nor C1-condition then 0-decompositions of P′ are dual to 1-decompositions of P″ and 1-decompositions of P′ are dual to 0-decompositions of P″. Note also that dual problems P′ and P″ have the same complexity of solving by cBnB algorithm. Using these observations, the following fact can be proved.

## Proposition 3

If $k' + k'' = n - 1$ then $L_{P[n;k']}(l) = L_{P[n;k'']}(l)$ for any parallelization level $l$.

According to this proposition, without loss of generality we can restrict our consideration to the problems $P[n; k]$ with $k \le n/2$. Further, we give explicit formulas for computing $L^{(1)}_{P[n;k]}(l)$ and $L^{(2)}_{P[n;k]}(l)$ in the case when $k \le n/2$ and $l \le n - k$.

Let $k \le n/2$ and $l \le n - k$. To compute $L^{(1)}_{P[n;k]}(l)$, denote by $T_1$ the first stage tree of level $l$ for the problem $P[n; k]$. First consider the case $l \le k + 1$. Note that in this case all subproblems of $P[n; k]$ processed at the first stage of the frontal scheme satisfy neither C0-condition nor C1-condition, so all these subproblems are decomposed. Thus the tree $T_1$ is a full binary tree of depth $l$ containing $2^l$ candidate leaves and $2^l - 1$ internal nodes, i.e. in this case $L^{(1)}_{P[n;k]}(l) = 2^l - 1$. Now consider the case $k + 1 < l \le n - k$. Note that subproblems of $P[n; k]$ processed at the first stage of the frontal scheme cannot satisfy C1-condition because $l \le n - k$, so the tree $T_1$ can contain only C0-leaves. It is not difficult to see that the C0-leaves of $T_1$ are all subproblems $P$ of $P[n; k]$ such that $k < s_P < l$, $\theta_P(s_P) = 1$ and $\sum_{i=1}^{s_P - 1} \theta_P(i) = k$. The number of such subproblems is $\sum_{s=k}^{l-2} \binom{s}{k} = \binom{l-1}{k+1}$. Thus $T_1$ contains $\binom{l-1}{k+1}$ C0-leaves. Note also that all candidate leaves of $T_1$ are decompositions of those subproblems of $P[n; k]$ of level $l - 1$ which satisfy neither C0-condition nor C1-condition. Since subproblems of level $l - 1$ cannot satisfy C1-condition, these are the subproblems of level $l - 1$ which do not satisfy C0-condition, i.e. the subproblems $P$ such that $s_P = l - 1$ and $\sum_{i=1}^{s_P} \theta_P(i) \le k$. The number of such subproblems is $\sum_{t=0}^{k} \binom{l-1}{t}$, so $T_1$ contains $2\sum_{t=0}^{k} \binom{l-1}{t}$ candidate leaves. Thus the total number of leaves in $T_1$ is $\binom{l-1}{k+1} + 2\sum_{t=0}^{k} \binom{l-1}{t}$, so $T_1$ contains $\binom{l-1}{k+1} + 2\sum_{t=0}^{k} \binom{l-1}{t} - 1$ internal nodes. Therefore,

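$$L^{(1)}_{P[n;k]}(l) = \left[\binom{l-1}{k+1} + 2\sum_{t=0}^{k} \binom{l-1}{t} - 1\right] + \binom{l-1}{k+1} = 2\sum_{t=0}^{k+1} \binom{l-1}{t} - 1. \tag{4}$$

Thus

$$L^{(1)}_{P[n;k]}(l) = \begin{cases} 2^l - 1 & \text{if } l \le k + 1, \\ 2\sum_{t=0}^{k+1} \binom{l-1}{t} - 1 & \text{if } k + 1 < l \le n - k. \end{cases} \tag{5}$$

Since $2\sum_{t=0}^{k+1} \binom{l-1}{t} - 1 \le 2^l - 1$, relation (5) implies

$$2\sum_{t=0}^{k+1} \binom{l-1}{t} - 1 \le L^{(1)}_{P[n;k]}(l) \le 2^l - 1. \tag{6}$$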

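To compute $L^{(2)}_{P[n;k]}(l)$, note that for fixed $n$ the complexity $2\binom{n+1}{k+1} - 1$ of solving $P[n; k]$ by the cBnB algorithm achieves its maximum value $2\binom{n+1}{\lfloor (n+1)/2 \rfloor} - 1$ for $k = \frac{n-1}{2}$ if $n$ is odd and for $k = \frac{n}{2} - 1, \frac{n}{2}$ if $n$ is even. Moreover, for $k < \frac{n}{2}$ this complexity increases as $k$ increases. Using these observations for $k \le n/2$, we conclude that

$$L^{(2)}_{P[n;k]}(l) = \begin{cases} 2\binom{n-l+1}{\lfloor (n-l+1)/2 \rfloor} - 1 & \text{if } l \ge n - 2(k+1), \\ 2\binom{n-l+1}{k+1} - 1 & \text{if } l \le n - 2(k+1). \end{cases} \tag{7}$$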
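Relations (5) and (7) translate directly into code. The following sketch is an illustration only: the function names are assumptions made here, and the formulas are used only in the range in which they are stated, $k \le n/2$ and $1 \le l \le n - k$.

```python
from math import comb


def first_stage_complexity(n: int, k: int, l: int) -> int:
    """Relation (5); assumes k <= n/2 and 1 <= l <= n - k."""
    if l <= k + 1:
        return 2 ** l - 1
    return 2 * sum(comb(l - 1, t) for t in range(k + 2)) - 1


def second_stage_complexity(n: int, k: int, l: int) -> int:
    """Relation (7); assumes k <= n/2 and 1 <= l <= n - k."""
    m = n - l + 1
    if l >= n - 2 * (k + 1):
        return 2 * comb(m, m // 2) - 1
    return 2 * comb(m, k + 1) - 1


def frontal_scheme_complexity(n: int, k: int, l: int) -> int:
    """Sum of the first stage and second stage complexities."""
    return first_stage_complexity(n, k, l) + second_stage_complexity(n, k, l)


print(first_stage_complexity(6, 2, 2), second_stage_complexity(6, 2, 2))  # 3 19
```

At the boundary $l = n - 2(k+1)$ the two branches of (7) coincide, since then $n - l + 1 = 2k + 3$ and $\lfloor (2k+3)/2 \rfloor = k + 1$.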

Note also that for LP[n;k](1)(l) we have the following obvious fact.

## Proposition 4

$L^{(1)}_{P[n;k]}(l') \le L^{(1)}_{P[n;k]}(l'')$ for any $l'$, $l''$ such that $0 < l' \le l'' \le n$.

Further, we use the following known lower bound on $\binom{2k}{k}$ (see, e.g., [31]):

$$\binom{2k}{k} \ge \frac{2^{2k}}{2\sqrt{k}}. \tag{8}$$

The following binomial inequality is also used.

## Proposition 5

Let $k \ge 2$, $m \ge 3$, and $q \ge \sqrt[3]{2}\, m$. Then

$$\sum_{i=m+1}^{q} \binom{i}{k} \ge \binom{m}{k+1}.$$

## Proof

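Without loss of generality we assume that $q \le 2m$. Then $q^3 - q \ge 2(m^3 - m)$, which implies

$$\frac{(q+1)q(q-1) - (m+1)m(m-1)}{m(m-1)(m-2)} > 1. \tag{9}$$

Note that

$$\binom{m+1}{k} = \frac{(k+1)(m+1)}{(m-k+1)(m-k)} \binom{m}{k+1} \ge \frac{3(m+1)}{(m-1)(m-2)} \binom{m}{k+1} = \frac{3(m+1)m}{m(m-1)(m-2)} \binom{m}{k+1}. \tag{10}$$

Moreover, for each $i > m + 1$ we have

$$\binom{i}{k} = \frac{i}{i-k} \binom{i-1}{k} \ge \frac{i}{i-2} \binom{i-1}{k}. \tag{11}$$

From (10) and (11), by induction for $i = m+1, m+2, \dots$ we obtain

$$\binom{i}{k} \ge \frac{3i(i-1)}{m(m-1)(m-2)} \binom{m}{k+1}.$$

Thus

$$\sum_{i=m+1}^{q} \binom{i}{k} \ge \binom{m}{k+1} \sum_{i=m+1}^{q} \frac{3i(i-1)}{m(m-1)(m-2)} = \frac{3\binom{m}{k+1}}{m(m-1)(m-2)} \sum_{i=m+1}^{q} i(i-1). \tag{12}$$

It can be easily checked that

$$\sum_{i=m+1}^{q} i(i-1) = \frac{1}{3}\left[(q+1)q(q-1) - (m+1)m(m-1)\right],$$

so the proposition follows from (12) and (9).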
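As a sanity check on the statement, Proposition 5 can be tested numerically at the smallest admissible $q$ for each $m$ (larger $q$ only adds non-negative terms to the sum). The sketch below is an illustration; the helper names are assumptions made here, and the condition $q \ge \sqrt[3]{2}\,m$ is tested in the equivalent integer form $q^3 \ge 2m^3$ to avoid floating-point rounding.

```python
from math import comb


def smallest_q(m: int) -> int:
    """Smallest integer q with q**3 >= 2 * m**3, i.e. q >= 2**(1/3) * m."""
    q = m
    while q ** 3 < 2 * m ** 3:
        q += 1
    return q


def proposition5_holds(k: int, m: int, q: int) -> bool:
    """Check sum_{i=m+1}^{q} C(i, k) >= C(m, k+1) for one parameter triple."""
    return sum(comb(i, k) for i in range(m + 1, q + 1)) >= comb(m, k + 1)


violations = [
    (k, m)
    for m in range(3, 60)
    for k in range(2, m + 1)
    if not proposition5_holds(k, m, smallest_q(m))
]
print(violations)  # [] (no violations on this range)
```

The inequality is tightest for $k = 2$; for example, with $m = 100$ the smallest admissible value is $q = 126$, and the two sides are 166725 and 161700.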

## 6 Estimations of the frontal scheme complexity

We consider the following two cases of problems $P[n; k]$: the case $n/2 \ge k \ge \frac{n}{4}(1 + \varepsilon)$, where $\varepsilon > 0$, for big values of $k$, and the case $k \le \frac{3 - \sqrt{5}}{4} n$ for small values of $k$.

### 6.1 The case for big values of k

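Let $n/2 \ge k \ge \frac{n}{4}(1 + \varepsilon)$ where $\varepsilon > 0$. Note that for $l \le n/2$ in this case we have $\sum_{t=0}^{k+1} \binom{l-1}{t} > \frac{1}{2} 2^{l-1}$, so it follows from (6) that

$$2^{l-1} \le L^{(1)}_{P[n;k]}(l) < 2^l. \tag{13}$$

Therefore, using Proposition 4, for any $l \ge n/2$ we obtain

$$L_{P[n;k]}(l) \ge L^{(1)}_{P[n;k]}(l) \ge L^{(1)}_{P[n;k]}(\lfloor n/2 \rfloor) \ge 2^{\lfloor n/2 \rfloor - 1}. \tag{14}$$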

We consider separately the two following cases.

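1. Let $l \le n - 2(k+1)$. Then by relation (7) we have $L^{(2)}_{P[n;k]}(l) = 2\binom{n-l+1}{k+1} - 1$. Thus

$$L_{P[n;k]}(l) = L^{(1)}_{P[n;k]}(l) + 2\binom{n-l+1}{k+1} - 1.$$

For $0 < l \le n - 2(k+1)$ consider $\Delta(l) = L_{P[n;k]}(l-1) - L_{P[n;k]}(l)$. Note that we have

$$\Delta(l) = L^{(1)}_{P[n;k]}(l-1) - L^{(1)}_{P[n;k]}(l) + 2\left(\binom{n-l+2}{k+1} - \binom{n-l+1}{k+1}\right) = L^{(1)}_{P[n;k]}(l-1) - L^{(1)}_{P[n;k]}(l) + 2\binom{n-l+1}{k} \ge 2\binom{n-l+1}{k} - L^{(1)}_{P[n;k]}(l).$$

Hence, by relation (13),

$$\Delta(l) > 2\binom{n-l+1}{k} - 2^l.$$

Note that if $l$ increases then $2\binom{n-l+1}{k}$ decreases and $2^l$ increases. So the value $2\binom{n-l+1}{k} - 2^l$ decreases as $l$ increases. Thus the minimal value of $2\binom{n-l+1}{k} - 2^l$ is achieved for the maximal value $l = n - 2(k+1)$, so for any $l$

$$\Delta(l) > 2\binom{n - (n - 2(k+1)) + 1}{k} - 2^{n - 2(k+1)} = 2\binom{2k+3}{k} - 2^{n-2k-2} > 2\binom{2k}{k} - 2^{n-2k-2}.$$

Let $n$ be sufficiently large such that $2^{n\varepsilon} > \sqrt{n}/4$. Using inequality (8), we obtain

$$\Delta(l) > 2 \cdot \frac{2^{2k}}{2\sqrt{k}} - 2^{n-2k-2} = \frac{2^{2k}}{\sqrt{k}} - 2^{n-2k-2} > \frac{2^{2k}}{\sqrt{n}} - 2^{n-2k-2} = 2^{n-2k-2}\left(\frac{2^{4k-n+2}}{\sqrt{n}} - 1\right) \ge 2^{n-2k-2}\left(\frac{2^{4 \cdot \frac{n}{4}(1+\varepsilon) - n + 2}}{\sqrt{n}} - 1\right) = 2^{n-2k-2}\left(\frac{2^{n\varepsilon}}{\sqrt{n}/4} - 1\right).$$

Therefore, from $2^{n\varepsilon} > \sqrt{n}/4$ we derive $\Delta(l) > 0$, i.e. $L_{P[n;k]}(l)$ is a monotonically decreasing function for $0 \le l \le n - 2(k+1)$. Thus, in this case, for sufficiently large values of $n$, $L_{P[n;k]}(l)$ attains its unique minimum at $l = n - 2(k+1)$, i.e. any optimal parallelization level for $P[n; k]$ is not less than $n - 2(k+1)$.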

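2. Now let $l \ge n - 2(k+1)$. Then by relation (7), using the well-known asymptotics $\binom{n}{\lfloor n/2 \rfloor} \sim \frac{2^n}{\sqrt{\pi n / 2}}$, we have

$$L^{(2)}_{P[n;k]}(l) \sim 2\binom{n-l+1}{\lfloor (n-l+1)/2 \rfloor} \sim \frac{2^{n-l+2}}{\sqrt{\pi (n-l+1)/2}} \sim \frac{4}{\sqrt{\pi/2}} \cdot \frac{2^{n-l}}{\sqrt{n-l}}. \tag{15}$$

Therefore, for sufficiently large values of $n$,

$$3\,\frac{2^{n-l}}{\sqrt{n-l}} < L^{(2)}_{P[n;k]}(l) < 4\,\frac{2^{n-l}}{\sqrt{n-l}}. \tag{16}$$

Denote $l_0 = \left\lfloor \frac{n}{2} - \frac{1}{4}\log_2 n \right\rfloor$. Note that, for sufficiently large values of $n$ such that $\varepsilon n \ge \frac{1}{2}\log_2 n$, we have $n - 2(k+1) \le l_0 \le n/2$, so, using inequalities (13) and (16), we obtain

$$L_{P[n;k]}(l_0) < 2^{l_0} + 4\,\frac{2^{n-l_0}}{\sqrt{n-l_0}} \le 2^{\frac{n}{2} - \frac{1}{4}\log_2 n} + 4\,\frac{2^{\frac{n}{2} + \frac{1}{4}\log_2 n + 1}}{\sqrt{n/2}} < 13\,\frac{2^{n/2}}{\sqrt[4]{n}}.$$

Thus $L^*_{P[n;k]} < 13\,\frac{2^{n/2}}{\sqrt[4]{n}}$.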

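Consider the following three subcases.

a) Let $l \le l_0 - 3$. Then, by inequalities (16),

$$L_{P[n;k]}(l) \ge L^{(2)}_{P[n;k]}(l) > 3\,\frac{2^{n-l}}{\sqrt{n-l}} > 3\,\frac{2^{\frac{n}{2} + \frac{1}{4}\log_2 n + 3}}{\sqrt{n}} = 24\,\frac{2^{n/2}}{\sqrt[4]{n}} > L^*_{P[n;k]}.$$

b) Let $n/2 \ge l \ge l_0 + 6$. Then, by inequalities (13),

$$L_{P[n;k]}(l) \ge L^{(1)}_{P[n;k]}(l) \ge 2^{l-1} \ge 2^{l_0 + 5} > 2^{\frac{n}{2} - \frac{1}{4}\log_2 n + 4} = 16\,\frac{2^{n/2}}{\sqrt[4]{n}} > L^*_{P[n;k]}.$$

c) Let $l \ge n/2$. Then, by inequality (14), $L_{P[n;k]}(l) \ge 2^{\lfloor n/2 \rfloor - 1}$, so, for sufficiently large values of $n$, we have $L_{P[n;k]}(l) \ge 13\,\frac{2^{n/2}}{\sqrt[4]{n}} > L^*_{P[n;k]}$.

Thus, for sufficiently large values of $n$, any $l$ satisfying subcases a)–c) cannot be optimal for $P[n; k]$, so any optimal parallelization level $l^*$ for $P[n; k]$ satisfies the inequalities $l_0 - 2 \le l^* \le l_0 + 5$, i.e. $l^* = \frac{n}{2} - \frac{1}{4}\log_2 n + O(1)$. Therefore, using relations (13) and (15), we can easily obtain

$$L^*_{P[n;k]} = L_{P[n;k]}(l^*) = \Theta\left(\frac{2^{n/2}}{\sqrt[4]{n}}\right).$$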
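For a concrete feel of these estimates, the quantities $l_0$, $L_{P[n;k]}(l_0)$ and the bound $13 \cdot 2^{n/2}/\sqrt[4]{n}$ can be evaluated numerically. The sketch below is an illustration under stated assumptions: the function names are chosen here, the complexity is computed from relations (5) and (7) and is therefore only meaningful for $k \le n/2$ and $1 \le l \le n - k$, and a single small value of $n$ says nothing about the asymptotic statements themselves.

```python
from math import comb, floor, log2


def frontal_complexity(n: int, k: int, l: int) -> int:
    """Sum of relations (5) and (7); assumes k <= n/2 and 1 <= l <= n - k."""
    # comb(l - 1, t) is 0 for t > l - 1, so this also covers the case l <= k + 1
    first = 2 * sum(comb(l - 1, t) for t in range(k + 2)) - 1
    m = n - l + 1
    second = 2 * comb(m, m // 2 if l >= n - 2 * (k + 1) else k + 1) - 1
    return first + second


def reference_level(n: int) -> int:
    """l_0 = floor(n/2 - (1/4) * log2(n))."""
    return floor(n / 2 - log2(n) / 4)


def upper_bound(n: int) -> float:
    """The bound 13 * 2^(n/2) / n^(1/4) on the optimal complexity."""
    return 13 * 2 ** (n / 2) / n ** 0.25


n, k = 20, 10
l0 = reference_level(n)
print(l0, frontal_complexity(n, k, l0), upper_bound(n))
print(min(range(1, n - k + 1), key=lambda l: frontal_complexity(n, k, l)))
```

For $n = 20$, $k = 10$ this gives $l_0 = 8$ and $L_{P[n;k]}(l_0) = 3686$, below the bound of roughly 6295; among the levels $l \le n - k$ the smallest complexity, 1946, is reached at $l = 10 = l_0 + 2$.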

### 6.2 The case for small values of k

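Let $0 < k \le \frac{3 - \sqrt{5}}{4} n$. Note that for $l \le n - 2(k+1)$, relation (7) gives $L^{(2)}_{P[n;k]}(l) = 2\binom{n-l+1}{k+1} - 1$. Further, we assume that $n \ge 5(\sqrt{5} + 2)$, i.e. $n \ge 22$ (the case $n < 22$ can be checked directly). Then the inequality $k \le \frac{3 - \sqrt{5}}{4} n$ implies that $n \ge 4k + 5$. We consider the following four cases.

1. Let $l \le k + 2$. Then from relation (5) we can derive $L^{(1)}_{P[n;k]}(l) = 2^l - 1$. Thus

$$L_{P[n;k]}(l) = 2^l + 2\binom{n-l+1}{k+1} - 2.$$

Therefore, if we denote again $\Delta(l) = L_{P[n;k]}(l-1) - L_{P[n;k]}(l)$,

$$\Delta(l) = 2\left(\binom{n-l+2}{k+1} - \binom{n-l+1}{k+1}\right) - 2^{l-1} = 2\binom{n-l+1}{k} - 2^{l-1} \ge 2\binom{n-l+1}{k} - 2^{k+1}.$$

Note that

$$\binom{n-l+1}{k} = \frac{(n-l+1)!}{k!\,(n-l+1-k)!} = \prod_{i=1}^{k} \frac{n-l+2-i}{k+1-i} \ge \left(\frac{n-l+1}{k}\right)^k \ge \left(\frac{n-k-1}{k}\right)^k.$$

Since $k \le \frac{3 - \sqrt{5}}{4} n < n/4$, we have $\frac{n-k-1}{k} \ge \frac{n-2k}{k} > 2$, so $\binom{n-l+1}{k} > 2^k$. Therefore, $\Delta(l) > 0$, i.e. $L_{P[n;k]}(l)$ is a monotonically decreasing function for $0 \le l \le k + 2$. Thus, in this case $L_{P[n;k]}(l)$ attains its unique minimum at $l = k + 2$, i.e. any optimal parallelization level for $P[n; k]$ is not less than $k + 2$.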

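2. Let $k + 2 \le l \le n - 2(k+1)$. Then by relation (5) we have $L^{(1)}_{P[n;k]}(l) = 2\sum_{i=0}^{k+1} \binom{l-1}{i} - 1$. Thus

$$L_{P[n;k]}(l) = 2\sum_{i=0}^{k+1} \binom{l-1}{i} + 2\binom{n-l+1}{k+1} - 2. \tag{17}$$

For convenience we consider the case $k = 1$ separately. In this case

$$L_{P[n;k]}(l) = 2l^2 - 2(n+1)l + n^2 + n,$$

so $L_{P[n;k]}(l)$ has the minimum value $L_{P[n;k]}\left(\frac{n+1}{2}\right)$ if $n$ is odd and the minimum values $L_{P[n;k]}\left(\frac{n}{2}\right)$ and $L_{P[n;k]}\left(\frac{n}{2} + 1\right)$ if $n$ is even, i.e. for any $n$ the minimum value of $L_{P[n;k]}(l)$ is achieved at $l = \lceil n/2 \rceil$. Now let $k > 1$. Denote $\delta(l) = L_{P[n;k]}(l+1) - L_{P[n;k]}(l)$. Then for $k + 2 \le l < n - 2(k+1)$

$$\delta(l) = \left(2\sum_{i=0}^{k+1} \binom{l}{i} + 2\binom{n-l}{k+1} - 2\right) - \left(2\sum_{i=0}^{k+1} \binom{l-1}{i} + 2\binom{n-l+1}{k+1} - 2\right) = 2\left(\sum_{i=0}^{k+1} \left(\binom{l}{i} - \binom{l-1}{i}\right) - \left(\binom{n-l+1}{k+1} - \binom{n-l}{k+1}\right)\right) = 2\left(\sum_{i=1}^{k+1} \binom{l-1}{i-1} - \binom{n-l}{k}\right) = 2\left(\sum_{i=0}^{k} \binom{l-1}{i} - \binom{n-l}{k}\right).$$

Note that if $l$ increases then the sum $\sum_{i=0}^{k} \binom{l-1}{i}$ increases and $\binom{n-l}{k}$ decreases, so $\delta(l)$ increases as $l$ increases, i.e. for any $k + 2 \le l' < l'' < n - 2(k+1)$

$$\delta(l') < \delta(l''). \tag{18}$$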

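Further, we consider separately the cases of odd and even values of $n$.

a) Let $n$ be odd, i.e. $n = 2n' + 1$. Then we have

$$\delta(n'+1) = 2\left(\sum_{i=0}^{k} \binom{n'}{i} - \binom{n'}{k}\right) = 2\sum_{i=0}^{k-1} \binom{n'}{i} > 0.$$

Now we prove that $\delta(n') < 0$. Note that

$$\delta(n') = 2\left(\sum_{i=0}^{k} \binom{n'-1}{i} - \binom{n'+1}{k}\right)$$

where

$$\binom{n'+1}{k} = \binom{n'}{k} + \binom{n'}{k-1} = \binom{n'-1}{k} + 2\binom{n'-1}{k-1} + \binom{n'-1}{k-2}.$$

Thus

$$\frac{1}{2}\delta(n') = \sum_{i=0}^{k} \binom{n'-1}{i} - \binom{n'+1}{k} = \sum_{i=0}^{k-3} \binom{n'-1}{i} - \binom{n'-1}{k-1} = \sum_{i=0}^{k'-2} \binom{n'-1}{i} - \binom{n'-1}{k'},$$

where $k' = k - 1$. So the inequality $\delta(n') < 0$ is obvious for $k \le 3$. Let $k > 3$. Note that

$$\frac{\binom{n'-1}{i-1}}{\binom{n'-1}{i}} = \frac{i}{n'-i}.$$

So for $0 < i \le k' - 2$ we have

$$\binom{n'-1}{i-1} \le \frac{k'-2}{n'-k'+2} \binom{n'-1}{i}.$$

Therefore

$$\sum_{i=0}^{k'-2} \binom{n'-1}{i} < \binom{n'-1}{k'-2}\left(1 + \frac{k'-2}{n'-k'+2} + \left(\frac{k'-2}{n'-k'+2}\right)^2 + \dots\right) = \binom{n'-1}{k'-2} \frac{1}{1 - \frac{k'-2}{n'-k'+2}} = \binom{n'-1}{k'-2} \frac{n'-k'+2}{n'-2k'+4} < \binom{n'-1}{k'-2} \frac{n'-k'+2}{n'-2k'}.$$

On the other hand,

$$\binom{n'-1}{k'} = \frac{(n'-k'+1)(n'-k')}{k'(k'-1)} \binom{n'-1}{k'-2}.$$

Moreover,

$$\frac{n'-k'+1}{k'-1} = 1 + \frac{n'-2k'+2}{k'-1} > 1 + \frac{n'-2k'+2}{k'} = \frac{n'-k'+2}{k'},$$

so

$$\binom{n'-1}{k'} > \frac{(n'-k'+2)(n'-k')}{k'^2} \binom{n'-1}{k'-2}.$$

Thus

$$\sum_{i=0}^{k'-2} \binom{n'-1}{i} - \binom{n'-1}{k'} < \binom{n'-1}{k'-2} \frac{n'-k'+2}{n'-2k'} - \binom{n'-1}{k'-2} \frac{(n'-k'+2)(n'-k')}{k'^2} = \binom{n'-1}{k'-2} (n'-k'+2) \left(\frac{1}{n'-2k'} - \frac{n'-k'}{k'^2}\right).$$

It is easy to check that for $k' \le \frac{3 - \sqrt{5}}{2} n'$ the inequality $\frac{1}{n'-2k'} - \frac{n'-k'}{k'^2} \le 0$ is valid, hence

$$\sum_{i=0}^{k'-2} \binom{n'-1}{i} - \binom{n'-1}{k'} < 0.$$

Thus $\delta(n') < 0$ for $k' \le \frac{3 - \sqrt{5}}{2} n'$ (which follows from $k \le \frac{3 - \sqrt{5}}{4} n$). Taking into account inequalities (18), from $\delta(n'+1) > 0$ and $\delta(n') < 0$ we conclude that the minimum value of $L_{P[n;k]}(l)$ for $k + 2 \le l \le n - 2(k+1)$ is achieved at $l = n' + 1$.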

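b) Now let $n$ be even, i.e. $n = 2n'$. For $l = n'$ we have

$$\delta(n') = 2\left(\sum_{i=0}^{k} \binom{n'-1}{i} - \binom{n'}{k}\right) = 2\left(\sum_{i=0}^{k} \binom{n'-1}{i} - \binom{n'-1}{k} - \binom{n'-1}{k-1}\right) = 2\sum_{i=0}^{k-2} \binom{n'-1}{i} > 0.$$

For $l = n' - 1$ we have

$$\delta(n'-1) = 2\left(\sum_{i=0}^{k} \binom{n'-2}{i} - \binom{n'+1}{k}\right) < 2\left(\sum_{i=0}^{k} \binom{n'-2}{i} - \binom{n'}{k}\right).$$

Thus, in the same way as in the case of odd $n$, we can prove that for $k' \le \frac{3 - \sqrt{5}}{2}(n'-1)$ (which follows from $k \le \frac{3 - \sqrt{5}}{4} n$)

$$\sum_{i=0}^{k} \binom{n'-2}{i} - \binom{n'}{k} < 0,$$

so $\delta(n'-1) < 0$. From $\delta(n') > 0$, $\delta(n'-1) < 0$ and inequalities (18) we conclude that the minimum value of $L_{P[n;k]}(l)$ is achieved at $l = n'$.

Summing up the cases a) and b), we obtain that for $k + 2 \le l \le n - 2(k+1)$ the minimum value of $L_{P[n;k]}(l)$ is achieved at $l = \lceil n/2 \rceil$.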

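3. Let $n - 2(k+1) < l \le n - k$. Note that for $n - 2(k+1) \le l \le n - k$

$$L_{P[n;k]}(l) = 2\sum_{i=0}^{k+1} \binom{l-1}{i} + 2\binom{n-l+1}{\lfloor (n-l+1)/2 \rfloor} - 2.$$

So for $n - 2(k+1) \le l < n - k$

$$\delta(l) = 2\left[\left(\sum_{i=0}^{k+1} \binom{l}{i} - \sum_{i=0}^{k+1} \binom{l-1}{i}\right) - \tau(l)\right] = 2\left[\sum_{i=0}^{k} \binom{l-1}{i} - \tau(l)\right]$$

where $\tau(l) = \binom{n-l+1}{\lfloor (n-l+1)/2 \rfloor} - \binom{n-l}{\lfloor (n-l)/2 \rfloor}$. Using Pascal's rule, it is easy to see that

$$\tau(l) = \begin{cases} \binom{n-l}{(n-l)/2 - 1} & \text{if } n - l \text{ is even}, \\ \binom{n-l}{\lfloor (n-l)/2 \rfloor + 1} & \text{if } n - l \text{ is odd}. \end{cases} \tag{19}$$

It can be easily checked from (19) that $\tau(l)$ decreases monotonically as $l$ increases, i.e. $\tau(l'') < \tau(l')$ for $n - 2(k+1) \le l' < l'' < n - k$. Thus, since $\sum_{i=0}^{k} \binom{l-1}{i}$ increases monotonically as $l$ increases, we obtain that $\delta(l)$ also increases as $l$ increases, i.e. $\delta(l) \ge \delta(n - 2(k+1))$ for $n - 2(k+1) \le l < n - k$. Moreover, taking into account that $\tau(n - 2k - 2) = \binom{2k+2}{k}$ by (19) and $n \ge 4k + 5$, we have

$$\delta(n - 2(k+1)) = 2\left[\sum_{i=0}^{k} \binom{n-2k-3}{i} - \binom{2k+2}{k}\right] > 2\left[\binom{n-2k-3}{k} - \binom{2k+2}{k}\right] \ge 0,$$

i.e. $\delta(n - 2(k+1)) > 0$. Thus, $\delta(l) > 0$ for $n - 2(k+1) \le l < n - k$, so in this case $L_{P[n;k]}(l) > L_{P[n;k]}(n - 2(k+1))$, i.e. $l$ cannot be an optimal parallelization level for $P[n; k]$.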

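4. Let $l > n - k$. Then, using Proposition 4, we have

$$L_{P[n;k]}(l) > L^{(1)}_{P[n;k]}(l) \ge L^{(1)}_{P[n;k]}(n-k) = 2\sum_{i=0}^{k+1} \binom{n-k-1}{i} - 1. \tag{20}$$

On the other hand, taking into account $k + 2 \le \lceil n/2 \rceil \le n - 2(k+1)$, from (17) we obtain that

$$L_{P[n;k]}(\lceil n/2 \rceil) = 2\sum_{i=0}^{k+1} \binom{\lceil n/2 \rceil - 1}{i} + 2\binom{n - \lceil n/2 \rceil + 1}{k+1} - 2 \le 2\sum_{i=0}^{k+1} \binom{\lfloor n/2 \rfloor}{i} + 2\binom{\lfloor n/2 \rfloor + 1}{k+1} - 2. \tag{21}$$

Further, we prove the inequality

$$\sum_{i=0}^{k+1} \binom{\lfloor n/2 \rfloor}{i} + \binom{\lfloor n/2 \rfloor + 1}{k+1} < \sum_{i=0}^{k+1} \binom{n-k-1}{i}. \tag{22}$$

For $k = 1$ this inequality is checked directly. Let $k \ge 2$. Denote $n' = \lfloor n/2 \rfloor \ge 11$. Note that $n \ge 4k + 5 > 3k + 6$, so $n - k - 2 > 2n/3 \ge 4n'/3 > \sqrt[3]{2}\, n'$. Thus, using Proposition 5, we have

$$\sum_{i=n'+1}^{n-k-2} \binom{i}{k} \ge \binom{n'}{k+1}.$$

Then, taking into account the equality

$$\binom{n-k-1}{k+1} = \binom{n'+1}{k+1} + \sum_{i=n'+1}^{n-k-2} \binom{i}{k}$$

obtained by repeated application of Pascal's rule, we derive

$$\binom{n-k-1}{k+1} \ge \binom{n'+1}{k+1} + \binom{n'}{k+1}. \tag{23}$$

From $n - k - 1 > n'$ we also have

$$\sum_{i=0}^{k} \binom{n-k-1}{i} > \sum_{i=0}^{k} \binom{n'}{i}.$$

Adding this inequality to (23), we obtain inequality (22) for $k \ge 2$. It follows from inequalities (20), (21) and (22) that $L_{P[n;k]}(l) > L_{P[n;k]}(\lceil n/2 \rceil)$, i.e. in this case $l$ cannot be an optimal parallelization level for $P[n; k]$ either.

Summarizing all the considered cases for small values of $k$, we conclude that for big enough $n$ the parallelization level $\lceil n/2 \rceil$ is optimal for $P[n; k]$, and, moreover, $\lceil n/2 \rceil$ is the only optimal parallelization level for $P[n; k]$ if $k \ge 2$ or $n$ is odd (if $k = 1$ and $n$ is even, the parallelization level $\lceil n/2 \rceil + 1$ is also optimal for $P[n; k]$).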
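The conclusion for small $k$ can be checked numerically on individual instances. The sketch below is an illustration, not part of the proof: the function names are assumptions made here, and the search is restricted to levels $1 \le l \le n - k$, where relations (5) and (7) apply (levels $l > n - k$ are excluded by case 4 above).

```python
from math import comb
from typing import List, Tuple


def frontal_complexity(n: int, k: int, l: int) -> int:
    """Sum of relations (5) and (7); assumes k <= n/2 and 1 <= l <= n - k."""
    # comb(l - 1, t) is 0 for t > l - 1, so this also covers the case l <= k + 1
    first = 2 * sum(comb(l - 1, t) for t in range(k + 2)) - 1
    m = n - l + 1
    second = 2 * comb(m, m // 2 if l >= n - 2 * (k + 1) else k + 1) - 1
    return first + second


def best_levels(n: int, k: int) -> Tuple[List[int], int]:
    """All minimizing levels in 1..n-k together with the minimal complexity."""
    costs = {l: frontal_complexity(n, k, l) for l in range(1, n - k + 1)}
    best = min(costs.values())
    return [l for l, c in costs.items() if c == best], best


print(best_levels(30, 3))  # ([15], 6580)
print(best_levels(31, 3))  # the single level 16 = ceil(31/2)
print(best_levels(22, 1))  # ([11, 12], 242): k = 1 and n even gives two levels
```

For $n = 30$, $k = 3$ the minimum 6580 is attained only at $l = 15 = \lceil n/2 \rceil$, and for $n = 22$, $k = 1$ both $l = 11$ and $l = 12$ attain the minimum 242, in agreement with the exceptional case noted above.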

## 7 Conclusion

In conclusion we formulate the obtained results.

## Theorem 1

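Let $n/2 \ge k \ge \frac{n}{4}(1 + \varepsilon)$ for some fixed $\varepsilon > 0$, and let $l^*$ be an optimal parallelization level for $P[n; k]$. Then $l^* = \frac{n}{2} - \frac{1}{4}\log_2 n + O(1)$ and $L^*_{P[n;k]} = \Theta\left(2^{n/2} / \sqrt[4]{n}\right)$.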

## Theorem 2

Let $0 < k \le \frac{3 - \sqrt{5}}{4} n$. Then the value $l^* = \lceil n/2 \rceil$ is an optimal parallelization level for $P[n; k]$, and $L^*_{P[n;k]} = \Theta\left(\binom{\lfloor n/2 \rfloor + 1}{k+1}\right)$.

It follows from the obtained results that in the cases of big and small values of $k$ the optimal parallelization level $l^*$ for $P[n; k]$ and the value $L^*_{P[n;k]}$ have distinct asymptotic behaviors. We conjecture that in the case $\frac{3 - \sqrt{5}}{4} n < k \le \frac{n}{4}(1 - \varepsilon)$ for some fixed $\varepsilon > 0$ the relation $l^* = n/2 + O(1)$ holds, i.e. the value $k = n/4$ is the boundary between the considered cases of asymptotic behavior of the values $l^*$ and $L^*_{P[n;k]}$.

Note also that our results are obtained under the assumption that a potentially unrestricted number of processors can be used, while in a practical situation the number of processors is limited. When the number of available processors is not sufficient for the optimal parallelization, our results suggest that the most efficient way to solve the problem is to use as many processors as possible.

## Acknowledgement

This work is partially supported by Russian Foundation for Fundamental Research (Grant 18-07-00566).

## References

[1] Kellerer H., Pfershy U., Pisinger D., Knapsack Problems, Springer Verlag, 200410.1007/978-3-540-24777-7Search in Google Scholar

[2] Posypkin M., Sin S. T. T., Comparative analysis of the efficiency of various dynamic programming algorithms for the knapsack problem, Proceedings of 2016 IEEE NW Russia Young Researchers in Electrical and Electronic Engineering Conference (EIConRusNW), 2016, 313–31610.1109/EIConRusNW.2016.7448182Search in Google Scholar

[3] Posypkin M., Sigal I., Speedup estimates for some variants of the parallel implementations of the branch-and-bound method, Computational Mathematics and Mathematical Physics, 2006, 46, N 12, 2187–220210.1134/S0965542506120165Search in Google Scholar

[4] Afanasiev A., Bychkov I., Zaikin O., Manzyuk M., Posypkin M., Semenov A., Concept of a multitask grid system with a flexible allocation of idle computational resources of supercomputers, Journal of Computer and Systems Sciences International, 2017, 56, N 4, 701–70710.1134/S1064230717040025Search in Google Scholar

[5] Bellman R.E., Dynamic Programming, Princeton University Press, Princeton, 1957Search in Google Scholar

[6] Horowitz E., Sahni S., Computing partitions with applications to the knapsack problem, Journal of ACM, 1974, 21, N 2, 277–29210.1145/321812.321823Search in Google Scholar

[7] Kindervater G.A.P., Lenstra J.K., An introduction to parallelism in combinatorial optimization, Discrete Applied Mathematics, 1986, 14, 135–156, DOI: 10.1016/0166-218X(86)90057-0

[8] Lee J., Shragowitz E., Sahni S., A hypercube algorithm for the 0/1 knapsack problem, Journal of Parallel and Distributed Computing, 1988, 5, 438–456, DOI: 10.1016/0743-7315(88)90007-X

[9] Lin J., Storer J., Processor efficient hypercube algorithm for the knapsack problem, Journal of Parallel and Distributed Computing, 1991, 3, 332–337, DOI: 10.1016/0743-7315(91)90080-S

[10] Sanches C.A.A., Soma N.Y., Yanasse H.H., Parallel time and space upper-bounds for the subset-sum problem, Theoretical Computer Science, 2008, 407, 342–348, DOI: 10.1016/j.tcs.2008.06.051

[11] Curtis V.V., Sanches C.A.A., An efficient solution to the subset sum problem on GPU, Concurrency and Computation: Practice and Experience, 2016, 28, 95–113, DOI: 10.1002/cpe.3636

[12] Sanches C.A.A., Soma N.Y., Yanasse H.H., An optimal and scalable parallelization of the two-list algorithm for the subset-sum problem, European Journal of Operational Research, 2007, 176, 870–879, DOI: 10.1016/j.ejor.2005.09.026

[13] Bokhari S.S., Parallel solution of the subset-sum problem: an empirical study, Concurrency and Computation: Practice and Experience, 2012, 24, 2241–2254, DOI: 10.1002/cpe.2800

[14] Curtis V.V., Sanches C.A.A., A low-space algorithm for the subset-sum problem on GPU, Computers & Operations Research, 2017, 83, 120–124, DOI: 10.1016/j.cor.2017.02.006

[15] Kang L., Wan L., Li K., Efficient Parallelization of a Two-List Algorithm for the Subset-Sum Problem on a Hybrid CPU/GPU Cluster, Proceedings of International Symposium on Parallel Architectures, Algorithms and Programming, 2014, 93–98, DOI: 10.1109/PAAP.2014.44

[16] Wan L., Li K., Liu J., Li K., GPU implementation of a parallel two-list algorithm for the subset-sum problem, Concurrency and Computation: Practice and Experience, 2015, 27, 119–145, DOI: 10.1002/cpe.3201

[17] Wan L., Li K., Li K., A novel cooperative accelerated parallel two-list algorithm for solving the subset-sum problem on a hybrid CPU/GPU cluster, Journal of Parallel and Distributed Computing, 2016, 97, 112–123, DOI: 10.1016/j.jpdc.2016.07.003

[18] Ristovski Z., Mishkovski I., Gramatikov S., Filiposka S., Parallel implementation of the modified subset sum problem in CUDA, Proceedings of 22nd Telecommunications Forum Telfor (TELFOR), Belgrade, 2014, 923–926, DOI: 10.1109/TELFOR.2014.7034556

[19] Curtis V.V., Sanches C.A.A., An improved balanced algorithm for the subset-sum problem, European Journal of Operational Research, 2019, 275, 460–466, DOI: 10.1016/j.ejor.2018.11.055

[20] Barkalov K., Gergel V., Parallel global optimization on GPU, Journal of Global Optimization, 2016, 66, N 1, 3–20, DOI: 10.1007/s10898-016-0411-y

[21] Pietracaprina A., Pucci G., Silvestri F., Vandin F., Space-efficient parallel algorithms for combinatorial search problems, Journal of Parallel and Distributed Computing, 2015, 76, 58–65, DOI: 10.1007/978-3-642-40313-2_63

[22] Casado L. G., Martinez J. A., García I., Hendrix E. M. T., Branch-and-bound interval global optimization on shared memory multiprocessors, Optimization Methods & Software, 2008, 23, N 5, 689–701, DOI: 10.1080/10556780802086300

[23] Vu T.-T., Derbel B., Parallel Branch-and-Bound in multi-core multi-CPU multi-GPU heterogeneous environments, Future Generation Computer Systems, 2016, 56, 95–109, DOI: 10.1016/j.future.2015.10.009

[24] Baldwin A., Asaithambi A., An efficient method for parallel interval global optimization, Proceedings of 2011 International Conference on High Performance Computing and Simulation (HPCS), 2011, 317–321, DOI: 10.1109/HPCSim.2011.5999840

[25] Kolpakov R., Posypkin M., The lower bound on complexity of parallel branch-and-bound algorithm for subset sum problem, AIP Conference Proceedings, 2016, 1776, N 1, AIP Publishing, DOI: 10.1063/1.4965329

[26] Finkel'shtein Yu., Priblizhennye metody i prikladnye zadachi diskretnogo programmirovaniya, Nauka, Moscow, 1976 (in Russian)

[27] Grishukhin V., Efficiency of the branch and bound method in problems with boolean variables, Issledovaniya po diskretnoj optimizatsii, Nauka, Moscow, 1976, 203–230 (in Russian)

[28] Kolpakov R., Posypkin M., Upper and lower bounds for the complexity of the branch and bound method for the knapsack problem, Discrete Mathematics and Applications, 2010, 20, N 1, 113–125, DOI: 10.1515/dma.2010.006

[29] Kolpakov R., Posypkin M., Sigal I., On a lower bound on the computational complexity of a parallel implementation of the branch-and-bound method, Automation and Remote Control, 2010, 71, N 10, 2152–2161, DOI: 10.1134/S0005117910100140

[30] Kolpakov R., Posypkin M., Estimating the computational complexity of one variant of parallel realization of the branch and bound method for the knapsack problem, J. of Computer and Systems Sciences International, 2011, 50, N 5, 756–765, DOI: 10.1134/S106423071105011X

[31] Koshy T., Catalan numbers with applications, Oxford University Press, USA, 2009, DOI: 10.1093/acprof:oso/9780195334548.001.0001