
# Open Physics

### formerly Central European Journal of Physics

Editor-in-Chief: Seidel, Sally

Managing Editor: Lesna-Szreter, Paulina


Open Access | Online ISSN 2391-5471
Volume 15, Issue 1

# Unstructured P2P Network Load Balance Strategy Based on Multilevel Partitioning of Hypergraph

Lv Feng
/ Gao Chunlin
/ Ma Kaiyang
Published Online: 2017-05-04 | DOI: https://doi.org/10.1515/phys-2017-0024

## Abstract

With the rapid development of computer performance and distributed technology, P2P-based resource sharing plays an important role on the Internet. As the number of P2P network users continues to increase, the highly dynamic nature of the system makes it difficult for a node to obtain the load of other nodes. Therefore, a dynamic load balance strategy based on hypergraphs is proposed in this article. The scheme builds on multilevel partitioning from hypergraph theory. It adopts an optimized multilevel partitioning algorithm to partition the P2P network into several small areas, and assigns each area a supernode that manages the nodes in the area and transfers load among them. Where global scheduling is difficult to achieve, load balance within a number of small areas can be ensured first. Through node load balance in each small area, the whole network can achieve relative load balance. The experiments indicate that the load distribution of network nodes under our scheme is clearly more compact. The scheme effectively relieves load imbalance in P2P networks and also improves the scalability and bandwidth utilization of the system.

PACS: 42.30.Va

## 1 Introduction

The P2P network is a new generation of technology that is changing the Internet. It has a profound impact on the way Internet users publish, share and obtain information. However, with the rapid increase in the number of users, a disadvantage of the P2P network structure has gradually been exposed: a few nodes carry too much traffic, while other nodes communicate little, or not at all, with the rest of the network [1, 2]. In particular, when large or unexpected events occur, some resources become hot resources, and the load on the nodes that own them rises suddenly. The response time, data processing ability and throughput of these nodes then degrade significantly, resulting in network congestion [3–9]. Load balance technology has therefore gradually been adopted in peer-to-peer networks. Because the nodes of a P2P network are highly dynamic and local, they can hardly learn the load status of other nodes, so load balance remains a prominent and urgent problem.

This article discusses the load imbalance caused by hot resources in unstructured P2P systems, in terms of resources such as storage space and bandwidth. We propose a dynamic load balance strategy. The network is described as a collection of many small regions, and the main problem becomes ensuring load balance within these small regions. The nodes in the network are taken as the vertices of a hypergraph, and the physical location relationship among the nodes is taken as the hypersides of the hypergraph. On this basis, the hypergraph is partitioned into small areas. The load statuses of the partitioned regions are roughly equivalent, and the nodes lying in the same region have a certain relevance. The hypergraph partitioning algorithm is adopted to replicate the hot resources, to share the visits among nodes, and to achieve a relative global load balance.

The remainder of this paper is organized as follows. Section 2 describes the hypergraph partitioning algorithm in detail. Section 3 introduces the improved multi-level partitioning of the hypergraph, which is used to solve the load imbalance problem in P2P networks. Section 4 provides several tests to verify the feasibility and effectiveness of our scheme. The conclusions are presented in Section 5.

## 2.1 Hypergraph Definition

Over the past few decades, graph theory has proved to be a very important tool for combinatorial problems in fields such as geometry, number theory, operational research and optimization. To solve more combinatorial problems, it is natural to extend the concept of a graph. In 1970, the definition of the hypergraph was proposed by C. Berge [13]. A hypergraph is a generalization of the graph. In a hypergraph, one hyperside (hyperedge) connects several points, and each hyperside is a nonempty subset of the point set of the hypergraph. Hypergraphs are often used to describe the structure of sparse irregular problems [14]. They not only inherit all the functions of general graphs, but can also handle problems that are hard to express with general graphs. In association and clustering analysis in data mining, for example, hypergraphs give a more concise expression and stronger results.

A hypergraph is represented as $H = (V, E)$, where $V = \{V_1, V_2, \dots, V_n\}$ is the set of vertices and $E = \{e_1, e_2, \dots, e_j\}$ is the set of hypersides. Unlike the edges of a general graph, each hyperside of a hypergraph is a nonempty subset of the vertex set, that is, a hyperside can connect 3 or more vertices.

Some related concepts in hypergraph partitioning are:

Figure 1

A hypergraph with 7 vertices and 4 edges

Size of hyperside: The size of a hyperside is the number of elements in the set, that is, the number of vertices it contains.

Weight of vertex: Each node $V_i \in V$ has a weight $W_i$, which is generally an integer.

Sum of external degrees (SOED): If a hyperside $e_i$ crosses $K$ blocks, its cost is $C_i = K$. SOED is defined as the sum of the costs of the hypersides that cross multiple blocks.

k-way partitioning: For a hypergraph $H$, the set $P = \{P_1, P_2, \dots, P_k\}$ is a k-way partitioning of the hypergraph, where each $P_i$ is a subset of the vertex set $V$. A k-way partitioning is commonly obtained by recursive bisection: the hypergraph is first split into two parts, and each part is then split again, giving four parts, and so on. Assuming that $k$ is a power of two, the final k-way partitioning is obtained in $\log_2 k$ such steps (that is, after performing $k - 1$ bisections). When $k$ is not a power of two, the approach must be modified so that each bisection produces partitions of appropriate size.

Equilibrium K partition: For a hypergraph $H$, the partition set $P$ gives each block an equal number of vertices and minimizes the value of SOED.
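The definitions above can be illustrated with a small sketch. It assumes unit hyperside costs, stores each hyperside as a Python set of vertex labels, and uses a hypothetical 7-vertex, 4-hyperside example that is not the hypergraph of Figure 1. The names `soed`, `block_sizes`, `example_edges` and `example_part` are introduced here for illustration only.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set


def soed(hyperedges: Iterable[Set[Hashable]], part_of: Dict[Hashable, int]) -> int:
    """Sum of external degrees: a hyperside spanning K > 1 blocks costs K."""
    total = 0
    for edge in hyperedges:
        blocks = {part_of[v] for v in edge}  # blocks touched by this hyperside
        if len(blocks) > 1:                  # only cut hypersides contribute
            total += len(blocks)
    return total


def block_sizes(part_of: Dict[Hashable, int]) -> Dict[int, int]:
    """Number of vertices assigned to each block of the partition."""
    return dict(Counter(part_of.values()))


# Hypothetical hypergraph: 7 vertices, 4 hypersides (assumed data).
example_edges = [{1, 2, 3}, {2, 4}, {3, 5, 6}, {4, 6, 7}]
# A 2-way partition: vertices 1-3 in block 0, vertices 4-7 in block 1.
example_part = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}

print(soed(example_edges, example_part))   # hypersides {2,4} and {3,5,6} are cut
print(block_sizes(example_part))
```

In this example two hypersides each span two blocks, so the SOED is 4, while the block sizes (3 and 4) show how close the partition is to an equilibrium partition.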

## 2.2 Multi-level Partitioning

Since the hypergraph is a generalization of the graph, most graph partitioning methods can also be used on hypergraphs. Studies have shown that graph partitioning is an NP-complete problem, so heuristic algorithms are usually adopted for graph partitioning rather than exact methods. Two algorithms are currently popular [15]:

K-L algorithm: This is a classical graph partitioning algorithm put forward by Kernighan and Lin. The principal idea is as follows. First, graph G is randomly divided into two subgraphs P1 and P2 that have the same number of vertices. One vertex is chosen from each of P1 and P2; if exchanging them reduces the weight of the edges crossing between the two subgraphs, the two vertices are exchanged and locked. The searching and exchanging process then continues over the set of unlocked vertices until the gain, that is, the difference between external and internal edge weight, becomes negative, at which point the iteration stops. The process is described by the following pseudocode:
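As an illustration of the exchange step described above, the following is a minimal sketch of a simplified, greedy Kernighan-Lin-style pass on an ordinary weighted graph. It is not the authors' listing: the full K-L algorithm also accepts temporarily negative gains and keeps the best prefix of swaps, whereas this sketch stops as soon as no swap has positive gain, as the text describes. The graph encoding (`adj[u][v]` is the edge weight) and the names `cut_weight` and `kl_pass` are assumptions made for the example.

```python
from typing import Dict, Hashable

Graph = Dict[Hashable, Dict[Hashable, float]]  # symmetric adjacency with weights


def cut_weight(adj: Graph, side: Dict[Hashable, int]) -> float:
    """Total weight of edges whose endpoints lie in different halves."""
    total = sum(w for u in adj for v, w in adj[u].items() if side[u] != side[v])
    return total / 2  # each undirected edge was counted twice


def kl_pass(adj: Graph, side: Dict[Hashable, int]) -> Dict[Hashable, int]:
    """Greedy pairwise exchange: swap the best unlocked pair while the gain is positive."""
    side = dict(side)
    locked = set()

    def d_value(v):
        # external cost minus internal cost of vertex v
        ext = sum(w for n, w in adj[v].items() if side[n] != side[v])
        inn = sum(w for n, w in adj[v].items() if side[n] == side[v])
        return ext - inn

    while True:
        best_gain, best_pair = 0, None
        for a in (v for v in adj if side[v] == 0 and v not in locked):
            for b in (v for v in adj if side[v] == 1 and v not in locked):
                gain = d_value(a) + d_value(b) - 2 * adj[a].get(b, 0)
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:      # no exchange reduces the cut any further
            return side
        a, b = best_pair
        side[a], side[b] = 1, 0    # exchange the two vertices
        locked.update((a, b))      # and lock them


# Two triangles joined by the single edge c-d, with a poor initial split.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "e"), ("d", "f"), ("e", "f"), ("c", "d")]
adj: Graph = {v: {} for v in "abcdef"}
for u, v in edges:
    adj[u][v] = adj[v][u] = 1
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}
result = kl_pass(adj, start)
print(cut_weight(adj, start), cut_weight(adj, result))
```

On this small example a single exchange of `c` and `d` separates the two triangles, leaving only the bridging edge in the cut.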

The K-L algorithm does not perform well when the data distribution is sparse, but it works well for graphs whose average degree is greater than 3.

Multi-level partitioning of hypergraphs was proposed by Karypis, integrating and improving on other partitioning algorithms. The core idea is that hypergraph partitioning is divided into three phases: coarsening, initial partitioning and refinement. Coarsening uses a sequence of successively smaller hypergraphs to approximate the original hypergraph while keeping its structure as far as possible; that is, vertices are merged to form a hypergraph with fewer vertices. A normal partitioning is then performed in the initial partitioning phase, generally using the K-L algorithm or the FM algorithm. Finally, in the refinement phase, the partition is projected back to the original hypergraph. Multi-level partitioning algorithms can provide high-quality partitions in a relatively short time and have become a research benchmark in recent years [16].

Figure 2

The example process of multi-level partitioning of hypergraph

## 3.1 Basic thought and principles

Load balance is in general an NP-complete problem, and this is especially true in P2P networks. Because of the locality of nodes and the dynamic nature of the network, each node knows the load state of only a small number of other nodes. Most nodes cannot obtain global load information, and a perfect load balance scheme is rarely found. We argue that a P2P network does not need to achieve globally optimal load balance: if the local nodes achieve load balance, the global network approaches a relatively good balance state. This can be supported by the following queuing-theory argument [17]:

The whole network is regarded as a queuing model in which a task is processed by only one node at a time, and other tasks do not begin until that task is completed. Assume that task arrivals and task completions are both Poisson processes, so that the corresponding inter-event times are exponentially distributed, for example $f(t) = \lambda e^{-\lambda t}$ $(t \ge 0)$ for arrivals. Here $\lambda$ is the average number of tasks arriving per unit time and $\mu$ is the number of tasks a node completes per unit time. The average number of busy nodes is then $\frac{\lambda}{\mu}$. If the average number of busy nodes is $\overline{m}$ and the total number of service nodes is $m$, the service rate is $\rho = \frac{\lambda}{m\mu}$. Let $P_k$ be the probability that the total number of tasks in the system is $k$. For a single service node ($m = 1$) we have $P_0 = 1 - \rho$ and $P_k = (1 - \rho)\rho^k$. For $m$ service nodes, $p_n = \frac{(m\rho)^n}{n!}P_0$ for $n < m$ and $p_n = \frac{m^m\rho^n}{m!}P_0$ for $n \ge m$. The average number of busy nodes is obtained as follows:

$$
\begin{aligned}
\overline{m} &= \sum_{n=0}^{m-1} n\,p_n + m\sum_{n=m}^{\infty} p_n \\
&= \sum_{n=1}^{m-1} \frac{(m\rho)^{n}}{(n-1)!}P_0 + \sum_{n=m}^{\infty} m\cdot\frac{m^m\rho^n}{m!}P_0 \\
&= m\rho\left[\sum_{n=1}^{m-1} \frac{(m\rho)^{n-1}}{(n-1)!}P_0 + \sum_{n=m}^{\infty} \frac{m^m\rho^{n-1}}{m!}P_0\right] \\
&= m\rho\left[\sum_{k=0}^{m-2} p_k + \sum_{k=m-1}^{\infty} p_k\right] = m\rho = \frac{\lambda}{\mu}
\end{aligned}
$$

According to the above equation, the average number of busy nodes is proportional to the number of tasks arriving per unit time and inversely proportional to the number of tasks completed per unit time. It does not depend on the number of service nodes. If there is an overloaded node, the average number of tasks completed per unit time is reduced, and the average number of busy nodes therefore increases.
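The independence from the number of service nodes can be checked numerically. The sketch below assumes the standard multi-server queue with Poisson arrivals and exponential service (steady-state weights $(m\rho)^n/n!$ for $n < m$ and $m^m\rho^n/m!$ for $n \ge m$), truncates the infinite sum at a large number of terms, and computes the expected number of busy nodes. The function name `mean_busy_nodes` and the truncation length are choices made for this illustration.

```python
import math


def mean_busy_nodes(lam: float, mu: float, m: int, terms: int = 2000) -> float:
    """Expected number of busy nodes for m nodes, arrival rate lam, per-node rate mu."""
    rho = lam / (m * mu)
    if not 0 <= rho < 1:
        raise ValueError("the queue is stable only when lam < m * mu")
    a = m * rho  # equals lam / mu
    weights = []
    for n in range(terms):
        if n < m:
            w = a ** n / math.factorial(n)
        else:
            # m^m rho^n / m!  rewritten as  (m rho)^m / m! * rho^(n - m)
            w = a ** m / math.factorial(m) * rho ** (n - m)
        weights.append(w)
    total = sum(weights)  # normalising constant, i.e. 1 / P0 up to truncation
    # with n tasks in the system, min(n, m) nodes are busy
    return sum(min(n, m) * w for n, w in enumerate(weights)) / total


for nodes in (4, 8, 16):
    print(nodes, round(mean_busy_nodes(3.0, 1.0, nodes), 6))
```

With $\lambda = 3$ and $\mu = 1$ the computed value stays at $\lambda/\mu$ for every tested number of nodes, which is the property the argument relies on.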

It can thus be seen that if each small region achieves optimal load balance, the overall network can be considered to keep a good load balance. In view of this, this article proposes partitioning the network into a set of many small regions using a hypergraph partitioning method, abbreviated as LABR (Local Area Balance Replication). First, we define supernodes according to the online situation of the nodes, and each supernode is in charge of a small local area of the network. Each area randomly selects another area to start a 2-partition process of the hypergraph (k-way partitioning of the hypergraph with k = 2), in order to keep the total loads of the regions consistent. In this way, when a node in an area approaches a certain status (such as overload), it actively selects a light-load node in the area through the supernode and transfers load to it.

## 3.2 Region partitioning based on hypergraph

Because of the limited information that P2P nodes have about one another, it is hard for a heavy-load node to choose the best light-load node for load exchange [18]. Such a node is likely to choose some neighbour nodes for its load balance operations, which leads to relatively poor results, while global scheduling is difficult to achieve. Therefore, we propose a partitioning strategy based on small areas, in which a node generally exchanges load only within its own area. The advantage of the hypergraph over common graphs is that it can define sets of points (hypersides), so that nodes sharing certain properties lie in the same hyperside. The area is partitioned with the hypergraph partitioning algorithm, and nodes lying in the same hyperside are likely to end up in the same area. Load scheduling can thus be performed while reducing the communication cost among the nodes.

## 3.2.1 Assignment of supernode

The selection of supernodes adopts an adaptive, probabilistic method. 100 is divided by the number of nodes in the network to obtain a probability number p, and p is sent to the whole network so that each node can act on it. If the probability value acquired by a node is in the range 0 ∼ p, the node becomes a supernode. The selection is then adjusted according to the actual state of the node.

When a new node joins, it automatically joins the area that has a direct relation with it. When enough nodes have joined that the number of nodes in an area exceeds 200, the area is split. Splitting means choosing a node as the supernode of each resulting area, and the two supernodes perform a 2-way partitioning of the hypergraph in the local area. When nodes leave or fail, if the total number of vertices of the hypergraph becomes less than a certain amount q, two areas are merged. In general, q is taken to be 10% of the upper limit.
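The split and merge rule can be written down as a small decision function. This is only a sketch of the thresholds stated above: an upper limit of 200 nodes per area and a merge threshold q equal to 10% of that limit. The function name `region_action` and the constant names are hypothetical, and the supernode election and the 2-way partitioning itself are not modelled.

```python
UPPER_LIMIT = 200                      # maximum number of nodes in one area (from the text)
MERGE_THRESHOLD = UPPER_LIMIT // 10    # q, taken as 10% of the upper limit


def region_action(node_count: int) -> str:
    """Decide what an area should do after a node joins, leaves or fails."""
    if node_count > UPPER_LIMIT:
        return "split"   # two supernodes run a 2-way hypergraph partitioning
    if node_count < MERGE_THRESHOLD:
        return "merge"   # the area is merged with another area
    return "keep"


for size in (15, 20, 120, 200, 201):
    print(size, region_action(size))
```

Keeping the two thresholds far apart (20 versus 200 here) means that an area that has just been split is not immediately eligible for merging again.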

## 3.2.2 Hyperside clustering

Hyperside clustering is a complicated process. Since the major problem discussed in this article is load balance, we adopt the scheme proposed in [19], which forms hypersides by computing the distance between nodes according to their IP addresses. First, when the supernodes of the two partitions collect the information, the nodes of both partitions are gathered into one set. According to their IP addresses, an IP tree with a height of 24 is established, as depicted in Figure 3. Because the last 8 bits of an IP address describe different hosts of the same network, they are not considered. The root node is 0.0.0.0/0, which has two child nodes, 0.0.0/1 and 128.0.0/1. The child nodes of each node a.b.c/n on the IP tree then represent a binary partition at its nth bit. Based on this setting, we denote the height of node i as H, and the distance between Na and Nb equals Dab = HiHj. If D is less than 1, the nodes are assigned to the same hyperside, so that nodes of the same network join the same hyperside.

Figure 3

IP tree of storage nodes
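One way to picture the clustering step is the sketch below. It assumes IPv4 addresses and interprets "distance less than 1" as "the two nodes fall under the same leaf of the height-24 IP tree", that is, they share the same first 24 bits; this interpretation is an assumption, not a statement of the method in [19]. The function name `hypersides_by_prefix` and the sample addresses are hypothetical. Only the standard-library `ipaddress` module is used.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Set


def hypersides_by_prefix(node_ips: Dict[str, str], prefix_len: int = 24) -> Dict[str, Set[str]]:
    """Group node identifiers into hypersides keyed by their shared network prefix."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for node, ip in node_ips.items():
        # strict=False drops the host bits, e.g. 192.168.1.10/24 -> 192.168.1.0/24
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(network)].add(node)
    return dict(groups)


# Assumed sample data: node identifier -> IPv4 address.
sample_nodes = {
    "n1": "192.168.1.10",
    "n2": "192.168.1.200",
    "n3": "10.0.0.5",
}
sample_hypersides = hypersides_by_prefix(sample_nodes)
print(sample_hypersides)
```

Nodes `n1` and `n2` differ only in their last 8 bits, so they land in the same hyperside, while `n3` forms a hyperside of its own.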

## 3.2.3 Partitioning optimization

For the original hypergraph $G = (V, E)$, in the coarsening phase, after n steps of operation, G is gradually mapped to $G_1, G_2, \dots, G_n$. In the initial partitioning phase, we first obtain a partition $p_n$ of $G_n$. In the original multi-level partitioning structure, at the ith step the partition of $G_{i+1}$ is mapped to a partition $p_i'$ of $G_i$. The partitioning algorithm is then used to optimize $p_i'$ into the partition of $G_i$. After n steps of operation we obtain the partition of G.

In the refinement and optimization phase, the following operations are added. When $p_i$ has been obtained after mapping and optimization, we apply a V-cycle [20] adjustment to $G_i$ and re-coarsen $G_i$ to get $G_{i+1}', G_{i+2}', \dots, G_n'$ and the partition of $G_n'$. The work then returns to refinement and optimization. On returning to $G_i$ we obtain $p_i'$. If the partition measuring function of $p_i'$ is better than that of $p_i$, the V-cycle is repeated until, at a certain step k, the partition measuring function of $p_i^k$ no longer improves on that of the previous partition $p_i^{k-1}$. The partition result $p_i^{k-1}$ is then adopted as the input of the next iteration of the refinement and optimization stage.

## 3.3 Algorithm realization of partitioning

This article makes a small improvement to the K-L algorithm for node partitioning. The initialization of the K-L algorithm divides the set of points into two halves of equal size, while the algorithm of this paper starts from the sizes of the regions themselves, and the number of nodes in each area does not change after partitioning. To take the load weight of nodes into account, each node computes a load degree parameter, recorded as its contribution, which is decided by certain attributes of the node and by the total load in its area. The core idea is to choose a pair of nodes randomly from the two areas and determine whether the weight difference after exchanging them benefits the areas. If it does, the two nodes are exchanged. The pseudocode is described as follows:

The first 3 lines mean that one unlocked node is chosen at random from each of the two areas. In lines 6 and 7 the weight difference is computed. Lines 8-14 state that if the difference between the sum of contributions before the exchange and the difference of the contributions of the two nodes after the exchange is positive, the two nodes are locked and their locations are exchanged. If the SOED value increases after the exchange, the operation is cancelled. The cycle continues until the difference between the total loads of the two areas becomes negative, at which point the iteration stops. In this way the load areas of the nodes are partitioned and the total load values of the two areas are very close. Nodes whose physical locations are close lie in the same area, which facilitates the transfer of load.
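The exchange loop described above can be sketched as follows. This is a hedged illustration and not the authors' pseudocode: the raw node load stands in for the "contribution" parameter, whose exact formula is not given in the text, the loop runs for a fixed number of rounds instead of the stopping rule above, and the names `balance_regions`, `imbalance` and `soed_two_way` are hypothetical. Hypersides are given as Python sets of node identifiers.

```python
import random
from typing import Dict, List, Set, Tuple


def imbalance(a: Set[str], b: Set[str], load: Dict[str, float]) -> float:
    """Absolute difference between the total loads of two areas."""
    return abs(sum(load[v] for v in a) - sum(load[v] for v in b))


def soed_two_way(a: Set[str], b: Set[str], hypersides: List[Set[str]]) -> int:
    """SOED for two blocks: each hyperside touching both areas costs 2."""
    return sum(2 for e in hypersides if e & a and e & b)


def balance_regions(region_a: Set[str], region_b: Set[str], load: Dict[str, float],
                    hypersides: List[Set[str]], rounds: int = 100,
                    seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Randomly pair unlocked nodes and swap them when the swap helps."""
    rng = random.Random(seed)
    a, b = set(region_a), set(region_b)
    locked: Set[str] = set()
    for _ in range(rounds):
        free_a, free_b = sorted(a - locked), sorted(b - locked)
        if not free_a or not free_b:
            break
        u, v = rng.choice(free_a), rng.choice(free_b)
        new_a, new_b = (a - {u}) | {v}, (b - {v}) | {u}
        better_load = imbalance(new_a, new_b, load) < imbalance(a, b, load)
        soed_ok = soed_two_way(new_a, new_b, hypersides) <= soed_two_way(a, b, hypersides)
        if better_load and soed_ok:   # otherwise the exchange is cancelled
            a, b = new_a, new_b
            locked.update((u, v))     # exchanged nodes are locked
    return a, b


loads = {"n1": 9, "n2": 8, "n3": 1, "n4": 2, "n5": 1, "n6": 3}
area_a, area_b = {"n1", "n2", "n3"}, {"n4", "n5", "n6"}
edges = [{"n1", "n4"}, {"n2", "n3"}, {"n5", "n6"}]
new_a, new_b = balance_regions(area_a, area_b, loads, edges)
print(imbalance(area_a, area_b, loads), imbalance(new_a, new_b, loads))
```

Because every exchange swaps exactly one node from each area, the area sizes stay fixed, which matches the statement that the number of nodes per area does not change.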

Each supernode counts the number of light-load nodes in its area and records their address information. We set the number as 3 log 2 and establish a route table. At the same time, the supernode contacts a neighbour node and sends it the complete area information and load information, so that the neighbour can act as a spare supernode. Finally, each node is sent its partition information, that is, whether it still belongs to this partition or has been transferred. The nodes in the same area are also sent the information of the spare supernode.

## 4 Experimental Results and Analysis

The simulations mainly use scatter diagrams and line graphs to display and contrast the load state before and after balancing. For the first set of experimental data, the load balance algorithm is not started. In the second experiment we start the LABR strategy. When the system reaches a steady state, the node load degree after load balancing is output, and the corresponding graphs and tables are created with Matlab. This article uses the BA network environment of PeerSim for the simulation and adds the improved algorithm to the strategies module, in order to verify the effectiveness of the hypergraph partition algorithm.

In Figure 4, the horizontal coordinate denotes the number of nodes in the P2P network and the vertical coordinate denotes the ratio of control load to transmission load. The common nodes are distributed evenly among the supernodes in the experiment. In the test we set the number of common nodes managed by each supernode to the square root of the total number of nodes; that is, if there are 400 nodes in the system, each supernode is in charge of 20 nodes, and these nodes join or leave the network randomly. The figure shows that the ratio of control load to transmission load is very small, whether for a server or for a supernode in the system. The comparison also shows that the supernodes relieve the server of much of its burden, because they carry out management and monitoring.

Figure 4

Figure 5 depicts how the set of supernodes changes over time. The change rate is not large. This indicates that, as running time increases, the supernodes in the system remain stable and effective, which in turn improves the stability of the system.

Figure 5

Change rate of supernodes in network

Figure 6

Figure 7

Relation contrast between load degree and node number

Table 1 gives the statistics for different user query distribution parameters (θ and Q denote the user query distribution parameter and the link bandwidth) and the influence of the number of file request connections. When the number of connections takes different values, the final effect of the control algorithm does not change drastically. The network structure entropy (NSE) [21] is larger than when the improved algorithm is not applied, which means that the load degree of the network is more balanced and the global load is lower. In addition, different distribution parameters have little influence on the network before and after the algorithm is applied, and the NSE stays around 3. The reason is that when the distribution parameter is larger, the query probability of hot documents is greater. But for a network with balanced file distribution, the query requests of each node make a big difference. As the number of connections increases, the NSE of the system increases slightly. This shows that exchanges among the nodes are frequent and the load difference is minimized. The degree of load imbalance is therefore relieved, which strengthens the anti-interference ability of the network to a certain extent.

Table 1

The influence of parameter and connection number to the experiment

## 5 Conclusion

Based on a study of the search mechanisms and load strategies of unstructured P2P networks, the related defects of unstructured P2P networks are analyzed. On the theoretical basis of queuing theory, a multi-level partitioning method based on hypergraphs is proposed. It keeps the system balanced within local areas, so as to achieve overall load balance. Simulation results show that the improved scheme has a faster load transfer speed and a more compact load distribution than before. Even under high load it shows better robustness and overall performance than traditional P2P load balance algorithms. Owing to the limits of our research level and time, many problems remain to be discussed and completed. In a complicated P2P network there are more uncertain factors, which pose a serious test of network robustness. The general applicability of our algorithm therefore needs further study and discussion.

## References

• [1] Mirrezaei S.I., Shahparian J., Ghodsi M., A topology-aware load balancing algorithm for P2P systems, Digital Information Management, 2009, 45, 97–102.

• [2] Fan D., An improved load balancing scheme for dynamic structured P2P networks, International Journal of Applied Mathematics and Statistics, 2013, 511, 96–204.

• [3] Deming F. Yong Q.Y., An adaptive and dynamic load balancing algorithm for structured P2P systems, Journal of Convergence Information Technology, 2011, 6, 95–103.

• [4] Mi W., Zhang C.H., An effective load-balancing algorithm SDYA for structured P2P systems, Journal of Beijing University of Posts and Telecommunications, 2010, 33, 116–120.

• [5] Joung Y., Approaching neighbor proximity and load balance for range query in P2P networks, Computer Networks, 2008, 52, 1451–1472.

• [6] Takaoka M., Uchida M., Ohnishi K., Access load balancing with analogy to thermal diffusion for dynamic P2P file-sharing environments, IEICE Transactions on Communications, 2010, 5, 1140–1150.

• [7] Xiao L.F., Ying X., A load balance algorithm for hybrid P2P network model, Computing, Communication, Control, and Management, 2008, 10, 236–239.

• [8] Wei X.L., Chen M., Zhang G.M., A comprehensive load balance mechanism for structured P2P systems, Journal of Beijing University of Posts and Telecommunications, 2012, 35, 87–90.

• [9] Song G.H., Xia Y.J., Zheng Y., P2P load-balance model based on multi-layer Bayesian trust network, Journal of Zhejiang University (Engineering Science), 2014, 44, 1676–1680.

• [10] Rao M.A., Load balancing in DHT based P2P networks, Computer Society, 2008, 34, 920–923.

• [11] Li Z.Y., Xie G.G., A load balancing algorithm for DHT-based P2P systems, Computer Research and Development, 2006, 43, 1579–1585.

• [12] Ragab E., An efficient load balancing algorithm for P2P systems, Journal of Communications, 2011, 6, 648–656.

• [13] C. Berge, Graphes hypergraphes, 1st ed., Preprint Series press, Matematisk Institut Aarhus Universitet, 1970.

• [14] Kernighan B.W., Lin S., An efficient heuristic procedure for partitioning graphs, Bell System Technical Journal, 1970, 49, 191–307.

• [15] Liu L.T., Kuo M.T., Cheng C.K., A Gradient Method on the Initial Partition of Fiduccia-Mattheyses Algorithm, Computer-aided Design, 1995, 20, 229–234.

• [16] Catalyurek U.V., Boman E.G., Devine K.D., A repartitioning hypergraph model for dynamic load balancing, Journal of Parallel and Distributed Computing, 2009, 69, 711–724.

• [17] Yu F., Liu W., Li P., Hypergraph Partitioning Algorithm for Load Scheduling of P2P Network, Journal of Shenyang Jianzhu University, 2014, 30, 953–960.

• [18] Khan M.A., Yeh L., Zeitouni K.M., Achieving availability and load balance in a mobile P2P data store, MobiCASE, 2015, 13, 171–172.

• [19] Song B.Y., Gao N., Li X.G., DLRD: a P2P grid resource discovery mechanism for dynamic load-balance, Journal on Communications, 2008, 29, 94–99.

• [20] Aric A., V-cycle Optimal Convergence for Certain (Multilevel) Structured Linear Systems, SIAM Journal on Matrix Analysis & Applications, 2003, 3, 543–544.

• [21] Huang C., Wang Y.L., Li D., Description and measurement of actor-network structure entropy, Journal of Nanjing University of Science & Technology, 2012, 36, 414–419.

Accepted: 2016-11-23

Published Online: 2017-05-04

Citation Information: Open Physics, Volume 15, Issue 1, Pages 225–232, ISSN (Online) 2391-5471,
