
# Open Mathematics

### formerly Central European Journal of Mathematics

Editor-in-Chief: Gianazza, Ugo / Vespri, Vincenzo

Open Access · Online ISSN: 2391-5455
Volume 16, Issue 1

# Learning Bayesian networks based on bi-velocity discrete particle swarm optimization with mutation operator

Jingyun Wang
/ Sanyang Liu
Published Online: 2018-08-24 | DOI: https://doi.org/10.1515/math-2018-0086

## Abstract

The problem of structure learning in Bayesian networks is to discover a directed acyclic graph that, in some sense, best represents the given database. Score-based algorithms form one important class of structure learning methods for constructing Bayesian networks; they use heuristic search strategies to maximize the score of each candidate network. In this paper, a bi-velocity discrete particle swarm optimization algorithm with a mutation operator is proposed to learn Bayesian networks. The mutation strategy in the proposed algorithm efficiently prevents premature convergence and enhances the exploration capability of the population. We test the proposed algorithm on databases sampled from three well-known benchmark networks and compare it with other algorithms. The experimental results demonstrate the superiority of the proposed algorithm in learning Bayesian networks.

MSC 2010: 68R10; 68T20; 68W25

## 1 Introduction

Bayesian networks (BNs) are probabilistic graphical models used for representing the probabilistic relationships among the random variables in a domain and for performing probabilistic inference with these variables. They have been successfully used for modeling and reasoning, with applications such as pattern recognition [1, 2], medical diagnosis [3, 4], risk analysis [5, 6], computational biology [7, 8, 9], and many others.

The issue of learning BN structures from data is receiving increasing attention. Algorithms for BN structure learning can be grouped into two categories. Constraint-based algorithms [10, 11, 12, 13] construct a graph from data by employing conditional independence tests; statistical or information-theoretic measures are used to test conditional independence between the variables. Local structure learning algorithms, designed using knowledge of the Markov blankets of the variables, can reduce the number of dependence tests to some extent. However, these algorithms depend on the accuracy of the statistical tests, and they may perform badly with insufficient or noisy data. Score-based learning algorithms [14, 15, 16] try to construct a network by maximizing the score function of each candidate network using greedy or heuristic search. However, the space of all possible structures grows rapidly with the number of variables [17], so deterministic search methods may fail to find the optimal solution and are often trapped in local optima. On the other hand, approximation or nondeterministic algorithms are often promising for the problem of BN structure learning [18, 19]. To overcome the drawbacks of score-based algorithms, swarm intelligence algorithms have been used to learn BN structures [20]. Recently, several swarm intelligence algorithms have been successfully applied to BN structure learning, such as the ant colony optimization algorithm (ACO) [21, 22, 23], the artificial bee colony algorithm (ABC) [24], bacterial foraging optimization (BFO) [25], and particle swarm optimization (PSO) [26, 27, 28, 29].

Although these swarm intelligence algorithms perform well on the problem of BN structure learning, some drawbacks are unavoidable. For instance, in global optimization problems, the challenge for PSO is that it may be trapped in a local optimum due to its poor exploration. To enhance the exploration ability of PSO, many strategies have been proposed, so that it can be widely used in many research and application areas thanks to its advantages: not only easy implementation and few parameters to adjust, but also quick discovery of optimal solutions. BN structure learning is one of these cases. The classical PSO is designed for continuous problems. In order to extend its applications and apply it in a discrete space to learn BNs, several discrete PSOs for discrete optimization have been presented.

To retain the efficiency of the classical PSO in continuous space, together with its fast convergence and global search advantages, the bi-velocity discrete PSO was proposed and applied to the Steiner tree problem in graphs and the multicast routing problem in communication networks [30, 31]. Since a BN structure is represented as a connectivity matrix in which each element is 0 or 1, the particle corresponding to the BN structure can be represented by a binary string; we therefore adopt velocity and position updating rules similar to those proposed in [30, 31]. However, in PSO each particle moves toward its past best position and the global best position found so far; the exploitation ability is enhanced, but the number of nonzero elements of the velocity tends to zero as the iterations increase. In this case, if the current global best position is not the global optimum, the particles in the swarm may be trapped in local optima. To prevent the algorithm from being trapped in a local optimum and to enhance the exploration capability, a mutation strategy is introduced that conducts mutation on each new particle. In this paper, an efficient bi-velocity discrete particle swarm optimization with mutation operator algorithm is designed to solve the problem of BN structure learning (BVD-MPSO-BN).

## 2 Bayesian networks and the k2 metric

Bayesian networks are knowledge representation tools capable of representing independence and dependence relationships among variables. On one hand, a Bayesian network is a directed acyclic graph (DAG) G = (X, E), where X = {X1, X2, ⋯, Xn}, the set of nodes, represents the random variables in a specific domain, and E is the set of edges, each edge representing the directed influence of one node on another. On the other hand, a Bayesian network uniquely encodes a joint probability distribution over the random variables, which decomposes according to the structure as

$$P(X_1, X_2, \cdots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)), \tag{1}$$

where Pa(Xi) is the set of the parents of node Xi in G.
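As an illustration of Eq. (1), the following sketch evaluates the joint probability of a hypothetical three-node chain A → B → C; the conditional probability tables are invented purely for the example.

```python
# Hypothetical three-node chain A -> B -> C with invented CPTs.
parents = {"A": [], "B": ["A"], "C": ["B"]}
cpt = {
    "A": {(): {0: 0.6, 1: 0.4}},                            # P(A)
    "B": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},  # P(B | A)
    "C": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.4, 1: 0.6}},  # P(C | B)
}

def joint_probability(assignment):
    """Eq. (1): P(X1, ..., Xn) = prod_i P(Xi | Pa(Xi))."""
    p = 1.0
    for node in parents:
        pa_vals = tuple(assignment[q] for q in parents[node])
        p *= cpt[node][pa_vals][assignment[node]]
    return p
```

For the assignment A = 1, B = 1, C = 0 this multiplies 0.4 · 0.8 · 0.4, exactly the product over nodes prescribed by the factorization.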

When performing the score-and-search approach for learning BNs from data, a score metric must be specified. Many score criteria for evaluating the learned networks have been proposed. One of the most well-known criteria for learning BN structures was given by Cooper and Herskovits (1992). The score function for a given structure G and training database 𝓓 is

$$P(G, \mathcal{D}) = P(G) \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!, \tag{2}$$

where n is the number of variables, each variable Xi has ri possible values, qi is the number of parent configurations of variable Xi, Nijk is the number of cases in 𝓓 in which Xi takes on value k with parent configuration j, and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$.

By using the logarithm of the above function and assuming a uniform prior for P(G), the decomposable k2 metric can be expressed as

$$f(G, \mathcal{D}) = \log(P(G, \mathcal{D})) = \sum_{i=1}^{n} f(X_i, Pa(X_i)), \tag{3}$$

where f(Xi, Pa(Xi)) is the k2 score of node Xi and defined as

$$f(X_i, Pa(X_i)) = \sum_{j=1}^{q_i} \left( \log\!\left( \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \right) + \sum_{k=1}^{r_i} \log(N_{ijk}!) \right). \tag{4}$$
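The node score of Eq. (4) can be computed directly from the counts Nijk. The sketch below (our own illustration, not the paper's implementation) evaluates the log-factorials through the log-gamma function, using log(m!) = lgamma(m + 1).

```python
from math import lgamma

def k2_node_score(counts):
    """log k2 score f(Xi, Pa(Xi)) of Eq. (4).

    counts[j][k] holds N_ijk, the number of cases in which Xi takes
    its k-th value under parent configuration j. log(m!) is computed
    as lgamma(m + 1) to stay numerically stable for large counts.
    """
    score = 0.0
    for nijk in counts:                    # one parent configuration j
        r_i = len(nijk)                    # number of values of Xi
        n_ij = sum(nijk)                   # N_ij = sum_k N_ijk
        # log((r_i - 1)!) - log((N_ij + r_i - 1)!)
        score += lgamma(r_i) - lgamma(n_ij + r_i)
        score += sum(lgamma(n + 1) for n in nijk)   # sum_k log(N_ijk!)
    return score
```

Because the metric is decomposable as in Eq. (3), the score of a whole network is simply the sum of these node scores.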

## 3 Particle swarm optimization

Particle swarm optimization is a population-based stochastic optimization technique. Each particle in PSO is a potential solution; the position of a particle is represented as Xi = (xi1, xi2, ⋯, xiD), i = 1, 2, ⋯, N, where D is the dimension of the search space and N is the number of particles. Each particle has a velocity Vi = (vi1, vi2, ⋯, viD). When a particle updates its position, it records its past best position pbesti = (pbesti1, pbesti2, ⋯, pbestiD) and the global best position gbest = (gbest1, gbest2, ⋯, gbestD) found by any particle in the population. In the standard PSO, the new velocity is calculated from the previous velocity and the distances of the current position from both the past best position and the global best position. After a particle updates its velocity via Eq. (5), it flies toward a new position according to Eq. (6). Each particle compares its current fitness value with its own past best fitness value; if the current value is better, the particle updates the past best position and fitness value with the current ones. The particle also compares its fitness value with the global best fitness value; if the current value is better, it updates the global best position and fitness value with the current ones.

$$v_{ij}(t+1) = \omega v_{ij}(t) + c_1 r_{1ij}(t)\left[pbest_{ij}(t) - x_{ij}(t)\right] + c_2 r_{2ij}(t)\left[gbest_{j}(t) - x_{ij}(t)\right], \tag{5}$$

$$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1), \tag{6}$$

where ω is the inertia weight, c1 and c2 are positive acceleration coefficients, r1 and r2 are two independent uniformly distributed random values in the range [0, 1], and t is the number of iterations.
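A minimal sketch of Eqs. (5)-(6) for one particle follows; the parameter values ω = 0.7, c1 = c2 = 1.5 are illustrative defaults, not the settings used later in the paper.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    """One velocity/position update per Eqs. (5)-(6) for a single
    particle; w, c1, c2 are illustrative, not the paper's settings."""
    d = len(x)
    new_v = [
        w * v[j]
        + c1 * random.random() * (pbest[j] - x[j])   # cognitive term
        + c2 * random.random() * (gbest[j] - x[j])   # social term
        for j in range(d)
    ]
    new_x = [x[j] + new_v[j] for j in range(d)]      # Eq. (6)
    return new_x, new_v
```

Note that when a particle sits at both its past best and the global best with zero velocity, the update leaves it in place, which is one reason the discrete variant below adds a mutation operator.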

## 4 Learning BNs using the bi-velocity discrete PSO with mutation operator

Considering the fact that the original PSO algorithm operates only in a continuous search space, several strategies were proposed to solve problems in a discrete search space and then applied to learn BN structures. To keep the original PSO framework, a bi-velocity discrete particle swarm optimization was proposed and used for the Steiner tree problem in graphs [30] and the multicast routing problem in communication networks [31]. In this section, we intend to use the bi-velocity discrete particle swarm optimization with mutation to solve the problem of BN structure learning.

## 4.1 Problem representation

The problem of BN structure learning is discrete. A BN structure can be represented by an n × n connectivity matrix A, each element aij of which is defined as in Eq. (7), where n is the number of nodes. We intend to use the bi-velocity discrete particle swarm optimization to solve the BN structure learning problem. The position of a particle is encoded as a binary string a11, a12, ⋯, a1n, a21, ⋯, a2n, ⋯, an1, ⋯, ann, similar to [32]. For the BN structure G shown in Fig. 1, the corresponding connectivity matrix is

$$A = \begin{pmatrix} 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix},$$

and the binary string representation is X = (0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0).

Fig. 1

An example of Bayesian network

$$a_{ij} = \begin{cases} 1, & \text{if } i \text{ is a parent of } j, \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$
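The encoding just described is a simple row-by-row flattening of the connectivity matrix; the sketch below (the function name is ours) reproduces the string for the five-node network of Fig. 1.

```python
def matrix_to_string(A):
    """Flatten an n x n connectivity matrix row by row into the
    binary-string position encoding a11, ..., a1n, a21, ..., ann."""
    return [a for row in A for a in row]

# Connectivity matrix of the five-node network shown in Fig. 1.
A = [
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
X = matrix_to_string(A)  # the particle position, a binary string of length n^2
```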

When bi-velocity discrete PSO works in BN structures learning, the position of each particle i is represented as

$$X_i = (x_{i1}, x_{i2}, \cdots, x_{iD}), \tag{8}$$

while the velocity is encoded as

$$V_i = \begin{pmatrix} v_{i1}^0 & v_{i2}^0 & \cdots & v_{ij}^0 & \cdots & v_{iD}^0 \\ v_{i1}^1 & v_{i2}^1 & \cdots & v_{ij}^1 & \cdots & v_{iD}^1 \end{pmatrix}, \tag{9}$$

where $D = n^2$, $x_{ij} = 0$ or $1$, $0 \le v_{ij}^0 \le 1$ and $0 \le v_{ij}^1 \le 1$; $v_{ij}^0$ is the probability of $x_{ij}$ being 0, and $v_{ij}^1$ is the probability of $x_{ij}$ being 1.

## 4.2 Initial solution construction

To generate initial solutions, we use a method analogous to the one used in [25]. Each initial solution is derived by starting with an empty graph that has no edges, and then adding absent edges one by one to the current graph if and only if the new graph is a directed acyclic graph and its score is higher than that of the previous graph. This procedure repeats until the number of added edges reaches a predefined value. Using this method, a certain number of initial solutions are generated.
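A minimal sketch of this greedy initialisation might look as follows. The function names, the `max_edges` limit, and the `score` argument (any caller-supplied function on adjacency matrices, e.g. the k2 metric) are our own illustration, not the paper's implementation.

```python
import random

def is_acyclic(A, n):
    """Kahn-style check: repeatedly remove nodes with no incoming edge."""
    indeg = [sum(A[i][j] for i in range(n)) for j in range(n)]
    stack = [j for j in range(n) if indeg[j] == 0]
    removed = 0
    while stack:
        u = stack.pop()
        removed += 1
        for v in range(n):
            if A[u][v]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    stack.append(v)
    return removed == n          # all nodes removable <=> no directed cycle

def build_initial_solution(n, score, max_edges):
    """Greedy initialisation of Sec. 4.2: start from an empty graph and
    add an absent edge only if the graph stays acyclic and the score
    improves; stop once max_edges edges have been added."""
    A = [[0] * n for _ in range(n)]
    candidates = [(i, j) for i in range(n) for j in range(n) if i != j]
    random.shuffle(candidates)   # consider absent edges in random order
    best, added = score(A), 0
    for i, j in candidates:
        if added >= max_edges:
            break
        A[i][j] = 1              # tentatively add the edge
        if is_acyclic(A, n) and score(A) > best:
            best, added = score(A), added + 1
        else:
            A[i][j] = 0          # revert: cycle or no improvement
    return A
```

Running this with different random edge orders yields the required number of distinct initial particles.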

## 4.3.1 Updating rules

To keep the concept of original PSO in a continuous search space, different updating rules have been proposed in bi-velocity discrete PSO, which are described in detail as follows:

1. Velocity = Position1 – Position2: Suppose that X1 = (x11, x12,⋯, x1D) is Position1 and X2 = (x21, x22,⋯, x2D) is Position2, and that Position1 is better than Position2; then X2 must learn from X1. The jth dimension of Vi is calculated according to the difference between x1j and x2j. If x1j is b but x2j is not (b is 0 or 1), the jth dimension of X2 differs from that of X1 and X2 should learn from X1, so $v_{ij}^{b}$ = 1 and $v_{ij}^{1-b}$ = 0. If x2j is equal to x1j, it is not necessary for X2 to learn from X1 on the jth dimension; thus $v_{ij}^{1}$ = $v_{ij}^{0}$ = 0.

For example, if X1 = (1, 1, 0, 0, 1, 0, 0, 0) and X2 = (1, 0, 1, 1, 0, 0, 0, 1), then

$$V_i = X_1 - X_2 = \begin{pmatrix} 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}.$$

2. Velocity = Coefficient × Velocity: This equation means that the Velocity is multiplied by ω or c × r. Because each element of the Velocity is the probability of the position being 0 or 1, any element larger than 1 is set to 1.

For example, if c × r = (1.2, 0.8, 0.3, 1.5, 0.5, 0.2, 1.3, 0.7) and

$$V = \begin{pmatrix} 0 & 0 & 1 & 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix},$$

then

$$(c \times r) \times V = \begin{pmatrix} 0 & 0 & 0.3 & 1.5 & 0 & 0 & 0 & 0.7 \\ 0 & 0.8 & 0 & 0 & 0.5 & 0 & 0 & 0 \end{pmatrix},$$

and the final velocity is

$$V = \begin{pmatrix} 0 & 0 & 0.3 & 1 & 0 & 0 & 0 & 0.7 \\ 0 & 0.8 & 0 & 0 & 0.5 & 0 & 0 & 0 \end{pmatrix}.$$

3. Velocity = Velocity1 + Velocity2: Suppose that V1 is Velocity1 and V2 is Velocity2, and Vi = V1 + V2. The jth dimension $v_{ij}^{b}$ of velocity Vi is the greater of $v_{1j}^{b}$ and $v_{2j}^{b}$, where b = 0 or b = 1.

For example, if

$$V_1 = \begin{pmatrix} 0 & 0 & 0.3 & 1 & 0 & 0 & 0 & 0.7 \\ 0 & 0.8 & 0 & 0 & 0.5 & 0 & 0 & 0 \end{pmatrix}, \qquad V_2 = \begin{pmatrix} 0.1 & 0 & 0.5 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0.3 & 0 & 0.2 & 0.8 & 0 & 0 & 0.1 \end{pmatrix},$$

then

$$V_i = V_1 + V_2 = \begin{pmatrix} 0.1 & 0 & 0.5 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0.8 & 0 & 0.2 & 0.8 & 0 & 0 & 0.1 \end{pmatrix}.$$
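The three updating rules above can be sketched in Python as follows; the function names are our own, and the velocities are kept as two plain lists (the v⁰ row and the v¹ row).

```python
def position_minus_position(x1, x2):
    """Rule 1, Velocity = Position1 - Position2: v^b[j] = 1 exactly
    when X2 should learn bit b from X1, i.e. x1[j] == b != x2[j]."""
    v0 = [1 if a == 0 and b == 1 else 0 for a, b in zip(x1, x2)]
    v1 = [1 if a == 1 and b == 0 else 0 for a, b in zip(x1, x2)]
    return [v0, v1]

def coeff_times_velocity(c, v):
    """Rule 2, Velocity = Coefficient x Velocity: scale each entry
    dimension-wise and cap at 1, since entries are probabilities."""
    return [[min(1.0, c[j] * row[j]) for j in range(len(row))] for row in v]

def velocity_plus_velocity(v1, v2):
    """Rule 3, Velocity = Velocity1 + Velocity2: elementwise maximum
    of the two probability entries, per dimension and per bit b."""
    return [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(v1, v2)]

# Reproducing the running example: X1 = (1,1,0,0,1,0,0,0), X2 = (1,0,1,1,0,0,0,1).
V = position_minus_position([1, 1, 0, 0, 1, 0, 0, 0], [1, 0, 1, 1, 0, 0, 0, 1])
```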

In the continuous search space, the new position of a particle is calculated by adding the updated velocity to the current position. However, the position and velocity cannot be added directly in a discrete search space. To solve this discrete problem, a new updating method has been proposed, Eq. (10) [31]:

$$x_{ij} = \begin{cases} rand(0, 1), & \text{if } v_{ij}^0 \ge \alpha \text{ and } v_{ij}^1 \ge \alpha, \\ 0, & \text{if } v_{ij}^0 \ge \alpha \text{ and } v_{ij}^1 < \alpha, \\ 1, & \text{if } v_{ij}^0 < \alpha \text{ and } v_{ij}^1 \ge \alpha, \\ x_{ij}, & \text{if } v_{ij}^0 < \alpha \text{ and } v_{ij}^1 < \alpha, \end{cases} \tag{10}$$

in which α is a random value in [0, 1].
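A sketch of the position update of Eq. (10) follows. Two details are our own modelling assumptions: α is drawn once per particle update (it could equally be drawn per dimension), and the first case is read as picking the bit 0 or 1 uniformly at random.

```python
import random

def update_position(x, v0, v1):
    """Position update of Eq. (10) for one particle; x is the binary
    position, v0/v1 the two probability rows of the velocity."""
    alpha = random.random()
    new_x = []
    for j in range(len(x)):
        if v0[j] >= alpha and v1[j] >= alpha:
            new_x.append(random.randint(0, 1))  # both bits likely: random pick
        elif v0[j] >= alpha:
            new_x.append(0)                     # bit pulled toward 0
        elif v1[j] >= alpha:
            new_x.append(1)                     # bit pulled toward 1
        else:
            new_x.append(x[j])                  # keep the current bit
    return new_x
```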

## 4.3.2 Mutation operation

When PSO is implemented, each particle moves toward its past best position and the global best position found so far, which enhances the exploitation ability. However, according to the velocity and position updating rules, the numbers of nonzero elements in pbesti − Xi and gbest − Xi decrease and may even become zero as the iterations increase. In this case, if the current global best position is not the global optimum, the particles in the swarm may be trapped in local optima. To prevent premature convergence and enhance the exploration capability, a mutation strategy is adopted that conducts mutation on each new particle. The mutation probability depends on the problem dimension; in other words, it is determined by the number of nodes in the BN structure learning problem, because a BN structure is represented by an n × n connectivity matrix whose diagonal elements are all zeros, and the position of a particle is encoded as a binary string according to the connectivity matrix. Thus, we define the mutation probability p = 1/(n² − n), where n is the number of nodes. The mutation operator for a particle Xi = (xi1, xi2,⋯, xiD) is defined as

$$x_{ij} = \begin{cases} 1 - x_{ij}, & \text{if } rand \le p \text{ and } j \ne 1 + (m - 1)(n + 1), \\ x_{ij}, & \text{otherwise}, \end{cases} \tag{11}$$

in which, m = 1,⋯,n.
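The mutation operator of Eq. (11) can be sketched as follows; note that the excluded positions j = 1 + (m − 1)(n + 1), m = 1,⋯,n (in the 1-based indexing of the equation), are exactly the diagonal entries of the connectivity matrix, which must stay zero.

```python
import random

def mutate(x, n):
    """Mutation of Eq. (11): flip each bit with probability
    p = 1/(n^2 - n), never touching the diagonal positions
    j = 1 + (m - 1)(n + 1) of the flattened connectivity matrix."""
    p = 1.0 / (n * n - n)
    diagonal = {1 + (m - 1) * (n + 1) for m in range(1, n + 1)}
    return [
        1 - x[j] if (j + 1) not in diagonal and random.random() <= p else x[j]
        for j in range(len(x))
    ]
```

With p chosen this way, one off-diagonal bit is flipped per particle on average, which is enough to keep exploration alive without destroying good solutions.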

## 4.4 Procedure of the proposed algorithm for BN structures learning

Based on the description above, the pseudo-code of BVD-MPSO-BN is presented in Algorithm 1. It starts with the initial solutions generated by the method described in Section 4.2. Each particle in the swarm is encoded as a binary string corresponding to a directed acyclic graph and evaluated using the k2 metric. During each iteration, each particle updates its velocity and position according to the updating rules presented in Subsection 4.3.1. To increase the probability of escaping from a local optimum, the mutation operator is applied to each new particle. Because each solution should be a directed acyclic graph, directed cycles are removed from a new particle if it is an invalid solution. To detect and remove the cycles, we first use the depth-first search algorithm to detect all back edges, and then invert or delete them. After removing the cycles, the proposed algorithm updates the past best position of each particle and the global best position of the population. During the main loop of the algorithm, the velocities and positions of the particles are iteratively updated until a stopping criterion is met.
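The back-edge detection step mentioned above can be sketched with a colored depth-first search (our own illustration of the standard technique): an edge that reaches a node currently on the DFS path closes a directed cycle, and inverting or deleting all such edges restores acyclicity.

```python
def find_back_edges(adj, n):
    """Return the back edges of a directed graph given as an n x n
    adjacency matrix; a back edge is one that closes a directed cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n
    back = []

    def dfs(u):
        color[u] = GRAY                  # u is on the current DFS path
        for v in range(n):
            if adj[u][v]:
                if color[v] == GRAY:     # edge back into the active path
                    back.append((u, v))
                elif color[v] == WHITE:
                    dfs(v)
        color[u] = BLACK                 # u and its descendants are done

    for u in range(n):
        if color[u] == WHITE:
            dfs(u)
    return back
```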

Algorithm 1

## 5 Experimental results

In this section, we use several networks to test the behaviour of BVD-MPSO-BN. These networks are available in the GeNie software. In addition, we compare the proposed algorithm with other algorithms on benchmark networks. The experiments were executed on a personal computer with a Pentium(R) Dual-Core CPU, 2.0 GB of memory, and Windows 7; all the algorithms were implemented in the Matlab language.

## 5.1 Databases and parameter settings of the algorithms

In our experiments, three benchmark networks are selected, namely the Alarm, Asia and Credit networks. The Alarm network, developed for on-line monitoring of patients in intensive care units [33], contains 37 nodes and 46 arcs. The Asia network is useful for demonstrating basic concepts of Bayesian networks in diagnosis [34]; it is a simple graphical model with 8 nodes and 8 arcs. The Credit network, for assessing the credit worthiness of an individual, was developed by Gerardina Hernandez as class homework at the University of Pittsburgh; it is available in the GeNie software and consists of 12 nodes and 12 arcs. The databases used in our experiments are sampled from these benchmark networks by probabilistic logic sampling. Table 1 lists the databases, the original networks, the number of cases in each database, the number of nodes and arcs in each network, and the k2 scores of the original networks.

Table 1

Databases used in experiments

We compare BVD-MPSO-BN with two other algorithms. The first is BNC-PSO, which learns Bayesian network structures by particle swarm optimization [26]; when BNC-PSO is implemented, the population size is 50, the inertia weight ω decreases linearly from 0.95 to 0.4, the acceleration coefficient c1 decreases linearly from 0.82 to 0.5, and the acceleration coefficient c2 increases linearly from 0.4 to 0.83. The second is the artificial bee colony algorithm for learning Bayesian networks (ABC-B) [24], which had the following parameters: weighted coefficients α = 1 for the pheromone and β = 1 for the heuristic information, pheromone evaporation coefficient ρ = 0.1, switching parameter for exploitation versus exploration q0 = 0.8, solution stagnation limit = 0.3, and population size 40. The parameters for BVD-MPSO-BN were chosen as follows: the population size is 50, the inertia weight ω = 0.1, and the acceleration coefficients c1 = c2 = 1.1.

## 5.2 Metrics of the performance

To measure the performance of the proposed algorithm, we evaluate the learned results in terms of the k2 score and the structural difference (i.e., the difference between the learned structure and the original network).

The detailed descriptions of the metrics are defined as below:

• HKS: the highest k2 score resulting from all trials carried out.

• LKS: the lowest k2 score resulting from all trials.

• AKS: the average k2 score (including the mean and the standard deviation) resulting from all trials.

• AEA: the average number of edges accidentally added over all trials, it contains the mean and the standard deviation.

• AED: the average number of edges accidentally deleted over all trials, it contains the mean and the standard deviation.

• AEI: the average number of edges accidentally inverted over all trials, it contains the mean and the standard deviation.

• LSD: the largest structural difference resulting from all trials.

• SSD: the smallest structural difference resulting from all trials.

• ASD: the average structural difference (including the mean and the standard deviation) resulting from all trials.

• AIt: the average number of iterations needed to find an optimal solution over all trials, it contains the mean and the standard deviation.

• AET: the average execution time over all trials.

## 5.3.1 Learning BNs using bi-velocity discrete PSO with mutation operator

To study the performance of the BVD-MPSO-BN algorithm for Bayesian network learning, we use it to recover structures from databases sampled from the given benchmark networks. The k2 score and the structural difference are adopted as the basic performance metrics to evaluate the learned networks. We test the BVD-MPSO-BN algorithm on the Alarm network with sample sizes n = 500, 1000, 2000, 3000, 4000, 5000, the Asia network with sample sizes n = 500, 1000, 3000, and the Credit network with sample sizes n = 500, 1000, 3000. Table 2 reports the experimental results in terms of the k2 score, the number of iterations and the execution time. Table 3 shows the experimental results based on the structural difference between the learned network and the original network. Each statistic in Table 2 and Table 3 is the average and standard deviation over ten independent runs of the BVD-MPSO-BN algorithm. We mark the best values in bold.

Table 2

The k2 score, number of iterations and running time of BVD-MPSO-BN on different networks

Table 3

The structural difference of BVD-MPSO-BN on different networks

As shown in Table 2, the difference between HKS and LKS is small on databases Alarm-2000, Alarm-3000, Alarm-4000 and Alarm-5000; the exceptions are Alarm-500 and Alarm-1000, which do not contain enough samples to learn the Alarm network correctly. The algorithm also returns small standard deviations, which indicates that the BVD-MPSO-BN algorithm is stable for the network when enough samples are available. For the Asia network, the differences between HKS and LKS are smaller than 0.6 on database Asia-1000 and 0.4 on database Asia-3000, and the algorithm returns the same k2 score on database Asia-500. In addition, the difference between AKS and the score of the original network is smaller than 0.6, which means that the score of the Asia network obtained by the BVD-MPSO-BN algorithm is very close to that of the original network. For the Credit network, the proposed algorithm obtains small standard deviations and small differences between HKS and LKS, which indicates that the BVD-MPSO-BN algorithm also performs well on the Credit network.

From the viewpoint of structural difference, as shown in Table 3, the averages and standard deviations of ASD, AEA, AED and AEI are relatively small on the databases sampled from the Alarm, Asia and Credit networks. For the Alarm network, the SSD values on databases Alarm-3000 and Alarm-4000 are equal to two, which means that in the best case only two edge operations are needed to change the learned network into the original one. The standard deviations of AED are zero on databases Alarm-2000, Alarm-3000, Alarm-4000 and Alarm-5000, which means that the proposed algorithm learns the Alarm network with one or two edges accidentally deleted over ten runs. The average structural difference on database Alarm-500 is larger than 12, which means that the algorithm performs poorly on small databases. For the Asia and Credit networks, the averages and standard deviations of AEA approach zero, and they equal zero on databases Asia-3000, Credit-500, Credit-1000 and Credit-3000, which means that no edges are accidentally added when the proposed algorithm learns the Asia network with sample size 3000 and the Credit network with sample sizes 500, 1000 and 3000. In the best case, the AEI values equal zero on the databases generated from the benchmark networks; that is, there are no accidentally inverted edges in the best case.

The results related to the Alarm, Asia and Credit networks demonstrate that the proposed algorithm is stable for the large networks and able to find structures very close to the original structures for small networks. The performance of the proposed algorithm improves with the increasing sample size.

## 5.3.2 Learning BNs using different algorithms

Next, we compare the BVD-MPSO-BN algorithm with the BNC-PSO and ABC-B algorithms. The experimental results are presented in Table 4, Table 5 and Table 6. Each entry is the average and standard deviation over ten independent runs of the different algorithms. The performance of the algorithms is evaluated based on accuracy in terms of AKS, AEA, AED, AEI and AIt. The best values for the different metrics are marked in bold.

Table 4

The experimental results of three algorithms on Alarm network

Table 5

The experimental results of three algorithms on Asia network

Table 6

The experimental results of three algorithms on Credit network

Table 4 shows the experimental results of the three algorithms on databases sampled from the Alarm network. From the perspective of k2 score, BVD-MPSO-BN achieves the best AKS values on databases Alarm-3000 and Alarm-5000. Although BVD-MPSO-BN obtains a higher k2 score on database Alarm-1000 than BNC-PSO, its standard deviation is larger than that of BNC-PSO. The ABC-B algorithm returns the best k2 score on the small database Alarm-500. From the viewpoint of structural difference, the ASD values of BVD-MPSO-BN on databases Alarm-1000 and Alarm-3000 are the smallest among the three algorithms. The ASD values returned by the ABC-B algorithm on databases Alarm-500 and Alarm-5000 are smaller than those returned by the other algorithms. Although ABC-B achieves the smallest average ASD on database Alarm-5000, its standard deviation is larger than that of BVD-MPSO-BN. It is obvious that BNC-PSO obtains networks with more incorrect edges compared with the original networks on database Alarm-500. The AEA values returned by ABC-B on the different databases sampled from the Alarm network are the best among the three algorithms. BVD-MPSO-BN achieves the smallest AED values on databases Alarm-500 and Alarm-1000; the AED values of BNC-PSO and ABC-B on databases Alarm-3000 and Alarm-5000 are the same as those of BVD-MPSO-BN, and they are equal to one, which means that each of the three algorithms learns the BN structures with one edge accidentally deleted in each trial. BVD-MPSO-BN obtains the best AEI values on databases Alarm-1000, Alarm-500 and Alarm-5000. Regarding the number of iterations, ABC-B often needs fewer iterations than BVD-MPSO-BN on databases Alarm-500, Alarm-3000 and Alarm-5000.

From the experimental results on the Asia network with sample sizes n = 500, 1000, 3000 in Table 5, we observe that the BVD-MPSO-BN and BNC-PSO algorithms achieve equally good k2 scores and structural differences on databases Asia-500 and Asia-1000, and they learn the BNs from database Asia-3000 with no edges accidentally added over ten executions. Compared with the ABC-B algorithm, the BVD-MPSO-BN algorithm obtains a higher k2 score, while the ASD values returned by the ABC-B algorithm on databases Asia-500 and Asia-1000 are the smallest among the three algorithms. There are no accidentally inverted edges generated by the BVD-MPSO-BN and BNC-PSO algorithms on database Asia-500. Moreover, the average number of iterations of the BVD-MPSO-BN algorithm is the smallest among the three algorithms on databases Asia-1000 and Asia-3000.

The experimental results of the three algorithms on the Credit network are presented in Table 6. For the k2 score, the BVD-MPSO-BN algorithm does not perform well on databases Credit-500 and Credit-1000, but it still obtains a relatively good result on database Credit-3000. From the viewpoint of structural difference, we observe that BVD-MPSO-BN obtains the best ASD results on databases Credit-500 and Credit-1000. BVD-MPSO-BN and BNC-PSO get the same AEA results, and BVD-MPSO-BN obtains the best AED results on all three databases. ABC-B obtains the best AEI results on databases Credit-1000 and Credit-5000, and BVD-MPSO-BN obtains the best AEI result on database Credit-500. The average number of iterations of BVD-MPSO-BN is smaller than, or at least not larger than, that of the other two algorithms.

To test the time performance of the proposed algorithm, we evaluate the three algorithms on the Alarm network with sample sizes n = 1000, 3000, 5000, the Asia network with sample sizes n = 1000, 3000, and the Credit network with sample sizes n = 1000, 3000. Fig. 2 shows the average running time of the three algorithms on the different networks. It is obvious that the search time of the proposed algorithm is the smallest among the three algorithms. Compared with the BNC-PSO algorithm, the reason is that BVD-MPSO-BN keeps the fast-convergence advantage of the classical PSO, whereas BNC-PSO was proposed by combining PSO with a genetic algorithm. For the ABC-B algorithm, during each iteration each employed bee finds a new solution in its neighborhood by testing and comparing the k2 scores of four operators (addition, deletion, reversal and move), and each onlooker determines a new solution by performing two knowledge-guided operators or four simple operators and comparing their k2 scores, so computing the k2 score is time consuming. Although the number of iterations of ABC-B is often smaller than that of BVD-MPSO-BN, ABC-B takes much more time to reach near-optimal solutions. Meanwhile, we analyze how the time requirement changes with the sample size. Fig. 3 shows the average results of BVD-MPSO-BN in comparison with the ABC-B and BNC-PSO algorithms on databases sampled from the Alarm network. All three algorithms generally take more time to learn BNs from large databases. It is obvious that the execution time of the proposed algorithm increases slowly with the sample size, whereas the ABC-B and BNC-PSO algorithms are sensitive to the sample capacity. The overall results demonstrate that the BVD-MPSO-BN algorithm is superior to the ABC-B and BNC-PSO algorithms in terms of execution time.

Fig. 2

Time performance on three different networks.

Fig. 3

Time performance of three algorithms on Alarm network.
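The average running times reported in Figs. 2 and 3 can be reproduced in outline with a simple harness like the following sketch, where `learn` stands for any of the three structure learners; the harness itself is our illustration, not part of the algorithms:

```python
import time

def average_runtime(learn, database, runs=10):
    """Average wall-clock time of a structure learner over repeated runs.

    `learn` is any callable implementing a structure-learning procedure
    (e.g. BVD-MPSO-BN, BNC-PSO or ABC-B); `database` is the training data
    it receives on each run.
    """
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        learn(database)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)
```

Averaging over repeated runs matters here because all three algorithms are stochastic, so a single run's time is not representative.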

Fig. 4 shows the convergence characteristics of the three heuristic algorithms on the database Alarm-5000. The final solutions of the three algorithms are close to each other; however, the proposed algorithm converges to the optimal solution faster than both BNC-PSO and ABC-B. BNC-PSO performs better than ABC-B at the beginning because particles in PSO learn from better and best solutions, so the population quickly converges toward the optimum. Once the particles are close to the best solution, the convergence speed slows down. With the help of the mutation operator, however, the particles in the proposed algorithm can easily jump out of likely local optima, and hence the fast convergence speed is maintained throughout the whole evolutionary process.
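A sketch of the kind of mutation step described above, assuming particles encode candidate DAGs as 0/1 adjacency matrices; the operator, its probability, and the cycle check are illustrative choices rather than the paper's exact definitions:

```python
import random

def is_acyclic(adj):
    """DFS cycle test on a 0/1 adjacency matrix."""
    n = len(adj)
    state = [0] * n                      # 0 = unseen, 1 = on stack, 2 = done
    def visit(u):
        state[u] = 1
        for v in range(n):
            if adj[u][v]:
                if state[v] == 1 or (state[v] == 0 and not visit(v)):
                    return False         # back edge => cycle
        state[u] = 2
        return True
    return all(state[u] == 2 or visit(u) for u in range(n))

def mutate(adj, p_mut=0.05):
    """With probability p_mut, flip one randomly chosen arc of a particle's
    DAG; a flip that would create a cycle is rejected, leaving the particle
    unchanged.  This keeps diversity without ever producing invalid BNs."""
    if random.random() >= p_mut:
        return adj
    n = len(adj)
    i, j = random.sample(range(n), 2)
    candidate = [row[:] for row in adj]  # copy before modifying
    candidate[i][j] ^= 1                 # add the arc i -> j, or remove it
    return candidate if is_acyclic(candidate) else adj
```

Because the flip is accepted only when the result stays acyclic, the perturbed particle remains a valid network and can immediately continue the velocity-driven search from a new region.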

Fig. 4

The score convergence of three algorithms on Alarm-5000.

Based on the observations above, we conclude that BVD-MPSO-BN reliably learns good-quality networks. It not only retains a powerful search capability for finding the optimal solution, but also prevents the particles in the swarm from being trapped in local optima.

## 6 Conclusion

In this paper, we propose a novel score-based algorithm for BN structure learning. PSO is a swarm-intelligence global search algorithm with the advantages of simple computation and rapid convergence. However, as the number of iterations increases, the quality of the solution may stop improving and the algorithm may converge to a local optimum; in other words, PSO easily suffers from premature convergence. To overcome this drawback of PSO and to learn BN structures from data, a bi-velocity discrete PSO with a mutation operator has been proposed, in which we strike a proper balance between the exploration and exploitation abilities of the algorithm. The experimental results on databases generated from benchmark networks demonstrate the effectiveness of our method. Compared with the BNC-PSO algorithm, the advantage of our algorithm lies not only in its lower computation time but also in the lower error rate between the learned structure and the original network. Compared with the ABC-B algorithm, when the number of samples available for structure learning is large, the proposed algorithm performs well and achieves better average accuracy. In this paper the databases are completely observed; in practice, however, there may be missing data or hidden variables, and extending swarm-based algorithms to learn BN structures from incomplete data is our future work. In addition, the performance of the proposed algorithm decreases with decreasing sample size, so future work will also consider structure learning on small databases.
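As a sketch of the bi-velocity idea underlying the proposed algorithm: each binary dimension of a particle carries two velocity components, interpreted here as the probabilities of setting the bit to 1 or to 0, with the bit otherwise keeping its old value. The encoding and the simplified update below are our assumptions; the full BVD-PSO update also mixes in pbest and gbest information:

```python
import random

def update_position(x, v0, v1):
    """Bi-velocity position update for a binary-encoded particle (sketch).

    x     : current bit vector
    v1[d] : probability of setting bit d to 1
    v0[d] : probability of setting bit d to 0
    Otherwise bit d keeps its current value (inertia).
    """
    new_x = []
    for d in range(len(x)):
        r = random.random()
        if r < v1[d]:
            new_x.append(1)              # pulled toward bit value 1
        elif r < v1[d] + v0[d]:
            new_x.append(0)              # pulled toward bit value 0
        else:
            new_x.append(x[d])           # inertia: keep current bit
    return new_x
```

Representing the pull toward 0 and toward 1 as two separate components is what lets the discrete update imitate the attraction terms of continuous PSO while the position stays a valid bit string.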

## Acknowledgement

This research is supported by the National Natural Science Foundation of China (Grant Nos. 61373174 and 11401454).

## References

• [1]

Jayech K., Mahjoub M.A., Ghanmi N., Application of bayesian networks for pattern recognition: Character recognition case, Proceedings of 6th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT), 2012, IEEE, pp. 748–757

• [2]

Wang Q., Gao X., Chen D., Pattern recognition for ship based on bayesian networks, Proceedings of Fourth International Conference on Fuzzy Systems and Knowledge Discovery, 2007, vol. 4, IEEE, pp. 684–688

• [3]

Nikovski D., Constructing bayesian networks for medical diagnosis from incomplete and partially correct statistics, IEEE Transactions on Knowledge and Data Engineering, 2000, 12(4), 509–516

• [4]

AlObaidi A.T.S., Mahmood N.T., Modified full bayesian networks classifiers for medical diagnosis, Proceedings of International Conference on Advanced Computer Science Applications and Technologies (ACSAT), 2013, IEEE, pp. 5–12

• [5]

Bonafede C.E., Giudici P., Bayesian networks for enterprise risk assessment, Physica A: Statistical Mechanics and its Applications, 2007, 382(1), 22–28

• [6]

Liu Q., Pérès F., Tchangani A., Object oriented bayesian network for complex system risk assessment, IFAC-PapersOnLine, 2016, 49(28), 31–36

• [7]

Li Y., Ngom A., The max-min high-order dynamic bayesian network learning for identifying gene regulatory networks from time-series microarray data, IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2013, IEEE, pp. 83–90

• [8]

Tamada Y., Imoto S., Araki H., Nagasaki M., Print C., Charnock-Jones D.S., Miyano S., Estimating genome-wide gene networks using nonparametric bayesian network models on massively parallel computers, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2011, 8(3), 683–697

• [9]

Wang M., Chen Z., Cloutier S., A hybrid bayesian network learning method for constructing gene networks, Computational Biology and Chemistry, 2007, 31(5-6), 361–372

• [10]

Margaritis D., Learning bayesian network model structure from data (PhD thesis), Tech. rep., Carnegie Mellon University, School of Computer Science, Pittsburgh, PA, 2003

• [11]

Tsamardinos I., Aliferis C.F., Statnikov A.R., Statnikov E., Algorithms for large scale markov blanket discovery, Proceedings of FLAIRS Conference, 2003, vol. 2, pp. 376–380

• [12]

Tsamardinos I., Aliferis C.F., Statnikov A., Time and sample efficient discovery of markov blankets and direct causal relations, Proceedings of Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, ACM, pp. 673–678

• [13]

Pena J.M., Nilsson R., Björkegren J., Tegnér J., Towards scalable and data efficient learning of markov boundaries, International Journal of Approximate Reasoning, 2007, 45(2), 211–232

• [14]

Cooper G.F., Herskovits E., A bayesian method for the induction of probabilistic networks from data, Machine Learning, 1992, 9(4), 309–347

• [15]

Alcobé J.R., Incremental hill-climbing search applied to bayesian network structure learning, Proceedings of 15th European Conference on Machine Learning, 2004, Pisa, Italy, pp. 1–10

• [16]

Chickering D.M., Optimal structure identification with greedy search, Journal of Machine Learning Research, 2002, 3(11), 507–554

• [17]

Chickering D.M., Geiger D., Heckerman D., et al., Learning bayesian networks is np-hard, Tech. rep., Citeseer, 1994

• [18]

Tonda A.P., Lutton E., Reuillon R., Squillero G., Wuillemin P.H., Bayesian network structure learning from limited datasets through graph evolution, Proceedings of European Conference on Genetic Programming, 2012, pp. 254–265

• [19]

Tonda A., Lutton E., Squillero G., Wuillemin P.H., A memetic approach to bayesian network structure learning, Lecture Notes in Computer Science, 2013, 7835, 102–111

• [20]

Ji J., Yang C., Liu J., Liu J., Yin B., A comparative study on swarm intelligence for structure learning of bayesian networks, Soft Computing, 2017, 21(22), 6713–6738

• [21]

De Campos L.M., Fernandez-Luna J.M., Gámez J.A., Puerta J.M., Ant colony optimization for learning bayesian networks, International Journal of Approximate Reasoning, 2002, 31(3), 291–311

• [22]

Daly R., Shen Q., et al., Learning bayesian network equivalence classes with ant colony optimization, Journal of Artificial Intelligence Research, 2009, 35(1), 391–447

• [23]

Jun-Zhong J., Zhang H.X., Ren-Bing H., Chun-Nian L., A bayesian network learning algorithm based on independence test and ant colony optimization, Acta Automatica Sinica, 2009, 35(3), 281–288

• [24]

Ji J., Wei H., Liu C., An artificial bee colony algorithm for learning bayesian networks, Soft Computing, 2013, 17(6), 983–994

• [25]

Yang C., Ji J., Liu J., Liu J., Yin B., Structural learning of bayesian networks by bacterial foraging optimization, International Journal of Approximate Reasoning, 2016, 69, 147–167

• [26]

Gheisari S., Meybodi M.R., Bnc-pso: structure learning of bayesian networks by particle swarm optimization, Information Sciences, 2016, 348, 272–289

• [27]

Wang T., Yang J., A heuristic method for learning bayesian networks using discrete particle swarm optimization, Knowledge and Information Systems, 2010, 24(2), 269–281

• [28]

Xing-Chen H., Zheng Q., Lei T., Li-Ping S., Learning bayesian network structures with discrete particle swarm optimization algorithm, IEEE Symposium on Foundations of Computational Intelligence, 2007, IEEE, pp. 47–52

• [29]

Aouay S., Jamoussi S., Ayed Y.B., Particle swarm optimization based method for bayesian network structure learning, Proceedings of 5th International Conference on Modeling, Simulation and Applied Optimization (ICMSAO), 2013, IEEE, pp. 1–6

• [30]

Zhong W.L., Huang J., Zhang J., A novel particle swarm optimization for the steiner tree problem in graphs, IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence), 2008, IEEE, pp. 2460–2467

• [31]

Shen M., Zhan Z.H., Chen W.N., Gong Y.J., Zhang J., Li Y., Bi-velocity discrete particle swarm optimization and its application to multicast routing problem in communication networks, IEEE Transactions on Industrial Electronics, 2014, 61(12), 7141–7151

• [32]

Larrañaga P., Poza M., Yurramendi Y., Murga R.H., Kuijpers C.M.H., Structure learning of bayesian networks by genetic algorithms: A performance analysis of control parameters, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1996, 18(9), 912–926

• [33]

Beinlich I.A., Suermondt H.J., Chavez R.M., Cooper G.F., The alarm monitoring system: A case study with two probabilistic inference techniques for belief networks, AIME 89, Springer, 1989, pp. 247–256

• [34]

Lauritzen S.L., Spiegelhalter D.J., Local computations with probabilities on graphical structures and their application to expert systems, Journal of the Royal Statistical Society. Series B (Methodological), 1988, 157–224

Accepted: 2018-06-13

Published Online: 2018-08-24

Citation Information: Open Mathematics, Volume 16, Issue 1, Pages 1022–1036, ISSN (Online) 2391-5455.