Open Access Published by De Gruyter Open Access May 4, 2017

Research on the method of information system risk state estimation based on clustering particle filter

Jia Cui, Bei Hong, Xuepeng Jiang and Qinghua Chen
From the journal Open Physics


To strengthen the correlation analysis of threat factors in risk assessment, a dynamic security risk assessment method based on particle filtering is proposed, with threat analysis at its core. Following the risk assessment standards, the method selects threat indicators, applies a particle filtering algorithm to calculate the influence weight of each threat indicator, and determines the information system risk level by combining these weights with state estimation theory. To improve the computational efficiency of the particle filtering algorithm, the k-means clustering algorithm is introduced into it: all particles are clustered and each cluster centroid is used as the representative in the computation, which reduces the amount of calculation. The experimental results indicate that the method reasonably reflects the mutual dependence and influence among risk elements. With limited information, it provides a scientific basis for formulating a risk management and control strategy.

1 Introduction

In recent years, information systems have brought people convenience, but the associated security problems have also become increasingly prominent. Identifying the security risks of an information system is widely used to address potential security problems [1]. Traditional risk assessment methods, such as the matrix method and the multiplication method, depend mainly on expert experience, so their results can be highly subjective. Some researchers have proposed risk assessment based on the rough set model [2], the Bayesian network model [3] or the support vector machine model [4]. These approaches have made some progress, but each has drawbacks: the rough set model has lower accuracy; the accuracy of the Bayesian network model is determined by the class-conditional probability density and the prior probability; and the support vector machine model requires solving a convex quadratic programming problem whose size grows with the square of the number of training samples, so it needs a large storage space and a long calculation time.

To improve the accuracy of the evaluation results, one may start from one or several major security risk factors [5, 6]. This paper proposes an information system risk assessment method based on the particle filter. It follows the information security risk assessment process and the information security risk assessment standards, and it aims to reduce the subjectivity of risk assessment and to improve the effectiveness of evaluation and decision-making, which makes it a new and effective risk evaluation method.

2 Particle filter algorithm

The particle filter algorithm approximates the probability density function $p(x_k \mid y_k)$ by a set of random samples propagated in the state space and replaces the integral operation by the sample mean, so as to obtain the minimum variance estimate of the process state. The random samples are called particles. In mathematical terms: for a stationary stochastic process, assume that at time k − 1 the posterior probability density of the system is $p(x_{k-1} \mid y_{k-1})$, and select n random sample points according to a certain principle. After the measurement $y_k$ is obtained at time k, the state and time update steps yield n particles whose posterior probability density approximates $p(x_k \mid y_k)$. As the number of particles increases, the probability density function of the particles gradually approaches that of the state, and the particle filter estimate approaches the optimal Bayesian estimate.
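To illustrate how a sample mean stands in for the integral, the following minimal sketch estimates an expectation from random samples. The density (standard normal), the function g(x) = x², and the sample sizes are illustrative assumptions, not taken from the paper.

```python
import random

def monte_carlo_mean(g, samples):
    """Approximate the integral of g(x) p(x) dx by the sample mean of g over draws from p."""
    return sum(g(x) for x in samples) / len(samples)

rng = random.Random(0)
# Assumed example: p is the standard normal density and g(x) = x^2,
# so the exact value of the integral is 1.
for n in (100, 10_000, 100_000):
    particles = [rng.gauss(0.0, 1.0) for _ in range(n)]
    print(n, monte_carlo_mean(lambda x: x * x, particles))
```

The printed estimates should move closer to 1 as n grows, which is the behavior the paragraph above describes for an increasing number of particles.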

Particle filter algorithm has the following advantages:

  1. The state equation and the observation equation of the system are not required to be linearized, which avoids the error caused by the linearization process;

  2. With a large number of sample points, this algorithm predicts and updates the probability density function of state by sampling and re-sampling, and contains more information than only using mean value and variance;

  3. The particle filter algorithm does not impose many constraints on the probability density of the state variables.

Therefore, it is the “optimal” filter for the state estimation of nonlinear non-Gaussian systems.

The core of the particle filter algorithm is to represent the probability density of a random variable by a set of random samples, so that an approximately optimal numerical solution is obtained from the physical model itself rather than by optimal filtering of an approximate model. The particle filtering algorithm can in principle be applied to any system and is especially suitable for strongly nonlinear, non-Gaussian systems.

Theoretically speaking, all real systems are nonlinear, and many of them are also non-Gaussian. The particle filter algorithm is therefore a natural choice for estimating the risk state of a nonlinear, non-Gaussian information system.

Sequential importance sampling (SIS) first appeared in the 1950s, but it was not well developed because of the degeneracy problem and limited computational power. In 1993, Gordon introduced the concept of resampling to overcome the degeneracy problem, which led to the first practical Monte Carlo filter, known as the resampling particle filter algorithm [7–12]. The accuracy of the particle filter algorithm is highest as the number of particles approaches infinity, but the computational cost grows accordingly. Reducing the computational complexity and improving the computational efficiency of the particle filter algorithm have therefore become a hot spot in this research field. To address these problems, the K-means clustering algorithm is introduced into the particle filter algorithm: all particles in the particle set are clustered, and the centroid of each class takes part in the computation as the class representative [13, 14]. When degeneracy is serious, the degenerated particles are replaced by the optimal particle, which reduces the amount of computation and alleviates the degeneracy problem to a certain extent.
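Degeneracy and resampling can be illustrated with a small sketch. It uses the effective sample size, a common diagnostic that the paper itself does not mention, together with plain multinomial resampling; the particle values and weights are made-up numbers chosen for illustration.

```python
import random

def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for normalized weights; small values signal degeneracy."""
    return 1.0 / sum(w * w for w in weights)

def multinomial_resample(particles, weights, rng):
    """Draw len(particles) particles with probability proportional to weight, then reset weights."""
    n = len(particles)
    resampled = rng.choices(particles, weights=weights, k=n)
    return resampled, [1.0 / n] * n

rng = random.Random(0)
particles = [0.0, 1.0, 2.0, 3.0]
weights = [0.97, 0.01, 0.01, 0.01]       # nearly all mass on one particle
print(effective_sample_size(weights))     # about 1.06, i.e. severe degeneracy
particles, weights = multinomial_resample(particles, weights, rng)
print(effective_sample_size(weights))     # 4.0 after the weights are reset
```

After resampling, most copies are expected to be of the heavily weighted particle, so the weights are equal again but particle diversity is reduced.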

3 Clustering algorithm

Cluster analysis is an unsupervised pattern recognition method and one of the most common techniques in data mining. Clustering is the process of dividing a data set into several groups or classes so that data objects in the same class are highly similar, while the similarity between data objects in different classes is relatively low.

Each set of data generated by clustering is called a cluster, and each datum in a cluster is called an object. The purpose of clustering is to make objects in the same cluster as similar as possible and objects in different clusters as different as possible. The task of clustering is to divide an unlabeled sample set into several subsets according to some criterion, so that similar samples fall into the same class. Many existing clustering methods, such as rough set clustering [15], fuzzy clustering [16, 17] and support vector clustering [18], have been used in many fields, including data analysis, fault diagnosis, text classification, pattern recognition, image processing, radar target detection, biological engineering and space remote sensing technology.

3.1 Characteristics of K-means algorithm

The clustering algorithm is introduced to reduce the computation of the particle filter algorithm, so the first requirement on it is that it be simple and easy to implement. The K-means clustering algorithm, also known as the hard C-means clustering algorithm, is a classical algorithm that has been successfully applied to many clustering problems [19]. Its main advantages are simplicity and speed. In addition, when programming in MATLAB, the built-in K-means clustering function can be called directly, which simplifies the program. For these reasons, the K-means algorithm is chosen as the particle clustering algorithm.

The K-means clustering algorithm is one of the most basic and most widely used methods in cluster analysis. It discovers clusters and cluster centers in unlabeled data. Once the desired number of centers k is chosen, K-means minimizes the within-cluster variance by repeatedly moving the centers.

3.2 The idea of K-means clustering algorithm

The basic idea of the K-means clustering algorithm is as follows. Given a data set containing n data objects and the number of clusters k to be generated, randomly select k objects as the initial cluster centers. Then calculate the distance between every remaining sample and each cluster center, and assign each sample to the class of its nearest cluster center. Next, recompute each cluster center as the average of the samples in the adjusted class. If the cluster centers do not change between two adjacent iterations, the sample adjustment is finished and the clustering error criterion function E has converged [20, 21]. In each iteration the algorithm checks whether every sample is classified correctly and adjusts any sample that is not. After all samples have been adjusted, the algorithm updates the cluster centers and enters the next iteration. If all samples are correctly classified in an iteration, no adjustment is made and the cluster centers do not change. During the iterations the value of E decreases and finally converges to a fixed value, which also serves as one of the criteria for evaluating the algorithm [22]. The following criterion function is generally selected:

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} \left| p - m_i \right|^2 \tag{1}$$

where E is the sum of squared errors over all data objects, p is a point in the space (that is, a data object), and $m_i$ is the mean of class $C_i$. Under this criterion, the resulting clusters tend to be compact and well separated.
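As a small worked example of criterion (1), the sketch below evaluates E for two different partitions of the same four points. The points and the helper name `sum_squared_error` are illustrative assumptions.

```python
def sum_squared_error(clusters):
    """E = sum over clusters C_i of the sum over points p in C_i of |p - m_i|^2."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum((p[d] - mean[d]) ** 2 for p in points for d in range(dim))
    return total

# The same four points grouped in two ways.
compact = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
mixed = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(sum_squared_error(compact))  # 4.0
print(sum_squared_error(mixed))    # 100.0
```

The partition that groups nearby points gives the smaller E, which is why minimizing E favors compact clusters.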

3.3 K-means algorithm process

Input: number of clusters k, and a sample set {x} containing n data objects.

Output: k clusters satisfying the minimum variance criterion.

K-means clustering algorithm process [23, 24]

Step 1. Select k objects as the initial cluster centers $z_1(1), z_2(1), z_3(1), \ldots, z_k(1)$, where the number in parentheses is the iteration index of the cluster center. The vector values of the cluster centers can be set arbitrarily; for example, k initial data objects can be used as the cluster centers;

Step 2. Calculate the distance between each object in the sample set {x} and each cluster center (the mean of all the objects in the cluster) and, according to the minimum distance principle, assign each object to the cluster of its nearest center $z_j(t)$. That is, if $\min\{\|x - z_i(t)\|,\ i = 1, 2, \ldots, k\} = \|x - z_j(t)\|$, then $x \in S_j(t)$, where $S_j(t)$ denotes the cluster whose center is $z_j(t)$;

Step 3. Calculate the new mean (center object) $z_j(t+1)$, $j = 1, 2, \ldots, k$, of each cluster, that is

$$z_j(t+1) = \frac{1}{N_j} \sum_{x \in S_j(t)} x, \quad j = 1, 2, \ldots, k \tag{2}$$

where $N_j$ is the number of samples contained in the cluster $S_j(t)$. Using the cluster mean as the class center minimizes the clustering criterion function

$$J_j = \sum_{x \in S_j(t)} \left\| x - z_j(t+1) \right\|^2, \quad j = 1, 2, \ldots, k \tag{3}$$

In this step the means of the k clusters are calculated separately, which is where the name k-means comes from;

Step 4. If $z_j(t+1) \neq z_j(t)$ for some $j = 1, 2, \ldots, k$, set t = t + 1, return to Step 2, reclassify the sample set {x} and iterate again; if $z_j(t+1) = z_j(t)$ for all j, the algorithm has converged and the calculation is finished.
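Steps 1 to 4 can be sketched in a few lines of plain Python. This is an illustrative implementation under stated assumptions, not the authors' code: samples are tuples of floats, the initial centers are k randomly chosen samples, an empty cluster simply keeps its previous center, and `max_iter` is a safety limit that the paper does not mention.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means(samples, k, rng, max_iter=100):
    """Pick k samples as centers, assign, recompute means, stop when centers are unchanged."""
    centers = rng.sample(samples, k)                        # Step 1
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in samples:                                   # Step 2: nearest-center assignment
            j = min(range(k), key=lambda i: squared_distance(x, centers[i]))
            clusters[j].append(x)
        new_centers = []
        for j, members in enumerate(clusters):              # Step 3: cluster means
            if members:
                dim = len(members[0])
                mean = tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
                new_centers.append(mean)
            else:
                new_centers.append(centers[j])              # assumption: keep center of an empty cluster
        if new_centers == centers:                          # Step 4: convergence test
            break
        centers = new_centers
    return centers, clusters

rng = random.Random(0)
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, clusters = k_means(samples, k=2, rng=rng)
print(sorted(centers))
```

The exact-equality convergence test mirrors Step 4; a tolerance on the center movement is a common practical alternative.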

4 Information system security risk state estimation method based on clustering particle filter

In information security risk assessment, risk state estimation is a key step, and the accuracy of the state estimation algorithm directly determines whether the evaluation results are accurate. The particle filter is an optimal filtering algorithm for nonlinear, non-Gaussian systems [7, 25], so this paper selects the particle filter algorithm for information system security risk assessment and combines it with a clustering algorithm to reduce the computational burden.
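One way to read the combination of clustering and particle filtering is sketched below: the weighted particle set is clustered, and each centroid then represents its cluster with the cluster's total weight. The paper does not spell out this weight aggregation rule, so it is an assumption here, as are the scalar particles, the fixed number of iterations and the function name `cluster_particles`.

```python
import random

def cluster_particles(particles, weights, k, rng, n_iter=20):
    """Replace scalar particles by k centroids; each centroid carries its cluster's total weight."""
    centers = rng.sample(particles, k)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in particles]
        for j in range(k):
            members = [x for x, lab in zip(particles, labels) if lab == j]
            if members:                        # assumption: an empty cluster keeps its old center
                centers[j] = sum(members) / len(members)
    labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in particles]
    cluster_weights = [0.0] * k
    for lab, w in zip(labels, weights):        # assumption: centroid weight = sum of member weights
        cluster_weights[lab] += w
    return centers, cluster_weights

rng = random.Random(0)
particles = [rng.gauss(0.0, 0.1) for _ in range(500)] + [rng.gauss(5.0, 0.1) for _ in range(500)]
weights = [1.0 / 1000] * 1000
centers, cluster_weights = cluster_particles(particles, weights, k=2, rng=rng)
full_mean = sum(w * x for w, x in zip(weights, particles))
reduced_mean = sum(w * c for w, c in zip(cluster_weights, centers))
print(full_mean, reduced_mean)
```

In this example 1000 particles are summarized by 2 weighted centroids, so later steps would operate on 2 representatives instead of 1000. With unequal weights the reduced estimate is only an approximation of the full one.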

The dynamic model of the information system is assumed to be:

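$$\begin{cases} x_k = f(x_{k-1}, \nu_{k-1}) \\ y_k = h(x_k, v_k) \end{cases} \tag{4}$$

where $x_k \in \mathbb{R}^{n_x}$ is the threat index vector of the system at time k, $y_k \in \mathbb{R}^{n_y}$ is the risk output vector, $\nu_k \in \mathbb{R}^{n_\nu}$ is the system noise, and $v_k \in \mathbb{R}^{n_n}$ is the observation noise.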

The posterior density $p(x_{0:k} \mid y_{1:k})$ is a complete solution to the sequential estimation problem. According to the principle of Monte Carlo simulation, the posterior density can be approximately represented as:

$$p(x_{0:k} \mid y_{1:k}) \approx \sum_{i=1}^{N} w_k^i \, \delta\!\left(x_{0:k} - x_{0:k}^i\right) \tag{5}$$

Introducing the importance density $q(x_{0:k} \mid y_{1:k})$ and assuming that the samples $x_{0:k}^i$ are drawn from this importance density:

$$x_{0:k}^i \sim q(x_{0:k} \mid y_{1:k}) \tag{6}$$

the importance weights are:

$$w_k^i \propto \frac{p(x_{0:k}^i \mid y_{1:k})}{q(x_{0:k}^i \mid y_{1:k})} \tag{7}$$
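The role of the importance weights in (7) can be seen in a static example: samples are drawn from a proposal density q, weighted by p/q, normalized, and used to estimate the mean of the target p. Both Gaussian densities below are illustrative assumptions, not the paper's model.

```python
import math
import random

def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def importance_estimate(n, rng):
    """Estimate the mean of a target density p using samples drawn from a proposal q."""
    # Assumed densities: target p = N(1, 0.5^2), proposal q = N(0, 2^2).
    samples = [rng.gauss(0.0, 2.0) for _ in range(n)]
    raw = [normal_pdf(x, 1.0, 0.5) / normal_pdf(x, 0.0, 2.0) for x in samples]  # w proportional to p / q
    total = sum(raw)
    weights = [w / total for w in raw]                                           # normalize to sum to 1
    return sum(w * x for w, x in zip(weights, samples)), weights

estimate, weights = importance_estimate(20_000, random.Random(0))
print(estimate)  # should be near the target mean 1.0
```

Because the weights are normalized, any constant factor in p or q cancels, which is why (7) only needs to hold up to proportionality.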

Assuming that the density can be decomposed into:

$$q(x_{0:k} \mid y_{1:k}) = q(x_k \mid x_{0:k-1}, y_{1:k}) \, q(x_{0:k-1} \mid y_{1:k-1}) \tag{8}$$

This means that the sample set $x_{0:k}^i \sim q(x_{0:k} \mid y_{1:k})$ can be obtained by adding the new particle $x_k^i \sim q(x_k \mid x_{0:k-1}, y_{1:k})$ to $x_{0:k-1}^i \sim q(x_{0:k-1} \mid y_{1:k-1})$. The posterior $p(x_{0:k} \mid y_{1:k})$ can be expressed in the recursive form below:

$$p(x_{0:k} \mid y_{1:k}) = \frac{p(y_k \mid y_{1:k-1}, x_{0:k})}{p(y_k \mid y_{1:k-1})} \times \frac{p(y_{1:k-1} \mid x_{0:k}) \, p(x_{0:k})}{p(y_{1:k-1})} \tag{9}$$

Using Bayes formula:

$$p(x_{0:k} \mid y_{1:k}) = \frac{p(y_k \mid y_{1:k-1}, x_{0:k})}{p(y_k \mid y_{1:k-1})} \times p(x_k \mid x_{0:k-1}, y_{1:k-1}) \, p(x_{0:k-1} \mid y_{1:k-1}) \tag{10}$$

Since the system is a first-order Markov process and the observations are independent, it follows that

$$p(x_{0:k} \mid y_{1:k}) \propto p(y_k \mid x_k) \, p(x_k \mid x_{k-1}) \, p(x_{0:k-1} \mid y_{1:k-1}) \tag{11}$$

If the importance density satisfies

$$q(x_k \mid x_{0:k-1}, y_{1:k}) = q(x_k \mid x_{k-1}, y_k) \tag{12}$$

then combining formulas (7), (8), (11) and (12) gives

$$w_k^i \propto w_{k-1}^i \, \frac{p(y_k \mid x_k^i) \, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, y_k)} \tag{13}$$

where

$$x_k^i \sim q(x_k \mid x_{k-1}^i, y_k) \tag{14}$$
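The step leading to (13) can be made explicit. The following worked derivation, offered as a reading aid rather than as part of the original text, substitutes the factorizations (8), (11) and (12) into the weight definition (7) and uses only quantities already defined above.

```latex
\begin{aligned}
w_k^i
  &\propto \frac{p(x_{0:k}^i \mid y_{1:k})}{q(x_{0:k}^i \mid y_{1:k})} \\
  &\propto \frac{p(y_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)\, p(x_{0:k-1}^i \mid y_{1:k-1})}
                {q(x_k^i \mid x_{k-1}^i, y_k)\, q(x_{0:k-1}^i \mid y_{1:k-1})} \\
  &\propto w_{k-1}^i \, \frac{p(y_k \mid x_k^i)\, p(x_k^i \mid x_{k-1}^i)}
                             {q(x_k^i \mid x_{k-1}^i, y_k)},
  \qquad\text{since}\quad
  w_{k-1}^i \propto \frac{p(x_{0:k-1}^i \mid y_{1:k-1})}{q(x_{0:k-1}^i \mid y_{1:k-1})}.
\end{aligned}
```

The second line applies (11) to the numerator and (8) with (12) to the denominator; the last line recognizes the weight at time k − 1 as defined by (7).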

After weight normalization

$$w_k^i = \frac{w_k^i}{\sum_{j=1}^{N} w_k^j} \tag{15}$$

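Usually the importance density is taken as the state transition density,

$$q(x_k^i \mid x_{k-1}^i, y_k) = p(x_k^i \mid x_{k-1}^i) \tag{16}$$

that is,

$$x_k^i \sim p(x_k \mid x_{k-1}^i) \tag{17}$$

so that the weight recursion (13) reduces to

$$w_k^i \propto w_{k-1}^i \, p(y_k \mid x_k^i) \tag{18}$$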
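A single weight update of the form (18), followed by normalization, might look as follows. The Gaussian observation model y = x + noise and the parameter name `obs_std` are assumptions made for the example; the paper leaves h and the noise distribution general.

```python
import math

def bootstrap_weight_update(prev_weights, particles, y, obs_std):
    """New weight is proportional to the previous weight times the likelihood p(y_k | x_k^i)."""
    # Assumed observation model: y_k = x_k + Gaussian noise with standard deviation obs_std.
    # The Gaussian normalizing constant is omitted because it cancels in the normalization.
    raw = [w * math.exp(-0.5 * ((y - x) / obs_std) ** 2) for w, x in zip(prev_weights, particles)]
    total = sum(raw)
    return [w / total for w in raw]

weights = bootstrap_weight_update([0.25] * 4, [0.0, 1.0, 2.0, 3.0], y=2.1, obs_std=0.5)
print(weights)  # the particle at 2.0 receives the largest weight
```

Particles far from the measurement lose weight quickly, which is the mechanism behind the degeneracy discussed earlier.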

The steps above form the basis of the particle filter dynamic estimation algorithm. Given the measured values of the system, the samples and weights are calculated recursively by the method above, which yields a dynamic estimation algorithm based on particle filtering. The procedure of the weight-based particle filter state estimation algorithm is as follows:

Step 1. Initialization: at time k = 0, draw samples from the importance density, then set k = 1;

Step 2. Prediction:

$$x_k^i = f(x_{k-1}^i, \nu_{k-1}) \tag{19}$$

Step 3. Weighting:

$$w_k^i = w_{k-1}^i \times \frac{p(y_k \mid x_k^i) \, p(x_k^i \mid x_{k-1}^i)}{q(x_k^i \mid x_{k-1}^i, y_k)} \tag{20}$$

Step 4. Weight normalization:

$$w_k^i = \frac{w_k^i}{\sum_{j=1}^{N} w_k^j} \tag{21}$$

Step 5. State estimation:

$$x_k = \sum_{i=1}^{N} x_k^i \times w_k^i \tag{22}$$

Step 6. Return to Step 2 for the next time step.
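Steps 1 to 6 can be put together in a compact sketch for a scalar state. Everything specific here is an assumption for illustration: a random-walk state model, a Gaussian observation model, an N(0, 1) prior, and the transition density as importance density. The resampling step at the end of the loop is not part of the listed steps and is added only to limit degeneracy.

```python
import math
import random

def particle_filter(observations, n_particles, rng, process_std=0.1, obs_std=0.2):
    """Weight-based particle filter for an assumed scalar random-walk model."""
    # Step 1: initialization (assumed prior N(0, 1)) with equal weights.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    estimates = []
    for y in observations:
        # Step 2: prediction with f(x, nu) = x + nu.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Step 3: weighting; with q equal to the transition density only p(y_k | x_k^i) remains.
        weights = [w * math.exp(-0.5 * ((y - x) / obs_std) ** 2)
                   for w, x in zip(weights, particles)]
        # Step 4: weight normalization.
        total = sum(weights)
        weights = [w / total for w in weights]
        # Step 5: state estimate as the weighted mean of the particles.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Assumption, not in the listed steps: resample when the effective sample size is low.
        if 1.0 / sum(w * w for w in weights) < n_particles / 2:
            particles = rng.choices(particles, weights=weights, k=n_particles)
            weights = [1.0 / n_particles] * n_particles
    return estimates  # Step 6 corresponds to the loop over time steps

rng = random.Random(0)
truth = 0.5
observations = [truth + rng.gauss(0.0, 0.2) for _ in range(50)]
estimates = particle_filter(observations, n_particles=500, rng=rng)
print(estimates[-1])  # expected to be close to 0.5
```

For a vector state such as the threat index vector, the same loop applies with the scalar operations replaced by their vector counterparts.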

The essence of information system state estimation based on the particle filter is to make a j-step-ahead prediction of the particles at time k. Given the observations $y_{1:k}$, when a j-step-ahead prediction of the system state is made, the particles are propagated in the usual way while the weight of each particle at time k + j is kept equal to its weight at time k. The j-step-ahead risk prediction probability (i.e., the comprehensive assessment of the risk level) can then be calculated as:

$$r_p(j, k) = \sum_{i=1}^{N} w_k^j \, I\!\left(x_{k+j \mid k+j-1}^i\right), \quad j \in [1, n] \tag{23}$$

where $w_k^j$ is the importance weight corresponding to $x_k^i$, $x_k^i$ is the system risk state, such as {normal, medium, dangerous}, with $w_0$ = {0.1, 0.5, 1}, and I(A) is the indicator function. To ensure the accuracy of the calculation, j = 1 is usually taken, which means one-step prediction.
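One possible reading of (23) is sketched below: the risk probability is the total weight of the predicted particles for which the indicator is true. The risk test `x > 0.3`, the scalar threat index and all numbers are hypothetical; the paper does not state how the indicator is evaluated.

```python
def risk_probability(predicted_particles, weights, is_risky):
    """Sum the weights of the predicted particles for which the indicator is_risky is true."""
    return sum(w for x, w in zip(predicted_particles, weights) if is_risky(x))

# Hypothetical one-step predictions of a scalar threat index and their (normalized) weights.
predicted = [0.20, 0.26, 0.31, 0.35]
weights = [0.125, 0.125, 0.25, 0.5]
print(risk_probability(predicted, weights, lambda x: x > 0.3))  # 0.75
```

With normalized weights the result lies between 0 and 1, consistent with the range of the rp values reported for the simulation.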

5 Simulation and verification

In this paper, the simulation data are generated from a security laboratory database provided by a scientific research institution. To ensure that the selected indicators are comprehensive and accurate, this paper follows GB/T 20984-2007 “Information security technology-risk assessment specification for information security”, which sets out the basic concepts, relationships between elements, analysis principles, implementation processes and assessment methods of risk assessment, as well as its implementation points and forms of work [26]. Based on information security risk management, the actual evaluation process specification and expert discussion, a system of 12 specific threat indexes is finally obtained, as shown in Figure 1.

Figure 1 The system of threat indexes


The 12 risk indexes selected in Figure 1 are converted into a 1 × 12 row vector, and samples of each kind are extracted in order to take part in the validation. The dynamic equation of the information system is:

$$\begin{cases} x(k+1) = A\,x(k) + v(k) \\ y(k) = C \begin{bmatrix} x_1(k) \\ \vdots \\ x_{12}(k) \end{bmatrix} + v(k) \end{cases} \tag{24}$$

where A and C are the full-rank system parameter matrices, both taken as the identity matrix in this paper. Under normal conditions, the tracking of the first three threat indexes of the risk state is shown in Figure 2. Figure 3 shows the tracking of the first three threat indexes when the system is at risk.
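For orientation, a model of the form (24) with A and C equal to the identity can be simulated as below. The noise standard deviations, the initial vector and the function name `simulate` are assumptions for illustration; the paper's data come from a laboratory database and are not reproduced here.

```python
import random

def simulate(n_steps, rng, dim=12, process_std=0.01, obs_std=0.05):
    """Simulate a 12-dimensional linear model with A = C = identity and Gaussian noise."""
    x = [rng.random() for _ in range(dim)]      # assumed initial threat index vector in [0, 1)
    states, observations = [], []
    for _ in range(n_steps):
        states.append(list(x))
        # With C = identity, each observation component is the state component plus noise.
        observations.append([xi + rng.gauss(0.0, obs_std) for xi in x])
        # With A = identity, each threat index evolves as a random walk.
        x = [xi + rng.gauss(0.0, process_std) for xi in x]
    return states, observations

states, observations = simulate(100, random.Random(0))
print(len(states), len(states[0]))  # 100 12
```

Such simulated sequences can serve as input when experimenting with a particle filter on a model of this shape.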

Figure 2 State estimation situations (system is normal)


Figure 3 State estimation situations (system is in risk)


In normal operation (Figure 2), the estimated value of the system state agrees well with the true value. When the system is in danger, Figure 3 shows that the state estimation algorithm based on the particle filter tracks the change of the system state very well. The comprehensive risk level probability is calculated by formula (23), and Table 1 lists the probability from time 74 to time 83. As Table 1 shows, if the hidden threat remains, the system risk changes over time, and the proposed algorithm tracks the dynamic change of the risk very well.

Table 1 Comprehensive evaluation probability on risk level

time  74      75      76      77      78      79      80      81      82      83
x1    0.2759  0.2740  0.2734  0.2726  0.2711  0.2708  0.2705  0.2701  0.2698  0.2682
rp    0.3040  0.4020  0.5220  0.6740  0.6240  0.8480  0.8180  0.9760  0.9980  1.0000

6 Conclusion

Focusing on the analysis of key threats, this paper proposes an information security risk assessment method that combines the particle filter with the information security risk assessment process. First, the particle filter algorithm is used to obtain the comprehensive influence weights of the selected threat indicators, while state estimation theory is used to track the dynamic change of the information system risk. Finally, the influence weights of the indexes and the estimated state are used to make a comprehensive evaluation of the information system and determine the probability of each security risk level.


The authors would like to acknowledge many helpful suggestions of the reviewers and the participants on earlier versions of this paper, and also thank authors mentioned in the references.


[1] Feng D.G., Zhang Y., Zhang Y.Q., Survey of information security risk assessment, Journal of China Institute of Communication, 2004, 7, 10-18.

[2] Chen X.Z., Zheng Q.H., Guan X.H., et al., Approach to security evaluation based on rough set theory for host computer, Journal of Xi’an Jiaotong University, 2004, 12, 1228-1231.

[3] Wang Z.Z., Jiang X., Wu X.Y., et al., Planning exploitation graph-bayesian networks model for information security risk frequency measurement, Acta Electronica Sinica, 2010, 2A, 18-22.

[4] Dang D.P., Meng Z., Assessment of information security risk by support vector machine, Journal of Huazhong University of Science and Technology (Natural Science Edition), 2010, 3, 46-49.

[5] Luo C.C., Chen W.J., A hybrid information security risk assessment procedure considering interdependences between controls, Expert Systems with Applications, 2012, 39, 247-257.

[6] Wang Y.W., Chen H., Research of information security risk assessment method based on business process, Library and Information Service, 2011, 8, 62-66.

[7] Ryu H.R., Huber M., A particle filter approach for multitarget tracking, Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, CA, USA, 2007, 2753-2760.

[8] Moon H., Chellappa R., 3D shape-encoded particle filter for object tracking and its application to human body tracking, EURASIP Journal on Image and Video Processing, 2008, 5, 1-16.

[9] Liu Y., Shen T., Wang X., Image restoration using gaussian particle filters, Proceedings of the 2007 International Conference on Computational Intelligence and Security, IEEE, Harbin, China, 2007, 391-394.

[10] Canton-Ferrer C., Segura C., Casas J., Pardas M., Hernando J., Audio-visual head orientation estimation with particle filtering in multisensor scenarios, EURASIP Journal on Advances in Signal Processing, 2007, 6, 1-12.

[11] Caron F., Davy M., Duflos E., Vanheeghe P., Particle filtering for multisensor data fusion with switching observation models: application to land vehicle positioning, IEEE Transactions on Signal Processing, 2007, 6, 2703-2719.

[12] Miller I., Campbell M., Particle filtering for map-aided localization in sparse GPS environments, Proceedings of the 2008 IEEE International Conference on Robotics and Automation, Pasadena, CA, USA, 2008, 1834-1841.

[13] Hamerly G., Elkan C., Learning the k in k-means, Proceedings of the 17th Annual Conference on Neural Information Processing Systems, Vancouver, Canada, 2003, 1-8.

[14] Pelleg D., Moore A., X-means: extending K-means with efficient estimation of the number of clusters, Proceedings of the 17th International Conference on Machine Learning, Morgan Kaufmann, 2000, 727-734.

[15] Sun S.B., Zhao W.T., Qin K.Y., Wang Y.L., Research on data clustering algorithm based on rough sets, Computer Engineering and Applications, 2006, 22, 140-141.

[16] Liu J.Z., Xie W.X., Clustering analysis by genetic algorithms, Acta Electronica Sinica, 1995, 11, 81-83.

[17] Gao X.B., Pei J.H., Xie W.X., A study of weighting exponent m in a fuzzy c-means algorithm, Acta Electronica Sinica, 2000, 4, 1-4.

[18] Zhang Z., Zheng N., Shi G., Maximum entropy clustering algorithm and its global convergence, Science in China (series E), 2001, 1, 59-70.

[19] Zhang Y.N., Zhao R.C., Liang Y., An efficient target recognition method for large scale data, Acta Electronica Sinica, 2002, 10, 1533-1535.

[20] Yang Z.H., Yang Y., Document clustering method based on hybrid of SOM and k-means, Application Research of Computers, 2006, 5, 73-79.

[21] Li Y.S., Yang S.L., Ma X.J., Hu X.X., Chen Z.M., Optimization study on k value of spatial clustering, Journal of System Simulation, 2006, 3, 573-576.

[22] Shi Y.P., Xin D.X., Analysis and application based on k-means clustering algorithm, Journal of Xi’an Technological University, 2006, 1, 45-48.

[23] Yuan F., Meng Z.H., Yu G., Improved k-means clustering algorithm, Computer Engineering and Applications, 2004, 36, 177-178.

[24] Sizhong Z., Lan X., Yang X., A sufficient condition for the existence of a k-factor excluding a given r-factor, Applied Mathematics and Nonlinear Sciences, 2017, 1, 13-20.

[25] Brzeziński D.W., Accuracy problems of numerical calculation of fractional order derivatives and integrals applying the Riemann-Liouville/Caputo formulas, Applied Mathematics and Nonlinear Sciences, 2016, 1, 23-44.

[26] State Bureau of Technical Supervision of the People’s Republic of China, GB/T 20984-2007, Information security technology risk assessment specification for information security, Standards Press of China, Beijing, 2007.

Received: 2016-11-24
Accepted: 2017-1-16
Published Online: 2017-5-4

© 2017 J. Cui et al.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.