
# Open Physics

### formerly Central European Journal of Physics


Open Access, Online ISSN 2391-5471
Volume 15, Issue 1

# Random walk-based similarity measure method for patterns in complex object

Shihu Liu / Xiaozhou Chen
• Key Laboratory of IOT Application Technology of Universities in Yunnan Province, Yunnan Minzu University, Kunming, 650031, China
Published Online: 2017-04-14 | DOI: https://doi.org/10.1515/phys-2017-0016

## Abstract

This paper discusses the similarity of patterns in complex objects. A complex object is composed of both the attribute information of patterns and the relational information between patterns. Bearing in mind the specificity of complex objects, a random walk-based similarity measurement method for patterns is constructed. In this method, the reachability of any two patterns with respect to the relational information is studied, so that the similarity of patterns with respect to the relational information can be calculated. On this basis, an integrated similarity measurement method is proposed, and Algorithms 1 and 2 give the calculation procedure. The method makes use of both the attribute information and the relational information. Finally, a synthetic example illustrates the validity of the proposed similarity measurement method.

PACS: 02.90.+p

## 1 Introductory comments and Problem statement

Similarity measurement, as a tool to determine the similarity between two patterns, has received special attention because of its wide application in many fields, such as pattern recognition [1], machine learning [2], image processing [3–6], multimodal maps [7], supply chain networks [8], dynamic systems [9], and others [10–15]. The data handled in similarity computations are mostly interval numbers [16], fuzzy numbers [17–21], (interval-valued) intuitionistic fuzzy numbers [22, 23], or even sets [24]. In addition, many similarity measure methods are in use, such as the cosine method, the correlation coefficient method and the max-min method, amongst others.

No matter which representation the data have or which similarity measurement method is used, all these data are, to some extent, vector-based attribute information. In practical problem solving, however, this is not enough. For example, if we want to detect community structure, the research should be aimed at network data [25]. Such data carry not only the attribute information of patterns but also the relational information between patterns.

Many studies have addressed the similarity problem for such data. For example, Rossi et al. [26] proposed a quantum algorithm to measure the similarity between a pair of unattributed graphs, in which the theory of quantum Jensen-Shannon divergence constitutes the basis of the theoretical analysis. Moreover, Rossi et al. [27] discussed the similarity between attributed graphs by means of the evolution of a continuous-time quantum walk. In [28], Cason et al. computed a low-rank approximation of the graph similarity matrix. Brandes and Lerner [29] introduced the concept of structural similarity by relaxation of equitable partitions. Kpodjedo et al. [30] investigated heuristics for approximate graph matching. Maggini et al. [2] presented a neural network model that can be used to learn a similarity measure for pairs of patterns. Grewenig, Zimmer and Weickert [31] studied rotationally invariant similarity measures for non-local image denoising. Besides these studies, many other similarity measuring approaches are proposed in references [32–39].

Taking the relational information of both directly-linked and indirectly-linked patterns into account, we propose a random walk-based similarity measure method for patterns in a complex object. Here, the “random walk” refers to the linking route between any two patterns, and the “complex object” refers to data that contain not only the attribute information of patterns but also the relational information between patterns. According to the relational information of any two patterns, we first propose the concept of reachability of patterns. On this basis, the so-called random walk route between any two patterns is constructed. From this, the similarity value with respect to relational information can be obtained by a weighting method. After that, the integrated similarity value follows naturally. While constructing the random walk route, we consider two cases: in one, the longest length of the route is not limited; in the other, it is restricted.

The remainder of this paper is organized as follows. In Section 2, we discuss the concept of complex object. In Section 3, the random walk-based similarity measurement method for patterns in complex object is developed. In Section 4, a synthetic example illustrates the validity of the proposed algorithm. Finally, Section 5 concludes the paper.

## 2 The mathematical description of complex object

Mathematically, the complex object can be expressed as a tuple CO=(X, R), where

• $X = \{x_1, x_2, \cdots, x_n\}$ is a nonempty finite set and $x_i$ is a pattern, for $i = 1, 2, \cdots, n$;

• $R = \{r_{ij} \mid i, j = 1, 2, \cdots, n\}$ is a set of relations and $r_{ij} \in R$ represents the possible relational information between the patterns $x_i$ and $x_j$.

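Naturally, if $x_i \in X$ is nothing but a symbolic description of the $i$th pattern, then the information provided by $R$ can be used. If not, the pattern $x_i$ can be expressed as $x_i = (x_{i1}, x_{i2}, \cdots, x_{im})$, where $x_{ij}$ represents the value of $x_i$ with respect to the attribute $a_j$. In this case, the set $X$ can be rewritten as

$$X=\begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}.$$

Similarly, $R$ can be expressed as

$$R=\begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ r_{21} & r_{22} & \cdots & r_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ r_{n1} & r_{n2} & \cdots & r_{nn} \end{pmatrix}.$$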

It is well known that the attribute information $x_{ij}$ need not be a real number; sometimes it may be a set [24], an interval [16], or even a fuzzy number [17, 18]. The same data representation formats are also suitable for the entries of $R$. In some practical problems, moreover, the relationship between two patterns $x_i$ and $x_j$ may be described by more than one relation, that is, $R = \{R_1, R_2, \cdots, R_k\}$ with $k \geq 1$, where $R_i \in R$ represents the $i$th relationship between patterns.

In what follows, we adhere to the following hypotheses for a complex object $CO = (X, R)$: $R$ describes only one relationship, with $r_{ij} \in [0, 1]$ for $i, j = 1, 2, \cdots, n$, and $x_{ij}$ is a real number for $i = 1, 2, \cdots, n$ and $j = 1, 2, \cdots, m$.
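As a minimal sketch of how these hypotheses could be held in code, a complex object can be stored as an attribute matrix plus a relation matrix and checked on construction. The class name `ComplexObject` and the sample values are hypothetical and serve only as illustration; they do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ComplexObject:
    """CO = (X, R): X is an n-by-m attribute matrix, R an n-by-n relation matrix."""
    X: List[List[float]]
    R: List[List[float]]

    def __post_init__(self) -> None:
        n = len(self.X)
        if n == 0:
            raise ValueError("X must be a nonempty set of patterns")
        m = len(self.X[0])
        if any(len(row) != m for row in self.X):
            raise ValueError("every pattern needs the same number of attributes")
        if len(self.R) != n or any(len(row) != n for row in self.R):
            raise ValueError("R must be an n-by-n matrix")
        # Hypothesis of Section 2: a single relation with r_ij in [0, 1].
        if any(not 0.0 <= r <= 1.0 for row in self.R for r in row):
            raise ValueError("every r_ij must lie in [0, 1]")


# Illustrative values only (three patterns, two attributes).
co = ComplexObject(
    X=[[1.0, 2.0], [2.0, 1.0], [0.5, 0.5]],
    R=[[0.0, 0.8, 0.0], [0.8, 0.0, 0.3], [0.0, 0.3, 0.0]],
)
print(len(co.X), "patterns,", len(co.X[0]), "attributes")
```

Keeping $X$ and $R$ in one validated object makes it harder to pass mismatched matrices to later similarity computations.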

## 3 Random walk-based similarity measure method for patterns in complex object

In this section, we construct the similarity measurement method for two cases: in one, the length of the random walk route between $x_i$ and $x_j$ is not limited; in the other, it is restricted. Before doing so, we introduce some related concepts.

## 3.1 Reachability of patterns

#### Definition 3.1

Suppose that $CO = (X, R)$ is a complex object. If $r_{ij} \neq 0$, then we say that the patterns $x_i$ and $x_j$ are one step reachable with respect to the relationship described by $R$; otherwise they are one step unreachable.

Obviously, for a complex object $CO = (X, R)$, either $r_{ij} = 0$ or $r_{ij} \neq 0$ holds. Therefore, all pairs of patterns can be divided into two types: one step reachable and one step unreachable. Taking the patterns in figure 1 as an example, the patterns $x_1$ and $x_2$, $x_1$ and $x_4$, $x_5$ and $x_7$ are one step reachable, but the patterns $x_1$ and $x_5$, $x_5$ and $x_6$ are one step unreachable.

Figure 1

One complex object CO = (X, R)

#### Definition 3.2

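Let $CO = (X, R)$ be a complex object. If $r_{ij} = 0$ and $r_{ik_0}, r_{k_0k_1}, \cdots, r_{k_{\ell-1}j} \neq 0$, where $x_{k_0}, x_{k_1}, \cdots, x_{k_{\ell-1}} \in X$, then we say that the patterns $x_i$ and $x_j$ are $\ell$ steps reachable with respect to the patterns $x_{k_0}, x_{k_1}, \cdots, x_{k_{\ell-1}}$. Moreover, $x_i \to x_{k_0} \to x_{k_1} \to \cdots \to x_{k_{\ell-1}} \to x_j$ is a possible random walk route between $x_i$ and $x_j$.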

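As can be seen from definition 3.2, patterns that are one step unreachable may still be $\ell$ steps reachable for some $\ell \geq 2$. For example, the patterns $x_2$ and $x_7$ in figure 1 are reachable because there are 6 random walk routes between them:

$$\begin{array}{ll}
(a)\ x_2 \to x_1 \to x_4 \to x_6 \to x_7; & (b)\ x_2 \to x_1 \to x_4 \to x_5 \to x_7; \\
(c)\ x_2 \to x_3 \to x_4 \to x_6 \to x_7; & (d)\ x_2 \to x_3 \to x_4 \to x_5 \to x_7; \\
(e)\ x_2 \to x_1 \to x_4 \to x_7; & (f)\ x_2 \to x_3 \to x_4 \to x_7.
\end{array}$$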
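The enumeration of routes between two patterns can be sketched with a depth-first search. Figure 1 is not reproduced in this copy, so the edge list below is an assumption: it was chosen only so that it agrees with the statements in the text (which pairs are one step reachable, and that $x_2$ and $x_7$ are joined by six routes). The sketch also assumes that a route never visits the same pattern twice, which holds for all six routes listed above. The names `EDGES`, `build_adjacency` and `all_routes` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical edge set, picked to be consistent with the text about figure 1.
EDGES: List[Tuple[int, int]] = [
    (1, 2), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6), (4, 7), (5, 7), (6, 7),
]


def build_adjacency(edges: List[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Undirected adjacency: a and b are one step reachable from each other."""
    adj: Dict[int, Set[int]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def all_routes(adj: Dict[int, Set[int]], start: int, goal: int) -> List[List[int]]:
    """Every route from start to goal that visits no pattern twice."""
    routes: List[List[int]] = []

    def extend(path: List[int]) -> None:
        last = path[-1]
        if last == goal:
            routes.append(list(path))
            return
        for nxt in sorted(adj.get(last, ())):
            if nxt not in path:
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return routes


routes = all_routes(build_adjacency(EDGES), 2, 7)
for route in routes:
    print(" -> ".join(f"x{k}" for k in route))
```

Under this assumed edge set the search returns two routes of three steps and four routes of four steps, matching routes (a) to (f).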

Obviously, these random walk routes do not all have the same length. Moreover, even routes of equal length need not be the same route. In order to avoid unnecessary confusion, we make the following definition.

#### Definition 3.3

Let $CO = (X, R)$ be a complex object. The patterns $x_i$ and $x_j$ are $\ell$ steps reachable if and only if they are $\ell - 1$ steps unreachable.

Certainly, for any two $\ell$ ($\ell \geq 2$) steps reachable patterns $x_i$ and $x_j$ in the complex object $CO = (X, R)$, the random walk route may not be unique. Moreover, if $X$ can be divided into two or more disjoint subsets (such as $X_1$ and $X_2$) in terms of $R$, then the patterns $x_i \in X_1$ and $x_j \in X_2$ are unreachable forever.

#### Definition 3.4

Let $CO = (X, R)$ be a complex object and $|X| = n$. If the patterns $x_i$ and $x_j$ are $n - 1$ steps unreachable, then we say that they are never reachable.

From the above analysis, one can see that for a complex object $CO = (X, R)$ with $n$ patterns, the reachability between any two patterns falls into 4 types:

• $x_i$ and $x_j$ are zero step reachable, that is, $x_i$ and $x_j$ are the same pattern;

• $x_i$ and $x_j$ are one step reachable;

• $x_i$ and $x_j$ are $\ell$ steps reachable, where $2 \leq \ell \leq n - 1$;

• $x_i$ and $x_j$ are never reachable.

It should be pointed out that if the patterns $x_i$ and $x_j$ are zero step reachable, then their similarity value is equal to 1, with respect to both attribute information and relational information. In other words, this case is trivial.
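The four reachability types can be told apart with a breadth-first search over the nonzero entries of $R$. The sketch below counts one step per link on the shortest route, which is how Definition 3.3 is read here; a pair for which the search finds no route cannot be reached in $n - 1$ steps either, so it is "never reachable" in the sense of Definition 3.4. The function names and the sample matrix are hypothetical.

```python
from collections import deque
from typing import List, Optional


def min_steps(R: List[List[float]], i: int, j: int) -> Optional[int]:
    """Fewest steps from pattern i to pattern j; None if no route exists."""
    if i == j:
        return 0
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in range(len(R)):
            # r_uv != 0 means u and v are one step reachable.
            if R[u][v] != 0 and v not in dist:
                dist[v] = dist[u] + 1
                if v == j:
                    return dist[v]
                queue.append(v)
    return None


def reachability_type(R: List[List[float]], i: int, j: int) -> str:
    steps = min_steps(R, i, j)
    if steps is None:
        return "never reachable"
    if steps == 0:
        return "zero step reachable"
    if steps == 1:
        return "one step reachable"
    return f"{steps} steps reachable"


# Illustrative relation matrix: a chain 0 - 1 - 2 and an isolated pattern 3.
R_demo = [
    [0.0, 0.8, 0.0, 0.0],
    [0.8, 0.0, 0.4, 0.0],
    [0.0, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
for j in range(4):
    print(0, j, reachability_type(R_demo, 0, j))
```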

## 3.2 Similarity measure in aspect of relational information

Up to now, the reachability of any two different patterns with respect to relational information has been classified clearly: one step reachable, $\ell$ steps reachable for $2 \leq \ell \leq n - 1$, and never reachable.

When the patterns $x_i$ and $x_j$ are one step reachable, such as the patterns $x_1$ and $x_2$ in figure 1, the corresponding relational information $r_{ij}$ can be applied to calculate the similarity between them: if $R$ describes the similarity between patterns, then $S^R(x_i, x_j) = r_{ij}$; if $R$ describes the distance between patterns, then $S^R(x_i, x_j) = 1 - r_{ij}$, and so on. Therefore, the final similarity value between patterns $x_i$ and $x_j$ can be determined by some weighted approach, such as the following formula, where $S^X(x_i, x_j)$ denotes the similarity with respect to attribute information:

$$s_{ij}=\frac{S^R(x_i,x_j)^2+S^X(x_i,x_j)^2}{S^R(x_i,x_j)+S^X(x_i,x_j)}.\tag{1}$$

When the patterns $x_i$ and $x_j$ are never reachable, the relational information contributes nothing to the similarity measurement. In other words, only the attribute information takes part in the calculation. Hence, the final similarity between patterns $x_i$ and $x_j$ is the same as that of the attribute information, that is,

$$s_{ij}=S^X(x_i,x_j).\tag{2}$$

In the sequel, we discuss the situation in which the patterns $x_i$ and $x_j$ are $\ell$ steps reachable for $2 \leq \ell \leq |X| - 1$. Here, equations (1) and (2) are not suitable for the calculation. Take patterns $x_i$, $x_j$ and $x_k$ for example, where $r_{ij} = 0$ but $r_{jk} \neq 0$ and $r_{ik} \neq 0$. If we only consider $r_{ij} = 0$, then the similarity of $x_i$ and $x_j$ with respect to relational information is equal to 0. This is clearly not reasonable, because $x_i$ and $x_j$ are still linked through $x_k$.

As can be seen from the above subsection, there may be more than one random walk route between two patterns, such as $x_2 \to x_1 \to x_4 \to x_7$ and $x_2 \to x_3 \to x_4 \to x_7$ in figure 1. Intuitively, different random walk routes reflect different degrees of similarity between the two patterns. We therefore introduce the following approach to calculate the similarity between two $\ell$ steps reachable patterns $x_i$ and $x_j$. Suppose that the patterns $x_i$ and $x_j$ are $\ell$ steps reachable and that there is more than one random walk route between them. Then the similarity value with respect to relational information can be calculated by the formula

$$S_{\ell}^{R}(x_i,x_j)=\max_{\tau\in\Gamma}\frac{S^R(x_i,x_{k_0})^2+\cdots+S^R(x_{k_{\ell-1}},x_j)^2}{S^R(x_i,x_{k_0})+\cdots+S^R(x_{k_{\ell-1}},x_j)},\tag{3}$$

where

• $\tau : x_i \to x_{k_0} \to \cdots \to x_{k_{\ell-1}} \to x_j$ is one possible random walk route between $x_i$ and $x_j$;

• $\Gamma$ is the index set with respect to the random walk routes;

• $S^R(x_{k_i}, x_{k_{i+1}})$ represents the similarity between the one step reachable patterns $x_{k_i}$ and $x_{k_{i+1}}$.

So, the final similarity value between the patterns $x_i$ and $x_j$ can be calculated by the weighted formula

$$s_{ij}=\frac{S_{\ell}^{R}(x_i,x_j)^2+S^X(x_i,x_j)^2}{S_{\ell}^{R}(x_i,x_j)+S^X(x_i,x_j)}.\tag{4}$$

Naturally, if patterns xi and xj are zero step reachable, then ${S}_{\ell }^{R}\left({x}_{i},{x}_{j}\right)\equiv 1$.
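Equations (3) and (4) can be sketched in a few lines. The ratio "sum of squares over sum" is a self-weighted mean: each step similarity is weighted in proportion to its own size, so the result always lies between the smallest and largest step similarity on the route. The function names and the numbers in the example are hypothetical, and returning 0 when a denominator vanishes is an assumption of this sketch, not something the paper specifies.

```python
from typing import Iterable, List


def route_similarity(step_sims: List[float]) -> float:
    """Ratio inside Eq. (3) for one route: sum of squares over sum."""
    total = sum(step_sims)
    if total == 0:  # assumption: treat an all-zero route as similarity 0
        return 0.0
    return sum(s * s for s in step_sims) / total


def relational_similarity(routes: Iterable[List[float]]) -> float:
    """Eq. (3): best route similarity over all candidate routes."""
    return max(route_similarity(steps) for steps in routes)


def combine(s_rel: float, s_attr: float) -> float:
    """Eq. (1) and Eq. (4): weighted combination of the two similarities."""
    denom = s_rel + s_attr
    if denom == 0:  # assumption: both similarities zero gives 0
        return 0.0
    return (s_rel ** 2 + s_attr ** 2) / denom


# Illustrative step similarities along two hypothetical routes.
candidate_routes = [[0.5, 0.5], [0.9, 0.3]]
s_rel = relational_similarity(candidate_routes)
print(round(s_rel, 3), round(combine(s_rel, 0.75), 3))
```

Here the first route scores 0.5 and the second 0.75, so the maximum in equation (3) picks the second route even though it contains the weakest single link.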

## 3.3 Similarity algorithm construction

In the above subsection, we discussed the similarity measure with respect to the relational information. Because the attribute information of a pattern is represented as a vector $x_i = (x_{i1}, x_{i2}, \cdots, x_{im})$, the similarity between any two patterns with respect to attribute information is straightforward; for example, the cosine measurement

$$S^X(x_i,x_j)=\frac{\sum_{k=1}^{m}x_{ik}x_{jk}}{\sqrt{\sum_{k=1}^{m}x_{ik}^{2}}\,\sqrt{\sum_{k=1}^{m}x_{jk}^{2}}}\tag{5}$$

can be used.
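A minimal sketch of the cosine measurement in equation (5) follows. The function name `cosine_similarity` is hypothetical, and raising an error for a zero vector is a choice made here because the formula is undefined in that case.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Eq. (5): dot product divided by the product of Euclidean norms."""
    if len(u) != len(v):
        raise ValueError("patterns must have the same number of attributes")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))
```

For non-negative attribute values the result lies in $[0, 1]$, which keeps it on the same scale as $r_{ij} \in [0, 1]$.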

Bearing our research purpose in mind, the random walk-based similarity measure method can be summarized as Algorithm 1 below. For brevity, the phrase “random walk-based similarity measure method for patterns in complex object” is abbreviated as “RWSMM-PCO”.

Algorithm 1

RWSMM-PCO

Obviously, the similarity of any two patterns in CO = (X, R) can be determined by algorithm 1. But, the possible random walk route between patterns xi and xj may be very long, especially in some practical problems.
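The listing of Algorithm 1 did not survive in this copy, so the following is a reconstruction from the prose of Sections 3.1 to 3.3 and should not be read as the authors' listing. It assumes that $R$ describes similarity (so $S^R = r_{ij}$ on each link), that attributes are non-negative and nonzero, and that $\Gamma$ contains every route that visits no pattern twice; restricting $\Gamma$ to shortest routes would be an equally defensible reading. All function names are hypothetical.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Attribute similarity, Eq. (5)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def routes_between(R: List[List[float]], i: int, j: int) -> List[List[int]]:
    """All routes i -> ... -> j over nonzero r entries, no pattern repeated."""
    found: List[List[int]] = []

    def extend(path: List[int]) -> None:
        u = path[-1]
        if u == j:
            found.append(path)
            return
        for v in range(len(R)):
            if R[u][v] != 0 and v not in path:
                extend(path + [v])

    extend([i])
    return found


def route_similarity(R: List[List[float]], route: List[int]) -> float:
    """Ratio inside Eq. (3) for one route."""
    steps = [R[a][b] for a, b in zip(route, route[1:])]
    return sum(s * s for s in steps) / sum(steps)


def rwsmm_pco(X: List[List[float]], R: List[List[float]]) -> List[List[float]]:
    n = len(X)
    S = [[1.0] * n for _ in range(n)]  # zero step reachable: similarity 1
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s_attr = cosine(X[i], X[j])
            if R[i][j] != 0:           # one step reachable, Eq. (1)
                s_rel = R[i][j]
            else:
                routes = routes_between(R, i, j)
                if not routes:         # never reachable, Eq. (2)
                    S[i][j] = s_attr
                    continue
                s_rel = max(route_similarity(R, r) for r in routes)  # Eq. (3)
            denom = s_rel + s_attr
            S[i][j] = (s_rel ** 2 + s_attr ** 2) / denom if denom else 0.0  # Eq. (4)
    return S


# Illustrative data: chain 0 - 1 - 2 plus an isolated pattern 3.
X_demo = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
R_demo = [
    [0.0, 0.8, 0.0, 0.0],
    [0.8, 0.0, 0.4, 0.0],
    [0.0, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
S_demo = rwsmm_pco(X_demo, R_demo)
for row in S_demo:
    print([round(s, 3) for s in row])
```

Enumerating every route grows quickly with the number of links, which is the practical concern raised in the paragraph above.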

Once $R$ in $CO = (X, R)$ is given, the random walk routes between any two patterns are determined. Considering the needs of practical problems, and especially the difficulty of computing the concrete random walk routes, we next study the similarity measure between any two patterns when the longest length of the corresponding random walk route is restricted. For convenience, we suppose that the length of the random walk route between patterns $x_i$ and $x_j$ is $\ell_{ij}$, and the corresponding threshold is ${\vartheta }_{ij}^{\ast }$. Under this hypothesis, Algorithm 1 can be rewritten as Algorithm 2 below. Similarly, the phrase “length limited random walk-based similarity measure method for patterns in complex object” is abbreviated as “LLRWSMM-PCO”.

Algorithm 2

LLRWSMM-PCO
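The listing of Algorithm 2 is likewise missing from this copy. The sketch below is a reconstruction from the surrounding prose, not the authors' listing: routes longer than a threshold are ignored, and a pair with no admissible route falls back to the attribute similarity alone. It assumes that $R$ describes similarity, takes the attribute similarity as an input, and uses hypothetical names (`routes_up_to`, `llrwsmm_pair`, `max_len` for the threshold).

```python
from typing import List


def routes_up_to(R: List[List[float]], i: int, j: int, max_len: int) -> List[List[int]]:
    """Routes from i to j with at most max_len steps, no pattern repeated."""
    found: List[List[int]] = []

    def extend(path: List[int]) -> None:
        u = path[-1]
        if u == j:
            found.append(path)
            return
        if len(path) - 1 >= max_len:  # step budget used up
            return
        for v in range(len(R)):
            if R[u][v] != 0 and v not in path:
                extend(path + [v])

    extend([i])
    return found


def llrwsmm_pair(R: List[List[float]], s_attr: float, i: int, j: int, max_len: int) -> float:
    """Similarity of patterns i and j under a route-length threshold."""
    if i == j:
        return 1.0
    if R[i][j] != 0:
        s_rel = R[i][j]
    else:
        routes = routes_up_to(R, i, j, max_len)
        if not routes:
            return s_attr  # treated as unreachable: attributes only
        s_rel = max(
            sum(R[a][b] ** 2 for a, b in zip(r, r[1:]))
            / sum(R[a][b] for a, b in zip(r, r[1:]))
            for r in routes
        )
    return (s_rel ** 2 + s_attr ** 2) / (s_rel + s_attr)


# Illustrative chain 0 - 1 - 2 - 3 - 4 with every link weighted 0.5.
n = 5
R_chain = [[0.5 if abs(a - b) == 1 else 0.0 for b in range(n)] for a in range(n)]
print(llrwsmm_pair(R_chain, 0.9, 0, 4, max_len=3))  # route needs 4 steps
print(llrwsmm_pair(R_chain, 0.9, 0, 4, max_len=4))
```

Bounding the depth of the search also bounds its cost, which is the motivation given above for restricting the route length.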

## 4 Experimental study

In this section, we introduce a synthetic example to illustrate the validity of the proposed similarity measure method for patterns in complex object. For the complex object $CO = (X, R)$, the attribute information is listed in table 1 and the relational information is shown in figure 2. Here, the relational information reflects the similarity between any two patterns.

Table 1

Attribute information of patterns.

Figure 2

Relational information of patterns in complex object

As can be seen from figure 2, there is at least one random walk route from xi to xj.

Take the patterns $x_2$ and $x_8$ for example; they are 4 steps reachable patterns. With the aid of equation (5), the similarity of $x_2$ and $x_8$ with respect to the attribute information is

$$S^X(x_2,x_8)=0.986.\tag{6}$$

Obviously, there are 2 random walk routes between $x_2$ and $x_8$, which are:

• $\tau_1 : x_2 \to x_1 \to x_4 \to x_7 \to x_8$;

• $\tau_2 : x_2 \to x_3 \to x_4 \to x_7 \to x_8$.

If the route $\tau_1$ is considered, then we have

$$S_{\tau_1}^{R}(x_2,x_8)=0.633,\tag{7}$$

while if we take route $\tau_2$ into account, then

$$S_{\tau_2}^{R}(x_2,x_8)=0.627.\tag{8}$$

Substituting equations (7) and (8) into equation (3), we have $S^R(x_2, x_8) = 0.633$. Thus, by equation (4), the similarity of patterns $x_2$ and $x_8$ is $s_{28} = 0.848$. Similarly, taking patterns $x_2$ and $x_4$ as an example, the similarity with respect to the random walk route $x_2 \to x_1 \to x_4$ is

$$S_{\tau_1}^{R}(x_2,x_4)=0.456,\tag{9}$$

and the similarity with respect to the random walk route $x_2 \to x_3 \to x_4$ is

$$S_{\tau_2}^{R}(x_2,x_4)=0.654.\tag{10}$$

Therefore, the similarity of $x_2$ and $x_4$ is $s_{24} = 0.855$.

Obviously, if only the attribute information is considered, then the similarity value of $x_2$ and $x_8$ is 0.986, and that of $x_2$ and $x_4$ reaches 0.988. If, instead, only the direct relational information is taken into consideration, then the similarity of $x_2$ and $x_8$ is equal to 0, and so is that of $x_2$ and $x_4$, because neither pair is one step reachable. The similarity of any two patterns can be found in table 2.

Table 2

The similarity of patterns.
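The two reported values can be checked directly from the numbers quoted in this section. The short sketch below takes the maximum over the two route similarities, as in equation (3), and applies the weighted formula of equation (4); the function name `combine` is hypothetical.

```python
def combine(s_rel: float, s_attr: float) -> float:
    """Weighted formula of Eq. (4)."""
    return (s_rel ** 2 + s_attr ** 2) / (s_rel + s_attr)


# Route similarities from Eqs. (7)-(10); attribute similarities 0.986 and 0.988.
s28 = combine(max(0.633, 0.627), 0.986)
s24 = combine(max(0.456, 0.654), 0.988)
print(round(s28, 3), round(s24, 3))  # 0.848 0.855
```

Both results agree with the values $s_{28} = 0.848$ and $s_{24} = 0.855$ stated in the text.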

In the above discussion, the longest length of a random walk route is not restricted. For example, $x_2$ and $x_8$ are 4 steps reachable patterns, and there are 2 possible random walk routes between them. If the longest length of a random walk route is limited to 3, then they are treated as unreachable and, according to Algorithm 2, only the attribute information is considered in the similarity measurement. For the patterns $x_2$ and $x_4$, however, which are 2 steps reachable, the threshold ${\vartheta }_{24}^{\ast }=3$ has no effect.

## 5 Conclusion

Here, the problem of how to calculate the similarity of patterns in a complex object has been discussed. Because a complex object is composed of two unrelated types of information (the vector-based attribute information and the relationship-based relational information), measuring the similarity of patterns in such an object is not simple. To this end, we proposed a random walk-based similarity measure method for patterns in complex object, in which the reachability of any two patterns plays a dominant role. On this basis, the similarity of patterns is obtained by weighting the similarity with respect to the attribute information and the similarity with respect to the relational information. Finally, a synthetic example illustrated the validity of the proposed similarity measure algorithm.

## Acknowledgement

This work is supported by the National Natural Science Foundation of China (No. 31460297), Yunnan Applied Basic Research Youth Projects (No. 2015FD032) and the Natural Science Foundation of Shandong Province (ZR2016AP12).

## References

• [1]

Li D.F., Cheng C.T., New similarity measure of intuitionistic fuzzy sets and application to pattern recognitions, Pattern Recogn. Lett., 2002, 23, 221-225.

• [2]

Maggini M., Melacci S., Sarti L., Learning from pairwise constraints by similarity neural networks, Neural Networks, 2012, 26, 141-158.

• [3]

Bustince H., Barrenechea E., Pagola M., Image thresholding using restricted equivalence functions and maximizing the measures of similarity, Fuzzy Sets Syst., 2007, 158, 496-516.

• [4]

Chambon S., Crouzil A., Similarity measures for image matching despite occlusions in stereo vision, Pattern Recogn., 2011, 44, 2063-2075.

• [5]

Yen C.Y., Cios K.J., Image recognition system based on novel measures of image similarity and cluster validity, Neurocomputing, 2008, 72, 401-412.

• [6]

Moghaddam B., Nastar C., Pentland A., A Bayesian similarity measure for deformable image matching, Image Vision Comput., 2001, 19, 235-244.

• [7]

Amigó J.M., Giménez Á., Applications of the min-max symbols of multimodal maps, Appl. Math. Nonlinear Sci., 2016, 1, 87-98.

• [8]

Guo C.X., Liu X.L., Jin M.Z., Lv Z., The research on optimization of auto supply chain network robust model under macroeconomic fluctuations, Chaos Soliton. Fract., 2015, 89, 105-114.

• [9]

Guo C.X., Qiang G., Jin M.Z., Lv Z.H., Dynamic systems based on preference graph and distance, Discrete Cont. Dyn-S., 2015, 8, 1139-1154.

• [10]

de A.T.de Carvalho F., Lechevallier Y., de Melo F.M., Partitioning hard clustering algorithms based on multiple dissimilarity matrices, Pattern Recogn., 2012, 45, 447-464.

• [11]

Egghe L., Rousseau R., Lorenz theory of symmetric relative concentration and similarity, incorporating variable array length, Math. Comput. Model., 2006, 44, 628-639.

• [12]

Gao W., Farahani M.R., Degree-based indices computation for special chemical molecular structures using edge dividing method, Appl. Math. Nonlinear Sci., 2016, 1, 99-122.

• [13]

Liu S.H., Yu F.S., Hesitation degree-based similarity measures for intuitionistic fuzzy sets, Int. J. Information and Communication Technology, 2014, 6, 7-22.

• [14]

Wu H.L., Zhao B., Overview of current techniques in remote data auditing, Appl. Math. Nonlinear Sci., 2016, 1, 145-158.

• [15]

Ramane H.S., Jummannaver R.B., Note on forgotten topological index of chemical structure in drugs, Appl. Math. Nonlinear Sci., 2016, 1, 369-374.

• [16]

Qian Y.H., Liang J.Y., Dang C.Y., Interval ordered information systems, Comput. Math. Appl., 2008, 56, 1994-2009.

• [17]

Dubois D., Prade H., Gradualness, uncertainty and bipolarity: making sense of fuzzy sets, Fuzzy Sets Syst., 2012, 192, 3-24.

• [18]

Ban A., Coroianu L., Grzegorzewski P., Trapezoidal approximation and aggregation, Fuzzy Sets Syst., 2011, 177, 45-59.

• [19]

de Campos Ibáñez L.M., González Muñoz A., A subjective approach for ranking fuzzy numbers, Fuzzy Sets Syst., 1989, 29, 145-153.

• [20]

Zadeh L.A., Fuzzy sets, Information and Control, 1965, 8, 338-353.

• [21]

Bortolan G., Degani R., A review of some methods for ranking fuzzy subsets, Fuzzy Sets Syst., 1985, 15, 1-19.

• [22]

Zhang Q.S., Jiang S.Y., Jia B.G., Luo S.H., Some information measures for interval-valued intuitionistic fuzzy sets, Inform. Sciences, 2010, 180, 5130-5145.

• [23]

Atanassov K.T., Intuitionistic fuzzy sets, Fuzzy Sets Syst., 1986, 20, 87-96.

• [24]

Qian Y.H., Liang J.Y., Song P., Dang C.Y., On dominance relations in disjunctive set-valued ordered information systems, Int. J. Inf. Tech. Decis., 2010, 9, 9-33.

• [25]

Newman M.E.J., Detecting community structure in networks, Eur. Phys. J. B, 2004, 38, 321-330.

• [26]

Rossi L., Torsello A., Hancock E.R., Measuring graph similarity through continuous-time quantum walks and the quantum Jensen-Shannon divergence, Phys. Rev. E., 2015, 91, 12 pages.

• [27]

Rossi L., Torsello A., Andrea E.R., Attributed graph similarity from the quantum Jensen-Shannon divergence, Lecture Notes in Comput. Sci., 2013, 7953, 204-218.

• [28]

Cason T.P., Absil P.A., Van Dooren P., Iterative methods for low rank approximation of graph similarity matrices, Linear Algebra Appl., 2013, 438, 1863-1882.

• [29]

Brandes U., Lerner J., Structural Similarity in Graphs (a relaxation approach for role assignment), Lecture Notes in Comput. Sci., 2005, 3341, 184-195.

• [30]

Kpodjedo S., Galinier P., Antoniol G., Using local similarity measures to efficiently address approximate graph matching, Discrete Appl. Math., 2014, 164, part 1, 161-177.

• [31]

Grewenig S., Zimmer S., Weickert J., Rotationally invariant similarity measures for nonlocal image denoising, J. Vis. Commun. Image R., 2011, 22, 117-130.

• [32]

Hosamani S.M., Correlation of domination parameters with physicochemical properties of octane isomers, Appl. Math. Nonlinear Sci., 2016, 1, 346-352.

• [33]

Fernandez M.L., Valiente G., A graph distance metric combining maximum common subgraph and minimum common supergraph, Pattern Recogn. Lett., 2001, 22, 753-758.

• [34]

Bunke H., Shearer K., A graph distance metric based on the maximal common sub-graph, Pattern Recogn. Lett., 1998, 19, 255-259.

• [35]

Chen J., Safro I., A measure of the local connectivity between graph vertices, Procedia Comput. Sci., 2011, 4, 96-205.

• [36]

Dehmer M., Emmert-Streib F., Kilian J., A similarity measure for graphs with low computational complexity, Appl. Math. Comput., 2006, 182, 447-459.

• [37]

Hidovic D., Pelillo M., Metrics for attributed graphs based on the maximal similarity common subgraph, Int. J. Pattern Recogn., 2004, 18, 299-313.

• [38]

Rupp M., Proschak A.E., Schneider G., Kernel approach to molecular similarity based on iterative graph similarity, J. Chem. Inf. Model., 2007, 47, 2280-2286.

• [39]

Zager L.A., Verghese G.C., Graph similarity scoring and matching, Appl. Math. Lett., 2008, 21, 86-94.

Accepted: 2016-09-26

Published Online: 2017-04-14

Citation Information: Open Physics, Volume 15, Issue 1, Pages 154–159, ISSN (Online) 2391-5471,
