On dynamic network security: A random decentering algorithm on graphs

A Random Decentering Algorithm (RDA) on an undirected unweighted graph is defined and tested over several concrete scale-free networks. RDA introduces ancillary nodes into the given network following basic principles of minimal cost, density preservation, centrality reduction and randomness. First simulations over scale-free networks show that RDA gives a significant decrease of both betweenness centrality and closeness centrality, and hence the topological protection of the network is improved. On the other hand, the procedure is performed without significant change in the density of connections of the given network. Thus ancillae are not distinguishable from real nodes (in a straightforward way), and hence the network is obfuscated to potential adversaries by our manipulation.


Introduction
Network analysis [1] is a research field related to a wide range of scientific areas, having roots in seminal papers on social sciences [2][3][4][5] but also of increasing interest in Biology [6,7], Engineering [8,9], Finance [10] or Computer Science and Security [11][12][13]. Networks are structures collecting a fixed number of objects or entities called nodes together with relations between them called links. Hence algebraic structures like graphs, directed graphs and their weighted versions [14] are fundamental tools to describe networks in order to perform simulations and qualitative or quantitative analysis by means of statistical or numerical procedures.
In addition to social or physical networks, like social work environments [15], transportation networks [8,9], supply systems [16,17], we have technological networks, like the Internet [18] or data networks, whose nodes are computers or virtual machines or email clients [11] and links represent the data exchange between nodes.
Distributed computing over telecommunication networks, peer-to-peer file sharing, online communities, social networks or even virtual networks in the cloud are essentially graph structures, and they are emerging topics of increasing interest in several branches of applied mathematics. This applies also to privacy concerns related to data analysis over graphs and networks, including attack or fraud auditing techniques in cybersecurity research [11].
Privacy breaches in a network can be grouped into three categories [19]: 1) identity disclosure: the identity of an individual who is associated with a node is revealed; 2) link disclosure: the sensitive relationships between two individuals are disclosed; and 3) content disclosure: the sensitive data associated with each node is compromised, e.g. the email messages sent and/or received by the individuals in an email communication network. A privacy-preserving system over graphs and networks should consider all of these issues.
Those issues imply the following challenges [19]:
Challenge 1. To model the knowledge and the capability of an adversary/attacker, because any topological structure of the graph can be exploited by the attacker to derive private information.
Challenge 2. To quantify the information as a function of several different measures associated with the graph: degree, centrality, betweenness, average path length, diameter, clustering coefficient, etc. Should we attempt to modify these metrics? How?
Challenge 3. To define graph-modification algorithms that balance privacy and data utility. The nodes and links of a graph are all correlated. Thus, the impact of a single change of an edge or a node can spread across the whole network.
Challenge 4. To model the behavior of the participants involved in a network-based collaborative computing environment.
Several privacy models, adversaries and graph-modification algorithms have been proposed recently [20][21][22] in order to face the challenges above. Note also that network recovery can be approached by similar methods (see [23][24][25]). Unfortunately, it is unlikely that all problems can be solved in a single shot, as protection against each type of privacy breach requires different techniques or even a combination of them [19].
In this work we focus on topological properties related to centrality in networks. Structural central features of technological and communication networks are associated with their capacity to share and broadcast information in a secure way.
To be specific, when the flow of information is concentrated in a small number of nodes, it is easier to crash the network by removing or infecting some of those nodes before the system administrator detects and stops the attack. Thus, networks with a few such central nodes, called hubs, are more vulnerable to targeted attacks [26,27]. Our goal is to obtain suitable dynamic extensions that hide the central structure of the network, so that it becomes less vulnerable to attacks.
Therefore, it is necessary to know how principal or central a node is and, moreover, whether or not a network has a high level of centralization. Concepts of centrality and centralization measures are discussed in the following sections. Note that there have been studies on hiding nodes in a network [28], but the focus therein is not on centrality but on the percolation threshold.
Section 2 is devoted to some topics on centrality and centralization measures. Then section 3 describes the effect on the underlying graph of introducing ancillary nodes into a given network; afterwards, in section 4 we propose our Random Decentering Algorithm (RDA) to perform those dynamic extensions on networks. In section 5 we provide several numerical simulations, test RDA and analyze numerical and graphical results in order to establish some final conclusions and comments.

Graph centrality measures
From a computer science perspective, a network can be identified with a graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_n\}$ is the set of vertices or nodes and $E \subset V \times V$ is the set of links or edges; a pair $(v_i, v_j) \in E$ if and only if vertices $v_i$ and $v_j$ are connected. In this work we are interested in network structures with data exchanges, where nodes are terminals or other connected computational devices. Therefore, two nodes are linked if and only if they are connected and there is information exchange between them. Several approaches to technological communication networks could be modeled by graphs, but it is not within the scope of this paper to cover all cases, so we consider only binary relations, which implies that the underlying graphs are undirected and unweighted. The graph will also be assumed to have neither isolated nodes nor loops. We will call such networks simple networks. Finally, the networks will be assumed to be connected. We refer to Brandes and Erlebach [1] for terminology and basic properties of Network Analysis.
An undirected graph $G = (V, E)$ consists of two sets $V$ and $E$, such that $V \neq \emptyset$ and $E$ is a set of unordered pairs of elements of $V$. A graph $G$ is completely determined by its adjacency matrix $A = (a_{ij})$, with $a_{ij} = 1$ if $v_i$ and $v_j$ are linked and $a_{ij} = 0$ otherwise. This matrix contains relevant information about the topology of the network, such as centralization, number of triangles, diameter of the network, presence of cohesive clusters, bipartite character, randomness, etc. ([1,5,29]).
We briefly discuss in this section some measures of centrality of a single node within a graph (point centrality measures) and Freeman's normalization procedure to measure the centrality of the graph taken as a whole (network centralization measures). Much work has been done on centrality properties and their applications [30][31][32]. In this paper we follow Freeman [33], who in the seventies collected and formalized the concepts of point centrality and network centrality.

Definition 1. Degree point centrality computes the total number of adjacencies (direct neighbors) of a node $v_k$, which corresponds to the sum of the terms of row (or column) $k$ of the adjacency matrix $A = (a_{ij})$:
$$C_D(v_k) = \sum_{i=1}^{n} a_{ik}.$$
Therefore, a node with high degree centrality is an important node of communication, due to its capacity to have direct contact with a great number of nodes in the network. Nevertheless, a node could have high degree but be poorly connected to the rest of the network if its neighbors have no links to other nodes. So, it is central in a "local sense". To overcome this limitation, we consider betweenness and closeness point centrality measures, which take into account the connections of all the nodes in the network.
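As a small illustration of Definition 1, the degree of every node can be read off as the row sums of the adjacency matrix. The sketch below is not taken from the paper: the function name `degree_centrality` and the four-node star used as input are illustrative assumptions.

```python
def degree_centrality(adj):
    """Return the degree of every node as the row sums of a 0/1 adjacency matrix."""
    return [sum(row) for row in adj]


# Illustrative input (an assumption, not data from the paper):
# a star whose centre is node 0 and whose leaves are nodes 1, 2, 3.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
degrees = degree_centrality(star)
```

The centre of the star has degree 3 and each leaf has degree 1, which matches the "local" reading of degree centrality discussed above.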
Betweenness point centrality quantifies the ratio of geodesics (shortest paths) linking two nodes that pass through a third point $v_k$ with respect to all geodesics between them.

Definition 2. If $g_{i,j}$ denotes the number of geodesics connecting $v_i$ and $v_j$, and $g_{i,j}(v_k)$ the number of those shortest paths passing through $v_k$, the betweenness index of $v_k$ is given by
$$C_B(v_k) = \sum_{i<j} \frac{g_{i,j}(v_k)}{g_{i,j}}.$$
A node with a high value of betweenness is also a communication hub, having the capacity to control a significant part of the flow of information in the network due to the great proportion of nodes communicating through it.
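The betweenness index can be computed without enumerating geodesics explicitly. The following is a minimal sketch, assuming an undirected, unweighted graph given as a 0/1 adjacency matrix; it uses the well-known breadth-first accumulation scheme due to Brandes, which is a choice made here and not necessarily the procedure used by the authors. The name `betweenness` is hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Sum over unordered pairs {i, j} of g_ij(v_k) / g_ij, for every node v_k."""
    n = len(adj)
    neighbours = [[j for j in range(n) if adj[i][j]] for i in range(n)]
    score = [0.0] * n
    for s in range(n):
        # Breadth-first search from s: count shortest paths (sigma)
        # and remember the predecessors of each node on those paths.
        dist = [-1] * n
        sigma = [0] * n
        preds = [[] for _ in range(n)]
        dist[s], sigma[s] = 0, 1
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbours[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to s.
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was visited once from each endpoint.
    return [b / 2.0 for b in score]
```

For a star with three leaves, the centre lies on the single geodesic of each of the three leaf pairs, so its index is 3, while every leaf has index 0.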

Definition 3. The closeness index of a node $v_k$ is given by
$$C_C(v_k) = \left( \sum_{i=1}^{n} d_{i,k} \right)^{-1},$$
where $d_{i,k}$ denotes the distance between nodes $v_i$ and $v_k$.
A node with high closeness is an important communication node, related to efficiency [34] and minimal cost in communication, due to its proximity to other nodes in the network. The three structural point centrality measures described above strongly depend on the network size and have normalized versions. In section 5, we compute and plot these measures for simulated networks and apply Freeman's normalization in order to compare centrality across networks while removing the network size effect.
Centrality network indexes (or centralization indexes) quantify the homogeneity of point centrality of all the nodes in the network. Networks with a high level of centralization have great differences between the point centrality value of the most central point and the others. Networks with a low level of centralization have homogeneous point centrality values, which are around the value of the most central node.
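A short sketch of how closeness can be obtained from breadth-first distances follows. It assumes Freeman's usual form, the inverse of the sum of distances from $v_k$ to all other nodes, and a connected graph, as required of the networks in this paper. The helper names `distances_from` and `closeness` are illustrative.

```python
from collections import deque


def distances_from(adj, source):
    """Breadth-first distances from `source` in an unweighted graph."""
    n = len(adj)
    dist = [-1] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in range(n):
            if adj[v][w] and dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """Inverse of the sum of distances to all other nodes (assumed Freeman form).

    Assumes a connected graph with at least two nodes, so every sum is positive.
    """
    return [1.0 / sum(distances_from(adj, k)) for k in range(len(adj))]
```

In a star with three leaves the centre is at distance 1 from every other node (index 1/3), whereas a leaf is at distances 1, 2, 2 (index 1/5), so the centre is again the most central node.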
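Definition 4 (cf. Freeman [33]). Freeman's normalized indexes of network centralization are defined as follows:
$$C_X = \frac{\sum_{i=1}^{n} \left[ C_X(v^*) - C_X(v_i) \right]}{\max_{G \in \mathcal{G}_n} \sum_{i=1}^{n} \left[ C_X(v^*) - C_X(v_i) \right]},$$
where $C_X(v_i)$ is a point centrality measure, $C_X(v^*)$ is its value at the most central node and $\mathcal{G}_n$ is the set of networks of size $n$. Note that $C_X$ takes values between 0 and 1.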
It is evident that from the point of view of degree, betweenness and closeness point centrality, the center of the star is the most central node in all three cases.The star is the network in which the maximum value in the denominator is reached.
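For degree, the normalization can be written out explicitly. The sketch below assumes Freeman's value $(n-1)(n-2)$ for the maximum in the denominator, which is the sum of differences attained by the star on $n$ nodes; the function name `degree_centralization` is illustrative.

```python
def degree_centralization(adj):
    """Freeman's degree centralization of a simple graph with n >= 3 nodes.

    The denominator (n - 1)(n - 2) is the sum of degree differences
    reached by the star, the most centralized network of size n.
    """
    n = len(adj)
    if n < 3:
        raise ValueError("degree centralization needs at least 3 nodes")
    degrees = [sum(row) for row in adj]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))
```

A star yields the value 1, while any regular graph, such as a cycle, yields 0 because all degrees coincide.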

Theorem 1. Degree-based centrality measures are computed as follows: (i) Degree network centrality
Proof. (i) and (ii): see Freeman [33]. (iii) is a slight modification of its counterpart, Freeman's coefficient. Considering point closeness centrality as $C_C(v_i) = \sum_{j} d_{i,j}$, it is easy to prove that the maximum value of

Dynamic extensions
Hiding network topology can be important in order to protect the identity of nodes or other confidential parameters [35]. Our goal in this paper is to hide information about the topology of a given network by adding ancillary nodes. Concretely, given a network (fig. 1), we add a new node A and its connections to the remaining nodes (fig. 2). Note that this manipulation is cheap and feasible in virtual environments where nodes are virtual machines in a cloud. Some topological properties may be directly shifted by means of such an operation because, in particular, the characteristic polynomial of the graph and hence its roots (the spectrum) are modified.
Note also that in such a dynamic enlargement both the number of ancillary nodes added as well as their connections have to be decided.

Definition 5. A dynamic enlargement of a graph G
We deal with adjacency matrices of simple and connected networks. Consequently, these matrices are binary and symmetric, their diagonal entries are equal to zero, and no permutation of the nodes brings the adjacency matrix into block-diagonal form. In particular, matrices of above examples are:

Extended Network
Dynamic enlargement with one ancilla gives the adjacency matrix A(Γ (G)), where the connections ⋆ have to be decided.
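In matrix terms, adding one ancilla amounts to bordering the adjacency matrix with a new row and column whose entries are the connections still to be decided. The following sketch shows this operation once the targets have been chosen; the function name `add_ancilla` and the three-node path used as input are illustrative assumptions.

```python
def add_ancilla(adj, targets):
    """Return a new adjacency matrix with one ancillary node linked to `targets`.

    The input matrix is left untouched; the ancilla gets index len(adj).
    """
    n = len(adj)
    chosen = set(targets)
    if not chosen:
        raise ValueError("the ancilla needs at least one link to stay connected")
    if not all(0 <= i < n for i in chosen):
        raise ValueError("targets must be existing node indices")
    # Append the new column to every existing row.
    enlarged = [row[:] + [1 if i in chosen else 0] for i, row in enumerate(adj)]
    # Append the new row; the diagonal entry stays 0 (no loops).
    enlarged.append([1 if i in chosen else 0 for i in range(n)] + [0])
    return enlarged


# Illustrative input: the path 0 - 1 - 2, with an ancilla linked to both ends.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
enlarged = add_ancilla(path, [0, 2])
```

The result is again binary and symmetric with zero diagonal, so the enlarged network remains a simple network in the sense used here.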

The dynamic Random Decentering Algorithm (RDA)
Now we propose an algorithm transforming a given graph G into a dynamic enlargement Γ (G) of G according to the following principles:
Principle 1. Minimal cost. The algorithm adds a single ancillary node in each step. This minimizes the manipulation cost in physical networks. However, we can create a significant number of ancillae by running the algorithm in a loop.
The notation used in the sequel is as follows: A is the adjacency matrix of the original graph G = (V , E), n = #V is the size of matrix A, m = #E is the number of edges, and the parameter p is the proportion of ancillae we add.

Principle 3. Centrality and centralization reduction. Distribution of centralities in network Γ (G) should be more homogeneous than in network G hence centralization of network decreases each time
We will obtain a sequence of dynamic extensions $A = A(0), A(1), \ldots, A(pn)$ of the original graph until the required number of ancillae is reached; the parameters $n(t)$, $m(t)$ refer to the corresponding extension, with $n = n(0)$, $m = m(0)$. We also use the notation $\Delta m(t) = m(t) - m(t-1)$ for the number of links we add in the step $A(t-1) \to A(t)$. The requirements we stated for our algorithm yield the following equalities relating the parameters: Principle 1 implies thus, the number of links added in each step $\Delta m(t) = m(t) - m(t-1)$ can be approximated by the nearest integer to the quotient m( On the other hand, to follow principles 3 and 4, a multinomial distribution law is applied. A new ancillary node links to a previous node $v_i$ with probability at each step $t$. In order to avoid multilinks we apply this probability model step by step, recalculating the probabilities $p_i$ after each selection (sampling without replacement).
We provide self-explanatory pseudocode for RDA (see Algorithm 1). After running RDA we obtain the adjacency matrices of all the successive dynamic extensions and their basic topological properties (size, number of links and node degrees).
The following sections are devoted to obtaining experimental results. The centrality measures of the enlarged networks will be computed and plotted as functions of the number of added nodes. It is worth remarking here that the original graphs are chosen to be scale-free graphs randomly obtained by the Barabási–Albert (BA) algorithm.
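One step of such a procedure can be sketched as follows. Two ingredients are assumptions made for this illustration and are not stated in the text above. First, the number of new links is taken as the nearest integer to $2m(t-1)/(n(t-1)-1)$, which is what keeping the density $2m/(n(n-1))$ constant requires when one node is added. Second, the selection weights are taken inversely proportional to the current degree, as one plausible way of favouring low-centrality nodes; the actual probabilities of RDA may differ. The names `links_to_add` and `rda_step` are hypothetical.

```python
import random


def links_to_add(n_prev, m_prev):
    """Nearest integer to 2m/(n-1): adding that many links with one new node
    leaves the density 2m/(n(n-1)) unchanged (assumed density-preservation rule)."""
    return max(1, round(2 * m_prev / (n_prev - 1)))


def rda_step(adj, rng):
    """Add one ancillary node; return the enlarged adjacency matrix."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    m = sum(degrees) // 2
    k = min(n, links_to_add(n, m))
    # ASSUMPTION: weights inversely proportional to degree (no isolated nodes,
    # so every degree is at least 1). The paper's own probabilities may differ.
    weights = [1.0 / d for d in degrees]
    candidates = list(range(n))
    chosen = set()
    for _ in range(k):
        # Draw one target, then remove it so that no multilink can appear;
        # the remaining weights are implicitly renormalised by rng.choices.
        pick = rng.choices(candidates, weights=[weights[i] for i in candidates])[0]
        chosen.add(pick)
        candidates.remove(pick)
    enlarged = [row[:] + [1 if i in chosen else 0] for i, row in enumerate(adj)]
    enlarged.append([1 if i in chosen else 0 for i in range(n)] + [0])
    return enlarged


# Illustrative run on a four-node star with a fixed seed for reproducibility.
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
extended = rda_step(star, random.Random(0))
```

Running `rda_step` in a loop, feeding each output back as input, gives a sequence of enlarged matrices; for the star above, one step adds two links and keeps the density at 0.5.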

Experiments
In this section we carry out dynamic extensions over simulated networks and analyze numerical changes in centrality and centralization measures and their relations. We have conducted the experiments on simulated data that closely model the technological data networks we are interested in.

Real world networks. Scale-free
Several models to generate real world networks have been proposed since the 1950s [36][37][38][39][40][41]. Paul Erdős and Alfréd Rényi modeled random networks, characterized by having nodes with approximately the same degree, showing a low level of centralization and being robust against targeted attacks but vulnerable to random attacks. On the other hand, scale-free networks have a few nodes with a great number of connections (called hubs), whereas most nodes have small degree. Networks with these characteristics may appear among complex and computational networks, have a high level of centralization and are more vulnerable to coordinated attacks than to random threats.
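Scale-free graphs of the kind described above are commonly grown by preferential attachment, the mechanism behind the BA model. The sketch below is a plain Python illustration of that mechanism under simple assumptions (a complete seed graph and a fixed number of links per new node); it is not the generator used for the experiments in this paper, and the name `ba_graph` and its parameters are hypothetical.

```python
import random


def ba_graph(n, links_per_node, rng):
    """Grow an n-node graph by preferential attachment; return a 0/1 adjacency matrix."""
    seed = links_per_node + 1
    if n < seed:
        raise ValueError("n must be at least links_per_node + 1")
    adj = [[0] * n for _ in range(n)]
    # Start from a complete seed graph on `seed` nodes.
    for i in range(seed):
        for j in range(i + 1, seed):
            adj[i][j] = adj[j][i] = 1
    # Every node appears in `stubs` once per incident edge, so a uniform
    # draw from the list picks a node with probability proportional to degree.
    stubs = [i for i in range(seed) for _ in range(seed - 1)]
    for new in range(seed, n):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(stubs))
        for t in targets:
            adj[new][t] = adj[t][new] = 1
            stubs.extend([new, t])
    return adj
```

Because early nodes keep accumulating stubs, a few of them tend to become hubs while most later nodes stay at the minimum degree `links_per_node`.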

Simulation data
The simulated data for our experiment were obtained with the BA scale-free algorithm [37], using the SFNG m-file for MATLAB. We carried out BA simulations and obtained networks of different sizes but similar density, which we call "original networks". The Random Decentering Algorithm (RDA) was applied with different percentages p of ancillary points added. We shall refer to the dynamic extension at each algorithm step t as an "extended network". The degree point centrality index was obtained for every node in all the networks, original and extended, and degree distribution plots were also produced. Finally, network centrality indexes were obtained together with their evolution graphs over all the algorithm steps from % to % of ancillary nodes added. Six original networks were randomly obtained by the BA algorithm, with sizes n = 40, 80, 100, 150, 200, 250. Parameters were selected in order to get networks with similar density d ≈ . . The RDA algorithm was performed for each original matrix A with proportion p = ( % of ancillary nodes added). Finally, sequences of matrices A(t) were stored from t = 0 (adjacency matrix of the original network) to t = [p * n( )] = n( ).

Results
In order to compare the point degree distribution for original and extended networks, frequency plots were produced for % and % of ancillary nodes added (figs. 3, 4). While the original scale-free networks fit decreasing exponential shapes, the extended networks show a displacement to the right and a tendency to converge towards bell shapes, typical of random networks, which are robust to targeted attacks. Centralization measures were computed for all original and extended matrices A(t). As mentioned before, all measures have been normalized. Consequently, any potential network-size effect has been removed. This reveals that centralization decreases as t increases. Table 1 contains absolute values of degree, betweenness and closeness centralization measures respectively.
As we remarked above, the original simulated networks were randomly obtained by the BA algorithm. In spite of the random character of the BA algorithm, networks obtained by RDA have similar centralization values of degree (varying between . and . ). Nevertheless, betweenness network centrality tends to decrease when n increases (from . and . ). On the one hand, the range of variation of betweenness values is in general greater for this index [4] than for the others. Furthermore, the inherent nature of the BA algorithm tends to produce short paths between nodes if seeds have small size compared to the final size of the scale-free network. This fact implies that nodes have low point betweenness centrality, and the same occurs with the betweenness network index.
We note once again that a dynamic treatment like RDA is cheap to implement on networks of virtual machines in the cloud, but its implementation in cyber-physical environments needs previous study in order to evaluate the cost of introducing each ancillary node and each ancillary link. Some pending tasks are (1) to get, if possible, a true random network from a scale-free one by means of adding ancillary nodes; (2) to compute effectively how many ancillary nodes are necessary to hide the main properties of the original network; (3) to perform experiments of order n nodes, n = , , .. in order to check our procedure in real world networks.

Fig. 3. Frequency plots for point degree with % of nodes added

Algorithm 1. Random Decentering Algorithm (RDA)
Require: A = A(0) adjacency matrix
Require: p > 0 proportion of ancillary nodes
Ensure: t number of steps and A(t) dynamic enlargement of A(0) following principles 1-4.
1: n( ) ← n and t ←
2: while i ≤ n( ) do