A network-based correlation research between element electronegativity and node importance

Abstracted from real compounds, chemical elements can be considered a system tied by chemical bonds (or bonding relationships) between two elements, namely the chemical element and chemical bond system. Elements, bonds and their properties can then be studied from the view of complex networks. Building on previous work, we introduce bond polarity to judge edge direction and select four electronegativity scales to build directed chemical bond networks. Taking node importance and element electronegativity as an example, we discuss the relationships between chemical and network properties. Quantitative analysis shows that the trends of the importance scales in all networks follow similar periodic laws, and that there are statistically significant correlations between most pairs of scales. Further analysis shows that the two kinds of scales have similar chemical meanings. All these conclusions are independent of the specific electronegativity scale, even though the networks have different nodes and edges, which demonstrates the rationality and universality of the proposed method. Our research gives a network explanation of element electronegativity, and more objects and chemical properties can be studied from the view of complex networks.


Introduction
A chemical element is defined as a species of atoms with the same number of protons in the atomic nucleus [1]. All elements form a set of individuals whose properties depend only on themselves. In other words, elements are regarded as independent but not isolated. With the development of nuclear science, it has been recognized that all elements are generated from hydrogen and that each element can be transformed into others under certain conditions [2][3][4][5]. Furthermore, there exist periodic laws ordered by atomic number, and similarities among the properties of elements in the same period or group [2,6,7]. All of the above phenomena show that relationships exist among elements, which can therefore be considered a system, not merely a set.
In bond theory, chemical bonds can be considered the forces that exist between two atoms or groups of atoms, among three or more of them, or within a whole molecule [1]. Bonds are commonly classified and discussed according to the bonding elements and atomic groups, although the same bond in different environments can show different physical and chemical properties, for example, the C-C and C-H bonds in organic compounds, or the bond between Na+ and HCO3− in sodium bicarbonate. Based on known bonding laws, the properties of a bond can be predicted in an unknown environment, even if the results are sometimes inaccurate. We can therefore discuss chemical bonds and their properties in abstraction from specific compounds and apply the conclusions to real ones; one important example is VSEPR theory.
In complex networks, an edge is defined as a link between two nodes, and nodes and edges together form a network; a hyperedge is defined as a link containing one or more nodes, and hyperedges together form a hypernetwork. We can use networks and hypernetworks to describe chemical objects: nodes represent chemical objects, edges represent bonds between two chemical objects, and networks describe groups of atoms or molecules (that is, the classic structural formula of a molecule can also be considered its network description); hyperedges represent bonds among three or more chemical objects, and hypernetworks describe molecules in more detail (e.g., the delocalized bond Π₆⁶ among the six carbon atoms of benzene and Π₃⁴ among the three oxygen atoms of ozone). A molecule is a system of atoms (or atoms with given valences) and bonds, which can be described as a network, as in molecular networks, but usually we care more about the relationship between its structure and specific properties, such as boiling points [8,9]. To find common laws in properties (such as periodic laws and similarity), we choose chemical elements as our research objects and the bonds that exist between two elements as relationships. All chemical elements can then be considered a system tied by chemical bonds or bonding relationships, namely the chemical element and chemical bond system (CECBS). In this system, the bonding status of an element reflects its chemical properties, which can be studied from the view of complex networks. In our previous work, we built the undirected chemical bond network, and the study of this network demonstrated the feasibility and rationality of this methodology [10,11].
As a property of chemical bonds, bond polarity measures the degree of deviation (or the occurrence probability) of bonding electrons between two atoms or groups of atoms, and reflects the difference between their abilities to attract bonding electrons [1,12]. By analogy with concepts in complex networks, edge direction can describe the polarity of a chemical bond between two given elements, yielding a directed network based on CECBS, namely the directed chemical bond network (DCBN).
In chemical research, chemists often use the electronegativity of an element, denoted χ, to reflect the ability of its atoms to attract bonding electrons [1,13]. Thus, the polarity of a bond between two bonding elements can be approximately judged by the relative sizes of their electronegativity values [14]. Various electronegativity scales have been proposed based on different averaged properties [15,16], such as the Pauling scale [13], the Allred-Rochow scale [17] and the Mulliken scale [18]. These scales are calculated from bond energy, ionization potential, electron affinity or other properties, and can be used to measure the ability of elements, or of atoms with various valences, to attract bonding electrons. Limited by research conditions, we cannot collect or measure the actual polarity of enough bonds, but we can use the relative sizes of element electronegativity to judge bond polarity and edge direction. In our work, a directed chemical bond network is therefore built from an electronegativity scale according to the following rule.
Rule: Suppose that a chemical bond can stably exist between atoms A and B in one or more molecules; then there exists a bonding relationship between elements A and B. If the electronegativity of element A, denoted χ_A, is equal to χ_B of element B, the relationship is nonpolar and there are exactly two directed edges between the corresponding nodes, denoted A → B and A ← B; if χ_A is smaller than χ_B, the relationship is polar and there is exactly one directed edge between the corresponding nodes, denoted A → B.
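The Rule can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_dcbn`, the input format (a dictionary of χ values plus a list of unordered element pairs) and the approximate Pauling values in the example are all assumptions chosen for illustration.

```python
def build_dcbn(chi, bonds):
    """Build DCBN edges from one electronegativity scale (sketch of the Rule).

    chi   -- dict mapping element symbol -> electronegativity; elements
             with unknown values are simply absent.
    bonds -- iterable of unordered (A, B) pairs with a bonding relationship.
    Returns a set of directed edges (u, v), read as u -> v.
    """
    edges = set()
    for a, b in bonds:
        # An edge can exist only if both elements have a known chi.
        if a not in chi or b not in chi:
            continue
        if chi[a] == chi[b]:
            # Nonpolar relationship: exactly two opposite edges.
            edges.add((a, b))
            edges.add((b, a))
        elif chi[a] < chi[b]:
            # Polar relationship: the edge points to the more electronegative element.
            edges.add((a, b))
        else:
            edges.add((b, a))
    return edges


# Approximate Pauling values, for illustration only.
chi_p = {"F": 3.98, "O": 3.44, "Cl": 3.16, "Xe": 2.6, "Na": 0.93}
bonds = [("Xe", "F"), ("Xe", "O"), ("Xe", "Cl"), ("O", "F"), ("Na", "Cl")]
print(sorted(build_dcbn(chi_p, bonds)))
# [('Na', 'Cl'), ('O', 'F'), ('Xe', 'Cl'), ('Xe', 'F'), ('Xe', 'O')]
```

Using a set of edges makes the "exactly one" and "exactly two" conditions automatic even if the same bonding pair is listed more than once.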
A directed edge can exist in DCBN only if the two elements have a bonding relationship and both have known χ values. This rule ignores differences in the properties of atoms and bonds across molecules, so it can be used to measure the properties of elements as a whole. Based on DCBN, we analyze the relationship between the chemical and network properties of each element and demonstrate the rationality of studying elements and their chemical properties from the view of complex networks. Taking element electronegativity and node importance as an example, we discuss their relationship quantitatively and qualitatively in this article. To ensure the universality of the results, we measure the node importance of all elements with two network indices, degree centrality and PageRank, which reflect network properties from different angles.

Node importance
As a topological property in complex networks, node importance reflects the location of a node in a network and measures its influence on the whole structure [19,20]. Generally, the greater the importance value of a node, the more important the node is and the greater its influence on network topology. In other words, if an important node is deleted from a network, the topology or topological properties of the network will differ from the original ones. Nodes in different networks, or at different locations in the same network, have different importance values because of their topological differences. Therefore, the distribution of node importance reflects the topological features of a network.
In directed networks, many indices have been proposed to measure node importance from various views [20,21], for example, degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, Hyperlink-Induced Topic Search (HITS) [22] and PageRank [23]. According to the key quantity on which they are based, these indices can be divided into three groups, as shown in Table 1. Based on our previous work, the shortest paths between two elements in CECBS cannot be explained well from the view of chemistry; thus, indices based on shortest paths are not considered in this work. Taking popularity and topological meaning into account, we use degree centrality and PageRank to measure the importance of elements from the micro and macro views of DCBN, respectively.

Degree centrality
In directed networks, the degree of a node is the number of edges directly linked to it, and it is divided into two parts: indegree and outdegree. Correspondingly, degree centrality is divided into indegree centrality and outdegree centrality.
In a directed network, if node A has an edge with node B, A is called a neighbor of B; if the edge is A → B, A is an in-neighbor of B and B is an out-neighbor of A. The degree of a node is the number of edges linked to it, its indegree is the number of edges pointing to it (the number of in-neighbors), and its outdegree is the number of edges pointing away from it (the number of out-neighbors).
Suppose that a directed network has N nodes, and that node i has indegree k_i^in, outdegree k_i^out and degree k_i = k_i^in + k_i^out. The degree centrality of node i, denoted DC_i, is defined as

$$DC_i = DC_i^{\mathrm{in}} + DC_i^{\mathrm{out}} = \frac{k_i^{\mathrm{in}}}{N-1} + \frac{k_i^{\mathrm{out}}}{N-1} = \frac{k_i}{N-1}, \qquad (1)$$

where DC_i^in is its indegree centrality and DC_i^out is its outdegree centrality [20,21]. In a directed network, the DC_in and DC_out values of all nodes range from 0 to 1, and their DC values range from 0 to 2.
If a node has more directed edges than others, its DC_in or DC_out is closer to 1, its DC is closer to 2, and the node is considered more important.
Taking the edges of element Xe in a DCBN as an example, Xe has three edges, Xe → F, Xe → O and Xe → Cl, which form a directed network of four nodes and three edges, as shown in Figure 1. In this network, only the three nodes F, Cl and O are linked to Xe, and all of these edges point away from Xe; thus, the degree of Xe is 3, its indegree is 0 and its outdegree is 3, as shown in Table 2. From equation (1), with N = 4, we obtain DC(Xe) = 1, DC_in(Xe) = 0 and DC_out(Xe) = 1.

In DCBN, the degree centralities of an element reflect its bonding status and the chemical environment around it: the DC of an element reflects its chemical reactivity, and DC_in and DC_out reflect the ability of its atoms to attract or lose bonding electrons. The relative sizes of the three degree centralities of two elements reflect the difference in their chemical properties. Generally, the more active an element is, the more bonds its atoms can form, and the bigger its DC is. The stronger the ability of its atoms to attract bonding electrons, the bigger the DC_in of the element; the stronger their ability to lose bonding electrons, the bigger its DC_out.
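The Xe calculation can be reproduced with a short sketch. It assumes the normalization by N - 1 shown in equation (1), which is consistent with the stated ranges (0 to 1 for DC_in and DC_out, 0 to 2 for DC); the function name `degree_centralities` is hypothetical.

```python
def degree_centralities(nodes, edges):
    """Return {node: (DC, DC_in, DC_out)}, normalising degrees by N - 1."""
    n = len(nodes)
    indeg = dict.fromkeys(nodes, 0)
    outdeg = dict.fromkeys(nodes, 0)
    for u, v in edges:  # directed edge u -> v
        outdeg[u] += 1
        indeg[v] += 1
    result = {}
    for v in nodes:
        dc_in = indeg[v] / (n - 1)
        dc_out = outdeg[v] / (n - 1)
        result[v] = (dc_in + dc_out, dc_in, dc_out)
    return result


# The four-node Xe network of Figure 1.
nodes = ["Xe", "F", "O", "Cl"]
edges = [("Xe", "F"), ("Xe", "O"), ("Xe", "Cl")]
dc = degree_centralities(nodes, edges)
print(dc["Xe"])  # (1.0, 0.0, 1.0)
```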

PageRank
PageRank is another importance scale for directed networks; it computes the probability of visiting each node, and these values reflect the importance of the nodes in the whole network. To study the importance of chemical elements more comprehensively, we select the PageRank index to measure element importance from the macro view of network topology. In 1998, Larry Page and Sergey Brin proposed the PageRank algorithm based on the hyperlinks between web pages. The key idea of this algorithm is that the importance of a node depends on the number and importance of the nodes pointing to it [23,24]. In a directed network, the PageRank value of node i, denoted PR_i, is the stable value of PR_i(k) after iteration,

$$PR_i = \lim_{k \to \infty} PR_i(k), \qquad (2)$$

and in each iteration this value is calculated by

$$PR_i(k) = \frac{1-s}{N} + s \sum_{j \to i} \frac{PR_j(k-1)}{k_j^{\mathrm{out}}}, \qquad i = 1, 2, \ldots, N, \quad k \ge 1, \qquad (3)$$

where N is the number of nodes in the network, s is a given constant, the sum runs over all nodes j with an edge j → i, and k_j^out is the outdegree of node j. The convergence of the PageRank algorithm has been shown to be independent of the initial PageRank values. To ensure that the PageRank values of all nodes sum to 1 in each iteration, the initial values are set as

$$PR_i(0) = \frac{1}{N}, \qquad i = 1, 2, \ldots, N, \qquad (4)$$

and s is set to the recommended value 0.85 [21,23].
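A power-iteration sketch of the PageRank update described above is given below. The paper does not say how nodes with outdegree 0 are handled; in every DCBN, F is such a node, since no edge leaves the most electronegative element. As an assumption, this sketch spreads the rank held by such dangling nodes uniformly over all nodes (the default behaviour of NetworkX's `pagerank`), which keeps the values summing to 1. The function name and tolerance are illustrative choices.

```python
def pagerank(nodes, edges, s=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank on a directed edge list (edge u -> v)."""
    n = len(nodes)
    out_nbrs = {v: [] for v in nodes}
    for u, v in edges:
        out_nbrs[u].append(v)
    pr = {v: 1.0 / n for v in nodes}  # PR_i(0) = 1/N
    for _ in range(max_iter):
        # Assumption: rank held by dangling nodes (outdegree 0) is spread uniformly.
        dangling = sum(pr[v] for v in nodes if not out_nbrs[v])
        new = {v: (1.0 - s) / n + s * dangling / n for v in nodes}
        for u in nodes:
            k_out = len(out_nbrs[u])
            for v in out_nbrs[u]:
                new[v] += s * pr[u] / k_out  # u passes PR_u / k_u^out to each out-neighbour
        delta = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if delta < tol:
            break
    return pr


pr = pagerank(["Xe", "F", "O", "Cl"], [("Xe", "F"), ("Xe", "O"), ("Xe", "Cl")])
print({k: round(v, 4) for k, v in pr.items()})
# {'Xe': 0.2062, 'F': 0.2646, 'O': 0.2646, 'Cl': 0.2646}
```

In the toy Xe network, the three elements that Xe points to end up with a larger PR than Xe itself, matching the idea that rank flows toward the more electronegative end of each edge.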
Differences in the PageRank values of nodes reflect differences in their locations in the network structure. The bigger the PR of a node, the more likely it is to be visited during a random walk over the whole network, and the more important it is considered to be [21,23].
In each iteration, the PR of an element is calculated from the PageRank values of the elements that point directly to it in DCBN, which reflects the ability of its atoms to attract bonding electrons and show negative valence. After enough iterations, its stable PR depends on the PageRank values of all other elements in the network, not just those pointing to it. The PageRank index therefore quantitatively measures the location of each chemical element in the whole DCBN: if the PR of element A is bigger than that of element B, a random walk on DCBN is more likely to visit A than B, and A is considered more important than B.

The directed chemical bond network
Complex networks have already been applied to study chemical objects, such as compounds and their elements [25,26] and binary compounds and their stoichiometric factors [27][28][29]. Unlike these previous studies, we build CECBS in this article from 97 chemical elements and 2,198 bonding relationships extracted from 4,274 binary compounds of two different elements (the bond data are taken from our previous work [10]). Figure 2 shows the distribution of the collected data, where bonding element pairs are marked in red.
Along the axes of Figure 2, the 97 elements are divided into 6 color-coded groups: nonmetals in green, group 18 elements in yellow, metals in blue, transition metals in red, lanthanides in sky blue and actinides in pink. To distinguish them from stable elements, radioactive elements such as Tc, Pm and Po are separated by gaps. The collected data cover 47.21% of the possible chemical bonds among all 97 elements and 61.77% of those among the 78 involved stable elements with stable isotopes (all stable elements in nature except the group 18 elements He, Ne and Ar), especially bonds between nonmetals and metals and bonds between elements in periods 1-5, which supports the reliability and universality of our research.
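As a quick arithmetic check of the coverage figures, assume (the denominator is not stated explicitly) that "possible chemical bonds" means unordered pairs of distinct elements. Then 2,198 relationships out of C(97, 2) = 4,656 pairs reproduces the 47.21% figure, and the same assumption converts the 61.77% figure into an approximate pair count for the 78 stable elements.

```python
from math import comb

# Assumption: "possible chemical bonds" = unordered pairs of distinct elements.
possible_pairs = comb(97, 2)                # 4656
coverage = 100 * 2198 / possible_pairs
print(possible_pairs, round(coverage, 2))   # 4656 47.21

# Under the same assumption, 61.77% of the C(78, 2) = 3003 pairs among the
# 78 stable elements corresponds to roughly this many collected pairs:
print(round(0.6177 * comb(78, 2)))          # 1855
```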
To rule out the dependence of the conclusions on a specific scale, we choose four different electronegativity scales, namely the Nagle scale χ_α [30], the Allred-Rochow scale χ_ar [17], the Pauling scale χ_p [13] and the Allen scale χ_s [31], to build directed chemical bond networks. The values of all four scales, ordered by increasing atomic number, are shown in Figure 3, where the Pauling values are taken from WebElements [6] and the values of the other three scales are taken from their original references [17,30,31].
Excluding missing values, the trends of all four electronegativity scales show similar periodic laws, especially for nonmetals and for metals in groups 1-2 and 13-18. To quantify their similarity, we perform Pearson and Spearman correlation analyses at the 0.001 significance level between every pair of scales, as shown in Figure 4 (Pearson correlation coefficients in the lower-left triangle and Spearman coefficients in the upper-right triangle). All Pearson coefficients are greater than 0.90, indicating significant positive correlations between any two scales.
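A dependency-free sketch of the two coefficients, with missing values excluded pairwise as described above, is shown below. The helper names and the toy values are hypothetical, and p-values are not computed here (in SciPy, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return both the coefficient and a p-value).

```python
from math import sqrt


def paired(a, b):
    """Keep only elements present in both scales (pairwise exclusion of missing values)."""
    keys = sorted(set(a) & set(b))
    return [a[k] for k in keys], [b[k] for k in keys]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy scales; "Ar" is missing from scale_b and is dropped.
scale_a = {"Na": 0.9, "Mg": 1.3, "Cl": 3.2, "F": 4.0, "Ar": 3.5}
scale_b = {"Na": 1.0, "Mg": 1.2, "Cl": 2.8, "F": 4.1}
x, y = paired(scale_a, scale_b)
print(round(pearson(x, y), 3), spearman(x, y))
```

Spearman's coefficient depends only on the orderings, so two scales that rank elements identically give exactly 1 even when their numerical spacing differs.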
To study network structure, we count the nodes, edges and common edges of the built networks and calculate the ratio of common edges to all edges, as listed in Table 3. The four networks share 308 common edges, and the ratios of common edges differ because the number of elements with known χ differs. Although the four electronegativity scales are strongly positively correlated, there are still significant differences among the topologies of these networks. Edge directions in the four networks are determined only by the relative electronegativity values of bonding elements; thus, it is the differences in relative electronegativity within bonding pairs, not the electronegativity values themselves, that cause the structural differences and similarities among these networks. To analyze these differences further, we calculate basic topological parameters, namely average degree, network density, average path length and network diameter [20,21], as listed in Table 3. Although the edges of the networks differ, all of them share the topological features of large average degree, high density and small-world structure, which means that their topologies are significantly similar. We conclude that these features are independent of the selected scale and reflect the structure of DCBN built from bond polarity.
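The common-edge ratio and the four parameters can be sketched as follows. The paper does not give formulas for them, so the conventions here are assumptions: average degree is taken as the mean total (in plus out) degree 2m/N, density as m/(N(N - 1)), and shortest paths follow edge direction and are averaged over reachable ordered pairs only. Because DCBN edges mostly point toward more electronegative elements, many ordered pairs are unreachable in the directed sense, so the paper may use a different convention (for example, paths on the undirected graph). Function names are hypothetical.

```python
from collections import deque


def common_edge_ratios(edge_sets):
    """Edges shared by all networks, and their share of each network's edges."""
    common = set.intersection(*edge_sets)
    return common, [len(common) / len(edges) for edges in edge_sets]


def basic_parameters(nodes, edges):
    """Average degree, density, average path length and diameter of a directed graph."""
    n, m = len(nodes), len(edges)
    out_nbrs = {v: set() for v in nodes}
    for u, v in edges:
        out_nbrs[u].add(v)
    avg_degree = 2 * m / n        # assumption: mean total (in + out) degree
    density = m / (n * (n - 1))   # fraction of possible ordered pairs that are edges
    total = count = diameter = 0
    for src in nodes:             # BFS from every node along edge directions
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in out_nbrs[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for v, d in dist.items():
            if v != src:
                total += d
                count += 1
                diameter = max(diameter, d)
    avg_path = total / count if count else float("inf")  # reachable pairs only
    return avg_degree, density, avg_path, diameter


print(basic_parameters(["A", "B", "C"], {("A", "B"), ("B", "C")}))
```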

Degree centrality
We use formula (1) to calculate the DC, DC_in and DC_out values of all elements in all DCBNs and plot their trends, ordered by increasing atomic number, as shown in Figures 5-7.
Among all involved elements in each network, F is the only element with the maximum DC and DC_in and the minimum DC_out. In practice, F is so chemically active that it can form stable bonds with most chemical elements, including elements in group 18, and some of these bonds can even form spontaneously without additional conditions. In binary compounds, F is the only element that never shows positive valence, owing to its strongest ability to attract bonding electrons. The values of two other active nonmetals, O and Cl, are similar to those of F in each network. Atoms of group 18 elements neither attract nor lose electrons easily; they form only a few binary compounds and stable bonds with few elements under specific conditions, e.g., F, Cl and O. In our network analysis, Kr, Xe and Rn have smaller DC, DC_in and DC_out values than most other elements, which is consistent with their known chemical properties.
Comparing degree centrality values by element type, we find that the DC_in values of nonmetals are generally bigger than those of metals, while the DC_out values of nonmetals are generally smaller than those of metals.
In reality, atoms of nonmetals attract bonding electrons much more readily than atoms of metals, and when nonmetals bond with metals, the nonmetals show negative valence while the metals show positive valence.
In the above figures, the DC, DC_in and DC_out distributions of all involved elements follow similar periodic laws: in each period of the periodic table, DC and DC_in values generally increase and DC_out values generally decrease; in each group, DC, DC_in and DC_out values generally decrease. The elements that an element bonds with depend on its chemical properties, which are determined by its configuration of extranuclear electrons; this makes the three trends follow the periodic laws.

PageRank
Based on formulas (2)-(4), we calculate the PR values of all elements in each DCBN; the results, ordered by increasing atomic number, are shown in Figure 8.
Although the networks differ in size and topology (see Table 3), the PR trends of all elements still follow similar periodic laws, which agree with the laws of the corresponding electronegativity scales shown in Figure 3. Generally, the bigger the χ of an element, the bigger its PR, which suggests a positive correlation between PR and χ. Considering the elements in each period of the periodic table, we find that in each DCBN the PR values follow these laws: (1) Halogens have the biggest PR values of all, and their abilities to attract electrons are also the strongest (items (2)-(5) are given below). All of the above phenomena show that the electronegativity values of chemical elements are generally associated with the structure of the corresponding network and with their locations in it. It is the self-organization of all elements that leads to the significant periodic laws shown by the PageRank trends. All these laws are independent of the selected electronegativity scale and are inherent properties of DCBNs and CECBS.

Correlation analysis and significance test
To quantify the similarity between element electronegativity and node importance, we use the Pearson and Spearman coefficients to measure their correlations and perform significance tests. The correlation results are shown in Figures 9 and 10, where colors and shapes have the same meanings as in Figure 4, and P values smaller than 0.001 are marked with a gray star. Table 4 lists the P values at the 0.001 significance level; P values greater than 0.001, for which the correlation is not statistically significant at this level, are shown in bold.
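The paper does not state which test produced its P values. As a swapped-in, distribution-free alternative, a permutation test estimates how often shuffled data give a correlation at least as strong as the observed one; with 9,999 shuffles, the smallest attainable P value is 1/10,000, below the 0.001 threshold. The function name and the toy data below are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=9999, seed=0):
    """Two-sided permutation P value for Pearson's r."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself.
    return (hits + 1) / (n_perm + 1)


# Hypothetical: 20 elements whose importance rises linearly with chi.
chi = [0.5 + 0.17 * i for i in range(20)]
importance = [0.01 + 0.002 * i for i in range(20)]
p = permutation_p_value(chi, importance)
print(p, p < 0.001)
```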
Unlike the other importance scales, DC_out is the only one that is negatively correlated with the corresponding χ. The reason is that the DC_out of an element decreases as the ability of its atoms to attract electrons increases, whereas χ and the other importance scales all increase with this ability. Among all correlation coefficients, only the Pearson coefficients between DC and χ have absolute values smaller than 0.6, and their P values are not always smaller than 0.001, indicating that there may be no significant correlation between them. In reality, a bigger DC does not always imply a bigger χ: for example, elements in groups 1-2 have smaller electronegativity values than others, yet they can form stable bonds with many elements.
The absolute values of all other coefficients are greater than 0.65, and their P values are all much smaller than 0.001, indicating significant positive or negative correlations between element electronegativity and node importance. In other words, the electronegativity value of an element is associated with its location in the corresponding DCBN and with the whole structure of the network.

Chemical meaning discussion
Comparing the electronegativity and importance values of elements A and B, we find that in some cases, if χ_A is larger than χ_B, then DC(A) > DC(B), DC_in(A) > DC_in(B), PR(A) > PR(B) and DC_out(A) < DC_out(B). Taking the bonds O-F and Na-Cl as examples, the polarity of O-F points from O to F and that of Na-Cl from Na to Cl; the differences in χ_p, DC, DC_in and PR match the polarity of both bonds, while the difference in DC_out matches the reverse direction, as shown in Table 5. Thus, electronegativity and importance values are not only significantly correlated, but their pairwise differences also agree. To quantify how often this occurs, we calculate the relative values of electronegativity and importance between every two elements with known electronegativity values and count the percentage of pairs showing the above phenomenon, as shown in Table 6. Among all importance scales, only the differences in DC_out fail to match those in χ, owing to the negative correlation between DC_out and χ. The differences in the other three scales match more than 50% of those in χ, especially for DC_in and PR. The differences in node importance thus effectively reflect the relative values of element electronegativity, and both can reflect the polarity of chemical bonds, whether forward or reverse. We therefore conclude that node importance in DCBN carries chemical meaning, as element electronegativity does.
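The relative-value count behind Table 6 can be sketched as below. This is an interpretation, not the authors' code: each unordered pair of elements with known values is counted once, and a pair matches when the two differences have the same sign (a tie counts as a match only if both differences are zero). The χ values are approximate Pauling values and the PR values are hypothetical.

```python
from itertools import combinations


def sign(v):
    return (v > 0) - (v < 0)


def matching_percentage(chi, importance):
    """Percentage of element pairs whose chi and importance differences share a sign."""
    common = sorted(set(chi) & set(importance))
    matched = total = 0
    for a, b in combinations(common, 2):
        total += 1
        if sign(chi[a] - chi[b]) == sign(importance[a] - importance[b]):
            matched += 1
    return 100.0 * matched / total


chi_p = {"Na": 0.93, "Cl": 3.16, "O": 3.44, "F": 3.98}
pr = {"Na": 0.01, "Cl": 0.05, "O": 0.04, "F": 0.09}  # hypothetical PR values
print(round(matching_percentage(chi_p, pr), 2))  # 83.33 (only the Cl-O pair disagrees)
```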
Because the Pauling scale is widely used in chemical analysis, we take the differences in χ_p and PR between every pair of elements as an example and plot the overall matching status, as shown in Figure 11. In this figure, 94 elements have χ_p values, and pairs whose differences have the same sign are colored red; other colors and shapes have the same meanings as in Figure 4. For 76.07% of element pairs, the difference in node importance has the same sign as the difference in Pauling electronegativity, especially for {metal, nonmetal} and {nonmetal, nonmetal} pairs.

Element cluster analysis
All electronegativity scales reflect the chemical properties of elements from various angles and can be used to cluster chemical elements. Because of its similar chemical meaning, node importance can be used in the same way, for example, to cluster elements. To test this, we use the k-means algorithm, a method that clusters points in n-dimensional space into a given number of groups based on distances [32,33], to cluster elements by PR in the DCBN built from χ_p. The clustering results with five and seven clusters are shown in Figures 12 and 13, where elements in the same cluster share a color.
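Since each element has a single PR value, the clustering here is one-dimensional k-means (Lloyd's algorithm). A minimal sketch follows, assuming a deterministic initialization at evenly spaced sorted values; the paper does not state its initialization or software, and the PR values in the example are hypothetical.

```python
def kmeans_1d(scores, k, n_iter=100):
    """Lloyd's k-means on one score per element; returns {element: cluster index}."""
    items = sorted(scores.items(), key=lambda kv: kv[1])
    xs = [v for _, v in items]
    # Assumption: start from evenly spaced points of the sorted values (needs k >= 2).
    centroids = [xs[round(i * (len(xs) - 1) / (k - 1))] for i in range(k)]
    for _ in range(n_iter):
        # Assignment step: each element joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for name, v in items:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append((name, v))
        # Update step: each centroid moves to the mean of its members.
        new = [sum(v for _, v in c) / len(c) if c else centroids[j]
               for j, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return {name: j for j, c in enumerate(clusters) for name, _ in c}


pr = {"F": 0.09, "O": 0.06, "Cl": 0.058, "Na": 0.010, "K": 0.011}  # hypothetical
labels = kmeans_1d(pr, 3)
print(labels)
```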
The cluster results of χ_p and PR are similar as a whole, for example, in the clusters of metals, but the PR clusters also have features that χ_p cannot identify, given as items (1) and (2) below. Thus, differences in node importance can also reflect similarities in the properties of chemical elements from the view of network topology.

Conclusions
Whether directed or undirected, the chemical bond network builds a bridge between chemistry and complex networks, so that we can study the relationship between the chemical and network properties of chemical objects and explain chemical properties from the view of complex networks. In this work, we introduce the concepts of bond polarity and element electronegativity to judge edge directions and select four different electronegativity scales to build directed chemical bond networks. All the networks share the topological features of large average degree, high density and small-world structure. In each network, we analyze the relationship between element electronegativity and node importance quantitatively and qualitatively and reach the following conclusions: (1) Quantitative analysis shows that the trends of each importance scale, ordered by atomic number, follow similar periodic laws in all networks, and these laws are independent of the electronegativity scale used. Therefore, periodic laws are inherent structural properties of DCBN and CECBS.
(2) Pearson and Spearman correlations are used to quantitatively measure the relevance between element electronegativity and node importance, and the results show statistically significant positive or negative correlations in most cases. The importance value of an element can reflect its electronegativity and its ability to attract or lose electrons. (3) Through the chemical meaning discussion and the element clustering analysis, the chemical meaning of node importance is shown to be the same as that of element electronegativity. Differences in node importance can reflect the real polarity of chemical bonds and the similarity of elements in chemical properties.
To keep the trends in the figures readable, this research takes only four electronegativity scales as examples to show the relationship between node importance and element electronegativity. Among commonly used electronegativity scales, the Mulliken scale, which takes both electron affinity and ionization potential into account, may be more accurate. We have also studied the Mulliken scale and many other scales and found that their networks show similar topology and lead to conclusions that match those obtained for the four networks discussed here; the results can be reproduced with the data provided in our repository chemistry (https://github.com/liurunzhan/dataset/tree/master/chemistry). We conclude that, however it is measured, element electronegativity can be studied and measured from the view of node importance.
The purpose of this research is to demonstrate that the electronegativity of an element in CECBS is associated with the importance of its node in DCBN, which supports the feasibility and rationality of studying bond polarity and element electronegativity from the view of node importance. Based on the meanings of nodes and edges in DCBN, element electronegativity is not the only property related to network topology; others, such as bond energy and electron cloud distribution, may also be measured from network models. With more data, we can study more concepts, properties and scales in chemistry and biochemistry from the view of complex networks, and even reinterpret or redefine them.
Our research provides a different approach to studying objects and their properties and gives an explanation of network-related models applied to chemistry and biochemistry. Apart from chemical elements, other objects can also be considered a system rather than merely a set, such as atoms, atomic groups, functional groups, genes and gene segments, whose properties can also be explored with complex networks. More chemical and biochemical problems can be translated into network problems and discussed from the view of complex networks, such as the prediction of chemical bonds, the stability and similarity of compound structures and the distribution of electron clouds.

Figure 1: A directed network of Xe and its edges in a DCBN.

Figure 2: The bonding element pair distribution.

Figure 4: The correlation analysis between electronegativity scales.

Figure 5: DC changing trends in the four DCBNs.

Figure 6: DC_in changing trends in the four DCBNs.

(2) Elements Kr, Xe and Rn have smaller PR values than the other elements, owing to their chemical inertness. (3) Nonmetals, with stronger abilities to attract electrons, have bigger PR values than metals, which have stronger abilities to lose electrons; thus, nonmetals have more influence on network topology. (4) Some radioactive elements have small PR values or no PR values at all, because their nuclear instability makes it difficult to detect their bonds or determine their electronegativity values. (5) Except for group 18 elements, PR values generally increase across each period, especially for nonmetals. The values of transition metals, lanthanides and actinides are much more similar to one another than those of other elements, owing to the similarity of their extranuclear electron configurations in d and f orbitals.

Figure 9: Pearson correlation analysis in the four DCBNs.

Figure 10: Spearman correlation analysis in the four DCBNs.

Figure 11: Chemical meaning analysis of PR under the Pauling scale χ_p.

(1) Differences among nonmetals are identified, giving the sets {F}, {O, Cl}, {N, S, Se, Br, I} and {H, C, P, As}, whose elements differ in their ability to attract bonding electron pairs. (2) Similarities among metals are also identified, such as the active metals in groups 1-2, the lanthanides, the actinides, the set {Be, Mg} and the set {Al, Ga, In, Tl}. The electronegativity values of these elements are not close, but their chemical properties are similar.

Figure 12: Five groups clustered by χ_p and PR. (a) Five groups clustered by χ_p. (b) Five groups clustered by PR.

Figure 13: Seven groups clustered by χ_p and PR. (a) Seven groups clustered by χ_p. (b) Seven groups clustered by PR.

Table 1: Centrality scales in directed networks

Basis | Centrality scales | Key quantity
Degree | Degree centrality (indegree and outdegree centrality) | The neighbors (in-neighbors and out-neighbors) of a node
Shortest path | Betweenness centrality, closeness centrality | The shortest paths from or to a node
Eigenvector | Eigenvector centrality, HITS, PageRank | The probability of accessing a node

Table 2: Degrees and neighbors of Xe

Table 3: Basic topological parameters of the four DCBNs

Table 4: Pearson and Spearman significance tests between element electronegativity and node importance at the 0.001 significance level (values in bold are greater than 0.001)

Table 5: Differences in element electronegativity and node importance for the bonds O-F and Na-Cl

Table 6: Relative value analysis between element electronegativity and node importance (values in italics are smaller than 50%)