Abstract
Link prediction is one of the methods of social network analysis. Bipartite networks are a type of complex network that can be used to model many natural events. In this study, a novel similarity measure for link prediction in bipartite networks is presented. Because classical social network link prediction methods are less efficient and effective when applied to bipartite networks, bipartite network-specific methods are needed to solve this problem. The purpose of this study is to provide a centralized and comprehensive method based on the neighborhood structure that performs better than the existing classical methods. The proposed method consists of a combination of criteria based on the neighborhood structure. Here, the classical criteria for link prediction are redefined by adapting them to the bipartite network. These modified criteria constitute the main component of the proposed similarity measure. In addition to its simplicity and low complexity, this method has high efficiency. The simulation results show that the proposed method achieves the best performance, with a superiority of 0.5% over MetaPath, 1.32% over FriendLink, and 1.8% over Katz in the f-measure criterion.
1 Introduction
Social networks are a new generation of databases that are in the spotlight of Internet users these days [1]. Such databases operate on the basis of online organizations, and each has brought together a group of Internet users with a specific feature [2]. Today, social networks have become widespread, and there is currently no convenient way to manage and categorize them [3,4,5]. Some social networks allow users to categorize their friends into social circles (such as circles on Google Plus and friends lists on Facebook and Twitter). However, these methods do not work well because the circles must be updated again by users whenever new people are added as friends [4]. Therefore, a mechanism is needed that learns to identify people and can automatically form and update social circles. In this case, we have the information of a person and their friends in a social network, and the aim is to find the social circles of the person in question, where each circle is a subset of the person’s friends. As shown in Figure 1, the user is marked with
The number of users in social networks and the communications between them are constantly increasing, and unfortunately these communications may be lost for various reasons [6]. In connection with these links, the problem of link prediction has become an important topic in social media analysis [4,7]. Link prediction means estimating the likelihood of a link between two users, given that there is currently no link between them. Predicting the occurrence of links is a fundamental problem in network analysis [8]. In link prediction, a snapshot of a network is given, and we want to know which interactions are likely to take place between current members of the network in the near future [8]. Although this problem has been extensively studied, the question of how to optimally and effectively combine the information obtained from the network structure with the abundant descriptive data related to nodes and links remains largely open. Real networks show a range of interesting features and patterns. One of the important topics in this field of research is the design of models that predict and reproduce the occurrence of such network structures [9]. Therefore, research efforts seek to develop models that accurately predict the overall structure of the network.
Many types of networks are highly dynamic [9,10,11]. These networks grow and change rapidly by adding new nodes and links that represent new interactions between network nodes [9]. Therefore, networks are also studied at the level of individual link formation, which is in some respects even more difficult than global modeling [10,11]. The mechanisms by which these networks grow at the level of individual edges are not yet fully understood, and this is in fact the impetus for research into link prediction. In general, we consider the classic problem of link prediction. In this case, we have a view of the network at time
Bipartite networks are one of the most important types of complex networks, in which nodes are divided into two parts [11]. In these networks, the links connect nodes of different parts, and there is no link between two nodes of the same part. Many real-world networks are essentially bipartite networks, such as people and items purchased, people and diseases, diseases and genes, papers and authors, words and texts, and investors and companies [12]. Link prediction is one of the most important issues in bipartite social media analysis. Figure 3 provides an overview of a bipartite network. In this figure, there are two types of nodes, and the links connect users to items of interest. Here there is a weight as a similarity between each user and item, indicated by
The link prediction problem has mainly been studied for classical networks. Therefore, appropriate methods for link prediction in bipartite networks need to be studied more carefully. Depending on the field of application, it is better to choose a link prediction method whose criteria match the context of the problem. The purpose of this study is to develop a link prediction method for bipartite networks that examines the network from different perspectives based on neighborhood structure and performs well.
The main contributions of this study are as follows:
Development of a novel similarity measure based on bipartite social network topology
Measuring similarity between users based on neighborhood structure in bipartite social networks
Evaluation of the proposed algorithm with extensive simulation on real social networks using MATLAB.
The rest of the paper is organized as follows: Section 2 discusses the research literature on the problem of link prediction. Section 3 describes the proposed method in detail. Section 4 presents the simulation and experimental results. Finally, Section 5 concludes this study.
2 Literature review
Much research has been done on the problem of link prediction in social networks [14,15,16]. The first link prediction model explicitly applied to social networks was proposed by Liben‐Nowell and Kleinberg [14]. They based prediction on the similarity between two nodes, interpreted as the possibility of future friendship. They then ranked the nodes by similarity score and suggested the highest ranked nodes. Al Hasan and Zaki later developed this approach [15]. They showed that the use of external data can improve the performance of link prediction. The authors formulated the link prediction problem as a binary classification problem.
In ref. [16], a similarity-based link prediction approach for social networks is proposed that uses latent relationships between users. In this method, a new measure is proposed to determine the similarity of each pair of nodes based on the number of common neighbors and the correlation between the neighborhood vectors of the nodes. In ref. [17], a link prediction model for complex networks is introduced. In this model, four similarity indices, namely CN, LHNII, COS+, and MFI, are combined to define a new index for link prediction in complex networks. The indices are combined through logistic regression, which yields the Ensemble-Model-Based Link Prediction algorithm.
In ref. [18], the Common Neighbors Degree Penalization (CNDP) method is introduced for link prediction in social networks. CNDP offers a new criterion for link prediction by considering the clustering coefficient as a structural feature of the network. In ref. [19], central node-based link prediction is used to improve the detection of communities in complex networks with ambiguous structure. In this study, a new link prediction strategy is designed that identifies communities in complex networks with ambiguous structures.
In ref. [20], the Stretch Shrink Distance Based Algorithm (SSDBA) is introduced for link prediction in social networks. SSDBA is a stretch-shrink distance-based algorithm that performs prediction based on community identification. In this algorithm, the communities of a social network are identified first, and then the active nodes are identified based on a community average threshold and a node average threshold in each community. Next, the Stretch Shrink Distance model is used to calculate the distance changes between active nodes and local neighbors. In ref. [21], a multilayer link prediction model for complex dynamic networks is proposed. The authors developed a method for modeling multilayer networks based on the evolution of each node's membership at different layers. This evolution was formulated using the Infinite Hidden Markov Model through intralayer and interlayer links.
In ref. [7], a new approach to link prediction in multiplex networks, the Multiplex Local Random Walk (MLRW), is proposed. Local Random Walk is one of the most popular methods for link prediction in multiplex social networks; it captures network structure through pure random walks to measure similarity between nodes. MLRW uses biasing functions to calculate the weight between different layers. In ref. [22], a link prediction framework that accounts for interlayer similarity together with proximity-based features is proposed for multiplex social networks. The authors examine the effect of interlayer similarities on link prediction in artificial and real multiplex social networks.
In ref. [23], a supervised-learning approach is proposed for link prediction in single layer and multiplex social networks. The authors use improved structural features and similarity criteria, and community-based features are used to develop this approach. In ref. [24], a supervised approach to solving the problem of link prediction in multiplex social networks is introduced. The authors derive a binary classification model from complex structural features of layers, where they consider the information of all layers at the same time. The MetaPath algorithm, presented in ref. [25], is a method for link prediction in multiplex social networks. MetaPath performs link prediction for Foursquare social network users based on node-based features as well as meta-path-based features on Twitter. The node-based features used are optimism and reputation, and the meta-path-based features are derived from the paths of multiplex networks.
3 Proposed method
The idea of the proposed method is to use the well-known classical link prediction criteria, adapted to the bipartite network. Our focus is on criteria based on neighborhood structure, which are the most important set for link prediction. To take advantage of different perspectives on the link prediction problem, we combine different neighborhood-based criteria to define a new similarity measure. The main focus of the proposed method is on the importance of the weight between users in calculating similarity. In this regard, among the criteria based on the neighborhood structure, we use the weighted versions of the classical similarity criteria, which assign a higher score to more closely related nodes. The classical similarity criteria used in the proposed method are Common Neighbors (CNs) [26], Jaccard Coefficient (JC) [27], Adamic-Adar (AA) [28], Preferential Attachment (PA) [29], Katz (KT) [30], and FriendLink (FL) [31]. All of these criteria calculate the similarity between two nodes based on the neighborhood structure. In these criteria, nodes with a higher degree are more important. The general process of this research is shown in Figure 4.
Because some implicit information is lost when the bipartite network is converted to a one-mode network, the weighted version of the similarity criteria is used. Hence, the unweighted network graph is mapped to a weighted network. Users' profile information is used to calculate the weight of links, expressing their common interests in communication.
where
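As an illustration of the mapping from a bipartite graph to a weighted one-mode network, the following minimal Python sketch projects a user-item graph onto users. It is not the weighting of equation (1), which relies on user profile information; the weight used here (the number of items two users share), the function name `project_users`, and the toy data are assumptions made only for this example.

```python
from collections import defaultdict
from itertools import combinations


def project_users(user_items):
    """Project a user-item bipartite graph onto the user side.

    Returns {u: {v: w}}, where w is the number of items shared by u and v.
    This shared-item count is an assumed weighting, not the paper's formula.
    """
    # Invert the mapping: for every item, collect the users linked to it.
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)

    # Every pair of users attached to the same item gains one unit of weight.
    weights = defaultdict(lambda: defaultdict(int))
    for users in item_users.values():
        for u, v in combinations(sorted(users), 2):
            weights[u][v] += 1
            weights[v][u] += 1
    return {u: dict(nbrs) for u, nbrs in weights.items()}


# Hypothetical toy data: three users and three items.
user_items = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"c"},
}
projected = project_users(user_items)
print(projected)
```

In this toy graph, u1 and u2 share two items and therefore receive a heavier link than u2 and u3, which share one; u1 and u3 share nothing and stay unconnected.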
The following are the classical similarity criteria used in the proposed method, including CNs, JC, AA, PA, KT, and FL. Each of these criteria has an unweighted and a weighted form; we use the weighted versions in this paper because they conform to the structure of bipartite networks.
CN: In unweighted social networks, this criterion refers to the number of common nodes that are directly connected to the two nodes under evaluation. The greater the number of common neighbors between the two nodes, the more likely it is that a direct link will be established between them in the future. Equation (2) shows the
JC: This criterion refers to the ratio of the number of common neighbors of a pair of nodes to the total number of their neighbors, so the highest values go to pairs that share most of their neighbors. Equation (4) shows the
AA: This criterion is related to the JC. This criterion gives more importance to common neighbors who have a small number of neighbors. Thus, AA measures how strong the relationship between the common neighbors and the two nodes evaluated is. Equation (6) shows the
PA: This criterion assumes that the probability of creating a new link from node
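For orientation, the four local criteria (CN, JC, AA, and PA) can be sketched in their standard unweighted forms from the literature. This is a minimal Python sketch under that assumption; it does not reproduce the weighted variants used in the proposed method, and the function names and the toy adjacency structure are hypothetical.

```python
import math


def common_neighbors(adj, x, y):
    """CN: number of neighbors shared by x and y."""
    return len(adj[x] & adj[y])


def jaccard(adj, x, y):
    """JC: shared neighbors divided by all neighbors of x or y."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    """AA: shared neighbors, each down-weighted by the log of its degree.

    For x != y, a shared neighbor has degree >= 2, so the log is positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    """PA: product of the degrees of x and y."""
    return len(adj[x]) * len(adj[y])


# Hypothetical undirected toy graph stored as neighbor sets.
adj = {
    "a": {"c", "d"},
    "b": {"c", "d", "e"},
    "c": {"a", "b"},
    "d": {"a", "b", "e"},
    "e": {"b", "d"},
}

for name, score in [
    ("CN", common_neighbors(adj, "a", "b")),
    ("JC", jaccard(adj, "a", "b")),
    ("AA", adamic_adar(adj, "a", "b")),
    ("PA", preferential_attachment(adj, "a", "b")),
]:
    print(name, round(score, 4))
```

For the unlinked pair (a, b), the shared neighbors are c and d, so CN is 2, JC is 2/3, and PA is 2 × 3 = 6. AA gives c (degree 2) more influence than d (degree 3).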
KT: This criterion is one of the most successful global metrics for calculating similarity between users and link prediction. KT calculates the similarity according to the number and length of paths between the two users. The characteristic of this criterion is the assignment of coefficients to the paths between two users, which decreases exponentially with respect to the path length. Thus, KT attaches less importance to paths with longer lengths in calculating the final similarity. Equation (10) shows the
where
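The exponential damping described for KT can be illustrated with the standard unweighted Katz index, truncated at a maximum length. The Python sketch below assumes that form; the damping factor `beta`, the cutoff `max_len`, and the toy graph are illustrative choices, not values taken from this study. Powers of the adjacency matrix count walks, which may revisit nodes.

```python
def katz_scores(adj_matrix, beta=0.05, max_len=3):
    """Truncated Katz index: sum over l = 1..max_len of beta**l * A**l.

    Entry (i, j) of A**l counts the walks of length l from i to j, so
    longer walks contribute exponentially less to the final score.
    """
    n = len(adj_matrix)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj_matrix]  # holds A**length, starting at A**1
    for length in range(1, max_len + 1):
        coeff = beta ** length
        for i in range(n):
            for j in range(n):
                scores[i][j] += coeff * power[i][j]
        # Advance to the next power: A**(length + 1) = A**length * A.
        power = [
            [sum(power[i][k] * adj_matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return scores


# Hypothetical path graph 0 - 1 - 2.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = katz_scores(A, beta=0.5, max_len=3)
print(scores[0][2], scores[0][1])
```

Nodes 0 and 2 are joined only by one walk of length 2, so their score is 0.5² = 0.25. Nodes 0 and 1 have one walk of length 1 and two of length 3, giving 0.5 + 2 × 0.125 = 0.75. The untruncated series converges only when `beta` is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix.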
FL: Like KT, this criterion uses factors such as number and path length to calculate similarity. The only difference is considering the attenuation factor of
where
Because the values of these criteria differ in scale and range, and so that each has the same effect on the proposed similarity measure, the values of the introduced criteria are normalized using the z-score method [32], as shown in equation (14). This normalization maps the data from its current interval to another interval with the aim of increasing scalability.
where
The proposed similarity measure for link prediction in a bipartite network is calculated from the different similarity criteria, so that it combines the information each criterion captures under its own concept. Here, the average of the scores of these criteria is used as the proposed measure, as shown in equation (14).
where
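The normalize-then-average step can be sketched in a few lines of Python. The sketch assumes that each criterion is z-scored over the set of candidate pairs and that the population standard deviation is used; the paper does not state either detail here. The function names and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to zero mean and unit (population) deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant criterion carries no ranking information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combined_similarity(criteria_scores):
    """Average the z-scored criteria for every candidate pair.

    criteria_scores maps a criterion name to a list of raw scores,
    one per candidate pair, in the same pair order for every criterion.
    """
    normalized = [zscore(scores) for scores in criteria_scores.values()]
    return [mean(column) for column in zip(*normalized)]


# Hypothetical raw scores of two criteria for three candidate pairs.
raw = {
    "CN": [2, 0, 1],
    "PA": [6, 2, 4],
}
combined = combined_similarity(raw)
print(combined)
```

Although PA takes larger raw values than CN, both criteria contribute equally after standardization, and the first candidate pair ranks highest under the combined score.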
4 Experimental results
This section evaluates the proposed method and compares it with existing methods for link prediction in bipartite networks. Evaluation and comparison are based on several criteria: precision, recall, f-measure, and mean average precision (MAP). To compare the performance of the proposed method, the classical similarity criteria of KT [30] and FL [31] as well as the MetaPath algorithm [25] have been used. The simulation was performed in MATLAB R2019a on an HP Pavilion 15 laptop with an 11th Gen Intel Core i7-1165G7 processor at 4.2 GHz and 16 GB RAM. In addition, the simulation is based on the Twitter and Foursquare social network datasets.
All results are based on the 10-fold cross-validation method to ensure reliability. In this validation, the training set contains 90% and the testing set 10% of the total social network users. At each validation step, the same users are split between the two social networks Twitter and Foursquare into two sets, training (
4.1 Evaluation criteria
In this study, precision, recall, f-measure, and MAP have been used to evaluate the results of different algorithms in solving the problem of link prediction [13,14]. These criteria are calculated based on two factors: the actual related users and the recommended users. Let
where
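The four evaluation criteria can be sketched for a single user as follows. The Python code uses the standard set-based definitions of precision, recall, and f-measure. The average precision is normalized by the number of actual related users, which is one common convention and an assumption here, since other normalizations exist. The function names and the sample lists are hypothetical. MAP is the mean of the average precision over all test users.

```python
def precision_recall_f(recommended, relevant):
    """Precision, recall, and f-measure of one recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def average_precision(recommended, relevant):
    """Average of the precision values at each rank that holds a hit.

    Normalized by the number of relevant users (an assumed convention).
    """
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, user in enumerate(recommended, start=1):
        if user in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical ranked recommendations and actual related users for one user.
recommended = ["u2", "u5", "u3", "u9"]
relevant = {"u2", "u3", "u7"}

precision, recall, f_measure = precision_recall_f(recommended, relevant)
ap = average_precision(recommended, relevant)
print(precision, recall, f_measure, ap)
```

Two of the four recommendations are correct, so precision is 1/2, recall is 2/3, and the f-measure is 4/7. The hits fall at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 3 = 5/9.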
4.2 Dataset
This study uses the same users on Twitter and Foursquare to evaluate link prediction algorithms. Twitter is a directed microblogging social network, and Foursquare is an undirected location-based social platform. Foursquare social networking information is available at https://sites.google.com/site/yangdingqi/home/foursquaredataset and Twitter social networking information is available at https://snap.stanford.edu/data/egonetsTwitter.html. Details of the datasets used from these networks are shown in Table 1.
Networks  #Links  #Nodes  #Common nodes  #Common links  Average degree  Average nodes

Twitter  81,306  1,768,149  1,508  6,551  10.05  in = 10.05, out = 10
Foursquare  266,909  3,680,126  1,508  6,551  24.41  24.4
4.3 Discussion and comparison
In this study, extensive experiments have been performed to evaluate the proposed method in comparison with the KT and FL similarity criteria as well as the MetaPath method. Considering the use of 10-fold cross-validation, there is 10% of the total users (i.e., 150 users) from the
In the experiments, the evaluations were calculated and presented separately for each user, and the number of recommendations made to each user was set to 10 (
In another similar experiment, the results of a comparison of the recall criteria for different
Figures 11 and 12 show the results for the f-measure and MAP criteria, respectively, with different
To better express the results of the different methods, Table 2 is presented. In this table, the results of all methods for the precision, recall, f-measure, and MAP criteria are reported. According to the evaluation of the proposed method in the best case, here the results are reported based on
Methods  Precision  Recall  F-measure  MAP

KT  92.15  66.15  81.52  80.11 
FL  92.58  69.21  82.68  81.32 
MetaPath  93.76  72.74  86.47  86.43 
Proposed method  93.81  75.50  87.28  87.75 
The proposed method reaches an f-measure of 87.28% across all experiments performed. This result is achieved with 17 recommended users (i.e.,
Methods  Precision  Recall  F-measure  MAP

KT  9.53  14.13  1.80  11.63 
FL  7.90  9.09  1.32  9.23 
MetaPath  1.52  3.79  0.50  1.60 
5 Conclusion
Social network analysis is an approach to the study of social structures. Link prediction is one of the important fields in social network analysis. Link prediction tries to answer the following question: given a snapshot of the network at the current time, which links among members of the network are likely to form in the future? Similarity-based methods, due to their simplicity and suitable performance, are among the most popular methods of link prediction. In this study, a neighborhood structure-based method for link prediction in bipartite networks is presented. In this method, the classical similarity criteria based on neighborhood structure were first redefined by adapting them to bipartite networks. These criteria were developed by mapping unweighted networks to weighted networks. Here, we used the CNs, JC, AA, PA, KT, and FL criteria. The proposed similarity measure is a combination of these criteria that carries the conceptual information of all of them. The evaluation results show that the proposed method performs better than basic methods such as KT and FL and also shows promising performance compared to the more recent MetaPath method. The research thus achieves its aim of providing a criterion based on neighborhood structure with good performance in bipartite networks. However, it is suggested that this method also be analyzed for other networks, such as egocentric and multiplex networks.

Funding information: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Author contributions: All authors contributed to the design and implementation of the research, analysis of the results and writing of the manuscript.

Conflict of interest: We certify that there is no actual or potential conflict of interest in relation to this manuscript.

Competing interests: There is no free code for this study.

Ethics approval: This material is the authors’ own original work, which has not been previously published elsewhere.

Data availability statement: Data sharing is not applicable to this manuscript as no datasets were generated or analyzed during the current study.
References
[1] W. Yuan, K. He, D. Guan, L. Zhou, and C. Li, “Graph kernel based link prediction for signed social networks,” Inf. Fusion, vol. 46, pp. 1–10, 2019. doi: 10.1016/j.inffus.2018.04.004
[2] Z. Samei and M. Jalili, “Application of hyperbolic geometry in link prediction of multiplex networks,” Sci. Rep., vol. 9, no. 1, pp. 1–11, 2019. doi: 10.1038/s41598019490017
[3] P. Pei, B. Liu, and L. Jiao, “Link prediction in complex networks based on an information allocation index,” Phys. A: Stat. Mech. its Appl., vol. 470, pp. 1–11, 2017. doi: 10.1016/j.physa.2016.11.069
[4] M. S. Aslanpour, S. E. Dashti, M. Ghobaei-Arani, and A. A. Rahmanian, “Resource provisioning for cloud applications: a 3D, provident and flexible approach,” J. Supercomput., vol. 74, no. 12, pp. 6470–6501, 2018. doi: 10.1007/s112270172156x
[5] M. Etemadi, M. Ghobaei-Arani, and A. Shahidinejad, “Resource provisioning for IoT services in the fog computing environment: An autonomic approach,” Comput. Commun., vol. 161, pp. 109–131, 2020. doi: 10.1016/j.comcom.2020.07.028
[6] T. M. Tuan, P. M. Chuan, M. Ali, T. T. Ngan, and M. Mittal, “Fuzzy and neutrosophic modeling for link prediction in social networks,” Evol. Syst., vol. 10, no. 4, pp. 629–634, 2019. doi: 10.1007/s125300189251y
[7] E. Nasiri, K. Berahmand, and Y. Li, “A new link prediction in multiplex networks using topologically biased random walks,” Chaos, Solitons Fractals, vol. 151, p. 111230, 2021. doi: 10.1016/j.chaos.2021.111230
[8] K. Berahmand and A. Bouyer, “LP-LPA: a link influence-based label propagation algorithm for discovering community structures in networks,” Int. J. Mod. Phys. B, vol. 32, no. 06, p. 1850062, 2018. doi: 10.1142/S0217979218500625
[9] R. Yang, C. Yang, X. Peng, and A. Rezaeipanah, “A novel similarity measure of link prediction in multi‐layer social networks based on reliable paths,” Concurrency Computation: Pract. Experience, p. e6829, 2022. doi: 10.1002/cpe.6829
[10] K. Berahmand, E. Nasiri, M. Rostami, and S. Forouzandeh, “A modified DeepWalk method for link prediction in attributed social network,” Computing, vol. 103, no. 10, pp. 2227–2249, 2021. doi: 10.1007/s00607021009822
[11] S. Mallek, I. Boukhris, Z. Elouedi, and E. Lefèvre, “Evidential link prediction in social networks based on structural and social information,” J. Comput. Sci., vol. 30, pp. 98–107, 2019. doi: 10.1016/j.jocs.2018.11.009
[12] E. Nasiri, K. Berahmand, M. Rostami, and M. Dabiri, “A novel link prediction algorithm for protein-protein interaction networks by attributed graph embedding,” Comput. Biol. Med., vol. 137, p. 104772, 2021. doi: 10.1016/j.compbiomed.2021.104772
[13] A. Rezaeipanah, G. Ahmadi, and S. Sechin Matoori, “A classification approach to link prediction in multiplex online ego-social networks,” Soc. Netw. Anal. Min., vol. 10, no. 1, pp. 1–16, 2020. doi: 10.1007/s13278020006396
[14] D. Liben‐Nowell and J. Kleinberg, “The link‐prediction problem for social networks,” J. Am. Soc. Inf. Sci. Technol., vol. 58, no. 7, pp. 1019–1031, 2007. doi: 10.1145/956863.956972
[15] M. Al Hasan and M. J. Zaki, “A survey of link prediction in social networks,” in Social Network Data Analytics, Boston, MA, Springer, 2011, pp. 243–275. doi: 10.1007/9781441984623_9
[16] A. Zareie and R. Sakellariou, “Similarity-based link prediction in social networks using latent relationships between the users,” Sci. Rep., vol. 10, no. 1, pp. 1–11, 2020. doi: 10.1038/s41598020767994
[17] K. Li, L. Tu, and L. Chai, “Ensemble-model-based link prediction of complex networks,” Computer Netw., vol. 166, p. 106978, 2020. doi: 10.1016/j.comnet.2019.106978
[18] S. Rafiee, C. Salavati, and A. Abdollahpouri, “CNDP: link prediction based on common neighbors degree penalization,” Phys. A: Stat. Mech. its Appl., vol. 539, p. 122950, 2020. doi: 10.1016/j.physa.2019.122950
[19] H. Jiang, Z. Liu, C. Liu, Y. Su, and X. Zhang, “Community detection in complex networks with an ambiguous structure using central node based link prediction,” Knowl. Syst., vol. 195, p. 105626, 2020. doi: 10.1016/j.knosys.2020.105626
[20] R. Yan, Y. Li, D. Li, W. Wu, and Y. Wang, “SSDBA: the stretch shrink distance based algorithm for link prediction in social networks,” Front. Comput. Sci., vol. 15, no. 1, pp. 1–8, 2021. doi: 10.1007/s1170401990833
[21] M. K. Manshad, M. R. Meybodi, and A. Salajegheh, “A new irregular cellular learning automata-based evolutionary computation for time series link prediction in social networks,” Appl. Intell., vol. 51, no. 1, pp. 71–84, 2021. doi: 10.1007/s10489020016855
[22] S. Najari, M. Salehi, V. Ranjbar, and M. Jalili, “Link prediction in multiplex networks based on interlayer similarity,” Phys. A: Stat. Mech. Appl., vol. 536, p. 120978, 2019. doi: 10.1016/j.physa.2019.04.214
[23] D. Malhotra and R. Goyal, “Supervised-learning link prediction in single layer and multiplex networks,” Mach. Learn. Appl., vol. 6, p. 100086, 2021. doi: 10.1016/j.mlwa.2021.100086
[24] N. Shan, L. Li, Y. Zhang, S. Bai, and X. Chen, “Supervised link prediction in multiplex networks,” Knowl. Syst., vol. 203, p. 106168, 2020. doi: 10.1016/j.knosys.2020.106168
[25] M. Jalili, Y. Orouskhani, M. Asgari, N. Alipourfard, and M. Perc, “Link prediction in multiplex online social networks,” R. Soc. open. Sci., vol. 4, no. 2, p. 160863, 2017. doi: 10.1098/rsos.160863
[26] F. Lorrain and H. C. White, “Structural equivalence of individuals in social networks,” J. Math. Sociol., vol. 1, no. 1, pp. 49–80, 1971. doi: 10.1016/B9780124424500.500122
[27] S. Niwattanakul, J. Singthongchai, E. Naenudorn, and S. Wanapu, “Using of Jaccard coefficient for keywords similarity,” Proc. Int. Multiconference Eng. Comput. Sci., vol. 1, no. 6, pp. 380–384, March 2013. doi: 10.12720/lnit.1.4.159164
[28] L. A. Adamic and E. Adar, “Friends and neighbors on the web,” Soc. Netw., vol. 25, no. 3, pp. 211–230, 2003. doi: 10.1016/S03788733(03)000091
[29] H. Chen, X. Li, and Z. Huang, “Link prediction approach to collaborative filtering,” in Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'05), IEEE, June 2005, pp. 141–142.
[30] L. Katz, “A new status index derived from sociometric analysis,” Psychometrika, vol. 18, no. 1, pp. 39–43, 1953. doi: 10.1007/BF02289026
[31] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, “Fast and accurate link prediction in social networking systems,” J. Syst. Softw., vol. 85, no. 9, pp. 2119–2132, 2012. doi: 10.1016/j.jss.2012.04.019
[32] C. Cheadle, M. P. Vawter, W. J. Freed, and K. G. Becker, “Analysis of microarray data using Z score transformation,” J. Mol. diagnostics, vol. 5, no. 2, pp. 73–81, 2003. doi: 10.1016/S15251578(10)604552
© 2022 Fariba Sarhangnia et al., published by De Gruyter
This work is licensed under the Creative Commons Attribution 4.0 International License.