A Genetic Algorithm Approach for Group Recommender System Based on Partial Rankings

Abstract Many recommender systems make suggestions of group-consumable items to individual users. Much work has been done on group recommender systems (GRSs) with full rankings, but partial ranking (PR), where items are only partially ranked, remains a challenge. The ultimate objective of this work is to propose a rank aggregation technique that handles the PR problem effectively. Most studies have focused on PR without ties (PRWOT). In real applications, however, the partial rankings to be aggregated may not be permutations: they may contain ties, where some items are placed in the same position. To handle PR in GRSs for both PRWOT and PR with ties (PRWT), we propose a novel approach based on a genetic algorithm (GA), in which the Spearman foot rule distance is used as the fitness function for PRWOT and the Kendall tau distance with bucket order is used for PRWT. Experimental results demonstrate that the proposed GA-based GRSs for PRWOT (GRS-GA-PRWOT) and PRWT (GRS-GA-PRWT) outperform well-known baseline GRS techniques.


Introduction
Recommender systems (RSs) are web personalization tools that address information overload by providing users with the items that best fit their individual needs. The most successful and widely used filtering technique is collaborative filtering. Most recommender systems recommend group-consumable items (e.g., movies) to individuals. Recently there have been several efforts in the area of group recommender systems (GRSs).
GRSs [4,15] consider the preferences of each member of a group and provide recommendations such that the suggested list of items (or a single recommended item) satisfies the group members as well as possible. However, an optimal solution is generally not attainable, so most proposed GRSs provide near-optimal solutions [2,5]. The main concern is how to effectively aggregate the preferences of the individuals in a group to produce recommendations that satisfy the group as a whole.
The goal of rank aggregation (RA) is to find an aggregate ranking that minimizes the distance to each of the given ranked lists. In the domain of GRSs, rankings can be categorized as follows:
-Full rankings: The item list is fully ranked by a member of the group.
-Partial rankings (PRs): The item list is partially ranked, i.e., some items in the list are left unranked by members of the group.
RA is a useful abstraction with a number of applications, including meta-search, similarity search, composite rank functions built from multiple lists, and classification. For GRSs, RA techniques such as Most pleasure, Least misery, Borda count, and Copeland rule have been used successfully [9,10,12].
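As an illustration of three of the baseline strategies named above, the following Python sketch aggregates a small group rating table with Least misery, Most pleasure, and a simplified Borda count. The rating values, the `None` encoding for unrated items, the decision to skip unrated items, and the tie handling in `borda_count` are assumptions made for this example; they are not details taken from the cited works.

```python
from typing import Dict, List, Optional

Ratings = Dict[str, List[Optional[int]]]

# Hypothetical group ratings; None marks an item the user left unrated.
RATINGS: Ratings = {
    "u1": [5, 3, None, 1],
    "u2": [4, None, 2, 2],
    "u3": [None, 4, 5, 1],
}

def rated(ratings: Ratings, j: int) -> List[int]:
    """Ratings that group members actually gave to item j."""
    return [row[j] for row in ratings.values() if row[j] is not None]

def least_misery(ratings: Ratings, n_items: int) -> List[int]:
    """Group score of an item is its lowest individual rating."""
    return [min(rated(ratings, j)) for j in range(n_items)]

def most_pleasure(ratings: Ratings, n_items: int) -> List[int]:
    """Group score of an item is its highest individual rating."""
    return [max(rated(ratings, j)) for j in range(n_items)]

def borda_count(ratings: Ratings, n_items: int) -> List[int]:
    """Simplified Borda: a user gives an item one point for every item
    that the same user rated strictly lower (ties earn no extra points)."""
    scores = [0] * n_items
    for row in ratings.values():
        given = [r for r in row if r is not None]
        for j, r in enumerate(row):
            if r is not None:
                scores[j] += sum(1 for other in given if other < r)
    return scores

def rank_items(scores: List[int]) -> List[int]:
    """Item indices ordered from highest to lowest group score."""
    return sorted(range(len(scores)), key=lambda j: -scores[j])
```

The sketch assumes that every item is rated by at least one group member; otherwise `min` and `max` would be called on an empty list.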
In real applications, the partial rankings to be aggregated (rankings in which users have not ranked all items) may not be permutations; they may contain ties, where some elements are placed at the same position. PR with ties is more challenging, because two PRs that contain ties cannot be compared directly with the measures defined for permutations [8,11].
A substantial body of literature is available on GRSs with full rankings, but PR is much less explored [10,11]. Among the various aggregation strategies, Spearman foot rule aggregation generates a best-compromise ordering that minimizes the sum of the Spearman foot rule distances (SFD) from the input orderings. In general, this minimization problem is NP-hard (non-deterministic polynomial-time hard), and therefore a genetic algorithm (GA) is well suited to producing a near-optimal solution [18,20].
In this paper, both PR without ties (PRWOT) and PR with ties (PRWT) are considered in order to develop two novel GRS Schemes, GRS-GA-PRWOT and GRS-GA-PRWT, that employ a GA. In our GA approach, the fitness functions are the SFD for PRWOT and the Kendall tau distance (KtD) with bucket order for PRWT. Experimental results demonstrate the effectiveness of the proposed GRS-GA-PRWOT and GRS-GA-PRWT Schemes.
Section 2 of this article presents related work, and the proposed GA-based GRS approaches for PRWOT (GRS-GA-PRWOT) and PRWT (GRS-GA-PRWT) are discussed in Section 3. The experiments and results are presented in Section 4. Finally, Section 5 concludes the paper and outlines some directions for future work.

Related Works
RSs address the problem of information overload by combining ideas from user preference modeling and information filtering, providing users with more intelligent and proactive information through concrete item recommendations.
In this section, GRSs, RA, PRWOT, PRWT, GA, the SFD measure, the KtD using bucket order for PRWT, and the effectiveness of a ranked list of recommendations are discussed [4,9].

Group Recommender Systems
GRSs make suggestions for a group. They take the preferences of each individual member of the group into account and provide a single recommendation that suits the group as well as possible. "In order to generate effective recommendations for a group, the system must satisfy, as much as possible, the individual preferences of the group's members" [9,21].
GRS approaches can be classified into two broad categories [1,4]. The first merges the profiles of multiple users and constructs a single preference model for the group; the second builds the group recommendation from the recommendations generated for the individual members. RA is a prominent method for developing multiple strategies for GRSs [5,19].

Rank Aggregation
There are two well-established distance measures for comparing rankings, namely, KtD and SFD. Fagin et al. have proposed various metrics for comparing PRs and have suggested algorithms for computing the few topmost items of a near-optimal aggregation of multiple PRs [8, 10-12].

Partial Ranking without Ties
Following Dwork et al. [10] and Fagin et al. [12], we consider PRs in which some items are left unranked by a set of users; lists that rank only some of the items are called partial lists. There are many situations where a full list is not convenient or not even possible. A special case of partial lists is the following. Let σ be a full list, i.e., a rank order that includes all items, and let τ be a partial list that ranks only a subset of the items in σ. If every ranked item in τ is placed above all unranked items, τ is called a top n list, where n is the size of the list. In order to deal with PRs, a modified SFD is used [8,11].
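A minimal Python sketch of a foot rule distance between a partial list τ and a full list σ is given below. It assumes one common variant, in which σ is first restricted to the items ranked in τ and the positions are then compared; the modified SFD used in this work is the one defined in [8,11] and may differ, for example in how unranked items are penalized or how the result is normalized. The function names and the sample lists are hypothetical.

```python
from typing import Hashable, List, Sequence

def induced_footrule(tau: Sequence[Hashable], sigma: Sequence[Hashable]) -> int:
    """Foot rule distance between a partial list and a full list.

    Both lists hold item ids, best item first. Assumption: sigma is
    projected onto the items that tau ranks before positions are compared.
    """
    ranked = set(tau)
    projected = [item for item in sigma if item in ranked]
    pos_in_sigma = {item: rank for rank, item in enumerate(projected)}
    return sum(abs(rank - pos_in_sigma[item]) for rank, item in enumerate(tau))

def sum_sfd(partial_lists: List[Sequence[Hashable]],
            sigma: Sequence[Hashable]) -> int:
    """Total distance from a candidate full list to every member's partial list."""
    return sum(induced_footrule(tau, sigma) for tau in partial_lists)

# Hypothetical example: one candidate full list and two partial lists.
sigma = ["a", "b", "c", "d", "e"]
group = [["c", "a", "e"], ["a", "b"]]
total = sum_sfd(group, sigma)  # 2 for the first list, 0 for the second
```

A candidate aggregate ranking with a smaller total is closer, in this sense, to the preferences expressed by the group.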

Partial Ranking with Ties
Let us consider the problem of RA of PRWT, that is, the case where items for which a user has the same preference can be tied. Following Brancotte et al. [8], a bucket order on n is represented by a set of buckets B_1 to B_m that form a disjoint partition of n, such that x precedes y under a transitive binary relation if there are i, j with i < j such that x ∈ B_i and y ∈ B_j. A ranking with ties on n is defined in terms of such a bucket order. Although the classical formulation of the KtD allows rankings with ties to be compared, in this case it is no longer a distance. Moreover, ties are actually ignored, and no disagreement is counted for pairs of items that are tied. Thus, independent of the input, the ranking with the fewest disagreements is the ranking in which all elements are tied in a single bucket. To avoid producing such a useless solution, algorithms based on the classical KtD have to restrict themselves to producing permutations [11,12].
Under the generalized KtD, a pair of items that is either in opposite order in the two rankings or tied in only one of them is counted as one disagreement. Computing the KtD is equivalent to sorting the elements and can be done, with adaptations, for rankings with ties [8,11]. An optimal consensus ranking of a set of rankings with ties $R \subseteq R_n$ under the generalized Kemeny RA with ties is a ranking $r^* \in R_n$ such that $\forall r \in R_n: K(r^*, R) \le K(r, R)$.
A consensus ranking denotes a solution of the problem that is not necessarily optimal. When a solution is optimal, it is explicitly denoted as an optimal solution of the problem [6,14].

Bucket Order (Spearman Foot Rule Distance Measure)
A bucket order is, intuitively, a linear order with ties; when every bucket is of size 1, it is a linear order. Foot rule optimal aggregation is closely related to Kemeny optimal aggregation, which minimizes the total KtD to the input rankings, and approximates it. Computing the foot rule optimal aggregation for partial lists is an NP-hard problem and requires computing the distance between a partial list and a full list. An example of a bucket order is shown in Figure 1.

Genetic Algorithm
A GA processes a population of chromosomes, where each chromosome represents a possible solution to an optimization problem and the quality of a solution is measured by a fitness function. Chromosomes with the minimum distance-based fitness values are selected, and offspring are produced using the genetic operators crossover and mutation [19,20]. A new population is formed by replacing the least fit individuals in the current population with the newly generated good individuals.
A suitable stopping criterion terminates the GA when a maximum number of generations has been reached or a desired level of fitness is attained. A GA is a suitable technique for producing a near-optimal solution for Kemeny optimal aggregation [7,18]. The main steps of a GA are summarized below:
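The following Python sketch shows one generic way such a loop can be organized for a distance-minimizing fitness over permutations. It is an illustration under stated assumptions, not the implementation used in this work: the population size, mutation rate, `patience` value, and the simple `crossover` and `mutate` helpers are all hypothetical choices made for the example.

```python
import random

def crossover(a, b, rng):
    """Keep a random prefix of parent a, then append the missing items
    in the order in which they appear in parent b."""
    cut = rng.randrange(1, len(a))
    head = a[:cut]
    seen = set(head)
    return head + [x for x in b if x not in seen]

def mutate(perm, rng):
    """Swap two randomly chosen positions in place."""
    i, j = rng.sample(range(len(perm)), 2)
    perm[i], perm[j] = perm[j], perm[i]

def run_ga(items, fitness, pop_size=30, elite=2, max_gen=500,
           patience=20, p_mut=0.2, rng=None):
    """Minimise `fitness` over permutations of `items` (generic sketch)."""
    rng = rng or random.Random(0)
    # 1. Initial population of random permutations.
    population = [rng.sample(items, len(items)) for _ in range(pop_size)]
    best, best_fit, stale = None, float("inf"), 0
    for _ in range(max_gen):
        # 2. Evaluate: a smaller distance means a fitter chromosome.
        population.sort(key=fitness)
        current = fitness(population[0])
        if current < best_fit:
            best, best_fit, stale = population[0][:], current, 0
        else:
            stale += 1
        # 3. Stop at the desired fitness or after `patience` stale generations.
        if best_fit == 0 or stale >= patience:
            break
        # 4. Elitism, then offspring bred from the fitter half.
        parents = population[: pop_size // 2]
        children = [p[:] for p in population[:elite]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            if rng.random() < p_mut:
                mutate(child, rng)
            children.append(child)
        population = children
    return best, best_fit
```

Because the top chromosomes are copied unchanged into the next generation, the best fitness found never gets worse from one generation to the next.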

The Effectiveness of a Ranked List of Recommendation
Normalized discounted cumulative gain (nDCG) is used to evaluate the effectiveness of a ranked list of recommendations [4]. Computing nDCG requires full ratings, so a version of nDCG adapted to partial ratings is used in our work.

The Proposed PR-Based Approach for GRSs
A GA-based GRS approach for PR is presented in this work.

Data Encoding
Let us suppose that there is a group of m users and n items. An example of partial ranking matrices is given in Table 1.

Scheme 1
Assuming that the m users in a group do not give their preferences for all n items, not all items are present in every user's list. An m × n rating matrix is generated in which the ith row represents the ratings of user u_i for items i_1 to i_n.
Here, to solve the GRS problem, the goal is to generate a group list of 15 items by aggregating m chromosomes so that it satisfies the different sets of users as well as possible.
In view of the limited availability of GRS data sets, we conducted our experiments on a randomly generated data set that closely matches the characteristics of data sets available from internet sources, such as the MovieLens and GroupLens data sets. A schematic representation of our proposed Scheme 1 is shown in Figure 2.
Details of the proposed GRS-GA-PRWOT based on PRWOT are given below:

Metrics for PRWOT
Metrics on PRs without ties are defined below. The partial ranking matrix is compared with a full ranking using the SFD as the fitness formula.

Fitness Function for PRWOT (Sum-SFD)
Given a partial list τ and a full list σ, the SFD is explained in Table 2. Our proposed GRS finally recommends the list of items given by the best chromosome, i.e., the permutation with the minimum SFD (Figure 3).
Offspring Generation. The following genetic sequencing operators are employed to generate offspring [18,20].
Uniform Order-Based Crossover. Explained in Figure 4.
Scramble Sublist Mutation. Shown in Figure 5.
Selection Criteria. In order to prevent crossover or mutation from destroying the best individuals, an elitist approach is used, in which the top chromosomes are preserved from one generation to the next.
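The two sequencing operators can be sketched in Python as follows. The sketch follows the usual textbook descriptions of uniform order-based crossover and scramble sublist mutation; since the figures are not reproduced here, the exact variants used in this work are an assumption, as are the function names, the 0.5 mask probability, and the optional `mask` argument added to make the example reproducible.

```python
import random

def uniform_order_crossover(p1, p2, rng, mask=None):
    """Keep the genes of p1 where the binary mask is set; fill the other
    positions with the missing genes in the order they occur in p2."""
    if mask is None:
        mask = [rng.random() < 0.5 for _ in p1]
    kept = {gene for gene, keep in zip(p1, mask) if keep}
    fill = iter(gene for gene in p2 if gene not in kept)
    return [gene if keep else next(fill) for gene, keep in zip(p1, mask)]

def scramble_sublist_mutation(perm, rng):
    """Pick a random contiguous sublist and shuffle the items inside it."""
    i, j = sorted(rng.sample(range(len(perm) + 1), 2))
    middle = perm[i:j]
    rng.shuffle(middle)
    return perm[:i] + middle + perm[j:]

# Hypothetical parents: two permutations of the same five items.
rng = random.Random(0)
child = uniform_order_crossover([1, 2, 3, 4, 5], [5, 4, 3, 2, 1], rng,
                                mask=[1, 0, 1, 0, 0])
# Items 1 and 3 stay in place; 5, 4, 2 fill the gaps in p2's order.
mutant = scramble_sublist_mutation(child, rng)
```

Both operators return a valid permutation of the parent's items, so no repair step is needed before the fitness is evaluated.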

Scheme 2
Here, to solve the GRS problem, the goal is again to generate a group list of 15 items by aggregating m chromosomes so that it satisfies the different sets of users as well as possible. The MovieLens data set is used for Scheme 2. Details of the proposed GRS-GA-PRWT, based on partial rankings with ties, are given below. A schematic representation of our proposed Scheme 2 is shown in Figure 6.

Metrics for PRWT
Bucket Order. A bucket order in which every bucket is of size 1 is termed a linear order [12].
Let ℬ_1, ..., ℬ_m be the buckets, ordered so that bucket ℬ_i precedes bucket ℬ_j when i < j. An example of a bucket order is shown in Figure 1.
The partial ranking matrix is compared with a full ranking with ties (σ). The metrics are based on the extension of the KtD to top k lists. Here, σ is a full ranking with ties, τ is a PRWT, and p is a penalty parameter, with 0 ≤ p ≤ 1. A penalty score $K^{(p)}_{i,j}(\sigma, \tau)$ is defined for the PRs τ_1, τ_2, ..., τ_n and each pair {i, j} of distinct items. We calculate the penalty using three cases.

Case 1:
If i and j are in different buckets in both σ and τ and appear in the same order (such as σ(i) > σ(j) and τ(i) > τ(j)), or if both rankings agree that i and j are tied, then $K^{(p)}_{i,j}(\sigma, \tau) = 0$.

Case 2:
If i and j are in different buckets in both σ and τ but appear in opposite order (such as σ(i) > σ(j) and τ(i) < τ(j)), then $K^{(p)}_{i,j}(\sigma, \tau) = 1$.

Case 3:
If i and j are in the same bucket in one of the rankings σ and τ, but in different buckets in the other, then $K^{(p)}_{i,j}(\sigma, \tau) = p$.
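A compact Python sketch of this pairwise penalty is given below. It assumes the standard penalty values of the Kendall tau distance with penalty parameter p for bucket orders (0 when the rankings agree on a pair, 1 when they order it oppositely, p when the pair is tied in exactly one ranking). Representing a ranking as a dictionary from item to bucket index, restricting the comparison to items present in both rankings, and the default p = 0.5 are assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, Hashable

Ranking = Dict[Hashable, int]  # item -> bucket index; equal index means tied

def ktd_with_ties(sigma: Ranking, tau: Ranking, p: float = 0.5) -> float:
    """Sum of pairwise penalties over the items ranked in both sigma and tau."""
    common = sorted(set(sigma) & set(tau))
    total = 0.0
    for i, j in combinations(common, 2):
        ds = sigma[i] - sigma[j]
        dt = tau[i] - tau[j]
        if ds == 0 and dt == 0:
            continue                  # Case 1: tied in both rankings
        if ds == 0 or dt == 0:
            total += p                # Case 3: tied in exactly one ranking
        elif (ds > 0) != (dt > 0):
            total += 1.0              # Case 2: opposite order
        # otherwise the order agrees (Case 1) and the penalty is 0
    return total

def sum_ktd(sigma: Ranking, partial_rankings, p: float = 0.5) -> float:
    """Fitness of a candidate sigma: total distance to all members' rankings."""
    return sum(ktd_with_ties(sigma, tau, p) for tau in partial_rankings)

# Hypothetical example: "a" and "b" are tied in sigma; tau omits item "c".
sigma = {"a": 1, "b": 1, "c": 2, "d": 3}
tau = {"a": 2, "b": 1, "d": 1}
distance = ktd_with_ties(sigma, tau)  # 0.5 + 1 + 0.5
```

In the example, the pair (a, b) is tied only in σ, the pair (a, d) is ordered oppositely, and the pair (b, d) is tied only in τ.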

Crossover and Mutation
Here, crossover and mutation are applied to full rankings with ties.

Single-Point Crossover
One crossover point is selected randomly; the ranking is copied from the first parent up to this point, and the remainder of the offspring is filled in by scanning the second parent. This operator is explained in Figure 7.
Order-Changing Mutation. In this mutation operator, two positions are selected randomly and their values are replaced with randomly generated numbers between 1 and 10. This operator is explained in Figure 8.
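A rough Python sketch of these two operators follows. It assumes that a ranking with ties is encoded as a list holding one rank value between 1 and 10 per item, so that equal values denote tied items and any combination of values is a valid chromosome; this encoding, the function names, and the optional `point` argument are assumptions, since the figures describing the operators are not reproduced here.

```python
import random

def single_point_crossover(parent1, parent2, rng, point=None):
    """Copy parent1 up to the crossover point, then take the rest from parent2."""
    if point is None:
        point = rng.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:]

def order_changing_mutation(ranking, rng, low=1, high=10):
    """Replace the values at two random positions with random ranks in [low, high]."""
    mutated = ranking[:]
    for idx in rng.sample(range(len(mutated)), 2):
        mutated[idx] = rng.randint(low, high)
    return mutated

# Hypothetical parents: rank value per item, equal values mean tied items.
rng = random.Random(0)
child = single_point_crossover([1, 1, 2, 3], [2, 3, 1, 1], rng, point=2)
mutant = order_changing_mutation(child, rng)
```

With this encoding the offspring never needs repair, which is one reason a plain single-point crossover is workable for rankings with ties but not for permutations.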

Stopping Criteria for Genetic Algorithm.
If there is no improvement in the fitness value for 20 consecutive generations, the evolution process terminates [16].
Let p1, ..., pk be the recommended list of items produced by the proposed GRS model. Let u be a user, and let r_{u,pi} be the actual rating of user u for the item pi ranked in position i, i.e., σ_u(pi) = i. The definitions of nDCG, ideal discounted cumulative gain (IDCG), and discounted cumulative gain (DCG) at rank k are given below:

Ranking KtD
where, for user u, the IDCG is the maximum possible gain value, i.e., the DCG obtained with the optimal re-ordering of the k items in p1, ..., pk.
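For illustration, the computation can be sketched in Python as below. The sketch assumes a widely used form of DCG in which the first position is undiscounted and position i ≥ 2 is divided by log2(i); it also assumes that, for partial ratings, the recommended list is first restricted to the items the user has actually rated. Neither assumption is confirmed by the text above, and the item ids and ratings are hypothetical.

```python
import math
from typing import Dict, List

def dcg(ratings: List[float]) -> float:
    """Discounted cumulative gain of ratings listed in recommended order."""
    return sum(r if i == 1 else r / math.log2(i)
               for i, r in enumerate(ratings, start=1))

def ndcg_at_k(recommended: List[str], user_ratings: Dict[str, float], k: int) -> float:
    """nDCG at rank k for one user, using only the items that user rated."""
    rated = [user_ratings[item] for item in recommended if item in user_ratings][:k]
    if not rated:
        return 0.0
    ideal = dcg(sorted(rated, reverse=True))  # IDCG: best possible re-ordering
    return dcg(rated) / ideal if ideal > 0 else 0.0

def mean_ndcg(recommended: List[str], group: List[Dict[str, float]], k: int) -> float:
    """Average nDCG over all members of a group."""
    return sum(ndcg_at_k(recommended, u, k) for u in group) / len(group)

# Hypothetical user who rated three of the four recommended movies.
user = {"m1": 3, "m3": 5, "m4": 4}
score = ndcg_at_k(["m1", "m2", "m3", "m4"], user, k=3)
perfect = ndcg_at_k(["m3", "m4", "m1"], user, k=3)
```

A list that presents the user's rated items in decreasing order of rating obtains the maximum value of 1.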

Data Set
For our experiments, we used both a synthetic data set and the real-world MovieLens data set. The experiments for Scheme 1 use synthetic data sets because the type of data required for this Scheme is not publicly available, whereas Scheme 2 uses the MovieLens data set, which contains 12,832 ratings provided by 1043 users for 1682 movies. We generated the list of users who have rated at least 12 movies and created five random splits (S), S-1, S-2, S-3, S-4, and S-5; for each split, 20 users were randomly selected as active users. The data sets consist of users' profiles and their preferences about movies.

Scheme 1: Partial Ranking without Ties (PRWOT).
In order to compare the performance of the proposed GRS-GA-PRWOT Scheme with different baseline GRS techniques, we ran experiments with groups of different sizes (G5, G10, G15, and G20). The evaluation is based on the sum of Spearman foot rule distances (sum-SFD); the smaller the sum-SFD, the better the model.

Experiment 1.
Here sum-SFD is computed for four different group sizes (G5, G10, G15, and G20) across generations. The results shown in Figure 9 clearly indicate that for all the groups of different sizes, GA converges to the near-optimal solution after 500 generations.

Experiment 2.
In this experiment, the proposed GRS-GA-PRWOT is compared with the baseline techniques. The results depicted in Figure 10 demonstrate that our Scheme GRS-GA-PRWOT outperforms the RA techniques given in the chart.

Scheme 2: Partial Ranking with Ties (PRWT).
In order to compare the performance of the proposed GRS-GA-PRWT Scheme with different baseline GRS techniques, we conducted experiments with groups of different sizes (G10, G20, G30, G40, and G50 users) and 20 items. The evaluation is based on Kemeny optimal aggregation with bucket order (sum-KtD); a smaller sum-KtD indicates that a technique outperforms the state-of-the-art techniques it is compared with.

Experiment 1.
Here sum-KtD is computed for five different group sizes (G10, G20, G30, G40, and G50) across generations. The results shown in Figure 12 indicate that, for all group sizes, the GA converges to a near-optimal solution after 500 generations.

Experiment 2.
In this experiment, the proposed GRS-GA-PRWT is compared with the baseline techniques. The results depicted in Figure 13 clearly demonstrate that our Scheme GRS-GA-PRWT outperforms Least misery, Most pleasure, Borda count, and Copeland rule.

Experiment 3.
Here, the effectiveness of the group recommendation by our proposed Scheme GRS-GA-PRWT is compared with baseline techniques based on mean nDCG with varying group sizes. Results are shown in Figure 14.

Conclusions and Future Work
A framework for GRSs is presented in this work, in which a GA is employed successfully to deal with the aggregation problem for PRWOT as well as PRWT and to generate near-optimal solutions; two Schemes, GRS-GA-PRWOT and GRS-GA-PRWT, are proposed. Experimental results have established the effectiveness of our proposed Schemes. As future research, we would like to investigate the incorporation of negotiation mechanisms [13,14] and of selected models of trust-distrust and reputation [3,17], and to discuss their importance for recommender system development, in order to further enhance the capability of the proposed GRS techniques. Further, it remains to be seen how consensus-based GRS techniques can improve the proposed Schemes [6,14].