Optimizing the Self-Organizing Team Size Using a Genetic Algorithm in Agile Practices

Abstract In agile software processes, team size is an important issue. In this work we look at how to find the optimal, or near-optimal, self-organizing team size using a genetic algorithm (GA) that considers team communication efforts. Communication, authority, roles, and learning are the team's performance characteristics, and the GA was developed according to these characteristics. A survey was used to evaluate the communication weight factors, which were qualitatively assessed and used in the algorithm's objective function. The GA experiments were performed in stages: each stage's results were tested and compared with the previous results. The results show that self-organizing teams of five to nine members scored best. The model can be improved by adding other team characteristics, such as software development efforts and costs.


Introduction
The self-organizing team is at the heart of the agile software development practice [9,10,15,21,30,45]. Despite the popularity of agile software development, there has been little research on team size and its relation to a team's performance. Different characteristics and attributes can affect team performance; these are discussed in Section 2. The team communication characteristic is used as the objective function in the algorithm design. In this research, the genetic algorithm (GA) was modeled on available theories, previous work and the outcome of survey data to find the optimal team size under a communication load constraint, as illustrated in Sections 2.3 and 3.1. The typical self-managed team can range from 5 to 15 members [6]. The statistical correlation between team size, efforts and productivity has been found to vary [42].
The objective of this research is to develop a GA model to find the optimal self-organizing team size considering team communication characteristics. Section 2 is a literature review, Section 3 is a high-level design methodology and a summary of the users' survey, Section 4 is the algorithm design and implementation, Section 5 is the results and discussion and finally we conclude in Section 6.

Literature Review
Most work on work-group performance has investigated characteristics associated with effectiveness [34] without considering the effect of those characteristics [8]. Heterogeneity between team members, training, diversity, access to information, self-management, team size, communication, decision-making and experience are possible characteristics [17] and variables that could positively, or negatively, affect the team's productivity. In this section, the self-organizing team, its characteristics, communication and the GA are reviewed.

Self-Organizing Team Definitions
A team is defined as a group who have shared commitments [24] and strive for synergy among the team members. Self-organizing teams fall under the social-entity work group [56]; they are self-determined and self-designed, working independently to perform tasks [17,49]. Such a team is a group of motivated individuals working together toward one goal, with the ability and authority to take decisions [20] and the readiness to respond to rapidly changing demands [3,15]. Takeuchi and Nonaka [50] have described the team's characteristics as transcendent, autonomous and cross-functional.
In the agile manifesto [1], the self-organizing team is one of the core concepts and one of the 12 principles of agile software development [9,10,15,30]. It has always been envisioned as people sharing the responsibilities of supervisors, with teams making everyday decisions. The team should have variety in terms of multiple disciplines and combined skills, with redundancy in functions so that members can supplement each other, learn new skills and find better ways to complete more tasks [22]. Definitions of teams and groups, including self-organizing, autonomous, semi-autonomous, self-managing, self-determining, self-designing, crews, cross-functional, quality circles, project teams, task forces and emergency response teams, describe features of different work groups [17,21,35].

Performance Characteristics
As observed from the self-organizing team definitions, the team's productivity can be increased by improving the sense of responsibility and ownership of work [16]. Individuals working collaboratively were found to achieve better decision-making [57]. By eliminating the manager role, the team can handle the decision-making role, but in some cases the team's performance did not improve [51]. Socially interdependent teamwork implies that, in a cooperative environment, individuals who are interested in a common goal and share skills and knowledge develop mutual relationships [39].
The Karhatsu framework [26,36,48] identified the baseline of self-organizing team characteristics which focuses on team orientation, shared leadership, learning, communication and collaboration autonomy. In a research questionnaire [36] which was conducted to evaluate Agile software projects, self-organizing teams were recognized as a performance correlated factor. Generally, the dilemma is, how to measure team performance?
In sports games, the teams' performance can be measured by knowing the winning or losing teams [44], but in software development, there should be other performance measures and factors.
Based on the Hofstede multi-focus model [23], organizational culture is the way members relate to each other, their work, and the outside world. Organizational culture can either enable a strategy or delay it. To achieve the optimal culture, perform gap analysis and implement change, the model considers the following dimensions: open system, goal-oriented, internally driven, employee-oriented and acceptance of leadership style [23]. Agile experts perceive that having the right corporate culture is a necessary factor in determining the outline of agile practices [10,56]. As agile methods stress people interactions [19] over processes, and customer collaboration, it is logical that individuals are naturally supportive and collaborative [13,33].

Communication and Team Size
The automation in solving problems relates to the team's communication efforts and team size in agile software practice.

Team Size
In the software development industry, researchers have been studying the relation between team size, project efforts and team productivity [5,18,38,42]. The study in [29] compared traditional project management and agile practice by considering the requirements prioritization policy together with eight other input values. The input values included the team size and the culture, in addition to dynamism, criticality, interdependency, cost and value. Culture was measured in percentage and has a range of 90% in agile projects. Requirements are sorted according to cost effectiveness when completed first. The team size was measured according to number of personnel and it has a range of 3 (1,8,20). The performance score considered the work to be done using the final value/cost after project completion. By modeling previous values and using an AI search engine in a distribution of 10,000 runs [29], the results showed that the agile method provided better performance than the plan-based methods. For very high dynamism, it was found useful to have a team size of five and not more than 17 people.
Software development efforts and the impact of team size, in addition to other variables, have been investigated in over 200 projects as part of an empirical study [37]. The variables included software size in terms of functional requirements, programming language and management tools. A larger team size increases the software development efforts due to high interaction, coordination and communication between the team members [38]. One hypothesis is that team size has a positive association with software development efforts. Other hypotheses are that team interaction with tools has no effect, and that the interaction of team size and programming language has a negative association. However, using ordinary least squares (OLS) analysis and using data quality to measure the hypotheses [37], the results found no support for the previously mentioned hypotheses.

Communication
With the complexity of software projects, the emerging technologies and the multiple frameworks available to developers, it has become fundamental to build multi-skilled teams. However, it is important that team members can effectively communicate with each other, with other teams and with their leaders [14,31,40]. The interaction of people in the form of communication is used in the fitness function of the GA, which aims to find the optimal team size.
In real-world communications, expertise is as important as the technical skills required within the team. It is also possible to study the communication constraint in a social network to find the optimal team with minimum communication cost [52]. Two communication metrics were introduced to measure the team's communication effectiveness: (1) diameter communication cost and (2) minimum spanning tree cost. Consider a group of experts $X = \{x_1, x_2, x_3, \ldots, x_n\}$ and a list of skills $S = \{s_1, s_2, s_3, \ldots, s_n\}$. The social network was modeled using an undirected graph $G(X, E)$, where $E$ is the set of edges. Weight of ( A brute-force algorithm was used to solve the team formation problem with the communication load constraint, in two phases. The first phase is team generation, and the second phase is hierarchy establishment with minimum communication cost for each team generated in phase one. The algorithm showed slow performance when the number of network nodes was increased from 30 to 300. The opt-algorithm was used to overcome the performance challenge, and its performance was found satisfactory [52]. The number of skills required for each team was set to 6 and the communication load constraint was set to 3. According to the results, the opt-algorithm was found to be the best of the three algorithms. However, the research did not mention the optimal team size considering communication load constraints.
Each type of interaction requires a different formula to calculate its number of communication channels. In this project, the focus is on the self-organizing team, which requires the all-channel communication type. The communication channel types shown in Figure 1 are:
-Wheel: all communication goes through the group leader.
-Y-Pattern: less centralized; two persons are closer to the center.
-Chain: information flows among members from one to the next until the end.
-Circle: persons on adjacent sides can communicate with each other.
-All channel: more decentralized; communication flows freely between all team members.
The all-channel (star) type of communication is considered in the GA design. As shown in Figure 2, increasing the number of team members rapidly increases the number of possible interactions (quadratically for the all-channel type), so a team with more members requires more communication channels [7,25].
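The difference between the channel types can be made concrete by counting channels for each pattern. The sketch below is only illustrative: it assumes each pattern is modeled as a simple undirected graph over n members (wheel, Y and chain as trees, circle as a cycle, all channel as a complete graph), and the function name `channels` is hypothetical.

```python
def channels(pattern, n):
    """Number of communication channels for n members under a given pattern.

    Assumption: each pattern is a simple undirected graph on n nodes.
    """
    if n < 2:
        return 0
    if pattern in ("wheel", "y", "chain"):
        return n - 1                      # tree-shaped patterns have n - 1 links
    if pattern == "circle":
        return n if n >= 3 else n - 1     # a cycle needs at least 3 members
    if pattern == "all":
        return n * (n - 1) // 2           # every pair of members is connected
    raise ValueError(f"unknown pattern: {pattern}")


for size in (5, 10, 15):
    print(size, channels("wheel", size), channels("circle", size), channels("all", size))
```

Under these assumptions only the all-channel count grows quadratically with team size; the other patterns grow linearly.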

Genetic Algorithm (GA)
GAs are inspired by natural biological evolution, which tends to find the best solution to a problem. The GA was invented by John Holland in 1975 and was popularized by Goldberg in 1989 [47,53,54]. The GA is known as an evolutionary algorithm for optimization using techniques inspired by nature [55]. The algorithm tries to find the best of the available solutions, but it is not guaranteed to find the optimal one [12]. A population of candidate solutions is applied to the optimization problem and moves towards the best solution [11]. The algorithm consists of three generic operators (selection, crossover, and mutation), which are applied to successive population generations [28]. The fitness function is evaluated for each individual in the population in order to select individuals for reproduction through crossover and mutation, which creates the next generation [54]. The solution space must be searched for an optimal parameter value, which can be represented by a binary string or a real number. For example, the team size can be represented by the values 3, 5, or 10 members. If a team member can be either present or virtually connected, this can be represented by additional parameters or by a 10-bit binary string. For example, in [11010 | 11011] the first 5 bits represent the team size value and the second 5 bits represent the team type [12].
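The 10-bit example above can be decoded with a few lines of Python. This is a minimal sketch, assuming both 5-bit fields are plain unsigned binary integers; the function names `decode` and `encode` are hypothetical and the meaning of the team type code is not specified in the text.

```python
def decode(chromosome):
    """Split a 10-bit string into (team_size, team_type) integers."""
    if len(chromosome) != 10 or set(chromosome) - {"0", "1"}:
        raise ValueError("expected a 10-character string of 0s and 1s")
    size_bits, type_bits = chromosome[:5], chromosome[5:]
    return int(size_bits, 2), int(type_bits, 2)


def encode(team_size, team_type):
    """Inverse of decode: pack two values in the range 0..31 into 10 bits."""
    if not (0 <= team_size < 32 and 0 <= team_type < 32):
        raise ValueError("each field must fit in 5 bits")
    return format(team_size, "05b") + format(team_type, "05b")


print(decode("1101011011"))  # (26, 27)
```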

Fitness Function:
Once the solution is represented by encoded strings, it is necessary to evaluate the solution against the principal objective. In most cases, the fitness function is equal to the objective function. For example, the following fitness function calculates the number of communication channels available to a team of five persons:

$$f_s = \frac{n(n-1)}{2} = \frac{5(5-1)}{2} = 10$$

The fitness function $f_s$ uses the team size n = 5 to calculate the total number of communication channels, which is 10. In this example, optimizing the solution means minimizing the fitness value. The flowchart in Figure 3 presents the GA steps. Figure 4 illustrates the crossover process and Figure 5 represents a mutation operation on bit number 5. The steps involved in implementing the algorithm are inspired by nature. First, chromosome encoding is used to represent the solution. Then, a random population of individuals is generated. Next, the fitness value is used to determine the individuals that are the best fit for selection. In the selection stage, the algorithm picks the parents that are stronger and more likely to survive. In the reproduction stage, the stronger parents have a higher probability of producing better children using crossover. Finally, the mutation stage is used to modify and diversify the new population.

Figure 4: Crossover. Crossover is generally used to create a new solution from the existing solutions after selection has been applied. It can be done by exchanging the parts of two chromosomes at a crossover point [12,53].

Figure 5: Mutation. Mutation is an optional step that transforms randomly selected genes. Mutation helps to occasionally introduce new features to maintain diversity and avoid local optima. A common mutation rate is 5%.
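The crossover and mutation operators described above can be sketched on bit strings as follows. This is an illustration under stated assumptions rather than the implementation used in the experiments: single-point crossover is assumed, bit positions are counted from the left starting at 1, and the function names are hypothetical.

```python
import random


def crossover(parent_a, parent_b, point):
    """Single-point crossover: swap the tails of two equal-length bit strings."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def flip_bit(chromosome, position):
    """Flip one bit; position is 1-based, counted from the left (assumption)."""
    i = position - 1
    flipped = "1" if chromosome[i] == "0" else "0"
    return chromosome[:i] + flipped + chromosome[i + 1:]


def mutate(chromosome, rate=0.05, rng=random):
    """Flip each bit independently with probability `rate` (5% by default)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )


print(crossover("1101011011", "0010100100", 5))  # ('1101000100', '0010111011')
print(flip_bit("1101011011", 5))                 # '1101111011'
```

Passing a seeded `random.Random` instance as `rng` makes the random mutation reproducible between runs.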

Chromosome Encoding:
The chromosome is the raw genetic information that the GA deals with (the genotype). The physical representation of the chromosome is known as the phenotype [47]. A chromosome is a string which stores an individual's genetic information. The string length depends on the number of genes and the encoding type [43].

Design Methodology
In this section, the genetic algorithm solution requirements are outlined briefly as a prerequisite for the design and implementation stage. First, the number of communication channels between individuals and teams is defined as the objective function. Then, other constraints are added to the communication factor to calculate the fitness function. Finally, the different categories and media of communication channels are discussed. The details of the GA design, equations and pseudocode are covered later in Section 4.

Communication Channels Calculation
The following formula is used to calculate the number of communication channels, as per the PMI best practices [25]. Assume that $N$ is the total number of communication channels, $n$ is the number of team members in each team and $T$ is the number of teams $t_1, t_2, \ldots, t_T$. The total number of communication channels within the same team, $N_t$, is:

$$N_t = \frac{n(n-1)}{2}$$

The total number of communication channels between teams, $N_T$, is:

$$N_T = \frac{T(T-1)}{2}$$

The total number of communication channels, $N$, is:

$$N = T \cdot N_t + N_T = \frac{T\,\bigl(n(n-1) + (T-1)\bigr)}{2}$$

For example, assume a project team of 15 persons: the total number of communication channels is (15 × 14)/2 = 105. If the team is organized into three small teams of five, the total number of communication channels is N = 3(5(5 − 1) + (3 − 1))/2 = 33.
The GA fitness function was designed based on the previous communication channels formula, but considering other attributes and different team models.
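The worked example with 15 persons can be checked with a short script. This is a minimal sketch of the channel count only, not the full fitness function; the function names are hypothetical, and accepting a list of team sizes (so that unequal teams can also be evaluated) is an assumption that goes slightly beyond the equal-team formula.

```python
def channels_within_team(n):
    """Channels among the n members of one team: n(n - 1)/2."""
    return n * (n - 1) // 2


def channels_between_teams(t):
    """Channels among t teams, treating each team as a single node."""
    return t * (t - 1) // 2


def total_channels(team_sizes):
    """Total channels for a project split into teams of the given sizes."""
    within = sum(channels_within_team(n) for n in team_sizes)
    return within + channels_between_teams(len(team_sizes))


print(total_channels([15]))       # one team of 15: 105
print(total_channels([5, 5, 5]))  # three teams of 5: 33
```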

Communication Weight
The previous equations assume that all communication channels have the same weight. Because channels differ in effectiveness, weight factors are added in the algorithm design to calculate the fitness value, as listed in Table 1. Based on the effectiveness of the communication channels shown in Figure 6, the different communication types have different weight factors. These weight factors were quantified and used in the objective function to calculate the communication efforts. They were gathered from the survey data, as shown in Section 3.2. Figure 7 illustrates a sample population of size 100; in the histogram, the distribution of individuals represents the team size and the frequency represents how many times each size is repeated in the solution population. Figure 8 is a high-level representation of the communication channels within the same team. To fully illustrate the star type (all channel) communication, the cross-team communications should also be plotted, but they would be difficult to visualize.

Survey and Data Analysis
A survey was conducted to quantify the communication constraints discussed in Section 3.1. Out of a total of 78 participants, 56 had worked in agile projects. "Person to person communicating in verbal" was given a weight factor of 1 and used as the reference for the highest communication channel effectiveness, so that a higher weight factor means lower effectiveness. The following are some of the important questions:
(I) Q. person to person email or chat messaging communication has a weight factor of? Thirty-eight percent of participants considered that email communication is the same as verbal communication and 28% considered them close.
(II) Q. person to person from different cultural backgrounds or communicating in a second language has a weight factor of? Seventy-seven percent of participants believed that communicating in a second language requires more effort.
(III) Q. person to person communicating in a video conference has a weight factor of? Twenty-seven percent believed that more effort is required.
(IV) Q. person female to person male verbal communication has a weight factor of? Fifty-six percent believed that the weight factor for both genders is the same.
(V) Q. compared to a person's communication; team to team communication has a weight factor of? Sixty-six percent of the participants believed that more effort is required to communicate across teams.
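One way weight factors of this kind could enter the channel count is sketched below. The weights in the example are placeholders chosen only for illustration, not the survey values from Table 1, and the way they are combined (one medium per team, one weight for all cross-team channels) is an assumption. The function name `weighted_effort` is hypothetical.

```python
def weighted_effort(team_sizes, team_media, weights, cross_team_weight):
    """Weighted communication effort for a set of teams.

    team_sizes        : members per team
    team_media        : the medium each team uses internally (assumption)
    weights           : weight factor per medium, verbal = 1 as the reference
    cross_team_weight : weight applied to every team-to-team channel
    """
    within = sum(
        weights[medium] * n * (n - 1) / 2
        for n, medium in zip(team_sizes, team_media)
    )
    teams = len(team_sizes)
    between = cross_team_weight * teams * (teams - 1) / 2
    return within + between


# Placeholder weights for illustration only (NOT the survey results).
placeholder = {"verbal": 1.0, "email": 1.2, "video": 1.5}
print(weighted_effort([5, 5, 5], ["verbal", "email", "video"], placeholder, 2.0))
```

With every weight set to 1 the function reduces to the unweighted channel count, so three teams of five again give 33.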

Design and Implementation
As mentioned in the literature review in Section 2, a number of steps are involved in the GA operation. These steps are: random population generation, fitness evaluation, selection, reproduction and termination. The code was developed in the Python scripting language, which facilitated the testing of different models and scenarios.

Genetic Algorithm Pseudo-Code and Parameters
The following parameters are used: Σ represents the search space, µ is the population size, POP is the population of individuals, Ga is the new generation population, and γ is the maximum number of generations. The GA pseudo-code is shown in Algorithm 1.
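The loop that these parameters describe can be sketched in Python as follows. This is a minimal illustration and not a reproduction of Algorithm 1: the 6-bit encoding, tournament selection, single-point crossover, the clamping of decoded sizes, and the choice to ignore members left over after equal division are all assumptions, and the names `total_channels` and `run_ga` are hypothetical. Here `mu` plays the role of µ and `gamma` the role of γ.

```python
import random


def total_channels(team_size, total_members):
    """Unweighted channel count for equal teams; leftover members are ignored."""
    teams = total_members // team_size
    within = teams * team_size * (team_size - 1) // 2
    between = teams * (teams - 1) // 2
    return within + between


def run_ga(total_members, mu=20, gamma=30, mutation_rate=0.05, seed=0):
    """Search for the team size that minimizes the channel count."""
    rng = random.Random(seed)
    bits = 6                                # assumed encoding: 6-bit team size
    max_size = total_members // 2           # keep at least two teams

    def decode(chromosome):
        return min(max(int(chromosome, 2), 2), max_size)

    def fitness(chromosome):
        return total_channels(decode(chromosome), total_members)

    def select(population):
        # Tournament of two: the individual with fewer channels wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) <= fitness(b) else b

    # Initial random population POP of size mu.
    pop = ["".join(rng.choice("01") for _ in range(bits)) for _ in range(mu)]
    best = min(pop, key=fitness)

    for _ in range(gamma):                  # at most gamma generations
        new_gen = []
        while len(new_gen) < mu:
            p1, p2 = select(pop), select(pop)
            point = rng.randint(1, bits - 1)
            child = p1[:point] + p2[point:]
            child = "".join(
                ("1" if g == "0" else "0") if rng.random() < mutation_rate else g
                for g in child
            )
            new_gen.append(child)
        pop = new_gen
        best = min(pop + [best], key=fitness)   # remember the best seen so far

    return decode(best), fitness(best)


print(run_ga(100))
```

Because the search space here is small, the result can be compared directly with an exhaustive evaluation of `total_channels` over all candidate sizes.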

Teams are Equally Distributed
The self-organizing team tends to be multi-disciplinary. One team of five members could have a combination of one analyst, one tester and three developers. The following teams were tested on the assumption that the teams are equally distributed. Each of the following total team sizes was examined by the GA (Test 1): [50, 100, 120, 150, 200] total team members. The total number of project members depends on the project size. The research study aims to find the optimal self-organized team distributions. For a total below 50 members, the problem can be solved without the GA. No reference was found in the software industry to a team bigger than 200 members. The first test results are shown in Table 2.

Test 1 Results:
Outcome: The optimal team sizes for self-organizing teams that are equally distributed were found to be five, seven, or nine members per team, depending on the given total team size and the following notes.
Interpretation: The team size was found to range from five to nine members per team. The team size recommended by agile best practice is seven plus/minus two members.
Discussion:
-Considering the human factor, the team size is a discrete value. Some teams cannot be equally divided; therefore, the rounded figure was used in the GA to calculate the team distribution. For example, the 100-member test result shows that the team can be divided into 14 teams of seven members each. The result has a 2% error, as two team members were not assigned.
-Some of the resulting team sizes, six and eight, are even numbers. Agile practitioners do not recommend even-numbered teams, because votes can tie in disputes and decision-making; an even-numbered team is chosen only when a leader is assigned to the team [41].
-The test results did not improve after the GA reproduction stage in the second generation. Reviewing the results in Table 2 showed that all the possible answers already existed in the first population. This is due to the small solution space.
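The 100-member example can be reproduced with integer division. The sketch below assumes that the "rounded figure" corresponds to floor division, with the remainder counted as unassigned members; the function name `equal_split` is hypothetical.

```python
def equal_split(total_members, team_size):
    """Return (number of teams, unassigned members, unassigned percentage)."""
    teams, unassigned = divmod(total_members, team_size)
    return teams, unassigned, 100 * unassigned / total_members


print(equal_split(100, 7))  # (14, 2, 2.0)
```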

Teams are Randomly Distributed
In this section, the team distribution was generated unequally and at random, so that the decision is taken by the GA based on the total team fitness value. Equal teams might not be the optimal case. It is difficult to work out by hand the optimal random distribution of a team and its relation to the communication load constraint.
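Generating and scoring unequal distributions can be sketched as follows. This is a plain random search for illustration, not the GA used in the experiments: the size bounds, the rule that the last team takes whatever remains (so it may be smaller than the minimum), and the unweighted channel count used as the score are all assumptions, and the function names are hypothetical.

```python
import random


def total_channels(team_sizes):
    """Unweighted channels within each team plus channels between teams."""
    within = sum(n * (n - 1) // 2 for n in team_sizes)
    teams = len(team_sizes)
    return within + teams * (teams - 1) // 2


def random_distribution(total_members, min_size=3, max_size=12, rng=None):
    """Split total_members into teams of random size; the last team takes the rest."""
    rng = rng or random.Random()
    sizes, remaining = [], total_members
    while remaining > 0:
        size = min(rng.randint(min_size, max_size), remaining)
        sizes.append(size)
        remaining -= size
    return sizes


rng = random.Random(42)
candidates = [random_distribution(100, rng=rng) for _ in range(200)]
best = min(candidates, key=total_channels)
equal = [5] * 20

print(best, total_channels(best))
print(equal, total_channels(equal))
```

Comparing the best random candidate with an equal split of the same total gives a quick check of whether unequal teams reduce the channel count for a given project size.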

Evaluation
This project aims to find the optimal team size and distribution for a given total number of team members. In the test results, it was found that the team size can range from five to nine members per team, which matches the industry's best practice. The following research results were found to support this evaluation:
-In the research "an approach to optimizing software development team size", based on an analysis of data from 4000 projects [18], the software cost estimate was considered and a small team was recommended. The best team size was found to be seven.
-In an empirical study of the impact of team size on software development efforts, a team size of nine was found to be productive [37].
-Researchers have examined team size in high-performance teams. It was suggested that a team should not exceed 12 members; otherwise subgroups shall be considered to provide a cover of anonymity [32].
-The research in [29] referenced a NASA recommendation for the team size with a wide range from three to 19 members per team.
-In the "Experimental approach to organization development" [6], small team size was mentioned as a characteristic of the self-managed team, which can range from five to 15 members per team.

Conclusion and Future Work
In this research, the study of team productivity was limited to team size and communication characteristics. The number of communication channels between team members and between teams was used to calculate the communication load constraint. This calculation was used initially as a fitness function by the GA to find the optimal team size from a randomly generated solution space. The experimental results showed that teams of five to nine members are optimal in different cases. In some cases, equally distributed teams provided better results than randomly distributed teams. For small to medium projects of 70 members or fewer, smaller self-organizing teams of five, six and seven members were evaluated better than other team sizes. For larger projects of more than 120 members, larger self-organizing teams of seven, eight and nine members provided better results than other team sizes.
Following the results of this research, the proposed GA model opens the door for future work and improvements. In this research, manual input was used with few parameters, assuming that each organization will have a different situation. Therefore, it would be more productive to build an adaptive learning algorithm and expand the predictability of self-organizing team productivity based on team size and communication.
Additionally, more characteristics can be included.
With reference to the self-organizing team characteristics illustrated in Section 2, it is worth studying characteristics other than communication efforts. These might include, but are not limited to, users' experience, skills, and roles. This could lead to an improved fitness function and better results. Some researchers have studied the effect of project scope and the development efforts of functional requirements. Some researchers have also used the software size to estimate the required efforts and time, based on the COCOMO approach [4,46]. One possible integration is to add the team size to these estimates. Another is to revise the fitness function to find the optimal team size based on the software development cost estimate.
The GA model can also be improved by performing sensitivity analysis to optimize the decision variables and, if additional decision variables are considered, by using different encoding techniques.