Cognitive map plasticity and imitation strategies to improve individual and social behaviors of autonomous agents

Abstract
Starting from neurobiological hypotheses on the existence of place cells (PC) in the brain, this article aims to show how a few assumptions at both the individual and social levels can lead to the emergence of non-trivial global behaviors in a multi-agent system (MAS). In particular, we show that adding a simple Hebbian learning mechanism to a cognitive map allows autonomous, situated agents to adapt to a dynamically changing environment, and that even simple agent-following strategies (driven either by similarities in agent movement or by individual marks, called “signatures”, carried by agents) can dramatically improve the global performance of the MAS in terms of the survival rate of the agents. Moreover, we show that analogies can be drawn between such a MAS and the emergence of certain social behaviors.


Introduction
Swarm-based systems [1] are a classical approach to collective intelligence problems. In such approaches, newly gathered information is represented by physical traces (pheromones) left by agents in the environment. We develop an alternative approach in which information is stored internally in the agents, with no marking of the environment, but with the aim of obtaining similar emergent behaviors. Several works [2][3][4] discuss the possible use of special cells in the rat's hippocampus that fire when the animal is at a precise location. These neurons have been called "place cells". Starting from those neurobiological hypotheses on the existence of place cells (PC) in the brain, we show how agents can continuously learn during the exploration of their environment, and how they can share the information they have using a simple agent-following mechanism, seen as a low-level kind of imitation. In what follows, we refer to this agent-following capacity as "imitation". Imitation processes are usually divided into two levels. The action level of imitation [5,6] relates to the mechanisms involved in reproducing a simple action, often an elementary movement. The program level refers to the imitation of more complex actions while preserving their organizational level. The work presented in this paper mainly focuses on the first of those two levels.
This work lies at the intersection of several domains: we use a multi-agent system (MAS) to test hypotheses on both individual cognitive processes in simple agents and the emergence of non-trivial collective behaviors in some (sub-)groups of those agents. At the individual level, agents can rely on the on-line, continuous building of a cognitive map [2][7][8][9][10][11] whose structure depends on their own experience and discovery of the environment in which they live. At the global, population level, they can take advantage of the ability to imitate one another, using simple agent-following strategies to transmit parts of one agent's cognitive map to another's. This leads to a kind of natural distributed knowledge, similar to what can be achieved in swarm-intelligence systems, except that the shared knowledge does not use the physical space as a repository (as is the case for pheromones) but individual cognitive maps instead. Those maps are a subclass of topological maps, which are used in several contexts, from navigation [12][13][14] to image processing [15] and others. This process, though mainly illustrated here in navigation tasks, can thus be reused for a wide range of action selection problems.
Some of our previous papers [16][17][18][19] showed how to use a cognitive map to solve non-trivial, possibly contradictory goals at an individual level and to let social behavior emerge from a group of situated agents launched in a previously unknown environment. Similar works [20,21] use sensorimotor maps together with an additional field to distinguish between information about position and the activity-spreading mechanism on a cognitive map. These two fields play a role similar to that of the sensorimotor and planning maps in our architecture. The navigation and planning aspects described here are meant as parts of a more global and complex autonomous system, which ultimately aims at recognizing objects [22], imitating [23][24][25][26] and communicating information among agents. Other experiments in our lab showed that our model can successfully be used on actual robots navigating outdoors [27], but also learning sequences of actions, either from a human being or from another robot [25,26].
In the social sciences, and more precisely in economics, MAS are used to "formalize complex situations with various, spatial, temporal or organisational scales and heterogeneous agents engaged in social activities" [28]. Previous works on the formalization of bottom-up processes, rooted in the Santa Fe Institute, "Simulating Societies" [29], "Artificial societies" [30] and then "Growing up societies" [31], have raised interest in the application of MAS to economics. More generally, MAS can be seen as a tool to model macro and micro relationships such as the artificial stock market in Santa Fe [32], the complexity of exchange and market mechanisms [33] or the strategic behavior of agents [34]. In those contexts, agents are purely rational: they try to optimize a hardwired satisfaction function. Moreover, we know that individual behaviors are able to produce social and spatial phenomena that theory does not predict [35]. In social science, the Schelling segregation model is doubtless one of the first models of a dynamical system capable of self-organization. It shows that a small preference for one's neighbors to be of the same color can lead to spatial segregation. MAS also exhibit the emergence of urban patterns such as urban hierarchy [36], the resilience and persistence of urban settlement structures, and the emergence of polynucleated urban landscapes [37]. Whereas the process is quite similar (except that agents are immobile geographical entities in [36]), spatial organization and clusters emerge from the definition of different categories of agents: population size, activity, urban functions and range in [36], color in [38], ethnic specification in [39] or income level in [40], and spatial marks (such as ants in [37]). Nevertheless, in human geography too, agents are truly rational; only the choice criterion differs. For example, agents locate their home with regard to income and job accessibility [40], a convenient cultural environment [39], or a mild preference for having neighbors of their own color [38]; towns trade with regard to surplus and the creation of new functions [36]; and agents, following the tradition of swarm-based systems, leave physical traces in the environment that enable learning to occur and routines to emerge [37]. The spatial dimension is the consequence of the definition of the agents' basic characteristics and of location or moving processes on a lattice.
While human geography has engaged successfully with applications of agent-based systems, it has surprisingly omitted to integrate cognitive maps. Yet the definition of cognitive and mental map processes comes from geography and psychology, and previous works by Lynch [41] have shown that space is of fundamental interest for understanding individual routes, spatial routines or the geography of places. When this line of research was renewed in the nineties, partly through the development of computer science and interdisciplinary approaches [42], authors expected a better understanding of individual spatial behavior that could replace the simplistic homo economicus hypothesis [43]. Using cognitive maps to model economic agents enables us to live up to this expectation and to deal with limited and situated rationality. Our systems deal with two kinds of agents, reactive and cognitive [17]. Reactive agents are "naturally" rational in simple environments (where visual continuity allows simple gradient-following strategies to succeed), but fail as soon as those hypotheses are not met. Cognitive agents are not rational initially, but tend to become rational as their knowledge of the environment increases. Our aim is to show that this last kind of agent, endowed with a cognitive map, though only partially rational, can help to model complex social patterns for which simple reactive agents are of little or no help.
The next section is devoted to the description of our model of place cells (and transition cells [44], see Section 2), and to that of the agents. In particular, we show that using transition cells (TC) solves the important problem of unambiguously linking actions to places in the environment. The term was proposed by our lab, initially without actual biological justification, even though several recent biological recordings [45,46] can be interpreted as coming from such transition cells. Section 3 depicts the structure, role and construction of a cognitive map, then shows that adding a Hebbian learning rule on the weights of the connections in the cognitive map can help the agents adapt to a dynamically changing environment. Using such a map allows non-trivial behaviors, such as the ability to deal with contradictory goals [16]. Section 4 focuses on two agent-following strategies that can help to transmit information from one agent to others. We describe a signature-based mechanism to distinguish agents from one another, then compare a "blind" following strategy to the strategy based on the agents' signatures. We show that in both cases, simulations lead to the formation of subgroups, though these differ in structure and stability. As a consequence, the analysis of such subgroups may call spatial economic analysis into question. Section 5 discusses how to implement such a model in the field of spatial economics, whose main purpose is to study the importance of space in both individual and collective economic processes.

Model, material and method
In our experiments, software agents, or animats [47], are launched in an unknown environment. They are motivated by the simulation of three types of needs (hunger, thirst and stress) that can be contradictory. Each need can be satisfied by a corresponding resource or pool of resources, namely food, water and nest, that can be found in the environment. The level of each type of need is internally represented by an essential variable [48] whose value lies in [0, 1] and varies with time as in Equation 1, except when the agent reaches the resource, in which case the variable level is reset to 1.
In Equation 1, α represents the decreasing rate of the essential variable. When the variable falls under a given threshold, a planning behavior is triggered to go back to the (known) resource. Thus, changing α has an impact on the frequency of the visits to the resource: the higher α, the more frequent the visits. If the agent has found no resource of the corresponding type, the level decreases to 0 and the agent dies.
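As an illustration, the life cycle of an essential variable can be sketched as follows. This is a minimal sketch, not the authors' implementation: the class and attribute names are hypothetical, and the linear decay used here is an assumption standing in for Equation 1, which may follow a different law.

```python
from dataclasses import dataclass


@dataclass
class EssentialVariable:
    """One need (hunger, thirst or stress) whose level is kept in [0, 1]."""

    alpha: float        # decreasing rate per time step
    threshold: float    # below this level, planning is triggered
    level: float = 1.0  # 1.0 means fully satisfied

    def step(self, at_resource: bool) -> None:
        """Advance one time step."""
        if at_resource:
            # Reaching the matching resource resets the level to 1.
            self.level = 1.0
        else:
            # ASSUMPTION: linear decay at rate alpha (stand-in for Equation 1).
            self.level = max(0.0, self.level - self.alpha)

    @property
    def needs_planning(self) -> bool:
        """True when the level has fallen under the threshold."""
        return self.level < self.threshold

    @property
    def is_dead(self) -> bool:
        """True when the level has reached 0."""
        return self.level <= 0.0
```

With such a model, a larger `alpha` makes the level cross the threshold sooner after each reset, which is why the resource is visited more frequently.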
The environment is gradually discovered by the agent during random exploration phases. At each time step, the agent receives information about the visible portion of its surrounding environment. This information is made of pairs of "what" and "where" information. "What" relates to the recognition of a local view centered on a point of interest (corner, end of line, etc.). "Where" represents the azimuth (angle) under which each point is seen by the agent, relative to North. The perceivable points of interest are as follows:
• landmarks: fixed remarkable points (thus a particular location is given by a set of landmark/azimuth pairs). Those points are defined during the design of the environment and are not subject to change during an experiment;
• obstacles: locations that the agent cannot cross, and which prevent it from seeing what is beyond;
• resources;
• other agents.
Landmarks may be seen from anywhere unless occluded by an obstacle. All other elements are only detected within a given detection range. Indeed, if resources were visible from far away, there would be no need for any planning behavior. Similarly, the choice to follow another agent is restricted to the closest agents. Finally, obstacles are detected using proximity sensors (equivalent to infra-red sensors, for instance). In order to simplify the model, the three ranges described above are merged into a single "visibility range". Figure 1 shows a typical environment. There is no Cartesian map of the environment, nor any use of a square paving, as in some other planning techniques (Q-learning among others). Depending on its internal state, the agent can adopt several different behaviors, which Figure 2 sums up. More precisely, in decreasing order of priority:
1. If the agent encounters an obstacle on its path, it triggers an obstacle avoidance strategy, based on the principle of Braitenberg vehicles [49].
2. If one of the essential variables falls below its minimum threshold, and the agent has already discovered at least one matching resource, it triggers a planning strategy to return to the known resource. Depending on the cognitive level of the agent, this strategy can be a simple gradient following or, as we describe later in this paper (see Section 3), it can use a cognitive map.
3. If other agents are in sight, it can choose to follow one of them. The decision is probabilistic, and the choice of "who to imitate" depends on the imitation strategy (see Section 4).
4. Lastly, the default behavior is a random exploration of the environment.
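The priority ordering of the four behaviors listed above can be sketched as a simple cascade of tests. The names `AgentState` and `select_behavior`, and the string labels returned, are hypothetical and only serve the illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class AgentState:
    """Hypothetical summary of what the agent perceives and needs."""

    obstacle_ahead: bool
    urgent_need_with_known_resource: bool
    agents_in_sight: int
    imitation_probability: float  # chance of deciding to follow someone


def select_behavior(state: AgentState, rng=random) -> str:
    """Return the behavior with the highest priority that applies."""
    # 1. Obstacle avoidance always comes first.
    if state.obstacle_ahead:
        return "avoid_obstacle"
    # 2. Planning: a need is below threshold and a matching resource is known.
    if state.urgent_need_with_known_resource:
        return "plan_to_resource"
    # 3. Imitation: probabilistic decision when other agents are visible.
    if state.agents_in_sight > 0 and rng.random() < state.imitation_probability:
        return "follow_agent"
    # 4. Default behavior.
    return "explore_randomly"
```

Because the tests are evaluated in order, a lower-priority behavior is only reached when every higher-priority condition is false.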
The place cell paradigm allows a given PC to code for a given location in the environment, and to fire only when the agent is at (or very near) this location. Equation 2 defines the activity $A_k$ of place cell $k$ at a location where landmark $i$ is viewed under azimuth $\theta_i$. In this equation, $K$ is the number of known landmarks, $l_k = \sum_{i=1}^{K} \omega_{ik}$ is the number of landmarks used for (i.e., visible from) PC $k$, $\omega_{ik} \in \{0, 1\}$ expresses the fact that landmark $i$ has been used to encode PC $k$, $\theta_{ik}$ is the learned azimuth of landmark $i$ for PC $k$ and $\lambda$ is a normalization constant. A similar model has been tested on mobile robots, in indoor and outdoor conditions [51].
As the distance to the location of maximum response increases, the activity of the PC decreases. While the PC activity is above a given minimum recognition threshold, the agent is said to be in the "place field" of the winning PC (the one with maximum activity). When the agent leaves every learned place field, the current location is learned on a newly recruited cell. This way, randomly exploring the environment for a long time leads to a fairly complete "paving" of the environment with place cells. Figure 3 shows three steps in the paving process, in the case of a simple, obstacle-free environment. This PC neural layer gives the agent a way to localize itself in the environment.
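The recruitment mechanism can be sketched as follows. This is only a sketch under stated assumptions: the `activity` function is a hypothetical stand-in for Equation 2 (a linear falloff with angular error, averaged over the learned landmarks), and all function names are illustrative. Only the winner-take-all step and the recruitment of a new cell when no activity exceeds the threshold follow the description above.

```python
import math


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def activity(learned: dict, perceived: dict) -> float:
    """ASSUMED stand-in for Equation 2: 1.0 when every learned azimuth
    matches the perceived one, decreasing as the angular errors grow."""
    shared = [lm for lm in learned if lm in perceived]
    if not shared:
        return 0.0
    score = sum(
        1.0 - angular_difference(learned[lm], perceived[lm]) / math.pi
        for lm in shared
    )
    return score / len(learned)


def localize(place_cells: list, perceived: dict, threshold: float) -> int:
    """Return the index of the winning place cell, recruiting a new cell
    when the agent is outside every learned place field."""
    best, best_activity = None, 0.0
    for index, learned in enumerate(place_cells):
        act = activity(learned, perceived)
        if act > best_activity:
            best, best_activity = index, act
    if best is None or best_activity < threshold:
        # No cell recognizes the view: learn it on a newly recruited cell.
        place_cells.append(dict(perceived))
        return len(place_cells) - 1
    return best
```

Here a place cell is simply a dictionary mapping landmark identifiers to learned azimuths (in radians), and `perceived` has the same shape for the current view.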
Using only place cells for localization leads to several drawbacks related to the planning model and to the action selection mechanism. Both will be described further, but to understand the problem, and why it can be solved using transition cells, let us briefly describe a simple PC-based action selection mechanism: to define a sensorimotor unit, one can associate an action with a given place. But then, the model cannot deal with situations where several actions can be associated with the same place. The problem is mentioned in [20], whose authors handle it by associating the motor command with a pair of sensorimotor units, but do not use transition cells. In Figure 4, if food is placed in the left arm of the T-maze ("C") and water in the right arm ("D"), then PC "B" should be associated with two different movements ("turn left" and "turn right"), depending on the motivation (hunger or thirst), which is not feasible. Transition cells are inspired by a neurobiological model of timing and temporal sequence learning in the hippocampus [10,52,53]. A transition cell codes for a spatio-temporal transition between two PCs that successively win the competition, at two consecutive times. Using TCs instead of PCs solves the problem: "turn left" can be associated with transition BC and "turn right" with transition BD. One can legitimately wonder about the growth of the number of TCs, relative to that of PCs, during the exploration of the environment, since each TC links a pair of PCs and their number could in principle grow much faster than that of PCs. Experiments [54] show that this is not the case, and that the number of TCs stays proportional to that of PCs (with a branching factor of approximately 5). This result was predictable, as the number of neighbors of a given PC is limited by construction, and as one TC only links two successively reached PCs.
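The T-maze example can be made concrete with a small sketch in which actions are keyed by a transition (an ordered pair of place cells) rather than by a single place. The class `TransitionCells` and its methods are hypothetical names introduced for the illustration.

```python
class TransitionCells:
    """Associates an action with a transition between two place cells."""

    def __init__(self):
        # Maps (origin PC, destination PC) to the action that realizes it.
        self.actions = {}

    def learn(self, origin: str, destination: str, action: str) -> None:
        """Recruit (or update) the TC for two successively winning PCs."""
        self.actions[(origin, destination)] = action

    def action_for(self, origin: str, destination: str):
        """Action attached to the transition, or None if it is unknown."""
        return self.actions.get((origin, destination))

    def destinations_from(self, origin: str) -> list:
        """Place cells reachable in one transition from the given one."""
        return [dst for (src, dst) in self.actions if src == origin]


# T-maze of Figure 4: food in the left arm "C", water in the right arm "D".
maze = TransitionCells()
maze.learn("A", "B", "go forward")
maze.learn("B", "C", "turn left")
maze.learn("B", "D", "turn right")
```

Place "B" now takes part in two distinct transitions, so the choice between "turn left" and "turn right" reduces to choosing between transitions BC and BD according to the current motivation.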
The two following sections describe experiments made using several types of environments, differing in size, presence or absence of obstacles, static or dynamic resources, and so on. However, they share common features and parameters. In all of them, the need for a resource is dictated by the level of an essential variable that falls below the minimum "comfort" level and triggers planning. The only encoded information consists of the locations and resources discovered during random exploration.
The environment is discretized into square portions. Typical sizes range from 30×30 squares to 200×200 squares. The agent speed is constant; the time needed to go from the upper-left corner to the lower-right one is approximately 150 time steps for a small environment (40×40 squares). The lifetime of agents depends primarily on the value of α, the decreasing rate of the essential variable levels. Typically, for a small environment, $\alpha = 10^{-3}$ makes agents die after about 1500 time steps if they have not found the minimum set of resources. The time needed to almost completely "pave" such a small environment in the cognitive map is approximately 20,000 to 30,000 time steps. The quantity of visual information usable by the agent depends on the visibility radius, counted in squares, typically between 5 and 10.

Figure 5. Simplified view of a cognitive map based on transition cells. The motivation system modulates the activity of neurons at the planning level, thus allowing the appropriate action to be chosen at the recognition level: (1) place recognition using PCs, (2) activation of one PC and delayed activation of another allows for transition prediction, (3) the need for a resource triggers maximum motivation for a given goal (Equation 3), (4) motivation is diffused along the cognitive map using Equation 4, (5) combining transition prediction and motivation gives the recognized transition, and (6) the action (movement) corresponding to this transition is selected.

Individual learning
Place cells are sufficient to make the agent return to a previously discovered resource when the need arises. Indeed, the resource location can be coded on a particular PC; then, by computing a distance (using the perceived azimuth vectors) between each neighbor of the current location and the target location, the agent can choose the neighbor that is "nearest" to the target location. This way, a simple gradient-following strategy can be used to return to the resource location. However, this is only true for simple environments. When the environment becomes more complex (namely, in the presence of obstacles, as has been shown in [17]), or when multiple goals must be achieved simultaneously, the choice of the path to follow can no longer rely on simple sensorimotor associations: the agent needs more information. This information can be provided by a cognitive map. Such a map is built simultaneously with the creation of place (transition) cells, by linking together two PCs (TCs) reached successively. This linking process leads to the formation of "paths" in the cognitive map that allow the agent to find its way through a complex environment to retrieve the previously discovered resource. By associating this map with a motivational system, we provide the tool needed to cope with those complex situations.
The motivational system consists of a modulation of the neuron activity by a motivation activity. When the need for a given resource arises, the motivation triggers the activation of the appropriate neurons (every TC whose destination is the goal location) in the cognitive map (see Figure 5). The activity is then diffused along the graph (using a Bellman-Ford-like [55] algorithm, see also [20]) until it reaches the TC whose origin codes for the actual location of the agent (see Figure 6). Equation 3 rules the motivation value for the goal TC when the level of an essential variable is below its minimum threshold. In this equation, $C$ is a constant in ]0, 1[, and the motivation depends on the minimum threshold of the essential variable (the value below which motivation triggers the activity of the TC) and on its current value (assumed to be less than the threshold for activation to take place). The goal TC activation is then propagated to the neighbors. Equation 4 shows that the activity $A_i$ of TC $i$ at the planning level of the cognitive map depends exponentially on its distance (counted in number of links) to the goal:
$$A_i = \max_{p \in P(i,g)} \left( A_g \prod_{(j,k) \in p} w_{jk} \right) \qquad (4)$$
Here, $P(i,g)$ represents the set of paths leading from the current location $i$ to $g$, the goal TC, $p$ is an element of this set, $A_g$ is the activity of the goal TC, and $w_{jk}$ is the weight (in ]0, 1]) of link $jk$, for each link between two successive TCs belonging to $p$. This way, the activity of a TC can be viewed as a measure of "proximity" to the goal: the agent goes from a given TC to the one whose activity is maximum. By default, all transition weights are constant (0.9 in Figure 6).
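The diffusion of motivation can be illustrated by a Bellman-Ford-like relaxation in which each cell keeps the largest product of link weights found so far along a path to the goal. The sketch below rests on assumptions: the function names are hypothetical, the map is reduced to a dictionary of weighted links between generic cells, and taking the maximum over paths is our reading of the text rather than a confirmed detail of the model.

```python
def diffuse_motivation(links: dict, goal: str, goal_activity: float = 1.0) -> dict:
    """Propagate the goal activity backwards through the map.

    links maps (source, destination) pairs to a weight in ]0, 1].
    Returns the activity of every cell from which the goal is reachable.
    """
    activity = {goal: goal_activity}
    nodes = {node for link in links for node in link} | {goal}
    # Bellman-Ford-like relaxation: at most len(nodes) - 1 sweeps are needed.
    for _ in range(len(nodes) - 1):
        changed = False
        for (src, dst), weight in links.items():
            if dst in activity:
                candidate = activity[dst] * weight
                if candidate > activity.get(src, 0.0):
                    activity[src] = candidate
                    changed = True
        if not changed:
            break
    return activity


def next_cell(links: dict, activity: dict, current: str):
    """Choose the neighbor of the current cell with maximum activity."""
    options = [dst for (src, dst) in links if src == current and dst in activity]
    return max(options, key=activity.get) if options else None
```

With constant weights, the activity of a cell decreases geometrically with the number of links to the goal, so climbing the activity gradient follows the shortest known path; lowering or raising individual weights makes some paths look longer or shorter than they are.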
Our system aims at making agents evolve in a dynamically changing environment, where some resources can disappear when intensively visited for a long time, and others can randomly appear somewhere in the environment. The kind of map described above is fixed, in the sense that the connections between neurons, and their weights, do not change. Hence, when the environment changes (a door opens or closes, a resource disappears, etc.), the map may no longer be accurate: it is necessary to be able to modify the links and their weights, and to create or remove neurons when appropriate. We thus added a learning rule on the building and evolution of the cognitive map to help the agent acquire smarter behaviors, for instance solving contradictory goals. In Equation 5, which rules this Hebbian learning algorithm [56], $w_{ij}(t)$ is the weight of the link joining two successively reached cells $i$ and $j$ in the cognitive map, $A_i(t)$ is the activity of cell $i$ at time $t$, and $\delta_{ij}$ is 1 if link $ij$ is fired at time $t$ and 0 otherwise. By making the $\delta$ function more complex, one can even enforce behaviors with different timescales (for instance, in order to have the agent avoid dangerous areas [57]). Previous works [16,58] showed that such a cognitive map enables agents to exhibit "smart" behaviors, such as choosing to go to a food resource located far away, although there are some closer to its current location, because it will find water there too: everything looks as if the agent anticipates a future need for water, whereas no structure allows it to know that it will be thirsty in the near future. Simply, the links in the cognitive map leading to the pair of resources are reinforced more often (each time the agent eats or drinks), leading the agent to see this path as shorter than it really is and thus to choose it.
When a planning agent tries to reach a previously known resource and realizes that it has disappeared, two things happen: (i) the agent dissociates the current PC from the formerly corresponding resource, and (ii) it sets the motivation to 0. Since the PC no longer fires when the agent feels the need for this resource, the transitions leading to this place are likely to be progressively forgotten. Similarly, when a new, matching resource is discovered, the paths leading to the resource are rapidly reinforced, making the cognitive map evolve synchronously with the environment. This evolution is illustrated in Figure 7, where the left snapshot is taken when the dynamic food resource has not yet expired (t = 10,000 time steps), whereas the right picture represents the map after the dynamic resource has expired and the agent has discovered a new matching resource elsewhere in the environment (t = 25,000 time steps). In the experiments reported here, with respect to Equation 5, we set $\alpha = 5 \times 10^{-2}$ and $\lambda = 10^{-6}$. We can see that some of the paths leading to the old resource location have been partially forgotten, and that new paths have emerged.
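The interplay between reinforcement and forgetting can be illustrated with a toy update rule. The form used below is an assumption made for the illustration and is not Equation 5: it treats α as a learning rate applied when the link is fired and λ as a passive decay applied at every step, and it ignores the cell activities that the actual rule involves. The function name `update_weight` is hypothetical.

```python
def update_weight(w: float, fired: bool, alpha: float = 5e-2, lam: float = 1e-6) -> float:
    """One step of an ASSUMED Hebbian-like rule with passive forgetting.

    w is the current link weight, fired tells whether the link was used
    at this time step. The result is kept within [0, 1].
    """
    # Reinforcement pushes the weight toward 1 only when the link is fired.
    reinforcement = alpha * (1.0 - w) if fired else 0.0
    # Passive decay slowly erodes every weight, used or not.
    new_w = w + reinforcement - lam * w
    return min(1.0, max(0.0, new_w))


# A link that is used regularly strengthens quickly...
w_used = 0.5
for _ in range(100):
    w_used = update_weight(w_used, fired=True)

# ...whereas an unused link fades very slowly.
w_unused = 0.5
for _ in range(100):
    w_unused = update_weight(w_unused, fired=False)
```

Under this assumed form, the large gap between the two constants means that paths to a live resource are reinforced within a few visits, while paths to a vanished resource are only forgotten over many thousands of time steps.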

Collective learning
Experience gathered by individual agents cannot benefit others if there is no means to communicate information between agents. One way to add knowledge transmission among agents is to make them able to imitate one another. Several benefits can be expected from an imitation capability, among which the possibility to share partial knowledge of the environment, resource locations and so on. In particular, we showed in [17] that adding an imitation capability can dramatically enhance the survival rate of a population, as Figure 8 sums up. In the experiment shown here, the environment is the one depicted in Figure 1. First, ten agents are launched; we wait for stabilization (those that found the three resources will remain alive indefinitely and serve as potential "teachers", the others are dead), then successively launch ten other agents, one at a time, to make sure "students" will only imitate "teachers". The experiment has been repeated 15 times and its averaged results are presented in the figure.
When an agent happens to see one or more other agents, it can choose to imitate one of them. The decision is probabilistic; the initial minimum threshold is an input parameter of our simulations, and its value decreases as the agent gets older. We implemented and studied two simple imitation strategies, one based on the azimuth under which potential "teachers" are perceived by the agent, the other based on signatures. In both cases, it is important to mention that the imitated agent is never aware of being imitated, and that there are no predefined "teacher" or "student" roles: the same agent can be in both situations (if it chooses to imitate one agent while it is itself followed by another), either at the same time or at different moments of its life.
If agents are indistinguishable from one another, we have to face several problems: for instance, no agent can be sure to follow the same agent when encountering a whole group; moreover, statistical results cannot separate individuals and detect whether one agent spends most of its time in the vicinity of another. To solve those problems, we decided to add an individual signature to each agent. This signature evolves over time, as agents meet each other and one starts to follow the other.
Figure 8. Survival rate of a population of agents with (red curve with diamonds) and without (blue curve with squares) imitation capability. The x axis represents the number of agents launched in the environment, the y axis the number of surviving agents (the ones that succeeded in finding all needed resources).

We chose to design the signature as a short integer coding for a vector of two eight-bit coordinates, simply to be able to map it onto a space isomorphic to the geographical space (see Figure 9, where agents appear as triangles and signatures as diamonds), which makes it easier to follow the evolution of signatures, on the map and in time. Signatures are thus part of the information perceived by an agent, just like points of interest or obstacles. When a new agent appears, its initial location is chosen randomly and its initial signature is the vector of this location: it represents the agent's "place of birth". Other kinds of signatures could be used (not necessarily related to physical space). The interest of this coding is that it allows an intuitive (geometrical) representation of the signature. The evolution of signatures is ruled by the agents' meetings with other agents: each time an agent decides to imitate another agent, its signature changes slightly to get closer to that of the imitated agent. We can expect two consequences from this dynamic evolution of signatures. On the one hand, it helps to detect the formation of subgroups (and to follow the evolution of such subgroups) of agents in which the members imitate each other more often than they imitate members of other groups. On the other hand, agents can adopt a strategy based on signatures to choose which agent to imitate: they can choose the one, in their vicinity, whose signature is closest to their own, assuming that since signatures are similar, habits (place to live) are similar too and this agent's goals are "close" to theirs. In the equation that describes the variation of signatures, $S_i(t)$ and $S_j(t)$ are respectively the signatures of the imitating agent $i$ and of the imitated agent $j$ at time $t$, and $\delta_i$ represents the facility for agent $i$ to imitate (it is a decreasing probabilistic function of the age of agent $i$: the older the agent, the less probable the imitation). To avoid a global convergence to a unique signature, noise is systematically added to each agent's signature at each time step: $\nu(t)$ is a random "noise" vector whose two coordinates are randomly chosen in $\{-1, 0, +1\}$ (a kind of Brownian movement), so that the distance between $S_i(t + 1)$ without noise and $S_i(t + 1)$ with noise can never be more than $\sqrt{2}$.
Indeed, without noise, the basic mechanism described by this simplified equation only brings signatures closer to each other: it is thus likely that, after a certain amount of time, random agent meetings would lead to a global, common agent signature. This mechanism is similar to the Kohonen LVQ algorithm [59]. Steels' experiments in the Talking Heads project showed the relevance of introducing such a simple vocabulary, but one can wonder about the robustness of the signatures. Indeed, within a given subgroup it is not unlikely that two agents end up with similar signatures; but in our context this is not a problem, since all we expect from the agents' signatures is a help in detecting the formation and following the evolution of subgroups.
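The signature dynamics can be sketched as follows. The update form used here, a move of a fraction `delta` toward the imitated signature plus a noise vector drawn from {-1, 0, +1} on each coordinate, is an assumption consistent with the description above rather than the exact equation; the clamping to the eight-bit range and the function names are likewise choices made for the illustration.

```python
import math
import random


def update_signature(own, imitated, delta, rng=random):
    """Return the next signature of an agent as a pair of integers.

    own and imitated are (x, y) pairs of eight-bit coordinates; imitated
    is None when the agent imitates nobody at this time step.
    ASSUMED form: own + delta * (imitated - own) + noise.
    """
    target = imitated if imitated is not None else own
    noise = (rng.choice((-1, 0, 1)), rng.choice((-1, 0, 1)))
    return tuple(
        # Clamp each coordinate to the eight-bit range 0..255.
        min(255, max(0, round(o + delta * (t - o)) + n))
        for o, t, n in zip(own, target, noise)
    )


def closest_signature(own, candidates):
    """Identifier of the visible agent whose signature is nearest to own.

    candidates maps agent identifiers to their signatures.
    """
    return min(candidates, key=lambda agent: math.dist(own, candidates[agent]))
```

Since each noise coordinate has magnitude at most 1, the noisy and noiseless updates differ by at most $\sqrt{2}$ in Euclidean distance, and the noise keeps signatures drifting even when no imitation takes place.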
In the case of imitation based on the azimuth, the choice of whom to imitate is ruled as follows: if a single agent is visible, it becomes the chosen imitation target; if several agents are visible, then the chosen target is the one closest to the direction in which the agent is currently heading (that is, the one whose perceived azimuth is closest to 0 modulo 2π). The main problem we encountered with this simple imitation scheme is deadlocks due to reciprocal imitations. When two agents came face to face, they sometimes both began to imitate each other, leading to a blocking situation: each of them started to follow the other, so they changed direction at the same time and in opposite directions! In order to avoid, or at least lessen, the reciprocal imitations that lead to such deadlocks, only agents whose azimuth is less than or equal to π/2 (in absolute value) compete. This way, when two agents are just in front of each other at time $t$, they cannot see each other any more at time $t + 1$, so this deadlock can no longer occur, at least not in such a simple form (a deadlock could still occur if many agents are involved). Such problems illustrate the kind of local dynamics that can emerge from even simple imitation mechanisms (here, the creation of a moving loop of agents following each other until death).
When the target teacher gets out of sight, the student stops imitating but the process can start again later.
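The azimuth-based choice described above can be sketched as follows, assuming that azimuths are expressed in radians relative to the agent's current heading. The function names are hypothetical.

```python
import math


def wrap_angle(angle: float) -> float:
    """Map any angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def choose_by_azimuth(visible: dict):
    """Pick the visible agent closest to the current heading.

    visible maps agent identifiers to the azimuth under which each agent
    is perceived. Only agents within pi/2 of the heading compete; None is
    returned when no agent qualifies.
    """
    deviations = {agent: abs(wrap_angle(az)) for agent, az in visible.items()}
    candidates = {a: d for a, d in deviations.items() if d <= math.pi / 2}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)
```

Wrapping the angle before taking its absolute value makes an azimuth slightly below 2π count as a small deviation to one side of the heading, which is what "closest to 0 modulo 2π" requires.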
In the case of imitation based on the signature, the choice of who to imitate is ruled by computing the Euclidean distance between the vector coding the "student" signature and each of the potential "teachers"' vector, and choose that with the minimum distance.Notice that, since the decision to imitate leads to a modification of the signature of the imitating agent, which gets slightly closer to that of the imitated agent, one can expect the formation of groups of agents having somewhat similar signatures.In the experiments reported below, we compared the two imitation strategies (plus a purely random choice strategy) from two points of view: (i) the emergence of subgroups in a "multi-villages"3 environment, and (ii) the survival rate of a population.To study the influence of the imitation strategy on the emergence of subgroups, we used the environment shown in Figure 9, containing two separate "villages".In such a configuration, one can expect two types of behaviors: either the agent sticks to a village it found, or it "oscillates" between resources belonging to two or more villages.These two behaviors are illustrated in Figure 10, which represent the density of presence of the agent in the environment after 5,000 time steps (we removed the vertical "wall" separating the villages to record the agent positions).To differentiate between the two imitation strategies described above, for each experiment, we launched 50 agents randomly in the environment and waited for 20000 time steps (which is approximately the time needed for the agents cognitive map to tile the whole environment).Then we studied the set of signatures to determine the number of subgroups formed.We repeated the experiment 42 times, 21 times with each of the two imitation strategies, and reported, for each experiment, the number of subgroups that have emerged, based on a minimization of the intragroup standard deviations.We also made the same experiments using a random-based technique to choose the 
agent to follow. The results are reported in Table 1.
Table 1. Influence of the imitation strategy on the formation of subgroups (2 villages). We report here the number of experiments that led to the emergence of 1, 2 or 3 subgroups, when imitation is random, based on the azimuth (left column) or on the signature (right column). Each experiment is made with 50 agents running during 20,000 time steps. Columns: # experiments, random, azimuth, signature.
As expected, the possibility to distinguish individual agents leads to a more stable way to choose which agent to imitate (which explains why random selection and selection based on the azimuth both lead to similar results), and thus to a larger number of subgroups. Indeed, signatures can be thought of as a membership sign, and can be analysed as a support to "remember" whom the agent has already imitated, and whom it is willing to imitate in the near future. When there is no signature, there is no "past", no history for the imitation decision. What is more interesting is that, whereas the number of subgroups almost never exceeds the number of villages when agents imitate randomly or from the azimuth, we found several cases where imitation from signatures leads to such a situation. This is a clear indication of the greater stability of the groups in this case: the probability for an agent to imitate someone who is not part of its own group is lower, due to the distance between the two signatures. Moreover, in the cases where three subgroups emerge, two of them are related to a village and one is made of individuals that "oscillate" between the two regions. It is likely that, if the number of villages increases, the number of different subgroups will grow more rapidly, but the experiments verifying that assertion are yet to be made. The probability of such oscillation is much lower in the case of imitation based on the azimuth: suppose that two "oscillating" agents $a_1$ and $a_2$ arrive, following $a_3$, near a village where $k$ agents "live".
The probability for $a_1$ (or $a_2$) to keep on imitating $a_3$ is roughly $1/(k+1)$, so it is very likely that either $a_1$ or $a_2$ (or both) will remain near the village.
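The signature-based choice of a teacher described in this section can be illustrated with a minimal sketch. It assumes two-coordinate signatures; the function names and the update rate `rate` are hypothetical and are not taken from our implementation, which only requires that the student signature gets slightly closer to that of the teacher.

```python
import math

def choose_teacher(student_sig, teacher_sigs):
    """Index of the visible teacher whose signature is closest
    (Euclidean distance) to the student's signature."""
    return min(range(len(teacher_sigs)),
               key=lambda i: math.dist(student_sig, teacher_sigs[i]))

def imitate(student_sig, teacher_sig, rate=0.1):
    """Move the student signature a fraction `rate` of the way toward
    the teacher signature (`rate` is an assumed, illustrative value)."""
    return tuple(s + rate * (t - s) for s, t in zip(student_sig, teacher_sig))

# Illustrative signatures only; they do not come from the experiments.
student = (0.2, 0.3)
teachers = [(0.9, 0.8), (0.25, 0.35), (0.6, 0.1)]

idx = choose_teacher(student, teachers)
new_student = imitate(student, teachers[idx])
```

Because each imitation step reduces the distance to the chosen teacher, the same teacher (or a member of the same subgroup) tends to be chosen again, which is consistent with the greater group stability observed with signatures.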
It is easy to see why imitation on signatures leads to a number of subgroups greater than or equal to the number of villages: oscillating agents are likely to create their own subgroup. To show that in more detail, we can sketch a proof by induction: let $n$ be the number of villages and $s$ the number of subgroups, and let us try to establish that $s \geq n$. The property is clearly true for $n \in \{1, 2\}$: one village always leads to one subgroup, and two villages to two or three subgroups (oscillating agents, if any, tend to stick to the same subgroup, going from one village to the other). Let us assume the property is true for $n$ villages [...] Figure 11 shows an experiment with 4 villages, where 5 stable subgroups appear. The fact that the number of subgroups is not that high when agents imitate on the azimuth is due to the lower stability of subgroups in that case: agents arriving in a village are likely to follow other agents that stick to the village (with the same probability as to continue their oscillation), so oscillating subgroups are unstable over time.
To study the influence of the imitation strategy on the survival rate of the populations, we used the same environment and successively launched 40, 50, then 70 agents, for both of the imitation strategies, and counted the number of agents that survived, or died for not having found all three types of resource. Each experiment was conducted 7 times, and the results are summarized in Table 2. The average number of surviving agents is almost always greater with the signature-based imitation strategy. This result is linked to the greater stability of the groups with such a strategy. If one member of a group has found the three resource types, the probability that all the members of the group find the three resources is higher (indeed, the probability that each agent of the group imitates another is greater). What may seem more curious is that the standard deviation is significantly higher.
Observing what happens during those simulations, we saw that this was due to the creation of some subgroups around one or two agents that had not found the three resource types at the time the group emerged.
Consequently, since the chance for an agent to follow its peers is high, the result is of the "all or nothing" type: either the resources are discovered by one of the group members, and the whole group survives, or they are not and the whole group disappears.
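The effect of such "all or nothing" outcomes on the standard deviation can be illustrated with a short worked computation. This is only a sketch under simplifying assumptions that are not part of our experiments: the population is split into groups of equal size, each group survives or disappears as a block, independently of the others, with the same probability. The function name and all numeric values are hypothetical.

```python
import math

def survivors_std(n_agents, group_size, p_success):
    """Standard deviation of the number of survivors when agents survive
    or die in blocks of `group_size`, each block surviving independently
    with probability `p_success`.
    Assumes n_agents is a multiple of group_size."""
    n_groups = n_agents // group_size
    # One block contributes group_size survivors with probability p, else 0,
    # so its variance is group_size^2 * p * (1 - p).
    var_per_group = group_size ** 2 * p_success * (1 - p_success)
    return math.sqrt(n_groups * var_per_group)

# Illustrative values only (not measured in the simulations).
independent = survivors_std(50, 1, 0.8)   # every agent on its own
grouped = survivors_std(50, 10, 0.8)      # five blocks of ten agents
```

Under these assumptions the expected number of survivors is identical in both cases (the population size times the success probability), but the standard deviation grows with the square root of the group size, which matches the qualitative observation that tightly linked groups make the outcome more variable.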

Discussion
The kind of complex behaviors illustrated in this article explains the interest in cognitive maps and imitation in the social sciences, and more precisely in the field of spatial economics.
Concerning the agents' imitation ability, the group behavior observed in Table 2 is related to a process observed in spatial economics, known as "unemployment traps": these are well-defined portions of urban territories in which the unemployment rate is significantly higher than anywhere around [60]. Although it might be possible for people living there to find a job a few kilometers away, everything happens as if people could not, or did not try to, move outside this small region. In our simulations, although the needed resource is within reach of the group, it is only seldom discovered because of the strength of the intra-group link.
Using a cognitive map is a way to store information internally in agents and, together with the ability to imitate, to obtain emergent behaviors similar to those of swarm-based systems. It questions the analysis of the resilience of urban systems and the complexity of cities [37,61]. The results presented here show that the cognitive map is a way to envisage broad-minded agents that enable us to deal with a series of spatial economics issues. They challenge spatial economists, as the issue is not only the analysis of collective spatial dynamics but also a better understanding of the agents' behavior. While classical economics generally considers agents as rational, cognitive maps raise the interest of working with cognitive agents instead. As an example, in spatial economics, the aim is to minimize distances or costs to one or several objectives that are previously known by the agent. As a consequence, when maximizing an objective function, agents are considered to have a substantive rationality. In most cases, this optimization is incompatible with the limited cognitive capacities of the agents. Agents do not have complete and perfect information: they have a bounded rationality [62], a procedural knowledge that enables them to discover new places and new objective locations, and to learn new paths to reach a known objective or to satisfy a multi-objective function. In this respect, spatial economics supposes that the question of the knowledge of places, of objective locations, paths and obstacles can be thought of as a matter of geography and regional planning. Indeed, classical economics generally considers that space is an additional index added to economic variables, so space is not considered as an important variable in individual and collective analysis: the spatial configuration of places or roads would only imply a marginal modification of the agents' rational strategies.
Works in cognitive science show that spatial cognition, and cognitive maps, take part directly in the effectiveness of individual strategies. Environmental psychologists and researchers in human geography had the intuition of this result. Quoting from Portugali's paper [43], "mental studies would provide a deeper understanding of human spatial behavior and as such would replace the simplistic assumption of a rational 'economic man' which underpinned the theory of location and spatial analysis". Indeed, works in cognitive science teach us that each agent has its own procedural knowledge of space, places, objectives and possible paths to reach a previously learned objective. Each agent is equipped with a cognitive map that its experience, or its past discovery and learning of places and paths, has enlarged with time. As a consequence, simulations of cognitive maps are a way to show the importance of space in individual and collective intelligence processes. A main result of cognitive map models in the field of economics is to explain how the learning process works and, hence, how this procedural knowledge is constructed. Following Bourgine's paper [34], it is important to note the "remarkable property of neuronal plasticity that make adaptation possible in complex situations". Here, we draw on simulation and modeling tools from robotics and computer science: the cognitive map refers to two levels of neural networks (the first one intended to learn and recognize distinct places; the second one to memorize the paths most frequently used by the agent to achieve a goal, see Section 3). As a consequence, agents inherit adaptability in complex environments. In such a context, they have to locate themselves according to a set of visible landmarks likely to change (landmarks could be hidden by a wall, houses...)
and they are able to benefit from a former exploration of space: they will find different ways through intermediary goals set on their cognitive maps. Hence, economic agents are in a situation of limited rationality: they can see and discover their environment (see Section 2), they can learn and plan optimal paths (see Section 3), and solve multi-objective problems according to their experience and to their discovery of the environment (see Section 4). Another result is the interest of working with cognitive agents rather than purely reactive agents when faced with increasingly complex spatial environments. When the environment is complex (made of obstacles, or of vanishing or non-renewable resources, for example), cognitive agents are more effective than reactive ones (which is not the case in simple, obstacle-free environments): in such a situation, the masking of landmarks leads to important distortions, and agents may have difficulty locating themselves or finding an effective path, but cognitive agents can still find their way through intermediary goals set on their cognitive maps. Moreover, when multiple solutions exist, such cognitive agents will naturally "spread" over the different solutions instead of sticking to the apparently best one (that with the maximum average level of the essential variables, for instance). That can lead to a better global solution: in the case where resources are limited and can only renew with time, purely rational economic agents would all stick to the nearest village, exhaust its resources and die if too far from the next village; cognitive agents, by spreading over all the solutions, can exhibit a better global behavior (in terms of the ratio of surviving agents, for instance) in this situation. On the other hand, if the environment is simple, reactive agents are more effective than cognitive ones because they do not refer to previous paths and go straight to the objectives. It seems that collective intelligence processes inherit from these results. Following classical economics
fields, as in a simple environment, natural resources have been over-exploited or exhausted, whereas in a complex environment the agent exhibits complex dynamics such as adaptability or survival when resources are exhausted, and it will be interesting to analyse the agents' capabilities to manage existing stocks of resources. In the first case, we consider that natural resources that have been discovered are exploited independently of their location. Prices (including transportation costs) and available quantities are the only limits to resource exploitation. In that case, the environment is quite simple, routes are known and the economic question is to preserve resources by controlling prices or quantities. In the second case, we have to deal with the complexity of the environment: resources can become exhausted, new ones are discovered and agents are simultaneously interested in different types of resources. Pressures on resources are unequal and are not systematically a function of the transportation costs of agents. The complexity of the environment also raises questions about local unemployment contexts: finding vacant jobs requires an ability to move and to know the location of firms, skills and vacancies, in order to avoid spatial mismatch. Urban economics encounters spatial mismatch cases without taking into account the spatial strategies of unemployed people. It supposes that the standard characteristics of agents (gender, age, qualification, car ownership, distance from home to potential jobs, income...), of local policies (collective transportation system, unemployment agencies...) and of labor market functioning (internal market, adequacy between supply and demand...) are the main variables to understand spatial mismatch. Agents may have spatial habits or routines that constrain their use of space, whereas the economic analysis is always developed in a simple environment. It argues as if unemployed people had all the information on firms, jobs and routes to apply, when individual strategies and
contexts are complex. Cognitive maps make it possible to capture this phenomenon. Nevertheless, this requires adequate tools to identify, test and calibrate different groups and spatial configurations.
Schelling's work [38] is a fundamental example of the analysis of spatial configurations in economics: "The demographic map of almost any American metropolitan area suggests that it is easy to find residential areas that are all white or nearly so and areas that are all black or nearly so but hard to find localities in which neither whites nor nonwhites are more than, say, three-quarters of the total". The main question is to identify why such a non-organized segregation exists. To mimic the distribution of ethnic groups in an urban area, Schelling sets up the following hypothesis: residents of a given area are happy as long as the majority of their neighbors are the same color as themselves. If the residents are "unhappy", they move to a new area. On the basis of such a hypothesis, he showed that even if all agents have a preference for integration, the model dynamics leads to segregation. The main criticisms concern the a priori definition of the origin of segregation (the color) and the dynamics rule, which inevitably leads to a "tipping point" and to segregation. The use of cognitive maps together with the ability to imitate shows that it is possible to define "villages" as in Figure 11 and segregation outlines: the signatures of agents are a way to differentiate them and to define the nature of the groups, whereas the collective intelligence process runs on the basis of space discovery and of reaching known essential resources. It is a way to deepen the analysis of collective dynamics [31,61] to better understand the part that space is playing.
An important question is now to characterize the set of spatial collective dynamics which can be obtained according to the level of complexity of the model of agents, their environment and their interactions with the environment and other agents. If limited rationality is now an operational concept, it is in the interest of economists to recognize the importance of space in some basic economic questions. When considering a complex environment, space is essential to the analysis of collective intelligence processes. In conclusion, we showed that our model and system make it possible to solve non-trivial planning and optimization problems, with only very few assumptions on the initial knowledge of the agents. They can adapt themselves to a changing environment, share a partial knowledge of the problem with each other, handle multiple, contradictory goals and find different solutions to the problem when multiple solutions exist. It should now be possible to see how the optimization can be pushed one step further, by making agents able to act on their environment instead of just adapting themselves to it: for instance, letting agents specialize in a given task [63], or letting them carry some resource from "natural" sources to locations near important paths could dramatically enhance the performance of the global system, as far as the average "satisfaction" level of the agents (the average of an agent's essential variables values) is concerned.

Figure 1. An animat (small dark triangle) in its environment, made of landmarks (crosses), obstacles (solid rectangles), resources (labelled circles) and possibly other animats (triangles). Visible landmarks are linked to the selected animat, and its visibility range is highlighted by the grey circle around it.

Figure 3. Building place cells during exploration of the environment. The known portion of the environment at $t = 500$ (left picture), $t = 1500$ (middle) and $t = 5000$ (right). Each homogeneous region represents a zone with the same winner (its place field). Colors are randomly chosen so that two contiguous regions do not share a common color.

Figure 4. A simple "T" maze example to illustrate the action selection problem with place cells: if food is in C and water in D, two movements ("turn left" and "turn right") should be associated with B.

Figure 6. Diffusion of the motivation along the cognitive map. Here, the agent (situated near the lower water source) feels the need to rest (nest), and there are two solutions (two known resources) in competition. In this situation, the motivation for the lower nest is stronger (nearly 1) than that of the upper one (nearly 0).

Figure 7. Cognitive map evolution induced by a changing environment. The roads leading to the food source in the left snapshot (bottom of figure) of the cognitive map are partially forgotten when the source has changed, whereas new paths have appeared near the new source location (left part of the right snapshot of the cognitive map) and some existing paths have been reinforced (in this experiment, the animat almost never goes directly from Nest to Water, but prefers to "transit" by the Food source).

Figure 9. Agents (triangles on left-sided picture) and their signatures (diamonds on right-sided picture). Signatures, being considered as two-coordinate vectors, can be projected on the same physical space as agents, which shows where emerged subgroups of agents spend most of their time. The geographical proximity of the agents' signatures is a way to define the emergence of subgroups of agents. Here for instance, one can distinguish two main subgroups, on each side of the vertical "wall"; the right subgroup is tighter than the left one, which is not yet stabilized.

Figure 10. Two types of behaviors in the case where two "villages" exist: in the left picture, the agent "oscillates" between resources belonging to different villages; in the right picture, it sticks to one of the villages. The graphics represent the density of presence of the agent after 5,000 time steps. There were no obstacles during this simulation, and only one agent's moves are recorded in each case.

Figure 11. An example showing that imitation on signatures can lead to situations where the number of subgroups (5, left picture) is greater than the number of villages (4, right picture).

Table 2. Influence of the imitation strategy on the survival rate of the populations. Average number (and standard deviation) of surviving agents. The much higher standard deviation for signature-based imitation is due to some global group disappearance.
[...] try to figure out what happens when a new village is added to an existing environment containing villages, and new agents are launched. A new group of agents will almost certainly stabilize around the new village. If the new [...]