A hybrid air-sea cooperative approach combined with a swarm trajectory planning method

Abstract This work addresses ocean monitoring and the clean-up of polluted zones, together with trajectory planning and fault tolerance for semi-autonomous unmanned vehicles. A hybrid approach is introduced in which unmanned aerial vehicles (UAVs) monitor the ocean region and cooperate with a swarm of unmanned surface vehicles (USVs) to clean dirty zones. The paper proposes two solutions for planning the trajectory of the USV swarm from the base of life to the dirty zone. The first solution uses a modified Genetic Algorithm (GA), and the second uses a modified Ant Algorithm (AA). Both solutions were implemented in simulation with different scenarios for the dirty zone. The approach detects and reduces the pollution level in ocean zones while taking into account fault tolerance for the unmanned cleaning vehicles.

task of controlling these unmanned vehicles to follow a predefined trajectory while maintaining the desired formation pattern [3].
Path planning refers to the search for an optimal or sub-optimal path from the starting position to the goal in the environment. Trajectory planning relies on two key technologies: environmental modeling and planning algorithms. A reasonable model of the environment helps to reduce the search effort and the space occupancy over time [4]; the grid representation, the geometric space method, the topological method and the electronic chart are all reasonable models [5]. The grid method is simple, easy to implement and applicable to different algorithms. Trajectory planning algorithms commonly fall into two families: traditional planning and intelligent planning methods. Traditional methods include the artificial potential field method [6], the A* method [7], the D* method [8] and so on. Intelligent planning methods include the genetic algorithm [9], the ant colony algorithm [5], neural networks [10], the particle swarm algorithm [2], etc. In addition, there are mixed planning methods that combine traditional and intelligent planning.
According to [9], heuristic algorithms that search the solution space can be classified as instance-based or model-based. Instance-based algorithms, such as genetic algorithms, generate new candidate solutions from the current solution or from a population of solutions. GAs are iterative stochastic optimization algorithms. They provide solutions to problems that cannot be solved analytically or algorithmically in reasonable time. Model-based search algorithms generate candidate solutions using a parameterized probabilistic model that is updated according to previous solutions. This allows the search to focus on regions containing high-quality solutions, as in Ant Colony Optimization (ACO). ACO is a meta-heuristic modeled on the foraging behavior of ants. Ant System (AS) was the first ACO algorithm and was applied to the travelling salesman problem (TSP) [11].
This work presents a hybrid hierarchical approach for better management and easier control of the different unmanned vehicles. This architecture provides fault tolerance and scalability, in contrast to the self-organized approach, which scales poorly or not at all because managing its entities becomes increasingly complex. Centralized management relies on a central node with deterministic decision-making capability and easy-to-implement coordination. This central node has a global view of the activities of the unmanned monitoring and cleaning swarm vehicles. Distributed management begins when a monitoring vehicle is assigned to a region: it is then able to coordinate its cleaning swarms. These swarms plan their movement from their storage bases to the dirty zones using the proposed solutions, which provide a calculation method that quickly yields a workable solution to the planning problem. Because a cleaning vehicle is a mechanical, electronic and computer system that may fail at the hardware or software level, a solution is also proposed to replace failed vehicles with a capable cleaning vehicle during the cleaning mission.
This paper presents a hierarchical hybrid cooperative approach for heterogeneous unmanned vehicles. The approach includes two solutions to the trajectory planning problem for the unmanned surface vehicle (USV) swarm. The first is based on the modified Genetic Algorithm, whereas the second relies on the modified Ant Algorithm. In the proposed approach, an unmanned aerial vehicle (UAV) that monitors an ocean region cooperates with a USV swarm that cleans dirty oceanic zones. The UAV is assumed to be equipped with on-board sensors that allow it to locate dirty zones. The UAV discretizes its environment map and updates it with the collected information about dirty zones. The UAV sends its environment map to its general coordinator (represented by a laptop and guided by a human operator). After analyzing the collected data, the general coordinator allocates the explored map (way-points) to the USV swarm to clean each dirty zone. The swarm navigates to the assigned dirty zone and cleans it based on the proposed solutions. In addition, the proposed approach is complemented by a method for managing the failure of a USV during the execution of its cleaning task.
The remainder of this paper is organised as follows: Section 2 presents several related works; Section 3 describes the proposed approach for the different unmanned vehicles with the proposed solutions, and models them by logical formalization; in Section 4 the proposed approach is compared to the related works; Section 5 illustrates an example to simulate the operation of the proposed approach; finally, Section 6 concludes this paper and provides some future research directions.

Related works
Many heuristic and meta-heuristic algorithms have been applied to the path planning problem of unmanned vehicles, such as GA, ACO or PSO (more approaches can be found in [6]). In addition, several methods allow the coordination and control of multiple unmanned vehicles in a swarm or formation for trajectory planning [12] and for applications such as cleaning, surveillance, rescue, detection and localization. For example, the CADDY cooperative orientation system was proposed to monitor the behavior of human divers during missions at sea [13]. CADDY functionalities rely on the coordination of two robotic platforms, an Autonomous Surface Vehicle (ASV) and an Autonomous Underwater Vehicle (AUV), and a guide. The cooperative path-following method is based on the virtual target approach. This method is developed in two stages (defined in [14,15]). First, it relies on the consolidated single-vehicle virtual-target based path-following technique (using a Lyapunov function). Then, it extends the concept to a multi-vehicle system based on speed regulation, depending on the relative distance between the virtual targets (further details in [16]).
In [3], the authors introduced an improved swarm-based path-following guidance system for an autonomous multi-vehicle marine system. In the seminal idea, a team of USVs is required to join a formation using a potential-based swarm aggregation methodology, while a virtual-target based guidance module drives the whole formation towards a desired reference path (further details in [16]). That work improves on the performance of [15] by modelling two problems: the positive surge speed (a typical constraint of maritime vehicles) and the saturation of the speed at a maximum value. Another work [17] proposed an organic computing approach to develop a complete framework for the design (observers) and control of autonomous collaborative robot swarms, with particular emphasis on quadcopters collaborating to perform spatial tasks. This approach facilitated the adaptability and self-optimization of individual drones, the optimization of collaborative efforts between drones and efficient control of the swarm by the human user at several levels of abstraction [18].
Furthermore, a prototype [19] of a USV based on two gas sensors was developed to locate the gas source of an oil spill at sea. The gas concentrations measured by the two sensors are fed to a fuzzy logic controller, which determines the rate at which each propeller motor drives the vehicle towards the gas target. The results show that the USV prototype is able to approach and locate the target, with an average accuracy of 90.1% when searching for the location of the gas source. A new robotic swarm system [20] was also proposed to locate and pick up an oil spill on the surface of the water (ocean, river, lake, etc.). A coordinator determines the position and the center of the spill using a GPS receiver, then a barge carrying the robots moves to the workplace. Depending on the state of the spill, the coordinator can send a swarm of robots to surround and collect the oil spill, then position the barge with oil suction equipment and move it to another location to remove the oil more safely. According to its authors, this system saves money and time.
In addition, a USV design [21] was proposed to detect and collect samples of oil spills. The manufactured USV uses a straight-line tracking algorithm with GPS navigation to localize the polluted zone. When it reaches the polluted zone, it activates a sampling mechanism to extract a sample from the water surface and runs an image-based detection algorithm, built on a machine learning algorithm, to confirm the presence of oil. Experiments with this vessel have shown excellent maneuverability, a reliable mechanism for collecting samples and reasonable accuracy in detecting oil contamination.
On the other hand, an optimal path-planning method for a USV was proposed in [5]. The authors used grid-based modelling for typical obstacles. The grid divides the workspace of the robot (environment) into cells. The white and black cells represent the free space and the obstacle space, respectively. The shortest path is obtained in this grid using the eight possible movements of the USV. The global path planning method is an ant colony algorithm based on ACO. The roulette method is applied to select the next point that the USV can reach.
Another ant algorithm, the 'Max-Min Ant System (MMAS)', was studied in [11] to find an optimal or near-optimal path in a setting where robots must explore the environment while planning the path. The authors analyzed the distance traveled by robots during the first iteration of the algorithm to assess the quality of the solutions obtained. The environment is represented by a topological map. The paths are represented by a sequence of actions that robots should execute to reach the goal. MMAS performed very well and achieved a better best distance traveled than both the GA and the A* algorithm.
In contrast, a real-time in-flight trajectory planner (RTTP) [22] for a fixed-wing UAV was presented for volcanic monitoring and ash sampling. The RTTP is based on a GA that finds a collision-free trajectory with lower energy consumption and is adapted to long flights in the high-altitude atmospheric conditions over the Volcán de Fuego in Guatemala. In [9], another path planning method is applied, where a GA is used to find a sequence of actions that autonomous mobile robots perform to reach the goal. The robot does not know the layout of the environment in advance and only has a rough estimate of the starting and goal positions. At first, a set of actions is generated randomly and the robots execute these actions. Their fitness is then evaluated by the distance traveled and the Euclidean distance from the goal. Individuals are selected by tournament to breed. Then, a new sequence of actions is generated by applying crossover and mutation operators. The evolution continues only for the action sequences that did not reach the goal. The GA showed better average performance, and its solutions had shorter traveled distances than those returned by A* and its improvement (C*). In [12], another GA was proposed for the motion planning of heterogeneous holonomic robot swarms. This algorithm consists of a global path planner (GPP) and a motion planner (MP). The GPP algorithm searches for a path along which the center of the robot swarm must move in the free space of a Voronoi diagram (2D workspace). The MP is a GA based on an artificial potential field. The repulsion keeps robots away from obstacles and the 'spring' function keeps the robots of the swarm within a certain distance of each other. Since the GA searches for an optimal configuration with lower potential, the obtained paths are safer. The simulation results demonstrated that the proposed algorithm can plan collision-free paths.
This research work aims to provide a new solution to maritime pollution challenges by proposing a hybrid approach based on the two proposed solutions for trajectory planning 'modified-GA' and 'modified-AA'.

Proposed approach
This paper presents a new hybrid approach based on the cooperation of air-sea vehicles. This cooperation reduces the work overload of a mission to monitor and clean dirty zones. The unmanned vehicles must plan their paths to achieve their goals in this mission. The approach proposed in [16] is therefore developed here by describing the architecture and constituent entities of the system, the trajectory planning methods, and how they operate. Figure 1 shows the hybrid architecture of the proposed system. It consists of a central unit, a monitoring vehicle to monitor the maritime region, a swarm of cleaning vehicles to clean dirty zones, and recovery vehicles to recover defective vehicles. The central unit is composed of a general coordinator, a base of life and a database. The general coordinator stores and consults the data in the database. It interacts with the base of life and the monitoring vehicle via a communication link of the IEEE 802.11a standard, named Wi-Fi 5 (5000 m outdoor range), while a Wi-Fi network of the IEEE 802.11b standard (35 m-140 m indoor range) carries the messages exchanged between the monitoring vehicle and the swarm of cleaning vehicles.

Hierarchical role of each vehicle
The decision-making hierarchy of the proposed approach has a general coordinator at its first level. The coordinator has the highest decision authority for executing the various tasks and for launching and controlling the unmanned vehicles, which are located in the base of life. The monitoring drone holds a lower decision authority at the second level. Its role is to supervise and monitor its cleaning vehicle swarms. The cleaning vehicles are located at the third level and are composed of leader vehicles with their following cleaning vehicles. Their objective is to carry out the cleaning of the dirty zone according to the energy availability of each member. Each swarm leader has two roles: it is responsible for the tasks/characteristics of its followers, and it shares (cooperates in) the cleaning action with them. The coordinator launches a recovery vehicle to recover failing vehicles.

Environment modeling
This section defines the modeling of this work environment along with the set of components. Table 1 summarizes the parameters used.
1. Set of tasks: five high-level tasks are defined and used by the general coordinator and by the monitoring and cleaning vehicles:
-Allocating task (ta): the task of allocating unmanned vehicles to the different regions r and dirty zones z;
-Monitoring task (tmr): the task of monitoring a region r;
-Cleaning task (tcz): the task of cleaning a zone z;
-Supervising cleaning task (tsz): the task of supervising the cleaning of a zone z;
-Launching task (tl): the task of launching the previous tasks (monitoring, cleaning, supervising cleaning and allocating).
2. Set of vehicles: the vehicles and their roles are described as follows:
-UAVmr: a homogeneous, semi-autonomous aerial monitoring vehicle; it carries a camera, an ultrasonic sensor, a GPS (Global Positioning System) receiver and autopilot software; its speed and energy capacity allow it to complete the following roles/tasks: monitor the regions and dirty zones, and supervise the cleaning vehicle swarm; ask / inform the task leader; launch / return the data and results to the general coordinator;
-USVcz: a homogeneous, semi-autonomous surface cleaning vehicle; it has the same hardware and features as UAVmr except that it does not embed a camera; USVcz is responsible for: cleaning the dirty zones; following / requesting tasks and informing / returning the data and results to its leader;
-Leadercz: a homogeneous, semi-autonomous surface cleaning vehicle; it is similar to USVcz in hardware and software components; it is able to: cooperate with followers (USVcz) in the cleaning mission; send the requests for its followers; receive / save the characteristics of each follower and save its features; inform / return the data and results to its supervisor;
-Vehiclerec: a special vehicle named the recovery vehicle; it is similar to USVcz in hardware and software components; it recovers faulty (out of order) vehicles and returns them to the base of life (a detailed description of Vehiclerec is presented in [23]).

Table 1: parameters used.
Threshold: fixed threshold used to classify the zone coordinates according to the degrees of the cells 'List_Degree_cell'.
Threshold_average_energycap: fixed threshold used to determine an average energy capacity.
Threshold_low_energycap: fixed threshold used to determine a low energy capacity.
List of USVcz identifiers composing the swarm.
List of the energy capacity of each USV at instant t during cleaning, together with its identifier.
Listzone(List_Degree_cell, List_Position_cell, Positionzone): list of coordinates of the dirty zone.
List_zoneS: Listzone sorted by the degree of dirt.
List_threshold_degreeZ: list of the dirty-zone coordinates whose dirty degrees (List_Degree_cell) have been compared with the dirt threshold.
Nbr_CZ: function that gives the number of USVcz selected by General_crd.
List that memorizes the cells already cleaned by USVcz. Each entry consists of the identifier of the USVcz and its cleaned Cell(x,y), represented by the Cartesian couple (x,y) in the grid.
List of distances between positions, expressed as energy costs. Each entry consists of the link between positions i and j and its energy Cost_ij.
Energy_capCZ: energy capacity of USVcz.
List_virtualPos(List_PosS(x,y), Pos_End(x,y)): virtual list built from a list of starting positions List_PosS(x,y) and an end position Pos_End(x,y).
Parameters_start-up(M)(Id_region, Position_region, Path_region): triplet of start parameters for each UAVmr, giving the identifier, position and trajectory of a region.
Parameters_start-up(C)(Id_region, Idzone, Id_SUPMR, Id_LeaderCZ, Positionzone, Pos_Start(x,y), Pos_End(x,y), List_PosE(x,y)): tuple of start parameters for each USVcz, giving the identifiers of its region, zone, supervisor and leader, the position of the zone, and Pos_Start(x,y), Pos_End(x,y) and List_PosE(x,y).
3. Set of agents: each agent used is described as follows:
-General_crd: the general coordinator is a laptop running coordination software, guided by a human operator; it is responsible for: the base of life and the data of the regions and dirty zones; the use (processing) / storage of data in the database; launching the tasks / missions and allocating tasks to vehicles;
-Leadercz: the leader vehicle is a USVcz with two roles: it is an intermediary between the supervisor and the USVcz swarm, and it also cooperates in the clean-up operation;
-USVcz(discharge): a cleaning vehicle in the discharge state has not yet completed its task and its energy capacity has become low during cleaning;
-USVcz(free): a cleaning vehicle in the free state has completed its task;
-USVcz(prepared): a cleaning vehicle in the prepared state has been prepared in the base of life by General_crd to replace a USVcz(discharge);
-Supmr (
4. Set of regions: the monitored maritime space is divided into maritime regions; each region is composed of two sub-spaces: an atmosphere sub-space where the UAVmr can be found, and a maritime sub-space containing the swarm of USVcz, the Vehiclerec, the base of life and the dirty zones.

5. Set of base of life: a zone (which can be a boat, a ship or an island) that stores a fixed number of UAVmr, USVcz and Vehiclerec.
6. Set of database: the bases that store and save all the data and features of the maritime space and of the different unmanned vehicles used.
7. Set of dirty zones: zones with water pollution, for example oil slicks or plastic waste; the metric proposed in this work is the degree of dirt of each zone, with four types: strong dirt, average-strong dirt, average dirt and low dirt; each zone is characterized by a list 'Listzone' that delimits its borders by coordinates; this list is composed of the list of degrees of dirt 'List_Degree_cell' and the list of cell positions of these degrees 'List_Position_cell', and it is attached to a zone by 'Positionzone'.
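As a rough illustration of the zone description above, the sketch below stores a dirty zone as a list of (degree, position) pairs, sorts it by degree of dirt (the role played by the sorted zone list) and keeps the cells that reach a threshold (the role played by the thresholded list). The numeric degree scale (1 = low to 4 = strong), the function names and the sample values are assumptions made for illustration; the paper does not specify them.

```python
from typing import List, Tuple

Cell = Tuple[int, Tuple[int, int]]  # (degree of dirt, (x, y) position)

# Assumed numeric scale for the four dirt types named in the text.
DEGREE_LABELS = {1: "low", 2: "average", 3: "average-strong", 4: "strong"}


def sort_zone(list_zone: List[Cell]) -> List[Cell]:
    """Return the zone cells sorted from dirtiest to cleanest."""
    return sorted(list_zone, key=lambda cell: cell[0], reverse=True)


def filter_by_threshold(list_zone: List[Cell], threshold: int) -> List[Cell]:
    """Keep only the cells whose degree of dirt reaches the threshold."""
    return [cell for cell in list_zone if cell[0] >= threshold]


# Hypothetical zone with one cell of each dirt type.
zone = [(2, (0, 0)), (4, (0, 1)), (1, (1, 0)), (3, (1, 1))]
zone_sorted = sort_zone(zone)
zone_priority = filter_by_threshold(zone_sorted, threshold=3)
print([DEGREE_LABELS[degree] for degree, _ in zone_sorted])
print(zone_priority)
```

Sorting before filtering means the cells passed on for cleaning are already ordered by priority.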

Hybrid approach with two trajectory planning solutions
This section describes the key steps of the proposed hybrid approach with a proposal for tailored solutions. The first solution is based on a genetic algorithm (GA), and the second is based on an ant algorithm (AA). To this end, the approach is based on two essential steps: monitoring and cleaning. Both solutions are applied in the cleaning step. Figure 2 illustrates a structure of the proposed approach.

Step 1: Monitoring
This step presents the actions of the UAVmr monitoring phases:

Phase 1: monitoring setup
The monitoring step is performed, for example, twice a week in the maritime region to be monitored. The general coordinator General_crd prepares the monitoring drones according to the number of regions, allocating one UAVmr to each region. In addition, General_crd periodically checks the maritime space statistics saved in the database. If the statistics show that a region contained a high percentage of dirty zones over a certain period, it launches a UAVmr for a full day to monitor that region.

Phase 2: monitoring execution
General_crd triggers the monitoring step according to the following actions:
1. Prepare each UAVmr with the startup parameters Parameters_start-up(M)(Id_region, Position_region, Path_region), the explored atmosphere map, the energy (charged battery) and the associated speed before starting.
2. Each UAVmr is launched from the base of life to the end position along the assigned Path_region. It follows this path with a rectilinear movement [16] to reach its region. On arrival, using the explored map (a discrete 2D grid) of the atmospheric part of the region, it moves over the region with a sweep movement [16], capturing and recording the data of the maritime sub-space in Listzone(List_Degree_cell, List_Position_cell, Positionzone). The data are collected on the discretized map (2D grid) using the UAVmr sensors (a camera and an ultrasonic sensor).
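The sweep over the discretized region can be pictured with the following minimal sketch. It assumes a back-and-forth (boustrophedon) row scan, which is one common way to realise a sweep; the exact pattern used in [16] is not given here. The dirt_map input, which stands in for the sensor readings, and the function names are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Position = Tuple[int, int]


def sweep_path(width: int, height: int) -> Iterator[Position]:
    """Yield grid cells row by row, reversing direction on every other row."""
    for y in range(height):
        xs = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        for x in xs:
            yield (x, y)


def monitor_region(dirt_map: Dict[Position, int], width: int, height: int
                   ) -> Tuple[List[int], List[Position]]:
    """Record the degree and position of every dirty cell met along the sweep.

    dirt_map is an assumed stand-in for the camera/ultrasonic readings:
    it maps a cell to its degree of dirt, and missing cells count as clean.
    """
    list_degree_cell: List[int] = []
    list_position_cell: List[Position] = []
    for pos in sweep_path(width, height):
        degree = dirt_map.get(pos, 0)
        if degree > 0:
            list_degree_cell.append(degree)
            list_position_cell.append(pos)
    return list_degree_cell, list_position_cell


# Hypothetical readings on a 3 x 3 grid.
readings = {(2, 0): 3, (0, 1): 1, (1, 2): 4}
degrees, positions = monitor_region(readings, width=3, height=3)
print(list(sweep_path(3, 2)))
print(degrees, positions)
```

The two returned lists correspond to the degree and position components recorded for a zone; cells are listed in the order in which the sweep meets them.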

Step 2: Cleaning
This step is organized in three phases: cleaning setup, execution and cleaning finalization.

Phase 1: cleaning setup
The two proposed solutions (modified-GA and modified-AA) are applied in this phase. They guide the USVcz swarm as it moves to the dirty zones. Before applying the two solutions, General_crd analyzes the data received from each UAVmr (List_threshold_degreeZ). Then, it determines the number of USVcz composing the swarm for each dirty zone. This action is described by Algorithm 1. The main phases of the two proposed solutions are then presented so that the USVcz swarm can plan its way to its dirty zone.

Solution 1: modified-GA trajectory planning
This phase presents the first solution to the trajectory planning problem, based on a modified GA. The actions below describe how the selected USVcz swarm builds its trajectory towards its dirty zone.
1. Genetic algorithm setup: the proposed GA is inspired by the algorithm presented in [24]. In this GA, the USVcz swarm must search for an optimal trajectory between the starting position (the base of life) and the goal (the dirty zone). Table 2 presents a comparison of the proposed approach with the work of [24] in [16].
-Environment modeling: the environment model depends on the map of the region (metric map) discretized by Supmr. The workspace is discretized by a square grid G composed of square cells. A node is located inside each cell to facilitate the movement of USVcz. These nodes build a dynamic graph whose arcs are the connection links R_ij. The link R_ij between two neighboring cell-nodes represents the distance between the two positions (Figure 3). This distance is expressed as the energy value E_ij that USVcz consumes in the displacement. Each cell-node i represents a position that is either free or occupied, for example by a non-solid obstacle in the environment. These positions are identified by Cartesian coordinates (x,y) in a 2D plane. Each position has a maximum of eight links (R_ij) with the neighboring positions j.
-Trajectory planning method: General_crd assigns a Pos_Start(x,y) and the list of end positions ListPos_End(x,y) to each USVcz. This list gives the positions at which the dirty zone is reached. The trajectory traveled by USVcz from Pos_Start(x,y) to a Pos_End(x,y), passing through accessible positions and avoiding obstacles, is called the global trajectory. To obtain this global trajectory, it is divided into mini-trajectories; a mini-trajectory must not exceed the range of the USVcz sensor (an ultrasonic sensor), as shown in Figure 4. The sensor range defines a detection region Rs of s × s cells that allows USVcz to sense the positions neighboring its current position. A USVcz moves one step at a time between cells. A mini-trajectory contains Pos_Start(x,y) and Pos_Gol(x,y). If Pos_Gol(x,y) is equal to one of ListPos_End(x,y), then USVcz has arrived at the dirty zone. The mini-trajectory contains the positions Pos_Gol(x,y) that USVcz occupies as it moves; the other positions remain unvisited until they are considered in the next planning round.
The aim is to obtain an efficient global trajectory. To this end, each mini-trajectory should be effective: short, with few obstacles and minimal energy consumption.
-Evolutionary approach: an evolutionary approach based on GA is proposed to minimize the energy consumption, the obstacles and the trajectory distance. The GA helps the USVcz find an efficient trajectory from Pos_Start(x,y) to Pos_End(x,y) (further details in [16]).
-Encoding and generation of initial population: once the environment modeling is finished, each node has its own number. A gene has a single part, the node number, and corresponds to a position of USVcz. A chromosome represents a mini-trajectory made of a number of genes, all of which must lie within the range of the sensors. The population has a fixed number of individuals and each individual has a sequence of initial positions (cell-nodes). Initial mini-trajectories are generated randomly by choosing neighboring positions in the known environment. Each individual thus has its own position sequence (mini-trajectory), and one mini-trajectory is chosen for execution by USVcz (more details in [16]).
-Fitness function and selection: the fitness function evaluates the quality of each candidate solution. Two parameters are taken into consideration to evaluate the fitness of the mini-trajectories: the total distance of the mini-trajectory and the direction cost for each USVcz. An appropriate fitness function of mini-trajectory (i) is constructed as follows:
$$\mathrm{Fitness}_{\mathrm{Trajectory}(i)} = A \cdot \mathrm{Dist}_{\mathrm{Trajectory}(i)} + B \cdot \mathrm{Cost}_{\mathrm{Dir}(i)} \tag{2}$$
where A and B are two equilibrium parameters (constant numbers), and Dist_Trajectory(i) and Cost_Dir(i) are the distance of the mini-trajectory (i) and the direction cost for each USVcz, respectively. Dist_Trajectory(i) is the sum of the distances between consecutive cell-nodes, each distance being expressed as an energy value. Cost_Dir(i) is calculated according to the direction of USVcz when it moves to another node. There are three directions: the direct direction (Direct_cost), where USVcz moves to another position in the same direction, and the right/left directions (Right_cost/Left_cost), where it turns to the right/left of its position to get to another position.
-Genetic operators: new position sequences (new individuals) are created by applying crossover and mutation operators. Crossover produces a position sequence from two parents. These parents (two mini-trajectories) are selected by tournament: individuals are chosen at random and the one with the lowest fitness value becomes a parent. A one-point crossover technique was chosen for this approach; the two children are then added to the population. Figure 5 shows an example of the crossover operator, where the crossover point is the middle of the trajectory of parent 1. Child 1 takes the first part of the chromosome of parent 1 plus the second part of parent 2 from the crossover point, and child 2 takes the first part of parent 2 and the second part of parent 1. Two new mini-trajectories are therefore produced. Mutation targets the genes that cause a problem in the trajectory. Such a gene is replaced by a randomly selected neighbor gene of its previous gene. Figure 6 shows the mutation operation on child 1 of the previous example. At each generation, elitism is applied: new individuals (children) of length l with a low fitness value replace previous individuals (parents). Each new position sequence is executed by an associated USVcz, so that the fitness value of each new individual can be calculated.
2. Modified-GA based trigger process: when General_crd chooses the USVcz swarm, it sends the list of identifiers List_idusv(Id_USVCZ) to their Supmr. In return, Supmr sends its discrete maritime exploration map. General_crd broadcasts a set of parameters to the selected swarm, as shown in the sequence diagram of Figure 7.
General_crd then launches the execution of the GA, and each USVcz of the swarm plans its movement to the dirty zone following the GA steps described above.
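The GA building blocks described in this solution can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the fitness is taken as a weighted sum of trajectory energy and direction cost with lower values being better, link energies are approximated by Euclidean distances, right and left turns share a single cost, and all constants (A, B, DIRECT_COST, TURN_COST) and function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Position = Tuple[int, int]
Trajectory = List[Position]

A, B = 1.0, 0.5                    # equilibrium parameters (assumed values)
DIRECT_COST, TURN_COST = 0.0, 1.0  # direction costs (assumed values)


def neighbors(pos: Position, width: int, height: int) -> List[Position]:
    """Return the up-to-eight neighboring cell-nodes inside the grid."""
    x, y = pos
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
            and 0 <= x + dx < width and 0 <= y + dy < height]


def fitness(traj: Trajectory) -> float:
    """Weighted sum of trajectory energy and direction cost (lower is better)."""
    # Assumption: the energy of a link equals its Euclidean length.
    dist = sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))
    cost_dir = 0.0
    for a, b, c in zip(traj, traj[1:], traj[2:]):
        same_heading = (b[0] - a[0], b[1] - a[1]) == (c[0] - b[0], c[1] - b[1])
        cost_dir += DIRECT_COST if same_heading else TURN_COST
    return A * dist + B * cost_dir


def tournament(population: List[Trajectory], rng: random.Random,
               k: int = 2) -> Trajectory:
    """Pick k individuals at random and return the one with the lowest fitness."""
    return min(rng.sample(population, k), key=fitness)


def one_point_crossover(p1: Trajectory, p2: Trajectory
                        ) -> Tuple[Trajectory, Trajectory]:
    """Cut both parents at the middle of parent 1 and swap the tails."""
    cut = len(p1) // 2
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(traj: Trajectory, index: int, width: int, height: int,
           rng: random.Random) -> Trajectory:
    """Replace the gene at index (>= 1) by a random neighbor of the previous gene."""
    child = list(traj)
    child[index] = rng.choice(neighbors(child[index - 1], width, height))
    return child


rng = random.Random(0)
parent1 = [(0, 0), (1, 0), (2, 0), (3, 0)]
parent2 = [(0, 0), (1, 1), (2, 1), (3, 2)]
child1, child2 = one_point_crossover(parent1, parent2)
mutant = mutate(child1, 2, width=5, height=5, rng=rng)
winner = tournament([parent1, parent2], rng)
print(child1, child2, fitness(parent1), winner)
```

A one-point crossover can join two positions that are not neighbors on the grid, which is one reason a mutation step that rewrites a problematic gene from the neighbors of its predecessor is useful.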

Solution 2: modified-AA trajectory planning
This phase presents the second solution to the trajectory planning problem, based on a modified AA. The actions below describe how the selected USVcz swarm builds its trajectory towards its dirty zone.

Ant algorithm setup: this solution is based on an
ant algorithm proposed in [6,25]. These algorithms show the steps of modeling ants' behavior in the environment. To this end, the proposed modified-AA can model the behavior of a USVcz swarm to find an optimal trajectory between the base of life and the dirty zone. In addition, the discretized grid G is composed of cells that contain positions. These positions are presented as cities. Each city i th represents a free / occupied position by a USVcz agent. These cities are identified by Cartesian coordinates (x,y) in a 2D plane.
The i-th city can have connection links (edges) R ij to its neighboring cities j. R ij represents the distance between the cities (Figure 8). This distance is represented by an energy cost E ij that a USVcz agent consumes when moving between the cities.  ,y), then the USVcz has arrived at the dirty zone. Each USVcz agent has the following characteristics:
-It deposits a pheromone trace on R ij when it moves from city i to city j;
-It chooses the destination city according to a probability that depends on the distance between this city and its position and on the amount of pheromones present on the edge (transition rule);
-To pass only once through each city, USVcz cannot go to a city that has already been crossed, which is why it must have a memory 'List_tabooPos(Id USV , Pos Gol (x,y))'.
The pheromone traces are modeled by the variables τ ij (t), which give the intensity of the trace on the trajectory (i, j) at time t. The transition probability from city i to city j for the agent u i (t) is given by the transition rule (3), where Lu(i) represents the List_tabooPos of u i (t) located on the vertex i and V ij is a measure of visibility that corresponds to the inverse of the distance between cities i and j, that is V ij = 1 / d ij, where d ij is the distance between cities i and j, given by E ij in the array List_Cost.
α and β are two parameters that modulate the relative importance of pheromones and visibility. The pheromones are updated (rule (5)) once all the ants have passed through all the cities; in this update, ρ is a coefficient representing the evaporation of the pheromone traces. ∆τ u ij represents the reinforcement of the link (i, j) by u i (t); it depends on a constant Q and on Lu, the length of the trajectory traveled by u i (t) (the sum of the energy costs consumed during the travel between cities).
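As a point of reference, the classical Ant System formulas that match the definitions above can be written as follows. This is a reconstruction under stated assumptions rather than the paper's own typeset equations: the sum in the first formula runs over the neighboring cities l of i that are not in the taboo list Lu(i), the update is written per time lapse, and evaporation is written as (1 - ρ), which is one common convention (with ρ = 0.5, the value used in the simulations, it coincides with the alternative convention ρ τ ij (t)). The first formula plays the role of the transition rule and the third that of the pheromone update.

```latex
\begin{align*}
P^{u}_{ij}(t) &=
  \begin{cases}
    \dfrac{[\tau_{ij}(t)]^{\alpha}\,[V_{ij}]^{\beta}}
          {\sum_{l \notin L_u(i)} [\tau_{il}(t)]^{\alpha}\,[V_{il}]^{\beta}}
      & \text{if } j \notin L_u(i), \\[2ex]
    0 & \text{otherwise,}
  \end{cases} \\[1ex]
V_{ij} &= \frac{1}{d_{ij}}, \\[1ex]
\tau_{ij}(t+1) &= (1-\rho)\,\tau_{ij}(t) + \sum_{u} \Delta\tau^{u}_{ij}(t), \\[1ex]
\Delta\tau^{u}_{ij}(t) &=
  \begin{cases}
    \dfrac{Q}{L_u} & \text{if } u_i(t) \text{ traveled along the edge } (i,j), \\[1ex]
    0 & \text{otherwise.}
  \end{cases}
\end{align*}
```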

Running of 'Modified-AA' algorithm:
General crd prepares a virtual list List_virtualPos(List PosS (x,y), Pos End (x,y)). This list is built from a list of starting positions List PosS (x,y) and an ending position Pos End (x,y) (the arrival at the dirty zone). General crd assigns a Pos Start (x,y) to each USVcz agent of the selected swarm, and the Pos End (x,y) is shared by the whole swarm. General crd launches Algorithm 2 to find the best trajectory that will be applied to the swarm. The flow starts by initializing the transition trace τ ij (t) (3). A time interval is set to assess the improvement of the trajectories built. Each u i (t) builds its trajectory (Path u (t)) from its Pos Start (x,y) to the Pos End (x,y). This construction is based on the transition rule (3). The agent u i (t) scans its detection region with its ultrasonic sensor to identify its neighboring cities. Its next destination city is the one with the maximum computed transition probability P ij (t). When the trajectory is complete, its length L u (t) is calculated and the trajectory is saved in the list ListPath_UT. At the end of each time lapse, τ ij (t) is updated (rule (5)). Then, the best trajectory found in ListPath_UT is selected based on the maximum value of P ij (t), and this best trajectory of t is saved in the list PathFinal_UT. When t = tmax, the best trajectory Best_path u is selected among the trajectories of all t rounds found in PathFinal_UT, again based on its maximum value of P ij (t).
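The flow above can be illustrated with a small Python sketch. It is not Algorithm 2 itself: the 4-connected grid, the function names, the value of Q, the handling of dead ends (the trajectory is simply discarded), and the choice of the best trajectory by shortest length L u (instead of the maximum P ij (t) criterion of the text) are assumptions made for illustration. The default parameters follow the values reported in the Results section (τ0 = 0.1, ρ = 0.5, exponents 2 and 1 read here as α and β, tmax = 4).

```python
import random

def neighbors(cell, size):
    """4-connected neighbors of a cell inside a size x size grid (assumed connectivity)."""
    x, y = cell
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return [(x + dx, y + dy) for dx, dy in steps
            if 0 <= x + dx < size and 0 <= y + dy < size]

def edge(a, b):
    """Key of the undirected edge R_ij between two cities."""
    return (a, b) if a <= b else (b, a)

def random_costs(size, rng, low=5, high=10):
    """Random energy cost E_ij for every edge of the grid."""
    cost = {}
    for x in range(size):
        for y in range(size):
            for n in neighbors((x, y), size):
                cost.setdefault(edge((x, y), n), rng.randint(low, high))
    return cost

def build_path(start, goal, size, tau, cost, alpha, beta):
    """Move to the neighbor with the highest transition probability, using a taboo list."""
    path, taboo = [start], {start}
    while path[-1] != goal:
        here = path[-1]
        options = [c for c in neighbors(here, size) if c not in taboo]
        if not options:
            return None  # dead end: every neighbor was already visited
        # The denominator of the probability is shared, so comparing numerators is enough.
        weight = {c: tau[edge(here, c)] ** alpha * (1.0 / cost[edge(here, c)]) ** beta
                  for c in options}
        nxt = max(options, key=lambda c: weight[c])
        path.append(nxt)
        taboo.add(nxt)
    return path

def path_length(path, cost):
    """L_u: sum of the energy costs along the trajectory."""
    return sum(cost[edge(a, b)] for a, b in zip(path, path[1:]))

def run_aa(starts, goal, size, cost, tau0=0.1, rho=0.5, alpha=2, beta=1, q=1.0, t_max=4):
    """Return (length, path) of the best trajectory found, or None if none reached the goal."""
    tau = {e: tau0 for e in cost}
    best = None
    for _ in range(t_max):
        paths = [build_path(s, goal, size, tau, cost, alpha, beta) for s in starts]
        paths = [p for p in paths if p is not None]
        for e in tau:                      # evaporation
            tau[e] *= (1 - rho)
        for p in paths:                    # reinforcement Q / L_u on the edges used
            length = path_length(p, cost)
            for a, b in zip(p, p[1:]):
                tau[edge(a, b)] += q / length
            if best is None or length < best[0]:
                best = (length, p)
    return best

if __name__ == "__main__":
    costs = random_costs(5, random.Random(0))
    print(run_aa([(0, 0), (0, 1), (1, 0)], (4, 4), 5, costs))
```

Because the next city is chosen greedily and visited cities are forbidden, an agent can trap itself; the sketch returns None for such a trajectory, whereas a full implementation would need an explicit recovery rule.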

Modified-AA based trigger process: when General crd chooses the USVcz swarm, it sends the List idusv (Id USVCZ) to its Supmr. In return, Supmr sends its discrete maritime exploration map. General crd broadcasts a set of parameters to the selected swarm, as shown in the sequence diagram of Figure 9. Afterwards, General crd sends Best_path u to the swarm after changing the starting position of each USVcz. The USVcz then move together in the grid following Best_path u: the first USVcz starts from the Pos Start (x,y) of the trajectory, the second starts from the first position of the first USVcz, and so on. Each USVcz follows its neighbor USVcz. When

Phase 2: cleaning operation
This phase allows the USVcz swarm to move into the dirty zone and clean it. The maritime space is discretized into a square grid (G). Each box of G can contain an object (a dirty cell or a clean cell). Objects are points in a numerical space with M dimensions; the objects to be partitioned are positioned in this space. The USVcz can move in G and perceive a detection region Rs in their neighborhood (Figure 4). They can clean the objects of the dirty cell type. To this end, Algorithm 4 applies the proposed approach with the two solutions. Using this algorithm, the swarm can move in the dirty zone and clean its dirty objects. This phase starts when the swarm arrives at its dirty zone. General crd stops the execution of the modified-GA and the modified-AA. Subsequently, General crd selects a Leadercz from the USVcz composing the swarm and, at the same time, shares this information with the other USVcz. This leader has a very high energy capacity compared to the other USVcz. Afterwards, General crd launches Algorithm 4 on the swarm and, at the same time, Leadercz starts recording its own characteristics and those of its followers. The USVcz of the swarm follow the algorithm so that they can properly position themselves and move between the clean cells, select the cells to be cleaned next, and share their positions with each other.

Phase 3: cleaning finalization
This phase identifies the end of the cleaning step and collects the current characteristics of USVcz for each proposed solution, and has three cases:

Logical formalization of the proposal

Conceptual model for planning
A conceptual model is a simple theoretical device for describing the main elements of a problem [26]. Most of the planning approaches described in [27] rely on a general model that is common to other areas of computer science, namely the model of state-transition systems (also called discrete-event systems) [16,26,27]. The technical terms of this formalization (further details in [16]) are given in a general way.
Figure 10 shows a state-transition system for a region involving two locations, a dirty zone, a base of life (for example, a boat) and a UAVmr. The set of states is {s0, s1, s2, s3} and the set of actions is {stayinbase, flaputbase, move1∧startmonitor, move2∧end-monitor, discover, undiscover}.
-'Swarm (USV) -Cleaning' domain: The system of the region (Figure 11) involves three locations, two dirty zones, a base of life and a crane-type object for picking up, putting down and releasing unmanned vehicles. The set of states is {s0, s1, s2, s3} and the set of actions is {take, put, start-clean, end-clean, move1, move2}.
-'USV(substitute) -Cleaning' domain: The system presented in Figure 12 is similar to the previous system. The difference is the use of a replacement cleaning vehicle (in free / prepared state), which replaces a USV in discharge state. The set of states is {s0, s1, s2, s3, s4, s5} and the set of actions is {takeD∧move2, putD∧putD, moveD1∧start-clean, moveD2∧end-clean, moveD1∧move2, moveD2∧move1, move1, move2}.
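A state-transition system of this kind can be encoded as a table that maps a (state, action) pair to the next state. The sketch below uses the state and action names of the 'Swarm (USV) -Cleaning' domain, but the arcs themselves are hypothetical: the actual transitions are those drawn in Figure 11, which the text does not list. The names GAMMA and apply_plan are also assumptions.

```python
# Hypothetical transition function gamma: (state, action) -> next state.
# The arcs are invented for illustration and do not reproduce Figure 11.
GAMMA = {
    ("s0", "take"): "s1",
    ("s1", "move1"): "s2",
    ("s2", "start-clean"): "s3",
    ("s3", "end-clean"): "s2",
    ("s2", "move2"): "s1",
    ("s1", "put"): "s0",
}

def apply_plan(gamma, state, plan):
    """Apply a sequence of actions and return the final state.

    Raises ValueError when an action is not applicable in the current state.
    """
    for action in plan:
        key = (state, action)
        if key not in gamma:
            raise ValueError(f"action {action!r} is not applicable in state {state!r}")
        state = gamma[key]
    return state
```

Under these assumed arcs, the plan take, move1, start-clean leads from s0 to s3, and appending end-clean, move2, put brings the system back to s0.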

Representations for classical planning
The classical planning problems are presented in three different ways [26], namely: i) Set theoretic representation, ii) Classical representation, iii) State-Variable representation. This work focuses on the second representation (classical) to apply its planning on the proposed approach.

Simulation
An illustrative example with two scenarios for dirty zones (with a homogeneous degree of dirt and a low degree of dirt) is provided to simulate the functioning of the proposed approach. Two trajectory planning solutions are given for the USVcz swarm, namely the modified-GA and the modified-AA. The following measures highlight the contributions of the proposed approach: the displacement energy consumption between the base of life and the dirty zone, the displacement-cleaning energy consumption in the dirty zone, and the total energy consumption of the USVcz swarm. Figure 15 illustrates a simplified example of the virtual environment. The maritime space of a 'Region' is assumed to include a polluted 'Zone'; the central unit consists of a base of life. Two different dirty zones are proposed according to the degrees of dirt in this region. After the degrees of dirt are randomly defined in the environment for the two zones, the discretization step is activated. Two matrices are obtained, where the black cells represent the dirty zone (Figure 16 and Figure 17).

Results
Two modified solutions for trajectory planning were used in this work. These solutions were implemented using a genetic algorithm and an ant algorithm. The two modified algorithms were programmed in open source Java. The simulations were run on a PC with a Core(TM) i5-5200U @ 2.20 GHz running the Windows 7 Professional operating system. The proposed approach was implemented for a maritime region without hard obstacles that could prevent the vehicles from navigating and planning their trajectories in this environment. This region includes a monitoring vehicle (UAVmr), a cleaning vehicle swarm (USVcz) and a dirty zone. Based on a metric map (Grid), the UAVmr captured data from its region. This data was measured against a color metric. The colors were partitioned into four intervals, ]0, 25], ]25, 50], ]50, 75] and ]75, 100], which correspond to a white cell (weak dirt), a light brown cell (average dirt), a dark brown cell (medium-strong dirt) and a black cell (strong dirt). The UAVmr classifies this data (the degrees of the cells) by comparing the predefined threshold value (equal to 25% of the degree of dirt) with the Degree cell of each cell. The number of USVcz composing the swarm used in the simulations varied from 3 to 7. Each USVcz swarm planned its movement based on optimal trajectories to reach its dirty zone and clean it. These trajectories were built with either the modified-GA or the modified-AA. The modified-GA was executed successively on each USVcz of the swarm.
The population was composed of five individuals and the maximum number of generations was set to five, because the solution obtained at the fourth generation was identical to that of the fifth generation within a tolerated threshold. The initial population was randomly generated, with each individual (trajectory) having a start and an end position. Tournament selection was applied to find the individual (trajectory) with a low fitness (energy consumed). The fitness function used two balancing parameters: A = 0.02 and B = 0.01. The crossover was then applied to this individual (as described in sub-section 'Solution 1: modified-GA trajectory planning'). The mutation was not applied in this work because the two new children (trajectories) were observed to be 100% correct at the end of each generation.
In addition, the parameters of the modified-AA used to find the best trajectory to be applied to the USVcz swarm are: τ0 = 0.1, ρ = 0.5, α = 2, β = 1, n = 90, with the time lapses running from t0 = 0 to tmax = 4. The cost value between the cells in the grid is randomly generated between 5 and 10. The energy required for a USVcz of the swarm to turn left / right from its position to another position is 0.2%, and 0.05% to continue straight ahead. When the USVcz swarm arrives at its zone, it starts cleaning dirty cells based on the proposed Algorithm 4. The energy required to clean a black cell is 0.9%, a medium-strong dirt cell 0.5% and an average dirt cell 0.2%. A formula was developed to calculate the total energy consumption (TEC) of each USVcz swarm: TEC = DEC1 + DCEC, with DCEC = DEC2 + CEC. This formula groups the displacement energy consumption from the base of life to the dirty zone (DEC1) with the displacement-cleaning energy consumption in the dirty zone (DCEC), which is the displacement energy consumption (DEC2) plus the cleaning energy consumption (CEC) in that zone.
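The classification of the cells and the energy bookkeeping can be sketched as follows. The color intervals, the per-cell cleaning costs and the TEC formula come from the text; the function names, the zero cleaning cost assigned to white cells (which fall at or below the 25% threshold) and the rejection of degrees outside ]0, 100] are assumptions.

```python
# Cleaning energy per cell color, in percent of the USVcz energy.
# The 0.0 entry for white cells is an assumption: the text gives no cost for them.
CLEANING_COST = {"black": 0.9, "dark brown": 0.5, "light brown": 0.2, "white": 0.0}

def color_of(degree):
    """Map a degree of dirt in ]0, 100] to the color of its interval."""
    if not 0 < degree <= 100:
        raise ValueError("degree of dirt must lie in ]0, 100]")
    if degree <= 25:
        return "white"        # weak dirt
    if degree <= 50:
        return "light brown"  # average dirt
    if degree <= 75:
        return "dark brown"   # medium-strong dirt
    return "black"            # strong dirt

def cleaning_energy(degrees):
    """CEC: energy spent cleaning a sequence of cells given by their degrees of dirt."""
    return sum(CLEANING_COST[color_of(d)] for d in degrees)

def total_energy(dec1, dec2, cec):
    """TEC = DEC1 + DCEC, with DCEC = DEC2 + CEC."""
    dcec = dec2 + cec
    return dec1 + dcec
```

For instance, four cells with degrees 10, 30, 60 and 90 cost 0 + 0.2 + 0.5 + 0.9 = 1.6% to clean under these assumptions; with hypothetical displacement values DEC1 = 3.0% and DEC2 = 1.0%, the TEC is 5.6%.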

Scenario 1: Zone with a strong degree of dirt
This first scenario shows the simulation results for a dirty zone with a strong degree of dirt. These results give the curves of DEC1 (or DEC), DCEC and TEC of the USVcz swarm for the two proposed solutions:
-Result of displacement energy consumption (DEC1): Figure 18 shows the results of the displacement energy consumption of each USVcz swarm from the base of life to its dirty zone. The modified-GA was compared with the modified-AA in terms of energy consumption (DEC1). The DEC curve of the modified-AA lies below the DEC curve of the modified-GA, with a minimum gain of 1.01% and a maximum gain of 3.51%. Therefore, USVcz consumes less energy to plan its trajectory using the modified-AA than using the modified-GA, with an average gain of 2.62%.
-Result of displacement-cleaning energy consumption (DCEC): Figure 19 shows a simulation that evaluates the USVcz swarm behavior through its energy consumption for moving in and cleaning its dirty zone. Table 2 shows that the DCEC of the modified-AA decreases when the swarm is composed of 3, 4, 6 and 7 USVcz compared to the DCEC of the modified-GA, with a low minimum gain of 0.5%. On the other hand, it increases rapidly when the swarm contains 5 USVcz, with a maximum gain of 1.20%. Thus, the modified-GA is generally better than the modified-AA for moving and cleaning with a USVcz swarm in a zone with a strong degree of dirt, with an average gain of 0.4%.
-Result of total energy consumption (TEC): the two previous results were combined to calculate the total energy consumption for the two proposed algorithms. Figure 20 shows that the TEC curve of the modified-AA is below the TEC curve of the modified-GA, with a minimum gain of 1.13% and a maximum gain of 3.5%. Therefore, the proposed modified-AA consumes less energy than the modified-GA, with an average gain of 2.4% in a zone with a strong degree of dirt.

Scenario 2: Zone with a low degree of dirt
The second scenario shows the simulation results for a zone with a low degree of dirt. These results give the curves of DEC1 (or DEC), DCEC and TEC of the USVcz swarm for the two proposed solutions.
-Result of displacement energy consumption (DEC1): Figure 21 shows the results of DEC1 of each USVcz swarm from the base of life to its dirty zone. The modified-GA was compared with the modified-AA with respect to DEC1. The DEC1 curve of the modified-AA lies below the DEC1 curve of the modified-GA, with a minimum gain of 0.63% and a maximum gain of 3.4%. Therefore, the USVcz swarm consumes less energy to plan its trajectory using the modified-AA than using the modified-GA, with an average gain of 1.71%.
-Result of displacement-cleaning energy consumption (DCEC): Figure 22 presents a simulation that evaluated the USVcz swarm behavior in terms of DCEC in a zone with a low degree of dirt. Table 3 shows that the swarms of 3, 4, 5 and 7 USVcz consume almost the same displacement and cleaning energy with the modified-AA and the modified-GA, with a minimum gain of 0.01%. In addition, the amount of energy consumed by a swarm of 5 USVcz using the modified-AA increases rapidly, with a maximum gain of 0.4% compared to the modified-GA. Thus, the modified-GA is generally better than the modified-AA in moving and cleaning for the USVcz swarms in this dirty zone, with an average gain of 0.4%.
-Result of total energy consumption (TEC): Figure 23 shows that the TEC curve of the modified-AA is below the TEC curve of the modified-GA, with a minimum gain of 0.63% and a maximum gain of 3.4%. Therefore, the proposed modified-AA consumes less energy than the modified-GA, with an average gain of 1.64% in a zone with a low degree of dirt.

Discussions
The USVcz swarm trajectory planning applied in this approach is based on two modified meta-heuristic methods: the modified-AA and the modified-GA. The implementation of these algorithms shows that the modified-GA takes much more computation time to execute its steps than the modified-AA. Likewise, it needs a larger number of iterations than the modified-AA to converge to its best solution. In addition, a single path generated by the modified-AA is applied to a USVcz swarm in a sequential manner to reach its zone, whereas the modified-GA generates a trajectory for each USVcz of a swarm while respecting the shape of the swarm. Overall, the simulation results show that the USVcz swarm consumes less total energy (displacement plus cleaning) using the modified-AA than using the modified-GA in zones with different degrees of dirt.

Positioning of the approach
This section positions the proposed approach in relation to the related works cited in Section 2. Each study has characteristics / parameters that differentiate it from the others. Table 4 compares these studies and shows the advantage of the proposed approach. In this hybrid approach, a method of cooperation and coordination between a General crd, a semi-autonomous UAVmr and a USVcz swarm was proposed to accomplish the cleaning missions of dirty oceanic regions, contrary to the work of [17]. In [17], the authors developed a complete framework for the design and control of swarms of autonomous collaborative robots, with particular emphasis on quadcopters collaborating with each other to perform space tasks. Furthermore, the work in [13] proposed a trajectory planning guidance system based on a USVcz swarm. In addition, the article [3] enhanced the first system for ASVs, an AUV and a guide. The UAVmr of this approach locates and detects the dirty zone using its sensors, like the USV developed in [19], which locates a gas source in the sea with two gas sensors. In [21], a sample of the surface water is extracted and an image-based detection algorithm is run to confirm the presence of oil. Depending on the data received and the map explored by the UAVmr, the general coordinator can send a swarm of USVcz to each dirty zone by executing Algorithm 1.
Thus, in this paper two solutions were proposed for trajectory planning from the base of life to the dirty zone for each USVcz swarm. The first solution is based on a modified-GA derived from the GA proposed in [24], while the second is inspired by the ant algorithms proposed in [6,25]. The swarm moves in a square grid (2D space) discretized by UAVmr (Supmr). The cells are marked based on the degrees of dirt that the UAVmr derives from the colors captured. By contrast, the work of [5] used a grid modeled by black (obstacle) and white (clean) cells. In addition, a Voronoi free-space scheme was used in [12] to model the work environment. The swarm applies both solutions to find its trajectory with less distance. The fitness function of the modified-GA is calculated from the energy consumed between the positions traveled and the energy consumed by all the actions performed (turn left / right or continue directly) by USVcz.
On the other hand, the fitness in [9,12,22] is based on the distance traveled and the Euclidean distance to the goal. The tournament selection method was applied in the modified-GA, while the roulette method was applied in [9,11]. The USVcz swarm trajectory planning problem was handled so as to find the shortest trajectory. During simulation, it was noticed that the MMAS algorithm proposed in [11] offers very good performance for the optimal path compared to a GA and an A* algorithm. Likewise, the GA of [9] has a higher average performance than A* and C*. In the proposed approach, a dirty zone cleaning algorithm was applied to the USVcz swarm, where each swarm can move and clean its dirty cells at the same time, without specifying how the dirt (the oil spill) is removed. By contrast, the swarms of labor robots in [20] can place a barge with oil suction equipment and move it to another location to remove the oil more safely.
The two proposed solutions were also compared with respect to the two different scenarios, namely a zone with a homogeneous degree of dirt and a zone with a strong degree of dirt. These scenarios were simulated according to three metrics: the displacement energy consumption from the base of life to the dirty zone, the displacement and cleaning energy consumption in the dirty zone, and the total energy consumption. The simulation results confirmed the choice to use both solutions. In particular, the modified-AA algorithm gives encouraging results compared to the modified-GA algorithm.

Conclusion
This article has presented a hierarchical decision-making system for a hybrid air-sea approach. This approach uses a UAVmr for each maritime region and a USVcz swarm to clean dirty zones. In the monitoring step, color was chosen as the measure of the degree of dirt that allows the UAVmr to detect the dirty zones and trigger the start of cleaning. For the cleaning phase, two solutions were proposed: 'modified-GA' and 'modified-AA'. With these solutions, a USVcz swarm plans its displacement from the base of life towards its dirty zone. When the swarm arrives at its dirty zone, it begins to move and clean its dirty cells based on the proposed Algorithm 4. The USVcz swarm checks its amount of energy against an energy threshold and then sends its information to its Leadercz. The latter forwards this information to its Supmr to find a competent USVcz. Supmr launches the USVcz(competent), which replaces the USVcz(discharge) when the energy of the latter becomes low. The proposition is formalized by means of a classical representation. Two scenarios were proposed to simulate this proposal, namely a zone with a strong degree of dirt and a zone with a low degree of dirt. The measured metrics are the displacement energy consumption, the displacement energy consumption plus cleaning, and the total energy consumption for each USVcz swarm. The situation where a USVcz breaks down was not simulated in this work (a simulation was performed in [16,23]). The simulation results show that the proposal applied with the modified-AA gives encouraging results compared to the modified-GA. This work did not address the problem of failure of the General crd, the Leadercz and the UAVmr in the execution of the tasks. Moreover, the Leadercz was replaced after the trajectory planning phase by one of its followers, due to the lack of a leader with enough energy to move, clean and receive / transmit data. Therefore, as future work, the authors intend to develop a hybrid approach to address this problem.
In addition, the authors aim to develop the modified-AA with a wave propagation mechanism instead of pheromones, and to study the influence of speed variations of the unmanned vehicles. Furthermore, the authors intend to assign a cluster of drones to each region in a complex environment with solid obstacles, thus implementing an intelligent planning approach similar to the PSO method, using fuzzy logic in combination with the bee colony algorithm.