Land Use Land Cover map segmentation using Remote Sensing: A Case study of Ajoy river watershed, India

Image segmentation of overlapping land cover regions in satellite imagery is a crucial challenge: when mixed pixels in overlapping regions are classified, determining the true class of each pixel becomes difficult. In this work, we propose a new image segmentation approach based on a hybrid of the K-Means and Cellular Automata algorithms. The new unsupervised model detects cluster groups with a hybrid 2-dimensional Cellular Automata model built on K-Means segmentation. In the first stage, the existing K-Means algorithm detects the different land use land cover areas in the satellite imagery. A cellular automaton is a discrete dynamical system made up of uniform, interconnected cells, each holding a state. In the second stage of the model, we use a 2-dimensional cellular automaton to rank the allocations of pixels among the different land cover regions. The method is tested on the watershed area of the Ajoy river (India) and on the Salinas (California) data set with true class labels, using two internal and four external validity indices. The segmented areas are compared with those of the existing FCM, DBSCAN and K-Means methods and verified against the ground truth. Statistical analysis also shows the superiority of the new method.


Introduction
Congalton and Green (1999) defined remote sensing as a method for acquiring knowledge of an object without direct physical contact with it. A wide range of methods exists for grouping the pixels of satellite images into predefined classes (such as urban area or turbid water). Following this theory, a remote sensing data set can be represented as an image of $r \times s \times n$ dimensions (in pixels), where $p_{ij} = \{p_{ij1}, p_{ij2}, \ldots, p_{ijn}\}$ denotes the values of the $n$ spectral bands for the $(i, j)$th pixel. To find similar regions among overlapping segments, we partition the chosen watershed remote sensing image, as well as another spatial image data set, with our new hybrid unsupervised algorithm.
Let $P$ ($\mathbb{R}^n$ or $\mathbb{Z}^n$) denote the image space of the selected remote sensing imagery, so that the points (pixels) in $P$ denote the spatial variables $x$ and $y$. Let $d_P(x, y)$ denote the Euclidean distance between two pixels $x, y \in P$ [15]. In the domain of spatial images, a crisp object $C$ may be regarded as a subset of $P$, $C \subseteq P$. The scope of this article is a new image segmentation method that applies a cellular automata approach on top of the K-Means algorithm.
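The notation above can be made concrete with a short NumPy sketch. It assumes the image is held as an $r \times s \times n$ array (rows, columns, spectral bands) and that $d_P$ is evaluated on the pixels' $n$-band spectral vectors, which is how distances are used in the K-Means phase; the function names `pixel_vectors` and `spectral_distance` are hypothetical.

```python
import numpy as np

def pixel_vectors(image: np.ndarray) -> np.ndarray:
    """Flatten an r x s x n image cube into an (r*s) x n matrix.

    Row k holds the n spectral values p_ij of pixel (i, j), with k = i*s + j.
    """
    r, s, n = image.shape
    return image.reshape(r * s, n)

def spectral_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Euclidean distance between the spectral vectors of two pixels."""
    return float(np.linalg.norm(p - q))

# A toy 2 x 2 image with n = 3 bands.
image = np.array([[[0, 0, 0], [3, 4, 0]],
                  [[1, 1, 1], [0, 0, 5]]], dtype=float)
X = pixel_vectors(image)
print(X.shape)                         # (4, 3)
print(spectral_distance(X[0], X[1]))   # 5.0
```

Flattening the cube into an $(r \cdot s) \times n$ matrix is a convenient form for clustering; the resulting labels can be reshaped back to $r \times s$ for a grid-based step such as the CA phase.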
Unsupervised classification methods cluster data by maximizing similarity within classes and minimizing similarity between classes. State-of-the-art clustering methods that have frequently been used for remote sensing image segmentation include the self-organizing map [12], the K-Means algorithm [13], simulated annealing [11], graph-theoretic approaches [17] and the fuzzy c-means algorithm [5]. Methods based on other principles, such as symmetry-based clustering [7], can also efficiently detect arbitrarily shaped land cover regions in satellite images.
The proposed hybrid approach is used to classify pixels in two data sets: the Salinas A remote sensing data set and a satellite image of the catchment region of the river Ajoy. On the Salinas A data set, which has true class labels, we also compare our results with the solutions of the K-Means and FCM algorithms using external quantitative validity measures. Quantitative evaluation with two internal and four external validity indices shows the efficiency of the new hybrid algorithm. The land cover clusters detected by the new K-Means based, Cellular Automata (CA) corrected algorithm (KCA) were verified against ground truth knowledge. Statistical evaluation on both data sets also confirms the significance of the improvement of the proposed hybrid KCA approach over the K-Means, DBSCAN and FCM methods.

K-Means Clustering Algorithm
The K-Means method is a popular partition-based unsupervised clustering algorithm. James MacQueen introduced the term "k-means" [13] in 1967. In this approach, $N$ objects $\{x_1, x_2, \ldots, x_N\}$ are clustered into $K$ clusters, with $K$ randomly chosen objects serving as the initial centroids $(m_1, m_2, \ldots, m_K)$. The distance between an object $x_i$ and a centroid $m_j$ is then calculated as the Euclidean distance $d(x_i, m_j) = \lVert x_i - m_j \rVert = \sqrt{\sum_{l=1}^{n} (x_{il} - m_{jl})^2}$, where $n$ is the number of features (spectral bands).
Based on the computed distances, each object is assigned to the cluster whose centroid is nearest. After this reallocation phase, the centroid of each of the $K$ clusters $C_j$ is recalculated as $m_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i$. These two steps are repeated until the objective function $J = \sum_{j=1}^{K} \sum_{x_i \in C_j} d(x_i, m_j)^2$ is minimized or until the number of iterations exceeds a predefined value. The K-Means approach is efficient and suits high-dimensional data with spherical clusters.
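The K-Means steps described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' MATLAB implementation: it initializes with $K$ randomly chosen objects (as in this section), uses squared Euclidean distances, and stops when the change in MSE falls below `eps` (the $10^{-5}$ threshold stated for KCA) or after `max_iter` iterations. The names `kmeans`, `eps`, `max_iter` and `seed` are assumptions for illustration.

```python
import numpy as np

def kmeans(X, K, eps=1e-5, max_iter=100, seed=0):
    """Cluster the rows of X (an N x n array) into K clusters.

    Returns (labels, centroids, mse). Stops when the change in the mean
    squared error between iterations is below eps, or after max_iter iterations.
    """
    rng = np.random.default_rng(seed)
    # Initial centroids: K distinct objects chosen at random.
    centroids = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    prev_mse = np.inf
    for _ in range(max_iter):
        # Squared Euclidean distance from every object to every centroid, shape (N, K).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)          # assign each object to its nearest centroid
        for j in range(K):
            members = X[labels == j]
            if len(members) > 0:            # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
        mse = ((X - centroids[labels]) ** 2).sum(axis=1).mean()
        if abs(prev_mse - mse) < eps:
            break
        prev_mse = mse
    return labels, centroids, mse

# Two well-separated groups of 2-D points.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids, mse = kmeans(X, K=2)
print(labels)
```

The cluster numbering depends on the random initialization, but the two groups of points always end up in separate clusters for this toy input.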

Cellular Automata Method
Cellular Automata (CA) are discrete spatial dynamical systems used to model physical systems [18]. A CA is a computational device that evolves in discrete space as well as in discrete time. A cellular automaton can be seeded with any state, for example all 0s with a single 1 at some position. From such a seed, it generates a distinctive pattern of 0s and 1s determined by its rule.
Stephen Wolfram [19] studied the simplest CA, which takes the form of a one-dimensional lattice of cells. Each cell stores a discrete variable, whose value at time $t$ is the present state of that cell. The next state at time $t+1$ depends on the present state of the cell and the states of its neighbors at time $t$. For a 3-neighborhood cellular automaton (self, left and right neighbors) with two states per cell, 0 or 1, the next state of the $i$th cell is $S_i^{t+1} = f(S_{i-1}^t, S_i^t, S_{i+1}^t)$, where $f$ denotes the next-state function and $S_{i-1}^t$, $S_i^t$ and $S_{i+1}^t$ are the present states of the left neighbor, the cell itself and the right neighbor at time $t$. The function $f$ is given as a look-up table in Table 1. The 'Rule' $R_i$ [19] is the decimal equivalent of the 8 outputs in Table 1. Since there are $2^3 = 8$ neighborhood configurations, each with two possible outputs, a 2-state 3-neighborhood cellular automaton has $2^8 = 256$ possible rules.
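The correspondence between a rule number and its look-up table can be illustrated with a small Python function. It assumes Wolfram's standard numbering, in which bit $k$ of the rule number is the next state for the neighborhood $(S_{i-1}^t, S_i^t, S_{i+1}^t)$ whose binary value is $k$; the names `rule_table` and `rule_number` are hypothetical, and the row order of Table 1 may differ from the order printed here.

```python
def rule_table(rule: int) -> dict:
    """Return the look-up table f for an elementary (2-state, 3-neighborhood) CA rule.

    Keys are (left, self, right) neighborhoods; values are next states.
    Bit k of `rule` is the output for the neighborhood with binary value k.
    """
    if not 0 <= rule <= 255:
        raise ValueError("a 2-state, 3-neighborhood CA has only 256 rules (0-255)")
    return {(l, c, r): (rule >> (4 * l + 2 * c + r)) & 1
            for l in (0, 1) for c in (0, 1) for r in (0, 1)}

def rule_number(table: dict) -> int:
    """Inverse mapping: the decimal equivalent of the 8 outputs."""
    return sum(out << (4 * l + 2 * c + r) for (l, c, r), out in table.items())

table30 = rule_table(30)
for nbhd in sorted(table30, reverse=True):   # 111, 110, ..., 000
    print(nbhd, "->", table30[nbhd])
```

Reading the outputs from neighborhood 111 down to 000 gives the bit string 00011110, which is 30 in decimal.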
Rule 30, shown in Table 1, is one such rule; a Rule 30 CA can generate a sequence of random-looking patterns. In experiments, an $n$-cell Rule 30 CA is typically seeded with a state of all 0s and a single 1. After $n$ iterations, this CA produces a state with a fair distribution of 0s and 1s. However, this does not guarantee that the all-0 pattern never appears after $n$ iterations. It can be proved that, for every $n > 1$, this CA produces non-zero states after $n/2$ iterations.
In Wolfram's classification scheme, Rule 30 belongs to class III [19], which exhibits aperiodic, chaotic behavior. Stephen Wolfram first noticed intrinsic randomness arising in a deterministic system in the Rule 30 CA. When the CA is initialized with a single black cell, it generates a random-looking pattern from the center, as in Figure 1 b). Figure 1 a), b), c) and d) show different random-looking patterns created by Rule 30 from different initializations. This illustrates how random patterns can be created from custom-defined rules. In fact, Mathematica has used the center column of cell values as one of its random number generators.
In a periodic boundary CA, the leftmost and rightmost cells are neighbors of each other. In a null boundary CA, by contrast, the missing neighbors of the boundary cells are fixed at state 0. CAs are further classified as deterministic or probabilistic (stochastic). The CA in Table 1 is deterministic.
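Rule 30 evolution under the two boundary conditions can be sketched with NumPy. The sketch starts from all 0s with a single 1 in the middle cell, updates all cells synchronously, and returns the full space-time history, whose center column is the bit sequence referred to above. The names `step` and `evolve` are hypothetical.

```python
import numpy as np

def step(state: np.ndarray, rule: int, periodic: bool = False) -> np.ndarray:
    """One synchronous update of a 1-D, 2-state, 3-neighborhood CA."""
    if periodic:
        # Periodic boundary: the leftmost and rightmost cells are neighbors.
        left, right = np.roll(state, 1), np.roll(state, -1)
    else:
        # Null boundary: missing neighbors beyond the ends are fixed at 0.
        left = np.concatenate(([0], state[:-1]))
        right = np.concatenate((state[1:], [0]))
    index = 4 * left + 2 * state + right    # each neighborhood as a 3-bit number
    return (rule >> index) & 1              # look up that bit of the rule number

def evolve(n_cells: int, steps: int, rule: int = 30, periodic: bool = False) -> np.ndarray:
    """Evolve from a seed of all 0s with a single 1 in the middle cell."""
    state = np.zeros(n_cells, dtype=np.int64)
    state[n_cells // 2] = 1
    history = [state]
    for _ in range(steps):
        state = step(state, rule, periodic)
        history.append(state)
    return np.array(history)

h = evolve(7, 3)
for row in h:
    print("".join("#" if v else "." for v in row))
```

For example, `evolve(7, 3)[:, 3]` gives the first four center-column bits, 1, 1, 0, 1. The two boundary conditions only differ once the pattern reaches the edges of the lattice.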

Proposed hybrid KCA clustering algorithm
Our hybrid model, named the KCA algorithm, has two steps. In the first phase, the K-Means segmentation approach clusters the pixels of the remote sensing image into similar land use land cover regions. In the fine-tuning step, the cellular-automata-based neighborhood priority enhancement method is applied, as described in the following Subsection 2.3.1. The two steps are shown in Figure 2.
In the K-Means phase, pixels are first randomly assigned to clusters, as mentioned in Section 2, and the initial cluster centroids $C^{(0)}$ are computed from these initial pixel allocations. The threshold $\varepsilon$ on the change in the Mean Square Error (MSE) objective function, used to stop the iterations, is set to $10^{-5}$.
In each iteration, the centroid reallocation step reduces point-to-centroid distances by assigning each pixel to its nearest centroid, and the iterations converge as the MSE decreases. The algorithm stops when the difference between the previous and present MSE values is smaller than $\varepsilon$ or when the defined number of iterations has been reached.
After the initial allocations from the K-Means method, the new cellular-automata-based neighborhood priority enhancement method is applied to all pixels, as described in the next subsection. Quantitative evaluations are then performed on the final CA-corrected solutions.

Cellular automata based neighborhood priority enhancement method
In the von Neumann neighborhood model, each cell of a 2-dimensional cellular automaton interacts with its 4 orthogonally adjacent cells. In remote sensing image segmentation, this model can improve the allocation of mixed pixels by taking neighboring regions into account. Therefore, in this work, we implement a hybrid algorithm in which the well-known K-Means method is enhanced with a 2-dimensional, von Neumann neighborhood based Cellular Automata approach.
In the final fine-tuning phase, the states of the cells of the 2-dimensional hybrid CA are given by the cluster allocations from the initial K-Means based clustering phase. Each cell corresponds to the pixel at the same position in the remote sensing satellite image, and its initial state is the cluster assigned to that pixel in the first phase of our algorithm; for example, state 0 denotes that the pixel was assigned to cluster 0. The implemented model is a deterministic CA with a null boundary condition. Our proposed 2-dimensional K-Cellular Automaton framework is shown as a flowchart in Figure 2.
This fine-tuning CA corrects the cluster allocations output by the initial, well-known K-Means partitioning algorithm on the basis of each pixel's neighborhood. The experiment is performed on both data sets: the chosen remote sensing image and the Salinas A remote sensing data set. If at least two of the 4 neighbors (left, right, top, bottom) share a specific cluster value, the state of the current cell is updated to that cluster value, so that the cell is placed in the same cluster as its neighborhood. Otherwise, if the neighbors carry different cluster values, none of which is shared by at least two of them, the current cell keeps its current cluster value in the next state. This approach reduces the number of outliers in small homogeneous regions and also improves the allocation of mixed pixels. The corrections are applied iteratively to the cells of the CA matrix for a fixed number of iterations, chosen according to the dimensions of the image, to obtain proper neighborhood enhancements for the pixels of the chosen satellite catchment image.
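The neighborhood correction rule can be sketched in Python on a 2-D array of cluster labels. This is our reading of the rule, with assumptions the text does not fix: all cells are updated synchronously; the null boundary is treated as "no neighbor" rather than as state 0 (cluster 0 is a valid label, so padding with 0 would bias border pixels); when two different values each occur twice among the 4 neighbors, the cell keeps its current value; and the iteration count `n_iter` is left to the caller. The name `ca_correct` is hypothetical.

```python
from collections import Counter
import numpy as np

def ca_correct(labels: np.ndarray, n_iter: int = 1) -> np.ndarray:
    """Neighborhood correction of a 2-D grid of K-Means cluster labels.

    A cell takes value v if at least two of its von Neumann neighbors
    (left, right, top, bottom) have value v; otherwise it keeps its value.
    """
    grid = labels.copy()
    rows, cols = grid.shape
    for _ in range(n_iter):
        new = grid.copy()                    # synchronous update: read grid, write new
        for i in range(rows):
            for j in range(cols):
                neighbors = [grid[a, b]
                             for a, b in ((i, j - 1), (i, j + 1), (i - 1, j), (i + 1, j))
                             if 0 <= a < rows and 0 <= b < cols]   # skip cells outside the image
                counts = Counter(neighbors)
                majority = [v for v, c in counts.items() if c >= 2]
                if len(majority) == 1:       # exactly one value shared by >= 2 neighbors
                    new[i, j] = majority[0]
                # no shared value, or a 2-2 tie: keep the current cluster value
        grid = new
    return grid

# An isolated outlier inside a homogeneous region is absorbed by its neighborhood.
k_means_labels = np.zeros((5, 5), dtype=int)
k_means_labels[2, 2] = 1
print(ca_correct(k_means_labels))
```

Here the outlier at position (2, 2) is relabeled to cluster 0, while straight boundaries between larger regions stay unchanged because every boundary cell still has at least two neighbors on its own side.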

Study area
The LANDSAT image of the Ajoy River catchment chosen for analysis was extracted within Bardhaman district in West Bengal State, India. The original image, shown in Figure 3, contains the green, red and blue bands. Figure 4 shows the original image of the Ajoy river catchment region after histogram equalization, with 7 classes: water, forest, grassland, shrubland, cropland, urban and others (Table 2). The sample counts and percentages were obtained by thresholding around the RGB values relevant to each class, as described in the LULC classification scheme from landcover.org. The Ajoy river flows through the middle of the catchment area as a thin line, from the upper-left corner to the lower-right corner of the study area.
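A minimal sketch of this kind of per-class thresholding follows. The reference colors, the tolerance `tol`, and the first-match-wins ordering are hypothetical, since the exact thresholds of the landcover.org scheme are not given here; pixels matching no reference color are counted as "others". The function name `class_percentages` is also hypothetical.

```python
import numpy as np

def class_percentages(rgb: np.ndarray, ref_colors: dict, tol: int) -> dict:
    """Percentage of pixels within +/- tol of each class's reference RGB value.

    rgb: (rows, cols, 3) array. A pixel is counted for the first class whose
    reference color it matches in all three channels; the rest go to "others".
    """
    pixels = rgb.reshape(-1, 3).astype(int)   # avoid uint8 wrap-around in the subtraction
    total = len(pixels)
    unassigned = np.ones(total, dtype=bool)
    result = {}
    for name, color in ref_colors.items():
        match = np.all(np.abs(pixels - np.array(color)) <= tol, axis=1) & unassigned
        result[name] = float(100.0 * match.sum() / total)
        unassigned &= ~match
    result["others"] = float(100.0 * unassigned.sum() / total)
    return result

# Hypothetical reference colors, for illustration only.
refs = {"water": (0, 0, 255), "forest": (0, 128, 0)}
img = np.array([[[0, 0, 255], [0, 0, 250]],
                [[0, 128, 0], [200, 200, 200]]], dtype=np.uint8)
print(class_percentages(img, refs, tol=10))
# {'water': 50.0, 'forest': 25.0, 'others': 25.0}
```

Marking pixels as assigned once they match keeps the percentages summing to 100 even if two reference colors overlap within the tolerance.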

Data and Experimental Framework
The experiment has also been performed on the Salinas-A hyperspectral image using our proposed KCA algorithm and two other well-known methods. Because this data set has true class labels, it is used for both internal and external validation, and statistical evaluation has been performed on the solutions obtained for it. The Salinas scene was collected by the 224-band AVIRIS sensor over Salinas Valley, California, with a spatial resolution of 3.7-meter pixels. The Salinas-A scene is a small subscene of the Salinas image of 86 × 83 pixels, containing six ground truth classes (Table 4).
The new KCA algorithm is implemented in MATLAB R2014a on an Intel(R) Core(TM) i3-3220 processor at 3.30 GHz. For comparison, the well-known K-Means and FCM models are also implemented. For quantitative evaluation, we compute the Dunn [5] and Davies-Bouldin (DB) [4] internal validity indices on the clustering solutions of the KCA, K-Means, DBSCAN and FCM algorithms. The regions segmented by the KCA approach are also verified against the ground truth of land use land cover models.
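The two internal indices can be computed as in the following NumPy sketch. The exact definitions used in this work are those of [4] and [5]; the sketch assumes the common forms: DB averages, over clusters, the worst ratio of summed within-cluster scatter (mean distance to the centroid) to centroid separation (lower is better), and Dunn divides the smallest distance between points of different clusters by the largest cluster diameter (higher is better). The function names are hypothetical, and the pairwise distances need $O(N^2)$ memory, so a full satellite image would have to be subsampled.

```python
import numpy as np

def pairwise_dist(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Euclidean distances between every row of A and every row of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))

def davies_bouldin(X: np.ndarray, labels: np.ndarray) -> float:
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    # S_i: average distance of the members of cluster i to its centroid.
    S = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                  for k, c in zip(ks, cents)])
    M = pairwise_dist(cents, cents)
    np.fill_diagonal(M, np.inf)                  # exclude i == j from the max
    R = (S[:, None] + S[None, :]) / M
    return float(R.max(axis=1).mean())

def dunn(X: np.ndarray, labels: np.ndarray) -> float:
    groups = [X[labels == k] for k in np.unique(labels)]
    max_diameter = max(pairwise_dist(g, g).max() for g in groups)
    min_separation = min(pairwise_dist(groups[a], groups[b]).min()
                         for a in range(len(groups)) for b in range(a + 1, len(groups)))
    return float(min_separation / max_diameter)

X = np.array([[0, 0], [0, 2], [10, 0], [10, 2]], dtype=float)
labels = np.array([0, 0, 1, 1])
print(davies_bouldin(X, labels), dunn(X, labels))   # 0.2 5.0
```

For this toy partition into two tight, distant clusters, DB is low and Dunn is high, which matches the reading "lower DB and higher Dunn are better" used for the index tables.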

Land Use Land Cover Segmentation over Ajoy river catchment
The segmented images of the Ajoy river catchment produced by the K-Means and FCM methods are depicted in Figure 5 and Figure 6 for K = 7. In Figure 5, the K-Means algorithm fails to separate the river from its catchment background. The FCM method in Figure 6 produces a better solution but is unable to detect the middle part of the catchment area accurately. In both the K-Means and FCM solutions (Figure 5 and Figure 6), the water bodies in the top-left and middle-left parts and the lower-middle catchment background are mixed across all segments. The new KCA approach in Figure 7 separates the water bodies from the lower-middle catchment background and also classifies the river water more accurately than the other two approaches. These segmentation analyses show that our KCA approach detects overlapping land use land cover regions more effectively than the well-known K-Means and FCM methods. The DBSCAN approach assigns very few areas to distinct clusters, with a single cluster covering most of the image; therefore, its result is not considered for comparison.

Segmentation results on remote sensing data set
The ground truth classes for the Salinas A data set are shown in Figure 8, and the classification results of the FCM and KCA algorithms are shown in Figure 9 and Figure 10, respectively. DBSCAN fails to detect the proper number of clusters for this data set. The ground truth class details for this data set are given in Table 4. Table 5 shows the land cover classes obtained by the K-Means, FCM and KCA algorithms.

Quantitative analysis
The clustering solutions obtained by the chosen methods are evaluated objectively with the Davies-Bouldin (DB) and Dunn validity indices, as defined in [4] and [5] respectively. The validity index values on the Ajoy catchment image are shown in Table 6, and those on the Salinas A data set in Table 7. In Table 6, KCA obtains the best (lowest) DB index value, 0.5187; on the same data set, K-Means, DBSCAN and FCM obtain DB values of 0.9202, 0.9401 and 0.6582, respectively. Similarly, KCA obtains the highest (best) Dunn index value, 1.6691, while the K-Means and FCM methods obtain smaller values. In Table 7, the clustering solutions obtained by the K-Means, FCM, DBSCAN and proposed KCA algorithms are evaluated with the same internal DB and Dunn indices. On this data set, KCA again obtains the smallest DB index value, 0.4775, and the highest Dunn index value, 1.8399, among the three comparable methods (K-Means, FCM and KCA). So, for this hyperspectral data set too, the proposed KCA approach provides better clustering solutions than the other two methods. DBSCAN fails to detect the proper number of clusters, detecting more than 10 clusters for this data set; therefore, its DB value, although the smallest, is not considered, since its clusters are not comparable with those of the other methods.
The solutions obtained by the proposed method are also validated against the true class labels available with the Salinas-A hyperspectral data set, using four external validity indices: the Rand index [16], Adjusted Rand index (ARI) [9], Jaccard index [10] and Fowlkes-Mallows index [6]. All these indices give higher scores to better solutions. In Table 8, the ARI values obtained by the K-Means, FCM and KCA algorithms are 0.7471, 0.78017 and 0.83549, respectively.
For the Jaccard index, the proposed KCA algorithm also produces the maximum value, 0.79285, among the three algorithms. Similar results are obtained for the other two indices on the Salinas-A hyperspectral data set, considering its true labels, as shown in Table 8. DBSCAN produces too many clusters for the Salinas A data set, so its external validity index values are not comparable.
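All four external indices can be derived from pair counts of a contingency table between the true labels and the predicted clusters. The following Python sketch assumes the standard pair-counting definitions (Rand, Hubert-Arabie ARI, Jaccard and Fowlkes-Mallows); the function name `external_indices` is hypothetical, and the indices do not depend on how the clusters are numbered.

```python
from math import comb, sqrt
import numpy as np

def external_indices(true, pred) -> dict:
    """Rand, Adjusted Rand, Jaccard and Fowlkes-Mallows indices from pair counts."""
    _, t = np.unique(np.ravel(true), return_inverse=True)
    _, p = np.unique(np.ravel(pred), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)                  # contingency table n_ij
    total = comb(len(t), 2)                      # number of pixel pairs
    a = sum(comb(int(x), 2) for x in table.ravel())            # same class, same cluster
    same_true = sum(comb(int(x), 2) for x in table.sum(axis=1))
    same_pred = sum(comb(int(x), 2) for x in table.sum(axis=0))
    b = same_true - a                            # same class, different clusters
    c = same_pred - a                            # different classes, same cluster
    d = total - a - b - c                        # different classes, different clusters
    expected = same_true * same_pred / total     # expected value of a under chance
    return {
        "rand": (a + d) / total,
        "ari": (a - expected) / (0.5 * (same_true + same_pred) - expected),
        "jaccard": a / (a + b + c),
        "fowlkes_mallows": a / sqrt(same_true * same_pred),
    }

print(external_indices([0, 0, 1, 1, 2], [5, 5, 3, 3, 7]))   # identical up to relabeling
print(external_indices([0, 0, 1, 1], [0, 1, 0, 1]))         # clusters cut across classes
```

The first call scores 1.0 on every index because the partitions agree up to relabeling. The second, which cuts across the true classes, still has a Rand index of 1/3 but an ARI of -0.5, illustrating how ARI corrects the Rand index for chance agreement.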
These results show that KCA achieves the best values of the internal DB and Dunn indices as well as of the external Rand, Adjusted Rand, Jaccard and Fowlkes-Mallows (FM) indices, outperforming both the K-Means and FCM algorithms on all of them. Hence, KCA provides better clustering solutions than the K-Means and FCM methods.

Statistical analysis
A non-parametric statistical significance test, Wilcoxon's rank sum test for independent samples, is conducted at the 5% significance level [8].
DB index values are collected from 10 consecutive runs of each of the K-Means, FCM, DBSCAN and KCA algorithms on both data sets. The median values of each group on both data sets are shown in Table 9. KCA obtains smaller median values than the K-Means, DBSCAN and FCM methods, indicating better clustering. DBSCAN fails to detect the proper number of clusters for the Salinas A data set; therefore, its outputs on that data set are not considered in the statistical analysis. Table 10 presents the P-values and H-values obtained from Wilcoxon's rank sum tests on three pairs of groups, KCA-K-Means, KCA-DBSCAN and KCA-FCM, on both data sets. All P-values obtained here are less than 0.05 (the 5% significance level). For the Ajoy catchment area, the P-value of the rank sum test between KCA and K-Means is very small, 2.00E-003. This indicates that the difference between the clustering solutions of KCA and K-Means is statistically significant and unlikely to have occurred by chance. The KCA-FCM group shows similar significance. On the Salinas A hyperspectral data set, the P-values obtained in the same tests for the KCA-K-Means and KCA-FCM groups are 3.71E-002 and 3.9E-003, respectively. These P-values also lead to rejection of the null hypothesis. So, all results confirm the statistical significance of the land use land cover segments produced by the KCA algorithm over the K-Means and FCM solutions.
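The rank sum test compares two independent samples, here the 10 DB values of KCA and of a competing method. A minimal Python sketch using the large-sample normal approximation with a tie correction follows; whether an exact or an approximate p-value was used in this work is not stated, so the approximation is an assumption, as are the names `rank_sum_test` and `average_ranks`. The returned flag `h` (1 means the null hypothesis of equal distributions is rejected at level `alpha`) corresponds to the H-values in Table 10.

```python
from math import erfc, sqrt
import numpy as np

def average_ranks(values: np.ndarray) -> np.ndarray:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def rank_sum_test(x, y, alpha=0.05):
    """Two-sided Wilcoxon rank sum test (normal approximation). Returns (p, h)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    nx, ny = len(x), len(y)
    n = nx + ny
    pooled = np.concatenate([x, y])
    w = average_ranks(pooled)[:nx].sum()           # rank sum of the first sample
    mean_w = nx * (n + 1) / 2
    _, ties = np.unique(pooled, return_counts=True)
    tie_term = (ties ** 3 - ties).sum() / (n * (n - 1))
    var_w = nx * ny / 12 * ((n + 1) - tie_term)
    z = (w - mean_w) / sqrt(var_w)
    p = erfc(abs(z) / sqrt(2))                     # two-sided p-value
    return p, int(p < alpha)

# Hypothetical DB values from 10 runs each (illustration only, not the paper's data).
kca = [0.50, 0.51, 0.52, 0.52, 0.53, 0.51, 0.50, 0.54, 0.52, 0.53]
kmeans_db = [0.90, 0.92, 0.91, 0.93, 0.92, 0.94, 0.90, 0.91, 0.93, 0.92]
print(rank_sum_test(kca, kmeans_db))
```

With these invented samples the two groups do not overlap, so the test returns a very small p-value and h = 1; identical samples would give p = 1 and h = 0.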

Conclusions
Image segmentation is one of the most challenging tasks in remote sensing and helps to interpret land use land cover in satellite imagery [1][2][3][14]. A cellular automata model describes changes in cell states that depend on the cells' neighborhoods and occur at discrete time steps. Therefore, a cellular automata model can analyze land cover segments in remote sensing images by considering the classes of neighboring pixels.
The contribution of this article is a significant improvement over an existing partitioning algorithm in detecting mixed land cover regions in satellite images. The proposed KCA clustering algorithm introduces a hybrid of crisp-set based pixel segmentation and cellular-automata-based neighborhood reallocation: a crisp, mean-centroid based initial clustering of the satellite data is followed by CA-based neighborhood correction. The neighborhood enhancement phase corrects a significant portion of the outlier allocations in overlapping land use land cover regions. It checks the overall allocations against their neighborhoods to obtain improved land cover regions.
The performance of the proposed KCA approach is demonstrated on a satellite image of the Ajoy River catchment area, and the experiment is further carried out on the Salinas A hyperspectral data set, which has true class labels. Both quantitative and statistical evaluations establish the efficiency of the proposed KCA segmentation method in comparison with the well-known K-Means, DBSCAN and FCM methods. For quantitative evaluation, two internal and four external validity indices are analyzed over both data sets. Ground truth verification also shows the clear superiority of the new KCA algorithm over the other two well-known algorithms. Statistical tests further show the statistical significance of the KCA solutions compared with the K-Means and FCM algorithms on both data sets.