Sparse Decomposition Technique for Segmentation and Compression of Compound Images

Abstract Compression of compound documents and images is more challenging than that of ordinary images, since they can be a mix of text, pictures and graphics. The principal requirement in compressing a compound document or image is the quality of the compressed data. In this paper, different procedures are used under block-based classification to distinguish the compound image segments. The segmentation process starts with separation of the entire image into blocks by a sparse decomposition technique into smooth blocks and non-smooth blocks. A gray wolf optimization-based FCM (fuzzy C-means) algorithm is employed to segment background, text, graphics, images and overlap, which are then individually compressed using adaptive Huffman coding, embedded zero tree wavelet and H.264 coding techniques. Experimental results demonstrate that the proposed scheme increases the compression ratio, enhances image quality and also limits computational complexity. The proposed method is implemented on the MATLAB platform.


Introduction
Currently, with the advent of new technology, high-speed internet and the need for large amounts of data storage, image compression is one of the most important tasks. Additionally, medical science increasingly requires huge numbers of images to be stored digitally, most of which are generally grayscale images. Furthermore, in wireless sensor networks, where low-power devices are deployed, image compression techniques are required for reducing power consumption, transmission time and failure probability [12]. In JPEG2000, there is a choice of two discrete wavelet filters: the filters can be lifted (factorized) in order to speed up the convolution step. The 9/7 filter is chiefly suited for high-visual-quality compression. The use of floating-point arithmetic in the discrete wavelet transform and the associated rounding errors make it unsuitable for strictly lossless compression. The filter has, however, better de-correlation properties than the shorter 5/3 filter and hence a better compression performance [7]. The bit-allocation approaches did not change the coding structure; they simply gave more bits or finer quantization steps to the text/graphic areas. However, these approaches could not deal with cases where most of the image was text/graphic region [18].
Usually, in FIC, an image is first partitioned into square blocks, and these square blocks compose a set called the pool. According to two block sizes, an image is partitioned into two dissimilar pools. The pool composed of the blocks with larger size is called the domain pool, and the other pool is called the range pool. The cells in the range pool are the blocks to be encoded [15]. Several digital watermarking algorithms have been proposed with different contributions. The goal is to embed a watermark that is imperceptible in the image, while the copyright holder can detect its existence using a proper private data key. Roughly speaking, these watermarking schemes can be categorized by casting/processing domain, signal type of watermark and hiding position. There are two processing domain categories: the spatial domain and the transform domain. In contrast to spatial domain-based methods, transform domain-based methods can embed more bits of the watermark and have better robustness against attacks such as noise, JPEG compression and Gaussian low-pass filtering. Therefore, it has become one of the focuses of study in this community [9]. The image to be compressed is grayscale with pixel values between 0 and 255. Compression refers to reducing the quantity of data used to represent a file, image or video content without excessively reducing the quality of the original data. It also reduces the number of bits required to store and transmit digital media. Compression can be defined as the process of reducing the actual number of bits required to represent an image [13].
There are some stringent requirements for the compression of video-like screen contents considering the screen-sharing scenarios. First, the screen compression schemes should guarantee high-fidelity display, especially on the textual contents for visual experience. Second, the screen compression algorithms should achieve a high compression ratio to fulfill the network requirements [11]. Real-time high-quality compressed screen image transmission can also be used in many recent proposed applications, such as cloud, cloudlet screen and cloud mobile computing. In cloud computing, data transfer bottleneck is an obstacle. If the computer screen image can be transmitted in real time and with high quality from the host-to-user site, the data transfer bottleneck problem in cloud computing can be solved [14].
In this article, a sparse decomposition method for splitting the image into smooth and non-smooth blocks and a gray wolf-based FCM (GW-FCM) optimization for clustering are proposed. Different coders are employed for compressing specific types of image pixels: an adaptive Huffman coder compresses the smooth blocks or background, an EZW coder compresses the text pixels, and the H.264 method is employed for the graphics and overlap pixels. These coders form the compression block and yield the compressed version of the input compound image. The technique has been implemented on the MATLAB platform for performance and efficiency analysis. The paper is organized as follows: literature survey in Section 2, block diagram of the proposed methodology in Section 3, the sparse decomposition method in Section 3.1, gray wolf-based FCM in Section 3.2, compression coders in Section 3.3, results and discussion in Section 4 and conclusion in Section 5. Gueguen [6] proposed a new compact representation for the fast query/classification of compound structures from very-high-resolution optical remote sensing imagery. This bag-of-features representation relies on the multiscale segmentation of the input image and the quantization of image structures pooled into visual word distributions for the characterization of compound structures. A compressed form of the visual word distributions was described, allowing adaptive and fast queries/classification of image patterns. The proposed representation and the query methodology were evaluated for the classification of the University of California (UC) Merced 21-class data set, for the detection of informal settlements and for the discrimination of challenging agricultural classes.

Literature Survey
Ebenezer Juliet et al.
[1] proposed a new compound image segmentation algorithm based on the mixed raster content (MRC) model of the multilayer approach (foreground/mask/background). The algorithm first segments a compound image into different classes. Then, each class is transformed to the three-layer MRC model differently according to the properties of that class. Finally, the foreground and background layers are compressed using JPEG 2000, and the mask layer is compressed using JBIG2. The proposed morphological-based segmentation algorithm designs a binary segmentation mask that accurately partitions a compound image into different layers, such as the background and foreground layers.
Yang et al. [17] proposed a scale and orientation invariant grouping algorithm to adaptively generate textual connected components (TCCs) with uniform statistical features. The minimum average distance and morphological operations were employed to assist the formation of candidate TCCs. Then, three string-level features (i.e. sharpness, color similarity and mean activity level) were designed to distinguish the true TCCs from the false-positive ones that are formed by connecting the high-activity pictorial components. Extensive experiments showed that the proposed framework can segment textual regions precisely from born-digital compound images while preserving the integrity of texts with varied scales and orientations and avoiding overconnection of textual regions (Gnana et al. [2]).
Grailu [5] used the set partitioning in hierarchical trees (SPIHT) coder in the framework of ROI coding along with some image enhancement techniques to remove the leakage effect that occurred in wavelet-based low-bit-rate compression. They evaluated the compression performance of the proposed method with respect to some qualitative and quantitative measures. The qualitative measures include the averaged mean opinion score (MOS) curve along with demonstrations of some outputs under different conditions.
Kurbana et al. [8] used well-known evolutionary algorithms such as evolution strategy, genetic algorithm, differential evolution, adaptive differential evolution and swarm-based algorithms such as particle swarm optimization, artificial bee colony, cuckoo search and differential search algorithm to solve multilevel thresholding problem. Kapur's entropy was used as the fitness function to be maximized. Experiments are conducted on 20 different test images to compare the algorithms in terms of quality, running CPU times and compression ratios (Gnana et al. [3]).
Yang et al. [16] have studied the subjective quality evaluation for compressed DCIs and investigated whether existing image quality assessment (IQA) metrics are effective to evaluate the visual quality of compressed DCIs. A new compound image quality assessment database (CIQAD) was therefore constructed, including 24 reference and 576 compressed DCIs. The subjective scores of these DCIs were obtained via visual judgement of 62 subjects using paired comparison (PC) in which the Hodgerank decomposition was adopted to generate uncompleted but near-balanced pairs (Gnana et al. [4]). Fourteen state-of-the-art IQA metrics are adopted to assess quality of images in CIQAD, and experimental results indicate that the existing IQA methods were limited in evaluating visual quality of DCIs.
Zhu et al. [19] analyzed the characteristics of screen content and coding efficiency of HEVC on screen content. They proposed a new coding scheme, which adopts a non-transform representation, separating screen content into color component and structure component. Based on the proposed representation, two coding modes were designed for screen content to exploit the directional correlation and non-translational changes in screen video sequences. The proposed scheme was then seamlessly incorporated into the HEVC structure and implemented into HEVC range extension reference software HM9.0.
Minaee and Wang [10] have proposed a model that uses the fact that the background in each block was usually smoothly varying and can be modeled well by a linear combination of a few smoothly varying basis functions, while the foreground text and graphics create sharp discontinuity. The algorithms separated the background and foreground pixels by trying to fit background pixel values in the block into a smooth function using two different schemes. One was based on robust regression, where the inlier pixels will be considered as background, while remaining outlier pixels will be considered foreground. The second approach used a sparse decomposition framework where the background and foreground layers are modeled with smooth and sparse components, respectively.

Compound Image
Even in today's "paperless office", the document remains the most common and significant form of human communication. These documents are created via computers and are stored in electronic form. The obstacle faced, even with electronic documents, is that they can be quite large in size. Electronic document images have mixed content types such as text, background and graphics, in both grayscale and color form; they are labeled "compound images". There are various practices available for compressing compound images. Thus, to improve the compression ratio, a joint segmentation-based compression technique is introduced. In this proposed process, a sparse decomposition-based compression technique is employed. The sparse decomposition technique segments the smooth and non-smooth constituents. Next, a gray wolf optimization-based FCM procedure is engaged to segment the text, overlap and graphics regions. Finally, the adaptive Huffman coder, EZW coder and H.264 coding methods are employed for the compression.
The flowchart of the proposed method is depicted in Figure 1.

Sparse Decomposition Technique
The proposed sparse decomposition method is employed to segment a given image into two layers, background and foreground. The background holds the smooth part of the image and can be well represented with a few smooth basis functions, whereas the foreground holds the text, graphics and overlaps that cannot be represented with a smooth model. Using the fact that foreground pixels typically occupy a small percentage of the image, we can model them as a sparse component overlaid on top of the background. Consequently, it makes sense to think of the mixed-content image as a superposition of two layers, one smooth and the other sparse. Hence, we can use sparse decomposition methods to separate these components.
We first need to derive an appropriate model for the background component. We divide each image into non-overlapping blocks of size M × M, denoted by f(a, b), where a and b represent the horizontal and vertical axes, respectively. Each block is then characterized as the sum of two components, F = A + S, where A and S signify the smooth background and sparse foreground components, respectively. The background is modeled as a linear combination of K smooth basis functions:

f(a, b) = Σ_{k=1}^{K} x_k Z_k(a, b) + S(a, b),   (1)

where Z_k(a, b) denotes a 2D smooth basis function and x_1, …, x_K are the parameters of the smooth model. Because this model is linear in the parameters x_k, it is simple to find the optimal weights. All the possible basis functions are ordered in the conventional zig-zag order in the (u, v) plane, and the first K basis functions are selected. Here, Σ_k x_k Z_k(a, b) and S(a, b) represent the smooth background region and the foreground pixels, respectively.

For a more compact notation, we convert all two-dimensional (2D) blocks of size M × M into vectors of length M², represented by f and s, and stack the vectorized basis functions as the columns of a matrix Z of size M² × K, in which the k-th column corresponds to the vectorized version of Z_k(a, b). Equation (1) can then be written as

f = Zx + s.   (2)

To solve this decomposition problem, we need to impose some priors on x and s. In this method, three priors are enforced: sparsity of x, sparsity of the foreground s, and connectivity of the foreground. The reason for imposing sparsity on x is that we do not want to use too many basis functions for the background representation; without such a constraint on the coefficients, we might end up with all foreground pixels also being modeled by the smooth layer. The second prior, sparsity of the foreground, is motivated by the fact that the foreground pixels are expected to occupy only a small percentage of pixels in each block. Last but not least, we expect the nonzero components of the foreground to be connected to each other; we do not want a set of isolated points perceived as foreground. We can add a group sparsity regularization to promote the connectivity of the foreground pixels. All these priors can be incorporated into one optimization problem:

min_{x, s}  ‖f − Zx − s‖₂² + η₁‖x‖₁ + η₂‖s‖₁ + G(s),   (3)

where η₁ and η₂ are the weights of the regularization terms, which need to be tuned, and G(s) denotes the group sparsity term on the foreground:

G(s) = Σ_m ‖s_{g_m}‖₂,   (4)

where g_m represents the m-th group. Here, overlapping groups are used, consisting of all columns and all rows of the block. Consequently, the group sparsity term can be written as the summation of two terms, one over all columns and the other over all rows of the block.
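As an illustration of this decomposition, the following Python sketch (not the paper's MATLAB code) fits the smooth layer with the first K two-dimensional DCT basis functions in zig-zag order and recovers the sparse foreground by soft-thresholding the residual. The alternating least-squares/shrinkage loop is a simplified stand-in for the full optimization with the group sparsity term; the names and parameters (`K`, `lam`) are illustrative assumptions.

```python
import numpy as np

def dct_basis(M, K):
    # First K 2-D DCT basis functions (zig-zag order approximated by
    # ascending frequency sum u + v), vectorized as columns of an
    # M^2 x K matrix.
    idx = sorted(((u + v, u, v) for u in range(M) for v in range(M)))[:K]
    n = np.arange(M)
    cols = []
    for _, u, v in idx:
        bu = np.cos(np.pi * (2 * n + 1) * u / (2 * M))
        bv = np.cos(np.pi * (2 * n + 1) * v / (2 * M))
        cols.append(np.outer(bu, bv).ravel())
    return np.column_stack(cols)

def sparse_decompose(block, K=10, lam=20.0, n_iter=50):
    # Alternating minimization: least-squares fit of the smooth layer,
    # then soft-thresholding of the residual to get the sparse layer.
    M = block.shape[0]
    Z = dct_basis(M, K)
    f = block.ravel().astype(float)
    s = np.zeros_like(f)
    for _ in range(n_iter):
        x, *_ = np.linalg.lstsq(Z, f - s, rcond=None)
        r = f - Z @ x
        s = np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)  # soft threshold
    return (Z @ x).reshape(M, M), s.reshape(M, M)
```

On a smooth gradient block containing one bright spike, the spike lands in the foreground layer while the gradient is absorbed by the background layer.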

Gray Wolf-based Fuzzy C-means Optimization
Clustering is the procedure of partitioning a homogeneous group of objects into subsets known as clusters, so that objects in each cluster are more similar to each other than to objects from other clusters, on the basis of the values of their attributes. For handling randomly distributed data sets, soft computing has been introduced into clustering; it exploits the tolerance for imprecision and uncertainty in order to achieve tractability and robustness. Fuzzy sets and rough sets have been assimilated into the C-means framework to develop the fuzzy C-means (FCM) and rough C-means (RCM) algorithms. Assume X = (x₁, x₂, …, x_N) is the universe of a clustering data set, G = (γ₁, γ₂, …, γ_C) are the prototypes of C clusters, and M = [m_lm]_{N×C} is a fuzzy partition matrix, where m_lm ∈ [0, 1] is the membership of x_l in the cluster with prototype γ_m; x_l, γ_m ∈ R^P, where P is the data dimensionality, 1 ≤ l ≤ N and 1 ≤ m ≤ C. The FCM algorithm is derived by minimizing the objective function

J = Σ_{l=1}^{N} Σ_{m=1}^{C} m_lm^z d_lm²,

where z > 1 is the weighting exponent on each fuzzy membership and d_lm is the Euclidean distance from data vector x_l to cluster center γ_m.

Using the wolves' strategy (gray wolves encircle their prey during the hunt), the centroids are initialized with the help of gray wolf optimization, and the degree of membership is calculated for all feature vectors in all clusters. To mathematically model the encircling behavior, the following equations are proposed:

D = |C · X_p(t) − X(t)|,
X(t + 1) = X_p(t) − A · D,

where t denotes the current iteration, A and C are coefficient vectors, X_p is the position vector of the prey and X is the position vector of a gray wolf. The vectors A and C are computed as

A = 2a · r₁ − a,   C = 2 · r₂,

where the components of a are linearly decreased from 2 to 0 over the iterations and r₁, r₂ are random vectors in [0, 1]. The new centroid is given by

γ_m = ( Σ_{l=1}^{N} m_lm^z x_l ) / ( Σ_{l=1}^{N} m_lm^z ),

and the degree of membership m_lm is updated as

m_lm = 1 / Σ_{j=1}^{C} ( d_lm / d_lj )^{2/(z−1)}.   (13)

If ‖γ_new − γ_old‖ < ε, stop; otherwise, compute the new centroids again, where ε ∈ (0, 1) is the termination criterion. The FCM allows each feature vector to belong to every cluster with a membership between 0 and 1 that is computed by Equation (13). The foreground of the image is segmented into three clusters with dissimilar features; based on these features, text, graphics and overlap clusters are generated.
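The FCM loop can be sketched as below, assuming a simple deterministic initializer as a stand-in for the grey wolf optimization step (in the paper, GWO supplies the initial centroid positions); the membership update follows Equation (13) and the centroid update is the weighted mean with weights m_lm^z.

```python
import numpy as np

def fcm(X, C=3, z=2.0, eps=1e-4, max_iter=100, init=None):
    # Plain FCM iteration. The paper seeds the centroids with grey wolf
    # optimization; here we simply spread them over the data (a stand-in
    # initializer) unless `init` is given.
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    centroids = np.array(init, dtype=float) if init is not None \
        else X[np.linspace(0, N - 1, C).astype(int)].copy()
    for _ in range(max_iter):
        # d_lm = ||x_l - gamma_m||
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        d = np.maximum(d, 1e-10)
        # membership: m_lm = 1 / sum_j (d_lm / d_lj)^(2/(z-1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (z - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        Uz = U ** z
        # centroid update: gamma_m = sum_l m_lm^z x_l / sum_l m_lm^z
        new_c = (Uz.T @ X) / Uz.sum(axis=0)[:, None]
        done = np.linalg.norm(new_c - centroids) < eps
        centroids = new_c
        if done:
            break
    return centroids, U
```

Each row of the membership matrix sums to one, so every feature vector belongs to every cluster with a graded degree, as the text describes.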

Compound Image Compression
Image compression is mostly used to reduce the trivial and redundant parts of the image information. To deal with such images, it is important to recognize layout and structural data from the image and to formulate effective compression methods that are feasible for the dissimilar content types of the image. Such methods are designated document image compression approaches. The chief aim of using the compression method is to accomplish low complexity and a high compression ratio in order to meet the energy restrictions of image processing without any loss of the original data [2].
Image compression deliberates minimization of storage space as its chief objective and the decompressed image after compression should be the exact replica of the original image [3]. Thus, it is essential to select the compression methods for compressing the segmented image blocks.
In our proposed technique, the background region of the image, which is smooth, is compressed by means of the adaptive Huffman coder; the text region is compressed using the EZW coder; and the graphics and overlap regions are compacted with the H.264 coding technique. It is desirable for each coder to code its region in a compatible manner, so that the resulting representation can be processed directly.

Adaptive Huffman Coder
The adaptive Huffman code describes coding schemes that compute the mapping from source messages to codewords based on a running estimate of the source message probabilities. The code is adaptive and dynamic, so that it remains matched to the current estimates [4]. In this method, adaptive Huffman codes respond to locality. In essence, the encoder is "learning" the characteristics of the source. The decoder must learn along with the encoder by continually updating the Huffman tree so as to stay in synchrony with the encoder. Another benefit of the scheme is that it requires only one pass over the data.
The adaptive Huffman algorithm comprises two enhancements over the conventional Huffman algorithm. First, the number of interchanges in which a node is moved upward in the tree during a recomputation is restricted to one. This number is conservatively bounded by l/2, where l is the length of the codeword for x(t + 1) when the recomputation begins. Second, the adaptive technique minimizes the values of SUM{l(k)} and MAX{l(k)} subject to the requirement of minimizing SUM{w(k) l(k)}. Over the whole message, adaptive approaches that do not assume fixed relative frequencies represent the symbol probabilities of a prefix accurately.
An assumption is made in adaptive coding that the weights in the current tree are proportional to the probabilities of the source symbols. This assumption becomes more accurate as the length of the ensemble increases. Under this assumption, the expected cost of transmitting the next letter is SUM{p(k) l(k)}, which is approximately SUM{w(k) l(k)}.
Initially, all nodes are roots of their own degenerate trees of only one leaf. The algorithm combines the trees with the least probability first and repeats this process until only one tree is left. The performance of the adaptive Huffman algorithm is bounded from below by S − n + 1 and from above by S + t − 2n + 1. At worst, the adaptive technique can transmit one more bit per codeword than the conventional Huffman technique.
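The one-pass adaptive idea can be illustrated with the following simplified sketch (not the FGK tree-update algorithm the text alludes to): the table is rebuilt from running symbol counts after every symbol, and the encoder and decoder stay in synchrony because both perform the identical model update.

```python
import heapq
from collections import Counter

def build_codes(counts):
    # Static Huffman build from the current symbol counts. Encoder and
    # decoder both rebuild this table after each symbol, mimicking
    # adaptive Huffman coding without the incremental tree updates.
    heap = [(c, i, (sym,)) for i, (sym, c) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2][0]: "0"}
    codes = {s: "" for s in counts}
    uid = len(heap)  # tie-breaker so tuples are never compared
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for s in s1:
            codes[s] = "0" + codes[s]
        for s in s2:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (c1 + c2, uid, s1 + s2))
        uid += 1
    return codes

def adaptive_encode(message, alphabet):
    # Start every symbol with count 1 so all codes exist, then update.
    counts = Counter({s: 1 for s in alphabet})
    out = []
    for sym in message:
        out.append(build_codes(counts)[sym])
        counts[sym] += 1  # model update shared with the decoder
    return "".join(out)

def adaptive_decode(bits, n_symbols, alphabet):
    counts = Counter({s: 1 for s in alphabet})
    out, pos = [], 0
    for _ in range(n_symbols):
        inv = {v: k for k, v in build_codes(counts).items()}
        code = ""
        while code not in inv:  # prefix-free, so this is unambiguous
            code += bits[pos]
            pos += 1
        out.append(inv[code])
        counts[inv[code]] += 1
    return "".join(out)
```

A real adaptive coder avoids the full rebuild by the restricted node interchanges described above; the sketch only shows why a single pass suffices.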

Embedded Zero Tree Wavelet Coder
The embedded zero tree wavelet (EZW) algorithm is a simple, yet remarkably efficient, image compression algorithm, having the property that the bits in the bit stream are produced in order of importance, attaining a fully embedded code. The embedded code represents a series of binary decisions that differentiate an image from the "null" image. With an embedded coding algorithm, an encoder can terminate the encoding at any point, thereby permitting a target rate or target distortion metric to be met precisely. EZW reliably produces compression results that are competitive with virtually all known compression algorithms on typical test images.
The EZW algorithm is based on four important aspects: (1) computing the discrete wavelet transform; (2) predicting the absence of significant information across scales by exploiting the self-similarity inherent in images; (3) entropy-coded successive-approximation quantization; and (4) "universal" lossless data compression achieved via adaptive arithmetic coding.
Each wavelet coefficient at a given scale can be associated with a set of coefficients at the next finer scale of similar orientation. A zero tree root (ZTR) is a low-scale "zero-valued" coefficient for which all the associated higher-scale coefficients are also "zero-valued". Signaling a ZTR permits the decoder to "track down" and zero out all the related higher-scale coefficients. The trees defined over the wavelet coefficients are shown in Figure 2A and the corresponding compression is depicted in Figure 2B.
Although zero trees are the main portion of EZW, they are not the only important portion. The other portion has to do with embedded coding. The aim of embedded coding is to generate a bit stream that can be truncated at any point by the decoder.
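The zerotree test can be sketched as follows, under a simplified dyadic parent-child map in which coefficient (r, c) has children (2r, 2c), (2r, 2c+1), (2r+1, 2c) and (2r+1, 2c+1); the self-reference at the origin is skipped, and a real EZW coder treats the LL band's parentage slightly differently.

```python
import numpy as np

def descendants_insignificant(coeffs, r, c, T):
    # Recursively check that every descendant of (r, c) is below the
    # threshold T under the simplified dyadic parent-child map.
    H, W = coeffs.shape
    for rr, cc in ((2*r, 2*c), (2*r, 2*c + 1), (2*r + 1, 2*c), (2*r + 1, 2*c + 1)):
        if rr >= H or cc >= W or (rr == r and cc == c):
            continue  # out of range, or (0, 0) pointing at itself
        if abs(coeffs[rr, cc]) >= T:
            return False
        if not descendants_insignificant(coeffs, rr, cc, T):
            return False
    return True

def dominant_pass_symbol(coeffs, r, c, T):
    # EZW dominant-pass symbol for one coefficient: significant positive
    # (POS), significant negative (NEG), zerotree root (ZTR) or
    # isolated zero (IZ).
    v = coeffs[r, c]
    if abs(v) >= T:
        return "POS" if v > 0 else "NEG"
    return "ZTR" if descendants_insignificant(coeffs, r, c, T) else "IZ"
```

Emitting a single ZTR symbol is what lets the decoder zero out the entire subtree without receiving any further bits for it.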

H.264 Coding Technique
To adapt the H.264 intra-frame coding, two intra coding modes can be developed: RSQ (residual scalar quantization) and BCIM (base colors and index map). RSQ mode: for graphics blocks containing edges of numerous directions, intra prediction along a single direction cannot entirely remove the directional correlation among samples. After intra prediction, strong anisotropic correlation still remains. In such cases, it is not effective to apply a transform. One technique is to skip the transform and directly code the prediction residues, similar to traditional pulse-code modulation (PCM).
Let R be the rate. The coding gain is defined as the ratio of the distortions of PCM-coded residual samples and of transform coefficients, respectively:

G = D_pcm / D_Tc = (ε_pcm σ_pcm² 2^{−2R}) / (ε_Tc σ_Tc² 2^{−2R}) = (ε_pcm σ_pcm²) / (ε_Tc σ_Tc²),

where ε is a factor depending on the probability distribution, σ² is the variance, R is the rate, D_Tc is the distortion using transform coefficients and D_pcm is the distortion of PCM.
For a given prediction direction, only the nearest reconstructed integer pixel along that direction, without filtering, is utilized for prediction. The reason is that the filtering performed on reconstructed pixels would blur sharp edges in text and graphics blocks and decrease the prediction accuracy. Thus, in Equations (15), (16) and (17), the reconstructed nearest integer pixel selected for prediction is never farther from the current pixel than the filtered combinations of reconstructed pixels would be.
BCIM mode: the overlap portions of compound images have limited colors but complex patterns. Such blocks can be represented succinctly by several base colors along with an index map. This is similar to color quantization, which involves selecting a representative set of colors to approximate all the colors of an image. Here, we first obtain the base colors of the blocks with the help of a clustering algorithm. Take the luminance plane of a 16 × 16 overlap block as an example, with each sample expressed by 8 bits: if four base colors are selected to approximate the colors of that block, only two bits are needed to represent each sample's index, even without further compression.
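A sketch of the BCIM idea under the stated assumptions: plain k-means on the sample values (standing in for the unspecified clustering algorithm) picks the base colors of a luminance block, and the nearest-base-color labels form the index map.

```python
import numpy as np

def bcim(block, n_colors=4, n_iter=20, seed=0):
    # Lloyd's k-means on the sample values: the converged centroids act
    # as the base colors and the nearest-centroid labels form the index
    # map.
    rng = np.random.default_rng(seed)
    samples = block.ravel().astype(float)
    uniq = np.unique(samples)
    base = rng.choice(uniq, size=min(n_colors, uniq.size), replace=False)
    for _ in range(n_iter):
        idx = np.argmin(np.abs(samples[:, None] - base[None, :]), axis=1)
        for k in range(base.size):
            if np.any(idx == k):
                base[k] = samples[idx == k].mean()
    idx = np.argmin(np.abs(samples[:, None] - base[None, :]), axis=1)
    return base, idx.reshape(block.shape)
```

With four base colors, each index needs ceil(log₂ 4) = 2 bits, matching the 16 × 16 block example in the text.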

Results and Discussion
The proposed methodology was implemented in MATLAB and assessed by testing the proposed system with input compound images. To validate the performance of the proposed algorithms, experimental assessment was carried out on a variety of images. For experimentation, compound images containing text, image and background regions were used as test images.

Performance Metrics
The proposed approach achieves a better compression ratio, decompression time and memory estimate, and is contrasted with existing compression procedures in terms of the aforementioned performance metrics.

Peak Signal-to-Noise Ratio
The peak signal-to-noise ratio (PSNR) is defined as the ratio between the maximum possible power of a picture and the power of the corrupting noise. A high PSNR indicates that the decompressed picture has good quality. The PSNR is computed as

PSNR = 10 log₁₀ (MAX² / MSE),

where MAX is the maximum possible pixel value (255 for 8-bit images). The compressed and decompressed text/graphics blocks and picture/background blocks can be evaluated with the help of the compression ratio. In digital image processing, the compression ratio is defined as the ratio of the size of the original grayscale image to the size of the compressed image. The proposed scheme provides a competitive compression ratio:

compression ratio = (size of original image) / (size of compressed image).

Mean Square Error
The mean square error (MSE) is the average of the squared differences between the original values and the reconstructed values. It is denoted as

MSE = (1/N) Σ_{i=1}^{N} (A_obs,i − A_model,i)²,

where A_obs,i is the observed (original) value and A_model,i is the modeled (reconstructed) value of the i-th sample.

Root Mean Square Error
The root mean square error (RMSE), also known as the root mean square deviation (RMSD), is a measure of the differences between the values predicted by a model and the values actually observed from the process being modeled. These individual differences are also known as residuals, and the RMSE serves to aggregate them into a single measure of predictive power.
The RMSE of a model prediction with respect to the estimated variable is computed as the square root of the mean square error:

RMSE = √( (1/N) Σ_{i=1}^{N} (A_obs,i − A_model,i)² ),

where A_obs,i is the observed value and A_model,i is the modeled value at time/place i. The sample test image is shown in Figure 3.
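These metrics, together with the compression ratio, can be computed directly; a small sketch:

```python
import numpy as np

def mse(orig, recon):
    # Mean of squared differences between original and reconstruction.
    return float(np.mean((orig.astype(float) - recon.astype(float)) ** 2))

def psnr(orig, recon, peak=255.0):
    # PSNR = 10 log10(MAX^2 / MSE); infinite for a perfect reconstruction.
    m = mse(orig, recon)
    return float("inf") if m == 0 else 10.0 * np.log10(peak ** 2 / m)

def rmse(orig, recon):
    return float(np.sqrt(mse(orig, recon)))

def compression_ratio(original_bytes, compressed_bytes):
    return original_bytes / compressed_bytes
```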

Performance Evaluation
The compression ratios of the text, image and background test images (1-5) under the proposed optimization-based FCM and the existing multi-balanced (MB) CS k-means algorithm are presented in Table 1. The compression ratio of the proposed technique is higher than that of the available technique, making it better at condensing compound images. Furthermore, the compression efficiency values of the proposed and available methods are shown graphically in Figure 4. The figure shows that the proposed segmentation-based compression strategy achieves a better compression ratio than all the other existing methods.
Different parameters, namely PSNR, structural similarity (SSIM), RMSE, image quality index and second derivative-like measure of enhancement (SDME), were determined for the test pictures. The performance evaluation of text, picture and background image quality when employing the optimization-based FCM segmentation technique is given in Table 2. The performance of the MB CS k-means algorithm-based compression and the H.264 video compression method with respect to these parameters is also shown in Table 2. Test picture 1 has a higher PSNR than the other pictures. The similarity between pictures is checked with the help of SSIM, and test image 2 has less distortion compared with the other test pictures. The RMSE is lowest for test picture 3 compared with the other test pictures. The image quality index is best for test pictures 4 and 5 compared with the other test pictures. SDME is less sensitive to noise and to steep edges, and test picture 3 shows better enhancement compared with the other test pictures.
The effectiveness of the optimization-based FCM with respect to the same parameters is also shown in Table 2. Test pictures 2 and 4 have higher PSNR than the other pictures. The similarity between the pictures is checked using SSIM, and test images 2 and 5 have less distortion compared with the other pictures. The RMSE is lower for test pictures 3 and 4 compared with the other test pictures. The image quality index is best for test pictures 3 and 4 compared with the other test pictures. The SDME is less sensitive to noise and to steep edges, and test picture 3 shows better enhancement compared with the other test pictures. Table 2 summarizes the performance evaluation of the different parameters using the available MB CS k-means, H.264 (without segmentation) and the proposed optimization-based FCM technique. From these tables, it is evident that all the parameters are better for the proposed approach than for the existing methods; thus, our proposed approach compares favorably with the available methods. Based on the implementation results, H.264 performs better than the MB CS k-means technique, although all the existing methods compared in our evaluation produce much poorer results than our proposed technique.
The running time (in seconds) was also examined for the proposed compression method and the H.264 video compression technique without segmentation, and the values are shown in Table 3.
Table 3 shows that the running time obtained for the H.264 compression strategy is lower than that of the proposed technique. Although the running time is higher for the proposed segmentation-based compression approach, the proposed method is superior in terms of compression ratio.

Conclusion
In this article, compression is performed by compressing the individual constituents, such as image, text and background, obtained from segmentation with the help of the sparse decomposition method and the OFCM technique, in order to improve the compression ratio. The proposed segmentation-based compression is compared with the available methods. Furthermore, the proposed technique is compared with the H.264 video compression strategy without segmentation. After compression, the image quality is investigated via numerous measures, such as PSNR, SSIM, RMSE, image quality index and SDME, and compared with the available methods. Furthermore, the compression ratio and the running time were also investigated. The investigation shows that the proposed method is effective and offers outstanding compression ratios.