BY-NC-ND 3.0 license Open Access Published by De Gruyter May 1, 2013

An Intelligent Fused Approach for Face Recognition

  • Khitikun Meethongjan, Mohamad Dzulkifli, Amjad Rehman, Ayman Altameem and Tanzila Saba


Face detection plays an important role in many applications such as human-computer interaction, security and surveillance, and face recognition. This article presents an intelligent fused approach for face recognition based on the Voronoi diagram (VD) and wavelet moment invariants. The discrete wavelet transform and moment invariants are used for feature extraction from the facial region. Finally, the VD and its dual tessellation (the Delaunay triangulation, DT) are used to locate and detect faces in the original images. Face recognition results based on this new fusion are promising compared with the state of the art.

1 Introduction

Face recognition is a challenging research area in the field of computer vision that remains active owing to its importance in security and access control applications. Several issues make automatic face recognition a very difficult task, especially variations among images of the same face caused by changes in parameters such as pose, illumination, expression, motion, facial hair, glasses, and background [12, 19]. These parameters are unavoidable and negatively affect the performance of face recognition systems. Other factors also contribute, such as camera quality, light intensity, and video control.

The literature is replete with techniques that use two-dimensional (2D) images for face recognition. Some perform well under moderate illumination variation, although performance drops when both illumination and pose changes occur [26]. The pose variation factor is also a challenge in this field; for instance, algorithms yield false positives when rotations occur. Head rotation is always difficult to handle, and it therefore affects the authentication process even though the security camera can create viewing angles outside of this range [1]. In addition, facial expression is a very important factor, as it affects the high-frequency components of the face image [17]. Another factor that degrades face recognition performance is occlusion, i.e., when the face is obstructed by foreign objects such as hair, a hand, or a beard. All of these factors negatively affect face recognition system performance.

Xie et al. [26] proposed concentric circular Fourier–Zernike descriptors for face image retrieval. The descriptors are created in two main steps. First, the original square image is converted into a circular image, and the circular image is partitioned into several concentric circular sub-images in a new polar coordinate space. Second, local invariant Zernike moments are calculated for each concentric circular sub-image, and a one-dimensional Fourier transform is applied to these Zernike moments to obtain the Fourier–Zernike descriptors. Experimental results were reported on the AR grayscale-face database and an ORL/FERET mixed-face database. A novel scheme for human face detection in color images under non-constrained scene conditions was presented by Lu et al. [11]. Color clustering and filtering based on the vector quantization (VQ) technique were performed on the original input image, providing binary skin color regions. Constraints related to the shape and size of faces were considered to produce a set of candidate face areas. Finally, mouths were searched for in all candidate face regions using a mouth detector to verify each face region. The authors claimed that their method is efficient and robust to head rotation to some extent; however, no quantitative results were reported.

Few researchers have exploited the Voronoi diagram (VD) for face recognition. Cheddad et al. [4] suggested a method for segmenting the human face from still images using the VD. They presented a face location and extraction step that generates clusters of intensity values using the vertices of the external boundary of the Delaunay triangulation (DT). Their results are robust, precise, and independent of translation, rotation, and scaling. However, background intensities similar to the face intensities may cause problems for segmentation decisions [11]. The DT of the facial features has different sizes in different areas, which allows the DT sizes to be classified using the VD; furthermore, a facial image can be separated into a number of regions that represent the skeleton of the skin [27]. In addition, Dobrina and Dominique [6] proposed a discrete Delaunay method based on the Voronoi graph and DT for boundary extraction from a voxel object. Their results show that the method produces a polygonal boundary representation that is guaranteed to be two-manifold and can be successfully transformed into a quality triangular mesh.

The most recent approaches aim particularly at handling facial expression and illumination. Wong et al. [24] presented dual optimal multiband features for face recognition that are invariant to illumination and facial expression variation. They used a wavelet packet transform to decompose the image into frequency sub-bands, and a multiband feature fusion technique was incorporated to select the optimal multiband feature sets. Although the proposed system achieved a high recognition rate under different illumination and facial expression variations, the strategy requires high processor speed and memory [28]. Jadhav and Holambe [9] described a face recognition system based on a combination of the Radon and wavelet transforms that is invariant to illumination and facial expression variations. The DC component of the low-frequency sub-band was removed when testing the algorithm's performance under illumination variation. The proposed system achieved high recognition accuracy under variations of facial expression and illumination separately.

In addition, Celik et al. [3] proposed a method for facial feature extraction using the directional multiresolution decomposition offered by the complex wavelet transform. They used the dual-tree implementation of the complex wavelet transform, with two parallel discrete wavelet transforms (DWTs) using different low- and high-pass filters at different scales. A linear combination of the sub-bands generated by the two parallel DWTs is used to generate sub-bands with complex coefficients. The resulting performance improves when the diversity in illumination conditions is limited. Moreover, Nanni and Lumini [16] presented a multiexpert approach for wavelet-based face detection based on multiresolution analysis of the face. The images are decomposed into frequency sub-bands at different levels of decomposition using different wavelets. Although this method provides faster detection and reduces false positives without discarding real face images, it only works on upright frontal face images.

In this study, we propose a new scheme to enhance the performance of a face recognition system based on the fusion of the VD, DWT, and moment invariants. The aims of this work are to (i) address the illumination problem for facial images, (ii) apply a processing step using the VD for segmentation and detection on grayscale face images, and (iii) integrate three processing methods: VD, DWT, and moment invariants. The article is organized into five further sections. Section 2 describes the block diagram. Face segmentation and detection using the VD are described in Section 3. Face extraction based on the DWT and moment invariants is reported in Section 4. Section 5 discusses the experimental results. Finally, the conclusion is given in Section 6.

2 Framework of the Proposed Approach

In this study, the system is based on several modules in a sequential architecture, as shown in Figure 1. We use 2D frontal face images from a standard data set (the BioID face database), which includes several illumination conditions and subjects without glasses.

  1. Preprocessing step: Preprocessing is normally included in all image-processing applications [26]. Accordingly, in this research, the image is preprocessed by applying square morphological operations, a Gaussian low-pass filter, a median filter, and histogram equalization.

  2. VD face segmentation and detection step: In this step, the face is segmented; VD and DT are used to detect points for face image identification. To detect the eyes in an image, a predesigned rectangular (window-like) template is applied to crop the face. The output is the facial image.

  3. Extraction with DWT: To obtain wavelet coefficients and sub-band facial images, wavelet decomposition based on the 2D DWT (Daubechies mother wavelet "db4", level 1) is applied to the facial image acquired in the previous step.

  4. Feature extraction: Central moment invariants are computed from the sub-band facial image, and similitude invariants, orthogonal invariants, and combined similitude-orthogonal invariants are fused. Accordingly, a 21-element feature vector per facial image is acquired.

  5. Classification step: The acquired feature vectors represent the face images and are used for training and testing. The minimum Euclidean distance is broadly applied in the literature to classify and verify face images; therefore, it is also used here for classification and verification.
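The preprocessing step (step 1) can be sketched as follows with NumPy and SciPy; the filter sizes and Gaussian sigma are our assumptions, since the article does not specify them:

```python
import numpy as np
from scipy import ndimage

def preprocess(img: np.ndarray) -> np.ndarray:
    """Denoise and normalize a 2D grayscale (uint8) face image."""
    # Square morphological opening (3x3 structuring element is an assumption).
    opened = ndimage.grey_opening(img, size=(3, 3))
    # Gaussian low-pass filter to suppress high-frequency noise (sigma assumed).
    smoothed = ndimage.gaussian_filter(opened.astype(float), sigma=1.0)
    # Median filter to remove impulse (salt-and-pepper) noise.
    denoised = ndimage.median_filter(smoothed, size=3)
    # Histogram equalization to enhance intensity contrast.
    q = np.clip(denoised, 0, 255).astype(np.uint8)
    hist = np.bincount(q.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(float)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())  # normalize CDF to [0, 1]
    return (cdf[q] * 255).astype(np.uint8)

img = np.random.default_rng(0).integers(60, 120, (64, 64)).astype(np.uint8)
out = preprocess(img)  # equalized image spans the full intensity range
```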

Figure 1 Block Diagram.

3 Image Processing Based on VD/DT

3.1 Voronoi Diagram Content

VD, or Voronoi tessellation, is a well-known technique in computational geometry that generates clusters of intensity values using information from the vertices of the external boundary of DT. VD has been revived several times, and a comprehensive review of its variations and applications can be found in Ref. [18]. Furthermore, Blum and Nagel [2] proposed an algorithm for the computational analysis of the skeleton and the construction of VD/DT on the boundary of a shape image. It is presently used in many research areas; however, researchers primarily focus on its use in skeletonization and generation of Euclidean distances.

This research work exploits the triangulations (i.e., Delaunay) generated by the VD [17]. The VD of a set of "sites" (points) is a collection of regions that divide the plane; each region corresponds to one of the sites, and all points in one region are closer to the corresponding site than to any other site. Given a set of 2D points, the Voronoi region of a point pi is defined as the set of all points that are closer to pi than to any other point. The boundaries between the Voronoi regions of the point set constitute the Voronoi diagram. The VD also has geometric features such as Voronoi edges and Voronoi vertices. A Voronoi edge is a boundary line segment limiting its associated Voronoi region; it is associated with two input points, and each point on the edge is equidistant from these two points. A Voronoi vertex is an intersection of Voronoi edges and is associated with three or more input points, from which it is equidistant, as shown in Figures 2 and 3, respectively [7].
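As a concrete illustration of these constructions (not the authors' implementation), SciPy's computational-geometry routines can build the VD and its dual DT for a point set, and the equidistance property of Voronoi edges described above can be checked directly:

```python
import numpy as np
from scipy.spatial import Voronoi, Delaunay

# A small set of 2D "sites".
points = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0], [3.0, 2.0], [1.5, 1.0]])

vor = Voronoi(points)   # Voronoi vertices, edges (ridges), and regions
tri = Delaunay(points)  # dual Delaunay triangulation

# Each Voronoi ridge (edge) separates exactly two input sites, and every
# finite Voronoi vertex on it is equidistant from those two sites.
for (i, j), verts in zip(vor.ridge_points, vor.ridge_vertices):
    for v in verts:
        if v == -1:          # -1 marks a vertex at infinity (unbounded ridge)
            continue
        d_i = np.linalg.norm(vor.vertices[v] - points[i])
        d_j = np.linalg.norm(vor.vertices[v] - points[j])
        assert abs(d_i - d_j) < 1e-9  # equidistance property holds
```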

Figure 2 (A) Set of Points, (B) VD, and (C) DT.

Figure 3 VD Component. V(S) = Voronoi Diagram (solid lines); DT(S) = Delaunay Triangulation (dashed lines); w = Voronoi Vertex; p, q, r, s = Co-circular Site Degree 4.

Let S = {p1, p2, …, pn} be a set of n distinct points in the plane. The Voronoi cell V(pi) of a point pi ∈ S is defined as

V(pi) = {q ∈ R² : d(q, pi) ≤ d(q, pj) for all pj ∈ S}.

Here, d(p, q) denotes the ordinary Euclidean distance between p and q:

d(p, q) = √((px − qx)² + (py − qy)²).

The VD V(S) of S is the family of subsets of R² that consists of the Voronoi cells and all of their intersections. The boundary of a Voronoi cell consists of Voronoi edges and Voronoi vertices. A point q ∈ R² lies on the Voronoi edge between pi and pj if

d(q, pi) = d(q, pj) ≤ d(q, pk) for all pk ∈ S.
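The cell definition above amounts to a nearest-site rule: a point q lies in V(pi) exactly when no other site is closer to q than pi. A naive NumPy sketch of this rule:

```python
import numpy as np

def voronoi_cell_index(q: np.ndarray, sites: np.ndarray) -> int:
    """Return the index i such that q lies in the Voronoi cell V(p_i),
    i.e., the index of the nearest site under Euclidean distance."""
    d = np.linalg.norm(sites - q, axis=1)  # d(q, p_j) for every site p_j
    return int(np.argmin(d))

sites = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0]])
assert voronoi_cell_index(np.array([0.5, 0.2]), sites) == 0

# A point on the bisector of p_0 and p_1 (the line x = 2) is equidistant
# from both sites, i.e., it lies on the Voronoi edge between them.
d = np.linalg.norm(sites - np.array([2.0, -1.0]), axis=1)
assert abs(d[0] - d[1]) < 1e-12
```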
3.2 Face Image Segmentation Phase

Image segmentation is a basic step in almost all image-processing applications [21]. In face segmentation, the image must be partitioned into regions so that the object (foreground) can be distinguished from the non-object part (background). Before segmentation, preprocessing is carried out to reduce the effects of uneven lighting, noise, and background interference on the object. The proposed procedure uses square morphological operations, a Gaussian low-pass filter, a median filter (to remove noise), and histogram equalization. Consequently, the grayscale values of the face images are reduced to 27.87%, which enhances speed and reduces memory usage.

The histogram equalization stage also enhances the image intensity contrast, which is used to generate a convex hull for the next stage. The convex hull of a set of feature points is the smallest convex set containing these points; it forms the outer boundary of the VD. Only a few points (fewer than 255) are selected [17]; the represented point sets can reflect dense point sets, and a set of unique dot patterns is attained.

Two global maxima are obtained from the host image histogram, along with the minimum between these two peaks. All intensities below the first peak and beyond the second peak are set to zero, and the zero-valued points are then set to argmax(peak1, peak2). This yields a local flip effect in the image histogram, as shown in Figure 4. The new point set for the part of the convex hull corresponding to the DT is sorted in ascending order to form ranges used to merge or split regions accordingly:

Figure 4 Global Maxima and Local Minima.

where Valnew(x) represents the highest frequency in the host image histogram.
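A literal reading of this peak-based remapping can be sketched as follows; choosing the two most populated histogram bins as the "two global maxima" is our assumption, as the article does not spell out the peak-detection details:

```python
import numpy as np

def flip_remap(img: np.ndarray) -> np.ndarray:
    """Zero out intensities outside the two dominant histogram peaks,
    then move those zeros to the stronger peak's intensity."""
    hist = np.bincount(img.ravel(), minlength=256)
    # Two global maxima: the two most populated intensity bins (assumed).
    p1, p2 = sorted(np.argsort(hist)[-2:])          # peak positions, p1 < p2
    stronger = p1 if hist[p1] >= hist[p2] else p2   # argmax(peak1, peak2)
    out = img.copy()
    out[(out < p1) | (out > p2)] = 0                # clip outside the peaks
    out[out == 0] = stronger                        # local "flip" effect
    return out

rng = np.random.default_rng(1)
img = rng.choice([40, 41, 180, 181, 250], size=(32, 32),
                 p=[0.35, 0.1, 0.35, 0.1, 0.1]).astype(np.uint8)
res = flip_remap(img)  # all intensities now lie between the two peaks
```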

To locate the face region, a distance transform is introduced to separate the face region from the background, using operators on the Euclidean distance between two points (DisT ≥ threshold value). Next, a well-known strategy for face estimation and segmentation is applied, which includes ellipse fitting, cross-correlation, and the Euler number. The eye positions in the original face image are detected in the previous step, and finally a square window of 200 × 200 pixels is used to crop the face image automatically. The targets of this facial image detection phase are the eyes, nose, and mouth, as shown in Figures 5–7.
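The distance-transform thresholding and fixed-window cropping can be sketched as below, assuming SciPy; the binary mask, the threshold value, and the eye coordinates are illustrative assumptions, not values from the article:

```python
import numpy as np
from scipy import ndimage

# Synthetic binary skin mask (1 = candidate face region).
mask = np.zeros((400, 400), dtype=np.uint8)
mask[80:320, 100:300] = 1

# Euclidean distance transform: distance of each foreground pixel to the
# nearest background pixel; thresholding keeps only the core of the region.
dist = ndimage.distance_transform_edt(mask)
core = dist >= 20  # DisT >= threshold value (threshold is an assumption)

# Crop a fixed 200x200 window placed relative to hypothetical eye positions.
eye_l, eye_r = (150, 160), (150, 240)              # (row, col), assumed
cr = (eye_l[0] + eye_r[0]) // 2 + 40               # shift down toward face center
cc = (eye_l[1] + eye_r[1]) // 2
face = mask[cr - 100:cr + 100, cc - 100:cc + 100]  # 200 x 200 crop
```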

Figure 5 (A) DT, (B) Original Image, (C) Segmentation, and (D) Detection.

Figure 6 Facial Region (Automatic Crop).

Figure 7 Example Face Cropped Image from the BioID Data Set.

4 Face Extraction Based on DWT and Moment Invariant

4.1 Face Extraction Using DWT

Wavelet transformation has been widely used in image-processing applications [25]. Mallat and Zhang [13, 14] presented the wavelet theory for signal processing, which defines an orthogonal multiresolution representation, the so-called "wavelet representation". In the case of images, the wavelet representation differentiates several spatial orientations, which is useful for data compression in image coding and for texture discrimination. An image is repeatedly filtered and decimated into high and low spatial frequency bands, alternately in the horizontal and vertical directions.

The 2D DWT is computed by applying a separable filter bank to the original image, which is decomposed into four sets of wavelet coefficients: one approximation and three details. At each step, the previous approximation sub-image is further decomposed into four new sub-images [23]:

An = [Lx ∗ [Ly ∗ An−1]↓2,1]↓1,2
Dn1 = [Lx ∗ [Hy ∗ An−1]↓2,1]↓1,2
Dn2 = [Hx ∗ [Ly ∗ An−1]↓2,1]↓1,2
Dn3 = [Hx ∗ [Hy ∗ An−1]↓2,1]↓1,2

where An is the approximation at level n and A0 = I(x, y) (the original image); Dni are the level-n details; the parameter i stands for the direction of the details (i = 1, 2, 3 represents the vertical, horizontal, and diagonal directions, respectively); ↓2,1 and ↓1,2 denote downsampling performed in the vertical and horizontal directions, respectively; and H and L are the high- and low-pass filters, respectively.

In this research, the cropped image of 128 × 128 pixels is decomposed with a first-level Daubechies wavelet ("db4"). The extracted facial features are taken from the HL, LH, and HH sub-bands (Figure 8). These sub-bands are selected to compute the wavelet features from the detail coefficient matrices of each sub-band:

Figure 8 Sub-band Type and First-level DWT.

where cH, cV, and cD are the detail coefficient matrices of horizontal, vertical, and diagonal directions, respectively.
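A one-level separable 2D DWT of the kind described here can be sketched in NumPy. For self-containment this sketch uses Haar filters rather than the "db4" wavelet used in the article (in practice a library such as PyWavelets would supply db4); the sub-band naming mirrors the cH/cV/cD convention above:

```python
import numpy as np

def dwt2_haar(img: np.ndarray):
    """One level of a separable 2D DWT with Haar filters: filter and
    downsample along columns, then along rows."""
    x = img.astype(float)
    # Filter + downsample by 2 along the vertical (column) direction.
    lo = (x[0::2, :] + x[1::2, :]) / np.sqrt(2)   # low-pass
    hi = (x[0::2, :] - x[1::2, :]) / np.sqrt(2)   # high-pass
    # Filter + downsample by 2 along the horizontal (row) direction.
    cA = (lo[:, 0::2] + lo[:, 1::2]) / np.sqrt(2)  # approximation (LL)
    cH = (hi[:, 0::2] + hi[:, 1::2]) / np.sqrt(2)  # horizontal details
    cV = (lo[:, 0::2] - lo[:, 1::2]) / np.sqrt(2)  # vertical details
    cD = (hi[:, 0::2] - hi[:, 1::2]) / np.sqrt(2)  # diagonal details
    return cA, (cH, cV, cD)

# A 128 x 128 gradient image yields four 64 x 64 sub-bands, matching the
# sizes used in this article.
img = np.arange(128 * 128, dtype=float).reshape(128, 128)
cA, (cH, cV, cD) = dwt2_haar(img)
```

Because the Haar filter pair is orthonormal, the total energy of the four sub-bands equals that of the input image.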

4.2 Face Extraction Using Moment Invariants

Moment-based invariants are widely used to describe the features of 2D images and have been applied to pattern feature extraction and object recognition [5, 15]. Hu [8] introduced a set of invariants based on non-linear combinations of regular moments. The significant central moments are invariant to rotation, translation, and scale [10].

The 2D moment of order (p+q) of a digital image f(x, y) is defined as

m_pq = Σx Σy x^p y^q f(x, y)

for p, q = 0, 1, 2, …, where the summations are over the values of the spatial coordinates x and y spanning the image. The corresponding central moment is defined as

μ_pq = Σx Σy (x − x̄)^p (y − ȳ)^q f(x, y), with x̄ = m10/m00 and ȳ = m01/m00.

The normalized central moment of order (p+q) is defined as

η_pq = μ_pq / μ00^γ

for p, q = 0, 1, 2, …, where

γ = (p + q)/2 + 1

for p + q = 2, 3, …. A set of seven 2D moment invariants that are insensitive to translation, scale change, mirroring, and rotation can be derived from these moments, the first four of which are

φ1 = η20 + η02
φ2 = (η20 − η02)² + 4η11²
φ3 = (η30 − 3η12)² + (3η21 − η03)²
φ4 = (η30 + η12)² + (η21 + η03)²

with the remaining three given by Hu [8].
The central moment invariants are useful features for image description owing to their invariance to the position, orientation, and size of the image. In this article, we combine the absolute orthogonal moment invariants with the similitude moment invariants [7]. Thus, the fused feature vector is composed of orthogonal moment invariants, similitude moment invariants, and moment invariants from both similitude and orthogonal transformations. The total length of the feature vector is 21, obtained from the seven features of each moment-invariant set.
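The first two Hu invariants can be computed directly from the normalized central moments defined above; the sketch below (our illustration, not the authors' code) verifies their translation invariance on a synthetic image:

```python
import numpy as np

def normalized_central_moment(img: np.ndarray, p: int, q: int) -> float:
    """eta_pq: translation- and scale-normalized central moment."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    xb, yb = (x * img).sum() / m00, (y * img).sum() / m00  # centroid
    mu = ((x - xb) ** p * (y - yb) ** q * img).sum()       # central moment
    return mu / m00 ** ((p + q) / 2 + 1)                   # gamma = (p+q)/2 + 1

def hu_first_two(img: np.ndarray):
    """First two Hu invariants: phi1 and phi2."""
    n20 = normalized_central_moment(img, 2, 0)
    n02 = normalized_central_moment(img, 0, 2)
    n11 = normalized_central_moment(img, 1, 1)
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2

img = np.zeros((64, 64))
img[10:30, 15:40] = 1.0                                  # a bright rectangle
shifted = np.roll(np.roll(img, 12, axis=0), 5, axis=1)   # translated copy
phi = hu_first_two(img)
phi_t = hu_first_two(shifted)  # identical: the invariants ignore translation
```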

5 Experimental Results

5.1 Experiment of Face Segmentation Using VD/DT

After preprocessing, the proposed scheme uses VD and DT for segmentation and detection of the face image. All experiments are conducted on grayscale frontal face images of 384 × 286 pixels obtained from the BioID face data set. There are 15 objects (people), and each object has 10 images. Each image has different illumination and a non-strict facial expression. The target of this step is the detection of the eyes in the original face image. A square window is then used for automatic cropping of the face from its image, as shown in Figure 6. Our method sets the size of the facial region to 128 × 128 pixels, which is enough to include the eyes, nose, and mouth. The correct location rate under different lighting conditions is 90.67%, as shown in Table 1.

Table 1

Correct Location Rate of Face Cropping.

Object  Test images  Train images  Correct location rate
15      150          75            0.9067 (90.67%)

Only a few results are reported in the literature, but for the sake of comparison, Yi and Hong [27] claimed an accuracy rate of 89% and Lam and Yan [10] achieved an accuracy of 85%. In comparison, our proposed approach produced a better result [22].

5.2 Experiment of Face Classification Using Minimum Euclidean Distance

The BioID face benchmark database is composed of 10 different images of each of 15 people, taken under different illuminations and without glasses. The experiments are carried out with a nearest-neighbor classifier using the minimum Euclidean distance. In our experiment, five images of each person are selected for training, and the remaining five of the 10 images are used for testing. The results of first-level discrete wavelet decomposition using db4 and feature extraction using moment invariants are shown in Table 2.
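The minimum-Euclidean-distance nearest-neighbor classification used here can be sketched on synthetic 21-dimensional feature vectors (the cluster parameters are illustrative assumptions; only the vector length matches the article):

```python
import numpy as np

def classify(test_feats, train_feats, train_labels):
    """Assign each test vector the label of the nearest training vector
    under Euclidean distance (minimum-distance nearest neighbor)."""
    d = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=2)
    return train_labels[np.argmin(d, axis=1)]

rng = np.random.default_rng(0)
# 15 synthetic "people", 21-dimensional feature vectors, 5 train + 5 test each.
centers = rng.normal(size=(15, 21)) * 10
train = np.vstack([c + rng.normal(scale=0.5, size=(5, 21)) for c in centers])
test = np.vstack([c + rng.normal(scale=0.5, size=(5, 21)) for c in centers])
labels = np.repeat(np.arange(15), 5)

pred = classify(test, train, labels)
recognition_rate = (pred == labels).mean() * 100  # percent correct
```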

Table 2

Recognition Rates of Combining VD and DWT Moment Invariants.

Object no.   01    02    03    04    05
RR (%)       92.7  95.5  95.8  95.4  96.6
Object no.   06    07    08    09    10
RR (%)       93.8  93.2  94.1  94.7  95.5
Object no.   11    12    13    14    15
RR (%)       92.0  95.6  97.2  94.4  93.2
Average recognition rate (RR): 94.7%

Table 2 shows that the best recognition rate, 97.2%, is obtained for object 13, whereas the lowest, 92.0%, is obtained for object 11, using first-level db4 DWT and moment-invariant extraction. Even though the experimental setup and face data set are kept constant, the recognition rates differ across objects. This is due to the arbitrary selection of training and testing images and the different illuminations of objects 13 and 11. Therefore, the images for testing and training should be selected appropriately [20].

6 Conclusion

This article has presented a fused approach to enhance the accuracy of face recognition. The proposed method integrates VD, DWT, and moment invariants to detect and extract the face image. Five face images of each person in the BioID face data set were selected as training samples, and the remaining five of the 10 face images were used as testing samples. The image size decreases from the original face image (384 × 286 pixels) to the cropped facial image (128 × 128 pixels) and the sub-band image (64 × 64 pixels). Meanwhile, to reduce memory usage and increase processing speed, a small feature vector of moment invariants on the sub-band image was selected. Finally, the fusion of VD, DWT, and moment invariants produced a promising recognition performance (94.7%) for human face recognition relative to the state of the art.

Corresponding author: Amjad Rehman, MIS Department, College of Business Administration, Salman Abdul Aziz University, Alkharj 11942, KSA

Many thanks to the Deanship of Scientific Research, King Saud University, Riyadh, Saudi Arabia, for their full financial support to this research.


[1] A. F. Abate, M. Nappi, D. Riccio and G. Sabatino, 2D and 3D face recognition: a survey, Pattern Recog. Lett. 28 (2007), 1885–1906. DOI: 10.1016/j.patrec.2006.12.018.

[2] H. Blum and R. S. Nagel, Shape description using weighted symmetric axis features, Pattern Recog. 10 (1978), 167–180. DOI: 10.1016/0031-3203(78)90025-0.

[3] T. Celik, H. Ozkaramanlı and H. Demirel, Facial feature extraction using complex dual-tree wavelet transform, Comput. Vis. Image Understand. 111 (2008), 229–246. DOI: 10.1016/j.cviu.2007.12.001.

[4] A. Cheddad, D. Mohamad and A. Manaf, Exploiting Voronoi diagram properties in face segmentation and feature extraction, Pattern Recog. 41 (2008), 3842–3859. DOI: 10.1016/j.patcog.2008.06.007.

[5] B. Deepayan, P. A. Bala and M. Tim, Real-time object classification on FPGA using moment invariants and Kohonen neural network, in: Proc. IEEE SMC UK-RI Chapter Conference 2006 on Advances in Cybernetic Systems (2006), 43–48.

[6] B. Dobrina and B. Dominique, Discrete Delaunay: boundary extraction from voxel objects, in: Sixth International Conference on 3-D Digital Imaging and Modeling (3DIM) (2007), 209–216.

[7] M. J. Dry, Using relational structure to detect symmetry: a Voronoi tessellation based model of symmetry perception, Acta Psychol. 128 (2008), 75–90. DOI: 10.1016/j.actpsy.2007.10.001.

[8] M.-K. Hu, Visual pattern recognition by moment invariants, IEEE Trans. Inform. Theory 8 (1962), 179–187. DOI: 10.1109/TIT.1962.1057692.

[9] D. V. Jadhav and R. S. Holambe, Feature extraction using Radon and wavelet transforms with application to face recognition, Neurocomputing 72 (2009), 1951–1959. DOI: 10.1016/j.neucom.2008.05.001.

[10] K. M. Lam and H. Yan, Locating and extracting the eye in human face images, Pattern Recog. 29 (1996), 771–779. DOI: 10.1016/0031-3203(95)00119-0.

[11] Z. M. Lu, X. N. Xu and J. S. Pan, Face detection based on vector quantization in color images, Int. J. Innov. Comput. Inform. Control (IJICIC) 2 (2006), 667–672.

[12] J. Lu, X. Yuan and T. Yahagi, A method of face recognition based on fuzzy c-means clustering and associated sub-NNs, IEEE Trans. Neural Netw. 18 (2007), 150–159. DOI: 10.1109/TNN.2006.884678.

[13] S. Mallat, A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. Machine Intell. 11 (1989), 674–693. DOI: 10.1109/34.192463.

[14] S. Mallat and Z. Zhang, Matching pursuits with time–frequency dictionaries, IEEE Trans. Signal Process. 41 (1993), 3397–3415. DOI: 10.1109/78.258082.

[15] B. Nagarajan and P. Balasubramanie, Neural classifier system for object classification with clustered background using invariant moments features, Int. J. Soft Comput. 3 (2008), 302–307.

[16] L. Nanni and A. Lumini, A multi-expert approach for wavelet-based face detection, Pattern Recog. Lett. 28 (2007), 1541–1547. DOI: 10.1016/j.patrec.2007.03.015.

[17] C. Nastar and N. Ayache, Frequency-based nonrigid motion analysis, IEEE Trans. Pattern Anal. Machine Intell. 18 (1996), 1067–1079. DOI: 10.1109/34.544076.

[18] A. Okabe, B. Boots, K. Sugihara and S. N. Chiu, Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, 2nd ed., Wiley Series in Probability and Statistics, John Wiley & Sons, 2009. ISBN 9780470317853.

[19] A. Rehman and T. Saba, Analysis of advanced image processing to clinical and preclinical decision making with prospectus of quantitative imaging biomarkers, Artif. Intell. Rev. (2012), 1–19. DOI: 10.1007/s10462-012-9335-1.

[20] A. Rehman and T. Saba, Features extraction for soccer video semantic analysis: current achievements and remaining issues, Artif. Intell. Rev. (2012). DOI: 10.1007/s10462-012-9319-1.

[21] H. Ryu, M. Kim, V. Dinh, S. Chun and S. Sull, Robust face tracking based on region correspondence and its application for person based indexing system, Int. J. Innov. Comput. Inform. Control (IJICIC) 4 (2010), 2861–2873.

[22] T. Saba and A. Rehman, Effects of artificially intelligent tools on pattern recognition, Int. J. Machine Learn. Cybern. 4 (2012), 155–162. DOI: 10.1007/s13042-012-0082-z.

[23] S. Sokolov, O. Boumbarov and G. Gluhchev, Face recognition using combination of wavelet packets, PCA and LDA, in: IEEE Int. Symp. Signal Process. Inform. Technol. (2007), 257–262. DOI: 10.1109/ISSPIT.2007.4458032.

[24] Y. W. Wong, K. P. Seng and L. M. Ang, Dual optimal multiband features for face recognition, Expert Syst. Appl. 37 (2010), 2957–2962. DOI: 10.1016/j.eswa.2009.09.039.

[25] L. Wu and X. Meng, A robust object segmentation method, Int. J. Innov. Comput. Inform. Control (IJICIC) 4 (2008), 3059–3065.

[26] Y. Xie, L. Setia and H. Burkhardt, Face image retrieval based on concentric circular Fourier–Zernike descriptors, Int. J. Innov. Comput. Inform. Control (IJICIC) 4 (2008), 1433–1443.

[27] X. Yi and Y. Hong, Facial feature location with Delaunay triangulation/Voronoi diagram calculation, in: Proc. VIP '01, Pan-Sydney Area Workshop on Visual Information Processing, pp. 103–108, Australian Computer Society, Darlinghurst, Australia, 2001. ISBN 0-909-92589-5.

[28] C. Zhou, X. Wei, Q. Zhang and B. Xiao, Image reconstruction for face recognition based on fast ICA, Int. J. Innov. Comput. Inform. Control (IJICIC) 4 (2008), 1723–1732. DOI: 10.1109/ICNSC.2008.4525434.

Received: 2013-3-6
Published Online: 2013-05-01
Published in Print: 2013-06-01

©2013 by Walter de Gruyter Berlin Boston

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
