[IEEE 2010 IEEE International Conference on Computational Intelligence and Computing Research...


Efficient and Robust Detection and Recognition of Objects in Grayscale Images

T. Shivanand1, Shahedur Rahman2, Gopinath Pillai2

1 Department of Electrical Engineering, Indian Institute of Technology Roorkee, Roorkee, India 2 School of Engineering and Information Sciences, Middlesex University, London, UK 3 Department of Electrical Engineering, Indian Institute of Technology Roorkee, India

Email: [email protected], [email protected], [email protected]

Abstract - A new method for the detection and recognition of objects in grayscale images is presented. Object detection is based on efficient binarization and enhancement techniques followed by a suitable connected component analysis procedure. The grayscale object corresponding to each object identified in the binary image is then extracted. The second step deals with the recognition of these extracted objects. Each object is described by Zernike moments. To achieve rotation and scaling invariance, an efficient method based on the bounding box is used. To improve the recognition results, modified Support Vector Machine (SVM) classifiers arranged in a decision tree are used to solve the multiclass problem. The algorithm performs object detection and recognition efficiently and quickly even under external constraints, i.e. when the image scenes contain shadows, partial occlusion and non-uniform illumination. The efficiency of the proposed method on grayscale images is shown by cascading some objects from the COIL-8 database.

Keywords: Sauvola, Modified SVM decision tree

I. INTRODUCTION

One of the most challenging tasks in image processing is image understanding. It consists of the following tasks: (a) object/obstacle detection and (b) object recognition. Object detection and recognition with complete shapes have been studied for a long time, and many techniques such as template matching and Fourier descriptors are in use. However, these methods are not efficient if the image (a) is taken under varying lighting/weather conditions, i.e. non-uniform illumination, (b) is blurred, (c) has partial occlusion, or (d) has shadows.

Many structural methods have also been reported for objects having partial occlusion. Selecting specific points of an image, called interest points, has been proposed [1]. However, these methods are insufficient to form a complete representation of the object. Representation of an object through a polygon was also applied [2], but this method has the drawback of being unstable in finding breakpoints for non-polygonal objects. Representations using simple geometric shapes like lines and arcs [3][4] were successful in representing simple objects but failed to represent complex objects accurately.

Image binarization is an important initial step in most image processing tasks involving complex objects, and the performance of these tasks depends heavily on the results of binarization. The main objective of image binarization is to divide a grayscale or colour image into two groups: foreground objects and clear background. Over the decades, many different approaches for the binarization of grayscale and colour images have been proposed in the literature. Grayscale binarization techniques can also be applied to colour images by first converting the colour image into grayscale.

Grayscale binarization approaches can be classified into two main groups: i) global binarization methods and ii) local binarization methods. Global binarization methods, like Otsu [5], try to estimate a single threshold value for the binarization of the whole image. Each pixel is then assigned either to the foreground or to the background based on its intensity value. Global binarization methods are computationally inexpensive; however, they produce marginal noise artefacts if the grayscale image contains non-uniform illumination. Local binarization methods [6][7], like Sauvola, try to overcome these problems by calculating a separate threshold value for each pixel using local neighbourhood information.

Object recognition is carried out through a combination of Zernike moments and Support Vector Machines (SVMs) [11]. The inputs to these SVMs are the colour and shape based features of objects, which are computed using Zernike moments. Recent results in pattern recognition have shown that SVM classifiers often have superior recognition rates in comparison to other classification methods. However, the SVM was originally developed for binary decision problems, and its extension to multi-class problems is not straightforward. The popular methods for applying SVMs to multi-class classification problems usually decompose the multi-class problem into several two-class problems that can be addressed directly using several SVMs.

978-1-4244-5967-4/10/$26.00 ©2010 IEEE

The proposed method uses an efficient image binarisation technique followed by Zernike moments and a Support Vector Machine based decision tree for the detection and recognition of objects. Experimental results show that the proposed method achieves an improved recognition rate for images (a) taken under varying lighting/weather conditions, (b) taken under non-uniform illumination, (c) having partial occlusion and (d) having shadows. The paper is structured as follows: Section 2 is dedicated to a description of the basic concepts used in the paper. Section 3 gives a detailed description of the procedure. The experimental results are given in Section 4, while conclusions are drawn in Section 5.

II. BASICS

a) LOCAL BINARIZATION USING SAUVOLA'S METHOD

Grayscale images of unsigned 8-bit integer (uint8) type contain intensity values between 0 and 255. Unlike global binarization, local binarisation methods calculate a threshold t(x, y) for each pixel such that

$$ b(x, y) = \begin{cases} 0 & \text{if } I(x, y) \le t(x, y) \\ 255 & \text{otherwise} \end{cases} \qquad (1) $$

where I(x, y) is the intensity of the pixel (x, y) and b(x, y) is the binarized output. In Sauvola's binarization method, the threshold t(x, y) is computed using the mean $\mu(x, y)$ and standard deviation $\sigma(x, y)$ of the pixel intensities in a 3 × 3 window centered on the pixel (x, y):

$$ t(x, y) = \mu(x, y) \left[ 1 + k \left( \frac{\sigma(x, y)}{R} - 1 \right) \right] \qquad (2) $$

where R is the maximum value of the standard deviation (R = 128 for a grayscale image), and k is a parameter which takes positive values. Equation 2 has been designed so that the threshold adapts to the contrast in the local neighbourhood of the pixel through the local mean $\mu(x, y)$ and the local standard deviation $\sigma(x, y)$. Because of this, it tries to estimate an appropriate threshold t(x, y) for each pixel under both possible conditions: high and low contrast. In a region of high local contrast ($\sigma(x, y) \approx R$), the threshold t(x, y) is nearly equal to $\mu(x, y)$. In a region of quite low contrast ($\sigma \ll R$), the threshold goes below the mean value, thereby successfully removing the relatively dark regions of the background. The parameter k controls the value of the threshold in the local window: the higher the value of k, the further the threshold falls below the local mean $\mu(x, y)$.

The statistical constraint in Equation 2 gives acceptable results even for degraded images.
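The behaviour of Equation 2 can be checked with a small sketch. The code below is a minimal pure-Python illustration, not the authors' implementation: the function names, the border handling (windows clipped at the image edge) and the convention that pixels at or below the threshold map to 0 are assumptions made for this example.

```python
from math import sqrt


def sauvola_threshold(mean, std, k=0.2, R=128.0):
    """Sauvola threshold t = mean * (1 + k * (std / R - 1))."""
    return mean * (1.0 + k * (std / R - 1.0))


def local_stats(img, x, y, half=1):
    """Mean and standard deviation of the (2*half+1)^2 window around (x, y).

    The window is clipped at the image borders (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    vals = [img[j][i]
            for j in range(max(0, y - half), min(h, y + half + 1))
            for i in range(max(0, x - half), min(w, x + half + 1))]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, sqrt(var)


def sauvola_binarize(img, k=0.2, R=128.0):
    """Binarize a grayscale image given as a list of rows of 0..255 values."""
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            mean, std = local_stats(img, x, y)
            t = sauvola_threshold(mean, std, k, R)
            # Pixels at or below the local threshold map to 0, others to 255.
            row.append(0 if img[y][x] <= t else 255)
        out.append(row)
    return out
```

With k = 0.2, a flat window (standard deviation 0) gives a threshold of 0.8 times the local mean, while a window whose standard deviation equals R gives a threshold equal to the mean, which matches the two contrast cases described above.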

b) SUPPORT VECTOR MACHINES FOR PATTERN RECOGNITION

The supervised classification method used in this paper is based on Support Vector Machines (SVMs), proposed by Vapnik [11]. This method creates functions from a set of labelled training data. The function can be a classification function with binary outputs or a general regression function. For classification, SVMs operate by finding a hypersurface in the space of possible inputs that attempts to split the positive examples from the negative examples. The split is chosen to have the largest distance from the hypersurface to the nearest of the positive and negative examples.
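For reference, the largest-distance criterion can be written out for the linear, separable case. This is the standard textbook formulation rather than something stated in the paper; the symbols $\mathbf{w}$, $b$, $\mathbf{x}_i$ and $y_i \in \{-1, +1\}$ are introduced here only for illustration.

```latex
% Decision function of a linear SVM
f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b)

% Maximum-margin training problem (separable case)
\min_{\mathbf{w},\, b} \ \tfrac{1}{2} \lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \, (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \quad \text{for all } i

% Resulting margin between the two classes
\text{margin} = \frac{2}{\lVert \mathbf{w} \rVert}
```

Minimising $\lVert \mathbf{w} \rVert$ therefore maximises the distance between the separating surface and the nearest training examples of each class.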

III. METHODOLOGY

The scheme of the proposed method includes the following steps:

• Image pre-processing, essentially for smoothing of the background texture

• Image binarization and enhancement in order to separate the objects from the background

• Labelling approach to distinguish each object

• Description of each object using Zernike moments

• Recognition by Support Vector Machine based Decision Tree

3.1 Object detection

3.1.1 Image Pre-processing

For low resolution and poor quality scene images, a pre-processing stage on the grayscale source image is essential for smoothing the background texture. A simple method for background smoothing in corrupted grayscale images is used. This method consists of detecting the background pixels using the connected components method. The pixel values of the background are then set to 0.

3.1.2 Estimation of Foreground Regions

In this step, a rough estimation of the foreground regions is obtained. The intention is to produce an initial segmentation of foreground and background regions that provides a superset of the correct set of foreground pixels. This is refined at a later step. Sauvola's approach for adaptive thresholding with k = 0.2 is suitable for this case. At this step, the original (grayscale) image O(x,y) is processed in order to extract the binary image S(x,y), where 1's correspond to the roughly estimated foreground regions.

3.1.3 Background Surface Estimation

At this stage, an approximated background surface B(x,y) of the image O(x,y) is computed. Background surface estimation is guided by the image S(x,y) [the binarized image obtained earlier using Sauvola's approach]. For pixels that correspond to 0's in S(x,y), the corresponding value of B(x,y) equals O(x,y). For the remaining pixels, the value of B(x,y) is computed by neighbouring pixel interpolation. Bi-cubic interpolation is used: it estimates the value of the pixel in the B(x,y) image by an average of the 16 pixels surrounding the closest corresponding pixel in the source image.

3.1.4 Final Thresholding

Final thresholding is done by combining the calculated background surface B(x,y) with the original image O(x,y). Object areas are detected where the distance between the pre-processed image O(x,y) and the calculated background B(x,y) exceeds a threshold d. The threshold d changes according to the grayscale value of the background surface B(x,y) in order to preserve the object information even in very dark background areas. For this reason, a smaller value of d is used for darker regions.
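Steps 3.1.3 and 3.1.4 can be sketched as follows. This is only an illustration under stated assumptions: the background at foreground pixels is filled with the mean of nearby background pixels (a simpler stand-in for the bi-cubic interpolation named in the text), and the threshold d is assumed to grow linearly with the background value between the hypothetical limits `d_dark` and `d_bright`, since the paper does not give the exact dependence.

```python
def estimate_background(O, S, radius=2):
    """Background surface B: equal to O where S == 0, otherwise the mean of
    nearby background pixels (stand-in for bi-cubic interpolation)."""
    h, w = len(O), len(O[0])
    B = [[float(v) for v in row] for row in O]
    for y in range(h):
        for x in range(w):
            if S[y][x] == 1:
                vals = [O[j][i]
                        for j in range(max(0, y - radius), min(h, y + radius + 1))
                        for i in range(max(0, x - radius), min(w, x + radius + 1))
                        if S[j][i] == 0]
                # Keep the original value if no background pixel is nearby.
                B[y][x] = sum(vals) / len(vals) if vals else float(O[y][x])
    return B


def final_threshold(O, B, d_dark=10.0, d_bright=30.0):
    """Mark a pixel as object (1) when |O - B| exceeds a threshold d that is
    smaller for darker background values (assumed linear in B)."""
    out = []
    for o_row, b_row in zip(O, B):
        row = []
        for o, b in zip(o_row, b_row):
            d = d_dark + (d_bright - d_dark) * (b / 255.0)
            row.append(1 if abs(o - b) > d else 0)
        out.append(row)
    return out
```

Here `O` is the grayscale image and `S` the rough foreground mask from step 3.1.2, both given as lists of rows.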

3.1.5 Image Post-processing

In the final step, the resulting binary image is post-processed to eliminate noise, improve the quality of object regions and preserve the stroke connectivity by removing isolated pixels and filling breaks, gaps or holes. The post-processing algorithm involves a successive application of opening and closing, i.e. shrink and swell filtering.

3.1.6 Labelling Approach

In order to differentiate each object present in the scene, a connected component labelling algorithm [8] is used. This algorithm uses the final binary image for the detection of objects. Each binary object is then detected and extracted from the image.
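The labelling step can be illustrated with a simple breadth-first flood fill. This is a generic sketch and not the algorithm of [8]; the use of 8-connectivity and the helper names are assumptions of this example.

```python
from collections import deque


def label_components(binary):
    """Label 8-connected foreground regions of a binary image.

    Returns (labels, count) where labels[y][x] is 0 for background and
    1..count for the objects, numbered in raster-scan order.
    """
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and binary[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def bounding_boxes(labels, count):
    """Bounding box (ymin, xmin, ymax, xmax) of every labelled object."""
    boxes = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab:
                if lab not in boxes:
                    boxes[lab] = (y, x, y, x)
                else:
                    y0, x0, y1, x1 = boxes[lab]
                    boxes[lab] = (min(y0, y), min(x0, x), max(y1, y), max(x1, x))
    assert len(boxes) == count
    return boxes
```

The bounding boxes can then be used to cut each grayscale object out of the original image.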

3.2 object recognition

One of the important stages of pattern recognition is object representation. The representation should be invariant to object position, viz. rotation, translation and scale factor. The proposed object representation is based on Zernike moments. Zernike moments [9] are well known to be rotation invariant. To achieve scale and translation invariance, a novel technique is adopted, which is described below.

a) Obtain the centroid of the object (the background pixels have the value 0).

b) Draw the smallest circle centered at the object's centroid that completely contains the object. Here, an efficient search method based on the bounding box is used.

c) Scale the bounding shape obtained in (b) to the desired resolution (in this case 128*128) and create the resulting image from the scaled bounding shape, i.e. obtain the modified image of resolution 128*128 through interpolation from the earlier image.

The object detected through the labelling algorithm is thus resized to an image of 128*128 resolution. This image is used for extracting Zernike moments.
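Steps (a) to (c) can be sketched as below. The sketch makes several assumptions not fixed by the text: the centroid is taken over the non-zero pixels without intensity weighting, the enclosing radius is found by a direct scan rather than the bounding-box search, and resampling uses nearest-neighbour lookup instead of a smoother interpolation.

```python
from math import hypot


def centroid(img):
    """Centroid (cx, cy) of the non-zero pixels, plus the pixel list."""
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v != 0]
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    return cx, cy, pts


def enclosing_radius(cx, cy, pts):
    """Radius of the smallest circle centred at the centroid containing all points."""
    return max(hypot(x - cx, y - cy) for x, y in pts)


def normalise(img, size=128):
    """Resample the square circumscribing the enclosing circle to size x size."""
    cx, cy, pts = centroid(img)
    r = max(enclosing_radius(cx, cy, pts), 0.5)  # guard against single-pixel objects
    h, w = len(img), len(img[0])
    out = [[0] * size for _ in range(size)]
    for v in range(size):
        for u in range(size):
            # Map the centre of output pixel (u, v) into the source square.
            sx = cx - r + (u + 0.5) * (2 * r) / size
            sy = cy - r + (v + 0.5) * (2 * r) / size
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[v][u] = img[iy][ix]  # nearest-neighbour lookup
    return out
```

Because the sampling grid is anchored at the centroid and scaled by the enclosing radius, the same object placed elsewhere in the frame or imaged at a different size maps to approximately the same normalised image.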

3.2.1 Zernike moments

Zernike moments are based on a set of complex polynomials that form a complete orthogonal set over the interior of the unit circle [10]. Zernike moments are defined as the projection of the image function on these orthogonal basis functions. The basis functions $V_{n,m}(x, y)$ are given by

$$ V_{n,m}(x, y) = V_{n,m}(\rho, \theta) = R_{n,m}(\rho)\, e^{j m \theta} \qquad (3) $$

where n is a non-negative integer, m is an integer subject to the constraints that n − |m| is even and |m| ≤ n, $\rho$ is the length of the vector from the origin to (x, y), $\theta$ is the angle between this vector and the x-axis in the counter-clockwise direction, and $R_{n,m}(\rho)$ is the Zernike radial polynomial. The Zernike radial polynomials $R_{n,m}(\rho)$ are defined as

$$ R_{n,m}(\rho) = \sum_{s=0}^{(n-|m|)/2} \frac{(-1)^{s}\,(n-s)!}{s!\,\left(\frac{n+|m|}{2}-s\right)!\,\left(\frac{n-|m|}{2}-s\right)!}\;\rho^{\,n-2s} = \sum_{\substack{k=|m| \\ n-k\ \text{even}}}^{n} \beta_{n,m,k}\,\rho^{k} \qquad (4) $$

where $\beta_{n,m,k}$ denotes the coefficient of $\rho^{k}$. Note that $R_{n,-m}(\rho) = R_{n,m}(\rho)$. The basis functions in Equation 3 are orthogonal and thus satisfy

$$ \frac{n+1}{\pi} \iint_{x^{2}+y^{2}\le 1} V_{n,m}(x, y)\, V^{*}_{p,q}(x, y)\; dx\, dy = \delta_{np}\,\delta_{mq} \qquad (5) $$

where

$$ \delta_{ab} = \begin{cases} 1 & a = b \\ 0 & \text{otherwise} \end{cases} \qquad (6) $$

For a digital image function f(x, y), the Zernike moment of order n with repetition m is given by

$$ Z_{n,m} = \frac{n+1}{\pi} \sum_{x} \sum_{y} f(x, y)\, V^{*}_{n,m}(x, y), \qquad x^{2}+y^{2} \le 1 \qquad (7) $$

$$ A_{n,m} = Z_{n,m} \qquad (8) $$

where $V^{*}_{n,m}(x, y)$ is the complex conjugate of $V_{n,m}(x, y)$. The image center of mass is shifted to the origin for the computation of the Zernike moments. To compute the Zernike moments of a given image, the center of the image is taken as the origin and the pixel coordinates are mapped to the range of the unit circle, i.e. $x^{2}+y^{2} \le 1$. The pixels falling outside the unit circle are not used in the computation. Also, $A_{n,-m} = A^{*}_{n,m}$.

The main reason for using Zernike moments is that they are known to be rotation invariant. The extracted Zernike moments are then normalized as follows to make them less sensitive to illumination changes.

For odd order moments:

$$ X_{n,m} = \frac{|Z_{n,m}|}{|Z_{0,0}|} \qquad (9) $$

For even order moments with m = 0:

$$ X_{n,0} = \frac{Z_{n,0}}{Z_{0,0}} \qquad (10) $$

and for values of m other than zero:

$$ X_{n,m} = \frac{|Z_{n,m}|}{|Z_{0,0}|} \qquad (11) $$

This is done to achieve better efficiency. In order to differentiate objects with similar shape, the Zernike moments are computed on the grayscale images. Each input vector to the SVM is formed by 19 components of normalised Zernike moments of orders 10 to 12 computed on the grayscale image.
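The feature extraction can be sketched in pure Python as follows. This is an illustrative sketch rather than the authors' code: the mapping of pixel indices onto the unit disc, the restriction to non-negative m (the negative repetitions are complex conjugates and add no information), and the function names are assumptions. With these conventions, orders 10, 11 and 12 contribute 6, 6 and 7 moments respectively, which gives the 19 components mentioned in the text.

```python
import cmath
from math import atan2, factorial, hypot, pi


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_{n,m}(rho) in its factorial-sum form."""
    m = abs(m)
    if m > n or (n - m) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s)
                    * factorial((n + m) // 2 - s)
                    * factorial((n - m) // 2 - s)))
        total += coeff * rho ** (n - 2 * s)
    return total


def zernike_moment(img, n, m):
    """Discrete Zernike moment Z_{n,m} of a square image (size >= 2)."""
    size = len(img)
    c = (size - 1) / 2.0
    acc = 0j
    for y in range(size):
        for x in range(size):
            xn, yn = (x - c) / c, (y - c) / c  # map pixel indices to [-1, 1]
            rho = hypot(xn, yn)
            if rho > 1.0:
                continue  # pixels outside the unit circle are ignored
            theta = atan2(yn, xn)
            # Multiply by the complex conjugate of the basis function.
            acc += img[y][x] * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / pi * acc


def feature_indices(orders=(10, 11, 12)):
    """All (n, m) pairs with 0 <= m <= n and n - m even for the given orders."""
    return [(n, m) for n in orders for m in range(n % 2, n + 1, 2)]


def feature_vector(img, orders=(10, 11, 12)):
    """Zernike features normalised by Z_{0,0}."""
    z00 = zernike_moment(img, 0, 0).real  # Z_{0,0} is real for a real image
    feats = []
    for n, m in feature_indices(orders):
        z = zernike_moment(img, n, m)
        # m == 0 moments are real, so the signed ratio is kept;
        # all other moments contribute their magnitude ratio.
        feats.append(z.real / z00 if m == 0 else abs(z) / abs(z00))
    return feats
```

In the pipeline described above, `img` would be the 128*128 normalised grayscale object; the feature values are unchanged when the image is rotated, because a rotation only changes the phase of each moment.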

Figure 1

3.2.2 Supervised Classification Using SVM Decision Tree

A modified SVM decision tree is used for efficient multiclass classification. The SVM binary decision tree classifier [13] uses a top-to-bottom decision layer architecture for classification.

The proposed method uses a bottom-to-top decision layer architecture instead, so the test sample is tested starting from the bottom of the tree. In the experiment, which consists of 8 objects, the test sample is first tested in the bottom SVM classifiers as shown in the figure below. The winning classes are one among the classes (1, 8), one among the classes (2, 7), one among the classes (6, 3) and one among the classes (4, 5). These winning classes are then passed to the upper SVM classifiers. The test sample is finally tested between the winners of the two upper SVM classifiers (the winner among classes 1, 8, 2, 7 and the winner among classes 6, 3, 4, 5). The SVM based decision tree architecture takes advantage of both the high classification accuracy of SVMs and the efficient computation of the decision tree architecture, and thereby offers better accuracy. The training phase of the SVM decision tree is faster than that of other SVM based approaches and neural networks. During the recognition phase, owing to its logarithmic complexity, the SVM decision tree is faster than widely used multi-class SVM methods such as "one-against-one" and "one-against-all".
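The bottom-to-top evaluation order can be sketched as a pairwise tournament. The sketch assumes a hypothetical callable `duel(sample, a, b)` that stands in for a trained binary SVM and returns whichever of the two class labels it prefers; the toy `nearest_prototype_duel` below is only a placeholder so that the example runs, and is not an SVM.

```python
def svm_tournament(sample, classes, duel):
    """Bottom-up evaluation: pair adjacent classes, keep the winners and
    repeat until a single class remains.

    Returns (winning_class, number_of_binary_evaluations).
    """
    layer = list(classes)
    evaluations = 0
    while len(layer) > 1:
        winners = []
        for i in range(0, len(layer) - 1, 2):
            winners.append(duel(sample, layer[i], layer[i + 1]))
            evaluations += 1
        if len(layer) % 2:
            winners.append(layer[-1])  # an unpaired class advances unchanged
        layer = winners
    return layer[0], evaluations


def nearest_prototype_duel(sample, a, b):
    """Placeholder binary decision: pick the class whose (made-up) scalar
    prototype, equal to the class label, is closer to the sample."""
    return a if abs(sample - a) <= abs(sample - b) else b


# Class order chosen to reproduce the pairings (1, 8), (2, 7), (6, 3), (4, 5).
PAIRING_ORDER = [1, 8, 2, 7, 6, 3, 4, 5]
```

For 8 classes this arrangement has 3 decision layers and evaluates 7 binary classifiers per test sample, compared with the 28 pairwise classifiers that a one-against-one scheme evaluates for 8 classes.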

IV. RESULTS

This section describes the experimental results obtained with the proposed method. The image database contains grayscale images of 8 different objects. 72 views of each object are taken at pose-angle intervals of 5 degrees. Figure 2 presents some examples of the database, while Figure 3 presents one object of the database for different rotations. Figure 4 presents objects having partial occlusion. Figure 5 shows some objects with non-uniform illumination and Figure 6 presents some objects with dull illumination.

Figure 2: Some objects of the database

Figure 3: Example of one object with different rotations

Figure 4: Objects with partial occlusion

Figure 5: Objects with non-uniform illumination

Figure 6: Objects with dull illumination

4.1 object detection results

Figure 7: (a) Original image (b) Image after background correction (c) Image after the application of the modified Sauvola method (k = 0.2) (d) The background image (e) Final binarised image obtained with a thresholding value of 18 (f) The detected objects are separated from the original image (g) Each detected object is then resized to 128*128 resolution

Figure 7 shows an example of object detection from the database with a textured background using the efficient modified Sauvola binarisation method and connected component labelling. Firstly, the background pixels of the given grayscale image are computed using the connected components method and the pixel value of the background pixels is set to 0. Using the modified Sauvola approach (k = 0.2), the new image is obtained as shown in Figure 7(c). The background image is then computed as shown in Figure 7(d). Figure 7(e) shows the final binarised image. Using the connected component analysis method, each object is separated as shown in Figure 7(f). Each detected object is then resized to a 128*128 resolution image as shown in Figure 7(g).

4.2 object recognition results

For the object recognition part, each training vector is formed by 19 components of Zernike moments of orders 10, 11 and 12 computed on the grayscale image. The database is composed of 2304 images. There are 8 classes of objects and 72 views of each object. 576 images were taken with normal illumination, 576 with non-uniform illumination, 576 with dull illumination and a further 576 with partial occlusion. A learning database containing some views of each object is created. The test database corresponds to the views 0°, 50°, 100°, 150°, 200°, 250°, 300° and 350° of each object. Out of each set of 576 images, 512 were used as training samples and 64 as test samples. The test samples were given as input to the SVM decision tree classifier, and the recognition rates obtained for the varying conditions were as follows.

Condition                    Recognition rate
Uniform illumination         100%
Non-uniform illumination     100%
Dull illumination            100%
Partial occlusion            95.31%

The background in the above cases is a textured background, and the results were the same irrespective of the background colour.
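The sample counts and the partial-occlusion rate can be cross-checked with a few lines of arithmetic. The number of correctly recognised test images is not stated in the paper; the value of 61 below is inferred from the reported percentage and should be read as an assumption.

```python
objects = 8
views_per_object = 72
test_views = list(range(0, 360, 50))       # 0, 50, ..., 350 degrees

total = objects * views_per_object         # images per illumination condition
n_test = objects * len(test_views)         # test images per condition
n_train = total - n_test                   # training images per condition

# Number of correct classifications implied by the reported 95.31 %.
correct = round(0.9531 * n_test)
rate = round(100 * correct / n_test, 2)

print(total, n_test, n_train, correct, rate)
```

Under this reading, the 95.31% figure corresponds to 61 of the 64 occluded test images being recognised, i.e. 3 misclassifications.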

V. CONCLUSION

The proposed approach strives toward a methodology that aids automatic detection, segmentation and recognition of visual objects. Image binarisation successfully processes images having shadows or non-uniform illumination. Connected component analysis is used to define a binary image that mainly consists of the objects to be detected. For feature extraction, Zernike moments are computed because they are robust and invariant to rotation; to achieve translation and scaling invariance, a novel approach based on the bounding box is used. Finally, to achieve the best results for object recognition, a modified SVM decision tree is used. The SVM based decision tree architecture takes advantage of both the efficient computation of the decision tree architecture and the high classification accuracy of SVMs, and hence offers better accuracy. The results show that the proposed method is efficient, fast and robust.

REFERENCES

[1] M.H. Han and D.S. Jang, "The use of maximum curvature points for the recognition of partially occluded objects", Pattern Recognition.

[2] H.C. Liu and M.D. Srinath, "Partial shape classification using contour matching in distance transformation", IEEE Trans. Pattern Anal. Mach. Intell. PAMI-12(11), pp. 1072-1079, 1990.

[3] K.B. Lim, K. Xin and G.S. Hong, "Detection and estimation of circular arc segments", Pattern Recognition Letters 16, pp. 627-636, 1995.

[4] P.W.M. Tsang, P.C. Yuen and F.K. Lam, "Recognition of occluded objects", Pattern Recognition 25, pp. 1107-1117, 1992.

[5] N. Otsu, "A threshold selection method from gray-level histograms", IEEE Trans. Systems, Man, and Cybernetics 9(1), pp. 62-66, 1979.

[6] J. Sauvola and M. Pietikainen, "Adaptive document image binarization", Pattern Recognition 33(2), pp. 225-236, 2000.

[7] J. Bernsen, "Dynamic thresholding of gray level images", in Proc. Intl. Conf. on Pattern Recognition, pp. 1251-1255, 1986.

[8] L. Di Stefano and A. Bulgarelli, "A simple and efficient connected components labelling algorithm", Image Analysis and Processing, Proceedings. International Conference, 27-29 Sept. 1999, pp. 322-327.

[9] S.M. Abdallah, E.M. Nebot and D.C. Rye, "Object recognition and orientation via Zernike moments", in Chin and Pong, T.C., editors, Proc. Computer Vision ACCV'98, volume 1 of LNCS 1351, pp. 386-393, Springer Verlag, 1998.

[10] A. Khotanzad and Y.H. Hong, "Invariant image recognition by Zernike moments", Pattern Analysis and Machine Intelligence, vol. 12, issue 5, pp. 489-497, May 1990.

[11] V. Vapnik, "Support-Vector Network", Machine Learning, vol. 20, issue 3, September 1995.

[12] http://www1.cs.columbia.edu/CAVE/research/softlib/coil-100/html

[13] Gjorgji Madzarov, "A multi-class SVM classifier utilizing binary decision tree", Informatica, May 2009.
