
    Skew detection using wavelet decomposition and projection profile analysis

    Shutao Li a,*, Qinghua Shen a, Jun Sun b

    a College of Electrical and Information Engineering, Hunan University, Changsha 410082, China

    b Fujitsu R&D Center Co., Ltd., Eagle Run Plaza B1003, Xiaoyun Road No. 26, Chaoyang District, Beijing 100084, China

    Received 22 February 2006; received in revised form 22 September 2006

    Available online 28 November 2006

    Communicated by A.M. Alimi

    Abstract

    In this paper, a novel document skew detection algorithm based on wavelet decomposition and projection profile analysis is proposed. First, the skewed document image is decomposed by the wavelet transform. The matrix containing the absolute values of the horizontal sub-band coefficients, which preserves the text's horizontal structure, is then rotated through a range of angles. A projection profile is computed at each angle, and the angle that maximizes a criterion function is regarded as the skew angle. Experimental results show that this algorithm performs well on document images of various layouts and is also robust to different languages. The effects of the wavelet basis, the number of decomposition levels, and the parameters of the criterion function are also investigated.

    © 2006 Elsevier B.V. All rights reserved.

    Keywords: Skew detection; Document analysis; Projection profile analysis; Wavelet transform

    1. Introduction

    Document skew detection is necessary for most document analysis systems, and many methods have been developed. Existing methods typically use: (1) projection profile analysis (Bloomberg and Kopec, 1993; Bloomberg et al., 1995; Ishitani, 1993; Liolios et al., 2002; Postl, 1986); (2) nearest neighbors (Jiang et al., 1999; Liolios et al., 2001; Lu and Tan, 2003); (3) Hough transform (Amin and Fischer, 2000; Yu and Jain, 1996; Ham et al., 1994); (4) mathematical morphology (Das and Chanda, 2001; Najman, 2004); (5) cross-correlations (Akiyama and Hagita, 1990; Yan, 1993; Chaudhuri and Chaudhuri, 1997; Chen and Ding, 1999; Gatos et al., 1997).

    The traditional projection profile (PJ) based approach for skew detection was proposed by Postl (1986). First, the input document is rotated through a range of angles and a projection profile is calculated at each angle. Features are then extracted from each projection profile to determine the skew angle. This is computationally expensive because it is performed directly on the original document image. Moreover, it is sensitive to the layout of the document image.

    An improved projection profile based approach was proposed by Bloomberg and Kopec (1993). The original document image is down-sampled before the projection profile is computed, and all subsequent operations work on the down-sampled image. The amount of image data to be processed therefore shrinks and the computational cost is reduced significantly. However, a major weakness is that its detection accuracy is influenced by the document image layout. It often fails on document images with multiple font styles/sizes or those that contain a large amount of non-text regions (such as pictures, tables or graphics).

    The second class of skew detection methods is based on nearest neighbors (Jiang et al., 1999; Liolios et al., 2001; Lu and Tan, 2003). Here, the angle between each

    0167-8655/$ - see front matter © 2006 Elsevier B.V. All rights reserved.

    doi:10.1016/j.patrec.2006.10.002

    * Corresponding author. Tel.: +86 731 8672916; fax: +86 731 8822224. E-mail addresses: [email protected], [email protected] (S. Li).

    www.elsevier.com/locate/patrec

    Pattern Recognition Letters 28 (2007) 555–562

    http://www.paper.edu.cn


    filters smooth the image while the highpass filters look for detailed information in the image.

    As shown in Fig. 1, when the 2D-DWT is applied to an image, four frequency bands (LL, LH, HL and HH) are obtained. Among these four sub-bands, the LL sub-band corresponds to an approximation of the original document image, the LH sub-band provides details in the horizontal direction, the HL sub-band provides details in the vertical direction, while the HH sub-band provides details in the diagonal direction. Fig. 2 gives a document image and its first level decomposition result using the symlets wavelet.
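The four sub-bands can be illustrated with a minimal pure-Python sketch of one decomposition level using the Haar (db1) filter. The function name `haar_dwt2`, the list-of-rows image format, and the even-size requirement are assumptions made for this illustration; the sub-band labels follow the convention of the text above, where LH carries the horizontal structure. A practical implementation would more likely rely on a wavelet library.

```python
def haar_dwt2(image):
    """One level of a 2D Haar (db1) transform on an image given as a list of rows.

    Returns (LL, LH, HL, HH). Height and width are assumed to be even.
    """
    s = 0.5  # (1/sqrt(2))**2, the orthonormal Haar scaling in two dimensions
    LL, LH, HL, HH = [], [], [], []
    for r in range(0, len(image), 2):
        ll, lh, hl, hh = [], [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]          # top pixel pair
            d, e = image[r + 1][c], image[r + 1][c + 1]  # bottom pixel pair
            ll.append(s * (a + b + d + e))  # lowpass in both directions
            lh.append(s * (a + b - d - e))  # top minus bottom: horizontal structure
            hl.append(s * (a - b + d - e))  # left minus right: vertical structure
            hh.append(s * (a - b - d + e))  # diagonal detail
        LL.append(ll)
        LH.append(lh)
        HL.append(hl)
        HH.append(hh)
    return LL, LH, HL, HH


# Horizontal stripes, a crude stand-in for text lines
stripes = [[0, 0, 0, 0], [4, 4, 4, 4], [0, 0, 0, 0], [4, 4, 4, 4]]
LL, LH, HL, HH = haar_dwt2(stripes)
abs_LH = [[abs(v) for v in row] for row in LH]  # magnitude of the horizontal details
```

For this striped input all of the detail energy lands in LH, while HL and HH are zero, which is the property that makes the horizontal sub-band a natural input for skew detection.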

    3. Projection profile analysis

    A popular method for skew detection uses the horizontal projection profile, because the text in most document images is aligned along horizontal lines. When the horizontal projection profile is computed for an M × N image, a column vector of size M × 1 is obtained. Each element of this column vector is the sum of the pixel values in the corresponding row of the document image. Examples of the projection profiles of an unskewed and a skewed image are shown in Fig. 3. As can be seen, the peaks in Fig. 3(c), which shows the horizontal projection profile of the unskewed image, are taller than those in Fig. 3(d), which shows the horizontal projection profile of the skewed image. In fact, the peaks in Fig. 3(c) average around 170 while the peaks in Fig. 3(d) average around 80. Based on this significant difference, the skew angle can be estimated.
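A small sketch may help make the row-sum definition concrete. The toy images below are invented for illustration (a single horizontal bar and a sheared copy of it that mimics skew), and the function name is hypothetical.

```python
def horizontal_projection_profile(image):
    """Return the sum of pixel values in each row of a 2D image (list of rows)."""
    return [sum(row) for row in image]


# Toy "text line": a horizontal bar, and the same pixels sheared to mimic skew
unskewed = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
skewed = [
    [0, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

profile_unskewed = horizontal_projection_profile(unskewed)  # one tall peak
profile_skewed = horizontal_projection_profile(skewed)      # same mass, spread out
```

Both profiles sum to the same total, but the unskewed one concentrates it in a single row, which mirrors the taller peaks described for Fig. 3(c).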

    4. The proposed algorithm

    The proposed algorithm is based on the wavelet transform and the horizontal projection profile (Fig. 4):

    1. If the input document image is not a gray one, transform it into a gray-scale image, denoted $I_g$.

    2. Decompose $I_g$ with the 2D-DWT. Then, four frequency sub-bands (LL, LH, HL and HH) are obtained. Here, the LH sub-band is selected because it preserves the horizontal structure of the document image.

    Fig. 3. Projection profiles of unskewed and skewed document images. (a) Unskewed document image, (b) document image rotated by 6°, (c) horizontal projection profile of (a), (d) horizontal projection profile of (b).


    3. Denote the matrix formed by the absolute values of the LH sub-band coefficients by $I_H$. $I_H$ is rotated through [−a, a] and a horizontal projection profile is computed at each angle.

    4. The skew angle is estimated by using the criterion function. A proper criterion can not only reduce computation time but also yield more accurate detection results. In our algorithm, peaks in each of the projection profiles are selected as follows.

    Let $H$ be one of the horizontal projection profiles; $H$ is therefore a column vector. Divide $H$ into $M$ pieces, each of size $N \times 1$ (denoted $v_1, v_2, \ldots, v_M$). Then the $K$ largest values of $v_m$ ($1 \le m \le M$) are selected (denoted $v_{m1}, \ldots, v_{mK}$). The criterion function is defined as

    $$\mathrm{sum} = \sum_{m=1}^{M} \sum_{k=1}^{K} v_{mk} \qquad (1)$$

    The angle that maximizes this criterion function is regarded as the estimated skew angle.
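The criterion of Eq. (1) can be sketched in a few lines of Python. The function name `criterion` and the handling of a final piece shorter than N (it is simply used as is) are assumptions of this sketch; the paper does not say how a profile whose length is not a multiple of N is treated.

```python
import heapq


def criterion(profile, n=25, k=5):
    """Sum of the k largest values in each consecutive length-n piece of a profile."""
    total = 0
    for start in range(0, len(profile), n):
        piece = profile[start:start + n]         # v_m; the last piece may be shorter (assumption)
        total += sum(heapq.nlargest(k, piece))   # v_m1 + ... + v_mK
    return total


# Two toy profiles with the same total mass: sharp peaks versus flattened peaks
sharp = [0, 6, 0, 0, 0, 6, 0, 0]
flat = [2, 2, 2, 0, 2, 2, 2, 0]
score_sharp = criterion(sharp, n=4, k=1)
score_flat = criterion(flat, n=4, k=1)
```

Because only the largest values of each piece contribute, a profile with tall, narrow peaks scores higher than one whose mass is spread over many rows, so the score tends to peak when the text lines are horizontal.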

    To speed up the search, our method proceeds in a coarse-to-fine manner. First, a search step size of 2° is used to obtain a coarse estimate. Denote the optimal angle found by I. The search range is then narrowed to [I − 1°, I + 1°] with a step size of 0.5°. Denote the optimal angle obtained in this pass by J. Finally, the best skew angle L is searched for within the range [J − 0.5°, J + 0.5°] with a finer step size of 0.1°.
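The three-pass search can be sketched as follows. Here `score` is a hypothetical callable standing in for "rotate the LH magnitude matrix by this angle, compute the horizontal projection profile, and evaluate the criterion function"; `max_angle` is likewise an assumed parameter for the half-width of the initial search range.

```python
def coarse_to_fine_search(score, max_angle=15.0):
    """Maximise score(angle) using step sizes of 2, 0.5 and 0.1 degrees."""

    def best_on_grid(lo, hi, step):
        count = int(round((hi - lo) / step))
        # Rounding to one decimal keeps the grid angles free of float drift
        angles = [round(lo + i * step, 1) for i in range(count + 1)]
        return max(angles, key=score)

    coarse = best_on_grid(-max_angle, max_angle, 2.0)        # I
    medium = best_on_grid(coarse - 1.0, coarse + 1.0, 0.5)   # J
    return best_on_grid(medium - 0.5, medium + 0.5, 0.1)     # L


def toy_score(angle):
    """Stand-in criterion with a single peak at 3.7 degrees."""
    return -abs(angle - 3.7)


estimate = coarse_to_fine_search(toy_score)
```

With a range of ±15°, this evaluates 16 + 5 + 11 = 32 candidate angles, compared with 301 for an exhaustive scan at 0.1° resolution. The sketch assumes the criterion has one dominant peak, so that the coarse pass lands near it.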

    5. Experimental results

    5.1. Test dataset

    The proposed method is evaluated on the open dataset provided by Chou et al. (http://dar.iis.sinica.edu.tw/Download%20area/skew.htm). It contains 500 images, generated by scanning a collection of different documents, including newspapers, books, magazines, and journals, at a resolution of 300 dpi. The skew angles lie in the range [−15°, 15°].

    These images are divided into five categories (Table 1): (1) English documents dominated by text in the horizontal direction, in either single-column or double-column layout; (2) Chinese or Japanese documents, which are also dominated by text, but whose text-lines run horizontally, vertically, or in both directions; (3) documents with a large amount of non-text regions, such as figures and graphics; (4) documents dominated by tables and forms; (5) documents in multiple languages, such as Arabic, Hindi and Greek, which also contain figures and graphics. Some examples are shown in Fig. 5.

    5.2. Experimental results

    To evaluate the proposed method, we compare it with several standard methods: PJ (projection profile), TC (transition-counts), CC (cross-correlations) and PCP (piecewise covering by parallelograms). Results of these methods on the same set of test images are available from the paper by Chou et al. (2007). For PJ, the method proposed by Postl (1986) is implemented. For TC, the method proposed by Chen and Wang (2000) is implemented. CC stands for the method proposed by Chaudhuri and Chaudhuri (1997), and PCP stands for the method proposed by Chou et al. (2007).

    Fig. 4. Schema of the proposed method.

    Table 1. Test samples

    Category    Document type                               Number of test images
    1st         English documents                           100
    2nd         Chinese and Japanese documents              100
    3rd         Documents containing large-scale figure     100
    4th         Documents containing forms or tables        100
    5th         Multilingual documents                      100

    The estimation error is commonly used to evaluate the effectiveness of a skew detection method. It is defined as the difference between the estimated and actual angles. In our experiment, both the average and the variance of the errors are computed. The wavelet basis filter is db1 and the number of decomposition levels is 2. N is fixed to 25 and the number of peaks K is fixed to 5.

    Fig. 5. Examples of the five categories of documents. (a) 1st category, (b) 2nd category, (c) 3rd category, (d) 4th category, (e) 5th category.

    Results are shown in Tables 2 and 3. Table 2 presents the average and variance of the errors on all test images, while Table 3 gives the average and variance of the smallest 80% of the errors.
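The two summaries can be sketched as below. Several details are assumptions of this illustration rather than statements from the paper: the error is taken as an absolute difference, the variance is the population variance, and the "top 80%" subset is obtained by sorting the errors and keeping the smallest ones. The angle values are made up.

```python
from statistics import mean, pvariance


def error_summary(estimated, actual, keep_fraction=1.0):
    """Mean and variance of the absolute angle errors, over the smallest keep_fraction of them."""
    errors = sorted(abs(e - a) for e, a in zip(estimated, actual))
    kept = errors[:max(1, int(round(keep_fraction * len(errors))))]
    return mean(kept), pvariance(kept)


# Made-up estimates (degrees) with one gross failure in the last image
estimated = [1.0, 2.0, 3.0, 4.0, 10.0]
actual = [1.0, 2.5, 3.0, 4.5, 5.0]

mean_all, var_all = error_summary(estimated, actual)          # Table 2 style
mean_top, var_top = error_summary(estimated, actual, 0.8)     # Table 3 style
```

Dropping the largest 20% of the errors removes the influence of a few gross failures, which is why a method with occasional large misses can look much better in the second summary than in the first.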

    From these two tables, we can see that the proposed method achieves the best estimation results for the 2nd and 5th categories. For images of the 1st and 4th categories, the estimation results of our method are similar to those of the other methods. However, for the images in the 3rd category, the proposed method is worse than the PCP, TC and CC methods. The main reason is that these images are dominated by figures. But compared with PJ, which is the worst, our method performs much better because of the introduction of the wavelet transform. Another advantage of the proposed method is that its variance of errors is much smaller than that of the other methods (also shown in Tables 2 and 3).

    The proposed method takes less than one tenth of the computation time of PJ because of the use of 2-level wavelet decomposition. For example, using Matlab 6.5 on an AMD Sempron 512 MHz PC running Windows XP, for images of size 2480 × 3508, our method takes 55 s while PJ takes 766 s. TC and CC take much more time than PJ, and PCP runs a little faster than PJ (Chou et al., 2007). The proposed algorithm can run faster with a higher decomposition level.
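As a rough check on these figures, the arithmetic can be written out, assuming that each decomposition level halves both image dimensions (as the db1 filter does on even-sized images). This is a back-of-the-envelope comparison only, not a statement about where the running time is actually spent.

```latex
\frac{766\ \mathrm{s}}{55\ \mathrm{s}} \approx 13.9,
\qquad
\frac{2480}{2^{2}} \times \frac{3508}{2^{2}} = 620 \times 877,
\qquad
\frac{2480 \times 3508}{620 \times 877} = 16
```

Under that assumption, the level-2 sub-band holds 16 times fewer values than the original image, which is of the same order as the measured speed-up of about 14.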

    6. Discussions

    6.1. Effect of wavelet basis

    To evaluate the effect of the wavelet basis filter on the detection result, different wavelet basis filters, including Daubechies, Symlets and biorthogonal filters, are tested. In the experiments, second-level wavelet decomposition is performed. The vector length N is fixed to 25, and the number of peaks K is fixed to 5. Table 4 gives the mean and variance of the errors using different wavelet basis filters. From the table, it can be concluded that db1 is generally the best.

    6.2. Effect of wavelet decomposition level

    Effects of the number of decomposition levels are shown in Table 5. The wavelet bases used are db1 and sym2. N and K are the same as in Section 6.1. As shown in Table 5, different categories require different decomposition levels to get the best results. To balance computation time and detection accuracy, level two is the best choice.

    Table 2. Comparisons of the error rates

    Method        1st Category    2nd Category    3rd Category    4th Category    5th Category
                  Mean    Var     Mean    Var     Mean    Var     Mean    Var     Mean    Var
    Our method    0.256   0.088   0.126   0.035   0.499   0.019   0.125   0.021   0.071   0.006
    PCP           0.149   0.129   0.139   0.143   0.231   0.135   0.111   0.127   0.077   0.075
    PJ            0.230   0.206   0.496   0.591   7.787   9.049   0.160   0.163   2.050   5.816
    TC            0.185   0.180   0.171   0.155   0.249   0.223   0.150   0.18    0.176   0.240
    CC            0.166   0.144   0.180   0.192   0.345   0.325   0.139   0.146   0.197   0.230

    Table 3. Comparisons of the top 80% error rates

    Method        1st Category    2nd Category    3rd Category    4th Category    5th Category
                  Mean    Var     Mean    Var     Mean    Var     Mean    Var     Mean    Var
    Our method    0.208   0.015   0.068   0.005   0.450   0.011   0.071   0.008   0.040   0.002
    PCP           0.102   0.096   0.088   0.070   0.178   0.102   0.062   0.073   0.051   0.050
    PJ            0.153   0.140   0.254   0.263   3.419   4.934   0.096   0.105   0.208   0.264
    TC            0.148   0.131   0.108   0.091   0.183   0.144   0.078   0.084   0.105   0.072
    CC            0.115   0.109   0.132   0.096   0.223   0.186   0.075   0.078   0.129   0.125

    Table 4. Effect of the wavelet basis on the performance (level = 2, N = 25, K = 5)

    Wavelet basis   1st Category    2nd Category    3rd Category    4th Category    5th Category
                    Mean    Var     Mean    Var     Mean    Var     Mean    Var     Mean    Var
    db1             0.256   0.088   0.126   0.035   0.499   0.019   0.125   0.021   0.071   0.006
    db4             0.254   0.086   0.173   0.044   0.504   0.020   0.140   0.025   0.089   0.015
    bior2.2         0.269   0.092   0.126   0.035   0.498   0.019   0.152   0.027   0.072   0.010
    sym2            0.255   0.086   0.153   0.064   0.460   0.020   0.196   0.021   0.075   0.012

    6.3. Effect of N and K

    Tables 6 and 7 show the effects of the vector length N and the number of peaks K on the results. The wavelet basis is db1 and the number of decomposition levels is 2. In Table 6, K is fixed to 5, while in Table 7, N is fixed to 25. From Table 6, we can see that the parameter N has only a small influence on the detection accuracy, and the smaller N is, the faster the algorithm runs. As shown in Table 7, changing K has no influence on either the result or the speed of the proposed algorithm.

    6.4. Detection range

    To investigate how large a skew angle the proposed method can tolerate, all the zero-angle images are also rotated by 15°, 30° and 45° in both clockwise and anticlockwise directions. From the detection results shown in Table 8, we can conclude that the proposed method works well in the range [−45°, 45°] and that the skew angle has little effect on the accuracy. This demonstrates that the proposed method is effective, since the skew angle is usually very small in practice. For angles in the range (−90°, −45°) or (45°, 90°), the detection result depends on the layout of the document image. If the text-lines in the image are horizontal, the estimated angle is close to the actual skew angle. But for images whose text-lines are vertical, or both vertical and horizontal, the estimated angle tends toward the complementary angle of the actual angle. This problem can be solved if prior knowledge about the layout of the image is available.

    Table 5. Effect of the number of wavelet decomposition levels on the performance (N = 25, K = 5)

    Wavelet basis   Level   1st Category    2nd Category    3rd Category    4th Category    5th Category
                            Mean    Var     Mean    Var     Mean    Var     Mean    Var     Mean    Var
    db1             1       0.260   0.086   0.139   0.035   0.505   0.017   0.135   0.023   0.066   0.009
    db1             2       0.256   0.088   0.126   0.035   0.499   0.019   0.125   0.021   0.071   0.006
    db1             3       0.209   0.074   0.271   0.231   0.496   0.028   0.121   0.022   0.095   0.018
    sym2            1       0.257   0.086   0.126   0.035   0.505   0.017   0.127   0.023   0.071   0.011
    sym2            2       0.255   0.084   0.153   0.064   0.460   0.020   0.125   0.021   0.075   0.117
    sym2            3       0.183   0.061   0.450   0.970   0.473   0.062   0.134   0.027   0.117   0.034

    Table 6. Effect of the vector length N on the performance (db1, level = 2, K = 5)

    Vector length N   1st Category    2nd Category    3rd Category    4th Category    5th Category
                      Mean    Var     Mean    Var     Mean    Var     Mean    Var     Mean    Var
    15                0.256   0.088   0.132   0.031   0.524   0.027   0.127   0.021   0.073   0.009
    20                0.255   0.088   0.140   0.064   0.505   0.024   0.122   0.021   0.079   0.010
    25                0.256   0.088   0.126   0.035   0.499   0.019   0.125   0.021   0.071   0.006

    Table 7. Effect of the number of peaks K on the performance (db1, level = 2, N = 25)

    Number of peaks K   1st Category    2nd Category    3rd Category    4th Category    5th Category
                        Mean    Var     Mean    Var     Mean    Var     Mean    Var     Mean    Var
    5                   0.256   0.088   0.126   0.035   0.499   0.019   0.125   0.021   0.071   0.006
    10                  0.256   0.088   0.126   0.035   0.499   0.019   0.125   0.021   0.071   0.006
    15                  0.256   0.088   0.126   0.035   0.499   0.019   0.125   0.021   0.071   0.006

    Table 8. Detection range of the proposed method

    Skew angle    1st Category    2nd Category    3rd Category    4th Category    5th Category
                  Mean    Var     Mean    Var     Mean    Var     Mean    Var     Mean    Var
    −45°          0.255   0.088   0.129   0.031   0.502   0.024   0.122   0.021   0.075   0.008
    −30°          0.256   0.087   0.127   0.030   0.505   0.022   0.125   0.024   0.073   0.009
    −15°          0.256   0.088   0.126   0.035   0.500   0.024   0.124   0.021   0.072   0.006
    15°           0.256   0.088   0.126   0.035   0.500   0.024   0.124   0.021   0.072   0.006
    30°           0.255   0.088   0.127   0.032   0.504   0.022   0.125   0.023   0.074   0.009
    45°           0.255   0.087   0.129   0.031   0.502   0.024   0.122   0.021   0.075   0.008

    7. Conclusions

    In this paper, a novel algorithm for document skew detection based on wavelet decomposition and projection profile analysis is proposed. From the viewpoint of wavelet decomposition, the horizontal sub-band is the best candidate for document skew detection: it saves computation time and improves the estimation accuracy. The experimental results show that the proposed algorithm works well on various documents, including Chinese, Japanese and English documents, diagrams, etc.

    Acknowledgements

    We would like to give many thanks to the anonymous reviewers for their helpful comments and constructive suggestions. We also thank Prof. Fu Chang for providing the source data. This paper is supported by the National Natural Science Foundation of China (No. 6040204), the Program for New Century Excellent Talents in University, and the Excellent Youth Foundation of Hunan Province.

    References

    Akiyama, T., Hagita, N., 1990. Automated entry system for printed documents. Pattern Recognition 23, 1141–1154.

    Amin, A., Fischer, S., 2000. A document skew detection method using the Hough transform. Pattern Anal. Appl. 3, 243–253.

    Bloomberg, D.S., Kopec, G.E., 1993. Method and apparatus for identification and correction of document skew. Xerox Corporation, US Patent 5,187,753.

    Bloomberg, D.S., Kopec, G.E., Dasari, L., 1995. Measuring document image skew and orientation. Proc. SPIE 2422, 302–316.

    Chou, C.H., Chu, S.Y., Chang, F., 2007. Estimation of skew angles for scanned documents based on piecewise covering by parallelograms. Pattern Recognition 40, 443–455.

    Chaudhuri, A., Chaudhuri, S., 1997. Robust detection of skew in document images. IEEE Trans. Image Process. 6, 344–349.

    Chen, M., Ding, X., 1999. A robust skew detection algorithm for grayscale document image. In: Proc. 5th Internat. Conf. on Document Analysis and Recognition, pp. 617–620.

    Chen, Y.K., Wang, J.F., 2000. Skew detection and reconstruction based on maximization of variance of transition-counts. Pattern Recognition 33, 195–208.

    Das, A.K., Chanda, B., 2001. A fast algorithm for skew detection of document images using morphology. Int. J. Document Anal. Recognition 4, 109–114.

    Daubechies, I., 1992. Ten Lectures on Wavelets. SIAM, Philadelphia.

    Gatos, B., Papamarkos, N., Chamzas, C., 1997. Skew detection and text line position determination in digitized documents. Pattern Recognition 30, 1505–1519.

    Ham, Y.K., Chung, H.K., Kim, I.K., Park, R.H., 1994. Automated analysis of mixed documents consisting of printed Korean alphanumeric texts and graphic images. Opt. Eng. 33, 1845–1853.

    Ishitani, Y., 1993. Document skew detection based on local region complexity. In: Proc. 2nd Internat. Conf. on Document Analysis and Recognition, Tsukuba Science City, Japan, pp. 49–52.

    Jiang, X., Bunke, H., Widmer-Kljajo, D., 1999. Skew detection of document images by focused nearest-neighbor clustering. In: Proc. 5th Internat. Conf. on Document Analysis and Recognition, Bangalore, pp. 629–632.

    Liolios, N., Fakotakis, N., Kokkinakis, G., 2001. Improved document skew detection based on text line connected component clustering. In: Proc. Internat. Conf. on Image Processing, Thessaloniki, Greece, vol. 1, pp. 1098–1101.

    Liolios, N., Fakotakis, N., Kokkinakis, G., 2002. On the generalization of the form identification and skew detection problem. Pattern Recognition 35, 253–264.

    Lu, Y., Tan, C.L., 2003. A nearest-neighbor chain based approach to skew estimation in document images. Pattern Recognition Lett. 24, 2315–2323.

    Mallat, S., 1989. A theory for multiresolution signal decomposition: The wavelet representation. IEEE Trans. Pattern Anal. Machine Intell. 11, 674–693.

    Najman, L., 2004. Using mathematical morphology for document skew estimation. In: Proc. SPIE, Document Recognition and Retrieval XI, vol. 5296, pp. 182–191.

    Postl, W., 1986. Detection of linear oblique structures and skew scan in digitized documents. In: Proc. 8th Internat. Conf. on Pattern Recognition, Paris, France, pp. 687–689.

    Yan, H., 1993. Skew correction of document images using interline cross-correlation. CVGIP: Graph. Models Image Process. 55, 538–543.

    Yu, B., Jain, A.K., 1996. A robust and fast skew detection algorithm for generic documents. Pattern Recognition 29, 1599–1630.
