Adaptive Restoration of Degraded Document Images

  • 8/11/2019 Adaptive Restoration of Degraded Document Images

    Study and Simulation of Different Techniques of

    Adaptive Restoration of Degraded Document Images

    Ripan Deuri (ELB10033), Raghavendra Pal (ELB10034)

    Keywords: Binarization, Filtering, Thresholding, Adaptive

    Abstract

    In the last few decades, computer-generated electronic documents have become the predominant
    medium for transmitting information. To keep up with this shift, old text document images must
    be stored in a computer-readable format. Document image processing deals with the restoration
    of document images. Document images are processed in two distinct phases to extract their
    contents: binarization, which converts a grayscale image into a binary image and thereby
    separates the text areas from the background areas, and character segmentation and recognition,
    which recognizes the characters in the binary document image. Binarization plays an essential
    role in the whole process, because the performance of character recognition depends critically
    on the accuracy of the binary image. However, binarization is not an easy task because of the
    degradations that frequently appear in document images for various reasons. Special techniques
    are therefore required to process such text document images. In this minor project, we studied
    and simulated four binarization techniques: Niblack [5], Sauvola [2], Gatos et al. [1] and
    Halabi et al. [4]. Niblack's method works well in text regions only; Sauvola's method takes care
    of both text and background regions; Gatos et al.'s method performs well in the presence of
    degradation in the background surface; and Halabi et al.'s method also performs well, applying a
    Gaussian function in the pre-processing step. The simulation was done in MATLAB. We analyze the
    binarization results of each method using OCR software.

    Introduction

    Binarization methods for document images are broadly classified into two categories: global
    and adaptive. Global methods use a single threshold to separate the text region from the
    background region, which gives good results when the background is uniform. Adaptive methods
    use a different threshold value for each pixel, depending on the local pixel information, and
    give better results than global methods when degradation is present in the document image. In
    this study, we consider adaptive binarization of text document images only.
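As a toy illustration of the two categories, the sketch below compares a single global threshold with a simple per-pixel rule. It is a minimal sketch under stated assumptions, not one of the methods studied here: the image is a nested list of gray levels (0 = black, 255 = white) with dark text, the function names `global_threshold` and `local_mean_threshold` are hypothetical, and the local rule (a pixel is text if it is darker than its neighbourhood mean minus a fixed offset) is chosen only for brevity.

```python
def global_threshold(img, t):
    """Binarize with one threshold for the whole image: 1 = text, 0 = background."""
    return [[1 if p < t else 0 for p in row] for row in img]


def local_mean_threshold(img, radius=1, offset=20):
    """Binarize each pixel against the mean of its (2*radius+1)^2 neighbourhood."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the neighbourhood, clipping the window at the image border.
            win = [img[j][i]
                   for j in range(max(0, y - radius), min(h, y + radius + 1))
                   for i in range(max(0, x - radius), min(w, x + radius + 1))]
            mean = sum(win) / len(win)
            out[y][x] = 1 if img[y][x] < mean - offset else 0
    return out


# Toy image (assumed data): the background darkens from 200 on the left to 100
# on the right, and two "ink" pixels sit 60 gray levels below their surroundings.
img = [
    [200, 180, 160, 140, 120, 100],
    [200, 120, 160, 140,  60, 100],
    [200, 180, 160, 140, 120, 100],
]
ink = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

print(global_threshold(img, 130) == ink)  # False: the dark background is marked as text
print(local_mean_threshold(img) == ink)   # True
```

On this gradient no single threshold can isolate the ink, because the ink on the bright side (120) is lighter than the background on the dark side (100), whereas the per-pixel rule follows the local background level.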

    Different Binarization methods

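    [A] Niblack's method

    The threshold is calculated from the local mean and standard deviation of the pixel values
    within a sliding window, by the following formula:

    t(x,y) = m(x,y) + k * s(x,y)

    where t(x,y) is the threshold at pixel (x,y), m(x,y) and s(x,y) are the mean and standard
    deviation of the pixel values in the window centred on (x,y), and k is a user-chosen constant
    (negative for dark text on a light background).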
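Niblack's rule is commonly written as threshold = local mean + k times the local standard deviation. The following is a minimal pure-Python sketch of that rule, not the MATLAB code used in this study. The nested-list image format, the function names `local_stats` and `niblack`, the 3x3 window and `k = -0.2` are all illustrative assumptions.

```python
import math


def local_stats(img, x, y, radius):
    """Mean and population standard deviation of the window centred on (x, y)."""
    h, w = len(img), len(img[0])
    win = [img[j][i]
           for j in range(max(0, y - radius), min(h, y + radius + 1))
           for i in range(max(0, x - radius), min(w, x + radius + 1))]
    mean = sum(win) / len(win)
    var = sum((p - mean) ** 2 for p in win) / len(win)
    return mean, math.sqrt(var)


def niblack(img, radius=1, k=-0.2):
    """Return a binary image: 1 = text (pixel below threshold), 0 = background."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            m, s = local_stats(img, x, y, radius)
            out[y][x] = 1 if img[y][x] < m + k * s else 0
    return out


# Assumed toy data: one ink pixel (50) and one faint noise pixel (198)
# on an otherwise flat background of 200.
page = [
    [200, 200, 200, 200, 200, 200],
    [200,  50, 200, 200, 198, 200],
    [200, 200, 200, 200, 200, 200],
]
for row in niblack(page):
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 1, 0, 0, 1, 0]
# [0, 0, 0, 0, 0, 0]
```

The faint noise pixel is marked as text because, in a nearly flat window, the standard deviation is tiny and the threshold sits just below the local mean. This is the background-noise behaviour of Niblack's method reported in the results.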

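    [B] Sauvola's method

    Sauvola's method is a modified version of Niblack's approach. The threshold is calculated
    taking the dynamic range of the standard deviation into account, by the following formula:

    t(x,y) = m(x,y) * [1 + k * (s(x,y) / R - 1)]

    where m(x,y) and s(x,y) are the local mean and standard deviation within the sliding window,
    R is the dynamic range of the standard deviation and k is a positive constant.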
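Sauvola's rule is commonly written as threshold = m * (1 + k * (s / R - 1)), with m and s the local mean and standard deviation. A minimal pure-Python sketch under stated assumptions follows; it is not the MATLAB code used in this study. The nested-list image format, the names `local_stats` and `sauvola`, the 3x3 window, and the values `k = 0.5` and `R = 128` (often quoted for 8-bit images) are assumptions made for illustration.

```python
import math


def local_stats(img, x, y, radius):
    """Mean and population standard deviation of the window centred on (x, y)."""
    h, w = len(img), len(img[0])
    win = [img[j][i]
           for j in range(max(0, y - radius), min(h, y + radius + 1))
           for i in range(max(0, x - radius), min(w, x + radius + 1))]
    mean = sum(win) / len(win)
    var = sum((p - mean) ** 2 for p in win) / len(win)
    return mean, math.sqrt(var)


def sauvola(img, radius=1, k=0.5, R=128.0):
    """Return a binary image: 1 = text (pixel below threshold), 0 = background."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            m, s = local_stats(img, x, y, radius)
            # A low-contrast window (small s) pulls the threshold well below the mean.
            threshold = m * (1.0 + k * (s / R - 1.0))
            out[y][x] = 1 if img[y][x] < threshold else 0
    return out


# Assumed toy data: one ink pixel (50) and one faint noise pixel (198)
# on an otherwise flat background of 200.
page = [
    [200, 200, 200, 200, 200, 200],
    [200,  50, 200, 200, 198, 200],
    [200, 200, 200, 200, 200, 200],
]
for row in sauvola(page):
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 1, 0, 0, 0, 0]
# [0, 0, 0, 0, 0, 0]
```

With these settings the ink pixel is kept while the faint noise pixel is rejected, because the threshold in a nearly flat window drops to roughly half of the local mean.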

    [C] Gatos et al.'s method

    It uses five distinct steps: pre-processing, rough estimation of the foreground region,
    approximate background calculation, final thresholding and post-processing. An adaptive Wiener
    filter is usually used to pre-process the source grayscale document image. A rough estimate of
    the text region is computed using Sauvola's method, which gives a superset of the correct text
    regions. An approximate background surface is calculated from the filtered grayscale image by
    interpolating the values of the pixels that belong to the text region of the estimated
    foreground. Final thresholding is performed adaptively, by comparing the distance between each
    pixel and its estimated background, B(x,y) - I(x,y), with a threshold d(B(x,y)) that depends on
    the background value. Finally, post-processing is performed using shrink and swell filtering.
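The background-estimation and final-thresholding steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this study. The image is a nested list of gray levels, and the background value of a rough-text pixel is taken as the average of the non-text pixels in a small window around it (one plausible form of interpolation). The function `placeholder_d` is a hypothetical stand-in for the contrast threshold d(B(x,y)), whose actual definition is given in [1]. All function names are illustrative.

```python
def estimate_background(gray, rough_text, radius=1):
    """Background surface B: own value on background, neighbour average on text."""
    h, w = len(gray), len(gray[0])
    bg = [[float(p) for p in row] for row in gray]
    for y in range(h):
        for x in range(w):
            if not rough_text[y][x]:
                continue  # background pixel: keep its own gray level
            neighbours = [gray[j][i]
                          for j in range(max(0, y - radius), min(h, y + radius + 1))
                          for i in range(max(0, x - radius), min(w, x + radius + 1))
                          if not rough_text[j][i]]
            if neighbours:  # assumption: keep the pixel's own value if none found
                bg[y][x] = sum(neighbours) / len(neighbours)
    return bg


def placeholder_d(b, fraction=0.3):
    """Hypothetical stand-in for d(B): a fixed fraction of the background level."""
    return fraction * b


def final_threshold(gray, bg, d):
    """T(x,y) = 1 (text) where B(x,y) - I(x,y) > d(B(x,y)), else 0."""
    return [[1 if b - p > d(b) else 0 for p, b in zip(row_g, row_b)]
            for row_g, row_b in zip(gray, bg)]


# Assumed toy data: I(x,y) after pre-processing and a rough mask S(x,y) that is
# a superset of the true text (the pixel with value 190 is a false positive).
gray = [
    [200, 200, 200, 200],
    [200,  60, 190, 200],
    [200, 200, 200, 200],
]
rough_text = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

background = estimate_background(gray, rough_text)
binary = final_threshold(gray, background, placeholder_d)
for row in binary:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 0, 0]
# [0, 0, 0, 0]
```

In this example the falsely included pixel is dropped in the final step because its distance from the estimated background (10 gray levels) is far below the contrast threshold.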

    Fig. 1: Block diagram of poor-quality document image binarization. Stages: Is(x,y), gray-scale
    source image -> Wiener filter -> I(x,y), gray-scale image after pre-processing -> Sauvola's
    adaptive thresholding -> S(x,y), intermediate B/W image -> interpolation -> B(x,y), gray-scale
    background surface image -> examine pixel contrast, B(x,y) - I(x,y) > d(B(x,y)) -> T(x,y),
    final image -> post-processing -> T(x,y), final image after post-processing.

    Fig. 2: (a) Image before binarization, (b) OCR output of (a), (c) image after processing,
    (d) OCR output of (c)

    [D] Halabi et al.'s method

    It uses the same steps as the previous method, but a Gaussian filter is used in the
    pre-processing step to reduce the noise in the grayscale image.
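A Gaussian pre-processing step of this kind can be sketched in pure Python as below. This is a minimal sketch, not the MATLAB code used in this study: the nested-list image format, the names `gaussian_kernel` and `gaussian_smooth`, the values `sigma = 1.0` and `radius = 2`, and the border handling (repeating the edge pixel) are assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma=1.0, radius=2):
    """Normalised 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


def clamp(v, upper):
    """Restrict an index to the range 0..upper (repeats the edge pixel)."""
    return max(0, min(upper, v))


def gaussian_smooth(img, sigma=1.0, radius=2):
    """Separable Gaussian blur: a horizontal pass followed by a vertical pass."""
    h, w = len(img), len(img[0])
    kernel = gaussian_kernel(sigma, radius)
    offsets = range(-radius, radius + 1)
    # Horizontal pass over each row.
    tmp = [[sum(kernel[d + radius] * img[y][clamp(x + d, w - 1)] for d in offsets)
            for x in range(w)] for y in range(h)]
    # Vertical pass over the horizontally blurred image.
    return [[sum(kernel[d + radius] * tmp[clamp(y + d, h - 1)][x] for d in offsets)
             for x in range(w)] for y in range(h)]


# Assumed toy data: a flat background with one noise speck in the centre.
noisy = [[200] * 5 for _ in range(5)]
noisy[2][2] = 120
smoothed = gaussian_smooth(noisy)
print(round(smoothed[2][2], 1))  # the speck is pulled back towards 200

# A sharp step edge becomes a gradual ramp after smoothing.
edge = gaussian_smooth([[0, 0, 0, 255, 255, 255]])
print([round(v, 1) for v in edge[0]])
```

The same smoothing that suppresses the noise speck also spreads the step edge over several pixels, which is the edge blurring associated with Gaussian pre-processing.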

    Fig. 3: Block diagram of Halabi et al.'s method. Stages: Is(x,y), gray-scale source image ->
    Gaussian filter -> I(x,y), gray-scale image after pre-processing -> Sauvola's adaptive
    thresholding -> S(x,y), intermediate B/W image -> interpolation -> B(x,y), gray-scale
    background surface image -> examine pixel contrast, B(x,y) - I(x,y) > d(B(x,y)) -> T(x,y),
    final image -> post-processing -> T(x,y), final image after post-processing.

    Experimental Results and Conclusion

    Niblack's method binarizes well in the regions close to the text areas, but it leaves a large
    amount of noise in the non-text regions. Sauvola's method overcomes this problem by taking the
    dynamic range of the standard deviation of the grayscale pixel values into account. However, it
    performs poorly when the degradation becomes more severe. The method proposed by Gatos et al.
    can deal with degradations in the background surface by using the distance between the text and
    the background surface as the thresholding criterion. Halabi et al.'s method also performs
    well, but the edges are blurred by the Gaussian filter.

    References

    1. B. Gatos, I. Pratikakis, S. J. Perantonis, "Adaptive Degraded Document Image Binarization",
       Pattern Recognition 39 (2006)

    2. J. Sauvola, M. Pietikainen, "Adaptive Document Image Binarization", Pattern Recognition 33
       (2000)

    3. B. Gatos, I. Pratikakis, S. J. Perantonis, "An Adaptive Binarization Technique for Low
       Quality Historical Documents" (2004)

    4. Yahia S. Halabi, Zaid SA, Faris Hamdan, Khaled Haj Yousef, "Modeling Adaptive Degraded
       Document Image Binarization and Optical Character System" (2009)

    5. W. Niblack, An Introduction to Digital Image Processing, Prentice-Hall, Englewood Cliffs,
       NJ, 1986

    6. ABBYY, http://www.finereader.com/