Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images

download Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images

of 3

Transcript of Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images

  • 8/12/2019 Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images

    1/3

    Study and Simulation of Different Techniques of

    Adaptive Restoration of Degraded Document Images

    Keywords: Binari zation, F il teri ng, Thresholding, Adaptive

    Abstract

    In the last few decades, computer originated electronic documents becomes the exclusive

    medium of transmitting information. To compensate with it, old text document images must be

    stored in a computer readable format. Document Image Processing deals with the restoration of

    document images. Document images are processed in two distinct phases to extract their

    contents: Binarization of the document images which refers to the conversion of a gray scale

    image into a binary image thereby distinguishing the text areas from the background areas andthe another one is character segmentation and recognition which deals with the recognition of the

    characters of the binary document image. Binarization plays an essential role in the whole

    process as the performance of the character recognition critically depends on the accuracy of the

    binary image. However, binarization is not an easy task because of the degradations that appear

    frequently in the document image due to various reasons. Hence some special techniques are

    required to process such text document images. In this minor project, we have studied and

    simulated four different binarization techniques: Niblack [5], Sauvola [2], Gatos et al. [1] and

    Halabi et al. [4].Niblacks method is good for text regions only, Sauvolas method take cares of

    both text and background regions, Gatoes et als method performs well in presence of

    degradation in background surface and Halabi et al.s method also performs well by applyingGaussian function in pre-processing step. The simulation was done in MATLAB. We analyze the

    binarization results for each method in OCR software.

    Introduction

    Binarization of document images are broadly classified into two categories: global and adaptive.

    The global methods use only one threshold to separate the text region from the background

    region, which gives good result in case of uniform background. Adaptive method uses different

    threshold value depending upon the local pixel information and it gives better result in comparedto global methods when degradation is present in the document image. In this study, we consider

    adaptive binarization of text document images only.

  • 8/12/2019 Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images

    2/3

    Different Binarization methods

    [A] Niblacks method

    Threshold is calculated based on the local mean and standard deviation of the pixel information

    within the sliding window by the following formula:

    [B] Sauvolas methodSauvolas method is the modified version of Niblacks approach. Threshold is calculated

    considering the dynamics of standard deviation by the following formula:

    [3] Gatos et al.s method

    It uses five distinct steps: preprocessing, rough estimation of foreground region, approximate

    background calculation, final thresholding and post-processing. Usually adaptive wiener filter is

    used for preprocessing of the main grayscale document images. A rough estimation of the text

    region is calculated using sauvolas method which gives the superset of correct text regions. An

    approximate background is calculated from the filtered grayscale image using interpolation ofthe pixel corresponding to the text region of the estimated foreground. Final thresholding is

    performed by calculating the distance of foreground text pixel from it background adaptively.

    Finally, a post-processing is performed using shrink and swell filtering.

    Fig1. : Block diagram of poor quality document image binarization

    Weiner

    Filter

    Sauvola's

    adaptive

    thresholding

    InterpolationExamine pixel contrast

    B(x,y)-I(x,y)>d(B(x,y))

    Post-

    Processing

    Is(x,y)

    Gray-scale

    source image

    I(x,y)

    Gray-scale

    image after pre-

    processing

    S(x,y)

    Intermediate B/W

    image

    B(x,y)

    Gray-scale

    Background

    surface image

    T(x,y)

    Final image

    T(x,y)

    Final image

    after post

    processing

  • 8/12/2019 Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images

    3/3