Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images
-
Upload
ripan-deuri -
Category
Documents
-
view
217 -
download
0
Transcript of Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images
-
8/12/2019 Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images
1/3
Study and Simulation of Different Techniques of
Adaptive Restoration of Degraded Document Images
Keywords: Binari zation, F il teri ng, Thresholding, Adaptive
Abstract
In the last few decades, computer originated electronic documents becomes the exclusive
medium of transmitting information. To compensate with it, old text document images must be
stored in a computer readable format. Document Image Processing deals with the restoration of
document images. Document images are processed in two distinct phases to extract their
contents: Binarization of the document images which refers to the conversion of a gray scale
image into a binary image thereby distinguishing the text areas from the background areas andthe another one is character segmentation and recognition which deals with the recognition of the
characters of the binary document image. Binarization plays an essential role in the whole
process as the performance of the character recognition critically depends on the accuracy of the
binary image. However, binarization is not an easy task because of the degradations that appear
frequently in the document image due to various reasons. Hence some special techniques are
required to process such text document images. In this minor project, we have studied and
simulated four different binarization techniques: Niblack [5], Sauvola [2], Gatos et al. [1] and
Halabi et al. [4].Niblacks method is good for text regions only, Sauvolas method take cares of
both text and background regions, Gatoes et als method performs well in presence of
degradation in background surface and Halabi et al.s method also performs well by applyingGaussian function in pre-processing step. The simulation was done in MATLAB. We analyze the
binarization results for each method in OCR software.
Introduction
Binarization of document images are broadly classified into two categories: global and adaptive.
The global methods use only one threshold to separate the text region from the background
region, which gives good result in case of uniform background. Adaptive method uses different
threshold value depending upon the local pixel information and it gives better result in comparedto global methods when degradation is present in the document image. In this study, we consider
adaptive binarization of text document images only.
-
8/12/2019 Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images
2/3
Different Binarization methods
[A] Niblacks method
Threshold is calculated based on the local mean and standard deviation of the pixel information
within the sliding window by the following formula:
[B] Sauvolas methodSauvolas method is the modified version of Niblacks approach. Threshold is calculated
considering the dynamics of standard deviation by the following formula:
[3] Gatos et al.s method
It uses five distinct steps: preprocessing, rough estimation of foreground region, approximate
background calculation, final thresholding and post-processing. Usually adaptive wiener filter is
used for preprocessing of the main grayscale document images. A rough estimation of the text
region is calculated using sauvolas method which gives the superset of correct text regions. An
approximate background is calculated from the filtered grayscale image using interpolation ofthe pixel corresponding to the text region of the estimated foreground. Final thresholding is
performed by calculating the distance of foreground text pixel from it background adaptively.
Finally, a post-processing is performed using shrink and swell filtering.
Fig1. : Block diagram of poor quality document image binarization
Weiner
Filter
Sauvola's
adaptive
thresholding
InterpolationExamine pixel contrast
B(x,y)-I(x,y)>d(B(x,y))
Post-
Processing
Is(x,y)
Gray-scale
source image
I(x,y)
Gray-scale
image after pre-
processing
S(x,y)
Intermediate B/W
image
B(x,y)
Gray-scale
Background
surface image
T(x,y)
Final image
T(x,y)
Final image
after post
processing
-
8/12/2019 Study and Simulation of Different Techniques of Adaptive Restoration of Degraded Document Images
3/3