
An Implementation of Skeletonized Morphological Factor of Applying Histogram for Text Extraction

1 K.R. Sanjuna and 2 K. Dinakaran

1 Department of Computer Science and Engineering, Prist University, Tamil Nadu, India. [email protected]

2 Department of Computer Science and Engineering, RMD Engineering College, Chennai, Tamil Nadu.

Abstract

Applying a morphological factor to text processing is crucial for recognizing detected text. Region selection models have been developed effectively for image text segmentation. The accuracy of a detection system depends mainly on the text preprocessing and segmentation algorithm used. In this paper, we propose a new method, the Skeletonized Morphological Factor (SMF) algorithm, to segment text regions from textured color images. The technique finds the text edges using the information content of the sub-image coefficients of the discrete wavelet transform of the input image. The detected edges are then combined to form the exact position of the characters. In the final stage, the regions that do not qualify as text regions are removed by segmentation to improve the overall performance and extract the outline of the text region. The experimental results show that the proposed method is robust against variations in the size, font, language, color and orientation of the text regions.

Key Words: Text extraction, text segmentation, wavelet transform, image documents.

International Journal of Pure and Applied Mathematics
Volume 114 No. 7 2017, 727-741
ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version)
url: http://www.ijpam.eu
Special Issue


1. Introduction

In recent years, text detection has attracted considerable attention, and many text detection methods have been proposed. Text segmentation refers to the process of dividing an article into several parts based on its content. In information retrieval systems, a long text tends to be retrieved most frequently because its relevance to a query is overestimated, so we need to segment it into several parts to avoid this problem. In this task, the text is given as the input and segmented into paragraphs, a list of pairs of adjacent paragraphs is generated, and each pair is judged as to whether a topic boundary should be placed between them. The task is thus interpreted as a binary classification in which each pair of paragraphs is classified as separation or non-separation. In future research, segmenting speech text into paragraphs or sentences will also be considered.

Some problems are caused by encoding texts into numerical vectors and computing their similarities based only on attribute values. Assuming that words are given as features, many features are required to encode texts into numerical vectors while maintaining sufficient robustness. The dominance of zero values in each numerical vector creates a very poor environment for computing similarities, because discrimination among the numerical vectors becomes very weak. In previous works, the similarity between numerical vectors representing texts has been computed assuming independence among features, even when the words that indicate the features have very strong semantic relations. Therefore, in this research, we address these problems by considering both the semantic relations among features and the differences among feature values when computing the similarity between two texts.

We outline here what we propose in this research. We assume that words are given as the features of the numerical vectors that encode texts, and that they have semantic relations with one another. Based on this assumption, we define a similarity measure between feature vectors that considers both feature values and features. We modify KNN into a version in which both the feature similarity and the feature value similarity are used, and apply it to the classification task mapped from text segmentation. As benefits of this research, we expect greater tolerance to sparse distributions and the potential avoidance of huge dimensionality: the dimensionality used in encoding texts into numerical vectors may be cut down, and the information loss in computing the similarity between texts may be reduced by reflecting the similarities among the features.
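The paper does not give the similarity measure explicitly. As a minimal sketch, assuming the feature similarities are supplied as a matrix `S` (a hypothetical input, with `S[i][j]` the semantic similarity between words i and j), one possible measure is a cosine-style ratio of bilinear forms, and the modified KNN simply ranks training pairs by it. The function names are illustrative and not taken from the paper.

```python
import math
from collections import Counter

def soft_similarity(x, y, S):
    """Similarity of vectors x and y under a feature-similarity matrix S.

    Assumption: with S equal to the identity matrix this reduces to the
    ordinary cosine similarity, which ignores relations among features.
    """
    def bilinear(a, b):
        return sum(a[i] * S[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    denom = math.sqrt(bilinear(x, x) * bilinear(y, y))
    return bilinear(x, y) / denom if denom > 0 else 0.0

def knn_predict(query, examples, S, k=3):
    """Majority label among the k examples most similar to the query.

    examples is a list of (vector, label) pairs.
    """
    ranked = sorted(examples,
                    key=lambda ex: soft_similarity(query, ex[0], S),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With an identity `S`, two texts that share no words score zero; with off-diagonal entries, related words still contribute, which is the tolerance to sparse vectors that the paragraph above describes.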

Applying machine learning methods to text detection encounters difficulties owing to the varying conditions under which objects must be recognized. To overcome these problems, the implementation follows a two-step localization and verification morphological approach. The first step aims at quickly locating candidate text lines, permitting the normalization of characters into a unique size. In the verification step, a trained support vector machine or multi-layer perceptron is applied to background-independent features to remove false alarms. Text recognition, even from the detected text lines, remains a challenging problem because of the diversity of fonts and colors, the presence of complex backgrounds and the short length of the text strings. Two skeletonized-factor schemes are examined to address the text recognition problem: a bi-modal enhancement scheme and a multi-modal segmentation scheme. In the bi-modal scheme, a set of filters is applied to improve the contrast of black and white characters and produce a better binarization before recognition. For more general cases, text recognition is addressed by a segmentation step followed by a traditional optical character recognition (OCR) algorithm within a multi-hypothesis framework. In the segmentation step, the segmentation model represents the distribution of the grayscale values of pixels using a Gaussian mixture model or a Markov random field. The resulting multiple segmentation hypotheses are post-processed by a connected component analysis and a grayscale consistency constraint, which form the region bounds of the text. The proposed approach becomes less sensitive to the sparse distribution of numerical vectors, because the similarity among features is captured as well as the similarity among feature values. Therefore, we expect from this research both better performance on the classification task mapped from text segmentation and more efficient text representations.

2. Literature Survey

In recent years, many methods for text detection have been presented that prove effective in various configurations. A review of the research on text detection methods follows.

Edge-based methods are usually efficient and simple when the edges of text and background differ considerably. Owing to this property, edge-based methods have attracted much attention in recent years, and several effective methods have been developed in the literature. Sun et al. [1] used a color image filtering technique to extract board text in natural scenes. Khare, Shivakumara and Raveendran, in "A new histogram oriented moments descriptor for multi-oriented moving text detection" [2], located edge-dense image blocks using edge features and morphological operations, and then employed an SVM classifier to identify the text blocks. An effective edge-based text extraction method was also developed by investigating the location of text in complex background images.

Y. Zhang, J. H. Lai and P. C. Yuen, in "Text string detection for loosely constructed characters with arbitrary orientations" [3], developed a method that first groups text candidates using a clustering algorithm and then identifies text with a text classifier. Edge-based methods can achieve good performance when scene images exhibit strong edges.

Shuping Liu and Yantuan Xian studied text detection in natural scene images using morphological component analysis. The most popular method for text detection in recent years is sparse representation (SR), which is inspired by the sparse-coding mechanism of the human vision system. SR technology has been successfully used for face recognition [4], image classification, image restoration and compressed sensing.

Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang and Hong-Wei Hao [5] proposed an accurate and robust method for detecting text in natural scene images, based on Maximally Stable Extremal Regions (MSERs). First, a fast and effective pruning algorithm is designed to extract MSERs as character candidates; the number of character candidates to be processed is reduced with high accuracy. Second, character candidates are grouped into text candidates by a single-link clustering algorithm, where the distance weights and clustering threshold are learned automatically by a novel self-training distance metric learning algorithm.

Chucai Yi and Yingli Tian [6] presented a method that combines scene text detection and scene text recognition algorithms. For text detection, a layout-based scene text detection algorithm is applied to obtain text regions from the scene image. For scene text recognition, a structure-based recognition technique is used.

Cun-Zhao Shi, Chun-Heng Wang, Bai-Hua Xiao, Song Gao and Jin-Long Hu, in "Scene text recognition using structure-guided character detection and linguistic knowledge" [7], proposed a novel scene text recognition technique that combines structure-guided character detection with linguistic knowledge. Using both the global structure and the local appearance information of characters, they build a part-based tree structure to model each category of characters so as to detect and recognize characters at the same time.

For word recognition, the detection scores and a language model are combined into the posterior probability of the character sequence in a Bayesian decision framework, and the final word recognition result is obtained by finding the most likely character sequence with the Viterbi algorithm. Additional information such as the language model helps to eliminate ambiguities in word recognition.

A recent paper, Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Frames [8], by TrungLi, Chew, proposed a semiautomatic system for ground truth generation for video text detection and recognition that handles English and Chinese text of multiple orientations at word level. Ground truthing for text detection and recognition involves text line segmentation, word segmentation, bounding box drawing, deciding fields, and graphics and scene text separation. The system includes a facility that allows the user to manually correct the ground truth if the automatic technique produces incorrect results.

Jacqueline Field [9] proposed a system that handles many stages of scene text reading in a probabilistic manner, from binarization through normalization to character segmentation and recognition. The work describes a reading system that integrates a simple region grouping rule with probabilistic models for binarizing a given text region and identifying baselines, and that performs word and character segmentation during the recognition process.

Ali Mosleh, Nizar Bouguila and Abdessamad Ben Hamza [10] proposed a method to erase unwanted text from video. The work has two stages: (i) automatic video text detection and (ii) restoration after the removal. A Support Vector Machine (SVM) based video text detection technique is used to localize the text in video frames. A single-frame text detection algorithm is developed using the Stroke Width Transform (SWT) and unsupervised classification.

Palaiahnakote Shivakumara, Trung Quy Phan, Shijian Lu and Chew Lim Tan [11] presented a new technique based on gradient vector flow (GVF) and neighbor component grouping that extracts text lines of any orientation. GVF is used to identify text components from the Sobel edge map, because the Sobel edge map provides fine details for text and fewer details for non-text compared with the Canny edge map.

Yao Li, Wenjing Jia, Chunhua Shen and Anton van den Hengel [12] proposed a detection method based on measures of objectness. In their characterness model, regions are extracted by a modified MSER-based region detector. Novel characterness cues are then computed, and these cues are combined in a Bayesian framework in which naïve Bayes is used to model the probability.

Z. Yuan, D. Zhao, T. Lu and C. L. Tan proposed new gradient-spatial-structural features for video script identification [13]. The input for script identification is the text blocks obtained by their text frame classification method. A method based on histogram thresholding, entropy filtering and connected components, used to extract Bengali text and Bengali characters from multimedia images, is proposed in [15]. A method for character extraction and recognition from images is proposed in [14], in which edge density is computed for four orientations to detect potential text regions and clustering is used to localize text regions. A combined approach based on color and edge features for extracting text from video is proposed in [16, 17], in which a color-edge method is used to eliminate the text background, and vertical and horizontal projection is employed to locate text in the image. A comparative study of edge-based and connected-component based methods in terms of accuracy, precision and recall rates is given in [18], in which each approach is analyzed to determine its success and limitations.


3. Problem Definition and Identification

Factor to be Resolved

Text detection and the extraction of text regions from images involve several challenging difficulties. For instance, the characters are often mixed with other objects, the characters in a region may belong to any script or alphabet and have any color, the background color may differ only slightly from that of the characters, the font style and size of the characters may vary, and the luminance of the images may also vary.

1. To design a hierarchical (i.e., tree-based) representation of the image contents, where adjacency between components is related to inclusion, for efficiency.

2. To design a character segmentation method that is a good tradeoff between efficiency (linear time complexity) and quality (with a competitive F-score).

3. To enhance the efficient grouping of characters into text boxes by the skeletonization method, taking full advantage of the tree structure constructed from left to right position.

4. To illustrate the capabilities of the proposed tree-based representation on image binarization.

4. An Implementation of Skeletonized Morphological Factor to Applying Histogram for Text Extraction

An image computation method is developed for extracting character portions from a complicated image, which is segmented into outline portions. Identification of areas corresponding to text in document images is an important step for a character recognition system. We briefly review a technique for the automatic design of skeletonization-based morphological extraction and show its application to the segmentation of text areas using region selection from images. We also present a heuristically applied Gabor filter used to refine the segmentation results. The goal of the proposed method is to realize a practical document structure analysis for advanced optical character recognition systems with a heuristic feature extraction model. The color evaluation is carried out through hue-saturation histogram evaluation. The main objective of this work has been to develop a robust and efficient segmentation system for natural images.
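The paper does not detail how the hue-saturation histogram is built. A minimal sketch, assuming RGB pixels with 8-bit channels and hypothetical bin counts (`hue_bins`, `sat_bins` are illustrative parameters, not values from the paper), could look like this:

```python
import colorsys

def hue_saturation_histogram(pixels, hue_bins=8, sat_bins=4):
    """2-D histogram over hue and saturation for (r, g, b) pixels in 0..255.

    hist[h][s] counts the pixels that fall in hue bin h and saturation bin s.
    """
    hist = [[0] * sat_bins for _ in range(hue_bins)]
    for r, g, b in pixels:
        h, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that h == 1.0 or s == 1.0 falls into the last bin.
        hi = min(int(h * hue_bins), hue_bins - 1)
        si = min(int(s * sat_bins), sat_bins - 1)
        hist[hi][si] += 1
    return hist
```

Dropping the value channel makes the histogram less sensitive to luminance changes, which is one plausible reason for evaluating color in hue-saturation space.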

Architecture Diagram for Proposed Implementation

The figure given below shows the processing steps through which the morphological factor is segmented and the text factor is skeletonized.


Figure 1: Proposed Skeletonization of Morphological Process for Text Extraction

[Figure 1 blocks: Input image; Preprocessing; Noise removal; Apply Gabor filters; Region selection; Segmentation; Edge mapping; HE feature extraction technique; Skeletonization; Text extraction. Overall label: a morphological process of applying histogram gradient text extraction and classification.]

The morphological factor can be achieved by advancing the state of the art, pushing edge recognition methods forward to meet the challenges of the segmentation task in different situations under the extraction processing steps shown in Figure 1. Consequently, more efficient methods and novel strategies are developed for issues that the approaches of previous implementations leave open. The presented segmentation achieves high performance through this implementation.

Pre-processing

Our method is a connected component-based method. Therefore, pre-processing is an important step to binarize images, extract connected components and remove noise. First, an image binarization method is applied to binarize the image, and then all connected components are extracted by the connected component labeling method. However, not all connected components are suitable for the learning phase or the testing phase, for example noise and stains resulting from the scanning process. Based on the characteristics of connected components such as size, shape and position, we apply some rules to remove small noisy components and stain components, which usually have long shapes and appear at the boundary of a document. To suppress the noise, we use the function

F(x; m, s)

where x is the variable, and m and s are respectively the mean and standard deviation of the variable's natural logarithm over all connected components of the images.
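The pre-processing step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the image is already binarized, 4-connectivity is used, the variable x is the component size in pixels, and F is read as the log-normal cumulative distribution (one interpretation consistent with m and s being the mean and standard deviation of ln x). The threshold `low` is a hypothetical parameter.

```python
import math
from collections import deque

def label_components(img):
    """4-connected component labelling of a binary image (rows of 0/1).

    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), []
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(comp)
    return components

def lognormal_cdf(x, m, s):
    """Log-normal CDF; m and s are the mean and std of ln(x). Assumed form of F."""
    return 0.5 * (1.0 + math.erf((math.log(x) - m) / (s * math.sqrt(2.0))))

def drop_small_noise(components, low=0.05):
    """Keep components whose size is not in the lowest tail of the fitted log-normal."""
    logs = [math.log(len(c)) for c in components]
    m = sum(logs) / len(logs)
    s = math.sqrt(sum((v - m) ** 2 for v in logs) / len(logs))
    if s == 0:  # all components have the same size, nothing to filter
        return components
    return [c for c in components if lognormal_cdf(len(c), m, s) >= low]
```

A component that is much smaller than the typical character size receives a value of F close to zero and is discarded as noise; shape and position rules for stains would be added in the same way.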

Skeletonized Feature Extraction

Many features can be extracted from connected components. However, if a selected feature is not good, it does not benefit classification. As shape and context are very important features with which humans recognize or segment an image, we extract features from the size information, shape information, stroke width and position of connected components.

a) Elongation Frame Edge Detection of Skeletonized Text Region

Frame edges are distinctive characteristics of text blocks and can be used to detect possible text regions. Here, by finding the edges in the detail sub-images and combining the edges contained in each sub-image, the candidate text regions can be found. As a result, we first need to employ a morphological edge detector. For computational efficiency, the Sobel edge detector is used. The Sobel edge detector is efficient at extracting the strong edges that are desirable in this application, in which the edges that are formed are outlined. We apply the Sobel edge detector to each sub-image. The algorithm for computing the edge image E is as follows:

Algorithm 4.1
Input: Text Image
Output: Detected Edges
Step 1: Accept the input image for preprocessing
Step 2: Apply the masks Gx, Gy to the input image
Step 3: Apply the Sobel edge detection procedure and compute the gradient
Step 4: Manipulate the masks Gx, Gy separately on the input image
    For all the pixels in the gray image G(x,y) do
        Calculate left = (G(x,y) - G(x-1,y))
        Calculate upper = (G(x,y) - G(x,y-1))
        For each (edge region mapping)
            Calculate |G| = √(Gx² + Gy²)
        End
        Calculate upperRight = (G(x,y) - G(x+1,y-1))
        Calculate E(x,y) = max(left, upper, upperRight)
    End For
Step 5: Sharpen the image E by convolving it with a sharpening filter.
    W(x,y) = max(L, U, UR)
Step 6: The absolute magnitude is the output edges
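The two computations in Algorithm 4.1 can be sketched in plain Python. The Sobel masks are the standard 3×3 kernels; treating the neighbour differences as absolute values and leaving border pixels at zero are assumptions made for this illustration, since the algorithm listing does not specify them. The sharpening step is omitted.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def sobel_magnitude(gray):
    """Gradient magnitude |G| = sqrt(Gx^2 + Gy^2); border pixels stay 0."""
    rows, cols = len(gray), len(gray[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    p = gray[y + dy][x + dx]
                    gx += SOBEL_X[dy + 1][dx + 1] * p
                    gy += SOBEL_Y[dy + 1][dx + 1] * p
            out[y][x] = math.hypot(gx, gy)
    return out

def max_difference_edges(gray):
    """E(x, y) = max(left, upper, upperRight), using absolute differences."""
    rows, cols = len(gray), len(gray[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            diffs = [0]
            if x > 0:
                diffs.append(abs(gray[y][x] - gray[y][x - 1]))      # left
            if y > 0:
                diffs.append(abs(gray[y][x] - gray[y - 1][x]))      # upper
            if y > 0 and x < cols - 1:
                diffs.append(abs(gray[y][x] - gray[y - 1][x + 1]))  # upper right
            out[y][x] = max(diffs)
    return out
```

On an image with a vertical intensity step, both functions respond along the step and stay at zero in the flat areas, which is the behaviour the text relies on for locating strong character edges.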

b) Stroke filtering of non-text area region

In this stage, filters are further applied to remove the non-text areas using the structural rules of strokes. To do so, we summarize the shared attributes of horizontal texts as follows:

a) Edges form the carried texts,

b) Text bars are formed with widths larger than their heights,

c) Bounded sizes are around the region of object texts, and

d) Texts have a singular texture property.

We generate horizontal and vertical run-length histograms, and from each

histogram we extract the mode, mean, variance and maximal run-length.
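A minimal sketch of the run-length features, assuming a binary image stored as rows of 0/1 values and using the population variance (the paper does not say which variance is meant):

```python
from collections import Counter

def run_lengths(binary, horizontal=True):
    """Lengths of consecutive runs of 1-pixels along rows (or columns)."""
    lines = binary if horizontal else [list(col) for col in zip(*binary)]
    runs = []
    for line in lines:
        count = 0
        for v in line:
            if v:
                count += 1
            elif count:
                runs.append(count)
                count = 0
        if count:  # run that reaches the end of the line
            runs.append(count)
    return runs

def run_length_features(runs):
    """Mode, mean, variance and maximum of a list of run lengths."""
    if not runs:
        return {"mode": 0, "mean": 0.0, "variance": 0.0, "max": 0}
    mean = sum(runs) / len(runs)
    variance = sum((r - mean) ** 2 for r in runs) / len(runs)
    mode = Counter(runs).most_common(1)[0][0]
    return {"mode": mode, "mean": mean, "variance": variance, "max": max(runs)}
```

For text, the mode of the horizontal run lengths tends to track the stroke width, which is why these statistics are useful for separating strokes from non-text areas.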

Distance to estimated text lines: the distance of an element e to the text lines L is defined by the following equation, where v_i and v_j are pixels in e and L, respectively:

$$D(e, L) = \max_{v_i \in e} \left\{ \min_{v_j \in L} |v_i - v_j|^2 \right\}$$

The lower the value, the more confidence we require from the normal distribution in order to assign a text label for an element.
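The distance is a directed, squared Hausdorff-style distance from the element to the line. A direct sketch, assuming pixels are given as (x, y) tuples and both pixel sets are non-empty:

```python
def distance_to_line(element, line):
    """D(e, L): the largest, over pixels of e, of the squared distance
    to the nearest pixel of L."""
    def squared(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return max(min(squared(vi, vj) for vj in line) for vi in element)
```

This brute-force form costs |e| × |L| comparisons; for large images a distance transform of the line pixels would give the same values more cheaply.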

Segmentation

The segmentation of the image is carried out through a candidate extraction model. Line edges are used to split the text with a segmentation algorithm in which the text is extracted linearly into lines and words. Segmentation using the conventional structure of vertical and horizontal projections is faster than the conventional method in which all the characters of the text are segmented by connected component processing only. In the experimental results, a line and word segmentation rate of 98% is observed. The proposed technique starts by segmenting the line edges and then the words from the binarized, de-skewed text image, using horizontal and vertical projection profiles separately. In the projection skeleton methods, the horizontal and vertical profiles are computed. To separate text lines, the horizontal projection profile of the text document image is found. The horizontal projection profile (HPP) is a histogram of the number of ON pixels along each row of the image. When the projection profiles are plotted, peaks and valleys can be seen in the plot, and the segments are split at the valleys. The white space between the text lines is used to segment the text lines.
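Line segmentation with the horizontal projection profile can be sketched as follows, assuming a binarized, de-skewed image in which ON pixels are 1. The `threshold` parameter is a hypothetical knob for treating nearly empty rows as white space.

```python
def horizontal_projection(binary):
    """Number of ON pixels in each row of the image."""
    return [sum(row) for row in binary]

def split_on_valleys(profile, threshold=0):
    """(start, end) index pairs of runs where profile > threshold, end exclusive."""
    segments, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i                      # entering a peak (text rows)
        elif v <= threshold and start is not None:
            segments.append((start, i))    # leaving a peak at a valley
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments
```

Each returned pair is the row range of one text line; the rows between pairs are the white-space valleys.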

Word Segmentation

The spacing between the words is used for word segmentation to reconstruct the text formation. In text script, the spacing between words is greater than the spacing between the characters within a word, so the word gaps are easily extracted. The spacing between the words is found by taking the Vertical Projection Profile (VPP) of an input text line.
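A minimal sketch of word segmentation on one text line, assuming the line image is binary and that a fixed gap width separates words from characters. The parameter `min_word_gap` is hypothetical; in practice it would be derived from the gap statistics of the line.

```python
def vertical_projection(line_img):
    """Number of ON pixels in each column of a text-line image."""
    return [sum(col) for col in zip(*line_img)]

def segment_words(line_img, min_word_gap=3):
    """Group ink columns into words.

    Gaps narrower than min_word_gap are treated as spacing between characters
    and stay inside the word. Returns (start, end) column pairs, end exclusive.
    """
    profile = vertical_projection(line_img)
    words, start, last_ink = [], None, None
    for x, v in enumerate(profile):
        if v > 0:
            if start is None:
                start = x
            elif x - last_ink - 1 >= min_word_gap:
                words.append((start, last_ink + 1))  # close the previous word
                start = x
            last_ink = x
    if start is not None:
        words.append((start, last_ink + 1))
    return words
```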

Removing False Location

In this step, false locations are removed according to general rules that distinguish text from non-text object regions. False locations are non-text regions or background regions; regions larger than any segmented line are discarded. This rule eliminates most of the square image regions that might be found incorrectly as text regions. The rule also results in the removal of singular letters and numbers. In case the joined texts at a particular position are considered as single characters, this ground position is replaced by the subsequent one.

Algorithm 4.2: Skeletonized Segmentation (false position PR, ground pixel labels L)
Input: Projection of image
Output: Sectionized image
For (image boundary)
    Step 1: Segment false point ← non-text false (PR)
        For each (PR)
            Loop: Calculate false location
        End For
    Step 2: Segment false pixel probabilities
        M ← number of unique projection image labels (L)
        n1, n2, ..., nM ← counts of unique labels (L)
        Get counts of each label on a ground truth image
    Step 3: Segment negative pixel probabilities
        M ← ∑ (
        Weighted loss calculation
        Get count (loss, 1/n1, 1/n2, ..., 1/nM): loss pixel with normalization factors
End
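The weighting in Steps 2 and 3 can be read as inverse-frequency class weights: each label k with count n_k receives a factor proportional to 1/n_k. The sketch below follows that reading. It is an interpretation, not the authors' code, and the normalization of the weights to sum to 1 is an assumption.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight proportional to 1/n_k for each label k, normalized to sum to 1."""
    counts = Counter(labels)
    raw = {k: 1.0 / n for k, n in counts.items()}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}

def weighted_error(predicted, truth):
    """Class-weighted fraction of mislabelled pixels.

    Weights come from the ground-truth label counts, so errors on a rare
    label (for example text) cost more than errors on a frequent one.
    """
    weights = inverse_frequency_weights(truth)
    loss = sum(weights[t] for p, t in zip(predicted, truth) if p != t)
    norm = sum(weights[t] for t in truth)
    return loss / norm
```

With 8 background pixels and 2 text pixels, labelling everything as background misses only 20% of the pixels but yields a weighted error of 0.5, because each class contributes equally after weighting.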

Square text regions that are found are recognized as single characters without false crossing regions if their height (width) has a determined size. This text orientation size is obviously input-dependent. For horizontally associated regions in layered text regions, the text block height lies between two threshold values. This rule is used to eliminate the vertical lines that might have been detected in the vertical projection of the wavelet transform. These are calculated by probabilities that carry the weight analysis of whether the masked regions are maintained by pixel position.

5. Result and Discussion

The results are implemented in MATLAB with an image processing simulation environment. The proposed morphological skeletonized algorithm is tested with numerous images with trained text regions. We considered only good quality preprocessed images in which there are no overlapping outlines, closely spaced lines of text, or broken characters. The proposed Skeletonized Morphological Factor (SMF) produced more efficient results than the other classifiers. We have evaluated the proposed algorithm against the different methodologies discussed earlier.

The resulting screenshots given below show text recognition by the skeletonized morphological process applied with skew factors.


Figure 2: Iteration of Morphological Factors

Figure 3: Initial Skew Process

Figure 2 shows the initial preprocessing state of the morphological factor with MSE, PSNR and RMSE evaluation. Figure 3 represents the iteration of the skeletonized morphological process state.
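The three measures named above have standard definitions. A short sketch for two grayscale images of equal size, stored as rows of numbers, with an assumed peak value of 255 for 8-bit images:

```python
import math

def mse(a, b):
    """Mean squared error between two images of the same size."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2
               for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n

def rmse(a, b):
    """Root mean squared error."""
    return math.sqrt(mse(a, b))

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(a, b)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)
```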

Figure 4: Mean Square Estimation

Figure 5: Number of Text Regions

Figure 4 shows the estimation of the feature level by mean square error with region growing levels. Figure 5 shows the text observed in the text region.

Graph 1: Text Segmentation Accuracy Achieved by Different Methods

Text detection in natural scene images is challenging because of complex backgrounds. There are many methods available for detecting and recognizing text in natural scene images. Graph 1 shows that the new SMF-based technique presented here achieves higher performance.

[Graph 1 plot: text segmentation in % (axis 0 to 150) versus time in min (2 to 10); series: Conditional random field, Discrete Wavelet Transform, MSER-DFE, SMF]

Graph 2: Graphical Representation of Performance Ratio

In Graph 2, the performance of SMF is compared with other methods based on accuracy. SMF has a higher performance rate than the other techniques.

Graph 3: Graphical Representation of Processing Time Performance

In Graph 3, the processing time performance is shown. In SMF with the neural network method, the classification is done in 4 sec and the accuracy increases further.

6. Conclusion

For future work, we are considering advanced preprocessing to resolve problems in distinguishing non-text in contemporary images. Moreover, we will investigate the use of automatic feature learning for text versus non-text discrimination. This paper has presented a skeletonization method for segmenting text and non-text in document images. The method is based on a set of powerful connected component features. These features use the size, shape, stroke width, and position information of connected components. A morphological model is trained on those features to label connected components. Our results show that the method is simple and fast, and that it is able to discriminate text from its context, including text that appears within graphics.
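The connected component features referred to above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated in this paper: the function name, the choice of 4-connectivity, the feature names, and the sample image are all hypothetical, and stroke width is omitted for brevity.

```python
from collections import deque


def connected_component_features(binary):
    """Label 4-connected foreground regions (pixels equal to 1) and
    return simple size, shape, and position features for each region."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    features = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill starting from an unvisited pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            area = 0
            min_y = max_y = y
            min_x = max_x = x
            while queue:
                cy, cx = queue.popleft()
                area += 1
                min_y, max_y = min(min_y, cy), max(max_y, cy)
                min_x, max_x = min(min_x, cx), max(max_x, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            box_h = max_y - min_y + 1
            box_w = max_x - min_x + 1
            features.append({
                "area": area,                          # size
                "bbox": (min_x, min_y, box_w, box_h),  # position and extent
                "aspect_ratio": box_w / box_h,         # shape
                "fill_ratio": area / (box_w * box_h),  # shape
            })
    return features


# Hypothetical binary image with two components.
image = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
]
components = connected_component_features(image)
for component in components:
    print(component)
```

Feature vectors of this kind could then be passed to a trained model that labels each component as text or non-text.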

[Graph 2 axes: y-axis, text segmentation in % (0 to 100); x-axis, various methods (SVM MRE Linear DWT MSER-DFE SMF).]

[Graph 3 axes: y-axis, time taken in m/s (0 to 20); x-axis, various iterations (30, 60, 90). Series: SVM, MRE, DWT, MSER-DFE, SMF.]


References

[1] Sun L., Huo Q., Jia W., Chen K., A robust approach for text detection from natural scene images, Pattern Recognit 48(9) (2015), 2906-2920.

[2] Khare V., Shivakumara P., Raveendran P., A new histogram oriented moments descriptor for multi-oriented moving text detection in video, Exp. Syst. Appl. 42(21) (2015), 7627-7640.

[3] Zhang Y., Lai J.H., Yuen P.C., Text string detection for loosely constructed characters with arbitrary orientations, Neuro computing 168 (2015), 970-978.

[4] Zhao S.H., Hu Z.P., A modular weighted sparse representation based on fisher discriminant and sparse residual for face recognition with occlusion, Inf. Process. Lett. 115(9) (2015), 677−683.

[5] Yin X.C., Yin X., Huang K., Hao H.W., Robust text detection in natural scene images, IEEE transactions on pattern analysis and machine intelligence 36(5) (2014), 970-983.

[6] Yi C., Tian Y., Scene text recognition in mobile applications by character descriptor and structure configuration, IEEE transactions on image processing 23(7) (2014), 2972-2982.

[7] Shi C.Z., Wang C.H., Xiao B.H., Gao S., Hu J.L., Scene text recognition using structure-guided character detection and linguistic knowledge, IEEE transactions on circuits and systems for video technology 24(7) (2014), 1235-1250.

[8] Phan T.Q., Shivakumara P., Bhowmick S., Li S., Tan C.L., Pal U., Semiautomatic Ground Truth Generation for Text Detection and Recognition in Video Images, IEEE transactions on circuits and systems for video technology 24(8) (2014),1277-1287.

[9] Weinman J.J., Butler Z., Knoll D., Field J., Toward integrated scene text reading, IEEE transactions on pattern analysis and machine intelligence 36(2) (2014), 375-387.

[10] Mosleh A., Bouguila N., Hamza A.B., Automatic in painting scheme for video text detection and removal, IEEE Transactions on image processing 22(11) (2013), 4460-4472.

[11] Shivakumara P., Phan T.Q., Lu S., Tan C.L., Gradient vector flow and grouping-based method for arbitrarily oriented scene text detection in video images, IEEE Transactions on Circuits and Systems for Video Technology 23(10) (2013), 1729-1739.

International Journal of Pure and Applied Mathematics Special Issue

739

Page 14: ijpam.eu An Implementation of Skeletonized Morphological Factor … · 2018-03-15 · An Implementation of Skeletonized Morphological Factor of Applying Histogram for Text Extraction

[12] Li Y., Jia W., Shen C., van den Hengel A., Characterness: An indicator of text in the wild, IEEE transactions on image processing 23(4) (2014), 1666-1677.

[13] Shiva kumara P., Yuan Z., Zhao D., Lu T., Tan C.L., New gradient-spatial-structural features for video script identification, Computer Vision and Image Understanding 130 (2015), 35–53.

[14] Wang R.M., Sang N., Gao C.X., Text detection approach based on confidence map and context information, Neurocomputing, 157 (2015), 153-165.

[15] Zhou G., Liu Y.H., Tian Z.Q., Scene text detection with superpixels and hierarchical model, Proc. 19th Int. Conf. Image Processing (2012), 1001-1004.

[16] Le H.P., Toan N.D., Park S., Lee G., Text localization in natural scene images by mean-shift clustering and parallel edge feature, Proc. 5th Int. Conf. Ubiquitous Information Management and Communication (2011).

[17] Lazzara G., Geraud T., Efficient multiscale Sauvola’s binarization, IJDAR 17(2) (2014), 105–123.

[18] Burie J.C., Chazalon J., Coustaty M., Eskenazi S., Luqman M.M., Maroua M., Nayef N., Ogier J.M., Prum S., Rusiñol M., Competition on Smartphone Document Capture and OCR (SmartDoc), International Conference on Document Analysis and Recognition (2015).

[19] Xu Y., Geraud T., Najman L., Connected filtering on tree-based ´ shape-spaces, IEEE Trans. On PAMI 38(6) (2016), 1126–1140.

[20] Busta M., Neumann L., Matas J., fastext: Efficient unconstrained ˇ scene text detector, Proc. of ICCV (2015), 1206–1214.
