
FEATURE EXTRACTION METHODS FOR CHARACTER RECOGNITION - A SURVEY

ØIVIND DUE TRIER,*† ANIL K. JAIN,‡ and TORFINN TAXT†

† Department of Informatics, University of Oslo, P.O. Box 1080 Blindern, N-0316 Oslo, Norway
‡ Department of Computer Science, Michigan State University, A714 Wells Hall, East Lansing, MI 48824-1027, USA

Revised July 19, 1995

Abstract: This paper presents an overview of feature extraction methods for off-line recognition of segmented (isolated) characters. Selection of a feature extraction method is probably the single most important factor in achieving high recognition performance in character recognition systems. Different feature extraction methods are designed for different representations of the characters, such as solid binary characters, character contours, skeletons (thinned characters), or gray level subimages of each individual character. The feature extraction methods are discussed in terms of invariance properties, reconstructability, and expected distortions and variability of the characters. The problem of choosing the appropriate feature extraction method for a given application is also discussed. When a few promising feature extraction methods have been identified, they need to be evaluated experimentally to find the best method for the given application.

Feature extraction, Optical character recognition, Character representation, Invariance, Reconstructability

1 Introduction

Optical character recognition (OCR) is one of the most successful applications of automatic pattern recognition. Since the mid 1950's, OCR has been a very active field for research and development [1]. Today, reasonably good OCR packages can be bought for as little as $100. However, these are only able to recognize high quality printed text documents or neatly written hand-printed text. The current research in OCR is now addressing documents that are not well handled by the available systems, including severely degraded, omnifont machine printed text, and (unconstrained) handwritten text. Also, efforts are being made to achieve lower substitution error rates and reject rates even on good quality machine printed text, since an experienced human typist still has a much lower error rate, albeit at a slower speed.

Selection of a feature extraction method is probably the single most important factor in achieving high recognition performance. Our own interest in character recognition is to recognize hand-printed digits in hydrographic maps (Fig. 1), but we have tried not to emphasize this particular application in the paper. Given the large number of feature extraction methods reported in the literature, a newcomer to the field is faced with the following question:

* Author to whom correspondence should be addressed. This work was done while Trier was visiting Michigan State University. The paper appeared in Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996.

Which feature extraction method is the best for a given application? This question led us to characterize the available feature extraction methods, so that the most promising methods could be sorted out. An experimental evaluation of these few promising methods must still be performed to select the best method for a specific application. In this process, one might find that a specific feature extraction method needs to be further developed.

A full performance evaluation of each method in terms of classification accuracy and speed is not within the scope of this review paper. In order to study performance issues, we would have to implement all the feature extraction methods, which is an enormous task. In addition, the performance also depends on the type of classifier used. Different feature types may need different types of classifiers. Also, the classification results reported in the literature are not comparable because they are based on different data sets.

Given the vast number of papers published on OCR every year, it is impossible to include all the available feature extraction methods in this survey. Instead, we have tried to make a representative selection to illustrate the different principles that can be used.

Two-dimensional object classification has several applications in addition to character recognition. These include airplane recognition [2], recognition of mechanical parts and tools [3], and tissue classification in medical imaging [4]. Several of the feature extraction techniques described in this paper for OCR have also been found to be useful in such applications.


Figure 1: A gray scale image of a part of a hand-printed hydrographic map.

An OCR system typically consists of the following processing steps (Fig. 2):

1. Gray level scanning at an appropriate resolution, typically 300-1000 dots per inch.
2. Preprocessing:
   (a) Binarization (two-level thresholding), using a global or a locally adaptive method;
   (b) Segmentation to isolate individual characters;
   (c) (Optional) conversion to another character representation (e.g., skeleton or contour curve).
3. Feature extraction.
4. Recognition using one or more classifiers.
5. Contextual verification or postprocessing.

Figure 2: Steps in a character recognition system: paper document, scanning, gray level image, preprocessing, single characters, feature extraction, feature vectors, classification, classified characters, postprocessing, classified text.

Survey papers [5-7], books [8-12], and evaluation studies [13-16] cover most of these subtasks, and several general surveys of OCR systems [1], [17-22] also exist. However, to our knowledge, no thorough, up to date survey of feature extraction methods for OCR is available.

Devijver and Kittler define feature extraction (page 12 in [11]) as the problem of "extracting from the raw data the information which is most relevant for classification purposes, in the sense of minimizing the within-class pattern variability while enhancing the between-class pattern variability." It should be clear that different feature extraction methods fulfill this requirement to a varying degree, depending on the specific recognition problem and available data. A feature extraction method that proves to be successful in one application domain may turn out not to be very useful in another domain.

One could argue that there is only a limited number of independent features that can be extracted from a character image, so that which set of features is used is not so important. However, the extracted features must be invariant to the expected distortions and variations that the characters may have in a specific application. Also, the phenomenon called the curse of dimensionality [9, 23] cautions us that with a limited training set, the number of features must be kept reasonably small if a statistical classifier is to be used. A rule of thumb is to use five to ten times as many training patterns of each class as the dimensionality of the feature vector [23]. In practice, the requirements of a good feature extraction method make selection of the best method for a given application a challenging task. One must also consider whether the characters to be recognized have known orientation and size, whether they are handwritten, machine printed or typed, and to what degree they are degraded. Also, more than one pattern class may be necessary to characterize characters that can be written in two or more distinct ways, as for example the two common written forms of '4' and of 'a'.

Feature extraction is an important step in achieving good performance of OCR systems. However, the other steps in the system (Fig. 2) also need to be optimized to obtain the best possible performance, and these steps are not independent. The choice of feature extraction method limits or dictates the nature and output of the preprocessing step (Table 1). Some feature extraction methods work on gray level subimages of single characters (Fig. 3), while others work on solid 4-connected or 8-connected symbols segmented from the binary raster image (Fig. 4), thinned symbols or skeletons (Fig. 5), or symbol contours (Fig. 6). Further, the type or format of the extracted features must match the requirements of the chosen classifier.


Table 1: Overview of feature extraction methods for the various representation forms (gray level, binary, vector).

    Gray scale subimage   | Binary: solid character | Binary: outer contour | Vector (skeleton)
    ----------------------|-------------------------|-----------------------|----------------------
    Template matching     | Template matching       |                       | Template matching
    Deformable templates  |                         |                       | Deformable templates
    Unitary transforms    | Unitary transforms      |                       | Graph description
                          | Projection histograms   | Contour profiles      | Discrete features
    Zoning                | Zoning                  | Zoning                | Zoning
    Geometric moments     | Geometric moments       | Spline curve          |
    Zernike moments       | Zernike moments         | Fourier descriptors   | Fourier descriptors

Figure 3: Gray scale subimages (approximately 30×30 pixels) of segmented characters. These digits were extracted from the top center portion of the map in Fig. 1. Note that for some of the digits, parts of other print objects are also present inside the character image.

Graph descriptions or grammar-based descriptions of the characters are well suited for structural or syntactic classifiers. Discrete features that may assume only, say, two or three distinct values are ideal for decision trees. Real-valued feature vectors are ideal for statistical classifiers. However, multiple classifiers may be used, either as a multi-stage classification scheme [24, 25], or as parallel classifiers, where a combination of the individual classification results is used to decide the final classification [20, 26, 27]. In that case, features of more than one type or format may be extracted from the input characters.

1.1 Invariants

In order to recognize many variations of the same character, features that are invariant to certain transformations on the character need to be used. Invariants are features which have approximately the same values for samples of the same character that are, for example, translated, scaled, rotated, stretched, skewed, or mirrored (Fig. 7). However, not all variations among characters from the same character class (e.g., noise or degradation, and absence or presence of serifs) can be modelled by using invariants.

Figure 4: Digits from the hydrographic map in the binary raster representation.

Figure 5: Skeletons of the digits in Fig. 4, thinned with the method of Zhang and Suen [28]. Note that junctions are displaced and a few short false branches occur.

Figure 6: Contours of two of the digits in Fig. 4.


Figure 7: Transformed versions of digit '5'. (a) original, (b) rotated, (c) scaled, (d) stretched, (e) skewed, (f) de-skewed, (g) mirrored.

Size- and translation-invariance is easily achieved. The segmentation of individual characters can itself provide estimates of size and location, but the feature extraction method may often provide more accurate estimates.

Rotation invariance is important if the characters to be recognized can occur in any orientation. However, if all the characters are expected to have the same rotation, then rotation-variant features should be used to distinguish between such characters as '6' and '9', and 'n' and 'u'. Another alternative is to use rotation-invariant features, augmented with the detected rotation angle. If the rotation angle is restricted, say, to lie between -45° and 45°, characters that are, say, 180° rotations of each other can be differentiated. The same principle may be used for size-invariant features, if one wants to recognize punctuation marks in addition to characters, and wants to distinguish between, say, '.', 'o', and 'O'; and ',' and '9'.

Skew-invariance may be useful for hand-printed text, where the characters may be more or less slanted, and multifont machine printed text, where some fonts are slanted and some are not. Invariance to mirror images is not desirable in character recognition, as the mirror image of a character may produce an illegitimate symbol or a different character.

For features extracted from gray scale subimages, invariance to contrast between print and background and to mean gray level may be needed, in addition to the other invariants mentioned above. Invariance to mean gray level is easily obtained by adding to each pixel the difference of the desired and the actual mean gray levels of the image [29].

If invariant features can not be found, an alternative is to normalize the input images to have standard size, rotation, contrast, and so on. However, one should keep in mind that this introduces new discretization errors.
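As an illustration of such normalization, the following minimal sketch (Python with NumPy; the function name and the fixed 32×32 output size are our own choices, not from the paper) crops a binary character to its bounding box and resamples it to a standard size, giving crude size and translation normalization:

    import numpy as np

    def normalize_size(char_img, out_shape=(32, 32)):
        """Crop a binary character image to its bounding box and resample it to a
        fixed size by nearest-neighbour indexing (size/translation normalization).
        Assumes the image contains at least one print pixel."""
        ys, xs = np.nonzero(char_img)                      # coordinates of print pixels
        crop = char_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        h, w = crop.shape
        rows = np.linspace(0, h - 1, out_shape[0]).round().astype(int)
        cols = np.linspace(0, w - 1, out_shape[1]).round().astype(int)
        return crop[np.ix_(rows, cols)]

Rotation and contrast normalization would require additional steps (e.g., deskewing along the principal axis and gray level rescaling), and, as noted above, each resampling introduces new discretization errors.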

1.2 Reconstructability

For some feature extraction methods, the characters can be reconstructed from the extracted features [30, 31]. This property ensures that complete information about the character shape is present in the extracted features. Although, for some methods, exact reconstruction may require an arbitrarily large number of features, reasonable approximations of the original character shape can usually be obtained by using only a small number of features with the highest information content. The hope is that these features also have high discrimination power.

By reconstructing the character images from the extracted features, one may visually check that a sufficient number of features is used to capture the essential structure of the characters. Reconstruction may also be used to informally check that the implementation is correct.

The rest of the paper is organized as follows. Sections 2-5 give a detailed review of feature extraction methods, grouped by the various representation forms of the characters. A short summary on neural network classifiers is given in Section 6. Section 7 gives guidelines for how one should choose the appropriate feature extraction method for a given application. Finally, a summary is given in Section 7.

2 Features Extracted From Gray Scale Images

A major challenge in gray scale image-based methods is to locate candidate character locations. One can use a locally adaptive binarization method to obtain a good binary raster image, and use connected components of the expected character size to locate the candidate characters. However, a gray scale-based method is typically used when recognition based on the binary raster representation fails, so the localization problem remains unsolved for difficult images. One may have to resort to the brute force approach of trying all possible locations in the image. However, one then has to assume a standard size for a character image, as the combination of all character sizes and locations is computationally prohibitive. This approach can not be used if the character size is expected to vary.

The desired result of the localization or segmentation step is a subimage containing one character, and, except for background pixels, no other objects. However, when print objects appear very close to each other in the input image, this goal can not always be achieved. Often, other characters or print objects may accidentally occur inside the subimage (Fig. 3), possibly distorting the extracted features. This is one of the reasons why every character recognition system has a reject option.


2.1 Template matching

We are not aware of OCR systems using template matching on gray scale character images. However, since template matching is a fairly standard image processing technique [32, 33], we have included this section for completeness.

In template matching the feature extraction step is left out altogether, and the character image itself is used as a "feature vector". In the recognition stage, a similarity (or dissimilarity) measure between each template T_j and the character image Z is computed. The template T_k which has the highest similarity measure is identified, and if this similarity is above a specified threshold, then the character is assigned the class label k. Else, the character remains unclassified. In the case of a dissimilarity measure, the template T_k having the lowest dissimilarity measure is identified, and if the dissimilarity is below a specified threshold, the character is given the class label k.

A common dissimilarity measure is the mean square distance D (Eq. 20.1-1 in Pratt [33]):

    D_j = \sum_{i=1}^{M} [ Z(x_i, y_i) - T_j(x_i, y_i) ]^2,                                (1)

where it is assumed that the template and the input character image are of the same size, and the sum is taken over the M pixels in the image.

Eqn. (1) can be rewritten as

    D_j = E_Z - 2 R_{ZT_j} + E_{T_j},                                                      (2)

where

    E_Z = \sum_{i=1}^{M} Z^2(x_i, y_i),                                                    (3)
    R_{ZT_j} = \sum_{i=1}^{M} Z(x_i, y_i) T_j(x_i, y_i),                                   (4)
    E_{T_j} = \sum_{i=1}^{M} T_j^2(x_i, y_i).                                              (5)

E_Z and E_{T_j} are the total character image energy and the total template energy, respectively. R_{ZT_j} is the cross-correlation between the character and the template, and could have been used as a similarity measure, but Pratt [33] points out that R_{ZT_j} may detect a false match if, say, Z contains mostly high values. In that case, E_Z also has a high value, and it could be used to normalize R_{ZT_j} by the expression \tilde{R}_{ZT_j} = R_{ZT_j} / E_Z. However, in Pratt's formulation of template matching, one wants to decide whether the template is present in the image (and get the locations of each occurrence). Our problem is the opposite: find the template that matches the character image best. Therefore, it is more relevant to normalize the cross-correlation by dividing it by the total template energy:

    \hat{R}_{ZT_j} = R_{ZT_j} / E_{T_j}.                                                   (6)
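As a concrete illustration of Eqs. (1)-(6), the sketch below (Python with NumPy; function and variable names are ours, not from the paper) computes both the mean square distance and the template-energy-normalized cross-correlation for a bank of templates:

    import numpy as np

    def template_match(char_img, templates):
        """char_img: 2-D array; templates: dict mapping class label -> 2-D array
        of the same size.  Returns, per class, the mean square distance D_j
        (Eq. 1) and the normalized cross-correlation R_ZTj / E_Tj (Eq. 6)."""
        scores = {}
        for label, tmpl in templates.items():
            d = np.sum((char_img - tmpl) ** 2)       # Eq. (1)
            r = np.sum(char_img * tmpl)              # Eq. (4)
            e_t = np.sum(tmpl ** 2)                  # Eq. (5)
            scores[label] = (d, r / e_t)
        return scores

The label with the smallest distance (or the largest normalized correlation, if it exceeds a chosen threshold) would then be assigned; otherwise the character is rejected.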

Experiments are needed to decide whether D_j or \hat{R}_{ZT_j} should be used for OCR.

Although simple, template matching suffers from some obvious limitations. One template is only capable of recognizing characters of the same size and rotation, is not illumination-invariant (invariant to contrast and to mean gray level), and is very vulnerable to noise and small variations that occur among characters from the same class. However, many templates may be used for each character class, but at the cost of higher computational time, since every input character has to be compared with every template. The character candidates in the input image can be scaled to suit the template sizes, thus making the recognizer scale-independent.

2.2 Deformable Templates

Deformable templates have been used extensively in several object recognition applications [34, 35]. Recently, Del Bimbo et al. [36] proposed to use deformable templates for character recognition in gray scale images of credit card slips with poor print quality. The templates used were character skeletons. It is not clear how the initial positions of the templates were chosen. If all possible positions in the image were to be tried, then the computational time would be prohibitive.

2.3 Unitary Image Transforms

In template matching, all the pixels in the gray scale character image are used as features. Andrews [37] applies a unitary transform to character images, obtaining a reduction in the number of features while preserving most of the information about the character shape. In the transformed space, the pixels are ordered by their variance, and the pixels with the highest variance are used as features. The unitary transform has to be applied to a training set to obtain estimates of the variances of the pixels in the transformed space. Andrews investigated the Karhunen-Loeve (KL), Fourier, Hadamard (or Walsh), and Haar transforms in 1971 [37]. He concluded that the KL transform was too computationally demanding, so he recommended using the Fourier or Hadamard transforms. However, the KL transform is the only (mean-squared error) optimal unitary transform in terms of information compression [38]. When the KL transform is used, the same amount of information about the input character image is contained in fewer features compared to any other unitary transform.

Other unitary transforms include the Cosine, Sine, and Slant transforms [38]. It has been shown that the Cosine transform is better in terms of information compression (e.g., see pp. 375-379 in [38]) than the other non-optimal unitary transforms. Its computational cost is comparable to that of the fast Fourier transform, so the Cosine transform has been called "the method of choice for image data compression" [38].
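To make the variance-ordering idea concrete, the following sketch (Python; it assumes NumPy and SciPy's scipy.fft.dctn, and the function names are ours) applies a 2-D Cosine transform to a stack of training character images and keeps the transform coefficients with the highest variance as features:

    import numpy as np
    from scipy.fft import dctn

    def dct_feature_mask(train_imgs, n_features):
        """train_imgs: array of shape (N, H, W).  Returns the flat indices of the
        n_features DCT coefficients with the highest variance over the training set."""
        coeffs = np.stack([dctn(img, norm='ortho') for img in train_imgs])
        variances = coeffs.reshape(len(train_imgs), -1).var(axis=0)
        return np.argsort(variances)[::-1][:n_features]

    def dct_features(img, mask):
        """Extract the selected DCT coefficients from a single character image."""
        return dctn(img, norm='ortho').ravel()[mask]

The same selection scheme applies to any of the unitary transforms mentioned above; with the KL transform the basis itself would also be estimated from the training set.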


Figure 8: Zoning of gray level character images. (a) A 4×4 grid superimposed on a character image. (b) The average gray levels in each zone, which are used as features.

The KL transform has been used for object recognition in several application domains, for example face recognition [39]. It is also a realistic alternative for OCR on gray level images with today's fast computers.

The features extracted from unitary transforms are not rotation-invariant, so the input character images have to be rotated to a standard orientation if rotated characters may occur. Further, the input images have to be of exactly the same size, so a scaling or re-sampling is necessary if the size can vary. The unitary transforms are not illumination invariant, but for the Fourier transformed image the value at the origin is proportional to the average pixel value of the input image, so this feature can be deleted to obtain brightness invariance. For all unitary transforms, an inverse transform exists, so the original character image can be reconstructed.

2.4 Zoning

The commercial OCR system by Calera described in Bokser [40] uses zoning on solid binary characters. A straightforward generalization of this method to gray level character images is given here. An n×m grid is superimposed on the character image (Fig. 8(a)), and for each of the n×m zones, the average gray level is computed (Fig. 8(b)), giving a feature vector of length n×m. However, these features are not illumination invariant.

2.5 Geometric Moment Invariants

Hu [41] introduced the use of moment invariants as features for pattern recognition. Hu's absolute orthogonal moment invariants (invariant to translation, scale and rotation) have been extensively used (see, e.g., [29, 42, 43, 44, 45]). Li [45] listed 52 Hu invariants, of orders 2 to 9, that are translation-, scale- and rotation-invariant. Belkasim et al. [43] listed 32 Hu invariants of orders 2 to 7. However, Belkasim et al. identified fewer invariants of orders 2 to 7 than Li.

Hu also developed moment invariants that were supposed to be invariant under general linear transformations:

    \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix},        (7)

where

    A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}.                                                                        (8)

Reiss [29] has recently shown that these Hu invariants are in fact incorrect, and provided corrected expressions for them.

Given a gray scale subimage Z containing a character candidate, the regular moments [29] of order (p + q) are defined as

    m_{pq} = \sum_{i=1}^{M} Z(x_i, y_i) \, x_i^p y_i^q,                                    (9)

where the sum is taken over all the M pixels in the subimage. The translation-invariant central moments [29] of order (p + q) are obtained by placing the origin at the center of gravity:

    \mu_{pq} = \sum_{i=1}^{M} Z(x_i, y_i) (x_i - \bar{x})^p (y_i - \bar{y})^q,             (10)

where

    \bar{x} = m_{10} / m_{00}, \quad \bar{y} = m_{01} / m_{00}.                            (11)

Hu [41] showed that*

    \eta_{pq} = \mu_{pq} / \mu^{1 + (p+q)/2}, \quad p + q \geq 2,                          (12)

are scale-invariant, where \mu = \mu_{00} = m_{00}. From the \eta_{pq}'s, rotation-invariant features can be constructed. For example, the second-order invariants are

    \phi_1 = \eta_{20} + \eta_{02},                                                        (13)
    \phi_2 = (\eta_{20} - \eta_{02})^2 + 4 \eta_{11}^2.                                    (14)

Invariants for general linear transformations are computed via relative invariants [41, 29]. Relative invariants satisfy

    I'_j = |A^T|^{w_j} |J|^{k_j} I_j,                                                      (15)

where I_j is a function of the moments in the original (x, y) space, I'_j is the same function computed from the moments in the transformed (x', y') space, w_j is called the weight of the relative invariant, |J| is the absolute value of the Jacobian J of the transposed transformation matrix A^T, and k_j is the order of I_j. Note that the translation vector b does not appear in Eq. (15), as the central moments are independent of translation.

* Note that Eq. (12) is written with a typographical error in Hu's paper [41].
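A minimal sketch of Eqs. (9)-(14) in Python with NumPy (the function name is ours) computes the central moments, the scale-normalized moments, and the two second-order rotation invariants for a gray scale character subimage:

    import numpy as np

    def hu_second_order(Z):
        """Z: 2-D gray scale character subimage.  Returns (phi1, phi2), Eqs. (13)-(14)."""
        ys, xs = np.mgrid[0:Z.shape[0], 0:Z.shape[1]].astype(float)
        m00 = Z.sum()
        xbar, ybar = (Z * xs).sum() / m00, (Z * ys).sum() / m00     # Eq. (11)

        def mu(p, q):                                               # Eq. (10)
            return (Z * (xs - xbar) ** p * (ys - ybar) ** q).sum()

        def eta(p, q):                                              # Eq. (12)
            return mu(p, q) / m00 ** (1 + (p + q) / 2.0)

        phi1 = eta(2, 0) + eta(0, 2)                                # Eq. (13)
        phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2    # Eq. (14)
        return phi1, phi2

Higher-order invariants follow the same pattern from the eta_pq's.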


To generate absolute invariants, that is, invariants \psi_j satisfying

    \psi'_j = \psi_j,                                                                      (16)

Reiss [29] used, for linear transformations,

    |A^T| = J \quad \text{and} \quad \mu' = |J| \mu, \quad \text{where } \mu = \mu_{00}:   (17)

    I'_j = |J|^{w_j + k_j} I_j    for w_j even,                                            (18)
    I'_j = J |J|^{w_j + k_j - 1} I_j    for w_j odd.                                       (19)

Then, it can be shown that

    \psi_j = I_j / \mu^{w_j + k_j}                                                         (20)

is an invariant if w_j is even, and |\psi_j| is an invariant if w_j is odd.

For general linear transformations, Hu [41] and Reiss [29, 42] gave the following relative invariants that are functions of the second- and third-order central moments:

    I_1 = \mu_{20} \mu_{02} - \mu_{11}^2,                                                  (21)

    I_2 = (\mu_{30} \mu_{03} - \mu_{21} \mu_{12})^2 - 4 (\mu_{30} \mu_{12} - \mu_{21}^2)(\mu_{21} \mu_{03} - \mu_{12}^2),                                                                          (22)

    I_3 = \mu_{20} (\mu_{21} \mu_{03} - \mu_{12}^2) - \mu_{11} (\mu_{30} \mu_{03} - \mu_{21} \mu_{12}) + \mu_{02} (\mu_{30} \mu_{12} - \mu_{21}^2),                                                (23)

    I_4 = \mu_{30}^2 \mu_{02}^3 - 6 \mu_{30} \mu_{21} \mu_{11} \mu_{02}^2 + 6 \mu_{30} \mu_{12} \mu_{02} (2 \mu_{11}^2 - \mu_{20} \mu_{02}) + \mu_{30} \mu_{03} (6 \mu_{20} \mu_{11} \mu_{02} - 8 \mu_{11}^3) + 9 \mu_{21}^2 \mu_{20} \mu_{02}^2 - 18 \mu_{21} \mu_{12} \mu_{20} \mu_{11} \mu_{02} + 6 \mu_{21} \mu_{03} \mu_{20} (2 \mu_{11}^2 - \mu_{20} \mu_{02}) + 9 \mu_{12}^2 \mu_{20}^2 \mu_{02} - 6 \mu_{12} \mu_{03} \mu_{11} \mu_{20}^2 + \mu_{03}^2 \mu_{20}^3.                                 (24)

Reiss found the weights w_j and orders k_j to be

    w_1 = 2, \; w_2 = 6, \; w_3 = 4, \; w_4 = 6,                                           (25)
    k_1 = 2, \; k_2 = 4, \; k_3 = 3, \; k_4 = 5.                                           (26)

Then the following features are invariant under translation and general linear transformations (given by Reiss [29] in 1991 and rediscovered by Flusser and Suk [46, 47] in 1993):

    \psi_1 = I_1 / \mu^4, \quad \psi_2 = I_2 / \mu^{10},                                   (27)
    \psi_3 = I_3 / \mu^7, \quad \psi_4 = I_4 / \mu^{11}.                                   (28)

Hu [41] implicitly used k \equiv 1, obtaining incorrect invariants.

Bamieh and de Figueiredo [48] have suggested the following two relative invariants in addition to I_1 and I_2:†

    J_3 = \mu_{40} \mu_{04} - 4 \mu_{31} \mu_{13} + 3 \mu_{22}^2,                          (29)

    J_4 = \mu_{40} \mu_{22} \mu_{04} + 2 \mu_{31} \mu_{22} \mu_{13} - \mu_{40} \mu_{13}^2 - \mu_{04} \mu_{31}^2 - \mu_{22}^3.                                                                      (30)

As above, these relative invariants must be divided by \mu^{w+k} to obtain absolute invariants. Regretfully, Bamieh and de Figueiredo divided J_i by \mu^w (implicitly using k \equiv 0), so their invariants are incorrect too.

Reiss [29, 42] also gave features that are invariant under changes in contrast, in addition to being invariant under translation and general linear transformations (including scale change, rotation and skew). The first three features are

    \omega_1 = I_4 / (\mu I_2), \quad \omega_2 = I_1^2 / (\mu I_3), \quad \omega_3 = I_1 I_3 / I_4.    (31)

Experiments with other feature extraction methods indicate that at least 10-15 features are needed for a successful OCR system. More moment invariants (\psi_j's and \omega_j's) based on higher order moments are given by Reiss [42].

2.6 Zernike Moments

Zernike moments have been used by several authors for character recognition of binary solid symbols [49, 31, 43]. However, initial experiments suggest that they are well suited for gray-scale character subimages as well. Both rotation-variant and rotation-invariant features can be extracted. Features invariant to illumination need to be developed for these features to be really useful for gray level character images.

Khotanzad and Hong [49, 31] use the amplitudes of the Zernike moments as features. A set of complex orthogonal polynomials V_{nm}(x, y) is used (Eqs. 32-33).‡ The Zernike moments are projections of the input image onto the space spanned by the orthogonal V-functions.

    V_{nm}(x, y) = R_{nm}(x, y) \, e^{j m \tan^{-1}(y/x)},                                 (32)

where j = \sqrt{-1}, n \geq 0, |m| \leq n, n - |m| is even, and

    R_{nm}(x, y) = \sum_{s=0}^{(n-|m|)/2} (-1)^s \frac{(x^2 + y^2)^{n/2 - s} (n - s)!}{s! \left( \frac{n+|m|}{2} - s \right)! \left( \frac{n-|m|}{2} - s \right)!}.                                 (33)

For a digital image, the Zernike moment of order n and repetition m is given by

    A_{nm} = \frac{n+1}{\pi} \sum_x \sum_y f(x, y) [V_{nm}(x, y)]^*,                        (34)

where x^2 + y^2 \leq 1 and the symbol * denotes the complex conjugate operator.

† An incorrect version of I_2 is given in Bamieh and de Figueiredo's paper [48].
‡ There is an error in equation (33) as printed in [49]: there, the summation is taken from s = 0 to n - |m|/2; however, it must be taken from s = 0 to (n - |m|)/2 to avoid ((n - |m|)/2 - s) becoming negative.
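As an illustration of Eqs. (32)-(34), the following sketch (Python with NumPy; function names and the coordinate mapping are ours) computes the Zernike moment A_nm of a square character image mapped onto the unit disc:

    import numpy as np
    from math import factorial

    def radial_poly(rho, n, m):
        """R_nm(rho) from Eq. (33), with rho^2 = x^2 + y^2."""
        m = abs(m)
        R = np.zeros_like(rho)
        for s in range((n - m) // 2 + 1):
            c = ((-1) ** s * factorial(n - s)
                 / (factorial(s) * factorial((n + m) // 2 - s) * factorial((n - m) // 2 - s)))
            R += c * rho ** (n - 2 * s)
        return R

    def zernike_moment(f, n, m):
        """A_nm of Eq. (34) for a square gray level image f, with pixel
        coordinates mapped into the unit circle."""
        N = f.shape[0]
        y, x = np.mgrid[0:N, 0:N]
        x = (2.0 * x - N + 1) / N                   # map to roughly [-1, 1]
        y = (2.0 * y - N + 1) / N
        rho, theta = np.hypot(x, y), np.arctan2(y, x)
        inside = rho <= 1.0                         # only the unit disc contributes
        V = radial_poly(rho, n, m) * np.exp(1j * m * theta)
        return (n + 1) / np.pi * np.sum(f[inside] * np.conj(V[inside]))

The magnitudes |A_nm| can then serve as rotation-invariant features, as discussed next.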


Figure 9: Images derived from Zernike moments. Rows 1-2: Input image of digit '4', and contributions from the Zernike moments of order 1-13. The images are histogram equalized to highlight the details. Rows 3-4: Input image of digit '4', and images reconstructed from the Zernike moments of order up to 1-13, respectively.

Figure 10: Images derived from Zernike moments. Rows 1-2: Input image of digit '5', and contributions from the Zernike moments of order 1-13. The images are histogram equalized to highlight the details. Rows 3-4: Input image of digit '5', and images reconstructed from the Zernike moments of order up to 1-13, respectively.


Note that the image coordinates must be mapped to the range of the unit circle, x^2 + y^2 \leq 1. The part of the original image inside the unit circle can be reconstructed with an arbitrary precision using

    f(x, y) = \lim_{N \to \infty} \sum_{n=0}^{N} \sum_m A_{nm} V_{nm}(x, y),               (35)

where the second sum is taken over all |m| \leq n such that n - |m| is even.

The magnitudes |A_{nm}| are rotation invariant. To show the contribution of the Zernike moments of order n, we have computed

    |I_n(x, y)| = \left| \sum_m A_{nm} V_{nm}(x, y) \right|,                               (36)

where x^2 + y^2 \leq 1, |m| \leq n, and n - |m| is even.

The images |I_n(x, y)|, n = 1, ..., 13, for the characters '4' and '5' (Figs. 9 and 10) indicate that the extracted features are very different for the third and higher order moments. Orders one and two seem to represent orientation, width, and height. However, reconstructions of the same digits (Figs. 9 and 10) using Eq. (35), N = 1, ..., 13, indicate that moments of orders up to 8-11 are needed to achieve a reasonable appearance.

Translation- and scale-invariance can be obtained by shifting and scaling the image prior to the computation of the Zernike moments [31]. The first-order regular moments can be used to find the image center, and the zeroth order central moment gives a size estimate.

Belkasim et al. [43, 44] use the following additional features:

    B_{n,n+1} = |A_{n-2,1}| |A_{n1}| \cos(\phi_{n-2,1} - \phi_{n1}),                       (37)
    B_{n,n+L} = |A_{n1}| |A_{nL}|^p \cos(p \phi_{nL} - \phi_{n1}),                         (38)

where L = 3, 5, ..., n, p = 1/L, and \phi_{nm} is the phase angle of A_{nm}, so that

    A_{nm} = |A_{nm}| \cos \phi_{nm} + j |A_{nm}| \sin \phi_{nm}.                          (39)

3 Features Extracted From Binary Images

A binary raster image is obtained by a global or locally adaptive binarization [13] of the gray scale input image. In many cases, the segmentation of characters is done simply by isolating connected components. However, for difficult images, some characters may touch or overlap each other or other print objects. Another problem occurs when characters are fragmented into two or more connected components. This problem may be alleviated somewhat by choosing a better locally adaptive binarization method, but Trier and Taxt [13] have shown that even the best locally adaptive binarization method may still not result in perfectly isolated characters.

Methods for segmenting touching characters are given by Westall and Narasimha [50], Fujisawa et al. [51], and in surveys [5, 6]. However, these methods assume that the characters appear in the same text string and have known orientation. In hydrographic maps (Fig. 1), for example, some characters touch or overlap lines, or touch characters from another text line. Trier et al. [52] have developed a method based on gray scale topographic analysis [53, 54], which integrates binarization and segmentation. This method gives a better performance, since information gained in the topographic analysis step is used in segmenting the binary image. The segmentation step also handles rotated characters and touching characters from different text strings.

The binary raster representation of a character is a simplification of the gray scale representation. The image function Z(x, y) now takes on two values (say, 0 and 1) instead of the, say, 256 gray level values. This means that all the methods developed for the gray scale representation are applicable to the solid binary raster representation as well. Therefore, we will not repeat the full description of each method, but only point out the simplification in the computations involved for each feature extraction method. Generally, invariance to illumination is no longer relevant, but the other invariances are.

A solid binary character may be converted to other representations, such as the outer contour of the character, the contour profiles, or the character skeleton, and features may be extracted from each of these representations as well. For the purpose of designing OCR systems, the goal of these conversions is to preserve the relevant information about the character shape, and discard some of the unnecessary information.

Here, we only present the modifications of the methods previously described for the gray scale representation. No changes are needed for unitary image transforms and Zernike moments, except that gray level invariance is irrelevant.

3.1 Template Matching

In binary template matching, several similarity measures other than mean square distance and correlation have been suggested [55]. To detect matches, let n_{ij} be the number of pixel positions where the template pixel x is i and the image pixel y is j, with i, j \in \{0, 1\}:

    n_{ij} = \sum_{m=1}^{n} \delta_m(i, j),                                                (40)

where

    \delta_m(i, j) = 1 if (x_m = i) and (y_m = j), and 0 otherwise,                        (41)

i, j \in \{0, 1\}, and y_m and x_m are the m-th pixels of the binary images Y and X which are being compared. Tubbs evaluated eight distances, and found the Jaccard distance d_J and the Yule distance d_Y to be the best:


    d_J = \frac{n_{11}}{n_{11} + n_{10} + n_{01}},                                         (42)

    d_Y = \frac{n_{11} n_{00} - n_{10} n_{01}}{n_{11} n_{00} + n_{10} n_{01}}.             (43)

However, the lack of robustness to shape variations mentioned in Section 2 for the gray scale case still applies. Tubbs [55] tries to overcome some of these shortcomings by introducing weights for the different pixel positions m. Eq. (40) is replaced by

    n_{ij} = \sum_{m=1}^{n} p_m(k|i) \, \delta_m(i, j),                                    (44)

where p_m(k|i) is the probability that the input image Y matches template X_k, given that pixel number m in the template X_k is i. p_m(k|i) is approximated as the number of templates (including template X_k) having the same pixel value at location m as template X_k, divided by the total number of templates.

However, we suspect that the extra flexibility obtained by introducing p(k|i) is not enough to cope with the variability in character shapes that may occur in hand-printed characters and multi-font machine printed characters. A more promising approach is taken by Gader et al. [24], who use a set of templates for each character class, and a procedure for selecting templates based on a training set.

3.2 Unitary Image Transforms

The NIST form-based hand-print recognition system [56] uses the Karhunen-Loeve transform to extract features from the binary raster representation. Its performance is claimed to be good, and this OCR system is available in the public domain.

3.3 Projection histograms

Projection histograms were introduced in 1956 in a hardware OCR system by Glauberman [57]. Today, this technique is mostly used for segmenting characters, words, and text lines, or to detect if an input image of a scanned text page is rotated [58]. For a horizontal projection, y(x_i) is the number of pixels with x = x_i (Fig. 11). The features can be made scale independent by using a fixed number of bins on each axis (by merging neighboring bins) and dividing by the total number of print pixels in the character image. However, the projection histograms are very sensitive to rotation, and to some degree to variability in writing style. Also, important information about the character shape seems to be lost.

The vertical projection x(y) is slant invariant, but the horizontal projection is not. When measuring the dissimilarity between two histograms, it is tempting to use

    d = \sum_{i=1}^{n} | y_1(x_i) - y_2(x_i) |,                                            (45)

where n is the number of bins, and y_1 and y_2 are the two histograms to be compared.
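A minimal sketch of the projections and the bin-wise distance of Eq. (45) (Python with NumPy; function names are ours):

    import numpy as np

    def projection_histograms(char_img):
        """char_img: binary 2-D array (1 = print).  Returns the horizontal
        projection y(x) (print-pixel counts per column) and the vertical
        projection x(y) (print-pixel counts per row)."""
        return char_img.sum(axis=0), char_img.sum(axis=1)

    def histogram_distance(h1, h2):
        """Bin-wise absolute difference of two projection histograms, Eq. (45)."""
        return np.abs(np.asarray(h1) - np.asarray(h2)).sum()

The same functions can be reused on merged bins to obtain the scale-independent variant described above.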

Figure 11: Horizontal and vertical projection histograms.

However, it is more meaningful to compare the cumulative histograms Y(x_k), the sum of the first k bins,

    Y(x_k) = \sum_{i=1}^{k} y(x_i),                                                        (46)

using the dissimilarity measure

    D = \sum_{i=1}^{n} | Y_1(x_i) - Y_2(x_i) |,                                            (47)

where Y_1 and Y_2 denote cumulative histograms. The new dissimilarity measure D is not as sensitive as d to a slight misalignment of dominant peaks in the original histograms.

3.4 Zoning

Bokser [40] describes the commercial OCR system Calera, which uses zoning on binary characters. The system was designed to recognize machine printed characters of almost any non-decorative font, possibly severely degraded by, for example, several generations of photocopying. Both contour extraction and thinning proved to be unreliable for self-touching characters (Fig. 12). The zoning method was used to compute the percentage of black pixels in each zone. Additional features were needed to improve the performance, but the details were not presented by Bokser [40]. Unfortunately, not much explicit information is available about the commercial systems.

3.5 Geometric Moment Invariants

A binary image can be considered a special case of a gray level image with Z(x_i, y_i) = 1 for print pixels and Z(x_i, y_i) = 0 for background pixels.


Figure 12: Two of the characters in Bokser's study [40] that are easily confused when thinned (e.g., with Zhang and Suen's method [28]). (a) 'S'. (b) '8'. (c) Thinned 'S'. (d) Thinned '8'.

By summing over the N print pixels only, Eqs. (9)-(10) can be rewritten as

    m_{pq} = \sum_{i=1}^{N} x_i^p y_i^q,                                                   (48)

    \mu_{pq} = \sum_{i=1}^{N} (x_i - \bar{x})^p (y_i - \bar{y})^q,                         (49)

where

    \bar{x} = m_{10} / m_{00}, \quad \bar{y} = m_{01} / m_{00}.                            (50)

Then, Eqs. (12)-(24) can be used as before. However, the contrast invariants (Eq. 31) are of no interest in the binary case.

For characters that are not too elongated, a fast algorithm for computing the moments based on the character contour exists [59], giving the same values as Eq. (49).

3.6 Evaluation Studies

Belkasim et al. [43, 44] compared several moment invariants applied to solid binary characters, including regular, Hu, Bamieh, Zernike, Teague-Zernike, and pseudo-Zernike moment invariants, using a k nearest neighbor (kNN) classifier. They concluded that normalized Zernike moment invariants [43, 44] gave the best performance for character recognition in terms of recognition accuracy. The normalization compensates for the variances of the features, and since the kNN classifier uses the Euclidean distance to measure the dissimilarity of the input feature vectors and the training samples, this will improve the performance. However, by using a statistical classifier which explicitly accounts for the variances, for example a quadratic Bayesian classifier using the Mahalanobis distance, no such normalization is needed.
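To make the last point concrete, the following sketch (Python with NumPy; names are ours) contrasts the Euclidean distance used by a kNN classifier with a class-wise Mahalanobis distance, which builds the variance compensation into the classifier itself:

    import numpy as np

    def euclidean_dist(x, y):
        return np.linalg.norm(x - y)

    def mahalanobis_dist(x, class_mean, class_cov):
        """Squared Mahalanobis distance of feature vector x to a class with the given
        mean vector and covariance matrix (as used in a quadratic Bayesian classifier)."""
        diff = x - class_mean
        return diff @ np.linalg.inv(class_cov) @ diff

With the Mahalanobis distance, rescaling individual moment features has no effect on the classification, so an explicit normalization step becomes unnecessary.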

Figure 13: Digit '5' with left profile x_L(y) and right profile x_R(y). For each y value, the left (right) profile value is the leftmost (rightmost) x value on the character contour.

4 Features Extracted From the Binary Contour

The closed outer contour curve of a character is a closed piecewise linear curve that passes through the centers of all the pixels which are 4-connected to the outside background, and no other pixels. Following the curve, the pixels are visited in, say, counter-clockwise order, and the curve may visit an edge pixel twice at locations where the object is one pixel wide. Each line segment is a straight line between the pixel centers of two 8-connected neighbors.

By approximating the contour curve by a parametric expression, the coefficients of the approximation can be used as features. By following the closed contour successively, a periodic function results. Periodic functions are well suited for Fourier series expansion, and this is the foundation for the Fourier-based methods discussed below.

4.1 Contour Profiles

The motivation for using contour profiles is that each half of the contour (Fig. 13) can be approximated by a discrete function of one of the spatial variables, x or y. Then, features can be extracted from discrete functions. We may use vertical or horizontal profiles, and they can be either outer profiles or inner profiles.

To construct vertical profiles, first locate the uppermost and lowermost pixels on the contour. The contour is split at these two points. To get the outer profiles, for each y value, select the outermost x value on each contour half (Fig. 13). To get the inner profiles, for each y value, the innermost x values are selected. Horizontal profiles can be extracted in a similar fashion, starting by dividing the contour into upper and lower halves.

The profiles are themselves dependent on rotation (e.g., try to rotate the '5' in Fig. 13 by, say, 45° before computing the profiles). Therefore, all features derived from the profiles will also be dependent on rotation.
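A simple sketch of the outer vertical profiles (Python with NumPy; the function name is ours), taking the leftmost and rightmost print pixel in each row of a binary character image as an approximation of the contour-based definition above:

    import numpy as np

    def outer_vertical_profiles(char_img):
        """char_img: binary 2-D array (1 = print).  Returns (x_L(y), x_R(y)),
        the leftmost and rightmost print-pixel column for every row that
        contains print pixels."""
        rows = np.where(char_img.any(axis=1))[0]
        x_left = np.array([np.flatnonzero(char_img[r]).min() for r in rows])
        x_right = np.array([np.flatnonzero(char_img[r]).max() for r in rows])
        return x_left, x_right

Inner profiles would instead take the innermost x value on each contour half, which requires splitting the traced contour at its uppermost and lowermost points.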


Figure 14: Zoning of the contour curve. (a) A 4×4 grid superimposed on a character; (b) close-up of the upper right corner zone; (c) histogram of orientations for this zone (0°: 9, 45°: 1, 90°: 2, 135°: 4).

Figure 15: Slice zones used by Takahashi [60].

Kimura and Shridhar [27] extracted features from the outer vertical profiles only (Fig. 13). The profiles themselves can be used as features, as well as the first differences of the profiles (e.g., x'_L(y) = x_L(y+1) - x_L(y)); the width w(y) = x_R(y) - x_L(y); the ratio of the vertical height of the character, n, to the maximum of the width function, max_y w(y); the locations of maxima and minima in the profiles; and the locations of peaks in the first differences (which indicate discontinuities).
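Continuing the hypothetical outer_vertical_profiles sketch above, the profile-derived features listed here are one-liners (again Python/NumPy, names ours):

    import numpy as np

    def profile_features(x_left, x_right):
        """Features derived from the outer vertical profiles of a character."""
        width = x_right - x_left                       # w(y) = x_R(y) - x_L(y)
        d_left = np.diff(x_left)                       # first differences x'_L(y)
        d_right = np.diff(x_right)
        height_to_width = len(x_left) / width.max()    # ratio n / max_y w(y)
        return width, d_left, d_right, height_to_width

Locations of extrema and of peaks in the first differences can be read off with np.argmax / np.argmin or a simple peak-picking pass.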

Figure 16: Zoning with fuzzy borders. Pixel P1 has a membership value of 0.25 in each of the four zones A, B, D, and E. P2 has a 0.75 membership of E and a 0.25 membership of F.

4.2 Zoning

Kimura and Shridhar [27] used zoning on contour curves. In each zone, the contour line segments between neighboring pixels were grouped by orientation: horizontal (0°), vertical (90°), and the two diagonal orientations (45°, 135°). The number of line segments of each orientation was counted (Fig. 14).

Takahashi [60] also used orientation histograms from zones, but used vertical, horizontal, and diagonal slices as zones (Fig. 15). The orientations were extracted from inner contours (if any) in addition to the outer contour when making the histograms.

Further, Takahashi identified high curvature points along both outer and inner contours. For each of these points, the curvature value, the contour tangent, and the point's zonal position were extracted. This time a regular grid was used as zones.

Cao et al. [25] observed that when the contour curve was close to zone borders, small variations in the contour curve could lead to large variations in the extracted features. They tried to compensate for this by using fuzzy borders. Points near the zone borders are given fuzzy membership values to two or four zones, and the fuzzy membership values sum to one (Fig. 16).

4.3 Spline curve approximation

Sekita et al. [61] identify high-curvature points, called breakpoints, on the outer character contour, and approximate the curve between two breakpoints with a spline function. Then, both the breakpoints and the spline curve parameters are used as features.

Taxt et al. [62] approximated the outer contour curve with a spline curve, which was then smoothed. The smoothed spline curve was divided into M parts of equal curve length. For each part, the average curvature was computed. In addition, the distances from the arithmetic mean of the contour curve points to N equally spaced points on the contour were measured. By scaling the character's spline curve approximation to a standard size before the features are measured, the features become size invariant. The features are already translation invariant by nature. However, the features are dependent on rotation.

4.4 Elliptic Fourier descriptors

In Kuhl and Giardina's approach [30], the closed contour (x(t), y(t)), t = 1, ..., m, is approximated as

    \hat{x}(t) = A_0 + \sum_{n=1}^{N} \left[ a_n \cos \frac{2 n \pi t}{T} + b_n \sin \frac{2 n \pi t}{T} \right],    (51)

    \hat{y}(t) = C_0 + \sum_{n=1}^{N} \left[ c_n \cos \frac{2 n \pi t}{T} + d_n \sin \frac{2 n \pi t}{T} \right],    (52)


where T is the total contour length, and \hat{x}(t) \to x(t) and \hat{y}(t) \to y(t) in the limit when N \to \infty. The coefficients are

    A_0 = \frac{1}{T} \int_0^T x(t) \, dt,                                                 (53)
    C_0 = \frac{1}{T} \int_0^T y(t) \, dt,                                                 (54)
    a_n = \frac{2}{T} \int_0^T x(t) \cos \frac{2 n \pi t}{T} \, dt,                        (55)
    b_n = \frac{2}{T} \int_0^T x(t) \sin \frac{2 n \pi t}{T} \, dt,                        (56)
    c_n = \frac{2}{T} \int_0^T y(t) \cos \frac{2 n \pi t}{T} \, dt,                        (57)
    d_n = \frac{2}{T} \int_0^T y(t) \sin \frac{2 n \pi t}{T} \, dt.                        (58)

The functions x(t) and y(t) are piecewise linear, and the coefficients can, therefore, be obtained by summation instead of integration. It can be shown [30] that the coefficients a_n, b_n, c_n and d_n, which are the extracted features, can be expressed as

    a_n = \frac{T}{2 n^2 \pi^2} \sum_{i=1}^{m} \frac{\Delta x_i}{\Delta t_i} [ \cos \phi_i - \cos \phi_{i-1} ],      (59)
    b_n = \frac{T}{2 n^2 \pi^2} \sum_{i=1}^{m} \frac{\Delta x_i}{\Delta t_i} [ \sin \phi_i - \sin \phi_{i-1} ],      (60)
    c_n = \frac{T}{2 n^2 \pi^2} \sum_{i=1}^{m} \frac{\Delta y_i}{\Delta t_i} [ \cos \phi_i - \cos \phi_{i-1} ],      (61)
    d_n = \frac{T}{2 n^2 \pi^2} \sum_{i=1}^{m} \frac{\Delta y_i}{\Delta t_i} [ \sin \phi_i - \sin \phi_{i-1} ],      (62)

where \phi_i = 2 n \pi t_i / T,

    \Delta x_i = x_i - x_{i-1}, \quad \Delta y_i = y_i - y_{i-1},                          (63)
    \Delta t_i = \sqrt{ \Delta x_i^2 + \Delta y_i^2 }, \quad t_i = \sum_{j=1}^{i} \Delta t_j,    (64)
    T = t_m = \sum_{j=1}^{m} \Delta t_j,                                                   (65)

and m is the number of pixels along the boundary. The starting point (x_1, y_1) can be arbitrarily chosen, and it is clear from Eqs. (55)-(56) that the coefficients are dependent on this choice. To obtain features that are independent of the particular starting point, we calculate the phase shift from the first major axis as

    \theta_1 = \frac{1}{2} \tan^{-1} \frac{ 2 (a_1 b_1 + c_1 d_1) }{ a_1^2 - b_1^2 + c_1^2 - d_1^2 }.    (66)

Then, the coefficients can be rotated to achieve a zero phase shift:

    \begin{pmatrix} a_n^* & b_n^* \\ c_n^* & d_n^* \end{pmatrix} = \begin{pmatrix} a_n & b_n \\ c_n & d_n \end{pmatrix} \begin{pmatrix} \cos n\theta_1 & -\sin n\theta_1 \\ \sin n\theta_1 & \cos n\theta_1 \end{pmatrix}.    (67)
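A compact sketch of Eqs. (59)-(65) (Python with NumPy; the function name is ours) computes the elliptic Fourier coefficients from a closed contour given as a list of pixel centers; it assumes consecutive contour points are distinct so that no segment length is zero:

    import numpy as np

    def elliptic_fourier_coeffs(contour, order):
        """contour: (m, 2) array of (x, y) points along a closed boundary.
        Returns an (order, 4) array of [a_n, b_n, c_n, d_n]."""
        dxy = np.diff(np.vstack([contour, contour[:1]]), axis=0)   # wrap around to close the curve
        dt = np.hypot(dxy[:, 0], dxy[:, 1])                        # Eq. (64)
        t = np.concatenate([[0.0], np.cumsum(dt)])
        T = t[-1]                                                   # Eq. (65)
        coeffs = np.zeros((order, 4))
        for n in range(1, order + 1):
            phi = 2.0 * np.pi * n * t / T
            dcos = np.cos(phi[1:]) - np.cos(phi[:-1])
            dsin = np.sin(phi[1:]) - np.sin(phi[:-1])
            c = T / (2.0 * n ** 2 * np.pi ** 2)
            coeffs[n - 1] = [c * np.sum(dxy[:, 0] / dt * dcos),     # a_n, Eq. (59)
                             c * np.sum(dxy[:, 0] / dt * dsin),     # b_n, Eq. (60)
                             c * np.sum(dxy[:, 1] / dt * dcos),     # c_n, Eq. (61)
                             c * np.sum(dxy[:, 1] / dt * dsin)]     # d_n, Eq. (62)
        return coeffs

Third-party implementations of the same formulas exist (e.g., the pyefd package), typically including the normalization steps of Eqs. (66)-(70).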

To obtain rotation invariant descriptors, the rotation \psi_1 of the semi-major axis (Fig. 17(a)) can be found from

    \psi_1 = \tan^{-1} ( c_1^* / a_1^* ),                                                  (68)

and the descriptors can then be rotated by -\psi_1 (Fig. 17(b)), so that the semi-major axis is parallel with the x-axis:

    \begin{pmatrix} a_n^{**} & b_n^{**} \\ c_n^{**} & d_n^{**} \end{pmatrix} = \begin{pmatrix} \cos \psi_1 & \sin \psi_1 \\ -\sin \psi_1 & \cos \psi_1 \end{pmatrix} \begin{pmatrix} a_n^* & b_n^* \\ c_n^* & d_n^* \end{pmatrix}.    (69)

This rotation gives b_1^{**} = c_1^{**} = 0.0 (Fig. 17(b)), so these coefficients should not be used as features. Further, both these rotations are ambiguous, as \theta and \theta + \pi give the same axes, and so do \psi and \psi + \pi.

To obtain size-invariant features, the coefficients can be divided by the magnitude, E, of the semi-major axis, given by

    E = \sqrt{ a_1^{*2} + c_1^{*2} } = a_1^{**}.                                           (70)

Then a_1^{**} should not be used as a feature either. In any case, the low-order coefficients contain the most information about the character shape, and should always be used.

In Figs. 18-19, the characters '4' and '5' of Fig. 6 have been reconstructed using the coefficients of order up to n, for different values of n. These figures suggest that using only the descriptors of the first three orders (12 features in total) might not be enough to obtain a classifier with sufficient discrimination power.

Lin and Hwang [63] derived rotation-invariant features based on Kuhl and Giardina's [30] features:

    I_k = a_k^2 + b_k^2 + c_k^2 + d_k^2,                                                   (71)
    J_k = a_k d_k - b_k c_k,                                                               (72)
    K_{1,j} = (a_1^2 + b_1^2)(a_j^2 + b_j^2) + (c_1^2 + d_1^2)(c_j^2 + d_j^2) + 2 (a_1 c_1 + b_1 d_1)(a_j c_j + b_j d_j).    (73)

As above, a scaling factor may be used to obtain size-invariant features.

4.5 Other Fourier Descriptors

Prior to Kuhl and Giardina [30], and Lin and Hwang [63], other Fourier descriptors were developed by Zahn and Roskies [64] and Granlund [65].

In Zahn and Roskies' method [64], the angular difference \Delta\varphi between two successive line segments on the contour is measured at every pixel center along the contour. The contour is followed clockwise. Then the following descriptors can be extracted:

    a_n = -\frac{1}{n\pi} \sum_{k=1}^{m} \Delta\varphi_k \sin \frac{2\pi n t_k}{T},        (74)

    b_n = \frac{1}{n\pi} \sum_{k=1}^{m} \Delta\varphi_k \cos \frac{2\pi n t_k}{T},         (75)


Figure 17: The rotation of the first-order ellipse used in elliptic Fourier descriptors in order to obtain rotation-independent descriptors a_1^{**} and d_1^{**}. (a) Before rotation, (b) After rotation (where b_1^{**} = c_1^{**} = 0.0).

Figure 18: Character '4' reconstructed by elliptic Fourier descriptors of orders up to 1, 2, ..., 10; 15, 20, 30, 40, 50, and 100, respectively.

Figure 19: Character '5' reconstructed by elliptic Fourier descriptors of orders up to 1, 2, ..., 10; 15, 20, 30, 40, 50, and 100, respectively.


where T is the length of the boundary curve, consisting of m line segments, t_k is the accumulated length of the boundary from the starting point p_1 to the k-th point p_k, and \Delta\varphi_k is the angle between the vectors [p_{k-1}, p_k] and [p_k, p_{k+1}]. a_n and b_n are size- and translation-invariant. Rotation invariance can be obtained by transforming to polar coordinates. Then the amplitudes

    A_n = \sqrt{ a_n^2 + b_n^2 }                                                           (76)

are independent of rotation and mirroring, while the phase angles \alpha_n = \tan^{-1}(a_n / b_n) are not. However, mirroring can be detected via the \alpha_j's. It can be shown that

    F_{kj} = j^* \alpha_k - k^* \alpha_j                                                   (77)

is independent of rotation, but dependent on mirroring. Here, j^* = j / \gcd(j, k), k^* = k / \gcd(j, k), and \gcd(j, k) is the greatest common divisor of j and k.

Zahn and Roskies warn that \alpha_k becomes unreliable as A_k \to 0, and is totally undefined when A_k = 0. Therefore, the F_{kj} terms may be unreliable.

Granlund [65] uses a complex number z(t) = x(t) + j y(t) to denote the points on the contour. Then the contour can be expressed as a Fourier series:

    z(t) = \sum_{n=-\infty}^{\infty} a_n e^{ j 2\pi n t / T },                             (78)

where

    a_n = \frac{1}{T} \int_0^T z(t) e^{ -j 2\pi n t / T } \, dt                            (79)

are the complex coefficients. a_0 is the center of gravity, and the other coefficients a_n, n \neq 0, are independent of translation. Again, T is the total contour length. The derived features

    b_n = \frac{ a_{1+n} a_{1-n} }{ a_1^2 },                                               (80)

    D_{mn} = \frac{ a_{1+m}^{n/k} \, a_{1-n}^{m/k} }{ a_1^{(m+n)/k} }                      (81)

are independent of scale and rotation. Here, n \neq 1 and k = \gcd(m, n) is the greatest common divisor of m and n. Furthermore,

    b_1^* = \frac{ a_2 |a_1| }{ a_1^2 },                                                   (82)

    d_{m1}^* = \frac{ a_{1+m} |a_1|^m }{ a_1^{m+1} }                                       (83)

are scale-independent, but depend on rotation, so they can be useful when the orientation of the characters is known.

Persoon and Fu [3] pointed out that

    a_n = a_{-n}^* e^{ -2\pi n \tau / T }                                                  (84)

for some \tau. Therefore, the set of a_n's is redundant.
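A sketch of Granlund-style complex Fourier descriptors (Python with NumPy; names are ours). It approximates Eq. (79) by resampling the contour to a fixed number of points at uniform arc-length spacing and applying the discrete Fourier transform; this uniform resampling is our simplification, not part of the original formulation:

    import numpy as np

    def complex_fourier_descriptors(contour, n_points=128):
        """contour: (m, 2) array of (x, y) boundary points (closed curve).
        Returns the complex coefficients a_n, n = 0..n_points-1 (negative
        frequencies appear at the end, as with np.fft.fft)."""
        pts = np.vstack([contour, contour[:1]])               # close the curve
        seg = np.hypot(*np.diff(pts, axis=0).T)                # segment lengths
        t = np.concatenate([[0.0], np.cumsum(seg)])            # accumulated arc length
        u = np.linspace(0.0, t[-1], n_points, endpoint=False)
        x = np.interp(u, t, pts[:, 0])
        y = np.interp(u, t, pts[:, 1])
        z = x + 1j * y                                          # z(t) = x(t) + j y(t)
        return np.fft.fft(z) / n_points                         # discrete version of Eq. (79)

Translation invariance follows from discarding a_0, and dividing by |a_1| gives a simple scale normalization; rotation- and starting-point-insensitive combinations such as b_n in Eq. (80) can then be formed from these coefficients (with FFT indexing, a_{1-n} for n >= 2 is the coefficient at index (1 - n) modulo n_points).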

4.6 Evaluation Studies

Taxt et al. [62] evaluated Zahn and Roskies' Fourier descriptors [64], Kuhl and Giardina's elliptic Fourier descriptors [30], Lin and Hwang's elliptic Fourier descriptors [63], and their own cubic spline approximation [62]. For characters with known rotation, the best performance was reported using Kuhl and Giardina's method.

Persoon and Fu [3] observed that Zahn and Roskies' descriptors (a_n, b_n) converge slowly to zero as n \to \infty relative to Granlund's [65] descriptors (a_n) in the case of piecewise linear contour curves. This suggests that Zahn and Roskies' descriptors are not so well suited for the character contours obtained from binary raster objects, nor for character skeletons.

5 Features Extracted From the Vector Representation

Character skeletons (Fig. 5) are obtained by thinning the binary raster representation of the characters. An overwhelming number of thinning algorithms exist, and some recent evaluation studies give clues to their merits and disadvantages [15, 16, 66]. The task of choosing the right one often involves a compromise; one wants one-pixel wide, 8-connected skeletons without spurious branches or displaced junctions, some kind of robustness to rotation and noise, and at the same time a fast and easy-to-implement algorithm. Kwok's thinning method [67] appears to be a good candidate, although its implementation is complicated.

A character graph can be derived from the skeleton by approximating it with a number of straight line segments and junction points. Arcs may be used for curved parts of the skeleton.

Wang and Pavlidis have recently proposed a method for obtaining character graphs directly from the gray level image [53, 68]. They view the gray level image as a 3D surface, with the gray levels mapped along the z-coordinate, using z = 0 for white (background) and, for example, z = 255 for black. By using topographic analysis, ridge lines and saddle points are identified, which are then used to obtain character graphs consisting of straight line segments, arcs, and junction points. The saddle points are analyzed to determine if they are points of unintentionally touching characters, or of unintentionally broken characters. This method is useful when even the best available binarization methods are unable to preserve the character shape in the binarized image.

5.1 Template matching

Template matching in its pure form is not well suited for character skeletons, since the chances are small that the pixels of the branches in the input skeleton will exactly coincide with the pixels of the correct template skeleton.


Lee and Park [69] reviewed several non-linear shape normalization methods used to obtain uniform line or stroke spacing both vertically and horizontally. The idea is that such methods will compensate for shape distortions. Such normalizations are claimed to improve the performance of template matching [70], but may also be used as a preprocessing step for zoning.

5.2 Deformable Templates

Deformable templates have been used by Burr [71] and Wakahara [72, 73] for recognition of character skeletons. In Wakahara's approach, each template is deformed in a number of small steps, called local affine transforms (LAT), to match the candidate input pattern (Fig. 20). The number and types of transformations applied before a match is obtained can be used as a dissimilarity measure between each template and the input pattern.

Figure 20: The deformable template matching approach of Wakahara [72]. (a) Template and input pattern of a Chinese character. (b)-(d) After 1, 5, and 10 iterations, respectively, of local affine transforms on a copy of the template. (In the original figure, distinct symbols mark original template pixels not in the transformed template, transformed template pixels, input pattern pixels, and pixels common to the transformed template and the input pattern.)

5.3 Graph Description

Pavlidis [74] extracts approximate strokes from skeletons. Kahan et al. [75] augmented them with additional features to obtain reasonable recognition performance.

For Chinese character recognition, several authors extract graph descriptions or representations from skeletons as features [76, 77, 78].

Figure 21: Thinned letters 'c' and 'd'. Vertical and horizontal axes are placed at the center of gravity. 'c' and 'd' both have one semicircle in the West direction, but none in the other directions. 'c' has one horizontal crossing and two vertical crossings. (Adapted from Kundu et al. [82].)

Lu et al. [76] derive hierarchical attributed graphs to deal with variations in stroke lengths and connectedness due to the variable writing styles of different writers. Cheng et al. [79] use the Hough transform [80] on single-character images to extract stroke lines from skeletons of Chinese characters.

5.4 Discrete features

From thinned characters, the following features may be extracted [81, 82]: the number of loops; the number of T-joints; the number of X-joints; the number of bend points; the width-to-height ratio of the enclosing rectangle; the presence of an isolated dot; the total number of endpoints, and the number of endpoints in each of the four directions N, S, W, and E; the number of semicircles in each of these four directions; and the number of crossings with the vertical and horizontal axes, respectively, with the axes placed at the center of gravity. The last two features are explained in Fig. 21.

One might use crossings with many superimposed lines as features, and in fact, this was done in earlier OCR systems [1]. However, these features alone do not lead to robust recognition systems; as the number of superimposed lines is increased, the resulting features become less robust to variations in fonts (for machine printed characters) and to variability in character shapes and writing styles (for handwritten characters).

5.5 Zoning

Holbæk-Hanssen et al. [83] measured the length of the character graph in each zone (Fig. 22). These features can be made size independent by dividing the graph length in each zone by the total length of the line segments in the graph. However, the features can not be made rotation independent. The presence or absence of junctions or endpoints in each zone can be used as additional features.
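A minimal sketch of a few of the discrete features of Section 5.4 (Python with NumPy; names are ours), counting skeleton endpoints and junctions from 8-neighbour counts, and counting axis crossings through the center of gravity. It assumes a one-pixel-wide binary skeleton; real systems would need more careful handling of junction clusters:

    import numpy as np

    def discrete_features(skel):
        """skel: binary 2-D array containing a one-pixel-wide skeleton."""
        ys, xs = np.nonzero(skel)
        # 8-neighbour count for every pixel (zero padding avoids wrap-around effects)
        padded = np.pad(skel, 1)
        nbrs = sum(np.roll(np.roll(padded, dy, 0), dx, 1)
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0))[1:-1, 1:-1]
        n_endpoints = int(np.sum((skel == 1) & (nbrs == 1)))
        n_junctions = int(np.sum((skel == 1) & (nbrs >= 3)))
        # crossings with the vertical/horizontal axes through the center of gravity,
        # counted as 0 -> 1 transitions along each axis
        cy, cx = int(round(ys.mean())), int(round(xs.mean()))
        vertical_crossings = int(np.sum(np.diff(skel[:, cx]) == 1))
        horizontal_crossings = int(np.sum(np.diff(skel[cy, :]) == 1))
        return n_endpoints, n_junctions, vertical_crossings, horizontal_crossings

Loops, T-joints versus X-joints, semicircles, and bend points require a graph traversal of the skeleton rather than purely local counts.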


5.6 Fourier Descriptors
The Fourier descriptor methods described in Sections 4.4-4.5 for character contours may also be used for character skeletons or character graphs, since the skeleton or graph can be traversed to form a (degenerate) closed curve. Taxt and Bjerde [84] studied Kuhl and Giardina's elliptic Fourier descriptors [30], and stressed that for character graphs with two line endings, no junctions, and no loops, some of the descriptors will be zero, while for graphs with junctions or loops, all descriptors will be non-zero. For the size- and rotation-variant descriptors, Taxt and Bjerde stated that:

- For straight lines, a*_n = c*_n = 0 for n = 2, 4, 6, ..., and b*_n = d*_n = 0 for n = 1, 2, 3, ...
- For non-straight graphs with two line endings, no junctions, and no loops, b*_n = d*_n = 0 for n = 1, 2, 3, ...
- For graphs with junctions or loops, a*_n, b*_n, c*_n, d*_n are all non-zero for n = 1, 2, 3, ...

The corresponding characteristics for rotation- and size-invariant features were also found [84]. Taxt and Bjerde observed that instances of the same character that happen to differ with respect to the above types will get very different feature vectors. The solution was to pre-classify the character graphs into one of the three types, and then use a separate classifier for each type.

5.7 Evaluation Studies
Holbæk-Hanssen et al. [83] compared the zoning method described in Sec. 5.5 with Zahn and Roskies' Fourier descriptor method on character graphs. For characters with known orientation, the zoning method was better, while Zahn and Roskies' Fourier descriptors were better on characters with unknown rotation.
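The starred descriptors above are derived from Kuhl and Giardina's elliptic Fourier coefficients a_n, b_n, c_n, d_n [30]. A minimal Python sketch of computing the basic coefficients for a closed polygonal curve follows (our own illustration; it assumes consecutive points are distinct).

```python
import numpy as np

def elliptic_fourier_coefficients(points, order=10):
    """Elliptic Fourier coefficients a_n, b_n, c_n, d_n of a closed polygonal
    curve in the style of Kuhl and Giardina [30].  `points` is a (K, 2) array
    of (x, y) vertices; the curve is closed implicitly.  Sketch only."""
    pts = np.asarray(points, dtype=float)
    d = np.diff(np.vstack([pts, pts[:1]]), axis=0)      # wrap around to close
    dt = np.hypot(d[:, 0], d[:, 1])                      # segment lengths
    t = np.concatenate([[0.0], np.cumsum(dt)])
    T = t[-1]

    coeffs = np.zeros((order, 4))
    for n in range(1, order + 1):
        phi = 2.0 * np.pi * n * t / T
        dcos = np.cos(phi[1:]) - np.cos(phi[:-1])
        dsin = np.sin(phi[1:]) - np.sin(phi[:-1])
        k = T / (2.0 * n * n * np.pi * np.pi)
        coeffs[n - 1] = [
            k * np.sum(d[:, 0] / dt * dcos),   # a_n
            k * np.sum(d[:, 0] / dt * dsin),   # b_n
            k * np.sum(d[:, 1] / dt * dcos),   # c_n
            k * np.sum(d[:, 1] / dt * dsin),   # d_n
        ]
    return coeffs
```

A character skeleton with two line endings and no junctions can be fed to such a routine by traversing it from one end to the other and back, which produces the degenerate closed curve mentioned above.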

6 Neural Network Classifiers
Multilayer feedforward neural networks [85] have been used extensively in OCR, for example by Le Cun et al. [86], Takahashi [60], and Cao et al. [25]. These networks may be viewed as a combined feature extractor and classifier. Le Cun et al. scale each input character to a 16 x 16 grid, which is then fed into the 256 input nodes of the neural network (Fig. 23). The network has ten output nodes, one for each of the ten digit classes '0'-'9' that the network tries to recognize. Three intermediate layers were used. Each node in a layer has connections from a number of nodes in the previous layer, and during the training phase the connection weights are learned. The output at a node is a function (e.g., a sigmoid) of the weighted sum of the connected nodes in the previous layer. One can think of a feedforward neural network as constructing decision boundaries in a feature space; as the number of layers and nodes increases, the flexibility of the classifier increases by allowing more and more complex decision boundaries. However, it has been shown [85, 86] that this flexibility must be restricted to obtain good recognition performance. This parallels the curse-of-dimensionality effect observed in statistical classifiers, mentioned earlier.

Another viewpoint can also be taken. Le Cun et al.'s neural network can be regarded as performing hierarchical feature extraction. Each node "sees" a window in the previous layer and combines the low-level features in this window into a higher-level feature. So, the higher the network layer, the more abstract and more global the extracted features become, the final abstraction level being the features digit '0', digit '1', ..., digit '9'. Note that the feature extractors are not hand-crafted or ad hoc selected rules, but are trained on a large set of training samples.

Some neural networks are given extracted features as input instead of a scaled or subsampled input image (e.g., [60, 25]). Then the network can be viewed as a pure classifier, constructing some complicated decision boundaries, or it can be viewed as extracting "superfeatures" in the combined process of feature extraction and classification.

One problem with using neural networks in OCR is that it is difficult to analyze and fully understand the decision making process [87]. What are the implicit features, and what are the decision boundaries? Also, an unbiased comparison between neural networks and statistical classifiers is difficult. If the pixels themselves are used as inputs to a neural network, and the neural network is compared with, say, a k-nearest neighbor (kNN) classifier using the same pixels as "features", then the comparison is not fair, since the neural network has the opportunity to derive more meaningful features in the hidden layers. Rather, the best-performing statistical or structural classifiers have to be compared with the best neural network classifiers [88].
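To make the combined feature-extractor/classifier view concrete, here is a tiny fully connected feedforward network in Python with randomly initialized (untrained) weights. It is only a simplified stand-in for illustration: the network in [86] used local receptive fields and shared weights rather than full connections, and the layer sizes below are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer sizes: 256 inputs (a 16 x 16 image), two small hidden layers, 10 outputs.
sizes = [256, 64, 30, 10]
weights = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def forward(image_16x16):
    """Forward pass: each layer outputs a sigmoid of a weighted sum of the
    previous layer, i.e. a new level of (learned) features."""
    a = image_16x16.reshape(-1).astype(float)
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a                        # ten scores, one per digit class '0'-'9'

scores = forward(np.zeros((16, 16)))
print(int(np.argmax(scores)))       # predicted digit for a blank, untrained net
```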


Figure 23: The neural network classifier used by Le Cun et al. [86]. (Schematic; only the layer labels survive from the original figure: 16 x 16 input image; 1st hidden layer, 12 feature detectors with 8 x 8 matrices; 2nd hidden layer; 3rd hidden layer, 12 feature detectors with 4 x 4 matrices; a layer of 30 units; and 10 output nodes for the digits '0'-'9'.)
7 Discussion
Before selecting a specific feature extraction method, one needs to consider the total character recognition system in which it will operate. What kind of input characters is the system designed for? Is the input single-font typed or machine-printed characters, multifont machine-printed, neatly hand-printed, or unconstrained handwritten? What is the variability of the characters belonging to the same class? Are gray-level images or binary images available? What is the scanner resolution? Is a statistical or structural classifier to be used? What are the throughput requirements (characters per second) as opposed to the recognition requirements (reject rate vs. error rate)? What hardware is available? Can special-purpose hardware be used, or must the system run on standard hardware? What is the expected price of the system? Such questions need to be answered in order to make a qualified selection of the appropriate feature extraction method.

Often, a single feature extraction method alone is not sufficient to obtain good discrimination power. An obvious solution is to combine features from different feature extraction methods. If a statistical classifier is to be used and a large training set is available, discriminant analysis can be used to select the features with the highest discriminative power. The statistical properties of such combined feature vectors need to be explored. Another approach is to use multiple classifiers [20, 25, 26, 27, 89, 90]. In that case, one can even combine statistical, structural, and neural network classifiers to utilize their inherent differences.
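One of the simplest ways to combine multiple classifiers, as mentioned above, is majority voting over their output labels, with a reject when no majority emerges. A minimal Python sketch (our own illustration of one scheme of the kind discussed in, e.g., [26]):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class labels proposed by several independent classifiers by
    simple majority voting; reject (return None) on a tie.  Sketch only."""
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None                       # ambiguous -- reject the character
    return counts[0][0]

# Hypothetical outputs of a statistical, a structural, and a neural network
# classifier for one input character:
print(majority_vote(['8', '8', '3']))     # -> '8'
print(majority_vote(['8', '3', '5']))     # -> None (reject)
```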

The main disadvantage of gray-scale-image-based approaches is their memory requirement. Although Pavlidis [91] has shown that similar recognition rates may be achieved at a lower resolution for gray-scale methods than for binary-image-based methods, gray-scale images cannot be compressed significantly without a loss of information. Binary images are easily compressed using, for example, run-length coding, and algorithms can be written to work on this format. However, as the performance and memory capacity of computers continue to double every 18 months or so, gray-scale methods will eventually become feasible in more and more applications.

To illustrate the process of identifying the best feature extraction methods, let us consider the digits in the hydrographic map (Fig. 1). The digits are hand-printed by one writer and have roughly the same orientation, size, and slant (skew), although some variations exist, and they vary over the different portions of the whole map. These variations are probably large enough to affect the features considerably if rotation-variant, size-variant, or skew-variant features are used. However, by using features invariant to scale, rotation, and skew, a larger variability is allowed, and confusion among characters such as '6' and '9' may be expected. By using a statistical classifier which assumes statistically dependent features (e.g., one based on the multivariate Gaussian distribution), we can hope that these variations will be properly accounted for. Ideally, it should then be possible to find the size, orientation, and perhaps slant directions in the feature space by principal component analysis (PCA), although the actual PCA does not have to be implemented. However, characters with unusual size, rotation, or skew will probably not be correctly classified. An appropriate solution may therefore be to use a mix of variant and invariant features.

For many applications, robustness to variability in character shape, to degradation, and to noise is important. Characters may be fragmented or merged. Other characters might be self-touching or have a broken loop. For features extracted from character contours or skeletons, we must expect very different features depending on whether fragmented, self-touching, or broken-loop characters occur or not. Separate classes will normally have to be used for these variants, but the training set may contain too few examples of each variant to make reliable class descriptions.

Fourier descriptors cannot be applied to fragmented characters in a meaningful way, since this method extracts features from one single closed contour or skeleton. Further, methods based on the outer contour curve do not use information about the interior of the characters, such as the holes in '8', '0', etc., so one then has to consider whether some classes will be easily confused. A solution may be to use multistage classifiers [25].
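Returning to the compression remark earlier in this section: binary character images compress easily with run-length coding, and simple operations can work directly on the runs. A minimal row-wise Python sketch (our own illustration):

```python
import numpy as np

def run_length_encode(binary_image):
    """Row-wise run-length coding of a binary character image: each row is
    stored as a list of (value, run_length) pairs.  Sketch only."""
    encoded = []
    for row in np.asarray(binary_image):
        runs, count, current = [], 1, row[0]
        for pixel in row[1:]:
            if pixel == current:
                count += 1
            else:
                runs.append((int(current), count))
                current, count = pixel, 1
        runs.append((int(current), count))
        encoded.append(runs)
    return encoded

img = np.zeros((3, 8), dtype=np.uint8)
img[1, 2:6] = 1
print(run_length_encode(img)[1])    # [(0, 2), (1, 4), (0, 2)]
```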


Zoning, moment invariants, Zernike moments, and the Karhunen-Loeve transform may be good alternatives, since they are not affected by the above degradations to the same extent. Zoning is probably not a good choice, since the variations present in each digit class may cause a specific part of a character to fall into different zones for different instances. Cao et al. [25] tried to compensate for this by using fuzzy borders, but this method is only capable of compensating for small variations in character shape. Moment invariants are invariant to size and rotation, and some moment invariants are also invariant to skew and mirror images [42]. Mirror-image invariance is not desirable, so moment invariants that are invariant to skew but not to mirror images would be useful, and a few such invariants do exist [42]. Moment invariants lack the reconstructability property, which probably means that a few more features are needed than for features from which reconstruction is possible.

Zernike moments are complex numbers which are themselves not rotation invariant, but their amplitudes are. Also, size invariance is obtained by prescaling the image. In other words, we can obtain size- and rotation-dependent features. Since Zernike moments have the reconstructability property, they appear to be very promising for our application.

Of the unitary image transforms, the Karhunen-Loeve transform has the best information compactness in terms of mean-square error. However, since the features are only linear combinations of the pixels in the input character image, we cannot expect them to extract high-level features the way other methods do, so many more features are needed, and thus a much larger training set than for other methods. Also, since the features are tied to pixel locations, we cannot expect to get class descriptions suitable for parametric statistical classifiers. Still, a non-parametric classifier like the k-nearest neighbor classifier [9] may perform well on Karhunen-Loeve transform features.

Discretization errors and other high-frequency noise are removed when using Fourier descriptors (Figs. 18-19), moment invariants, or Zernike moments, since we never use the very high order terms. Zoning methods are also robust against high-frequency noise because of the implicit low-pass filtering in the method.
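As a concrete reading of the last two paragraphs, the following Python sketch (our own illustration, assuming images as N x 16 x 16 arrays with integer labels) extracts Karhunen-Loeve features by principal component analysis and classifies them with a plain k-nearest-neighbor rule [9].

```python
import numpy as np

def kl_features(train_images, test_images, n_features=20):
    """Karhunen-Loeve (principal component) features: project each character
    image onto the leading eigenvectors of the training set's covariance
    matrix.  Sketch only; the parameter names are ours."""
    X = train_images.reshape(len(train_images), -1).astype(float)
    mean = X.mean(axis=0)
    # Right singular vectors of the centered data = covariance eigenvectors,
    # ordered by decreasing eigenvalue.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:n_features]
    project = lambda imgs: (imgs.reshape(len(imgs), -1) - mean) @ basis.T
    return project(train_images), project(test_images)

def knn_classify(train_feats, train_labels, test_feats, k=3):
    """k-nearest-neighbor classification on the extracted features."""
    predictions = []
    for f in test_feats:
        idx = np.argsort(np.linalg.norm(train_feats - f, axis=1))[:k]
        values, counts = np.unique(train_labels[idx], return_counts=True)
        predictions.append(values[np.argmax(counts)])
    return np.array(predictions)
```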

From the above analysis, it seems that Zernike moments would be good features in our hydrographic map application. However, one really needs to perform an experimental evaluation of a few of the most promising methods to decide which feature extraction method is the best in practice for each application. The evaluation should be performed on large data sets that are representative of the particular application. Large, standard data sets are now available from NIST [56] (Gaithersburg, MD 20899, USA) and SUNY at Buffalo [92] (CEDAR, SUNY, Buffalo, NY 14260, USA). If these or other available data sets are not representative, then one might have to collect a large data set. However, performance on the standard data sets does give an indication of the usefulness of the features, and provides performance figures that can be compared with other research groups' results.

8 Summary
Optical character recognition (OCR) is one of the most successful applications of automatic pattern recognition. Since the mid 1950's, OCR has been a very active field for research and development [1]. Today, reasonably good OCR packages can be bought for as little as $100. However, these are only able to recognize high-quality printed text documents or neatly written hand-printed text. Current research in OCR is addressing documents that are not well handled by the available systems, including severely degraded, omnifont machine-printed text and (unconstrained) handwritten text. Also, efforts are being made to achieve lower substitution error rates and reject rates even on good-quality machine-printed text, since an experienced human typist still has a much lower error rate, albeit at a slower speed.

Selection of the feature extraction method is probably the single most important factor in achieving high recognition performance. Given the large number of feature extraction methods reported in the literature, a newcomer to the field is faced with the following question: which feature extraction method is the best for a given application? This question led us to characterize the available feature extraction methods, so that the most promising methods could be sorted out. An experimental evaluation of these few promising methods must still be performed to select the best method for a specific application.

Devijver and Kittler define feature extraction (page 12 in [11]) as the problem of "extracting from the raw data the information which is most relevant for classification purposes, in the sense of minimizing the within-class pattern variability while enhancing the between-class pattern variability." In this paper, we reviewed feature extraction methods including:
1. Template matching,
2. Deformable templates,
3. Unitary image transforms,
4. Graph description,
5. Projection histograms,
6. Contour profiles,
7. Zoning,
8. Geometric moment invariants,
9. Zernike moments,
10. Spline curve approximation,
11. Fourier descriptors.
Each of these methods may be applied to one or more of the following representation forms:
1. Gray-level character image,
2. Binary character image,
3. Character contour,
4. Character skeleton or character graph.
For each feature extraction method and each character representation form, we discussed the properties of the extracted features.


Before selecting a specific feature extraction method, one needs to consider the total character recognition system in which it will operate. The process of identifying the best feature extraction method was illustrated by considering the digits in the hydrographic map (Fig. 1) as an example. It appears that Zernike moments would be good features in this application. However, one really needs to perform an experimental evaluation of a few of the most promising methods to decide which feature extraction method is the best in practice for each application. The evaluation should be performed on large data sets that are representative of the particular application.

Acknowledgements
We thank the anonymous referee for useful comments. This work was supported by a NATO collaborative research grant (CRG 930289) and The Research Council of Norway.

References
[1] S. Mori, C. Y. Suen, and K. Yamamoto, "Historical review of OCR research and development," Proceedings of the IEEE, vol. 80, pp. 1029-1058, July 1992.
[2] J. W. Gorman, O. R. Mitchell, and F. P. Kuhl, "Partial shape recognition using dynamic programming," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 10, pp. 257-266, Mar. 1988.
[3] E. Persoon and K.-S. Fu, "Shape discrimination using Fourier descriptors," IEEE Trans. Systems, Man, and Cybernetics, vol. 7, pp. 170-179, Mar. 1977.
[4] L. Shen, R. M. Rangayyan, and J. E. L. Desautels, "Application of shape analysis to mammographic calcifications," IEEE Trans. Medical Imaging, vol. 13, pp. 263-274, June 1994.
[5] D. G. Elliman and I. T. Lancaster, "A review of segmentation and contextual analysis for text recognition," Pattern Recognition, vol. 23, no. 3/4, pp. 337-346, 1990.
[6] C. E. Dunn and P. S. P. Wang, "Character segmentation techniques for handwritten text -- a survey," in Proceedings of 11th IAPR Int. Conf. Pattern Recognition, vol. II, (The Hague, The Netherlands), pp. 577-580, IEEE Computer Society Press, 1992.
[7] L. Lam, S.-W. Lee, and C. Y. Suen, "Thinning methodologies -- a comprehensive survey," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, pp. 869-885, Sept. 1992.
[8] K.-S. Fu, Syntactic Pattern Recognition and Applications. Englewood Cliffs, New Jersey: Prentice-Hall, 1982.

[9] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: John Wiley & Sons, 1973.
[10] K. Fukunaga, Introduction to Statistical Pattern Recognition. Boston: Academic Press, 1990.
[11] P. A. Devijver and J. Kittler, Pattern Recognition: A Statistical Approach. London: Prentice-Hall, 1982.
[12] J. R. Ullmann, Pattern Recognition Techniques. London, England: Butterworth, 1973.
[13] Ø. D. Trier and T. Taxt, "Evaluation of binarization methods for document images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, pp. 312-315, Mar. 1995.
[14] Ø. D. Trier and A. K. Jain, "Goal-directed evaluation of binarization methods," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 17, pp. 1191-1201, Dec. 1995.
[15] S.-W. Lee, L. Lam, and C. Y. Suen, "Performance evaluation of skeletonizing algorithms for document image processing," in Proceedings of the First International Conference on Document Analysis and Recognition, (Saint-Malo, France), pp. 260-271, IEEE Computer Society Press, 1991.
[16] M. Y. Jaisimha, R. M. Haralick, and D. Dori, "Quantitative performance evaluation of thinning algorithms under noisy conditions," in Proceedings of IEEE Conf. Computer Vision and Pattern Recognition (CVPR), (Seattle, WA), pp. 678-683, June 1994.
[17] V. K. Govindan and A. P. Shivaprasad, "Character recognition -- a review," Pattern Recognition, vol. 23, no. 7, pp. 671-683, 1990.
[18] G. Nagy, "At the frontiers of OCR," Proceedings of the IEEE, vol. 80, pp. 1093-1100, July 1992.
[19] C. Y. Suen, R. Legault, C. Nadal, M. Cheriet, and L. Lam, "Building a new generation of handwriting recognition systems," Pattern Recognition Letters, vol. 14, pp. 303-315, Apr. 1993.
[20] C. Y. Suen, C. Nadal, R. Legault, T. A. Mai, and L. Lam, "Computer recognition of unconstrained handwritten numerals," Proceedings of the IEEE, vol. 80, pp. 1162-1180, July 1992.
[21] C. Y. Suen, M. Berthod, and S. Mori, "Automatic recognition of handprinted characters -- the state of the art," Proceedings of the IEEE, vol. 68, pp. 469-487, Apr. 1980.
[22] J. Mantas, "An overview of character recognition methodologies," Pattern Recognition, vol. 19, no. 6, pp. 425-430, 1986.
[23] A. K. Jain and B. Chandrasekaran, "Dimensionality and sample size considerations in pattern recognition practice," in Classification, Pattern Recognition, and Reduction of Dimensionality (P. R. Krishnaiah and L. N. Kanal, eds.), vol. 2 of Handbook of Statistics, pp. 835-855, Amsterdam: North-Holland, 1982.


[24] P. Gader, B. Forester, M. Ganzberger, A. Gillies, B. Mitchell, M. Whalen, and T. Yocum, "Recognition of handwritten digits using template and model matching," Pattern Recognition, vol. 24, no. 5, pp. 421-431, 1991.
[25] J. Cao, M. Ahmadi, and M. Shridhar, "Handwritten numeral recognition with multiple features and multistage classifiers," in IEEE International Symposium on Circuits and Systems, vol. 6, (London), pp. 323-326, May 30-June 2, 1994.
[26] L. Xu, A. Krzyżak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," IEEE Trans. Systems, Man, and Cybernetics, vol. 22, no. 3, pp. 418-435, 1992.
[27] F. Kimura and M. Shridhar, "Handwritten numerical recognition based on multiple algorithms," Pattern Recognition, vol. 24, no. 10, pp. 969-983, 1991.
[28] T. Y. Zhang and C. Y. Suen, "A fast parallel algorithm for thinning digital patterns," Communications of the ACM, vol. 27, pp. 236-239, Mar. 1984.
[29] T. H. Reiss, "The revised fundamental theorem of moment invariants," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 13, pp. 830-834, Aug. 1991.
[30] F. P. Kuhl and C. R. Giardina, "Elliptic Fourier features of a closed contour," Computer Vision, Graphics and Image Processing, vol. 18, pp. 236-258, 1982.
[31] A. Khotanzad and Y. H. Hong, "Invariant image recognition by Zernike moments," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 12, no. 5, pp. 489-497, 1990.
[32] D. H. Ballard and C. M. Brown, Computer Vision, pp. 65-70. Englewood Cliffs, New Jersey: Prentice-Hall, 1982.
[33] W. K. Pratt, Digital Image Processing. New York: John Wiley & Sons, second ed., 1991.
[34] G. Storvik, "A Bayesian approach to dynamic contours through stochastic sampling and simulated annealing," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, pp. 976-986, Oct. 1994.
[35] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," International Journal of Computer Vision, vol. 5, pp. 321-331, 1988.
[36] A. D. Bimbo, S. Santini, and J. Sanz, "OCR from poor quality images by deformation of elastic templates," in Proceedings of 12th IAPR Int. Conf. Pattern Recognition, vol. 2, (Jerusalem, Israel), pp. 433-435, 1994.

[37] H. C. Andrews, "Multidimensional rotations in feature selection," IEEE Trans. Computers, vol. 20, pp. 1045-1051, Sept. 1971.
[38] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Addison-Wesley, 1992.
[39] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[40] M. Bokser, "Omnidocument technologies," Proceedings of the IEEE, vol. 80, pp. 1066-1078, July 1992.
[41] M.-K. Hu, "Visual pattern recognition by moment invariants," IRE Trans. Information Theory, vol. IT-8, pp. 179-187, Feb. 1962.
[42] T. H. Reiss, Recognizing Planar Objects Using Invariant Image Features, vol. 676 of Lecture Notes in Computer Science. Springer-Verlag, 1993.
[43] S. O. Belkasim, M. Shridhar, and A. Ahmadi, "Pattern recognition with moment invariants: A comparative study and new results," Pattern Recognition, vol. 24, pp. 1117-1138, Dec. 1991.
[44] S. O. Belkasim, M. Shridhar, and A. Ahmadi, "Corrigendum," Pattern Recognition, vol. 26, p. 377, Jan. 1993.
[45] Y. Li, "Reforming the theory of invariant moments for pattern recognition," Pattern Recognition Letters, vol. 25, pp. 723-730, July 1992.
[46] J. Flusser and T. Suk, "Pattern recognition by affine moment invariants," Pattern Recognition, vol. 26, pp. 167-174, Jan. 1993.
[47] J. Flusser and T. Suk, "Affine moment invariants: A new tool for character recognition," Pattern Recognition Letters, vol. 15, pp. 433-436, Apr. 1994.
[48] B. Bamieh and R. J. P. de Figueiredo, "A general moment-invariants/attributed-graph method for three-dimensional object recognition from a single image," IEEE J. Robotics and Automation, vol. 2, pp. 31-41, Mar. 1986.
[49] A. Khotanzad and Y. H. Hong, "Rotation invariant image recognition using features selected via a systematic method," Pattern Recognition, vol. 23, no. 10, pp. 1089-1101, 1990.
[50] J. M. Westall and M. S. Narasimha, "Vertex directed segmentation of handwritten numerals," Pattern Recognition, vol. 26, pp. 1473-1486, Oct. 1993.
[51] H. Fujisawa, Y. Nakano, and K. Kurino, "Segmentation methods for character recognition: from segmentation to document structure analysis," Proceedings of the IEEE, vol. 80, pp. 1079-1092, July 1992.
[52] Ø. D. Trier, T. Taxt, and A. K. Jain, "Data capture from maps based on gray scale topographic analysis," in Proceedings of the Third International Conference on Document Analysis and Recognition, (Montreal, Canada), pp. 923-926, Aug. 1995.


[53] L. Wang and T. Pavlidis, "Direct gray-scale extraction of features for character recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 15, pp. 1053-1067, Oct. 1993.
[54] R. M. Haralick, L. T. Watson, and T. J. Laffey, "The topographic primal sketch," J. Robotics Res., vol. 2, pp. 50-72, Spring 1983.
[55] J. D. Tubbs, "A note on binary template matching," Pattern Recognition, vol. 22, no. 4, pp. 359-365, 1989.
[56] M. D. Garris, J. L. Blue, G. T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C. L. Wilson, "NIST form-based handprint recognition system," Tech. Rep. NISTIR 5469, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA, July 1994.
[57] M. H. Glauberman, "Character recognition for business machines," Electronics, vol. 29, pp. 132-136, Feb. 1956.
[58] R. Kasturi and M. M. Trivedi, eds., Image Analysis Applications. New York: Marcel Dekker, 1990.
[59] L. Yang and F. Albregtsen, "Fast computation of invariant geometric moments: A new method giving correct results," in Proceedings of 12th IAPR Int. Conf. Pattern Recognition, vol. 1, (Jerusalem, Israel), pp. 201-204, Oct. 1994.
[60] H. Takahashi, "A neural net OCR using geometrical and zonal pattern features," in Proceedings of the First International Conference on Document Analysis and Recognition, (Saint-Malo, France), pp. 821-828, 1991.
[61] I. Sekita, K. Toraichi, R. Mori, K. Yamamoto, and H. Yamada, "Feature extraction of handwritten Japanese characters by spline functions for relaxation matching," Pattern Recognition, no. 1, pp. 9-17, 1988.
[62] T. Taxt, J. B. Ólafsdóttir, and M. Dæhlen, "Recognition of handwritten symbols," Pattern Recognition, vol. 23, no. 11, pp. 1155-1166, 1990.
[63] C.-S. Lin and C.-L. Hwang, "New forms of shape invariants from elliptic Fourier descriptors," Pattern Recognition, vol. 20, no. 5, pp. 535-545, 1987.
[64] C. T. Zahn and R. C. Roskies, "Fourier descriptors for plane closed curves," IEEE Trans. Computers, vol. 21, pp. 269-281, Mar. 1972.
[65] G. H. Granlund, "Fourier preprocessing for hand print character recognition," IEEE Trans. Computers, vol. 21, pp. 195-201, Feb. 1972.

[66] B. K. Jang and R. T. Chin, "One-pass parallel thinning: Analysis, properties and quantitative evaluation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, pp. 1129-1140, Nov. 1992.
[67] P. C. K. Kwok, "A thinning algorithm by contour generation," Communications of the ACM, vol. 31, no. 11, pp. 1314-1324, 1988.
[68] L. Wang and T. Pavlidis, "Detection of curved and straight segments from gray scale topography," CVGIP: Image Understanding, vol. 58, pp. 352-365, Nov. 1993.
[69] S.-W. Lee and J.-S. Park, "Nonlinear shape normalization methods for the recognition of large-set handwritten characters," Pattern Recognition, vol. 27, no. 7, pp. 895-902, 1994.
[70] H. Yamada, K. Yamamoto, and T. Saito, "A nonlinear normalization method for handprinted kanji character recognition -- line density equalization," Pattern Recognition, vol. 23, no. 9, pp. 1023-1029, 1990.
[71] D. J. Burr, "Elastic matching of line drawings," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 3, pp. 708-713, Nov. 1981.
[72] T. Wakahara, "Toward robust handwritten character recognition," Pattern Recognition Letters, vol. 14, pp. 345-354, Apr. 1993.
[73] T. Wakahara, "Shape matching using LAT and its application to handwritten numeral recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, pp. 618-629, June 1994.
[74] T. Pavlidis, "A vectorizer and feature extractor for document recognition," Computer Vision, Graphics and Image Processing, vol. 35, pp. 111-127, 1986.
[75] S. Kahan, T. Pavlidis, and H. S. Baird, "On the recognition of printed characters of any font and size," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, pp. 274-288, Mar. 1987.
[76] S. W. Lu, Y. Ren, and C. Y. Suen, "Hierarchical attributed graph representation and recognition of handwritten Chinese characters," Pattern Recognition, vol. 24, no. 7, pp. 617-632, 1991.
[77] H.-J. Lee and B. Chen, "Recognition of handwritten Chinese characters via short line segments," Pattern Recognition, vol. 25, no. 5, pp. 543-552, 1992.
[78] F.-H. Cheng, W.-H. Hsu, and M.-C. Kuo, "Recognition of handprinted Chinese characters via stroke relaxation," Pattern Recognition, vol. 26, no. 4, pp. 579-593, 1993.
[79] F.-H. Cheng, W.-H. Hsu, and M.-Y. Chen, "Recognition of handwritten Chinese characters by modified Hough transform techniques," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, pp. 429-439, Apr. 1989.


[80] R. O. Duda and P. E. Hart, "Use of the Hough transformation to detect lines and curves in pictures," Communications of the ACM, vol. 15, pp. 11-15, Jan. 1972.
[81] S. R. Ramesh, "A generalized character recognition algorithm: A graphical approach," Pattern Recognition, vol. 22, no. 4, pp. 347-350, 1989.
[82] A. Kundu, Y. He, and P. Bahl, "Recognition of handwritten word: First and second order hidden Markov model based approach," Pattern Recognition, vol. 22, no. 3, pp. 283-297, 1989.
[83] E. Holbæk-Hanssen, K. Bråthen, and T. Taxt, "A general software system for supervised statistical classification of symbols," in Proceedings of the 8th International Conference on Pattern Recognition, (Paris, France), pp. 144-149, Oct. 1986.
[84] T. Taxt and K. W. Bjerde, "Classification of handwritten vector symbols using elliptic Fourier descriptors," in Proceedings of 12th IAPR Int. Conf. Pattern Recognition, vol. 2, (Jerusalem, Israel), pp. 123-128, Oct. 1994.
[85] J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley, 1991.
[86] Y. Le Cun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, pp. 541-551, 1989.
[87] R. P. W. Duin, "Superlearning and neural network magic," Pattern Recognition Letters, vol. 15, pp. 215-217, Mar. 1994.
[88] A. K. Jain and J. Mao, "Neural networks and pattern recognition," in Proceedings of the IEEE World Congress on Computational Intelligence, (Orlando, FL), June 1994.
[89] T. K. Ho, J. J. Hull, and S. N. Srihari, "Decision combination in multiple classifier systems," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, pp. 66-75, Jan. 1994.
[90] M. Sabourin, A. Mitiche, D. Thomas, and G. Nagy, "Classifier combination for hand-printed digit recognition," in Proceedings of the Second International Conference on Document Analysis and Recognition, (Tsukuba Science City, Japan), pp. 163-166, IEEE Computer Society Press, Oct. 1993.
[91] T. Pavlidis, "Recognition of printed text under realistic conditions," Pattern Recognition Letters, vol. 14, pp. 317-326, Apr. 1993.
[92] J. J. Hull, "A database for handwritten text research," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, pp. 550-554, May 1994.


About the author -- ØIVIND DUE TRIER was born in Oslo, Norway, in 1966. He received his M.Sc. degree in Computer Science from the Norwegian Institute of Technology in 1991, and was with the Norwegian Defense Research Establishment from 1991 to 1992. Since 1992, he has been a Ph.D. student at the Department of Informatics, University of Oslo. His current research interests include pattern recognition, image analysis, document processing, and geographic information systems.

About the author -- ANIL K. JAIN received a B.Tech. degree in 1969 from the Indian Institute of Technology, Kanpur, and the M.S. and Ph.D. degrees in Electrical Engineering from the Ohio State University, in 1970 and 1973, respectively. He joined the faculty of Michigan State University in 1974, where he currently holds the rank of University Distinguished Professor in the Department of Computer Science. Dr. Jain served as Program Director of the Intelligent Systems Program at the National Science Foundation (1980-81), and has held visiting appointments at Delft Technical University, Holland, the Norwegian Computing Center, Oslo, and the Tata Research Development and Design Center, Pune, India. He has also been a consultant to several industrial, government, and international organizations. His current research interests are computer vision, image processing, and pattern recognition.

Dr. Jain has made significant contributions and published a large number of papers on the following topics: statistical pattern recognition, exploratory pattern analysis, Markov random fields, texture analysis, interpretation of range images, and 3D object recognition. Several of his papers have been reprinted in edited volumes on image processing and pattern recognition. He received the best paper awards in 1987 and 1991, and received certificates for outstanding contributions in 1976, 1979, and 1992 from the Pattern Recognition Society. Dr. Jain was the Editor-in-Chief of the IEEE Transactions on Pattern Analysis and Machine Intelligence (1991-1994), and is on the editorial boards of Pattern Recognition, Pattern Recognition Letters, Journal of Mathematical Imaging, Journal of Applied Intelligence, and IEEE Transactions on Neural Networks. He is the co-author of Algorithms for Clustering Data, Prentice-Hall, 1988, has edited the book Real-Time Object Measurement and Classification, Springer-Verlag, 1988, and has co-edited the books Analysis and Interpretation of Range Images, Springer-Verlag, 1989, Neural Networks and Statistical Pattern Recognition, North-Holland, 1991, Markov Random Fields: Theory and Applications, Academic Press, 1993, and 3D Object Recognition, Elsevier, 1993.

Dr. Jain is a Fellow of the IEEE. He was the Co-General Chairman of the 11th International Conference on Pattern Recognition, The Hague (1992), General Chairman of the IEEE Workshop on Interpretation of 3D Scenes, Austin (1989), Director of the NATO Advanced Research Workshop on Real-time Object Measurement and Classification, Maratea (1987), and co-directed NSF-supported workshops on "Future Research Directions in Computer Vision", Maui (1991), "Theory and Applications of Markov Random Fields", San Diego (1989), and "Range Image Understanding", East Lansing (1988). Dr. Jain was a member of the IEEE Publications Board (1988-90) and served as the Distinguished Visitor of the IEEE Computer Society (1988-90).

About the author -- TORFINN TAXT was born in 1950 and is Professor for the Medical Image Analysis Section at the University of Bergen and Professor in image processing at the University of Oslo. He is an associate editor of the IEEE Transactions on Medical Imaging and of Pattern Recognition, and has a Ph.D. in developmental neuroscience (1983), a Medical Degree (1976), and an M.S. in computer science (1978) from the University of Oslo. His current research interests are in the areas of image restoration, image segmentation, multispectral analysis, medical image analysis, and document processing. He has published papers on the following topics: restoration of medical ultrasound and magnetic resonance images, magnetic resonance and remote sensing multispectral analysis, quantum field models for image analysis, and document processing.
