
HELSINKI UNIVERSITY OF TECHNOLOGY

Department of Engineering Physics and Mathematics

Handwritten Character Recognition: A Palm-top Implementation and Adaptive Committee Experiments

Matti Aksela

The thesis belongs to the special subject of information science
Supervisor: Professor Erkki Oja
Instructors: Dr. Jorma Laaksonen and Docent Jari Kangas
Espoo, Finland, 22nd May 2000

HELSINKI UNIVERSITY OF TECHNOLOGY
ABSTRACT OF THE MASTER'S THESIS

Author and name of the thesis: Matti Aksela, Handwritten Character Recognition: A Palm-top Implementation and Adaptive Committee Experiments
Date: 22nd May 2000
Number of pages: 120
Department: Department of Engineering Physics and Mathematics
Professorship: Tik-61 Information Sciences
Supervisor: Professor Erkki Oja
Instructors: Dr. Jorma Laaksonen and Docent Jari Kangas

The work presented in this thesis addresses on-line recognition of handwritten characters. An adaptive on-line handwritten character recognition system has been developed earlier in the Laboratory of Computer and Information Science. This thesis builds on and extends that system.

The literature survey part of the thesis is two-fold. The first part deals with pen computing hardware and applications and their development, as well as the use of handwritten text as an input method in such devices. The second part of the survey focuses on various approaches to handwritten character recognition. Methods for both individual classifiers and committee combination are explored, with the focus on methods suitable for on-line operation.

In the practical part of this thesis two separate parts can likewise be identified, one focusing on the palm-top implementation of the recognition system and the other on the implementation and testing of an adaptive committee classifier based on the Dynamically Expanding Context (DEC) principle. The adaptive on-line recognizer developed earlier was implemented on the palm-top platform, and a suitable interface for data collection and recognizer performance evaluation was created. The interface takes the form of a questionnaire program that the user answers in handwriting. Some speed-up methods necessitated by the platform have also been investigated. The DEC committee has been implemented to join the results from a varying number of classifiers in an adaptive committee operation. Its performance has been evaluated in comparison to some reference committee methods, both adaptive and nonadaptive.

Results from the palm-top implementation show that the system is capable of adapting to the user's style of writing. The speed-up methodologies also provide noticeable gains in recognition speed. The DEC committee achieved a notable performance increase over any member classifier and also outperformed all the reference classifiers.

Keywords: pattern recognition, isolated handwritten characters, on-line recognition, adaptation, learning systems, committee recognition, pen-based computing, palm-top; k Nearest Neighbour rule, Learning Vector Quantization (LVQ), Dynamic Time Warping (DTW), elastic matching, Dynamically Expanding Context (DEC)

TEKNILLINEN KORKEAKOULU (HELSINKI UNIVERSITY OF TECHNOLOGY)
ABSTRACT OF THE MASTER'S THESIS (DIPLOMITYÖN TIIVISTELMÄ)

Author and name of the thesis: Matti Aksela, Käsinkirjoitettujen merkkien tunnistus: kämmentietokonetoteutus ja adaptiivisia komiteakokeita (Handwritten character recognition: a palm-top implementation and adaptive committee experiments)
Date: 22 May 2000
Number of pages: 120
Department: Department of Engineering Physics and Mathematics
Professorship: Tik-61 Information Science
Supervisor: Professor Erkki Oja
Instructors: Dr. Jorma Laaksonen and Docent Jari Kangas

The work concerns the real-time recognition of handwritten characters. It is based on, and adds to, the adaptive real-time handwritten character recognition system implemented in the Laboratory of Computer and Information Science.

The literature survey has two parts. The first deals with the hardware and applications related to pen-based computing and their development, as well as the use of handwritten characters as a data input method in such devices. The second focuses on various approaches to handwritten character recognition. Methods both for individual classifiers and for combining classifiers into a committee recognizer have been studied, with the main emphasis on methods suited to real-time recognition.

The practical part likewise consists of two parts. The first focuses on the palm-top implementation of the recognizer and the second on the implementation and testing of an adaptive committee classifier based on the Dynamically Expanding Context (DEC) principle. The adaptive real-time handwritten character recognition system was implemented on a palm-top computer, and a user interface suitable for data collection and for examining recognizer performance was created for it. The user interface takes the form of a questionnaire that the user answers with handwritten characters. Speed-up methods required by the implementation have also been studied. The DEC committee was implemented to combine results from a varying number of classifiers according to an adaptive committee principle. Its performance has been evaluated by comparing the results obtained with those of other committee classifiers, both adaptive and non-adaptive.

The results from the palm-top implementation show that the system is able to adapt to the user's style of writing. The speed-up methods also yielded considerable improvements in recognition speed. Experiments with the DEC committee showed that it achieves significantly better recognition accuracy than both its members and the reference classifiers.

Keywords: pattern recognition, isolated handwritten characters, real-time recognition, adaptation, learning systems, committee classification, pen-based computer, palm-top computer; k nearest neighbour method, Learning Vector Quantization (LVQ), Dynamic Time Warping (DTW), elastic matching, Dynamically Expanding Context (DEC)

Acknowledgments

All research presented in this thesis has been conducted in the Laboratory of Computer and Information Science (CIS) in the Department of Computer Science and Engineering at the Helsinki University of Technology. Without the excellent resources available, obtaining the results presented here would not have been possible. Overall, the CIS laboratory has provided a very pleasant working environment, for which my gratitude goes out to the entire staff.

First, I would like to thank Professor Erkki Oja for supervising my work and for the effort he has put into making this project possible. I would also like to express my deepest gratitude and respect to Academician Teuvo Kohonen for the extensive research resources and traditions he has made possible. My thanks also go to Nokia Research Center (NRC) for funding this research.

The work presented here would not have been possible without the expert guidance and insight of Dr. Jorma Laaksonen, who served as the instructor for this thesis as well as the coordinator of efforts in the On-line Recognition of Handwritten Characters project. Without his help and knowledge many of the problems encountered during this work might never have been solved. The guidance of Docent Jari Kangas from NRC has also been vital to the completion of this thesis. The other members of this research project, M.Sc. Vuokko Vuori from the CIS laboratory and Mr. Jukka Yrjänäinen from NRC, have also provided great help and support.

Otaniemi 22nd May 2000 Matti Aksela

Contents

Abbreviations
List of Symbols
1 Introduction
  1.1 Handwriting as an input method
  1.2 Aims and overview of this thesis
2 Pen computing
  2.1 Palm-top computers
  2.2 Benefits of pen-based computers
  2.3 Problems with pen-based computers
    2.3.1 Recognition accuracy
    2.3.2 User interfaces
    2.3.3 The pen-and-paper metaphor
  2.4 Palm-top HCR systems
  2.5 Future views
3 Handwritten character recognition
  3.1 Time of recognition
    3.1.1 Off-line handwriting recognition
    3.1.2 On-line handwriting recognition
  3.2 Variation in writing
    3.2.1 Strokes
    3.2.2 Character sets
    3.2.3 Writing style
  3.3 Data preprocessing
  3.4 Prototype selection
  3.5 Recognition methods for individual recognizers
    3.5.1 Statistical methods
    3.5.2 Syntactic and structural methods
    3.5.3 Time warping
    3.5.4 Neural Networks
    3.5.5 Fuzzy classification
  3.6 Adaptation methods for recognizers
  3.7 Recognizer combination methods
    3.7.1 Majority voting
    3.7.2 Probabilistic combination methods
    3.7.3 EM algorithm
    3.7.4 Multi-stage combinations
    3.7.5 Group-wise classification
    3.7.6 Critic-driven combining
    3.7.7 Behavior-knowledge space method
    3.7.8 Boosting
  3.8 Adaptation methods for recognizer combinations
4 Description of the recognition system
  4.1 Data representation
  4.2 Preprocessing and normalization
  4.3 Distance measures
    4.3.1 Point-to-point distances
    4.3.2 Point-to-line distances
    4.3.3 Area-based distances
    4.3.4 Symbol-string-based distances
  4.4 Prototype set
  4.5 Prototype pruning and ordering
  4.6 Adaptation
    4.6.1 Prototype addition
    4.6.2 Prototype inactivation
    4.6.3 Prototype modification
    4.6.4 Hybrid approaches
5 Palm-top device implementation
  5.1 Motivation
  5.2 Palm-top platform description
  5.3 Implementational Issues
    5.3.1 Stray point removal
  5.4 Computational enhancements
    5.4.1 Dynamic memory allocation
    5.4.2 Floating point calculations
    5.4.3 Computational power
    5.4.4 Data storage space
  5.5 The questionnaire application
    5.5.1 Inputting text
    5.5.2 Determining labels
    5.5.3 Adaptation
    5.5.4 Data collection
  5.6 Speedup methodologies
    5.6.1 Decimation
    5.6.2 Predictive calculation aborting
  5.7 Current level of performance
  5.8 Experiments and results
6 Adaptive committee recognition
  6.1 Motivation
  6.2 Dynamically Expanding Context
    6.2.1 The original DEC principle
    6.2.2 Effects of specificity hierarchies
  6.3 DEC-based adaptive committee classifier
    6.3.1 Overview
    6.3.2 Example of operation
    6.3.3 Options
    6.3.4 Implementation
  6.4 Reference committee classifiers
    6.4.1 Adjusting best committee
    6.4.2 Adjusting majority voting committee
    6.4.3 Modified Current-Best-Learning committee
  6.5 Experiments and their results
    6.5.1 Description of the data sets
    6.5.2 Member classifiers
    6.5.3 Results
  6.6 Future directions
7 Conclusions
  7.1 Palm computing
  7.2 Handwriting recognition in literature
  7.3 True label deduction
  7.4 Future views for the palm-top implementation
  7.5 Adaptive committee recognition
Bibliography

List of Abbreviations

AIME    Adaptive Integration of Multiple Experts
ASSOM   Adaptive-Subspace Self-Organizing Map
BKS     Behavior-Knowledge Space
CBL     Current-Best-Learning
DEC     Dynamically Expanding Context
DTW     Dynamic Time Warping
EM      Expectation-Maximization
FPU     Floating Point Unit
GSM     Global System for Mobile communications
HCR     Handwritten Character Recognition
HMM     Hidden Markov Model
KA      Kind of Area
k-NN    k Nearest Neighbor
LAN     Local Area Network
LVQ     Learning Vector Quantization
MCBL    Modified Current-Best-Learning
ML      Maximum Likelihood
MLP     Multilayer Perceptron
NN      Neural Network
NPL     normalized point-to-line
NPP     normalized point-to-point
OCR     Optical Character Recognition
PAC     Probably Approximately Correct
PDA     Personal Digital Assistant
PDL     Picture Description Language
PL      point-to-line
PP      point-to-point
ppmm    points per millimeter
RAM     Random Access Memory
ROM     Read Only Memory
SA      Simple Area
SCM     Supervised Clustering and Matching
WAP     Wireless Application Protocol
WIN32   32-Bit Windows
WinCE   Windows CE
wpm     words per minute

List of Symbols

Greek letters that could not be recovered from the source are marked [?].

$x$    input feature vector
$\omega_j$    pattern class number $j$
$M_{\omega_j}$    mean of feature vectors in class $\omega_j$
$\Sigma_{\omega_j}$    covariance matrix of feature vectors in class $\omega_j$
$G(x \mid n, s)$    $n$th Gaussian for the feature vector $x$ in state $s$
$V_N$    set of non-terminal symbols
$V_T$    set of terminal symbols
$P$    set of writing rules
#    blank symbol
$X'$    left-right flip of the string $X$
$\mathrm{OCC}(c, X)$    number of occurrences of character $c$ in $X$
$\mathrm{REM}(X, n)$    string obtained from $X$ by removing its first $n$ elements
$D_k$    total distance between prototype $k$ and the input
$d(\cdot)$    distance function
$w(\cdot)$    warping function
$[?]_i$    slope angle of the tangent to the curve at the point $i$
$a_{h,k}$    fuzzy activation value
$\mu^{H}_{h}$    horizontal fuzzy set triangular membership function
$\mu^{V}_{k}$    vertical fuzzy set triangular membership function
$[?]_{ji}$    hardening of probabilities $P(\omega_j \mid x_i)$ to binary values
$N$    total number of classifiers
$P(N)$    probability of a consensus of $N$ classifiers being correct
$p_e$    classifier error rate
$[?]$    set of output labels of a classifier
$E(x)$    final decision of committee
$P_E(x \in \omega_j \mid x)$    probability of $x$ belonging to the class $\omega_j$ in the combiner
$d_k(x, \omega_j)$    distance of $x$ to $\omega_j$ with measure $k$
$P_k(x \in \omega_j \mid x)$    probability of $x$ belonging to class $\omega_j$ in classifier $k$
$f_j(x, w_j)$    output of the $j$th expert
$L$    log likelihood
$F_R(e_k(x))$    rank of the output from recognizer number $k$ for the input pattern $x$
$r_{ik}(x)$    the output rank by recognizer $k$ for class $i$ and input $x$
$C_F(i)$    number of votes for class $i$
$\mathrm{BEL}(m)$    the Bayesian probability for $m$
$C$    total number of classes
$b_j$    assertion from critic number $j$
$K$    number of dimensions of a BKS
$e(j)$    output from classifier number $j$
$w_{jl}(x)$    critic's confidence in its expert's decision
$\mathrm{BKS}(e(1), \ldots, e(K))$    the unit in the BKS where the classifiers give the outputs $e(1), \ldots, e(K)$
$n_{e(1) \ldots e(K)}(\omega)$    total number of samples belonging to class $\omega$ in $\mathrm{BKS}(e(1), \ldots, e(K))$
$T_{e(1) \ldots e(K)}$    total number of incoming samples in $\mathrm{BKS}(e(1), \ldots, e(K))$
$R_{e(1) \ldots e(K)}$    the best representative class of $\mathrm{BKS}(e(1), \ldots, e(K))$
$[?]$    threshold parameter to control BKS solution reliability
$l$    length of the longer side of a character's bounding box
$P(j)$    time warping path point $j$
$p_j$    data point $j$
$S_j$    stroke number $j$
$N_j$    the number of points in stroke number $j$
$D_{PP}(S_1, S_2)$    the point-to-point distance between strokes $S_1$ and $S_2$
$D_{NPP}(S_1, S_2)$    the normalized point-to-point distance between strokes $S_1$ and $S_2$
$D_{PL}(S_1, S_2)$    the point-to-line distance between strokes $S_1$ and $S_2$
$D_{NPL}(S_1, S_2)$    the normalized point-to-line distance between strokes $S_1$ and $S_2$
$(x_i, r)$    the nearest point to the point $y$ on the line between points $x_i$ and $x_{i+1}$
$r$    location parameter
$A_{SA}$    area for polygons when the lines do not intersect
$A_{KA}$    Kind-of-Area distance
$w_{add}$    symbol string addition cost function
$w_{rep}$    symbol string replacement cost function
$w_{rem}$    symbol string removal cost function
$d$    number of discretization directions
$\ell$    discretization distance threshold
$[?]_1$    cost of an addition
$[?]_2$    benefit of adding the same symbol as just one of its neighbors
$[?]_3$    benefit of adding the same symbol as both its neighbors
$[?]_4$    cost of adding a pen-up symbol
$[?]_1$    factor for the effect of length differences in replacement
$[?]_2$    cost of replacing with a pen-up symbol
$[?]$    ratio between the costs of additions and removals
$J(j)$    cluster splitting criterion function
$g(j)$    goodness value of prototype number $j$
$N_{corr}(P)$    count of the times prototype $P$ belonged to the true class
$N_{err}(P)$    count of the times prototype $P$ did not belong to the true class
$m(t)$    reference vector at state $t$
$[?]$    learning coefficient
$\delta(i, j)$    Kronecker's delta function
$c_r$    removal constant for stray point removal
$d_{est}(i)$    predicted total stroke distance at point $i$
$d_i$    the calculated column minimum distance from the dynamic programming algorithm at the point $i$
$D(S)$    total distance of matching stroke $S$
$x(A)y$    the context of $A$
$c_j(i)$    the confidence of classifier $j$ in its decision for $i$

Chapter 1
Introduction

The basic problem of handwriting recognition has been studied for over a century. Initial excursions date back to as early as 1870, when automatic character recognition was proposed as an aid to the visually handicapped (Govindan 1990). Automatic on-line handwriting recognition has likewise been an actively researched problem for four decades.

One could assume that recognizing handwriting is not a really difficult task, as it is basically just a question of deciding which character of the alphabet the input matches, or is most similar to. In theory, that is precisely the case. But even for Optical Character Recognition (OCR) systems dealing with machine-printed characters there is great variation in the sizes, alignments and shapes of possible fonts, more than enough to make the task definitely non-trivial.

As for Handwritten Character Recognition (HCR), the variation is even greater. It can safely be said that probably no two persons' handwriting styles are exactly identical. And as each person writes in several different manners depending on a number of factors such as haste, need for precision and the surrounding environment, the number of different ways of writing any particular character can be taken to be extraordinarily large.

Granted, there are some consistent features in handwritten characters that allow human readers to distinguish between them. But it is also well known that the human brain excels in such recognition tasks (Wang and Gupta 1991), and surely everyone has come across simply indecipherable handwritten notes. This is accepted as simply someone having written in a way that cannot be read. But when the recognition is to be done by a computer, even extremely difficult forms of writing are in general expected to be recognized, or the user will be disappointed with the performance. Thus the task of automatic handwriting recognition is far from trivial.

1.1 Handwriting as an input method

With the introduction of small palm-top devices capable of most common tasks previously reserved for either desktop or laptop computers or the traditional pen and paper, a need to find suitable, natural ways of inputting data has arisen. As a Personal Digital Assistant (PDA) device is too small for a traditional keyboard to be practical, handwriting has been taken on as a viable alternative. Handwriting is a good way of data input, as it is consistent with what people are used to. It can be assumed that everyone using a computer these days has also learned to read and write, so using handwriting as the means of data input would be quite natural.

It has been a long-time dream for pattern recognition enthusiasts to obtain a recognition accuracy good enough for pen-based input to replace the traditional keyboard, even to an extent. But recognition accuracy still remains the main problem. While an experienced typist tolerates an error rate of only up to 1% (Guyon and Warwick 1996), the recognition accuracy for isolated handwritten characters by humans has been observed to be in the range of 94.9% to 96.5% (Neisser and Weene 1960). It has also been found that the user acceptance threshold for handwriting recognition accuracy is approximately 97% (Chang and MacKenzie 1994). This can be interpreted as the need for handwriting recognizers to at the very least reach, if not surpass, human accuracy before the recognition accuracy is deemed acceptable by the users.

In addition, handwriting input speed is in the range of 15-18 words per minute (wpm) (Guyon and Warwick 1996), whereas an average engineer can obtain much higher speeds of approximately 60 wpm when typing a memorandum (Ward and Blesser 1985). Thus handwritten input cannot be expected to replace the keyboard in the foreseeable future in situations where space is not a concern but speed and accuracy of input are. But for some applications where space is an issue, as in palm-top devices, handwriting input is practically the only viable solution. Thus handwritten character recognition is much in need of continued research.

1.2 Aims and overview of this thesis

This thesis is focused on the adaptive on-line handwriting recognition system developed in the Laboratory of Computer and Information Science at the Helsinki University of Technology in co-operation with Nokia Research Center. The recognition system and its features are explained in Chapter 4.

The main objective of this thesis was two-fold. First, a survey and implementation for the palm-top platform were conducted. Second, a survey and research in adaptive committee recognition were performed.

The first part of the thesis consists of a literature survey. In Chapter 2, pen computing and palm-top devices, their features and development, are discussed. The history, current state, usability in terms of problems and benefits, and future views of pen computing are examined. Then in Chapter 3 methods relating to handwritten character recognition are examined, with the focus on on-line handwriting recognition. Recognition methods for individual classifiers as well as classifier combination methods are included.

After outlining the principles of the on-line recognition system in Chapter 4, the implementation of the recognition system on a palm-top platform is discussed in Chapter 5. Some comparisons between the palm-top platform and the large-scale platform used previously are made, and the implemented user interface is explained. The results from some experiments are also shown.

Experiments performed with an adaptive committee combination method based on the Dynamically Expanding Context (Kohonen 1986, Kohonen 1987) are discussed in Chapter 6. The principle behind the implemented adaptive committee and its functionality are explained, and its performance is compared with that of some reference classifiers. Finally, in Chapter 7 the main results are gathered and the conclusions drawn are explained.


Chapter 2
Pen computing

With recent developments in computer hardware it has become possible to manufacture computers not much larger than the human palm. Implementing regular keyboards on devices of such a small size is understandably very impractical. With extremely small keyboards typing becomes very cumbersome, and it requires extreme precision from the user to hit the desired keys. Thus pen-based input, and consequently handwriting recognition, seems to be the most logical form of human-to-computer information transfer for palm-top devices.

Palm-top devices have several advantages in comparison to their larger desktop counterparts, most notably portability and the capability of being used nearly anywhere. An additional benefit is the possibility to connect palm-top devices to desktop computers and transfer data between them, making scheduled appointments or written notes more widely available, all without any need for retyping.

In this chapter the development of pen computing devices and their usual characteristics, both beneficial and restraining, are explored and examined.

2.1 Palm-top computers

The development of pen computing can be seen as having begun when Alan Kay envisioned the Dynabook in 1968 (Meyer 1995). While this was the initial design for a pen-based computer, only a cardboard model was produced. A strong competitor for the title of the first commonly available palm-top device is the Psion Organizer of 1984 (Bray 1999), which led the way for a multitude of products with varying success.

The Dynabook actually saw the light of day when a prototype very similar to the original design was presented by Apple in 1987. This was called the Knowledge Navigator. From the foundation of the Knowledge Navigator the Newton finally appeared in 1992; it was the first in this series to reach mass production (Meyer 1995).

The original devices were quite a distance from the current state of pen computing devices. Nowadays palm-top devices with color displays and tailor-made operating systems such as EPOC32 or PalmOS are commonly available (Bray 1999), but many of the original problems still persist.

According to Bray (1999) the choice of hand-held PCs is increasing constantly, but the struggle for dominance in operating systems is still going on between EPOC, PalmOS and WinCE. It was noted that WinCE devices were the first ones to be equipped with color displays. Significant improvements to the speed and battery life of the WinCE devices have also been seen recently. Still, the large and loyal user base of the 3Com palm-top devices enables them to compete with WinCE-based solutions. In the review, the Casio Cassiopeia E-105, a palm-top running on a 131 MHz NEC MIPS R4000 processor, which makes it the fastest-running WinCE palm-top device, was deemed the best of the currently available palm-top computers (Bray 1999).

A comparison of some features of selected palm-top computers is shown in Table 2.1 (Bray 1999, Everex 2000). Only differentiating characteristics have been entered into the table. All the devices featured a docking cradle for PC connectivity, data synchronization software, stylus-based input, infrared ports and the possibility of mobile phone connectivity with additional equipment. The difference in approach between the PalmOS and WinCE-based devices is clear. The PalmOS-based palm-tops are designed as basic PDAs with limited functionality, while WinCE-based devices in general offer more functionality. The cost of the extra features in WinCE devices is significantly shorter battery life and, in all but the Everex device, larger size and additional weight.

2.2 Benefits of pen-based computers

The first obvious benefit of pen-based palm-top computers is their size. Currently several quite small and light-weight solutions are commercially available.
The sizes ofmost solutions are a eptable in the manner that they are not mu h larger than anaverage alendar or notebook one might think of being repla ed by them.With the in reasing networking and onne tivity of omputers, it is only natural forusers to desire onne tability also from their mobile devi es. This is a feature that hasgenerally been very well implemented for pen-based omputers. The onne tion meth-ods in lude serial or parallel ables, infrared links, devi e-spe i� do king stations andwireless lo al area networks (LAN) (Meyer 1995). Currently also telephone networkingin the form of both modems using GSM networks and WAP appli ations is be oming18

Table 2.1: A omparison of palm-top devi e featuresManufa turer 3Com 3Com Casio CompaqModel Palm Palm Cassiopeia AeroIIIx V E-105 2130OS PalmOS PalmOS WinCE 2.11 WinCE 2.11CPU type Dragonball Dragonball E2 MIPS R4000 MIPS R4000CPU speed 16 MHz 16 MHz 131 MHz 70 MHzROM 2 MB 2 MB 16 MB 12 MBRAM 4 MB 2 MB 32 MB 16 MBDisplay Mono hrome Mono hrome Color ColorResolution 160�160 160�160 320�240 320�240Dimensions (mm) 120�80�15 115�77�10 131�84�20 134�85�20Weight (g) 172 115 255 260HCR Software Gra�ti Gra�ti Jot JotBattery life 2-4 weeks 2-4 weeks 6 h 10 hVoi e a tivation no no no noVoi e re order no no yes yesModem supplied no no no noUpgrade Internal - Compa tFlash -options memory ardsManufa turer Everex HP PhilipsModel Freestyle Jornada NinoExe utive A20 420 500OS WinCE 2 WinCE 2.11 WinCE 2.11CPU type MIPS R4000 Hita hi SH-3 MIPS R4000CPU speed 66 MHz 100 MHz 75 MHzROM 8 MB 8 MB 16 MBRAM 16 MB 8 MB 16 MBDisplay Mono hrome Color ColorResolution 320�240 320�240 320�240Dimensions (mm) 122�82�16 130�81�22 132�86�19Weight (g) 155 250 227HCR Software Jot Jot CalligrapherBattery life 7-8 h 4 h 8 hVoi e a tivation no no yesVoi e re order yes yes yesModem supplied 33.6 K no noUpgrade Compa tFlash Compa tFlash Compa tFlashoptions ards ards ards, 19.2Kmodem19

ommonpla e adding yet another dimension to onne tivity. Espe ially wireless LANsand telephone networking o�er a great deal in the terms of freedom of movement forany portable devi e, as the onne tion an be established anywhere within the rangeof the network.The third notable bene�t, the use of a pen or stylus for input, is also the most problem-ati one. A pen is a very natural way for humans to input information, sin e writing isgenerally taught in s hools and learnt at a young age, at least for people in developed ountries, who oin idently are also the most likely to use su h applian es. Also reasonsbased on human physiology attribute to the use of a pen as a very e�e tive and pre isemethod for input. For humans positioning the pen-tip is highly a urate due to thehigh number of degrees of freedom provided by the �ngers and the relatively large por-tion of the motor- ontrol areas in the brain dedi ated to �nger movement (S homaker1998). This an learly be observed by anyone when omparing the pre ision of inputwhile drawing with a pen in omparison to drawing with a regular mouse. Movementoriginating from the wrist is mu h more oarse as is the quality of the �nal output.When omparing handwriting input with another viable option for palm-top omputerinput, spee h re ognition, it an be seen that handwriting o�ers several advantages.Most notable of these is the fa t that handwriting is not sensitive to surroundingnoise, that an be a real problem for spee h re ognition (Park and Lee 1999). Inaddition, writing input an be used in situations when spee h is deemed inappropriate,for example in meetings. Spee h re ognition also poses problems when editing inputtedtext, whi h on the other hand is quite natural when using pen-based input (Mel et al.1988).2.3 Problems with pen-based omputersWith pen-based omputing and handwritten hara ter re ognition having gained agreat amount of interest, why have su h devi es not be ome ommonpla e? 
The main problem is currently associated with the usability of these devices. This can mainly be attributed to two important aspects hindering user acceptability of pen-based computers, namely recognition accuracy and user interface problems (Schomaker 1994). Also various metaphors, both deliberately and accidentally laid on, can cause false expectations on the actual functionality of the device. One example of common misleading metaphors is the pen-and-paper metaphor.

2.3.1 Recognition accuracy

It has often been stated that the mediocre quality of handwriting recognition has been the most notable obstacle to the success of pen computers (Guyon and Warwick 1996). While this certainly is a major problem, it can also be seen that even reaching human recognition levels may not be acceptable for handwriting recognition (Schomaker 1994). Actually recognition performance might not be the most important factor.

It has been noted that users are likely to judge the value of an application primarily in terms of its appropriateness for the task rather than any individual aspect of functionality (Frankish et al. 1995). Thus, even if the recognition accuracy is not adequate for writing a peer-reviewed paper, the device and application might be completely acceptable and useful for filling out forms, for example.

When comparing handwriting recognition with other viable pen-based input methods, it has been established that the use of an on-screen soft keypad still offers both higher input speeds and lower error rates, 23 wpm versus 16 wpm and 1.1% character errors versus 8.1%, respectively (MacKenzie et al. 1994). The keyboard used the standard QWERTY layout, but it was also noted that tapping on a keyboard with an ABC layout produced still better accuracy at 0.6% errors, but the input speed was notably slower at 13 wpm (MacKenzie et al. 1994). This can clearly be seen as an indication that there is still much room for improvement in recognizer performance. When clearly better figures are obtained with other input methods, the value of the comfortability and naturalness of handwriting might not be enough to tip the scales in its favor.

Another approach taken by many is to constrain the writing by defining a specific set of alphabets designed in a way as to minimize the possibility of confusion between letters. Examples of such are the unistrokes alphabet (Goldberg and Richardson 1993) and its commercial derivation, Graffiti by Palm Computing (Meyer 1995).
These specific alphabet schemes can obtain recognition accuracies very near to 100%, but the significant drawback is the need for the user to learn the modified alphabet. This in turn raises the level of motivation needed to begin using such a device. In addition it can be expected that such alphabets are rather easy to forget during prolonged periods of disuse.

As can be seen, efforts should be made to optimize recognition performance for natural handwriting. This has the notable benefit of enabling a lower initial acceptance threshold by removing the need of learning a specific writing style. The improvement of the recognition performance can be seen as being still quite viable, as a multitude of recent research continues to show that improvements are still possible (Chan and Yeung 1998, Brakensiek et al. 1999, Vuori et al. 1999).

2.3.2 User interfaces

Even though the most prominent problem regarding pen-based computing is in the general opinion poor recognition performance, the fact remains that it alone is not responsible for problems in the use of pen-based computers. Another important factor in the usability of pen-based computers and their software is naturally the user interface. It has been stated that, aside from recognition accuracy, the lack of maturity of user interface technology is the primary factor holding back the acceptance of pen-based computers (Schomaker 1994).

The design of a functional interface for a pen-based application is by no means a trivial task. Even though the use of a stylus for both pointing and textual input is possible, problems may easily arise from determining which input type is intended by the user. For example, when the user draws a horizontal line on some text written, is the intention to activate the text or append a dash?

Such problems can be resolved by constraining the handwriting input to specific areas of the user interface (Chang and MacKenzie 1994, MacKenzie et al. 1994, Meyer 1995). Although in the optimal case there should be no constraints on what and where the user writes and text should be writable along with non-textual input like graphics and gestures (Meyer 1995), in practice this is understandably very problematic.

The user interface should not be a handicap alongside recognition performance, but rather it should be used to improve on and compensate for the defects in the recognition process. As a matter of fact, four methods of helping the recognizer perform better through the user interface can be stated (Schomaker 1994). These methods are:

1. Constraining the writer where possible
2. Giving the writer more control
3. Providing more powerful mechanisms for error handling
4. Clarifying to the user what is going on

The first approach includes, for example, forcing the user to write in boxes or combs for isolated hand-printed characters, or the use of specific gestures, an "ok-button" or time-outs for input validation in the case of word recognition. This can be very beneficial for recognition accuracy since the need for specific segmentation, a task far from trivial in itself, is removed. Giving the user more control indicates that developers should include new methods for handling input material which is unacceptably poorly recognized, for example punctuation marks in many cases. This can be solved by offering the possibility of using tool bars for such symbols. Recovery from errors is also a problem which requires the inclusion of a deep "undo" stack. The last important point mentioned is giving the user more information on the actual occurrences inside the system. Thus, the interface must allow input validation by the user in an unintrusive manner. (Schomaker 1994)

Error recovery is an especially difficult problem for adaptive recognizers, as errors are the most prominent cause for mislearning leading to unnecessary errors. Also discriminating between real errors and changes of mind can be very problematic when functioning with an adaptive recognition system.

In general the user interface is perhaps at a tie with recognition performance for the title of the most important factor in the usability of pen-based computers. As such much more attention should be given to thorough user interface implementations for such devices.

2.3.3 The pen-and-paper metaphor

Ever since the first explorations into the world of pen computing, the pen-and-paper metaphor has been very prominent in assessing pen-based computers. This traditional comparison continues to cause confusion. Paper is still the most popular medium for sketching, note taking and form filling, as it possesses some unrivaled features including, but not limited to, its cheapness, availability, reliability and ease of use. But in larger quantities paper does have its problems: it is much harder to store masses of paper than files on a computer. Also editing previous handwritten notes can be quite tedious. (Guyon and Warwick 1996)

The general perception of the pen-and-paper metaphor indicates a great degree of freedom of use, and the use of this metaphor can give rise to the comparison between a pen-based computer and a rather unco-operative piece of paper (Schomaker 1998). The fact is simply that a computer processes information and is not merely a medium for sketching.
As such, pen-based computing poses constraints but also offers benefits completely out of the range of paper.

2.4 Palm-top HCR systems

Claims on effective handwriting recognition systems for the palm-top devices have also been made but sadly very few scientific publications or other reliable data could be found. For example Advanced Recognition Technologies Inc. boasts very high out-of-the-box recognition accuracy and adaptation to the user's style of writing during use (ART 2000). But, as the recognition system is commercial in nature, no information on how the actual recognition is performed is given. It is merely stated that no dictionary is used, but the "unique linguistic layer enhances accuracy for the English languages" and that the recognizer adapts during use through a special algorithm (ART 2000). With a quick test, the system seems functional and adaptive.

Another example of available commercial recognizers is Jot from Communication Intelligence Corp. This recognizer has gained wide distribution, as it is the standard recognizer on palm-size PCs. The Jot recognition system uses writing in different areas for different character sets to enhance performance. The commercially available Jot Pro offers also a specific trainer to enable the adaptation of the recognizer and the possibility for the user to write anywhere on the screen. (CIC 2000a, CIC 2000b)

Although commercial handwriting recognition systems are still not very numerous, the ones encountered are promising forerunners of functioning handwriting recognition-based input methods available to the general public. It can be expected that the variety and features will increase and be enhanced as competition in the sector increases.

2.5 Future views

It is clear that pen-based computers need to be approached as a field totally its own; the devices have their own merits and their own problems. As long as users have incorrect or unrealistic expectations for the functionality of pen-based computers, they will inevitably be disappointed. This is a dilemma that can only be corrected by either fulfilling these expectations or lowering them by providing a correct frame of reference.

Also the fast pace of progress in microprocessor technology will undoubtedly offer benefits of increased computational power and lower power consumption also to palm-top devices.
A noteworthy recent release in the processor market is the Transmeta Crusoe, which would seem to be able to provide very competitive performance especially when scaled by power consumption (Laird 2000).

With the perennial growth of computational power even more complex and better performing algorithms can be implemented also on small platforms. Thus it is evident that research on handwriting recognition should be continued with the objective of creating new algorithms and optimizing old ones for higher recognition performance rather than fulfilling the speed requirements of current hardware (Guyon and Warwick 1996). As the computational speed increases, algorithms currently seen as being too complex will shortly cause no complications in implementation.

But even when acceptable recognition rates are obtained, the speed of input will inevitably be much lower than that obtainable with traditional keyboard input. This makes it necessary to implement other effective methods of assisting the completion of common tasks. Especially tasks requiring precision such as password entry for user identification would benefit greatly from, for example, using signature verification instead of traditional passwords, as proposed for the PCS Smart Phone (Narayanaswamy et al. 1999). Such simple additions augmenting the usability of a pen-based device may just be one of the deciding factors in their advance to user acceptance.


Chapter 3

Handwritten character recognition

Systems for handwriting recognition can be examined from several perspectives. One major distinction can be made between systems performing the actual recognition either during the writing process (on-line) or from data collected earlier (off-line). Another important distinction can be made between writer dependent and writer independent recognition approaches. The former refers to a situation where the recognition is for example adapted to suit the style of the particular writer even at the cost of generic capabilities. Writer independent classifiers naturally do not develop such characteristics.

Other viewpoints include distinguishing between systems using a single classifier and ones combining several more or less distinct classifiers. Individual classifiers can also have many fundamentally different approaches to the recognition task. Also several varying approaches to representing the data can be isolated. These approaches to handwriting recognition, as well as some aspects of variation in handwriting, will be dealt with in the following sections.

3.1 Time of recognition

Perhaps the most prominent and obvious distinction to be made for recognition systems is regarding the time when the actual recognition is performed. The use of off-line or on-line recognition is usually dictated by the application, but methods from both fields can also be combined to enhance recognition performance (Guberman 1998, Tanaka et al. 1999).


3.1.1 Off-line handwriting recognition

Off-line handwriting recognition can be viewed as a direct subset of Optical Character Recognition (OCR), since recognition is in practice performed from images of written characters (Tappert et al. 1990). In off-line operation no feedback from the user, speed, pressure or directional information is generally available for the recognition process. In most cases the data in use is what can be obtained by scanning writing from a piece of paper. When off-line recognition is performed, the recognition process by default deals with less data than its on-line counterpart. This in turn enables the use of computationally more expensive algorithms in all but time-critical systems dealing with extremely large quantities of data, such as postal address reading systems. Also, in off-line recognition the writing order or overwriting of strokes does not confuse the classifier, which is in contrast to on-line recognition where these are notable problems (Tanaka et al. 1999).

Off-line recognition has several useful applications, for example in postal address reading, processing documents to a computer readable form, automatic writer recognition and signature verification. OCR machines have been commercially available since the middle of the 1950s. (Govindan 1990)

3.1.2 On-line handwriting recognition

On-line recognition has some benefits over off-line recognition, mostly due to the extended amount of information that is obtainable. In addition to the spatial properties of the loci of the stylus also the number of strokes, their order, the direction of writing for each stroke and the speed of writing are available (Tappert et al. 1990). On applicable writing surfaces the pen pressure can also be measured. Another notable advantage for on-line operation is interactivity, which enables instant recognition mistake correction and also more effective adaptation of the system (Tappert et al. 1990).

Due to the abovementioned factors it can be expected that on-line recognition provides better recognition accuracy than off-line. This has been shown in experiments by Mandler et al. (1985), where the Euclidean distance measure was used with dynamic information. In comparison to rasterized images the worst and best writer final error rates decreased from 16.6% to 11.4% and 4.5% to 1.2%, respectively. Another study using the same base data for off-line and on-line methods states recognition rates of 73.8% and 84.8%, respectively (Tanaka et al. 1999). Further discussion is focused on on-line handwriting as that is in keeping with the nature of the application in question in this thesis.
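The additional information available in on-line operation can be pictured with a small data structure. The following Python sketch is only an illustration: the names PenPoint, stroke_length, mean_speed and direction, as well as the choice of units, are assumptions made here and do not describe the recognizer discussed in this thesis.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional, Tuple


@dataclass
class PenPoint:
    x: float
    y: float
    t: float                          # sampling time in seconds (assumed unit)
    pressure: Optional[float] = None  # only on surfaces that measure it


# A stroke is the list of points between a pen-down and the next pen-up;
# a character is an ordered list of strokes, so stroke count and order are kept.
Stroke = List[PenPoint]


def stroke_length(stroke: Stroke) -> float:
    """Length of the pen trace along the sampled points."""
    return sum(hypot(b.x - a.x, b.y - a.y) for a, b in zip(stroke, stroke[1:]))


def mean_speed(stroke: Stroke) -> float:
    """Average writing speed over the stroke (trace length per second)."""
    duration = stroke[-1].t - stroke[0].t
    return stroke_length(stroke) / duration if duration > 0 else 0.0


def direction(stroke: Stroke) -> Tuple[float, float]:
    """Overall writing direction as the vector from first to last point."""
    return (stroke[-1].x - stroke[0].x, stroke[-1].y - stroke[0].y)
```

None of these quantities can be recovered from a scanned image alone, which is the essential difference between the on-line and off-line settings.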

Figure 3.1: Three examples of the character "A" with a varying number of strokes

3.2 Variation in writing

As is intuitively known, there are probably as many ways of writing as there are writers. In addition to individual writing styles, also several completely different character sets are widely used. It is imperative to be aware of the existence and possible manifestations of variation in order to be able to design a natural handwriting recognition system with efficiency. Here some commonly identifiable sources and forms of variation in handwriting are examined.

3.2.1 Strokes

Especially in on-line handwritten characters it is very common to consider a character as consisting of a varying number of elements of writing called strokes. In this thesis strokes are considered to be parts of the character separated by pen-ups, i.e. when the pen is lifted from the writing surface. In Figure 3.1 three examples of the character "A" are presented. The leftmost example has been written with one stroke, as the pen has not been lifted during writing the letter. The second and third examples consist of two and three strokes, respectively.

The number and direction of the strokes is a significant cause for variation in handwritten characters in on-line recognition. For example, the character "E" is commonly written with a number of strokes ranging from 1 to 4. As each of them can also be written in either direction and in any order, a four-stroke "E" alone instantly produces 8 · 6 · 4 · 2 = 384 different permutations for characters with no apparent difference when seen written.
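The figure of 384 can be checked with a short calculation: the first stroke can be any of the four strokes in either of two directions (8 choices), the second any of the remaining three (6 choices), and so on, which equals 4! · 2^4. The function name stroke_variants below is an illustrative assumption.

```python
from math import factorial


def stroke_variants(n_strokes: int) -> int:
    """Number of order and direction combinations for a fixed set of strokes.

    n_strokes! orderings times 2 writing directions per stroke.
    """
    return factorial(n_strokes) * 2 ** n_strokes


print(stroke_variants(4))  # 4! * 2**4 = 24 * 16 = 384
```

The count grows quickly with the number of strokes, and it does not yet include variation in the number of strokes or in shape.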

3.2.2 Character sets

The existing character sets are inherently very different in both their appearance and features. Recognizers have been implemented at least for Latin, Japanese, Chinese, Indian, Arabic, Korean, Cyrillic, Hebrew, Thai, Greek and Berber characters. (Govindan 1990)

Of these perhaps the most challenging are Chinese characters due to the sheer amount of characters in use: approximately 5000 are commonly used of a total of over 50000 (Govindan 1990). Another challenge is posed by the fact that Chinese characters contain an average of 8-10 strokes, the most complicated having more than 30, and the characters can be written in either block or cursive style causing variation even in the number of strokes of the character (Tappert et al. 1990). In comparison, the average number of strokes for Latin upper case letters is 2 and for lower case letters one (Tappert et al. 1990). Another problematic writing system is Japanese, since they daily employ up to 5 different character sets, namely Hiragana, Katakana, Kanji, Roman letters and Arabic numerals (Govindan 1990).

Each character set has its general features, which demand specialized recognition techniques. For example variation in stroke direction and retrace is more common in English than in Chinese (Tappert et al. 1990). Thus the target character set has a great effect on the desired properties of a given recognition system. In further discussion Latin characters will be focused on since the application at hand only deals with this set of characters.

3.2.3 Writing style

Variation in writing style for individual characters can be grouped to variations in both the static and dynamic properties of the written characters. Static properties are, for example, size and shape, and dynamic properties include for example the number and ordering of strokes. (Tappert et al. 1990)

Variation in the size of characters is very common in different mediums.
For example it is very natural to write small letters in small boxes whereas the size is much more prone to vary when no such guidelines are present. Overall, the size of the letters can be controlled greatly by the underlying material or form properties. The shape of the characters may vary depending on the writer's origin; distinct styles can be identified for example for writers from North America or Europe (Nouboud and Plamondon 1990). Also affecting the shape of characters is the inherent tendency for nominally straight strokes to become curved when written by hand (Ward and Kuklinski 1988).

Several factors affect the number of strokes. It is generally acceptable that a specific archetype can be formed using a different number of strokes (Kuklinski 1984). This is mainly due to the human perception that the most important factor in writing is the final appearance, as humans generally only do off-line recognition. Also connecting strokes often occur due to pen lift failures, which are most common in sequential similarly aligned strokes or strokes whose endpoints are near one another (Ward and Kuklinski 1988). Stroke retrace is a problem affecting mainly on-line recognition, as it is not necessarily visible in the final appearance of the character. Two types of retrace can be identified: intentional retrace due to not seeing the output, and connecting segment retrace, which takes place when the pen is repositioned without lifting it (Ward and Kuklinski 1988).

It can also be stated that in most cases writers tend to draw vertical strokes before horizontal ones and left components before right ones (Nouboud and Plamondon 1990). The order of writing strokes is also dependent on the handedness of the writer simply due to the muscle and tendon groups used in writing. Generally, it can be said that most left handers tend to draw horizontal strokes from right to left whereas right handers prefer writing strokes from left to right (Kuklinski 1984).

In addition to inter-writer variation also several writing styles are likely to exist for a single writer. One usually writes in a different way when making personal notes in comparison to writing down something very important for others to read. Especially the speed and motivation to carefulness can greatly affect the writing style (Kuklinski 1984).

Due to these factors the variation in characters is extreme.
It has been estimated that the amount of identifiable ways to write just four stroke upper-case "E"'s would be approximately 23 thousand (Kuklinski 1984), and variants of the upper-case "A"'s have been predicted to near 16 thousand (Ward and Kuklinski 1988).

3.3 Data preprocessing

Handwriting data is usually collected using either electromagnetic/electrostatic or pressure-sensitive tablets onto which the characters are written using a stylus. In on-line recognition the data is most commonly collected into the form of a vector of coordinates, possibly including also pressure information. The points of the vector are equidistant in time due to most data collection devices having a fixed sampling rate. (Nouboud and Plamondon 1990, Tappert et al. 1990)

Due to both the intrinsic variation in handwriting data and inadequacy in the precision of data gathering instruments there is usually need for preprocessing and normalization of gathered data. Preprocessing is mainly used to simplify the tasks of the recognition algorithms (Guerfali and Plamondon 1993). The purposes can be divided into three sub-categories: reducing the amount of information, eliminating imperfections and normalizing handwriting (Guerfali and Plamondon 1993).

As tablets and other data collection devices generally sample information at a fixed frequency, also redundant data is produced. The most obvious case of such are duplicate points, which generally provide no information of interest. Also points closer than a prespecified threshold are often removed, sometimes taking into account regions of greater curvature and avoiding removal in them. (Guerfali and Plamondon 1993)

Re-sampling the data to be equidistant in space rather than time is another common approach (Nouboud and Plamondon 1991, Nathan et al. 1993, Bellegarda et al. 1994, Hu et al. 1996, Rigoll et al. 1996, Connell and Jain 1999). This is a two-edged sword: in some cases it can ease recognition but equidistancing the points also results in the loss of pen speed information. Especially information on the variations in writing speed might be useful in detecting meaningful points of strokes. Generally equidistant sampling will be more beneficial in writer independent systems due to individual writers often writing with different speed characteristics (Bellegarda et al. 1994). This naturally results in the speed information being beneficial especially in writer recognition or identification applications (Yuen 1996).

Imperfection elimination is necessitated by the fact that data acquisition is rarely sufficiently precise. One commonly used method is smoothing, which is used to eliminate hardware problems or erratic hand motion (Guerfali and Plamondon 1993). Smoothing is usually performed by using some averaging scheme over neighboring points (Nouboud and Plamondon 1990, Guerfali and Plamondon 1993).
Some examples of smoothing approaches include the use of a Gaussian smoothing filter (Connell and Jain 1999), a spline smoothing operator (Hu et al. 1996) and a moving average filter (Nouboud and Plamondon 1991).

Another more extreme problem due to hardware inadequacies is the appearance of wild points, which are points a notable distance off the actual writing trace. They can usually be detected by high-velocity variations (Guerfali and Plamondon 1993). Wild point removal can be considered always useful, since the erroneous points do not contain any beneficial information. This preprocessing method is present in many studies (Bellegarda et al. 1994). Thresholding with limits based on hand motions is generally suitable for wild point removal (Guerfali and Plamondon 1993). Also a spatial filter that removes points too far from surrounding points can be successfully used.

One type of imperfection removal is the removal of component connections and hooks at the beginnings or ends of characters. The former manifests itself mainly due to imprecision in pen-up and pen-down detection of the data gathering device (Guerfali and Plamondon 1993). Hooks do not contain any additional information either and are thus beneficial to remove, as often has been done (Bellegarda et al. 1994).

In most cases the area of digitization is larger than the actual letters, so moving the letters to the same origin is an often-used and beneficial step. Commonly used methods are to subtract the mean from all coordinate points or move the center of the bounding box of the character to the origin, which is also called bounding box subtraction. (Srinivasan and Ramakrishnan 1999, Laaksonen et al. 1998b)

It has been stated that size invariance is the key to robust recognition. Three basic approaches to normalization can be identified as multirate-based normalization, ratio-based normalization and simple scaling normalization. In multirate-based normalization signal re-sampling is based on multirate filter theory and can be implemented by a cascade of interpolation and decimation filters. Ratio-based normalization is implemented by stretching or compressing the image by a given ratio. Simple scaling normalization on the other hand consists of taking the bounding box of the character and scaling the bounding box, and thus also the character, to a predefined size. (Srikantan et al. 1995)

3.4 Prototype selection

When using a classifier based on prototype matching, selection of the prototypes is critical to recognition accuracy. Some important characteristics for efficient prototypes are (IBM 1991):

1. Sufficient coverage
2. Reasonable separation in prototype space
3. Each prototype should be a good representation of a way of writing a character
4. Mavericks should be avoided

Sufficient coverage dictates that the set of prototypes should include at least one prototype for each distinct way of writing a character, which is not an easy requirement to fulfill due to the extreme variation possible (Kuklinski 1984).
Mavericks are prototypes formed from distorted characters and as such have very poor generalizational capabilities. Such prototypes can be very harmful to recognition, as they are most likely to cause only false recognitions. The problem arises when attempting to detect mavericks; simply rarely used prototypes might be accurate, though rare, representations, and eliminating such will hamper recognition accuracy.
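Before moving on to the recognition methods, some of the preprocessing steps of Section 3.3 can be made concrete with a minimal Python sketch. It covers duplicate removal, re-sampling to points equidistant in space, moving-average smoothing, bounding box subtraction and simple scaling. It is not the implementation used in this thesis: all function names, the window size and the choice to preserve the aspect ratio when scaling are assumptions made for illustration.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


def remove_duplicates(points: List[Point]) -> List[Point]:
    """Drop points identical to their predecessor."""
    out = points[:1]
    for p in points[1:]:
        if p != out[-1]:
            out.append(p)
    return out


def resample_equidistant(points: List[Point], step: float) -> List[Point]:
    """Re-sample so that consecutive points lie `step` apart along the trace."""
    if not points:
        return []
    out = [points[0]]
    carried = 0.0  # trace length covered since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = hypot(x1 - x0, y1 - y0)
        if seg == 0.0:
            continue
        pos = step - carried  # distance along this segment to the next point
        while pos <= seg:
            r = pos / seg
            out.append((x0 + r * (x1 - x0), y0 + r * (y1 - y0)))
            pos += step
        carried = seg - (pos - step)
    return out


def smooth(points: List[Point], half_window: int = 1) -> List[Point]:
    """Moving-average smoothing; the window shrinks at the stroke ends."""
    out = []
    for i in range(len(points)):
        lo = max(0, i - half_window)
        hi = min(len(points), i + half_window + 1)
        n = hi - lo
        out.append((sum(p[0] for p in points[lo:hi]) / n,
                    sum(p[1] for p in points[lo:hi]) / n))
    return out


def center_on_bounding_box(points: List[Point]) -> List[Point]:
    """Bounding box subtraction: move the box center to the origin."""
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    return [(x - cx, y - cy) for x, y in points]


def scale_to_size(points: List[Point], size: float = 1.0) -> List[Point]:
    """Simple scaling: the longer bounding box side becomes `size`.

    Keeping the aspect ratio is an assumption of this sketch.
    """
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    longest = max(max(xs) - min(xs), max(ys) - min(ys))
    if longest == 0:
        return list(points)
    k = size / longest
    return [(x * k, y * k) for x, y in points]
```

Note that resample_equidistant discards the timing of the points, which is exactly the loss of pen speed information mentioned in Section 3.3.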

3.5 Recognition methods for individual recognizers

Here some approaches to character recognition for individual classifiers are discussed. The techniques have been divided into statistical, syntactic and structural, time warping, neural network and fuzzy recognition methods due to the fundamental differences between the approaches.

3.5.1 Statistical methods

The general approach in statistical classification is to choose the class with the highest probability of correctness for the input sample. Some ways of decision rule formulation for statistical classifiers include converting the a priori probability of a class $P(\omega_i)$ into the a posteriori probability $P(\omega_i|x)$ and formulating a measure of expected classification error, or risk, and a decision rule to minimize that risk (Schalkoff 1992).

Perhaps the most common statistical approaches are methods related to Bayesian decision rules (Cao et al. 1992, Cheung et al. 1998). One example by Arakawa (1983) uses Fourier coefficients of pen-point movement loci in strokes as feature vectors. The actual classification is then based on the Bayesian decision rule by assuming that the distribution of the feature vector $x$ consisting of the Fourier coefficients of all the character's strokes is multivariate normal and using a discriminant function of the form

$$g(x; \omega_j) = \frac{1}{(2\pi)^{n/2} |\Sigma_{\omega_j}|^{1/2}} \exp\left[-\frac{1}{2} (x - M_{\omega_j}) \Sigma_{\omega_j}^{-1} (x - M_{\omega_j})^T\right], \tag{3.1}$$

where $M_{\omega_j}$ is the mean and $\Sigma_{\omega_j}$ the covariance matrix of the feature vector $x$ in class $\omega_j$. In that study two discriminant conditions were used. First, it was assumed that there is no correlation among strokes composing a character, but the Fourier coefficients expressing each stroke have a correlation among the components. In the second discriminant condition, component variances in a feature vector are taken into account but the covariances between components are neglected.

This approach was tested with 25 writers in the set of learning samples and 10 other writers were used for testing. Each subject wrote five characters for each category.
The recognition rate was 99% for the first discriminant form and 97% for the second.

As is with most methods, statistical recognition can also be combined with other approaches. For example Loy and Landay (1982) used a structural, chain-code-based, approach to prune the potential matches. After this a statistical classification scheme based on Gaussian distributions was used to decide between various classes in the category. With both the learning and test samples written by the same writer, final recognition rates of above 98% were obtained.
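To illustrate how a discriminant function of the form (3.1) leads to a decision, the following Python sketch evaluates its logarithm under the simplification of the second discriminant condition, where only the component variances are kept so that the covariance matrix is diagonal. The sketch is an assumption-laden illustration and not Arakawa's implementation: the function names, the dictionary layout of the class models and the assumption of equal a priori probabilities are introduced here.

```python
from math import log, pi
from typing import Dict, List, Tuple


def log_discriminant(x: List[float], mean: List[float], var: List[float]) -> float:
    """Logarithm of Eq. (3.1) for a diagonal covariance matrix.

    With a diagonal matrix the determinant is the product of the variances
    and the quadratic form reduces to a sum over the components.
    """
    n = len(x)
    log_det = sum(log(v) for v in var)
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))
    return -0.5 * (n * log(2 * pi) + log_det + quad)


def classify(x: List[float],
             classes: Dict[str, Tuple[List[float], List[float]]]) -> str:
    """Choose the class with the largest discriminant value.

    `classes` maps a class label to (mean vector, variance vector).
    Equal a priori probabilities are assumed.
    """
    return max(classes, key=lambda label: log_discriminant(x, *classes[label]))


# Two hypothetical classes in a two-dimensional feature space.
models = {"a": ([0.0, 0.0], [1.0, 1.0]), "b": ([3.0, 3.0], [1.0, 1.0])}
print(classify([0.5, 0.2], models))  # closest to the mean of class "a"
```

Working with the logarithm leaves the chosen class unchanged, since the logarithm is monotonic, and avoids numerical underflow when the feature vector has many components.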

Hidden Markov Models

The concept of the Hidden Markov model (HMM) was introduced already in the late 1960's but has become increasingly popular in the 1980's. The attribute "hidden" in the name originates from the fact that a HMM is a doubly embedded stochastic process in which the underlying stochastic process is not observable. (Rabiner 1989)

According to Rabiner (1989) there are three problems in constructing a HMM useful in real-world applications. First, given a model and a sequence of observations, how can the probability that the observed sequence was produced by the model be calculated? This is called the evaluation problem, and the most practical solution is to test the available models and choose the one that best matches the observations. Second, how is a state sequence that is optimal in some meaningful sense chosen? To this problem no real answer can be given, as the hidden nature of HMMs inhibits the detection of the correct sequence in all but the most degenerate models with a single state. In practice this can be solved as well as possible by using an optimality criterion. Unfortunately, due to the sheer amount of possible criteria, choosing the right one, while of utmost importance, is no easy task. Finally, how should the model parameters be adjusted to maximize the probability of the observation sequence? This can be done by using various algorithms for training the models.

The use of HMMs in handwriting recognition has followed from successful speech recognition applications (Rabiner 1989). Speech recognition is in principle quite similar to connected cursive script recognition; both deal with a continuous signal over time, the items to recognize are well defined and the realized shape of a single object, whether a phoneme or character, depends on its neighbors (Starner et al. 1994).
Thus it is only natural to apply methods from one field to the other.

Usually HMMs are created for each allograph, or model for a way of writing a character, sufficiently represented in the training set (Nathan et al. 1993, Bellegarda et al. 1995). According to Brakensiek et al. (1999) HMM modeling approaches can, in general, be divided into three categories. The most commonly used approach is the continuous HMM. In the continuous form the feature vector $x$ emission probability in the state $s$ is usually described by a mixture of Gaussians,

$$p(x|s) = \sum_{n=1}^{N} p(n|s) \cdot G(x|n, s), \tag{3.2}$$

where $p(n|s)$ is the weighting factor for the $n$th Gaussian $G(x|n, s)$ (Brakensiek et al. 1999). Especially in small databases the interpolating effect of continuous HMMs can be very beneficial (Rigoll et al. 1996).
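The mixture in (3.2) can be evaluated directly. The Python sketch below computes the emission probability of one state as a weighted sum of Gaussian densities. Diagonal covariance matrices, the function names and the representation of a mixture as a list of (weight, mean, variance) triples are assumptions made for this illustration.

```python
from math import exp, pi, sqrt
from typing import List, Tuple

Component = Tuple[float, List[float], List[float]]  # (p(n|s), mean, variances)


def gaussian(x: List[float], mean: List[float], var: List[float]) -> float:
    """Density G(x|n,s) of a Gaussian with diagonal covariance."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= exp(-0.5 * (xi - mi) ** 2 / vi) / sqrt(2 * pi * vi)
    return density


def emission_probability(x: List[float], mixture: List[Component]) -> float:
    """p(x|s) as in Eq. (3.2): weighted sum over the N mixture components."""
    return sum(weight * gaussian(x, mean, var) for weight, mean, var in mixture)


# A hypothetical state with two equally weighted one-dimensional components.
state = [(0.5, [0.0], [1.0]), (0.5, [2.0], [1.0])]
p = emission_probability([1.0], state)
```

The weights of one state are expected to sum to one, so that the mixture is itself a valid probability density.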

The second approach mentioned by Brakensiek et al. (1999) is to use discrete HMMs by first processing the feature vector by a vector quantizer and approximating the probability with the label generated by the quantizer. Discrete parameterization helps reduce the amount of training data needed at the cost of the loss of information during the quantization step (Bellegarda et al. 1995). It has also been noted that discrete models can, surprisingly enough, lead to better results than their continuous counterparts when the size of the database increases (Rigoll et al. 1996).

The third approach is to use a hybrid method, where the HMM can be augmented by, for example, a neural network working as either the approximator of the probability function in the continuous case, or the vector quantizer in the discrete case. Such an approach has been explored by Brakensiek et al. (1999), where a neural network vector quantizer was used. Rather encouraging error rate reductions of 40% in on-line recognition and 20% in off-line recognition were obtained when replacing the k-means-based vector quantizer with a neural-network-based one.

3.5.2 Syntactic and structural methods

A major benefit of using syntactic or structural approaches to recognition comes from their hierarchical nature. They inherently divide a large, complicated pattern recursively into smaller sub-patterns until given primitives are found. This enables the use of powerful data structures using grammatical rules, trees or directed labeled graphs. This approach is intrinsically similar to powerful artificial intelligence problem solving approaches, where a large, complicated problem is divided into subproblems for easier solving. This methodology can be considered closest to the intuition of humans, a merit based on the fact that humans possess the best pattern recognition skills when dealing with handwriting recognition (Wang and Gupta 1991).

According to Wang and Gupta (1991) common methods for describing the relations between line-drawing patterns include the Picture Description Language (PDL), tree grammars and array grammars. In principle PDL consists of labeling each primitive at two distinguishing points, the tail and the head. The primitive can be linked or concatenated to other primitives only at either of these points. The primitives consist of short and long straight lines in different orientations. An example for the character "A" is shown in Figure 3.2.

Strings can be generalized to trees by extending one-way concatenation to multiple linkings. If a pattern can conveniently be described by a tree, it is also easily generated by a tree grammar. An example of tree grammars can be found in (Wang and Gupta 1991).

Figure 3.2: Naming of the PDL symbols and the PDL expression for "A"

An isometric array grammar is a quintuple

\[ G = (V_N, V_T, P, S, \#), \tag{3.3} \]

where $V_N$ is a finite non-empty set of non-terminal symbols, $V_T$ is a finite non-empty set of terminal symbols, $V_N \cap V_T = \emptyset$ and $\# \notin (V_N \cup V_T)$ is the blank symbol. $P$ is a finite non-empty set of rewriting rules $\alpha \to \beta$, where the arrays $\alpha$ and $\beta$ are geometrically identical over $V_N \cup V_T \cup \{\#\}$ and $\alpha$ is not all #'s, and $S \in V_N$ is the starting symbol.

The distance between two arrays $X$ and $Y$ over $V_T$ can be defined as the smallest number of error transformations (insertions, substitutions and deletions) required to derive $Y$ from $X$. An example of this process is shown in Figure 3.3. The distance definition is generally not sufficient for efficient and unambiguous calculation of the distance, so further refinements in the form of augmented grammars have been developed (Wang and Gupta 1991).

Many structural methods are based on the existence of a predefined set of primitives into which the character is to be divided. Commonly used primitives include the five Berthod and Maroy primitives (Wang and Gupta 1991):

1. straight line
2. positive (counterclockwise) curve
3. minus (clockwise) curve
4. pen lift
5. cusp

Figure 3.3: Array grammar example of the derivation of Y from X, showing the array after addition, after deletion and after substitution

Figure 3.4: Freeman's octal chain code primitives

In addition to these, some new primitives have been introduced and used. They include quarter-circles, semicircles, full circles and points (Chan and Yeung 1998, Khan and Hegt 1998).

To perform the actual difference calculation, in most cases matching methods deal with the representations directly. Due to variation in handwriting, finding absolute matches is in many cases impossible, and thus several matching schemes which allow variation have been developed. Two main approaches are elastic structural matching schemes (Chan and Yeung 1998, Khan and Hegt 1998, Sherkat et al. 1999) and deforming models (Filatov et al. 1995, Cheung et al. 1998). The process is basically very similar in both cases: either the input characters or the models are modified according to a defined set of rules, and the amount or total cost of the modifications is taken as the difference between the matched characters.

A commonly used subset of structural representations is Freeman's code and its extensions and modifications (Wang and Gupta 1991). The original approach is to code the given character as a symbol string by using the eight directions depicted in Figure 3.4. An example of a Freeman-coded 'C', with the resulting code word 34566701, is shown in Figure 3.5. The downside to the use of basic equidistant chain code methods is the loss of the dynamic velocity data of the writing (Loy and Landay 1982).
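The eight-direction coding can be sketched as below. The numbering of the directions is an assumption, since it cannot be read from the text alone: direction 0 is taken to point along the positive x axis and the numbers increase counterclockwise in 45 degree steps, with the y axis pointing up. The function name `freeman_code` is hypothetical, and no resampling to equidistant points is performed.

```python
import math


def freeman_code(points):
    """Return the 8-direction chain code of a polyline of (x, y) points.

    Assumed convention: 0 = positive x axis, counterclockwise numbering.
    """
    code = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # a repeated point carries no direction
        angle = math.atan2(dy, dx)  # angle of the segment in radians
        # Quantize to the nearest multiple of 45 degrees, wrapped to 0..7.
        code.append(int(round(angle / (math.pi / 4))) % 8)
    return code


# A coarse 'C' written from its upper right end to its lower right end.
c_shape = [(3, 2), (2, 3), (1, 3), (0, 2), (0, 1), (0, 0), (1, -1), (2, -1), (3, 0)]
print(freeman_code(c_shape))  # [3, 4, 5, 6, 6, 7, 0, 1]
```

Under the assumed convention this coarse 'C' yields the same digit sequence as the code word quoted above.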

Figure 3.5: The character 'C' and its Freeman's code word 34566701

Figure 3.6: Position coding used by Nouboud et al.

One chain code approach, used by Nouboud and Plamondon (1991), adapts Freeman's code to 12 evenly spaced directions and adds a scheme for coding positions (Figure 3.6). The string comparison is performed by a specialized string comparison processor, the PF474. The comparison value between the words $S$ and $T$ is

\[ \mathrm{comp}(S, T) = \frac{2C(S, T) + 2C(S', T')}{|S|^2 + |S| + |T|^2 + |T|}, \tag{3.4} \]

where

\[ C(S, T) = \sum_{n=0}^{\infty} \sum_{c \in CS} \min\bigl(\mathrm{OCC}(c, \mathrm{REM}(S, n)), \mathrm{OCC}(c, \mathrm{REM}(T, n))\bigr) \tag{3.5} \]

and $|X|$ denotes the length of $X$, $X'$ the left-right flip of $X$, $\mathrm{OCC}(c, X)$ is the number of occurrences of character $c$ in $X$ and $\mathrm{REM}(X, n)$ is the string obtained from $X$ by removing its first $n$ elements. In a test with 4 writers and 59 character classes, an average recognition accuracy of 96% was obtained when the position code was not used. This improved to 98% with the addition of positional coding.

3.5.3 Time warping

Time warping is another method of elastic matching that originates from speech recognition (Kurtzberg and Tappert 1981). Time warping is often used as a distance calculation method for template matching (Lu and Brodersen 1984). Generally a sequence of metric points is constructed for the input character and it is matched against those of the prototypes. $D_k$, the overall distance between prototype $k$ and the input, can be obtained as (Kurtzberg and Tappert 1981)

\[ D_k = \min_{w(i)} \sum_{i=0}^{N} d(i, w(i), k), \tag{3.6} \]

where $w$ is a warping function that maps the time index of the input model to that of the prototype and $d$ is the distance function. Equation (3.6) can be efficiently and optimally solved with dynamic programming by using the recursion relation (Kurtzberg and Tappert 1981)

\[ D(i, j, k) = d(i, j, k) + \min \begin{cases} D(i-1, j, k) \\ D(i-1, j-1, k) \\ D(i-1, j-2, k) \end{cases}, \tag{3.7} \]

where $D(i, j, k)$ is the cumulative distance up to the point $(i, j)$, and thus the overall distance is $D_k = D(N, w(N), k)$. By starting with $D(1, 1, k) = d(1, 1, k)$ and $D(i, j, k) \to \infty$ elsewhere, the minimum of all possible paths through the model can be obtained.

An example of this approach from Tappert (1984) used four parameters to describe each point. The parameters were the slope angle $\phi_i$ of the tangent to the curve at the point $i$, the height of the point $y_i$, measured from the current baseline, and the horizontal and vertical offsets from the center of gravity, $x_i$ and $\tilde{y}_i$, respectively. Based on these parameters the point distance was defined as

\[ d(i, j, k) = \min\{ |\phi_i - \phi_{j,k}|, 360^\circ - |\phi_i - \phi_{j,k}| \} + |y_i - y_{j,k}| + |x_i - x_{j,k}| + |\tilde{y}_i - \tilde{y}_{j,k}|. \tag{3.8} \]

This approach was tested using 26 upper and 26 lower case Latin characters and 10 digits. Six writers were used and all wrote four specified text fragments of a total of 325 characters. The first text was used for initial training and the last one for measuring the accuracy, with the middle two used for updating. An average recognition accuracy of 94.1% was obtained, which was increased to 98.3% when using only upper case, lower case or digits at a given time.

3.5.4 Neural Networks

Neural Networks (NNs), a commonly used approach in pattern recognition in general, are also a popular method in handwritten character recognition. NNs can be thought of as storing prototype data, but instead of using actual prototypes the information is implicitly stored in the weights of the individual neurons.

A notable benefit of this approach is the fact that neural networks can be used without preprocessing the data (Mozayyani et al. 1998, Zhang et al. 1999). The black-box nature of NNs is not entirely advantageous, as this also makes it nearly impossible to break down possible sources of errors. Also the need for a very large training set is a severe drawback to NN use, as especially adaptivity is very hard to implement.

A traditional approach is to use Multilayer Perceptron (MLP) networks and back-propagation for training, but this suffers from a slow convergence rate (Annadurai and Balasubramaniam 1996). Several improved schemes for faster training and better recognition results have been proposed, for example a supervised feed-forward fuzzy neural classifier (Annadurai and Balasubramaniam 1996), enriching artificial neurons with spatio-temporal coding (Mozayyani et al. 1998) and using an Adaptive-Subspace Self-Organizing Map (ASSOM) (Zhang et al. 1999).

3.5.5 Fuzzy classification

Due to its capability of managing uncertainty and indecision, fuzzy set theory can be used for both feature extraction and classification. Most common approaches break handwritten characters into component features, which are described in terms of linguistic variables and fuzzy sets. After this, fuzzy rules are created to describe prototype characters, and recognition is performed by comparing features derived from the input model with features of the models defined in the fuzzy rule base (Frosini et al. 1998, Lazzerini and Marcelloni 1999).

A fuzzy approach deriving linguistic expressions describing individual characters from a fuzzy model of a set of character samples is described by Frosini et al. (1998). First the image is partitioned two-dimensionally.
Then both horizontal and vertical linguistic models are built, and an activation value $a_{h,k}$ is calculated for each rectangle defined by the supports of the Cartesian product of the horizontal and vertical fuzzy sets,

\[ a_{h,k} = \frac{\sum_{i=1}^{P} \mu^H_h(x_i) \, \mu^V_k(y_i)}{\sum_{i=1}^{P} \mu^H_h(x_i)}, \tag{3.9} \]

where $\mu^H_h$ and $\mu^V_k$ are the triangular membership functions of the horizontal and vertical fuzzy sets, respectively.
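The activation value of Equation (3.9) can be sketched for a single pair of fuzzy sets as follows. This is only an illustrative sketch: the parameterization of the triangles by `(left, peak, right)`, the reading of $P$ as the number of sample points, the zero result for an empty denominator and the function names are all assumptions rather than details taken from Frosini et al. (1998).

```python
def triangular(left, peak, right):
    """Return a triangular membership function with support (left, right)."""
    def mu(v):
        if v <= left or v >= right:
            return 0.0
        if v <= peak:
            return (v - left) / (peak - left)
        return (right - v) / (right - peak)
    return mu


def activation(points, mu_h, mu_v):
    """a_{h,k} of (3.9) for one horizontal and one vertical fuzzy set."""
    numerator = sum(mu_h(x) * mu_v(y) for x, y in points)
    denominator = sum(mu_h(x) for x, _ in points)
    # Assumption: a rectangle with no horizontal support gets activation 0.
    return numerator / denominator if denominator > 0 else 0.0
```

Calling `activation` once per pair of horizontal and vertical fuzzy sets gives one value for each rectangle of the partition.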

3.6 Adaptation methods for recognizers

The most common way of adapting a recognizer has been through an initialization phase at the start of use. During it the prototype set is adapted for that individual user. This can be done either by having the user input all of the prototypes to be used (Lu and Brodersen 1984, Tappert 1984, Mandler et al. 1985, Nouboud and Plamondon 1991) or by retraining the available prototype set (Connell and Jain 1999).

The adaptation of a recognizer and its viability depend greatly on the method of recognition used. In most cases there is a prototype set which is adapted to suit the writing style of the user. Adaptation could also be implemented in systems based on other approaches, for example by adjusting weights of underlying neurons or modifying HMMs. The problems in adapting such systems arise mainly from the fact that a much larger data set is needed, as a single example is not sufficient for estimating model parameters (Rabiner 1989, Annadurai and Balasubramaniam 1996). Also the time needed for retraining would probably be unacceptable. However, a scheme where individual HMMs are retrained if the number of examples matched to that particular model is above a threshold has been shown to be effective (Connell and Jain 1999).

The adaptation techniques for a prototype or template set can be divided into three sub-categories. First, prototypes can be added. Second, the existing prototypes may be modified according to some appropriate scheme. Last, prototypes can be removed from the set or inactivated.

In general, the addition of new prototypes is necessary when ways of writing not yet included in the prototype set are introduced.
Tappert (1984) used an approach where, while the recognizer was in update mode, prototypes were added if the character was misrecognized or the ratio between the matching distance to the first and second prototype candidates was within a given range of tolerance.

Nouboud and Plamondon (1991) suggest a scheme where the user can alter the prototype set during use. This is done by allowing the user to define a new character by writing 16 specimens. The user can also delete any character from the prototype set.

In (Schomaker et al. 1994) a probabilistic scheme is utilized. A prototype usage histogram is constructed and by using discriminant analysis this histogram is reduced in dimension, thus removing less used, or not probable, prototypes. It was noted that even with the reduced number of prototypes recognition accuracy remained good, which clearly suggests that the number of prototypes in use could be reduced without negative effects on recognition performance.

Laaksonen et al. (1998a) and Liu and Nakagawa (1999) have proposed prototype modification schemes based on Learning Vector Quantization (LVQ) (Kohonen 1997). It was noted by Liu and Nakagawa (1999) that two prototype modification schemes yielded exceptionally good results. The first was a scheme based on generalized LVQ, where the closest correct prototype and the closest incorrect one are always updated in an optimization frame. The second learning scheme was based on minimizing classification error by gradient descent where a restricted number of prototypes were updated for each input pattern.

A scheme of adaptation where the system can learn the user's writing after both correct and incorrect recognition can be found in Qian (1999). Due to the word-based nature of the procedure in question, the adaptation scheme is based on improving the best interpretation of the word in question. Three adaptation approaches, adding a template, replacing a template and adjusting a template, are used.

3.7 Recognizer combination methods

When one studies the outputs of several classifiers it can often be noted that the errors are not necessarily overlapping. Since the primary objective of any recognition system is to achieve the best attainable performance, the possibility of combining different classifiers to enhance overall performance has arisen. The two main reasons for combining classifiers are desired increases in efficiency and accuracy (Kittler et al. 1998).

In order to increase efficiency, multistage combination rules are often used, where the input is first classified by a simple classifier using a small set of cheap features and a reject option, and a more powerful classifier then handles the more difficult samples (Kittler et al. 1998). Another approach is to use the simple classifier to prune the number of possible matches and have a more complex approach decide the final matching (Rahman and Fairhurst 1997c). A schematic diagram of such multistage, or serial, approaches is shown in Figure 3.7(a).

For increasing accuracy, results from parallel classifiers using the same input can be combined.
Designing and combining the classifiers to reinforce each other is of utmost importance in attempting to improve performance (Suen et al. 1992). A schematic diagram of the parallel approach is shown in Figure 3.7(b).

In the following sections some recognizer combination methods are examined. As an all-inclusive selection is out of the scope of this thesis, only some of the most common and promising approaches are focused on.

Figure 3.7: General structures of parallel and serial committees: (a) general serial committee structure, (b) general parallel committee structure

3.7.1 Majority voting

Majority voting is a very simple and elegant scheme of classifier combination for performance increase. The basic approach is to have several independent classifiers output their results and then run a vote on these results, the result having most classifiers behind it being the output of the committee. Despite its simple nature majority voting has been shown to be very effective (Lam and Suen 1994).

The majority voting decision scheme can be written as (Kittler et al. 1998)

\[ \text{assign } Z \to \omega_j \text{ if } \sum_{i=1}^{N} \Delta_{ji} = \max_{k=1}^{C} \sum_{i=1}^{N} \Delta_{ki}, \tag{3.10} \]

in which the sum on the right hand side equals the count of the votes received for the class $k$ of a total of $C$ classes from the $N$ recognizers. $\Delta_{ki}$ is defined by hardening the a posteriori probabilities $P(\omega_k \mid x_i)$ to binary values as

\[ \Delta_{ki} = \begin{cases} 1 & \text{if } P(\omega_k \mid x_i) = \max_{j=1}^{C} P(\omega_j \mid x_i) \\ 0 & \text{otherwise.} \end{cases} \tag{3.11} \]

Lam and Suen (1994) studied the theoretical foundations of majority voting in order to gain a deeper understanding of why it works. It is presumed that $N$ classifiers are used and each is assumed to vote, the vote having either of two values, correct or incorrect. The basic benefit of the majority voting scheme is that in order for the voting decision to be incorrect a majority of the votes must be for the same incorrect answer. Due to the large number of possible mistakes, it can be expected that the majority of the recognizers would not often simultaneously make the same mistake.
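The decision rule of Equations (3.10) and (3.11) can be sketched in a few lines. This is a minimal sketch under stated assumptions: classes are represented by list indices, each classifier supplies a list of a posteriori probabilities, and ties are resolved in favor of the class that was voted for first, a detail the equations leave open. The function names are hypothetical.

```python
from collections import Counter


def harden(posteriors):
    """Index of the class with the largest a posteriori probability (3.11)."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k])


def majority_vote(all_posteriors):
    """Return the class with the most votes over all classifiers (3.10)."""
    votes = Counter(harden(p) for p in all_posteriors)
    winner, _count = votes.most_common(1)[0]
    return winner
```

For example, `majority_vote([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.7, 0.2, 0.1]])` returns class 0, since two of the three classifiers rank it first.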

Assuming that the $N$ individual experts have the same and mutually independent probability $p$ of being correct, the probability $P(N)$ of the consensus being correct can be computed using the binomial distribution

\[ P(N) = \sum_{m=k}^{N} \binom{N}{m} p^m (1-p)^{N-m}, \tag{3.12} \]

where $k = N/2$ is the margin of majority. It is also shown that if the individual recognition accuracy probabilities are above a threshold of $p_u = 0.8090$, the ordering $P(2N) < P(2N-1) < P(2N+2) < P(2N+1) < P(2N+4)$ holds for all $N$. It has also been shown that if all the experts have error rates $p_e < 0.5$, the overall correct decision rate increases with $N$ (Miller and Yan 1999b).

3.7.2 Probabilistic combination methods

The Bayesian approach can be considered to be the most basic probabilistic combination method. As the name already suggests, the basis for Bayesian probability calculations is Bayes' rule

\[ P(\omega_i \mid x) = \frac{P(x \mid \omega_i) P(\omega_i)}{\sum_{j=1}^{m} P(x \mid \omega_j) P(\omega_j)}. \tag{3.13} \]

The use of Bayesian combining for classifiers in general has been examined by Xu et al. (1992). For the Bayesian approach the classifiers must output results at the measurement level, i.e. classifier $e$ attributes to each label $j$ in the set $\Lambda$ of output labels a measurement value that addresses the degree to which the sample $x$ has the label $j$. With the Bayesian approach the final decision $E(x)$ is generally of the form

\[ E(x) = j, \text{ with } P_E(x \in \omega_j \mid x) = \max_{i \in \Lambda} P_E(x \in \omega_i \mid x), \tag{3.14} \]

where $P_E(x \in \omega_i \mid x)$ is the probability of $x$ belonging to the class $\omega_i$ in the combiner. Generally any classifiers for which some kind of apparent posterior probabilities can be computed can be combined by means of (3.14). For example, in the case of a distance classifier $e_k$ where $x$ is classified according to a distance measure $d_k(x, \omega_i)$, the probability of $x$ belonging to $\omega_i$ could be calculated, for example, with

\[ P_k(x \in \omega_i \mid x) = \frac{1/d_k(x, \omega_i)}{\sum_{j=1}^{M} 1/d_k(x, \omega_j)}. \tag{3.15} \]

The probabilities could then be, for example, averaged to produce the combiner probability,

\[ P_E(x \in \omega_i \mid x) = \frac{1}{N} \sum_{k=1}^{N} P_k(x \in \omega_i \mid x), \tag{3.16} \]

and these values again used through (3.14). This would be even simpler with Bayesian classifiers, as they directly produce the needed probabilities in (3.16).

The aspect of producing probabilities from recognizer scores has also been addressed by Bouchaffra and Govindaraju (1999). Due to the difficulty in calculating the state conditional probability $P(x \mid \omega_i)$ in (3.13), the classic probability estimate as the frequency of an event in a number of trials is used. This is done by counting the number of times $\omega_i$ is the top choice in $|\mathcal{X}|$ trials, where $|\mathcal{X}|$ is the number of input patterns in the set $\mathcal{X}$. Thus the conditional probability can be estimated as

\[ P(\omega_i \mid x) \approx \hat{P}(\omega_i \mid x) = \frac{\sum_{u \in \mathcal{X}} \delta_{u\omega}(\omega_i)}{|\mathcal{X}|}, \tag{3.17} \]

where

\[ \delta_{u\omega}(\omega_i) = \begin{cases} 1, & \text{if } \omega_i \text{ is the top choice given } u \in \mathcal{X} \\ 0, & \text{otherwise.} \end{cases} \tag{3.18} \]

Kang and Lee (1999) present an approach to combine classifiers by minimizing the Bayes error rate using higher order dependencies. The presented second order approach achieved a classification performance of 98.1%, a notable improvement in comparison to the best individual classifier used (96.0%), a voting approach (97.6%) or a standard first order Bayesian (97.1%).

The Dempster-Shafer theory of evidence can be seen as a kind of generalization of Bayesian combining. It is applicable also when handling weak evidence that does not fulfill the rather strict assumptions of probability theory. A computationally very efficient method can be derived from using binary voting, for or against membership, for every expert for one class and feature subspace. Each vote can be handled as an independent source of evidence for the class membership of the input pattern. Thus it is not necessary to compute the combined belief for all of the possible subsets, but merely for the sets in focus for the final decision (Franke and Mandler 1992).
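The averaging combiner described earlier in this section, that is, the chain of Equations (3.15), (3.16) and (3.14), can be sketched as follows. The sketch assumes strictly positive distances and represents classes by list indices; the function names are hypothetical and not taken from Xu et al. (1992).

```python
def distances_to_probabilities(distances):
    """Turn one classifier's class distances into pseudo-posteriors (3.15).

    Assumes every distance is strictly positive.
    """
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


def average_combiner(per_classifier_probs):
    """Average the classifiers' probabilities class by class (3.16)."""
    n = len(per_classifier_probs)
    return [sum(column) / n for column in zip(*per_classifier_probs)]


def decide(probabilities):
    """Pick the class with the largest combined probability (3.14)."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])
```

With two distance classifiers reporting the class distances `[1.0, 2.0]` and `[4.0, 1.0]`, the converted probabilities are roughly `[0.667, 0.333]` and `[0.2, 0.8]`, their average favors the second class, and `decide` returns index 1.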

3.7.3 EM algorithm

The Expectation-Maximization (EM) algorithm is an iterative procedure for Maximum Likelihood (ML) parameter estimation of a finite mixture model. The development of the EM algorithm can be seen as having started sometime in the 1960s (Xu and Jordan 1993).

In Xu et al. (1995) a mixture of experts and EM learning algorithm is shown as follows. The mixture of experts model is based on the conditional mixture

\[ P(y \mid x, \Theta) = \sum_{j=1}^{K} g_j(x, \nu) P(y \mid x, \theta_j), \tag{3.19} \]

where

\[ P(y \mid x, \theta_j) = \frac{1}{(2\pi)^{n/2} |\Gamma_j|^{1/2}} \exp\left\{ -\tfrac{1}{2} [y - f_j(x, w_j)]^T \Gamma_j^{-1} [y - f_j(x, w_j)] \right\}, \tag{3.20} \]

where $x \in R^n$, $\Theta$ consists of $\nu, \{\theta_j\}_1^K$ and $\theta_j$ consists of $\{w_j\}_1^K, \{\Gamma_j\}_1^K$. $f_j(x, w_j)$ is the output of the $j$th expert and $g_j(x, \nu)$ is given by the softmax function

\[ g_j(x, \nu) = \frac{\exp \beta_j(x, \nu)}{\sum_i \exp \beta_i(x, \nu)}, \tag{3.21} \]

where $\beta_j(x, \nu)$, $j = 1, \ldots, K$ are the outputs of the gating network. The parameter $\Theta$ is then estimated using ML, where the log likelihood is given by $L = \sum_t \ln P(y^{(t)} \mid x^{(t)}, \Theta)$. The estimate can be found iteratively through the EM algorithm consisting of two steps, given the current estimate $\Theta^{(k)}$.

The first step is called the E-step. For each pair $\{x^{(t)}, y^{(t)}\}$ compute $h_j^{(k)}(y^{(t)} \mid x^{(t)}) = P(j \mid x^{(t)}, y^{(t)})$ and then form a set of objective functions

\[ Q_j^e(\theta_j) = \sum_t h_j^{(k)}(y^{(t)} \mid x^{(t)}) \ln P(y^{(t)} \mid x^{(t)}, \theta_j), \quad j = 1, \ldots, K \tag{3.22} \]

\[ Q^g(\nu) = \sum_t \sum_j h_j^{(k)}(y^{(t)} \mid x^{(t)}) \ln g_j(x^{(t)}, \nu). \tag{3.23} \]

The second step, or the M-step, consists of finding a new estimate $\Theta^{(k+1)} = \{\{\theta_j^{(k+1)}\}_{j=1}^K, \nu^{(k+1)}\}$ with

\[ \theta_j^{(k+1)} = \arg\max_{\theta_j} Q_j^e(\theta_j), \quad j = 1, \ldots, K \tag{3.24} \]

\[ \nu^{(k+1)} = \arg\max_{\nu} Q^g(\nu). \tag{3.25} \]

Generally algorithms performing a full maximization during the M-step are referred to as EM algorithms.

3.7.4 Multi-stage combinations

Even though possibly the most common way of using multiple recognition stages is to speed up recognition by first performing a faster classification and then refining the product with more complex and effective methods (Kittler et al. 1998, Rahman and Fairhurst 1997c), it is not the only option. It is also feasible to combine classifiers in several stages in striving for better recognition performance.

Paik et al. (1996) introduce a two-level combination approach where there are five individual classifiers in the first stage, then three combining methods in the second and a final combination in the last stage. An overall diagram of the classification process is depicted in Figure 3.8. All the first-level recognizers are based on MLP neural networks using different feature vectors as input. The first uses a dynamic mesh feature (M-Recognizer), the second directional features extracted with a Kirsch mask (K-Recognizer), the third directional change features through the use of gradient vectors (G-Recognizer), the fourth histogram-based features (H-Recognizer) and the last one contour chain code for boundary information (C-Recognizer).

Figure 3.8: A block diagram of the combination system from Paik et al. (1996). First stage (recognition): D-, K-, G-, H- and C-Recognizers; second stage (combination): Bayesian, W-Borda and neural net combiners; third stage (final combination): Bayesian or voting

These five recognizers are then combined in the second stage with the three independent combiners. The first one is based on a Bayesian method. The second approach is based on the Borda function, where the rank $F_R$ is

\[ F_R(e_k(x)) = \max_{i \in A} (B_i(e_k(x))), \tag{3.26} \]

where $e_k(x)$ is the output result of recognizer number $k$ for the input pattern $x$ and

\[ B_i(e_k(x)) = \sum_{k=1}^{N} w_k (C - r_{ik}(x)), \tag{3.27} \]

in which $C$ is the number of classes, $r_{ik}(x)$ is the output rank by recognizer $e_k$ for class $i$ and $N$ is the total number of recognizers. The weighting factor $w_k$ can also be unity for a non-weighted scheme. The third combiner in this stage is also based on an MLP neural network, where the number of output neurons equals the number of classes.

In the final stage the outputs of the second stage combiners are combined using either majority voting or a Bayesian combiner. The majority-voting approach is formulated as

\[ S_V(F_i(e_k(x))) = \begin{cases} j, & \text{if } C_F(j) = \max_{i \in A} C_F(i) > 1 \\ \text{Rej}, & \text{otherwise,} \end{cases} \tag{3.28} \]

where $C_F(i)$ is the number of votes for class $i$. The Bayesian approach is then defined as

\[ S_B(F_i(e_k(x))) = \begin{cases} j, & \text{if } BEL(j) = \max_{m=0}^{C} BEL(m) > \alpha \\ \text{Rej}, & \text{otherwise,} \end{cases} \tag{3.29} \]

where $BEL(m)$ is again the probability for $m$. The system was tested with the CEDAR (Srihari 1997) and CENPARMI (Nadal 1998) digit databases. Some results have been gathered into Table 3.1. The first row entitled Recognizers shows the best and worst percentages obtained with the individual first stage recognizers for each database. The second row indicates the best and worst results from the second stage, or equivalently the first combinatory stage, recognizers. The last two rows then show the overall results using both the voting and Bayesian final combiners, again for both databases individually.

Table 3.1: Some results from Paik et al. (1996)

Stage                           CENPARMI             CEDAR
                                Worst     Best       Worst     Best
Recognizers                     91.65%    96.20%     95.43%    97.89%
First Combination               97.40%    97.60%     98.36%    98.40%
Final Combination (voting)           97.65%               98.50%
Final Combination (Bayesian)         97.80%               98.62%

3.7.5 Group-wise classification

Another way of increasing overall classification performance is to use or design classifiers specialized in specific character types or classes. The classifiers can be specialized in a particular sub-domain, obtaining exceptional performance there even when performing poorly in the entire domain (Teow and Tan 1995). Another option is to use specialized classifiers to re-process classes causing a notable amount of confusion (Rahman and Fairhurst 1997b).

According to Rahman and Fairhurst (1997a) two approaches to multiple expert configurations can be identified. The first is to develop, formalize and implement formal methods for combining multiple experts. This approach is thus mostly concerned with how the combination can improve on the rates of the member classifiers. Such methods are most commonly based on functional mathematics or statistical reasoning.

The second approach is to develop and implement specialized and task-oriented methods. Group-wise classification methods are a good example of such, as a priori knowledge often plays a major role in tailored methods. Using specialized classifiers for specific classes that are known to cause problems is indeed group-wise classification using a priori knowledge. This can be expected to be a productive way of dealing with the classification problem, as distinguishable confusion classes are a substantial source of error in the overall performance of a classifier (Rahman and Fairhurst 1997b).

The approach suggested by Rahman and Fairhurst (1997a) is informal in nature and incorporates a priori second-order knowledge from the training set. A general schematic of this type of strategy is presented in Figure 3.9. The basic classifier first performs an initial separation of the input characters.
Based on the a priori knowledge, groups of hara ter lasses probable to ause onfusion undergo group-wise lassi� ation, whereasthe stru turally dissimilar hara ters are dire ted to the general lassi�er. The �nalde ision is then obtained by ombining the de isions of the general and spe ialized lassi�ers.A note should be made that when designing very spe ialized group-wise lassi�ers, theprobability of su h being apable of good re ognition among other lasses than thosethey were initially designed for will probably be very limited. Due to this fa tor theinitial assignment of input samples to the spe ialized lassi�ers be omes a ru ial point,49

Perform generalized classification.

Combine decisions.

Input Random Character Stream

Final Classification Results

Assign it to proper group.

Is it a

classification?

possible candidate

Perform group-wise classification.

for group-wise

yes

no

Figure 3.9: Generalized group-wise multi-expert stru tureand also a possible sour e of errors. One way to solve this problem is to implementthe spe ialized lassi�ers in a way that they are apable of group-wise re ognition andtrained reje tion. So, they will automati ally reje t lasses not within their area ofexpertise. Another way of dealing with this problem is to implement spe ialized �lterswith trained a eptan e and reje tion for ea h spe ialized lassi�er so that they ana urately extra t the lasses to be re ognized by that parti ular lassi�er. (Rahmanand Fairhurst 1997b)An experiment on a group-wise lassi� ation system has been performed by Rahmanand Fairhurst (1996). The best individual lassi�ers a hieved re ognition a ura y of92.4% for numerals and 81.1% for alpha-numeri hara ters. With the use of the group-wise lassi� ation s heme a ura ies of 94.8% and 84.2%, respe tively, were a hieved.Teow and Tan (1995) propose a fundamentally di�erent approa h resulting in rathersimilar operation. A notable bene�t of the approa h is the la k of need for a prioriknowledge. The Adaptive Integration of Multiple Experts (AIME) system proposedemploys a gating network implemented by a fuzzy neural logi network. It is a neuralnetwork model apable of both pattern pro essing and logi al inferen ing, trained withthe Supervised Clustering and Mat hing (SCM) algorithm (Tan and Teow 1995), whi his a mat h-based learning method.The gating module learns to assess the performan e of individual experts based ontheir performan e in the data spa e. It is thus apable of giving pre eden e to expertswhi h have priorly performed well in ertain situations. Also an ex eptions expert is50

DomainExpert

DomainExpert

DomainExpert

Gating

Module

ExceptionsExpert

Σ

��

��

��

��

I I I I I

O

Figure 3.10: A s hemati diagram of the AIME systemtrained with SCM to apture patterns that fail to be lassi�ed orre tly by any of thedomain experts.The AIME system is apable of both o�-line and on-line learning, and it an thusbe initially trained and then the performan e boosted during operation. A s hemati diagram of the system is shown in Figure 3.10.3.7.6 Criti -driven ombiningAn interesting addition to ommittee lassi� ation strategies is the in lusion of a riti into the de ision s heme. Basi ally the task of the riti is just to de ide whether the lassi�er the riti has been assigned to is orre t or in orre t. Due to the fa t that the riti only has two lasses to de ide from, its predi tions are generally more reliablethan those of the lassi�ers taking on a multi- lass problem. (Miller and Yan 1999a)Two approa hes to riti -driven ombinations are identi�ed by Miller and Yan (1999a),namely riti -driven voting and riti -driven averaging of probabilities. Criti -drivenvoting is performed through a standard voting s heme with the ex eption that if the riti deems the expert's predi tion to be in orre t, the expert abstains from voting.As it has been shown that the probability for a orre t voting result with 2N expert'sis lower than with 2N � 1 experts (Lam and Suen 1994), so when the number of votesis even, the expert whose riti has the least on�den e in the result is negle ted. Ifone lass re eives more than half of the a epted votes, it is deemed the result.For averaging properties it is noted in (Miller and Yan 1999a) that simple averaging isoutperformed by using either geometri or arithmeti averaging but still more informa-tion is to be gained from the riti . Espe ially the situation where zero probability is51

obtained from the critic is highly informative, as this means that the expert's predicted class should be excluded. This can be taken into account by the following conditioning

$$\tilde{P}_e^{(j)}(\omega_j \mid x, b_j = 0) = \begin{cases} \dfrac{1}{C-1} & \text{if } \omega_j \neq \omega^* \\ 0 & \text{otherwise,} \end{cases} \tag{3.30}$$

where $C$ is the total number of classes, $b_j \in \{0, 1\}$ is the assertion from critic $j$, and $\omega^* = \arg\max_{\omega_i} P_e^{(j)}(\omega_i \mid x)$. After averaging the cross-entropy cost sums over $2N$ terms, the resulting probability estimate for the input $x$ belonging to class $\omega_j$, assuming experts as posteriors, is

$$P(\omega_j \mid x) = \sum_{j=1}^{N} \sum_{l=0}^{1} w_{jl}(x)\, \tilde{P}_e^{(j)}(\omega_j \mid x, l), \tag{3.31}$$

where the weighting function $w_{jl}(x)$, $l \in \{0, 1\}$, is a probabilistic measure of the critic's confidence in its expert,

$$w_{jl}(x) = \frac{P_c^{(j)}(l \mid x)}{\sum_{n=1}^{N} P_c^{(n)}(l \mid x)}. \tag{3.32}$$

When dealing with majority voting, if the experts' common error rate $p < 0.5$, the overall correct decision rate increases with the number of experts $N$. In the case of critic-driven voting, this result is extended so that the overall correct decision rate increases with $N$ if $p + q < 1$, where $q$ is the critics' error rate. (Miller and Yan 1999b)

3.7.7 Behavior-knowledge space method

The Behavior-Knowledge Space (BKS) method introduced by Huang and Suen (1995) is a classifier combination method that is theoretically sound even when independence among the member classifiers is not assumed. For independence not to be needed, information must be derived from a knowledge space that can concurrently record the decisions of all the classifiers on each learned sample. The BKS is an example of such a knowledge space.

A BKS is a $K$-dimensional discrete space, where each dimension corresponds to the decision of one classifier, each of which has $C + 1$ possible decision values to choose from. The intersection of the decisions of the classifiers occupies one unit in the BKS, and each unit accumulates the number of samples for that combination. The intersection unit of the classifiers' decisions for an input sample is called the focal unit. An

example of a two-dimensional BKS is shown in Table 3.2, where $(i, j)$ is the focal unit when the results from classifiers $e(1)$ and $e(2)$ are $i$ and $j$, respectively.

Table 3.2: A 2-D behavior-knowledge space

e(1)\e(2)   1        ...   j        ...   11
1           (1,1)    ...   (1,j)    ...   (1,11)
...         ...      ...   ...      ...   ...
i           ...      ...   (i,j)    ...   ...
...         ...      ...   ...      ...   ...
11          (11,1)   ...   (11,j)   ...   (11,11)

To examine the BKS, some symbols need to be defined. First, $\mathrm{BKS}(e(1), \ldots, e(K))$ is the unit in the BKS where the classifiers give the outputs $e(1), \ldots, e(K)$. $n_{e(1) \ldots e(K)}(\omega)$ is the total number of incoming samples belonging to class $\omega$ in $\mathrm{BKS}(e(1), \ldots, e(K))$. $T_{e(1) \ldots e(K)}$ is the total number of incoming samples in $\mathrm{BKS}(e(1), \ldots, e(K))$, defined as

$$T_{e(1) \ldots e(K)} = \sum_{m=1}^{C} n_{e(1) \ldots e(K)}(m). \tag{3.33}$$

$R_{e(1) \ldots e(K)}$ is the best representative class of $\mathrm{BKS}(e(1), \ldots, e(K))$,

$$R_{e(1) \ldots e(K)} = \{\, j \mid n_{e(1) \ldots e(K)}(j) = \max_m n_{e(1) \ldots e(K)}(m) \,\}. \tag{3.34}$$

The actual operation of the BKS is divided into two stages: first the knowledge modeling, or learning, stage and then the decision making, or classification, stage. In the learning stage the BKS is constructed from both the true and recognized labels, and the values for $T_{e(1) \ldots e(K)}$ and $R_{e(1) \ldots e(K)}$ are calculated from (3.33) and (3.34). In the classification stage the decisions are made in the focal unit according to the BKS and based on the rule

$$E(x) = \begin{cases} R_{e(1) \ldots e(K)} & \text{when } T_{e(1) \ldots e(K)} > 0 \text{ and } \dfrac{n_{e(1) \ldots e(K)}(R_{e(1) \ldots e(K)})}{T_{e(1) \ldots e(K)}} \geq \lambda \\ C + 1 & \text{otherwise,} \end{cases} \tag{3.35}$$

where $0 \leq \lambda \leq 1$ is a threshold to control the reliability of the final decision. Threshold finding and optimality are considered further in (Huang and Suen 1995).

This approach was tested by Huang and Suen (1995) on a database of numerals from approximately 1000 writers. 5074 examples were used for training and the remaining 46451 numerals for testing. The results showed the BKS to perform better than either the Bayesian or the voting approach used, and much better than any individual recognizer used in it. Khotanzad and Chung (1994) also performed a test on 3000 numeral

samples which showed a decrease of nearly 50% in the error rate from any of the three MLP classifiers used to construct the committee.

3.7.8 Boosting

Boosting is a method designed for converting a single learning machine with a finite error rate into an ensemble with an arbitrarily low error rate (Drucker et al. 1993). Boosting is a committee method especially designed to increase the performance of neural networks.

Boosting is based on the Probably Approximately Correct (PAC) learning model. In the standard "strong" model the learner must be able to produce a hypothesis with an error rate of at most $\varepsilon$, for arbitrarily small positive values of $\varepsilon$. A variation, sometimes called the "weak" learning model, can also be used. In the weak model the correctness requirement for the learner is substantially relaxed to an error rate of slightly less than 1/2, which in turn is only slightly better than random guessing. It has been shown that the learning models are actually equivalent through the use of ensemble combination. (Drucker et al. 1994)

Drucker et al. (1993) describe the boosting algorithm for an ensemble of neural networks as follows. First, a set of training samples is used to train the first network. For the training set of the second network, the training samples are passed through the first network and the patterns are collected so that the first network has classified half of them correctly and the other half incorrectly. The third network is then trained with patterns that the first and second networks disagree on. The same training approach can then, if desired, be iterated in a recursive manner to produce 9, 27 and so on networks.

During the recognition phase, the patterns are passed through all three networks. If the first two networks agree, that is the output label. Otherwise the label from the third network is used. In (Drucker et al. 1994) it is also shown that as the training set size increases, the training error decreases until it asymptotes to the test error rate.

Experiments on boosting showed a decrease from a 9.0% error rate for the individual networks to a 6.3% error rate when using boosting on a database of 120 000 handwritten digits, of which 2000 were used exclusively for testing and the remaining 118 000 for training. (Drucker et al. 1994)
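The three-network scheme described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the networks are assumed to be plain callables that map a pattern to a label; this is not the implementation of Drucker et al., only an illustration of how the training sets are filtered and how the final label is chosen.

```python
def second_training_set(samples, labels, first_net, size):
    """Collect `size` patterns for network 2: half that network 1
    classifies correctly and half that it classifies incorrectly."""
    half = size // 2
    correct, wrong = [], []
    for x, y in zip(samples, labels):
        bucket = correct if first_net(x) == y else wrong
        if len(bucket) < half:
            bucket.append((x, y))
    return correct + wrong


def third_training_set(samples, labels, first_net, second_net):
    """Collect the patterns on which networks 1 and 2 disagree."""
    return [(x, y) for x, y in zip(samples, labels)
            if first_net(x) != second_net(x)]


def boosted_label(label1, label2, label3):
    """Recognition phase: if the first two networks agree, use their
    label, otherwise fall back on the third network."""
    return label1 if label1 == label2 else label3
```

In this sketch the third network acts purely as a tie-breaker, so its training data is restricted to exactly the cases in which it will be consulted.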

3.8 Adaptation methods for recognizer combinations

Although the most common way of adaptation is to adapt a single recognizer to the user's writing style, it is also possible to construct a committee that adapts to the user as a whole. The members of such a committee can themselves be adaptive or non-adaptive.

A simple scheme for introducing some adaptation into committee operation is to have a committee performing weighted majority voting. The weights could then be adjusted based on the performance of the classifiers. The weighting would thus adjust the decision in favor of the classifier performing best at that time or for the current writer.

Another very simplistic committee adaptation approach was presented as a reference classifier in (Laaksonen et al. 1999). In this approach, the correct results for each classifier were tracked and used in adjusting the committee. The result from the classifier having the most correct classifications so far was taken as the output of the committee.

The AIME system discussed in Section 3.7.5 is also an example of an adaptive committee. It is capable of refining its operation first in the beginning of operation and then improving its performance during use. (Teow and Tan 1995)

There also appears to be no reason why the BKS method (Huang and Suen 1995) examined in Section 3.7.7 could not perform in an adaptive fashion. The BKS could also store the decisions in the decision making stage, assuming that the true label can be obtained during operation. This could easily enhance performance by allowing the committee to adapt on-line according to the current situation. The implementation of writer-dependent adaptation would, however, require personal knowledge spaces for all writers.

Generally, on-line adaptation poses problems in applications where the true class of the sample is not readily available. On-line adaptation and the Dynamically Expanding Context (DEC) method (Kohonen 1986, Kohonen 1987) used will be discussed in detail in Chapter 6.
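As an illustration of the adaptive use of the BKS suggested above, the following Python sketch stores the knowledge space as a dictionary from decision combinations to per-class counts. The class and method names are hypothetical, and `None` is used as a stand-in for the rejection value C + 1. Calling `update` during operation, whenever the true label becomes available, is the assumed adaptation step; the sketch makes no claim about how Huang and Suen implemented the method.

```python
from collections import Counter, defaultdict

REJECT = None  # stand-in for the rejection decision C + 1


class BehaviorKnowledgeSpace:
    def __init__(self, threshold=0.5):
        # threshold plays the role of the reliability threshold in (3.35)
        self.threshold = threshold
        # one unit per combination of classifier decisions;
        # each unit counts the true classes of the samples seen there
        self.units = defaultdict(Counter)

    def update(self, decisions, true_label):
        """Learning stage, or on-line adaptation: record one sample."""
        self.units[tuple(decisions)][true_label] += 1

    def classify(self, decisions):
        """Decision stage: best representative class of the focal unit,
        or rejection if the unit is empty or not reliable enough."""
        counts = self.units.get(tuple(decisions))
        if not counts:
            return REJECT
        total = sum(counts.values())
        best, n_best = counts.most_common(1)[0]
        return best if n_best / total >= self.threshold else REJECT
```

A writer-dependent variant would, under the same assumptions, simply keep one such object per writer.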


Chapter 4

Description of the recognition system

The results in this thesis are based on an on-line recognition system for isolated handwritten characters developed at the Laboratory of Computer and Information Science of Helsinki University of Technology. It performs stroke-wise prototype matching using Dynamic Time Warping (DTW) distance calculations based on a variety of distance measures, and the k nearest neighbor (k-NN) rule for the final decisions. Here the various aspects of the classifier are described.

Most of the following description is based on (Vuori 1999) and (Laaksonen et al. 1998a), but the author actively participated in the latter stages of the recognition system development process. An example of this contribution is the symbol string distance measures explored in Section 4.3.4.

The initial implementation, referred to as the large-scale implementation, was on the UNIX-based central computer of the laboratory. To complement this, the recognition system has also been implemented on a smaller-scale platform. That implementation is discussed in Chapter 5.

4.1 Data representation

The data in the large-scale implementation was collected using a pressure-sensitive Wacom ArtPad II tablet attached to a Silicon Graphics workstation. The resolution of this tablet is 100 lines per millimeter and the maximum sampling rate 205 data points per second.

The data collected consists of the loci of the pen point movements in x and y coordinates, the pen's pressure on the writing surface and a time stamp. The data was then saved in UNIPEN 1.0 format (Guyon et al. 1994), a hierarchical, platform-independent format for storing gathered data in a text file.

Figure 4.1: An example of the two available center operators for the character 'd'. The '□' denotes the center of the bounding box and the other marker the mass center.

4.2 Preprocessing and normalization

The preprocessing operations applied have mainly been used to adjust the sampling method and frequency. The sampling frequency can be altered with the operations Decimate(n) and Interpolate(n). Decimate(n) preserves every (n + 1)th data point while discarding the intermediate n points. Interpolate(n) performs the opposite action: it interpolates n equally spaced points between every two successive original data points.

An operator called EvenlySpacedPoints(d) has also been introduced. The operator can be used for simulating equidistant sampling, as it interpolates new data points using the original ones so that the distance between the points becomes constant. After this operation, the distance between adjacent points is dl/1000, where l is the length of the longer side of the bounding box of the character. The minimal bounding box of a character is a rectangular frame around the minimum and maximum values of the x and y coordinates of the character.

The normalization methods used were necessitated by the fact that the users only have a defined area to write in, and no specific location or size for the character is indicated. Thus it is necessary to move the characters to the same location prior to matching. This is implemented by moving the center of the character to the origin of the coordinate system. Since the "center" is not an unambiguous concept for handwritten characters, two alternative operations have been used. In the first one, the center is defined as the mass center of the character, and the movement of this point to the origin is performed through the normalization operation MassCenter. The second approach available is to determine the minimal bounding box for the character. The center of the bounding box is then moved to the origin using the normalization operation BoundingBoxCenter. The two centers for an example glyph are shown in Figure 4.1.

In addition to relocating the characters, size variance also needs to be controlled. As no strict guidelines regarding the size of the characters to be written have been introduced,

size normalization is a necessity for stable operation. Size normalization is performed using the same bounding box as in BoundingBoxCenter, by scaling the longer side of the bounding box to a predefined length and keeping the aspect ratio unchanged. This is called the normalization operation MinMaxScaling. As the length can be arbitrarily selected, the value 1000 was chosen for its evenness.

4.3 Distance measures

Due to the variation in writing speeds and lengths of strokes, it is inevitable that the number of data points in characters varies a great deal. The number of strokes can also vary considerably. Thus it is necessary to use a distance measure and a calculation algorithm which can operate regardless of the variance in the number of points. Symmetry, meaning that the same result is obtained when matching a to b as b to a, i.e. d(a, b) = d(b, a), is also highly desirable.

The distance metrics described in Sections 4.3.1, 4.3.2 and 4.3.3 are based on DTW and thus defined between curves with a varying number of data points. They are all also stroke-based and symmetric. If the number of strokes is different between characters, the distance is taken to be infinite. If the number of strokes matches, the total distance between the characters is the sum of the inter-stroke distances.

4.3.1 Point-to-point distances

The point-to-point (PP) distance simply uses the squared Euclidean distance between two data points as the cost function. The distance between the two strokes $S_1$ and $S_2$ and the optimal time warping path are found through the use of a dynamic programming algorithm (Sankoff and Kruskal 1983). The time warping path $P(h) = (i(h), j(h))$, $h = 1, 2, \ldots, H$, where $h$ is an order index for matching the $i(h)$th and $j(h)$th data points of strokes $S_1$ and $S_2$, respectively, describes the point-to-point correspondence. The PP matching is depicted in Figure 4.2.

The boundary conditions for the time warping path are that the first and last data points of the strokes $S_1 = (p_1(1), \ldots, p_1(N_1))$ and $S_2 = (p_2(1), \ldots, p_2(N_2))$ are matched against each other, or

$$P(1) = (i(1), j(1)) = (1, 1) \tag{4.1}$$

$$P(H) = (i(H), j(H)) = (N_1, N_2). \tag{4.2}$$

The continuity condition is that all data points are matched at least once and several

data points can be matched against one; for $h = 2, 3, \ldots, H$,

$$(\Delta i(h), \Delta j(h)) = (i(h) - i(h-1),\; j(h) - j(h-1)) \in \{(1, 0),\, (0, 1),\, (1, 1)\}. \tag{4.3}$$

[Diagram labels: points $p_1(i)$, $p_1(i+1)$ on one stroke, points $p_2(i)$, $p_2(i+1)$ on the other, matching cost $d(p_1(i), p_2(i))$]

Figure 4.2: An example of PP matching

The distance $d$ between two data points $p_1 = (x_1, y_1)$ and $p_2 = (x_2, y_2)$ is the squared Euclidean distance,

$$d(p_1, p_2) = (x_1 - x_2)^2 + (y_1 - y_2)^2. \tag{4.4}$$

The total distance $D_{PP}$ between the strokes $S_1$ and $S_2$ is found by minimizing the sum of the matching costs for all data points,

$$D_{PP}(S_1, S_2) = \min_{P(h)} \sum_{h=1}^{H} d(p_1(i(h)), p_2(j(h))). \tag{4.5}$$

Because the boundary and continuity conditions and the cost function are symmetrical, the distance metric $D_{PP}$ is also symmetrical. (4.5) can be solved using dynamic programming,

$$D_{PP}(i, j, h) = d(p_1(i), p_2(j)) + \min \begin{cases} D_{PP}(i-1, j, h-1) \\ D_{PP}(i, j-1, h-1) \\ D_{PP}(i-1, j-1, h-1), \end{cases} \tag{4.6}$$

with the initial conditions of setting $D_{PP}(0, 0, 0)$ to zero and the undefined distances to infinity, or

$$D_{PP}(0, 0, 0) = 0 \tag{4.7}$$

$$d(p_1(i), p_2(j)) = \infty, \text{ if } i, j \leq 0,\; i > N_1, \text{ or } j > N_2. \tag{4.8}$$
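The recursion (4.6) with the initial conditions (4.7) and (4.8) can be sketched in Python as follows. This is a minimal illustration, not the code of the recognition system: the function names are hypothetical, strokes are assumed to be lists of (x, y) tuples, and the order index h is left implicit because only the minimum cost is needed here.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two points, as in (4.4)."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def pp_distance(s1, s2):
    """Point-to-point DTW distance between two strokes."""
    n1, n2 = len(s1), len(s2)
    # D[i][j] is the minimum cost of matching the first i points of s1
    # with the first j points of s2; undefined entries stay infinite.
    D = [[math.inf] * (n2 + 1) for _ in range(n1 + 1)]
    D[0][0] = 0.0
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            best = min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = sq_dist(s1[i - 1], s2[j - 1]) + best
    return D[n1][n2]
```

The table has (N1 + 1)(N2 + 1) entries, so the cost of one stroke comparison grows with the product of the stroke lengths.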

The minimum total cost in (4.5) is given by

$$D_{PP}(S_1, S_2) = D_{PP}(N_1, N_2, H). \tag{4.9}$$

Because the number of points matched affects the sum in (4.5), the distance between seemingly similar strokes written at different speeds is still notable. So in addition to the shape of the stroke, its dynamic properties also affect the total distance. This might be an advantage when adapting to the style of a certain user, but is surely a disadvantage when dealing with several users.

Another aspect is that the relative weight of the cost of a stroke with more points is much larger in the final sum than that of a stroke with fewer points, even though in a true interpretation of the characters this might not be so. These possible problems have been countered by defining a normalized version of the PP distance, the normalized point-to-point (NPP) distance. The normalization is performed by scaling the distance of two matched strokes with the total number of matchings performed, $H$, so that the total cost becomes

$$D_{NPP}(S_1, S_2) = \frac{D_{PP}(S_1, S_2)}{H}, \quad \max(N_1, N_2) \leq H \leq N_1 + N_2 - 2. \tag{4.10}$$

4.3.2 Point-to-line distances

The point-to-line (PL) distance is a modification of the PP distance where the points of a stroke can also be matched to lines interpolated between the successive points of the stroke they are being matched against (Sankoff and Kruskal 1983). This helps reduce the effects of phase differences in the sampling of two curves of similar shape. The effects are seen especially in cases where the sampling frequencies differ notably while the actual shapes of the characters are relatively similar. The calculated PP distance between characters of similar shape, of which one has a much higher sampling frequency, is much larger than the difference in the actual appearance of the characters. The difference should naturally not depend on the sampling frequency.

Problems may arise with the PL distance when the sampling frequency is very low, as the linear interpolation between points might cause shape distortions, especially in areas of high curvature. But when the sampling frequency is high enough, the effects of phase differences are insignificant.

In the PL distance calculations, all points except the first and last ones, whose matchings are dictated by the boundary conditions below, are matched once. If a stroke being matched contains only one point, it is duplicated. Each point is matched to the closest point on the lines interpolated between the data points of the opposite curve. The cost

of an individual point matching is the squared Euclidean distance to the nearest point on the line. The PL matching is depicted in Figure 4.3.

[Diagram labels: points $p_1(i)$, $p_1(i+1)$ on one stroke, points $p_2(i)$, $p_2(i+1)$ on the other, matching cost $d(p_1(i), p_2(i))$]

Figure 4.3: An example of PL matching

In order to define the continuity constraints, some functions need to be defined. The index $k \in \{1, 2\}$ is the number of the stroke in question. $I_k(h)$ indicates whether the point of the stroke $S_k$ involved in the $h$th matching is a data point or a point on an interpolated line. The function $i_k(h)$ is the index of the first data point that has not been used in the first $h - 1$ matchings. The function $r(h) \in [0, 1]$ indicates the relative position of the interpolated point involved in the $h$th matching. The continuity conditions can now be stated as

1. If both $I_k(h)$ and $I_k(h+1)$ indicate data points, the points are adjacent and thus $i_k(h+1) = i_k(h) + 1$.
2. If $I_k(h)$ indicates a data point and $I_k(h+1)$ an interpolated point, $i_k(h+1) = i_k(h)$.
3. If $I_k(h)$ indicates an interpolated point and $I_k(h+1)$ a data point, $i_k(h+1) = i_k(h) + 1$.
4. If both $I_k(h)$ and $I_k(h+1)$ indicate interpolated points, $i_k(h+1) = i_k(h)$.

The boundary conditions are defined, with $a, b \in \{1, 2\}$ and $a \neq b$, so that the first point $p_a(1)$ of the stroke $S_a$ is matched to a point interpolated between the first and second points, $p_b(1)$ and $p_b(2)$, and point $p_b(1)$ is not matched at all. Similarly, the last point $p_a(N_a)$ of the stroke $S_a$ is matched to a point interpolated between the last and second last points, $p_b(N_b)$ and $p_b(N_b - 1)$, and point $p_b(N_b)$ is not matched at all. With the functions $I_k(h)$ and $i_k(h)$ these conditions can be written as

1. $i_a(1) = i_b(1) = 1$
2. If $I_a(N_a + N_b - 2)$ indicates a data point, then $i_a(N_a + N_b - 2) = N_a + 1$ and $i_b(N_a + N_b - 2) = N_b$.
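The cost of an individual point matching described above, the squared Euclidean distance from a data point to the nearest point on an interpolated line, can be sketched in Python with the standard projection onto a line segment. The function names are hypothetical and points are assumed to be (x, y) tuples; the handling of a zero-length segment is an assumption added for robustness.

```python
def nearest_point_on_segment(q, a, b):
    """Return the point on segment a-b nearest to q, together with its
    relative position r in [0, 1] (r = 0 at a, r = 1 at b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0:
        r = 0.0  # degenerate segment: a and b coincide
    else:
        r = ((q[0] - a[0]) * dx + (q[1] - a[1]) * dy) / denom
        r = min(1.0, max(0.0, r))  # clamp so the point stays on the segment
    return (a[0] + r * dx, a[1] + r * dy), r


def point_to_line_cost(q, a, b):
    """Squared Euclidean distance from q to the nearest point on a-b."""
    (px, py), _ = nearest_point_on_segment(q, a, b)
    return (q[0] - px) ** 2 + (q[1] - py) ** 2
```

Clamping the relative position keeps the matched point between the two data points, so a point lying beyond the end of a segment is matched to the segment's end point.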

As the boundary and continuity conditions and the associated cost function are symmetric, the PL distance is also symmetric.

The nearest point $(p_i, r)$ on the line between points $p_i$ and $p_{i+1}$ to the point $q$ can be written as

$$(p_i, r) = (1 - r)\, p_i + r\, p_{i+1}, \tag{4.11}$$

where the relative location parameter $r$ is defined as

$$r = \begin{cases} 1 & \text{if } r' \geq 1 \\ 0 & \text{if } r' \leq 0 \\ r' & \text{otherwise,} \end{cases} \tag{4.12}$$

where the unclamped value $r'$ is

$$r' = \frac{(q - p_i) \cdot (p_{i+1} - p_i)}{(p_{i+1} - p_i) \cdot (p_{i+1} - p_i)}. \tag{4.13}$$

The PL distance is also solved by using dynamic programming. The recursive equation is

$$D_{PL}(i, j, r, h) = \min \begin{cases} D_{PL}(i-1, j, r, h-1) + d(p_1(i), (p_2(j), r)) \\ D_{PL}(i, j-1, r, h-1) + d((p_1(i), r), p_2(j)), \end{cases} \tag{4.14}$$

with initial conditions similar to those of the PP distance, setting $D_{PL}(i, j, r, 0)$ to zero and the undefined distances to infinity, or

$$D_{PL}(i, j, r, 0) = 0 \tag{4.15}$$

$$d((p_1(i), r), p_2(j)) = d(p_1(i), (p_2(j), r)) = \infty, \text{ if } i, j \leq 0,\; i > N_1, \text{ or } j > N_2. \tag{4.16}$$

The minimum total cost is then given by

$$D_{PL}(S_1, S_2) = D_{PL}(N_1 - 1, N_2 - 1, r, N_1 + N_2 - 2) + \min \begin{cases} d(p_1(N_1), (p_2(N_2 - 1), r)) \\ d((p_1(N_1 - 1), r), p_2(N_2)). \end{cases} \tag{4.17}$$

The normalized point-to-line (NPL) distance, the PL distance normalized by the total number of matches performed, $H$, is defined in a similar way as the NPP distance

in (4.10),

$$D_{NPL}(S_1, S_2) = \frac{D_{PL}(S_1, S_2)}{H}, \quad H = N_1 + N_2 - 2. \tag{4.18}$$

4.3.3 Area-based distances

The main problem with the PP and PL distances is that they do not compensate for the density of data points. This results in a heavy emphasis on points where the pen is moved slowly, usually end points and points of high curvature.

Aside from the normalization in the NPP and NPL distances, the density of data points can be taken into account by using an area-based distance. This results in the lengths of the strokes being more meaningful, as the area depends on the distances between points and thus also on the length of the stroke.

Two area-based distances have been implemented, the simple-area (SA) distance and the kind-of-area (KA) distance. Both area-based distances have boundary and continuity constraints similar to those of the PP distance, and the overall distances can also be solved using dynamic programming.

Simple-area distance

The cost function associated with the SA distance is defined by the area spanned by three or four data points. After the first matching, each matching produces a triangle or a quadrilateral between the strokes. The SA matching is depicted in Figure 4.4.

[Diagram labels: points $p_1(i)$, $p_1(i+1)$ on one stroke, points $p_2(i)$, $p_2(i+1)$ on the other, matching cost $d(p_1(i), p_2(i))$ shown as an area between the strokes]

Figure 4.4: An example of SA matching

When matching two strokes, $S_1 = (p_1(1), \ldots, p_1(i(h)), \ldots, p_1(N_1))$ and $S_2 = (p_2(1), \ldots, p_2(j(h)), \ldots, p_2(N_2))$, in every point-to-point matching the indexes of the matched data points change in one of the following ways:

1. $(\Delta i(h), \Delta j(h)) = (1, 0)$ and a triangle is spanned by $p_1(i(h))$, $p_1(i(h-1))$ and $p_2(j(h))$, or

2. $(\Delta i(h), \Delta j(h)) = (0, 1)$ and a triangle is spanned by $p_1(i(h))$, $p_2(j(h-1))$ and $p_2(j(h))$, or
3. $(\Delta i(h), \Delta j(h)) = (1, 1)$ and a quadrilateral is spanned by $p_1(i(h))$, $p_1(i(h-1))$, $p_2(j(h))$ and $p_2(j(h-1))$.

The area, and thus the associated cost, for polygons with $n$ points when the lines do not intersect can be calculated as (Råde and Westergren 1990)

$$A_{SA} = \left| x_n y_1 - x_1 y_n + \sum_{i=1}^{n-1} (x_i y_{i+1} - x_{i+1} y_i) \right|. \tag{4.19}$$

The lines intersect only if the strokes cross each other. If the sampling rate is high, the resulting error is very small in proportion to the total cost. This effect has been neglected in order to keep the computational complexity reasonable.

Kind-of-area distance

With the KA distance, the cost associated with matching the points $p_1(i(h))$ and $p_2(j(h))$ is

$$\begin{aligned} A_{KA}(p_1(i(h)), p_2(j(h)); n, m) = {} & |p_1(i(h)) - p_2(j(h))|^n \, \big( |p_1(i(h)) - p_1(i(h) + 1)| + |p_1(i(h)) - p_1(i(h) - 1)| \\ & + |p_2(j(h)) - p_2(j(h) + 1)| + |p_2(j(h)) - p_2(j(h) - 1)| \big)^m. \end{aligned} \tag{4.20}$$

Prior to matching, strokes $S_1$ and $S_2$ are augmented by replicating the second and next to last points, resulting in the lengthened stroke $S = (p(2), p(1), p(2), \ldots, p(N-1), p(N), p(N-1))$. The new starting and ending points are not matched. The KA matching is depicted in Figure 4.5.

[Diagram labels: points $p_1(i-1)$, $p_1(i)$, $p_1(i+1)$ on one stroke and points $p_2(i-1)$, $p_2(i)$, $p_2(i+1)$ on the other]

Figure 4.5: An example of KA matching

The parameter $m \geq 0$ controls the effect of the point density on the total distance between strokes. If $m = 0$, high density parts are emphasized. When $m \geq 1$, the

significance of points which are close to their neighbors is increasingly reduced. The other parameter, $n \geq 1$, affects the penalization for point distortions. When $n > 1$, points far apart are penalized more harshly than those close together. With $n = 2$ and $m = 0$ the KA distance is reduced to the PP distance.

4.3.4 Symbol-string-based distances

An entirely different approach implemented in the recognition system is the symbol-string-based distances, an extension of Freeman's code discussed in Section 3.5.2. In the creation of the symbol strings, the direction between the beginning and end points of the symbol was discretized to $2^i$, $i \in \{2, 3, 4, 5\}$, directions. The end point was deemed to be either a point far enough, according to a threshold, from the beginning point, or the end of the stroke when it was reached. The symbols were stored either with or without length information in their codings. Different codings for the pen-up symbols, where the pen was off the tablet, were also used.

A sharp corner detection method was implemented to take into account rapid changes in the direction of the writing, in order to add additional symbols to points of high curvature. The equation weighs changes near the origin of the polar coordinate system somewhat more. This can be expected to be beneficial in the sense that directional changes near the origin generally contain more information. Directional changes in the far parts of the character, on the other hand, often correspond to loops at the ends of strokes, and are thus of little interest. The change in direction is estimated as

$$\theta_i = \arctan\left( \frac{x_{i-2} + x_{i-1} - x_i}{y_{i-2} + y_{i-1} - y_i} \right), \tag{4.21}$$

where $x_i$ and $y_i$ are the x and y coordinates of point number $i$. The value of $\theta_i$ is calculated at points $i$ and $i - 3$. If the difference $\Delta\theta_i$ between these values exceeds a threshold that depends on $d$, the number of discretization directions, the point $i$ is deemed the beginning point for the next symbol.

Two examples of the symbol string creation are shown in Figure 4.6. The first symbol string was created without the use of the corner detection option and the second one with the option active.

[Figure labels: the original character, and the symbol strings "dlldo" and "dlldhna"]

Figure 4.6: Examples of symbol strings created
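The direction discretization and the distance-threshold segmentation described in this section can be sketched in Python as follows. Everything here is illustrative: the function names, the numbering of the directions (0 pointing along the positive x axis, increasing counterclockwise) and the default of 8 directions are assumptions, and length information, pen-up symbols and corner detection are left out.

```python
import math


def direction_symbol(start, end, n_directions=8):
    """Quantize the direction from start to end into one of
    n_directions equally wide sectors, numbered from 0."""
    angle = math.atan2(end[1] - start[1], end[0] - start[0])
    step = 2 * math.pi / n_directions
    return int(round(angle / step)) % n_directions


def stroke_to_symbols(points, threshold, n_directions=8):
    """Emit one symbol whenever the current point is at least
    `threshold` away from the symbol's starting point, or when the
    end of the stroke is reached."""
    symbols = []
    start = points[0]
    for k, p in enumerate(points[1:], start=1):
        far_enough = math.hypot(p[0] - start[0], p[1] - start[1]) >= threshold
        last_point = k == len(points) - 1
        if far_enough or last_point:
            symbols.append(direction_symbol(start, p, n_directions))
            start = p
    return symbols
```

A coarser alphabet (fewer directions, larger threshold) gives shorter strings and faster matching, at the price of discarding more of the shape information.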

The cost function used was defined based on the Levenshtein distance (Sankoff and Kruskal 1983), through the existence of additions, replacements and removals in the symbol string, as

$$d(a_i, b_j) = \min \begin{cases} d(a_{i-1}, b_j) + w_{add}(a_{i-1}, b_j, a_i) & \text{(addition)} \\ d(a_{i-1}, b_{j-1}) + w_{rep}(a_i, b_j) & \text{(replacement)} \\ d(a_i, b_{j-1}) + w_{rem}(a_{i-1}, a_i, a_{i+1}) & \text{(removal),} \end{cases} \tag{4.22}$$

where $a_i$ and $b_i$ are the $i$th symbols in the two symbol strings being matched. The individual $w$-functions were defined as

$$w_{add}(a_{i-1}, b_j, a_i) = \alpha_1 \frac{\ell_{b_j}}{\ell} + \begin{cases} -\alpha_2 & \text{if } b_j \in \{a_{i-1}, a_i\},\; a_{i-1} \neq a_i \\ -\alpha_3 & \text{if } b_j = a_{i-1} = a_i \\ \alpha_4 & \text{if } b_j \in U \\ 0 & \text{otherwise,} \end{cases} \tag{4.23}$$

$$w_{rep}(a_i, b_j) = \beta_1 \frac{|\ell_{a_i} - \ell_{b_j}|}{\ell} + 6\,(a_j - b_j)^2\,[\ldots] + \begin{cases} \beta_2 & \text{if } a_i \in U,\; b_j \notin U, \text{ or } a_i \notin U,\; b_j \in U \\ 0 & \text{otherwise,} \end{cases} \tag{4.24}$$

and

$$w_{rem}(a_{i-1}, a_i, a_{i+1}) = \gamma\, w_{add}(a_{i-1}, a_i, a_{i+1}), \tag{4.25}$$

where $\ell_{b_j}$ is the length of the symbol $b_j$ and $U$ the set of pen-up symbols. $d$ is the number of discretization directions and $\ell$ the discretization distance threshold. The cost parameters are identified in Table 4.1.

Table 4.1: The costs of the symbol-string distance calculations

$\alpha_1$   The cost of an addition
$\alpha_2$   The benefit of adding the same symbol as just one of its neighbors
$\alpha_3$   The benefit of adding the same symbol as both its neighbors
$\alpha_4$   The cost of adding a pen-up symbol
$\beta_1$    The factor for the impact of the symbol length in replacement
$\beta_2$    The cost of replacing with a pen-up symbol
$\gamma$     The ratio between the costs of additions and removals

The total distance was then calculated using dynamic programming. This approach offers valuable benefits in computational cost in comparison to the distance measures

mentioned in Sections 4.3.1, 4.3.2 and 4.3.3. This benefit has a serious downside, namely the loss of information in the discretization stage. This was seen to result in a significantly poorer overall recognition accuracy than with the other distance metrics.

4.4 Prototype set

Generally the performance of a k-NN classifier improves as the number of samples to match the input against increases. But as the needed computation time also increases, it is very impractical to use all available samples for matching. With careful selection of the training set samples, the deterioration in accuracy resulting from using fewer samples can be minimized. Thus a prototype set consisting of samples representing different ways of writing can be constructed from the training samples and used in the recognition process.

As discussed in Section 3.4, the prototype set should have sufficient coverage and separability. All prototypes should be "good" in the sense that they are not distorted, but accurate representations of the intended characters. To achieve good prototype sets in our system, a semiautomatic clustering algorithm is used to cluster the training samples. This clustering results in groups that contain samples similar to each other, according to the selected similarity measure. Then the center-most sample from each cluster is selected to become the prototype representing that cluster.

The clustering algorithm is semiautomatic in the sense that the number of clusters must be determined beforehand. Thus, the number of clusters needed for each character class must be examined manually. Generally, due to the stroke-wise nature of the application, there must be at least as many clusters for each character as there are possible numbers of strokes to use in writing it. Noticeably different writing styles also need to have a cluster of their own. As an example, 7 prototypes for the character "E" are shown in Figure 4.7.

When the number of prototypes for each class and number of strokes therein has been decided upon, the clustering is carried out by an automatic iterative algorithm. The same preprocessing and normalization operators, as well as the same distance measure, are used in the clustering phase as during recognition. The clustering algorithm works on a splitting principle. At first all samples with the same label and number of strokes are in one cluster, and this cluster is divided until the predetermined number of prototypes is reached or there are already as many groups as character samples. The splitting is performed by first finding the central sample in the cluster, the one minimizing the squared distance to all others. Then the samples in the cluster are ordered according to increasing distance from the center. The most distant samples of the cluster are then

chosen to form a new cluster. The number of items in the new cluster is determined by minimizing the splitting criterion function $J(i)$ defined as

$$J(i) = d(x_{i-1}, x_{old}(i-1)) + \max_{i \leq j \leq N} d(x_j, x_{new}), \tag{4.26}$$

where $d(x, y)$ is the squared distance between two samples, and $x_{old}(i)$ and $x_{new}$ are the old and new cluster centers of item $x_i$, respectively. In the case that two distinct writing styles exist, the splitting function is roughly U-shaped. If more writing styles exist, several local minima may be seen. The samples are extracted into the new cluster by finding the index $i^*$ so that

$$J(i^*) = \min_i J(i), \tag{4.27}$$

and the samples with indexes $i \geq i^*$ form the new cluster. Then the cluster centers are recalculated and items assigned to the cluster center nearest to them, iteratively, until a stable settlement is reached. After this the splitting procedure is re-entered unless the predefined number of clusters has been reached or the number of clusters equals the number of samples. (Laaksonen et al. 1998b)

Figure 4.7: 7 sample prototypes for the character "E"

4.5 Prototype pruning and ordering

In order to decrease the computational burden, the prototypes are pruned and ordered prior to actual classification. The pruning is based on the number of strokes. As the distance between characters with a different number of strokes is defined to be infinite,

hara ters are only mat hed against prototypes with the same number of strokes.The a tual al ulation loops have been implemented in a way that the al ulation isstopped when the shortest distan e already attained is ex eeded. Due to this fa tappropriate ordering of the prototypes helps speed up the omputation onsiderably.The ordering is performed by pla ing ea h prototype into one of sixteen ategories.The ategories are de�ned after the prepro essing and normalization phases. For ea h ategory a four-bit binary value b4b3b2b1 is onstru ted a ording to the oordinates ofthe �rst and last points in the �rst stroke,b1 = (x1(1) � 0)b2 = (y1(1) � 0)b3 = (x1(N1) � 0)b4 = (y1(N1) � 0) (4.28)where N1 is the number of data points in the �rst stroke, and a true value of anexpression orresponds to one and false to zero. The distan e d 2 f0; 1; 2; 3; 4g betweenthe ategories is the ount of bit di�eren es in their binary representations. Prototypesare mat hed against the input sample in the order of in reasing ategory distan e.4.6 AdaptationFour adaptation strategies have been implemented into the basi lassi�er. They areall based on modifying the prototype set by either adding, altering or ina tivatingprototypes, or on a ombination of these.4.6.1 Prototype additionThe adaptation strategy Add(k) is a strategy where new prototypes are added tothe prototype set if any of the k nearest prototypes, as de ided by the lassi�er, arein orre t. This is done even if the �nal lassi� ation result was orre t.Prototype addition is the only method of those presented here that an a tually helpthe re ognizer learn entirely new styles of writing, as long as the orre t label an be ex-tra ted for the hara ter. The downside to prototype addition an be seen when a userwrites similar-looking hara ters for di�erent lasses. This results in more and moreprototypes being added but the prototypes similar to the users input still have di�erent lass labels. 
In su h ases adding prototypes will mainly only in rease omputationtime while providing no enhan ement to the re ognition performan e.69
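The pruning and category-based ordering of Section 4.5 can be illustrated with a minimal sketch. The function names, the representation of a character as a list of strokes, and the use of `>= 0` as the comparison in Equation (4.28) are assumptions made for illustration only; they are not taken from the original implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = Sequence[Point]


def category(first_stroke: Stroke) -> int:
    """Four-bit category b4 b3 b2 b1 built from the end points of the first stroke."""
    (x_first, y_first), (x_last, y_last) = first_stroke[0], first_stroke[-1]
    b1 = int(x_first >= 0)  # assumption: the comparison is ">= 0"
    b2 = int(y_first >= 0)
    b3 = int(x_last >= 0)
    b4 = int(y_last >= 0)
    return (b4 << 3) | (b3 << 2) | (b2 << 1) | b1


def category_distance(a: int, b: int) -> int:
    """Count of differing bits between two categories, a value in 0..4."""
    return bin(a ^ b).count("1")


def matching_order(sample: List[Stroke], prototypes: List[List[Stroke]]) -> List[int]:
    """Indices of the prototypes to match against the sample.

    Prototypes with a different stroke count are pruned (their distance is
    infinite by definition); the rest are sorted by increasing category distance.
    """
    sample_cat = category(sample[0])
    candidates = [i for i, p in enumerate(prototypes) if len(p) == len(sample)]
    return sorted(
        candidates,
        key=lambda i: category_distance(sample_cat, category(prototypes[i][0])),
    )
```

Matching the most similar categories first tends to produce a small best-so-far distance early, which lets the early termination of the distance calculation cut off more of the later matchings.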

4.6.2 Prototype inactivation

The adaptation strategy Inactivate($N$, $G$), on the other hand, is based on inactivating prototypes in the prototype set. After each recognition the prototype nearest to the incoming sample is checked. For this purpose the goodness value $g(P)$ is defined, for a prototype $P$, as

$$g(P) = \frac{N_{\mathrm{corr}}(P) - N_{\mathrm{err}}(P)}{N_{\mathrm{corr}}(P) + N_{\mathrm{err}}(P)}, \tag{4.29}$$

where $N_{\mathrm{corr}}(P)$ and $N_{\mathrm{err}}(P)$ are the counts of how many times the class of the prototype $P$ and the true class of the input sample have been the same and different, respectively, when the prototype has been the nearest one. If the goodness value $g(P)$ is below a given limit $G$ and the prototype $P$ has been the nearest prototype at least $N$ times, $P$ is removed from the set of prototypes in active use, i.e. inactivated.

Inactivation is most useful when the user writes a specific character in a way that easily causes confusion with another class, but writes the characters of that other class in a distinct style. For example, the "b"s written by the user are consistently closest to a prototype for the class "h", but none of the "h"s written by the user match that particular prototype. In such a situation the prototypes causing confusion will be inactivated, thus improving the overall performance. Prototype inactivation can also easily and effectively be used in conjunction with any of the other adaptation methods presented here.

4.6.3 Prototype modification

When a character is close to the prototype it is matched to but of slightly different shape, reshaping the prototype is a viable alternative to adding a new prototype. Reshaping existing prototypes is better for overall recognition speed, as each additional prototype causes more computation in every recognition round.

The prototype modification strategy used, Lvq($\alpha$), is based on Learning Vector Quantization (Kohonen 1997). If the classification is correct, the points of the nearest prototype are moved towards those of the input sample. If the classification resulted in an incorrect decision, the prototype points are moved in the opposite direction.

In the original LVQ rule, the reference vector $m$ nearest to the input vector $x$ is updated based on a positive learning coefficient $\alpha$ as in

$$m(t+1) = \begin{cases} m(t) + \alpha\,(x - m(t)), & \text{if } m \text{ and } x \text{ belong to the same class} \\ m(t) - \alpha\,(x - m(t)), & \text{otherwise.} \end{cases} \tag{4.30}$$

Figure 4.8: The operation of the modified LVQ training. The prototype stroke being moved is identified by the gray circles and the input sample by the black circles. The new locations for the prototype stroke's points are shown as open circles.

For the purposes of handwriting recognition, where the prototype and input sample generally do not have the same number of points, a modified rule has been derived (Laaksonen et al. 1998a). The point-to-point correspondence is established using DTW and the modified LVQ training is defined as follows.

Let $P(h) = (i(h), j(h))$, $h = 1, \ldots, H$, be the optimal time warping path between strokes $S_1 = (p_1(1), \ldots, p_1(N_1))$ and $S_2 = (p_2(1), \ldots, p_2(N_2))$. The PP distance between the strokes is

$$D_{\mathrm{PP}}(S_1, S_2) = \sum_{h=1}^{H} d\bigl(p_1(i(h)), p_2(j(h))\bigr), \tag{4.31}$$

where $d(p_1, p_2)$ is the squared Euclidean distance. If the points of stroke $S_2$ are moved, their new locations are

$$S_2^{\mathrm{new}} = \begin{cases} S_2^{\mathrm{old}} - \alpha \left. \dfrac{\partial D_{\mathrm{PP}}(S_1, S_2)}{\partial S_2} \right|_{S_2^{\mathrm{old}}}, & \text{if } S_1 \text{ and } S_2 \text{ belong to the same class} \\[2ex] S_2^{\mathrm{old}} + \alpha \left. \dfrac{\partial D_{\mathrm{PP}}(S_1, S_2)}{\partial S_2} \right|_{S_2^{\mathrm{old}}}, & \text{otherwise,} \end{cases} \tag{4.32}$$

where $\alpha$ again is the positive learning coefficient, and

$$\left. \frac{\partial D_{\mathrm{PP}}(S_1, S_2)}{\partial S_2} \right|_{S_2^{\mathrm{old}}} = -2 \begin{pmatrix} \sum_{h=1}^{H} \delta(1, j(h))\,\bigl(p_1(i(h)) - p_2(1)\bigr) \\ \vdots \\ \sum_{h=1}^{H} \delta(k, j(h))\,\bigl(p_1(i(h)) - p_2(k)\bigr) \\ \vdots \\ \sum_{h=1}^{H} \delta(N_2, j(h))\,\bigl(p_1(i(h)) - p_2(N_2)\bigr) \end{pmatrix}^{T}, \tag{4.33}$$

where $\delta(i, j)$ is Kronecker's delta function. The operation of the modified LVQ training rule is shown in Figure 4.8.
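The update of Equations (4.32) and (4.33) can be sketched as follows. This is only an illustration under stated assumptions: the function name `lvq_update`, the zero-based index pairs in `path`, and the representation of strokes as lists of `(x, y)` tuples are hypothetical, and the warping path is assumed to have been computed elsewhere by DTW.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def lvq_update(
    s1: Sequence[Point],
    s2: Sequence[Point],
    path: Sequence[Tuple[int, int]],
    alpha: float,
    same_class: bool,
) -> List[Point]:
    """Return the moved points of the prototype stroke s2.

    path holds zero-based pairs (i(h), j(h)) of the warping path between the
    input stroke s1 and the prototype stroke s2.
    """
    # Gradient of D_PP with respect to each point of s2, Equation (4.33):
    # every path step that touches point j contributes -2 * (p1(i) - p2(j)).
    grad = [[0.0, 0.0] for _ in s2]
    for i, j in path:
        grad[j][0] += -2.0 * (s1[i][0] - s2[j][0])
        grad[j][1] += -2.0 * (s1[i][1] - s2[j][1])

    # Equation (4.32): step against the gradient for the same class
    # (towards the input), along it otherwise (away from the input).
    sign = -1.0 if same_class else 1.0
    return [
        (p[0] + sign * alpha * g[0], p[1] + sign * alpha * g[1])
        for p, g in zip(s2, grad)
    ]
```

A prototype point matched to several input points accumulates one term per match, so it is pulled towards (or pushed away from) all of its matched points at once.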

4.6.4 Hybrid approaches

The adaptation strategy Hybrid($k$, $\alpha$) is a combination of the adaptation strategies Add($k$) (Section 4.6.1) and Lvq($\alpha$) (Section 4.6.3). In this strategy, the $k$ nearest prototypes are examined, and if any of them belong to the correct class for the input character, the nearest prototype is modified using Lvq($\alpha$). If none of the prototypes are correct, the input sample is added to the prototype set.

This strategy combines the benefits of being able to learn entirely new writing styles by adding prototypes while minimizing the growth of the prototype set through the use of prototype modification whenever possible. Thus the strategy can be expected to perform better than either adding or modifying the prototypes alone. This was also shown to be the case (Vuori 1999).
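The decision logic of the hybrid strategy can be sketched as below. The function name, the `(shape, label)` prototype representation and the two callbacks are hypothetical; `distance` stands in for the DTW-based dissimilarity and `modify` for an Lvq($\alpha$) update, neither of which is reproduced here.

```python
from typing import Callable, List, Tuple


def hybrid_adapt(
    sample: object,
    true_label: str,
    prototypes: List[Tuple[object, str]],
    distance: Callable[[object, object], float],
    modify: Callable[[object, object, bool], object],
    k: int = 3,
) -> str:
    """Perform one hybrid adaptation step; returns "modified" or "added".

    prototypes is a list of (shape, label) pairs and is changed in place.
    modify(shape, sample, same_class) is assumed to return the reshaped prototype.
    """
    ranked = sorted(
        range(len(prototypes)), key=lambda i: distance(sample, prototypes[i][0])
    )
    nearest_k = ranked[:k]
    if any(prototypes[i][1] == true_label for i in nearest_k):
        # A correct prototype is close enough: reshape the nearest one.
        n = nearest_k[0]
        shape, label = prototypes[n]
        prototypes[n] = (modify(shape, sample, label == true_label), label)
        return "modified"
    # No correct prototype among the k nearest: learn the sample as a new one.
    prototypes.append((sample, true_label))
    return "added"
```

Note that the nearest prototype may itself belong to a wrong class even when a correct one is among the $k$ nearest; in that case the sketch passes `same_class=False`, so an LVQ-style `modify` would move it away from the input.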


Chapter 5

Palm-top device implementation

The recognition system described in Chapter 4 was originally implemented on a large-scale platform. While its computational power makes that platform very effective for running tests, it is quite far from the scale of the intended target system. This chapter describes the creation of an implementation for an actual palm-top device and the features of that implementation.

5.1 Motivation

At a certain point in the development of the recognition system it was found that an implementation on a portable platform would benefit our testing needs, because the objective of the research has been to create a recognition system usable in such a context. Data collection is also expected to be easier to conduct when the writer need not sit at a workstation during collection. This can also be presumed to produce characters written in a more natural style, as the writer is functioning in a situation similar to that in which the device is intended to be used: a regular real-life environment.

Implementing the recognition system on a platform with much more limited computational resources also gave rise to several questions regarding the possibility of operating at a tolerable speed. Dynamic Time Warping is known to be a computationally demanding approach (Lu and Brodersen 1984), and as such it was expected to pose some problems regarding the operational speed. Thus methods for improving execution times on the small-scale platform also became of great interest.


Table 5.1: Palm-top device features

Manufacturer         Philips            Everex
Model                Nino 300           Freestyle Manager
Processor            Philips PR31700    NEC VR4102
Processor clock      75 MHz             66 MHz
RAM                  8 MB               8 MB
ROM                  8 MB               8 MB
Display resolution   240 × 320 × 4      240 × 320 × 2
Dot pitch            0.24 mm            0.24 mm

Table 5.2: Palm-top device writing resolutions

                           Philips Nino 300   Everex Freestyle Manager
Horizontal point spacing   4.3 ppmm           4.8 ppmm
Vertical point spacing     2.9 ppmm           3.7 ppmm

5.2 Palm-top platform description

For the test beds, two different palm-top devices have been used. The first was a Philips Nino 300 and the other an Everex Freestyle Manager. Some general properties of the devices have been gathered into Table 5.1 (Philips 1998, Everex 1998, Everex 2000). Both of the palm-top devices use the Windows CE (WinCE) operating system version 2.0.

The actual writing resolutions were tested by drawing several vertical and horizontal lines of the same length and averaging the results. The results of this evaluation are presented in Table 5.2, as measured in points per millimeter (ppmm).

5.3 Implementational Issues

In addition to the expected differences in performance, the application development environment for the small-scale platform is also very different from the standard C++ environment the recognizer was originally developed in. The development tools used on the palm-top platform were Microsoft's Visual C++ and the Windows CE Toolkit for Visual C++, of which versions 5.0 and, when they became available, 6.0 were used. The main difference is caused by the approach taken by Microsoft for WinCE programming.

As the nature of C++ is that things can generally be done in several ways, programmers tend to develop a style of their own when writing program code. With the WinCE Toolkit, a totally different approach has apparently been taken; for the most part, the number of choices available to the programmer has been greatly diminished. This would probably not be so much of a problem if the code being ported had initially been designed for a WIN32 platform, as those two seem reasonably compatible in terms of the basic classes and functions available. But since the original implementation was written on a UNIX-based system, the need to redirect several basic functions arose. The final outcome was that a new intermediate function interface was constructed, allowing the majority of the recognizer code to work unaltered on top of the intermediate interface layer. This layer then either performs the necessary operations with functions defined there or redirects the calls to existing WinCE functions. Examples of missing features include standard input and output streams, file streams in general and some basic functions such as atoi, calloc, exit, sscanf and strtol.

All this might not even have been a problem with previous familiarity with WIN32 programming. Without prior experience on these platforms, this basically simple phase of transporting the code and getting it to work on the new platform posed quite a challenge. After getting the software to work on the new platform, the performance was noted to be unacceptably low. This was no surprise, as the phenomenon had been expected.

5.3.1 Stray point removal

When the Wacom tablet was used on the large-scale platform, stray points were not an issue. But with the transition to the smaller-scale platform, the occurrence of stray points became a real concern. This is probably due to both imprecision in the capture of the pen point loci and the fact that the device is generally used while held in the palm, and as such is not very stable.
So, to negate the adverse effects of stray points on recognition performance, a method of stray point removal needed to be implemented for the palm-top platform.

Stray point removal was applied in a simple manner by comparing the distances between three consecutive points, $p_1$, $p_2$ and $p_3$. If the sum of the squared Euclidean distances to $p_2$ from its prior and next points, $p_1$ and $p_3$, respectively, was greater than the squared Euclidean distance between the points $p_1$ and $p_3$ multiplied by a removal constant $r$, the point was deemed stray and removed. This can be written as removing the point $p_2$ if

$$|p_1 - p_2|^2 + |p_2 - p_3|^2 > r\,|p_1 - p_3|^2 \tag{5.1}$$
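The test of Equation (5.1) can be sketched in a few lines. The function names are hypothetical, and one detail is an assumption not stated in the text: each point is tested against its neighbours in the original sequence, in a single pass, rather than against the points that survive earlier removals.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def remove_stray_points(points: Sequence[Point], r: float = 1.5) -> List[Point]:
    """Drop every interior point p2 that satisfies Equation (5.1).

    Assumption: p1 and p3 are the neighbours of p2 in the original sequence.
    The first and last points are always kept.
    """
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for p1, p2, p3 in zip(points, points[1:], points[2:]):
        if sq_dist(p1, p2) + sq_dist(p2, p3) > r * sq_dist(p1, p3):
            continue  # p2 is deemed stray
        kept.append(p2)
    kept.append(points[-1])
    return kept
```

With an infinite removal constant, for example `remove_stray_points(points, r=float("inf"))`, the condition can never hold and the input is returned unchanged.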

In Equation (5.1), $|\cdot|^2$ denotes the squared Euclidean distance. The function of stray point removal is shown in Figure 5.1.

Figure 5.1: Stray point removal checking, showing the points $p_1$, $p_2$ and $p_3$. When $r = 1$, removing the point corresponds to checking whether the angle depicted is less than $90^\circ$, which also corresponds to the point $p_2$ being outside the circle.

The functionality of this approach was tested with a prototype set of 273 characters and a test set of 709 characters, all written on the Philips Nino. Both character sets included lower and upper case Latin characters and digits. The results are shown in Table 5.3.

Table 5.3: Effects of stray point removal

Removal constant r   Percentage correct   Average recognition time (ms)
∞ (no removal)       57.8                 757
2.5                  57.8                 781
2                    58.0                 779
1.5                  58.1                 755
1.2                  57.8                 786

As can be seen, using stray point removal with too large values of $r$ produces no gains, as points are not removed, but slows down the recognition slightly because the distance calculations must be done on each input sample. The optimal value found in these tests was $r = 1.5$, which produced some gains in both speed and accuracy. As the benefits were small, the number of stray points was evidently not large in the data used, which was also verifiable by visually examining the characters. But this is highly writer dependent, as low pen pressure is a major factor in creating stray points. For writers applying low pen pressure the effects might be much more obvious and beneficial.

5.4 Computational enhancements

The mere change in platform caused some significant difficulties in achieving acceptable performance, in terms of both accuracy and speed. This required some alterations to the main body of the code. Table 5.4 shows the increase in recognition speed after various operations. The results have been obtained with a prototype set of only one character for each class, so the results are not fully realistic. But as the same set was used on all runs, they remain perfectly comparable to one another. The very small prototype set was used simply because of the time the entire recognition process took, especially in the early stages of the code. The testing set used consisted of 464 characters.

Table 5.4: Performance enhancements achieved with computational rearrangements

Operation performed                     Average time (ms)   Speedup factor   Total speedup factor
None (initial situation)                7084
Compiler optimizations                  5640                1.26             1.26
Distance calculation calls inlined      4543                1.24             1.56
Static memory allocation for matrices   1889                2.40             3.75
Added matrix value presetting           1805                1.05             3.92
Activated category-based ordering       1585                1.14             4.47
Moved to all-integer calculations       145                 10.93            48.86

The initial situation in the table is the result obtained by running the compiled code successfully for the first time. At this time the recognizer code had just been modified to be runnable on the small-scale platform. The first performance improvement was seen merely by activating a suitable set of compiler optimization parameters.

The third level of performance obtained was the result of removing the excessive amount of function call overhead in distance calculations. As compiler inlining of functions was not working properly, this was done manually. As can be seen, this manual inlining also resulted in a notable time decrease of 19.5%.

The first of the most significant benefits was obtained by moving from dynamic to static memory allocation for the most important calculation matrices. This was then combined with presetting some of the values in the matrices to provide even more speed gains. These operations are discussed in more detail in Section 5.4.1 below.

The activation of the category-based ordering of prototypes prior to recognition, discussed in Section 4.5, again added to the performance. The ordering had initially been disabled due to some functionality issues which were finally solved at this time.

The final, and clearly most significant, boost in performance was seen when moving from floating point to integer calculations. This resulted in a decrease of approximately 91% in recognition time from the previous state. The procedures involved are discussed in more detail in Section 5.4.2 below.
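The speedup factors and percentage decreases quoted for Table 5.4 follow directly from the average times. The short sketch below recomputes them; the helper names are hypothetical and only the measured times come from the table.

```python
from typing import List, Tuple

# Average recognition times (ms) from Table 5.4, in order of application.
TIMES_MS = [7084, 5640, 4543, 1889, 1805, 1585, 145]


def speedups(times: List[int]) -> List[Tuple[float, float]]:
    """Return (step speedup, total speedup) for every row after the first."""
    return [
        (round(prev / cur, 2), round(times[0] / cur, 2))
        for prev, cur in zip(times, times[1:])
    ]


def percent_decrease(before: int, after: int) -> float:
    """Relative decrease in time, in percent, rounded to one decimal."""
    return round(100.0 * (before - after) / before, 1)


factors = speedups(TIMES_MS)
inlining_gain = percent_decrease(5640, 4543)  # manual inlining step
integer_gain = percent_decrease(1585, 145)    # move to integer calculations
```

Each total speedup factor is the initial time divided by the current time, which is why the last row reaches almost 49 even though no single step other than the integer conversion exceeds a factor of 2.4.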

5.4.1 Dynamic memory allocation

It was known from the beginning of the development that the use of dynamic memory allocation on the WinCE platform might cause efficiency problems. Some of the speed decrease was expected to be attributable to the use of dynamic memory allocation.

For this reason, the allocation of the tables needed for the dynamic programming routine was moved to the initialization of the program, and the same tables were used in all calculations. The values were simply reset between recognition rounds without any freeing or reallocation of memory. The option to increase the table size when needed was naturally implemented, but after some initial testing the sizes of the tables were set to a level that rarely needs adjustment. In addition to allocating the memory, some steps were taken prior to the actual classification loop to improve performance by presetting values that could be anticipated before the calculation, especially in the initial corner of the matrix.

An overall decrease in computation time of 58.4% was instantly seen with the static matrices, as the matrices are allocated only once during the initialization phase. A small additional gain of 4.4% could be realized through the anticipative value settings.

5.4.2 Floating point calculations

The most notable difference regarding the functionality of the large and small-scale processors is that the small-scale platform's processor has no specialized floating point unit (FPU) for floating point number calculations. The original routines were all very FPU intensive, which posed no difficulties on the large-scale platform, so it is easy to imagine what the resulting speed of the operations was when the floating point instructions were emulated with integer numbers.

Once this problem was recognized, the program code was modified to use almost exclusively integer operations in the calculations of the main recognition loop. It was noted that on average most of the time, over 80% of the total recognition time, was spent in this section. The use of integer calculations naturally results in slightly reduced precision and the need for more careful overflow control. Still, the speed gains obtained made the switch the only viable option.

A comparison using the lower and upper case Latin characters and digits was performed. A prototype set of a total of 273 characters was written with the Philips Nino by one writer. The test set consisted of 709 characters also written by one, but different, writer. They were used to compare the effects of integer and floating point calculations.
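The two ideas of Sections 5.4.1 and 5.4.2, a table allocated once and an all-integer inner loop, can be combined in a small sketch. Everything named here (`IntDtw`, `INT_MAX`, the step pattern of the recursion) is an assumption made for illustration; the thesis implementation was written in C++ and its exact DTW variant is not reproduced. Python integers do not overflow, so the saturation at `INT_MAX` only mimics the overflow control a 32-bit implementation would need.

```python
from typing import List, Sequence, Tuple

IntPoint = Tuple[int, int]
INT_MAX = 2**31 - 1  # stands in for the largest 32-bit signed integer


class IntDtw:
    """Dynamic time warping on integer coordinates with a reusable table."""

    def __init__(self, max_points: int = 64) -> None:
        self._allocate(max_points)

    def _allocate(self, size: int) -> None:
        # Done once at start-up; repeated only if a stroke is too long.
        self.size = size
        self.table: List[List[int]] = [[INT_MAX] * size for _ in range(size)]

    def distance(
        self,
        s1: Sequence[IntPoint],
        s2: Sequence[IntPoint],
        best_so_far: int = INT_MAX,
    ) -> int:
        """Accumulated squared distance, or INT_MAX if the matching is aborted."""
        n, m = len(s1), len(s2)
        if max(n, m) > self.size:
            self._allocate(max(n, m))
        t = self.table
        for i in range(n):
            row_min = INT_MAX
            for j in range(m):
                dx = s1[i][0] - s2[j][0]
                dy = s1[i][1] - s2[j][1]
                cost = dx * dx + dy * dy
                if i == 0 and j == 0:
                    prev = 0
                else:
                    prev = min(
                        t[i - 1][j] if i > 0 else INT_MAX,
                        t[i][j - 1] if j > 0 else INT_MAX,
                        t[i - 1][j - 1] if i > 0 and j > 0 else INT_MAX,
                    )
                t[i][j] = min(prev + cost, INT_MAX)  # saturate instead of overflowing
                row_min = min(row_min, t[i][j])
            if row_min > best_so_far:
                return INT_MAX  # cannot beat the best distance found so far
        return t[n - 1][m - 1]
```

Every cell that is read during a call has been written earlier in the same call, so the table can be reused between matchings without clearing it.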

Table 5.5: Recognition accuracy and speed comparison between floating point and integer calculations

Calculation method   Recognition time (ms)   Error percentage
floating point       5173                    36.2%
integer              757                     42.2%

Table 5.6: Comparison of computational power available on different platforms

Platform           Integer result (ms)   Floating point result (ms)
large-scale        1282                  2283
Everex Freestyle   3435                  60232
Philips Nino       2791                  109140

As can be seen in Table 5.5, with the introduction of integer calculations into the dynamic programming loop, the recognition time dropped by approximately 85% from the time needed with floating point calculations. As a downside, the recognition error count also increased by 17% from the error count with floating point use. This drop in accuracy is regrettable, but from a usability point of view the compromise had to be made.

5.4.3 Computational power

As is evident by just looking at the large and small-scale platforms used, there will be differences in computational power. It was expected that the small-scale platform would introduce difficulty especially performance-wise. A mobile MIPS R4000-based processor running at well below 100 MHz can in no way compare to the large-scale platform based on 16 MIPS R10000 processors running at 195 and 250 MHz.

A simple comparison of the computational power involved is shown in Table 5.6. The results were obtained by calculating the first 1000 decimals of $\pi$ using an algorithm presented in (Rabinowitz and Wagon 1995). The results shown are an average of a total of 30 calculation runs. The difference in integer results, nearly 120% more time for the Nino than for the large-scale platform, is large but to be expected. The floating point difference of almost 50 times more time used, on the other hand, is simply unacceptable.

5.4.4 Data storage space

Another issue encountered at a late point during the actual testing of the system was the limitation of the total of 8 MB RAM available in both palm-top platforms.
As the intention is to gather information to further enhance the performance of the system, quite a lot of data is stored during the test phase. The total size of the output files is often in the range of several megabytes.

Thus the 8 MB base memory initially in the device proved too small. Luckily this problem was rather easy to circumvent, as the use of flash memory storage cards is very simple in these devices. Installing a 16 MB CompactFlash storage card in the Philips Nino posed no difficulties, but for some reason the cards did not function with the Everex device. This issue is yet to be resolved, but it should be just a matter of time before a driver update or another resolution suggestion is obtained from the manufacturer, as they have been contacted on this issue. Meanwhile the data collected successfully with the Everex will thus be of slightly lower volume than that with the Nino.

5.5 The questionnaire application

We developed a data collection and testing environment for the palm-top platform. The environment was implemented as a questionnaire program. The user interface of the questionnaire application is shown in Figure 5.2. The basic idea is that the program shows questions to be answered in the top region, called the question area. The user then uses either one of the two text-input boxes at the bottom, and the buttons on their right, to input the desired answer. The answer appears in the middle section, referred to as the answer text area, as it is being inputted. The insertion cursor is a blinking vertical line. If a portion of the text has been activated, it is shown on a black background, as is common in text-editing interfaces.

The questions were designed to provoke answers with more than just one or two letters. To enforce the gathering of sufficiently many characters, a minimum number of characters required before proceeding to the next question has also been implemented. The minimum number is question specific, and has been designed to be realistic in terms of an average answer. The limits range from two to twenty input characters.
In between questions the question area is also used to show a message of what the device is doing, for example "Processing, please wait...".

5.5.1 Inputting text

To input text, the user can write in either of the text-input boxes at the bottom in Figure 5.2. Characters are separated by timeouts or when the pen touches the area outside the box. Thus, when desired by the user, text can be inputted at a high speed by simply alternating between the two text-input boxes. The recognizer was designed to be able to recognize all upper and lower case letters from a to z and digits. In addition a single horizontal stroke from left to right corresponds to a space. A stroke drawn in the opposite direction is interpreted as a backspace.

Figure 5.2: User interface of the palm-top questionnaire program

The topmost button to the right of the writing boxes, marked with an ' ', also produces a backspace, and the button directly below it is for inserting a space. The Set button is used when the recognizer is simply unable to correctly label the input. It pops up a chart with all available labels and the user can select the desired label for the active character. The chart is also used to prompt the user for the labels of characters that the system was unable to recognize. Cases where no recognition result is obtained occur when the number of strokes in the input character exceeds that of any character in the prototype set.

The Sym button, short for symbol, is used to input special characters that are not recognized, such as '?', '%', '(', ',' etc. They are available mainly for pleasing the user's eye, as they have no effect on the functionality of the recognizer and are actually internally treated as spaces. The last button, OK, indicates that the user has answered the question and is ready to move on to the next one. Pressing it first instigates a check of whether enough characters have been written. If there is enough input for that question, the adaptation phase and the move to the next question take place.

Text in the answer area, the middle section in Figure 5.2, is inputted by using the control mechanisms described above. The text can be edited by deleting and replacing using standard Windows text-editing mechanisms, with the exception that edit commands, such as cut, copy and paste, have not been included. Text can be activated by dragging the pen point over the desired area of text, and the cursor can be relocated by tapping in the edit window. For the Set operation, if no single character has been selected, the active character is taken to be the one directly on the left of the cursor. An active area can be deleted by using the backspace character or button, or replaced by a single character inputted by writing or by pressing the space or Sym button.

Figure 5.3: A diagram illustrating the data structure for the data collection program
[Diagram nodes, given as label (input number): A (1), B (2), C (3, 4 and 5), D (7), E (9), F (8)]

5.5.2 Determining labels

The underlying data structure used to store characters and their labels is a simple linked list, where each character is stored along with its label, input index and on-screen position information. This list is modified during the writing process, and the first item with a given position always corresponds to the character currently seen in the editing section. If more than one character has been inputted and retained for a given position, they are all assigned the label of the topmost character in the "pile" of that position.

A schematic example of the data structure is shown in Figure 5.3. The example depicts the user having written the first six capital letters. The first two were inputted successfully and in order, but the character "C" needed three attempts for correct recognition. Then the user inputted a character that was later deleted, as the index number 6 is non-existent. Then the letters "D" and "F" were inputted, and "E" was inserted afterwards.

The most difficult task in the user-interface construction is that, in order for successful adaptation to be possible, each character must be assigned a correct label. For this purpose, a number of cases can be isolated:

1. A character has been written and left unchanged. Such a character is considered to have been correctly recognized.

2. A single character has been replaced with another character. The replacement can be performed either by activating the character and replacing it implicitly or by deleting the character using the backspace and inputting a new character directly afterwards. In either case it is deduced that the initial character was incorrectly recognized and the label of the new character is also assigned to the underlying character. Both characters are kept in the input character list.

3. A single character has been relabeled using the Set button. Then the label of the character is simply changed to that received from the user.

4. Several characters have been replaced with one input of any kind. This is thought to indicate the user's change of mind, and in such a situation nothing concerning the labels of the characters being replaced can be assumed. The replaced characters are thus discarded.

5. One or more characters have been replaced by a symbol. In such a case the writer has clearly desired to delete the characters. The characters are removed from the list of learning samples.

6. Several consecutive backspaces have been received. Also in this case the correctness of the characters being deleted cannot be established and as such they are removed from the list.

By the use of these rules the list containing the inputted characters is kept up to date and the labels therein are assumed correct for the adaptation process. Naturally the possibility of incorrect labels still exists, but through these principles the labels should be correct if the user has noticed and taken care to correct all recognition errors before submitting the answer. The only unsolved situation is that of the user's change of mind for a single character, meaning that the user writes one character, deletes it and writes an entirely new one instead. Such situations are mistaken for case 2 above. It was thought that error correction, the basis for case 2 above, is more important and common, so the possibility of error was deemed a reasonable risk.

5.5.3 Adaptation

Adaptation is performed after submitting an answer to a question.
This has the drawback that users might be somewhat confused when the system at first does not seem to learn anything but then suddenly, when entering the next question, has indeed adapted. The problem is not an issue as long as the user is aware of this behavior. If adaptation were performed during writing, the correctness of the training samples could not be ensured. In many cases the need to revert to a previous stage would arise, increasing the requirements for both computational power and storage capabilities beyond what is available for algorithms on the palm-top platform. When the adaptation is performed after the user is satisfied with the answer to the question, it can be quite reliably assumed that he or she left no errors. Then the adaptation can be confidently performed on the labels deduced according to the scheme presented in Section 5.5.2.

The actual adaptation is performed with the Hybrid($k$, $\alpha$) adaptation method described in Section 4.6.4. The values for $k$ and $\alpha$ were previously found to be most effective at 3 and 0.3, respectively (Vuori et al. 1999), and these values were also used for the palm-top implementation.

5.5.4 Data collection

During operation, an exhaustive amount of data is stored in a log file for the entire process. The amount of information is in general enough to reproduce all events during data collection. Each entry in the log file holds a time stamp to help trace the progress of the experiment.

In addition to the actual characters written and their deduced labels, all button presses and the contents of the editing section on any change are logged. The question being answered is also stored, as is information on any notable internal operation, such as alterations in the contents of the storage list, which adaptation was performed in the adaptation phase and, if applicable, which prototype was altered. The additional information is stored as comments in the UNIPEN-format file containing the input data. As the log file adheres to the UNIPEN file format, glyphs stored in the log can be browsed with a UNIPEN viewer.

5.6 Speedup methodologies

As insufficient recognition speed is a major issue in the palm-top platform implementation, methods for increasing speed have been examined. Some of them have yet to be implemented on the actual platform, as they have so far been tested only on the large-scale system. The experiments have mainly been performed on the large-scale system, as running extensive tests on the small-scale system is very cumbersome and difficult due to insufficiencies in both storage space and computational power.
Two speedup approaches are examined in more detail below.

5.6.1 Decimation

The operator Decimate($n$), described in Section 4.2, was an obvious means of improving recognition speed. As decimation removes information-containing points from the incoming data, an extensive amount of decimation produces notable decreases in the recognition performance (Vuori 1999).

The effects of decimation were tested on the palm-top platform with a prototype set of 273 prototypes and a test set of 709 characters. Both upper and lower case Latin characters and digits were included. The results have been gathered into Table 5.7.

Table 5.7: Performance effects of decimation

Decimation level   Percentage correct   Average recognition time (ms)
None               58.1                 754.8
1                  57.0                 473.4
2                  53.0                 387.0
3                  42.0                 347.3
4                  44.1                 312.5

As can be seen, the recognition speed increases notably when moving from no decimation to 1 or further to 2 point decimation. The accuracy loss is also acceptable for the system, as the adaptive recognition, which was not included in these simulations, helps improve the final accuracy. But with 3 or 4 point decimation, the loss in recognition accuracy is very prominent, and the recognition speed gains are no longer worth the loss. Thus the operation Decimate(2) was chosen for use on the palm-top platform.

5.6.2 Predictive calculation aborting

The second speedup method examined focuses on trying to predict whether the result of the distance calculation will exceed the maximal allowed value, that is, the minimal distance calculated so far. As the calculation algorithm already includes an option to abort the calculation if the shortest distance already found is exceeded, the prediction merely causes the calculation to be terminated at an earlier stage.

The form of the prediction function was chosen based on modeling previously gathered data sequences of distance cumulation at varying points of matching. The prediction function $d_{\mathrm{est}}$ at point $i$ of the stroke $S$ with a total of $I$ points is of the form

$$d_{\mathrm{est}}(d_i, i) = d_i \left( \alpha \frac{I}{i} + \beta \right), \tag{5.2}$$

where $d_i$ is the calculated column-minimum distance from the dynamic programming algorithm at the point $i$.
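As a minimal sketch of how Equation (5.2) could be used as an abort test, consider the two helpers below. The function names are hypothetical, the point index `i` is taken to run from 1 to `total_points`, and the default parameter values 0.1 and 1.0 are borrowed from one of the example fittings quoted in this section rather than from the values used in the reported experiments.

```python
def predicted_total(
    d_i: float, i: int, total_points: int, alpha: float = 0.1, beta: float = 1.0
) -> float:
    """Estimate of the final distance from the column minimum d_i at point i."""
    if not 1 <= i <= total_points:
        raise ValueError("i must lie in 1..total_points")
    return d_i * (alpha * total_points / i + beta)


def should_abort(
    d_i: float,
    i: int,
    total_points: int,
    best_so_far: float,
    alpha: float = 0.1,
    beta: float = 1.0,
) -> bool:
    """True when the predicted total already exceeds the best distance found."""
    return predicted_total(d_i, i, total_points, alpha, beta) > best_so_far
```

For instance, with `d_i = 10`, `i = 5`, `total_points = 20` and a best distance of 12, the plain early termination would continue because 10 does not exceed 12, whereas the prediction gives about 14 and aborts the matching.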
The form of the predi tion needs to be sele ted so that itdoes not enormously overestimate the total distan e, whi h would give rise to ex essiveabortings.Figure 5.4(a) shows the average behavior of D(S)=di as a fun tion on iI , where D(S)85

is the total distance resulting from the matching of $S$. The averaged matchings are all final correct results for each character, i.e. the ones with the smallest total distance with the requirement that the closest match was indeed correct. In Figure 5.4(b), 0.1% of the smallest values have been removed and then the minimum of the remaining ones has been plotted. Two example fittings, the functions $0.1\,I/i + 1$ and $0.2\,I/i + 1.5$, have also been included to demonstrate the behavior of the minimums.

Figure 5.4: Plots of $D(S)/d_{\mathrm{colmin}}(i/I)$: (a) averaged and (b) minimal (points), along with the functions $0.1/x + 1$ and $0.2/x + 1.5$.

As the plots are the relation between $D(S)$ and $d_i$, it can be noted that precisely the minimum is the value of most interest, as that is the deciding one in the final prediction. The removal of the 0.1% can be expected to directly translate to errors, but it enables calculation aborting with some probability to take place.

This method was tested using a test set consisting of a total of 29292 characters from 24 writers. They were classified with a prototype set containing 7 prototypes for each class, a total of 476 prototypes produced with the semiautomatic clustering method described in Section 4.4. The sets included the lower and upper case Latin characters and digits. Some results of average recognition times per character, recognition accuracy and the fraction of prediction-based calculation interruptions obtained with this prediction method are presented in Table 5.8. The last column entitled "Activation percentage" represents the percentage of the matchings aborted due to the prediction algorithm of all matchings initiated.

As can be seen, the computational speed can be improved noticeably, and in some cases also the recognition accuracy improves.
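As a minimal sketch of how the prediction rule (5.2) decides on an abort, the following Python assumes that the column minima of one matching are available as a plain list. The function names, the list-based interface and the example values are illustrative assumptions and not part of the implemented recognizer, where the test would sit inside the dynamic programming loop itself.

```python
def predicted_total(d_i, i, total_points, alpha, beta):
    """Estimate the final distance from the column minimum d_i at point i (1-based),
    following the form of Eq. (5.2): d_i * (alpha * I / i + beta)."""
    return d_i * (alpha * total_points / i + beta)


def first_abort_point(column_minima, best_so_far, alpha, beta):
    """Return the first point (1-based) at which the predicted total distance
    exceeds the smallest distance found so far, or None if no abort occurs.

    column_minima is a hypothetical precomputed list of d_i values; a real
    recognizer would compute them column by column and stop immediately.
    """
    total_points = len(column_minima)
    for i, d_i in enumerate(column_minima, start=1):
        if predicted_total(d_i, i, total_points, alpha, beta) > best_so_far:
            return i
    return None
```

With larger values of alpha and beta the estimate grows, so matchings are aborted earlier and more often, which is the trade-off between speed and error rate listed in Table 5.8.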
The improvement of the recognition accuracy is probably more of a coincidence, as this suggests that there are some character-prototype pairs in the set where the end part produces a very good match even though the prototype belongs to the incorrect class. But as can be seen, the recognition time

can be reduced by 63% with no negative effects on the recognition performance, and by 71% with just an increase of 4% in the recognition error.

Table 5.8: Performance effects of prediction

α          β      Average recognition time    Error        Activation
                  per character (ms)          percentage   percentage
Reference         304.6                       34.3         0.0
0.1        0.0    179.6                       34.5         43.0
0.1        1.0    165.5                       33.9         81.7
0.2        1.0    127.7                       33.8         83.1
0.3        0.75   113.5                       34.3         82.4
0.2        1.5    112.6                       36.1         85.2
0.325      0.75   108.7                       34.5         82.9
0.4        0.75   95.8                        35.0         83.8
0.45       0.75   89.4                        35.8         84.2
0.5        1.0    80.9                        37.0         85.3
0.9        0.0    65.9                        40.2         84.4
1.0        1.0    57.3                        43.5         87.0

If the prediction algorithm can be implemented in a way feasible in the small-scale platform, these speed increases are extremely promising. The main problem is that for a function of the form 1/x the calculation of the values is by no means trivial for the integer processor. This will likely cause less significant benefits in terms of faster recognition. One possible implementation would be through the use of a look-up table for the values at varying points. When using slightly less strict predictions the unavoidable rounding error should not have any significant effect on the final outcome.

5.7 Current level of performance

The current performance difference between the large and small-scale platforms was tested by running an identical batch run on both systems. The prototype set consisted of 273 prototypes and the test set of 709 characters. All characters were written on the Philips Nino and included the upper and lower case Latin characters and digits. The results have been gathered into Table 5.9.

Table 5.9: Performance difference between the platforms

Platform      Percentage correct   Average recognition time (ms)
Palm-top      53.0                 387.0
Large-scale   61.5                 51.9

As is evident from both Tables 5.6 and 5.9, the small-scale platform simply cannot

compete with the large-scale platform in terms of computational power. The need to avoid floating point operations results from the lack of a specialized FPU in the palm-top devices. The use of integer operations does help speed up computation, but also results in slight deterioration in recognition rate, as was evident in Table 5.5. Thus this comparison showing a 645% increase in computation time along with a 14% deterioration in recognition rate is no surprise.

The speed of human handwriting input is approximately 18.5 wpm, with a word being defined as approximately five characters (MacKenzie et al. 1994). This translates to approximately 1.5 characters per second. The recognition should then be completed within approximately 650 milliseconds. As can be seen, both platforms reach this rate when operating on batch runs.

Sadly though, the actual recognition speed of the palm-top device drops quite noticeably from the batch run rate due to overheads from the user interface and data storage. The actual recognition time during use, the time it takes between the stopping of writing and receiving the result from the recognizer, was measured at approximately 450 ms during an actual test run. This corresponds to an overhead of approximately 17% compared with the recognition performed on exactly the same input data and prototypes in batch mode. When the user-interface and data storage overheads are added, the recognition time can be said to be still simply too long, as the user needs to wait for the recognition result to appear before continuing to write.
A slight relief is provided by the user interface, as it allows characters to be written at a faster rate, with the results just appearing after a delay.

5.8 Experiments and results

Evaluation of the recognizer performance in a real-world application, such as the questionnaire application described in Section 5.5, is not as straightforward as simply calculating the recognition rate as the ratio of correct recognitions to all recognition attempts. The true label of the input sample is not known at the time of recognition, and labelling the samples afterwards manually is a very laborious task. In some cases the label is also ambiguous for the human reader.

Some applicable measures were identified for this purpose and are shown in Figure 5.5. These results were obtained with a prototype set consisting of 273 characters and a test set of 709 characters, written by different writers. All characters were written on the Philips Nino and included the upper and lower case Latin characters and digits. The criteria include the ratios between the counts of submitted characters and all input characters (Figure 5.5(a)), characters submitted with one attempt and all submitted characters

(Figure 5.5(b)), or characters submitted with one attempt versus all input characters (Figure 5.5(c)). Here submitted characters refers to the input samples that were left in the storage list when the user deemed the question answered. Also the recognition time per character (Figure 5.5(d)), and its development, are naturally of interest. Additional information on the performance can be gained from simply questioning the subject as to whether performance at various points of the questionnaire was acceptable.

Figure 5.5: Plots of the four criteria devised to evaluate recognizer performance, each as a function of the question number: (a) ratio of submitted characters to all inputted characters, (b) ratio of characters submitted on the first attempt to all submitted characters, (c) ratio of characters submitted on the first attempt to all inputted characters, (d) average recognition time per character (ms).

It can be noted that the average recognition time overall was 452 ms, starting at approximately 300 ms and increasing, due to the addition of prototypes, to approximately 480 ms after the first 10 questions. Also all the ratios, on the average, improved towards the end of the questionnaire. This suggests that the adaptation of the system was successful, as the user did not need to re-input, or correct, as many input samples as in the beginning of the questionnaire.
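The three ratio criteria of Figure 5.5(a)-(c) can be computed per question from three counts. The sketch below is only an illustration in Python; the function name, the dictionary keys and the example counts are assumptions made here and do not come from the questionnaire program.

```python
def evaluation_ratios(n_input, n_submitted, n_first_attempt):
    """Compute the three ratio criteria for one question.

    n_input         -- all characters the user wrote for the question
    n_submitted     -- characters left in the storage list at submission
    n_first_attempt -- submitted characters that needed only one attempt
    All counts are assumed to be positive.
    """
    return {
        "submitted/input": n_submitted / n_input,                  # Figure 5.5(a)
        "first_attempt/submitted": n_first_attempt / n_submitted,  # Figure 5.5(b)
        "first_attempt/input": n_first_attempt / n_input,          # Figure 5.5(c)
    }


# Hypothetical counts for a single question.
ratios = evaluation_ratios(n_input=20, n_submitted=16, n_first_attempt=12)
```

A value closer to one in each of the three ratios indicates that fewer samples had to be rewritten or corrected, which is how the improvement towards the end of the questionnaire shows up.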

Chapter 6

Adaptive committee recognition

This chapter deals with experiments performed with adaptive committee classifiers. Mainly the Dynamically Expanding Context (DEC) (Kohonen 1986, Kohonen 1987) approach has been used, but also some other methods have been examined mainly to serve as reference classifiers to evaluate the performance of the DEC-based committee. The classifiers and results obtained are discussed in the following sections.

6.1 Motivation

Even though rather impressive results have been obtained with the Dynamic Time Warping-based recognizer using single classifier adaptation (Vuori 1999), the question as to how these results could still be improved on was left open. As in general a combination of recognizers can be expected to perform better than any of its members, using an adaptive committee approach was a very logical step.

When searching for a suitable method of committee adaptation the idea of using the DEC principle, previously mainly used for speech recognition (Kohonen 1986, Torkkola and Kohonen 1988, Torkkola 1993, Kurimo 1998), arose. The method is described in detail below, as is its modified version used in our handwriting recognition project.

6.2 Dynamically Expanding Context

Here the main principle of the Dynamically Expanding Context is explained. Both the original principle and an illustrative experiment on context directions are described.


6.2.1 The original DEC principle

The original algorithm was developed to create transformation rules that would correct typical coarticulation effects in phonemic speech recognition. The notation for a DEC rule stands as

\[ x(A)y \rightarrow B, \tag{6.1} \]

where A is a segment of the source string S, B is the corresponding segment in the transformed string T, and x(·)y is the context in string S where A occurs. So in other words A is replaced by B under the condition x(·)y (Kohonen 1986).

The main philosophy behind the approach is to determine just a sufficient amount of context for each individual segment A so that all conflicts in the set of training samples will be resolved (Kohonen 1987). Thus an optimal compromise between accuracy and generality is expected to be obtained.

In the DEC principle a series of stepwise expanding frames, each consisting of an increasing number of symbol positions on either or both sides of A, is created. The nth frame in the series is said to correspond to the nth level of context. Examples of contextual levels for the letter "s" in the string "eisinki", an erroneous form of "helsinki", are shown in Table 6.1.

Table 6.1: Two examples of contextual levels for the letter "s" in the string "eisinki"

Level   Context I   Context II
0       -           -
1       i(·)        (·)i
2       i(·)i       i(·)i
3       ei(·)i      i(·)in
4       ei(·)in     ei(·)in

The central idea of the method is to always first try to find a production of the lowest contextual level to sufficiently separate contradictory cases. Starting with context level 0, or context-free level, contexts of successively higher levels will be utilized until all conflicts are resolved.

In Kohonen (1987) it is stated that in order to define the context-dependent production rules one has to first find the context-independent production rules. This means finding out which segments in the transformed string T match best with the given segments in the source string S. This was performed using the weighted Levenshtein distance

defined as

\[ D_{\mathrm{Levenshtein}}(S, T) = \min(pa + qb + rc), \tag{6.2} \]

where T is obtained from S using a replacements, b insertions and c deletions, and p, q and r are the weighting coefficients for each respective operation. The weights can be determined heuristically or obtained statistically from the confusion matrix of the alphabet as the inverse probability for a particular type of error. The confusion matrix is a matrix depicting for each character what characters it was confused with, and how often, during the recognition. The minimum of (6.2) is sought using an algorithm based on dynamic programming described in (Kohonen 1987).

Because the true size of the context cannot be known beforehand, the training data might have to be presented multiple times iteratively. To enforce correct rule creation, a conflict bit is used for each rule. This bit remains 0 as long as the rule is valid, and is changed to 1 if the rule needs to be invalidated. Using this knowledge, the final contextual levels required can be determined automatically in a hierarchical search. Estimation procedures for cases where the search for a suitable rule is unsuccessful are also presented in (Kohonen 1987).

This original approach was tested by Kohonen (1986) with several training sets consisting of 5000 to 9000 words. With the training samples, sets of correction rules with over 10000 productions were established. When using a familiar vocabulary the transcription accuracy rose from 68.0% to 90.5% as a total of 70% of the errors were corrected. With unfamiliar words the transcription accuracy improved from 66.0% to 85.1%, corresponding to 56% of the errors being corrected.

6.2.2 Effects of specificity hierarchies

In (Torkkola 1993) the DEC principle is applied to the transformation of English text into phonemes on the basis of the local letter context. Different specificity hierarchies are explored, namely to left only, to right only, symmetrical starting from left, symmetrical starting from right, left-weighted and right-weighted.
An example of these is shown in Table 6.2, where two alternatives for the context expansion of the word "abbreviation" are depicted.

After having been tested with an 18008 word training set and a 2000 word testing set, the specificity hierarchies were ordered in performance. The results are shown in Table 6.3. As can be seen, the direction of the learning hierarchy can have a surprisingly strong effect on the final performance.
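The two expansion orders of Table 6.2 can be reproduced with a short sketch. The Python below is only an illustration: the pattern strings "RL" and "RRL", the function names and the use of "_" for positions beyond the word boundary are assumptions made here, not notation taken from (Torkkola 1993).

```python
from itertools import cycle


def _char_at(word, index, pad):
    """Return word[index], or the padding symbol outside the word."""
    return word[index] if 0 <= index < len(word) else pad


def context_levels(word, pos, pattern, levels, pad="_"):
    """List the context frames of word[pos] for levels 0..levels.

    pattern is a string of 'L'/'R' steps that is repeated cyclically;
    each step widens the frame by one symbol to the left or right.
    """
    frames = []
    n_left = n_right = 0
    steps = cycle(pattern)
    for level in range(levels + 1):
        if level > 0:
            if next(steps) == "L":
                n_left += 1
            else:
                n_right += 1
        left = "".join(_char_at(word, i, pad) for i in range(pos - n_left, pos))
        right = "".join(
            _char_at(word, i, pad) for i in range(pos + 1, pos + 1 + n_right)
        )
        frames.append(f"{left}({word[pos]}){right}")
    return frames


# The letter "t" is at index 8 of "abbreviation".
symmetrical = context_levels("abbreviation", 8, "RL", 5)
right_weighted = context_levels("abbreviation", 8, "RRL", 5)
```

Here "RL" alternates right and left starting from the right, and "RRL" takes two steps to the right for every step to the left, which yields the symmetrical and right-weighted columns respectively.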

Table 6.2: Examples of expanding learning hierarchies

Context level   Symmetrical expansion   Right-weighted expansion
0               (t)                     (t)
1               (t)i                    (t)i
2               a(t)i                   (t)io
3               a(t)io                  a(t)io
4               ia(t)io                 a(t)ion
5               ia(t)ion                a(t)ion_

Table 6.3: The performance of different specificity hierarchies

Context expansion scheme           Accuracy (%)
to left only                       78.1
to right only                      86.6
symmetrical, starting from left    89.0
symmetrical, starting from right   90.6
1/3 right, 2/3 left                86.8
2/3 right, 1/3 left                90.8

6.3 DEC-based adaptive committee classifier

The DEC principle needed to be slightly modified to suit the use as a committee combinator of classifiers in the setting of this thesis. This section covers the used version of the algorithm and the implemented options, as well as the actual implementation of the algorithm.

6.3.1 Overview

In the DEC-based committee the classifiers are first initialized and then tested separately and ranked in order of decreasing performance. The outputs and the second-ranking results of the member classifiers are used as a one-sided context for the creation of the DEC rules. A schematic diagram of the DEC-based adaptive committee classifier is shown in Figure 6.1. In this example there are three member classifiers. The first-ranking results are denoted symbolically as a, b and c, and the second-ranking ones as d, e and f.

Each time a character is input to the system, the existing rules are first searched through. If no applicable rule is found, the default decision, which can be for example the result of the best individual classifier or a majority voting result, is applied. If the found rule, or default decision, results in an incorrect recognition result, a new rule is created.

Figure 6.1: A block diagram of the DEC-based adaptive committee classifier. The committee members (Classifiers #1, #2 and #3) supply their first-ranking outputs (a, b, c) and second-ranking outputs (d, e, f) to the committee machine, whose DEC rules (a → p, ab → q, abc → r, abcd → s) give the recognition result.

Whenever a new rule is created, it employs more contextual knowledge, if at all possible, than the rule causing the conflict. Eventually the entire context available will be used and more precise rules can no longer be written. For this situation a method for tracking the correctness of the rules can be used and the highest level rule most likely to be correct be applied.

In the case of off-line training the training set can be reiterated until rule consistency is ensured. But as the objective of this research is an on-line system, storing all previous input samples and using them in an iterative manner would be too expensive in terms of both computational performance and storage space. Thus it is assumed that prior samples will not be available after they have been recognized.

The introduction of a new writer always results in the re-initialization of the rule base, as the adaptation is aimed to be user-dependent. The previous rule sets can naturally be stored for use if input from the same writer will again be received.

6.3.2 Example of operation

An example of the operation of the DEC-based committee is presented in Table 6.4. The columns of the table show the correct class of the input character, the first outputs of the member classifiers, the existing rules, the output of the committee, its evaluation, and the action taken.

In this example it is presumed that no rules exist prior to receiving the first sample. The default decision is taken to be the result from the best individual classifier. A series of hardly distinguishable letters `t' and `f' are input to the system, and the operation proceeds as follows.

On the first round, the input sample is a `t' and is classified as such by the best-ranked

classifier. As no rules exist, this result is used for the result of the committee. Since the result is correct, no further actions need to be taken.

On the next round, the members' outputs are identical, but the correct label would have been `f'. Thus the committee's result is incorrect, and a new rule is added. As no rules for such a case exist, the use of a one-character context is sufficient to resolve the conflict and the rule "t f - → f" is inserted. The third round presents an identical situation as the second, but as the rule created on the previous round matches the inputs, it is used to produce correct output.

The fourth round demonstrates the incorrect result from the previously created rule. This leads to the generation of the new rule "t f t → t". Using this new rule the committee is then again able to correctly classify the identical situation in round five, as the rule with the most context is used.

Table 6.4: An example of DEC rule generation

Round   Input   Members   Rules         Output   Result   Action
1.      t       t f t     –             t        ok       –
2.      f       t f t     –             t        err      add rule: t f - → f
3.      f       t f t     t f - → f     f        ok       –
4.      t       t f t     t f - → f     f        err      add rule: t f t → t
5.      t       t f t     t f - → f     t        ok       –
                          t f t → t

6.3.3 Options

Several options were explored in the search for the best achievable recognition result using the committee. Herein those variables are explained.

Default decision

The system's default decision rule is needed when no character-specific rules yet exist. Two methods for producing the default decision were experimented with. The first is to simply use the output of the best-ranked classifier as the default decision for the committee. The alternative is to perform majority voting on the results obtained from the classifiers to make the default decision.
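The rule generation of Table 6.4, together with the two default decisions just described, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not the implementation used in the thesis: the class and method names are hypothetical, only first-ranking outputs are used, rules are keyed by prefixes of the member outputs (best member first), and once the whole context is in use a conflicting rule is simply overwritten instead of being handled by correctness tracking.

```python
from collections import Counter


class DecCommittee:
    """Toy DEC-style committee over the first-ranking member outputs."""

    def __init__(self, default="best"):
        self.default = default  # "best" or "majority" (assumed option names)
        self.rules = {}         # context prefix (tuple) -> output label

    def _default_decision(self, outputs):
        if self.default == "majority":
            return Counter(outputs).most_common(1)[0][0]
        return outputs[0]       # members are assumed ordered best first

    def classify(self, outputs):
        """Return (label, level), preferring the rule with the most context."""
        for length in range(len(outputs), 1, -1):
            label = self.rules.get(tuple(outputs[:length]))
            if label is not None:
                return label, length - 1
        return self._default_decision(outputs), 0

    def update(self, outputs, true_label):
        """Classify one sample and add a more specific rule after an error."""
        label, level = self.classify(outputs)
        if label != true_label:
            # One more symbol of context than the decision that failed,
            # capped at the full context (simplification of this sketch).
            length = min(level + 2, len(outputs))
            self.rules[tuple(outputs[:length])] = true_label
        return label


committee = DecCommittee(default="best")
outputs = [committee.update(("t", "f", "t"), label) for label in "tfftt"]
# outputs == ["t", "t", "f", "f", "t"], the Output column of Table 6.4
```

Running the five rounds leaves two rules in the rule base, one with the context (t, f) and one with (t, f, t), mirroring the last row of the table.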

Requiring the inclusion of the output

Another variation implemented was the possibility to require that the output symbol x for a rule of the form A → x must be included in the context A. In other words, one of the classifiers must produce the result for it to be the output of the committee. Activation of this option will hinder the creation of malicious transformation rules.

Use of second-ranking results

The committee can function either by using just the first-ranking results from its member classifiers or by also including the second-ranking results. The second-ranking results can be used in two ways, either horizontally or vertically.

The horizontal inclusion of the second-ranking results means that first the first-ranking result from the best-performing individual member is used. Then the second-ranking result is used from the same classifier, and after this the two results from the second-best performing classifier in the same order, then the third classifier and so on. In Figure 6.1, this corresponds to the order `a', `d', `b', `e', `c' and `f'.

The vertical approach to second-ranking result use means that all first-ranked results are used prior to second-ranked results from any classifier. So the first-ranked result of the best classifier is followed by the first-ranked result from the second classifier and so on until all first-ranked results have been used. Then the second-ranked results are used in a similar fashion. This approach corresponds to the order `a', `b', `c', `d', `e' and `f' in Figure 6.1.

Correctness tracking

The initial version of the DEC implementation simply discarded rules as they resulted in an incorrect answer but this was quickly seen to be suboptimal. Hence three options were implemented to discriminate between conflicting high-level rules.
These are 1) inactivation of the latest incorrect rule, 2) counting the correct applications and using the one with most correct results at the time, or 3) counting both the correct and incorrect applications and making the decision based on their difference.

6.3.4 Implementation

The DEC-based committee classifier has been implemented only on the large-scale platform described in Chapter 4. The committee classifier is a separate software component capable of using the recognition result information received from any number

of individual classifiers to perform the committee operation.

The rule base for each writer is stored as a simple doubly-linked list and can be exported into a file for examination and restored for further use. In addition to the overall recognition performance information, the counts of correct and incorrect uses for each rule are also tracked.

For the experiments presented in this thesis, the committee was implemented and run in batch mode simulating on-line operation by taking data in its original order and disallowing reiteration. The individual member classifier outputs were created beforehand and stored in files with a format consisting of the information on the writer, the true label of the character, distance to the nearest prototype as well as its label and order number, and the same information for the second-nearest prototype.

6.4 Reference committee classifiers

In order to evaluate the performance of the DEC-based adaptive committee, some reference classifiers need to be used. The first two presented here are very simple adaptive committee combination methods. The third one uses a slightly more complex approach, and also incorporates information on the distances between the input characters and the prototypes into the decision-making scheme. All adaptation was performed in a user-dependent manner, and values were reset in between writers. The fourth, non-adaptive reference classifier used is a basic majority voting scheme, as presented in Section 3.7.1.

6.4.1 Adjusting best committee

The adjusting best committee is perhaps the most simple form of committee adaptation. The main idea is to select the best classifier for each individual writer by evaluating each classifier's performance during operation and use the result from the classifier that has performed the best up to that point.

The performance evaluation was conducted by simply keeping track of correct results obtained from each classifier.
At any given time the committee's decision is thus the result from the classifier with the highest correct answer count at that point.

6.4.2 Adjusting majority voting committee

Another very simple approach to adaptive committee decisions is to use a variation of the traditional majority voting rule presented in Section 3.7.1. Adaptation was

implemented by introducing weights based on an evaluation of correctness for each voting classifier as in

\[ w_i = \frac{c_i + 1}{\sum_{j=1}^{N} c_j + 1}, \tag{6.3} \]

where $w_i$ is the weight for the output of classifier number $i$, $c_i$ is the count of correct recognitions from that classifier and $N$ is the total number of classifiers. The addition of one in both the numerator and denominator has been performed merely to avoid both zero weights and divisions by zero before any correct recognition results have been obtained.

With this weighting, the final majority voting decision in (3.10) is modified to assigning

\[ Z \rightarrow \omega_j \quad \text{if} \quad \sum_{i=1}^{N} w_i \Delta_{ji} = \max_{k=1}^{C} \sum_{i=1}^{N} w_i \Delta_{ki}, \tag{6.4} \]

where the sum on the right hand side equals the weighted sum of the votes received for the class $k$ from the $N$ recognizers. $\Delta_{ki}$ is defined by hardening the a posteriori probabilities $P(\omega_k | x_i)$ to binary values as in (3.11).

6.4.3 Modified Current-Best-Learning committee

The Current-Best-Learning (CBL) algorithm (Russell and Norvig 1995) strives for a consistent hypothesis for the entire set of samples by generalizing or specializing an initial hypothesis. The original algorithm uses backtracking to ensure that the hypothesis is also consistent with all prior samples. The specialization operation indicates that a unit, a location within the hypothesis space, that was previously positive must be deemed negative, and the generalization then refers to setting a previous negative to positive.

The actual algorithm used for this reference committee has grown quite far from that initial idea, but as the resemblance is still evident, it is here called the Modified Current-Best-Learning (MCBL). As in the original version, the data space is a two dimensional grid. The use of just a positive and negative value would require a separate class for each sample, and this would by no means be practical. So the values used here are in a way estimates of the confidence in a particular decision, and are defined as

\[ c_j(x) = 1 - \frac{d_j(x)}{d_1(x) + d_2(x)}, \tag{6.5} \]

where $c_j(x)$ is the confidence outputted for the sample $x$, $j \in \{1, 2\}$ is the index indicating whether the confidence value is being calculated for the first or second-ranking result, and $d_1(x)$ and $d_2(x)$ are the distances to the first and second-ranked prototypes, respectively.

By collecting the values and combining them into class-wise confidence values $p_k(\omega_j)$, where $k$ is the number of the classifier and $\omega_j$ the class, the data structure depicted in Table 6.5 is obtained. The decision of the committee is simply the result of the member classifier that has the largest confidence value. To modify the hypothesis, the values $p_k(\omega_j)$ are adjusted when the committee as a whole is incorrect. So when an individual classifier $k$ is correct, the confidence of the result, $c_j(x)$, for that classifier is added to the overall confidence of the class for that classifier, $p_k(\omega_j)$, when the input sample i was classified as $\omega_j$. On the other hand, when a classifier produces an incorrect result, its total confidence is reduced by the corresponding amount, but not below zero. When the committee produces the correct result, the current hypothesis has been effective and no changes are made.

Table 6.5: Description of the hypothesis space using the confidence values

      Classifier 1   Classifier 2   Classifier 3   Classifier 4
a     p1(a)          p2(a)          p3(a)          p4(a)
b     p1(b)          p2(b)          p3(b)          p4(b)
c     p1(c)          p2(c)          p3(c)          p4(c)
...   ...            ...            ...            ...

The confidence values were initialized as the inverse of the ordering of the classifiers according to their decreasing recognition performance. Thus, the first classifier had an initial confidence of 1 for all classes, the second 1/2 and so on.

6.5 Experiments and their results

Here the DEC-based committee's performance under various combinations of the available options is examined and the benefit of using each option evaluated.
The DEC-based committee is also compared to the reference classifiers in order to see how the performance fares in comparison to other committee methods, adaptive and not.

6.5.1 Description of the data sets

The data used in these experiments was collected on the large-scale system and stored in UNIPEN format as described in Section 4.1. The details of the databases are summarized in Table 6.6.

Table 6.6: Summary of the databases used in the experiments

Database   Subjects   Left-handed   Females   Characters   (a-z, 0-9)
DB1        22         1             1         ~10 400      8461
DB2        8          0             5         ~8 100       4643

Database 1 consists of characters which were written without any visual feedback. The pressure level thresholding of the measured data into pen up and pen down movements was set individually for each writer. The distribution of the classes (a-z, A-Z, å, ä, ö, Å, Ä, Ö, 0-9, (, ), /, +, -, %, $, �, !, ?, :, ., and ,) was somewhat similar to that of the Finnish language.

Database 2 was collected with a program that showed the pen trace on the screen and recognized the characters on-line. The minimum writing pressure for showing the trace of the pen on the screen and detecting pen down movements was the same for all writers. The distribution of the character classes (a-z, A-Z, å, ä, ö, Å, Ä, Ö, and 0-9) was nearly even.

None of the writers of Database 1 appeared in Database 2. Only lower case letters and digits were used in the experiments. Database 1 was used for forming the initial prototype set which consisted of 7 prototypes per class and Database 2 was used as a test set.

6.5.2 Member classifiers

The adaptive committee experiments were performed using a committee consisting of four individual classifiers. All the classifiers are based on DTW calculations using either the point-to-line (PL) or normalized point-to-point (NPP) distances discussed in Sections 4.3.2 and 4.3.1, respectively. All classifiers used the MinMaxScaling operator and either the MassCenter or BoundingBoxCenter operation, which all were described in Section 4.2. The configurations and error rates of the member classifiers are shown in Table 6.7.

In general it can be stated that a committee can be expected to perform better the less the errors made by its members are correlated. Unfortunately, uncorrelatedness is not the case here. As the DTW-based classifier was the only one capable of acceptable recognition performance, all the member classifiers are rather similar.
This can also be seen in Table 6.8, where the occurrence of instances when the members produce the same error is compared with the occurrence of different errors. The columns "First correct" and "Second correct" refer to situations where only the first or second classifier was correct and the other incorrect.

Table 6.7: Recognition error rates of the four committee member classifiers

Number   Distance measure   Bounding box / Mass center   Errors
1        PL                 �                            14.9%
2        NPP                �                            15.1%
3        NPP                �                            18.2%
4        PL                 �                            19.6%

Table 6.8: Pairwise distribution of errors for the member classifiers

Classifiers   Both correct   First correct   Second correct   Same incorrect   Different incorrect
1 and 2       82.0%          3.0%            2.9%             9.7%             2.4%
1 and 3       78.5%          6.6%            3.3%             8.3%             3.3%
1 and 4       78.4%          6.7%            2.0%             10.8%            2.2%
2 and 3       79.3%          5.7%            2.5%             10.4%            2.2%
2 and 4       76.3%          8.7%            4.0%             8.1%             2.9%
3 and 4       77.0%          4.9%            3.4%             11.7%            3.1%

From Table 6.8 it is obvious that for all pair-wise combinations of the classifiers used the occurrence of the same error is much more common than the classifiers producing different errors. As the committee has a very limited chance of finding the correct answer when all the classifiers produce the same error, such a situation is highly unbeneficial for the committee operation.

6.5.3 Results

The results of the runs with the DEC committee for evaluating the performance of various options are shown in Table 6.9. These runs were performed using all combinations of the available options. Some averages of the effects of the different options on overall performance have been collected into Table 6.10. The tail error percentage in the tables corresponds to the error percentage calculated for the last 200 samples for each writer.

As can be seen from Tables 6.9 and 6.10, in general using the best individual classifier as the default rule outperformed majority voting in that role. This might be partially due to the relatively large difference in performance between the best and worst classifier, with error rates of 14.9% and 19.6%, respectively.
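The two error measures reported in the result tables can be illustrated with a small sketch for a single writer. The Python below assumes that the committee decisions are available as a list of booleans in input order; the function name and this interface are assumptions made here, and how the per-writer figures were combined into the table values is not specified by this sketch.

```python
def error_percentages(correct_flags, tail=200):
    """Return (total error %, tail error %) for one writer.

    correct_flags -- booleans in input order, True for a correct decision
    tail          -- number of final samples used for the tail error
    """
    def error_pct(flags):
        return 100.0 * sum(1 for ok in flags if not ok) / len(flags)

    return error_pct(correct_flags), error_pct(correct_flags[-tail:])


# Hypothetical writer whose errors all occur early, before adaptation helps.
total_error, tail_error = error_percentages([False] * 100 + [True] * 300)
```

A tail error below the total error, as in this made-up example, indicates that the errors are concentrated at the beginning of the writer's data, which is the behavior expected from successful adaptation.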

Table 6.9: Results from the DEC parameter variation experiments

Default    Required    2nd-ranking   Conflict    Error        Tail error
decision   inclusion   results       resolving   percentage   percentage
best       -           -             cor         14.8         15.1
best       -           -             c&w         15.1         15.4
best       -           -             ir          15.7         16.6
best       -           vertical      cor         11.1         11.3
best       -           vertical      c&w         11.1         11.3
best       -           vertical      ir          12.2         12.7
best       -           horizontal    cor         13.5         13.4
best       -           horizontal    c&w         13.5         13.4
best       -           horizontal    ir          15.6         16.4
best       yes         -             cor         12.7         13.4
best       yes         -             c&w         12.7         13.4
best       yes         -             ir          12.8         13.6
best       yes         vertical      cor         11.6         11.3
best       yes         vertical      c&w         11.6         11.3
best       yes         vertical      ir          12.3         12.4
best       yes         horizontal    cor         11.7         11.8
best       yes         horizontal    c&w         11.7         11.9
best       yes         horizontal    ir          11.9         12.4
majority   -           -             cor         15.1         15.4
majority   -           -             c&w         15.5         15.8
majority   -           -             ir          16.1         17.0
majority   -           vertical      cor         12.0         12.0
majority   -           vertical      c&w         12.0         12.0
majority   -           vertical      ir          13.0         13.3
majority   -           horizontal    cor         14.3         13.7
majority   -           horizontal    c&w         14.3         13.7
majority   -           horizontal    ir          16.6         17.2
majority   yes         -             cor         12.9         13.5
majority   yes         -             c&w         12.9         13.4
majority   yes         -             ir          12.9         13.5
majority   yes         vertical      cor         12.5         11.9
majority   yes         vertical      c&w         12.5         11.9
majority   yes         vertical      ir          13.0         12.9
majority   yes         horizontal    cor         12.5         12.3
majority   yes         horizontal    c&w         12.5         12.4
majority   yes         horizontal    ir          12.9         13.1

Table 6.10: Estimation of the effect of various individual options alone

Parameter  With        Total error  Tail error
best       vertical    11.7         11.7
best       horizontal  13.0         13.2
best       no 2nd      14.0         14.6
best       in          12.1         12.4
best       no in       13.6         14.0
best       overall     12.8         13.2
majority   vertical    12.5         12.3
majority   horizontal  13.9         13.6
majority   no 2nd      14.2         14.8
majority   in          12.7         12.8
majority   no in       14.3         14.5
majority   overall     13.5         13.6
cor        best        12.6         12.7
cor        majority    13.2         13.1
cor        overall     12.9         12.9
c&w        best        12.6         12.8
c&w        majority    13.3         13.2
c&w        overall     13.0         13.0
ir         best        13.4         14.0
ir         majority    14.1         14.5
ir         overall     13.8         14.3
in         best        12.1         12.4
in         majority    12.7         12.8
in         overall     12.4         12.6
no in      best        13.6         14.0
no in      majority    14.3         14.5
no in      overall     14.0         14.2
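The averaged entries of Table 6.10 appear to be plain means over the matching rows of Table 6.9. The sketch below assumes exactly that and recomputes a few of them from the rounded Table 6.9 error percentages, so the last digit may differ slightly from the tabulated values. The option labels and the helper `average_error` are illustrative names, not part of the original system.

```python
from itertools import product
from statistics import mean

# Error percentages from Table 6.9, in the table's row order:
# default (best, majority) x inclusion (no in, in)
# x 2nd-ranking results (no 2nd, vertical, horizontal)
# x conflict resolving (cor, c&w, ir).
ERRORS = [
    14.8, 15.1, 15.7, 11.1, 11.1, 12.2, 13.5, 13.5, 15.6,  # best, no inclusion
    12.7, 12.7, 12.8, 11.6, 11.6, 12.3, 11.7, 11.7, 11.9,  # best, inclusion
    15.1, 15.5, 16.1, 12.0, 12.0, 13.0, 14.3, 14.3, 16.6,  # majority, no inclusion
    12.9, 12.9, 12.9, 12.5, 12.5, 13.0, 12.5, 12.5, 12.9,  # majority, inclusion
]
KEYS = ("default", "inclusion", "second", "conflict")
OPTIONS = list(product(
    ("best", "majority"),
    ("no in", "in"),
    ("no 2nd", "vertical", "horizontal"),
    ("cor", "c&w", "ir"),
))
ROWS = [dict(zip(KEYS, opts), error=err) for opts, err in zip(OPTIONS, ERRORS)]


def average_error(**fixed):
    """Mean error percentage over all Table 6.9 rows matching the fixed options."""
    selected = [row["error"] for row in ROWS
                if all(row[key] == value for key, value in fixed.items())]
    return mean(selected)


best_in = average_error(default="best", inclusion="in")           # Table 6.10: 12.1
best_no_in = average_error(default="best", inclusion="no in")     # Table 6.10: 13.6
best_vertical = average_error(default="best", second="vertical")  # Table 6.10: 11.7
```

Under this assumption the recomputed means agree with Table 6.10 to within rounding, which supports reading each entry as the average over all runs sharing the two named options.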


Requiring the output symbol to be included in the inputs seems to also improve accuracy somewhat. This option has the benefit of making absurd rules, such as for example "a a a → d", impossible. It can be expected to be beneficial in all cases but for those where the user truly writes a character in a way that results in the same incorrect recognition sequence every time, and the character the incorrect recognition results in causes a different sequence from the classifiers. As such errors are much easier to correct by modifying the prototype set, it would be expected that this inclusion requirement would be even more beneficial when using member classifiers that are also independently adaptive.

Also the use of the second-ranking results seems quite beneficial. In most cases the vertical ordering produces clearly better results than the horizontal. This would also be expected, as the vertical ordering should be the most likely to provide the most probably correct results with classifiers with recognition accuracies of over 50%. When that is the case, it is more likely for the first-ranking result of any classifier to be correct than any second-ranking result.

As for the method of resolving conflicting rules, the difference between using just the correct results (cor) and both correct and wrong results (c&w) for calculation of performance seems to have no significant effect. Using just the correct results is on the average slightly better, but the difference is not that significant. The strategy of inactivating rules after errors (ir), however, performs noticeably worse.

The results of the DEC committee runs are compared to the reference committees in Table 6.11.

Table 6.11: Comparison of DEC with reference classifiers

Combination method            Error %  Tail error %
DEC                           11.1     11.3
MCBL                          13.0     14.3
Adjusting Best                14.5     15.0
Non-adaptive Majority Voting  14.6     15.9
Adjusting Majority Voting     14.7     15.9
As can be seen, the DEC-based committee clearly outperforms all the reference classifiers used.

The adaptation of the recognition error rate for an example writer is shown in Figure 6.2. The error rate has been calculated within a sliding window of 100 characters. The average error rate for the writer was 3.2%, but as can be seen the initial error rate is around 6-7%, and the final level is below 2%.

[Figure 6.2: The evolution of the recognition error rate for one writer from the DEC-based committee. Horizontal axis: number of samples used in adaptation (0 to 500); vertical axis: recognition error percentage (0 to 7).]

Another example of the evolution of the error rate can be seen in Figure 6.3. Also in this case the error rate has been calculated within a sliding window of 100 characters. The average error rate for this writer was less impressive at 18.7%. The initial error rate was near 30%, but as can be seen, the final error rate improved considerably to end up under 15%. The result from the overall best individual classifier is also included and represented by the dotted line. As can be seen, the final error rate for the best non-adaptive member classifier was approximately 30%, which is twice as much as with the adaptive committee.

6.6 Future directions

The next logical stage in the experiments with committee classifiers will be the combining of the adaptive committee with adaptive member classifiers. This will pose some difficulties, as the adaptation in the individual classifiers makes the production of reliable rules for the DEC committee machine much more difficult.

The most noticeable difficulty is naturally that the classifier will not necessarily respond with the same classification result to an identical input after learning. Thus any rule created at an earlier stage may become worthless and possibly even hinder performance. A possible solution would be to keep track of the particular prototypes used to create a
rule and simply delete the rule if the prototypes it is based on are altered or removed. As this could also result in rules incorporating a lower level of context being removed, the result would be a possibility for some inconsistency with the main DEC principle of minimal specificity to resolve conflicts. It might be possible to then generalize the higher-level rule, but as this approach is yet to be tested, nothing can be stated on the effectiveness of such an approach.

[Figure 6.3: The evolution of the recognition error rate for another writer with the DEC-based committee. The result of the DEC committee is depicted with the solid line and the result from the best member classifier with the dotted line. Horizontal axis: number of samples used in adaptation (0 to 500); vertical axis: recognition error percentage (5 to 30).]

Also the inclusion of the adapting best as the default decision rule for the DEC committee has been considered. This again poses slight difficulties as the adaptive best method inherently alters the ordering of the classifiers. Since the DEC rule context is formed based on the ordering, reordering of the classifiers should result in reordering of the contexts for all existing rules. This could naturally be done through storing the full context with each rule and just using the predesignated level of context. Discrepancy might also arise from reordering the context in the form of illogical rules arising, causing inconsistency especially when the inclusion option is in use.

Perhaps the simplest way to combine member classifier adaptation and committee adaptation would be to simply first adapt just the individual classifiers. Thus the actual creation of the adaptive rules would only be started when either a certain accuracy level has been reached or a predefined number of input samples have been processed. Thus
the final level of accuracy of the classifiers could possibly be further enhanced by the addition of rules. The downside of this strategy would be a prolonged learning time, as the rule base can only be formed on classification errors.

Another interesting subject for further research would also be in loosening up the rules in situations where exact matches cannot be found. This might be implemented with some measure of nearness which could be applied in situations where a part of the context is different from an existing rule. If the context were similar enough, the rule could be used anyway. This might be able to produce more accurate results than resorting to the default decision, as is currently being done, to improve performance also in situations not yet encountered.
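The loosened rule matching suggested above has not been implemented or tested, so the following is only a sketch of how it might look. Everything in it is an assumption made for illustration: contexts are taken to be fixed-length tuples of member classifier results, nearness is measured as the fraction of agreeing positions, and the threshold value and all names (`context_similarity`, `decide`, the example rule) are hypothetical.

```python
def context_similarity(context, rule_context):
    """Fraction of positions at which two equally long contexts agree."""
    if not context or len(context) != len(rule_context):
        return 0.0
    matches = sum(a == b for a, b in zip(context, rule_context))
    return matches / len(context)


def decide(context, rules, default, threshold=0.75):
    """Return the committee output for the member results in `context`.

    `rules` maps context tuples to output symbols. An exact match wins;
    otherwise the most similar rule is used if it reaches `threshold`;
    otherwise the default decision is returned.
    """
    context = tuple(context)
    if context in rules:
        return rules[context]
    best_rule, best_score = None, 0.0
    for rule_context in rules:
        score = context_similarity(context, rule_context)
        if score > best_score:
            best_rule, best_score = rule_context, score
    if best_rule is not None and best_score >= threshold:
        return rules[best_rule]
    return default


# Hypothetical rule base for a committee of four member classifiers.
rules = {("a", "d", "a", "d"): "d"}
exact = decide(("a", "d", "a", "d"), rules, default="a")  # exact rule match
near = decide(("a", "d", "o", "d"), rules, default="a")   # 3 of 4 positions agree
far = decide(("a", "o", "o", "a"), rules, default="a")    # 1 of 4 positions agrees
```

In this sketch the threshold controls the trade-off between reusing rules in unseen situations and falling back on the default decision; a suitable value, and whether such matching helps at all, would have to be determined experimentally.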


Chapter 7
Conclusions

Being an ongoing research problem, on-line handwriting recognition is still in need of enhancement to achieve a level of performance sufficient for commercial applications. Adaptation can be seen as the most suitable method of improving recognition performance especially in mobile solutions, where storage limitations hinder or even prohibit the use of a large dictionary, the other effective means of improving performance of algorithms. Also the general deployment of products to the international market makes the dictionary-based approaches even less practical, as the size of a dictionary to cover all languages the device might be used with easily becomes excessively large.

7.1 Palm computing

Pen-based computers in general are not a new concept but have become increasingly popular only recently, mainly with the success of 3Com's Palm series of PDAs. The two main problems are the efficiency, or lack thereof, of handwritten character recognition components and the immaturity of the user interface technology.

But as the commercial sector gains knowledge and interest, or as the public demand for such applications increases, it is highly probable that more applications will emerge. Using natural handwriting is much more convenient for the user than having to learn a special alphabet in order to input writing successfully. As the computational power increases on even the smallest of platforms, more complex and better-performing algorithms should start to surface.


7.2 Handwriting recognition in literature

Handwritten character recognition has been an ongoing research topic for quite some time and has gained wide interest, as can be seen in the sheer number of publications on various aspects of the problem. A very large number of approaches to recognition have been undertaken, and only a few of them could be presented within the scope of the literature survey of this thesis.

There are clearly some basic differences in the recognition methods that enable a rough division into subclasses, as was done in Chapter 3. It seems that HMM and elastic-matching-based algorithms are perhaps the most popular, but also newer ideas, such as critic-driven and fuzzy classification, have gained noticeable interest. Adaptation in the classifiers has also generally been seen as an interesting topic for research and has been widely studied.

7.3 True label deduction

An issue that must not be overlooked in implementing adaptation in a real-life situation is how the true labels of the input samples are deduced. This is vital to the effectiveness of the adaptation, as otherwise the adaptation might easily result in the deterioration of the recognition performance instead of its improvement. As knowledge on the true labels cannot be received from the user explicitly, or the user would have to accept each character separately, the logic that reveals the labels is definitely something that cannot be overlooked.

The approach presented in Section 5.5.2 is merely a starting point and its effectiveness will be seen in future experiments. It is probably impossible to determine the labels with 100% accuracy, as the user cannot be expected to always have the time or interest to correct all recognition mistakes. But due to the criticality of this stage, care must be taken in its implementation, or even the best of adaptation algorithms will eventually succumb to mislearning. Also feasible countermeasures for mislearning must be given consideration. Prototype inactivation is one method capable of dissipating the effects of incorrect learning, but also other methods should be investigated.

7.4 Future views for the palm-top implementation

A notable difficulty with the palm-top platform is the still too slow recognition performance. As the user has to wait for the characters to be recognized, the process of using
the apparatus becomes quite frustrating. Clearly the already implemented speedup methodologies have helped, but the recognition speed simply must be improved on to make the device more usable. But due to the pace of hardware development, the effort put into devising and implementing speedup in software must be carefully considered, as it is merely a matter of time before the complexity causes no problems for palm-top computers available. Already the fastest of the devices covered in Section 2.1, the Casio Cassiopeia E-105, features a processor running at 131 MHz, while the faster of our testing beds has the same processor architecture and a clock speed of 75 MHz. This increase in computational power alone may be enough to speed up recognition to an acceptable level without any further speed-related algorithm optimizations.

The palm-top implementation presented in this thesis is, mainly in terms of its interface, still a data collection and testing application. In the future the development of it to a more realistic application will be necessary in order to see how the performance and adaptation fare when put into a true real-life test. This could be implemented by for example a simple note-taking application, or perhaps even as an additional operating system component that could intercept and forward all data written in the user interface and be used in a variety of tasks.

7.5 Adaptive committee recognition

The performance of the DEC-based committee classifier proved impressive if not astounding. The principle is effective, but still some modifications to further specify the committee's actions to the case of handwritten character recognition may be necessary as constantly better performance is striven for.

The inclusion of adaptive member classifiers into the DEC committee will be a target for further research. If a method enabling concurrent adaptation for the committee and the individual classifiers can be found, the time to adaptation should be significantly reduced to produce a fast-adapting and well-performing complete system.


BibliographyAnnadurai, S. and Balasubramaniam, A. (1996). Classi� ation of handwrittenalphanumeri hara ters: a fuzzy neural approa h, International Conferen e onHigh Performan e Computing, pp. 36�41.Arakawa, H. (1983). On-line re ognition of handwritten hara ters � alphanumeri s,hiragana, katakana, kanji, Pattern Re ognition 16(1): 9�16.ART (2000). Smartwriter data sheet,http://www.artre ognition. om/artre ognition/download/smartwriter.pdf.Bellegarda, E. J., Bellegarda, J. R., Nahamoo, D. and Nathan, K. S. (1994). A faststatisti al mixture algorithm for on-line handwriting re ognition, IEEETransa tions on Pattern Analysis and Ma hine Intelligen e 16(12): 1227�1233.Bellegarda, E. J., Bellegarda, J. R., Nahamoo, D. and Nathan, K. S. (1995). Adis rete parameter HMM approa h to on-line handwriting re ognition,Pro eedings of International Conferen e on A ousti s, Spee h and SignalPro essing, IEEE, pp. 2631�2622.Bou ha�ra, D. and Govindaraju, V. (1999). A methodology for mapping s ores toprobabilities, IEEE Transa tions on Pattern Analysis and Ma hine Intelligen e21(9): 923�927.Brakensiek, A., Kosmala, A. and Willett, D. (1999). Performan e evaluation of a newhybrid modeling te hnique for handwriting re ognition using identi al on-lineand o�-line data, Pro eedings of International Conferen e on Do ument Analysisand Re ognition, pp. 446�449.Bray, J. (1999). Po ket omputers, PC Pro pp. 76�96.Cao, J., Shridhar, M., Kimura, F. and Ahmadi, M. (1992). Statisti al and neural lassi� ation of handwritten numerals: a omparative study, Pro eedings ofInternational Conferen e on Pattern Re ognition Methodology and Systems,pp. 643�646. 111

Chan, K.-F. and Yeung, D.-Y. (1998). Elasti stru tural mat hing for onlinehandwritten alphanumeri hara ter re ognition, Pro eedings of InternationalConferen e on Pattern Re ognition, Vol. 2, pp. 1508�1511.Chang, L. and Ma Kenzie, I. S. (1994). A omparison of two handwriting re ognizersfor pen-based omputers, Pro eedings of Center for Advan ed StudiesConferen e, pp. 364�371.Cheung, K., Yeung, D. and Chin, R. (1998). A bayesian framework for deformablepattern re ognition with appli ation to handwritten hara ter re ognition, IEEETransa tions on Pattern Analysis and Ma hine Intelligen e 20(12): 1382�1388.CIC (2000a). Jot pro 1.01 for palm-size p ,http://www. i . om/software_store/produ t_details/jotpro_pp .asp.CIC (2000b). Jot pro user's guide/frequently asked questions,http://www. i . om/support_ enter/Faq/CE/PPCJot/PPCJotUsersGuide.html.Connell, S. and Jain, N. (1999). Writer adaptation of online handwriting models,Pro eedings of International Conferen e on Do ument Analysis and Re ognition,pp. 434�437.Dru ker, H., Cortes, C., Ja kel, L. D., LeCun, Y. and Vapnik, V. (1994). Boostingand other ensemble methods, Neural Computation 6(6): 1289�1301.Dru ker, H., S hapire, R. and Simard, P. (1993). Boosting performan e in neuralnetworks, International Journal of Pattern Re ognition and Arti� ial Intelligen e7(4): 705�719.Everex (1998). Everex Freestyle Palm-size PC User's Manual, Everex Systems In .Everex (2000). Everex everywhere - te h support & ustomer servi e: F.a.q.:Freestyle[tm℄, http://www.everex. om/support/faq/freestyle_faq/.Filatov, A., Gitis, A. and Kil, I. (1995). Graph-based handwritten digit stringre ognition, Pro eedings of International Conferen e on Do ument Analysis andRe ognition, pp. 845�848.Franke, J. and Mandler, E. (1992). A omparison of two approa hes for ombiningthe votes of ooperating lassi�ers, Pro eedings of the 11th InternationalConferen e on Pattern Re ognition, Vol. II, IAPR, Hague, pp. 611�614.Frankish, C., Hull, R. and Morgan, P. (1995). 
Re ognition a ura y and usera eptan e of pen interfa es, Pro eedings ACM CHI'95 Conferen e on HumanFa tors in Computing Systems. 112

Frosini, G., Lazzerini, B., Maggiore, A. and Mar elloni, F. (1998). A fuzzy lassi� ation based system for handwritten hara ter re ognition, InternationalConferen e on Knowledge-Based Intelligent Ele troni Systems, pp. 61�65.Goldberg, D. and Ri hardson, C. (1993). Tou h-typing with a stylus, Pro eedings ofInternational Conferen e on Human Fa tors in Computing Systems, pp. 80�87.Govindan, V. K. (1990). Chara ter re ognition � a review, Pattern Re ognition23(7): 671�683.Guberman, S. (1998). O�-line and online handwriting re ognition- ommon approa h,IEE European Workshop on Handwriting Analysis and Re ognition, pp. 6/1�6/2.Guerfali, W. and Plamondon, R. (1993). Normalizing and restoring on-linehandwriting, Pattern Re ognition 26(3): 419�430.Guyon, I. and Warwi k, C. (1996). Joint EC-US survey of the state-of-the-art inhuman language te hnology, http://www. se.ogi.edu/CSLU/HLTsurvey.htm.Guyon, I., S homaker, L., Plamondon, R., Liberman, M. and Janet, S. (1994).Unipen proje t of on-line data ex hange and re ognizer ben hmark, Pro eedingsof International Conferen e on Pattern Re ognition, pp. 29�33.Hu, J., Brown, M. K. and Turin, W. (1996). HMM based on-line handwritingre ognition, IEEE Transa tions on Pattern Analysis and Ma hine Intelligen e18(10): 1039�1045.Huang, Y. and Suen, C. (1995). A method of ombining multiple experts for there ognition of un onstrained handwritten numerals, IEEE Transa tions onPattern Analysis and Ma hine Intelligen e 17(1): 90�94.IBM (1991). Method that eliminates maveri k prototypes in online handwritingre ognition, IBM Te hni al Dis losure Bulletin 33(11): 383�384.Kang, H.-J. and Lee, S.-W. (1999). Combining lassi�ers based on minimization of abayes error rate, Pro eedings of International Conferen e on Do ument Analysisand Re ognition, pp. 398�401.Khan, N. and Hegt, H. (1998). 
A �exible and robust mat hing s heme for hara terre ognition to ope with variations in spatial interrelation among stru turalfeatures, International Conferen e on Systems, Man and Cyberneti s, Vol. 5,IEEE, pp. 4166�4171.113

Khotanzad, A. and Chung, C. (1994). Hand written digit re ognition using bks ombination of neural network lassi�ers, Pro eedings of the IEEE SouthwestSymposium on Image Analysis and Interpretation, pp. 94�99.Kittler, J., Hatef, M., Duin, R. and Matas, J. (1998). On ombining lassi�ers, IEEETransa tions on Pattern Analysis and Ma hine Intelligen e 20(3): 226�239.Kohonen, T. (1986). Dynami ally expanding ontext, with appli ations to the orre tion of symbol strings in the re ognition of ontinuous spee h,International Conferen e on Pattern Re ognition, Vol. 2, pp. 1148�1151.Kohonen, T. (1987). Dynami ally expanding ontext, Journal of Intelligent Systems1(1): 79�95.Kohonen, T. (1997). Self-Organizing Maps, Vol. 30 of Springer Series in InformationS ien es, Springer-Verlag. Se ond Extended Edition.Kuklinski, T. T. (1984). Components of handprint style variability, Pro eedings of the7th International Conferen e on Pattern Re ognition, pp. 924�926.Kurimo, M. (1998). Improving vo abulary independent hmm de oding results byusing the dynami ally expanding, Pro eedings of International Conferen e onA ousti s, Spee h and Signal Pro essing, Vol. 2, pp. 883�836.Kurtzberg, J. M. and Tappert, C. C. (1981). Symbol re ognition system by elasti mat hing, IBM Te hni al Dis losure Bulletin 24(6): 2897�2902.Laaksonen, J., Aksela, M., Oja, E. and Kangas, J. (1999). Dynami ally ExpandingContext as ommittee adaptation method in on-line re ognition of handwrittenlatin hara ters, Pro eedings of International Conferen e on Do ument Analysisand Re ognition, pp. 796�799.Laaksonen, J., Hurri, J., Oja, E. and Kangas, J. (1998a). Comparison of adaptivestrategies for on-line hara ter re ognition, Pro eedings of InternationalConferen e on Arti� ial Neural Networks, pp. 245�250.Laaksonen, J., Hurri, J., Oja, E. and Kangas, J. (1998b). 
Experiments with aself-supervised adaptive lassi� ation strategy in on-line re ognition of isolatedhandwritten latin hara ters, Pro eedings of Sixth International Workshop onFrontiers in Handwriting Re ognition, pp. 475�484.Laird, D. (2000). Crusoe pro essor produ ts and te hnology,http://www.transmeta. om/ rusoe/download/pdf/laird.pdf.114

Lam, L. and Suen, C. Y. (1994). A theoreti al analysis of the appli ation of majorityvoting to pattern re ognition, Pro eedings of 12th International Conferen e onPattern Re ognition, Vol. II, IAPR, Jerusalem, pp. 418�420.Lazzerini, B. and Mar elloni, F. (1999). Fuzzy lassi� ation of handwritten hara ters, International Conferen e on North Ameri an Fuzzy Information,pp. 566�570.Liu, C.-L. and Nakagawa, M. (1999). Prototype learning algorithms for nearestneighbor lassi�er with appli ation to handwritten hara ter re ognition,Pro eedings of International Conferen e on Do ument Analysis and Re ognition,pp. 378�381.Loy, W. and Landay, L. (1982). An on-line pro edure for re ognition of handprintedalphanumeri hara ters, Pattern Re ognition 4(4): 422�427.Lu, P.-Y. and Brodersen, R. W. (1984). Real-time on-line symbol re ognition using aDTW pro essor, Pro eedings of International Conferen e on PatternRe ognition, Vol. 2, pp. 1281�1283.Ma Kenzie, I. S., Nonne ke, B., Riddersma, S., M Queen, C. and Meltz, M. (1994).Alphanumeri entry on pen-based omputers, International Journal ofHuman-Computer Studies 41: 775�792.Mandler, E., Oed, R. and Doster, W. (1985). Experiments in on-line s riptre ognition, Pro eedings of S andinavian Conferen e on Image Analysis,pp. 75�86.Mel, M. W., Omohundro, S. M., Robinson, A. D., Skiena, S. S., Thearling, K. H.,Young, L. T. and Wolfram, S. (1988). Tablet: personal omputer in the year2000, Communi ations of the ACM 31(6): 639�648.Meyer, A. (1995). Pen omputing, a te hnology overview and a vision, ACM SIGCHIbulletin.Miller, D. and Yan, L. (1999a). Ensemble lassi� ation by riti -driven ombining,Pro eedings of International Conferen e on A ousti s, Spee h and SignalPro essing, Vol. 2, IEEE, pp. 1029�1032.Miller, D. and Yan, L. (1999b). Some analyti al results on riti -driven ensemble lassi� ation, Pro eedings of Neural Networks for Signal Pro essing, pp. 252�263.Mozayyani, N., Baig, A. and Vau her, G. (1998). 
A fully-neural solution for on-linehandwritten hara ter re ognition, Pro eedings of International Joint Conferen eon Neural Networks, Vol. 2, pp. 160�164.115

Nadal, C. (1998). Cenparmi home page,http://www. enparmi. on ordia. a/index.html.Narayanaswamy, S., Hu, J. and Kashi, R. (1999). User interfa e for a p s smartphone, International Conferen e on Multimedia Computing and Systems, Vol. 1,IEEE, pp. 777�781.Nathan, K., Bellegarda, J. R., Nahamoo, D. and Bellegarda, E. J. (1993). On-linehandwriting re ognition using ontinuous parameter hidden markov models,Pro eedings of International Conferen e on A ousti s, Spee h and SignalPro essing, Vol. 5, pp. 121�124.Neisser, U. and Weene, P. (1960). A note on human re ognition of hand-printed hara ters, Information and Control (3): 191�196.Nouboud, F. and Plamondon, R. (1990). On-line re ognition of handprinted hara ters: survey and beta tests, Pattern Re ognition 23(9): 1031�1044.Nouboud, F. and Plamondon, R. (1991). A stru tural approa h to on-line hara terre ognition: System design and appli ations, International Journal of PatternRe ognition and Arti� ial Intelligen e 5(1&2): 311�335.Paik, J., bae Cho, S., Lee, K. and Lee, Y. (1996). Multiple re ognizers system usingtwo-stage ombination, Pro eedings of International Conferen e on PatternRe ognition, IEEE, pp. 581�585.Park, K.-Y. and Lee, S.-Y. (1999). Sele tive attention for robust spee h re ognition innoisy environments, Pro eedings of International Joint Conferen e on NeuralNetworks 1999, Vol. 5, pp. 3014�3019.Philips (1998). Nino 300 Owner's Guide, Philips Ele torni s North Ameri aCorporation.Qian, G. (1999). An engine for ursive handwriting interpretation, Pro eedings ofInternational Conferen e on Tools with Arti� ial Intelligen e, pp. 271�278.Rabiner, L. R. (1989). A tutorial on hidden markov models and sele ted appli ationsin spee h re ognition, Pro eedings of International Conferen e on A ousti s,Spee h and Signal Pro essing, pp. 267�295.Rabinowitz, S. and Wagon, S. (1995). A spigot algorithm for the digits of �,Ameri an Mathemati al Monthly 102: 195�203.116

Rahman, A. and Fairhurst, M. (1996). Re ognition of handwritten hara ters with amulti-expert system, IEE Workshop on Handwriting Analysis and Re ognition -A European Perspe tive, pp. 6/1�6/4.Rahman, A. and Fairhurst, M. (1997a). A omparative study of de ision ombinationstrategies for a novel multiple-expert lassi�er, International Conferen e onImage Pro essing and Its Appli ations, Vol. 1, pp. 131�135.Rahman, A. and Fairhurst, M. (1997b). Generalised approa h to the re ognition ofstru turally similar handwritten hara ters using multiple expert lassi�ers, IEEPro eedings on Vision, Image and Signal Pro essing 144(1): 15�22.Rahman, A. and Fairhurst, M. (1997 ). Introdu ing new multiple expert de ision ombination topologies: a ase study using re ognition of handwritten hara ters, Pro eedings of International Conferen e on Do ument Analysis andRe ognition, Vol. 2, IEEE, pp. 886�891.Råde, L. and Westergren, B. (1990). BETA Mathemati s Handbook for S ien e andEngineering, 2nd edition, Chartwell-Bratt Ltd.Rigoll, G., Kosmala, A., Rottland, J. and Neukir hen, C. (1996). A omparisonbetween ontinuous and dis rete density hidden markov models for ursivehandwriting re ognition, Pro eedings of International Conferen e on PatternRe ognition, Vol. 2, IEEE, pp. 205�209.Russell, S. J. and Norvig, P. (1995). Arti� ial Intelligen e: A Modern Approa h,Prenti e Hall.Sanko�, D. and Kruskal, J. B. (1983). Time warps, string edits, and ma romole ules:the theory and pra ti e of sequen e omparison, Addison-Wesley.S halko�, R. (1992). Pattern re ognition: statisti al, stru tural and neuralapproa hes, John Wiley & Sons, In .S homaker, L. (1994). User-intera e aspe ts in re ognizing onne ted- ursivehandwriting, IEE Colloquium on Handwriting and Pen-based input.S homaker, L. (1998). From handwriting analysis to pen- omputer appli ations,Ele tri s & Communi ation Engineering Journal pp. 93�101.S homaker, L., Abbink, G. and Selen, S. (1994). 
Writer and writing-style lassi� ation in the re ognition of online handwriting, IEE Colloquium (Digest)Pro eedings of the European Workshop on Handwriting Analysis and Re ognition:European Perspe tive. 117

Sherkat, N., Whitrow, R. and Evans, R. (1999). Wholisti re ognition of handwritingusing stru tural features, IEE Colloquium on Do ument Image Pro essing andMultimedia, pp. 12/1�12/4.Srihari, R. K. (1997). Cedar on-line handwriting database,http://www. edar.bu�alo.edu/Linguisti s/database.html.Srikantan, G., Lee, D.-S. and Favata, J. (1995). Comparison of normalizationmethods for hara ter re ognition, Pro eedings of International Conferen e onDo ument Analysis and Re ognition, Vol. 2, pp. 719�722.Srinivasan, S. and Ramakrishnan, K. (1999). The independent omponents of hara ters are 'strokes', Pro eedings of International Conferen e on Do umentAnalysis and Re ognition, pp. 414�417.Starner, T., J. Makhoul, R. S. and Chou, G. (1994). On-line ursive handwritingre ognition using spee h re ognition methods, Pro eedings of InternationalConferen e on A ousti s, Spee h, and Signal Pro essing, Vol. 5, IEEE,pp. 125�128.Suen, C., Nadal, C., Legault, R., Mai, T. and Lam, L. (1992). Computer re ognitionof un onstrained handwritten numerals, Pro eedings of the IEEE, Vol. 7, IEEE,pp. 1162�1180.Tan, A.-H. and Teow, L.-N. (1995). Learning by supervised lustering and mat hing,Pro eedings of International Conferen e on Neural Networks, Vol. 1, pp. 242�246.Tanaka, H., Nakajima, K., Ishigaki, K., Akiyama, K. and Nakagawa, M. (1999).Hybrid pen-input hara ter re ognition system based on integration ofonline-o�ine re ognition, Pro eedings of International Conferen e on Do umentAnalysis and Re ognition, pp. 209�212.Tappert, C. C. (1984). Adaptive on-line handwriting re ognition, Pro eedings ofInternational Conferen e on Pattern Re ognition, IEEE, pp. 1004�1007.Tappert, C., Suen, C. Y. and Wakahara, T. (1990). The state of the art in on-linehandwriting re ognition, IEEE Transa tions on Pattern Analysis and Ma hineIntelligen e 12(8): 787�808.Teow, L.-N. and Tan, A.-H. (1995). Adaptive integration of multiple experts,Pro eedings of International Conferen e on Neural Networks, Vol. 3,pp. 
1215�1220. 118

Torkkola, K. (1993). An e� ient way to learn english grapheme-to-phoneme rulesautomati ally, Pro eedings of International Conferen e on A ousti s, Spee h andSignal Pro essing, Vol. 2, pp. 199�202.Torkkola, K. and Kohonen, T. (1988). Corre tion of quasiphoneme strings by thedynami ally expanding ontext, International Conferen e on PatternRe ognition, Vol. 1, pp. 487�489.Vuori, V. (1999). Adaptation in on-line re ognition of handwriting, Master's thesis,Helsinki University of Te hnology.Vuori, V., Laaksonen, J., Oja, E. and Kangas, J. (1999). On-line adaptation inre ognition of handwritten alphanumeri hara ters, Pro eedings of InternationalConferen e on Do ument Analysis and Re ognition, pp. 792�795.Wang, P. S.-P. and Gupta, A. (1991). An improved stru tural approa h forautomated re ognition of handprinted hara ter, International Journal ofPattern Re ognition and Arti� ial Intelligen e 5(1&2): 97�121.Ward, J. R. and Blesser, B. (1985). Intera tive re ognition of handprinted hara tersfor omputer input, IEEE Computer Graphi s and Appli ations 9(5): 24�37.Ward, J. R. and Kuklinski, T. (1988). A model for variability e�e ts in handprintingwith impli ations for design of handwriting hara ter re ognition systems, IEEETransa tions on Systems, Man, and Cyberneti s 18(3): 438�451.Xu, L. and Jordan, M. I. (1993). EM learning on a generalized �nite mixture modelfor ombining multiple lassi�ers, Pro eedings of the World Congress on NeuralNetworks, Vol. IV, pp. 227�230.Xu, L., Jordan, M. I. and Hinton, G. E. (1995). An alternative model for mixtures ofexperts, in G. Tesauro, D. S. Touretzky and T. K. Leen (eds), Advan es inNeural Information Pro essing Systems 7, MIT Press, Cambridge, MA,pp. 633�640.Xu, L., Krzyzak, A. and Suen, C. (1992). Methods of ombining multiple lassi�ersand their appli ations to handwriting re ognition, IEEE Transa tions onSystems, Man and Cyberneti s 22(3): 418�435.Yuen, H. (1996). 
A hain oding approa h for real-time re ognition of on-linehandwritten hara ters, Pro eedings of International Conferen e on A ousti s,Spee h, and Signal Pro essing, Vol. 6, IEEE, pp. 3426�3429.119

Zhang, B., Fu, M., Yan, H. and Jabri, M. (1999). Handwritten digit re ognition byadaptive-subspa e self-organizing map (assom), IEEE Transa tions on NeuralNetworks 10(4): 939�945.

120
