Gianmaria Silvello
Information Management Research Group (IMS), Department of Information Engineering
University of Padua, Italy

Boosted Decision Trees for Word Recognition in Handwritten Document Retrieval
Howe, N.R., Rath, T.M. and Manmatha, R.
Department of Computer Science, University of Massachusetts
SIGIR 2005, published by ACM, New York

Applied Functional Analysis, 5 February 2009, Padova, Italy
Outline

• Introduction to recognition and retrieval of handwritten documents
• Classification algorithms: AdaBoost and decision trees
• Classification experiments
• Language models for retrieval
• Conclusions
Introduction

• Recognition and retrieval of off-line handwritten documents based upon word classification
• Decision trees with normalized pixels as features form the basis for AdaBoost
• Problem: skewed distribution of class frequencies
• Experiments are done on the GW20 and GW100 corpora
• Retrieval is done using a language model over the recognized words
• The main goal is to offer access to the world's historical handwritten documents
• Handwriting recognition often works on limited vocabularies (e.g., postal addresses)
• Historical documents add complexity due to ink bleeding and dirt on the paper
• Pixels of the normalized word image at multiple scales (image pyramids) are used as features
• An innovative procedure is proposed to create additional training data
The Boosting Approach

• Boosting is a classification technique that determines its prediction via the weighted vote of a diverse set of base classifiers, each of which has been trained on a different weighting of the training data
• AdaBoost trains successive versions of its base classifier, focusing on hard-to-classify examples
• It can use a simple base classifier, but stronger classifiers yield better results
AdaBoost in brief

• Introduced in 1995 by Freund and Schapire in "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, 55(1):119-139, 1997.
• The algorithm (binary case; reconstructed below as a code sketch):
  • all weights are initially set equally
  • at each round t, find a weak hypothesis ht appropriate for the distribution Dt
  • the error εt measures the goodness of the hypothesis
  • AdaBoost chooses the parameter αt, which measures the importance assigned to ht; αt ≥ 0 whenever εt ≤ 1/2
  • Dt is updated → the weights of misclassified examples are increased → the algorithm concentrates on hard examples
  • the final hypothesis H is a weighted majority vote over the T weak hypotheses, where αt is the weight assigned to ht

Reference: Freund, Y. and Schapire, R. E. "A Short Introduction to Boosting", Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, 1999.
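The algorithm box shown on this slide did not survive extraction. As a stand-in, here is a minimal Python sketch of the binary AdaBoost loop described above, assuming a depth-limited scikit-learn decision tree as the weak learner (names and hyperparameters are illustrative, not taken from the paper):

```python
# Minimal binary AdaBoost sketch (labels in {-1, +1}); illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=200):
    y = np.asarray(y)
    n = len(y)
    D = np.full(n, 1.0 / n)                      # all weights set equally
    hypotheses, alphas = [], []
    for t in range(T):
        h = DecisionTreeClassifier(max_depth=3)  # weak learner (assumption)
        h.fit(X, y, sample_weight=D)             # h_t fit to distribution D_t
        pred = h.predict(X)
        eps = D[pred != y].sum()                 # weighted error eps_t
        if eps >= 0.5:                           # must beat chance, else stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)    # alpha_t >= 0 iff eps_t <= 1/2
        D *= np.exp(-alpha * y * pred)           # raise weights of mistakes
        D /= D.sum()                             # normalize to obtain D_{t+1}
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def adaboost_predict(hypotheses, alphas, X):
    # H: weighted majority vote of the weak hypotheses, weighted by alpha_t
    return np.sign(sum(a * h.predict(X) for h, a in zip(hypotheses, alphas)))
```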
AdaBoost in brief: extensions

• Schapire, R. E. and Singer, Y. "Improved boosting algorithms using confidence-rated predictions", Machine Learning, 37(3):297-336, 1999 shows how AdaBoost can handle weak hypotheses that output real values.
• For an input x, ht outputs a value ht(x) ∈ R whose sign is the predicted label (-1 or +1) and whose magnitude |ht(x)| gives the measure of confidence in the prediction.
• AdaBoost.M1 is the extension to the multi-class case → it is adequate only when the weak learner is strong enough to achieve an accuracy of at least 50%.
• Further extensions, AdaBoost.MH and AdaBoost.MR → reduce the multi-class problem to a larger binary one.
Choices and Problems

• The recognition process uses values sampled directly from the word image at varying resolutions
• The choice is to segment word images rather than letters:
  • recognizing letters becomes a limiting step
  • segmentation of individual word images is easier (an image classification problem)
• Skewed distribution of class frequencies (Zipfian distribution) and paucity of training data for most word classes
Classification Algorithm

• Handwritten words belonging to a single class have similar (but not identical) ink distributions
• The position of individual features within the word shifts from example to example
• The pixel representation contains information about word identity that can be amplified by boosting:
  • clearer areas contain more reliable features
  • blurring indicates areas of inconsistency

[Figure: composite image of 21 examples of the word "Instructions". Straightforward use of the pixels is ineffective.]
Common framework

• Pixels are used as features for word image classification
• Each word image is mapped onto a common pixel grid
• Images are scaled and translated so that the baseline spans (0,0) to (1,0) → resampling each image onto this common grid produces a common pixel representation (a toy coordinate sketch follows)
• Long and short words differ greatly in horizontal and vertical dimensions → storing full-resolution grids for all of them would lead to astronomical data sizes
• Solution: a pyramid approach
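As a toy illustration of the scale-and-translate step, a coordinate sketch assuming the baseline endpoints of a segmented word are known and the baseline is horizontal (uniform scaling, no rotation; all names are hypothetical):

```python
# Map image coordinates so the word's baseline spans (0,0) -> (1,0).
import numpy as np

def to_common_frame(points, baseline_start, baseline_end):
    start = np.asarray(baseline_start, dtype=float)
    end = np.asarray(baseline_end, dtype=float)
    scale = np.linalg.norm(end - start)   # baseline length in pixels
    # translate the origin to the baseline start, then scale uniformly
    return (np.asarray(points, dtype=float) - start) / scale
```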
Pyramid Approach

• Define a family of standard grids → the base grid Φ0 covers the square [0,1] × [-0.5, 0.5], broken into 32×32 px; this square area captures all the detail of interest for most words
• Refined grids cover the same square region at double the resolution (64×64 px, 128×128 px, ...)
• The grids form a tree in which each cell of Φk has 4 children in Φk+1 (see the sketch after this list)
✓ The standardized image usually doesn't cover the full vertical extent of the grid → portions above and below the edges of the standardized image may be represented using a single default value
✓ Data need only be stored for Φk with resolution up to that of the reference image
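A small sketch of how such a pyramid might be materialized, assuming the word image has already been normalized to the square frame and is available as a PIL image (resampling choices and names are assumptions, not the paper's code):

```python
# Build [Phi_0, Phi_1, ...] as 32x32, 64x64, ... pixel grids over the same square.
import numpy as np
from PIL import Image

def build_pyramid(word_image, levels=3):
    pyramid = []
    for k in range(levels):
        size = 32 * (2 ** k)                   # Phi_k resolution doubles per level
        img = word_image.resize((size, size), Image.BILINEAR)
        pyramid.append(np.asarray(img, dtype=np.float32) / 255.0)
    return pyramid

def children(row, col):
    # the 4 children in Phi_(k+1) of cell (row, col) of Phi_k
    return [(2 * row + dr, 2 * col + dc) for dr in (0, 1) for dc in (0, 1)]
```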
Boosting and Decision Trees

• Word image recognition has many potential classes → to use AdaBoost, a base classifier with at least 50% accuracy is needed
• Decision trees are the foremost option:
  • well understood
  • achieve arbitrary accuracy on the training data in practice
• At each node, the training examples are split into 2 subgroups by comparing the value of a chosen pixel in each example to a chosen threshold
• Growth of a tree branch is stopped when the contained subset is dominated by a majority class
• If growth continues until each leaf holds a single training example, 100% training accuracy is reached → the tree overfits and must be pruned by removing statistically weak branches
C4.5

• C4.5 provides the algorithm for building the decision tree, with some modifications designed to support the grid pyramid data structure
• C4.5 builds decision trees from a set of training data using the concept of information entropy
• It exploits the fact that each attribute of the data can be used to make a decision that splits the data into smaller subsets

The training data is a set S = {s1, s2, ..., sn} of already classified samples, where each sample si = (x1, x2, ..., xm) is a vector of features xj. The training data is augmented with a vector C = (c1, c2, ..., cn), where ci is the class that sample si belongs to (the information-gain computation is sketched below).

Reference: Quinlan, J. R. "C4.5: Programs for Machine Learning". Morgan Kaufmann, 1993.
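A small sketch of the entropy-based criterion, for a single pixel feature and threshold (this shows information gain only, not the full C4.5 algorithm):

```python
# Information gain of splitting on pixel_value <= threshold.
import numpy as np

def entropy(labels):
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(pixel_values, labels, threshold):
    left = labels[pixel_values <= threshold]
    right = labels[pixel_values > threshold]
    n = len(labels)
    children_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children_entropy
```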
C4.5 for the image pyramid

• At each node, a feature (i.e., a pixel location) and a threshold value must be chosen as the split criterion → an exhaustive search is not possible
• Only Φ0 is exhaustively examined → the location and threshold offering the greatest information gain are retained
• The search then proceeds selectively to the children of the best location in Φ1, from there to the children of the best of those locations, and so on until the maximum available resolution is reached
• The grid level, location, and threshold with the highest information gain become the decision criterion for the node (see the sketch below)
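A sketch of the selective coarse-to-fine descent, reusing information_gain and children from the previous sketches; the candidate thresholds and the greedy descent rule are assumptions:

```python
# samples: list of pyramids (one per training example); labels: their classes.
import numpy as np

def best_split(samples, labels, levels, thresholds=(0.25, 0.5, 0.75)):
    labels = np.asarray(labels)

    def evaluate(k, r, c):
        values = np.array([s[k][r, c] for s in samples])  # pixel (r, c) of Phi_k
        return max((information_gain(values, labels, t), t, k, r, c)
                   for t in thresholds)

    # exhaustively examine the 32x32 base grid Phi_0
    best = max(evaluate(0, r, c) for r in range(32) for c in range(32))
    current = best
    for k in range(1, levels):
        _, _, _, r, c = current                # best cell found at level k-1
        current = max(evaluate(k, cr, cc) for cr, cc in children(r, c))
        best = max(best, current)              # keep the overall winner
    return best  # (gain, threshold, level, row, col) = the node's criterion
```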
Boosting

• Single trees do not generalize well for handwritten word images
(1) The base classifier is initially generated from the training data
(2) AdaBoost raises the weights of misclassified examples → forcing the base classifier to work harder on them
(3) After many rounds of boosting, the weighted vote classifies the training set perfectly and shows good generalization to unseen examples
(4) In practice, after a certain number of rounds (here: 200) the results do not improve significantly
Supplementary Training Examples

• Problem: the paucity of training examples for many classes makes generalization difficult
• Zipf's law → few examples for most words
  • 57% of the words appear only once in the test collection
• Solution: generate new training examples for low-frequency classes via stochastic distortion of the available examples
  • improves overall word classification accuracy
• Sample from the original image using a grid of points whose positions have been perturbed from a uniform lattice
• Nearby points should be perturbed by similar amounts
• The new image is a distortion of the old one (a sketch follows)
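A minimal sketch of one way to generate such a distorted copy: random displacements are smoothed with a Gaussian filter so that nearby lattice points move by similar amounts (parameters and names are illustrative assumptions, not the paper's procedure):

```python
# Resample a grayscale word image at a smoothly perturbed grid of positions.
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def distort(image, strength=2.0, smoothness=8.0, rng=None):
    rng = rng or np.random.default_rng()
    rows, cols = np.indices(image.shape)
    # smooth random displacement fields: nearby points shift together
    dr = gaussian_filter(rng.standard_normal(image.shape), smoothness) * strength
    dc = gaussian_filter(rng.standard_normal(image.shape), smoothness) * strength
    # sample the original image at the perturbed positions
    return map_coordinates(image, [rows + dr, cols + dc], order=1, mode='nearest')
```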
Classification Experiments

• Test collections: GW20 (previously used) and GW100 (non-overlapping with GW20)
  • written by multiple hands
  • manually segmented to extract images of individual words (4,856 in GW20 and 21,324 in GW100)
  • all images labeled with their ASCII equivalent

GW20 experiments (19 pages for training, 1 for testing) compare three configurations:
① Single decision tree → standard C4.5 grown to completion, then pruned
② AdaBoost + decision tree as base learner
③ AdaBoost + decision tree + synthetic data

No experiments were run with AdaBoost and a simple base classifier, because 50% accuracy cannot be achieved.

GW100: lower performance, due to more out-of-vocabulary (OOV) words and lower image quality.
Retrieval

• Language modeling approach to retrieval
  • Reference: Ponte, J. and Croft, W.B. "A language modeling approach to Information Retrieval", SIGIR 1998, 275-281
• Uses the query-likelihood formulation, where documents are ranked according to P(Q|D) (a toy scoring sketch follows)
• AdaBoost provides classifications rather than probabilities → only the most likely label for each word image is preserved
• One approach would be to set the term probabilities equal to their frequencies in each recognized document → but many words may be misclassified
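A toy sketch of query-likelihood scoring over the recognized words, with simple linear smoothing against the collection model (the smoothing choice is an assumption; this is not the Lemur implementation):

```python
# Rank documents by log P(Q|D) with Jelinek-Mercer smoothing.
import math
from collections import Counter

def query_likelihood(query_terms, doc_terms, collection_terms, lam=0.8):
    doc, coll = Counter(doc_terms), Counter(collection_terms)
    score = 0.0
    for term in query_terms:
        p_doc = doc[term] / max(len(doc_terms), 1)     # document model
        p_coll = coll[term] / max(len(collection_terms), 1)  # background model
        score += math.log(lam * p_doc + (1 - lam) * p_coll + 1e-12)
    return score  # higher = better match; rank documents by descending score
```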
Retrieval: Regularization Scheme

• A regularization scheme based upon classification rank information
• Hypothesis: rank information may be more important than the actual probabilities → the top-ranked term is very important, the next few moderately important, and so on
• Probabilities are inferred from the rank-ordered output of the AdaBoost classification → the top n classes are ranked according to their scores
• A probability is associated with each class by fitting a Zipfian distribution to the class ranks (a sketch follows)
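A sketch of one way to realize this: assign the top-n ranked classes probabilities proportional to 1/rank, normalized over the retained classes (the exponent and cutoff are assumptions, not the paper's fit):

```python
# Zipf-like probabilities over the rank-ordered class labels for one word image.
def zipf_probabilities(ranked_classes, s=1.0):
    weights = [1.0 / (r ** s) for r in range(1, len(ranked_classes) + 1)]
    z = sum(weights)
    return {c: w / z for c, w in zip(ranked_classes, weights)}

# e.g. zipf_probabilities(["the", "he", "she"])
#   -> {"the": 0.545, "he": 0.273, "she": 0.182}
```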
Retrieval: Regularization Scheme (cont.)

• Instead of one possible word at each position, a document now contains a probability distribution at each position
• Tested on Lemur with the query-likelihood ranking method
• Because of the limited size of GW20, line retrieval is performed:
  • relevant = a line containing all query terms
  • stop words removed
• GW100 allows for full-page retrieval, with GW20 as training examples
Conclusions

• Learning algorithms are typically not designed to deal with training data that exhibits a highly skewed distribution of class frequencies
• The described methodology does not always work well, because the synthetic training data are not truly independent of the originals
• Performance is good on GW20
• GW100 remains challenging → larger data set, more noise ⇒ using soft classification decisions can improve the results for shorter queries