Signal Processing and Analysis: Image Analysis III
Benny Thörnberg
Associate Professor in Electronics
Copyright (c) Benny Thörnberg
Outline
• Shape and texture of objects
• Computation of image gradients
• EHD – Edge Histogram Descriptor
• HOG – Histogram of Oriented Gradients
• Optical Character Recognition (OCR)
• Fundamental steps of OCR
• Training sets
• Minimum distance classifier
• Extension of the feature vector to improve OCR
• Summary of performance for OCR
• Principal Component Analysis
Shape and texture of objects
Is there a method to compute a descriptor that is compact enough, yet still provides enough information for a computerized classifier to identify objects in pictures?
The shape and texture of objects seem to capture enough information for a human to distinguish between different kinds of objects present in an image.
Computation of image gradients
Convolving image I with the Sobel matrices gives the gradient vector:

G_X = [ -1 0 1 ; -2 0 2 ; -1 0 1 ] * I
G_Y = [ -1 -2 -1 ; 0 0 0 ; 1 2 1 ] * I
G = (G_X, G_Y)

Gradient magnitude and orientation (angle):

mag(G) = |G| = sqrt(G_X² + G_Y²)
θ = cos⁻¹( G_X / mag(G) )

Approximation used for the Sobel operator:

|G| ≈ |G_X| + |G_Y|
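The formulas above can be sketched in code. This is a minimal numpy-only version: the 3x3 convolution is written out by hand (a "valid" convolution, so the output is 2 pixels smaller in each direction), and `arctan2` is used for the angle instead of the slide's arccos expression, since it avoids a division by zero where the magnitude vanishes; both give the gradient orientation. Function names are illustrative, not from any library.

```python
import numpy as np

SOBEL_GX = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]], dtype=float)
SOBEL_GY = np.array([[-1, -2, -1],
                     [ 0,  0,  0],
                     [ 1,  2,  1]], dtype=float)

def convolve3x3(image, kernel):
    """Minimal 'valid' 3x3 convolution (kernel flipped, per the definition)."""
    k = np.flipud(np.fliplr(kernel))
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(image[i:i + 3, j:j + 3] * k)
    return out

def sobel_gradients(image):
    """Gradient magnitude and orientation from the Sobel operator."""
    gx = convolve3x3(image, SOBEL_GX)
    gy = convolve3x3(image, SOBEL_GY)
    mag = np.sqrt(gx**2 + gy**2)             # mag(G) = sqrt(Gx^2 + Gy^2)
    mag_approx = np.abs(gx) + np.abs(gy)     # cheap approximation |Gx| + |Gy|
    theta = np.degrees(np.arctan2(gy, gx))   # orientation angle in degrees
    return mag, mag_approx, theta
```

Note that |G_X| + |G_Y| always upper-bounds the true magnitude, which is why it is acceptable as a cheap hardware-friendly approximation.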
EHD – Edge Histogram Descriptor
If the statistical distribution of all gradient vectors in a neighborhood is collected into a histogram, we have created a descriptor that captures the local salient texture in an image. But how do we create a descriptor that also captures the shape of an object, such as the woman in the picture?
EHD – Edge Histogram Descriptor
[Figure: an image divided into 32 numbered blocks (4 columns by 8 rows); the per-block histograms are appended into one vector indexed 1–32.]
Divide the image into sub-image blocks. Compute a histogram for each block and append all histograms into one large vector. A spatial coding is thus created such that the combined histograms can capture information about the globally salient shapes of objects.
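The block-and-append procedure can be sketched as below. This is an illustrative construction, not the exact MPEG-7 EHD: the grid size, bin count, and magnitude threshold are assumed parameters, and the histogram is taken over gradient angles of pixels whose gradient magnitude exceeds the threshold.

```python
import numpy as np

def edge_histogram_descriptor(theta, mag, grid=(4, 8), bins=5, thresh=1.0):
    """Divide the gradient-angle image into grid blocks; in each block,
    histogram the orientations of pixels whose gradient magnitude
    exceeds a threshold, then append all block histograms."""
    rows, cols = theta.shape
    bh, bw = rows // grid[0], cols // grid[1]
    descriptor = []
    for r in range(grid[0]):
        for c in range(grid[1]):
            block_theta = theta[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            block_mag = mag[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            edges = block_theta[block_mag > thresh]    # salient pixels only
            hist, _ = np.histogram(edges, bins=bins, range=(0.0, 180.0))
            descriptor.append(hist)
    return np.concatenate(descriptor)  # length = grid[0] * grid[1] * bins
```

The concatenation order fixes the spatial coding: bin i of block j always lands at the same position of the descriptor, which is what lets a classifier relate histogram entries to image locations.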
EHD – Edge Histogram Descriptor
[Figure: the combined edge histogram (bins 1–32) is fed to a classifier that outputs the type of object (human, giraffe, bird, …). Large datasets of images are used for training.]
L. Touil, A.B. Abdelali and M. Abdelatif, “A hardware acceleration of real time video processing”, Proc. of 16th IEEE Mediterranean Electrotechnical Conference, 28 March 2012, Yasmine Hammamet, Tunisia.
H. Ayad, S.N.H.S. Abdulah and A. Abdullah, “Visual Object Categorization based on Orientation Descriptor”, Proc. of 6th Asia Modelling Symposium (AMS 2012), 29-31 May 2012, Bali, Indonesia.
Histograms of Oriented Gradients - HOG
Reference: Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”
[Figure: detection window divided into cells, grouped into overlapping blocks.]
Pipeline: input image → normalize gamma and colour → compute gradient vectors → weighted vote into histograms of gradient orientations, one histogram for each cell → contrast-normalize cells within overlapping blocks → collect HOGs into a descriptor vector for the whole detection window → linear classifier (SVM) → human detected / no human
Histograms of Oriented Gradients - HOG
Reference: Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”
v = V_k / ||V_k||₂

Vector v is a normalized block histogram, computed from the histograms V_k built from all cell histograms belonging to a single block.
Finally, a large feature vector is built by appending all normalized block histograms into one long feature vector for the whole detection window.
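The block normalization above can be written as a one-liner. A small epsilon is added inside the square root, as in the Dalal–Triggs L2 normalization scheme, to guard against division by zero for empty blocks; the function name is illustrative.

```python
import numpy as np

def normalize_block(cell_hists, eps=1e-6):
    """L2-normalize the concatenated cell histograms of one block:
    v = V_k / ||V_k||_2 (eps guards against division by zero)."""
    v = np.concatenate(cell_hists).astype(float)
    return v / np.sqrt(np.sum(v**2) + eps**2)
```

Because each cell belongs to several overlapping blocks, every cell histogram appears several times in the final descriptor, each time normalized against a different local neighbourhood.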
Pipeline (as on the previous slide): input image → normalize gamma and colour → compute gradient vectors → weighted vote into per-cell orientation histograms → contrast-normalize cells within overlapping blocks → collect HOGs into a descriptor vector for the whole detection window → linear classifier (SVM) → human detected / no human
Cell = 5x5 pixels
Block = 3x3 cells
Detection window = 128x64 pixels
Histograms of Oriented Gradients - HOG
Examples from a training set of 924 images made available at MIT (Massachusetts Institute of Technology).
Ref: http://cbcl.mit.edu/software-datasets/PedestrianData.html
This “large” collection of images can be used to train a classifier to recognise pedestrians in images based on computed image descriptors. It provides a test bench for researchers to compare the performance of different methods for pedestrian detection.
Histograms of Oriented Gradients - HOG
Reference: Navneet Dalal and Bill Triggs, “Histograms of Oriented Gradients for Human Detection”
Dalal and Triggs evaluated performance and achieved a miss rate of 10% at 10⁻⁴ False Positives Per Window (FPPW).
Cell size was 4x4 pixels, block size was 2x2 cells, block stride 8 pixels and detection window 64x128 pixels.
Voting used 9 bins per local HOG, with weights linearly proportional to gradient strength.
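The 9-bin magnitude-weighted vote for one cell can be sketched as follows. This simplified version assigns each pixel wholly to one bin over the unsigned 0–180 degree range; the full Dalal–Triggs scheme additionally interpolates the vote bilinearly between neighbouring bins, which is omitted here for clarity.

```python
import numpy as np

def cell_histogram(theta, mag, bins=9):
    """Vote each pixel of a cell into one of `bins` orientation bins
    (unsigned gradient, 0-180 degrees), weighted by gradient magnitude."""
    angles = np.mod(theta, 180.0)                              # unsigned orientation
    idx = np.minimum((angles / (180.0 / bins)).astype(int), bins - 1)
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), mag.ravel())                  # magnitude-weighted vote
    return hist
```

Weighting by magnitude means strong edges dominate the histogram, so the descriptor emphasizes salient contours over weak texture.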
Optical Character Recognition (OCR)
We will investigate and show how scanned copies of printed letters can automatically be recognised as a string of letters.
Fundamental steps of the OCR system
Image acquisition
Preprocessing
Segmentation
Feature extraction
Classification
Labeling
Image acquisition: a scanner is used to capture images of papers having sequences of letters printed on them.
Fundamental steps of the OCR system: Preprocessing
The gradient vector is computed using the Sobel matrices.
Fundamental steps of the OCR system: Segmentation
The gradient image is segmented into a binary image by thresholding the gradient magnitude.
Fundamental steps of the OCR system: Labeling
Labelling identifies each letter as a single image component.
Fundamental steps of the OCR system: Feature extraction
Histograms of the gradient orientations (angles) are built from the gradient image produced during preprocessing, using only pixels belonging to an image component. This step generates an Edge Histogram Descriptor (EHD) for each segmented letter.
EHD from training sets for letters R and B
[Figure: two “Histogram Of Gradients” plots, probability versus gradient angle (degrees, 0–180), one for letter R and one for letter B.]
EHD from training sets for letters T and S
[Figure: two “Histogram Of Gradients” plots, probability versus gradient angle (degrees, 0–180), one for letter T and one for letter S.]
EHD from training sets for letter O
[Figure: “Histogram Of Gradients” plot, probability versus gradient angle (degrees, 0–180), for letter O.]
Feature vectors for letters R, B, T, S and O
[Figure: “Histogram Of Gradients” plots, probability versus gradient angle (degrees, 0–180), for letters R, B, T, S and O.]
These graphs are generated for all letters within a training set of letters. The width of each line reveals the statistical distribution among letters belonging to the same class.
Classification
From the EHD feature vectors and their statistical distribution over a training set of images, we can classify each feature vector as belonging to a letter (class) with an estimated probability of correctness.
Classification
This graph shows a 3-dimensional feature space with five clearly separable classes.
Minimum Distance Classifier
H. Lin and A.N. Venetsanopoulos, “A Weighted Minimum Distance Classifier for Pattern Recognition”, Canadian Conference on Electrical and Computer Engineering, vol. 2, pp. 904–907, 1993.
Compute the Euclidean distances from the feature vector to the mean vectors of all classes, then select the class giving the shortest distance.
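The minimum distance rule is a few lines of numpy. The dictionary-of-means interface is an assumption for illustration; any mapping from class label to mean feature vector works.

```python
import numpy as np

def minimum_distance_classify(x, class_means):
    """Assign feature vector x to the class whose mean vector is
    nearest in Euclidean distance."""
    labels = list(class_means)
    dists = [np.linalg.norm(x - class_means[c]) for c in labels]
    return labels[int(np.argmin(dists))]
```

The class means would be estimated by averaging the EHD feature vectors of each letter over the training set.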
Classification of a string of letters
SRBRTSTTRROOBBOOSS
[Figure: input image of printed letters and the classified output string.]
From experiments, the classification success rate using an extended 15-bin EHD feature vector covering 360 degrees is estimated at 70 percent for handwritten capital letters A to Z.
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
EHD feature for machine-printed letters
If machine-printed letters are used instead, the classification success rate improves to 87 percent. Letters such as R and B are still hard to distinguish.
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
Extension of the feature vector
How can the EHD feature vector be extended with additional features to improve the possibility of distinguishing between letters such as R and B?
One option is the displacement vector between the centre of gravity and the centre of the bounding box; two elements are thus added to the feature vector. If this vector is given as a fraction of the bounding-box side lengths, the feature becomes scale invariant.
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
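The displacement feature can be sketched as below: the centre of gravity of the letter pixels is compared to the geometric centre of the bounding box, and the difference is divided by the box side lengths so the result is scale invariant. The (row, column) convention and function name are illustrative assumptions.

```python
import numpy as np

def displacement_feature(binary_letter):
    """Displacement from the bounding-box centre to the centre of
    gravity of the letter pixels, as fractions of the bounding-box
    side lengths (scale invariant); adds two feature elements."""
    ys, xs = np.nonzero(binary_letter)
    cog = np.array([ys.mean(), xs.mean()])          # centre of gravity
    top, bottom = ys.min(), ys.max()
    left, right = xs.min(), xs.max()
    centre = np.array([(top + bottom) / 2.0, (left + right) / 2.0])
    size = np.array([max(bottom - top, 1), max(right - left, 1)], dtype=float)
    return (cog - centre) / size
```

For a letter like B, mass is distributed fairly evenly, while for R the lower-right region is sparse, so the two letters produce different displacement vectors even when their edge histograms are similar.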
Extension of the feature vector
A four-bin zonal histogram is built from the probabilities of letter pixels falling within the four indicated areas. These zones are defined from the bounding-box parameters. This feature has four dimensions and thus adds four more elements to the feature vector.
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
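A minimal sketch of the zonal histogram, assuming the four areas are the quadrants of the bounding box (the slides define the zones from the bounding-box parameters but the exact partition is not spelled out here):

```python
import numpy as np

def zonal_histogram(binary_letter):
    """Four-bin zonal histogram: the fraction of letter pixels that
    falls in each quadrant of the bounding box."""
    ys, xs = np.nonzero(binary_letter)
    mid_y = (ys.min() + ys.max()) / 2.0
    mid_x = (xs.min() + xs.max()) / 2.0
    zones = np.array([
        np.sum((ys <= mid_y) & (xs <= mid_x)),   # top-left
        np.sum((ys <= mid_y) & (xs >  mid_x)),   # top-right
        np.sum((ys >  mid_y) & (xs <= mid_x)),   # bottom-left
        np.sum((ys >  mid_y) & (xs >  mid_x)),   # bottom-right
    ], dtype=float)
    return zones / zones.sum()                   # four extra features
```

Normalizing by the total pixel count turns the bin values into probabilities, keeping the feature independent of letter size.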
After extension of the feature vector
After adding the displacement vector and the zonal histogram, the classification success rate becomes 86 percent for handwritten letters and close to 100 percent for machine-printed capital letters.
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
Summary of performance for OCR
Success rate (percent):
                   EHD    EHD + geometrical features
Handwritten         70     86
Machine printed     87    ~100
Reference: Bala Subramanyam and Kassahun Frew, “Hardware Centric Original Character Recognition”
Overview of feature space
[Figure: 3D scatter plot of PCA components 1–3, showing clusters for letters B, R, S and T.]
How can we get an overview of a feature space that has 21 dimensions, as in the previous OCR example? Typically, there is correlation between the variables in multi-dimensional data used as input to classification. If so, a large amount of the variance can be described by projecting the, e.g., 21-dimensional data onto two or three new variables. We call these new variables principal components.
The 3D example graph shows data clusters corresponding to the letters B, R, S and T. Still, 58% of the variance of the original data is described by this graph.
Principal Component Analysis: Input data
x̄_n is a data vector having K dimensions (only three are shown in the graph): x̄_n = (x_{n,k}), k ∈ 1…K
The 2D data matrix X is a set of N data vectors, where each vector x̄_n represents one statistical observation:
X = (x_{n,k}), n ∈ 1…N ∧ k ∈ 1…K
Principal Component Analysis: Mean vector
Each point is a data vector of length K, so the input data matrix X represents a swarm of N points in a K-dimensional space.
A mean vector m̄ is computed as
m̄_k = (1/N) Σ_{n=1}^{N} x_{n,k}, ∀k ∈ 1…K
Principal Component Analysis: Centering of data
Subtract the mean vector m̄ from all data vectors. All data vectors are thus equally translated in the K-dimensional space such that the centre of the point cloud is relocated to the origin.
Principal Component Analysis: Scaling of data
Divide each dimension of the data vectors by its standard deviation s̄_k, where the variance is
s̄_k² = (1/N) Σ_{n=1}^{N} (x_{n,k} − m̄_k)², ∀k ∈ 1…K
This means that after scaling, all K dimensions will have unit variance:
(1/N) Σ_{n=1}^{N} (x_{n,k} − m̄_k)² = 1, ∀k ∈ 1…K
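The centering and scaling steps together are ordinary standardization and can be sketched as one function. Dividing by the standard deviation (the square root of the variance above) is what yields unit variance per dimension; the guard for constant columns is an added practical detail.

```python
import numpy as np

def standardize(X):
    """Centre each column of the N-by-K data matrix on its mean and
    scale it to unit variance, as preparation for PCA."""
    m = X.mean(axis=0)      # mean vector, one entry per dimension
    s = X.std(axis=0)       # standard deviation per dimension
    s[s == 0] = 1.0         # guard against constant columns
    return (X - m) / s
```

Without this step, dimensions measured on large numeric scales would dominate the principal components regardless of how informative they are.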
Principal Component Analysis: First component
• A principal component (PC) is a vector in the K-dimensional X-space that passes through the origin.
• Scores are projections of data vectors onto the PC (the blue point in the figure).
• The PC is oriented such that the scores approximate the original data as well as possible.
Principal Component Analysis: Second component
• The second PC also passes through the origin.
• It is oriented to improve the approximation of the original data as much as possible, under the constraint that the second PC is orthogonal to the first one.
• The two PCs describe a plane, which we can think of as a window into X-space.
Principal Component Analysis: Second component (continued)
• The blue point is a score, computed as a projection onto the plane defined by PC1 and PC2.
• Scores are approximations of their corresponding data points.
• The model used for approximation is, in this case, a plane.
• If more PCs are included, the model becomes a hyperplane.
• More PCs gradually improve the approximation of the data.
X-space is illustrated here with three axes. Remember that real-world X-spaces can have hundreds or thousands of dimensions.
Principal Component Analysis: Loadings
• The direction of each PC is given by the cosines of its angles to all K axes in X-space.
• In this example, the direction of PC1 is given by the cosines of the angles α₁, α₂ and α_K.
• When scores are computed from input data, these cosines define how much each dimension of X-space (each variable) contributes to the score values; see the projection formula:
p̄ · x̄ = |p̄| |x̄| cos α
• For that reason, the cosines of the directions are called “loadings”.
Principal Component Analysis: Loading plot
• A loading plot summarizes how the X-variables “load” on each PC.
• Points far from the origin have a larger impact on the model than points close to the origin.
• If two variable loadings are located very close in this plot, the variables are positively correlated.
• If two variable loadings are located on opposite sides of the origin, in diagonally opposed quadrants, the variables are negatively correlated.
• It is the correlation between variables that makes it possible to summarize hundreds of variables in a few PCs.
[Figure: 3D loading plot over components 1–3 for the OCR features, with loadings labelled Z1–Z4, G1–G15, EX1 and EX2.]
Principal Component Analysis: Summary

X = X̄ + T Pᵀ + E

• T: scores, coordinates in a hyperplane, used to approximate the data (structure of data).
• P: a set of loading vectors that together define the orientation of the hyperplane; each loading vector defines the orientation of one PC.
• E: a residual term capturing the variation of the data that cannot be described by the model (residuals contain noise).
• X̄: the centre of the X-data, typically moved to the origin prior to computation of the PCs (centering of data).
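The decomposition X = X̄ + T Pᵀ + E can be sketched with a singular value decomposition of the centred data, which is one standard way to compute the scores and loadings (the slides do not prescribe an algorithm, so this is an illustrative choice):

```python
import numpy as np

def pca(X, n_components):
    """Decompose centred data as X - X_mean = T @ P.T + E,
    where T holds the scores and P the loadings."""
    mean = X.mean(axis=0)
    Xc = X - mean                                   # centering of data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                         # loadings: one column per PC
    T = Xc @ P                                      # scores: projections onto the PCs
    E = Xc - T @ P.T                                # residual not captured by the model
    explained = (S[:n_components]**2).sum() / (S**2).sum()
    return T, P, E, explained
```

The `explained` ratio is the fraction of total variance captured by the retained PCs, i.e. the quantity quoted as 58% for the three-component OCR plot earlier in these slides.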