Introduction to Pattern Recognition
For the Pattern Recognition Winter School of 정보과학회 (Korean Institute of Information Scientists and Engineers)
February 2011
Jin Hyung Kim (김진형), Department of Computer Science, KAIST, http://ai.kaist.ac.kr/~jkim
2
What is Pattern Recognition?
A pattern is an object, process, or event that can be given a name.
Pattern recognition: assignment of a physical object or event to one of several prespecified categories -- Duda & Hart
A subfield of Artificial Intelligence
human intelligence is based on pattern recognition
3
Examples of Patterns
4
Pattern Recognition
Related fields: machine learning, mathematical statistics, neural networks, signal processing, robotics and vision, cognitive science, nonlinear optimization, exploratory data analysis, fuzzy and genetic algorithms, detection and estimation theory, formal languages, structural modeling, biological cybernetics, computational neuroscience, …
Application areas: image processing/segmentation, computer vision, speech recognition, automated target recognition, optical character recognition, seismic analysis, man-machine dialogue, fingerprint identification, industrial inspection, medical diagnosis, ECG signal analysis, data mining, gene sequence analysis, protein structure analysis, remote sensing, aerial reconnaissance, …
5
Examples of Pattern Recognition Applications
Computer-aided diagnosis
Medical imaging, EEG, ECG, X-ray mammography
Image recognition: factory automation, robot navigation, face identification, gesture recognition, automatic target recognition
Speech recognition: speaker identification, speech recognition, Google Maps Navigation (Beta): search by voice
6
Biometric Recognition (생체 인식)
Person identification using invariant biometric features
Static patterns: fingerprint, iris, face, palm print, …, DNA
Dynamic patterns: signature, voiceprint, typing pattern
Uses: access control, e-commerce authentication
Applications of Pattern Recognition
7
Gesture Recognition
Text editing on Pen ComputersTele-operations
Control remote by gesture inputTV control by hand motion
Sign language Interpretation
Camera2D Projection
Gesture (last)
Hand tracking Gesture spotting
8
Extracting Patterns from Data: Data Mining
Demographics, point-of-sale data, ATM records, financial statistics, credit information, documents, intelligence data, medical records, physical-examination records
Data → Information → Decision making
80% of the buyers of product A also buy product B (CRM); automobile purchasing power in the US market fell for six months; sales of product A grew twice as fast as those of product B; dehydration symptoms indicate danger
What advertising strategy? How to display the products? What is the optimal budget allocation? How to expand market share? How to prevent customer churn? What prescription?
Korean example: preventing the use of lost credit cards by learning card-usage patterns
9
e-Book, Tablet PC, iPad, Smart-phone
Smart Phone with Rich Sensors
Comparison of Online Hangul Recognizers
11
KAIST Math Expression Recognizer : Demo
12
MathTutor-SE Demo
13
14
Historical Document Recognition (古文書 認識): the Seungjeongwon Ilgi (承政院日記)
15
Document Recognition (文書認識): Verification & Correction Interface
16
Mail Sorter
Scene Text Recognition
17
18
Autonomous Land Vehicle
(DARPA’s GrandChallenge contest)
http://www.youtube.com/watch?v=yQ5U8suTUw0
19
Protein Structure Analysis
20
21
Types of PR problems
Classification: assigning an object to a class. Output: a class label. Ex: classifying a product as 'good' or 'bad' in quality control.
Clustering: organizing objects into meaningful groups. Output: a (hierarchical) grouping of objects. Ex: a taxonomy of species.
Regression: predicting a value based on observations. Ex: predicting stock prices, forecasting.
Description: representing an object in terms of a series of primitives. Output: a structural or linguistic description. Ex: labeling ECG signals, video indexing, protein structure indexing.
From Ricardo Gutierrez-Osuna,Texas A&M Univ.
22
Pattern Class
A collection of “similar” (not necessarily identical) objects
Inter-class variability
Intra-class variability
Pattern class model: a description of each class/population (e.g., a probability density such as a Gaussian)
23
Classification vs Clustering
Classification (known categories) Clustering (creation of new categories)
Category “A”
Category “B”
Classification (Recognition) (Supervised Classification)
Clustering (Unsupervised Classification)
24
Pattern Recognition : Key Objectives
Process the sensed data to eliminate noise (data vs. noise)
Hypothesize models that describe each class population; then we may recover the process that generated the patterns
Choose the best-fitting model for the given sensed data and assign the class label associated with that model
25
A Typical Classification Process
Sensor → signal → Feature Extractor → feature → Classifier → class membership
26
Example : Salmon or Sea Bass
Sort incoming fish on a belt into two classes: salmon or sea bass.
Steps:
Preprocessing (segmentation)
Feature extraction (measure features or properties)
Classification (make the final decision)
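The three steps above can be sketched in code. The feature values and the lightness threshold below are hypothetical, and the assumption that salmon are the darker fish is made only for illustration:

```python
# Sketch of the preprocessing -> feature extraction -> classification pipeline.
# Feature values and the threshold are made up for illustration.

def extract_features(fish_image):
    """Stand-in for segmentation + measurement; returns (length, lightness)."""
    return fish_image["length"], fish_image["lightness"]

def classify(features, lightness_threshold=5.0):
    """Decide 'salmon' or 'sea bass' from a single lightness threshold
    (assuming, for illustration, that salmon are darker)."""
    _, lightness = features
    return "salmon" if lightness < lightness_threshold else "sea bass"

fish = {"length": 32.0, "lightness": 3.1}   # toy "sensed" fish
print(classify(extract_features(fish)))      # salmon
```

In a real system each step would of course be far more involved; the point is only the shape of the pipeline.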
27
Sea bass vs Salmon (by Image)
Length, lightness, width, number and shape of fins, position of the mouth, …
28
Salmon vs. Sea Bass (by length)
29
Salmon vs. Sea Bass (by lightness)
Best Decision Strategy with lightness
30
Cost of Misclassification
There are two possible classification errors:
(1) deciding a sea bass is a salmon;
(2) deciding a salmon is a sea bass.
Which error is more important? This is generalized as a loss function; we then look for the decision of minimum risk.
Risk = Expected Loss
Loss function L(truth, decision):
                  decide Salmon   decide Sea bass
truth Salmon            0              -10
truth Sea bass        -20                0
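A minimal risk computation along these lines. The slide's -10/-20 entries are treated here as costs of magnitude 10 and 20, and the posterior P(class | x) values are assumed for illustration:

```python
# Conditional risk R(decision | x) = sum over truths of L(truth, decision) * P(truth | x),
# with the slide's loss table read as positive costs. Posteriors are hypothetical.

LOSS = {  # LOSS[truth][decision]
    "salmon":   {"salmon": 0,  "sea bass": 10},
    "sea bass": {"salmon": 20, "sea bass": 0},
}

def conditional_risk(decision, posterior):
    """Expected loss of a decision under the posterior P(truth | x)."""
    return sum(LOSS[truth][decision] * p for truth, p in posterior.items())

def decide(posterior):
    """Choose the decision with minimum conditional risk."""
    return min(("salmon", "sea bass"), key=lambda d: conditional_risk(d, posterior))

posterior = {"salmon": 0.3, "sea bass": 0.7}
print(decide(posterior))  # sea bass: risk 3.0 vs 14.0 for deciding salmon
```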
31
Classification with more features (by length and lightness)
It is possibly better.
Really ??
32
How Many Features and Which?
The choice of features determines the success or failure of the classification task. For a given feature, we may compute the best decision strategy from the (training) data.
This is called training, parameter adaptation, or learning: machine learning issues.
Issues with feature extraction:
Correlated features do not improve performance.
It might be difficult to extract certain features.
It might be computationally expensive to extract many features.
The 'curse' of dimensionality …
33
Feature and Feature Vector
34
− Length
− Lightness
− Width
− Number and shape of fins
− Position of the mouth
− …
Goodness of Feature
35
Features and separability
36
Developing PR system
Sensors and preprocessing. Feature extraction aims to create discriminative features good for classification. A classifier. A teacher provides information about the hidden state: supervised learning. A learning algorithm sets up the PR system from training examples.
Pattern → Sensors and preprocessing → Feature extraction → Classifier → Class assignment
Teacher → Learning algorithm (adapts the classifier)
37
PR Approaches
Template matching: the pattern to be recognized is matched against a stored template.
Statistical PR: based on an underlying statistical model of patterns (features) and pattern classes.
Structural PR (syntactic pattern recognition): pattern classes are represented by formal structures such as grammars, automata, and strings; used not only for classification but also for description.
Neural networks: the classifier is represented as a network of cells modeling the neurons of the human brain (connectionist approach); knowledge is stored in the connectivity and strength of the synaptic weights.
Statistical structure analysis: combining structural and statistical analysis; exploits probabilistic frameworks such as Bayesian networks and MRFs.
…Modified From Vojtěch Franc
38
Template Matching
Template
Input scene
39
Deformable Template Matching: Snake
Prototype registration to the low-level segmented image
Shape training set; prototype and variation learning
Prototype warping
Example : Corpus Callosum Segmentation
40
From Ricardo Gutierrez-Osuna, Texas A&M Univ.
41
Classifier
The task of a classifier is to partition the feature space into class-labeled decision regions.
The borders between decision regions are the decision boundaries; classification amounts to determining the decision region into which a feature vector x falls.
42
Representation of classifier
A classifier is typically represented as a set of discriminant functions G_i(x), i = 1, …, |Y|.
The classifier assigns a feature vector x to the i-th class if G_i(x) > G_j(x) for all j ≠ i.
Feature vector x → discriminant functions G_1(x), G_2(x), …, G_|Y|(x) → max → class identifier y
From Vojtěch Franc
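A sketch of a discriminant-function classifier: evaluate every G_i(x) and take the arg-max. The two linear discriminants below are made-up examples:

```python
# Classifier as a set of discriminant functions: assign x to the class
# whose G_i(x) is largest. The weights and biases are illustrative only.

def make_linear_discriminant(w, b):
    """G(x) = w . x + b."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

discriminants = {                 # class label -> G_i(x)
    "A": make_linear_discriminant((1.0, 0.0), 0.0),
    "B": make_linear_discriminant((0.0, 1.0), -0.5),
}

def classify(x):
    """Arg-max over the discriminant functions."""
    return max(discriminants, key=lambda label: discriminants[label](x))

print(classify((0.2, 0.9)))  # B: G_B = 0.4 > G_A = 0.2
```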
43
Classification of Classifiers by the Form of the Discriminant Function G_i(x)
Discriminant function : Classifier
A posteriori probability P(y_i | x) : Bayesian
Linear function : Linear Discriminant Analysis, Support Vector Machine
Non-linear function : Non-linear Discriminant Analysis
Output of an artificial neuron : Artificial Neural Network
44
Bayesian Decision Making
Statistical approach: the optimal classifier with minimum error, assuming the complete statistical model is known.
Decision given the posterior probabilities, where x is an observation:
if P(ω1 | x) > P(ω2 | x), decide state of nature = ω1
if P(ω1 | x) < P(ω2 | x), decide state of nature = ω2
45
Searching Decision Boundary
46
Bayesian Rule: from P(x | ω1) to P(ω1 | x)
P(ωi | x) = p(x | ωi) P(ωi) / p(x) = p(x | ωi) P(ωi) / Σj p(x | ωj) P(ωj)
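The rule above sketched numerically; the likelihood and prior values are hypothetical:

```python
# Bayes rule for a two-class problem: posteriors from class-conditional
# likelihoods p(x | w_i) and priors P(w_i). Numbers are made up.

def posteriors(likelihoods, priors):
    """P(w_i | x) = p(x | w_i) P(w_i) / sum_j p(x | w_j) P(w_j)."""
    joint = {w: likelihoods[w] * priors[w] for w in priors}
    evidence = sum(joint.values())            # p(x), the normalizer
    return {w: joint[w] / evidence for w in joint}

post = posteriors(likelihoods={"w1": 0.6, "w2": 0.2},   # p(x | w_i)
                  priors={"w1": 0.4, "w2": 0.6})        # P(w_i)
print(post)  # w1: 2/3, w2: 1/3
```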
47
Limitations of Bayesian approach
The statistical model p(x, y) is mostly not known; we must learn to estimate p(x, y) from training examples {(x1, y1), …, (xℓ, yℓ)}.
Usually p(x, y) is assumed to have a parametric form.
Ex: multivariate normal distribution
Non-parametric estimation of p(x, y) requires a large set of training samples.
Non-Bayesian methods offer equally good results (??)
From Vojtěch Franc
48
Polynomial Discriminative Function approaches
Assume that G(x) is a polynomial function:
a linear function (Linear Discriminant Analysis, LDA), or
a quadratic function.
Classifier design then amounts to determining the separating hyperplane.
From Vojtěch Franc
49
LDA Example: separating jockeys (J) from basketball players (H)
Task: separate jockeys (J) from basketball players (H).
Features: height and weight. The set of hidden states is Y = {H, J}; the feature space is X = R².
Training examples: {(x1, y1), …, (xℓ, yℓ)}
Linear classifier:
q(x) = H if w·x + b ≥ 0
q(x) = J if w·x + b < 0
The decision boundary is the hyperplane w·x + b = 0.
From Vojtěch Franc
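A minimal Fisher-style sketch of such a linear classifier, taking w = S_w⁻¹(m_H − m_J) with b placed midway between the class means. The particular LDA variant and all height/weight numbers are illustrative assumptions:

```python
# Fisher-style linear discriminant for the jockey/basketball-player example.
# Data and the specific LDA variant are illustrative assumptions.

def mean(xs):
    n = len(xs)
    return [sum(x[i] for x in xs) / n for i in (0, 1)]

def scatter(xs, m):
    """2x2 within-class scatter: sum of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x in xs:
        d = [x[0] - m[0], x[1] - m[1]]
        for i in (0, 1):
            for j in (0, 1):
                s[i][j] += d[i] * d[j]
    return s

def solve2(a, v):
    """Solve the 2x2 system a w = v by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(v[0] * a[1][1] - v[1] * a[0][1]) / det,
            (a[0][0] * v[1] - a[1][0] * v[0]) / det]

H = [(200, 95), (195, 90), (205, 100)]   # basketball players (height cm, weight kg)
J = [(160, 50), (155, 48), (165, 55)]    # jockeys

mH, mJ = mean(H), mean(J)
SH, SJ = scatter(H, mH), scatter(J, mJ)
Sw = [[SH[i][j] + SJ[i][j] for j in (0, 1)] for i in (0, 1)]
w = solve2(Sw, [mH[0] - mJ[0], mH[1] - mJ[1]])          # w = Sw^-1 (mH - mJ)
b = -sum(wi * (h + j) / 2 for wi, h, j in zip(w, mH, mJ))

def q(x):
    """Linear classifier q(x) from the slide."""
    return "H" if w[0] * x[0] + w[1] * x[1] + b >= 0 else "J"

print(q((198, 92)), q((158, 49)))  # H J
```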
50
Artificial Neural Network Design
For a given structure, find the weight set w that minimizes the sum-of-squared-error criterion J(w) over the training examples {(x1, y1), …, (xℓ, yℓ)}:
J(w) = (1/2) Σk (tk − zk)²
where tk is the target output and zk is the network output.
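A sketch of minimizing J(w) by gradient descent for a single linear neuron z = w0 + w1·x; the data set and learning rate are made up:

```python
# Gradient descent on J(w) = 1/2 * sum_k (t_k - z_k)^2 for one linear neuron.
# Toy data follows t = 1 + 2x exactly, so the weights should recover 1 and 2.

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # (x, target t)

w0, w1, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    g0 = g1 = 0.0
    for x, t in data:
        z = w0 + w1 * x        # neuron output
        g0 += -(t - z)         # dJ/dw0
        g1 += -(t - z) * x     # dJ/dw1
    w0 -= lr * g0
    w1 -= lr * g1

print(round(w0, 3), round(w1, 3))  # 1.0 2.0
```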
51
PR design cycle
Data collection: probably the most time-intensive component of a project. How many examples are enough?
Feature choice: critical to the success of the PR project; requires basic prior knowledge and engineering sense.
Model choice and design: statistical, neural, or structural; parameter settings.
Training: given a feature set and a 'blank' model, adapt the model to explain the training data; supervised, unsupervised, or reinforcement learning.
Evaluation: how well does the trained model do? Overfitting vs. generalization.
52
Learning for PR system
Which features are good for classifying the given classes? → Feature analysis
Can we get the required probabilities or boundaries? → Learning from the training data
Pattern → Sensors and preprocessing → Feature extraction → Classifier → Class assignment
Teacher → Learning algorithm (adapts the classifier)
53
Learning
A change in the contents and organization of a system's knowledge that enables it to improve its performance on a task - Simon. It occurs when the system acquires new knowledge from its environment.
Learning from observation: from trivial memorization to the creation of scientific theories.
Inductive inference: a new consistent interpretation of the data (observations); a general conclusion from examples; inferring an association between input and output with some confidence.
Data mining: learning rules from large sets of data. The availability of large databases allows the application of machine learning to real problems.
54
Learning Algorithm Categorization Depending on Available
Feedback
Supervised learning: examples of correct input/output pairs are available; induction.
Unsupervised learning: no hint at all about the correct outputs; clustering or consistent interpretation.
Reinforcement learning: receives no examples, but rewards or punishments at the end.
Semi-supervised learning: training with both labeled and unlabeled examples.
55
Issues on Learning Algorithm
Prior knowledge: prior knowledge can help in learning, e.g., assumptions on parametric forms and ranges of values.
Incremental learning: update old knowledge whenever a new example arrives.
Batch learning: apply the learning algorithm to the entire set of examples.
Analytic approach: find the optimal parameter values by analysis. Iterative adaptation: improve the parameter values from an initial guess.
56
Learning Algorithms
General idea: tweak the parameters so as to optimize a performance criterion. In the course of learning, the parameter vector traces a path that (hopefully) ends at the best parameter vector.
57
Inductive Learning
For given training examples (correct input-output pairs),
recover the unknown underlying function from which the training data were generated.
Generalization ability for unseen data is required.
Forms of the function: logical sentences, polynomials, sets of weights (neural networks), …
Given the form of the function, adjust its parameters to minimize the error.
58
Theory of Inductive Inference
Inductive bias: constraints on the hypothesis space; a table of all observations is not a viable choice.
Restricted hypothesis-space biases; preference biases.
Occam's razor (Ockham): the simplest hypothesis is best.
Concept C ⊆ X. Examples are given as (x, y) where x ∈ X and
y = 1 if x ∈ C, y = 0 if x ∉ C. Find F such that F(x) = 1 if x ∈ C and F(x) = 0 if x ∉ C.
59
Consistent hypotheses
William of Ockham (also Occam), 1285-1349, English scholastic philosopher.
Prefer the simplest hypothesis consistent with the data; defining 'simple' is not easy.
There is a tradeoff between the complexity of a hypothesis and its degree of fit.
60
Model Complexity
Decision boundaries for salmon and sea bass: which is better, A or B?
61
Model Complexity
We can get perfect classification performance on the training data by choosing sufficiently complex models.
This raises the issue of generalization.
62
Generalization
The main goal of a pattern classification system is to suggest the class of objects yet unseen: generalization.
Some complex decision boundaries are not good at generalization; some simple boundaries are not good either.
The tradeoff between performance and simplicity is the core of statistical pattern recognition.
63
Generalization Strategy
How can we improve generalization performance ?
More training examples (i.e., better pdf estimates).
Simpler models (i.e., simpler classification boundaries) usually yield better performance.
Simplify the decision boundary!
64
Overfitting and underfitting
Panels: underfitting, good fit, overfitting.
Problem of generalization: a small empirical risk R_emp does not imply a small true expected risk R.
From Vojtěch Franc
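A small sketch of this point: a memorizing classifier attains zero empirical risk yet a large true risk, while a simple model generalizes. The noisy-label data set is hypothetical:

```python
# Zero training error does not imply low test error: a lookup-table
# "model" vs a simple threshold on synthetic noisy-label data.
import random

random.seed(0)

def sample(n):
    """Class is 1 when x > 0, but 20% of the labels are flipped (noise)."""
    data = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        y = 1 if x > 0 else 0
        if random.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

train, test = sample(50), sample(5000)
memory = dict(train)                 # memorizing "model": zero training error

def memorizer(x):
    return memory.get(x, 0)          # arbitrary default for unseen x

def simple(x):
    return 1 if x > 0 else 0         # the simple threshold model

def error(model, data):
    return sum(model(x) != y for x, y in data) / len(data)

print(error(memorizer, train))   # 0.0  (small empirical risk)
print(error(memorizer, test))    # ~0.5 (large true risk)
print(error(simple, test))       # ~0.2 (the label-noise rate)
```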
65
Curse of Dimensionality
Increasing the number of functions reduces the training error: classifier performance on the training data improves.
But when training on a limited amount of data, increasing the number of features reduces generalization ability; the amount of training data required for adequate generalization grows rapidly with the feature dimension.
For a finite set of training data, finding the optimal set of features is a difficult problem.
66
Maximize the outcome from two slot machines of unknown return rates.
How many coins should be spent to find the better machine?
Two Slot Machine Problem
67
Optimal Number of Cells (example)
68
Implications of the Curse of Dimensionality for PR System Design
With finite training samples, be cautious about adding features.
Use the features with the highest discrimination power first; feature analysis is mandatory.
A simple neural network is generally better: a small number of hidden nodes and links.
Tips for structure simplification: parameter tying; eliminate links during learning.
69
Cross-Validation
Validate the learned model on a different set to assess its generalization performance, guarding against overfitting.
Partition the training set into an estimation subset for learning parameters and a validation subset.
Cross-validation is used for selecting the best model and for determining when to stop training.
Leave-one-out validation: N−1 examples for training, 1 for validation, taking turns; this helps overcome a small training set.
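Leave-one-out validation sketched in code; the 1-nearest-neighbour model and the toy data are illustrative assumptions:

```python
# Leave-one-out cross-validation: each example takes a turn as the
# validation set while the other N-1 examples train the model.

def nearest_neighbour_label(x, train):
    """1-NN on a 1-D feature: label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def leave_one_out_error(data):
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]               # N-1 for training
        if nearest_neighbour_label(x, rest) != y:    # 1 for validation
            errors += 1
    return errors / len(data)

data = [(0.1, "a"), (0.2, "a"), (0.3, "a"), (0.8, "b"), (0.9, "b"), (1.0, "b")]
print(leave_one_out_error(data))  # 0.0
```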
70
Unsupervised learning. Input: training examples {x1, …, xℓ} without information about the hidden state.
Clustering: the goal is to find clusters of data sharing similar properties.
The learning algorithm takes {x1, …, xℓ} and produces the parameters θ; the classifier q : X × Θ → Y then outputs the labels {y1, …, yℓ}.
A broad class of unsupervised learning algorithms follows this scheme.
From Vojtěch Franc
71
Example of unsupervised learning algorithm
k-Means clustering:
Classifier: q(x) = argmin_{i=1,…,k} ||x − m_i||
Goal: minimize the total distortion Σ_i ||x_i − m_{q(x_i)}||²
Learning algorithm: m_i = (1/|I_i|) Σ_{j ∈ I_i} x_j, where I_i = {j : q(x_j) = i}
Input: {x1, …, xℓ}; parameters: θ = {m1, …, mk}; outputs: {y1, …, yℓ}
From Vojtěch Franc
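A minimal k-means sketch matching the two steps above, on made-up 1-D data with k = 2:

```python
# Minimal k-means: alternate the classifier step (assign each x to its
# nearest mean) and the learning step (recompute each mean). Toy 1-D data.

def kmeans(xs, means, iterations=20):
    for _ in range(iterations):
        # classifier step: q(x) = argmin_i |x - m_i|
        clusters = [[] for _ in means]
        for x in xs:
            i = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[i].append(x)
        # learning step: m_i = mean of the points assigned to cluster i
        means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
    return means

xs = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
print(kmeans(xs, means=[0.0, 10.0]))  # ≈ [1.0, 8.0]
```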
Other Issues in Pattern Recognition
73
Difficulty of Class Modeling
74
Context processing is essential for recognition
75
Context Processing in Character Recognition
Without context, the human recognition rate for English handwriting is about 95%.
76
Global Consistency
Local decision is not enough
77
Combining Multiple Classifiers
Approaches for improving the performance of a group of experts.
Best single classifier vs. combining multiple classifiers: two heads (experts, classifiers) are better than one.
The classifier output is either the best (single) class, a ranking, or a score for each class.
Methods for generating multiple classifiers: correlated classifiers would not help.
Methods for combining multiple classifiers: majority rule, Borda count, decorrelated combination, etc.
78
Evaluating Pattern Recognition Performance
Counts of recognition results against the truth:
p = A recognized as a (correct), q = not-A recognized as not-a (correct),
r = A recognized as not-a (miss), s = not-A recognized as a (false alarm)
Correct recognition rate = (p+q)/(p+q+r+s)
Error rate = (r+s)/(p+q+r+s)
Miss detection = r/(p+r)
False alarm = s/(p+s)
Recall = p/(p+r)
Precision = p/(p+s)
Also: rejection rate (refusing to make a decision) and throughput.
Case A: 20% rejected, with 0.5% error in the results. Case B: 10% rejected, with 1.0% error. Which is better?
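The measures above, computed from assumed counts p, q, r, s:

```python
# Evaluation measures from the slide; the counts are hypothetical.

p, q, r, s = 90, 880, 10, 20   # correct A, correct not-A, misses, false alarms

total = p + q + r + s
correct_rate   = (p + q) / total
error_rate     = (r + s) / total
miss_detection = r / (p + r)
false_alarm    = s / (p + s)
recall         = p / (p + r)
precision      = p / (p + s)

print(correct_rate, recall, precision)  # 0.97 0.9 0.8181...
```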
79
Improving Pattern Recognition Performance
Performance as a function of time and effort: toward 100%
Appendix
81
Resources
Professional association: International Association for Pattern Recognition (IAPR)
Text books: Pattern Classification by Richard O. Duda, Peter E. Hart, and David G. Stork
Journals: IEEE Transactions on Pattern Analysis and Machine Intelligence; Pattern Recognition; Pattern Recognition Letters; Artificial Intelligence and Pattern Recognition; …
Conferences and workshops: International Conference on Pattern Recognition; Int'l Conference on Document Analysis and Recognition; Int'l Workshop on Frontiers in Handwriting Recognition; IEEE Computer Vision and Pattern Recognition; …