Non-Metric Methods
1
Non-Metric Methods
Shyh-Kang Jeng
Department of Electrical Engineering / Graduate Institute of Communication / Graduate Institute of Networking and Multimedia, National Taiwan University
2
Non-Metric Descriptions
Nominal data
– Discrete
– Without a natural notion of similarity or even ordering
Property d-tuple
– With lists of attributes
– e.g., { red, shiny, sweet, small }
– i.e., color = red, texture = shiny, taste = sweet, size = small
3
Non-Metric Descriptions
Strings of nominal attributes
– e.g., base sequences in DNA segments: "AGCTTCAGATTCCA"
– Might themselves be the output of other component classifiers
– e.g., a Chinese character recognizer and a neural network for classifying component brush strokes
4
Non-Metric Methods
Learn categories from non-metric data
Represent structures in strings
Toward discrete problems addressed by
– Rule-based pattern recognition methods
– Syntactic pattern recognition methods
5
Decision Trees
6
Benefits of Decision Trees
Interpretability
Rapid classification
– Through a sequence of simple queries
Natural way to incorporate prior knowledge from human experts
7
Interpretability
Conjunctions and disjunctions
For any particular test pattern
– e.g., properties: { taste, color, shape, size }
– x = { sweet, yellow, thin, medium }
– (color = yellow) AND (shape = thin)
For category description
– e.g., Apple = (green AND medium) OR (red AND medium)
Rule reduction
– e.g., Apple = (medium AND NOT yellow)
8
Tree Construction
Given
– Set D of labeled training data
– Set of properties for discriminating patterns
Goal
– Organize the tests into a tree
9
Tree Construction
Split samples progressively into smaller subsets
Pure subset
– All samples have the same category label
– Could terminate that portion of the tree
Subset with a mixture of labels
– Decide either to stop, or to select another property and grow the tree further
10
CART
Classification and regression trees
A general framework for decision trees
General questions in CART
– Number of decision outcomes at a node
– Property tested at a node
– Declaration of a leaf
– When and how to prune
– Decision for an impure leaf node
– Handling of missing data
11
Branching Factor and Binary Decisions
Branching factor (branching ratio) B
– Number of links descending from a node
Binary decisions
Every decision can be represented using just binary decisions
– e.g., query of color (B = 3)
– color = green? color = yellow?
– Universal expressive power
12
Binary Trees
13
Geometrical Interpretation of Trees for Numerical Data
14
Fundamental Principle
Prefer decisions leading to a simple, compact tree with few nodes
– A version of Occam's razor
Seek a property query T at each node N
– Make the data reaching the immediate descendent nodes as pure as possible
– i.e., achieve the lowest impurity
Impurity i(N)
– Zero if all patterns bear the same label
– Large if the categories are equally represented
15
Entropy Impurity (Information Impurity)
Most popular measure of impurity
i(N) = −Σ_j P(ω_j) log2 P(ω_j)
where P(ω_j) = P̂(ω_j | N), the fraction of patterns at node N in category ω_j
16
Variance Impurity for Two-Category Case
Particularly useful in the two-category case
i(N) = P(ω1) P(ω2)
17
Gini Impurity
Generalization of variance impurity
Applicable to two or more categories
Expected error rate at node N
i(N) = Σ_{i≠j} P(ω_i) P(ω_j) = (1/2)[1 − Σ_j P²(ω_j)]
18
Misclassification Impurity
Minimum probability that a training pattern would be misclassified at N
Most strongly peaked at equal probabilities
i(N) = 1 − max_j P(ω_j)
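To make these measures concrete, here is a minimal Python sketch (the function names are mine, not from the slides); each function takes the estimated class probabilities P(ω_j) at a node:

```python
import math

def entropy_impurity(p):
    """i(N) = -sum_j P(w_j) log2 P(w_j): zero for a pure node."""
    return -sum(pj * math.log2(pj) for pj in p if pj > 0)

def variance_impurity(p):
    """Two-category case: i(N) = P(w1) P(w2)."""
    return p[0] * p[1]

def gini_impurity(p):
    """i(N) = (1/2)[1 - sum_j P(w_j)^2]; CART often omits the 1/2,
    which rescales the measure but never changes which split wins."""
    return 0.5 * (1.0 - sum(pj ** 2 for pj in p))

def misclassification_impurity(p):
    """i(N) = 1 - max_j P(w_j): most strongly peaked at equal probabilities."""
    return 1.0 - max(p)

# Every measure is zero for a pure node and maximal at equal probabilities:
for f in (entropy_impurity, variance_impurity,
          gini_impurity, misclassification_impurity):
    print(f.__name__, f([0.5, 0.5]), f([1.0, 0.0]))
```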
19
Impurity for Two-Category Case
*Adjusted in scale and offset for comparison
20
Heuristic to Choose Query
If entropy impurity is used, the impurity reduction corresponds to an information gain
The reduction of entropy impurity due to a split cannot be greater than 1 bit
Choose the query value s to maximize
Δi(T) = i(N) − P_L i(N_L) − (1 − P_L) i(N_R)
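A hedged sketch of this heuristic for one real-valued feature (the helper names and the use of Gini impurity are my choices): sort the samples, try each of the n − 1 split positions, and keep the threshold with the largest drop Δi:

```python
def impurity(labels):
    # Gini impurity of a list of class labels (any measure could be used)
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(xs, labels):
    """Return (threshold, gain) maximizing i(N) - P_L i(N_L) - (1-P_L) i(N_R)."""
    pairs = sorted(zip(xs, labels))
    parent, n = impurity(labels), len(pairs)
    best_gain, best_t = 0.0, None
    for k in range(1, n):                      # n - 1 candidate positions
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        gain = parent - (k / n) * impurity(left) - ((n - k) / n) * impurity(right)
        if gain > best_gain:
            # split halfway between neighboring sample values
            best_gain, best_t = gain, (pairs[k - 1][0] + pairs[k][0]) / 2
    return best_t, best_gain

print(best_binary_split([1.0, 2.0, 3.0, 4.0], ["w1", "w1", "w2", "w2"]))  # (2.5, 0.5)
```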
21
Finding Extrema
Nominal attributes
– Perform extensive or exhaustive search over all possible subsets of the training set
Real-valued attributes
– Use gradient descent algorithms to find a splitting hyperplane
– A one-dimensional optimization problem for binary trees
22
Tie Breaking
Nominal data
– Choose randomly
Real-valued data
– Assume a split lying in x_l < x_s < x_u
– Choose either the middle point or the weighted average x_s = (1 − P) x_l + P x_u
– P is the probability a pattern goes to the "left" under the decision
Computational simplicity may be a determining factor
23
Greedy Method
Get a local optimum at each node
No assurance that successive locally optimal decisions lead to the global optimum
No guarantee that we will have the smallest tree
For reasonable impurity measures and learning methods
– Often continue to split further to get the lowest possible impurity at the leaves
24
Favoring Gini Impurity over Misclassification Impurity
Example: 90 patterns in ω1 and 10 in ω2
Misclassification impurity: 0.1
Suppose no split guarantees an ω2 majority in either of the two descendent nodes
– Misclassification impurity remains at 0.1 for all splits
An attractive split: 70 ω1, 0 ω2 to the right and 20 ω1, 10 ω2 to the left
Gini impurity shows that this is a good split
25
Twoing Criterion
For multiclass binary tree creation
Find "supercategories" C1 and C2
– C1 = { ω_i1, ω_i2, …, ω_ik }, C2 = C − C1
Compute Δi(s, C1) as though it corresponded to a standard two-class problem
Find the split s*(C1) that maximizes the change, and thereby the supercategory C1*
26
Practical Considerations
The choice of impurity function rarely affects the final classifier and its accuracy
The stopping criterion and pruning methods are more important in determining final accuracy
27
Multiway Splits
Δi(s) = i(N) − Σ_{k=1..B} P_k i(N_k)
To avoid favoring decisions with large B, base the decision on the gain ratio:
Δi_B(s) = Δi(s) / ( −Σ_{k=1..B} P_k log2 P_k )
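A small sketch of the gain ratio (function name is mine; the fractions P_k are the shares of patterns sent to each of the B children):

```python
import math

def gain_ratio(parent_impurity, child_impurities, child_fractions):
    """Delta-i(s) scaled by the split entropy -sum_k P_k log2 P_k,
    so that splits with many outcomes B are not automatically favored."""
    gain = parent_impurity - sum(p * i for p, i in
                                 zip(child_fractions, child_impurities))
    split_info = -sum(p * math.log2(p) for p in child_fractions if p > 0)
    return gain / split_info if split_info > 0 else 0.0

# A B = 4 split must buy its extra gain against log2(4) = 2 bits of split info:
print(gain_ratio(1.0, [0.0, 0.0], [0.5, 0.5]))            # 1.0
print(gain_ratio(1.0, [0.0, 0.0, 0.0, 0.0], [0.25] * 4))  # 0.5
```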
28
Importance of Stopping Criteria
Fully grown trees are typically overfit
Extreme case: each leaf corresponds to a single training point
– The full tree is merely a look-up table
– Does not generalize well in noisy problems
Early stopping
– Error on training data not sufficiently low
– Performance may suffer
29
Stopping by Checking Validation Error
Use a subset of the data (e.g., 90%) for training and the remaining 10% as a validation set
Continue splitting until the error on the validation data is minimized
30
Stopping by Setting a Threshold
Stop if max_s Δi(s) < β
Benefits
– Uses all training data
– Leaves can lie at different levels
Fundamental drawback
– Difficult to determine the threshold β
An alternative simple method
– Stop when a node represents fewer than some threshold number of points, e.g., a fixed percentage of the total training set
31
Stopping by Checking a Global Criterion
Stop when a global criterion reaches its minimum
Minimum description length
The criterion balances complexity and uncertainty:
α · size + Σ_{leaf nodes} i(N)
32
Stopping Using Statistical Tests
Suppose a split s sends P n patterns to the left and (1 − P) n to the right
χ² = Σ_{i=1..2} (n_iL − n_ie)² / n_ie
– n_iL: number of ω_i patterns sent to the left
– n_ie = P n_i: expected number sent left under a random split
If the most significant split does not yield a χ² exceeding the chosen threshold, stop splitting
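A sketch of this test (the 3.84 critical value, chi-squared with one degree of freedom at the 0.05 level, is my illustrative choice, not from the slides):

```python
def split_chi_squared(n_left, n_node, p_left):
    """n_left[i]: omega_i patterns sent left; n_node[i]: omega_i patterns
    at the node; p_left: fraction of all patterns the split sends left."""
    chi2 = 0.0
    for n_il, n_i in zip(n_left, n_node):
        n_ie = p_left * n_i              # expected count under a random split
        if n_ie > 0:
            chi2 += (n_il - n_ie) ** 2 / n_ie
    return chi2

# 60 of 90 omega_1 and 10 of 30 omega_2 patterns sent left, so P = 70/120:
chi2 = split_chi_squared([60, 10], [90, 30], 70 / 120)
print(chi2 > 3.84)   # True: significant at the 0.05 level -> keep splitting
```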
33
Horizon Effect
The determination of the optimal split at a node is not influenced by decisions at its descendent nodes
A stopping condition may be met too early for overall optimal recognition accuracy
– Biases toward trees in which the greatest impurity is near the root node
34
Pruning
Grow a tree fully first
All pairs of neighboring leaf nodes are considered for elimination
– If the elimination yields a satisfactory (small) increase in impurity, the common antecedent node is declared a leaf
– Called merging or joining
35
Rule Pruning
Each leaf has an associated rule
Some rules can be simplified if a series of decisions is redundant
Can improve generalization and interpretability
Allows us to distinguish between the contexts in which a node is used
36
Example 1: A Simple Tree
37
Example 1: A Simple Tree
Impurity of the root node: i(N_root) = −Σ_{i=1,2} P(ω_i) log2 P(ω_i) = 1.0
Consider splits of the form "x_i ≤ x_is"
Exhaustively check n − 1 positions for x1 and x2, respectively
38
Example 1: A Simple Tree
39
Example 1: A Simple Tree
40
Computational Complexity
Training
– Root node
  Sorting: O(dn log n)
  Entropy computation: O(n) + (n − 1) O(d)
  Total: O(dn log n)
– Level-1 node
  Average case: O(dn log(n/2))
– Total number of levels: O(log n)
– Total average complexity: O(dn (log n)²)
Recall and classification
– O(log n)
41
Feature Choice
42
Feature Choice
43
Multivariate Decision Trees
44
Multivariate Decision Trees Using General Linear Decisions
45
Priors and Costs
Priors
– Weight samples to correct for the prior frequencies
Costs
– Cost matrix λ_ij
– Incorporate costs into the impurity, e.g.,
  i(N) = Σ_{i,j} λ_ij P(ω_i) P(ω_j)
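A minimal sketch of this cost-weighted impurity, with an illustrative (made-up) cost matrix λ:

```python
def cost_weighted_impurity(p, cost):
    """i(N) = sum_{i,j} lambda_ij P(w_i) P(w_j)."""
    return sum(cost[i][j] * p[i] * p[j]
               for i in range(len(p)) for j in range(len(p)))

cost = [[0.0, 1.0],   # lambda_12 = 1: calling an omega_2 pattern omega_1 is cheap
        [5.0, 0.0]]   # lambda_21 = 5: calling an omega_1 pattern omega_2 is costly
print(cost_weighted_impurity([0.9, 0.1], cost))  # ~0.54
```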
46
Training and Classification with Deficient Patterns
Training
– Proceed as usual
– Calculate impurities at a node using only the attribute information present
Classification
– Use the traditional ("primary") decision whenever possible
– Use surrogate splits when the test pattern is missing some features
– Or use virtual values
47
Example 2: Surrogate Splits and Missing Attributes
48
Example 2: Surrogate Splits and Missing Attributes
49
Algorithm ID3
Interactive dichotomizer
For use with nominal (unordered) inputs only
– Real-valued variables are handled by bins
Gain ratio impurity is used
Continues until all nodes are pure or there are no more variables
Pruning can be incorporated
50
Algorithm C4.5
Successor and refinement of ID3
Real-valued variables are treated as in CART
Gain ratio impurity is used
Uses pruning based on the statistical significance of splits
51
Treating Deficient Patterns in C4.5
Branching factor B at node N
Follow all B possible answers to the descendent nodes and ultimately B leaf nodes
The final decision is based on the labels of the B leaf nodes, weighted by the decision probabilities at N
52
Example of C4.5Rules
IF (0.04 x1 + 0.16 x2 < 0.11)
AND (0.27 x1 + 0.44 x2 < 0.02)
AND (0.96 x1 + 1.77 x2 < 0.45)
AND (5.43 x1 + 13.33 x2 < 6.03)
THEN x ∈ ω1
After pruning the redundant conditions:
IF (0.04 x1 + 0.16 x2 < 0.11)
AND (5.43 x1 + 13.33 x2 < 6.03)
THEN x ∈ ω1
53
Basic Concepts for String Recognition
Strings
– "AGCTTCGAATC"
Characters and alphabet
– { A, G, C, T }
Words
– x = "AGCTTC"
Texts
Factor
– "GCT" in "AGCTTC"
54
String Problems of Importance for Pattern Recognition
String matching
– e.g., classify a book by keywords
Edit distance
String matching with errors
– e.g., classify texts with misspelled words
String matching with the "don't-care" symbol
– e.g., classify a DNA sequence according to whether it contains a protein, with some DNA segments inert or having no function
55
General String Matching Problem
56
Naïve String Matching
initialize A, x, text, n ← length[text], m ← length[x]
s ← 0
while s ≤ n − m
  if x[1…m] = text[s + 1 … s + m]
    then print "pattern occurs at shift" s
  s ← s + 1
return
end
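The same algorithm as runnable Python (0-indexed, where the pseudocode is 1-indexed):

```python
def naive_string_match(text, x):
    n, m = len(text), len(x)
    shifts = []
    for s in range(n - m + 1):
        if text[s:s + m] == x:     # compare the pattern to the window at shift s
            shifts.append(s)
    return shifts

print(naive_string_match("AGCTTCAGATTCCA", "TTC"))  # [3, 9]
```

The worst case is O(nm) character comparisons, which is what Boyer-Moore improves on.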
57
Boyer-Moore String Matching
initialize A, x, text, n ← length[text], m ← length[x]
F(x) ← last-occurrence function
G(x) ← good-suffix function
s ← 0
while s ≤ n − m
  do j ← m
    while j > 0 and x[j] = text[s + j]
      do j ← j − 1
    if j = 0
      then print "pattern occurs at shift" s
        s ← s + G(0)
      else s ← s + max[ G(j), j − F(text[s + j]) ]
return
end
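A runnable sketch using only the last-occurrence function F (the bad-character heuristic); the good-suffix function G is omitted here for brevity, so on a full match or a backward skip we conservatively advance by one:

```python
def boyer_moore(text, x):
    n, m = len(text), len(x)
    last = {c: i for i, c in enumerate(x)}        # last-occurrence function F
    shifts, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and x[j] == text[s + j]:     # compare right to left
            j -= 1
        if j < 0:
            shifts.append(s)
            s += 1                                # G(0) would give a larger skip
        else:
            s += max(1, j - last.get(text[s + j], -1))
    return shifts

print(boyer_moore("when_chris_beats_the_drum", "beat"))  # [11]
```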
58
Boyer-Moore String Matching
59
Subset-Superset Problem
Search for several strings
– Some short strings are factors of long ones
Example
– Find "beat", "eat", "be"
– In "when_chris_beats_the_drum"
Return only the longest target strings
60
Edit Distance
Minimum number of fundamental operations required to transform string x into string y
Fundamental operations
– Substitutions
– Insertions
– Deletions
61
Edit Distance Computation
initialize A, x, y, m ← length[x], n ← length[y]
C[0, 0] ← 0
i ← 0
do i ← i + 1
  C[i, 0] ← i
until i = m
j ← 0
do j ← j + 1
  C[0, j] ← j
until j = n
62
Edit Distance Computation
i ← 0; j ← 0
do i ← i + 1
  do j ← j + 1
    C[i, j] ← min[ C[i − 1, j] + 1, C[i, j − 1] + 1, C[i − 1, j − 1] + 1 − δ(x[i], y[j]) ]
  until j = n
until i = m
return C[m, n]
end
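The dynamic program in Python; the term 1 − δ(x[i], y[j]) becomes a 0/1 substitution cost:

```python
def edit_distance(x, y):
    m, n = len(x), len(y)
    C = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        C[i][0] = i                                   # i deletions
    for j in range(1, n + 1):
        C[0][j] = j                                   # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            C[i][j] = min(C[i - 1][j] + 1,            # deletion
                          C[i][j - 1] + 1,            # insertion
                          C[i - 1][j - 1] + (x[i - 1] != y[j - 1]))  # substitution
    return C[m][n]

print(edit_distance("excused", "exhausted"))  # 3: insert h, substitute c->a, insert t
```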
63
Edit Distance Computation
64
String Matching with Errors
Given a pattern x and text
Find the shift for which the edit distance between x and a factor of text is minimum
Apply an algorithm similar to the one that computes the edit distance between two strings, but use a matrix of costs E
Matrix of costs E
– E[i, j] = min[ C(x[1…i], y[1…j]) ]
– E[0, j] = 0
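A sketch of the E-matrix variant (the test strings are my own): the recurrence is identical to the edit-distance program, but E[0, j] = 0 lets a match begin at any shift, and the minimum of the last row gives the best match against any factor of the text:

```python
def min_edit_match(x, text):
    m, n = len(x), len(text)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        E[i][0] = i                 # first column as before; first row stays 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = min(E[i - 1][j] + 1,
                          E[i][j - 1] + 1,
                          E[i - 1][j - 1] + (x[i - 1] != text[j - 1]))
    best_j = min(range(n + 1), key=lambda j: E[m][j])
    return E[m][best_j], best_j     # best distance and where that factor ends

print(min_edit_match("TTC", "AGCTACAGATTGCA"))  # (1, 6): "TAC" matches with one error
```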
65
String Matching with Errors
66
String Matching with "Don't-Care" Symbols
The naïve algorithm can be modified to handle the don't-care symbol
Extending the Boyer-Moore algorithm is difficult and inefficient
See the Bibliography for better methods
67
Grammars
Structures of hierarchical rules to generate strings
Examples
– "The history book clearly describes several wars"
– Telephone numbers
Provide constraints for a full system that uses a statistical recognizer as a component
– e.g., a mathematical equation recognizer
– ASR for phone numbers
68
Grammars
Symbols
– Alphabet A
– Null string ε
Variables I
Root symbol
– Taken from a set S
Production rules P
Grammar G = {A, I, S, P}
Language L(G)
69
Example Language 1
A = { a, b, c }
I = { A, B }
S = S
P = {
  p1: S → aSBA OR aBA
  p2: AB → BA
  p3: bB → bb
  p4: bA → bc
  p5: cA → cc
  p6: aB → ab
}
70
Example Language 1
Derivation of "abc":
root: S
p1: aBA
p6: abA
p4: abc
Derivation of "aabbcc":
root: S
p1: aSBA
p1: aaBABA
p6: aabABA
p2: aabBAA
p3: aabbAA
p4: aabbcA
p5: aabbcc
L(G) = { aⁿbⁿcⁿ | n ≥ 1 }
71
Example Language 2
A = { the, history, book, sold, over, 1000, copies }
I = { ⟨noun⟩, ⟨verb⟩, ⟨noun phrase⟩, ⟨verb phrase⟩, ⟨adjective⟩, ⟨adverb⟩, ⟨adverbial phrase⟩ }
S = ⟨sentence⟩
72
Example Language 2
P = {
  ⟨sentence⟩ → ⟨noun phrase⟩⟨verb phrase⟩
  ⟨noun phrase⟩ → ⟨adjective⟩⟨noun phrase⟩
  ⟨verb phrase⟩ → ⟨verb phrase⟩⟨adverbial phrase⟩
  ⟨noun⟩ → book OR theorem OR …
  ⟨verb⟩ → describes OR buys OR holds OR …
  ⟨adverb⟩ → over OR …
}
The grammar can also generate nonsense such as "Squishy green dreams hop heuristically"
73
Derivation Tree
74
Types of String Grammars
Type 0: Free or unrestricted
Type 1: Context-sensitive
– αIβ → αxβ
Type 2: Context-free
– I → x
Type 3: Finite state or regular
– α → zβ OR z
Grammars of type i include all grammars of type i + 1
75
Chomsky Normal Form (CNF)
CNF
– A → BC and A → z
For every context-free grammar G, there is another G′ in CNF such that L(G) = L(G′)
76
Example 3
To pronounce any number between 1 and 999,999
A = { one, two, …, ten, eleven, …, twenty, thirty, …, ninety, hundred, thousand }
I = { digits6, digits3, digits2, digit, teens, tys }
S = digits6
77
Example 3
P = {
  digits6 → digits3 thousand digits3
  digits6 → digits3 thousand OR digits3
  digits3 → digit hundred digits2
  digits3 → digit hundred OR digits2
  digits2 → teens OR tys OR tys digit OR digit
  digit → one OR two OR … OR nine
  teens → ten OR eleven OR … OR nineteen
  tys → twenty OR thirty OR … OR ninety
}
*Type 2 grammar
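A sketch that mirrors the grammar's structure (the word lists and function names follow the variables; filling in tys entries beyond the slide's examples is my addition):

```python
ONES  = "one two three four five six seven eight nine".split()
TEENS = ("ten eleven twelve thirteen fourteen fifteen "
         "sixteen seventeen eighteen nineteen").split()
TYS   = "twenty thirty forty fifty sixty seventy eighty ninety".split()

def digits2(n):          # 1..99: teens OR tys OR tys digit OR digit
    if n < 10:
        return ONES[n - 1]
    if n < 20:
        return TEENS[n - 10]
    t, d = divmod(n, 10)
    return TYS[t - 2] + (" " + ONES[d - 1] if d else "")

def digits3(n):          # 1..999: digit hundred digits2 OR digit hundred OR digits2
    h, rest = divmod(n, 100)
    words = [ONES[h - 1], "hundred"] if h else []
    if rest:
        words.append(digits2(rest))
    return " ".join(words)

def digits6(n):          # 1..999,999: digits3 thousand digits3 OR ...
    t, rest = divmod(n, 1000)
    words = [digits3(t), "thousand"] if t else []
    if rest:
        words.append(digits3(rest))
    return " ".join(words)

print(digits6(639_014))  # six hundred thirty nine thousand fourteen
```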
78
Example 3
79
Recognition Using Grammars
Classify a test sentence x according to the grammar that generated it
The grammar is among c different ones: G1, G2, …, Gc
Parsing
– Find a derivation in G that leads to x
80
Bottom-Up Parsing Algorithm
initialize G = {A, I, S, P} in CNF, x = x1 x2 … xn
i ← 0
do i ← i + 1
  V_i1 ← { A | A → x_i }
until i = n
j ← 1
do j ← j + 1
  i ← 0
  do i ← i + 1
81
Bottom-Up Parsing Algorithm
      V_ij ← ∅
      k ← 0
      do k ← k + 1
        V_ij ← V_ij ∪ { A | A → BC, B ∈ V_ik and C ∈ V_{i+k, j−k} }
      until k = j − 1
    until i = n − j + 1
  until j = n
if S ∈ V_1n then print "parse of" x "successful in" G
return
end
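A compact Python rendering of this parser (the rule encoding as tuples is my choice); V[i][j] collects the variables that derive the length-j substring starting at position i:

```python
def cyk_parse(x, terminal_rules, binary_rules, root="S"):
    """terminal_rules: set of (A, a) for A -> a;
    binary_rules: set of (A, B, C) for A -> BC; grammar must be in CNF."""
    n = len(x)
    V = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i in range(n):                              # V_i1 = { A | A -> x_i }
        V[i][1] = {A for (A, a) in terminal_rules if a == x[i]}
    for j in range(2, n + 1):                       # substring length
        for i in range(n - j + 1):                  # start position
            for k in range(1, j):                   # split point
                for (A, B, C) in binary_rules:
                    if B in V[i][k] and C in V[i + k][j - k]:
                        V[i][j].add(A)
    return root in V[0][n]
```

The triple loop over length, start, and split point gives the familiar cubic behavior in the string length.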
82
Example
A = { a, b }
I = { A, B, C }
S = S
P = {
  p1: S → AB OR BC
  p2: A → BA OR a
  p3: B → CC OR b
  p4: C → AB OR a
}
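Feeding this grammar to the cyk_parse sketch above (the test string "baaba" is my choice, a classic CYK illustration):

```python
terminal_rules = {("A", "a"), ("B", "b"), ("C", "a")}
binary_rules = {("S", "A", "B"), ("S", "B", "C"),
                ("A", "B", "A"), ("B", "C", "C"), ("C", "A", "B")}
print(cyk_parse("baaba", terminal_rules, binary_rules))  # True: S derives "baaba"
```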
83
Example
84
Example
85
Example
86
Finite State Machine
[Figure: a finite state machine implementing a type 3 grammar G, with productions of the form S → the A, A → mouse B OR cow B, …]
87
Grammatical Inference
Learn a grammar from example sentences it generates
Differences from statistical methods
– Usually an infinite number of grammars are consistent with the training data
– Often cannot recover the source grammar uniquely
88
Main Techniques
Use both positive and negative instances, D+ and D−, respectively
– Generalization for multicategory cases
Impose conditions and constraints
– The alphabet contains only those symbols appearing in training sentences
– Every production rule is used
– Seek the "simplest" grammar that explains the training sentences
89
Grammatical Inference (Overview)
initialize D+, D−, n ← |D+|, S
A ← set of characters in D+
initialize G (as simple as possible)
i ← 0
do i ← i + 1
  read x_i+ from D+
  if x_i+ cannot be parsed by G
90
Grammatical Inference (Overview)
    then do propose additional productions to P and variables to I
      accept updates if G parses x_i+ but no sentences in D−
until i = n
eliminate redundant productions
return G = {A, I, S, P}
end
91
Example 4: Grammatical Inference
D+ = { a, aaa, aaab, aab }
D− = { ab, abc, abb, aabb }
A = { a, b }
I = { A }
S = S
P = { S → A }
G0 = { A, I, S, P }
92
Example 4: Grammatical Inference
i | x_i+  | P                          | P produces a sentence in D−?
1 | a     | S → A, A → a               | No
2 | aaa   | S → A, A → a, A → aA       | No
3 | aaab  | S → A, A → a, A → aA, A → ab | Yes: ab is in D−
93
Example 4: Grammatical Inference
i | x_i+  | P                            | P produces a sentence in D−?
3 | aaab  | S → A, A → a, A → aA, A → aab | No
4 | aab   | S → A, A → a, A → aA, A → aab | No
94
Rule-Based Methods
For classes characterized by general relationships among entities, rather than by instances per se
Integral to expert systems
Modest use in pattern recognition
Focus on IF-THEN rules for representing and learning such relationships
95
Example: Arch
96
IF-THEN Rules
Example
– IF Swims(x) AND HasScales(x) THEN Fish(x)
Predicates
– e.g., Man(.), HasTeeth(.), AreMarried(.,.), Touch(.,.), Supports(.,.,.)
Variables
– e.g., IF Eats(x, y) AND HasFlesh(x) THEN Carnivore(y)
97
Propositional Logic and First-Order Logic
Propositional logic
– e.g., IF Male(Bill) AND IsMarried(Bill) THEN IsHusband(Bill)
First-order logic examples
– IF Male(x) AND IsMarried(x, y) THEN IsHusband(x)
– IF Parent(x, y) AND Parent(y, z) THEN GrandParent(x, z)
– IF Spouse(x, y) THEN Spouse(y, x)
98
Functions and Terms
Example
– IF Male(x) AND (Age(x) < 16) THEN Boy(x)
99
Applications in Pattern Classification
Example
– IF IsBlock(x) AND IsBlock(y) AND IsBlock(z) AND Touch(x, y) AND Touch(x, z) AND NotTouch(y, z) AND Supports(x, y, z) THEN Arch(x, y, z)
Algorithms to implement the predicates or functions may be difficult
100
Learning Rules
Train a decision tree and then simplify the tree to extract rules
Infer rules via grammatical inference
Sequential covering learning
– Given sets of positive and negative examples
– Deletes examples that the rules explain, and iterates
– Leads to a disjunctive set of rules that cover the training data
– Simplifies the results by standard logical methods
101
Learning Rules