Advancements in Genetic Programming for Data Classification Dr. Hajira Jabeen Iqra University Islamabad, Pakistan

Page 1

Advancements in Genetic Programming for Data Classification

Dr. Hajira Jabeen

Iqra University

Islamabad, Pakistan

Page 2

Introduction

Page 3

MADALGO Presentation

Genetic Programming
1. Solution representation
2. Random solution initialization
3. Fitness estimation
4. Reproduction operators
5. Termination condition

[Figure: arithmetic expression trees over attributes A1 to A6 illustrating the Full Method and the Grow Method of tree initialization, and the Crossover and Mutation operators.]
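The Full and Grow initialization methods named in the figure can be sketched as follows. This is a minimal Python illustration, assuming trees are nested tuples `(op, left, right)` whose leaves are attribute names such as `"A3"` or numeric ephemeral constants; the helper names, the 80/20 attribute/constant split and the early-stop probability `p_terminal` are hypothetical choices, not values from the slides. Ramped half-and-half, the initialization scheme named later in the talk, alternates the two methods over a range of depths.

```python
import random

FUNCTIONS = ["+", "-", "*", "/"]  # function set used for ACEs


def random_terminal(n_attributes, rng):
    """Return an attribute name such as 'A3' or an ephemeral constant."""
    if rng.random() < 0.8:  # hypothetical 80/20 attribute/constant split
        return f"A{rng.randint(1, n_attributes)}"
    return round(rng.uniform(-10.0, 10.0), 2)


def full_tree(depth, n_attributes, rng):
    """Full method: every branch is expanded down to exactly `depth`."""
    if depth == 0:
        return random_terminal(n_attributes, rng)
    op = rng.choice(FUNCTIONS)
    return (op, full_tree(depth - 1, n_attributes, rng),
            full_tree(depth - 1, n_attributes, rng))


def grow_tree(depth, n_attributes, rng, p_terminal=0.3):
    """Grow method: a branch may stop with a terminal before `depth`."""
    if depth == 0 or rng.random() < p_terminal:
        return random_terminal(n_attributes, rng)
    op = rng.choice(FUNCTIONS)
    return (op, grow_tree(depth - 1, n_attributes, rng, p_terminal),
            grow_tree(depth - 1, n_attributes, rng, p_terminal))


def tree_depth(tree):
    """Depth of a tree, counting a single terminal as depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2]))


def leaf_depths(tree, depth=0):
    """Depths of all leaves; in a full tree they are all equal."""
    if not isinstance(tree, tuple):
        return [depth]
    return leaf_depths(tree[1], depth + 1) + leaf_depths(tree[2], depth + 1)


def ramped_half_and_half(pop_size, max_depth, n_attributes, seed=0):
    """Alternate Full and Grow while ramping the depth from 1 to max_depth."""
    rng = random.Random(seed)
    population = []
    for i in range(pop_size):
        depth = 1 + (i // 2) % max_depth          # depths 1, 1, 2, 2, 3, 3, ...
        builder = full_tree if i % 2 == 0 else grow_tree
        population.append(builder(depth, n_attributes, rng))
    return population
```

Pairing each depth with one Full and one Grow tree keeps both methods represented at every depth, which is the point of the "half and half" ramp.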

Page 4

Classification using Genetic Programming

Advantages
- Global search
- Flexible
- Data modeling
- Feature extraction
- Data distribution
- Transparent
- Comprehensible
- Portability

Disadvantages
- Long training time
- Bloat (larger classifier size)
- Lack of convergence
- Mixed-type data
- Multiclass classification

Page 5

DepthLimited Crossover

Page 6

Classification using GP: Solution Representation
- Arithmetic classifier expression (ACE)
- Function set = {+, -, /, *}
- Terminal set = {attributes of data, ephemeral constant}

Solution Initialization
- Ramped half-and-half method

Maximum Depth
- depth = ceil(log2(Ad)), where Ad is the total number of attributes present in the data

[Figure: example ACE tree, (A3 - A1) / (A2 * 3.4)]
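An ACE is evaluated bottom-up on one data instance and its numeric output is turned into a class label. The sketch below is illustrative only: it assumes protected division (returning 1.0 for a near-zero divisor) and assigns class 1 when the output is non-negative; the slides specify neither convention, and `evaluate`, `classify` and `protected_div` are hypothetical names. The tree encodes the slide's example, (A3 - A1) / (A2 * 3.4).

```python
def protected_div(a, b):
    """Division that returns 1.0 for a (near-)zero divisor.

    Assumption: protected division is a common GP convention; the slides
    do not say how division by zero is handled.
    """
    return a / b if abs(b) > 1e-9 else 1.0


OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
}


def evaluate(tree, instance):
    """Evaluate an ACE (nested tuples) on one instance: {'A1': value, ...}."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, instance), evaluate(right, instance))
    if isinstance(tree, str):
        return float(instance[tree])  # attribute terminal, e.g. 'A3'
    return float(tree)                # ephemeral constant


def classify(tree, instance):
    """Assumed decision rule: class 1 if the ACE output is >= 0, else class 0."""
    return 1 if evaluate(tree, instance) >= 0 else 0


# The slide's example tree: (A3 - A1) / (A2 * 3.4)
ace = ("/", ("-", "A3", "A1"), ("*", "A2", 3.4))

if __name__ == "__main__":
    sample = {"A1": 1.0, "A2": 2.0, "A3": 7.8}
    print(evaluate(ace, sample), classify(ace, sample))
```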

Page 7

Classification using GP

Fitness
- Classification accuracy

Reproduction operators
- Mutation: point mutation
- Reproduction: copy operator
- Crossover: DepthLimited crossover
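Fitness as classification accuracy and point mutation can be sketched as below, using nested-tuple trees. Assumptions, not taken from the slides: class 1 when the ACE output is non-negative, protected division, and a point mutation that swaps a function for a different function of the same arity or a terminal for a random attribute, so the tree shape is preserved. All function names are hypothetical. Because tuples are immutable, the copy (reproduction) operator amounts to passing a tree unchanged to the next generation.

```python
import random

FUNCTIONS = ["+", "-", "*", "/"]
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # assumed protected division
}


def evaluate(tree, x):
    if isinstance(tree, tuple):
        return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))
    return float(x[tree]) if isinstance(tree, str) else float(tree)


def accuracy(tree, X, y):
    """Fitness: fraction of instances whose predicted class matches the label.

    Assumed decision rule: class 1 when the ACE output is >= 0.
    """
    correct = sum(1 for xi, yi in zip(X, y)
                  if (1 if evaluate(tree, xi) >= 0 else 0) == yi)
    return correct / len(y)


def node_paths(tree, path=()):
    """Paths (tuples of child indices) to every node, root first."""
    paths = [path]
    if isinstance(tree, tuple):
        paths += node_paths(tree[1], path + (1,))
        paths += node_paths(tree[2], path + (2,))
    return paths


def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_node(tree, path, new):
    """Return a copy of `tree` with the node at `path` replaced by `new`."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_node(tree[path[0]], path[1:], new)
    return tuple(node)


def point_mutation(tree, n_attributes, rng):
    """Change one randomly chosen node, keeping the tree's shape."""
    path = rng.choice(node_paths(tree))
    node = get_node(tree, path)
    if isinstance(node, tuple):  # function node: use a different operator
        new_op = rng.choice([f for f in FUNCTIONS if f != node[0]])
        return replace_node(tree, path, (new_op, node[1], node[2]))
    # terminal node: replace with a random attribute (hypothetical choice)
    return replace_node(tree, path, f"A{rng.randint(1, n_attributes)}")
```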

Page 8

DepthLimited Crossover

[Figure: examples of a valid crossover and an invalid crossover between arithmetic expression trees under DepthLimited crossover.]

Jabeen, H. and Baig, A. R., “DepthLimited Crossover in Genetic Programming for Classifier Evolution.” Computers in Human Behavior, Vol. 27(1), pp. 1475-1481, Elsevier, 2010.
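One plausible reading of the valid/invalid distinction is that a crossover is rejected whenever an offspring would exceed the depth limit. The sketch below is an assumption, not necessarily the exact operator of the cited paper: it picks a random crossover point in the first parent, then chooses the second point only among subtrees that keep both children within `max_depth`. A node's depth is the length of its path from the root; function names are hypothetical.

```python
import random


def tree_depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2]))


def node_paths(tree, path=()):
    paths = [path]
    if isinstance(tree, tuple):
        paths += node_paths(tree[1], path + (1,))
        paths += node_paths(tree[2], path + (2,))
    return paths


def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_node(tree, path, new):
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_node(tree[path[0]], path[1:], new)
    return tuple(node)


def depth_limited_crossover(parent1, parent2, max_depth, rng):
    """Subtree crossover that only allows swaps keeping both children within max_depth.

    Assumes both parents already satisfy the limit. If no compatible point
    exists in parent2, the parents are returned unchanged (invalid swap).
    """
    path1 = rng.choice(node_paths(parent1))
    sub1 = get_node(parent1, path1)
    valid = [
        path2 for path2 in node_paths(parent2)
        # depth of the insertion point + depth of the inserted subtree must fit
        if len(path1) + tree_depth(get_node(parent2, path2)) <= max_depth
        and len(path2) + tree_depth(sub1) <= max_depth
    ]
    if not valid:
        return parent1, parent2
    path2 = rng.choice(valid)
    sub2 = get_node(parent2, path2)
    return replace_node(parent1, path1, sub2), replace_node(parent2, path2, sub1)
```

Filtering the second crossover point in advance, instead of generating a child and discarding it, means no evaluation is wasted on offspring that would violate the limit.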

Page 9

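DepthLimited Crossover: Initial Depth Limits
- Limit the search space
- Liberal search limits
- Depth must allow all the attributes to be included
- The value of depth d = ceil(log2(Ad))

A full tree of depth d can contain all attributes of the data: a full binary tree of depth d has 2^d leaves, and d = ceil(log2(Ad)) guarantees 2^d >= Ad.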
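The depth rule can be checked numerically: with the root at depth 0, a full binary tree of depth d has 2^d leaves, so d = ceil(log2(Ad)) is the smallest depth whose leaves can hold all Ad attributes. The function names below are hypothetical; `(Ad - 1).bit_length()` is an integer-only equivalent that avoids floating-point rounding.

```python
import math


def initial_depth_limit(n_attributes):
    """d = ceil(log2(Ad)), as stated on the slide."""
    return math.ceil(math.log2(n_attributes))


def full_tree_leaves(depth):
    """A full binary tree with the root at depth 0 has 2**depth leaves."""
    return 2 ** depth


if __name__ == "__main__":
    for ad in (4, 8, 9, 13, 60):
        d = initial_depth_limit(ad)
        # 2**d >= ad always holds, and (ad - 1).bit_length() gives the same d
        print(ad, d, full_tree_leaves(d), (ad - 1).bit_length())
```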

Page 10

Complexity

[Figure: two panels plotting average size (number of nodes) in the population against generations (1 to about 118) for the benchmark datasets in the legend (RIP, RHA, BER, TRANS, BUPA, PIMA, WBC, HRT, PARK, SONAR, MUSK and others); one panel uses a linear axis up to 6000 nodes, the other a logarithmic axis from 1 to 1000 nodes. Captions: "Increase in average number of nodes in population using DepthLimited GP" and "Increase in average number of nodes in population using GP with no size limits".]

Page 11

Optimization of GP Evolved Classifier

Page 12

Optimization of Expressions: Motivation
- Long training time
- Lack of convergence

Reasons
- The classifier is evaluated on every training instance
- Different evolutionary runs do not necessarily converge to the same solution

Solution
- Optimization

Page 13

GP+PSO=GPSO

1. Start
2. GP for ACE evolution
3. Best GP ACE
4. Add weights to ACE
5. PSO to optimize weights
6. Best particle + ACE = GPSO
7. End

Jabeen, H. and Baig, A. R., “GPSO: Optimization of Genetic Programming Classifier Expressions for Binary Classification using Particle Swarm Optimization.” International Journal of Innovative Computing, Information and Control, Vol. 8(1A), pp. 223-242, 2011.

Page 14

GP+PSO=GPSO

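Solution representation = {W1, W2, W3}
- Solution initialization
- Fitness estimation
- Position update
- Termination condition

[Figure: the GP ACE (A1 / A3) + 3 and the corresponding PSO ACE (W1 * A1) / (W2 * A3) + W3 * 3, in which every terminal is multiplied by a weight.]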
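The GPSO weighting step can be sketched with a basic global-best PSO. Assumptions for illustration only: one weight per terminal in left-to-right order (as in W1, W2, W3 above), a standard inertia-weight velocity update with hypothetical coefficients (0.7, 1.5, 1.5), class 1 when the output is non-negative, protected division, and training accuracy as the particle fitness. Seeding one particle with all-ones weights (the unweighted GP ACE) is a design choice of this sketch; it guarantees the returned weights are never worse than the original expression on the training data. All names and the toy data are hypothetical.

```python
import random

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # assumed protected division
}


def count_terminals(tree):
    if isinstance(tree, tuple):
        return count_terminals(tree[1]) + count_terminals(tree[2])
    return 1


def eval_weighted(tree, x, w, idx=None):
    """Evaluate the ACE with the k-th terminal (left to right) multiplied by w[k]."""
    if idx is None:
        idx = [0]
    if isinstance(tree, tuple):
        left = eval_weighted(tree[1], x, w, idx)
        right = eval_weighted(tree[2], x, w, idx)
        return OPS[tree[0]](left, right)
    value = float(x[tree]) if isinstance(tree, str) else float(tree)
    weight = w[idx[0]]
    idx[0] += 1
    return weight * value


def accuracy(tree, X, y, w):
    """Assumed fitness: training accuracy with class 1 when the output is >= 0."""
    hits = sum((eval_weighted(tree, xi, w) >= 0) == bool(yi) for xi, yi in zip(X, y))
    return hits / len(y)


def pso_weights(tree, X, y, n_particles=20, iters=50, seed=0,
                inertia=0.7, c1=1.5, c2=1.5):
    """Global-best PSO over one weight per terminal of the ACE."""
    rng = random.Random(seed)
    dim = count_terminals(tree)
    # Particle 0 is the unweighted GP ACE (all weights 1.0).
    pos = [[1.0] * dim] + [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
                           for _ in range(n_particles - 1)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [accuracy(tree, X, y, p) for p in pos]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = accuracy(tree, X, y, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


if __name__ == "__main__":
    # Hypothetical toy data with the slide's GP ACE (A1 / A3) + 3.
    ace = ("+", ("/", "A1", "A3"), 3)
    rng = random.Random(1)
    X = [{"A1": rng.uniform(-5, 5), "A3": rng.uniform(1, 5)} for _ in range(100)]
    y = [1 if x["A1"] / x["A3"] >= 1 else 0 for x in X]
    weights, fit = pso_weights(ace, X, y)
    print("GP accuracy:", accuracy(ace, X, y, [1.0, 1.0, 1.0]))
    print("GPSO accuracy:", fit, "weights:", weights)
```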

Page 15

Wisconsin Breast Cancer Dataset

[Figure: accuracy (93% to 100%) versus generations (10 to 120) for GP and GPSO on the Wisconsin Breast Cancer dataset.]

Page 16

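Classifiers for the Pima Indians Dataset

GP (prefix notation): * + / A3 A1 * A2 A5 + - A3 A2 - A3 8
Infix: ((A3 / A1) + (A2 * A5)) * ((A3 - A2) + (A3 - 8))
Accuracy: 69.9%

GPSO (prefix notation): * + / * A3 0.50 * A1 0.58 * * A2 0.75 * A5 0.56 + - * A3 0.29 * A2 0.49 - * A3 0.62 * 8 -0.79
Infix: ((0.50 * A3) / (0.58 * A1) + (0.75 * A2) * (0.56 * A5)) * ((0.29 * A3 - 0.49 * A2) + (0.62 * A3 - (-0.79) * 8))
Accuracy: 71.8%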
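The Pima Indians classifiers are written in prefix (Polish) notation. A small recursive parser, sketched below under the assumptions that every function is binary and that any token starting with "A" is an attribute, turns them into trees that can be evaluated or printed in infix form. Protected division and all function names are hypothetical; the sketch does not reproduce the reported accuracies.

```python
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # assumed protected division
}


def parse_prefix(tokens):
    """Parse prefix tokens into a nested tuple tree; returns (tree, remaining tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head in OPS:                      # every function is binary
        left, rest = parse_prefix(rest)
        right, rest = parse_prefix(rest)
        return (head, left, right), rest
    if head.startswith("A"):             # attribute terminal such as 'A3'
        return head, rest
    return float(head), rest             # numeric constant or weight, e.g. '-0.79'


def to_infix(tree):
    if isinstance(tree, tuple):
        op, left, right = tree
        return f"({to_infix(left)} {op} {to_infix(right)})"
    return str(tree)


def evaluate(tree, x):
    if isinstance(tree, tuple):
        return OPS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))
    return float(x[tree]) if isinstance(tree, str) else tree


GP = "* + / A3 A1 * A2 A5 + - A3 A2 - A3 8"
GPSO = ("* + / * A3 0.50 * A1 0.58 * * A2 0.75 * A5 0.56 "
        "+ - * A3 0.29 * A2 0.49 - * A3 0.62 * 8 -0.79")

if __name__ == "__main__":
    for name, text in (("GP", GP), ("GPSO", GPSO)):
        tree, rest = parse_prefix(text.split())
        assert not rest, "tokens left over"
        print(name, to_infix(tree))
```

Parsing both strings with no tokens left over confirms that the GPSO classifier has exactly the GP tree shape with a weight attached to every terminal.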

Page 17

Classification of Mixed-Type Attribute Data

Page 18

Mixed Attribute Data: Motivation
- Use a combination of arithmetic expressions and logical rules

Reasons
- Classifier expressions are applicable to numerical data
- Rules are applicable to categorical data

Solutions
- Convert categorical values to enumerated values
- Convert numerical values to categorical values
- Use constrained-syntax GP

Page 19

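Two Layered Classifier: Solution Representation

[Figure: an outer layer tree AND(OR(T1, T2), NOT(T3)) whose leaves T1, T2, T3 are inner trees. Example arithmetic inner tree: (A1 / A2) + (A2 * A3). Example logical inner tree: OR(AND(C1='a', C2='b'), OR(C3='c', C1='d')). Figure labels: Outer Layer Tree, Arithmetic Inner Tree Probability, Logical Inner Tree Probability.]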

Jabeen, H. and Baig, A. R., “Two Layered Genetic Programming for Mixed Variable Data Classification.” Applied Soft Computing, Vol. 12(1), pp. 416-422.
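Evaluating a two-layered classifier means evaluating each inner tree on the instance, turning its result into a Boolean, and combining those Booleans through the outer tree. In the sketch below the outer tree AND(OR(T1, T2), NOT(T3)) and the two inner-tree examples are from the slide; assigning the arithmetic example to T1, the logical example to T2, and a hypothetical arithmetic tree A1 - A3 to T3 is purely illustrative, as is the rule that an arithmetic inner tree is True when its output is non-negative. Function names are hypothetical.

```python
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # assumed protected division
}


def eval_arith(tree, x):
    """Arithmetic inner tree over numeric attributes."""
    if isinstance(tree, tuple):
        return OPS[tree[0]](eval_arith(tree[1], x), eval_arith(tree[2], x))
    return float(x[tree]) if isinstance(tree, str) else float(tree)


def eval_logic(tree, x):
    """Logical inner tree over categorical attributes, with ('==', attr, value) leaves."""
    op = tree[0]
    if op == "==":
        return x[tree[1]] == tree[2]
    if op == "NOT":
        return not eval_logic(tree[1], x)
    left, right = eval_logic(tree[1], x), eval_logic(tree[2], x)
    return (left and right) if op == "AND" else (left or right)


def eval_outer(tree, inner, x):
    """Outer tree over the Boolean outputs of inner trees named 'T1', 'T2', ..."""
    if isinstance(tree, str):
        kind, subtree = inner[tree]
        if kind == "arith":
            return eval_arith(subtree, x) >= 0   # assumed threshold at 0
        return eval_logic(subtree, x)
    if tree[0] == "NOT":
        return not eval_outer(tree[1], inner, x)
    left, right = eval_outer(tree[1], inner, x), eval_outer(tree[2], inner, x)
    return (left and right) if tree[0] == "AND" else (left or right)


outer = ("AND", ("OR", "T1", "T2"), ("NOT", "T3"))
inner = {
    # (A1 / A2) + (A2 * A3): the slide's arithmetic inner tree example
    "T1": ("arith", ("+", ("/", "A1", "A2"), ("*", "A2", "A3"))),
    # OR(AND(C1='a', C2='b'), OR(C3='c', C1='d')): the slide's logical inner tree example
    "T2": ("logic", ("OR", ("AND", ("==", "C1", "a"), ("==", "C2", "b")),
                     ("OR", ("==", "C3", "c"), ("==", "C1", "d")))),
    # hypothetical third inner tree: A1 - A3
    "T3": ("arith", ("-", "A1", "A3")),
}

if __name__ == "__main__":
    x = {"A1": 1.0, "A2": 2.0, "A3": 3.0, "C1": "a", "C2": "b", "C3": "z"}
    print(eval_outer(outer, inner, x))
```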

Page 20

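Two Layered Classifier: Crossover Operator

Parent 1: AND(OR(T1, T2), NOT(T3))
Parent 2: OR(AND(T4, T5), AND(T6, T7))
Child 1: AND(AND(T6, T7), NOT(T3))
Child 2: OR(AND(T4, T5), OR(T1, T2))

The subtree OR(T1, T2) of Parent 1 is exchanged with the subtree AND(T6, T7) of Parent 2.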
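The crossover example exchanges the subtree OR(T1, T2) of Parent 1 with the subtree AND(T6, T7) of Parent 2. The sketch assumes, consistent with that example, that crossover points are chosen in the outer layer only and that the inner trees T1 to T7 are treated as atomic leaves. Excluding the roots in `random_crossover`, the path encoding, and all function names are hypothetical choices.

```python
import random


def node_paths(tree, path=()):
    """Paths to every node of the outer tree; inner trees 'T1', ... are leaves."""
    paths = [path]
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):        # NOT has one child, AND/OR have two
            paths += node_paths(tree[i], path + (i,))
    return paths


def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_node(tree, path, new):
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_node(tree[path[0]], path[1:], new)
    return tuple(node)


def subtree_crossover(parent1, parent2, path1, path2):
    """Swap the subtree at path1 in parent1 with the subtree at path2 in parent2."""
    sub1, sub2 = get_node(parent1, path1), get_node(parent2, path2)
    return replace_node(parent1, path1, sub2), replace_node(parent2, path2, sub1)


def random_crossover(parent1, parent2, rng):
    """Pick both crossover points at random, excluding the roots (an assumption)."""
    path1 = rng.choice(node_paths(parent1)[1:])
    path2 = rng.choice(node_paths(parent2)[1:])
    return subtree_crossover(parent1, parent2, path1, path2)


parent1 = ("AND", ("OR", "T1", "T2"), ("NOT", "T3"))
parent2 = ("OR", ("AND", "T4", "T5"), ("AND", "T6", "T7"))

# Reproduce the slide: OR(T1, T2) is at path (1,) in parent1, AND(T6, T7) at (2,) in parent2.
child1, child2 = subtree_crossover(parent1, parent2, (1,), (2,))
```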

Page 21

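Mutation

[Figure: mutation examples on the outer tree AND(OR(T1, T2), NOT(T3)). The trees shown contain the arithmetic inner trees A2 + (A3 - 3.2) and A8 * (A4 / A3) in the position of T1, and the logical inner trees AND(C1='a', C2='b') and OR(C1='d', C3='a') in the position of T3.]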

Page 22

Evolution of Best Trees

[Figure: accuracy of the best tree (50% to 100%) versus generations (10 to 250) for the AUS, HEP, HRT, CRD and GER datasets.]

Page 23

Multi-class Classification

Page 24

Multi-class Classification: Motivation
- Lack of a definitive solution for multiclass classification

Reasons
- A GP classifier expression is inherently a binary classifier

Solutions
- Binary decomposition
- Thresholds

Page 25

Multi-class Classification Approaches
A four-class classification problem, C = 4
- Number of classifiers in binary decomposition = C = 4
- Number of classifiers N in binary encoding = ceil(log2(4)) = 2
- Number of conflicts in binary decomposition = 2^C - C = 12
- Number of conflicts in binary encoding = 2^N - C = 0

[Figure: schematic comparison of Binary Encoding and Binary Decomposition for classes 1 to 4.]
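The classifier and conflict counts follow from the number of possible output patterns. With C one-vs-rest classifiers there are 2^C output combinations, of which only the C patterns with exactly one positive output identify a class; with N = ceil(log2(C)) encoding classifiers there are 2^N codewords, of which 2^N - C are unused. For C = 4 this gives the 4 and 2 classifiers and the 12 and 0 conflicts listed for this example. The function names are hypothetical.

```python
import math


def binary_decomposition(num_classes):
    """One classifier per class; any output pattern that is not one-hot is a conflict."""
    classifiers = num_classes
    conflicts = 2 ** classifiers - num_classes
    return classifiers, conflicts


def binary_encoding(num_classes):
    """ceil(log2(C)) classifiers; codewords not assigned to any class are conflicts."""
    classifiers = math.ceil(math.log2(num_classes))
    conflicts = 2 ** classifiers - num_classes
    return classifiers, conflicts


if __name__ == "__main__":
    for c in (3, 4, 10):
        print(c, binary_decomposition(c), binary_encoding(c))
```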

Page 26

Binary Encoding for Classifier

Classification matrix for a 4-class problem

Binary Encoding
Class    Classifier 1  Classifier 2
Class 1  0             0
Class 2  0             1
Class 3  1             0
Class 4  1             1

Binary Decomposition
Class    Classifier 1  Classifier 2  Classifier 3  Classifier 4
Class 1  1             0             0             0
Class 2  0             1             0             0
Class 3  0             0             1             0
Class 4  0             0             0             1
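Decoding a prediction with these matrices is a table lookup. This sketch assumes class k is assigned the binary codeword of k - 1, exactly as in the Binary Encoding table, and reports an output pattern with no assigned class as a conflict (`None`); for binary decomposition, anything other than exactly one positive output is a conflict. Function names are hypothetical.

```python
import math


def encode_class(k, n_bits):
    """Codeword of class k (1-based): Class 1 -> (0, 0), Class 2 -> (0, 1), ..."""
    return tuple(((k - 1) >> (n_bits - 1 - b)) & 1 for b in range(n_bits))


def decode_binary(outputs, num_classes):
    """Binary encoding: read the outputs as a binary number; unused codes are conflicts."""
    code = 0
    for bit in outputs:
        code = 2 * code + bit
    return code + 1 if code < num_classes else None


def decode_one_vs_rest(outputs):
    """Binary decomposition: exactly one classifier must fire, otherwise a conflict."""
    fired = [i + 1 for i, bit in enumerate(outputs) if bit == 1]
    return fired[0] if len(fired) == 1 else None


if __name__ == "__main__":
    n_bits = math.ceil(math.log2(4))
    for k in range(1, 5):
        print(k, encode_class(k, n_bits), decode_binary(encode_class(k, n_bits), 4))
```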

Page 27

Comparison

Dataset  ACC (BDGP / ENGP)  SD (BDGP / ENGP)  Conflicts (BDGP / ENGP)  Classifiers (BDGP / ENGP)
IRIS     96.00% / 97.20%    0.2 / 0.3         5 / 1                    3 / 2
WINE     78.50% / 90.10%    0.2 / 0.2         5 / 1                    3 / 2
VEHICLE  49.00% / 83.00%    0.3 / 0.2         12 / 0                   4 / 2
GLASS    54.70% / 76.70%    0.4 / 0.2         122 / 2                  6 / 3
YEAST    57.00% / 73.80%    0.6 / 0.1         1014 / 24                10 / 4

BDGP = Binary Decomposition
ENGP = Binary Encoding

Page 28

Complexity and Convergence

[Figure: two panels, "Number of nodes and fitness (Average, Best) Iris" and "Number of nodes and fitness (Average, Best) Wine", plotting the number of nodes and the average and best accuracy (axis 0 to 120) against generations (up to about 120 for Iris and 115 for Wine).]

Page 29

Summary
- Computationally efficient
- Fewer conflicts
- Better accuracy

Page 30

Conclusion: GP-based classification
- DepthLimited crossover
- Optimization of classifiers
- Mixed-type data classification
- Efficient multi-class classification