1/15 Strengthening I-ReGEC classifier G. Attratto, D. Feminiano, and M.R. Guarracino High...

Post on 13-Jan-2016


1/15

Strengthening I-ReGEC classifier

G. Attratto, D. Feminiano, and M.R. Guarracino

High Performance Computing and Networking Institute

Italian National Research Council

2/15

Supervised learning

• Supervised learning refers to the capability of a system to learn from a set of input/output pairs: the training set.

3/15

Classification

• Consists of determining a model that groups elements according to given features

• The groups are the classes

4/15

Evaluation of classification methods

• Accuracy: an indicator of the model's predictive ability

• Speed: some methods take less time than others

• Robustness: the learned rules and the accuracy do not change considerably across different datasets

• Scalability: the ability to classify very large datasets

5/15

Goals

• Make the choice of examples during training more efficient

• Delete examples that are redundant or contribute too little information

• Strengthen the training set by deleting obsolete knowledge

Building an efficient, scalable and generalizable model

6/15

Classification techniques

• Decision tree: based on a tree structure

• Bayesian networks: compute posterior probabilities with Bayes' theorem

• Neural networks: simulate the behavior of biological systems

• Support Vector Machine (SVM): calculates hyperplanes

(Optimal Tree)

(Slow in training)

(Slow in training)

7/15

SVM: The state of the art

• Find a set of examples (support vectors) representative of the classes

[Figure: linear case and nonlinear case, labelled with the support vectors, the optimal hyperplane and the separation margin]

8/15

ReGEC

• Two hyperplanes, each representative of one class

(GEPSVM family)

$x'w_1 - \gamma_1 = 0$, $x'w_2 - \gamma_2 = 0$

Based on generalized eigenvalues
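As a hedged sketch of what "based on generalized eigenvalues" means in the GEPSVM family, the standard formulation can be written as follows. The notation is assumed here and is not taken from the slides: the rows of $A$ and $B$ are the points of the two classes, $e$ is a vector of ones, and a plane is $x'w - \gamma = 0$. The first plane is the one closest to the points of $A$ and farthest from the points of $B$:

```latex
\min_{w,\gamma \neq 0} \frac{\|Aw - e\gamma\|^2}{\|Bw - e\gamma\|^2}
  = \min_{z \neq 0} \frac{z'Gz}{z'Hz},
\qquad
G = [A\ \ {-e}]'\,[A\ \ {-e}], \quad
H = [B\ \ {-e}]'\,[B\ \ {-e}], \quad
z = \begin{bmatrix} w \\ \gamma \end{bmatrix}
```

The stationary points of this quotient are the eigenvectors of the generalized eigenvalue problem $Gz = \lambda Hz$. The eigenvector of the smallest eigenvalue gives the plane closest to $A$; swapping the roles of $A$ and $B$, which corresponds to the largest eigenvalue, gives the plane closest to $B$. The regularization terms used in practice are omitted from this sketch.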

9/15

I-ReGEC

• Select k points for each class with a clustering technique (K-means), so that |S| = 2 × k

• Classify the test set with the points in S

• Add misclassified points to the set S incrementally

• Proceed until no misclassified points remain
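The incremental loop above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: a 1-nearest-neighbour rule stands in for the ReGEC classifier, the K-means step uses a deterministic initialization and keeps the real point closest to each centroid, and the points being classified are the labelled points not yet in S. The function names `kmeans_representatives`, `predict_1nn` and `incremental_selection` are hypothetical.

```python
from math import dist


def kmeans_representatives(points, k, iters=20):
    """Tiny Lloyd's k-means; returns indices of the points closest to the centroids."""
    centroids = [list(p) for p in points[:k]]  # deterministic init (assumption)
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            clusters[j].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Keep one real point per centroid; the set removes duplicates.
    return sorted({min(range(len(points)), key=lambda i: dist(points[i], c))
                   for c in centroids})


def predict_1nn(S_X, S_y, x):
    """Stand-in classifier (NOT ReGEC): label of the nearest selected point."""
    j = min(range(len(S_X)), key=lambda i: dist(S_X[i], x))
    return S_y[j]


def incremental_selection(X, y, k):
    """Return the indices of the selected subset S."""
    selected = []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        reps = kmeans_representatives([X[i] for i in idx], k)
        selected.extend(idx[r] for r in reps)
    while True:
        S_X = [X[i] for i in selected]
        S_y = [y[i] for i in selected]
        chosen = set(selected)
        wrong = [i for i in range(len(X))
                 if i not in chosen and predict_1nn(S_X, S_y, X[i]) != y[i]]
        if not wrong:
            return selected  # no misclassified points remain
        selected.append(wrong[0])  # add one misclassified point, then reclassify


# Toy data: two clusters plus one class-0 point lying close to class 1.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (3, 3)]
y = [0, 0, 0, 1, 1, 1, 0]
print(incremental_selection(X, y, k=1))
```

With the 1-nearest-neighbour stand-in the loop always terminates, because a point added to S is classified correctly from then on. With a different classifier an explicit iteration cap would be a sensible safeguard.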

10/15

Strengthening

• Apply I-ReGEC to obtain the training set S

• At each iteration, delete one point from the training set

• Apply I-ReGEC at each iteration with the new input set S

• Strengthen the set (save the new S) if the accuracy improves
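The deletion loop above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the step "apply I-ReGEC with the new set" is abstracted into a hypothetical `evaluate` callback that returns an accuracy for a candidate set, and the demo evaluator uses a 1-nearest-neighbour rule on held-out points as a stand-in. The sketch makes a single pass over the points and accepts a deletion only on strict improvement.

```python
from math import dist


def strengthen(S, evaluate):
    """Try deleting each point of S once; keep a deletion only if accuracy improves."""
    best = list(S)
    best_acc = evaluate(best)
    i = 0
    while i < len(best) and len(best) > 1:
        candidate = best[:i] + best[i + 1:]  # S without its i-th point
        acc = evaluate(candidate)
        if acc > best_acc:
            best, best_acc = candidate, acc  # save the new S
        else:
            i += 1  # keep the point and move on
    return best, best_acc


def make_1nn_evaluator(X_val, y_val):
    """Stand-in for a classifier run: 1-NN accuracy on held-out points."""
    def evaluate(S):
        hits = 0
        for x, label in zip(X_val, y_val):
            nearest = min(S, key=lambda s: dist(s[0], x))
            hits += nearest[1] == label
        return hits / len(X_val)
    return evaluate


# Each element of S is a (point, label) pair; the third point is a misleading one.
S = [((0, 0), 0), ((5, 5), 1), ((4.5, 4.5), 0)]
evaluate = make_1nn_evaluator(
    [(0, 1), (1, 0), (5, 6), (6, 5), (4.6, 4.6)],
    [0, 0, 1, 1, 1],
)
print(strengthen(S, evaluate))
```

In this toy run the point at (4.5, 4.5) is the only one whose deletion raises the accuracy, so it is the only one removed. Passing the evaluation step as a callback keeps the loop independent of the classifier that is actually plugged in.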

11/15

Microarray and matrix

[Figure: gene expression matrix, labelled EXAMPLES, FEATURES and CLASSES]

12/15

Results

Dataset                          I-ReGEC                 Strengthening
                                 Acc.     N° of points   Acc.     N° of points
Alon (62x2000), colon cancer     73.00%   7.78           74.60%   7.78
Golub (72x7129), leukaemia       87.12%   9.44           89.88%   9.44
Nutt (50x12625), glioma          65.20%   7.47           65.20%   7.47
BRCA1 (22x3226), breast cancer   67.50%   4.24           67.50%   4.24
BRCA2 (22x3226), breast cancer   78.50%   5.53           79.50%   5.96

13/15

Results and Diagrams

[Figures: Golub dataset in 2D and in 3D, I-ReGEC compared with Strengthening]

14/15

Conclusions

• The choice of examples became more efficient

• Redundant or obsolete examples have been deleted

• The training set has been "strengthened"

15/15

Future work

• To reduce the execution time, the Strengthening technique should be integrated into I-ReGEC.