1/15 Strengthening I-ReGEC classifier G. Attratto, D. Feminiano, and M.R. Guarracino High...

1/15

Strengthening I-ReGEC classifier

G. Attratto, D. Feminiano, and M.R. Guarracino

High Performance Computing and Networking Institute

Italian National Research Council

2/15

Supervised learning

• Supervised learning refers to the capability of a system to learn from a set of input/output pairs: the training set.

3/15

Classification

• Consists of determining a model that groups elements according to given features

• The groups are the classes

4/15

Evaluation of classification methods

• Accuracy: indicates the predictive ability of the model

• Speed: some methods take less time than others

• Robustness: the learned rules and the accuracy do not change considerably across different data sets

• Scalability: the ability to classify large data sets
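The accuracy and speed criteria can be made concrete with two small helpers. This is a minimal sketch, assuming accuracy is the fraction of correctly classified examples and speed is wall-clock time measured with `time.perf_counter`; the names `accuracy` and `timed` are illustrative, not from the slides.

```python
import time
from typing import Any, Callable, Sequence, Tuple


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of examples whose predicted class equals the true class."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def timed(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Call fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


print(accuracy([1, 0, 1, 1], [1, 1, 1, 0]))  # 2 of 4 correct -> 0.5
```

Wrapping a classifier's training call in `timed` gives a rough speed comparison between methods on the same data.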

5/15

Goals

• Make the choice of examples during training more efficient

• Delete redundant examples, or examples with an insufficient informative contribution

• Strengthen the training set by deleting obsolete knowledge

Building an efficient, scalable, and generalizable model

6/15

Classification techniques

• Decision trees: tree-based models (optimal tree)

• Bayesian networks: compute posterior probabilities with Bayes' theorem (slow in training)

• Neural networks: simulate the behavior of biological systems (slow in training)

• Support Vector Machines (SVM): compute separating hyperplanes

7/15

SVM: The state of the art

• Find a set of examples (the support vectors) that are representative of the classes

[Figure: linear and nonlinear cases, showing the support vectors, the optimal hyperplane, and the separation margin]
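For reference, the textbook hard-margin linear SVM (a standard formulation, not taken from these slides) makes the figure's terms precise. Given training points $x_i$ with labels $y_i \in \{-1, +1\}$ and a separating hyperplane $x^T w - \gamma = 0$:

$$\min_{w,\gamma}\ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(x_i^T w - \gamma) \ge 1, \qquad i = 1,\dots,m$$

The separation margin is $2/\|w\|$, and the support vectors are the points for which the constraint holds with equality. In the nonlinear case, the inner products $x_i^T x_j$ appearing in the dual problem are replaced by a kernel $k(x_i, x_j)$.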

8/15

ReGEC

• Two hyperplanes, each representative of one class (GEPSVM family):

$$x^T w_1 - \gamma_1 = 0, \qquad x^T w_2 - \gamma_2 = 0$$

• Based on the generalized eigenvalue problem
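The generalized-eigenvalue idea can be illustrated with a short NumPy sketch. With class points as rows of A and B and $e$ a vector of ones, a hyperplane $x^T w - \gamma = 0$ is encoded as $z = [w; \gamma]$; the ratio $\|[A\ {-e}]z\|^2 / \|[B\ {-e}]z\|^2$ is small when the plane is close to A and far from B, and its minimizer and maximizer are the eigenvectors of the smallest and largest eigenvalues of $G z = \lambda H z$. The regularization used here (adding $\delta$ times the diagonal of the other matrix) and the names `gen_eig_sym`, `regec_train`, `regec_predict`, and `delta` are assumptions of this sketch, not the authors' code.

```python
import numpy as np


def gen_eig_sym(a, b):
    """Solve a v = lam * b v for symmetric a and symmetric positive-definite b.

    Reduces to a standard symmetric problem via Cholesky (b = L L^T);
    eigenvalues are returned in ascending order, eigenvectors as columns.
    """
    l_inv = np.linalg.inv(np.linalg.cholesky(b))
    lam, y = np.linalg.eigh(l_inv @ a @ l_inv.T)   # (L^-1 a L^-T) y = lam y
    return lam, l_inv.T @ y                         # map back: v = L^-T y


def regec_train(A, B, delta=1e-3):
    """ReGEC-style training (sketch). Rows of A / B are points of class 1 / class 2.

    Returns two planes (w, gamma) with x^T w - gamma = 0, one per class.
    The diagonal regularization weighted by delta is an assumption.
    """
    GA = np.hstack([A, -np.ones((A.shape[0], 1))])   # [A  -e]
    HB = np.hstack([B, -np.ones((B.shape[0], 1))])   # [B  -e]
    G, H = GA.T @ GA, HB.T @ HB
    G_hat, H_hat = np.diag(np.diag(G)), np.diag(np.diag(H))
    _, V = gen_eig_sym(G + delta * H_hat, H + delta * G_hat)
    z1 = V[:, 0]    # smallest eigenvalue: plane close to A, far from B
    z2 = V[:, -1]   # largest eigenvalue: plane close to B, far from A
    return (z1[:-1], z1[-1]), (z2[:-1], z2[-1])


def regec_predict(planes, X):
    """Assign each row of X to the class (1 or 2) whose plane is nearer."""
    d1, d2 = (np.abs(X @ w - g) / np.linalg.norm(w) for w, g in planes)
    return np.where(d1 <= d2, 1, 2)


# Toy data: class 1 lies near y = 0, class 2 near y = 5.
A = np.array([[-3, 0.1], [-1, -0.1], [1, 0.05], [3, -0.05]])
B = np.array([[-3, 5.1], [-1, 4.9], [1, 5.05], [3, 4.95]])
planes = regec_train(A, B)
print(regec_predict(planes, np.array([[0, 0.2], [2, -0.3], [0, 4.8], [-2, 5.3]])))
```

A single generalized eigenproblem yields both planes at once, which is why only one decomposition is needed per training run.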

9/15

I-ReGEC

• Select k points for each class with a clustering technique (k-means), so that |S| = 2k

• Classify the test set with the points in S

• Add the misclassified points to S incrementally

• Repeat until no misclassified points remain
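The incremental procedure on this slide can be sketched as follows. This is a minimal sketch under several assumptions: k-means is plain Lloyd's iteration, the initial S holds the training points nearest to each class's centres, the points checked for misclassification are the training points not yet in S, and one misclassified point is added per iteration. The ReGEC train and predict steps are passed in as the hypothetical callables `fit` and `predict`; all function names here are illustrative.

```python
import numpy as np


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres


def initial_subset(X, y, k):
    """Indices of the training points nearest to the k-means centres of each class."""
    S = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        for centre in kmeans(X[members], k):
            nearest = int(members[np.argmin(((X[members] - centre) ** 2).sum(1))])
            if nearest not in S:      # |S| = 2k for two classes, unless centres share a point
                S.append(nearest)
    return S


def incremental_training(X, y, k, fit, predict):
    """I-ReGEC-style loop (sketch): grow S with misclassified points, one at a time."""
    S = initial_subset(X, y, k)
    while True:
        model = fit(X[S], y[S])
        rest = np.setdiff1d(np.arange(len(X)), S)      # points not yet in S
        wrong = rest[predict(model, X[rest]) != y[rest]] if rest.size else rest
        if wrong.size == 0:
            return S, model
        S.append(int(wrong[0]))                        # incremental mode: one point per step
```

Each iteration adds a point not already in S, so the loop stops after at most as many iterations as there are training points.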

10/15

Strengthening

• Apply I-ReGEC to obtain the training set S

• At each iteration, delete one point from this training set

• Apply I-ReGEC at each iteration with the new input set S

• Strengthen the set (save the new S) if the accuracy improves
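The strengthening loop can be sketched as a greedy backward elimination over S. This is a minimal sketch, assuming a hypothetical callable `evaluate(subset)` that retrains the classifier from the given subset (for example, by rerunning I-ReGEC) and returns its accuracy; a deletion is kept only when the accuracy strictly improves, as the slide states. The function name `strengthen` is illustrative.

```python
from typing import Callable, List, Sequence, Tuple


def strengthen(S: Sequence[int],
               evaluate: Callable[[List[int]], float]) -> Tuple[List[int], float]:
    """Try deleting one point of S at a time; keep a deletion only if accuracy improves."""
    best = list(S)
    best_acc = evaluate(best)
    i = 0
    while i < len(best):
        candidate = best[:i] + best[i + 1:]   # S without its i-th point
        acc = evaluate(candidate)
        if acc > best_acc:                     # strengthen: save the new S
            best, best_acc = candidate, acc    # stay at i: the next point now sits there
        else:
            i += 1
    return best, best_acc


# Toy evaluator: points 2 and 5 each cost 0.1 accuracy.
bad = {2, 5}
print(strengthen([1, 2, 3, 4, 5], lambda s: 1.0 - 0.1 * len(set(s) & bad)))
```

Every accepted deletion strictly raises the accuracy and shortens S, so the loop always terminates; its cost is one retraining per tried deletion, which is what the future-work slide aims to reduce.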

11/15

Microarray and matrix

[Figure: gene expression data from a microarray arranged as a matrix of examples by features, with the associated classes]

12/15

Results

Dataset (examples x features) | Disease | I-ReGEC ACC. | I-ReGEC No. of points | Strengthening ACC. | Strengthening No. of points
Alon (62x2000) | Colon cancer | 73.00% | 7.78 | 74.60% | 7.78
Golub (72x7129) | Leukaemia | 87.12% | 9.44 | 89.88% | 9.44
Nutt (50x12625) | Glioma | 65.20% | 7.47 | 65.20% | 7.47
BRCA1 (22x3226) | Breast cancer | 67.50% | 4.24 | 67.50% | 4.24
BRCA2 (22x3226) | Breast cancer | 78.50% | 5.53 | 79.50% | 5.96

13/15

Results and Diagrams

[Figure: 2D and 3D plots of the Golub data set, comparing I-ReGEC and Strengthening]

14/15

Conclusions

• The choice of training examples became more efficient

• Redundant or obsolete examples have been deleted

• The training set is “strengthened”: accuracy improved on three of the five data sets and was unchanged on the other two

15/15

Future work

• To reduce execution time, the Strengthening technique should be integrated into I-ReGEC.