Classify Ensemble

  • 7/26/2019 Classify Ensemble

    Tan, Steinbach, Kumar, Introduction to Data Mining, 4/18/2004

    Examples of Ensemble Methods

    How to generate an ensemble of classifiers?

    - Bagging

    - Boosting

    Bagging

    Sampling with replacement

    Build a classifier on each bootstrap sample

    Each example has probability 1/N of being selected on any single draw

    For an example not being selected after N draws, the
    probability is $(1 - 1/N)^N$

    When N is large, this is close to $1/e \approx 0.368$

    For an example being selected at least once after N draws, the
    probability is $1 - 1/e \approx 0.632$

    A bootstrap sample therefore contains about 63% of the original data

    Original Data        1   2   3   4   5   6   7   8   9   10
    Bagging (Round 1)    7   8   10  8   2   5   10  10  5   9
    Bagging (Round 2)    1   4   9   1   2   3   2   7   3   2
    Bagging (Round 3)    1   8   5   10  5   5   9   6   3   7
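The 1/e limit and the roughly 63% coverage figure can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names (`prob_never_selected`, `bootstrap_sample`, `unique_fraction`) and the fixed seed are illustrative assumptions.

```python
import math
import random


def prob_never_selected(n: int) -> float:
    """Probability that one given record is missed by all n draws."""
    return (1.0 - 1.0 / n) ** n


def bootstrap_sample(records, rng):
    """Draw len(records) items uniformly at random, with replacement."""
    return [rng.choice(records) for _ in records]


def unique_fraction(n: int, rng) -> float:
    """Fraction of the n original records present in one bootstrap sample."""
    sample = bootstrap_sample(list(range(n)), rng)
    return len(set(sample)) / n


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, only for repeatability
    for n in (10, 100, 10_000):
        # Columns: n, (1 - 1/n)^n, observed share of distinct records
        print(n, prob_never_selected(n), unique_fraction(n, rng))
    print("1/e =", 1 / math.e, "  1 - 1/e =", 1 - 1 / math.e)
```

For small N the miss probability is noticeably below 1/e (for N = 10 it is 0.9^10, about 0.349), so the 63% figure is best read as a large-N approximation.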

    Boosting

    An iterative procedure to adaptively change the distribution of training data by focusing more on
    previously misclassified records

    - Initially, all N records are assigned equal weights

    - Unlike bagging, weights may change at the end of each boosting round

    Boosting

    Records that are wrongly classified will have their weights increased

    Records that are classified correctly will have their weights decreased

    Original Data        1   2   3   4   5   6   7   8   9   10
    Boosting (Round 1)   7   3   2   8   7   9   4   10  6   3
    Boosting (Round 2)   5   4   9   4   2   5   1   7   4   2
    Boosting (Round 3)   4   4   8   10  4   5   4   6   3   4
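One way to see how a larger weight makes a record appear more often in later rounds is to resample in proportion to the weights. This is a rough sketch under assumptions: the helper names and the weight of 5.0 given to record 4 are hypothetical and chosen only for illustration, not taken from the slides.

```python
import random
from collections import Counter


def weighted_resample(records, weights, rng):
    """Draw len(records) records with replacement, proportionally to weights."""
    return rng.choices(records, weights=weights, k=len(records))


def selection_counts(records, weights, rounds, seed=0):
    """Count how often each record is drawn over many resampling rounds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(rounds):
        counts.update(weighted_resample(records, weights, rng))
    return counts


if __name__ == "__main__":
    records = list(range(1, 11))
    weights = [1.0] * 10
    weights[3] = 5.0  # hypothetical: record 4 was misclassified and upweighted
    print(selection_counts(records, weights, rounds=1000).most_common(3))
```

With these illustrative weights, record 4 is expected to be drawn about five times as often as any other single record.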

    Example: AdaBoost

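    Base classifiers: $C_1, C_2, \ldots, C_T$

    Error rate:

    $$\epsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(C_i(x_j) \neq y_j\big)$$

    Importance of a classifier:

    $$\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \epsilon_i}{\epsilon_i}\right)$$

    Here $w_j$ is the weight of record $(x_j, y_j)$, and $\delta(p) = 1$ if $p$ is true and 0 otherwise.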
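The two quantities on this slide can be computed in a few lines. The sketch below makes one assumption that differs from the slide as written: it divides the misclassified weight by the total weight rather than by N. The two agree when the weights sum to N (for example, all weights equal to 1), and dividing by the total also gives a value in [0, 1] when the weights are normalized to sum to 1. The function names are illustrative.

```python
import math


def weighted_error(weights, predictions, labels):
    """Weighted share of misclassified records.

    Assumption: normalizes by sum(weights) instead of N, so the result
    stays in [0, 1] whether the weights sum to N or to 1.
    """
    wrong = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    return wrong / sum(weights)


def importance(epsilon):
    """alpha = 0.5 * ln((1 - epsilon) / epsilon); requires 0 < epsilon < 1."""
    return 0.5 * math.log((1.0 - epsilon) / epsilon)


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0, 1.0]
    predictions = [1, 1, -1, -1]
    labels = [1, 1, 1, -1]  # the third record is misclassified
    eps = weighted_error(weights, predictions, labels)
    print(eps, importance(eps))
```

The importance is positive when the error is below 0.5, zero at exactly 0.5, and grows as the error approaches 0, so more accurate base classifiers get a larger say in the final vote.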

    Example: AdaBoost

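    Weight update:

    $$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$

    where $Z_j$ is the normalization factor. Here $j$ indexes the boosting round and $i$ the training record.

    If any intermediate round produces an error rate
    higher than 50%, the weights are reverted back
    to 1/n and the resampling procedure is repeated

    Classification:

    $$C^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\big(C_j(x) = y\big)$$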
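The weight update and the final weighted vote can be sketched as follows. This is an illustrative sketch rather than a full AdaBoost implementation: base classifiers are assumed to be plain Python callables, the names `update_weights` and `ensemble_predict` are hypothetical, and the 50% reset rule is left out.

```python
import math
from collections import defaultdict


def update_weights(weights, predictions, labels, alpha):
    """Scale each weight by exp(-alpha) if correct, exp(+alpha) if wrong,
    then divide by Z so that the new weights sum to 1."""
    raw = [
        w * math.exp(-alpha if p == y else alpha)
        for w, p, y in zip(weights, predictions, labels)
    ]
    z = sum(raw)  # normalization factor Z_j
    return [r / z for r in raw]


def ensemble_predict(classifiers, alphas, x):
    """Return the label with the largest total importance among the votes."""
    votes = defaultdict(float)
    for clf, alpha in zip(classifiers, alphas):
        votes[clf(x)] += alpha
    return max(votes, key=votes.get)


if __name__ == "__main__":
    alpha = 0.5 * math.log(3)  # importance for an error rate of 0.25
    print(update_weights([0.25] * 4, [1, 1, -1, -1], [1, 1, 1, -1], alpha))
    # Hypothetical constant classifiers, only to exercise the vote
    stumps = [lambda x: 1, lambda x: -1, lambda x: -1]
    print(ensemble_predict(stumps, [1.0, 0.3, 0.4], x=None))
```

In the first call the single misclassified record ends up with half of the total weight, which shows how quickly boosting concentrates on hard records.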

    Illustrating AdaBoost

    [Figure: data points for training, with the initial weights for each data point]

    Illustrating AdaBoost