Supervised and Unsupervised Learning
Ciro Donalek
Ay/Bi 199, April 2011

Summary
• KDD and Data Mining Tasks
• Finding the optimal approach
• Supervised Models
  – Neural Networks
  – Multi Layer Perceptron
  – Decision Trees
• Unsupervised Models
  – Different Types of Clustering
  – Distances and Normalization
  – K-means
  – Self Organizing Maps
• Combining different models
  – Committee Machines
  – Introducing a Priori Knowledge
  – Sleeping Expert Framework

Knowledge Discovery in Databases
• KDD may be defined as: "The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data".
• KDD is an interactive and iterative process involving several steps.

You got your data: what's next?
What kind of analysis do you need? Which model is more appropriate for it? …

Clean your data!
• Data preprocessing transforms the raw data into a format that will be more easily and effectively processed for the purpose of the user.
• Some tasks:
  – sampling: selects a representative subset from a large population of data;
  – noise treatment;
  – strategies to handle missing data: sometimes your rows will be incomplete, because not all parameters are measured for all samples;
  – normalization;
  – feature extraction: pulls out specified data that is significant in some particular context.
Use standard formats!

Missing Data
• Missing data are a part of almost all research, and we all have to decide how to deal with them.
• Complete Case Analysis: use only rows with all the values
• Available Case Analysis
• Substitution
  – Mean Value: replace the missing value with the mean value for that particular attribute
  – Regression Substitution: we can replace the missing value with a historical value from similar cases
  – Matching Imputation: for each unit with a missing y, find a unit with similar values of x in the observed data and take its y value
  – Maximum Likelihood, EM, etc.
• Some DM models can deal with missing data better than others.
• Which technique to adopt really depends on your data.
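Two of the strategies listed above, Complete Case Analysis and Mean Value substitution, can be sketched in a few lines. This is only an illustration: the small table and the use of NaN to mark a missing value are assumptions, not part of the lecture.

```python
import numpy as np

# Hypothetical table: rows are samples, columns are attributes,
# NaN marks a missing value.
data = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [3.0, 30.0],
    [np.nan, 40.0],
])

def complete_case(table):
    """Complete Case Analysis: keep only rows with no missing value."""
    return table[~np.isnan(table).any(axis=1)]

def mean_substitution(table):
    """Mean Value substitution: fill each gap with the mean of its attribute."""
    filled = table.copy()
    column_means = np.nanmean(table, axis=0)  # per-attribute mean, ignoring NaN
    rows, cols = np.where(np.isnan(table))
    filled[rows, cols] = column_means[cols]
    return filled

print(complete_case(data))
print(mean_substitution(data))
```

Complete Case Analysis throws away half of this toy table, while mean substitution keeps every row at the price of shrinking the spread of each attribute.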

Data Mining
• Crucial task within the KDD process
• Data Mining is about automating the process of searching for patterns in the data.
• In more detail, the most relevant DM tasks are:
  – association
  – sequence or path analysis
  – clustering
  – classification
  – regression
  – visualization

Finding Solution via Purposes
• You have your data, what kind of analysis do you need?
• Regression
  – predict new values based on the past, inference
  – compute the new values for a dependent variable based on the values of one or more measured attributes
• Classification
  – divide samples in classes
  – use a training set of previously labeled data
• Clustering
  – partitioning of a data set into subsets (clusters) so that data in each subset ideally share some common characteristics
• Classification is in some ways similar to clustering, but requires that the analyst know ahead of time how classes are defined.

Cluster Analysis
How many clusters do you expect?

Search for Outliers

Classification
• Data mining technique used to predict group membership for data instances. There are two ways to assign a new value to a given class.
• Crisp classification
  – given an input, the classifier returns its label
• Probabilistic classification
  – given an input, the classifier returns its probabilities to belong to each class
  – useful when some mistakes can be more costly than others (give me only data >90%)
  – winner take all and other rules
    • assign the object to the class with the highest probability (WTA)
    • … but only if its probability is greater than 40% (WTA with thresholds)
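The winner-take-all rules above can be sketched as follows. The dictionary format for the class probabilities and the class names are hypothetical, chosen only for illustration.

```python
def winner_take_all(probabilities, threshold=0.0):
    """Return the most probable class label, or None if it fails the threshold.

    probabilities: dict mapping class label -> probability (hypothetical format).
    """
    label = max(probabilities, key=probabilities.get)
    if probabilities[label] > threshold:
        return label
    return None  # no class is confident enough: leave the object unclassified

confident = {"star": 0.85, "galaxy": 0.10, "artifact": 0.05}
uncertain = {"star": 0.38, "galaxy": 0.33, "artifact": 0.29}

print(winner_take_all(confident))                  # star
print(winner_take_all(uncertain))                  # star
print(winner_take_all(uncertain, threshold=0.40))  # None
```

Plain WTA always returns a label; adding the threshold turns the probabilistic output into a crisp label only when the classifier is sufficiently sure.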

Regression / Forecasting
• Data table statistical correlation
  – mapping without any prior assumption on the functional form of the data distribution;
  – machine learning algorithms well suited for this.
• Curve fitting
  – find a well defined and known function underlying your data;
  – theory / expertise can help.

Machine Learning
• To learn: to get knowledge of by study, experience, or being taught.
• Types of Learning
  – Supervised
  – Unsupervised

Unsupervised Learning
• The model is not provided with the correct results during the training.
• Can be used to cluster the input data in classes on the basis of their statistical properties only.
• Cluster significance and labeling.
• The labeling can be carried out even if the labels are only available for a small number of objects representative of the desired classes.

Supervised Learning
• Training data includes both the input and the desired results.
• For some examples the correct results (targets) are known and are given in input to the model during the learning process.
• The construction of a proper training, validation and test set (Bok) is crucial.
• These methods are usually fast and accurate.
• Have to be able to generalize: give the correct results when new data are given in input without knowing a priori the target.

Generalization
• Refers to the ability to produce reasonable outputs for inputs not encountered during the training.
In other words: NO PANIC when "never seen before" data are given in input!

A common problem: OVERFITTING
• Learn the “data” and not the underlying function
• Performs well on the data used during the training and poorly with new data.
How to avoid it: use proper subsets, early stopping.

Datasets
• Training set: a set of examples used for learning, where the target value is known.
• Validation set: a set of examples used to tune the architecture of a classifier and estimate the error.
• Test set: used only to assess the performances of a classifier. It is never used during the training process so that the error on the test set provides an unbiased estimate of the generalization error.
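A minimal sketch of how the three subsets might be built by shuffling sample indices. The 60/20/20 proportions, the function name and the fixed seed are assumptions made for this example; the lecture does not prescribe them.

```python
import numpy as np

def split_dataset(n_samples, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    n_train = round(train_frac * n_samples)
    n_valid = round(valid_frac * n_samples)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # never touched during training or tuning
    return train, valid, test

train_idx, valid_idx, test_idx = split_dataset(150)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Shuffling before cutting matters when the file is sorted by class, which would otherwise leave whole classes out of one of the subsets.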

IRIS dataset
• IRIS
  – consists of 3 classes, 50 instances each
  – 4 numerical attributes (sepal and petal length and width in cm)
  – each class refers to a type of Iris plant (Setosa, Versicolor, Virginica)
  – the first class is linearly separable from the other two, while the 2nd and the 3rd are not linearly separable from each other

Artifacts Dataset
• PQ Artifacts
  – 2 main classes and 4 numerical attributes
  – classes are: true objects, artifacts

Data Selection
• “Garbage in, garbage out”: training, validation and test data must be representative of the underlying model
• All eventualities must be covered
• Unbalanced datasets
  – since the network minimizes the overall error, the proportion of types of data in the set is critical;
  – inclusion of a loss matrix (Bishop, 1995);
  – often, the best approach is to ensure even representation of different cases, then to interpret the network's decisions accordingly.
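The slide only names the loss matrix, so the sketch below assumes the usual minimum expected loss rule: given the class probabilities returned by the model, choose the decision with the lowest expected cost. The cost values and probabilities are hypothetical.

```python
import numpy as np

def min_risk_decision(posteriors, loss):
    """Pick the class that minimizes the expected loss.

    posteriors[k] is P(class k | input); loss[k, j] is the cost of
    deciding class j when the true class is k.
    """
    expected_loss = posteriors @ loss  # risk of each possible decision j
    return int(np.argmin(expected_loss)), expected_loss

# Hypothetical costs: missing a class-1 object is 10x worse than a false alarm.
loss = np.array([[0.0, 1.0],
                 [10.0, 0.0]])
posteriors = np.array([0.7, 0.3])

decision, risks = min_risk_decision(posteriors, loss)
print(decision, risks)
```

Here the most probable class is 0, yet the decision is class 1, because the asymmetric costs make a missed class-1 object the more expensive mistake.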

Artificial Neural Network
An Artificial Neural Network is an information processing paradigm that is inspired by the way biological nervous systems process information:
“a large number of highly interconnected simple processing elements (neurons) working together to solve specific problems”

A simple artificial neuron
• The basic computational element is often called a node or unit. It receives input from some other units, or from an external source.
• Each input has an associated weight w, which can be modified so as to model synaptic learning.
• The unit computes some function f of the weighted sum of its inputs: $y = f\left(\sum_j w_j x_j\right)$, where $x_j$ are the inputs and $w_j$ the associated weights.
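A minimal sketch of such a unit. The logistic activation and the optional bias term are assumptions made for the example; the slide only says "some function" of the weighted sum.

```python
import math

def sigmoid(a):
    """Logistic activation; one common choice for f (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-a))

def neuron_output(inputs, weights, bias=0.0, activation=sigmoid):
    """Apply the activation to the weighted sum of the inputs."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(weighted_sum)

# The weighted sum is 0.4*1.0 + (-0.8)*0.5 = 0.0, so the sigmoid returns 0.5.
print(neuron_output([1.0, 0.5], [0.4, -0.8]))
```

Changing the weights changes the output for the same inputs, which is the only thing learning has to act on.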

Neural Networks
A Neural Network is usually structured into an input layer of neurons, one or more hidden layers and one output layer. Neurons belonging to adjacent layers are usually fully connected, and the various types and architectures are identified both by the different topologies adopted for the connections and by the choice of the activation function. The values of the functions associated with the connections are called “weights”.
The whole game of using NNs is in the fact that, in order for the network to yield appropriate outputs for given inputs, the weights must be set to suitable values.
The way this is obtained allows a further distinction among modes of operation.

Neural Networks: types
Feedforward: Single Layer Perceptron, MLP, ADALINE (Adaptive Linear Neuron), RBF
Self-Organized: SOM (Kohonen Maps)
Recurrent: Simple Recurrent Network, Hopfield Network.
Stochastic: Boltzmann machines, RBM.
Modular: Committee of Machines, ASNN (Associative Neural Networks), Ensembles.
Others: Instantaneously Trained, Spiking (SNN), Dynamic, Cascades, NeuroFuzzy, PPS, GTM.

Multi Layer Perceptron
• The MLP is one of the most used supervised models: it consists of multiple layers of computational units, usually interconnected in a feed-forward way.
• Each neuron in one layer has direct connections to all the neurons of the subsequent layer.

Learning Process
• Back Propagation
  – the output values are compared with the target to compute the value of some predefined error function
  – the error is then fed back through the network
  – using this information, the algorithm adjusts the weights of each connection in order to reduce the value of the error function
After repeating this process for a sufficiently large number of training cycles, the network will usually converge.
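The three steps above can be sketched with a one-hidden-layer MLP trained by batch gradient descent. Everything specific here is an assumption chosen for illustration: sigmoid units, a squared-error function, the XOR toy problem, the learning rate and the number of hidden units.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mlp(X, T, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer MLP; return the error at every training cycle."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0, (n_hidden, T.shape[1]))
    b2 = np.zeros(T.shape[1])
    n = X.shape[0]
    errors = []
    for _ in range(epochs):
        # Forward pass: compute the outputs and compare them with the targets.
        H = sigmoid(X @ W1 + b1)
        Y = sigmoid(H @ W2 + b2)
        errors.append(0.5 * np.mean(np.sum((Y - T) ** 2, axis=1)))
        # Backward pass: feed the error back through the network.
        delta_out = (Y - T) * Y * (1.0 - Y)
        delta_hid = (delta_out @ W2.T) * H * (1.0 - H)
        # Adjust each weight in the direction that reduces the error.
        W2 -= lr * (H.T @ delta_out) / n
        b2 -= lr * delta_out.mean(axis=0)
        W1 -= lr * (X.T @ delta_hid) / n
        b1 -= lr * delta_hid.mean(axis=0)
    return errors

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
T = np.array([[0.0], [1.0], [1.0], [0.0]])  # XOR targets
errors = train_mlp(X, T)
print(errors[0], errors[-1])
```

Printing the first and last value of the error shows the effect of repeated training cycles; how low the error gets depends on the initial weights and on the learning rate.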

Hidden Units
• The best number of hidden units depends on:
  – number of inputs and outputs
  – number of training cases
  – the amount of noise in the targets
  – the complexity of the function to be learned
  – the activation function
• Too few hidden units => high training and generalization error, due to underfitting and high statistical bias.
• Too many hidden units => low training error but high generalization error, due to overfitting and high variance.
• Rules of thumb don't usually work.

Activation and Error Functions

Activation Functions

Results: confusion matrix

Results: completeness and contamination
Exercise: compute completeness and contamination for the previous confusion matrix (test set)
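The definitions on this slide were given as an image and are not in the transcript, so the sketch below assumes the common ones: completeness is the fraction of objects of a class that are correctly recovered, and contamination is the fraction of objects assigned to a class that actually belong elsewhere. The confusion matrix is hypothetical, not the one from the lecture.

```python
import numpy as np

def completeness_contamination(confusion, cls):
    """confusion[i, j] counts objects of true class i predicted as class j."""
    true_positives = confusion[cls, cls]
    n_true = confusion[cls, :].sum()       # all objects that really belong to cls
    n_predicted = confusion[:, cls].sum()  # all objects assigned to cls
    completeness = true_positives / n_true
    contamination = (n_predicted - true_positives) / n_predicted
    return completeness, contamination

# Hypothetical 2-class confusion matrix (NOT the one from the slide).
confusion = np.array([[90, 10],
                      [5, 45]])
print(completeness_contamination(confusion, 0))
print(completeness_contamination(confusion, 1))
```

Both classes have the same completeness in this toy matrix, but the smaller class is more contaminated because the same number of misclassified objects weighs more against fewer members.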

Decision Trees
• Another classification method.
• A decision tree is a set of simple rules, such as "if the sepal length is less than 5.45, classify the specimen as setosa."
• Decision trees are also nonparametric because they do not require any assumptions about the distribution of the variables in each class.

Types of Clustering
• HIERARCHICAL: finds successive clusters using previously established clusters
  – agglomerative (bottom-up): start with each element in a separate cluster and merge them according to a given property
  – divisive (top-down)
• PARTITIONAL: usually determines all clusters at once

Distances
• Determine the similarity between two clusters and the shape of the clusters.

In case of strings…
• The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.
  – measures the minimum number of substitutions required to change one string into the other
• The Levenshtein (edit) distance is a metric for measuring the amount of difference between two sequences.
  – is defined as the minimum number of edits needed to transform one string into the other.
HD(1001001, 1000100) = 3
LD(BIOLOGY, BIOLOGIA) = 2
  BIOLOGY -> BIOLOGI (substitution)
  BIOLOGI -> BIOLOGIA (insertion)
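Both distances are short to implement, and the two examples from the slide can be checked directly. The Levenshtein function below uses the standard row-by-row dynamic programming scheme, which is one possible implementation and not necessarily the one used in the lecture.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))

def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(hamming("1001001", "1000100"))       # 3
print(levenshtein("BIOLOGY", "BIOLOGIA"))  # 2
```

Unlike the Hamming distance, the Levenshtein distance is defined for strings of different length, because insertions and deletions are allowed.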

Normalization
VAR: the mean of each attribute of the transformed set of data points is reduced to zero by subtracting the mean of each attribute from the values of the attributes and dividing the result by the standard deviation of the attribute.
RANGE (Min-Max Normalization): subtracts the minimum value of an attribute from each value of the attribute and then divides the difference by the range of the attribute. It has the advantage of preserving exactly all relationships in the data, without adding any bias.
SOFTMAX: is a way of reducing the influence of extreme values or outliers in the data without removing them from the data set. It is useful when you have outlier data that you wish to include in the data set while still preserving the significance of data within a standard deviation of the mean.
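The three normalizations can be sketched for a single attribute as follows. VAR and RANGE follow the descriptions above directly; the slide gives no formula for SOFTMAX, so the version here (the logistic function applied to the z-score) is an assumption based on the usual definition of softmax scaling.

```python
import numpy as np

def var_normalize(x):
    """VAR: zero mean and unit standard deviation."""
    return (x - x.mean()) / x.std()

def range_normalize(x):
    """RANGE (min-max): map the attribute linearly onto [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def softmax_normalize(x):
    """SOFTMAX scaling, assumed here to be the logistic function of the z-score."""
    return 1.0 / (1.0 + np.exp(-var_normalize(x)))

values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # hypothetical attribute with one outlier
print(var_normalize(values))
print(range_normalize(values))
print(softmax_normalize(values))
```

With RANGE the single outlier squeezes the other four values into a narrow band near zero, while the softmax version keeps the outlier inside (0, 1) without letting it set the scale for everything else.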

K-Means

K-Means: how it works

K-Means: Pros and Cons

Learning K
• Find a balance between two variables: the number of clusters (K) and the average variance of the clusters.
• Minimize both values
• As the number of clusters increases, the average variance decreases (up to the trivial case of k=n and variance=0).
• Some criteria:
  – BIC (Bayesian Information Criterion)
  – AIC (Akaike Information Criterion)
  – Davies-Bouldin Index
  – Confusion Matrix
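The trade-off between K and the average variance can be seen on toy data. The K-means slides are images only, so this sketch assumes the standard Lloyd iteration (assign each point to the nearest centroid, then recompute the centroids); the synthetic data, the seeds and the function names are all hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns the centroids and the label of each point."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        new_centroids = centroids.copy()
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[c] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

def average_variance(points, centroids, labels):
    """Mean squared distance of each point from its own centroid."""
    return float(np.mean(np.sum((points - centroids[labels]) ** 2, axis=1)))

# Three hypothetical 2-D blobs of 20 points each.
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(center, 0.3, size=(20, 2)) for center in (0.0, 3.0, 6.0)])

for k in (1, 2, 3, 6, len(points)):
    centroids, labels = kmeans(points, k)
    print(k, average_variance(points, centroids, labels))
```

The variance alone cannot select K, since it reaches zero when every point is its own cluster; this is why criteria such as BIC or AIC add a penalty that grows with the number of clusters.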

Self Organizing Maps

SOM topology

SOM Prototypes

SOM Training

Competitive and Cooperative Learning

SOM Update Rule

Parameters

DM with SOM

SOM Labeling

Localizing Data

Cluster Structure

Cluster Structure - 2

Component Planes

Relative Importance

How accurate is your clustering

Trajectories

Combining Models

Committee Machines

A priori knowledge

Sleeping Experts