Optimizing Supervised and Implementing Unsupervised Machine Learning Algorithms in HPCC Systems
Supervised Learning Algorithms - Analysis of different approaches
Transcript of Supervised Learning Algorithms - Analysis of different approaches
Supervised Learning Algorithms
Analysis of Different Approaches
Evgeniy Marinov, ML Consultant
Philip Yankov, x8 academy
ML Definition
• There are plenty of definitions...
• Informal: The field of study that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959)
• Formal: A computer program is said to learn from experience E, with respect to some task T, and some performance measure P, if its performance on T as measured by P improves with experience E (Tom Mitchell, 1998).
From Wikipedia
• Machine learning is:
– a subfield of computer science that evolved from the study of pattern recognition and AI in the 1980s (ML is a separate field flourishing from the 1990s, first benefiting from statistics and then from the increasing availability of digitized information at that time).
Why ML?
Key factors enabling ML growth today
• Cloud Computing
• Internet of Things
• Big Data (+ Unstructured Data)
Why is data so important?
• Google Photos – unlimited storage
• Google voice – "OK, Google"
Nowadays
• It is so easy to get the data you need, and to use an API or service of some company to experiment with it
Methods for collecting data
• Download
– Spreadsheet
– Text
• API
• Crawling/scraping
Supervised Learning
Task Description
Pipeline
Initial example
Notation
The regression function f(x)
How to evaluate our model?
Pipeline
Assessing the Model Accuracy
Bias-variance trade-off
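The trade-off behind this slide is the standard decomposition of the expected squared prediction error at a point x into bias, variance, and irreducible noise (a textbook identity, stated here for completeness):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Here y = f(x) + ε with Var[ε] = σ²; more flexible models lower the bias but raise the variance.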
Cross-validation
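Cross-validation estimates generalization error by rotating which part of the data is held out. A minimal k-fold sketch in Python (illustrative only; the helper `kfold_indices` is our own name, not something from the talk):

```python
import numpy as np

# k-fold cross-validation: shuffle the sample indices once, split them into
# k folds, and let each fold serve once as the validation set while the
# remaining k-1 folds form the training set.
def kfold_indices(n, k, seed=0):
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

splits = list(kfold_indices(10, 5))
for train, val in splits:
    print(len(train), len(val))  # 8 2, five times
```

Averaging the validation error over the k rounds gives a lower-variance estimate than a single hold-out split.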
Generalization Error and Overfitting
Choosing a model by the data type of the response
Pipeline
Data types and the Generalized Linear Model
• Simple and General linear models
• Restrictions of the linear model
• Data type of the response Y:
1) (General) Linear model: Y in R, Y ~ Gaussian(µ, σ^2) – continuous data
2) Logistic regression: Y in {0, 1}, Y ~ Bernoulli(p) – binary data
3) Poisson regression: Y in {0, 1, ...}, Y ~ Poisson(µ) – counting data
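The three models in this list differ only in the assumed response distribution and the link function connecting the mean of Y to the linear predictor η = Xβ. A small sketch of the corresponding inverse (canonical) links, as an illustration of how each family keeps the mean in the right range (our own code, not from the slides):

```python
import numpy as np

# Each GLM pairs a response distribution with a canonical link g such that
# g(E[Y]) = eta = X @ beta. The inverse link maps the unrestricted linear
# predictor eta back onto the natural range of the mean of Y.
inverse_link = {
    "gaussian": lambda eta: eta,                          # identity: mean in R
    "bernoulli": lambda eta: 1.0 / (1.0 + np.exp(-eta)),  # inverse logit: (0, 1)
    "poisson": lambda eta: np.exp(eta),                   # inverse log: (0, inf)
}

eta = np.array([-2.0, 0.0, 3.0])   # arbitrary linear-predictor values
for name, inv in inverse_link.items():
    print(name, np.round(inv(eta), 3))
```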
Simple and General linear models
Simple: Y = β0 + β1·X + ε
General: Y = β0 + β1·X1 + ... + βp·Xp + ε
Error of the General Linear model
Restrictions of Linear models
Although the General linear model is a useful framework, it is not appropriate in the following cases:
• The range of Y is restricted (e.g. binary, count, positive/negative)
• Var[Y] depends on the mean E[Y] (for the Gaussian they are independent)
Name            Mean  Variance
Bernoulli(p)    p     p(1 - p)
Binomial(n, p)  np    np(1 - p)
Poisson(µ)      µ     µ
Binary response Y ∈ {0, 1}
• The Bernoulli(p) is a discrete r.v. with two possible outcomes, with probabilities p and q = 1 – p
• The parameter p does not change over time
• Bernoulli is a building block for other, more complicated distributions
• Examples:
– Coin flips {Heads, Tails} – if unbiased, then p = 0.5
– Click on Ad, Fail/Success on Exam
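As a tiny illustration of the coin-flip example: for i.i.d. Bernoulli(p) data, the maximum-likelihood estimate of p is just the fraction of successes (a standard fact; the simulation below is our own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n coin flips with success probability p = 0.5 ("unbiased coin").
p_true = 0.5
flips = rng.random(100_000) < p_true   # Bernoulli(p) draws as booleans

# For i.i.d. Bernoulli(p) data the maximum-likelihood estimate of p is
# simply the fraction of successes (the sample mean).
p_hat = flips.mean()
print(round(float(p_hat), 2))  # close to 0.5
```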
Generalized Linear model – Intuition
Exponential Family
General linear model
Binary Data
Modeling Counting/Poisson Data
Maximizing the Log-Likelihood and Parameter Estimation
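As an illustrative sketch of maximum-likelihood estimation for one of the models above: the Poisson-regression log-likelihood is Σᵢ [yᵢηᵢ − exp(ηᵢ)] up to a β-free constant, and its gradient Xᵀ(y − exp(Xβ)) can be followed by plain gradient ascent (our own toy code, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate Poisson-regression data: log E[Y] = b0 + b1 * x  (log link)
n = 5000
x = rng.uniform(-1.0, 1.0, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 1.0])
y = rng.poisson(np.exp(X @ beta_true))

# Maximize the Poisson log-likelihood sum_i [y_i*eta_i - exp(eta_i)]
# (dropping the beta-free term -log y_i!) by gradient ascent;
# the gradient is X^T (y - exp(X @ beta)).
beta = np.zeros(2)
for _ in range(5000):
    beta += 0.1 * X.T @ (y - np.exp(X @ beta)) / n

print(np.round(beta, 2))  # close to beta_true = [0.5, 1.0]
```

In practice GLM software uses iteratively reweighted least squares rather than plain gradient ascent, but the objective being maximized is the same log-likelihood.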
Preprocessing
Pipeline
Problems with feature types
• Big number of features -> Dimensionality reduction -> SVD, PCA
– Dimensionality reduction: "compress" the data from a high-dimensional representation into a lower-dimensional one (useful for visualization or as an internal transformation for other ML algorithms)
• Sparse features -> Hashing
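The "Sparse features -> Hashing" bullet refers to the hashing trick: project an unbounded sparse feature space into a fixed number of buckets, with no vocabulary to store. A minimal sketch (our own code; production versions such as scikit-learn's HashingVectorizer also hash a +/- sign to reduce collision bias):

```python
import numpy as np

# Hashing trick: map an unbounded, sparse feature space (e.g. words) into a
# fixed-size vector without storing a vocabulary. Colliding features simply
# share a bucket.
def hash_features(tokens, n_buckets=16):
    vec = np.zeros(n_buckets)
    for tok in tokens:
        vec[hash(tok) % n_buckets] += 1.0
    return vec

v = hash_features("the cat sat on the mat".split())
print(v.sum())  # 6.0 -- one increment per token
```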
• Instead of using two coordinates (x, y) to describe point locations, let's use only one coordinate (z)
• A point's position is its location along vector v1
• How to choose v1? Minimize the reconstruction error
SVD – Dimensionality Reduction
[Figure: user points plotted by Movie 1 rating vs Movie 2 rating; v1 is the first right singular vector]
SVD – Dimensionality Reduction (PCA generalization)
More details
• Q: How exactly is dim. reduction done?
• A: Set the smallest singular values to zero

For the 7×5 user-movie rating matrix A, the SVD A ≈ U Σ Vᵀ gives:

A =
1 1 1 0 0
3 3 3 0 0
4 4 4 0 0
5 5 5 0 0
0 2 0 4 4
0 0 0 5 5
0 1 0 2 2

U =
0.13  0.02 -0.01
0.41  0.07 -0.03
0.55  0.09 -0.04
0.68  0.11 -0.05
0.15 -0.59  0.65
0.07 -0.73 -0.67
0.07 -0.29  0.32

Σ =
12.4  0    0
 0    9.5  0
 0    0    1.3

Vᵀ =
0.56  0.59  0.56  0.09  0.09
0.12 -0.02  0.12 -0.69 -0.69
0.40 -0.80  0.40  0.09  0.09

Setting the smallest singular value (1.3) to zero keeps only the first two columns of U, the top-left 2×2 block of Σ, and the first two rows of Vᵀ. The resulting rank-2 approximation is

B =
 0.92 0.95  0.92  0.01  0.01
 2.91 3.01  2.91 -0.01 -0.01
 3.90 4.04  3.90  0.01  0.01
 4.82 5.00  4.82  0.03  0.03
 0.70 0.53  0.70  4.11  4.11
-0.69 1.34 -0.69  4.78  4.78
 0.32 0.23  0.32  2.01  2.01

and the reconstruction error ǁA - BǁF = √(Σij (Aij - Bij)²) is "small"
(Frobenius norm: ǁMǁF = √(Σij Mij²))
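The truncation shown on these slides can be reproduced with NumPy's SVD (illustrative code; the matrix is the rating example from the slides):

```python
import numpy as np

# User-movie rating matrix from the slides (7 users x 5 movies)
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, singular values in decreasing order
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 1))  # leading values approx. 12.4, 9.5, 1.3 (rest ~0)

# "Set smallest singular values to zero": keep only the top k = 2
k = 2
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The Frobenius error of the rank-k truncation equals the norm of the
# discarded singular values -- here essentially sigma_3.
err = np.linalg.norm(A - B, "fro")
print(np.round(err, 1))  # approx. 1.3
```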
Feature selection - example
Dummy Encoding
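Dummy (one-hot) encoding turns a categorical feature with k levels into k binary indicator columns (k−1 when the model has an intercept, to avoid perfect collinearity). A minimal sketch with our own toy data:

```python
import numpy as np

# Dummy (one-hot) encoding: a categorical feature with k levels becomes k
# 0/1 indicator columns. With an intercept in the model, one level is
# usually dropped (k-1 dummies) to avoid perfect collinearity.
colors = ["red", "green", "blue", "green"]   # toy categorical feature
levels = sorted(set(colors))                 # ['blue', 'green', 'red']
onehot = np.array([[1.0 if c == lvl else 0.0 for lvl in levels]
                   for c in colors])
print(onehot)  # 4x3 matrix, exactly one 1 per row
```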
(De)Motivation
Solution to those problems with features
Pipeline
Factorization Machine (degree 2)
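A degree-2 Factorization Machine (Rendle, 2010, listed in the references) models every pairwise feature interaction through k-dimensional latent factors, and the interaction term can be evaluated in O(kn) instead of O(n²) via Rendle's reformulation. A sketch of the prediction step with our own toy data:

```python
import numpy as np

# Degree-2 Factorization Machine (Rendle, 2010):
#   y(x) = w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j
# The pairwise term looks O(n^2) but can be computed in O(k*n) via
#   sum_{i<j} <v_i, v_j> x_i x_j
#     = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
def fm_predict(x, w0, w, V):
    s = V.T @ x                                         # (k,) per-factor sums
    pairwise = 0.5 * ((s ** 2).sum() - ((V ** 2).T @ (x ** 2)).sum())
    return w0 + w @ x + pairwise

rng = np.random.default_rng(2)
n, k = 6, 3                       # toy sizes: n features, k latent factors
x = rng.random(n)
w0, w, V = 0.1, rng.random(n), rng.random((n, k))

# Sanity check against the naive O(n^2) double sum
naive = w0 + w @ x + sum(V[i] @ V[j] * x[i] * x[j]
                         for i in range(n) for j in range(i + 1, n))
print(np.isclose(fm_predict(x, w0, w, V), naive))  # True
```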
General Applications of FMs
Summary
Pipeline
From prototype to production
• Prototype vs Production time?
– The model (pipeline) should stay the same
Libraries
Questions?
Thank you!!!
References
• https://www.coursera.org/learn/machine-learning
• http://www.cs.cmu.edu/~tom/
• http://scikit-learn.org/stable/
• http://www.scalanlp.org/
• http://www.algo.uni-konstanz.de/members/rendle/pdf/Rendle2010FM.pdf
• https://securityintelligence.com/factorization-machines-a-new-way-of-looking-at-machine-learning/
• An Introduction to Generalized Linear Models – Annette Dobson, Adrian Barnett
• Applying Generalized Linear Models – James Lindsey
• https://www.codementor.io/jadianes/building-a-recommender-with-apache-spark-python-example-app-part1-du1083qbw
• https://www.chrisstucchio.com/blog/index.html