H2O Random Grid Search - PyData Amsterdam

Using H2O Random Grid Search for Hyper-parameters Optimization Jo-fai (Joe) Chow Data Scientist [email protected]

Transcript of H2O Random Grid Search - PyData Amsterdam

Page 1: H2O Random Grid Search - PyData Amsterdam

Using H2O Random Grid Search for Hyper-parameters Optimization

Jo-fai (Joe) Chow
Data Scientist
[email protected]

WHO AM I

• Customer Data Scientist at H2O.ai
• Background
  o Telecom (Virgin Media)
  o Data Science Platform (Domino Data Lab)
  o Water Engineering + Machine Learning Research (STREAM Industrial Doctorate Centre)

ABOUT H2O

• Company
  o Team: 50 (45 shown)
  o Founded in 2012, Mountain View, California.
  o Venture capital backed
• Products
  o Open-source machine learning platform.
  o Flow (Web), R, Python, Spark, Hadoop interfaces.

ABOUT THIS TALK

• Story of a baker and a data scientist
  o Why you should care
• Hyper-parameters optimization
  o Common techniques
  o H2O Python API
• Other H2O features for streamlining workflow

STORY OF A BAKER

• Making a cake
  o Source
    • Ingredients
  o Process:
    • Mixing
    • Baking
    • Decorating
  o End product
    • A nice looking cake

Credit: www.dphotographer.co.uk/image/201305/baking_a_cake

STORY OF A DATA SCIENTIST

• Making a data product
  o Source
    • Raw data
  o Process:
    • Data munging
    • Analyzing / Modeling
    • Reporting
  o End product
    • Apps, graphs or reports

Credit: www.simranjindal.com

BAKER AND DATA SCIENTIST

• What do they have in common?
  o Process is important to bakers and data scientists. Yet, most customers do not appreciate the effort.
  o Most customers only care about raw materials quality and end products.

WHY YOU SHOULD CARE

• We can use machine/software to automate some laborious tasks.
• We can spend more time on quality assurance and presentation.
• This talk is about making one specific task, hyper-parameters tuning, more efficient.

HYPER-PARAMETERS OPTIMISATION

• Overview
  o Optimizing an algorithm’s performance.
    • e.g. Random Forest, Gradient Boosting Machine (GBM)
  o Trying different sets of hyper-parameters within a defined search space.
  o No rules of thumb.

HYPER-PARAMETERS OPTIMISATION

• Example of hyper-parameters in H2O
  o Random Forest:
    • No. of trees, depth of trees, sample rate…
  o Gradient Boosting Machine (GBM):
    • No. of trees, depth of trees, learning rate, sample rate…
  o Deep Learning:
    • Activation, hidden layer sizes, L1, L2, dropout ratios…
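A discrete search space like the GBM one listed above can be written down as a plain dictionary. The sketch below is not from the talk: the parameter values are illustrative assumptions, and the helper names (`grid_size`, `enumerate_grid`) are hypothetical. It only shows how quickly the number of combinations grows.

```python
from itertools import product
from math import prod

# Hypothetical GBM search space; the values are illustrative, not from the talk.
gbm_space = {
    "ntrees": [50, 100, 200],
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
}


def grid_size(space):
    """Number of distinct combinations in a discrete search space."""
    return prod(len(values) for values in space.values())


def enumerate_grid(space):
    """Yield every combination of the space as a dict (full Cartesian grid)."""
    names = list(space)
    for combo in product(*(space[name] for name in names)):
        yield dict(zip(names, combo))


total = grid_size(gbm_space)             # 3 * 4 * 3 * 3 combinations
first = next(enumerate_grid(gbm_space))  # first point of the grid
```

Adding one more hyper-parameter with three candidate values would triple `total`, which is why an exhaustive grid becomes expensive.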

COMMON TECHNIQUES

• Manual search
  o Tuning by hand - inefficient
  o Expert opinion (not always reliable)
• Grid search
  o Automated search within a defined space
  o Computationally expensive
• Random grid search
  o More efficient than manual / grid search
  o Equal performance in less time
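The difference between grid search and random grid search can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: `toy_score` is a made-up stand-in for a validation metric, not a real model, and the search space values are invented. Random grid search is shown as sampling a fixed number of grid points without replacement.

```python
import random
from itertools import product

# Illustrative search space (values are assumptions, not from the talk).
space = {
    "max_depth": [2, 4, 6, 8, 10],
    "learn_rate": [0.01, 0.03, 0.1, 0.3],
    "sample_rate": [0.5, 0.7, 0.9, 1.0],
}


def toy_score(params):
    """Made-up stand-in for a validation metric (higher is better)."""
    return (
        -((params["max_depth"] - 6) ** 2) / 16
        - abs(params["learn_rate"] - 0.1)
        - abs(params["sample_rate"] - 0.9)
    )


def all_combinations(space):
    """Every point of the Cartesian grid, as a list of dicts."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*(space[n] for n in names))]


def grid_search(space, score):
    """Evaluate every combination and return (best params, evaluations)."""
    candidates = all_combinations(space)
    return max(candidates, key=score), len(candidates)


def random_grid_search(space, score, max_models, seed):
    """Evaluate a random subset of the grid, sampled without replacement."""
    rng = random.Random(seed)
    candidates = all_combinations(space)
    sampled = rng.sample(candidates, min(max_models, len(candidates)))
    return max(sampled, key=score), len(sampled)


grid_best, grid_evals = grid_search(space, toy_score)
rand_best, rand_evals = random_grid_search(space, toy_score, max_models=20, seed=42)
```

The random variant evaluates a quarter of the points here. It is not guaranteed to find the grid optimum. The trade-off is a bounded budget in exchange for a result that is often close.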

RANDOM GRID SEARCH – DOES IT WORK?

• Random Search for Hyper-Parameter Optimization
  o Journal of Machine Learning Research (2012)
  o James Bergstra and Yoshua Bengio
  o “Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search found statistically equal performance on four of seven datasets, and superior performance on one of seven.”

RELATED FEATURE – EARLY STOPPING

• A technique for regularization.
• Avoid over-fitting the training set.
• Useful when combined with hyper-parameters search:
  o Additional controls (e.g. time constraint, tolerance)
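One simple way to express an early-stopping rule with a tolerance is sketched below. This is a simplified illustration, not H2O's exact rule: the function name `early_stopping_round` and the validation errors are assumptions. Training stops once the best validation error has failed to improve by more than a relative tolerance for a given number of consecutive scoring rounds.

```python
def early_stopping_round(scores, stopping_rounds, tolerance):
    """Return the index of the round at which training would stop, or None.

    scores: validation errors per scoring round (lower is better).
    A round counts as an improvement only if it beats the best error so far
    by more than the relative `tolerance`.
    """
    best = float("inf")
    rounds_without_gain = 0
    for i, score in enumerate(scores):
        if score < best * (1 - tolerance):
            best = score
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= stopping_rounds:
                return i
    return None


# Made-up validation errors: improvement stalls after the fourth round.
errors = [0.50, 0.40, 0.35, 0.34, 0.345, 0.343, 0.36, 0.37]
stop_at = early_stopping_round(errors, stopping_rounds=3, tolerance=0.01)
```

With these numbers the rule fires at index 6, after three rounds that fail to beat 0.34 by at least 1%, so the last round is never trained.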

H2O RANDOM GRID SEARCH

• Objectives
  o Optimize model performance based on evaluation metric.
  o Explore the defined search space randomly.
  o Use early-stopping for regularization and additional controls.

RANDOM GRID SEARCH (PYTHON API)
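The code shown on this slide did not survive in the transcript. The following is a minimal sketch of a random grid search with the H2O Python API, not the speaker's original code. It assumes a binary classification problem, a CSV file whose path and response column are supplied by the caller, and illustrative hyper-parameter values. The function name `run_random_grid` is hypothetical. The H2O imports sit inside the function so the configuration can be inspected without a running H2O cluster.

```python
# Hyper-parameter values to sample from (illustrative, not from the talk).
hyper_params = {
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
    "col_sample_rate": [0.6, 0.8, 1.0],
}

# Random search controls: model budget, time budget and grid-level early stopping.
search_criteria = {
    "strategy": "RandomDiscrete",
    "max_models": 20,
    "max_runtime_secs": 600,
    "stopping_rounds": 5,
    "stopping_metric": "AUC",
    "stopping_tolerance": 1e-3,
    "seed": 1234,
}


def run_random_grid(csv_path, response):
    """Train a random GBM grid and return (best_model, sorted_grid).

    Hypothetical helper: csv_path and response are supplied by the caller.
    """
    import h2o
    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from h2o.grid.grid_search import H2OGridSearch

    h2o.init()
    frame = h2o.import_file(csv_path)
    frame[response] = frame[response].asfactor()  # assumption: classification
    train, valid = frame.split_frame(ratios=[0.8], seed=1234)
    predictors = [col for col in frame.columns if col != response]

    # Model-level early stopping: ntrees is an upper bound, not a tuned value.
    base_model = H2OGradientBoostingEstimator(
        ntrees=500,
        stopping_rounds=5,
        stopping_metric="AUC",
        stopping_tolerance=1e-3,
        score_tree_interval=10,
        seed=1234,
    )
    grid = H2OGridSearch(
        model=base_model,
        hyper_params=hyper_params,
        search_criteria=search_criteria,
    )
    grid.train(x=predictors, y=response, training_frame=train, validation_frame=valid)

    sorted_grid = grid.get_grid(sort_by="auc", decreasing=True)
    return sorted_grid.models[0], sorted_grid
```

A call such as `best, grid = run_random_grid("train.csv", "label")` (both arguments are placeholders) would start a local H2O cluster and return the top model by AUC.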

RANDOM GRID SEARCH

• Outputs
  o Best model based on metric
  o A set of hyper-parameters for the best model
• Other APIs
  o R, REST, Java (see documentation on GitHub)
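The two outputs listed above, the best model and its hyper-parameters, amount to sorting the evaluated models by the chosen metric. The sketch below uses plain Python with made-up results (the model IDs and AUC values are invented for illustration, not measurements from the talk), and `leaderboard` is a hypothetical helper.

```python
# Made-up results for illustration only; not measurements from the talk.
results = [
    {"model_id": "gbm_1", "params": {"max_depth": 3, "learn_rate": 0.10}, "auc": 0.81},
    {"model_id": "gbm_2", "params": {"max_depth": 7, "learn_rate": 0.05}, "auc": 0.86},
    {"model_id": "gbm_3", "params": {"max_depth": 5, "learn_rate": 0.01}, "auc": 0.84},
]


def leaderboard(results, metric, higher_is_better=True):
    """Return the results sorted from best to worst on the given metric."""
    return sorted(results, key=lambda r: r[metric], reverse=higher_is_better)


ranked = leaderboard(results, "auc")
best_model_id = ranked[0]["model_id"]  # best model based on the metric
best_params = ranked[0]["params"]      # its set of hyper-parameters
```

For error-style metrics such as log loss, `higher_is_better=False` would rank the lowest value first.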

OTHER H2O FEATURES

• h2oEnsemble
  o Better predictive performance
• Sparkling Water = Spark + H2O
• Plain Old Java Object (POJO)
  o Productionize H2O models

CONCLUSIONS

• Most people only care about the end product.
• Use H2O random grid search to save time on hyper-parameters tuning.
• Spend more time on quality assurance and presentation.

CONCLUSIONS

• H2O Random Grid Search
  o An efficient way to tune hyper-parameters
  o APIs for Python, R, Java, REST
  o Do check out the code examples on GitHub
• Combine with other H2O features
  o Streamline data science workflow

ACKNOWLEDGEMENTS

• GoDataDriven
• Conference sponsors
• My colleagues at H2O.ai

THANK YOU

• Resources
  o Slides + code – github.com/h2oai/h2o-meetups
  o Download H2O – www.h2o.ai
  o Documentation – www.h2o.ai/docs/
  o [email protected]
• We are hiring – www.h2o.ai/careers/