H2O Random Grid Search - PyData Amsterdam
Transcript of H2O Random Grid Search - PyData Amsterdam
Using H2O Random Grid Search for Hyper-parameters Optimization
Jo-fai (Joe) [email protected]
WHO AM I
• Customer Data Scientist at H2O.ai
• Background
o Telecom (Virgin Media)
o Data Science Platform (Domino Data Lab)
o Water Engineering + Machine Learning Research (STREAM Industrial Doctorate Centre)
ABOUT H2O
• Company
o Team: 50 (45 shown)
o Founded in 2012, Mountain View, California.
o Venture capital backed
• Products
o Open-source machine learning platform.
o Flow (Web), R, Python, Spark, Hadoop interfaces.
ABOUT THIS TALK
• Story of a baker and a data scientist
o Why you should care
• Hyper-parameters optimization
o Common techniques
o H2O Python API
• Other H2O features for streamlining workflow
STORY OF A BAKER
• Making a cake
o Source
• Ingredients
o Process:
• Mixing
• Baking
• Decorating
o End product
• A nice looking cake
Credit: www.dphotographer.co.uk/image/201305/baking_a_cake
STORY OF A DATA SCIENTIST
• Making a data product
o Source
• Raw data
o Process:
• Data munging
• Analyzing / Modeling
• Reporting
o End product
• Apps, graphs or reports
Credit: www.simranjindal.com
BAKER AND DATA SCIENTIST
• What do they have in common?
o Process is important to bakers and data scientists. Yet, most customers do not appreciate the effort.
o Most customers only care about raw materials quality and end products.
WHY YOU SHOULD CARE
• We can use machine/software to automate some laborious tasks.
• We can spend more time on quality assurance and presentation.
• This talk is about making one specific task, hyper-parameters tuning, more efficient.
HYPER-PARAMETERS OPTIMISATION
• Overview
o Optimizing an algorithm's performance.
• e.g. Random Forest, Gradient Boosting Machine (GBM)
o Trying different sets of hyper-parameters within a defined search space.
o No rules of thumb.
HYPER-PARAMETERS OPTIMISATION
• Example of hyper-parameters in H2O
o Random Forest:
• No. of trees, depth of trees, sample rate…
o Gradient Boosting Machine (GBM):
• No. of trees, depth of trees, learning rate, sample rate…
o Deep Learning:
• Activation, hidden layer sizes, L1, L2, dropout ratios…
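To make "a defined search space" concrete, here is a minimal sketch of a GBM search space written as a plain Python dictionary. The keys follow H2O's GBM parameter names (ntrees, max_depth, learn_rate, sample_rate); the candidate values are illustrative assumptions, not recommendations from the talk. The helper functions are hypothetical and only count and enumerate the combinations a full grid would contain.

```python
from itertools import product

# Hypothetical GBM search space; the values are illustrative only.
gbm_space = {
    "ntrees": [50, 100, 200],
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
}


def grid_size(space):
    """Number of models a full (Cartesian) grid search would have to train."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


def all_combinations(space):
    """Enumerate every hyper-parameter combination as a dict."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*(space[n] for n in names))]


combos = all_combinations(gbm_space)
print(grid_size(gbm_space))  # 3 * 4 * 3 * 3 = 108
```

Even this small space already implies 108 models, which is why exhaustive search becomes expensive quickly.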
COMMON TECHNIQUES
• Manual search
o Tuning by hand - inefficient
o Expert opinion (not always reliable)
• Grid search
o Automated search within a defined space
o Computationally expensive
• Random grid search
o More efficient than manual / grid search
o Equal performance in less time
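The difference between grid search and random grid search can be sketched with the standard library alone. The scoring function below is a made-up stand-in for a validation metric (no model is trained), and all names and values are assumptions for illustration. Grid search scores every combination; random grid search scores only a fixed budget of combinations drawn without replacement from the same space.

```python
import random
from itertools import product

# Illustrative search space (values are assumptions).
space = {
    "max_depth": [2, 4, 6, 8, 10],
    "learn_rate": [0.01, 0.03, 0.1, 0.3],
    "sample_rate": [0.5, 0.7, 0.9, 1.0],
}


def toy_score(params):
    """Stand-in for a validation metric (higher is better); not a real model."""
    return (
        -((params["max_depth"] - 6) ** 2) / 16
        - abs(params["learn_rate"] - 0.1)
        - abs(params["sample_rate"] - 0.9)
    )


def candidates(space):
    """All combinations in the search space, as dicts."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in product(*space.values())]


def grid_search(space, score):
    """Evaluate every combination; return the best one and the number evaluated."""
    pool = candidates(space)
    return max(pool, key=score), len(pool)


def random_grid_search(space, score, max_models, seed):
    """Evaluate only `max_models` combinations sampled without replacement."""
    rng = random.Random(seed)
    sampled = rng.sample(candidates(space), max_models)
    return max(sampled, key=score), len(sampled)


grid_best, n_grid = grid_search(space, toy_score)
rand_best, n_rand = random_grid_search(space, toy_score, max_models=20, seed=1234)
print(n_grid, n_rand)  # 80 models for the full grid, 20 for the random search
```

The random variant trains a quarter of the models here; how close its best model gets to the grid optimum depends on the budget and on how flat the metric is around the optimum.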
RANDOM GRID SEARCH – DOES IT WORK?
• Random Search for Hyper-Parameter Optimization
o Journal of Machine Learning Research (2012)
o James Bergstra and Yoshua Bengio
o “Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search found statistically equal performance on four of seven data sets, and superior performance on one of seven.”
RELATED FEATURE – EARLY STOPPING
• A technique for regularization.
• Avoid over-fitting the training set.
• Useful when combined with hyper-parameter search:
o Additional controls (e.g. time constraint, tolerance)
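As a rough illustration of early stopping with a tolerance, the sketch below stops training once the best validation score over the most recent rounds no longer beats the best earlier score by at least a given margin. This is a simplified rule written for this example; H2O's own stopping criterion differs in detail, and the function names and scores are assumptions.

```python
def should_stop(history, stopping_rounds, tolerance):
    """Return True when the best score of the last `stopping_rounds` rounds
    improves on the best earlier score by less than `tolerance`.
    Scores are assumed to be "higher is better" (e.g. AUC)."""
    if len(history) <= stopping_rounds:
        return False  # not enough rounds yet to compare
    best_before = max(history[:-stopping_rounds])
    best_recent = max(history[-stopping_rounds:])
    return best_recent - best_before < tolerance


def rounds_trained(scores_per_round, stopping_rounds, tolerance):
    """Replay a sequence of validation scores and report when training would stop."""
    history = []
    for score in scores_per_round:
        history.append(score)
        if should_stop(history, stopping_rounds, tolerance):
            break
    return len(history)


# Made-up validation AUC per scoring round: fast gains, then a plateau.
validation_auc = [0.70, 0.75, 0.78, 0.80, 0.801, 0.8005, 0.8008, 0.79, 0.78]
print(rounds_trained(validation_auc, stopping_rounds=3, tolerance=0.005))
```

In a hyper-parameter search the same idea saves time twice: each model stops adding trees or epochs once it plateaus, and the search as a whole can stop once new models stop improving on the best one found.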
H2O RANDOM GRID SEARCH
• Objectives
o Optimize model performance based on evaluation metric.
o Explore the defined search space randomly.
o Use early-stopping for regularization and additional controls.
RANDOM GRID SEARCH (PYTHON API)
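The code shown on this slide is not part of the transcript. As a stand-in, the following is a minimal sketch of a random grid search with H2O's Python API, not the speaker's original example. It assumes a binary classification problem scored by AUC, a GBM as the model, and illustrative parameter values; the function name run_random_grid and its arguments are hypothetical. The H2O imports sit inside the function so the block can be loaded without the h2o package installed.

```python
# Search space for a GBM (illustrative values only).
hyper_params = {
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
    "col_sample_rate": [0.6, 0.8, 1.0],
}

# Random search controls: stop after 20 models or 10 minutes, whichever comes first.
search_criteria = {
    "strategy": "RandomDiscrete",
    "max_models": 20,
    "max_runtime_secs": 600,
    "seed": 1234,
}


def run_random_grid(train, valid, features, target):
    """Train a random grid of GBMs and return the one with the best validation AUC.

    `train` and `valid` are H2OFrames, `features` is a list of column names and
    `target` is the response column (assumed to be a two-level factor).
    """
    # Imported here so the definitions above work without a running H2O cluster.
    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from h2o.grid.grid_search import H2OGridSearch

    grid = H2OGridSearch(
        model=H2OGradientBoostingEstimator,
        hyper_params=hyper_params,
        search_criteria=search_criteria,
    )
    grid.train(
        x=features,
        y=target,
        training_frame=train,
        validation_frame=valid,
        ntrees=500,               # upper bound; early stopping usually ends sooner
        stopping_rounds=5,        # early-stopping controls for each model
        stopping_metric="AUC",
        stopping_tolerance=1e-3,
    )
    # Sort the trained models by AUC, best first.
    sorted_grid = grid.get_grid(sort_by="auc", decreasing=True)
    return sorted_grid.models[0]
```

Calling the function requires a running cluster (h2o.init()) and data loaded as H2OFrames, for example with h2o.import_file. For regression or multi-class problems the metric names would need to change.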
RANDOM GRID SEARCH
• Outputs
o Best model based on metric
o A set of hyper-parameters for the best model
• Other APIs
o R, REST, Java (see documentation on GitHub)
OTHER H2O FEATURES
• h2oEnsemble
o Better predictive performance
• Sparkling Water = Spark + H2O
• Plain Old Java Object (POJO)
o Productionize H2O models
CONCLUSIONS
• Most people only care about the end product.
• Use H2O random grid search to save time on hyper-parameters tuning.
• Spend more time on quality assurance and presentation.
CONCLUSIONS
• H2O Random Grid Search
o An efficient way to tune hyper-parameters
o APIs for Python, R, Java, REST
o Do check out the code examples on GitHub
• Combine with other H2O features
o Streamline data science workflow
ACKNOWLEDGEMENTS
• GoDataDriven
• Conference sponsors
• My colleagues at H2O.ai
THANK YOU
• Resources
o Slides + code – github.com/h2oai/h2o-meetups
o Download H2O – www.h2o.ai
o Documentation – www.h2o.ai/docs/
o [email protected]
• We are hiring – www.h2o.ai/careers/