3rd Hivemall meetup
Transcript of 3rd Hivemall meetup
Recent progress and future roadmap of Hivemall
Research Engineer Makoto YUI @myui
#hivemallmtup
2016/09/08 3rd Hivemall meetup
Agenda
1. Short Introduction to Hivemall
   ✓ Hivemall use-cases
2. Recent Updates
3. Roadmap of Hivemall
   ✓ coming new features
What is Hivemall
Scalable machine learning library built as a collection of Hive UDFs, licensed under the Apache License v2
https://github.com/myui/hivemall
Thanks to everyone who contributed to the project!
What is Hivemall
[Figure: where Hivemall sits in the Hadoop ecosystem stack]
Machine Learning: Hivemall, MLlib
Query Processing: Hive, Pig, SparkSQL
Parallel Data Processing Framework: MapReduce (MRv1), Apache Tez (DAG processing), Apache Spark
Resource Management: Apache YARN, MESOS
Distributed File System / Cloud Storage: Hadoop HDFS, Amazon S3
Hivemall’s Vision: ML on SQL
Classification with Mahout
CREATE TABLE lr_model AS
SELECT
  feature, -- reducers perform model averaging in parallel
  avg(weight) as weight
FROM (
  SELECT logress(features, label, ..) as (feature, weight)
  FROM train
) t -- map-only task
GROUP BY feature; -- shuffled to reducers
✓ Machine Learning made easy for SQL developers (ML for the rest of us)
✓ Interactive and Stable APIs w/ SQL abstraction
This SQL query automatically runs in parallel on Hadoop
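The query on this slide can be read as a map step followed by a reduce step. Below is a minimal Python sketch of that reading, not Hivemall's implementation: it assumes each map task fits logistic regression on its own partition with plain SGD (the slide does not say what logress does internally), and the function names and toy partitions are hypothetical.

```python
import math
from collections import defaultdict


def train_partition(rows, eta=0.1, epochs=20):
    """Map side: fit logistic regression on one partition with plain SGD."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            margin = sum(weights[name] * value for name, value in features.items())
            predicted = 1.0 / (1.0 + math.exp(-margin))
            for name, value in features.items():
                weights[name] += eta * (label - predicted) * value
    # Emit (feature, weight) pairs, like the inner SELECT.
    return list(weights.items())


def average_models(partial_models):
    """Reduce side: GROUP BY feature, then avg(weight)."""
    grouped = defaultdict(list)
    for model in partial_models:
        for name, weight in model:
            grouped[name].append(weight)
    return {name: sum(ws) / len(ws) for name, ws in grouped.items()}


# Two hypothetical partitions of the training table.
partition_a = [({"bias": 1.0, "x": 2.0}, 1), ({"bias": 1.0, "x": -1.5}, 0)]
partition_b = [({"bias": 1.0, "x": 1.0}, 1), ({"bias": 1.0, "x": -2.0}, 0)]

model = average_models([train_partition(partition_a), train_partition(partition_b)])
```

Each partition is trained independently, so the map side needs no coordination; only the small (feature, weight) pairs are shuffled to the reducers.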
Industry use cases of Hivemall
➢ CTR prediction of Ad click logs
  • Freakout Inc., Fan Communications, and more
  • Replaced Spark MLlib w/ Hivemall at company X
http://www.slideshare.net/masakazusano75/sano-hmm-20150512
Industry use cases of Hivemall
➢ Gender prediction of Ad click logs
  • Scaleout Inc. and Fan Communications
http://eventdots.jp/eventreport/458208
Industry use cases of Hivemall
➢ Value prediction of Real estates
  • Livesense
http://www.slideshare.net/y-ken/real-estate-tech-with-hivemall
Industry use cases of Hivemall
➢ Churn Detection
  • OISIX
http://www.slideshare.net/TaisukeFukawa/hivemall-meetup-vol2-oisix
Agenda
1. Short Introduction to Hivemall
   ✓ Hivemall use-cases
2. Recent Updates
3. Roadmap of Hivemall
   ✓ coming new features
Recent Releases
v0.4.2-rc.2
➢ Released on 2016/06/28
➢ minor hotfixes
➢ The latest release
Recent Releases
v0.4.2-rc.1
➢ Released on 2016/06/07
➢ Hivemall on Spark v1.6
➢ Kudos to @maropu
➢ BPR-MF (Matrix Factorization for Implicit Feedbacks)
Hivemall on Apache Spark
Installation is very easy as follows:
$ spark-shell --packages maropu:hivemall-spark:0.0.6
Feature Hashing
Frequently used technique to deal with high-dimensional data
[Figure labels: high-dimensional, low-dimensional]
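As a rough illustration of the idea, here is a minimal Python sketch of the hashing trick. It assumes CRC32 as the hash function purely because it is deterministic and in the standard library; the slide does not say which hash Hivemall uses, and the function name and sample row are hypothetical.

```python
import zlib


def hash_features(features, num_buckets=16):
    """Map {name: value} into a fixed-size vector by hashing feature names."""
    vector = [0.0] * num_buckets
    for name, value in features.items():
        index = zlib.crc32(name.encode("utf-8")) % num_buckets
        vector[index] += value  # colliding features share a bucket
    return vector


row = {"user=alice": 1.0, "ad_id=42": 1.0, "hour": 13.0}
hashed = hash_features(row, num_buckets=8)
```

The output length depends only on num_buckets, not on how many distinct feature names exist, which is what keeps the model size bounded.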
Kernel trick
[Figure: Input Feature Space and Mapped Feature Space]
Map to a higher dimension
Draw a hyperplane in the higher-dimensional space; this gives a nonlinear separation in the lower-dimensional space
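A small Python sketch may make the mapping concrete. It assumes a homogeneous degree-2 polynomial kernel and XOR-like toy data, neither of which appears on the slide; all names are hypothetical.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D point."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# XOR-like data: no straight line in 2-D separates the two classes.
points = [((1.0, 1.0), 1), ((-1.0, -1.0), 1), ((1.0, -1.0), -1), ((-1.0, 1.0), -1)]

# In the mapped space the hyperplane w . phi(x) = 0 with w = (0, 1, 0) does.
w = (0.0, 1.0, 0.0)
predictions = [1 if dot(w, phi(x)) > 0 else -1 for x, _ in points]
```

The kernel returns the same value as the inner product of the mapped points, so a learner that only needs inner products never has to build phi(x) explicitly.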
Polynomial Expansion
For two-dimensional features [a, b], the degree-2 polynomial features are [(1,) a, b, a^2, ab, b^2].
[Figure labels: high-dimensional, low-dimensional]
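The expansion for [a, b] can be reproduced with a few lines of Python. This is a generic sketch of polynomial expansion without the constant term, not Hivemall's UDF; the function name is hypothetical.

```python
from itertools import combinations_with_replacement


def polynomial_features(values, degree=2):
    """Return all monomials of the inputs from degree 1 up to `degree`."""
    expanded = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(values, d):
            product = 1.0
            for v in combo:
                product *= v
            expanded.append(product)
    return expanded


a, b = 2.0, 3.0
features = polynomial_features([a, b], degree=2)  # [a, b, a^2, ab, b^2]
```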
Polynomial Expansion
b^b:1.0 and b^b^b:1.0 are omitted w/ truncate option
a^a:0.25 and c^c:0.09 are omitted w/ interaction-only option
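The two options can be sketched in Python for the degree-2 case. The input values (a:0.5, b:1.0, c:0.3) are inferred from the products quoted on the slide, and the option semantics below model only what the slide states; the actual UDF may prune more terms. The function and parameter names are hypothetical.

```python
from itertools import combinations_with_replacement


def expand_named(features, interaction_only=False, truncate=False):
    """Degree-2 expansion of name:value features with two assumed options."""
    items = sorted(features.items())
    out = dict(items)
    for (n1, v1), (n2, v2) in combinations_with_replacement(items, 2):
        same = n1 == n2
        if interaction_only and same:
            continue  # drop self-products such as a^a and c^c
        if truncate and same and v1 == 1.0:
            continue  # drop powers of a unit-valued feature such as b^b
        out[f"{n1}^{n2}"] = v1 * v2
    return out


row = {"a": 0.5, "b": 1.0, "c": 0.3}
full = expand_named(row)
truncated = expand_named(row, truncate=True)
interactions = expand_named(row, interaction_only=True)
```

Powers of a feature whose value is 1.0 carry no extra information, which is presumably why truncation drops them.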
Feature Vector formatter Functions
Quantitative variables are formatted as "column name:value" and categorical variables as "column name#value". Features that are null or have a weight of 0.0 are not created.
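The formatting rule can be sketched in Python as follows. This only mirrors the rule stated on the slide; the function name and the sample columns are hypothetical and are not Hivemall's API.

```python
def format_features(quantitative, categorical):
    """Build a feature vector of 'name:value' and 'name#value' strings."""
    out = []
    for name, value in quantitative.items():
        if value is None or value == 0.0:
            continue  # null or zero-weight features are not created
        out.append(f"{name}:{value}")
    for name, value in categorical.items():
        if value is None:
            continue  # null categorical values are skipped as well
        out.append(f"{name}#{value}")
    return out


row = format_features(
    {"age": 33.0, "income": 0.0, "height": None},
    {"gender": "F", "city": None},
)
```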
Mini-batch Gradient Descent
Caution: Mini-batch generally requires more iterations than SGD
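One way to see why is to count weight updates per pass over the data. The Python sketch below assumes a one-parameter least-squares model and a fixed learning rate, both chosen only for illustration; it is not Hivemall's training loop.

```python
def sgd_epoch(data, w, eta):
    """One pass of plain SGD for y ~ w * x: one update per example."""
    updates = 0
    for x, y in data:
        w -= eta * (w * x - y) * x
        updates += 1
    return w, updates


def minibatch_epoch(data, w, eta, batch_size):
    """One pass of mini-batch GD: one update per batch, using the mean gradient."""
    updates = 0
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        grad = sum((w * x - y) * x for x, y in batch) / len(batch)
        w -= eta * grad
        updates += 1
    return w, updates


data = [(float(x), 2.0 * x) for x in range(1, 9)]  # true weight is 2.0
w_sgd, n_sgd = sgd_epoch(data, w=0.0, eta=0.01)
w_mb, n_mb = minibatch_epoch(data, w=0.0, eta=0.01, batch_size=4)
```

With the same learning rate, the mini-batch run makes fewer updates per pass and ends further from the true weight after one pass on this toy data, so it needs more passes to catch up.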
Japanese Tokenizer using Kuromoji
This feature was requested by a Treasure Data customer
Thanks for providing a reference implementation to us (company R)
Agenda
1. Short Introduction to Hivemall
   ✓ Hivemall use-cases
2. Recent Updates
3. Roadmap of Hivemall
   ✓ coming new features
Important Announcement
Hivemall will become Apache Hivemall (?)
Now on voting though..
Apache Incubation status
Initial committers
• Makoto Yui <Treasure Data>
• Takeshi Yamamuro <NTT>
  ➢ Hivemall on Apache Spark
• Daniel Dai <Hortonworks>
  ➢ Hivemall on Apache Pig
  ➢ Apache Pig PMC member
• Tsuyoshi Ozawa <NTT>
  ➢ Apache Hadoop PMC member
• Kai Sasaki <Treasure Data>
Project mentors
Champion
• Roman Shaposhnik <Pivotal, ASF member>
  Apache Bigtop / Incubator PMC member
Nominated Mentors
• Reynold Xin <Databricks, ASF member>
  Apache Spark PMC member
• Markus Weimer <Microsoft, ASF member>
  Apache REEF PMC member
• Xiangrui Meng <Databricks, ASF member>
  Apache Spark PMC member
Roadmap
• Possibly enter Apache Incubator in Sept, 2016
• IP clearance and project/repository site setup (Sept to Nov)
  • Contribution guideline
  • Create "who uses Hivemall" list
  • More documentation!
• Initial Apache Release in Dec (or late Nov?)
  • v0.5
• Non-Apache release of v0.5-beta.xx will be released on GitHub in Oct
Coming New Features - already merged in Master
✓ Hivemall on Spark 2.0 w/ DataFrame support
  • Kudos to @maropu
✓ ChangeFinder
  • Change Point and Anomaly Detection
  • Kudos to @L3sota @takuti
  • PR #333
✓ XGBoost support
  • Kudos to @maropu
Coming New Features - already merged in Master
✓ ChangeFinder
cf_detect(array<double> x [, const string options])
J. Takeuchi and K. Yamanishi, "A Unifying Framework for Detecting Outliers and Change Points from Time Series," IEEE Transactions on Knowledge and Data Engineering, pp. 482-492, 2006.
Coming New Features - already merged in Master
✓ Various Evaluation Metrics
  • Kudos to @takuti; R2 and logloss were also contributed (Fan-cs, sakai-san)
  • PR #326
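For reference, the two metrics named on this slide can be written out in a few lines of Python. These are the textbook definitions of R2 and logloss, not the code from the pull request; the function names and the clipping constant are assumptions.

```python
import math


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def logloss(labels, probabilities, eps=1e-15):
    """Mean negative log-likelihood for binary labels in {0, 1}."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

R2 is 1.0 for a perfect regression fit and 0.0 for a model that always predicts the mean; logloss penalizes confident wrong probabilities most heavily.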
Coming New Features - already merged in Master
✓ Feature Binning
  • Kudos to @amaya382 on PR #382
  • Maps quantitative variables to bins
Age (quantitative variable) is mapped into a meaningful bin (categorical variable) based on quantiles
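Quantile-based binning of the age example can be sketched in Python as below. The edge-selection rule, the sample ages, and the function names are assumptions made for illustration; the merged Hivemall functions may compute quantiles differently.

```python
from bisect import bisect_right


def quantile_edges(values, num_bins):
    """Interior bin edges at the 1/num_bins, 2/num_bins, ... quantiles."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // num_bins] for k in range(1, num_bins)]


def to_bin(value, edges):
    """Index of the bin that `value` falls into (0 .. len(edges))."""
    return bisect_right(edges, value)


ages = [18, 22, 25, 31, 38, 44, 52, 67]
edges = quantile_edges(ages, num_bins=4)
# Emit the bin as a categorical feature in the name#value style.
binned = [f"age#{to_bin(a, edges)}" for a in ages]
```

Because the edges come from quantiles, each bin receives roughly the same number of rows regardless of how skewed the raw values are.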
Other ongoing new features
• v0.5-beta{1,2} release (Oct-Nov)
  ✓ System test framework
    ✓ Kudos to @amaya382
  ✓ one-hot encoding
    ✓ Kudos to @kai
  ✓ Field-aware Factorization Machines
  ✓ Kernelized Passive Aggressive
    ✓ Kudos to @L3sota
  ✓ Generalized Linear Model
    ✓ Optimizer framework including ADAM
    ✓ L1/L2 regularization
    ✓ Kudos to @maropu
  ✓ Disk-based iteration support
    ✓ To avoid an overly large amplify
  ✓ Gradient Tree Boosting
  ✓ Online LDA
We support machine learning in the Cloud
Any feature request? Or, questions?
bit.ly/td-wants-you