Podling Hivemall in the Apache Incubator

Post on 07-Jan-2017


Transcript of Podling Hivemall in the Apache Incubator

Podling Hivemall in the Apache Incubator

Research Engineer Makoto YUI @myui

<myui@treasure-data.com>

2016/11/08 Apache Hadoop Meetup at CWT2016


Hivemall entered Apache Incubator on Sept 13, 2016 🎉

hivemall.incubator.apache.org

@ApacheHivemall

• Makoto Yui <Treasure Data>
• Takeshi Yamamuro <NTT>
  - Hivemall on Apache Spark
• Daniel Dai <Hortonworks>
  - Hivemall on Apache Pig
  - Apache Pig PMC member
• Tsuyoshi Ozawa <NTT>
  - Apache Hadoop PMC member
• Kai Sasaki <Treasure Data>


Initial committers


Champion

Nominated Mentors


Project mentors

• Reynold Xin <Databricks, ASF member>
  - Apache Spark PMC member
• Markus Weimer <Microsoft, ASF member>
  - Apache REEF PMC member
• Xiangrui Meng <Databricks, ASF member>
  - Apache Spark PMC member

• Roman Shaposhnik <Pivotal, ASF member>
  - Apache Bigtop/Incubator PMC member


What is Apache Hivemall

Scalable machine learning library built as a collection of Hive UDFs


Multi/Cross-platform, Versatile, Scalable, Ease-of-use

Hivemall is easy and scalable…

Classification with Mahout

    CREATE TABLE lr_model AS
    SELECT
      feature, -- reducers perform model averaging in parallel
      avg(weight) as weight
    FROM (
      SELECT logress(features, label, ..) as (feature, weight)
      FROM train
    ) t -- map-only task
    GROUP BY feature; -- shuffled to reducers

ML made easy for SQL developers

Born to be parallel and scalable

This SQL query automatically runs in parallel on a Hadoop cluster
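The comments in the query describe a two-stage pattern: each map task trains on its own split and emits (feature, weight) rows, and the reducers average the weights per feature. Below is a minimal Python sketch of that pattern, not Hivemall's implementation. The plain SGD logistic regression is only a stand-in for what `logress` does internally, and `train_partition`, `average_models`, `eta`, `epochs`, and the toy data are hypothetical names and values chosen for illustration.

```python
import math
from collections import defaultdict


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_partition(rows, eta=0.1, epochs=20):
    """Map side (assumed stand-in for the trainer): plain SGD logistic
    regression over one split. `rows` is a list of (features, label) pairs,
    where features maps feature name -> value and label is 0 or 1.
    Emits (feature, weight) pairs, like the inner SELECT."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            z = sum(weights[f] * v for f, v in features.items())
            error = label - sigmoid(z)
            for f, v in features.items():
                weights[f] += eta * error * v
    return list(weights.items())


def average_models(partial_models):
    """Reduce side: the equivalent of GROUP BY feature with avg(weight)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for model in partial_models:
        for feature, weight in model:
            sums[feature] += weight
            counts[feature] += 1
    return {f: sums[f] / counts[f] for f in sums}


# Two toy splits standing in for the input of two map tasks.
partitions = [
    [({"bias": 1.0, "x": 2.0}, 1), ({"bias": 1.0, "x": -2.0}, 0)],
    [({"bias": 1.0, "x": 1.5}, 1), ({"bias": 1.0, "x": -1.0}, 0)],
]
lr_model = average_models(train_partition(p) for p in partitions)
```

Because each split is trained independently, the map side needs no coordination; only the small (feature, weight) rows are shuffled to the reducers.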


Ease-of-use

Scalable


Hivemall is a multi/cross-platform ML library

HiveQL, SparkSQL/DataFrame API, Pig Latin

Hivemall is Multi/Cross-platform..

Multi/Cross-platform

Prediction models built by Hive can be used from Spark, and conversely, prediction models built by Spark can be used from Hive.
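One way to see why a model can move between engines: a linear model such as `lr_model` is just a table of (feature, weight) rows, so any engine that can read the table can score with it. The Python sketch below assumes a logistic-regression model and scores by taking the sigmoid of the weighted feature sum; the `predict` helper and the weight values are hypothetical and are not taken from the slides.

```python
import math


def predict(model_rows, features):
    """Score one example against a model stored as (feature, weight) rows.
    Features missing from the model contribute a weight of 0."""
    weights = dict(model_rows)
    z = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical rows, as they might be read back from a model table.
lr_model_rows = [("bias", -0.5), ("x", 1.2)]
p = predict(lr_model_rows, {"bias": 1.0, "x": 2.0})
```

Nothing in the scoring step depends on which engine produced the rows, which is the property the slide describes.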


Hivemall on Apache Hive


Hivemall on Apache Spark DataFrame


Hivemall on SparkSQL


Hivemall on Apache Pig


Versatile

Hivemall is a Versatile library..

✓ Hivemall is not only for Machine Learning

✓ Hivemall provides a bunch of generic utility functions (e.g., top-k, NLP)
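To make the top-k example concrete, here is a minimal Python sketch of a per-group top-k, the kind of generic preprocessing step the slide has in mind. It is not Hivemall's implementation; `top_k_per_group` and the sample rows are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_per_group(rows, k):
    """Keep the k highest-scoring items in each group.
    `rows` is an iterable of (group, score, item) tuples."""
    groups = defaultdict(list)
    for group, score, item in rows:
        groups[group].append((score, item))
    # nlargest returns the entries sorted from highest to lowest score.
    return {g: heapq.nlargest(k, scored) for g, scored in groups.items()}


rows = [
    ("a", 0.9, "x"),
    ("a", 0.1, "y"),
    ("a", 0.5, "z"),
    ("b", 0.3, "w"),
]
top2 = top_k_per_group(rows, 2)
```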

Each organization has its own sets of UDFs for data preprocessing!

Don’t Repeat Yourself!

Conclusion and Takeaway

Hivemall is a machine learning library that is…


We welcome your contributions to Apache Hivemall :-)

Multi/Cross-platform, Versatile, Scalable, Ease-of-use

hivemall.incubator.apache.org