Podling Hivemall in the Apache Incubator

Podling Hivemall in the Apache Incubator Research Engineer Makoto YUI @myui <[email protected]> 1 2016/11/08 Apache Hadoop Meetup at CWT 2016

Transcript of Podling Hivemall in the Apache Incubator

Page 1: Podling Hivemall in the Apache Incubator

Podling Hivemall in the Apache Incubator

Research Engineer Makoto YUI @myui

2016/11/08 Apache Hadoop Meetup at CWT 2016

Page 2: Podling Hivemall in the Apache Incubator


Hivemall entered the Apache Incubator on Sept 13, 2016 🎉

hivemall.incubator.apache.org

@ApacheHivemall

Page 3: Podling Hivemall in the Apache Incubator

Initial committers

• Makoto Yui <Treasure Data>
• Takeshi Yamamuro <NTT>
  - Hivemall on Apache Spark
• Daniel Dai <Hortonworks>
  - Hivemall on Apache Pig
  - Apache Pig PMC member
• Tsuyoshi Ozawa <NTT>
  - Apache Hadoop PMC member
• Kai Sasaki <Treasure Data>

Page 4: Podling Hivemall in the Apache Incubator

Project mentors

Champion

• Roman Shaposhnik <Pivotal, ASF member>
  Apache Bigtop / Incubator PMC member

Nominated Mentors

• Reynold Xin <Databricks, ASF member>
  Apache Spark PMC member
• Markus Weimer <Microsoft, ASF member>
  Apache REEF PMC member
• Xiangrui Meng <Databricks, ASF member>
  Apache Spark PMC member

Page 5: Podling Hivemall in the Apache Incubator

What is Apache Hivemall

Scalable machine learning library built as a collection of Hive UDFs

Multi/Cross-platform, Versatile, Scalable, Ease-of-use

Page 6: Podling Hivemall in the Apache Incubator

Hivemall is easy and scalable…

Classification with Mahout

CREATE TABLE lr_model AS
SELECT
  feature,                 -- reducers perform model averaging in parallel
  avg(weight) as weight
FROM (
  SELECT logress(features, label, ..) as (feature, weight)
  FROM train
) t                        -- map-only task
GROUP BY feature;          -- shuffled to reducers

ML made easy for SQL developers

Born to be parallel and scalable

This SQL query automatically runs in parallel on a Hadoop cluster

Ease-of-use

Scalable
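As a rough illustration of what the query on this slide computes, here is a minimal plain-Python sketch of the same pattern: each split is trained independently (the map-only step), the emitted (feature, weight) pairs are grouped by feature, and the weights are averaged (the reduce step). The function names `train_partition` and `average_models`, the toy splits, and the learning rate and epoch count are all hypothetical. A generic SGD logistic regression stands in for `logress` here; it is not Hivemall's implementation.

```python
import math
from collections import defaultdict


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_partition(rows, eta=0.1, epochs=20):
    """Map side (assumed stand-in for logress): SGD over one split.

    rows is a list of (features, label) where features maps name -> value.
    Returns the (feature, weight) pairs this split would emit.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            z = sum(w[f] * v for f, v in features.items())
            err = label - sigmoid(z)
            for f, v in features.items():
                w[f] += eta * err * v
    return list(w.items())


def average_models(emitted):
    """Reduce side: the equivalent of GROUP BY feature with avg(weight)."""
    total = defaultdict(float)
    count = defaultdict(int)
    for feature, weight in emitted:
        total[feature] += weight
        count[feature] += 1
    return {f: total[f] / count[f] for f in total}


# Hypothetical toy data: two input splits, each handled by its own "mapper".
split_a = [({"bias": 1.0, "x": 2.0}, 1), ({"bias": 1.0, "x": -2.0}, 0)]
split_b = [({"bias": 1.0, "x": 1.5}, 1), ({"bias": 1.0, "x": -1.0}, 0)]

emitted = train_partition(split_a) + train_partition(split_b)
lr_model = average_models(emitted)  # one averaged weight per feature
```

Because the splits never share state during training, the map side can run on as many nodes as there are splits; only the small (feature, weight) pairs are shuffled.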

Page 7: Podling Hivemall in the Apache Incubator

Hivemall is a multi/cross-platform ML library

HiveQL, SparkSQL / DataFrame API, Pig Latin

Hivemall is Multi/Cross-platform..

Multi/Cross-platform

Prediction models built by Hive can be used from Spark, and conversely, prediction models built by Spark can be used from Hive
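One way to see why a model can move between engines: the training query earlier in the deck stores the model as an ordinary table of (feature, weight) rows, so any engine that can read the table and sum products can score with it. The sketch below shows that scoring step in plain Python, assuming a logistic regression model. The `model_rows` values and the `predict` helper are hypothetical and not part of Hivemall.

```python
import math

# Hypothetical model table: one (feature, weight) row per feature.
model_rows = [("bias", -0.5), ("x", 1.2), ("y", -0.7)]


def predict(model_rows, features):
    """Score one example: join on feature, sum weight * value, apply sigmoid.

    Features missing from the model contribute nothing, like rows that
    find no match in a left outer join.
    """
    weights = dict(model_rows)
    z = sum(weights.get(f, 0.0) * v for f, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


p = predict(model_rows, {"bias": 1.0, "x": 2.0, "unseen": 3.0})
```

Nothing in the scoring step depends on which engine produced `model_rows`, which is the property the slide describes.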

Page 8: Podling Hivemall in the Apache Incubator

Hivemall on Apache Hive

Page 9: Podling Hivemall in the Apache Incubator

Hivemall on Apache Spark DataFrame

Page 10: Podling Hivemall in the Apache Incubator

Hivemall on SparkSQL

Page 11: Podling Hivemall in the Apache Incubator

Hivemall on Apache Pig

Page 12: Podling Hivemall in the Apache Incubator

Versatile

Hivemall is a Versatile library..

✓ Hivemall is not only for Machine Learning

✓ Hivemall provides a bunch of generic utility functions (e.g., top-k, NLP)

Each organization has its own sets of UDFs for data preprocessing!

Don’t Repeat Yourself!
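To make the top-k example concrete, here is a minimal plain-Python sketch of what a per-group top-k utility computes: for each group, keep the k highest-scoring items. The function name `top_k_per_group` and the sample rows are hypothetical; this illustrates the operation only and is not Hivemall's API or implementation.

```python
import heapq
from collections import defaultdict


def top_k_per_group(rows, k):
    """rows is an iterable of (group, item, score) tuples.

    Returns {group: [items]} with each list ordered by descending score.
    """
    groups = defaultdict(list)
    for group, item, score in rows:
        groups[group].append((score, item))
    return {
        g: [item for _, item in heapq.nlargest(k, pairs)]
        for g, pairs in groups.items()
    }


# Hypothetical sample: scored items per user.
rows = [
    ("u1", "a", 0.9), ("u1", "b", 0.4), ("u1", "c", 0.7),
    ("u2", "d", 0.2), ("u2", "e", 0.8),
]
top2 = top_k_per_group(rows, 2)
```

This is the kind of preprocessing helper that teams tend to rewrite as private UDFs, which is the repetition the slide argues against.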

Page 13: Podling Hivemall in the Apache Incubator

Conclusion and Takeaway

Hivemall is a machine learning library that is…

Multi/Cross-platform, Versatile, Scalable, Ease-of-use

We welcome your contributions to Apache Hivemall 🙂

hivemall.incubator.apache.org