Comparison of Hivemall and Spark MLlib

  • @myui

    Comparison of Hivemall and Spark MLlib

    1 / 16

    2014/12/11

  • Disclaimer

    Hivemall [1]

    [1] http://www.slideshare.net/myui/sigmodj-myui

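  • Hivemall and Spark MLlib

    [Figure: software stack. Storage: Hadoop HDFS. Resource managers: Apache YARN, Apache Mesos. Execution engines: MapReduce (MRv1), MRv2, Apache Tez (DAG), Apache Spark (DAG). Query/scripting layers: Hive/PIG, SparkSQL. Machine learning: Hivemall, Spark MLlib.]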

  • SQL

    Hivemall

    Mahout

    CREATE TABLE lr_model AS
    SELECT
      feature, -- reducers perform model averaging in parallel
      avg(weight) as weight
    FROM (
      SELECT logress(features,label,..) as (feature,weight)
      FROM train
    ) t -- map-only task
    GROUP BY feature; -- shuffled to reducers
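As a rough illustration of what the GROUP BY feature / avg(weight) step of this query computes, here is a minimal Python sketch. The rows and the helper name `average_weights` are hypothetical; in the query itself the (feature, weight) rows are produced by logress in the map tasks.

```python
from collections import defaultdict


def average_weights(rows):
    """Average weights per feature, like GROUP BY feature with avg(weight)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for feature, weight in rows:
        totals[feature] += weight
        counts[feature] += 1
    return {f: totals[f] / counts[f] for f in totals}


# Hypothetical (feature, weight) rows emitted by two map tasks.
mapper_a = [("age", 0.5), ("clicks", 1.0)]
mapper_b = [("age", 1.5), ("region", -2.0)]

# All rows are shuffled by feature, then averaged.
lr_model = average_weights(mapper_a + mapper_b)
```

A feature seen by only one map task keeps that task's weight, because the average runs over the rows that exist for the feature, which is also how avg behaves in SQL.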

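  • Spark and Hadoop

    MapReduce: IN/OUT on HDFS; Tez: HDFS IN/OUT

    [Figure: two diagrams of iterative processing. In the first, Input goes through an HDFS read before and an HDFS write after each of iter. 1, iter. 2, . . .; in the second (Spark), Input feeds iter. 1 and iter. 2 directly.]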

  • KDD Cup 2012 Track2

    KDD Cup 2012, Track 2(Public

    25CTR

    Spark

    Spark: AUC 0.6

  • KDD Cup 2010

    KDD Cup 2010a2Public (: 822.73GB

    Spark

  • https://speakerdeck.com/lintool/large-scale-machine-learning-at-twitter

    w

  • $w_{t+1} = w_t - \gamma_t \frac{1}{n} \sum_{i=0}^{n} \nabla \ell(f(x_i; w_t), y_i)$

    (Gradient Descent)
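To make the gradient descent update concrete, here is a minimal single-machine sketch in Python. It assumes a logistic loss with 0/1 labels and a tiny hand-made dataset; the function names, learning rate, and data are illustrative choices, not taken from the slides.

```python
import math


def predict(w, x):
    """Logistic model f(x; w): probability that the label is 1."""
    return 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, x))))


def gd_step(w, data, lr):
    """One update: w <- w - lr * (1/n) * sum of per-example gradients."""
    total = [0.0] * len(w)
    for x, y in data:
        p = predict(w, x)
        for j, xj in enumerate(x):
            total[j] += (p - y) * xj  # gradient of the logistic loss
    return [wj - lr * tj / len(data) for wj, tj in zip(w, total)]


def log_loss(w, data):
    """Mean logistic loss over the dataset."""
    return -sum(
        y * math.log(predict(w, x)) + (1 - y) * math.log(1.0 - predict(w, x))
        for x, y in data
    ) / len(data)


# Toy data: x = [bias, value], label 1 when value is positive (assumption).
data = [([1.0, 2.0], 1), ([1.0, -1.0], 0), ([1.0, 3.0], 1), ([1.0, -2.0], 0)]

w = [0.0, 0.0]
loss_before = log_loss(w, data)
for _ in range(100):
    w = gd_step(w, data, lr=0.5)
loss_after = log_loss(w, data)
```

Every step reads the whole dataset once, which is the cost that the parallelization approaches discussed next try to spread across machines.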

  • 1. Distributed Gradient

    2. Parameter Mixing

    2

    Hivemall, Jubatus, Down Pour SGD; Spark MLlib; Vowpal Wabbit (AllReduce)

  • Distributed Gradient

    $w_{t+1} = w_t - \gamma_t \frac{1}{n} \sum_{i=0}^{n} \nabla \ell(f(x_i; w_t), y_i)$

    mappers

    single reducer

    mapper, reducer

    SGD
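The mapper and single-reducer split on this slide can be sketched in plain Python as follows. This is a simulation under stated assumptions, not Spark or Hadoop code: the shards, the logistic loss with 0/1 labels, and the function names `partial_gradient` and `reduce_and_update` are all hypothetical.

```python
import math


def partial_gradient(w, shard):
    """Mapper: sum the logistic-loss gradients over one data shard."""
    total = [0.0] * len(w)
    for x, y in shard:
        z = sum(wj * xj for wj, xj in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for j, xj in enumerate(x):
            total[j] += (p - y) * xj
    return total, len(shard)


def reduce_and_update(w, partials, lr):
    """Single reducer: add up the partial sums and apply one model update."""
    n = sum(count for _, count in partials)
    summed = [sum(g[j] for g, _ in partials) for j in range(len(w))]
    return [wj - lr * sj / n for wj, sj in zip(w, summed)]


# Two shards standing in for the input splits of two mappers (assumption).
shards = [
    [([1.0, 2.0], 1), ([1.0, -1.0], 0)],
    [([1.0, 3.0], 1), ([1.0, -2.0], 0)],
]

first_step = reduce_and_update(
    [0.0, 0.0], [partial_gradient([0.0, 0.0], s) for s in shards], lr=0.5
)

w = [0.0, 0.0]
for _ in range(50):  # each iteration is one full map + reduce round
    partials = [partial_gradient(w, shard) for shard in shards]
    w = reduce_and_update(w, partials, lr=0.5)
```

The result of each round matches a single-machine gradient descent step over the union of the shards; the price is one synchronization through the reducer per model update.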

  • Distributed Gradient with Mini-batch Updates

    CTR 0.2%, SGD

  • Spark MLlib?

    val data = ..

    for (i

  • Parameter Mixing

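    $w_{t+1} = w_t - \gamma_t \nabla \ell(f(x; w_t), y)$

    [Figure: a Training table of tuples labeled +1 or -1 feeds several train tasks; each train task keeps its weights in an array and updates them by SGD; the arrays are combined by MIX.]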
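A minimal Python sketch of parameter mixing, assuming each train task runs per-example SGD on its own part of the training table and the resulting weight arrays are averaged after every pass. The logistic loss with +1/-1 labels, the shard contents, and the names `sgd_epoch` and `mix` are assumptions made for illustration only.

```python
import math


def sgd_epoch(w, shard, lr):
    """One train task: per-example SGD over its own tuples."""
    w = list(w)  # local copy of the weight array
    for x, y in shard:
        z = sum(wj * xj for wj, xj in zip(w, x))
        scale = y / (1.0 + math.exp(y * z))  # from log(1 + exp(-y * z))
        for j, xj in enumerate(x):
            w[j] += lr * scale * xj
    return w


def mix(models):
    """MIX: element-wise average of the weight arrays."""
    return [sum(ws) / len(models) for ws in zip(*models)]


# Each shard is the slice of the training table seen by one train task.
shards = [
    [([1.0, 2.0], +1), ([1.0, -1.0], -1)],
    [([1.0, 3.0], +1), ([1.0, -2.0], -1)],
]

w = [0.0, 0.0]
for _ in range(20):
    local_models = [sgd_epoch(w, shard, lr=0.5) for shard in shards]
    w = mix(local_models)
```

Unlike the distributed gradient scheme, each task here updates its weights after every tuple and only exchanges weights at mixing time, so far fewer synchronization points are needed per pass over the data.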

  • Hivemall

    [Figure: nodes 1, 2, ..., N and a Mix Server]

  • [Figure: nodes 1, 2, ..., N]

    Line rate; FPGA

    FPGA NIC accumulator..