Petuum - A new Platform


  • 7/26/2019 Petuum - A new Platform

    1/15

    PETUUM: A New Platform for Distributed Machine Learning on Big Data

    - Seminar on Big Data Architectures for Machine Learning and Data Mining

  • 2/15

    Outline

    Introduction: the need, current problems

    Motivation: why build a new framework?! scalability

    Components & Approach: key concepts, data-parallelism, model-parallelism

    System Design: internals - schematic view, scheduler, workers

    YARN compatibility: problems & solutions proposed

    Performance: comparison with different platforms

  • 3/15

    Introduction

    Machine Learning is becoming a primary mechanism for extracting information from data.

    ML methods need to scale beyond a single machine: Flickr, Instagram and Facebook alone hold 10s of billions of images.

    It is highly inefficient to feed such big data sequentially, in a batch fashion, to an ML algorithm.

    Despite rapid development of many new ML models and algorithms aimed at scalable application, adoption of these technologies remains generally unseen in the wider data mining, NLP, vision, and other application communities.

    The difficult migration from an academic implementation (small desktop PCs or academic clusters) to a big, less predictable platform (a cloud or a corporate cluster) prevents these models and algorithms from being widely applied.

  • 4/15

    Motivation

    Find a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial-scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes).

    Modern parallelization strategies employ fine-grained operations, making it difficult to find a universal platform applicable to a wide range of ML programs at scale.

    Problems with other platforms

    Hadoop:

    o the simplicity of MapReduce makes it difficult to exploit ML properties (e.g., error tolerance)

    o performance on many ML programs has been surpassed by alternatives

    Spark:

    o does not offer the fine-grained scheduling of computation and communication needed for fast and correct execution of advanced ML algorithms.

  • 5/15

    Motivation

    LDA - 20m unique words and 11k topics (220b sparse parameters) on a cluster with 256 cores and 1TB memory in total.

    MF - a 20m-by-20k matrix with rank 400 (8b parameters) on a cluster with 128 cores and 1TB memory in total.

    CNN - 1b parameters on a cluster with 1024 CPU cores and 2TB memory in total; GPUs not required.

    Fig. 1: The scale of Big ML efforts in recent literature. A key goal of Petuum is to enable large ML models to be run on fewer resources, even relative to highly-specialized implementations.

  • 6/15

    Components & Approach

    Formalized ML algorithms as iterative-convergent programs:

    o stochastic gradient descent

    o MCMC for determining point estimates in latent variable models

    o coordinate descent, variational methods for graphical models

    o proximal optimization for structured sparsity problems, and others

    Found the properties shared across all of these algorithms.

    The key lies in recognizing a clear dichotomy (division) between DATA and MODEL.

    This inspired a bimodal approach to parallelism: data-parallel and model-parallel distribution and execution of a big ML program over a cluster of machines.

  • 7/15

    Components & Approach

    This approach exploits the unique statistical nature of ML algorithms, mainly three properties:

    Error tolerance: iterative-convergent algorithms are robust against limited errors in intermediate calculations.

    Dynamic structural dependency: the changing correlation strengths between model parameters must be respected for efficient parallelization.

    Non-uniform convergence: the number of steps required for a parameter to converge can be highly skewed across parameters.

    Fig. 2: Key properties of ML algorithms: (a) non-uniform convergence; (b) error-tolerant convergence; (c) dependency structures amongst variables.

  • 8/15

    Components & Approach

    Iterative-Convergent ML Algorithm: given data D and loss L (i.e., a fitness function such as the likelihood), a typical ML problem can be grounded as executing the following update equation iteratively, until the model state (i.e., parameters and/or latent variables) A reaches some stopping criteria:

    A(t) = F(A(t-1), Delta_L(A(t-1), D))

    The update function Delta_L() (which improves the loss L) performs computation on data D and model state A, and outputs intermediate results to be aggregated by F().
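This update template can be sketched in a few lines of Python (an illustrative toy, not Petuum code), instantiating Delta_L as a gradient step on a one-dimensional least-squares loss and F as simple addition:

```python
# Toy instance of the iterative-convergent template
# A(t) = F(A(t-1), Delta_L(A(t-1), D)),
# here: gradient descent on L(A) = sum_i (A*x_i - y_i)^2.

def delta(A, D, lr=0.01):
    """Update function Delta_L: a step that improves the loss L."""
    return -lr * sum(2 * (A * x - y) * x for x, y in D)

def F(A, update):
    """Aggregation function F: folds the intermediate result into the state."""
    return A + update

def solve(D, A=0.0, iters=200):
    for _ in range(iters):
        A = F(A, delta(A, D))  # one application of the update equation
    return A

# Data generated from y = 3x, so the model state A should converge near 3.
D = [(x, 3.0 * x) for x in [1.0, 2.0, 3.0]]
print(round(solve(D), 3))
```

The stopping criterion here is just a fixed iteration count; a real program would monitor the loss or the size of the update.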

  • 9/15

    Components & Approach

    Data-parallelism:

    o the data D is partitioned and assigned to computational workers

    o we assume that the update function Delta() can be applied to each of these data subsets independently

    o the results of Delta() are aggregated via summation

    o each parallel worker contributes equally

    Model-parallelism:

    o the model A is partitioned and assigned to workers

    o unlike data-parallelism, each update takes a scheduling function S_p(t-1)() which restricts Delta() to operate on a subset of the model parameters

    o the model parameters are not independent of each other

    o each parallel worker contributes equally

    Fig. 3: Petuum Data-Parallelism. Fig. 4: Petuum Model-Parallelism.
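The data-parallel pattern above can be sketched as follows (illustrative Python, not Petuum code). The model-parallel variant would instead partition the coordinates of A and use a scheduling function to pick which subset each worker may update:

```python
# Data-parallelism sketch: D is partitioned into shards, Delta() runs on
# each shard independently, and the per-shard results are aggregated via
# summation because the updates are additive.

def delta(A, shard, lr=0.005):
    # Per-shard gradient step for L(A) = sum_i (A*x_i - y_i)^2.
    return -lr * sum(2 * (A * x - y) * x for x, y in shard)

def data_parallel_step(A, shards):
    # In a real system each shard's Delta() runs on a different worker;
    # here the "workers" are just iterations of a generator expression.
    return A + sum(delta(A, s) for s in shards)

D = [(float(x), 2.0 * x) for x in range(1, 7)]  # data from y = 2x
shards = [D[0:2], D[2:4], D[4:6]]               # partition D across 3 workers

A = 0.0
for _ in range(300):
    A = data_parallel_step(A, shards)
print(round(A, 3))
```

Summation is valid here precisely because each worker's contribution is an additive gradient term, which is the property the slide's "aggregated via summation" bullet relies on.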

  • 10/15

    System design

    Petuum Goal: allow users to easily implement data-parallel and model-parallel ML algorithms!

    Parameter Server (PS):

    o enables data-parallelism

    o global read/write access to model parameters

    o three functions: PS.get(), PS.inc(), PS.put()

    Scheduler:

    o enables model-parallelism; controls which model parameters are assigned to worker machines

    o the scheduling function schedule() outputs a set of parameters for each worker

    Workers:

    o receive the parameters to be updated from schedule(), and then run parallel update functions push()

    Fig. 5: Petuum scheduler, workers, parameter servers.
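The division of labor can be sketched as a toy in Python (the names ps_get, ps_inc, schedule and push only mirror the concepts above; this is not Petuum's actual API):

```python
# Hedged sketch of the scheduler / worker / parameter-server split.

params = {"w0": 0.0, "w1": 0.0}         # parameter-server table

def ps_get(key):                         # cf. PS.get()
    return params[key]

def ps_inc(key, delta):                  # cf. PS.inc(): additive update
    params[key] += delta

def schedule(round_id, workers):
    # Scheduler: assign a disjoint subset of parameters to each worker,
    # so concurrent updates never touch the same coordinate.
    keys = sorted(params)
    return {w: [keys[(w + round_id) % len(keys)]] for w in range(workers)}

def push(worker_id, my_keys):
    # Worker: read assigned parameters, compute an update, write it back.
    for k in my_keys:
        ps_inc(k, 0.5 * (1.0 - ps_get(k)))  # toy update pulling each w to 1

for r in range(20):
    for worker_id, my_keys in schedule(r, workers=2).items():
        push(worker_id, my_keys)

print({k: round(v, 3) for k, v in params.items()})
```

Because schedule() hands out disjoint key sets, the workers' push() calls could safely run in parallel; that is the safety property the scheduler provides in model-parallel execution.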

  • 11/15

    System design

    Bösen — parameter server for data-parallel Big Data AI & ML:

    o uses a consistency model (bounded staleness) which allows asynchronous-like performance with bulk-synchronous-like execution, yet does not sacrifice ML algorithm correctness

    STRADS — scheduler for model-parallel High Speed AI & ML:

    o performs fine-grained scheduling of ML update operations

    o prioritizes computation on the parts of the ML program that need it most

    o avoids unsafe parallel operations that could hurt performance

    Poseidon — distributed GPU-based deep learning framework built on Caffe and Bösen:

    o enjoys the speedup from the communication and bandwidth-management features of Bösen
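The bounded-staleness idea behind Bösen can be illustrated with a toy clock check (an illustrative sketch of the concept, not Bösen's implementation): a worker may run ahead of the slowest worker by at most a fixed staleness bound, and within that bound it proceeds without synchronizing:

```python
# Stale Synchronous Parallel (SSP) sketch: exploit the error tolerance of
# iterative-convergent ML by letting fast workers read slightly stale
# parameters instead of waiting at a barrier every iteration.

def may_advance(worker_clock, all_clocks, staleness):
    """A worker may start its next iteration only if it is less than
    `staleness` iterations ahead of the slowest worker."""
    return worker_clock - min(all_clocks) < staleness

clocks = [5, 3, 4]   # current iteration of each of 3 workers
s = 3                # staleness bound

print(may_advance(clocks[0], clocks, s))   # 5 - 3 = 2 < 3: may proceed
print(may_advance(6, [6, 3, 4], s))        # 6 - 3 = 3, not < 3: must wait
```

With staleness 0 this degenerates to bulk-synchronous execution; a large bound approaches fully asynchronous execution, which is why SSP is described as giving asynchronous-like speed with correctness guarantees.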

  • 12/15

    YARN compatibility

    Problem:

    o many industrial and academic clusters run Hadoop MapReduce

    o however, programs written for stand-alone clusters are not compatible with YARN/HDFS, and vice versa: applications written for YARN/HDFS are not compatible with stand-alone clusters.

    Solution: provide common libraries that work on either Hadoop or non-Hadoop clusters:

    o a YARN launcher that will deploy any Petuum application (including user-written ones) onto a Hadoop cluster

    o a data access library for HDFS access, which allows users to write stream-style code that works on both HDFS files and the local filesystem
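One way such a data access library could expose a single entry point is a URI-scheme dispatcher. The sketch below is hypothetical (open_path is not Petuum's real API, and the HDFS branch is a stub standing in for a real HDFS client):

```python
# Hypothetical sketch of scheme-based dispatch: the same stream-style
# calling code reads either local paths or hdfs:// paths.
from urllib.parse import urlparse

def open_path(path, mode="r"):
    scheme = urlparse(path).scheme
    if scheme in ("", "file"):
        # Local filesystem: strip an optional file:// prefix.
        return open(path.replace("file://", "", 1), mode)
    if scheme == "hdfs":
        raise NotImplementedError("would delegate to an HDFS client here")
    raise ValueError(f"unsupported scheme: {scheme}")

# The calling code is identical for both kinds of path:
import tempfile, os
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("petuum\n"); tmp.close()
with open_path(tmp.name) as f:
    print(f.read().strip())
os.unlink(tmp.name)
```

The point of the design is that application code depends only on the stream interface, so moving a program between a stand-alone cluster and YARN/HDFS requires no source changes.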

  • 13/15

    Performance

    Across ML programs, Petuum is at least 2-10 times faster than popular implementations.

    Fig. 6: Petuum relative speedup vs popular platforms (larger is better).

    Fig. 7: Matrix Factorization convergence time: Petuum vs Spark. Petuum is the fastest and most memory-efficient platform, and the only one that could handle Big MF models with large rank on the given hardware budget.

  • 14/15

    Example

    Fig. 8: Petuum Program Structure.

    Fig. 9: Petuum DML data-parallel pseudocode.

  • 15/15

    References

    [1] Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu. "Petuum: A New Platform for Distributed Machine Learning on Big Data." IEEE Transactions on Big Data, June 2015.

    https://github.com/petuum

    http://www.simplilearn.com/how-facebook-is-using-big-data-article

    Thank You!