Improving Bayesian Computational Time and Scalability With GPGPU

Improving Bayesian Computational Time and Scalability with GPGPU

    Thanakij Pechprasarn, Noppadon Khiripet

    [email protected]

    Knowledge Elicitation Archiving Laboratory (KEA)

    National Electronics and Computer Technology Center

(NECTEC), ANSCSE 15, 1st April 2011

    Bayesian applications

Styles of problems include inference problems and causal problems.

For example, the problem may be that, given that the grass is wet (evidence), what is the probability of each influential cause (rain, sprinkler)?

    Bayesian probability

Probability as a degree of belief.

Conditional probability: given information (evidence), your belief changes.

Posterior as inverse probability: Bayes theorem,

$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$

Where,

$P(\theta)$ = prior of $\theta$
$P(D \mid \theta)$ = likelihood
$P(\theta \mid D)$ = posterior
$P(D)$ = prior of $D$ (acts as a normalizing constant of value $\int P(D \mid \theta)\,P(\theta)\,d\theta$)
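As a worked instance of the theorem on the wet-grass example (the numbers here are illustrative assumptions of ours, not from the talk):

    % Assumed numbers for illustration only:
    % P(R) = 0.2 (rain), P(W | R) = 0.9, P(W) = 0.36 (grass wet)
    \[
    P(R \mid W) = \frac{P(W \mid R)\,P(R)}{P(W)}
                = \frac{0.9 \times 0.2}{0.36}
                = 0.5
    \]

Given the evidence that the grass is wet, the belief in rain rises from 0.2 to 0.5.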

    Our selected application

To do hypothesis testing given observed data.

The expected value of the posterior has to fall under the 95% region (credible interval) of the prior distribution.

If true, then the hypothesis is accepted; otherwise it is rejected.

    Posterior expectation

An expected value of the posterior, $E_{P(\theta \mid D)}[\theta]$.

It requires one to sample from the posterior, but a sampling method for every posterior may not be known, especially when the posterior has a complex form.

We can work out the math to make it simpler.

Remark: a powerful method such as Markov chain Monte Carlo could also sample such a posterior directly.

    Posterior expectation (2)

The definition of an expected value,

$$E_{P(X)}[X] = \int x \cdot P(x)\,dx$$

So,

$$E_{P(\theta \mid D)}[\theta] = \int \theta \cdot P(\theta \mid D)\,d\theta$$

Using Bayes rule,

$$= \int \theta \cdot \frac{P(D \mid \theta) \cdot P(\theta)}{P(D)}\,d\theta$$

From the definition of an expectation,

$$= \frac{E_{P(\theta)}[\theta \cdot P(D \mid \theta)]}{P(D)}$$

Now we have changed the distribution from using the posterior to the prior: $E_{P(\theta \mid D)}[\ldots] \rightarrow E_{P(\theta)}[\ldots]$.

We assume that a known sampling method for the prior distribution exists.

    Hypothesis testing

We do the testing to see if the calculated expected value of the posterior falls under the 95% region of the prior distribution.

That is, to see if

$$P\big(\theta < E_{P(\theta \mid D)}[\theta]\big) < 0.95$$

    Problems

However, we still have to solve the integrals that appear in the denominator, P(D), and in the hypothesis testing.

An analytical method may not work, because a closed-form solution may not be found.

Notice that we can convert back and forth between the integrals and the expectations.

However, how can we really solve either an integral or an expected value?

    Solutions

Monte Carlo integration (MCI) can be used to approximate an expectation/integral involving a random process.

Thus, to find an expectation with MCI:

1. Sample $x_{1..N}$ according to the distribution $P$

2. Calculate the sample mean:

$$E[f(x)] \approx \frac{1}{N}\sum_{i=1}^{N} f(x_i)$$
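A minimal sequential sketch of this estimator (our illustration; sample() and f() are hypothetical stand-ins for the application's sampler and function):

    // Monte Carlo estimate of E[f(X)]: draw N samples and average f over them.
    // sample() draws one value from the distribution P; f() is the function
    // whose expectation is wanted. Both are placeholder names.
    double mci_expectation(double (*sample)(), double (*f)(double), int N) {
        double sum = 0.0;
        for (int i = 0; i < N; ++i)
            sum += f(sample());   // f(x_i) with x_i drawn from P
        return sum / N;           // the sample mean approximates E[f(x)]
    }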

    Solutions (2)

Unfortunately, MCI also has its drawback.

In general, the more samples, the more accurate the final answer.

However, with a lot more samples, the computation becomes much slower!

    GPUs and CUDA

GPU computing: leveraging graphics cards as an accelerator of the computation.

Nvidia CUDA is a major framework for programming GPUs.

CUDA allows developers to exploit parallelism in the form of blocks and threads, as sketched below.

    Current work

Make use of our previous work, the parallel reduction module on GPUs.

Speed up the computation in a real-world Bayesian application with GPU computing.

    Current work (2)

Calculate the posterior expectation:

$$E_{P(\theta \mid D)}[\theta] = \frac{E_{P(\theta)}[\theta \cdot P(D \mid \theta)]}{P(D)}$$

where

$$P(D) = \int P(D \mid \theta) \cdot P(\theta)\,d\theta = E_{P(\theta)}[P(D \mid \theta)]$$

so that

$$E_{P(\theta \mid D)}[\theta] = \frac{E_{P(\theta)}[\theta \cdot P(D \mid \theta)]}{E_{P(\theta)}[P(D \mid \theta)]}$$

With this form, we can calculate the expectations with MCI for both the numerator and the denominator.
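A sequential sketch of this ratio estimator (our illustration; likelihood() is a hypothetical function computing P(D | theta), and theta[] holds N draws from the prior):

    // E[theta | D] ~ (sum of theta_i * L(theta_i)) / (sum of L(theta_i)),
    // with theta_i drawn from the prior and L(theta) = P(D | theta).
    double posterior_expectation(const double *theta, int N,
                                 double (*likelihood)(double)) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < N; ++i) {
            double L = likelihood(theta[i]);
            num += theta[i] * L;   // MCI for E_prior[theta * P(D|theta)]
            den += L;              // MCI for E_prior[P(D|theta)] = P(D)
        }
        return num / den;          // the two 1/N factors cancel
    }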

    Current work (3)

Given the computed value of the posterior expectation, one can test the hypothesis via Monte Carlo methods as follows (see the sketch after this list):

1. $x_{1..N}$ = sample from the prior
2. count = the number of samples whose value is less than the expected value
3. If count/N < 0.95 then accept, else reject
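A minimal sketch of these three steps (our code; theta[] again holds the N prior samples):

    // Accept the hypothesis if fewer than 95% of the prior samples fall
    // below the computed posterior expectation.
    bool test_hypothesis(const double *theta, int N, double expected) {
        int count = 0;
        for (int i = 0; i < N; ++i)
            if (theta[i] < expected)       // step 2: count samples below it
                ++count;
        return (double)count / N < 0.95;   // step 3: true = accept
    }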

Structure of the parallel program

1. Sample from the prior, $P(\theta)$ (CPU)

2. Transfer the samples to the GPU

3. Calculate the posterior expectation (GPU):

   the numerator part, $\sum_{i=1}^{N} \theta_i \cdot P(D \mid \theta_i)\,/\,N$

   the denominator part, $\sum_{i=1}^{N} P(D \mid \theta_i)\,/\,N$

4. Transfer the result back to the CPU

5. Do hypothesis testing, $P\big(\theta < E_{P(\theta \mid D)}[\theta]\big) < 0.95$ (CPU)

A kernel sketch for step 3 follows.
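One way step 3 could map onto a kernel (a sketch under our assumptions, not the authors' implementation; likelihood() is assumed to be a device function for P(D | theta)):

    __device__ double likelihood(double theta);   // P(D | theta), defined elsewhere

    // Thread i turns prior sample theta_i into the two per-sample terms;
    // the parallel reduction module then sums each output array, and the
    // host divides the two sums to obtain the posterior expectation.
    __global__ void weigh_samples(const double *theta, double *num_terms,
                                  double *den_terms, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            double L = likelihood(theta[i]);
            num_terms[i] = theta[i] * L;   // term of the numerator sum
            den_terms[i] = L;              // term of the denominator sum
        }
    }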

    Extra issues

In addition to the parallelized Bayesian application, we also handle 2 issues found in our previous work in the parallel reduction step:

1. Further optimization

Although results from previous work show that the computational time is substantially reduced, we find that it can be further improved.

Techniques: loop unrolling, enhancing the compacting code.

2. Scalability

The problem is that a certain block size can only handle a problem size up to a certain point, so small blocks cannot handle larger problems.

What about the likelihood and prior?

Prior ~ $N(5, 0.5)$ (broad prior)

Each observation ~ $N(\theta, 0.04)$ (normal model)

Likelihood = $\prod_{i=1}^{23} N(D_i;\, \theta,\, 0.04)$ (observations are independent)

The 23 observations we've used are from Cavendish's data: 5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39, 5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.78, 5.68, 5.85
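That likelihood as a device function might look like this (our sketch; we treat 0.04 as the variance of the normal model, which the slide leaves implicit):

    // Product of normal densities N(D_i; theta, 0.04) over the 23 observations.
    __device__ double likelihood(double theta) {
        const double D[23] = {5.36, 5.29, 5.58, 5.65, 5.57, 5.53, 5.62, 5.29,
                              5.44, 5.34, 5.79, 5.10, 5.27, 5.39, 5.42, 5.47,
                              5.63, 5.34, 5.46, 5.30, 5.78, 5.68, 5.85};
        const double var  = 0.04;                       // assumed variance
        const double norm = sqrt(2.0 * 3.141592653589793 * var);
        double L = 1.0;
        for (int i = 0; i < 23; ++i) {
            double d = D[i] - theta;
            L *= exp(-d * d / (2.0 * var)) / norm;      // one N(D_i; theta, var)
        }
        return L;
    }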

    Platforms

CUDA 3.2

A workstation with the following specification:

Description              CPU             GPU
Model                    Intel Core i7   Nvidia GeForce GTX 580
Clock frequency (GHz)    2.8             1.56
# processors             2               16
# cores per processor    4               32
# total cores            8               512

    Results

    Results (2)

The calculated expected value is about 5.483.

It falls under the 95% region, so the hypothesis is accepted.

    Results (3)

Running time: Sequential (CPU) vs Parallel (GPU).

Our maximum speed-up achieved is 53.49x.

    Results (4)

However, we know that the parallel implementation also contains a sequential part; currently only the portion of finding a posterior expectation is parallelized.

If we compare the running time of this specific portion between the CPU and GPU versions, we would see a greater difference in performance.

And the maximum speed-up we ...

    Summary

We've implemented a Bayesian application to do the hypothesis testing given a posterior expectation.

We developed parallel programs running on GPUs to help accelerate the computation.

Our maximum speed-up obtained is 53.49x.

In addition, we cope with the extra issues found in our previous work: further optimization and scalability.

    Thank You

    Q&A

    Solving the scalability issues

We now use 2D blocks instead of 1D blocks, as sketched below.
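A sketch of the indexing change (our reconstruction of the idea): a 1D grid is capped at 65,535 blocks, so block size 128 cannot go beyond 65,535 x 128 = 8,388,480 elements; flattening a 2D grid of blocks into one linear block number removes that cap.

    // 1D grid (previous work), at most 65,535 blocks:
    //   int i = blockIdx.x * blockDim.x + threadIdx.x;

    // 2D grid: flatten (blockIdx.x, blockIdx.y) into a single linear block
    // number first, so far more than 65,535 blocks can be addressed.
    __global__ void kernel_2d(const double *in, double *out, int n) {
        int block = blockIdx.y * gridDim.x + blockIdx.x;
        int i     = block * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i];   // placeholder body
    }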

    Results (scalability issue)

We show that the smallest block size can also be used with the largest problem size (this would not be possible in our previous work).

Problem Size     Running Time (seconds), Block Size = 128
65,535           0.011
131,070          0.021
262,140          0.041
524,280          0.080
1,048,560        0.159
2,097,120        0.317
4,194,240        0.631
8,388,480        1.261
16,776,960       2.523
33,553,920       5.076
67,107,840       10.368
134,215,680      20.516
268,431,360      40.332

Further optimization: Loop unrolling

    // Parallel reduction in the reduce kernel: s_data[] is the block's
    // shared-memory buffer of partial values.
    for (unsigned int s = blockDim.x / 2; s > 32; s >>= 1) {
        __syncthreads();   // all threads work on the same level of the tree
        if (threadIdx.x < s)
            s_data[threadIdx.x] += s_data[threadIdx.x + s];
    }
    __syncthreads();       // publish the last tree level before the warp phase

    // Loop unrolling: the final levels run within a single warp
    // (CUDA warp size is 32), so no further __syncthreads() is needed.
    if (threadIdx.x < 32) {
        volatile double *v = s_data;   // volatile avoids stale shared reads
        v[threadIdx.x] += v[threadIdx.x + 32];
        v[threadIdx.x] += v[threadIdx.x + 16];
        v[threadIdx.x] += v[threadIdx.x + 8];
        v[threadIdx.x] += v[threadIdx.x + 4];
        v[threadIdx.x] += v[threadIdx.x + 2];
        v[threadIdx.x] += v[threadIdx.x + 1];
    }

Further optimization: Enhance compacting kernel

Original version: kernel_reduce(...)

Modified version: kernel_reduce(...)

Effect of further optimization

Unfortunately, each introduced optimization on parallel reduction seems to have little gain.

We find that this is due to the other hot spot in the program that dominates the computation (that is, the time spent on ...)

Monte Carlo integration (MCI)

We want to integrate $f$ in $[a, b]$:

$$I = \int_a^b f(x)\,dx$$

Divide by $P(x)$, a distribution that we know how to sample from:

$$I = \int_a^b \frac{f(x)}{P(x)}\,P(x)\,dx$$

Change into a form of expectation:

$$I = E_{P(X)}\left[\frac{f(x)}{P(x)}\right]$$

Estimate the integral by sampling from $P(x)$ and calculating the sample mean:

$$I \approx \frac{1}{N}\sum_{i=1}^{N}\frac{f(x_i)}{P(x_i)}$$
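A sketch with a concrete choice of P (ours, for illustration): taking P(x) uniform on [a, b], so that P(x) = 1/(b - a):

    #include <cstdlib>

    // Estimate I = integral of f over [a, b] by sampling x_i from the
    // uniform distribution on [a, b] and averaging f(x_i) / P(x_i).
    double mci_integrate(double (*f)(double), double a, double b, int N) {
        double P = 1.0 / (b - a);                 // uniform density on [a, b]
        double sum = 0.0;
        for (int i = 0; i < N; ++i) {
            double x = a + (b - a) * (rand() / (double)RAND_MAX);  // x_i ~ P
            sum += f(x) / P;                      // f(x_i) / P(x_i)
        }
        return sum / N;                           // sample mean estimates I
    }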