Dynamic Cuda with F#files.meetup.com/1646359/2013-03-21_HPC_GPU_FSharp_Meetup_GT… · Dynamic Cuda...

Dynamic Cuda with F#

HPC GPU & F# Meetup

March 19

San Jose, California

Dr. Daniel Egloff

[email protected]

+41 44 520 01 17

+41 79 430 03 61

About Us

•   Software development and consulting company
•   Based in Zurich
•   Core competence
    •   Quantitative finance and risk management
    •   Derivative pricing and modeling
    •   Numerical computing
    •   High performance computing (clusters, grid, GPUs)
    •   Software engineering (C++, F#, Scala, …)
•   Early adopter of GPUs
    •   First project with GPUs in finance back in 2007

Situation

•   Compute intensive problems
•   GPGPU
•   Big data and information rich applications
•   Cloud and distributed computing

Recurring challenges in our HPC projects

Problems

•   Algorithms change often
•   C++ adds unwanted level of complexity
•   Interoperation and wrapper code
•   Hard to test correctness of algorithms
•   Slow progress because of tools and complexity
•   Critical algorithms are not designed with GPUs in mind
•   Efficient GPU code is difficult to develop
•   Availability of GPU hardware

Implications

•   Moving project targets
•   Optimized code hard to maintain and extend
•   Maintenance of wrapper code
•   Maintenance of large body of test code
•   Unpredictable project duration and costs
•   Rewrite of critical numerical code
•   Static and inflexible solutions
•   Delays because of missing hardware

Negative impact on the usability and acceptance of new technology

What would be needed and how can we improve?

Needs

•   Better and more flexible tools to make CUDA more accessible
•   Support iterative development and rapid prototyping
•   Using modern programming languages and concepts
•   Building on top of modern technologies like .NET or the JVM
•   Generated CUDA code on par with compiled CUDA C/C++
•   Solid framework compatible with the CUDA programming model

This is where Alea.cuBase comes into play…

What is Alea.cuBase?

•   Complete solution to develop CUDA accelerated GPU applications in .NET
•   Relies on F# and in particular code quotations
•   Based on LLVM and CUDA 5 technology
•   Fully F# based, no additional changes or extensions to the F# language, no <<<…>>>
•   No wrappers, no post build process to transform IL code

Dynamic code generation

GPU algorithm scripting

Industry grade performance

Rapid development

Solid framework for reusability

Advanced CUDA programming

Benefits: Dynamic code generation

•   Generate GPU code programmatically at run-time
•   Use .NET generics and F# code quotation splicing for flexible kernels
•   Foundation to develop GPU aware domain specific languages

Benefits: Rapid development

•   Easy and quick setup of the development environment, no need to install NVIDIA nvcc compiler tools
•   Rapid prototyping in F# Interactive
•   Iteratively improve CUDA kernel algorithms without time-consuming build cycles

Benefits: GPU algorithm scripting

•   Execute F# scripts with GPU algorithms on the command line or in F# Interactive
•   GPU scripting in Excel
•   Integrate Alea.cuBase directly with Python

Benefits: Solid framework for reusability

•   Framework for type-safe definition of GPU resources
•   CUDA monad to specify GPU resources together with launch logic in a unified manner
•   Reuse GPU kernel code and compose it into modular GPU kernel libraries

[Diagram: two composable blocks, cuda { kernel_A, launch logic } and cuda { kernel_B, launch logic }]

Benefits: Industry grade performance

•   Generating performance optimized code which is on par with compiled CUDA C/C++ code
•   Low level device functions and special math functions
•   Built in occupancy calculator to identify optimal thread block layout

[Chart: Segmented Scan by Key, Alea.cuExtension against CUDA Thrust. Series: int32, float32, float64. Horizontal axis: 2097152, 8388608, 16777216, 33554432. Vertical axis: 0% to 800%.]
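For readers unfamiliar with the benchmarked primitive, here is a minimal sequential sketch in Python of an inclusive segmented scan by key. It assumes the convention used by Thrust's `inclusive_scan_by_key`: runs of consecutive equal keys form segments, and the running sum restarts at each new segment. The helper below is purely illustrative (its name is reused only for readability); it is not part of Alea.cuExtension and does not show how either library implements the scan on the GPU.

```python
from typing import Hashable, List, Sequence


def inclusive_scan_by_key(keys: Sequence[Hashable], values: Sequence[float]) -> List[float]:
    """Sequential reference: running sum that restarts whenever the key changes."""
    if len(keys) != len(values):
        raise ValueError("keys and values must have the same length")
    out: List[float] = []
    for i, (key, value) in enumerate(zip(keys, values)):
        if i > 0 and key == keys[i - 1]:
            # Same segment as the previous element: extend the running sum.
            out.append(out[-1] + value)
        else:
            # First element or key change: start a new segment.
            out.append(value)
    return out
```

With keys `[0, 0, 0, 1, 1, 2]` and values `[1, 2, 3, 4, 5, 6]` the result is `[1, 3, 6, 4, 9, 6]`. A reference like this is mainly useful as a correctness check for a parallel implementation.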

Benefits: Advanced CUDA programming

•   Support for texture, constant and shared memory
•   Pointer operations to partition array data
•   Special pointer types such as volatile pointers
•   Runtime compilation control e.g. fast math
•   Multiple streams
•   Thread safe use of multiple GPUs
•   Inline PTX assembly instructions

Alea Ecosystem

[Diagram labels. Common to both: User Applications, CUDA Driver API. CUDA C Ecosystem: CUDA Runtime API, Thrust, CUDPP. Alea Ecosystem: Alea.cuBase, Alea.cuExtension.]

Advantages over CUDA C/C++

•   Faster development
    •   More productive tools, type inference, IntelliSense support
    •   Cleaner, more expressive code results in fewer bugs, less testing and less debugging
•   Removing unnecessary complexity
    •   No template meta programming or preprocessor tricks needed
•   More flexibility
    •   Dynamic code generation and scripting are entirely missing in CUDA C/C++
    •   Easier to reuse and compose GPU algorithms
•   More transparency
    •   GPU resources are defined where needed
    •   Launch logic defines thread layout and data requirements more transparently
•   Direct access to other valuable F# and .NET technologies
    •   Seamless integration into .NET, without any interoperation layer
    •   Monads, aka computation expressions
    •   Async workflows, agents, parallel programming, type providers
    •   Uniform .NET type system

How does Alea.cuBase work?

[Diagram: compilation process]

•   PTemplate: cuda { constant, array, texture, kernel, ..., launch logic }
•   Compilation process: NVVM IR module, PTX module, CUDA cubin module, CUDA module
•   PModule: launch function
•   Device worker: CUDA device, CUDA context

Development Process

•   Four steps to a CUDA kernel with F# and Alea.cuBase

[Figure: steps numbered 1 to 4]

How to program with Alea.cuBase?

Live Coding

•   Basic kernel programming
•   Excel GPU scripting with Alea.cuBase and Alea.cuExtension, in Tsunami IDE and FCell
    •   Excel based Monte Carlo simulation
    •   PDE solver for the 2D heat equation on the GPU
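The slides do not say which numerical scheme the heat equation demo uses. As a point of reference only, the sketch below shows one explicit finite-difference (FTCS) time step for the 2D heat equation in plain Python, assuming a uniform grid with spacing h, fixed boundary values, and r = alpha * dt / h**2. The names `heat_step` and `Grid` are illustrative and not taken from the talk.

```python
from typing import List

Grid = List[List[float]]


def heat_step(u: Grid, r: float) -> Grid:
    """One explicit (FTCS) time step; r = alpha * dt / h**2, boundary values stay fixed."""
    if not 0.0 < r <= 0.25:
        raise ValueError("explicit scheme needs 0 < r <= 0.25 for stability")
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]  # copy so boundary cells are carried over unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Five-point Laplacian stencil; on a GPU each (i, j) could map to one thread.
            laplacian = u[i + 1][j] + u[i - 1][j] + u[i][j + 1] + u[i][j - 1] - 4.0 * u[i][j]
            new[i][j] = u[i][j] + r * laplacian
    return new
```

Each interior cell depends only on its four neighbours from the previous step, which is why this kind of stencil update is a common candidate for a GPU kernel.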

How did F# pay off?

•   F# is used in multiple ways
    •   Internally to build our CUDA compiler with quotations
    •   As hosting language to define an internal DSL for CUDA, i.e. the CUDA monad
    •   Use extensibility of F# to build more DSLs such as the pcalc monad
•   Benefit of using F#
    •   Rich ecosystem: first class language in .NET
    •   Functional: less and more descriptive code
    •   Monads: ideal to create DSLs, seq, async, F# LINQ
    •   Type providers: easier to integrate into information rich programming
    •   Quotations: a lot of flexibility for DSL or compiler implementation
    •   Extensibility: pure solution for CUDA programming

More Information on F#

•   F# is a functional-first programming language
•   Strongly typed
•   First class .NET language
•   Open source
•   Designed to solve complex computing problems with simple, maintainable, robust code
•   Runs on Windows, Linux, Mac OS, HTML5 and also on GPUs
•   http://www.fsharp.org
•   http://www.tryfsharp.org

Thank you
