Dynamic Cuda with F#
HPC GPU & F# Meetup
March 19
San Jose, California
Dr. Daniel Egloff
+41 44 520 01 17
+41 79 430 03 61
About Us
- Software development and consulting company
- Based in Zurich
- Core competence
  - Quantitative finance and risk management
  - Derivative pricing and modeling
  - Numerical computing
  - High performance computing (clusters, grid, GPUs)
  - Software engineering (C++, F#, Scala, …)
- Early adopter of GPUs
  - First project with GPUs in finance back in 2007
Situation
Compute intensive problems, GPGPU
Big data and information rich applications
Cloud and distributed computing
Recurring challenges in our HPC projects
Problems
Algorithms change often
C++ adds unwanted level of complexity
Interoperation and wrapper code
Hard to test correctness of algorithms
Slow progress because of tools and complexity
Critical algorithms are not designed with GPUs in mind
Efficient GPU code is difficult to develop
Availability of GPU hardware
Negative impact on the usability and acceptance of new technology
Implications
Moving project targets
Optimized code hard to maintain and extend
Maintenance of wrapper code
Maintenance of large body of test code
Unpredictable project duration and costs
Rewrite of critical numerical code
Static and inflexible solutions
Delays because of missing hardware
What would be needed and how can we improve?
Needs
Better and more flexible tools to make CUDA more accessible
CUDA programming model
Support iterative development and rapid prototyping
Using modern programming languages and concepts
Building on top of modern technologies like .NET or the JVM
Generated CUDA code on par with compiled CUDA C/C++
Solid framework compatible with CUDA programming model
This is where Alea.cuBase comes into play…
What is Alea.cuBase?
- Complete solution to develop CUDA accelerated GPU applications in .NET
  - Relies on F# and in particular code quotations
  - Based on LLVM and CUDA 5 technology
- Fully F# based, no additional changes or extensions to the F# language, no <<<…>>>
- No wrappers, no post build process to transform IL code
Dynamic code generation
GPU algorithm scripting
Industry grade performance
Rapid development
Solid framework for reusability
Advanced CUDA programming
Benefits
Dynamic code generation
- Generate GPU code programmatically at run-time
- Use .NET generics and F# code quotation splicing for flexible kernels
- Foundation to develop GPU aware domain specific languages
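The idea of generating GPU code at run-time can be illustrated with a minimal sketch. It is a text-based stand-in only: Alea.cuBase works with F# code quotations and splicing, whereas this sketch splices strings into a CUDA C template. The names `KERNEL_TEMPLATE` and `make_map_kernel` are hypothetical and are not part of any library.

```python
from string import Template

# Hypothetical kernel template. The placeholders $name, $ctype and $expr
# are filled in at run time, so one template yields many concrete kernels.
KERNEL_TEMPLATE = Template('''\
extern "C" __global__ void $name(const $ctype* x, $ctype* y, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = $expr;
}
''')


def make_map_kernel(name, ctype, expr):
    """Return CUDA C source for an element-wise kernel that stores `expr` in y[i].

    `expr` is a C expression that may refer to x[i]; it plays the role that a
    spliced quotation would play in a quotation-based code generator.
    """
    return KERNEL_TEMPLATE.substitute(name=name, ctype=ctype, expr=expr)


if __name__ == "__main__":
    # Two kernels from the same template, differing in element type and body.
    print(make_map_kernel("axpb", "float", "2.0f * x[i] + 1.0f"))
    print(make_map_kernel("square", "double", "x[i] * x[i]"))
```

A quotation-based approach has an advantage this sketch lacks: the spliced fragment is type-checked by the host compiler before any GPU code is emitted, while a string template only fails once the generated source is compiled.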
Benefits
Rapid development
- Easy and quick setup of development environment, no need to install NVIDIA nvcc compiler tools
- Rapid prototyping in F# interactive
- Iteratively improve CUDA kernel algorithms without time consuming build cycles
Benefits
GPU algorithm scripting
- Execute F# scripts with GPU algorithms on command line or in F# interactive
- GPU scripting in Excel
- Integrate Alea.cuBase directly with Python
Benefits
Solid framework for reusability
- Framework for type-safe definition of GPU resources
- CUDA monad to specify GPU resources together with launch logic in unified manner
- Reuse GPU kernel code and compose them to modular GPU kernel libraries
cuda !{ kernel_B launch logic}
cuda !{ kernel_A launch logic}
Benefits
Industry grade performance
- Generating performance optimized code which is on par with compiled CUDA C/C++ code
- Low level device functions and special math functions
- Built in occupancy calculator to identify optimal thread block layout
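What an occupancy calculator computes can be sketched as follows. This is a simplified model, not the calculator built into Alea.cuBase: it ignores register and shared memory allocation granularity, and the per-multiprocessor limits in `SmLimits` are illustrative assumptions that roughly match a compute capability 3.5 device. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmLimits:
    """Assumed per-multiprocessor limits (illustrative values only)."""
    warp_size: int = 32
    max_warps: int = 64
    max_blocks: int = 16
    registers: int = 65536
    shared_bytes: int = 49152


def occupancy(block_size, regs_per_thread, shared_per_block, sm=SmLimits()):
    """Fraction of the maximum resident warps that a launch can keep active."""
    warps_per_block = -(-block_size // sm.warp_size)  # ceiling division
    # Each resource caps the number of blocks resident on one multiprocessor.
    limits = [sm.max_blocks, sm.max_warps // warps_per_block]
    if regs_per_thread > 0:
        regs_per_block = regs_per_thread * warps_per_block * sm.warp_size
        limits.append(sm.registers // regs_per_block)
    if shared_per_block > 0:
        limits.append(sm.shared_bytes // shared_per_block)
    resident_blocks = min(limits)
    return resident_blocks * warps_per_block / sm.max_warps


def best_block_size(regs_per_thread, shared_per_block,
                    candidates=range(32, 1025, 32), sm=SmLimits()):
    """Smallest candidate block size that reaches the highest occupancy."""
    # max() returns the first maximal element, and candidates are ascending.
    return max(candidates,
               key=lambda b: occupancy(b, regs_per_thread, shared_per_block, sm))


if __name__ == "__main__":
    for regs in (32, 64):
        print(regs, "registers per thread:", occupancy(256, regs, 0))
    print("suggested block size:", best_block_size(32, 0))
```

Under these assumed limits, doubling register use from 32 to 64 per thread halves the occupancy of a 256-thread block, which is the kind of trade-off such a calculator makes visible.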
[Chart: Segmented Scan by Key, Alea.cuExtension against CUDA Thrust. Input sizes: 2097152, 8388608, 16777216, 33554432. Series: int32, float32, float64. Percentage axis: 0.00% to 800.00%.]
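For readers unfamiliar with the benchmarked operation, a sequential reference of a segmented scan by key may help. The sketch assumes the inclusive variant with addition as the default operator, in the sense of Thrust's `inclusive_scan_by_key`; the slide does not say which variant was measured. It is a CPU reference for the semantics, not a GPU implementation.

```python
from operator import add


def inclusive_scan_by_key(keys, values, op=add):
    """Running reduction of `values` that restarts whenever the key changes.

    Consecutive equal keys form one segment; equal keys that are not
    adjacent belong to different segments.
    """
    out = []
    for i, (key, value) in enumerate(zip(keys, values)):
        if i > 0 and key == keys[i - 1]:
            out.append(op(out[-1], value))  # continue the current segment
        else:
            out.append(value)               # first element of a new segment
    return out


if __name__ == "__main__":
    keys = [0, 0, 0, 1, 1, 2, 0]
    values = [1, 2, 3, 4, 5, 6, 7]
    print(inclusive_scan_by_key(keys, values))
```

A GPU version cannot use this loop directly because each output depends on the previous one; parallel implementations restructure the work into a tree of partial results that carry a segment flag.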
Benefits
Advanced CUDA programming
- Support for texture, constant and shared memory
- Pointer operations to partition array data
- Special pointer types such as volatile pointers
- Runtime compilation control, e.g. fast math
- Multiple streams
- Thread safe use of multiple GPUs
- Inline PTX assembly instructions
Alea Ecosystem
[Diagram labels: User Applications; CUDA C Ecosystem (CUDA Runtime API, Thrust, CUDPP); Alea Ecosystem (Alea.cuBase, Alea.cuExtension); CUDA Driver API]
Advantages over CUDA C/C++
- Faster development
  - More productive tools, type inference, Intellisense support
  - Cleaner, more expressive code results in fewer bugs, less testing and less debugging
- Removing unnecessary complexity
  - No template meta programming or preprocessor tricks needed
- More flexibility
  - Dynamic code generation and scripting are entirely missing in CUDA C/C++
  - Easier to reuse and compose GPU algorithms
- More transparency
  - GPU resources are defined where needed
  - Launch logic defines thread layout and data requirements more transparently
- Direct access to other valuable F# and .NET technologies
  - Seamless integration into .NET, without any interoperation layer
  - Monads, aka computation expressions
  - Async workflows, agents, parallel programming, type providers
  - Uniform .NET type system
How does Alea.cuBase work?
[Diagram: compilation process from PTemplate to PModule. Labels: PTemplate (cuda !{ constant array texture kernel ... launch logic}), NVVM IR module, PTX module, CUDA cubin module, CUDA module, launch function, PModule; device worker with CUDA device and CUDA context.]
Development Process
- Four steps to a CUDA kernel with F# and Alea.cuBase
How to program with Alea.cuBase?
Live Coding
- Basic kernel programming
- Excel GPU scripting with Alea.cuBase and Alea.cuExtension, in Tsunami IDE and FCell
  - Excel based Monte Carlo simulation
- PDE solver for the 2D heat equation on the GPU
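The numerical core of a 2D heat equation solver can be sketched on the CPU. The slides do not state which scheme the live demo used, so the explicit finite-difference (FTCS) update below is an assumption, as are the function names and the fixed boundary values. With equal grid spacing h the update is stable for r = alpha * dt / h**2 of at most 1/4.

```python
def heat_step(u, r):
    """One explicit time step on a 2D grid; boundary values are kept fixed.

    u is a list of rows and r = alpha * dt / h**2.
    """
    rows, cols = len(u), len(u[0])
    new = [row[:] for row in u]  # copy, so every update reads old values only
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            laplacian = (u[i - 1][j] + u[i + 1][j] +
                         u[i][j - 1] + u[i][j + 1] - 4.0 * u[i][j])
            new[i][j] = u[i][j] + r * laplacian
    return new


def solve(u, r, steps):
    """Apply `steps` explicit time steps, rejecting unstable values of r."""
    if not 0.0 < r <= 0.25:
        raise ValueError("explicit scheme needs 0 < r <= 0.25")
    for _ in range(steps):
        u = heat_step(u, r)
    return u


if __name__ == "__main__":
    # 5 x 5 grid, cold boundary, single hot point in the middle.
    grid = [[0.0] * 5 for _ in range(5)]
    grid[2][2] = 100.0
    for row in solve(grid, 0.2, 10):
        print(" ".join(f"{value:6.2f}" for value in row))
```

Each interior point is updated independently from the previous time level, which is why this scheme maps naturally onto one GPU thread per grid point.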
How did F# pay off?
- F# is used in multiple ways
  - Internally to build our CUDA compiler with quotations
  - As hosting language to define an internal DSL for CUDA, i.e. the CUDA monad
  - Use extensibility of F# to build more DSLs such as the pcalc monad
- Benefit of using F#
  - Rich ecosystem: first class language in .NET
  - Functional: less and more descriptive code
  - Monads: ideal to create DSLs, seq, async, F# LINQ
  - Type providers: easier to integrate into information rich programming
  - Quotations: lot of flexibility for DSL or compiler implementation
  - Extensibility: pure solution for CUDA programming
More Information on F#
- F# is a functional-first programming language
- Strongly typed
- First class .NET language
- Open source
- Designed to solve complex computing problems with simple, maintainable, robust code
- Runs on Windows, Linux, Mac OS, HTML5 and also on GPUs
- http://www.fsharp.org
- http://www.tryfsharp.org
Thank you