Accelerated Science - Khronos Group€¦ · 11388 (3936) 10357 (5941) GPU OpenCL 219 (109) 122 (51)...

Accelerated Science – use of OpenCL in Land Down Under

Tomasz Bednarz Computational Research Scientist, CSIRO

Outline • CSIRO - Positive Impact

• Bragg – the CSIRO GPU Cluster

• CSIRO Accelerated Service

• MASSIVE

• Radiation therapy

• Level set segmentation

• CFD

• Cloud based imaging

• Other projects

• OpenCL for Game AI Pro

• Sydney Khronos Chapter

• ACW at OzViz 2013

CSIRO - Positive Impact • In the world’s Top 10 institutions for 3

research fields.

• Top 1% of global research institutions

in 14 of 22 research fields.

• Trusted advisor, innovative.

• 2000 doctorates, 500 masters.

Bragg – the CSIRO GPU Cluster • A bit of history

- Heterogeneous compute cluster (CPUs + GPUs)

- Funded by CSS TCP (2009) - first of its kind in AU

• Continuously evolving:

- GPUs upgraded in 2010 to NVIDIA Fermi Teslas

- CPUs upgraded in 2012 to Intel Sandy Bridge and 3rd

GPU added to each node

- GPUs upgraded in 2013 to NVIDIA Kepler Teslas

• Spec:

- 128 Dual Xeon 8-core E5-2650 (= 2048 cores)

- 128 GB RAM, FDR10 InfiniBand Interconnect

- 132 Fermi Tesla M2050 GPUs (= 59,136 cores)

- 254 Kepler Tesla M20 GPUs (= 633,984 cores)

- 16 Intel SC5110P Xenon PHI (= 960 cores)

- 289 on the Top500 list (as per June 2013)

CSIRO Accelerated Computing Service • About the Service

- Supported by ASC Applications and Novel Technology Staff

- Not limited to GPU support

- Delivers: - Accelerated Computing Projects

- HPC Workshops and Training

- Benchmarking

• Activities include

- Porting and Parallelising - Targeting HPC clusters, GPUs

- Profiling

- Performance Tuning

- Debugging

- Numerical Error Analysis

- Algorithmic Improvements Sam Moskwa – ACW at OzViz 2011

OpenCL CUDA Python R OpenMP MPI C/C++ OpenACC Fortran

MASSIVE - Two high performance computing

facilities, located at the Australian

Synchrotron and Monash University.

- Specialised imaging and visualisation

software and databases.

- Expertise in visualisation, image

processing, image analysis, HPC and

GPU computing.

- Training and Outreach program.

- NCI Specialised Facility for Imaging

and Visualisation.

For more info contact Dr Wojtek James Goscinski

Scientific Visualisation – CT reconstruction

Insect CT scan, rendered using Drishi (http://anusf.anu.edu.au/Vizlab/drishti/) by Sherry Mayo (CSIRO)

Radiation therapy applications • Modern radiation therapy is to a large extent a computational discipline and can

greatly benefit from use of task- and data-parallelism. Some applications were

demonstrated on GPUs already:

- CT reconstructions

- Image registrations

- Treatment planning

- Dose computations (e.g. X Gu, U Jelen et al 2011 PMB 56)

• Need for speed: imaging and treatment verification can be used as feedback to improve the

treatment (adaptive radiotherapy), currently offline (mostly population-based), one day online.

• Particle (proton/carbon ion) therapy with raster scanning @ University of Marburg:

- most precise external beam technique (only 5 centers worldwide: 3 active, 2 to start)

- increased precision = increased need for verification (more computations)

- longer computational times (small head case: 1 hour on single-thread)

• Collaborative project between CSIRO and University of Marburg

(Adapted from Schlegel and Mahr 2001)

Ammazzalorso, Bednarz, Jelen

Plan robustness in radiation therapy • Automatic discovery of robust beam setups.

• Results (mean and sd for a single beam):

- 4-core Intel Xeon W3530 2.8GHz 12GB RAM + NVIDIA Tesla C2050 3GB RAM

- 10 skull base cases, 42 beams directions (10 runs each for timing stats)

- 4k-40k pencils of 120-350 samples, 2 mm analysis radius (0.5 mm step)

- Single-precision floating-point operations only (sufficient precision)

mean(sd) ms P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 Pool

Native (1 thread)

21299 (6628)

9891 (2837)

6258 (1485)

15768 (4959)

4342 (1136)

10888 (3179)

10117 (2849)

5464 (1470)

8155 (2195)

11388 (3936)

10357 (5941)

GPU OpenCL

219 (109)

122 (51)

88 (38)

148 (56)

61 (24)

160 (65)

151 (64)

52 (22)

109 (46)

126 (61)

124 (75)

Gain 119 x (36)

98 x (34)

87 x (30)

123 x (36)

83 x (25)

81 x (24)

82 x (30)

124 x (42)

90 x (31)

106 x (29)

99 x (36)

CPU OpenCL

6498 (1996)

2552 (615)

1898 (438)

4810 (1495)

1324 (331)

3280 (944)

3051 (841)

1396 (310)

2481 (649)

2935 (818)

3022 (1798)

Gain 3.3 x (0.0)

3.8 x (0.4)

3.3 x (0.0)

3.9 x (0.4)

3.3 x (0.0)

3.8 x (0.4)

3.5 x (0.3)

F. Ammazzalorso (Uni-Marburg), T. Bednarz (CSIRO) and U. Jelen (Uni-Marburg)

- Oral poster presentation at Int. Conference on the Use of Computers in Radiation Therapy ICCR2013 (Melbourne, 6-9 May 2013)

- Accepted for journal publication in IOP JPCS (upcoming)

Fast particle therapy dose computation • How OpenCL 2.0 can help?

- Radiation therapy (RT) has need for both task- and data-parallelism and it represents a

perfect potential application field for heterogeneous parallel systems.

- OpenCL 2.0 can help exploiting these system in RT applications and open the way to

new/simpler usage scenarios.

- The robust planning approach requires continuous exchange of large dataset: for now

latency can be hidden with ad-hoc synchronization

→ Great opportunity with OCL2.0 shared virtual memory (SVM).

- Dose computation (especially for ions, where biological weighing is involved) is a more

complex problem, which OCL2.0 dynamic parallelism (DP) can help breaking into simpler

blocks, enabling scheduling of different kernels without host-side and communication

overhead.

- A combination of SVM and DP can open the way to treatment plan optimization on massively

parallel architectures: large working datasets so far unsuitable for GPU computation (at

decent resolutions) and complex scheduling to enable e.g. intermediate dose computation at

each optimization step.

Image segmentation • What is image segmentation?

- Wiki: Segmentation is the process of partitioning a digital image into multiple segments

(sets of pixels, also known as super-pixels). The goal of segmentation is to simplify and/or

change the representation of an image into something that is more meaningful and easier to

analyze.

• Our Goal: fast, interactive and accurate segmentations even with noisy data

HCA-Vision

Multiple Myeloma deadly cancer of blood plasma cells • Facts

- Despite a rash of new drugs and advances in stem-cell therapy, this

rare bloodborne cancer is still an almost certain death sentence

- A cure remains a long way off

- Plasma cell cancer

- Collections of abnormal cells accumulate in bones, where they cause

bone lesions (abnormal areas of tissue), and in the bone marrow

where they interfere with the production of normal blood cells

- The disease develops in 1–4 per 100,000 people per year.

Micro-CT scans reveal bone

damage from myeloma in a

5T2MM-bearing mouse.

PETER CROUCHER & KARIN VANDERKERKEN

Level sets segmentation - applications

Segmentation using Level Sets • Embed a seed surface in an image

• Iteratively deform the surface along normal according to local properties of the

surface and the underlying image

• Good: competitive accuracy compared to manual segmentation.

• Advantages: the arbitrary complex shapes can be modeled and topological changes such as

merges and splitting are handled implicitly.

iterations

Level Set Workflow

Initialize f0 to Signed Euclidean Distance Transform from mask m

Calculate Data Speed Term D(I)

Calculate First and Second Order Spatial Derivatives

Calculate Curvature Terms

Calculate Gradient of f

Calculate Speed Term F

Update Time Step, and Reinitialize f if needed

• Governing equation ¶f

¶t= - Ñf aD(x )+ (1-a)Ñ×

ëêê

ûúú

the velocity term the mean curvature

(smoothing term)

Input Image Applied Mask f0 as SDF

iterations

Level Set 2D Effect of a

• Governing equation:

¶t= - Ñf aD(x )+ (1-a)Ñ×

ëêê

ûúú

900 iterations, a = 0.06 900 iterations, a = 0.95 1800 iterations, a = 0.001

C2050 opt C2050 Quadro FX580

Xeon E5520

152.24 sec

15.43 sec 8.94 sec 2.12 sec

Max speedup ~70 X

Level Set Method • Computational Problems

- Computationally intensive for moderate sized data sets

- SDF is not maintained

- Explicit schemes give CFL time step restriction

• Current Apprach

- OpenCL to accelerate adaptive timestepping PDE solver

- OpenMPI to distribute processing to engage multiple GPUs

- Qt – interactive user interface

- Liar/C2Liar – Quantitative Imaging toolbox (CSIRO internal)

- OpenGL + OpenCL for interactive real time volume rendering

• Further Implementation Notes

- The Level Set PDE solver mapped across multiple GPUs using OpenCL and MPI

- 3D volume split evenly among MPI processors

- Ghost slices transmitted between processes at each iteration – minimal overhead

- Simple boundary extension is performed at each iteraction

“Safe Interactions” with bacteria. level set segmentation researched also to track biofilm formation.

Bone Model C4-data-set

Interactive volume visualisation – OpenGL + OpenCL

Bone Model C4-data-set

Interactive volume visualisation – OpenGL + OpenCL

GPU Speed-up Results

3 6 9 12

Number Of Processes

Execution Time (sec per 50 Iterations)

Cloud based image analysis and processing •Available now: http://cloudimaging.net.au, see our demos

Cloud based image analysis and processing • Currently using WebGL based 3D viewer Slice:Drop http://slicedrop.com

• Need for WebCL/WebGL for interactive parameters tuning

Experiments with Fluids

Courtesy of High Field Magnet Laboratory, NL

Air Air

Wakayama Jet

Ozoe Lab, Candle

Ozoe Lab, 10T magnet

Verification – Physics vs Simulations

numerical

Methodology • Governing equations Discretization Implementation Verification Results

Transfer CPU code to heterogeneous architectures Verification Results

• Method tested

- Finite Difference (for discretization), HSMAC Method (for mutual iteration of pressure and

velocity fields), BFC (to solve N-S on complex geometry), Upwind and UTOPIA (for getting

more stable and accurate results), OpenCL (speedup)

• Speedups achieved: ~200-230x

• Easily extended, 2D, 3D, applied to simulate different cases, e.g. smoke, free

surface flows, ventilation, turbulent flows, etc.

• See: https://www.massive.org.au/images/stories/workshops-and-tutorials/computational-fluid-dynamics.pdf

Boundary Fitted Coordinates

Ground Penetrating Radar and FDTD • Finite-difference time-domain method (FDTD) is a computational technique used to model electromagnetic

fields (in our case, radio waves). It provides an approximation to both the spatial and temporal derivatives

that appear in Maxwell’s equations.

• FDTD models how radio waves propagate through space and reflect at boundaries depending on the

properties of the material.

• FDTD code is used to generate synthetic data for a ground

penetrating radar (GPR) system.

• FDTD simulated data can be compared to a real radar system

data from a test environment with known geometry.

With an aim to get modelled information to match the real

system so that predictions of unknown geometry can be made.

Implementation: Based on original code by Andrew Strange. OpenCL version used mixed precision code with single precision image buffers used for constant matrices.

Contact: Josh Bowden CSIRO ASC

Other projects using OpenCL

PCA NIPALS

algorithm –

rapid estimation of

wood properties

Sparse

Hydrodynamic

Ocean Code

(SHOC)

Finite Difference

Time Domain

(FDTD) - Ground

Penetrating Radar

AWDA-DA

Ground water data

assimilation

Contact: Josh Bowden CSIRO ASC

GPGPU for Games • GPU heavily used for game physics

- Performance vs Accuracy

• Nvidia PhysX and FLEX

- Particles, Fluids and Rigid Bodies on

the GPU

• Bullet Physics

- Version 3.x to have 100% GPU support

• Not all games require cutting edge graphics, which leaves processing time

available for other areas

- Audio Processing, Artificial Intelligence

Conan Bourke AIE

GPGPU Artificial Intelligence • Game Artificial Intelligence doesn’t easily lend itself to massively

parallel implementations

- Inter-Agent Communication

- Complex decision making typically controlled by a non-programmer

designer resulting in lots of dynamic branching

- Numerous levels of world state data required

• Some aspects do…

- Flow Fields

- Influence Maps

- Path Finding

- Steering Behaviors

Conan Bourke, Tomasz Bednarz

GPGPU Steering Behavior • Steering Behaviors are a set of

commonly used algorithms for agent motion - Craig Reynolds “Steering Behaviors for

Autonomous Characters” GDC1999

• Easily adaptable to parallel implementations due to close similarities to particle physics - Position, Velocity etc

• Simple brute-force algorithm can be implemented in OpenCL within a couple hours for vast performance gains - Conan Bourke & Tomasz Bednarz

“Introduction to GPGPU for AI”, Game AI Pro 2013 0

Intel i7-2670QM

NvidiaGTX560

NvidiaTeslaC2050

Conan Bourke, Tomasz Bednarz

Acknowledgments • Collaborators, coworkers, coauthors:

- John Taylor

- Josh Bowden

- Luke Domanski

- Filippo Ammazzarolso

- Urszula Jelen

- Marc Piggott

- Wojtek Goscinski

- Conan Bourke

- Tim Gureyev

- Darren Thompson

- NeCTAR RT035 Project team

- Sam Moskwa

- Matt Adcock

- and many others…

Sydney Khronos Chapter http://www.meetup.com/Sydney-GPU-Users

Last meetup, 17th Oct Tomasz Bednarz: Welcome, the Khronos Group Conan Bourke: The Fall and Rise of OpenGL for Games Rob Manson: The Augmented Web Visit to the CSIRO Vis Lab Pizza

Accelerated Computing Workshop @ OzViz • ACW and OzViz 2013 to be held on 8th-

10th December 2013 in Melbourne

- Technologies: OpenCL, CUDA, etc.

- Hardware architectures: GPU, APU, MIC,

• Previous events

- OpenCL Workshop at OzViz 2010, Brisbane - Presenters: Mike Houston, Mark Harris, Derek

Gerstmann, Tomasz Bednarz, Craig James,

Con Caris, John Taylor

- ACW at OzViz 2011, Sydney - Keynote: Prof. Takayuki Aoki

- ACW at OzViz 2012, Perth

https://sites.google.com/site/ozvizworkshop/ozviz-2013

Accelerated Science - Khronos Group€¦ · 11388 (3936) 10357 (5941) GPU OpenCL 219 (109) 122 (51)...

Documents

Transcript of Accelerated Science - Khronos Group€¦ · 11388 (3936) 10357 (5941) GPU OpenCL 219 (109) 122 (51)...

Prince of Songkla University€¦ · 008/ 61 009/ 61 010/ 61 011/ 61 012/ 61 013/ 61 014/ 61 015/ 61 016/ 61 017/ 61 018/ 61 019/ 61 020/ 61 021/ 61 022/ 61 023/ 61 - ana ..024/61

N.º 109 N.º 109 - 2021 Mente Cerebro

RELATÓRIO · Coordenadora-Geral: Ana Carolina Meneghetti Peres EQN 102/103, Bloco 1 , sala 109. Brasília - DF. CEP 70722-400 Tel.: (61) 2027-7694.

Pour que souffle l’esprit olympique en Isère · 3 Pour que souffle l’esprit Olympique en Isère « L’Isère sportive » • 61 comités sportifs départementaux, • 3 109

ArbeitenaufPapier - galeriestuker.chgaleriestuker.ch/fileadmin/CatalogPDF/2012 Herbst/H12_Gemaelde_4.pdf · stalt/J.E.Wolfensberger/Zürich-Bederst.109 ».Gerahmt. 61 :60 ,5 cm. 800

Data Center Racks 11388 R09 10

SQL2-ch2. 80 4 15 16 23 26 35 48 56 61 65 78 140 38 101 109 122 135.

News Release Amcor Limited 109 Burwood Road Hawthorn VIC 3122 Australia T +61 3 9226 9000 F +61 3 9226 9050 ABN 62 000 017 372 1 November 2013 AMCOR LIMITED DEMERGER

109 Madrid, 2018. ISSN: 1134-2277 109

BMC - Truck ExpertApplication Dimension Description EM Nr Truckexpert Nr BMC 109 00 002 109 00 003 109 00 004 109 00 005 109 33 015 109 33 016 2B100634 53RS201696 53RS201508 1 Nut

BBMP / COVID-19 WAR ROOM / BULLETIN - 118 / 19.07.2020 / … · 2020. 7. 19. · 567 533 525 522 61 45 52 81 74 96 125 149 109 134 9310-07-2020 78 109 59 56 11415-07-2020 158 163

Dirección General de Certificación Turística Dirección … · 61 ivÁn felipe pÉrez medina baja california sur ... 99 oscar hÉctor ortiz garcÍa baja california sur ... 109

28642 Federal Register /Vol. 61, No. 109/Wednesday, June … · · 2018-03-1728642 Federal Register/Vol. 61, No. 109/Wednesday, June 5, ... protect public safety and the environment,

Grammar / Vocabulary Ù O~Ø ©ßÒ åæ...Grammar / Vocabulary Ù O~Ø ©ßÒ åæ Visit for detailed information Main p. 61 Index p. 96 Main p. 61 Index p. 109 6–12 Grammar Friends

· (Fluuatmnnqnanò 7 - 15 - 2561 9 {u 7,900 dnuqu thJVl. 109 158 109 82 49 36 38 68 61 122 91 42 126 44 -13 i.eJ.61 10 16 ï].th61 13- -19 ii.EJ.61

Largeand small-scalestructureoftheIntermediateand ...NGC346 Kron 39 SMC 302.14, –44.94 61 6825 60 109 0.3 – 20.7 5.3 – 367 NGC1761 LH09 LMC 277.23, –36.07 51 13650 135 109

SYMPHONY FOR BAND - xsrv.jphorn.xsrv.jp/PERSICHETTISymphonyforBand/PERSICHETTI... · 2018-06-28 · Hn. 1 Hn. 2 Hn. 3 Hn. 4 Hn. 5 Hn. 6 109 109 109 109 109 109 13

CITROEN CITROEN - VICTOR REINZCITROEN Ltr. A CITROEN 1,1 109K 109/5 109/5E 109/5F 35-42 kW (48-57 PS) C15 LNA VISA 11 09/78-12/96 02-23307-01 bm 0197.74 08-23542-01 bn bq 0197.76 61-23775-20

Deutscher Bundestag Drucksache 11388dipbt.bundestag.de/doc/btd/18/113/1811388.pdfDeutscher Bundestag – 18. Wahlperiode – 3 – Drucksache 18/11388 Asylberechtigte insgesamt 39.783

Troop 109 - Home · Troop 109 - Home