Altera SDK for OpenCL - PLDWorld.com · Altera SDK for OpenCL A novel SDK that opens up the world...

14
© 2012 Altera CorporationPublic Altera SDK for OpenCL A novel SDK that opens up the world of FPGAs to today’s developers Altera Technology Roadshow 2013

Transcript of Altera SDK for OpenCL - PLDWorld.com · Altera SDK for OpenCL A novel SDK that opens up the world...

© 2012 Altera Corporation—Public

Altera SDK for OpenCL

A novel SDK that opens up the world

of FPGAs to today’s developers

Altera Technology Roadshow 2013

© 2012 Altera Corporation—Public

Today’s News

Altera today announces its SDK for OpenCL

OpenCL allows software developers to boost system performance by using an FPGA’s massively parallel architecture

Increases designer productivity by raising the level of design abstraction

2

Altera working

with Proof of

Concept

Customers

Altera Opens

OpenCL Early

Access Program

Altera announces

SDK for OpenCL

2010 2011

Altera Joins

Khronos Group

2012 2012

© 2012 Altera Corporation—Public

Performance Challenges

CPU CPUs

Multiple CoresMultiple Cores 100s of Cores100s of Cores Single CoreSingle Core

Performance Challenge

3

Performance Wanted

Multimedia HD Video Processing

Image processing

High-Performance Computing Financial Modeling

Big data analytics

Scientific computing

Military Radar image processing

Persistent surveillance

Medical Medical imaging

Bio informatics

FPGAsFPGAs

GeneralGeneral-- PurposePurpose

GPUsGPUs

DSPsDSPs

© 2012 Altera Corporation—Public

Modern Altera FPGA: Massively Parallel

>1M logic elements

>3.9 billion transistors

>50 Mb of integrated memory

Variable precision floating-point DSP blocks

High-speed serial transceivers

4

© 2012 Altera Corporation—Public

Altera Is Driving Silicon Convergence

5

General Processors

Software programmable

Great flexibility

Poor power efficiency

Application-Specific

Not programmable, hard wired

Inflexible

Great power efficiency

Many contain embedded

processors

FPGAs

Hardware and software

programmable

Great flexibility

Good power efficiency

= Microprocessor + DSP + Application-Specific IP + Programmable Fabric

µP DSP

Need for Efficiency »

ASIC ASSP

« Need for Flexibility

FPGA Combines the Best

of All Four + FPGA

FPGA Combines the Best

of All Four + FPGA

FPGAFPGA

µP ASIC

DSP ASSP

© 2012 Altera Corporation—Public

Improving FPGA Development Productivity

6

De

ve

lop

men

t P

rod

ucti

vit

y

System Performance Low High

High

Multi-core CPU

GPU + CPU

FPGA + CPU

Standard CPU

Custom ASIC

Need to design at higher levels

© 2012 Altera Corporation—Public

7

OpenCL for Heterogeneous Solutions

C-based language with extensions: Standard C Language

Altera OpenCL C extensions (adds parallelism to C)

API (Open standard for different devices)

Programming model supports parallelism

in heterogeneous systems CPU/GPU/FPGA

main(…)

{

}

main(…)

{

for( … )

{

}

main(…)

{

}

main(…)

{

for( … )

{

}

main(…)

{

}

main(…)

{

for( … )

{

}

ProcessorProcessor

Hardware

Accelerator

OpenCL

Host Program + Kernels

Compiler

© 2012 Altera Corporation—Public

Introducing Altera SDK for OpenCL

8

Binary Programming

File

Binary Programming

File

SDK for OpenCL SDK for OpenCL

Standard C

Compiler

Standard C

Compiler

Executable File

Executable File x86 x86

OpenCL

Host Program + Kernels

OpenCL

Host Program + Kernels

© 2012 Altera Corporation—Public

OpenCL Implementation on Altera FPGAs

9

O F F C H I P I N T E R C O N N E C T

O N C H I P M E M O R Y I N T E R C O N N E C T

Memory

Blocks

Arb

itration

Netw

ork

HostHost

ProcessorProcessor

Accelerator

/

Datapath

Computation

Accelerator

/

Datapath

Computation

Accelerator

/

Datapath

Computation

Host

Program

Altera FPGA

Standard C

Compiler

Standard C

Compiler

OpenCL

Host Program + Kernels

OpenCL

Host Program + Kernels

SDK for OpenCL SDK for OpenCL

Memory

Blocks

Memory

Blocks

Memory

Blocks

Memory

Blocks

Memory

Blocks

Memory

Blocks

Memory

Blocks

Placement & Routing Locked,

Timing Closed

PCIe Memory Interface

Controller

Memory Interface

Controller

External DDR Memory

External DDR Memory

© 2012 Altera Corporation—Public

Accelerating Performance with SoC FPGAs

10

Single-Chip

OpenCL Solution:

SoC = ARM + FPGA

Integration Enables:

Higher bandwidth and

lower latency between

FPGA and processor - >125Gbps Interconnect

Processor integration

reduces system cost

FPGA

Hard Processor System

ARM Cortex-A9 ARM Cortex-A9

FPGA FPGA

Hard Processor System Hard Processor System

ARM Cortex-A9 ARM Cortex-A9

>125

Gbps

© 2012 Altera Corporation—Public

OpenCL Example #1 –

Faster Time-to-Market

Video Camera Requiring Intense Video Processing

Proprietary video codec algorithms

Captures frames with different

exposure levels retains highlight

and shadow details

Let Customer Implement Code In An FPGA In <1 Week

Port C-code to OpenCL to FPGA

implementation

C HDL typically requires 3-6 months

Saved Months

of Development

11

© 2012 Altera Corporation—Public

OpenCL Example #2 –

Higher Performance

Financial Marketplace Monte-Carlo Black Scholes Simulations

Calculate the value of trading options w/ multiple sources of uncertainty

FPGA delivers higher performance at a fraction of the power

>9X Higher Performance vs CPU Alone

8.0

8.5

9.0

9.5

10.0

10.5

11.0

11.5

12.0

0.0

0

0.0

5

0.1

0

0.1

5

0.2

0

0.2

5

0.3

0

0.3

5

0.4

0

0.4

5

0.5

0

0.5

5

0.6

0

0.6

5

0.7

0

0.7

5

0.8

0

0.8

5

0.9

0

0.9

5

1.0

0

Sto

ck P

rice

Time

OpenCL

MCBS

Quad-core

µP

Comparable

GPU

Stratix IV

FPGA

Number of Cores 8 448 N/A

Simulations per

Second 240M 2100M 2,200M

Power (Watts) 130W 215W 21W

12

© 2012 Altera Corporation—Public

OpenCL Example #3 –

Power Efficiency

Documenting Search / Filtering Algorithm

Review incoming stream (documents) and return best match

E.g. Monitors news feeds and recommends others

Power savings = Cost savings (huge issue for server farms)

>5X Performance/Watt vs. GPU

Platform Quad-core

µP

Comparable

GPU

Stratix IV

FPGA

Number of Cores 6 448 N/A

Performance /Watt (MT/J) 15.9 15.1 83.6

13

© 2012 Altera Corporation—Public

Benefits of Altera OpenCL for FPGA

14

Superior Design Productivity

Quick and easy evaluation of different solutions

Fast development / debug / optimization cycles

Faster time-to-market

Higher Performance

>9X greater performance vs CPU alone running a Monte-Carlo Black Scholes simulation

Improved Power Savings

>5X performance/Watt vs GPU-based heterogeneous systems running a document search algorithm

Greater Portability

Reuse across multiple platforms, multiple generations

Binary Programming

File

Binary Programming

File

SDK for OpenCL SDK for OpenCL

Standard C

Compiler

Standard C

Compiler

Executable File

Executable File x86 x86

OpenCL

Host Program + Kernels

OpenCL

Host Program + Kernels