© 2012 Altera Corporation—Public
Altera SDK for OpenCL
A novel SDK that opens up the world of FPGAs to today’s developers
Altera Technology Roadshow 2013
Today’s News
Altera today announces its SDK for OpenCL
OpenCL allows software developers to boost system performance by using an FPGA’s massively parallel architecture
Increases designer productivity by raising the level of design abstraction
Timeline:
2010: Altera joins Khronos Group
2011: Altera working with proof-of-concept customers
2012: Altera opens OpenCL Early Access Program
2012: Altera announces SDK for OpenCL
Performance Challenges
CPUs: Single Core, Multiple Cores, 100s of Cores
Other devices: FPGAs, General-Purpose GPUs, DSPs
Performance Wanted
Multimedia: HD video processing, image processing
High-Performance Computing: financial modeling, big data analytics, scientific computing
Military: radar image processing, persistent surveillance
Medical: medical imaging, bioinformatics
Modern Altera FPGA: Massively Parallel
>1M logic elements
>3.9 billion transistors
>50 Mb of integrated memory
Variable precision floating-point DSP blocks
High-speed serial transceivers
Altera Is Driving Silicon Convergence
General Processors
Software programmable
Great flexibility
Poor power efficiency
Application-Specific
Not programmable, hard wired
Inflexible
Great power efficiency
Many contain embedded processors
FPGAs
Hardware and software programmable
Great flexibility
Good power efficiency
= Microprocessor + DSP + Application-Specific IP + Programmable Fabric
[Figure: µP and DSP sit toward the need for flexibility, ASIC and ASSP toward the need for efficiency; the FPGA combines the best of all four]
Improving FPGA Development Productivity
[Figure: development productivity versus system performance (each from low to high) for Standard CPU, Multi-core CPU, GPU + CPU, FPGA + CPU, and Custom ASIC]
Need to design at higher levels
OpenCL for Heterogeneous Solutions
C-based language with extensions:
Standard C language
Altera OpenCL C extensions (add parallelism to C)
API (open standard for different devices)
Programming model supports parallelism in heterogeneous systems (CPU/GPU/FPGA)
[Figure: OpenCL host program + kernels pass through the compiler to a processor and a hardware accelerator; the code panels show main(…) { } and main(…) { for( … ) { } }]
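As a rough illustration of the split between host code and kernels, the sketch below writes a loop body as a per-work-item function and shows, in a comment, how the same body might look as an OpenCL C kernel. The names `vector_add_work_item` and `launch_vector_add` are hypothetical, and the sequential loop only emulates what an accelerator would run in parallel; it is not taken from the SDK.

```c
#include <stddef.h>

/*
 * For reference, the same computation as an OpenCL C kernel could look like:
 *
 *   __kernel void vector_add(__global const float *a,
 *                            __global const float *b,
 *                            __global float *out)
 *   {
 *       size_t gid = get_global_id(0);   // index of this work-item
 *       out[gid] = a[gid] + b[gid];
 *   }
 */

/* Hypothetical plain-C stand-in for the kernel body: one call per work-item. */
static void vector_add_work_item(size_t gid, const float *a,
                                 const float *b, float *out)
{
    out[gid] = a[gid] + b[gid];
}

/* Host-side emulation of launching n work-items. The iterations are
   independent, which is what lets an accelerator run them in parallel. */
static void launch_vector_add(size_t n, const float *a,
                              const float *b, float *out)
{
    for (size_t gid = 0; gid < n; ++gid) {
        vector_add_work_item(gid, a, b, out);
    }
}
```

Calling `launch_vector_add(4, a, b, out)` on two four-element arrays fills `out` with their element-wise sums. In a real OpenCL program the host would enqueue the kernel through the OpenCL API instead of looping.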
Introducing Altera SDK for OpenCL
[Figure: OpenCL host program + kernels feed both the SDK for OpenCL, which produces the binary programming file, and a standard C compiler, which produces the x86 executable file]
OpenCL Implementation on Altera FPGAs
[Figure: OpenCL implementation on an Altera FPGA. The host program, built with a standard C compiler, runs on the host processor and reaches the FPGA over PCIe. The kernels, built with the SDK for OpenCL, become accelerator/datapath computation blocks. An on-chip memory interconnect with an arbitration network links them to on-chip memory blocks, and an off-chip interconnect links them to memory interface controllers and external DDR memory. Label: Placement & Routing Locked, Timing Closed]
Accelerating Performance with SoC FPGAs
Single-Chip OpenCL Solution: SoC = ARM + FPGA
Integration enables:
Higher bandwidth and lower latency between FPGA and processor (>125 Gbps interconnect)
Processor integration reduces system cost
[Figure: FPGA and Hard Processor System (ARM Cortex-A9) on one chip, linked by a >125 Gbps interconnect]
OpenCL Example #1: Faster Time-to-Market
Video Camera Requiring Intense Video Processing
Proprietary video codec algorithms
Captures frames at different exposure levels to retain highlight and shadow details
OpenCL Let the Customer Implement the Code in an FPGA in <1 Week
Port C code to OpenCL, then to an FPGA implementation
C-to-HDL conversion typically requires 3-6 months
Saved months of development
OpenCL Example #2: Higher Performance
Financial Marketplace: Monte Carlo Black-Scholes Simulations
Calculate the value of trading options with multiple sources of uncertainty
FPGA delivers higher performance at a fraction of the power
>9X Higher Performance vs. CPU Alone
[Figure: plot of Stock Price (8.0 to 12.0) against Time (0.00 to 1.00)]
OpenCL MCBS              Quad-core µP   Comparable GPU   Stratix IV FPGA
Number of Cores          8              448              N/A
Simulations per Second   240M           2,100M           2,200M
Power (Watts)            130            215              21
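The headline ratio can be checked directly from the table's figures. The sketch below copies those figures into a small array and derives the throughput ratio and the throughput per watt. The struct and function names are hypothetical.

```c
/* Figures copied from the comparison table for this example. */
struct platform_result {
    const char *name;
    double msims_per_second;  /* millions of simulations per second */
    double watts;             /* power in watts */
};

static const struct platform_result MCBS_RESULTS[3] = {
    { "Quad-core uP",    240.0,  130.0 },
    { "Comparable GPU",  2100.0, 215.0 },
    { "Stratix IV FPGA", 2200.0, 21.0  },
};

/* Throughput ratio of platform a over platform b. */
static double speedup(const struct platform_result *a,
                      const struct platform_result *b)
{
    return a->msims_per_second / b->msims_per_second;
}

/* Millions of simulations per second delivered per watt. */
static double msims_per_watt(const struct platform_result *p)
{
    return p->msims_per_second / p->watts;
}
```

With these numbers the FPGA-to-CPU throughput ratio is 2,200 / 240, or about 9.2, consistent with the ">9X" headline. Throughput per watt works out to roughly 1.8, 9.8, and 104.8 million simulations per second per watt for the CPU, GPU, and FPGA respectively.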
OpenCL Example #3: Power Efficiency
Document Search / Filtering Algorithm
Review an incoming stream of documents and return the best match
E.g., monitor news feeds and recommend other documents
Power savings = cost savings (huge issue for server farms)
>5X Performance/Watt vs. GPU
Platform                  Quad-core µP   Comparable GPU   Stratix IV FPGA
Number of Cores           6              448              N/A
Performance/Watt (MT/J)   15.9           15.1             83.6
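The slides do not say how documents are scored. Purely as an assumed illustration of "review incoming documents and return the best match", the sketch below scores each document as a weighted sum of its terms against a search profile and returns the index of the highest score. The data layout, `VOCAB_SIZE`, and all function names are hypothetical.

```c
#include <stddef.h>

#define VOCAB_SIZE 8  /* hypothetical vocabulary size for the sketch */

/* One term occurrence in a document: vocabulary index and its weight. */
struct term {
    size_t id;
    float weight;
};

/* Score a document as the sum of term weight times profile weight. */
static float score_document(const struct term *terms, size_t n_terms,
                            const float profile[VOCAB_SIZE])
{
    float score = 0.0f;
    for (size_t i = 0; i < n_terms; ++i) {
        if (terms[i].id < VOCAB_SIZE) {  /* skip out-of-vocabulary terms */
            score += terms[i].weight * profile[terms[i].id];
        }
    }
    return score;
}

/* Return the index of the best-scoring document, or -1 when there are none. */
static long best_match(const struct term *const *docs, const size_t *doc_lens,
                       size_t n_docs, const float profile[VOCAB_SIZE])
{
    long best = -1;
    float best_score = 0.0f;
    for (size_t d = 0; d < n_docs; ++d) {
        float s = score_document(docs[d], doc_lens[d], profile);
        if (best < 0 || s > best_score) {
            best = (long)d;
            best_score = s;
        }
    }
    return best;
}
```

Each document is scored independently of the others, so a streaming implementation could score many documents concurrently and keep a running maximum.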
Benefits of Altera OpenCL for FPGA
Superior Design Productivity
Quick and easy evaluation of different solutions
Fast development / debug / optimization cycles
Faster time-to-market
Higher Performance
>9X greater performance vs. CPU alone running a Monte Carlo Black-Scholes simulation
Improved Power Savings
>5X performance/Watt vs GPU-based heterogeneous systems running a document search algorithm
Greater Portability
Reuse across multiple platforms, multiple generations