Programming and roadmap for Xilinx FPGAs

Page 1: Programming and roadmap for Xilinx FPGAs

Kees Vissers, Xilinx

Page 2: Contents

• Processors and FPGAs
• BDTI
• The Video Starter Kit
• The results
• The performance in perspective
• Next step: ARM processors + FPGA
• Virtex-7, 28nm, power and compute
• Conclusion

© Copyright 2010 Xilinx
October 2009

Page 3: Processors and Pipelines

Clock:sample ratio         1000:1      100:1             10:1             1:1                 1:10
Data rate (200 MHz clock)  200 Ks/s    2 Ms/s            20 Ms/s          200 Ms/s            2 Gs/s
Design approach            RISC proc.  Proc. w/ accels.  Folded datapath  Pipelined datapath  Replicated datapath

Applications: control → audio → mobile video → HDTV → comms → networking

Range labels in the figure: Processors, FPGAs, C to RTL tools
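The two axes of this figure are tied together by one division: data rate = clock frequency / (clock cycles per sample). A minimal Python sketch of that relation, assuming the 200 MHz clock given on the slide; the function and variable names are illustrative and not part of the original deck:

```python
from fractions import Fraction

CLOCK_HZ = 200_000_000  # 200 MHz clock, as stated on the slide


def data_rate(clocks_per_sample: Fraction, clock_hz: int = CLOCK_HZ) -> Fraction:
    """Samples per second when each sample costs `clocks_per_sample` cycles."""
    return Fraction(clock_hz) / clocks_per_sample


# clock:sample ratios from the slide, expressed as cycles per sample.
RATIOS = {
    "1000:1": Fraction(1000),
    "100:1": Fraction(100),
    "10:1": Fraction(10),
    "1:1": Fraction(1),
    "1:10": Fraction(1, 10),  # ten samples per clock: replicated datapath
}

for label, cycles in RATIOS.items():
    print(f"{label:>7}  {float(data_rate(cycles)):,.0f} samples/s")
```

A ratio below 1:1 can only be reached by processing several samples in the same clock cycle, which is why the rightmost column needs a replicated datapath.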

Page 4: BDTI High-Level Synthesis Tool Certification Program

• The certification program was developed through a collaboration between BDTI and Xilinx
  – The two companies jointly developed a robust methodology for evaluating HLS tools
  – The program incorporates two realistic example applications, one in wireless and one in video
• The program currently evaluates HLS tools targeting Xilinx FPGAs
• The first two vendors to participate are AutoESL and Synfora (now Synopsys)
• The program compares FPGA and DSP implementation approaches for the same application:
  FPGA using HLS tool vs. FPGA using RTL-based design vs. DSP processor using software tool suite

Page 5: BDTI and the Xilinx Video Starter Kit

Use the Video Starter Kit and the given video files on the provided box.

- The board contains a Spartan-3A DSP 3400A: 126 DSP48s, 126 BRAMs (18Kb), 47,744 (4-input) LUTs

Optical flow: two operating points:
- Real-time 60fps, 1280 x 720p (75Msps)
- Maximum performance using all resources

DQPSK: fixed workload of 18.75Msps, running with an implementation at 75MHz (initiation interval = 4).

Figure labels: Video Source (‘DVIco box’), Video Starter Kit
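The DQPSK operating point follows from the initiation interval: a pipeline that accepts a new sample every 4 clock cycles at 75 MHz sustains 75 / 4 = 18.75 Msps. The sketch below checks that and also compares the 75 Msps optical-flow figure with 720p60 pixel rates. The helper name is hypothetical, and the 1650 x 750 total raster is the standard 720p60 timing, used here as an assumption about what the slide's rounded 75 Msps refers to.

```python
def sample_rate_msps(clock_mhz: float, initiation_interval: int) -> float:
    """Throughput of a pipeline that starts one sample every `initiation_interval` cycles."""
    return clock_mhz / initiation_interval


# DQPSK receiver: 75 MHz clock, initiation interval of 4 (both from the slide).
dqpsk_msps = sample_rate_msps(75.0, 4)

# Optical flow at 1280 x 720, 60 frames per second: visible pixels only.
active_pixels_per_s = 1280 * 720 * 60

# Assumption: full 720p60 raster including blanking (1650 x 750 total).
total_pixels_per_s = 1650 * 750 * 60

print(f"DQPSK throughput: {dqpsk_msps} Msps")
print(f"720p60 active pixels: {active_pixels_per_s / 1e6:.3f} Msps")
print(f"720p60 incl. blanking: {total_pixels_per_s / 1e6:.2f} Msps")
```

If the assumption holds, the real-time operating point means the design has to keep up with roughly one pixel per clock at a pixel clock close to 75 MHz.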

Page 6: Design Flow

[Figure: design flow diagram. Labels: Customer C code; GCC/Visual Studio; Customer application; C Libraries; C to gates tool (Algorithmic C-to-FPGA); Design Template; Xilinx IP Cores; Xilinx ISE/EDK; Xilinx ISE (synthesis, place, route, bitgen); Xilinx FPGA]

Page 7: Certified results for the BDTI Optical Flow Workload, Operating Point 2: maximum frame rate achievable at 720p resolution

Very good results compared to a DSP for the video application.

                                   DSP     HLST + FPGA
Performance (frames/sec.)          5.1     195
Cost/Performance ($/frame/sec.)    4.16    0.14

Page 8: Results for the BDTI DQPSK Receiver Workload, 18.75 Msamples/second input data with a 75 MHz clock

Synthesized results from the C program are as good as manual RTL.

                    RTL     HLST + FPGA
FPGA Utilization    5.9%    5.9%

These results are consistent with those reported by HLST users interviewed by BDTI.

Page 9: Usability Metrics (1 of 2)

Representative results based on the BDTI Optical Flow™ Workload

Metric                                  Combined HLST + Xilinx RTL tools    Texas Instruments DM6437 DSP
                                        (HLST rating / Xilinx rating)       processor and tool suite
Out-of-Box Experience                   Fair (Very Good / Poor)             Good
Ease of Use                             Good (Good / Fair)                  Very Good
Completeness of Capabilities            Good (Good / Good)                  Very Good
Quality of Documentation and Support    Good (Good / Very Good)             Very Good

The above table provides qualitative usability metric scores. Note that HLS tools + Xilinx tools targeting an FPGA include a combined overall score followed in parentheses by:
• The score for the HLS tool only (the first score in parentheses)
• The score for the Xilinx RTL tools only (the second score in parentheses)

Page 10: Usability Metrics (2 of 2)

Representative results based on the BDTI Optical Flow™ Workload

Metric                                                Combined HLST + Xilinx RTL tools    Texas Instruments DM6437 DSP
                                                      (HLST rating / Xilinx rating)       processor and tool suite
Efficiency of Design Methodology:
  Learning to use the Tool                            Good (Good / N.A.)                  N.A.
  Design & Implementation (first compiling version)   Very Good (Very Good / N.A.)        Excellent
  Design & Implementation (final optimized version)   Good (Good / Good)                  Good
  Platform Infrastructure Development                 Good (Good / Good)                  Good
Extent of Modifications to Reference Code             Good (Good / Good)                  Fair

The above table provides qualitative usability metric scores. Note that HLS tools + Xilinx tools targeting a Xilinx FPGA include a combined overall score followed in parentheses by:
• The score for the HLS tool only (the first score in parentheses)
• The score for the Xilinx RTL tools only (the second score in parentheses)

Page 11: Next Generation Extensible Platform

• ARM A9 processors
• FPGA fabric
• Several mechanisms for interconnect

Page 12: Next generation Products in context

Clock:sample ratio         1000:1      100:1             10:1             1:1                 1:10
Data rate (200 MHz clock)  200 Ks/s    2 Ms/s            20 Ms/s          200 Ms/s            2 Gs/s
Design approach            RISC proc.  Proc. w/ accels.  Folded datapath  Pipelined datapath  Replicated datapath

Applications: control → audio → mobile video → HDTV → comms → networking

Range labels in the figure:
• Modern ARM processors: several Gops
• Fabric: 100 – 1000 Gops

Page 13: Introducing the Xilinx 7 Series FPGAs: Industry’s First Unified Architecture

• Industry’s Lowest Power and First Unified Architecture
  – Spanning Low-Cost to Ultra High-End applications
• Three new device families with breakthrough innovations in power efficiency, performance-capacity and price-performance

Note: Information on 28Gbps serial transceiver support to be disclosed later this year

Page 14: Xilinx Focused on Power Efficiency from Every Angle

[Figure: "Low Power by Xilinx" diagram for Xilinx 7 Series FPGAs.
Panel titles: Reducing Static Power; Reducing Dynamic Power; Reducing I/O Power; Additional Power Savings.
Callouts: High performance, low power process; Transistor choice optimization; Process Shrink; Optimized Hard Blocks; Lower device core voltage (-1L); Fine grain clock and logic gating; IO Design & User Power Saving Modes; VCCAUX reduced from 2.5V to 1.8V; 5th gen. partial reconfiguration (before/after reconfiguration); Unused BRAM Power Savings; Integrated Analog Front End]

Page 15: The Power Payoff: Capacity and Capability

[Figure: normalized total power (0% to 100%), 40nm vs. 28nm, split into I/O power, dynamic power, and static power. Reduction callouts: 65% (static power), 25%+*, 30%. Headline: 50% lower power enables increased usable performance and capacity]

* Additional savings beyond 25% from optimized hardened blocks

Page 16: 7 Series Families Comparison

Maximum Capability                      Artix-7 Family    Kintex-7 Family    Virtex-7 Family
Logic Cells                             352K              407K               1,955K
Block RAM                               12Mb              29Mb               65Mb
DSP Slices                              700               1,540              3,960
Peak DSP Performance (symmetric FIR)    504 GMACS         1,848 GMACS        4,752 GMACS
Transceiver Count                       4                 16                 80
Peak Transceiver Speed                  3.75Gbps          10.3125Gbps        13.1Gbps+
Peak Serial Bandwidth (full duplex)     30Gbps            330Gbps            1,886Gbps
PCI Express Interface                   Gen1 x4           Gen2 x8            Gen3 x8*
Memory Interface                        800Mbps           2,133Mbps          2,133Mbps
I/O Pins                                450               500                1,200

I/O Voltage:
  – Artix-7: 1.2V, 1.5V, 1.8V, 2.5V, 3.3V
  – Kintex-7: 1.2V, 1.5V, 1.8V, 2.5V, 3.3V
  – Virtex-7: 1.2V, 1.35V, 1.5V, 1.8V, 2.5V, 3.3V**

Packaging Options:
  – Artix-7: Low cost wire bond
  – Kintex-7: Low cost lidless flip chip and high performance flip chip
  – Virtex-7: Highest performance flip chip

• Unified architecture, scalable across all families from high-volume to ultra high-end applications
  – Each family optimized for power, performance and price, meeting specific application needs

*, ** Check Product Overview for device details on soft vs. hard Gen3 x8 and on 2.5V and 3.3V support
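Some rows of the comparison table can be cross-checked against each other. The sketch below backs out the DSP clock implied by the peak GMACS figures and recomputes the full-duplex serial bandwidth from transceiver count and line rate. It assumes that a symmetric FIR counts two MACs per DSP slice per cycle, which the deck does not state explicitly but which matches the 2.4 vs. 4.7 TMACS pair quoted for Virtex-7; all names are illustrative.

```python
# (DSP slices, peak symmetric-FIR GMACS) per family, copied from the table.
FAMILIES = {
    "Artix-7": (700, 504),
    "Kintex-7": (1_540, 1_848),
    "Virtex-7": (3_960, 4_752),
}

# Assumption: a symmetric FIR counts 2 MACs per DSP slice per clock cycle.
SYMMETRIC_FIR_FACTOR = 2


def implied_clock_mhz(dsp_slices: int, peak_gmacs: int) -> float:
    """DSP clock (MHz) that would reproduce the quoted peak GMACS figure."""
    return peak_gmacs * 1000 / (dsp_slices * SYMMETRIC_FIR_FACTOR)


def serial_bandwidth_gbps(transceivers: int, line_rate_gbps: float) -> float:
    """Full-duplex bandwidth: every transceiver counted in both directions."""
    return transceivers * line_rate_gbps * 2


for family, (slices, gmacs) in FAMILIES.items():
    clock = implied_clock_mhz(slices, gmacs)
    print(f"{family}: implied DSP clock {clock:.0f} MHz")

print("Artix-7 serial:", serial_bandwidth_gbps(4, 3.75), "Gbps")
print("Kintex-7 serial:", serial_bandwidth_gbps(16, 10.3125), "Gbps")
```

Under that assumption the table implies a 360 MHz DSP clock for Artix-7 and 600 MHz for Kintex-7 and Virtex-7. The Virtex-7 serial bandwidth entry (1,886Gbps) is not reproduced by 80 x 13.1 x 2, so it is left out of the sketch.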

Page 17: Xilinx 7 Series Meets Next Generation Design Challenges

• Highest System Performance
  – Advanced features and industry leading capacity
• Highest Productivity
  – Unified architecture enables IP portability and design scalability, saving engineering investments
• Lowest Total Power
  – Allowing additional system performance

Page 18: Virtex-7 Family – Highest Performance and Capacity FPGAs

Industry’s Highest System Performance

Optimized for communication systems requiring highest performance and serial connectivity. Enabling world’s 1st Ultra High End FPGAs.

Row labels in the slide table: Logic Cells; DSP Slices; Max. Transceivers; Transceivers Performance; Memory Performance; Max. SelectIO™; Select IO Voltages

• World’s 1st 2M logic cell FPGA
• Max. DSP throughput = 2.4TMACS (symmetric FIR 4.7TMACS)
• Max. serial bandwidth = 1.9Tbps
• Max. parallel I/O bandwidth = 0.92Tbps
• Combined bandwidth = 2.4Tbps
• 576 differential I/O pairs @ 1.6Gbps LVDS and DDR3-2133
• 1.8V/3.3V I/O mixture optimized to meet the wide range of high performance application requirements

EasyPath™-7 for additional cost savings

* Information on 28Gbps serial transceiver support to be disclosed later this year
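The parallel I/O figure on this slide can be reproduced from the pair count and the per-pair rate. A minimal sketch, assuming the 0.92Tbps number is simply 576 differential pairs times 1.6Gbps; the names are illustrative:

```python
DIFFERENTIAL_PAIRS = 576   # from the slide
RATE_MBPS_PER_PAIR = 1600  # 1.6 Gbps LVDS, expressed in Mbps to stay in integers


def parallel_io_bandwidth_mbps(pairs: int, rate_mbps: int) -> int:
    """Aggregate bandwidth when every pair runs at the same rate."""
    return pairs * rate_mbps


total_mbps = parallel_io_bandwidth_mbps(DIFFERENTIAL_PAIRS, RATE_MBPS_PER_PAIR)
total_tbps = total_mbps / 1_000_000

print(f"Parallel I/O bandwidth: {total_mbps} Mbps ({total_tbps:.2f} Tbps)")
```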

Page 19: Compute and Power Consumption experience

• Video and signal processing RISC operations (mostly 16 bit) require in the range of 10-20 LUTs per RISC operation, profiled on actual designs
• With mostly 32 bit operations this goes to 20-30 LUTs per RISC operation, profiled on actual designs
• Floating point is supported with High Level Synthesis
• The power consumption of a total system is significant in the I/O subsystem: drive to high speed serial with low swing and to large devices with a large amount of logic/embedded memory/DSPs
• The communication between processors and FPGA requires low latency and lots of bandwidth: PCIe, cooperation with Intel on QPI, and integrated processors (ARM) for midrange
• New Virtex-7 devices will have 568,000 6-input LUTs and 3,960 DSP elements
• This leads to compute in the range of (568,000 / 20) + 3,960 × 2 = 36,320 RISC operations per clock cycle
• This is in the range of 300 MHz × 36,320 ≈ 10 × 10^12 operations per second, or 10 Tera ops
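The estimate on this slide can be reproduced step by step. A minimal sketch, assuming 20 LUTs per RISC operation and 2 operations per DSP element per cycle, as the slide's own formula does; the constant and function names are illustrative:

```python
LUTS = 568_000            # 6-input LUTs quoted for the new Virtex-7 devices
DSP_ELEMENTS = 3_960      # DSP elements quoted on the slide
LUTS_PER_RISC_OP = 20     # value used in the slide's formula
OPS_PER_DSP = 2           # factor used in the slide's formula
CLOCK_HZ = 300_000_000    # 300 MHz, as on the slide


def risc_ops_per_cycle(luts: int, dsps: int, luts_per_op: int, ops_per_dsp: int) -> int:
    """RISC-equivalent operations the fabric can perform in one clock cycle."""
    return luts // luts_per_op + dsps * ops_per_dsp


ops_per_cycle = risc_ops_per_cycle(LUTS, DSP_ELEMENTS, LUTS_PER_RISC_OP, OPS_PER_DSP)
ops_per_second = ops_per_cycle * CLOCK_HZ

print(f"RISC-equivalent ops per cycle: {ops_per_cycle}")
print(f"Ops per second: {ops_per_second:.3e} (about {ops_per_second / 1e12:.1f} Tera ops)")
```

Changing `LUTS_PER_RISC_OP` to 10 or 30 shows how sensitive the headline figure is to the LUTs-per-operation range profiled above.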

Page 20: Summary

• High-level Synthesis makes programming FPGAs in C/C++/SystemC as easy as programming for optimized DSPs.
• FPGAs are extremely efficient and low power for streaming compute solutions, with deep pipelining.
• High-speed serial I/O is the lowest power communication between devices.
• The combination of processors (internal and external) with an OS, FPGA fabric, and High-level Synthesis is opening a new opportunity for programmers who have not used FPGA technology.
• Xilinx is driving the roadmap in 28nm and beyond as a leader, with an emphasis on performance for limited power consumption.

Page 21: Backup

• Detailed Series 7 parts

Page 22: Artix-7 FPGA Family

Page 23: Kintex-7 FPGA Family

Page 24: Virtex-7 FPGA Family