Active Cells: A Programming Model for Configurable...

51
CASE STUDY 4: CUSTOM DESIGNED MULTI- PROCESSOR SYSTEM Active Cells: A Programming Model for Configurable Multicore Systems 354

Transcript of Active Cells: A Programming Model for Configurable...

Page 1: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM

Active Cells:

A Programming Model for Configurable Multicore Systems

354

Page 2: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Vision

355

P P NILP NILNIL P P PP P PP P

core core

cache

bus

cache

memory

P P

core corePP

core engineP

core corePP

coreP

coreP

engine

General Purpose Shared Memory Computer Application Specific Multicore Network On Chip

Page 3: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Motivation: Multicore Systems Challenges

Cache Coherence

Shared Memory Communication Bottleneck

Thread Synchronization Overhead

Hard to predict performance of a program

Difficult to scale the design to massive multi-core architecture

356

Page 4: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Operating System Challenges

Processor Time Sharing

Interrupts

Context Switches

Thread Synchronisation

Memory Sharing

Inter-process: Paging

Intra-process, Inter-Thread: Monitors

357

Page 5: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Focus

Academia: Education

holistic design of computing systems

simplicity

consistency

Industry: High Performance Sensor Driven Medical IT

streaming applications: ultrasound, tomography, hemodynamics, etc.

358

Page 6: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Focus: Streaming Applications

359

Stream-Parallelism: Pipelining

Task Parallelism:

ParallelExecution Data Paralellism:

Vector ComputingLoop-level parallelism

Page 7: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

4.1. HARDWARE BUILDING BLOCKSTRM AND INTERCONNECTS

360

Page 8: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

TRM: Tiny Register Machine*

Extremely simple processor on FPGA with Harvard architecture.

Two-stage pipelined

Each TRM contains

Arithmetic-logic unit (ALU) and a shifter.

32-bit operands and results stored in a bank of 2*8 registers.

local data memory: d*512 words of 32 bits.

local program memory: i*1024 instructions with 18 bits.

7 general purpose registers

Register H for storing the high 32 bits of a product, and 4 conditional registers C, N, V, Z.

No caches

361* Invented and implemented by Dr. Ling Liu and Prof. Niklaus Wirth

Page 9: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

TRM Machine Language

Machine language: binary representation of instructions

18-bit instructions

Three instruction types:

Type a: arithmetical and logical operations

Type b: load and store instructions

Type c: branch instructions (for jumping)

362from Lectures on Reconfigurable Computing, Dr. Ling Liu, ETH Zürich

Page 10: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Encoding Overview Register Operations

Load and Store

Conditional Branches

Special Instructions

Branch and Link

363

091011131417

0 immRdop imm is zero extended to 32 bits

091011131417

1 RsRdop 0 0 0 x x 0

(a)

(b)

091011131417

1 VRsVRdop 1 0 0 x x x(c)

091011131417

1 x x xRdop 0 0 0 0 0 1(a)

(c)

091011131417

1 RsRdop 1 0 x x x x

091011131417

1 RsRdop 0 1 x x x x(d)

091011131417

1 x x xVRdop 1 0 0 001(b)

091011131417

1 RsRdop 101(e) xxxx

6

(a)

(b)

091011131417

0 RsRdop off

091011131417

1 RsVRdop off

3

3

off is zero extended to 13 bits

0910131417

cond1110 off off is sign extended to 12 bits0131417

1111 off off is 14-bit offset

Page 11: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

TRM architecture

364

Figure from: Niklaus Wirth, Experiments in Computer System Design, Technical Report, August 2010

http://www.inf.ethz.ch/personal/wirth/Articles/FPGA-relatedWork/ComputerSystemDesign.pdf

Page 12: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Variants of TRM

FTRM

includes floating point unit

VTRM (Master Thesis Dan Tecu)

includes a vector processing unit

supports 8 x 8-word registers

available with / without FP unit

TRM with software-configurable instruction width (Master Thesis Stefan Koster, 2015)

365

Page 13: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

1. Bus Interconnect TRM12

366

Column 0

Column 1

Column 2

Column 3

H0

H1

H2H3

C0

inbound arbiteroutbound arbiter

inbound arbiteroutbound arbiter

inbound arbiteroutbound arbiter

inbound arbiter

outbound arbiter

N0

C7

N1

N2

C2

N6

N7

N8

C6

C1

C8

N3

N4

N5

N9

N10

N11

C5 C11

C4

C3

C10

C9

RS232TR

CiNi

: processor core : network controller RS232TR : RS232 transmitter receiver

Timer

LCD

LEDs

DDR2

•12 RISC Cores (two stage pipelined at 116MHz)

• Message passing architecture

•Bus based on-chip interconnect

• On-chip Memory controller

Ling Liu, A 12-Core-Processor Implementation on FPGA, ETH Technical Report, October 2009

ftp://ftp.inf.ethz.ch/pub/publications/tech-reports/6xx/646.pdf

• Not scalable

• Huge resource consumption

Page 14: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

2. Ring Interconnect

node0 node1 node2 node3 node4 node5

node11 node10 node9 node8 node7 node6

TRM0 TRM1 TRM2 TRM3 TRM4 TRM5

TRM11 TRM10 TRM9 TRM8 TRM7 TRM6

TR

MR

ing

RS232

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

TR

MR

ing

01111110

01111110

01111110

01111110

01111110

01111110

01111110

01111110

01111110 0111

111001111110

01111110

Slide from Dr. Ling Liu‘s Lecture Series on Reconfigurable Computing367

Page 15: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

>trash

RingNode

RB SB

fromRing toRing

statuswreqinDataoutDatardreq

n-bit link

slot

receiving

sending

TRM

TRMRingChannel

ioadr outbus iord iowr inbus

dataToTRMdataFromTRM

dataFromRN statusOfRN

Slide from Dr. Ling Liu‘s Lecture Series on Reconfigurable Computing

Connection TRM / Ring

TRM

Adapter

Ring

• Ring interconnect very simple

• Small router

• Predictable latency

368

• Large delays

• Limited Scalability

Page 16: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

4.2. HARDWARE SOFTWARE CODESIGN

369

Page 17: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Software / Hardware Co-designVision: Custom System on Button Push

System design as high-level program code

Electronic circuits

Computing model

Programming Language

Compiler, Synthesizer,

Hardware Library,

Simulator

Programmable Hardware

(FPGA)

370

Page 18: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

System specification, HW/SW partitioning

Program microcontroller in

C/C++

Programsystem specific

hardware in HDL

Compilation Synthesis

Microcontroller + machine code + specific hardware (eg. DSP)

Traditional HW/SW co-design for embedded

systems

Traditional HW/SW co-design

One Program

One Toolchain

System on FPGA

Active Cells approach for embedded systems

development

Goal

371

Page 19: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Active Cells Computing Model

On-chip distributed system

Cell

Scope and environment for a running isolated process.

Integrated control thread(s)

Provides communication ports

Net

Network of communication cells

Cells connected via channels (FIFOs)

372

Inspired by

• Kahn Process Networks

• Dataflow Programming

• CSP (i.e. Google's Go)

• Actor Model (i.e. Erlang)

Page 20: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Software Hardware Map

channel

cell

fifo

373

Page 21: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Consequences of the approach

No global memory

No processor sharing

No pecularities of specific processor

No predefined topology (NoC)

No interrupts

No operating system

374

Page 22: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Cell

375

type

BernoulliSampler* = cell (probIn: port in; valOut: port out);

var

r: Random.Generator;

p: real;

begin

new(r);

loop

p << probIn;

valOut << r.Bernoulli(p);

end

end BernoulliSampler;

non-typed communication ports

blocking receive

asynchronous send

BernoulliSampler

probIn

valOut

CellActivity

Page 23: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Properties

type

Controller = cell {Processor=TRM, FPU, DataMemory=2048, BitWidth=18} (in: port in (64); result: port out);

...

begin

(* ... controller action ... *)

end Controller;

....

Port Width

376

Properties can influence both, generation of hardware and the generation of software code.

Controller(TRM)

FPU

Page 24: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Configurable Processor on PL

377

Tiny Register Machine

FPU

offon

Vector Unit

offon

Instruction Width

14 16 18 20 22

Multiplier

offon

Program Memory

0k 1k 2k 3k 4k

Data Memory

0.5k 1k 1.5k

Page 25: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Engine

typeMerger = cell {Engine, inputs=1}

(ind: array inputs of port in; outd: port out);var data: longint;begin

loopfor i := 0 to len(in)-1 do

data << ind[i]outd << data;

endend

end Merger;

378

Engines are prebuilt components instantiated as electronic circuitson a target hardware

Merger

Page 26: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Unit of Deployment: (Terminal) CellnetLearnTest = cellnet;var

learner: CRBMNet.CRBMLearner;reader: MLUtil.imageReader;ims0,ims1: MLUtil.imshow;…

beginnew(learner)new(reader);new(ims0{name='v0debug',posx=0,posy=100});new(ims1{name='v1debug',posx=300,posy=100});…reader.imageOUT >> learner.imgIN;learner.v0DebugOUT >> ims0.imageIN;learner.v1DebugOUT >> ims1.imageIN;…

end LearnTest;

379

CRBMLearner

imShow

imageReader

imShow

connection

dynamic construction

connection

Page 27: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Hierarchical Composition

380

CRBMLearner

imShow

imageReader

imShow

CRBMLearner

DownNet

UpNet

Delay

Delay

Delay

Delay

debug

result

img in

… …

…Sample

UpNet

GetGradients

visible

P(h|v)visible

hidden

…hidden

kernels

UpNetsplit

UpStep

Page 28: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Hierarchic Composition: non-terminal CellnetUpNet = cellnet

{vr=28,vc=28,kr=5,kc=5,k=9,c=2,name='upstep'}(pvIN, kerIN, bIN : port in; phOUT: array k of port out; pvSideOUT: port out );

vari,hr,hc: longint;upstep: CRBMUpstepCell;split: MLFunctions.SplitterCell;

beginhr:=vr-kr+1; hc:=vc-kc+1;new(vSplit {dataSize=vr*vc,numOut=2});new(upstep {vr=vr,vc=vc,kr=kr,kc=kc,k=k,c=c});pvIN >> vSplit.dataIN;

vSplit.dataOUT[0] >> pvSideOUT;

vSplit.dataOUT[1] >> UpstepCell.vIN;

kerIN >> UpstepCell.kerIN;bIN >> UpstepCell.bIN;for i:=0 to k-1 do

upstep.phOUT[i] >> phOUT[i];end;

end CRBMUpNet;

381

UpNet

img in

split

UpStep

pvOut

phOut

kernIn

biasIn

port delegationport delegation

ports and properties

Page 29: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Software Hardware Map

382

ARM TRM

ARM ENGINE

ENGINE

ZYNQ PL

PS

cell

engine

port

thread

softcore

hw engine

AX

I4

FIFO

AXI4 Stream Interconnect

Page 30: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Hybrid Compilation

Code body Role Compilation method

Cell (Softcore) Program logic Software Compilation

Cell (Engine) Computation unit Hardware Generation

Cell Net Architecture Hardware Compilation

383

cellnet N;

type A=cell(pi: port in; po: port out);var x: integer;begin… pi ? x; … po ! x; …end A;

var a,b: A;begin… connect(a.po, b.pi)end N.

Page 31: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Implementation Alternatives (1)

384

compilerfrontend

Zybo Backend

Spartan3 Backend

Spartan3e Backend

+ simple- redundant

- not flexible- hard to extend

sourcebitstream

Page 32: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Implementation Alternatives (2)

385

compilerfrontend

+ extensible+ not redundant- not simple- static configuration limits flexibility

Hardware Library Component Library

cellnet interpreterbackend

source bitstream

Page 33: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Implementation Alternatives (3)

386

compilerfrontend

+ extensible+ not redundant+ simpler+ simulation becomes execution

Hardware LibraryExecutables

Component LibraryExecutables

hdl runtime source bitstreamexecutable

Page 34: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Frontend

387

Source Code(Module)

Intermediate Code

(Module)Executable

CompilerFrontend

CompilerBackend

SimulationRuntime

VisualizationRuntime

Software Libraries

Software Libraries

Software Libraries

HDL BackendRuntime

Page 35: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Backend

388

HDL Backend Runtime

DependencyAnalysis

HDL Code Generator

Instruction Code Generator

HW Library

Dependency Graph

HDL Code

TRM Kernel

TRM CodeExecutable

ARM Kernel

ARM CodeExecutable

TRM Kernel

TRM CodeExecutable

ARM Kernel

ARM CodeExecutable

TRM Kernel

TRM CodeExecutable

RuntimeLibraries

CompilerBackend

Executable

FPGA VendorImplementation

Toolsbitstream

Deployment

Intermediate Code

(Module)

RuntimeLibrariesRuntimeLibrariesRuntimeLibraries

Page 36: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Extensibility: Defining components and platforms

Software HLL Codetype Gpo = cell{Engine,DataWidth=32,InitState="0H"} (input: port in);

389

Hardware HDL Codemodule Gpo

#( parameter integer DW = 8 … )

(

input aclk, input aresetn , input [DW−1:0] …

);

Component Specification

Build CommandAcHdlBackend.Build

--target="ZyboBoard" -p="Vivado" CRBM.TestCellnet ~

Platform Specification

Hardware Platform and ToolsHardware Types

Platform Instances

Vendor specific tools

Page 37: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Component Specification

390

module Gpo;import Hdl := AcHdlBackend;var c: Hdl.Engine;begin

new(c,"Gpo","Gpo");c.SetDescription("General Purpose Output … ");

c.SupportedOn("∗"); (∗ portable ∗)

c.NewProperty("DataWidth","DW",Hdl.NewInteger(32),Hdl.IntegerPropertyRangeCheck(1,Hdl.MaxInteger));c.NewProperty("InitState","InitState",Hdl.NewBinaryValue("0H"),nil);

c.SetMainClockInput("aclk"); (* main component's clock *)c.SetMainResetInput("aresetn",Hdl.ActiveLow); (* active-low reset *)c.NewAxisPort("input","inp",Hdl.In,8);c.NewExternalHdlPort("gpo","gpo",Hdl.Out,8);

c.NewDependency("Gpo.v",true,false);

c.AddPostParamSetter(Hdl.SetPortWidthFromProperty("inp","DW"));c.AddPostParamSetter(Hdl.SetPortWidthFromProperty("gpo","DW"));

Hdl.hwLibrary.AddComponent(c);end Gpo.

Identification and description

Supported devices

Parameters

Ports

Dependencies

Configuration Actions

c.NewExternalHdlPort("gpo","gpo",Hdl.Out,8);

Page 38: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Generic Peer-to-Peer Communication Interface

Use of AXI4 Stream interconnect standard from ARM

Generic, flexible

Non-redundant

391

TVALID: Data valid

Senderport

Receiverport

TREADY:Ready to process

Mu

ltip

lexi

ng

net

wo

rkData out

Select Select

Data in

clk clk

master must assert tvalid and keep asserted

slave can wait for master's tvalid and then assert tready

Page 39: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Target Platform Specificationmodule Basys2Board;

import Hdl := AcHdlBackend, AcXilinx;

var t: Hdl.TargetDevice;

pldPart: AcXilinx.PldPart;

ioSetup: Hdl.IoSetup;

pin: Hdl.IoPin;

begin

new(pldPart,"XC3S100E-4CP132");

pldPart.SetJtagChainIndex(0);

new(t,"Basys2Board",pldPart);

new(pin,Hdl.In,"B8","LVCMOS33");

t.NewExternalClock(pin,50000000,50,0); (* ExternalClock0 *)

t.SetSystemClock(t.clocks.GetClockByName("ExternalClock0"),1,1);

new(pin,Hdl.In,"G12","LVCMOS33");

t.SetSystemReset(pin,true);

new(ioSetup,"Gpo_0");

ioSetup.NewIoPort("gpo",Hdl.Out,"U16,E19,U19,V19","LVCMOS33");

t.AddIoSetup(ioSetup);

Hdl.hwLibrary.AddTarget(t);

end Basys2Board.

392

System Signals

FPGA Part

Mapping of external Ports

ioSetup.NewIoPort("gpo",Hdl.Out,"U16,E19,U19,V19","LVCMOS33");

Page 40: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

TRM1

TRM2

TRM9

TRM10 TRM11 TRM12

FIFO1

FIFO8

FIFO9

FIFO16

FIFO17 FIFO18

FIFO19

FIFO20

FIFO33

FIFO34

UART

controller CF

controller

LCD

controller

Virtex-5LX50T FPGA

Xilinx ML505 board

RS232

CF

LCD

ECG

Sensor

·

·

·

·

·

·

Case Study 1: ECGFocus: Resources and Power

Real-time

ECG MonitorSignalinput

Wave proc_1

QRSdetect

HRV analysis

Disease classifier

Wave proc_2

Wave proc_8

ECG

bitstream

out

stream

393

Page 41: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Resources

ECG Monitor*

Maximum number of TRMs in communication chain

394

#TRMs #LUTs #BRAMs #DSPs TRM load

12 13859(48%)

52(86%)

12(25%)

<5%@116 MHz

FPGA #TRMs #LUTs #BRAMs #DSPs

Virtex-5 30 27692 (96%) 60 (100%)

30 (62%)

Virtex 6 500

*8 physical channels @ 500 Hz sampling frequency

implemented on Virtex 5 394

Page 42: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Comparative Power Usage

Preconfigured FPGA (#TRMs, IM/DM, I/O, Interconnect fixed)versus fully configurable FPGA (Active Cells)

System StaticPower (W)

Dynamic Power (W)

Preconfigured("TRM12")

3.44 0.59

Dynamicallyconfigured

0.5 0.58

86% saving!

Core 0

Core 1

Core 2

Core 3

Core 4

Core 5

Core 6

Core 7

Core 8

Core 9

Core 10

Core 11

Signal

inpu

t

Wave

proc_

1

QRS

det

ect

HRV analysis

Disease classifier

Wave

proc_2

Wave

proc_8

395

Page 43: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Case Study: Non-Invasive Continuous Blood Pressure MonitorFocus: Development Cycle Time

A2 Host OS with GUI on ARMSensor control and medical algorithms on Zynq PL

Sensors and Motorson Bracelet

396

Page 44: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Medical Monitor Network On Chip

397397

Dominated by TRM processors. Feedback driven. Not performance critical.

Page 45: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Development Cycle Times

Step Medical Monitor

OCT (full)

Software Compilation 4s 2s

Hardware Implementation

20 min 20 min

Stream Patching (all) 2 min -

Stream Patching (typical) 12 s -

Deployment 11s 16s

sporadic often398

398

Page 46: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Case Study 3: Optical Coherence TomographyFocus: Performance

A(λi) = A(1/fi)

f(z)

z-Axis Processing

1. Non uniform sampling

A(λi) A(f i)

2. Dispersion compensation

3. (Inverse) FFT

… for many lines x in a row (2d)

… and many rows y in a column (3d)

~

zx

y 399 399

Page 47: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

A component of OCT image processingDispersion Compensation

Dominated by Engines. Dataflow driven.

400

Page 48: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Case Study: ANN

401

Sigmoid

InnerProdFlt

Í Í Í

+

+

x0x0 x1x1 x2x2y0y0 y1y1 y2y2

- Í

+

Frac

LUT

yy

Fix

zz

input[0]input[0]

input[1]input[1]

outputoutput

xx

y

bx0 x1 x2w0 w1 w2

ARM SoC

x

w

P0

R0

P2

R2P1

R1

z

ANN Layer 1(MoreLayers)

b

Programmable Logic

Perceptron

Page 49: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Performance and Resource Usage

Medical Monitor Simple OCT OCT Perceptron

Architecture Spartan 6XC6SLX75

Zynq 7000XC7Z020

Zynq 7000XC7Z020

Zynq 7000XC7Z010

Resources 28% Slice LUTs, 4% Slice Registers80% BRAMs24% DSPs

11% Slice LUTs, 6% Slice Registers7% BRAMs15% DSPs1 ARM Cortex A9

17% Slice LUTS8% Slice Registers22% BRAMs31% DSPs1 ARM Cortex A9

83% Slice LUTS, 67% Slice Registers13.33% BRAMs, 21% DSP1 ARM Cortex A9

Clock Rate 58 MHz 118 MHz 50 MHz 147 MHz

Data Bandwidth 1.25 Mbit /s (in)23 kB/s (out)

236 MWords/s (in)118 MWords/s (out)

50 MWords/s (in)50 MWords/s (out)

9.6 GBits/s in9.6 GBits/s out

Performance -- 8.3 GFPOps*up to 32 GFPops**

4.3 GFPOps* 4.9 GFlops

Power ~2W ~5W ~5W 2.1 W

** Fixed point operations, 32bit

* when instantiated 4 times402

Page 50: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Conclusion

ActiveCells: Computing model and tool-chain for configurable computing

Configurable interconnect Simple Computing, Power Saving

Embedding of task engines High Performance

Hybrid compilation Quick Development

Backend execution Eased Flexibility and Extensibility

403

Page 51: Active Cells: A Programming Model for Configurable ...lec.inf.ethz.ch/syscon/2016/slides/LSC16Slides1206.pdf · CASE STUDY 4: CUSTOM DESIGNED MULTI-PROCESSOR SYSTEM Active Cells:

Acknowledgements

Results originate in project "Supercomputer in the Pocket" (J. Gutknecht, L. Liu, A. Morzov, P. Hunziker, A. Gokhberg, FF) in the Microsoft Innovation Cluster for Embedded Systems funded by Microsoft Research (2009-2014).

We are particularly inbebted for consulting to Chuck Thacker, Niklaus Wirth, Timothée Martiel, Paul Reed and Florian Negele.

Thanks for Support from Xilinx Academic Program.

CRBM Implementation by Stephan Koster.

404