Introduction to Multiprocessor System-on-Chip

Post on 02-Jan-2016


Prof. Jan Madsen

Informatics and Mathematical Modeling, Technical University of Denmark

Richard Petersens Plads, Building 321, DK-2800 Lyngby, Denmark

(c) Jan Madsen, SoC-MOBINET courseware

Embedded systems

[Figure: embedded system built from CPU, mem, rom and io blocks; its function (if/then/else, for loop, func) is stored as a bit-pattern]

Embedded systems

Systems which use a computer to perform a specific function, but are neither used nor perceived as a computer

They are embedded within larger electronic devices

Repeatedly carrying out a particular function

Often completely unrecognized by the device’s user


Embedded systems design

[Figure: separate hardware and software design paths, each with its own model, validation and prototype]

Several design groups

Separate validations

Prototype realization

Problems arise at a very late point in the design process

Principles of Codesign

/* up, down, open, req and floor are I/O signals assumed to be declared elsewhere */
void UnitControl()
{
    up = down = 0;
    open = 1;
    while (1) {
        while (req == floor);          /* wait until another floor is requested */
        open = 0;
        if (req > floor) { up = 1; } else { down = 1; }
        while (req != floor);          /* wait until the requested floor is reached */
        open = 1;
        delay(10);
    }
}

[Figure: the specification above goes through SW synthesis to a CPU and through HW synthesis to an ASIC, linked by interface synthesis]

Overview

Technology: processors, IC fabrics

Codesign for speed-up: component execution timing (SW and HW)

Building sub-systems: hardware/software partitioning

Building systems: system-level issues of codesign

Software

Elements of computation:

Store data

Transform data

Move data

[Figure: program sketch (if/then/else, for loop, func) running on a processing element (pe)]

Processor

Architecture components:

Processing elements: transform data

Memories: store data

Interconnect: move data

Processor: General Purpose

Availability

Low cost (mass production)

Simple design flow

High flexibility

[Figure: general-purpose processor with instruction memory, controller (pc, ir, cu), datapath (registers, +/- unit, * unit) and data memory]

Processor: General Purpose - example

[Figure: the general-purpose processor (instruction memory, controller, datapath, data memory) fetching A[i] and p1 to execute the operation below]

x = x + A[i] * p1: 5 cycles

Processor: Custom (ASIC)

High performance

Low power

Complex design flow

No flexibility

[Figure: custom processor with controller (cu), datapath (+/-, * and + units) and memory]

Processor: Custom (ASIC) – example

[Figure: the custom processor (controller, datapath, memory) with A[i] and p1 feeding the datapath directly]

x = x + A[i] * p1: 1 cycle

Processor: Semicustom (ASIP)

Customized datapath: 16, 8 or 4 bit

Optimized for a particular class of programs: MACC

"Simple" design flow

High flexibility

[Figure: semicustom processor with instruction memory, controller (pc, ir, cu), datapath (registers, +/- unit, combined +* unit) and data memory]

Processor: Semicustom - example

[Figure: the semicustom processor with p1 and A[i] feeding the combined +* unit]

x = x + A[i] * p1: 2 cycles
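The three processor examples can be compared with a small calculation. The sketch below takes the cycle counts quoted on the slides (5, 2 and 1 cycles for one `x = x + A[i] * p1` step) and scales them to a loop of n steps. The names `pe_kind`, `cycles_per_mac` and `mac_loop_cycles` are illustrative, and loop overhead is ignored as a simplifying assumption.

```c
#include <stddef.h>

/* Processor styles discussed on the slides. */
enum pe_kind { PE_GENERAL_PURPOSE, PE_ASIP, PE_ASIC };

/* Cycles for one x = x + A[i] * p1 step, as quoted on the slides. */
static unsigned cycles_per_mac(enum pe_kind kind)
{
    switch (kind) {
    case PE_GENERAL_PURPOSE: return 5;
    case PE_ASIP:            return 2;
    case PE_ASIC:            return 1;
    }
    return 0; /* unknown kind */
}

/* Estimated cycles for n multiply-accumulate steps.
 * Assumption: loop control and memory stalls are not counted. */
unsigned long mac_loop_cycles(enum pe_kind kind, size_t n)
{
    return (unsigned long)cycles_per_mac(kind) * n;
}
```

For a 100-element array this gives 500, 200 and 100 cycles for the general-purpose, semicustom and custom processor respectively.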


IC fabrics

An IC is an interconnection of transistors following one of several possible styles, called fabrics

The fabric defines how and when transistors are composed: "the material of processors"

IC fabrics differ in terms of customizability and generality

IC fabrics: Custom

Exact implementation of processor components

High NRE cost: mask set ~ $1M

IC fabrics: Semicustom

Several semicustom fabrics:

Library of standard cells

Cell arrays (sea-of-gates)

Most processing steps are pre-manufactured (high volume)

IC fabrics: Programmable

Set of interconnected modules

Set of modules programmed to implement different components

FPGA: programmable logic modules, storage and interconnect

Chips: Implementing IC fabric


Hardware/software codesign?

Many possible mappings

Processor may not exist yet!

Exploring the design space

Need to estimate

Hardware/Software Codesign

Optimizing:

Timing (high performance, hard deadlines)

Area (cost)

Power consumption

Flexibility

Reliability

...

We will focus on timing

Processing element timing

Execution path: control data dependent, input data dependent

Function implementation: component architecture, compiler or synthesis

Formal execution path timing analysis

[Figure: program (if/then/else, for loop) decomposed into basic blocks b1, b2, b3 and b4]

$b_i$: basic block or program segment

$t_{pe}(b_i, pe_j)$: execution time of $b_i$ on processing element $pe_j$

$c(b_i)$: execution frequency of $b_i$

Worst/best case timing bounds:

$$t_{pe}(F, pe_j) = \sum_{i \in I} t_{pe}(b_i, pe_j) \cdot c(b_i)$$
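The execution time of a function on one processing element is the sum, over its basic blocks, of block execution time multiplied by block execution frequency. A minimal sketch of that sum follows; the struct `basic_block`, the function `function_time` and the use of cycles as the time unit are assumptions made for illustration.

```c
#include <stddef.h>

/* One basic block b_i as seen by a given processing element pe_j. */
struct basic_block {
    unsigned long t_pe;   /* execution time of b_i on pe_j, in cycles */
    unsigned long count;  /* execution frequency c(b_i) */
};

/* Sum of t_pe(b_i, pe_j) * c(b_i) over all n blocks of the function. */
unsigned long function_time(const struct basic_block *blocks, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += blocks[i].t_pe * blocks[i].count;
    return total;
}
```

Filling the table with upper bounds on block times and frequencies yields a worst-case bound; lower bounds yield a best-case bound.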


Formal execution path timing analysis

[Figure: $t_{pe}(b_i, pe_j)$ for block b2: a model of the block's operations (+, +, -, *, *) is implemented either in hardware or in software]

Memory models

Access time

Control overhead

Burst access (packets)

Cache: hit/miss time overhead, based on execution history

[Figure: PE with data cache (D$) and instruction cache (I$), connected to Flash, RAM and SDRAM]
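Because cache behaviour depends on the execution history, a simple memory model works from observed hit and miss counts. The sketch below uses the standard weighted average of hit and miss times; the function name `average_access_time` and all parameter names are hypothetical, and a real model would also account for bursts and control overhead.

```c
/* Average memory access time from an observed execution history.
 * hits, misses: number of accesses of each kind.
 * t_hit, t_miss: time per hit and per miss in cycles
 *                (t_miss is assumed to include the miss overhead).
 * Returns 0.0 if there were no accesses at all. */
double average_access_time(unsigned long hits, unsigned long misses,
                           double t_hit, double t_miss)
{
    unsigned long accesses = hits + misses;
    if (accesses == 0)
        return 0.0;
    return (hits * t_hit + misses * t_miss) / (double)accesses;
}
```

With 90 hits at 1 cycle and 10 misses at 21 cycles, the average is 3 cycles per access.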


Advanced architectures

Modern high-performance processors include architectural features that complicate timing analysis:

Dynamic instruction scheduling

Speculative execution

Though fast, these features make:

The processor very power hungry

Tight bounds on timing very difficult

Computation less predictable

These issues are important for embedded systems

Building sub-systems

Initial codesign problem: hardware/software partitioning

The LYCOS cosynthesis tool:

Automatic partitioning from C (subset) and VHDL (single process)

Developed at DTU

[Figure: program sketch mapped onto a processor and an ASIC]

Architectural choices

Which processor should be selected and how fast should it be?

Which ASIC technology should be chosen and how fast should the ASIC be?

How large an ASIC can we afford and which functions should it execute?

How should the processor and ASIC communicate?


Partitioning Model

Determines granularity and simplifying assumptions w.r.t. communication, HW sharing, etc.

[Figure labels: Specification, BB, Model, SW, HW]

Estimation

[Figure: SW, HW and communication estimators, each backed by a library (SW Lib, HW Lib, Com Lib), producing the estimates $t_S$, $a_S$ (SW), $t_H$, $a_H$ (HW) and $t_C$, $a_C$ (communication)]

Process communication

[Figure: program with send(...) and receive(...) calls in the else branch, decomposed into basic blocks b1, b2, b3 and b4]

$s(b_i)$: sent data in $b_i$

$r(b_i)$: received data in $b_i$

$c(b_i)$: execution frequency of $b_i$

$$s(F) = \sum_{i \in I} s(b_i) \cdot c(b_i)$$

$$r(F) = \sum_{i \in I} r(b_i) \cdot c(b_i)$$

Communication time: $s(b_i)$ and $r(b_i)$ are determined by data volume, data encoding and communication protocol
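The sent and received data of a process can be accumulated in the same way as execution time: per-block volume multiplied by block execution frequency. A minimal sketch, assuming data volumes are counted in bytes; the types `comm_block` and `comm_volume` and the function `process_comm` are illustrative names.

```c
#include <stddef.h>

/* Communication profile of one basic block b_i. */
struct comm_block {
    unsigned long sent;      /* s(b_i): data sent in b_i, in bytes */
    unsigned long received;  /* r(b_i): data received in b_i, in bytes */
    unsigned long count;     /* c(b_i): execution frequency of b_i */
};

/* Total sent and received data of the whole process. */
struct comm_volume {
    unsigned long sent;
    unsigned long received;
};

/* Accumulate s(b_i) * c(b_i) and r(b_i) * c(b_i) over all n blocks. */
struct comm_volume process_comm(const struct comm_block *blocks, size_t n)
{
    struct comm_volume total = {0, 0};
    for (size_t i = 0; i < n; i++) {
        total.sent     += blocks[i].sent * blocks[i].count;
        total.received += blocks[i].received * blocks[i].count;
    }
    return total;
}
```

Turning these volumes into a communication time still requires a model of the data encoding and the protocol, which this sketch leaves out.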


Solving the Partitioning Problem

[Figure: blocks 1 to 6, each to be assigned to SW or HW]

Just try all combinations...
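Trying all combinations can be sketched as a loop over every SW/HW assignment. The code below assumes the simplest cost model (block times and hardware areas are additive, communication is ignored) and minimizes total time under an area budget; `block_cost`, `exhaustive_partition` and the cost fields are hypothetical names, not part of any tool mentioned here.

```c
#include <limits.h>
#include <stddef.h>

/* Assumed per-block estimates. */
struct block_cost {
    unsigned t_sw;  /* execution time if the block stays in software */
    unsigned t_hw;  /* execution time if the block moves to hardware */
    unsigned a_hw;  /* hardware area needed if the block moves to hardware */
};

/* Try all 2^n assignments (n must be below 32 in this sketch).
 * Bit i of the returned mask is 1 when block i is placed in hardware.
 * The best total time is written to *best_time. */
unsigned exhaustive_partition(const struct block_cost *b, size_t n,
                              unsigned max_area, unsigned *best_time)
{
    unsigned best_mask = 0;
    *best_time = UINT_MAX;
    if (n >= 32)
        return 0; /* too many blocks to enumerate with a 32-bit mask */
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        unsigned time = 0, area = 0;
        for (size_t i = 0; i < n; i++) {
            if (mask & (1u << i)) {
                time += b[i].t_hw;
                area += b[i].a_hw;
            } else {
                time += b[i].t_sw;
            }
        }
        if (area <= max_area && time < *best_time) {
            *best_time = time;
            best_mask = mask;
        }
    }
    return best_mask;
}
```

Six blocks already mean 64 candidate partitions, and the count doubles with every added block, which is why this approach does not scale.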


Solving the Partitioning Problem

Knapsack stuffing: no communication, interleaved execution, additive areas

Parallel execution, non-additive areas

Interleaved communication, additive areas

Large-scale linear/nonlinear integer programming

Heuristics needed!
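In the simplest case (no communication, interleaved execution, additive areas) the problem reduces to a 0/1 knapsack: pick the blocks whose move to hardware saves the most time within the area budget. The sketch below is the textbook knapsack dynamic program, shown only to illustrate that case; it is not claimed to be the method used by any particular tool. `hw_candidate`, `best_speedup` and the `MAX_AREA` limit are assumptions.

```c
#include <stddef.h>

#define MAX_AREA 1024  /* assumed upper limit on the area budget */

/* A block that could be moved to hardware. */
struct hw_candidate {
    int gain;  /* time saved by the move: t_sw - t_hw */
    int area;  /* hardware area the block needs (non-negative) */
};

/* 0/1 knapsack: largest total gain with total area <= max_area.
 * Budgets above MAX_AREA are clamped to MAX_AREA in this sketch. */
int best_speedup(const struct hw_candidate *c, size_t n, int max_area)
{
    int best[MAX_AREA + 1] = {0};  /* best[a]: largest gain using area <= a */
    if (max_area < 0)
        return 0;
    if (max_area > MAX_AREA)
        max_area = MAX_AREA;
    for (size_t i = 0; i < n; i++) {
        /* Walk the budget downwards so each block is used at most once. */
        for (int a = max_area; a >= c[i].area; a--) {
            int with_block = best[a - c[i].area] + c[i].gain;
            if (with_block > best[a])
                best[a] = with_block;
        }
    }
    return best[max_area];
}
```

The run time grows with the number of blocks times the area budget rather than exponentially, but the additive-area assumption breaks down once hardware is shared or blocks execute in parallel.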

[Figure: example assignments of numbered blocks to SW and HW for the cases above]

LYCOS Design Flow

[Figure: LYCOS design flow. Labels shown: specification and functional requirements, translate, CDFG, analysis, partitioning with SW, HW and communication estimation (backed by SW, HW and communication models), followed by SW synthesis (assembler), HW synthesis (netlist) and communication synthesis (SW/HW)]

Building Systems

Platform architectures are heterogeneous:

Different processing element types

Different interconnection networks and communication protocols

Different memory types

Different scheduling and synchronization strategies

[Figure: platform with processors (P), a DSP, a coprocessor (CoP) and memories (M)]

Managing HW platform complexity

Development of APIs to hide complexity from application programmer and improve portability

Specialized RTOS to control resource sharing and interfaces

Complex multi-level HW/SW architecture


Software architecture

[Figure: layered HW/SW platform. Hardware: CPU, cache, private and shared memory, bus and bus controller, timers, I/O and interrupts, periphery. Software: drivers, RTOS, RTOS-APIs, applications]

Platform design challenges

Integration:

Design process integration

Heterogeneous component and language integration

Design space exploration and optimization

Verification

Complex run-time interdependencies

Run-time dependencies of independent components via communication

Influence on timing and power

Need to handle resource sharing:

Process/task scheduling

Communication scheduling

Scheduling strategies (static, dynamic, time or priority driven)

[Figure: two processing elements (PE) and a coprocessor (CoP) sharing communication resources]

Interdependency example

Complex non-functional interdependencies:

Periodic task executing on PE

Task writes to bus at the end of each periodic execution

Short execution time: high bus load

Long execution time: low bus load

A local decision on improving performance may impact the global system performance

System-on-Chip challenge

[Figure: chip built from tiles containing processor, memory and io blocks, connected through routers]


Network-on-Chip

Multi-hop: segmented communication

Concurrency: multiple simultaneous communications

Sharing: quasi-simultaneous resource usage, with multiple communication events occupying some or all resources in an interleaved fashion

[Figure: network with nodes a, b, c and d and memories (M)]

Platform-based design

New design paradigm ...

[Figure labels: platform, specification, IP, mapping, re-configure, re-design]