
1/36

Microprocessors, Advanced

Partitioning an Embedded System for Multicore Design

January 31, 2012

Jack Ganssle


2/36

The Schedule Grows Faster Than the Code!

IBM data:

Person-years   LOC/month
1              439
10             220
100            110
1000           55
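The IBM figures fall by roughly half for every tenfold increase in project size, which is a power law. A minimal Python sketch, using only the four data points above, estimates the exponent with a least-squares fit in log-log space; the function name `loglog_slope` is illustrative.

```python
import math

# Project size (person-years) -> productivity (LOC/month), from the IBM data above
IBM_DATA = {1: 439, 10: 220, 100: 110, 1000: 55}


def loglog_slope(data: dict) -> float:
    """Least-squares slope of log10(productivity) against log10(project size)."""
    xs = [math.log10(size) for size in data]
    ys = [math.log10(loc) for loc in data.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


slope = loglog_slope(IBM_DATA)
print(f"productivity ~ size^{slope:.2f}")
print(f"each 10x growth multiplies productivity by {10 ** slope:.2f}")
```

An exponent near -0.30 means per-person productivity halves with each tenfold increase in project size, since 10^-0.30 is about 0.5.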

COCOMO:

$$\text{Schedule} = C \cdot \text{KLOC}^{M}$$

(C and M are both > 1)
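Because M > 1, schedule grows faster than code size. A short Python sketch makes the point; the defaults C = 3.6 and M = 1.20 are borrowed from Basic COCOMO's embedded-mode effort equation purely as illustrative stand-ins, not values from this talk, and the output is in arbitrary units.

```python
def cocomo_schedule(kloc: float, c: float = 3.6, m: float = 1.20) -> float:
    """Schedule = C * KLOC**M; c and m are illustrative assumptions."""
    return c * kloc ** m


previous = None
for kloc in (10, 20, 40, 80):
    sched = cocomo_schedule(kloc)
    growth = "" if previous is None else f" ({sched / previous:.2f}x the previous)"
    print(f"{kloc:3d} KLOC -> {sched:7.1f}{growth}")
    previous = sched
```

Each doubling of the code multiplies the schedule by 2^M (about 2.3 with M = 1.20), not by 2.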


3/36

The Productivity Crash


4/36


5/36

Partitioning Code

Fact: The easiest way to write great modules fast is to keep them small, with few dependencies.

Smaller functions:
- have fewer bugs (bug rates are 2 to 6x lower)
- are more likely to meet specs
- get done faster


6/36

Eye Scans


7/36

We Turn Micros into Mainframes

- Sensors
- Interface

1,000,000 lines of code

8051


8/36

Complexity is not linear with LOC


9/36

A Better Design

[Diagram: supervisory code connected to three separate blocks of I/O code]
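The payoff of partitioning follows from the superlinear COCOMO-style curve. The Python sketch below compares one 100 KLOC program against ten independent 10 KLOC modules, using illustrative values C = 3.6 and M = 1.20 as assumptions; it deliberately ignores the cost of the interfaces between modules, which a real design must pay.

```python
def schedule(kloc: float, c: float = 3.6, m: float = 1.20) -> float:
    """Schedule = C * KLOC**M (c and m are illustrative assumptions)."""
    return c * kloc ** m


monolith = schedule(100)         # one 100 KLOC program
partitioned = 10 * schedule(10)  # ten independent 10 KLOC modules
print(f"monolith:    {monolith:6.1f}")
print(f"partitioned: {partitioned:6.1f}")
print(f"ratio:       {monolith / partitioned:.2f}")  # equals 10**(m - 1)
```

With M = 1.20 the monolith takes about 1.6x as long; the larger M is, the bigger the gain from small, independent pieces.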


10/36

Small and cheap


11/36

Interprocessor Communications

[Diagram: a main CPU linked to separate Serial/Encrypt, Rangefinder, and Transaction Processing processors]

I2C: a fast serial interface
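Once work is split across processors, every link needs an agreed message format on top of the wire protocol. A hypothetical Python sketch of one such format (a command byte, a length byte, the payload, and an additive checksum) is shown below; the layout and the names `encode_msg` and `decode_msg` are assumptions for illustration, and I2C's own addressing and ACK bits sit below this layer.

```python
import struct


def encode_msg(cmd: int, payload: bytes) -> bytes:
    """Frame = cmd (1 byte) | length (1 byte) | payload | checksum (1 byte)."""
    body = struct.pack("BB", cmd, len(payload)) + payload
    checksum = (-sum(body)) & 0xFF  # makes all frame bytes sum to 0 modulo 256
    return body + bytes([checksum])


def decode_msg(frame: bytes) -> tuple:
    """Validate a frame and return (cmd, payload); raise ValueError if it is corrupt."""
    if len(frame) < 3:
        raise ValueError("frame too short")
    if sum(frame) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    cmd, length = struct.unpack_from("BB", frame)
    if len(frame) != length + 3:
        raise ValueError("length mismatch")
    return cmd, frame[2:2 + length]


frame = encode_msg(0x21, b"\x03\xe8")  # hypothetical "set range" command
print(frame.hex())
print(decode_msg(frame))
```

The receiving processor rejects any frame whose bytes do not sum to zero, so a single flipped bit is caught before the payload is used.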


12/36

National Airports Radar


13/36

The Sergeant York


14/36

The Tradeoff

[Diagram: the tradeoff among schedule, features, and quality]


15/36

Feature Management


16/36

Requirements Scrubbing

Features removed   % of designs
0 to 10%           21.4%
10 to 30%          17.9%
30 to 50%          47.7%
More than 50%      25.9%

30% or more of features removed: 47.7% + 25.9% = 73.6% of designs!


17/36

Don't Wait for Hardware

- Build an I/O board that plugs into the PC
- Simulate!
- Virtualization: Virtutech, CoWare, VaST
- FitNesse: http://fitnesse.org/
- Catsrunner: www.agilerules.com/projects/catsrunner/index.phtml
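One common way to act on "Simulate!" is to hide each device behind a small interface, so the application logic runs unchanged on the PC against a fake device and later on the target against the real one. A minimal Python sketch follows; `ADC`, `SimulatedADC`, and `average_reading` are hypothetical names, not part of any of the tools listed above.

```python
from typing import Iterable, Iterator, Protocol


class ADC(Protocol):
    """Anything that can return one raw A/D conversion."""

    def read(self) -> int: ...


class SimulatedADC:
    """PC-side stand-in that replays scripted samples instead of touching hardware."""

    def __init__(self, samples: Iterable[int]) -> None:
        self._samples: Iterator[int] = iter(samples)

    def read(self) -> int:
        return next(self._samples)


def average_reading(adc: ADC, n: int) -> float:
    """Application logic: depends only on the ADC interface, not on real hardware."""
    return sum(adc.read() for _ in range(n)) / n


print(average_reading(SimulatedADC([100, 102, 98, 100]), 4))  # 100.0
```

On the target, a class with the same `read()` method would wrap the real converter, and `average_reading` would not change.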


18/36

What About Multicore?

CPU (tens of MHz) <-> Memory (hundreds of nsec)


19/36

Then Came Prefetchers

CPU (tens of MHz) <-> Prefetch queue <-> Memory (under 100 nsec)


20/36

Then Came Pipelines

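CPU (tens of MHz) <-> Memory (30-50 nsec)

Old: Fetch -> Decode -> Execute, one instruction at a time

Pipelined: Fetch, Decode, and Execute overlap, each stage working on a different instruction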
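The gain from overlapping stages is easy to count. This Python sketch assumes an idealized three-stage pipeline (F = fetch, D = decode, E = execute) where every stage takes one cycle and there are no stalls, branches, or memory waits; real pipelines lose some of this to hazards.

```python
def sequential_cycles(n_instr: int, stages: int = 3) -> int:
    """No pipeline: each instruction passes through every stage before the next starts."""
    return n_instr * stages


def pipelined_cycles(n_instr: int, stages: int = 3) -> int:
    """Ideal pipeline: `stages` cycles to fill, then one instruction completes per cycle."""
    return 0 if n_instr == 0 else stages + n_instr - 1


def timeline(n_instr: int, stage_names: tuple = ("F", "D", "E")) -> list:
    """One text row per instruction showing which stage it occupies in each cycle."""
    width = len(stage_names) + n_instr - 1
    rows = []
    for i in range(n_instr):
        row = ["."] * width
        for s, name in enumerate(stage_names):
            row[i + s] = name
        rows.append(f"I{i}: " + " ".join(row))
    return rows


print("\n".join(timeline(3)))
print(sequential_cycles(100), pipelined_cycles(100))  # 300 102
```

For long instruction streams the pipelined machine approaches one instruction per cycle, a 3x improvement here, without any increase in clock rate.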


21/36

Cache

CPU (hundreds of MHz) <-> Cache <-> Memory (30-50 nsec)


22/36

Cache Splits in Two

CPU (over 1 GHz) <-> L1 cache <-> L2 cache (3-5 nsec) <-> Memory (30-50 nsec)
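The memory wall is easy to quantify. At 1 GHz a clock is 1 ns, so a 30-50 nsec trip to memory costs 30-50 cycles. The standard average-memory-access-time estimate for a two-level cache, AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × memory time), shows how much the caches hide. In this Python sketch the miss rates and the 1 ns L1 hit time are hypothetical; 4 ns and 40 ns are midpoints of the 3-5 nsec and 30-50 nsec ranges above.

```python
def amat_ns(l1_hit: float, l1_miss: float, l2_hit: float, l2_miss: float, mem: float) -> float:
    """Average memory access time (ns) for an L1 + L2 hierarchy backed by main memory."""
    return l1_hit + l1_miss * (l2_hit + l2_miss * mem)


CLOCK_GHZ = 1.0  # 1 GHz -> 1 ns per cycle
MEM_NS = 40.0    # midpoint of 30-50 nsec
L2_NS = 4.0      # midpoint of 3-5 nsec

# Every access goes to DRAM when there is no cache
no_cache = MEM_NS * CLOCK_GHZ
# Hypothetical 5% L1 miss rate and 20% L2 miss rate
with_cache = amat_ns(1.0, 0.05, L2_NS, 0.20, MEM_NS) * CLOCK_GHZ
print(f"no cache:   {no_cache:.1f} cycles per access")
print(f"with L1+L2: {with_cache:.1f} cycles per access")
```

The result is very sensitive to the miss rates, which is why performance drops sharply once a program's working set no longer fits in cache.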


23/36

SMP

Symmetric Multiprocessing (SMP): multiple identical CPUs working with a shared memory array.

[Diagram: two CPU cores connected to one shared memory]
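Sharing one memory array means every core can see, and clobber, every shared variable, so updates to shared data need coordination. This minimal Python sketch uses threads as stand-ins for SMP cores and a lock to serialize a read-modify-write on a shared counter. In CPython the global interpreter lock keeps these threads from executing Python bytecode truly in parallel, so it shows the shared-memory discipline, not a speedup; the names are illustrative.

```python
import threading

counter = 0  # lives in memory shared by all threads
counter_lock = threading.Lock()


def worker(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        with counter_lock:  # only one thread at a time may read-modify-write
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```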


24/36

Amdahl's Law for SMP

$$\text{Max speedup} = \frac{1}{f + \dfrac{1 - f}{n}}$$

Where:
n = Number of processors
f = Fraction of the operation that cannot be parallelized (0 to 1)
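The formula translates directly into code. In this Python sketch a very large n approximates the infinite-CPU case, where (1 - f)/n vanishes and the speedup approaches 1/f: with only 10% serial work, no number of processors gets past 10x.

```python
def amdahl_speedup(f: float, n: float) -> float:
    """Max speedup on n processors when fraction f (0..1) of the work is serial."""
    return 1.0 / (f + (1.0 - f) / n)


for f in (0.1, 0.5):
    for n in (2, 4, 16, 1_000_000):
        print(f"f={f:.1f} n={n:>7}: {amdahl_speedup(f, n):5.2f}x")
    print(f"f={f:.1f} limit as n -> infinity: {1 / f:.2f}x")
```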


25/36

With an Infinite # CPUs

[Chart: maximum speedup (0 to 12) vs. portion not parallelizable (0.1 to 0.94)]


26/36

Best Case: 66% Parallelizable

[Chart: speedup (0 to 3) vs. number of cores (1 to 29)]
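The 66% case (f = 0.34) shows how quickly the curve flattens. A minimal sketch evaluating the same Amdahl expression at a few core counts across the chart's range, along with the per-core efficiency:

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Max speedup on n cores when fraction f of the work cannot be parallelized."""
    return 1.0 / (f + (1.0 - f) / n)


F_SERIAL = 0.34  # 66% parallelizable
for n in (1, 2, 4, 8, 16, 29):
    s = amdahl_speedup(F_SERIAL, n)
    print(f"{n:2d} cores: {s:.2f}x speedup, {s / n:.0%} efficiency per core")
print(f"ceiling: {1 / F_SERIAL:.2f}x")
```

The ceiling of about 2.94x is why the chart's vertical axis stops at 3.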


27/36

But Memory is a Bottleneck!

[Diagram: two CPU cores, each with its own L1 cache (typically 32 KB), sharing an L2 cache (typically 2-4 MB) in front of main memory]
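A crude way to see the bottleneck: once a program spills out of L1, every core competes for the same L2 and memory path, so aggregate throughput cannot exceed what that path delivers. This Python sketch caps the speedup of a perfectly parallel, memory-bound job at shared bandwidth divided by each core's demand; the bandwidth figures are hypothetical, and real systems add contention effects this model ignores.

```python
def bandwidth_limited_speedup(cores: int, per_core_gbps: float, shared_gbps: float) -> float:
    """Speedup of a perfectly parallel, memory-bound job on a shared memory path."""
    return min(float(cores), shared_gbps / per_core_gbps)


# Hypothetical: each core wants 2 GB/s from memory; the shared path supplies 6 GB/s
for cores in (1, 2, 4, 8):
    print(cores, bandwidth_limited_speedup(cores, per_core_gbps=2.0, shared_gbps=6.0))
```

Past three cores in this example, extra cores simply wait on memory.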


28/36

And so is Comm

[Diagram: four CPU cores, each with its own L1 cache; each pair of cores shares an L2 cache, and both L2 caches connect to main memory]

Then there's the cache coherency problem.
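The coherency problem in miniature: if two cores each keep a private copy of the same address, a write by one leaves the other holding stale data unless the hardware invalidates or updates that copy. The toy Python model below (write-through caches with optional write-invalidate snooping, far simpler than real protocols such as MESI) shows both outcomes; the class and function names are illustrative.

```python
class Cache:
    """Toy private cache: write-through, optionally invalidating peer copies on writes."""

    def __init__(self, memory: dict, peers: list, snoop: bool) -> None:
        self.memory = memory  # shared main memory: address -> value
        self.peers = peers    # every cache attached to the same bus
        self.snoop = snoop
        self.lines = {}       # this core's private copies
        peers.append(self)

    def read(self, addr: int) -> int:
        if addr not in self.lines:  # miss: fill from main memory
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]     # hit: stale if nobody invalidated it

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = value
        self.memory[addr] = value   # write-through to main memory
        if self.snoop:
            for other in self.peers:
                if other is not self:
                    other.lines.pop(addr, None)  # invalidate other copies


def run(snoop: bool) -> int:
    memory = {0x100: 1}
    bus = []
    core_a = Cache(memory, bus, snoop)
    core_b = Cache(memory, bus, snoop)
    core_b.read(0x100)         # core B caches the old value 1
    core_a.write(0x100, 2)     # core A updates it
    return core_b.read(0x100)  # what does core B see?


print(run(snoop=False), run(snoop=True))  # 1 2
```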


29/36

The Irony

Programs in L1 run blazingly fast.

But why use a 32-bit CPU that can address 4 GB on a 32 KB program?


30/36

A Colorimeter SMP Design

[Diagram: cores R, G, and B share memory over a common bus, along with three A/D converters and three displays]

Each core (R, G, B):
- Read A/D
- FIFO data
- Do FIR
- Calculate R (or G, or B)
- Display
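Each core's "FIFO data, do FIR" step is a finite impulse response filter over the most recent samples: y[n] = Σ h[k]·x[n-k]. A minimal Python sketch, with a hypothetical 4-tap moving-average filter standing in for whatever coefficients a colorimeter would actually use:

```python
from collections import deque


class FirFilter:
    """Direct-form FIR filter: y[n] = sum of h[k] * x[n-k] over the last len(h) samples."""

    def __init__(self, coeffs: list) -> None:
        self.coeffs = list(coeffs)
        # Fixed-length FIFO of past samples; history[k] holds x[n-k]
        self.history = deque([0.0] * len(self.coeffs), maxlen=len(self.coeffs))

    def step(self, sample: float) -> float:
        self.history.appendleft(sample)  # newest in, oldest falls off the far end
        return sum(h * x for h, x in zip(self.coeffs, self.history))


# Hypothetical 4-tap moving average applied to a few simulated A/D readings
fir = FirFilter([0.25, 0.25, 0.25, 0.25])
print([fir.step(x) for x in (4.0, 8.0, 12.0, 16.0)])  # [1.0, 3.0, 6.0, 10.0]
```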


31/36

ASMP

Asymmetric Multiprocessing (ASMP or AMP): multiple CPUs, identical or not, each running a specific activity.

[Diagram: two CPU cores, each with its own memory, connected by some comm link]


32/36

The Assembly Line


33/36

A More Natural Design via AMP

A/D -> FIFO -> FIR -> Calc R -> Display

A/D -> FIFO -> FIR -> Calc G -> Display

A/D -> FIFO -> FIR -> Calc B -> Display
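The assembly-line structure maps onto stations connected by FIFOs: each station does one job and hands its result to the next. In this Python sketch each station is a thread and a `queue.Queue` stands in for the link between processors; on an AMP system each station would be its own CPU. The two stations (a 12-bit A/D scaling step and a stand-in "Calc" step) and the `STOP` sentinel are hypothetical.

```python
import queue
import threading
from typing import Callable

STOP = object()  # sentinel that tells each station to shut down


def start_station(work: Callable, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """One assembly-line station: take an item, do one job on it, pass it on."""

    def run() -> None:
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # pass the shutdown signal downstream
                return
            outbox.put(work(item))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


# Hypothetical stations for one color channel
raw_q, scaled_q, out_q = queue.Queue(), queue.Queue(), queue.Queue()
stations = [
    start_station(lambda counts: counts / 4095.0, raw_q, scaled_q),  # assumes a 12-bit A/D
    start_station(lambda v: round(v * 100.0, 1), scaled_q, out_q),   # stand-in "Calc" step
]

for counts in (0, 4095, 2048):  # simulated A/D readings
    raw_q.put(counts)
raw_q.put(STOP)

results = []
while (item := out_q.get()) is not STOP:
    results.append(item)
for station in stations:
    station.join()
print(results)  # [0.0, 100.0, 50.0]
```

Because each station owns its own job and talks only through its queues, no locks on shared data are needed, which is the appeal of the AMP design.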


34/36

Another Assembly Line

[Diagram: a chain of CPUs, each with its own memory, passing data from one to the next]


35/36

Implications

- Multicore can give huge performance improvements.
- But for non-parallel problems, it may not yield much improvement.
- It's hard to impossible to predict the speed improvement of most algorithms once they grow larger than the L1 cache.
- Many embedded apps are hugely non-parallelizable. In some cases AMP offers a better solution than SMP.


36/36

Questions?
