Partitioning an Embedded System for Multicore Design
-
8/12/2019 2PartitioninganEmbeddedSystemforMulticoreDesign
1/36
Microprocessors, Advanced
Partitioning an Embedded System for Multicore Design
January 31, 2012
Jack Ganssle
-
The Schedule Grows Faster Than The Code!
IBM:
person-yrs    LOC/month
1             439
10            220
100           110
1000          55
COCOMO: Schedule = C * KLOC^M
(C and M are both > 1)
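The COCOMO relation can be tried out numerically. This is a minimal sketch, assuming Boehm's Basic COCOMO embedded-mode effort coefficients (C = 3.6, M = 1.20, result in person-months). The slide gives no values, so the coefficients and the function name are illustrative only.

```python
def cocomo_effort(kloc, c=3.6, m=1.20):
    """Estimated effort in person-months for a program of `kloc` thousand lines.

    c and m are assumed values (Basic COCOMO, embedded mode), not from the slide.
    """
    return c * kloc ** m


if __name__ == "__main__":
    for kloc in (1, 10, 100, 1000):
        effort = cocomo_effort(kloc)
        # LOC per person-month falls as the program grows, because m > 1.
        productivity = kloc * 1000 / effort
        print(f"{kloc:>5} KLOC: {effort:10.1f} person-months, "
              f"{productivity:6.1f} LOC/person-month")
```

Because the exponent is greater than 1, doubling the code size more than doubles the estimate, which is the same trend the IBM productivity figures show.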
-
The Productivity Crash
-
Partitioning Code
Fact: The easiest way to write great modules fast is to keep them small, with few dependencies.
Smaller functions:
- have fewer bugs (bug rate is 2 to 6x lower)
- are more likely to meet specs
- are done faster
-
Eye Scans
-
We Turn Micros into Mainframes
- Sensors
- Interface
1,000,000 lines of code
8051
-
Complexity is not linear with LOC
-
A Better Design
supervisory code
I/O Code
I/O Code
I/O Code
-
Small and cheap
-
Interprocessor Communications
Main CPU
Serial/Encrypt
Rangefinder
Transaction Processing
I2C: a fast serial interface
-
National Airports Radar
-
The Sergeant York
-
The Tradeoff
schedule
features
quality
-
Feature Management
-
Requirements Scrubbing
Features removed    % of designs
0 to 10%            21.4%
10 to 30%           17.9%
30 to 50%           47.7%
More than 50%       25.9%
(30% or more removed: 47.7% + 25.9% = 73.6%!)
-
Don't Wait for Hardware
- Build an I/O board that plugs into the PC
- Simulate!
- Virtualization: Virtutech, CoWare, VaST
- Fitnesse: http://fitnesse.org/
- Catsrunner: www.agilerules.com/projects/catsrunner/index.phtml
-
What About Multicore?
CPU: tens of MHz
Memory: hundreds of nsec
-
Then Came Prefetchers
CPU: tens of MHz
Queue
Memory: under 100 nsec
-
Then Came Pipelines
CPU: tens of MHz
Memory: 30-50 nsec
Old: Fetch -> Decode -> Execute
Pipelined: Fetch, Decode, and Execute overlap
-
Cache
CPU: hundreds of MHz
Cache: CPU speed
Memory: 30-50 nsec
-
Cache Splits in Two
CPU: over 1 GHz
L1 Cache: CPU speed
L2 Cache: 3-5 nsec
Memory: 30-50 nsec
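The effect of this hierarchy on average access time can be estimated with the standard average-memory-access-time formula. This is a rough sketch: the latencies are picked from the ranges on this slide (L1 at about 1 nsec for a 1 GHz CPU, L2 at 4 nsec, memory at 40 nsec), while the miss rates and the function name are assumptions made up for illustration.

```python
def average_access_ns(l1_ns, l2_ns, mem_ns, l1_miss, l2_miss):
    """Average memory access time for a two-level cache in front of main memory.

    Every access pays the L1 time; L1 misses also pay the L2 time;
    L2 misses additionally pay the main-memory time.
    """
    return l1_ns + l1_miss * (l2_ns + l2_miss * mem_ns)


if __name__ == "__main__":
    # Assumed miss rates: 5% of accesses miss L1, 20% of those also miss L2.
    good = average_access_ns(1.0, 4.0, 40.0, l1_miss=0.05, l2_miss=0.20)
    # A working set that no longer fits in cache: most accesses go to memory.
    poor = average_access_ns(1.0, 4.0, 40.0, l1_miss=0.60, l2_miss=0.80)
    print(f"cache-friendly: {good:.1f} nsec, cache-hostile: {poor:.1f} nsec")
```

With the first set of assumed miss rates the average stays near the L1 time (about 1.6 nsec); once the miss rates climb, it moves toward the main-memory figure.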
-
SMP
Symmetric Multiprocessing (SMP): multiple identical CPUs working with a shared memory array.
CPU Core CPU Core
Shared memory
-
Amdahl's Law for SMP
Max speedup = 1 / (f + (1 - f) / n)
Where:
n = Number of processors
f = Portion of the operation that cannot be parallelized (as a fraction, 0 to 1)
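A small sketch of Amdahl's Law in code, assuming its usual form, speedup = 1 / (f + (1 - f) / n), with f given as a fraction rather than a percentage. The function names are illustrative.

```python
def amdahl_speedup(f, n):
    """Maximum speedup with n processors when fraction f cannot be parallelized."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a fraction between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (f + (1.0 - f) / n)


def amdahl_limit(f):
    """Speedup as n grows without bound: the serial fraction sets the ceiling."""
    return float("inf") if f == 0 else 1.0 / f


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"f = 0.10, n = {n:2d}: {amdahl_speedup(0.10, n):.2f}x")
    print(f"f = 0.10, unlimited processors: {amdahl_limit(0.10):.2f}x")
```

Even with only 10% of the work serial, no number of processors can deliver more than a 10x speedup.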
-
With an Infinite # CPUs
[Chart: speedup (0 to 12) versus portion not parallelizable (0.1 to 0.94)]
-
Best Case: 66% Parallelizable
[Chart: speedup (0 to 3) versus number of cores (1 to 29)]
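The curve for this case can be regenerated with a few lines. This sketch reads "66% parallelizable" as a serial fraction of 0.34, which is an interpretation of the slide title, and steps through the same odd core counts as the chart's axis.

```python
SERIAL_FRACTION = 0.34  # assumed: "66% parallelizable" read as f = 1 - 0.66


def speedup(cores, f=SERIAL_FRACTION):
    """Amdahl's Law speedup for the given core count and serial fraction f."""
    return 1.0 / (f + (1.0 - f) / cores)


if __name__ == "__main__":
    previous = None
    for cores in range(1, 30, 2):  # 1, 3, 5, ... 29
        s = speedup(cores)
        gain = "" if previous is None else f" (+{s - previous:.2f})"
        print(f"{cores:2d} cores: {s:.2f}x{gain}")
        previous = s
    print(f"ceiling with unlimited cores: {1.0 / SERIAL_FRACTION:.2f}x")
```

The gain from each additional pair of cores shrinks quickly, and the total never reaches 3x.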
-
But Memory is a Bottleneck!
CPU Core
L1 Cache (typically 32 KB)
CPU Core
L1 Cache (typically 32 KB)
Shared L2 Cache (typically 2-4 MB)
Memory
-
And so is Comm
Memory
CPU Core
L1 Cache
CPU Core
L1 Cache
Shared L2 Cache
CPU Core
L1 Cache
CPU Core
L1 Cache
Shared L2 Cache
Then there's the cache coherency problem
-
The Irony
Programs in L1 run blazingly fast
But why use a 32-bit CPU that can address 4 GB on a 32 KB program?
-
A Colorimeter SMP Design
Memory
Common Bus
A/D A/D A/D Display Display Display
Core R
- Read A/D
- FIFO data
- Do FIR
- Calculate R
- Display
Core G
- Read A/D
- FIFO data
- Do FIR
- Calculate G
- Display
Core B
- Read A/D
- FIFO data
- Do FIR
- Calculate B
- Display
-
ASMP
Asymmetric Multiprocessing (ASMP or AMP)
Multiple CPUs, identical or not, each running a specific activity
CPU Core
Memory
CPU Core
Memory
Some comm link
-
The Assembly Line
-
A More Natural Design via AMP
A/D -> FIFO -> FIR -> Calc B -> Display
A/D -> FIFO -> FIR -> Calc G -> Display
A/D -> FIFO -> FIR -> Calc R -> Display
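The three independent chains can be sketched in software. In this illustration each color channel runs its own A/D, FIFO, FIR, Calc, and Display sequence with no data shared between channels. Worker threads stand in for the separate CPUs, and the filter taps, the "Calc" step, and all names are assumptions, since the slide does not specify them.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

TAPS = [0.25, 0.25, 0.25, 0.25]  # assumed 4-tap moving-average FIR


def fir(samples, taps=TAPS):
    """Direct-form FIR: y[n] = sum(taps[k] * x[n - k]), with zero initial history."""
    history = deque([0.0] * len(taps), maxlen=len(taps))
    out = []
    for x in samples:
        history.appendleft(x)  # newest sample first; the oldest falls off the end
        out.append(sum(t * h for t, h in zip(taps, history)))
    return out


def run_channel(name, raw_samples):
    """One 'core': A/D -> FIFO -> FIR -> Calc -> Display for a single color."""
    fifo = deque(raw_samples)          # stands in for the A/D FIFO
    filtered = fir(list(fifo))
    value = filtered[-1]               # hypothetical "Calc": latest filtered reading
    return f"{name}: {value:.2f}"      # stands in for the display


def run_colorimeter(readings):
    """Run each channel's chain independently, one worker per channel."""
    with ThreadPoolExecutor(max_workers=len(readings)) as pool:
        futures = [pool.submit(run_channel, name, raw)
                   for name, raw in readings.items()]
        return [f.result() for f in futures]


if __name__ == "__main__":
    samples = {"R": [4, 4, 4, 4], "G": [8, 8, 8, 8], "B": [0, 0, 0, 0]}
    for line in run_colorimeter(samples):
        print(line)
```

Since no channel touches another channel's data, nothing here needs a shared bus, shared memory, or cache coherency.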
-
Another Assembly Line
CPU
Memory
CPU
Memory
CPU
Memory
Data
Memory
CPU
-
Implications
- Multicore can give huge performance improvements.
- But for non-parallel problems it may not yield much improvement.
- It's hard to impossible to predict speed improvements of most algorithms once they grow larger than L1.
- Many embedded apps are hugely non-parallelizable.
- In some cases AMP offers a better solution than SMP.
-
Questions?