5/17/07 6.375: Bichler, Carli, Yamhure 1
A Parametrizable Processor
Olivier Bichler, Roberto Carli, Alessandro Yamhure
Motivation
• Efficient system-on-a-chip solutions
• Wide spectrum of requirements
– Performance
– Area / Power
• Tuning without engineering
Overview
• One-Rule Synchronous Processor
• Combinational stages packaged as functions
• Parameter-controlled configuration
• Optimization (performance/area):
– Compiler macros
– Aggressive compiler
• Results / tradeoffs
One Rule to rule them all
• Explicit guards (WILL_FIRE)
– FIFOF Methods
– Data Memory
– Writeback stage guard
• Customized SFIFO
– First, find, find2, notFull, notEmpty, deq < enq < clear
– Guards < Actions
– Writeback < Execute
– No Bypassing
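The searchable FIFO and the "No Bypassing" policy can be illustrated with a small software model. This is only a sketch under stated assumptions, not the project's Bluespec code: the class name `SearchableFifo`, its depth, the helper `must_stall`, and the idea that each entry holds a destination register number are all hypothetical. `find2` is assumed to be a second search port with the same behaviour as `find`, so that two source registers can be checked in one cycle.

```python
from collections import deque


class SearchableFifo:
    """Toy model of a searchable FIFO of destination register numbers."""

    def __init__(self, capacity=2):
        self.capacity = capacity  # hypothetical depth, not taken from the slides
        self.items = deque()

    def not_full(self):
        return len(self.items) < self.capacity

    def not_empty(self):
        return len(self.items) > 0

    def first(self):
        return self.items[0]

    def enq(self, dest_reg):
        # Explicit guard: the caller is expected to check not_full() first.
        assert self.not_full(), "enq guard violated"
        self.items.append(dest_reg)

    def deq(self):
        # Explicit guard: the caller is expected to check not_empty() first.
        assert self.not_empty(), "deq guard violated"
        return self.items.popleft()

    def clear(self):
        self.items.clear()

    def find(self, reg):
        """True if some in-flight entry will write register `reg`."""
        return reg in self.items

    # Assumed second search port with identical behaviour.
    find2 = find


def must_stall(wb_queue, src1, src2):
    """Without bypassing, stall while either source has a pending write."""
    return wb_queue.find(src1) or wb_queue.find2(src2)
```

In this model an instruction that reads a register still waiting in the queue simply waits until the entry has been dequeued, which is one plausible reading of "No Bypassing".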
Packaging for Abstraction
• Achieve highly modular parameterization
• Combinational workhorse unchanged
– Can be packaged/abstracted into functions
– ActionValue functions
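The idea of keeping the combinational stages unchanged while parameters decide which queues sit between them can be sketched as a toy cycle-level model. Everything here is an assumption made for illustration: the single-instruction `addi` ISA, the four-entry register file, the queue depths, and the function names. Only the queue names pcQ and wbQ come from the slides, and here pcQ holds already-fetched instructions for simplicity.

```python
from collections import deque


# Combinational stage functions: identical in every configuration.
def fetch(program, pc):
    return program[pc]


def execute(instr, regs):
    op, rd, rs, imm = instr
    assert op == "addi"  # toy ISA: rd = rs + imm
    return rd, regs[rs] + imm


def writeback(regs, result):
    rd, value = result
    regs[rd] = value


def run(program, use_pcq, use_wbq):
    """Run to completion; returns (final registers, cycle count)."""
    regs, pc, cycles, retired = [0] * 4, 0, 0, 0
    pcq, wbq = deque(), deque()
    while retired < len(program):  # each iteration models one firing of the rule
        cycles += 1
        pending = [rd for rd, _ in wbq]  # guards see start-of-cycle state
        # Writeback stage
        if wbq:
            writeback(regs, wbq.popleft())
            retired += 1
        # Execute stage
        if use_pcq:
            instr = pcq[0] if pcq else None
        else:
            instr = fetch(program, pc) if pc < len(program) else None
        if instr is not None and instr[2] not in pending:  # stall on hazard
            result = execute(instr, regs)
            if use_pcq:
                pcq.popleft()
            else:
                pc += 1
            if use_wbq:
                wbq.append(result)
            else:
                writeback(regs, result)
                retired += 1
        # Fetch stage (a separate stage only when pcQ is present)
        if use_pcq and len(pcq) < 2 and pc < len(program):
            pcq.append(fetch(program, pc))
            pc += 1
    return regs, cycles
```

Calling `run(program, use_pcq, use_wbq)` with each of the four flag combinations should give the same final register values and differ only in cycle count. Cycle count alone says nothing about speed, because the clock period also changes with the number of stages.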
3-stage packaged version
2-stage version (no pcQ)
2-stage version (no wbQ)
1-stage version
High-level schematics
2-stage Exploration
• Instruction memory
– Never skipped
– High latency
• Data memory
– Only used on memory LD/ST instructions
– Pipeline causes stalls
– On branches/jumps no need for writeback
• However, specific SOC tasks can benefit from alternative solutions
Test Strategy
• Custom Makefile
– Scans and changes parameters automatically
– Synthesizes various configurations
– Reports IPC, IPS, clock period, area
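The sweep performed by the Makefile can be pictured as a loop over the two queue parameters that collects the reported metrics. The sketch below is not the project's Makefile. The callback `measure`, which would stand in for the synthesis and simulation runs, is hypothetical, and so is the assumption that the clock period is given in nanoseconds. The one relation used is the standard one, IPS = IPC / clock period.

```python
from itertools import product


def config_label(use_pcq, use_wbq):
    """Label a configuration the way the result charts do."""
    pc = "w pcQ" if use_pcq else "w/o pcQ"
    wb = "w wbQ" if use_wbq else "w/o wbQ"
    return f"{pc}, {wb}"


def instructions_per_second(ipc, clock_period_ns):
    """IPS = IPC * clock frequency = IPC / clock period (period assumed in ns)."""
    return ipc / (clock_period_ns * 1e-9)


def sweep(measure):
    """Collect metrics for all four configurations.

    `measure(use_pcq, use_wbq)` is a hypothetical user-supplied callback
    returning (ipc, clock_period_ns, area) for one configuration.
    """
    report = {}
    for use_pcq, use_wbq in product([False, True], repeat=2):
        ipc, period_ns, area = measure(use_pcq, use_wbq)
        report[config_label(use_pcq, use_wbq)] = {
            "IPC": ipc,
            "clock_period_ns": period_ns,
            "IPS": instructions_per_second(ipc, period_ns),
            "area": area,
        }
    return report
```

A real flow would replace `measure` with calls into the build and simulation tools. The dictionary returned by `sweep` then holds one row per configuration, in the same order as the chart legends.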
Results: Clock Period
[Figure: chart of post-place+route effective clock period (axis 0.00 to 8.00; units not given) for four configurations: w/o pcQ, w/o wbQ; w/o pcQ, w wbQ; w pcQ, w/o wbQ; w pcQ, w wbQ]
Expectation:
Effective clock period is expected to decrease with the number of pipelined stages, as we make the critical path shorter
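The reasoning behind this expectation can be made concrete with an idealized model: the clock period is bounded by the longest chain of combinational logic between two pipeline registers. The function below is a sketch only. The block delays, the `cuts` parameter, and the register overhead are hypothetical inputs, not measurements from this project, and real post-place+route results can deviate from such a model.

```python
def clock_period(block_delays, cuts, register_overhead=0.0):
    """Idealized clock period of a pipeline.

    block_delays: delays of the combinational blocks, in program order.
    cuts: set of indices i such that a pipeline register follows block i.
    register_overhead: fixed per-stage cost (setup, clock-to-q), hypothetical.
    """
    longest, current = 0.0, 0.0
    last = len(block_delays) - 1
    for i, delay in enumerate(block_delays):
        current += delay
        if i in cuts or i == last:
            # A stage ends here; its delay is a candidate critical path.
            longest = max(longest, current)
            current = 0.0
    return longest + register_overhead
```

With no cuts the period equals the sum of all block delays. Each added cut can only shorten the longest stage, never lengthen it, which is why more stages are expected to give a shorter period until register overhead or an unbalanced stage dominates.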
Results: IPS
[Figure: two charts of IPS for the median, qsort, towers, vvadd and multiply benchmarks, each across four configurations (w/o pcQ, w/o wbQ; w/o pcQ, w wbQ; w pcQ, w/o wbQ; w pcQ, w wbQ). Axes run from 0 to 90000000 and from 0 to 100000000. Panel labels: Design exploration; non-EHR RegFile; EHR RegFile]
Results: Area
[Figure: chart of post-synthesis total area (axis 18000 to 24000; units not given) for four configurations: w/o pcQ, w/o wbQ; w/o pcQ, w wbQ; w pcQ, w/o wbQ; w pcQ, w wbQ]
[Figure: two charts of post-place+route total area (axes 300000 to 370000 and 325000 to 375000; units not given) for the same four configurations. Panel labels: non-EHR RegFile; EHR RegFile]
Expectation:
Area should increase with the number of pipelined stages
Performance/Area Tradeoffs
[Figure: chart of IPS / Post-synthesis area (axis 0 to 4000) for four configurations: w/o pcQ, w/o wbQ; w/o pcQ, w wbQ; w pcQ, w/o wbQ; w pcQ, w wbQ]
[Figure: chart of FOM = SQRT(IPS) / Post-synthesis area (axis 0 to 0.45) for the same four configurations]
FOM ∝ IPS / (Power × Area)
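The two ratios named on this slide, IPS / post-synthesis area and SQRT(IPS) / post-synthesis area, are simple to compute once IPS and area are known for each configuration. The sketch below shows the arithmetic only. The function names and the helper `best_by_fom` are hypothetical, and any numbers passed in are the caller's own; none of the project's measured values are reproduced here.

```python
import math


def ips_per_area(ips, area):
    """Plain performance density: IPS divided by post-synthesis area."""
    return ips / area


def fom(ips, area):
    """Figure of merit as labelled on the chart: SQRT(IPS) / area."""
    return math.sqrt(ips) / area


def best_by_fom(results):
    """Return the label with the highest FOM.

    `results` maps a configuration label to an (ips, area) pair;
    both the mapping and its contents are supplied by the caller.
    """
    return max(results, key=lambda label: fom(*results[label]))
```

Taking the square root of IPS weights area more heavily than the plain ratio does, so a configuration that gains a little speed for a lot of area scores worse under FOM than under IPS / area.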
Conclusions
• Task-specific, customizable and flexible processor design
• User-defined parameter controls degree of parallelism / number of stages
• Each configuration optimized for area and performance
• Balanced tradeoff between configurations