RAMP Gold: Architecture and Timing Model
Andrew Waterman, Zhangxi Tan, Rimas Avizienis, Yunsup Lee, David Patterson,
Krste Asanović
Parallel Computing Laboratory
University of California, Berkeley
RAMP Gold Overview
• Tiled CMP simulator
• ISA: SPARC V8
– (ARM/Thumb-2 later?)
• Split timing and function (both on FPGA)
• Host-multithreaded
• Runs on V5LX110T (XUP)
[Figure: Par Lab InfiniCore. A Functional Model Pipeline with Arch State alongside a Timing Model Pipeline with Timing State.]
RAMP Gold Target Machine
[Figure: 64 SPARC V8 cores, each with its own I$ and D$, connected through a shared L2$ / interconnect to DRAM.]
RAMP Gold v1 Target Features
• 64 single-issue, in-order SPARC V8 processors
– Simple, 5-stage pipeline
– FPU
• Cache timing model
– Configurable size, line size, associativity, miss penalty, shared/private
– Change parameters without resynthesis
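The configurable cache parameters listed above can be pictured as a small runtime record. The following is a minimal sketch, not RAMP Gold's actual configuration interface: the class name `CacheTimingParams`, its field names, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CacheTimingParams:
    """Hypothetical runtime parameters for a cache timing model."""
    size_bytes: int
    line_bytes: int
    associativity: int
    miss_penalty_cycles: int
    shared: bool  # True: shared among cores, False: private per core

    def __post_init__(self):
        # Power-of-two sizes keep the tag/index/offset split a pure bit slice.
        for name in ("size_bytes", "line_bytes", "associativity"):
            value = getattr(self, name)
            if value <= 0 or value & (value - 1):
                raise ValueError(f"{name} must be a positive power of two")
        if self.line_bytes * self.associativity > self.size_bytes:
            raise ValueError("cache too small for this line size and associativity")
        if self.miss_penalty_cycles < 0:
            raise ValueError("miss_penalty_cycles must be non-negative")

    @property
    def num_sets(self) -> int:
        return self.size_bytes // (self.line_bytes * self.associativity)

    @property
    def offset_bits(self) -> int:
        return self.line_bytes.bit_length() - 1

    @property
    def index_bits(self) -> int:
        return self.num_sets.bit_length() - 1


# Illustrative values only; the slides do not give concrete settings.
l1 = CacheTimingParams(size_bytes=32 * 1024, line_bytes=32, associativity=4,
                       miss_penalty_cycles=20, shared=False)

# "Reconfiguring" is just building a new record; nothing is resynthesized.
l1_bigger = replace(l1, size_bytes=64 * 1024, associativity=8)
```

In this sketch a parameter change only rewrites a few values that the timing model reads, which is the property the "without resynthesis" bullet describes.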
RAMP Gold Architecture
• Mapping the target machine directly to an FPGA is inefficient
• Solution: split timing and functionality + multithreading
– The timing logic decides how many target cycles an instruction sequence should take
– Simulating the functionality of an instruction might take multiple host cycles
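The two sub-bullets above describe two independent clocks: host cycles spent computing a result, and target cycles charged by the timing logic. A minimal sketch of that bookkeeping follows; the opcode names and both cost tables are made-up values for illustration, not measurements from RAMP Gold.

```python
from dataclasses import dataclass

# Hypothetical per-opcode costs, chosen only for illustration.
HOST_CYCLES = {"add": 1, "ld": 3, "fadd": 10}   # host cycles to compute the result
TARGET_CYCLES = {"add": 1, "ld": 1, "fadd": 2}  # target cycles charged by the timing logic


@dataclass
class Counters:
    host_cycles: int = 0
    target_cycles: int = 0


def simulate(program):
    """Accumulate host and target cycles separately for a list of opcodes."""
    counters = Counters()
    for opcode in program:
        counters.host_cycles += HOST_CYCLES[opcode]      # cost of simulating the function
        counters.target_cycles += TARGET_CYCLES[opcode]  # time as seen by the target
    return counters


result = simulate(["add", "ld", "fadd", "add"])
```

Here the four instructions take 15 host cycles to simulate but advance the target by only 5 cycles; neither number constrains the other.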
Function/Timing Split Advantages
• Flexibility
– Can configure target at runtime
– Synthesize design once, change target model parameters at will
• Efficient FPGA resource usage
– Example 1: model a 2-cycle FPU in 10 host cycles
– Example 2: model a 16MB L2$ using only 256KB host BRAM to store tags/metadata
• Enables multithreading
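Example 2 can be checked with a little arithmetic: because only tags and metadata are stored, the host budget per target cache line is small. The sketch below computes that budget; the line sizes tried (64 and 128 bytes) are assumptions, since the slide does not state the modeled line size.

```python
MB = 1024 * 1024
KB = 1024


def metadata_bits_per_line(cache_bytes: int, line_bytes: int, bram_bytes: int) -> float:
    """Host BRAM bits available per modeled cache line."""
    num_lines = cache_bytes // line_bytes
    return (bram_bytes * 8) / num_lines


# Target data capacity versus host storage actually used.
capacity_ratio = (16 * MB) // (256 * KB)

# Assumed line sizes; the slide does not specify one.
bits_with_64b_lines = metadata_bits_per_line(16 * MB, 64, 256 * KB)
bits_with_128b_lines = metadata_bits_per_line(16 * MB, 128, 256 * KB)
```

Under these assumptions the host stores 64 times less than the target cache would hold, leaving 8 bits of tag and state per line at 64-byte lines or 16 bits at 128-byte lines.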
Split Timing and Function
• Functional model executes ISA correctly
• Timing model determines how long a program takes to run
[Figure: Target Machine (CPU, L1 D$, MEM) = Functional Model (CPU FM, MEM FM) + Timing Model (CPU TM, L1 D$ TM, MEM TM).]
TM + FM from 30,000 ft
[Figure: block diagram connecting the CPU Timing Model, L1 D$ Timing Model, CPU Functional Model, Memory Timing Model, and Memory Functional Model. Signal labels: instruction; ld/st address, store data; ld/st address; stall; load data; instruction complete.]
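The signals named in the diagram (ld/st address, store data, load data, stall) suggest how a memory operation touches both halves of the simulator. The following is a rough sketch under stated assumptions, not the actual RAMP Gold interfaces: all class and function names are hypothetical, the cache is modeled as an unbounded set of resident lines with write-allocate, and memory latency is a fixed constant.

```python
class MemoryFunctionalModel:
    """Holds the actual data; knows nothing about time."""
    def __init__(self):
        self.words = {}

    def store(self, addr, data):
        self.words[addr] = data

    def load(self, addr):
        return self.words.get(addr, 0)


class MemoryTimingModel:
    """Assumed fixed-latency memory; returns stall cycles only."""
    def __init__(self, latency_cycles):
        self.latency_cycles = latency_cycles

    def stall_cycles(self, addr):
        return self.latency_cycles


class L1DCacheTimingModel:
    """Tracks which lines are resident; holds no data."""
    def __init__(self, line_bytes, memory_tm):
        self.line_bytes = line_bytes
        self.memory_tm = memory_tm
        self.resident_lines = set()  # simplification: never evicts

    def stall_cycles(self, addr):
        line = addr // self.line_bytes
        if line in self.resident_lines:
            return 0  # hit: no stall
        self.resident_lines.add(line)  # assumed write-allocate on any miss
        return self.memory_tm.stall_cycles(addr)


def run(ops, mem_fm, dcache_tm):
    """ops: (op, address, store_data) tuples with op in {'ld', 'st'}."""
    target_cycles = 0
    loaded = []
    for op, addr, data in ops:
        if op == "st":
            mem_fm.store(addr, data)          # function: ld/st address + store data
        else:
            loaded.append(mem_fm.load(addr))  # function: load data comes back
        # Timing: one base cycle plus whatever stall the cache model reports.
        target_cycles += 1 + dcache_tm.stall_cycles(addr)
    return loaded, target_cycles


mem_fm = MemoryFunctionalModel()
dcache_tm = L1DCacheTimingModel(32, MemoryTimingModel(20))
loaded, cycles = run(
    [("st", 0x100, 7), ("ld", 0x100, None), ("ld", 0x200, None)],
    mem_fm, dcache_tm,
)
```

The store and the second load each miss and stall for 20 cycles, the first load hits, so the three operations cost 43 target cycles while the functional model independently returns the values 7 and 0.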
TM + FM from 3,000 ft
[Figure: the same block diagram in more detail. Blocks: CPU TM, CPU FM, L1 D$ TM, Memory Timing Model, Memory Functional Model. Pipeline stage labels: IF, CTRL, DEC, EX, MEM, WB, TM1, TM2. Signal labels: instruction; ld/st address, store data; ld/st address; stall; load data; instruction complete.]
Example: Target Load Miss
[Figure: the same block diagram as the previous slide, annotated with numbered steps 1 through 7 tracing a target load miss; step 4 is marked in three places.]
Timing-Driven Host Pipeline
[Figure: host pipeline. Blocks: CPU Functional Model, CPU/D$ Timing Model, L1 D$ TM, Target Memory TM/FM, Store Buffer, Load Result Buffer. Stage labels: TS, IF, DE, EX, MEM1, MEM2, WB, TM1, TM2, TM3. Labels between the models: {TID,INST} and {TID,ADDR}. Example threads T0, T1, and T2 carry ADD, LD, and ST instructions.]
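The {TID,INST} tags in the figure reflect host multithreading: instructions from different target threads share one host pipeline, each tagged with its thread ID. A minimal sketch of one possible interleaving follows. Round-robin selection is an assumption here; the slides do not state the scheduling policy, and the function name `interleave` is hypothetical.

```python
from collections import deque


def interleave(threads):
    """Yield (tid, inst) pairs in round-robin order from per-thread instruction lists."""
    queues = deque((tid, deque(insts)) for tid, insts in enumerate(threads) if insts)
    while queues:
        tid, insts = queues.popleft()
        yield tid, insts.popleft()     # issue one instruction tagged with its thread ID
        if insts:
            queues.append((tid, insts))  # thread still has work: back of the line


# Three target threads with illustrative instruction streams.
schedule = list(interleave([["ADD", "LD"], ["LD", "ST"], ["ST"]]))
```

Consecutive pipeline slots then belong to different target threads, so a long-latency operation in one thread does not leave the host pipeline idle.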
Cache Modeling
• The cache model maintains tag, state, protocol bits internally
• Whenever the functional model issues a memory operation, the cache model determines how many target cycles to stall
[Figure: the address is split into tag, index, and offset fields; the index selects (tag, state) entries, one per way of associativity, and comparators (==) produce hit/miss.]
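The two bullets above amount to a tag-only cache: no data is stored, and each access returns a stall count. The sketch below is one way to write that down. It assumes LRU replacement and tracks only presence (no coherence or dirty state), neither of which the slides specify; the class name and the parameter values are illustrative.

```python
class CacheTimingModel:
    """Tag-only set-associative cache that returns stall cycles per access."""

    def __init__(self, size_bytes, line_bytes, associativity, miss_penalty):
        self.line_bytes = line_bytes
        self.associativity = associativity
        self.miss_penalty = miss_penalty
        self.num_sets = size_bytes // (line_bytes * associativity)
        # Each set holds tags only, ordered least- to most-recently used.
        self.sets = [[] for _ in range(self.num_sets)]

    def access(self, addr):
        line = addr // self.line_bytes  # drop the offset bits
        index = line % self.num_sets    # set index
        tag = line // self.num_sets     # remaining upper bits
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)
            ways.append(tag)            # hit: mark as most recently used
            return 0
        if len(ways) == self.associativity:
            ways.pop(0)                 # evict the least recently used tag (assumed LRU)
        ways.append(tag)
        return self.miss_penalty        # miss: stall the target this many cycles


# Tiny cache so that conflicts are easy to see: 4 sets, 2 ways, 16-byte lines.
cache = CacheTimingModel(size_bytes=128, line_bytes=16, associativity=2, miss_penalty=10)
stalls = [cache.access(a) for a in (0x00, 0x04, 0x40, 0x80, 0x00)]
```

All five addresses map to set 0. The second access hits the line brought in by the first, and the third distinct line evicts the first, so the final access to 0x00 misses again.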
Multithreaded, Pipelined Cache TM
[Figure: pipelined tag lookup. Labels: Address, Index, three (tag, state) arrays each followed by a comparator (==), and a "hit?" output.]
Status
• Functional + simple timing model work in HW
– Running real programs (e.g. SPLASH2)
• Near-term future work
– Move from the current “functional-first + stall” configuration to the timing-driven one described here
– More interesting memory system timing model
– Functional potpourri (FDIV, MMU, …)