Evaluation of RISC-V RTL with FPGA-Accelerated Simulation
Donggyu Kim, Christopher Celio, David Biancolin, Jonathan Bachrach, Krste Asanovic
CARRV 2017, 10/14/2017
Evaluation Methodologies For Computer Architecture Research
Analytic Power/Energy Modeling
Microarchitectural Cycle-by-Cycle Software Simulators
Simulation Sampling
How To Do Computer Architecture Research In The Past?
Few lines of C/C++ code
Paper
100M~1B instructions
How Easy Are Single-Cycle Perfect Caches?
§ 10 lines of C++ code in MARSSx86 → Easy to implement unrealistic, non-cycle-accurate models
No µarch Simulation Any More!
§ Validation is difficult
  - Is your modeling still cycle-accurate?
§ Simulation is too slow → Simulation Sampling
Validation of one design instance[1]
Other design points?
Your target modeling?
[1] Gutierrez et al. Sources of error in full-system simulation, ISPASS 2014
No Simulation Sampling!
§ Phase-based Sampling (e.g. SimPoint)
  - No theoretically guaranteed error bounds
  - Short periods of phases show similar IPC whenever repeated
  - 401.bzip2 with its reference input in BOOM-2w
§ Statistical Sampling (e.g. SMARTS)
  - Statistically bounded errors
  - State warming problems
    • µarchitectural state should be recovered from functional simulators
§ What about managed-language applications?
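The error bounds of statistical sampling come from standard sampling theory: for a target relative error at a given confidence level, the number of measurement units needed grows with the square of the metric's coefficient of variation. The sketch below is illustrative only and assumes a normal approximation; the function name and the example numbers are hypothetical and are not taken from the talk.

```python
import math
from statistics import NormalDist


def required_samples(coeff_of_variation, rel_error, confidence):
    """Smallest n with z * V / sqrt(n) <= rel_error (normal approximation)."""
    if not (0.0 < confidence < 1.0) or rel_error <= 0.0 or coeff_of_variation < 0.0:
        raise ValueError("invalid sampling parameters")
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * coeff_of_variation / rel_error) ** 2)


# Hypothetical example: per-sample CPI with coefficient of variation 1.0,
# targeting +/-5% relative error at 95% confidence.
n = required_samples(coeff_of_variation=1.0, rel_error=0.05, confidence=0.95)
print(n)
```

Halving the tolerated error roughly quadruples the number of samples, and every sample still needs warmed microarchitectural state, which is the cost the slide points at.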
Build RTL To Validate Your Design Ideas!
§ RTL is no longer difficult
  - Hardware construction languages (e.g. Chisel)
  - RISC-V implementation codebase (e.g. RocketChip)
§ RTL simulation is no longer slow
  - Tens of MIPS using FPGAs
[Figure: design flow; labels: Hardware Specification, RTL Implementation, FPGA-Accelerated Simulation, Performance Evaluation, CAD Tools, Area/Timing/Power Evaluation]
Old BOOM Layout
[Figure: BOOM layout with labeled blocks: RegFile, ICache, Uncore, LSU, Rename Table, FPU, ROB, Free List, Issue Window, Branch Predictor, ALUs, Fetch Buffer, DCache, DCache Control, IDIV, IMUL, Busy Table, Bypasses]
MIDAS: Turn Your RTL into Gold
§ Automatically transforms any RTL design into an FPGA-accelerated RTL simulator
§ Quickly evaluates performance, power, and energy of RTL designs with realistic software applications
§ Flexibly co-simulates abstract HW & SW models
[Figure label: Target RTL]
MIDAS Custom Compiler Passes
§ FIRRTL: IR for RTL transforms (https://github.com/freechipsproject/firrtl)
§ Instrument RTL designs for
  - Accurate performance modeling
  - Easy simulation control in FPGA
  - Interactions with abstract timing models
  - RTL state snapshots for energy modeling
[Figure: the FIRRTL compiler turns a Target RTL Design into an FPGA-Accelerated RTL Simulator; passes shown: Macro Mapping, FAME1 Transform, Scan Chain Insertion, Simulation Mapping, Platform Mapping]
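The FAME1 transform named in the figure is commonly described as decoupling target time from host time: the target advances one cycle only when a timing token is present on every input channel and there is room on every output channel, and otherwise stalls with its state frozen. The following is a minimal software sketch of that idea, not MIDAS code; the class, the single input/output channel, and the toy accumulator target are all hypothetical.

```python
from collections import deque


class Fame1Accumulator:
    """Toy target: adds its input token to a running sum each target cycle."""

    def __init__(self, out_capacity=2):
        self.in_tokens = deque()    # input channel (one token per target cycle)
        self.out_tokens = deque()   # output channel
        self.out_capacity = out_capacity
        self.state = 0
        self.target_cycle = 0

    def host_step(self):
        """One host cycle; returns True only if the target advanced."""
        can_fire = bool(self.in_tokens) and len(self.out_tokens) < self.out_capacity
        if not can_fire:
            return False  # target clock is gated, state is unchanged
        self.state += self.in_tokens.popleft()
        self.out_tokens.append(self.state)
        self.target_cycle += 1
        return True


def run(host_schedule):
    """host_schedule: per host cycle, an arriving input token or None."""
    sim = Fame1Accumulator()
    outputs = []
    for arriving in host_schedule:
        if arriving is not None:
            sim.in_tokens.append(arriving)
        sim.host_step()
        while sim.out_tokens:
            outputs.append(sim.out_tokens.popleft())
    return outputs, sim.target_cycle


fast = run([1, 2, 3])
slow = run([1, None, None, 2, None, 3])
print(fast, slow)
```

Both schedules produce the same output sequence and the same target cycle count; host-side stalls change only how long the simulation takes, not what the target computes.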
Mapping Simulation to the FPGA Host
[Figure: target system (I/O Devices, Processor, L2 Cache / Main Memory) mapped onto an FPGA board (e.g. Xilinx Zynq); labels: Software, FPGA, I/O Devices, Processor, Memory System Timing, Simulation Driver, Board DRAM, L2 Cache / Main Memory, I/O Endpoints]
§ Simulation Speed: 3.56 MHz (ISCA '16) → ~40 MHz (CARRV '17)
§ I/O Endpoints
  - Low-level timing tokens <-> High-level transactions
  - Optimize the communications between SW & FPGA
§ L2 / Main Memory
  - Abstract timing model (L2$ tags) in FPGA
  - Actual data in Board DRAM
  - Set size, associativity, block size, latency: runtime configurable
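A tags-only timing model keeps no data, just enough tag state to decide whether an access hits and therefore how many cycles to charge. The sketch below illustrates the idea in software under stated assumptions: LRU replacement, 8 ways, and 64-byte blocks are hypothetical choices, and the 23-cycle and 80-cycle latencies are the L2 and DRAM values quoted elsewhere in this deck. It is not the MIDAS implementation.

```python
from collections import OrderedDict


class TagOnlyCacheModel:
    """Set-associative tag array with LRU replacement; stores no data."""

    def __init__(self, size_bytes, ways, block_bytes, hit_latency, miss_penalty):
        self.num_sets = size_bytes // (ways * block_bytes)
        if self.num_sets < 1:
            raise ValueError("cache too small for the given ways/block size")
        self.ways = ways
        self.block_bytes = block_bytes
        self.hit_latency = hit_latency
        self.miss_penalty = miss_penalty
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, addr):
        """Return the latency in cycles charged for this access."""
        block = addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lru = self.sets[index]
        if tag in lru:
            lru.move_to_end(tag)        # mark as most recently used
            return self.hit_latency
        if len(lru) >= self.ways:
            lru.popitem(last=False)     # evict the least recently used tag
        lru[tag] = True
        return self.hit_latency + self.miss_penalty


# Geometry is plain constructor data, so it can be changed between runs.
l2 = TagOnlyCacheModel(size_bytes=1 << 20, ways=8, block_bytes=64,
                       hit_latency=23, miss_penalty=80)
print(l2.access(0x1000), l2.access(0x1000))
```

Because only tags are tracked, the model stays small while the actual data can live elsewhere, which matches the split between FPGA timing model and board DRAM described on the slide.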
Memory Timing Model Validation
§ A pointer-chase µbenchmark of ccbench running in BOOM (https://github.com/ucb-bar/ccbench)
§ L1$: 16 KiB, 6 cycles
§ L2$: 1 MiB, 6+23 cycles (runtime configurable)
§ DRAM: 6+23+80 cycles (runtime configurable)
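The latencies above imply a three-step staircase for a pointer chase: 6 cycles while the working set fits in L1, 29 while it fits in L2, and 109 once it spills to DRAM. The sketch below just encodes that arithmetic. It is an idealized steady-state model that assumes every access hits the smallest level able to hold the whole working set, ignoring TLB misses, conflict misses, and the transition regions a real measurement would show.

```python
KIB = 1024
MIB = 1024 * KIB

# Sizes and latencies as listed on the slide.
L1_SIZE, L1_LATENCY = 16 * KIB, 6
L2_SIZE, L2_LATENCY = 1 * MIB, 23
DRAM_LATENCY = 80


def expected_load_latency(working_set_bytes):
    """Idealized cycles per dependent load for a given working-set size."""
    if working_set_bytes <= L1_SIZE:
        return L1_LATENCY
    if working_set_bytes <= L2_SIZE:
        return L1_LATENCY + L2_LATENCY
    return L1_LATENCY + L2_LATENCY + DRAM_LATENCY


for size in (8 * KIB, 256 * KIB, 16 * MIB):
    print(size, expected_load_latency(size))
```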
Target Design: Rocket Chip Generator

| Parameter | Rocket (In-order Processor) | BOOM-2w (Version 1) (Out-of-order Processor) |
|---|---|---|
| Fetch-width | 1 | 2 |
| Issue-width | 1 | 3 |
| Physical registers | 32 (int) / 32 (fp) | 110 |
| BTB entries | 40 | 40 |
| RAS entries | 2 | 4 |
| MSHR entries | 2 | 2 |

Parameters listed with a single value:

| Parameter | Value |
|---|---|
| Issue slots | 20 |
| ROB size | 80 |
| Ld/St entries | 16/16 |
| Branch predictor | gshare: 16 KiB history |
| L1 I$ / D$ | 16 KiB or 32 KiB |
| ITLB / DTLB reaches | 128 KiB / 128 KiB |
| L2 $ | 1 MiB / 23 cycles |
| DRAM latency | 80 cycles |
SPEC2006int Benchmark Suite
§ Instruction count with RISC-V for the reference inputs
§ Instruction Count Average: 2.08 Trillion
§ 445.gobmk, 456.hmmer, 462.libquantum: fail in BOOM

| Benchmarks | Instruction Count (T) | Benchmarks | Instruction Count (T) |
|---|---|---|---|
| 400.perlbench | 2.48 | 458.sjeng | 2.85 |
| 401.bzip2 | 3.08 | 462.libquantum | 2.09 |
| 403.gcc | 1.37 | 464.h264ref | 5.07 |
| 429.mcf | 0.29 | 471.omnetpp | 0.61 |
| 445.gobmk | 2.04 | 473.astar | 1.05 |
| 456.hmmer | 2.95 | 483.xalancbmk | 1.10 |
Simulation Time for SPEC2006int

| Simulators | Speed | Average | Max (464.h264ref: 5T) |
|---|---|---|---|
| GEM5+Ruby | 100 KIPS [1] | 240 days (8 months) | 640 days (1.8 years) |
| MARSSx86 | 400 KIPS [2] | 60 days (2 months) | 160 days (5.3 months) |
| MIDAS | 18 MIPS (40 MHz) | 1.4 days | 3 days |

[1] http://gem5-users.gem5.narkive.com/hCJL6O2Q/gem5-simulation-speed
[2] Patel et al. MARSSx86: A full system simulator for x86 CPUs, DAC 2011.
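The Average column follows from dividing an instruction count by a simulation rate. The sketch below redoes that arithmetic using the 2.08 trillion average instruction count quoted earlier in the deck and the speeds from the table; it covers only the Average column, and the helper name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def simulation_days(instructions, instructions_per_second):
    """Wall-clock days needed to simulate the given number of instructions."""
    return instructions / instructions_per_second / SECONDS_PER_DAY


AVG_INSTRUCTIONS = 2.08e12  # average SPEC2006int instruction count from the deck

SPEEDS = {
    "GEM5+Ruby": 100e3,  # 100 KIPS
    "MARSSx86": 400e3,   # 400 KIPS
    "MIDAS": 18e6,       # 18 MIPS
}

avg_days = {name: simulation_days(AVG_INSTRUCTIONS, ips)
            for name, ips in SPEEDS.items()}
for name, days in avg_days.items():
    print(f"{name}: {days:.1f} days")
```

The results land within a day of the 240-day and 60-day figures, and at roughly 1.3 days for MIDAS.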
Case Study: IPC
[Figure: IPC per benchmark (y-axis 0.0 to 1.2, one off-scale bar labeled 1.41) for Rocket 16KiB L1, Rocket 32KiB L1, BOOM-2w 16KiB L1, BOOM-2w 32KiB L1, and Cortex A9]
§ Rocket is comparable to Cortex A9
§ BOOM outperforms Cortex A9
§ Performance improvement: 16 KiB L1$ → 32 KiB L1$
Case Study: MPKIs of BOOM-2w
[Figure: MPKI per benchmark (y-axis 0 to 60) for Conditional Branch, Indirect Branch, L1 I-Cache, ITLB, L1 D-Cache, DTLB, L2 Cache]
§ 40 BTB entries, 32 KiB L1$, 1 MiB L2$, TLB reach = 128 KiB
§ 471.omnetpp, 483.xalancbmk: Large indirect branch MPKIs → Bigger BTBs
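MPKI normalizes an event count by the number of retired instructions, which makes benchmarks with very different lengths comparable. A minimal helper follows; the counter values in the example are hypothetical and not measurements from the talk.

```python
def mpki(miss_events, instructions):
    """Misses (or mispredictions) per thousand retired instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return 1000.0 * miss_events / instructions


# Hypothetical counters: 50,000 indirect-branch mispredictions
# over 10 million retired instructions.
print(mpki(50_000, 10_000_000))
```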
Case Study: Issue Queue Utilization
[Figure: Issue Slots / Issue Width (%) per benchmark (y-axis 0 to 100), split into Issued Slots, Empty Slots, Non-issued Slots]
§ Issued Slots / Cycle = IPC
§ Empty Slots: frontend hazards, lack of resources (403.gcc)
§ Non-issued slots: backend hazards
Strober Power/Energy Modeling
[Figure: Strober flow; labels: Full Program Execution In FPGA, RTL State Snapshots, I/O Traces, RTL Simulation, Post-synthesis Designs, RTL Signal Activities, PrimeTime PX, Average Power]
§ No state warming is necessary in RTL/gate-level simulation
Case Study: Power Breakdown
[Figure: stacked power (mW, y-axis 0 to 600) for Rocket and BOOM-2w on 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 458.sjeng, 464.h264ref, 471.omnetpp, 473.astar, 483.xalancbmk; components: Misc, Uncore, L1 D-cache control, L1 D-cache meta + data, L1 I-cache, ROB, LSU, FPU, Integer Unit, Issue Logic, Register File, Rename + Control, Branch Predictor, Fetch Unit]
§ Synopsys 32nm educational technology
§ 50 random sample RTL state snapshots
§ Pipeline Utilization ↑ → Power ↑
§ BOOM: 40% power from Rename Logic & Register File
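Drawing a fixed number of uniformly random snapshots from a run whose length is not known in advance is a textbook use of reservoir sampling. The slide does not say how the 50 snapshots are chosen, so the sketch below shows that standard technique (Algorithm R) as one plausible approach, not as the MIDAS implementation; the stream of cycle numbers stands in for snapshot candidates.

```python
import random


def reservoir_sample(stream, k, rng):
    """Keep k items chosen uniformly at random from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # uniform over [0, i], inclusive
            if j < k:
                reservoir[j] = item     # replace a slot with probability k/(i+1)
    return reservoir


# Hypothetical run of 100,000 candidate snapshot points, keeping 50.
snapshots = reservoir_sample(range(100_000), 50, random.Random(0))
print(sorted(snapshots)[:5])
```

Only the kept snapshots need to be replayed afterwards, so the cost of the slow power-analysis step is bounded by the sample count rather than by the program length.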
Case Study: Energy Per Instruction
§ BOOM-2w is performant, Rocket is more energy efficient
§ 429.mcf, 471.omnetpp: low power, less energy efficient
§ Pipeline Utilization ↑ → Energy Efficiency ↑
[Figure: EPI (pJ / Inst) per benchmark (y-axis 0 to 1000) for Rocket 32 KiB L1 and BOOM-2w 32 KiB L1; two off-scale bars are labeled 1923.4 and 1027.0]
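Energy per instruction is average power divided by instruction throughput, so a benchmark can draw little power and still be energy-inefficient if its IPC is low. With power in mW and clock frequency in GHz the units reduce directly to pJ per instruction. The sketch below shows the arithmetic; the clock frequency and the example numbers are hypothetical, since the slides do not state them.

```python
def energy_per_instruction_pj(avg_power_mw, ipc, clock_ghz):
    """EPI in pJ/inst: mW / (inst/cycle * Gcycle/s) = 1e-3 J / 1e9 inst = pJ/inst."""
    if ipc <= 0 or clock_ghz <= 0:
        raise ValueError("ipc and clock_ghz must be positive")
    return avg_power_mw / (ipc * clock_ghz)


# Hypothetical numbers at an assumed 1 GHz clock:
busy = energy_per_instruction_pj(avg_power_mw=200.0, ipc=0.5, clock_ghz=1.0)
stalled = energy_per_instruction_pj(avg_power_mw=100.0, ipc=0.1, clock_ghz=1.0)
print(busy, stalled)
```

In this made-up example the stalled run draws half the power but spends more energy per instruction, which is the pattern the slide reports for 429.mcf and 471.omnetpp.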
On-going MIDAS Research at UCB
§ Debugging, validation for RTL designs
  - Assertion detection
  - Commit log comparison with Spike
§ Memory system timing modeling
  - Realistic DRAM timing models in FPGA
§ FireSim: datacenter simulation (fires.im)
  - Connect Rocket Chips with NICs & switch timing models
  - Amazon blog post, 7th RISC-V workshop
MIDAS (Strober) Is Open-Source Now
[Figure label: Your Processors / Accelerators]
§ Examples: https://github.com/ucb-bar/midas-examples
§ RocketChip template: https://github.com/ucb-bar/midas-top-release
§ Will support various host platforms (Xilinx Zynq, Amazon EC2 F1, Intel Xeon + In-package FPGA)
https://github.com/ucb-bar/midas-release
strober.org
Acknowledgements
§ Funding:
  - DARPA Award Number HR0011-12-2-0016
  - The Center for Future Architecture Research, a member of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA
  - ASPIRE Lab industrial sponsors and affiliates: Intel, Google, HPE, Huawei, LGE, Nokia, NVIDIA, Oracle, and Samsung
  - Kwanjeong Scholarship