SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin &...
-
Upload
amani-newbury -
Category
Documents
-
view
218 -
download
2
Transcript of SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin &...
![Page 1: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/1.jpg)
SimpleScalar v3.0 Tutorial
U. of Wisconsin, CS752, Fall 2004
Andrey Litvin
(main source: Austin & Burger)
(also Dana Vantrease’ slides)
![Page 2: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/2.jpg)
Simulator Basics
• What is an architectural simulator?– Tool that reproduces the behavior of a computing
device
• Why use a simulator?– Flexible
• Rapid design space exploration• Tailor abstraction level to need
– Cheap
• Why not use a simulator?– Slow– Correctness?
![Page 3: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/3.jpg)
Functional vs. Performance
• Functional simulators implement the architecture
- Perform the actual execution
- Implement what programmers see
• Performance (or timing) simulators implement the microarchitecture.
- Model system resources/internals
- Measure time
- Implement what programmers do not see
![Page 4: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/4.jpg)
![Page 5: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/5.jpg)
Trace vs. Execution Driven (2)
• Trace+ Easy to Implement- Requires large disk files to store instruction stream- Limited state stored- Speculation? Multiprocessor?
• Execution- Hard to Implement+ Allows access to full state information at any point during
program execution+/- Execution requires inclusion of instruction setemulator and an I/O emulation module
![Page 6: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/6.jpg)
Simplescalar
• Developed by Todd Austin with Doug Burger at UW, ’94-’96
• Execution – driven• Collection of simulators that emulate the
microprocessor at different detail levels (functional, functional + cache / bpred, out-of-order cycle-timer, etc.)
• Tools: – C/Fortran compiler, assembler, linker (for PISA)– DLite: target-machine-level debugger – pipeline trace viewer
![Page 7: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/7.jpg)
Advantages of SimpleScalar
• Fast (given priority over clarity)– 4 MIPS functional, 300 KIPS OOO on 1.5 GHz
host
• Relatively(!) simple - short learning curve• Modular design• Well documented and commented• Popular - support community, extensions• Limitations will be summarized at the end
![Page 8: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/8.jpg)
![Page 9: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/9.jpg)
![Page 10: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/10.jpg)
Sim-Fast
• Bare functional simulator
• Does not account for the behavior of any part of the microarchitecture
• Optimized for speed
![Page 11: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/11.jpg)
Sim-Safe
• Similar to sim-fast (slower)
• Implements some memory op safeguards– Memory alignment– Memory access permission
• Good for debugging sim-fast crashes
![Page 12: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/12.jpg)
Sim-EIO
• External trace/checkpoint generator
• Functional simulator like sim-fast and sim-safe
• Implements checks like sim-safe
![Page 13: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/13.jpg)
Sim-Profile
• Profiles by symbol and address
• Keeps track of and reports (many options)– Dynamic instruction counts– Instruction class counts– Usage of address modes– Profiles of the text & data segment
![Page 14: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/14.jpg)
Sim-Cache/Sim-Bpred
• Functional core drives the detailed model of the cache/branch predictor
• Similar to trace-driven cache/bpred simulation• Fast results for miss/misprediction rates• No timing simulation/performance impact
![Page 15: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/15.jpg)
Cache Implementation
• block size, # of sets, associativity, all customizable
• Replacement policies: random, FIFO, LRU
• 2-level cache hierarchy supported (easily extended)
• Unified/separate I/D L1 and L2 caches
![Page 16: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/16.jpg)
Implemented Predictors
• Specifying branch predictor type-bpred <type>
• Implemented predictors– nottaken always predicts not taken– taken always taken– perfect always right– bimod BTB with 2-bit counters– 2lev 2-level adaptive branch predictor
• Specify level1 size, level 2 size, history size(# bits), XOR PC and history?
• GAg, GAp, PAg, PAp, gshare
– comb combining (meta) predictor
![Page 17: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/17.jpg)
Sim-Outorder
• Detailed performance simulator• Out-of-order execution core• Register renaming, reorder buffer, speculative
execution• 2-level cache hierarchy• Branch prediction
![Page 18: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/18.jpg)
![Page 19: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/19.jpg)
![Page 20: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/20.jpg)
Making a Fast Timing Simulator
• Hardware advantage: parallelism
• Software emulator advantage: free space for auxillary structures
• Functional and detailed performance parts can be decoupled
![Page 21: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/21.jpg)
Simulator RUU vs. Logical RUU
• No consumer -> producer links (tags)
• Rather, producer->consumer links for efficient result broadcast at writeback
• Values not tracked (except address)
• “Completed” bit (ROB and IQ combined)
• Same struct used for LSQ
![Page 22: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/22.jpg)
![Page 23: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/23.jpg)
Other Simulator Structures
• ready_queue– Ready to issue instructions– Used at issue stage– Built at writeback– Limited issue bandwidth– Policy: mul/div, load/store, branch first, then
oldest first
![Page 24: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/24.jpg)
Other Simulator Structures
• fu_pool – functional units– Issue latency vs. operational latency
– Both constant, hard-coded in sim-outorder
– Read port latency more detailed (cache simulation)
– Issue latency – busy counter at FU
– Operational latency – event queue
• event_queue– ordered by cycle
– schedules writeback events
![Page 25: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/25.jpg)
Other Simulator Structures
• create_vector– Register renaming– Maps logical register to RUU entry
(architectural register up to date if entry is null)– Updated at dispatch and writeback– Similar to Qi in Tomasulo– Backed up during dispatch if sim “divines”
misspeculation
![Page 26: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/26.jpg)
![Page 27: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/27.jpg)
![Page 28: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/28.jpg)
Stage Implementation(greater detail in SS hack guide and v2.0
tutorial)• Fetch (at ruu_fetch())
– Fetch ins-s up to cache line boundary– Block on miss – Put in Fetch Queue (FQ)– Probe branch predictor for next cache line to
fetch from
![Page 29: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/29.jpg)
Stage Implementation
• Decode (at ruu_decode())– Fetch from FQ
– Decode
– Rename registers
– Put into RUU and LSQ
– Update RUU dependency lists, renaming table
– Sim (functional core): • Execute instruction, update machine state
• Detect branch misprediction, backup state (checkpoint)
![Page 30: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/30.jpg)
Stage Implementation
• Issue (at ruu_issue() and lsq_refresh())– Ready queue -> Event queue– Order based on policy (see ready_queue slide)– Check and reserve FU’s– Mem. Ops check memory dependences
• No load speculation (“maybe” dependence respected)
![Page 31: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/31.jpg)
Stage Implementation
• Writeback (at ruu_writeback())– Get finished instructions from ready queue– Wake up (put in ready queue) instructions with
ready operands (use dependence list)– Performance core detects misprediction here
and rolls back the state
![Page 32: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/32.jpg)
Stage Implementation
• Commit (at ruu_commit())– Service D-TLB misses– Update register file (logically) and rename table– Retire stores to D-cache– Reclaim RUU/LSQ entries of retirees
![Page 33: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/33.jpg)
Limitations
• I/O and other system calls– Only some limited functional simulation
• Lacks support for arbitrary speculation– Only branches can cause rollback– No speculative loads– Harder problem: Decoupling of functional and
timing cores (for performance) complicates data speculation extensions
![Page 34: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/34.jpg)
Limitations of Memory System
• No causality in memory system– All events are calculated at time of first access
• Simple TLB and virtual memory model– No address translation– TLB miss is a (small) fixed latency parallel with cache access
• Bandwidth, non-blocking– Modeled as n FU’s (read/write ports with fixed issue
delay) • Accurate if memory system is lightly utilized
– SMT extensions?
• Overhaul required for multiprocessor simulation
![Page 35: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/35.jpg)
Extensions
www.simplescalar.com
• trace cache
• value prediction
• SMT
• multiprocessor
• more target ISA / host OS support
![Page 36: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/36.jpg)
Miscellaneous
• Enter “sim-outorder” with no options to get options list and usage examples
• Options database (options.h)– Interface to register an option and check if
entered at initialization
• Stats database (stats.h)– Register new stats– Define secondary stats and track distributions
![Page 37: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/37.jpg)
Miscellaneous
![Page 38: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/38.jpg)
![Page 39: SimpleScalar v3.0 Tutorial U. of Wisconsin, CS752, Fall 2004 Andrey Litvin (main source: Austin & Burger) (also Dana Vantrease’ slides)](https://reader035.fdocuments.net/reader035/viewer/2022062421/56649c7b5503460f9492ec10/html5/thumbnails/39.jpg)