February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial...
-
Upload
warren-melton -
Category
Documents
-
view
217 -
download
0
description
Transcript of February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial...
![Page 1: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/1.jpg)
February 20, 2009 http://csg.csail.mit.edu/6.375 L08-1
Asynchronous Pipelines: Concurrency Issues
Arvind Computer Science & Artificial Intelligence LabMassachusetts Institute of Technology
![Page 2: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/2.jpg)
February 20, 2009 L08-2http://csg.csail.mit.edu/6.375
Synchronous vs Asynchronous Pipelines
In a synchronous pipeline: typically only one rule; the designer controls
precisely which activities go on in parallel downside: The rule can get too complicated
-- easy to make a mistake; difficult to make changes
In an asynchronous pipeline: several smaller rules, each easy to write,
easier to make changes downside: sometimes rules do not fire
concurrently when they should
![Page 3: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/3.jpg)
February 20, 2009 L08-3http://csg.csail.mit.edu/6.375
Two-stage Asynchronous Pipeline
rule fetch_and_decode (!stallFunc(instr, bu)); bu.enq(newIt(instr,rf)); pc <= predIa;endrule
rule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin
pc <= av; bu.clear(); end else bu.deq(); tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); bu.deq(); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); end endcase endrule
fetch & decode execute
pc rfCPU
bu
Can these rules fire concurrently ?Does it matter?
![Page 4: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/4.jpg)
February 20, 2009 L08-4http://csg.csail.mit.edu/6.375
The tensionIf the two rules never fire in the same cycle then the machine can hardly be called a pipelined machineIf both rules fire in parallel every cycle when they are enabled, then wrong results would be produced
![Page 5: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/5.jpg)
February 20, 2009 L08-5http://csg.csail.mit.edu/6.375
The compiler issueCan the compiler detect all the conflicting conditions?
Important for correctnessDoes the compiler detect conflicts that do not exist in reality?
False positives lower the performance The main reason is that sometimes the compiler
cannot detect under what conditions the two rules are mutually exclusive or conflict free
What can the user specify easily? Rule priorities to resolve nondeterministic choice
yes
yes
In many situations the correctness of the design is not enough; the design is not done unless the performance goals are met
![Page 6: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/6.jpg)
February 20, 2009 L08-6http://csg.csail.mit.edu/6.375
some insight intoConcurrent rule firing
Rules
HW
Ri Rj Rk
clocks
rule
steps
Ri
RjRk
• There are more intermediate states in the rule semantics (a state after each rule step)
• In the HW, states change only at clock edges
![Page 7: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/7.jpg)
February 20, 2009 L08-7http://csg.csail.mit.edu/6.375
Parallel executionreorders reads and writesRules
HW clocks
rule
steps
• In the rule semantics, each rule sees (reads) the effects (writes) of previous rules
• In the HW, rules only see the effects from previous clocks, and only affect subsequent clocks
reads writes reads writes reads writesreads writesreads writes
reads writes reads writes
![Page 8: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/8.jpg)
February 20, 2009 L08-8http://csg.csail.mit.edu/6.375
Correctness
Rules
HW
Ri Rj Rk
clocks
rule
steps
Ri
RjRk
• Rules are allowed to fire in parallel only if the net state change is equivalent to sequential rule execution
• Consequence: the HW can never reach a state unexpected in the rule semantics
![Page 9: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/9.jpg)
February 20, 2009 L08-9http://csg.csail.mit.edu/6.375
Executing Multiple Rules Per Cycle: Conflict-free rules
Parallel execution behaves like ra < rb or equivalently rb < ra
rule ra (z > 10); x <= x + 1;
endrule
rule rb (z > 20); y <= y + 2;
endrule
Rulea and Ruleb are conflict-free ifs . a(s) b(s) 1. a(b(s)) b(a(s)) 2. a(b(s)) == b(a(s))
![Page 10: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/10.jpg)
February 20, 2009 L08-10http://csg.csail.mit.edu/6.375
Mutually Exclusive RulesRulea and Ruleb are mutually exclusive if they can never be enabled simultaneously
s . a(s) ~ b(s)
Mutually-exclusive rules are Conflict-free by definition
![Page 11: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/11.jpg)
February 20, 2009 L08-11http://csg.csail.mit.edu/6.375
Executing Multiple Rules Per Cycle: Sequentially Composable rulesrule ra (z > 10);
x <= y + 1; endrule
rule rb (z > 20); y <= y + 2;
endrule
Parallel execution behaves like ra < rb
Rulea and Ruleb are sequentially composable ifs . a(s) b(s) 1. b(a(s))
2. PrjR(Rb)(b(s)) == PrjR(Rb)(b(a(s)))
- R(Rb) is the range of rule Rb- Prjst is the projection selecting st from the total state
![Page 12: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/12.jpg)
February 20, 2009 L08-12http://csg.csail.mit.edu/6.375
Compiler determines if two rules can be executed in parallel
Rulea and Ruleb are sequentially composable ifs . a(s) b(s)
1. b(a(s)) 2. PrjR(Rb)(b(s)) == PrjR(Rb)(b(a(s)))
Rulea and Ruleb are conflict-free ifs . a(s) b(s) 1. a(b(s)) b(a(s))2. a(b(s)) == b(a(s))
These properties can be determined by examining the domains and ranges of the rules in a pairwise manner.
Parallel execution of CF and SC rules does not increase the critical path delay
D(Ra) R(Rb) = D(Rb) R(Ra) = R(Ra) R(Rb) =
D(Rb) R(Ra) = These conditions are sufficient but not necessary
![Page 13: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/13.jpg)
February 20, 2009 L08-13http://csg.csail.mit.edu/6.375
Muxing structureMuxing logic requires determining for each register (action method) the rules that update it and under what conditions
Conflict Free/Mutually Exclusive)and
and or
1122
Sequentially Composableand
and or
11 and ~222
If two CF rules update the same element then they must be mutually exclusive (1 ~2)
![Page 14: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/14.jpg)
February 20, 2009 L08-14http://csg.csail.mit.edu/6.375
Scheduling and control logicModules
(Current state) Rules
Scheduler
1
n
1
n
Muxing
1
nn
n
Modules(Next state)
cond
action
“CAN_FIRE” “WILL_FIRE”
i j Ri and Rj are conflict-free or sequentially composable
![Page 15: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/15.jpg)
February 20, 2009 L08-15http://csg.csail.mit.edu/6.375
Concurrency analysis
Two-stage Pipelinerule fetch_and_decode (!stallfunc(instr, bu)); bu.enq(newIt(instr,rf)); pc <= predIa;endrule
rule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin rf.upd(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin
pc <= av; bu.clear(); end else bu.deq(); tagged ELoad{dst:.rd,addr:.av}: begin rf.upd(rd, dMem.read(av)); bu.deq(); end tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); end endcase endrule
fetch & decode execute
pc rfCPU
bu
conflicts around: pc, bu, rf
Let us split this rule for the sake of analysis
![Page 16: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/16.jpg)
February 20, 2009 L08-16http://csg.csail.mit.edu/6.375
Concurrency analysis
Add Rulerule fetch_and_decode (!stallfunc(instr, bu)); bu.enq(newIt(instr,rf)); pc <= predIa;endrule
fetch & decode execute
pc rfCPU
bu
rule execAdd (it matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}); rf.upd(rd, va+vb); bu.deq();endrule
rf: subbu: find, enqpc: read,write
execAdd rf: updbu: first, deq
fetch < execAdd rf: sub < upd bu: {find, enq} < {first , deq}
execAdd < fetch rf: sub > upd bu: {find, enq} > {first , deq}
Do either of these concurrency properties hold ?
![Page 17: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/17.jpg)
February 20, 2009 L08-17http://csg.csail.mit.edu/6.375
Register File concurrency properties
Normal Register File implementation guarantees:
rf.sub < rf.upd that is, reads happen before writes in concurrent
executionBut concurrent rf.sub(r1) and rf.upd(r2,v) where r1 ≠ r2 behaves like both
rf.sub(r1) < rf.upd(r2,v) rf.sub(r1) > rf.upd(r2,v)
To guarantee rf.upd < rf.sub Either bypass the input value to output when register
names match Or make sure that on concurrent calls rf.upd and
rf.sub do not operate on the same registerTrue for our rules because of stalls but it is too difficult for the compiler to detect
![Page 18: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/18.jpg)
February 20, 2009 L08-18http://csg.csail.mit.edu/6.375
Bypass Register Filemodule mkBypassRFFull(RegFile#(RName,Value));
RegFile#(RName,Value) rf <- mkRegFileFull(); RWire#(Tuple2#(RName,Value)) rw <- mkRWire();
method Action upd (RName r, Value d); rf.upd(r,d); rw.wset(tuple2(r,d)); endmethod
method Value sub(RName r); case rw.wget() matches tagged Valid {.wr,.d}:
return (wr==r) ? d : rf.sub(r); tagged Invalid: return rf.sub(r); endcase endmethodendmodule
Will work only if the compiler lets us ignore conflicts on the rf made by mkRegFileFull
![Page 19: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/19.jpg)
February 20, 2009 L08-19http://csg.csail.mit.edu/6.375
Unsafe modulesBluespec allows you to import Verilog modules by identifying wires that correspond to methodsSuch modules can be made safe either by asserting the correct scheduling properties of the methods or by wrapping the unsafe modules in appropriate Bluespec code
![Page 20: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/20.jpg)
February 20, 2009 L08-20http://csg.csail.mit.edu/6.375
FIFOsOrdinary one element FIFO deq & enq conflict – won’t doLoopy FIFO deq < enqBypass FIFO enq < deq
What about first, clear, find?
![Page 21: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/21.jpg)
February 20, 2009 L08-21http://csg.csail.mit.edu/6.375
module mkLFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if
(!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); endmethod method t first() if (full); return (data); endmethod method Action clear(); full <= False; endmethodendmodule
Concurrency analysis
One Element “Loopy” FIFO
not empty
not full rdyenab
rdyenab
enq
deq
FIFO
mod
ule
or!full
bu.first < bu.enqbu.deq < bu.enq
Let us extend it with find
bu.enq < bu.clearbu.deq < bu.clear
![Page 22: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/22.jpg)
February 20, 2009 L08-22http://csg.csail.mit.edu/6.375
module mkSFIFO1#(function Bool findf(tr r, t x)) (SFIFO#(t,tr)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); endmethod method t first() if (full); return (data); endmethod method Action clear(); full <= False; endmethod method Bool find(tr r); return (findf(r, data) && full); endmethod endmodule
One Element Searchable FIFO
bu.first < bu.enqbu.deq < bu.enq
(full && !deqp));
bu.find < bu.enqbu.deq < bu.find
bu.enq < bu.clearbu.deq < bu.clear
![Page 23: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/23.jpg)
February 20, 2009 L08-23http://csg.csail.mit.edu/6.375
What concurrencydo we want?
If fetch and execAdd happened in the same cycle and the meaning was:
fetch < execAdd instructions will fly through the FIFO (No pipelining!) rf and bu modules will need the properties;
rf: sub < updbu: {find, enq} < {first , deq}
execAdd < fetch execAdd will make space for the fetched instructions
(i.e., how pipelining is supposed to work) rf and bu modules will need the properties;
rf: upd < sub bu: {first , deq} < {find, enq}
fetch & decode execute
pc rfCPU
buSuppose bu is empty initially
Now we will focus only on the pipeline case
Ordinary RF
Bypass RF
Bypass FIFO
Loopy FIFO
![Page 24: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/24.jpg)
February 20, 2009 L08-24http://csg.csail.mit.edu/6.375
Concurrency analysis
Branch Rulesrule fetch_and_decode (!stallfunc(instr, bu)); bu.enq(newIt(instr,rf)); pc <= predIa;endrule
fetch & decode execute
pc rfCPU
bu
rule execBzTaken(it matches tagged Bz {cond:.cv,addr:.av} &&& (cv == 0)); pc <= av; bu.clear(); endrule
rule execBzNotTaken(it matches tagged Bz {cond:.cv,addr:.av} &&& !(cv == 0)); bu.deq(); endrule
execBzTaken < fetch ? Should be treated as conflict – give priority to execBzTaken
execBzNotTaken < fetch ?bu: {first , deq} < {find, enq}
Loopy FIFO
![Page 25: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/25.jpg)
February 20, 2009 L08-25http://csg.csail.mit.edu/6.375
Concurrency analysis
Load-Store Rulesrule fetch_and_decode (!stallfunc(instr, bu)); bu.enq(newIt(instr,rf)); pc <= predIa;endrule
fetch & decode execute
pc rfCPU
bu
rule execStore(it matches tagged EStore{value:.vv,addr:.av}); dMem.write(av, vv); bu.deq();endrule
rule execLoad(it matches tagged ELoad{dst:.rd,addr:.av}); rf.upd(rd, dMem.read(av)); bu.deq(); endrule
execLoad < fetch ? Same as execAdd, i.e.,
rf: upd < subbu: {first , deq} < {find, enq}
execStore < fetch ?bu: {first , deq} < {find, enq}
Bypass RFLoopy FIFO
Loopy FIFO
![Page 26: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/26.jpg)
February 20, 2009 L08-26http://csg.csail.mit.edu/6.375
Properties Required of Register File and FIFO for Instruction Pipelining
Register File: rf.upd(r1, v) < rf.sub(r2) Bypass RF
FIFO bu: {first , deq} < {find, enq}
bu.first < bu.find bu.first < bu.enq bu.deq < bu.find bu.deq < bu.enq
Loopy FIFO
![Page 27: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/27.jpg)
February 20, 2009 L08-27http://csg.csail.mit.edu/6.375
Concurrency analysis
Two-stage Pipelinerule fetch_and_decode (!stallfunc(instr, bu)); bu.enq(newIt(instr,rf)); pc <= predIa;endrule
rule execAdd (it matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}); rf.upd(rd, va+vb); bu.deq(); endrule
rule execBz(it matches tagged Bz {cond:.cv,addr:.av}); if (cv == 0) then begin pc <= av; bu.clear(); end else bu.deq(); endrule
rule execLoad(it matches tagged ELoad{dst:.rd,addr:.av}); rf.upd(rd, dMem.read(av)); bu.deq(); endrule
rule execStore(it matches tagged EStore{value:.vv,addr:.av}); dMem.write(av, vv); bu.deq(); endrule
fetch & decode execute
pc rfCPU
bu
It all works
![Page 28: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/28.jpg)
February 20, 2009 http://csg.csail.mit.edu/6.375 L08-28
Lot of nontrivial analysis but no change in processor code!Needed Fifos and Register files with the appropriate concurrency properties
![Page 29: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/29.jpg)
February 20, 2009 L08-29http://csg.csail.mit.edu/6.375
BypassingAfter decoding the newIt function must read the new register values if available (i.e., the values that are still to be committed in the register file)
Will happen automatically if we use bypassRF
The instruction fetch must not stall if the new value of the register to be read exists
The old stall function is correct but unable to take advantage of bypassing and stalls unnecessarily
![Page 30: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/30.jpg)
February 20, 2009 L08-30http://csg.csail.mit.edu/6.375
The stall function for the synchronous pipeline
function Bool newStallFunc (Instr instr, Reg#(Maybe#(InstTemplate)) buReg);
case (buReg) matches tagged Invalid: return False; tagged Valid .it: case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}: return (findf(ra,it) || findf(rb,it)); …
Previously we stalled when ra matched the destination register of the instruction in the execute stage. Now we bypass that information when we read, so no stall is necessary.
return (false);
![Page 31: February 20, 2009 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science Artificial Intelligence Lab Massachusetts.](https://reader030.fdocuments.net/reader030/viewer/2022011801/5a4d1bb07f8b9ab0599cc72c/html5/thumbnails/31.jpg)
February 20, 2009 L08-31http://csg.csail.mit.edu/6.375
The stall function for the asynchronous pipeline
function Bool newStallFunc (Instr instr, SFIFO#(InstTemplate, RName) bu);
case (instr) matches tagged Add {dst:.rd,src1:.ra,src2:.rb}:
return (bu.find(ra) || bu.find(rb)); tagged Bz {cond:.rc,addr:.addr}:
return (bu.find(rc) || bu.find(addr)); …
bu.find in our loopy-searchable FIFO happens after deq. This means that if bu can hold at most one instruction like in the synchronous case, we do not have to stall. Otherwise, we will still need to check for hazards and stall.
No change in the stall function