An Instruction Set and Microarchitecture for Instruction Level Distributed Processing
An Instruction Set and Microarchitecture for Instruction Level Distributed Processing
(Ho-Seop Kim and James E. Smith)
Haiying Qu, Electrical and Computer Engineering, University of Alberta
Introduction 1
ILP: Instruction Level Parallelism
Achieved significant performance gains
ILDP: Instruction Level Distributed Processing
Technology trend
Introduction 2
Proposed microarchitecture
Short pipelines
Distributed processing elements: in-order instruction processing within each element enables out-of-order execution overall
Strand: a chain of dependent instructions
Accumulator: inter-instruction communication
Instruction Set
64 General Purpose Registers: R0-R63 (source or destination)
8 Accumulators: A0-A7
Dead accumulator
Load/store Instruction
One accumulator value
One GPR
One parcel
Ai <- mem(Aj)
Ai <- mem(Rj)
mem(Ai) <- Rj
mem(Rj) <- Ai
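The four load/store forms above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `State` class, the function names, and the dictionary-backed memory (unwritten locations read as 0) are hypothetical and not taken from the paper. As the forms suggest, the address comes directly from a register, with no address addition.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical architected state: 64 GPRs, 8 accumulators, sparse memory."""
    gpr: list = field(default_factory=lambda: [0] * 64)
    acc: list = field(default_factory=lambda: [0] * 8)
    mem: dict = field(default_factory=dict)


def load_via_acc(s, i, j):
    """Ai <- mem(Aj): the address is held in accumulator j."""
    s.acc[i] = s.mem.get(s.acc[j], 0)


def load_via_gpr(s, i, j):
    """Ai <- mem(Rj): the address is held in GPR j."""
    s.acc[i] = s.mem.get(s.gpr[j], 0)


def store_gpr(s, i, j):
    """mem(Ai) <- Rj: store GPR j at the address held in accumulator i."""
    s.mem[s.acc[i]] = s.gpr[j]


def store_acc(s, j, i):
    """mem(Rj) <- Ai: store accumulator i at the address held in GPR j."""
    s.mem[s.gpr[j]] = s.acc[i]


# Usage: write A0 to the address in R1, then read it back into A2.
s = State()
s.gpr[1] = 0x100
s.acc[0] = 42
store_acc(s, 1, 0)     # mem(R1) <- A0
load_via_gpr(s, 2, 1)  # A2 <- mem(R1)
```

Every form names at most one GPR, which matches the "one GPR, one parcel" constraint on the slide.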
Register Instruction
Operation: accumulator and GPR/immediate
Result: accumulator or GPR
Ai <- Ai op Rj
Ai <- Ai op immed
Ai <- Rj op immed
Rj <- Ai
Rj <- Ai op immed
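A rough Python sketch of the five register-instruction forms, assuming list-backed register files and passing `op` as a Python callable. All function names here are hypothetical, chosen only for illustration.

```python
import operator

# Hypothetical register files: 64 GPRs and 8 accumulators.
gpr = [0] * 64
acc = [0] * 8


def acc_op_gpr(i, op, j):
    """Ai <- Ai op Rj"""
    acc[i] = op(acc[i], gpr[j])


def acc_op_imm(i, op, imm):
    """Ai <- Ai op immed"""
    acc[i] = op(acc[i], imm)


def gpr_op_imm_to_acc(i, j, op, imm):
    """Ai <- Rj op immed"""
    acc[i] = op(gpr[j], imm)


def copy_to_gpr(j, i):
    """Rj <- Ai"""
    gpr[j] = acc[i]


def acc_op_imm_to_gpr(j, i, op, imm):
    """Rj <- Ai op immed"""
    gpr[j] = op(acc[i], imm)


# Usage: a three-instruction chain through accumulator A0.
gpr[1], gpr[2] = 10, 5
gpr_op_imm_to_acc(0, 1, operator.add, 4)  # A0 <- R1 + 4
acc_op_gpr(0, operator.add, 2)            # A0 <- A0 + R2
copy_to_gpr(3, 0)                         # R3 <- A0
```

In the usage lines, the intermediate values stay in A0 and only the final result is written to a GPR.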
Branch/jump Instruction
Conditional branch: compare Ai with 0 or a GPR (all usual predicates)
Program counter (P)
Indirect jump: Ai or GPR
Return address: GPR
P <- P + immed; Ai pred Rj
P <- P + immed; Ai pred 0
P <- Ai
P <- Rj
P <- Ai; Rj <- P++
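The branch and jump forms above could be modelled as next-PC functions, as in this sketch. Several details are assumptions made for illustration: the predicate names, the `next_pc` parameter standing for the fall-through address, and the choice to make the branch displacement relative to the branch's own address. None of these is specified on the slide.

```python
import operator

# Hypothetical predicate table standing in for "all usual predicates".
PREDICATES = {
    "eq": operator.eq, "ne": operator.ne,
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}


def cond_branch(pc, next_pc, imm, pred, a_i, other=0):
    """P <- P + immed if (Ai pred other), else fall through.

    `other` is the value of Rj, or 0 for the compare-with-zero form.
    """
    return pc + imm if PREDICATES[pred](a_i, other) else next_pc


def indirect_jump(target):
    """P <- Ai or P <- Rj: the target is the register's value."""
    return target


def jump_and_link(next_pc, a_i):
    """P <- Ai; Rj <- P++. Returns (new P, return address to store in Rj)."""
    return a_i, next_pc


# Usage: a taken compare-with-zero branch and a call-style jump.
taken = cond_branch(pc=100, next_pc=102, imm=20, pred="lt", a_i=-1)
new_pc, ret_addr = jump_and_link(next_pc=102, a_i=400)
```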
Example Code
Strand
Figure 3. Types of values and associated registers
Strand ends
Two strands intersect: copy one to a GPR
Output is a static global register
New strand
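To make the strand start and end annotations concrete, here is a small Python sketch that splits an instruction sequence into strands. It assumes, for illustration, that a strand begins at an instruction that writes an accumulator without reading it, and continues while later instructions read the same accumulator. The tuple layout and the function name are hypothetical.

```python
def split_into_strands(instrs):
    """Group instructions into strands by accumulator.

    instrs: list of (name, acc, reads_acc) tuples, where `acc` is the
    accumulator number the instruction uses and `reads_acc` says whether
    it consumes the accumulator's previous value.
    Returns a list of strands (lists of names) in order of strand start.
    """
    strands = []
    open_strand = {}  # accumulator number -> index of its live strand
    for name, acc, reads_acc in instrs:
        # Assumption: not reading the accumulator means a new strand starts,
        # which also ends any older strand that used the same accumulator.
        if not reads_acc or acc not in open_strand:
            strands.append([])
            open_strand[acc] = len(strands) - 1
        strands[open_strand[acc]].append(name)
    return strands


# Usage: two strands on A0 (the second re-uses the accumulator) and one on A1.
program = [
    ("ld",  0, False), ("add", 0, True),
    ("ld2", 1, False), ("sub", 1, True),
    ("mov", 0, False), ("st",  0, True),
]
strands = split_into_strands(program)
```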
Figure 4. Issue timing
Stages
Fetch: 4 words, over 4 instructions
Parceling: break into individual instructions
Renaming: GPRs
Steering: into FIFOs according to the accumulators
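The steering stage could be sketched as below. The slide only says that instructions go into FIFOs according to their accumulators. The rest of the policy is an assumption for illustration: a new strand is placed in the currently shortest FIFO, and later instructions using the same accumulator follow it. The `Steerer` class and its parameters are hypothetical.

```python
from collections import deque


class Steerer:
    """Hypothetical steering logic: one FIFO per processing element."""

    def __init__(self, num_fifos=4):
        self.fifos = [deque() for _ in range(num_fifos)]
        self.acc_to_fifo = {}  # accumulator number -> FIFO index

    def steer(self, name, acc, starts_strand=False):
        """Append instruction `name` to the FIFO that serves accumulator `acc`."""
        if starts_strand or acc not in self.acc_to_fifo:
            # Assumed policy: a new strand goes to the shortest FIFO
            # (the lowest index wins ties).
            shortest = min(range(len(self.fifos)),
                           key=lambda k: len(self.fifos[k]))
            self.acc_to_fifo[acc] = shortest
        idx = self.acc_to_fifo[acc]
        self.fifos[idx].append(name)
        return idx


# Usage: two interleaved strands land in two different FIFOs.
st = Steerer(num_fifos=2)
placements = [
    st.steer("i0", acc=0, starts_strand=True),
    st.steer("i1", acc=1, starts_strand=True),
    st.steer("i2", acc=0),
    st.steer("i3", acc=1),
]
```

Each FIFO holds its instructions in program order, while the two FIFOs can drain independently of each other.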
Figure 5 ILDP Processor Block Diagram
Some Concepts
PE: Processing Element
IR: Issue Register (a single reservation station)
ICN: Interconnection Network
Figure 6 Microarchitecture
Table 1 Complexity Comparison
Note: the ILDP figures are based on one PE
Table 2 Benchmark Program Properties
Evaluation 1
Figure 7 Types of register values
Figure 8 Average strand length
Evaluation 2
Figure 9 Strand end
Figure 10 Instruction size
Evaluation 3
Figure 11 Cumulative strand re-use
Figure 12 IPC
Evaluation 4
Figure 13 Global register rename map read/write bandwidth
Table 3 Simulator Configurations
Discussion