Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.
![Page 1: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/1.jpg)
Vector Processors
Prof. Sivarama Dandamudi
School of Computer Science
Carleton University
![Page 2: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/2.jpg)
Carleton University © S. Dandamudi 2
Pipelining
Vector machines exploit pipelining in all their activities
  Computations
  Movement of data from/to memory
Pipelining provides overlapped execution
  Increases throughput
  Hides latency …
![Page 3: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/3.jpg)
Pipelining (cont’d)
Pipeline overlaps execution: 6 versus 18 cycles
![Page 4: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/4.jpg)
Pipelining (cont’d)
One measure of performance:
$$\text{Speedup} = \frac{\text{Non-pipelined execution time}}{\text{Pipelined execution time}}$$
Ideal case: an n-stage pipeline should give a speedup of n
Two factors affect this:
  Pipeline fill
  Pipeline drain
![Page 5: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/5.jpg)
Pipelining (cont’d)
N computations, each takes n * T time
Non-pipelined time = N * n * T
Pipelined time = n * T + (N – 1) * T = (n + N – 1) * T
$$\text{Speedup} = \frac{n N}{n + N - 1} = \frac{1}{1/N + 1/n - 1/(n N)}$$
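The speedup expression is easy to evaluate numerically. The following is a minimal sketch in Python; the function name `pipeline_speedup` and the sample values of n and N are illustrative assumptions, not part of the original slides.

```python
def pipeline_speedup(n_stages, n_computations):
    """Speedup of an n-stage pipeline over non-pipelined execution.

    Non-pipelined time = N * n * T, pipelined time = (n + N - 1) * T,
    so the stage time T cancels out of the ratio.
    """
    if n_stages < 1 or n_computations < 1:
        raise ValueError("n and N must both be at least 1")
    return (n_stages * n_computations) / (n_stages + n_computations - 1)


if __name__ == "__main__":
    # Illustrative values only: three pipeline depths, growing vector lengths.
    for n in (3, 6, 9):
        for big_n in (1, 10, 150):
            print(f"n={n}, N={big_n}: speedup={pipeline_speedup(n, big_n):.2f}")
```

With N = 1 the speedup is exactly 1 (the pipeline only fills and drains), and as N grows the speedup approaches n without reaching it.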
![Page 6: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/6.jpg)
Pipelining (cont’d)
[Figure: speedup versus number of elements, N (0 to 150), with one curve each for n = 3, n = 6, and n = 9; the speedup axis runs from 1.0 to 9.0]
![Page 7: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/7.jpg)
Pipelining (cont’d)
Pipeline depth, n
![Page 8: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/8.jpg)
Vector Machines
Provide high-level operations
  Work on vectors (linear arrays of numbers)
A typical vector operation
  Add two 64-element floating-point vectors
  Equivalent to an entire loop
CRAY format
  V3 V2 VOP V1
![Page 9: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/9.jpg)
Vector Machines (cont’d)
Consists of
  Scalar unit
    Works on scalars
    Address arithmetic
  Vector unit
    Responsible for vector operations
    Several vector functional units
      Integer add, FP add, FP multiply …
![Page 10: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/10.jpg)
Vector Machines (cont’d)
Two types of architecture
  Memory-to-memory architecture
    Vectors are memory resident
    The first vector machines were of this type
    Examples: CDC Star 100, CYBER 205
  Vector-register architecture
    Vectors are stored in registers
    Modern vector machines belong to this type
    Examples: Cray 1/2/X-MP/Y-MP, NEC SX/2, Fujitsu VP200, Hitachi S820
![Page 11: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/11.jpg)
Components
Primary components of a vector-register machine
  Vector registers
    Each register can hold a small vector
    Example: Cray-1 has 8 vector registers
      Each vector register can hold 64 doublewords (64-bit values)
    Two read ports and one write port
      Allows overlap among the vector operations
![Page 12: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/12.jpg)
Cray-1 Architecture
![Page 13: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/13.jpg)
Components
Vector functional units
  Each unit is fully pipelined
  Can start a new operation on every clock cycle
  Cray-1 has six functional units
    FP add, FP multiply, FP reciprocal, Integer add, Logical, Shift
Scalar registers
  Store scalars
  Compute addresses to pass on to the load/store unit
![Page 14: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/14.jpg)
Components
Vector load/store unit
  Moves vectors between memory and vector registers
  Load and store operations are pipelined
  Some processors have more than one load/store unit
    NEC SX/2 has 8 load/store units
Memory
  Designed to allow pipelined access
  Typically uses interleaved memories
    Will discuss later
![Page 15: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/15.jpg)
Some Example Vector Machines

| Machine | Year | # VR | VR size (elements) | # LSUs |
|---|---|---|---|---|
| CRAY-2 | 1985 | 8 | 64 | 1 |
| Cray Y-MP | 1988 | 8 | 64 | 2 loads/1 store |
| Fujitsu VP100 | 1982 | 8-256 | 32-1024 | 2 |
| Hitachi S810 | 1983 | 32 | 256 | 4 |
| NEC SX/2 | 1984 | 8+8192 | 256+var. | 8 |
| Convex C-1 | 1985 | 8 | 128 | 1 |

VR = vector registers; LSU = load/store unit
![Page 16: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/16.jpg)
Some Example Vector Machines (cont’d)
Vector functional units
  Cray X-MP/Y-MP
    8 units
      FP add, FP multiply, FP reciprocal
      Integer add, 2 logical
      Shift
      Population count/parity
![Page 17: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/17.jpg)
Some Example Vector Machines (cont’d)
Vector functional units (cont’d)
  NEC SX/2
    16 units
      4 FP add, 4 FP multiply/divide
      4 Integer add/logical, 4 Shift
![Page 18: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/18.jpg)
Advantages of Vector Machines
Flynn’s bottleneck can be reduced
  Vector instructions significantly improve code density
  A single vector instruction specifies a great deal of work
  Reduce the number of instructions needed to execute a program
Eliminate control overhead of a loop
  A vector instruction represents the entire loop
  Loop overhead can be substantial
![Page 19: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/19.jpg)
Advantages of Vector Machines (cont’d)
Impact of main memory latency can be reduced
  Vector instructions that access memory have a known pattern
  Pipelined access can be used
  Can exploit interleaved memory
  High latency associated with memory can be amortized over the entire vector
    Latency is not associated with each data item
      When accessing a floating-point number
![Page 20: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/20.jpg)
Advantages of Vector Machines (cont’d)
Control hazards can be reduced
  Vector machines organize data operands into regular sequences
    Suitable for pipelined access in hardware
    A vector operation replaces a loop
Data hazards can be eliminated
  Due to structured nature of data
  Allows planned prefetching of data
![Page 21: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/21.jpg)
Example Problem
A typical vector problem
  Y = a * X + Y
  X and Y are vectors
  This problem is known as
    SAXPY (single precision A*X Plus Y)
    DAXPY (double precision A*X Plus Y)
  SAXPY/DAXPY represents a small piece of code that takes most of the time in the benchmark
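For reference, the computation Y = a * X + Y can be written as a plain element-by-element loop. This is a minimal Python sketch; the function name `daxpy` and the sample data are illustrative assumptions. Python floats are double precision, so the sketch corresponds to DAXPY rather than SAXPY.

```python
def daxpy(a, x, y):
    """Compute y[i] = a * x[i] + y[i] for every element, in place."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):
        # One multiply and one add per element, as in the scalar loop.
        y[i] = a * x[i] + y[i]
    return y


if __name__ == "__main__":
    # Illustrative data only.
    x = [1.0, 2.0, 3.0]
    y = [10.0, 20.0, 30.0]
    print(daxpy(2.0, x, y))
```

Every iteration is independent of the others, which is what lets a vector machine express the whole loop as a handful of vector instructions.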
![Page 22: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/22.jpg)
Example Problem (cont’d)
Non-vector code fragment
      LD    F0,a          ;load scalar a
      ADDI  R4,Rx,#512    ;last address to load
loop: LD    F2,0(Rx)      ;F2 := M[0+Rx], i.e., load X[i]
      MULT  F2,F0,F2      ;a*X[i]
![Page 23: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/23.jpg)
Example Problem (cont’d)
      LD    F4,0(Ry)      ;load Y[i]
      ADD   F4,F2,F4      ;a*X[i] + Y[i]
      SD    F4,0(Ry)      ;store into Y[i]
      ADDI  Rx,Rx,#8      ;increment index to X
      ADDI  Ry,Ry,#8      ;increment index to Y
      SUB   R20,R4,Rx     ;R20 := R4-Rx
      JNZ   R20,loop      ;jump if not done
9 instructions in the loop
![Page 24: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/24.jpg)
Example Problem (cont’d)
Vector code fragment
      LD      F0,a        ;load scalar a
      LV      V1,Rx       ;load vector X
      MULTSV  V2,F0,V1    ;V2 := F0 * V1
      LV      V3,Ry       ;load vector Y
      ADDV    V4,V2,V3    ;V4 := V2 + V3
      SV      Ry,V4       ;store the result
Only 6 instructions!
![Page 25: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/25.jpg)
Example Problem (cont’d)
Two main observations
  Execution efficiency
    Vector code
      Executes 6 instructions
    Non-vector code
      Nearly 600 instructions (9 * 64 = 576 in the loop)
      Lots of control overhead
        4 out of 9 instructions!
        Absent in the vector code
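The instruction counts can be checked with a few lines of arithmetic. This is a small Python sketch; the constant and function names are illustrative assumptions. The counts themselves (2 setup instructions, 9 instructions per loop iteration of which 4 are loop control, 6 instructions in the vector version) come from the two code fragments.

```python
SETUP_INSTRUCTIONS = 2      # LD F0,a and ADDI R4,Rx,#512
LOOP_INSTRUCTIONS = 9       # instructions executed per element
CONTROL_INSTRUCTIONS = 4    # two ADDI, SUB, and JNZ in each iteration
VECTOR_INSTRUCTIONS = 6     # the whole vector code fragment


def scalar_instruction_count(n_elements):
    """Dynamic instruction count of the non-vector fragment."""
    return SETUP_INSTRUCTIONS + LOOP_INSTRUCTIONS * n_elements


if __name__ == "__main__":
    n = 64  # one full vector register on the machines discussed here
    scalar = scalar_instruction_count(n)
    print("non-vector instructions:", scalar)
    print("vector instructions:", VECTOR_INSTRUCTIONS)
    print("loop control share:", CONTROL_INSTRUCTIONS / LOOP_INSTRUCTIONS)
```

The vector count stays at 6 only while the vectors fit in one vector register; longer vectors need the strip-mining loop discussed under Vector Length.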
![Page 26: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/26.jpg)
Example Problem (cont’d)
Two main observations
  Frequency of pipeline interlock
    Non-vector code
      Every ADD must wait for MULT
      Every SD must wait for ADD
      Loop unrolling can eliminate this interlock
    Vector code
      Each instruction is independent
      Pipeline stalls once per vector operation
        Not once per vector element
![Page 27: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/27.jpg)
Vector Length
Vector register has a natural vector length
  64 elements in CRAY systems
What if the vector has a different length?
  Three cases
    Vector length < Vector register length
      Use a vector length register to indicate the vector length
    Vector length = Vector register length
    Vector length > Vector register length
![Page 28: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/28.jpg)
Vector Length (cont’d)
Vector length > Vector register length
  Use strip mining
  Vector is partitioned into strips that are less than or equal to the vector register length
  Odd strip: the leftover strip that is shorter than the vector register length
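Strip mining can be sketched as a function that splits a long vector into register-sized pieces. This is a minimal Python sketch under stated assumptions: the function name `strip_mine` is hypothetical, the default maximum vector length of 64 matches the CRAY register length mentioned earlier, and processing the odd strip first is one common choice rather than something the slides specify.

```python
def strip_mine(n, mvl=64):
    """Split a vector of n elements into (start, length) strips.

    Every strip is at most mvl elements long. The odd strip, of length
    n mod mvl, is emitted first (an assumption); the rest are full strips.
    """
    if n < 0 or mvl <= 0:
        raise ValueError("n must be non-negative and mvl positive")
    strips = []
    start = 0
    odd = n % mvl
    if odd:
        strips.append((start, odd))
        start += odd
    while start < n:
        strips.append((start, mvl))
        start += mvl
    return strips


if __name__ == "__main__":
    # Each strip length is what would be written to the vector length register.
    for start, length in strip_mine(200):
        print(f"process elements {start}..{start + length - 1} with VL = {length}")
```

Each strip is then handled by the same vector instructions, with the vector length register set to the strip length.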
![Page 29: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/29.jpg)
Vector Stride
Vector stride
  Distance separating the elements that are to be merged into a single vector
  In elements, not bytes
Multidimensional matrices typically have non-unit stride access patterns
  Example: matrix multiply
![Page 30: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/30.jpg)
Vector Stride (cont’d)
Matrix multiplication
  for (i = 1, 100)
    for (j = 1, 100)
      A[i,j] = 0
      for (k = 1, 100)
        A[i,j] = A[i,j] + B[i,k] * C[k,j]
In the inner loop over k, one of B[i,k] and C[k,j] is accessed with unit stride and the other with non-unit stride
![Page 31: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/31.jpg)
Vector Stride (cont’d)
Access pattern of B and C depends on how the matrix is stored
  Row-major
    Matrix is stored row-by-row
    Used by most languages except FORTRAN
  Column-major
    Matrix is stored column-by-column
    Used by FORTRAN
![Page 32: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/32.jpg)
Vector Stride (cont’d)
11 12 13 14
21 22 23 24
31 32 33 34
41 42 43 44
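The effect of storage order on stride can be illustrated with the 4 x 4 matrix above. This is a minimal Python sketch; the helper names `flatten` and `stride` are assumptions made for illustration. It lays the matrix out in memory both ways and measures the distance, in elements, between consecutive members of the first row and of the first column.

```python
MATRIX = [
    [11, 12, 13, 14],
    [21, 22, 23, 24],
    [31, 32, 33, 34],
    [41, 42, 43, 44],
]


def flatten(matrix, order):
    """Return the matrix as a linear memory image in the given order."""
    rows, cols = len(matrix), len(matrix[0])
    if order == "row-major":
        return [matrix[i][j] for i in range(rows) for j in range(cols)]
    if order == "column-major":
        return [matrix[i][j] for j in range(cols) for i in range(rows)]
    raise ValueError("order must be 'row-major' or 'column-major'")


def stride(elements, memory):
    """Distance in elements between consecutive items as stored in memory."""
    positions = [memory.index(e) for e in elements]
    steps = {b - a for a, b in zip(positions, positions[1:])}
    if len(steps) != 1:
        raise ValueError("elements are not evenly spaced in memory")
    return steps.pop()


if __name__ == "__main__":
    first_row = MATRIX[0]
    first_column = [row[0] for row in MATRIX]
    for order in ("row-major", "column-major"):
        memory = flatten(MATRIX, order)
        print(order,
              "row stride:", stride(first_row, memory),
              "column stride:", stride(first_column, memory))
```

Whichever order is used, walking along one dimension gives unit stride and walking along the other gives a stride equal to the matrix dimension, 4 in this case.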
![Page 33: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/33.jpg)
Cray X-MP Instructions
Integer addition
  Vi Vj+Vk      Vi = Vj + Vk
  Vi Sj+Vk      Vi = Sj + Vk
    Sj is a scalar
Floating-point addition
  Vi Vj+FVk     Vi = Vj + Vk
  Vi Sj+FVk     Vi = Sj + Vk
    Sj is a scalar
![Page 34: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/34.jpg)
Cray X-MP Instructions (cont’d)
Load instructions
  Vi ,A0,Ak     Vi = M(A0)+Ak
    Vector load with stride Ak
    Loads VL elements starting at memory address A0
  Vi ,A0,1      Vi = M(A0)+1
    Vector load with stride 1
    Special case
![Page 35: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/35.jpg)
Cray X-MP Instructions (cont’d)
Store instructions
  ,A0,Ak Vi
    Vector store with stride Ak
    Stores VL elements starting at memory address A0
  ,A0,1 Vi
    Vector store with stride 1
    Special case
![Page 36: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/36.jpg)
Cray X-MP Instructions (cont’d)
Logical AND instructions
  Vi Vj&Vk      Vi = Vj & Vk
  Vi Sj&Vk      Vi = Sj & Vk
    Sj is a scalar
Shift instructions
  Vi Vj>Ak      Vi = Vj >> Ak
  Vi Vj<Ak      Vi = Vj << Ak
    Left/right shift each element of Vj and store the result in Vi
![Page 37: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/37.jpg)
Sample Vector Functional Units

| Vector functional unit | # Stages | Available to chain (clock cycles) | Vector results (clock cycles) |
|---|---|---|---|
| Integer ADD (64-bit) | 3 | 8 | VL+8 |
| 64-bit shift | 3 | 8 | VL+8 |
| 128-bit shift | 4 | 9 | VL+9 |
| Floating ADD | 6 | 11 | VL+11 |
| Floating MULTIPLY | 7 | 12 | VL+12 |
![Page 38: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/38.jpg)
X-MP Pipeline Operation
Three phases
  Setup phase
    Sets functional units to perform the appropriate operation
    Establishes routes to source and destination vector registers
    Requires 3 clock cycles for all functional units
  Execution phase
  Shutdown phase
![Page 39: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/39.jpg)
X-MP Pipeline Operation (cont’d)
Three phases (cont’d)
  Execution phase
    Source and destination vector registers are reserved
      Cannot be used by another instruction
    Source vector register is reserved for VL+3 clock cycles
      VL = vector length
    One pair of operands enters the first stage per clock cycle
![Page 40: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/40.jpg)
X-MP Pipeline Operation (cont’d)
Three phases (cont’d)
  Shutdown phase
    Shutdown time = 3 clock cycles
    Shutdown time
      Time difference between when the last result emerges and when the destination vector register becomes available for other instructions
![Page 41: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/41.jpg)
X-MP Pipeline Operation (cont’d)
Three phases (cont’d)
  Shutdown phase
    Destination register becomes available after 3 + n + (VL – 1) + 3 = n + VL + 5 clock cycles
      Setup time = shutdown time = 3 clock cycles
      First result comes after n clock cycles
      Remaining (VL – 1) results come out at one per clock cycle
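The timing expression can be cross-checked against the "Vector results" column of the sample functional unit table. This is a minimal Python sketch; the function and constant names are illustrative assumptions, while the cycle counts are the ones stated on the slides.

```python
SETUP_CYCLES = 3      # setup phase, same for all functional units
SHUTDOWN_CYCLES = 3   # shutdown phase


def destination_available(stages, vl):
    """Clock cycles until the destination vector register is free again.

    setup + n cycles to the first result + (VL - 1) remaining results,
    one per clock, + shutdown, which simplifies to n + VL + 5.
    """
    if stages < 1 or vl < 1:
        raise ValueError("stages and vl must both be at least 1")
    return SETUP_CYCLES + stages + (vl - 1) + SHUTDOWN_CYCLES


def source_reserved(vl):
    """Clock cycles a source vector register stays reserved (VL + 3)."""
    return vl + 3


if __name__ == "__main__":
    # Stage counts taken from the sample functional unit table.
    units = {"Integer ADD": 3, "128-bit shift": 4,
             "Floating ADD": 6, "Floating MULTIPLY": 7}
    vl = 64
    for name, stages in units.items():
        print(name, "-> destination free after",
              destination_available(stages, vl), "cycles")
```

For every unit the result equals VL plus the stage count plus 5, which agrees with the VL+8, VL+9, VL+11, and VL+12 entries in the table.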
![Page 42: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/42.jpg)
A Simple Vector Add Operation
  A1 5          A1 = 5
  VL A1         VL = A1 (vector length of 5)
  V1 V2+FV3     V1 = V2 + V3 (floating-point add)
![Page 43: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/43.jpg)
Overlapped Vector Operations
  A1 5          A1 = 5
  VL A1         VL = A1 (vector length of 5)
  V1 V2+FV3     V1 = V2 + V3 (floating-point add)
  V4 V5*FV6     V4 = V5 * V6 (floating-point multiply, independent of the add)
![Page 44: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/44.jpg)
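Chaining Example
  A1 5          A1 = 5
  VL A1         VL = A1 (vector length of 5)
  V1 V2+FV3     V1 = V2 + V3 (floating-point add)
  V4 V5*FV1     V4 = V5 * V1 (floating-point multiply, uses the result of the add)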
![Page 45: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/45.jpg)
Vector Processing Performance
![Page 46: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/46.jpg)
Interleaved Memories
Traditional memory designs
  Provide sequential, non-overlapped access
  Use high-order interleaving
Interleaved memories
  Facilitate overlapped, pipelined access
  Used by vector and high-performance systems
  Use low-order interleaving
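The difference between the two interleaving schemes is which address bits select the bank. This is a minimal Python sketch; the function names and the small sizes used in the example (8 banks, 4 words per bank) are illustrative assumptions.

```python
def low_order_bank(address, num_banks):
    """Low-order interleaving: the low address bits pick the bank,
    so consecutive addresses fall in consecutive banks."""
    return address % num_banks


def high_order_bank(address, words_per_bank):
    """High-order interleaving: the high address bits pick the bank,
    so consecutive addresses stay in the same bank until it is full."""
    return address // words_per_bank


if __name__ == "__main__":
    addresses = range(10)
    print("low-order: ", [low_order_bank(a, 8) for a in addresses])
    print("high-order:", [high_order_bank(a, 4) for a in addresses])
```

With low-order interleaving a stride-1 vector access touches a different bank on every reference, which is what makes overlapped, pipelined access possible.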
![Page 47: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/47.jpg)
Interleaved Memories (cont’d)
![Page 48: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/48.jpg)
Interleaved Memories (cont’d)
Two types of designs
  Synchronized access organization
    Upper m bits are given to all memory banks simultaneously
    Requires output latches
    Does not efficiently support non-sequential access
  Independent access organization
    Supports pipelined access for arbitrary access pattern
    Requires address registers
![Page 49: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/49.jpg)
Interleaved Memories (cont’d)
Synchronized access organization
![Page 50: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/50.jpg)
Interleaved Memories (cont’d)
Pipelined transfer of data in interleaved memories
![Page 51: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/51.jpg)
Interleaved Memories (cont’d)
Independent access organization
![Page 52: Vector Processors Prof. Sivarama Dandamudi School of Computer Science Carleton University.](https://reader035.fdocuments.net/reader035/viewer/2022062305/56649f535503460f94c77e47/html5/thumbnails/52.jpg)
Interleaved Memories (cont’d)
Number of banks B
  B ≥ M
  M = memory access time in cycles
Access becomes sequential (non-overlapped) if stride = B, because every access goes to the same bank
  B = 8, M = 6 clock cycles, stride = 1
    Time to read 16 words = 6 + 16 = 22 clock cycles
  If stride is 8, it takes 16 * 6 = 96 clock cycles
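The two timing figures can be reproduced with a small closed-form model. This is a Python sketch under stated assumptions: the function name `read_time` is hypothetical, banks are low-order interleaved, and only the two cases covered by the slide are modeled (fully pipelined access and access that always hits the same bank). Strides that revisit a bank before it is free, without hitting a single bank every time, are deliberately left out.

```python
from math import gcd


def read_time(n_words, stride, banks, access_time):
    """Clock cycles to read n_words with the given stride.

    Covers only the two cases on the slide: conflict-free pipelined
    access (M + n cycles) and fully serialized access (n * M cycles).
    """
    if n_words < 1 or stride < 1 or banks < 1 or access_time < 1:
        raise ValueError("all arguments must be at least 1")
    # Number of different banks the access pattern cycles through.
    distinct_banks = banks // gcd(stride, banks)
    if distinct_banks >= access_time:
        # A bank is revisited only after it has finished its access.
        return access_time + n_words
    if distinct_banks == 1:
        # Every access waits for the same bank.
        return n_words * access_time
    raise ValueError("partial bank conflicts are not covered by this sketch")


if __name__ == "__main__":
    print("stride 1:", read_time(16, 1, banks=8, access_time=6))
    print("stride 8:", read_time(16, 8, banks=8, access_time=6))
```

The check `distinct_banks >= access_time` is the B ≥ M condition applied to the banks a given stride actually uses.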
Last slide