Date posted: 20-Dec-2015
Trident Processor
A Scalable Architecture for Scalar, Vector, and Matrix
Operations
Eng. M. Soliman, Prof. S. Sedukhin
2
Contents
The factors that impact the processor architecture
The idea of our proposed Trident processor
The Trident parallelism
The architecture of the Trident processor
The features of the Trident processor
Conclusion and Future work
3
The Important Factors That Impact the Processor Architecture
Technology and application characteristics both shape the processor architecture
4
Fast-improving Technology
Moore's law: the number of transistors per integrated circuit doubles roughly every 18 months
5
Application Characteristics
In response to the increasing importance of multimedia applications, major processor vendors have announced extensions to their general-purpose processors in an effort to improve their multimedia performance

Processor                       Multimedia extension
Intel Pentium II, III, and 4    MMX, SSE, and SSE2
Motorola PowerPC                AltiVec
Silicon Graphics MIPS           MDMX
Sun SPARC                       VIS
Hewlett-Packard PA-RISC         MAX
6
The Idea of the Trident Processor
The huge transistor budget (within a few years it will be possible to integrate a billion transistors on a single chip)
The requirements of future applications (scientific, engineering, multimedia, and other applications are based on vector and matrix operations)
7
We Propose the Trident Processor
Trident: a general-purpose processor that has three instruction sets (IS): scalar, vector, and matrix
Scalar IS (1 operation)
Vector IS (n operations)
Matrix IS (n^2 or n^3 operations)
8
The Trident Instruction Sets

Ins. set   Example                        Scalar ops   Scalar code
Scalar     Addition                       1            z=x+y;
Vector     Addition                       O(n)         for(i=0;i<n;i++) z[i]=x[i]+y[i];
Vector     Dot product                    O(n)         s=0; for(i=0;i<n;i++) s+=x[i]*y[i];
Matrix     Addition                       O(n^2)       for(i=0;i<n;i++) for(j=0;j<n;j++) z[i][j]=x[i][j]+y[i][j];
Matrix     Matrix-vector multiplication   O(n^2)       for(i=0;i<n;i++){s=0; for(j=0;j<n;j++) s+=x[i][j]*y[j]; z[i]=s;}
Matrix     Matrix-matrix multiplication   O(n^3)       for(i=0;i<n;i++) for(j=0;j<n;j++){s=0; for(k=0;k<n;k++) s+=x[i][k]*y[k][j]; z[i][j]=s;}
9
The Trident Parallelism
The Trident processor exploits a significant amount (up to three levels) of data parallelism
The advantages of using data parallelism:
Compact: a single short instruction can describe an array of scalar operations
Expressive: a single instruction can pass valuable information about an array of scalar operations to the hardware
Scalable: adding more hardware can increase performance by processing longer arrays
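The scalability point can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name vadd_in_steps and the parameter lanes (a stand-in for the amount of parallel hardware) are hypothetical and do not come from the slides, and a "step" is simply one pass in which up to lanes elements are added.

```c
#include <stddef.h>

/* Element-wise z = x + y, processed `lanes` elements per step.
   `lanes` is a hypothetical stand-in for the amount of parallel
   hardware; it must be at least 1. Returns the number of steps. */
size_t vadd_in_steps(double *z, const double *x, const double *y,
                     size_t n, size_t lanes)
{
    size_t steps = 0;
    for (size_t base = 0; base < n; base += lanes) {
        /* The last chunk may be shorter than `lanes`. */
        size_t end = (n - base < lanes) ? n : base + lanes;
        for (size_t i = base; i < end; i++)   /* one step's worth of work */
            z[i] = x[i] + y[i];
        steps++;
    }
    return steps;
}
```

In this model, 8 elements take 8 steps with one lane and 2 steps with four lanes, while the single vector operation being described stays the same.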
10
The Trident Architecture
11
Vector Processing
A vector pipeline can perform the fundamental vector operations, such as addition, subtraction, multiplication, and division
Vector data are stored in ring vector registers
Multiple vector instructions can execute concurrently on the parallel vector pipelines
12
Example: vector addition
VR2 ← VR0 + VR1

Step   Input      Output
0      a0, b0     -
1      a1, b1     a0 + b0
2      a2, b2     a1 + b1
3      a3, b3     a2 + b2
4      a0, b0     a3 + b3
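The stepwise vector addition above can be mimicked in software. The following is a minimal sketch, not the Trident hardware design: the ring_reg type, its head index, and the single pipeline latch are assumptions made for illustration. Each ring register presents one element per step and then rotates, and a result emerges one step after its operands enter.

```c
#include <stddef.h>

#define VLEN 4   /* elements per ring register; 4 matches the slide's example */

/* Hypothetical model of a ring vector register: `head` marks the
   element currently presented to the pipeline. */
typedef struct {
    double elem[VLEN];
    size_t head;
} ring_reg;

static double ring_read(const ring_reg *r)    { return r->elem[r->head]; }
static void   ring_write(ring_reg *r, double v) { r->elem[r->head] = v; }
static void   ring_rotate(ring_reg *r)        { r->head = (r->head + 1) % VLEN; }

/* VR2 <- VR0 + VR1 through a one-stage adder (assumed latency: 1 step). */
void ring_vadd(ring_reg *vr2, ring_reg *vr0, ring_reg *vr1)
{
    double stage = 0.0;                       /* pipeline latch */
    for (size_t step = 0; step <= VLEN; step++) {
        if (step > 0) {                       /* sum from the previous step emerges */
            ring_write(vr2, stage);
            ring_rotate(vr2);
        }
        if (step < VLEN) {                    /* feed the next operand pair */
            stage = ring_read(vr0) + ring_read(vr1);
            ring_rotate(vr0);
            ring_rotate(vr1);
        }
    }
    /* After VLEN rotations every ring is back at its starting element,
       which is why a0, b0 reappear at the input in step 4. */
}
```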
13
Matrix Processing
By using parallel vector pipelines and a ring matrix register file, the fundamental matrix operations, such as addition, subtraction, multiplication, and inversion, can be performed
14
Example: Matrix addition
MR2 ← MR0 + MR1
Each step feeds one operand pair to each of the four pipelines (P0 to P3)

Step   Input                                   Output
0      a00,b00  a10,b10  a20,b20  a30,b30      -
1      a01,b01  a11,b11  a21,b21  a31,b31      a00+b00  a10+b10  a20+b20  a30+b30
2      a02,b02  a12,b12  a22,b22  a32,b32      a01+b01  a11+b11  a21+b21  a31+b31
3      a03,b03  a13,b13  a23,b23  a33,b33      a02+b02  a12+b12  a22+b22  a32+b32
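The schedule in the matrix addition example, one column of operand pairs per step spread over four pipelines, can be sketched as a loop nest. This is an illustration under assumptions: the name matrix_add_by_columns is hypothetical, lane p is assumed to handle row p (the slide does not make the mapping explicit), and the one-step pipeline latency is ignored, so only input steps are counted.

```c
#define MDIM 4   /* 4x4 matrices and 4 pipelines, as in the slide's example */

/* MR2 <- MR0 + MR1, one column per step.
   Returns the number of input steps taken. */
int matrix_add_by_columns(double mr2[MDIM][MDIM],
                          double mr0[MDIM][MDIM],
                          double mr1[MDIM][MDIM])
{
    int steps = 0;
    for (int col = 0; col < MDIM; col++) {    /* one step per column */
        /* In hardware the MDIM lanes would work at the same time;
           here they are visited one after another. */
        for (int p = 0; p < MDIM; p++)
            mr2[p][col] = mr0[p][col] + mr1[p][col];
        steps++;
    }
    return steps;
}
```

The outer loop corresponds to the steps of the example and the inner loop to the pipelines, so a 4x4 addition needs 4 input steps instead of 16 scalar additions issued one by one.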
15
Matrix-matrix Multiplication
The basic matrix operation is the matrix-matrix multiplication
16
Chaining
Matrix-matrix multiplication: $c_{ij} = \sum_{k=0}^{n-1} a_{ik} b_{kj}$
Matrix-vector multiplication: $c_i = \sum_{j=0}^{n-1} a_{ij} b_j$
Dot product: $c = \sum_{i=0}^{n-1} a_i b_i$
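The three operations nest: a matrix-vector product is n dot products, and a matrix-matrix product is n matrix-vector products. The sketch below shows that nesting in software only. How the Trident hardware chains these operations is not described in the transcript, and the function names, the row-major layout, and the stride parameters are assumptions made for illustration.

```c
#include <stddef.h>

/* Dot product: sum over i of a[i] * b[i * stride_b].
   The stride lets b walk down a column of a row-major matrix. */
double dot(const double *a, const double *b, size_t stride_b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i * stride_b];
    return s;
}

/* Matrix-vector product as n dot products: c[i] = (row i of a) . b.
   a is n x n, row-major; c and b are accessed with their own strides. */
void matvec(double *c, size_t stride_c, const double *a,
            const double *b, size_t stride_b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i * stride_c] = dot(a + i * n, b, stride_b, n);
}

/* Matrix-matrix product as n matrix-vector products:
   column j of c = a times column j of b (all n x n, row-major). */
void matmul(double *c, const double *a, const double *b, size_t n)
{
    for (size_t j = 0; j < n; j++)
        matvec(c + j, n, a, b + j, n, n);
}
```

For a = [[1, 2], [3, 4]] and b = [[5, 6], [7, 8]], matmul produces [[19, 22], [43, 50]], the same result as the triple loop.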
17
Matrix-matrix Multiplication Complexity

                      Scalar IS   Vector IS   Matrix IS
Instructions          O(n^3)      O(n^2)      O(1)
Load                  O(n^3)      O(n^3)      O(n^2)
Store                 O(n^2)      O(n^2)      O(n^2)
Mul-acc.              O(n^3)      O(n^3)      O(n^2)
Branch                O(n^3)      O(n^2)      0
Address comp.         O(n^3)      O(n^2)      O(1)
Add/sub.              O(n^3)      O(n^2)      0
Reg. initialization   O(n^2)      O(n)        0
18
8×8 Matrix-matrix Multiplication
[Figure: bar chart of the number of instructions for (1) scalar, (2) vector, and (3) matrix; vertical axis from 0 to 4000]
19
Continued
[Figure: bar chart comparing scalar, vector, and matrix across seven categories; vertical axis from 0 to 1400]
Categories: (1) load, (2) store, (3) multiply-accumulate steps, (4) branch, (5) address computations, (6) addition/subtraction, and (7) register initializations
20
What does this mean?
Fewer instruction cache misses, fewer instruction fetches and decodes, fewer branches and fewer mispredicted branches, more predictable memory accesses, and fewer hazards
Trident code is compact code with powerful instructions for high performance
21
The Trident Processor Features
The Trident processor consists mainly of datapath circuitry and register files
Advances in VLSI fabrication technology can be directly applied to support more parallelism
Simple control unit
Many applications can benefit from executing on the Trident processor, such as scientific, engineering, multimedia, and many others
22
Future Work
Simulating the Trident processor
Evaluating the performance of the Trident processor on some multimedia and numerical applications
Comparing the performance of the Trident processor with that of superscalar processors
23
Thank you