Trident Processor: A Scalable Architecture for Scalar, Vector, and Matrix Operations

Eng. M. Soliman, Prof. S. Sedukhin

Posted: 20-Dec-2015

Page 1

Trident Processor

A Scalable Architecture for Scalar, Vector, and Matrix Operations

Eng. M. Soliman, Prof. S. Sedukhin

Page 2

Contents

The factors that impact the processor architecture

The idea of our proposed Trident processor

The Trident parallelism

The architecture of the Trident processor

The features of the Trident processor

Conclusion and Future work

Page 3

The Important Factors That Impact the Processor Architecture

[Diagram: Technology and Application Characteristics feed into Processor Architecture]

Page 4

Fast-improving Technology

Moore's law: the number of transistors per integrated circuit would double every 18 months

Page 5

Application Characteristics

In response to the increasing importance of multimedia applications, major processor vendors have announced extensions to their general-purpose processors in an effort to improve their multimedia performance.

Processor                      Multimedia extension
Intel Pentium II, III, and 4   MMX, SSE, and SSE2
Motorola PowerPC               AltiVec
Silicon Graphics MIPS          MDMX
Sun SPARC                      VIS
Hewlett-Packard PA-RISC        MAX

Page 6

The Idea of the Trident Processor

The huge transistor budget (within a few years it will be possible to integrate a billion transistors on a single chip)

The requirements of future applications (scientific, engineering, multimedia, and other applications are based on vector and matrix operations)

Page 7

We Propose the Trident Processor

Trident: a general-purpose processor that has three instruction sets (IS): scalar, vector, and matrix

Scalar IS (1 operation)

Vector IS (n operations)

Matrix IS (n^2 or n^3 operations)

Page 8

The Trident Instruction Sets

Ins. set   Example                        Scalar ops   Scalar code
Scalar     Addition                       1            z=x+y;
Vector     Addition                       O(n)         for(i=0;i<n;i++) z[i]=x[i]+y[i];
Vector     Dot product                    O(n)         s=0; for(i=0;i<n;i++) s+=x[i]*y[i];
Matrix     Addition                       O(n^2)       for(i=0;i<n;i++) for(j=0;j<n;j++) z[i][j]=x[i][j]+y[i][j];
Matrix     Matrix-vector multiplication   O(n^2)       for(i=0;i<n;i++){ s=0; for(j=0;j<n;j++) s+=x[i][j]*y[j]; z[i]=s; }
Matrix     Matrix-matrix multiplication   O(n^3)       for(i=0;i<n;i++) for(j=0;j<n;j++){ s=0; for(k=0;k<n;k++) s+=x[i][k]*y[k][j]; z[i][j]=s; }

Page 9

The Trident Parallelism

The Trident processor exploits a significant amount (up to three levels) of data parallelism

The advantages of using data parallelism:

Compact: a single short instruction can describe an array of scalar operations

Expressive: a single instruction can pass valuable information about an array of scalar operations to the hardware

Scalable: adding more hardware can increase performance by processing longer arrays
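
The "scalable" point can be illustrated with strip-mining, a standard way to run one vector loop on hardware of different widths. The sketch below is not Trident code: `strip_mined_add` and its `vlen` parameter (a hypothetical hardware vector length) are assumptions made for illustration. It shows that the same source issues fewer vector instructions when each instruction covers a longer array.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise z = x + y, issued as a sequence of vector instructions that
 * each cover up to vlen elements. vlen is a hypothetical hardware vector
 * length, not a parameter taken from the slides.
 * Returns the number of vector instructions issued. */
size_t strip_mined_add(const double *x, const double *y, double *z,
                       size_t n, size_t vlen) {
    assert(vlen > 0);
    size_t issued = 0;
    for (size_t base = 0; base < n; base += vlen) {
        size_t len = (n - base < vlen) ? (n - base) : vlen;
        /* This inner loop stands in for ONE vector add instruction. */
        for (size_t i = 0; i < len; i++)
            z[base + i] = x[base + i] + y[base + i];
        issued++;
    }
    return issued;
}
```

With n = 10, a vector length of 4 issues 3 instructions and a vector length of 8 issues 2, while the calling code stays the same.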

Page 10

The Trident Architecture

Page 11

Vector Processing

A vector pipeline can perform the fundamental vector operations, such as addition, subtraction, multiplication, and division

Vector data are stored in ring vector registers

Multiple vector instructions can execute concurrently on the parallel vector pipelines

Page 12

Example: vector addition

VR2 ← VR0 + VR1

Step   Input    Output
0      a0, b0
1      a1, b1   a0 + b0
2      a2, b2   a1 + b1
3      a3, b3   a2 + b2
4      a0, b0   a3 + b3
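
The step table can be replayed in software. The following is a minimal sketch under stated assumptions: the `ring_reg` type, the helper names, and the one-step pipeline latency are illustrative choices read off the table, not details of the Trident hardware. Each register is modeled as a ring whose head element is used and then rotated, so after four reads the head is back at element 0, as in step 4 of the table.

```c
#include <stddef.h>

#define VLEN 4  /* elements per vector register, matching the 4-element example */

/* Ring vector register: each access uses the head element, then rotates. */
typedef struct {
    double elem[VLEN];
    size_t head;
} ring_reg;

static double ring_read(ring_reg *r) {
    double v = r->elem[r->head];
    r->head = (r->head + 1) % VLEN;
    return v;
}

static void ring_write(ring_reg *r, double v) {
    r->elem[r->head] = v;
    r->head = (r->head + 1) % VLEN;
}

/* VR2 <- VR0 + VR1 on one pipeline with an assumed one-step latency.
 * Returns the number of steps taken (VLEN + 1, counting step 0). */
size_t ring_vector_add(ring_reg *vr0, ring_reg *vr1, ring_reg *vr2) {
    double in_flight = 0.0;  /* sum computed in the previous step */
    size_t steps = 0;
    for (size_t step = 0; step <= VLEN; step++) {
        if (step > 0)
            ring_write(vr2, in_flight);  /* output of the previous step's inputs */
        if (step < VLEN) {
            double a = ring_read(vr0);   /* input a[step] */
            double b = ring_read(vr1);   /* input b[step] */
            in_flight = a + b;
        }
        steps++;
    }
    return steps;
}
```

Starting with all heads at 0, the call takes five steps (0 through 4) and leaves every ring rotated back to its starting position.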

Page 13

Matrix Processing

Using parallel vector pipelines and a ring matrix register file, the fundamental matrix operations, such as addition, subtraction, multiplication, and inversion, can be performed

Page 14

Example: Matrix addition

MR2 ← MR0 + MR1

       Input                                       Output
Step   P0         P1         P2         P3         P0          P1          P2          P3
0      a00, b00   a10, b10   a20, b20   a30, b30
1      a01, b01   a11, b11   a21, b21   a31, b31   a00 + b00   a10 + b10   a20 + b20   a30 + b30
2      a02, b02   a12, b12   a22, b22   a32, b32   a01 + b01   a11 + b11   a21 + b21   a31 + b31
3      a03, b03   a13, b13   a23, b23   a33, b33   a02 + b02   a12 + b12   a22 + b22   a32 + b32
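
The same schedule can be sketched for four pipelines working side by side. This is an illustration only: the function name, the assignment of matrix row p to pipeline Pp, and the one-step latency are assumptions inferred from the table, not a description of the actual hardware. At step t every pipeline reads column t of its row and emits the sum for column t - 1.

```c
#include <stddef.h>

#define NPIPE 4  /* parallel pipelines P0..P3 (one matrix row each, assumed) */
#define NCOL  4  /* elements per row */

/* MR2 <- MR0 + MR1: all pipelines advance together, one column per step.
 * Returns the number of steps taken (NCOL + 1, counting step 0). */
size_t pipelined_matrix_add(double a[NPIPE][NCOL], double b[NPIPE][NCOL],
                            double c[NPIPE][NCOL]) {
    double in_flight[NPIPE] = {0.0};  /* per-pipeline sum from the previous step */
    size_t steps = 0;
    for (size_t t = 0; t <= NCOL; t++) {
        for (size_t p = 0; p < NPIPE; p++) {
            if (t > 0)
                c[p][t - 1] = in_flight[p];        /* output of step t - 1 inputs */
            if (t < NCOL)
                in_flight[p] = a[p][t] + b[p][t];  /* inputs of step t */
        }
        steps++;
    }
    return steps;
}
```

Under these assumptions the 16 element-wise additions of a 4 x 4 matrix finish in 5 steps, the same step count as a single 4-element vector addition on one pipeline.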

Page 15

Matrix-matrix Multiplication

The basic matrix operation is the matrix-matrix multiplication

Page 16

Chaining

Matrix-matrix multiplication: $c_{ij} = \sum_{k=0}^{n-1} a_{ik} b_{kj}$

Matrix-vector multiplication: $c_i = \sum_{j=0}^{n-1} a_{ij} b_j$

Dot product: $c = \sum_{i=0}^{n-1} a_i b_i$
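
The three sums nest: a matrix-vector product is one dot product per row, and a matrix-matrix product is one matrix-vector product per column of b. A minimal C sketch of that nesting follows; the function names and the fixed size `N = 4` are illustrative assumptions. It shows only the arithmetic structure of the formulas and does not model how the Trident hardware chains its pipelines.

```c
#include <stddef.h>

#define N 4  /* illustrative size; the slides use a symbolic n */

/* Dot product: c = sum_i a_i * b_i. Each product feeds the running sum. */
double dot(const double a[N], const double b[N]) {
    double c = 0.0;
    for (size_t i = 0; i < N; i++)
        c += a[i] * b[i];
    return c;
}

/* Matrix-vector: c_i = sum_j a_ij * b_j, one dot product per row of a. */
void mat_vec(double a[N][N], const double b[N], double c[N]) {
    for (size_t i = 0; i < N; i++)
        c[i] = dot(a[i], b);
}

/* Matrix-matrix: c_ij = sum_k a_ik * b_kj, one matrix-vector product
 * per column of b. */
void mat_mul(double a[N][N], double b[N][N], double c[N][N]) {
    double col[N], out[N];
    for (size_t j = 0; j < N; j++) {
        for (size_t k = 0; k < N; k++)
            col[k] = b[k][j];          /* gather column j of b */
        mat_vec(a, col, out);
        for (size_t i = 0; i < N; i++)
            c[i][j] = out[i];          /* scatter into column j of c */
    }
}
```

Multiplying any matrix by the identity with `mat_mul` returns the original matrix, which is a quick sanity check of the index order.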

Page 17

Matrix-matrix Multiplication Complexity

                      Scalar IS   Vector IS   Matrix IS
Instructions          O(n^3)      O(n^2)      O(1)
Load                  O(n^3)      O(n^3)      O(n^2)
Store                 O(n^2)      O(n^2)      O(n^2)
Mul-acc.              O(n^3)      O(n^3)      O(n^2)
Branch                O(n^3)      O(n^2)      0
Address comp.         O(n^3)      O(n^2)      O(1)
Add/sub.              O(n^3)      O(n^2)      0
Reg. initialization   O(n^2)      O(n)        0
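
The "Instructions" row can be made concrete with a small counting sketch. It assumes a hypothetical instruction granularity that the slides do not spell out: one scalar multiply-accumulate per innermost iteration, one vector dot-product instruction per output element, and a single matrix-multiply instruction for the whole product. It counts only these arithmetic instructions, so the totals are not the figures plotted on the following slides.

```c
#include <stddef.h>

typedef struct {
    unsigned long scalar_is;  /* scalar multiply-accumulate instructions */
    unsigned long vector_is;  /* vector dot-product instructions */
    unsigned long matrix_is;  /* matrix-multiply instructions */
} issue_counts;

/* Count arithmetic instructions issued for an n x n matrix product under
 * the hypothetical granularity described above. */
issue_counts count_matmul_instructions(size_t n) {
    issue_counts c = {0, 0, 0};
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            for (size_t k = 0; k < n; k++)
                c.scalar_is++;   /* one mul-acc per (i, j, k): n^3 in total */
            c.vector_is++;       /* one dot product per (i, j): n^2 in total */
        }
    }
    c.matrix_is = 1;             /* one instruction covers the whole product */
    return c;
}
```

For n = 8 this model gives 512 scalar, 64 vector, and 1 matrix instruction, matching the O(n^3), O(n^2), and O(1) growth in the table.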

Page 18

8×8 Matrix-matrix Multiplication

[Bar chart: number of instructions (axis from 0 to 4000) for the scalar, vector, and matrix instruction sets]

Page 19

Continued

[Bar chart: counts (axis from 0 to 1400) for the scalar, vector, and matrix instruction sets in seven categories]

(1) load, (2) store, (3) multiply-accumulate steps, (4) branch, (5) address computations, (6) addition/subtraction, and (7) register initializations

Page 20

What does this mean?

Fewer instruction cache misses

Fewer instruction fetches and decodes

Fewer branches and fewer mispredicted branches

More predictable memory accesses

Fewer hazards

We can say that Trident code is compact code with powerful instructions for high performance

Page 21

The Trident Processor Features

The Trident processor consists mainly of datapath circuitry and register files

The advances in the VLSI fabrication technology can be directly applied to support more parallelism

Simple control unit

Many applications, such as scientific, engineering, and multimedia applications, can benefit from executing on the Trident processor

Page 22

Future Work

Simulating the Trident processor

Evaluating the performance of the Trident processor on some multimedia and numerical applications

Comparing the performance of the Trident processor with that of superscalar processors

Page 23

Thank you