# Nikhil Pre

Posted 04-Apr-2018


### Transcript of Nikhil Pre

7/29/2019 Nikhil Pre

By Nikhil Suryanarayanan

#### Outline

- Motivation
- Eigenvalue Decomposition & Applications
- Exact Jacobi
- Parallel Decomposition using a Systolic Array
- Optimization of the Systolic Array
- Interconnect-Optimized Systolic Array
- Conclusion

#### Motivation

- Eigenvalue decomposition is required in various fields
- High-performance and real-time applications demand a hardware implementation
- SDMA (space-division multiple access) communication
- Realizing optimal architectures with respect to speed and power for the respective applications

#### Eigenvalue Decomposition: Applications

- Angle-of-arrival estimation
- Face detection
- Image compression
- Eigen-beamforming
- Signal subspace estimation
- PCA
- MUSIC & ESPRIT

#### EVD Methods

- Exact Jacobi
- Systolic array
- Approximate Jacobi
- Algebraic method (only for 3 × 3 matrices)

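#### Eigenvalue Decomposition

Eigenvalue decomposition (EVD) is a special case of singular value decomposition (SVD) in which the matrix is square and symmetric. For a symmetric positive semidefinite matrix the two decompositions coincide, with $V = U$.

Consider a matrix $A \in \mathbb{R}^{m \times n}$.

SVD: $A = U D V^T$

EVD (square symmetric $A$, $m = n$): $A = U D U^T$

where $D \in \mathbb{R}^{m \times n}$ is a diagonal matrix, and $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal.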

#### CORDIC

- COordinate Rotation DIgital Computer
- A set of shift-add algorithms for computing sine, cosine, arctangent, hyperbolic functions, coordinate rotations, etc.
- Eliminates costly computations such as multiplications
- Needs only shift-add units, a ROM/RAM for the lookup table, and basic logic gates
- Hardware friendly
- Iterative algorithm
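The shift-add iteration can be sketched as rotation-mode CORDIC computing cosine and sine. This is a floating-point model under stated assumptions (16 iterations by default, Python floats standing in for fixed-point registers, the gain $K$ folded into the starting value). It is not the hardware design from these slides, and the function name `cordic_sin_cos` is hypothetical. Each pass of the loop corresponds to one shift-add stage, and these stages are what get unrolled into a pipeline.

```python
import math

def cordic_sin_cos(theta: float, iterations: int = 16) -> tuple[float, float]:
    """Approximate (cos(theta), sin(theta)) with rotation-mode CORDIC.

    Converges for |theta| up to about 1.74 rad (the sum of atan(2**-i)).
    Floats stand in for fixed-point registers; multiplying by 2**-i models
    the right shift a hardware stage would use.
    """
    # Angle lookup table: the values a ROM would hold, one per iteration.
    atan_table = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-scale by the aggregate gain K so no multiplier is needed afterwards.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = k, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0  # rotate so the residual angle z shrinks toward 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i  # shift-add step
        z -= d * atan_table[i]
    return x, y

c, s = cordic_sin_cos(math.pi / 6)
print(c, s)
```

After $n$ iterations the residual angle is bounded by $\tan^{-1}(2^{-(n-1)})$, so each extra iteration roughly halves the error bound. This is the sense in which CORDIC accuracy depends on the number of iterations.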

#### Loop Unrolling

#### CORDIC Modules

**ArcTan module:** used to compute the angle $\theta$ (via $\tan^{-1}$) for constructing the Jacobi rotation matrix.

**Sine/Cosine module:** the 2 × 2 matrix

$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$

is constructed using the angle from the ArcTan module.
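The following sketch shows what the two modules compute for a symmetric 2 × 2 block $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$. With the rotation laid out as above, the off-diagonal entry of $J^T A J$ is $\tfrac{a-d}{2}\sin 2\theta + b\cos 2\theta$, which vanishes for $\theta = \tfrac12 \tan^{-1}\!\big(2b/(d-a)\big)$. In hardware the ArcTan module (CORDIC vectoring mode) would produce this angle. Here `math.atan2` stands in for it, and all function names are hypothetical.

```python
import math

def jacobi_angle(a: float, b: float, d: float) -> float:
    """Angle that zeroes the off-diagonal of J^T [[a, b], [b, d]] J,
    where J = [[cos t, sin t], [-sin t, cos t]]."""
    # Off-diagonal of J^T A J is (a - d)/2 * sin(2t) + b * cos(2t).
    return 0.5 * math.atan2(2.0 * b, d - a)

def rotation(t: float) -> list[list[float]]:
    """2x2 rotation in the [[c, s], [-s, c]] layout (Sine/Cosine module)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, s], [-s, c]]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def transpose(x):
    return [list(row) for row in zip(*x)]

A = [[4.0, 1.0], [1.0, 2.0]]
J = rotation(jacobi_angle(A[0][0], A[0][1], A[1][1]))
D = matmul(matmul(transpose(J), A), J)  # diagonal up to rounding
print(D)
```

The diagonal of `D` holds the eigenvalues $3 \pm \sqrt{2}$ of this example matrix.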

#### Exact Jacobi

Aims at annihilating the off-diagonal elements using a series of orthogonal transformations:

$$A^{(k+1)} = J_{pq}^T A^{(k)} J_{pq}, \qquad A^{(0)} = A$$

$J_{pq}$ is called the Jacobi rotation. It is defined by the parameters $c = \cos\theta$ and $s = \sin\theta$ through the block $\begin{bmatrix} c & s \\ -s & c \end{bmatrix}$.

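#### Exact Jacobi (contd.)

$$A = U D V^T \quad\Longleftrightarrow\quad U^T A V = D$$

After $n$ iterations of

$$A_{i+1} = J_i^T A_i J_i,$$

repeated for all possible index pairs $(p, q)$, $A$ can be effectively diagonalized. For symmetric $A$ the accumulated rotations give $U = V = J_0 J_1 \cdots J_{n-1}$. Each rotation is the identity except in rows and columns $p$ and $q$:

$$
J_{pq} =
\begin{bmatrix}
1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
\vdots & \ddots & \vdots & & \vdots & & \vdots \\
0 & \cdots & c & \cdots & s & \cdots & 0 \\
\vdots & & \vdots & \ddots & \vdots & & \vdots \\
0 & \cdots & -s & \cdots & c & \cdots & 0 \\
\vdots & & \vdots & & \vdots & \ddots & \vdots \\
0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{bmatrix}
$$

with $c$ and $s$ at positions $(p, p)$ and $(p, q)$, and $-s$ and $c$ at positions $(q, p)$ and $(q, q)$.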
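The serial method can be sketched as a cyclic (row-by-row) Jacobi sweep in plain Python. It reuses the 2 × 2 angle rule and the $[c\ s;\ -s\ c]$ layout from the slides. This is a minimal sketch, not the presented implementation. The name `jacobi_eigen`, the tolerance, and the sweep limit are assumptions. Note how every rotation reads entries that the previous rotation just wrote, which is the serial data dependency discussed next.

```python
import math

def jacobi_eigen(a, tol=1e-12, max_sweeps=50):
    """Cyclic exact Jacobi EVD of a real symmetric matrix (list of lists).

    Returns (eigenvalues, U, sweeps) with A ~= U diag(eigenvalues) U^T.
    """
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    u = [[float(i == j) for j in range(n)] for i in range(n)]
    for sweep in range(1, max_sweeps + 1):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            return [a[i][i] for i in range(n)], u, sweep - 1
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                # A <- A J: only columns p and q change.
                for k in range(n):
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                # A <- J^T A: only rows p and q change.
                for k in range(n):
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                # U <- U J accumulates the eigenvectors as columns.
                for k in range(n):
                    ukp, ukq = u[k][p], u[k][q]
                    u[k][p] = c * ukp - s * ukq
                    u[k][q] = s * ukp + c * ukq
    return [a[i][i] for i in range(n)], u, max_sweeps

A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
eigenvalues, U, sweeps = jacobi_eigen(A)
print(sorted(eigenvalues), sweeps)
```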

#### Limitations of Exact Jacobi Implementation

- Jacobi iterations are serial
- Parallelism cannot be extracted, as the iterations have large inter-loop data dependencies
- Cannot be pipelined
- Every iteration transfers $4N - 4$ matrix elements to the processor
- Even though it is a matrix operation, its parallelism cannot be exploited
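One way to check the $4N - 4$ figure, assuming the processor must receive every element that a single rotation $J_{pq}$ modifies: the update $J_{pq}^T A J_{pq}$ changes only rows $p, q$ and columns $p, q$, and four elements lie in both.

$$\underbrace{2N}_{\text{rows } p,\,q} + \underbrace{2N}_{\text{columns } p,\,q} - \underbrace{4}_{a_{pp},\,a_{pq},\,a_{qp},\,a_{qq}} = 4N - 4$$

For the 8 × 8 matrix used in the comparison, that is 28 elements per rotation.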

#### How to Parallelize?

- Systolic array
- Solve 2 × 2 EVD subproblems: a matrix of size $N \times N$ gives $(N/2) \times (N/2)$ 2 × 2 subproblems
- Parallel reordering: index pairs are grouped into sets of disjoint pairs that can be processed at the same time. For $N = 4$, the possible sets are
  - $\{(1,2), (3,4)\}$
  - $\{(1,3), (2,4)\}$
  - $\{(1,4), (2,3)\}$
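Sets like these can be generated for any even $N$ with the round-robin ("circle") method: fix index 1 and rotate the others one position per round. This yields $N - 1$ sets of $N/2$ disjoint pairs, with every pair appearing exactly once. It is a sketch of that standard technique, not necessarily the ordering used in this work, and the name `parallel_orderings` is hypothetical. For $N = 4$ it produces the three sets listed above, possibly in a different order.

```python
def parallel_orderings(n):
    """Round-robin schedule of 1-based index pairs for an n x n matrix (n even).

    Returns n - 1 sets; each set covers all n indices with n/2 disjoint pairs,
    and every pair (p, q) with p < q appears in exactly one set.
    """
    if n % 2:
        raise ValueError("n must be even")
    order = list(range(1, n + 1))
    sets = []
    for _ in range(n - 1):
        pairs = [tuple(sorted((order[i], order[n - 1 - i]))) for i in range(n // 2)]
        sets.append(sorted(pairs))
        # Keep index 1 fixed and rotate the remaining indices by one position.
        order = [order[0], order[-1]] + order[1:-1]
    return sets

for s in parallel_orderings(4):
    print(s)
```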

#### Systolic Array for EVD

[Figure: a 3 × 3 grid of processing elements (PEs)]

#### Structure of PE

[Figure: each PE contains a CORDIC ATAN module, a CORDIC ROT module, and two registers (REG)]
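A minimal sketch of what the two modules inside each PE might compute, assuming a Brent-Luk style array. In that arrangement each PE holds one 2 × 2 sub-block. Diagonal PEs run ATAN to get their angle and ROT to zero their own off-diagonal. Off-diagonal PE$(i, j)$ applies the row angle $\theta_i$ and column angle $\theta_j$ received from the diagonal. Here `math.atan2`, `math.cos` and `math.sin` stand in for the CORDIC modules, and all function names are hypothetical. The 4 × 4 example shows that the block-wise updates amount to one two-sided rotation of the whole matrix.

```python
import math

def rot(theta):
    """2x2 rotation [[c, s], [-s, c]] (the CORDIC ROT module's job)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def mul2(x, y):
    return [[x[i][0] * y[0][j] + x[i][1] * y[1][j] for j in range(2)] for i in range(2)]

def t2(x):
    return [[x[0][0], x[1][0]], [x[0][1], x[1][1]]]

def diagonal_pe(block):
    """Diagonal PE: ATAN gives theta, then ROT zeroes the block's off-diagonal."""
    theta = 0.5 * math.atan2(2.0 * block[0][1], block[1][1] - block[0][0])
    r = rot(theta)
    return theta, mul2(mul2(t2(r), block), r)

def off_diagonal_pe(block, theta_row, theta_col):
    """Off-diagonal PE(i, j): R(theta_i)^T on the left, R(theta_j) on the right."""
    return mul2(mul2(t2(rot(theta_row)), block), rot(theta_col))

# A 4x4 symmetric matrix held as a 2x2 grid of 2x2 blocks (one block per PE).
A = [[4.0, 1.0, 0.5, 0.2],
     [1.0, 3.0, 0.3, 0.1],
     [0.5, 0.3, 2.0, 0.7],
     [0.2, 0.1, 0.7, 1.0]]
blocks = [[[row[2 * j:2 * j + 2] for row in A[2 * i:2 * i + 2]] for j in range(2)]
          for i in range(2)]
thetas = [0.0, 0.0]
new = [[None, None], [None, None]]
for k in range(2):
    thetas[k], new[k][k] = diagonal_pe(blocks[k][k])
for i in range(2):
    for j in range(2):
        if i != j:
            new[i][j] = off_diagonal_pe(blocks[i][j], thetas[i], thetas[j])
print(new)
```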

#### Data Exchange Sequence

[Figure: data exchange sequence of PEij, with four incoming links (in)]

#### Data Exchange: PE11, PE1j, PEi1, PEij

[Figures: data exchange links (in) for the corner element PE11, first-row elements PE1j, first-column elements PEi1, and interior elements PEij]

#### Timing & Data Exchange

[Figures: snapshots of Array Cycle = 1, including the DATA EXCHANGE step between PEs]
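The effect of one array cycle can be checked in software without modeling the physical swaps. The hypothetical `parallel_jacobi` below is a sketch, not the presented design, and assumes $N$ is even. It walks through round-robin sets of disjoint index pairs and computes all $N/2$ angles of a set from the same matrix state, as the diagonal PEs would within one array cycle, before applying the rotations. Rotations in disjoint $(p, q)$ planes commute and do not touch each other's $a_{pp}, a_{pq}, a_{qq}$, so applying them one after another gives the same result as applying them simultaneously.

```python
import math

def round_robin(n):
    """Yield n - 1 sets of n/2 disjoint 0-based pairs (circle method, n even)."""
    order = list(range(n))
    for _ in range(n - 1):
        yield [(min(order[i], order[n - 1 - i]), max(order[i], order[n - 1 - i]))
               for i in range(n // 2)]
        order = [order[0], order[-1]] + order[1:-1]

def rotate(a, p, q, c, s):
    """Two-sided update A <- J^T A J for a rotation in the (p, q) plane."""
    n = len(a)
    for k in range(n):
        a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
    for k in range(n):
        a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]

def parallel_jacobi(a, tol=1e-12, max_sweeps=50):
    """Return (sorted eigenvalues, sweeps) for a real symmetric matrix, n even."""
    n = len(a)
    if n % 2:
        raise ValueError("n must be even")
    a = [row[:] for row in a]
    for sweep in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            return sorted(a[i][i] for i in range(n)), sweep
        for pairs in round_robin(n):
            # All angles of one set come from the same matrix state.
            angles = [0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p]) for p, q in pairs]
            for (p, q), th in zip(pairs, angles):
                rotate(a, p, q, math.cos(th), math.sin(th))
    return sorted(a[i][i] for i in range(n)), max_sweeps

A = [[4.0, 1.0, 0.5, 0.2], [1.0, 3.0, 0.3, 0.1], [0.5, 0.3, 2.0, 0.7], [0.2, 0.1, 0.7, 1.0]]
print(parallel_jacobi(A))
```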

#### Staggered Processing?

- Broadcasting the row and column angles to all PEs in real time is not realistic
- $|i - j|$ is the distance of processor $P_{ij}$ from the diagonal. $P_{ij}$ also needs data from its neighbors $P_{i\pm1,\,j\pm1}$ ($1 < i, j < n/2$)
- Processing can be made faster by allowing an off-diagonal PE to start execution as soon as the diagonal PEs produce the angles
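To make the distance $|i - j|$ concrete, the sketch below tabulates it for a 3 × 3 array of PEs. It assumes, as a simplification, that an angle advances one PE per step, so $P_{ij}$ can start $|i - j|$ steps after the diagonal PEs. The function name `start_times` is hypothetical.

```python
def start_times(m):
    """Earliest start step of each PE in an m x m array, assuming angles move
    one PE per step away from the diagonal (hypothetical timing model)."""
    return [[abs(i - j) for j in range(m)] for i in range(m)]

for row in start_times(3):
    print(row)
```

Under this model the farthest PEs, in the corners, start $m - 1$ steps after the diagonal. That idle wavefront is what allowing early execution of off-diagonal PEs aims to shorten.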

#### Optimizations

Improves the utilization of each PE from 1/3 to 2/3 of the time.

[Figure: PE activity in CYCLE 1 and CYCLE 2]

#### Comparisons: 8 × 8 Matrix

| | Exact Jacobi | Systolic Array |
|---|---|---|
| Iterations for convergence | 3 | 22-25 |
| Additions | 3500 | 1500 (less than half) |
| Multiplications | 7000 | 3000 |
| Swaps/exchanges | 0 | 368 |
| Speed | Slower | Faster |

#### Optimized Architecture

In the final stages of analyzing a simpler systolic architecture.

Matrix size = 4 × 4

[Figure: simplified systolic array of four PEs, labeled 1 and 2, for the 4 × 4 matrix]

#### Goals

Achieved:

- Pipelined Jacobi architecture
- Software implementation of the systolic array
- Simultaneous execution of off-diagonal PEs to improve timing and reduce idle time
- Optimized systolic array architecture for minimum swaps and angle transmission



#### Eigenvalue and Eigenvector

A nonzero vector whose direction is unchanged when a linear transformation is applied to it, so that only its magnitude changes, is an eigenvector of that transformation. For a negative eigenvalue the direction is exactly reversed. The scalar associated with this vector is called the eigenvalue:

$$A x = \lambda x$$

where $A$ is the transformation, $x$ is the eigenvector, and $\lambda$ is the corresponding eigenvalue.
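A small worked example, with numbers chosen only for illustration: for the symmetric matrix below, $x = (1, 1)^T$ is only scaled by 3, so it is an eigenvector with eigenvalue $\lambda = 3$.

$$\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$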

#### CORDIC (contd.)

- Convergence depends on the number of iterations
- Unrolled for systolic and pipelined implementations
- An iterative architecture is unsuitable for FPGA. A pipelined one is preferred, as it needs less complex hardware and operates at the data rate

Regi