Nikhil Pre

  • date post

    04-Apr-2018
  • Category

    Documents

Transcript of Nikhil Pre

  • 7/29/2019 Nikhil Pre

    1/48

    By

    NIKHIL SURYANARAYANAN

  • 7/29/2019 Nikhil Pre

    2/48

    Outline

    Motivation

    Eigen Value Decomposition & Applications

    Exact Jacobi

    Parallel Decomposition using Systolic Array

    Optimization of Systolic Array

    Interconnect-optimized Systolic Array

    Conclusion

  • 7/29/2019 Nikhil Pre

    3/48

    Motivation

    Required in various fields

    High-performance and real-time applications demand hardware implementation

    SDMA Communication

    Realizing optimal architectures with respect to speed and power for the respective applications

  • 7/29/2019 Nikhil Pre

    4/48

    Eigen Value Decomposition

    Angle of Arrival Estimation

    Face Detection

    Image Compression

    Eigen Beam-forming

    Signal Subspace Estimation

    PCA

    MUSIC & ESPRIT

  • 7/29/2019 Nikhil Pre

    5/48

    EVD Methods

    Exact Jacobi

    Systolic Array

    Approximate Jacobi

    Algebraic Method (only for 3x3 matrix)

  • 7/29/2019 Nikhil Pre

    6/48

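    Eigen Value Decomposition (EVD)

    Special case of Singular Value Decomposition (SVD) where the matrix is square and symmetric

    Consider a matrix $A \in \mathbb{R}^{m \times n}$

    SVD: $A = U D V^T$

    EVD: $A = U D U^T$

    where $D \in \mathbb{R}^{m \times n}$ is a diagonal matrix, and $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal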

  • 7/29/2019 Nikhil Pre

    7/48

    CORDIC

    COordinate Rotation DIgital Computer

    Set of shift-add algorithms for computing sine, cosine, arc (inverse trigonometric) functions, hyperbolic functions, coordinate rotation, etc.

    Eliminates complex computations

    Single shift-add multiplier, ROM/RAM for lookup & basic logic gates

    Hardware friendly

    Iterative Algorithm
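
The shift-add iteration described on this slide can be sketched in software as follows. This is a minimal rotation-mode CORDIC, not the presenter's hardware design: the function name `cordic_cos_sin`, the default of 16 iterations, and the use of floating-point multiplication by `2.0 ** -i` in place of a hardware right shift are all assumptions made for illustration.

```python
import math

def cordic_cos_sin(theta, iterations=16):
    """Rotation-mode CORDIC sketch: approximate (cos(theta), sin(theta)).

    Valid for |theta| up to about 1.74 rad, the sum of the elementary angles.
    """
    # Lookup table of elementary angles atan(2^-i); a ROM in hardware.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # pre-scaling the start vector compensates for the accumulated gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0       # rotate so the residual angle z goes to 0
        shift = 2.0 ** -i                   # a right shift by i bits in fixed point
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return x, y

c, s = cordic_cos_sin(0.5)
print(c, s)
```

With 16 iterations the residual angle is bounded by roughly 2^-15 rad, so the results agree with `math.cos` and `math.sin` to about four decimal places.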

  • 7/29/2019 Nikhil Pre

    8/48

    Loop Unrolling

  • 7/29/2019 Nikhil Pre

    9/48

    CORDIC Modules

    ArcTan Module

    Used to compute the $\tan^{-1}$ angle $\theta$ for constructing the Jacobi rotation matrix

    Sine/Cosine Module

    The 2x2 matrix

    $\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$

    is constructed using the angle from the ArcTan module
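
As a rough software analogue of the ArcTan module, the sketch below runs CORDIC in vectoring mode: the input vector is rotated toward the x-axis and the applied angles are accumulated. The name `cordic_atan`, the 16-iteration default, and the restriction to `x > 0` are assumptions for illustration; floating-point scaling stands in for hardware shifts.

```python
import math

def cordic_atan(y, x, iterations=16):
    """Vectoring-mode CORDIC sketch: approximate atan(y / x) for x > 0."""
    if x <= 0.0:
        raise ValueError("this sketch only handles x > 0")
    # Elementary angles atan(2^-i); a lookup table in hardware.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    z = 0.0
    for i in range(iterations):
        d = -1.0 if y >= 0.0 else 1.0       # rotate so that y is driven toward 0
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]                  # accumulate the angle that was removed
    return z

theta = cordic_atan(1.0, 1.0)
print(theta)  # close to pi/4
```

The angle returned here could then be fed to a sine/cosine stage to build the 2x2 rotation matrix, mirroring the two-module split on the slide.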

  • 7/29/2019 Nikhil Pre

    10/48

    Exact Jacobi

    Aims at annihilating the off diagonalelements using a series of orthogonaltransformations

    $A^{(k+1)} = J_{pq}^T A^{(k)} J_{pq}$,

    where $A^{(0)} = A$

    $J_{pq}$ is called the Jacobi rotation, defined by the parameters $\begin{pmatrix} c & s \\ -s & c \end{pmatrix}$

  • 7/29/2019 Nikhil Pre

    11/48

    Exact Jacobi

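    $A = U D V^T$

    $U^T A V = D$

    After $n$ iterations: $A_{i+1} = J_i^T A_i J_i$

    Repeating for all possible pairs $(p, q)$, $A$ can be effectively diagonalized

    $$J_{pq} = \begin{pmatrix}
    1 & \cdots & 0 & \cdots & 0 & \cdots & 0 \\
    \vdots & \ddots & \vdots & & \vdots & & \vdots \\
    0 & \cdots & c & \cdots & s & \cdots & 0 \\
    \vdots & & \vdots & \ddots & \vdots & & \vdots \\
    0 & \cdots & -s & \cdots & c & \cdots & 0 \\
    \vdots & & \vdots & & \vdots & \ddots & \vdots \\
    0 & \cdots & 0 & \cdots & 0 & \cdots & 1
    \end{pmatrix}$$

    where $c$ and $s$ occupy rows $p$, $q$ and columns $p$, $q$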
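
The rotation above can be sketched as a serial cyclic Jacobi routine for a real symmetric matrix. This is an illustrative sketch rather than the presenter's implementation: the function names, the fixed sweep count, and the use of `math.atan`, `math.cos`, and `math.sin` in place of CORDIC modules are assumptions. The angle is chosen so that the (p, q) entry of J^T A J vanishes, with J holding (c, s) in row p and (-s, c) in row q.

```python
import math

def jacobi_rotate(A, p, q):
    """Apply A <- J^T A J in place so that A[p][q] (and A[q][p]) become zero."""
    b = A[p][q]
    if b == 0.0:
        return
    diff = A[q][q] - A[p][p]
    # tan(2*theta) = 2b / (a_qq - a_pp); keep |theta| <= pi/4 (inner rotation).
    theta = 0.5 * math.atan(2.0 * b / diff) if diff != 0.0 else math.pi / 4
    c, s = math.cos(theta), math.sin(theta)
    n = len(A)
    for k in range(n):                      # A <- A J: update columns p and q
        akp, akq = A[k][p], A[k][q]
        A[k][p] = c * akp - s * akq
        A[k][q] = s * akp + c * akq
    for k in range(n):                      # A <- J^T A: update rows p and q
        apk, aqk = A[p][k], A[q][k]
        A[p][k] = c * apk - s * aqk
        A[q][k] = s * apk + c * aqk

def jacobi_eigenvalues(A, sweeps=10):
    """Return the sorted eigenvalue estimates of symmetric A after fixed sweeps."""
    A = [row[:] for row in A]               # work on a copy
    n = len(A)
    for _ in range(sweeps):
        for p in range(n - 1):              # visit every pair (p, q) with p < q
            for q in range(p + 1, n):
                jacobi_rotate(A, p, q)
    return sorted(A[i][i] for i in range(n))

print(jacobi_eigenvalues([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]))
```

For this example matrix the exact eigenvalues are 3 and 3 ± √3, which the sweep count used here reproduces to well within floating-point tolerance.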

  • 7/29/2019 Nikhil Pre

    12/48

    Limitations of Exact Jacobi Implementation

    Jacobi iterations are serial

    Inability to derive parallelism as iterations have large inter-loop data dependency

    Inability to pipeline

    Every iteration involves transfer of 4N-4 matrix elements to the processor

    Even though it is a matrix operation, parallelism cannot be derived
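
One way to account for the 4N-4 figure, assuming a rotation on the pair (p, q) touches exactly rows p, q and columns p, q of an N x N matrix:

$$\underbrace{2N}_{\text{rows } p,\,q} + \underbrace{2N}_{\text{columns } p,\,q} - \underbrace{4}_{\text{entries counted twice}} = 4N - 4$$

For N = 8 this comes to 28 elements per rotation.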

  • 7/29/2019 Nikhil Pre

    13/48

    How to parallelize?

    Systolic Array

    Solve 2x2 EVD sub problems

    For a matrix of size N we have N/2 x N/2 EVD sub problems

    If N=4, possible sets are

    { (1,2), (3,4) }

    { (1,3), (2,4) }

    { (1,4), (2,3) }

    Parallel Reordering
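
Sets of disjoint index pairs like those listed above can be generated with a round-robin (circle method) schedule. The sketch below is one common way to do this and is not necessarily the ordering used in the presentation; the function name `round_robin_pairs` is hypothetical.

```python
def round_robin_pairs(n):
    """Return n-1 sets of n/2 disjoint pairs covering every pair (p, q) once."""
    if n < 2 or n % 2:
        raise ValueError("n must be an even number >= 2")
    idx = list(range(1, n + 1))
    rounds = []
    for _ in range(n - 1):
        # Pair the first half with the mirrored second half.
        pairs = [tuple(sorted((idx[i], idx[n - 1 - i]))) for i in range(n // 2)]
        rounds.append(sorted(pairs))
        # Keep the first index fixed and rotate the rest by one position.
        idx = [idx[0], idx[-1]] + idx[1:-1]
    return rounds

for pair_set in round_robin_pairs(4):
    print(pair_set)
```

Within one set no index appears twice, so the corresponding 2x2 sub problems are independent and can be solved in parallel.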

  • 7/29/2019 Nikhil Pre

    14/48

    Systolic Array for EVD

    PE PE PE

    PE PE PE

    PE PE PE

  • 7/29/2019 Nikhil Pre

    15/48

    Structure of PE

    CORDIC ATAN

    CORDIC ROT

    REG REG

  • 7/29/2019 Nikhil Pre

    16/48

    Data Exchange Sequence

    [Figure: processing element PEij with four data inputs]

  • 7/29/2019 Nikhil Pre

    17/48

    Data Exchange PE11

    [Figure: data inputs of PE11]

  • 7/29/2019 Nikhil Pre

    18/48

    Data Exchange PE1j

    [Figure: data inputs of PE1j]

  • 7/29/2019 Nikhil Pre

    19/48

    Data Exchange PEi1

    [Figure: data inputs of PEi1]

  • 7/29/2019 Nikhil Pre

    20/48

    Data Exchange PEij

    [Figure: data inputs of PEij]

  • 7/29/2019 Nikhil Pre

    21/48

    Timing & Data Exchange

  • 7/29/2019 Nikhil Pre

    22/48

    Array Cycle = 1

  • 7/29/2019 Nikhil Pre

    24/48

    Array Cycle = 1

    DATA EXCHANGE

  • 7/29/2019 Nikhil Pre

    31/48

    Staggered Processing?

    Not realistic to broadcast row and column angles in real time

    $\Delta_{ij} = |i - j|$ is the distance of the processor $P_{ij}$ from the diagonal

    Also, $P_{ij}$ needs data from its neighbors $P_{i \pm 1, j \pm 1}$ ($1 < i, j < n/2$)

    Can be made faster by allowing off-diagonal PEs to execute as soon as the diagonal PEs produce angles
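
The staggering idea can be illustrated with a small sketch that computes, for each PE, how many steps it waits after the diagonal PEs start. It assumes the distance of P_ij from the diagonal is |i - j| and that angles travel one PE per time step; neither the one-step-per-hop timing nor the function name `stagger_offsets` comes from the slides.

```python
def stagger_offsets(k):
    """Start offset for each PE in a k x k array; diagonal PEs start at step 0."""
    # Assumption: an angle produced on the diagonal reaches P_ij after |i - j| hops.
    return [[abs(i - j) for j in range(k)] for i in range(k)]

for row in stagger_offsets(3):
    print(row)
```

Under these assumptions a 3x3 array has its corner PEs starting two steps after the diagonal, instead of waiting for a global broadcast.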

  • 7/29/2019 Nikhil Pre

    32/48

    Optimizations

    Improves the utilization of each PE from 1/3 to 2/3

    [Figure: PE activity across CYCLE 1 and CYCLE 2]

  • 7/29/2019 Nikhil Pre

    33/48

    Comparisons (8x8 matrix)

                                  EXACT JACOBI    SYSTOLIC ARRAY
    Iterations for convergence    3               22-25
    Additions                     3500            1500 (less than half)
    Multiplications               7000            3000
    Swaps/Exchanges               0               368
    Speed                         Slower          Faster

  • 7/29/2019 Nikhil Pre

    34/48

    Optimized Architecture

    In the final stages of analyzing a simpler systolic architecture

    Matrix size = 4x4

    [Figure: array of four PEs]

  • 7/29/2019 Nikhil Pre

    35/48

    GOALS

    Achieved:

    Pipelined Jacobi Architecture

    S/W Implementation of Systolic Array

    Simultaneous execution of off-diagonal PEs to improve timing and reduce idle time

    Optimized Systolic Array architecture for minimum swaps and angle transmission

  • 7/29/2019 Nikhil Pre

    40/48

    Eigen Value and Eigen Vector

    A non-zero vector whose magnitude, but not direction, changes when a linear transformation is applied to it is an Eigen Vector of that transformation

    The scalar value associated with this vector is called the Eigen Value

    $Ax = \lambda x$

    $A$ is the transformation, $x$ is the Eigen Vector & $\lambda$ is the corresponding Eigen Value
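
The defining relation can be checked numerically on a small example. The helper names `mat_vec` and `is_eigenpair` and the 2x2 matrix below are chosen purely for illustration.

```python
def mat_vec(A, x):
    """Multiply matrix A (list of rows) by vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def is_eigenpair(A, x, lam, tol=1e-12):
    """True if x is non-zero and A x equals lam * x within tol."""
    if all(abs(xi) <= tol for xi in x):
        return False                        # the zero vector is excluded by definition
    Ax = mat_vec(A, x)
    return all(abs(axi - lam * xi) <= tol for axi, xi in zip(Ax, x))

A = [[2.0, 1.0], [1.0, 2.0]]
print(is_eigenpair(A, [1.0, 1.0], 3.0))    # A scales (1, 1) by 3
print(is_eigenpair(A, [1.0, -1.0], 1.0))   # A leaves (1, -1) unchanged
print(is_eigenpair(A, [1.0, 0.0], 2.0))    # (1, 0) changes direction under A
```

The first two vectors keep their direction under A, so they are eigenvectors; the third is mapped to (2, 1), which points elsewhere.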

  • 7/29/2019 Nikhil Pre

    41/48

    CORDIC contd

    Convergence depends on number of iterations

    Unrolled for systolic and pipelined implementations

    Iterative architecture unsuitable for FPGA

    Pipelined preferred as less complex H/W & operates at data rate

    Regi