A Fast Hardware Approach for Approximate, Efficient Logarithm and Anti-logarithm Computation

Suganth Paul

Nikhil Jayakumar

Sunil P. Khatri

Department of Electrical and Computer Engineering

Texas A&M University, College Station

Introduction

• The fast generation of functions such as logarithm and antilogarithm is important in areas such as DSP, computer graphics, scientific computing, artificial neural networks, and logarithmic number systems.

• In the past, authors have proposed various hardware approaches to accurately approximate the logarithm and antilogarithm functions.

• Of these approaches, lookup table (LUT) based methods such as Brubaker, Maenner, Kmetz, and SBTM are widely used.

• Some hardware approaches also combine LUTs with polynomial approximations, but these need multiplications or divisions.

• Our approach combines an LUT with linear interpolation implemented in an area- and delay-efficient manner.

• The novelty of our approach lies in the fact that we do not need a multiplier or divider to perform interpolation. We also use the same hardware structure to implement log and antilog.

• The number format used for the computation is shown below.

$$N = 2^{e}(1 + m)$$

Here $m$ : $0 < m < 1$ is the mantissa and $e$ is the exponent.

Mitchell Approximation

The logarithm of a number $N = 2^{e}(1 + m)$ is found as

$$\log_2(N) = e + \log_2(1 + m)$$

Mitchell’s approximation is given by

$$\log_2(N) \approx e + m$$

where

$$\log_2(1 + m) \approx m$$

The error due to this approximation is

$$E_m = \log_2(1 + m) - m$$

The error is plotted on the right.
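As an illustration of the Mitchell approximation and its error term, here is a minimal floating-point sketch. The function names `mitchell_log2` and `mitchell_error` are hypothetical and not from the slides; the code models the arithmetic only, not the hardware.

```python
import math

def mitchell_log2(x: float) -> float:
    """Approximate log2(x) for x > 0 as e + m, where x = 2**e * (1 + m)."""
    frac, exp = math.frexp(x)   # x = frac * 2**exp with 0.5 <= frac < 1
    e = exp - 1                 # rescale so that x = 2**e * (1 + m)
    m = 2.0 * frac - 1.0        # mantissa in [0, 1)
    return e + m

def mitchell_error(m: float) -> float:
    """E_m = log2(1 + m) - m for a mantissa m in [0, 1)."""
    return math.log2(1.0 + m) - m

# Scan the mantissa range to locate the worst-case error.
samples = [i / 4096 for i in range(4096)]
max_err = max(mitchell_error(m) for m in samples)
print(f"max Mitchell error ~ {max_err:.4f}")
```

The error is zero at both ends of the mantissa range and peaks near the middle, which is the curve that the LUT-based methods sample and correct.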

Kmetz Approximation

• In the Kmetz method, the Mitchell error curve shown above is sampled at $2^{t}$ points and stored in an LUT.

• Here the LUT is indexed by the first $t$ bits of the mantissa $m$.

• If the error value looked up from the LUT is $a$, the logarithm is found as

$$\log_2(N) = e + \log_2(1 + m)$$

where

$$\log_2(1 + m) \approx m + a$$

• The error in this case due to approximating the logarithm of the mantissa portion is given by

$$E_k = \log_2(1 + m) - m - a$$
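The Kmetz correction can be sketched as follows. The index width `T = 6` and the choice to sample each entry at the left edge of its interval are assumptions made for illustration; the slides do not state where in each interval the curve is sampled.

```python
import math

T = 6  # hypothetical index width: 2**T = 64 table entries

# Sample the Mitchell error curve E_m = log2(1 + m) - m at 2**T points.
# Assumption: each entry is taken at the left edge of its interval.
LUT = [math.log2(1.0 + i / 2**T) - i / 2**T for i in range(2**T)]

def kmetz_log2_mantissa(m: float) -> float:
    """Approximate log2(1 + m) as m + a, with a read from the LUT."""
    i = int(m * 2**T)   # first T bits of the mantissa select the entry
    a = LUT[i]
    return m + a

# Worst-case error E_k over a dense sweep of mantissa values.
worst = max(abs(math.log2(1.0 + m) - kmetz_log2_mantissa(m))
            for m in (j / 8192 for j in range(8192)))
```

With a table this size the residual error is bounded by how much the error curve changes within one interval, so it shrinks as the table grows.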

Our Approach

• In our method we interpolate between values stored in the LUT to get a more accurate result.

• The logarithm of the mantissa part of the number is obtained as

$$\log_2(1 + m) \approx m + a + \frac{(b - a)\,n}{2^{k - t}}$$

• where $a$ is the error value from the LUT at location $i$

$t$ is the number of leading bits in the mantissa indexing the table

$b$ is the next value in the LUT at location $i + 1$

$k$ is the total number of bits used to represent the mantissa

$n$ is the decimal value of the last $k - t$ bits of the mantissa

• The multiplication step $(b - a)\,n$ is found as

$$(b - a)\,n = \operatorname{antilog}_2\left(\log_2(b - a) + \log_2(n)\right)$$

• $\log_2(n)$ is found by using the same LUT as above

• We consider the following approximations to find $\log_2(b - a)$ and the antilogarithm
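The interpolation formula can be modeled as below. This is a sketch under stated assumptions: `K = 23` follows the mantissa width mentioned for the log engine, `T = 6` is a hypothetical index width, the table carries one extra entry so that location $i + 1$ always exists, and the product $(b - a)\,n$ is computed exactly here rather than through the log/antilog path the slides describe.

```python
import math

K = 23  # mantissa width in bits
T = 6   # hypothetical index width: 2**T = 64 intervals

# Error samples of log2(1 + m) - m. One extra entry is stored so that
# location i + 1 exists for the last interval (an assumed table layout).
LUT = [math.log2(1.0 + i / 2**T) - i / 2**T for i in range(2**T + 1)]

def interp_log2_mantissa(m_bits: int) -> float:
    """Approximate log2(1 + m) for a K-bit mantissa given as an integer."""
    i = m_bits >> (K - T)               # leading T bits: LUT location
    n = m_bits & ((1 << (K - T)) - 1)   # value of the last K - T bits
    a, b = LUT[i], LUT[i + 1]
    m = m_bits / 2**K
    # Exact product for clarity; the slides replace (b - a) * n with
    # antilog2(log2(b - a) + log2(n)) so that no multiplier is needed.
    return m + a + (b - a) * n / 2**(K - T)
```

Dividing by $2^{k - t}$ is a fixed right shift in hardware, so the only costly operation in the formula is the product.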

Errors for Various Interpolation Methods and Table Sizes

1. $\log_2(b - a)$ is found by
a) Mitchell approximation
b) Kmetz approximation using another LUT

2. The antilogarithm is found by
a) Mitchell approximation
b) Kmetz approximation using another LUT

We find from the table below that the combination 1.b) and 2.b) has the best error performance, and hence we use LUTs to approximate the multiplication.

Max error is in units of $10^{-3}$.
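To make the multiplier-free product concrete, the following sketch uses the Mitchell approximation for both the logarithm and the antilogarithm, which corresponds to the simpler Mitchell options a) rather than the LUT-based options b) that the slides prefer. The function names and the 16-bit fixed-point fraction are assumptions, and the operands are taken as positive integers; sign handling is left out.

```python
def mitchell_log2_int(x: int, frac_bits: int = 16) -> int:
    """Fixed-point Mitchell log2 of a positive integer."""
    if x <= 0:
        raise ValueError("x must be positive")
    e = x.bit_length() - 1         # position of the leading one
    rest = x - (1 << e)            # bits below the leading one
    m = (rest << frac_bits) >> e   # mantissa as a fixed-point fraction (truncated)
    return (e << frac_bits) + m

def mitchell_antilog2_int(y: int, frac_bits: int = 16) -> int:
    """Fixed-point Mitchell antilog: 2**(e + m) ~ 2**e * (1 + m)."""
    e = y >> frac_bits
    m = y & ((1 << frac_bits) - 1)
    return (((1 << frac_bits) + m) << e) >> frac_bits

def approx_product(p: int, q: int) -> int:
    """Approximate p * q with shifts and one addition, no multiplier."""
    return mitchell_antilog2_int(mitchell_log2_int(p) + mitchell_log2_int(q))
```

For example, `approx_product(4, 8)` gives 32 exactly, while `approx_product(3, 3)` gives 8 instead of 9. The Mitchell-only product always underestimates, which is the kind of error that adding LUT corrections is meant to reduce.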

Block Diagram of the Log Engine

• The block diagram shows the implementation of $\log_2(1 + m)$, where $m$ is the 23-bit mantissa.

• The number of leading bits of the mantissa going to the interpolator depends on the size of the LUTs used in the Interpolator.

• In this case we are using an LUT that holds 64 values, and 13 bits of the mantissa are required.

• The Interpolator block is shown below.

Interpolator Block Diagram

• The implementation can be pipelined to get a better throughput.

• The COMPARE block determines if the final stage does an Add or Subtract.

• The LOD (leading one detector) block finds the position of the leading one and the rest of the bits are used to access the LUT.

• The LUT used to find $a$ and $\log_2(n)$ is the same and is implemented as a dual port ROM.
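The role of the LOD block can be sketched in software as below. The function name `leading_one_detect` and the 16-bit default width are hypothetical; the loop stands in for what would be a priority encoder in hardware.

```python
def leading_one_detect(x: int, width: int = 16) -> tuple[int, int]:
    """Return (position of the leading one, remaining lower bits) of x."""
    if not 0 < x < (1 << width):
        raise ValueError("x must be nonzero and fit in `width` bits")
    pos = width - 1
    while not (x >> pos) & 1:     # scan from the MSB down
        pos -= 1
    rest = x & ((1 << pos) - 1)   # bits below the leading one
    return pos, rest
```

The position plays the part of the integer part of the logarithm, and the remaining bits are what would be passed on to address the LUT.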

Antilog Computation

• Let

$$M = \log_2(N) = e + m$$

The antilogarithm of this number is found as

$$\operatorname{antilog}_2(M) = 2^{M} = 2^{e} \cdot 2^{m}$$

Using Mitchell’s method we make the following approximation

$$2^{m} \approx 1 + m$$

• A Kmetz approximation can be made by storing the error due to this approximation in an LUT and adding the error value to the above equation for the antilogarithm.

• In our approach, we compute the antilogarithm by interpolating efficiently between two adjacent table values stored in the LUT without needing a multiplier.

• We follow the same flow used for computing the logarithm. The error incurred while using different table sizes for computing the antilogarithm is shown below.
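A minimal floating-point sketch of the Mitchell antilogarithm follows. The function name `mitchell_antilog2` is hypothetical, and the code covers only the uncorrected Mitchell step, not the LUT or interpolation refinements.

```python
import math

def mitchell_antilog2(M: float) -> float:
    """Approximate 2**M as 2**e * (1 + m), with e = floor(M) and m = M - e."""
    e = math.floor(M)               # integer part of the logarithm
    m = M - e                       # fractional part, 0 <= m < 1
    return math.ldexp(1.0 + m, e)   # (1 + m) * 2**e, a shift in hardware
```

Because $1 + m \ge 2^{m}$ on the interval from 0 to 1, this approximation never underestimates, and its relative error is zero whenever $M$ is an integer.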

Comparison of FPGA Resources used by the Log Engine

• We implemented our method and the Symmetric Bipartite Table Method (SBTM) using a Virtex2P FPGA.

• Our method requires smaller on-chip block RAMs.

• Both methods occupied less than 1% of FPGA resources.

• Both methods were able to support clock speeds of a little over 350 MHz.

Comparison of LUT Size used and Accuracy of the Log Computation

Conclusion

• Our approach has a low memory requirement compared with other methods while providing better accuracy.

• When compared to the SBTM, for every two extra bits of accuracy,
– we need a factor of 2 increase in the LUT size
– the SBTM needs a factor of 3 increase in the LUT size

Hence our method scales well for higher accuracy in bits.

• We are area efficient compared with polynomial interpolation methods, as we do not need a multiplier or divider to perform interpolation.

• The implementation can be pipelined and the number of stages in the pipeline can be varied depending on the throughput required.

• We have presented an approach to efficiently compute the logarithm and antilogarithm of a number in hardware.
