Numerical Error Minimizing Floating-Point to Fixed-Point ANSI C Compilation
Tor Aamodt and Paul Chow
University of Toronto
Presentation Outline
Background / Motivation
Floating-to-Fixed-Point Conversion
Architectural Support
Experimental Results
Summary / Future Directions
Background: University of Toronto DSP Project
Motivation: DSP Compiler/Architecture Co-design
First-Generation Silicon (Sean Peng’s M.A.Sc. Thesis), taped out Sept. 30, 1999:
108-pin PGA / 0.35 µm CMOS / 63 MHz
16-bit Fixed-Point VLIW with Two-Level Instruction Fetching
Harvard Memory Architecture
5-stage pipeline: IF1 IF2 ID EX WB
7 function units:
    2 integer units: 16.0 multiply & 1.15 multiply operations
    2 address units: modulo addressing
    2 memory units: each tied to one data memory bank
    1 control unit
Background:
Fixed-Point versus Floating-Point
32-bit Floating-Point (IEEE): sign bit | 8-bit exponent (excess 127) | 23+1 bit normalized mantissa
Fixed-Point: sign bit | integer part (IWL bits) | fractional part
Background:
Fixed-Point versus Floating-Point
Property                   WL-bit Fixed-Point                 32-bit Floating-Point
Dynamic range of |x|       $[0, 2^{IWL})$                     $(2^{-126}, 2^{127})$
Precision of x, |δx/x|     $|x|^{-1} \cdot 2^{1+IWL-WL}$      $2^{-23}$
Function unit cost         significantly less                 -
This factor motivates us to find ways of coping with the shortcomings of fixed-point representations
Motivation
Why convert floating-point code to fixed-point code? Saves area and power.
Why automate the process? Manual conversion is time-consuming and error-prone.
What qualities are we looking for in an automated conversion system? Good signal quality*. Fast code.
Background: Fixed-point Numerical Representations in Signal Processing
Consider a program P with associated inputs x(k) ∈ S_P.
Example: P is an IIR filter, S_P the set of all human speech samples x(k).
Signal Scaling: Integer Word Length (IWL)
Definition: $\mathrm{IWL}_v \stackrel{\mathrm{def}}{=} \left\lceil \log_2 \left( \max |v| + \epsilon \right) \right\rceil$
v: an input, program variable, intermediate result, or output
max: taken over all definitions of v, and all inputs x ∈ S_P
ε: an infinitesimally small number. Why? e.g. $\log_2 2 = 1$, but |v| = 2 lies outside $[0, 2^1)$.
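A minimal sketch of this definition, assuming the maximum magnitude of a signal is already known: the IWL is the smallest integer for which the magnitude lies in [0, 2^IWL). The function name `iwl_from_max` is hypothetical; `frexp` is the standard C library call.

```c
#include <assert.h>
#include <math.h>

/* Smallest IWL such that max_abs lies in [0, 2^IWL). */
static int iwl_from_max(double max_abs)
{
    int exponent;
    assert(max_abs > 0.0);
    /* frexp writes max_abs = m * 2^exponent with 0.5 <= m < 1,
     * so 2^(exponent-1) <= max_abs < 2^exponent. */
    (void)frexp(max_abs, &exponent);
    return exponent;
}
```

A magnitude of exactly 2 yields IWL = 2 rather than 1, which is the boundary case the infinitesimal term in the definition is there to handle.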
Background:
Fixed-Point Arithmetic Operations
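[Figure: Addition / Subtraction. Operands A and B, whose binary points differ by n bits; B is shifted ">> n" (binary point alignment). The result may need (IWL + 1) integer bits, shown with overflow guard bits and a ">> 1".]
[Figure: Multiplication. Operands A (IWLA) and B (IWLB); the product A*B has integer word length IWLA + IWLB. Part of the product field is marked "???".]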
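The two operations can be sketched in C for 16-bit words. This is an illustration under stated assumptions: the struct `fx16` and both function names are hypothetical, a value is taken to be raw * 2^(iwl - 15), overflow handling (guard bits, saturation) is left out, and ">>" on a negative value is assumed to be an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* A 16-bit fixed-point value: real value = raw * 2^(iwl - 15). */
typedef struct { int16_t raw; int iwl; } fx16;

/* Addition: align binary points by shifting the operand with the smaller
 * IWL right by n, then add. Overflow of the sum is not handled here. */
static fx16 fx_add(fx16 a, fx16 b)
{
    if (a.iwl < b.iwl) { fx16 t = a; a = b; b = t; }
    int n = a.iwl - b.iwl;
    assert(n < 16);
    fx16 r = { (int16_t)(a.raw + (b.raw >> n)), a.iwl };
    return r;
}

/* Fractional (1.15) multiply: keep the upper bits of the 32-bit product.
 * The result has integer word length IWLA + IWLB. */
static fx16 fx_mul(fx16 a, fx16 b)
{
    int32_t p = (int32_t)a.raw * b.raw;
    fx16 r = { (int16_t)(p >> 15), a.iwl + b.iwl };
    return r;
}
```

With a = {0x4000, 0} and b = {0x2000, 1}, both operands represent 0.5; the sum represents 1.0 and the product 0.25, each with IWL = 1.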
Presentation Outline
Background Material / Motivation
Floating-to-Fixed-Point Conversion
Architecture Support
Experimental Results
Summary / Future Directions
Conversion Process:
Previous Work
‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997.
A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.
Conversion Process: Overview
Input C File
SUIF Front End
Math Library Replacement: “sin(x)” → “utdsp_sin(x)”
Alias Analysis & ID Assignment
Instrument Code / Profile to obtain Dynamic Ranges
Generate Scaling Operations
Code Generation / Detect & Generate FMLS operations
UofT DSP Simulator
Alias analysis examples:
float *p, x, y, A[N], B[N];
for( int i=0; i < N; i++ ) {
    p = (condition) ? A : B;
    y += x*p[i];
}
float fubar( float *p )
{
    float sum = 0.0;
    for( int i=0; i < N; i++ )
        sum += p[i];
    return sum;
}
Conversion Process: Collecting Dynamic Range Information
Consider the ANSI C code:
    float a, b, x[N];
    y = a*x[i] + b*x[i+1];
Equivalent Expression Tree:
    [Figure: root "+" assigned to y, with children "*" (a, x[i]) and "*" (b, x[i+1])]
ID Assignment:
    “0” : y
    “1” : tmp_1
    “2” : tmp_2
Code Instrumentation:
    tmp_1 = a*x[i];
    profile(tmp_1,1);
    tmp_2 = b*x[i+1];
    profile(tmp_2,2);
    y = tmp_1 + tmp_2;
    profile(y,0);
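One way the profiling call might work is sketched below. The slides only show the call sites `profile(value, id)`; the table `max_abs`, its size, and the wrapper `instrumented` are assumptions made for illustration, not the UTDSP implementation.

```c
#include <assert.h>
#include <math.h>

#define MAX_IDS 64                 /* hypothetical table size */

static double max_abs[MAX_IDS];    /* largest |value| seen per ID */

/* Record one observed value for the expression with the given ID. */
static void profile(double value, int id)
{
    assert(id >= 0 && id < MAX_IDS);
    double m = fabs(value);
    if (m > max_abs[id])
        max_abs[id] = m;
}

/* Run the instrumented statement from the slide once and return y. */
static float instrumented(float a, float b, const float *x, int i)
{
    float tmp_1 = a * x[i];
    profile(tmp_1, 1);
    float tmp_2 = b * x[i + 1];
    profile(tmp_2, 2);
    float y = tmp_1 + tmp_2;
    profile(y, 0);
    return y;
}
```

With a = 0.5, b = 0.25 and x = {2, -4}, both products reach magnitude 1 while y is 0, so the intermediate results have a wider range than the output they feed.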
Conversion Process:
Desired Result
Continuation of Previous Example:
    float a, b, x[N];
    y = a*x[i] + b*x[i+1];
is converted to:
    int a, b, x[N];
    y = (a•x[i] >> 2) + b•x[i+1];
where “•” denotes a fractional fixed-point multiply. The conversion involves:
1. Type Conversion
2. Scaling Operations
3. Fractional Fixed-Point Operations
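In plain C the converted statement could look roughly like the sketch below. The helper `fmul` stands in for the fractional multiply written as “•” on the slide, and the wrapper `converted` is hypothetical; the shift amount of 2 is taken from the slide and would in practice come from profiling.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional multiply on 16-bit words: upper bits of the 32-bit product. */
static int16_t fmul(int16_t p, int16_t q)
{
    return (int16_t)(((int32_t)p * q) >> 15);
}

/* Fixed-point form of y = a*x[i] + b*x[i+1] with the slide's scaling:
 * the first product is shifted right by 2 before the addition. */
static int16_t converted(int16_t a, int16_t b, const int16_t *x, int i)
{
    assert(x != 0);
    return (int16_t)((fmul(a, x[i]) >> 2) + fmul(b, x[i + 1]));
}
```

The explicit parentheses matter in C, because ">>" binds less tightly than "+".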
Conversion Process:
Type Conversion / Scaling Operation Generation
Type conversion: {float, double} → int
Scaling operations are added to expression trees using a post-order traversal.
There are two previous algorithms from the literature for generating scaling operations.
Neither uses Intermediate Result Profile data; instead, they combine range information from leaf nodes in a bottom-up fashion.
Is Useful Information Lost?
Conversion Process:
IRP: Using Intermediate Result Profile Data
‘Worst-Case Evaluation’: Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997.
A ‘Statistical’ Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.
UTDSP Algorithms: IRP, IRP-SA
Each node has a measured IWL and a current IWL
Measured: IWL as determined by profiling
Current: IWL due to scaling operations within
Scaling Operation Generation
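Example: “A op B”:
[Figure: expression tree with root "op" and the converted sub-expressions A and B as children. Each node carries a measured and a current IWL: A (IWLA measured, IWLA current), B (IWLB measured, IWLB current), and the result (IWLA op B measured, IWLA op B current). The scaling to apply at "op" is marked "?".]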
IRP: Additive Operations
For example, assume |A| > |B|, and IWLA±B measured ≤ IWLA measured:
“A ± B” → “(A << nA) ± (B >> [n-nB])”
where: nA = IWLA current - IWLA measured
nB = IWLB current - IWLB measured
n = IWLA measured - IWLB measured
IWLA±B current = IWLA measured
[Figure: operands A and B, with B shifted ">> n" so that its binary point aligns with that of A.]
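The shift amounts for the additive case can be computed as in the sketch below. It assumes nB is meant to be IWLB current minus IWLB measured, and that the case shown on the slide applies (A has the larger measured IWL, and the measured IWL of the sum does not exceed it). The type and function names are hypothetical.

```c
#include <assert.h>

/* Measured and current IWL of a converted sub-expression. */
typedef struct { int measured; int current; } iwl_pair;

/* Shift amounts for "(A << nA) +/- (B >> shiftB)". */
typedef struct { int nA; int shiftB; int result_current; } add_scaling;

static add_scaling irp_add(iwl_pair A, iwl_pair B)
{
    assert(A.measured >= B.measured);   /* the slide's case: |A| > |B| */
    int nA = A.current - A.measured;    /* unused headroom in A */
    int nB = B.current - B.measured;    /* unused headroom in B */
    int n  = A.measured - B.measured;   /* binary point misalignment */
    add_scaling s = { nA, n - nB, A.measured };
    return s;
}
```

For A with (measured 3, current 4) and B with (measured 1, current 2), A is shifted left by 1 and B right by 1, and both operands then sit at IWL 3. A negative `shiftB` would correspond to a left shift of B.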
IRP: Multiplication
“A • B” → “(A << nA) • (B << nB)”
where: nA = IWLA current - IWLA measured
nB = IWLB current - IWLB measured
IWLA•B current = IWLA measured + IWLB measured
Note: Typo in Notes! The notes give “IWLA•B current = nA + nB”.
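The multiplicative rule can be sketched the same way, again assuming nB is IWLB current minus IWLB measured and that neither operand's current IWL is below its measured IWL. The type and function names are hypothetical.

```c
#include <assert.h>

/* Measured and current IWL of a converted sub-expression. */
typedef struct { int measured; int current; } iwl_pair;

/* Left-shift amounts for "(A << nA) * (B << nB)" and the product's IWL. */
typedef struct { int nA; int nB; int result_current; } mul_scaling;

static mul_scaling irp_mul(iwl_pair A, iwl_pair B)
{
    assert(A.current >= A.measured && B.current >= B.measured);
    mul_scaling s = {
        A.current - A.measured,     /* normalize A before multiplying */
        B.current - B.measured,     /* normalize B before multiplying */
        A.measured + B.measured     /* IWL of the fractional product */
    };
    return s;
}
```

After both left shifts each operand sits at its measured IWL, which is why the product's current IWL is the sum of the measured values rather than nA + nB.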
IRP-SA: Using ‘Shift Absorption’
Problem:
y = (a*x[i] + (b*x[i+1] >> 1)) << 1
Question: Is information discarded unnecessarily here?
Answer: Yes! Consider the following alternative:
y = (a*x[i] << 1) + b*x[i+1]
Assuming 2’s-complement arithmetic, this expression results in a more precise answer.
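The difference can be seen on 16-bit words. The sketch below uses hypothetical function names and stands A for a*x[i] and B for b*x[i+1]. Wraparound is modelled with unsigned arithmetic because signed overflow is undefined in C, and the first form is restricted to non-negative B so that no negative value is right-shifted.

```c
#include <assert.h>
#include <stdint.h>

/* Original form: (A + (B >> 1)) << 1. The low bit of B is discarded. */
static int16_t shift_then_restore(int16_t A, int16_t B)
{
    assert(B >= 0);   /* keeps ">>" on a negative value out of this sketch */
    return (int16_t)((uint16_t)(A + (B >> 1)) << 1);
}

/* Shift-absorbed form: (A << 1) + B, evaluated with 16-bit wraparound.
 * An overflow in (A << 1) cancels out as long as the final sum fits. */
static int16_t shift_absorbed(int16_t A, int16_t B)
{
    uint16_t a2 = (uint16_t)((uint16_t)A << 1);    /* may wrap */
    return (int16_t)(uint16_t)(a2 + (uint16_t)B);
}
```

For A = 100 and B = 7 the first form gives 206 and the second 207, the exact value of 2A + B. For A = 20000 and B = -30001 the intermediate A << 1 does not fit in 16 bits, yet the absorbed form still returns the exact 9999.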
Presentation Outline
Background Material / Motivation
Floating-to-Fixed-Point Conversion
Architecture Support
Experimental Results
Summary / Future Directions
Architectural Support
Common occurrence (using IRP-SA):
A•B << n
Fractional Multiplication with integrated Left Shift (FMLS):
[Figure: operands A and B and their product A*B, with a left shift applied to the product.]
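Why integrating the shift helps can be sketched as follows. The behaviour assumed here for FMLS is that the 16-bit result is selected n bits lower in the 32-bit product, instead of truncating first and shifting afterwards; this is an interpretation of the slide, not a description of the actual hardware. Both function names are hypothetical, and the example is limited to non-negative products.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional multiply followed by a separate left shift:
 * the n low bits of the result are always zero. */
static int16_t fmul_then_shift(int16_t a, int16_t b, int n)
{
    assert(n >= 0 && n <= 15);
    int32_t p = (int32_t)a * b;
    assert(p >= 0);                 /* sketch covers non-negative products */
    return (int16_t)((p >> 15) << n);
}

/* Assumed FMLS behaviour: pick the result n bits lower in the product,
 * so n extra product bits survive. Overflow is not checked. */
static int16_t fmls(int16_t a, int16_t b, int n)
{
    assert(n >= 0 && n <= 15);
    int32_t p = (int32_t)a * b;
    return (int16_t)(p >> (15 - n));
}
```

With a = 12345, b = 5 and n = 2, the separate shift yields 4 while the integrated version yields 7, keeping two more significant bits of the product.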
Presentation Outline
Background Material / Motivation
Floating-to-Fixed-Point Conversion
Architecture Support
Experimental Results
Summary / Future Directions
Experimental Results
Four test cases presented in the paper:
(1) 4th Order IIR Filter
(2) 1024 Point Radix 2 Decimation in Time FFT
(3) Nonlinear Feedback Control System
(4) 16th Order Lattice Filter
Look at (1) in detail, summarize results for others.
Explore some interesting properties exhibited in (4) that are indicative of possible future improvements.
Experimental Results:
4th Order IIR Filter
4th Order Chebyshev Type II Low-Pass Filter
Designed using MATLAB’s cheby2 command
Transfer Function:
[Figure: magnitude (dB) and phase (degrees) versus normalized frequency (×π rad/sample)]
Experimental Results
4th Order IIR Filter (cont’d)
Filter Realization:
MATLAB’s tf2sos command (pole-zero pairing)
2 Cascaded Direct-Form IIR filters
Algorithm   14 Bit                  16 Bit
            w/o FMLS    w/ FMLS     w/o FMLS    w/ FMLS
SNU-4       44.7 dB     44.7 dB     56.4 dB     56.4 dB
WC          45.6 dB     45.6 dB     57.1 dB     57.1 dB
IRP         49.2 dB     49.3 dB     60.9 dB     62.0 dB
IRP-SA      48.8 dB     53.5 dB     61.0 dB     66.9 dB
Experimental Results
4th Order IIR Filter (cont’d)
IRP:
(A2[0]*t2 << 3) - (A2[1]*D2[0] << 3) + (A2[2]*D2[1] << 3)
IRP-SA:
(A2[0]*t2 - A2[1]*D2[0] << 1) + (A2[2]*D2[1] << 1) << 2
Experimental Results:
1024-Point Radix-2 FFT
Algorithm   14 Bit                  16 Bit
            w/o FMLS    w/ FMLS     w/o FMLS    w/ FMLS
SNU-4       28.7 dB     28.7 dB     36.7 dB     36.7 dB
WC          28.7 dB     28.7 dB     36.7 dB     36.7 dB
IRP         28.7 dB     34.9 dB     36.7 dB     44.6 dB
IRP-SA      28.7 dB     34.9 dB     36.7 dB     44.6 dB
Experimental Results:
Rotational Inverted Pendulum
U of T System Control Group
Non-linear Testbench
Experimental Results:
Rotational Inverted Pendulum
Algorithm   14 Bit                  16 Bit
            w/o FMLS    w/ FMLS     w/o FMLS    w/ FMLS
SNU-4       42.7 dB     4.0 dB      30.7 dB     54.9 dB
WC          47.3 dB     54.3 dB     66.1 dB     59.2 dB
IRP         53.1 dB     58.4 dB     65.8 dB     71.8 dB
IRP-SA      52.8 dB     59.4 dB     64.4 dB     72.0 dB
Experimental Results:
Rotational Inverted Pendulum - 12-bit Controller Comparison
WC: 32.8 dB
IRP-SA: 41.1 dB
IRP-SA w/ FMLS: 48.0 dB
Experimental Results:
16th Order Lattice Filter
16th Order Elliptic Bandpass Filter Transfer Function
[Figure: magnitude (dB) and phase (degrees) versus normalized frequency (×π rad/sample)]
Experimental Results:
Lattice Filter
Algorithm   32 Bit w/o Loop Unrolling   16 Bit w/ Loop Unrolling
            w/o FMLS    w/ FMLS         w/o FMLS    w/ FMLS
SNU-4       22.8 dB     22.8 dB         47.1 dB     47.0 dB
WC          28.1 dB     28.1 dB         48.3 dB     48.3 dB
IRP         36.1 dB     36.2 dB         51.3 dB     51.3 dB
IRP-SA      36.1 dB     36.2 dB         51.3 dB     50.9 dB
Experimental Results:
Lattice Filter
#define N 16
double state[N+1], K[N], V[N+1];

double lattice( double x )
{
    double y = 0.0;
    for( int i=0; i < N; i++ ) {
        x = x - K[N-i-1] * state[N-i-1];
        state[N-i] = state[N-i-1] + K[N-i-1]*x;
        y = y + V[N-i]*state[N-i];
    }
    state[0] = x;
    return y + V[0]*state[0];
}
Experimental Results:
Lattice Filter
Observation: The wide dynamic ranges of “state”, “V”, “x”, and “y” are due to ‘name dependencies’ of array elements and accumulators when assigning integer word lengths.
Loop unrolling plus renaming can break these dependencies and achieve far better results (iteration-dependent analysis is mentioned in the FRIDGE paper; however, no experimental results are reported).
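The idea can be illustrated on a second-order version of the lattice filter. In the sketch below, the loop form reuses the names x and y in every iteration, so each name would get a single IWL covering all iterations; the unrolled form gives every definition its own name, so each could be profiled and scaled separately. The order N = 2, the coefficient values, and all names in the unrolled version are assumptions chosen for illustration.

```c
#include <assert.h>
#include <math.h>

#define N 2   /* small order so the unrolled form stays short */

static double state[N + 1];
static const double K[N] = { 0.5, -0.25 };
static const double V[N + 1] = { 1.0, 0.5, 0.25 };

/* Loop form, following the slide: x and y are redefined each iteration. */
static double lattice_loop(double x)
{
    double y = 0.0;
    for (int i = 0; i < N; i++) {
        x = x - K[N-i-1] * state[N-i-1];
        state[N-i] = state[N-i-1] + K[N-i-1] * x;
        y = y + V[N-i] * state[N-i];
    }
    state[0] = x;
    return y + V[0] * state[0];
}

/* Unrolled and renamed form: one name per definition. */
static double s0, s1, s2;          /* renamed state[0..2] */

static double lattice_unrolled(double x)
{
    double x1 = x  - K[1] * s1;
    double t2 = s1 + K[1] * x1;    /* new state[2] */
    double y1 = V[2] * t2;
    double x0 = x1 - K[0] * s0;
    double t1 = s0 + K[0] * x0;    /* new state[1] */
    double y2 = y1 + V[1] * t1;
    s2 = t2; s1 = t1; s0 = x0;
    return y2 + V[0] * x0;
}
```

Both functions compute the same output sequence; only the number of distinct names available for IWL assignment differs.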
Presentation Outline
Background Material / Motivation
Floating-to-Fixed-Point Conversion
Architecture Support
Experimental Results
Summary / Future Directions
Summary
Intermediate result profile data can be used to reduce the numerical error of fixed-point code.
A fractional multiply with integrated left shift operation can improve the results, especially when combined with the IRP-SA algorithm.
Improvements between 3.0 dB and 12.8 dB have been observed so far.
Future Directions
Structural Transformations
Extended Precision Arithmetic
Overflows due to accumulated rounding error — use two profiling phases to estimate the effect of ‘second-order’ interactions.