Profile-directed speculative optimization of reconfigurable floating point data paths

Ashley Brown, Imperial College London (The Queen’s Tower, South Kensington, SW7). Workshop on Reconfigurable Computing at 2008, 27th Jan 2008.

Page 1: Profile-directed speculative optimization of reconfigurable floating point data paths

The Queen’s Tower, Imperial College London, South Kensington, SW7

27th Jan 2008 | Ashley Brown

Profile-directed speculative optimization of reconfigurable floating point data paths

Workshop on Reconfigurable Computing at 2008

Ashley Brown, 27th Jan 2008

Page 2: Profile-directed speculative optimization of reconfigurable floating point data paths

Introduction

• Computational science requires reproducible and accurate results

• IEEE-754 is a compromise

– Broad range of values

– Many special cases

• Idea: use profiling to reduce range and remove special cases

Generate floating-point data-paths for FPGAs which are smaller and faster

• BUT KEEP RESULTS CONSISTENT WITH IEEE-754

Page 3: Profile-directed speculative optimization of reconfigurable floating point data paths

Advantages of Smaller Floating Point

• Embedded Systems

– Do the same work for a lower cost

– Implement IEEE-754 compliant floating point where it may not have been possible before

• High performance

– Do more work with the same hardware

– Increase in parallel execution on FPGAs

– No need to sacrifice IEEE-754 compliance

Page 4: Profile-directed speculative optimization of reconfigurable floating point data paths

Four Pictures to Explain: #1

Page 5: Profile-directed speculative optimization of reconfigurable floating point data paths

Four Pictures to Explain: #2

Page 6: Profile-directed speculative optimization of reconfigurable floating point data paths

Four Pictures to Explain: #3

Page 7: Profile-directed speculative optimization of reconfigurable floating point data paths

Four Pictures to Explain: #4

Pre-optimisation Post-optimisation

Page 8: Profile-directed speculative optimization of reconfigurable floating point data paths

Optimisation Technique

• Remove features from the floating-point unit:

– Operand alignment

– Normalisation

– Operand swap

• If a removed feature turns out to be required, detect this and fall back to an alternative solution:

– Software-based, on an embedded/host processor

– Hardware-based full implementation, for larger designs
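
The detect-and-fall-back idea can be modelled in software. The sketch below is illustrative only and is not taken from the slides: `speculative_add`, `full_add` and the `max_align` parameter are hypothetical names, and treating zero and non-finite operands as fall-back cases is an assumption. Both paths compute the same IEEE-754 sum here; the model only records which path a reduced adder would have had to take.

```python
import math

def exponent(x: float) -> int:
    """Binary exponent e of x, where x = m * 2**e and 0.5 <= |m| < 1."""
    return math.frexp(x)[1]

def full_add(a: float, b: float) -> float:
    """Stand-in for the full IEEE-754 fall-back (software or full hardware unit)."""
    return a + b

def speculative_add(a: float, b: float, max_align: int):
    """Model a reduced adder that can align operands by at most max_align bits.

    Returns (result, fell_back). The reduced path is only used when the
    exponent difference fits the reduced alignment hardware; anything else
    is detected and re-executed on the full path.
    """
    # Assumption: special values are not handled by the reduced unit.
    special = a == 0.0 or b == 0.0 or not (math.isfinite(a) and math.isfinite(b))
    if special or abs(exponent(a) - exponent(b)) > max_align:
        return full_add(a, b), True
    # Within the supported alignment range the reduced unit gives the IEEE result.
    return a + b, False
```

For example, `speculative_add(1.0, 1.5, 4)` stays on the reduced path, while `speculative_add(1.0, 2.0**-10, 4)` needs a 10-bit alignment and falls back.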

Page 9: Profile-directed speculative optimization of reconfigurable floating point data paths

Optimisation Options

Page 10: Profile-directed speculative optimization of reconfigurable floating point data paths

The stages of optimisation

• Profile target application with training datasets

– Source usually FORTRAN, C

• Identify frequently-executed blocks

• Check for good value-locality

• Generate reduced-size floating point datapath

– Reduced operand alignment hardware

– Reduced normalisation hardware

• Error checking: execute with additional datasets, check error rates

Page 11: Profile-directed speculative optimization of reconfigurable floating point data paths

FloatWatch Profiler

• Valgrind-based value profiler

• Can return some metrics of interest here:

– Floating point value ranges

– Ratio of floating point operands

• Each has uses for optimisation!
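
As a rough illustration of the two metrics, the sketch below computes them from a recorded list of operand pairs. It does not use or reproduce FloatWatch's interface; `profile_operands` is a hypothetical helper, the operand ratio is approximated by the difference of binary exponents, and operands are assumed to be finite and non-zero.

```python
import math
from collections import Counter

def profile_operands(pairs):
    """Summarise (a, b) operand pairs of floating-point operations.

    Returns ((min_exp, max_exp), diffs) where the first item is the range of
    binary exponents seen and diffs counts how often each exponent
    difference |exp(a) - exp(b)| occurred (a coarse log2 of the operand ratio).
    """
    exps = []
    diffs = Counter()
    for a, b in pairs:
        ea, eb = math.frexp(a)[1], math.frexp(b)[1]
        exps.extend((ea, eb))
        diffs[abs(ea - eb)] += 1
    return (min(exps), max(exps)), diffs
```

A narrow exponent range suggests a smaller exponent field may suffice; a histogram concentrated at small differences suggests alignment hardware can be cut down.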

Page 12: Profile-directed speculative optimization of reconfigurable floating point data paths

VFLOAT Library

• VHDL variable-precision floating-point library

– Initially developed by Belanovic at Northeastern, continued development under the supervision of Leeser

• Allows basic customisation of precision, exponent bit widths

• Further customisations added for our optimisations:

– Operand alignment

– Normalisation

• Performance is lower than vendor-specific libraries

Page 13: Profile-directed speculative optimization of reconfigurable floating point data paths

Data-path Generator

• Takes user-selected data-path and generates VHDL implementation

• Assembles modified version of the RPL library – customised to allow removal of various items

• Builds hardware/software integration layer

– C library for software

– VHDL for hardware

• Does not modify the software source automatically (yet)

Page 14: Profile-directed speculative optimization of reconfigurable floating point data paths

Proof-of-Concept Testing

• Original application modified to call C library (usually from FORTRAN)

• Data sent to hardware, calculated, and returned

– Software waits for response

– No data-aggregation or hardware-side error detection occurs

• Software layer performs same calculation for verification

• Overall error rate reported
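
The verification step can be pictured as the loop below. This is a sketch under assumptions, not the actual test harness: `hw_op` stands in for the round trip to the hardware, `sw_op` for the software reference, and both names (like `error_rate` itself) are hypothetical. Results are compared bit for bit so that a mismatch in sign of zero also counts as an error.

```python
import struct

def same_bits(x: float, y: float) -> bool:
    """True if x and y have identical IEEE-754 double bit patterns."""
    return struct.pack("<d", x) == struct.pack("<d", y)

def error_rate(hw_op, sw_op, operand_pairs):
    """Fraction of operations where the hardware result differs from software.

    hw_op and sw_op are callables taking (a, b); operand_pairs is a
    non-empty list of (a, b) tuples.
    """
    mismatches = sum(
        1 for a, b in operand_pairs if not same_bits(hw_op(a, b), sw_op(a, b))
    )
    return mismatches / len(operand_pairs)
```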

Page 15: Profile-directed speculative optimization of reconfigurable floating point data paths

‘ydl_pij’

• ‘ydl_pij’ is an iterative solver for quantum mechanics, using the “Molecular Mechanics – Valence Bond” method

• Datasets of various sizes are available, allowing a variety of test cases to be used

• Initial profiling and testing use separate datasets

Page 16: Profile-directed speculative optimization of reconfigurable floating point data paths

‘ydl_pij’: Profiling (Hot Code Section)

Narrow value ranges

Page 17: Profile-directed speculative optimization of reconfigurable floating point data paths

‘ydl_pij’: Identification

• FloatWatch identifies the regions of code executing the most operations

• In this case, these show narrow value ranges

• Create optimised datapaths for testing

– Maximum operand alignment reduced to 2^n, where n is in the range [1, 6]

– Normalisation hardware modified similarly
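
To see how the choice of n trades hardware for re-executions, one can sweep n over a set of recorded operand pairs. The sketch below assumes the alignment limit is 2^n bit positions and that an operation must be re-executed whenever the exponent difference of its operands exceeds that limit; `reexecution_rates` is a hypothetical helper, and operands are assumed finite and non-zero.

```python
import math

def reexecution_rates(pairs, n_values=range(1, 7)):
    """Map each n to the fraction of operand pairs needing re-execution.

    A pair needs re-execution when |exp(a) - exp(b)| > 2**n, i.e. when the
    reduced alignment hardware cannot shift far enough.
    """
    diffs = [abs(math.frexp(a)[1] - math.frexp(b)[1]) for a, b in pairs]
    rates = {}
    for n in n_values:
        max_align = 2 ** n
        rates[n] = sum(d > max_align for d in diffs) / len(diffs)
    return rates
```

Running this on pairs from a dataset not used for profiling mirrors the separation of training and testing data described for ‘ydl_pij’.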

Page 18: Profile-directed speculative optimization of reconfigurable floating point data paths

‘ydl_pij’ Error Rate

Not profiled

Page 19: Profile-directed speculative optimization of reconfigurable floating point data paths

‘ydl_pij’: Error Rate and Size

• 20% size reduction with negligible re-execution rate (< 0.5%)

• 27% size reduction with 3% re-execution rate

• Size reduction permits ~40% increase in parallelism due to better space usage
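
A back-of-the-envelope check of the parallelism figure: if each unit shrinks by a fraction s, a fixed area holds 1 / (1 - s) times as many units. The sketch below is an idealised estimate that ignores routing and packing effects, so it is an assumption rather than the method behind the slide's number; `parallelism_gain` is a hypothetical name.

```python
def parallelism_gain(size_reduction: float) -> float:
    """Relative increase in the number of units that fit in a fixed area.

    size_reduction is the fractional area saving per unit (e.g. 0.27 for 27%).
    """
    return 1.0 / (1.0 - size_reduction) - 1.0
```

With a 27% reduction this gives roughly 0.37, in the same region as the ~40% quoted; a 20% reduction gives 0.25.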

Page 20: Profile-directed speculative optimization of reconfigurable floating point data paths

ydl_pij: Area saving for one F.P. adder/subtractor

Pre-optimisation Post-optimisation

Page 21: Profile-directed speculative optimization of reconfigurable floating point data paths

Coming Soon

• Per-operation optimisations

– Currently only at data-path level

• Optimisation of operand-swap hardware

• Per-operation exponent customisation (size, bias)

• Performance evaluation using state-of-the-art FPGA accelerator hardware

• Implementation of error detection and re-execution

• Potential for even greater size reductions