[IEEE 2008 4th Southern Conference on Programmable Logic (SPL) - Bariloche, Patagonia, Argentina (2008.03.26-2008.03.28)] 2008 4th Southern Conference on Programmable Logic - Polynomic

POLYNOMIC CURVE BASED REPRESENTATION SYSTEM IMPLEMENTED USING FPGAS

Carlos Pineiro, Cristina Valbuena *

Computer Architecture Department, Universidad Complutense de Madrid

email: [email protected], cristina.valbuena.Iledo@gmail.com

Hortensia Mecha *

Computer Architecture Department, Universidad Complutense de Madrid

email: [email protected]

*This work has been supported by Spanish Government Research Grant TIN2006-03274.

ABSTRACT

A polynomic curve based representation system is a three-dimensional system that renders images using polynomic curves instead of triangle meshes. Such a system renders a scene at a computational cost that depends more on the image size than on the scene complexity. Moreover, it allows the user to describe a scene with a much smaller amount of data than traditional methods require.

Accordingly, we have implemented hardware that demonstrates how much this kind of application can be accelerated with a suitable architecture. The hardware was implemented on an FPGA, which substantially reduced execution times: the accelerated operation runs about 4 times faster than in a software-only solution.

1. INTRODUCTION

Computer-generated imagery (CGI) was first used in movies in 1973 in Westworld, where 2D images were generated. In 1976, its sequel Futureworld was the first film to use 3D CGI. Since then, CGI has steadily gained importance and is now present in most films.

Meanwhile, computer games have been pushing CGI in a parallel, but convergent, direction. While movies aim to be as realistic as possible, games are also interested in quality but are limited by frame rate: CGI has to be realistic, but it also has to be fast.

Game graphics are based on triangulation methods because these are the fastest way of rendering simple scenes. However, as scene complexity has increased, graphics need ever more triangles to define the scene objects. For all these reasons, triangles are not a fully valid solution for computer-game graphics, as they restrict the shapes that can be displayed to polyhedra.

The car and aircraft industries, for their part, have used polynomic curves since the late 1950s to design cars and planes. These mathematical structures can represent many kinds of surfaces with a smoothness that cannot be achieved with triangle-based representations.

If curves are needed for design, the logical step is to develop a method that renders objects made up of curved surfaces directly, without turning them into triangle meshes. Furthermore, our objective is to develop not only the technique but also the hardware needed for this computational task.

On the other hand, reconfigurable hardware has developed enormously in the last few years, and the inclusion of coarse-grain elements such as multipliers, memories or even embedded microprocessors has turned FPGAs into ideal devices for accelerating applications that require huge amounts of arithmetic. For this reason, these devices are the ideal choice for implementing our polynomic curve based system.

FPGAs are dominated by interconnect, which makes them very flexible in terms of the range of designs that can practically be implemented on them. Another important fact is that most high-end FPGAs include embedded arithmetic functions and embedded memories. An example of all this is the Virtex-II Pro [1], an FPGA family that combines up to 20 RocketIO transceiver blocks, 2 PowerPC processor blocks, 99216 logic cells, 44096 slices, 1378 Kb of distributed RAM, 444 18 x 18 multiplier blocks, 7992 Kb of block RAM, 12 DCMs and 1164 user I/O pads.

In recent years, a trend has been to take the coarse-grained architectural approach a step further by combining the logic blocks and interconnect of traditional FPGAs with embedded microprocessors and related peripherals to form a complete "System on Chip". An alternative to such hard-macro processors is to use "soft" processor cores implemented within the FPGA fabric.

Furthermore, for systems that need to perform a huge amount of very specific arithmetic calculations, FPGAs are the logical choice, providing systems that include processors and user-made co-processors for specific tasks.

978-1-4244-1992-0/08/$25.00 ©2008 IEEE.


Our objective is to demonstrate that a polynomic curve based system such as the one described above can be developed, and that it can become usable if it is implemented on suitable hardware.

The rest of the paper is organised as follows: the next section reviews related work in the field of polynomic curves. Section 3 describes the kind of application we want to accelerate, specifically the mathematical pillars on which the algorithm is based and the facts we used to design the specific hardware. Section 4 explains the improvement obtained by using specific hardware instead of a software-only solution, and section 5 compares software and hardware execution times. Finally, sections 6 and 7 present the conclusions and some guidelines for continuing this work.

2. PREVIOUS WORK

Polynomic curves are a tool for constructing smooth and flexible shapes in computer graphics, but before computers were used for this purpose they were already employed in the aircraft and ship-building industries. For example, in the British aircraft industry during World War II, templates for airplanes were constructed by passing thin wooden strips ("splines") through points laid out on the floor of a large design loft.

Ship designers had found a successful way to design: the design was plotted on graph paper, and the key points of the plot were re-plotted at full size on larger graph paper. The thin wooden strips interpolated the key points into smooth curves: the strips were held in place at discrete points, and between these points they assumed shapes of minimum strain energy.

The use of splines for modelling automobile bodies seems to have several independent beginnings [2]. Credit is claimed on behalf of de Casteljau [3] at Citroën, Pierre Bézier [4] at Renault, and Birkhoff and de Boor [5] at General Motors, all for work carried out in the late 1950s or very early 1960s. De Boor's work at GM resulted in a number of papers published in the early 1960s, including some of the fundamental work on B-splines.

Splines have also been widely used in image processing: their mathematical properties make them a valuable tool in many fields, such as signal [6] and image processing [7].

On the other hand, FPGAs are currently being used to accelerate 3D applications, such as the core developed by Xilinx [8] for displaying 3D graphics in cars. FPGAs are also being used as a development tool for traditional 3D graphics cards [9].

3. FRAMEWORK

This system is a mixed software-hardware solution that renders an image using splines, which are explained in section 3.1. It has been implemented on an XUP Development Board, described in section 3.3. This board features a Virtex-II Pro, in particular the XC2VP30, which includes 8 RocketIO transceiver blocks, 2 PowerPC 405 processor blocks, 30816 logic cells, 13696 slices, 428 Kb of distributed RAM, 136 18 x 18 bit multiplier blocks, 2448 Kb of block RAM, 8 DCMs and 644 user I/O pads. The most arithmetic-intensive part of the application has been implemented directly on the FPGA in VHDL, while the rest has been implemented in software.

3.1. Splines

A spline [10], or elastic curve, is the least strained curve defined by a control polygon. This polygon is a sequence of points: the spline passes through the first and the last point, while the intermediate points influence the shape of the curve. A spline can be seen as a metallic rod whose ends are fixed and which is bent by springs at some points. An example is shown in Fig. 1.

Fig. 1. A metallic spline.

Every spline is defined by a control polygon, from which the curve is created by repeated linear interpolation. Since linear interpolation is an affine operation, there is a mapping that creates the curve from the polygon, and there is also one that recovers the polygon from the curve. As a consequence, a spline can be divided into pieces [4].
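Repeated linear interpolation of a control polygon is what is usually called de Casteljau's algorithm for Bézier curves. A minimal Python sketch, assuming 2D control points (the function names are illustrative, not taken from the authors' software):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lerp(a: Point, b: Point, t: float) -> Point:
    """Affine (linear) interpolation between two points."""
    return ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])


def de_casteljau(ctrl: List[Point], t: float) -> Point:
    """Point of the curve at parameter t: neighbouring control points are
    interpolated repeatedly until a single point is left."""
    pts = list(ctrl)
    while len(pts) > 1:
        pts = [lerp(a, b, t) for a, b in zip(pts, pts[1:])]
    return pts[0]


# Quadratic example: the curve starts at the first control point, ends at
# the last one, and is only pulled towards the middle one.
polygon = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]
print(de_casteljau(polygon, 0.0), de_casteljau(polygon, 0.5), de_casteljau(polygon, 1.0))
# prints (0.0, 0.0) (0.5, 0.5) (1.0, 0.0)
```

The first and last control points are reproduced exactly, which is the property the text describes; the same scheme works unchanged for any number of control points.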

Let $t \in [0, 1]$ be a real parameter and let $s_0$ be the original spline, defined by the control polygon $p_0$. We now define the parameters $t_1$ and $t_2$ with $t_1 \in [0, t]$ and $t_2 \in [t, 1]$. We can obtain $s_1$ and $s_2$ by repeated linear interpolation over $p_0$ using $t_1$ and $t_2$, respectively. In this way, $s_1$ is the part of $s_0$ for which the interpolating parameter lies between 0 and $t$, and likewise $s_2$ is the part of $s_0$ for which it lies between $t$ and 1.

The importance of division lies in the fact that the control polygons $p_1$ and $p_2$ of $s_1$ and $s_2$ can be obtained from $p_0$ in the same way that $s_1$ and $s_2$ are obtained from $s_0$.
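The division comes almost for free from the interpolation scheme: the first and the last point computed at each interpolation level form the control polygons $p_1$ and $p_2$. A minimal Python sketch, again assuming Bézier control polygons of 2D points (names are illustrative):

```python
from typing import List, Tuple

Point = Tuple[float, float]


def lerp(a: Point, b: Point, t: float) -> Point:
    return ((1 - t) * a[0] + t * b[0], (1 - t) * a[1] + t * b[1])


def divide(p0: List[Point], t: float) -> Tuple[List[Point], List[Point]]:
    """Split the curve controlled by p0 at parameter t.

    Returns (p1, p2): p1 controls the part of the curve with parameters
    in [0, t], p2 the part with parameters in [t, 1].
    """
    p1, p2 = [p0[0]], [p0[-1]]
    level = list(p0)
    while len(level) > 1:
        level = [lerp(a, b, t) for a, b in zip(level, level[1:])]
        p1.append(level[0])    # left edge of the interpolation triangle
        p2.append(level[-1])   # right edge of the interpolation triangle
    p2.reverse()               # so that p2 also runs from parameter t to 1
    return p1, p2


p1, p2 = divide([(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)], 0.5)
print(p1)  # [(0.0, 0.0), (0.25, 0.5), (0.5, 0.5)]
print(p2)  # [(0.5, 0.5), (0.75, 0.5), (1.0, 0.0)]
```

Both sub-polygons share the point of the curve at parameter $t$, and each is again a valid control polygon, so the division can be applied recursively.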

The representation algorithm calculates every point at which a set of straight lines and a set of splines intersect. The set of straight lines simulates the field of vision of an external observer, while the set of splines represents the objects of the scene.

As every line can intersect more than one object, and can even intersect each object more than once, after the calculation we choose, for every line, the intersection point closest to the observer. Finally, we obtain the colour that the observer perceives at every intersection point. A graphical explanation is given in Fig. 2.

Fig. 2. Observer lines of vision.

The approximate intersection is used to find out whether a line intersects a spline. The rendering algorithm we use to draw spline-based objects is somewhat reminiscent of ray tracing. The main difference between them is that the rendering algorithm for splines makes approximate calculations: intersection points are calculated with an accuracy bounded by the object size in the final image.

The intersection algorithm begins by obtaining the minimum box that contains the spline. The spline is bounded by its control polygon: since repeated linear interpolation preserves the convex hull, the spline is contained in the convex hull of the polygon. Accordingly, the box is built so that it contains the convex hull of the polygon, and hence the spline.

If the straight line does not intersect the box, the line cannot intersect the spline either. If, on the other hand, the line intersects the box, the next step is to divide the spline, using the process explained previously. The algorithm continues until the box is small enough, at which point the intersection point is approximated by the centre of the box.

This procedure can be generalised to surfaces. The approximate intersection algorithm is exactly the same, except that surfaces are harder to divide than curves. In this case, the spline surface is defined by the parameters $(u, v) \in [0, 1] \times [0, 1]$. Therefore, the surface can be divided at $(u, v)$ into 4 sub-splines, whose parameters range over $[0, u] \times [0, v]$, $[0, u] \times [v, 1]$, $[u, 1] \times [0, v]$ and $[u, 1] \times [v, 1]$.

The reason for using an approximate approach instead of calculating intersection points in the traditional way is that this algorithm is much faster than obtaining the points analytically.

3.2. Spline-based rendering

Spline-based rendering is a technique that requires a huge amount of arithmetic. It consists of several stages.

The first step is to generate a sheaf of straight lines that simulates the observer's field of vision: the lines come out of the observer's eye and pass through the scene. The scene is made of a set of objects, which are in turn made up of splines.
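One line of vision per pixel is a natural way to build the sheaf, and it also shows why the cost grows with the image size rather than with the scene. The sketch below assumes a simple pinhole camera (eye at the origin, looking along -z, image plane at z = -1, 60-degree vertical field of view by default); none of these parameters come from the paper.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def normalise(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / n, v[1] / n, v[2] / n)


def sheaf_of_lines(width: int, height: int, fov_deg: float = 60.0) -> List[Tuple[Vec3, Vec3]]:
    """One line of vision (origin, unit direction) through each pixel centre
    of a hypothetical pinhole camera."""
    eye = (0.0, 0.0, 0.0)
    half = math.tan(math.radians(fov_deg) / 2)  # half-height of the image plane
    aspect = width / height
    lines = []
    for row in range(height):
        for col in range(width):
            # Pixel centre mapped to [-1, 1], then scaled to the image plane.
            x = (2 * (col + 0.5) / width - 1) * half * aspect
            y = (1 - 2 * (row + 0.5) / height) * half
            lines.append((eye, normalise((x, y, -1.0))))
    return lines


print(len(sheaf_of_lines(640, 480)))  # 307200: one line per pixel
```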

The next stage is to calculate the intersections between the splines of the objects and the sheaf of straight lines. This step requires calculating the intersection point between a straight line and a spline surface, which is done with the approximate intersection method described above. Once all intersection points are calculated, we choose, for every straight line of the sheaf, the one closest to the observer.
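The box test and recursive division can be sketched as follows. To stay short, the sketch works with a planar Bézier curve and a half-line instead of a surface and a 3D line, always divides at $t = 1/2$, and uses the standard slab test for the box; these are assumptions, not details given in the paper.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def divide(ctrl: List[Point]) -> Tuple[List[Point], List[Point]]:
    """de Casteljau division of a Bezier control polygon at t = 1/2."""
    left, right, level = [ctrl[0]], [ctrl[-1]], list(ctrl)
    while len(level) > 1:
        level = [((a[0] + b[0]) / 2, (a[1] + b[1]) / 2) for a, b in zip(level, level[1:])]
        left.append(level[0])
        right.append(level[-1])
    return left, right[::-1]


def bounding_box(ctrl: List[Point]) -> Box:
    """Box around the control polygon; by the convex hull property it
    also contains the curve."""
    xs, ys = [p[0] for p in ctrl], [p[1] for p in ctrl]
    return min(xs), min(ys), max(xs), max(ys)


def line_hits_box(origin: Point, d: Point, box: Box) -> bool:
    """Slab test for the half-line origin + s * d with s >= 0."""
    s_min, s_max = 0.0, float("inf")
    for axis in (0, 1):
        lo, hi = box[axis], box[axis + 2]
        if d[axis] == 0.0:  # line parallel to this slab
            if not lo <= origin[axis] <= hi:
                return False
            continue
        s1, s2 = sorted(((lo - origin[axis]) / d[axis], (hi - origin[axis]) / d[axis]))
        s_min, s_max = max(s_min, s1), min(s_max, s2)
        if s_min > s_max:
            return False
    return True


def approx_intersections(origin: Point, d: Point, ctrl: List[Point],
                         eps: float, found: List[Point]) -> None:
    box = bounding_box(ctrl)
    if not line_hits_box(origin, d, box):
        return  # the line misses the box, hence the curve
    if max(box[2] - box[0], box[3] - box[1]) <= eps:
        found.append(((box[0] + box[2]) / 2, (box[1] + box[3]) / 2))  # box centre
        return
    for half in divide(ctrl):
        approx_intersections(origin, d, half, eps, found)


def closest_intersection(origin: Point, d: Point, ctrl: List[Point],
                         eps: float = 1e-3) -> Optional[Point]:
    """Approximate intersection closest to the observer, or None."""
    found: List[Point] = []
    approx_intersections(origin, d, ctrl, eps, found)
    return min(found, default=None,
               key=lambda p: (p[0] - origin[0]) * d[0] + (p[1] - origin[1]) * d[1])


curve = [(0.0, 0.0), (0.5, 1.0), (1.0, 0.0)]  # the curve y = 2x(1 - x)
print(closest_intersection((-1.0, 0.25), (1.0, 0.0), curve))
```

For the horizontal line y = 0.25 seen from the left, the returned point lies close to the analytic intersection x = (1 - √0.5)/2 ≈ 0.146; the tolerance `eps` plays the role of the object size in the final image.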

The last stage consists of finding out which colour the observer perceives at every point. This procedure obtains the surface normal at every intersection point and calculates the amount of light that will be reflected. This information lets us render the scene as the observer would perceive it.
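The paper cites [11] for its diffuse and specular lighting without giving the exact model. The sketch below assumes the classic Lambert (diffuse) plus Phong (specular) formulation, with illustrative coefficients, and computes the normal only at the (u, v) = (0, 0) corner of a Bézier patch, where the tangents point along the first edges of the control net.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def normalise(v: Vec3) -> Vec3:
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def corner_normal(p00: Vec3, p10: Vec3, p01: Vec3) -> Vec3:
    """Unit normal at the (0, 0) corner of a Bezier patch."""
    return normalise(cross(sub(p10, p00), sub(p01, p00)))


def shade(n: Vec3, to_light: Vec3, to_eye: Vec3,
          kd: float = 0.6, ks: float = 0.3, shininess: float = 20.0) -> float:
    """Diffuse plus specular intensity; all vectors are unit length."""
    n_dot_l = dot(n, to_light)
    if n_dot_l <= 0.0:
        return 0.0  # the light is behind the surface
    r = tuple(2 * n_dot_l * ni - li for ni, li in zip(n, to_light))  # reflected light
    return kd * n_dot_l + ks * max(dot(r, to_eye), 0.0) ** shininess


n = corner_normal((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(n, shade(n, (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))
```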

To accelerate this technique, our application consists of software that delegates the critical operations to specialised hardware implemented on an FPGA.

3.3. Hardware used

The Virtex-II Pro family contains platform FPGAs for designs that are based on IP cores and customised modules. It incorporates multi-gigabit transceivers and PowerPC CPU blocks in its architecture, enabling complete solutions for communication, wireless, networking, video and digital signal processing applications. Virtex-II Pro devices are user-programmable gate arrays with configurable elements and embedded blocks optimised for high-density and high-performance system designs. They also include up to two embedded IBM PowerPC 405 RISC processor blocks, which provide performance of up to 400 MHz.

The board used is the Xilinx University Program (XUP) Virtex-II Pro board, which features many peripherals; we used the following:

* Up to 2 GB of Double Data Rate (DDR) SDRAM.

* Embedded Platform Cable USB configuration port.

* RS-232 DB9 serial port.

* On-board XSGA output, up to 1200 x 1600 at 70 Hz refresh.

* 100 MHz system clock.

Fig. 3. RTL diagram of the implemented hardware.

4. ACCELERATION USING FPGAS

FPGAs let us accelerate the code by implementing one of its operations directly in hardware: the division of a spline surface. The hardware takes the initial control polygon $p_0$ and returns the control polygons $p_1$, $p_2$, $p_3$ and $p_4$ of the four sub-splines.
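As a software reference model of the operation delegated to the hardware, the following sketch splits a tensor-product Bézier patch into four sub-patches by dividing every column of the control net in the u direction and then every row in the v direction. It assumes bicubic (4 x 4) nets split at u = v = 1/2, which would be consistent with the 64 resulting nodes of Fig. 4; the paper does not state these details, so they are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, ...]
Net = List[List[Point]]  # net[i][j]: index i runs along u, index j along v


def lerp(a: Point, b: Point, t: float) -> Point:
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))


def divide_curve(ctrl: List[Point], t: float) -> Tuple[List[Point], List[Point]]:
    """de Casteljau split of one row or column of the control net at t."""
    left, right, level = [ctrl[0]], [ctrl[-1]], list(ctrl)
    while len(level) > 1:
        level = [lerp(a, b, t) for a, b in zip(level, level[1:])]
        left.append(level[0])
        right.append(level[-1])
    return left, right[::-1]


def split_rows(net: Net, v: float) -> Tuple[Net, Net]:
    """Divide every row of the net in the v direction."""
    halves = [divide_curve(row, v) for row in net]
    return [h[0] for h in halves], [h[1] for h in halves]


def divide_surface(p0: Net, u: float = 0.5, v: float = 0.5) -> Tuple[Net, Net, Net, Net]:
    """Split a patch into the sub-patches over [0,u]x[0,v], [0,u]x[v,1],
    [u,1]x[0,v] and [u,1]x[v,1]."""
    n, m = len(p0), len(p0[0])
    # Divide every column (fixed j) in the u direction ...
    cols = [divide_curve([p0[i][j] for i in range(n)], u) for j in range(m)]
    low_u = [[cols[j][0][i] for j in range(m)] for i in range(n)]
    high_u = [[cols[j][1][i] for j in range(m)] for i in range(n)]
    # ... then every row of both halves in the v direction.
    p1, p2 = split_rows(low_u, v)
    p3, p4 = split_rows(high_u, v)
    return p1, p2, p3, p4


# Bicubic example: a flat 4x4 net whose nodes are evenly spaced in u and v.
p0 = [[(i / 3, j / 3, i / 3 + j / 3) for j in range(4)] for i in range(4)]
p1, p2, p3, p4 = divide_surface(p0)
print(sum(len(row) for net in (p1, p2, p3, p4) for row in net))  # 64
```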

The application control and the remaining operations are implemented in software. Whenever the algorithm has to divide a surface, it delegates the calculation of the sub-spline control polygons to the hardware.

The acceleration comes from the fact that every sub-spline can be obtained independently of the others. Because of that, the hardware can compute the four control polygons in the time required to calculate just one.

Moreover, some of the operations can be optimised by studying the operators in detail. For instance, divisions by 2 can be implemented directly in the routing, without using any other FPGA resources.

The divider consists of a network of adders and shifters, as the operation to be accelerated involves processing a series of nested loops. These loops are unrolled and implemented in VHDL, resulting in hardware that uses almost 50% of the FPGA fabric: more than 6K slices and a total of 12K 4-input LUTs.
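Assuming the hardware always divides at the parameter midpoint (which is what makes divisions by 2 appear), each de Casteljau level of a cubic reduces to additions followed by a one-bit shift. The following hypothetical sketch models one coordinate with Python integers standing in for fixed-point words; it illustrates the unrolled adder/shifter structure and is not the authors' VHDL.

```python
from typing import List, Tuple


def half(x: int) -> int:
    # Arithmetic shift right by one bit: in hardware this is only wiring.
    return x >> 1


def divide_cubic_fixed(p: List[int]) -> Tuple[List[int], List[int]]:
    """Midpoint split of one coordinate of a cubic control polygon using
    only additions and shifts (the nested loops fully unrolled)."""
    p0, p1, p2, p3 = p
    m01, m12, m23 = half(p0 + p1), half(p1 + p2), half(p2 + p3)  # level 1: 3 adders
    m012, m123 = half(m01 + m12), half(m12 + m23)                 # level 2: 2 adders
    mid = half(m012 + m123)                                       # level 3: 1 adder
    return [p0, m01, m012, mid], [mid, m123, m23, p3]


print(divide_cubic_fixed([0, 8, 16, 24]))  # ([0, 4, 8, 12], [12, 16, 20, 24])
```

Because each shift drops the least significant bit, odd sums are truncated (towards minus infinity for `>>`), so results may differ from exact arithmetic by a few units in the last place.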

Fig. 4. Mesh division.

Fig. 3 shows an RTL diagram of the implemented hardware. It illustrates how the software running on the embedded microprocessor performs the division operation on the specific hardware, and how that hardware returns the results. An FSM controls the loading of data into the data-in registers and, likewise, the reading of the data-out registers. In addition, a communication protocol is needed between the processor and the hardware implemented in the FPGA.
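The paper does not detail this protocol. The following hypothetical software model shows one simple possibility, assuming the pattern of 16 writes and 64 reads reported in section 5: the processor fills the data-in registers, the divider produces the outputs, and the processor drains the data-out registers before writing again. The class, states and stand-in divider are illustrative only.

```python
from enum import Enum, auto
from typing import Callable, List


class State(Enum):
    LOAD = auto()    # processor is filling the data-in registers
    UNLOAD = auto()  # processor is draining the data-out registers


class DividerPort:
    """Hypothetical model of the processor/hardware handshake."""

    def __init__(self, compute: Callable[[List[int]], List[int]],
                 n_in: int = 16, n_out: int = 64) -> None:
        self.compute, self.n_in, self.n_out = compute, n_in, n_out
        self.state = State.LOAD
        self.data_in: List[int] = []
        self.data_out: List[int] = []

    def write(self, word: int) -> None:
        if self.state is not State.LOAD:
            raise RuntimeError("read all results before writing new data")
        self.data_in.append(word)
        if len(self.data_in) == self.n_in:  # last input word: run the divider
            self.data_out = self.compute(self.data_in)
            if len(self.data_out) != self.n_out:
                raise ValueError("divider produced an unexpected number of words")
            self.state = State.UNLOAD

    def read(self) -> int:
        if self.state is not State.UNLOAD:
            raise RuntimeError("no results available")
        word = self.data_out.pop(0)
        if not self.data_out:  # last output word: back to LOAD
            self.data_in = []
            self.state = State.LOAD
        return word


# Stand-in for the hardware divider: each input word is repeated 4 times.
port = DividerPort(lambda words: [w for w in words for _ in range(4)])
for w in range(16):
    port.write(w)
results = [port.read() for _ in range(64)]
print(port.state)  # State.LOAD
```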

To simplify the implementation, we used fixed-point arithmetic instead of floating-point arithmetic. Although this simplification limits the application to a fixed numeric range, this was acceptable because the hardware was created as a proof of concept.
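The paper does not specify its fixed-point format. As an illustration only, the sketch below assumes a signed Q16.16 format (16 integer and 16 fractional bits in a 32-bit word) and shows why the range is fixed: values outside it cannot be represented at all.

```python
FRAC_BITS = 16   # hypothetical Q16.16 format
TOTAL_BITS = 32
MAX_RAW = (1 << (TOTAL_BITS - 1)) - 1
MIN_RAW = -(1 << (TOTAL_BITS - 1))


def to_fixed(x: float) -> int:
    """Convert a real number to a signed Q16.16 integer, rejecting values
    outside the representable range instead of silently wrapping."""
    raw = round(x * (1 << FRAC_BITS))
    if not MIN_RAW <= raw <= MAX_RAW:
        raise OverflowError(f"{x} does not fit in Q{TOTAL_BITS - FRAC_BITS}.{FRAC_BITS}")
    return raw


def from_fixed(raw: int) -> float:
    return raw / (1 << FRAC_BITS)


def fixed_mul(a: int, b: int) -> int:
    # The product has 2 * FRAC_BITS fractional bits; shift back (no overflow check).
    return (a * b) >> FRAC_BITS


print(from_fixed(fixed_mul(to_fixed(1.5), to_fixed(2.0))))  # 3.0
```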

This kind of architecture can grow to perform more than one operation: every extra operation can be implemented in a similar way, using a simple protocol to load data into the external hardware and read the results back.

The main advantage of using external hardware is the ability to perform in parallel all the operations that are independent of one another. As shown in Fig. 4, the resulting mesh nodes are independent of each other, i.e., they can be calculated at the same time, whereas in software they have to be calculated serially. Therefore, in principle, the calculation time of this step can be reduced by a factor of up to 64, the total number of nodes in the resulting meshes.
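The independence of the nodes can be made explicit by writing the midpoint division as fixed weight matrices: each output node is a weighted sum of the 16 input nodes and depends on no other output. The sketch assumes a bicubic net with scalar coordinates split at 1/2; the matrix names are illustrative.

```python
from fractions import Fraction
from typing import List

# Cubic midpoint-division weights (rows: new control points, columns: old
# ones), scaled by 8 so that they stay integral.
LEFT8 = [[8, 0, 0, 0], [4, 4, 0, 0], [2, 4, 2, 0], [1, 3, 3, 1]]
RIGHT8 = [[1, 3, 3, 1], [0, 2, 4, 2], [0, 0, 4, 4], [0, 0, 0, 8]]


def node(p0: List[List[int]], su: List[List[int]], sv: List[List[int]],
         i: int, j: int) -> Fraction:
    """Node (i, j) of one sub-patch: a fixed weighted sum of the 16 input
    nodes. It reads only p0, so all 64 nodes can be computed at once."""
    total = sum(su[i][k] * p0[k][l] * sv[j][l] for k in range(4) for l in range(4))
    return Fraction(total, 64)


def all_nodes(p0: List[List[int]]) -> List[Fraction]:
    """The 64 nodes of the four sub-patches, in the order p1, p2, p3, p4."""
    return [node(p0, su, sv, i, j)
            for su in (LEFT8, RIGHT8) for sv in (LEFT8, RIGHT8)
            for i in range(4) for j in range(4)]


P = [[4 * k + l for l in range(4)] for k in range(4)]
print(len(all_nodes(P)), node(P, LEFT8, LEFT8, 3, 3))  # 64 15/2
```

All weights are small integers (1, 2, 3, 4 and 8), so every product can be formed with shifts and additions, which fits a network of adders and shifters.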


Table 1. Comparison times.

Iterations   SW clock cycles   HW clock cycles   Speedup
10           213301            51727             4.123
100          2128211           508011            4.189
1000         21282011          5092904           4.178
10000        212820011         50902710          4.180
100000       2128200011        509173260         4.179
1000000      21282000011       5091716582        4.179
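As a quick arithmetic check, the speedup column of Table 1 is the ratio of software to hardware clock cycles, and the overall figure follows from Amdahl's law with the roughly 50% time fraction stated in section 5. The sketch recomputes both; the cycle counts are copied from the table, while the function name is illustrative.

```python
# (iterations, SW clock cycles, HW clock cycles, reported speedup) from Table 1.
TABLE_1 = [
    (10, 213301, 51727, 4.123),
    (100, 2128211, 508011, 4.189),
    (1000, 21282011, 5092904, 4.178),
    (10000, 212820011, 50902710, 4.180),
    (100000, 2128200011, 509173260, 4.179),
    (1000000, 21282000011, 5091716582, 4.179),
]


def amdahl(fraction: float, local_speedup: float) -> float:
    """Overall speedup when `fraction` of the run time is accelerated by
    `local_speedup` and the rest is left unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


for iters, sw, hw, reported in TABLE_1:
    print(f"{iters:>8} iterations: {sw / hw:.4f}x (reported {reported})")

print(amdahl(0.5, 4.0))  # 1.6
```

Even an infinitely fast divider would give at most 1 / 0.5 = 2x overall under this law, which is why the remaining software part dominates.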

5. RESULTS

It is important to note that, to facilitate the VHDL implementation, we have used fixed-point arithmetic. This allows us to create faster hardware than a floating-point design would, but it limits us to a small data range.

As we see in Fig. 5, the system can render curved surfaces illuminated by diffuse and specular light [11], where the surface colour is completely covered by the light colour.

Fig. 5. The rendered image.

Regarding execution times, Table 1 shows that the time needed to render an image is considerably reduced when using the specific hardware, as the hardware-based operation is more than 4 times faster than the software-based one.

The spline division typically accounts for about 50% of the execution time. As the operation implemented in hardware is roughly 4 times faster than the software implementation, the global speedup, according to Amdahl's law, is

$$A = \frac{1}{(1 - 0.5) + \frac{0.5}{4}} = 1.6 \tag{1}$$

These times, however, could be drastically reduced by optimising the connection between the microprocessor and the hardware. In order to reduce design time, the hardware was connected to the microprocessor through GPIO ports, which are not the fastest option.

In short, performing the operation in hardware requires 16 writes and 64 reads, while the number of arithmetic operations needed (additions and multiplications) is 1200. As we see, the use of a highly parallel architecture can drastically reduce the number of cycles needed to complete the operation.

6. CONCLUSIONS

We have managed to build a functional system capable of creating synthetic images from a much smaller amount of information than traditional systems, i.e. systems based on triangle meshes, require. Moreover, we have developed specialised hardware to accelerate the slowest operation of the algorithm. As seen in section 5, the specialised hardware speeds up that operation by more than 4x. According to Amdahl's law, since the original code spent nearly 50% of its time on this operation, the final speedup is 1.6x.

Spline-based rendering also has other advantages. One of them is that splines let the user define curved surfaces with a precision that cannot be achieved using triangles. Since triangles are flat by definition, any curved shape defined with them is not truly curved, only roughly so.

Furthermore, the cost of the algorithm depends more on the image size than on the objects in the scene. This is very important in scenarios where a constant frame rate is desirable.

7. FUTURE PROSPECTS

Although working with fixed-point arithmetic is very fast, it has two main disadvantages: first, the range of the operands must be completely known in advance; second, some operations are known to fall outside any fixed interval, such as divisions by small values.

Therefore, it would be very beneficial for the algorithm to perform all operations in floating-point arithmetic. Another solution could be to add hardware that converts data from floating point to fixed point and vice versa.


Another future direction would be to perform more operations in parallel. For example, the hardware could calculate the control sub-polygons, the bounding box and the normal vector at the same time; moreover, every mathematical operation has some implicit parallelism that could be exploited.

8. REFERENCES

[1] http://www.xilinx.com/support/documentation/virtexii pro.htm.

[2] J. Epperson, "History of splines," vol. 98, no. 26, July 1998.

[3] P. de Casteljau, Courbes à pôles, 1959.

[4] G. Farin, Curves and surfaces for computer-aided geometric design: a practical guide. Morgan Kaufmann, 1997.

[5] Birkhoff and de Boor, "Piecewise polynomial interpolation and approximation," Proc. General Motors Symposium of 1964, pp. 164-190, 1965.

[6] M. Unser, "Splines: a perfect fit for signal and image processing," IEEE Signal Processing Magazine, vol. 16, no. 6, pp. 22-38, Nov. 1999.

[7] F. Tsai, S.-Q. Lin, J.-Y. Rau, L.-C. Chen, and G.-R. Liu, "Destriping Hyperion imagery using spline interpolation," Proc. 26th Asian Conference on Remote Sensing, Nov. 2005.

[8] http://www.xilinx.com/prs4rls/2007/end markets/0703 xylon3dCES.htm.

[9] http://wiki.opengraphics.org/tiki-index.php.

[10] P. Davis, "B-splines and geometric design," SIAM News, no. 29, June 1996.

[11] A. Watt, 3D Computer Graphics. Addison-Wesley.
