Joint EUROGRAPHICS - IEEE TCVG Symposium on Visualization (2003), G.-P. Bonneau, S. Hahmann, C. D. Hansen (Editors)

Smart Hardware-Accelerated Volume Rendering

Stefan Roettger¹, Stefan Guthe², Daniel Weiskopf¹, Thomas Ertl¹, Wolfgang Strasser²

¹ VIS Group, University of Stuttgart †

² WSI-GRIS, University of Tübingen

Abstract

For volume rendering of regular grids the display of view-plane aligned slices has proven to yield both good quality and performance. In this paper we demonstrate how to merge the most important extensions of the original 3D slicing approach, namely the pre-integration technique, volumetric clipping, and advanced lighting. Our approach allows the suppression of clipping artifacts and achieves high quality while offering the flexibility to explore volume data sets interactively with arbitrary clip objects. We also outline how to utilize the proposed volumetric clipping approach for the display of segmented data sets. Moreover, we increase the rendering quality by implementing efficient over-sampling with the pixel shader of consumer graphics accelerators. We show that at least 4-times over-sampling is needed to reconstruct the ray integral with sufficient accuracy even with pre-integration. As an alternative to this brute-force over-sampling approach we propose a hardware-accelerated ray caster which is able to perform over-sampling only where needed and which gains additional speed by early ray termination and space leaping.

CR Category: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism.

1. Introduction

The basic principle of volume rendering was described by Kajiya [10] as early as 1984. Since then the availability of graphics hardware has led to the establishment of the so-called 3D slicing method [13]. The ray integral for each pixel, which Kajiya reconstructed by ray tracing, is calculated with graphics hardware by rendering view-port aligned slices through the volume. In this way the emission and absorption along each viewing ray through the volume is computed by sampling and blending the volume for each numerical integration step [5]. The main advantage is that the entire task can be performed by the graphics hardware and is limited only by the fill rate of the graphics accelerator.

A volume is commonly given as a scalar function sampled on a uniform grid. The main sources for this kind of volume representation are CT (Computed Tomography) or MRI (Magnetic Resonance Imaging) scanners, which are widely used in medical imaging. In order to associate opacity and emission with the scalar values, the volume density optical model of Williams et al. [25, 13] is usually applied. It defines

† University of Stuttgart, Computer Science Institute, VIS Group, Breitwiesenstrasse 20-22, 70565 Stuttgart, Germany; E-mail: [email protected].

the opacity and the chromaticity vector to be functions of the scalar values. When slicing a volume in a back-to-front fashion, the opacity and chromaticity are stored in a one-dimensional look-up table (often referred to as the transfer function) which is used to transform the scalar value at each rendered pixel into an RGBA vector. Then the ray integral can be calculated by simply alpha-compositing the slices.
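This slicing principle can be sketched on the CPU for a single viewing ray: scalar samples are mapped through the 1D RGBA transfer function and alpha-composited back to front. All names and the toy transfer function below are ours, purely for illustration.

```python
# CPU sketch of back-to-front compositing along one viewing ray,
# using a 256-entry RGBA transfer function (look-up table).

def composite_back_to_front(samples, transfer_function):
    """samples: scalar values (0..255) at the slice positions, front first."""
    r = g = b = 0.0
    for s in reversed(samples):              # traverse back to front
        sr, sg, sb, sa = transfer_function[s]
        # over-operator: the nearer slice occludes what lies behind it
        r = sr * sa + r * (1.0 - sa)
        g = sg * sa + g * (1.0 - sa)
        b = sb * sa + b * (1.0 - sa)
    return (r, g, b)

# toy transfer function: scalar value 128 emits opaque red
tf = [(0.0, 0.0, 0.0, 0.0)] * 256
tf[128] = (1.0, 0.0, 0.0, 1.0)
```

In the hardware version this loop is replaced by rendering one textured slice per sample and letting the frame buffer blending perform the over-operator.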

This basic principle of hardware-accelerated volume rendering of regular grids has been extended in several ways to increase the quality of the renderings. In the following we give a list of the well-established extensions and their purposes:

- Ambient and Diffuse Lighting: The emission of the volume is attenuated by a lighting operation [22, 24] to give more visual cues regarding the curvature of details in the volume.
- Pre-Integration: The slicing of the volume results in so-called ring artifacts which are due to insufficient over-sampling of high frequencies in the transfer function. This issue is resolved by rendering slabs instead of slices. The ray integral of the ray segments inside each slab is a function of the scalar values at the entry and exit points of the ray, and the thickness of the slab (see also Figure 1). By pre-integrating [12, 19] the transfer function for each combination of scalar values, the ray integral can be looked up easily for each rendered pixel using 2D dependent texturing [6].

© The Eurographics Association 2003.


Roettger et al. / Smart Hardware-Accelerated Volume Rendering

- Hardware-Accelerated Pre-Integration: Since pre-integration involves a fair amount of numerical operations, a change of the transfer function leads to a delay of the volume visualization while the 2D dependent texture has to be recomputed. If self-attenuation is neglected, interactive update rates of up to 10 Hz [6] are achieved for a transfer function with 256 entries. However, this introduces artifacts in regions where the transfer function is very opaque. A better solution is to speed up the computation of the exact ray integral by using graphics hardware [18, 7]. The quantization artifacts that normally arise with an 8-bit frame buffer are overcome by using the floating point render target of modern graphics accelerators such as the ATI Radeon 9700 [2] or NVIDIA GeForce FX [16]. A 256-step hardware-accelerated pre-integration with full floating point precision takes about 50 milliseconds on the ATI Radeon 9700.
- Material Properties: Besides the pre-integration of the transfer function, one can also pre-integrate material properties which describe the fraction of ambient, diffuse, and specular lighting that has to be applied to each rendered slab [14].
- Volumetric Clipping: To explore the interior of a volume conveniently, volumetric clipping [23] is used to cut off parts of the volume which otherwise would hide important information. For that purpose the clip geometry is defined by an iso-surface of an additional clip volume (which need not necessarily be of the same size as the scalar volume). During rendering, the opacity of each pixel is set to zero in the pixel shader if the corresponding clip value exceeds a certain threshold.
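The fast transfer-function updates mentioned in the list rely on the fact that, with self-attenuation inside a slab neglected, the 2D pre-integration table follows from 1D prefix sums over the transfer function instead of integrating every (front, back) pair separately. A minimal sketch, assuming unit slab thickness; all names are ours:

```python
import math

# Build a 2D pre-integration table from 1D prefix sums of the
# transfer function (self-attenuation inside the slab neglected).

def build_table(tf, l=1.0):
    """tf: list of (r, g, b, rho) entries; returns the 2D table of
    pre-integrated (r, g, b, alpha), indexed by [front][back]."""
    n = len(tf)
    K = [(0.0, 0.0, 0.0)]   # prefix sums of density-weighted emission
    T = [0.0]               # prefix sums of optical density
    for (r, g, b, rho) in tf:
        kr, kg, kb = K[-1]
        K.append((kr + r * rho, kg + g * rho, kb + b * rho))
        T.append(T[-1] + rho)
    table = [[None] * n for _ in range(n)]
    for sf in range(n):
        for sb in range(n):
            lo, hi = min(sf, sb), max(sf, sb)
            w = l / (hi - lo + 1)          # average over covered entries
            r = (K[hi + 1][0] - K[lo][0]) * w
            g = (K[hi + 1][1] - K[lo][1]) * w
            b = (K[hi + 1][2] - K[lo][2]) * w
            alpha = 1.0 - math.exp(-(T[hi + 1] - T[lo]) * w)
            table[sf][sb] = (r, g, b, alpha)
    return table
```

This O(n²) update path is what makes interactive transfer function editing feasible; the exact integral, in contrast, requires one numerical integration per table entry.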

In summary, the described extensions have led to high-quality volume rendering using the graphics hardware. However, the combination of the pre-integration technique with volumetric clipping has been an unsolved problem. Hence our first goal is to demonstrate the combination of these advanced techniques. To increase the accuracy even further, we propose a hardware-accelerated ray caster which is able to adapt the sampling frequency to the reconstruction error of the ray integral.

2. Accurate Volumetric Clipping

In order to combine pre-integration with volumetric clipping, basically three steps are necessary. Let C_f and C_b be the scalar values of the clip volume at the entry and exit points of the ray segment, in the range [0, 1]. We also assume that the volume corresponding to clip values smaller than 0.5 should be clipped away. If both parameters C_f and C_b are above 0.5, then the slab is completely visible and no special treatment is necessary. Considering the case C_f < 0.5 and C_b ≥ 0.5 as depicted in Figure 1, only the dark blue part of the volume has to be rendered. In this case we first have to set the front scalar value S_f to the value S̄_f at the entry point into the clipped region. Thus we perform a look-up into the pre-integration table with the parameters (S̄_f, S_b) rather than with (S_f, S_b). In the general case S_f is replaced by S̄_f according to

    r = [(0.5 − C_f) / (C_b − C_f)],    S̄_f = (1 − r) S_f + r S_b.

The brackets denote clamping to the range [0, 1]. For the case C_f ≥ 0.5 and C_b ≥ 0.5, the front scalar value need not be adjusted, which is expressed by r = 0. The same holds for the case C_f ≥ 0.5 and C_b < 0.5. Similarly, the parameter S_b is replaced by S̄_b as follows:

    g = 1 − [(0.5 − C_b) / (C_f − C_b)],    S̄_b = (1 − g) S_f + g S_b.

If both clip values are below 0.5, the ray segment is clipped entirely, and thus the scalar values do not matter.
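The two adjustment equations can be sketched as follows; the exact order of clamping is not spelled out in the text, so we arrange it (an assumption on our part) such that an unclipped endpoint is left unchanged, i.e. r = 0 and g = 1 respectively:

```python
# Sketch of the scalar-value adjustment for volumetric clipping
# (clip threshold 0.5). Clamping order is our assumption.

def clamp01(x):
    return min(max(x, 0.0), 1.0)

def adjusted_scalars(sf, sb, cf, cb):
    """Return (S̄_f, S̄_b) for clip values cf, cb at the slab endpoints."""
    if cf == cb:                       # constant clip value along the slab:
        return (sf, sb)                # fully visible (or fully clipped)
    # clamp so that an unclipped endpoint stays unchanged
    r = clamp01(max(0.0, 0.5 - cf) / (cb - cf))
    g = 1.0 - clamp01(max(0.0, 0.5 - cb) / (cf - cb))
    return ((1.0 - r) * sf + r * sb, (1.0 - g) * sf + g * sb)
```

For a front-clipped slab (C_f < 0.5 ≤ C_b) only S_f moves; for a back-clipped slab only S_b moves; a fully visible slab is returned unchanged.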

Figure 1: Using slabs instead of slices for pre-integrated volume rendering as introduced by Engel et al. [6]. The scalar values at the entry and the exit point of the viewing ray are denoted by S_f and S_b, respectively. The thickness of the slab is denoted by l. The dark blue region remains after volumetric clipping. For this purpose, S_f has to be replaced by S̄_f.

The second problem we have to care about is the reduced length l̄ of the clipped ray segment. The numerical pre-integration depends on the three parameters S_f, S_b and l (see Figure 1). Using the optical model of Williams and Max [25] with the chromaticity vector κ and the scalar optical density ρ, the ray integral for each ray segment is given as follows:

    S_l(x) = S_f + (x / l) (S_b − S_f)

    C(S_f, S_b, l) = ∫₀ˡ exp(−∫₀ᵗ ρ(S_l(u)) du) κ(S_l(t)) ρ(S_l(t)) dt

    θ(S_f, S_b, l) = exp(−∫₀ˡ ρ(S_l(t)) dt),    α = 1 − θ

Now we have to distinguish between two different types of transfer functions. For a transfer function that defines a set of isosurfaces, the reduced ray segment length l̄ has no effect on the integrated chromaticity C and the integrated opacity α. For the other common case of a transfer function defined by a lookup table, the three-dimensional integration problem can be reduced to two dimensions.
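A direct numerical evaluation of these formulas, for a single chromaticity channel, might look as follows; `kappa` and `rho` are callables mapping a scalar value to chromaticity and optical density, and the step count is illustrative:

```python
import math

# Numerically evaluate C(S_f, S_b, l) and θ(S_f, S_b, l) along the
# linear ramp S_l(x) = S_f + (x/l)(S_b - S_f), midpoint rule.

def segment_integral(sf, sb, l, kappa, rho, steps=256):
    dt = l / steps
    C = 0.0
    tau = 0.0                              # optical depth accumulated so far
    for k in range(steps):
        s = sf + ((k + 0.5) / steps) * (sb - sf)   # midpoint sample
        # emission attenuated by the depth up to the midpoint of this step
        C += math.exp(-tau - 0.5 * rho(s) * dt) * kappa(s) * rho(s) * dt
        tau += rho(s) * dt
    return C, math.exp(-tau)               # theta; opacity is alpha = 1 - theta
```

Tabulating this for all (S_f, S_b) pairs at fixed l is exactly what fills the 2D pre-integration table.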



The pre-integration is performed for the constant ray segment length l. Let the visible fraction of the slab be denoted by b = l̄ / l. Then the transparency θ̄ of the clipped ray segment is the pre-integrated transparency θ raised to the b-th power, because

    ∫₀^l̄ ρ(S_l(t)) dt = b ∫₀ˡ ρ(S_l(t)) dt

and therefore

    θ̄ = exp(−b ∫₀ˡ ρ(S_l(t)) dt) = (exp(−∫₀ˡ ρ(S_l(t)) dt))^b = θ^b.

In practice, however, a first-order approximation is sufficient if the thickness of the slabs is reasonably small.
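In terms of opacity, the exact correction and its first-order approximation read as follows (function names are ours):

```python
# Opacity of a clipped slab covering the fraction b of the full slab.

def clipped_alpha(alpha, b):
    """Exact: θ̄ = θ^b, i.e. ᾱ = 1 - (1 - α)^b."""
    return 1.0 - (1.0 - alpha) ** b

def clipped_alpha_approx(alpha, b):
    """First-order approximation, valid for thin (low-opacity) slabs."""
    return b * alpha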

If self-attenuation is neglected, the emission of the clipped ray segment is given by C̄ = bC. The computation of the correct emission cannot be expressed in terms of a two-dimensional pre-integration. For accurate results a three-dimensional pre-integration table has to be used, but high-order polynomial approximations [7] do a remarkably good job of avoiding the huge three-dimensional pre-integration table. Aside from that issue, neglecting self-attenuation is often a very good approximation in practice.

The third and last problem is the lighting of the slab. Sampling the gradient at a single point results in severe ring artifacts [23]. A better solution is to sample the gradient at the entry and exit points and to adjust the gradients in the same fashion as the scalar values. Then the final light intensity of the slab is the average of both adjusted light intensities. The operations involved here can be expressed as a linear interpolation of the gradients. We denote the interpolation factor by a.

Instead of calculating the factors for the adjustment of the scalar values, the emission, the opacity, and the gradients in the pixel shader, we pre-compute the factors for all combinations of the clip values C_f and C_b and store them in a 2D dependent texture. The content of each R/G/B/A channel of the dependent texture as depicted in Figure 2 corresponds to the adjustment factor of the same name.
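Precomputing such a factor texture can be sketched as follows. The texture resolution, the clamping order, and the choice of the gradient factor a as the midpoint of the visible sub-interval are our assumptions:

```python
# Sketch of precomputing the 2D dependent texture of adjustment
# factors over all clip value pairs (C_f, C_b); threshold 0.5.

def clamp01(x):
    return min(max(x, 0.0), 1.0)

def factor_texture(n=256):
    tex = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cf, cb = i / (n - 1), j / (n - 1)
            if cf == cb:                 # constant clip value along the slab
                r, g = (0.0, 1.0) if cf >= 0.5 else (1.0, 0.0)
            else:
                r = clamp01(max(0.0, 0.5 - cf) / (cb - cf))
                g = 1.0 - clamp01(max(0.0, 0.5 - cb) / (cf - cb))
            b = max(0.0, g - r)          # visible fraction of the slab
            a = 0.5 * (r + g)            # gradient interpolation factor
            tex[i][j] = (r, g, b, a)
    return tex
```

At render time the pixel shader then needs only one dependent texture fetch and a handful of interpolations instead of evaluating these expressions per pixel.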

Figure 2: 2D dependent texture containing the adjustment factors used for accurate volumetric clipping (R/G/B/A channels are depicted from left to right).

Using pixel shader version 2.0 [15], both scalar values can be adjusted by a single linear interpolation ("lrp" instruction). The entire pixel shader performing pre-integration, ambient and diffuse lighting, and accurate volumetric clipping is given in Appendix 8.2.

A comparison of the rendering quality between our approach and a naive application of the method by Weiskopf et al. [23] to pre-integration is depicted in Figure 3 (see Appendices 8.1 and 8.2 for the pixel shaders). A sphere has been cut out of the Bucky Ball data set so that the holes of the carbon rings are visible as green spots. Both images have been rendered with only 32 slices. Whereas the slices are clearly visible on the left, our method reproduces the clipped volume accurately. This means that our method is also exact in the viewing direction.

Figure 3: Comparison of volumetric clipping quality. Left: Naive approach. Right: Accurate method.

The presented volumetric clipping algorithm can also be extended to render segmented volumes [21]. In conjunction with the pre-integration technique this was an unsolved problem as well. If each segment is defined as a clip volume, this problem can be reduced to volumetric clipping.

For the case of two segmented volumes, a clip volume is set up to include one entire segment. For the second segment we use an additional transfer function. For the part of the slab which is clipped away we perform another dependent texture lookup into the second pre-integration table and blend the two partial slabs depending on their orientation. For each additional segment this procedure has to be repeated.

3. Over-Sampling

In the last section we have described the combination of the pre-integration technique with volumetric clipping to improve the quality of volume visualizations. Nevertheless, even the mentioned advanced techniques are no guarantee of artifact-free renderings.

A fundamental assumption of volume rendering is that the scalar values of the volume are interpolated tri-linearly. The pre-integration technique has eliminated many of the rendering artifacts that arise from slicing the volume (compare the top right and bottom left images of Figure 4), but it assumes a linear progression of the scalar values inside the slabs. This assumption does not match the actual cubic behaviour of the scalar values. Furthermore, the scalar function may have a sharp bend if a voxel boundary is crossed inside a slab. This


non-linear effect is amplified by lighting in regions with high second derivatives.

Considering pre-integration alone, the exact reconstruction of the ray integral is a four-dimensional problem (for constant ray segment length l). Including the effect of lighting, it can only be solved efficiently with over-sampling. The top row of Figure 4 illustrates the artifacts that may occur without over-sampling. Using 2-times over-sampling helps a lot, but there is still a large difference when stepping up to 4-times over-sampling. Fortunately, 8-times over-sampling leads to only insignificant further improvements, so that 4-times over-sampling is already a good choice for highest quality in practice.

Current graphics hardware allows multiple blending steps to be performed at once in the pixel shader. We have implemented a pixel shader program that blends four slabs internally before it writes the result back into the frame buffer (see Appendix 8.3). As a side effect, the four slabs (see Figure 1) are blended with the high internal precision of the pixel shader.
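The internal blending of several sub-slabs can be sketched as follows (back-to-front, as in the shader of Appendix 8.3; the `sample` and `lookup` callbacks stand in for the volume fetch and the 2D pre-integration table, and are our naming):

```python
# Composite one slab from k pre-integrated sub-slabs, back to front,
# before the result would be written to the frame buffer.

def oversample_slab(sample, k, lookup):
    """sample(t): scalar value at parameter t in [0,1] within the slab.
    lookup(s0, s1): pre-integrated, pre-multiplied RGBA of a sub-slab."""
    r = g = b = 0.0
    theta = 1.0                         # accumulated transparency
    for i in reversed(range(k)):        # back-most sub-slab first
        cr, cg, cb, ca = lookup(sample(i / k), sample((i + 1) / k))
        # over-operator: the nearer sub-slab occludes what lies behind it
        r = cr + (1.0 - ca) * r
        g = cg + (1.0 - ca) * g
        b = cb + (1.0 - ca) * b
        theta *= 1.0 - ca
    return (r, g, b, 1.0 - theta)
```

Performing all k blend steps inside one shader invocation is what yields the cache-coherence speedup and keeps the intermediate results at full internal precision.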

For a window size of 512² pixels the rendering time is between 1 and 1.7 seconds for this multi-stepped approach. Because of the increased cache coherence we achieve a speedup of 70%-120% over the single-stepped version, depending on the data set and viewing parameters. We conclude that 2-times over-sampling comes almost for free when using multi-stepping. For very high quality requirements one has to accept about twice the normal rendering time.

4. Hardware-Accelerated Ray Casting

The previously discussed over-sampling approach suppresses artifacts caused by neglecting tri-linear interpolation, the crossing of voxel boundaries, and the non-linear behaviour of lighting. While this brute-force approach is practically artifact-free, it does not exploit any spatial coherence in the data set to increase the rendering performance. In comparison to brute-force over-sampling, a ray caster [4, 11] is able to adapt the sampling rate to the actual information in the data set. This often leads to a drastically reduced number of sample points.

Just as the flexibility of current graphics hardware allows hardware-accelerated ray tracing [17], it also allows ray casting to be performed completely in hardware. In the following we describe a smart hardware-accelerated ray caster which applies an error-driven sampling scheme and additionally performs early ray termination.

The pre-integration technique is based on a piece-wise linear approximation of the ray integral; thus all higher-order frequencies are neglected. In the optimal case one would choose the step length such that the difference between the exact ray integral and the linear approximation is lower than a pre-defined threshold. But this solution is infeasible since it requires the pre-integration of the entire data set.

Figure 4: A leaf of the Bonsai [20]. The degradation of rendering quality is displayed from top left to bottom right: with pre-integration and 4-times over-sampling, with pre-integration and 2-times over-sampling, with pre-integration and without over-sampling, and with neither pre-integration nor over-sampling. In the bottom right corner of each image a zoom of a critical region is depicted which should show a smooth color transition. Due to slicing artifacts, artificial bands are visible in the bottom right image. These remain even with 2-times over-sampling but almost disappear with 4-times over-sampling.

Instead of solving the global problem we resort to solving the local problem, which basically requires the computation of the local approximation error of the scalar function and the visibility of this error, which is determined by the transfer function. A natural strategy to compute the step length should be based on at least the second derivative of the scalar function and the pre-integrated emission and opacity of the transfer function. This approach, which we call adaptive pre-integration, automatically subsumes the well-known acceleration technique of space leaping [5].
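Such an error-driven step length can be sketched as follows; the particular product weighting of curvature and visibility, and all names, are our illustrative assumptions, not the paper's exact criterion:

```python
# Choose a sampling distance from the local curvature of the scalar
# field and the visibility of the resulting error under the current
# transfer function. Zero visible error yields the maximum step,
# which is exactly space leaping.

def step_length(second_derivative, visibility, eps, l_min, l_max):
    err = abs(second_derivative) * visibility   # estimated visible error
    if err == 0.0:
        return l_max                            # space leaping
    return min(max(eps / err, l_min), l_max)
```

Precomputing such per-voxel step lengths is one way to fill the importance volume from which the ray caster fetches its next sampling distance.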

Conceptually, the process of ray casting can be divided into three main tasks: the ray setup, the sampling of the ray, and the numerical integration of the samples. In order to exploit the massive parallelism of the graphics hardware, we compute the integration for all rays in parallel. For each step the front faces of the bounding box are rendered to process all visible rays in a front-to-back fashion (see Figure 5).

The ray position is determined by a floating point render target which contains the current ray parameter. For a given sampling distance l the ray parameter is updated accordingly


and written back into the floating point render target. Simultaneously, the integrated chromaticity and opacity are written to additional floating point render targets (we use two targets for the RG and BA channels). A so-called importance volume is used to extract the maximum isotropic sampling distance as previously defined.


Figure 5: Ray casting scheme: All viewing rays are processed simultaneously. For each integration step the pre-integration technique is used. After each step the colors are blended and the ray parameter corresponding to the next sampling position is written back into alternating render targets (ping-pong rendering).

A ray is terminated if it either leaves the volume or if the integrated opacity is approximately one. If a ray is terminated we have to prevent further computation. For this purpose, we use the Z-buffer since it is tested before any pixel shader computation. This also includes texture reads. If the ray ends we set the Z-buffer such that the corresponding pixel is not processed anymore. This requires an additional rendering pass.
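Per pixel, the effect of early ray termination corresponds to the following front-to-back loop; the opacity threshold is our illustrative choice:

```python
# Front-to-back integration of pre-integrated slabs with early ray
# termination, mirroring what the Z-buffer masking achieves per pixel.

def march_ray(segments, alpha_threshold=0.99):
    """segments: pre-integrated, pre-multiplied RGBA slabs, front first."""
    r = g = b = 0.0
    t = 1.0                                   # transparency so far
    steps = 0
    for (cr, cg, cb, ca) in segments:
        r += t * cr; g += t * cg; b += t * cb
        t *= 1.0 - ca
        steps += 1
        if 1.0 - t >= alpha_threshold:        # ray is (almost) saturated
            break
    return (r, g, b, 1.0 - t), steps
```

In the hardware version the loop body is one rendering pass over all rays at once, and the early exit is realized by writing the Z-buffer so that finished pixels are rejected before any shader work.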

To find out whether all rays have been terminated we apply an asynchronous occlusion query. In total, we require 2n + 1 passes, with n being the maximum number of samples for the rays in the generated image. The first pass additionally sets up the initial ray parameters. In order to allow re-implementation of the ray caster, the pixel shader 2.0 code is included in Appendices 8.4, 8.5, and 8.6.

In contrast to the brute-force slicing approach, the total number of samples depends on the transfer function and on the coherence of the volume data. Also considering early ray termination, the number of samples (see Figure 6) is always less than for the slicing approach, but the same reconstruction quality is achieved. In addition, all computations including blending are performed with full floating point precision, so that artifacts due to frame buffer quantization cannot occur (compare Figure 7).

5. Results

All performance measurements have been conducted on a PC equipped with an ATI Radeon 9700 graphics accelerator.

Figure 6: Number of sampling steps for different transfer functions: on the left the original image is depicted and at the center the corresponding number of sampling steps is shown (white corresponds to 512 samples). On the right a more opaque transfer function was chosen to illustrate the impact of early ray termination.

Figure 7: Quality comparison between slicing with pre-integration and 4-times over-sampling (left) and ray casting with full floating point accuracy and adaptive pre-integration (right). On the left, highly transparent areas are neglected due to 8-bit frame buffer quantization.

Figure 8 shows a Bonsai [20] of size 256³ with and without accurate volumetric clipping using 4-times over-sampling. Rendering times are approximately 2 and 1.5 seconds, respectively. The rendering times for the ray-casted bonsai with opaque and semi-transparent transfer functions are between 1 and 3 seconds per frame (Figures 6 and 9). The smaller NegHIP data set (Figure 7) with a size of 128³ voxels took 2.2 seconds with a semi-transparent transfer function and 0.7 seconds with an opaque transfer function.

In comparison to traditional slicing, the quality of ray casting is always higher. Depending on the transfer function, ray casting may even be faster. By increasing the error threshold, the rendering time can easily be reduced to 0.1 seconds per frame. The artifacts introduced here are much less visible than for the analogous case of reducing the number of slices.

With upcoming graphics accelerators such as the NVIDIA GeForce FX, we expect the performance of a hardware-accelerated ray caster to increase much more than the performance of regular multi-stepped slicing. The reasons are as follows: First, an additional pass is no longer


Figure 8: Bonsai with and without volumetric clipping.

Figure 9: Hardware-accelerated ray casting.

needed for ray termination. Second, multiple steps can be performed at once because of a higher maximum instruction count. As buffer writes between the passes consume most of the rendering time, this overhead can be reduced dramatically with the NVIDIA GeForce FX.

The performance of a software ray caster [11] is approximately the same as for our hardware-accelerated approach. However, the software ray caster achieves its best performance only without pre-integration, so the resulting quality is not comparable.

6. Conclusion

We have combined volumetric clipping with the pre-integration technique to achieve high-quality volume visualizations. Since pre-integration does not completely solve the problem of the reconstruction of the ray integral, we presented a hardware-accelerated ray caster. By adaptive pre-integration of the viewing rays, the reconstruction error can be guaranteed to be below the threshold contrast sensitivity [9]. The ray caster also applies pre-integration, space leaping, and early ray termination. In the future we plan to speed up the ray caster by using hierarchical approaches [8] that will allow the visualization of very large data sets.

7. Acknowledgements

We would like to thank Mike Doggett of ATI for his valuable support, and especially for providing an ATI Radeon 9700 graphics accelerator. We would also like to thank Michael Wand for interesting discussions on the topic of importance volumes.

8. Appendix

8.1. Naive Volumetric Clipping:

ps_2_0                      // ps 2.0
dcl_volume s0               // 3D RGBA volume
dcl_2d s1                   // 2D pre-integration table (RGBA)
dcl_volume s2               // 3D clip volume (Luminance)
dcl t0.xyz                  // texcoord front
dcl t1.xyz                  // texcoord back
dcl t2.xyz                  // normalized eye vector
def c0,0.5,1.0,2.0,0.0      // constant definitions
texld r0,t0,s0              // get front scalar value
texld r1,t1,s0              // get back scalar value
texld r4,t0,s2              // get front clip value
mov r2.r,r0.a               // move front scalar value to r2.r
mov r2.g,r1.a               // move back scalar value to r2.g
texld r2,r2,s1              // pre-integration lookup
mul r2.rgb,r2,r2.a          // post-multiply alpha
mad r0.rgb,c0.b,r0,-c0.g    // expand normal
dp3_sat r3,t2,r0            // calculate light intensity
mul r2.rgb,r2,r3            // multiply emission with intensity
add r4.r,r4.r,-c0.r         // depending on comparison with 0.5..
cmp r2.a,r4.r,r2.a,c0.a     // ..set opacity to zero
mov oC0,r2                  // move to output register

8.2. Accurate Volumetric Clipping:

ps_2_0                      // ps 2.0
dcl_volume s0               // 3D volume
dcl_2d s1                   // 2D pre-integration table
dcl_volume s2               // 3D clip volume
dcl_2d s3                   // clip coefficient table (RGBA)
dcl t0.xyz                  // texcoord front
dcl t1.xyz                  // texcoord back
dcl t2.xyz                  // normalized eye vector
def c0,0.5,1.0,2.0,0.0      // constant definitions
texld r0,t0,s0              // get front scalar value
texld r1,t1,s0              // get back scalar value
texld r4,t0,s2              // get front clip value
texld r5,t1,s2              // get back clip value
mov r4.g,r5.r               // move back clip value to r4.g
texld r4,r4,s3              // get clip coefficients
lrp r2,r4,r1.a,r0.a         // adjust scalar values
texld r2,r2,s1              // pre-integration lookup
mul r2.rgb,r2,r2.a          // post-multiply alpha
mad r0.rgb,c0.b,r0,-c0.g    // expand front normal
dp3_sat r3,t2,r0            // calculate front light intensity
mad r1.rgb,c0.b,r1,-c0.g    // expand back normal
dp3_sat r3.g,t2,r1          // calculate back light intensity
add_sat r3,c0.r,r3          // bias light intensities
lrp r5.r,r4.a,r3.g,r3.r     // adjust light intensities
mul r2.rgb,r2,r5.r          // multiply emission with intensity
mul r2,r2,r4.b              // attenuate emission and opacity
mov oC0,r2                  // move to output register

8.3. 4 Steps @ Once:

ps_2_0                      // ps 2.0
dcl_volume s0               // 3D volume
dcl_2d s1                   // 2D pre-integration table
dcl t0.xyz                  // texcoord front
dcl t1.xyz                  // texcoord back
dcl t2.xyz                  // normalized eye vector
dcl t3.xyz                  // second texcoord front
dcl t4.xyz                  // second texcoord back
dcl t5.xyz                  // central texcoord


def c0,0.5,1.0,2.0,0.0 // constant definitions
texld r0,t0,s0 // get front scalar value
texld r1,t1,s0 // get back scalar value
texld r3,t3,s0 // get second front scalar value
texld r4,t4,s0 // get second back scalar value
texld r6,t5,s0 // get central scalar value
mad r0.rgb,c0.b,r0,-c0.g // expand front normal
dp3_sat r0.r,t2,r0 // calculate front light intensity
add_sat r0.r,c0.r,r0.r // bias light intensity
mad r3.rgb,c0.b,r3,-c0.g // expand second front normal
dp3_sat r3.r,t2,r3 // calculate second front light intensity
add_sat r3.r,c0.r,r3.r // bias light intensity
mad r6.rgb,c0.b,r6,-c0.g // expand central normal
dp3_sat r6.r,t2,r6 // calculate central light intensity
add_sat r6.r,c0.r,r6.r // bias light intensity
mad r4.rgb,c0.b,r4,-c0.g // expand second back normal
dp3_sat r4.r,t2,r4 // calculate second back light intensity
add_sat r4.r,c0.r,r4.r // bias light intensity
mov r2.r,r4.a // move second back scalar value
mov r2.g,r1.a // move back scalar value
texld r2,r2,s1 // pre-integration lookup
mul r2.rgb,r2,r2.a // post-multiply alpha
add r2.a,c0.g,-r2.a // compute transparency
mul r2.rgb,r2,r4.r // multiply emission with intensity
mov r5.r,r6.a // move central scalar value
mov r5.g,r4.a // move second back scalar value
texld r5,r5,s1 // pre-integration lookup
mul r5.rgb,r5,r5.a // post-multiply alpha
add r5.a,c0.g,-r5.a // compute transparency
mul r5.rgb,r5,r6.r // multiply emission with intensity
mad r2.rgb,r2,r5.a,r5 // blend operation (1st)
mul r2.a,r2.a,r5.a // accumulate alpha
mov r5.r,r3.a // move second front scalar value
mov r5.g,r6.a // move central scalar value
texld r5,r5,s1 // pre-integration lookup
mul r5.rgb,r5,r5.a // post-multiply alpha
add r5.a,c0.g,-r5.a // compute transparency
mul r5.rgb,r5,r3.r // multiply emission with intensity
mad r2.rgb,r2,r5.a,r5 // blend operation (2nd)
mul r2.a,r2.a,r5.a // accumulate alpha
mov r5.r,r0.a // move front scalar value
mov r5.g,r3.a // move second front scalar value
texld r5,r5,s1 // pre-integration lookup
mul r5.rgb,r5,r5.a // post-multiply alpha
add r5.a,c0.g,-r5.a // compute transparency
mul r5.rgb,r5,r0.r // multiply emission with intensity
mad r2.rgb,r2,r5.a,r5 // blend operation (3rd)
mul r2.a,r2.a,r5.a // accumulate alpha
add r2.a,c0.g,-r2.a // compute opacity
mov oC0,r2 // move to output register
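In CPU form, the shader above performs three pre-integrated blend operations per fragment, compositing the sub-slabs of one slice back to front. The following is a minimal Python sketch, not the shader itself; the `preint` lookup table and the per-boundary `intensities` are illustrative stand-ins for the pre-integration texture and the computed diffuse light intensities.

```python
def composite_slab(samples, intensities, preint):
    """Blend the sub-slabs of one slice back to front.

    samples     -- scalar values at the sub-slab boundaries, front to back
    intensities -- diffuse light intensity at each sub-slab's front boundary
    preint      -- lookup: (s_front, s_back) -> (r, g, b, alpha)
    """
    rgb = [0.0, 0.0, 0.0]
    transparency = 1.0
    # walk the sub-slabs from back to front, as the shader does
    for i in range(len(samples) - 2, -1, -1):
        r, g, b, a = preint[(samples[i], samples[i + 1])]
        li = intensities[i]
        # post-multiply alpha and modulate emission by the light intensity
        src = [r * a * li, g * a * li, b * a * li]
        t = 1.0 - a  # transparency of this sub-slab
        # "over" blend: front emission plus attenuated color from behind
        rgb = [src[c] + t * rgb[c] for c in range(3)]
        transparency *= t  # accumulate alpha as in the shader
    return rgb, 1.0 - transparency  # composited color and opacity
```

With two sub-slabs of alpha 0.5 each, the accumulated transparency is 0.25 and the opacity 0.75, matching the shader's final "compute opacity" step.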

8.4. Ray Casting: Ray Setup and First Integration Step

ps_2_0 // ps 2.0
def c0,0.0,2.0,63.75,1.0 // constant definitions
def c1,1.0,0.0,0.333,0.5 // (c2,c3) is the bounding box
dcl t0.xyz // first intersection point with volume
dcl t1.xyz // ray direction (not normalized)
dcl t2.xyz // scaling of texture coordinates
dcl t3.xyz // direction to light source
dcl_volume s0 // 3D volume
dcl_volume s1 // 3D table holding sampling distances
dcl_volume s2 // 3D pre-integration table
texld r2,t0,s0 // sample volume at first intersection
texld r5,t0,s1 // get distance to next sampling point
nrm r0.xyz,t1 // normalize ray direction..
mul r0.xyz,r0,t2 // ..and multiply with scaling factors
add r4.xyz,c2,-t0 // calculate signed distances..
add r3.xyz,c3,-t0 // ..to the bounding box
rcp r6.x,r0.x // calculate reciprocal..
rcp r6.y,r0.y // ..of each component..
rcp r6.z,r0.z // ..of ray direction
mul r3.xyz,r3,r6 // calculate distance to bounding box..
mul r4.xyz,r4,r6 // ..in ray direction
max r6.xyz,r3,r4 // get closest..
min r4.x,r6.x,r6.y // ..non-negative..
min r1.y,r4.x,r6.z // ..intersection point
mov r1.xzw,c0.x // store intersection for ray termination
mad r1.xz,r5.x,c1,r1.x // compute new sampling position
min r1.x,r1.x,r1.y // clamp to volume boundary
mad r3.xyz,r1.x,r0,t0 // calculate second sampling point..
texld r4,r3,s0 // ..and sample volume at this position
mad r2.xyz,r2,c0.y,-c0.w // extract gradient
nrm r8.xyz,r2 // normalize gradient
add r6.w,r1.x,-r1.z // get interval length of current step
lrp r6.xy,c1,r2.w,r4.w // set up texcoords for pre-integration..
mad r6.z,r6.w,c1.z,-c1.z // ..including ray segment length l
texld r7,r6,s2 // pre-integration lookup
dp3_sat r8.w,r8,t3 // calculate light intensity
mul r6.x,c0.z,r6.w // change opacity to correctly..
add r7.a,c0.w,-r7.a // ..represent the ray segment length..
pow r10.a,r7.a,r6.x // ..which includes raising it..
add r7.a,c0.w,-r10.a // ..to the power of l
mul r7.rgb,c1.w,r7 // bias color for lighting
mad r7.rgb,r8.w,r7,r7 // multiply emission with intensity
mul r7.rgb,r7,r7.a // post-multiply alpha
mov r0,r7.b // split color for floating point output..
mov r0.g,r7.a // ..into two render targets
mov oC0,r7 // output color (RG)
mov oC1,r0 // output color (BA)
mov oC2,r1 // output ray parameter
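The ray-setup part of this shader finds the exit distance of the ray from the volume's bounding box with the standard slab method: signed distances to both box planes per axis, scaled by the reciprocal ray direction, followed by a per-axis max and a min across the axes. A minimal Python sketch under the same assumption the shader makes, namely that the ray origin already lies inside the box (the function name is illustrative):

```python
def ray_box_exit(origin, direction, box_min, box_max):
    """Distance along the ray to the point where it leaves the box.

    Assumes the origin lies inside (or on) the box, so the per-axis
    max always picks the slab intersection in the ray direction.
    Axis-parallel rays (a zero direction component) are not handled
    here; the shader's rcp yields infinity in that case.
    """
    t_exit = float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        inv = 1.0 / d                      # reciprocal of direction component
        t1 = (lo - o) * inv                # signed distances to the two
        t2 = (hi - o) * inv                # ..slabs of this axis
        t_exit = min(t_exit, max(t1, t2))  # closest non-negative intersection
    return t_exit
```

For a ray starting at the center of the unit cube with direction (1, 2, 4), the z slabs are hit first and the exit distance is 0.125.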

8.5. Ray Casting: Ray Termination

ps_2_0 // ps 2.0
def c0,-1.0,-1.0,0.0,0.0 // constant definitions
def c1,0.996,0.0,0.0,0.0 // threshold equals 1-1/256
dcl t4.xyzw // position of pixel in screen space
dcl_2d s3 // previous ping-pong buffers holding..
dcl_2d s4 // ..floating point colors
dcl_2d s5 // buffer containing sample positions
texldp r1,t4,s5 // get sampling position..
mov r1.zw,c0.z // ..and clear unused part
texldp r8,t4,s4 // get opacity
add r0,r1.x,-r1.y // calc distance to volume boundary
cmp r1,r0,c0,r1 // check if distance equals 0
add r0,c1.x,-r8.g // compare opacity against threshold..
cmp r1,r0,r1,c0 // ..for early ray termination
mov r0,-r1 // leave shader if the ray..
texkill r0 // ..is not being terminated
texldp r7,t4,s3 // get remaining parts of color
mov oC0,r7 // output floating point color into..
mov oC1,r8 // ..alternate ping-pong buffers
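The termination test combines two conditions: the sampling position has reached the exit distance from the volume, or the accumulated opacity exceeds the threshold 0.996, i.e. roughly 1 - 1/256, below which further contributions are invisible at 8-bit precision. As a hedged Python sketch of the predicate (the names are illustrative, not part of the shader):

```python
def ray_terminated(sample_pos, exit_dist, opacity, threshold=0.996):
    """True if the ray should stop: it left the volume or is opaque enough.

    sample_pos -- current position parameter along the ray
    exit_dist  -- parameter at which the ray leaves the volume
    opacity    -- accumulated opacity of the ray so far
    """
    left_volume = sample_pos >= exit_dist  # distance to boundary reached
    opaque = opacity > threshold           # early ray termination
    return left_volume or opaque
```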

8.6. Ray Casting: One Additional Integration Step

ps_2_0 // ps 2.0
def c0,0.0,2.0,63.75,1.0 // constant definitions
def c1,1.0,0.0,0.333,0.5 // (c2,c3) is the bounding box
dcl t0.xyz // first intersection point with volume
dcl t1.xyz // ray direction (not normalized)
dcl t2.xyz // scaling of texture coordinates
dcl t3.xyz // direction to light source
dcl t4.xyzw // position of pixel in screen space
dcl_volume s0 // 3D volume
dcl_volume s1 // 3D table holding sampling distances
dcl_volume s2 // 3D pre-integration table
dcl_2d s3 // previous ping-pong buffers holding..
dcl_2d s4 // ..floating point colors
dcl_2d s5 // buffer containing sample positions
texldp r1,t4,s5 // get sampling position
texldp r7,t4,s3 // get color from two..
texldp r8,t4,s4 // ..floating point render targets
mov r7.z,r8.x // combine the color channels of..
mov r7.w,r8.y // ..the two render targets
nrm r0.xyz,t1 // normalize ray direction..
mul r0.xyz,r0,t2 // ..and multiply with scaling factors
mad r3.xyz,r1.x,r0,t0 // calculate first sampling point
texld r2,r3,s0 // sample volume at first sampling point
texld r5,r3,s1 // get distance to next sampling point
mad r1.xz,r5.x,c1,r1.x // compute new sampling position
min r1.x,r1.x,r1.y // clamp to volume boundary
mad r3.xyz,r1.x,r0,t0 // calculate second sampling point..
texld r4,r3,s0 // ..and sample volume at this position
mad r2.xyz,r2,c0.y,-c0.w // extract gradient
nrm r8.xyz,r2 // normalize gradient
add r6.w,r1.x,-r1.z // get interval length of current step

© The Eurographics Association 2003.



lrp r6.xy,c1,r2.w,r4.w // set up texcoords for pre-integration..
mad r6.z,r6.w,c1.z,-c1.z // ..including ray segment length l
texld r9,r6,s2 // pre-integration lookup
dp3_sat r8.w,r8,t3 // calculate light intensity
mul r6.x,c0.z,r6.w // change opacity to correctly..
add r9.a,c0.w,-r9.a // ..represent the ray segment length..
pow r10.a,r9.a,r6.x // ..which includes raising it..
add r9.a,c0.w,-r10.a // ..to the power of l
mul r9.rgb,c1.w,r9 // bias color for lighting
mad r9.rgb,r8.w,r9,r9 // multiply emission with intensity
mul r9.rgb,r9,r9.a // post-multiply alpha
add r8.a,c0.w,-r7.a // blend new color..
mad r7,r9,r8.a,r7 // ..with the old one
mov r0,r7.b // split color for floating point output..
mov r0.g,r7.a // ..into two render targets
mov oC0,r7 // output color (RG)
mov oC1,r0 // output color (BA)
mov oC2,r1 // output ray position

References

1. Kurt Akeley. RealityEngine Graphics. Computer Graphics (SIGGRAPH '93 Proceedings), 27:109–116, 1993.

2. ATI. Programmability Features of Graphics Hardware. SIGGRAPH '02 Course Notes, 2002.

3. Brian Cabral, Nancy Cam, and Jim Foran. Accelerated Volume Rendering and Tomographic Reconstruction Using Texture Mapping Hardware. In Proc. Symposium on Volume Visualization '94, pages 91–98. ACM SIGGRAPH, 1994.

4. John Danskin and Pat Hanrahan. Fast Algorithms for Volume Ray Tracing. In Proc. Symposium on Volume Visualization '92, pages 91–98. ACM, 1992.

5. R. A. Drebin, L. Carpenter, and P. Hanrahan. Volume Rendering. Computer Graphics, 22(4):65–74, 1988.

6. K. Engel, M. Kraus, and Th. Ertl. High-Quality Pre-Integrated Volume Rendering Using Hardware-Accelerated Pixel Shading. In Eurographics Workshop on Graphics Hardware '01, pages 9–16. ACM SIGGRAPH, 2001.

7. S. Guthe, S. Roettger, A. Schieber, W. Strasser, and Th. Ertl. High-Quality Unstructured Volume Rendering on the PC Platform. In Proc. EG/SIGGRAPH Graphics Hardware Workshop '02, pages 119–125, 2002.

8. S. Guthe, M. Wand, J. Gonser, and W. Strasser. Interactive Rendering of Large Volume Data Sets. In Proc. Visualization '02, pages 53–60. IEEE Computer Society Press, 2002.

9. R. F. Hess and E. R. Howell. The Threshold Contrast Sensitivity Function in Strabismic Amblyopia. Vision Research, 17:1049–1055, 1977.

10. James T. Kajiya. Ray Tracing Volume Densities. In Proc. SIGGRAPH '84, pages 165–174. ACM, 1984.

11. Günther Knittel. The ULTRAVIS System. In Proc. Visualization Symposium '00, pages 71–79, 2000.

12. N. L. Max, P. Hanrahan, and R. Crawfis. Area and Volume Coherence for Efficient Visualization of 3D Scalar Functions. Computer Graphics (San Diego Workshop on Volume Visualization), 24(5):27–33, 1990.

13. Nelson L. Max. Optical Models for Direct Volume Rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995.

14. M. Meissner, S. Guthe, and W. Strasser. Interactive Lighting Models and Pre-Integration for Volume Rendering on PC Graphics Accelerators. In Proc. Graphics Interface '02, pages 209–218, 2002.

15. Microsoft. Next Generation 3-D Gaming Technology Using DirectX 9.0. Proc. Game Developers Conference '02, 2002.

16. NVIDIA. CineFX Architecture. Proc. SIGGRAPH '02, 2002.

17. T. Purcell, I. Buck, W. Mark, and P. Hanrahan. Ray Tracing on Programmable Graphics Hardware. ACM Transactions on Graphics, 21(3):703–712, 2002.

18. S. Roettger and Th. Ertl. A Two-Step Approach for Interactive Pre-Integrated Volume Rendering of Unstructured Grids. In Proc. IEEE Symposium on Volume Visualization '02, pages 23–28. ACM Press, 2002.

19. S. Roettger, M. Kraus, and Th. Ertl. Hardware-Accelerated Volume and Isosurface Rendering Based on Cell-Projection. In Proc. Visualization '00, pages 109–116. IEEE, 2000.

20. S. Roettger and B. Thomandl. Three little Bonsais that did survive CT scanning and contrast agent but not the graphix lab conditions. http://wwwvis.informatik.uni-stuttgart.de/~roettger/data/Bonsai, 2000.

21. U. Tiede, T. Schiemann, and K. H. Hohne. High-Quality Rendering of Attributed Volume Data. In Proc. Visualization '98, pages 255–262. IEEE, 1998.

22. Allen Van Gelder and Kwansik Kim. Direct Volume Rendering with Shading via Three-Dimensional Textures. In Proc. IEEE Symposium on Volume Visualization '96, pages 23–30, 1996.

23. D. Weiskopf, K. Engel, and Th. Ertl. Volume Clipping via Per-Fragment Operations in Texture-Based Volume Visualization. In Proc. Visualization '02, pages 93–100, 2002.

24. R. Westermann and Th. Ertl. Efficiently Using Graphics Hardware in Volume Rendering Applications. In Computer Graphics, Annual Conference Series, pages 169–177. ACM, 1998.

25. P. L. Williams and N. L. Max. A Volume Density Optical Model. In Computer Graphics (Workshop on Volume Visualization '92), pages 61–68. ACM, 1992.




Top left: naive volume clipping approach (the rendered Bucky Ball shows slicing artifacts). Top right: accurate volume clipping method. Bottom: hardware-accelerated ray casting (Bonsai rendered on an ATI Radeon 9700).
