The Irregular Z-Buffer and its Application to Shadow Mapping · The Irregular Z-Buffer and its...

8
The Irregular Z-Buffer and its Application to Shadow Mapping Gregory S. Johnson 12 William R. Mark 1† Christopher A. Burns 12‡ Department of Computer Sciences 1 Texas Advanced Computing Center 2 The University of Texas at Austin Abstract The classical Z-buffer algorithm samples a scene at regularly spaced points on an image plane. We present an extension of this algorithm called the irregular Z-buffer that permits sampling of the scene at arbitrary points on the image plane. The sample points are stored in a two-dimensional spatial data structure which is queried during rasterization. The irregular Z-buffer can be ap- plied to shadow rendering, where we demonstrate that it eliminates the sampling artifacts previously associated with shadow mapping. We describe the extensions to modern graphics hardware necessary to support efficient operation of the irregular Z-buffer, and simulate the performance on the extended hardware. Our results indicate that the irregular Z-buffer can be used to produce hard shadows that are as accurate as those produced by a ray tracer or shadow volume-based renderer. We also find that shadow mapping on the irregular Z-buffer retains many of the performance and application- development simplicity advantages of standard shadow mapping. CR Categories: I.3.1 [Computer Graphics]: Hardware Architecture—Graphics Processors; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Visible Line / Sur- face Algorithms; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Shadows; Keywords: visible surface algorithms, shadow algorithms, real- time graphics hardware 1 Introduction One of the fundamental computational tasks in three-dimensional graphics is the visible surface problem: to efficiently find the first intersection point between a ray and a collection of surfaces. In rendering, the visible surface problem is typically solved for many rays and a single collection of surfaces. Examples include visibility determination from an eye point, shadow computation, and compu- tation of global illumination solutions. Today, the visible surface problem is typically solved by either the Z-buffer algorithm [Catmull 1974] or by ray casting using a spa- tial acceleration structure [Whitted 1980]. The Z-buffer algorithm is appropriate for many rays that share a common origin and share a known pattern of directions. It does not require any preprocessing of the surfaces. In contrast, the ray casting algorithm solves the vis- ible surface problem for the general case – one or more rays with arbitrary origins and directions – but requires preprocessing of the surfaces to build a spatial acceleration structure. We present an algorithm that is more general than the classical Z-buffer, but does not require preprocessing of surfaces. This ap- proach, which we refer to as the irregular Z-buffer, efficiently solves the visibility problem for rays that share a common origin but have arbitrary directions (Figure 2). Our algorithm represents ray directions as points on an image plane, as in the classical Z-buffer approach. However, we explic- e-mail: [email protected] e-mail: [email protected] e-mail: [email protected] Figure 1: Two images of a scene from Doom 3 (alpha). A screen capture directly from the game engine is shown at top. The game engine employs shadow volumes to render this image. The lower image is rendered by a software pipeline that implements irregular shadow maps. itly store all of the sample locations in a two-dimensional spatial data structure rather than implicitly representing them with a reg- ular pattern. The data structure can be any spatial data structure that supports efficient range queries, such as a k-d tree or a grid. Like the classical Z-buffer algorithm, we project triangles onto the image plane one at a time and then determine which samples lie inside a triangle. Unlike the classical algorithm, this determination is made by querying the two-dimensional spatial data structure. Fi- nally, for each sample inside a triangle, we perform the standard Z comparison and update at the sample’s address in a Z-buffer. We demonstrate that the irregular Z-buffer can be used to render high quality shadows cast by a point light source, as seen from a

Transcript of The Irregular Z-Buffer and its Application to Shadow Mapping · The Irregular Z-Buffer and its...

Page 1: The Irregular Z-Buffer and its Application to Shadow Mapping · The Irregular Z-Buffer and its Application to Shadow Mapping ... that the irregular Z-buffer can be used to produce

The Irregular Z-Buffer and its Application to Shadow Mapping

GregoryS.Johnson1 � 2 � William R. Mark 1 † ChristopherA. Burns1 � 2 ‡

Departmentof ComputerSciences1 TexasAdvancedComputingCenter2

TheUniversityof TexasatAustin

Abstract

The classical Z-buffer algorithm samplesa sceneat regularlyspacedpoints on an image plane. We presentan extensionofthis algorithmcalled the irregular Z-buffer that permitssamplingof the sceneat arbitrary points on the imageplane. The samplepointsarestoredin a two-dimensionalspatialdatastructurewhichis queriedduring rasterization.The irregular Z-buffer canbe ap-plied to shadow rendering,wherewedemonstratethatit eliminatesthesamplingartifactspreviously associatedwith shadow mapping.Wedescribetheextensionsto moderngraphicshardwarenecessaryto supportefficientoperationof theirregularZ-buffer, andsimulatethe performanceon the extendedhardware. Our resultsindicatethat the irregular Z-buffer can be usedto producehard shadowsthat are asaccurateas thoseproducedby a ray traceror shadowvolume-basedrenderer. We alsofind that shadow mappingon theirregularZ-buffer retainsmany of theperformanceandapplication-developmentsimplicity advantagesof standardshadow mapping.

CR Categories: I.3.1 [Computer Graphics]: HardwareArchitecture—GraphicsProcessors;I.3.7 [Computer Graphics]:Three-DimensionalGraphicsand Realism—Visible Line / Sur-faceAlgorithms; I.3.7 [ComputerGraphics]: Three-DimensionalGraphicsandRealism—Shadows;

Keywords: visible surfacealgorithms,shadow algorithms,real-time graphicshardware

1 Introduction

Oneof the fundamentalcomputationaltasksin three-dimensionalgraphicsis thevisible surfaceproblem: to efficiently find thefirstintersectionpoint betweena ray anda collection of surfaces. Inrendering,thevisible surfaceproblemis typically solvedfor manyraysandasinglecollectionof surfaces.Examplesincludevisibilitydeterminationfrom aneyepoint,shadow computation,andcompu-tationof globalilluminationsolutions.

Today, the visible surfaceproblemis typically solved by eithertheZ-buffer algorithm[Catmull1974]or by raycastingusingaspa-tial accelerationstructure[Whitted 1980]. TheZ-buffer algorithmis appropriatefor many raysthatsharea commonorigin andshareaknown patternof directions.It doesnotrequireany preprocessingof thesurfaces.In contrast,theraycastingalgorithmsolvesthevis-ible surfaceproblemfor the generalcase– oneor morerayswitharbitraryoriginsanddirections– but requirespreprocessingof thesurfacesto build a spatialaccelerationstructure.

We presentan algorithmthat is moregeneralthanthe classicalZ-buffer, but doesnot requirepreprocessingof surfaces.This ap-proach,whichwereferto astheirregularZ-buffer, efficiently solvesthevisibility problemfor raysthatsharea commonorigin but havearbitrarydirections(Figure 2).

Our algorithm representsray directionsaspoints on an imageplane,asin the classicalZ-buffer approach.However, we explic-�e-mail: [email protected]

†e-mail: [email protected]‡e-mail: [email protected]

Figure1: Two imagesof a scenefromDoom3 (alpha). A screencapture directly fromthegameengineis shownat top. Thegameengineemploysshadowvolumesto renderthis image. Thelowerimage is renderedbya softwarepipelinethat implementsirregularshadowmaps.

itly storeall of the samplelocationsin a two-dimensionalspatialdatastructureratherthanimplicitly representingthemwith a reg-ular pattern. The datastructurecan be any spatialdatastructurethat supportsefficient rangequeries,suchasa k-d treeor a grid.Like theclassicalZ-buffer algorithm,we projecttrianglesonto theimageplaneoneat a time andthen determinewhich sampleslieinsidea triangle.Unlike theclassicalalgorithm,this determinationis madeby queryingthetwo-dimensionalspatialdatastructure.Fi-nally, for eachsampleinsidea triangle,we performthestandardZcomparisonandupdateat thesample’s addressin a Z-buffer.

WedemonstratethattheirregularZ-buffer canbeusedto renderhigh quality shadows castby a point light source,asseenfrom a

Page 2: The Irregular Z-Buffer and its Application to Shadow Mapping · The Irregular Z-Buffer and its Application to Shadow Mapping ... that the irregular Z-buffer can be used to produce

Figure2: A taxonomyof solutionsto thevisiblesurfaceproblem,classifiedby theextentto which eyeraysshare origins anddirec-tions.

known eye point. We proposechangesto currentgraphicsarchi-tecturesthat permit our algorithm to executeefficiently. Finally,we simulatetheperformanceof shadow renderingusingthe irreg-ular Z-buffer. Our resultsindicatethat theapproachis suitableforreal-timeuseonourproposedarchitecture.

Figure3: TheclassicalZ-buffer (left) samplesa sceneat regularlyspacedpointson an image plane. The irregular Z-buffer (right)samplesa sceneat arbitrary pointsonan image plane.

2 The Irregular Z-Buffer

TheZ-buffer algorithmprocessesscenegeometryonepolygonat atime. Eachpolygonis projectedonto the imageplane. An edge-walking or edgeequationcomputationis usedto determinewhichsamplepoints(i.e. pixels)lie insidetheprojectedpolygon.

Answeringthis rangequeryis simplewhenthepositionsof thesamplepointsareimplicitly definedby a simpleformula, suchasthat for a regular grid (Figure3, left side). Classicalrasterizationalgorithmstargetthis caseandevaluatetheformuladirectly – theydonot explicitly storethelocationof eachsamplepoint.

However, if the locationsof the samplepoints cannotbe rep-resentedby a simple formula (Figure 3, right side), the standardrasterizationapproachdoesnot work. In this paperwe proposeanalternative approachto handlethis case.We explicitly storethepointsin aspatialdatastructure.Duringrasterization,arangequeryis performedonthisdatastructureto determinewhichpointsarein-sidea triangle.Wereferto this taskasirregular rasterization.

The datastructuresand algorithmsneededfor efficient rangequeriesarewell understood[Samet1990]. Rangequeriescanbeperformedefficiently for points storedin two-dimensionalspacepartitioningdatastructuressuchasBSPtreesandquadtreesor theirspecializedvariantssuchask-d trees. If the pointsaresomewhatevenly distributed,thengridsarealsoefficient for this task.

Thebestchoiceof datastructuredependson several factors. Ifthesetof samplepointschangesoften– asit doesfor theshadowmappingproblemwe examinelater in this paper– thenthecostofconstructingthedatastructureis important. Theconstructioncost

canvary asymptoticallydependingon thechoiceof datastructure,theoverall distributionof samplepoints,andtheorderin which thepointsareinserted.Oncethedatastructurehasbeenbuilt, thecostof rangequeriesdependson thesesamefactorsaswell. However,evenalgorithmswith thesameasymptoticcosthavedifferenteffec-tivecoststhatvarywith usagepatternsandtheunderlyinghardwarearchitecture.Finally, thestoragerequirementsof thedatastructuresvary andonly certainonescanguaranteea fixedmemoryfootprintregardlessof pointdistribution.

For the shadow-mappingapplicationon which we focus, thepoint distribution is somewhatevenacrosslargeregionsof theim-ageplane,so we found that a hybrid datastructurecomposedofa grid at the top level anda secondarydatastructurebelow it per-formsbetterthana simplek-d tree. We gatheredstatisticson twovariantsof the two-level structure:a grid of lists anda grid of k-dtrees,both implementedwith guaranteedstoragebounds.We de-terminedthat the grid of lists (including optimizationsdescribedlater)performedbestfor our datasets.The reasonsfor this resultarethatour averagelist sizeis smallandthat thepoint-in-areatestat eachstepof list traversalis simpler thanthe area-to-areaover-lap testrequiredat eachstepof k-d treetraversal. The differencewasnot overwhelming,andtheoutcomecouldbedifferentfor lessbalancedpointdistributionsor a differentarchitecture.

Constructingthe grid-of-lists datastructureis straightforward:we first determinewhich grid cell a point mapsto, thenprependthatpoint to the linked list for thatcell. Irregular rasterizationus-ing this datastructureis also simple. First, we determinewhichgrid cell(s) thetriangleoverlaps.This testis just like conventionalrasterizationwith the minor differencethat we mustconsideranyoverlapbetweenthe triangleanda cell asa hit, ratherthanjust anoverlapbetweenthetriangleandthecell’s center. At eachcell thatthe triangle overlaps,we traversethe linked list to test all of thecell’s pointsagainsteachof thetriangle’s threeedges.A point thatpassesall threeedgetestsis insidethetriangle.

3 Shadow Rendering

Shadows provide importantspatialcuesandaddto the realismofcomputergeneratedimages.Shadow renderingalgorithmsareof-ten placedinto two categories: thosefor point light sources(hardshadows)andthosefor arealight sources(softshadows)[AssarssonandAkenine-Moller 2003;ChanandDurand2003]. In this paperwefocusonhardshadow algorithms.In particular, weconsidertheproblemof renderingshadows in real-timefor scenescontainingmoving anddeformingobjects. Even thoughthis problemis con-ceptuallysimpleandis of practicalinterest,existing approachestoit have significantshortcomingswith respectto imagequality ortheirperformanceoncomplex scenes.

Object-spacemethodssuch as shadow volumes[Crow 1977;Everitt andKilgard 2002]canachieve artifact-freehardshadows ifimplementedcarefully. As a result,mainstreamapplicationdevel-operssuchasid Softwarehave chosento usetheapproachin state-of-the-artvideo gameslike the upcomingDoom 3. The shadowvolumetechniquecreatesandrasterizesextrapolygonsthatdemar-catetheboundariesbetweenvolumesof spacein andoutof shadowin thescene.Theseextra polygonsconnectobjectsilhouetteedges(asseenfrom thelight source)to infinity. Thepolygonsareraster-ized from theeye view into a stencilbuffer, wherea counteris in-crementedor decrementedateachpixel to tracktheshadowed/non-shadowed statusof thepixel. Thepolygonsareoften large,so thetechniquerequireshardwarewith a very high fill rate. Even withrecenthardwareandsoftwareoptimizations[NVIDIA Corp.2003;McGuire et al. 2003], useof the techniquerequiresthat geomet-ric scenecomplexity beheldwell below whatwould otherwisebepossible.

Page 3: The Irregular Z-Buffer and its Application to Shadow Mapping · The Irregular Z-Buffer and its Application to Shadow Mapping ... that the irregular Z-buffer can be used to produce

Raytracing[Whitted1980]andrelatedtechniquescanaccuratelyrender� a varietyof global illumination effectsincludinghardshad-ows. It is possiblethat real-timerenderingsystemswill eventuallyadoptraytracingtechniques.However, evenwith recentprogressinthis area[Wald et al. 2003],renderingperformanceremainsinade-quatefor scenescontainingdeformableobjects.

Shadow mapping [Williams 1978] and many of its variants[HourcadeandNicolas 1985; Fernandoet al. 2001] leverageex-isting Z-buffer hardwareto rendershadows with high performancefor complex scenes.However, existingversionsof thetechniqueareproneto samplingandself-shadowing artifactsthataresufficientlyseriousto limit thetechnique’s usein realapplications.

Figure4 (left) illustratestheshadow mapalgorithm. Thesceneis renderedfirst from the light position(yielding Znear values)andthenrenderedfrom theeye position. Eachpixel in theeye view istreatedasa 3-spacepoint positionedaccordingto its X / Y posi-tion in theimageplaneandits Z value(from thedepthbuffer), andis transformedinto light space.This transformationyieldsa pointP in light spaceanda distanceZP betweenP and the light-viewimageplane. The original eye-spacepixel is consideredto be inshadow iff Znear

� ZP, usingan estimatedZnear value. The algo-rithm estimatesZnear from the Znear valuesof oneor more light-view sample(s)thatarenearestto theprojectionof pointP ontothelight-view imageplane. This estimationstepis the primarycauseof artifactsproducedby thetechniqueastheestimationerroris gen-erally unbounded.

Mostrecenteffortsto reducetheseartifactshavetakenoneof twoapproaches.Thefirst is to useadditionalinformationfrom object-spacesilhouettecomputationsto reduceor eliminateestimationer-rors for the mostcommoncases[Senet al. 2003]. This approachseemsto be the mostsuccessfulat reducingthe incidenceof esti-mationartifacts,but sharpcornersanddetailsareoftentruncatedorlost dueto limited precisionin the contoursusedto representthesilhouettes.Also, theneedfor object-spacecomputationintroducesadditionalcomplexity into the renderingsystem. The secondap-proachis to adaptthesamplingratein thelight-view imageplanetothecharacteristicsof thescene[Fernandoet al. 2001;StammingerandDrettakis2002],therebyreducingtheaveragedistancebetweenaprojectionof P andthenearestsamplepoint. Fernandoetal. [Fer-nandoet al. 2001] replacethe standardlight view imagewith anadaptive imagehierarchy. This focuson improving shadow qualitythroughstrategic placementof shadow mapsamplepointsis simi-lar to our own, but we take this approachto its logical extremebyplacingsamplepointsin their ideallocations.

4 Irregular Shadow Mapping

Pseudocodefor irregular shadow mappingis shown in Figure 5.Thesceneis first renderedfrom theeye point. As in conventionalshadow mapping,pixels(at theZ valuesgivenby theZ-buffer) aretransformedinto light space,yielding P andZP. Unlike conven-tional shadow mapping,scenegeometryis thenrasterizedto sam-ple positionsin the light view imageplanegiven by the projec-tion of the transformedpixels, yielding Znear. As before,a pixelis in shadow iff Znear

� ZP. Note that irregular shadow mapsareview-dependent.Samplesarecomputedin the shadow mapplanepreciselywhererequiredby pixels in the eye view. Therefore,nomismatchexistsbetweenthesamplingratesor samplepositionsineye andlight space.Aliasing andself-shadowing areavoided,andno unnecessarysamplesarecomputed.Moreover, given pointsPprior to rasterizationin light space,andthe propertythat Znear isalways lessthanor equalto ZP, we canmaximizeour useof theavailableZ-buffer precision.

Figure 6 plots the location of samplepoints within irregularshadow mapsfor theDoom3 scenefrom Figure1. Thedensityofsamplepointsvariessignificantlyacrossthe imageplane,demon-

Figure4: Conventional(left) and irregular (right) shadowmap-ping. In thecaseof the former, thesceneis rendered to a conven-tional Z-buffer fromthelight, andthenfromtheeye. With thelatter,thesceneis rendered to a conventionalZ-buffer fromtheeye, andto an irregular Z-buffer fromthelight.

stratingtheimportanceof adaptive andirregularsamplingmethodsin this context.

Observe that irregular shadow mapping effectively mimicsshadow generationby ray tracing. PointsP matchthe intersectionpointsbetweeneye raysandscenegeometry;andsteps2, 4 and6imitatelight ray computation.Unlike ray tracing,irregularshadowmappingis anobject-orderalgorithm,which meansthatprimitivesare processedin the order submittedby the application. In thisway, our approachcombinestheimagequality andsamplingchar-acteristicsof ray-tracedshadows with thesystemorganizationandperformancecharacteristicsof Z-buffer rendering.

4.1 Image Quality

We comparethe quality of imagesproducedby irregular shadowmappingto that of several otherapproaches.Figure1 shows thatimagesgeneratedusingirregularshadow mappingarevisually in-distinguishablefrom thoseproducedby theshadow volumestech-nique. Figure7 shows that irregular shadow mappingeliminatesshadow aliasingartifactscommonlyassociatedwith conventionalshadow mapping. In Figure 8 we use an L2 norm to comparequantitatively the imagequality of our approachto that of threeother approaches.Our quantitative comparisonis madeagainstray-tracedshadows andagainsttwo othershadow mappingalgo-rithmsthatavoid object-spacecomputations:conventionalshadowmapping[Williams 1978]andadaptiveshadow mapping[Fernandoet al. 2001]. This figure illustratesthat thenumberof shadow mapsamplesrequiredto attainhigh fidelity is much lessthan that re-quiredby theseothershadow mappingtechniques.

Our conventional and adaptive implementationsinclude stan-dard enhancementsto reduceself-shadowing and shadow alias-ing artifacts.Theseenhancementsincludepercentagecloserfilter-ing (PCF)[Reeveset al. 1987],objectIDs [HourcadeandNicolas1985]andorientation-dependentbiasvalueslikethosecomputedbyglPolygonOffset [OpenGL ArchitecturalReview Board 2003].

Page 4: The Irregular Z-Buffer and its Application to Shadow Mapping · The Irregular Z-Buffer and its Application to Shadow Mapping ... that the irregular Z-buffer can be used to produce

Figure 5: A comparisonof conventional and irregular shadowmapping. Stepslabeledwith “*” are repeatedper light source.

PCFandglPolygonOffset rely on configurableparameters.Wecarefully handtune theseparametersfor eachimageof eachtestsceneto minimize the differencein L2 normsfrom the baselineray-tracedimage.Additionally, thefield of view usedfor renderingthe conventionalandadaptive shadow mapsis tunedto maximizetheeffective resolutionof thesemaps.

Weusedtwo scenesfor ourevaluations.Thefirst is from Doom3(a leaked versionwidely availableon the Internet). It consistsoftwo light sourcesand8548triangleswhich the Doom 3 gameen-gine rendersusingstenciledshadow volumes[Everitt andKilgard2002]. Thesecondsceneis our creationandis designedasa chal-lengingtestcasefor shadow mappingmethods(includingourown).Thissceneis composedof onelight sourcewith shadows,onedetaillight without shadows, and16372trianglesspreadamonga knot-like shapeandseveral spheres.Thesilhouetteedgesof thecurvedsurfacesareproneto generatingregionsof high sampledensityinirregularshadow maps.

All four shadow algorithmsareimplementedwithin asinglesoft-wareframework [Kolb 1997]. Doing soensurestheconsistency ofnon-shadow-relatedimagefeatures(i.e. blending, texturing, etc.)acrossimagesproducedwith any shadow type.Theframework canbe configuredto mimic a real-timeZ-buffer renderer, with depthvaluesstoredin a 24-bit floating point format. Alternatively, theframework canbeconfiguredasa recursive ray tracer, with all in-termediatevaluesmaintainedat64-bit floatingpoint precision.

5 Irregular Shadow Mapping: Data Structures

Our implementationof irregularshadow mappingstoressamplelo-cationsin anenhancedversionof thegrid-of-listsdatastructurede-scribedearlier. Ourprimaryenhancementis to useahash-like func-tion tomapahighresolution“logical” grid into asmaller“physical”grid, therebyreducingthememoryfootprint.

Figure 6: The view from Figure 1 overlaid with colored tiles isshownat top. Thelower imagesshowthe irregular shadowmapsfor thelight sourcesbehindandabovetheeye(center),andat thetop of the right wall (bottom). We refers to theselights as “light0” and “light 1” later in the paper. Colored points denotethetile containingthe respectiveeye view pixel. Both mapscontainregionsof high (B, C) andlow sampledensity(A, D).

Page 5: The Irregular Z-Buffer and its Application to Shadow Mapping · The Irregular Z-Buffer and its Application to Shadow Mapping ... that the irregular Z-buffer can be used to produce

Figure7: Ascenerenderedwith conventionalshadowmappingandonesampleper pixel is shownat top. Thesamescenerenderedwith irregular shadowmappingandonesampleperpixel is shownat bottom.

This structureis is illustratedin Figure9. Thelogical grid over-laysthelight view imageplane.Logicalgrid (lgrid) cell indicesaremappedto indicesin thephysicalgrid (pgrid) usingthehash-likefunction:

pgridi � � lgridi �� lgrid j pgridwidth � � til ewidth � %pgridwidth

pgrid j � � lgrid j �� lgridi pgridheight ��� til eheight � %pgridheight

Each physical grid cell potentially forms the root of its ownlinkedlist. Shadow mapsamplesprojectingto thesamecell becomeelementsof the samelist. Observe in Figure6 that tiles of pixelsin eye spaceremain largely intact in light space. Our hash-likefunctionavoids folding contiguousregionsof high sampledensitybackontothemselves,while largelypreservingany eyespace- lightspacecoherence.We exploit this coherenceby tiling the physicalgrid in memoryandprocessingfragmentsin tile order[McCormacket al. 1998].

This distinctionbetweenlogical andphysicalgrids (inspiredbythework of Reinhardet al. [2000]), permitsfiner discretizationoftheshadow mapplanethanwould otherwisebepractical.Smallercell sizesreducethe numberandseverity of hot spots(cells with

Figure 8: A plot of the differencein L2 normsbetweenimagesrenderedwith conventional(CSM),adaptive(ASM),andirregular(ISM)shadowmapsanda referenceimage generatedvia ray trac-ing (oneunjitteredeyeray per pixel). Therendered sceneis fromDoom3 (seenin Figure 1). Theshadowmapsamplecountrepre-sentsthetotal over both light sourcesin this scene. Considerableeffort wasspenttuningCSMandASMrelatedparameters.

significantlyhighersamplecountsthanothercells). Note that in-creasingthe resolutionof the logical grid relative to the physicalgrid yieldsamoreevendistributionof samplesin thephysicalgrid.However, increasingthis ratioalsoincreasestheprobabilitythatthelinked list at any particularphysicalgrid cell will containsamplesfrom distinctlydifferentregionsof theshadow map.

Constructionof thedatastructureproceedsasfollows. Eyeviewpixelsareprocessedin tiles. Theresultingspatialcoherenceyieldsmemoryaccesslocality. Pixels in eachtile are transformedintolight spaceandprojectedonto the light view imageplane.The in-dex of the logical grid cell occupiedby a projectedpoint (shadowmapsample)is mappedinto the physicalgrid usingthe hash-likefunctionshown previously. Thesampleis prependedto the linkedlist occupying this cell. Acceleratorconstructionis parallelizableby assigningtiles of pixels to different processingelementsandlockingphysicalgrid cellsduringlist update.

Irregularrasterizationis similarly straightforward.Ourselectionof a grid asthetop level datastructureenablesusto leveragemanycomponentsof existing rasterizers.Scenegeometryis rasterizedtothe logical grid asnormal,exceptthatany cell crossedby a prim-itive is processedfurther. Theprimitive is testedagainsteachele-mentin the linked list of thecell. Z interpolationanda depthtestareperformedasusualfor samplesfound to fall within theprimi-tive. Viewportclipping is usedto restrictrasterizationto theregionof thelogical grid populatedby samples.A stencilmaskis usedtodiscardfragmentsassociatedwith logical grid cellsdevoid of sam-ples. Rasterizationis parallelizableby assigninglogical grid cellscoveredby a givenprimitive to differentprocessingelements.Thisapproachis similar to parallelrasterizationin existing hardware.

5.1 Architectural Support

Irregularshadow mappingis particularlyappropriatefor real-timerendering,as it provides imagequality comparableto ray tracedshadows andstenciledshadow volumes,with thecomputationalef-ficiency of shadow maps. However, irregular shadow mappingisunlikely to run efficiently on currentgraphicshardware. An effi-

Page 6: The Irregular Z-Buffer and its Application to Shadow Mapping · The Irregular Z-Buffer and its Application to Shadow Mapping ... that the irregular Z-buffer can be used to produce

0 1 2

3 4 5

6 7 8

0 1 23 4 56 7 8

Figure9: Thedata structure usedfor irregular shadowmapping.Thelight view image planeis overlaid with a “lo gical” grid. Thelogical grid hashigher resolutionthan theactualstored (“physi-cal”) grid. A hash-like functionmapspointsin thelogical grid intocellsin thephysicalgrid. Each cell of thephysicalgrid potentiallyservesastherootof a linkedlist. Thelinkedlist containsthepointsmappingto a givencell, for later useduring rasterization.

cientimplementationof thetechniquerequiresadditionalfunction-ality not supportedin theseGPUs. Our proposedarchitectureisillustrated in Figure 10, and combinesan enhancedGPU with amultithreadedCPU-likeprocessor(denoted“multithreadedproces-sor”) on the samedie. This additionalprocessoris similar to thatfoundin somecommercialembeddedprocessorchips[Levy 2002],network processors[Adiletta et al. 2002],andexperimentalarchi-tectures[Cascaval et al. 2002].

Irregularshadow mappingconsistsof two phaseswhichusedif-ferentpartsof the architecture.Acceleratorconstructionrequiresread/ write accessto the datastructurein memory, andexecuteson theCPU-likeprocessor. Theconstructionkernelcontains40 in-structionsandconsumes8 live four-vectorregisters.Irregular ras-terizationrequiresread-onlyaccessto the datastructure,anduti-lizes thegraphicspipelinewith its enhancedfragmentprocessors.Therespective kernelcontains14 instructionsandconsumes8 livefour-vectorregisters.

During irregular rasterization,the applicationsendspolygonsdown thegraphicspipelinejust asit would duringconventionalZ-buffer rendering.Thefixed-functionrasterizergeneratesafragment(hereaftergrid fragment) for eachlogical grid cell overlappedby atriangle. Fragmentscovering logical grid cells marked empty inthe stencilmaskarediscarded.Our enhancedfragmentprocessortransformsthegrid fragment’slogicalgrid coordinatesinto physicalgrid coordinates,andtraversesthatphysicalgrid cell’s linkedlist ofsamples. Eachsamplein the cell is testedagainstthe triangle’sedgeequations[Pineda1988]. If a samplelies insidethe triangle,thefragmentprocessorevaluatestheZ interpolationequationat thesamplelocationandgeneratesanoutputframebuffer fragmentwiththecomputedZ value. The fragmentprocessorsetstheaddressofthe framebuffer fragmentto the sample’s index value. This indexis thex / y positionof theeye-view pixel in theimageplanecorre-spondingto thesample.Finally, eachframebuffer fragmentpassesthroughthefixed-functionZ-testunit,wheretheusualread/ modify/ write Z buffer comparisonis performed.

As we’ve indicated,efficient supportfor irregular rasterizationrequiresenhancingthe capabilitiesof current graphicspipelinesandtheirfragmentprocessors[Microsoft Corporation2003;Berettaet al. 2003]. First, the rasterizerneedsa modewhereit outputsa grid fragmentwhen a triangle overlapsany portion of a pixel,not just its centerasis thecasenow. Severalsetup/ rasterizerde-signscan be modified to supportthis option, either by changingthe pixel walking algorithmor by moving the triangleedgesout-

Figure10: Anarchitecture with componentswhich enableefficientexecutionof the irregular Z-buffer. The large gray box denotesarchitectural componentsincludedin our simulation.

ward by � 2 2 of pixel spacing. Additionally, the fragmentpro-cessormustsupportconditionalbranchingandhave accessto thehomogeneousequationsdescribingthe edgesandZ interpolation[OlanoandGreer1997]of thefragment’striangle.Finally, thefrag-mentprocessormustbeableto conditionallyoutputa framebufferfragmentasmany timesasneeded,to adifferentcomputedaddress(framebuffer location)eachtime. This computed-addresscapabil-ity is often referredto asa scatteroperationandhasproven to beusefulfor a wide varietyof purposesin streamprocessors[Kapasietal. 2002].

6 Simulation and Analysis

Any new techniquetargetedat real-timegraphicsapplicationsmustrun fastaswell asproduceimagesof thedesiredquality, but eval-uatingtheperformanceof new techniqueslike oursis challenging.Typically, an algorithm’s performanceis measuredin oneof twoways: at a high level by countingarithmeticoperationsusedforsomedataset,or at a low level by implementingthealgorithmoncurrenthardwareandtiming its execution.For our technique,nei-therapproachis adequate.Memoryhierarchyeffectssuchascachemissesandassociatedlatencieshavebecometooimportantfor sim-ple operationcountingto yield accurateperformanceestimatesformemory-intensive techniques.However, currentarchitecturesdonot supportour techniquewell, so timing an implementationof iton thesearchitectureswouldnot bevery informative.

Instead,wehaveconstructedahigh-level performancesimulatorfor thearchitecturewe proposedin theprevioussection.Our simu-latormodelsthekey performancecharacteristicsof thearchitecture– particularlymemoryhierarchyeffects– but doesnot attempttomodelevery detailthatwouldbenecessaryto determineexactexe-cutiontimes. In otherwords,theprimarypurposeof our simulatoris to evaluatealgorithmsratherthanto evaluatearchitectures.Wesimulatethememorysystemperformancein thegreatestdetail,theprocessorcore performanceat a mediumlevel of detail, and thefixed-functiongraphicsunitsat a functionalitylevel only.

Our memory hierarchysimulator is adaptedfrom an existing

Page 7: The Irregular Z-Buffer and its Application to Shadow Mapping · The Irregular Z-Buffer and its Application to Shadow Mapping ... that the irregular Z-buffer can be used to produce

Table1: Statisticscapturedduring irregular rasterizationfor each light view in our scenes.Each light view contains1,310,720samples.

event queuebasedsimulator[Burger et al. 1999]. We discardedtheout-of-orderCPUcoresimulationcomponentsandreplacedtheDRAM subsystemsimulatorwith ourown. In oursimulator, writesto onecacheareincoherentwith readsfrom anotherasautomaticcoherenceis complex andunnecessaryin the context of our algo-rithm. Our DRAM subsystemsimulatorhonorsall bankandchan-nel timing restrictions(e.g.prechargetimes)specifiedby amemorymanufacturer’s datasheet[Micron Technology, Inc. 2003]. Theconfigurationof thesimulatedmemorysystemis shown in Table2.

Our performancesimulatorusesa single model for the CPU-like processorcore and the fragmentprocessorcore, except thatthefragmentprocessoris forbiddento write to memoryandit sup-portsstreaminputsandoutputs.Theprocessorsaremultithreadedwith zero cycle context switchesand round-robinscheduling,al-lowing theprocessorto remainactive evenwhenmostthreadsarewaiting on cachemissesor locks. Eachprocessoris simple – itissuesonescalaror 4-wide vector operationeachcycle. We usetheNV vertex program2instructionset[Brown andKilgard 2003],which includesconditionalbranchinstructionsbut no integerarith-metic. We augmentthis instructionsetwith instructionsfor lockandunlock(usingoneof 256on-chiplocks),128-bitloadandstore(indirectaddressingOR’d with oneof two baseregisters),andlog-ical shifts. We assumethat all instructionsexceptload,store,andlock completein onecycle; i.e. we do not model pipeline stallscausedby branchmispredictionor arithmeticunit latency. Thesepotentialstallswouldbepartiallyor wholly coveredbymultithread-ing andarelessimportantfor theshortpipelineweareassuming( [email protected] 90nm)thanthey arefor mostmodern15+stageCPUs.

Table2: Theconfiguration of our memorysimulator. All proces-sor coresandon-chip bussesrun at 1.2GHz. Cachesutilize LRUreplacementpolicy and are write-allocate, write-back. L2 min-imummisslatencyassumesa DRAMpage hit andno contention.DRAMsubsystem:256-bitwide, eight-channelconfigurationusingeight 700 MHz DDR chips. � TheL2 average misslatencyvariesbyscene.

We separatelysimulatetwo key phasesof the ISM algorithm:constructionof the shadow mapdatastructure,which runson theCPU-like processor;andthe portion of irregular rasterizationthatrunsonthefragmentprocessors.Thesimulatedcodehasbeencare-fully optimizedto reduceinstructioncount– for example,thecon-structionphaseprocessespixelsin pairsto maximizeuseof 4-widevectoroperations.

In theconstructionphase,eachthreadprocessesadifferentsetofpixels.Within a thread,pixelsareprocessedonetile at a time. Theinputdata(eye-view Z-buffer) residesin cacheableDRAM, asdoestheoutputdata(acceleratordatastructureandstencilbuffer mark-ing non-emptyphysical-gridcells). Prior to theconstructionphasewemustclearthephysicalgrid in theacceleratordatastructure(one

pointerpercell) aswell asthestencilbuffer, but we excludetheseclearingoperationsfrom our measurementsbecausetheir costsarealreadywell understood.Table3 shows thesimulatedperformanceof theconstructionphasefor eachlight in the two differentscenesfor animageresolutionof 1280x 1024.

Table3: Simulatedperformanceof theconstructionof thespatialacceleration structure, for each light view of therespectivescene.In all casesthe accelerator contains1,310,720points. A singleCPU-like processoris used. The processoris multithreaded(8-way). “Useful cycles” refers to the percent of processorcyclesspenton usefulwork (not stalling).

In therasterizationphaseof computation,grid fragmentsgener-atedby theconventionalrasterizerareassignedto thefirst availablefragmentprocessorthread. Our performancesimulationassumesthatthestenciltestdescribedin section5 rejectsgrid fragmentsforemptyphysicalgrid cells, so thesefragmentsarenot assignedtoa fragment-processorthread. We do not model the stencil buffermemorytraffic or the Z test/replacememorytraffic in our perfor-mancesimulation,but theadditionalbandwidthconsumedby theseoperationscanbecomputedfrom Table1.

Table4 summarizestheperformanceof the fragment-processorstageof irregular rasterization,for four processorseachwith eightthreads.Theseresultsdemonstratethatouralgorithmachievesreal-timerateson theproposedarchitecture.

Table4: Simulatedperformanceof the irregular Z-buffer duringrasterization. The sceneis rasterizedto each light view of therespectivescene. Each light view contains1,310,720samples.Four multithreadedfragmentprocessors are used,each with eightthreads.“Useful cycles” refers to thepercentof processorcyclesspenton usefulwork (not stalling). “Cycles / Tri. Sample”de-notesthecyclesrequired to processonetriangle fragmentagainstonelight-view sample.

7 Discussion and Conclusion

Wehave presentedseveralresultsin this paper:

� The irregular Z-buffer: WehaveextendedtheZ-bufferalgo-rithm to supportarbitrarysamplelocationsin theimageplane.

Page 8: The Irregular Z-Buffer and its Application to Shadow Mapping · The Irregular Z-Buffer and its Application to Shadow Mapping ... that the irregular Z-buffer can be used to produce

� Irregular shadow mapping: We have demonstratedthattheirregular Z-buffer can be usedto generateshadow mappedshadows, andthat its useeliminatestheartifactstraditionallyassociatedwith shadow mapping.

� Hardware performance analysis: Wehaveproposedasetofarchitecturalenhancementsto supportirregular rasterization,anddemonstratedin simulationthat thesecapabilitiesenablethetechniqueto runat interactive framerates.

It is likely that the irregular Z-buffer has additional applica-tions elsewhere in real-timerendering,including reflectionmap-ping,adaptive sampling,andframelessrendering.

A key characteristicof the irregularZ-buffer approachis that itbuilds andtraversesan adaptive spatialdatastructure. This classof datastructuresis widely usedin graphicsandis likely to bepar-ticularly importantin thefuture for real-timeray tracingof scenescontainingdeformableobjects. We believe that someof the algo-rithmic andarchitecturalideaswe have presentedcan be appliedto this broaderproblem. We hopethat our work will help to in-spireCPUandGPUarchitectsto designfutureprocessorsthatef-ficiently supporttheconstruction,update,andtraversalof adaptivedatastructures.

8 Acknowledgements

Ikrima Elhassanwrote the initial version of our Doom 3 sceneparserand implementedadaptive shadow mapping. Don Fussellgave us a variety of usefulsuggestions,and in particularwas thefirst of several peopleto urge us to investigatea grid-of-lists datastructureas an alternative to the kd-tree data structurethat westartedwith. Kelly Gaither and Jay Boisseauof the Texas Ad-vancedComputingCenterstronglyencouragedandsupportedourwork; the work environmentthey createdmadethis researchpos-sible. DougBurgergave usaccessto his memoryhierarchysimu-lator andadviceon how to useit. J. Mike O’Connor, KarthikeyanSankaralingamandStephenKecklergave us usefuladviceon ar-chitecturechoicesandsimulation.This work wassupportedby theNationalScienceFoundation,Microsoft,andIntel.

References

ADILETTA , M., ROSENBLUTH, M., BERNSTEIN, D., WOLRICH, G., AND

WI LK INSON, H. 2002. Thenext generationof Intel IXP network pro-cessors.Intel Technology Journal 6, 3 (August).

ASSARSSON, U., AND AKENINE-M OLLER, T. 2003. A geometry-basedsoft shadow volumealgorithmusinggraphicshardware. ACM Transac-tionsonGraphics(TOG)22, 3, 511–520.

BERETTA , B., BROWN, P., CRAIGHEAD, M., AND EVERITT, C. 2003.GL ARB fragmentprogramextensionspecification.

BROWN, P., AND K I LGARD, M. 2003.GL NV vertex program2extensionspecification.

BURGER, D., K AGI , A ., AND HRISHIKESH, M. S. 1999.Memoryhierar-chyextensionsto thesimplescalartool set.UT-AustinComputerSciencesTechnical ReportTR-99-25(September).

CASCAVAL , C., NOS, J. G. C., CEZE, L ., DENNEAU, M., GUPTA , M.,L IEBER, D., MOREIRA , J. E., STRAUSS, K ., AND JR, H. S. W.2002. Evaluationof a multithreadedarchitecturefor cellular comput-ing. In Proceedingsof the Eighth International Symposiumon High-PerformanceComputerArchitecture (HPCA’02), IEEE ComputerSoci-ety, 311–322.

CATMULL , E. 1974. A SubdivisionAlgorithm for ComputerDisplay ofCurvedSurfaces. PhDdissertation.

CHAN, E., AND DURAND, F. 2003. Renderingfake soft shadows withsmoothies.In Proceedingsof theEurographicsSymposiumon Render-ing, EurographicsAssociation,208–218.

CROW, F. C. 1977. Shadow algorithmsfor computergraphics.ComputerGraphics(SIGGRAPH’77 Proceedings)11, 2 (July).

EVERITT, C., AND K I LGARD, M. 2002. Practicaland robust stenciledshadow volumesfor hardware-accelerated rendering.SIGGRAPH2002CourseNotescourse#31.

FERNANDO, R., FERNANDEZ, S., BALA , K ., AND GREENBERG, D. P.2001. Adaptive shadow maps.In Proceedingsof the28thAnnualCon-ferenceon ComputerGraphicsand InteractiveTechniques(ACM SIG-GRAPH2001), ACM Press,387–390.

HOURCADE, J. C., AND NICOLAS, A . 1985. Algorithms for antialiasedcastshadows. Computers & Graphics9, 3, 259–265.

KAPASI , U. J., DALLY, W. J., RIXNER, S., OWENS, J. D., AND

KHAILANY, B. 2002.TheImaginestreamprocessor. In ProceedingsoftheIEEE ConferenceonComputerDesign, 295–302.

KOLB, C. 1997.Rayshadehomepage.

LEVY, M. 2002.Tying upaMIPS32processorwith threads:LexraevolvestheLX4380pipelineinto multithreadingprocessor. MicroprocessorRe-port (November).

MCCORMACK , J., MCNAMARA , R., GIANOS, C., SEILER, L ., JOUPPI ,N. P., AND CORRELL , K . 1998. Neon: a single-chip 3dworkstationgraphicsaccelerator. In Proceedingsof the ACM SIG-GRAPH/EUROGRAPHICSWorkshop on Graphics Hardware, ACMPress,123–132.

MCGUIRE, M., HUGHES, J. F., EGAN, K ., K I LGARD, M., AND EVERITT,C. 2003.Fast,practicalandrobustshadows. BrownUniversityTechnicalReportCS03-19(October).

M ICRON TECHNOLOGY, INC. 2003. Micron Technology256 Mb (x32)GDDR3SDRAM datasheet.

M ICROSOFT CORPORATION. 2003.DirectX 9.0SDK.

NVIDIA CORP. 2003. NVIDIA GeForceFX 5900, 5700 and Go5700GPUs:Ultra Shadow technology.

OLANO, M., AND GREER, T. 1997. Trianglescanconversionusing2Dhomogeneouscoordinates. In Proceedingsof the ACM SIGGRAPH/EUROGRAPHICSWorkshopon GraphicsHardware, ACM Press,89–95.

OPENGL ARCHITECTURAL REVIEW BOARD. 2003.OpenGL1.5specifi-cation.

PINEDA , J. 1988.A parallelalgorithmfor polygonrasterization.ComputerGraphics(SIGGRAPH’88 Proceedings)22, 4 (August),17–20.

REEVES, W. T., SALESIN, D. H., AND COOK , R. L . 1987. Renderingantialiasedshadowswith depthmaps.In Proceedingsof the14thAnnualConferenceon ComputerGraphicsand Interactive Techniques, ACMPress,283–291.

REINHARD, E., SMITS, B., AND HANSEN, C. 2000. Dynamicacceler-ation structuresfor interactive ray tracing. In Proceedingsof the 11thEurographicsWorkshopon Rendering, EurographicsAssociation,299–306.

SAMET, H. 1990. The Designand Analysisof SpatialData Structures.Addison-Wesley LongmanPublishingCo.,Inc.

SEN, P., CAMMARANO, M., AND HANRAHAN, P. 2003. Shadow silhou-ettemaps.ACM TransactionsonGraphics(TOG) 22, 3, 521–526.

STAMMINGER, M., AND DRETTAKIS, G. 2002.Perspective shadow maps.In Proceedingsof ACM SIGGRAPH2002, ACM Press/ ACM SIG-GRAPH,J.Hughes,Ed.,557–563.

WALD, I ., PURCELL , T. J., SCHMITTLER, J., BENTHIN, C., AND

SLUSALLEK , P. 2003. Realtimeray tracingandits usefor interactiveglobalillumination. In Stateof theArt Reports,EUROGRAPHICS2003.

WHITTED, T. 1980. An improved illumination modelfor shadeddisplay.Communicationsof theACM 23, 6 (June),343–349.

WI LL IAMS, L . 1978. Castingcurved shadows on curved surfaces.Com-puterGraphics(SIGGRAPH’78 Proceedings)12, 3 (August),270–274.