Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Game Engine Gems, Volume One
Edited by Eric Lengyel, Ph.D.
JONES AND BARTLETT PUBLISHERS
Sudbury, Massachusetts
BOSTON, TORONTO, LONDON, SINGAPORE
World Headquarters: Jones and Bartlett Publishers, 40 Tall Pine Drive, Sudbury
Jones and Bartlett Publishers Canada, 6339 Ormindale Way, Mississauga, Ontario L5V 1J2, Canada
Jones and Bartlett Publishers International, Barb House, Barb Mews, London W6 7PA, United Kingdom
Jones and Bartlett's books and products are available through most bookstores and online booksellers. To contact Jones and Bartlett Publishers directly, call 800-832-0034, fax 978-443-8000, or visit our website, www.jbpub.com.
Substantial discounts on bulk quantities of Jones and Bartlett's publications are available to corporations, professional associations, and other qualified organizations. For details and specific discount information, contact the special sales department at Jones and Bartlett via the above contact information or send an email to specialsales@jbpub.com.
Copyright © 2011 by Jones and Bartlett Publishers, LLC
ISBN-13: 9780763778880
ISBN-10: 0763778885
All rights reserved. No part of the material protected by this copyright may be reproduced or utilized in any form, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the copyright owner.
The publisher recognizes and respects all marks used by companies, manufacturers, and developers as a means to distinguish their products. All brand names and product names mentioned in this book are trademarks or service marks of their respective companies. Any omission or misuse (of any kind) of service marks or trademarks, etc., is not an attempt to infringe on the copyright of others.
Production Credits
Publisher: David Pallai
Editorial Assistant: Molly Whitman
Senior Production Editor: Katherine Crighton
Associate Marketing Manager: Lindsay Ruggiero
V.P., Manufacturing and Inventory Control: Therese Connell
Cover Design: Kristin E. Parker
Cover Image: © Jesse-lee Lang/Dreamstime.com
Printing and Binding: Malloy, Inc.
Cover Printing: Malloy, Inc.
Printed in the United States of America
Contributor Biographies
Rémi Arnaud is working as Chief Software Architect at Screampoint International, a company providing interoperable 5D digital city models for the benefit of governments, property owners, developers, designers, contractors, managers, and service providers. Rémi's involvement with real-time graphics started in the R&D department of Thomson Training & Simulation (now Thales) designing and then leading the Space Magic real-time visual system for training simulators, where he finalized his Ph.D. "La synthèse d'images en temps réel". He then relocated to California to join the Silicon Graphics IRIS Performer team, working on advanced features such as calligraphic light points for training pilots. He then decided to be more adventurous and co-founded Intrinsic Graphics, where he co-designed the Alchemy engine, a middleware targeting cross-platform game development for PS2, Xbox, GameCube, and PC. He was hired as Graphics Architect at Sony Computer Entertainment US R&D, working on the PlayStation 3 SDK graphics API, and joined the Khronos Group to create the COLLADA asset exchange standard. More recently, Rémi worked at Intel where he created and led the Larrabee Game Engine Technology team.
Ron Barbosa has been an avid hobbyist game and game technology developer since his teenage years. Since 1993, he has worked as a professional network/software engineer for many companies producing internet technologies, including former technology giants Compaq Computer Corporation and Lucent Technologies, Inc. He currently serves as the Chief Software Architect at Boca Raton, Florida's Revelex Corporation, a travel technology services provider. In his short spurts of spare time, he attempts to remain active in indie game development circles and is the original author of Planet Crashmania 9,000,000 available on Microsoft's Xbox LIVE Indie Games service and Apple's iPod Touch Apps Store (ported to iPod Touch by James Webb).
John Bolton is a software engineer at Netflix in Los Gatos, California and has been programming games professionally since 1992. He has contributed to dozens of games and has been lead programmer on several titles, including I Have No Mouth and I Must Scream, Heroes of Might and Magic, and High Heat Baseball.
Khalid Djado is a Ph.D. student in the Department of Computer Sciences at University of Sherbrooke. His research interests include computer graphics and physical simulations. He is a lecturer for graduate students in game development for the University of Sherbrooke at Ubisoft Campus. He was also a game developer at Amusement Cyanide in Montreal. He obtained a bachelor's degree in applied mathematics from the University Sidi Mohamed Ben Abdellah in Morocco, and a master's in modelling, simulation, and optimisation from the University of Bretagne Sud in France. He has been a member of ACM Siggraph since 2006.
Richard Egli has been a professor in the Department of Computer Sciences at University of Sherbrooke since 2000. He received his B.Sc. degree in Computer Science and his M.Sc. degree in Computer Sciences at University of Sherbrooke (Québec, Canada). He received his Ph.D. in Computer Sciences from University of Montréal (Québec, Canada) in 2000. He is the director of the centre MOIVRE (MOdélisation en Imagerie, Vision et RÉseaux de neurones). His research interests include computer graphics, physical simulations, and digital image processing.
Simon Franco sampled his first taste of programming on the Commodore Amiga, when he wrote his first Pong clone in AMOS, and he has been coding ever since. He joined the games industry in 2000 after completing a degree in computer science. He started at The Creative Assembly in 2004, where he has been to this day. When he's not playing the latest game, he'll be writing assembly code for the ZX Spectrum.
Anders Hast works half time as a Visualization Expert at UPPMAX (Uppsala Multidisciplinary Center for Advanced Computational Science) and half time as associate professor at the University of Gävle, both in Sweden. He has published well over 50 scientific papers in journals, in conferences, and as book chapters in various areas in computer graphics, visualization, and applied mathematics. His other interests in life, besides computer graphics research, are US model trains, drinking Czech beer, and studying the Italian language.
Dan has spent over 10 years in the games industry, starting with Stainless Steel Studios. He was one of the original creators of the Titan game engine, and was one of the chief AI programmers on Empire Earth, Empires: Dawn of the Modern World and Rise & Fall: Civilizations at War. Later, he worked at Tilted Mill on Caesar IV and SimCity Societies. Today, along with his wife, he is owner and manager of Lunchtime Studios, Inc.
Adrian Hirst has been shedding blood, sweat, and tears programming on any and every gaming platform for the last ten years, working with many leading developers and publishers, most recently including Sony, Codemasters (LMA 2002, Colin McRae PC 3, 4, 5, 2005+), and Electronic Arts/Criterion (Burnout: Paradise). Most recently he set up Weaseltron Entertainment in order to join the growing masses of independent developers and apply his skills to new challenges. He is also remarkably good looking, writes his own biography, and needs a beer.
Jason Hughes is an industry veteran game programmer of 15 years and has been actively coding for 25 years. His background covers everything from modem drivers in 6502 assembly to fluid dynamics on the Wii to a multi-platform 3D engine. Jason tinkers with exotic data structures, advanced compression algorithms, and various tools and technology relating to the games industry. Prior to founding Steel Penny Games, Jason spent several years at Naughty Dog on the ICE team writing the asset pipeline tools used by PS3 developers in the ICE and Edge libraries.
Frank Kane is the owner of Sundog Software, LLC, makers of the SilverLining SDK for real-time rendering of skies, clouds, and precipitation effects (see www.sundog-soft.com for more information). Frank's game development experience began at Sierra On-Line, where he worked on the system-level software of a dozen classic adventure game titles including Phantasmagoria, Gabriel Knight II, Police Quest: SWAT, and Quest for Glory V. He's also an alumnus of Looking Glass Studios, where he helped develop Flight Unlimited III. Frank developed the C2Engine scene rendering engine for SDS International's Advanced Technology Division, which is used for virtual reality training simulators by every branch of the US military. He currently lives with his family outside Seattle.
Jan Krassnigg is studying Information Technologies at the University of Aachen, Germany.
Martin Linklater has been programming since 1981 when he was ten years old. After spending his teenage years hacking C64 and Amiga code, he got a Bachelor's Degree in Computer Science in 1993. His first job in the games industry was as a programmer for Psygnosis, soon to become Sony Computer Entertainment Europe. After five years at SCEE he left with five colleagues to start Curly Monsters, an independent development house. Curly Monsters closed in 2003 after releasing two titles. Martin worked for a short time for EA, then returned to Sony in 2003. Martin is currently a Technical Director working on an undisclosed Sony title. Martin lives in Wallasey, UK with his wife and two-year-old son. He enjoys games, flight simulation, sim racing, and beer.
Colt McAnlis is a graphics programmer at Blizzard Entertainment, where he works on stuff he typically can't talk about. Prior, Colt was a graphics programmer at Microsoft Ensemble Studios, where in his free time he moonlighted as an Adjunct Professor at SMU's GUILDHALL school for video game development. He has been published in a number of industry books, and continues to be an active participant in speaking at conferences.
Jeremy Moore is the lead engine programmer for the Core Technology Group at Disney's Black Rock Studio in Brighton, UK. He has been working in the games industry for over a decade. Four of those years were spent working on SCEA's ATV Offroad Fury games on both PS2 and PSP. Among other things, he was responsible for the acclaimed network play implementation. He now specializes in real-time graphics and being ordered around by his two young daughters.
Jon Parise is a senior software engineer at Electronic Arts. He has worked on a number of titles, including The Sims 3, The Lord of the Rings: The White Council, Ultima Online, and The Sims Online. He was also a contributing author for Massively Multiplayer Game Development 2. Jon earned a bachelor's degree in Information Technology from the Rochester Institute of Technology and a master's degree in Entertainment Technology from Carnegie Mellon University.
Kurt Pelzer is a Senior Software Engineer and Software Architect with a decade of experience in team-oriented projects within the 3D real-time simulation and games industry. At Piranha Bytes, he has taken part in the development of the games Risen (PC & Xbox 360), Gothic 1–3 (PC) and the engine technology used for these products. Kurt has published articles in the technical book series GPU Gems, Game Programming Gems, and ShaderX.
Aurelio Reis is a programmer at id Software, where he works on graphics and special effects. While he's interested in all aspects of game development, he especially enjoys working on networking and gameplay as well as doing research on cutting edge graphics techniques. An industry veteran and avid gamer, Aurelio has contributed to numerous titles over the years, but is most excited about the game he's working on right now, Doom 4.
Sébastien Schertenleib has been involved in academic research projects creating 3D mixed reality systems using stereoscopic visualization while completing his Ph.D. in Computer Graphics at the Swiss Institute of Technology in Lausanne. Since then, he has been holding a job as a Principal Engineer at Sony Computer Entertainment Europe's R&D Division. This role includes supporting game developers on all PlayStation platforms by providing technical training, presenting at various games conferences, and working directly with game developers via on-site technical visits and code share.
László Szirmay-Kalos is the head of the Department of Control Engineering and Information Technology at the Budapest University of Technology and Economics. He received his Ph.D. in 1992 and full professorship in 2001 in computer graphics. His research area is Monte-Carlo global illumination algorithms and their GPU implementation. He has more than two hundred publications in this field. He is a fellow of Eurographics.
Balázs Tóth is an assistant professor at the Budapest University of Technology and Economics. He is involved in distributed GPGPU projects and deferred shading rendering and is responsible for the CUDA education of the faculty.
Tamás Umenhoffer is an assistant professor at the Budapest University of Technology and Economics. His research topic is the computation of global illumination effects and realistic lighting in participating media and their application in real-time systems and games.
Brad Werth is a Senior Software Engineer in Intel's Visual Computing Division. He has been a frequent speaker at the Game Developers Conference and Austin GDC.
David Williams received his M.Sc. in Computer Science from the University of Warwick in 2004 before joining City University as a Ph.D. student researching Medical Visualization. It was at this point that he developed an interest in voxels and began investigating how the concepts could be applied to game engines. He received his Ph.D. in 2008, but has continued to work on his Thermite3D voxel engine in his spare time. He now works as a graphics programmer for a game development company in the UK, and also enjoys photography and travelling.
About the Editor
Eric Lengyel is a veteran of the computer games industry with over 15 years of experience writing game engines. He has a Ph.D. in Computer Science from the University of California, Davis, and he has a Master's Degree in Mathematics from Virginia Tech. Eric is the founder of Terathon Software, where he currently leads ongoing development of the C4 Engine.
Eric entered the games industry at the Yosemite Entertainment division of Sierra Online in Oakhurst, California, where he was the lead programmer for the fifth installment of the popular adventure RPG series Quest for Glory. He then worked on the OpenGL team for Apple Computer at their headquarters in Cupertino, California. More recently, Eric worked in the Advanced Technology Group at Naughty Dog in Santa Monica, California, where he designed graphics driver software used on the PlayStation 3 game console.
Eric is the author of the bestselling book Mathematics for 3D Game Programming and Computer Graphics. He is also the author of The OpenGL Extensions Guide, the mathematical concepts chapter in the book Introduction to Game Development, and several articles in the Game Programming Gems series. His articles have also been published in the Journal of Game Development, in the Journal of Graphics Tools, and on Gamasutra.com. Eric currently serves on the editorial board for the recently renamed Journal of Graphics, GPU, and Game Tools (JGGGT).
About the CD
The accompanying CD contains supplementary materials for many of the gems in this book. These materials are organized into folders having the chapter numbers as their names. The contents include demos, source code, examples, specifications, and larger versions of many figures. For chapters that include project files, the source code can be compiled using Microsoft Visual Studio.
High-resolution color images are included on the CD for many chapters, and they can be found in the folders named Figures inside the chapter folders. All of the figures shown in the color plates section of this book are included. Additionally, color versions of figures are included from several additional chapters that were only printed in black and white.
Introduction
Overview
In the fields of computer graphics and computer game development, the word gem has been established as a term for describing a short article that focuses on a particular technique, a clever trick, or practical advice that a person working in these fields would find interesting and useful. The term gem was first used in 1990 for the first volume of the Graphics Gems series of books, which concentrated on knowledge pertaining to computer graphics. The mainstream methods for rendering 3D images have changed considerably since then, but many of those gems still comprise useful techniques today and have demonstrated a timeless quality to the knowledge they contain. Several newer book series containing the word "Gems" in their titles have appeared in related subject areas such as game programming and GPU rendering, and they all advance the notion of sharing knowledge through concise articles that each focus on a specific topic. We continue the tradition with this book, the first volume of Game Engine Gems.
Game Engine Gems concentrates on knowledge relating to the development of game engines, which encompass the architecture, design, and coding methods constituting the technological foundation for today's video games. A complete game engine typically includes large components that handle graphics, audio, networking, and physics. There may also be large components that provide services for artificial intelligence (AI) and graphical user interfaces (GUIs), as well as a variety of smaller components that deal with resource management, input devices, mathematics, multithreading, and many additional pieces of generic functionality required by the games built upon them. Furthermore, many game engines are able to run on multiple platforms, which may include PCs and one or more game consoles such as the PlayStation 3 or Xbox 360. The Game Engine Gems series is specifically intended to include all such aspects of game engine development targeting all current game platforms.
This book is divided into three parts covering the broad subject areas of game engine design, rendering techniques, and programming methods. The 28 gems appearing in this book are written by a group of 25 authors having expertise in game engine development, some quite extensive. It is our hope that the wisdom recorded in these pages and the pages of future volumes of Game Engine Gems continues to serve game developers for many years to come.
Call for Papers
At the time this book is published, work on the second volume of Game Engine Gems will have already entered its early stages. If you are a professional developer working in a field related to game development and would like to submit a contribution to the next book in the series, please visit our official website at http://www.gameenginegems.com/.
Part I: Game Engine Design
Chapter List
Chapter 1: What to Look for When Evaluating Middleware for Integration
Chapter 2: The Game Asset Pipeline
Chapter 3: Volumetric Representation of Virtual Environments
Chapter 4: High-Level Pathfinding
Chapter 5: Environment Sound Culling
Chapter 6: A GUI Framework and Presentation Layer
Chapter 7: World's Best Palettizer
Chapter 8: 3D Stereoscopic Rendering: An Overview of Implementation Issues
Chapter 9: A Multithreaded 3D Renderer
Chapter 10: Camera-Centric Engine Design for Multithreaded Rendering
Chapter 11: A GPU-Managed Memory Pool
Chapter 12: Precomputed 3D Velocity Field for Simulating Fluid Dynamics
Chapter 13: Mesh Partitioning for Fun and Profit
Chapter 14: Moments of Inertia for Common Shapes
Chapter 1 - What to Look for When Evaluating Middleware for Integration
1.2 Integration Complexity and Modularity
The first, most important feature of a middleware package is its integration complexity. Good libraries are highly modular, minimally intrusive, and easy to plug into very different codebases. In short, they make very few assumptions and are fairly decoupled from implementation details of other systems. A good library should come with reasonable defaults or require very limited configuration before the system is in a testable, working state. The integration engineer's level of experience can play a role in the usability of a system, so minimal integration is key. This promotes a rapid evaluation cycle; it leaves a bitter aftertaste when an engineer has a long integration cycle, only to find the library is not desirable. My rule of thumb is two days. Anything that takes more than two days to get running will waste a week putting it through its paces, and there aren't enough weeks in a development cycle to try out alternatives.
1.3 Memory Management
Console developers all know that careful memory management is crucial to a stable product. Middleware intended for virtual memory-backed PC systems, or even those technologies with hefty demands that next generation consoles can satisfy, may not be suitable for the more modest RAM budgets of yesteryear. It's important to know not only what memory budgets are expected, but also who is responsible for managing the allocations.
Ideally, each middleware library will have its own memory management scheme that simply initializes with two arguments: a pointer to a block of memory and its size. A middleware library author is the person most knowledgeable of the sizes of allocations, the churn rate for allocations, and the memory management algorithm most appropriate to prevent fragmentation within their own allocation pool. Further, this has the advantage of explicitly notifying the developer when the memory budget has been exceeded, rather than collecting undesirably large amounts of memory from the main heap where tracking down memory consumption can be tedious and problematic. Also, this method tends to highlight integration mishaps where library assets are being leaked, because it soon fails when the heap is exhausted. A resizable heap is sometimes desirable, specifically if certain game levels need to shift memory privileges to emphasize different systems, though this feature is fairly rare.
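The two-argument initialization described above might look roughly like the following C++ sketch. The class name FixedPool and its methods are hypothetical and not taken from any particular middleware, and the bump-allocation strategy is a deliberate simplification; a real library would pick whatever algorithm best resists fragmentation for its own allocation pattern.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical middleware-side pool; all names are illustrative only.
class FixedPool {
public:
    // The host hands over one block and its size. The pool never grows.
    FixedPool(void* block, std::size_t size)
        : base_(static_cast<std::uint8_t*>(block)), size_(size), used_(0) {}

    // Bump allocation. Returns nullptr as soon as the budget is exceeded,
    // so an overrun is reported instead of spilling into the main heap.
    // 'align' is assumed to be a power of two.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        std::uintptr_t start = reinterpret_cast<std::uintptr_t>(base_) + used_;
        std::uintptr_t aligned =
            (start + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        std::size_t padding = static_cast<std::size_t>(aligned - start);
        std::size_t remaining = size_ - used_;
        if (bytes > remaining || padding > remaining - bytes) {
            return nullptr;  // budget exhausted
        }
        used_ += padding + bytes;
        return reinterpret_cast<void*>(aligned);
    }

    // Releases everything at once, e.g. between game levels.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }
    std::size_t capacity() const { return size_; }

private:
    std::uint8_t* base_;
    std::size_t size_;
    std::size_t used_;
};
```

A host would typically reserve the block once at startup, pass it to the constructor, and treat a nullptr result as the signal that the library's memory budget needs revisiting.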
In the absence of a completely isolated heap managed by the middleware library, a good fallback takes one of three forms: you are expected to provide a pair of missing functions for alloc() and free(), you can override weakly declared functions with your own, or you can compile the library from source and provide a #define macro that is used throughout for allocation. None of these methods incurs the per-allocation indirect call overhead of a registered callback, because they are resolved by the compiler or linker. With any of them, you have the option of forwarding allocations to the main heap (refrain!), or declaring a special heap that is dedicated to the subsystem using your choice of allocation strategy. Some middleware provides only a method for registering allocation callbacks. It is a common pet peeve to unnecessarily waste CPU cycles, and this is one minor gripe I have about otherwise solid offerings.
The worst possible situation is a library that is littered with direct calls to new/malloc and delete/free. This provides you no simple means to encapsulate the system's resources and no simple way to measure or limit its consumption.
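The #define approach can be sketched as follows, assuming the library is compiled from source. The macro names MW_ALLOC and MW_FREE and the host and mw namespaces are hypothetical. For brevity the host functions here simply count allocations on top of malloc; a real integration would more likely route them to a heap dedicated to the subsystem.

```cpp
#include <cstddef>
#include <cstdlib>

// --- Host side (hypothetical names): replacements that track usage. ---
namespace host {
    std::size_t g_live_allocations = 0;

    void* tracked_alloc(std::size_t bytes) {
        void* p = std::malloc(bytes);
        if (p) ++g_live_allocations;
        return p;
    }

    void tracked_free(void* p) {
        if (p) {
            --g_live_allocations;
            std::free(p);
        }
    }
}

// The host defines these before the library source is compiled.
#define MW_ALLOC(bytes) host::tracked_alloc(bytes)
#define MW_FREE(ptr)    host::tracked_free(ptr)

// --- Library side: defaults apply only if the host supplied nothing. ---
#ifndef MW_ALLOC
#define MW_ALLOC(bytes) std::malloc(bytes)
#endif
#ifndef MW_FREE
#define MW_FREE(ptr) std::free(ptr)
#endif

namespace mw {
    struct Buffer {
        unsigned char* data;
        std::size_t size;
    };

    // Every allocation in the library goes through the macros, so the
    // choice of allocator is resolved at compile time.
    Buffer create_buffer(std::size_t size) {
        return Buffer{ static_cast<unsigned char*>(MW_ALLOC(size)), size };
    }

    void destroy_buffer(Buffer& b) {
        MW_FREE(b.data);
        b.data = nullptr;
        b.size = 0;
    }
}
```

Because the macros expand to direct calls, the compiler can inline them, and the host gains a single place to measure or cap the library's consumption.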
1.4 Mass Storage I/O Access
Accessing optical media is very slow, and seeking, even on hard drives, can dominate load times. While certain kinds of middleware need to access the file system, particularly sound systems that stream music or asynchronous streaming systems that load game assets in the background, most do not need direct access to the physical media API.
These exceptions aside, middleware generally does require access to some assets, but it should never do so by directly requesting them from the underlying system API. A good middleware library provides explicit hooks for the developer to overload file and data requests easily so they can be funneled through any custom file system (e.g., WAD or pack files) in which the developer may have chosen to store data or redirect the file request to different hardware.
The most flexible libraries do not attempt to treat data as files or streams at all; rather, they deal exclusively with bulk memory buffers and leave resource acquisition firmly in the hands of developers. This approach simplifies error handling and load time optimization, and it is more rapidly deployed to new hardware with dependable results.
The worst middleware libraries (I've seen this frequently in open source code) assume that a POSIX file system is always available, or one like it, and depend directly on C runtime library facilities such as FILE and fopen().
Another misguided attempt to address file system abstraction is the invasive extension of file streams by providing an abstract interface for a reader/writer class that the user provides. A derived instance of this interface grabs data via a virtual function one byte at a time, or even in small fixed-size blocks. This maps very poorly to real-world data and performance characteristics, and it should be avoided at all costs.
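The buffer-only style praised above can be illustrated with a small C++ sketch. The asset layout (a 4-byte magic "MWA1" followed by a payload), the AssetView type, and parse_asset are all invented for illustration. The point is only that the library interprets bytes the host has already loaded and never touches the file system itself.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace mw {
    // Hypothetical view into an asset the host has already loaded.
    struct AssetView {
        const std::uint8_t* payload;
        std::size_t payload_size;
    };

    // Validates the header and exposes the payload. No file handles,
    // no streams: how the bytes reached memory is the host's business.
    inline bool parse_asset(const void* data, std::size_t size,
                            AssetView& out) {
        static const char kMagic[4] = {'M', 'W', 'A', '1'};
        if (data == nullptr || size < sizeof(kMagic)) {
            return false;
        }
        if (std::memcmp(data, kMagic, sizeof(kMagic)) != 0) {
            return false;
        }
        out.payload = static_cast<const std::uint8_t*>(data) + sizeof(kMagic);
        out.payload_size = size - sizeof(kMagic);
        return true;
    }
}
```

With an interface like this, the host is free to read the asset from a pack file, a network cache, or a single large sequential read, and error handling for I/O stays entirely in host code.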
1.5 Logging
There will be times when expectations will not be met inside a middleware library. A good library handles warnings in a uniform way and has a way to integrate them into your existing log system. Better libraries will have some kind of trivial verbosity setting that ranges from noisy to absolutely silent and will preferably compile out the strings and error checking entirely. Noisy verbosity settings that spew plenty of details about the data being fed in afford developers a sense of trust that the library has been integrated correctly and its files are being read properly. However, if you cannot compile this out entirely, it comes at a cost of memory and CPU performance that is unacceptable in final shipping builds. Some middleware comes with release and debug libraries for this reason alone.
The best-architected middleware products have a simple method for hooking a logging output callback so the logs can be integrated into the game's existing reporting system. Beware any library that blithely calls printf(), since this function is relatively expensive and may not even have a standard output pipe connected, providing no means to alert the user.
1.6 Error Handling
The best middleware will not have error conditions at all because it always works. I hold out hope that such a thing exists. Meanwhile back on Earth, errors are inevitable, and the handling of them is an important trait when determining whether some software can operate in your project environment. There are different schools of thought about when an error is always fatal, always recoverable, or in the gray area in between. Middleware libraries that give you no choice in the matter, no way to override the handling of errors, tend to be graded downward.
As console developers know, patching a game is often impossible, and some bugs are virtually impossible to reproduce or track down. Sometimes "heroic measures" are called for to guarantee the game does not crash even when some subsystem experiences complete and utter failure. Every piece of software should have some capacity to forward serious errors through a handler, where in a fit of desperation you can silence the sound system, clear the screen and print "You Win!" before hanging, or at least reboot. Steer clear of anything that appears to use exit(), abort(), or even naked assert() calls.
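One way a library might forward errors through a host-supplied handler, rather than calling exit() or abort(), is sketched below in C++. The names (mw::ErrorHandler, set_error_handler, raise_error, play_sound) and the error code are hypothetical.

```cpp
namespace mw {
    // Host-supplied handler. Returns true if the host dealt with the
    // error and wants the library to carry on.
    using ErrorHandler = bool (*)(int code, const char* message, void* user);

    struct ErrorState {
        ErrorHandler handler = nullptr;
        void* user = nullptr;
    };

    inline ErrorState& error_state() {
        static ErrorState state;
        return state;
    }

    inline void set_error_handler(ErrorHandler handler, void* user) {
        error_state().handler = handler;
        error_state().user = user;
    }

    // Library-internal reporting path: the host decides what happens next.
    inline bool raise_error(int code, const char* message) {
        ErrorState& s = error_state();
        if (s.handler != nullptr) {
            return s.handler(code, message, s.user);
        }
        return false;  // no handler installed: the operation just fails
    }

    // Example operation that fails softly instead of terminating.
    inline bool play_sound(int sound_id) {
        if (sound_id < 0) {
            raise_error(1, "invalid sound id");
            return false;
        }
        return true;  // real playback would happen here
    }
}
```

Inside the handler the host can take whatever heroic measure suits the situation, such as muting the audio subsystem for the rest of the session, without the library ever taking the process down on its own.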
1.7 Stability and Performance Consistency
Middleware should be stable. After all, the main reason developers don't write something themselves is the time required to write and debug it. Libraries that are unable to report (and attempt to recover from) common trivial errors are fragile, and will be often cursed by developers. The best middleware will never crash, will fix garbage inputs or ignore them, and will leave plenty of debugging breadcrumbs for programmers to follow while tracking down the problem. Top tools and engines can sustain significant data corruption and missing files without crashing outright; a crash can end up impeding an entire team's progress while the problem is dealt with.
Every project has different expectations when it comes to performance, so it's up to you to judge for yourself what is acceptable in absolute terms. However, every middleware library that you consider usable should also have a consistent memory and CPU performance profile. A stable frame-to-frame memory footprint is essential for shipping games on time. Occasional spikes in the frame rate can be hard to track down, too, and severely degrade the game experience. Good middleware should be a rock when instrumented with a profiler in every situation.
A recent project I worked on had a single-frame memory spike of 2 MB and 100 ms. After tracking it down, I found it was due to a minor change in how a level script was written. A more consistent virtual machine would have limited the number of instructions it would execute or limited execution to a time slice. A more stable library would have kept a closer watch on its memory usage. Anecdotal evidence is sometimes all you can get here until you run afoul of the problem yourself, at the worst possible time. Ask around.
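The instruction limit mentioned above can be sketched with a toy virtual machine in C++. The three-opcode instruction set and all names are invented for illustration; the only point is that run_slice() yields after a fixed budget, so a badly written script spreads its cost over several frames instead of producing one spike.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace vm {
    // Toy instruction set, purely illustrative.
    enum class Op : std::uint8_t { Increment, JumpToStart, Halt };

    struct Script {
        std::vector<Op> code;
        std::size_t pc = 0;        // program counter, kept between slices
        std::int64_t counter = 0;  // toy script state
        bool halted = false;
    };

    // Executes at most max_instructions and then yields to the caller.
    // Returns the number of instructions actually executed.
    inline std::size_t run_slice(Script& s, std::size_t max_instructions) {
        std::size_t executed = 0;
        while (!s.halted && executed < max_instructions &&
               s.pc < s.code.size()) {
            switch (s.code[s.pc]) {
                case Op::Increment:
                    ++s.counter;
                    ++s.pc;
                    break;
                case Op::JumpToStart:
                    s.pc = 0;
                    break;
                case Op::Halt:
                    s.halted = true;
                    break;
            }
            ++executed;
        }
        if (s.pc >= s.code.size()) {
            s.halted = true;  // ran off the end of the program
        }
        return executed;
    }
}
```

Calling run_slice() once per frame with a fixed budget bounds the script's per-frame CPU cost regardless of how the level script is written; a time-based budget would follow the same pattern with a clock check in place of the counter.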
1.8 Custom Profiling Tools
Whenever considering a package for inclusion, I am impressed with tools that reduce uncertainty near the end of a project. These include any system that comes with a monitoring API, or better yet, a profiler of some sort, that cuts days of potential programmer time near the end of the game when memory is tight, CPU time is scarce, and nobody knows where it's going. Being able to immediately pull out a tool that can instrument a part of the game helps to rapidly narrow down the search. Gaining visibility into content is typically hard, so any help your middleware provides is tremendous.
1.9 Customer Support
Commercial middleware typically offers integration specialists, telephone support, direct email support, and forums or mailing lists. Sometimes, this is exactly what a team needs to move forward, especially when they reach a major stumbling block. The whole reason for middleware is to reduce risk and uncertainty, which customer support does. Middleware that comes with no expectation of support is useless. I have discarded more than a few great and promising technologies simply because the author was unavailable to answer a couple of questions.
Watch for fast response times in forums and dedicated personnel who answer questions on the phone or in email, and be prepared to send your integration code to them when they ask for it. (That's all the better reason to keep it tidy.)
1.10 Demands on the Maintainers
Programmers are busy, expensive resources. Middleware that requires a lot of effort and attention to integrate well with your flavor of build system is going to be kicked to the curb very quickly. Ideally, you simply add a library to the build process, #include a header file, and call a few functions. Ta-da! It's integrated.
Sometimes, the middleware is more invasive, and requires various #define macros to be set to configure it. Or perhaps it needs to be integrated directly into your project and compiled with your game. Additionally, some middleware has external dependencies that must be present for it to compile. Worse still, it may require introducing a new tool into the build process that may not work well with the build system. Turnkey systems are clearly preferable. I look for middleware that comes with GCC and Microsoft Visual Studio project files, but with extremely basic project configurations. This proves that the compilers I care about can handle the code, and that I can throw away the project files and integrate them my own way after an initial build using the provided project file.
1.11 Source Code Availability
Eventually, you may need to debug into the source code of a middleware product you plan to integrate. If it is a proprietary, closed-source product without a source code option at a reasonable price, look for alternatives. While certain "industry secret" parts of libraries may be binary-only, vendors know that source code is expected and will often provide it with a license. Those who don't provide source typically claim that their customer support obviates that need. While this may be true, the moment there's an issue that customer support can't solve, but source code could, is the moment we start searching for a replacement. Again, a big reason for using middleware is that it's proven and stable, so source code should not really be necessary. But the availability is still important should the need arise.
1.12 Quality of Source Code
Having source code does not always mean you have the ability to make meaningful changes. The best middleware has excellent documentation that is auto-generated from source on every new code drop. It will have a familiar and consistent bracing scheme, some kind of naming convention for functions and variables, and ideally will live inside a brief namespace.
Take a moment to peruse the important header files. Look for #define macros that redefine language keywords or common functions such as new, min, and max, or in fact any macro that escapes the scope of the header and could cause issues elsewhere. Use an archive-inspection utility such as dumpbin (in Windows) or nm (in Linux) to verify that the only exported symbols the middleware library defines are in a consistent namespace to avoid conflicts with other libraries or your own code.
Questionable middleware will be littered with #pragma statements, will disable errors, will reduce the warning level, etc. Scrutinize this heavily. When these statements appear in header files, they can harm the quality of your code and may cause some files to stop compiling simply by including them.
1.13 Platform Portability
Certain kinds of middleware are likely to be platform independent, but even if you have the source code there may be lurking byte ordering (endianness) issues. Inspect the file and network stream handling for endian swapping code or perhaps different asset building tools per platform that set up data in native formats.
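As an example of the kind of endian handling to look for, the following C++ sketch reads 32-bit values byte by byte, which gives the same result regardless of the host's byte order or alignment rules. The function names are illustrative and not from any specific library.

```cpp
#include <cstdint>

namespace mw {
    // Reads a 32-bit little-endian value from a byte stream. Assembling
    // the value with shifts avoids any dependence on host endianness.
    inline std::uint32_t read_u32_le(const std::uint8_t* p) {
        return static_cast<std::uint32_t>(p[0])
             | (static_cast<std::uint32_t>(p[1]) << 8)
             | (static_cast<std::uint32_t>(p[2]) << 16)
             | (static_cast<std::uint32_t>(p[3]) << 24);
    }

    // Same bytes interpreted as big-endian.
    inline std::uint32_t read_u32_be(const std::uint8_t* p) {
        return (static_cast<std::uint32_t>(p[0]) << 24)
             | (static_cast<std::uint32_t>(p[1]) << 16)
             | (static_cast<std::uint32_t>(p[2]) << 8)
             | static_cast<std::uint32_t>(p[3]);
    }

    // Explicit swap, as an asset build tool might apply when converting
    // data to a target platform's native format.
    inline std::uint32_t byte_swap32(std::uint32_t v) {
        return ((v & 0x000000FFu) << 24)
             | ((v & 0x0000FF00u) << 8)
             | ((v & 0x00FF0000u) >> 8)
             | ((v & 0xFF000000u) >> 24);
    }
}
```

Code that instead casts a byte pointer to a wider integer type and dereferences it is a sign that the library was written with a single platform in mind.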
Multithreading is very common in games today. Some middleware libraries predate this shift. Determine how thread locking is handled, if the code is thread-safe, and whether the thread controls are easy to override with a different platform's interface. Be aware that any library that does nothing with threading is, by its very nature, likely to be unsafe to use in a threaded environment. Factor in time for making the library thread-safe, or at least limiting its use to a single, specific thread at all times.
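A coarse way to make a thread-unaware library safe is to serialize every call behind one lock, as in the C++ sketch below. The legacy namespace stands in for a hypothetical library with unprotected global state, and LockedLibrary is an invented host-side wrapper. This trades concurrency for safety, and it only works if all access really goes through the wrapper.

```cpp
#include <mutex>

// Stand-in for a hypothetical library with hidden, unprotected state.
namespace legacy {
    int g_handle_counter = 0;

    int next_handle() { return ++g_handle_counter; }
}

namespace host {
    // Serializes all calls into the non-thread-safe library.
    class LockedLibrary {
    public:
        int next_handle() {
            std::lock_guard<std::mutex> lock(mutex_);
            return legacy::next_handle();
        }

    private:
        std::mutex mutex_;
    };
}
```

The alternative mentioned in the text, confining the library to one specific thread, avoids the lock entirely but requires other threads to post requests to that thread instead of calling the library directly.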
Alsoconsiderwhetheryourgamemaybeportedtoaless-capablehardwareplatformatsomepoint.Ifitmight,determinewhatfeaturesareuniquetothismiddlewareofferingthatwouldmakeportingtoadifferentlibraryparticularlydifficultifthislibrarydoesnotexistforlower-endplatforms.
1.14 Licensing Requirements
I am not a lawyer, and this is not legal advice. Consult a lawyer of your own to answer specific questions, and read every license personally. Also note, most publishers and even some hardware manufacturers have strict regulations about what open source licenses are allowed in products you develop. Clear all open source licenses through their legal departments before integrating one. What follows are my informed layman's opinions.
Every open source license that I know of springs into action as the result of distributing software. What this means is, as long as you do not distribute the software outside your company, you can use whatever you like for applications used internally. Feel free to use any open source code in your internally-facing server technology or tools, but do not plan to distribute those programs.
Public domain code is completely harmless. Once you copy the code to your hard drive, you own it, and every modification you put into it is yours to keep. You may of course change the license to something else, release it back to the public domain, or keep it secret. It's yours.
The MIT, Zlib, etc., licenses appear to more or less disavow any responsibility for how you use their software. They do not require anything except a credit clause and that certain notices remain in the code. This is not a restrictive license with respect to retail use.
The LGPL license has some stipulations that require distribution of the source of the library (and possibly parts of your own application). Read this carefully before using in retail products.
The GPL license and many other similar licenses require complete code release on distribution if you integrate them into your products.
Frequently, commercial middleware licenses will require credit attributions, splash screens, an intro movie, etc. Be careful to adhere to these, and give credit where it's due: middleware authors helped make your game happen.
1.15 Cost
Every game has a different budget for middleware. Expect prices to vary widely from $100 to $750,000, depending on what kind of product you're looking at. Great middleware demands professional attention to detail, and when using a library professionally, good support is important (but often costs more depending on your level of need).
When looking for a library that costs money, consider whether the cost is per-seat, per-game, per-platform, or if it has a maintenance license such as per-annum. Many middleware companies have special rates for digital download titles versus retail titles. Also, be willing to call up the account manager at your chosen middleware shop and try to sweet talk them into a discount. You might be surprised at how flexible they are if this is your first title with them; they know very well that middleware tends to become entrenched and expensive to remove, so it's good for them to get you hooked on their product. Try to work out multiple project deals, if you have that kind of leverage.
Some libraries, such as Zlib, have evolved to the point where there is plenty of support online in forums and sample code, and the number of bugs is approaching zero. When libraries calcify, the fact that they are free is not an issue. However, open source libraries are not free in general unless your time has no value. Take an estimate of how many man-weeks it will take to implement the various missing features listed above, assign a price to those man-weeks and compare that with the cost of the middleware library. If they're even close to parity, go with commercial middleware, simply because it's established, tested, supported, and best of all, you will free up programmers to work on actual game features instead of patching up someone else's pet project.
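The comparison above is simple arithmetic, sketched below. The function names, the 25 percent reading of "close to parity", and every figure used with them are assumptions chosen for illustration, not numbers from the text.

```python
def in_house_cost(man_weeks, weekly_rate):
    """Price of implementing the missing features yourself."""
    return man_weeks * weekly_rate


def prefer_commercial(license_cost, man_weeks, weekly_rate, parity_margin=0.25):
    """True when the license costs less than, or close to, the in-house estimate.

    parity_margin is a hypothetical knob for how close counts as "close to parity".
    """
    threshold = in_house_cost(man_weeks, weekly_rate) * (1 + parity_margin)
    return license_cost <= threshold
```

For example, with an assumed 20 man-weeks at 2,500 per week, the in-house estimate is 50,000, so a 60,000 license would still be preferred under this margin while a 100,000 license would not.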
Chapter 2: The Game Asset Pipeline
Rémi Arnaud, Screampoint International
Highlights
The Game Asset Pipeline (a.k.a. the Content Pipeline, and hereafter, simply the Asset Pipeline) is an intricate mechanism that interfaces the artists to the game engine. Game teams have seen their artist/programmer ratio skyrocketing in order to cope with the quantities of high-quality assets on tight schedules. Making sure the assets are delivered in the right format to the right place is an important part of managing the creation process so that artists and designers can create, preview, add, and tweak content as easily and swiftly as possible.
This gem is composed of the following sections:
Asset pipeline overview: What is the role and composition of an asset pipeline?
Asset pipeline design: What are the main goals and hurdles involved in designing an asset pipeline?
Push or pull pipeline model: Approaching the problem from a different perspective to achieve a better design.
COLLADA, a standard intermediate language for digital assets: Taking advantage of a standard and its broad availability to build an asset pipeline.
OpenCOLLADA, a new open source framework based on SAX parsing and direct write technology.
User content: Retargeting the asset pipeline to foster user content creation.
2.1 Asset Pipeline Overview
The asset pipeline is the path that all the game assets follow from their creation tool to their final in-game version. There are a lot of asset types needed in a game: models, materials, textures, animations, audio, and video, to name a few. The asset pipeline takes care of obtaining the data from the tools used for their creation, and then optimizes, splits or merges, converts, and outputs the final data that can be used by the game engine. A typical asset pipeline is described in Figure 2.1 and explained in the following paragraphs.
Figure 2.1: Asset Pipeline Components.
Source Assets
Source assets are created by artists and designers. The tools used to create source assets are usually referred to as Digital Content Creation (DCC) tools. In general, source assets are created through a set of specialized tools: one for modeling, one for texture creation, one for gaming data, and one for level layout. Source assets should contain all the information necessary to build the game content. In other words, it should be possible to delete all the assets except for the source assets and still be able to rebuild the game content. Source assets are often kept in the tool-specific format, and they should be stored using a version control system such as Subversion [1] or Perforce [2]. Since source assets are rather large and often in binary format, be sure to use a version control system that can handle large binary files.
Final Assets
Final assets are optimized and packaged for each target platform. Often enough, the developer has no choice in what the final format has to be since the target platform, third party engine, or the publisher may impose a given format. There is no need for superfluous information in the final assets, so all the data is pruned to its required minimum. Just like source code, assets have to be compiled and optimized for each target. For instance, textures should be stored in the format and size that the hardware can use immediately, as texture compression algorithms and parameters need to be optimized for the target texture hardware and display capabilities. Geometry data is also compiled for each target. Most platforms will only accept triangles, but the number of triangles per batch, the sorting order of the vertices, the need for triangle strips, and the amount of data per vertex will have a big impact on the performance of the game engine. Limitations on the shader rendering capabilities of the hardware may also have an impact on the geometry since it may be necessary to create several geometries to enable the engine to use these in separate passes to create the desired effect.
Final assets also have to be optimized for game loading speed. Conversion of the data into hardware binary representation is one of the possible optimizations since this will save CPU time while loading, but it is not necessarily the main bottleneck to optimize since I/O (bandwidth or seeking) may actually be more problematic than CPU performance (parsing, decoding) nowadays. For really simple games, it may be enough to compact the data as much as possible and load it once at launch or at each change of level. But for more evolved games, the expectation is that the data will be continuously streamed while the player is progressing in the game to provide a more immersive experience. Having a good real-time data streaming capability built into the engine is one of the fundamental core technologies that will also help with providing a similar experience for the player on various platforms, as limitations in device memory size can be hidden by a local cache mechanism. Also, if the content is to be stored on a hardware format that has specific limitations (for example, a CD or DVD may have long seek times), it may be important to sort the data in the order in which it will be encountered in the game.
Build Process
The build process is the process that transforms all the source data and creates the optimized packaged data for each target platform. The build process should be automatic, as done with the game source code. The build should be optimized to only process the data that has changed from the previous build. Building the optimized data for an entire game can take hours, if not the entire day. It is especially painful at the end of a project when there is a lot of tweaking that needs to occur in a short amount of time and the build time is the longest as the data repository is fully populated. Early in the process, little attention may be given to the optimization of the build process and the organization of the data, but the overall quality and success of the game will depend on the final editing that occurs when most of the assets have been created. In other words, a poorly designed build process is likely to directly impact the quality of the end product.
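One common way to process only the data that has changed is to compare content hashes against those recorded by the previous build. The sketch below assumes assets are available as in-memory byte strings keyed by name; the function names and the choice of SHA-256 are illustrative assumptions, not something the text prescribes.

```python
import hashlib


def content_hash(data):
    """Hex digest identifying the exact bytes of a source asset."""
    return hashlib.sha256(data).hexdigest()


def assets_to_rebuild(sources, previous_hashes):
    """Compare current source assets with the hashes saved by the last build.

    sources: dict mapping asset name to its bytes.
    previous_hashes: dict mapping asset name to the digest from the last build.
    Returns (sorted names needing a rebuild, digests to save for the next build).
    """
    current = {name: content_hash(data) for name, data in sources.items()}
    changed = sorted(name for name, digest in current.items()
                     if previous_hashes.get(name) != digest)
    return changed, current
```

This only detects assets whose own bytes changed; assets that must be rebuilt because something they reference changed need dependency information as well.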
One of the major issues with optimizing the asset build process is the lack of information about dependencies. Since source assets are often stored in a tool-specific opaque format, it is often not possible to easily parse the data and recursively collect the dependency information. The need for dependency information and its direct extraction from the source has been in place for many years in source code development, but unfortunately, such a process has not yet matured for assets and will require game engine developers to actively develop their own.
There are a few good sources of information about asset build systems, such as the book The Game Asset Pipeline [3] and the Microsoft XNA content pipeline [4]. We mention the latter in spite of the fact that XNA is designed to only target Microsoft platforms (PC and Xbox 360) and has several design limitations, such as relying on file extensions to separate the data types and requiring the build information to be embedded in the source code rather than expressed as descriptive data [5].
Manifest
One very common issue with game development is that it quickly becomes impossible to know what assets are needed, what assets depend on others, and what assets are no longer necessary. Often data is piled up into a shared directory during development, and the game works fine since all the data referenced by the scripts or other assets are available. The problem is that there is no easy way to know which assets are really necessary, so packaging a demo in the middle of a project often results in several gigabytes of compressed data, making it hard to distribute. Manually maintaining a manifest is easier said than done, as it requires imposing a strong policy on artists and designers to make sure they add all the assets referenced into the manifest and remove the ones not in use. In practice, there is no hope for maintaining a manifest manually, so it is very necessary to create an automatic system that extracts the manifest from the game's top-level description (i.e., level design files). Having access to external references stored inside the assets, such as scripts, textures, models, animation, or other references becomes an obvious need in order to be able to automatically create the manifest, so one must be wary of using opaque formats in the asset pipeline. The good news is that this information can be used to create a dependency graph enabling the build process to work only on the files that are necessary for a given level. The dependency graph also provides an understanding of what assets need to be rebuilt when they depend on assets that have been changed.
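The manifest extraction and the dependency graph described above can be sketched as a graph traversal. This is a minimal sketch assuming the external references have already been extracted into a dictionary; the function names and the asset names in the usage note are hypothetical.

```python
def build_manifest(roots, references):
    """Collect every asset reachable from the top-level description files.

    roots: iterable of top-level assets (for example, level design files).
    references: dict mapping an asset to the list of assets it references.
    """
    manifest = set()
    stack = list(roots)
    while stack:
        asset = stack.pop()
        if asset in manifest:
            continue
        manifest.add(asset)
        stack.extend(references.get(asset, []))
    return manifest


def unused_assets(all_assets, manifest):
    """Assets present in the repository that nothing in the manifest needs."""
    return set(all_assets) - manifest


def affected_by(changed, references):
    """Assets that reference, directly or indirectly, any changed asset."""
    affected = set(changed)
    grew = True
    while grew:
        grew = False
        for asset, deps in references.items():
            if asset not in affected and affected.intersection(deps):
                affected.add(asset)
                grew = True
    return affected
```

Given a level file that references a model which in turn references a texture, build_manifest returns all three, unused_assets reports anything else left in the shared directory, and affected_by lists what must be rebuilt when the texture changes.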
Fast Path
The fast path is the ability for assets to be loaded into the game engine without going through the full build process. This shortcut is very important for artist efficiency. When using the fast path, the artist invokes a minimal export mechanism that enables faster iteration. Note that an artist is often working on a particular subset of the assets, and only the data for those assets will follow the fast path; the other elements of the scene can still be loaded from the final build path. This means the engine needs to allow for both full build and fast path assets to be active simultaneously during production. The fast path loader can later be pruned from the engine unless the developer wants to keep this capability as a feature for modders (see Section 2.6). The fast path is an optimization of the overall production process, which may be very important to the success of delivering the game within time and budget constraints.
Intermediate Assets
Intermediate assets represent the data in between the source and the final form. Intermediate assets are created using an exporter from the DCC tool and also represent the data that has been processed along the build process just before final transformation into the final asset packaging. The intermediate asset format should be very easy to read, parse, and extend. Intermediate assets should contain all the information that may be necessary down the pipeline, and be lossless wherever possible [1]. There is no need to prune and over-optimize early; it is much more efficient and automatic to discard data further down in the pipeline rather than having to re-export all the source assets each time additional information is required. Plain text, or structured XML files [6] are recommended formats for most intermediate assets; this provides for human readability and occasional hand editing in addition to taking advantage of numerous libraries capable of handling standard XML format. Especially useful is the built-in capability from languages such as C#, Python, Java, and others to be able to directly load an XML document and provide easy programmatic access and search functions to the embedded document object model (DOM).
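As an illustration of the built-in DOM access mentioned above, the following sketch loads a small XML intermediate asset with Python's standard xml.dom.minidom module. The document layout (an asset element with mesh and texture children) is a made-up example, not a format defined in this chapter.

```python
from xml.dom import minidom

# Hypothetical intermediate asset; the element and attribute names are
# assumptions for illustration only.
DOCUMENT = """<asset name="crate">
  <mesh vertex_count="8"/>
  <texture file="crate_diffuse.png"/>
  <texture file="crate_normal.png"/>
</asset>"""


def texture_files(xml_text):
    """Return the file attribute of every <texture> element in the document."""
    document = minidom.parseString(xml_text)
    return [node.getAttribute("file")
            for node in document.getElementsByTagName("texture")]
```

A few lines like these are enough to collect the external references of an asset, which is much harder to do with an opaque binary format.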
Ideally, one single flexible intermediate format should be used to store the data during the entire transformation process so that the same format and I/O code can be used throughout the content pipeline and provide the flexibility to change the build process steps and their execution order. This means the format needs to store the data in a representation as close as possible to the representation in the DCC tool (in order to simplify the export process and keep data loss to a minimum), as well as in a representation as close as possible to the final encoding required by the target platforms. This involves a bit of additional complexity when designing the intermediate format, but ultimately, it is a big win to avoid overloading the export process with early transformations at the cost of export time and data loss. As shown in Figure 2.2, asset processing modules can apply transformations directly to intermediate assets, such as triangulation, texture processing, and level-of-detail calculation.
Figure 2.2: Intermediate assets along the build process.
The Pipeline
A combination of processing modules composes the build pipeline. Intermediate assets can be probed at any point in the pipeline since they are available in the same format throughout. Although not fundamentally necessary, it is possible to create a specific tool with a user interface that enables the creation of a build process by simply connecting asset processing modules to each other [7]. This design also enables the build process to be more generic in order to handle data from a variety of sources and tools, as well as being configurable to support existing and future target platforms, by combining or configuring a set of asset processing modules differently.
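The idea of a pipeline assembled from interchangeable processing modules can be sketched as a list of functions that each take an intermediate asset and return a new one in the same format. The dictionary-based asset representation and the two example modules below are assumptions for illustration; a real pipeline would operate on the intermediate format discussed in this chapter.

```python
def triangulate(asset):
    """Split every convex polygon into a fan of triangles."""
    triangles = []
    for polygon in asset["polygons"]:
        for i in range(1, len(polygon) - 1):
            triangles.append((polygon[0], polygon[i], polygon[i + 1]))
    return {**asset, "polygons": triangles}


def halve_textures(asset):
    """Down-scale the texture for a less capable target."""
    return {**asset, "texture_size": asset["texture_size"] // 2}


def run_pipeline(asset, modules):
    """Apply each module in order; input and output share one format,
    so the result can be inspected after any step."""
    for module in modules:
        asset = module(asset)
    return asset
```

Because every module consumes and produces the same representation, supporting another target platform becomes a matter of supplying a different list of modules, for example run_pipeline(asset, [triangulate]) for one target and run_pipeline(asset, [triangulate, halve_textures]) for another.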
There would be no need to use different formatting for source and intermediate assets if one single format could serve both purposes and is recognized by the DCC tools. This does not mean the data will stay untransformed through the build process; it means the same formatting can be used to store the data at all stages of the transformation. For instance, there exist hundreds of image file types/formats [8], but unfortunately, none seems able to offer all the properties to be both source and intermediate format. Ideally, we would have a format that can store uncompressed and/or lossless compression pixels for the source form. Some images need to be stored in higher resolutions than the usual 8 bits per component, so the format should allow for 16 bits, or even 24 bits per component. More and more often, high dynamic range (HDR) storage is necessary, for which an open format has been made available: OpenEXR [9]. On the other end, the intermediate format should be able to represent the data as close as possible to the target platform format, but the requirements are all over the spectrum to the point that it is almost impossible. For instance, final texture data has to represent mipmaps, cube maps, volume maps, height maps, normal maps, and texture arrays, and it has to deal with memory alignment, swizzling, tiling, and hardware compression (DXT). The D3DX10 format from Microsoft [10] is probably the closest to the needs for intermediate image format. It includes the DirectDraw Surface (DDS) description [11] as well as PNG [12], IFF [13], and BMP [14]. Moreover, it can also be used, at least on the Microsoft platform, as a final format. The problem is that there may not be an export and import workflow for this format in the image creation tool, forcing the use of two separate formats for the source and intermediate image assets.
In addition to the conversion from the source format to the intermediate format, images also have to be made available to the DCC tool, which may require yet another format conversion. A very popular tool to create images is Adobe Photoshop, whose PSD native format can store a lot of types of data. Hopefully, main DCC tools can directly load PSD images and may even take advantage of layers and other additional information. Interestingly, Adobe has recently made available the open source Generic Image Library [15], which aims at simplifying and optimizing the processing of image assets, but unfortunately, it does not provide I/O support for Adobe's own PSD format!
In the past, game engine developers had to create their own intermediate format and write their own exporter as a plug-in for each vendor-specific tool in use by artists, which can be a daunting task, especially when tracking DCC SDK issues. Luckily, this is no longer a necessity, and engine developers should instead take advantage of a standard intermediate format and available source code [16], as discussed in Section 2.4.
[1] Intermediate assets stored in an open and well-documented format are also very important for archival of the game assets. Here's an interesting story: A large studio shipped a game based on a new IP and cleverly stored all the source assets, source code, and build processes. It took many years, but eventually the sequel was ordered. The problem was that the source assets were not recognized by the new version of the DCC tools. Cleverly, they had also stored the tools used to create the assets. The problem was that there was no way to get a license server working for those old tools. Intermediate assets were stored in a binary opaque format, so no help there. All the assets had to be created again. Using a text/XML documented intermediate format in the archive would have saved a lot of time and money.
2.2 Asset Pipeline Design
From the end-user perspective, the asset pipeline should be transparent. In fact, the ultimate reward for the developer of the asset pipeline occurs when artists and game designers do not know about the existence of the pipeline and feel there is nothing in between their creation tools and the game engine. The most important piece of the asset pipeline from the user's point of view is the game engine editor. It is the tool that provides the asset integration interface into the game engine. The editor is built on top of the same engine core technology as the game itself and provides artists, designers, and engineers the necessary WYSIWYG (What You See Is What You Get) environment for putting together the assets and addressing design or engine issues rapidly.
Game Engine Editor
There are two possible workflows to consider for the game engine editor:
1. The editor is an advanced viewer, enabling the user to check the assets within the engine and have control over the camera position, lighting conditions, and which assets are displayed. In this design, the assets are modified and edited only in the DCC tools and are exported whenever there is a change.
2. Some of the assets can be edited using the editor. This alternate design creates a more complex asset pipeline workflow, as the editor is effectively creating source assets that need to be stored properly for the build process.
The two main concerns when designing a content pipeline are efficiency and flexibility [17]. Efficiency is required in order for the team building the game to be able to deliver great content on time, and flexibility is required to be able to adapt the pipeline to the evolving needs of the team. Unfortunately, these two priorities are somewhat in opposition to each other, so there isn't a single solution that can satisfy all projects. Instead, a choice of compromises defining the requirements for the design of the asset pipeline is needed.
Therefore, the game engine editor will offer a combination of view-only and editing capabilities based on the specifics of a project. The question is where to draw the line. Obviously, it is not practical to rewrite all the editing functionality of all the DCC tools in the editor. This would be a major task and change the focus of the development from creating a game to creating DCC tools, which is a different business altogether. Some have tried to go the other way around, writing the game editor as a plug-in to a DCC tool. But it does not work either, as this requires writing (for example) an image editor, audio editor, script editor, AI tool, and level editor inside a 3D package, and it locks the tool chain to the extension capabilities of the tool SDK. The right compromise is to write only the editing features in the game editor that are either very specific to the game itself or provide a very large efficiency improvement over off-the-shelf tools. In fact, a game team should always be on the lookout for the availability of new tools that could improve their efficiency. The asset pipeline should be designed to provide enough flexibility to enable the inclusion of new tools, which can be a delicate process throughout the course of a project.
The other main point to consider is robustness. The asset pipeline is the backbone of the project and the team. The project is at a standstill when the pipeline is broken, and everyone on the team is idling while they wait for the pipeline to be fixed. A solid design and the proper documentation telling a team how to use and extend the asset pipeline are the keys to its robustness.
Efficiency, flexibility, and robustness are all main goals of the intermediate asset format. The primary advantage of providing a step between the source and the final asset format is decoupling engine development and asset creation. If the final assets are to be created directly while exporting, most changes in the engine would require a synchronized change in the exporters, which would require re-exporting all the assets. It is much more efficient to cache the intermediate assets and provide the necessary changes to the fast path and the final steps of the build than it is to load all the assets back into the DCC tools and re-export. Doing so may require very expensive and time-consuming manual operations since most DCC tools are not easily or efficiently driven from the command line. A well designed intermediate format is not likely to change much during the course of a project. If the intermediate assets are correctly cached and the source asset dependencies tracked, the final assets can be recreated from the intermediate assets automatically, mimicking what is already true today for source code (where building the executable does not require rebuilding all the intermediate object code). Having an intermediate asset format in an easily readable format provides an easier way for game engine programmers to debug and experiment.
An advantage of the intermediate asset design over the object code is that it can provide for cross-platform asset creation. The build process can create final assets for several platforms from the intermediate assets, favoring a game design philosophy where source assets are created at very high resolution and the build process takes care of down-scaling the assets for the various targets. Automatically converting high-resolution assets is not always possible, but it is much more flexible and efficient than having to create source assets for each target. An example of this idea used in most pipelines nowadays is to create high-density polygon models, and then create a normal map texture associated with a lower density model, where the number of polygons and texture size can be adapted to the different targets as well as provide a means for level of detail management. Looking into procedural definitions of assets is a good way to provide for better automatic adaptability, although the tools available in that space are not as mature as the traditional DCC methods that are easier for artists to master.
Processing the high-density or higher-level abstraction assets into final assets can be a complex and resource-intensive process that is better done with an independent build system that can be optimized taking advantage of multi-core CPUs and number crunching APIs for the GPU, such as CUDA. More and more tool and middleware providers have started to take advantage of this technique to provide faster processing. A good design goal is to progressively move the asset processing into the engine itself, taking advantage of the ever growing available processing power. This provides for a sustainable technology path that will convert compute power into more flexible and efficient content creation.
2.3 Push or Pull Pipeline Model
It is really tempting to directly modify an intermediate asset without editing the source asset and re-exporting. When pressed for time, it seems like the right thing to do, but most of the time, this is not such a good idea since intermediate assets get overwritten the next time the source is modified and exported.
The ideal solution would be to be able to make changes at any point in the pipeline and make sure that those changes are reported back in the source data. One way to do this is to be able to import the intermediate assets back into the DCC tool so the modifications can be merged back into the source assets. But there is a slew of data in the source assets that is not reproduced in the intermediate data, such as construction history and tool widget layout. Additionally, the intermediate asset may have been processed so that the original form of the data is lost. Therefore, reimporting the intermediate assets should be done with the idea that changes will be merged into the source. Unfortunately, unlike source code development where diff and merge tools are common and allow for concurrent development, detecting and merging changes is not a common feature of DCC tools. This lack of merge tools in the asset workflow has also directly impacted the way content versioning systems are used. When dealing with source code, it is customary for several programmers to make changes to the same source files and use merging tools to consolidate the changes later. But the lack of such tools for content forces the use of lock mechanisms, allowing only one person to make changes to a given source asset at a time. This in turn has forced the source assets to be split into a large number of files in order to enable concurrent development to take place, because if all the data was in a single file, only one artist would be able to work on the content at a time. This is robust, but not very flexible or efficient.
Traditionally, the asset pipeline has been designed as a push model, in which the user has to create the intermediate assets or final assets by invoking the build process and then load the result into the editor to see what the final result is. An improvement to this model is to use the game engine editor to pull the intermediate or final assets directly from the user interface. An example of this technique is implemented in the latest Torque 3D asset pipeline [18], where the game engine editor is actively listening to changes in the intermediate or final assets and automatically updates the content in the editor upon external changes. Another similar idea is described in The All-Important Import Pipeline [19], where the game engine editor provides the user with direct loading of the intermediate assets and automatically invokes the build process to create the final assets on demand. This enables the user to pull the intermediate assets directly, rather than having to invoke the build process manually. Those two ideas can be combined to listen for intermediate asset changes and automatically invoke the build process. This mechanism can be combined with the fast path loading mechanism to provide a better interactive WYSIWYG iterative process, while a background process creates the final assets and automatically replaces the "slow" content with "optimized" content whenever it is available, transparently to the user.
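Listening for intermediate asset changes and invoking the build automatically can be approximated with simple polling of file modification times. The sketch below is one possible approach and not how Torque 3D or any other engine implements it; the class name AssetWatcher and the on_change callback are hypothetical.

```python
import os


class AssetWatcher:
    """Poll a directory and call on_change(path) for new or modified files."""

    def __init__(self, directory, on_change):
        self.directory = directory
        self.on_change = on_change  # e.g. a function that invokes the build
        self.seen = {}              # path -> last observed modification time

    def poll(self):
        """Check the directory once; return the paths that were reported."""
        changed = []
        for name in sorted(os.listdir(self.directory)):
            path = os.path.join(self.directory, name)
            if not os.path.isfile(path):
                continue
            stamp = os.stat(path).st_mtime_ns
            if self.seen.get(path) != stamp:
                self.seen[path] = stamp
                changed.append(path)
        for path in changed:
            self.on_change(path)
        return changed
```

An editor would call poll() periodically, for instance once per frame or from a background thread, and use the callback to reload the asset through the fast path while the optimized build runs in the background. Operating systems also offer native change notifications, which avoid polling but are platform specific.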
Those ideas are a step in the right direction, but do not yet address the problem of editing intermediate assets when the source asset should in fact be modified. The asset pipeline design illustrated in Figure 2.3 adds a Remote Control line where the user can select any content in the game engine editor, and provided that the intermediate asset stores the source asset dependency and associated DCC tool, the source asset editing tool is automatically launched and provided to the user so it can be changed and re-exported. Combined with the previous method where the engine is listening to intermediate asset changes and automatically invoking the build process, this provides for an efficient, flexible, and robust design of the asset pipeline. The asset pipeline is viewed as a pull model by the user. Intermediate assets are pulled into the editor, for example, using an asset browser that can provide a quick preview of the assets categorized by type. The editor can automatically create the manifest and automatically build the dependency graph for the build by following the dependencies stored in the intermediate assets. The editor can be used to select one or several assets and invoke an edit command that will look into the intermediate asset information, or the user preferences, and invoke the preferred DCC tool to edit the correct source asset. The editor can also directly invoke the version control system to appropriately check out a local copy and check back in modified assets. A collection of generic models can be provided to serve as placeholders that can be inserted in the current game level for assets to be created later, enabling a task list for the art team to be generated automatically.
Figure 2.3: Game engine editor in control of the asset pipeline.
2.4 COLLADA, A Standard Intermediate Language
During SIGGRAPH 2004 [20], COLLADA was introduced as the first open Digital Asset Exchange (or .dae) specification targeting game developers. This initial specification was provided to the public after a year of work. Exactly a year earlier, during SIGGRAPH 2003, Sony Computer Entertainment was able to create a working group composed of key companies in the field to attempt the creation of a common intermediate language that would provide everything needed to represent the assets for advanced platforms such as the PlayStation 3. Companies such as Alias, Discreet, and Softimage that were competitors agreed to leave their weapons at home and help with the design and prototyping. Game developers working at Digital Eclipse, Electronic Arts, Ubisoft, and Vicarious Visions liked the idea and offered to help.
Several iterations of the specification were completed, and thanks to the first developers that took the challenge to help resolve many of the issues in real use cases, the specification finally got to a stable state. In July 2005, the version 1.4 specification [21] was published as an open standard by the Khronos Group [22], the same group managing the OpenGL, OpenGL ES, OpenCL, and other well-known graphics standard specifications. The version 1.4 specification had minor fixes made, providing the 1.4.1 specification which, at the time of this publication, is the most popular implementation available.
During SIGGRAPH 2008, version 1.5 of the COLLADA specification was introduced [23], adding features from the CAD and GIS worlds such as inverse kinematics (IK), boundary representations (b-rep), geographic location, and mathematical representation of constraints using the MathML descriptive language. The version 1.4.x and 1.5.x specifications are not fully backwards compatible, which is not an issue since intermediate assets are to be re-exported from the source assets anyway. As of today, the version 1.4.x format is the format supported by most applications, while version 1.5 support is limited to a few applications, mostly in the CAD space. Versions 1.4.x and 1.5.x are to exist and be maintained in parallel. The new features introduced in version 1.5 will most likely never find their way into the version 1.4.x specification, though, so more and more tools are likely to provide both implementations. Loading version 1.4 or 1.5 should be transparent from the user's perspective anyway, since the document contains information about which version it is encoded with.
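Since the document itself states which version it is encoded with, a loader can dispatch on that information before parsing anything else. The sketch below reads the version attribute of the root COLLADA element using Python's standard library; the loader names returned by select_loader are placeholders, not part of any real COLLADA library.

```python
import xml.etree.ElementTree as ET

# Placeholder loader names keyed by major.minor version (hypothetical).
LOADERS = {"1.4": "loader_1_4", "1.5": "loader_1_5"}


def collada_version(xml_text):
    """Return the version attribute of the root <COLLADA> element."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{uri}name"; keep only the name.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "COLLADA":
        raise ValueError("not a COLLADA document")
    return root.get("version")


def select_loader(xml_text):
    """Pick a loader from the major.minor part of the document version."""
    version = collada_version(xml_text)
    if version is None:
        raise ValueError("COLLADA document has no version attribute")
    major_minor = ".".join(version.split(".")[:2])
    if major_minor not in LOADERS:
        raise ValueError("unsupported COLLADA version: " + version)
    return LOADERS[major_minor]
```

With this kind of dispatch, a tool that supports both specification branches can hide the difference from the user entirely.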
Since COLLADA is an open standard, many companies have been providing implementations. COLLADA was designed from the start to be a lossless intermediate language that applications can export and import. It is important for COLLADA to be a language, and in order to be useful, a language needs to be both spoken and understood. Also, to be able to validate and test an implementation, it is necessary to test an application with valid documents to load and then export in order to compare with the original data set to see if any data was lost during the process. This is the foundation of the official Khronos COLLADA conformance test that is available to Khronos Adopter members and provides a validation and certification process in order to provide the end-user with the assurance of a good quality implementation. The applications that have passed the test are authorized to display the conformance badge logo. Setting up an effective independent conformance test is no easy task, so before it is available and enforced there is some level of incompatibility due to the different possible human interpretations of the specification, but nothing that cannot be fixed by the end-user.
Since COLLADA provides for both import and export functionality, and since it is available for many tools, it has also been popular as an interchange format. The goal of an interchange format is to enable source assets to be transferred from one DCC tool to another DCC tool, which is definitely not the purpose of the asset pipeline. It turns out that COLLADA is quite good as an interchange format, and has been providing free, faster, and more accurate conversion of data between tools than some of the existing solutions.
The COLLADA 1.4.x feature list includes geometry, animation, skinning, morphing, materials, shaders, cameras, (rigid body) physics, and a transformation hierarchy. It is built upon the XML (Extensible Markup Language) standard for encoding documents electronically, so all COLLADA documents can be managed by any of the commercial or open-source XML tools. COLLADA is defined using the XML-Schema language [24] so that the COLLADA schema can be used by tools to validate the document or automatically create APIs in various languages as well as editing and presentation tools.
COLLADA is extensible in a very structured way. It is mandatory for an intermediate format to be extensible because the asset pipeline will need to encode data specific to the game engine or the game itself. The problem with extensions is how to design them so that the tools that do not recognize the extension can still import the intermediate assets and be able, when possible, to carry an extension forward. COLLADA provides several places in the schema where objects can be extended using the <technique> and <extra> elements associated with other elements in the specification. Since these elements are not permitted to be placed everywhere in the document, a COLLADA parser can be designed to recognize those extensions and either ignore the content or, better, keep the content in a string to pass the information through. Of course, if the tool recognizes the extension, it should use it! There are two ways to extend an object: either the extension is additional information augmenting the definition of an existing element, or it is an alternative to the common definition of the element. It is up to the developer to decide how to extend. In the case that the extension is made by substitution, it is a requirement to keep a common definition available that can be used by other tools as a placeholder. For example, an engine using specific geometry definitions, such as metaballs [25], can use the <extra> element to store the metaball information and keep a standard mesh geometry to store a generic mesh.
Let's have a look at some design principles that are applied to provide the flexibility necessary to represent data in a common language while still enabling it to be as close as possible to the specific DCC representation as well as enabling internal transformations to move it closer to the engine-specific representation. The remainder of this section is an overview of the COLLADA design principles. For more technically detailed information, the reader can refer to the COLLADA 1.4 and 1.5 specifications available on the Khronos website [22] and on the CD accompanying this book. These specifications and the COLLADA reference book [26] provide more details on the choices made when designing the standard.
The <source> Element
The <source> elements contain the raw data that is used by the assets. Like every other element, a <source> has an id attribute that has to be unique in a valid document. A source contains a one-dimensional array of data of a specific type (ID, name, boolean, floating-point, or integer). Each element can have a name, which has to be a valid XML string of characters, but the name is only used to store information relevant for humans and is never used to reference an object in a document. Floats and integers can be expressed with any number of digits, providing for any level of accuracy needed. Despite popular belief, there is absolutely no loss of accuracy when representing floating-point numbers using decimal digits, provided that enough digits are used: 9 for single precision and 17 for double precision [27]. The arrays themselves contain a count attribute that provides for easier memory allocation.
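The digit counts quoted above can be checked directly. The following C++ sketch writes a single-precision value with 9 significant decimal digits and reads it back. The function names floatToText, textToFloat, and roundTrips are illustrative only and are not part of any COLLADA library, and the sketch assumes that the C library's conversions are correctly rounded, as they are in mainstream implementations.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Formats a single-precision value with 9 significant decimal digits.
std::string floatToText(float value)
{
    char buffer[32];
    std::snprintf(buffer, sizeof(buffer), "%.9g", value);
    return std::string(buffer);
}

// Parses decimal text back into a float.
float textToFloat(const std::string& text)
{
    char* end = nullptr;
    const float value = std::strtof(text.c_str(), &end);
    assert(end != text.c_str()); // at least one character must have been parsed
    return value;
}

// Returns true when the value is unchanged after a trip through decimal text.
bool roundTrips(float value)
{
    return textToFloat(floatToText(value)) == value;
}
```

For double precision the same idea applies with "%.17g" and std::strtod.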
The <technique_common> element provides information about how the source should be accessed by the other elements defined by the specification. The <technique> element is where the extensions can be found; an element can carry extensions from several tools at the same time since each tool provides a profile name for its technique.
Figure 2.4 shows that only one array can be in a <source> element by using a selector, and it shows that the name attribute is optional (by using dotted lines) and the id attribute is mandatory. All this information is stored in the XML schema available on the Khronos website, which can be automatically converted into a drawing or used to validate a document. Listing 2.1 shows what a source looks like in a document and that it is quite easy to parse.
Figure 2.4: Definition of the <source> element.
Listing 2.1: An example <source> element and <accessor> element.
<source id="mesh1-geometry-position">
  <float_array id="mesh1-geometry-position-array" count="24">2518.18754074.5129650.2518.18750....</float_array>
  <technique_common>
    <accessor source="#mesh1-geometry-position-array" count="8" stride="3">
      <param name="X" type="float"/>
      <param name="Y" type="float"/>
      <param name="Z" type="float"/>
    </accessor>
  </technique_common>
</source>
The <accessor> Element
The <accessor> element, shown in Figure 2.5, is where the flexibility is built in. It enables us to organize the arrays in a format that is closer to either the tool or the target format, allowing the build process to convert from one to another. This provides better decoupling from the exporter's point of view since the exporter can export the data as is and then use the <accessor> element to explain how the data should be accessed. The count attribute tells how many elements can be accessed through the <accessor> element, and the offset and stride attributes tell where each element starts in the array, typically creating n-tuples of data. The <accessor> then describes one parameter for each element of the n-tuple, giving it a name and type. An example is shown in Listing 2.1.
Figure 2.5: Definition of the <accessor> element.
If a <param> element is not given a name, that means the corresponding value is to be ignored. So a source array with 3-tuple values could actually be defining only a 2-tuple with one padding value. As you can see, it is possible within the same language to represent a position array in many different ways. The astute reader will notice that the array element is optional in a <source> element, the reason being that an <accessor> element can reference an array stored in another <source> element, making it possible to reuse the same array of data in a different way through several <accessor> elements.
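The way count, offset, stride, and unnamed parameters interact can be sketched in a few lines of C++. The Accessor structure and the readTuple function below are hypothetical in-memory stand-ins, not types from the COLLADA DOM or OpenCOLLADA, and the array is assumed to have been parsed into a std::vector<float> already.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory mirror of an <accessor> element.
struct Accessor
{
    std::size_t count;   // number of tuples that can be read
    std::size_t offset;  // index of the first value in the array
    std::size_t stride;  // number of array values per tuple
    std::vector<std::string> paramNames; // one per <param>; "" means unnamed
};

// Returns tuple `index`, keeping only the values whose <param> has a name.
std::vector<float> readTuple(const std::vector<float>& array,
                             const Accessor& accessor, std::size_t index)
{
    assert(index < accessor.count);
    assert(accessor.paramNames.size() <= accessor.stride);
    const std::size_t first = accessor.offset + index * accessor.stride;
    assert(first + accessor.paramNames.size() <= array.size());

    std::vector<float> tuple;
    for (std::size_t i = 0; i < accessor.paramNames.size(); ++i)
    {
        if (!accessor.paramNames[i].empty())
            tuple.push_back(array[first + i]); // unnamed values are skipped
    }
    return tuple;
}
```

Two Accessor values that differ only in their parameter names can read the same array as 3-tuples or as 2-tuples with one ignored value, which is the reuse described above.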
Geometry and the <mesh> Element
Geometry is most often represented by a <mesh> element, although the <convex_mesh> element and the <spline> element are used for rigid body physics and 2D animation curves, respectively. The COLLADA 1.5 specification adds <brep> to the list of geometry types. A <mesh> element contains a collection of <lines>, <linestrips>, <polygons>, <polylist>, <triangles>, <trifans>, and <tristrips> elements. Most likely, the mesh data found right after the export are polygons, and these are transformed into triangles closer to the end of the build pipeline.
A <mesh> element contains a mandatory <vertices> element, as shown in Figure 2.6. It has one mandatory <input> element that references a <source> element, meaning it uses the <accessor> element in the <source> element to access the data. One very interesting design feature is that there is no mention of the dimensionality of the vertex data in COLLADA. In other words, a vertex can have one, two, three, or any number of dimensions, though most content will be using 3D vertices with X, Y, and Z parameter names since transformations and positions are limited to 4D homogeneous coordinate space in the <scene> <node> element. The <input> element associates one semantic with a source. Naturally, the one mandatory input is for the semantic POSITION, which provides the position (e.g., x, xy, xyz, xyzw) of all the vertices used in the mesh primitives. A <vertices> element can contain as many <input> elements as necessary to attach additional data associated with each vertex. For example, it is customary to have a NORMAL stored per vertex, and it is also common in older systems to have a COLOR per vertex.
Figure 2.6: Definition of the <vertices> element.
Referring to the <triangles> element in Figure 2.7 as an example of a <mesh> element sub-object, one can see it is also composed of a set of <input> elements that store the data associated with each primitive as opposed to each vertex as previously done in the <vertices> element. One of the <input> elements defines the semantic VERTEX, which references by index the POSITION defined in the <mesh> <vertices> elements used for each primitive. Obviously, there are three vertices per element in a triangle list. The <p> element is therefore composed of 3 × P × N indexes, where P is the count of primitives and N is the number of <input> elements defined in the primitive list (assuming each <input> uses its own offset).
Figure 2.7: Definition of the <triangles> element.
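To make the layout of the <p> element concrete, the sketch below looks up one index in a triangle list. Each of the three corners of every triangle stores one index per <input>, so the sketch assumes that the <p> array holds 3 × P × N values and that every <input> uses its own offset. TriangleList and indexFor are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory view of a <triangles> primitive list.
struct TriangleList
{
    std::size_t triangleCount;  // P, the count of primitives
    std::size_t inputCount;     // N, one index per <input> (assumed)
    std::vector<unsigned> p;    // contents of the <p> element
};

// Returns the index stored for one <input> at one corner of one triangle.
unsigned indexFor(const TriangleList& list, std::size_t triangle,
                  std::size_t corner, std::size_t input)
{
    assert(triangle < list.triangleCount);
    assert(corner < 3);
    assert(input < list.inputCount);
    assert(list.p.size() == list.triangleCount * 3 * list.inputCount);
    return list.p[(triangle * 3 + corner) * list.inputCount + input];
}
```

With two inputs (say VERTEX and a per-primitive NORMAL), input 0 selects the position index and input 1 selects the normal index for the same corner.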
Conclusion
To conclude this overview of the design of an intermediate asset format and the choices made for the COLLADA standard, it is worth mentioning a few additional high-level concepts.
The most important and difficult design concept to keep in mind when designing an intermediate asset format is to avoid, as much as possible, tying it to a particular implementation or run-time. In other words, the data should be self-describing so it can be transformed and used with any existing or future run-time. This is a really difficult design goal, and it's what occupied most of the design meetings in the COLLADA working group, as the first draft of a feature is always close to how it will be used by the run-time or how it is created in the modeler. Making sure that the data is not described relative to one single usage model is very important, as this ensures that the design is not made obsolete as technology rapidly evolves and does not limit creativity by imposing a model that does not fit with yet-to-be-invented usage models. This one design point is the main difference between COLLADA and most other formats. This difference is obvious when comparing with Autodesk's proprietary FBX interchange technology, which is defined entirely through an API and a specific usage model:
"The FBX file format is not documented. Applications use FBX SDK to import scene data from an FBX file or to export scene data to an FBX file." (FBX SDK Programmer's Guide, page 6 [28].)
Another important principle of design is the categorization of elements into <library_xx> element types that help with the organization of the data as well as enable document contents to be separated by type. Equally important is the distinction between the data definition and its utilization through instancing. This enables an element to be used many times without having to repeat its definition, which would cause the size of the intermediate and final assets to bloat tremendously. COLLADA enables some of the values of an element to be modified when instanced through the use of <param> elements, so it is possible to share most of the element definition and save a lot of space while still enabling source data changes to affect all instances, which comes in handy during production.
Last but not least, COLLADA takes advantage of URI technology [29], which enables elements to reference data within other documents for flexible organization, and it takes advantage of many different storage technologies as well. For instance, an external reference URI can be an HTTP request that is interpreted by a web server as a database query, or it can be a simple reference to a file on the local storage device. One main issue that comes with utilizing formats that do not have a good external reference mechanism is that the intermediate assets all need to be grouped in one single document. Such a document can grow to an unmanageable size during the course of the project. Moreover, this means that all the data has to be imported and then exported by all of the tools used in the asset pipeline, which is a major limiting issue.
Chapter 2 - The Game Asset Pipeline Game Engine Gems, Volume One by Eric Lengyel (ed) Jones and Bartlett Publishers © 2011
2.5 OpenCOLLADA
Given the open nature of COLLADA and its description using standard XML technology, there exist many commercial and open source options that can be used to interface with COLLADA documents. Some programming languages such as C# and Python provide libraries that can directly interface with XML data, providing the programmer with a Document Object Model (DOM) [30], a memory representation of the XML hierarchy that can be programmatically accessed. The libraries supplied with C++, the most used language in game engine development, do not provide this capability directly. There are several generic libraries that exist for loading and saving XML content, and there are a few commercial tools that can automatically generate an access API from the XML schema, enabling the creation of an API specific to a particular document type. A COLLADA DOM is available for C++ programmers, and it automatically generates a core API from the schema in addition to providing an API to help manage the objects once in memory. A DOM provides more than a loading and saving interface; it allows for the XML elements to exist in memory so they can be edited directly without having to create a separate representation [31].
More recently, during SIGGRAPH 2009, NetAllied Systems announced the availability of OpenCOLLADA [32], which is the first open source implementation to take advantage of SAX parsing and direct write technology, providing support for both versions 1.4 and 1.5. The project page is located on the web at http://www.opencollada.org/. The 3DS Max and Maya plug-ins built on this technology, as well as the source code for the framework, are available on that web page, and they are included on the CD accompanying this book for convenience. The reader is encouraged to check the web page for the latest updates.
SAX (Simple API for XML) is an alternative model to the DOM for parsing XML documents [33]. The main issue prevalent among most available libraries is that they load all of the XML data and create an in-memory representation of the data before a third copy of the data is made to create the object representation used by the program loading the assets. A SAX parser instead reports each element to the application as it is read, so no intermediate copy of the whole document is needed. The extra-copy problem is very common across all file-loading SDKs regardless of the format itself, and it makes it quite difficult to handle the very large assets that are becoming more and more common.
The same memory management issue exists when exporting content if a library is used to create a complete in-memory representation of the data to be exported before the data is written out. Workstation memory is commonly maxed out when a large model is loaded in a DCC tool, and as the export is invoked, the computer hangs as all memory contents are swapped out to make room for yet another copy of the same data. The same principle can be used when exporting content: the data is written out directly, which avoids the creation of another copy of the content.
The DAE2Ogre sample code shown in Listing 2.2 (and included on the accompanying CD) demonstrates how to take advantage of this technology. The idea is to associate a writer with the reader C++ object, where the reader is the COLLADA SAX parser, and the writer (in this case) produces the Ogre engine-specific format. Since all data is not available in memory at once, it is not possible to follow references and expect to find the data in place. So instead, the SAX parser uses a UniqueId type for referencing. Since the data is passed to the writer in the order that it appears in the COLLADA file, it might not be possible to resolve a reference immediately in the case of forward-referenced data. To solve this problem, it is common to load the file twice. In the first pass, scene graph, material, and other data are gathered and stored. In the second pass, geometry, animation, and other data are handled. A SAX parser is a bit more complex to use than having all the data in memory, but thanks to the availability of the OpenCOLLADA open source framework [34], it is not so difficult. As we have already seen, there is a big advantage in memory usage for such technology, which is mandatory when assets are very large, but it also turns into a performance gain, as we will observe.
Listing 2.2: This is a sample exporter from DAE2Ogre OgreWriter.cpp.
// The printed listing omits the return type; void is assumed here.
void OgreWriter::write()
{
    COLLADASaxFWL::Loader loader;
    COLLADAFW::Root root(&loader, this);

    // load and write scene
    mCurrentRun = SCENEGRAPH_RUN;
    root.loadDocument(mInputFile.toNativePath());

    // if there is no visual scene in the COLLADA file,
    // nothing to export here
    if (mVisualScene)
    {
        SceneGraphWriter sceneGraphWriter(this, *mVisualScene, mLibrayNodesList);
        sceneGraphWriter.write();
    }

    // load and write geometries
    mCurrentRun = GEOMETRY_RUN;
    root.loadDocument(mInputFile.toNativePath());
}
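The two-pass idea can also be shown without the OpenCOLLADA framework. The following self-contained sketch replaces the SAX callbacks with a plain vector of elements in file order. Every name in it (Element, Kind, Run, TwoPassWriter) is hypothetical, and the behavior of skipping geometries that no node instances is an illustrative choice, not something DAE2Ogre is known to do. The first run only gathers which geometries the scene graph instances; the second run can then decide, for each geometry as it streams past, whether anything needs it.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Kind { Geometry, Node };
enum class Run { SceneGraph, Geometry };

// One element as it streams past in file order (stand-in for a SAX callback).
struct Element
{
    Kind kind;
    std::string id;        // unique id of this element
    std::string reference; // for nodes: id of the instanced geometry
};

struct TwoPassWriter
{
    Run currentRun = Run::SceneGraph;
    std::map<std::string, int> useCount; // geometry id -> instancing nodes
    std::vector<std::string> written;    // geometries emitted in the second run

    void handle(const Element& e)
    {
        if (currentRun == Run::SceneGraph && e.kind == Kind::Node)
            ++useCount[e.reference];     // remember the forward reference
        else if (currentRun == Run::Geometry && e.kind == Kind::Geometry &&
                 useCount.count(e.id) > 0)
            written.push_back(e.id);     // the reference can now be resolved
    }

    void write(const std::vector<Element>& stream)
    {
        useCount.clear();
        written.clear();
        currentRun = Run::SceneGraph;
        for (const Element& e : stream) handle(e);
        currentRun = Run::Geometry;
        for (const Element& e : stream) handle(e);
    }
};
```

Only the small useCount map survives between the two runs, so memory use stays independent of the size of the geometry data.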
Tables 2.1 and 2.2 show the duration and memory consumption of an import operation and an export operation for a large scene in 3DS Max. The tables compare the open-source OpenCOLLADA technology and the Feeling Software open-source implementation using the DOM/intermediate memory model library FCollada [35]. Interestingly, the difference in performance between these two COLLADA implementations is observable regardless of the exact format used for storage, and the reader is encouraged to do some performance tests on large data sets to improve the efficiency of their asset pipeline, if needed.
Table 2.1: Import into 3DS Max using OpenCOLLADA for Max and Feeling's ColladaMax.
Boom.dae, 116 MB, one mesh                 OpenCOLLADA   ColladaMax
Time used for import                       3.8 s         32.5 s
Max memory consumption during import       752 MB        784 MB
Memory consumption after import            444 MB        476 MB
Memory consumption after deleting scene    284 MB        332 MB
Table 2.2: Export from 3DS Max using OpenCOLLADA for Max and Feeling's ColladaMax.
Boom.max, 29 MB, one mesh                  OpenCOLLADA   ColladaMax
Time used for export                       3.5 s         46.3 s
Max memory consumption during export       438 MB        623 MB
Memory consumption after export            418 MB        418 MB
2.6 User Content
Modding is a computer game community slang expression that is derived from the verb "modify", used particularly with regard to creating new or altered content. Modding was once regarded as a fringe activity, but is now encouraged since it extends the shelf life of games. Tools are provided to help the gaming community create additional content that requires the purchase of the game itself, which provides additional content and revenue at no additional cost for the developers. In fact, strong communities develop and provide strong viral marketing, thereby creating additional stickiness and loyalty to the game developer. Content created by end users is referred to as user content or as player content.
In order for end users to create content, the game developer has to provide access to a simplified or more robust version of the game editor, scripting, and content pipeline. Since end users are not likely to have access to expensive DCC tools, it is important to provide an asset pipeline that can also take advantage of free tools such as Blender [36], Google SketchUp [37], or XSI Mod Tool [38]. Crymod is an example of a successful modding community for the Crytek games, which has its own portal on the web at http://www.crymod.com/. Crytek has partnered with Softimage (now owned by Autodesk) to offer a specialized version of their COLLADA exporter that provides a free-to-use professional-grade tool with an integrated asset pipeline to enhance the creation of high-quality user content. The modified exporter uses specific name semantics that are recognized by the Crytek engine and used to connect the created content to their physics and other game-specific entities. It should be noted that this asset pipeline (Mod Tool to COLLADA to CryEngine) was not used by Crytek to develop their own game, but was designed specifically for the modding community. But now that this tool chain has been developed, it has also found internal usage.
More and more user content can be found online in 3D content repositories. Google 3D Warehouse has been offering the capability for SketchUp users to upload content to this repository, allowing anyone to search and download the content in the COLLADA format. With the introduction of SketchUp 7.1 features providing free import and export of COLLADA, it is now possible for content created in any other tool to be uploaded into the warehouse to be shared with other users. Since the warehouse is connected to Google Earth, there are many models of existing buildings and an ever-increasing variety of content. Taking advantage of this content in the prototyping phase of a game is something that should be considered since it is a good shortcut for developing the gameplay before replacing the content with the real assets later on. Another user content website has taken a different approach: www.3dvia.com (a Dassault Systèmes company) enables users to upload content encoded in many source formats and automatically runs conversion tools on the server so that any uploaded content is made available in both 3DXML (a Dassault proprietary format) and COLLADA. At the time of this writing, more than 150,000 users and 15,000 models are active within this 3D user content community.
Some games go even further in taking advantage of end-user creativity. Spore, a game from Maxis/Electronic Arts, is a good example in which the user is provided content creation tools to make their own creatures, vehicles, and buildings to use in the game. The latest edition of Spore extends the principle to the creation of adventures, or mini-games. The motto of this type of game is that "the fun is in the tool", and Spore users had a lot of fun, as it is reported that more than three million creatures have been created already. During SIGGRAPH 2009, Will Wright, creator of The Sims and Spore, delivered a keynote [39] in which he announced an additional step in favor of user content. With the latest patch for Spore, users can not only create content that enriches the game for everyone else, but can also export creatures from the game into the COLLADA format [40]. This small change opens up a world of possibilities for end users. The creatures are exported with their skeleton and skin, as well as diffuse, specular, and bump maps. Already, many end users have freed their creatures from the Spore engine and used sophisticated rendering and materials to embellish their creations, which they then share with others. (For example, see the "Majestic Dragon" on the accompanying CD.) More sophisticated users are creating animations that have been posted on YouTube. These advanced users are now educating other users and trying out all the tools that can import COLLADA models. In the future, it is likely that short animated features will be created by end users, but even more exciting will be the intersection of different gaming communities. It is possible that there are already Spore creatures fighting inside a CryMod!
The Future
Pandora's box is open, and there is no turning back. User content is growing, and the need for an easy-to-use asset pipeline for both end users and professional content developers alike is growing. 3D is making its way toward becoming a mainstream medium, just as audio and video in digital form are now commonly produced and consumed. It is expected that the need for a better asset pipeline will grow as 3D becomes more pervasive. In particular, the consumer availability of native 3D display TVs and monitors [41], advanced shader-capable 3D accelerators in mobile devices [42], and native hardware-accelerated 3D rendering inside web browsers [43, 44] are also pushing the envelope.
References
[1] Subversion. http://subversion.tigris.org/
[2] Perforce. http://www.perforce.com/
[3] Ben Carter. The Game Asset Pipeline. Charles River Media, 2004.
[4] Microsoft. XNA Game Studio 3.1. 2009. http://msdn.microsoft.com/en-us/library/bb203887.aspx
[5] Rémi Arnaud. COLLADA for XNA forum discussion and source code. 2008. https://collada.org/public_forum/viewtopic.php?f=13&t=651 and https://collada.org/public_forum/viewtopic.php?f=13&t=676
[6] W3C. Extensible Markup Language (XML) 1.0, 5th ed. November 26, 2008. http://www.w3.org/TR/REC-xml/
[7] Sony Computer Entertainment. "COLLADA Refinery". https://collada.org/mediawiki/index.php/COLLADA_Refinery
[8] "Image file types". http://www.fileinfo.com/filetypes/image
[9] Industrial Light & Magic. OpenEXR. http://www.openexr.com/
[10] Microsoft. "D3DX10 image format". http://msdn.microsoft.com/en-us/library/ee416748%28VS.85%29.aspx
[11] Microsoft. "DDS image format". http://msdn.microsoft.com/en-us/library/ee418141%28VS.85%29.aspx
[12] Portable Network Graphics. "An Open, Extensible Image Format with Lossless Compression". http://www.libpng.org/pub/png/
[13] IBM. "Standards and Specs: The Interchange File Format (IFF)". 1985. http://www.ibm.com/developerworks/power/library/pa-spec16/
[14] Microsoft Windows Bitmap Format. http://www.fileformat.info/format/bmp/spec/e27073c25463436f8a64fa789c886d9c/view.htm
[15] Adobe. "Open Source Generic Image Library (GIL)". http://opensource.adobe.com/wiki/display/gil/Generic+Image+Library
[16] Rémi Arnaud and Kathleen Maher. "COLLADA: Content Development Using an Open Standard". Game Developer Magazine, May 2007.
[17] Noel Llopis. "Optimizing the Content Pipeline". Game Developer Magazine, April 2004.
[18] GarageGames. "Effortless Art Pipeline using COLLADA". http://www.garagegames.com/products/torque-3d#feature-pipeline
[19] Rod Green. "The All-Important Import Pipeline". Game Developer Magazine, April 2009.
[20] Mark Barnes and Rémi Arnaud. "SIGGRAPH 2004 COLLADA Tech Talk". 2004. http://www.collada.org/public_forum/files/COLLADASiggraphTechTalkWebQuality.pdf
[21] Sony Computer Entertainment. "COLLADA Approved by Khronos Group as Open Standard". July 29, 2005. http://www.scei.co.jp/corporate/release/pdf/050729e.pdf
[22] The Khronos Group. "COLLADA - 3D Asset Exchange Schema". http://khronos.org/collada/
[23] Khronos Group. "Khronos Releases COLLADA 1.5.0 Specification with New Automation, Kinematics, and Geospatial Functionality". August 5, 2008. http://www.khronos.org/news/press/releases/khronos_releases_collada_150_specification_with_new_automation_kinematics_a/
[24] "The W3C XML Schema". 2001. http://www.w3.org/XML/Schema
[25] "Metaballs". 1999. http://www.siggraph.org/education/materials/HyperGraph/modeling/metaballs/metaballs.htm
[26] Rémi Arnaud and Mark Barnes. COLLADA: Sailing the Gulf of 3D Digital Content Creation. AK Peters, 2006.
[27] David Goldberg. "What Every Computer Scientist Should Know About Floating-Point Arithmetic". 1991. http://docs.sun.com/source/806-3568/ncg_goldberg.html#812
[28] Autodesk. FBX SDK Programmer's Guide, 2009. http://images.autodesk.com/adsk/files/fbx_sdk_programmers_guide_2010_2.pdf
[29] W3C. "Uniform Resource Identifier (URI): RFC 3986". 2005. http://www.ietf.org/rfc/rfc3986.txt
[30] The XML Document Object Model from W3C, 2005. http://www.w3.org/DOM/
[31] COLLADA wiki. "COLLADA DOM Portal". https://collada.org/mediawiki/index.php/Portal:COLLADA_DOM
[32] Khronos Group. "The Khronos Group Announces Significant COLLADA Momentum at SIGGRAPH 2009". 2009. http://www.blendernation.com/the-khronos-group-announces-significant-collada-momentum-at-Siggraph2009/
[33] Wikipedia. "Simple API for XML". http://en.wikipedia.org/wiki/Simple_API_for_XML
[34] NetAllied Systems GmbH. "OpenCOLLADA SDK". http://www.opencollada.org/faq.html
[35] Feeling Software. COLLADA Support. http://www.feelingsoftware.com/en_US/3D-collada-tools/collada-tools.html
[36] Blender Foundation. Blender. http://www.blender.org/
[37] Google. Google SketchUp. http://sketchup.google.com/
[38] Autodesk. "Autodesk Softimage Mod Tool". http://usa.autodesk.com/adsk/servlet/pc/item?id=13571257&siteID=123112
[39] Stephen Jacobs. "SIGGRAPH: Wright Talks Perception And 'Entertaining The Hive Mind'". Gamasutra.com, August 6, 2009. http://www.gamasutra.com/php-bin/news_index.php?story=24733
[40] Dan Moskowitz. "How To Export Spore Creatures to Maya". http://forum.spore.com/jforum/posts/list/37155.page
[41] Marguerite Reardon. "3D is coming to a living room near you". CES 2009. http://ces.cnet.com/8301-19167_1-10142957-100.html
[42] Apple. "OpenGL ES on iPhone OS". http://developer.apple.com/iphone/library/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/OpenGLESontheiPhone/OpenGLESontheiPhone.html#//apple_ref/doc/uid/TP40008793-CH101-SW1
[43] Google. "O3D API". http://code.google.com/apis/o3d/
[44] Khronos Group. "Khronos Details WebGL Initiative to Bring Hardware-Accelerated 3D Graphics to the Internet". http://www.khronos.org/news/press/releases/khronos-webgl-initiative-hardware-accelerated-3d-graphics-internet/
Chapter 3 - Volumetric Representation of Virtual Environments Game Engine Gems, Volume One by Eric Lengyel (ed) Jones and Bartlett Publishers © 2011
Chapter 3: Volumetric Representation of Virtual Environments
David Williams, Thermite3D
3.1 Introduction
The use of heightmaps as a mechanism for representing terrains is well established within computer graphics and gaming. Heightmaps are a conceptually simple representation, easy to visualize, and simple to create. Furthermore, there is a large body of research [19] into the manipulation and rendering of such data. However, there are also serious limitations that result from this rather simplistic representation, such as the inability to support caves and overhangs.
In this article, we take the concept of two-dimensional heightmaps and show how they can be extended to fully three-dimensional volumes. This is a representation that naturally and consistently handles the kind of geological structures mentioned previously. It also allows easy real-time modification, and as such can be used to create powerful terrain editors or unique gameplay opportunities.
To this end, the concept of volumetric environments has been successfully employed in several commercial games to date. The game Worms 3D found it to be an appropriate way to bring the highly destructible but two-dimensional levels from the earlier Worms games into the third dimension [1]. The Crysis sandbox editor utilized voxels during terrain modeling, and these were turned into traditional static meshes for runtime use. And the upcoming game Miner Wars uses the concept to allow players to dig through asteroids in real time.
Throughout this article we will consider the modeling and rendering of complex terrains to be the main application of the described technology. Nonetheless, we do believe the technology can have direct application to manmade or otherwise artificial environments if appropriate gameplay mechanics and artistic styles are in place. In fact, the use of voxels for non-terrain environments has been a core research area of our own experimental Thermite3D game engine [15], upon which this gem is largely based.
Figure 3.1 shows a variety of environments that are represented using the volumetric approach described in this article.
Figure 3.1: Volumetric representations can be used for many different types of environments. In (a) we see a complex terrain with two primary levels and numerous overhangs [5]. (Image courtesy of Thomas Schöps.) The Earth in (b) has been cut away to illustrate that the interior is also modeled [15]. Manmade structures with many different materials (c) can also be represented [15], while (d) shows a mining ship inside an asteroid, destroying it in real time [8]. (Image courtesy of Keen Software House.)
[1] Although based on the concepts described in this article, Worms actually uses a much lower resolution volume than that which we present here but allows for lattice deformations to achieve its desired artistic style.
3.2 Overview
The core data structure within our system is that of the volume. A volume is a regular three-dimensional grid of values, each of which is known as a voxel. Conceptually, this is analogous to the way a bitmap image is a regular two-dimensional grid of pixels. It is also possible to think of a volume as consisting of a number of two-dimensional slices stacked on top of each other. This is shown in Figure 3.2(a). We define the concept of a cell as being a group of eight neighboring voxels that form a cube, as illustrated in Figure 3.2(b).
Figure 3.2: (a) This volume consists of an 8 × 8 × 8 grid of voxels, though real volumes are considerably larger. The corner is cut away to show how voxels also model the interior of an object. (b) Each group of 2 × 2 × 2 voxels forms a cell. Note that voxels and volumes are the only types that are stored explicitly within our system, as edges and cells are implicit constructs that we build by looking at a voxel's neighbors.
The volume is sized and positioned so as to cover the entire virtual environment that we wish to represent. Each voxel then encodes a representation of what exists at its location. The exact data that constitutes a voxel will be discussed shortly, but for now it can be considered to be a single bit indicating whether that location is solid material or empty space. Naturally this representation is very easy to modify in real time, because adding or removing material simply becomes a case of setting the values of the voxels. This is significantly easier than the complex CSG operations that might be required for other representations.
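As a minimal illustration of how simple such edits are, the following sketch stores one solid/empty flag per voxel in a flat array. The SimpleVolume class and its member names are assumptions made for this example and are not taken from the Thermite3D engine; a byte per voxel is used instead of a packed bit purely for clarity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A naive volume: one byte per voxel, 0 for empty space and 1 for solid.
class SimpleVolume
{
public:
    SimpleVolume(std::size_t width, std::size_t height, std::size_t depth)
        : mWidth(width), mHeight(height), mDepth(depth),
          mVoxels(width * height * depth, std::uint8_t(0))
    {
    }

    bool isSolid(std::size_t x, std::size_t y, std::size_t z) const
    {
        return mVoxels[indexOf(x, y, z)] != 0;
    }

    // Adding or removing material is a single write.
    void setSolid(std::size_t x, std::size_t y, std::size_t z, bool solid)
    {
        mVoxels[indexOf(x, y, z)] = solid ? 1 : 0;
    }

    std::size_t solidCount() const
    {
        std::size_t count = 0;
        for (std::uint8_t voxel : mVoxels)
            if (voxel != 0) ++count;
        return count;
    }

private:
    // Slices of width * height voxels are stacked along z.
    std::size_t indexOf(std::size_t x, std::size_t y, std::size_t z) const
    {
        assert(x < mWidth && y < mHeight && z < mDepth);
        return x + mWidth * (y + mHeight * z);
    }

    std::size_t mWidth, mHeight, mDepth;
    std::vector<std::uint8_t> mVoxels;
};
```

Digging a tunnel or depositing material is then just a loop of setSolid calls over the affected region.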
While the direct rendering of such volumes is an active research area [12], modern GPU hardware is highly tuned to the efficient rendering of triangle meshes. Therefore, although the volumetric representation is very useful for editing and deforming the environment, it is desirable to transform it into a triangle mesh for the purpose of visualization.
This process is known as surface extraction, and there are a number of algorithms that are able to perform it. The Marching Cubes algorithm [7] is one of the earliest and most widely used; it is popular due to its simplicity, speed of execution, and good locality of reference. Further developments have addressed ambiguities in the original algorithm, worked around (now expired) patent issues, or provided adaptive triangulation.
The Marching Cubes algorithm operates on a single cell at a time. For each corner of the cell, it classifies the voxel as being either inside or outside of the surface according to its value. With eight corners per cell, this gives 2^8 = 256 possible combinations, which can be grouped into the 18 equivalence classes illustrated in Figure 3.3. Each class represents the set of rotationally symmetric cases, and some classes include inverses as well. The last three classes in Figure 3.3 are additions to the original Marching Cubes algorithm that must be used instead of the inverted triangles in order to avoid holes.
Figure 3.3: The set of triangles generated by the Marching Cubes algorithm for each of the 18 possible cell configurations. The numbers indicate how many times each configuration occurs. Solid circles represent voxels containing solid material, while hollow circles represent voxels containing empty space. In most cases the inverse configuration generates the same set of triangles, with the exception of the last three cases (which are inverses of earlier cases but with different triangles to avoid holes).
A lookup table is used to map the combination of voxels to a particular set of triangles that locally represents the surface. This process is applied to every cell in the volume (though cells whose eight voxels are all inside or all outside generate no triangles and can be trivially skipped) in order to reconstruct the complete surface. We will not be describing this process in much detail as it is already well covered by existing literature and numerous implementations are available online.
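The classification step that feeds the lookup table is small enough to sketch here. The code packs the inside/outside state of a cell's eight corners into one byte, giving the 256 possible table indices. The function names are hypothetical, and the corner numbering is an assumption: any numbering works as long as it matches the one used to build the triangle table.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs the solid/empty state of the eight cell corners into one table index.
// Bit i is set when corner i is solid.
std::uint8_t cellCaseIndex(const std::array<bool, 8>& cornerIsSolid)
{
    std::uint8_t index = 0;
    for (int corner = 0; corner < 8; ++corner)
    {
        if (cornerIsSolid[corner])
            index |= static_cast<std::uint8_t>(1u << corner);
    }
    return index;
}

// Recovers the state of one corner from a table index.
bool cornerIsSolid(std::uint8_t caseIndex, int corner)
{
    assert(corner >= 0 && corner < 8);
    return ((caseIndex >> corner) & 1u) != 0;
}

// Cells that are entirely empty (0) or entirely solid (255) contain no surface.
bool cellHasNoSurface(std::uint8_t caseIndex)
{
    return caseIndex == 0 || caseIndex == 255;
}
```

An extraction loop would compute this index for each cell, skip the two trivial cases, and use the remaining indices to select triangles from the table.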
3.3 Data Structures
Having developed an understanding of the core principles, it is now possible to think carefully about the data structures involved in our volumetric representation.
Although each voxel can be represented by a simple inside/outside bit as described previously, in practice this does not provide much flexibility. Instead, our engine stores an 8-bit material ID for each voxel. A value of 0 represents empty space, while each of the 255 non-zero values represents a different material (rock, soil, wood, etc.). Naturally each of these materials will have a different visual appearance, but it is possible to attach different physical properties to them as well (perhaps some cannot be destroyed in-game, for example). This material ID will later be passed down the graphics pipeline for use in shading calculations.
Additionally, the use of a simple in/out decision when classifying the corners of a cell tends to lead to a mesh with a very jagged appearance. Depending on the application (and certainly in the case of terrain) it can be useful to replace this binary volume with a density field. In this case we assign each voxel a numerical value and we define the surface of our environment to be the set of all points that have a particular isovalue. When running the Marching Cubes algorithm on a given cell we classify each corner as being above or below the isovalue, and use linear interpolation to position any resulting vertices at the correct location along the edge. [2]
Using a density field rather than a binary volume means more control is afforded over the shape of the mesh. By modifying the voxel values such that the isovalue is not exactly halfway between them, the resulting vertex can be pushed closer to one voxel or the other. The consequence of this is that the resulting mesh tends to be a lot smoother.
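The interpolation step can be written down directly. Given the positions and densities of the two voxels at the ends of a cell edge, the vertex is placed where the linearly interpolated density equals the isovalue, that is, at parameter t = (isovalue - d0) / (d1 - d0). The Vec3 type and the function name below are assumptions for this sketch.

```cpp
#include <cassert>

struct Vec3
{
    float x, y, z;
};

// Places a vertex on the edge p0-p1 where the density crosses the isovalue.
// d0 and d1 are the densities at p0 and p1 and must lie on opposite sides
// of the isovalue, which also guarantees that they differ.
Vec3 interpolateVertex(const Vec3& p0, float d0,
                       const Vec3& p1, float d1, float isovalue)
{
    assert(d0 != d1);
    const float t = (isovalue - d0) / (d1 - d0);
    return Vec3{p0.x + t * (p1.x - p0.x),
                p0.y + t * (p1.y - p0.y),
                p0.z + t * (p1.z - p0.z)};
}
```

With densities 0 and 1 and an isovalue of 0.5 the vertex lands at the midpoint of the edge; raising the second density to 2 moves it to one quarter of the way along, which is the "pushing" effect described above.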
Having defined the contents of a voxel we can now create a three-dimensional grid of them to form our volume. A naive approach would be to store a simple three-dimensional array of voxels such that they form a contiguous layout in memory. However, given what we know so far, we can outline some desirable properties that we would like our volume data structure to exhibit:
Compression. Using two bytes per voxel (for material ID and density) means that our volume will occupy 2 × width × height × depth bytes of memory using the simple contiguous approach. This quickly becomes unacceptable for reasonably sized volumes, and so it is useful to instead use a data structure that exploits the high spatial coherence that volumes tend to exhibit.
Fast read access. This is crucial firstly for implementing the surface extraction algorithm efficiently, and also for implementing picking and collision detection directly against the volume (rather than the extracted mesh).
Fast write access. Allowing the real-time modification of the volumes is one of the core requirements of our system. To do this we need to allow fast modification of voxels. While a structure such as an octree is likely to do very well at satisfying our compression requirement, it is likely to have a higher modification overhead as changes may have to be propagated up the tree.
Fast access to neighbors. Accessing a voxel's neighbors is required when running the Marching Cubes algorithm (as this operates on a per-cell basis) and also for computing surface normals directly from the volume data (see Section 3.4). To achieve this we wish to make our data structure cache-friendly, such that voxels that are nearby spatially are also likely to be nearby in memory.
There is a large body of research on storage techniques that meet the requirements above, but we have chosen to use the approach presented by Grimm et al. [1]. Essentially the volume is broken down into a collection of cubic blocks, and the volume is represented as a list of reference-counted pointers to these blocks. See Figure 3.4.
Figure 3.4: The volume with dimensions 8 × 8 × 8 voxels at the top of the figure contains four different material IDs represented by colors (see figure on accompanying CD). It is split into 8 blocks, each of which has dimensions 4 × 4 × 4 voxels. The four top blocks and the two lower left blocks (one of which is hidden at the back) are homogeneous and so can share copies of the actual data.
The reference counts are indicated at the bottom of the figure. Explicitly storing block data for only four out of the eight blocks gives us a memory saving of 50% in this overly simplistic example.
Compression in this system arises because, if two blocks have identical contents, then we are able to have both entries in the block list pointing to the same block data. This occurs frequently with blocks that are completely homogeneous. It is of course also possible that two heterogeneous blocks happen to be identical, but this is sufficiently rare that it is not worth the computational overhead involved in checking for it.
An obvious question is why we bother to store a homogeneous block at all, rather than simply setting a flag to indicate that it is homogeneous and then storing its value. This is a perfectly valid approach and does save a little more memory, but it means that each time a voxel is accessed, we must add some additional logic to determine whether we should follow a pointer to some block data or just use the homogeneous value.
Our system adds an extra layer of indirection when accessing voxels because we must first determine in which block a voxel is located, follow the pointer, and then access the voxel. However, we have also already emphasized the importance of having fast access to the neighboring voxels, and these are also likely to lie within the same block as each other. Hence, we have found it useful to introduce the concept of a volume sampler object which, once pointed at a voxel, will cache the block lookup in order to speed up access to other voxels in its locality.
When writing to voxels there are a couple of scenarios for which we need to watch out. First, the block to which we are writing might currently be shared. We must check the reference count of the block and duplicate the data if necessary. Second, the act of modifying the data may cause the block to become homogeneous and therefore eligible for sharing. This is costly to verify as it potentially involves reading and comparing every voxel in the block. So instead we mark the block as being potentially homogeneous and provide a garbage collection routine that can be called whenever there is spare processing time (e.g., the CPU is stalled waiting on some other task).
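The two write scenarios can be illustrated with a small single-threaded C++ sketch. It is not the engine's implementation: the class name BlockVolume, the 4 × 4 × 4 block size, the use of std::shared_ptr for reference counting, and the table of shared homogeneous blocks are all assumptions chosen to keep the example short.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical names throughout. Blocks are 4x4x4 to keep the example small.
constexpr int kShift = 2;
constexpr int kSide  = 1 << kShift;
constexpr int kMask  = kSide - 1;

using Block = std::vector<std::uint8_t>;   // kSide^3 material IDs

class BlockVolume {
public:
    explicit BlockVolume(int blocksPerSide)
        : n_(static_cast<std::size_t>(blocksPerSide)),
          maybeHomogeneous_(n_ * n_ * n_, false)
    {
        // Every entry in the block list starts out sharing one empty block.
        shared_[0] = std::make_shared<Block>(kSide * kSide * kSide, std::uint8_t{0});
        blocks_.assign(n_ * n_ * n_, shared_[0]);
    }

    std::uint8_t get(int x, int y, int z) const {
        return (*blocks_[blockIndex(x, y, z)])[voxelIndex(x, y, z)];
    }

    void set(int x, int y, int z, std::uint8_t value) {
        const std::size_t b = blockIndex(x, y, z);
        const std::size_t v = voxelIndex(x, y, z);
        if ((*blocks_[b])[v] == value) return;            // nothing changes
        if (blocks_[b].use_count() > 1)                   // shared: duplicate first
            blocks_[b] = std::make_shared<Block>(*blocks_[b]);
        (*blocks_[b])[v] = value;
        maybeHomogeneous_[b] = true;                      // verify later, not now
    }

    // Deferred clean-up: re-share flagged blocks that turned out homogeneous.
    void collectGarbage() {
        for (std::size_t b = 0; b < blocks_.size(); ++b) {
            if (!maybeHomogeneous_[b]) continue;
            maybeHomogeneous_[b] = false;
            const Block& data = *blocks_[b];
            const std::uint8_t first = data[0];
            const bool homogeneous = std::all_of(
                data.begin(), data.end(),
                [first](std::uint8_t m) { return m == first; });
            if (!homogeneous) continue;
            if (!shared_[first]) shared_[first] = blocks_[b];  // becomes the shared copy
            else blocks_[b] = shared_[first];                  // drop the duplicate
        }
    }

    // Reference count of the block that holds voxel (x, y, z).
    long useCount(int x, int y, int z) const {
        return blocks_[blockIndex(x, y, z)].use_count();
    }

private:
    std::size_t blockIndex(int x, int y, int z) const {
        return (static_cast<std::size_t>(z >> kShift) * n_
              + static_cast<std::size_t>(y >> kShift)) * n_
              + static_cast<std::size_t>(x >> kShift);
    }
    static std::size_t voxelIndex(int x, int y, int z) {
        return static_cast<std::size_t>(
            ((z & kMask) << (2 * kShift)) | ((y & kMask) << kShift) | (x & kMask));
    }

    std::size_t n_;                                   // blocks per axis
    std::vector<bool> maybeHomogeneous_;              // one flag per block
    std::vector<std::shared_ptr<Block>> blocks_;      // the block list
    std::array<std::shared_ptr<Block>, 256> shared_;  // one shared block per material
};
```

Because the shared_ table itself holds a reference, a write to any shared homogeneous block always triggers a copy, so the shared data is never modified in place. A multithreaded engine would need more care, since use_count() is only a reliable test when a single thread writes to the volume.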
Lastly, we need to take some care to choose an appropriate block size, and there are several factors that can influence this:
Smaller blocks have a greater chance of being homogeneous, and therefore of being shared. On the other hand, this means there will be a greater number of blocks and so the block list will be longer. The effect this has on memory usage can be seen in Table 3.1.
Larger blocks have a smaller proportion of voxels on the faces of the block. This means the volume sampler is more effective because it is less likely to need to look at neighboring blocks during voxel neighborhood operations.
Blocks are required to have a side length that is a power of two, in order to allow addressing operations to be implemented using bit manipulation.
Blocks should be small enough to fit into the CPU cache.
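The power-of-two addressing and the volume sampler's cached block lookup can be sketched together. The names Volume, Sampler, blockIndex, and voxelOffset are hypothetical, and the layout (x varying fastest) is an assumption; only the idea of shifts, masks, and a cached block pointer comes from the text.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

constexpr int kShift = 5;              // 2^5 = 32 voxels per block side
constexpr int kSide  = 1 << kShift;
constexpr int kMask  = kSide - 1;

struct Volume {
    int blocksPerSide;                                               // blocks along each axis
    std::vector<std::shared_ptr<std::vector<std::uint8_t>>> blocks;  // the block list
};

// Index of the block containing voxel (x, y, z): shifts replace division.
inline std::size_t blockIndex(const Volume& v, int x, int y, int z) {
    const std::size_t n = static_cast<std::size_t>(v.blocksPerSide);
    return (static_cast<std::size_t>(z >> kShift) * n
          + static_cast<std::size_t>(y >> kShift)) * n
          + static_cast<std::size_t>(x >> kShift);
}

// Offset of the voxel inside its block: masks replace the modulo operation.
inline std::size_t voxelOffset(int x, int y, int z) {
    return (static_cast<std::size_t>(z & kMask) << (2 * kShift))
         | (static_cast<std::size_t>(y & kMask) << kShift)
         |  static_cast<std::size_t>(x & kMask);
}

class Sampler {
public:
    explicit Sampler(const Volume& v) : vol_(v) {}

    std::uint8_t at(int x, int y, int z) {
        const std::size_t b = blockIndex(vol_, x, y, z);
        if (b != cachedIndex_) {           // follow the pointer only on a block change
            cachedIndex_ = b;
            cachedData_ = vol_.blocks[b]->data();
            ++blockLookups_;
        }
        return cachedData_[voxelOffset(x, y, z)];
    }

    int blockLookups() const { return blockLookups_; }

private:
    const Volume& vol_;
    std::size_t cachedIndex_ = static_cast<std::size_t>(-1);  // no block cached yet
    const std::uint8_t* cachedData_ = nullptr;
    int blockLookups_ = 0;
};
```

The cached pointer is only valid while the block list is unchanged, so a sampler of this kind would have to be reset after any write that replaces a block.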
Table 3.1: The memory required to store our example volumes varies depending on the block size.
Volume     Dimensions        Uncomp. memory   Block size     No. of blocks
Castle     256 × 256 × 256   16 MB            16 × 16 × 16   4096
Castle     256 × 256 × 256   16 MB            32 × 32 × 32   512
Castle     256 × 256 × 256   16 MB            64 × 64 × 64   64
Mountain   512 × 512 × 256   64 MB            16 × 16 × 16   16384
Mountain   512 × 512 × 256   64 MB            32 × 32 × 32   2048
Mountain   512 × 512 × 256   64 MB            64 × 64 × 64   256
Earth      512 × 512 × 512   128 MB           16 × 16 × 16   32768
Earth      512 × 512 × 512   128 MB           32 × 32 × 32   4096
Earth      512 × 512 × 512   128 MB           64 × 64 × 64   512
Given the above constraints, Grimm et al. found that a block size of 32 × 32 × 32 gave the best overall performance. Our experimental results in Table 3.1 show that we can expect about a 70% compression rate in this scenario.
[2] The current version of our engine does not use a density value (only a material ID) for each voxel. This is because we have been focused upon representing other structures beyond terrain. However, there is a fork of our code base [5] that has had this functionality added.
3.4 Surface Extraction
We have already introduced the principles of the Marching Cubes algorithm (and we suggest the reader consult [7] for a more detailed explanation). In this section we examine some of the practicalities of implementing the algorithm with respect to our specific application. These include:
Allowing the data to be modified in real time, and intelligently regenerating the surface for the modified part without running the algorithm over the entire volume again.
Outputting the resulting surface mesh in a format that is suitable for GPU rendering or for further processing.
To aid with the above, we break our volume down into adjacent and non-overlapping cubic regions. This means that every cell is in only one region, but voxels on the face, edge, or corner of a region also belong to neighboring regions. Each of these regions is assigned separate vertex and index buffers to hold the mesh data corresponding to the surface that is contained in that region. Additionally, each region keeps a dirty flag that indicates whether the triangle data in the buffers matches the current volume data for that region.
Although this may sound similar to the breaking down into blocks discussed earlier, it is important to realize that the two concepts of regions and blocks serve entirely different purposes. Blocks are a data representation used to provide efficient storage and fast access to the voxels, whereas regions are used to restrict the execution of the Marching Cubes algorithm to the parts of the volume that have changed, and also for the purpose of visibility culling.
After the volume is set to its initial value, the Marching Cubes algorithm is executed to update the mesh data for each region, and the dirty flag is cleared. Further attempts to write to the volume result in the regions containing the affected voxels being marked as dirty. A trivial check is performed to ensure that voxels are not being set to their current value. Also, remember that most voxels only belong to a single region, but those on the face, edge, or corner of a region are shared by neighboring regions as well. Hence, modifying a single voxel can cause multiple dirty flags to be set.
Any dirty region will need to have its mesh regenerated, but this should not happen immediately because it is an expensive operation, and it is likely that other nearby voxels (probably in the same region) are also about to be updated as a result of the user's current action. Regeneration should instead be delayed until all volume modifications for the current frame are complete.
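The dirty-flag bookkeeping can be sketched as follows. The class RegionGrid and its methods are hypothetical names, and the rule used for shared voxels (a voxel whose coordinate is an exact multiple of the region side also belongs to the preceding region along that axis) is an assumption that follows from regions partitioning cells rather than voxels.

```cpp
#include <cstddef>
#include <vector>

class RegionGrid {
public:
    RegionGrid(int regionsPerSide, int regionSide)
        : n_(regionsPerSide), side_(regionSide),
          dirty_(static_cast<std::size_t>(regionsPerSide) * regionsPerSide * regionsPerSide,
                 false) {}

    // Call after the voxel at (x, y, z) has actually changed value.
    void markVoxelDirty(int x, int y, int z) {
        int xs[2], ys[2], zs[2];
        const int nx = regionsTouching(x, xs);
        const int ny = regionsTouching(y, ys);
        const int nz = regionsTouching(z, zs);
        for (int k = 0; k < nz; ++k)
            for (int j = 0; j < ny; ++j)
                for (int i = 0; i < nx; ++i)
                    dirty_[index(xs[i], ys[j], zs[k])] = true;
    }

    // Call once per frame, after all modifications: returns the dirty
    // regions and clears their flags so they can be queued for regeneration.
    std::vector<std::size_t> takeDirty() {
        std::vector<std::size_t> result;
        for (std::size_t i = 0; i < dirty_.size(); ++i)
            if (dirty_[i]) { result.push_back(i); dirty_[i] = false; }
        return result;
    }

private:
    // Regions along one axis whose cells use voxel coordinate c (one or two).
    int regionsTouching(int c, int out[2]) const {
        int count = 0;
        const int r = c / side_;
        if (r < n_) out[count++] = r;
        if (c % side_ == 0 && r > 0) out[count++] = r - 1;
        return count;
    }

    std::size_t index(int rx, int ry, int rz) const {
        return (static_cast<std::size_t>(rz) * n_ + ry) * n_ + rx;
    }

    int n_, side_;
    std::vector<bool> dirty_;
};
```

A voxel in the interior of a region sets one flag, one on a face sets two, one on an edge sets four, and one on a corner sets eight.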
As with the block size, there are a number of factors to take into account when choosing an appropriate region size:
Smaller regions mean more regions, resulting in a higher batch count. For each region that contains a surface, we generate at least one batch (possibly more than one batch if we split materials as described in Section 3.5.2). Batch count is one of the limiting factors in the performance of modern GPUs.
Smaller regions offer the opportunity for finer-grained occlusion culling (see Section 3.5).
Smaller regions will typically enclose a modified part of the volume more tightly, meaning fewer cells have to be processed by the Marching Cubes algorithm.
Large regions take longer to regenerate.
Large regions may contain more than 65,536 vertices, which will then require 32-bit indices. This has some memory overhead, and may not be supported on older hardware.
Matching the region size to the block size makes it easier to implement the Marching Cubes algorithm in a cache-friendly manner.
As a guideline, we break our volume down into 8 × 8 × 8 = 512 regions.
In addition to generating the vertex positions, we also require vertex normals to perform shading calculations. There are several known approaches [17] to computing these normals from the mesh data, but these can suffer from mismatches on the region boundaries. An alternative is to generate the normals directly from the volume data by computing the gradient vector.
An approximation of the gradient vector can be found using the central difference operator [12]. For a voxel at integer position (x, y, z), the central difference gradient can be found by examining the density value of neighboring voxels as follows:
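In its standard form, writing d(x, y, z) for the density stored at a voxel (the symbol d is a label chosen here, not taken from the text), the central difference gradient is:

```latex
\nabla d(x, y, z) \approx \frac{1}{2}
\begin{pmatrix}
d(x+1,\, y,\, z) - d(x-1,\, y,\, z) \\
d(x,\, y+1,\, z) - d(x,\, y-1,\, z) \\
d(x,\, y,\, z+1) - d(x,\, y,\, z-1)
\end{pmatrix}
```

The factor of 1/2 is often dropped in practice because the vector is normalized before it is used as a surface normal. Whether the normal is the gradient or its negation depends on whether density increases toward the inside or the outside of the surface.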
If a smoother gradient is required then the Sobel operator [12] may be used instead, but in general the central difference operator will be sufficient for our purpose. The gradient at an arbitrary point in the volume can be found by interpolating the gradients from the corners of the corresponding cell. When finding the gradient at a given vertex position, a one-dimensional interpolation is sufficient because a generated vertex will always lie on a cell edge.
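A minimal C++ sketch of the gradient computation and the one-dimensional interpolation along a cell edge is given below. The DensityFn accessor, the Vec3 type, and both function names are hypothetical, and the sketch assumes that density is higher inside the surface, so the normal is the negated, normalized gradient.

```cpp
#include <cmath>
#include <functional>

struct Vec3 { float x, y, z; };

// Hypothetical accessor: returns the density of the voxel at (x, y, z).
using DensityFn = std::function<float(int, int, int)>;

// Central difference approximation of the density gradient at a voxel.
inline Vec3 centralDifference(const DensityFn& density, int x, int y, int z) {
    return { 0.5f * (density(x + 1, y, z) - density(x - 1, y, z)),
             0.5f * (density(x, y + 1, z) - density(x, y - 1, z)),
             0.5f * (density(x, y, z + 1) - density(x, y, z - 1)) };
}

// Normal for a vertex lying a fraction t along the cell edge from voxel a
// to voxel b. Only the two end-point gradients are needed because the
// vertex always lies on an edge.
inline Vec3 vertexNormal(const DensityFn& density,
                         const int a[3], const int b[3], float t) {
    const Vec3 ga = centralDifference(density, a[0], a[1], a[2]);
    const Vec3 gb = centralDifference(density, b[0], b[1], b[2]);
    Vec3 g = { ga.x + t * (gb.x - ga.x),
               ga.y + t * (gb.y - ga.y),
               ga.z + t * (gb.z - ga.z) };
    const float len = std::sqrt(g.x * g.x + g.y * g.y + g.z * g.z);
    if (len > 0.0f) {
        // Assumption: density rises toward the inside, so negate the gradient.
        g.x /= -len; g.y /= -len; g.z /= -len;
    }
    return g;
}
```

The accessor must be able to answer queries one voxel beyond the cell being processed, which is why fast neighbor access matters for this step.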
3.4.1 Output of the Algorithm
For each region, the algorithm generates a single vertex and index buffer pair. In the case that the region does not contain a surface, the buffers will be empty and need not be uploaded to the GPU. As mentioned, the indices may be either 16 or 32 bits depending on the number of vertices, and the vertices contain the following information:
    struct Vertex
    {
        float position[3];
        float normal[3];
        float materialId;
        float alpha;
    };
The material ID is stored as a float for compatibility with Shader Model 3.0 hardware. If you are targeting a more modern GPU, then you may want to use an integral type for direct use as an index into a texture array (see Section 3.5.1). The alpha value is used to blend between different materials and will be discussed further in Section 3.5.2.
Although we have not yet implemented it, one useful optimization when rendering geometry that is represented by buffers is to adjust the order in which primitives are rendered in order to effectively utilize the GPU vertex cache. Primitives that are close together spatially should also be close together in the buffer, such that when rendering there is an increased likelihood that a cached vertex can be reused rather than the vertex shader being executed.
Both Nvidia and ATI provide offline tools for this job, but we require something that runs quickly on meshes generated at run time. There are some possible candidates for this [16], but it remains to be seen if their performance is sufficient for our application.
3.4.2 Level of Detail
Another important technique for improving the rendering performance of our system is to implement some kind of level-of-detail (LOD) mechanism such that regions that are distant from the camera are represented with fewer triangles than those that are close to it.
Within our engine, we implemented LOD directly on the volume, rather than on the generated triangle surfaces. That is, we build a mip pyramid where each level has half the width, height, and depth of the level below it. (This is conceptually similar to texture mipmaps used on graphics cards.) We keep the same number of regions such that each region also has half the dimensions of its predecessor in the pyramid.
We then run the surface extraction on the appropriate mip level based on the distance of the region from the camera. As the camera moves around, regions can have their surface regenerated at a different resolution, with the previous surface either being discarded or cached for possible later use. The threading system that handles this automatic regeneration in the background is discussed in Section 3.4.3.
To actually generate the mip pyramid it is necessary to be able to derive the value of a voxel from the eight voxels in the preceding mip level. If we are storing density components for our voxels then these can simply be averaged. If we are storing a material ID then it does not make sense to average these (the combination of two different materials should be one of those two, rather than a third material). In our system we take the minimum, although the most frequently occurring value (the statistical mode) could also be chosen.
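The construction of the pyramid for a material-ID volume can be sketched as below, taking the minimum of each group of eight voxels as described. The MaterialVolume type and the function names are hypothetical, and the sketch assumes a cubic volume whose side is a power of two, stored as a flat array with x varying fastest.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MaterialVolume {
    int side;                        // voxels per axis (cubic, power of two)
    std::vector<std::uint8_t> ids;   // side^3 material IDs, x varies fastest

    std::uint8_t at(int x, int y, int z) const {
        return ids[(static_cast<std::size_t>(z) * side + y) * side + x];
    }
};

// Builds the next mip level: each voxel is the minimum of the 2x2x2 group
// of voxels it covers in the source level.
inline MaterialVolume downsample(const MaterialVolume& src) {
    MaterialVolume dst{ src.side / 2, {} };
    dst.ids.reserve(static_cast<std::size_t>(dst.side) * dst.side * dst.side);
    for (int z = 0; z < dst.side; ++z)
        for (int y = 0; y < dst.side; ++y)
            for (int x = 0; x < dst.side; ++x) {
                std::uint8_t m = 255;
                for (int dz = 0; dz < 2; ++dz)
                    for (int dy = 0; dy < 2; ++dy)
                        for (int dx = 0; dx < 2; ++dx)
                            m = std::min(m, src.at(2 * x + dx, 2 * y + dy, 2 * z + dz));
                dst.ids.push_back(m);
            }
    return dst;
}

// Level 0 is the full-resolution volume; the last level is a single voxel.
inline std::vector<MaterialVolume> buildMipPyramid(MaterialVolume base) {
    std::vector<MaterialVolume> levels;
    levels.push_back(std::move(base));
    while (levels.back().side > 1)
        levels.push_back(downsample(levels.back()));
    return levels;
}
```

Because material 0 denotes empty space, taking the minimum means that any group containing an empty voxel becomes empty at the next level, so thin features shrink or disappear at coarse levels. Using the mode instead would change that behavior at the cost of a slightly more expensive reduction.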
One drawback of our LOD mechanism is that there tend to be rather large cracks between adjacent regions using different LOD levels. This is a classic problem in terrain rendering and has been the focus of significant research for the 2D heightmap scenario, but it is vastly more difficult in three dimensions. Lengyel [6] has solved this problem using a more sophisticated set of equivalence classes that consider both the higher and lower resolution data when generating triangles.
In addition to the discrete LOD system outlined above, we also have an early version of a progressive LOD system based on the work of Melax [11] and Forsyth [18]. It is little more than a vanilla implementation of the techniques described in these papers, with the exception that we modify the edge collapse heuristic to not collapse edges that lie on material boundaries or on the boundaries of regions. This eliminates the cracks between regions, but the system is significantly slower than the discrete LOD approach. In particular, generating a low level-of-detail mesh with the progressive system takes at least as long as generating a high-resolution mesh, plus the time taken to perform the simplification. In contrast, our current discrete approach can generate a low LOD in much less time than the higher LOD.
3.4.3 Threading the Algorithm
Interactive volume editing is crucial for allowing designers to build their 3D worlds, and within our engine we like to think of it as a gameplay feature as well. Therefore, it is essential that the regeneration of a dirty region's surface is performed as quickly as possible. Efficiently threading the algorithm can go a long way in helping us achieve this goal.
Our system is illustrated in Figure 3.5 and works as follows. Our main game thread handles logic and rendering and is the only one to have both read and write access to the volume. Voxels are modified based on user input and in-game events, and the dirty flags of the affected regions are set. We periodically collect dirty regions and populate a simple TaskData structure with the region, the required LOD level, and a priority. This priority is based upon the region's distance from the camera, such that nearby regions will be regenerated sooner. The task data is added to a prioritized queue of tasks that need to be processed. If the task is already in the queue then there is no need to add it again, as the corresponding region is already scheduled for an update.
Figure 3.5: A small number of background threads continuously process the queue of regions that have been modified and regenerate the surface geometry. The main thread retrieves the results and uploads the geometry to the GPU.
A small number (1–4) of surface extraction threads run in the background and wait for the priority queue to contain task data to process. When a TaskData instance is available, a surface extraction thread will remove it from the priority queue and perform the Marching Cubes algorithm on its specified region. If other mesh processing tasks such as normal computation (Section 3.4) or splitting by material (Section 3.5.2) are necessary, then they are also performed by the surface extraction thread. The resulting vertex and index buffer pair is then assigned to the TaskData instance, which in turn is added to a queue of completed tasks.
The main thread is the only one that can copy the data to the GPU (at least on current generation hardware), and so it takes responsibility for removing items from the queue of completed tasks and uploading them to the graphics card. Semaphores are used to control access to the queues and to provide thread synchronization.
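The flow of tasks between the main thread and the extraction threads can be sketched with standard C++ threading primitives. This sketch substitutes a mutex and condition variable for the semaphores mentioned above, and everything except the TaskData name is hypothetical: the fields of TaskData, the ExtractionPool class, and the extract callable that stands in for the Marching Cubes step. It also omits the check for a region that is already queued.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

struct TaskData {
    int region;                    // index of the dirty region
    int lod;                       // required LOD level
    float priority;                // larger means nearer the camera
    std::vector<float> vertices;   // result, filled in by a worker thread
    bool operator<(const TaskData& o) const { return priority < o.priority; }
};

class ExtractionPool {
public:
    // extract is any callable taking TaskData&; it stands in for surface extraction.
    template <class ExtractFn>
    ExtractionPool(int threadCount, ExtractFn extract) {
        for (int i = 0; i < threadCount; ++i)
            workers_.emplace_back([this, extract] { run(extract); });
    }

    ~ExtractionPool() {
        { std::lock_guard<std::mutex> lock(pendingMutex_); stopping_ = true; }
        wake_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    // Main thread: schedule a dirty region for regeneration.
    void submit(TaskData task) {
        { std::lock_guard<std::mutex> lock(pendingMutex_); pending_.push(std::move(task)); }
        wake_.notify_one();
    }

    // Main thread: fetch one finished task without blocking, ready for upload.
    bool tryTakeCompleted(TaskData& out) {
        std::lock_guard<std::mutex> lock(completedMutex_);
        if (completed_.empty()) return false;
        out = std::move(completed_.front());
        completed_.pop();
        return true;
    }

private:
    template <class ExtractFn>
    void run(ExtractFn extract) {
        for (;;) {
            TaskData task{};
            {
                std::unique_lock<std::mutex> lock(pendingMutex_);
                wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
                if (pending_.empty()) return;      // stopping and nothing left to do
                task = pending_.top();             // highest priority first
                pending_.pop();
            }
            extract(task);                         // runs outside the lock
            std::lock_guard<std::mutex> lock(completedMutex_);
            completed_.push(std::move(task));
        }
    }

    std::mutex pendingMutex_, completedMutex_;
    std::condition_variable wake_;
    std::priority_queue<TaskData> pending_;
    std::queue<TaskData> completed_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

In a frame loop the main thread would call submit for each newly dirty region, then drain tryTakeCompleted and upload whatever has finished, so rendering never blocks on extraction.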
In order to give an indication of how well our threaded surface extractor performs, we ran each of our three test volumes through the surface extractor using between one and four threads. In each case, we measured the amount of time actually spent performing surface extraction. Time spent performing other tasks such as loading the volume from disk and uploading the meshes into GPU memory is omitted from these results, which can be seen in Table 3.2.
Table 3.2: Some typical timings for our threaded surface extractor running on a quad-core 2.33 GHz CPU with 2 GB of memory.
Volume     No. of threads   Thread 1 time (ms)   Thread 2 time (ms)   Thread 3 time (ms)   Thread 4 time (ms)   Average time (ms)   Sum of times (ms)
Castle     1                1094                 -                    -                    -                    1094
Castle     2                501                  589                  -                    -                    545
Castle     3                360                  439                  344                  -                    381
Castle     4                281                  280                  265                  322                  287
Mountain   1                5292                 -                    -                    -                    5292
Mountain   2                2717                 2683                 -                    -                    2700
Mountain   3                1960                 1888                 1955                 -                    1912
Mountain   4                1438                 1340                 1314                 1518                 1402
Earth      1                9062                 -                    -                    -                    9062
Earth      2                4439                 4455                 -                    -                    4447
Earth      3                2926                 3063                 3045                 -                    3011
Earth      4                2549                 2501                 2510                 2537                 2524                10097
There are several things we can observe from this data. First, we can see that with several threads running, the workload is split fairly evenly among them. A direct result of this is that the average time spent in a thread decreases as the number of threads increases. It's not quite a linear relationship, but it is pretty close, as shown by the "Average time" column. Second, we can see that the time to regenerate a region (64 × 64 × 64 in all these cases) is consistently around 20 ms. This means we would typically expect to pick up the results of a modification 1–2 frames after it has occurred, and it makes 64 × 64 × 64 the largest region size we can practically use for real-time modification. Lastly, we see that the time required to process a complete volume is typically just a few seconds.
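As a worked check on these observations, the four-thread speedup S can be estimated from Table 3.2 by dividing the single-thread time by the four-thread average time (treating the average thread time as a proxy for elapsed time is an assumption made here). The per-region cost follows from dividing the single-thread time by the number of 64 × 64 × 64 regions in each volume, which is 64, 256, and 512 for the three volume sizes listed in Table 3.1:

```latex
\begin{align*}
S_{\text{Castle}}   &= \frac{1094}{287}  \approx 3.81, &
t_{\text{Castle}}   &= \frac{1094\ \text{ms}}{64}  \approx 17.1\ \text{ms} \\
S_{\text{Mountain}} &= \frac{5292}{1402} \approx 3.77, &
t_{\text{Mountain}} &= \frac{5292\ \text{ms}}{256} \approx 20.7\ \text{ms} \\
S_{\text{Earth}}    &= \frac{9062}{2524} \approx 3.59, &
t_{\text{Earth}}    &= \frac{9062\ \text{ms}}{512} \approx 17.7\ \text{ms}
\end{align*}
```

These figures agree with the near-linear scaling and the per-region time of roughly 20 ms noted above.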
3.5 Rendering
For the purpose of rendering, a bounding volume hierarchy is built with our regions' geometry as the leaves. The bounding box of each piece of leaf geometry is at most the size of a region, but is usually smaller, and it takes relatively little effort to trim the bounding box to the size of the actual mesh. Each internal node of the hierarchy is built from the eight nodes below it.
Because of the highly dynamic nature of our environments we do not perform any kind of precomputed visibility calculations. Visibility culling is currently handled simply by intersecting the view frustum with the bounding volume hierarchy, and rendering each piece of geometry whose bounding box is at least partly inside. This is a fast and efficient method for culling large amounts of geometry, but it is not an optimal solution for scenes with a high depth complexity. For these we are investigating the use of image-space methods such as Coherent Hierarchical Culling [4], but we do not yet know how well these will perform.
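The frustum test against the hierarchy can be sketched as a short recursive traversal. The types Plane, Aabb, and Node and the function names are hypothetical, and the plane convention (a point is inside when the plane equation is non-negative) is an assumption. The test is conservative: it may keep a box that lies just outside a frustum corner, but it never discards a visible one.

```cpp
#include <memory>
#include <vector>

struct Plane { float nx, ny, nz, d; };   // inside when nx*x + ny*y + nz*z + d >= 0
struct Aabb  { float min[3], max[3]; };

struct Node {
    Aabb box;
    int regionId = -1;                             // >= 0 for leaf geometry
    std::vector<std::unique_ptr<Node>> children;   // up to eight for internal nodes
};

// True if the whole box lies on the outside of the plane.
inline bool outside(const Aabb& b, const Plane& p) {
    // Test only the box corner that lies furthest along the plane normal.
    const float x = (p.nx >= 0.0f) ? b.max[0] : b.min[0];
    const float y = (p.ny >= 0.0f) ? b.max[1] : b.min[1];
    const float z = (p.nz >= 0.0f) ? b.max[2] : b.min[2];
    return p.nx * x + p.ny * y + p.nz * z + p.d < 0.0f;
}

// Appends the region IDs of all leaves whose boxes are at least partly
// inside the frustum. A rejected internal node culls its whole subtree.
inline void collectVisible(const Node& node, const std::vector<Plane>& frustum,
                           std::vector<int>& out) {
    for (const Plane& p : frustum)
        if (outside(node.box, p)) return;
    if (node.children.empty()) {
        if (node.regionId >= 0) out.push_back(node.regionId);
        return;
    }
    for (const std::unique_ptr<Node>& child : node.children)
        collectVisible(*child, frustum, out);
}
```

A real view frustum would supply six planes; the traversal is the same for any number of planes.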
Lighting and shadowing algorithms are also fully dynamic, as the ability to modify the volume at any time makes the use of precomputed illumination very difficult. Normals for lighting are usually provided with the vertices (as discussed in Section 3.4.1), but in some cases it is possible to generate them on the fly. So far, we have applied only local illumination models, but some of the current research on real-time global illumination may also be applicable. One of the many variants of the popular shadow mapping algorithm can be used to generate the real-time and dynamic shadows.
3.5.1 The Material System
When a surface is rendered, it is almost always desirable to provide additional surface detail to the object through the use of texture mapping. One of the key problems with generating geometry on the fly is that there is no opportunity for an artist to define the UV parameterization that specifies how these texture maps should be applied. Instead we need an automatic way of generating texture coordinates.
One of the most useful approaches (demonstrated in real time by Nvidia in their Cascades demo [3]) is known as triplanar texturing. This is performed in the fragment shader and uses blend weights derived from the surface normal to interpolate between three textures projected along the x, y, and z axes. That is, the first texture is sampled using the (y, z) components of the fragment's world-space position and modulated by the x component of the normal, the second texture is sampled using the (x, z) components of the fragment's world-space position and modulated by the y component of the normal, and so forth. A fragment's world-space position can be determined by interpolating it from the vertices, and the components of a unit normal vector can be made to sum to one by squaring them. The snippet of Cg fragment shader code in Listing 3.1 demonstrates this process.
Listing 3.1: Triplanar texturing can project textures onto arbitrary geometry. This code receives textures, a normal, and a world position as input and computes the resulting color. Note that in order to preserve texture handedness, one of the UV coordinates must be negated whenever the dominant normal component is -x, +y, or -z. This is particularly important when working with normal maps but is not shown for simplicity.
    // Interpolation means normals may not be unit length
    normal = normalize(normal);
    // Squaring a unit vector makes the components add to one.
    float3 blendWeights = abs(normal * normal);
    // For each axis, sample the texture and multiply by the blend weights.
    float4 colorMapValueYZ = tex2D(colorMapX, worldPos.yz) * blendWeights.x;
    float4 colorMapValueXZ = tex2D(colorMapY, worldPos.xz) * blendWeights.y;
    float4 colorMapValueXY = tex2D(colorMapZ, worldPos.xy) * blendWeights.z;
    // Combine the results
    float4 colorMapValue = colorMapValueXY + colorMapValueYZ + colorMapValueXZ;
    ...
Triplanar texturing works particularly well when applied to terrain, and when natural textures (rock, grass, etc.) are used. It does not respond particularly well to manmade textures containing sharp edges or other high-frequency detail, as these do not blend well with each other. Variations on this idea are also possible, such as using six textures instead of three, or using one texture for all four lateral surfaces and different textures for the top and bottom.
An alternative method is to use the normal directly to look up into a cube map texture. This is what was done for the surface of the Earth in Figure 3.1(b). In this case, the normals which were used for the lookup were generated on the fly by normalizing a vector from the center of the Earth to the vertex on the surface. This gave better results than using the normals generated by the methods in Section 3.4.
However, the conceptually simplest way to perform the parameterization is to use three-dimensional texture coordinates. In this case, the texture coordinates can be derived directly from the fragment's world-space position. Storing 3D textures on the GPU quickly becomes impractical because of their high memory requirements, but the 3D texture coordinates can easily be used as inputs into procedural texture generation routines. Referring again to Figure 3.1(b), the lava in the Earth's core is generated by using several octaves of Perlin noise running on the GPU [10].
Note that whichever technique is used to generate the texture coordinates, it is still possible to perform texture transformations by applying an appropriate texture matrix.
3.5.2 Using Multiple Materials
If our voxels include a material ID, then this will have been passed to the GPU as part of our vertex definition (see Section 3.4.1). One simple way in which we can use this material ID is to identify the texture (or set of textures for triplanar texturing) that should be applied. Modern GPUs provide direct support for this through texture arrays, which allow a single 2D texture to be indexed in an array of textures using the rounded value of a floating-point input. Older hardware can make use of a texture atlas [14] to obtain a similar result, but special measures must be taken to avoid filtering artifacts when textures repeat.
Although simple in principle, there are some additional issues to be wary of if you wish to blend smoothly from one material to another. For example, a triangle on the boundary of two or more materials will have a different material ID at each of its vertices. It does not make sense to simply interpolate these material IDs across the face of the triangle, as this would yield values which did not exist at any of the three vertices.
Within our system we handle this scenario by splitting our input mesh in two. One of the resulting meshes contains those triangles that have the same material at each vertex (we will call these uniform triangles). In this first mesh, the remaining non-uniform triangles are replaced with new triangles that have a material ID of zero at each vertex. During shading, we ensure that material zero is drawn as black by either setting the zeroth slot of our texture atlas to black, or by putting in an explicit check and return at the beginning of our fragment program.
The second mesh that results from our splitting procedure contains the non-uniform triangles. Actually, each non-uniform triangle gets duplicated three times[3] to create three uniform triangles, one for each of the materials in the non-uniform version (see Figure 3.6). We set the alpha values of the vertices such that one corner is fully opaque while the other two are fully transparent.
Figure 3.6: A single input mesh containing multiple material IDs, represented by different colors (see figure on accompanying CD), is split into two meshes. In the uniform triangle mesh, all the components are spatially adjacent and are only split up in the figure to aid visualization. In the non-uniform triangle mesh, the three parts are drawn on top of each other such that the alpha values blend correctly.
Rendering is performed by first drawing the uniform triangles with the blending mode set to replace the current contents of the frame buffer. As well as drawing the uniform triangles, this also serves to ensure that the background behind the non-uniform triangles is set to black. The blending mode is then set to additively blend with the existing frame buffer contents, and the mesh containing the non-uniform triangles is drawn. This results in a smooth transition from one material to another. If we follow a single triangle such as the one marked X in the figure, we can see that it is drawn first in black and then once again using each of the materials.
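The CPU-side split can be sketched as follows, using the vertex layout from Section 3.4.1. The SplitMesh type and the function name splitByMaterial are hypothetical, and setting alpha to one on the uniform and black triangles is an assumption. The sketch duplicates every non-uniform triangle three times, as the current system does.

```cpp
#include <array>
#include <vector>

struct Vertex {
    float position[3];
    float normal[3];
    float materialId;
    float alpha;
};
using Triangle = std::array<Vertex, 3>;

struct SplitMesh {
    std::vector<Triangle> uniform;   // drawn first, replacing the frame buffer
    std::vector<Triangle> blended;   // drawn second, with additive blending
};

inline SplitMesh splitByMaterial(const std::vector<Triangle>& input) {
    SplitMesh out;
    for (const Triangle& t : input) {
        const bool isUniform = t[0].materialId == t[1].materialId &&
                               t[1].materialId == t[2].materialId;
        if (isUniform) {
            Triangle copy = t;
            for (Vertex& v : copy) v.alpha = 1.0f;   // assumption: fully opaque
            out.uniform.push_back(copy);
            continue;
        }
        // Black placeholder (material zero) so the additive pass starts from black.
        Triangle black = t;
        for (Vertex& v : black) { v.materialId = 0.0f; v.alpha = 1.0f; }
        out.uniform.push_back(black);
        // One copy per corner: that corner's material everywhere, opaque only
        // at that corner, so the three copies sum to a smooth blend.
        for (int corner = 0; corner < 3; ++corner) {
            Triangle copy = t;
            for (int i = 0; i < 3; ++i) {
                copy[i].materialId = t[corner].materialId;
                copy[i].alpha = (i == corner) ? 1.0f : 0.0f;
            }
            out.blended.push_back(copy);
        }
    }
    return out;
}
```

At any point inside a non-uniform triangle the three interpolated alpha values are the barycentric weights of the three corners, so they sum to one and the additive pass yields a weighted mix of the corner materials.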
Note that this triangle duplication on material boundaries is currently performed on the CPU after the meshes have been generated by the Marching Cubes algorithm. This is done for compatibility with older hardware, but it would be interesting to investigate whether it could instead be handled by the geometry shader on more modern GPUs.
It is possible that the materials represented by the material IDs differ by more than just the textures that are applied. In fact, the entire shaders and/or pipeline state might need to be different for some materials. Figure 3.1(b) is again a good example of this, as it uses a cube map projection for the surface of the Earth, triplanar texturing for the rock, and GPU Perlin noise for the magma.
If the mesh corresponding to a single region does need to cater for such a diverse range of materials, then we split the mesh into several pieces, up to the number of different materials. For example, if a mesh consists of three different materials, of which two are based on triplanar texturing and the third is procedurally generated, then we split off the vertices for the third material, but leave the first two materials in the same mesh. We then render the first mesh using our triplanar texturing shader (choosing among textures based on the material ID) and then render the second mesh using our procedural shader. This splitting into materials is in addition to the splitting described earlier for blending among materials. This means that if a region contains n different materials, then it may end up in at most 2n pieces after all splitting is complete. However, this is a worst-case scenario, and in practice many regions' meshes do not need to be split at all.
Our system benefits here from being built on the Ogre3D graphics engine [9], as we can simply pass our meshes into Ogre's render queue and the sorting by texture changes, render state changes, etc., is handled automatically. Most graphics engines will provide some similar functionality.
[3] Actually there is some room for improvement here, as triangles containing two materials should only be duplicated twice, rather than duplicating all non-uniform triangles three times. We intend to change this, as triangles consisting of three different materials are quite rare, and so a lot of extra triangles are currently generated to support this worst-case scenario.
3.6 Physics
Integration of a physics solution (in our case Bullet [2]) into our engine was relatively straightforward because, for the most part, the geometry generated by the Marching Cubes algorithm can be treated the same as any other. A physics mesh is constructed for each region from the vertex and index buffers, and the same bounding volume hierarchy that we used for view frustum culling can be used for the broad-phase collision detection.
During simulation, we found that the sheer number of triangles did put the physics system under a lot of strain, and that an effective LOD system becomes essential for volumes with dimensions over about 256 × 256 × 256 (of course, this varies wildly depending on the complexity of the volume). Our existing LOD system is not particularly suitable for this purpose due to the mismatches between adjacent LOD levels discussed in Section 3.4.2. Simplifying the original high-resolution meshes would likely be an improvement here.
Dynamically updating the meshes as voxels are removed was a lot more straightforward than initially anticipated. Simply replacing one mesh with another between simulation time steps seemed to cause the physics engine no problems at all. Dynamically adding voxels is a lot more complex because an object can suddenly find itself penetrating a surface that it was previously not close to. More work is required to decide whether, and how, this scenario should be handled.
One additional point worth noting is that while all collision detection is currently performed against the surface meshes, there is potential for doing it directly against the voxel volume. Hit testing and picking are currently performed in this way, and testing whether a point is inside an object or not becomes a simple case of checking the value of the voxel at that location.
3.7 The Future
Most of the techniques described in this chapter have been implemented in our experimental Thermite3D game engine [15], with the exception of the "density voxels" which are available in the Forever War spin-off project [5]. The techniques are appropriate for integration into other game engines running on current generation hardware. However, our project is currently at an early stage, and there are a number of features we would like to add in the near future.
First, we would like to increase the size of the volumes that we can load and render. This will require further work on our LOD system, particularly with the aim of determining how the progressive LOD approach compares to the current discrete system. Second, we would like to investigate the use of streaming as a mechanism for reducing the amount of data held in memory at a time. Our current block volume structure is likely to provide a strong basis for this, as blocks can be stored in a compressed format on disk and loaded into memory on demand. We may also want to save blocks back to disk so that changes made to the environment can be persistent.
In addition, we need to give some thought to the issue of content creation. At present we have tools to convert existing heightmaps into volumetric representations so they can be destroyed in real time, and we also have a tool that will convert a triangle mesh from a standard 3D modeling package into a volume (this was used to create the castle in Figure 3.1(c), for example). But there is a lot of potential for generating environments procedurally, such as a terrain built from Perlin noise with an underground network of caves built using Voronoi cells.
It is also important to consider how volumetric environments can be used as a gameplay element. The ability to destroy parts of the environment in response to explosions is an obvious example, but the representation also lends itself naturally to allowing parts of the environment to be slowly eroded away, perhaps in response to fire or acid. If a game scenario required it, it would also be possible to slowly heal geometry back to its initial state. Lastly, the development of a powerful and intuitive interface for editing could also help its adoption as a gameplay device.
Finally, if we look beyond our own project, it is worth noting a significant amount of research on other methods of rendering these kinds of environments. id Software is probably the most high profile of these, with talk of a "Sparse Voxel Octree" being used to represent geometry in their id Tech 6 engine [13]. As GPUs become increasingly general purpose, it is becoming practical to implement other volume rendering approaches such as ray casting or point splatting. For now, the surface extraction approach described in this chapter is the only approach to have seen use in real games, but it will be interesting to see where the future leads.
Acknowledgements
I would like to thank Matthew Williams and Jaz Wilson for their contributions to the Thermite3D Engine, and the developers of Ogre3D and Bullet for their valuable libraries. Thomas Schöps and Marek Rosa granted permission to use their images in this gem, while Tobias Tropper provided useful feedback on early versions of the gem.
References
[1] Sören Grimm, Stefan Bruckner, Armin Kanitsar, and Meister Eduard Gröller. "A Refined Data Addressing and Processing Scheme to Accelerate Volume Raycasting". Computers & Graphics, Volume 28, Number 5 (October 2004), pp. 719–729. http://www.cg.tuwien.ac.at/research/publications/2004/grimm-2004-arefined/
[2] Erwin Coumans. Bullet Physics Library. http://bulletphysics.com/
[3] Ryan Geiss and Michael Thompson. "NVIDIA Demo Team Secrets—Cascades". Game Developers Conference 2007. http://developer.download.nvidia.com/presentations/2007/gdc/CascadesDemoSecrets.zip
[4] Oliver Mattausch, Jiří Bittner, and Michael Wimmer. "CHC++: Coherent Hierarchical Culling Revisited". Computer Graphics Forum (Proceedings Eurographics 2008), Volume 27, Number 2 (April 2008), pp. 221–230. http://www.cg.tuwien.ac.at/research/publications/2008/MATTAUSCH-2008-CHC/
[5] Thomas Schöps and Oliver Schneider. Forever War. http://foreverwar.sourceforge.net/
[6] Eric Lengyel. "Voxel-Based Terrain for Real-Time Virtual Simulations". Ph.D. diss., University of California, Davis, 2010.
[7] William E. Lorensen and Harvey E. Cline. "Marching cubes: A high resolution 3D surface construction algorithm". ACM SIGGRAPH Computer Graphics, Volume 21, Number 4 (July 1987).
[8] Keen Software House. Miner Wars. http://www.minerwars.com
[9] Steve Streeting. Object-Oriented Graphics Rendering Engine. http://www.ogre3d.org
[10] Simon Green. "Implementing Improved Perlin Noise". GPU Gems 2, Addison-Wesley, 2005. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter26.html
[11] Stan Melax. "A Simple, Fast, and Effective Polygon Reduction Algorithm". Game Developer Magazine, November 1998. http://www.melax.com/polychop
[12] Klaus Engel, Markus Hadwiger, Joe M. Kniss, Christof Rezk-Salama, and Daniel Weiskopf. Real-Time Volume Graphics. AK Peters, 2006. http://www.real-time-volume-graphics.org/
[13] Jon Olick. "Current Generation Parallelism In Games". Beyond Programmable Shading, Siggraph 2008. http://s08.idav.ucdavis.edu/olick-current-and-next-generation-parallelism-in-games.pdf
[14] Nvidia. "Improve Batching Using Texture Atlases". July 2004. http://http.download.nvidia.com/developer/NVTextureSuite/Atlas_Tools/Texture_Atlas_Whitepaper.pdf
[15] David Williams. Thermite3D Game Engine. http://www.thermite3d.org/
[16] Gang Lin and Thomas P.-Y. Yu. "An Improved Vertex Caching Scheme for 3D Mesh Rendering". IEEE Transactions on Visualization and Computer Graphics, Volume 12, Number 4 (July 2006). http://www.ecse.rpi.edu/~lin/K-Cache-Reorder/
[17] Shuangshuang Jin, Robert R. Lewis, and David West. "A comparison of algorithms for vertex normal computation". The Visual Computer, Volume 21, Numbers 1–2 (February 2005), pp. 71–82. http://www.tricity.wsu.edu/cs/boblewis/pdfs/2003_vertnorm_tvc.pdf
[18] Tom Forsyth. "Comparison of VIPM Methods". Game Programming Gems 2, Charles River Media, 2001. http://home.comcast.net/~tom_forsyth/papers/gem_vipm_webversion.html
[19] "Virtual Terrain Project". http://www.vterrain.org/
Chapter 4 - High-Level Pathfinding
Game Engine Gems, Volume One
by Eric Lengyel (ed)
Jones and Bartlett Publishers © 2011
Chapter 4: High-Level Pathfinding
Daniel Higgins
Lunchtime Studios, LLC
Overview
Today's gamers demand spectacular pathfinding. If it isn't amazing, don't ship it. Otherwise, prepare for an assault of angry gamers mobbing the developer's studios, torches ablaze and pitchforks in hand. Simply put, pathfinding is one of the most important pieces of technology to get right in a game, especially in the real-time strategy genre. Besides the pressure of perfection, pathfinding programmers need a thick skin. These brave souls should assume that gamers don't compliment pathfinding, they roast it. Gamers latch on to almost any game or AI imperfection and blame it on the pathfinding system. In truth, they used to be right. The old generation of pathfinding engines resulted in actors looking like knuckleheads as they found less than optimal paths and often failed to find a valid path altogether when one existed.
Apart from the impact pathfinding has on gameplay, it also has an enormous effect on performance. Game lag should never be the result of pathfinding, but sadly, it happens all too often. What gamers don't realize is that behind the scenes, poor pathfinding results in a restricted game world. These restrictions keep gamers from experiencing a game designer's true vision, which is never a good thing. Certainly not all genres need fast pathfinding, but for those with large worlds or many actors, pathfinding must be accurate and optimized. Like many of the cleverly designed usability features of modern software, pathfinding that goes unnoticed by the user is a success.
There are many options for optimizing pathfinding. We can optimize our code, time-slice pathfinding calculations, reuse paths, group paths, and even avoid pathing altogether when we can get away with it. One optimization, however, really stands out, and that is to reduce the search space a pathfinder works with. High-level pathfinding is a great technique to achieve this reduced search space, and is the focus of this gem. Besides dramatically improving performance, high-level pathfinding offers us the opportunity to perfect pathfinding. Once pathfinding performance is no longer an issue, we can force our pathfinding engines to never give up, and always find a path when a valid one exists.
4.1 Terms
Before we dive into how high-level pathfinding works, let's review some terms.
Actor. Any significant object within a game's world that serves a purpose beyond ambience.
RTS. Short for "real-time strategy", a genre of strategy games that focuses on unit production, resource management, and for some, combat. Its typically high actor count and large world area strain traditional pathing algorithms, which are designed for smaller datasets.
Tile. A square area of a world (such as 1×1 meters) that holds properties for that given world space. In a typical RTS game, tiles are laid out in a grid pattern and can be located with integer coordinates.
Path. A sequence of points through which an actor can legally move to get from one location to another.
Detailed path. The exact tiles an actor will navigate to get to their destination. This path avoids all obstacles.
Path rule. A game rule that dictates a movement restriction for a given actor type, such as whales not being allowed on land. These rules determine what tile types are "legal" for an actor's movement.
Path region. A collection of contiguous tiles that share the same path rules.
Beacon point. The path region's point closest to the region's center of mass.
Fuzzy pathing. Fuzzy pathing is another way of saying high-level pathing. It indicates that a non-detailed path is being created that is legal, but without many of the movement details that make paths look good and avoid all obstacles. Another way to think of a fuzzy path is that it indicates there is a guaranteed way to the destination via this route, but it may have some unknown and navigable obstacles along the way.
Terrain analysis. A term for computing knowledge of a game world [3]. It is data we compute about a world and organize in such a way that, from within a game, the pathfinder can approximate a human-like awareness of the environment. Examples include computing ocean tiles and shapes, and recognizing a bay, a river, marshlands, forests, beaches, etc.
4.2 Start Your Engines
For the best understanding of this article, knowledge of pathfinding isn't essential, but is strongly recommended. What is contained within this article is a skeleton view of what's required in a high-level pathfinding system. While reading, it's critical to remember that even with brilliant designers and genius programmers, there will be frequent revisions to the path rules to handle special cases until pathfinding is perfect and feature freeze occurs.
Truly, the best way to build a high-level pathfinding system is to make it as generic as possible with known and isolated customization points. This is a perfect time to either buy or dust off those design pattern books. Be ready to utilize patterns such as factories, strategies, prototypes, and policies. Isolate the customization points, because those will be frequently modified, and we don't want tons of special-case code floating around the engine.
4.3 Why High-Level Pathfinding?
Imagine we have a world that contains over a million tiles. In this world, there is a battle raging in the North at the walls of an enemy city, as seen in Figure 4.1. Our heroic army is trying to take the walled-in city by force, but needs reinforcements. Let's help our army by moving a siege ram, a man-at-arms, and a naval galley north to the battle.
Figure 4.1: A world map, with tile markings.
A million or more tiles is a great distance to navigate, especially with obstacles such as forests and towering enemy walls. Odds are that during a battle, CPUs are busy with animations, projectiles, physics, AI, and graphics. The effort of calculating detailed paths for an army across a huge map will adversely affect game performance; however, if our world were only a few thousand tiles, we could achieve this without any noticeable impact on the frame rate.
To accomplish this, we need to reduce the search space used to navigate our maps. High-level pathfinding achieves this through a several-phase process. First, our preprocess phase must identify each tile in the world as being of one, and only one, path type. With those types defined, we need to analyze the world and build up a knowledge base of terrain information before the game begins, which we accomplish by performing terrain analysis [3]. The second phase involves using the data stored in our world knowledge base to find a fuzzy, non-detailed path between two points using an algorithm such as A*. Our final phase involves the actor refining the fuzzy path to plan the exact route to their destination.
Sound like more than a few days of work? It should. Implementing high-level pathfinding can consume a significant amount of development time, but is well worth the effort.
4.4 Preprocess Phase
The preprocessing phase has two main parts: design and terrain analysis. In the design portion of this phase, we focus on identifying unique path tile types. This step is done without any coding, and it relies on full knowledge of actors and their movement rules. Designers should be prepared to define every actor rule, such as prohibiting submarines from approaching shorelines, or declaring that only ninjas can climb walls. A rule needs to be added only if it has a significant effect on an actor. For example, if movement on grass and movement on sand are the same for all actors in terms of pathfinding, then there is no need to define them as different region types. If, however, sand is illegal for a unicycle, then it needs to be its own path type.
The last part of this phase is terrain analysis, and this is where the creation of path regions occurs. For an in-depth explanation of terrain analysis and its many uses, it's strongly recommended that "Terrain Analysis in an RTS: The Hidden Giant" [3] be read before beginning any actual coding.
Design Time
Prior to coding, it's crucial to categorize each tile into a unique pathfinding type. For example, a water tile is rarely just a water tile. A given water tile may be best classified as being part of a river, bay, deep ocean, or coral reef. To identify these regions, we determine if all actors have the same restriction for traversing or avoiding the tile. If any actor is designed with a constraint that modifies its passage across the tile, we make a new path type. Perhaps it's a difference in terrain type, or maybe there's a special rule for large actors. When this occurs, we may turn the special-case rule into a unique type.
Let's see a few examples. In our effort to reinforce the army in the North, each of our soldiers has very different pathfinding restrictions. Our rules are as follows:
Siege Ram. Big and bulky, this actor must move on roads or grasslands. What makes this actor unique is that while pathfinding, it can move through enemy walls and gates if it doesn't mind attacking them first. Siege rams also cannot cross rivers without using a bridge.
Man-At-Arms. Our plated soldier can be in group formations or act independently as a scout. Unbound by the restrictions of using a horse, when alone, our knight can move through the woods if needed.
Galley. This warship is large, and used primarily in ship-to-ship combat or to bombard the coast from a distance. River warfare is out of the question. Even approaching the coast could risk running aground and is forbidden. As a result, it must stay in deep water at all times.
Already, we can see a need for different types of water and recognize that we need to separate walls, gates, forests, and bridges into separate types. It's essential to go through each type of actor that can pathfind and determine its movement rules. It's a good idea to establish an enumeration of path tile types and a function for computing them, as shown in Listing 4.1.
Listing 4.1: This is a function used to determine the path type of a tile. The order of rules here is very important. This function is a special case and will be frequently modified during development.
namespace PathTileType
{
    typedef unsigned long Type;

    enum
    {
        Unknown = 0,
        Grass = 1,
        DeepWater = 2,
        Forest = 3,
        Wall = 4,
        Gate = 5,
        ShoreLine = 6,
        Shallows = 7,
        Bridge = 8
    };

    // Helper used to determine a tile's path type.
    extern Type ComputeTileType(const Tile *inTile);
}

// This function takes a tile and determines its path type.
// Every tile must be determined to be of one and ONLY one type.
PathTileType::Type PathTileType::ComputeTileType(const Tile *inTile)
{
    // Assume asserts for tile validity, etc...

    // Is this a type of water?
    if (inTile->IsWater())
    {
        // Is there a bridge here?
        if (inTile->HasActorTypeOnTile(kActorTypeBridge))
            return PathTileType::Bridge;

        // Is it shallow water?
        if (inTile->GetWaterDepth() <= 1.0)
            return PathTileType::Shallows;

        // Close to shore?
        if (inTile->GetDistanceToShore() <= 2.0)
            return PathTileType::ShoreLine;

        // Must be a deep water tile.
        return PathTileType::DeepWater;
    }

    if (inTile->HasActorTypeOnTile(kActorTypeTree))
        return PathTileType::Forest;

    if (inTile->HasActorTypeOnTile(kActorTypeWall))
        return PathTileType::Wall;

    if (inTile->HasActorTypeOnTile(kActorTypeGate))
        return PathTileType::Gate;

    return PathTileType::Grass;
}
Notice how in Listing 4.1 the order is significant. If a tile has water on it, it's more important to know if it has a bridge on it than whether it's over shallow water. If the type of water under a bridge also mattered, the type could be BridgeShoreLine or BridgeDeepWater. It's likely there are many permutations of types, but in the end, ensure each tile belongs to only one path tile type.
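One way to tie the actor rules above to these tile types is a per-actor bitmask with one bit per path tile type. The sketch below is only an illustration: `TypeBit`, the three rule masks, and the free function `CanMoveOnType` are hypothetical names, and the masks encode one reading of the siege ram, man-at-arms, and galley rules (roads are folded into `Grass` because the enumeration has no road type).

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the enumeration from Listing 4.1.
namespace PathTileType
{
    typedef unsigned long Type;
    enum { Unknown = 0, Grass = 1, DeepWater = 2, Forest = 3, Wall = 4,
           Gate = 5, ShoreLine = 6, Shallows = 7, Bridge = 8 };
}

// Hypothetical helper: one bit per path tile type.
constexpr std::uint32_t TypeBit(PathTileType::Type inType)
{
    return std::uint32_t(1) << inType;
}

// Hypothetical rule masks for the three example actors.
// Siege ram: open ground and bridges, plus walls and gates it can batter.
const std::uint32_t kSiegeRamRules =
    TypeBit(PathTileType::Grass) | TypeBit(PathTileType::Bridge) |
    TypeBit(PathTileType::Wall) | TypeBit(PathTileType::Gate);

// Lone man-at-arms: open ground, bridges, and forests.
const std::uint32_t kManAtArmsRules =
    TypeBit(PathTileType::Grass) | TypeBit(PathTileType::Bridge) |
    TypeBit(PathTileType::Forest);

// Galley: deep water only.
const std::uint32_t kGalleyRules = TypeBit(PathTileType::DeepWater);

// Returns true if the rule mask allows movement on the given tile type.
inline bool CanMoveOnType(std::uint32_t inActorRules, PathTileType::Type inType)
{
    assert(inType <= PathTileType::Bridge);
    return (inActorRules & TypeBit(inType)) != 0;
}
```

With this layout, adding a new path tile type costs one enumerator and one bit per actor that may use it, which keeps the frequently changing rules in one place.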
It's easy to get carried away in this phase and unnecessarily define every possible variation of a similar region, so be careful. Examine the differences between the following two cases. If there are two types of bridges in the world, a drawbridge and a footbridge, should we create unique path types for them? A drawbridge allows large ships to pass beneath it, while a footbridge does not. An argument could be made either way, but this situation isn't so unique that it needs to be handled with a new type. It's simple enough for a ship to ask a region what type of bridge it contains.
To find an example where a unique rule is warranted, let's examine the case of ladders in the RTS game Rise & Fall: Civilizations at War. Ladders were mobile, meaning they could be packed up and moved to other sections of wall. Ladders also had a strict requirement of allowing only humans. While horses could charge up ramps to fight on top of walls, ladders were strictly horse-free zones. Path regions also presented us with the opportunity to solve the issue of positioning actors directly in front of a ladder prior to climbing. We solved this by making the tile in front of the ladder's base a one-tile-large path region. This ensured an actor would walk to the tile in front of the ladder before climbing. Ladders, therefore, not only qualified as a unique area, but also created the opportunity for a separate path region that we used to handle actor positioning for ladder climbing.
Terrain Analysis
Once we have our design established, we use it in the next portion of the preprocess phase, terrain analysis. Knowledge of a game world is incredibly useful to AI programmers, and terrain analysis is vital to gaining it. It is a collection of algorithms that execute before a game begins and are refined at runtime in order to first analyze, then organize, game world data. Programmers then use it creatively to imply that actors have a human-like understanding of their world.
Path regions are created by iterating through each tile, categorizing, and then clumping contiguous tile types together. These clumps are then labeled by their path types and defined as a path region. It is then necessary to provide an indicator of a region's location in the world. To accomplish this, we position a beacon point at the center of mass of the path region. Remember that regions aren't necessarily nice and neat rows of tiles. A connected forest of trees could spiral out like a spider web full of sparse, but connected trees. Thus, the true center of mass for the region could be a point inside another region. In that case, the beacon point is positioned by finding the closest point inside our region to that center.
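The clumping and beacon placement just described can be sketched with a flood fill. This is a minimal illustration under stated assumptions, not the engine's actual code: the map is a flat array of integer path types, contiguity means 4-connected tiles, and `RegionInfo` and `BuildPathRegions` are hypothetical names. The beacon is always chosen as the member tile nearest the centroid, which also covers the case where the centroid falls inside another region.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

struct RegionInfo
{
    int type;       // path tile type shared by every tile in the region
    int beaconX;    // beacon point: a tile that belongs to the region
    int beaconY;
    int tileCount;
};

// Labels 4-connected clumps of equal path types. Fills outRegionOfTile with
// a region index per tile and returns one RegionInfo per region.
std::vector<RegionInfo> BuildPathRegions(const std::vector<int>& inTypes,
                                         int inWidth, int inHeight,
                                         std::vector<int>& outRegionOfTile)
{
    const int kDX[4] = { 1, -1, 0, 0 };
    const int kDY[4] = { 0, 0, 1, -1 };
    outRegionOfTile.assign(inTypes.size(), -1);
    std::vector<RegionInfo> theRegions;

    for (int theStart = 0; theStart < inWidth * inHeight; ++theStart)
    {
        if (outRegionOfTile[theStart] != -1)
            continue;   // already part of a region

        const int theID = (int) theRegions.size();
        std::vector<std::pair<int, int> > theTiles;
        std::queue<int> theOpen;
        double theSumX = 0.0, theSumY = 0.0;

        theOpen.push(theStart);
        outRegionOfTile[theStart] = theID;
        while (!theOpen.empty())
        {
            const int theCur = theOpen.front();
            theOpen.pop();
            const int x = theCur % inWidth, y = theCur / inWidth;
            theTiles.push_back(std::make_pair(x, y));
            theSumX += x;
            theSumY += y;
            for (int d = 0; d < 4; ++d)
            {
                const int nx = x + kDX[d], ny = y + kDY[d];
                if (nx < 0 || ny < 0 || nx >= inWidth || ny >= inHeight)
                    continue;
                const int n = ny * inWidth + nx;
                if (outRegionOfTile[n] == -1 && inTypes[n] == inTypes[theStart])
                {
                    outRegionOfTile[n] = theID;
                    theOpen.push(n);
                }
            }
        }

        // Center of mass, then the member tile closest to it.
        const double cx = theSumX / theTiles.size();
        const double cy = theSumY / theTiles.size();
        RegionInfo theInfo = { inTypes[theStart], theTiles[0].first,
                               theTiles[0].second, (int) theTiles.size() };
        double theBest = -1.0;
        for (std::size_t i = 0; i < theTiles.size(); ++i)
        {
            const double ddx = theTiles[i].first - cx;
            const double ddy = theTiles[i].second - cy;
            const double d2 = ddx * ddx + ddy * ddy;
            if (theBest < 0.0 || d2 < theBest)
            {
                theBest = d2;
                theInfo.beaconX = theTiles[i].first;
                theInfo.beaconY = theTiles[i].second;
            }
        }
        theRegions.push_back(theInfo);
    }
    return theRegions;
}
```

For a ring of forest around a single grass tile, the ring's centroid lands on the grass tile, and the search above still returns a forest tile as the beacon.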
These regions, or clumps of contiguous tiles, are easy for us to visualize and understand, but there is a problem: if a path region is too large, algorithms like A* will have difficulty producing good high-level paths. Consider a path region comprising an ocean that surrounds an island. When we want to move a ship from one side of the world to the other around the island, we have to communicate to the engine that just because it's the same ocean doesn't mean it's a simple path. If the ocean were one region, there would be nothing to navigate. The low-level pathfinder would think its start and end destination were in the same region, so it must be a simple path. Not so. We would want there to be regions that took our ship around the island, regardless of it being the same ocean. The key to doing this is to divide up the regions using a world grid, which effectively normalizes the size of all regions.
Using our path region clumps, some of which are gigantic and some of which are small, we lay a grid over the world. Large and small regions alike get subdivided by this grid, making many of the regions split at least once. Choosing the size of the grid takes some experimentation, most likely after the entire high-level pathfinding system is complete. For the sake of starting somewhere, let's use a map that is 1000×1000 tiles, and a grid resolution that is 10×10. Given a flat map of just grass, we would have many grass regions that contained 100 tiles each. That isn't an aggressive size, and it's likely a million-tile map would have larger regions, but it's a starting point. Be certain to play with the resolution numbers, adjusting them up and down depending on map size, complexity, or performance. Also, it's important to store the world grid position (black-lined squares in Figure 4.2) within the path region, as we'll find multiple uses for it later, such as during the fuzzy pathing phase or when recomputing path areas due to world changes.
Figure 4.2: Path regions with a world grid overlay and IDs for each path region. Notice how the world grid boxes contain multiple regions inside them, and that regions like the ocean are divided up by the world grid into many separate regions.
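The grid subdivision can be expressed as one extra condition during clumping: two neighboring tiles may join the same region only if they share a path type and a world grid cell. The helpers below are a small sketch with hypothetical names (`GridCell`, `GridCellOfTile`, `TilesShareRegion`, `CountGridCells`), assuming square grid cells measured in tiles.

```cpp
#include <cassert>

struct GridCell
{
    int x;
    int y;
};

// Maps a tile coordinate to the world grid cell that contains it.
inline GridCell GridCellOfTile(int inTileX, int inTileY, int inCellSize)
{
    assert(inCellSize > 0 && inTileX >= 0 && inTileY >= 0);
    GridCell theCell = { inTileX / inCellSize, inTileY / inCellSize };
    return theCell;
}

// Neighboring tiles belong to the same path region only if they have the
// same path type AND fall inside the same world grid cell.
inline bool TilesShareRegion(int inTypeA, int inAX, int inAY,
                             int inTypeB, int inBX, int inBY, int inCellSize)
{
    const GridCell a = GridCellOfTile(inAX, inAY, inCellSize);
    const GridCell b = GridCellOfTile(inBX, inBY, inCellSize);
    return inTypeA == inTypeB && a.x == b.x && a.y == b.y;
}

// Number of world grid cells covering a map (partial cells count as one).
inline int CountGridCells(int inMapWidth, int inMapHeight, int inCellSize)
{
    const int theAcross = (inMapWidth + inCellSize - 1) / inCellSize;
    const int theDown = (inMapHeight + inCellSize - 1) / inCellSize;
    return theAcross * theDown;
}
```

For the 1000×1000 example with 10×10 cells, this gives 100 × 100 = 10,000 grid cells, so an all-grass map would produce 10,000 regions of 100 tiles each.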
Why don't we just make the resolution such that we have huge regions? If the number is too large, as we saw from the ocean and island example, we'll miss some navigation improvements and performance gains. The key to this system is the one-two punch of first high-level (fuzzy) paths followed by detailed-level paths. Since detailed paths are more computationally expensive in large search spaces, giant regions wouldn't save us as much CPU time during the detailed pathfinding phase. Conversely, if the grid forces tiny regions, then performance gains are reduced because the search space again grows. There is always a sweet spot, so don't discount the importance of finding it.
The grid, as illustrated in Figure 4.2, shows how we can predict, within some small error amount, the distance between regions. No region spans half the map. We know the largest size, and we can make cost estimates in pathfinding based on these assumptions. If one region were gigantic and the others were small, it would be difficult to know if one was traversing a small portion of the gigantic region or traversing the longest possible distance across it. Keeping path regions sliced into a controlled grid size keeps the unknown traversal cost so low that we can predict the approximate path planning performance impact.
Adjacent Regions
In order to use all the path regions (as shown in Figure 4.2) to navigate the map, we first need to tie them together in a graph. This is done by iterating through every tile, finding the tile's region, and finding the regions left, right, up, and down from the tile. If the region on any of the sides does not match the tile's region, we create an adjacency structure for both regions. That's fairly straightforward for left, right, up, and down, but we also need to handle diagonal connections.
Diagonals require a special adjacency rule because movement through them actually traverses neighboring tiles. Notice in Figure 4.3 that to move from region 2 to region 1, most actors would have some portion of them travel through regions 3 and 4 on the way. If region 3 or 4 is invalid for the actor, this diagonal must be off limits. Since we don't know what actors are going to use the connection, we need to store these bordering regions with the connection and check them during high-level pathing.
Figure 4.3: Example of diagonal connections between regions.
We don't necessarily always want to store diagonal connections. We only want to store them if the only connection between two adjacent regions is through a diagonal move. Since regions could be oddly shaped with multiple connection points to adjacent regions, there may be more than one diagonal connection point from region 2 to region 1. We need to store each instance of the diagonal connections so that pathfinding can check the boundary regions to see if the move is legal.
Do not keep any diagonal connections between two regions if a non-diagonal connection is available. Diagonal connections are only a last resort, since they factor in surrounding regions during the pathing of actors. Again, even if diagonal connections have been found between regions 2 and 1, as soon as a non-diagonal connection between regions 2 and 1 is found, the diagonal connection information must be thrown out.
Each region should contain some structure per adjacency such as that shown in Listing 4.2.
Listing 4.2: Regions should contain a list of adjacent regions with data indicating if the connection is a diagonal, and which regions border the diagonal connection.
struct PathRegionConnection
{
    PathRegionConnection(PathRegionID inToRegion,
        PathRegionID inDiagonalFirst = kInvalidRegionID,
        PathRegionID inDiagonalSecond = kInvalidRegionID);

    PathRegionID mToRegion;
    PathRegionID mDiagonalA;    // kInvalidRegionID if no diagonal exists
    PathRegionID mDiagonalB;    // kInvalidRegionID if no diagonal exists
};
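The adjacency pass described above, including the rule that diagonal data is discarded once a non-diagonal connection exists, might be sketched as follows. The names `Connection`, `ConnectionMap`, and `BuildAdjacency` are assumptions for illustration, and the input is assumed to be a flat array giving the region ID of every tile. Unlike Listing 4.2, this sketch keeps all diagonal crossings for a region pair in one record.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

typedef int PathRegionID;

struct Connection
{
    Connection() : mDirect(false) {}
    bool mDirect;   // a left/right/up/down connection exists
    // Bordering regions of each diagonal crossing (only kept if !mDirect).
    std::vector<std::pair<PathRegionID, PathRegionID> > mDiagonals;
};

typedef std::map<std::pair<PathRegionID, PathRegionID>, Connection> ConnectionMap;

ConnectionMap BuildAdjacency(const std::vector<PathRegionID>& inRegionOfTile,
                             int inWidth, int inHeight)
{
    ConnectionMap theMap;
    for (int y = 0; y < inHeight; ++y)
    {
        for (int x = 0; x < inWidth; ++x)
        {
            const PathRegionID a = inRegionOfTile[y * inWidth + x];

            // Right and down neighbors visit every orthogonal pair once.
            const int theOrtho[2][2] = { { 1, 0 }, { 0, 1 } };
            for (int i = 0; i < 2; ++i)
            {
                const int nx = x + theOrtho[i][0], ny = y + theOrtho[i][1];
                if (nx >= inWidth || ny >= inHeight)
                    continue;
                const PathRegionID b = inRegionOfTile[ny * inWidth + nx];
                if (b == a)
                    continue;
                theMap[std::make_pair(a, b)].mDirect = true;
                theMap[std::make_pair(b, a)].mDirect = true;
            }

            // Down-right and down-left visit every diagonal pair once.
            for (int sx = -1; sx <= 1; sx += 2)
            {
                const int nx = x + sx, ny = y + 1;
                if (nx < 0 || nx >= inWidth || ny >= inHeight)
                    continue;
                const PathRegionID b = inRegionOfTile[ny * inWidth + nx];
                if (b == a)
                    continue;
                // The two tiles an actor clips while cutting the corner.
                const PathRegionID c1 = inRegionOfTile[y * inWidth + nx];
                const PathRegionID c2 = inRegionOfTile[ny * inWidth + x];
                theMap[std::make_pair(a, b)].mDiagonals.push_back(std::make_pair(c1, c2));
                theMap[std::make_pair(b, a)].mDiagonals.push_back(std::make_pair(c1, c2));
            }
        }
    }

    // A non-diagonal connection always wins: throw out diagonal data.
    for (ConnectionMap::iterator it = theMap.begin(); it != theMap.end(); ++it)
    {
        if (it->second.mDirect)
            it->second.mDiagonals.clear();
    }
    return theMap;
}
```

On a 2×2 layout like Figure 4.3 (regions 2 and 3 on the top row, 4 and 1 below), the connection from 2 to 1 ends up diagonal-only with bordering regions 3 and 4.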
A World of Change
What about a dynamic world? What happens when gates fall or walls are built? We must incorporate these changes into our world's knowledge. There is no avoiding this during runtime. To reduce the impact of this reanalysis, we can leverage our world grid. Given a change in the world that should affect pathfinding, we find the world grid in which the change occurred and queue it for reanalysis. Soon after, we run through all our queued world grid regions, destroy all the old path regions within these square areas, and then recompute them. When a world grid is queued for reprocessing, it's good to wait for a few moments before reprocessing, since it's likely one wall built in an area will be followed soon after by another. Waiting to reprocess sections is fine when buildings are constructed, but not when buildings are destroyed. Imagine a ram knocks down a wall, a king then issues a charge into the city, and the men begin walking in the wrong direction because they don't realize the wall no longer blocks their entry. Not good!
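A queue of dirty world grid cells with a delay for construction and no delay for destruction could look like the sketch below. The class name `RegionRebuildQueue`, its methods, and the use of a plain time value in seconds are all assumptions made for illustration; when a cell is queued more than once, this version simply keeps the earliest due time.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

class RegionRebuildQueue
{
public:
    typedef std::pair<int, int> GridCell;   // world grid (x, y)

    explicit RegionRebuildQueue(double inConstructionDelay)
        : mDelay(inConstructionDelay) {}

    // Something was built: wait a little, more construction may follow.
    void NotifyConstructed(const GridCell& inCell, double inNow)
    {
        Schedule(inCell, inNow + mDelay);
    }

    // Something was destroyed: actors must learn about it right away.
    void NotifyDestroyed(const GridCell& inCell, double inNow)
    {
        Schedule(inCell, inNow);
    }

    // Removes and returns every cell whose due time has passed. The caller
    // would destroy and recompute the path regions inside these cells.
    std::vector<GridCell> PopDue(double inNow)
    {
        std::vector<GridCell> theDue;
        std::map<GridCell, double>::iterator it = mDueTimes.begin();
        while (it != mDueTimes.end())
        {
            if (it->second <= inNow)
            {
                theDue.push_back(it->first);
                mDueTimes.erase(it++);
            }
            else
            {
                ++it;
            }
        }
        return theDue;
    }

private:
    // Keeps the earliest due time when a cell is queued more than once.
    void Schedule(const GridCell& inCell, double inDue)
    {
        std::map<GridCell, double>::iterator it = mDueTimes.find(inCell);
        if (it == mDueTimes.end())
            mDueTimes[inCell] = inDue;
        else if (inDue < it->second)
            it->second = inDue;
    }

    double mDelay;
    std::map<GridCell, double> mDueTimes;
};
```

A wall built at time 10 with a 2-second delay is not reprocessed at time 11, but if a ram destroys something in the same cell at time 11, the cell becomes due immediately.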
With all our path regions subdivided using a grid, as shown in Figure 4.2, and all adjacent region connections (including diagonals) having been identified, we can begin moving our actors north to the battle during our fuzzy pathing phase.
4.5 Fuzzy Pathing Phase
We have an actor, a starting point, and a destination point, and now it's time to path! First, we need to determine the path region of our actor and that of the destination. From there, we hand this data over to a high-level (or fuzzy) pathfinding engine that computes a list of connected beacon points indicating the regions an actor must traverse to get to his destination. Luckily, this is not difficult if you're familiar with A*.
Fuzzy pathfinding engines work just like low-level pathfinding engines. In the case of A*, the algorithm finds the lowest cost route from a starting point to an ending point by examining adjacent regions, computing the costs for traversal, and flooding outward until a path is found. There are a few important differences, but overall, much of the A* engine design can use the same storage mechanisms [1], design [2], and optimizations [4] used in low-level A* engines. The big difference is that a path region's "neighbors" are not at the guaranteed left, right, up, down, and diagonal positions of a path region. Instead, they are all of a path region's adjacent neighbors. Think of regions as being nodes in a graph, since a given region could contain ten regions on its right, and only one on its left.
Listing 4.3 demonstrates an A* engine update, and we see the adjacency iteration loop used in the A* machine [1]. We get the path region from the current A* node, iterate over its adjacent regions, and if the RegionIsOpen() method returns true, then we know our actor can legally move from the current region into the adjacent region. If the move is indeed legal, then the CheckNeighbor() method computes the costs of moving from one region to the other and places that node on the appropriate A* list.
Listing 4.3: Primary adjacency update loop for fuzzy pathing A* engine.
// Given our current node, get the path region from our A* node.
PathRegion *theCurRegion = mCurrentAStarNode->GetPathRegion();

// Shown as foreach; insert your favorite loop iteration technique.
foreach (PathRegion *theAdjacent, theCurRegion->GetAdjacents())
{
    // Make sure this is not the same parent node.
    if (RegionIsOpen(theAdjacent, theCurRegion))
        CheckNeighbor(mCurrentAStarNode, theAdjacent);
}
Paths generated by A* are controlled primarily by two methods. First, the RegionIsOpen() method is used to determine if it's legal to traverse from one region to another. It handles not only the detection of path type validity, but checks diagonal movement as well. Second, the GetRegionCost() method indicates the cost of traversing from one region to another. This cost method has an enormous impact on path aesthetics and performance.
RegionIsOpen
Determining if an actor can move in a region goes back to our original design of path types. If a region is a forest, only our man-at-arms can move within it. If the region is an enemy wall, only our siege ram is valid. And as for our galley, it can only move in deep water.
The RegionIsOpen() method shown in Listing 4.4 not only needs to check if the new region is valid for the pathing actor, but whether, in the case of a diagonal move, the corners of the diagonal leap (Figure 4.3) are also valid.
Listing 4.4: This method determines whether an actor can move between regions.
bool AStarGraph::RegionIsOpen(PathRegion *inTo, PathRegion *inFrom)
{
    // If our actor can't walk in the new region, return false.
    // This takes into account things such as a tile type that is
    // a gate, but is locked. If that is the case, it returns false.
    if (!mActor->CanMoveOnType(inTo->GetType()))
    {
        return false;
    }

    // Ask our current region to get the correct diagonals.
    // Note: there may be 2 or more juncture diagonal points,
    // all with different regions at their diagonal edges.
    std::vector<std::pair<PathRegion *, PathRegion *> > theBlocks;
    inTo->GetDiagonalBlockingRegions(inFrom->GetID(), theBlocks);

    // If there are no blocks, it's passable.
    if (theBlocks.empty())
    {
        return true;
    }

    // See if this is a blockage.
    for (int i = (int) theBlocks.size() - 1; i >= 0; i--)
    {
        // Can our actor walk on BOTH regions that are at the edges?
        // In other words, can it hop the diagonal legally by stepping
        // into the other regions momentarily?
        if (!(mActor->CanMoveOnType(theBlocks[i].first->GetType()) &&
              mActor->CanMoveOnType(theBlocks[i].second->GetType())))
        {
            return false;
        }
    }

    return true;
}
GetRegionCost
If an actor can move from one region to another, A* needs us to determine the traversal cost for the move. We're using oddly shaped regions, which could be a problem except that we normalized the sizes by using a world grid. For a basic cost, we use the world's grid. Notice that in Figure 4.2, we have grid sections with many regions within them. If we consider that each black-bordered box has an (x, y) grid position, we can use that to get the relative position for pathfinding. Listing 4.5 demonstrates how we compute the cost of traveling from a region to an adjacent region. Often, there are specific costs, such as making it expensive (or less desirable) for an actor to move from land into a river.
Listing 4.5: Cost method for our A* fuzzy pathing engine.
unsigned long AStarGraph::GetRegionCost(PathRegion *inTo, PathRegion *inFrom)
{
    unsigned long theBasicCost = 0;

    // Special case for walls.
    if (inTo->GetType() == PathTileType::Wall)
        theBasicCost = 100;

    // If it's the same parent grid, we'll make the cost really low.
    if (inTo->GetGridPosition() == inFrom->GetGridPosition())
        return theBasicCost + 1;

    // Non-diagonal movement costs less than diagonal.
    if (!inTo->GetGridPosition().IsDiagonalFrom(inFrom->GetGridPosition()))
    {
        return theBasicCost + 10;
    }

    // Diagonal movement costs more.
    return theBasicCost + 14;
}
Once the high-level pathfinder is done, we have a list of path regions. We should iterate through each one, extract the beacon points, and prepare our actor for low-level pathfinding.
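To make the whole fuzzy pathing phase concrete, here is a compact A* over a region graph. It is a sketch under stated assumptions rather than the engine described in [1]: `RegionNode`, `StepCost`, `EstimateCost`, and `FindFuzzyPath` are hypothetical names, the `mOpen` flag stands in for RegionIsOpen() evaluated for one actor, the step costs reuse the 1/10/14 values of Listing 4.5 without the wall surcharge, and the heuristic is an octile estimate on world grid positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

struct RegionNode
{
    int mGridX, mGridY;             // world grid cell containing the region
    bool mOpen;                     // stand-in for RegionIsOpen() for one actor
    std::vector<int> mAdjacents;    // indices of adjacent regions
};

// Same 1/10/14 scheme as Listing 4.5, without special cases.
static unsigned long StepCost(const RegionNode& inTo, const RegionNode& inFrom)
{
    const bool sameX = inTo.mGridX == inFrom.mGridX;
    const bool sameY = inTo.mGridY == inFrom.mGridY;
    if (sameX && sameY) return 1;
    if (sameX || sameY) return 10;
    return 14;
}

// Octile estimate between grid cells, consistent with the step costs.
static unsigned long EstimateCost(const RegionNode& inA, const RegionNode& inB)
{
    const unsigned long dx = std::abs(inA.mGridX - inB.mGridX);
    const unsigned long dy = std::abs(inA.mGridY - inB.mGridY);
    return 10 * std::max(dx, dy) + 4 * std::min(dx, dy);
}

// Returns region indices from inStart to inGoal, or an empty list.
std::vector<int> FindFuzzyPath(const std::vector<RegionNode>& inGraph,
                               int inStart, int inGoal)
{
    const unsigned long kInfinity = ~0ul;
    std::vector<unsigned long> theCost(inGraph.size(), kInfinity);
    std::vector<int> theParent(inGraph.size(), -1);
    typedef std::pair<unsigned long, int> Entry;    // (cost + estimate, region)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > theOpen;

    theCost[inStart] = 0;
    theOpen.push(Entry(EstimateCost(inGraph[inStart], inGraph[inGoal]), inStart));
    while (!theOpen.empty())
    {
        const Entry theTop = theOpen.top();
        theOpen.pop();
        const int theCur = theTop.second;
        if (theCur == inGoal)
            break;
        // Skip stale queue entries left behind by a cheaper rediscovery.
        if (theTop.first > theCost[theCur] + EstimateCost(inGraph[theCur], inGraph[inGoal]))
            continue;
        for (int theNext : inGraph[theCur].mAdjacents)
        {
            if (!inGraph[theNext].mOpen)
                continue;
            const unsigned long c = theCost[theCur] + StepCost(inGraph[theNext], inGraph[theCur]);
            if (c < theCost[theNext])
            {
                theCost[theNext] = c;
                theParent[theNext] = theCur;
                theOpen.push(Entry(c + EstimateCost(inGraph[theNext], inGraph[inGoal]), theNext));
            }
        }
    }

    std::vector<int> thePath;
    if (theCost[inGoal] == kInfinity)
        return thePath;
    for (int r = inGoal; r != -1; r = theParent[r])
        thePath.push_back(r);
    std::reverse(thePath.begin(), thePath.end());
    return thePath;
}
```

The returned region indices are what the caller would walk to collect beacon points. A production engine would reuse the pooled node storage and list handling from its low-level A* instead of allocating vectors per search.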
High-level paths computed, our reinforcements are ready to move! Of course, if we followed the path points exactly as seen in Figure 4.4, our paths may not be as perfect as we would like. Corners won't be rounded, and actors will end up moving a lot farther overall than they should if computing only a low-level path. The next step is to take these paths and convert them into a real, detailed set of tiles our soldiers can use to reach the battle.
Figure 4.4: High-level, fuzzy paths have been computed for each of our actors. Note our beacon points, identified as footprints. We use these beacon points in the detailed pathing phase.
4.6 Detailed Paths Phase
Now that our soldiers know how to get to their destination legally, it's time to get them moving. We could path between each point, rounding the paths to be prettier, but what if our soldiers got cold feet and decided they preferred to do something other than fight? We would have done a lot of pathfinding that was unnecessary.
The key to producing aesthetically pleasing paths while avoiding waste is to incrementally path along the way. When an actor approaches a beacon point, the actor paths to the next beacon point before arriving at the actual beacon point. If we displayed lines showing an actor's path, we would frequently see him cutting off the end of each beacon path while he moved and pathing ahead to the next beacon. That distance is something that a programmer must experiment with, as it affects both the time at which a low-level path calculation occurs (performance) and the overall aesthetics of a path. To give an example threshold for a grid resolution of 10×10, we must meet one of two criteria to path to the next beacon: an actor must either enter a path region where his current beacon point exists, or he must be within 12 tiles of that beacon point. If one of those conditions is true, then we path the actor to the next beacon using our detailed path engine.
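The two criteria can be folded into one small predicate. The sketch below uses hypothetical types (`ActorState`, `Beacon`) and assumes that "within 12 tiles" means straight-line distance measured in tiles; a tile-step metric would work just as well.

```cpp
#include <cassert>

struct ActorState
{
    float mX, mY;       // position in tile units
    int mRegionID;      // path region the actor currently stands in
};

struct Beacon
{
    float mX, mY;       // beacon point in tile units
    int mRegionID;      // path region that owns the beacon
};

// Example threshold from the text for a 10x10 grid resolution.
const float kBeaconLookAheadTiles = 12.0f;

// True when the actor should request a detailed path to the NEXT beacon:
// it has entered the current beacon's region, or it is close enough to it.
inline bool ShouldPathToNextBeacon(const ActorState& inActor,
                                   const Beacon& inCurrentBeacon)
{
    if (inActor.mRegionID == inCurrentBeacon.mRegionID)
        return true;

    const float dx = inCurrentBeacon.mX - inActor.mX;
    const float dy = inCurrentBeacon.mY - inActor.mY;
    return dx * dx + dy * dy <=
           kBeaconLookAheadTiles * kBeaconLookAheadTiles;
}
```

Raising the threshold makes actors commit to the next leg earlier, which rounds corners more aggressively at the cost of computing detailed paths sooner.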
What we end up with is a nicely rounded path. Actors won't be walking to the middle of regions, only to turn and head at an unnatural angle to the next beacon. Simply put, we get a natural look, as shown in Figure 4.5. To the player, there is no evidence of a high-level grid-based path.
Figure 4.5: Beacon points serve as a guide to the detailed pathing engine. Note how the detailed path (round path with an arrowhead) does not go all the way to the beacon points.
4.7 Why Go Through All This Trouble?
One of the big advantages of our high-level path is in the constraints we can apply in low-level pathfinding. The A* algorithm is well known to suffer horribly from flooding problems. It often searches in the wrong direction and into concave spaces, and it ultimately checks many tiles that are unnecessary. Using our high-level path information, we can constrain our low-level pathfinder to only consider tiles inside our beacon-point (or their adjacent) regions. This keeps the low-level pathfinder from flooding outwards and backwards into areas that we know, at a high level, it doesn't need to check. This saves a huge percentage of A* loop iterations over the lifetime of the path.
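One simple way to apply this constraint is to give the low-level pathfinder a set of allowed region IDs and reject any tile outside it before the tile is ever placed on the open list. The class below is a sketch with hypothetical names (`RegionCorridor`, `Allow`, `TileIsSearchable`), assuming a flat array that maps each tile to its region ID.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

class RegionCorridor
{
public:
    // Add the regions of the current and next beacon points and, if
    // desired, the regions adjacent to them.
    void Allow(int inRegionID)
    {
        mAllowed.insert(inRegionID);
    }

    // The low-level A* would call this before expanding a neighbor tile.
    bool TileIsSearchable(const std::vector<int>& inRegionOfTile,
                          int inWidth, int inX, int inY) const
    {
        return mAllowed.count(inRegionOfTile[inY * inWidth + inX]) != 0;
    }

private:
    std::unordered_set<int> mAllowed;
};
```

Because the check is a single hash lookup per candidate tile, it costs far less than the flooding it prevents.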
The key to this optimization is its one-two punch of first using a high-level, fuzzy path to determine the correct, but ugly, route to an actor's destination. This first punch delivers incredible performance, something crucial for today's pathfinding. Follow this up with our second punch, a series of low-level, detailed paths between beacon points. Suddenly, our ugly beacon-point paths look beautiful and intelligent. Separately, these path techniques have major strengths and weaknesses; when combined, however, the strengths of each cancel the other's weaknesses, and our pathfinding becomes unstoppable.
Acknowledgements
Special thanks to Rick Bushie for drawing our actors.
References
[1] Daniel F. Higgins. "Generic A* Pathfinding". AI Game Programming Wisdom, Charles River Media, 2002.
[2] Daniel F. Higgins. "Pathfinding Design Architecture". AI Game Programming Wisdom, Charles River Media, 2002.
[3] Daniel F. Higgins. "Terrain Analysis in an RTS: The Hidden Giant". Game Programming Gems 3, Charles River Media, 2002.
[4] Daniel F. Higgins. "How to Achieve Lightning Fast A*". AI Game Programming Wisdom, Charles River Media, 2002.
Chapter 5 - Environment Sound Culling
Game Engine Gems, Volume One
by Eric Lengyel (ed)
Jones and Bartlett Publishers © 2011
Chapter 5: Environment Sound Culling
Simon Franco
The Creative Assembly
Overview
Each generation of game hardware brings with it new and exciting challenges for developers to tackle. One constant challenge with each hardware iteration is how to process data efficiently in real time. This must be done optimally to maximize hardware performance and deliver competitive results.
A number of techniques have been developed to efficiently handle spatial data for various systems. These include techniques such as using space partitioning structures to cull geometry that is not visible to the camera and using different AI complexity levels to reduce the processing time spent on distant or hidden characters.
A problem that receives less attention, however, is that of determining how to select and process real-time audio within a scene. There may be hundreds or thousands of permanent environmental sound sources in a scene in addition to the many transient sound sources that occur during gameplay, but only a small subset of them can actually be playing at any one time for performance reasons. This gem discusses a technique for efficiently reducing the complete set of sounds to the active set that is audible to the player.
5.1TheProblem
Asourgameenvironmentsincreaseingraphicaldetail,sotoomustourlevelsofaudiodetailtomatchthegraphicalrepresentationofthegameworld.Numeroussoundsarecommonlylayeredtoconstructagame'senvironmentalambiance.Thissupersedesearliertechniques,whichwouldhaveplayedasinglestereofiletoachievethesamegoal.Theadvantagetoplayingmultiplesounds,ratherthanasingleaudioclip,isthatthesoundcloselymatchestheplayer'ssurroundings.Forexample,iftheplayerwasinsideanoldhauntedhouse,therecouldbeagrandfatherclock,atelevisionshowingstatic,andwindhowlingthroughopenwindows.Astheplayermovesthroughthehouse,theambiancechangesastheplayergoesfromroomtoroom.Theplayercouldchangetheambiancebyclosingthewindowsorsmashingthegrandfatherclock.
Environmentalsoundstobeplayedareselectedfromthosenearthelistener.Thelistenerrepresentsapositionandorientationinthegameworldfromwhichtheplayerislistening.Usually,thisiseitherattachedtothecamerarenderingtheplayer'sviewoftheworld,oritusessomevariationoftheplayer'spositionandthecamera'sorientation.
The problem we must address is how to handle the multitude of sounds positioned within the game world. These sounds range from statically positioned continuous sounds such as fires and rivers, to more complex and dynamic audio events such as a crowd cheering on a fight, or a character interacting with a piece of animated geometry such as a lever.
All of these cases require that a sound emitter is placed within the world, either manually by a designer or as part of an automated process. Sound emitters are used to control how and when the sound is triggered, and most have a position from which the sound is triggered.
A sound emitter also contains a pointer to a sound event. Sound events are objects containing information about how to play a sound, such as which wave file to play, volume, audible distance, pitch settings, and a priority. Priority checks are used as a method to select which audio channels are made available to the newly requested sound event when there are no free channels available. If there are no channels available with a lower priority, then the requested sound event is not played.
While the game is running, each active sound emitter checks whether the listener is within audible distance (see Figure 5.1). If this test succeeds, and there are no free sound channels available, then a priority check against all currently playing sounds is performed. If both tests are passed, then the emitter can finally start playing its sound. The number of tests being performed each frame has increased as the number of sound emitters has risen in a typical game. Therefore, we need a fast method for rejecting large numbers of unsuitable sound emitters.
Figure 5.1: All sound emitters testing against the listener.
5.2 A Sound Culling Solution
We need to find a way to efficiently cull sound emitters that are not within an audible distance of the listener. We also need to cull any remaining sound emitters whose priority is too low to consider playing. The solution presented here involves the construction of a sound grid as a means of rapidly culling large numbers of sound emitters positioned within the game world. The sound grid is a two-dimensional grid parallel to the x-y plane that encompasses the entire game world. The grid is made up of equal-sized cells, and each cell contains an array of sound emitter lists. Each list within the array represents a different priority value, starting with the highest priority taking index 0 in the array. All sound emitters within a list have the same priority, matching that represented by the array's index. The sound emitters stored in these lists are those that are within audible range of that grid cell.
Since each grid cell contains lists of audible sound emitters, we only need to determine which cell contains the listener in order to know which sound emitters should be playing. This avoids having to perform complex searches for suitable sound emitters in real time.
We use a fixed cell size, rather than dividing up the space unequally, due to the nature of sound emitters. Sound emitters may be in physical proximity to each other, but have wildly differing audible distances. These different audible distances would cause any form of grouping to be potentially less efficient and result in more processing being used to determine which sound emitters are audible. Having a fixed cell size also allows for optimizations when determining which grid cell is occupied by the listener. The size of a grid cell can vary depending on your application, but too small a size can lead to performance problems.
We take advantage of being able to group the sound emitters into priority lists, as most applications will only use a limited priority range, typically between 5 and 10 different priority levels for environmental sounds.
Using a culling system such as a sound grid allows the audio system to rapidly cull thousands of potential sound-emitting objects by only storing what can be heard within a given area of the world. By storing the sound emitters in matching priority lists, we can start by processing the highest priority list and quickly bail out of our sound emitter processing if we have run out of free sound channels and have reached a priority level that is too low.
Listing 5.1 shows an example sound grid cell along with an example sound emitter and linking class used to bind them. We use instances of the SoundEmitterLink class within the SoundEmitter class to form the linked list connecting up sound emitters of matching priority within a cell. The SoundEmitter class contains the m_cells_touched_array member, which is an array of SoundEmitterLink objects. The array's size is set on constructing the sound emitter object, and should be the maximum number of cells that could be touched by that SoundEmitter.
Listing 5.1: This pseudocode shows an example sound grid cell and sound emitter.
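struct Cell
{
    // One list head per priority level; index 0 is the highest priority.
    SoundEmitterLink *m_emitter_list[MAX_PRIORITY_LEVELS];
};

struct SoundEmitterLink
{
    Cell *m_cell;               // Cell whose list this link belongs to.
    SoundEmitter *m_parent;     // Emitter that owns this link.
    SoundEmitterLink *m_prev;
    SoundEmitterLink *m_next;
};

struct SoundEmitter
{
    Vector m_pos;
    SoundEvent m_sound_event;
    SoundHandle *m_sound_handle;
    SoundEmitterLink *m_cells_touched_array;
    int m_num_cells_touched;
    bool m_active;
};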
5.3 Constructing the Sound Grid
The sound grid is constructed using data from both static and dynamically moving game objects. A game object is an object created by the game that has information about the sound event it wants to play and where the sound should be positioned.
The sound grid is first constructed using the static game objects. We process each static game object present within the game world only once when the game's level is loading. We use the audible distance and position of each game object to determine which grid cells its sound emitter touches (see Figure 5.2). We construct a single sound emitter for that game object and set up its m_cells_touched_array member for the number of cells within audible distance. For each of those grid cells within audible range, we use a free element in the m_cells_touched_array member to form a link between that cell's appropriate priority list and the newly constructed emitter. We return the pointer for the newly constructed sound emitter object back to the game object, which can optionally store it in case it later needs to modify the emitter's state.
Figure 5.2: Sound emitters using their audible distance to determine which grid cells they touch.
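The cell-touching test described above can be sketched as a circle-versus-rectangle overlap check. This is a minimal illustration under stated assumptions, not the gem's implementation: the names CellCoord and CellsTouched are hypothetical, and the grid's minimum corner is assumed to sit at the world origin.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct CellCoord { int x, y; };   // hypothetical helper type

// Collects every cell of a uniform grid that overlaps the circle of radius
// audibleDistance around (px, py). Assumes the grid starts at the origin.
inline std::vector<CellCoord> CellsTouched(float px, float py, float audibleDistance,
                                           float cellSize, int cellsX, int cellsY)
{
    assert(cellSize > 0.0f && audibleDistance >= 0.0f);

    // Clamp the circle's bounding box to the grid to limit the candidates.
    const int minX = std::max(0, static_cast<int>(std::floor((px - audibleDistance) / cellSize)));
    const int maxX = std::min(cellsX - 1, static_cast<int>(std::floor((px + audibleDistance) / cellSize)));
    const int minY = std::max(0, static_cast<int>(std::floor((py - audibleDistance) / cellSize)));
    const int maxY = std::min(cellsY - 1, static_cast<int>(std::floor((py + audibleDistance) / cellSize)));

    std::vector<CellCoord> touched;
    for (int y = minY; y <= maxY; ++y)
    {
        for (int x = minX; x <= maxX; ++x)
        {
            // Closest point of the cell rectangle to the emitter position.
            const float nearestX = std::max(x * cellSize, std::min(px, (x + 1) * cellSize));
            const float nearestY = std::max(y * cellSize, std::min(py, (y + 1) * cellSize));
            const float dx = px - nearestX;
            const float dy = py - nearestY;
            if (dx * dx + dy * dy <= audibleDistance * audibleDistance)
                touched.push_back(CellCoord{x, y});
        }
    }
    return touched;
}
```

The size of the returned list is also a natural value for sizing an emitter's m_cells_touched_array at load time.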
Some sound emitters do not occupy a single fixed position in the game world. For example, you may have a spline representing a river that is to have a sound emitter placed at the position nearest to the listener on the spline. Another example may be that the game world contains a forested region inside of which you wish to play a bird chirping sound at a randomly determined position. For these cases, we construct the sound emitter as with static sounds, but do not add it to any grid cells. We treat these as dynamic sound emitters, which will be changing which grid cells they belong to as the game progresses.
5.4 Processing the Sound Grid
Once per frame, the listener's position is converted from a world-space position to the particular cell covering that space within the sound grid. This is so we can retrieve from that cell the lists of audible sound emitters, which we'll need to try playing. The sound grid retains a copy of the sound emitters selected from the cell visited on the previous frame. This list is referred to as the active list, as it contains the list of all sound emitters that should be actively playing. The active list has a fixed maximum size, matching the maximum number of sounds that can be played at any one time.
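With a fixed cell size, the world-to-cell conversion reduces to a subtraction, a division, and a clamp. The following is a small sketch of that idea; GridDesc and CellIndexFromPosition are hypothetical names, and the row-major cell layout is an assumption rather than something the gem specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical grid description; these names are not taken from the gem.
struct GridDesc
{
    float originX, originY;   // world-space position of the grid's minimum corner
    float cellSize;           // edge length of one square cell
    int   cellsX, cellsY;     // number of cells along x and y
};

// Returns the row-major index of the cell containing (x, y). Positions
// outside the grid are clamped to the nearest border cell.
inline int CellIndexFromPosition(const GridDesc& grid, float x, float y)
{
    assert(grid.cellSize > 0.0f && grid.cellsX > 0 && grid.cellsY > 0);

    int cx = static_cast<int>(std::floor((x - grid.originX) / grid.cellSize));
    int cy = static_cast<int>(std::floor((y - grid.originY) / grid.cellSize));

    cx = std::max(0, std::min(cx, grid.cellsX - 1));
    cy = std::max(0, std::min(cy, grid.cellsY - 1));

    return cy * grid.cellsX + cx;
}
```

Because the lookup costs the same regardless of how many emitters exist, the per-frame work depends only on the contents of the listener's cell.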
Listing 5.2 shows the process for building up the active list. The first processing phase is to mark all sound emitters in the active list for removal. Each element in the cell's emitter list, up to the maximum number of playable sounds, is checked against the active list of emitters. If an emitter in the cell's list points to the same emitter in the active list, then that emitter's removal flag is cleared. We only test up to this number of elements as additional elements could not be played.
Listing 5.2: This pseudocode shows how we build the active list and stop playing invalid sounds.
void SoundGrid::buildActiveList()
{
    // Phase 1: mark every active emitter for removal, then clear the
    // mark on each one that also appears in the listener's cell.
    set_all_active_list_emitters_for_removal()

    for (priority = 0; priority < MAX_PRIORITY_LEVELS; ++priority)
    {
        for each emitter in the cell.m_emitter_list[priority]
        {
            for each active_emitter in the active_emitter_list
            {
                if (emitter == active_emitter)
                {
                    active_emitter.set_unremoved()
                    break
                }
                else if (emitter.m_priority > active_emitter.m_priority)
                {
                    // Emitter can't be on the active emitter list as
                    // we've gone past its priority.
                    break
                }
            }
        }
    }

    // Phase 2: stop anything that is still marked for removal.
    for each active_emitter in the active_emitter_list
    {
        if (active_emitter.is_removed())
            active_emitter.stop_any_playing_sounds()
    }

    // Finally copy the cell's emitter list to the active emitter list.
    // Start with the highest priority list and progress to the lowest
    // priority list until we run out of sound emitters or fill up
    // active_emitter_list.
    copy_list_to_active_list(active_emitter_list, cell_emitter_list)
}
The second phase is to then run through the active list and stop playing any emitters that are still marked for removal.
Now, the only sounds currently being played are those that are in the cell's emitter list and that were in the active playing list. We finally copy the cell's emitter list into the active list. Again here we only copy up to the maximum number of playable sounds to avoid redundant data.
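The overall effect of the two phases and the final copy can be sketched in compilable form. Note that this sketch swaps the removal-flag bookkeeping of Listing 5.2 for a direct membership test, which is simpler to read but not necessarily what the gem's code does. Emitter, PriorityLists, and RebuildActiveList are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Emitter { int id; bool playing; };                     // hypothetical
using PriorityLists = std::vector<std::vector<Emitter*>>;     // index 0 = highest priority

// Replaces 'active' with up to maxSounds emitters from the listener's cell,
// highest priority first, and stops any emitter that dropped out.
inline void RebuildActiveList(const PriorityLists& cell,
                              std::vector<Emitter*>& active,
                              std::size_t maxSounds)
{
    assert(maxSounds > 0);

    // Gather the emitters that may play this frame.
    std::vector<Emitter*> next;
    for (const std::vector<Emitter*>& list : cell)
        for (Emitter* e : list)
            if (next.size() < maxSounds)
                next.push_back(e);

    // Anything that was active but was not selected again must stop.
    for (Emitter* e : active)
        if (std::find(next.begin(), next.end(), e) == next.end())
            e->playing = false;

    active.swap(next);
}
```

Emitters that survive from one frame to the next are left untouched, so sounds that remain audible keep playing without a restart.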
Once per frame, we process the active list of emitters to see if any of them are either in an "on" state but not playing, or are in an "off" state but are playing. This is shown in Listing 5.3. In Figure 5.3, we show how only one sound emitter is valid when using a sound grid.
Figure 5.3: The cell occupied by the listener is only within the radius of one sound emitter.
Listing 5.3: This pseudocode demonstrates how the active emitter list is processed.
void SoundGrid::updateEmitters()
{
    for each active_emitter in the active_emitter_list
    {
        if (active_emitter.is_on())
        {
            if (active_emitter.not_playing())
            {
                if (engine.sound.get_num_channels_free() > 0)
                {
                    active_emitter.play_sound()
                }
                else if (active_emitter.m_priority > engine.sound.lowest_priority())
                {
                    // No free channel, but a lower-priority sound can be replaced.
                    active_emitter.play_sound()
                }
            }
            else
            {
                active_emitter.update_sound()
            }
        }
        else if (active_emitter.is_playing())
        {
            active_emitter.stop_sound()
        }
    }
}
More on Sound Emitters
As mentioned earlier, a pointer to the newly constructed sound emitter object is returned once a game object has had its sound emitter constructed. The purpose of this is to allow the game to play or stop a sound emitter by changing its "on" flag. This flag is a required part of a sound emitter, as not all game objects require that their sound emitter play continuously. Examples of this are when the player has set fire to an object or the game object is coordinating the sound being played during particular frames of an animated piece of geometry.
How to Handle Dynamic Sounds
Some game objects need to perform an update once per frame on the position of their sound emitter. For example, you may have a sound that moves along a predefined path and need to update where it is on that path each frame. To achieve this, we take advantage of the sound grid's structure to allow for dynamic sounds. During each frame, a game object such as a car can move its sound emitter. To do this, it must first have the sound emitter remove itself from all the grid cells it was previously touching. The sound emitter is then moved to its new location in the world and re-added to the sound grid. This must happen before the sound grid is processed and the active emitter list is built. Listing 5.4 shows an example algorithm for inserting the sound emitter's link at the head of a cell's emitter list, and Listing 5.5 shows the matching removal algorithm.
Listing 5.4: Sample insertion routine for dynamic sounds.
/**
 * This inserts the SoundEmitter at the head of the corresponding
 * priority emitter list for this cell.
 */
void SoundEmitter::add_emitter_to_cell(cell)
{
    int index = m_num_cells_touched
    int priority_level = m_sound_event.m_priority

    ++m_num_cells_touched

    m_cells_touched_array[index].m_cell = cell
    m_cells_touched_array[index].m_prev = null

    // Set the next element to whatever the cell's list head (if any)
    // was pointing to.
    m_cells_touched_array[index].m_next = cell.m_emitter_list[priority_level]

    // Was there something at the head of the emitter list? If so,
    // have its prev link point to this entry.
    if (cell.m_emitter_list[priority_level])
    {
        cell.m_emitter_list[priority_level].m_prev = m_cells_touched_array[index]
    }

    // Finally set the head of the cell's emitter list
    // to be this sound emitter link.
    cell.m_emitter_list[priority_level] = m_cells_touched_array[index]
}
Listing 5.5: Sample removal routine for dynamic sounds.
// This removes the SoundEmitter from all cells it touches.
void SoundEmitter::remove_emitter_from_cells()
{
    int priority_level = m_sound_event.m_priority

    for (index = 0; index < m_num_cells_touched; index++)
    {
        if (m_cells_touched_array[index].m_next)
        {
            m_cells_touched_array[index].m_next->m_prev = m_cells_touched_array[index].m_prev
        }
        if (m_cells_touched_array[index].m_prev)
        {
            m_cells_touched_array[index].m_prev->m_next = m_cells_touched_array[index].m_next
        }

        // Check if this was the head node of the linked list. If so
        // change the head node to point to the next node (if any).
        // Note that this is a comparison (==), not an assignment.
        cell = m_cells_touched_array[index].m_cell
        if (cell.m_emitter_list[priority_level] == m_cells_touched_array[index])
        {
            cell.m_emitter_list[priority_level] = cell.m_emitter_list[priority_level].m_next
        }

        // Clean this SoundEmitterLink up.
        m_cells_touched_array[index].m_cell = null
        m_cells_touched_array[index].m_prev = null
        m_cells_touched_array[index].m_next = null
    }
    m_num_cells_touched = 0
}
5.5 Supporting Multiple Listeners
Some games need to support more than one player using the same television or monitor. This requires that the display is split in some fashion to show both players' views of the world. As well as having both players' views being processed by the same game console, we must also divide up the sound channels to represent what each player is hearing. The sound grid can be modified to support this with a few changes to the way the active list is built:
We first construct an active list for each listener present in the game.
Once this has been done for all listeners, we copy the highest priority sounds from each list into the master active list.
We then make sure any sounds that were in any of the listeners' active lists and didn't make it into the master active list are not playing.
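The three steps above can be sketched as a merge of per-listener lists. This is one possible reading, not the gem's code: it assumes priority 0 is the highest (matching the grid's list indexing), removes duplicates when two listeners hear the same emitter, and uses the hypothetical names ActiveEmitter and BuildMasterList.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct ActiveEmitter { int id; int priority; bool playing; };   // hypothetical

// Merges the per-listener active lists into one master list of at most
// maxSounds entries and stops every emitter that did not make the cut.
inline std::vector<ActiveEmitter*> BuildMasterList(
    const std::vector<std::vector<ActiveEmitter*>>& perListener,
    std::size_t maxSounds)
{
    assert(maxSounds > 0);

    // Collect each emitter once, even if several listeners can hear it.
    std::vector<ActiveEmitter*> master;
    for (const std::vector<ActiveEmitter*>& list : perListener)
        for (ActiveEmitter* e : list)
            if (std::find(master.begin(), master.end(), e) == master.end())
                master.push_back(e);

    // Highest priority (lowest number) first; ties keep their original order.
    std::stable_sort(master.begin(), master.end(),
                     [](const ActiveEmitter* a, const ActiveEmitter* b)
                     { return a->priority < b->priority; });

    // Emitters beyond the channel budget must not be playing.
    for (std::size_t i = maxSounds; i < master.size(); ++i)
        master[i]->playing = false;
    if (master.size() > maxSounds)
        master.resize(maxSounds);

    return master;
}
```

A production version might also balance the channel budget between listeners so that one player's surroundings cannot starve the other's; that policy is left open here.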
5.6 Extensions
The sound grid is one possible solution to culling sounds occupying known world positions. While the implementation discussed in this gem only constructs a two-dimensional grid, this technique should work without change for most types of game worlds. If your game contains a high number of vertical sounds occupying a nearby space in x and y, then the process can be extended to allow for culling sound emitters in the z dimension. Either you may wish to have multiple sound grids, with each one at a different z height, or expand into a 3D sound grid.
Additional future work can be performed on the sound grid, such as embedding additional information about the grid cell or sound emitters, such as which reverb effects to apply to the emitters, or whether we need to apply a filter (due to an obstruction between the sound emitter and listener). Dynamically generated sounds, such as gunfire, may also be able to use information contained in the sound grid to calculate any filtering that needs to be applied, avoiding the need to perform expensive tests in real time. Also, additional optimizations could be made to the sound emitter linked lists. We could, for example, add a head and tail node to each linked list and thus remove the conditional tests surrounding the insertion and removal of a sound emitter to a grid cell.
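The head and tail node idea can be illustrated with a small sentinel-based list. This is a sketch of the general technique only; Link, EmitterList, and Unlink are hypothetical names, and the per-cell, per-priority arrangement of the gem is left out to keep the example short.

```cpp
#include <cassert>

struct Link
{
    Link* prev = nullptr;
    Link* next = nullptr;
};

// A doubly linked list with embedded head and tail sentinels. Every real
// link always has valid neighbors, so no null checks are needed.
struct EmitterList
{
    Link head;
    Link tail;

    EmitterList() { head.next = &tail; tail.prev = &head; }
    EmitterList(const EmitterList&) = delete;              // sentinels hold self-pointers
    EmitterList& operator=(const EmitterList&) = delete;

    bool Empty() const { return head.next == &tail; }

    void PushFront(Link& link)
    {
        assert(link.prev == nullptr && link.next == nullptr);   // must be unlinked
        link.prev = &head;
        link.next = head.next;
        head.next->prev = &link;
        head.next = &link;
    }
};

// Removal needs no access to the list itself and has no branches.
inline void Unlink(Link& link)
{
    assert(link.prev != nullptr && link.next != nullptr);       // must be linked
    link.prev->next = link.next;
    link.next->prev = link.prev;
    link.prev = nullptr;
    link.next = nullptr;
}
```

The cost is two extra nodes per list, which for a grid with several priority lists per cell is a memory trade-off worth measuring before adopting.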
Chapter 6 - A GUI Framework and Presentation Layer. Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Chapter 6: A GUI Framework and Presentation Layer
Adrian Hirst, Weaseltron Entertainment Limited
Overview
Graphical user interface (GUI) design is an often overlooked and under-resourced part of game development, yet it is responsible for the look and feel of a game to the user as well as its all-important first impressions. The user interface needs to be quickly and dramatically adapted continuously throughout a product's development cycle in order to reflect modifications to virtually any other part of the game.
Relatively little literature exists for GUI presentation code, and finding samples of rigorously tested source code is difficult. This gem provides a brief introduction to GUI systems and documents a proven, current, flexible, and working system that can provide the first step for in-game and tools-based systems. Drop-in source code is provided and should prove instantly useful for everyone from the student to the seasoned professional.
6.1 GUI Systems
Like many in-game systems, the difficulty of arriving at the finished product is in being flexible enough to respond to constant changes. The iterative nature of game development means that a game is constantly changing. For every change to scoring systems, game mode, or indeed any gameplay element, a subsequent change is most often required to the presentation layer, ensuring that the user is still clearly presented with the relevant information they need to play the game. Iterative game design impacts many aspects of a game throughout its development, all of which need to be fed back to the user, a task that falls to the presentation layer.
A common and sensible approach is built around creating a set of solid, stable controls, or components, that can be used and combined over and over in various ways. When requests come in to the presentation team for a new screen, or there's a new gameplay mechanic that requires a screen to be rewritten, a familiar set of text boxes, buttons, menus, etc. can be dropped into place.
A GUI system should include an editor for artists to create and position components, textures, and text in the best way. Too often, and particularly on smaller projects, rather than creating an editor for this task, the job of positioning these screen elements falls to a programmer, who has to type the coordinates by hand into a text file. This is unacceptable for anything but the smallest of games and interfaces.
Localization issues always create their own problems in user interface design. Translation of games to multiple languages inevitably leads to text strings of varying length, often occupying more screen space than initially allocated. Even seemingly innocent changes in the wording of key phrases, or even simple modifications in capitalization or grammar, can cause text strings to overrun the area of screen designated for them. The new wording is likely to be longer in another language, with German generally being the most verbose. An essential feature of using an editor to design presentation screens is being able to preview static text in all languages to check for such issues.
Console platform holders typically have their own set of technical requirements that must be fulfilled before the game is accepted. Without going into too much detail, these range from new screens, game modes, network requirements, controller configurations, and menu options to screen resolutions, drawable areas of the screen, and issues of text legibility.
Existing Solutions
Several middleware solutions exist for creating front ends that provide full feature sets, including WYSIWYG editors, runtime components, custom animation, scripting, and Adobe Flash support. These prove compelling where the budget allows. Such tools provide support that empowers artists and designers to create the best looking and most advanced GUI systems with as little programmer involvement as possible.
Support for these features comes at a cost, though, and some implementations can add significant memory and CPU overhead, with some Adobe Flash implementations in particular being typically resource-heavy. Smaller-scale products, however, often do not require such feature-rich implementations and can trade features for performance and flexibility.
6.2 Design Patterns: Model View Controller (MVC)
A common problem, particularly evident with GUI systems, is that over time, last-minute hacks and "temporary" bug fixes lead to strongly coupled code, so that changes to one area of the system cause unintended effects elsewhere and the code becomes difficult to refactor. The model-view-controller (MVC) design pattern aims to ensure only a loose coupling of elements by separating the framework into three distinct constituent parts to be maintained independently:
The model refers to the actual data that we represent on the screen. For example, this could be the game time or the number of lives remaining.
The view concerns itself only with the visual representation and rendering of that object. In our previous example, perhaps we might display an analogue clock to represent the time remaining or an icon for every life remaining.
The controller refers to how the object interacts with the game, user input, and general state, system, or game logic.
MVC has gained popularity, now being a major contributing concept in many of the larger GUI and system frameworks, from Cocoa to Qt, MFC, and the current Windows Presentation Foundation.
Taking our previous timer example, the controller would be our game update loop, which is responsible for determining the current time and passing a TimerUpdate message to the timer model object, which in turn updates its internal representation. The view will typically have its own internal description, but will update that based on information inside the model itself.
Implementing MVC does require a change in thinking, as the three constituent parts are unaware of the others' inner workings and unable to alter each other's data, so communication relies on sending messages. This could just take the form of calling a function on another object. In larger systems, though, it is beneficial to ensure a greater distinction between components by using event queues and traditional message passing systems.
Depending on the technique used, this message passing can add a substantial amount of overhead to the system, reducing its flexibility. Choosing the correct method by which these messages are passed is crucial, and worthy of its own Gems topic.
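To make the timer example concrete, the following sketch shows one way the three parts might communicate through a small message queue. Only the TimerUpdate message name comes from the text; the classes TimerModel, TimerView, and TimerController and their member functions are hypothetical, and a simple std::queue stands in for whatever message system a real framework would use.

```cpp
#include <cassert>
#include <queue>
#include <string>

struct TimerUpdate { float seconds; };   // message sent by the controller

// Model: owns the data and changes it only in response to messages.
class TimerModel
{
public:
    void OnMessage(const TimerUpdate& msg) { m_seconds = msg.seconds; }
    float Seconds() const { return m_seconds; }
private:
    float m_seconds = 0.0f;
};

// View: keeps its own description (here, a string) and refreshes it by
// reading the model. It never writes to the model.
class TimerView
{
public:
    void Refresh(const TimerModel& model)
    {
        const int total = static_cast<int>(model.Seconds());
        const int secs = total % 60;
        m_text = std::to_string(total / 60) + ":" + (secs < 10 ? "0" : "") + std::to_string(secs);
    }
    const std::string& Text() const { return m_text; }
private:
    std::string m_text;
};

// Controller: the game update loop side, which queues messages and
// delivers them to the model at a well-defined point in the frame.
class TimerController
{
public:
    void Post(float seconds)
    {
        assert(seconds >= 0.0f);
        m_queue.push(TimerUpdate{seconds});
    }
    void Dispatch(TimerModel& model)
    {
        while (!m_queue.empty())
        {
            model.OnMessage(m_queue.front());
            m_queue.pop();
        }
    }
private:
    std::queue<TimerUpdate> m_queue;
};
```

In a small system, Post() and Dispatch() could collapse into a direct call to OnMessage(); the queue only earns its overhead once several producers and consumers need to be decoupled.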
6.3 A GUI Design
The goal of this gem is to present a simple and extensible drop-in GUI module whose target audience ranges from the student or smaller indie studio to the experienced professional wanting to quickly add some user interface interaction to their game without devoting too much of their time to reinventing the wheel. Therefore, the system must exhibit the following attributes:
Modular. As a drop-in replacement, it must have the minimum possible number of external API dependencies.
Lightweight and flexible. It should be adaptable to as many platforms as possible, so bindings to specific areas such as the various input types, rendering, or audio code should be kept as minimal as possible. This gives the highest level of portability.
Both programmer and artist-driven. For many smaller products, it is often feasible (if not fun) to hand-code our own front end design. While this exposes us to the dangers of localization, minor font changes, and rewordings, it can sometimes be the quickest way of creating a usable GUI. This system allows programmatically created GUI screens where necessary, and it leaves scope for data-driven designs from an external tool at a later date.
Extensible. The code provided should give the user a solid base from which further, more advanced controls may be produced with the least amount of effort.
Object oriented. Object-oriented programming tends to lend itself particularly well to GUI objects with inheritable behaviors and data sets.
Localization-ready. All text rendering must be able to handle international character sets.
"Boilerplate code"-minimizing. It should encapsulate common programming tasks so the user can concentrate on adding content rather than repetitive functionality.
A Little Code [1]
First, let's look at the basic GuiComponent class shown in Listing 6.1. This is the base class from which all other GUI components are derived, and it contains very little specific information or functionality other than that it is an object, it has a visual representation that exists somewhere in a hierarchy, and it has a little information about whether it is active and whether it is visible.
Listing 6.1: Our GuiComponent base class.
class GuiComponent
{
public:

    GuiComponent();
    virtual ~GuiComponent();

    virtual TypeId GetTypeId() const = 0;
    static TypeId GetStaticTypeId();

    uint32_t GetId() const;

    void SetupGuiComponent(const String& name, bool active = true);
    void SetVisual(Visual *visual, bool makeVisible = true);
    void SetPosition(const Vec3& pos);

    virtual void Update(Time& timeDelta) = 0;
    virtual void Render(Renderer *renderer);

private:

    static TypeId sTypeId;

    String m_Name;
    uint32_t m_Id;
    Frame m_Frame;
    Visual *m_Visual;

    // Is the GuiComponent taking input and being updated etc.?
    bool m_Active;

    // Defaults to false (invisible) until we have a valid GFX::Visual.
    bool m_Visible;
};
The GuiComponent class contains two simple boolean variables that determine whether the object is considered active and whether it is visible. The distinction here is that an object could be visible on screen, but not active in that it does not respond to input or update itself. Likewise, an object could be actively being processed, but not visible on-screen.
A string is stored to keep a human-readable name for the object. This is largely used for debugging, but it is also used to generate a CRC of the name, which is stored in the m_Id variable for fast access, comparison, and lookup. This can also be used by tool editors and loaders for referencing objects.
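The gem does not say which CRC variant is used, so the following sketch assumes the common CRC-32 (reflected polynomial 0xEDB88320) purely for illustration. NameToId is a hypothetical helper, and the requirement that names be non-empty is an assumption made here so that unnamed components cannot silently share an ID.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Computes a 32-bit ID from a component name using bitwise CRC-32.
// A table-driven version would be faster, but this form is easy to verify.
inline std::uint32_t NameToId(const std::string& name)
{
    assert(!name.empty());   // assumption: every component is given a name

    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char c : name)
    {
        crc ^= c;
        for (int bit = 0; bit < 8; ++bit)
        {
            // Shift right, applying the polynomial when the low bit is set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

Comparing two 32-bit IDs is a single integer comparison, which is why lookups by ID are cheaper than lookups by name. Distinct names can in principle collide, so a tool pipeline may want to check for duplicate IDs when exporting a screen.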
The GuiComponent also contains a Frame node that specifies a local translation to apply to this object, as well as its position inside a hierarchy. This allows us to attach GuiComponents together into groups, which can then be moved together as a whole.
The model contains a single pointer to a Visual object containing rendering information. I consider this to be a loose abstraction of the model and view of the object, but it serves our design goal of being quick and flexible well: it abstracts much of the rendering code away while still allowing us an entry point for patching any rendering information, such as state or material changes, where necessary for those last-minute fixes. For medium-sized or larger systems, the coupling between the model and view side of the object would need to be looser. The Render() function can simply make the following call:
renderer->Render(m_Visual);
This can be overridden via virtual function in derived classes where required.
This GuiComponent class allows us to extend trivially to create a basic GuiSprite class, as shown in Listing 6.2.
This simplest of examples just adds an instance of a VisualSprite class and calls the renderer to draw it. Here, the VisualSprite itself contains most of the information required to render the sprite, which is set up inside the GuiSprite::SetupGuiSprite() function. Calling the GuiComponent::SetVisual(&m_Sprite) function inside the SetupGuiSprite() function ensures that the base class render function performs all of the rendering required for objects of this type.
Listing 6.2: A sprite class.
class GuiSprite : public GuiComponent
{
public:

    void SetupGuiSprite(const Vec3& pos, Texture *texture, Color& sprite,
        float width = 0.0F, float height = 0.0F);

    virtual void Update(Time& delta);

private:

    VisualSprite m_Sprite;
};
Likewise, a GuiTextItem class can be derived that stores a pointer to text and rendering parameters, as shown in Listing 6.3. In the full code sample on the accompanying CD, a BMFont class stores glyph-based information and texture data loaded from files exported from the AngelCode BMFont library [2].
Listing 6.3: The GuiTextItem class.
class GuiTextItem : public GuiComponent
{
public:

    void SetupGuiTextItem(wchar_t *text, FONT::BMFont *font);

    float GetTextWidth() const;
    float GetTextHeight() const;

    void SetText(wchar_t *text);
    void SetAlignment(GuiTextHAlign hAlign, GuiTextVAlign vAlign);

    void SetColor(const Color& color);
    virtual void Update(Time& delta);

private:

    float m_xScale;
    float m_yScale;

    VisualText m_VisualText;
};
Note that the GuiTextItem class only accepts wide character strings to display text. These multi-byte characters enable us to display characters from multiple languages, though various platforms may endian-swap the order of the bytes. Two functions get the calculated width and height of the text based on the text and the font. This becomes useful when we want to allow a piece of text to be selected with a pointing device.
Extending our System
Unfortunately, GUI systems are required to perform more than simply displaying sprites and text, so let's look at adding another important class, GuiSelectable. Shown in Listing 6.4, the GuiSelectable class is a parent class to menu items, clickable icons, selectable text, and anything else that might be highlighted and/or selected. The purpose of this class is to provide an interface that responds to two major activities: being highlighted (for example, when a user points their mouse, stylus, or other pointing device over the component) and being selected (for example, when the pointing device is clicked on this item or the "select" button is pressed). The device type being used is irrelevant, as highlighting and selecting are actions common to on-screen navigation, whether by keyboard, joypad, stylus/touch-screen controllers, remote pointing devices, etc.
Listing 6.4: A base class for selectable components.
class GuiSelectable : public GuiComponent
{
public:

    bool IsSelectable() const;
    virtual void SetSelectable(bool selectable);

    bool IsHighlightable() const;
    virtual void SetHighlightable(bool highlightable);

    GuiHotSpot *GetHotSpot();
    void SetHotSpot(GuiHotSpot *hotSpot);

    // Process what to do when the user's focus is
    // in the specified position.
    virtual bool SetFocus(float x, float y);

    // Test to see if the specified position is within the HotSpot.
    virtual bool IsInHotSpot(float x, float y);

    // Virtual function called when the item is selected and returns
    // whether it is possible to select this item.
    virtual bool OnSelect();

    // The GuiSelectable is highlighted
    // (mouse over, menu item selected).
    virtual bool OnHighlight(bool highlighted);

    virtual void Update(Time& delta);

private:

    GuiHotSpot *m_HotSpot;

    bool m_Highlightable;
    bool m_Selectable;
};
A GuiSelectable object contains a reference to a GuiHotSpot item, which holds information about the area of the screen that, when clicked on, makes the GuiSelectable object active. Typically, this is a rectangular region roughly the same size as the icon or text of the item itself. Derived classes can describe their own circular or polygonal areas so long as the IsInside() virtual function is overridden. GuiHotSpot objects retain a pointer to a Frame object so that the object tracks with the GuiComponent object itself.
For pointer-based devices, determining the highlighted status of a GuiSelectable object is achieved by passing the current on-screen cursor coordinates to the SetFocus() function. If, after querying the m_HotSpot variable, it's determined that the highlighted state should change, then the virtual OnHighlight() function is called with the new desired value. Again, we can override this virtual function in derived classes to update their own state, and therefore, their visual representation.
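The hot spot query and the change-only highlight notification can be sketched as follows. This is a simplified stand-in rather than the gem's classes: RectHotSpot and SelectableSketch are hypothetical names, the hot spot ignores the Frame hierarchy, and OnHighlight() returns void here where the gem's version returns bool.

```cpp
#include <cassert>

// Hypothetical axis-aligned hot spot in screen units.
struct RectHotSpot
{
    float x, y;            // top-left corner (assumed convention)
    float width, height;

    bool IsInside(float px, float py) const
    {
        assert(width >= 0.0f && height >= 0.0f);
        return px >= x && px < x + width && py >= y && py < y + height;
    }
};

class SelectableSketch
{
public:
    explicit SelectableSketch(const RectHotSpot& hotSpot) : m_hotSpot(hotSpot) {}
    virtual ~SelectableSketch() = default;

    void SetHighlightable(bool highlightable) { m_highlightable = highlightable; }
    bool IsHighlighted() const { return m_highlighted; }

    // Returns true when the position lies inside the hot spot of a
    // highlightable item. OnHighlight() fires only when the state changes.
    bool SetFocus(float px, float py)
    {
        const bool inside = m_highlightable && m_hotSpot.IsInside(px, py);
        if (inside != m_highlighted)
        {
            m_highlighted = inside;
            OnHighlight(inside);
        }
        return inside;
    }

protected:
    // Derived classes override this to update their visual state.
    virtual void OnHighlight(bool /*highlighted*/) {}

private:
    RectHotSpot m_hotSpot;
    bool m_highlightable = true;
    bool m_highlighted = false;
};
```

Notifying only on a change of state means a derived class can start a highlight animation in OnHighlight() without it restarting every frame while the cursor hovers.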
For button-based controllers (primarily keyboards and joypads/sticks), the focus is likely to be more bespoke, perhaps being transferred by tabbing through the various highlightable components, or in the case of menus, using the directional pad on an input device. We revisit this topic later.
GuiSelectable objects can be told that they are not highlightable. While in this state, they are not able to become active like other GuiComponent objects unless explicitly made highlightable again. This is most likely to be used for components that are either just not available or are not available yet. For example, there may be a currently hidden game mode option.
Likewise, it may not be possible to actually select some items, even though they are highlightable. Perhaps, for example, moving our cursor over the currently unavailable last level on a level select screen highlights or animates that component, but the user is still not able to select it. Objects being selected have the OnSelect() virtual function called on them, which returns whether the selection succeeded. This is then handled by the current state.
But What About the Menus?
Okay, so we've got the ability to place objects around the screen wherever we like, but wouldn't it be nice to put a collection of these items into a menu? Enter the GuiSelectableGroup class, shown in Listing 6.5. A GuiSelectableGroup object can be thought of as a menu, with a collection of pointers to GuiSelectable objects that make up its menu items. This allows us to move common code for highlighting and selecting groups of objects into a single interface. At its simplest, the GuiSelectableGroup class allows us to highlight objects by array index, by GuiComponent name, or by ID. It also allows us to simply highlight the next or previous item in the current group, skipping unhighlightable items and wrapping around the list, if required.
Listing 6.5: A GuiSelectableGroup handles the selection of many GuiSelectable objects.
class GuiSelectableGroup : public GuiSelectable
{
public:

    // Add item and set the attached item's m_Frame's parent to us?
    virtual void AddItem(GuiSelectable *item, bool attachFrame);

    // Remove all items from list and detach any attached Frames.
    virtual void Clear();

    // Highlight the next/previous item skipping any unhighlightable
    // items and wrapping where necessary.
    virtual bool HighlightNext();
    virtual bool HighlightPrevious();

    // Highlight an item in the list by index, component name
    // or Id (name CRC).
    virtual bool HighlightIndex(int index);
    virtual bool HighlightItem(const String& name);
    virtual bool HighlightItem(uint32_t id);

    // Find an item in the GuiSelectableGroup either by name
    // or Id (name CRC).
    virtual int FindItemIndex(uint32_t id);
    virtual int FindItemIndex(const String& name);

    virtual GuiSelectable *FindItem(uint32_t id);
    virtual GuiSelectable *FindItem(const String& name);

    // Get the currently highlighted object (if there is one).
    GuiSelectable *GetHighlighted();

    virtual bool SetFocus(float x, float y);

    // Test to see if the specified position is within
    // the GuiComponent's HotSpot.
    virtual bool IsInHotSpot(float x, float y);

    // Virtual function called when item is selected and activated.
    virtual bool OnSelect();

    // Virtual function called when the GuiSelectable is highlighted
    // (mouse over, menu item selected).
    virtual bool OnHighlight(bool highlighted);

    // Update all the items in this GuiSelectableGroup.
    virtual void Update(Time& delta);

private:

    // A static-sized array of pointers to GuiSelectable items.
    GuiSelectable *m_Selectables[GUI_SELECTABLEGROUP_MAXITEMS];
    int m_NumSelectables;

    int m_Highlighted;
    bool m_Wrapped;
};
Adding an item to a GuiSelectableGroup object is done via the AddItem() function, which provides the option of automatically attaching the new item's Frame to this one. If used as a menu, this attachment means that all items in the menu can be moved together, perhaps animated onto or off of the screen just by animating the x component of the root node matrix. Note also that after adding items to an empty list, if we want one highlighted, we still need to do that manually, or call HighlightNext() to highlight the first available item. This is typical for a console or keyboard-based control system.
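The skip-and-wrap search behind HighlightNext() can be sketched as a free function over a list of flags. This is only an illustration of the logic described in the text: NextHighlightable is a hypothetical name, -1 is assumed to mean "nothing highlighted" (playing the role of GUI_SELECTABLEGROUP_UNHIGHLIGHTED), and the real class may organize this differently.

```cpp
#include <cassert>
#include <vector>

// Returns the index of the next highlightable item after 'current', or -1
// if there is none. Pass current = -1 when nothing is highlighted yet, which
// makes the search start at the first item.
inline int NextHighlightable(const std::vector<bool>& highlightable, int current, bool wrap)
{
    const int count = static_cast<int>(highlightable.size());
    assert(current >= -1 && current < count);

    for (int step = 1; step <= count; ++step)
    {
        int candidate = current + step;
        if (candidate >= count)
        {
            if (!wrap)
                break;              // ran off the end of the list
            candidate -= count;     // wrap around to the start
        }
        if (highlightable[candidate])
            return candidate;
    }
    return -1;
}
```

HighlightPrevious() would follow the same pattern with the step direction reversed.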
This class conforms to the composite design pattern, as the GuiSelectableGroup class itself derives from the GuiSelectable class. This straightforward but powerful distinction means that it inherits the ability to be selected, to be highlighted, and importantly, to be included in a list inside another GuiSelectableGroup object. This is useful for a number of situations. For example, consider an options screen where the main items run vertically, selected by up/down on the joypad, but where some items provide multiple options to choose from, such as selecting low, medium, or high for graphics detail, stereo or mono for sound, etc., which run horizontally.
Inconformingtothecompositedesignpattern,theGuiSelectableGroupclassisresponsibleformanagingcallsthroughtheGuiSelectablevirtualfunctionstoo.Update()andRender()callsarepassedthroughtoallcontainedGuiSelectableobjects,theOnSelect()functiondeterminesthecurrentlyhighlighteditemand(ifpresentandactive)callstherelevantOnSelect()functiononlyonthatitem.FunctionssuchasOnHighlighted()andSetActive()followasimilarpattern,thoughitisworthbrowsingthecodetoseethedetailsofwhenobjectsbecomeactiveornot.
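The forwarding behavior described above can be sketched with two stripped-down classes. Selectable and SelectableGroup here are hypothetical stand-ins written for this illustration, not the gem's GuiSelectable classes; SelectCount() is likewise an invented helper that exists only so the forwarding can be observed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a selectable leaf item.
class Selectable
{
public:
    Selectable() : m_Active(true), m_SelectCount(0) {}
    virtual ~Selectable() {}

    // Leaf behavior: record the selection and report whether it was accepted.
    virtual bool OnSelect() { ++m_SelectCount; return m_Active; }

    void SetActive(bool active) { m_Active = active; }
    bool IsActive() const { return m_Active; }
    int SelectCount() const { return m_SelectCount; }

private:
    bool m_Active;
    int m_SelectCount;
};

// Hypothetical stand-in for the group. Because it derives from Selectable,
// a group can itself be an item inside another group (the composite pattern).
class SelectableGroup : public Selectable
{
public:
    SelectableGroup() : m_Highlighted(-1) {}

    void AddItem(Selectable* item)
    {
        assert(item != 0);
        m_Items.push_back(item);
    }

    bool HighlightIndex(int index)
    {
        if (index < 0 || index >= static_cast<int>(m_Items.size()))
            return false;
        m_Highlighted = index;
        return true;
    }

    // Forward the call only to the highlighted item, if present and active.
    virtual bool OnSelect()
    {
        if (m_Highlighted < 0)
            return false;
        Selectable* item = m_Items[m_Highlighted];
        return item->IsActive() && item->OnSelect();
    }

private:
    std::vector<Selectable*> m_Items;
    int m_Highlighted;
};
```

With a vertical menu holding a horizontal "detail" group, selecting on the outer menu reaches only the option highlighted in the inner group; nothing is forwarded while no item is highlighted.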
An Example in Use

The code excerpt shown in Listing 6.6 uses a type GuiSelectableText. This is a GuiSelectable object containing just a GuiTextItem object. It shows how quickly the class structure can create an on-screen menu. A RenderComponents() function trivially reduces to just the following:

    Menu.Render(renderer);

Listing 6.6: Setting up and updating of an example state.
GuiSelectableText   MenuItems[E_MENUITEM_COUNT];
GuiSelectableGroup  Menu;

void SetupComponents()
{
    Menu.SetPosition(Vec3(512.0F, 480.0F, 0.0F));

    for (int item = 0; item < E_MENUITEM_COUNT; ++item)
    {
        MenuItems[item].SetupGuiComponent(s_ComponentNames[item]);
        MenuItems[item].SetupGuiSelectableText(s_ComponentText[item], Resource_GetFont());
        MenuItems[item].SetPosition(menuItemPos[item]);

        // Passing true attaches the item's Frame to the menu's Frame.
        Menu.AddItem(&MenuItems[item], true);
    }

    MenuItems[E_MENUITEM_UNLOCK_ME].SetHighlightable(false);

#ifndef CONTROLLED_BY_POINTER_ONLY
    Menu.HighlightNext();
#endif
}

// The update loop then reduces to this:
void UpdateState(Time& delta)
{
    Menu.Update(delta);

#ifdef CONTROLLED_BY_POINTER_ONLY

    if (!Menu.SetFocus(pointerX, pointerY))
    {
        // If we're only using the mouse, and we're not over any
        // item, unhighlight the currently selected one.
        Menu.HighlightIndex(GUI_SELECTABLEGROUP_UNHIGHLIGHTED);
    }

#else

    Menu.SetFocus(x, y);

    // Now loop through all the controllers checking their actions.
    for (all active controllers)    // pseudocode
    {
        if (Controller.IsPressed(DPAD_UP))
            Menu.HighlightPrevious();
        else if (Controller.IsPressed(DPAD_DOWN))
            Menu.HighlightNext();

        if (Controller.IsPressed(SELECT) && Menu.OnSelect())
            ProcessSelect();
    }

#endif // CONTROLLED_BY_POINTER_ONLY
}
Adding virtual functions SetupComponents(), UpdateComponents(), RenderComponents(), and UpdateInput() to the base state system helps build a framework that can be extended and keeps a common interface throughout the GUI screens, as well as separating the GUI components code from other state-based code. This also helps to distinguish the model, view, and controller portions of the code. This can be seen better in the source code on the accompanying CD, though a full explanation is outside the scope of this gem.

[1] Note that the code in this gem is a shortened version of the full code available on the accompanying CD. Simple Get() or Set() and other ancillary functions have been removed for the sake of brevity.
6.4 And Finally

The objects presented in this gem form just the beginning of a GUI system. Included on the accompanying CD is an example test application implementing these and other GuiComponent objects by demonstrating a variety of example states. Also included are a workable StateMachine class, a very basic OpenGL renderer built upon SDL, a test font library, an input library, basic math components, and various foundation or utility classes, along with anything else needed to put together an implementation of a basic GUI framework.

To take this work a step further, an element of data-driven design needs implementing. In particular, we need an editor for artists to take control over the look and feel of the presentation.

An up-to-date version of the code is available on the Weaseltron website at http://www.weaseltron.com/WeaselGui. The code is distributed under the LGPL license, meaning it's free to use in commercial products.
References

[1] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[2] AngelCode BMFont. http://www.angelcode.com/products/bmfont/
Chapter 7 - World's Best Palettizer
Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011

Chapter 7: World's Best Palettizer

Jason Hughes, Steel Penny Games, Inc.

7.1 Palettes? Whatever for?
Back in the old days of fixed-function graphics cards and limited VRAM, palettized textures were one of the earliest forms of compression, at the cost of color accuracy. However, as VRAM increased on video cards, and graphics chipsets began moving toward programmable shaders, palettized images gave way to S3TC/DXTC (and better) compression that generally gives the same memory savings with better performance and color depth.

So, why use palettes? Perhaps you've heard of their highbrow cousin, vector quantization? It may seem like a crazy thing to need in this day and age, but quantization codebooks (palettes) are useful in many contexts, not just with images. Some examples:

S3TC/DXTC is not a good choice for all images, particularly cartoonlike graphics with large regions of solid color divided by sharp edges. Finely detailed pixel art ends up as a blocky mess.

Also, some GPUs support individually indexed vertex attributes such as normals, colors, and so on, but are limited to either 8-bit or 16-bit indices. This is effectively a palette of values.

Similarly, with shaders on modern systems, textures could be used to replace certain vertex attributes to reduce the number of vertex formats involved in setting up a flexible engine, and compression of those textures could be effective with a smaller 1D lookup table texture having a limited set of values.

Normal maps might also be good candidates for palettizing, depending on the complexity of the surface. S3TC/DXTC block artifacts can show up in lighting on some surfaces, whereas palettized versions would at worst show banding, but not blocking artifacts.

Finally, there are the occasional obscure but amazing palette tricks that old game dogs know. One such trick put palettized vertex colors to excellent use. The general idea was to vertex-light the world in N different phases of day and night, then compute a 3N-dimensional palette where each "color" in the source image was effectively a stack of N RGB triplets. Every vertex in the world was given one vertex color palette index that was the stacked color in the final palette. So, without changing any vertex data, simply by interpolating the colors inside the palette from one set of RGB to the next, vertices appear to go through a smooth change in time of day.
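The time-of-day trick can be sketched as follows. This is a minimal illustration, assuming a palette stored as one stack of N colors per entry; the names Rgb, StackedColor, and BlendPalette are hypothetical and do not come from the gem.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rgb { float r, g, b; };

// One palette entry: a stack of N colors, one per lighting phase.
typedef std::vector<Rgb> StackedColor;

// Builds the palette to upload for the current time of day by blending
// phase `from` toward phase `to` by t in [0, 1]. Vertex data is untouched:
// every vertex keeps its single palette index.
std::vector<Rgb> BlendPalette(const std::vector<StackedColor>& palette,
                              std::size_t from, std::size_t to, float t)
{
    std::vector<Rgb> result;
    result.reserve(palette.size());
    for (std::size_t i = 0; i < palette.size(); ++i)
    {
        assert(from < palette[i].size() && to < palette[i].size());
        const Rgb& a = palette[i][from];
        const Rgb& b = palette[i][to];
        Rgb c = { a.r + (b.r - a.r) * t,
                  a.g + (b.g - a.g) * t,
                  a.b + (b.b - a.b) * t };
        result.push_back(c);
    }
    return result;
}
```

Only the small palette is rewritten each frame, which is why the approach was attractive on hardware where touching every vertex would have been too costly.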
For reference, Figure 7.1 shows how the World's Best Palettizer (WBP) compares to Photoshop, with error diffusion disabled so we can clearly see the quality of the color choices.

Figure 7.1: (See also Color Plates.) (a) Flowers source image using 100,162 colors. (b) Photoshop, 256 colors. (c) Photoshop, 16 colors. (d) WBP, 256 colors. (e) WBP, 16 colors. (© Dundanim/Dreamstime.com)
7.2 Understanding Quantization

For such a simple concept (reduce the cardinality of a data set from M samples to a fixed number N while minimizing error), there are a surprising number of tough lessons to be learned when designing a quantizer. Some interrelated questions have to be answered that vary with the data and the situation to which they're applied:

How do you compute a representative value for multiple samples?

What is a good metric for measuring error between samples and their representative values?

Are all aspects of the input data equivalent, or is there a need for weighting some channels or computing error differently for them?

Is it better to start with M representative values and merge them?

Is it better to start with one and subdivide until reaching N?

What about stochastic processes that gradually improve the fit until some measure is satisfied?

If you study the research on palettizing algorithms and vector quantization, there are many trade-offs that various approaches make, such as tools performance against runtime quality, code complexity against ease of implementation, perceptible error against numerical error, etc. What is important is that you understand that this is effectively a k-means clustering problem, which is NP-hard even for finding two palette entries for a data set, much less N [1]. So, for any algorithm to be practical, there will be assumptions made, and possibly inadequate results for specific inputs. It is therefore important to try not to solve too general a problem and only worry about large sources of error as long as the results are positive.
7.3 Hard-Earned Lessons

The best experiences (in programming) are those of failure. As Charles Kettering once said, "One fails forward toward success." Here are a few observations after designing several algorithms that did not produce satisfactory results. Others may have better luck or better ideas, but the following were important lessons I learned:

Colors and other attributes can be unified nicely by converting them all to floating-point numbers and passing them around as arrays or std::vector<float>. There is no loss in generality, and it simplifies code to do this.

Color spaces are important when determining the quality of a color sample, and ideally the measurement of two samples can be meaningfully performed by the Euclidean distance formula. R′G′B′ colors (as computer-generated art and digital photography typically are) are in a nonlinear gamma-corrected space, and slight errors in any one component may be very noticeable. (There is a tremendous rabbit hole here, where you may be tempted to spend a lot of time learning about color spaces. Color encoding and representation is a deep field with mountains of research. Dive in if you must [2].) I found that moving R′G′B′ gamma-corrected colors to linear RGB space had the effect of oddly characterizing samples as both closer and farther (at different points along the gamma curve) from each other than they appeared. In short, I found that leaving colors in gamma-corrected space is best. The samples in this article are quantized directly from your standard, run-of-the-mill R′G′B′.

An obvious optimization is to reduce the number of input samples to only unique values. However, it is worth mentioning that if you do this, make sure each unique sample also carries a weight that is proportional to the number of samples that it represents from the source data. Otherwise, the palettizer will not be able to know how relatively important each sample is, and in general will provide poor fits for highly redundant data sets.

I tried starting with M different representative values and merging them down to N. The problem is intractable, though, because M may be huge, and the act of finding "close" values to merge together is O(n²). This is why very little research exists using this approach.

Most of the algorithms that start with more than one entry do so by selecting a random set of N representative values for their palette and nudging them around until good values are found. I saw no clear guidance as to how to select these initial values. Seeing as how the initial selection of values greatly influences the number of iterations required to find an optimal fitting, and that it's quite hard to know if the values are stuck in a local minimum or are actually good representations, I abandoned these approaches. In tools, I personally prefer to get near-optimal results (by some criteria) in a deterministic number of iterations.

Once I had decided upon the strategy of partitioning clusters until N is reached, I still had issues with cycles in the algorithm. Thinking that it would yield tighter clusters, I recomputed the representative value of each partition, then reassigned every sample to the closest value in the palette. This was fine, except that occasionally a cluster would end up with no samples assigned to it. Naturally, I just deleted the value and continued. Don't. This sometimes causes infinite loops when a set of samples cycles between two or three extreme values. In the end, I settled on computing the centroid of a set, then finding the closest sample in that set and forcing it to stay associated with the value. At least this way, there is no possibility of an infinite loop. As a further improvement, at the end of an iteration, any cluster that has only one sample assigned to it changes its value to be exactly that sample's value, to be a perfect fit.

During the reassignment phase, I also experimented with updating the centroid of clusters dynamically. This sounded like a good idea, but it creates harmful feedback loops based on the ordering of samples. For example, if values are spread out along the number line and there are two values A = 0 and B = 1000, assigning a sample 499 to value A will move its centroid to 249.5. Suddenly, the next sample 501 will be closer to A than B, and further skews the cluster positioning. Order dependency is a terrible way to build robust tools and will give highly variable results, so this kind of adaptive behavior is to be avoided at all costs.
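The order dependency in the last lesson is easy to reproduce with the numbers from the text. The sketch below is a one-dimensional toy, with hypothetical helper names, that assigns two samples in sequence and either freezes value A during the pass or moves it as soon as the first sample arrives.

```cpp
#include <cassert>
#include <cmath>

// Returns 0 if the sample is closer to a (ties go to a), 1 if closer to b.
int Closest(float sample, float a, float b)
{
    return (std::fabs(sample - a) <= std::fabs(sample - b)) ? 0 : 1;
}

// Assigns `first` and then `second`, and reports which value `second` lands on.
// With updateDynamically set, A is moved to the mean of its old value and
// `first` before `second` is considered, as in the example from the text.
int AssignSecondSample(float first, float second, bool updateDynamically)
{
    float a = 0.0f;           // value A from the text
    const float b = 1000.0f;  // value B from the text
    assert(Closest(first, a, b) == 0);  // this toy expects `first` to pick A
    if (updateDynamically)
        a = (a + first) * 0.5f;         // 0 and 499 average to 249.5
    return Closest(second, a, b);
}
```

With frozen values, sample 501 goes to B (distance 499 versus 501). With the dynamic update, A has already drifted to 249.5, so the same sample goes to A instead; feeding the samples in the opposite order would give yet another result.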
7.4 Algorithm Overview

Okay, here's the view from orbit:

1. Initialize the cluster set.

2. Find the cluster with the largest error and divide it into two clusters.

3. Reassign all samples to the closest cluster.

4. Repeat from Step 2 until all cluster errors are zero or the target of N palette entries is reached.

The mental model I work with is that a palette is a bunch of points in space, and each original sample gets mapped to one of those points in space. These are little clouds, or clusters, of samples surrounding each representative value in the palette. Here's a simplified way of writing this in C++:
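#include <cfloat>   // FLT_MAX
#include <cmath>    // fabsf
#include <map>
#include <vector>

// CVector3 is the engine's 3-component float vector. It needs operator<
// (any strict weak ordering) to be usable as a std::map key.
typedef CVector3 RepValue;
typedef CVector3 Sample;
typedef std::vector<Sample> Samples;
typedef std::map<RepValue, Samples> Clusters;
typedef std::map<Sample, int> Frequency;
typedef std::vector<float> AxisWeights;

Clusters clusters;
Samples origSamples;
Frequency sampleFreq;
AxisWeights weights;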
Step 1: Initializing the Cluster Set

Reducing the set of samples to a unique set should be done as a preconditioning step for performance reasons. A consequence of doing so is that importance calculations will need to consider a secondary field that describes the population that each sample represents and is carried through centroid calculations as a weighting factor. Reduction is not complicated, but it is tedious and not required for this algorithm to work correctly. For testing purposes, set the frequency of each sample in Frequency to 1.
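A minimal sketch of the reduction step is shown below. It assumes a small stand-in Color type with a lexicographic operator< in place of the gem's CVector3, and a hypothetical function name ReduceToUnique; the Frequency map has the same shape as the typedef above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Stand-in for the sample type; any strict weak ordering works as a map key.
struct Color
{
    float r, g, b;
    bool operator<(const Color& o) const
    {
        if (r != o.r) return r < o.r;
        if (g != o.g) return g < o.g;
        return b < o.b;
    }
};

typedef std::map<Color, int> Frequency;

// Counts how often each distinct color occurs and returns the distinct colors.
// The counts become the weights carried through centroid and error math.
std::vector<Color> ReduceToUnique(const std::vector<Color>& pixels, Frequency& freq)
{
    freq.clear();
    for (std::size_t i = 0; i < pixels.size(); ++i)
        ++freq[pixels[i]];  // operator[] value-initializes a new count to 0

    std::vector<Color> unique;
    unique.reserve(freq.size());
    for (Frequency::const_iterator it = freq.begin(); it != freq.end(); ++it)
        unique.push_back(it->first);

    assert(unique.size() == freq.size());
    return unique;
}
```

Note that exact floating-point comparison is intentional here: two pixels count as the same sample only if they are bit-for-bit the same color, which holds when the floats were converted from the same 8-bit source values.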
Initializing the cluster set is accomplished by computing a representative value, RepValue, which is the centroid of a set of samples, then populating the Clusters map with a single entry having all the samples assigned to it. This is done as follows:

    RepValue const repValue = ComputeCentroid(origSamples, sampleFreq);
    clusters[repValue] = origSamples;

Computing a single representative value from many samples is trivial if and only if the numerical space of the samples is linear. For example, colors in a perceptually uniform space can be computed as a simple centroid. Note that since input samples $x_i$ have been reduced to only unique entries, a weight factor $w_i$ proportional to each sample's original frequency must be applied whenever considering a sample for computing its contribution to centroid $C$:

$$C = \frac{\sum_i w_i \, x_i}{\sum_i w_i}$$
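Listing 7.1: Computing the centroid for a set of samples looks complicated because we allow for a reduced set of samples with associated frequencies. It's really just an average.

RepValue ComputeCentroid(const Samples& x, const Frequency& w)
{
    int totalFrequency = 0;
    RepValue repValue(0, 0, 0);
    for (size_t samp = 0; samp < x.size(); samp++)
    {
        // operator[] is not available on a const std::map, so use find().
        // Every sample is expected to have an entry in w.
        int const sampleFrequency = w.find(x[samp])->second;
        repValue += x[samp] * sampleFrequency;
        totalFrequency += sampleFrequency;
    }

    // x must not be empty, or this divides by zero.
    repValue /= (float) totalFrequency;
    return (repValue);
}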
Step 2: Subdivide the Worst Fitting Cluster

Determining the cluster to split is relatively easy, once you have defined the error metric. Iterate over all clusters and compute the sum of errors between the representative value and all its assigned samples, again weighted by the sample's original frequency in the source data. The cluster with the greatest computed error is selected for partitioning. It should be noted that FindWorstCluster is the naive implementation, which recomputes the error even for clusters that have not changed. This implementation is simple for clarity's sake, not for performance.

Listing 7.2: This computes the error for each cluster's samples relative to the centroid chosen for it. The cluster with the greatest error is returned for subdivision.
RepValue FindWorstCluster(const Clusters& clusters, const Frequency& freq, const AxisWeights& aw)
{
    RepValue worstValue(FLT_MAX, FLT_MAX, FLT_MAX);  // nonsense value
    float worstError = 0.0F;

    for (Clusters::const_iterator cluster = clusters.begin(); cluster != clusters.end(); ++cluster)
    {
        const float err = ComputeError(cluster->first, cluster->second, freq, aw);
        if (err > worstError)
        {
            worstError = err;
            worstValue = cluster->first;  // remember the worst fit,
                                          // so we can split it
        }
    }

    return (worstValue);
}
Measuring Error in a Cluster

Computing the quantization error between a sample and its representative value $v$ can be done in a couple of ways if your data forms a vector field. RGB colors and (x, y, z) points are vectors in space that form a metric space, meaning we can use simple distance formulas. You have a choice of simple linear error or squared error. Linear error allows for total deviation $\varepsilon$ to be limited, without regard to how well distributed the errors are across the axes of the sample. It seems too lenient, but is fast to compute.

Interestingly, 4-component RGBA colors do not effectively act as a vector field during rendering because a linear change in the alpha may cause unpredictable, nonlinear changes to the final color due to blending. Alpha is not linearly independent of RGB, in other words. That means simple Euclidean distances lose their meaning for the alpha component. The best way I have found to deal with this is to give each scalar component a weight that is applied during the squared error analysis.

Even so, this technique is helpful because it affords greater control over the fitting of data. Squared error strongly limits the deviation per axis, so that you can control the importance of each scalar within a sample by weighting the deviation on each individual axis with a factor $s_j$. This can be used to more heavily weight the Y′ luma component rather than chrominance (CrCb), or to improve the z component of normals at the expense of x and y. As you can see, squared error is a little more expensive to compute as well:

$$\varepsilon = \sum_i w_i \sum_j \bigl( s_j \, (v_j - x_{ij}) \bigr)^2$$

(Here, $x_{ij}$ is the component $j$ of the sample $i$, and $w_i$ is the frequency of sample $i$.)

Listing 7.3: This function computes a squared error between the representative value of a cluster and all of its samples, with a weight per component.
float ComputeError(const RepValue& v, const Samples& x, const Frequency& w, const AxisWeights& s)
{
    float err = 0.0F;
    for (size_t samp = 0; samp < x.size(); samp++)
    {
        // compute the error of a single sample
        float squaredErr = 0.0F;
        for (size_t j = 0; j < 3; j++)
        {
            const float linearErr = (v[j] - x[samp][j]) * s[j];
            squaredErr += linearErr * linearErr;
        }

        // operator[] is not available on a const std::map, so use find()
        const int sampleFrequency = w.find(x[samp])->second;
        err += squaredErr * sampleFrequency;
    }

    return (err);
}
Splitting a Cluster in Twain

At this point, we've identified the worst-fitting cluster. How do we go about trying to split it? Ideally, the end result will be two clusters with roughly half the samples from the original cluster in each new cluster. Also, we'd like to know that we're splitting the cluster along its longest axis, so that when we divide it in half, it'll be bisecting the longest line we can draw through the data points. This should separate the samples naturally into two groups that are farthest apart from each other.

First, we need a way to determine the principal axis of a set of vectors. Principal component analysis is the class of mathematical tools we're interested in. If we were searching for a fully orthogonal set of axes, there are methods we could implement like eigenvalue decomposition or singular value decomposition. However, this is more work than necessary, since we're never looking for more than the primary axis of a cluster. The primary axis of a set of points is the eigenvector of its covariance matrix that has the largest eigenvalue, which can be easily produced using the power method. This is basically done by continually multiplying a vector against a matrix, normalizing it, and repeating the process until it stops changing direction. The result is the dominant axis of the data in the matrix.

Listing 7.4: This computes a covariance matrix for the samples in this cluster.
// Power Method for eigenvectors is taken from
void SplitCluster(const RepValue& v, const Samples& x, const Frequency& w, const AxisWeights& s, Clusters& clusters)
{
    // compute cluster center based on the samples and their frequency
    RepValue center = ComputeCentroid(x, w);

    // count how many pixels this cluster represents
    unsigned int numSamplesRepresented = 0;
    for (size_t i = 0; i < x.size(); i++)
        numSamplesRepresented += w.find(x[i])->second;  // w is const, so no operator[]

    // compute covariance matrix of samples
    float covMatrix[3][3] = {{0, 0, 0}, {0, 0, 0}, {0, 0, 0}};
    for (size_t outer = 0; outer < 3; outer++)        // rows
    {
        for (size_t inner = 0; inner < 3; inner++)    // cols
        {
            // only compute the upper part and diagonals, simply reflect
            // to the bottom part since the covariance matrix is symmetric
            if (inner >= outer)
            {
                // compute the covariance of each channel relative to each
                // other, across all samples.
                float deltaSum = 0.0F;
                for (size_t loop = 0; loop < x.size(); loop++)
                    // weight by number of pixels this represents
                    deltaSum += (x[loop][inner] - center[inner]) *
                                (x[loop][outer] - center[outer]) *
                                w.find(x[loop])->second;

                const float covariance = deltaSum / numSamplesRepresented;
                covMatrix[inner][outer] = covariance;
                covMatrix[outer][inner] = covariance;  // symmetry
            }
        }
    }
The above code computes a covariance matrix, which is required to perform the power method for producing the first eigenvector of the data set. There are situations where the power method will not converge, most specifically when the initial eigenvector guess is orthogonal to the real dominant axis. In practice, this doesn't seem to be a very common problem, and if it bothers you, change it to a better method. The appeal in using the power method is that it's simple to implement and understand, and it works pretty well [3].

Listing 7.5: This uses the power method with up to 10 iterations to determine the dominant axis of the cluster.
    // Power Method for producing the first eigenVector
    float eigenVector[3] = {1.0F, 1.0F, 1.0F};
    float tempVector[3];
    float lastScalar = 0.0F;

    // generally converges in ~3 iterations
    for (size_t iteration = 0; iteration < 10; iteration++)
    {
        // vector-matrix multiply the eigenVector into tempVector
        for (size_t multiI = 0; multiI < 3; multiI++)
        {
            // store this into the temp vector until the multiply is done
            tempVector[multiI] = 0;
            for (size_t multiJ = 0; multiJ < 3; multiJ++)
                tempVector[multiI] += eigenVector[multiJ] * covMatrix[multiI][multiJ];
        }

        // normalize the eigenvector (which is not the same as vector
        // normalization - a normal eigenvector has a max component of 1)
        float maxComponent = 0.0F;

        // find the maximum component in the new eigenVector
        for (size_t i = 0; i < 3; i++)
            if (fabsf(tempVector[i]) > maxComponent)
                maxComponent = fabsf(tempVector[i]);

        // a zero vector cannot be normalized (degenerate covariance),
        // so keep the current axis rather than dividing by zero
        if (maxComponent == 0.0F)
            break;

        // perform the normalization and store into eigenVector
        for (size_t i = 0; i < 3; i++)
            eigenVector[i] = tempVector[i] / maxComponent;

        // figure out if we've converged or not
        const float absoluteRelativeError =
            fabsf((lastScalar - 1.0F / maxComponent) * maxComponent);
        if (absoluteRelativeError < 0.001F)
            break;  // if our direction has not changed, stop iterating

        // move on to another iteration, and remember the reciprocal of
        // the eigenvalue estimate from this iteration
        lastScalar = 1.0F / maxComponent;
    }
Once the dominant axis has been determined, project all samples onto this axis and split the set into samples on one side or the other of the plane perpendicular to the dominant axis. In other words, assign all samples based on which side of the midpoint along the primary axis each sample falls. I suppose we could try using a more sophisticated binning method here, but that would assume we have a realistic representative value in each cluster worth measuring error against. We don't, and the selection of such a sample is NP-hard: the problem of selecting a representative value for a cloud of samples is essentially palettization! At this point, all that matters is that the clusters are roughly equal in size and split along the dominant axis. These new entries are placeholders; samples will be globally repositioned, and new representative values will be computed shortly.

Listing 7.6: This code splits the cluster at the mid-section along the dominant axis, binning half the samples into each new cluster.
    // Taking any sample, subtract the centroid to get a relative vector
    // from the center of the data. Then, take the dot with respect to
    // the eigenvector. This results in a value that indicates how far
    // along the dominant axis the point is.

    Samples positiveSamples, negativeSamples;

    for (size_t i = 0; i < x.size(); i++)
    {
        // Since the dominant axis fits the largest range of samples,
        // and the centroid is generally in the center of the samples,
        // splitting there makes sense as a first step.
        float dotProduct = 0.0F;
        for (size_t j = 0; j < 3; j++)
            dotProduct += (x[i][j] - center[j]) * eigenVector[j];

        // Let's bin them out to positive and negative lobes.
        if (dotProduct > 0.0F)
            positiveSamples.push_back(x[i]);
        else
            negativeSamples.push_back(x[i]);
    }

    // compute the centers of the two new lobes
    RepValue palPositive = ComputeCentroid(positiveSamples, w);
    RepValue palNegative = ComputeCentroid(negativeSamples, w);

    // stick both new palette entries into the clusters map
    clusters[palPositive] = positiveSamples;
    clusters[palNegative] = negativeSamples;

    // make sure we get rid of the cluster that we've just split, too
    // (std::map has no remove(); erase() deletes by key)
    clusters.erase(v);
}
So, there you have it! The above code will partition a set of data into two pieces very well. Still, refinement is required to get a good global fitting to the data. Also, be aware that the data structures shown are for educational purposes only; you can do far better than O(log N) lookups everywhere.
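To experiment with the power method outside of SplitCluster, the iteration can be isolated into a small function and tried on a matrix whose dominant axis is known. The sketch below is a standalone variant rather than the gem's code: the name DominantAxis is hypothetical, and it tests convergence on the change in the eigenvalue estimate, which is equivalent in spirit to the check in Listing 7.5.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Estimates the dominant eigenvector of a symmetric 3x3 matrix by power
// iteration. The result is scaled so its largest-magnitude component is 1.
// Returns false if the iterate collapses to the zero vector (no spread).
bool DominantAxis(const float m[3][3], float axis[3], std::size_t maxIterations)
{
    assert(maxIterations > 0);
    axis[0] = axis[1] = axis[2] = 1.0F;   // initial guess
    float lastEstimate = 0.0F;            // previous eigenvalue estimate

    for (std::size_t iteration = 0; iteration < maxIterations; ++iteration)
    {
        float temp[3];
        float maxComponent = 0.0F;
        for (std::size_t i = 0; i < 3; ++i)
        {
            temp[i] = 0.0F;
            for (std::size_t j = 0; j < 3; ++j)
                temp[i] += m[i][j] * axis[j];
            maxComponent = std::max(maxComponent, std::fabs(temp[i]));
        }

        if (maxComponent == 0.0F)
            return false;                 // cannot normalize a zero vector

        for (std::size_t i = 0; i < 3; ++i)
            axis[i] = temp[i] / maxComponent;

        // Stop once the eigenvalue estimate changes by less than 0.1%.
        if (std::fabs(maxComponent - lastEstimate) < 0.001F * maxComponent)
            break;
        lastEstimate = maxComponent;
    }
    return true;
}
```

For a covariance matrix whose variance is concentrated along x, the returned axis points essentially along x. Because the stopping test watches the eigenvalue estimate rather than the direction, the off-axis components may still be small but nonzero when the loop exits, which is harmless for choosing a splitting plane.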
Step 3: Reassigning All Samples

Now that we have two smaller clusters where one large, poorly fitted cluster used to be, there is no guarantee that every sample in these two clusters is closest to its own centroid. Mathematically speaking, each will be closest to one of the two centroids. Further, samples that previously belonged to other clusters may be attracted to these shiny new centroids because they're a closer fit than their current one. Global sample allocation refinement reduces the measurable sample error, yielding a better palette. Also, future iterations will measure the quality of clusters based on how "well-liked" they are by their samples. A cluster partitioner that never reallocated samples would end up splitting the wrong cluster eventually, and overall would not fit the data well.

So, first, we clear out all the samples from their clusters. Then, to avoid infinite loops, the first thing we do is iterate over all the empty clusters and find the closest unallocated sample (using the error metric function) and assign it to the cluster. This associates something with every palette entry, so we don't end up with unused spaces in the palette.

Second, we iterate over all the samples and bin them out one by one to the closest cluster, using the error metric devised above. At the end, we recompute the centroid of each cluster so that the representative value is ideal for the samples it represents. This process of reassignment could be repeated if desired, as it should be quite stable if the algorithm is implemented correctly. If samples migrate to a couple of clusters, it probably indicates a slightly imbalanced error metric.
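The gem gives no listing for this step, so the following is only a sketch of the procedure as described, simplified to one-dimensional samples with unit frequency. The Cluster struct and the ReassignSamples name are hypothetical; a full version would use the weighted ComputeError and ComputeCentroid routines instead of the scalar math here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Cluster
{
    float value;                 // representative value (palette entry)
    std::vector<float> samples;  // samples currently assigned to it
};

static float SquaredError(float a, float b) { return (a - b) * (a - b); }

void ReassignSamples(std::vector<Cluster>& clusters, const std::vector<float>& samples)
{
    assert(!clusters.empty() && samples.size() >= clusters.size());
    std::vector<bool> taken(samples.size(), false);

    // Pass 1: empty every cluster, then seed each with its closest
    // unallocated sample so that no palette entry can end up unused.
    for (std::size_t c = 0; c < clusters.size(); ++c)
    {
        clusters[c].samples.clear();
        std::size_t best = samples.size();  // "none found yet"
        for (std::size_t s = 0; s < samples.size(); ++s)
        {
            if (taken[s])
                continue;
            if (best == samples.size() ||
                SquaredError(samples[s], clusters[c].value) <
                SquaredError(samples[best], clusters[c].value))
                best = s;
        }
        taken[best] = true;
        clusters[c].samples.push_back(samples[best]);
    }

    // Pass 2: bin the remaining samples into the closest cluster. The
    // representative values stay frozen here to avoid order dependency.
    for (std::size_t s = 0; s < samples.size(); ++s)
    {
        if (taken[s])
            continue;
        std::size_t best = 0;
        for (std::size_t c = 1; c < clusters.size(); ++c)
            if (SquaredError(samples[s], clusters[c].value) <
                SquaredError(samples[s], clusters[best].value))
                best = c;
        clusters[best].samples.push_back(samples[s]);
    }

    // Pass 3: only now move each representative value to its centroid.
    for (std::size_t c = 0; c < clusters.size(); ++c)
    {
        float sum = 0.0F;
        for (std::size_t i = 0; i < clusters[c].samples.size(); ++i)
            sum += clusters[c].samples[i];
        clusters[c].value = sum / static_cast<float>(clusters[c].samples.size());
    }
}
```

Keeping the three passes separate is the point of the sketch: seeding guarantees every cluster is non-empty, and deferring the centroid update until all samples are binned keeps the result independent of sample order.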
Step 4: Termination

Whenever the cluster count reaches the desired number of palette entries, stop iterating. A final reassignment is effectively the final palettization of the data. In case there are fewer unique samples than palette entries, the error metric should return zero because there is a palette entry for every possible input. Another simple check is that the cluster count did not change over the last iteration. In any of these cases, the algorithm has completed, and the palette and palettized data can be written out.
7.5 Future Work

Error diffusion is an interesting technique that drastically helps heavily compressed images. While definitely not appropriate for non-image data sets, when using error diffusion, one might tweak the algorithm to intentionally reduce to 2N colors in the palette, then start attempting to find pairs of colors in the palette that interpolate at 25%, 50%, and 75% to other colors in the palette. Those colors that can easily be interpolated from existing palette entries could be removed from the palette and induce relatively little error. Some care would need to be taken to ensure that samples are kept with preference for those that appear most frequently in the source image.

It is plausible that by converting the colors to a more meaningful color space, one that interprets the color channels differently by separating important data into one channel and less important data into other channels, the quantizer could be guided to prefer accuracy on one axis more than others. One such specific color space is Y′CbCr. For instance, the chrominance values Cb and Cr can be given lower weighting than the luma value Y′, which is more perceptually relevant. This should help the overall contrast and brightness control of the final image, at some expense to color range. In theory, it sounds like a good idea and is a simple linear transform applied to the input data.
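As a sketch of the conversion this idea would need, the function below maps gamma-corrected R′G′B′ in [0, 1] to Y′CbCr. It assumes the ITU-R BT.601 luma coefficients, which the gem does not specify, and the YCbCr struct and ToYCbCr name are hypothetical. The converted samples could then be quantized with AxisWeights such as a larger weight for Y′ than for Cb and Cr.

```cpp
#include <cassert>
#include <cmath>

struct YCbCr { float y, cb, cr; };

// Converts gamma-corrected R'G'B' in [0, 1] to Y'CbCr using BT.601 luma
// coefficients (an assumption; other standards use different constants).
// Y' lands in [0, 1]; Cb and Cr land in [-0.5, 0.5].
YCbCr ToYCbCr(float r, float g, float b)
{
    assert(r >= 0.0F && r <= 1.0F);
    assert(g >= 0.0F && g <= 1.0F);
    assert(b >= 0.0F && b <= 1.0F);

    const float kr = 0.299F;
    const float kb = 0.114F;
    const float kg = 1.0F - kr - kb;   // 0.587

    YCbCr out;
    out.y  = kr * r + kg * g + kb * b;          // luma
    out.cb = 0.5F * (b - out.y) / (1.0F - kb);  // blue-difference chroma
    out.cr = 0.5F * (r - out.y) / (1.0F - kr);  // red-difference chroma
    return out;
}
```

Grays map to zero chroma, and a pure red maps to the maximum Cr of 0.5, so the three output axes separate brightness from color as the paragraph above suggests.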
7.6 Results

Figures 7.2 and 7.3 illustrate the differences between palettization using Photoshop and palettization using the technique presented in this gem.

Figure 7.2: (See also Color Plates.) (a) Grasshopper source image using 136,945 colors. (b) Photoshop, 256 colors. (c) Photoshop, 16 colors. (d) WBP, 256 colors. (e) WBP, 16 colors. (© Picstudio/Dreamstime.com)

Figure 7.3: (See also Color Plates.) (a) Child source image using 112,024 colors. (b) Photoshop, 256 colors. (c) Photoshop, 16 colors. (d) WBP, 256 colors. (e) WBP, 16 colors. (© Pavla Zakova/Dreamstime.com)
References

[1] Daniel Aloise, Amit Deshpande, Pierre Hansen, and Preyas Popat. "NP-hardness of Euclidean sum-of-squares clustering". Machine Learning, Volume 75, Number 2 (May 2009), pp. 245–248.

[2] Charles Poynton. "Color FAQ". http://www.poynton.com/ColorFAQ.html

[3] E. Garcia. "Power Method (Vector Iteration)". http://www.miislita.com/information-retrieval-tutorial/matrix-tutorial-3-eigenvalues-eigenvectors.html#power-method
Chapter 8 - 3D Stereoscopic Rendering: An Overview of Implementation Issues
Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011

Chapter 8: 3D Stereoscopic Rendering: An Overview of Implementation Issues

Anders Hast, UPPMAX, Uppsala University

Overview

In recent years, there has been an increasing interest in the field of 3D display technologies from the entertainment industry. Today, the movie industry is moving forward on a wide front as thousands of 3D stereoscopic movie theaters are being installed worldwide and movie production companies also produce their films in 3D. In fact, a new job title has emerged, the stereoscopist, who is in charge of making sure that the scenes can be viewed without problems by the audience. Some of the things the stereoscopist must deal with are explained in this gem. At the same time, many products for the home audience have been developed at an affordable price range and sold to a growing number of customers, including stereo-capable TV sets and computer screens. This gives the games industry a user base that will be familiar with watching 3D stereoscopic content and who, in the future, might also be expecting their favorite games to be released in stereoscopic 3D. This gem deals with by far the most common type of stereoscopic display, the plano-stereoscopic display. In contrast to other types of stereoscopic displays, these are displays that work with two planar surfaces in order to achieve the impression of depth. We base the discussion around the importance of designing the content to fit a stereoscopic display and the different kinds of viewing conditions that must be considered. Moreover, we discuss the mathematics that help us compute these viewing conditions in order to be able to view the content without any problems. And finally, we provide an overview of different types of display techniques.

We start by briefly familiarizing the reader with the field of stereoscopy and different depth cues, covering some implementation details. We then iterate over some issues that can arise when integrating stereoscopic display support into a game engine. Even though most terms are explained herein, it might be valuable for the reader to use the online glossary provided by [6].
8.1 Mechanisms of Plano-Stereoscopic Viewing

The concept behind plano-stereoscopic displays can be seen quite simply as the creation of two planar views of the game, one for the left eye and one for the right. Then, it is important to make sure that each eye sees only the view intended for that eye. The processes involved can be seen as coding and decoding processes. In scientific visualization, one says that stereoscopic images are aimed to help the viewer form a 3D mental image of the data set. In video games, on the other hand, the aim is to give the player a richer visual experience.

In order to create the sense of depth in a normal rendered game (i.e., a monoscopic rendering), monocular depth cues are used in contrast to the binocular depth cues discussed later. Some examples of monocular depth cues are the following:

Occlusion occurs when objects closer to the viewer occlude objects that are further away, and this is handled by the depth buffer algorithm.

Parallax is an effect caused by the motion of the observer, and it creates the illusion that objects close to the observer are moving by faster than objects further away. For instance, to someone looking out the window of a moving train, trees in the foreground appear to move by much faster than a distant hillside.

The size of an object varies depending on the distance from the viewer due to the perspective projection, where parallel lines converge at the horizon.

Texture detail levels provide information about the distance to an object.

Atmospheric effects such as scattering or haze make objects appear more gray in relation to the distance between them and the viewer.

Shading and shadows tell us about curvature and inter-object relationships.

The proximity to the horizon also provides a depth cue since we know that the horizon is far away, and objects close to the line of the horizon are thus perceived as being far away.

What a stereoscopic display adds to these cues are the three binocular depth cues known as accommodation, convergence, and retinal disparity. Convergence occurs when we focus our view at an object in real life by rotating our eyes so that their lines of sight intersect at the point of interest. At the same time, we apply pressure to the lenses in the eyes in order to focus, and this is called accommodation. Under normal natural viewing conditions, both accommodation and convergence correspond and are habitual, but can be voluntarily put out of function by crossing the eyes. The third cue is retinal disparity, and this pertains to the fact that we have two retinal images that fall on different points of the two retinas. These are then merged by the brain and perceived as a single image.
Thevirtualspacethatwedefineisdividedbythescreeninto
tworegionscalledviewspaceandscreenspace.Thevolumebetweentheviewerandthedisplayistheviewspaceandthevolumebehindthedisplayiscalledscreenspace.Ifwelookatapointlyingatthesamedepthasthedisplaysurfaceontowhichitprojects,asshownintheleftimageofFigure8.1,thenthehomologouspointsonthedisplayhavezeroparallaxbecausetheyhavenolateraldisplacement.Thehomologouspointsareidenticalfeaturesinthestereopair;thus,thesamepointinspaceislocatedondifferentplacesintheleftandrightstereoimagefornonzeroparallax.Pointsthatarelyingineitherviewspaceorinscreenspacehavealateraldisplacementandarethensaidtohaveeithernegativeorpositiveparallax.AllthreecasesareshowninFigure8.1.Convergingatapointbehindthedisplaysurfacecausesthehomologouspointsonthedisplaytohavepositiveparallax.Thispointissaidtobeinscreenspace.Similarly,convergingatapointinfrontofthedisplaysurfacecausesthehomologouspointsonthedisplaytohavenegativeparallax,andthispointissaidtobeinviewspace.
Figure 8.1: Three different types of parallax that can occur, from left to right: zero parallax, negative parallax, and positive parallax.
Converging the eyes' axes upon a virtual point at distance Dc supports the fusion of the parallax image and stereopsis as shown in Figure 8.2, where stereopsis is the mental and psychological process in visual perception leading to the sensation of depth from two slightly different projections of the world onto each eye [1]. Keeping the visual structure on-screen in focus requires accommodation at screen distance Ds. Usually, accommodation is dominant, and for unaided viewing, we see one planar double image in focus. With some training (crossing the eyes) we can converge at Dc and see a fused 3D image out of focus. Stereographic devices such as stereo glasses greatly support image fusion and stereopsis at the cost of suppressed accommodation.
Figure 8.2: Convergence and accommodation in plano-stereoscopic displays.
Scale Considerations for 3D Stereo Images
To be able to create comfortable stereo images, we need to calculate the correct perspective for the given setup and make sure that we stay within those limits [8]. Some new situations arise from using a stereoscopic display. Mainly, scale considerations are important when modeling the virtual view volume and creating the actual stereo pair.
When converging on objects at a certain distance, a point at a different distance in the scene appears at the retina with some lateral disparity, independent of convergence, as shown in Figure 8.3. The inter-pupillary distance (IPD) is the distance between the eyes measured from the center of the pupil in each eye. The retinal disparity is, according to Kalawsky [12], computed as the difference of the two angles shown in the figure:
$$\mathrm{disp} = \theta_1 - \theta_2.$$
Figure 8.3: The two angles used to compute the retinal disparity.
A retinal disparity of more than 10° causes diplopia (double vision) and should, of course, be avoided at all times.
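The disparity test can be sketched in a few lines of C++. This is only an illustration under a simplifying assumption that is not taken from Figure 8.3: both the fixated point and the second point are placed straight ahead on the median line between the eyes, so that each angle is the vergence angle 2·atan(IPD / (2·distance)). The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Vergence angle in degrees for a point straight ahead of the viewer.
// Assumption: the point lies on the median line between the two eyes.
double vergenceAngleDeg(double ipdCm, double distanceCm) {
    assert(ipdCm > 0.0 && distanceCm > 0.0);
    const double kRadToDeg = 180.0 / std::acos(-1.0);
    return 2.0 * std::atan(0.5 * ipdCm / distanceCm) * kRadToDeg;
}

// disp = theta1 - theta2, with theta1 taken at the fixated point and
// theta2 at a second point at a different distance.
double retinalDisparityDeg(double ipdCm, double fixationCm, double otherCm) {
    return vergenceAngleDeg(ipdCm, fixationCm) - vergenceAngleDeg(ipdCm, otherCm);
}

// Applies the 10-degree diplopia limit quoted in the text.
bool causesDiplopia(double disparityDeg) {
    return std::fabs(disparityDeg) > 10.0;
}
```

With an IPD of 6.5 cm and fixation at 100 cm, a second point at 50 cm stays well inside the limit, while a point at 10 cm exceeds it.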
The projection in stereoscopic displays does not scale linearly. The projection of the two points that we fuse in our brain depends strongly on how far away we sit from the screen and how big the screen is. Because this point is projected onto our eyes, the distance between our eyes is also a major factor. An average person has an eye separation of around 6.5 cm. In less common cases, we can find people with up to 7.5 cm between their eyes, and the largest audience or user group, youngsters, can have as little as 4.5 cm between their eyes. It should also be remembered that there is great variation among people, particularly on different continents [4]. It is therefore good advice to make the IPD an adjustable variable set by the player in order to give a comfortable stereo depth.
The physical limit on our depth perception is around 200 yards. Beyond this point, we don't get any convergence information at all since our lines of sight are more or less parallel. This important fact must be considered when we create our virtual space, which we must design before we start to place objects in it.
Another practical issue in stereo graphics is that parallax values should not exceed 1.5° of visual angle, since larger values feel uncomfortable [13], as shown in Figure 8.4. The on-screen parallax (osp), measured in centimeters for different viewing distances D, is shown in Table 8.1.
Figure 8.4: Parallax values greater than 1.5° visual angle should not be exceeded.
Table 8.1: Practical examples for on-screen parallax values.
D (cm)   On-screen parallax (cm)
50       1.31
75       1.96
100      2.62
200      5.24
300      7.86
400      10.47
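The text does not state how the entries of Table 8.1 were computed, but they are consistent with osp = D·tan(1.5°). The following C++ sketch uses that relation as an assumption; the function name and the default angle parameter are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// On-screen parallax (cm) subtended by a visual angle at viewing distance D.
// Assumption: osp = D * tan(angle), which matches the rows of Table 8.1.
double onScreenParallaxCm(double viewingDistanceCm, double maxAngleDeg = 1.5) {
    assert(viewingDistanceCm > 0.0);
    assert(maxAngleDeg > 0.0 && maxAngleDeg < 90.0);
    const double kDegToRad = std::acos(-1.0) / 180.0;
    return viewingDistanceCm * std::tan(maxAngleDeg * kDegToRad);
}
```

Calling `onScreenParallaxCm(100.0)` gives roughly 2.62 cm, the value listed for D = 100 cm.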
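Let us now look at some practical examples of how the virtual space depth is limited by the distance D to the observer and the osp for a negative parallax situation, as shown in Figure 8.5, where d is the virtual spatial depth. We have
$$\frac{\mathrm{osp}}{\mathrm{IPD}} = \frac{d}{D - d},$$
and solving for d gives us
$$d = \frac{\mathrm{osp} \cdot D}{\mathrm{IPD} + \mathrm{osp}}.$$
Figure 8.5: Negative parallax.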
Similarly, we can compute how the virtual space depth d is limited by the distance D to the observer and the osp for a situation with positive parallax, as shown in Figure 8.6. Tables 8.2 and 8.3 show some example values for the cases of negative and positive parallax.
Figure 8.6: Positive parallax.
Table 8.2: Example for IPD = 6.5 cm, for negative parallax.
D (cm)   osp (cm)   d (cm)     d/D
50       1.31       8.38       0.17
75       1.96       17.40      0.23
100      2.62       28.72      0.29
200      5.24       89.24      0.45
300      7.86       164.17     0.55
400      10.47      246.83     0.62
Table 8.3: Example for IPD = 6.5 cm, for positive parallax.
D (cm)   osp (cm)   d (cm)     d/D
50       1.31       12.61      0.25
75       1.96       32.47      0.43
100      2.62       67.47      0.67
200      5.24       829.45     4.15
300      5.76       1714.7     7.79
400      6.42       18612.4    75.97
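The first rows of Tables 8.2 and 8.3 are consistent with the similar-triangle relations d = osp·D / (IPD + osp) for negative parallax and d = osp·D / (IPD − osp) for positive parallax. The C++ sketch below assumes those relations; the function names are hypothetical, and the default IPD of 6.5 cm is the value used in the tables.

```cpp
#include <cassert>

// Depth limit in front of the screen (negative parallax).
// Assumed relation: d = osp * D / (IPD + osp).
double negativeParallaxDepthCm(double distanceCm, double ospCm, double ipdCm = 6.5) {
    assert(distanceCm > 0.0 && ospCm >= 0.0 && ipdCm > 0.0);
    return ospCm * distanceCm / (ipdCm + ospCm);
}

// Depth limit behind the screen (positive parallax).
// Assumed relation: d = osp * D / (IPD - osp). The parallax must stay
// below the IPD, otherwise the lines of sight would diverge.
double positiveParallaxDepthCm(double distanceCm, double ospCm, double ipdCm = 6.5) {
    assert(distanceCm > 0.0 && ospCm >= 0.0 && ipdCm > 0.0);
    assert(ospCm < ipdCm);
    return ospCm * distanceCm / (ipdCm - ospCm);
}
```

The precondition in the positive case shows why the usable depth grows so quickly with viewing distance: as the osp approaches the IPD, the denominator approaches zero.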
Before you go ahead and work out the math for your game, there are some things worth mentioning. First, one should notice that values for maximum allowable disparity in the literature are expressed in different terms such as angular disparity, on-screen parallax, etc. These values sometimes apply only as a rule of thumb. Lipton's values appear to work for many VR applications, but they do not generally apply for all viewing conditions. According to Lipton, compositing stereoscopic images is an art, not a science. In the end, what matters is that it looks and feels good!
The Basic Setup
Several proposals have been made on how to set up and render graphics for stereoscopic displays as efficiently as possible. One way is to let the graphics driver handle it. Some of the major graphics vendors have native support for stereoscopic displays in their drivers, which create the stereo pair. This does require that the game be compatible with stereoscopic display, in the sense that its content conforms to these kinds of systems. It also means that your game works in stereo only on graphics cards from these specific vendors. The second alternative, less common in the game industry, is to make a graphics command interceptor [3]. This is an application that pretends to be the graphics driver and intercepts all the graphics commands sent from an application. These can be stored and then, in theory, be used to create the stereo pair. It is still required that the game create content that is suitable for stereoscopic displays.
Usually, we would use the following simple approach to create the game in stereo.
while (user wants to play) do
    GameLogic()
    setupProjectionForLeftEye()
    render()
    setupProjectionForRightEye()
    render()
In our game, we must set the virtual cameras to focus on the point of interest, as shown in Figure 8.7, using the previously described math in order to ensure that the parallax does not exceed the values for the allowable virtual space depth.
Figure 8.7: Camera setup for the 3D stereoscopic game.
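One common way to implement the two projection setups is an off-axis (asymmetric) frustum per eye, where both cameras look along parallel axes and share a zero-parallax window at the convergence distance. The text does not say which camera model Figure 8.7 uses, so the C++ sketch below is an illustration of that swapped-in technique; the struct, function, and parameter names are hypothetical.

```cpp
#include <cassert>

// Frustum bounds on the near plane, in the argument order used by
// glFrustum-style projection setup.
struct FrustumBounds { double left, right, bottom, top, zNear, zFar; };

// eyeOffsetX is -IPD/2 for the left eye and +IPD/2 for the right eye.
// halfWidth and halfHeight describe the zero-parallax window located
// convergenceDist in front of the eyes.
FrustumBounds eyeFrustum(double eyeOffsetX, double halfWidth, double halfHeight,
                         double convergenceDist, double zNear, double zFar) {
    assert(convergenceDist > 0.0 && zNear > 0.0 && zFar > zNear);
    const double scale = zNear / convergenceDist;  // window -> near plane
    return { (-halfWidth - eyeOffsetX) * scale,
             ( halfWidth - eyeOffsetX) * scale,
             -halfHeight * scale,
              halfHeight * scale,
             zNear, zFar };
}
```

The view matrix of each eye must additionally be translated by the same eye offset. Objects at the convergence distance then project to the same screen position for both eyes, which gives them zero parallax.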
8.2 Stereo Techniques
Today, there are three popular techniques for viewing graphics in stereo on the market: anaglyph stereo (a.k.a. red/green stereo) [2], temporal multiplexing, and polarization. We give a brief overview of these techniques and mention some pros and cons, as well as examine the coding and decoding process for each of them. An important concept discussed is ghosting [14], which should be avoided as much as possible. Ghosting means that one eye sees some of the content meant for the other eye.
Anaglyph Stereo
Anaglyph stereo is the simplest and cheapest way of delivering stereoscopic game content to the players. Here, the left and right views are separated using wavelength separation. The left view is simply encoded in the red channel, and the right view is encoded with complementary colors such as the green channel or both blue and green (cyan). The player needs to wear the well-known red/green glasses in order to separate the two images so that each eye sees only the image meant for it. Hence, the color of the lenses in the glasses corresponds to the color channels that encode the picture, and this is the reason for its name. The source is encoded into one image, which means that this method can be used for a lot of different media, even in printed form.
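The channel encoding described above can be illustrated on the CPU with a small C++ sketch. It assumes the red/cyan variant (red from the left view, green and blue from the right view) and two images of equal size stored as flat pixel arrays; the `Rgb` type and the function name are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };

// Builds a red/cyan anaglyph: the red channel comes from the left view,
// and the green and blue channels come from the right view.
std::vector<Rgb> composeAnaglyph(const std::vector<Rgb>& left,
                                 const std::vector<Rgb>& right) {
    assert(left.size() == right.size());
    std::vector<Rgb> out(left.size());
    for (std::size_t i = 0; i < left.size(); ++i) {
        out[i] = Rgb{ left[i].r, right[i].g, right[i].b };
    }
    return out;
}
```

In a real renderer the same result is usually obtained on the GPU, for example by rendering each eye with a different color write mask, rather than by combining images on the CPU.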
The benefit of using this technique for games is that it does not require any special display system, just a pair of cheap red/green glasses in order to be able to view the effect.
You've probably gotten a pair of cardboard glasses with acrylic lenses for free when you bought your favorite comic magazine that had a special 3D centerfold issue.
One pitfall of this technique is that the player must calibrate the screen so that the colors match the filters in the glasses. Otherwise, the player sees a lot of ghosting and a poor-quality stereo effect. There is also a noticeable loss of color, as a lot of the color information is being filtered out.
A more sophisticated version, sometimes called the super anaglyph technique, uses spectral multiplexing, more often referred to as Interference Filter Technology (INFITEC) [9]. Spectral multiplexing works by dividing the visual spectrum into six narrow bands, two bands for each of the primary colors red, green, and blue. These bands are then separated by filters and divided so that one band of each color reaches each eye. That is, half of the red spectrum reaches the left eye, and the other half reaches the right eye, and so on. This system requires that two such filters of different types performing this spectral multiplexing be mounted onto two projectors for the display, and that the player wear a pair of lightweight glasses with one filter type for each eye. Hence, if the viewer shuts one eye, he clearly sees that the color information reaching the open eye is slightly different compared to looking at the display using only the other eye. Nonetheless, the color differences are smaller than those seen when using the red/green glasses. Finally, the brain merges these two images into one without a problem, and the picture looks as expected.
Temporal Multiplexing
This technique encodes the stereo pair by interleaving the two images in time. Hence, one frame is shown to the left eye while the right eye is occluded by an active shutter. Similarly, the next frame is visible to the right eye while the left eye is occluded. If this procedure is performed fast enough, the player perceives the interleaved images as one continuous stream of images and gets the stereopsis right. This technology cuts the effective frame rate in half, as it is necessary to render twice as many images to get the same update rate. The occluding is performed by a pair of active shutter glasses, usually a pair of LCD screens that are synchronized to the display. When the display system shows a new frame, it sends a signal to the glasses to switch the shutters. It is clear that this technique requires glasses that cost a lot more than the simple red/green glasses. Furthermore, it cannot be used for printed media since, clearly, the display system has to have two sources for the output.
Polarized Light
Image separation can also be achieved by encoding each image using polarization. The two images are superimposed onto the screen by the display system through a pair of orthogonal polarizing filters. The viewer also wears a pair of glasses with corresponding filters, which are relatively cheap compared to the active shutter glasses. The polarization directions in the glasses correspond to those on the source. Thus, no extra hardware is needed in the glasses, but once again it is necessary to have two sources of light, and this technique cannot be used for printed media. The viewer must not tilt his head when using this technique, as it results in severe ghosting. Alternatively, circular polarization can be used in an attempt to avoid this problem. In all cases, polarized filters dim the brightness of the source because the original content is filtered.
Summary
All three techniques previously mentioned (with the exception of the INFITEC variation of anaglyph stereo) have become mainstream for playing stereo games. A pair of anaglyph glasses has a very low price and does not require the player to invest in any additional hardware to enjoy stereo games, but it produces the worst color representation of the three techniques. However, the frame rate does not have to be increased as it does for temporal multiplexing. Temporal multiplexing requires that the player acquire the active glasses and also have a display that can preferably run at 120 Hz, since the actual frame rate is cut in half with this technique. The current trend is that more displays and TVs of this type are coming out. The polarized light technique is realized either as an actively polarized display or, for the really luxurious players, by having two projectors. Regardless, the glasses are passive and thus generally cheaper.
8.3 Design Considerations for 3D Scenes
What really differs when creating content for a stereoscopic game engine compared to a monoscopic one? There are some approximations that can look very bad or very flat when ported to a stereoscopic display. We discuss some of them in this section. It is also important to avoid some situations that can be unpleasant for the player [10].
Culling can be a problem in the sense that we have two frusta to clip against. Clipping against the monoscopic frustum would yield erroneous results, as it is smaller than the combination of the left and right frusta. This problem can be handled in at least two ways: either clip against the joint frustum formed from the left and right frusta at once, or clip separately against both frusta.
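The second option, testing separately against both frusta, can be sketched as follows. This C++ illustration assumes bounding spheres and frusta stored as six inward-facing planes; all type and function names are hypothetical. An object is culled only when it is outside both the left and the right frustum.

```cpp
#include <array>
#include <cassert>

// Plane with inward-facing normal: a point p is inside when n.p + d >= 0.
struct Plane { double nx, ny, nz, d; };
struct Sphere { double x, y, z, radius; };
using Frustum = std::array<Plane, 6>;

// True when the sphere lies completely behind at least one plane.
bool outsideFrustum(const Frustum& frustum, const Sphere& s) {
    for (const Plane& p : frustum) {
        const double dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return true;
    }
    return false;
}

// An object may be skipped only if neither eye can see it.
bool culledInStereo(const Frustum& leftEye, const Frustum& rightEye,
                    const Sphere& s) {
    assert(s.radius >= 0.0);
    return outsideFrustum(leftEye, s) && outsideFrustum(rightEye, s);
}
```

The test is conservative: a sphere near a frustum corner can be reported as visible even though it is not, which costs some rendering time but never removes a visible object.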
Stereoscopic rendering in its naive form, rendering the scene twice from two different perspectives, comes at around twice the cost. Some effects do not need to be calculated twice, for example, those that are viewpoint independent. Some of the shortcuts that are normally made in monoscopic rendering do not work in stereoscopic rendering. If these visual effects are not gameplay critical, like glows around objects, they can be turned off or replaced by similar object-based effects.
The offscreen buffers only need to be duplicated if they have some view-dependent effect. This applies to shadow mapping and to those variants where the shadow map is based on the setup of the view frustum. Many of the commonly used screen-space postprocessing effects do not look good in stereo. One thing to be careful with is high dynamic range (HDR) rendering. The contrast between the left and right views is very important, and if we apply tone mapping separately to the left and right eyes, we can introduce a shift in contrast that strains the player's eyes.
Billboarding is another commonly used effect that does not port well to stereoscopic displays. Billboards, which are 2D objects that always face the viewer, look under certain conditions like two planar objects and not like the volumetric object they were intended to represent. The same applies to impostors; they must be made eye-dependent so that they do not look completely flat on a stereoscopic display. This also incurs a performance penalty since twice as many impostors must be rendered.
On a stereoscopic display, backgrounds must be placed at the proper depth so that they appear to be as far away as backgrounds usually are. This is especially important with skyboxes and skydomes, as they otherwise appear to be placed too close to the player, as if the sky were part of the ceiling. This also relates to the depth range in your scene. If techniques are used that require different depth ranges, these also destroy the stereoscopic sensation. Different depth ranges imply different depths in stereo, and objects end up in unexpected places in the virtual room, causing them to look deformed.
The same effect can be seen with overlays such as graphical user interfaces (GUIs), which are normally rendered in screen space without depth information. However, in stereo this gives contradicting depth cues because the menu can be drawn over an object that lies in view space, but the occlusion cue implies that we should see the object. The GUI should lie behind the object, but as a consequence of the screen-space rendering, it lies at the wrong depth. One solution to this problem is to always put the GUI closest to the player. Depending on how the virtual room is set up, this solution might shove the GUI into the player's face, so it must be done with great caution.
Other effects that traditionally take place in screen space, such as text labels, must also be placed in the game world at the proper depth. This now means that they can also be covered by objects in front of them, which might change how the actual game plays. In combination with the GUI and text labels, we often also have some kind of representation of input devices. The standard mouse cursor rendered by the operating system can often cause anomalies in different ways. If we are rendering the content to a side-by-side buffer, which is a common technique to render 3D content, we only see the cursor in one eye since both eyes share the same buffer. Seeing an object with only one eye that should be seen by both eyes is really annoying and should always be avoided!
Thus, we have the same problems as with the GUI when the pointer is rendered in screen space. It is therefore advisable to create a software cursor that is placed as an object with depth in the world.
It is really tiring for the eyes, and the viewer even loses the 3D effect for a short while, if the focus changes dramatically from one scene to another. If an object appears very close to the viewer in one scene, then it can be dragged back a bit before the scene changes, or rather, the focus needs to be pulled back to the place of focus for the next scene. This is something that traditional games usually never bother with, but it will probably be a great challenge for stereo game developers. A similar problem occurs for stereo films with subtitles, as they will definitely make the viewer tired since the focus needs to be changed repeatedly. The viewer will probably end up concentrating on the film without even bothering to read the subtitles.
Situations with contradicting depth cues in relation to occlusion can also arise when objects move off the screen in view space. Now, the depth cue tells the player that the object is in front of the screen, but when it comes close to the right or left edge of the screen it disappears behind it. To make this so-called edge conflict a bit less apparent, we can move the object slowly offscreen or introduce virtual borders on the screen. These would be two guard bands lying in depth closest to the viewer to prevent the contradicting depth cues. This solution is known as floating windows and was used as early as the 1950s [15]. It has so far not been used for computer games, but was recently used for the Pixar film Up.
We also see a need for a higher lower bound on the accepted frame rate in a game when migrating to stereo. People tend to experience jerkiness if the frame rate drops below 60 FPS when viewing in stereo, a far greater number than with monoscopic displays. Depending on what type of viewing device the player uses, we can also experience problems with frame-sequential delivery of stereographic content.
8.4 Outlook
Using stereoscopic displays is a great way to increase the gameplay value in the form of immersion. This is still somewhat of an unexplored field even though stereoscopic displays have been used for a longer period of time in other fields such as scientific visualization. The film industry has more experience than the game industry with stereoscopic displays, and we are now starting to see more and more movies created for these types of displays. When will the game industry follow in those footsteps and release more games that directly cater to stereo-enabled devices? So far, the majority of the games that are stereo capable are ports from monoscopic games. The step from monoscopic to stereoscopic also gives us game developers a new dimension to create more interesting games design-wise. After all, 3D stereoscopy is regarded by filmmakers as yet another important storytelling technique. The future will hold many new design approaches, both from a pure rendering perspective and also from a gameplay perspective. An interesting approach is to render the scene from one virtual camera position and use the information in the depth buffer to generate a stereo pair using Depth-Image-Based Rendering (DIBR) [7]. A simple example of a unique gameplay experience available only in stereo could be the sniper scope common in many games involving guns. The standard way of handling this is to render a circle in the middle of the screen where the player sees the zoomed-in piece of the world. The stereoscopic version could then, when the player zooms with the scope, black out one eye so that the player also loses the stereopsis, giving more realism and immersion.
The future also holds other types of stereoscopic displays called auto-stereoscopic displays. These displays function without glasses, but must render many more than two views [11, 5]. Even though the technique has been around for a long time, it still has not been widely accepted because it has some drawbacks such as ghosting. Making sure that your game engine is adapted for plano-stereoscopic displays is a step towards making it work smoothly with auto-stereoscopic displays, and the guidelines presented in this gem will help you to achieve this as a game engine designer.
Acknowledgements
The author wishes to thank Stefan Seipel, Uppsala University, and Martin Ericsson, UPPMAX, Uppsala University, for contributing their material for this gem.
References
[1] Akiyuki Anzai, Izumi Ohzawa, and Ralph D. Freeman. "Neural Mechanisms for Processing Binocular Information". Journal of Neurophysiology, Volume 82, Number 2 (August 1999), pp. 891–908.
[2] Anaglyph. http://en.wikipedia.org/wiki/Anaglyph_image
[3] Chromium. http://chromium.sourceforge.net/
[4] Neil A. Dodgson. "Variation and extrema of human interpupillary distance". Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems XI, 2004, pp. 36–46.
[5] Neil A. Dodgson, J. R. Moore, and S. R. Lang. "Multi-view autostereoscopic 3D display". IBC (International Broadcasting Convention) 1999, pp. 497–502.
[6] International Stereoscopic Union. "International Stereoscopic Union: A Glossary of Stereoscopic Terms". http://www.stereoscopy.com/isu/glossary-index.html
[7] Julien C. Flack, Hugh Sanderson, Steven I. Pegg, and Simon Kwok. "Optimising 3D image quality and performance for stereoscopic game drivers". Proceedings of SPIE Stereoscopic Displays and Applications XX, 2009.
[8] Graham R. Jones, Delman Lee, Nicolas S. Holliman, and David Ezra. "Controlling perceived depth in stereoscopic images". Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems VIII, 2001, pp. 42–53.
[9] Helmut Jorke and Markus Fritz. "INFITEC - A New Stereoscopic Visualisation Tool by Wavelength Multiplex Imaging". Journal of Three Dimensional Images, Volume 19, Number 3 (September 2005), pp. 50–56.
[10] Jukka Häkkinen, Monika Pölönen, Jari Takatalo, and Göte Nyman. "Simulator sickness in virtual display gaming: a comparison of stereoscopic and non-stereoscopic situations". ACM International Conference Proceeding Series, Volume 159 (2006), pp. 227–230.
[11] Ken Perlin, Salvatore Paxia, and Joel S. Kollin. "An Autostereoscopic Display". Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, 2000, pp. 319–326.
[12] Roy S. Kalawsky. The Science of Virtual Reality and Virtual Environments. Addison-Wesley, 1993.
[13] Lenny Lipton. The CrystalEyes Handbook. StereoGraphics Corporation, 1991.
[14] Bernard Mendiburu. 3D Movie Making: Stereoscopic Digital Cinema from Script to Screen. Focal Press, 2009.
[15] Raymond Spottiswoode and Nigel Spottiswoode. The Theory of Stereoscopic Transmission and its Application to the Motion Picture. University of California Press, 1953.
Chapter 9 - A Multithreaded 3D Renderer. Game Engine Gems, Volume One, by Eric Lengyel (ed.), Jones and Bartlett Publishers © 2011
Chapter 9: A Multithreaded 3D Renderer
Sebastien Schertenleib, Sony Computer Entertainment Europe
Overview
The 3D renderer remains one of the main components in most modern video games. Usually, 3D rendering engines handle both the software and hardware pipelines that are being exposed through 3D graphics APIs such as DirectX, OpenGL, or libgcm. Nowadays, multi-core CPUs are widely available through game consoles and PCs. In order to ensure that the GPU is continuously fed with data to process, it is critical that 3D renderers take full advantage of this new programming scheme by utilizing the available processing cores. The workflow involved in creating a 3D picture on the screen relies on preparing a list of commands that the GPU can interpret and execute.
Different approaches can be taken to decouple the CPU from the GPU. For instance, one common technique consists of using a double buffer or triple buffer scheme where the CPU builds the commands in frame N and where the GPU consumes them in frame N + 1, as illustrated in Figure 9.1. Alternatively, the GPU can consume the data within the same frame in order to reduce the latency, as shown in Figure 9.2. One potential drawback is that it might be difficult to avoid some GPU stalls early in the frame. On the other hand, the memory footprint is much smaller compared to a double buffering method, and this is particularly important on embedded systems with restricted amounts of memory.
Figure 9.1: In a double-buffering scheme, the GPU consumes the data a frame later than it is generated by the CPU.
Figure 9.2: In this scheme, the GPU consumes the data soon after it is generated by the CPU.
9.1 The Memory Model
Regardless of how the display lists get created, a renderer might ultimately be restricted by memory access speed, particularly when the graphics commands are generated through a single thread. In recent years, the increasing performance gap between GPU processing power and memory latencies has made it harder to feed the GPU due to expensive data accesses in system memory, as shown in Figure 9.3. A typical frame is subdivided into passes which handle, for instance, rendering shadow maps, rendering the main scene, and rendering full-screen post-processing effects. A unit of work during rendering is often referred to as a batch and combines a set of render states, shaders, and geometry elements as exemplified by the following listing:
// setting up a batch
setRenderStates(...);
bindTextures(...);
setShaders(...);
setShaderConstants(...);
setVertexBuffer(...);
setIndexBuffer(...);
drawCall(...);
Figure 9.3: Both the render thread and the graphics API are likely to access data in various locations in memory.
Setting up a batch consists mostly of setting the addresses of various resources needed to render an object. In the process of creating batches, the rendering code has to traverse the scene graph and is likely to access data in various locations throughout main memory. This leads to a large number of cache misses. With in-order CPUs such as the PowerPC chips found in the Xbox 360 and PlayStation 3 game consoles, this could stall the processor for hundreds of cycles each time a cache miss occurs, greatly impacting performance. This problem can be mitigated with out-of-order CPUs because other instructions can potentially be executed while previous instructions are waiting for data to be ready.
One may argue that it might be possible to reorganize the data structures to avoid many of the cache-miss penalties. Indeed, adopting cache-aware or cache-oblivious algorithms [2] helps, especially for the scene graph management, but unfortunately, the graphics library also needs to access and manipulate some data on its own. Large structures such as vertex buffers and index buffers are generally stored in the GPU's local memory and so do not cause a problem, but many types of rendering state, such as shader constants and texture configurations, need to be copied to the command buffer each frame. Therefore, alternative solutions are needed to overcome the cache limitations.
9.2 Building the Display Lists in Parallel
To ensure that the display lists are created in the shortest amount of time and in a way that minimizes memory bandwidth limitations, our solution uses multiple threads, and each is responsible for creating a subset of the rendering commands. However, this is only possible when the 3D graphics library exposes such a level of granularity. Thankfully, this is the case for Xbox 360, PlayStation 3, and DirectX 11 developers. In the PlayStation 3 case, the Synergistic Processing Units (SPUs) of the Cell Broadband Engine, being purely vector processors, excel with geometric processing. This means they can easily perform graphical operations that help offload work from the GPU when necessary. To create the display lists in parallel, a commonly found paradigm is to have the primary command buffer reference display lists created inside secondary command buffers. Those display lists are created in parallel on different execution units as shown in Figure 9.4. Often, those secondary buffers handle a subset of the frame, and the granularity could be anything from a single draw call to an entire pass.
Figure 9.4: The primary command buffer references multiple secondary command buffers, each of which handles a subset of the frame.
By distributing the creation of the draw calls and their graphics states to multiple command buffers in parallel, the overall latency for creating the display lists is considerably reduced, which is particularly useful when using the scheme shown in Figure 9.2. Moreover, this makes the 3D renderer more scalable to various configurations of multi-core architectures and helps generate the thousands of draw calls commonly found in modern video games.
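The primary/secondary arrangement can be sketched with standard C++ threads. This is not the API of any of the platforms mentioned above: a command buffer is modeled as a plain list of strings, each pass is a list of object IDs, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a real GPU command buffer.
using CommandBuffer = std::vector<std::string>;

// Records one secondary command buffer per pass, each on its own thread.
std::vector<CommandBuffer> buildSecondaryBuffers(
        const std::vector<std::vector<int>>& passes) {
    std::vector<CommandBuffer> secondary(passes.size());
    std::vector<std::thread> workers;
    workers.reserve(passes.size());
    for (std::size_t i = 0; i < passes.size(); ++i) {
        // Each worker writes only to its own buffer, so no locking is needed.
        workers.emplace_back([&secondary, &passes, i] {
            for (int objectId : passes[i]) {
                secondary[i].push_back("draw " + std::to_string(objectId));
            }
        });
    }
    for (std::thread& w : workers) w.join();
    assert(workers.size() == passes.size());
    return secondary;
}

// The primary buffer only references the secondary buffers, in submit order.
std::vector<const CommandBuffer*> buildPrimaryBuffer(
        const std::vector<CommandBuffer>& secondary) {
    std::vector<const CommandBuffer*> primary;
    for (const CommandBuffer& cb : secondary) primary.push_back(&cb);
    return primary;
}
```

Because the secondary buffers are indexed by pass, the submission order stays deterministic no matter which thread finishes first.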
9.3 Parallel Models
3D renderers are strong candidates for parallelization because each draw call can usually be treated as a standalone unit of work. Each of these tasks pairs a chunk of data with some logic to operate on that data as shown in the following code listing.
int TaskMain(...)
{
    const int sourceAddr = ...;  // source address of data
    const int count = ...;       // number of elements
    const int dataSize = count * sizeof(gfxObject);

    // On some platforms like the PS3,
    // you may have to keep the data local to the executing unit
    gfxObject *buffer = (gfxObject *) Allocate(dataSize);
    DmaGet(buffer, sourceAddr, dataSize);
    DmaWait(...);  // barrier to wait for the data

    // let's do some work
    for (int i = 0; i < count; ++i)
    {
        buffer[i].update();
    }

    // On some platforms like the PS3,
    // you may have to store back the data to system memory
    DmaPut(buffer, sourceAddr, dataSize);
    DmaWait(...);  // barrier to wait for the transfer to complete

    return 0;
}
This model provides an efficient parallel paradigm, where each task can be queued and consumed by available processing units, as shown in Figure 9.5. This also avoids some of the issues of a more standard multithreaded approach, as the tasks provide a more fine-grained subdivision and can cope more easily with uneven computation, while a multithreaded architecture might end up waiting on a particular subsystem to complete its tasks.
Figure 9.5: Tasks are stored in a queue and are consumed by available processing units.
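A queue of tasks consumed by whichever processing unit is free can be sketched with standard C++ threads and a mutex. This is a minimal illustration rather than a production job system; the `TaskQueue` class and the `runAll` function are hypothetical names, and all tasks are assumed to be queued before the workers start.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class TaskQueue {
public:
    void push(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }
    // Pops one task; returns false when the queue is empty.
    bool tryPop(std::function<void()>& task) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (tasks_.empty()) return false;
        task = std::move(tasks_.front());
        tasks_.pop();
        return true;
    }
private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};

// Runs every queued task on workerCount threads. Each worker keeps pulling
// tasks until the queue is empty, so uneven task costs balance out.
void runAll(TaskQueue& queue, unsigned workerCount) {
    assert(workerCount > 0);
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < workerCount; ++i) {
        workers.emplace_back([&queue] {
            std::function<void()> task;
            while (queue.tryPop(task)) task();
        });
    }
    for (std::thread& w : workers) w.join();
}
```

A worker that draws a cheap task simply returns to the queue sooner, which is the load-balancing behavior the text attributes to fine-grained tasks.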
9.4 Synchronizing the GPU and CPU
Synchronizing the GPU and CPU is more difficult to handle in a multithreaded environment since multiple processing units might potentially want to access the same data. To avoid any inconsistencies and race conditions, there is a need to employ synchronization primitives such as mutexes or atomics. Usually, the GPU can report its progress in a specific memory area. For instance, when a particular command has been finished on the GPU, it can write a specified value, or "report", to a CPU-accessible location to indicate completion. Depending on the architecture, the CPU can either poll for the report value or receive a system callback of some kind. Table 9.1 presents some possible configurations.
Table 9.1: Synchronizing the GPU and the CPU.
CPU              GPU              Report   Comments
…                WaitReport(22)   0        GPU waits for the report to be set
SetReport(22)    WaitReport(22)   22       GPU is now unlocked
…                …                22
WaitReport(33)                             CPU waits for the report to be set
WaitReport(33)   SetReport(33)    33       GPU sets the report and unlocks the CPU
…                …                33
SetReport(12)    SetReport(15)    ?        Race condition
WaitReport(33)   WaitReport(22)   ?        Deadlock
As with any multithreaded environment, special care is needed to avoid potential race conditions or deadlocks since both the CPU and the GPU can generate and consume data with unpredictable timing patterns.
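The handshake in the first rows of Table 9.1 can be simulated on the CPU alone. In this C++ sketch a second thread stands in for the GPU and an atomic integer stands in for the report location; the `Report` class and `runHandshake` function are hypothetical and do not correspond to any real graphics API.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

class Report {
public:
    void set(std::uint32_t value) {
        value_.store(value, std::memory_order_release);
    }
    // Polls until the report holds the expected value.
    void wait(std::uint32_t expected) const {
        while (value_.load(std::memory_order_acquire) != expected) {
            std::this_thread::yield();
        }
    }
    std::uint32_t get() const {
        return value_.load(std::memory_order_acquire);
    }
private:
    std::atomic<std::uint32_t> value_{0};
};

// The "GPU" thread waits for report 22, does its work, then sets report 33.
// The CPU sets 22 and waits for 33.
std::uint32_t runHandshake() {
    Report report;
    std::uint32_t gpuResult = 0;
    std::thread gpu([&report, &gpuResult] {
        report.wait(22);   // GPU waits for the CPU to publish its data
        gpuResult = 42;    // stand-in for the actual rendering work
        report.set(33);    // GPU signals completion
    });
    report.set(22);        // CPU unlocks the GPU
    report.wait(33);       // CPU waits for the GPU to finish
    gpu.join();
    assert(report.get() == 33);
    return gpuResult;
}
```

Swapping the two CPU calls, so that the CPU waits for 33 before setting 22, reproduces the deadlock row of the table: each side then waits for a value that only the other side can write.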
9.5 Using Additional Processing Resources
In many games, the processing load is not fully balanced among the available processing units. In these cases, it can be worthwhile to move some operations that are ordinarily performed on the main CPU to other units that may have idle time each frame. For instance, less intensive graphics applications can employ GPGPU (general purpose GPU) code to offload physics or AI simulations from the CPU using technologies such as CUDA. On the other hand, games willing to push the boundaries of real-time 3D graphics might want to use spare CPU cores to perform some graphical operations. If it so happens that those cores provide an interesting ISA (instruction set architecture) with SIMD instructions, such as the SPUs on the Cell processor, then it becomes possible to offload several kinds of operations from the GPU, such as the following:
Geometric processing, including procedural algorithms for creating terrain, trees, decals, or subdivision surfaces.
Physics and particle system updates.
View frustum object culling or occlusion culling.
Software rendering, in particular for occlusion queries and post-processing effects.
9.6 Reducing the Pressure on the Memory Bandwidth
The performance of both the CPU and GPU is improving at a very fast pace, but memory speed is not following the same curve, and consequently, memory bandwidth becomes more and more of a significant bottleneck. Therefore, it is important to consider any technique that would help minimize the requirements on the memory systems, even if it means using more CPU or GPU cycles. Some techniques that can be used include packing the shader input and output attributes or using the tessellation units of the GPU when available. On the CPU side, special geometry culling can avoid sending up to 70% of primitives that end up being discarded by the GPU (backfacing, off-screen, zero-size, and degenerate primitives). Another technique is to generate a coarse depth buffer in software to perform occlusion queries instead of relying on the GPU [1], as the latter can involve reading back from video memory, which is often a slow path.
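The CPU-side primitive culling mentioned above can be sketched in C++ for triangles whose vertices have already been projected into normalized device coordinates. The sketch makes several assumptions that are not from the text: front faces use counter-clockwise winding, the visible region is [-1, 1] in x and y, and no triangle crosses the near plane. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec2 { double x, y; };          // projected position in NDC
struct Triangle { Vec2 a, b, c; };

// Twice the signed area; positive for counter-clockwise winding.
double signedArea2(const Triangle& t) {
    return (t.b.x - t.a.x) * (t.c.y - t.a.y)
         - (t.c.x - t.a.x) * (t.b.y - t.a.y);
}

// True when the triangle's bounding box misses the [-1, 1] viewport.
bool offScreen(const Triangle& t) {
    const double minX = std::min({t.a.x, t.b.x, t.c.x});
    const double maxX = std::max({t.a.x, t.b.x, t.c.x});
    const double minY = std::min({t.a.y, t.b.y, t.c.y});
    const double maxY = std::max({t.a.y, t.b.y, t.c.y});
    return maxX < -1.0 || minX > 1.0 || maxY < -1.0 || minY > 1.0;
}

// Keeps only front-facing, non-degenerate, potentially visible triangles.
std::vector<Triangle> cullTriangles(const std::vector<Triangle>& in) {
    std::vector<Triangle> out;
    for (const Triangle& t : in) {
        if (signedArea2(t) <= 0.0) continue;  // back-facing or zero-area
        if (offScreen(t)) continue;           // entirely outside the viewport
        out.push_back(t);
    }
    assert(out.size() <= in.size());
    return out;
}
```

Only the surviving triangles need to be written to the index buffer sent to the GPU, which is where the bandwidth saving comes from.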
9.7 Performing Graphical Operations in Parallel
So far, we have discussed some techniques that help to improve overall performance, but we can go a step further and allow different processing units to work in parallel to create the final image for a frame. Modern games use multiple render targets within a single frame, and some of them are not accessed simultaneously by the GPU. For instance, it is often possible to access a back buffer with another processing unit such as an SPU to perform some post-processing effects while the GPU starts rendering the next frame. This gives up to a full frame to render the effects, while keeping the same frame rate, but at the expense of an additional frame of latency (see Figure 9.6).
Figure 9.6: While the helper thread computes the post-effects, the GPU starts rendering the next frame.
References
[1] Johan Andersson. "The Intersection of Game Engines and GPUs: Current & Future". Graphics Hardware 2008.
[2] Sebastien Schertenleib. "An Effective Cache-Oblivious Implementation of the ABT Tree". Game Programming Gems 5, Charles River Media, 2005.
Chapter 10 - Camera-Centric Engine Design for Multithreaded Rendering, Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Chapter 10: Camera-Centric Engine Design for Multithreaded Rendering
Colt McAnlis, Blizzard Entertainment
Overview
Modern graphics APIs grant the ability to render in parallel by allowing the creation of drawing commands on separate threads. As such, an advanced engine design must have the ability to scale practically in order to take advantage of increased core count for rendering. In order to accomplish this, an engine must solve the problem of how to properly break up rendering tasks to be computed in parallel, but in order to do that, it must first consider what the rendering subsystem is actually doing.
Modern graphics engines require multiple renderings of the same scene to aid in the overall appearance of the final image. For instance, separate renders of a scene are required to compute shadow mapping, run-time reflections, screen tiling (for antialiasing on some game consoles), imposter generation, and offscreen particles. As such, it makes logical sense that the 'camera' is the coarsest form of scene organization directly tied to the visibility of an environment, given that in order to visualize any of the concepts above, you must first define a view position, view direction, and eventually a render target in which to store the results. As most views of the scene can be rendered independently of each other, we submit that grouping rendering workload by camera view offers the best rendering workload organization to take advantage of multi-core rendering.
In this gem, we present an engine design that scales to multi-core systems by using the concept of a camera to separate parallel rendering jobs. To accomplish this, we present two concepts. The first is an API-independent method of creating recordable command buffers, a process that enables us to use parallel rendering on any device. The second is an observation of how to distribute the creation of command buffers to separate threads based upon the camera with which it will be used. Using these two simple, yet powerful concepts can enable your engine to take advantage of multiple cores for rendering with the least amount of pain possible.
10.1 Uses of Multi-Core in Video Games
With the boom of multi-core system availability, there's a mad dash to fill extra cores with proper amounts of work to enhance the look of our products. The downside, though, is that for developers on platforms where the hardware configuration can change between two users (or even the same user), decisions must be made about how to accurately scale out visual features and workloads on lower-end processors.
Typically, we run in parallel things such as particles, animations, physics, etc., which can have level-of-detail (LOD) built into them for low-end machines. This concept, however, doesn't directly translate to the act of submitting commands to the graphics API, a process which can easily incur a great deal of performance overhead. APIs such as DirectX 9 and DirectX 10 suffer greatly from this, as it's not uncommon for frequent calls of basic API functions to become a performance burden. This typically causes the rendering phase to delay the processing of other systems in the architecture, which if highly dependent on refresh rate, could cause problems. The cause of this issue is the fact that we are required to submit commands to the device on the thread that owns the device, which for most games, is the same thread on which the simulation logic is run. For an in-depth examination of this process, please refer to [2], which provides many significant and properly documented examples.
Figure 10.1 displays an example frame setup for a given game engine. In this example, the rendering device is owned by the primary thread, and as such, rendering a scene blocks the frequency of simulation updates. As described, the only two systems that use the thread pool are particles and animations, namely because of their ability to be updated without memory contention, and the concept of scaling back accuracy or LOD for these systems is trivial. The end of our frame goes through the process of rendering shadow maps, reflections, the main view, and finally post-processing. Because of the fact that we must submit commands to the device on the thread that owns it, most of the rendering commands are interwoven with logic-based operations such as scene traversals, update logic, and the like. This type of rendering architecture can make it difficult to maintain your engine and port it to other platforms.
Figure 10.1: A standard game usage of parallel processing. Notice that only the update information tends to be multithreaded.
To be fair, modern engines are not architected as poorly as shown in Figure 10.1, and on average make much better use of thread pool availability. Some excellent alternative threading architectures for various genres of video games, most of which operate by offloading the render device ownership to a separate thread, are listed in [2]. This follows the observation that the simulation code is not required to update at the same frequency as the graphics rendering, and offloading the rendering to a different thread allows the simulation to continue on to process the next frame after submitting a draw request.
Although this architecture is more common now, it is still far from ideal. Even with the device submission being offloaded to a different thread, it would be beneficial to have the ability to modify the drawing part of a frame such that it could make better use of thread pool availability, thus balancing out your thread utilization more evenly.
10.2 Multithreaded Command Buffers
Modern rendering APIs operate internally with a data structure known as a command buffer containing rendering information needed to process draw commands at the device level. Typically, these command buffers are filled on your behalf by the exposed rendering API and inserted into a command queue that is executed at a later time. Because of this, the thread on which the rendering APIs are called incurs the overhead of filling the command buffer, often absorbing CPU cycles in a way that has historically been a common problem for modern games.
At the time of the writing of this gem, a few APIs allow the ability to create dynamic command buffers or command lists on a thread separate from the one that owns the device, a process that allows us to distribute the overhead of filling the command buffers across the entire system. This represents an important step forward for rendering engines, as we now possess the ability to more finely control the workload of our rendering engine across our threading systems. The negative side, however, is that older APIs do not have the same abilities, causing issues for engines that aim to be compatible with multiple hardware levels.
To address this issue, Scheib [1] provided research showing a manner in which to create an overloaded DX9 device that would mimic the same recording ability as the newer APIs. To accomplish this, they overload the device APIs for the standard DX9 interface and reroute them to their own system to composite a custom data structure that resembles an internal command buffer. They then submit this command buffer to the API on the primary thread owning the device using standard DX9 calls. On the primary thread, we are still incurring the overhead of submitting API calls; however, due to the batched state, the overhead of API calls on the primary thread is reduced significantly, providing additional performance increases. Scheib shows that even with the overhead of submitting the buffer to traditional APIs, they still gain a significant performance increase by compositing the rendering command data on a separate thread.
10.3 Device-Independent Command Buffers
For legacy titles that would like to reduce the amount of API performance overhead without too much code rework, the method of [1] works quite well, allowing a drop-in solution that can benefit performance. For those titles with the luxury of analysis and not being bound by crunch deadlines, it's worth pointing out that the presented method can be considered overkill in terms of complexity, and short-sighted in terms of API differentiation. Analysis of your rendering systems will prove that a subset of device APIs are often used, and more so that most of them occur in frequently predictable patterns. As such, when generating an API wrapper to the device, you waste time by offering support for functions that are not used by your title.
Logically, API calls should be hidden behind a wrapper interface, anyway, in order to minimize the amount of code that would need to be reworked to support multiple platforms and rendering APIs; as such, it makes sense that API calls for dynamic command buffer recording should also be hidden. In this context, the presented method of [1] is lacking, and a custom API-independent solution is required.
RenderCommand Structure
To this end, we present the concept of a RenderCommand, which contains the least amount of information required to submit a draw call to the API in a device-independent fashion. In effect, our RenderCommand structure is designed to mimic the draw commands that modern APIs use internally, containing information about which vertex buffer to use, the shading states, and how many polygons to draw. Depending on the needs of your graphics engine, the layout and members of this structure can vary greatly; however, in the interest of memory throughput, it's a good idea to keep this structure as small as possible by quantizing as much state data as you can. The code in Listing 10.1 shows an example of this process in that, rather than listing out explicit states for blending, a single enumerated state is used.
Listing 10.1: A simple RenderCommand structure that contains basic information for a draw call.
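struct RenderCommand
{
    ResourceHandle VertexBufferHandle;
    uint32         VertexDeclEnumIndex;
    uint32         NumTriangles;
    uint32         NumVerts;

    enum PrimType
    {
        kTriList = 0,
        kTriStrip
    };

    PrimType PrimitiveType;

    // The enum type carries a distinct name so that the member below
    // can be called BlendType on every compiler.
    enum BlendTypeEnum
    {
        kBlend_None = 0,
        kBlend_Standard,
        kBlend_Additive,
        kBlend_Subtractive
    };

    BlendTypeEnum BlendType;

    // and so on...
};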
The code in Listing 10.1 is a very stripped down, simplified version of what a full production RenderCommand structure would look like. For bonus points in the memory category, you could make a dynamically resizable version of this structure that holds only delta states that change between this draw call and the previous one. In effect, this matches closer to what the APIs use internally, but for the complexity involved, the simple version presented above will suffice for the descriptive purposes of this article.
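The advice to quantize state data can be pictured with bitfields. The following is only a sketch under stated assumptions: the `PackedRenderCommand` name, the 32-bit `ResourceHandle`, the field widths, and the `makeOpaqueMeshCommand()` helper are all hypothetical and do not appear in the original listing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 32-bit opaque handle; the gem does not define ResourceHandle.
typedef std::uint32_t ResourceHandle;

enum PrimType  { kTriList = 0, kTriStrip };
enum BlendKind { kBlend_None = 0, kBlend_Standard, kBlend_Additive, kBlend_Subtractive };

// Assumed widths: 24 bits for the triangle count, 5 bits for up to 32 vertex
// declarations, 1 bit for two primitive types, 2 bits for four blend modes.
struct PackedRenderCommand
{
    ResourceHandle VertexBufferHandle;
    std::uint32_t  NumTriangles        : 24;
    std::uint32_t  VertexDeclEnumIndex : 5;
    std::uint32_t  PrimitiveType       : 1;
    std::uint32_t  BlendType           : 2;
    std::uint32_t  NumVerts;
};

// Builds a command for an opaque triangle-list mesh (illustrative helper).
inline PackedRenderCommand makeOpaqueMeshCommand(ResourceHandle vb,
                                                 std::uint32_t numTris,
                                                 std::uint32_t numVerts)
{
    assert(numTris < (1u << 24));   // must fit in the 24-bit field
    PackedRenderCommand rc = {};
    rc.VertexBufferHandle  = vb;
    rc.NumTriangles        = numTris;
    rc.VertexDeclEnumIndex = 0;
    rc.PrimitiveType       = kTriList;
    rc.BlendType           = kBlend_None;
    rc.NumVerts            = numVerts;
    return rc;
}
```

The exact layout of bitfields is compiler-dependent, so a production engine would likely verify the structure size on each target platform.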
Device-Independent Resource Handles
By design, the RenderCommand structure must contain handles to device resources that it will reference when executing the draw command. This exhibits a problem in that directly exposing device handles to the rest of the systems makes it difficult to port the engine to other platforms, often causing rendering code to be spread out across the entire project. As such, it's often useful to create managers that hold the actual device-specific resource handles and offer a wrapped handle to the rest of the systems. These ResourceHandle objects can help the process of porting the code to other platforms and also grant you a buffer interface for doing asynchronous asset loading. A full discussion regarding the issues of multithreaded resource creation and management is beyond the scope of this gem. For more information, we refer the reader to [1] as an introduction to the topic.
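A minimal sketch of such a manager is given here, assuming an index-plus-generation handle layout. The `VertexBufferManager` and `DeviceVertexBuffer` types and the handle fields are hypothetical; only the `getElement()` name echoes the lookup used later in Listing 10.3.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Opaque handle given to the rest of the engine (assumed layout).
struct ResourceHandle
{
    std::uint32_t index;
    std::uint32_t generation;
};

// Stand-in for a device-specific object such as a DX9 vertex buffer.
struct DeviceVertexBuffer
{
    int perVertSizeInBytes;
};

class VertexBufferManager
{
public:
    // Registers a device resource and returns a wrapped handle to it.
    ResourceHandle add(DeviceVertexBuffer* vb)
    {
        Slot s = { vb, 1u };
        mSlots.push_back(s);
        ResourceHandle h = { static_cast<std::uint32_t>(mSlots.size() - 1), 1u };
        return h;
    }

    // Invalidates the handle; stale copies of it no longer resolve.
    void remove(ResourceHandle h)
    {
        if (getElement(h) != nullptr)
        {
            mSlots[h.index].resource = nullptr;
            ++mSlots[h.index].generation;
        }
    }

    // Resolves a handle to the device resource, or nullptr if it is stale.
    DeviceVertexBuffer* getElement(ResourceHandle h) const
    {
        if (h.index >= mSlots.size()) return nullptr;
        const Slot& s = mSlots[h.index];
        return (s.generation == h.generation) ? s.resource : nullptr;
    }

private:
    struct Slot
    {
        DeviceVertexBuffer* resource;
        std::uint32_t       generation;
    };
    std::vector<Slot> mSlots;
};
```

Only the manager touches device pointers, so porting to another API would mean replacing `DeviceVertexBuffer` while the handles held by the rest of the engine stay unchanged.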
Filling a RenderCommand Structure
A given RenderCommand structure simply references device information that it needs to execute. It does not, in contrast, actually load or own the device resources itself. As such, another class, which we will call a RenderObject, is responsible for managing the lifetime of a given resource (the actual resource itself should be owned by the manager responsible for it at a lower level, as described previously). Before filling in a RenderCommand structure, we assume that a given RenderObject structure has already been loaded into your engine and has communicated with the device in such a way to acquire proper rendering resources (such as vertex buffers).
Listing 10.2 describes the straightforward process of filling in a RenderCommand structure with the information contained in a RenderObject structure. As described, the RenderObject structure must contain information about how it needs to render, and in our simple example, it copies its state over to the RenderCommand structure based upon internal types and logic.
Listing 10.2: Filling a command buffer using generic handles. This is a great place to do additional logic related to rendering setup, since it will be executed on the thread pool.
void RenderObject::fillCommandBuffer(RenderCommand *RC)
{
    // make sure we're running on the thread pool
    ThreadAssert(ThreadPoolThread);

    if (ObjectType == kTypeOpaqueMesh)
    {
        RC->VertexBufferHandle = mVBHandle;
        RC->VertexDeclEnumIndex = kVD_Mesh;
        RC->PrimitiveType = kTriList;
        RC->BlendType = kBlend_None;
        RC->NumTriangles = numTrisFromPrimType();
        RC->NumVerts = mNumVerts;
    }
    else if (ObjectType == kTypeTransparentMesh)
    {
        // and so on...
    }
}
It's worth noting here that we're not computing new state for the RenderObject structure at this point; we're simply copying over state information relative to rendering and assigning it into the structure. This is a highly critical point when discussing the filling of these buffers on multiple threads, as this assignment pattern lends itself to being free of memory contention. That being said, the fillCommandBuffer() function is typically where your RenderObject structure would contain logic deciding how to fill in a RenderCommand structure properly. This includes things like checking the material to determine if we need to render with alpha blending. These types of logic and data access patterns can get complex at times, which is why moving them off to a separate thread frees up cycles on your device thread.
Submitting a RenderCommand to the API
The act of submitting our custom RenderCommand structure to the API is overly verbose, yet direct in execution. Since we are filling in our commands on other threads, we must eventually resign ourselves to submitting the commands on the thread that owns the device. In practice, this relates to converting the data in our RenderCommand structure to information that we pass on to the rendering device APIs for execution. For more recent APIs that support command buffer creation, the act of submitting to the device requires a conversion from our custom command buffer to the desired device's format before submission; whereas for older APIs that do not directly support command buffer creation, the data must be directly fed to the device through API calls. Listing 10.3 describes the executeDrawCommand*() function, which at this point is the only function we've described that actually has direct access to the low-level rendering device. We present here the DirectX 9 version of the command, where we must call the API directly with our abstracted data types. It's worth noting that proper logic to determine which version of executeDrawCommand*() to call is a higher-level engine concept that is beyond the scope of this article; simple versions include a pointer to the proper function to call, while more advanced versions overload the RenderControl class entirely.
Listing 10.3: Submitting a RenderCommand structure to the API. This snippet of code is the only function that can actually communicate directly with the device.
void renderControl::executeDrawCommandDX9(const RenderCommand *params)
{
    ThreadAssert(DeviceOwningThread);

    // Set vertex stream: resolve the wrapped handle to the device resource.
    const VertexBufferContainer *vbc =
        mManagedVBs.getElement(params->VertexBufferHandle);
    DX9Dev->SetStreamSource(0, (IDirect3DVertexBuffer9 *) vbc->devicehandle,
        0, vbc->perVertSizeInBytes);

    SetShaderData(params);
    SetRenderStates(params);

    // We use lookup tables for these mappings because it's faster.
    DX9Dev->SetVertexDeclaration(
        StaticVDeclHandles[params->VertexDeclEnumIndex]);

    D3DPRIMITIVETYPE type = PrimTypeMappingLUT[params->PrimitiveType];

    // do draw
    DX9Dev->DrawPrimitive(type, 0, params->NumTriangles);
}
Because our RenderCommand structure wraps up data and information in an abstracted fashion, submitting that data to the API has to include redirection from the ResourceHandle types to API handles that can be sent to the device. Listing 10.3 shows this process directly, where the vertex buffer manager must be given a ResourceHandle object in order to receive the proper device handle.
10.4 A Camera-Centric Design
As we've discussed, a camera is the coarsest form of batching container used to gather work for rendering. So far, we've described a system that allows us to fill in single command buffers in a device-independent manner and submit them to the render device at a later time. Now, we need to describe the proper manner of creating container classes to aid with this process of batching RenderCommand objects as well as a larger engine design to scale easily with multiple threads.
Balancing Rendering Across Multiple Threads: Everything's a Camera
With the ability to render across multiple threads, the next step is properly utilizing this feature and applying it to your threading system. This means you need to figure out how to create job packets that represent rendering work to be done by filling in the RenderCommand structures. At the most direct level, it makes sense to simply create one job packet for each draw call that would occur in your scene, thus creating one RenderCommand object per job. There are some issues with this, the foremost being that modern engines typically group many draws together into a single command buffer so that it minimizes API submission overhead and also takes advantage of things like redundant render state filtering. Creating the draw commands independently of each other robs us of the ability to take advantage of this optimization as the commands are being created.
As such, it makes sense that we still need to create draw commands in such a way to take advantage of state filtering by batching them sequentially. The difficult part about this process is determining how to properly batch up your RenderCommand objects to take advantage of this. Typically, most engine designs embrace command buffer generation for objects being rendered into the primary view, but not for subsequent things like rendering shadow maps. This results in an unbalanced throughput within your rendering pipeline, because submission of draw commands in one section is significantly faster than in others.
As an alternate view on the rendering process, a given frame of rendering in a game can be described as a grouping of cameras that all contribute their view data to the final scene. GPU-based shadow mapping is a great example of this concept in action, as it defines the same camera and render-target system that your primary view does, dealing with the same culling, redundant state overhead, and LOD. Reflective render targets, offscreen particle buffers, and tiled antialiasing systems all exhibit the same characteristics as well and can be described using the same concepts.
This effectively solves a problem for us. Because our scene can be described in terms of cameras, we can use that idea to batch our RenderCommand generation, since objects that render to the same target typically share some sort of similar state data (for instance, shadow mapping requires all objects to use a different set of shaders).
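To see why batching by camera helps redundant state filtering, consider counting the state-setting calls that would actually reach the API for a sequence of commands. This is an illustrative sketch only; the `DrawState` structure and `countStateChanges()` function are hypothetical names introduced for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical quantized state carried by each draw command.
struct DrawState
{
    int shaderId;
    int blendMode;
};

// Returns how many state-setting calls would reach the API when commands are
// submitted in order and a call is skipped if the value is already bound.
inline int countStateChanges(const std::vector<DrawState>& cmds)
{
    int changes     = 0;
    int boundShader = -1;   // -1 means nothing is bound yet
    int boundBlend  = -1;

    for (std::size_t i = 0; i < cmds.size(); ++i)
    {
        if (cmds[i].shaderId != boundShader)
        {
            boundShader = cmds[i].shaderId;
            ++changes;
        }
        if (cmds[i].blendMode != boundBlend)
        {
            boundBlend = cmds[i].blendMode;
            ++changes;
        }
    }
    return changes;
}
```

A shadow-map view in which every object uses the same depth-only shader needs its state set just once, whereas interleaving commands from different views forces a change at nearly every draw.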
Dividing our recording work by camera also provides us with a simple heuristic for scaling back workflow based upon hardware features. For instance, if we know the hardware is not fast enough to handle real-time reflections, cube-map cameras can be avoided and not processed. Another example is tiled antialiasing, a process in which subregions of the primary camera are used to generate larger images for portions of the screen, which are then downsampled and combined to generate the final image. By viewing each subregion as another camera render target, changes between antialiasing levels simply result in addition or removal of cameras from the system.
Figure 10.2 shows how a given rendering pipeline can change by breaking up command buffer generation by grouping them across cameras. Notice that the amount of work done across multiple threads increases (decreasing our overall processing time on the primary thread), but we add an additional API submit phase on the thread that owns the device.
Figure 10.2: A camera-centric design. We make better usage of the thread pool for additional processing of rendering jobs.
For modern APIs, it's worth pointing out that the RenderCommand submit phase in Figure 10.2 is significantly smaller because the command buffers are simply copied from main memory to the device. This is possible due to the fact that these APIs have their own command buffer formats, and a translation from our custom RenderCommand structure to the API's version can be trivial. For older APIs that do not support command buffers, you cannot simply translate the command buffer data to the device format. Instead, you must communicate with the API through standard function calls as you normally would, using the data in the RenderCommand structure, as shown in Listing 10.3.
RenderView Structure
At this point, we seek to fill RenderCommand structures by logically dividing them into rendering bins based upon the cameras to which they are visible. This is beneficial to us, because it will allow us to group together these draw commands, which typically will share some sort of state data (at the very least, they will all share the same camera transform matrix).
A RenderView object defines a structure that contains a linkage between a camera and its render target. More importantly, a RenderView object describes how a given render target is filled, meaning it must contain a list of all objects that are to be rendered to that given render target.
Listing 10.4 shows the comparison between a Camera structure and our RenderView structure. Notice that in general, our render view is a superset of a camera, taking into account additional graphics-related properties. In addition, note that we still contain wrapped ResourceHandle objects to represent our destination render targets. The RenderView structure also contains a list of RenderCommand pointers, which are filled in by the RenderObject objects that are visible to this view.
Listing 10.4: Comparison between a Camera structure that the simulation would use, and a RenderView structure.
struct Camera
{
    Float3 at, up, right;
    float  aspectRatio;
};

struct RenderView
{
    Camera             ViewCamera;
    Frustum            Frust;
    RenderTargetHandle DestColorRTT;
    RenderTargetHandle DestDepthRTT;

    List<RenderCommand *> RenderCommands;

    // this enumeration is very important as it defines the
    // order in which we submit render views to the API
    enum ViewTypeEnum
    {
        kVT_ShadowMap = 0,
        kVT_ReflectionMap,
        kVT_MainCamera,
        kVT_PostProcessing,
        kVT_Count
    };

    ViewTypeEnum ViewType;
};
It's worth taking a moment to discuss the ViewType member of the RenderView structure. Although we're compositing view information on multiple threads, we still require a specific resource dependency as we're submitting primitives to the API. For instance, we need to composite all the shadow map data before the primary view so that objects that use those resources can rest assured that they are available for use. To accomplish this, we must tag each RenderView object with a type so that later on, we can submit the render views to the API in a proper, serial manner. We cover the implementation of this process in more detail below.
Filling a RenderView Structure
Creating and filling the render views is a very simple process, but still requires a bit of understanding in the realm of threading, or at least an understanding of thread pools. We refer back to a concept that a large portion of your environment can be represented as a camera view, and as such, we must iterate over those object containers to create our render views.
Listing 10.5 below covers the process of creating a new RenderView structure for each camera type in your scene. This includes primary cameras, shadow maps, reflection maps, etc. Once the views have been created, we create jobs for the thread pool that fill in a given render view. At this point, the order in which the RenderView objects are assembled does not matter, since the data access patterns should already be thread safe.
Listing 10.5: Creating all the render views for a given frame.
void renderControl::CreateRenderViews()
{
    List<RenderView *> currentViews;

    // for each primary camera (this includes portal cameras)
    for (int i = 0; i < mCameras.size(); i++)
    {
        currentViews.add(new RenderView(mCameras[i],
            RenderView::kVT_MainCamera));
    }

    // for each shadow map!
    for (int i = 0; i < mLights.size(); i++)
    {
        if (mLights[i].IsShadowCasting())
        {
            currentViews.add(new RenderView(mLights[i].getShadowCamera(),
                RenderView::kVT_ShadowMap));
        }
    }

    // for each reflective target, etc...

    // now fill our render views with visible objects in a
    // threaded environment.
    for (int i = 0; i < currentViews.size(); i++)
    {
        Threadpool.QueueWork(procThreadedFillRenderView, currentViews[i]);
    }

    Threadpool.waitForWorkToFinish();
}
Once we've created our render views, we need to move forward with determining what objects to render. To do this, we must first cull the environment against the frustum of the camera owned by the render view, and then create a RenderCommand structure for each object that's visible to this camera. These RenderCommand objects can reside in a list structure that can be submitted sequentially to the API at a later time. Listing 10.6 covers this process by describing the internals of a thread procedure that creates the RenderCommand objects for a given render view. We assume that your object manager has some ability to perform frustum culling, from which you then gather the visible objects and have each one create a new RenderCommand structure.
Listing 10.6: Filling in RenderCommand structures should occur in a thread-safe manner, on a separate thread.
void renderControl::procThreadedFillRenderView(void *DataPacket)
{
    RenderView *currView = (RenderView *) DataPacket;

    // gather the objects visible to this view's camera
    List<RenderObject *> objects =
        gObjectManager.giveFrustumCollision(currView->Frust);

    for (int q = 0; q < objects.size(); q++)
    {
        RenderCommand *RC = new RenderCommand();
        objects[q]->fillCommandBuffer(RC);

        // each job appends only to the view it was handed
        currView->RenderCommands.add(RC);
    }
}
It's once again important to point out that your data access model at this point needs to be a read-only system that is thread safe. This means that while traversing your object hierarchy to cull against a camera frustum, you should be doing so in a manner that does not corrupt memory for code accessing the same data in other threads.
For the sake of thread safety and memory coherence, we allocate a new RenderCommand structure for each instance of an object for each camera to which it is visible. Our particular example uses this allocation metric for a few reasons, of which one is the assumption that you will submit your RenderCommand objects in a thread separate from the one they are allocated in, requiring frame-coherent memory. This can be a very performance-heavy process, and as such, you should keep an eye on it in case you need to implement a custom container that has faster allocation/deallocation speed. If this does not match your particular threading system, then you may be able to get by with a less dynamic model of RenderCommand allocation.
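One possible shape for such a custom container is a per-frame linear arena that hands out commands from preallocated storage and is reset once the frame has been submitted. This sketch is an assumption rather than the author's implementation: `FrameCommandArena` is a hypothetical name, the `RenderCommand` here is a trimmed stand-in, and one arena per worker thread is assumed so that `alloc()` needs no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Trimmed stand-in for the RenderCommand structure of Listing 10.1.
struct RenderCommand
{
    unsigned vertexBuffer;
    unsigned numTriangles;
};

// Linear allocator whose storage lives for one frame. Assumes one arena per
// worker thread, so no synchronization is performed.
class FrameCommandArena
{
public:
    explicit FrameCommandArena(std::size_t capacity)
        : mStorage(capacity), mUsed(0) {}

    // Returns the next free command, or nullptr when the arena is full.
    RenderCommand* alloc()
    {
        if (mUsed == mStorage.size()) return nullptr;
        return &mStorage[mUsed++];
    }

    // Called after the device thread has submitted every command.
    void reset() { mUsed = 0; }

    std::size_t used() const { return mUsed; }

private:
    std::vector<RenderCommand> mStorage;
    std::size_t                mUsed;
};
```

Allocation becomes a bounds check and an increment, and deallocation of a whole frame is a single `reset()`, which avoids per-command heap traffic.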
Submitting a RenderView to the API
Submitting a render view is a fairly straightforward process. We must simply bind the render target we're drawing to and then submit each of the commands that we have in our list to the API for drawing. Listing 10.7 shows this process, and it adds an additional set of data to indicate whether the render target should be cleared.
Listing 10.7: Serializing render views requires us to resolve them in a manner that satisfies dependencies.
void renderControl::serializeRenderViews(List<RenderView *> Views)
{
    // walk the view types in enumeration order so that dependencies
    // (such as shadow maps) are submitted before the views that use them
    for (int viewType = 0; viewType < RenderView::kVT_Count; viewType++)
    {
        for (int i = 0; i < Views.size(); i++)
        {
            if (Views[i]->ViewType != viewType) continue;

            BindRenderTarget(Views[i]->DestColorRTT, Views[i]->DestDepthRTT);

            if (Views[i]->clearTargets)
            {
                ClearTarget(Views[i]->clearFlags, Views[i]->clearColor,
                    Views[i]->clearDepths);
            }

            for (int k = 0; k < Views[i]->RenderCommands.size(); k++)
                executeDrawCommand(Views[i]->RenderCommands[k]);
        }
    }
}
In addition, Listing 10.7 highlights a very important aspect of serialization of RenderView objects, namely that some RenderView objects are dependent on others, so even though we're compositing them in multiple threads, we must still submit them in a particular order to the API. For instance, you need to composite shadow maps before you can use them in subsequent draw commands.
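The same ordering can also be obtained by sorting the views by type once before submission instead of scanning the list once per type. The following is a sketch of that alternative using the standard library, not the method of Listing 10.7; the trimmed `RenderView`, its `type` and `id` members, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum ViewType
{
    kVT_ShadowMap = 0,
    kVT_ReflectionMap,
    kVT_MainCamera,
    kVT_PostProcessing,
    kVT_Count
};

// Trimmed stand-in for the RenderView structure of Listing 10.4.
struct RenderView
{
    ViewType type;
    int      id;
};

// A view is submitted earlier if its type has a lower enumeration value.
inline bool submitsBefore(const RenderView* a, const RenderView* b)
{
    return a->type < b->type;
}

// Stable sort keeps views of the same type in their original relative order.
inline void orderViewsForSubmission(std::vector<RenderView*>& views)
{
    std::stable_sort(views.begin(), views.end(), submitsBefore);
}
```

Either approach relies on the enumeration values encoding the dependency order, so new view types must be inserted at the right position in the enum.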
10.5 Future Work
We've presented in this article a means by which to offload the overhead of creating command buffers to separate threads in a device-API-agnostic manner and use a camera-centric design to ensure proper load balancing across multiple threads. As we continue to take greater advantage of our rendering APIs for general purpose computing, the ability to properly break up work based upon packets of rendering data will continue to be important.
There are many project- and system-specific issues that are related to this process that are beyond the scope of this article. For completeness, however, we briefly describe them here and leave the nitty-gritty details as an exercise for the reader.
Sorting and Instancing
RenderCommand objects provide excellent dynamic primitives for instancing evaluation. Once all the objects for a render view have been culled and added to a RenderCommand buffer, it is trivial at that point to define multiple types of sorting operations that can organize the RenderCommand objects in the buffer in various ways. Sorting by material index, vertex data, and object type often lends itself to a sorting process that combines objects in the command buffer in such a manner that multiple draw commands can be removed and a single command that uses instancing can be used. And if you're generating your RenderCommand structures dynamically, you can easily remove the original commands and insert the new instanced version. On some hardware, this may be a more performance-friendly method of sorting by data, as your RenderCommand can make better use of processor caches and reduce memory traversal that would normally be required when walking your object list.
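The sort-then-merge idea can be sketched as follows. The `RenderCommand` used here is a trimmed stand-in with a hypothetical `instanceCount` member, and the choice of material index and vertex buffer as the batching key is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Trimmed stand-in; instanceCount is a hypothetical member for this example.
struct RenderCommand
{
    int materialIndex;
    int vertexBuffer;
    int instanceCount;
};

// Two commands can share one instanced draw if their keys match.
inline bool sameBatch(const RenderCommand& a, const RenderCommand& b)
{
    return a.materialIndex == b.materialIndex &&
           a.vertexBuffer  == b.vertexBuffer;
}

// Orders commands by material first, then by vertex buffer.
inline bool batchLess(const RenderCommand& a, const RenderCommand& b)
{
    if (a.materialIndex != b.materialIndex)
        return a.materialIndex < b.materialIndex;
    return a.vertexBuffer < b.vertexBuffer;
}

// Sorts the commands, then collapses each run of matching commands into a
// single command whose instanceCount is the sum of the run.
inline std::vector<RenderCommand> mergeIntoInstances(std::vector<RenderCommand> cmds)
{
    std::sort(cmds.begin(), cmds.end(), batchLess);

    std::vector<RenderCommand> merged;
    for (std::size_t i = 0; i < cmds.size(); ++i)
    {
        if (!merged.empty() && sameBatch(merged.back(), cmds[i]))
            merged.back().instanceCount += cmds[i].instanceCount;
        else
            merged.push_back(cmds[i]);
    }
    return merged;
}
```

A real implementation would also need to gather per-instance data such as transforms into an instance buffer, which this sketch leaves out.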
Better Load Balancing
For those of you who are more proficient at threaded architectures, it's worth noting that there's a modification to this process that can take greater advantage of your threading architecture. The practical observation is that most camera-specific RenderCommand generation processes exhibit uneven processing times. For instance, smaller viewports or different types of API commands cause less work to be done. As such, you can often wind up wasting processing time waiting for these unbalanced threads to finish.
A more advanced solution is to modify your thread procedure to create one thread job for each draw call, allowing each to be created on a separate thread, yet filling in the camera data structure atomically. This usually means that your visibility culling for each camera would need to occur on the primary thread so that you can ensure proper allocation of space in your RenderCommand list. In order to take advantage of render-state filtering, sorting, or instancing, you must perform these processes once all the individual RenderCommand objects have been properly created.
The benefit of this modification is that you've now created smaller job packets that can maximize your thread utilization over the course of your frame. This is very useful on platforms where your thread operations can be interrupted by OS events. For a further discussion on issues related to small-job packets in video games, please refer to [3].
References
[1] Vincent Scheib. "Practical Parallel Rendering with DirectX 9 and 10". GameFest 2008.
[2] Lindberg, et al. "Studies of threading success in popular PC games". Game Developers Conference, 2008.
[3] Randall Turner. "Saints Row Scheduler". Game Developers Conference, 2007.
Chapter 11 - A GPU-Managed Memory Pool, Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Chapter 11: A GPU-Managed Memory Pool
Jeremy Moore, Black Rock Studio
Overview
The PlayStation 3 and Xbox 360 game consoles both contain unified memory architectures in which the GPU can directly read from and write to CPU-accessible memory. The graphics APIs on these consoles allow graphics data to be placed anywhere in memory and provide the ability to directly create and manipulate GPU resources such as textures or vertex buffers. With this low level of control, it is natural to consider the construction of streaming systems in which we dynamically load, move, and unload the resources that the GPU renders.
Streaming systems require that data is copied into memory where it can be accessed by the GPU for rendering. One implementation option is to manage all data copying with the CPU through a traditional memcpy()-style API. Since the CPU and GPU run concurrently, we need to take care to ensure that any data is in place when the GPU is ready to read it. However, dealing with the synchronization overhead in achieving this can be complex and error prone.
In this gem, we describe an alternative solution that uses the GPU to manage the data copying. This approach ensures that the complexity and overhead of synchronizing GPU operations and data movement are vastly reduced. In addition, this approach removes the CPU costs for data copying and, on our target platforms, allows us to achieve higher data copying bandwidths.
To illustrate this approach, we outline the design of the GPU-managed memory pool used for the streaming system implemented at Black Rock Studio. This system was written for use on our racing game Split/Second.
Note that the ideas outlined here are mainly relevant for our target console platforms that give a high degree of control over GPU resources. This isn't the case on the PC platform where the graphics API necessarily abstracts such details away from us.
11.1 Background
Streaming Requirements
The game worlds in Split/Second are large and richly detailed. We cannot fit all of the graphics resources used to render them within the fixed memory space of our target console platforms. The solution for this problem is to dynamically stream into memory only the resources needed to render the part of the world nearest to the camera.
Any streaming system consists of a number of components, such as the system to load resources efficiently from disc or the logic for deciding which resources should be loaded for a given camera position. In this gem, we concentrate only on the component that manages the memory used to store loaded resources.
Resource Types
There are a number of graphics resource types that we use when rendering. In decreasing order of typical size, some examples are textures, vertex buffers, index buffers, shaders, and constant buffers. On current consoles, these resources are each made up of two parts. First, there is a small fixed-size component that is read by the CPU, which we will call the "header". Second, there is a larger variable-sized component that is read by the GPU, which we will call the "data". When submitting rendering commands, the CPU parses the header to create entries in the GPU command buffer. The command buffer then contains references to the data that the GPU reads when executing the rendering commands.
In this gem, we only consider the management of the data that is read directly by the GPU. The management of the header is a simpler problem. It generally involves fixed-size objects for which a fixed-size object memory pool [2] is a common approach.
Design Requirements
When gathering the requirements for the streaming system in Split/Second, we decided that it should be able to dynamically load and unload any GPU resource from disc. We wanted to avoid overly complex code for synchronizing the GPU with the streamed resources. We also wanted to avoid the issues that can occur with memory fragmentation when resources are continually allocated and deallocated.
11.2 The Memory Pool
At the core of our streaming implementation is the memory pool class. We define our memory pool to be a block of contiguous memory along with the logic for dynamically managing it. We use one or more memory pool objects to manage the memory in which we store the GPU data resources for rendering.
A simple design choice would be to split the memory in our pool into a number of blocks with predetermined but variable sizes. Each new allocation in the memory pool would then use one of these blocks. This technique sidesteps fragmentation issues and reduces the memory management logic to simply finding an appropriately sized empty block in which to place each asset.
This might be a good approach if we want to store only texture data, since textures tend to take one of a finite combination of sizes and surface types. However, it would waste memory when we cannot predict with good accuracy what range of block sizes we need to support. Our design requirements state that we want to support other resources in our memory pool such as vertex and index buffers. These have a large potential range of data sizes. We therefore choose to avoid any assumptions about resource size and do not use the fixed block approach to memory management.
Instead, our memory pool is made up of dynamically sized data "chunks". These chunks can be of any size, and may contain data or be empty "free chunks". Although a chunk may contain more than one item of data, each item of data is guaranteed to live within a single chunk. This means that chunks can be moved around the memory pool without breaking the internal consistency of the data that they hold.
Chunks need to be kept aligned according to platform restrictions. For example, on one target platform, all vertex buffer data may need to be aligned to 128-byte address boundaries. Often, there are different alignment restrictions for different resource types. For simplicity, all chunks can be aligned to the platform's largest data alignment restriction. For example, if our target platform also requires that textures be 1 kB aligned, then we might choose to align all chunks, including those containing only vertex buffers, to 1 kB.
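The alignment rule can be illustrated with a small helper. This is only a sketch: the names `AlignUp`, `ChunkSizeFor`, and `kChunkAlignment` are hypothetical, and the 1 kB value simply mirrors the example in the text rather than any particular platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: rounds 'value' up to the next multiple of 'alignment'.
// 'alignment' is assumed to be a nonzero power of two (for example 128 or 1024).
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment)
{
    return (value + alignment - 1) & ~(alignment - 1);
}

// Assumed pool-wide alignment: the largest restriction of any resource type
// (1 kB in the example from the text).
constexpr std::size_t kChunkAlignment = 1024;

// Space a chunk occupies when every chunk is aligned to kChunkAlignment.
constexpr std::size_t ChunkSizeFor(std::size_t dataSize)
{
    return AlignUp(dataSize, kChunkAlignment);
}
```

Aligning everything to the largest restriction wastes a little memory on small vertex buffers, but it means any chunk can hold any resource type.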
A memory pool starts out with no data in it and so contains a single empty chunk that spans the entire memory in the pool. When we add new data to the memory pool, we find an empty chunk that it fits inside. If the size of the new data is less than the size of the empty chunk, then we break the chunk in two. One chunk now contains the new data, and the other is a free chunk containing any remaining unused memory. When we remove a chunk from the memory pool, we mark it as empty and merge the resulting free chunk with any adjacent free chunks. A typical sequence of operations is illustrated in Figure 11.1.
Figure 11.1: A memory pool layout shown as it undergoes a number of allocate and free operations. Note that each data chunk is kept to the alignment shown by the dotted lines. Also note that after only a few operations, we have a fragmented memory pool.
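The split-and-merge bookkeeping described above can be sketched on the CPU side as a list of chunk records. This is a minimal illustration, not the Split/Second implementation: `Chunk`, `ChunkPool`, and the first-fit search are assumptions made for the example, and no actual memory is touched.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical chunk record: a byte range in the pool, either free or in use.
struct Chunk
{
    std::size_t offset;
    std::size_t size;
    bool        free;
};

class ChunkPool
{
public:
    ChunkPool(std::size_t poolSize, std::size_t alignment) : m_alignment(alignment)
    {
        // The pool starts as a single free chunk spanning all of its memory.
        m_chunks.push_back(Chunk{0, poolSize, true});
    }

    // First-fit allocation (an assumed policy). Returns the chunk offset, or
    // kInvalid when no single free chunk is large enough.
    std::size_t Allocate(std::size_t dataSize)
    {
        const std::size_t size = (dataSize + m_alignment - 1) / m_alignment * m_alignment;
        for (auto it = m_chunks.begin(); it != m_chunks.end(); ++it)
        {
            if (!it->free || it->size < size) continue;
            if (it->size > size)
            {
                // Split: the remainder stays behind as a new free chunk.
                m_chunks.insert(std::next(it), Chunk{it->offset + size, it->size - size, true});
                it->size = size;
            }
            it->free = false;
            return it->offset;
        }
        return kInvalid;
    }

    // Marks the chunk at 'offset' as free and merges it with free neighbours.
    void Free(std::size_t offset)
    {
        for (auto it = m_chunks.begin(); it != m_chunks.end(); ++it)
        {
            if (it->offset != offset || it->free) continue;
            it->free = true;
            auto next = std::next(it);
            if (next != m_chunks.end() && next->free)
            {
                it->size += next->size;
                m_chunks.erase(next);
            }
            if (it != m_chunks.begin())
            {
                auto prev = std::prev(it);
                if (prev->free)
                {
                    prev->size += it->size;
                    m_chunks.erase(it);
                }
            }
            return;
        }
    }

    std::size_t ChunkCount() const { return m_chunks.size(); }

    static constexpr std::size_t kInvalid = static_cast<std::size_t>(-1);

private:
    std::size_t      m_alignment;
    std::list<Chunk> m_chunks;
};
```

With a 4 kB pool and 1 kB alignment, allocating two resources and freeing the first leaves 2 kB free in total but split into two 1 kB holes, so a 2 kB request fails. That is the fragmentation the figure shows.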
11.3 Synchronization Issues
On our target platforms, the GPU executes a single command buffer stream created by the CPU. The serial execution of these commands is the key to keeping the memory pool data movement synchronized with GPU usage.
Consider the case where we have a texture in the memory pool that we use for rendering, and then wish to move it within the memory pool before using it for rendering a second time. This is a typical sequence of operations if we wish to support defragmentation of the memory pool. If we use the CPU to manage memory copying, then the CPU is forced to wait for any rendering that uses the texture to complete before moving the texture. This is difficult to synchronize correctly and efficiently. It is also intrusive in that we need to keep track of when the texture is used. Figure 11.2 shows how this approach is executed on the CPU and GPU sides.
Figure 11.2: Sequence diagram of memory movement within our memory pool using the CPU. Note how the CPU needs to wait for an indeterminate length of time before moving the texture data in the memory pool.
In contrast, if we use the GPU to manage memory copying, then the GPU executes each movement of data in the memory pool as one action in a well-ordered stream of commands. This guarantees that any rendering from the texture is complete by the time we move the data. Conversely, it guarantees that the movement of texture data is complete by the time that it is next used for rendering. Figure 11.3 shows how this approach is executed on the CPU and GPU sides.
Figure 11.3: Sequence diagram of memory movement within our memory pool using the GPU. Note that no complex synchronization is now needed between the CPU and the GPU and that the CPU can queue the data transfer immediately.
11.4 The Staging Buffer
We have shown how the GPU can control memory movement within the memory pool. We also need to deal with the case of moving new data into the memory pool. When adding new data into the memory pool, we first search for a free chunk to place it in. A chunk labeled as free in the memory pool may be in one of two states. It may be genuinely empty, or it may contain valid data that has been scheduled to be moved by the GPU. In the latter case, any data copying into the chunk by the CPU would potentially corrupt the data already in the memory pool. For this reason, only the GPU should copy new data into the memory pool. To accomplish this, we initially load any new data into a CPU-managed staging buffer and then use the GPU to copy this data into the memory pool.
This approach still requires some straightforward synchronization between the CPU and GPU. In our implementation, the staging buffer is a ring buffer structure. The ring buffer contains blocks of data that are awaiting upload to the memory pool. We can only clear each ring buffer entry when the GPU has completed copying it to the memory pool. To track the progress of copies into the memory pool, we use a fence synchronization primitive to determine when the GPU has completed processing an operation. The fence primitive is named and implemented slightly differently on different platforms. On Xbox 360 and in OpenGL, it is called a fence, but in DirectX it is called an event, and on PlayStation 3 it is called a label. On each of these platforms, we can push a fence to the GPU command buffer and poll to see whether it has been processed by the GPU. So for each ring buffer entry in our staging buffer, we place a GPU fence after the GPU copy operation. Once per frame, we check the fences to determine which copies are complete and clear the ring buffer entries accordingly. This usage is illustrated in Figure 11.4.
Figure 11.4: Staging buffer usage illustrating the use of fences to determine when a GPU copy is complete.
This staging buffer is part of the system that throttles data movement into the memory pool. In our implementation, we stream new data into the staging buffer using a background loading thread. If the data transfer into the memory pool stalls, either because of lack of space or slow defragmentation, then the staging buffer fills up and the background loading thread sleeps until space becomes available again.
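The fence bookkeeping for the staging buffer can be sketched as follows. Everything here is hypothetical scaffolding: `FakeGpu` stands in for the platform fence, event, or label by exposing the value of the last fence the GPU has passed, and `StagingRing` tracks only the number of bytes in flight rather than real wrapped offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Stand-in for a platform fence: the GPU is modelled as a counter holding the
// value of the last fence it has processed (an assumption for illustration).
struct FakeGpu
{
    std::uint64_t lastPassedFence = 0;
};

class StagingRing
{
public:
    explicit StagingRing(std::size_t capacity) : m_capacity(capacity) {}

    // Called by the loader: reserves space for a block awaiting upload.
    // Returns false when the ring is full, in which case the loader must wait.
    bool Push(std::size_t size)
    {
        if (m_used + size > m_capacity) return false;
        // A real implementation would queue the GPU copy here, followed by a
        // fence carrying the value stored in the entry.
        m_entries.push_back(Entry{size, ++m_nextFence});
        m_used += size;
        return true;
    }

    // Called once per frame: releases every entry whose fence has been passed.
    void Reclaim(const FakeGpu& gpu)
    {
        while (!m_entries.empty() && m_entries.front().fence <= gpu.lastPassedFence)
        {
            m_used -= m_entries.front().size;
            m_entries.pop_front();
        }
    }

    std::size_t Used() const { return m_used; }

private:
    struct Entry
    {
        std::size_t   size;
        std::uint64_t fence;
    };

    std::size_t       m_capacity;
    std::size_t       m_used = 0;
    std::uint64_t     m_nextFence = 0;
    std::deque<Entry> m_entries;
};
```

Because the GPU processes its command stream in order, entries complete in the order they were pushed, which is why reclaiming from the front of the queue is sufficient.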
11.5 Memory Pool Defragmentation
Since our memory pool contains many flexibly sized chunks that are added to and removed from the pool in no fixed order, it is prone to fragmentation. Using the GPU to manage our memory pool frees us from concerns about multiprocessor synchronization. This makes any defragmentation system simpler to implement.
In their GDC presentation, Balestra and Engstad [1] briefly describe a very simple algorithm that they employ to defragment streamed textures. Our initial defragmentation logic used the same approach. First, we scan through the memory pool from beginning to end until we find a free chunk. Then the next non-free chunk is moved down to fill this free chunk. The free chunk created by the move is then consolidated with any adjacent empty chunks, and the scan is continued. This defragmentation pass is run once per frame. To avoid high GPU workload in pathological situations, an upper limit is set on the total size of the data that each defragmentation pass can copy.
The advantage of this algorithm is that given a sufficient data transfer budget, we are guaranteed to tend towards a fully defragmented pool. A disadvantage is that defragmenting a single empty block at the start of an otherwise full memory pool requires an expensive copy of almost the entire memory pool.
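One pass of this scan can be sketched over an ordered chunk list. This is an illustrative reading of the description above, not the shipped code: `Chunk`, `DefragPass`, and the rule of stopping at the first move that would exceed the budget are assumptions, and only the bookkeeping is shown, with no real copy issued.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical chunk record; chunks are assumed contiguous and sorted by offset.
struct Chunk
{
    std::size_t offset;
    std::size_t size;
    bool        free;
};

// Runs one defragmentation pass and returns the number of bytes it asked to
// copy. A real implementation would queue a GPU copy at each move.
std::size_t DefragPass(std::vector<Chunk>& chunks, std::size_t copyBudget)
{
    std::size_t copied = 0;
    for (std::size_t i = 0; i + 1 < chunks.size(); )
    {
        if (!chunks[i].free) { ++i; continue; }

        if (chunks[i + 1].free)
        {
            // Consolidate two adjacent free chunks into one.
            chunks[i].size += chunks[i + 1].size;
            chunks.erase(chunks.begin() + static_cast<std::ptrdiff_t>(i + 1));
            continue;
        }

        // Stop when the next move would exceed this pass's copy budget.
        if (copied + chunks[i + 1].size > copyBudget) break;

        // Slide the used chunk down into the hole; the hole moves up.
        Chunk used = chunks[i + 1];
        Chunk hole = chunks[i];
        used.offset = hole.offset;
        hole.offset = used.offset + used.size;
        chunks[i] = used;
        chunks[i + 1] = hole;
        copied += used.size;
        ++i;
    }
    return copied;
}
```

Note that when a chunk is larger than the hole below it, the source and destination ranges overlap, so whatever performs the copy has to handle overlapping regions correctly.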
In our naive implementation, we occasionally saw small spikes in the time it took to load data into the memory pool. These generally corresponded to the poor defragmentation performance in the pathological case outlined above. A simple solution that reduced these issues to a manageable level was to break the memory pool into a number of regions. The defragmentation pass was then run for each region, but it never defragmented across region boundaries. The region size was tuned by testing to find the size that gave the best performance. This approach is illustrated in Figure 11.5.
Figure 11.5: Memory pool layout shown over a number of defragmentation passes. The shaded areas represent allocated memory chunks. Splitting the memory pool into regions can reduce the number of memory copies required during defragmentation.
11.6 Memory Pool Eviction
When the memory pool is oversubscribed in our streaming system, we often need to select a candidate for eviction. Our memory pool logic implements this by assigning each resource in the streaming system an "eviction metric". The calculation of this eviction metric is determined solely by application-specific logic. Once per frame, we evict the items in the memory pool with eviction metrics higher than the lowest eviction metrics of resources that are not in the memory pool. Because the GPU serializes all movement in the memory pool, eviction is simply a case of marking the evicted items' chunks as being free and therefore ready for reuse.
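One possible reading of the per-frame eviction rule is sketched below. The `Resource` record and `SelectEvictions` are hypothetical, and the sketch compares against the single lowest metric among non-resident resources; the actual metric and comparison are application specific.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: a higher evictionMetric means a better eviction candidate.
struct Resource
{
    int   id;
    float evictionMetric;
    bool  resident;   // true if the resource currently lives in the memory pool
};

// Returns the ids of resident resources whose metric is higher than the lowest
// metric among resources waiting to get into the pool.
std::vector<int> SelectEvictions(const std::vector<Resource>& resources)
{
    bool  haveWaiting = false;
    float lowestWaiting = 0.0f;
    for (const Resource& r : resources)
    {
        if (!r.resident && (!haveWaiting || r.evictionMetric < lowestWaiting))
        {
            lowestWaiting = r.evictionMetric;
            haveWaiting = true;
        }
    }

    std::vector<int> evicted;
    if (!haveWaiting) return evicted;   // nothing is waiting, so nothing is evicted
    for (const Resource& r : resources)
    {
        if (r.resident && r.evictionMetric > lowestWaiting)
            evicted.push_back(r.id);
    }
    return evicted;
}
```

The returned chunks only need to be marked free; no CPU/GPU synchronization is required because all later writes into them go through the GPU command stream.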
When calculating a sensible eviction metric, it is important to consider the possibility of cache thrashing. For an excellent summary of cache replacement algorithms suitable for a streaming memory pool, see [3].
11.7 Platform-Specific Considerations
PlayStation 3
The PlayStation 3 memory architecture is split into main (CPU) and local (GPU) memory. The CPU has slow access to local memory, so it is important to place the staging buffer in main memory even if the memory pool is placed in local memory.
The PlayStation 3 has a very straightforward API for carrying out GPU memory copies. An important consideration is that some care needs to be taken to ensure that a GPU memory copy of a resource is fully complete before that resource is used for rendering.
Xbox 360
The Xbox 360 has a unified memory architecture within which the GPU can be used to copy memory using its memexport API. However, when using memexport, the size of the memory pool that can be created and used is limited to 64 MB. Using the GPU to copy memory on the Xbox 360 has the advantage of a significantly higher data throughput than when using the CPU. The GPU copies do bypass the CPU caches, however, so care needs to be taken to ensure that coherency is maintained with any memory that the CPU directly accesses. For example, the staging buffer should either be in some form of non-cacheable memory, or when writing to the staging buffer, we should explicitly flush the cache lines down to memory.
11.8 Future Work
Multithreading Considerations
On our target platforms, an efficient method to balance the load of render call submission across multiple processing cores is to generate one or more command buffers per core that are combined and submitted to the GPU from a single main rendering thread. When doing this and using a GPU-managed memory pool, we need to take care that each core maintains a consistent view of the contents of the memory pool. This should be straightforward if the memory pool is updated on the main rendering thread at a time when no multithreaded submission is taking place, such as at the start or end of a frame.
Non-GPU Extensions
The pattern used in this gem might be extended to other processors that have the ability to move data in memory, but which operate concurrently with the CPU. The PlayStation 3 SPU is an example use case. This group of high-performance processors can copy memory using DMA and can be driven using an ordered queue of jobs. Many PlayStation 3 graphics engines use resources that are read only by the SPU to create input data for the GPU. The SPU could also be used to manage any memory pool containing these resources so that streaming and defragmentation logic is simplified.
Better Defragmentation
Although the defragmentation system outlined here is effective for our purpose, in the future, it would be worth researching the efficiency of other defragmentation algorithms. A more efficient algorithm would have the advantage of reducing GPU memory bandwidth and removing potential stalls when loading data into the memory pool.
Acknowledgements
I'd like to acknowledge Balor Knight and Clément Dagneau, who both helped develop and integrate some of the ideas in this article. I'd also like to remember my colleague and friend Marek Romanowski, who collaborated on some of these ideas, but who sadly passed away during the period in which I wrote this gem.
References
[1] Christophe Balestra and Pål-Kristian Engstad. "The Technology of Uncharted: Drake's Fortune". Game Developers Conference, 2008.
[2] Paul Glinker. "Fight Memory Fragmentation with Templated Freelists". Game Programming Gems 4, Charles River Media, 2004.
[3] Colt McAnlis. "Efficient Cache Replacement Using the Age and Cost Metrics". Game Programming Gems 7, Charles River Media, 2008.
Chapter 12 - Precomputed 3D Velocity Field for Simulating Fluid Dynamics. Game Engine Gems, Volume One, by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Chapter 12: Precomputed 3D Velocity Field for Simulating Fluid Dynamics
Khalid Djado and Richard Egli, Centre Moivre, Université de Sherbrooke
This gem describes a method for simulating 3D fluid dynamics by using a precomputed velocity field. First, we present a method for building the fluid velocity field by using fluid dynamics. The fluid velocity field is computed on a fixed grid in the fluid domain. Second, we present a method for simplifying the fluid dynamics by using the precomputed velocities and some heuristics. The advantage to using a precomputed velocity field is that it reduces the fluid dynamic computation time. The greater part of the fluid dynamic computation is included in the velocity field computation process, which can be performed offline.
12.1 Introduction
In recent years, much effort has been devoted to integrating real physics into virtual worlds like those found in video games. Some of these techniques, such as collision detection, are now very familiar to game developers and have been widely integrated into game physics engines. This is not the case for fluids. Simulating fluid dynamics well in virtual environments is generally a challenge. The primary difficulty is that there is no analytic solution to the Navier-Stokes equations describing the fluid dynamics. In general, the computer graphics community uses a grid or particle system to simulate a fluid. This gem uses an Eulerian 3D method on a grid to compute the fluid velocity field.
One of the first studies on simulating fluids in computer graphics was done by Foster and Metaxas [2]. Their work is based on a paper published by Harlow and Welch [3]. A great summary of work on fluids can be found in a recent book [1]. We have implemented the method of Foster and Metaxas [2] to compute the fluid dynamics, so the fluid domain is discretized into voxels. The velocity field computation process is as follows:
The velocity source (or an exterior force), which we call the blower, is placed inside the fluid domain (for example, on a selected face of a voxel).
The fluid velocity on all voxel faces in the fluid domain is calculated by solving the Navier-Stokes equations. These velocities can be stored in a file for future use, or precomputed before the simulation starts.
The use of fluid dynamics in video games is very expensive in terms of computing time. For greater performance, we precompute the physics of the fluid. The key idea is to precompute the steps that take a long time. Once the time-consuming steps in the calculation have been done, we use these data to simulate the fluid as if performing all calculations in real time. The method is thus fast, while yielding realistic and acceptable results for interactive applications such as video games. The opportunities afforded by the approach used in this article are:
Computation of the fluid velocity anywhere in the fluid domain using the precomputed velocities.
Simulation of a new velocity field using the precomputed velocities and heuristics when the blower changes intensity and direction.
Simulation of the presence of a new object or the motion of an existing object in the fluid domain using the precomputed velocities and heuristics. This allows interaction between fluid and objects.
12.2 Velocity Field Computation
The velocity field represents the fluid velocity anywhere in the simulation domain. To obtain the velocity field, we start by computing the velocity on faces and then the velocity on voxels. Each voxel has six faces. The velocities on a voxel are shown by the red lines and the velocities of faces by the green lines in Figure 12.1. We use the finite difference method described in [2] to compute the fluid velocities on faces and voxels. The blue line in Figure 12.1 represents the blower velocity introduced in the domain. The Navier-Stokes equations are
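\[ \frac{\partial \mathbf{V}}{\partial t} = -(\mathbf{V} \cdot \nabla)\mathbf{V} - \nabla P + \nu \nabla^2 \mathbf{V} + \mathbf{g}, \qquad \mathbf{V} = (u, v, w) \tag{12.1} \]
\[ \nabla \cdot \mathbf{V} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} + \frac{\partial w}{\partial z} = 0 \tag{12.2} \]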
Figure 12.1: (See also Color Plates.) Voxel and face velocities.
Using Equation (12.1), Equation (12.2), and the product rule, we obtain
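\[
\begin{aligned}
\frac{\partial u}{\partial t} &= -\frac{\partial (u^2)}{\partial x} - \frac{\partial (uv)}{\partial y} - \frac{\partial (uw)}{\partial z} - \frac{\partial P}{\partial x} + g_x + \nu \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \right) \\
\frac{\partial v}{\partial t} &= -\frac{\partial (vu)}{\partial x} - \frac{\partial (v^2)}{\partial y} - \frac{\partial (vw)}{\partial z} - \frac{\partial P}{\partial y} + g_y + \nu \left( \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} + \frac{\partial^2 v}{\partial z^2} \right) \\
\frac{\partial w}{\partial t} &= -\frac{\partial (wu)}{\partial x} - \frac{\partial (wv)}{\partial y} - \frac{\partial (w^2)}{\partial z} - \frac{\partial P}{\partial z} + g_z + \nu \left( \frac{\partial^2 w}{\partial x^2} + \frac{\partial^2 w}{\partial y^2} + \frac{\partial^2 w}{\partial z^2} \right)
\end{aligned}
\tag{12.3}
\]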
These equations were used by Foster and Metaxas [2] and are used for the implementation in this gem. The quantities appearing in the equations are summarized as follows:
u, v, and w are the velocities of the fluid on a face or a voxel in the x-, y-, and z-axis directions, respectively.
P is the internal pressure of the fluid (formally, P is the pressure divided by the density of the fluid [4]).
ν is the kinematic viscosity.
g is the acceleration of gravity.
Solving Equation (12.3) by the finite difference method gives the new velocity on each face of the fluid domain, like the velocities in green shown in Figure 12.1. These solutions can be found in [2] and in the source code provided with this gem. Note that the components of the velocity for faces perpendicular to the three axes (x, y, and z) are computed separately. Equation (12.2) is used to compute the null divergence of the fluid velocity. This process leads to fluid velocity correction and, finally, the pressure update.
To ensure the stability of the solver, the time step Δt and velocities need to satisfy the condition
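\[ \max\left( |u| \frac{\Delta t}{\Delta x},\; |v| \frac{\Delta t}{\Delta y},\; |w| \frac{\Delta t}{\Delta z} \right) < 1, \tag{12.4} \]
where Δx, Δy, and Δz denote the voxel dimensions, following the stability condition given in [2].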
The steps taken by the solver are summarized in Listing 12.1.
Listing 12.1: Pseudocode of the full solver.
 1  Simulate(const float& deltaT)
 2  {
 3      // Reset Velocities (Boundary and Blower)
 4      ResetBoundaryAndBlower();
 5      // Compute New Face Velocities on each voxel with Equation (12.3)
 6      for (int voxelID = 0; voxelID < m_voxelNumber; voxelID++)
 7      {
 8          UofFaceVelocity(voxelID, deltaT);
 9          VofFaceVelocity(voxelID, deltaT);
10          WofFaceVelocity(voxelID, deltaT);
11      }
12      // Null Divergence step using Equation (12.2)
13      float maxDivergence = s_epsilon;
14      while (maxDivergence >= s_epsilon)
15      {
16          float currentMaxDivergence = -INFINITE;
17          for (int voxelID = 0; voxelID < m_voxelNumber; voxelID++)
18          {
19              float divergence = 0.0F;
20              ComputeDeltaPressure(voxelID, deltaT, &divergence);
21              if (divergence >= currentMaxDivergence)
22              {
23                  currentMaxDivergence = divergence;
24              }
25          }
26          maxDivergence = currentMaxDivergence;
27          // Pressure Update
28          UpdatePressure();
29      }
30      // Compute Voxel velocities by interpolation
31      ComputeVoxelVelocities();
32  }
To simplify the fluid dynamics, we precompute the velocities of each face for a single blower in different directions such as the positive x direction (see Figure 12.1). These precomputed velocities can be stored in memory before using the simplified simulation with heuristics. They could also be saved on disk and then loaded into memory when needed.
12.3 Physics Simplification
Using the full fluid dynamics solver as presented in Section 12.2, we can change blower direction and move an object. In this section, we present heuristics that allow us to simulate the presence and motion of an object and also to change the blower velocity direction without using the full solver. We have two kinds of heuristics: the "obstacle heuristic" and the "blower heuristic".
Obstacle Heuristic
With full calculation, an obstacle such as the black box shown in Figure 12.2 requires us to recalculate all quantities (velocities and pressure) in the fluid domain at each time step. The presence of an obstacle in the fluid domain entails a distortion in the velocity field. We simplify the full solver by not computing lines 4 to 11 in Listing 12.1 and by setting the number of iterations for the null divergence step. In fact, the divergence minimization process from lines 16 to 29 is in some cases performed more than 10 times when we set s_epsilon to 0.0001. In the case of the "obstacle heuristic", we set the number of iterations to around 2 to make the process faster.
Figure 12.2: Images of the velocity field: (a) from the full solver; (b) from the obstacle heuristic.
The new velocity is computed by the interpolation described below. Let V be the current velocity on a face, let Vp be the precomputed velocity on the same face, and let c be a constant that is set to determine how fast the velocity field is returned to the precomputed state when an obstacle is removed. Δt is the time step. The interpolation is calculated using
\[ V \leftarrow (1 - c\,\Delta t)\,V + c\,\Delta t\,V_p \tag{12.5} \]
The computation of a null divergence field allows velocity modification around the obstacle. Null divergence means that the fluid flowing in equals the fluid flowing out. In fact, the velocities on the faces of the obstacle are zero, so the null divergence ensures that the fluid on the neighboring voxels gets around the obstacle.
The "obstacle heuristic" steps are summarized in Listing 12.2.
Listing 12.2: Pseudocode of the obstacle heuristic.
SimulateObstacleHeuristic(const float& deltaT)
{
    // Update Velocities using the Precomputation
    // for each face of each voxel
    FaceVelocity = (1.0F - cstObstacle * deltaT) * FaceVelocity
                 + (cstObstacle * deltaT) * PreCompFaceVelocity;

    // Null divergence step in Precomputed Version
    PreCompDivergence(deltaT);

    // Compute Voxel velocities by averaging
    ComputeVoxelVelocities();
}
The main advantage of using the "obstacle heuristic" is that we are able to add and move objects in the fluid domain, starting from the precomputed velocity field without any obstacle. The results obtained with the heuristic (see Figure 12.2(b)) are similar to those of the full Navier-Stokes solver (see Figure 12.2(a)), the advantage being that the process is at least twice as fast.
Blower Heuristic
In the fluid simulation, any change in the blower velocity direction means all quantities (velocities and pressure) need to be recalculated in the fluid domain at each time step. We simplify the full solver to be able to simulate a dynamic blower.
By observing fluid simulation and changes in velocity, we notice that the velocity is linear with the norm of the blower velocity. This means that if we double the velocity for the blower, the resultant velocities on the other voxels are also doubled. We also notice that when the blower changes direction, each velocity of a voxel changes direction. To be able to simulate the same effect, we use precomputed velocities by aiming the blower in the directions of the positive and negative coordinate axes. In this article, we use blower directions only in two dimensions along the x- and y-axes, but the method can be generalized to three dimensions. We have to precompute the velocity field for the blower in directions (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0). Let θ be the angle between the new blower direction and the vector (1, 0, 0). For θ between θ1 and θ2, we proceed as follows:
We identify θ1 and θ2 according to θ. For example, θ1 = 0 and θ2 = 90 if θ is between 0 and 90 degrees.
We compute the interpolation weight f = (θ - θ1) / (θ2 - θ1).
Vp1 is the precomputed face velocities for θ1.
Vp2 is the precomputed face velocities for θ2.
Each face velocity V is computed by V = (1 - f) Vp1 + f Vp2.
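The steps above can be sketched as a blend between the two precomputed fields that bracket the blower angle. This is an illustration only: `Vec3`, `PrecomputedFields`, and `BlowerHeuristic` are hypothetical names, and the fields are assumed to be stored in angular order (+x, +y, -x, -y, that is 0, 90, 180, and 270 degrees) with identical sizes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical container: face velocities precomputed with the blower aimed
// along +x (0 deg), +y (90 deg), -x (180 deg), and -y (270 deg).
struct PrecomputedFields
{
    std::vector<Vec3> field[4];
};

// Blends the two precomputed fields that bracket 'thetaDegrees'.
std::vector<Vec3> BlowerHeuristic(const PrecomputedFields& pre, float thetaDegrees)
{
    // Wrap the angle into [0, 360].
    float theta = std::fmod(thetaDegrees, 360.0f);
    if (theta < 0.0f) theta += 360.0f;

    const float t = theta / 90.0f;
    const int   s = static_cast<int>(t);   // theta1 = 90 * s
    const float f = t - static_cast<float>(s);   // (theta - theta1) / (theta2 - theta1)

    const std::vector<Vec3>& vp1 = pre.field[s % 4];
    const std::vector<Vec3>& vp2 = pre.field[(s + 1) % 4];

    std::vector<Vec3> result(vp1.size());
    for (std::size_t i = 0; i < vp1.size(); ++i)
    {
        result[i].x = (1.0f - f) * vp1[i].x + f * vp2[i].x;
        result[i].y = (1.0f - f) * vp1[i].y + f * vp2[i].y;
        result[i].z = (1.0f - f) * vp1[i].z + f * vp2[i].z;
    }
    return result;
}
```

Since the text observes that velocities scale linearly with the norm of the blower velocity, a change in blower intensity could be handled by multiplying the blended result by a scalar.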
This heuristic is an approximation of the velocities yielded by the full solver version. Figure 12.3 shows a comparison of velocities from the "blower heuristic" and the full solver. The simulation using the heuristic is more than twice as fast as the full solver.
Figure 12.3: Images of the velocity field: (a) from the full solver; (b) from the blower heuristic.
12.4 Results and Discussion
To visualize the fluid velocity field, the velocities on voxels can be displayed as vectors. We are also able to visualize the velocity field using unit vectors for the direction and colors for the magnitude. For example, the velocity can be depicted in blue for a high magnitude or red for a low magnitude. It is also possible to visualize the velocity field with particles moving in the fluid domain. Figure 12.4 shows examples of visualization methods.
Figure 12.4: (See also Color Plates.) Images of the velocity field visualization using heuristics: (a) with vectors without an obstacle; (b) with vectors with an obstacle; (c) with unit vectors for the direction and color for the magnitude with an obstacle; (d) with an obstacle and particles.
To illustrate how the heuristics work, an implementation is provided on the accompanying CD. The program is written in C++ and uses OpenGL to display the 3D scene. The user must set the resolution of the fluid domain in terms of number of voxels. In the case of Figure 12.4, the fluid domain has 17 × 17 × 17 voxels and the blower (in blue) is at the voxel position (8, 8, 8). The grid in Figure 12.4 represents only the voxels with z = 8 in the fluid domain, since it is not easy to visualize the velocities with vectors when all 17 × 17 × 17 voxels are displayed in a still image.
The full solver frame rate is around 172 FPS on a laptop equipped with an Intel CPU T2400 at 1.83 GHz (dual core), with no parallelism in the simulation and visualization processes. The same scene using the "blower heuristic" (see Figure 12.4(a)) allows a frame rate of around 392 FPS. We get 331 FPS with the two heuristics (see Figure 12.4(b)).
Some other optimizations are possible in a game physics context. For example, we don't have to update the velocity field when the blower doesn't change and the obstacle doesn't move for a certain time. The heuristics of this article can be simply added to an existing game physics engine. The velocities can be precomputed at setup or loaded from a file.
References
[1] R. Bridson. Fluid Simulation for Computer Graphics. AK Peters, 2008.
[2] N. Foster and D. Metaxas. "Realistic animation of liquids". Graphical Models and Image Processing, Volume 58, Number 5 (September 1996), pp. 471–483.
[3] F. H. Harlow and J. E. Welch. "Numerical calculation of time-dependent viscous incompressible flow". Physics of Fluids, Volume 8, Number 12 (1965), pp. 2182–2189.
[4] L. Quartapelle. Numerical Solution of the Incompressible Navier-Stokes Equations. Springer, 1993.
Chapter 13 - Mesh Partitioning for Fun and Profit. Game Engine Gems, Volume One, by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Chapter 13: Mesh Partitioning for Fun and Profit
Jason Hughes, Steel Penny Games, Inc.
Overview
There are many situations in which an entire mesh is too much data to process. Whether it's a CPU, SPU, or GPU, there are performance limitations to consider. In general, artists can do this work by hand, but human variability being what it is, a good algorithm is faster and more reliable, and it improves artist productivity. A key trait of a solid tools pipeline is its ability to free the artists from such burdens anyway. There is a situation in almost any 3D game in which one mesh really would work better as many smaller chunks that can be uniquely identified by the CPU and processed independently from others.
Specifically, what kind of limitations do real-world games run up against?
Rendering limitations, such as 8-bit or 16-bit indices limiting the number of vertices that can be put into a mesh.
Skinning limitations, such as the number of matrices a particular graphics chip is able to express with shader constants, or perhaps older fixed-function pipelines that have a hard limit for the number of matrix indices per render call.
Vertex unit bottlenecks, where a mesh has a few visible triangles but the majority are being transformed and rejected by the clipping or backfacing unit. The bandwidth for transferring vertex data to the GPU is significant on certain architectures, and the vertex shader is potentially a bottleneck as well.
Virtually any other operation that has a per-triangle component could potentially be improved by partitioning a mesh, especially if there are trivial rejections that could be performed on those partitioned mesh fragments as a whole.
13.1 Desirable Algorithm Properties
Now that we have some understanding of when splitting meshes into reasonable-sized chunks is helpful, is any partitioning mechanism good? If not, what are the ideal properties of a battle-hardened mesh partitioner? The following is an unorganized list of properties that I have determined through experimentation. Other properties may exist for certain kinds of games. Certain differences in hardware may shift the importance of some properties. Use your best judgement. Once you have a set of properties, you can define an objective fitness metric for how well the algorithm is performing. A fitness metric is crucial during the experimentation phase of algorithm design because otherwise, determining whether changes are beneficial, detrimental, or irrelevant is very time consuming and subjective.
Partitions Should Have Relatively Same-Sized Bounding Volumes
Rationale: This improves culling performance since regularly sized partitions have a more predictable overhead per partition. It also means a rasterized set of triangles is likely to have a more consistent pixel throughput per draw call, allowing you to balance per-partition work versus per-triangle work by adjusting the maximum bounding volume.
Partitions Should Have Relatively the Same Number of Triangles and Vertices
Rationale: This improves predictability of bandwidth and transfer times to dedicated processing units like the SPU by simplifying buffer management. It also smooths out performance spikes that may otherwise occur. It's good for leveling out DMA transfer performance and improving the culled-to-rendered vertex ratio, which reduces total bandwidth to the GPU.
The Number of Partitions Should Be Minimized Overall
Rationale: Management costs per partition are often high, so reducing the number of partitions can only improve CPU performance. However, if this property is overemphasized, you end up with a single partition containing all the geometry, which is no partitioning at all.
The Number of Triangles Per Vertex in Each Partition Should Be Maximized
Rationale: A triangle requires three vertices. If you insert an adjacent triangle (sharing an edge) in the same partition, the new triangle only adds one vertex. However, the same adjacent triangle placed in a different partition creates three vertices. Obviously, you want to ensure as much sharing of vertex data as possible. Some duplication of vertex data is unavoidable because partitions naturally separate adjacent triangles along their borders. Another way to describe this is minimizing the number of borders between partitions, but that is more complicated to measure.
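The vertex-sharing argument can be made concrete by counting how many vertices a triangle would add to a partition. This is a small sketch under assumptions: `Triangle` as three vertex indices and `NewVertexCount` are hypothetical, and triangles are assumed non-degenerate (three distinct indices).

```cpp
#include <array>
#include <cassert>
#include <unordered_set>

// Assumed layout: a triangle is three indices into a shared vertex array.
using Triangle = std::array<int, 3>;

// Number of vertices that adding 'tri' would introduce to a partition whose
// current vertex set is 'partitionVertices'. The result is between 0 and 3.
int NewVertexCount(const std::unordered_set<int>& partitionVertices, const Triangle& tri)
{
    int count = 0;
    for (int v : tri)
    {
        if (partitionVertices.count(v) == 0) ++count;
    }
    return count;
}
```

An edge-sharing neighbour costs one vertex, while an unconnected triangle costs three, so a partitioner that prefers low-cost triangles tends to maximize triangles per vertex.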
The Partitioner Should Guarantee a Solution With the Previously Described Properties in Predictably Bounded Time
Rationale: As a practical matter, it is unacceptable for a partitioning to take more than a few seconds because it hurts artists' ability to iterate. This demands a few data structures, some diligence about avoiding any possibility of infinite loops, and some thought to worst-case scenarios that can "never happen".
13.2LessonsLearned
Myexperiencewithbuildingmeshpartitionersoverseveralyearsledmedownmanydeadends.HerearethebiggestmistakesthatImade,innoparticularorder,andwhatmadethempoorchoices.
Meshdataisnottobetrustedfromanysource.Thelikelihoodofanyassumptionregardingtriangularmeshconditioningwillbeprovenerroneousasymptoticallyapproachescertaintynearimportantmilestones.Youcanneverhaveenoughassertsinyourcodetohelpdiagnosetheseissues.Intheend,mypartitionergeneratedacompletelydifferenttopologicalrepresentationofthetriangledatatobeusedsolelyforpartitioning.
Addingatriangletoanexistingpartitionmayaddbetweenzeroandthreenewvertices.Partitionsholdreferencestotriangles,whichimplyvertices.Donottrytobuildpartitionsoutofvertices,andconstructthetrianglesetfromthem.Youendupwithduplicatedtriangles,heavilydisproportionatepartitions,andallmannerofotherissues.Sincetheimportantmeasurementofapartitionishowmanyverticesareinsideit(andwhetheranewtriangleaddsanynewvertices),youmustcomputefinalvertexsharingbeforeenteringthepartitioner.
Donotbuildapartitionerthateversubdividesandmergesthesamepartitioninalternation.Dooneortheother,ordotheminsequence,butdonotalternate
betweenthem.Thereareunforeseeableinfiniteloops,nomatterhowyoucraftthelogictopreventitfromhappening.
Mergingpartitionsseemslikeagoodidea,atfirst.InitializingonepartitionpertriangleandmergingnearestneighborswasonemethodIdiscardedquickly.ItisessentiallyKruskal'sminimumspanningtreealgorithm[1].However,Kruskal'sapproachcannotdealwithuser-definedlimitsforpartitions,andbreaksdownquicklyonceyouhaveanentirepopulationofpartitionsthatare51%ofyourthreshold.Atthispoint,merginganytwotogethermeansyouhavetinyleftoversthatneedtobeputsomewhereelse—thismeansmergingandpartitioninginalternation,causinginfiniteloops.Alternatively,youcanleavethoseabandonedtrianglesforafinalpartition,whichinvariablyviolateseveryimportantpropertydescribedabove.
Assurethatthefitnessmetric,afunctionresponsibleformeasuringthecurrent"fullness"ofapartition,ismonotonicallyincreasingastrianglesareadded.Toexplainfurther,onealgorithmIattemptedworkedbystealingtrianglesfromneighboringpartitionswhentheneighborwaslarger.Thefunctionthatselectswhichpartitiontostealfromneglectedtotestthefitnessofsourceandtargetpartitionsaftertransferringatriangle.Certainpartsofthedatawascompressed,andthesizeofthecompresseddatadependedontherangeofvaluespresentinthedataitself.Mostofthetime,removingdatafromapartitioncauseditscompiledpacketsizetoshrink,butoccasionally,itwouldgrow.Asaresult,asituationwouldoccurwhereAstealsfromB,
thenBstealsfromA,becausewhicheverdirectionthetrianglemoved,thesizesoffinaldatapacketswouldflip-flop.
Design for clear termination conditions and steady performance. I tried partitioning in which I randomly assigned triangles to partitions, then "shoved" poorly connected triangles to adjacent partitions when the move would benefit both partitions, whether in bounding volume reduction, packet size, or other metrics. It eventually degenerated into a pseudo-linked-list traversal per triangle, where one triangle is pushed to a neighbor that now violates some metric limit and must in turn push a different triangle to another adjacent partition, and so on. Even when marking a path behind you to prevent loops, this tends to create huge linked-list traversals through all partitions and slows down dramatically as the partitions begin to converge.
Use appropriate data structures, and find ways to cache or reduce lookups for relationships that are costly to determine. My favorite is the simple adjacency list representation for graphs. It is trivial to implement and very easy to use when debugging algorithms.
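As a minimal sketch of such an adjacency list, assuming triangles are stored as tuples of vertex indices and that two triangles count as neighbors when they share at least one vertex (the function name `build_triangle_adjacency` is illustrative, not taken from the gem):

```python
from collections import defaultdict


def build_triangle_adjacency(triangles):
    """Map each triangle index to the set of triangles sharing a vertex with it."""
    # First pass: record which triangles touch each vertex.
    vertex_to_tris = defaultdict(set)
    for t, tri in enumerate(triangles):
        for v in tri:
            vertex_to_tris[v].add(t)
    # Second pass: triangles that meet at a vertex are neighbors of each other.
    adjacency = {t: set() for t in range(len(triangles))}
    for tris in vertex_to_tris.values():
        for t in tris:
            adjacency[t] |= tris - {t}
    return adjacency


# Two triangles forming a quad, plus one that touches the quad at vertex 3.
mesh = [(0, 1, 2), (1, 3, 2), (3, 4, 5)]
adj = build_triangle_adjacency(mesh)
```

Building the vertex-to-triangle map once keeps later neighbor queries to a dictionary lookup instead of a scan over the whole mesh.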
Ask the right question. For partitioning, that question is "Which triangle should I put in the current partition now?". Many of my attempts were trying to decide "Which triangle would be better to move from partition A to partition B?", "Which partition should grow?", or "Which partition should shrink?". Involving relative decisions about the quality of partitions never materialized into a concrete and usable system for partitioning.
Always select the least-connected triangle as the starting point for new partitions. This discourages your final partition from including a large number of scattered "loner" triangles that no other partition wanted. The bounding sphere of such a partition would be very large and the rendering very inefficient. This same rule applies for triangle stripping algorithms, for the same reasons.
Graph theory helps. Read up on Prim's algorithm [2]. It is a template for how this mesh partitioner works. However, do not be tempted to follow Prim's algorithm exactly as described. It suggests a priority queue for candidate edges, but does not allow for reprioritizing adjacent nodes in the queue. Mesh partitions have potentially shared data between triangles, so it is likely that each additional triangle added to a partition changes the cost calculation for every candidate triangle. Placing triangles into a queue with a fixed priority clearly does not work for this kind of problem. This means you must scan the candidate triangle list every iteration and recompute the cost to find the best candidate.
It is very hard to come up with a local metric for packing faces into tight clusters. If you put no restrictions on the closeness of triangles, you get long strips with large bounding spheres that poorly approximate the partition. If you measure relative to a centroid of faces, you get partitions that are very densely packed, but may not share vertices well (imagine three parallel planes intersecting a sphere, where triangles inside the sphere are close together, but have a lot of boundary vertices that are only used once). Just after I'd solved this problem, I read a paper [3] that expressed the same solution only a couple of years earlier. As with most discoveries, in retrospect, the solution is obvious: minimize the distance between related triangle centers.
13.3 When Greedy Is Good
Since there are so many choices that can be made while writing a mesh partitioner, it's surprising how simple a good one can be. Although the following is a greedy algorithm, the results are initially good, and a simple refinement step afterwards can improve the partition efficiency by two to three percent. This section describes what I found to work well.
Core Algorithm Overview
1. Select the least connected [1] unassigned triangle and initialize a new partition with it. If no triangle is unassigned, partitioning is complete.
2. Collect all the unassigned triangles that are related [2] to the partition into a candidate list. If no related triangles exist, perform an exhaustive search to find all minimally connected triangles in the unassigned mesh, and consider them all to be candidates.
3. Iterate over the candidate list of triangles, temporarily adding each one to the partition, and compute the fitness metric of the partition using an objective metric function. Immediately reject any candidate triangle that causes a hard threshold limit to be exceeded, e.g., vertex count limit, final packet size [3], maximum bounding sphere.
4. If at least one candidate triangle was not rejected, select the triangle that yielded the best fitness score for the partition and add it to the partition. Repeat from Step 2.
5. Otherwise, there are no triangles that can be added to the partition without exceeding some threshold (bytes, number of triangles, bounding sphere, etc.). Consider this partition full and start a new one. Repeat from Step 1.
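The five steps above can be sketched as follows. This is a simplified illustration, not the author's implementation: it assumes triangles are tuples of vertex indices, uses a vertex count limit (`max_vertices`, a hypothetical parameter) as the only hard threshold, and scores candidates by the resulting vertex count alone, leaving out bounding volumes, packet size, and center-to-center distance.

```python
def connectivity(t, unassigned, triangles):
    """Number of other unassigned triangles sharing a vertex with triangle t."""
    verts = set(triangles[t])
    return sum(1 for u in unassigned if u != t and verts & set(triangles[u]))


def partition_mesh(triangles, max_vertices):
    """Greedy partitioning; returns a list of partitions (lists of triangle indices)."""
    unassigned = set(range(len(triangles)))
    partitions = []
    while unassigned:
        # Step 1: seed a new partition with the least-connected unassigned triangle.
        seed = min(sorted(unassigned),
                   key=lambda t: connectivity(t, unassigned, triangles))
        unassigned.remove(seed)
        part, verts = [seed], set(triangles[seed])
        while unassigned:
            # Step 2: candidates are unassigned triangles related to the partition.
            candidates = [t for t in sorted(unassigned) if verts & set(triangles[t])]
            if not candidates:
                # Fallback: all minimally connected unassigned triangles.
                degree = {t: connectivity(t, unassigned, triangles) for t in unassigned}
                lowest = min(degree.values())
                candidates = [t for t in sorted(unassigned) if degree[t] == lowest]
            # Step 3: rescore every candidate on every pass; reject limit violations.
            best = None
            for t in candidates:
                cost = len(verts | set(triangles[t]))
                if cost > max_vertices:
                    continue
                if best is None or cost < best[0]:
                    best = (cost, t)
            if best is None:
                break  # Step 5: nothing fits, so this partition is full.
            # Step 4: commit the best candidate and repeat from Step 2.
            part.append(best[1])
            verts |= set(triangles[best[1]])
            unassigned.remove(best[1])
        partitions.append(part)
    return partitions


# A strip of four triangles split under a four-vertex limit.
strip = [(0, 1, 2), (1, 3, 2), (2, 3, 4), (3, 5, 4)]
parts = partition_mesh(strip, max_vertices=4)
```

Note that the candidate list is rebuilt and rescored on every pass rather than kept in a fixed-priority queue, since each added triangle can change the cost of every remaining candidate.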
Refinement
After all triangles have been assigned to partitions, the last partition is, on average, half the size of the others. Even if this is acceptable, you might want to perform a refinement on the partitions. You can typically run the refinement multiple times to level out some of the partitions and reduce the total size of the solution. To refine, follow these steps for each partition:
1. Determine if any triangle can be moved from partition A to partition B, where A loses some vertices and B gains none. This is a clear win for memory and performance, but may distort bounding volumes or cause the triangle counts across partitions to become unbalanced. This is demonstrated in step 14 of the graphical walkthrough at the end of this gem.
2. Determine if any triangle can move from partition A to partition B, where B has fewer triangles, even if the number of vertices remains the same in both partitions. This does not save any vertices, but it can be done to improve the bounding volume or the triangle count balance.
3. As a last resort, determine if any one vertex can be moved from partition A to partition B, possibly moving several triangles to B in the process, so that A has fewer vertices and triangles, and B has more. This is particularly useful when filling out the final partition, because the final partition is on average half the size of the others.
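The first refinement test can be sketched as below, under the same assumptions as before (triangles are tuples of vertex indices, partitions are lists of triangle indices). The helper names are hypothetical, and the threshold checks a real partitioner would also apply to B are left out.

```python
def vertex_set(part, triangles):
    """All vertex indices referenced by the triangles in a partition."""
    return {v for t in part for v in triangles[t]}


def free_moves(part_a, part_b, triangles):
    """Triangles that can move from A to B so that A loses vertices and B gains none."""
    verts_b = vertex_set(part_b, triangles)
    moves = []
    for t in part_a:
        tri = set(triangles[t])
        if not tri <= verts_b:
            continue  # B would have to gain at least one vertex.
        rest_of_a = vertex_set([u for u in part_a if u != t], triangles)
        if tri - rest_of_a:
            moves.append(t)  # A drops every vertex used only by this triangle.
    return moves


triangles = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (1, 3, 4)]
a_to_b = free_moves([0, 1], [2, 3], triangles)
b_to_a = free_moves([2, 3], [0, 1], triangles)
```

Here triangle 1 can leave A for free: B already holds vertices 1, 2, and 3, and A no longer needs vertex 3 once the triangle is gone.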
[1] Least connected, in this context, means the unassigned triangle that has the fewest shared vertices with other unassigned triangles. This is a dynamic property of a triangle during partitioning, and cannot be precomputed.
[2] Related, as defined for partitioning, is the statement that vertex data within a partition could be shared with adjacent unassigned triangles. This property forms the basis of candidate triangles for inclusion in a growing partition.
[3] In most modern graphics engines, a single batch of geometry sent to the GPU is called a packet. The size of this packet is often hard-limited at a specific number of bytes dictated by hardware or engine software design constraints.
13.4 Future Work
There are many interesting applications that a partitioner can be used for, once you have it. What kinds of rapid backface culling could you do on a large mesh if you partitioned it into triangles that all have relatively the same face normal with minor deviations? The speedup from bulk triangle rejection could be dramatic, without having to even transmit the triangles to the GPU.
Similarly, there is value in extending the partitioner to pay attention to the size of packets generated so that they fit inside a fixed memory size. This fixed size might make data management simpler and faster because less bookkeeping is necessary, particularly for an architecture like the PlayStation 3 where SPU memory is tight and double-buffering DMA is mandatory for best performance. Dynamically managing memory is slower and more complicated than using fixed buffers, so leaving room in your implementation to extend the fitness function of a partition in arbitrary ways is valuable.
It is relatively straightforward to convert a partitioner into a hill-climbing partitioner by remembering the order that faces were selected for inclusion in partitions, perturbing that order slightly, and re-evaluating the results. By recording the best sequence, you can use that as the basis for perturbations. It is unlikely to perform significantly better than the core algorithm, but as slight as the improvements may be, if squeezing out the absolute best performance and smallest memory footprint is your objective, it may be worth doing.
13.5 Graphical Walkthrough
Here follows a simple graphical walkthrough that shows a partitioning into two equal parts. The diagrams show unassigned triangles, assigned triangles, and candidate triangles for inclusion in different shadings for clarity.
Step 1. A simple input mesh. Triangulation is not strictly necessary, but your implementation will be far simpler, and likely have fewer bugs and better performance as a result.
Step 2. The top-left corner is one of the least connected triangles in the mesh. We initialize the first partition with it. Next, we discover the candidate triangles that have vertices in common with the current partition. This is easily done by creating a map of shared vertices to related triangles.
Step 3. Next, one of the candidates is selected based on minimum bounding radius, distance between the candidate triangle and its related triangle in the partition (related through the sharing of vertices, not edge connectivity), or other factors.
Step 4. Note how candidate triangles may not have edges in common with the current partition.
Step 5. The newly added triangle was a good choice because it minimized the bounding volume of the current partition and only added one vertex.
Step 6. Let's pretend that we cannot add any new triangles because some threshold has been reached. So, we search all unassigned triangles, not necessarily for one that is near the current partition, but for one that is least connected to the remaining unassigned triangles in the mesh.
Step 7. Here, we have selected a triangle that has only two related triangles. Every other unassigned triangle is related to at least three others. Remember, relationship is determined by vertex data sharing, not edges.
Step 8. Two vertices are shared with this triangle, so adding it only costs one new vertex.
Step 9. This triangle only requires one new vertex.
Step 10. This triangle increases the bounding sphere minimally, and only adds one new vertex.
Step 11. This triangle adds one new vertex, and is closer, measured by distance between face centers, to the related triangle than the similar triangle not selected above.
Step 12. This triangle adds one new vertex.
Step 13. We've run out of triangles to assign. The partitions aren't perfectly balanced, so let's run a refinement step to improve the balance.
Step 14. During the refinement process, both partitions are examined for triangles that would prefer to be somewhere else, at no cost. Since that situation does not occur, the only way to balance the partitions is to move Vertex 6 to the first partition along with the triangle indices for Triangle 4. Provided both resulting partitions still fit under their thresholds (bounding radius, vertex count, triangle count, etc.), this is done.
Note that Triangle 7 now can be moved freely between the partitions without changing the vertex count. In this case, there's no reason to do so since the partitions are balanced and vertex count cannot be reduced.
References
[1] J. B. Kruskal. "On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem". Proceedings of the American Mathematical Society, Volume 7, Number 1 (February, 1956), pp. 48–50.
[2] R. C. Prim. "Shortest connection networks and some generalizations". Bell System Technical Journal, Volume 36 (1957), pp. 1389–1401.
[3] P. V. Sander, Z. J. Wood, S. J. Gortler, J. Snyder, and H. Hoppe. "Multi-Chart Geometry Images". Proceedings of the 2003 Eurographics Symposium on Geometry Processing, 2003.
Chapter 14 - Moments of Inertia for Common Shapes. Game Engine Gems, Volume One, by Eric Lengyel (ed.), Jones and Bartlett Publishers © 2011
Chapter 14: Moments of Inertia for Common Shapes
Eric Lengyel, Terathon Software
Overview
The moment of inertia is an important quantity in rigid body dynamics. It's the rotational analog of mass, and it describes how difficult it is to change the angular velocity of an object. The formula used to calculate the moment of inertia I about a particular axis is the integral
$$I = \int_V r^2 \, dm, \tag{14.1}$$
where dm is a differential mass at some point inside the body, r is the distance from that point to the axis of rotation, and V represents the set of points making up the entire body. Although this integral looks very simple, the shape of the volume V often turns the integral into something very complicated. Instead of laboriously evaluating a complicated integral, one may choose to look up the moment of inertia for a particular shape, but existing references can be difficult to find, and those that do exist are sometimes inaccurate or incomplete. This gem provides the derivations of the moments of inertia for a variety of common shapes and summarizes them in a handy reference table.
14.1 Center of Mass
In a rigid body simulation, it is most useful to know the moment of inertia for an object about its center of mass because that is the point about which the object naturally rotates. In order to calculate the center of mass for an object, we first need to be able to calculate the object's mass. If we consider an object to be composed of a large number of particles, then its total mass m is simply the sum of the masses $m_k$ of those particles:
$$m = \sum_k m_k. \tag{14.2}$$
The center of mass C is found by taking the product of each particle's mass $m_k$ and its position $\mathbf{r}_k$, summing over all particles, and then dividing by the total mass as follows:
$$\mathbf{C} = \frac{1}{m} \sum_k m_k \mathbf{r}_k. \tag{14.3}$$
For a continuous volume, these summations become integrals. The mass m of an object is found by integrating the object's density over its volume as follows:
$$m = \int_V \rho(\mathbf{r}) \, dV. \tag{14.4}$$
Here, ρ(r) is a function that gives the density of the object at any point r inside its volume, and dV = dx dy dz is a differential volume element. The density is often a constant that we can move out of the integral, so we drop the function notation and simply write it as ρ:
$$m = \rho \int_V dV. \tag{14.5}$$
The center of mass C for an object is found by integrating the product of the differential mass and its position and dividing the result by the total mass as follows:
$$\mathbf{C} = \frac{1}{m} \int_V \rho \, \mathbf{r} \, dV. \tag{14.6}$$
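For the particle form of these definitions, a short sketch may help. It assumes masses are given as a list of numbers and positions as a list of 3-tuples; the function name is illustrative.

```python
def center_of_mass(masses, positions):
    """Total mass and center of mass of a set of particles."""
    m = sum(masses)
    # Mass-weighted average of the positions, one coordinate at a time.
    c = tuple(sum(mk * p[i] for mk, p in zip(masses, positions)) / m
              for i in range(3))
    return m, c


# A 1 kg particle at the origin and a 3 kg particle four units along x.
m, c = center_of_mass([1.0, 3.0], [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)])
```

The heavier particle pulls the center of mass three quarters of the way along the segment between the two.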
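14.2 The Inertia Tensor
In a given coordinate system, every rigid body has three moments of inertia (one for each of the coordinate axes) and three products of inertia. These six quantities form what is called the inertia tensor for the rigid body. The inertia tensor is ordinarily expressed as a 3×3 matrix, but it is symmetric, so there are only six distinct entries. For a set of particles, the inertia tensor I is given by the formula
$$\mathbf{I} = \sum_k m_k \begin{bmatrix} y_k^2 + z_k^2 & -x_k y_k & -x_k z_k \\ -x_k y_k & x_k^2 + z_k^2 & -y_k z_k \\ -x_k z_k & -y_k z_k & x_k^2 + y_k^2 \end{bmatrix}, \tag{14.7}$$
where the k-th particle has mass $m_k$ and is located at the point $(x_k, y_k, z_k)$ [1]. This formula can also be expressed as
$$\mathbf{I} = \sum_k m_k \left( r_k^2 \mathbf{E}_3 - \mathbf{r}_k \otimes \mathbf{r}_k \right), \tag{14.8}$$
where $\mathbf{E}_3$ is the 3×3 identity matrix, $\mathbf{r}_k = (x_k, y_k, z_k)$, and the operation ⊗ is the tensor product giving
$$\mathbf{r}_k \otimes \mathbf{r}_k = \begin{bmatrix} x_k^2 & x_k y_k & x_k z_k \\ x_k y_k & y_k^2 & y_k z_k \\ x_k z_k & y_k z_k & z_k^2 \end{bmatrix}. \tag{14.9}$$
The diagonal entries of the inertia tensor are the moments of inertia, and the off-diagonal entries are the products of inertia. It is always possible to find a coordinate system in which the products of inertia are all zero, and we call the axes of such a coordinate system the principal axes of inertia for a rigid body. In this gem, we only compute the inertia tensor in a coordinate system aligned to the principal axes. The orientation of these axes is usually evident due to symmetry in the object being examined.
For a continuous volume in a coordinate system aligned to the principal axes of inertia, the diagonal entries of the inertia tensor I are given by the integrals
$$I_{xx} = \int_V (y^2 + z^2) \, dm, \qquad I_{yy} = \int_V (x^2 + z^2) \, dm, \qquad I_{zz} = \int_V (x^2 + y^2) \, dm. \tag{14.10}$$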
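The particle formula, written as a sum of $m_k (r_k^2 \mathbf{E}_3 - \mathbf{r}_k \otimes \mathbf{r}_k)$, translates almost directly into code. The following is a minimal sketch using nested lists for the 3×3 matrix; the function name and data layout are assumptions made for illustration.

```python
def inertia_tensor(masses, positions):
    """3x3 inertia tensor of a set of particles about the coordinate origin."""
    tensor = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = r[0] * r[0] + r[1] * r[1] + r[2] * r[2]
        for i in range(3):
            for j in range(3):
                # r^2 on the diagonal (identity term) minus the tensor product entry.
                identity_term = r2 if i == j else 0.0
                tensor[i][j] += m * (identity_term - r[i] * r[j])
    return tensor


# Two unit masses on the x axis: no resistance to spin about x itself.
dumbbell = inertia_tensor([1.0, 1.0], [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
# One off-axis particle produces a nonzero product of inertia.
single = inertia_tensor([2.0], [(1.0, 1.0, 0.0)])
```

The dumbbell is aligned with its principal axes, so its tensor is diagonal, while the single off-axis particle yields the off-diagonal entry $-m x y$.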
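Transformations
Given an invertible 3×3 transformation matrix M that transforms points from one coordinate system to another coordinate system with the same origin, an inertia tensor I is transformed according to the formula
$$\mathbf{I}' = \mathbf{M} \mathbf{I} \mathbf{M}^{-1}. \tag{14.11}$$
It's useful to think of this product as first transforming in reverse from the new coordinate system to the original coordinate system using $\mathbf{M}^{-1}$, applying the inertia tensor I in that coordinate system, and then transforming back into the new coordinate system using M.
To transform an inertia tensor into a coordinate system with a different origin, we can use a formula known as the parallel axis theorem. Let s be an offset vector representing the difference between the new origin and the old origin. Then, starting with the formula for the inertia tensor given in Equation (14.8), we replace r with r + s to obtain
$$\mathbf{I}' = \sum_k m_k \left[ (\mathbf{r}_k + \mathbf{s})^2 \mathbf{E}_3 - (\mathbf{r}_k + \mathbf{s}) \otimes (\mathbf{r}_k + \mathbf{s}) \right]. \tag{14.12}$$
Expanding this summation, we have
$$\mathbf{I}' = \sum_k m_k \left[ \left( r_k^2 + 2 \mathbf{r}_k \cdot \mathbf{s} + s^2 \right) \mathbf{E}_3 - \mathbf{r}_k \otimes \mathbf{r}_k - \mathbf{r}_k \otimes \mathbf{s} - \mathbf{s} \otimes \mathbf{r}_k - \mathbf{s} \otimes \mathbf{s} \right]. \tag{14.13}$$
This equation contains the two terms from the original summation given by Equation (14.8) for I, so we can substitute I for these terms to get
$$\mathbf{I}' = \mathbf{I} + m \left( s^2 \mathbf{E}_3 - \mathbf{s} \otimes \mathbf{s} \right) + 2 \left[ \Big( \sum_k m_k \mathbf{r}_k \Big) \cdot \mathbf{s} \right] \mathbf{E}_3 - \Big( \sum_k m_k \mathbf{r}_k \Big) \otimes \mathbf{s} - \mathbf{s} \otimes \Big( \sum_k m_k \mathbf{r}_k \Big). \tag{14.14}$$
Now, if the origin of the coordinate system coincides with the center of mass, then the summation $\sum_k m_k \mathbf{r}_k$ is equal to the point (0, 0, 0). This allows us to make a tremendous simplification because all of the terms in Equation (14.14) containing this summation vanish. We therefore can use the formula
$$\mathbf{I}' = \mathbf{I} + m \left( s^2 \mathbf{E}_3 - \mathbf{s} \otimes \mathbf{s} \right) \tag{14.15}$$
to transform an inertia tensor from a coordinate system in which the center of mass lies at the origin to another coordinate system in which the new origin lies at the point s in the original coordinate system.
It's important to understand that Equation (14.15) can only be applied once to an inertia tensor in order to move it away from the center of mass. After the inertia tensor has been moved, it no longer uses a coordinate system in which the origin coincides with the center of mass, but that condition must be true for Equation (14.15) to be valid. However, it is possible to recover the inertia tensor I from the offset inertia tensor I′ if the vector s is known, once again allowing Equation (14.15) to be used to perform a new offset.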
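The offset and its inverse can be sketched as two small functions. The code assumes the parallel axis form $\mathbf{I}' = \mathbf{I} + m(s^2 \mathbf{E}_3 - \mathbf{s} \otimes \mathbf{s})$, with tensors stored as nested lists; the names `offset_from_center_of_mass` and `recover_center_of_mass_tensor` are hypothetical.

```python
def _offset_term(m, s):
    """The matrix m * (s^2 * E3 - s (x) s) from the parallel axis theorem."""
    s2 = s[0] * s[0] + s[1] * s[1] + s[2] * s[2]
    return [[m * ((s2 if i == j else 0.0) - s[i] * s[j]) for j in range(3)]
            for i in range(3)]


def offset_from_center_of_mass(tensor, m, s):
    """Move a center-of-mass inertia tensor to an origin offset by s."""
    term = _offset_term(m, s)
    return [[tensor[i][j] + term[i][j] for j in range(3)] for i in range(3)]


def recover_center_of_mass_tensor(offset_tensor, m, s):
    """Undo a previous offset by s, returning the center-of-mass tensor."""
    term = _offset_term(m, s)
    return [[offset_tensor[i][j] - term[i][j] for j in range(3)] for i in range(3)]


about_cm = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
moved = offset_from_center_of_mass(about_cm, 2.0, (0.0, 0.0, 3.0))
restored = recover_center_of_mass_tensor(moved, 2.0, (0.0, 0.0, 3.0))
```

To move an already offset tensor to a third origin, recover the center-of-mass tensor first and then apply a fresh offset, rather than chaining two offsets.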
14.3 Derivation of Moments of Inertia
In this section, we derive the centers of mass and the moments of inertia for a variety of common solid shapes. The inertia tensors are always expressed in a coordinate system in which the origin lies at the center of mass and the coordinate axes are parallel to the shape's principal axes of inertia.
Evaluating integrals of the type presented in this section by hand can be a very tedious exercise. We recommend using a symbolic computation package such as Mathematica to perform these calculations, should the reader feel so inclined.
Box
There are two common ways to describe the dimensions of a box, as shown in Figure 14.1. One way is to place the origin at one corner and identify the full extents of the box in all three directions by its length l, its width w, and its height h. The second way is to place the origin at the center of the box and identify the perpendicular distances a, b, and c from the center to the faces in all three directions. We provide formulas for both cases.
Figure 14.1: A box.
In the case that the box is described by the dimensions l, w, and h, the total mass is m = ρlwh, and the center of mass is located at (l/2, w/2, h/2). The moments of inertia are then given by the integrals
(14.16)
Substituting the mass m, this gives us the inertia tensor
$$\mathbf{I} = \frac{m}{12} \begin{bmatrix} w^2 + h^2 & 0 & 0 \\ 0 & l^2 + h^2 & 0 \\ 0 & 0 & l^2 + w^2 \end{bmatrix}. \tag{14.17}$$
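In the case that the box is described by the dimensions a, b, and c, the total mass is m = 8ρabc, and the center of mass coincides with the origin. The moments of inertia are then given by the integrals
(14.18)
Substituting the mass m, this gives us the inertia tensor
$$\mathbf{I} = \frac{m}{3} \begin{bmatrix} b^2 + c^2 & 0 & 0 \\ 0 & a^2 + c^2 & 0 \\ 0 & 0 & a^2 + b^2 \end{bmatrix}. \tag{14.19}$$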
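As a sanity check on the box result, the closed form for half-extents a, b, and c (assumed here to lie along the x, y, and z axes) can be compared against a brute-force sum over a grid of equal point masses. This is an illustrative sketch; the function names and the grid resolution are arbitrary choices.

```python
def box_inertia(m, a, b, c):
    """Principal moments of a solid box with half-extents a, b, c."""
    return (m * (b * b + c * c) / 3.0,
            m * (a * a + c * c) / 3.0,
            m * (a * a + b * b) / 3.0)


def box_inertia_numeric(m, a, b, c, n=20):
    """Approximate the same moments with n^3 equal point masses at cell centers."""
    dm = m / n ** 3
    xs = [-a + (i + 0.5) * 2.0 * a / n for i in range(n)]
    ys = [-b + (i + 0.5) * 2.0 * b / n for i in range(n)]
    zs = [-c + (i + 0.5) * 2.0 * c / n for i in range(n)]
    ixx = iyy = izz = 0.0
    for x in xs:
        for y in ys:
            for z in zs:
                ixx += dm * (y * y + z * z)
                iyy += dm * (x * x + z * z)
                izz += dm * (x * x + y * y)
    return ixx, iyy, izz


exact = box_inertia(6.0, 1.0, 2.0, 3.0)
approx = box_inertia_numeric(6.0, 1.0, 2.0, 3.0)
```

Substituting a = l/2, b = w/2, and c = h/2 turns the factor m/3 into m/12, which is the form used when the box is described by its full extents.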
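Cylinder
The dimensions of a cylinder are described by its height h and the two semi-axis lengths a and b of its base, as shown in Figure 14.2. If the cylinder is circular, then a = b.
Figure 14.2: A cylinder.
The center of mass lies at the point (0, 0, h/2), and the total mass of the cylinder is m = ρπabh. The moments of inertia are then given by the integrals
(14.20)
Substituting the mass m, this gives us the inertia tensor
$$\mathbf{I} = m \begin{bmatrix} \frac{b^2}{4} + \frac{h^2}{12} & 0 & 0 \\ 0 & \frac{a^2}{4} + \frac{h^2}{12} & 0 \\ 0 & 0 & \frac{a^2 + b^2}{4} \end{bmatrix}. \tag{14.21}$$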
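Pyramid
The dimensions of a rectangular pyramid are described by its height h and the perpendicular distances $a_0$ and $b_0$ from the center of the base to two adjacent edges of the base, as shown in Figure 14.3.
Figure 14.3: A rectangular pyramid.
In order to calculate the total mass and center of mass, we need to be able to express the lengths a and b of a cross-section of the pyramid at any z-coordinate. Functions a(z) and b(z) producing the base lengths at z = 0 and linearly tapering to zero at the apex where z = h are given by
$$a(z) = a_0 \left( 1 - \frac{z}{h} \right), \qquad b(z) = b_0 \left( 1 - \frac{z}{h} \right). \tag{14.22}$$
The total mass is then given by integrating rectangular areas over the entire height of the pyramid:
$$m = \int_0^h 4 \rho \, a(z) \, b(z) \, dz = \frac{4}{3} \rho a_0 b_0 h. \tag{14.23}$$
The center of mass clearly lies on the z-axis, and we can calculate its z-coordinate by multiplying a factor of z into the integrand for the mass to obtain
$$m C_z = \int_0^h 4 \rho \, z \, a(z) \, b(z) \, dz = \frac{1}{3} \rho a_0 b_0 h^2. \tag{14.24}$$
After dividing by m, we find the center of mass to be located at the point (0, 0, h/4).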
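The h/4 result does not depend on the base being rectangular. As a short worked derivation, assume only that the cross-section at height z is the base shape scaled by $(1 - z/h)$, so that its area is $A_0 (1 - z/h)^2$, where $A_0$ is the base area (a symbol introduced here for illustration; $A_0 = 4 a_0 b_0$ for the rectangular pyramid):

```latex
\begin{align*}
m     &= \rho A_0 \int_0^h \left(1 - \frac{z}{h}\right)^2 dz
       = \rho A_0 \, h \int_0^1 (1 - u)^2 \, du
       = \frac{1}{3} \rho A_0 h, \\
m C_z &= \rho A_0 \int_0^h z \left(1 - \frac{z}{h}\right)^2 dz
       = \rho A_0 \, h^2 \int_0^1 u (1 - u)^2 \, du
       = \frac{1}{12} \rho A_0 h^2, \\
C_z   &= \frac{\rho A_0 h^2 / 12}{\rho A_0 h / 3} = \frac{h}{4}.
\end{align*}
```

The base area cancels in the final ratio, which is why any solid that tapers linearly to a point has its center of mass one quarter of the way up from the base.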
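Since the moment of inertia is best calculated in the coordinate system for which the origin coincides with the center of mass, it is useful to consider a pyramid that extends from -h/4 to 3h/4 in the z direction and redefine the functions a(z) and b(z) as
$$a(z) = a_0 \left( \frac{3}{4} - \frac{z}{h} \right), \qquad b(z) = b_0 \left( \frac{3}{4} - \frac{z}{h} \right). \tag{14.25}$$
The moments of inertia are then given by the integrals
(14.26)
Substituting the mass m, this gives us the inertia tensor
$$\mathbf{I} = m \begin{bmatrix} \frac{b_0^2}{5} + \frac{3h^2}{80} & 0 & 0 \\ 0 & \frac{a_0^2}{5} + \frac{3h^2}{80} & 0 \\ 0 & 0 & \frac{a_0^2 + b_0^2}{5} \end{bmatrix}. \tag{14.27}$$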
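Cone
The dimensions of a cone are described by its height h and the two semi-axis lengths $a_0$ and $b_0$ of its base, as shown in Figure 14.4. If the cone is circular, then $a_0 = b_0$.
Figure 14.4: A cone.
As with the pyramid, we use the functions a(z) and b(z) given by Equation (14.22) to express the dimensions of a cross-section of the cone at a height z above the base. We can calculate the total mass of the cone by integrating elliptical disk areas over the entire height of the cone:
$$m = \int_0^h \rho \pi \, a(z) \, b(z) \, dz = \frac{1}{3} \rho \pi a_0 b_0 h. \tag{14.28}$$
The center of mass clearly lies on the z-axis, and we can calculate its z-coordinate by multiplying a factor of z into the integrand for the mass to obtain
$$m C_z = \int_0^h \rho \pi \, z \, a(z) \, b(z) \, dz = \frac{1}{12} \rho \pi a_0 b_0 h^2. \tag{14.29}$$
After dividing by m, we find the center of mass to be located at the point (0, 0, h/4).
It is no coincidence that the centers of mass for the pyramid and cone are equal. The same point is obtained for any two-dimensional base shape that linearly tapers to a point at a height h. To calculate the moments of inertia in a coordinate system having the origin at the center of mass, we again redefine the functions a(z) and b(z) as in Equation (14.25) and integrate from -h/4 to 3h/4. The moments of inertia are then given by the integrals
(14.30)
Substituting the mass m, this gives us the inertia tensor
$$\mathbf{I} = m \begin{bmatrix} \frac{3b_0^2}{20} + \frac{3h^2}{80} & 0 & 0 \\ 0 & \frac{3a_0^2}{20} + \frac{3h^2}{80} & 0 \\ 0 & 0 & \frac{3(a_0^2 + b_0^2)}{20} \end{bmatrix}. \tag{14.31}$$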
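Ellipsoid
The dimensions of an ellipsoid are described by the three semi-axis lengths a, b, and c, as shown in Figure 14.5. In the case of a sphere, a = b = c. The center of mass is clearly located at the center of the ellipsoid, and that is where we place the origin as well. The total mass of the ellipsoid is $m = \frac{4}{3} \rho \pi a b c$.
Figure 14.5: An ellipsoid.
To make the integrals simpler, we remap an ellipsoid to a sphere of radius one using the following substitutions:
(14.32)
The moments of inertia for an ellipsoid are then given by the integrals
(14.33)
Substituting the mass m, this gives us the inertia tensor
$$\mathbf{I} = \frac{m}{5} \begin{bmatrix} b^2 + c^2 & 0 & 0 \\ 0 & a^2 + c^2 & 0 \\ 0 & 0 & a^2 + b^2 \end{bmatrix}. \tag{14.34}$$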
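Dome
The dimensions of a dome, or ellipsoidal hemisphere, are described in the same way as a complete ellipsoid: by the semi-axis lengths a, b, and c, as shown in Figure 14.6.
Figure 14.6: A dome, or ellipsoidal hemisphere.
The total mass of a dome is $m = \frac{2}{3} \rho \pi a b c$, and the z-coordinate of the center of mass can be calculated using the integral
(14.35)
where we have again made the substitutions given in Equation (14.32). After dividing by m, we find the center of mass to be located at the point (0, 0, 3c/8).
To calculate the moments of inertia for a dome, we can use a trick that makes the integrals simpler. Instead of calculating the inertia tensor I about the center of mass, we calculate the inertia tensor I′ about the origin at the center of the dome's base and then use Equation (14.15) to find I when the offset is s = (0, 0, -3c/8). The moments of inertia about the origin for a dome are given by the integrals
(14.36)
Substituting the mass m, this gives us the inertia tensor
$$\mathbf{I}' = \frac{m}{5} \begin{bmatrix} b^2 + c^2 & 0 & 0 \\ 0 & a^2 + c^2 & 0 \\ 0 & 0 & a^2 + b^2 \end{bmatrix}. \tag{14.37}$$
This is identical to the inertia tensor for an ellipsoid, but the mass m has been cut in half. In order to obtain the inertia tensor $\mathbf{I}_{dome}$ about the center of mass, we must calculate
$$\mathbf{I}_{dome} = \mathbf{I}' - m \left( s^2 \mathbf{E}_3 - \mathbf{s} \otimes \mathbf{s} \right). \tag{14.38}$$
With s = (0, 0, -3c/8), we have
$$s^2 \mathbf{E}_3 - \mathbf{s} \otimes \mathbf{s} = \frac{9c^2}{64} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \tag{14.39}$$
and so
$$\mathbf{I}_{dome} = m \begin{bmatrix} \frac{b^2}{5} + \frac{19c^2}{320} & 0 & 0 \\ 0 & \frac{a^2}{5} + \frac{19c^2}{320} & 0 \\ 0 & 0 & \frac{a^2 + b^2}{5} \end{bmatrix}. \tag{14.40}$$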
Capsule
The dimensions of a capsule are described by the height h of a central cylinder, the two semi-axis lengths a and b of the cylinder's base, and a third semi-axis length c representing the extent of each hemispherical end cap in the direction perpendicular to the cylinder's base, as shown in Figure 14.7.
Figure 14.7: A capsule.
The total mass $m_{capsule}$ is given by the sum
$$m_{capsule} = m_{cylinder} + 2 m_{cap}, \tag{14.41}$$
where $m_{cylinder} = \rho \pi a b h$ is the mass of the central cylinder, and $m_{cap} = \frac{2}{3} \rho \pi a b c$ is the mass of a single hemispherical end cap. Due to symmetry, it is clear that the center of mass lies at the center of the cylindrical portion of the capsule.
We can calculate the inertia tensor $\mathbf{I}_{capsule}$ by combining the inertia tensors of the cylinder and dome in the proper manner. With respect to an origin located at the capsule's center of mass, the centers of mass for the hemispherical end caps lie at the z-coordinates
$$z_{cap} = \pm \left( \frac{h}{2} + \frac{3c}{8} \right). \tag{14.42}$$
Using the offset formula given by Equation (14.15) with $\mathbf{s} = (0, 0, z_{cap})$, we can transform the inertia tensor for a dome into the capsule's coordinate system to obtain
(14.43)
Doubling this to account for both end caps and adding it to the inertia tensor for a cylinder gives us
(14.44)
These moments of inertia are given in terms of the overall mass of the capsule in the summary at the end of this gem.
Truncated Pyramid
A truncated pyramid is a pyramid that has been cut off at some height h above the base by a plane parallel to the base. As with a pyramid, we describe the dimensions of the base by the perpendicular distances $a_0$ and $b_0$ from the center of the base to two adjacent edges of the base, as shown in Figure 14.8. We introduce a factor r representing the ratio of the length of an edge on the top face to the length of the corresponding edge on the bottom face (the base). The dimensions of the top face are then described by the perpendicular distances $r a_0$ and $r b_0$ from the center to the edges. When r = 0, all of the formulas for a truncated pyramid reduce to those for a complete pyramid.
Figure 14.8: A truncated pyramid.
In order to calculate the total mass and center of mass, we express the lengths a and b as functions of the z-coordinate as follows:
$$a(z) = a_0 \left[ 1 + (r - 1) \frac{z}{h} \right], \qquad b(z) = b_0 \left[ 1 + (r - 1) \frac{z}{h} \right]. \tag{14.45}$$
The total mass is then given by integrating rectangular areas over the range of z-coordinates between the bottom and top faces of the truncated pyramid:
(14.46)
The center of mass clearly lies on the z-axis, and we can calculate its z-coordinate by multiplying a factor of z into the integrand for the mass to obtain
(14.47)
After dividing by m, we find the z-coordinate of the center of mass to be located at
$$C_z = \frac{h \left( 1 + 2r + 3r^2 \right)}{4 \left( 1 + r + r^2 \right)}. \tag{14.48}$$
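Carrying out the two integrals for a cross-section that scales linearly from 1 at the base to r at the top gives $C_z = h(1 + 2r + 3r^2) / [4(1 + r + r^2)]$. The sketch below compares that closed form against a midpoint-rule integration; the function names are illustrative, and the density and base dimensions are omitted because they cancel in the ratio.

```python
def truncated_cz(h, r):
    """Closed-form height of the center of mass above the base."""
    return h * (1.0 + 2.0 * r + 3.0 * r * r) / (4.0 * (1.0 + r + r * r))


def truncated_cz_numeric(h, r, n=10000):
    """Midpoint-rule estimate of the same quantity from the area profile."""
    dz = h / n
    mass = moment = 0.0
    for i in range(n):
        z = (i + 0.5) * dz
        scale = 1.0 + (r - 1.0) * z / h   # linear taper of a(z) and b(z)
        area = scale * scale              # cross-section area up to a constant
        mass += area * dz
        moment += area * z * dz
    return moment / mass


complete = truncated_cz(4.0, 0.0)   # complete pyramid or cone
prism = truncated_cz(4.0, 1.0)      # no taper at all
```

The two limiting cases behave as expected: r = 0 recovers h/4 for a complete pyramid or cone, and r = 1 gives h/2 for a shape with no taper.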
We place the origin at the center of mass by shifting the range of z-coordinates downward by $C_z$ and adding this shift back when evaluating the functions a(z) and b(z). The moments of inertia are then given by the integrals
(14.49)
Substituting the mass m, this gives us the inertia tensor
(14.50)
Truncated Cone
A truncated cone is a cone that has been cut off at some height h above the base by a plane parallel to the base. As with a cone, we describe the dimensions of the base by the semi-axis lengths $a_0$ and $b_0$, as shown in Figure 14.9. We introduce a factor r representing the ratio of a semi-axis length of the top face to the corresponding semi-axis length of the bottom face (the base). The dimensions of the top face are then described by the semi-axis lengths $r a_0$ and $r b_0$. When r = 0, all of the formulas for a truncated cone reduce to those for a complete cone.
Figure 14.9: A truncated cone.
In order to calculate the total mass and center of mass, we express the lengths a and b as functions of the z-coordinate using the same formulas given by Equation (14.45) for the truncated pyramid. The total mass is then given by integrating elliptical disk areas over the range of z-coordinates between the bottom and top faces of the truncated cone:
(14.51)
We can then calculate the z-coordinate of the center of mass by multiplying a factor of z into the integrand for the mass to obtain
(14.52)
After dividing by m, we find the z-coordinate of the center of mass to be located at
$$C_z = \frac{h \left( 1 + 2r + 3r^2 \right)}{4 \left( 1 + r + r^2 \right)}, \tag{14.53}$$
just as it is for the truncated pyramid. The moments of inertia are then given by the integrals
(14.54)
Substituting the mass m, this gives us the inertia tensor
(14.55)
14.4 Summary
The mass, the center of mass, and the inertia tensor for each of the shapes examined in the previous section are summarized in Table 14.1.
Table 14.1: This table lists the mass m, the center of mass (CM) C, and the entries of the inertia tensor I for a variety of solid shapes. The inertia tensor is always given in the coordinate system for which the origin coincides with the center of mass. Each shape is considered to be solid with a constant density ρ.
References
[1] Jerry B. Marion and Stephen T. Thornton. Classical Dynamics, 3rd edition. Saunders College Publishing, 1988.
PartII-RenderingTechniquesGameEngineGems,VolumeOnebyEricLengyel(ed)JonesandBartlettPublishers©2011
Part II: Rendering Techniques
Chapter List
Chapter 15: Physically-Based Outdoor Scene Lighting
Chapter 16: Rendering Physically-Based Skyboxes
Chapter 17: Motion Blur and the Velocity-Depth-Gradient Buffer
Chapter 18: Fast Screen-Space Ambient Occlusion and Indirect Lighting
Chapter 19: Real-Time Character Dismemberment
Chapter 20: A Deferred Decal Rendering Technique
Chapter15-Physically-BasedOutdoorSceneLightingGameEngineGems,VolumeOnebyEricLengyel(ed)JonesandBartlettPublishers©2011
Chapter 15: Physically-Based Outdoor Scene Lighting
Frank Kane, Sundog Software, LLC
Overview
Adventure games, role-playing games, and "serious" training and simulation games often need to render the same scene under various times of day. This gem provides a physically-based approach for generating realistic direct and diffuse ambient skylight for any given time and location, together with a tone-mapping operator to account for human perception. (See Figure 15.1.) This gives games with outdoor scenes a greater level of realism, and provides the physically accurate lighting required by training systems that might use your engine.
Figure 15.1: (See also Color Plates.) An outdoor scene with physically-based lighting at dusk (left) and at night (right). (Images courtesy of Emergent Game Technologies and Sundog Software, LLC.)
15.1 Positioning the Sun and Moon
Natural light comes primarily from the sun and moon, so the first step in lighting an outdoor scene is to know where to place these light sources. To do this, you will need an ephemeris model to compute the location of the sun and moon for a given time and location. Since this is a game engine book and not an astronomy book, we don't go into the details here, but refer you instead to the Ephemeris class in the code included on the accompanying CD. Understanding this class does require a few key concepts, which we describe here.
Numerical approximations of the position of astronomical objects are generally done in ecliptic coordinates for a given epoch time. Our Ephemeris class starts by computing the location of the sun, moon, and visible planets in ecliptic coordinates, which are just latitudes and longitudes relative to the plane defined by the path the sun takes across the sky. As a result, the ecliptic latitude of the sun is always zero. The algorithms used take as input Greenwich Mean Time expressed as the number of centuries elapsed since the year 2000; this is the epoch time for epoch 2000. Our LocalTime class will handle converting times in hours, minutes, and seconds for a given day to epoch centuries for you. While we are computing the location of the moon, we also compute the phase of the moon, which is important for nighttime lighting.
Ecliptic coordinates are not terribly useful for rendering, so you will need to transform the ecliptic polar coordinate system to a Cartesian coordinate system relative to your local horizon. Fortunately, this can be done with just a couple of 3×3 rotation matrices. Our code starts by taking the ecliptic coordinates of the sun or moon together with its distance from the Earth, and transforming that into a 3D vector in ecliptic space from the center of the Earth. Then, we rotate this vector into equatorial coordinates, which is a system defined by the plane of the Earth's equator instead of the plane of the Earth's revolution around the sun; doing this requires computing the Earth's tilt for the simulated time. Finally, we transform the equatorial coordinates into horizon coordinates for the location on Earth that you wish to simulate. As a finishing touch, we also apply atmospheric refraction to the horizon coordinates, which affects the perceived location of the sun and moon as they approach the horizon.
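The first two stages of that pipeline can be sketched as follows. This is not the Ephemeris class from the CD: the function names are hypothetical, the obliquity uses only the constant and linear terms of the standard mean-obliquity series (an assumption about the level of accuracy needed), and the final rotation into horizon coordinates and the refraction correction are left out.

```python
import math


def obliquity_deg(centuries_since_2000):
    """Approximate tilt of the Earth's axis in degrees (linear term only)."""
    return 23.439291 - 0.0130042 * centuries_since_2000


def ecliptic_to_equatorial(lon_deg, lat_deg, distance, centuries_since_2000=0.0):
    """Convert ecliptic longitude/latitude/distance to an equatorial 3D vector."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    # Polar ecliptic coordinates to a Cartesian vector in ecliptic space.
    x = distance * math.cos(lat) * math.cos(lon)
    y = distance * math.cos(lat) * math.sin(lon)
    z = distance * math.sin(lat)
    # Rotate about the x axis by the Earth's tilt to reach equatorial space.
    eps = math.radians(obliquity_deg(centuries_since_2000))
    return (x,
            y * math.cos(eps) - z * math.sin(eps),
            y * math.sin(eps) + z * math.cos(eps))


def right_ascension_declination_deg(v):
    """Right ascension and declination, in degrees, of an equatorial vector."""
    x, y, z = v
    r = math.sqrt(x * x + y * y + z * z)
    return math.degrees(math.atan2(y, x)) % 360.0, math.degrees(math.asin(z / r))


# The sun at ecliptic longitude 90 degrees (the June solstice direction).
ra, dec = right_ascension_declination_deg(ecliptic_to_equatorial(90.0, 0.0, 1.0))
```

At that point the sun's declination equals the Earth's tilt, which makes it a convenient check that the rotation has the right sign.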
Care must be taken that this final transformation is consistent with your engine's coordinate system conventions; if your users might define "north" and "up" as any arbitrary axis, you'll want to provide a means for them to influence this final transformation into local coordinates. The sun rising in the West instead of the East is an embarrassing bug that can very easily slip through testing. You may also want to expose geographic coordinates for the sun and moon; instead of being relative to a specific location on the surface of the Earth, these coordinates are relative to the center of the Earth. Flight simulators that can cover large distances frequently use this coordinate system, and our Ephemeris class will compute this for you as well.
These same matrices are useful for transforming things such as star fields in the sky. It may sound like overkill, but having accurately positioned stars in your scenes could be important to someone creating, for example, a navigation training application.
All this work pays off when your engine's sun and moon rise at exactly the right time for the location being simulated. Got a game that takes place in the winter in Alaska? Your nights will be long and the sun will be low, automatically.
15.2 Computing Natural Sunlight
Now that we know where the sun is in the sky, we can simulate what happens to its light as it passes through the atmosphere; this is called atmospheric transmittance and atmospheric scattering, and it is simulated by the Spectrum class in the included source code. Our approach uses a modified "Bird simple spectral model", named after Dr. Richard Bird who developed it at the Solar Energy Research Institute (now the National Renewable Energy Laboratory) in 1984 [1]. It is relatively simple compared to other models, yet amazingly accurate. This code is what will turn sunlight red near sunset, for example.
Our Spectrum class operates over a full visible spectrum, and starts with data from NASA on the spectrum of the sun from outside of the atmosphere as input. (This spectrum is inside our SolarSpectrum class.) It also takes in the angle between the sun and the top of the sky dome (the zenith angle), your altitude, and the atmospheric turbidity, which is essentially a measure of how polluted the air is. A reasonable value for turbidity is around 2.2; going lower than 1.8 or higher than 20.0 will cause the math to start breaking down. This class will reward you with spectra of the direct irradiance of the sunlight transmitted through the atmosphere (this will become the diffuse component of your light source for the sun) and the scattered irradiance (which will become the ambient component of your light source).
The meat of the simulation is in Spectrum::ApplyAtmosphericTransmittance. This method iterates over samples of the visible spectrum from 380 nm to 720 nm. For each wavelength, we simulate the effects of several components of the atmosphere on how that wavelength is transmitted and scattered by the atmosphere. We can multiply together the transmittances from each component to arrive at a final transmittance for the given wavelength, and multiply that by the sun's irradiance at that wavelength. The scattered components are added together and then multiplied by the sun's irradiance. Once we're done with all of the wavelengths of the visible spectrum, we can convert this spectrum into RGB values for direct and ambient natural light.
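The method starts by computing the air mass for the given solar angle, which represents how much atmosphere the sunlight needs to pass through before it gets to the camera. The air mass $M$ for a given solar zenith angle $Z$ (in degrees) is given by the expression used in Bird's model [1]:
\[ M = \frac{1}{\cos Z + 0.15\,(93.885 - Z)^{-1.253}}. \]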
The lower the sun is, the more air will scatter its sunlight. We multiply the air mass by the isothermal effect, which is a fancy way of saying that the higher you are, the less atmosphere there is. As your altitude increases, lower air masses will result in less light being scattered and more direct light reaching you; modeling this may yield effects such as the sky darkening as you start to enter space. The isothermal effect is given by $e^{-a/H}$, where $a$ is the altitude above sea level, and $H$ is the "pressure scale height" of 8435 meters.
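The air mass and its altitude correction can be sketched in C++ as follows. This is a minimal illustration, not the sample code's Spectrum class: the function names are hypothetical, and the air-mass expression is assumed to be the Kasten formula that Bird and Riordan use in [1].

```cpp
#include <cmath>

// Relative air mass for a solar zenith angle given in degrees.
// ASSUMPTION: Kasten formula as used in Bird and Riordan [1]; only
// meaningful while the sun is above the horizon (Z <= 90 degrees).
double RelativeAirMass(double zenithDegrees)
{
    const double kDegToRad = 3.14159265358979323846 / 180.0;
    const double cosZ = std::cos(zenithDegrees * kDegToRad);
    return 1.0 / (cosZ + 0.15 * std::pow(93.885 - zenithDegrees, -1.253));
}

// Air mass scaled by the isothermal term exp(-a / H), with H = 8435 m.
double AltitudeCorrectedAirMass(double zenithDegrees, double altitudeMeters)
{
    const double kPressureScaleHeight = 8435.0;  // meters
    return RelativeAirMass(zenithDegrees) *
           std::exp(-altitudeMeters / kPressureScaleHeight);
}
```

With the sun overhead the relative air mass is close to 1, and it grows to roughly 2 at a zenith angle of 60 degrees, which is why low sun angles scatter so much more light.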
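Then, various components of the atmosphere are treated independently and their effects combined together at the end. The main component is Rayleigh scattering, which is caused by the molecules of air itself; it is the reason the sky is blue, and your scattered light will be a bit blue as a result. The light transmitted by Rayleigh scattering $T_R$ for a given wavelength $\lambda$ in micrometers is given by the expression from Bird's model [1], with $M$ the altitude-corrected air mass:
\[ T_R = \exp\!\left( \frac{-M}{\lambda^4 \left( 115.6406 - 1.335 / \lambda^2 \right)} \right). \]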
Next comes the effect of aerosols, or larger particulate matter. This is affected by the turbidity $T$ you passed in; more aerosols mean redder sunsets. The light transmitted by aerosols $T_A$ is given by
The value of $\alpha$ may be set to 1.140 for rural environments. For better accuracy, it's really 1.0274 for wavelengths less than 500 nm, and 1.2060 otherwise.
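A minimal C++ sketch of the two direct-light transmittance terms follows. Several things here are assumptions rather than the sample code's implementation: the function names are hypothetical, the Rayleigh expression is the one published by Bird and Riordan [1], and the aerosol term is assumed to take the Ångström form $e^{-\beta \lambda^{-\alpha} M}$ with a coefficient `beta` supplied by the caller, since the mapping from the turbidity $T$ to that coefficient is not shown in the text.

```cpp
#include <cmath>

// Rayleigh transmittance; wavelength in micrometers, airMass is the
// altitude-corrected air mass. ASSUMPTION: expression from Bird and Riordan [1].
double RayleighTransmittance(double wavelengthMicrons, double airMass)
{
    const double l2 = wavelengthMicrons * wavelengthMicrons;
    const double l4 = l2 * l2;
    return std::exp(-airMass / (l4 * (115.6406 - 1.335 / l2)));
}

// Wavelength exponent alpha: 1.0274 below 500 nm, 1.2060 otherwise.
double AerosolAlpha(double wavelengthMicrons)
{
    return (wavelengthMicrons < 0.5) ? 1.0274 : 1.2060;
}

// Aerosol transmittance. ASSUMPTION: Angstrom form with a caller-supplied
// turbidity coefficient beta (hypothetical parameter).
double AerosolTransmittance(double wavelengthMicrons, double airMass, double beta)
{
    const double alpha = AerosolAlpha(wavelengthMicrons);
    return std::exp(-beta * std::pow(wavelengthMicrons, -alpha) * airMass);
}

// Direct transmittance is the product of the per-component transmittances.
double DirectTransmittance(double wavelengthMicrons, double airMass, double beta)
{
    return RayleighTransmittance(wavelengthMicrons, airMass) *
           AerosolTransmittance(wavelengthMicrons, airMass, beta);
}
```

Because the Rayleigh term falls off with the fourth power of wavelength, blue light is attenuated far more than red at large air masses, which is what reddens the direct sunlight near sunset.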
We have computed the amount of light transmitted by Rayleigh and aerosol effects for each wavelength, which will give us our direct sunlight. For ambient sunlight, we also need to compute the scattered light. Scattered sunlight is a little trickier. First, we need to compute an aerosol scattering transmission term $T_{As}$, an aerosol absorption transmission term $T_{Aa}$, the log of an aerosol asymmetry factor $A$, a constant $F_S$, and a constant $C_2$:
The scattered light term $D_{ray}$ for Rayleigh scattering is then given by
The scattered term $D_{aer}$ for aerosols is
The total scattered irradiance can then be derived by adding $D_{ray}$ and $D_{aer}$ and multiplying the sum by the sun's irradiance for the given wavelength. Modeling Rayleigh and aerosol effects will be accurate enough for most applications. The sample code also models the effects of water vapor, ozone, mixed gas, reflection from the ground, and some effects specific to wavelengths under 450 nm, but these effects are all small during daytime. However, simulating ozone scattering will improve the realism of your sunsets.
An important limitation of these equations is that they break down at zenith angles over 90 degrees, that is, as soon as the sun drops below the horizon. Civil twilight is defined as the point where the sun is 6 degrees below the horizon, so to avoid a discontinuity at 90 degrees, you'll want to interpolate the direct and scattered sunlight between its value at 90 degrees and 0 at 96 degrees. Light doesn't really fall off linearly during this time; better implementations would use a lookup table based on experimental data of twilight luminance for given solar angles below the horizon. The nautical almanac [5] listed in the references is one source of this data.
The resulting spectra for transmitted and scattered sunlight are then converted to a CIE XYZ color. You will need to know a little about color theory to know what's going on here, but the important thing is that XYZ contains chromaticity information as well as luminance information. As such, it may represent "high dynamic range" colors, and the sun certainly qualifies as a high lighting value. We will map this light down to something displayable on a monitor, but first we also need to add in the light from the moon.
15.3 Moonlight and Other Nighttime Light Sources
Moonlight gets scattered through the atmosphere in exactly the same way as sunlight; the only difference is that instead of starting with a fixed spectrum of the extraterrestrial solar light source, we must generate the moon's spectrum algorithmically, since it varies depending on its phase and its distance from the Earth.
Fortunately, our Ephemeris class will give us that information. Moonlight consists of two components: light reflected from the Earth off the moon ("Earthshine") and light reflected from the sun off the moon. Both depend on the phase of the moon, as expressed by its phase angle $\Phi$. Earthshine is given by
This value becomes a component of the expression for computing the total moonshine irradiance seen from the Earth:
Here, $d$ is the distance from the Earth to the moon returned from our Ephemeris class, $r_m$ is the radius of the moon ($1738.1 \times 10^3$ m), $E_{sm}$ is the irradiance of the sun at the moon (1905 W/m²), and $C$ is the average albedo of the moon (0.072).
To turn this into a spectrum that you can pass through the Bird spectral model, first convert W/m² from the equation above to cd/m² using the approximate conversion factor of 683.0/3.14. Then, linearly scale this value from 0.7 at the low end of the visible spectrum to 1.3 at the high end, normalizing the results to ensure the resulting spectrum adds up to the lunar irradiance you computed above. From there, you can treat moonlight just like sunlight, and model the moon as a second light source in the same manner as the sun.
Even when there is no moon out at night, there are sources of ambient illumination. Light from bright planets, zodiacal light, starlight, airglow, galactic light, and cosmic light can all be modeled, but are negligible (about $2 \times 10^{-6}$ W/m²) compared to artificial light pollution in all but the most remote areas. To preserve some visibility on moonless nights, you will want to add an arbitrary light pollution term to your ambient illumination.
15.4 Tone-Mapping the Light
Now that you have the direct and scattered irradiance from the sun and moon expressed as XYZ colors, the challenge is to map these down to RGB values for lighting your scene.
The difference in luminance between a moonless night and high noon in the summer is more than ten orders of magnitude and cannot be directly displayed by any display device. Tone-mapping is required to capture the fact that your eyes adapt to the ambient light, allowing you to see on a moonlit night while not being blinded during the day. During the day, the cones in your eye create what is known as photopic vision. At night, your rods are responsible for scotopic vision. At dawn and dusk, both may be active to provide mesopic vision. Perceptual tone mapping works differently in each case. For example, things appear to look a little blue at night, which is an effect we can capture. Fortunately, this is a solved problem. Frédo Durand and Julie Dorsey at MIT presented a simple tone-mapping operator for this purpose in 2000, which we'll summarize here.
Perceptual tone-mapping requires knowledge of both the average luminosity of the scene (this is the adaptation luminosity that your eyes are adapted to), and the maximum luminance of the display device. The adaptation luminosity may be approximated by the Y component of the sum of the sun and moon's scattered light. The display's luminosity is generally set to 100 cd/m².
The Durand operator treats tone mapping independently for rods and cones; we'll use the same notation used in Durand's paper [3]. The adaptation luminosity for cones $L_{waC}$ is simply the Y component of the scene's scattered light, as mentioned above. The adaptation luminosity for rods $L_{waR}$ may be approximated by
\[ L_{waR} = -0.702X + 1.039Y + 0.433Z, \]
where $X$, $Y$, and $Z$ are the XYZ components of the scene's scattered light, which is the sum of the scattered light from the sun and the moon. The display luminosities $L_{daC}$ and $L_{daR}$ are set to 100 cd/m², but may be adjusted for brighter or darker scenes. The rod and cone thresholds from the scene are then mapped to rod and cone thresholds for the display, using a technique called threshold mapping. For both rods and cones, photopic, mesopic, and scotopic conditions are treated separately; the rod threshold is given by $\varepsilon_R$ and the cone threshold by $\varepsilon_C$:
What we really need is to compute two scaling values, one for rods ($m_R$) and one for cones ($m_C$):
We also need a scaling value $k$, which is used to interpolate between full color perception from cones and blue-shifted monochromatic perception from rods in mesopic conditions. With the value $\sigma$ set to 100 cd/m²,
Finally, we have everything we need to map the raw XYZ values of the direct and scattered light to something displayable. For the raw lighting value $L$, the tone-mapped lighting value $L'$ is given by
The intermediate value $S$ above represents the scotopic color from the rods (with a perceptual blue-shift), which is blended with the full color $L$ from the cones.
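The final step is to convert the tone-mapped XYZ value into an RGB value. You may perform this conversion with many matrices found online that assume different white points. Here is one with good results (this is the HDTV Rec. 709 matrix):
\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix} =
\begin{bmatrix}
 3.240479 & -1.537150 & -0.498535 \\
-0.969256 &  1.875992 &  0.041556 \\
 0.055648 & -0.204043 &  1.057311
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}
\]
Unless you're working in HDR space, you'll want to clamp the results to [0, 1] for each color component.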
The algorithms above are implemented in the class LuminanceMapper in the included sample code on the CD.
15.5 Implementation Notes
Although the computations described above can execute pretty quickly, there is no need to compute them every frame. The resulting light values should be cached until the time of day or location changes, and may even be precomputed for the range of zenith angles for a given location at load time.
With physically-based light sources, it becomes important that the materials on your scene's objects are also accurate. If the materials on your objects are set to 100% brightness for diffuse or ambient light, they will look too bright when using these techniques. Real-world materials do not reflect 100% of the incoming light, unless they are perfect mirrors. Work with your art staff to ensure your materials are reasonable.
These algorithms all assume a clear sky at the camera's location; for cloudy conditions, you'll need to attenuate the results. Since the amount of attenuation depends on the thickness of the clouds, how much is really up to you. For really thick clouds, you may want to reduce the amount of direct light and increase the ambient, since the clouds will scatter the sunlight further.
Strictly speaking, scattered sunlight and moonlight are not really ambient light; they are more accurately modeled as directional light radiating perpendicular to the surface of the sky dome. Without some sort of global illumination scheme, however, your objects will likely appear too dark if the scattered light does not reflect between objects on the ground, and treating it as an ambient term will yield better results. There is certainly nothing to stop you from doing something more sophisticated with the resulting direct and scattered light that these techniques produce.
References
[1] R. E. Bird and C. Riordan. "Simple Solar Spectral Model for Direct and Diffuse Irradiance on Horizontal and Tilted Planes at the Earth's Surface for Cloudless Atmospheres". Technical Report No. SERI/TR-215-2436, Solar Energy Research Institute, 1984.
[2] Peter Duffett-Smith. Practical Astronomy with your Calculator. Cambridge University Press, 1988.
[3] Frédo Durand and Julie Dorsey. "Interactive Tone Mapping". Proceedings of the Eurographics Workshop on Rendering Techniques 2000, pp. 219–230.
[4] Henrik Wann Jensen, Frédo Durand, Julie Dorsey, Michael M. Stark, Peter Shirley, and Simon Premože. "A Physically-Based Night Sky Model". Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, 2001, pp. 399–408.
[5] Nautical Almanac Offices of the United Kingdom and the United States of America. Explanatory Supplement to the Astronomical Ephemeris and the American Ephemeris and Nautical Almanac. Her Majesty's Stationery Office, 1961.
Chapter 16 - Rendering Physically-Based Skyboxes. Game Engine Gems, Volume One, by Eric Lengyel (ed.), Jones and Bartlett Publishers © 2011
Chapter 16: Rendering Physically-Based Skyboxes
Frank Kane, Sundog Software, LLC
Overview
Simple, GPU-friendly algorithms exist for rendering realistic skyboxes in real time for any given time and location. This gem reviews the Preetham et al. model [3] for accurately distributing luminance throughout a sky, with some extensions to ensure natural-looking results. When paired with the gem "Physically-Based Outdoor Scene Lighting", outdoor scenes with lighting that perfectly matches the sky become possible. Algorithms for procedurally generating realistic skyboxes are surprisingly simple, and enable continuous time of day effects in your engine.
16.1 Generating and Drawing the Skybox
First, some skybox basics: a skybox is just that, a cube colored to look like a sky that is always rendered centered on the camera's location, so that it moves with the camera while keeping its orientation in the world. You may properly position it by simply zeroing out the translation components of your view matrix before drawing it.
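Zeroing the translation might be sketched as follows. The `Matrix4` alias and function name are hypothetical, and the layout is assumed to be the column-major, OpenGL-style convention in which elements 12, 13, and 14 hold the translation; adjust the indices for a row-major engine.

```cpp
#include <array>

// ASSUMPTION: 4x4 matrix stored column-major (OpenGL convention), so the
// translation occupies elements 12, 13, and 14.
using Matrix4 = std::array<float, 16>;

// Returns a copy of the view matrix with its translation removed, leaving
// the rotation intact so the skybox stays centered on the camera.
Matrix4 SkyboxViewMatrix(const Matrix4& view)
{
    Matrix4 result = view;
    result[12] = 0.0f;
    result[13] = 0.0f;
    result[14] = 0.0f;
    return result;
}
```

Working on a copy keeps the scene's real view matrix untouched for the rest of the frame.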
Intuition would tell us to render the skybox as the first thing in your frame, as it's infinitely distant and everything else will be drawn in front of it. You could just disable depth buffer reads and writes, draw the skybox instead of clearing the depth buffer, and go about rendering your scene. This simple approach, however, is not optimal, since you'll end up spending time filling your color buffer with a bunch of sky that will end up being overdrawn by your scene. In reality, you won't save anything on modern hardware by not clearing the depth buffer along with your color buffer at the beginning of the frame; in fact, it may hurt performance. A better approach is to keep clearing your color and depth buffer together, then draw your skybox as the last thing in your frame, with depth buffer reads enabled to prevent drawing parts of the sky that are not visible. If you use an infinite projection matrix while rendering the skybox, drawing the skybox last instead of first will render correctly and can gain you some performance.
For procedurally generated skyboxes, we'll use a vertex program to render the sky just using vertex colors (disable texturing and enable smooth shading). This implies that the geometry of the skybox warrants some effort; you need to have enough vertices in the skybox to achieve convincing results, but too many may impact performance. In the images shown in Figure 16.1, each face of the skybox cube consists of a 40 × 40 grid. You may choose to have a higher vertex density near the horizon, as this is where sky colors change more quickly; you may also choose to have higher vertex density vertically rather than horizontally, as the sky color changes less rapidly as you look around the horizon than as you look up toward the sky's zenith.
Figure 16.1: (See also Color Plates.) Physically-based skybox generated for (left) late morning and (right) twilight. (Images courtesy of Sundog Software, LLC.)
If you're certain that the bottom of your skybox will never be visible, you may omit this face, but as an engine developer, you shouldn't make this assumption. Your users might end up rendering the Earth from space using your skybox for a star-field and wonder why nothing is drawn underneath it. You may also want to allow your users to shift the skybox down by some amount instead of drawing it perfectly centered with the camera position; if the terrain in the scene doesn't actually extend all the way to the horizon, this is a simple way to cover up that fact.
The techniques in this gem will work just as well with a sky dome. However, a box is simpler, allows us to easily cull faces of the box that are not visible, and since we're using vertex colors instead of texture mapping, the corners of the box are not at all perceptible.
Using a conventional projection matrix, the size of the skybox must be chosen with care; you need to ensure that it falls within the near and far clipping planes of your view frustum. This becomes especially problematic with engines that dynamically adjust the near and far clip planes based on the scene's bounding volumes to maximize depth buffer resolution. An elegant way around this problem is to render the skybox using an infinite projection matrix and with w-coordinates of zero, ensuring the far clip plane is irrelevant and the skybox is drawn at maximum depth. Care must be taken with this technique to avoid round-off error artifacts [1].
16.2 Computing the Skybox Vertex Colors
Back in 1993, Perez et al. developed a simple model for distributing luminance across the sky, by fitting equations to experimental data [2]. In 1999, Preetham et al. extended this method to work in full color in a paper presented at SIGGRAPH [3]. The only inputs required are the position of the sun (or moon) in the sky and the turbidity of the atmosphere. Turbidity is a measure of the particulate matter in the atmosphere; a reasonable value for realistic-looking scenes is around 2.2. Higher values imply more polluted skies, and will result in more dramatic sunsets.
The position of the sun or moon is specified in polar coordinates. $\theta_S$ is the angle from the up axis to the light source, and $\varphi_S$ is the angle from the South axis (positive is toward East). The Ephemeris class introduced in "Physically-Based Outdoor Scene Lighting" is useful for computing the position of the sun and moon for a given time and location.
The Perez model for distributing luminance across the sky is given by
\[ F(\theta, \gamma) = \left( 1 + A e^{B / \cos\theta} \right) \left( 1 + C e^{D\gamma} + E \cos^2\gamma \right). \]
We refer to this equation as the "Perez function". Here, $\theta$ is the angle of the skybox vertex from the up axis, and $\gamma$ is the angle between the vertex and the sun or moon; your vertex program will compute these angles given the position of each skybox vertex. We'll discuss the constants $A$ through $E$ shortly.
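To arrive at an actual luminance value $Y_P$ for a given vertex, you need to compute the above function once for the vertex's position, and again for the position of the sun or moon (this only needs to be computed when the time of day changes, then cached):
\[ Y_P = Y_Z \, \frac{F(\theta, \gamma)}{F(0, \theta_S)}. \]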
You'll also need the zenith luminance $Y_Z$. Although Preetham et al. provide a function for computing this in their paper, we've gotten better results by using the Y component of the tone-mapped scattered sunlight or moonlight derived in the gem "Physically-Based Outdoor Scene Lighting". Divide that luminance by 1000, since the constants we're using here are in units of kcd/m².
Computing the luminance $Y_P$ for the skybox vertex is nice, but you want a color, not just a luminance. The trick is to work in xyY color space, which is a means of representing colors where the values $x$ and $y$ represent the color's chromaticity, and $Y$ represents its luminance. For each vertex, you compute the function above three times, once for $x$, once for $y$, and once for $Y$, and then convert the resulting xyY color to RGB.
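A CPU-side sketch of the Perez function and the zenith-relative evaluation is shown below; the same arithmetic would run per vertex in the vertex program. The struct and function names are hypothetical, and the coefficients are left as inputs rather than hard-coded. The sketch also assumes the vertex is above the horizon, since $\cos\theta$ approaches zero there.

```cpp
#include <cmath>

// One set of Perez coefficients; a sky needs three sets (for x, y, and Y).
struct PerezCoefficients { double A, B, C, D, E; };

// F(theta, gamma): theta is the angle from the up axis, gamma is the angle
// to the sun or moon, both in radians.
double PerezF(const PerezCoefficients& k, double theta, double gamma)
{
    const double cosGamma = std::cos(gamma);
    return (1.0 + k.A * std::exp(k.B / std::cos(theta))) *
           (1.0 + k.C * std::exp(k.D * gamma) + k.E * cosGamma * cosGamma);
}

// Value at a sky vertex, relative to the known zenith value. The
// denominator depends only on the sun position and can be cached.
double PerezValue(const PerezCoefficients& k, double zenithValue,
                  double theta, double gamma, double thetaSun)
{
    return zenithValue * PerezF(k, theta, gamma) / PerezF(k, 0.0, thetaSun);
}
```

Calling `PerezValue` three times, with the x, y, and Y coefficient sets and the matching zenith components, yields the xyY color for one vertex.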
This means you'll need an xyY value for the zenith, not only its luminance $Y_Z$. If you have the zenith color in XYZ form, you'll need to convert XYZ to xyY using the relationship
\[ x = \frac{X}{X + Y + Z}, \qquad y = \frac{Y}{X + Y + Z}. \]
The constants $A$, $B$, $C$, $D$, and $E$ in the Perez function are themselves functions of the turbidity value $T$, and are different depending on whether you're computing the Perez function for $x$, $y$, or $Y$. These constants are given by
These values only need to be recomputed if the simulated turbidity level changes, and should be passed into your vertex program as uniform parameters. The result of the Perez function for the sun position and the zenith xyY color should also be uniform parameters, as well as the position of the sun (or moon).
With these values and the position of each vertex in your skybox, you have everything you need for your physically-based skybox vertex program to compute xyY values for each sky vertex. The last step is to map this result to RGB values. To do this, first convert the xyY value to XYZ:
\[ X = \frac{x}{y}\, Y, \qquad Z = \frac{1 - x - y}{y}\, Y. \]
If the zenith color you're using is already tone mapped, then further tone mapping is optional. The same tone mapping technique described in "Physically-Based Outdoor Scene Lighting" may be applied at this point, however, and will result in effects such as a warm glow surrounding a full moon in the sky.
Then, convert XYZ to RGB using your favorite conversion matrix. We use
Other matrices tend to make the sky look a little green, which looks quite unnatural. If you're not working in HDR space, you'll want to scale down the resulting RGB values to fit within [0, 1] if necessary.
The final tweak to the RGB value is important and often overlooked: gamma-correcting the color for the display. Your sky will appear too dark otherwise. Simply raise the final RGB color to the power of 1/γ. A gamma value of 1.8 provides satisfying results.
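The xyY-to-XYZ step and the final gamma correction can be sketched as follows. The names are hypothetical, the xyY-to-XYZ relationship is the standard CIE one, and the XYZ-to-RGB matrix is deliberately left out so that any matrix can be dropped in between the two functions.

```cpp
#include <algorithm>
#include <cmath>

struct XYZColor { double X, Y, Z; };

// Standard CIE conversion from chromaticity (x, y) plus luminance Y to XYZ.
// Returns black when y is not positive to avoid dividing by zero.
XYZColor xyYToXYZ(double x, double y, double Y)
{
    if (y <= 0.0) return { 0.0, 0.0, 0.0 };
    return { x * Y / y, Y, (1.0 - x - y) * Y / y };
}

// Gamma-correct one linear color component for display (1.8 by default,
// the value suggested in the text). Negative inputs are treated as zero.
double GammaCorrect(double linear, double gamma = 1.8)
{
    return std::pow(std::max(linear, 0.0), 1.0 / gamma);
}
```

Apply `GammaCorrect` to each of R, G, and B after the matrix multiply and any scaling into [0, 1], so that the exponent operates on displayable values.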
A simple vertex program written in Cg that implements much of this algorithm is available with the book's sample code. Potential extensions of this program would include handling two light sources simultaneously (the sun and the moon) instead of a single dominant light source, incorporating volumetric fog effects into the sky, simulating overcast conditions, and the tone-mapping described above.
16.3 Integrating the Skybox with Your Scene
One challenge a physically-based skybox presents is that it becomes difficult to blend your distant scenery with the skybox, since the sky's colors may vary about the horizon. The usual trick of fogging distant terrain to blend into the sky may produce a visible seam, since the sky is not a constant color.
Preetham et al. describe an involved means for implementing physically-based atmospheric perspective effects on terrain that will match the skybox perfectly, but from a real-time performance standpoint this is impractical for dynamic scenes with complex geometry.
One solution is to blend the skybox itself to a fixed fog color at and below the horizon. If this fog color is chosen wisely, the sky/terrain boundary will be seamless, but you will lose some of the more dramatic effects at dawn and dusk with this approach.
Another solution is to sample the skybox color at the horizon directly in front of the camera every frame, and set your terrain's fog color to match it. The result won't be perfectly seamless, but this is a good compromise for reasonable fields of view.
16.4 Embellishing Your Skybox
After drawing the skybox itself, you'll want to draw billboards representing the sun and the moon (preferably in the correct phase) in their proper locations. Both the sun and moon, by an amazing coincidence, cover about half a degree of the sky. However, if you render them at their physically accurate size, they'll seem much too small due to psycho-perceptual issues and typical fields of view that are unrealistically large, so go ahead and draw them at whatever size looks right to you. Users also expect to see a large glare effect surrounding the sun, which will also help it to look bigger.
Another dramatic effect is rendering the stars and planets as part of your skybox. If you draw them as points with an additive blending mode, they'll start to emerge at dusk. Using the matrices in our Ephemeris class, your stars will move across the sky in a realistic manner. Data on the positions, magnitudes, and colors of the visible stars are readily available, and integrating this data into your skybox is a fun project.
This gem will render clear skies for any time of day and location, but clear skies are not the norm. Adding some clouds to the scene adds an extra level of realism. Rendering properly lit 3D volumetric clouds is a challenging task, but even a single large quad high in the sky with a cirrus cloud texture on it will go a long way.
References
[1] E. Lengyel. "Projection Matrix Tricks". Game Developers Conference 2007. http://www.terathon.com/gdc07_lengyel.ppt
[2] R. Perez, R. Seals, and J. Michalsky. "An All-Weather Model for Sky Luminance Distribution". Solar Energy, Volume 50, Number 3 (March 1993), pp. 235–245.
[3] A. J. Preetham, Peter Shirley, and Brian Smits. "A Practical Analytic Model for Daylight". Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques, 1999, pp. 91–100.
Chapter 17 - Motion Blur and the Velocity-Depth-Gradient Buffer. Game Engine Gems, Volume One, by Eric Lengyel (ed.), Jones and Bartlett Publishers © 2011
Chapter 17: Motion Blur and the Velocity-Depth-Gradient Buffer
Eric Lengyel, Terathon Software
Highlights
Motion blur adds a significant amount of realism to a rendered scene since our eyes are accustomed to seeing it when we look at moving objects in the real world. There are several techniques for producing motion blur in computer graphics, and they vary widely in both rendering speed and image quality. Temporal supersampling, in which multiple frames are rendered and then combined to form one image, can produce very accurate results, but its extreme rendering expense makes it impractical for real-time applications like games. Techniques using an accumulation buffer of some kind to store previous frames to be combined with the current frame are fast, but produce terrible results in terms of image quality.
There is a class of motion blur techniques that are based on calculating per-pixel velocities and using them to collect many samples from the color buffer along the direction of motion. These techniques generally produce good results, but many of them produce a particular artifact that manifests itself as a fuzzy halo around foreground objects when the pixels behind them are moving quickly. The technique described in [1] makes no attempt to eliminate or reduce the appearance of this artifact. The method presented in [2] eliminates the artifact, but also eliminates some cases of correct motion blur, and it comes with some significant limitations.
There is but one method that is both fast and capable of producing high-quality images, and it involves the use of a velocity buffer in conjunction with a post-processing shader to render a directional blur for pixels belonging to moving objects. A basic implementation of this concept produces adequate results for some applications, but it also produces the fuzzy halo artifact. This gem discusses an improvement to this motion blur technique that eliminates halo artifacts without also affecting cases where motion blur would be correctly rendered, producing images of much higher quality than is possible with other techniques. The method described here was originally implemented in the C4 Engine [3], and that is the source of the images shown in this gem.
17.1 Technique Overview
The technique described in this gem requires that a dedicated four-channel velocity buffer be allocated by the rendering system. For each frame we render, we fill this velocity buffer with information about the two-dimensional screen-space velocity of pixels belonging to each object to which we want to apply the motion blur effect. This is typically done early in the rendering process for a particular frame, and it happens independently of any previously rendered frames.
When rendering to the velocity buffer, there are three sources of motion that we need to consider. First, we must take the motion of the camera into account, and this motion affects all objects in the scene. Second, we must consider the motion that individual objects have as a whole. An object may be moving through space or rotating, and this motion can be captured by considering the object's transformation matrix for both the current frame and the preceding frame. Third, it's possible that the vertices composing an object's triangle mesh are themselves in motion. This frequently occurs with skinned character models, soft bodies, and cloth. Examples of motion blur arising from camera movement and object movement are shown in Figure 17.1. An example of motion blur due to vertex movement within a single mesh is shown in Figure 17.2.
Figure 17.1: (See also Color Plates.) In the left image, motion blur resulting only from camera movement is shown. Notice how the ground and trees closer to the camera are blurred much more than distant objects. In the right image, motion blur resulting from rigid objects moving in the scene is shown. Both translational and rotational motion are visible in this still image. (Images courtesy of Terathon Software LLC.)
Figure 17.2: Motion blur resulting from vertex movement on a skinned character model. (Image courtesy of Terathon Software LLC.)
At the end of the rendering process for a single frame, we apply to the entire screen a post-processing shader that generates the motion blur effect using the data stored in the velocity buffer. This shader can usually be combined with other post-processing effects such as glow, distortion, and color matrix application. The shader that generates the motion blur reads the screen-space velocity for each pixel from the velocity buffer and uses it to choose a set of sample points at which the color buffer is then read. These color samples are distributed along the direction of the velocity, and they are spread over a larger distance for higher velocities. After the color samples are collected, they are combined to generate the final color for each pixel.
It is not always the case that we want to use all of the color samples for a particular pixel. In particular, if a fast-moving object passes behind a slow-moving or stationary object with respect to the camera, then the motion blur applied to pixels belonging to the background object should not sample pixels belonging to the foreground object. To prevent this from occurring, we must be able to determine when pixels belong to the same object and when they don't, while the post-processing shader is collecting color samples. This can be accomplished by storing additional information about the depth and the slope of surfaces in the velocity buffer.
In the two remaining available channels of the velocity buffer, we store the depth $z$ of each pixel in camera space, and we store the magnitude of the two-dimensional gradient of the depth. These values give us the ability to calculate the minimum depth $z_{min}$ that a sample location must have in order to be considered part of the same surface as the pixel being blurred. The formula is
(17.1)
where $r$ is the distance from the sample location to the pixel being blurred. Color samples lying closer to the camera than this minimum depth are discarded. Figure 17.3 demonstrates how this technique eliminates artifacts that appear if the depth and gradient are not considered.
Figure 17.3: (See also Color Plates.) In these two images, the camera is rotating around the character, causing the ground to move across the screen while the character is almost completely still. In the left image, the depth and gradient information in the velocity buffer is not considered, and all color samples along the direction of the velocity are used. Notice the ghosting of the glowing parts of the character's armor and the fuzzy halo surrounding his legs. In the right image, the depth and gradient information in the velocity buffer is considered, and the rejection of the appropriate color samples eliminates the artifacts. (Images courtesy of Terathon Software LLC.)
17.2 Rendering to the Velocity-Depth-Gradient Buffer
At some point in time before post-processing is performed, we must fill a dedicated velocity-depth-gradient buffer with the information that will be used to generate the motion blur effect. Many modern engines render a depth-only pass at the beginning of a frame in order to maximize the effectiveness of hierarchical depth buffering capabilities built into the graphics hardware. This gives us a convenient place to also render our velocity information without having to pass vertex data through the rendering pipeline a second time. Since many engines also require a linear depth value to be generated and stored early in the frame in order to render some types of special effects, it is doubly convenient that such a depth is one of the values we must calculate and store in the velocity buffer.
We render out velocity, depth, and gradient values into a floating-point buffer having 16 bits per channel. It is possible to implement the technique described in this gem using a conventional integer buffer with 8 bits per channel, but the small amount of available precision for the depth value in that case limits the practical range for which we can eliminate motion blur artifacts to an unacceptably short distance in front of the camera.
To calculate a two-dimensional screen-space velocity, we determine the screen-space positions for each vertex over two consecutive frames, subtract them, and then multiply by a normalizing scale factor. The homogeneous screen-space position $P_{screen}$ of a vertex is given by
\[ P_{screen} = M_{viewport}\, M_{project}\, M_{camera}\, M_{model}\, P_{model}, \tag{17.2} \]
where $P_{model}$ is the model-space vertex position, $M_{model}$ is the matrix that transforms model-space points into world space, $M_{camera}$ is the matrix that transforms world-space points into camera space, $M_{project}$ is the projection matrix for the camera, and $M_{viewport}$ is the viewport transformation. The values of $M_{viewport}$ and $M_{project}$ are ordinarily constant from one frame to the next, but the values of $M_{camera}$, $M_{model}$, and $P_{model}$ can change. Thus, it is necessary to store the $M_{camera}$ matrix used by the camera during the preceding frame, and it is necessary to store the $M_{model}$ matrix used by each model during the preceding frame. If the model-space vertex positions can change for a particular model (for example, on a skinned character), then the entire array of vertex positions used by that model during the preceding frame must also be stored.
When rendering an object into the velocity buffer, we calculate the product of the four matrices in Equation (17.2) to construct the matrix $M_{motion}$ for both the preceding frame and the current frame as follows:
\[ M^{A}_{motion} = M_{viewport}\, M_{project}\, M^{A}_{camera}\, M^{A}_{model}, \qquad M^{B}_{motion} = M_{viewport}\, M_{project}\, M^{B}_{camera}\, M^{B}_{model}. \tag{17.3} \]
The superscript A indicates a matrix belonging to the preceding frame, and the superscript B indicates a matrix belonging to the current frame. The two products $M^{A}_{motion}$ and $M^{B}_{motion}$ are sent to the GPU as parameters that can be accessed by the vertex shader. As shown in Listing 17.1, these matrices are used in the vertex shader to calculate two homogeneous screen-space positions for each vertex and store them in texture coordinates that are interpolated as triangles are rasterized. The vertex shader shows the same position being transformed for both frames A and B, but in the case that $P_{model}$ is not constant, a second vertex position array must be specified and used when calculating the position for frame A.
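On the CPU side, building the two motion matrices amounts to the same four-matrix product evaluated with the preceding frame's and the current frame's camera and model matrices. The sketch below illustrates this under assumptions: the `Matrix4` alias and function names are hypothetical, and matrices are stored column-major and applied to column vectors, as in OpenGL.

```cpp
#include <array>

// ASSUMPTION: column-major 4x4 matrices applied to column vectors.
using Matrix4 = std::array<double, 16>;

// Returns the product a * b.
Matrix4 Multiply(const Matrix4& a, const Matrix4& b)
{
    Matrix4 c{};
    for (int col = 0; col < 4; ++col) {
        for (int row = 0; row < 4; ++row) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k)
                sum += a[k * 4 + row] * b[col * 4 + k];
            c[col * 4 + row] = sum;
        }
    }
    return c;
}

// M_motion = M_viewport * M_project * M_camera * M_model for one frame.
// Call once with the preceding frame's camera and model matrices (frame A)
// and once with the current frame's matrices (frame B).
Matrix4 MotionMatrix(const Matrix4& viewport, const Matrix4& project,
                     const Matrix4& camera, const Matrix4& model)
{
    return Multiply(viewport, Multiply(project, Multiply(camera, model)));
}
```

An engine would typically copy each object's current model matrix, and the camera matrix, into a "previous" slot at the end of every frame so that both products can be formed on the next one.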
Listing 17.1: This GLSL vertex shader first transforms the vertex position for the current frame to homogeneous clip-space coordinates in the ordinary manner using the model-view-projection (MVP) matrix. The shader then transforms the vertex position into screen space for both the preceding frame using the matrix motionA and the current frame using the matrix motionB. The resulting positions are output as texture coordinates that will be read by the fragment shader.

    uniform mat4 mvp, motionA, motionB;

    void main()
    {
        // Transform the position using the ordinary MVP matrix.
        gl_Position = mvp * gl_Vertex;

        // Transform the position into screen space using the motion matrix
        // from the preceding frame (A) and the current frame (B).
        gl_TexCoord[0] = motionA * gl_Vertex;
        gl_TexCoord[1] = motionB * gl_Vertex;
    }

The matrix multiplications performed by the vertex shader produce two four-dimensional homogeneous screen-space positions. It is important that these positions be interpolated in homogeneous form and that the perspective divide by the w-coordinate occurs in the fragment shader. Otherwise, the interpolated positions in the interiors of triangles would be incorrect, especially for triangles having vertices that lie behind the camera.
In the fragment shader used when rendering to the velocity buffer, we obtain two-dimensional screen-space positions for frames A and B by dividing the homogeneous interpolated positions $P^{A}_{screen}$ and $P^{B}_{screen}$ by their w-coordinates, as shown in Listing 17.2. Subtracting these positions then produces a screen-space velocity $V$ through the formula
\[ V = \frac{P^{B}_{screen}}{\left( P^{B}_{screen} \right)_w} - \frac{P^{A}_{screen}}{\left( P^{A}_{screen} \right)_w}, \tag{17.4} \]
where only the x and y components of the velocity are calculated.
Listing 17.2: This GLSL fragment shader calculates the screen-space velocity and writes it to the red and green components of the output color. The velocityScale parameter holds the value of s/r_max shown in Equation (17.6). The depth is taken directly from the w-coordinate of the current position and is written to the blue component of the output color. The gradient of the depth is calculated using the hardware derivative functions, and the larger of its components is written to the alpha component of the output color.

    uniform vec2 velocityScale;

    void main()
    {
        // Divide by the w-coordinates to get 2D screen-space positions.
        vec2 posA = gl_TexCoord[0].xy / gl_TexCoord[0].w;
        vec2 posB = gl_TexCoord[1].xy / gl_TexCoord[1].w;

        // Subtract the positions and scale to get velocity.
        vec2 veloc = (posB - posA) * velocityScale;

        // Clamp the velocity to a max magnitude of 1.0.
        float vmax = max(abs(veloc.x), abs(veloc.y));
        gl_FragColor.xy = veloc / max(vmax, 1.0);

        // Pass the current depth through.
        gl_FragColor.z = gl_TexCoord[1].w;

        // Calculate the max component of the depth gradient using the
        // GLSL derivative functions dFdx and dFdy.
        vec2 grad = vec2(dFdx(gl_TexCoord[1].w), dFdy(gl_TexCoord[1].w));
        gl_FragColor.w = max(abs(grad.x), abs(grad.y));
    }

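The magnitude of the velocity $\mathbf{V}$ is unbounded, but we can only read a limited number of color samples per pixel in the post-processing phase, and we don't want them to be too far away from the pixel being processed. Thus, to ensure a smooth blur, we need to impose some kind of bounds on the velocity's size. We first divide the velocity by the maximum distance $r_{\max}$ that we want to allow between a pixel's location and any color sample used to blur it. The value $1/r_{\max}$ is a constant that is passed to the fragment shader as a parameter by which the velocity is multiplied. We can also include in this parameter a normalization factor $s$ that accounts for the time in between two frames and adjusts the overall intensity of the motion blur effect. We define $s$ as

$$s = m\,\frac{t_0}{\Delta t}, \tag{17.5}$$

where $t_0$ is the target time interval between frames, $\Delta t$ is the actual time between the preceding frame and the current frame, and $m$ is an adjustable factor that controls the motion blur intensity. The scaled screen-space velocity $\mathbf{V}'$ is given by

$$\mathbf{V}' = \frac{s}{r_{\max}}\,\mathbf{V}. \tag{17.6}$$

After applying this scale factor, we clamp the velocity's magnitude to the range [0,1] using the following formula to preserve its direction (written here with the larger component magnitude, as computed in Listing 17.2):

$$\mathbf{V}_{\text{final}} = \frac{\mathbf{V}'}{\max\bigl(|V'_x|,\,|V'_y|,\,1\bigr)}. \tag{17.7}$$

The x and y components of the velocity $\mathbf{V}_{\text{final}}$ are stored in the red and green channels of the color output to the velocity buffer.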
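The scaling and clamping steps can be sketched on the CPU as follows. This is a minimal sketch, assuming that the normalization factor is $s = m\,t_0/\Delta t$ and that the clamp divides by the larger component magnitude as Listing 17.2 does. The function names `velocity_scale` and `encode_velocity` are hypothetical.

```python
def velocity_scale(m, t0, dt, r_max):
    """Value of s / r_max, assuming s = m * t0 / dt (see lead-in)."""
    return m * t0 / (dt * r_max)


def encode_velocity(pos_a, pos_b, scale):
    """Scale the screen-space displacement and clamp it as in Listing 17.2.

    pos_a and pos_b are 2D screen-space positions for frames A and B.
    Returns the (red, green) pair that would be written to the velocity buffer.
    """
    vx = (pos_b[0] - pos_a[0]) * scale
    vy = (pos_b[1] - pos_a[1]) * scale
    # Dividing both components by the same value preserves the direction.
    vmax = max(abs(vx), abs(vy), 1.0)
    return (vx / vmax, vy / vmax)
```

For example, with a target rate of 60 frames per second, an actual frame time of 1/30 second, m = 1, and r_max = 7, a displacement of (28, -14) pixels scales to roughly (2, -1) and is clamped to roughly (1, -0.5).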
What remains is to write the depth and gradient information to the blue and alpha channels of the velocity buffer. The linear camera-space depth is supplied by the w-coordinate of the position for the current frame, and it is simply copied to the blue channel of the output color. To obtain the gradient of the depth, we query the hardware for the derivatives of the position's w-coordinate in both the x and y screen directions. To achieve slightly higher performance, we output the larger absolute value of the two derivatives in the alpha channel instead of computing the actual magnitude of the gradient. (That is, we compute the maximum norm instead of the Euclidean norm.) In the fragment shader shown in Listing 17.2, the gradient magnitude $g$ stored in the alpha channel is given by

$$g = \max\left(\left|\frac{\partial z}{\partial x}\right|,\ \left|\frac{\partial z}{\partial y}\right|\right). \tag{17.8}$$
17.3 Rendering the Post-Processing Effect

At the end of a frame, the motion blur effect is generated for the final image by rendering a post-processing shader over the entire screen. This shader uses the information in the velocity-depth-gradient buffer to select a set of sample locations from which the color buffer is read. All of the color samples that satisfy the minimum depth requirement are averaged together to produce the final color for each pixel. Since the sample locations are derived from the magnitude and direction of the velocity that a pixel possesses, the result is an image containing convincing motion blur.

The fragment shader shown in Listing 17.3 performs the motion blur operation. It starts by sampling the color buffer at the pixel location being rendered, which we call the "center pixel", and initializing the number of valid samples to one. The number of valid samples is stored in the w component of the color, and the sample at the center pixel is always valid. This particular implementation takes eight additional equally weighted samples from the color buffer at equally spaced intervals in the direction parallel to the velocity. There is some freedom in choosing the number of samples and their weights, and some implementations may even decide to take a variable number of color samples based on the magnitude of the velocity.
Listing 17.3: This GLSL fragment shader applies the motion blur effect in the post-processing pass. The nine color samples are accumulated in the x, y, and z components of the color vector, and the number of valid samples is stored in the w component of the color vector. The value of minDepth is calculated using Equation (17.9), and only samples having a depth at least this far from the camera plane are used to generate the final blurred pixel.

    #extension GL_ARB_texture_rectangle : require

    uniform sampler2DRect colorTexture;
    uniform sampler2DRect velocityTexture;

    void main()
    {
        vec4 color, sample;

        // Read the center sample from the color buffer.
        color.xyz = texture2DRect(colorTexture, gl_FragCoord.xy).xyz;
        color.w = 1.0;

        // Read the velocity buffer at the current pixel.
        vec4 velo = texture2DRect(velocityTexture, gl_FragCoord.xy);

        // Calculate the minimum depth for other color samples.
        float minDepth = velo.z - max(velo.w, 0.001) * 7.0;

        // Initialize constant sample weight.
        sample.w = 1.0;

        // Calculate coordinates for first sample on either side.
        vec4 coord = velo.xyxy * vec4(1.75, 1.75, -1.75, -1.75) + gl_FragCoord.xyxy;

        // Read a color and depth at the sample location.
        sample.xyz = texture2DRect(colorTexture, coord.xy).xyz;
        float depth = texture2DRect(velocityTexture, coord.xy).z;

        // Add the sample to the final color if its depth is great enough.
        if (depth >= minDepth) color += sample;

        // Grab the sample on the opposite side of the center pixel.
        sample.xyz = texture2DRect(colorTexture, coord.zw).xyz;
        depth = texture2DRect(velocityTexture, coord.zw).z;
        if (depth >= minDepth) color += sample;

        // Calculate coordinates for second pair of samples.
        coord = velo.xyxy * vec4(3.5, 3.5, -3.5, -3.5) + gl_FragCoord.xyxy;
        sample.xyz = texture2DRect(colorTexture, coord.xy).xyz;
        depth = texture2DRect(velocityTexture, coord.xy).z;
        if (depth >= minDepth) color += sample;

        sample.xyz = texture2DRect(colorTexture, coord.zw).xyz;
        depth = texture2DRect(velocityTexture, coord.zw).z;
        if (depth >= minDepth) color += sample;

        // Calculate coordinates for third pair of samples.
        coord = velo.xyxy * vec4(5.25, 5.25, -5.25, -5.25) + gl_FragCoord.xyxy;
        sample.xyz = texture2DRect(colorTexture, coord.xy).xyz;
        depth = texture2DRect(velocityTexture, coord.xy).z;
        if (depth >= minDepth) color += sample;

        sample.xyz = texture2DRect(colorTexture, coord.zw).xyz;
        depth = texture2DRect(velocityTexture, coord.zw).z;
        if (depth >= minDepth) color += sample;

        // Calculate coordinates for fourth pair of samples.
        coord = velo.xyxy * vec4(7.0, 7.0, -7.0, -7.0) + gl_FragCoord.xyxy;
        sample.xyz = texture2DRect(colorTexture, coord.xy).xyz;
        depth = texture2DRect(velocityTexture, coord.xy).z;
        if (depth >= minDepth) color += sample;

        sample.xyz = texture2DRect(colorTexture, coord.zw).xyz;
        depth = texture2DRect(velocityTexture, coord.zw).z;
        if (depth >= minDepth) color += sample;

        // Total weight of used color samples is in the w-coordinate.
        // Divide by it to get the final averaged color.
        gl_FragColor.xyz = color.xyz / color.w;
    }
The velocity-depth-gradient buffer is read at the location of the center pixel, and the minimum depth required for all color samples is calculated as

$$z_{\min} = z - r_{\max}\,\max(g,\ 0.001), \tag{17.9}$$

where $z$ is the depth stored in the blue channel of the buffer, $g$ is the depth gradient given by Equation (17.8) stored in the alpha channel of the buffer, and $r_{\max}$ is the largest distance between the center pixel and a sample location. This is a little different from Equation (17.1) because we use the maximum sample distance $r_{\max}$ to compute a single minimum depth $z_{\min}$ that is used for all sample values. We clamp the minimum value of the gradient to 0.001 so that precision errors don't prevent the motion blur effect from working on surfaces that are nearly perpendicular to the view direction.

The value of $r_{\max}$ is 7.0 in the fragment shader shown in Listing 17.3, and color samples are taken at offsets given by the velocity vector multiplied by the values 1.75, 3.5, 5.25, and 7.0. At each sample location, the velocity buffer is read, but only to fetch the depth at that sample location and compare it to $z_{\min}$. (Velocity and gradient information is only used at the center pixel.) For any sample satisfying $z \ge z_{\min}$, we add the color sample to the final color and add one to the number of valid samples (in the w component of the color). After all samples have been taken, we divide the final color by the number of valid samples that have been accumulated and output the result.
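The accumulation logic described above can be sketched for a single pixel on the CPU. This is an illustrative reference only: the buffers are plain nested lists, sampling is nearest-neighbor with clamping at the borders (an assumption; the shader relies on the texture unit), and `nearest` and `blur_pixel` are hypothetical names.

```python
import math

OFFSETS = (1.75, 3.5, 5.25, 7.0)  # sample offsets used in Listing 17.3
R_MAX = 7.0                        # largest sample distance


def nearest(value, upper):
    """Round to the nearest texel index and clamp to [0, upper]."""
    return min(max(int(math.floor(value + 0.5)), 0), upper)


def blur_pixel(color, vdg, x, y):
    """Blur one pixel.

    color[y][x] is an (r, g, b) tuple.
    vdg[y][x] is a (vx, vy, depth, gradient) tuple from the
    velocity-depth-gradient buffer.
    """
    height, width = len(color), len(color[0])
    vx, vy, z, g = vdg[y][x]
    z_min = z - max(g, 0.001) * R_MAX   # minimum depth for valid samples
    total = list(color[y][x])           # the center sample is always valid
    count = 1
    for k in OFFSETS:
        for sign in (1.0, -1.0):        # one sample on either side
            sx = nearest(x + sign * k * vx, width - 1)
            sy = nearest(y + sign * k * vy, height - 1)
            if vdg[sy][sx][2] >= z_min:
                total = [t + c for t, c in zip(total, color[sy][sx])]
                count += 1
    return tuple(t / count for t in total)
```

Samples that lie on a nearer surface than the minimum depth are simply skipped, so a slow-moving foreground object is not smeared into a fast-moving background pixel.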
17.4 Grid Optimization

The fragment shader presented in Listing 17.3 produces very precise results, but it can be unnecessarily expensive for large parts of the scene. When it is known that a region of the screen contains pixels that are all moving at similar speeds relative to the camera, a simpler shader that does not consider depth can be used in order to increase overall performance. The use of the full implementation can be limited to those areas of the screen in which slow-moving foreground objects are expected to appear.

In Figure 17.4, we have divided the screen into a 16×12 grid of rectangular cells. Before rendering the post-processing shader, we determine whether any foreground objects might be slow moving relative to the background and mark cells covered by those objects as requiring the full-blown shader. This would typically be done when objects are being rendered into the velocity buffer near the beginning of the frame. In the post-processing phase, we apply the simpler, faster shader to cells that have not been marked.

Figure 17.4: (See also Color Plates.) In this image, the viewport is partitioned into a grid of 16×12 cells. The depth and gradient information is only used in the highlighted cells surrounding the character since that is where foreground objects are likely to be moving slowly relative to the background. (Image courtesy of Terathon Software LLC.)
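One way to mark cells is sketched below. It assumes that each candidate object supplies a screen-space bounding rectangle in pixels; how an engine decides that an object is slow moving, and how it obtains the rectangle, is not specified in the text. The name `mark_cells` is hypothetical.

```python
GRID_COLS, GRID_ROWS = 16, 12


def mark_cells(marked, rect, screen_w, screen_h):
    """Mark every grid cell overlapped by a screen-space rectangle.

    marked is a GRID_ROWS x GRID_COLS list of booleans, modified in place.
    rect is (x0, y0, x1, y1) in pixels with x0 <= x1 and y0 <= y1.
    """
    x0, y0, x1, y1 = rect
    cell_w = screen_w / GRID_COLS
    cell_h = screen_h / GRID_ROWS
    # Convert pixel bounds to cell indices and clamp them to the grid.
    c0 = max(int(x0 // cell_w), 0)
    c1 = min(int(x1 // cell_w), GRID_COLS - 1)
    r0 = max(int(y0 // cell_h), 0)
    r1 = min(int(y1 // cell_h), GRID_ROWS - 1)
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            marked[r][c] = True


# Usage: marked cells get the full shader, the rest get the simpler one.
grid = [[False] * GRID_COLS for _ in range(GRID_ROWS)]
mark_cells(grid, (100.0, 90.0, 250.0, 170.0), 1280, 960)
```

In the post-processing phase, each cell would then be drawn with the depth-aware shader if its flag is set and with the simpler shader otherwise.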
References

[1] Gilberto Rosado. "Motion Blur as a Post-Processing Effect". GPU Gems 3, Addison-Wesley, 2008.
[2] Ben Padget. "Efficient Real-Time Motion Blur for Multiple Rigid Objects". ShaderX7, Charles River Media, 2009.
[3] C4 Engine. http://www.terathon.com/c4engine/
Chapter 18 - Fast Screen-Space Ambient Occlusion and Indirect Lighting. Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011

Chapter 18: Fast Screen-Space Ambient Occlusion and Indirect Lighting

László Szirmay-Kalos, Balázs Tóth, and Tamás Umenhoffer
Budapest University of Technology and Economics
18.1 Introduction

A physically correct approach to rendering would be the solution of the rendering equation, but it is too expensive computationally when dynamic scenes are processed in real time. Thus in practice, we prefer approximations that can be efficiently evaluated. A common simplification is the local illumination model, which computes only the direct contribution of the light sources and adds a constant ambient term for the missing indirect illumination. However, constant ambient lighting ignores the geometry of the scene, which results in plain and unrealistic images. We need better compromises between the rendering equation and the classical ambient model.

Local approaches examine only a neighborhood of the shaded point during illumination calculation. The obscurances method [10, 2], which is also called ambient occlusion [1, 6], computes just how "open" the scene is in the neighborhood of a point, and scales the ambient light accordingly. To compute occlusions in real time, the method called screen-space ambient occlusion [5] examines the height field defined by the current content of the depth buffer instead of the real scene's geometry. Thus, when a point is shaded, the required geometric information about the rest of the scene is fetched from a depth texture. The classical ambient occlusion approach assumes that no illumination comes from nearby occluders. However, using the albedo or the color of these points, local indirect lighting can also be approximated [10, 8].
In this article, we present a simplified illumination model that is derived from the rendering equation. The model consists of two parts: the ambient occlusion part describing the influence of the distant part of the scene, and the indirect illumination part, which is responsible for local interactions. Both parts are directional integrals, which would need many discrete samples for an accurate estimation. To evaluate them efficiently, we transform these integrals. The ambient occlusion integral is first transformed to a volumetric integral, which is evaluated analytically along the depth coordinate; the remaining integral over the disk perpendicular to the viewing direction is then obtained numerically. The indirect lighting integral is expressed from the stable ambient occlusion estimate. Thus, our method is more general than [10] since it also takes into account the one-bounce of the direct lighting, and it is more robust than [8] since it does not include infinite variation form factors in the approximation.

The proposed method falls into the category of screen-space techniques since we assume that the depth buffer is the sampled representation of the scene geometry and the color buffer stores the radiance values of the represented surfaces. However, we work in camera space rather than in screen space to obtain correct distance and angle values.

The input of our rendering method includes the textures of camera-space depth values, normal vectors of the visible points, and the color buffer storing the radiances due to direct lighting. From these input textures, a deferred shading algorithm computes the radiances of the visible points, taking into account ambient occlusion and local indirect illumination.
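18.2 A General Ambient Illumination Model

Let us assume that the surfaces are diffuse. According to the rendering equation, the reflected radiance $L^r$ at shaded point $\mathbf{s}$ can be obtained as an integral of directions $\omega$ running over the unit hemisphere $\Omega$ above the surface:

$$L^r(\mathbf{s}) = \frac{a}{\pi} \int_{\Omega} L^{in}(\mathbf{s}, \omega)\,(\mathbf{N}_s \cdot \omega)\, d\omega,$$

where $a$ is the albedo of the diffuse surface, $\mathbf{N}_s$ is the unit normal at the shaded point, and $L^{in}(\mathbf{s}, \omega)$ is the incident radiance of the shaded point from direction $\omega$.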
If an occluder point $\mathbf{o}$ is visible from $\mathbf{s}$ in the direction $\omega$ (see Figure 18.1) and the space is not filled with participating media, then the incident radiance is equal to exiting radiance $L^r(\mathbf{o})$. If no surface is seen, then shaded point $\mathbf{s}$ is said to be open in this direction, and the incident radiance is ambient intensity $L^a$. However, this does not meet our intuition and everyday experience that the effect of distant surfaces is replaced by their average. This experience is due to the fact that the actual space is not empty but is filled with a participating medium. If its extinction coefficient is $\sigma$ and its albedo is 1, then the radiance along a ray of direction $\omega$ changes according to the volumetric rendering equation

$$L^{in}(\mathbf{s}, \omega) = e^{-\sigma D}\, L^r(\mathbf{o}) + \left(1 - e^{-\sigma D}\right) L^a,$$

where $D$ is the distance between the shaded and the occluder points. Note that in this equation factor $\mu(D) = 1 - e^{-\sigma D}$ and its complement $1 - \mu(D) = e^{-\sigma D}$ express the effects of the ambient lighting and of the occluder on the shaded point, respectively. The effect of the occluder diminishes with the distance. The function $\mu$ is a fuzzy measure that defines how strongly direction $\omega$ belongs to the set of open directions based on distance $D$ of the occlusion at this direction.

Figure 18.1: The shaded point $\mathbf{s}$ is the center of the neighborhood sphere. The radius of the sphere is $R$. Those directions $\omega$ where there is no intersection closer than $R$ are called open. Point $\mathbf{o}$ is an intersection closer than $R$.
The exponential function derived from the physical analogy of participating media has a significant drawback [2]. As it is nonzero for arbitrarily large distances, very distant surfaces need to be considered that otherwise have a negligible effect. Thus, for practical fuzzy measures we use functions that are nonnegative, monotonically increasing from zero, and reach one at the finite distance $R$. This allows the consideration of only those occlusions that are nearby, i.e., in the neighborhood sphere of radius $R$. The particular value of $R$ can be set by the application developer. When we increase this value, shadows due to ambient occlusions get larger and softer.
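To define the fuzzy measure that increases from zero to one in $[0, R]$, we can use a simple polynomial

$$\mu(D) = \begin{cases} (D/R)^{\alpha} & \text{if } D < R, \\ 1 & \text{otherwise.} \end{cases}$$

Using this fuzzy measure, the reflected radiance of the shaded point can be expressed in the following way:

$$L^r(\mathbf{s}) = a \left( L^a\, W(\mathbf{s}) + I(\mathbf{s}) \right).$$

The first term of this expression is the ambient occlusion of the shaded point:

$$W(\mathbf{s}) = \frac{1}{\pi} \int_{\Omega} \mu(D(\omega))\,(\mathbf{N}_s \cdot \omega)\, d\omega.$$

The second term is the irradiance due to nearby indirect lighting:

$$I(\mathbf{s}) = \frac{1}{\pi} \int_{\Omega} \bigl(1 - \mu(D(\omega))\bigr)\, L^r(\mathbf{o})\,(\mathbf{N}_s \cdot \omega)\, d\omega.$$

This integral is traced back to the ambient occlusion. Replacing occluder radiance $L^r(\mathbf{o})$ by the average of surface radiance values in the neighborhood of the shaded point, $\tilde{L}^r$, we can express the irradiance as

$$I(\mathbf{s}) \approx \bigl(1 - W(\mathbf{s})\bigr)\, \tilde{L}^r.$$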
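The two fuzzy measures and the resulting shading formula can be written down in a few lines. The sketch below assumes the polynomial has the form $(D/R)^{\alpha}$ clamped to one at $D \ge R$, and that the reflected radiance is combined as albedo times (ambient times occlusion plus irradiance), with the irradiance approximated by $(1 - W)$ times the average neighborhood radiance. All function names are hypothetical.

```python
import math


def mu_exponential(d, sigma):
    """Fuzzy measure from the participating-media analogy: 1 - exp(-sigma * d)."""
    return 1.0 - math.exp(-sigma * d)


def mu_polynomial(d, radius, alpha):
    """Polynomial fuzzy measure that reaches one at the finite distance radius."""
    if d >= radius:
        return 1.0
    return (d / radius) ** alpha


def reflected_radiance(albedo, ambient, w, avg_radiance):
    """Combine ambient occlusion w with approximate nearby indirect lighting."""
    irradiance = (1.0 - w) * avg_radiance
    return albedo * (ambient * w + irradiance)
```

The exponential measure never quite reaches one, which is the drawback noted in the text, while the polynomial measure is exactly one for every occluder farther away than the neighborhood radius.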
18.3 Screen-Space Representation of the Scene

In global illumination computations, we need to know the geometry of other parts of the scene and the radiance values of points visible from the currently shaded point. However, GPUs are built according to the concept of local illumination and prefer shading each point independently of other parts of the scene. The only additional information that can be used is stored in textures.

Screen-space techniques assume that the height field defined by the current content of the depth buffer is an appropriate representation of the scene geometry, and the color buffer stores the radiance values of the represented points. Of course, the content of these buffers represents only the surfaces visible from the camera, but for local methods like ambient occlusion, this restriction is usually acceptable. In screen space, viewing rays are parallel to the z-axis, which greatly simplifies calculations. However, the transformation to this space is not angle and distance preserving (it is not even affine); thus, this space is not appropriate for angle and distance computation.
If we need to compute angles and distances, we should rather work in camera space, which means that we store camera-space z values and also the camera-space normal vectors of the visible points in textures. The transformation from world space to camera space is angle and distance preserving since it is basically a translation and a rotation; thus, these spaces are equivalent when distances and angles are computed. The disadvantage of camera space is that in the case of a perspective camera, the viewing rays are not parallel, but rather intersect each other at the origin of the coordinate system.

Thus, to solve the distortion problem of screen space and also keep the advantages of parallel rays, we work in camera space but use a quasi-orthogonal approximation. When large-scale information is obtained, we follow the structure of camera space. However, when smaller neighborhoods are explored, which happens during the evaluation of the ambient occlusion integral, we assume that in each small neighborhood, the viewing rays are parallel with the z-axis. This is an approximation, but it is a reasonable compromise between accuracy and simplicity.

Indirect illumination computation requires those points that are visible from the shaded point, which can usually be obtained with ray tracing. Unfortunately, ray tracing is quite expensive computationally even for height fields, so we replace it by a simple test. As the shaded point belongs to the set of points that are visible from the camera, we require that the occluder point also be visible from the camera. Two points are visible from each other if the ray originating at one of the points has no intersection with any surfaces before it arrives at the other point. The requirement of being in the visible part of the height field, on the other hand, means that the ray intersects the surface zero, two, four, etc., times. If the neighborhood is small, and at most one intersection is possible, then the two criteria are similar.
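18.4 Volumetric Ambient Occlusion

In order to find an efficient method for the evaluation of the ambient occlusion integral, we express it as a three-dimensional (i.e., volumetric) integral. First, the fuzzy measure is written as the integral of its derivative:

$$\mu(D) = \int_0^{D} \frac{d\mu(r)}{dr}\, dr = \int_0^{R} \frac{d\mu(r)}{dr}\, \epsilon(D - r)\, dr,$$

where $\epsilon(x)$ is the step function, which is one if $x \ge 0$ and zero otherwise.

Substituting this integral into the ambient occlusion formula, we get

$$W(\mathbf{s}) = \frac{1}{\pi} \int_{\Omega} \int_0^{R} \frac{d\mu(r)}{dr}\, \epsilon(D(\omega) - r)\,(\mathbf{N}_s \cdot \omega)\, dr\, d\omega.$$

Realizing that $r^2\, dr\, d\omega = dV$ is a differential volume, the ambient occlusion can also be expressed as a volumetric integral

$$W(\mathbf{s}) = \frac{1}{\pi} \int_{S} \frac{d\mu(r)}{dr}\, \frac{\mathbf{N}_s \cdot \omega}{r^2}\, dV,$$

where $S$ contains those points of the solid hemisphere that are visible from the shaded point, and therefore also visible from the camera.

In our case, the occluder surface is a height field defined by the content of the depth buffer. Thus a point $(x, y, z)$ belongs to the visible region if its z-coordinate is less than the value $z^*$ stored in the depth buffer for the same $(x, y)$ coordinates.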
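We compute the volumetric integral with differential elements having $dz$ height and $dA = dx\,dy$ base area at the point $(x, y)$ of the disk $C$ of radius $R$ and perpendicular to the z-axis (see Figure 18.2) as

$$W(\mathbf{s}) = \frac{1}{\pi} \int_{C} h(x, y)\, dA,$$

where $h$ is the integral over $z$ given by

$$h(x, y) = \int_{z_{\min}}^{z_{\max}} \frac{d\mu(r)}{dr}\, \frac{\mathbf{N}_s \cdot \omega}{r^2}\, dz.$$

Figure 18.2: Evaluation of the volumetric integral in the visible part of the neighborhood sphere.

Recall that we are free to set the exponent of the fuzzy membership function. Classical ambient occlusion used a non-fuzzy measure, which corresponds to $\alpha = \infty$. Méndez et al. [4] examined several exponents and concluded that $\alpha = 1/2$ is a good choice. Our criterion for setting the exponent will be the ease of the evaluation of the ambient occlusion integral. This integral can be evaluated analytically if we define the fuzzy membership function as $\mu(r) = (r/R)^{\alpha}$ with $\alpha = 4$. The center of the sphere is the shaded point whose coordinates are denoted by $(x_s, y_s, z_s)$. In this case, $d\mu/dr = 4r^3/R^4$ and $\omega = (x - x_s,\ y - y_s,\ z - z_s)/r$, so the integrand becomes linear in $z$ and

$$h(x, y) = \frac{4}{R^4}\,(z_{\max} - z_{\min}) \left[ N_x (x - x_s) + N_y (y - y_s) + N_z \left( \frac{z_{\max} + z_{\min}}{2} - z_s \right) \right].$$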
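The analytic evaluation for $\alpha = 4$ can be checked numerically. The sketch below assumes the integrand is $(d\mu/dr)\,(\mathbf{N} \cdot \omega)/r^2$ with $\mu(r) = (r/R)^4$, which reduces to $4\,\mathbf{N} \cdot (\mathbf{p} - \mathbf{s})/R^4$ and is therefore linear in $z$. The function names and the sample values in the usage lines are hypothetical.

```python
import math


def h_closed_form(dx, dy, z_min, z_max, z_s, n, radius):
    """Closed-form integral over z for the exponent alpha = 4.

    dx, dy are the offsets of the pipe from the shaded point,
    n is the unit normal (nx, ny, nz), z_s is the shaded point's depth.
    """
    nx, ny, nz = n
    mid = 0.5 * (z_max + z_min) - z_s
    return 4.0 / radius ** 4 * (z_max - z_min) * (nx * dx + ny * dy + nz * mid)


def h_numeric(dx, dy, z_min, z_max, z_s, n, radius, steps=2000):
    """Midpoint-rule integral of (dmu/dr) * (N . omega) / r^2 over z."""
    nx, ny, nz = n
    dz = (z_max - z_min) / steps
    total = 0.0
    for i in range(steps):
        z = z_min + (i + 0.5) * dz - z_s          # depth relative to s
        r = math.sqrt(dx * dx + dy * dy + z * z)  # distance from s
        dmu = 4.0 * r ** 3 / radius ** 4          # derivative of (r/R)^4
        cos_term = (nx * dx + ny * dy + nz * z) / r
        total += dmu * cos_term / (r * r) * dz
    return total


# Usage: a pipe offset by (0.3, 0.1) from a camera-facing point at depth 5.
d = math.sqrt(1.0 - 0.3 ** 2 - 0.1 ** 2)
closed = h_closed_form(0.3, 0.1, 5.0 - d, 5.0, 5.0, (0.0, 0.0, -1.0), 1.0)
numeric = h_numeric(0.3, 0.1, 5.0 - d, 5.0, 5.0, (0.0, 0.0, -1.0), 1.0)
```

Because the integrand is linear in $z$, the midpoint rule reproduces the closed form up to rounding, which is a convenient sanity check when changing the exponent.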
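The integral over the disk is evaluated with numerical quadrature. The total volume of the visible part of the hemisphere is approximated by the sum of the volume of pipes. The axes of these pipes are parallel to the z-axis. The pipes are inside the hemisphere and may be limited by the height field of the depth buffer. To define the pipes, we sample $n$ uniformly distributed points $(x_i, y_i)$ in the disk of radius $R$. Thus, each pipe has the same cross-section area $\Delta A = R^2 \pi / n$.

A line crossing the i-th sample point and being parallel with the z-axis enters the sphere at

$$z_i^{\text{entry}} = z_s - \sqrt{R^2 - (x_i - x_s)^2 - (y_i - y_s)^2},$$

exits it at

$$z_i^{\text{exit}} = z_s + \sqrt{R^2 - (x_i - x_s)^2 - (y_i - y_s)^2},$$

and crosses the tangent plane of the shaded point at

$$z_i^{\text{plane}} = z_s - \frac{N_x (x_i - x_s) + N_y (y_i - y_s)}{N_z}.$$

The points on this line belong to the visible region when their z-coordinates are less than $z_i^{\max} = \min\bigl(z_i^{*},\ z_i^{\text{exit}},\ \max(z_i^{\text{plane}}, z_i^{\text{entry}})\bigr)$ and are greater than $z_i^{\min} = z_i^{\text{entry}}$, with $z_i^{\max}$ clamped so that it is never smaller than $z_i^{\min}$. The contribution of the pipes to the volumetric integral of the ambient occlusion is

$$W(\mathbf{s}) \approx \frac{4}{n R^2} \sum_{i=1}^{n} (z_i^{\max} - z_i^{\min}) \left[ N_x (x_i - x_s) + N_y (y_i - y_s) + N_z \left( \frac{z_i^{\max} + z_i^{\min}}{2} - z_s \right) \right].$$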
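This quadrature is an approximation, and its error decreases if new sample points are added. However, computing the formula with many sample points reduces rendering speed. Thus, we consider two techniques, weighted uniform sampling and interleaved sampling, that reduce the computation error without performance degradation.

Weighted uniform sampling [7] exploits the fact that if there is no occlusion in the neighborhood sphere, then the ambient occlusion factor should be equal to one. If it is not, then the difference is due to the approximation error. So we compute not only the ambient occlusion from the samples but also the estimate of this factor assuming no occlusion at all. Ignoring occlusions, that is, replacing $z_i^{\max}$ by $\tilde{z}_i^{\max} = \min\bigl(z_i^{\text{exit}},\ \max(z_i^{\text{plane}}, z_i^{\text{entry}})\bigr)$, this factor is

$$W_0(\mathbf{s}) \approx \frac{4}{n R^2} \sum_{i=1}^{n} (\tilde{z}_i^{\max} - z_i^{\min}) \left[ N_x (x_i - x_s) + N_y (y_i - y_s) + N_z \left( \frac{\tilde{z}_i^{\max} + z_i^{\min}}{2} - z_s \right) \right].$$

Dividing the formula for ambient occlusion by this approximation, we can compensate for the quadrature error; thus, a better estimate of ambient occlusion is

$$W(\mathbf{s}) \approx \frac{\displaystyle\sum_{i=1}^{n} (z_i^{\max} - z_i^{\min}) \left[ N_x (x_i - x_s) + N_y (y_i - y_s) + N_z \left( \frac{z_i^{\max} + z_i^{\min}}{2} - z_s \right) \right]}{\displaystyle\sum_{i=1}^{n} (\tilde{z}_i^{\max} - z_i^{\min}) \left[ N_x (x_i - x_s) + N_y (y_i - y_s) + N_z \left( \frac{\tilde{z}_i^{\max} + z_i^{\min}}{2} - z_s \right) \right]}.$$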
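The ratio estimate can be sketched on the CPU as follows. The sketch mirrors the interval logic of the fragment shader in Listing 18.1 but omits its silhouette elimination, and it assumes the closed-form pipe integral for the exponent 4, in which the constant factor cancels between numerator and denominator. The names `pipe_term` and `ambient_occlusion` are hypothetical.

```python
import math


def pipe_term(dx, dy, z_lo, z_hi, z_s, n):
    """Pipe contribution without the constant factor, which cancels in the ratio."""
    nx, ny, nz = n
    return (z_hi - z_lo) * (nx * dx + ny * dy + nz * (0.5 * (z_hi + z_lo) - z_s))


def ambient_occlusion(samples, z_star, z_s, n, radius):
    """Weighted-uniform-sampling estimate of ambient occlusion.

    samples is a list of (dx, dy) offsets inside the disk of the given radius,
    z_star holds the depth-buffer value under each sample,
    z_s is the depth of the shaded point,
    n is the camera-space unit normal; its z component must be nonzero.
    """
    numer = 0.0
    denom = 0.0
    for (dx, dy), zs in zip(samples, z_star):
        d = math.sqrt(max(radius * radius - dx * dx - dy * dy, 0.0))
        z_entry, z_exit = z_s - d, z_s + d
        z_plane = z_s - (n[0] * dx + n[1] * dy) / n[2]
        # Upper limit when nothing occludes the pipe.
        z_open = min(max(z_plane, z_entry), z_exit)
        denom += pipe_term(dx, dy, z_entry, z_open, z_s, n)
        # Upper limit when the depth buffer cuts the pipe short.
        z_vis = max(min(z_open, zs), z_entry)
        numer += pipe_term(dx, dy, z_entry, z_vis, z_s, n)
    return numer / denom
```

With no occluder under any sample, the numerator and denominator coincide and the estimate is exactly one regardless of how few samples are used, which is the property weighted uniform sampling is meant to guarantee.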
Interleaved sampling [3] takes advantage of the estimates in neighboring pixels. The method discussed so far has some error in each pixel, depending on the particular samples used. If we took different random numbers in neighboring pixels, dot noise would be present. Using the same random numbers in every pixel would make the error correlated and replace dot noise with stripes. Unfortunately, both stripes and pixel noise are quite disturbing. Interleaved sampling uses different sets of samples in the pixels of a 4×4 pixel pattern. The errors in the pixels of a 4×4 pixel pattern are uncorrelated and can be successfully reduced by a low-pass filter of the same size. When implementing the low-pass filter, we also check whether the depth difference between the current and neighboring pixels exceeds a given limit. If it does, we do not include the neighboring pixels in the averaging operation.
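The depth-aware low-pass filter can be sketched as a small box filter. This is an assumption-laden illustration: the window placement (offsets -2 to +1 around the pixel), the plain nested-list buffers, and the name `filter_ao` are not taken from the text, which only states the filter size and the depth test.

```python
def filter_ao(ao, depth, x, y, max_depth_diff, size=4):
    """Average ambient occlusion over a size x size window around (x, y).

    Neighbors whose depth differs from the center pixel's depth by more
    than max_depth_diff are left out of the average.
    """
    height, width = len(ao), len(ao[0])
    center_depth = depth[y][x]
    total = 0.0
    count = 0
    for j in range(size):
        for i in range(size):
            px = x + i - size // 2
            py = y + j - size // 2
            if 0 <= px < width and 0 <= py < height:
                if abs(depth[py][px] - center_depth) <= max_depth_diff:
                    total += ao[py][px]
                    count += 1
    # The center pixel always passes the test, so count is at least one.
    return total / count
```

Skipping neighbors across a depth discontinuity keeps the filter from bleeding occlusion between a foreground object and the background behind it.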
18.5 Indirect Lighting of the Near Geometry

The second part of the reflected radiance depends on the average radiance values of nearby surface points. As we calculate the ambient occlusion with random points in the neighborhood of the shaded point, the color of the frame buffer at these random points can be used to obtain the average reflected radiance. By inspecting the camera-space normal, we can also check whether the surface is oriented toward the shaded point, and ignore it in the average otherwise.
18.6 Implementation

The discussed algorithm is implemented as a fragment shader run in a deferred shading pass, as shown in Listing 18.1. For the sake of simplicity, we omitted parts related to interleaved sampling. The program receives the fragment position in texture space (wPos) and in 2D clipping space (hPos) as interpolants. By fetching from the texture map containing normal vectors and depth values (depthMapSampler) with the texture-space position, we obtain the camera-space normal vector and the depth value. The 2D clipping-space coordinates are also transformed back to camera space using the camera-space depth, which results in the shaded point s.
Listing 18.1: This fragment shader implements the algorithm discussed in this gem.

    float4 psAO(float2 wPos : TEXCOORD0, float2 hPos : TEXCOORD1) : COLOR0
    {
        wPos += pixelsize * 0.5;    // Texture coordinates in [0,1]

        float3 N = tex2D(depthMapSampler, wPos).xyz;     // Camera-space normal
        float depth = tex2D(depthMapSampler, wPos).a;    // Camera-space depth

        // Compute camera-space position
        float4 pcamera = mul(float4(hPos, 0, 1), projMatrixInverse);
        pcamera.xyz /= pcamera.w;
        float3 s = pcamera.xyz * depth / pcamera.z;      // Shaded point

        float O = 0;        // Numerator of ambient occlusion
        float Denom = 0;    // Denominator of ambient occlusion
        float3 I = 0;       // Irradiance

        for (int sampleidx = 0; sampleidx < sampleCount; sampleidx++)
        {
            float2 sample = AO_RAND[sampleidx].xy * R;
            float3 o = s + float3(sample.x, sample.y, 0);    // Occluder

            pcamera = mul(float4(o, 1), projMatrix);
            float2 texCoord = pcamera.xy / pcamera.w;
            texCoord.y *= -1.0;
            texCoord = (texCoord + 1) / 2;
            float zstar = tex2D(depthMapSampler, texCoord).a;
            o.z = zstar;    // Occluder's depth

            float d = sqrt(R * R - dot(sample.xy, sample.xy));
            float zmin = s.z - d;
            float zexit = s.z + d;
            float zplane = s.z - dot(o.xy - s.xy, N.xy) / N.z;
            zplane = max(zplane, zmin);
            zexit = min(zplane, zexit);
            float zmax = zexit;

            Denom += (zmax - zmin) * (dot(o.xy - s.xy, N.xy) + (zmax + zmin - 2 * s.z) / 2 * N.z);

            if (zstar < zmin - R) zstar = zexit;    // Silhouette elimination
            zmax = min(zexit, zstar);
            zmax = max(zmax, zmin);

            O += (zmax - zmin) * (dot(o.xy - s.xy, N.xy) + (zmax + zmin - 2 * s.z) / 2 * N.z);

            if (zmax < zexit)    // Occlusion happened?
            {
                float3 No = tex2D(depthMapSampler, texCoord).xyz;
                if (dot(s - o, No) > 0) I += tex2D(colorMapSampler, texCoord).rgb;
            }
        }

        O /= Denom;
        I *= (1.0 - O) / sampleCount;
        return float4(I, O);
    }
Then the numerator and denominator of the ambient occlusion formula and the irradiance are computed in variables O, Denom, and I in the loop executed sampleCount times. A single occluder sample o is generated from prepared 2D points uniformly distributed in the unit disk (AO_RAND[k]). The depth map is fetched using the direction of the occluder, which results in occluder depth zstar. This depth is compared to the entry depth of the sphere zmin, exit depth zexit, and that of the intersection with the tangent plane zplane, determining the zmin-zmax interval where the function h is evaluated. In parallel, another integral is computed in Denom that describes the unoccluded case. This integral will be used for error compensation. Note that we also check whether the occluder is much closer to the eye than the shaded point and ignore such occlusions, which would otherwise result in false silhouette edges. If near occlusion happens and, according to the occluder's surface orientation, it can illuminate the shaded point, then the occluder's color is inserted into the average color used for indirect illumination. This fragment shader returns with the average indirect illumination and the ambient occlusion of the fragment, which is then composited with the previously calculated direct illumination result.

As the compiler unrolls loops that contain tex2D calls, we may run out of registers when the sample number is high (it is greater than 12 in the specified hardware). If more samples are needed, the tex2D calls should be replaced by tex2Dlod, which does not force loop unrolling. Surprisingly, this replacement does not degrade the performance.
18.7 Results

The proposed methods have been implemented in a DirectX/HLSL environment, and their performance has been measured on an Nvidia GeForce 8800 GTX GPU at 800×600 resolution. The rendering results of the harbor scene are shown in Figure 18.3. This scene can be rendered at 170 FPS if only directional light is computed. Taking 16 samples per pixel, the ambient occlusion and indirect lighting computation runs over 100 FPS. With 32 samples per pixel, the rendering speed drops to 60 FPS.

Figure 18.3: (See also Color Plates.) Rendering results of the harbor scene: (first column) direct lighting, (second column) direct lighting plus ambient occlusion, and (third column) direct lighting plus ambient occlusion and indirect lighting.
References

[1] H. Landis. "Production-ready global illumination". SIGGRAPH Course notes 16, 2002. http://www.debevec.org/HDRI2004/landis-S2002-course16-prodreadyGI.pdf
[2] A. Iones, A. Krupkin, M. Sbert, and S. Zhukov. "Fast realistic lighting for video games". IEEE Computer Graphics and Applications, Volume 23, Number 3 (May 2003), pp. 54–64.
[3] A. Keller and W. Heidrich. "Interleaved sampling". Rendering Techniques 2001 (Proceedings of the 12th Eurographics Workshop on Rendering), 2001, pp. 269–276.
[4] A. Méndez, M. Sbert, and J. Catá. "Real-time obscurances with color bleeding". SCCG '03: Proceedings of the 19th Spring Conference on Computer Graphics, ACM, 2003, pp. 171–176.
[5] M. Mittring. "Finding next gen: CryEngine 2". Advanced Real-Time Rendering in 3D Graphics and Games course, SIGGRAPH 2007, pp. 97–121.
[6] M. Pharr and S. Green. "Ambient occlusion". GPU Gems, Addison-Wesley, 2004.
[7] M. Powell and J. Swann. "Weighted Uniform Sampling: a Monte-Carlo Technique for Reducing Variance". Applied Mathematics, Volume 2, Number 3 (September 1966), pp. 228–236.
[8] T. Ritschel, T. Grosch, and H.-P. Seidel. "Approximating Dynamic Global Illumination in Image Space". Proceedings ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D), 2009, pp. 75–82.
[9] P. Shanmugam and O. Arikan. "Hardware accelerated ambient occlusion techniques on GPUs". Proceedings of the 2007 Symposium on Interactive 3D Graphics, 2007, pp. 73–80.
[10] T. Umenhoffer, B. Tóth, and L. Szirmay-Kalos. "Efficient Methods for Ambient Lighting". Spring Conference on Computer Graphics, 2009, pp. 99–106.
[11] S. Zhukov, A. Iones, and G. Kronin. "An ambient light illumination model". Proceedings of the Eurographics Rendering Workshop, 1998, pp. 45–56.
Chapter 19 - Real-Time Character Dismemberment. Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011

Chapter 19: Real-Time Character Dismemberment

Aurelio Reis
id Software

Overview

Modern games utilize a number of tricks to convey a realistic and intriguing world. In only a few years, realistic physics and destructible environments have garnered widespread industry adoption, yet despite this, few games attempt to model heavy damage on game characters. One of the main reasons for this is the complexity of decomposing the topology of a 3D mesh dynamically with real-time performance.

In this gem, we introduce an efficient general-use technique for character dismemberment that can be easily incorporated into an existing skeletal animation system. This implementation is perfectly suitable for games, real-time applications like military simulations, and other "serious games" where the accuracy of the damage modeling does not need to be precise (as in medical simulations).

While many games don't have subject matter appropriate for dismemberment, when presented in a cartoonish way it's possible to approach gore and dismemberment such that it is relevant to the gameplay experience as opposed to being used purely for shock value or as a gimmick. Movies such as Evil Dead 2 and Kill Bill have approached gore in a comedic way that rivets audiences. As zombie and monster games gain popularity, it's likely that they will have the most to gain from character dismemberment.
19.1 What is Character Damage Modeling?

A character in this context is roughly defined as any animated figure in the shape of a creature using a bipedal skeleton and keyframed animations for its motion. In the game world, a character is represented as an entity that is capable of moving around using some scripted or autonomous behaviors with animations that coincide with this motion. As the character moves throughout the world, it may come in contact with a number of interactive elements that can affect the character in one way or another. In a first-person shooter, for instance, a character may move about, be influenced in some way by the environment (like when riding an elevator), and may shoot or be shot by other characters (player or non-player).

When a character is killed, the results are usually presented using some kind of pre-canned animation sequence with special effects like blood and gore strewn about. In most modern games, it is common practice to use "rag doll" physics once the character is completely dead to add an additional sense of realism. In addition, some games go so far as to create bloody chunks and body parts, so-called "gibs" (for giblets). Combined, these elements result in character damage modeling that makes the game world more believable to the player. Games, after all, are about consequences, and realistic character deaths add to that realism.

While most games are able to get away with this degree of damage modeling, some games require more precise detail. Additional elements like projected decals and character scarring add a lot, but the most dramatic change would be in the shape of the character itself. The effects of severe blunt force or impact trauma on a humanoid are enough to dismember most major limbs with ease, and it is this particular subset of damage modeling on which this gem focuses.
19.2 Methods of Mutilation

There are a number of ways to approach character dismemberment. The easiest and most straightforward solution would be to explicitly model the character as an articulated collection of body parts. Whenever a limb's bounding volume is hit, everything below that body part in the hierarchy can be manually detached. This technique, while incredibly simple, can be difficult to tune because model seams become prominent at the body part boundaries. Aesthetically, the quality is quite poor, although it can be hidden by clever modeling tricks (e.g., placing a gorget at the character's neck seam). In addition, this technique would require a large number of body parts to work well, which means additional draw calls that can result in reduced performance.

The extreme opposite of this would be to use computational geometry. Constructive solid geometry (CSG) boolean operations can be used to remove chunks of a character by dynamically splitting and removing polygons from the mesh. The uniform character mesh is a great advantage, but unfortunately this technique requires heavy CPU processing both in pre-transforming the animated mesh to the proper pose and in the required geometric operations. This technique can also result in a highly triangulated mesh with an unpredictable number of polygons that can become both a memory and performance concern.

Another possible method would be similar to the one used in the Soldier of Fortune series of games made by Raven Software. In their approach, they used what they called "gore zones" to represent up to 26 areas that can be toggled off on a given animated biped. The character models were created in such a way that every gore zone was completely capped, sealed, and internally textured. This allowed them the flexibility to represent internal cavities such as the skull, where the brain could become detached due to heavy trauma. The obvious downsides are a heavily triangulated model and a draw call for each gore zone, that is, 26 draw calls per character model. This can be avoided by consolidating gore zone geometry into a massive draw batch, but since this requires touching GPU memory, it can be expensive.

It's also possible to remove limbs by using geometry morphing. In this method, a blend shape is used to move the vertices of the limb to a desired final position, i.e., a stump or cap. This technique is easily hardware accelerated, although older hardware is only able to handle a few blend shapes. Because of this, dismembering multiple limbs at the same time requires modifying a vertex buffer where a morph scale is specified. In order to determine whether to apply the vertex position deltas to get the dismembered geometry location, this scale is modified at run time. Like the previous technique, this method suffers from having to modify a buffer in GPU memory.

Ultimately, each of these techniques has its own set of potential benefits and pitfalls. Ideally, what we want is a system that works with a uniform mesh because of the inherent performance benefits of a single draw call. In addition, our solution should be able to take advantage of commodity graphics hardware like that in the current generation of game consoles. This system also needs to be flexible enough to allow for any number of user-defined body parts (within reason) to be severed, generating a new and separate object that is able to coexist with the original model. Finally, it should require as little artist intervention as possible, allowing for arbitrary break points based on user-defined data that is easily modifiable.
19.3 Bone Matrix Flattening
The solution presented in this gem works like so. Given a 3D mesh and a matching skeletal hierarchy, a user-defined damage zone is created for every breakable body part. Each damage zone contains a bounding volume that defines its area, the joint to which it is attached, the surface area it encompasses (more on this later), and the list of joints below the main joint (see Listing 19.1). The bounding volume should be as tight-fitting as possible. An oriented bounding box works quite well since most limbs on a human biped are longer than they are wide, although a sphere may also be sufficient (depending on the underlying geometry). Figure 19.1 shows an example of a damage zone configuration.
Figure 19.1: The limb damage zones.
Listing 19.1: The damage zone class definition.
    class CArDamageZone : public CArHitBox
    {
    public:

        // The surface (vertex and index range) covered by this damage zone.
        CArDamageSurface m_Surface;

        // The next damage zone below this one in the skeletal hierarchy.
        CArDamageZone *m_Next;

        // The joints below this damage zone's joint.
        vector<int> ChildJointList;

        // The largest joint index in the limb hierarchy.
        int m_JointRange;

        CArDamageZone() : m_Next(NULL), m_JointRange(INVALID_JOINT) {}
        ~CArDamageZone() {}

        void GatherChildJoints(const SMD5Skeleton& Skel);
    };
When a hit is registered on a body part's damage zone, every child joint below the joint to which it is attached is traversed, and each one of these joints is moved to the position of the main joint. In addition to this, each matrix for the child joints is flattened, that is, scaled down to zero to form a degenerate matrix. This effectively creates a stump as the vertices reduce to a singular point.
The process for creating the dismembered piece is very similar. The same damage zone joint is now used as the origin of all the joints above it in the hierarchy, and their transforms are also flattened as in the previous step. The detached limb can now function as a separate entity where it can come under control of the physics system, i.e., go into ragdoll mode.
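The flattening step can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the demo's code: the types Vec3 and JointMatrix and the functions MakeFlattened, TransformPoint, and FlattenChildJoints are hypothetical, and the skinning matrices are assumed to be 3×4 row-major affine transforms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal types; the demo's real classes are not reproduced here.
struct Vec3 { float x, y, z; };

// Row-major 3x4 affine transform: m[r][0..2] is rotation/scale, m[r][3] is translation.
struct JointMatrix { float m[3][4]; };

// Builds a degenerate matrix (zero scale) that maps every point to 'origin'.
JointMatrix MakeFlattened(const Vec3& origin)
{
    JointMatrix out = {};   // all zeros, so the rotation/scale block vanishes
    out.m[0][3] = origin.x;
    out.m[1][3] = origin.y;
    out.m[2][3] = origin.z;
    return out;
}

Vec3 TransformPoint(const JointMatrix& j, const Vec3& p)
{
    return { j.m[0][0] * p.x + j.m[0][1] * p.y + j.m[0][2] * p.z + j.m[0][3],
             j.m[1][0] * p.x + j.m[1][1] * p.y + j.m[1][2] * p.z + j.m[1][3],
             j.m[2][0] * p.x + j.m[2][1] * p.y + j.m[2][2] * p.z + j.m[2][3] };
}

// Collapses every child joint of a damage zone onto the main joint's position.
void FlattenChildJoints(std::vector<JointMatrix>& skinning,
                        const Vec3& mainJointPosition,
                        const std::vector<int>& childJoints)
{
    for (int joint : childJoints)
    {
        assert(joint >= 0 && static_cast<std::size_t>(joint) < skinning.size());
        skinning[joint] = MakeFlattened(mainJointPosition);
    }
}
```

Every vertex weighted to a flattened joint then lands on the main joint's position, which is what produces the stump. The detached limb would use the same routine with the complementary joint list.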
19.4 Improvements
While this technique's results can be utilized immediately, there are a number of things that can be improved. First, while the end caps formed by the matrix flattening don't look that bad, they don't really convey any kind of localized damage. To remedy this, it's possible to use "gore caps" and "blood flowers". These artist-created models can be attached to the end of a damage zone joint to create the appearance of bone shards or bloody entrails. Since they are defined per damage zone, it is possible to provide a unique context-specific gore cap for any given break point. It's also possible to detect whether a matrix has been flattened within a shader and react accordingly, possibly blending between a damage texture created with procedural texture coordinates or fading out the polygons completely to show interior surfaces. In coordination with particle systems for effects like blood, a very rich visual presentation can be created.
Figure 19.2: A limb is removed by transforming the vertices associated with its joints by a skinning matrix that makes those vertices degenerate. The detached limb is formed by performing the process in reverse.
Another area of improvement has to do with how detached limbs are rendered. While degenerate triangles that have no pixels generated save on fill rate, the vertices still have been processed. This has the potential to waste quite a bit of vertex throughput on the GPU. To fix this, we can scan all the vertices in the model and store the ones that belong to the child joints of a particular damage zone as a continuous "damage surface". The generated surfaces can then be recombined to form a new mesh that has its vertices prearranged by limb order. When it comes time to render a detached limb, only the range of vertices and triangle indices defined for a given damage zone need to be rendered. The different color-coded damage zone surfaces are shown in Figure 19.3.
Figure 19.3: (See also Color Plates.) The color-coded damage zone surface groups.
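One way to prearrange vertices by limb order is a counting sort keyed on the damage zone of each vertex. The sketch below is an assumption about how such a pass might look, not the logic of GenerateDamageSurfaces(): SurfaceRange and BuildDamageSurfaces are hypothetical names, each vertex is assumed to belong to exactly one zone, and triangle indices would still have to be rewritten through the returned remap table.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Contiguous vertex range occupied by one damage surface (hypothetical type).
struct SurfaceRange
{
    std::size_t first = 0;
    std::size_t count = 0;
};

// zoneOfVertex[i] is the damage zone index of vertex i, in [0, zoneCount).
// On return, remap[oldIndex] is the vertex's new index in the limb-ordered mesh.
std::vector<SurfaceRange> BuildDamageSurfaces(const std::vector<int>& zoneOfVertex,
                                              int zoneCount,
                                              std::vector<std::size_t>& remap)
{
    assert(zoneCount > 0);
    std::vector<SurfaceRange> ranges(static_cast<std::size_t>(zoneCount));

    // Pass 1: count the vertices in each zone.
    for (int zone : zoneOfVertex)
    {
        assert(zone >= 0 && zone < zoneCount);
        ++ranges[zone].count;
    }

    // Pass 2: turn the counts into starting offsets.
    std::size_t offset = 0;
    std::vector<std::size_t> cursor(ranges.size());
    for (std::size_t z = 0; z < ranges.size(); ++z)
    {
        ranges[z].first = offset;
        cursor[z] = offset;
        offset += ranges[z].count;
    }

    // Pass 3: assign each vertex the next free slot inside its zone's range.
    remap.resize(zoneOfVertex.size());
    for (std::size_t i = 0; i < zoneOfVertex.size(); ++i)
        remap[i] = cursor[zoneOfVertex[i]]++;

    return ranges;
}
```

Rendering a detached limb then only needs the first and count values of its zone.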
It's possible to skip this decomposition step by prearranging all the model geometry to be contained within a mesh for each damage surface. In this way, the only processing needed would be to recombine the individual meshes into one uniform mesh in order to gain the benefits of a single render batch.
While this technique works well for drawing isolated triangles for detached limbs, it does not work for the base character model, as any number of possible limbs can be broken off. This means it's impractical to create vertex group configurations that could accommodate rendering of only specific subsets of the model without adding additional draw batches. Also, the limb break points don't need to lie at the origin of a damage zone joint as was described earlier. By means of projecting the actual hit location along a line connecting the damage zone joint and its immediate child, it is possible to create a break that is more precise. Another enhancement worth exploring would be adding mirrored joints in the model's skeleton to allow rendering of the main character body and the dismembered limb in the same draw call.
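The break-point projection mentioned above amounts to projecting a point onto a line segment. The following is a minimal sketch under our own assumptions: Vec3 and ProjectHitOntoBone are hypothetical names, and clamping the result to the bone segment is an assumed policy that the text does not prescribe.

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { float x, y, z; };   // hypothetical minimal vector type

static Vec3 Sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Projects 'hit' onto the segment from 'joint' to 'child' and returns the
// closest point on that segment, which serves as the break point.
Vec3 ProjectHitOntoBone(const Vec3& joint, const Vec3& child, const Vec3& hit)
{
    const Vec3 bone = Sub(child, joint);
    const float lengthSquared = Dot(bone, bone);
    if (lengthSquared <= 0.0f)
        return joint;   // degenerate bone: fall back to the joint origin

    // Parametric position of the projection along the bone, clamped to [0, 1].
    float t = Dot(Sub(hit, joint), bone) / lengthSquared;
    t = std::min(std::max(t, 0.0f), 1.0f);

    return { joint.x + bone.x * t, joint.y + bone.y * t, joint.z + bone.z * t };
}
```

The returned point can replace the joint origin as the position onto which the child joints are flattened.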
Lastly, while the technique was presented here in the context of a bipedal creature, it can be used for any object with discrete "limbs", such as trees (branches), light poles, or even non-bipedal creatures.
19.5 Demo
A demo of the technique described in this gem can be found on the accompanying CD. The MD5 format was chosen for the model and animations since they cover the bases on most skeletal animation needs. The core logic for the damage zone surface generation can be found in the CArBaseModel::GenerateDamageSurfaces() function in Model.cpp. The logic and definition of the damage zone is in BoundingVolume.h/.cpp. The logic that calculates the animated skeleton along with the flattened joints for the limb and body is in the CArAnimator::Update() function in Animator.cpp.
Chapter 20 - A Deferred Decal Rendering Technique. Game Engine Gems, Volume One, by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Chapter 20: A Deferred Decal Rendering Technique
Jan Krassnigg, University of Aachen, Germany
Overview
Rendering decals is a common method used to apply more detail to 3D worlds dynamically. Decals are often used to add bullet holes, blood stains, tire marks, and similar items to a world as events occur in a game, but they can also be used by level designers to enrich the environment with wear and tear textures, dirt textures, signs on walls, etc.
This gem presents a decal rendering technique that uses deferred shading to produce scenes like the one shown in Figure 20.1. The technique is entirely shader based, extremely lightweight on the CPU, does not need to dynamically generate triangles, and is a straightforward addition to most 3D engines.
Figure 20.1: Decals applied to complex geometry.
20.1 The Problem
There are several things that a good decal system should solve:
1. The system should integrate well with lighting. Decals should not only change the color, but also change the normal, specular factors, and any other surface parameters such that they become indistinguishable from all other geometry.
2. Decals should work convincingly on all surfaces, static and dynamic.
3. Decals need to clip properly to geometry boundaries, and possibly even wrap around corners.
4. The system needs to work with geometry that might be very different from the geometry used for collision detection.
Item 1 in this list is easily solvable when the graphics engine includes deferred rendering capabilities [2]. For forward renderers, the system in this gem can still be used, but some modifications will be necessary. For a thorough explanation of deferred shading, please refer to [3, 4, 5, 6].
Item 2 can be solved if we can get our hands on a free 8-bit channel in the G-buffer. If not, we can at least make it work on static geometry. This is discussed in Section 20.5.
Item 3 is where the real problems begin. The decal needs to follow the surface to which it is applied, even if that surface is highly tessellated. Some existing decal rendering techniques generate a triangle representation for a decal on such a surface [1], but this can be computationally expensive. Furthermore, the raw mesh is often not available at all on the CPU since the mesh data is loaded into a vertex buffer accessible only by the GPU at a reasonable speed.
Item 4 is an issue because today's games often use very complex meshes for rendering, but a less detailed mesh might be used for collision detection. The difficulty is that the point of intersection that a ray cast returns might differ quite a bit from the location where the decal must be rendered.
20.2 The General Idea
The basic solution afforded by our technique is to project a decal onto a surface using a special fragment shader applied to the decal's bounding volume instead of rendering new decal polygons on top of the scene geometry. The only information required to achieve this consists of the position and orientation of the decal, the decal's size, and a texture containing depth values for the viewport being rendered. This information allows all computation to be done in the vertex and fragment shaders.
When rendering a decal, it is natural for us to work in "decal space", where the x- and y-axes lie in the tangent plane to the underlying surface at the decal's center, and the z-axis is parallel to the surface's normal vector at the decal's center. In order to move data into the decal's local coordinate system, the shader needs to be supplied with the inverse of the decal's transformation matrix.
The code in Listing 20.1 first fetches the depth from the G-buffer at the fragment position and uses it to reconstruct the 3D world-space position of the fragment. The worldToDecal constant is the inverse of the transformation matrix for the decal that we are currently rendering. With this matrix, we can transform the fragment's position into the local space of the decal. This local position can then be used to compute texture coordinates at which the decal texture is sampled. In this first version, we are using the (x, y) position only, which simply projects the decal along its local z-axis onto the underlying geometry, as shown in Figure 20.2.
Figure 20.2: Using a simple projection onto the local x-y plane causes decals to be smeared in the direction of the local z-axis.
Listing 20.1: This code determines the decal-space coordinates for the fragment being rendered and transforms them into texture coordinates for the decal. The RT_Depth texture contains depth values for the viewport, the worldToDecal constant is the inverse of the decal's 4×4 matrix transform from decal space to world space, and the recipDecalSize constant is 1/s, where s is the size of the decal in the scene.
    // Sample the depth at the current fragment
    float pixelDepth = texture2DRect(RT_Depth, gl_FragCoord.xy).x;

    // Compute the fragment's world-space position
    vec3 worldPos = ComputeWorldSpacePosition(gl_FragCoord.xy, pixelDepth);

    // Transform into decal space; worldToDecal is a 4x4 matrix, so the
    // position is extended to a homogeneous point (w = 1) first
    vec3 decalPos = (worldToDecal * vec4(worldPos, 1.0)).xyz;

    // Use the xy position for the texture lookup
    vec2 texcoord = decalPos.xy * recipDecalSize * 0.5 + 0.5;

    // Fetch the decal texture color
    gl_FragColor = texture2D(diffuseDecalTexture, texcoord);
20.3 Geometry Rendering
Now that we have some code that calculates basic texture coordinates, we must consider what kind of geometry we actually need to render. Each fragment rendered with the decal shader acts like a "window" through which we can possibly see our decal. So we could just render a surface-aligned quad corresponding to the size and position of the decal. However, we want our decal to have depth, so we instead render a bounding cube as shown in Figure 20.3. This allows us to see the decal from any direction, and it will allow us to add a wrap-around feature later on.
Figure 20.3: A bounding cube centered on a decal can be rendered to ensure that we capture the decal's influence from any viewpoint.
As an optimization for scenes containing a large number of decals, it can be advantageous to render decals through hardware instancing [7]. Therefore, it is a good idea to simply render a unit cube and transform it to the correct size and position in the vertex shader, as demonstrated in Listing 20.2. Note that we can easily extend the uniform constants decalSize and decalToWorld to arrays and use gl_InstanceID to render a batch of decals through instancing.
Listing 20.2: This vertex shader scales a unit cube to the actual size of the decal and translates it to the decal's world-space position.
    uniform float decalSize;
    uniform mat4 decalToWorld;

    // Scale the unit cube and position it in world space
    vec4 worldPos = decalToWorld * vec4(gl_Vertex.xyz * decalSize, gl_Vertex.w);

    // Output the position in homogeneous clip space
    gl_Position = gl_ModelViewProjectionMatrix * worldPos;
For small decals, this technique works very well. However, when a cube is rendered with backface culling enabled and the depth test set to GL_LESS, the decal disappears the moment the camera enters the cube. One solution to this problem is to cull front faces and set the depth test to GL_GREATER. This way, only fragments whose world positions are actually behind surfaces are rendered, but it causes many fragments to be rendered unnecessarily when they are occluded by geometry closer to the camera.
Another solution is to find the corner of the cube that is closest to the camera and render a camera-aligned quad at that depth that is large enough to enclose the entire cube. If the closest corner is in front of the near plane, a full-screen quad should be rendered at the near plane instead.
20.4 Fade Out and Wrap-Around
We have a basic vertex shader and fragment shader in place, but our projection is infinite along the decal's z-axis, producing the smearing shown in Figure 20.2. There are two methods we can use to fix this problem. The simpler method is to use the distance from the fragment's position to the decal plane as a fade-out parameter. This distance is already available in decalPos.z, and we just have to scale it to the size of the decal as demonstrated in Listing 20.3.
Listing 20.3: The absolute value of the decal-space z-coordinate of the fragment position is scaled to the size of the decal and used as a fade-out parameter for the decal color. As before, the recipDecalSize constant is 1/s, where s is the size of the decal in the scene.
    // Compute the distance of the fragment to the decal's plane
    float distance = abs(decalPos.z);

    // Scale the distance into the [0,1] range according to the size of
    // the decal (min, not max, so that the value is clamped to 1.0)
    float scaledDistance = min(distance * recipDecalSize * 2.0, 1.0);

    // Somehow use that scaled distance to fade out
    // Here: simple linear fade-out
    float fadeOut = 1.0 - scaledDistance;

    vec4 diffuseColor = texture2D(diffuseDecalTexture, texcoord);

    gl_FragColor = vec4(diffuseColor.rgb, diffuseColor.a * fadeOut);
This method is useful when a decal is supposed to be flat without wrapping around geometry. The exact formula used to fade out a decal can be varied to produce the best look for different decals, and it is advisable to make this configurable within the engine.
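To experiment with different fade curves before committing one to a shader, the fade computation can be mirrored on the CPU. The C++ sketch below is illustrative only: ScaledDecalDistance, LinearFade, and SmoothFade are hypothetical names, the scaled distance is clamped to [0, 1] as the comment in Listing 20.3 describes, and the smoothstep-style curve is one assumed alternative to the linear fade.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scales the decal-space z distance by the decal size and clamps it to [0, 1].
float ScaledDecalDistance(float decalPosZ, float decalSize)
{
    assert(decalSize > 0.0f);
    const float recipDecalSize = 1.0f / decalSize;
    return std::min(std::fabs(decalPosZ) * recipDecalSize * 2.0f, 1.0f);
}

// Linear fade: fully opaque on the decal plane, fully transparent at distance 1.
float LinearFade(float scaledDistance)
{
    return 1.0f - scaledDistance;
}

// Smoothstep-style fade: same end points, but eases in and out.
float SmoothFade(float scaledDistance)
{
    const float t = 1.0f - scaledDistance;
    return t * t * (3.0f - 2.0f * t);
}
```

Either function could be selected per decal type through an engine setting and then translated back into the fragment shader.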
A more interesting method for handling the smearing problem is to make a decal wrap around corners and follow the curvature of complex surfaces. This is especially useful for blood stains and other liquids that splatter because it is much more convincing if such a decal covers an entire surface independently of its curvature.
This is quite easy to achieve. All we need is the surface normal at the position of each fragment in the decal. If we rotate that normal into decal space, its (x, y) components give us the gradient of the surface relative to the decal plane. We can use this gradient and the fragment's distance to the decal plane to modify the texture coordinates. In areas with no relative slope, the texture lookup remains unchanged, but in areas with a large slope (for example, at corners), the texture coordinates move outward according to the distance to the decal plane. This technique is illustrated in Listing 20.4, and the result is shown in Figure 20.4.
Figure 20.4: The decals wrap around corners based on the surface normals, and they fade out based on the distance from the decal plane.
Listing 20.4: This code demonstrates how the normal of the underlying surface can be used to adjust texture coordinates in such a way that a decal wraps around curves and corners. The RT_Normal texture contains normal vectors for the viewport encoded in the RGB channels.
    // Get the world-space normal at the fragment position
    vec3 worldNormal = texture2DRect(RT_Normal, gl_FragCoord.xy).xyz;

    // Rotate it into the local space of the decal
    vec3 decalNormal = (worldToDecal * vec4(worldNormal.xyz, 0.0)).xyz;

    vec2 texcoord = decalPos.xy;

    // Use the distance and gradient to modify the texture lookup
    texcoord -= decalPos.z * decalNormal.xy;

    // Scale and center the texture coordinates
    texcoord += vec2(decalSize);
    texcoord *= recipDecalSize * 0.5;

    gl_FragColor = texture2D(diffuseDecalTexture, texcoord);
For the finishing touch, we add color fading that depends on the distance to the decal plane. We also need to account for the angle between the decal plane's normal and the underlying surface normal, since otherwise, decals appear on the back side of thin walls. A complete example shader with more details can be found on the accompanying CD.
20.5 Surface Clipping
The technique we have described works by applying decals in viewport space and does not differentiate among surfaces onto which it is projected. By using the projection and wrap-around method, it is possible to attach decals to any kind of surface, even to animated characters. However, a decal is never clipped, meaning that a decal attached to a box also projects onto the ground on which the box is resting. If the box moves and carries the decal along with it, then the projection on the ground moves as well.
There are two possible solutions to this problem. One solution is to restrict decal rendering to pixels covered by static geometry only. In this case, we need to render all static geometry first, then apply decals, and only afterward, render dynamic objects.
The second solution requires that an additional channel in the G-buffer be used to hold a "decal ID" for every distinct object that is rendered. In the rendering pass that fills the G-buffer, we write each object's decal ID along with the diffuse color, normal, etc., so that we have a per-pixel mask identifying to which object each pixel belongs. When a decal is applied to an object, we look up the object's ID and associate it with the decal. When the decal is rendered, we pass this ID along as an additional uniform constant and compare it to the ID read from the G-buffer. If the two match, then we know that the pixel belongs to the object to which the decal is attached, and we render normally. Otherwise, we discard the fragment because it would be drawn outside the set of pixels covered by the object.
If no distinction needs to be made among static geometries, then they can all share the same ID, such as zero. All dynamic objects should have different IDs, but those IDs don't need to be unique. All we need to do is make it very improbable that two dynamic objects near each other have the same ID. It then suffices to use an 8-bit channel and simply give the dynamic objects random IDs from the range [1, 255].
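The ID scheme is small enough to sketch on the engine side. This is an assumption about one possible implementation, not code from the accompanying CD: kStaticDecalId, DecalIdAllocator, and DecalAppliesToPixel are hypothetical names, and the Mersenne Twister generator is simply a convenient standard choice.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// All static geometry shares ID zero (hypothetical constant).
constexpr std::uint8_t kStaticDecalId = 0;

// Hands out random 8-bit decal IDs in [1, 255] for dynamic objects.
// The IDs are not guaranteed to be unique, only unlikely to collide nearby.
class DecalIdAllocator
{
public:
    explicit DecalIdAllocator(unsigned seed) : m_rng(seed), m_dist(1, 255) {}

    std::uint8_t NextDynamicId()
    {
        return static_cast<std::uint8_t>(m_dist(m_rng));
    }

private:
    std::mt19937 m_rng;
    std::uniform_int_distribution<int> m_dist;
};

// CPU-side mirror of the per-fragment test: a decal is drawn only where the
// ID stored in the G-buffer matches the ID of the object it is attached to.
bool DecalAppliesToPixel(std::uint8_t decalId, std::uint8_t gbufferId)
{
    return decalId == gbufferId;
}
```

In the fragment shader, the same comparison would end in a discard when the two IDs differ.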
This kind of object ID management could also be done using the stencil buffer. However, testing stencil values prevents us from using instancing to render the decals due to the fact that we cannot pass the stencil compare value as a uniform constant to the shader and change the stencil test on a per-fragment basis.
20.6 Limitations
There are a few limitations one should be aware of. This technique does not magically create volumetric decals. It only improves a 2D projection so that it looks more realistic in many situations. The wrap-around feature uses the normal of the underlying surface to change the texture coordinates inside a decal. The results shown in Figure 20.4 look much better than in Figure 20.2, but there is still a clearly visible cut at the edge of the vertical column. To prevent such artifacts, one would need to use truly volumetric decals. Figure 20.5 shows a decal applied to a more complex object. In the left image, the object uses face normals, and in the right image, it uses smooth normals. In both images, no normal mapping is considered. Obviously, the normal of the underlying surface has a big effect on the appearance of the decal. Consequently, surfaces with strong normal mapping tend to distort the decal, sometimes causing visual artifacts. For some kinds of decals, this is less of an issue, though, because a blood stain usually still looks like a blood stain no matter how distorted it becomes when applied to a surface.
Figure 20.5: (Left) Wrap-around based on face normals. (Right) Wrap-around based on smoothly interpolated vertex normals.
A word about performance: It is easy to cull decals by testing bounding volumes, and the GPU can render decals in large batches through instancing. However, you should be careful not to have many decals layered on top of each other because doing so can easily consume all available fragment-rendering power. You can try to prevent such situations by removing decals after a certain amount of time or by restricting the number of decals on screen at once. The latter approach allows us to keep many decals in the world as long as they are not visible at the same time. Most games simply clean up the scene by removing decals after a fixed amount of time, but lose a lot of immersive quality in the process. The game Max Payne places a limit on the number of decals created in a single area, which has the great effect that if the player comes back to a room where he was previously, all the blood stains and bullet holes are still there.
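A per-area cap of the kind described for Max Payne can be sketched with a bounded queue. The code below is only an illustration: Decal and AreaDecalList are hypothetical names, and evicting the oldest decal when the cap is reached is an assumed policy rather than a documented one.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

struct Decal { float x, y, z; };   // hypothetical minimal decal record

// Keeps at most maxDecals decals for one area (for example, one room).
// Decals never expire with time; the oldest one is dropped only when a
// new decal would exceed the cap.
class AreaDecalList
{
public:
    explicit AreaDecalList(std::size_t maxDecals) : m_maxDecals(maxDecals)
    {
        assert(maxDecals > 0);
    }

    void Add(const Decal& decal)
    {
        if (m_decals.size() == m_maxDecals)
            m_decals.pop_front();   // evict the oldest decal in this area
        m_decals.push_back(decal);
    }

    const std::deque<Decal>& Decals() const { return m_decals; }

private:
    std::size_t m_maxDecals;
    std::deque<Decal> m_decals;
};
```

Because the cap is per area, decals in rooms the player has left remain untouched until the player returns.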
20.7 Additional Features
Since we are already rendering the decals into the G-buffer before any lighting occurs, we can not only change the diffuse color, but can also replace (or modify) the normal, gloss, and other data as we wish. Computing the tangent and bitangent vectors for the decal is very straightforward in the vertex shader, and no additional vertex attributes are needed. For more information, see the example shader on the accompanying CD.
References
[1] Eric Lengyel. "Applying Decals to Arbitrary Surfaces". Game Programming Gems 2. Charles River Media, 2001.
[2] Joris Mans and Dmitry Andreev. "An Advanced Decal System". Game Programming Gems 7. Charles River Media, 2008.
[3] Oles Shishkovtsov. "Deferred Shading in S.T.A.L.K.E.R.". GPU Gems 2. Addison-Wesley, 2005.
[4] Frank Puig Placeres. "Fast Per-Pixel Lighting with Many Lights". Game Programming Gems 6. Charles River Media, 2006.
[5] Rusty Koonce. "Deferred Shading in Tabula Rasa". GPU Gems 3. Addison-Wesley, 2008.
[6] Dean Calver. "Deferred Lighting on PS 3.0 with High Dynamic Range". ShaderX3. Charles River Media, 2005.
[7] Francesco Carucci. "Inside Geometry Instancing". GPU Gems 2. Addison-Wesley, 2005.
Part III - Programming Methods. Game Engine Gems, Volume One, by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Part III: Programming Methods
Chapter List
Chapter 21: Multithreaded Object Models
Chapter 22: Holistic Task Parallelism for Common Game Architecture Patterns
Chapter 23: Dynamic Code Execution Hierarchies
Chapter 24: Key-Value Dictionary
Chapter 25: A Basic Scheduler
Chapter 26: The Game State Observer Pattern
Chapter 27: Fast Trigonometric Operations Using Cordic Methods
Chapter 28: Inter-Process Communication Based on Your Own RPC Subsystem
Chapter 21 - Multithreaded Object Models. Game Engine Gems, Volume One, by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Chapter 21: Multithreaded Object Models
Jon Parise, Electronic Arts
Overview
Many simulation games are constructed around a core model in which everything in the game world is represented by an object. Each object is a data container, maintaining self-consistent state for an in-game entity. Objects can be defined using a type hierarchy or a composition pattern, and they can be organized spatially (as in a scene graph) or in flat collections.
One challenge common to all of these implementation schemes is how to represent consistent object state in a multithreaded environment. For example, a game engine could update the simulation state on one thread, render the game on a second thread, and perform animation and physics work on additional threads. Data synchronization, consistency, and conflict resolution quickly become problems.
This gem discusses four approaches to solving these problems:
Explicit locking
Message-based updates
Multiple thread contexts
Buffered state changes
Each approach varies in its complexity, flexibility, and performance. To that end, there is no one perfect solution for all game engines, but the information provided in this article should facilitate choosing the most appropriate architecture to match a known set of requirements.
21.1 Explicit Locking
The most common approach to multithreaded data contention is to add explicit locking using operating system synchronization primitives, such as critical sections and mutex locks. Locks are generally used to protect concurrent access to shared, centralized data resources, such as message queues or device state, and for these types of use cases, they are a good way to ensure data consistency across multiple threads. Synchronization primitives are generally well-supported, the interlocked scopes are explicit, and the initial implementation cost tends to be low.
While locks are often easy to implement, they should be used with care. First, locks introduce blocking behavior into a multithreaded application, and the potential for blocking (or worse, deadlock) increases with the number of locks in the system. Second, many synchronization primitives consume operating system resources (such as kernel handles under Windows). Third, there is no point at which an object's state (or multiple objects' states) can be considered consistent. Last, it is generally difficult to verify the correctness of large lock-based systems without using advanced analysis tools and frequent code audits.
In the specific case of object models, each object could potentially require one or more locks in order to support multithreaded access. Depending on the size of the game's simulation, this could result in a large amount of overhead, potentially compromising game performance. Given that, the fine-grained, explicit locking approach seldom scales to large, complex simulation games.
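For reference, the fine-grained variant looks roughly like the following C++ sketch, in which every object carries its own mutex. LockedObject is a hypothetical class used only for illustration, and std::mutex stands in for whatever primitive the platform provides.

```cpp
#include <cassert>
#include <mutex>

// A game object that guards its own state with a dedicated lock.
// One mutex per object is exactly the overhead discussed above.
class LockedObject
{
public:
    void SetPosition(float x, float y)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        m_x = x;
        m_y = y;
    }

    // Reads both coordinates under the same lock so they are mutually consistent.
    void GetPosition(float& x, float& y) const
    {
        std::lock_guard<std::mutex> guard(m_lock);
        x = m_x;
        y = m_y;
    }

private:
    mutable std::mutex m_lock;
    float m_x = 0.0f;
    float m_y = 0.0f;
};
```

Each accessor is consistent in isolation, but nothing here makes two objects, or two successive calls on one object, consistent with each other, which is the third concern raised above.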
21.2 Message-Based Updates
A message queue can be used to serialize updates from multiple "writing" threads. Each message in the queue describes an object-related state change. The queue is protected by a single lock that each writer must acquire in order to write a message. A single "reading" thread periodically processes the queue's contents by acquiring the lock, playing back the individual messages, and applying the state updates to their associated game objects.
In this model, all object modifications ultimately occur from the single processing thread, so protecting individual objects becomes unnecessary. Reading object data from multiple threads is still problematic, however, because there are no data consistency guarantees for threads other than the one performing the message processing.
There are three measurable forms of overhead associated with this approach. The first involves the memory overhead of the message queue itself. Second, state changes are delayed for as long as their associated message is in the queue waiting to be processed. And third, all writing threads must block while waiting to acquire the queue's lock, which could be held by other writers or by the processing thread. The processing thread does not necessarily need to block if it fails to acquire the lock immediately, but waiting until its next attempt will further delay the enqueued updates, and there is still no guarantee it will be able to acquire the lock without blocking.
Given these considerations, a message-based approach is most appropriate for engines that distribute work from a single primary thread to multiple worker threads. It provides a nice mechanism for serializing the results of those jobs and then applying them in the context of the primary update thread. Synchronization is limited to a single lock protecting the message queue, which reduces complexity and aids debugging.
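A minimal version of such a queue might look like the sketch below. It rests on several assumptions of ours: StateMessage, UpdateQueue, and ApplyUpdates are hypothetical names, a message carries a single float value, and game objects are stood in for by a map from object ID to value.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_map>
#include <vector>

// One object-related state change (hypothetical, deliberately tiny).
struct StateMessage
{
    int objectId;
    float newValue;
};

class UpdateQueue
{
public:
    // Called by any writing thread; the single lock serializes all writers.
    void Post(const StateMessage& message)
    {
        std::lock_guard<std::mutex> guard(m_lock);
        m_pending.push_back(message);
    }

    // Called by the single processing thread. Swapping the whole batch out
    // keeps the lock held only briefly, so writers are blocked for less time.
    std::vector<StateMessage> Drain()
    {
        std::vector<StateMessage> batch;
        std::lock_guard<std::mutex> guard(m_lock);
        batch.swap(m_pending);
        return batch;
    }

private:
    std::mutex m_lock;
    std::vector<StateMessage> m_pending;
};

// Plays back the queued messages in order on the processing thread.
void ApplyUpdates(UpdateQueue& queue, std::unordered_map<int, float>& objects)
{
    for (const StateMessage& message : queue.Drain())
        objects[message.objectId] = message.newValue;
}
```

Since only the processing thread touches the objects map, the objects themselves need no locks.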
21.3 Multiple Thread Contexts
Another solution explicitly identifies the object data fields that need to be accessed concurrently. These fields are duplicated and partitioned into per-thread context structures. Each thread effectively gets its own "private" version of the fields that it can access without using locking primitives, enabling asynchronous reading and writing. The fields' contexts are periodically synchronized to update all threads' views of the data.
Methods that access these fields select the appropriate context structure using the current thread ID. If a method modifies the field's value, it must also make a record of the change, such as updating the field's assigned bit in the object's per-thread "modified fields mask". This allows individual changes to be tracked, which drive the synchronization process.
At the end of each frame, all modifications are merged: the changes made by one thread are copied to the other thread's context, and vice versa. This process can be optimized by using a per-thread "change list" to track unresolved objects (in conjunction with the "modified fields mask" mentioned above).
In the event that both threads have modified the same field independently, the merge is ambiguous. In these cases, one thread is considered authoritative, and the opposing thread's modified value is overwritten. In a properly organized system, however, this should rarely, if ever, happen. Most fields will only be modified by one thread while being read by many.
The merge operation is performed by an explicit synchronization step at the end of a rendering frame (at the vertical blank, for example) when both threads enter a common barrier. This effectively ties the non-rendering threads to the rendering thread's update rate, but ideally the other threads are performing time-sliced work that is compatible with this timing model. Alternatively, synchronization could be based on a non-rendering thread's update rate (such as simulation updates), but then the rendering thread could not be locked to a specific visual refresh schedule.
This architecture can be extended to support additional per-thread context structures containing thread-local fields that are not synchronized. This allows thread-specific code to store object-specific data using the existing object system. For example, the rendering system could store scene culling information on each object that would only be accessible through that thread's context.
The memory cost associated with this approach scales with the number of duplicated contexts. The minimum number of contexts is two, so this architecture could potentially halve the number of objects that can fit in memory at one time, assuming all object fields needed to be synchronized. There are also run-time costs associated with selecting the active thread, tracking modified fields, and merging contexts.
This approach has the advantage that all thread synchronization is centralized within the object system. Code that works with objects can remain ignorant of the underlying synchronization system and will never block outside the explicit synchronization barrier, which is predictable. It also guarantees a consistent view across all object data from each thread's perspective, as long as all data access respects the per-thread context partitioning scheme.
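The bookkeeping for two contexts can be sketched as follows. This is a simplified illustration under our own assumptions: the Field and Context enumerations and the GameObject class are hypothetical, every field is a float, and the context is passed explicitly instead of being selected from the current thread ID.

```cpp
#include <cassert>
#include <cstdint>

enum Field : std::uint32_t { kPosition = 0, kHealth = 1, kFieldCount = 2 };
enum Context { kSimulation = 0, kRender = 1, kContextCount = 2 };

// Per-thread copy of the synchronized fields plus a "modified fields mask".
struct FieldContext
{
    float values[kFieldCount] = {};
    std::uint32_t modifiedMask = 0;
};

class GameObject
{
public:
    float Get(Context context, Field field) const
    {
        return m_contexts[context].values[field];
    }

    // Writes touch only the caller's context and record the change in its mask.
    void Set(Context context, Field field, float value)
    {
        m_contexts[context].values[field] = value;
        m_contexts[context].modifiedMask |= (1u << field);
    }

    // Must be called only inside the synchronization barrier, when no thread
    // is reading or writing. On a conflict, the authoritative context wins.
    void Merge(Context authoritative)
    {
        FieldContext& winner = m_contexts[authoritative];
        FieldContext& other = m_contexts[authoritative == kSimulation ? kRender : kSimulation];

        for (std::uint32_t field = 0; field < kFieldCount; ++field)
        {
            const std::uint32_t bit = 1u << field;
            if (winner.modifiedMask & bit)
                other.values[field] = winner.values[field];
            else if (other.modifiedMask & bit)
                winner.values[field] = other.values[field];
        }
        winner.modifiedMask = 0;
        other.modifiedMask = 0;
    }

private:
    FieldContext m_contexts[kContextCount];
};
```

A per-thread change list would simply record which objects have a nonzero mask so that the barrier only visits those.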
21.4 Buffered State Changes
The final approach presented here is fairly strict and complex, but it also has a lot to offer in terms of features and architectural cleanliness. The implementation centers on a dedicated command queue that buffers multiple frames of state changes from one thread to another. This effectively splits the engine into two halves: a dedicated simulation thread, which produces the state changes, and a rendering thread, which presents the simulation state to the player.
Another way to think about this separation is in networking terms: the simulation thread acts like a network server, the command queue fulfills the role of the network transport layer, and the rendering thread is similar to a network client application that presents the buffered state changes.
The command queue is designed to hold multiple frames of simulation state. The simplest possible implementation uses two frames in a traditional double-buffering scheme: while one frame is being written by the simulation thread, the other frame is presented by the rendering thread.
Frame swaps are controlled by a single lock. The simulation thread holds the lock while it writes its state into a new frame, forcing the rendering thread to continue rendering its current frame until the new frame is made available by the simulation. Rendering is therefore never stalled by the simulation, but it is forced to represent the same simulation state over multiple rendering frames when it is running at a faster update rate than the simulation.
Because it is quite common for the renderer to run at a faster rate, a better command queue implementation uses three or more frames of simulation state. This layout allows the renderer to consider multiple frames of simulation state, interpolating between them based on timing information embedded within the frames. Each additional frame of data in the queue adds latency and memory overhead, however, so the ideal layout for many engines uses three frames.
In order to facilitate interpolation, each frame includes a timestamp indicating when the data was written by the simulation. Also, each entity within a frame, such as an object, is assigned a unique handle. When an entity is serialized, both its current handle and its handle from the previous frame are written. This allows the renderer's interpolation logic to match an entry in one frame to its corresponding entry in the previous frame. Handles are generally represented as indexes or offsets into the frame's buffer.
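As a rough illustration of how timestamps and previous-frame handles might drive interpolation, here is a minimal sketch. The Frame and EntityState layouts, the InterpolateX name, and the use of a single float per entity are all assumptions made for brevity; they are not taken from the gem's source code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout: one serialized entity inside a frame.
struct EntityState {
    int   prevHandle;  // index of this entity in the previous frame, -1 if new
    float x;           // a single interpolated value, for brevity
};

// Hypothetical layout: one frame of simulation state.
struct Frame {
    double timestamp;                   // time at which the simulation wrote it
    std::vector<EntityState> entities;  // an entity's handle is its index here
};

// Returns the value of entity `handle` (a handle into `next`) at `renderTime`,
// blending linearly between the two frames and clamping outside their span.
float InterpolateX(const Frame& prev, const Frame& next,
                   std::size_t handle, double renderTime)
{
    const EntityState& cur = next.entities[handle];
    if (cur.prevHandle < 0) return cur.x;  // no earlier state to blend from

    double span = next.timestamp - prev.timestamp;
    double t = span > 0.0 ? (renderTime - prev.timestamp) / span : 1.0;
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;

    const EntityState& old =
        prev.entities[static_cast<std::size_t>(cur.prevHandle)];
    return old.x + static_cast<float>(t) * (cur.x - old.x);
}
```

Note that the handle stored with each entity lets the renderer find the matching entry even when the entity sits at a different index in each frame.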
In addition to buffered, interpolative state, the command queue can also handle discrete events. Frame events are triggered by the simulation and have some associated visual effect, such as starting or stopping a particle system. When one of these events is written by the simulation, it becomes associated with the simulation's current frame. Buffered events won't be executed by the renderer until it is also ready to evaluate that frame's serialized state. This rule maintains visual consistency between object state changes and events.
There are three clear advantages to using a frame-based dependency queue. First, it results in a clean separation between the simulation and rendering systems. Second, communication between simulation and rendering is one-way, removing any potential for ambiguity concerning the state of the world. Lastly, timing and thread synchronization are explicit, allowing the threads to run at different update rates.
This type of architecture does carry some heavy costs, however, particularly in terms of memory usage. The size of each frame in the queue can be significant, and, depending on the game data and serialization requirements, frame sizes might vary from one to the next, making it difficult to anticipate the peak memory size required by the queue. There are run-time processing costs associated with serializing, unpacking, and interpolating buffered state, and there is the cost of delaying state changes while they're buffered in the queue. Lastly, in terms of overall engine design, separating systems into rendering and simulation "halves" can be a challenge.
21.5 Selecting the Best Approach
Four different approaches to synchronizing object state in a multithreaded game engine have been presented. All of them have been used successfully in many shipping titles, but selecting the most appropriate architecture for a specific engine requires careful evaluation of that game's requirements. In addition, it is entirely possible to implement hybrid schemes, perhaps on a per-system basis, but care must be taken to avoid introducing additional data consistency problems as a result.
It's important to emphasize the weight that engine and gameplay requirements should have in this decision. For example, if the game embeds a scripting language, the implementation details of the scripting system may dictate additional requirements with regard to how script interacts with the game's object system. The scripting environment could even host the entire object system itself, further influencing how the object system is connected to the rest of the game engine. Alternatively, the gameplay may not tolerate input delays, in which case some of the buffering-based approaches discussed above may introduce unacceptable latency.
It's also important to consider external libraries when designing this aspect of the game engine's architecture. Many middleware packages have specific requirements with regard to data access patterns. For example, DirectX on the PC requires that multithreading support be explicitly enabled, and it comes with a measurable performance penalty, so engines that target the PC platform generally restrict rendering operations to the main application thread.
With hardware and software trends clearly embracing parallel execution environments, dealing with concurrent data access is becoming an increasingly common problem. Game engines must be designed with both runtime and developer efficiency in mind, and the concurrency challenges put even more emphasis on establishing solid architectural foundations. Hopefully, the material presented in this gem has contributed some useful techniques to that aspect of game engine development.
Chapter 22 - Holistic Task Parallelism for Common Game Architecture Patterns. Game Engine Gems, Volume One, by Eric Lengyel (ed). Jones and Bartlett Publishers © 2011
Chapter 22: Holistic Task Parallelism for Common Game Architecture Patterns
Brad Werth
Intel Corporation
22.1 Tasks Versus Threads in Games
Parallel programming in games has typically relied upon the use of threads to provide concurrent execution of work. Threads are independent processing streams that are given execution time based upon a scheduling algorithm in the host operating system. This system works well as long as there are enough hardware resources (CPU cores) to run the available threads. However, modern games run on a variety of platforms, with very different core topologies. Predetermining a specific number of threads for a fixed amount of work is no longer a viable strategy.
A popular alternative strategy is to define a thread pool, with a number of threads scaled to fit the available hardware resources. Parallel work is divided into tasks and assigned to the threads for execution. These tasks differ from threads in that they should never block while waiting on other computation. When the task is started, it is run to completion and cannot be swapped out for the purpose of running another task. So the challenge of coordinating the activity of multiple threads is changed into the effort of determining how to divide work into tasks and under what conditions those tasks should be run. The second half of this gem demonstrates methods for breaking down typical parallel game patterns of work into tasks.
Figure 22.1(a) shows an example of how some sample work can be divided into tasks. The total work has an initial period of uninterrupted work A, but then needs to wait on the occurrence of some event E before proceeding with additional work B. This dependency on the event E effectively splits this section of work into two tasks, A and B. Later, work C is a section that could be split into pieces and run concurrently (a data decomposition). If it is split into pieces, work D must wait for all of those pieces to complete before it can proceed. This means that the rest of the work can be split into some number of tasks C1, C2, ..., Cn and the remaining task D. The final task breakdown is shown in Figure 22.1(b).
Figure 22.1: (a) Work to be divided into tasks. (b) The work expressed as dependent tasks.
Taken alone, this transformation from threads to tasks can seem underwhelming. The real power of this approach becomes evident when a game architecture is able to leverage large amounts of parallel work at once. In that case, the tasks can be packed into the threads in the pool with very few gaps of inactivity. Mapping all the tasks to the thread pool in an efficient fashion is the responsibility of a task scheduler, the other focus of this gem.
22.2 The Task Scheduler
There are only a few features that must be implemented in a task scheduler. At a minimum, it must be possible to dispatch and wait on tasks. Dispatched tasks are assigned to a thread in the thread pool, the size of which can be specified at initialization time. Although this can be done simply with a single shared task queue, most task scheduler implementations have a more complicated internal architecture for performance reasons.
The key performance improvement is the use of per-thread task queues. This eliminates the synchronization chokepoint when one shared task queue is used. If a task spawns additional tasks, the new tasks are added to the current thread's queue. This introduces the possibility of queue size imbalance, which is typically resolved by the use of work stealing. Work stealing allows a thread that has emptied its own task queue to take work from another thread's queue. Advanced task schedulers may use heuristics to determine which thread to steal from and which task to steal, and this may help cache performance. Together, these improvements eliminate significant synchronization overhead in the task scheduler.
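The per-thread queue and work-stealing idea can be sketched as follows. This is only an illustration under simplifying assumptions: the WorkerQueue and NextTask names are hypothetical, each queue is protected by a plain mutex rather than the lock-free structures a tuned scheduler would likely use, and the choice to steal the oldest task from the opposite end of the victim's queue is a common convention rather than something the text prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

using Task = std::function<void()>;

// One queue per worker thread, guarded by a mutex to keep the sketch simple.
class WorkerQueue {
public:
    void Push(Task t) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_tasks.push_back(std::move(t));
    }
    // The owning thread takes its newest task (last in, first out).
    bool Pop(Task& out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_tasks.empty()) return false;
        out = std::move(m_tasks.back());
        m_tasks.pop_back();
        return true;
    }
    // A thief takes the oldest task from the other end of the queue.
    bool Steal(Task& out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_tasks.empty()) return false;
        out = std::move(m_tasks.front());
        m_tasks.pop_front();
        return true;
    }
private:
    std::mutex m_mutex;
    std::deque<Task> m_tasks;
};

// Fetches work for thread `self`: its own queue first, then the other queues.
bool NextTask(std::vector<WorkerQueue>& queues, std::size_t self, Task& out)
{
    if (queues[self].Pop(out)) return true;
    for (std::size_t i = 1; i < queues.size(); ++i) {
        if (queues[(self + i) % queues.size()].Steal(out)) return true;
    }
    return false;
}
```

Each worker thread would call NextTask() in a loop; contention only arises when a thread runs dry and has to visit a neighbor's queue.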
Even though the minimal feature set is so simple, it can be daunting to create a task scheduler from scratch because of the difficulty of writing correct multithreaded code. Thankfully, there are existing examples to use or study: Nulstein [1] is a small and simple free source task scheduler for Windows, and Intel Threading Building Blocks [2] is a highly optimized scheduler with both a proprietary and an open source license available for Windows, OS X, and Linux. The open source version has also been ported to the Xbox 360. Intel Threading Building Blocks is used for the code samples in this gem, but any task scheduler will work as long as it has the minimum feature set: dispatching and waiting on tasks in a scalable thread pool.
22.3 Decomposing Game Patterns into Tasks
A task scheduler can efficiently execute tasks, but parallel games don't typically use tasks directly. Instead, there are a number of patterns that have emerged in game architectures for handling parallel work. Conveniently, it's not difficult to transform these patterns into task-based patterns. These transformations are described for those creating a task scheduler as well as for those working with an existing task scheduler. The patterns can be implemented in either case, but it is often more efficient to extend the task scheduler to support the patterns that your game actually uses.
The examples below describe each pattern and show an excerpt of the complete source code that accompanies this gem on the CD. You will need to examine the complete source code to see the whole pattern in action.
Callbacks and Futures
Games frequently have a need to run some work in parallel to the main work. This parallel work is sometimes structured as a function that sets a flag when complete. This flag can be checked later to determine when the parallel work has finished. This is an example of a callback system, and it maps directly into a task-based pattern by using the dispatch method of the task scheduler:
    TaskManager::JobResult result;
    bool flag = false;
    taskManager->dispatch(&result, (TaskManager::JobFunction) doCallback, &flag);
The obvious improvement that can be made to this system is to allow the dispatching thread to wait on the completion of the function pointer. This is called the future pattern and is supported by the task scheduler's dispatch and wait methods:
    taskManager->dispatch(&result, (TaskManager::JobFunction) doFuture, NULL);
    taskManager->wait(&result);
A well-designed task scheduler will ensure that waiting on a task is an active wait; instead of sleeping, the "waiting" thread will execute parallel work if there is work available. The full code for this example is "Callback and Future Sample" on the accompanying CD.
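For readers without the CD at hand, the future pattern can also be tried with the C++ standard library alone. The sketch below substitutes std::async and std::future for the gem's TaskManager; ComputeChecksum and RunAsFuture are hypothetical names, and unlike the active wait described above, std::future::get() simply blocks the calling thread.

```cpp
#include <cassert>
#include <future>

// Hypothetical stand-in for the work done by doFuture().
int ComputeChecksum(int seed)
{
    int sum = 0;
    for (int i = 1; i <= 100; ++i) sum += seed * i;
    return sum;
}

// Starts the work on another thread, then waits for its result.
int RunAsFuture(int seed)
{
    std::future<int> result =
        std::async(std::launch::async, ComputeChecksum, seed);
    // ... the dispatching thread could do unrelated work here ...
    return result.get();  // blocks until the work has finished
}
```

The dispatch call corresponds to std::async, and the wait call corresponds to get().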
Independent Loops and Splittable Tasks
Loops appear frequently in game architectures, and some of those loops contain large amounts of computation. If the iterations of a loop are logically independent and the number of iterations is known at the start of the loop, then subsets of iterations can be grouped into a task. This is typically done by defining the body of a loop as a function that takes a single parameter, the context object. The context object encapsulates all the data needed by the original loop, which includes at least the start and end indices over which to iterate. The task scheduler is given a function pointer and an array of context objects that have been initialized with index subranges to cover the original loop's iteration range. The task scheduler method prototype looks like this:
    void dispatchMultiple(
        JobResult *result,      // structure to track completion
        JobFunction func,       // pointer to task function
        void *params,           // array of context objects for tasks
        size_t paramSize,       // size of one context object
        unsigned int count      // number of tasks to create
    );
Although this approach is effective, it requires the game code to allocate an array of context objects, to decide ahead of time how many tasks to split the loop into, and to initialize the context objects with the appropriate subranges. To avoid these constraints, the task scheduler can provide a method to dispatch splittable tasks. A splittable task has a function pointer with three parameters: in addition to the context object, it is also passed the start and end indices. The scheduler passes the parameters to the task's function pointer, but it must also determine if a task's index range should be split into subranges assigned to subtasks. An additional function pointer is needed to determine whether and how to split a range in two pieces. With all of these elements in place, the game code can transform an independent loop into tasks without allocating arrays of context objects or predetermining the number of tasks to create. The task scheduler method prototype looks like the following. (The full code for this example is "Loop Sample" on the accompanying CD.)
    void dispatchSplittable(
        JobResult *result,          // structure used to track completion
        JobRangeFunction rangeFunc, // pointer to task function
        JobSplitFunction splitFunc, // function that splits a range
        void *param,                // context object for tasks
        unsigned int start,         // beginning of range
        unsigned int end            // end of range
    );
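To make the roles of the range function and the split function concrete, here is a small single-threaded sketch. The RangeFunc and SplitFunc signatures, the half-open [start, end) convention, and the RunSplittable helper are all assumptions for illustration; the actual JobRangeFunction and JobSplitFunction types are defined in the sample code on the CD and may differ.

```cpp
#include <cassert>
#include <vector>

// Hypothetical signatures, loosely modeled on the prototype above.
using RangeFunc = void (*)(void* param, unsigned start, unsigned end);
using SplitFunc = bool (*)(unsigned start, unsigned end, unsigned* middle);

// Recursively halves [start, end) until splitFunc declines to split, then
// runs rangeFunc on each leaf. A real scheduler would dispatch the two
// halves as subtasks instead of recursing on the calling thread.
void RunSplittable(RangeFunc rangeFunc, SplitFunc splitFunc,
                   void* param, unsigned start, unsigned end)
{
    unsigned middle = 0;
    if (splitFunc(start, end, &middle)) {
        RunSplittable(rangeFunc, splitFunc, param, start, middle);
        RunSplittable(rangeFunc, splitFunc, param, middle, end);
    } else {
        rangeFunc(param, start, end);
    }
}

// Example split policy: keep halving while a range holds more than 4 items.
bool SplitIfLarge(unsigned start, unsigned end, unsigned* middle)
{
    if (end - start <= 4) return false;
    *middle = start + (end - start) / 2;
    return true;
}

// Example task body: doubles every element of a std::vector<int>.
void DoubleRange(void* param, unsigned start, unsigned end)
{
    std::vector<int>& data = *static_cast<std::vector<int>*>(param);
    for (unsigned i = start; i < end; ++i) data[i] *= 2;
}
```

Because the split decision is made lazily, the game code never has to choose a task count or allocate per-task context objects up front.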
Long, Low-Priority Operations
Games occasionally rely on computation that needs to run continuously, but not at the expense of other more immediate work. Level loading, asset decompression, and AI pathfinding are common examples. The low-priority operation runs continuously and provides periodic output to the main work of the game architecture. On its surface, this seems like a pattern that is fundamentally at odds with task parallelism. However, if the continuous computation can be changed into an iterative algorithm, then each iteration of the algorithm can be treated as a task.
Once the transformation is complete, the next challenge is to ensure that these tasks are given a low priority relative to other tasks in the task scheduler. Task schedulers can handle this in a few ways: the task queues for each thread can be changed into priority heaps, or the low-priority task can be inserted into the task queue in a location that makes it less likely to be run immediately (an implementation-specific detail). If you are using an existing task scheduler without this capability, you can dispatch low-priority tasks opportunistically when other tasks are being dispatched:
    void ourDispatch(TaskManager::JobResult *result, TaskManager::JobFunction func, void *param)
    {
        TaskManager *taskManager = TaskManager::getTaskManager();

        // Before we dispatch the func task, we check to see if we should
        // dispatch a low-priority task.
        if (g_lowPriorityTaskFlag)
        {
            // It's time to dispatch a low-priority task
            taskManager->dispatch(&g_lowPriorityResult,
                (TaskManager::JobFunction) doLowPriorityCallback,
                &g_lowPriorityTaskFlag);
        }

        // Now we dispatch whatever we were asked to dispatch.
        taskManager->dispatch(result, func, param);
    }
This workaround is effective since most task schedulers assign tasks to threads in last-in-first-out order. If the most recently added task spawns additional parallel work, tasks inserted earlier will generally not be started until the new parallel work has been started. The full code for this example is "Low-Priority Sample" on the accompanying CD.
Synchronized Callbacks
Occasionally, games need to do per-thread initialization before running parallel work. This is useful when interacting with some threaded middleware packages. When you create the thread pool directly, it is trivial to ensure that each thread makes the necessary calls. But when your thread pool is managed by a task scheduler, it can be more complicated. As long as your task scheduler uses work stealing, you can dispatch a number of tasks equal to the number of threads in the pool, and use synchronization primitives to prevent those tasks from completing until all have been assigned to threads:
    tbb::task *TaskManager::SynchronizedTask::execute()
    {
        ASSERT(m_func != NULL);
        m_func(m_param);
        (*m_atomicCount)--;

        while (*m_atomicCount > 0)
        {
            // yield while waiting
            TaskManager::yield();
        }

        return NULL;
    }
If your task scheduler does not use work stealing and does not allow you to directly assign tasks to threads, then another option is to define your tasks to do a just-in-time check for per-thread initialization when they are run. The full code for this example is "Synchronized Sample" on the accompanying CD.
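One way to express the just-in-time check is with a thread_local flag, available since C++11. The sketch below is an assumption about how such a check might look, not the approach used in the CD sample; InitThisThread, EnsureThreadInitialized, ExampleTask, and g_initCount are hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Counts how many threads have run the setup; it exists only to make the
// behavior observable in this sketch.
std::atomic<int> g_initCount{0};

// Hypothetical per-thread setup, e.g. registering with a middleware library.
void InitThisThread() { ++g_initCount; }

// Call at the top of every task. The thread_local flag starts out false on
// each thread, so the setup runs once per thread, on first use.
void EnsureThreadInitialized()
{
    thread_local bool initialized = false;
    if (!initialized) {
        InitThisThread();
        initialized = true;
    }
}

// A task body that needs the per-thread setup.
int ExampleTask(int x)
{
    EnsureThreadInitialized();
    return x * 2;
}
```

The cost is one extra branch per task, which is usually negligible next to the task's own work.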
Directed Acyclic Graphs
Earlier in this gem, we looked at how an arbitrary piece of parallel work could be split into tasks, resulting in a directed acyclic graph (see Figure 22.1). Many games conceive of their parallel work this way and would like to submit tasks to a task scheduler preceded by a list of ancestors. The trick to implementing this pattern is to encapsulate the tasks in objects that can manage the dependency relationships and actually dispatch the tasks when appropriate. When a task completes, it notifies the dependency-tracking objects, which may trigger more tasks being dispatched to the task scheduler as follows. (The full code for this example is "Graph Sample" on the accompanying CD.)
    void doDAGTaskFunction()
    {
        // First, we call our function pointer.
        m_func(m_param);

        // Now we tell our children that we're done.
        tbb::concurrent_vector<DAGTask *>::iterator it = m_children.begin();
        while (it != m_children.end())
        {
            (*it)->parentDone(this);
            it++;
        }
    }
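The excerpt above shows the notifying side only. A plausible shape for the receiving side is a counter of unfinished parents, where the last parent to finish releases the child. The DagTask class below is a hypothetical, self-contained sketch of that idea; it runs released tasks inline, where the gem's version would dispatch them to the task scheduler.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical dependency-tracking wrapper. Run() stands in for handing the
// task to the scheduler; here it executes inline to keep the sketch short.
class DagTask {
public:
    explicit DagTask(std::function<void()> func) : m_func(std::move(func)) {}

    // Declares that this task may not start until `parent` has finished.
    void DependsOn(DagTask& parent) {
        parent.m_children.push_back(this);
        ++m_pendingParents;
    }

    void Run() {
        m_func();
        for (DagTask* child : m_children) child->ParentDone();
    }

private:
    // The last parent to finish is the one that releases the child.
    void ParentDone() {
        if (--m_pendingParents == 0) Run();
    }

    std::function<void()> m_func;
    std::vector<DagTask*> m_children;
    std::atomic<int> m_pendingParents{0};
};
```

Only tasks with no parents need to be started explicitly; every other task is started by whichever of its parents finishes last.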
22.4 The Future of Task Parallelism in Games
Established habits of game development have been disrupted by the installed base of multi-core CPUs. In the same way, game development will be disrupted again by the introduction and evolution of powerful many-core processors that require special programming techniques in order to be used effectively. With the techniques discussed in this gem, you'll be able to create task parallelism abstractions that will get your game running quickly in any parallel hardware environment.
References
[1] Jérôme Muffat-Méridol. "Do-it-yourself Game Task Scheduling". Intel Software Network, 2009. http://software.intel.com/en-us/articles/do-it-yourself-game-task-scheduling/
[2] Intel Threading Building Blocks. http://www.threadingbuildingblocks.org/
Chapter 23 - Dynamic Code Execution Hierarchies. Game Engine Gems, Volume One, by Eric Lengyel (ed). Jones and Bartlett Publishers © 2011
Chapter 23: Dynamic Code Execution Hierarchies
Martin Linklater
Sony Studio Liverpool
Overview
As hardware becomes more complex and powerful, the software that it runs becomes larger and more complex. As software grows, the number of programmers needed to write and maintain the software increases. As anyone who has experienced large programming teams will tell you, the more programmers you have working on one product, the more time you need to spend policing the code structure and managing its complexity.
As different programmers create and modify game systems, it is easy for the code to become a mishmash of different patterns and styles. Programmers also tend to have their own personal preferences when it comes to object creation, initialization, updating, and deletion.
Inter-object communication, if not tightly regulated, can become a spaghetti of code dependencies and object graph traversals. As programmers build objects that require data from other objects, it is all too easy to create monolithic dependency graphs and be forced to navigate awkward and non-intuitive object linkages. It is also too easy to create header file dependencies that have nothing to do with the actual job you are trying to accomplish, but are required so you can connect to other objects in the code.
Keeping track of code construction can become a full-time job, and modifying the code during the latter stages of a project can become error prone and dangerous. Fragile code is difficult to bug-fix without causing more bugs.
This gem discusses code execution hierarchies (CEH) as a way of controlling these problems and visualizing your code when things get confusing.
23.1 What are Code Execution Hierarchies?
Code execution hierarchies provide a framework for code construction and updating. You use a simple base class for all of your major game components, and define the update order by way of a manager class with an internal tree structure. Rather than create objects and add them manually to the main loop, as in Figure 23.1(a), you create your objects and attach them to the execution tree relative to an already existing object. Conceptually, the structure is similar to a scene graph in graphics programming where each object has a parent, and the object's location in the scene is always defined relative to its parent. When you move or update a node in the graph, all of its children are automatically moved with it, maintaining the parent-child relationships. The main loop using a code execution hierarchy is shown in Figure 23.1(b).
Figure 23.1: (a) Traditional update loop. (b) Code execution hierarchy.
Explicitly defining the parent-child and sibling relationships of your code modules not only forces people to think before adding an object to the update system (always a good thing), but also helps protect your code from structural changes in the future. If a code module needs to be updated somewhere else during the frame, you move the parent, and the child objects are automatically moved along with it.
A simple base class is used for execution objects so that the manager class has a consistent interface for your objects. This base class also defines the API for object creation, initialization, updating, and deletion. Runtime type identification systems provide a simple way for objects to identify themselves within the hierarchy. This is used when objects need to find objects of a certain type within the hierarchy. A simple base class may look like that shown in Listing 23.1. Objects can be created and added to the hierarchy as shown in Listing 23.2. Rather than building your code explicitly and defining your update order in the main loop, you are creating code objects and inserting them into the code execution hierarchy.
Listing 23.1: Example base class.
    class CEHBase
    {
    public:

        CEHBase();
        virtual ~CEHBase();

        virtual void Init(void);
        virtual void Update(float dt);
        virtual void ...grab the latest code and prune the good bits

    protected:

        m_classID;
    };
Listing 23.2: Example game code.
    MyObj *obj = new MyObj;
    obj->Insert(kAsChildOf, kRootNode);
    obj->Init();

    OtherObj *other = new OtherObj;
    other->Insert(kAsSiblingAfter, obj);
    other->Init();
23.2 Design Features
Tree Structure
The object hierarchy itself is a tree structure. There is one root node, owned by the manager class. Each object has one and only one parent. Each object has zero or more siblings, and each object can have one child link. Once you have your tree structure, there are two relationships that must be maintained when traversing the tree:
Parents are always updated before their children.
Siblings are always updated in the same order, first to last.
Given these two relationships, there are two ways to traverse the tree and update your objects: depth first and breadth first. Which one you choose depends on your requirements, but due to memory access patterns and performance I favor the breadth-first traversal. Figures 23.2 and 23.3 show a simple graph and the update orders for both systems. As you can see, the two rules above are maintained for both traversal systems: previous siblings are processed before next siblings, and parents are processed before children. A real-world example of a CEH might look like that shown in Figure 23.4.
Figure 23.2: Simple graph.
Figure 23.3: Update orders.
Figure 23.4: Example CEH Graph.
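The two traversal orders can be compared with a short sketch. The Node layout (one child link plus a sibling chain) follows the description above, but the struct itself, the function names, and the small tree used to exercise them are hypothetical and are not the graph from Figure 23.2.

```cpp
#include <cassert>
#include <queue>
#include <string>

// Hypothetical node layout: one child link plus a sibling chain.
struct Node {
    char  name;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
    explicit Node(char n) : name(n) {}
};

// Depth first: visit a node, then its whole subtree, then its next sibling.
void DepthFirst(const Node* n, std::string& out)
{
    for (; n != nullptr; n = n->nextSibling) {
        out += n->name;
        DepthFirst(n->firstChild, out);
    }
}

// Breadth first: visit nodes level by level using a FIFO queue.
void BreadthFirst(const Node* root, std::string& out)
{
    std::queue<const Node*> pending;
    if (root) pending.push(root);
    while (!pending.empty()) {
        const Node* n = pending.front();
        pending.pop();
        out += n->name;
        for (const Node* c = n->firstChild; c; c = c->nextSibling)
            pending.push(c);
    }
}
```

For a root A with children B and C, where B has child D and C has child E, the depth-first order is A B D C E and the breadth-first order is A B C D E. Both orders visit every parent before its children and every sibling after its predecessor.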
Time Deltas
You may have noticed that the Update() method in the base class takes a floating-point dt parameter. This is used to tell your object how much time has elapsed since the last call to Update(). Making your code flexible regarding update frequencies and decoupling objects from V-sync events can have great benefits, as long as you are willing to do the extra work to make your object internals robust for varying dt values.
Adding awareness of time also gives your CEH the potential to deal with varying timing requirements for its objects. For instance, it is simple to build basic timer functionality into the manager class, allowing you to specify how often various objects require updating. If you have an object that only needs to be updated approximately once each second, then that object can tell the manager that its desired update interval is one second. The manager class can then handle the update call for you.
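A basic version of that timer logic might look like the following sketch. Only the Update(dt) signature comes from the text; the Timed struct, its interval and elapsed fields, and the TickAll function are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical object record; only Update(dt) is taken from the text.
struct Timed {
    float interval = 0.0f;  // desired seconds between updates; 0 = every frame
    float elapsed = 0.0f;   // time accumulated since the last Update() call
    int   updateCount = 0;  // for illustration only
    void Update(float dt) { (void)dt; ++updateCount; }
};

// Called once per frame by the manager with the frame's time delta.
void TickAll(std::vector<Timed>& objects, float frameDt)
{
    for (Timed& obj : objects) {
        obj.elapsed += frameDt;
        if (obj.elapsed >= obj.interval) {
            obj.Update(obj.elapsed);  // pass the real time since last update
            obj.elapsed = 0.0f;
        }
    }
}
```

Passing the accumulated time rather than the frame delta means a rarely updated object still sees a correct dt.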
Dynamic Structure
Since the update hierarchy is built at runtime, it can be treated as a dynamic data structure. Update order is not hardwired into your code, but can be modified and altered as required. If network code does not need to be updated because there is no network connection, you simply omit it from the update graph. This can be much cleaner than placing if statements like the following all over your code.
    if (networkActive) { ... blah ... }
It also allows you to define your update graph via a data file, allowing runtime behavior to change without recompiling your code.
Introspection
By building an introspection system into your CEH base class you allow for much more flexible code construction. I'm sure we've all been in the position of needing to get a handle on a certain class instance and being forced to navigate a lot of run-time linkage to get at the object. The alternative is to make the desired object global, rarely a good thing. As an example, in Figure 23.2, for object E to get a pointer for object G, it could be required that E goes through code like this:
    ptrG = GetD()->GetA()->GetB()->GetF()->GetG();
This is rather clunky, and if the linkage of intermediate classes changes, say objects A and B are swapped, all of the code that manually navigates the linkages needs to be updated.
Using a CEH with introspection, the call could look like this:
    ptrG = GetFirst<G>();
This code does not need to be altered if the hierarchy changes since the CEH manager does all the work of locating the first instance of class G and returning a pointer to it. This is how using CEHs can make your code more resilient to code restructuring. The introspection system can also provide APIs that deal with vectors of objects and filtering of objects based on relative location (above, below, etc.). Object linkage is no longer explicit in code, but dynamic at runtime.
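One possible shape for such a lookup is sketched below. It is an assumption throughout: the gem's base class identifies objects with its own class IDs, for which dynamic_cast is substituted here, and this GetFirst takes an explicit start node instead of consulting a manager class. The Node, Renderer, Audio, and Network types are hypothetical.

```cpp
#include <cassert>

// Hypothetical node base class; dynamic_cast stands in for the class ID
// system that the gem's CEHBase provides.
struct Node {
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
    virtual ~Node() {}
};

// Returns the first object of type T found in a depth-first search that
// starts at `n`, or nullptr if there is none.
template <typename T>
T* GetFirst(Node* n)
{
    for (; n != nullptr; n = n->nextSibling) {
        if (T* match = dynamic_cast<T*>(n)) return match;
        if (T* match = GetFirst<T>(n->firstChild)) return match;
    }
    return nullptr;
}

// Example object types.
struct Renderer : Node {};
struct Audio : Node {};
struct Network : Node {};
```

Callers depend only on the type they are looking for, so rearranging the hierarchy does not require touching them.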
Visualization
Visualizing the execution of your code can be a great help in debugging and bringing new members of your team up to speed with your code. Using a CEH can help make this visualization consistent and useful. Since the calling graph and actual call order are controlled by your manager class, you can get the manager to output data about your execution behavior for you. A handy format that I use is to output the execution graph in DOT file format [1], and then run this file through GraphViz [2] to get a snapshot of how the code is linked and executed.
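Emitting DOT from a tree takes only a few lines. The following sketch assumes a hypothetical Node struct with a name and child/sibling links, and writes one edge per parent-child link; the graph name "CEH" and the function names are illustrative choices.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical node layout with a display name.
struct Node {
    std::string name;
    Node* firstChild = nullptr;
    Node* nextSibling = nullptr;
    explicit Node(const std::string& n) : name(n) {}
};

// Writes one "parent -> child;" edge per parent-child link below `parent`.
void WriteEdges(const Node& parent, std::ostream& out)
{
    for (const Node* c = parent.firstChild; c; c = c->nextSibling) {
        out << "  \"" << parent.name << "\" -> \"" << c->name << "\";\n";
        WriteEdges(*c, out);
    }
}

// Returns the hierarchy rooted at `root` as a DOT digraph.
std::string ToDot(const Node& root)
{
    std::ostringstream out;
    out << "digraph CEH {\n";
    WriteEdges(root, out);
    out << "}\n";
    return out.str();
}
```

Saving the returned string to a file such as ceh.dot and running `dot -Tpng ceh.dot -o ceh.png` should produce an image of the hierarchy.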
Deferred Operations
There are certain operations that are very dangerous to perform while you are mid-way through the graph traversal. Any operation that modifies the graph (delete, move, create) during the update phase can have disastrous effects since you are altering the same tree you are traversing. To shield against this, dangerous operations can be queued until the end of the update phase. These deferred housekeeping tasks can catch you out if you are expecting the modifications to happen immediately, but the added layer of bug protection they give you is worth the trouble.
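A deferred-operation queue can be as simple as a list of closures that is flushed after traversal. The DeferredOps class below is a hypothetical sketch of that idea, not the gem's manager class; the BeginUpdate, Request, and EndUpdate names are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical manager fragment: graph edits requested during the update
// phase are queued and applied once traversal has finished.
class DeferredOps {
public:
    void BeginUpdate() { m_updating = true; }

    // Runs `op` now if it is safe, otherwise queues it for EndUpdate().
    void Request(std::function<void()> op) {
        if (m_updating) m_pending.push_back(std::move(op));
        else op();
    }

    void EndUpdate() {
        m_updating = false;
        // Swap first, so an operation that requests further edits does not
        // modify the list being iterated.
        std::vector<std::function<void()>> ops;
        ops.swap(m_pending);
        for (auto& op : ops) op();
    }

private:
    bool m_updating = false;
    std::vector<std::function<void()>> m_pending;
};
```

Objects call Request() with the edit they want, and the edit takes effect immediately outside the update phase or at EndUpdate() inside it.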
23.3 Benefits & Pitfalls
Moving your execution control over to a data-driven, dynamic system like code execution hierarchies has both benefits and pitfalls. The benefits include:
Code is constructed with a consistent pattern (Init, Update, Destroy, etc.).
Update order is easier to modify since the order is dynamic rather than hard coded.
Introspection can make finding object instances easier than manually navigating class linkages.
Introspection allows the layout of objects to change without the need to maintain manual linkage code.
It is simple to visualize the execution order and hierarchy of your code. This aids debugging and teaching of programmers new to the team.
There are, of course, a few pitfalls and caveats that come with the use of code execution hierarchies. These include:
It is difficult to see what is happening by just looking at the code. Since the update order is dynamic and built at runtime, manual inspection of the source code doesn't help much.
Some code simply doesn't fit. Even though CEHs provide a simple and flexible structure, there are still pieces of functionality that don't fit into the framework. You still have to deal with these systems manually and take care of object linkage by hand.
You need to be careful with your granularity. Putting game systems and substantial game objects into the CEH can be very useful. Placing every particle in a particle system is overkill, and you will waste lots of CPU cycles and memory on unnecessary CEH overhead. You need to use your judgement.
Modifying the execution tree while you are traversing it can cause lots of catastrophic but difficult-to-find bugs. You need to be very careful when altering the tree or defer all modification operations until after the update phase has completed.
Code execution hierarchies are a huge topic for discussion. I have worked with code that uses explicit execution order, linear lists or queues of update functions, and deeply embedded tree hierarchies, and they all have their good and bad points. My hope is that this gem has introduced a new way of thinking about code execution to those who have not used dynamic systems before and has possibly provided some food for thought for those currently using dynamic execution systems. Game code inevitably becomes complex, and new ways of looking at how we construct code are always useful. Using consistent and flexible methods like those I have described can help ease the pain and provide much needed abstractions.
References
[1] DOT file format. http://www.graphviz.org/doc/info/lang.html
[2] GraphViz. http://www.graphviz.org/
Chapter 24 - Key-Value Dictionary. Game Engine Gems, Volume One, by Eric Lengyel (ed). Jones and Bartlett Publishers © 2011
Chapter 24: Key-Value Dictionary
Martin Linklater
Sony Studio Liverpool
Overview
This gem presents a design for a flexible, observable repository for game configuration data. The key-value dictionary (KVD) is a data structure inspired by the key-value observing [1] technology used in Mac OS X Cocoa frameworks. When used sensibly, the KVD can simplify your code.
The KVD is designed to be flexible, easy to use, and have few external dependencies. It is not designed to be blisteringly fast, or to be used for frequently modified data. The KVD is well suited for storing game state information that is read often, but modified rarely. The KVD code on the accompanying CD is written in C++ and uses the standard template library (STL) for internal storage containers. The KVD is lightweight (about 350 lines of C++) and simple to integrate into an existing game engine.
24.1 Design
Apple's OS X relies heavily on Objective-C and the Cocoa frameworks. One of the fundamental mechanisms of Cocoa that binds the frameworks together is called "key-value observing" (KVO). Apple describes KVO as follows:
Key-value observing provides a mechanism that allows objects to be notified of changes to specific properties of other objects. [1]
KVO is a mechanism that, once you get used to it, becomes almost invaluable. Code does not have to repeatedly poll a value to detect whether it has changed; rather, the code registers its interest in learning when a value changes and is notified when the value does change. The code is notified of the new value by way of a callback. Since Apple's KVO mechanism requires Objective-C and Cocoa, but we write games primarily in C++, I have created the KVD, a simple C++ data repository that mimics simple KVO behavior.
The KVD can take data of any type, with each piece of data having multiple observers. Observers are notified of changes immediately upon the value changing. The KVD also checks that a value has actually changed before sending notifications, so setting a variable to the same value it currently holds will not trigger any notifications.
24.2 Using the KVD
As mentioned in the introduction, the KVD is designed to be used with general game state data that is read often, but changed relatively infrequently. Traditionally, game state data is repeatedly polled by lots of different systems. For example, the current screen resolution is a piece of data that changes very infrequently, but which is polled by a number of different systems. If you use the KVD to store the screen resolution, you don't need to poll for its value each frame, but you are instead told when the value changes and what the new value is.
Using an intermediary data store like the KVD can help decouple your classes and reduce header file dependencies. Continuing the previous example, if you were to store the current screen resolution inside the rendering system, every piece of code that needs to read or set this value would have to include the header file for the rendering system, and that could itself pull in other headers as illustrated in Figure 24.1, increasing compile times.
Figure 24.1: A polling model for accessing information often creates additional header dependencies.
Pulling the screen resolution out of the rendering system and into an intermediary would stop your game code from having to include the rendering system header (and all the headers the rendering header includes), but would add a dependency on the intermediary. As long as this intermediary header has fewer dependencies, you have simplified the header file dependency chain as shown in Figure 24.2, and this will speed up compile times.
Figure 24.2: The header dependency graph is simplified by moving information into an intermediary header.
24.3 Code Details
The key-value dictionary has a relatively small API and only has external dependencies on the STL. To use the KVD, you first need to create a KVD object by declaring it as follows:
    KeyValueDictionary myKVD;
Once created, the KVD is ready to accept values and notifications. Setting a value in the KVD is as simple as this:
    myKVD.Set<int>(std::string("myInteger"), 1);
    myKVD.Set<MyStruct>(std::string("myStruct"), myStructInstance);
Notifications happen through a function callback mechanism. The callback functions take the following form:
    void NotificationFunc(void* newValue, void* userData)
    {
        // react to new value
    }
To add a notification to the KVD, you need to tell the KVD which function to call when the value changes (the first parameter), and which value you want to watch (the second parameter). The third parameter is optional and will be passed to the callback function as the userData parameter. This can be set to null or set to a pointer to your own user data associated with the callback notification.
    myKVD.AddNotification(NotificationFunc, std::string("myInteger"), 0);
The callback function is passed two values: the first is a pointer to the new value, and the second is a pointer to the user data that was set when the notification was added. Since the callback isn't told the type of the data, you have to cast the newValue pointer to the appropriate type. You have to make sure that the type used is consistent for a given key. Mixing types can cause difficult-to-find bugs or crashes.
Once you have set a notification, whenever the value is changed using the Set() method, your notification is called. You can add however many notifications you want to each key value; they will all be called whenever the value changes. Notifications can be removed using the RemoveNotification() method:
    myKVD.RemoveNotification(NotificationFunc, std::string("myInteger"));
You can get the value for a key manually, if you so wish, with the following call:
    int myValue;
    myKVD.Get<int>(std::string("myInteger"), &myValue);
Internally, the primary storage container for the KVD is an STL map. Each map element is indexed by the key string hash, and contains the value encoded as a std::string, a lock, and a std::list of notification callbacks. Each notification callback contains a pointer to the callback function, and the user data value, as illustrated in Figure 24.3.
Figure 24.3: The data stored in a KVD.
STL was chosen due to its good performance characteristics, common availability, and robust nature. If your implementation of KVD requires different container characteristics, you are free to change the source to suit your needs.
Each entry in the KVD map has a lock flag. Whenever a key value changes, the lock is set before the notification phase begins and cleared after notifications have completed. This lock is checked before a key value is modified, to make sure that recursive change notifications don't happen. Recursive change notifications are a bad thing because they can blow the stack and crash your program. Consider the case where a key is altered within the call graph of its notification function. Each change in value would trigger a notification, which would change the value, triggering the notification, etc. The locking mechanism stops this from happening. Since locks are present for each individual entry in the KVD, however, you are allowed to alter a different KVD entry from inside a callback.
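The storage layout and the per-entry lock described above can be sketched in a few dozen lines. This is not the source from the CD: the class name `KvdSketch`, the `NotificationCallback` typedef, and the `bool` return values are assumptions made for illustration, keys are stored as plain strings rather than hashes, and values are limited to trivially copyable types so they can be byte-copied into a `std::string`.

```cpp
#include <cstring>
#include <list>
#include <map>
#include <string>
#include <type_traits>

// Hypothetical callback signature, mirroring the form shown in the text.
typedef void (*NotificationCallback)(void* newValue, void* userData);

class KvdSketch
{
public:
    // Stores the value and notifies watchers. Returns false if the entry is
    // locked, which happens when a callback tries to change its own key.
    template <typename T>
    bool Set(const std::string& key, const T& value)
    {
        static_assert(std::is_trivially_copyable<T>::value,
                      "byte-copy encoding only suits trivially copyable types");
        Entry& entry = m_entries[key];   // std::map references stay valid on insert
        if (entry.locked)
            return false;

        entry.bytes.assign(reinterpret_cast<const char*>(&value), sizeof(T));

        entry.locked = true;             // lock for the notification phase
        T copy = value;                  // callbacks receive a pointer to a copy
        const std::list<Callback> snapshot = entry.callbacks;
        for (const Callback& cb : snapshot)
            cb.func(&copy, cb.userData);
        entry.locked = false;            // unlock once every watcher has run
        return true;
    }

    // Copies the stored value into *out. Returns false if the key is missing
    // or the stored size does not match T.
    template <typename T>
    bool Get(const std::string& key, T* out) const
    {
        std::map<std::string, Entry>::const_iterator it = m_entries.find(key);
        if (it == m_entries.end() || it->second.bytes.size() != sizeof(T))
            return false;
        std::memcpy(out, it->second.bytes.data(), sizeof(T));
        return true;
    }

    void AddNotification(NotificationCallback func, const std::string& key,
                         void* userData = nullptr)
    {
        Callback cb = { func, userData };
        m_entries[key].callbacks.push_back(cb);
    }

    void RemoveNotification(NotificationCallback func, const std::string& key)
    {
        std::map<std::string, Entry>::iterator it = m_entries.find(key);
        if (it != m_entries.end())
            it->second.callbacks.remove_if(
                [func](const Callback& cb) { return cb.func == func; });
    }

private:
    struct Callback { NotificationCallback func; void* userData; };
    struct Entry
    {
        std::string bytes;               // value encoded as raw bytes
        bool locked = false;             // set while notifications run
        std::list<Callback> callbacks;
    };
    std::map<std::string, Entry> m_entries;
};
```

Because the lock lives in each entry, a callback that calls `Set()` on its own key is refused, while a callback that sets a different key goes through normally. Iterating over a snapshot of the callback list is a design choice of this sketch so that a callback may add or remove notifications without invalidating the loop.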
24.4 Caveats
The code presented in this article is not perfect and is meant as a starting point for you to modify as you see fit. There are a number of issues that I have not tackled on purpose, since requirements will differ among uses:
Thread safety. The code is not thread-safe. If you require thread safety, you will need to add whatever your mechanism of choice is to the code. If your game code is quite traditional and keeps all the KVD logic on one thread, you can ignore this issue.
Performance. Although the KVD is reasonably fast, you may have specific performance requirements affecting the details of which containers and mechanisms are used. For general use cases though, the code should perform adequately as is.
Memory allocation. The internal storage for the KVD is handled via the STL default allocator. This could cause fragmentation issues in your code. If you need to keep a firm handle on memory usage, you will need to either write your own container code or override the default allocator for the STL containers that are used.
References
[1] "Key-Value Observing Programming Guide". http://developer.apple.com/mac/library/DOCUMENTATION/Cocoa/Conceptual/KeyValueObserving/Concepts/Overview.html
Chapter 25 - A Basic Scheduler Game Engine Gems, Volume One by Eric Lengyel (ed) Jones and Bartlett Publishers © 2011
Chapter 25: A Basic Scheduler
John Bolton, Netflix
Highlights
Many games have a need to execute tasks at regular intervals. Threads are a possible solution, but they can be a poor choice due to their nondeterministic nature, the complexities of their interactions, and the high overhead of context switching. On the other hand, a scheduler can be implemented to run in a single thread and execute tasks at a specific point in the frame. Its tasks will execute in a single context under full control of the application.
This gem presents a basic lightweight object-oriented scheduler that implements limited cooperative multitasking between tasks in a single thread. Possible applications of this scheduler include AI, audio, and environmental effects. The complete code for the scheduler described here is available on the accompanying CD.
25.1 Overview
The scheduler implemented here is designed to execute a list of tasks one at a time at a particular point in a frame or time step. Each task has a timer maintained by the scheduler, and all tasks whose timers have expired are queued to be executed in that frame. Tasks are executed serially by the scheduler in the scheduler's thread, and each task runs to completion. Thus, tasks are never interrupted by the scheduler or by other tasks, and there is no need for synchronization among tasks.
Order of execution within a frame is arbitrary in this implementation, but the order can be controlled by implementing a priority system. Other additional features, such as load balancing, are not implemented here, but are described later.
25.2 Task Functionality
The task as seen by the scheduler is very simple. There is an initialization function, a cleanup function, and a function to execute the task. To provide the interface for this functionality, tasks are derived from the base class Task, which has the virtual functions Start(), Stop(), and Execute().
The Start() function is called immediately after the task is added to the scheduler's task list. Its purpose is to allow the task to initialize itself before it begins any execution. It is safe (as far as the scheduler is concerned) to remove the task from the scheduler's task list in the Start() function.
The Stop() function is called immediately after the task is removed from the task list. Its purpose is to allow the task to clean itself up. Once the Stop() function is called, the scheduler no longer references the task, so it is safe (as far as the scheduler is concerned) to destroy the task any time during or after the call to the Stop() function. It is also safe to re-add the task to the scheduler's task list from inside the Stop() function.
The Execute() function is called periodically by the scheduler according to the task's period. The timer is maintained by the scheduler. When a task is finished executing, it returns one of the following values that the scheduler uses to manage the task:
ACTIVE. If this result is reported, the task is requeued for execution again according to its period. This is the normal result.
AGAIN. If this result is reported, the task is requeued to run again in the next frame, regardless of its period. This result is intended to indicate that the task could not finish successfully and should be executed again as soon as possible. It could also be used to force the task to be executed every frame, but it is better to set the period appropriately instead.
INACTIVE. If this result is reported, the task is removed from the scheduler's task list. The scheduler removes all references to the task and then calls the task's Stop() function.
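A task interface along the lines described above might look like the following sketch. The exact declarations on the CD are not reproduced here, so the layout of `Task`, the placement of the `Result` enumeration inside it, and the example `BlinkTask` class are all assumptions for illustration.

```cpp
namespace task_sketch {

// Hypothetical base class with the three virtual functions named in the text.
class Task
{
public:
    enum Result { ACTIVE, AGAIN, INACTIVE };

    virtual ~Task() {}
    virtual void Start() {}            // called right after the task is added
    virtual void Stop() {}             // called right after the task is removed
    virtual Result Execute() = 0;      // called whenever the task's timer expires
};

// Example task: toggles a light a fixed number of times, then retires itself
// by returning INACTIVE.
class BlinkTask : public Task
{
public:
    explicit BlinkTask(int toggles) : m_remaining(toggles), m_lit(false) {}

    void Start() override { m_lit = false; }
    void Stop() override { m_lit = false; }

    Result Execute() override
    {
        m_lit = !m_lit;
        return (--m_remaining > 0) ? ACTIVE : INACTIVE;
    }

    bool IsLit() const { return m_lit; }

private:
    int m_remaining;
    bool m_lit;
};

} // namespace task_sketch
```

A task like this never needs to know its own period; it only reports whether it wants to keep running, and the scheduler handles the timing.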
25.3 Scheduler Functionality
The scheduler class maintains a list of tasks and schedules them for execution according to their periods. There can be more than one instance of the scheduler class, though for the sake of simplicity, instances cannot be copied or assigned in this implementation, and tasks do not keep track of which scheduler is executing them. In order to determine when tasks are to be executed, the scheduler maintains a timer for each task. Each frame, the timers are updated, and all tasks whose timers have expired are executed in an arbitrary order. The scheduler also provides the ability to suspend and resume execution of tasks and to change the period of a task.
Tasks are added to a scheduler's execution list by the Add() function. The Add() function also specifies how often the task is executed. If the task being added is already in the scheduler's task list, an error is returned. Tasks may be added at any time, including while tasks are being executed. The task's timer is initialized when the task is added. As described above, a task's Start() function is called after it is added.
Tasks are removed from the scheduler's execution list by the Remove() function. Tasks can be removed at any time. If the task being removed is not in the scheduler's task list, an error is returned. Tasks are not executed once they are removed, even if they are removed during a frame in which they are scheduled to be executed. As described above, a task's Stop() function is called after it is removed.
When the ExecuteTasks() function is called, all tasks that are ready to run are executed serially in an arbitrary order. The deltaTime parameter to the ExecuteTasks() function indicates the amount of time that has passed since the last time the function was called. This value is used to update the tasks' timers in order to determine when tasks become ready to be executed. The ExecuteTasks() function is intended to be called once per frame, but it is possible to call it as often as desired. As long as the deltaTime value is accurate or at least reasonable, the tasks will execute with the intended period. The exceptions are tasks with a period of PERIOD_EVERY_FRAME and tasks that return the value AGAIN. These tasks are executed whenever ExecuteTasks() is called.
Tasks are suspended and resumed using the Suspend() and Resume() functions. Suspended tasks are not executed and their timers are not updated. If the task being suspended or resumed is not in the scheduler's task list, an error is returned. This differs from Add() and Remove() in that the task remains in the scheduler's task list, the task's Start() and Stop() functions are not called, and the task's timer is frozen until Resume() is called.
The period of a task can be changed by the SetPeriod() function. If the task is not in the scheduler's task list, PERIOD_INVALID is returned. When a task's period is changed, its timer is reset. A period of PERIOD_EVERY_FRAME causes the task to execute every frame. Calling SetPeriod() with an invalid period (less than 0) does not change the period and returns the value PERIOD_INVALID.
25.4 Implementation
The data structure used by the scheduler is very straightforward. A pointer to each task along with a timer and some state information is stored in a vector. An STL container is chosen in order to simplify the implementation. A fixed-size array would alleviate memory allocation issues, but then additional logic would be necessary in order to prevent overflow.
Other container types might be considered depending on how the scheduler is used. The tasks are not sorted because it is assumed that sorting the tasks would be more time-consuming than simply scanning all entries to find tasks to execute. If there are a very large number of tasks and only a few are executed each frame, then it might pay to sort the tasks.
Task execution is performed in three phases. First, the timer for each task is decremented according to the amount of time that has passed. Then, for each task, if the timer is less than or equal to zero, the task is executed (unless it is suspended or marked for removal), and the timer is reset to its period. Finally, all tasks marked for removal are removed. The purpose of the three phases is to avoid problems that might arise when one task modifies another task, or adds or removes tasks from the scheduler.
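The three phases can be sketched as follows. This is a simplified illustration under stated assumptions rather than the implementation on the CD: the namespace, the `Entry` layout, the `bool` return values, and the use of a null task pointer as the "marked for removal" flag are all choices made here, Suspend(), Resume(), and SetPeriod() are left out, and a compact `Task` base class is declared so that the block compiles on its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

namespace scheduler_sketch {

class Task
{
public:
    enum Result { ACTIVE, AGAIN, INACTIVE };
    virtual ~Task() {}
    virtual void Start() {}
    virtual void Stop() {}
    virtual Result Execute() = 0;
};

class Scheduler
{
public:
    // Returns false if the task is already in the list.
    bool Add(Task* task, float period)
    {
        for (std::size_t i = 0; i < m_entries.size(); ++i)
            if (m_entries[i].task == task) return false;
        Entry entry = { task, period, period };
        m_entries.push_back(entry);
        task->Start();
        return true;
    }

    // Drops the scheduler's reference first, then calls Stop(), so the task
    // may destroy or re-add itself from inside Stop().
    bool Remove(Task* task)
    {
        for (std::size_t i = 0; i < m_entries.size(); ++i)
        {
            if (m_entries[i].task == task)
            {
                m_entries[i].task = nullptr;   // marked for removal
                task->Stop();
                return true;
            }
        }
        return false;
    }

    void ExecuteTasks(float deltaTime)
    {
        // Phase 1: age the timer of every live task.
        for (std::size_t i = 0; i < m_entries.size(); ++i)
            if (m_entries[i].task) m_entries[i].timer -= deltaTime;

        // Phase 2: run expired tasks. Indices are used instead of iterators
        // because a task may call Add() and make the vector reallocate.
        const std::size_t count = m_entries.size();
        for (std::size_t i = 0; i < count; ++i)
        {
            Task* task = m_entries[i].task;
            if (!task || m_entries[i].timer > 0.0f) continue;

            const Task::Result result = task->Execute();
            if (m_entries[i].task != task) continue;   // removed while running

            if (result == Task::INACTIVE)
            {
                m_entries[i].task = nullptr;
                task->Stop();
            }
            else
            {
                // AGAIN runs next frame; ACTIVE waits a full period.
                m_entries[i].timer =
                    (result == Task::AGAIN) ? 0.0f : m_entries[i].period;
            }
        }

        // Phase 3: erase every entry that was marked for removal.
        m_entries.erase(
            std::remove_if(m_entries.begin(), m_entries.end(), IsRemoved),
            m_entries.end());
    }

private:
    struct Entry { Task* task; float period; float timer; };
    static bool IsRemoved(const Entry& e) { return e.task == nullptr; }
    std::vector<Entry> m_entries;
};

// Small example task used to exercise the scheduler.
class CountingTask : public Task
{
public:
    explicit CountingTask(int limit)
        : executions(0), started(false), stopped(false), m_limit(limit) {}
    void Start() override { started = true; }
    void Stop() override { stopped = true; }
    Result Execute() override
    {
        return (++executions < m_limit) ? ACTIVE : INACTIVE;
    }
    int executions;
    bool started;
    bool stopped;
private:
    int m_limit;
};

} // namespace scheduler_sketch
```

Deferring the erase to the third phase means nothing is physically removed from the vector while the second phase is walking it, which is the kind of problem the three-phase split is meant to avoid.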
25.5 Additional Functionality
Some additional features are not implemented here for the sake of simplicity. In this implementation, tasks are run during each frame in an arbitrary order. It might be advantageous to give tasks a priority so that they can be executed in a certain order with respect to the other tasks that are executed in the same frame. In order to accomplish this, the priorities of the tasks are stored in the task list and the tasks can be sorted by priority or stored in a data structure that supports a priority scheme.
In some situations, the amount of time available for executing tasks might be limited or budgeted. In this case, the ExecuteTasks() function could have an additional parameter specifying the budgeted time, and the scheduler would execute tasks until the budget is used up. The task's Execute() function could have an additional parameter specifying the time remaining, allowing the task itself to limit the amount of time it uses. Budgeting time could also work in conjunction with task priorities, reducing the latency of higher priority tasks. However, it must be noted that this simple scheduling algorithm can become inadequate in some situations.
Chapter 26 - The Game State Observer Pattern Game Engine Gems, Volume One by Eric Lengyel (ed) Jones and Bartlett Publishers © 2011
Chapter 26: The Game State Observer Pattern
Ron Barbosa, Revelex Corporation
Overview
With today's high-powered graphics and audio hardware, developing a game is becoming more akin to producing a blockbuster movie. Real-time ragdoll physics are the new stuntmen, and particle systems are the new pyrotechnics. With so many sexy components required to bring a game engine together, it's not surprising that little treatment is given to the management of game state.
When the user clicks the mouse button or presses the left analog stick of a gamepad, the state of the game is what determines whether the user intended to fire his avatar's weapon or select the "Quit" option from the menu. The game state determines whether the graphics hardware should render the game screen or the inventory menu. Whether NPCs should execute their next animation frame or just wait around for the next game loop iteration is due in large part to the game's state.
Game state management can be made streamlined and elegant using an implementation of the observer design pattern.[1] The observer pattern provides a way for instances of classes (subjects) to be "observed" by other objects (observers) in the application. Each observer subscribes to the subject, and when the subject's data is changed, it sends notification to all registered observers, providing each subscriber a reference to the subject. The observers can then query any public data or call any public methods of the subject to determine what has changed and how it affects the observer.
Combining the observer pattern with a game state manager would allow the game state manager to notify all interested software components of what the game state is in real time. So if the user hits the "Pause" option, the game state can be set to paused. The state manager would then notify the avatar manager that the game is paused, and it can stop processing controller input until further notice. The state manager would simultaneously notify the menu manager that the game is paused, and the menu manager could begin rendering the pause menu and responding to controller input to select menu options.
The real-time notification mechanism simplifies the code by encapsulating the effect of state change on an individual module in the software. Without a proper state management mechanism, developers will often use arbitrary or artificial conditions to determine what should be done during the game loop. Consider the following pseudocode for an avatar's game loop processor.
    public void Update([arguments])
    {
        // Determine if the input should be processed
        if (GamePad[0].active)
        {
            // Process avatar movement if the first gamepad is active
        }
        else if (GameObject.gamePaused)
        {
            // Check if the player is trying to unpause the game
        }
        else if (GameObject.menuActive)
        {
            // Process input for menus
        }
        else if (GameObject.numberOfActivePlayers == 0)
        {
            // Game Over. Perform cleanup.
        }
    }

    public void Draw([arguments])
    {
        // Determine if the avatar should be drawn
        if (GameObject.numberOfActivePlayers > 0 &&
            !GameObject.gamePaused &&
            !GameObject.gameOver)
        {
            // Draw the avatar
        }
    }
This doesn't seem so bad for now, but what happens when the player tries to start the game from the second controller? The developer then has to go back into the code and determine why the gamepad seems to have gone dead. When the game's producer decides that the "Game Over" screen should paint the game map and all visible characters, the developer will need to update the Draw() method and add consideration for the "Game Over" state.
The update and draw methods start to get ugly the more advanced the application gets, because typically once the game elements are created and the game enters its processing loop, the frame-to-frame calls to methods like Update() and Draw() are usually what triggers the game elements to perform whatever processing and rendering needs to be done. As features evolve and the code becomes more complex, defensive coding techniques start to creep into play, and large sections of functionality become wrapped in conditional blocks so that they only take effect in certain state conditions.
Using the game state observer pattern, the individual elements of the game can be immediately notified of state changes and respond to them at the time of state change as opposed to waiting for the next call to Update(). Game elements can also exclude themselves from rendering when the game state no longer requires or allows them to be drawn, saving valuable processing power.
The benefits of proper state management and a robust state change notification mechanism become immediately evident once you begin to use them in your own application. In the upcoming sections of this gem, we take a more in-depth look at the various software components that are required to put the game state observer pattern to work for you.
[1] This gem also makes use of the singleton design pattern, but it is outside the scope of this gem. For more information regarding this and other design patterns, refer to reference [1].
26.1 Creating a Game State Manager
To begin using the game state observer pattern, we first need a class to represent the game state, such as the GameState class shown in Listing 26.1. Since a game should only ever be in one state, we'll employ another design pattern for the GameState implementation: the singleton design pattern.
Listing 26.1: GameState class implementation in C#.
    class GameState
    {
        // This enumerated type is where all valid game states are defined
        public enum State
        {
            Initializing,
            StartMenu,
            Tutorial,
            InPlay,
            GameOver,
            Paused,
            BetweenLevels,
            GameEnded,
            ConfirmExit,
            GameOptionsMenu,
            DemoMode
        };

        // Define the one and only instance of the GameState class.
        // This is the Singleton
        private static GameState _instance;

        // This data member will store the current state.
        private State _currentState;

        // This private constructor can only be called from within this
        // class. This is how the Singleton pattern ensures that only
        // one instance of this class will ever exist.
        private GameState() {}

        // This public accessor gives the outside world access to the
        // Singleton instance of GameState.
        public static GameState instance
        {
            get
            {
                // If the instance has not been defined, create a new
                // instance.
                if (GameState._instance == null)
                {
                    GameState._instance = new GameState();
                }

                // Return the instance
                return GameState._instance;
            }
        }

        // These accessors allow the current state to be queried and
        // set by the outside world.
        public State currentState
        {
            get { return this._currentState; }
            set { this._currentState = value; }
        }
    }
The inline comments in Listing 26.1 tell most of the story, but let's examine the moving parts.
The enumerated State subtype (GameState.State) contains the "master list" of all valid game states.
The _instance data member is marked as private, keeping accessibility to the data under the control of the class itself. This data member is also marked static, meaning it belongs to the class and not to the instance.
The _currentState data member is also private. It stores the value of the current game state, as allowed by the GameState.State enumerated type.
The only constructor provided for the GameState class is also marked private, meaning the class can only be instantiated from within itself.
The instance accessor allows the outside world to get a reference to the singleton instance of GameState. The retrieval mechanism first checks to see if the class has been instantiated. If not, it creates a new instance of GameState and stores a reference to it in the static _instance member. Once a valid instance has been created and stored, a reference to it is returned.
The currentState accessors are used to set and retrieve the current game state. While the implementations shown here are quite simple, more complex logic can be employed to ensure that all state transitions are legal.
At this point, we have a fairly simple, but functional game state manager. It has everything it needs to be a useful addition to a game or game engine project. In its current state, it can be used to establish and update the current game state, and it can be queried by other software modules that need to know the state of the game in order to function properly.
By providing other software modules with a single access point to set and retrieve the game's state, managing input processing and rendering becomes a function of the game's current state, rather than artificial conditions such as whether or not a given controller is active or has a pressed button.
The avatar Update() method being called in the game loop can be simplified such that the avatar only processes updates when the game is in a state that is meaningful to the avatar:
    public void Update([arguments])
    {
        if (GameState.instance.currentState != GameState.State.InPlay)
        {
            // Do nothing if the game is not in play
            return;
        }

        // Process this update
    }
Making the operations performed by each software module functions of the current game state reduces processing overhead by allowing the modules to determine if there is any need for them to perform given the current game state.
This formalized state management is helpful, but there is a great deal of room for improvement. Each game module still needs to query the game state in order to know what the current state is. Another thing to consider is that the software only has access to the current state. There is, as yet, no way for a module to be notified that the state has changed.
Some software modules may need to perform a set of operations when the game transitions from one state to another. For example, an automatic game save system might need to know that the player has just completed some stage of the game and has transitioned to the "stats" screen. This could be done by changing the game state from InPlay to BetweenLevels and using the game state observer pattern to notify the automatic game save mechanism to update the player's stats and inventory.
26.2 The Interfaces of the Game State Observer Pattern
In this section, we begin the process of turning the GameState into an "observable" object and create the foundation that will provide the communication path between the GameState and the other modules of your game.
In order for the software in your game engine to treat the GameState as the subject of observation, we need to create an interface that tells the rest of the game's software that GameState can be observed. The IObservable interface shown in Listing 26.2 provides a way to do this. As you can see, there's very little to this interface, as it only defines the two methods Subscribe() and Unsubscribe(). Some implementations of the observer pattern also define a NotifySubscribers() method in the IObservable interface, but since all interface methods must be defined as public, that would allow other software modules to force observation subjects to notify subscribers even if no change has been made. The game state observer pattern allows the GameState class to decide when to notify its subscribers of a change.
Listing 26.2: The IObservable and INotifiable interfaces.
    interface IObservable
    {
        void Subscribe(INotifiable observer);
        void Unsubscribe(INotifiable observer);
    }

    interface INotifiable
    {
        void ProcessNotification(IObservable subject);
    }
Both the Subscribe() and Unsubscribe() methods take exactly one parameter. This parameter has the type INotifiable, meaning it is an instance of a class that implements the INotifiable interface shown in Listing 26.2. The INotifiable interface defines only one method, ProcessNotification(). This method accepts one argument of type IObservable, meaning it is any instance of a class that implements the IObservable interface.
Since any class can implement an interface, you as the developer can decide which modules in your software can observe and/or be observed. To implement the interface, the class definition must first be modified to indicate that it implements a given interface. Then it must provide an implementation for every method that the interface defines.
26.3 Making GameState Observable
To make our GameState class observable, we change its class declaration to read as follows: [2]
    class GameState : IObservable
Now that we have tagged the GameState class as observable, we must provide the public methods necessary to satisfy the interface's requirements. But before we move on to the method implementation, let's take a brief look at how observation works in the game state observer pattern:
The subject of observation (the IObservable) contains a list of subscribers.
Any object capable of processing notifications (the INotifiable interface) can call the subject's Subscribe() method and be added to the list of subscribers.
When the subject is modified in a way that requires notification to be sent out, the subject iterates over the list of observers and calls each observer's ProcessNotification() method.
If an observer wants to stop receiving updates from the subject, the observer can call the subject's Unsubscribe() method to be removed from the list of subscribers.
The GameState class needs a data member in which to store its list of observers, so we add the following line of code to the GameState class definition under the definition of the currentState member:
    private List<INotifiable> _observers = new List<INotifiable>();
This creates a List object called _observers that will be used to store references to INotifiable objects.
Now, the subscription mechanism must be created. The Subscribe() method is fairly simple, as shown in Listing 26.3, and can be added at the end of the GameState class implementation. What this method effectively does is ensure that the observer requesting notifications is not already in the observer list, and if not, then it is added to the list.
Listing 26.3: The Subscribe() and Unsubscribe() methods of the GameState class.
    public void Subscribe(INotifiable observer)
    {
        if (!this._observers.Contains(observer))
        {
            this._observers.Add(observer);
        }
    }

    public void Unsubscribe(INotifiable observer)
    {
        if (this._observers.Contains(observer))
        {
            this._observers.Remove(observer);
        }
    }
With the subscription method in place, we need a method to unsubscribe as well. The Unsubscribe() method can be added beneath the Subscribe() method in the GameState class with the implementation shown in Listing 26.3. In the same fashion as the Subscribe() method, the Unsubscribe() method checks to see if the observer requesting removal is in the list, and the observer is only removed from the list if it is found.
At this point, we have an observable GameState class. Subscribers throughout the game application can register with the GameState instance to be informed of modifications to the state, but there's still some work left to do.
The GameState class we have so far does not yet notify its subscribers. It simply manages a list of interested software components. We still need to provide the GameState class with a method that notifies its observers of state changes. The _NotifySubscribers() method shown in Listing 26.4 can be added to the bottom of the GameState class implementation to take care of this. The _NotifySubscribers() method iterates over the list of observers, and for each one calls its ProcessNotification() method with a reference to the singleton GameState instance (this).
Listing 26.4: The _NotifySubscribers() method of the GameState class.
    private void _NotifySubscribers()
    {
        foreach (INotifiable observer in this._observers)
        {
            observer.ProcessNotification(this);
        }
    }
The GameState class now has a method for notifying its observers, but the method isn't being called anywhere. The next step is to update the state modification accessor method to call _NotifySubscribers() when the game state is changed, as shown in Listing 26.5. The new accessor to set the current state only does something when the state is actually changing, and it calls _NotifySubscribers() to send all the observers the update.
Listing 26.5: The set implementation for the currentState member of the GameState class.
    public State currentState
    {
        get { return this._currentState; }

        set
        {
            if (this._currentState != value)
            {
                this._currentState = value;
                this._NotifySubscribers();
            }
        }
    }
That's about all there is for the game state manager's role in its own observation. The ball is now in the observer's court. Having received an update, it must be able to take action based on the data it has received.
[2] This is a C# implementation, and a C++ implementation is similar. Other languages provide keywords such as implements or extends to indicate relationships among various structures. Be sure to use the appropriate syntax for the programming language your game uses.
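For readers working in C++, the observable GameState built up in this section might be sketched as follows. This is an illustrative translation rather than code from the gem: the abstract classes stand in for the C# interfaces, the accessor names `Instance()`, `CurrentState()`, and `SetCurrentState()` are assumptions, the state list is abbreviated, and the example observer `PauseWatcher` is hypothetical.

```cpp
#include <algorithm>
#include <vector>

class IObservable;

// Abstract class standing in for the C# INotifiable interface.
class INotifiable
{
public:
    virtual ~INotifiable() {}
    virtual void ProcessNotification(IObservable& subject) = 0;
};

// Abstract class standing in for the C# IObservable interface.
class IObservable
{
public:
    virtual ~IObservable() {}
    virtual void Subscribe(INotifiable* observer) = 0;
    virtual void Unsubscribe(INotifiable* observer) = 0;
};

class GameState : public IObservable
{
public:
    enum class State { Initializing, StartMenu, InPlay, Paused, GameOver };

    // Singleton access; the instance is created on first use.
    static GameState& Instance()
    {
        static GameState instance;
        return instance;
    }

    State CurrentState() const { return m_currentState; }

    // Observers are notified only when the state actually changes.
    void SetCurrentState(State value)
    {
        if (m_currentState != value)
        {
            m_currentState = value;
            NotifySubscribers();
        }
    }

    void Subscribe(INotifiable* observer) override
    {
        if (std::find(m_observers.begin(), m_observers.end(), observer) ==
            m_observers.end())
        {
            m_observers.push_back(observer);
        }
    }

    void Unsubscribe(INotifiable* observer) override
    {
        m_observers.erase(
            std::remove(m_observers.begin(), m_observers.end(), observer),
            m_observers.end());
    }

private:
    GameState() : m_currentState(State::Initializing) {}

    // Private, so only GameState decides when notifications go out.
    void NotifySubscribers()
    {
        // Iterate over a copy so an observer may unsubscribe while notified.
        const std::vector<INotifiable*> snapshot = m_observers;
        for (INotifiable* observer : snapshot)
            observer->ProcessNotification(*this);
    }

    State m_currentState;
    std::vector<INotifiable*> m_observers;
};

// Hypothetical observer that records the last state it was told about.
class PauseWatcher : public INotifiable
{
public:
    void ProcessNotification(IObservable& subject) override
    {
        GameState& gameState = static_cast<GameState&>(subject);
        last = gameState.CurrentState();
        ++notifications;
    }

    GameState::State last = GameState::State::Initializing;
    int notifications = 0;
};
```

The observer list holds raw pointers in this sketch, so each observer is expected to unsubscribe before it is destroyed.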
26.4 Creating Observers
Any class in your game engine library can be made into an observer, and thus can be made to observe game state. To turn an existing class into an observer, you must first declare that it implements the INotifiable interface, and then you must provide the implementation for the ProcessNotification() method defined in the INotifiable interface, as exemplified by Listing 26.6. In its minimalist form, SomeGameComponent is an observer. It defines no useful functionality, stores no data, and does nothing with any notifications it receives, but it has all the moving parts needed to be classified as an observer.
Listing 26.6: A sample observer implementation.
    class SomeGameComponent : INotifiable
    {
        // Define class data members here

        public void ProcessNotification(IObservable subject)
        {
            // Query the subject for any meaningful changes
        }
    }
In Listing 26.6, the argument received by the ProcessNotification() method is of type IObservable. This means that any instance of any class that implements the IObservable interface can be passed into this method. However, it also means that only the methods defined in the IObservable interface can legally be called without explicitly casting the argument to a known type. In other words, we cannot query GameState properties or call methods of the GameState class without casting subject to be of type GameState, as shown in Listing 26.7.
Listing 26.7: Accessing an observable subject by its native type. [3]
    public void ProcessNotification(IObservable subject)
    {
        // Cast the instance to its native type
        GameState gsSubject = (GameState) subject;

        // Use the qualified reference to access its data and methods
        GameState.State currentGameState = gsSubject.currentState;
    }
The implementation for the ProcessNotification() method is functional, but somewhat limiting. Suppose you have a game component in your library that needs to observe multiple subjects of various types. The above implementation would fail if subject is not of type GameState. To manage this situation, we can provide a switchboard mechanism within ProcessNotification() that doesn't handle the heavy lifting, but simply identifies the best method for the job.
Listing 26.8 shows how a switchboard mechanism could be used to funnel notification processing through type-specific methods so that any class that implements INotifiable can observe multiple subjects, regardless of the subject's native type. When the type is identified, processing is handed off to a purpose-built method capable of handling update notifications for that type of class. SomeObservable, in Listing 26.8, can be thought of as some other class that implements the IObservable interface. The ProcessNotification() method could get a bit unwieldy, but in practical cases, it's uncommon to observe subjects of more than a handful of types.
Listing 26.8: Supporting multiple subject types.
    public void ProcessNotification(IObservable subject)
    {
        // Determine the best method to handle this update
        if (subject.GetType() == typeof(GameState))
        {
            this._ProcessNotification((GameState) subject);
        }
        else if (subject.GetType() == typeof(SomeObservable))
        {
            this._ProcessNotification((SomeObservable) subject);
        }
    }

    protected void _ProcessNotification(GameState subject)
    {
        // Process notifications for instances of GameState
    }

    protected void _ProcessNotification(SomeObservable subject)
    {
        // Process notifications for instances of SomeObservable
    }
[3] Listing 26.7 shows how to cast an observable subject to its native type so that its data members and methods can be accessed. Bear in mind that the syntax for reference casting may be different in the programming language you are using. In C++, you would normally use static_cast to perform the cast to the derived class type.
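In C++, one possible counterpart to the switchboard is a chain of `dynamic_cast` tests, sketched below. The class bodies are placeholders invented for this example, and the handler names are assumptions. Note one difference from the C# version: `dynamic_cast` also matches classes derived from the tested type, whereas comparing `GetType()` with `typeof()` matches the exact type only (comparing `typeid` results would be the closer equivalent).

```cpp
#include <string>

namespace switchboard_sketch {

// Minimal polymorphic base; dynamic_cast needs at least one virtual function.
class IObservable
{
public:
    virtual ~IObservable() {}
};

// Placeholder subjects standing in for real observable classes.
class GameState : public IObservable {};
class SomeObservable : public IObservable {};

class SomeGameComponent
{
public:
    // Switchboard: identify the subject's type and hand off to the
    // purpose-built handler. Unknown subject types are ignored.
    void ProcessNotification(IObservable& subject)
    {
        if (GameState* gameState = dynamic_cast<GameState*>(&subject))
        {
            ProcessGameState(*gameState);
        }
        else if (SomeObservable* other = dynamic_cast<SomeObservable*>(&subject))
        {
            ProcessSomeObservable(*other);
        }
    }

    // Records which handler ran last, so the routing can be inspected.
    std::string lastHandled;

private:
    void ProcessGameState(GameState&) { lastHandled = "GameState"; }
    void ProcessSomeObservable(SomeObservable&) { lastHandled = "SomeObservable"; }
};

} // namespace switchboard_sketch
```

When a component observes only one kind of subject, the `static_cast` mentioned in the footnote avoids the run-time check entirely.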
26.5 Managing Functionality by Game State
In a real game application, we'd want our objects and game entities to react in meaningful ways when a notification of change has been received. When working with game state this typically means the game needs to react differently to controller input or render a different scene or menu. Imagine a game engine with a series of queues that provide different functionality. For example, modules that want to process controller input could be placed in the "input queue", and models or sprites that need to be drawn could be placed in the "render queue". These queues can provide methods to allow objects to be added and removed from them.
Upon receipt of notification of a state change, a game module can add or remove itself from the above queues proactively. The avatar management code can stop processing input when the game is in a paused state, and the menu management software can stop rendering menus when the game is in play.
A full example of such a mechanism is well beyond the scope of this gem, but a simple example can be found on the accompanying CD. The example, called observerSample, is a small XNA project that shows how the modification of game state can be communicated to all the interested modules of a game application to manage which components process input and render to the screen.
References
[1] Erich Gamma, Richard Helm, Ralph Johnson, and John M. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional, 1994.
Chapter 27 - Fast Trigonometric Operations Using Cordic Methods Game Engine Gems, Volume One by Eric Lengyel (ed) Jones and Bartlett Publishers © 2011
Chapter 27: Fast Trigonometric Operations Using Cordic Methods
John Bolton, Netflix
Overview
Trigonometric functions are required to display 3D and rotating 2D graphics, but these functions are generally not well-supported on some game platforms such as handheld devices and cellular phones. On platforms without a floating-point processor, trigonometric functions may instead be implemented by emulating floating-point number representations and operations, and performance can be very poor as a result. In contrast, CORDIC methods implement standard trigonometric functions using simple integer math and bit shifting, and this can be extremely fast.
CORDIC methods were invented by Jack Volder [2] in the late 1950s as a way to compute trigonometric functions in hardware for use in avionics. CORDIC stands for COordinate Rotation DIgital Computer. Later, the methods were extended by John Walther [3] and others to related functions (hyperbolic and exponential functions, for example).
27.1 Rotation Mode Algorithm
The CORDIC methods are based on iteratively rotating a point by fixed angles until a desired rotation is achieved. The equations for rotating a point (x, y) about the origin in two dimensions are
$$x' = x\cos\theta - y\sin\theta$$
$$y' = x\sin\theta + y\cos\theta,$$
and these are equivalent to
$$x' = \cos\theta\,(x - y\tan\theta)$$
$$y' = \cos\theta\,(y + x\tan\theta).$$
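The main concept behind the CORDIC methods is to iteratively rotate the point by angles whose tangent is a power of two until the desired angle is reached, using the formula
$$x_{i+1} = C_i\,(x_i - d_i\,y_i \cdot 2^{-i})$$
$$y_{i+1} = C_i\,(y_i + d_i\,x_i \cdot 2^{-i}),$$
where
$$C_i = \cos(\tan^{-1} 2^{-i}) = \frac{1}{\sqrt{1 + 2^{-2i}}}, \qquad d_i = \pm 1.$$
Choosing a power of two allows the multiplication by tan θ to be replaced with a shift operation.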
Because the values of $C_i$ are constant, the multiplication by $C_i$ in each iteration can be moved out of the iteration process, and the result can be adjusted by doing a single accumulated multiplication instead (when necessary). This optimization step reduces each iteration to a few arithmetic and shift operations plus an indexed look-up into an array of angle values.
The accumulated value C, the product of all the $C_i$, is approximately 0.60725 as the number of iterations approaches infinity, though the actual value is based on a finite number of iterations.
In each iteration, the point is rotated by successively smaller amounts. A third iterated value $\alpha_i$ holds the difference between the accumulated rotation angle and the desired rotation angle, and it is used to determine whether the next iteration should rotate the point clockwise or counterclockwise. $\alpha_0$ is initialized to the input angle, and $\alpha_i$ approaches zero. The direction of rotation is determined by the sign of the difference and is represented here by the value $d_i$:
$$\alpha_{i+1} = \alpha_i - d_i \tan^{-1} 2^{-i}, \qquad d_i = \begin{cases} +1 & \text{if } \alpha_i \ge 0 \\ -1 & \text{if } \alpha_i < 0. \end{cases}$$
As mentioned earlier, the point is rotated by successively smaller amounts. The iteration continues until the amount of rotation is too small to be represented by the chosen fixed-point format. Using fewer iterations is possible, but produces less accurate results. The number of iterations is 25 when using an 8.24-bit fixed-point format and measuring angles in radians.
This basic algorithm is called the "rotation" mode and can be used to rotate an arbitrary 2D vector. It can also compute the sine and cosine of an angle simply by rotating the point (1, 0) by that angle and returning the resulting values. It is important to note that this algorithm requires the input angle to be in the range [-π/2, π/2]. For angles outside of this range, the point is first rotated by multiples of 2π and by π as necessary until the angle is in the proper range.
In summary, the values of the last iteration of the rotation mode algorithm are given by
$$x_n = x_0\cos\theta - y_0\sin\theta$$
$$y_n = x_0\sin\theta + y_0\cos\theta$$
$$\alpha_n = 0.$$
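The rotation mode iteration can be tried out in floating point before committing to a fixed-point format. The sketch below is an illustration only: the function name CordicRotate and the constant kIterations are assumptions, std::ldexp stands in for the bit shift, std::atan stands in for the angle table, and the accumulated gain is applied at the end as a single division.

```cpp
#include <cmath>
#include <utility>

// Illustrative constant: 25 iterations, matching the 8.24 format discussed above.
const int kIterations = 25;

// Rotates (x, y) by angle (radians, within [-pi/2, pi/2]) using only
// add/subtract and scaling by 2^-i, then applies the accumulated gain.
std::pair<double, double> CordicRotate(double x, double y, double angle) {
    double gain = 1.0;
    for (int i = 0; i < kIterations; ++i) {
        const double t = std::ldexp(1.0, -i);          // 2^-i, a shift in fixed point
        const double d = (angle >= 0.0) ? 1.0 : -1.0;  // direction d_i
        const double xi = x, yi = y;
        x = xi - d * yi * t;
        y = yi + d * xi * t;
        angle -= d * std::atan(t);                     // a table look-up in fixed point
        gain *= std::sqrt(1.0 + t * t);                // accumulates 1 / C_i
    }
    return { x / gain, y / gain };                     // same as multiplying by C
}
```

Rotating the point (1, 0) by an angle yields that angle's cosine and sine in the first and second members of the returned pair.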
27.2 Vectoring Mode Algorithm
A second related algorithm iteratively rotates a given point towards the x-axis. This is called the "vectoring" mode. In this algorithm, the point is rotated by successively smaller amounts until the value of the y-coordinate is zero. As in the rotation mode, the direction of rotation is represented by the value $d_i$, but it is determined by the sign of the y-coordinate rather than the angle:
$$d_i = \begin{cases} -1 & \text{if } y_i \ge 0 \\ +1 & \text{if } y_i < 0. \end{cases}$$
Again, it is important to note that this algorithm requires the angle of the input vector to be in the range [-π/2, π/2]. For angles outside of this range, the point is first rotated by 2π and π as necessary until it is in the proper range.
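After completion, the accumulated angle is the angle between the initial point and the x-axis, and the value of the x-coordinate is the distance to the point from the origin. In summary, the values of the last iteration of the vectoring mode algorithm are given by
$$x_n = \sqrt{x_0^2 + y_0^2}$$
$$y_n = 0$$
$$\alpha_n = \theta + \tan^{-1}(y_0 / x_0),$$
where θ is the initial value of the angle accumulator.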
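27.3 Applications
The following table summarizes the computations that can be done by these two algorithms, given an initial vector $(x_0, y_0)$ and an angle θ:

| Operation | Mode | Input | Results |
|---|---|---|---|
| Sine/cosine | rotation | $(1, 0)$, θ | $\cos\theta = x_n$, $\sin\theta = y_n$ |
| Arctangent | vectoring | $(x_0, y_0)$, θ | $\tan^{-1}(y_0/x_0) + \theta = \alpha_n$ |
| 2D vector rotation | rotation | $(x_0, y_0)$, θ | $(x', y') = (x_n, y_n)$ |
| Vector length | vectoring | $(x_0, y_0)$ | $\lVert (x_0, y_0) \rVert = x_n$ |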
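The two vectoring rows of this summary can be checked with a floating-point sketch of the vectoring mode. As with any illustration here, the names (PolarResult, CordicVector) and the iteration count are assumptions; std::ldexp and std::atan stand in for the shift and the angle table, and the input is assumed to have a non-negative x-coordinate.

```cpp
#include <cmath>

struct PolarResult {
    double length;  // distance of the input point from the origin
    double angle;   // initial angle plus the angle of the input point, in radians
};

// Rotates (x, y) onto the positive x-axis while accumulating the angle.
PolarResult CordicVector(double x, double y, double angle = 0.0) {
    const int iterations = 25;  // illustrative; matches the 8.24 format
    double gain = 1.0;
    for (int i = 0; i < iterations; ++i) {
        const double t = std::ldexp(1.0, -i);         // 2^-i
        const double d = (y >= 0.0) ? -1.0 : 1.0;     // rotate toward the x-axis
        const double xi = x, yi = y;
        x = xi - d * yi * t;
        y = yi + d * xi * t;
        angle -= d * std::atan(t);
        gain *= std::sqrt(1.0 + t * t);               // accumulates 1 / C_i
    }
    return { x / gain, angle };                       // length scaled by C
}
```

Calling it with (1, m) gives the arctangent of m in the angle member, and calling it with (|x|, y) gives the vector length in the length member.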
27.4 Implementation
The following code listings show implementations of all the algorithms and functions listed above. In these implementations, the fixed-point format is assumed to use an 8-bit whole part and 24-bit fraction part.
The code in Listing 27.1 implements the two algorithms. Note that the multiplication by C is not present here and must be handled elsewhere. In some cases, multiplication by C is not necessary because either the scale of the result is irrelevant or because there is a more efficient way to apply the value. Refer to each application in the listings that follow to see how the value of C is applied.
Listing 27.1: Rotation and vectoring mode implementations.

    // Returns 0 if n >= 0, and -1 if n < 0
    inline int32 S(int32 n)
    {
        return n >> (sizeof(int32) * 8 - 1);
    }

    // Returns n if d == 0, and -n if d == -1
    inline int32 CONDITIONAL_NEG(int32 n, int32 d)
    {
        return (n ^ d) - d;
    }

    void RotationMode(int32 x, int32 y, int32 a, int32 *rx, int32 *ry)
    {
        for (int i = 0; i < NUMBER_OF_ITERATIONS; ++i)
        {
            int32 d = S(a);    // (a >= 0) ? 0 : -1;
            int32 xi = x;
            int32 yi = y;
            x = x - CONDITIONAL_NEG(yi >> i, d);
            y = y + CONDITIONAL_NEG(xi >> i, d);
            a -= CONDITIONAL_NEG(angles[i], d);
        }

        *rx = x;
        *ry = y;
    }

    void VectoringMode(int32 x, int32 y, int32 a, int32 *rl, int32 *ra)
    {
        for (int i = 0; i < NUMBER_OF_ITERATIONS; ++i)
        {
            int32 d = S(y);    // (y >= 0) ? 0 : -1;
            int32 xi = x;
            int32 yi = y;
            x = x + CONDITIONAL_NEG(yi >> i, d);
            y = y - CONDITIONAL_NEG(xi >> i, d);
            a += CONDITIONAL_NEG(angles[i], d);
        }

        *rl = x;
        *ra = a;
    }

In rotation mode, the input angle must be in the range [-π/2, π/2]. The function shown in Listing 27.2 rotates the input vector by multiples of 2π and π, adjusting the input angle accordingly.
Listing 27.2: Normalizing the input range.

    void Normalize(int32& x, int32& y, int32& a)
    {
        while (a >= FIXED_TWO_PI) a -= FIXED_TWO_PI;
        while (a <= -FIXED_TWO_PI) a += FIXED_TWO_PI;

        while (a > FIXED_PI_OVER_2)
        {
            x = -x;
            y = -y;
            a -= FIXED_PI;
        }

        while (a < -FIXED_PI_OVER_2)
        {
            x = -x;
            y = -y;
            a += FIXED_PI;
        }
    }

The table of angles is built by computing the values of $\tan^{-1} 2^{-i}$ in the appropriate fixed-point format until the value is zero. The number of iterations in the rotation and vectoring mode algorithms is simply the number of entries in the table. This table can be precomputed as it is not likely to vary. The code in Listing 27.3 shows how this is done.
Listing 27.3: Angle table generation.

    vector<int32> angles;
    int i = 0;
    for (;;)
    {
        double a = atan(pow(2.0, -i));
        int32 fixed_a = int32(a * 0x01000000 + 0.5);
        if (fixed_a <= 0) break;

        angles.push_back(fixed_a);
        ++i;
    }

    int NUMBER_OF_ITERATIONS = angles.size();

The code in Listing 27.4 shows how the value of C is computed. This value can be precomputed, as it is not likely to vary.
Listing 27.4: Computation of C.

    double k = 1.0;
    for (int i = 0; i < NUMBER_OF_ITERATIONS; ++i)
    {
        k *= sqrt(1.0 + pow(4.0, -i));
    }

    int32 C = int32(1.0 / k * 0x01000000 + 0.5);

Sine and Cosine
The code in Listing 27.5 uses the rotation mode to compute the sine and cosine of an angle.
Listing 27.5: Sine and cosine implementation.

    void SineCosine(int32 a, int32& s, int32& c)
    {
        c = C;                  // Pre-multiply (1, 0) by C
        s = 0;
        Normalize(c, s, a);     // Adjust angle to the range [-pi/2, pi/2]
        RotationMode(c, s, a, &c, &s);
    }

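Arctangent
The code in Listing 27.6 uses the vectoring mode to compute the arctangent of a value.
Listing 27.6: Arctangent implementation.

    int32 ArcTangent(int32 m)
    {
        int32 angle, length;

        // Rotate the point (1, m) onto the x-axis; the accumulated angle is atan(m)
        VectoringMode(0x01000000, m, 0, &length, &angle);
        return angle;
    }
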
2D Vector Rotation
The code in Listing 27.7 uses the rotation mode to rotate a vector by a given angle.
Listing 27.7: Vector rotation implementation.

    void Rotate(int32& x, int32& y, int32 a)
    {
        Normalize(x, y, a);     // Adjust angle to the range [-pi/2, pi/2]
        RotationMode(x, y, a, &x, &y);

        // The vector must be scaled by C
        x = int32((int64(x) * int64(C)) >> 24);
        y = int32((int64(y) * int64(C)) >> 24);
    }

Vector Length
The code in Listing 27.8 uses the vectoring mode to compute the length of a vector.
Listing 27.8: Vector length implementation.

    int32 Length(int32 x, int32 y)
    {
        int32 angle, length;

        x = abs(x);     // Put the vector into the range [-pi/2, pi/2]

        VectoringMode(x, y, 0, &length, &angle);

        // The length must be scaled by C
        return int32((int64(length) * int64(C)) >> 24);
    }

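The listings take and return values in the 8.24 fixed-point format, so callers need a way to move between that format and ordinary numbers. The helpers below are a sketch of one way to do this; the names ToFixed, ToDouble, FixedMul, FRACTION_BITS, and FIXED_ONE are assumptions for illustration and do not come from the listings, and the int32 and int64 typedefs are assumed to map onto the fixed-width standard types.

```cpp
#include <cstdint>

typedef std::int32_t int32;
typedef std::int64_t int64;

const int FRACTION_BITS = 24;                         // 8.24 format
const int32 FIXED_ONE = int32(1) << FRACTION_BITS;    // 0x01000000

// Converts a double to 8.24 fixed point, rounding to nearest.
inline int32 ToFixed(double v) {
    return int32(v * FIXED_ONE + (v >= 0.0 ? 0.5 : -0.5));
}

// Converts an 8.24 fixed-point value back to a double.
inline double ToDouble(int32 f) {
    return double(f) / FIXED_ONE;
}

// Multiplies two 8.24 values through a 64-bit intermediate, in the same way
// the listings scale results by C. For negative products this relies on an
// arithmetic right shift, like the listings themselves.
inline int32 FixedMul(int32 a, int32 b) {
    return int32((int64(a) * int64(b)) >> FRACTION_BITS);
}
```

With helpers like these, an angle in radians would be passed to SineCosine as ToFixed(angle), and the results read back with ToDouble.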
27.5 Considerations
When using a fixed-point format, overflow and precision are a constant concern. The following considerations must be taken into account when using these functions:
1. The length of the vector $(x_i, y_i)$ will grow by a factor of approximately 1.65 (that is, 1/C) as it is rotated. You must constrain the input values to ensure that this will not cause an overflow.
2. Certain optimizations in the code presented here are done in order to eliminate branching and multiplication. Optimizations such as these can be tailored to the target platform.
3. The code implemented here assumes that the compiler implements the shift operator on signed types using an arithmetic shift. In C/C++, the precise behavior of the right-shift operator on negative signed integers is defined by the compiler implementation. The compiler may or may not use an arithmetic shift in this case. For example, the result of shifting the value -1 right by one bit may be 0x7FFFFFFF or 0xFFFFFFFF, depending on the compiler.
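If the target compiler cannot be relied on to shift signed values arithmetically, the helpers in Listing 27.1 can be rewritten so that no negative value is ever right-shifted. The following is a sketch of that idea; the function names are assumptions, and ShiftRightArithmetic reintroduces a branch, so it trades some of the speed discussed in the second point for portability.

```cpp
#include <cstdint>

// Returns 0 if n >= 0 and -1 if n < 0 without shifting a signed value.
inline std::int32_t SignMask(std::int32_t n) {
    // The unsigned shift is well defined and yields 0 or 1; negating gives 0 or -1.
    return -static_cast<std::int32_t>(static_cast<std::uint32_t>(n) >> 31);
}

// Returns n if d == 0 and -n if d == -1 (same contract as CONDITIONAL_NEG).
inline std::int32_t ConditionalNeg(std::int32_t n, std::int32_t d) {
    return (n ^ d) - d;
}

// Divides by 2^shift rounding toward negative infinity, which is what an
// arithmetic right shift produces, but only ever shifts non-negative values.
inline std::int32_t ShiftRightArithmetic(std::int32_t n, int shift) {
    if (n >= 0) return n >> shift;
    // -(n + 1) is non-negative and cannot overflow, even for the minimum value.
    return -((-(n + 1)) >> shift) - 1;
}
```

In the listings, S(a) would become SignMask(a), and yi >> i would become ShiftRightArithmetic(yi, i).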
27.6 Extensions
The hyperbolic equivalents, as well as the inverses of the functions presented above, can also be computed using similar methods. In addition, functions such as tangent, hyperbolic tangent, $e^x$, natural log, and square root can be derived from the basic functions. Andraka [1] and Walther [3] describe the implementation of these extensions.
References
[1] Ray Andraka. "A survey of CORDIC algorithms for FPGA based computers". Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays, 1998, pp. 191–200.
[2] Jack E. Volder. "The CORDIC Trigonometric Computing Technique". IRE Transactions on Electronic Computers, Volume EC-8 (September 1959), pp. 330–334.
[3] John S. Walther. "A Unified Algorithm for Elementary Functions". Spring Joint Computer Conference Proceedings, Volume 38 (1971), pp. 379–385.
Chapter 28 - Inter-Process Communication Based on Your Own RPC Subsystem
Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Chapter 28: Inter-Process Communication Based on Your Own RPC Subsystem
Kurt Pelzer, Piranha Bytes
Overview
The remote procedure call (RPC) technique is a powerful tool for constructing distributed applications. It implements a client/server based system without requiring that callers be aware of the underlying network. That is, the programmer would write essentially the same code whether the procedure is local to the executing program or remote. RPC isolates the application from the physical and logical elements of the data communications mechanism and allows the application to use a variety of transports (e.g., TCP/IP or UDP/IP).
When an application is combined with an RPC subsystem, it is able to interact with a second application (e.g., editor and game), and it transparently makes remote calls through a local procedure interface. The two processes may be on the same system as in Figure 28.1(a), or they may be on different systems with a network connecting them as in Figure 28.1(b).
Figure 28.1: Two applications connected via RPC. Client and server applications on (a) the same or (b) different machines.
Because of its transport independence, RPC makes the client/server model of computing more powerful and easier to program. It is based on extending the notion of conventional, or local, procedure calling so that the called procedure need not exist in the same address space as the calling procedure. Implementing your own RPC system is useful because it enables you to connect different systems like PCs and multimedia game consoles such as the PlayStation 3 or Xbox 360. It is easy to use RPC in your own application, so it makes sense to integrate it into your engine and development tools.
28.1 History of Remote Procedure Call
The idea of the remote procedure call goes back to 1976, when it was described in RFC 707 [1] as an inter-process communication (IPC) technology. IPC is a set of techniques for the exchange of data among multiple threads in one or more processes that may be running on one or more computers connected by a network. IPC techniques are divided into methods for message passing, synchronization, shared memory, and remote procedure calls. One of the first business uses of RPC was by Xerox under the name "Courier" in 1981 [2].
The first popular implementation of RPC was Sun's RPC, now called ONC RPC. It is still widely used today on several platforms [3, 4]. Another early implementation was Apollo Computer's NCS (Network Computing System). It was used as the foundation of DCE/RPC. A decade later (in the mid 1990s), Microsoft adopted DCE/RPC as the basis of Microsoft RPC (MSRPC), and implemented DCOM atop it.
28.2 How RPC Works: Internal Architecture of RPC
An RPC is initiated by the client sending a request message to a known remote server in order to execute a specified procedure using supplied parameters. A response is returned to the client, where the application continues along with its process. While the server is processing the call, the client is blocked: it waits until the server has finished processing before resuming execution.
Figure 28.2 shows the flow of activity that takes place during an RPC call between two networked systems. Like a function call, when an RPC is made, the calling arguments are passed to the remote procedure, and the caller waits for a response to be returned from the remote procedure. The steps are the following:
1. The client makes a procedure call that sends a request to the server and waits. The thread is blocked from processing until either a reply is received or the RPC times out.
2. When the request arrives, the server calls a dispatch routine that prepares the requested service.
3. The requested service is performed on the server.
4. The server sends the result to the client.
5. After the RPC is completed, the client program continues.
Figure 28.2: Steps during an RPC call, initiated by the client sending a request to the server.
An important difference between remote procedure calls and local calls is that remote calls can fail because of unpredictable network problems. Also, callers generally must deal with such failures without knowing whether the remote procedure was actually invoked.
Note that in this remote procedure call model, only one of the two processes is active at any given time. However, this scenario is given only as an example. The RPC protocol makes no restrictions on concurrency, and other scenarios are possible. For example, an implementation may choose to have asynchronous RPC calls so the client may do useful work while waiting for the reply from the server. Another possibility is to have the server create a separate task to process an incoming request so the server can be free to receive other requests.
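One way to picture the asynchronous variant is to run the blocking stub call on another thread and hand the caller a future. The sketch below uses only the C++ standard library; RemoteProcedureA is a hypothetical stand-in that sleeps to simulate the network round trip, and a real stub would send a request packet and wait for the reply instead.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Stand-in for a blocking client stub call; the name, the delay, and the
// computation are purely illustrative.
int RemoteProcedureA(int parameter) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));  // simulated round trip
    return parameter * 2;
}

// Starts the call on another thread and returns immediately with a future.
std::future<int> CallProcedureAAsync(int parameter) {
    return std::async(std::launch::async, RemoteProcedureA, parameter);
}
```

The client can keep working after calling CallProcedureAAsync and only blocks when it asks the future for the result with get(). A production version would also need the timeout and failure handling described above.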
Code that calls remotely makes use of a low-level subsystem. The encoding and decoding of procedure calls is handled in a special stub module (see Figure 28.3). That RPC stub module handles the procedure identification and the marshalling of the supplied procedure parameters inside a message that has to be sent or has been received. The RPC protocol is independent of transport protocols; that is, RPC does not care how a message is passed from one process to another. The protocol is concerned only with the specification and interpretation of messages. For example, RPC may be implemented on top of TCP/IP or UDP/IP. Also, the act of binding a client to a server is not part of the RPC subsystem. This function is left to some higher-level software module.
Figure 28.3: Encoding and decoding of procedure calls in special stub modules.
A remote procedure is uniquely identified by the following three pieces of information:
A. The program number.
B. The version number.
C. The procedure number.
The program number A identifies a group of related remote procedures, each of which has a unique procedure number. A program may consist of one or more versions, and each version consists of a collection of procedures that are available to be called remotely. The version number B enables multiple versions of an RPC protocol to be available simultaneously. Each version contains a number of procedures that can be called remotely, and each procedure has a procedure number C.
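The three numbers map naturally onto a small key type that a server can use to look up the procedure to run. The following sketch makes several assumptions for the sake of brevity: the names ProcedureId and DispatchTable are invented, the numbers are 32 bits wide, and every procedure takes and returns a single int.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>
#include <utility>

// The three numbers that identify a remote procedure.
struct ProcedureId {
    std::uint32_t program;    // A: group of related procedures
    std::uint32_t version;    // B: protocol version
    std::uint32_t procedure;  // C: procedure within that version
};

// Ordering so the identifier can be used as a map key.
inline bool operator<(const ProcedureId& a, const ProcedureId& b) {
    return std::tie(a.program, a.version, a.procedure) <
           std::tie(b.program, b.version, b.procedure);
}

// Hypothetical dispatch table keyed by the identifier.
class DispatchTable {
public:
    void Register(const ProcedureId& id, std::function<int(int)> fn) {
        table[id] = std::move(fn);
    }
    // Returns false if no procedure is registered under this identifier.
    bool Call(const ProcedureId& id, int argument, int& result) const {
        auto it = table.find(id);
        if (it == table.end()) return false;
        result = it->second(argument);
        return true;
    }
private:
    std::map<ProcedureId, std::function<int(int)>> table;
};
```

Because the version is part of the key, two versions of the same procedure can be registered side by side, which is what allows multiple protocol versions to be available simultaneously.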
28.3 How to Build Your Own RPC Subsystem
We have seen that the RPC technology must allow a computer program to cause a procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction. That means that the RPC stub modules have to handle a number of tasks. On the client side, the stub has to hide the fact that the called procedure is going to run in a different process (on the same machine or on a different machine). On the server side, the stub has to hide that the procedure call was initiated in a different (client) process and that the result is going to be sent back. These tasks the stubs have to handle lead to a list of points that should be kept in mind when you start to implement your own RPC subsystem.
The remote procedure call mechanism must behave similarly to that of the local procedure call model. With the local model, the caller places arguments to a procedure in a well-specified location (such as in particular registers or on the stack) and transfers control to the procedure. When the caller eventually regains control, it extracts the result of the procedure from the well-specified location and continues execution.
With respect to the remote procedure call paradigm, a client calls the stub version of the wanted procedure to initiate the processing of the wanted calculations. Now, the called function in the client RPC stub module has to encode all needed information in a data packet to be able to force the remote processing of the wanted procedure by sending this packet to a server. This means that the procedure identification and all function parameters have to be encoded in this data packet. On the server side, a process is waiting for the arrival of a client message. When a call message arrives, the server runs a dispatch routine in its RPC stub that extracts the procedure identification and its parameters, performs the requested procedure to compute the results, encodes these results in a new data packet, and sends it back to the client. Then the server waits for the arrival of the next call message. The resulting packet of the procedure call returns to the client where it has to be dispatched. Finally, the procedure that initiated the processing of the wanted calculations regains control, continues execution, and can handle the result (see Figure 28.4).
Figure 28.4: RPC stub modules handling the procedure identification and the marshalling of the supplied procedure parameters.
We have already seen that the identification of the wanted remote procedure must be encoded in the client RPC stub in a set of three numbers: the program number A, the version number B, and the procedure number C. That information enables the receiving and decoding RPC stub module in the server application to identify the function that must be processed. Beside this procedure identification, the client/server stub modules have to handle the marshalling of the procedure parameters. There are three different solutions for passing the parameters:
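A parameter can be passed by value. This means the procedure receives a local copy of the value that it can modify without affecting the caller.
A parameter can be passed by reference. This means the parameter is a pointer to a value, which must be handled via call-by-copy/restore: the referenced value is copied to the server, and the possibly modified value is copied back when the call returns.
A parameter can be a pointer to a complex data structure such as a list, tree, etc. The server could read structure elements from the client one at a time, but this would be very inefficient. A better way is to copy the complete data structure to the server address space.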
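The copy/restore handling of a reference parameter can be sketched with the network replaced by a direct function call. All names below are hypothetical, and the "packet" is simply a vector of integers; the point is only that the caller sees ordinary by-reference behavior although the server never touches the caller's memory.

```cpp
#include <vector>

// Server-side procedure written against an ordinary reference parameter.
void Increment(int& counter) { ++counter; }

// Hypothetical stand-in for the server: it only sees the data it was sent.
std::vector<int> ServerRun(const std::vector<int>& request) {
    int localCopy = request.at(0);  // copy in: unpack into the server's address space
    Increment(localCopy);
    return { localCopy };           // copy out: send the modified value back
}

// Client stub: looks like a call by reference to the caller.
void IncrementRemote(int& counter) {
    std::vector<int> request{ counter };          // copy the referenced value
    std::vector<int> reply = ServerRun(request);  // stands in for send and wait
    counter = reply.at(0);                        // restore into the caller's variable
}
```

A complex structure such as a list or tree would be handled the same way, except that the stub flattens the whole structure into the request instead of a single value.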
Because integer and float values, characters, and other data may be represented differently on client and server machines, you have to be able to handle system-specific issues, especially the mixing of little-endian and big-endian byte ordering. You have to encode information about the data format of the packed parameters, or you have to use a machine-independent transfer format for data.
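A machine-independent transfer format can be as simple as always writing multi-byte values in one fixed byte order. The sketch below chooses big-endian order, which is an assumption made for illustration, and uses invented helper names. Because it builds the bytes with shifts rather than copying memory, it produces the same packet on little-endian and big-endian machines.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends a 32-bit value in big-endian byte order.
inline void WriteU32(std::vector<std::uint8_t>& buffer, std::uint32_t value) {
    buffer.push_back(static_cast<std::uint8_t>(value >> 24));
    buffer.push_back(static_cast<std::uint8_t>(value >> 16));
    buffer.push_back(static_cast<std::uint8_t>(value >> 8));
    buffer.push_back(static_cast<std::uint8_t>(value));
}

// Reads a big-endian 32-bit value at offset and advances offset past it.
inline std::uint32_t ReadU32(const std::vector<std::uint8_t>& buffer, std::size_t& offset) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value = (value << 8) | buffer.at(offset++);
    }
    return value;
}
```

Floating-point values and strings need their own agreed encodings, but the same principle applies: the format is defined by the packet layout, not by the memory layout of either machine.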
The simple code fragments shown in Listing 28.1 should give you an idea of how to implement the stub functions for client and server machines.
Listing 28.1: Example stub functions for the client and server.

    // IN THE CLIENT RPC STUB

    // example of a client stub version of "procedureA"
    // client application calls this local procedure to run
    // "procedureA" in the server application
    int ClientRPCStub::procedureA(int parameter)
    {
        // encode a data packet
        Pack outPacket(getProcID(this), getMshParams(parameter));

        // send packet to server
        getComModul().send(outPacket);

        // wait for server response
        ResultPack resultPacket = waitForResult();

        // dispatch the result and return to caller
        return resultPacket.getReturnValue();
    }

    // IN THE SERVER RPC STUB

    // function to handle client requests in the server app
    void ServerRPCStub::handle(Pack& inPacket)
    {
        // dispatch a data packet
        ProcID procID(inPacket);
        ParamBlock params(inPacket);

        // call dispatch function to process the wanted procedure
        ResultPack resultPacket = run(procID, params);

        // send result to caller
        getComModul().send(resultPacket);
    }

    // dispatch function in the server application
    ResultPack ServerRPCStub::run(ProcID& procID, ParamBlock& params)
    {
        // detect the wanted procedure and extract the parameters
        ...

        // call the local version of wanted "procedureA"
        int result = procedureA(parameter);

        // encode the packet that has to be returned to client
        return ResultPack(procID, result);
    }

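To see the whole round trip in one place, the stub pair can be reduced to a loopback version in which the "network" is a direct function call. This is a minimal sketch under stated assumptions, not the subsystem described in this chapter: the Packet type, the PROC_SQUARE number, the status word, and all function names are invented for illustration.

```cpp
#include <cstdint>
#include <vector>

typedef std::vector<std::int32_t> Packet;  // hypothetical wire format: 32-bit words

const std::int32_t PROC_SQUARE = 1;        // illustrative procedure number

// The procedure as implemented in the server application.
int Square(int v) { return v * v; }

// Server stub: decode the request, run the procedure, encode the reply.
Packet ServerHandle(const Packet& request) {
    const std::int32_t procedure = request.at(0);
    Packet reply;
    if (procedure == PROC_SQUARE) {
        reply.push_back(0);                       // status: success
        reply.push_back(Square(request.at(1)));
    } else {
        reply.push_back(-1);                      // status: unknown procedure
    }
    return reply;
}

// Client stub: called like a local procedure, plus a success flag because
// a remote call can fail.
bool SquareRemote(int parameter, int& result) {
    Packet request;
    request.push_back(PROC_SQUARE);
    request.push_back(parameter);
    const Packet reply = ServerHandle(request);   // stands in for send and wait
    if (reply.at(0) != 0) return false;
    result = reply.at(1);
    return true;
}
```

Replacing the direct call to ServerHandle with a send over TCP/IP or UDP/IP and a wait for the reply is what turns this into a real remote call; the stubs on either side would not need to change their interfaces.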
28.4 Why RPC is Useful for Game Engines
Since the RPC technology enables you to connect running processes on different systems like PC and game consoles (e.g., Xbox 360 or PlayStation 3), the possibility of distributed processing leads to many useful cases that let your engine and development tools interact at run time. The following are some useful cases of interacting applications:
An advanced log tool, as shown in Figure 28.5(a). This tool could output the text messages (events, warnings, errors) sent by the game and additionally support an integrated console to send commands back from the tool to the game. Because it enables the tool to cheat or to switch modes in the running game from outside, this is a useful application of RPC, especially on game consoles that don't support keyboards. It is also useful when the system running the log tool and the system running the game are far apart and connected via TCP/IP over the local intranet or the internet.
External editing tools connected to separate runtime applications, that is, applications that make use of connected runtime processes to be able to send instructions and receive real-time feedback. This is useful for light source placement or cutscene editing in the running game from outside with a connected tool. It is also useful for editors that need to run system-dependent processes and calculations on other machines to be able to detect specific restrictions and issues.
An editor connected to several platforms at the same time (see Figure 28.5(b)). It can be useful to be able to get rendered screen views from different machines at the same time. That enables you to display the screens of multiple instances of a running game sent from connected PCs and multimedia consoles on your PC and detect machine-dependent differences in quality and performance.
Advanced precalculations on multiple machines. You can use the computer systems connected to your intranet for distributed computations in expensive precomputation steps, for example, to calculate the static lighting and ambient occlusion in highly detailed game worlds.
Figure 28.5: (a) A log tool connected to a running game. (b) An editor application connected to two instances of a game.
References
[1] James E. White. "A High-Level Framework for Network-Based Resource Sharing". Augmentation Research Center, Stanford Research Institute. http://tools.ietf.org/html/rfc707
[2] Andrew D. Birrell and Bruce Jay Nelson. "Implementing Remote Procedure Calls". Xerox Palo Alto Research Center, 1984. http://www.cs.yale.edu/homes/arvind/cs422/doc/rpc.pdf
[3] "Remote Procedure Call: ONC RPC Protocol Specification Version 1". Sun Microsystems, 1988. http://tools.ietf.org/html/rfc1057
[4] "Remote Procedure Call: ONC RPC Protocol Specification Version 2". Sun Microsystems, 2009. http://tools.ietf.org/html/rfc5531
Color Plates
Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
High-resolution versions of all figures are included on the accompanying CD.
[Color plates, in printed order: Figures 12.1, 12.4c, 15.1, 16.1, 17.1, 17.3, 17.4, 19.3, 7.1, 7.2, 7.3, and 18.3.]
Index
Game Engine Gems, Volume One by Eric Lengyel (ed), Jones and Bartlett Publishers © 2011
Italicized page numbers indicate a reference to a figure or table.
Numbers
3DSMax,31,32
A
A*algorithm,64,68,71,72,74,76,77accomodation,124,125actor,62adaptationluminosity,225aerosols,222airmass,221ambientocclusion,249–61anaglyphstereo,131–32arctangentfunction,335artificialintelligence(AI),xiii,61,64,67assetpipeline,11–35buildprocess,13–14fastpath,15finalassets,12–13intermediateassets,15–16manifest,14pushandpullmodels,20–22sourceassets,12
atmosphericscattering,221atmospherictransmittance,221atmosphericturbidity,221,231
B
beaconpoint,62,75,76,77billboarding,134binoculardepthcue,124accomodation,124,125convergence,124retinaldisparity,124,126
Blender,33blower,178blowerheuristic,183–84boxinertiatensorof,201–2,215
bufferedstatechanges,286–87BulletPhysicsLibrary,56
C
C4Engine,235camera-centricdesign,149–65capsuleinertiatensorof,209–11,216
Cascadesdemo,53centerofmass,197–98centraldifferenceoperator,47characterdismemberment,263–69Cocoa,93,305codeexecutionhierarchy(CEH),297–304COLLADA,22–30<accessor>element,25,26–27<extra>element,24<input>element,28,29<mesh>element,27–29<param>element,26,30<source>element,24–26,27<technique_common>element,25<technique>element,24<triangles>element,28,29<vertices>element,27,28FCollada,32OpenCOLLADA,30–33
commandbuffer,142,151coneinertiatensorof,205–6,215
constructivesolidgeometry(CSG),264convergence,124
CORDICmethod,329–37rotationmode,329–31vectoringmode,331
cosinefunction,335covariancematrix,116Crymod,33Crysis(game),39CUDA,20,145cylinderinertiatensorof,202–3,215
D
damagezone,265,268decalrendering,271–80fade-outandwrap-around,275–77surfaceclipping,277–78
densityfield,43depthbuffer,229,238,249,250,252,254,255depthcuebinocular,124monocular,124
depthperception,127depthtest,275diplopia,126DirectX,139,141,150,152,156,172,260,288dismemberment.Seecharacterdismembermentdocumentobjectmodel(DOM),30domeinertiatensorof,208–9,215
DOTfileformat,303DXTCcompression,107
E
Earthshine,223eclipticcoordinates,220ellipsoidinertiatensorof,207–8,215
ephemerismodel,219epochtime,220equatorialcoordinates,220Euclideannorm,241EvilDead2(movie),263extensiblemarkuplanguage.SeeXML
F
FBXfileformat,30FCollada,32fitnessmetric,191fluiddynamics,177–85blower,178blowerheuristic,183–84Navier-Stokesequations,178–79obstacleheuristic,181–83velocityfield,178–81
fogcolor,233fuzzypath,62,64,68,71–75
G
gameassetpipeline.Seeassetpipelinegamestateobserverpattern.SeeobserverdesignpatternG-buffer,272,279gem,xiiigeographiccoordinates,220gibs,264gorezone,264GPLlicense,9gradient,236,238,241graphicaluserinterface(GUI),xiii,91–104GraphicsGems(bookseries),xiiiGraphViz,303
H
high-levelpath.Seefuzzypathhomologouspoints,125horizoncoordinates,220
I
idTech6engine,58indirectillumination,249–61inertiatensor,198–200ofbox,201–2,215ofcapsule,209–11,216ofcone,205–6,215ofcylinder,202–3,215ofdome,208–9,215ofellipsoid,207–8,215ofpyramid,203–5,215oftruncatedcone,213–14,216oftruncatedpyramid,211–13,216transforming,199–200
INFITEC,132IntelThreadingBuildingBlocks,291inter-processcommunication,340inter-pupillarydistance,126introspection,302irradiance,221isothermaleffect,221isovalue,43
K
key-valuedictionary(KVD),305–10KillBill(movie),263Kruskal'salgorithm,189
L
levelofdetail(LOD),49–50LGPLlicense,9libgcm,139listener,80,90lock,283–84
M
MajesticDragon,34manifest,14MarchingCubesalgorithm,41–42,42MathML,23maximumnorm,241Maya,31memorypool,GPU-managed,167–76meshpartitioning,187–96fitnessmetric,191
mesopicvision,224messagequeue,284–85middleware,3–10MinerWars(game),39mippyramid,49MITlicense,9modding,33–34model-view-controller(MVC)designpattern,92–93momentofinteria.Seeinteriatensormonoculardepthcue,124occlusion,124parallax,124,125
motionblur,235gridoptimization,245–46post-processingeffect,242–45velocity-depth-gradientbuffer,236,238–42
multithreadedobjectmodels,283–88multithreadedrenderer,139–46multithreading,9,50,139,149,283,289,311
N
Navier-Stokesequations,178–79
O
obscurancesmethod,249observerdesignpattern,315–27creatingobservers,325–27gamestatemanager,317–20interfacesof,321
obstacleheuristic,181–83occlusion,124Ogre3Dgraphicsengine,31,56on-screenparallax,127OpenCL,23OpenCOLLADA,30–33OpenEXR,17OpenGL,23,139,172
P
palette,107–21parallax,124,125parallax,on-screen.Seeon-screenparallaxparallelaxistheorem,200paralleltasks,289–96pathfinding,61–77fuzzypath,62,64,68,71–75terrainanalysis,62,67–69
Perezmodel,231Perforce,12Perlinnoise,54,56,58photopicvision,224Photoshop,17,108,120physically-basedrendering,219–27,229–34,249–61plano-stereoscopicdisplay,123,124–26PlayStation3,xiii,22,140,141,167,172,174,175,339,347polarization,132Prim'salgorithm,190principalaxesofinertia,199projectionmatrix,239pyramidinertiatensorof,203–5,215
R
Rayleighscattering,222real-timestrategy(RTS),62red/greenglasses,131reflectedradiance,250remoteprocedurecall(RPC),339–48retinaldisparity,124,126
S
S3TCcompression,107SAX,31scheduler,311–14screen-spaceambientocclusion,249shadowmapping,134Siggraph,22,23,31,34,231Sims,The(game),34sinefunction,335SketchUp,33skybox,229–34Sobeloperator,48SoldierofFortune(game),264soundculling,79–90soundgrid,83–89Split/Second(game),167,168Spore(game),34standardtemplatelibrary(STL),305,313stereopsis,125stereoscopicrendering,123–36Subversion,12surfaceextraction,41,46–52synergisticprocessingunit(SPU),141,145,187,188
T
taskparallelism,289–96temporalmultiplexing,132texturearray,54textureatlas,54Thermite3Dgameengine,40,57threadcontext,285–86tile,62,65trigonometricfunctions,329–37arctangentfunction,335cosinefunction,335sinefunction,335
triplanartexturing,53truncatedconeinertiatensorof,213–14,216
truncatedpyramidinertiatensorof,211–13,216
turbidity.Seeatmosphericturbiditytwilight,223
U
Up(movie),135
V
vectorlength,335vectorrotation,335velocityfield,178–81velocity-depth-gradientbuffer,236,238–42viewporttransformation,239viscosity,179volumetricenvironments,39–58voxel,40,178
W
WindowsPresentationFoundation,93Worms3D(game),39
X
Xbox360,xiii,14,140,141,167,172,175,291,339,347XML,24XNA,14XSIModTool,33xyYcolorspace,231,232XYZcolorspace,232,233
Z
zenithangle,221,230zenithluminance,231Zliblicense,9
ListofFiguresGameEngineGems,VolumeOnebyEricLengyel(ed)JonesandBartlettPublishers©2011
ListofFigures
Chapter2:TheGameAssetPipeline
Figure2.1:AssetPipelineComponents.Figure2.2:Intermediateassetsalongthebuildprocess.Figure2.3:Gameengineeditorincontroloftheassetpipeline.Figure2.4:Definitionofthe<source>element.Figure2.5:Definitionofthe<accessor>element.Figure2.6:Definitionofthe<vertices>element.Figure2.7:Definitionofthe<triangles>element.
Chapter 3: Volumetric Representation of Virtual Environments
Figure 3.1: Volumetric representations can be used for many different types of environments. In (a) we see a complex terrain with two primary levels and numerous overhangs [5]. (Image courtesy of Thomas Schöps.) The Earth in (b) has been cut away to illustrate that the interior is also modeled [15]. Manmade structures with many different materials (c) can also be represented [15], while (d) shows a mining ship inside an asteroid, destroying it in real time [8]. (Image courtesy of Keen Software House.)
Figure 3.2: (a) This volume consists of an 8×8×8 grid of voxels, though real volumes are considerably larger. The corner is cut away to show how voxels also model the interior of an object. (b) Each group of 2×2×2 voxels forms a cell. Note that voxels and volumes are the only types that are stored explicitly within our system, as edges and cells are implicit constructs that we build by looking at a voxel's neighbors.
Figure 3.3: The set of triangles generated by the Marching Cubes algorithm for each of the 18 possible cell configurations. The numbers indicate how many times each configuration occurs. Solid circles represent voxels containing solid material, while hollow circles represent voxels containing empty space. In most cases the inverse configuration generates the same set of triangles, with the exception of the last three cases (which are inverses of earlier cases but with different triangles to avoid holes).
Figure 3.4: The volume with dimensions 8×8×8 voxels at the top of the figure contains four different material IDs represented by colors (see figure on accompanying CD). It is split into 8 blocks, each of which has dimensions 4×4×4 voxels. The four top blocks and the two lower left blocks (one of which is hidden at the back) are homogeneous and so can share copies of the actual data. The reference counts are indicated at the bottom of the figure. Explicitly storing block data for only four out of the eight blocks gives us a memory saving of 50% in this overly simplistic example.
Figure 3.5: A small number of background threads continuously process the queue of regions that have been modified and regenerate the surface geometry. The main thread retrieves the results and uploads the geometry to the GPU.
Figure 3.6: A single input mesh containing multiple material IDs, represented by different colors (see figure on accompanying CD), is split into two meshes. In the uniform triangle mesh, all the components are spatially adjacent and are only split up in the figure to aid visualization. In the nonuniform triangle mesh, the three parts are drawn on top of each other such that the alpha values blend correctly.
Chapter 4: High-Level Pathfinding
Figure 4.1: A world map, with tile markings.
Figure 4.2: Path regions with a world grid overlay and IDs for each path region. Notice how the world grid boxes contain multiple regions inside them, and that regions like the ocean are divided up by the world grid into many separate regions.
Figure 4.3: Example of diagonal connections between regions.
Figure 4.4: High-level, fuzzy paths have been computed for each of our actors. Note our beacon points, identified as footprints. We use these beacon points in the detailed pathing phase.
Figure 4.5: Beacon points serve as a guide to the detailed pathing engine. Note how the detailed path (round path with an arrowhead) does not go all the way to the beacon points.
Chapter 5: Environment Sound Culling
Figure 5.1: All sound emitters testing against the listener.
Figure 5.2: Sound emitters using their audible distance to determine which grid cells they touch.
Figure 5.3: The cell occupied by the listener is only within the radius of one sound emitter.
Chapter 7: World's Best Palettizer
Figure 7.1: (See also Color Plates.) (a) Flowers source image using 100,162 colors. (b) Photoshop, 256 colors. (c) Photoshop, 16 colors. (d) WBP, 256 colors. (e) WBP, 16 colors. (© Dundanim/Dreamstime.com)
Figure 7.2: (See also Color Plates.) (a) Grasshopper source image using 136,945 colors. (b) Photoshop, 256 colors. (c) Photoshop, 16 colors. (d) WBP, 256 colors. (e) WBP, 16 colors. (© Picstudio/Dreamstime.com)
Figure 7.3: (See also Color Plates.) (a) Child source image using 112,024 colors. (b) Photoshop, 256 colors. (c) Photoshop, 16 colors. (d) WBP, 256 colors. (e) WBP, 16 colors. (© Pavla Zakova/Dreamstime.com)
Chapter 8: 3D Stereoscopic Rendering: An Overview of Implementation Issues
Figure 8.1: Three different types of parallax that can occur, from left to right: zero parallax, negative parallax, and positive parallax.
Figure 8.2: Convergence and accommodation in plano-stereoscopic displays.
Figure 8.3: The two angles used to compute the retinal disparity.
Figure 8.4: Parallax values greater than 1.5° visual angle should not be exceeded.
Figure 8.5: Negative parallax.
Figure 8.6: Positive parallax.
Figure 8.7: Camera setup for the 3D stereoscopic game.
Chapter 9: A Multithreaded 3D Renderer
Figure 9.1: In a double-buffering scheme, the GPU consumes the data a frame later than it is generated by the CPU.
Figure 9.2: In this scheme, the GPU consumes the data soon after it is generated by the CPU.
Figure 9.3: Both the render thread and the graphics API are likely to access data in various locations in memory.
Figure 9.4: The primary command buffer references multiple secondary command buffers, each of which handles a subset of the frame.
Figure 9.5: Tasks are stored in a queue and are consumed by available processing units.
Figure 9.6: While the helper thread computes the post-effects, the GPU starts rendering the next frame.
Chapter 10: Camera-Centric Engine Design for Multithreaded Rendering
Figure 10.1: A standard game usage of parallel processing. Notice that only the update information tends to be multithreaded.
Figure 10.2: A camera-centric design. We make better usage of the thread pool for additional processing of rendering jobs.
Chapter 11: A GPU-Managed Memory Pool
Figure 11.1: A memory pool layout shown as it undergoes a number of allocate and free operations. Note that each data chunk is kept to the alignment shown by the dotted lines. Also note that after only a few operations, we have a fragmented memory pool.
Figure 11.2: Sequence diagram of memory movement within our memory pool using the CPU. Note how the CPU needs to wait for an indeterminate length of time before moving the texture data in the memory pool.
Figure 11.3: Sequence diagram of memory movement within our memory pool using the GPU. Note that no complex synchronization is now needed between the CPU and the GPU and that the CPU can queue the data transfer immediately.
Figure 11.4: Staging buffer usage illustrating the use of fences to determine when a GPU copy is complete.
Figure 11.5: Memory pool layout shown over a number of defragmentation passes. The shaded areas represent allocated memory chunks. Splitting the memory pool into regions can reduce the number of memory copies required during defragmentation.
Chapter 12: Precomputed 3D Velocity Field for Simulating Fluid Dynamics
Figure 12.1: (See also Color Plates.) Voxel and face velocities.
Figure 12.2: Images of the velocity field: (a) from the full solver; (b) from the obstacle heuristic.
Figure 12.3: Images of the velocity field: (a) from the full solver; (b) from the blower heuristic.
Figure 12.4: (See also Color Plates.) Images of the velocity field visualization using heuristics: (a) with vectors without an obstacle; (b) with vectors with an obstacle; (c) with unity vectors for the direction and color for the amplitude with an obstacle; (d) with an obstacle and particles.
Chapter 14: Moments of Inertia for Common Shapes
Figure 14.1: A box.
Figure 14.2: A cylinder.
Figure 14.3: A rectangular pyramid.
Figure 14.4: A cone.
Figure 14.5: An ellipsoid.
Figure 14.6: A dome, or ellipsoidal hemisphere.
Figure 14.7: A capsule.
Figure 14.8: A truncated pyramid.
Figure 14.9: A truncated cone.
Chapter 15: Physically-Based Outdoor Scene Lighting
Figure 15.1: (See also Color Plates.) An outdoor scene with physically-based lighting at dusk (left) and at night (right). (Images courtesy of Emergent Game Technologies and Sundog Software, LLC.)
Chapter 16: Rendering Physically-Based Skyboxes
Figure 16.1: (See also Color Plates.) Physically-based skybox generated for (left) late morning and (right) twilight. (Images courtesy of Sundog Software, LLC.)
Chapter 17: Motion Blur and the Velocity-Depth-Gradient Buffer
Figure 17.1: (See also Color Plates.) In the left image, motion blur resulting only from camera movement is shown. Notice how the ground and trees closer to the camera are blurred much more than distant objects. In the right image, motion blur resulting from rigid objects moving in the scene is shown. Both translational and rotational motion are visible in this still image. (Images courtesy of Terathon Software LLC.)
Figure 17.2: Motion blur resulting from vertex movement on a skinned character model. (Image courtesy of Terathon Software LLC.)
Figure 17.3: (See also Color Plates.) In these two images, the camera is rotating around the character, causing the ground to move across the screen while the character is almost completely still. In the left image, the depth and gradient information in the velocity buffer is not considered, and all color samples along the direction of the velocity are used. Notice the ghosting of the glowing parts of the character's armor and the fuzzy halo surrounding his legs. In the right image, the depth and gradient information in the velocity buffer is considered, and the rejection of the appropriate color samples eliminates the artifacts. (Images courtesy of Terathon Software LLC.)
Figure 17.4: (See also Color Plates.) In this image, the viewport is partitioned into a grid of 16×12 cells. The depth and gradient information is only used in the highlighted cells surrounding the character since that is where foreground objects are likely to be moving slowly relative to the background. (Image courtesy of Terathon Software LLC.)
Chapter 18: Fast Screen-Space Ambient Occlusion and Indirect Lighting
Figure 18.1: The shaded point s is the center of the neighborhood sphere. The radius of the sphere is R. Those directions ω where there is no intersection closer than R are called open. Point o is an intersection closer than R.
Figure 18.2: Evaluation of the volumetric integral in the visible part of the neighborhood sphere.
Figure 18.3: (See also Color Plates.) Rendering results of the harbor scene: (first column) direct lighting, (second column) direct lighting plus ambient occlusion, and (third column) direct lighting plus ambient occlusion and indirect lighting.
Chapter 19: Real-Time Character Dismemberment
Figure 19.1: The limb damage zones.
Figure 19.2: A limb is removed by transforming the vertices associated with its joints by a skinning matrix that makes those vertices degenerate. The detached limb is formed by performing the process in reverse.
Figure 19.3: (See also Color Plates.) The color-coded damage zone surface groups.
Chapter 20: A Deferred Decal Rendering Technique
Figure 20.1: Decals applied to complex geometry.
Figure 20.2: Using a simple projection onto the local x-y plane causes decals to be smeared in the direction of the local z-axis.
Figure 20.3: A bounding cube centered on a decal can be rendered to ensure that we capture the decal's influence from any viewpoint.
Figure 20.4: The decals wrap around corners based on the surface normals, and they fade out based on the distance from the decal plane.
Figure 20.5: (Left) Wrap-around based on face normals. (Right) Wrap-around based on smoothly interpolated vertex normals.
Chapter 22: Holistic Task Parallelism for Common Game Architecture Patterns
Figure 22.1: (a) Work to be divided into tasks. (b) The work expressed as dependent tasks.
Chapter 23: Dynamic Code Execution Hierarchies
Figure 23.1: (a) Traditional update loop. (b) Code execution hierarchy.
Figure 23.2: Simple graph.
Figure 23.3: Update orders.
Figure 23.4: Example CEH Graph.
Chapter 24: Key-Value Dictionary
Figure 24.1: A polling model for accessing information often creates additional header dependencies.
Figure 24.2: The header dependency graph is simplified by moving information into an intermediary header.
Figure 24.3: The data stored in a KVD.
Chapter 28: Inter-Process Communication Based on Your Own RPC Subsystem
Figure 28.1: Two applications connected via RPC: client and server applications on (a) the same or (b) different machines.
Figure 28.2: Steps during an RPC call, initiated by the client sending a request to the server.
Figure 28.3: Encoding and decoding of procedure calls in special stub modules.
Figure 28.4: RPC stub modules handling the procedure identification and the marshalling of the supplied procedure parameters.
Figure 28.5: (a) A log tool connected to a running game. (b) An editor application connected to two instances of a game.
List of Tables
Chapter 2: The Game Asset Pipeline
Table 2.1: Import into 3DS Max using OpenCOLLADA for Max and Feeling's ColladaMax.
Table 2.2: Export from 3DS Max using OpenCOLLADA for Max and Feeling's ColladaMax.
Chapter 3: Volumetric Representation of Virtual Environments
Table 3.1: The memory required to store our example volumes varies depending on the block size.
Table 3.2: Some typical timings for our threaded surface extractor running on a quad-core 2.33 GHz CPU with 2 GB of memory.
Chapter 8: 3D Stereoscopic Rendering: An Overview of Implementation Issues
Table 8.1: Practical examples for on-screen parallax values.
Table 8.2: Example for IPD = 6.5 cm, for negative parallax.
Table 8.3: Example for IPD = 6.5 cm, for positive parallax.
Chapter 9: A Multithreaded 3D Renderer
Table 9.1: Synchronizing the GPU and the CPU.
Chapter 14: Moments of Inertia for Common Shapes
Table 14.1: This table lists the mass m, the center of mass (CM) C, and the entries of the inertia tensor I for a variety of solid shapes. The inertia tensor is always given in the coordinate system for which the origin coincides with the center of mass. Each shape is considered to be solid with a constant density ρ.
List of Listings
Chapter 2: The Game Asset Pipeline
Listing 2.1: An example <source> element and <accessor> element.
Listing 2.2: This is a sample exporter from DAE2Ogre OgreWriter.cpp.
Chapter 3: Volumetric Representation of Virtual Environments
Listing 3.1: Triplanar texturing can project textures onto arbitrary geometry. This code receives textures, a normal, and a world position as input and computes the resulting color. Note that in order to preserve texture handedness, one of the UV coordinates must be negated whenever the dominant normal component is -x, +y, or -z. This is particularly important when working with normal maps but is not shown for simplicity.
Chapter 4: High-Level Pathfinding
Listing 4.1: This is a function used to determine the path type of a tile. The order of rules here is very important. This function is a special case and will be frequently modified during development.
Listing 4.2: Regions should contain a list of adjacent regions with data indicating if the connection is a diagonal, and which regions border the diagonal connection.
Listing 4.3: Primary adjacency update loop for fuzzy pathing A* engine.
Listing 4.4: This method determines whether an actor can move between regions.
Listing 4.5: Cost method for our A* fuzzy pathing engine.
Chapter 5: Environment Sound Culling
Listing 5.1: This pseudocode shows an example sound grid cell and sound emitter.
Listing 5.2: This pseudocode shows how we build the active list and stop playing invalid sounds.
Listing 5.3: This pseudocode demonstrates how the active emitter list is processed.
Listing 5.4: Sample insertion routine for dynamic sounds.
Listing 5.5: Sample removal routine for dynamic sounds.
Chapter 6: A GUI Framework and Presentation Layer
Listing 6.1: Our GuiComponent base class.
Listing 6.2: A sprite class.
Listing 6.3: The GuiTextItem class.
Listing 6.4: A base class for selectable components.
Listing 6.5: A GuiSelectableGroup handles the selection of many GuiSelectable objects.
Listing 6.6: Setting up and updating of an example state.
Chapter 7: World's Best Palettizer
Listing 7.1: Computing the centroid for a set of samples looks complicated because we allow for a reduced set of samples with associated frequencies. It's really just an average.
Listing 7.2: This computes the error for each cluster's samples relative to the centroid chosen for it. The cluster with the greatest error is returned for subdivision.
Listing 7.3: This function computes a squared error between the representative value of a cluster and all of its samples, with a weight per component.
Listing 7.4: This computes a covariance matrix for the samples in this cluster.
Listing 7.5: This uses the power method with up to 10 iterations to determine the dominant axis of the cluster.
Listing 7.6: This code splits the cluster at the mid-section along the dominant axis, binning half the samples into each new cluster.
Chapter 10: Camera-Centric Engine Design for Multithreaded Rendering
Listing 10.1: A simple RenderCommand structure that contains basic information for a draw call.
Listing 10.2: Filling a command buffer using generic handles. This is a great place to do additional logic related to rendering setup, since it will be executed on the thread pool.
Listing 10.3: Submitting a RenderCommand structure to the API. This snippet of code is the only function that can actually communicate directly with the device.
Listing 10.4: Comparison between a Camera structure that the simulation would use, and a RenderView structure.
Listing 10.5: Creating all the render views for a given frame.
Listing 10.6: Filling in RenderCommand structures should occur in a thread-safe manner, on a separate thread.
Listing 10.7: Serializing render views requires us to resolve them in a manner that satisfies dependencies.
Chapter 12: Precomputed 3D Velocity Field for Simulating Fluid Dynamics
Listing 12.1: Pseudocode of the full solver.
Listing 12.2: Pseudocode of the obstacle heuristic.
Chapter 17: Motion Blur and the Velocity-Depth-Gradient Buffer
Listing 17.1: This GLSL vertex shader first transforms the vertex position for the current frame to homogeneous clip-space coordinates in the ordinary manner using the model-view-projection (MVP) matrix. The shader then transforms the vertex position into screen space for both the preceding frame using the matrix motionA and the current frame using the matrix motionB. The resulting positions are output as texture coordinates that will be read by the fragment shader.
Listing 17.2: This GLSL fragment shader calculates the screen-space velocity and writes it to the red and green components of the output color. The velocityScale parameter holds the value of s/r_max shown in Equation (17.6). The depth is taken directly from the w-coordinate of the current position and is written to the blue component of the output color. The gradient of the depth is calculated using the hardware derivative functions, and the larger of its components is written to the alpha component of the output color.
Listing 17.3: This GLSL fragment shader applies the motion blur effect in the post-processing pass. The nine color samples are accumulated in the x, y, and z components of the color vector, and the number of valid samples is stored in the w component of the color vector. The value of minDepth is calculated using Equation (17.9), and only samples having a depth at least this far from the camera plane are used to generate the final blurred pixel.
Chapter 18: Fast Screen-Space Ambient Occlusion and Indirect Lighting
Listing 18.1: This fragment shader implements the algorithm discussed in this gem.
Chapter 19: Real-Time Character Dismemberment
Listing 19.1: The damage zone class definition.
Chapter 20: A Deferred Decal Rendering Technique
Listing 20.1: This code determines the decal-space coordinates for the fragment being rendered and transforms them into texture coordinates for the decal. The RT_Depth texture contains depth values for the viewport, the worldToDecal constant is the inverse of the decal's 4×4 matrix transform from decal space to world space, and the recipDecalSize constant is 1/s, where s is the size of the decal in the scene.
Listing 20.2: This vertex shader scales a unit cube to the actual size of the decal and translates it to the decal's world-space position.
Listing 20.3: The absolute value of the decal-space z-coordinate of the fragment position is scaled to the size of the decal and used as a fade-out parameter for the decal color. As before, the recipDecalSize constant is 1/s, where s is the size of the decal in the scene.
Listing 20.4: This code demonstrates how the normal of the underlying surface can be used to adjust texture coordinates in such a way that a decal wraps around curves and corners. The RT_Normal texture contains normal vectors for the viewport encoded in the RGB channels.
Chapter 23: Dynamic Code Execution Hierarchies
Listing 23.1: Example base class.
Listing 23.2: Example game code.
Chapter 26: The Game State Observer Pattern
Listing 26.1: GameState class implementation in C#.
Listing 26.2: The IObservable and INotifiable interfaces.
Listing 26.3: The Subscribe() and Unsubscribe() methods of the GameState class.
Listing 26.4: The _NotifySubscribers() method of the GameState class.
Listing 26.5: The set implementation for the currentState member of the GameState class.
Listing 26.6: A sample observer implementation.
Listing 26.7: Accessing an observer by its native type.
Listing 26.8: Supporting multiple subject types.
Chapter 27: Fast Trigonometric Operations Using Cordic Methods
Listing 27.1: Rotation and vectoring mode implementations.
Listing 27.2: Normalizing the input range.
Listing 27.3: Angle table generation.
Listing 27.4: Computation of C.
Listing 27.5: Sine and cosine implementation.
Listing 27.6: Arctangent implementation.
Listing 27.7: Vector rotation implementation.
Listing 27.8: Vector length implementation.
Chapter 28: Inter-Process Communication Based on Your Own RPC Subsystem
Listing 28.1: Example stub functions for the client and server.
Chapter 1: What to Look for When Evaluating Middleware for Integration
Jason Hughes
Steel Penny Games, Inc.
1.1 Middleware, How Do I Love Thee?
Modern games are very rarely composed entirely of proprietary, custom code written by in-house developers. Producing the amount of polished functionality required to compete in the games industry is an enormous task for a single studio to undertake and is, in a word, unproductive. These days, game programmers are expected to be comfortable using certain wheels that have been invented elsewhere, reinventing only those that provide tangible benefits to their project or studio by directly contributing to the success of a title or differentiating them in some way from the competition. Given that some middleware libraries will be chosen to fulfill certain needs, and that there are often several products with similar feature sets to choose among, we ask, "What are the considerations a team should take into account when comparing middleware products?"
Let's assume that the language of choice is C/C++, as is most common today with middleware and game development. Let's also assume that we're discussing an existing codebase where a specific feature is missing that can be filled by middleware, and that the team's desire is to be surgical during integration and leave the smallest scar possible should the middleware need to be removed.