Real-time Rendering of Pre-simulated Particles (Realtidsrendering av försimulerade partiklar)
Nathalie Ek
Master's thesis in Media Technology carried out at the Institute of Technology, Linköping University
Department of Science and Technology, Linköping University, SE-601 74 Norrköping, Sweden
LiU-ITN-TEK-A-14/002-SE
Supervisor: Joel Kronander
Examiner: Jonas Unger
Norrköping, 2014-03-14
Copyright
The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.
The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.
According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.
For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/
© Nathalie Ek
Abstract
This master's thesis presents a method for real-time streaming of pre-simulated particle systems. The particle systems are simulated offline in any software and then saved as an Alembic file. This Alembic file is then imported into Frostbite and can be loaded at run-time. The result of this thesis work is the implementation of a streaming and rendering framework for pre-simulated particles. The implementation contains only basic shading and lighting due to the time constraints of the work, but the streaming part features an advanced solution that gives predictable and manageable CPU and memory overhead. The implementation performs well and works satisfactorily.
Contents
1 Introduction
1.1 Motivation
1.2 The Company
1.2.1 EA Digital Illusions CE - DICE
1.2.2 Frostbite
1.3 Third Party Tools
1.3.1 RealFlow
1.3.2 Alembic
2 Related Work
2.1 Related Work
3 Background
3.1 History
3.1.1 A particle system is born
3.2 Hardware evolution
3.2.1 CPU evolution
3.2.2 GPU evolution
3.3 SIMD - Single Instruction Multiple Data
3.4 Sorting Particles
3.4.1 Bubblesort
3.4.2 Combsort
3.4.3 Insertion sort
3.4.4 Mergesort
3.4.5 Quicksort
3.4.6 Radix Sort
3.4.7 Summary
3.5 Third Party Software
3.5.1 RealFlow
3.5.2 Alembic
4 Particle Systems
4.1 Particle Systems
4.1.1 A particle system in general
4.1.2 The Particle
4.1.3 Different types of Particle Systems
4.1.4 The Emitter
4.1.5 Simulation
4.1.6 Rendering
4.1.7 Performance
5 Method
5.1 Introduction
5.2 Alembic importer
5.2.1 Particle information
5.3 Pipeline
5.4 Runtime entity
5.5 Streaming
5.5.1 Streaming Cache
5.5.2 Streaming State Machine
5.5.3 Reading from the Cache
5.6 Rendering
5.7 Sorting
5.7.1 Sorting using a key-index stream
5.7.2 Sorting the Particle Stream directly
5.7.3 CPU sorting
6 Result
6.1 Pipeline results
6.2 Runtime Results
6.3 Workflow Results
6.4 Screenshots
6.5 Performance
7 Discussion
7.1 Future Improvements
7.1.1 Compression
7.1.2 Lighting
List of Figures
3.1 Particle System in Genesis
3.2 Spacewar (1962)
3.3 SIMD
3.4 Movie Texture
3.5 Mergesort
3.6 Quicksort
5.1 Data flow
5.2 State machine
5.3 Chunk states
5.4 Ringbuffer example
5.5 Information split over chunks
6.1 Particle Stream simulation
6.2 Particle Stream simulation
6.3 Combined CinemaStream and ParticleStream simulation
6.4 Combined CinemaStream and particle simulation
Chapter 1
Introduction
This chapter gives a short motivation for the work and introduces the third-party tools involved in this master's thesis.
1.1 Motivation
Particles are of great value when it comes to giving the viewer or user a believable experience, whether in movies or in games. Hence, particle systems are an important tool for visual effects in both.
The challenge with particles in the movie industry is not efficiency. It does not matter if one frame takes two days to render as long as the end result is rewarding. In games, the situation is the opposite: the rendering time has to be less than 33 ms (for 30 fps) to be usable. With the eighth generation of consoles (Xbox One and PlayStation 4), Frostbite runs at 60 fps, which cuts the frame time in half to roughly 16.7 ms.
The quality of particle simulations and the quantity of particles that can be simulated have increased enormously since the introduction of GPU-accelerated particle systems.
In modern games, particle systems tend to use a small number of large particles combined with shading, instead of a large number of small particles. The particles are usually rendered as point sprites, and an artist provides a painted texture, sometimes along with a normal map so that the particles integrate better with the light in the scene.
Since the number of particles increases with the evolution of hardware, shading becomes more and more important. Defects that could previously be concealed with textures become more obvious as the number of particles increases. One of the important aspects of shading dense particle systems is self-shadowing, i.e. that particles can cast shadows on other particles. Self-shadowing gives important cues to the density and shape of a particle cloud.
1.2 The Company
1.2.1 EA Digital Illusions CE - DICE
EA Digital Illusions CE (DICE) was founded by Ulf Mandorff, Olof Gustafsson, Fredrik Liliegren, Andreas Axelsson and Markus Nystrom in Alvesta in the early 1990s. The first games released were Pinball Dreams (1992) and Pinball Illusions (1995). The release of Battlefield 1942 was a major breakthrough within the game industry. Today the Battlefield series is among the most popular first-person shooter series on all platforms.
In 2004 the company entered a collaboration with Electronic Arts (EA), which resulted in DICE becoming an affiliated company of EA.
1.2.2 Frostbite
Frostbite (currently Frostbite 3) is a game engine developed by DICE. As of today, the engine is designed for use on Microsoft Windows, PlayStation 4, Xbox One, PlayStation 3, Xbox 360 and mobile platforms. It is also adapted to a wide range of video game genres.
“Frostbite empowers game creators around the world to shape the future of gaming. Together with our game teams, we provide a smooth, beautiful, and collaborative development experience.” [1]
1.3 Third Party Tools
1.3.1 RealFlow
RealFlow is a fluid and dynamics simulator for the 3D industry and is used within the movie industry for creating realistic fluid simulations. RealFlow fluids have appeared in, among others, Resident Evil Retribution, Ice Age 4, The Avengers and The Girl with the Dragon Tattoo. RealFlow has also been used in games like Crysis 2 and Mass Effect 3.
RealFlow Intuitive Fluids is an industry-standard, out-of-the-box fluid simulation software. It is fast and easy to use, and it is compatible with all major 3D platforms (Maya, 3DS Max, Lightwave, Softimage, Houdini, Cinema4D).
In January 2008 RealFlow won the Academy Technical Achievement Award 2007 granted by The Academy of Motion Picture Arts and Sciences.
1.3.2 Alembic
Alembic is an open computer graphics interchange framework which contains tools for collaboration management and a generic, extensible data representation scheme. It includes a C++ library, a file format, and client plugins and applications. It was initially developed in 2010 by teams from Sony Pictures Imageworks and Industrial Light & Magic.
Alembic was made to create an open standard for scene data sharing and to support a baked-data workflow. It enables easy hand-off between disciplines and fast workflows, leading to greater productivity. Alembic
Chapter 2
Related Work
This chapter summarizes related work in the field of particle simulation and on sorting algorithms.
2.1 Related Work
In 2004, A. Kolb and P. Kipfer [2], [3] introduced particle systems on the GPU for real-time animation and rendering in OpenGL, since the CPU was too slow. The GPU was subsequently used for simulating fluid motion with smoothed-particle hydrodynamics (SPH) [4].
Different application purposes require different parallelization strategies. The particle system used by A. Kolb and P. Kipfer in 2004 did not require any knowledge about neighboring particles and was easily parallelized (i.e. one thread per particle). The SPH implementation used in [4], as well as that of P. Kipfer [3], required information about the local neighbors for collision detection and advection of the particles. Both are forward Euler solutions in which the particle velocity is adjusted using a small uniform time step to allow the system to converge.
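The forward Euler update with a uniform time step mentioned above can be sketched as follows. This is a generic illustration and not the code of [2], [3] or [4]; the names `Particle` and `euler_step`, and the use of a single acceleration vector for the whole step, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    position: tuple  # (x, y, z)
    velocity: tuple  # (vx, vy, vz)


def euler_step(particle, acceleration, dt):
    """Advance one particle by one forward Euler step of length dt."""
    # x(t + dt) = x(t) + v(t) * dt, using the velocity at the start of the step
    new_position = tuple(
        x + v * dt for x, v in zip(particle.position, particle.velocity)
    )
    # v(t + dt) = v(t) + a(t) * dt
    new_velocity = tuple(
        v + a * dt for v, a in zip(particle.velocity, acceleration)
    )
    return Particle(new_position, new_velocity)
```

Since each particle is updated from its own state only, the step can be applied to all particles independently, which is what makes the one-thread-per-particle mapping possible.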
The implementation used in Dynamic Particle System for Mesh Extraction
on the GPU [5] does not have a uniform time step. Instead, each particle
determines its step size based on its energy and the local curvature. This
allows for faster convergence for the purpose of the mesh extraction.
One recent advance in extracting high-quality meshes for isosurface computation is based on a dynamic particle system. Particle placement techniques require a significant amount of time to produce a satisfactory mesh. To address this problem, Kim et al. [5] have studied the parallelism of particle placement and the use of CUDA, a parallel programming platform for the GPU. The approach significantly improves the performance of the particle placement. Kim et al. [5] present their curvature-dependent sampling method and its implementation in CUDA on the GPU for extracting high-quality meshes. They also devise an efficient implementation of a particle system on the GPU to reduce the runtime of the particle system.
Drone [6] presents different methods for creating particle systems with advanced interaction. The computations and data reside entirely on the GPU, and Drone [6] uses non-parametric particle systems on the GPU to display the complex behavior of the particles.
Non-parametric particle systems react to their environment as well as to each other, which gives them a true advantage over parametric systems. To handle the particle interaction in the system, Drone [6] outlines a method for dealing with N-body problems on the GPU: Force Splatting for N^2 Particle Interactions. The goal is to project the force from one particle onto all other particles in the system in a single operation. The algorithm exploits the alpha blending capabilities and the fast rasterization of modern graphics hardware without the constant need to recreate complex space partitioning structures on the GPU.
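To make the N^2 cost of such particle interactions concrete, the sketch below accumulates the force from every particle onto every other particle on the CPU. It is not the force splatting algorithm of [6], which runs on the GPU; the function name `accumulate_forces`, the inverse-square attraction and the `softening` term are assumptions chosen only for illustration.

```python
def accumulate_forces(positions, strength=1.0, softening=1e-3):
    """Return the total force on each particle from all other particles.

    The double loop visits every ordered pair once, so the work grows
    with the square of the number of particles.
    """
    count = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(count)]
    for i in range(count):
        for j in range(count):
            if i == j:
                continue
            # Vector from particle i to particle j
            delta = [positions[j][k] - positions[i][k] for k in range(3)]
            # Softening avoids a division by zero for coinciding particles
            dist_sq = sum(c * c for c in delta) + softening
            scale = strength / (dist_sq * dist_sq ** 0.5)
            for k in range(3):
                forces[i][k] += delta[k] * scale
    return forces
```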
Another important aspect of particle systems is how to sort the particles.
In the past, many sorting algorithms have been proposed (Knuth [7], Martin [8]), and Quicksort is one of the fastest algorithms used in practice. Since it is used so often, there are also many optimized implementations of Quicksort available. A Quicksort implementation using SIMD instructions would be ideal, but according to Inoue [9] there is no known technique for implementing the Quicksort algorithm using existing SIMD instructions.
The performance of sorting is often dominated by pipeline stalls caused by branch mispredictions, according to Sanders and Winkel [10]. The algorithm presented by Inoue [9] makes it possible to take advantage of the data parallelism of SIMD instructions while avoiding pipeline stalls caused by mispredictions.
Chapter 3
Background
This chapter goes through the history of particle systems and the evolution of hardware. It also describes the third-party software used in this report and compares sorting algorithms suitable for particle systems.
3.1 History
3.1.1 A particle system is born
In the early 1980s, William Reeves, who worked on Star Trek II: The Wrath of Khan, started to research methods for creating realistic natural phenomena in real time, in particular realistic fire for the Genesis Demo sequence (see figure 3.1). Reeves realized that conventional modeling would not do the trick, even though it was the best approach for creating objects with smooth, well-defined surfaces [11].
He used the term “fuzzy” when referring to the objects and thought it would be better if they were modeled as a system of particles that behaved within a set of dynamic rules.
Reeves was neither the first who wanted to use particles nor the first to use them. Particles had been used before to create natural effects such as smoke and galaxies of stars. However, particles turned out to be hard to control. Even though they were not referred to as such, particle systems were used in some of the very first video games, for example Spacewar (see figure 3.2), which used particles to display explosions as early as 1962.
Figure 3.1: An early particle system used in Genesis.
Reeves came to the conclusion that by applying a system of rules to the particles, a chaotic effect could be achieved while maintaining some creative control.
During the past thirty years, many particle systems have made people gasp for air while watching buildings blow up, giant cities get flooded or other visually stunning effects. Notable uses in the movie industry include Lord of the Rings, Transformers, The Avengers and Ice Age.
Figure 3.2: Two spaceships, one goal - blast each other out of the sky. Built by a small team at MIT led by Steve Russell. The relatively action-packed space shooter was outrageously ahead of its time.
3.2 Hardware evolution
The evolution of both CPU and GPU hardware has progressed extremely fast over the last couple of decades.
3.2.1 CPU evolution
In the late 1970s, Intel released the first x86 CPU (Central Processing Unit), the Intel 8086. Its instruction set gradually became the industry standard, and Intel processors were the most popular processors going into the 1990s. That decade saw a race in which the clock speeds of processors increased at a rapid pace. However, in the early 2000s, processor manufacturers hit what would become known as the power wall: it was no longer possible to create processors with higher clock frequencies without using large amounts of power. This led to the introduction of multicore processors. The shift was not only a shift in hardware, since a new type of programming was needed to fully utilize the power of the new processors. This is where the future of processors lies: multiple cores and threads running concurrent software.
3.2.2 GPU evolution
GPUs (Graphics Processing Units) started emerging in the early 1990s. Early examples of mass-market dedicated graphics hardware were the PlayStation and the Nintendo 64. The first GPUs were dedicated to accelerating 3D functionality on separate discrete boards, the most notable example being 3dfx with their Voodoo cards. The 1990s also saw the introduction of OpenGL, and quite soon the API influenced hardware development. During the mid-1990s, Microsoft introduced an API similar to OpenGL, called DirectX. This API eventually gained popularity and became the industry standard among Windows game developers.
The next step in GPU evolution came with the introduction of GPGPU (General-Purpose computing on GPUs). The term was coined by Mark Harris in 2002 when he noticed a trend of using the GPU for non-graphics work [12]. The GPGPU evolution is still ongoing, and GPGPU is still not as common as might have been expected.
Even more complex systems can be simulated by creating a new particle
system when each particle dies. The technique of using more than one
particle system was used as early as in the Genesis sequence in Star Trek II.
Up to 400 particle systems consisting of 750 000 particles were used.
3.3 SIMD - Single Instruction Multiple Data
SIMD (Single Instruction Multiple Data), see figure 3.3, is a method that lets one instruction operate on multiple data items at the same time. A single instruction performs the same action (retrieve, calculate or store) simultaneously on two or more pieces of data. Typically, a SIMD machine consists of many simple processors, each with a local memory in which it keeps the data it will work on.
The advantage of SIMD is that, for the cost of issuing a single instruction, N instructions' worth of work is performed. This results in large speedups for data-parallelizable algorithms.
Each processor simultaneously performs the same instruction on its local data, progressing through the instructions in lock-step, with the instructions issued by the controller processor. SIMD concepts are also applicable to GPUs, since they are massively parallel units that are capable of vector operations on wide registers.
Particle systems are in general data-parallel, i.e. the same operation is performed on a large amount of data. Therefore, SIMD programming is beneficial for particle systems.
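The data-parallel pattern can be illustrated with a small model of a four-lane vector unit. Python has no SIMD instructions, so the sketch only mimics the structure of such code: each helper stands in for one vector instruction that treats all lanes identically. The lane count `LANES` and the function names are assumptions; a real implementation would use the vector intrinsics of the target CPU.

```python
LANES = 4  # assumed vector width: four values per "instruction"


def vector_mul_scalar(lanes, scalar):
    """Model of one vector multiply: every lane is scaled by the same value."""
    return [value * scalar for value in lanes]


def vector_add(a, b):
    """Model of one vector add: lane i of a is added to lane i of b."""
    return [x + y for x, y in zip(a, b)]


def integrate_positions(xs, velocities, dt):
    """Compute x + v * dt for all particles, LANES particles at a time."""
    result = []
    for start in range(0, len(xs), LANES):
        x = xs[start:start + LANES]          # load up to LANES positions
        v = velocities[start:start + LANES]  # load up to LANES velocities
        step = vector_mul_scalar(v, dt)      # one multiply for the whole group
        result.extend(vector_add(x, step))   # one add for the whole group
    return result
```

With N particles, the loop issues roughly N / LANES multiplies and adds instead of N of each, which is the source of the speedup described above.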
Figure 3.3: SISD - Single Instruction Single Data. SIMD - Single Instruction Multiple Data.
DirectCompute
DirectCompute is an API from Microsoft that supports general-purpose computing on graphics processing units. It is part of the Microsoft DirectX collection of APIs and was initially released with the DirectX 11 API. DirectCompute allows for vendor-independent development of GPGPU algorithms, which can be used for particle system computations.
Movie Textures
To simulate very complicated particle effects in real time, reality has to be
faked. One way to simulate such an effect, which could be a big explosion,
is to use movie textures.
Movie textures are animated textures that are created from a video file. It
is a technique where a movie is played back on a simple polygon. They can
be used for cut scene movie sequences or to render movies into the scene
itself.
Figure 3.4: Example of a movie texture used in Battlefield 4.
3.4 Sorting Particles
Since particles are often transparent to some degree, they have to be sorted. This guarantees that the order in which the particles are rendered corresponds to their depth. The fundamental operation of many sorting algorithms is to compare two values and swap them if they are out of order. Since sorting is such a fundamental part of particle systems, it is important to conduct experiments to get performance numbers for the different algorithms before making a choice.
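As an illustration of what is being sorted, the sketch below orders particles back to front by their depth along the view direction, which is the usual order for blending transparent geometry. The function name `sort_back_to_front` and the parameters `camera_pos` and `view_dir` are assumptions for this example, and Python's built-in sort stands in for the algorithms compared in the following sections.

```python
def sort_back_to_front(positions, camera_pos, view_dir):
    """Return particle indices ordered from farthest to nearest.

    Depth is the distance from the camera measured along view_dir,
    i.e. the dot product of (position - camera_pos) and view_dir.
    """
    def depth(position):
        return sum(
            (position[k] - camera_pos[k]) * view_dir[k] for k in range(3)
        )

    indices = range(len(positions))
    return sorted(indices, key=lambda i: depth(positions[i]), reverse=True)
```

Returning indices rather than reordered particles keeps the particle data in place, so only a small key list has to be moved during sorting.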
3.4.1 Bubblesort
Bubblesort compares each element to the next element and swaps them if they are out of order. The gap between the two elements being compared is always one. The average and worst-case complexity of Bubblesort is Θ(n^2).
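A minimal sketch of Bubblesort is given below for reference. The function name `bubble_sort` is hypothetical, and the early exit when a pass makes no swap is a common optimization that the description above does not require.

```python
def bubble_sort(values):
    """Return a sorted copy of values using Bubblesort."""
    data = list(values)
    for end in range(len(data) - 1, 0, -1):
        swapped = False
        for i in range(end):
            # Compare neighbours (gap of one) and swap if out of order
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        if not swapped:
            break  # no swap in a full pass: the list is already sorted
    return data
```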
3.4.2 Combsort
Combsort is an extension of Bubblesort. It compares and, if needed, swaps two non-adjacent elements. Performance is drastically improved by comparing values that are far apart, because each value moves toward its final position more quickly. Unlike in Bubblesort, the gap between the two elements being compared can be more than one, which requires a modification of the inner loop of Bubblesort that handles the actual sort. The gap between the two elements is reduced by a shrink factor for each iteration of the outer loop:
[inputSize / shrinkFactor,
inputSize / shrinkFactor^2,
inputSize / shrinkFactor^3,
...]
The initial gap is the length of the list being sorted divided by the shrink factor, and the list is then sorted with that gap. The gap is then divided by the shrink factor again, and the process repeats until the gap is 1. If the list is not fully sorted at this point, the sort continues using a gap of 1 until sorting is completed. The final step is thus equivalent to a Bubblesort, which is efficient at this stage since most out-of-order elements have already been dealt with.
The computational complexity of Combsort approximates Θ(n · log(n)) on average, although its worst case is still quadratic.
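The gap scheme described above can be sketched as follows. The function name `comb_sort` is hypothetical, and the default shrink factor of 1.3 is a commonly used value that is assumed here rather than taken from the text.

```python
def comb_sort(values, shrink_factor=1.3):
    """Return a sorted copy of values using Combsort."""
    data = list(values)
    gap = len(data)
    swapped = True
    # Keep going until the gap has shrunk to 1 and a pass makes no swap
    while gap > 1 or swapped:
        gap = max(1, int(gap / shrink_factor))
        swapped = False
        for i in range(len(data) - gap):
            # Compare elements that are `gap` positions apart
            if data[i] > data[i + gap]:
                data[i], data[i + gap] = data[i + gap], data[i]
                swapped = True
    return data
```

Once the gap reaches 1, the loop body is exactly a Bubblesort pass, which matches the final phase described above.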
3.4.3 Insertion sort
Insertion sort is a simple algorithm that is relatively efficient for small lists and mostly sorted lists (Shellsort is a variant of Insertion sort that is more efficient for larger lists). Because it is so simple, it is often used as part of a more sophisticated algorithm. It builds the final sorted array (or list) one item at a time.
Insertion sort is not as efficient on large lists as more advanced algorithms (Quicksort, Heapsort or Mergesort), but it provides several advantages: it is simple to implement and efficient for small data sets. Insertion sort is also stable, can sort a list as it receives it, and only requires a constant amount, Θ(1), of additional memory space.
Insertion sort consumes one input element in each iteration, and the sorted output list grows. In each iteration, one element is removed from the input data, its location in the output list is found, and it is inserted at that position.
The sorting is typically done in-place by growing the sorted part at the front of the array, behind the position that is currently being iterated. Each array element is compared against the largest value in the sorted part. If the element is larger, it is left in place and the algorithm moves on to the next element. If it is smaller, the algorithm finds the correct position in the sorted part, shifts all larger values one step to make room, and inserts the element at that position.
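The in-place procedure described above can be sketched as follows; the function name `insertion_sort` is hypothetical, and the sketch works on a copy so that the input is left untouched.

```python
def insertion_sort(values):
    """Return a sorted copy of values using Insertion sort."""
    data = list(values)
    for i in range(1, len(data)):
        current = data[i]  # next element to insert into the sorted part
        j = i - 1
        # Shift larger values of the sorted part one step to the right
        while j >= 0 and data[j] > current:
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = current  # insert at the position that was freed
    return data
```

Because only strictly larger values are shifted, equal elements keep their original order, which is what makes the algorithm stable.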
3.4.4 Mergesort
Mergesort, see figure 3.5, is a comparison-based divide and conquer algorithm invented in 1945 by John von Neumann.
Mergesort first divides the unsorted list into the smallest units possible (single elements) and then merges each unit with the adjacent one, i.e. it compares every two elements (the first with the second, the third with the fourth, etc.) and swaps them if the first element should come after the second. The resulting lists of two are merged into lists of four, which are then merged into lists of eight, and so on, until at last two lists are merged into the final sorted list.
Mergesort scales well to very large lists and its worst-case running time is Θ(n · log(n)). In the worst case, Mergesort does ca. 39 % fewer comparisons than Quicksort does in the average case.
Stable sorting in-place is possible but causes the algorithm to be a bit slower, even though it still manages Θ(n · log(n)) time. Stable sorting can be achieved by merging the blocks recursively.
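A minimal sketch of Mergesort is shown below. It uses the top-down recursive formulation rather than the bottom-up pairing described above, and it is not in-place, since it allocates a new list for each merge; both choices, as well as the name `merge_sort`, are assumptions made to keep the example short.

```python
def merge_sort(values):
    """Return a sorted copy of values using top-down Mergesort."""
    data = list(values)
    if len(data) <= 1:
        return data  # a list of zero or one element is already sorted
    middle = len(data) // 2
    left = merge_sort(data[:middle])
    right = merge_sort(data[middle:])

    # Merge the two sorted halves into one sorted list
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # "<=" keeps equal elements in order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```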
Figure 3.5: Mergesort
3.4.5 Quicksort
Quicksort, see figure 3.6, is a divide and conquer algorithm that relies on a partition operation. It is also a comparison sort, since the elements are compared to each other. Quicksort is a fast sorting algorithm with Θ(n · log(n)) complexity on average, which makes it suitable for sorting large data sets.
A pivot, which is one of the array elements, is selected and used for the partition. The pivot splits the array into two sublists: smaller values are moved to the left of the pivot and larger values are moved to the right. This can be done efficiently in linear time and in-place.
The two sublists are then recursively sorted, each sublist getting its own pivot and sublists.
The Quicksort algorithm is suitable for CPU sorting.
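The partition step and the recursion can be sketched as follows. The text does not say how the pivot is chosen; taking the last element of each sublist (the Lomuto partition scheme) is an assumption made for this example, as is the function name `quicksort`.

```python
def quicksort(values):
    """Return a sorted copy of values using Quicksort."""
    data = list(values)

    def partition(low, high):
        pivot = data[high]  # assumed pivot choice: last element
        boundary = low      # everything left of boundary is < pivot
        for i in range(low, high):
            if data[i] < pivot:
                data[boundary], data[i] = data[i], data[boundary]
                boundary += 1
        # Put the pivot between the smaller and the larger values
        data[boundary], data[high] = data[high], data[boundary]
        return boundary

    def sort(low, high):
        if low < high:
            pivot_index = partition(low, high)
            sort(low, pivot_index - 1)   # sublist of smaller values
            sort(pivot_index + 1, high)  # sublist of larger values

    sort(0, len(data) - 1)
    return data
```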
Figure 3.6: Quicksort
3.4.6 Radix Sort
Radix sort is a sorting algorithm that sorts numbers by processing individual digits, which makes it a distribution sort. It is a simple sort that is both easy to understand and easy to use.
LSD (Least Significant Digit) radix sort sorts the numbers by the least significant digit, then by the next least significant digit, and so on. If two numbers have the same digit in the position being sorted, the number that came first must stay before the other. LSD radix sort therefore requires a stable sort for each digit pass, which preserves the relative order established by the previous passes.
Table 3.1: LSD - Sorting by least significant digit.
Unsorted   Sorted by 1s   Sorted by 10s   Sorted by 100s
123        123            123             123
583        583            625             154
154        154            154             456
567        625            456             567
689        456            567             583
625        567            583             625
456        689            689             689
MSD (Most Significant Digit) radix sort does not require the use of a stable sort, and the in-place MSD radix sort is not stable.
While LSD radix sort starts with the least significant digit, MSD radix sort starts with the most significant digit.
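The LSD variant can be sketched as below for non-negative integers. The function name `lsd_radix_sort` and the optional `trace` list, which records the data after each digit pass, are assumptions added so that the passes in Table 3.1 can be reproduced; this is not the SIMD implementation used in the thesis.

```python
def lsd_radix_sort(values, base=10, trace=None):
    """Return a sorted copy of non-negative integers using LSD radix sort.

    If trace is a list, a copy of the data is appended to it after each
    digit pass.
    """
    data = list(values)
    if not data:
        return data
    largest = max(data)
    place = 1  # 1s, then 10s, then 100s, ... for base 10
    while largest // place > 0:
        buckets = [[] for _ in range(base)]
        for value in data:
            digit = (value // place) % base
            # Appending keeps values with equal digits in their
            # current order, which makes each pass stable
            buckets[digit].append(value)
        data = [value for bucket in buckets for value in bucket]
        if trace is not None:
            trace.append(list(data))
        place *= base
    return data
```

Calling the function with the unsorted column of Table 3.1 and an empty `trace` list yields one recorded pass per digit, matching the 1s, 10s and 100s columns.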
3.4.7 Summary
The choice of sorting algorithm was based both on measurements and on the suitability for a SIMD implementation. Since Quicksort is recursive, it was discarded quite early, even though it has good performance characteristics in the single-threaded case. Radix sort was chosen because it allows for a SIMD implementation.
3.5 Third Party Software
3.5.1 RealFlow
RealFlow is used to simulate fluids, water surfaces, fluid-solid interactions, soft bodies, rigid bodies and meshes. It uses particle-based simulations which can be influenced in a multitude of ways by point-based nodes. These nodes can do anything from simulating gravity to recreating the vortex-like motion of a tornado [13].
Large Scale
A cutting-edge hybrid grid/particle solver, Hybrido, provides endless possibilities for large-scale simulations, such as floods or oceans with breaking waves.
RealFlow automatically creates particles by calculating the conditions for splashes, foam and mist. It is possible to generate millions of particles using this advanced feature.
Small-scale simulations
The SPH (Smoothed-Particle Hydrodynamics) solver in RealFlow is ideally
suited for highly-detailed fluid simulations with tiny splashes and turbulent
surfaces.
Particles are generated by emitters, and together they represent the fluid. Each particle can be handled as a point in 3D space with certain properties (such as velocity, position and mass). The emitters can interact with solid or soft bodies and RealWave objects. The emitters can also be completely customized, and it is even possible to write your own fluid engine if desired.
Oceans
When an ocean surface has to be simulated quickly and effectively, RealWave is ideal, since it is a powerful simulation toolset for small to medium ocean surfaces. To achieve certain wave forms and structures, it displaces the vertices of a mesh.
3.5.2 Alembic
Alembic distills complex, animated scenes into non-procedural, application-independent, baked geometric results. Alembic efficiently stores the animated vertex positions and animated transforms that result from an arbitrarily complex animation and simulation process. It does not attempt to store any representation of the network of computations that were required to produce the final animated vertex positions and animated transforms.
Chapter 4
Particle Systems
This chapter will present the concept of a particle system. The chapter will
furthermore describe different types of interaction, simulation and rendering
for particle systems.
4.1 Particle Systems
“In physical sciences, a particle is a small localized object
which can be described by physical properties such as volume or
mass. The word is rather general in meaning, and is refined as
needed by various scientific fields.” [14]
4.1.1 A particle system in general
A particle system is a collection of 3D points in space where each point
represents a single particle. Unlike standard geometry objects, which
are static, particle systems are dynamic. Each particle goes through a complete
life cycle - it is born, changes over time and then dies off. Particle systems
tend to be chaotic since a given particle does not have a pre-determined
path. Each particle can also have a random element, called a stochastic
process, which modifies its behavior and makes the effect look organic and
natural.
There are many use cases for particle systems in games and other computer
graphics applications - water, foam, smoke, dust, fire, blood splashes, sparks,
even hair and cloth simulation. Worth mentioning is that these kinds of sys-
tems also have scientific applications - cosmological simulations use particle
systems with tens of millions of particles for studying the creation of the
universe. Particle simulations are also used in research for large and costly
fusion reactors.
4.1.2 The Particle
When building a particle system, the particles can have a number of
properties. The minimal set of properties is typically:
• The current position as well as the previous position.
• The direction in which the particle is currently traveling (can be
stored in a direction vector).
• The speed of the particle, which can be combined with the direction
vector by multiplication.
Since a particle goes through a lifetime, in addition to the above, the life
count of the particle needs to be stored. This is the number of frames that
the particle has existed which is compared to a set limit on the lifetime of
particles.
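The properties listed above can be sketched as a small C++ structure together with a per-frame step. This is only an illustration: the names MinimalParticle and stepParticle, and the choice to measure speed in units per frame, are assumptions and are not taken from any particular engine.

```cpp
#include <cassert>

// Hypothetical minimal particle; all names are illustrative.
struct Vec3 { float x, y, z; };

struct MinimalParticle {
    Vec3 position;
    Vec3 previousPosition;
    Vec3 direction;   // unit-length travel direction
    float speed;      // assumed to be in units per frame
    int lifeCount;    // number of frames the particle has existed
};

// Advances the particle one frame.
// Returns false once the life count has reached the lifetime limit.
inline bool stepParticle(MinimalParticle& p, int maxLifetimeFrames) {
    assert(maxLifetimeFrames > 0);
    p.previousPosition = p.position;
    // The velocity is the direction vector multiplied by the speed.
    p.position.x += p.direction.x * p.speed;
    p.position.y += p.direction.y * p.speed;
    p.position.z += p.direction.z * p.speed;
    ++p.lifeCount;
    return p.lifeCount < maxLifetimeFrames;
}
```

A caller would typically treat a false return value as the signal to mark the particle as dead.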
4.1.3 Different types of Particle Systems
There are many different types of particle systems. The different kinds
of systems can be categorized by the level of interaction between parti-
cles.
No Interaction
No interaction refers to the fact that the particles are independent of each
other. The computational complexity of such a system is Θ(n) per step and
places non-interacting particle systems as the least computationally complex
type.
for i = 1 to numParticles do
    move particle i
end for
Limited Interaction
This kind of system has particles that interact with neighboring particles,
that is, other particles within a short range. A use case for this type of
interaction would be collision between hard spheres that bounce off each
other.
There are two ways to compute this type of particle system - brute force
or using spatial data structures. The computational complexity for the
brute force approach is Θ(n²). However, by using spatial data structures
and a neighborhood size a, the computational complexity
can be reduced.
for i = 1 to numParticles do
    move particle i
    for all j in neighbours(particle[i]) do
        check collision between particle i and j
    end for
end for
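As an illustration of the spatial data structure approach, the sketch below uses a uniform grid whose cell size equals the interaction radius, so that each particle only has to be tested against the particles in its own and the 26 adjacent cells. The grid, the hashing scheme and the names cellKey and countClosePairs are assumptions chosen for this example; other structures such as octrees or k-d trees serve the same purpose.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Point { float x, y, z; };

// Hypothetical helper: packs an integer cell coordinate into one key
// (21 bits per axis, enough for this illustration).
inline std::int64_t cellKey(int cx, int cy, int cz) {
    const std::int64_t mask = (1 << 21) - 1;
    return ((cx & mask) << 42) | ((cy & mask) << 21) | (cz & mask);
}

// Counts the pairs of points closer than `radius` to each other.
inline int countClosePairs(const std::vector<Point>& pts, float radius) {
    assert(radius > 0.0f);
    auto cellOf = [radius](float v) {
        return static_cast<int>(std::floor(v / radius));
    };
    // Build the grid: cell key -> indices of the points inside that cell.
    std::unordered_map<std::int64_t, std::vector<int>> grid;
    for (int i = 0; i < static_cast<int>(pts.size()); ++i)
        grid[cellKey(cellOf(pts[i].x), cellOf(pts[i].y), cellOf(pts[i].z))].push_back(i);

    int pairs = 0;
    for (int i = 0; i < static_cast<int>(pts.size()); ++i) {
        const int cx = cellOf(pts[i].x), cy = cellOf(pts[i].y), cz = cellOf(pts[i].z);
        for (int dx = -1; dx <= 1; ++dx)
        for (int dy = -1; dy <= 1; ++dy)
        for (int dz = -1; dz <= 1; ++dz) {
            auto it = grid.find(cellKey(cx + dx, cy + dy, cz + dz));
            if (it == grid.end()) continue;
            for (int j : it->second) {
                if (j <= i) continue;  // visit every pair only once
                const float ex = pts[i].x - pts[j].x;
                const float ey = pts[i].y - pts[j].y;
                const float ez = pts[i].z - pts[j].z;
                if (ex * ex + ey * ey + ez * ez < radius * radius) ++pairs;
            }
        }
    }
    return pairs;
}
```

With roughly a particles per neighborhood, the inner loops touch on the order of a candidates per particle instead of all n.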
Full Interaction
In this kind of system all the particles affect each other as occurs with a
gravitational or electrostatic force. The brute force approach is required and
this type of particle system has a computational complexity of Θ(n²).
for i = 1 to numParticles do
    for j = 1 to numParticles where j != i do
        compute interaction between j and i
    end for
    move particle i
end for
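The pseudocode above can be made concrete for the gravitational case. The following sketch accumulates the acceleration on every body from every other body using Newton's law of gravitation. The names Body, Accel and computeGravity are hypothetical, and the softening term is an addition commonly used to avoid division by zero for coincident bodies.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { float x, y, z, mass; };
struct Accel { float x, y, z; };

// Brute-force pairwise gravity: two nested loops over all bodies.
inline std::vector<Accel> computeGravity(const std::vector<Body>& bodies,
                                         float G, float softening) {
    assert(softening >= 0.0f);
    std::vector<Accel> acc(bodies.size(), Accel{0.0f, 0.0f, 0.0f});
    for (std::size_t i = 0; i < bodies.size(); ++i) {
        for (std::size_t j = 0; j < bodies.size(); ++j) {
            if (i == j) continue;
            const float dx = bodies[j].x - bodies[i].x;
            const float dy = bodies[j].y - bodies[i].y;
            const float dz = bodies[j].z - bodies[i].z;
            const float r2 = dx * dx + dy * dy + dz * dz + softening * softening;
            // a_i += G * m_j * d / |d|^3
            const float s = G * bodies[j].mass / (r2 * std::sqrt(r2));
            acc[i].x += dx * s;
            acc[i].y += dy * s;
            acc[i].z += dz * s;
        }
    }
    return acc;
}
```

The accelerations are computed for all bodies before any body is moved, so every interaction in a step sees the same positions.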
Particle Data Structure
All the properties are stored in a structure of some kind, and if a more
complex particle is wanted, more properties can simply be added.
Possible additions include a size (to animate the size of the particles),
a mass, or transparency in the form of an alpha component in the color.
A restriction on the particle structure is that its size should be kept as small
as possible, so that huge numbers of particles can be handled with
reasonable amounts of memory.
struct Particle
{
    Vector position;
    Vector velocity;
    float mass;
    // ...
};
4.1.4 The Emitter
Once a particle system is created, it has to be placed in the world to
make any sense. The particle emitter is the entity responsible
for creating the particle system, and it is this object that is placed in a 3D
world.
The number of particles and the general direction in which they should be
emitted as well as all the global settings are controlled by the emitter. This
could for example be the above mentioned lifetime setting for the parti-
cles.
4.1.5 Simulation
For the particles to be visually interesting, they have to move. There are
different ways to move the particles, either through real-time simulation
or off-line simulations that are baked in different formats and played back
later.
Physics
The physics model in a particle system could handle attributes such as the
mass of the particle which could be randomized, causing gravity to affect
each particle individually. Friction could be added to force some particles to
slow down while animating. Other local spatial effects such as wind gusts,
magnetic fields and rotational vortexes would make the particles stand out
from each other even more. Collision handling can also be added to the
particles to have them interacting with the surrounding world.
External Influences
It is important to consider all of the possible parameters that might be
wanted when creating a particle system and build that flexibility into the
system.
With wind as a parameter there might be a need to change the wind
direction vector. For example, when a car drives on a snowy road, the
snowflakes are affected by a new wind direction generated by the car and
the snow responds to the wind as the car passes.
Updating the Particles
For each cycle of the simulation, each particle needs to be updated. To make
sure not to waste valuable time, the status of the particle is inspected to see
if the lifetime of the particle has expired. If the particle is marked as dead it
is removed from the emitter and returned to the global particle pool.
Point Cloud Vertex Animation
To enable complex animations, for example facial animations, point clouds
can be used. Point clouds are pre-rendered particle data that is exported
and played back later. This data is also attached to underlying vertex data
in a mesh, resulting in vertices being "skinned" to the point cloud data.
This way, it is possible to create detailed animations with the help of
particles. In this case, the particles are only used indirectly and not rendered
on screen.
4.1.6 Rendering
There are many ways to render a particle system depending on what the
system represents.
In a violent game with a lot of blood splatter, there might be a need for
multiple blood systems - blood pool, blood splat, blood squirt and camera
lens blood splat. Each blood system contains suitable particles, all requiring
their own rendering technique, creating a chain of effects that together
produce the final effect. The blood squirt system would render blood squirts
flying through the air, and when a squirt collides with an object (a wall, the
ground, etc.) the blood splat function would be called. This would create
messy blood splats on the object.
Since a particle system is basically a collection of 3D points in space, it can
be rendered as just that - a set of colored 3D points. There is always the
option to calculate a polygon around the 3D point which always faces the
camera like a billboard. Perspective can be created by scaling the polygon
with the distance from the camera. Another option is to draw a 3D object of
any type at the position of the particle - the possibilities are endless.
As said above, the particles can be represented by a polygon when rendering.
This polygon is most often a quad, i.e. four vertices.
// Corners of a unit quad in the particle's local XY plane
Vector vertices[] =
{
    Vector(1.f, 1.f, 0.f),
    Vector(-1.f, 1.f, 0.f),
    Vector(-1.f, -1.f, 0.f),
    Vector(1.f, -1.f, 0.f)
};
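To make such a quad face the camera like a billboard, its corners can be expressed in terms of the camera's right and up axes instead of the world axes. The sketch below is a minimal illustration of that idea. The function name billboardCorners and its parameters are hypothetical, and the camera axes are assumed to be unit length and given in world space.

```cpp
#include <array>
#include <cassert>

struct V3 { float x, y, z; };

// Returns the four world-space corners of a camera-facing quad
// centered on the particle, in the same corner order as the unit quad above.
inline std::array<V3, 4> billboardCorners(const V3& center, const V3& camRight,
                                          const V3& camUp, float halfSize) {
    assert(halfSize > 0.0f);
    const float sx[4] = { 1.f, -1.f, -1.f,  1.f };
    const float sy[4] = { 1.f,  1.f, -1.f, -1.f };
    std::array<V3, 4> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = V3{
            center.x + (camRight.x * sx[i] + camUp.x * sy[i]) * halfSize,
            center.y + (camRight.y * sx[i] + camUp.y * sy[i]) * halfSize,
            center.z + (camRight.z * sx[i] + camUp.z * sy[i]) * halfSize };
    }
    return out;
}
```

Because the quad is spanned by the camera's own axes, it stays parallel to the view plane regardless of where the particle is located.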
4.1.7 Performance
There has always been a great difference between the spectacular effects seen
in games and those seen in movies, and until recently this has mostly been due
to hardware limitations. There is simply no way for a PlayStation 3 to render
millions of fluid particles, skyscrapers being blown into millions of pieces
or a completely flooded city.
To get an accurate and realistic simulation, most use cases require a large
number of particles. The more particles, the higher the computational cost,
which has a fundamental impact on performance and hence limits the
size of the particle system.
Games need to have smooth animation and responsive interaction, therefore
fast execution times are required. As the need for larger and more realistic
particle systems increases, so does the computational complexity.
Before the introduction of GPGPU computations and the game physics en-
gine PhysX, fluid simulations in games did not use particle systems. Today
substantial real-time fluid simulations can be performed.
Memory
As mentioned in section 4.1.3, it is important that the memory requirements
for a single particle are kept to a minimum to be able to have huge numbers
of particles. To further manage the memory requirements for a particle
system, it is common to use a pool for the particle memory. Since the
pool will consist of blocks with the same size, managing fragmentation also
becomes easier.
Memory operations such as allocations and releases should be kept to a
minimum due to the performance overhead inherent in the required
context switches. If a particle gets old and dies, its memory should not be
released. The particle should instead be flagged as dead so that its slot
can be re-initialized for a new particle. Not until all the particles in the
particle system are marked as dead is the memory allocated for the entire
system released. This is done by using the pool design mentioned above:
all memory in the fixed-size pool is pre-allocated, and a slot is just flagged
as free when its particle is released.
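A fixed-size pool of this kind could look roughly as follows. This is a minimal sketch under assumptions: the class name ParticlePool, the free-list bookkeeping and the 16-byte PooledParticle layout are illustrative and do not describe the pool used in any specific engine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PooledParticle { float x, y, z, scale; };

// All memory is allocated once in the constructor. Acquiring and
// releasing a particle only updates flags and a free list.
class ParticlePool {
public:
    explicit ParticlePool(std::size_t capacity)
        : slots_(capacity), alive_(capacity, false) {
        freeList_.reserve(capacity);
        for (std::size_t i = capacity; i > 0; --i)
            freeList_.push_back(static_cast<int>(i - 1));
    }

    // Returns the index of a re-initialized slot, or -1 if the pool is full.
    int acquire() {
        if (freeList_.empty()) return -1;
        const int idx = freeList_.back();
        freeList_.pop_back();
        alive_[static_cast<std::size_t>(idx)] = true;
        slots_[static_cast<std::size_t>(idx)] = PooledParticle{};
        return idx;
    }

    // Flags the slot as dead; no memory is returned to the system.
    void release(int idx) {
        assert(idx >= 0 && static_cast<std::size_t>(idx) < slots_.size());
        assert(alive_[static_cast<std::size_t>(idx)]);
        alive_[static_cast<std::size_t>(idx)] = false;
        freeList_.push_back(idx);
    }

    std::size_t liveCount() const { return slots_.size() - freeList_.size(); }

private:
    std::vector<PooledParticle> slots_;
    std::vector<bool> alive_;
    std::vector<int> freeList_;
};
```

Since every slot has the same size, a released slot can always be reused by the next particle, which is why fragmentation is not an issue.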
Rendering
When a particle, a 3D point in space, is supposed to correspond to for ex-
ample a snowflake, the image of the snowflake has to be drawn on a polygon.
A particle most likely needs four vertices which creates two polygons. Thus,
with 3000 visible snowflake particles, 6000 visible polygons are added for
the snow alone. Since most particles in a particle system move, the vertex
buffer cannot be pre-calculated and needs to be changed every frame. Most
particle rendering methods also use hardware instancing to lower the CPU
Chapter 5
Method
This chapter will describe in detail how the implementation of a pre-simulated
particle system was made in Frostbite. The chapter will describe implemen-
tation of pipelines, streaming and rendering for the particle system.
5.1 Introduction
An implementation of a pre-simulated streaming particle system was made
in the Frostbite game engine. It reads particles pre-simulated in RealFlow,
exported as Alembic files. The system then uses a custom streaming solution
to minimize memory usage. A basic sorting algorithm and rendering system
for the streamed particles were also implemented. The data flow for the
system is illustrated in figure 5.1.
5.2 Alembic importer
The first step of creating a ParticleStream is authoring the simulation. This
is done inside the software RealFlow (any software capable of exporting
Alembic files could be used) and the result is exported to an Alembic file.
Alembic is an open source exchange format for digitally created assets,
developed by Sony Imageworks. The format supports high-level constructs such
as meshes and points (particles). It is also possible to access the low-level
parts of an archive to attach and store any data such as particle scales,
colors and vertex colors for meshes.
Figure 5.1: Figure illustrating the data flow from artist creation to Frostbite runtime.
To get the particle data into the Frostbite asset pipeline, a custom importer
was written that reads particle data from an Alembic file and stores it in-
side a custom binary file in the Frostbite asset pipeline, called a sandbox
file.
5.2.1 Particle information
The particle information contained in the Alembic file format is organized in
a Point structure inside Alembic. This structure contains information about
the position and has a numerical identifier for each particle. As stated above,
it is also possible to access lower-level constructs of the Alembic archive to
attach arbitrary data to each particle.
5.3 Pipeline
After the particle data has been read from the Alembic file, the pipeline
processes the custom binary file to create streamable chunks of particle
data. These chunks are 2 MB each, which is the standard chunk size for free
streaming chunks in Frostbite. The pipeline goes through all frames in the
imported particle simulation and writes them into the chunks in the
following format:
[position.x position.y position.z scale]
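The chunking step can be illustrated with the following sketch, which treats the particle records as one raw byte stream and cuts it into fixed-size chunks. Several details are assumptions made for the example: that each component is a 32-bit float (16 bytes per particle), that 2 MB means 2 * 1024 * 1024 bytes, and that no per-frame header is stored. The names PackedParticle and packIntoChunks are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed record layout: [position.x position.y position.z scale] as floats.
struct PackedParticle { float x, y, z, scale; };
static_assert(sizeof(PackedParticle) == 16, "expected a tightly packed record");

constexpr std::size_t kChunkSizeBytes = 2 * 1024 * 1024;  // assumed 2 MB
constexpr std::size_t kParticlesPerChunk = kChunkSizeBytes / sizeof(PackedParticle);

// Splits the particle records of all frames into chunks of raw bytes.
// Only the last chunk may be smaller than kChunkSizeBytes.
inline std::vector<std::vector<std::uint8_t>> packIntoChunks(
        const std::vector<PackedParticle>& particles) {
    const std::size_t totalBytes = particles.size() * sizeof(PackedParticle);
    const auto* src = reinterpret_cast<const std::uint8_t*>(particles.data());
    std::vector<std::vector<std::uint8_t>> chunks;
    for (std::size_t offset = 0; offset < totalBytes; offset += kChunkSizeBytes) {
        const std::size_t n = std::min(kChunkSizeBytes, totalBytes - offset);
        chunks.emplace_back(src + offset, src + offset + n);
    }
    assert(chunks.size() * kChunkSizeBytes >= totalBytes);
    return chunks;
}
```

Under these assumptions a chunk holds 131072 particles, so a frame with more particles than that necessarily spans several chunks.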
5.4 Runtime entity
To be useful in the game, there must be a way to place the particle simula-
tion in the game world. This is made possible by the implementation of a
ParticleStreamEntity which has a world space transform. This world space
transform is used to transform all the particles in the stream from object
space to world space.
5.5 Streaming
The amount of memory required to keep all the particles for the whole
simulation in main memory is too large and some sort of streaming is needed.
The above mentioned chunks are used for this streaming.
5.5.1 Streaming Cache
The particle simulation is then created at runtime and its meta-data is
read. This meta-data stores how many particles there are per frame on
average in the imported file and how many streaming chunks there are. The
meta-data is then used to determine how many chunks need to be kept
in memory. For a simulation with a higher frame rate, more chunks naturally
need to be kept in memory. The number of chunks needed also depends on
the average number of particles in a frame. The chunk cache is allocated as
a large array for maximum data locality.
The algorithm to calculate the cache size is as follows:
s = (t · fps · T) / p    (5.1)
where s is the size of the cache expressed in streaming chunks, t is the target
time to have in cache in seconds, fps is the frame rate of the simulation, T
is the max number of particles in any frame in the simulation and p is the
number of particles per streaming chunk.
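Equation 5.1 can be evaluated as in the sketch below. Rounding the result up to a whole number of chunks and never returning fewer than one chunk are assumptions added here, since the equation itself yields a real number. The function name cacheSizeInChunks is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// s = t * fps * T / p, rounded up to whole chunks (assumption).
inline std::uint32_t cacheSizeInChunks(double targetSeconds, double fps,
                                       std::uint64_t maxParticlesPerFrame,
                                       std::uint64_t particlesPerChunk) {
    assert(particlesPerChunk > 0);
    assert(targetSeconds >= 0.0 && fps >= 0.0);
    const double chunks = targetSeconds * fps *
                          static_cast<double>(maxParticlesPerFrame) /
                          static_cast<double>(particlesPerChunk);
    return std::max<std::uint32_t>(1u, static_cast<std::uint32_t>(std::ceil(chunks)));
}
```

As a worked example with made-up numbers: t = 2 s, fps = 30, T = 500000 and p = 131072 give 2 · 30 · 500000 / 131072 ≈ 228.9, that is 229 chunks after rounding up.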
Each chunk in the chunk cache has a state which can be one of loading, empty
and ready. The streaming process is then controlled by a state machine
described in section 5.5.2.
5.5.2 Streaming State Machine
A state machine (rather a finite state machine) is a system that can, at any
given time, be in exactly one of a pre-defined set of states as shown in figure
5.2.
The particle stream is updated once per frame. In this update, the streaming
state machine is updated. The update goes through all chunks in the chunk
cache and takes the appropriate action depending on their state.
If a chunk is currently loading, no action is taken. Chunks that have finished
loading from disk are set to the ready state. To avoid putting too big a load
on the IO system, only one load is active at any given time. This means that
whenever a finished load is detected, a new one can be started if needed.
This request will always be created for the next chunk unless the previously
loaded chunk is the last chunk in the animation. Also, when a chunk has
been rendered completely, its state is set to empty, meaning it is
ready to store a new chunk. In this way, the chunk cache acts as a ring
buffer, which is illustrated in figures 5.3 and 5.5.
Figure 5.2: Figure illustrating a (finite) state machine with states and state transitions.
Figure 5.3: Chunks have three states - empty (E), loading (L) and ready (R). F0, F1 and Fx show frame 0, 1 and x.
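The per-frame update of the streaming state machine can be sketched as follows. This is a simplified illustration and not the Frostbite implementation: the structure ChunkCacheStates, its fields and the loadFinished flag (standing in for whatever the IO layer reports) are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ChunkState { Empty, Loading, Ready };

// Hypothetical bookkeeping for the chunk cache.
struct ChunkCacheStates {
    std::vector<ChunkState> states;  // one state per cache slot
    std::size_t writeSlot = 0;       // next slot to fill, in ring order
    int nextChunkToLoad = 0;         // index of the next chunk in the animation
    int totalChunks = 0;             // number of chunks in the whole animation
    bool loadInFlight = false;       // at most one load is active at a time
};

// One update of the state machine, called once per frame.
inline void updateStreaming(ChunkCacheStates& c, bool loadFinished) {
    assert(!c.states.empty());
    // A finished load turns the loading chunk into a ready chunk.
    if (c.loadInFlight && loadFinished) {
        for (ChunkState& s : c.states)
            if (s == ChunkState::Loading) s = ChunkState::Ready;
        c.loadInFlight = false;
    }
    // Start a new load only if none is active, chunks remain in the
    // animation and the next slot in the ring is empty.
    if (!c.loadInFlight && c.nextChunkToLoad < c.totalChunks &&
        c.states[c.writeSlot] == ChunkState::Empty) {
        c.states[c.writeSlot] = ChunkState::Loading;
        c.writeSlot = (c.writeSlot + 1) % c.states.size();
        ++c.nextChunkToLoad;
        c.loadInFlight = true;
    }
}
```

Setting a slot back to Empty after its chunk has been rendered is left to the reading code, which is what lets the cache behave as a ring buffer.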
5.5.3 Reading from the Cache
The cache is essentially a ring buffer. This means that there are methods
to be able to transparently start reading from the start of the buffer when
the end is reached as illustrated in figure 5.4.
Figure 5.4: A ring buffer is a standard array in memory but is conceptually treated as a ring.
The contents of the buffer are treated as raw bytes, just as the chunks are. This
means that particle information can be split over chunks, as illustrated in
figure 5.5, and it can also be the case that one particle starts at the end of
the ring buffer and the rest of the information is in a chunk placed at the
beginning. To allow for this, a given frame is read from the ring buffer into
an intermediate buffer where particle information is aligned to not be split
in memory. The reading from the ring buffer into this intermediate buffer is
handled by support routines. Example code for these support routines is
given below.
void incrementAndWrap(u32 bytesToIncrement, u8* const chunkCache)
{
    currReadPos += bytesToIncrement;
    // u8 is an 8-bit unsigned integer
    u8* const chunkCacheEnd =
        chunkCache + (CHUNK_SIZE_BYTE * chunkCount);

    // Cross chunk border? If so, the chunk just left is free again.
    if (currReadPos - chunkCache >=
        (activeChunkIndex + 1) * CHUNK_SIZE_BYTE)
    {
        chunkStates[activeChunkIndex] = LoadingState_Empty;
        activeChunkIndex = (activeChunkIndex + 1) % chunkCount;
    }

    // Wrap around to the start of the ring buffer
    if (currReadPos >= chunkCacheEnd)
        currReadPos = chunkCache + (currReadPos - chunkCacheEnd);
}
Figure 5.5: Particle information can be split over chunks. Fx and Fy show frame x and y.
Below is the algorithm for handling reads from the cache into the interme-
diate buffer described above.
void safeRead(void* dest, size_t size)
{
    u8* const chunkCache = static_cast<u8*>(m_chunkCache);
    u8* destPtr = static_cast<u8*>(dest);
    u32 bytesToRead = static_cast<u32>(size);
    while (bytesToRead > 0)
    {
        // Read a safe amount of bytes
        // s64 is a 64-bit signed integer
        s64 rem = max<s64>(chunkCount * CHUNK_SIZE_BYTE -
            (currReadPos - chunkCache), 0);
        // u32 is a 32-bit unsigned integer
        u32 remainingBytesInCache = static_cast<u32>(rem);
        u32 bytesRead = min(remainingBytesInCache, bytesToRead);
        memoryCopy(destPtr, currReadPos, bytesRead);
        destPtr += bytesRead;
        incrementAndWrap(bytesRead, chunkCache);
        // Do we have bytes left to read?
        bytesToRead -= bytesRead;
    }
}
5.6 Rendering
As with most particle systems, the particles are rendered as screen-aligned
quads. Since the buffer at this stage contains information about particles
in the form [pos.x pos.y pos.z scale] it contains everything needed to
place the quads at the correct world space location.
To handle rendering of the particles, a separate ParticleStreamRenderer
was created. This renderer is responsible for creating the needed GPU re-
sources and copying particle buffers into their GPU counterparts. A buffer
with fixed size is created for all particles and in the beginning of each frame
the renderer calls the ParticleStream with the GPU buffer as an argument.
The ParticleStream then copies the internal CPU buffer over to the GPU
buffer sent in as an argument.
The particles are rendered with hardware instancing on the platforms where
it is applicable. The instancing uses the per-instance data for each particle
from the buffer mentioned above. The fixed quad is transformed according to
the world space position and also scaled according to the embedded particle
scale.
5.7 Sorting
Particle systems that use additive or multiplicative blending can be
rendered in any order. There are, however, particle systems where an ordering
needs to be imposed on the system. These particle systems require sorting.
One reason to sort particles is visual correctness. In cases where a
non-commutative blending mode is used, such as alpha blending, sorting
is needed to ensure the correct order of operations: blending must happen
in back-to-front order.
Rendering alpha-blended particles in the wrong order is extremely noticeable
in motion as the particle system loses all sense of shape. Whatever
sorting algorithm is used, it has to be applied to the particle data.
5.7.1 Sorting using a key-index stream
One way to sort the particles is to produce a key-index pair for each particle.
The key contains the value on which the sorting acts (the distance to the
viewer) and the index simply points out the position of the particle in the
particle stream.
The approach results in less bandwidth usage and better cache coherency
since there is no need to fetch a lot of data.
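A key-index stream of this kind can be sketched as follows. The example uses the squared distance to the viewer as the key, which gives the same ordering as the distance itself, and std::sort as a stand-in for whichever sorting algorithm is preferred. The names SortEntry, ViewPos and buildBackToFrontOrder are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

struct ViewPos { float x, y, z; };

// One small entry per particle: the sort key and the particle's index.
struct SortEntry {
    float key;            // squared distance to the viewer
    std::uint32_t index;  // position of the particle in the particle stream
};

// Returns the entries ordered back to front (farthest particle first).
inline std::vector<SortEntry> buildBackToFrontOrder(
        const std::vector<ViewPos>& positions, const ViewPos& viewer) {
    assert(positions.size() <= std::numeric_limits<std::uint32_t>::max());
    std::vector<SortEntry> entries;
    entries.reserve(positions.size());
    for (std::uint32_t i = 0; i < positions.size(); ++i) {
        const float dx = positions[i].x - viewer.x;
        const float dy = positions[i].y - viewer.y;
        const float dz = positions[i].z - viewer.z;
        entries.push_back(SortEntry{dx * dx + dy * dy + dz * dz, i});
    }
    std::sort(entries.begin(), entries.end(),
              [](const SortEntry& a, const SortEntry& b) { return a.key > b.key; });
    return entries;
}
```

Only the 8-byte entries are moved during sorting. The particle data itself is read once through the sorted indices when the particles are drawn.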
5.7.2 Sorting the Particle Stream directly
Another approach is to sort the particle stream directly without using a
key-indexed stream.
The downside of this approach is that the sorting itself has to read and write
the amount of data twice resulting in worse cache performance. The sorting
metric (the distance) needs to be computed for every sorting pass rather
than just doing it once per frame.
5.7.3 CPU sorting
The first implementation was a simple insertion sort as a proof of concept.
This sorting algorithm has quadratic worst-case time complexity, so something
faster is needed. A radix sort algorithm is what is currently used due to its
suitability for a SIMD implementation as described in section 3.4.6.
GPU sorting could also be used but was outside the scope of this
implementation.
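For reference, a scalar least-significant-digit radix sort over key-index pairs is sketched below. It is not the SIMD variant referred to above and is not taken from the implementation. It relies on the fact that the bit patterns of non-negative IEEE 754 floats sort in the same order as the floats themselves. The names KeyIndex, floatToSortableKey and radixSortAscending are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct KeyIndex {
    std::uint32_t key;    // sortable bit pattern of the distance
    std::uint32_t index;  // position of the particle in the stream
};

// Non-negative floats keep their ordering when reinterpreted as integers.
inline std::uint32_t floatToSortableKey(float nonNegative) {
    assert(nonNegative >= 0.0f);
    std::uint32_t bits;
    std::memcpy(&bits, &nonNegative, sizeof bits);
    return bits;
}

// Four stable counting passes of 8 bits each, lowest byte first.
inline void radixSortAscending(std::vector<KeyIndex>& items) {
    std::vector<KeyIndex> scratch(items.size());
    for (int shift = 0; shift < 32; shift += 8) {
        std::array<std::size_t, 257> offsets{};
        // Histogram: offsets[d + 1] counts the items with digit d.
        for (const KeyIndex& it : items)
            ++offsets[((it.key >> shift) & 0xFF) + 1];
        // Prefix sum: offsets[d] becomes the first output slot for digit d.
        for (int d = 0; d < 256; ++d)
            offsets[d + 1] += offsets[d];
        // Scatter the items into their buckets, preserving their order.
        for (const KeyIndex& it : items)
            scratch[offsets[(it.key >> shift) & 0xFF]++] = it;
        items.swap(scratch);
    }
}
```

The result is in ascending order, so a back-to-front renderer would traverse it from the end.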
Chapter 6
Result
The result of the implementation is a pipeline that can import Alembic
files containing pre-simulated particle data into Frostbite. Furthermore, a
real-time free-streaming solution for runtime was implemented. A simple
renderer was also implemented for the particle streams.
6.1 Pipeline results
The pipeline implementation reads the Alembic file frame
by frame. Since there are no real-time requirements on the pipeline, it does
not have to be very efficient and in many cases the simpler solution was chosen
during this thesis. The pipeline is still very fast and performance only
becomes a problem with very large datasets.
6.2 Runtime Results
The runtime implementation consisted of handling streaming of the particle
stream chunks. This procedure is described in chapter 5. The solution to
the streaming problem is the most advanced part of this work and works
very well with good performance. Sorting and rendering the particles are
also part of the runtime implementation. Sorting is currently implemented
with a simple quicksort algorithm.
The rendering is implemented with instancing on supported platforms which
gives good performance even with large amounts of objects. The rendering
implementation is currently missing proper lighting and shading support due
to time running out during the implementation phase. The rendering part
of the implementation is also where the bottleneck lies, because
alpha-blended geometry is expensive to render due to
overdraw costs.
6.3 Workflow Results
The workflow for placing a particle stream in the world is straightforward.
First, the particle stream data is exported in Alembic format from any
authoring tool wanted. This Alembic file is then imported into FrostEd (the
Frostbite editor). When this import is done, the particle stream pipeline
runs and stores the particle stream data in an internal format (described in
chapter 5) inside the game database. The artists then place a special particle
stream entity somewhere in the world and attach a particle stream asset to
it. The particle stream entity can then be controlled by the visual scripting
language in Frostbite to start, stop and in other ways control the playback
of the particle stream. The particle stream is then streamed and played
back in runtime.
This workflow is streamlined apart from the fact that the authoring of con-
tent happens outside Frostbite and the iteration times for making small
changes can be large.
6.4 Screenshots
This section presents four screenshots (figures 6.1, 6.2, 6.3 and 6.4) from the
result of the streaming implementation. These screenshots do not represent
any finished lighting or shading.
The screenshots also show the debug rendering used to debug the streaming
process. This debug rendering illustrates the size of the buffer and each
chunk is shown as a block. The color of a block illustrates the status of
the block and there is also a marker to show the current position that is
read from in the buffer. Furthermore, some statistics on the current particle
stream are shown.
Figure 6.1: Screen capture showing a particle stream simulation and the related debugging tools.
6.5 Performance
The particle stream pipeline performs well and the performance is largely
dependent on the size of the data set. The particle stream pipeline currently
does no compression which will be a problem for more practical use-cases
where the amount of particles can be very large.
The runtime implementation also runs at good performance. The rendering
is implemented with instancing on supported platforms and can handle a
lot of particles since it never handles single particles. The streaming is also
fast and the major cost for the runtime implementation is still sorting and
rendering the particles in the stream.
Figure 6.2: Screen capture showing a particle stream simulation where time has passed since figure 6.1.
Figure 6.3: Screen capture showing a combined CinemaStream and ParticleStream simulation.
Figure 6.4: Screen capture showing a combined CinemaStream and ParticleStream simulation.
Experimentation with a GPU-based sorting algorithm was made, but due
to shortage of time it was cancelled.
When it comes to memory usage, the runtime implementation has pre-
dictable memory overhead. This is since the size of the cache is static and
the memory usage is thus also static. This characteristic is desirable for
streaming buffers in general. However, even though the size of the stream-
ing buffer is constant during streaming it is not constant for all particle
streams. There is logic for deciding the size of the particle streaming buffer
(described in chapter 5).
Chapter 7
Discussion
To summarize, I think the implementation turned out well and uses good
schemes for handling runtime streaming and rendering.
I am very satisfied with the simplicity and efficiency of the implementation.
The ring-buffer used for caching streamed chunks makes the performance
overhead very predictable. Furthermore, since it contains heuristics for de-
termining the size, the intent is that it should work well without manual
tuning of buffer sizes.
The pipeline implementation is also simple and turned out to work well. It
basically consists of reading an Alembic archive and then packing the data
into runtime chunks for streaming. The simplicity of the implementation
makes it easy to plug in compression at a later point.
The workflow is also a part that turned out to work well. I think this
is mostly since the concepts are simple. A ParticleStream simulation is
simply placed somewhere in the world as a game entity and everything
in the simulation is then transformed in relation to the transform of the
ParticleStream entity.
A presimulated particle system has a lot of advantages in game applica-
tions and the only real drawback is that there is no gameplay control of the
particle system. This is quite a big drawback though and it can be hard
to fit a presimulated particle system in a massively dynamic world. The
biggest advantage is performance, since a CPU-based particle system with
gameplay elements is orders of magnitude heavier than playing back
pre-simulated particles.
7.1 Future Improvements
To make ParticleStream production-ready, a few key improvements have to be
made. Most notably, compression and support for more advanced lighting
have to be implemented.
7.1.1 Compression
In the current implementation, the particle data is stored uncompressed in
the Frostbite data storage. This is not sustainable for real-world use cases
that use gigabytes of data. The idea is that the ParticleStream system
would be integrated in the more general CinemaStream compression infras-
tructure in the future.
7.1.2 Lighting
What is obviously lacking in the implementation is proper lighting which
had to be skipped due to time constraints. Proper lighting of the particles
was not really the goal for this work. The work was instead focused around
the problem of getting source data into the Frostbite runtime and to be
able to utilize streaming efficiently. The goal was also to create an efficient
implementation that could handle large amounts of data.
The particles are simply rendered as alpha-tested quads with regular shaders
and support for artist-created custom shaders. One area in particular where
the lighting has to be improved is the shadowing. This is since volume
shading is a key aspect of high quality particle rendering.
Shadowing
The most obvious way to achieve shadowing for a particle system is to use
volume rendering techniques. To be able to do this, it is required that
the particles are converted into a discrete volume representation by being
rasterized into a 3D volume (voxelization). Once the particle system is
represented as a volume, there are several ways to render it with shadowing.
Self-Shadowing
There has been a lot of research in this area. Some of the topics are Fourier
Shadow Mapping, Half-Angle Slice Rendering, Opacity Shadow Maps, but
none of them are scalable enough to use in large-scale in-game scenarios.
They are, however, perfect for cut-scenes and contained environments.
Half-Angle Slice Rendering
The key idea of half-angle slice rendering is to calculate a vector which is half
way between the light direction and the view direction. The volume is then
rendered as a series of slices perpendicular to this half-angle vector.
The half-angle vector enables the possibility of rendering the same slices
from both the light’s and camera’s point of view, since they will be facing
towards both. As a result of this the shadowing can be accumulated from
the light at the same time as the slices are being blended.
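The half-angle vector itself is cheap to compute, as the sketch below shows. It follows one common formulation of half-angle slicing in which the view vector is negated when the light and the viewer are on opposite sides of the volume. That sign convention, the function name halfAngleVector and the viewInverted flag are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cmath>

struct Dir { float x, y, z; };

inline Dir normalized(const Dir& d) {
    const float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    assert(len > 0.0f);
    return Dir{d.x / len, d.y / len, d.z / len};
}

// toLight and toViewer are assumed to be unit vectors pointing from the
// volume towards the light and towards the camera.
// viewInverted (optional) reports whether the view vector was negated,
// in which case the slices would be traversed in the opposite order.
inline Dir halfAngleVector(const Dir& toLight, const Dir& toViewer,
                           bool* viewInverted) {
    const float d = toLight.x * toViewer.x + toLight.y * toViewer.y +
                    toLight.z * toViewer.z;
    const bool invert = d < 0.0f;
    const float s = invert ? -1.0f : 1.0f;
    if (viewInverted) *viewInverted = invert;
    return normalized(Dir{toLight.x + s * toViewer.x,
                          toLight.y + s * toViewer.y,
                          toLight.z + s * toViewer.z});
}
```

Negating the view vector in the second case keeps the angle between the two summed vectors at or below 90 degrees, so the sum never degenerates to the zero vector.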
The main advantage of this technique is that it only requires a single 2D
shadow buffer.
Opacity Shadow Maps
Opacity Shadow Maps sample visibility at regular intervals and there are
numerous variants optimized to handle special cases such as hair. The
method is also suitable for generation of self-shadows in discontinuous
volumes with explicit geometry, such as fur and foliage, but continuous volumes
such as smoke and clouds may also benefit from the approach.
With a set of planar opacity maps the light transmittance inside a complex
volume is approximated.
A volume made of standard primitives (points, lines, and polygons) is sliced.
The volume is then rendered with graphics hardware to each opacity map
that stores alpha values instead of traditionally used depth values.
For each primitive point, the alpha values sampled in the enclosing maps
are interpolated for shadow computation.
The algorithm is memory efficient and extensively exploits existing graphics
hardware.
Bibliography
[1] Frostbite. http://www.frostbite.com, 2014. Accessed: 2014-01-18.
[2] A. Kolb, L. Latta, and C. Rezk-Salama. Hardware-based simula-
tion and collision detection for large particle systems. In Proceedings
of the ACM SIGGRAPH/EUROGRAPHICS Conference on Graphics
Hardware, HWWS ’04, pages 123–131, New York, NY, USA, 2004.
ACM. ISBN 3-905673-15-0. doi: 10.1145/1058129.1058147. URL
http://doi.acm.org/10.1145/1058129.1058147.
[3] Peter Kipfer, Mark Segal, and Rudiger Westermann. Uberflow:
A GPU-based particle engine. In Proceedings of the ACM
SIGGRAPH/EUROGRAPHICS Conference on Graphics Hardware,
HWWS ’04, pages 115–122, New York, NY, USA, 2004. ACM. ISBN
3-905673-15-0. doi: 10.1145/1058129.1058146. URL
http://doi.acm.org/10.1145/1058129.1058146.
[4] Andreas Kolb and Nicolas Cuntz. Dynamic particle coupling for
GPU-based fluid simulation. In Proc. of the 18th Symposium on Simulation
Technique, pages 722–727, 2005.
[5] Mark Kim, Guoning Chen, and Charles Hansen. Dynamic particle
system for mesh extraction on the GPU. In Proceedings of the 5th
Annual Workshop on General Purpose Processing with Graphics
Processing Units, GPGPU-5, pages 38–46, New York, NY, USA, 2012.
ACM. ISBN 978-1-4503-1233-2. doi: 10.1145/2159430.2159435. URL
http://doi.acm.org/10.1145/2159430.2159435.
[6] Shannon Drone. Real-time particle systems on the GPU in dynamic
environments. In ACM SIGGRAPH 2007 Courses, SIGGRAPH ’07,
pages 80–96, New York, NY, USA, 2007. ACM.
ISBN 978-1-4503-1823-5. doi: 10.1145/1281500.1281670.
URL http://doi.acm.org/10.1145/1281500.1281670.
[7] Donald E. Knuth. The Art of Computer Programming, Volume 3: (2nd
Ed.) Sorting and Searching. Addison Wesley Longman Publishing Co.,
Inc., Redwood City, CA, USA, 1998. ISBN 0-201-89685-0.
[8] W. A. Martin. Sorting. ACM Comput. Surv., 3(4):147–174,
1971. ISSN 0360-0300. doi: 10.1145/356593.356594. URL
http://portal.acm.org/citation.cfm?id=356593.356594&coll=Portal&dl=GUIDE&CFID=89172762&CFTOKEN=95662085.
[9] Hiroshi Inoue, Takao Moriyama, Hideaki Komatsu, and Toshio
Nakatani. AA-sort: A new parallel sorting algorithm for multi-core
SIMD processors. In Proceedings of the 16th International Conference
on Parallel Architecture and Compilation Techniques, PACT ’07,
pages 189–198, Washington, DC, USA, 2007. IEEE Computer Society.
ISBN 0-7695-2944-5. doi: 10.1109/PACT.2007.12.
URL http://dx.doi.org/10.1109/PACT.2007.12.
[10] Peter Sanders and Sebastian Winkel. Super scalar sample sort. In
Susanne Albers and Tomasz Radzik, editors, ESA, volume 3221 of Lecture
Notes in Computer Science, pages 784–796. Springer, 2004.
ISBN 3-540-23025-4.
URL http://dblp.uni-trier.de/db/conf/esa/esa2004.html#SandersW04.
[11] W. T. Reeves. Particle systems—a technique for modeling a class
of fuzzy objects. ACM Trans. Graph., 2(2):91–108, April 1983. ISSN
0730-0301. doi: 10.1145/357318.357320.
URL http://doi.acm.org/10.1145/357318.357320.
[12] GPGPU.org - general purpose computation on graphics hardware.
http://gpgpu.org, 2014. Accessed: 2014-01-18.
[13] Realflow. http://www.realflow.com, 2014. Accessed: 2014-01-19.