
Page 1: Parallel Simulation of Urban Dynamics on the GPU Ivan Blečić, Arnaldo Cecchini, Giuseppe A. Trunfio - Department of Architecture, Planning and Design, University of Sassari, Alghero

Parallel Simulation of Urban Dynamics on the GPU

Ivan Blečić, Arnaldo Cecchini and Giuseppe A. Trunfio

Department of Architecture, Planning and Design, University of Sassari

Seventh International Conference on Geographical Analysis, Urban Modeling, Spatial Statistics, GEOG-AN-MOD 2012


Introduction

• A number of geosimulation models have been developed to better understand and predict urban growth, land-use and landscape changes.

• Some trends can be recognized from the literature:
– the areas under study are increasing in size and often go beyond the traditional scale of a city, covering wider regional and national territory or even an entire continent;
– such models tend to be more and more sophisticated, also because they can take advantage of the increased availability of high-resolution remote sensing data;
– automatic and computationally expensive calibration processes are often required, involving large search spaces and many parameters.

• As a result, real world applications of such models often require long computing times.


Introduction

• Geosimulation models are often computationally intensive;

• In spite of this, few studies exist in the literature on the application of parallel computing to geosimulation models
– e.g. the recent work by Guan and Clarke, where a general-purpose parallel library was developed and applied to speed up the well-known CA model SLEUTH;

• We apply GPGPU (General-Purpose computing on Graphics Processing Units) to a widely used CA approach for land-use simulation based on the concept of transition potentials


GPGPU

• GPGPU (General-Purpose computing on Graphics Processing Units): using Graphics Processing Units for standard computation

• Why compute using Graphics Processing Units?
– the computational power of devices enabling GPGPU has exceeded that of standard CPUs by more than one order of magnitude;
– the price of a typical high-end GPU is comparable to the price of a standard CPU.

[Figure: chart with series for CPUs and GPUs]


GPGPU

• Why compute using Graphics Processing Units?
– There has been a rapid increase in the programmability of GPU devices, which has facilitated the porting of many scientific applications, leading to significant parallel speedups.

• Main alternatives from the programming point of view:
– nVidia CUDA: C-language Compute Unified Device Architecture is a popular programming model introduced in 2006 by nVidia Corporation for their GPUs;
– OpenCL: an open standard maintained by the Khronos Group with the backing of major graphics hardware vendors as well as large computer industry vendors.


GPUs

• Modern GPUs are multiprocessors with a highly efficient hardware-coded multi-threading support.

• The key capability of a GPU unit is thus to execute thousands of threads running the same function concurrently on different data.

• Hence, the computational power provided by such an architecture can be fully exploited through a fine grained data-parallel approach when the same computation can be independently carried out on different elements of a dataset.


GPUs

• We use the GPGPU platform provided by nVidia:
– it consists of a group of Streaming Multiprocessors (SMs);
– each SM can support some co-resident concurrent threads;
– each SM consists of multiple Scalar Processor (SP) cores.

[Figure: Streaming Multiprocessor (SM)]


CUDA: C-language Compute Unified Device Architecture

• In a typical CUDA program, sequential host instructions are combined with parallel GPU code.

• In CUDA, the GPU is activated by writing device functions in the C language, which are called kernels:
– when a kernel is invoked by the CPU, a number of threads (typically several thousand) execute the kernel code in parallel on different data;

• The threads that execute a kernel are organized in blocks.
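As a minimal sketch of how threads grouped in blocks are mapped onto data elements, the following plain-Python emulation reproduces the index arithmetic that a one-dimensional CUDA kernel commonly uses (`blockIdx.x * blockDim.x + threadIdx.x`). The function names, the block size of 256 and the sequential loops are illustrative assumptions; on a real GPU the threads run concurrently.

```python
def launch_1d(kernel, n_items, block_dim=256):
    """Sequentially emulate a 1-D CUDA launch covering n_items elements."""
    # Enough blocks to cover every element (ceiling division).
    grid_dim = (n_items + block_dim - 1) // block_dim
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # Same formula as blockIdx.x * blockDim.x + threadIdx.x in CUDA C.
            i = block_idx * block_dim + thread_idx
            if i < n_items:  # guard: the last block may be partly idle
                kernel(i)
    return grid_dim


def make_square_kernel(src, dst):
    """Return a 'kernel' that squares one element per thread (toy example)."""
    def kernel(i):
        dst[i] = src[i] * src[i]
    return kernel


data = list(range(1000))
out = [0] * len(data)
blocks = launch_1d(make_square_kernel(data, out), len(data))
```

With 1000 elements and 256 threads per block, four blocks are launched and the guard keeps the 24 surplus threads of the last block from touching memory outside the array.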


CUDA

• The GPU can access different types of memory.

• The device global memory can deliver significantly (e.g. one order of magnitude) higher memory bandwidth than the main computer memory;

• Unfortunately, the GPU global memory is typically linked to the host (CPU) memory through a relatively slow bus, so data transfers between the two are comparatively expensive.


Two GPGPU accelerated models for Simulating Land-Use Dynamics

• Two versions of a typical Cellular Automata (CA) model for land-use dynamics have been parallelized for the GPU:
– a constrained cellular automata model (CCA);
– the corresponding unconstrained version (UCA).

• Both models are based on the well-known concept of transition potential:
– in the CCA the aggregate level of demand for every land use is fixed by an exogenous constraint at each time step;
– in the UCA the number of cells that are in a certain state at each time step only depends on the internal model parameters and model structure.


CCA and UCA simulation of land use change

[Figure: diagram relating land use at time t, the neighbourhood effect (interactions between urban functions), planning regulation, accessibility, suitability, etc., the transition potentials, the land use requests in the area, and land use at time t+1]


CCA and UCA simulation of land use change

• Step 1 for both UCA and CCA:
– transition potential computation (on a local basis);

• Step 2 for UCA (on a local basis):
– of all the possible land uses, a cell is transformed into the one having the highest transition potential;

• Step 2 for CCA (on a non-local basis):
– each cell is transformed into the state with the highest potential, given the constraint on the overall number of cells in each state imposed by the exogenous trend for that step.
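The two local steps of the UCA can be sketched as follows. The slides do not give the transition potential formula, so this sketch assumes a deliberately simplified one: the potential of a cell for land use j is the sum of `weights[j][s]` over the states s of its neighbours in a square window. The function names, the weight table and the window shape are all hypothetical; the real model also accounts for factors such as accessibility and suitability.

```python
def transition_potentials(grid, row, col, radius, weights):
    """Potential of cell (row, col) for every land use.

    ASSUMPTION: the potential for land use j is the sum, over the
    neighbours within a square window, of weights[j][neighbour_state].
    """
    n_rows, n_cols = len(grid), len(grid[0])
    potentials = [0.0] * len(weights)
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = row + dr, col + dc
            if (dr, dc) == (0, 0) or not (0 <= r < n_rows and 0 <= c < n_cols):
                continue  # skip the cell itself and out-of-map positions
            for j, w in enumerate(weights):
                potentials[j] += w[grid[r][c]]
    return potentials


def uca_step(grid, radius, weights):
    """One UCA step: every cell takes the land use with the highest potential."""
    next_grid = [row[:] for row in grid]  # results go to a separate buffer
    for row in range(len(grid)):
        for col in range(len(grid[0])):
            p = transition_potentials(grid, row, col, radius, weights)
            next_grid[row][col] = max(range(len(p)), key=p.__getitem__)
    return next_grid


# Two land uses (0 and 1); each is attracted by neighbours of the same use.
weights = [[1.0, 0.0], [0.0, 1.0]]
grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
new_grid = uca_step(grid, radius=1, weights=weights)
```

Because each cell reads only the current grid and writes only its own entry in the next grid, the two nested loops over cells carry no dependency between iterations, which is what makes this step suitable for one-thread-per-cell execution.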


GPGPU Parallelization with CUDA: design choices

• One or more CUDA threads, each executing a computational kernel, are assigned to each cell of the automaton;
– to define the kernels, a key step consists of identifying all the sets of instructions that can be executed independently of each other on the different cells of the automaton;

• Most of the automaton data is stored in the GPU global memory. This involves:
– a CPU-GPU memory copy operation before the beginning of the simulation and a GPU-CPU memory copy at the end of the simulation;
– at the end of each CA step, a device-to-device memory copy operation that re-initialises the current values of the CA state with the next values.
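The copy pattern described above can be illustrated with a small sketch in which Python lists stand in for the host and device buffers. The names `dev_current`, `dev_next` and the toy update rule are hypothetical; the point is only the order of the copies and the separation between the values being read and the values being written.

```python
def simulate(host_state, n_steps, update_cell):
    """Emulate the copy pattern of the GPU simulation with Python lists."""
    # CPU -> GPU copy, done once before the simulation starts.
    dev_current = list(host_state)
    dev_next = list(host_state)
    for _ in range(n_steps):
        # "Kernel": each cell reads only dev_current and writes only dev_next,
        # so the cells can be processed in any order (or in parallel).
        for i in range(len(dev_current)):
            dev_next[i] = update_cell(dev_current, i)
        # Device-to-device copy: the next values become the current ones.
        dev_current[:] = dev_next
    # GPU -> CPU copy, done once at the end of the simulation.
    return list(dev_current)


def shift_right(state, i):
    """Toy rule (illustrative only): take the value of the left neighbour."""
    return state[i - 1] if i > 0 else 0


result = simulate([1, 0, 0, 0], n_steps=2, update_cell=shift_right)
```

Updating a single buffer in place would let some cells see values already overwritten in the same step; keeping separate current and next buffers avoids that, at the cost of one device-to-device copy per step.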


GPGPU Parallelization of UCA

• In the UCA model, the computation performed at each step by each cell consists of two phases: 1. the computation of the transition potentials and 2. the assignment of a new land use;

• Since both can be carried out independently for each cell, they were included in a single kernel, thus avoiding the overhead related to invocation of an additional kernel.


GPGPU Parallelization of CCA

• Also in the CCA each cell computes its transition potentials;

• However, the downwards scanning of the list of cells ranked according to their highest potential (lines 4-5) must be carried out in list order, one cell at a time (inherently sequential).

• As a land-use demand is satisfied, a new ranking of cells must be performed before any further cell transition.

• The constraint on the total number of cells represents a strong condition of dependency between the cells.
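To make the sequential dependency concrete, here is a minimal sketch of a greedy constrained allocation in the spirit described above. It is not the authors' listing: the function name, the data layout and the treatment of a satisfied demand (its remaining candidates are simply skipped, which in this simplified setting plays the role of re-ranking without that land use) are assumptions.

```python
def constrained_allocation(potentials, demand):
    """Greedy sequential allocation of land uses under per-use quotas.

    potentials[i][j] is the potential of cell i for land use j;
    demand[j] is the number of cells that must end up in land use j.
    Returns the land use assigned to each cell (None if the demands
    do not cover every cell).
    """
    remaining = list(demand)
    assigned = [None] * len(potentials)
    # Rank every (cell, land use) pair by decreasing potential.
    ranked = sorted(
        ((p, i, j) for i, row in enumerate(potentials) for j, p in enumerate(row)),
        key=lambda t: -t[0],
    )
    # The scan is inherently sequential: each decision depends on the
    # quotas left over by all the higher-ranked candidates.
    for _, i, j in ranked:
        if assigned[i] is None and remaining[j] > 0:
            assigned[i] = j
            remaining[j] -= 1
    return assigned


potentials = [
    [0.9, 0.1],  # cell 0 strongly prefers land use 0
    [0.8, 0.3],  # cell 1 also prefers land use 0
    [0.7, 0.6],  # cell 2 prefers land use 0, but less strongly
]
result = constrained_allocation(potentials, demand=[2, 1])
```

In this example all three cells prefer land use 0, but only two may take it, so cell 2 falls back to land use 1. Whether a cell gets its preferred state depends on every cell ranked above it, which is the dependency that prevents a straightforward one-thread-per-cell parallelization.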


GPGPU Parallelization of CCA

• A different constrained allocation procedure has been devised, which is able to better exploit the GPU while maintaining the essential characteristics of the original constrained approach.

• The proposed parallel constrained allocation tries to process in parallel blocks of cells that have their highest potential for the same land use;

• More details of the algorithms in the paper
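The slides leave the algorithm itself to the paper, so the following is only an illustration of the stated idea, not the authors' procedure. It assumes a round-based scheme: in each round the unassigned cells are grouped by the land use for which they currently have their highest potential, and each group is resolved against its own quota. All names and the handling of leftover cells are hypothetical.

```python
def parallel_style_allocation(potentials, demand):
    """Round-based allocation; the groups in each round are independent."""
    remaining = list(demand)
    assigned = [None] * len(potentials)
    while True:
        # Group unassigned cells by their best still-available land use.
        groups = {}
        for i, row in enumerate(potentials):
            if assigned[i] is not None:
                continue
            open_uses = [j for j in range(len(row)) if remaining[j] > 0]
            if open_uses:
                best = max(open_uses, key=row.__getitem__)
                groups.setdefault(best, []).append(i)
        if not groups:
            return assigned
        # Each group touches only its own land use and quota, so on a GPU
        # the groups (and the cells inside them) could be handled concurrently.
        for j, cells in groups.items():
            cells.sort(key=lambda i: -potentials[i][j])
            winners = cells[:remaining[j]]
            for i in winners:
                assigned[i] = j
            remaining[j] -= len(winners)


potentials = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.6]]
result = parallel_style_allocation(potentials, demand=[2, 1])
```

Within a round, no group reads or modifies another group's quota, so the work can be split across blocks of cells; the sequential part is reduced to the loop over rounds. In general such a scheme need not reproduce the result of a strictly ranked sequential scan.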


Computational results: hardware

• The sequential UCA and CCA reference versions were run on a desktop computer equipped with a 2.66 GHz Intel Core 2 Quad CPU;

• The parallel versions were run on the following GPUs:


Computational results: test cases

• Two different datasets:
– the first concerns the area of the city of Florence and is composed of 242 × 151 cells of size 100 m;
– the second represents the urban area of Athens and is composed of 321 × 391 cells of size 100 m;
– 30 simulation steps (i.e. 30 years of future land use projection);
– for the CCA, a constant 3% increment, referred to the initial number of cells, was adopted as the constraint for each active land use.

• In both the CCA and UCA, the effort involved in the computation of transition potentials is almost proportional to the number of neighbouring cells.
– for this reason three different neighbourhood radii were considered, namely r = 10 cells, r = 15 cells and r = 20 cells.
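The size of these test cases can be made concrete with a little arithmetic. The sketch below rests on two assumptions that the slides do not settle: the 3% increment is read as linear growth relative to the initial cell count, and the neighbourhood shape is left open, so both a square window and a circular (Euclidean) neighbourhood are provided. All function names are hypothetical.

```python
def demand_schedule(initial_cells, n_steps=30, rate=0.03):
    """Cells required for one land use at each step.

    ASSUMPTION: 'constant 3% increment, referred to the initial number
    of cells' is read as linear growth, i.e. initial * (1 + rate * t),
    rounded to the nearest integer.
    """
    return [round(initial_cells * (1 + rate * t)) for t in range(n_steps + 1)]


def square_neighbourhood_size(radius):
    """Neighbours in a (2r+1) x (2r+1) window, centre excluded (assumed shape)."""
    return (2 * radius + 1) ** 2 - 1


def circular_neighbourhood_size(radius):
    """Neighbours within Euclidean distance `radius`, centre excluded (assumed shape)."""
    return sum(
        1
        for dx in range(-radius, radius + 1)
        for dy in range(-radius, radius + 1)
        if (dx, dy) != (0, 0) and dx * dx + dy * dy <= radius * radius
    )


datasets = {"Florence": 242 * 151, "Athens": 321 * 391}  # total cells per map
for name, n_cells in datasets.items():
    for r in (10, 15, 20):
        # Neighbour visits per CA step grow with both map size and radius.
        visits = n_cells * square_neighbourhood_size(r)
        print(f"{name}: r={r}, {visits} neighbour visits per step")
```

Under the square-window assumption, the three radii correspond to 440, 960 and 1680 neighbours per cell, and the Florence and Athens maps contain 36,542 and 125,511 cells respectively. Under the linear reading of the constraint, a land use that starts with 1000 cells would be required to occupy 1900 cells after 30 steps.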


Computational results: elapsed times


Computational results: parallel speedups


Computational results: conclusions

• The gain in terms of computing time is impressive.

• As expected, the speedup of the UCA model was always superior to that achieved on the CCA model.

• Improvements are still possible, since not all typical GPGPU optimization strategies have been implemented and more powerful GPUs are available;

• The main advantage lies in enabling an accurate calibration, which otherwise may not be possible in some cases involving models operating at regional or continental scale.