Distributed Individual-Based Simulationbic/papers/JimingLiu-Thesis.pdfUNIVERSITY OF CALIFORNIA,...

UNIVERSITY OF CALIFORNIA, IRVINE

Distributed Individual-Based Simulation

DISSERTATION

submitted in partial satisfaction of the requirements for the degree of

DOCTOR OF PHILOSOPHY

in Information and Computer Science

by

Jiming Liu

Dissertation Committee: Professor Lubomir F. Bic, Co- Chair

Professor Michael B. Dillencourt Co-Chair Professor Arthur D. Lander

2008

ii

The dissertation of Jiming Liu

is approved and is acceptable in quality and form for

publication on microfilm and in digital formats:

____________________________

____________________________ Committee Co-Chair

____________________________ Committee Co-Chair

University of California, Irvine

2008

iii

Contents

List of Figures vii

List of Tables x

List of Algorithms xi

Acknowledgments xii

Curriculum Vistae xiii

Abstract xiv

1 Chapter 1: Introduction 1

1.1 Research motivation and target problem......................................................... 2

1.2 Individual-based model .................................................................................. 3

1.3 Distributed individual-based simulation ......................................................... 5

1.3.1 Problem partitioning ............................................................................. 5

1.3.2 Problem mapping .................................................................................. 6

1.3.3 Communication ..................................................................................... 6

1.3.4 Simulation consistency ......................................................................... 7

1.4 Chapters in the dissertation ............................................................................ 7

2 Chapter 2: Particle Diffusion Model 9

iv

2.1 The particle diffusion model .......................................................................... 9

2.2 Model specification ........................................................................................ 11

2.2.1 Simulated space .................................................................................... 11

2.2.2 Cell representation ................................................................................ 12

2.2.3 Receptor representation ........................................................................ 12

2.2.4 Particle representation ........................................................................... 13

2.3 Basic simulation ............................................................................................. 15

3 Chapter 3: Improved Model 16

3.1 Macro simulation distance step ...................................................................... 17

3.2 Calculating particle capturing ......................................................................... 18

3.2.1 First hit .................................................................................................. 20

3.2.2 Number of returns ................................................................................. 21

3.2.3 Horizontal distance ............................................................................... 22

3.2.4 Probability of capturing ........................................................................ 22

3.2.5 Modification to the formula .................................................................. 23

4 Chapter 4: Distributed Individual-Based Simulation 25

4.1 Problem decomposition .................................................................................. 27

4.1.1 The Lagrangian decomposition method ................................................ 27

4.1.2 The Eulerian decomposition method with vertical strips ..................... 29

4.1.3 The Eulerian decomposition method with horizontal strips ................. 31

4.2 Implementation ............................................................................................... 33

4.2.1 Overview ............................................................................................... 33

4.2.2 Messengers ............................................................................................ 35

4.3 Consistency with sequential implementation ................................................ 41

4.3.1 Identical results with the sequential simulation .................................... 42

4.3.2 Random number seed initialization ...................................................... 45

5 Chapter 5: Simulation Enhancement: Parallel Simulation Protocols 48

v

5.1 Exchange less frequently ................................................................................ 49

5.2 Shadow cells ................................................................................................... 53

5.3 Conflict scenarios description ........................................................................ 55

5.3.1 Scenario 1: A free particle becoming stuck when it should not ........... 55

5.3.2 Scenario 2: A free particle not becoming stuck when it should ........... 61

5.4 Conflict resolution .......................................................................................... 66

5.4.1 Solution to scenario 1 ............................................................................ 69

5.4.2 Solution to scenario 2 ............................................................................ 73

5.5 The order of processing the particles .............................................................. 77

5.5.1 First come, first processed .................................................................... 77

5.5.2 Particles migration ................................................................................ 79

5.5.3 Preserving the order of processing the particles ................................... 81

5.5.4 Temporary incoming particles .............................................................. 82

5.6 Confliction resolution algorithm .................................................................... 85

5.7 Correctness of distributed simulation ............................................................. 90

6 Chapter 6: Experimental Assessments 92

6.1 Performance evaluation .................................................................................. 92

6.1.1 Experiment 1 ......................................................................................... 94

6.1.2 Experiment 2 ......................................................................................... 96

6.1.3 Experiment 3 ......................................................................................... 98

6.1.4 Experiment 4 ......................................................................................... 100

6.1.5 Performance trade-offs .......................................................................... 102

6.2 System capability and scalability ................................................................... 104

6.3 System steady state - stop criteria .................................................................. 104

6.3.1 Determining the point of steady state ................................................... 105

6.3.2 Quantifying the mean time to reaching steady state ............................. 106

7 Chapter 7: Biology Results Obtained From the Simulation 110

7.1 Analysis of simulation output ......................................................................... 110

vi

7.1.1 Variation of number of stuck particles .................................................. 111

7.1.2 Shapiro-Wilk W test for variation ......................................................... 114

7.2 More experiments ........................................................................................... 118

7.2.1 Case study 1: release stuck particles ..................................................... 118

7.2.2 Case study 2: stuck particles crossing through cell .............................. 123

8 Chapter 8: Related work 127

9 Chapter 9: Conclusions 131

9.1 Contribution .................................................................................................... 131

9.2 Future work .................................................................................................... 133

Bibliography 135

Appendix 1 140

Appendix 2 161

Appendix 3 181

Appendix 4 202

vii

List of Figures

2.1 Particle diffusion models ...................................................................................... 10

2.2 The simulated space ............................................................................................. 11

2.3 Cell representation ................................................................................................ 12

2.4 Algorithm for the basic simulation ....................................................................... 15

3.1 Cell grid and 9 areas of a particle position ........................................................... 19

3.2 A particle moving to the right segment at the end of macro time step ................. 22

4.1 Lagrangian decomposition method and node mapping ........................................ 28

4.2 Eulerian decomposition method with vertical strips and node mapping .............. 30

4.3 Eulerian decomposition method with horizontal strips and node mapping ......... 31

4.4 An example of simulated space partitioning and node mapping .......................... 33

4.5 An example of nodes with their neighbors ........................................................... 33

4.6 Messengers system architecture and the task and shuttle Messengers ................. 34

4.7 Creator Messenger script pseudo code ................................................................. 35

4.8 Task Messenger script pseudo code ..................................................................... 37

4.9 Left shuttle Messenger script pseudo code ........................................................... 41

4.10 Random number sequences .................................................................................. 42

4.11 Random number sequence change in particle migration ...................................... 43

4.12 Random number sequences unique to new particles ............................................ 44

4.13 Function to assign a random number seed to a particle ........................................ 46

5.1 Communication granularity level ......................................................................... 51

viii

5.2 Node mapping with show cells ............................................................................. 52

5.3 View of incoming particle A’ on local node (node 1) ........................................... 54

5.4 Views of local node with an incoming particle (scenario 1) ................................ 56

5.5 Views of particles movement in sequential and parallel implementation

of scenario 1 ......................................................................................................... 57

5.6 View of local node with degraded stuck particle on local node (scenario 2) ....... 61

5.7 Views of particles movement in sequential and parallel implementation

of scenario 2 ......................................................................................................... 63

5.8 Tentatively stuck particles on node 1 .................................................................... 66

5.9 TSP tag structure .................................................................................................. 67

5.10 DSP tag structure .................................................................................................. 69

5.11 Solution for scenario 1 .......................................................................................... 70

5.12 Overstating the number of free receptors in the shadow cells of node 2 ............. 74

5.13 Particles migration between node 1 and node 2 ................................................... 80

5.14 A temporary incoming particle on the local node ................................................ 84

5.15 Pseudo code of the conflict resolution .................................................................. 86

6.1 Execution time of experiment 1 at different epoch 1engths ................................. 94

6.2 Speedup of experiment 1 at different epoch lengths ............................................ 95

6.3 Execution time of experiment 2 at different epoch lengths .................................. 97

6.4 Speedup of experiment 1 and 2 on 5 nodes .......................................................... 97


6.6 Speedup of experiment 3 at different epoch lengths ............................................ 99


6.8 Speedup of experiment 3 and 4 on 10 nodes ........................................................ 101

6.9 Example of fitted piecewise linear models to one of the stuck particles datasets 108

6.10 Example of fitted piecewise linear models to one of free particles datasets ........ 109

7.1 Average value and number of stuck particles from one simulation in bin 1 ........ 112

7.2 10 time ranges (T1 – T10) in bin 1 of stuck particles .......................................... 113

7.3 10 time ranges (T1 – T10) in bin 1 of free particles ............................................. 115

ix

7.4 CV for the number of stuck particles in 10 time ranges ....................................... 116

7.5 CV for the number of free particles in 10 time ranges ......................................... 106

7.6 Stuck particles released back to system ................................................................ 118

7.7 Particles diffusion in the simulated space (6.67 Bio-minutes) ............................. 119

7.8 Number of stuck Particles at the end of simulation (6.67 Bio-minutes) .............. 121

7.9 Number of free particles at the end of simulation (6.67 Bio-minutes) ................. 122

7.10 A stuck particle released crossing the cell ............................................................ 123

7.11 Particles diffusion at the end of 20 million iterations (3.33 Bio-minutes) ........... 124

7.12 Number of stuck particles at the end of simulation (3.33 Bio-minutes) .............. 125

7.13 Number of free particles at the end of simulation (3.33 Bio-minutes) ................. 126

x

List of Tables

6.1 Parameter set for experiments .............................................................................. 93

6.2 Comparison of the speedup and accuracy at different epoch lengths .................. 102

6.3 Mean time with confidence interval results for stuck particles ............................ 107

6.4 Mean time with confidence interval results for free particles .............................. 107

7.1 p-value produced by Shapiro-Wilk W Test for bin 1 output ................................ 115

7.2 Parameters in case studies ..................................................................................... 117

xi

List of Algorithms

5.1 Parallel simulation procedures performed by task Messengers ........................... 84

xii

Acknowledgments

I would like to thank my committee members: Professor Lubomir Bic, Professor Michael

Dillencourt, and Professor Arthur Lander for their guidance, support and patience

throughout the years. I would like to thank Professor Dan Gillen for his help in the simu-

lation data analysis.

I would like to thank the past and present members of Messengers research group:

Bozhena Bidyuk, Hairong Kuang, Susan Mabry, Eugene Gendelman, Koji Noguchi, Lei

Pan, Richard Utter, Ming Kin Lai, Javid Huseynov, and Wendy Zhang. Thank everyone,

for all their help and encouragement over the years. In particular, I would like to send

my gratitude to Koji and Ming, for maintaining the Messengers system and helping me in

debugging the Messengers programming.

I would like to thank my family for their love and understanding.

xiii

Curriculum Vitae

Jiming Liu

1983 B.S. in Computer Engineering National University of Defense Technology, China 1992 M.S. in Industrial Engineering Purdue University, West Lafayette, Indiana 2001 M.S. in Information and Computer Science University of California, Irvine 2008 Ph.D. in Information and Computer Science University of California, Irvine 2001 GAANN Fellowship Department of Education, USA

xiv

Abstract of the Dissertation


By

Jiming Liu

Doctor of Philosophy in Information and Computer Science

University of California, Irvine, 2008

Professor Lubomir F. Bic, Co-Chair

Professor Michael B. Dillencourt, Co-Chair

Individual-based simulation can be implemented in a distributed fashion by making each

machine in a distributed system responsible for a portion of the problem. Eulerian

method and Lagrangian method can be used to decompose the problem. Individual-based

simulation is not a new concept, nor is the idea of distributed computing; the system here

we worked on offers techniques and prototypes of combing these two paradigms into one

large scale simulation environment.

Our simulation target model is a biological particle diffusion model with a large

population of particles. We describe methods of improving simulation performance that

xv

combine several techniques including model improvement, distributed simulation system

structures, a hybrid problem decomposition that combines the classical Lagrangian and

Eulerian methods, and varying granularity of synchronization between computers in the

distributed system. We present the performance results to show the speedup of the simu-

lation using the methods and parallelism we developed and implemented. We evaluate

the simulation results consistency between the distributed simulation and the sequential

simulation and the trade-offs between accuracy and speedup. We also present biological

case studies by using our simulation system.

1

Chapter 1

Introduction

Individual-based models are the simulations that have the following characteristics. It

typically consists of an environment and individuals. The environment can be a simu-

lated space in which the interactions (1) between the individuals and (2) between the in-

dividuals and the environment occur. The individuals can be defined by its characteris-

tic parameters or attributes. The behaviors and states of the individuals are tracked

through entire simulation based on the global consequences of the local interactions of

the individuals. An individual-based simulation can also be exhibit mobility where the

individuals can move around in simulated space. Because of these characteristics of the

individual-based simulation, it has been widely used in many applications, such as ecol-

ogy and biology, traffic control and sociology.

A distributed individual-based simulation is to simulate the individual-based

2

model in a distributed computing environment. The distributed simulation applies to

the applications that simulate a large population of individuals or process time-

consuming tasks. The major issues in implementing a distributed simulation include the

application partitioning, communication overhead, and the simulation results consis-

tency.

We developed a distributed individual-base simulation system to support a time-

consuming large populated individual-based model. Individual-based simulation is not

a new concept, nor is the idea of distributed computing; the system here we worked on

offers techniques and prototypes of combing these two paradigms into one large scale

simulation environment.

1.1 Research motivation and target problem

We started this research work after we got interested in a biological experiment pre-

sented by professor Author Lander. The biological experiment is a process of mole-

cules diffusing in an area built with wed cells. During the molecules diffusion process,

a few events occur, such as molecules binding and unbinding to the cell receptors, and

molecules degradating from the intercellular space. The problem is that the molecule

diffusion process is an extremely time-consuming processing. In order to observe the

molecule activities in the diffusion process, the experiment needs to run for a period of

time defined by the biological time step. The biological time step used in this experi-

ment is 5 nanoseconds. To simulate molecule’s movement at each biological time step

3

for 1 second of the biological clock, the simulation needs to run for 200 million itera-

tions. The simulation can run for a long period of time to simulate a large amount of

particles.

We are interested in the problem because it is 1) a typical computer simulation

problem, 2) an individual-based application, and 3) a good candidate for a distributed

computing simulation. The developed simulation system can also be used as a tool for

simulating biological applications.

1.2 Individual-based model

The principles in modeling an individual-based application can be summed up as the

following: 1) the individuals have its own identity and are behaviorally instinct because

of the environmental influences, and 2) the interactions between the individuals or be-

tween the individual and its environment are inherently localized, e.g. the individuals

are influenced by the nearby individuals. Based on these principles and the characteris-

tics of the applications, the individual-based modeling is used in many research areas.

The applications of the individual-based model can be found most commonly in the

study of ecology and biology [KBW99, RG05, HHM96], sociology [WH96], artificial

intelligent and traffic control [HM96]. The techniques, approaches, software and tools

used in the modeling have been developed and researched and are continue to be re-

searched.

We use some of the techniques and approaches in presenting our target model, a

4

biology application of molecules diffusion within an intercellular space. The target

model is spatially exhibit meaning that the particles are associated with a geometry lo-

cation in the simulated space. The model also exhibits mobility because the particles

move around in the simulated space. There are three basic components to be defined in

the individual-based model for the biological application:

• Simulated space

The simulated space is the simulation environment representing an intercellular

space, which is constructed by cells. Each cell has its grid like geometry loca-

tion in the simulated space. There are receptors resided on each cell as part of

the cell structure.

• Particles

Particles are the individuals. The particles are associated with a geometry loca-

tion and move around in the simulated space. The particles do not interact with

each other directly, but they interact with the nearby cell receptors and the result

from the interaction influences its behaviors. Therefore the particles interact

with each other indirectly through the receptors. The states and location of each

particle is tracked through entire simulation.

• Parameters

Parameter values of the model specification define the simulation environment

and control the behavior of particles. By using a specific set of parameters or

the interaction rules to the particles, some complex decision making by a parti-

5

cle can be simulated. Different set of parameters can be used to study different

scenarios of the application.

In chapter 2 we present the original application model and its characteristics. In

chapter 3, we present an improved model to make it suitable to a computer simulation

and at the same time, not changing the nature of the application.

1.3 Distributed individual-based simulation

A distributed system runs on a collection of computers. A distributed simulation system

is developed for applications that simulate a large scale of problem and carry out time-

consuming tasks. The main issues in a distributed simulation system can be summed up

as follows: 1) problem partitioning and mapping, 2) communication overhead, 3) results

consistency, and 4) system capability and scalability.

1.3.1 Problem partitioning

To simulate a problem on a cluster of computers, the problem needs to be decomposed

into small pieces. The number of smaller problems should be equal or more than the

number of computers meaning that each computer processes one or more small prob-

lems. There are two well-known approaches in the problem partitioning in distributed

individual-based simulations: 1) the Lagrangian decomposition method and 2) the Eule-

rian decomposition method.

6

The Lagrangian decomposition method focuses on the simulated individual enti-

ties. It divides the entities into multiple groups and let each computer be responsible for

each group of entities. The most parallelism is achieved by this decomposition method

for the application that the individual entities only interact with each other within its

group.

The Eulerian decomposition method is to divide the simulated environment into

small portions. Each computer is responsible for the activities occurring in the region.

This decomposition method is usually used in the application in which the individual

entities interact with the environment locally and the global environment synchroniza-

tion does not happen frequently. In chapter 4, we present and compare each of the

methods and present a hybrid method of Lagrangian method and Eulerian method in the

problem decomposition.

1.3.2 Problem mapping

To map a portion or a set of portions of the problem to a node can be straightforward.

Usually at the stage of problem decompostion, the mapping is put into considerations.

The goal of mapping is to 1) minimize the communication between the processors, 2)

balance the workload on each node, 3) provide the scalability, and 4) provide the system

flexibility. In the chapter 4 we present the mapping strategy.

1.3.3 Communication

7

In the distributed system, each node processes a portion or a set of portions of the prob-

lem. Each portion does not exist as an independent task. The dependences between

these portions require data exchange to keep everyone in synchronized. The time spent

on data exchange is one of the major issues in the distributed system. In chapter 5 we

present communication granularity approach to reduce the communication overhead by

exchange less frequently between the machine nodes.

1.3.4 Simulation consistency

The application is simulated in a sequential way initially. We implement it by using our

distributed system to make it run faster. In general the consistency of results between a

sequential simulation and a parallel simulation is evaluated from two aspects: statistic

consistency and exact consistency. We focus on the exact consistency. In chapter 4, we

discuss the random number generator strategy by assigning a random number generator

to each particle. In chapter 5, we present the conflict resolution and more protocols to

ensure the exact consistency while benefiting from the most of parallelism.

1.4 Chapters in the dissertation

There are 9 chapters in this dissertation. Chapter 2 presents the target mode with the

model specification. Chapter 3 presents the improved model. Chapter 4 presents paral-

lel structures. Chapter 5 discusses the enhancement of the parallel simulations. Chap-

ter 6 examines and presents performance evaluation. In chapter 7 we use the simulation

8

system as a tool in biological case studies to produce the results in a biology form.

Chapter 8 reviews some related work to this research work and the final chapter, chap-

ter 9 discusses the conclusion and future work.

9

2. Chapter 2

The Target Model

2.1 The particle diffusion model

Our target model is to simulate particles diffusion in a biological intercellular space,

based on the molecules diffusion theory in random walks in biology presented by Berg

[Ber93]. We present this simulation model using a state and event based modeling ap-

proach [Fis95]. Figure 2.1 (a) presents the target model conceptually and shows the ba-

sic nature of the particle diffusion process in a virtual intercellular space. Figure 2.1 (b)

illustrates the state and event-based approach to declarative the model. It defines three

possible states of the system represented by circles, where a state is identified by parti-

cles existing in the system. The six events are represented by curves. A particle can be

a free particle when it arrives in the system and moves around in the system. A stuck

10

particle is a particle that is captured by a receptor. A stuck particle can be released back

to the system as a new free particle or be absorbed by the receptor and degrade from the

system for the rest of simulation time.

(a) Conceptual model

(b) State and event-based model

Figure 2.1 Particle diffusion models

11

2.2 Model specification

2.2.1 Simulated space

A two-dimensional particle diffusion space is used as a simulated space to present this

computational model. The space is composed of a set of cells with a fixed location and

size, in grid geometry, see figure 2.2. The left boundary is a closed boundary. Particles

attempting to go pass the left boundary are bounced back to the simulated space. The

right boundary is an open boundary. Particles that walk across the right boundary dis-

appear from the simulated space. The top and bottom boundaries are identified with

each other. For example, a particle crossing the top boundary ends up on the other site

(the bottom part) of the simulated space with the same coordinate value of x; and a par-

ticle crossing the bottom boundary ends up in the top part of the simulated space.

Topologically, the space is a cylinder, closed on the left and open on the right.

Figure 2.2 The simulated space

12

2.2.2 Cell representation

A cell object is a square with a size of 10µm by 10µm. The location of a cell object is a

fixed position in the cell simulated space. Cells do not overlap with each other. The

distance between cells is one tenth of the cell size, which equals 1µm. The wall of each

cell is divided into 20 segments, which corresponds to cell membranes. The number of

receptors in a given segment varies over time. Figure 2.3(a) displays the geometry of

the cells in the simulated space. Figure 2.3(b) illustrates the detail of a cell object,

which includes cell area, cell wall that is divided into wall segments, and receptors that

reside on the wall segments.

(a) (b)

Figure 2.3 Cell representation

2.2.3 Receptor representation

Receptors reside on cell membranes (cell segments in our simulation system). A recep-

tor has two states: free or occupied. When a receptor captures a particle its state

13

changes from free to the state of occupied. When the stuck particle gets released or de-

graded, the receptor that captures the particle changes state back to free. In the follow-

ing section we will give description of the particles.

Receptors, both free and occupied, move between neighbor cell segments at a

predefined rate at every simulation time step. This movement balances the number of

free and occupied receptors on the cell segments. For example, at each time step, 50%

of the free and 50% of occupied receptors in one segment are moved to its neighbors.

The fraction of number of receptors is accumulated at every time step and the receptors

move occurs when there is at least one receptor available.

2.2.4 Particle representation

Particles are identified in the system by their position and state. The position of a parti-

cle is defined by its coordinates, x and y in the simulated space. A particle does not own

a territory; the model allows more than one particle to reside at one spot and have the

same coordinate values. Therefore the density of particles in the simulated space has no

limit. The total number of particles entering into the system can be varied to simulate

different cases, according to biological assumptions.

New particles enter into the simulated space periodically during the course of

simulation. A particle starts as a free particle and can travel freely through the alleys

(between cells) or be captured by a free receptor on a cell segment. When a particle

moves to the next position within an alley and is not captured, it remains as a free parti-

14

cle at the end of that time step. If a particle is captured by a free receptor, it becomes a

stuck particle and occupies a receptor.

A free particle walks in a fixed distance (the micro distance step) in one of four

directions (north, south, east, and west). The direction is chosen with each direction

having equal probability randomly based on the value of a random number that is gen-

erated uniformly distributed over the interval [0.0, 1). For example, if the random num-

ber is in the interval [0, 0.25), the particle moves to the north direction, if in the interval

[0.25, 0.5), the particle moves to south, and so on.

The micro distance step (s) is 10-9m or 1nm. It is calculated by using the follow-

ing equation that we adapted from [Ber93], equation 2.1. Here, D is the diffusion coef-

ficient and t is the micro time step.

(1nm) mDts

sectseccmD

9

9

127

104

105105

−

−

−−

==

⎟⎟⎠

⎞⎜⎜⎝

⎛

×=

×=

(2.1)

As a particle diffuses through the cell alleys, at each time step the calculation

occurs to decide if the particle gets captured by a receptor and becomes a stuck particle.

If a particle hits a cell wall but does not get captured by a free receptor, it bounces away

from the cell wall in the next time step.

A stuck particle can be absorbed by the cell segment on which the receptor that

it occupies currently resides. When this happens, the occupied receptor becomes free

15

and the stuck particle disappears from the system forever. The rate at which this occurs

is predefined as a system parameter. A stuck particle can also be released from an oc-

cupied receptor and re-enter into the system with a new location near the segment where

it gets released. The rate at which this occurs is also predefined as a system parameter.

2.3 Basic simulation

The simulation algorithm described in this chapter is straightforward. The algorithm of

this basic simulation is showed in Figure 2.4.

Procedure MAIN While (not end of simulation) do For (all cell segments) Move receptors between the neighbor segments End For For (all cell segments) Calculate total #of degradation particles by accumulated in iteration Update #of free and occupied receptors End For For (all free particles) Move particle by distance (s) in one of directions (N, S, E, W) Decide whether particle gets captured by a free receptor Update particle state and location Update receptors state End For Advance bio-clock by (t) End While End MAIN

Figure 2.4 Algorithm for the basic simulation

16

3. Chapter 3

Improved Model

In the micro simulation model described in chapter 2, particles diffuse in the simulated

space in a simple Brownian motion. A particle moves one step in one of four directions

per micro time step. Direct execute of this model is time consuming, and system can

become infeasible for a simulation that runs for a long period of time. For example,

simulating the model for one bio-second requires system to execute 2*108 iterations.

This chapter introduces a macro simulation, which reduces the execution time by re-

placing micro time steps with a macro time step and improves the system scalability.

This macro simulation 1) does not change the nature of the micro simulation and 2) runs

more efficiently.

In the micro-step model introduced previously, a particle may be captured at a

micro time step when the particle moves a micro distance step and hits a cell wall. The

17

system then decides of whether it is indeed captured. Since the macro simulation re-

places micro time steps with a macro time step, micro time steps are no longer visible to

the system. This complicates the decision as to when particles should be captured, e.g.

at which micro time step. In this chapter, we introduce the macro simulation. In par-

ticular, we discuss two issues that need to be resolved for the macro simulation model:

defining the macro simulation distance step and defining the particles capturing process

to make it consistent with the micro simulation model, the latter uses some classical re-

sults about the behaviors of random walk [Fel66].

3.1 Macro simulation distance step

We calculate the motion in a macro simulation as a distance and a direction. The dis-

tance is calculated as a displacement of a Gaussian distribution. The direction is calcu-

lated randomly and uniformly.

Calculating the macro distance step

To replace micro time steps with macro time step, we compute the macro dis-

tance step (L) as a displacement from Gaussian distribution:

),(L ξ0Ν= . (3.1)

The standard deviation Dt4=ξ , where 127 sec105 −−×= cmD is a fixed pa-

rameter as we defined in chapter 2. Here t is macro time step, we assume that it equals

to 2000 micro time steps, therefore sec10 5−=t , then m.cm µξ 04402010 6 == − .

18

The assumption of a macro time step equal to 2000 micro time steps in the

Gaussian distribution is reasonable because of the following. We want the macro dis-

tance step is reasonably large enough, but do not go too far within one step. The micro

distance step at each micro time step is a random walk step, which equals 0.001µm (as

we presented in chapter 2). The geometry of a cell is 10µm by 10µm. A cell segment is

2µm (e.g. 2000 micro distance steps) and the alley between two cells is 1µm (e.g. 1000

micro distance steps). With the standard deviation m. µξ 0440= (44 micro distance

steps), ξ3 equals to 132 micro distance steps, which means that with less than 0.3%

probability a particle can move up to as many as 132 micro steps within a macro time

step. This tells that 1) a particle moves cross a cell alley within less than 5 macro steps

is highly unlikely to happen and 2) a particle moves cross a cell segment within 10

macro steps is also highly unlikely to happen. Therefore the macro time step assump-

tion is reasonable.

Particle moving direction

The particle moves at a direction between 0 and 360 degree. A uniformly

distributed random number is generated to calculate direction of motion.

3.2 Calculating particle capturing

In the macro simulation, a particle moves a macro step instead of a micro step. We

need a method to determine whether a particle is captured during a given macro step.

19

Our method is based on determining, in order, following four items:

• The first hit of a particle to a cell wall: The number of micro steps a particle

taken before it first time hits a cell wall within a macro time step, if it ever hits a

cell wall. To determine with which receptor on which cell wall segment a free

particle may bind to, we divide the cell alley around a cell to 8 areas. These ar-

eas are labeled from 0 to 7, see Figure 3.1. A free particle can be captured by a

receptor resided on the nearest cell wall segment. For example, a particle lo-

cated in area 0 can be captured by a free receptor on a segment of cell wall cw0

and a particle in area 5 can be captured by a free receptor on a segment of cell

wall cw0 or cw1.

Figure 3.1 Cell grid and 8 areas of a particle position

20

• The number of returns of the particle to the cell wall after the first hit: Within

the remaining macro time steps, calculate how many times the particle repeat-

edly hit the same cell wall.

• Horizontal distance: To determine which cell segment of the cell wall the parti-

cle hits.

• The probability of getting captured: With all above computed, to decide if the

particle gets captured.

3.2.1 First hit

For each particle, we calculate the number of micro time steps it takes before it first

time hits a cell wall. The cell wall of a particle is determined by the position of the par-

ticle, as we illustrated in the Figure 3.1.

The first-hit-time is the number of micro time steps (t) a particle has taken be-

fore it hits a cell wall for the first time. We compute the first-hit-time using inequality

(3.2), which we adapt from Theorem 2. (7.5) [Fel66]. We generate a real variable x uni-

formly in the interval [0, 1], and let t be the smallest integer for which (3.2) holds. If t

is less or equal to 2000 (macro step equals 2000 micro steps), then t is used for the fur-

ther calculation. Otherwise the particle is considered to not be captured during the

macro step. Note that if a particle locates far away from a cell wall, the chance of hit-

ting the cell wall within a macro time step is relatively small. Therefore we use ine-

quality (3.2) to calculate particles that are within 15 micro steps to the cell wall.

21

For the far away particles, the chance of getting captured is small and the com-

putation of t is too time-consuming. In this case we use a normal approximation to es-

timate the first-hit-time. We use equation (3.3), which we adapt from Theorem 3 (7.7)

[Fel66].

[ ]

⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜

⎝

⎛

<=

=

≥•⎟⎟

⎠

⎞

⎜⎜

⎝

⎛+•∑

=

2000

0

21

t...t 5, 3, 1, k ,number odd an is r If

6,...t 4, 2, k then number, even an is r If1 , number random a :x wallcell the to distance the :r

x k-2 rkk

t

k kr

(3.2)

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

<>=

=

2000010

22

2

t ),(Nu

wallcell the to distance the :rurt

(3.3)

3.2.2 Number of returns

If a particle hits the cell wall, the next step is to determine the total number of returns of

a particle to hit on the same receptor. Equation (3.4) below is used to calculate the

number of returning hits (v) of a particle to a cell wall. We adapt this equation from

22

Theorem 4 [Fel66].

( )⎟⎟⎠

⎞⎜⎜⎝

⎛ =

=

microsteps remaining of number the:n0,1Νd

ndv (3.4)

3.2.3 Horizontal distance

When a particle hits a cell wall, the probability of its being captured depends on which

segment on the cell wall it hits and how many free receptors are on that cell segment.

To decide the hitting cell segment, we use equation (3.5) to calculate the horizontal dis-

tance (x), which is the distance that the particle moves parallel to the direction of the

cell wall before the first hit. Here t is the number of micro steps the particle takes be-

fore it hits the cell wall (section 3.2.1). For example, see Figure 3.2, a particle starting

location is in the area of segment S2, but it may move to the right and ends up at a dif-

ferent segment area, in this case, S3.

( ) t,Nx 0= (3.5)

Figure 3.2 A particle moving to the right segment at the end of macro time step

3.2.4 Probability of capturing

23

We assume that the probabilities of being captured on any given hit are equal and inde-

pendent. The total probability p of being captured in the macro step is ( )vq−− 11 where

v is the number of return hits we calculated in the section 3.2.2. q is the probability of

being captured in a particular hit, fRLSRR q ×= . The value of q depends on the follow-

ing three quantities:

• fR : The number of free receptors on the cell segment that the particle hits.

• LS: The size of a cell segment. There are total of 20 cell segments, 5 segments

on each side of cell wall. The size of the cell segment is 2µm.

• RR: The radius of a receptor. Since the receptors are non-overlapping, we as-

sign a receptor a area with a radius.

3.2.5 Modification to the formula

The introduced formulas in the previous sections overestimate the probabilities of being

captured, because the probabilities are not really independent. If the first hit misses a

receptor, the next hit is likely to be nearby and hence also is likely to miss a receptor.

To correct this, we made the change to the following two calculations:

Reduce the number of hits by a factor of u in equation (3.4) to adjust v:

uvw = (3.6)

Reduce the probability of being captured (described in section 3.2.4) by a factor

24

of f, which results in equation (3.7):

( )

⎜⎜⎜⎜⎜⎜

⎝

⎛

⎟⎟⎟⎟⎟⎟

⎠

⎞

==

××=

−−=

receptors free of number :RnmRR

mLS

RfLS

RR q

qp

f

f

w

52

1

11

µ (3.7)

In this simulation we give .f ,u 305 == To determine an appropriate value of

factor u and factor f depends on biological observations.

Finally, we generate a uniformly distributed random number a (0< a<1). If a <

p, the particle is captured by one of the free receptors on the cell segment where the par-

ticle hits.

25

4. Chapter 4


Individual-based simulations are simulations based on the global consequences of local

interactions among individuals of a population and between individuals and its envi-

ronment. These individuals might represent plants, animals, molecules in ecosystems or

biological systems [FSW97, HM96], vehicles in traffic [PKC07], people in crowds, or

autonomous objects [FBD98]. These models typically consist of an environment or

framework in which the interactions occur and some number of individuals defined in

terms of their behaviors (procedural/activity rules) and characteristic parameters. In an

individual-based model, the characteristics of each individual are tracked through the

entire simulation. This stands in contrast to modeling techniques where the characteris-

tics of the population are averaged together and the model attempts to simulate changes

in these averaged characteristics for the whole population. Individual-based models are

26

also known as entity or agent based models.

Our model has the characteristics of an individual-based mode: 1) it consists of a

simulated space of cells in grid geometry as the environment, 2) the particles are the in-

dividuals, 3) a particle has its own state and behaviors, 4) particles do not interact to

each other; they only interact to the receptors on the cell walls, and 5) the characteristics

of particles are tracked in entire simulation at each iteration. An individual-based

model can be simulated in a sequential way, or in a distributed fashion. Our goal is to

develop a distributed simulation system that supports the simulations of this model and

runs faster than the sequential system and is accurate.

A distributed system runs on a cluster of nodes. Each node carries a certain

amount of work load. The nodes interact by exchanging messages over the intercon-

necting network to achieve synchronization. The major issues that arise in implement-

ing such a parallel computing system include: 1) dividing the problem into small por-

tions, 2) providing a distributed computing environment to support the implementation,

and 3) ensuring that the simulation is correct, in the sense that it provides results consis-

tent with what sequential implementation produces.

In this chapter, section 4.1 presents the methods of problem decomposition for a

distributed computing and mapping the partitioned problem to physical nodes. Section

4.2 describes algorithms developed for the distributed simulation and 4.3 discusses the

results.

27

4.1 Problem decomposition

In this section we describe two problem decomposition methods: the Lagrangian

method and the Eulerian method, which are two well-known mechanisms used to parti-

tion a problem in a distributed system by assigning each node a fixed portion of the

problem. We describe the advantages and disadvantages of each method for this par-

ticular simulation. Our conclusion is that the best partitioning scheme for our simula-

tion is hybrid of two methods - Eulerian horizontal strip decomposition.

4.1.1 The Lagrangian decomposition method

Definition

In the Lagrangian decomposition method each node is responsible for a set of

entities (particles) and tracks them for the entire simulation. A set of entries can be

grouped by their static pre-defined attributes.

Decomposition

In our simulation, all particles have the same characteristics. However, we can

group particles by the locations where the particles enter into the system, for example,

by the cell rows. New particles enter from the left boundary of the simulated space on

the center point of each row continuously. We can reasonably group particles according

to their entry point. Figure 4.1 shows two groups of particles. Group one is represented

by the black circles and consists of the particles that initially enter from row one.

28

Group two is represented by blank circles and consists of the particles entering from

row two. Each group of particles is assigned to a node and all particles in that group

remain assigned to that node for the entire simulation. As showed in Figure 4.1, group

one of particles is assigned to node 1, and group two of particles is assigned to node 2.

Figure 4.1 Lagrangian decomposition method and node mapping

Advantages

The advantage of using this decomposition method is that particles are bound to

a specific node and do not migrate to other nodes, so no communication time is spent on

particles migration.

Disadvantages

The particles do not directly interact with each other; they interact indirectly

through receptors that reside on the cell walls. When a particle comes in contact with a

receptor, the status of the receptor affects the new status of the particle and of the recep-

29

tor itself. If the status of the receptor changes, this state change is seen by particles that

later come in contact with the same receptor. This behavior of particles and receptors

requires that each node keeps an up-to-date map of the status of all receptors with which

its particle may come in contact. In the Lagrangian model, the particle tracked by any

one machine could be anywhere in the simulated space. Hence the Lagrangian ap-

proach complicates the simulation: every machine must know about every environ-

mental change in the entire system. This requires communications among all machines.

The more machines added to the system, the worse the performance is, so this approach

does not scale well.

4.1.2 The Eulerian decomposition method with vertical strips

Definition

In the Eulerian decomposition method, each node is responsible for a specific

region in the simulated space. Unlike the Lagrangian approach, we divide the fixed ge-

ometry space into problem regions, instead of dividing by the entity type.

Decomposition

One way of implementing this is method to partition the whole simulated space

into vertical strips that do not change the shape during the simulation. Each strip con-

tains a column of cells and is mapped to each node. Figure 4.2 shows the vertical strips

and the mapping. As new particles enter into the simulated space, it takes some time for

30

the particle to move from the entry point to the rest of the simulated space. Figure 4.2

shows that node 1 carries more particles than node 2. This is especially true at the be-

ginning of the simulation, when there are many more free particles than the stuck parti-

cles.

Figure 4.2 Eulerian decomposition method with vertical strips and node mapping

Disadvantages

One obvious problem with this approach is that there is no workload balance

and there is no parallelism at the beginning of the simulation.

A reasonable question is: can we cut vertical strips with different size of regions

to achieve the workload balance and parallelism? For instance, one node maps only a

portion of a most left cell column and another node can have multiple cell columns

mapped. This is not a good idea for our simulation model, because this breaks a cell

into multiple segments and allocates them into different nodes. The geometry break

causes heavy communication traffic between notes to keep the cell segments synchro-

31

nized. This complicates the simulation by adding communication overhead.

4.1.3 The Eulerian decomposition method with horizontal strips

Definition

An Eulerian method alternative is to partition the simulated space into horizontal

strips. The shape of the strips does not change in the simulation. New particles are

grouped by the location where they enter into the simulated space. Each horizontal strip

accepts new particles at each time step.

Figure 4.3 Eulerian decomposition method with horizontal strips and node mapping

Decomposition

Figure 4.3 illustrates particles that are grouped into horizontal strips based on

their locations in the simulation space. Each horizontal strip is mapped on to a node.

32

This approach combines the geometry partitioning and particles grouping. The geome-

try mapping to a node does not change in the entire simulation. As the particles migrate

to the neighbor nodes, the workload will remain balanced during the simulation.

Advantages

The proportion of particles in each horizontal strip does not change dramatically

during the simulation. This approach provides good workload balancing during the en-

tire simulation. The only required communications is when particles migrate to the

neighbor nodes, instead of among all nodes in the entire simulated space, when particles

walk across the horizontal boundary. This is the approach we have chosen for the

simulation.

Disadvantages

The required communication between neighbor nodes can also be costly, if the

particle migration occurs too frequently. We can reduce the communication frequency

by utilizing techniques that we will discuss in the next chapter.

Partitioned space mapping on nodes

Using the Eulerian horizontal strip decomposition approach, Figure 4.4 shows a

5-5 cells virtual space partitioning and mapping to a 5 node network. Each row of cells

is mapped to a node. Each node has two neighbors, for example node 2 has two

neighbors: node 3 is the lower neighbor and node 1 is the upper neighbor. We wrap

around the simulated space by connecting node 1 and node 5. Therefore node 5 is the

33

lower neighbor of node1 and node 1 is the upper neighbor of node 5. The numbers

marked in each cell represent cell numbers: there are 25 cells, 5 cells per row. Figure

4.5 lists the nodes and their neighbors.

Figure 4.4 An example of simulated space partitioning and node mapping

Node Upper neighbor node Lower neighbor node 1 2 5 2 3 1 3 4 2 4 5 3 5 1 4

Figure 4.5 An example of nodes and its neighbors

4.2 Implementation

4.2.1 Overview

34

We use MESSENGERS [BFD96, FBD98, FBD99, FCH+08], a distributed program-

ming system based on the principles of autonomous objects, developed in our research

group, to support implementation of our distributed individual-based simulation. The

autonomous objects, called Messengers, carry their own behaviors, perform tasks in a

form of program, and are capable of navigating through the underlying network.

In the implementation, we use these types of Messengers: 1) a creator Messen-

ger to create a logical network on a set of nodes, 2) a task Messenger to carry the com-

putation task on each node, and 3) a shuttle Messenger to perform the synchronization

between the neighbor nodes.

Figure 4.6 Messengers system architecture and the task and shuttle Messengers

Figure 4.6 shows the Messengers system architecture and activities of the task

Messenger and the shuttle Messengers. Each node has a task Messenger, drawn with a

gray circle labeled “TM”, which carries the work load for the node in the simulation. A

35

left-shuttle Messenger is drawn with a gray circle labeled “LM” with dashed-line and a

right-shuttle Messenger is drawn with a gray circle labeled “RM” with dashed-line.

Two shuttle Messengers are injected by a task Messenger to migrate particles and up-

date state of the cell grid to its neighbors at every time unit. A shuttle Messenger has a

short life in the system; it hops to its neighbor and uploads the particles and the space

strip information to the neighbor node, then exits from the system. In the next section,

we describe details of the Messengers implementation.

4.2.2 Messengers

Creator Messenger

We use a creator Messenger to create a logical network on a set of physical

nodes and inject a task Messenger on each node. Figure 4.7 shows the Messenger

pseudo code of the creator Messenger script.

1 create(node_name;n_link;physical_node); 2 for ( number of nodes < total_nodes ) { 3 create(node_name;n_link;physical_node); 4 } 5 // Create link between the first node and last node 6 create(node_name;n_link;physical_node); 7 // at the first node; 8 for ( current_node < total_nodes ) { 9 inject(Task_Messenger); 10 hop(link=+”n_link”); 11 } 12 exit;

Figure 4.7 Creator Messenger script pseudo code

36

Three Messengers functions are called by the creator Messengers script. The

Messengers logical network is created by statements in line 1-6. Creator Messenger in-

jects a task Messenger on the node (line 9) and hops to the next node (line 10). The for-

loop loops through all nodes in the network. Three Messengers statements are called by

the creator Messenger to perform this work.

1. create: This Messengers statement creates a logical node on a specified physi-

cal node. It also generates a link along with the Messenger moves. The links

we create between nodes are two-way links.

2. inject: This Messengers statement activates another Messenger. The activated

Messenger starts work on the same node. The function can pass the parameters

to the injected Messenger. The creator Messenger uses this statement to prompt

the task Messenger on each node.

3. hop: The hop statement makes the Messenger to be navigated to other nodes by

the link along the underlining network. The creator Messenger hops along the

logical nodes network on the physical nodes to inject the task Messenger on

each note.

Task Messenger

A task Messenger is injected by the creator Messenger on each logical node.

Basically, a task Messenger reads the update information sent by the neighbor nodes

and performs the tasks assigned to its node. A task Messenger also uploads up-to-date

information and injects shuttle Messengers to communicate with the neighbors. Figure

37

4.8 presents the Messenger script pseudo code for a task Messenger.

1. // Initialization 2. initParameters(); 3. createNodeGrid(); 4. initParticles(); 5. signalEvent(left_shuttle); 6. signalEvent(right_shuttle); 7. // Simulation 8. while( current time < simulation time) 9. { 10. waitEvent(e_left_shuttle, i); 11. waitEvent(e_right_shuttle, i); 12. updateStuckParticle(); 13. moveReceptors(); 14. degradeParticles(); 15. computeParticles(); 16. loadParticlesToShuttles(); 17. current time++; 18. inject(left_shuttle); 19. inject(right_shuttle); 20. waitEvent(left_shttle_hop); 21. waitEvent(right_shuttle_hop); 22. } 23. exit;

Figure 4.8 Task Messenger script pseudo code

The script initializes the node parameters and node particles (line 2, 4). We

have two kinds of Messengers: 1) task Messenger that runs and stays on the node, and

2) shuttle Messenger that comes and goes to perform the data exchange. When both

task and shuttle Messengers are executed at the same time on a node, a synchronization

of multiple Messengers must be done on that node, so the Messengers can run in a cor-

38

rect order. The event synchronization mechanism is used to manage the Messengers.

Two statements, signalEvent(event) and waitEvent(event) are used to synchronize each

other, along with the node variable event.

The simulated space mapped on each node is created on line 3. The while-loop

(line 8-22) is the main task of the program, which simulates the particle’s movement for

in fix simulation time. At each iteration, the task Messenger waits for the neighbors to

finish uploading the information (line 10-11). The number of stuck particles gets up-

dated (line 12). Line 13 moves the receptors around cell segments, and line 14 calcu-

lates and processes the degraded stuck particles. All free particles are calculated to the

next position (line 15). The particles that walked crossing to the neighbors are loaded to

the shuttle variables (line 16). The shuttle Messengers are injected to the neighbors

(line 17-18). Before advancing to the next iteration, the task Messenger waits shuttle

Messengers to finish loading the data from node variable to the Messenger variable ac-

cordingly (line 20-21).

There are eight C functions are called in the task Messenger script. Following

are the description of each function.

1. initParameters: This function initializes the simulation parameters, such as the

size of a cell grid, the number of receptors in each cell segment, and the simula-

tion parameters.

2. createNodeGrid: This function creates the simulated space on each node. The

simulated space has a grid cell structure and all cell grids have the same size.

39

3. initParticles: Each node is responsible to simulate a group of particles. This

function initializes the Messengers node variables for particles and allocates the

node variable arrays of particles.

4. updateStuckParticle: This function calculates and updates the number of stuck

particles in each cell segment.

5. moveReceptors: This function calculates the number of free and captured re-

ceptors in each cell segment and moves the receptors among the neighbor seg-

ments with a pre-defined exchange rate.

6. degradeParticles: This function calculates the number of degrading particles

based on the number of stuck particles in each cell segment. The rate of particle

degrading is pre-defined. If a stuck particle degrades from the system, the re-

ceptor that held it gets freed and its state changes from occupied to free.

7. computeParticles: This function processes all free particles residing on the

node. It calculates the next movement of free particles and makes decision on

particle capturing or degradation. If a particle gets captured by a free receptor,

the particle’s state changes from free to stuck, and is removed from the free par-

ticle list. The receptor that captures the particles changes its state from free to

occupied. The number of free receptors is decreased by 1.

8. loadParticlesToShuttles: This function sorts out all free particles. If there are

particles crossed over its node boundary, the particles are loaded to the shuttle

node variables and ready to be migrated to the neighbor nodes.

40

Shuttle Messenger

A shuttle Messenger is injected by a task Messenger on a node. It hops along

the underlying network between the nodes. A shuttle Messenger 1) sorts out the parti-

cles that are in neighbor’s territory, and 2) carries these to-be-migrate particles and hops

to the neighbor node and download the particles to the neighbor node variables. A shut-

tle Messenger exits system after finishing its task. So a new shuttle Messenger is in-

jected at every time. We have two shuttle Messengers, a left-shuttle Messenger and a

right-shuttle Messenger. The left-shuttle Messenger hops to the upper neighbor. The

right-shuttle Messenger hops to the lower neighbor. Figure 4.9 shows the script pseudo

code of a left shuttle Messenger. The right shuttle Messenger does the similar job as

what left shuttle Messengers does. The difference is that the left shuttle hops to its up-

per neighbor and the right shuttle Messenger hops to the lower neighbor.

Line 1 loads the migrated particles stored in the node variables to the Messenger

variables of the left shuttle Messenger. Line 2 loads the current cell grid state (the state

of stuck particles and state of receptors) to the Messenger variable of the left shuttle

Messenger. When the left shuttle is ready to hop to its neighbor, it sends signal out to

indicate that the left shuttle has finishing loading the data and ready to leave (line 3).

The left shuttle Messenger then hops to its upper neighbor node (line 4). On the

neighbor node, the arriving shuttle needs to wait neighbor’s shuttle Messengers to leave

before updating data to the neighbor node variables (line 5-6). The migrated particles

are uploaded into the neighbor on line 7 and the cell grid state is updated on line 8. Fi-

nally, the left shuttle Messenger signals the task Messenger to continue for the next it-

41

eration.

1. shuttleLoad(node_left_out, msgr_left_part); 2. gridMapShuttle(&param, node_grid, msgr_grid); 3. signalEvent(left_shuttle_hop, i); 4. hop(link=+/-”nodeLink”); 5. waitEvent(left_shuttle_hop, i); 6. waitEvent(right_shuttle_hop, i); 7. shuttleLoad(msgr_left_part, node_right_in); 8. gridMapShuttle(&param, msgr_grid, node_grid); 9. signalEvent(e_right_shuttle, i); 10. exit;

Figure 4.9 Left shuttle Messenger script pseudo code

There are two C functions called in the left shuttle Messenger script. Following

are the description of each function.

1. shuttleLoad: This function loads migrated particles from a node area of the

task Messenger to left shuttle Messenger. If there is no particle migration

needed, length of the list of migration particle is set to zero.

2. gridMapShuttle: This function synchronizes the cell map (the state of rece-

tores) between the neighbors. The cell map is updated through the data syn-

chronization between neighbor nodes.

4.3 Consistency with sequential implementation

We have described how to implement a parallel version of our simulation. We need to

verify that the parallel version can produce results consistent with the sequential simula-

42

tion. A common criterion is statistic consistency: the results may be somewhat different

by the exhibit similar statistical behavior. We describe here how to achieve something

stronger, exact consistency. We ensure that the behavior of each particle is exactly the

same in our parallel and sequential simulations.

4.3.1 Identical results with the sequential simulation

Handling random numbers

Figure 4.10 Random number sequences

In Figure 4.10, we give an example of generating random numbers in the se-

quential and the parallel simulations. P1, P2 and P3 are three particles. The sequential

simulation runs on node 1. One random number sequence S1 is generated for all parti-

cles on node 1. In the parallel simulation, one random number sequence is generated

for all particles on one node. We assume that P1 and P2 enter into the system from

node 1, and P3 enters into the system from node 2. Sequence S1 is used by the particles

43

on node 1, therefore P1 and P2 use the same sequence S1. The sequence S2 is gener-

ated for the particles on node 2, therefore P3 uses sequence S2. This implementation

produces the different results. Random numbers generated in the sequential simulation

are different than the random numbers generated in the parallel simulation for simulat-

ing the same particle. For example, at the iteration 2, in the sequential simulation, the

random number generated for the particle P1 is the fourth random number in the se-

quence S1, labeled “4, S1”. In the parallel simulation, it is the third random number in

the sequence S1, labeled “3, S1”. To resolve this difference, we assign a same random

number sequence to a cell row in both simulations. Figure 4.11 shows that, before the

migration occurs, the random numbers for each particle in both sequential and parallel

system are consistent. However, with this set, there is another problem in getting the

identical random numbers in the sequential and the parallel simulations.

Figure 4.11 Random number sequence change in particle migration

44

In the parallel simulation, particles migrate to the neighbor nodes. The migrated

particles are treated as new particles on the neighbor nodes. The random number se-

quence used by the particle is generated for that cell row where the particle first time

enters system. So it could lose its original random number sequence and starts a new

sequence on the node it migrates to. This cause the random numbers vary on both cell

rows. Figure 4.11 illustrates random number sequence change after the particle migra-

tion.

At iteration 3, particle P2 migrates to node 2. It loses its original random num-

ber sequence S1. On node 2, P2 is treated as a new particle, so the random number is

assigned to it as the next number in the sequence of S2, which is labeled as “4, S2”.

Figure 4.12 Random number sequences unique to new particles

In order to achieve exact consistency between distributed and sequential simula-

tion, we need the sequence of random numbers generated for every particle to be the

45

same in both systems. To accomplish this we assign a unique random number sequence

to each new particle. So the random numbers used by the simulation on that particle are

generated in the sequence bound with the particle no matter where the particle resides

during the simulation. When the particle migrates to its neighbor, it carries the random

number sequence structure with it. We use this mechanism in both sequential and par-

allel systems, so the random number sequence for a particular particle is always identi-

cal. Figure 4.12 illustrates the sequences that are unique to each new particle.

4.3.2 Random number seed initialization

We create a unique random number seed for each new particle. New particles enter

into simulated space on each cell row. Each cell row is mapped on to a node. Figure

4.13 shows the function of initializing seed structure.

The function initSeed is called when a new particle is generated. The seed value

is an accumulated number of integers with an initial value of 99 plus the node number

(line 3 – 4). We use C function srand48_r(long int seedval, struct drand48_data

*buffer) to initialize the seed structure (line 7 – 9). Array randState(particele_index) is

a node variable. It is used to store the value of the initialized seed structure. Each ele-

ment in the array is bound to the particle by the particle_index. The particle_index is

assigned to a new particle according to the order it enters into the system on each node.

The data type of drand48_data takes memory of 24 bytes. We use C library function

memcpy to copy the data to the node variable array (line 10). So during the simulation

46

when a random number is generated for simulating a particle movement, it is always

generated from the sequence that is initialized and saved with the particle at the initiali-

zation time.

1. function initSeed() 2. { 3. int node_i[5] = {1,2,3,4,5} 4. int seed_int = 99; 5. struct randState_s *&randState[particle_index]; 6. seed_int = seed_int + node_i[node_label]; 7. struct drand48_data *seedState; 8. seedState = (struct drand48_data *)malloc(sizeof(struct drand48_data)); 9. srand48_r(seed_init, seedState); 10. memcpy(randState, seedState, 24); 11. free(seedState); 12. }

Figure 4.13 Function to assigning a random number seed to a particle

.

The shuttle function in the shuttle Messengers

The shuttle function in the shuttle messengers carries the particle and its random

number sequence structure when hopping between the nodes. There are two more vari-

ables added in the shuttle function: (1) node_randState_left_out and (2)

node_randState_right_in. The shuttle function loads the particle and its random num-

ber sequence from the node variables to Messenger variables before it hops to the

neighbor node. The modified shuttle functions called in the left Messenger are as fol-

lows:

• shuttle(node_left_out, msgr_left_part, node_randState_left_out, left_randState)

47

• shuttle(msgr_left_part, node_right_in, left_randState, node_randState_right_in)

48

5. Chapter 5

Simulation Enhancement: Parallel Simu-

lation Protocols

The main goal is to parallelize the simulation model to speed up the simulation execu-

tion, so that we are able to simulate this application long enough in the sense of biologi-

cal time, in a realistic time frame. In the previous section, we created the system struc-

ture for this distributed individual-based simulation. The problem with that implemen-

tation is the slowness caused by communication overhead. Because of this overhead,

the simulation execution takes more time in the parallel computation than the sequential

execution. In this section, we present our approach to enhance the system implementa-

tion and improve the performance of the parallel computing. The two major issues we

need to address and resolve are 1) the communication overhead and 2) the consistency

49

between the parallel and sequential simulations. There are some other subtle problems

that arise and will be discussed as well.

Based on the simulated space mapping on to the machine nodes, which we have

discussed in the previous sections, when a particle moves across to a neighbor node,

communication between the nodes is required. The communication becomes a signifi-

cant factor in slowing down the simulation execution when such communication occurs

more frequently. The overhead of the communication makes more impact when the

simulation runs for a long period of time.

The correctness of the results produced by the parallel system is another con-

cern. The simulation model is originally implemented sequentially. Converting it to a

parallel simulation creates a speed vs. accuracy trade-off.

In the following sections, we present our solutions to these two issues.

5.1 Exchange less frequently

We reduce the communication overhead by decreasing the frequency of the communi-

cations between the nodes. This requires finding a level of granularity of communica-

tion delay that gives us adequate parallelism and does not compromise the accuracy of

simulation results.

Epoch

We define an epoch as a time interval between two occurrences of data ex-

50

change. The length determines the granularity of communication used in the system.

For example, the system synchronization in a distributed simulation can be designed to

occur at every iteration or at every epoch.

Epoch length

In our simulation the time interval or epoch is measured by number of iterations.

We use epoch length to quantify the time interval or epoch. For example, if we set ep-

och length to 500 iteartions, we exchange data at every epoch; then the communication

delay is 500 iterations.

Exchange every epoch

Figure 5.1 and Figure 5.2 show a simulation that runs with 3 iterations. Figure

5.1 (a) shows the communication that occurs when data is exchanged every iteration. Ti

is the computation time for iteration i, Ci is the time spent on network communication

and data packing and unpacking. The total computation time is 3Ti. The total commu-

nication time spent is 3Ci. The total time spent on 3 iterations is 3Ti+3Ci.

Figure 5.1 (b) shows communication that happens when data is exchanged every

3 iterations (i.e., with an epoch length of 3). Te is the time spent for the total computa-

tion, Te=3Ti. Ce is the total communication time. In Generally, 3Ci > Ce, the communi-

cation overhead is reduced when the communication granularity level is increased by

number of iterations.

Our goal is to find out the length of epoch that can serve the best for the least

51

communication overhead and keep the most of accuracy of the computation. This ap-

proach provides a way to speedup the simulation. However, at the same time it intro-

duces the communication delay.

Figure 5.1 Communication granularity level

52

The delay in turn introduces two issues:

1. A particle can move across to a neighbor node during an epoch. This is ad-

dressed by introducing shadow cells.

2. The information in shadow cells may not be current, causing particle to be-

come stuck when they should not, or conversely. We describe the problem

in more detail in section 5.3, and we describe the solution in section 5.4.

Figure 5.2 Node mapping with shadow cells

53

5.2 Shadow cells

We create shadow cells to extend the local node boundary. A shadow cell is a copy of

the neighbor cell at the beginning of the epoch. During the epoch, the shadow cells are

part of the local node working space. Figure 5.2 shows a 5-node mapping with shadow

cells. The simulated space consists of 25 cells in 5 rows and 5 columns, numbered from

1 to 25. Each node maps a row of local cells, and two rows of shadow cells of its

neighbors. The local cells marked with a bold numbers and shadow cells marked in

gray numbers. For example, node 1 maps one row of local cells marked by number 1 to

5, and two rows of shadow cells marked by number 6 to 10 and 21 to 25 respectively.

Shadow cells are synchronized at the beginning of the epoch and can be accessed and

processed by the local node. The length of the epoch determines how frequently the

shadow cells get refreshed. At the start of each epoch, the entire simulated space gets

synchronized with the local data exchange between neighbors.

A potential problem with this approach is that the out-of-date information in the

shadow cells could cause a conflict problem between a local particle and an incoming

particle. For example, consider Figure 5.3, which shows the view of a local node to an

incoming particle. In this example, node 1 is the local node; node 2 is the lower

neighbor. Particles on node 1 are labeled by letter A. Particles on node 2 are labeled by

letter B. At the beginning of an epoch, all A particles on node 1 and all B particles on

node 2 are within their respective node cell boundaries. As the simulation continues

and a particle on node 2 cell number 6 moves across its boundary into node 1 cell num-

54

ber 1 area, we mark this particle as A’ to distinguish it from other particles on node 1.

This particle A’ is an incoming particle to node 1. Before the end of the epoch, node 1

does not see this particle. Node 1 only processes its own particles – A particles, while

particle A’ is still moved and processed by node 2. On the other hand, node 2 does not

see the up-to-date information on node 1, so it processes particle A’ based on stale in-

formation about node 1 that was stored on node 2 at the beginning of the epoch.

Figure 5.3 View of incoming particle A’ on local node (node 1)

After particle enters into its neighbor cell area, a free receptor on node 1 could

be available to a local particle A and also to the incoming particle A’ at a later time. If

the local particle is captured by the free receptor, then the incoming particle, which is

only visible to the node 2, could be captured by the same free receptor because at the

time the node 2 sees the cell on node 1 as its shadow cells and does not know that the

receptor has became occupied by capturing a local particle. We illustrate this example

in scenario 1 of section 5.3, below.

55

If we do not correct the information mismatch caused by introducing shadow

cells, the simulation loses its accuracy. Such inaccuracies can increase as the simula-

tion continues. To correct this we developed a conflict resolution scheme, described in

section 5.4.

5.3 Conflict scenarios description

By introducing the shadow cells to support the less frequent data exchange, a conflict

happens when two free particles are candidates to be captured by same free receptor.

Because the communication delays in the distributed simulation, such conflict has to be

resolved at the end of epoch. In this section we describe these scenarios of showing

how these conflicts occur. We then give our solution in the following sections. We re-

fer to Figure 5.3 for the description of local node (node 1), neighbor node (node 2), lo-

cal particles (particle A), incoming particles on local node (particle A’) and neighbor

node particles (particle B).

5.3.1 Scenario 1: A free particle becoming stuck when it should

not

In this first scenario, one free receptor is available to a local particle. It captures the lo-

cal particle and becomes occupied. In a later time, the same receptor that has become

occupied appears to a neighbor node to be a free receptor, because the neighbor reads

the stale data saved at the beginning of the epoch. As a result, the neighbor node cap-

56

tures another particle (an incoming particle), which it should not capture. Figure 5.4

shows the sequence of events of particle capturing in both sequential and parallel simu-

lation within an epoch. We use a simple data set to describe this scenario.

Figure 5.4 Views of local node with an incoming particle of the local node (scenario 1)

Figure 5.4 (a) shows the sequence of events that occur on the node 1 in the se-

quential simulation. Figure 5.4 (b) and (c) illustrate the sequence of events that occur in

distributed simulation where we use the shadow cells. Figure 5.4 (b) illustrates the view

57

of the node 1 that appears to the node 1. The question mark indicates that the incoming

particle (particle A’) is not visible to the local node. Figure 5.4 (c) shows the view of

local node that appears to the neighbor node, node 2 (the local cells are the shadow cells

of the neighbor node). The question mark indicates that the local particles (particle A)

are not visible to the node 2. The time periods T0 through Te are the subsequence time

periods within an epoch.

Figure 5.5 illustrates this scenario graphically with the time line. The sequential

simulation runs on one node and the parallel simulation runs on two nodes with shadow

cells mapped on each node. The shadow cell is labeled by letter S.

Figure 5.5 Views of particles movement in sequential and parallel implementations of scenario 1

58

1. At time T0, the beginning of the epoch, the system is synchronized by exchanging

data between neighbor nodes. Figure 5.4 (a), (b) and (c) have the same view: one

free receptor (#of stuck particles) on the local node, and one local free particle (#of

local particles). In the synchronization, all the incoming particles are migrated to

the local node, so there are no incoming free particles at the beginning of each ep-

och.

2. At time T1, one particle on the neighbor node walks across the boundary becoming

an incoming particle on the local node. Figure 5.4 (a) shows that without communi-

cation delay or in a sequential simulation, the local particle and the incoming parti-

cle are treated the same in the sequential simulation in which all cells are located on

one node. So this view shows that there are two free particles are processed, one is

the local particle and another one is the incoming particle.

In the Figure 5.4 (b), there is a question mark for the incoming particle. The

question mark here indicates that, during the epoch, the node 1 does not see any in-

coming particles that come from node 2. At this time, node 1 has an incoming par-

ticle that has crossed from node 2, but node 1 does not see it, so there is a question

mark for the incoming particle. The node 1 only moves the local free particle to the

next location.

In the Figure 5.4(c), the question mark indicates that the node 2 does not see

any particles on node 1 during the epoch. At this time, one particle that was located

59

on the node 2 before time T1 walks cross the node boundary and enters into its

shadow cell. This particle is an incoming particle of node 1, but is moved and proc-

essed by the node 2 based on the stale information of node 1 during the epoch.

3. At time T2, Figure 5.4 (a) and (b) show that a particle capturing event occurs. Fig-

ure 5.4 (a) shows that there are two free particles, and the local free particle gets

captured by the free receptor and becomes a stuck particle. The incoming particle

moves to the next location.

The particle capturing event on node 1 is illustrated in Figure 5.4 (b). The

node 1 sees only the local free particle; it gets captured by the free receptor and be-

comes a stuck particle.

In this example (Figure 5.4 (c)), the particle capturing event does not occur

on node 2, although node 2 sees that there is a free receptor available in its shadow

area (node 1) based on the stale information saved on the neighbor node at T0. The

incoming particle may be too far away from the available free receptor on the local

node, or the capturing probability does not satisfy the capturing criteria. So node 2

moves the incoming particle to the next location in its shadow area.

4. At time T3, node 1 sees there is no free receptor left on the local node. Node 1 then

moves the incoming particle to the next location (Figure 5.4 (a)). Figure 5.4 (b)

shows that there is no free receptor available and there is no local free particle to be

moved. So node 1 does not have any work to do.

60

However, a particle capturing event occurs on node 2; see Figure 5.4 (c).

Node 2 reads that there is a free receptor available on node 1, its shadow area. It

processes the incoming particle. The incoming particle gets captured by the free re-

ceptor that is the same free receptor that captured the local particle at T2 to become

a stuck particle. Comparing the events that occur in the sequential simulation illus-

trated in Figure 5.4 (a), this particle capturing event on node 2 should not occur and

this incoming particle should not be captured. The cause of this problem is because

node 2 does not see the up-to-date information on node 1 and uses the stale informa-

tion to make the decision of capturing. The solution for this scenario is described in

section 5.4.1.

5. At time Te, it is the end of the epoch. In the sequential simulation, there is a stuck

particle and a free particle in the system. In the parallel simulation, on node 1, it

ends with a stuck particle and 0 free local particles, and unknown incoming particles

(Figure 5.4 (b)). On node 2, there is a stuck particle and 0 free incoming particles,

and unknown local particles (Figure 5.4 (c)). These unknown particles are resolved

at the time when the system performs the system synchronization by exchanging the

data between the nodes at the end of the epoch. In this example, there is no particle

needs to be migrated between the local node and the neighbor node. We discuss the

particle migration in section 5.4.3.

The basic idea of resolving the scenario 1 is to re-process the particle capturing

events and release the stuck particle that should not be stuck. We implement a mecha-

61

nism for this solution in the conflict resolution. The detail of this solution is discussed

in section 5.4.1.

5.3.2 Scenario 2: A free particle not becoming stuck when it

should

Figure 5.6 Views of local node with degraded stuck particles on local node (Scenario 2)

62

In general, data can only be recovered if it was saved or can be reproduced. In

the distributed simulation, the communication delay can cause information lose by up-

dating system based on stale data. We have seen the free receptor conflict in the sce-

nario 1 by using stale data at T3 on node 2. However, the conflict in that situation can

be resolved in the conflict resolution, in which the stuck particles can be recalculated to

produce the correct result. In the second scenario, the problem is that a free particle

does not get stuck when it should. When this happens, the system has no knowledge

about this event. Therefore, this lost stuck particle cannot be recovered. We use a sim-

ple dataset to describe this scenario. Figure 5.6 shows sequence of events on node 1.

Figure 5.6 (a) illustrates the sequence of events in the sequential simulation. Figure 5.6

(b) and (c) illustrates the sequence of events in the parallel simulation by using the

shadow cells.

Figure 5.7 illustrates this scenario graphically with the time line. The sequential

simulation runs on one node and the parallel simulation runs on two nodes with shadow

cells mapped on each node. The shadow cell is labeled by letter S.

63

Figure 5.7 Views of particles movement in sequential and parallel implementations of scenario 2 1. At time T0, it is the beginning of the epoch. The local node data is same in both se-

quential and parallel simulations. The local data is also saved by the neighbor node.

There is 1 stuck particle on the local node.

2. At time T1, Figure 5.6 (a) illustrates a particle degradation event that occurs on the

local node. The stuck particle is degraded from the system; the receptor that cap-

tured the particle is freed. In the sequential simulation, the just freed receptor is

available to the free particles.

Figure 5.6 (b) and (c) illustrate particle activities in the parallel simulation

on two nodes. Figure 5.6 (b) shows the particle degradation event on node1. The

64

receptor that is occupied by the stuck particle is freed. This newly freed receptor is

available only to the local particles because node 1 does not see any incoming parti-

cles from its neighbor node.

Node 2 however is not aware of the particle degradation event on node 1, so

it does not see the newly freed receptor (Figure 5.6 (c)). The number of free recep-

tors appears to node 2 remains 0, which is read from the data saved at the beginning

of the epoch, at time T0.

3. At time T2, in Figure 5.6 (a), it illustrates that a particle walks across the boundary

and become an incoming particle. The incoming particle is treated same as a local

free particle in the sequential simulation.

In the parallel simulation, the node 1 does not see the incoming particle that

comes across from the neighbor node; see Figure 5.6 (b). There is no work to do for

the node 1 at this time period.

On node 2, a particle moves across its node boundary and enters into its

shadow cell area becoming a incoming particle of node 1. Node 2 processes this

particle in the shadow area and moves it the next location on node 1 (Figure 5.6 (c)).

4. At time T3, in the sequential simulation (Figure 5.6 (a)), the incoming particle is

captured by the newly freed receptor.

In the parallel simulation, node 1 is not aware of the incoming particle that

comes from a neighbor at time T2, the capturing event does not occur on node 1.

The newly freed receptor remains free on node 1, see Figure 5.6 (b).

65

While the neighbor node, node 2 is not aware of the newly released free re-

ceptor on node 1, so the particle capturing event does not happen in the shadow area

on node 2. This incoming particle processed by node 2 should become stuck but is

not, because the current information about the free receptor on the local node is not

passed to the neighbor node during the epoch. The data available to the neighbor

node is the stale data saved at time T0.

The problem with this lost stuck particle is that it cannot be corrected later

on because both local and neighbor nodes have no knowledge about the to-be-

captured incoming particles. The event of capturing the incoming particle by the

free receptor on the local node is not snapshot by the system and therefore it cannot

be found or re-produced later on. To be able to capture this incoming particle when

it should be captured, we need an available free receptor that appears to the

neighbor node at the time. To make this happen, we overstate the number of free

receptors on the shadow cells at the beginning of the epoch. The detail is discussed

in section 5.4.2.

5. At time Te, the end of the epoch, in the sequential simulation, there is a stuck parti-

cle in the system (Figure 5.6 (a)). In the parallel simulation, because the particle

capturing event does not happen during the epoch, there is a free particle remain in

the system, instead of a stuck particle. This also tells that, after time T3, the sequen-

tial and parallel simulations produce a different result that cannot be corrected. In

66

this scenario, the discrepancies between the two simulations cannot be resolved at a

later time.

We have described two conflict scenarios in this section. In the following sec-

tion, we discuss the solutions and the more potential problems in each scenario.

5.4 Conflict resolution

We developed a conflict resolution to deal with the scenarios we identified in the sec-

tion 5.3. The conflict resolution scheme is to (1) take snapshots of every particle cap-

turing event that occurs during the epoch, (2) take snapshots of every particle degrada-

tion event that occurs during the epoch, and (3) re-process the data that is collected dur-

ing the previous epoch to resolve the conflicts to obtain the correct simulation results.

Figure 5.8 Tentatively stuck particles on node 1

Tentatively stuck particle

We name a particle a tentatively stuck particle if it is captured by a free receptor

during the epoch. The tentatively stuck particles include the stuck particles on the local

cells, and the stuck particles that are in its shadow cells on the neighbor nodes. Figure

67

5.8 illustrates the tentatively stuck particles on node 1. Cell 1 to 5 are the local cells of

node 1, cell 6 to 10 are the shadow cells of its lower neighbor node, and cell 21 to 25

are the shadow cells of its upper neighbor. There are total of 5 tentatively stuck particles

in this figure. There are three local tentatively stuck particles which are in the local

cells labeled by . The stuck particle in the shadow cells of lower neighbor is labeled

by . The stuck particle in the shadow cells of upper neighbor is labeled by .

This means that during the epoch, node 1 takes five snapshots to record five particle

capturing events.

Tentatively stuck particle tag

We use a tentatively stuck particle (TSP) tag to save the snapshot whenever a

particle capturing event occurs during the epoch. A TSP tag attaches to every tenta-

tively stuck particle. By the end of epoch, in the conflict resolution process, these TSP

tags are exchanged between nodes. Figure 5.9 shows the TSP tag.

Figure 5.9 TSP tag structure

68

The tag entries are:

1. particle index: This is the index of the free particle list, used to remember

the entering order of the free particle.

2. iteration: This is the iteration when the particle is captured during the ep-

och.

3. location: This is the coordinates of geometry location where the free particle

resides before it gets captured.

4. event type: This event type is used to indicate the tentatively stuck particle

that it is a local stuck particle, or is captured in the shadow cells. If it is cap-

tured by on local cell, it is a type I event, otherwise it is type II. For exam-

ple, the event type in the tag for a A tentatively particle is I; the event type

is II for and tentatively stuck particles, which are captured in

shadow cells.

5. cell segment number: this is the number of cell segment in which the parti-

cle is captured.

6. number of free receptors: This is the number of free receptors at the time a

particle gets captured.

7. number of return hit: This is one of the parameters used in calculating the

capturing of a particle.

8. random number for the capturing probability: This is the random num-

ber generated when the particle gets captured.

Degraded stuck particle tag

We use a degraded stuck particle (DSP) tag to save the snapshot when a stuck

particle degrades from the local cells. Figure 5.10 shows the DSP tag. A DSP tag at-

taches to the stuck particle that is degraded from the system, and is processed in the

69

conflict resolution.

Figure 5.10 DSP tag structure

5.4.1 Solution to scenario 1

As described in the scenario 1 in the previous section, conflicts occurs when one free

receptor captures two free particles during the epoch. The rule in our simulation is that

one free receptor can only capture one free particle at a time and then it becomes an oc-

cupied receptor. A free particle cannot be captured by an occupied receptor. So the

second particle capturing event should not occur. In the section we describe how we fix

this problem.

To fix the problem, we (1) take snapshot of all tentatively stuck particles during

the epoch and (2) exchange the TSP tags between the nodes and re-run the simulation of

the last epoch to confirm the tentatively stuck particle or release it as a free particle.

Because the conflict is caused by type II tentatively stuck particle, the re-run process

should be only executed when there is a type II tentatively stuck particle.

Figure 5.11 illustrates the process of taking the snapshots during the epoch and

how the conflict resolution resolves the tentatively stuck particles for scenario 1. Figure

70

5.11 (a) shows the snapshot that is taken on the local node during the epoch. Figure

5.11 (b) shows the snapshot that is taken on the neighbor node during the epoch. Figure

5.11 (c) illustrates the re-processing of the tags of tentatively stuck particles. It releases

the tentatively stuck particle that was tagged at T3 on node 2 and moves it to the next

location. The process is described as the following:

Figure 5.11 Solution for scenario 1

71

1. At the simulation time T2 during the epoch, a free receptor is available to the local

node. The free receptor captures the local free particle. A snapshot is taken for this

capturing event, labeled by the TSP tag. The event type of this TSP tag is type I,

see Figure 5.11(a).

2. Figure 5.11(b) shows that, at the simulation time T3 during the epoch, on the

neighbor node, the same free receptor that captured the local free particle at T2, ap-

pears to be a free receptor to the neighbor node in its shadow cell area. The particle

that walks into its shadow cell area on node 2 get captured by this free receptor that

is the same free receptor capturing the free particle on node 1. A TSP tag is created

for this tentatively stuck particle. The event type in this tag is type II because the

tentatively stuck particle was captured in the shadow cells of the neighbor node.

At the conflict resolution time, the TSP tags created on the shadow cells on

neighbor nodes during the epoch are sent over to the local node. In this example, the

local node has total of two TSP tags, one is a type I TSP tag and one is a type II tag.

This means that during the epoch, at the most two particles can become stuck on the lo-

cal node.

The resolution repeats the simulation of the last epoch. It starts at T0. The dif-

ference of this repeat simulation with the normal simulation is that this repeat simula-

tion does not simulate the free particles; it only calculates the stuck particles. The goal

of the resolution is to find the tentatively stuck particles that should not become stuck,

and then release them back to free particles. Because the TSP tags collect all possible

72

stuck particles that could be stuck during the epoch, the resolution process is to read

each TSP tag to either confirm it as a stuck particle or release it to a free particle. If a

tentatively stuck particle is released as a free particle, it continues to move to the next

location until the end of the epoch. The tags are sorted first by the iteration at which the

snapshot took place. So the tag created at T2 is processed first followed by the tag cre-

ated at T3. Figure 5.11 (c) illustrates resolution process. It confirms one tentatively

stuck particle to be a stuck particle and releases another tentatively stuck particle as a

free particle. Following we describe process of resolving the TSP tags.

1. The repeat process starts at T0, but only re-calculates the capturing events at the

time when the snapshot was taken. At time T2, the first TSP tag that was taken at

time T2 during the epoch is processed. If the data in TSP tag matches the data in the

current simulation then the tentatively particle is confirmed as a stuck particle. Fig-

ure 5.11 (c) shows that the TSG data matches the current simulation, there is a free

receptor available in that cell segment; it confirms that the tentatively stuck particle

is a stuck particle. The free receptor becomes occupied.

2. At time T3 during the resolution process, the TSP tag that was created at T3 during

the epoch gets processed. There is no free receptor left in the cell segment, the sec-

ond tentatively stuck particle cannot be stuck. So it is released as a free particle and

this free particle continues to move to the next location until the end of the epoch.

The result of the conflict resolution for this example is that the local free particle

73

is stuck at T2. It is the only new stuck particle during this epoch. The second tenta-

tively stuck particle is released to be a free particle and continues to move to the next

location until the end of the epoch. The calculation of the free particle movement in the

resolution process is accurate based on the data saved by TSP tag. The conflict resolu-

tion ensures that the result is consistent with what the sequential simulation produces.

5.4.2 Solution to scenario 2

We have defined type I and type II tentatively stuck particles. The solution to scenarios

1 shows that in order to use the tentatively stuck particle mechanism we must ensure

that all of the tentatively stuck particles including local stuck particles (type I) and stuck

particles in shadow cells (type II) are successfully tagged during the epoch. Scenario 2

describes the problem of the type II tentatively stuck particles not being tagged during

the epoch. The reason why this happens is that the newly freed receptors are available

to the local free particles only because the neighbor node does not have visibility to the

local node during epoch. To resolve this problem, we overstate the number of free re-

ceptors in the shadow cells at the beginning of the epoch. By doing this, the number of

free receptors appears to the free particles in the shadow cells is more than it actually is.

So the possibility of capturing a free particle in a shadow cell is increased. This ap-

proach ensures that at least all possible tentatively stuck particles are detected and

tagged during the epoch.

We overstate the number of free receptors in the shadow cells at (1) the begin-

74

ning of the epoch and (2) during the epoch when the number of free receptors in the

shadow cell is zero read by the free particles in the shadow cell.

Figure 5.12 Overstating the number of free receptors of local node in the shadow cell on neighbor node

At the beginning of the epoch, the system has performed synchronization. All

the nodes have been updated by exchanging the data with the neighbors. So each node

75

has the current information of its neighbors and saves the information to its shadow

cells. We update the shadow cells by adding one additional free receptor to every cell

segment of the shadow cells. This information is used as stale data to calculate the in-

coming particles in the shadow cells during the epoch. This extra free receptor in the

shadow cell acts as a newly freed receptor released by a particle degradation event dur-

ing epoch and appears to the incoming particle in the shadow cell. Figure 5.12 illus-

trates this approach to resolve the scenarios 2.

Figure 5.12 (a) shows the snapshot that is taken on the local node during the ep-

och. Figure 5.12 (b) shows the snapshot that is taken on the neighbor node during the

epoch. Figure 5.12 (c) illustrates the re-processing of the two tags of tentatively stuck

particles. The process is described as follows:

1. At the simulation time T1 during the epoch, a stuck particle is degraded, so the re-

ceptor that bound with this stuck particle is freed. After it is freed, it is immediately

available as a free receptor to the local free particles. This particle degrading event

is tagged with a DSP tag, see Figure 5.12 (a).

2. At the simulation time T3 during the epoch, on the neighbor node, the incoming par-

ticle gets captured as a tentatively stuck particle in the shadow cell by the free re-

ceptor. This free receptor is a free receptor that was saved at the beginning of the

epoch as an overstated free receptor. The type II TSP tag is generated for this tenta-

tively stuck particle, see Figure 5.12 (b).

DSP tags are local node tags and do not exchange with neighbors. It contains

76

the event information of particle degradation on the local node. Because particle degra-

dation process changes the number of free receptors and number of stuck particles in the

system, the DSP tags must be processed with the exchanged DSP tags in the conflict

resolution.

Figure 5.12 (c) illustrates resolution process. It confirms that the tentatively

stuck particle at T3. Following we describe process of resolving the TSP tags.

1. When the conflict resolution runs to the first DSP tag at iteration of T1, the conflict

resolution confirms that the stuck particle is degraded and the receptor that bound

with the stuck particle is freed. The newly freed receptor is available immediately.

2. As the resolution process runs to time T3, the TSP tag created at T3 during the epoch

is processed. The newly freed receptor is available at the time. The number of free

receptors in the current system matches the number of free receptors saved in the

TSP tag, which is one free receptor. So this tentatively stuck particle is confirmed

as a stuck particle. It results the same result as the one from the sequential simula-

tion. By the end of epoch, there is one stuck particle and no free receptor left in the

system.

When the cell is full or almost full with the stuck particles, there is none or very

few of free receptors left. This is the situation when the simulation has run for a certain

period of time and free receptors have captured the particles up to its capacity. How-

ever, when the system reaches to this situation, the stuck particles start to degrade more

77

because the degradation rate is based on the number of stuck particles in a cell segment.

The more the stuck particles reside in a cell segment, the fast the stuck particles de-

grade. So this degradation can produce more free receptors during the epoch than it is

predicted at the beginning of the epoch. So adding one more free receptor to each cell

segment in the shadow cells at the beginning of the epoch may not always succeed in

this situation. To deal with this situation, we set the number of free receptor to 1 when

there is 0 free receptors read in the shadow cells by an incoming particle during epoch,

to increase the particle capturing probability.

The potential problem with this approach is that the overstating of free receptors

in shadow cells can have more particles tagged as tentatively stuck particles than they

should during the epoch. However, when it happens, it then becomes the issue we ad-

dressed in scenario 1.

5.5 The order of processing the particles

We have described the problems and resolutions in the event of particle capturing in the

distributed simulation. In the section we address the issue about the order of processing

particles.

5.5.1 First come, first processed

The order of processing particles is not in the biological sense. Driving on the itera-

tions, the particles are processed in the order determined only by the implementation. It

78

does not matter which particle is captured, they move simultaneously and instantane-

ously, but the same chosen order must be preserved in both simulation systems to make

the sequential and parallel simulations comparable and achieve an identical simulation

result.

In general, in the sequential simulation, the particles are processed in the priority

of first-come-first-processed. The order of processing particles can determine which

particle gets stuck. For example, two free particles are the candidates to be captured by

one free receptor. The one that comes first gets the priority to be stuck. Following we

alter the example we stated earlier in the scenario 1 (see Figure 5.4 (a)), to describe this

issue.

• If the both local and incoming free particles become candidates of the free re-

ceptor at the same iteration, i.e. T3 = T2, but only one particle can be captured by

the free receptor, the one that is processed first becomes stuck. The other parti-

cle then bounces back from the cell segment to continue the next move as a free

particle. So the order of processing these two particles determines which parti-

cle gets stuck.

In the sequential simulation, particles keep its order in the entire simulation on

each node. The particle that enters into the system first is always processed first. This

is not always true in the parallel simulation. In the parallel system, some particles mi-

grate to the neighbor nodes at the end of epoch. The particles that migrate to the

neighbor are treated as new particles on that neighbor node. By taking the priority of

79

first-come-first-processed, the migrated particles are processed after the particles that

are already on the neighbor node get processed. This could change the order of process-

ing the particles, because these migrated particles may come into the system earlier than

some of the particles on the neighbor node.

The sequential and parallel simulations can produce different results if the parti-

cles are simulated in different orders. So we need to preserve the order of processing

the particles in the entire simulation.

5.5.2 Particles migration

In the parallel simulation, the incoming particles migrate to its neighbor nodes at the

end of epoch. So at the beginning of the epoch, all particles start on the node they be-

long to. A particle migrates to its neighbor node as a new particle to that node. Based

on the priority of first-come-first-processed, the migrated particles are processed fol-

lowing the local particles.

Figure 5.13 illustrates a simple example of particles migration between node 1

and node 2. At epoch e, there are 4 particles A1, A2, A3, and A4 on node 1, labeled by

A1(e,1), A2(e,1), A3(e,1) and A4(e,1). There are 4 particles on node 2, labeled by B1(e,2),

B2(e,2), B3(e,2) and B4(e,2). e is the time of epoch, 1 is the node number.

80

Figure 5.13 Particles migration between node 1 and node 2

As we discussed in the earlier section, to simulate particles in different orders

between two simulations can produce different results. In the sequential simulation, the

order of processing the particles does not change in the entire simulation. Figure 5.13

(a) shows the particles migration in the sequential simulation. During the epoch e, node

1 particle A3(e,1) moves across to the node 2 area and becomes an incoming particle of

node 2. At the end of epoch, in the particle migration process, the particles are not

moved from its current node to the node it migrates to; instead the migration updates the

values of particles. Particle A3(e,1) updates to A3(e+1,2) to indicates that at epoch e+1,

particle A3 currently located in the node 2 area. So the migration process in the sequen-

tial simulation does not change the order of processing particles on the nodes.

81

Figure 5.13 (b) shows the particle migration in the parallel simulation. The par-

ticle A3(e,1) migrates to node 2 from node 1. It is shuttled over by the shuttle Messenger

to node 2, labeled by A3(e+1, 2). The order of processing the particles on node 1 changes

from A1, A2, A3, A4 to A1, A2, A4. The order of processing the particles on node 2

changes from B1, B2, B3, B4 to B1, B2, B3, B4, A3.

To keep the order of processing particles same in both sequential and parallel

simulations, we migrate the particles to its neighbors in sequential simulation in the

same way as we do in the parallel simulation. The sequential simulation runs on one

physical node. We create a set of logical nodes in the sequential simulation correspond-

ing to the set of physical nodes on which the parallel simulation runs. When an incom-

ing particle migrates to a neighbor node in the parallel simulation, it is shuttled over by

the shuttle Messenger. In the sequential simulation, the incoming particle is deleted

from the particle list of the logical local node, and then added to the particle list on the

logical neighbor node the particle migrates to.

5.5.3 Preserving the order of processing the particles

We pack and unpack the particles that are shuttled between nodes in such a way that it

keeps particles in order consistently. There are three types of new particles entering to a

node. They are (1) new particles first entering into the system; (2) migrated particles,

and (3) tentatively stuck particles that are released back into the system as free particles.

The new particles entering from the left boundary are always appended to the

82

end of the particle list on each node. The migrated particles and the just freed tenta-

tively stuck particles released by the conflict resolution are sorted by the particle index

on their original node and appended to the particle list on the current node. The proce-

dure of unpacking particles on a destination node is described as the following:

a. Load particles migrated from upper neighbor to the local node.

b. Load particles migrated from lower neighbor to the local node.

c. Load tentatively stuck particles to the local node.

d. Run confliction resolution to resolve tentatively stuck particles.

e. Sort the particles come from the upper neighbor based on the original parti-

cle index and append them to the particle list on the local node.

f. Sort the particles come from the lower neighbor based on the original part

index and append them to the particle list on the local node.

g. Sort the particles released from the tentatively stuck particles that were in-

coming particles in the upper shadow cells and append them to the particle

list on the local node.

h. Sort the particles released from the tentatively stuck particles that were in-

coming particles in the lower shadow cells and append them to the particle

list on the local node.

5.5.4 Temporary incoming particles

A temporary incoming particle is: a particle walks into a shadow cell and goes close to a

83

free receptor and then returns to its local node during epoch. Figure 5.14 illustrates the

path of a temporary incoming particle of the local node. The particle labeled by B, starts

on the neighbor node cell number 7, and walks cross the boundary to enter to node 1

shadow cell becoming an incoming particle of node 1, in shadow cell number 2. The

incoming particle is labeled by B’. The incoming particle moves in the shadow cell and

can go very close to a free receptor in shadow cell number 2, but did not get stuck. By

the end of epoch, it returns back to node 2 and ends with a new location. The returned

particle is labeled by B.

The temporary incoming particle could be processed in a different way between

the sequential and the parallel simulation. In the sequential simulation, this particle is

simply a free particle on node 2 with a new location. But in the parallel simulation, it is

not always the case; it may become a type II tentatively stuck particle during the time it

moves around in the shadow cell. If that happens, the tentatively stuck particle is shut-

tled to node 1 at the end of the epoch. After it gets released by the conflict resolution

process, it stays on node 1 as a free particle and processed by node 1 in the next epoch.

The particle then is processed on different nodes. From that point on, the order of proc-

essing this particle is different in two systems. As we have mentioned in the previous

section, the sequential and the parallel simulations could produce different results by

processing the particles in different orders.

84

Figure 5.14 A temporary incoming particle on the local node

To handle this mismatch, we define a rule. This rule says that during epoch, if a

particle travels cross to a shadow cell and at least one-time goes near to a free receptor

in a cell segment, within a predefined distance (for example, 5 micro steps) and at the

end of the epoch, this particle travels back to its original node, then this particle is con-

sidered as an incoming particle and needs to be migrated to the neighbor.

We apply this rule in both sequential and parallel simulations to ensure that the

particle is simulated in the same way. The temporary incoming particles are considered

same as other new migrated particles to the node. The different about this particle with

the other migrating particles is how it is identified as a migrating particle in the epoch.

The temporary incoming particle is identified as a migrating particle when it meets the

following two conditions: (1) it goes close enough to a free receptor in the shadow cell

but does not get captured during the epoch and (2) it ends up on the node it starts on at

the beginning of the epoch.

85

So there are total four types of new particles entering to the node at the end of

epoch. We described the unpacking, sorting and appending of the migrated particles in

this section to preserve the order of processing particles.

5.6 Confliction resolution algorithm

The conflict resolution is part of the synchronization process. It is executed at the end

of epoch. Figure 5.15 displays the pseudo code of conflict resolution. Line 18 – 27

process and confirm the stuck particles and set the particles to free if it is not a stuck

particle. Line 11 calls a function to calculate the just freed particle and moves it to the

next location until the end of epoch. If it gets stuck during this calculation, it returns the

state as a stuck particle to tagParts.state, and will be traded as a tentatively stuck parti-

cle again and continued to be processed.

1 processTentativeStuckParts() { 2 n: current iteration 3 el: epoch length 4 for( i < (n – el)) { 5 // use the data as the beginning of the epoch 6 moveCellReceptors(i); 7 for ( I < tagPart_len) { 8 if( tagParts[I] is set to a free particle in line 25) { 9 // recalculate the next movement as a free particle; 10 // returns 1 when becomes stuck 11 tagParts[I].state = re-calculate unstuck particles; 12 } 13 } 14 if(all tagParts have been processed, set at line 30) { 15 Continue; 16 } 17 if(i == tagParts[tagI].iteration) {

86

18 while (I == tagParts[tagI].iteration and tagI < tagPart_len) { 19 calculate the degradation; 20 calculate the capturing; 21 If(captured) { 22 update cell map; 23 } 24 else { 25 set the tagPart to a free particle and move one step; 26 } 27 tagI++; 28 } 29 if(tagI >= tagPart_len) { 30 all tagParts set to have been processed; 31 } 32 } 33 update the current node cell map; 34 }

Figure 5.15 Pseudo code of the conflict resolution

Algorithm with exchange less frequently

The algorithm of the parallel simulation with a variable level of granularity of data ex-

change delay is listed below, in Algorithm 5.1.

Algorithm 5.1 Parallel simulation procedures performed by task Messengers

Procedure MAIN Open output files Initiate Messenger shuttle events While not (end of simulation) do If at epoch then Wait for shuttle events LOAD_TAG_SHUTTLE PROCESS_TAG_PARTS For all tag parts Add unconfirmed tentative stuck particles to the migrating particle list Add the random number structure to node variable(s) End For

87

Sort node particles migrated from left/right neighbor by particle index Add migrating particles to the node particle list Reset shuttle variable(s) UPDATE_BIN_STUCK_PARTS Write particles to the open files End If MOVE_RECEPTORS DEGRADATE_STUCK_PARTS COMPUTE_PARTS Advance by 1step If at epoch then LOAD_MIGRATION_PARTS UPDATE_PARTS Inject Messengers shuttles Wait for the signals form shuttles hopped over from neighbors End If End While Close output files End MAIN Procedure LOAD_TAG_SHUTTLE Load tag parts of left shuttle to node variables Load tag parts of right shuttle to node variables Sort tag parts by tag particle index Sort random number structure by tag particle index End LOAD_TAG_SHUTTLE Procedure PROCESS_TAG_PARTS Refer to Pseudo code in Figure 4.20 End PROCESS_TAG_PARTS Procedure UPDATE_BIN_STUCK_PARTS For (all cell segments on local node) Update tagPart’s #of free/stuck receptors by local node value Increasing #of free receptors to neighbor cell segments by 1 End For End UPDATE_BIN_STUCK_PARTS Procedure MOVE_RECEPTORS

88

For (all cell segments on local node and neighbor nodes) If integer(50% of free or occupied receptors in the segment) >= 1 Move them to the neighbor segments Update the total number of free and occupied receptors End If End For End MOVE_RECEPTORS Procedure DEGRADATE_STUCK_PARTS For (all cell segments on local node and neighbor nodes) Calculate total #of degradating particles by accumulated in iteration If total #of degradating particles >=1 then Add it to tagPart as a degradation particle Update #of free and occupied receptors End If End For End DEGRADATE_STUCK_PARTS Procedure COMPUTE_PARTS Define: S = the particle state (0: free particle; 1: local stuck; 2: tagPart stuck), R = the particle release state (1: particle stuck, need to be released), For (all particles) If particle is a freed particle or out of right boundary of virtual space then Set particle state to be freed Go to next particle End If Copy random number sequence structure from node variable Generate three random numbers IS_PARTICLE_STUCK If S = 1 or S = 2 then R = 1 S = 0 Add the particle to tagpart Copy the random sequence structure to tagPart’s Increase #of tagPart by 1 Else Generate 2 random numbers MOVE_PART Update the particle location Copy the random sequence structure to node variable End If

89

End For Return the #of tagPart End COMPUTE_PARTS Procedure IS_PARTICLE_STUCK Define: D = the distance between particle and cell segment FS = the number of steps the particle takes to reaches the cell segment RS = the number of steps the particle takes to return to the cell segment Find the particle location area in the grid Calculate D, FS and RS If D > FS + RS If the particle ends in neighbor area then If the particle is within 5 steps to a cell segment then Particle is set to be an invaded particle End If Get #of free receptor from the neighbor cell segment End If Get #of free receptor from local cell segment Calculate stuck particle If it is a stuck particle then Update cell segment Add to tagPart End If End If End IS_PARTICLE_STUCK Procedure MOVE_PART Calculate the direction and distance the next location of the particle (e3.2) If the particle hits the cell segment, it bounces back to the grid area Update particle with next location End MOVE_PART Procedure LOAD_MIGRATION_PARTS For (all particles) If the particle is a crossing particle to the neighbor then Load crossing particles to the shuttles respectively Load corresponding random number sequence structure to the shuttles Set particle state to be released End If End For

90

UPDATE_PARTS End LOAD_MIGRATION_PARTS Procedure UPDATE_PARTS For (all particles) If the particle is to be released Recycle particle to be reused to store next new particle End If End For End UPDATE_PARTS

5.7 Correctness of distributed simulation

Most of distributed individual-based simulations are developed for the parallel execu-

tion from the beginning. There are many simulation models designed for sequential

simulation. It is not a trivial task to convert such sequential simulation model to a par-

allel structured simulation in order to obtain a better speedup performance. Some inter-

esting issues to a parallel simulation model are addressed by Bajaj [BBM99] in their

case study. They focus on the process of converting a sequential model to parallel im-

plementation by using PARSEC programming language that supports both sequential

and parallel simulation algorithms. They identify the areas need to be changed in the

initial sequential simulation model in order to make the model suitable for parallel

simulation. The approach works for the models that can make such changes without

losing the requirements of the original model. However, the study focused on the proc-

ess of converting other than comparing and validating the simulation results from two

simulation structures.

91

We have developed the protocols used in the distributed individual-based simu-

lation. These protocols provide a set of rules governing the computation and conflict

resolution to produce the exact same results as from the sequential simulation.

In the next chapter we discuss the performance in varying level of granularity of

the communication and a trade-off between simulation execution speedup and results

accuracy between the sequential and parallel simulations.

92

6. Chapter 6

Experimental Assessment

We have developed the protocols used in the distributed individual-based simulation.

These protocols provide a set of rules governing the computation and conflict resolution

to produce the exact same results as what the sequential simulation produces. The as-

sessment we present in this section covers the following aspects of (1) the performance

evaluation, (2) the capability and scalability of the distributed simulation system, and

(3) the system steady state.

6.1 Performance evaluation

To evaluate performance of a parallel system, the most commonly used metrics is

speedup, which captures the benefits of solving a given problem using a parallel system.

93

In general, speedup is defined as the ratio of time needed to solve the problem on a sin-

gle processor to the time used to simulate the same problem with multi-processors.

However, speedup in our simulation can not be evaluated as straightforward as stated in

such a definition. The granularity of communication between processors is a significant

factor to the speedup.

In this section we evaluate the simulation performance by running experiments

on the different size of problems. We then study the factors that affect the performance

of the simulation and explain the experiment results. The parameters used in the ex-

periments are listed in Table 6.1.

Cell size 10µm by 10µm Number of cell segments per cell 20 Cell segment size 2µm Cell alley (between cells) 1µm Number of receptors per cell seg-ment 20

Cell specification

Receptor radius 5nm Time step 1e-5 second Particle diffusion

parameters Distance step N(0, 0.044µm) Free receptors moving rate (between cell segments, per itera-tion)

0.5

Stuck receptors moving rate (between cell segments, per itera-tion)

0.5 Other parameters

Stuck particle degradation rate (per iteration, per cell segment) 5e-7

Table 6.1 Parameter set for experiments

94

6.1.1 Experiment 1

The simulated space in this experiment consists of 50 cells organized by 5 rows with 10

cells in each row. New particles enter into the system from the left boundary of the

simulated space at a rate of 0.001 particles per iteration per cell row. The experiment

runs for 1,000,000 iterations. So we simulated a total of 5000 particles.

Figure 6.1 Execution time of experiment 1 at different epoch lengths

In the sequential simulation, the experiment runs on one node. In the parallel

simulation, the simulated space is mapped to 5 nodes with each cell row to one node.

We apply the parallel simulation protocols we discussed in the chapter 5 into this ex-

periment. We run an experiment on each epoch length of 10, 100, 200, 500, 1000,

2000, 3000 and 5000 respectively. Figure 6.1 shows the execution times for the se-

quential and parallel simulations (labeled by 5 nodes) at each of the epoch length. The

95

execution time for the sequential simulation is 1303 seconds. In the parallel simula-

tion, the execution time varied based on the length of the epoch. The performance of

the parallel simulation is worse than the sequential simulation when the epoch length is

less than or equal to 10. The execution time then is reduced significantly when the

length of the epoch extends from 10, to 1000. After the length of 1000, the execution

time does not change much. This is because that when the epoch length is relatively too

long, the system synchronization and the conflict resolution process starts taking longer

time to finish.

Figure 6.2 Speedup of experiment 1 at different epoch lengths

Figure 6.2 show the speedup of the experiment. The speedup ( pS ) of this ex-

periment is calculated as the following:

96

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

=

processors p withn simulatioparallel the of time execution the is :Tn simulatiol sequentiathe of time execution the is :T

processors of number the is :p

TTS

p

pp

The parallel simulation on 5 nodes gains more speedup along the increasing of the ep-

och length. When the length of epoch stretches to 5000, the speedup is 2.62. The simu-

lation gets a good speedup in this experiment.

6.1.2 Experiment 2

The parallel simulation gains a speedup in the experiment 1, but the speedup is not a

linear or close to a linear speedup. In this experiment, we increase the size of problem

by increasing the number of particles to be simulated. The experiment has the same

simulated space of experiment 1, but has more particles. There are 0.004 particles per

iteration per cell row entered into system, 4 times more than in experiment 1. We run

the simulation for the same set of epoch lengths as in the experiment 1.

Figure 6.3 shows the execution times. The execution time of sequential simula-

tion is 9558 seconds. The parallel simulation (labeled by 5 nodes) gains speedup along

the increasing of the epoch length. This is consistent to experiment 1. However, the

experiment 2 gets more speedup than experiment 1, Figure 6.4 shows the speedup com-

parison of the two simulations. The most speedup of experiment 2 is 4.42, which is

close to a linear speedup. This result shows that the problem has to be big enough to

97

gain a better performance.


Figure 6.4 Speedup of experiment 1 and 2 on 5 nodes

Before the system reaches a steady state, when new particles are increased, the

number of free particles in the system is increased as well. This is because that the free

receptors are limited by a fix number, 20 per cell segment in this case. When free re-

98

ceptors become occupied by capturing particles, no more free particles can be captured

until the occupied receptors become free again by degradation of stuck particles. When

more free particles in the system, the simulation takes more time on calculating. The

sequential simulation takes about 7 times longer in experiment 2 than experiment 1.

However the parallel simulation handles this problem better. Because the calculation is

done on each node in the parallel system, the parallelization efficiency is increased.

Therefore the performance of parallel simulation is improved on simulating a bigger

problem.

6.1.3 Experiment 3

In this experiment, we increase the simulated space to simulate 100 cells organized by

10 rows with 10 cells in each row. We map one cell row to one node, so 10 nodes are

used. The workload is balanced by a row of cells. New particles enter into the system

at a rate of 0.001 per iteration per cell row. There are total of 10,000 particles simulated

in this experiment. As in the earlier experiments, we run an experiment at epoch length

of 10, 100, 200, 500, 1000, 2000, 3000, and 5000 respectively.

Figure 6.5 shows the execution time of the sequential simulation and the parallel

simulation (labeled by 10 nodes). The execution time of the sequential simulation is

2625 seconds. In the parallel simulation, at length of 10, the execution time is more

than the sequential simulation. The parallel simulation gains more speedup when the

length of epoch gets longer. This is consistent with the earlier experiments.

99


Figure 6.6 Speedup of experiment 3 at different epoch lengths

The Figure 6.6 shows the speedup of experiment 3. The most speedup is 3.5 at

the epoch length of 5000. Although the workload on each node is same as in the ex-

periment 1, this experiment gets a better performance comparing to the sequential simu-

100

lation. This is because when we simulate a big problem, with more cell rows involved,

the sequential simulation takes more time to process particles on these cell rows. How-

ever, the parallel structure can add more nodes to handle more cell rows. Because the

workload on each node is balanced by cell rows, the execution time does not increase

by adding more nodes into the system. The only extra time required is the overhead

time spent on the synchronization among the neighbor nodes in a larger network.

6.1.4 Experiment 4

Experiment 3 gives a good speedup when simulating more cells. The most speedup is

3.5. In this experiment, we further increase the problem size to see if we can get a good

speedup close to be linear. We use the same simulated space as in the experiment 3 and

the same set of 10 nodes to map the problem. The problem size is increased by simulat-

ing more particles – we increase particle entering rate to 0.004 particles per iteration per

cell row. There are 40,000 particles simulated in this experiment, 4 times more than it

in the experiment 3.

Figure 6.7 illustrates the execution time for the sequential simulation and the

parallel simulation on 10 nodes at different epoch length of 10, 100, 200, 500, 1000,

2000, 3000, and 5000. The execution time of the sequential simulation is 20712 sec-

onds. In parallel simulation, at the epoch length of 10, the exaction time is more than

the sequential simulation. The parallel simulation gains more speedup when the epoch

length gets longer. This is consistent with the experiment 3. Figure 6.8 shows the

101

comparison of speedups of experiment 3 and 4. The most speedup in experiment 4 is

8.47 at the epoch length of 5000. The experiment 4 obtains a better performance for

simulating more particles. This is also consistent with what the experiment 2 produces.


Figure 6.8 Speedup of experiment 3 and 4 on 10 nodes

102

6.1.5 Performance trade-offs

In these experiments we found that the simulation is scalable. When adding more nodes

or increasing the simulated particles in the system, the simulation gains a better per-

formance with a significant speedup, especially when the epoch length stretches from 0

to 1000. The system continues to gain a little more speedup with increasing of the

length of the epoch, but not much. In this section we compare the results produced by

the parallel simulation on each of the epoch length with the result produced by the se-

quential simulation and find the length of the epoch that gives the best simulation re-

sults in the sense of the speedup and consistency.

We found that the result generated from the parallel simulation with the epoch

length less or equal to 1000 is identical to the result produced by the sequential simula-

tion. When the length increases more than 1000 iterations, some discrepancy starts to

occur. However the simulation does not gain much more speedup by using the epoch

length longer than 1000, as we examined in the experiments. The performance trade-

offs evaluation makes sense for an application where it gains more speedup when the

length of epoch gets longer.

Iterations 500 1000 2000 3000 5000 Accuracy 100% 100% 99.7% 95.8% 93.9%

5 node speedup 1.86 2.24 2.45 2.56 2.6 10 node speedup 2.74 3.16 3.42 3.48 3.5

Table 6.2 Comparison of the speedup and accuracy at different epoch lengths

103

Table 6.2 shows that when the epoch length gets longer, the accuracy goes lower

and the speedup gets a little increased. 100% accuracy means that the results from the

sequential simulation is exactly the same as the results produced by the parallel simula-

tion, including 1) the number of free particles, 2) the number of stuck particles, and 3)

the location of each free particle in the system. If the accuracy is not 100%, it means

that there are a number of particles that do not match in the results produced by the se-

quential and parallel simulations, in one or more of these three counts.

The following behaviors of particles cause the discrepancy when the epoch

length goes too long: 1) the particle is not stuck when it should, and 2) the particle

moves across over the boundary of the shadow node, and fails to migrate to the correct

node. We can enforce new rules to correct these in the conflict resolution. For exam-

ple, we can overstate more free receptors to deal with issue 1, to ensure that all tenta-

tively stuck particles are captured during the epoch. However, the time spent on resolv-

ing these issues and the speedup the simulation can obtain by resolving these issues can

become too much overhead. So we conclude that if the simulation obtains a significant

speedup when the length of epoch goes longer, the new rules should be enforced to keep

the simulation in consistent results.

The system does not get much of the speedup and starts to lose it accuracy as the

length of epoch gets longer than 1000 iterations. This inaccuracy introduces uncertainty

into the system and can cause more discrepancy in the rest of simulation time. So the

epoch length of 1000 iterations is the recommended epoch length to benefit most from

the parallelization and keep the simulation results in consistency.

104

6.2 System capability and scalability

The distributed simulation implementation makes it possible to simulate this particle

diffusion model, which could potentially run in a very long period of time, in a reason-

able time frame. The distributed simulation system is scalable. It is extensible to deal

with a bigger or more complicated problem. As we described in the experiments, the

simulation gains more speedup when simulating a bigger problem of running on more

nodes or processing more particles.

The architecture of the distributed simulation allows different problem parame-

ters and new functionalities to be added as needed. This supports the flexibility of the

simulation implementation. The particle behaviors in the diffusion process can be ex-

amined by changing or adding new parameters. In chapter 7, we use this system to

simulate biological applications in case studies.

6.3 System steady state - stop criteria

The simulation model that we study in this work is a biological particle diffusion model,

which is a flow system - particles enter into the system constantly during the entire

simulation. A steady state of such a system is that the current observed behavior of the

system will continue in the future, even the particles still flow through the system.

However, from that point on, the system starts to produce repeating results; therefore,

there is no need to continue the simulation. This leads us to find out if our system has

105

such a steady state, and if it does, what it takes for the system to reach the steady state.

We did the following work to define the system steady state: 1) specify mean-

ingful criteria for determining the time at which the number of cells in each bin had

reached a steady state and 2) quantify the mean time to reaching a steady state for each

bin. Professor Dan Gillen in ICS department of UCI has come up this idea and is the

main contributor to this work.

6.3.1 Determining the point of steady state

When each bin reaches a steady state, then the system researches a steady state. For

each bin, consider a piecewise liner regression model of the form:

iiii )c,tmax()c,tmin(y εβββ +++= 210 (eq. 1)

where iy represents the number of cells collected at time it , c represents a single

change-point for the piecewise linear term, and iε is a random error term. In this

case 1β denotes the slope of the number of captured particles at times less than c and 2β

denotes the slope of the number of captured particles at times greater than c. To deter-

mine the point of equilibrium, the regression model given in (eg. 1) was fit assuming

the value of c for each observed catchment time. This procedure then produced slope β̂

corresponding to a change point at each observed catchment time. The point of equilib-

rium was determined as the first time point as which the upper limit of a 95% confi-

dence interval for 2β (the slope of the line for times after c) had ruled out values greater

106

than 0. Intuitively, this means that we are looking for the first change point where we

are confident that the first order trend in the number of captured particles is no longer

positive.

To apply this model for both free and stuck particles in each bin, we collected 10

datasets from 10 runs on 10 nodes. Each dataset 1) computes one cell row on one node,

2) generates a unique random number sequence, and 3) runs for 60 million iterations

(10 minutes of biological time). We collect data at every 500 iterations. The data shows

that the particles move around at the first 4 bins after certain time, and few of particles

appear in bin 5.

The procedure was performed for each of the datasets, 10 datasets total for the

first 4 bins, resulting in a separate time to reaching a steady state for each dataset.

6.3.2 Quantifying the mean time to reaching steady state

Now we present the mean time (and corresponding 95% confidence interval) to reach-

ing a steady state for both free and stuck particles in each bin. Table 6.3 shows the re-

sults for stuck particles and Table 6.4 presents the results for free particles.

Figure 6.9 shows the histogram of number stuck particles in 5 bins on one of the

simulation runs. The mean time that reaches the steady state in each bin varies from bin

1 to bin 4. In bin 1 and bin 2 when it reaches the steady state, about 100% receptors are

occupied. In bin 3, there are about 91% receptors full. In bin 4, less than half of the re-

ceptors are occupied. There are a low number of receptors in bin 5 that captures parti-

107

cles. After all 4 bins reach the steady state; the number of stuck particles keeps its

level. The system is in a steady state.

Bin Mean Lo. 95CI Hi. 95CI 1 4025500 3850274 4200726 2 4028000 3950069 4105931 3 6933000 6833945 7032055 4 14675500 14058454 15292546

Table 6.3 Mean time with confidence interval results for stuck particles

Bin Mean time Lo. 95CI Hi. 95CI 1 8128000 7581478 8674522 2 10148000 9380705 109152953 13130500 11465772 147952284 19563000 15970401 23155599

Table 6.4 Mean time with confidence interval results for free particles

Figure 6.10 shows the histogram charts of free particles in each bin over the 5

bins. As stuck particles, free particles reach the steady state in each bin at the different

time and keep the number of free particles in each bin at a different level. The farthest

bin, in this case, free particles travel through is bin 5. The number of free particles in

bin 5 is 1 at most of time after the system reaches the steady state. Because the number

of free particles takes longer time to reach the steady state than the stuck particles, we

claim the system reaches a steady state at the time when the number of free particles

reaches the steady state in each bin.

108

(a) Bin 1 (b) Bin 2

(c) Bin 3 (d) Bin 4

(e) Bin 5

Figure 6.9 Example of fitted piecewise linear models to one of the stuck particles data-sets

109

(a) Bin 1 (b) Bin 2

(c) Bin 3 (d) Bin 4

(e) Bin 5

Figure 6.10 Example of fitted piecewise linear models to one of free particles datasets

110

7. Chapter 7

Biology Results Obtained From the

Simulation

We have presented our distributed simulation system that improves the performance of

simulation applications. One of our goals is to make this system be useful in biological

application simulations. In this chapter we present data analysis to the simulation re-

sults and use the simulation system in biological case studies.

7.1 Analysis of simulation output

In this section we present a statistical analysis to the simulation results. The output of

the simulation we analyze includes 1) the number of stuck particles in each bin and 2)

111

the number of free particles in each bin during the simulation. What we want to find

out is whether the variations of the number of stuck and the number of free particles fit

a normal distribution. If it is not, what the distribution is and how close it is to a normal

distribution. The assumption is that if the variation distribution does not fit a normal

distribution, the problem cannot well be formed by a mathematical approach and a

computer simulation is necessary for better describing the problem.

In the following sections, we describe our approach to define and calculate the

variations for the number of stuck particles and the number of free particles. To sim-

plify the calculation, we only present the data in bin 1 from one simulation run. The

same calculation and data analysis can be done for different simulation runs with altered

parameter sets and for each of cell bin.

7.1.1 Variation of number of stuck particles

The dataset we measure is the histogram data of the number of stuck particles in bin 1.

The data was collected at every 500 iterations. We consider two measures of variabil-

ity: the range and the variation. We also run statistic test to test the variation to be nor-

mally distributed. The mean value we use in the variation calculation is the average

value calculated by 10 datasets that come from 10 simulation runs. Figure 7.1 (a)

shows the average value curve and a curve of the number of stuck particles from one of

simulation runs for bin 1. Figure 7.1 (b) shows a part of zoom-in curves in Figure 7.1

(a) for a better look for the two curves.

112

(a) Average curve and number of stuck particles from one of the simulation runs

(b) Zoom in of two curves from part of (a)

Figure 7.1 Average value and number of stuck particles from one simulation in bin 1

113

Figure 7.2 10 time ranges (T1 – T10) in bin 1 of stuck particles

The curve of the number of stuck particles is stabilized when it is close to the

steady state. Knowing how much variation there is from the beginning of the simula-

tion to the time of reaching the steady state can be very helpful. When we know the

variation, we can find out whether the variation to be normally distributed. If it is nor-

mally distributed, the application can be mathematically modeled and calculated, in-

stead of doing the simulation work.

We use range to cluster the data and calculate variation in each time range. The

total time range in this calculation is between the beginning of the simulation and the

time the simulation has reached the steady state. For example, for the number of stuck

particles in bin 1, the system is close to the steady state around iteration of 4,000,000.

We define a time range to 500,000 iterations. Figure 7.2 shows that we divide total

114

simulation time to 10 ranges (T1 to T10). In each time range, there is 1000 points of

data collected at every 500 iterations within the time range. We calculate variation in

each time range as follows:

Let ix represent the simulated data in a dataset, n be the size of the data set ( in

this case, n is 1000), ix be the average value on the fitting curve and i = 1,…n.

The absolute variation:

iii xxx −=∆

The relative variation:

x

xxx

i

iii

−=∂

7.1.2 Shapiro-Wilk W test for variation

The same method of calculating the number of stuck particles in section 7.2.1 is used to

calculate the number of free particles and its variation. To simplify the process, we

only show the datasets for bin 1.

There are total four sets of data we run by Shapiro-Wilk W Test to test the null

hypothesis that the absolute and relative variation came from a normal distributed popu-

lation. The size of time range for free particles is defined by 1,000,000 iterations, be-

cause it takes longer to reach the steady state. Figure 7.3 shows the 10 time ranges for

the number of free particles in total simulation time of 10,000,000 iterations.

115

Figure 7.3 10 time ranges (T1 – T10) in bin 1 of free particles

Number of stuck particles Number of free particles Time range Absolute

variation Relative variation

Absolute variation

Relative variation

T1 < .0001 0.000 < .0001 0.000 T2 < .0001 < .0001 < .0001 < .0001 T3 < .0001 < .0001 < .0001 < .0001 T4 < .0009 < .0013 < .0023 < .0004 T5 < .0001 < .0001 < .0001 < .0001 T6 < .0001 < .0001 < .0569 < .0582 T7 < .0001 < .0001 < .1626 < .2151 T8 < .0002 < .0003 < .0001 < .0001 T9 < .0009 < .0011 < .0001 < .0001

T10 < .0001 < .0001 < .0008 < .0001

Table 7.1 p-value produced by Shapiro-Wilk W Test for Bin 1 output

A small p-value yielded (< 0.05) from the test rejects the null hypothesis. Table

7.1 shows the p-values for each time range for the stuck and free particles for bin 1.

116

The results show that almost all the tests are rejected. The details of test results are

listed in the Appendices. The tests were run at the UCI Center for Statistical Consulting

Department of Statistics.

The same method of range variation calculation and statistical test can be ap-

plied to other bins for further analysis. The time range can also be varied in measuring

the output. By using a different time range, the distribution may vary as well.

Figure 7.4 CV for the number of stuck particles in 10 time ranges

Figure 7.5 CV for the number of free particles in 10 time ranges

117

Coefficient of variation (CV)

We can measure the spread of the variations in each time range by calculating

the coefficient of variation. Because most of the variations do not come from normal

distribution, we use the relative variation ix∂ and N datasets in CV calculation, i.e.

( )∑ ∂= 21ix

NCV . Here N is 1000 for the stuck particles, and 2000 for the free parti-

cles. Figure 7.4 shows the Coefficient variation for the number of stuck particles. Fig-

ure 7.5 presents the Coefficient variation for the number of free particles.

Simulated space 50 cells 5 rows by 10 columns

Cell size 10µm by 10µm Number of cell segments per cell 20 Cell segment size 2µm Cell alley (between cells) 1µm Number of receptors per cell seg-ment 20

Cell specification

Receptor radius 5nm Time step 1e-5 second Particle diffusion

parameters Distance step N(0, 0.044µm) Free receptors moving rate (between cell segments, per itera-tion)

0.5

Stuck receptors moving rate (between cell segments, per itera-tion)

0.5

Stuck particle degradation rate (per iteration, per cell segment) 5e-7

Other parameters

Stuck particle release rate (per iteration, per cell segment) 1.0e-7

Table 7.2 Case study parameters

118

7.2 More experiments

We extend the capability of the simulation model to program various scenarios that

would be interested in a biology study. In this section we present two cases we simu-

lated by using this system and illustrate the results.

7.2.1 Case study 1: releasing stuck particles

In this case study, we release the stuck particles back into the system as new particles.

The released particle has same characteristics of a new particle. However, it entries into

the system at the location it gets released, instead of entering from the left boundary of

the simulated space. For example, if a stuck particle is released, it becomes a new par-

ticle and the entry location is at the middle point of the cell segment it was stuck to (see

Figure 7.6). The parameters we used in this case study are listed in table 7.2.

Figure 7.6 Stuck particles released back to system

The simulation runs for 40 million iterations, which equals 6.67 biological min-

utes. By adding the releasing particles, we now have two sources of new particles in-

119

jecting into the system. We observed: 1) the free particles keep increasing crossing the

bins, 2) the particles move to as far as the 8th bin, and 3) the system does not reach a

steady state at the end of the simulation. From this observation, we can predict that

when the particles move crossing the right boundary of the simulation space and start to

disappearing from the system, the system will eventually reach a steady state, in which

the rate of new particles entering into system matches the rate of particles degrading

from the system. Figure 7.7 shows the particles diffusion in the systems at the end of

simulation. Each dot presents a location of a free particle. Figure 7.7 (a) shows the re-

sults we did earlier without particle released to the system. Figure 7.7 (b) shows the re-

sults with particles released back into the simulated space as new particles.

(a) Without releasing stuck particles (b) With releasing stuck particles

Figure 7.7 Particles diffusion in the simulated space (6.67 Bio-minutes)

To simplify the data presentation, Figure 7.8 illustrates the comparison of stuck

particles and Figure 7.9 shows the comparison of the free particles in 8 bins on one cell

row respectively. Curve 1 shows the number of particles without releasing particles,

120

and curve 2 shows the number of particles with the feature of releasing particles. By

adding the release feature, particles can travel as far as to bin 8, while in the simulation

where no stuck particles being released, particles do not travel crossing over bin 5 and

the system reaches a steady state faster.

121

Figure 7.8 Number of stuck particles at the end of simulation (6.67 Bio-minutes)

122

Figure 7.9 Number of free particles at the end of simulation (6.67 Bio-minutes)

123

7.2.2 Case study 2: stuck particle crossing through cell

In this simulation, we do not release stuck particles in a certain rate as we did in the

case study 1 while keep all other parameters same. We release a stuck particle on the

cell wall 0 (cw 0), at every 10,000 iterations, which equals 0.1 bio-seconds. The re-

leased the particle re-enters the system from the other side of the cell. Figure 7.10 illus-

trates an example of such crossing cell release. The release rate can be adjusted by

simply changing value of release parameter of the simulation. For example, the number

of releasing particles be can calculated by a certain percentage of stuck particles.

Figure 7.10 A stuck particle released crossing the cell

Figure 7.11 shows the particle diffusion results at the end of 20 million itera-

tions, which equals 3.33 bio-minutes. Figure 7.11 (a) shows free particle locations at

the end of simulation without releasing stuck particles, and Figure 7.11 (b) shows the

simulation with the feature of releasing stuck particles periodically. With the stuck

particles released in this way, some of the free particles are able to travel as far as to bin

9.

To simplify the data presentation, Figure 7.12 shows comparison of the number

124

of stuck particles in 10 bins. Figure 7.13 shows comparison of the number of free parti-

cles in 10 bins. Curve 1 illustrates the data in simulation without releasing particles and

curve 2 presents the data in this case study that the stuck particles are released periodi-

cally and move crossing the cells.

(a) Without releasing feature (b) Stuck particles released crossing cells

Figure 7.11 Particles diffusion at the end of 20 million iterations (3.33 Bio-minutes)

125

Figure 7.12 Number of stuck particles at the end of simulation (3.33 Bio-minutes)

126

Figure 7.13 Number of free particles at the end of simulation (3.33 Bio-minutes)

127

8. Chapter 8

Related Work

The research work on parallel and distributed simulation started in the 1970’s and has

been remained active since then. Distributed simulation technologies address issues

concerning the execution of simulation programs on a collection of computers which do

not share memory and are connected by a communication underlining network. Parallel

and distributed simulation systems can provide benefit to many applications including

individual-based applications [FBD98, MBD98].

The main goal of a distributed simulation system is to reduce execution time.

To archive this goal, issues in developing such a system have long been discussed and

studies, such as problem decomposition, distributed virtual environment, time manage-

ment, synchronization, parallel algorithms, and simulation correctness.

128

Problem partitioning is a defining characteristic of a distributed system. The

partition is a logical boundary between portions of the problem or information, or a

physical boundary between groups of machine nodes. The purpose of partitioning is to

assign responsibility for some aspect of problem to a specific processor, in order to

achieve a maximum efficiency of the parallelism. Obtaining a good load balancing and

efficient communication between the processors are main concerns in the partitioning.

Two well-known methods people often use to partition the problem of individual-based

model in the distributed system are Lagrangian method and the Eulerian method

[CHM+94, FBD98, Mer98]. In general, Lagrangian method assigns a fixed set of enti-

ties to a node in the distributed system. The Eulerian method divides the simulated

space and assigns a portion of the simulated space, together with the entities currently

located in that area to a node in the distributed system. We apply a kind of hybrid of

these two methods into our problem partitioning. New particles are grouped based on

their entering location. The simulated space is divided to Eulerian horizontal strips.

Each node in the distributed system is responsible for a horizontal strip or a number of

horizontal strips. The particles residing in the area of the partition can migrate to other

area periodically. Our partition provides an opportunity to obtain load balancing

through the entire simulation and a good mapping structure to support communication

efficiency between the nodes in the distributed system.

The research on distributed simulation system has continued for decades. The

techniques developed to resolve issues in distributed simulation system have matured

over the last few decades [NF92, Fuj99, Fuj01]. With the principal goal of reducing

129

execution time, the communication cost is a key problem that must be addressed. The

communication between nodes is managed by synchronization. The goal of synchroni-

zation mechanism is to ensure that each node processes events in timestamp order. The

interesting synchronization techniques are described as conservative synchronization

and optimistic synchronization.

In conservative synchronization, if each node can keep such timestamp order

precisely, execution of the simulation on a distributed simulation system will produce

exactly the same results as an execution on a sequential simulation system. In contract

to conservative synchronization approach, optimistic synchronization allows events to

occur concurrently. The concurrently processed events might create conflicts. How-

ever, the optimistic synchronization is able to detect and recover from them.

To implement synchronization mechanisms to a distributed simulation system

generally rely on application model specific information. What to avoid is the commu-

nication overhead in the synchronization. Some research papers [OHS91, Fer95,

LPL93, Fuj99,] contain basic ideas of synchronization techniques, alterative schemes

for different models and applications, and solutions to ultimately reduce the execution

time in a distributed simulation system.

The synchronization mechanism we used in our distributed simulation system is

an alterative optimistic synchronization. We introduce epoch as a time interval for syn-

chronization to occur. The length of epoch is defined by number of iterations that are

the time steps in the simulation. During epoch particles are processed on each node

130

may create conflicts between particles on different nodes. We developed a conflict

resolution to take snapshot of tentatively conflict events and resolved the conflicts by

rolling back to prior epoch and re-process particles that cause the conflicts. This syn-

chronization mechanism works efficiently. The distributed simulation system obtains a

good speedup, sufficient correctness, and scalability.

Distributed individual-based simulation systems can provide substantial benefit

to applications that are simulated by time steps and need a significant long simulation

time to get an interesting result. The computer simulation system may also useful in

simulating models that are mathematically theoretically formulated [BKP+05, BKP+06,

BPK+08]. The simulation can present the model in a virtual environment in a real bio-

clock time. The results from the simulation can be used to validate and enhance the

model. Our distributed individual-based simulation system provides such an environ-

ment for the biological particles diffusion model. The system has the ability to adapt to

different scenarios of case studies and is scalable to large problems.

131

9. Chapter 9

Conclusions

We describe a distributed individual-based simulation system to allow a large scale of

simulation while preserving the consistency results between the sequential simulation

and the parallel simulation. The research work starts with a biology application model

and focuses on developing a system in which a computer can be useful to help study

and understand applications. The contributions of this work are listed in next section.

9.1 Contributions

The main contribution of this dissertation is to develop an approach to perform compu-

tationally intensive individual-based simulation. We investigated issues and provided

the solutions in the following areas:

132

• Model simplification. We simplified a basic simulation model to use macro

time steps instead of micro time steps by replacing the random walk with Gaus-

sian distribution and formulate the basic micro simulation.

• Parallelization to improve performance.

• We studied the characteristics of the simulation model and choices of paral-

lelism. We investigated methods of problem partitioning and chose and im-

plemented the Eulerian-horizontal strip mapping for the parallel simulation

implementation.

• We defined a level of granularity of communication delay to reduce the

communication frequency and to speedup the simulation.

• We developed a set of rules and protocols and conflict resolution scheme to

ensure the parallelism and simulation results accuracy between the sequen-

tial simulation and the parallel simulation. The trade-off between speedup

and consistency is investigated and presented.

• Performance evaluation. We evaluated the system performance by evaluating

the simulation execution speedup and accuracy of the simulation results between

the sequential simulation and the parallel simulation. The parallel simulation

implementation allows large scale of simulation while preserving correctness.

• A tool for biology applications. Biology datasets are used in the case study.

The results of the simulation are presented from the biological point of view and

133

can be used in analysis and research of molecular diffusion in an intercellular

virtual space. The simulation system provides flexibility to allow the execution

of different biology datasets by varying the system parameters.

9.2 Future work

The work presented in this dissertation can be further continued in the following areas.

Improvement of the conflict resolution scheme

We use conflict resolution to solve the problem caused by the communication

delay to ensure consistency of the simulation results. One of the rules we use in the

conflict resolution is to overstate the number of receptors in shadow cells at the begin-

ning of epoch. An interesting problem is to develop a dynamic overstating mechanism

to further improve the performance and consistency between sequential simulation and

parallel simulation. For example, when there are many free receptors available, a small

number of overstating could be used. When stuck particles are increased in the system,

the number of overstating could be increased. This is because when there are more

stuck particles in the system, more stuck particles could get degraded from the system

to free receptors during epoch. To adjust the number of overstating can make this rule

play more accurately and efficiently.

The length of epoch affects the simulation performance and consistency. More

work can be done to develop a dynamic mechanism to vary the granularity of commu-

nication to further investigate trade-offs between performance and accuracy. For exam-

134

ple, the epoch length can be longer when conflicts occur less frequently, and can be ad-

justed working together with other rules applied to the system.

Usability evaluation

The simulation system we developed can be used as a tool for simulating bio-

logical applications. We have run the simulation in biology case-study to produce data

for biological analysis. We did this by adding new events of particles and changing the

parameter sets. However, more case-studies are needed to help evaluate the system us-

ability and extendibility.

135

10. Bibliography

[BBM99] L. Bajaj, R.Bagrodia, and R. Mayer. Case Study: Paralyzing a Sequential

Simulation Model. Computer Science Department, University of Califor-

nia, Los Angles, 1999.

[Ber93] H. C. Berg. Random Walks in biology. Expanded Edition. Princeton Uni-

versity Press, 1993.

[BFD96] L. F. Bic, M. Fukuda, and M. B. Dillencourt. Distributed Computing us-

ing Autonomous Objects. IEEE Computer, 29(8), 1996.

[BKP+05] T. Bollenbach, K. Kruse, P. Pantazis, M. Gonzales-Gaitan, and F. Julicher.

Robust formation of morphogen gradients. Biological Physics (phys-

ics.bio-ph), Physical Review Letters 94, 018103, 2005.

[BKP+06] T. Bollenbach, K. Kruse, P. Pantazis, M. Gonzales-Gaitan, and F. Julicher.

Morphogen Transport in Epithelia. Biological Physics (physics.bio-ph),

arXiv: q-bio/0609011v1[q-bio.OT], 2006

[BPK+08] T. Bollenbach, P. Pantazis, A. Kicheva, C. Bokel, M. Gonzalez-Gaitan and

F. Julicher. Precision of the Dpp Gradient. Development 135 (6), pages

1137-1146, 2008.

136

[CHM+94] T. W. Clart, R. v. Hanxleden, J. A. McCammon, and L. R. Scott. Parallel-

izing Molecular Dynamics using Spatial Decomposition. Proceedings of

the IEEE Scalable High-Performance Computing Conference, pages 95-

102, 1994.

[FBD98] M. Fukuda, L. F. Bic, and M. B. Dillencourt. Distributed Individual-

Based Simulation Using Autonomous Objects. Technical Report 97-46,

Department of Information and Computer Science, University of Califor-

nia, Irvine, 1998.

[FBD99] M. Fukuda, L. F. Bic, and M. B. Dillencourt. Messages versus messengers

in distributed programming. Journal of Parallel and Distributed Comput-

ing, 57:188-211, 1999.

[FCH+08] M. Fukuda, C. Wicke, H. Kuang, E. Gendelman, K. Noguchi, and M. K.

Lai. Messengers User’s manual, version 3.1.4. Department of Computer

science Donald Bren School of Information and Computer Sciences, Uni-

versity of California, Irvine, 2008.

[Fel66] W. Feller An Introduction to Probability Theory and Its Applications. Vol-

ume 1, Third Edition. John Wiley & Sons, Inc. New York, 1966.

[Fer95] A. Ferscha. Parallel and Distributed Simulation of Discrete Event Sys-

tems. Contributed to the: Handbook of Parallel and Distributed Comput-

ing. McGraw-Hill, 1995.

[Fis95] P. A. Fishwick. Simulation Model Design and Execution (Building Digital

World). Prentice Hall International Series in Industrial and Systems Engi-

neering, Prentice-Hall, Inc., New Jersey, 1995.

137

[FSW97] F. Fishwick, J. G. Sanderson, and W. F. Wolf. A Multimodeling Basis for

Across-Trophic-Level Ecosystem Modeling: The Florida Everglades Ex-

ample. Transactions of the Society for Computer Simulation International,

15 (2). pp. 76-89. ISSN 0740-6797/98

[Fuj99] R. M. Fujimoto. Parallel and Distributed Simulation. Proceedings of the

1999 Winter Simulation Conference, 1999, pages 122 – 131.

[Fuj99a] R. M. Fujimoto. Exploiting Temporal Uncertainty in Parallel and Dis-

tributed Simulations. Proceedings of the 13th Workshop on Parallel and

Distributed Simulation, pages 46-53, 1999.

[Fuj01] R. M. Fujimoto. Parallel and Distributed Simulation Systems. Proceed-

ings of the 2001 Winter Simulation Conference, 2001, pages 147 – 157.

[Fuk97] M. Fukuda. Messengers: A Distributed Computing System Based on

Autonomous Objects. PhD Dissertation, Department of Information and

Computer Science, University of California, Irvine, 1997.

[Gim02] H. R. Gimblett. Integrating Geographic Information Systems and Agent-

based Modeling Techniques for Simulating Social and Ecological Proc-

esses. A volume in the Santa Fe institute studies in the sciences of com-

plexity, Oxford University Press, 2002.

[HHM96] S. Hinckley, A. J. Hermann, B. A. Megrey, Development of a sparially ex-

plicit, individual-based model of marine fish early life history. Marine

Ecology Progress Series, Volume 139, pages 47-68, 1996.

[HM96] T. Hopkins and D. R. Morse. The implementation and visualization of a

large spatial individual-based model using Fortran 90. Technical Report

138

18-96*, University of Kent, Computing Laboratory, University of Kent,

Canterbury, UK, 1996.

[HNP97] J. Hamilton, D.A. Nash, and U. W. Pooch. Distributed Simulation. Com-

pater Science & Engineering, Volume 8, CRC Press, 1997.

[KBW99] J. U. Kreft, G. Booth, J.W.T. Wimpenny. Applications of individual-based

modeling in microbial ecology. In Proceedings of the 8th International

Symposium on Microbial Ecology, Atlantic Canada Society for Microbial

Ecology, Halifax, Canada, 1999.

[LPL93] Y.-B. Lin, B. R. Preiss, and W. M. Loucks. Selecting the Checkpoint In-

terval in Time Warp Parallel Simulation. In Proceedings of the 7th work-

shop on Parallel and Distributed Simulation, pages 3-10, 1993. IEEE

Computer Society.

[MBD98] F. Merchant, L. Bic, and M. B. Dillencourt. Load Balancing in Individual-

Based Spatial Applications. In proceedings of the International Confer-

ence on Parallel Architectures and Compilation Techniques (PACT’98),

pages 350-357, 1998.

[Mer98] F. Merchant. Load balancing in spatial individual-based system using

autonomous objects. PhD Dissertation, Department of Information and

Computer Science, University of California, Irvine, 1998.

[MSC94] W. Maniatty, B. Szymansk, and T. Garaco. Implementation and Perform-

ance of Parallel Ecological Simulations. In Proc. Conf. Applications in

Parallel and Distributed Computing, Caracas, Venezuela, April 1994. IFIP

Transaction A-44, North Holland, Amsterdam, 1994, pages 93-102.

139

[NF92] D. Nicol and R. Fujimoto. Parallel Simulation Today. NASA Contract

Nos. NASI-18605 and NASI-19480, In Annals of Operations Research,

Institute for Computer Applications in Science and Engineering NASA

Langley Research Center, 1992.

[OHS91] B. Overeinder, B. Hertzberger, and P. Sloot. Parallel Discrete Event

Simulation. In third workshop on design and realization of computer sys-

tems. http://www.science.uva.nl/research/scs/papers/byyear.html.

[PKC07] J. Plumert, J. Kearney, and J. Cremer. How Does Traffic Density Influence

Cyclists’Gap Choices? International Conference, Road Safety and Simu-

lation (RSS), Rome, Italy, 2007.

[RG05] B. Rashleigh and G. D. Grossman. An individual-based simulation for

mottled sculpin in a southern Appalachian stream. Ecological Modeling

187(2005) 247-258.

[Rob05] S. Robinson. Distributed simulation and simulation practice. Simulation

2005, Volume81, Number 1, 2005.

[TT96] Y. M. Teo and S. C. Tay. Performance analysis of parallel simulation on

distributed systems. Distrib. Syst. Engng 3, pages 20-31. The British

Computer Society, The institution of Electrical Engineers and IOP Pub-

lishing Ltd, 1996. Printed in UK.

[WH96] J. D. Westervelt and L. D. Hopkins. Facilitating mobile objects within the

context of spatial landscape processes. NCGIA, Third International Con-

ference/Workshop on Integrating GIS and Environmental Modeling, Santa

Fe, NM, 1996.

140

Appendix 1

Shapiro-Wilk W Test result for the number of stuck particles – absolute variation distribution in time range of T1 to T10

141

T1 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-5 -4 -3 -2 -1 0 1 2 3 4 5 6

Normal(0.2823,2.16659) Quantiles 100.0% maximum 5.80099.5% 5.40097.5% 4.40090.0% 3.30075.0% quartile 1.80050.0% median 0.30025.0% quartile -1.40010.0% -2.6002.5% -3.6980.5% -4.3000.0% minimum -5.000 Moments Mean 0.2823Std Dev 2.1665851Std Err Mean 0.0685134upper 95% Mean 0.4167468lower 95% Mean 0.1478532N 1000Sum Wgt 1000Sum 282.3Variance 4.6940908Skewness 0.0980475

142

Kurtosis -0.582746CV 767.47611N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ 0.2823 0.1478532 0.4167468 Dispersion σ 2.1665851 2.0756152 2.2659563 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.991890 <.0001

Note: Ho = The data is from the Normal distribution. Small p-values reject Ho.

143

T2 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-10 -8 -6 -5 -4 -3 -1 0 1 2 3 4 5

Normal(-1.3123,2.6049) Quantiles 100.0% maximum 4.60099.5% 4.49997.5% 3.00090.0% 2.09075.0% quartile 0.50050.0% median -1.00025.0% quartile -3.10010.0% -4.6902.5% -6.8000.5% -9.3000.0% minimum -9.600 Moments Mean -1.3123Std Dev 2.6049039Std Err Mean 0.0823743upper 95% Mean -1.150654lower 95% Mean -1.473946N 1000Sum Wgt 1000Sum -1312.3Variance 6.7855242Skewness -0.421029

144

Kurtosis 0.0921381CV -198.4991N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -1.3123 -1.473946 -1.150654 Dispersion σ 2.6049039 2.4955301 2.7243787 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.986497 <.0001


145

T3 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-5 -4 -3 -2 -1 0 1 2 3 4 5

Normal(0.2473,1.57448) Quantiles 100.0% maximum 4.60099.5% 4.50097.5% 3.69890.0% 2.40075.0% quartile 1.00050.0% median 0.20025.0% quartile -0.70010.0% -1.8002.5% -2.5980.5% -4.3000.0% minimum -4.500Moments Mean 0.2473Std Dev 1.5744814Std Err Mean 0.0497895upper 95% Mean 0.3450039lower 95% Mean 0.1495961N 1000Sum Wgt 1000Sum 247.3Variance 2.4789917Skewness 0.1276124Kurtosis 0.3538502CV 636.66858

146

N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ 0.2473 0.1495961 0.3450039 Dispersion σ 1.5744814 1.5083726 1.6466956 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.985826 <.0001


147

T4 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-6 -5 -4 -3 -2 -1 0 1 2

Normal(-1.6343,1.51794) Quantiles 100.0% maximum 2.10099.5% 1.90097.5% 1.20090.0% 0.30075.0% quartile -0.60050.0% median -1.50025.0% quartile -2.70010.0% -3.6002.5% -4.8000.5% -5.7000.0% minimum -5.900 Moments Mean -1.6343Std Dev 1.5179386Std Err Mean 0.0480014upper 95% Mean -1.540105lower 95% Mean -1.728495N 1000Sum Wgt 1000Sum -1634.3Variance 2.3041376Skewness -0.175283

148

Kurtosis -0.217664CV -92.88005N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -1.6343 -1.728495 -1.540105 Dispersion σ 1.5179386 1.4542039 1.5875594 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.994383 0.0009


149

T5 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-6 -5 -4 -3 -2 -1 0 1 2 3 4

Normal(-1.0976,1.90504) Quantiles 100.0% maximum 3.50099.5% 3.10097.5% 2.70090.0% 1.30075.0% quartile 0.40050.0% median -1.10025.0% quartile -2.60010.0% -3.5002.5% -4.5000.5% -5.5000.0% minimum -5.700 Moments Mean -1.0976Std Dev 1.9050416Std Err Mean 0.0602427upper 95% Mean -0.979383lower 95% Mean -1.215817N 1000Sum Wgt 1000Sum -1097.6Variance 3.6291834Skewness 0.048216

150


W Prob<W0.988991 <.0001


151

T6 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-5 -4 -3 -2 -1 0 1 2 3 4 5 6

Normal(0.1468,1.91214) Quantiles 100.0% maximum 5.60099.5% 5.30097.5% 4.30090.0% 2.30075.0% quartile 1.30050.0% median 0.20025.0% quartile -1.00010.0% -2.2002.5% -4.1000.5% -5.0000.0% minimum -5.000 Moments Mean 0.1468Std Dev 1.9121365Std Err Mean 0.0604671upper 95% Mean 0.265457lower 95% Mean 0.028143N 1000Sum Wgt 1000Sum 146.8Variance 3.656266Skewness -0.111582

152

Kurtosis 0.2930095CV 1302.5453N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ 0.1468 0.028143 0.265457 Dispersion σ 1.9121365 1.8318504 1.9998373 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.990576 <.0001


153

T7 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-5 -4 -3 -2 -1 0 1 2 3


154


W Prob<W0.988097 <.0001


155

T8 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-5 -4 -3 -2 -1 0 1 2 3 4

Normal(-0.7583,1.68346) Quantiles 100.0% maximum 3.70099.5% 3.20097.5% 2.30090.0% 1.40075.0% quartile 0.50050.0% median -0.80025.0% quartile -1.90010.0% -3.1002.5% -4.1000.5% -4.7000.0% minimum -5.100 Moments Mean -0.7583Std Dev 1.6834623Std Err Mean 0.0532358upper 95% Mean -0.653833lower 95% Mean -0.862767N 1000Sum Wgt 1000Sum -758.3Variance 2.8340452Skewness -0.095227Kurtosis -0.492681

156

CV -222.0048N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -0.7583 -0.862767 -0.653833 Dispersion σ 1.6834623 1.6127776 1.7606749 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.993359 0.0002


157

T9 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-4 -3 -2 -1 0 1 2 3


158


W Prob<W0.994430 0.0009


159

T10 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-5 -4 -3 -2 -1 0 1 2 3 4 5

Normal(0.6512,1.45349) Quantiles 100.0% maximum 4.70099.5% 4.10097.5% 3.30090.0% 2.50075.0% quartile 1.70050.0% median 0.60025.0% quartile -0.20010.0% -1.1002.5% -2.3980.5% -4.3990.0% minimum -4.500 Moments Mean 0.6512Std Dev 1.4534893Std Err Mean 0.0459634upper 95% Mean 0.7413958lower 95% Mean 0.5610042N 1000Sum Wgt 1000Sum 651.2Variance 2.1126312Skewness -0.246777

160


W Prob<W0.992108 <.0001


161

Appendix 2

Shapiro-Wilk W Test result for the number of stuck particles - relative variation distribution in time range of T1 to T10

162

T1 Distribution

-1 0 1

Normal(-0.0006,0.11194) Quantiles 100.0% maximum 1.00099.5% 0.61397.5% 0.21890.0% 0.02775.0% quartile 0.01350.0% median 0.0023325.0% quartile -0.01010.0% -0.0402.5% -0.2110.5% -0.4740.0% minimum -1.000 Moments Mean -0.000617Std Dev 0.111939Std Err Mean 0.0035451upper 95% Mean 0.0063396lower 95% Mean -0.007574N 997Sum Wgt 997Sum -0.615402Variance 0.0125303Skewness -0.190762Kurtosis 33.915057CV -18135.01N Missing 3 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -0.000617 -0.007574 0.0063396 Dispersion σ 0.111939 0.1072322 0.1170812 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.489990 0.0000

163


164

T2 Distribution

-0.03 -0.02 -0.01 0 0.01

Normal(-0.0041,0.00789) Quantiles 100.0% maximum 0.016799.5% 0.016397.5% 0.008490.0% 0.005775.0% quartile 0.001350.0% median -0.003025.0% quartile -0.009310.0% -0.01432.5% -0.02060.5% -0.02820.0% minimum -0.0290 Moments Mean -0.004055Std Dev 0.0078872Std Err Mean 0.0002494upper 95% Mean -0.003566lower 95% Mean -0.004545N 1000Sum Wgt 1000Sum -4.055108Variance 0.0000622Skewness -0.437854Kurtosis 0.1677688CV -194.5N Missing 0 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -0.004055 -0.004545 -0.003566 Dispersion σ 0.0078872 0.007556 0.0082489 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.984724 <.0001

165


166

T3 Distribution

-0.01 0 0.01

Normal(0.00065,0.00415) Quantiles 100.0% maximum 0.012199.5% 0.011997.5% 0.009790.0% 0.006375.0% quartile 0.002750.0% median 0.0005325.0% quartile -0.001810.0% -0.00482.5% -0.00690.5% -0.01130.0% minimum -0.0119 Moments Mean 0.0006477Std Dev 0.0041544Std Err Mean 0.0001314upper 95% Mean 0.0009055lower 95% Mean 0.0003899N 1000Sum Wgt 1000Sum 0.6476681Variance 1.7259e-5Skewness 0.1205743Kurtosis 0.3403582CV 641.4393N Missing 0 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ 0.0006477 0.0003899 0.0009055 Dispersion σ 0.0041544 0.00398 0.0043449 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.986340 <.0001

167


168

T4 Distribution

-0.01 0

Normal(-0.0042,0.00394) Quantiles 100.0% maximum 0.005599.5% 0.004997.5% 0.003190.0% 0.0007875.0% quartile -0.001650.0% median -0.003925.0% quartile -0.007110.0% -0.00932.5% -0.01240.5% -0.01470.0% minimum -0.0153 Moments Mean -0.004237Std Dev 0.0039352Std Err Mean 0.0001244upper 95% Mean -0.003993lower 95% Mean -0.004481N 1000Sum Wgt 1000Sum -4.237032Variance 1.5486e-5Skewness -0.171529Kurtosis -0.219183CV -92.8759N Missing 0 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -0.004237 -0.004481 -0.003993 Dispersion σ 0.0039352 0.00377 0.0041157 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.994673 0.0013

169


170

T5 Distribution

-0.01 0 0.01

Normal(-0.0028,0.00488) Quantiles 100.0% maximum 0.009099.5% 0.007997.5% 0.006990.0% 0.003375.0% quartile 0.001050.0% median -0.002825.0% quartile -0.006610.0% -0.00902.5% -0.01160.5% -0.01410.0% minimum -0.0146 Moments Mean -0.002815Std Dev 0.004876Std Err Mean 0.0001542upper 95% Mean -0.002512lower 95% Mean -0.003118N 1000Sum Wgt 1000Sum -2.815035Variance 2.3775e-5Skewness 0.0434639Kurtosis -0.668155CV -173.211N Missing 0 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -0.002815 -0.003118 -0.002512 Dispersion σ 0.004876 0.0046712 0.0050996 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.989184 <.0001

171


172

T6 Distribution

-0.01 0 0.01

Normal(0.00037,0.00485) Quantiles 100.0% maximum 0.014299.5% 0.013597.5% 0.010990.0% 0.005875.0% quartile 0.003350.0% median 0.0005125.0% quartile -0.002510.0% -0.00562.5% -0.01040.5% -0.01270.0% minimum -0.0127 Moments Mean 0.0003706Std Dev 0.0048535Std Err Mean 0.0001535upper 95% Mean 0.0006718lower 95% Mean 6.9387e-5N 1000Sum Wgt 1000Sum 0.3705715Variance 2.3557e-5Skewness -0.10646Kurtosis 0.2877157CV 1309.7445N Missing 0 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ 0.0003706 6.9387e-5 0.0006718 Dispersion σ 0.0048535 0.0046498 0.0050761 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.990664 <.0001

173


174

T7 Distribution

-0.01 0

Normal(-0.0019,0.00412) Quantiles 100.0% maximum 0.007399.5% 0.007197.5% 0.005090.0% 0.003875.0% quartile 0.001050.0% median -0.001825.0% quartile -0.005110.0% -0.00662.5% -0.01090.5% -0.01170.0% minimum -0.0137 Moments Mean -0.001915Std Dev 0.0041215Std Err Mean 0.0001303upper 95% Mean -0.001659lower 95% Mean -0.00217N 1000Sum Wgt 1000Sum -1.914698Variance 1.6987e-5Skewness -0.195183Kurtosis -0.492687CV -215.2561N Missing 0 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -0.001915 -0.00217 -0.001659 Dispersion σ 0.0041215 0.0039485 0.0043105 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.988012 <.0001

175


176

T8 Distribution

-0.01 0 0.01


W Prob<W0.993555 0.0003

177


178

T9 Distribution

-0.01 0


W Prob<W0.994561 0.0011

179


180

T10 Distribution

-0.01 0 0.01

Normal(0.00165,0.00368) Quantiles 100.0% maximum 0.012099.5% 0.010497.5% 0.008490.0% 0.006375.0% quartile 0.004350.0% median 0.001525.0% quartile -0.000510.0% -0.00282.5% -0.00600.5% -0.01120.0% minimum -0.0114 Moments Mean 0.0016488Std Dev 0.0036813Std Err Mean 0.0001164upper 95% Mean 0.0018772lower 95% Mean 0.0014203N 1000Sum Wgt 1000Sum 1.6487637Variance 1.3552e-5Skewness -0.243157Kurtosis 0.3815874CV 223.27637N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ 0.0016488 0.0014203 0.0018772 Dispersion σ 0.0036813 0.0035267 0.0038501 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.992216 <.0001


181

Appendix 3

Shapiro-Wilk W Test result for the number of free particles – absolute variation distribution in time range of T1 to T10

182

T1 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8


183


W Prob<W0.992199 <.0001


184

T2 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-10 0 10

Normal(-0.1145,4.15515) Quantiles 100.0% maximum 13.5099.5% 10.4097.5% 8.2090.0% 4.7975.0% quartile 2.3050.0% median 0.2025.0% quartile -2.4010.0% -5.702.5% -9.200.5% -10.900.0% minimum -14.90 Moments Mean -0.1145Std Dev 4.1551549Std Err Mean 0.0929121upper 95% Mean 0.0677147lower 95% Mean -0.296715N 2000Sum Wgt 2000Sum -229Variance 17.265312Skewness -0.179739Kurtosis 0.4234024

185

CV -3628.956N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -0.1145 -0.296715 0.0677147 Dispersion σ 4.1551549 4.0302634 4.2880924 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.988270 <.0001


186

T3

Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-10 0 10 20

Normal(1.2164,6.66349) Quantiles 100.0% maximum 23.0099.5% 19.4097.5% 15.8090.0% 11.1075.0% quartile 5.5050.0% median 0.2025.0% quartile -3.6010.0% -6.792.5% -9.800.5% -12.100.0% minimum -14.70Moments Mean 1.2164Std Dev 6.6634902Std Err Mean 0.1490002upper 95% Mean 1.5086119lower 95% Mean 0.9241881N 2000Sum Wgt 2000Sum 2432.8Variance 44.402102Skewness 0.4871243

187


W Prob<W0.978342 <.0001


188

T4

Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-20 -10 0 10 20

Normal(-1.095,6.68336) Quantiles 100.0% maximum 20.8099.5% 16.0097.5% 11.5090.0% 7.8075.0% quartile 3.8050.0% median -1.0025.0% quartile -5.9010.0% -9.702.5% -14.100.5% -16.900.0% minimum -22.00Moments Mean -1.095Std Dev 6.6833568Std Err Mean 0.1494444upper 95% Mean -0.801917lower 95% Mean -1.388083N 2000Sum Wgt 2000Sum -2190Variance 44.667259Skewness 0.0382623

189


W Prob<W0.997436 0.0023


190

T5

Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-40 -30 -20 -10 0 10

Normal(-14.28,8.94581) Quantiles 100.0% maximum 16.1099.5% 9.0097.5% 4.4090.0% -1.4175.0% quartile -8.2050.0% median -15.4025.0% quartile -20.9010.0% -25.002.5% -29.200.5% -32.700.0% minimum -38.40Moments Mean -14.2802Std Dev 8.9458058Std Err Mean 0.2000343upper 95% Mean -13.8879lower 95% Mean -14.6725N 2000Sum Wgt 2000Sum -28560.4Variance 80.027442Skewness 0.3991832

191


W Prob<W0.984885 <.0001


192

T6 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-20 -10 0 10 20

Normal(0.74645,7.88901) Quantiles 100.0% maximum 24.1099.5% 20.3097.5% 16.3090.0% 11.2075.0% quartile 6.0050.0% median 0.4025.0% quartile -4.7010.0% -8.902.5% -14.200.5% -19.600.0% minimum -25.30Moments Mean 0.74645Std Dev 7.8890104Std Err Mean 0.1764036upper 95% Mean 1.0924042lower 95% Mean 0.4004958N 2000Sum Wgt 2000Sum 1492.9Variance 62.236486Skewness 0.0476895Kurtosis -0.11743

193

CV 1056.8706N Missing 0 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ 0.74645 0.4004958 1.0924042 Dispersion σ 7.8890104 7.6518904 8.1414067 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.998430 0.0569


194

T7 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-30 -20 -10 0 10 20

Normal(-3.4707,7.57891) Quantiles 100.0% maximum 20.9099.5% 16.3097.5% 11.7090.0% 5.8075.0% quartile 1.6050.0% median -3.4025.0% quartile -8.6010.0% -12.802.5% -18.100.5% -25.400.0% minimum -31.40Moments Mean -3.47075Std Dev 7.5789102Std Err Mean 0.1694696upper 95% Mean -3.138394lower 95% Mean -3.803106N 2000Sum Wgt 2000Sum -6941.5Variance 57.439879Skewness -0.046462Kurtosis 0.1270792

195


W Prob<W0.998759 0.1626


196

T8 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-20 -10 0 10 20 30 40

Normal(5.65015,9.06994) Quantiles 100.0% maximum 38.1099.5% 27.8097.5% 23.6090.0% 17.9075.0% quartile 12.1850.0% median 5.2025.0% quartile -0.9010.0% -5.602.5% -10.900.5% -17.300.0% minimum -26.50Moments Mean 5.65015Std Dev 9.0699373Std Err Mean 0.20281upper 95% Mean 6.047891lower 95% Mean 5.252409N 2000Sum Wgt 2000Sum 11300.3Variance 82.263762Skewness 0.1123349Kurtosis -0.294735

197

CV 160.5256N Missing 0 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ 5.65015 5.252409 6.047891 Dispersion σ 9.0699373 8.7973221 9.3601153 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.996043 <.0001


198

T9 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-30 -20 -10 0 10 20

Normal(-4.4903,8.03024) Quantiles 100.0% maximum 22.1099.5% 16.4097.5% 11.7090.0% 6.3075.0% quartile 1.1850.0% median -4.9025.0% quartile -10.6010.0% -14.502.5% -18.900.5% -22.200.0% minimum -29.10Moments Mean -4.4903Std Dev 8.0302377Std Err Mean 0.1795616upper 95% Mean -4.138153lower 95% Mean -4.842447N 2000Sum Wgt 2000Sum -8980.6Variance 64.484718Skewness 0.209037Kurtosis -0.32125

199

CV -178.8352N Missing 0 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -4.4903 -4.842447 -4.138153 Dispersion σ 8.0302377 7.7888729 8.2871523 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.994541 <.0001


200

T10 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-30 -20 -10 0 10

Normal(-8.6058,7.90431) Quantiles 100.0% maximum 15.5099.5% 10.7097.5% 6.6090.0% 1.4075.0% quartile -3.3050.0% median -8.3025.0% quartile -13.6010.0% -18.702.5% -25.300.5% -31.200.0% minimum -36.00Moments Mean -8.60585Std Dev 7.9043115Std Err Mean 0.1767458upper 95% Mean -8.259225lower 95% Mean -8.952475N 2000Sum Wgt 2000Sum -17211.7Variance 62.47814Skewness -0.193983Kurtosis 0.1786541

201

CV -91.84812N Missing 0 Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -8.60585 -8.952475 -8.259225 Dispersion σ 7.9043115 7.6667316 8.1571972 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.997073 0.0008


202

Appendix 4

Shapiro-Wilk W Test result for the number of free particles – relative variation distribution in time range of T1 to T10

203

T1 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-0.3 -0.2 -0.1 0 0.1 0.2 0.3 0.4

Normal(0.01326,0.09714) Quantiles 100.0% maximum 0.413099.5% 0.340297.5% 0.228190.0% 0.138175.0% quartile 0.063850.0% median 0.003725.0% quartile -0.033410.0% -0.09972.5% -0.18520.5% -0.28570.0% minimum -0.3590Moments Mean 0.0132613Std Dev 0.0971411Std Err Mean 0.0021721upper 95% Mean 0.0175212lower 95% Mean 0.0090014N 2000Sum Wgt 2000Sum 26.522633Variance 0.0094364Skewness 0.2452791Kurtosis 1.628284

204

CV 732.51496N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ 0.0132613 0.0090014 0.0175212 Dispersion σ 0.0971411 0.0942214 0.100249 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.969956 0.0000


205

T2 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

0

Normal(0.00026,0.02463) Quantiles 100.0% maximum 0.086199.5% 0.066497.5% 0.053390.0% 0.029975.0% quartile 0.014350.0% median 0.001125.0% quartile -0.014210.0% -0.03142.5% -0.05080.5% -0.06060.0% minimum -0.0828 Moments Mean 0.0002624Std Dev 0.0246308Std Err Mean 0.0005508upper 95% Mean 0.0013426lower 95% Mean -0.000818N 2000Sum Wgt 2000Sum 0.5248922Variance 0.0006067Skewness 0.0587376

206

Kurtosis 0.4722138CV 9385.1007N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ 0.0002624 -0.000818 0.0013426 Dispersion σ 0.0246308 0.0238905 0.0254189 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.990876 <.0001


207

T3 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

0 0.1


208


W Prob<W0.990514 <.0001


209

T4 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-0.07 -0.04 -0.01 0.01 0.03 0.05 0.07


210


W Prob<W0.996842 0.0004


211

T5 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-0.1 0

Normal(-0.0437,0.02746) Quantiles 100.0% maximum 0.050299.5% 0.027797.5% 0.013890.0% -0.004475.0% quartile -0.025050.0% median -0.047025.0% quartile -0.063910.0% -0.07652.5% -0.09010.5% -0.10130.0% minimum -0.1218 Moments Mean -0.04371Std Dev 0.0274649Std Err Mean 0.0006141upper 95% Mean -0.042506lower 95% Mean -0.044915N 2000Sum Wgt 2000Sum -87.4202Variance 0.0007543Skewness 0.384414

212


W Prob<W0.986192 <.0001


213

T6 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-0.07 -0.05 -0.02 0 0.02 0.04 0.06


214


W Prob<W0.998437 0.0582


215

T7 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-0.09 -0.06 -0.03 0 0.02 0.04 0.06


216

Kurtosis 0.143328CV -219.9601N Missing 0Fitted Normal Parameter Estimates Type Parameter Estimate Lower 95% Upper 95% Location µ -0.009536 -0.010455 -0.008616 Dispersion σ 0.0209743 0.0203439 0.0216454 Goodness-of-Fit Test Shapiro-Wilk W Test

W Prob<W0.998850 0.2151


217

T8 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

0 0.1


218


W Prob<W0.995788 <.0001


219

T9 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-0.08 -0.05 -0.03 -0.01 0.01 0.03 0.05


220


W Prob<W0.995178 <.0001


221

T10 Distribution

.001

.01

.05

.10

.25

.50

.75

.90

.95

.99

.999

-4

-3

-2

-1

0

1

2

3

4

Nor

mal

Qua

ntile

Plo

t

-0.09 -0.07 -0.05 -0.03 -0.01 0.01 0.03

Normal(-0.0219,0.02014) Quantiles 100.0% maximum 0.038799.5% 0.026797.5% 0.016490.0% 0.003575.0% quartile -0.008450.0% median -0.021225.0% quartile -0.034410.0% -0.04762.5% -0.06490.5% -0.08040.0% minimum -0.0923Moments Mean -0.02188Std Dev 0.0201419Std Err Mean 0.0004504upper 95% Mean -0.020996lower 95% Mean -0.022763N 2000Sum Wgt 2000Sum -43.75908Variance 0.0004057Skewness -0.236473Kurtosis 0.2169387

222


W Prob<W0.996051 <.0001


Distributed Individual-Based Simulationbic/papers/JimingLiu-Thesis.pdfUNIVERSITY OF CALIFORNIA,...

Documents

Transcript of Distributed Individual-Based Simulationbic/papers/JimingLiu-Thesis.pdfUNIVERSITY OF CALIFORNIA,...