
Mathematical Modeling and Simulation

Nguyen V.M. Man, Ph.D., Applied Statistician

September 6, 2010
Contact: [email protected] or [email protected]



Contents

0.1 Mathematical modeling and simulation Why?
0.2 Mathematical modeling and simulation How?
0.3 Cautions
0.4 Typical applications
0.5 Computing Software

1 Dynamic Systems
1.1 Introduction
1.2 Discrete Dynamic Systems - a case study
1.3 Continuous Dynamic Systems

2 Stochastic techniques
2.1 Generating functions
2.2 Convolutions
2.3 Compound distributions
2.4 Introductory Stochastic Processes
2.5 Markov Chains (MC), a key tool in modeling random phenomena
2.6 Classification of States
2.7 Limiting probabilities and Stationary distribution of a MC
2.8 Exercises

3 Simulation
3.1 Introductory Simulation
3.2 Generation of random numbers
3.3 Transformation of random numbers into input data
3.4 Measurement of output data
3.5 Analyzing output - Making meaningful inferences
3.6 Simulation languages
3.7 Research 1: Simulation of Queueing systems with multiclass customers

4 Probabilistic Modeling
4.1 Markovian Models
4.1.1 Exponential distribution
4.1.2 Poisson process
4.2 Bayesian Modeling in Probabilistic Nets

5 Statistical Modeling in Quality Engineering
5.1 Introduction to Statistical Modeling (SM)
5.2 DOE in Statistical Quality Control
5.3 How to measure factor interactions?
5.4 What should we do to bring experiments into daily life?

6 New directions and Conclusion
6.1 Black-Scholes model in Finance
6.2 Drug Resistance and Design of Anti-HIV drug
6.3 Epidemic Modeling
6.4 Conclusion

7 Appendices
7.1 Appendix A: Theory of stochastic matrix for MC
7.2 Appendix B: Spectral Theorem for Diagonalizable Matrices

Keywords: linear algebra, computational algebra, graph, random processes, simulation, combinatorics, statistics, Markov chains, discrete time processes



Introduction

We propose a few specific mathematical modeling techniques used in various applications such as Statistical Simulations of Service systems, Reliability engineering, Finance engineering, Biomathematics, Pharmaceutical Science, and Environmental Science. These are aimed at graduates in Applied Mathematics, Computer Science and Applied Statistics at HCM City.

The aims of the course

This lecture integrates mathematical and computing techniques into the modeling and simulation of industrial and biological processes.

The structure of the course. The course consists of three parts:

Part I: Introductory specific topics

Part II: Methods and Tools

Part III: Connections and research projects
———————————————————————————

Working method.
Each group of 2 graduates is expected to carry out a small independent research project (max 25 pages, font size 11, 1.5 line spacing, Times New Roman) on the chosen topic and submit their report at the end of the course [week 15].

Examination. The grading will be based on performance in:
* hand-ins of homework assignments (weight 20% of the grade),
* a written report of group work on a small project topic (20%) and three oral presentations about the project (20%),
* a final exam (40%) covering the basic mathematical and statistical methods that have been introduced.

Literature. Many; these will be given during the lectures.


Prerequisites. The participants will benefit from a solid knowledge of advanced calculus and discrete mathematics, basic knowledge of symbolic computing and of ordinary and partial differential equations, and programming experience with Matlab, Scilab, R, Maple (or an equivalent language).

Part I: Introductory specific topics – case studies
———————————————————————————

This course aims at teaching the principles and practice of mathematical and statistical modeling. An important part of the course is to recognize the essential mechanisms governing a phenomenon. These mechanisms have to be translated into mathematics and included in the model. This activity requires both a good understanding of the system under consideration and good mathematical skills. Although mathematical modelling may make use of all fields of mathematics, this course will concentrate on applications in Pharmaceutics, Finance and Industry, and focus mostly on discrete models. Differential equations could be involved, but at a moderate level.

Apart from self-study and class teaching, an important part of the course concerns working in a small group on a specific project. The topics for project work may differ from year to year.

Organization. The first part of the course consists of a few lectures where the main methods are presented. The second part of the course is offered as “intensive-course” weeks, fully devoted to assigned projects. Here the graduates work in groups of 2 students. In most cases a computer program for simulating, investigating or computing a certain physical phenomenon has to be developed. Furthermore, there are six weeks of presentations where all the groups present and discuss their projects. Finally, the written report of your project has to be handed in.

Time distribution will be:

15 weeks = 1 intro + 2 methods + 1 problem solving + 1 presentation A (introducing software) + 2 methods + 1 problem solving + 3 weeks of presentation B (setting up model) + 3 weeks of presentation C (using model) + 1 review.

The main topics are:

MM and Simulation: mathematical models for simulation, why?

MM and Simulation: mathematical models for simulation, how?

An Economic System: modeling inventory models

A Pharmaceutical Phenomenon: a mathematical view of compartment models


Part II: Methods of MM and Simulation

———————————————————————————
We will discuss the following:

Introductory Simulation

Dynamic Systems

Probabilistic and stochastic techniques

Part III: New applications of MMS———————————————————————————

We investigate a few fascinating applications:

Probabilistic Modeling

Statistical Modeling

Pharmaceutical Modeling

Financial Modeling

———————————————————————————

Proposed project list

1. Multi-compartment Models in Pharmaceutical Science [ref. chapter I,II of R. Bellman]

2. Pharmacokinetic Properties of Compartment Models: the case of one drug [ref. chapter I, IV of R. Bellman]

3. The use of Control Theory in Optimal Dosage Determination [ref.chapter I, VII of R. Bellman]

4. The use of Decision Theory and Dynamic programming in OptimalDosage Determination [ref. chapter I, IX, X of R. Bellman]

5. Application of Large Deviation theory in insurance industry [ref. F.Esscher, Notices of AMS, Feb 2008.]

Problem: if too many claims are made against the insurance company, the total claim amount may exceed the reserve fund set aside for paying these claims.


Part I: Motivation of MMS.

0.1 Mathematical modeling and simulation Why?

• Increasing the understanding of the system,

• Predicting the future system behavior,

• Carrying out technical and quantitative computations for control design, from which optimization can be done,

• Studying human-machine cooperation.

0.2 Mathematical modeling and simulation How?

The modeling process itself is (or should be) most often an iterative process: one can distinguish in it a number of rather separate steps which usually must be repeated. One begins with the real system under investigation and pursues the following sequence of steps:

(i) empirical observations, experiments, and data collection;

(ii) formalization of properties, relationships and mechanisms which result in a biological or physical model (e.g., mechanisms, biochemical reactions, etc., in a metabolic pathway model; stress-strain and pressure-force relationships in mechanics; functional relationships between cost and reliabilities of distinct components in a software development project);

(iii) abstraction or mathematization resulting in a mathematical model (e.g., algebraic and/or differential equations with constraints and initial and/or boundary conditions);

(iv) model analysis (which can consist of simulation studies, analytical and qualitative analysis including stability analysis, and use of mathematical techniques such as perturbation studies);

(v) interpretation and comparison (with the real system) of the conclusions, predictions and conjectures obtained from step (iv);

(vi) changes in understanding of mechanisms, etc., in the real system.


0.3 Cautions

Common difficulties/limitations often encountered in the modeling of systems:
(a) availability and accuracy of data;
(b) analysis of the mathematical model;
(c) use of local representations that are invalid for the overall system;
(d) assumptions that the ‘model’ is the real system;
(e) obsession with the solution stage;
(f) communication in interdisciplinary efforts.

0.4 Typical applications

Finance Trend Analysis with Stochastic Calculus (two projects)
Financial Economics: Models in Finance Engineering (two projects)
Pharmaceutical Phenomena from a mathematical view (three projects)

0.5 Computing Software

Working groups introduce software: Scilab, Matlab, Maple, Mathematica, GAP, Singular, R, OpenModelica, and so on.
——————————————————————————————–

Part II: Techniques and Algorithms.


Chapter 1

Dynamic Systems

We outline the modeling process of dynamic systems and introduce major tools of the trade.

1.1 Introduction

Our viewpoints are:

• the connection between models and data – namely the connection between dynamic modeling and statistical modeling – is the first concern, and

• modern computer-based statistical methods must be applied intensively to dynamic models.

Connecting models with data is almost always the eventual goal. So taking the time to learn statistical theory and discrete methods will make you a better modeler and a more effective collaborator with experimentalists.

1.2 Discrete Dynamic Systems- a case study

Consider a simple discrete dynamic system S that depends on four binary variables f, g, c, w. [S changes its states when f, g, c, w change their values.] Suppose that

• only f is the factor that can change states of S, i.e. state u changes to another state v if and only if the equality u_f + v_f = 1 holds,

• a state u = (u_f, u_g, u_c, u_w) changes to another state v = (v_f, v_g, v_c, v_w) iff at most two of their coordinates are different,


• the system evolves from the initial state s_I = (0000) (source) to a final state s_F = (1111) (sink).

The aim: given some constraints between the four binary variables, we want to choose a shortest path from the source s_I = (0000) to the sink s_F = (1111).

First Tool: Invariants
An invariant is a property of a process or system that does not change while the process evolves. Similar terms for invariant are law and pattern. In mathematical modeling and algorithmic problem solving, very often
1/ you want to model a system as concisely as possible (of course after dropping unimportant aspects to make the model tractable);
2/ then your goal is to find some solution or conclusion by making logical reasoning on the model, using mathematical techniques or by simulating it.
The crucial structures/objects/properties that you wish to find and employ during both phases above are the fixed patterns or invariants of the whole process' evolution. Reason: these fixed rules can help you to keep the model within a tractable range and, moreover, to restrict the search domain of solutions.

Example 1. Cutting a chocolate bar of rectangular shape by horizontal or vertical cuts into unit-size squares provides an invariant between the number of cuts c and the number of pieces p:

p − c = 1

no matter how large the bar is!

How to find invariants? But how do we figure out interesting invariants? This activity depends very much on the type and complexity of your system or process. You have to detect and employ any meaningful relationship between the key components/factors/variables that make the process run or work. The more varied the models and application domains you face, and the more mathematical methods you have in your hands, the better the invariants you can find and exploit.

Definition 1 (System invariants). Invariants of the system S are specific constraints or properties that do not change when the system factors (variables) change their values [ensuring S's stability, existence, ...] during the evolution of the concerned system.

Fact 1.
a) Try your best to represent invariants mathematically, not in words!
b) When you think that some rules could be invariants, you have to prove your thoughts by logical reasoning.


Example 2. A farmer wishes to ferry a goat, a cabbage and a wolf across a river with a small boat which can accommodate at most one of his belongings at a time. Furthermore, he will not let the goat be alone (i.e. without him) with the cabbage, for the clear reason that the predator will eat the prey in that case. What would be possible invariants of the process of transporting between the river banks?

You cannot formulate any invariant without introducing some major variables. Here four variables f, g, c, w would be reasonably good for describing the moving process, where f, g, c, w all receive values from the binary set {L(eft bank), R(ight bank)} = {0, 1}. The situation that the goat is not allowed to be alone with the cabbage without the farmer is then symbolically represented by the fixed rule:

Invariant 1: (g ≠ c) OR (f = g = c)

This rule is indeed an invariant, since it is (or better, must be) maintained during the whole process. For instance, if we require that
(i) either f, g and c must receive the same value v ∈ {0, 1},
(ii) or g and c cannot receive the same value v ∈ {0, 1} if f gets the value 1 − v,
what would be the invariants you could state? In case (i), it is easy to formulate the first invariant as f = g = c.

Brute force.

Definition 2 (Brute force). Search exhaustively all possibilities of the search space and, after checking, list all solutions.

Obviously all possible states V of the system S above are the solution set of the polynomial system of equations

f^2 − f = g^2 − g = c^2 − c = w^2 − w = 0.

However, if no constraint is detected and imposed, you have to search exhaustively all possible paths from s_I to s_F: this task is computationally impossible when we have a lot of binary variables or when their values are not binary. For instance, if we use Constraint I:

(I a) f, g and c must receive the same value a ∈ {0, 1},

(I b) OR g and c must receive different values,

then a few states must be discarded from V, such as {(1, 0, 0, 1), (0, 1, 1, 0)}. Finding the invariant saying that ‘the small boat can accommodate at most one of the farmer's belongings at a time’ is more tricky! To do this, you have


to think about the state-changing aspect of the process when the farmer rows from one side of the river to the other. [You will not draw a boat with some items on it and move it across the river several times, will you?] Then a simple question naturally arises: what are the states of the process?

This smart question leads us to the next mathematical tool, shown in the next part.

State-transition graphs. Usually a state set of a graph G = (V, E) can be given by a binary space/set V consisting of vectors of length n:
V = {u = (u_1, u_2, . . . , u_n) : u_i = 0 ∨ u_i = 1}.

The Hamming distance d(u, v) between two binary states u = (u_1, u_2, . . . , u_n) and v = (v_1, v_2, . . . , v_n) is the number of their distinct coordinates:

d(u, v) = |u_1 − v_1| + |u_2 − v_2| + . . . + |u_n − v_n| = Σ_{i=1}^{n} |u_i − v_i|.

The weight of a binary state/vector is defined to be

wt(u) = d(u, 0) = Σ_{i=1}^{n} |u_i − 0| = Σ_{i=1}^{n} u_i.

The Hamming distance d(·, ·) defined on some binary space V is also called the Hamming metric, and the space V equipped with the Hamming metric d(·, ·) is called a Hamming metric space.
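To make these definitions concrete, here is a tiny Python sketch of the Hamming distance and weight (Python is used only for illustration; any of the course tools such as R, Matlab or Scilab would serve equally well):

def hamming_distance(u, v):
    # number of coordinates in which the binary states u and v differ
    return sum(abs(ui - vi) for ui, vi in zip(u, v))

def weight(u):
    # wt(u) = d(u, 0): number of nonzero coordinates
    return sum(u)

print(hamming_distance((0, 0, 0, 0), (1, 0, 0, 1)))   # 2
print(weight((1, 0, 0, 1)))                           # 2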

Definition 3 (State-transition graph). The state-transition graph G = (V, E) of a developing system S is a directed graph where

• the vertex set V consists of all feasible states that the system can realize,

• the edge set E consists of arcs e = (u, v) such that state u can reach state v during the evolution of the concerned system.

Very often, changing states in a state-transition graph G = (V, E) can be handled mathematically by measuring the Hamming distance between an original state u = (u_1, u_2, . . . , u_n) and its successor state v = (v_1, v_2, . . . , v_n).

Example 3 (The farmer's river-crossing problem, cont.). The states of the river-crossing process are binary vectors of length 4,

u = (u_f, u_g, u_c, u_w) = (u_1, u_2, u_3, u_4) ∈ {0, 1}^4,

if we encode the left bank L and the right bank R by {0, 1} as done above!


In our specific example above, V can hold all 16 = 2^4 possible states if no system invariants were found and imposed on S. With Constraint I, V can be redefined as V := V \ {(1, 0, 0, 1), (0, 1, 1, 0)}.

We understand that when the farmer is rowing his boat, for instance from a left-bank state u = (u_1, u_2, u_3, u_4) to a right-bank state v = (v_1, v_2, v_3, v_4) (or the other way round), his position must change. The change of state u to v creates an edge e = (u, v) ∈ E, indeed! More precisely, the edge e = (u, v) is truly determined iff

if u_1 = L (i.e. 0) then v_1 = R (i.e. 1), or the other way round.

Aha, we have just found another invariant that must always hold for the process to run: an edge e = (u, v) can exist only if, equivalently, we have

Invariant 2: u_1 + v_1 = 1, where the sum is the binary (mod 2) sum.

Combining this with the fact that ‘the small boat can accommodate at most one of the farmer's belongings’, we realize that

a starting state u changes in at most two of its coordinates to become the resulting state v.

Hence, the third invariant is found:

Invariant 3: d(u, v) = Σ_{i=1}^{4} |u_i − v_i| ≤ 2.

Decomposition

Knowing how to describe a process or system by a state-transition graph G = (V, E) is not enough! The reason is that we sometimes wish

• to search over all eligible states in V to find the best solutions, or

• to determine an optimal path running through that search space V.

This comes down to listing all states in V efficiently! In that situation, we could split the search space into several small-enough pieces, which is usually called Decomposition, and then list all elements in those pieces, which is called Brute force.

Example 4 (The farmer's river-crossing problem, cont.). The set of eligible states V consists of two parts: one holds every state corresponding to position L of the farmer, and the other one holds every state corresponding to position R of the farmer.


This observation tells us to decompose the state vertices V into two subsets:

V_L = {u_L = (0, u_2, u_3, u_4) ∈ {0, 1}^4}; and V_R = {u_R = (1, u_2, u_3, u_4) ∈ {0, 1}^4}.

Now brute-forcing the subset V_L means listing all vertices u_L = (0, u_2, u_3, u_4); just keep Invariant 1 in mind and maintain it! Do similarly for the subset V_R.

Solution. To determine a solution for the farmer, our specific instance so far, we find a path running through the search space V by combining the several methods suggested above.

Example 5 (The farmer's river-crossing problem, cont.). With the starting state LLLL = 0000 of the farmer, transporting all his items to the right bank requires us, mathematically, to find a path P (consisting of a few edges e = (u, v) ∈ E) to the final state RRRR = 1111.

To do this, for any vertex u = (u_1, u_2, u_3, u_4) ∈ P, Invariants 2 and 3 provide candidate vertices v = (1 − u_1, v_2, v_3, v_4) for making valid edges e = (u, v) ∈ E. You have to follow every possible track generated at any intermediate vertex of P, until you arrive at the final state 1111.
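Putting Invariants 1-3 and the decomposition/brute-force ideas together, the following Python sketch enumerates the feasible states and finds a shortest path from 0000 to 1111 with a breadth-first search. It is only one possible implementation of the procedure sketched above; the helper names (is_feasible, is_edge, shortest_path) are ours, and we add the implicit physical rule that an item can only change banks by riding with the farmer:

from itertools import product
from collections import deque

def is_feasible(s):
    # Invariant 1: the goat may not be alone with the cabbage without the farmer
    f, g, c, w = s
    return g != c or f == g == c

def is_edge(u, v):
    changed = [i for i in range(4) if u[i] != v[i]]
    # Invariant 2: the farmer changes banks on every crossing
    # Invariant 3: at most two coordinates change (the farmer plus at most one item)
    if 0 not in changed or len(changed) > 2:
        return False
    # implicit physical rule: an item that changes banks must start on the farmer's bank
    return all(u[i] == u[0] for i in changed)

def shortest_path(source=(0, 0, 0, 0), sink=(1, 1, 1, 1)):
    states = [s for s in product((0, 1), repeat=4) if is_feasible(s)]
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == sink:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in states:
            if v not in parent and is_edge(u, v):
                parent[v] = u
                queue.append(v)
    return None

print(shortest_path())   # one shortest sequence of states from 0000 to 1111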

1.3 Continuous Dynamic Systems

Major steps are:

1. Setting the objective

2. Building an initial model

3. Developing equations for process rates

4. Nonlinear rates from data: nonparametric models

5. Stochastic models

6. Fitting rate equations by calibration

Setting the objective. A few crucial steps should be considered in this phase:

• Decide whether the objective is theoretical or practical modeling, where

theoretical modeling: the putative model helps us to understand the system and interpret observations of its behavior;

practical modeling: the putative model helps us to predict the system.

http://slidepdf.com/reader/full/mathematical-modeling-simulation-i-2010 18/70

 

1.3. CONTINUOUS DYNAMIC SYSTEMS  15

• Decide how much numerical accuracy you need.

• Assess the feasibility of your goals: be a bit pessimistic (start small first, and then expand to a more complex model).

• Assess the feasibility of your data: be a bit optimistic (don't worry if you are missing some data at the beginning).

Building an initial model. Conceptual model and diagram. The best known is the compartmental model; you have to decide which variables and processes in the system are the most important and in which compartment they should be located.

Developing equations for process rates. Having drawn a model diagram, we next need an equation for each process rate. Mathematically, we need a differential equation expressed by:

• an Ordinary Differential Equation (ODE) of the form

x′ = f(x, u, t),

where x′ denotes the derivative of x (the state variables) with respect to the time variable t, and u the input vector variable, or

• Differential Algebraic Equations (DAE):

x′ = f(x, u, t), and 0 = g(x, u, t).

Linear rates: when and why? See R. Bellman.
Nonlinear rates from data: fitting parametric models. See R. Bellman.
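As a purely illustrative numerical sketch (not taken from Bellman), an ODE of the above form x′ = f(x, u, t) can be integrated with a standard solver; here a hypothetical one-compartment decay model x′ = −k·x in Python/scipy:

import numpy as np
from scipy.integrate import solve_ivp

k = 0.5   # made-up elimination rate constant

def rate(t, x):
    # process rate: x' = -k * x
    return -k * x

sol = solve_ivp(rate, t_span=(0.0, 10.0), y0=[100.0], t_eval=np.linspace(0, 10, 11))
print(sol.t)
print(sol.y[0])   # simulated state trajectory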


Chapter 2

Stochastic techniques

We will discuss the following:

Generating Functions

Stochastic processes

Markov chains

2.1 Generating functions

———————————————————————————

Introduction. Probabilistic models often involve several random variables of interest. For example, in a medical diagnosis context, the results of several tests may be significant, or in a networking context, the workloads of several gateways may be of interest. All of these random variables are associated with the same experiment, sample space, and probability law, and their values may relate in interesting ways. Generating functions are important in handling stochastic processes involving integral-valued random variables.

Elementary results

Suppose we have a sequence of real numbers a_0, a_1, a_2, . . . Introducing the dummy variable x, we may define a function

A(x) = a_0 + a_1 x + a_2 x^2 + · · · = Σ_{j=0}^{∞} a_j x^j.   (2.1)

If the series converges in some real interval −x_0 < x < x_0, the function A(x) is called the generating function of the sequence {a_j}.


Fact 2. If the sequence {a_j} is bounded by some constant K, then A(x) converges at least for |x| < 1. [Prove it!]

Fact 3. In case the sequence {a_j} represents probabilities, we introduce the restriction

a_j ≥ 0,   Σ_{j=0}^{∞} a_j = 1.

The corresponding function A(x) is then a probability-generating function. We consider the (point) probability distribution and the tail probability of a random variable X, given by

P[X = j] = p_j,   P[X > j] = q_j,

then the usual distribution function is

P[X ≤ j] = 1 − q_j.

The probability-generating function now is

P(x) = Σ_{j=0}^{∞} p_j x^j = E(x^X),   where E indicates the expectation operator.

Also we can define a generating function for the tail probabilities:

Q(x) = Σ_{j=0}^{∞} q_j x^j.

Q(x) is not a probability-generating function, however.

Fact 4.
a/ P(1) = Σ_{j=0}^{∞} p_j 1^j = 1 and |P(x)| ≤ Σ_{j=0}^{∞} |p_j x^j| ≤ Σ_{j=0}^{∞} p_j ≤ 1 if |x| < 1. So P(x) is absolutely convergent at least for |x| ≤ 1.
b/ Q(x) is absolutely convergent at least for |x| < 1.
c/ Connection between P(x) and Q(x): (check this!)

(1 − x)Q(x) = 1 − P(x)   or   P(x) + Q(x) = 1 + xQ(x).

Mean and variance of a probability distribution

m = E(X) = Σ_{j=0}^{∞} j p_j = P′(1) = Σ_{j=0}^{∞} q_j = Q(1)   (why!?)

Recall that the variance of the probability distribution p_j is

σ^2 = E(X(X − 1)) + E(X) − [E(X)]^2


we need to know

E[X(X − 1)] = Σ_{j=0}^{∞} j(j − 1) p_j = P′′(1) = 2Q′(1).

Therefore,

σ^2 = ??? (What is it?)

Exercise: Find the formula for the r-th factorial moment

μ_[r] = E(X(X − 1)(X − 2) · · · (X − r + 1)).

Finding a generating function from a recurrence.

Multiply both sides by x^n (and sum over n).

Example: Fibonacci sequence

f_n = f_{n−1} + f_{n−2}  =⇒  F(x) = x + xF(x) + x^2 F(x)

Finding a recurrence from a generating function.

Whenever you know F(x), we find its power series P; the coefficients of P before x^n are the Fibonacci numbers.

How? Just remember how to find the partial-fraction expansion of F(x), in particular the basic expansion

1/(1 − αx) = 1 + αx + α^2 x^2 + · · ·

In general, if G(x) is a generating function of a sequence (g_n) then

G^(n)(0) = n! g_n.
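A small symbolic check of the Fibonacci example (a Python/sympy sketch, purely illustrative; from F(x) = x + xF(x) + x^2 F(x) above one gets F(x) = x/(1 − x − x^2)):

import sympy as sp

x = sp.symbols('x')
F = x / (1 - x - x**2)          # solves F = x + x*F + x**2*F

# coefficients of the power series are the Fibonacci numbers
series = sp.series(F, x, 0, 10).removeO()
coeffs = [series.coeff(x, n) for n in range(10)]
print(coeffs)                    # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

# check G^(n)(0) = n! * g_n for n = 6
n = 6
print(sp.simplify(sp.diff(F, x, n).subs(x, 0) - sp.factorial(n) * coeffs[n]))   # 0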

Multiple random variables. We consider probabilities involving simultaneously the numerical values of several random variables, and we investigate their mutual couplings. In this section, we will extend the concepts of PMF and expectation developed so far to multiple random variables.

Consider two discrete random variables X, Y : S → R associated with the same experiment. The joint PMF of X and Y is defined by

p_{X,Y}(x, y) = P[X = x, Y = y]

for all pairs of numerical values (x, y) that X and Y can take. We will use the abbreviated notation P(X = x, Y = y) instead of the more precise notations P[(X = x) ∩ (Y = y)] or P[X = x and Y = y]. For the pair of random variables X, Y, we say


Definition 4. X and Y are independent if for all x, y ∈ R, we have

P[X = x, Y = y] = P[X = x] P[Y = y]  ⇔  p_{X,Y}(x, y) = p_X(x) p_Y(y),

or in terms of conditional probability

P({X = x}|{Y = y}) = P{X = x}.

This can be extended to the notion of mutual independence of a finite number n of r.v.s.

Expectation. The expectation operator defines the expected value of a random variable X as

Definition 5.

E(X) = Σ_{x ∈ Range(X)} P{X = x} · x

If we consider X as a function from a sample space S to the naturals N, then

E(X) = Σ_{i=0}^{∞} P{X > i}.   (Why?)

Functions of Multiple Random Variables. When there are multiple random variables of interest, it is possible to generate new random variables by considering functions involving several of these random variables. In particular, a function Z = g(X, Y) of the random variables X and Y defines another random variable. Its PMF can be calculated from the joint PMF p_{X,Y} according to

p_Z(z) = Σ_{(x,y) | g(x,y)=z} p_{X,Y}(x, y).

Furthermore, the expected value rule for functions naturally extends and takes the form

E[g(X, Y)] = Σ_{(x,y)} g(x, y) p_{X,Y}(x, y).

Theorem 6. We have two important results of expectation.

1. (Linearity) E(X  + Y ) = E(X ) + E(Y ) for any pair of random variables X, Y 

2. (Independence) E(X  · Y ) = E(X ) · E(Y ) for any pair of independent random variables X, Y 


2.2 Convolutions

Now we consider two nonnegative independent integral-valued random variables X and Y, having the probability distributions

P{X = j} = a_j,   P{Y = k} = b_k.   (2.2)

The joint probability of the event (X = j, Y = k) is obviously a_j b_k. We form a new random variable

S = X + Y,

then the event S = r comprises the mutually exclusive events

(X = 0, Y = r), (X = 1, Y = r − 1), · · · , (X = r, Y = 0).

Fact 5. The probability distribution of the sum S then is

P{S = r} = c_r = a_0 b_r + a_1 b_{r−1} + · · · + a_r b_0.

Proof.

p_S(r) = P(X + Y = r) = Σ_{(x,y): x+y=r} P(X = x and Y = y) = Σ_x p_X(x) p_Y(r − x).

Definition 7. This method of compounding two sequences of numbers (not necessarily probabilities) is called convolution. The notation

{c_j} = {a_j} ∗ {b_j}

will be used.

Fact 6. Define the generating functions of the sequences {a_j}, {b_j} and {c_j} by

A(x) = Σ_{j=0}^{∞} a_j x^j,   B(x) = Σ_{j=0}^{∞} b_j x^j,   C(x) = Σ_{j=0}^{∞} c_j x^j;

it follows that C(x) = A(x)B(x). [Check this!]

In practical applications, the sum of several independent integral-valued random variables X_i can be defined:

S_n = X_1 + X_2 + · · · + X_n,   n ∈ Z+.

If the X_i have a common probability distribution given by p_j, with probability-generating function P(x), then the probability-generating function of S_n is P(x)^n. Clearly, the distribution of S_n is the n-fold convolution

{p_j} ∗ {p_j} ∗ · · · ∗ {p_j}  (n factors) = {p_j}^{n∗}.
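Facts 5 and 6 can be checked numerically. A minimal Python/numpy sketch, using two fair dice as the example distributions (the dice are our own illustrative choice, not from the text):

import numpy as np

# two independent fair dice: a_j and b_k for j, k = 0..6 (0 has probability 0)
a = np.array([0, 1, 1, 1, 1, 1, 1]) / 6.0
b = a.copy()

c = np.convolve(a, b)        # c_r = sum_j a_j * b_{r-j}: PMF of S = X + Y
print(c[2:13])               # probabilities of the sums 2..12: 1/36, 2/36, ..., 1/36

# check C(x) = A(x) B(x) at a test point x0
x0 = 0.3
A = np.polyval(a[::-1], x0)  # A(x0) = sum_j a_j x0^j
B = np.polyval(b[::-1], x0)
C = np.polyval(c[::-1], x0)
print(np.isclose(C, A * B))  # True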


2.3 Compound distributions

In our discussion so far of sums of random variables, we have always assumed that the number of variables in the sum is known and fixed, i.e., it is nonrandom. We now generalize the previous concept of convolution to the case where the number N of random variables X_k contributing to the sum is itself a random variable! In particular, we consider the sum S_N = X_1 + X_2 + · · · + X_N, where

P{X_k = j} = f_j,   P{N = n} = g_n,   P{S_N = l} = h_l.   (2.3)

The probability-generating functions of X, N and S are

F(x) = Σ_j f_j x^j,   G(x) = Σ_n g_n x^n,   H(x) = Σ_l h_l x^l.   (2.4)

Compute H(x) with respect to F(x) and G(x). Prove that

H(x) = G(F(x)).

Example 6. A remote village has three gas stations, and each one of them is open on any given day with probability 1/2, independently of the others. The amount of gas available in each gas station is unknown and is uniformly distributed between 0 and 1000 gallons. We wish to characterize the distribution of the total amount of gas available at the gas stations that are open.

The number N of open gas stations is a binomial random variable with p = 1/2 and the corresponding transform is

G_N(x) = (1 − p + p e^x)^3 = (1/8)(1 + e^x)^3.

The transform F_X(x) associated with the amount of gas available in an open gas station is

F_X(x) = (e^{1000x} − 1) / (1000x).

The transform H_S(x) associated with the total amount S of gas available at the three gas stations of the village that are open is the same as G_N(x), except that each occurrence of e^x is replaced with F_X(x), i.e.,


H_S(x) = G(F(x)) = (1/8)(1 + F_X(x))^3.
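The transform H_S(x) is hard to invert by hand, but the distribution of S can also be explored by simulation. A minimal Monte Carlo sketch of the gas-station example in Python (illustrative only; the tail threshold 2000 is an arbitrary choice of ours):

import numpy as np

rng = np.random.default_rng(0)
trials = 100_000

# N ~ Binomial(3, 1/2) open stations; each open station holds Uniform(0, 1000) gallons
N = rng.binomial(3, 0.5, size=trials)
S = np.array([rng.uniform(0, 1000, size=n).sum() for n in N])

print(S.mean())            # ~ E[S] = E[N] * E[X] = 1.5 * 500 = 750
print((S > 2000).mean())   # estimated tail probability P(S > 2000)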

Application in Large Deviation theory
We are interested in a practical situation in the insurance industry, originally studied in 1932 by F. Esscher (Notices of the AMS, Feb 2008).
Problem: too many claims could be made against the insurance company; we worry about the total claim amount exceeding the reserve fund set aside for paying these claims.
Our aim: to compute the probability of this event.
Modeling. Each individual claim is a random variable, we assume some distribution for it, and the total claim is then the sum S of a large number of (independent or not) random variables. The probability that this sum exceeds a certain reserve amount is the tail probability of the sum S of independent random variables.
Large Deviation theory, as initiated by Esscher, requires the calculation of moment generating functions! If your random variables are independent then the moment generating function is the product of the individual ones, but if they are not (as in a Markov chain) then there is no longer just one moment generating function!
Research project: study Large Deviation theory to solve this problem.


2.4 Introductory Stochastic Processes

The concept. A stochastic process is just a collection (usually infinite) of random variables, denoted X_t or X(t), where the parameter t often represents time. The state space of a stochastic process consists of all realizations x of X_t, i.e. X_t = x says the random process is in state x at time t. Stochastic processes can generally be subdivided into four distinct categories depending on whether t or X_t are discrete or continuous:

1. Discrete processes: both are discrete, such as the Bernoulli process (die rolling) or Discrete Time Markov chains.

2. Continuous time discrete state processes: the state space of X_t is discrete and the index set, e.g. the time set T of t, is continuous, such as an interval of the reals R.

• Poisson process – the number of clients X(t) who have entered ACB from the time it opened until time t. X(t) will have the Poisson distribution with mean E[X(t)] = λt (λ being the arrival rate).

• Continuous time Markov chain.

• Queueing process – people not only enter but also leave the bank; we need the distribution of the service time (the time a client spends in ACB).

3. Continuous processes: both X_t and t are continuous, such as a diffusion process (Brownian motion).

4. Discrete time continuous state processes: X_t is continuous and t is discrete – the so-called TIME SERIES, such as

• monthly fluctuations of the inflation rate of Vietnam,

• daily fluctuations of a stock market.

Examples

1. Discrete processes: a random walk model consisting of positions X_t of an object (a drunkard) at discrete time points t during 24 hours, whose directional distance from a particular point 0 is measured in integer units. Here T = {0, 1, 2, . . . , 24}.

2. Continuous time discrete state processes: X_t is the number of births in a given population during the time period [0, t]. Here T = R_+ = [0, ∞) and the state space is {0, 1, 2, . . .}. The sequence of failure times of a machine is a specific instance.


3. Continuous processes: X_t is the population density at time t ∈ T = R_+ = [0, ∞), and the state space of X_t is R_+.

4. TIME SERIES of daily fluctuations of a stock market

What interesting characteristics of an SP do we want to know? We know a stochastic process is a mathematical model of a probabilistic experiment that evolves in time and generates a sequence of numerical values. Three interesting aspects of an SP that we want to know:
(a) We tend to focus on the dependencies in the sequence of values generated by the process. For example, how do future prices of a stock depend on past values?
(b) We are often interested in long-term averages, involving the entire sequence of generated values. For example, what is the fraction of time that a machine is idle?
(c) We sometimes wish to characterize the likelihood or frequency of certain boundary events. For example, what is the probability that within a given hour all circuits of some telephone system become simultaneously busy, or what is the frequency with which some buffer in a computer network overflows with data?

A few fundamental properties and categories

1. STATIONARY property: A process is stationary when all the X(t) have the same distribution. That means, for any τ, the distribution of a stationary process will be unaffected by a shift in the time origin, and X(t) and X(t + τ) will have the same distributions. For the first-order distribution,

F_X(x; t) = F_X(x; t + τ) = F_X(x);   and   f_X(x; t) = f_X(x).

These processes are found in Arrival-Type Processes, for which we are interested in occurrences that have the character of an “arrival,” such as message receptions at a receiver, job completions in a manufacturing cell, customer purchases at a store, etc. We will focus on models in which the interarrival times (the times between successive arrivals) are independent random variables.

♦ The case where arrivals occur in discrete time and the interarrival times are geometrically distributed is the Bernoulli process.

♦ The case where arrivals occur in continuous time and the interarrival times are exponentially distributed is the Poisson process. The Bernoulli process and the Poisson process will be investigated in detail in the Stochastic Processes course.


2. MARKOVIAN (memoryless) property: Many processes with the memoryless property arise from experiments that evolve in time and in which the future evolution exhibits a probabilistic dependence on the past. As an example, the future daily prices of a stock are typically dependent on past prices. However, in a Markov process, we assume a very special type of dependence: the next value depends on past values only through the current value, that is, X_{i+1} depends only on X_i, and not on any previous values.

2.5 Markov Chains (MC), a key tool in modeling random phenomena

We discuss the concept of a discrete time Markov Chain, or just Markov Chain, in this section. Suppose we have a sequence M of consecutive trials, numbered n = 0, 1, 2, · · · . The outcome of the nth trial is represented by the random variable X_n, which we assume to be discrete and to take one of the values j in a finite set Q of discrete outcomes/states {e_1, e_2, e_3, . . . , e_s}.

M is called a (discrete time) Markov chain if, while occupying states of Q at each of the unit time points 0, 1, 2, 3, . . . , n − 1, n, n + 1, . . ., M satisfies the following property, called the Markov property or memoryless property.
————————————————————————————

P(X_{n+1} = j | X_n = i, · · · , X_0 = a) = P(X_{n+1} = j | X_n = i), for all n = 0, 1, 2, · · · .

———————————————————————————–
(In each time step n to n + 1, the process can stay at the same state e_i (at both n, n + 1) or move to another state e_j (at n + 1), subject to the memoryless rule, saying that the future behavior of the system depends only on the present and not on its past history.)

Definition 8 (One-step transition probability). Denote the absolute probability of outcome j at the nth trial by

p_j(n) = P(X_n = j).   (2.5)

The one-step transition probability, denoted

p_ij(n + 1) = P(X_{n+1} = j | X_n = i),

is defined as the conditional probability that the process is in state j at time n + 1 given that the process was in state i at the previous time n, for all i, j ∈ Q.


Independence of time: homogeneous Markov chains. If the state transition probabilities p_ij(n + 1) in a Markov chain M are independent of the time n, they are said to be stationary, time homogeneous or just homogeneous. The state transition probabilities of a homogeneous chain can then be written without mentioning the time point n:

p_ij = P(X_{n+1} = j | X_n = i).   (2.6)

Unless stated otherwise, we assume and will work with homogeneous Markov chains M. The one-step transition probabilities given by (2.6) of these Markov chains must satisfy:

Σ_{j=1}^{s} p_ij = 1 for each i = 1, 2, · · · , s, and p_ij ≥ 0.

Transition Probability Matrix. In practical applications, we are typically given the initial distribution (i.e. the probability distribution of the starting position of the concerned object at time point 0) and the transition probabilities, and we want to determine the probability distribution of the position X_n for any time point n > 0. The Markov property, quantitatively described through transition probabilities, can be represented in the state transition matrix P = [p_ij]:

P = [ p_11  p_12  p_13  · · ·  p_1s ]
    [ p_21  p_22  p_23  · · ·  p_2s ]
    [ p_31  p_32  p_33  · · ·  p_3s ]
    [  ...   ...   ...  · · ·  ...  ]   (2.7)

Briefly, we have

Definition 9. A (homogeneous) Markov chain M is a triple (Q, p(0), P) in which:

• Q is a finite set of states (which may be identified with an alphabet Σ),

• p(0) are the initial probabilities (at the initial time point n = 0),

• P are the state transition probabilities, denoted by a matrix P = [p_ij] in which

p_ij = P(X_{n+1} = j | X_n = i).

• And such that the memoryless property is satisfied, i.e.,

P[X_{n+1} = j | X_n = i, · · · , X_0 = a] = P[X_{n+1} = j | X_n = i], for all n.


In practice, the initial probabilities p(0) are obtained at the current time (the beginning of a study), and the transition probability matrix P is found from empirical observations in the past. In most cases, the major concern is using P and p(0) to predict the future.

Example 7. The Coopmart chain (denoted C) in SG currently controls 60% of the daily processed-food market; their rivals Maximart and other brands (denoted M) take the other share. Data from the previous years (2006 and 2007) show that 88% of C's customers remained loyal to C, while 12% switched to rival brands. In addition, 85% of M's customers remained loyal to M, while the other 15% switched to C. Assuming that these trends continue, use MC theory to determine C's share of the market (a) in 5 years and (b) over the long run.

Proposed solution. Suppose that the brand attraction is time homogeneous. For a sample of large enough size n, we denote a customer's brand choice in year n by a random variable X_n. The market share probability of the whole population can then be approximated by using the sample statistics, e.g.

P(X_n = C) = |{x : X_n(x) = C}| / n, and P(X_n = M) = 1 − P(X_n = C).

Setting n = 0 for the current time, the initial probabilities then are

p(0) = [0.6, 0.4] = [P(X_0 = C), P(X_0 = M)].

Obviously we want to know the market share probabilities p(n) = [P(X_n = C), P(X_n = M)] at any year n > 0. We now introduce a transition probability matrix whose rows and columns are labeled C and M:

        C      M
P = C [ 1 − a    a   ]  =  [ 0.88  0.12 ]   (2.8)
    M [   b    1 − b ]     [ 0.15  0.85 ]

where a = p_CM = P[X_{n+1} = M | X_n = C] = 0.12 and b = p_MC = P[X_{n+1} = C | X_n = M] = 0.15.
————————————————————————————
Higher-order transition probabilities.
The aim: find the absolute probabilities at any stage n. We write

p^(n)_ij = P(X_{n+m} = j | X_m = i), with p^(1)_ij = p_ij   (2.9)

for the n-step transition probability, which (for a homogeneous chain) does not depend on m ∈ N; see Equation (2.6). The n-step transition matrix is denoted as P^(n) = (p^(n)_ij). For


the case n = 0, we have

p^(0)_ij = δ_ij = 1 if i = j, and 0 if i ≠ j.

Chapman-Kolmogorov equations. The Chapman-Kolmogorov equations relate the n-step transition probabilities to the k-step and (n − k)-step transition probabilities:

p^(n)_ij = Σ_{h=1}^{s} p^(n−k)_ih p^(k)_hj,   0 < k < n.

This results in the matrix notation

P^(n) = P^(n−k) P^(k).

Since P^(1) = P, we get P^(2) = P^2, and in general P^(n) = P^n. Let p^(n) denote the vector form of the probability mass distribution (pmf or absolute probability distribution) associated with X_n of a Markov process, that is

p^(n) = [p_1(n), p_2(n), p_3(n), . . . , p_s(n)],

where each p_i(n) is defined as in (2.5).

Proposition 10. The absolute probability distribution p^(n) at any stage n of a Markov chain is given in matrix form by

p^(n) = p^(0) P^n, where the row vector p^(0) = p is the initial probability vector.   (2.10)

Proof. We employ two facts:
* P^(n) = P^n, and
* the absolute probability distribution p^(n+1) at stage n + 1 (associated with X_{n+1}) can be found from the 1-step transition matrix P = [p_ij] and the distribution

p^(n) = [p_1(n), p_2(n), p_3(n), . . . , p_s(n)]

at stage n (associated with X_n):

p_j(n + 1) = Σ_{i=1}^{s} p_i(n) p_ij, or in matrix notation p^(n+1) = p^(n) P.

Then just do the induction p^(n+1) = p^(n) P = p^(n−1) P^2 = · · · = p^(0) P^{n+1}.

Example 8 (The Coopmart chain, cont.). (a) C's share of the market in 5 years can be computed by

p^(5) = [p_C(5), p_M(5)] = p^(0) P^5.
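A numerical sketch of parts (a) and (b) in Python/numpy, using the row-vector convention p^(n) = p^(0) P^n (for illustration only):

import numpy as np

P = np.array([[0.88, 0.12],
              [0.15, 0.85]])      # rows: from C, from M
p0 = np.array([0.60, 0.40])       # initial market shares [C, M]

# (a) share after 5 years: p(5) = p(0) P^5
p5 = p0 @ np.linalg.matrix_power(P, 5)
print(p5)                          # roughly [0.565, 0.435]

# (b) long-run share: the stationary distribution solves p* P = p*
eigvals, eigvecs = np.linalg.eig(P.T)
stat = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
stat = stat / stat.sum()
print(stat)                        # roughly [0.556, 0.444] = [b, a] / (a + b)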


2.6 Classification of States

Accessible states. State j is said to be accessible from state i if for some

n ≥ 0, p(n)ij > 0, and we write i → j. Two states i and j accessible to each

other are said to communicate, and we write i ↔ j. If all statescommunicate with each other, then we say that the Markov chain is

irreducible.

Recurrent states. Let A(i) be the set of states that are accessible from i. Wesay that i is recurrent if for every j that is accessible from i, i is alsoaccessible from j; that is, for all j ∈ A(i) we have that i ∈ A( j).

When we start at a recurrent state i, we can only visit states j ∈ A(i) fromwhich i is accessible. Thus, from any future state, there is always someprobability of returning to i and, given enough time, this is certain tohappen. By repeating this argument, if a recurrent state is visited once, itwill be revisited an infinite number of times.

Transient states. A state is called transient if it is not recurrent. Inparticular, there are states j ∈ A(i) such that i is not accessible from j.After each visit to state i, there is positive probability that the state enterssuch a j. Given enough time, this will happen, and state i cannot be visitedafter that. Thus, a transient state will only be visited a finite number of 

times.

If  i is a recurrent state, the set of states A(i) that are accessible from i,form a recurrent class (or simply class), meaning that states in A(i) are allaccessible from each other, and no state outside A(i) is accessible fromthem. Mathematically, for a recurrent state i, we have A(i) = A( j) for all jthat belong to A(i), as can be seen from the definition of recurrence. It canbe seen that at least one recurrent state must be accessible from any giventransient state. This is intuitively evident, and a more precise justificationis given in the theoretical problems section. It follows that there must existat least one recurrent state, and hence at least one class. Thus, we reach

the following conclusion.Markov Chain Decomposition .

• A MC can be decomposed into one or more recurrent classes, pluspossibly some transient states.

• A recurrent state is accessible from all states in its class, but is notaccessible from recurrent states in other classes.

• A transient state is not accessible from any recurrent state.


• At least one, possibly more, recurrent states are accessible from a

given transient state.

Remark 7. For the purpose of understanding the long-term behavior of Markov chains, it is important to analyze chains that consist of a single recurrent class. For the purpose of understanding short-term behavior, it is also important to analyze the mechanism by which any particular class of recurrent states is entered starting from a given transient state.

Periodic states.

Absorption probabilities. In this section, we study the short-term behavior of Markov chains. We first consider the case where the Markov chain starts at a transient state. We are interested in the first recurrent state to be entered, as well as in the time until this happens. When focusing on such questions, the subsequent behavior of the Markov chain (after a recurrent state is encountered) is immaterial. State j is said to be an absorbing state if p_jj = 1; that is, once state j is reached, it is never left. We assume, without loss of generality, that every recurrent state k is absorbing:

p_kk = 1,   p_kj = 0 for all j ≠ k.

• If there is a unique absorbing state k, its steady-state probability is 1 (because all other states are transient and have zero steady-state probability), and it will be reached with probability 1, starting from any initial state.

• If there are multiple absorbing states, the probability that one of them will eventually be reached is still 1, but the identity of the absorbing state to be entered is random and the associated probabilities may depend on the starting state.

In the sequel, we fix a particular absorbing state, denoted by s, and consider the absorption probability a_i that s is eventually reached, starting from i:

a_i = P(X_n eventually becomes equal to the absorbing state s | X_0 = i).

Absorption probabilities can be obtained by solving a system of linear equations:

a_s = 1;   a_i = 0 for all absorbing i ≠ s;   a_i = Σ_{j=1}^{m} p_ij a_j for all transient i.
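As an illustration, these linear equations can be solved directly. The following Python sketch uses a small made-up chain with two absorbing states (states 0 and 3) and two transient states (1 and 2); the chain itself is hypothetical:

import numpy as np

# transition matrix: states 0 and 3 absorbing, 1 and 2 transient
P = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.4, 0.0, 0.6, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])

s = 0                         # the fixed absorbing state
transient = [1, 2]

# a_i = sum_j p_ij a_j for transient i, with a_s = 1 and a_i = 0 at other absorbing states,
# rearranged as (I - Q) a_T = r, where Q is P restricted to the transient states
Q = P[np.ix_(transient, transient)]
r = P[transient, s]           # one-step probabilities of hitting s
a_T = np.linalg.solve(np.eye(len(transient)) - Q, r)
print(dict(zip(transient, a_T)))   # absorption probabilities into state 0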


2.7 Limiting probabilities and Stationary distribution of a MC

Definition 11. A vector p* is called a stationary distribution of a Markov chain {X_n, n ≥ 0} with state transition matrix P if:

p* P = p*.

This equation indicates that a stationary distribution p* is a left eigenvector of P with eigenvalue 1. In general, we wish to know the limiting probabilities p^∞ obtained by taking n → ∞ in the equation

p^(∞) = p^(0) P^∞.

We need some general results to determine the stationary distribution p* and the limiting probabilities p^∞ of a Markov chain.

A) Markov chains that have two states. At first we investigate the case of Markov chains that have two states, say Q = {e_1, e_2}. Let a = p_{e_1 e_2} and b = p_{e_2 e_1} be the state transition probabilities between distinct states in a two-state Markov chain; its state transition matrix is

P = [ p_11  p_12 ]  =  [ 1 − a    a   ],   where 0 < a < 1, 0 < b < 1.   (2.11)
    [ p_21  p_22 ]     [   b    1 − b ]

Proposition 12.
a) The n-step transition probability matrix is given by

P^(n) = P^n = 1/(a + b) [ b  a ] + (1 − a − b)^n/(a + b) [  a  −a ]
                        [ b  a ]                         [ −b   b ]

b) Find the limit matrix when n −→ ∞.

To prove this basic Proposition 12 (computing the transition probability matrix of two-state Markov chains), we use a fundamental result of Linear Algebra that is recalled in Subsection ??.

Proof. The eigenvalues of the state transition matrix P, found by solving the equation

c(λ) = |λI − P| = 0,

are λ_1 = 1 and λ_2 = 1 − a − b. The spectral decomposition of a square matrix says that P can be decomposed into two constituent matrices E_1, E_2 (since only two eigenvalues were found):

E_1 = 1/(λ_1 − λ_2) [P − λ_2 I],   E_2 = 1/(λ_2 − λ_1) [P − λ_1 I].


That means E_1, E_2 are mutually orthogonal idempotent matrices, i.e. E_1 · E_2 = 0 = E_2 · E_1, and

P = λ_1 E_1 + λ_2 E_2;   E_1^2 = E_1, E_2^2 = E_2.

Hence, P^n = λ_1^n E_1 + λ_2^n E_2 = E_1 + (1 − a − b)^n E_2, or

P^(n) = P^n = 1/(a + b) [ b  a ] + (1 − a − b)^n/(a + b) [  a  −a ]
                        [ b  a ]                         [ −b   b ]

b) The limit matrix when n −→ ∞:

lim_{n→∞} P^n = 1/(a + b) [ b  a ]
                          [ b  a ]

B) Markov chains that have more than two states. For s > 2 it is cumbersome to compute the constituent matrices E_i of P; instead we can employ the so-called regular property. A Markov chain is regular if there exists m ∈ N such that P^(m) = P^m > 0 (every entry is positive).

2.8 Exercises

A/ Simple skills.

—————————————————–
Let Z_1, Z_2, · · · be independent, identically distributed r.v.'s with P(Z_n = 1) = p and P(Z_n = −1) = q = 1 − p for all n. Let

X_n = Σ_{i=1}^{n} Z_i,   n = 1, 2, · · ·

and X_0 = 0. The collection of r.v.'s {X_n, n ≥ 0} is a random process, and it is called the simple random walk X(n) in one dimension.

(a) Describe the simple random walk X (n).

(b) Construct a typical sample sequence (or realization) of  X (n).

(c) Find the probability that X (n) = −2 after four steps.

(d) Verify the result of part (c) by enumerating all possible sample sequences that lead to the value X(n) = −2 after four steps.

(e) Find the mean and variance of the simple random walk X(n). Find the autocorrelation function R_X(n, m) of the simple random walk X(n).


(f) Show that the simple random walk X (n) is a Markov chain.

(g) Find its one-step transition probabilities.

(h) Derive the first-order probability distribution of the simple random walk X(n).

Solution.

(a) The simple random walk X(n) is a discrete-parameter (or discrete-time), discrete-state random process. The state space is E = {..., −2, −1, 0, 1, 2, ...}, and the index parameter set is T = {0, 1, 2, ...}.

(b) A sample sequence x(n) of a simple random walk X(n) can be produced by tossing a coin every second and letting x(n) increase by unity if a head H appears and decrease by unity if a tail T appears. Thus, for instance, we have a small realization of X(n) in Table 2.1:

n             0   1   2   3   4   5   6   7   8   9   10  · · ·
Coin tossing      H   T   T   H   H   H   T   H   H   T   · · ·
x_n           0   1   0  −1   0   1   2   1   2   3    2  · · ·

Table 2.1: Simple random walk from coin tossing

The sample sequence x(n) obtained above is plotted in the (n, x(n))-plane. The simple random walk X(n) specified in this problem is said to be unrestricted because there are no bounds on the possible values of X. The simple random walk process is often used in the following primitive gambling model: toss a coin; if a head appears, you win one dollar; if a tail appears, you lose one dollar.
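A realization like the one in Table 2.1 can also be produced programmatically. The following is a minimal Python sketch (not part of the original exercise); p = 0.5 and 10 steps are assumed values chosen for illustration.

import numpy as np

rng = np.random.default_rng(seed=1)   # fixed seed for a reproducible realization
p, n_steps = 0.5, 10                  # assumed parameters

# Z_i = +1 with probability p, -1 with probability 1 - p
Z = rng.choice([1, -1], size=n_steps, p=[p, 1 - p])
X = np.concatenate(([0], np.cumsum(Z)))   # X_0 = 0, X_n = Z_1 + ... + Z_n

print(X)   # one sample sequence x(0), x(1), ..., x(10)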

B/ Concepts.

—————————————————–

1. Show that if P is a Markov matrix, then P^n is also a Markov matrix for any positive integer n.

2. A state transition diagram of a finite-state Markov chain is a line diagram with a vertex corresponding to each state and a directed line between two vertices i and j if p_ij > 0. In such a diagram, if one can move from i to j by a path following the arrows, then i → j. The diagram is useful to determine whether a finite-state Markov chain is irreducible or not, or to check for periodicities. Draw the state


transition diagrams and classify the states of the MCs with the following transition probability matrices:

$P_1 = \begin{pmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & 0.5 \\ 0.5 & 0.5 & 0 \end{pmatrix}; \quad
P_2 = \begin{pmatrix} 0 & 0 & 0.5 & 0.5 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}; \quad
P_3 = \begin{pmatrix} 0.3 & 0.4 & 0 & 0 & 0.3 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0.6 & 0.4 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}$

3. Verify the transitivity property of the Markov chain; that is, if i → j and j → k, then i → k. (Hint: use the Chapman–Kolmogorov equations.)

4. Show that in a finite-state Markov chain, not all states can be transient.

C/ Markov Chains and Modeling.
—————————————————–

1. A certain product is made by two companies, A and B, that control the entire market. Currently, A and B have 60 percent and 40 percent, respectively, of the total market. Each year, A loses 5 percent of its market share to B, while B loses 3 percent of its share to A. Find the relative proportion of the market that each holds after 2 years.

2. Let two gamblers, A and B, initially have k dollars and m dollars, respectively. Suppose that at each round of their game, A wins one dollar from B with probability p and loses one dollar to B with probability q = 1 − p. Assume that A and B play until one of them has no money left. (This is known as the Gambler's Ruin problem.) Let X_n be A's capital after round n, where n = 0, 1, 2, ... and X_0 = k.

(a) Show that X(n) = {X_n, n ≥ 0} is a Markov chain with absorbing states.

(b) Find its transition probability matrix P. Realize P when p = q = 1/2 and N = 4.

(c*) What is the probability of A's losing all his money?

Hint: Different rounds are assumed independent. The gambler A, say, plays continuously until he either accumulates a target amount of m, or loses all his money. We introduce a Markov chain whose state i represents the gambler's wealth at the beginning of a round. The states i = 0 and i = m correspond to losing and winning, respectively. All states are transient, except for the winning and losing states which are absorbing. Thus, the problem amounts to finding the probabilities of absorption at each one of these two absorbing states. Of course, these absorption probabilities depend on the initial state i.


D/ Advanced Skills.

—————————————————–

Theorem 13. If every eigenvalue of a matrix P yields linearly independent left eigenvectors in number equal to its multiplicity, then
* there exists a nonsingular matrix M whose rows are left eigenvectors of P, such that
* D = M P M^{-1} is a diagonal matrix whose diagonal elements are the eigenvalues of P, repeated according to multiplicity.

Apply this to a practical problem in Business Intelligence through a case study of the mobile phone industry in Vietnam. According to a recent survey, there are four big mobile producers/sellers N, S, M and L, and their market distributions in 2007 are given by the stochastic matrix (rows and columns ordered N, M, L, S):

$P = \begin{pmatrix} & N & M & L & S \\ N & 1 & 0 & 0 & 0 \\ M & 0.4 & 0 & 0.6 & 0 \\ L & 0.2 & 0 & 0.1 & 0.7 \\ S & 0 & 0 & 0 & 1 \end{pmatrix}$

Is P regular? ergodic? Find the long-term distribution matrix L = lim_{m→∞} P^m. What is your conclusion? (Remark that the states N and S are called absorbing states.)
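A quick way to inspect the long-term behaviour asked for above is to raise P to a large power numerically. The following Python/NumPy sketch (an illustration, not part of the exercise statement) does this for the matrix P of the case study.

import numpy as np

# Rows/columns ordered as N, M, L, S (as in the case study)
P = np.array([
    [1.0, 0.0, 0.0, 0.0],   # N is absorbing
    [0.4, 0.0, 0.6, 0.0],
    [0.2, 0.0, 0.1, 0.7],
    [0.0, 0.0, 0.0, 1.0],   # S is absorbing
])

L = np.linalg.matrix_power(P, 200)   # numerical approximation of lim P^m
print(np.round(L, 4))

Because N and S are absorbing, the rows of the computed limit matrix depend on the starting state, which already hints at the answer to the regularity question.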


Chapter 3

Simulation

This section is aimed at providing a brief introduction to simulation methods and tools within Industrial Statistics, Computational Mathematics and Operations Research.

3.1 Introductory Simulation

Practical Motivation. When an organisation realises that a system is not operating as desired, it will look for ways to improve its performance. To do so, sometimes it is possible to experiment with the real system and, through observation and the aid of Statistics, reach valid conclusions towards future system improvement. However, experiments with a real system may entail ethical and/or economical problems, which may be avoided by dealing with a prototype, a physical model.

Sometimes it is not feasible or possible to build a prototype, yet we may obtain a mathematical model describing, through equations and constraints, the essential behaviour of the system. This analysis may sometimes be done through analytical or numerical methods, but the model may be too complex to be dealt with. Statistically, in the design phase of a system there is no system available, so we cannot rely on measurements for generating a pdf. In such extreme cases, we may use simulation. Large complex system simulation has become common practice in many industrial areas. Essentially, simulation consists of

(i) building a computer model that describes the behaviour of a system; and

(ii) experimenting with this model to reach conclusions that support decisions.


Once we have a computer simulation model of the actual system, we need to generate values for the random quantities that are part of the system input (to the model).

Note that besides Simulation, two other key methods used to solve practical problems in OR are Linear Programming and Statistical Methods.

In this chapter, from the Statistical point of view, we introduce key concepts, methods and tools from simulation with the Industrial Statistics orientation in mind. The major parts of this section are from [8] and [28]. We mainly consider the problem within Step (ii) only. To conduct Step (i) rightly and meaningfully, a close collaboration with experts in specific areas is vital. Topics discussing Step (i) are shown in the other chapters. We learn

1. How to generate random numbers?

2. How to transform random numbers into input data?

3. How to measure/record output data?

4. How to analyze and interpret output data and make meaningful inferences?

3.2 Generation of random numbers

General concepts. Ref 3.1, [8] and [28]. The most basic computational component in simulation involves the generation of random variables distributed uniformly between 0 and 1. These can then be used to generate other random variables, both discrete and continuous depending on practical contexts. A few major requirements for a meaningful, reasonable/reliable simulation are:

• the simulation is run long enough to obtain an estimate of the operating characteristics of the system;

• the number of runs should also be large enough to obtain reliable estimates;

• the result of each run is a random sample; this implies that a simulation is a statistical experiment, which must be conducted using statistical tools such as: i) point estimation, ii) confidence intervals and iii) hypothesis testing.


A schematic diagram to mathematically simulate a system. If a system S is described by a discrete random variable X, a fundamental diagram to simulate S is:

A random number generator G → uniform random variable U → pdf or cdf of X.

3.3 Transformation random numbers into input data

Ref 3.2, [8]

Now some advanced simulation techniques.

Practicality: we use G to randomly compute specific values of X in the last two phases of this diagram. Using the so-called discrete inverse transform method, we write the cdf of X as $F(k) = \sum_{i=0}^{k} p(i) \in [0, 1]$; then:

- generate a uniform random number U ∈ [0, 1] by G,

- find the value X = k by determining the interval [F(k − 1), F(k)] containing U; mathematically this means finding the preimage F^{-1}(U).
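As an illustration of the discrete inverse transform method just described, the sketch below (a minimal Python example with an assumed probability vector) returns the value k whose cumulative interval contains U.

import numpy as np

def discrete_inverse_transform(p, rng):
    """Generate one variate of X with P(X = k) = p[k] via the inverse transform."""
    U = rng.random()                     # uniform random number U in [0, 1)
    F = np.cumsum(p)                     # F(k) = p(0) + ... + p(k)
    return int(np.searchsorted(F, U))    # smallest k with U <= F(k)

rng = np.random.default_rng(0)
p = [0.2, 0.5, 0.3]                      # assumed pmf on {0, 1, 2}
sample = [discrete_inverse_transform(p, rng) for _ in range(10000)]
print(np.bincount(sample) / len(sample))   # empirical frequencies, close to p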

The Transformation Method

Generally, we need an algorithm, named the Transformation Method, described in two steps:

Step 1: use an algorithm A to generate variates V_n, n = 1, 2, ... of a r.v. V (V = U in the above example) with a specific cdf F_V(v) in the continuous case or pdf f_V(v) in the discrete case. Then

Step 2: employ an appropriate transformation g(·) to generate a variate of X, namely X_n = g(V_n).

Theorem 14 (Relationship of V and X). Consider a r.v. V with pdf f_V(v) and a given transformation X = g(V). Denote by v_1, v_2, ..., v_n the real roots of the equation

$x - g(v) = 0. \qquad (3.1)$

Then the pdf of the r.v. X is given by

$f_X(x) = \sum_{l=1}^{n} f_V(v_l) \cdot \frac{1}{\left| \frac{dg}{dv}(v_l) \right|}.$

Given x, if Eq. (3.1) has no real solutions then the pdf f_X(x) = 0.

Proof. DIY


The two most important uses of the Transformation Method are:

A) Linear (affine when b ≠ 0) case: X = g(V) = aV + b where a, b ∈ R. Then

$f_X(x) = \frac{1}{|a|} f_V\!\left(\frac{x-b}{a}\right).$

B) Inverse case: X = g(V) = F_X^{-1}(V), where F_X(x) is the cdf of the random variable X.

Theorem 15 (Inverse case). Consider a r.v. V with uniform cdf F_V(v) = v, v ∈ [0, 1]. Then the transformation X = g(V) = F_X^{-1}(V) gives variates x of X with cdf F_X(x).

Proof. For any real number a, by the monotonicity of the cdf F_X,

$P(X \le a) = P[F_X^{-1}(V) \le a] = P[V \le F_X(a)] = F_V(F_X(a)) = F_X(a).$

Using this, an algorithm can be formulated for generating variates of a r.v. X:

1. Invert the given cdf F_X(x) to find its inverse F_X^{-1}.

2. Generate a uniform variate V ∈ [0, 1].

3. Generate variates x via the transformation X = F_X^{-1}(V).
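For a concrete continuous case, the exponential distribution with rate λ has F_X(x) = 1 − e^{−λx}, whose inverse is F_X^{-1}(v) = −ln(1 − v)/λ. The following minimal Python sketch (λ = 2 is an assumed value) applies the three steps above.

import numpy as np

rng = np.random.default_rng(42)
lam = 2.0                         # assumed rate parameter

V = rng.random(100000)            # step 2: uniform variates on [0, 1)
X = -np.log(1.0 - V) / lam        # steps 1 and 3: x = F^{-1}(V)

print(X.mean())                   # should be close to 1/lam = 0.5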

Example 9. Consider a Bernoulli r.v. X ∼ B(p) where p = P(X = 1). The cdf F_X(x) = P(X ≤ x) is a step (stair-case) function u(·). [That is, u(t) = b_i if a_i ≤ t < a_{i+1}, where (a_i)_i is an ascending sequence.] Here

F_X(x) = 0 if x < 0;  F_X(x) = 1 − p if 0 ≤ x < 1;  and F_X(x) = (1 − p) + p = 1 if 1 ≤ x.

How do we generate X? We employ V ∼ UniDist([0, 1]) and the fact that the inverse is

F_X^{-1}(V) = u(V − (1 − p)).

Example 10. Consider a binomial r.v. X ∼ BinomDist(n, p), where p is the success probability of each trial. X takes values in {0, 1, ..., n} and the distribution is given by the probability function

$p(k) = P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$


We employ V ∼ UniDist([0, 1]) and use

$F_X(x) = P(X \le x) = V \iff x = F_X^{-1}(V) = u(V),$

in which the parameters of the step function u(V) are given by: u(V) = k if $\sum_{i=0}^{k-1} p(i) < V \le \sum_{i=0}^{k} p(i)$, for k ∈ {1, ..., n}; and u(V) = 0 if $V \le p(0)$.

How is this done? Simply split the vertical interval [0, 1] into n + 1 subintervals, with the length of the k-th subinterval equal to p(k) = P(X = k), k ∈ {0, 1, ..., n}.
———————————————————————————

Example 11 (Implementation in Maple). If a system S is described by a binomial random variable X describing an arrival process of tankers at a sea-port with parameters (n, p) = (5, 0.7), then its mean is λ = np = 3.5. A binomial random variate x for the number of tankers arriving can be generated using the Maple function

x = random[binomiald[n, p]][1].

———————————————————————————

Key steps to obtain reliable simulation results

1. determine a proper simulation run length T, i.e. T times of generating a uniform random number U ∈ [0, 1], with the same probability distribution each time

2. determine a proper number of runs R

3. design good random number generators G

Determine R.

Key steps in designing good random number generators G. Good random number generators are mainly based on integer arithmetic with modulo operations. A good one could be V_i = a V_{i−1} mod 2^32 for computers with 32-bit CPUs.
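A minimal Python sketch of such a multiplicative congruential generator follows. The multiplier a = 69069 is one classical choice for modulus 2^32 (an assumption here, not a value fixed by the text), and the seed should be odd for this purely multiplicative form.

def lcg_uniform(seed, a=69069, m=2**32):
    """Multiplicative congruential generator V_i = a * V_{i-1} mod 2^32,
    yielding pseudo-uniform numbers on [0, 1)."""
    v = seed
    while True:
        v = (a * v) % m
        yield v / m

gen = lcg_uniform(seed=12345)              # odd seed
print([round(next(gen), 4) for _ in range(5)])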

3.4 Measurement of output data

Ref 3.3, [8]. The third step in a simulation process consists of passing the inputs through the simulation model to obtain outputs to be analysed later. We


shall consider the two main application areas in Industrial Statistics: Monte Carlo simulation and discrete event simulation.

Discrete event simulation models. Discrete event simulation (DES) deals with systems whose state changes at discrete times, not continuously. These methods were initiated in the late 50's; for example, the first DES-specific language, GSP, was developed at General Electric by Tocher and Owen to study manufacturing problems.

To study such systems, we build a discrete event model. Its evolution in time implies changes in the attributes of one of its entities, or model components, and it takes place in a given instant. Such a change is called an event. The time between two instants is an interval. A process describes the sequence of states of an entity throughout its life in the system.

There are several strategies to describe such evolution, which depend on the mechanism that regulates time evolution within the system.

• When such evolution is based on time increments of the same duration, we talk about synchronous simulation.

• When the evolution is based on intervals, we talk about asynchronous simulation.

Generation of values of a Markov Chain

Discrete Time Markov Chain (DTMC). We consider a homogeneous DTMC described by a transition matrix P. How do we generate sample paths of states {X_n}? (See pp. 574–575 of Viniotis.) Briefly, we have learned that

Definition 16. A (homogeneous) Markov chain M is a triple (Q, p, P) in which:

• Q is a finite set of states (identified with an alphabet Σ),

• p are the initial probabilities (at time point t = 0),

• P = [p_ij] is the matrix of state transition probabilities, in which

$p_{ij} = P(X_{n+1} = j \mid X_n = i),$

• and the memoryless property is satisfied, i.e.,

$P(X_{n+1} = j \mid X_n = i, \cdots, X_0 = a) = P(X_{n+1} = j \mid X_n = i), \quad \text{for all } n.$


Independence of time – homogeneous Markov chains. If the state transition probabilities p_ij(n + 1) in a Markov chain M are independent of the time n, they are said to be stationary, time homogeneous or just homogeneous. The state transition probability of a homogeneous chain can then be written without mentioning the time point n:

$p_{ij} = P(X_{n+1} = j \mid X_n = i). \qquad (3.2)$

Unless stated otherwise, we assume and will work with homogeneous Markov chains M. The one-step transition probabilities given by (3.2) of these Markov chains must satisfy

$\sum_{j=1}^{s} p_{ij} = 1 \ \text{ for each } i = 1, 2, \cdots, s, \quad \text{and} \quad p_{ij} \ge 0.$

Transition Probability Matrix. In practical applications, we are likely given the initial distribution (i.e. the probability distribution of the starting position of the object of interest at time point 0) and the transition probabilities, and we want to determine the probability distribution of the position X_n for any time point n > 0. The Markov property, quantitatively described through transition probabilities, can be represented conveniently in the so-called state transition matrix P = [p_ij]:

$P = \begin{pmatrix} p_{11} & p_{12} & p_{13} & \ldots & p_{1s} \\ p_{21} & p_{22} & p_{23} & \ldots & p_{2s} \\ p_{31} & p_{32} & p_{33} & \ldots & p_{3s} \\ \vdots & \vdots & \vdots & \ddots & \vdots \end{pmatrix} \qquad (3.3)$

Definition 17. A vector $p^*$ is called a stationary distribution of a Markov chain $\{X_n, n \ge 0\}$ with the state transition matrix P if

$p^* P = p^*.$

Question: how do we find a stationary distribution of a Markov chain? Consider a homogeneous DTMC $\{X_n\}$ described by the transition matrix $P = [p_{ij}]$. How do we generate sample paths of $\{X_n\}$? Two issues are involved here:

a) Only steady state results are of interest

b) Transient results are of interest as well.

In a), we want to generate values of a single stationary random variable with distribution $p^*$ that describes the steady-state behavior of the MC. Since $p^*$ is a one-dimensional pdf, the algorithm after Theorem 15 suffices.


Instances of synchronous and asynchronous simulation. We illustrate both strategies by describing how to sample from a Markov chain with state space S and transition matrix

$P = (p_{ij}), \quad \text{with } p_{ij} = P(X(n+1) = j \mid X(n) = i).$

The obvious way to simulate the (n + 1)-th transition, given X(n), is

Generate $X(n+1) \sim \{p_{x(n)j} : j \in S\}$.

This synchronous approach has the potential shortcoming that X(n + 1) may equal X(n), with the corresponding computational effort lost.

Alternatively, we may simulate T_n, the time until the next change of state, and then sample the new state X(n + T_n). If X(n) = s, T_n follows a geometric distribution GeomDist(p_ss) with parameter p_ss, and X(n + T_n) has a discrete distribution with mass function $\{p_{sj}/(1 - p_{ss}) : j \in S \setminus \{s\}\}$.

Should we wish to sample N transitions of the chain, assuming X(0) = i_0, we do:

    Set t = 0, X(0) = i_0
    While t < N:
        Sample h ∼ GeomDist(p_{x(t)x(t)})
        Sample X(t + h) ∼ { p_{x(t)j} / (1 − p_{x(t)x(t)}) : j ∈ S \ {x(t)} }
        Set t = t + h
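The synchronous scheme above translates directly into code. The sketch below is a minimal Python illustration (the 3-state matrix is an assumed example, not taken from the text): each step samples the next state from the row of P indexed by the current state.

import numpy as np

def simulate_dtmc(P, i0, N, rng):
    """Synchronous simulation: sample N transitions of a DTMC with
    transition matrix P, starting from state i0."""
    path = [i0]
    for _ in range(N):
        current = path[-1]
        # next state drawn from the current row of P
        path.append(int(rng.choice(len(P), p=P[current])))
    return path

rng = np.random.default_rng(7)
P = np.array([[0.5, 0.3, 0.2],     # assumed 3-state transition matrix
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
print(simulate_dtmc(P, i0=0, N=20, rng=rng))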

Two key strategies for asynchronous simulation.

One is that of event scheduling. The simulation time advances until the next event, and the corresponding activities are executed. If we have k types of events (1, 2, ..., k), we maintain a list of events, ordered according to their execution times (t_1, t_2, ..., t_k). A routine R_i associated with the i-th type of event is started at time τ_i = min(t_1, t_2, ..., t_k).

An alternative strategy is that of process interaction. A process represents an entity and the set of actions that it experiences throughout its life within the model. The system behaviour may be described as a set of processes that interact, for example, competing for limited resources. A list of processes is maintained, ordered according to the occurrence of the next event. Processes may be interrupted, their routines having multiple entry points, designated reactivation points.

Each execution of the program will correspond to a replication, which corresponds to simulating the system behaviour for a long enough period of


time, providing average performance measures, say X(n), after n customers have been processed. If the system is stable,

$X(n) \longrightarrow X.$

If, e.g., processing 1000 jobs is considered long enough, we associate with each replication j of the experiment the output X_j(1000). After several replications, we would analyse the results as described in the next section.

3.5 Analyzing of output- Making meaningfulinferences

Ref 3.4, [8] and [21], Section 5.

3.6 Simulation languages

Use the JMT system or OpenModelica.

3.7 Research 1: Simulation of Queueing systems with multiclass customers

Classical queueing models have been studied extensively since the 1960s, during the emergence of the Internet. One of the pioneers of the field is Leonard Kleinrock at UCLA. In fact, queueing models are applied not only in networks and systems of computers but also in any service system of an economy that possesses resource allocation and/or sharing. In Europe, the project called Euro-NGI (European Network of Excellence Project on Design and Engineering of the Next-Generation Internet) was created just a few years ago.

We restrict ourselves to studying and simulating basic queueing systems such as M/M/1, M/M/1/K and M/G/1 systems. Now, how can we improve the work in [8]?

——————————————————————————————–

Part III: Practical Applications of MMS.


Chapter 4

Probabilistic Modeling

This is a seminar-based chapter. Main references are:
1/ Chapter 6 of Simulation, A Modeler's Approach by James Thompson, Wiley, 2000;
2/ Chapters 1–4 of Mathematical Modeling and Computer Simulation by Daniel Maki and Maynard Thompson, Thomson Brooks/Cole, 2006;
3/ Chapter 4 of Risk and Financial Management – Mathematical Methods by C. S. Tapiero, Wiley, 2004.

4.1 Markovian Models

4.1.1 Exponential distribution

An exponential random variable T with parameter λ has the density

$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0;$

the cumulative distribution and the tail (survivor) distribution, respectively, are

$P(T \le t) = \int_0^t f(x)\,dx; \quad \text{and} \quad P(T > t) = \int_t^{+\infty} f(x)\,dx = e^{-\lambda t}.$

Memoryless property of exponential distributions. For exponential random variables T,

$P(T > s + t) = P(T > t)\, P(T > s).$

The Erlang random variable. If T_1, T_2 are two independent and identically distributed (i.i.d.) exponential random variables, what would be the distribution of S_2 = T_1 + T_2?
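One way to explore this question empirically is to simulate the sum of two independent exponentials and compare its moments with those of the Erlang(2, λ) distribution. The following is a minimal Python sketch with an assumed rate λ = 1.5.

import numpy as np

rng = np.random.default_rng(0)
lam, n = 1.5, 200000                     # assumed rate and sample size

S2 = rng.exponential(1/lam, n) + rng.exponential(1/lam, n)   # T1 + T2

# Moments of Erlang(2, lam): mean = 2/lam, variance = 2/lam^2
print(S2.mean(), 2/lam)
print(S2.var(), 2/lam**2)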


4.1.2 Poisson process

Suppose an experiment begins at time t = 0, and its i-th event occurs at a random time point T_i ≥ 0, called the point of occurrence, for i = 1, 2, .... Let Z_n = T_n − T_{n−1} denote the interarrival time period. If the Z_n's are i.i.d. then {Z_n, n ≥ 1} is called a recurrent/renewal process; {T_n, n ≥ 0} itself is called an arrival process.

Counting process N(t). If we now view time t as continuous, a random process {N(t), t ≥ 0} is said to be a counting process if N(t) counts the number of events that have occurred in the interval (0, t). Obviously

1. N (t) ∈ N

2. N (s) ≤ N (t) if  s ≤ t

3. N(t) − N(s) = the number of events that have occurred in the interval (s, t).

Poisson process. {N(t), t ≥ 0} is a special type of counting process. A counting process {N(t), t ≥ 0} is said to be a Poisson process with rate λ > 0 if N(0) = 0, the increments over disjoint intervals are independent, and the number of events in any interval of length t is Poisson distributed with mean λt, i.e.

$P(N(t+s) - N(s) = n) = e^{-\lambda t} \frac{(\lambda t)^n}{n!}, \quad n = 0, 1, 2, \ldots$

Remark: the Poisson distribution is the limiting case of the binomial distribution.

Further topics: interarrival times of the Poisson process; nonhomogeneous Poisson processes; compound Poisson processes.
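Since the interarrival times of a Poisson process with rate λ are i.i.d. exponential(λ), a sample path can be generated by cumulating exponential gaps. A minimal Python sketch (λ = 3 and a horizon of 2 time units are assumed values) follows.

import numpy as np

rng = np.random.default_rng(1)
lam, horizon = 3.0, 2.0            # assumed rate and time horizon

# Generate exponential interarrival times until the horizon is exceeded
arrival_times, t = [], 0.0
while True:
    t += rng.exponential(1/lam)    # Z_n ~ Exp(lam)
    if t > horizon:
        break
    arrival_times.append(t)

print(len(arrival_times))          # N(horizon); its expected value is lam * horizon = 6
print(arrival_times[:5])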

4.2 Bayesian Modeling in Probabilistic Nets


Chapter 5

Statistical Modeling in Quality Engineering

5.1 Introduction to Statistical Modeling (SM)

This chapter is planned for persons interested in the design, conduct and analysis of experiments in the physical, chemical, biological, medical, social, psychological, economic, engineering or industrial sciences. The chapter will examine how to design experiments, carry them out, and analyze the data they yield. Our major aims are:

1/ provide an introduction to descriptive and inferential statistical concepts and methods. Topics include grouping of data, measures of central tendency and dispersion, probability concepts and distributions, sampling, statistical estimation, and statistical hypothesis testing.

2/ introduce a specific problem in Statistical Quality Control: Design of Experiments (DOE)

Why Statistics

[See [27] for more information.]

Statistical methods are applied in an enormous diversity of problems in such fields as:

• Agriculture (which varieties grow best?)
• Genetics, Biology (selecting new varieties, species)
• Economics (how are the living standards changing?)
• Market Research (comparison of advertising campaigns)
• Education (what is the best way to teach small children reading?)
• Environmental Studies (do strong electric or magnetic fields induce higher cancer rates?)
• Meteorology (is global warming a reality?)
• Medicine (which drug is best?)


• Psychology (how are shyness and loneliness related?)
• Social Science (comparison of people's reaction to different stimuli)

Basic terms

1. A population is the collection of items under discussion. It may be finite or infinite; it may be real or hypothetical. A sample is a subset of a population. The sample should be chosen to be representative of the population because we usually want to draw conclusions or inferences about the population based on the sample.

2. An appropriate statistical model for our data will often be of the form

Observed data = f(x; µ) + error,

where x are variables we have measured and µ are parameters of our model.

3. Variable. A property or characteristic on which information is obtained in an experiment. There are two major kinds of variables:

a. Quantitative variables (measurements and counts):

• continuous (such as heights, weights, temperatures); their values are often real numbers; there are few repeated values;

• discrete (counts, such as numbers of faulty parts, numbers of telephone calls etc.); their values are usually integers; there may be many repeated values.

b. Qualitative variables (factors, class variables); these variables classify objects into groups:

• categorical (such as methods of transport to College); there is no sense of order;

• ordinal (such as income classified as high, medium or low); there is a natural order for the values of the variable.

4. Observation. The collection of information in an experiment, or the actual values obtained on variables in an experiment. Response variables are outcomes or observed values of an experiment.

5. Parameters and Statistics. A parameter is a numeric characteristic of a population or a process. A statistic is a numerical characteristic that is computed from a sample of observations.


6. Distribution. A tabular, graphical or theoretical description of the values of a variable using some measure of how frequently they occur in a population, a process or a sample.

7. Parametric methods versus non-parametric methods. A parametric method for making statistical inferences assumes that samples come from a known family of distributions. For example, the method of analysis of variance assumes that samples are drawn from normal distributions. Non-parametric methods allow making statistical inferences from samples without assuming that the samples come from any underlying family of distributions, and make no assumptions about any population parameters.

8. Mathematical models and Statistical models. A model is termed mathematical if it is derived from theoretical considerations that represent exact, error-free assumed relationships among the variables. A model is termed statistical if it is derived from data that are subject to various types of specification, observation, experimental, and/or measurement errors.

9. Regression analysis is used to model relationships between random variables and to determine the magnitude of the relationships between variables. Some are independent variables or predictors, also called explanatory variables, control variables, or regressors, usually named X_1, ..., X_d. The others are response variables, also called dependent variables, explained variables, predicted variables, or regressands, usually named Y. If there is more than one response variable, we speak of multivariate regression.

Brief aims of designing experiments

Various (statistical) designs are discussed and their respective differences, advantages, and disadvantages are noted. In particular, factorial and fractional factorial designs are discussed in detail. These are designs in which two or more factors are varied simultaneously; the experimenter wishes to study not only the effect of each factor, but also how the effect of one factor changes as the other factors change. The latter is generally referred to as an interaction among factors. Generally, designing experiments helps us to: perform experiments to evaluate the effects the factors have on the characteristics of interest, and also discover possible relationships among the factors (which could affect the characteristics); use this new understanding to improve the product; and answer questions such as:

1. What are the key factors in a process?


2. At what settings would the process deliver acceptable performance?

3. What are the key, main and interaction effects in the process?

4. What settings would bring about less variation in the output?

Important steps in designing experiments

Several critical steps should be followed to achieve our goals:

1. State objective: write a mission statement for the experiment or project;

2. Choose response: this is about consultation; you have to ask clients what they want to know, or ask yourself; pay attention to the nominal-the-best responses;

3. Perform pre-experiment data analysis?

4. Choose factors and levels: use a flowchart to represent the process or system, and use a cause-effect diagram to list the potential factors that may impact the response;

5. Select experimental plan

6. Perform the experiment

7. Analyze the data

8. Draw conclusions and make recommendations.

5.2 DOE in Statistical Quality Control

History. The history of DOE goes back to the 1930s, when Sir R. A. Fisher in England used Latin squares to randomize the plant varieties before planting on his experimental farm, among other activities. The goal was to get high-productivity harvests. The mathematical theory of combinatorial designs was developed by R. C. Bose in the 1950s in India and then in the US. Nowadays, DOE is extensively studied and employed in virtually every human activity, and the mathematics for DOE is very rich.

The term "Algebraic Statistics" was coined by Pistone, Riccomagno and Wynn in 2000. Motivated by problems in Design of Experiments, such as computing fractional factorial designs, they developed the systematic use of Groebner basis methods for problems in discrete probability and statistics.

In this lecture, the fractional factorial design  has been chosen for detailedstudy in view of its considerable record of success over the last thirty years.


It has been found to allow cost reduction, increase the efficiency of experimentation, and often reveal the essential nature of a process.

What is an Experimental Design? Fix n finite subsets D_1, ..., D_n of the set of real numbers R. Their cartesian product D = D_1 × · · · × D_n is a finite subset of R^n. In statistics, the set D is called a full factorial design. A basic aim in our study is to use full factorial designs or their subsets to find a regression model describing the relationship between the factors and the response.

An example of special interest is the case when D_i = {0, 1} for all i. In that case, D consists of the 2^n vertices of the standard n-cube and is referred to as a full Boolean design. For instance, consider a full factorial design 2^3 with three binary factors: the factor x_1 of mixture ratio, the factor x_2 of temperature, the factor x_3 of experiment time period, and the response y of wood toughness. The levels of the factors are given in the following table:

Factor            Low (0)   High (1)
Mix(ture) Ratio   45p       55p
Temp(erature)     100°C     150°C
Time period       30m       90m

Table 5.1: Factor levels of the 2^3 factorial experiment

5.3 How to measure factor interactions?

This is a very complicated topic! See more in [7].

5.4 What should we do to bring experiments into daily life?

There are a few ways to do that, but we have to employ Data Analysis techniques a great deal. We illustrate the way we do it by going through a particular instance, e.g. a forward-looking application in the wood industry. Then see the next section for the data analysis.

Description

A household furniture production project requires studying product toughness using 8 factors. The steps are

• Select experimental plan


RUN   Mix Ratio   Temp        Time      Yield
1     45p (−)     100°C (−)   30m (−)     8
2     55p (+)     100°C (−)   30m (−)     9
3     45p (−)     150°C (+)   30m (−)    34
4     55p (+)     150°C (+)   30m (−)    52
5     45p (−)     100°C (−)   90m (+)    16
6     55p (+)     100°C (−)   90m (+)    22
7     45p (−)     150°C (+)   90m (+)    45
8     55p (+)     150°C (+)   90m (+)    56

Table 5.2: Results of an example 2^3 Full Factorial Experiment

• Choose factors and levels

• State objective

• Conducting experiments

• Data analysis

• Draw conclusions and make recommendations
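For the data-analysis step, once the 2^3 data of Table 5.2 are available, the main effect of each factor can be estimated as the difference between the average yield at its high level and at its low level. The sketch below is a minimal Python illustration of that computation (it only assumes the yields listed in Table 5.2).

import numpy as np

# Coded levels (-1/+1) for Mix Ratio (x1), Temp (x2), Time (x3), runs 1..8 of Table 5.2
X = np.array([[-1, -1, -1],
              [ 1, -1, -1],
              [-1,  1, -1],
              [ 1,  1, -1],
              [-1, -1,  1],
              [ 1, -1,  1],
              [-1,  1,  1],
              [ 1,  1,  1]])
y = np.array([8, 9, 34, 52, 16, 22, 45, 56])   # yields from Table 5.2

# Main effect = mean(y at level +) - mean(y at level -)
for name, col in zip(["Mix Ratio", "Temp", "Time"], X.T):
    effect = y[col == 1].mean() - y[col == -1].mean()
    print(name, effect)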

Select experimental plan. We employ a strength-3 fractional factorial design, also called a strength-3 mixed-level Orthogonal Array (OA), that has 96 runs and is able to accommodate studying up to eight factors. This array is denoted by OA(96; 6 · 4^2 · 2^5; 3); its factors and their levels are described in Table 5.3.

The factor description of a workable design. The full factorial design of the eight factors described above is the Cartesian product

$\{0, 1, \ldots, 5\} \times \{0, 1, \ldots, 3\}^2 \times \{0, 1\}^5.$

Using the full design, we are able to estimate all interactions, but performing all 3072 runs exceeds the firm's budget. Instead we use a fractional factorial design, that is, a subset of elements in the full factorial design.

Our aim: choose a fractional design that has a rather small run size but still allows us to estimate the main effects and some of the two-factor interactions. A workable solution is the 96-run experimental design presented in Table 5.4. This allows us to estimate the main effect of each factor and some of their pairwise interactions.


Table 5.3: Eight factors, the number of levels and the level meanings

Factor                              # levels   Levels 0, 1, 2, ...
1 (A) wood                          6          pine, oak, birch, chestnut, poplar, walnut
2 (B) glue                          4          a (less adhesive), b, c, d (most adhesive)
3 (C) moisture content              4          10%, 20%, 30%, 40%
4 (D) processing time               2          1 h(our), 2h
5 (E) pretreatment                  2          no, yes
6 (F) indenting of wood samples     2          no, yes
7 (G) pressure                      2          1 pas(cal), 10 pas
8 (H) hardening conditions          2          no, yes

† The construction of new factors given the run size of an OA of strength 2 and 3 (i.e., extending factors while fixing the number of experiments and the strength) by a combined approach is detailed in Chapters 3 and 4 of [3].

Remark 8.

1. If we want to measure simultaneously all effects up to two-factor interactions of the above 8 factors, an ? run fractional design would be needed.

2. Constructing a ? run design is possible, and could be found with trial-and-error algorithms. But it lacks some attractive features such as balance, which will be discussed below.

3. The responses Y  have been computed by simulation, not by conducting actual experiments.


run   A  B  C  D  E  F  G  H
(A = wood type with 6 levels; B = glue with 4 levels; C = moisture content with 4 levels;
 D = processing time, E = pre-treatment, F = indenting of wood samples, G = pressure,
 H = hardening conditions, each with 2 levels; Y = yield)

  1   0  0  0  0  0  0  0  0
  2   0  0  1  1  1  0  1  1
  3   0  0  2  1  0  1  1  0
  4   0  0  3  0  1  1  0  1
  .   .  .  .  .  .  .  .  .
 81   5  0  0  1  1  1  0  0
 82   5  0  1  0  0  1  1  1
 83   5  0  2  0  1  0  1  0
 84   5  0  3  1  0  0  0  1
 85   5  1  0  0  1  0  1  1
 86   5  1  1  1  1  1  0  0
 87   5  1  2  1  0  0  0  1
 88   5  1  3  0  0  1  1  0
 89   5  2  0  0  0  1  0  1
 90   5  2  1  1  0  0  1  0
 91   5  2  2  1  1  1  1  1
 92   5  2  3  0  1  0  0  0
 93   5  3  0  1  0  0  1  0
 94   5  3  1  0  1  0  0  1
 95   5  3  2  0  0  1  0  0
 96   5  3  3  1  1  1  1  1

Table 5.4: A mixed orthogonal design with 3 distinct sections: the 96-run balanced factorial design.


Chapter 6

New directions and Conclusion

This is a seminar-based chapter. Topics could be

6.1 Black-Scholes model in Finance

See Rubenstein.

6.2 Drug Resistance and Design of Anti-HIV drug

See Richard Bellman.

6.3 Epidemic Modeling

See O. Diekmann.

6.4 Conclusion


Chapter 7

Appendices

7.1 Appendix A: Theory of stochastic matrix for MC

A stochastic matrix is a matrix for which each row sum equals one. If the column sums also equal one, the matrix is called doubly stochastic. Hence the transition probability matrix P = [p_ij] is a stochastic matrix.

Proposition 18. Every stochastic matrix K has

• 1 as an eigenvalue (possibly with multiplicity greater than one), and

• none of its eigenvalues exceeds 1 in absolute value; that is, all eigenvalues λ_i satisfy |λ_i| ≤ 1.

Proof. DIY

Fact 9. If  K  is a stochastic matrix then  K m is a stochastic matrix.

Proof. Let e = [1, 1, ..., 1]^t be the all-one vector, and use the fact that Ke = e. Prove that K^m e = e.

Let A = [aij ] > 0 denote that every element aij of A satisfies the conditionaij > 0.

Definition 19.

• A stochastic matrix P = [p_ij] is ergodic if lim_{m→∞} P^m = L (say) exists, that is, each $p^{(m)}_{ij}$ has a limit when m → ∞.


• A stochastic matrix P is regular if there exists a natural m such that P^m > 0. In our context, a Markov chain with transition probability matrix P is called regular if there exists an m > 0 such that P^m > 0, i.e. there is a finite positive integer m such that after m time-steps, every state has a nonzero chance of being occupied, no matter what the initial state.

Example 12. Is the matrix

$P = \begin{pmatrix} 0.88 & 0.12 \\ 0.15 & 0.85 \end{pmatrix}$

regular? ergodic? Calculate the limit matrix L = lim_{m→∞} P^m.
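A numerical check for Example 12 is to raise P to a large power and inspect the rows, which approach the stationary distribution when P is regular. The following is a minimal Python sketch (the power 200 is an arbitrary choice, not part of the example).

import numpy as np

P = np.array([[0.88, 0.12],
              [0.15, 0.85]])

L = np.linalg.matrix_power(P, 200)   # numerical approximation of lim P^m
print(np.round(L, 4))                # both rows approximate the stationary distribution p*

# Cross-check: p* as the left eigenvector of P for eigenvalue 1, normalized to sum to 1
w, V = np.linalg.eig(P.T)
p_star = np.real(V[:, np.argmin(np.abs(w - 1))])
print(p_star / p_star.sum())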

The limit matrix L = lim_{m→∞} P^m practically shows the long-term behaviour (distribution, properties) of the process. How do we know whether L exists (i.e. whether the transition matrix P = [p_ij] is ergodic)?

Theorem 20. A stochastic matrix P = [p_ij] is ergodic if and only if
* the only eigenvalue λ of modulus (magnitude) 1 is 1 itself, and
* if λ = 1 has multiplicity k, there exist k independent eigenvectors associated with this eigenvalue.

For a regular homogeneous Markov chain we have the following theorem.

Theorem 21 (Regularity of stochastic matrix). If a stochastic matrix P = [p_ij] is regular then

1. 1 is an eigenvalue of multiplicity one, and all other eigenvalues λ_i satisfy |λ_i| < 1;

2. P is ergodic, that is, lim_{m→∞} P^m = L exists. Furthermore, L's rows are identical and equal to the stationary distribution p^*.

Proof. If (1) is proved then, by Theorem 20, P = [p_ij] is ergodic. Hence, when P = [p_ij] is regular, the limit matrix L = lim_{m→∞} P^m does exist. By the Spectral Decomposition (7.1),

$P = E_1 + \lambda_2 E_2 + \cdots + \lambda_k E_k, \quad \text{where all } |\lambda_i| < 1,\ i = 2, \cdots, k.$

Then, by (7.2), L = lim_{m→∞} P^m = lim_{m→∞}(E_1 + λ_2^m E_2 + ··· + λ_k^m E_k) = E_1.

Let the vector p^* be the unique left eigenvector associated with the biggest eigenvalue λ_1 = 1 (which is a simple eigenvalue since it has multiplicity one), that is, p^* P = p^* ⇔ p^*(P − I) = 0 (p^* is called a stationary distribution of the MC). Your final task is proving that L's rows are identical and equal to the stationary distribution p^*, i.e. every row of L equals p^*.


Corollary 22. A few important remarks are: (a) for a regular MC, the long-term behavior does not depend on the initial state distribution probabilities p(0); (b) in general, the limiting distributions are influenced by the initial distributions p(0) whenever the stochastic matrix P = [p_ij] is ergodic but not regular. (See more at problem D.)

Example 13. Consider a Markov chain with two states and transition probability matrix

$P = \begin{pmatrix} 3/4 & 1/4 \\ 1/2 & 1/2 \end{pmatrix}.$

(a) Find the stationary distribution p̂ of the chain.
(b) Find lim_{n→∞} P^n by first evaluating P^n.
(c) Find lim_{n→∞} P^n.

7.2 Appendix B: Spectral Theorem forDiagonalizable Matrices

Consider a square matrix P of order s with spectrum σ(P) = {λ_1, λ_2, ..., λ_k} consisting of its eigenvalues. A few basic facts are:

• If {(λ_1, x_1), (λ_2, x_2), ..., (λ_k, x_k)} are eigenpairs for P, then S = {x_1, ..., x_k} is a linearly independent set. If B_i is a basis for the null space N(P − λ_i I), then B = B_1 ∪ B_2 ∪ ... ∪ B_k is a linearly independent set.

• P is diagonalizable if and only if P possesses a complete set of eigenvectors (i.e. a set of s linearly independent eigenvectors). Moreover, H^{-1} P H = D = Diagmat(λ_1, λ_2, ..., λ_s) if and only if the columns of H constitute a complete set of eigenvectors and the λ_j's are the associated eigenvalues, i.e., each (λ_j, H[∗, j]) is an eigenpair for P.

Spectral Theorem for Diagonalizable Matrices. A square matrix P of order s with spectrum σ(P) = {λ_1, λ_2, ..., λ_k} consisting of eigenvalues is diagonalizable if and only if there exist constituent matrices {E_1, E_2, ..., E_k} (called the spectral set) such that

$P = \lambda_1 E_1 + \lambda_2 E_2 + \cdots + \lambda_k E_k, \qquad (7.1)$

where the E_i's have the following properties:

• E_i · E_j = 0 whenever i ≠ j, and E_i^2 = E_i for all i = 1..k


• E 1 + E 2 + · · · + E k = I 

In practice we employ the decomposition (7.1) in two ways.

Way 1: if we know the decomposition (7.1) explicitly, then we can compute powers

$P^m = \lambda_1^m E_1 + \lambda_2^m E_2 + \cdots + \lambda_k^m E_k, \quad \text{for any integer } m > 0. \qquad (7.2)$

Way 2: if we know P is diagonalizable then we find the constituent matrices E_i by:

* finding the nonsingular matrix H = (x_1 | x_2 | · · · | x_k), where each x_i is a basis (column) eigenvector from the null space

$N(P - \lambda_i I) = \{v : (P - \lambda_i I)v = 0 \iff P v = \lambda_i v\};$

** then, P = H D H^{-1} = (x_1 | x_2 | · · · | x_k) · D · H^{-1}, where D = diag(λ_1, ..., λ_k) is the diagonal matrix, and

$H^{-1} = K = \begin{pmatrix} y_1^t \\ y_2^t \\ \vdots \\ y_k^t \end{pmatrix}.$

Here each y_i^t is a basis left (row) eigenvector, i.e. a solution of v P = λ_i v.

The constituent matrices are E_i = x_i · y_i^t.

Example 14. Diagonalize the following matrix and provide its spectral decomposition:

$P = \begin{pmatrix} 1 & -4 & -4 \\ 8 & -11 & -8 \\ -8 & 8 & 5 \end{pmatrix}.$

The characteristic equation is p(λ) = det(λI − P) = λ^3 + 5λ^2 + 3λ − 9 = 0. So λ = 1 is a simple eigenvalue, and λ = −3 is repeated twice (its algebraic multiplicity is 2). Any set of vectors x satisfying x ∈ N(P − λI) ⇔ (P − λI)x = 0 can be taken as a basis of the eigenspace (null space) N(P − λI). Bases for the eigenspaces are:

$N(P - 1I) = \mathrm{span}\{[1, 2, -2]^t\}; \quad \text{and} \quad N(P + 3I) = \mathrm{span}\{[1, 1, 0]^t, [1, 0, 1]^t\}.$

It is easy to check that these three eigenvectors x_i form a linearly independent set, so P is diagonalizable. The nonsingular matrix (also called the similarity transformation matrix)


$H = (x_1 \mid x_2 \mid x_3) = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}$

will diagonalize P, and since P = H D H^{-1} we have

$H^{-1} P H = D = \mathrm{Diagmat}(\lambda_1, \lambda_2, \lambda_3) = \mathrm{Diagmat}(1, -3, -3) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & -3 \end{pmatrix}.$

Here,

$H^{-1} = \begin{pmatrix} 1 & -1 & -1 \\ -2 & 3 & 2 \\ 2 & -2 & -1 \end{pmatrix}$

implies that $y_1^t = [1, -1, -1]$, $y_2^t = [-2, 3, 2]$, $y_3^t = [2, -2, -1]$. Therefore, the constituent matrices are

$E_1 = x_1 \cdot y_1^t = \begin{pmatrix} 1 & -1 & -1 \\ 2 & -2 & -2 \\ -2 & 2 & 2 \end{pmatrix}; \quad E_2 = x_2 \cdot y_2^t = \begin{pmatrix} -2 & 3 & 2 \\ -2 & 3 & 2 \\ 0 & 0 & 0 \end{pmatrix}; \quad E_3 = x_3 \cdot y_3^t = \begin{pmatrix} 2 & -2 & -1 \\ 0 & 0 & 0 \\ 2 & -2 & -1 \end{pmatrix}.$

Obviously,

$P = \lambda_1 E_1 + \lambda_2 E_2 + \lambda_3 E_3 = \begin{pmatrix} 1 & -4 & -4 \\ 8 & -11 & -8 \\ -8 & 8 & 5 \end{pmatrix}.$
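The decomposition in Example 14 can also be verified numerically. The following minimal Python/NumPy sketch re-implements the computation above: it forms H, recovers D and the constituent matrices, and checks that the spectral sum reproduces P.

import numpy as np

P = np.array([[ 1,  -4, -4],
              [ 8, -11, -8],
              [-8,   8,  5]], dtype=float)

H = np.array([[ 1, 1, 1],
              [ 2, 1, 0],
              [-2, 0, 1]], dtype=float)      # columns x1, x2, x3
Hinv = np.linalg.inv(H)                      # rows y1^t, y2^t, y3^t

D = Hinv @ P @ H
print(np.round(D, 10))                       # diag(1, -3, -3)

# Constituent matrices E_i = x_i * y_i^t and the spectral decomposition
E = [np.outer(H[:, i], Hinv[i, :]) for i in range(3)]
lam = np.diag(D)
print(np.allclose(P, sum(l * Ei for l, Ei in zip(lam, E))))   # True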


Bibliography

[1] Arjeh M. Cohen, Computer algebra in industry: Problem Solving in Practice, Wiley, 1993

[2] Nguyen, V. M. Man and the DAG group at Eindhoven University of Technology, www.mathdox.org/nguyen , 2005,

[3] Nguyen, V. M. Man Computer-Algebraic Methods for the Construction of Designs of Experiments, Ph.D. thesis, 2005, Technische UniversiteitEindhoven, www.mathdox.org/nguyen

[4] Nguyen, Van Minh Man, Depart. of Computer Science, Faculty of CSE,HCMUT, Vietnam, www.cse.hcmut.edu.vn/ mnguyen

[5] Brouwer E. Andries, Cohen M. Arjeh and Nguyen, V. M. Man,

Orthogonal arrays of strength 3 and small run sizes, www.cse.hcmut.edu.vn/ mnguyen/OrthogonalArray-strength3.pdf, Journal of Statistical Planning and Inference, 136 (2007)

[6] Nguyen, V. M. Man, Constructions of strength 3 mixed orthogonal arrays,www.cse.hcmut.edu.vn/ mnguyen/Specific-Constructions-OAs.pdf,Journal of Statistical Planning and Inference 138- Jan 2008,

[7] Eric D. Schoen and Nguyen, V. M. Man, Enumeration and Classification of Orthogonal Arrays, Faculty of Applied Economics,

University of Antwerp, Belgium (2007)

[8] Huynh, V. Linh and Nguyen, V. M. Man, Discrete Event Modeling in Optimization for Project Management , B.E. thesis, HCMUT, 69 pages,2008.

[9] T. Beth, D. Jungnickel and H. Lenz, Design Theory, vol. II, pp. 880, Encyclopedia of Mathematics and Its Applications, Cambridge University Press (1999)


[10] Glonek G.F.V. and Solomon P.J., Factorial and time course designs for 

cDNA microarray experiments, Biostatistics 5, 89-111, 2004

[11] N. J. A. Sloane, A Library of Orthogonal Arrayshttp://www.research.att.com/ njas/oadir/index.html/,

[12] Warren Kuhfeld,http://support.sas.com/techsup/technote/ts723.html/

[13] Hedayat, A. S. and Sloane, N. J. A. and Stufken, J., Orthogonal Arrays, Springer-Verlag, 1999

[14] Madhav, S. P., iSixSigma LLC, Design Of Experiments For Software Testing, isixsigma.com/library/content/c030106a.asp, 2004

[15] Bernd Sturmfels, Solving Polynomial Systems, AMS, 2002

[16] OpenModelica project, Sweden 2006,www.ida.liu.se/ pelab/modelica/OpenModelica.html 

[17] Computer Algebra System for polynomial computations, Germany2006 www.singular.uni-kl.de/ 

[18] Sudhir Gupta. Balanced Factorial Designs for cDNA Microarray Experiments, Communications in Statistics: Theory andMethods, Volume 35, Number 8 (2006) , pp. 1469-1476

[19] Morris W. Hirsch, Stephen Smale, Differential Equations, Dynamical Systems and Linear Algebra, 1980

[20] James Thompson, Simulation, A Modeler's Approach, Wiley, 2000

[21] David Insua, Jesus Palomo, Simulation in Industrial Statistics, SAMSI,2005

[22] Ruth J. Williams, Introduction to the Mathematics of Finance, AMSvol 72, 2006

[23] C. S. Tapiero, Risk and Financial Management- Mathematical Methods, Wiley, 2004

[24] A.K. Basu, Introduction to Stochastic Processes, Alpha Science 2005

[25] L. Kleinrock, Queueing Systems, vol 2, John Wiley & Sons, 1976


[26] L. Kleinrock, Time-shared systems: A theoretical treatment , Journal of 

the ACM 14 (2), 1967, 242-261.

[27] S. G. Gilmour, Fundamentals of Statistics I , Lecture Notes School of Mathematical Sciences Queen Mary, University of London, 2006

[28] M. Parlar, Interactive Operations Research with Maple, Birkhouser2000.

[29] Tim Holliday, Pistone, Riccomagno, Wynn, The Application of Computational Algebraic Geometry to the Analysis of Design of Experiments: A Case Study.

Copyright 2010 by

Lecturer Nguyen V. M. Man, Ph.D. in Statistics

Working area Algebraic Statistics, Experimental Designs,

Statistical Optimization  and Operations Research 

Institution University of Technology of HCMC

Address 268 Ly Thuong Kiet, Dist. 10, HCMC, Vietnam

Ehome: www.cse.hcmut.edu.vn/~mnguyen

Email: [email protected]

 [email protected]