PROBABILITY: MODELS & CONCEPTS

Chance, Consequence & Strategy:

Likelihood or Probability

Since there is little in life that occurs with absolute certainty, probability theory has found application in virtually every field of human endeavor.

Why Probability Theory?

As we observe the universe about us, wonderful Craftsmanship can be seen.

As we examine the elements of this creation we discover that there is incredible order, but also variation therein.

Probability theory seeks to describe the variation or randomness within order so that underlying order may be better understood.

Once understood, strategies can be more effectively formulated and their risks evaluated.

Objective Assessment: A Priori & A Posteriori Probability

A priori means “before the fact”; probability assessments of this sort typically rely on a study of the traits of the phenomenon under consideration. Based on theory.

A posteriori means “after the fact”. This approach to likelihood assessment is also called the “relative frequency” approach. Based on repeated observation.

Likelihood Concepts: Events

• As we observe a phenomenon, we generally note that varying, and sometimes “identical”, conditions do not always give rise to identical results. As a phenomenon is repeatedly observed, the various possible results can be thought of as “events”.

Mutually Exclusive Events

• Any number of events are said to be mutually exclusive if they have no overlap or commonality.

“Nothing is impossible Mario; improbable, unlikely maybe, but not impossible.” Luigi Mario speaking to brother, Mario Mario in the movie, “Super Mario Bros.”

• A collection of events is exhaustive if, taken in totality, they account for all possible results or outcomes.

[Venn diagram: two non-overlapping regions A and B. A and B are mutually exclusive.]

Mutually Exclusive & Exhaustive Events

Intersection & Union of Events

The intersection of two or more events is like the intersection of two streets --- it is the property they share in-common.

The intersection of events A and B is symbolized by A∩B, often written simply as AB.

The union of two or more events is the totality of results captured by these events. The union of two events A and B is symbolized by A∪B.

Notation & Definitions

The probability of the event A is given by: P(A)

The probability of the intersection AB is P(AB) = P(A) + P(B) − P(A∪B), where P(A∪B) is the probability of the union of events A and B.

The conditional probability of the event A given that the event B has occurred is: P(A|B) = P(AB)/P(B)

DEPENDENCE & INDEPENDENCE

Two events A and B are said to be independent if and only if: P(A|B) = P(A) and P(B|A) = P(B)

It follows from this that if A and B are independent then P(AB) = P(A)*P(B)

This is the multiplication rule for independent events.

A Service Sector Example: Fast Food Clientele

A leading fast food restaurant chain routinely and randomly surveys its customers in an effort to continually improve its ability to serve its clientele. Two primary questions on the survey address frequency of customer patronage and primary reason for this patronage. Results of last month’s survey of 1,000 customers are recorded in the following table.

Survey of 1000 Customers: Frequency of and Reason for Patronage

                        occasional   moderate   frequent   TOTALS
  menu/food                  60         120         30       210
  customer relations         75         180         45       300
  value/cost                 35         200         40       275
  location/access            60          80         25       165
  other reason               20          20         10        50
  TOTALS                    250         600        150      1000

Marginal Probability

• Marginal probabilities can be thought of as the probabilities of being in the various margins of the table. For example, the marginal probability of a customer patronizing the restaurant chain due to menu, regardless of frequency of patronage, is:

• P(menu) = 210/1000 = .21

• The various marginal probabilities for this example are determined and represented graphically as follows. The graphs are “marginal probability distributions”.
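As a quick illustrative sketch, the marginal probabilities can be recovered from the table by summing over the other trait and dividing by the 1,000 surveyed customers (Python with NumPy assumed):

  import numpy as np

  # Rows: primary reason for patronage; columns: occasional, moderate, frequent.
  counts = np.array([
      [60, 120, 30],   # menu/food
      [75, 180, 45],   # customer relations
      [35, 200, 40],   # value/cost
      [60,  80, 25],   # location/access
      [20,  20, 10],   # other reason
  ])
  total = counts.sum()                      # 1000 surveyed customers

  # Marginal probabilities: sum over the other trait, then divide by the total.
  p_reason = counts.sum(axis=1) / total     # [.21, .30, .275, .165, .05]
  p_frequency = counts.sum(axis=0) / total  # [.25, .60, .15]

  print("P(menu) =", p_reason[0])           # 0.21, as computed above
  print("P(frequent) =", p_frequency[2])    # 0.15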

Frequency of Patronage: Marginal Probability Distribution

• Occasional Patrons: P(occasional) = 250/1000 = .25
• Moderate Patronage: P(moderate) = 600/1000 = .60
• Frequent Patrons: P(frequent) = 150/1000 = .15

[Bar chart: marginal probability distribution of patronage frequency (occasional, moderate, frequent).]

Reasons for Patronage: Marginal Probability Distribution

• Menu: P(Menu) = 210/1000 = .21
• C. Rel.: P(CR) = 300/1000 = .30
• Value: P(Value) = 275/1000 = .275
• Location: P(Loc) = 165/1000 = .165
• Other: P(Other) = 50/1000 = .05

[Bar chart: marginal probability distribution of reasons for patronage (menu, cust. rel., value, location, other).]

Joint Probability

• Consider the cross-tabulation relating the two traits:
  – frequency of patronage, and
  – primary reason for patronage

• Joint probabilities are probabilities of intersections of the categories (or events) of two traits. As an example, the joint probability that a customer is moderate in their patronage and their primary reason for patronage is the menu is given by
  – P(moderate ∩ menu) = P(AB) = 120/1000 = .12.

• A graphical representation of the complete joint probability distribution follows.

Reasons & Frequency of Patronage: Joint Probabilities

[Bar chart: joint probability distribution of reason (menu, cust. rel., value, location, other) by frequency of patronage (occasional, moderate, frequent).]

Conditional Probability

Conditional probability can be thought of as probability determined in the mode of either “what if” or “given that”.

For example, we might ask, “what is the probability that a customer’s primary reason for patronage is the value (A), given that the customer is frequent (B) in their patronage?”

This is symbolized by P(A|B) and is calculated as P(AB)/P(B) where the vertical line, “|” is read as “given that”.

Thus a “conditional probability” is equal to the probability of the appropriate intersection, divided by the marginal probability of the given.

Reasons for Frequent Patrons: Conditional Probabilities

[Bar chart: conditional probability distribution of reasons (menu, cust. rel., value, location, other) for frequent patrons.]

• P(value | frequent) = P(value ∩ frequent)/P(frequent) = (40/1000) / (150/1000) = .04/.15 = .267
• This is represented by the “red” bar above. The entire “conditional probability distribution of reasons for patronage by frequent customers” is displayed above.
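A short sketch of the same arithmetic in Python (the counts come straight from the survey table; nothing beyond the standard library is needed):

  # Joint probability: cell count over the grand total of 1,000 customers.
  p_value_and_frequent = 40 / 1000        # P(value ∩ frequent)
  p_frequent = 150 / 1000                 # marginal P(frequent)

  # Conditional probability: the joint divided by the marginal of the "given".
  p_value_given_frequent = p_value_and_frequent / p_frequent
  print(round(p_value_given_frequent, 3))   # 0.267

  # The entire conditional distribution of reasons, restricted to frequent patrons:
  frequent_counts = {"menu": 30, "cust. rel.": 45, "value": 40, "location": 25, "other": 10}
  n_frequent = sum(frequent_counts.values())              # 150 frequent patrons
  for reason, count in frequent_counts.items():
      print(reason, round(count / n_frequent, 3))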

Independence & Dependence

Recall that two events, A & B, are independent if and only if P(A|B) = P(A) and P(B|A) = P(B).

Are the events A & B independent where:
A = primary patronage reason is customer relations
B = customer is frequent in patronage

Recall that P(A) = .3, that P(B) = .15, and that P(AB) = 45/1000 = .045, so that
P(A|B) = P(AB)/P(B) = .045/.15 = .30 = P(A)
P(B|A) = P(AB)/P(A) = .045/.30 = .15 = P(B)
Indeed, A & B are independent.

Independence - Key Concept

If two events, A & B, are independent then the occurrence of one of the two events does not change the LIKELIHOOD or probability that the other of the two events will occur.

Occurrence of one of the two events does, however, alter the MANNER in which the other of the two events may occur.

Dependence

If two events A & B are dependent then P(A|B) will not equal P(A) and, similarly, P(B|A) will not equal P(B).

Let A = primary reason for patronage is menu. Let B = frequency of patronage is moderate.

We have P(A|B) = 120/600 = .20, which is not equal to P(A) = 210/1000 = .21.
P(B|A) = 120/210 = .57, which is not equal to P(B) = 600/1000 = .60.

In this case, even though the values are comparable, they are not equal => dependence.
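Both checks can be verified numerically; a minimal sketch using the survey figures (plain Python, with math.isclose used to avoid floating-point surprises):

  import math

  # Independence check: A = customer relations, B = frequent patronage.
  p_A, p_B, p_AB = 300/1000, 150/1000, 45/1000
  print(math.isclose(p_AB / p_B, p_A))     # True: P(A|B) = P(A)
  print(math.isclose(p_AB / p_A, p_B))     # True: P(B|A) = P(B) -> independent

  # Dependence check: A = menu, B = moderate patronage.
  p_A, p_B, p_AB = 210/1000, 600/1000, 120/1000
  print(round(p_AB / p_B, 2), "vs", p_A)   # 0.2 vs 0.21 -> unequal, so dependent
  print(round(p_AB / p_A, 2), "vs", p_B)   # 0.57 vs 0.6 -> unequal, so dependent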

Dependence - Key Concept

If two events A & B are dependent, then occurrence of one of the two events will alter the likelihood and the manner in which the other of the two events may occur.

In the case of mutually exclusive events, occurrence of one of the two events will preclude occurrence of the other event.

Mutually exclusive events are always dependent.

Probability Models

Probability models are mathematical descriptions of the behavior of one or more variables. The ability to somewhat anticipate the behavior of a variable can be useful in risk assessment and strategy formulation.

Three commonly used models, the binomial, Poisson, and normal models, are introduced.

Random variables described by these models may be either ‘discrete’ or ‘continuous’.

Mean, Variance and Standard Deviation of a Random Variable

The mean of a random variable (r.v.) Y is denoted by μ_Y. For a discrete r.v. Y this is calculated as: μ_Y = Σ y_i·P(y_i). This is the weighted average of the values of Y. For continuous random variables, integration replaces summation.

The variance and standard deviation of the r.v. Y are represented by σ²_Y and σ_Y, respectively. For a discrete r.v. Y, these are:

σ²_Y = Σ P(y_i)·(y_i − μ_Y)²  and  σ_Y = √σ²_Y
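For a concrete (hypothetical) discrete random variable, these formulas translate directly into a few lines of Python:

  import math

  # A small hypothetical discrete random variable Y: its values and probabilities.
  values = [0, 1, 2, 3]
  probs  = [0.1, 0.4, 0.3, 0.2]          # must sum to 1

  mu = sum(y * p for y, p in zip(values, probs))                # weighted average of Y
  var = sum(p * (y - mu) ** 2 for y, p in zip(values, probs))   # variance of Y
  sigma = math.sqrt(var)                                        # standard deviation of Y

  print(mu, var, round(sigma, 4))        # 1.6  0.84  0.9165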

The Poisson Model

Napoleon had a problem: many of his men were killed when kicked in the head by their own horse or mule.

Napoleon had to plan for this problem.

The Poisson model helped him to do so.

Poisson Conditions

The Poisson model (or distribution) is commonly applicable when:

We are modeling events which occur only “rarely”, where “rare” means “rare relative to the opportunity for occurrence”.

Our random variable will be the “number of occurrences of the event over the region of opportunity for occurrence”.

Poisson Conditions: Region of Opportunity

Examples of region of opportunity include:
• number of customers arriving per minute (or any other time unit);
• number of phone calls arriving at a switchboard per unit time;
• number of scars on the surface of a compact disk.

Generally, a “region of opportunity” is defined either temporally or spatially.

The Poisson Model is Integral to the Study of Queueing Theory

The Poisson Model

• Defining our random variable as Y = “number of occurrences of the event over the region of opportunity”, y = 0, 1, 2, 3, ..., we have the Poisson probability model:

• P(y) = μ^y · e^(−μ) / y!  for y = 0, 1, 2, 3, ...

• where μ is the mean or average number of occurrences of the event over the region of opportunity and e = 2.7183 is the natural base.

Estimation of the Process Mean, μ

• The mean of the Poisson process is μ.
• The variance of the process is also μ; that is, σ² = μ,
• so that the standard deviation is σ = √μ.
• In the following example we proceed as though μ is of known value. When this is not the case we simply estimate μ with X̄, the mean of the sample.

First Federal Bank of Centreville: A Queueing Example

First Federal Bank (FFB) of Centreville has an automatic teller machine (ATM) near the entrance of the bank. Long lines at the ATM have sometimes led to congestion and perhaps a diminishing clientele. With a view toward improved customer service, FFB is considering the addition of one or more ATMs or, possibly, relocation of the current ATM.

During peak hours ATM users arrive in a manner described by a Poisson distribution with a mean of 1.7 customers per minute.

First Federal Bank of Centreville

The Probability Distribution

• What is the probability that no customers arrive in one minute during a peak business period?
• Solution: P(0) = 1.7⁰·e^(−1.7)/0! = .1827
• Similarly, P(1) = 1.7¹·e^(−1.7)/1! = .3106
• Determine probabilities for 2, 3, ..., 9 customers. The probability distribution appears on the next slide.

FFB of Centreville: Probability Distribution

Poisson probabilities and cumulative probabilities with μ = 1.7:

   x    P(X = x)    P(X ≤ x)
   0     0.1827      0.1827
   1     0.3106      0.4932
   2     0.2640      0.7572
   3     0.1496      0.9068
   4     0.0636      0.9704
   5     0.0216      0.9920
   6     0.0061      0.9981
   7     0.0015      0.9996
   8     0.0003      0.9999
   9     0.0001      1.0000
  10     0.0000
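The table can be reproduced directly from the Poisson model; a sketch assuming SciPy is available:

  from scipy.stats import poisson

  mu = 1.7                                # mean arrivals per minute during peak hours
  for x in range(11):
      p = poisson.pmf(x, mu)              # P(X = x) = mu**x * exp(-mu) / x!
      F = poisson.cdf(x, mu)              # P(X <= x), the cumulative probability
      print(f"{x:2d}  {p:.4f}  {F:.4f}")

  # e.g. P(0) = 0.1827 and P(1) = 0.3106, matching the slide calculations.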

First Federal Bank of Centreville: ATM Customer Probabilities

[Bar chart: Poisson probabilities P(X = x) for x = 0 through 9 arrivals per minute, μ = 1.7.]

First Federal Bank of Centreville: CDF Graph

[Step chart: cumulative Poisson probabilities P(X ≤ x) for x = 0 through 9, μ = 1.7.]

The cdf graph above was constructed by adding the appropriate Poisson probabilities.

First Federal Bank of Centreville: Key Considerations

Key factors that FFB should address prior to making a decision include:

• What is the service rate (how quickly do customers complete their ATM transactions)?
• If long lines are forming during peak hours, the service rate may be less than the customer arrival rate and addition of one or more ATMs may be necessary.
• If the problem is congestion, rather than excessive wait to use the ATM, the solution may be to simply move the ATM.

Model Adequacy: Chi-Square Goodness of Fit Testing

DOES THIS MODEL FIT? Chi-Square Goodness-of-Fit Tests

The purpose of χ² goodness-of-fit tests is to evaluate whether a particular probability distribution does an adequate job of modeling the behavior of the process under consideration. This sort of test can be applied to any model.

A “skeleton” or template for the chi-square goodness-of-fit test follows.

χ² Goodness of Fit Test - General Layout

1) H0: p1 = p10, p2 = p20, ... , pk = pk0
   HA: at least one pi ≠ pi0

2) n = _______   α = _______

3) DR: Reject H0 in favor of HA iff χ²calc > χ²crit = ___. Otherwise, FTR H0.

4) χ²calc = Σ (Oi − n·pi0)²/(n·pi0) = Σ (Oi − Ei)²/Ei

5) Interpretation: Should relate to whether the hypothesized model adequately describes behavior of the process under consideration.

Generic Example: A computer manufacturer produces a disk drive which has three major causes of failure (A, B, C) and a variety of minor failure causes (D).

Suppose that historic failure rates are:
Due to A: .20   Due to B: .35   Due to C: .30   Due to D: .15

The manufacturer has worked on A, B, and C and believes that failures due to these causes have been reduced, so that, while fewer failures will occur, it is more likely that when one occurs, it will be due to D. To examine this claim the manufacturer will sample 200 failed disk drives manufactured since the process changes were made. IF THE CHANGES HAD NO IMPACT then the number of these failed drives due to causes A, B, C, and D that would be EXPECTED would be:
EA = n·pA0 = 200(.20) = 40   EB = n·pB0 = 200(.35) = 70
EC = n·pC0 = 200(.30) = 60   ED = n·pD0 = 200(.15) = 30

Upon observation, suppose that we had OA = 28, OB = 66, OC = 46, OD = 60. Test the appropriate hypothesis at the α = .05 level.

(Continued on the next slide.)

Failure Mode Profile Example - Continued

1) H0: pA = .20, pB = .35, pC = .30, pD = .15
   HA: at least one pi ≠ pi0 for i = A, B, C, D

2) n = 200   α = .05

3) DR: Reject H0 in favor of HA iff χ²calc > χ²crit = 7.8147. Otherwise, FTR H0. Note: There are (k−1) = 3 degrees of freedom.

4) χ²calc = Σ (Oi − n·pi0)²/(n·pi0) = Σ (Oi − Ei)²/Ei
   = (28−40)²/40 + (66−70)²/70 + (46−60)²/60 + (60−30)²/30
   = 3.6000 + 0.2286 + 3.2667 + 30.0000 = 37.0953

5) Interpretation: Since χ²calc exceeds χ²crit, we can conclude that the historic failure mode distribution no longer applies (reject H0 in favor of HA). So how has the distribution changed? The answer is embedded in the individual category contributions to χ²calc ... larger contributions indicate where the changes have occurred: reductions in A and C, no obvious change in B, and the various failures that make up D now comprise a (proportionally) larger share of the failures.
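A sketch of the same test in Python, assuming SciPy is available (scipy.stats.chisquare reproduces step 4, and scipy.stats.chi2 supplies the critical value):

  from scipy.stats import chisquare, chi2

  observed = [28, 66, 46, 60]               # failed drives by cause A, B, C, D
  p0 = [0.20, 0.35, 0.30, 0.15]             # historic failure-mode proportions
  expected = [200 * p for p in p0]          # 40, 70, 60, 30

  stat, p_value = chisquare(observed, f_exp=expected)
  critical = chi2.ppf(0.95, df=len(p0) - 1) # alpha = .05 with (k-1) = 3 df

  print(round(stat, 4), round(critical, 4)) # ~37.095 and 7.8147
  print("Reject H0" if stat > critical else "FTR H0")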

Chi-Square Goodness of Fit Test for the Poisson Distribution

A sample of 120 minutes selected during rush periods at FFB gave the following number of customers arriving during each of those 120 minutes. Is this data consistent with a Poisson distribution with a mean of 1.7 customers per minute, as previously stated? Test the appropriate hypothesis at the α = .10 level of significance.

  Number of customers:   0    1    2    3    4 or more
  Frequency:            25   42   35    9    9

FFB of Centreville: Poisson Goodness of Fit Test

  Customers/minute   Prob.    Obs (O)   Exp (E)   (O−E)²/E
         0           0.1827      25      21.924     0.4316
         1           0.3106      42      37.272     0.5998
         2           0.2640      35      31.680     0.3479
         3           0.1496       9      17.952     4.4640
        ≥4           0.0932       9      11.184     0.4265
                     1.00        120     120        6.2698 = χ²calc

With α = .10 and (k−1) = 4 df, the critical value is 7.7794.

FFB of Centreville - Continued

1) H0: the number of customers arriving per minute is Poisson distributed with a mean of 1.7, OR p(0) = .1827, p(1) = .3106, p(2) = .2640, p(3) = .1496, p(4+) = .0932
   HA: the number of customers arriving per minute is not Poisson with μ = 1.7

2) n = 120 and α = .10

3) DR: Reject H0 in favor of HA iff χ²calc > χ²crit = 7.7794. Otherwise, FTR H0. (NOTE - THERE ARE 4 DF)

4) χ²calc = 6.2698 (calculations on the previous slide)

5) FTR H0. In this case, the number of customers arriving per minute during the business rush at FFB of Centreville is reasonably well-modeled by a Poisson distribution with a mean of 1.7.

As a modification --- if we had not had information about the mean number of customers arriving per minute, we would have had to estimate this value with the sample mean and then determine the estimated probabilities. This would have cost an additional degree of freedom (e.g. df = (k−1) − 1 = 3).
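A sketch of this Poisson goodness-of-fit calculation in Python (SciPy assumed); the last category lumps together four or more arrivals:

  from scipy.stats import poisson, chi2

  observed = [25, 42, 35, 9, 9]                       # arrivals per minute: 0, 1, 2, 3, 4+
  mu, n = 1.7, sum(observed)                          # hypothesized mean; n = 120 minutes

  # Category probabilities under the Poisson(1.7) model; the last cell is the upper tail.
  probs = [poisson.pmf(k, mu) for k in range(4)]
  probs.append(1 - sum(probs))
  expected = [n * p for p in probs]

  chi2_calc = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
  chi2_crit = chi2.ppf(0.90, df=len(observed) - 1)    # alpha = .10, 4 df (mu was NOT estimated)

  print(round(chi2_calc, 4), round(chi2_crit, 4))     # ~6.27 and 7.7794 -> FTR H0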

Binomial Conditions

• Suppose that there are two possible outcomes to an experiment which are mutually exclusive and exhaustive (refer to these generically as “success” and “failure”);
• a predetermined sample size, n;
• the probability of “success” is a constant, p, and the probability of “failure” is a constant, (1−p);
• the condition of one item is not influenced by the condition of any other item (this is called independence).

Collectively, these are the binomial conditions.

Binomial Probability Model

• When the binomial conditions are present, and the random variable Y is defined as the number of “successes” out of n items sampled, then the model which determines probabilities for the various values of Y is given by:

• P(Y = y) = nCy · p^y · (1−p)^(n−y)

• where nCy = n!/[y!(n−y)!] is read as the number of combinations of n things selected y-at-a-time,
• with x! = x(x−1)(x−2)...(1) for any positive integer x,
• so that, for example, 5! = 5(4)(3)(2)(1) = 120.
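The formula can be coded directly; a minimal sketch using only the Python standard library (math.comb supplies nCy):

  from math import comb

  def binom_pmf(y, n, p):
      """P(Y = y) for a binomial r.v.: nCy * p**y * (1-p)**(n-y)."""
      return comb(n, y) * p ** y * (1 - p) ** (n - y)

  print(comb(5, 2))                 # 10 ways to choose 2 items from 5
  print(binom_pmf(2, 6, 0.2))       # 0.24576; this value reappears in the store example below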

Binomial Mean, Variance and Standard Deviation

• Although the formulas previously presented can be used to determine the values of μ_Y, σ²_Y and σ_Y, the following results are more easily applied in the binomial case:

μ_Y = np
σ²_Y = np(1−p)
σ_Y = √(np(1−p))

Estimation of p

The binomial parameter, p, is thought of as the “probability that any single item sampled is identified as a ‘success’ ”.

Frequently this value will be unknown and will need to be estimated from sample information.

p is estimated as simply x/n, where x is the number of ‘successes’ in the sample of n items. This estimate is often denoted by p̂. Similarly, the estimate of (1−p) is (1−p̂).

The Electronix Store

In a competitive local retail electronics market, the probability that a randomly selected “customer” browsing in The Electronix Store will make a purchase is .2.

If 6 “customers” are randomly selected, what is the probability that exactly 2 of these individuals will make a purchase?

This and similar questions can be addressed via the binomial distribution.

The Electronix Store

We identify:
n = 6 customers
p = .2 = probability that a customer buys
Y = number of the six customers who buy

Thus we see that:
μ_Y = np = 6(.2) = 1.2 customers
σ²_Y = np(1−p) = 6(.2)(.8) = .96
σ_Y = √.96 = .98 customers

The Electronix Store

• We have:
– P(0) = .2621   P(1) = .3932
– P(2) = .2458   P(3) = .0819
– P(4) = .0154 = {6!/[4!2!]}(.2)⁴(.8)² = 15(.0016)(.64)
– P(5) = .0015   P(6) = .0001 (or .000064)

[Bar chart: The Electronix Store customer purchase probabilities, P(Y = y) for y = 0 through 6.]

The Electronix Store

We may require answers to such questions as:
• “What is the probability that no more than two of six customers make a purchase?”
• “What is the probability that at least four of six customers make a purchase?”
• “How many cash registers are needed?”

Answers to these and similar questions can be investigated through study such as we have undertaken.

The Electronix Store: Cumulative Probabilities

[Step chart: cumulative purchase probabilities P(Y ≤ y) for y = 0 through 6.]

• A tabulation of the “less than or equal to” probabilities is called a “cumulative distribution function” or cdf. The Electronix Store cdf appears above.
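The first two questions above are cumulative-probability questions, so the cdf answers them directly; a sketch assuming SciPy:

  from scipy.stats import binom

  n, p = 6, 0.2                               # six browsing customers, purchase probability .2

  # "No more than two of six make a purchase" is a cdf value.
  print(round(binom.cdf(2, n, p), 4))         # 0.9011 = P(0) + P(1) + P(2)

  # "At least four of six make a purchase" is the complement of P(Y <= 3).
  print(round(1 - binom.cdf(3, n, p), 4))     # ~0.017

  # The full cdf, as tabulated and plotted above.
  for y in range(n + 1):
      print(y, round(binom.cdf(y, n, p), 4))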

The Electronix Store: Strategy

Application of this information might spark discussion on:
• staffing decisions,
• sales representative specialization,
• focus on merchandise, value, and customer service.

χ² Goodness-of-Fit Test: Binomial Example

Oil & gas exploration is both expensive and risky. The average cost of a “dry hole” is in excess of $20 million. New technologies are always under development in an effort to reduce the likelihood of drilling a “dry hole”, with the result being increased profitability. Suppose an experimental technology has been developed that claims to have an 80% success rate (e.g. only 20% dry holes). This technology was tested by drilling four holes and counting the number of productive wells; this was done 100 times. The data is recorded below:

  Number of productive wells:   0    1    2    3    4
  Observed frequency:           3    6   22   41   28

Test the appropriate hypothesis at the α = .01 level of significance.

Oil & Gas Exploration Example

1) H0: the new technology delivers success according to a binomial distribution with p = .8, or ... p(0 or 1) = .0272, p(2) = .1536, p(3) = .4096, p(4) = .4096 (NOTE - SEE THE NEXT SLIDE FOR THESE VALUES)
   HA: the new technology does not deliver success according to a binomial distribution with p = .8.

2) n = 100 and α = .01

3) DR: Reject H0 in favor of HA iff χ²calc > χ²crit = 11.3449. Otherwise, FTR H0.

4) χ²calc = 21.4705 (calculations on the next slide)

5) Reject H0 in favor of HA. In this case, note that “O” tends to be greater than “E” for lower numbers of successful wells, and the reverse for higher numbers of successful wells ... this indicates that the success rate of the new technology is LESS THAN THE CLAIMED 80% rate.

  Hits   Prob     Count   Expected     Combined   C-Prob   C-Count   C-Expect   (O−E)²/E
   0     0.0016     3       0.16        0-1       0.0272      9        2.72      14.4994
   1     0.0256     6       2.56
   2     0.1536    22      15.36        2         0.1536     22       15.36       2.8704
   3     0.4096    41      40.96        3         0.4096     41       40.96       0.0000
   4     0.4096    28      40.96        4         0.4096     28       40.96       4.1006
                                                                       χ²calc =  21.4705

Modified Oil & Gas Exploration Example (still binomial)

If p were unknown, then it would have to be estimated from the data. There is a cost to this --- a lost degree of freedom. In general, df = (k − 1) − m, where
• k = number of categories (−1 because the probabilities across all categories add to one: lacking only one probability, we can determine the other), and
• m = the number of parameters that must be estimated.

In this case, the estimate of p is obtained as follows: a total of 400 wells were drilled (100 fields at 4 wells each). The number of productive wells was (3·0 + 6·1 + 28·2 + 41·3 + 22·4) = 273, so that our estimate of p is 273/400 = .6825. The modified calculations follow.

Modified Oil & Gas Exploration Example

MTB > pdf;
SUBC> binomial n=4 p=.6825.

BINOMIAL WITH N = 4  P = 0.682500
   K    P(X = K)            Observed   Expected   (O−E)²/E
   0     0.0102  } combined
   1     0.0874  }  .0976        9        9.76      0.0592
   2     0.2817                 28       28.17      0.0010
   3     0.4037                 41       40.37      0.0098
   4     0.2170                 22       21.70      0.0041
                                                    0.0742 = calculated value of χ²

MTB > invcdf .99;
SUBC> chis 2.
  0.9900    9.2103 = critical value

Clearly we would FTR H0. So, if you combine the information, you have not rejected the binomial distribution altogether ... though you did reject the binomial distribution with p = .8. The binomial distribution with p = .6825 does an excellent job of modeling the performance of this new oil & gas exploration technology.
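The Minitab session above can be mirrored in Python; a sketch assuming SciPy, using the counts and combining rule shown on the previous slide (categories 0 and 1 pooled, and one degree of freedom surrendered for the estimated p):

  from scipy.stats import binom, chi2

  observed = {0: 3, 1: 6, 2: 28, 3: 41, 4: 22}     # counts as used in the modified example
  n_fields, wells_per_field = 100, 4

  # Estimate p from the data: total productive wells over total wells drilled.
  p_hat = sum(k * c for k, c in observed.items()) / (n_fields * wells_per_field)   # 0.6825

  # Expected counts under Binomial(4, p_hat); pool categories 0 and 1.
  probs = [binom.pmf(k, wells_per_field, p_hat) for k in range(5)]
  obs_c = [observed[0] + observed[1]] + [observed[k] for k in (2, 3, 4)]
  exp_c = [n_fields * (probs[0] + probs[1])] + [n_fields * probs[k] for k in (2, 3, 4)]

  chi2_calc = sum((o - e) ** 2 / e for o, e in zip(obs_c, exp_c))
  chi2_crit = chi2.ppf(0.99, df=len(obs_c) - 1 - 1)   # df = (k-1) - 1 = 2

  print(round(p_hat, 4), round(chi2_calc, 4), round(chi2_crit, 4))  # 0.6825, ~0.074, 9.2103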

The Normal Probability Model

The “normal” or “Gaussian” distribution is the most commonly used of all probability models. This distribution is known perhaps most familiarly as the “bell curve”. The normal distribution serves as the assumed model of behavior for various phenomena, generally as an approximation. It is also foundational to the development of numerous commonly used statistical methods.

The Normal Distribution

• The normal distribution is described by the mathematical expression:

f(x) = (1/√(2πσ²)) · exp(−(x − μ)²/(2σ²))

where X is a random variable with mean μ and standard deviation σ, and exp = e = 2.7183 is the natural base, raised to the power expressed in the parentheses. As will be seen, we need not work with the formula above.

[Histogram: a histogram representation of the normal distribution might appear as the bell-shaped one shown.]

The normal distribution is symmetric about its mean, μ. It is also well-tabled as the “standard normal distribution” with μ = 0 and σ = 1.

The Normal Probability Model: Table Use - Relationships

Since the normal distribution is
• a probability distribution, with total area under the curve equal to 1, and
• symmetric about its mean, μ,
we have:

P(Z > Z*) = .5 − A(Z*) where Z* > 0
A(−Z*) = A(Z*) by symmetry.

Knowing these few relationships, any needed probabilities can be found. Only positive values of Z need be tabled.

Z Table Use Examples

Using the available Z tables, determine:
• A(1.33) and A(−1.33)
• The probability of being between Z = −1.33 and +1.33
• The probability that Z is at most 1.33
• The probability that Z is at least 1.33
• The probability that Z is at most −1.33
• The probability that Z is between −.75 and +1.2
• The probability that Z is between +.50 and +1.2

[Number line under the standard normal curve with −1.33, 0, .5, .75, 1.2, and 1.33 marked.]

Z Table - Selected Portions

  Z     0.00    0.01    0.02    0.03    0.04    0.05   ...   0.09
  0.0  .0000   .0040   .0080   .0120   .0160   .0199   ...  .0359
  0.5  .1915   .1950   .1985   .2019   .2054   .2088   ...  .2224
  0.7  .2580   .2611   .2642   .2673   .2704   .2734   ...  .2852
  1.2  .3849   .3869   .3888   .3907   .3925   .3944   ...  .4015
  1.3  .4032   .4049   .4066   .4082   .4099   .4115   ...  .4177
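The listed probabilities can also be checked without a printed table; a sketch assuming SciPy, where A(z) is the tabled area between 0 and z:

  from scipy.stats import norm

  def A(z):
      # Area under the standard normal curve between 0 and z (the tabled quantity).
      return norm.cdf(z) - 0.5

  print(round(A(1.33), 4))                            # 0.4082; A(-1.33) = A(1.33) by symmetry
  print(round(norm.cdf(1.33) - norm.cdf(-1.33), 4))   # P(-1.33 < Z < 1.33) ~ 0.8165
  print(round(norm.cdf(1.33), 4))                     # P(Z at most 1.33)  = 0.9082
  print(round(1 - norm.cdf(1.33), 4))                 # P(Z at least 1.33) = 0.0918
  print(round(norm.cdf(-1.33), 4))                    # P(Z at most -1.33) = 0.0918
  print(round(norm.cdf(1.2) - norm.cdf(-0.75), 4))    # P(-.75 < Z < 1.2)  ~ 0.6583
  print(round(norm.cdf(1.2) - norm.cdf(0.5), 4))      # P(.50 < Z < 1.2)   ~ 0.1935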

Inverse Use of the Z Table

In application, there are two common variations requiring opposite use of tables of the standard normal distribution.

We have illustrated the first variation where, given one or more values of Z, we can determine the needed area under the curve (e.g. the needed probability).

The “inverse” situation is one in which an area under the curve is designated, and the corresponding value(s) of Z are obtained.

Inverse Use of the Z Table

The inverse approach is to:
• locate the appropriate area or probability in the body of the table,
• then move to the corresponding top and left table margins to identify the appropriate value(s) of Z.

From this we have X = μ + Z·σ

[Diagram: a normal curve with a known area A(Z) shaded and the corresponding Z value to be found.]

Application of the Inverse Normal

The Normal Distribution in General

We can determine probabilities for any normally distributed process performance measure or PPM, X, by determining the corresponding value of Z, that is Z = (X − μ)/σ.

Inversely, given an area under the curve, we can determine a needed value of X as: X = μ + Z·σ.

The SUPER Market

The SUPER Market, a major metropolitan area superstore chain, offers delivery service to addresses within a defined region.

The SUPER Market guarantees delivery within two hours of the time that the order is received. If this guarantee is not met, the customer receives a 10% discount for each 30 minutes late.

The SUPER Market

• Delivery time is approximately normally distributed with an average delivery time of 1 hour and 20 minutes and a standard deviation of 20 minutes. That is = 80 min. and = 20 min.

Guaranteeddelivery

within two hours!

The SUPER Market: Time to Delivery

• Inverse Problems
• Given a designated probability, what is the corresponding value of Z and, in turn, X = delivery time?
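Both the direct and the inverse calculations are short; a sketch assuming SciPy (the 95% figure is an illustrative choice, not part of the SUPER Market guarantee):

  from scipy.stats import norm

  mu, sigma = 80, 20                       # delivery time, in minutes

  # Direct problem: probability a delivery misses the two-hour (120-minute) guarantee.
  z = (120 - mu) / sigma                   # Z = (X - mu)/sigma = 2.0
  print(round(1 - norm.cdf(z), 4))         # P(X > 120) ~ 0.0228

  # Inverse problem: below what time do 95% of deliveries arrive?
  z95 = norm.ppf(0.95)                     # ~1.645, read "inversely" from the Z table
  print(round(mu + z95 * sigma, 1))        # X = mu + Z*sigma ~ 112.9 minutes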

A Goodness of Fit Test for the Normal Distribution

IS DELIVERY TIME NORMAL? To determine whether delivery times for the SUPER Market are, within reason, normally distributed, we would select a random sample of delivery times and apply any of a number of goodness of fit techniques.

While the chi-square goodness of fit test could be applied, a graphical procedure, the normal probability plot, will be illustrated. This is augmented by a more formal procedure, the Anderson-Darling test.

To proceed we will select a sample of, say, 40 delivery times. These appear in the sequel.

40 Sampled Delivery Times

56 89 123 97 68 79 80 96 74 108 86 65 102 96 90 88 67 87 58 71
72 83 90 59 76 73 82 88 63 114 86 54 109 43 69 47 90 96 52 117

             N     Mean    Median   Std. Dev.
  Del_Time  40    81.07    82.50      19.45

Normal Probability Plot: Sampled Delivery Times from the SUPER Market

[Normal probability plot of Del_Time (x-axis: delivery time, 40 to 120 min; y-axis: normal probability scale, .001 to .999). Anderson-Darling normality test: A-squared = 0.166, p-value = 0.934; N = 40, average = 81.075, std dev = 19.448.]

Normally distributed values should plot VERY close to a straight line. While this is a judgment call, a more objective approach is to examine the p-value from the Anderson-Darling test -- if the p-value is less than α, then normality is questionable.
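The Anderson-Darling statistic can be recomputed from the 40 sampled delivery times; a sketch assuming SciPy (scipy.stats.anderson reports the statistic and critical values rather than a p-value, so the number may differ slightly from the adjusted figure printed on the plot):

  from scipy.stats import anderson

  delivery_times = [56, 89, 123, 97, 68, 79, 80, 96, 74, 108, 86, 65, 102, 96,
                    90, 88, 67, 87, 58, 71, 72, 83, 90, 59, 76, 73, 82, 88, 63,
                    114, 86, 54, 109, 43, 69, 47, 90, 96, 52, 117]

  result = anderson(delivery_times, dist='norm')
  print(round(result.statistic, 3))        # A-squared statistic, close to the 0.166 above
  print(result.critical_values)            # reject normality only if the statistic exceeds these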