4 Continuous Random Variables and Probability...

4 Continuous RandomVariables and

Probability Distributions

Stat 4570/5570

Material from Devore’s book (Ed 8) – Chapter 4 - and Cengage

2

Continuous r.v.A random variable X is continuous if possible valuescomprise either a single interval on the number line or aunion of disjoint intervals.

Example: If in the study of the ecology of a lake, X, the r.v.may be depth measurements at randomly chosen locations.Then X is a continuous r.v. The range for X is the minimumdepth possible to the maximum depth possible.

3

Continuous r.v.In principle variables such as height, weight, andtemperature are continuous, in practice the limitations ofour measuring instruments restrict us to a discrete (thoughsometimes very finely subdivided) world.

However, continuous models often approximate real-worldsituations very well, and continuous mathematics (calculus)is frequently easier to work with than mathematics ofdiscrete variables and distributions.

4

Probability Distributions for Continuous Variables

Suppose the variable X of interest is the depth of a lake ata randomly chosen point on the surface.

Let M = the maximum depth (in meters), so that anynumber in the interval [0, M ] is a possible value of X.

If we “discretize” X by measuring depth to the nearestmeter, then possible values are nonnegative integers lessthan or equal to M.

The resulting discrete distribution of depth can be picturedusing a probability histogram.

5


If we draw the histogram so that the area of the rectangleabove any possible integer k is the proportion of the lakewhose depth is (to the nearest meter) k, then the total areaof all rectangles is 1:

Probability histogram of depth measured to the nearest meter

6


If depth is measured much more accurately, each rectanglein the resulting probability histogram is much narrower,though the total area of all rectangles is still 1.

Probability histogram of depth measured to the nearest centimeter

7


If we continue in this way to measure depth more and morefinely, the resulting sequence of histograms approaches asmooth curve.

Because for each histogram the total area of all rectanglesequals 1, the total area under the smooth curve is also 1.

A limit of a sequence of discrete histograms

8


DefinitionLet X be a continuous rv. Then a probability distributionor probability density function (pdf) of X is a function f (x)such that for any two numbers a and b with a ≤ b,

P (a ≤ X ≤ b) =

9


The probability that X takes on a value in the interval [a, b]is the area above this interval and under the graph of thedensity function:

P (a ≤ X ≤ b) = the area under the density curve between a and b

10


For f (x) to be a legitimate pdf, it must satisfy the followingtwo conditions:

1. f (x) ≥ 0 for all x

2. = area under the entire graph of f (x) = 1

11

ExampleConsider the reference line connecting the valve stem on atire to the center point.

Let X be the angle measured clockwise to the location ofan imperfection. One possible pdf for X is

12

Example, contThe pdf is graphed in Figure 4.3.

The pdf and probability from Example 4

Figure 4.3

cont’d

13

Example , contClearly f(x) ≥ 0. How can we show that the area of this pdfis equal to 1?

How do we calculate P(90 <= X <= 180)?

The probability that the angle of occurrence is within 90° ofthe reference line (reference line is at 0 degrees)?

cont’d

14


Because whenever 0 ≤ a ≤ b ≤ 360 in Example 4.4 andP (a ≤ X ≤ b) depends only on the width b – a of the interval,X is said to have a uniform distribution.

DefinitionA continuous rv X is said to have a uniform distributionon the interval [A, B] if the pdf of X is

15

Example“Time headway” in traffic flow is the elapsed time betweenthe time that one car finishes passing a fixed point and theinstant that the next car begins to pass that point.

Let X = the time headway for two randomly chosenconsecutive cars on a freeway during a period of heavyflow.

This pdf of X is essentially the one suggested in “The Statistical Properties ofFreeway Traffic” (Transp. Res., vol. 11: 221–228):

16

Example , contThe graph of f (x) is given in Figure 4.4; there is no densityassociated with headway times less than .5, and headwaydensity decreases rapidly (exponentially fast) as xincreases from .5.

The density curve for time headway in Example 5

Figure 4.4

cont’d

17

Example , contClearly, f (x) ≥ 0; to show that f (x)dx = 1, we use thecalculus result

e–kx dx = (1/k)e–k a.

What is the probability that headway time is at most 5seconds?

cont’d

18

The Cumulative DistributionFunction

19

The Cumulative Distribution Function

The cumulative distribution function F(x) for acontinuous rv X is defined for every number x by

F(x) = P(X ≤ x) =

For each x, F(x) is the area under the density curve to theleft of x. This is illustrated in Figure 4.5, where F(x)increases smoothly as x increases.

Figure 4.5A pdf and associated cdf

20

ExampleLet X, the thickness of a certain metal sheet, have auniform distribution on [A, B].

The density function is shown in Figure 4.6.

Figure 4.6

The pdf for a uniform distribution

21

Example , contFor x < A, F(x) = 0, since there is no area under the graphof the density function to the left of such an x.

For x ≥ B, F(x) = 1, since all the area is accumulated to theleft of such an x. Finally for A ≤ x ≤ B,

cont’d

22

Example , contThe entire cdf is

The graph of this cdf appears in Figure 4.7.

Figure 4.7

The cdf for a uniform distribution

cont’d

23

Using F (x) to Compute Probabilities

24

Percentiles of a Continuous Distribution

When we say that an individual’s test score was at the 85thpercentile of the population, we mean that 85% of allpopulation scores were below that score and 15% wereabove.

Similarly, the 40th percentile is the score that exceeds 40%of all scores and is exceeded by 60% of all scores.

25


PropositionLet p be a number between 0 and 1. The (100p)thpercentile of the distribution of a continuous rv X, denotedby η(p), is defined by

p = F(η(p)) = f(y) dy

η(p) is the specific value such that 100p% of the areaunder the graph of f(x) lies to the left of η(p) and 100(1 – p)%lies to the right.

26


Thus η(.75), the 75th percentile, is such that the area underthe graph of f(x) to the left of η(.75) is .75.

Figure 4.10 illustrates the definition.

Figure 4.10

The (100p)th percentile of a continuous distribution

27

Example 9The distribution of the amount of gravel (in tons) sold by aparticular construction supply company in a given week is acontinuous rv X with pdf

What is the cdf of sales for any x? How do you use this tofind the probability that X is less than .25? What about theprobability that X is greater than .75? What aboutP(.25 < X < .75)?

28

Example 9The graphs of both f (x) and F(x) appear in Figure 4.11.

Figure 4.11

The pdf and cdf for Example 4.9

cont’d

29

Example 9

How do we find the 40th percentile of this distribution?

How would you find the median of this distribution?

cont’d

30


DefinitionThe median of a continuous distribution, denoted by , isthe 50th percentile, so satisfies .5 = F( ) That is, half thearea under the density curve is to the left of and half is tothe right of .

A continuous distribution whose pdf is symmetric—thegraph of the pdf to the left of some point is a mirror imageof the graph to the right of that point—has median equalto the point of symmetry, since half the area under thecurve lies to either side of this point.

31

Expected Values

DefinitionThe expected or mean value of a continuous rv X withthe pdf f (x) is:

µ x = E(X) = x f(x) dx

32

Example, contBack to the gravel example, the pdf of the amount ofweekly gravel sales X is:

33

Expected Values of functions of r.v.

If h(X) is a function of X, then

E[h(X)] = µh(X) = h(x) f (x) dx

For h(X), a linear function,

E[h(X)] = E(aX + b) = a E(X) + b

34

VarianceThe variance of a continuous random variable X with pdff(x) and mean value µ is

= V(X) = (x – µ)2 f (x) dx = E[(X – µ)2] = E(X2) – [E(X)]2

The standard deviation (SD) of X is σX =

When h(X) = aX + b, the expected value and variance ofh(X) satisfy the same properties as in the discrete case:

E[h(X)] = a µ + b and V[h(X)] = a2 σ2.

35

Example, cont.

How do we compute the variance for the weeklygravel sales example?

36Copyright © Cengage Learning. All rights reserved.

4.3 The Normaldistribution

37

The Normal DistributionThe normal distribution is probably the most importantdistribution in all of probability and statistics.

Many populations have distributions that can be fit veryclosely by an appropriate normal (Gaussian, bell) curve.

Examples: height, weight, and other physicalcharacteristics, scores on various tests

38

The Normal DistributionDefinitionA continuous rv X is said to have a normal distributionwith parameters µ and σ (or µ and σ2), where < µ <and 0 < σ, if the pdf of X is

f(x; µ, σ) = < x < (4.3)

e denotes the base of the natural logarithm systemand equals approximately 2.71828π is another mathematical constant with approximate value3.14159.

39

The Normal DistributionThe statement that X is normally distributed withparameters µ and σ2 is often abbreviated X ~ N(µ, σ2).

Clearly f(x; µ, σ) ≥ 0, but a somewhat complicated calculusargument must be used to verify that f(x; µ, σ) dx = 1.

Similarly, it can be shown that E(X) = µ and V(X) = σ2, sothe parameters are the mean and the standard deviation ofX.

40

The Normal DistributionFigure below presents graphs of f(x; µ, σ) for severaldifferent (µ, σ) pairs.

Two different normal density curves Visualizing µ and σ for a normal distribution

41

The Standard NormalDistribution

42

The Standard Normal DistributionThe normal distribution with parameter values µ = 0 andσ = 1 is called the standard normal distribution.

A random variable with this distribution is called a standardnormal random variable and is denoted by Z. Its pdf is:

< z <

The graph of f(z; 0, 1) is called the standard normal curve.Its inflection points are at 1 and –1.

The cdf of Z is which we denote by

43

The Standard Normal DistributionThe standard normal distribution rarely occurs naturally.

Instead, it is a reference distribution from whichinformation about other normal distributions can beobtained via a simple formula.

These probabilities can then be found “normal tables.”

This can also be computed with a single command inR, Matlab, Mathematica, Stata, etc.

44

The Standard Normal DistributionFigure below illustrates the probabilities tabulated in TableA.3:

45

ExampleP(Z ≤ 1.25) = (1.25), a probability that is tabulated in anormal table.

What is this probability?

Figure below illustrates this probability:

cont’d

46

Example , cont.b) P(Z ≥ 1.25) = ?c) Why does P(Z ≤ –1.25) = P(Z >= 1.25)? What is

(–1.25)?d) How do we calculate P(–.38 ≤ Z ≤ 1.25)?

cont’d

47

Percentiles of the Standard NormalDistribution

48

ExampleThe 99th percentile of the standard normal distribution isthat value of z such that the area under the z curve to theleft of the value is .99

Tables give for fixed z the area under the standard normalcurve to the left of z, whereas now –we have the area and want the value of z.

This is the “inverse” problem to P(Z ≤ z) = ?

How can the table be used for this?

49

ExampleBy symmetry, the first percentile is as far below 0 as the99th is above 0, so equals –2.33 (1% lies below the firstand also above the 99th).

cont’d

50

Percentiles of the Standard Normal Distribution

If p does not appear in a table, what can we do?

What is the 95th percentile of the normal distribution?

51

zα NotationIn statistical inference, we need the z values that givecertain tail areas under the standard normal curve.

There, this notation will be standard:zα will denote the z value for which α of the area underthe z curve lies to the right of zα.

52

zα Notation for z Critical ValuesFor example, z.10 captures upper-tail area .10, and z.01captures upper-tail area .01.

Since α of the area under the z curve lies to the right of zα,1 – α of the area lies to its left.

Thus zα is the 100(1 – α)th percentile of the standardnormal distribution.

Similarly, what does –zα mean?

53

zα Notation for z Critical ValuesTable below lists the most useful z percentiles and zαvalues.

54

Example – critical valuesz.05 is the 100(1 – .05)th = 95th percentile of the standardnormal distribution, so z.05 = 1.645.

The area under the standard normal curve to the left of–z.05 is also .05

55

Nonstandard Normal DistributionsWhen X ~ N(µ, σ 2), probabilities involving X are computedby “standardizing.” The standardized variable is (X – µ)/σ.

Subtracting µ shifts the mean from µ to zero, and thendividing by σ scales the variable so that the standarddeviation is 1 rather than σ.

PropositionIf X has a normal distribution with mean µ and standarddeviation σ, then

is distributed standard normal.

56

Nonstandard Normal DistributionsWhy do we standardize normal random variables?

Equality of nonstandard and standard normal curve areas

57

Nonstandard Normal Distributionshas a standard normal distribution. Thus

58

Using Normal to approximate theBinomial Distribution

59

Approximating the Binomial Distribution

Figure below displays a binomial probability histogram forthe binomial distribution with n = 20, p = .6,for which µ = 20(.6) = 12 and σ =

Binomial probability histogram for n = 20, p = .6 with normal approximation curve superimposed

60

Approximating the Binomial Distribution

Let X be a binomial rv based on n trials with successprobability p. Then if np is large (the binomial probabilityhistogram is not too skewed), X has approximately anormal distribution with µ = np and σ =

In particular, for x = a possible value of X?

61

Exponential Distribution

62

The Exponential DistributionsThe family of exponential distributions provides probabilitymodels that are very widely used in engineering andscience disciplines. Examples?

DefinitionX is said to have an exponential distribution with the rateparameter λ (λ > 0) if the pdf of X is

63

The Exponential DistributionsIntegration by parts give the following results:

How would we set up the calculation for EX?Both the mean and standard deviation of the exponentialdistribution equal 1/λ.

CDF:

64

Another important application of the exponential distributionis to model the distribution of lifetimes.

A partial reason for the popularity of such applications isthe “memoryless” property of the Exponential distribution.

The Exponential Distributions

65

The Exponential DistributionsSuppose a light bulb’s lifetime is exponentially distributedwith parameter λ.

Say you turn the light on, and then we leave and comeback after t0 hours to find it still on. What is the probabilitythat the light bulb will last for at least additional t hours?

In symbols, we are looking for P(X ≥ t + t0 | X ≥ t0).

How would we calculate this?

66

The Gamma Distribution

67

The Gamma FunctionTo define the family of gamma distributions, we first needto introduce a function that plays an important role in manybranches of mathematics.

DefinitionFor α > 0, the gamma function is defined by

68

The Gamma FunctionThe most important properties of the gamma function arethe following:

1. For any α > 1, Γ(α) = (α – 1) Γ(α – 1) [via integration by parts]

2. For any positive integer, n, Γ(n) = (n – 1)!

3.

69

The Gamma FunctionSo if we let

then f(x; α) ≥ 0 and ,

so f(x; a) satisfies the two basic properties of a pdf.

70

The Gamma DistributionDefinitionA continuous random variable X is said to have a gammadistribution if the pdf of X is

where the parameters α and β satisfy α > 0, β > 0. Thestandard gamma distribution has β = 1.

71

The Gamma DistributionThe Exp dist results from taking α = 1 and β = 1/λ.

Figure on left illustrates the gamma pdf f(x; α, β) for several(α, β) pairs, and right the standard gamma pdf.

Gamma density curves standard gamma density curves

72

The Gamma DistributionThe mean and variance of a random variable X having the gammadistribution f (x; α, β) are

E(X) = µ = αβ V(X) = σ2 = αβ2

When X is a standard gamma rv, the cdf of X,

is often called the incomplete gamma function.

Tables with probabilities are available. Even better, R (pgamma),Matlab, Mathematica, etc can all calculate gamma probabilities.

73

ExampleSuppose the survival time X (weeks) of a random mousehas a gamma distribution with α = 8 and β = 15.

What are EX, Var(X), and sd(X)?

What is the probability probability that a mouse survivesbetween 60 and 120 weeks? What is the probability that amouse survives at least 30 weeks? (Use tables and R)

74

The Chi-Squared Distribution

75

The Chi-Squared DistributionDefinitionLet v be a positive integer. Then a random variable X issaid to have a chi-squared distribution with parameter v ifthe pdf of X is the gamma density with α = v/2 and β = 2.The pdf of a chi-squared rv is thus

The parameter is called the number of degrees offreedom (df) of X. The symbol χ2 is often used in place of“chi-squared.”

(4.10)

76

The Weibull Distribution

77

The Weibull DistributionThe family of Weibull distributions was introduced by theSwedish physicist Waloddi Weibull in 1939; his 1951 article“A Statistical Distribution Function of Wide Applicability”(J. of Applied Mechanics, vol. 18: 293–297) discusses anumber of applications.

DefinitionA random variable X is said to have a Weibull distributionwith parameters α and β (α > 0, β > 0) if the pdf of X is

(4.11)

78

The Weibull DistributionIn some situations, there are theoretical justifications for theappropriateness of the Weibull distribution, but in manyapplications f (x; α, β) simply provides a good fit toobserved data for particular values of α and β.

When α = 1, the pdf reduces to the exponential distribution(with λ = 1/β), so the exponential distribution is a specialcase of both the gamma and Weibull distributions.

79

The Weibull DistributionBoth α and β can be varied to obtain a number of different-looking density curves, as illustrated in

Weibull density curves

80

The Weibull Distributionβ is called a scale parameter, since different values stretchor compress the graph in the x direction, and α is referredto as a shape parameter.

Integrating to obtain E(X) and E(X2) yields

The computation of µ and σ

2 requires use of the gammafunction.

81

The Weibull DistributionThe integration is easily carried out to obtainthe cdf of X.

The cdf of a Weibull rv having parameters α and β is

(4.12)

82

The Lognormal Distribution

83

The Lognormal DistributionDefinition

A nonnegative rv X is said to have a lognormaldistribution if the rv Y = ln(X) has a normal distribution.

The resulting pdf of a lognormal rv when ln(X) is normallydistributed with parameters µ and σ is

84

The Lognormal DistributionThe parameters µ and σ are not the mean and standarddeviation of X but of ln(X).

It is common to refer to µ and σ as the location and thescale parameters, respectively. The mean and variance ofX can be shown to be

85

The Lognormal DistributionFigure below illustrates graphs of the lognormal pdf;although a normal curve is symmetric, a lognormal curve isskewed. Is the skew positive or negative?

86

The Lognormal DistributionBecause ln(X) has a normal distribution, the cdf of X can beexpressed in terms of the cdf φ(z) of a standard normalrv Z.

F(x; µ, σ) = P(X ≤ x) = P [ln(X) ≤ ln(x)]

(4.13)

87

The Beta Distribution

88

The Beta DistributionSo far, all families of continuous distributions (except forthe uniform distribution) had positive density over an infiniteinterval.

The beta distribution provides positive density only for X inan interval of finite length [A,B].

The standard beta distribution is commonly used to modelvariation in the proportion or percentage of a quantityoccurring in different samples. Examples?

89

The Beta DistributionDefinitionA random variable X is said to have a beta distributionwith parameters α, β (both positive), A, and B if the pdf ofX is

The case A = 0, B = 1 gives the standard betadistribution.

90

The Beta DistributionFigure below illustrates several standard beta pdf’s.

Standard beta density curves

91

The Beta DistributionGraphs of the general pdf are similar, except they areshifted and then stretched or compressed to fit over [A, B].

Unless α and β are integers, integration of the pdf tocalculate probabilities is difficult. Either a table of theincomplete beta function or appropriate software should beused.

The mean and variance of X are

92

Examples…

93

Example 1Suppose the pdf of the magnitude X of a dynamic load on abridge (in newtons) is

What is F(x)?

94

Example 1The graphs of f(x) and F(x) are shown in Figure 4.9.

The pdf and cdf for Example 4.7

cont’d

95

Example 1What is the probability that the load is between 1 and 1.5?The probability that the load exceeds 1?

The average load? The median load?

cont’d

96

Example 2Two species are competing in a region for control of alimited amount of a certain resource.

Let X = the proportion of the resource controlled by species1 and suppose X has pdf 0 ≤ x ≤ 1 otherwise

which is a uniform distribution on [0, 1]. (In her bookEcological Diversity, E. C. Pielou calls this the “broken- tick”model for resource allocation, since it is analogous tobreaking a stick at a randomly chosen point.)

f(x) =

97

Example 2Then the species that controls the majority of this resourcecontrols the amount

h(X) = max (X, 1 – X) =

What is the expected amount controlled by the specieshaving majority control?

cont’d

98

Example 3The time that it takes a driver to react to the brake lights ona decelerating vehicle is critical in helping to avoid rear-endcollisions.

The article “Fast-Rise Brake Lamp as a Collision-Prevention Device” (Ergonomics, 1993: 391–395) suggeststhat reaction time for an in-traffic response to a brake signalfrom standard brake lights can be modeled with a normaldistribution having mean value 1.25 sec and standarddeviation of .46 sec.

What is the probability that reaction time is between 1.00sec and 1.75 sec?

99

Example 3Similarly, if we view 2 seconds as a critically long reactiontime, the probability that actual reaction time will exceedthis value is

cont’d

100

Example 4According to the article “Predictive Model for PittingCorrosion in Buried Oil and Gas Pipelines” (Corrosion,2009: 332–342), the lognormal distribution has beenreported as the best option for describing the distribution ofmaximum pit depth data from cast iron pipes in soil.

The authors suggest that a lognormal distribution withµ = .353 and σ = .754 is appropriate for maximum pit depth(mm) of buried pipelines.

101

Example 4What are the mean and variance of pit depth?

What is the probability that maximum pit depth is between1 and 2 mm?

What value c is such that only 1% of all specimens have amaximum pit depth exceeding c?

cont’d

4 Continuous Random Variables and Probability...

Documents

Transcript of 4 Continuous Random Variables and Probability...