Chapter 4. Probability and Probability Distributions

Source: web.cjcu.edu.tw/~jdwu/biostat01/lect004.pdf


Importance of Knowing Probability

• Because a sample is not identical to the population from which it was selected, we must assess how accurately the sample mean, sample standard deviation, or sample proportion represents the corresponding population value.
• We must also decide at what point an observed sample result is too improbable to be consistent with the assumed population.
– This means that we need to know how to find the probability of obtaining a particular sample outcome.
– Probability is the tool that enables us to make an inference.


Definition of Probability (1)

• Classical definition
– Each possible distinct result is called an outcome; an event is identified as a collection of outcomes.
– The probability of an event E is computed by taking the ratio of the number of outcomes favorable to event E ($N_e$) to the total number $N$ of possible outcomes:

$$P(\text{event } E) = \frac{N_e}{N}$$


Definition of Probability (2)

• Relative frequency
– If an experiment is conducted n different times and if event E occurs on $n_e$ of these trials, then the probability of event E is approximately

$$P(\text{event } E) \approx \frac{n_e}{n}$$
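The relative-frequency idea can be illustrated with a short simulation. This is a minimal sketch, not part of the original slides: the fair six-sided die, the event "roll is even", and the function names are all assumptions chosen for illustration.

```python
import random

def relative_frequency(event, trial, n, seed=0):
    """Estimate P(event) as n_e / n over n simulated trials."""
    rng = random.Random(seed)
    n_e = sum(1 for _ in range(n) if event(trial(rng)))
    return n_e / n

def roll_die(rng):
    """One trial of a hypothetical experiment: roll a fair six-sided die."""
    return rng.randint(1, 6)

def is_even(outcome):
    """Event E: the roll is even."""
    return outcome % 2 == 0

classical = 3 / 6  # classical definition: N_e / N with 3 favorable outcomes out of 6
for n in (100, 1_000, 100_000):
    print(n, relative_frequency(is_even, roll_die, n), classical)
```

As n grows, the printed relative frequency should settle near the classical value 0.5, although any single run can deviate from it.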


Basic Event Relations and Probability Laws (1)

• The probability of an event, say event A, will always satisfy the property:

$$0 \le P(A) \le 1$$

• Mutually exclusive
– Two events A and B are said to be mutually exclusive if they cannot occur simultaneously. In that case,

$$P(A \text{ or } B) = P(A) + P(B)$$


Basic Event Relations and Probability Laws (2)

• Complement
– The complement of an event A is the event that A does not occur. The complement of A is denoted by the symbol $\bar{A}$, and

$$P(A) + P(\bar{A}) = 1$$

• Union
– The union of two events A and B is the set of all outcomes that are included in either A or B (or both).
• Intersection
– The intersection of two events A and B is the set of all outcomes that are included in both A and B.

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$


Basic Event Relations and Probability Laws (3)

• Conditional Probability
– When probabilities are calculated with a subset of the total group as the denominator, the result is called a conditional probability.
– Consider two events A and B with nonzero probabilities, P(A) and P(B). The conditional probability of event A given event B is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$


Basic Event Relations and Probability Laws (4)

• Independence
– When the occurrence of event A does not depend on the occurrence of event B, we say that A and B are independent events.
– When events A and B are independent, it follows that:

$$P(A \mid B) = P(A)$$

$$P(A \cap B) = P(A)\,P(B \mid A) = P(A)\,P(B)$$


Bayes’ Formula (1)

• Let $A_1, A_2, \ldots, A_k$ be a collection of k mutually exclusive and exhaustive events with $P(A_i) > 0$ for $i = 1, \ldots, k$. Then for any other event B for which $P(B) > 0$,

$$P(A_j \mid B) = \frac{P(A_j \cap B)}{P(B)} = \frac{P(B \mid A_j)\,P(A_j)}{\sum_{i=1}^{k} P(B \mid A_i)\,P(A_i)}, \qquad j = 1, \ldots, k$$

• Example 1.


Bayes’ Formula (2)

• Sensitivity
– The sensitivity of a test (or symptom) is the probability of a positive test result (or presence of the symptom) given the presence of the disease.
• Specificity
– The specificity of a test (or symptom) is the probability of a negative test result (or absence of the symptom) given the absence of the disease.
• False positive
– The false positive of a test (or symptom) is the probability of a positive test result (or presence of the symptom) given the absence of the disease.
• False negative
– The false negative of a test (or symptom) is the probability of a negative test result (or absence of the symptom) given the presence of the disease.


Bayes’ Formula (3)

• Predictive value positive
– The predictive value positive of a test (or symptom) is the probability that a subject has the disease given that the subject has a positive test result (or has the symptom).
• Predictive value negative
– The predictive value negative of a test (or symptom) is the probability that a subject does not have the disease, given that the subject has a negative test result (or does not have the symptom).

• Example 2.


Discrete and Continuous Variables

• Discrete random variable
– When observations on a quantitative random variable can assume only a countable number of values, the variable is called a discrete random variable.
• Continuous random variable
– When observations on a quantitative random variable can assume any one of the uncountable number of values in a line interval, the variable is called a continuous random variable.


Probability Distribution for Discrete Random Variables (1)

• For discrete random variables, we can compute the probability of specific individual values occurring.
• The probability distribution for a discrete random variable displays the probability P(y) associated with each value of y.
• Properties of the probability distribution of a discrete random variable:
– The probability associated with every value of y lies between 0 and 1.
– The sum of the probabilities for all values of y is equal to 1.
– The probabilities for a discrete random variable are additive. Hence, the probability that y takes one of the values 1, 2, 3, …, k is equal to P(1) + P(2) + P(3) + … + P(k).
• Example 3.


Probability Distribution for Discrete Random Variables (2)

• Binomial distribution (or experiment)
– Properties:
• A binomial experiment consists of n identical trials.
• Each trial results in one of two outcomes. We will label one outcome a success and the other a failure.
• The probability of success on a single trial is equal to π, and π remains the same from trial to trial.
• The trials are independent; that is, the outcome of one trial does not influence the outcome of any other trial.
• The random variable y is the number of successes observed during the n trials.


General Formula for Binomial Probability

• The probability of observing y successes in n trials of a binomial experiment is:

$$P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}$$

where
n = number of trials
π = probability of success on a single trial
1 − π = probability of failure on a single trial
y = number of successes in n trials
$n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$

• Example 3
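The binomial formula translates directly into code. The sketch below is illustrative rather than part of the slides: the function name `binomial_pmf` and the values n = 10, π = 0.3 are assumptions, and `math.comb` supplies the factor n!/(y!(n−y)!).

```python
from math import comb

def binomial_pmf(y, n, pi):
    """P(y) = n! / (y! (n - y)!) * pi**y * (1 - pi)**(n - y)."""
    if not 0 <= y <= n:
        return 0.0  # y successes are impossible outside 0..n
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

# Hypothetical experiment: n = 10 trials with success probability 0.3.
n, pi = 10, 0.3
probs = [binomial_pmf(y, n, pi) for y in range(n + 1)]
for y, p in enumerate(probs):
    print(y, round(p, 4))
print("total:", round(sum(probs), 10))
```

The probabilities over y = 0, …, n should add up to 1 (up to floating-point rounding), which is a quick sanity check on any implementation of the formula.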


Mean and Standard Deviation of the Binomial Probability Distribution

• Mean (µ)

$$\mu = n\pi$$

• Standard deviation (σ)

$$\sigma = \sqrt{n\pi(1-\pi)}$$

where π is the probability of success in a given trial and n is the number of trials in the binomial experiment.

• Example 6


Probability Distributions for Continuous Random Variables

• Theoretically, a continuous random variable is one that can assume values associated with infinitely many points in a line interval. It is impossible to assign a small amount of probability to each value of y and retain the property that the probabilities sum to 1.

• To overcome this difficulty, for continuous random variables we work with the probability that y falls in a given interval of values, rather than the probability of any single value.


Normal Distribution (1)

• Normal distribution

[Figure: normal density curve centered at µ; horizontal axis from -4 to 4, vertical axis "Normal Density" from 0.0 to 0.3]

• Normal probability density function

$$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$$


Normal Distribution (2)

• Area under a normal curve

[Figure: four normal density curves (horizontal axis from -4 to 4, centered at µ; vertical axis "Normal Density"); labeled areas: 0.6826, 0.9544, 0.9974]

The areas within 1, 2, and 3 standard deviations of µ are 0.6826, 0.9544, and 0.9974, respectively.


Normal Distribution (3)

• Z score
– To determine the probability that a measurement will be less than some value y, we first calculate the number of standard deviations that y lies away from the mean by using the formula:

$$z = \frac{y - \mu}{\sigma}$$

– The value of z computed using this formula is sometimes referred to as the z score associated with the y-value. Using the computed value of z, we determine the appropriate probability by using the z table.
– Example 8


Normal Distribution (4)

• 100pth percentile
– The 100pth percentile of a distribution is that value, $y_p$, such that 100p% of the population values fall below $y_p$ and 100(1−p)% are above $y_p$.
– To find the percentile $z_p$ of the standard normal distribution, we locate the probability p in the z table and read off the corresponding z value.
– To find the 100pth percentile, $y_p$, of a normal distribution with mean µ and standard deviation σ, we apply the reverse of the standardization formula:

$$y_p = \mu + z_p\,\sigma$$

– Example 9


Random Sampling

• Random number table

• Random number generator


Sampling Distributions (1)

• A sample statistic is a random variable; it is subject to random variation because it is based on a random sample of measurements selected from the population of interest.

• Like any other random variable, a sample statistic has a probability distribution. We call the probability distribution of a sample statistic the sampling distribution of that statistic.

• Example 10


Sampling Distributions (2)

• The sampling distribution of $\bar{y}$ has mean $\mu_{\bar{y}}$ and standard deviation $\sigma_{\bar{y}}$, which are related to the population mean µ and standard deviation σ by the following relationship:

$$\mu_{\bar{y}} = \mu, \qquad \sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$$

• Sampling distributions generated from a limited number of samples have means that are approximately equal to the population mean and standard deviations that are approximately equal to $\sigma/\sqrt{n}$. If all possible values of $\bar{y}$ had been generated, then the standard deviation of $\bar{y}$ would equal $\sigma/\sqrt{n}$ exactly.
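The relationship between the population values and the sampling distribution of the sample mean can be checked by simulation. The sketch below makes several assumptions that are not in the slides: a made-up population of 500 integer values, sampling with replacement (so that σ/√n applies without a finite-population adjustment), and 20,000 repeated samples per sample size.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 500 values drawn once from the integers 0..40.
population = [rng.randint(0, 40) for _ in range(500)]
mu = fmean(population)
sigma = pstdev(population)  # population standard deviation

def sample_means(n, reps=20_000):
    """Means of `reps` random samples of size n, drawn with replacement."""
    return [sum(rng.choices(population, k=n)) / n for _ in range(reps)]

print("population:", round(mu, 3), round(sigma, 3))
for n in (5, 10, 25):
    ybars = sample_means(n)
    # Columns: n, mean of ybar, sd of ybar, sigma / sqrt(n)
    print(n, round(fmean(ybars), 3), round(pstdev(ybars), 3), round(sigma / sqrt(n), 3))
```

In each row the simulated mean of the sample means should sit close to µ, and the simulated standard deviation close to σ/√n, with small discrepancies because only a finite number of samples is drawn.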


Central Limit Theorems (1)

• Let $\bar{y}$ denote the sample mean computed from a random sample of n measurements from a population having a mean, µ, and finite standard deviation, σ. Let $\mu_{\bar{y}}$ and $\sigma_{\bar{y}}$ denote the mean and standard deviation of the sampling distribution of $\bar{y}$, respectively. Based on repeated random samples of size n from the population, we can conclude the following:

1. $\mu_{\bar{y}} = \mu$
2. $\sigma_{\bar{y}} = \sigma/\sqrt{n}$
3. When n is large, the sampling distribution of $\bar{y}$ will be approximately normal (with the approximation becoming more precise as n increases).
4. When the population distribution is normal, the sampling distribution of $\bar{y}$ is exactly normal for any sample size n.


Central Limit Theorems (2)

• The Central Limit Theorems provide theoretical justification for our approximating the true sampling distribution of the sample mean with the normal distribution. Similar theorems exist for the sample median, sample standard deviation, and the sample proportion.

• The Central Limit Theorems do not require the population distribution to have any specific shape. The sample size needed for a good approximation, however, does depend on that shape: if the population distribution has many extreme values or several modes, the sampling distribution of $\bar{y}$ requires n to be considerably larger in order to achieve a symmetric bell shape.


Central Limit Theorems (3)

• It is very unlikely that the exact shape of the population distribution will be known. Thus, the exact shape of the sampling distribution of $\bar{y}$ will not be known either. The important point to remember is that the sampling distribution of $\bar{y}$ will be approximately normally distributed with a mean $\mu_{\bar{y}} = \mu$, the population mean, and a standard deviation $\sigma_{\bar{y}} = \sigma/\sqrt{n}$. The approximation will be more precise as n, the sample size for each sample, increases and as the shape of the population distribution becomes more like the shape of a normal distribution.

• How large should the sample size be for the Central Limit Theorem to hold? As a rule of thumb, the normal approximation is adequate for n > 30. However, one should not apply this rule blindly. If the population is heavily skewed, the sampling distribution of $\bar{y}$ will still be skewed even for n > 30. On the other hand, if the population is symmetric, the approximation can be adequate even for n < 30.
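The warning about skewed populations can be explored numerically. The following is a rough sketch under assumptions not found in the slides: the population is taken to be exponential with rate 1 (a heavily right-skewed distribution), 10,000 samples are drawn per sample size, and skewness is measured as the third central moment divided by the 1.5 power of the variance.

```python
import random
from statistics import fmean

def skewness(values):
    """Third central moment divided by the 1.5 power of the second."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2**1.5

def ybar_skewness(n, reps=10_000, seed=7):
    """Skewness of the simulated sampling distribution of the mean of n
    draws from an exponential(1) population (an assumed example)."""
    rng = random.Random(seed)
    ybars = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
    return skewness(ybars)

for n in (5, 30, 100):
    print(n, round(ybar_skewness(n), 2))
```

A skewness of 0 corresponds to a symmetric distribution. For this population the printed values should shrink toward 0 as n grows while remaining visibly positive at n = 30, which is the sense in which the n > 30 rule should not be applied blindly.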


Central Limit Theorems (4)

• Central Limit Theorem for $\sum y$:
– Let $\sum y$ denote the sum of a random sample of n measurements from a population having a mean µ and finite standard deviation σ. Let $\mu_{\sum y}$ and $\sigma_{\sum y}$ denote the mean and standard deviation of the sampling distribution of $\sum y$, respectively. Based on repeated random samples of size n from the population, we can conclude the following:

1. $\mu_{\sum y} = n\mu$
2. $\sigma_{\sum y} = \sqrt{n}\,\sigma$
3. When n is large, the sampling distribution of $\sum y$ will be approximately normal (with the approximation becoming more precise as n increases).
4. When the population distribution is normal, the sampling distribution of $\sum y$ is exactly normal for any sample size n.


Normal Approximation to the Binomial (1)

• The binomial random variable y is the number of successes in the n trials. Define n random variables, $I_1, I_2, \ldots, I_n$, as:

$$I_i = \begin{cases} 1 & \text{if the } i\text{th trial results in a success} \\ 0 & \text{if the } i\text{th trial results in a failure} \end{cases}$$

• Consider the sum of the random variables $I_1, I_2, \ldots, I_n$, namely $\sum_{i=1}^{n} I_i$. A “1” is placed in the sum for each success that occurs and a “0” for each failure that occurs. Thus, $\sum_{i=1}^{n} I_i$ is the number of successes that occurred during the n trials. Hence, we conclude that $y = \sum_{i=1}^{n} I_i$.

• Because the binomial random variable y is the sum of independent random variables, each having the same distribution, we can apply the Central Limit Theorem for sums to y.


Normal Approximation to the Binomial (2)

• The normal distribution can be used to approximate the binomial distribution when n is of an appropriate size. The normal distribution that will be used has a mean and standard deviation given by the following formulas:

$$\mu = n\pi, \qquad \sigma = \sqrt{n\pi(1-\pi)}$$

where π = the probability of success.

• Example 11


Normal Approximation to the Binomial (3)

• The normal approximation to the binomial distribution can be unsatisfactory if nπ < 5 or n(1 − π) < 5. If π is small and n is modest, the actual binomial distribution is seriously skewed to the right. In such a case, the symmetric normal curve will give an unsatisfactory approximation. If π is near 1, so that n(1 − π) < 5, the actual binomial will be skewed to the left, and again the normal approximation will not be very accurate.

• The normal approximation is quite good when nπ and n(1 − π) both exceed about 20. In the middle zone, where nπ or n(1 − π) is between 5 and 20, a modification called the continuity correction makes a substantial contribution to the quality of the approximation.


Normal Approximation to the Binomial (4)

• The point of the continuity correction is that we are using the continuous normal curve to approximate a discrete binomial distribution. The general idea of the continuity correction is to add or subtract 0.5 from a binomial value before using normal probabilities. For n = 20 and π = 0.3, for example:

Instead of

$$P(y \le 5) = P\!\left[z \le \frac{5 - 20(0.3)}{\sqrt{20(0.3)(0.7)}}\right] = P(z \le -0.49) = 0.3121$$

use

$$P(y \le 5.5) = P\!\left[z \le \frac{5.5 - 20(0.3)}{\sqrt{20(0.3)(0.7)}}\right] = P(z \le -0.24) = 0.4052$$

The actual binomial probability is:

$$P(y \le 5) = \binom{20}{0}(0.3)^{0}(0.7)^{20} + \binom{20}{1}(0.3)^{1}(0.7)^{19} + \binom{20}{2}(0.3)^{2}(0.7)^{18} + \binom{20}{3}(0.3)^{3}(0.7)^{17} + \binom{20}{4}(0.3)^{4}(0.7)^{16} + \binom{20}{5}(0.3)^{5}(0.7)^{15}$$

$$= 0.00080 + 0.00684 + 0.02784 + 0.07160 + 0.13042 + 0.17886 = 0.41636$$
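The effect of the continuity correction can be reproduced numerically for n = 20 and π = 0.3. This is a sketch rather than the slide author's method: it evaluates the normal distribution function with `math.erf` instead of a z table, so the results differ slightly from table values based on z rounded to two decimals. The helper names are assumptions.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal P(Z <= z), computed from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def binomial_cdf(k, n, pi):
    """Exact binomial P(y <= k)."""
    return sum(comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(k + 1))

n, pi, k = 20, 0.3, 5
mu = n * pi
sigma = sqrt(n * pi * (1 - pi))

exact = binomial_cdf(k, n, pi)
uncorrected = normal_cdf((k - mu) / sigma)      # uses y = 5
corrected = normal_cdf((k + 0.5 - mu) / sigma)  # uses y = 5.5

print("exact:      ", round(exact, 4))
print("uncorrected:", round(uncorrected, 4))
print("corrected:  ", round(corrected, 4))
```

The corrected value should land much closer to the exact binomial probability than the uncorrected one, which is the point of adding 0.5 before standardizing.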


Homework

• 4.39, 4.40 (p.153)
• 4.95, 4.96 (p.181)
• 4.117 (p.189)


Example 1

• A book club classifies members as heavy, medium, or light purchasers, and separate mailings are prepared for each of these groups. Overall, 20% of the members are heavy purchasers, 30% medium, and 50% light. A member is not classified into a group until 18 months after joining the club, but a test is made of the feasibility of using the first 3 months’ purchases to classify members. The following percentages are obtained from existing records of individuals classified as heavy, medium, or light purchasers:

First 3 Months’         Group (%)
Purchases         Heavy   Medium   Light
0                     5       15      60
1                    10       30      20
2                    30       40      15
3+                   55       15       5

• If a member purchases no books in the first 3 months, what is the probability that the member is a light purchaser? (Note: This table contains “conditional” percentages for each column.)


Answer to Example 1

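$P(L \mid 0) = ?$

According to Bayes' formula,

$$P(L \mid 0) = \frac{P(L \cap 0)}{P(0)} = \frac{P(0 \mid L)\,P(L)}{P(0 \mid L)\,P(L) + P(0 \mid M)\,P(M) + P(0 \mid H)\,P(H)}$$

$$= \frac{0.60 \cdot 0.50}{0.60 \cdot 0.50 + 0.15 \cdot 0.30 + 0.05 \cdot 0.20} = \frac{0.30}{0.355} = 0.845$$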
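The same Bayes calculation can be written as a small function that works for any set of mutually exclusive and exhaustive groups. This is an illustrative sketch: the function name `bayes_posterior` and the dictionary layout are assumptions, while the numbers are the group proportions and the "0 purchases" row from Example 1.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_j | B) for every group j, given P(A_j) and P(B | A_j)."""
    joint = {group: likelihoods[group] * priors[group] for group in priors}
    p_b = sum(joint.values())  # P(B), the denominator of Bayes' formula
    return {group: joint[group] / p_b for group in joint}

# P(group) and P(0 purchases in first 3 months | group) from Example 1.
priors = {"heavy": 0.20, "medium": 0.30, "light": 0.50}
likelihoods = {"heavy": 0.05, "medium": 0.15, "light": 0.60}

posterior = bayes_posterior(priors, likelihoods)
for group, p in posterior.items():
    print(group, round(p, 3))
```

The posterior probabilities over the three groups add up to 1, and the value for the light group is 0.30/0.355, about 0.845.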


Example 2

• A screening test for a disease gives the results shown in the following table. What are the sensitivity, specificity, false positive, false negative, predictive value positive, and predictive value negative?

                               Disease
Test Result            Present ($D$)   Absent ($\bar{D}$)   Total
Positive ($T$)         a               b                    a + b
Negative ($\bar{T}$)   c               d                    c + d
Total                  a + c           b + d                n


Answer to Example 2

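$$\text{sensitivity} = P(T \mid D) = \frac{a}{a+c} \qquad\qquad \text{specificity} = P(\bar{T} \mid \bar{D}) = \frac{d}{b+d}$$

$$\text{false negative} = P(\bar{T} \mid D) = \frac{c}{a+c} \qquad\qquad \text{false positive} = P(T \mid \bar{D}) = \frac{b}{b+d}$$

$$\text{predictive value positive} = P(D \mid T) = \frac{a}{a+b} = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid \bar{D})\,P(\bar{D})}$$

$$\text{predictive value negative} = P(\bar{D} \mid \bar{T}) = \frac{d}{c+d} = \frac{P(\bar{T} \mid \bar{D})\,P(\bar{D})}{P(\bar{T} \mid \bar{D})\,P(\bar{D}) + P(\bar{T} \mid D)\,P(D)}$$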
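The six measures can be computed directly from the four cell counts of the screening table. The sketch below is illustrative only: the function name and the counts a = 90, b = 50, c = 10, d = 850 are made up, and the last lines check that the count-based predictive value positive agrees with the Bayes form.

```python
def diagnostic_measures(a, b, c, d):
    """Measures from a 2x2 screening table.

    a: test positive, disease present    b: test positive, disease absent
    c: test negative, disease present    d: test negative, disease absent
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "false_negative": c / (a + c),
        "false_positive": b / (b + d),
        "predictive_value_positive": a / (a + b),
        "predictive_value_negative": d / (c + d),
    }

# Hypothetical counts, chosen only for illustration.
a, b, c, d = 90, 50, 10, 850
measures = diagnostic_measures(a, b, c, d)
for name, value in measures.items():
    print(name, round(value, 4))

# Bayes form of the predictive value positive, using prevalence P(D) = (a + c) / n.
prevalence = (a + c) / (a + b + c + d)
ppv_bayes = (measures["sensitivity"] * prevalence) / (
    measures["sensitivity"] * prevalence
    + measures["false_positive"] * (1 - prevalence)
)
print("PPV via Bayes:", round(ppv_bayes, 4))
```

Note that sensitivity and false negative add up to 1, as do specificity and false positive, because each pair conditions on the same disease status.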


Example 3

• An article in the March 5, 1998, issue of The New England Journal of Medicine discussed a large outbreak of tuberculosis. One person, called the index patient, was diagnosed with tuberculosis in 1995. The 232 co-workers of the index patient were given a tuberculin screening test. The number of co-workers recording a positive reading on the test was the random variable of interest. Did this study satisfy the properties of a binomial experiment?


Answer to Example 3

• Were there n identical trials? Yes

• Did each trial result in one of two outcomes? Yes

• Was the probability of success the same from trial to trial? Yes

• Were the trials independent? Yes

• Was the random variable of interest to the experimenter the number of successes y in the 232 screening tests? Yes

• All five characteristics were satisfied, so the tuberculin screening test represented a binomial experiment.


Example 4

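• An economist interviews 75 students in a class of 100 to estimate the proportion of students who expect to obtain a “C” or better in the course. Is this a binomial experiment?

Answer:
– Were there n identical trials? Yes
– Did each trial result in one of two outcomes? Yes
– Was the probability of success the same from trial to trial? No. Because the 75 students are drawn without replacement from a class of only 100, the probability of success changes from one interview to the next, so this is not a binomial experiment.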


Example 5

• What is the probability distribution of the number of heads in 10000 tosses of 4 coins?

Answer: Let y be the number of heads observed. The empirical sampling results for y are:

y   Frequency   Observed Relative Frequency   Expected Relative Frequency
0         638                        0.0638                        0.0625
1        2505                        0.2505                        0.2500
2        3796                        0.3796                        0.3750
3        2434                        0.2434                        0.2500
4         627                        0.0627                        0.0625
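An experiment of this kind is easy to repeat by simulation. The sketch below is not the source of the frequencies in the table: the random seed is arbitrary, so the observed counts will differ from those above, while the expected relative frequencies come from the binomial formula with n = 4 and π = 0.5.

```python
import random
from collections import Counter
from math import comb

rng = random.Random(2024)  # arbitrary seed, only for repeatability
tosses = 10_000

# Each toss of 4 fair coins yields a number of heads between 0 and 4.
counts = Counter(
    sum(rng.random() < 0.5 for _ in range(4)) for _ in range(tosses)
)

# Expected relative frequencies from the binomial formula with n = 4, pi = 0.5.
expected = {y: comb(4, y) * 0.5**4 for y in range(5)}

print("y  frequency  observed  expected")
for y in range(5):
    print(y, counts[y], counts[y] / tosses, expected[y])
```

With 10,000 tosses the observed relative frequencies should agree with the expected ones to about two decimal places, much as in the table.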


Answer to Example 5 (continued)

Probability distribution for the number of heads when 4 coins are tossed.

[Figure: bar chart of P(y), from 0.0 to 0.4, against Number of Heads, 0 to 4]


Example 6

• Suppose that a sample of households is randomly selected from all the households in the city in order to estimate the percentage in which the head of the household is unemployed. To illustrate the computation of a binomial probability, suppose that the unknown percentage is actually 10% and that a sample of n = 5 is selected from the population.
– What is the probability that all five heads of the households are employed?
– What is the probability of one or fewer being unemployed?

Answer: Let y be the number of the five heads of households who are employed, so that π = 0.9 and n = 5.

$$P(y = 5) = \frac{5!}{5!\,(5-5)!}\,(0.9)^{5}(0.1)^{0} = \frac{5!}{5!\,0!}\,(0.9)^{5}(0.1)^{0} = (0.9)^{5} = 0.590$$

$$P(y = 4 \text{ or } 5) = P(4) + P(5) = 5\,(0.9)^{4}(0.1)^{1} + (0.9)^{5} = 0.328 + 0.590 = 0.918$$


Example 7

• A company producing turf grass takes a sample of 20 seeds on a regular basis to monitor the quality of the seeds. According to the results of previous experiments, the germination rate of the seeds is 85%. If only 12 seeds germinate in a particular sample of 20, does a germination rate of 85% seem consistent with the current results?

Answer:

$$\mu = n\pi = 20 \times 0.85 = 17$$

$$\sigma = \sqrt{n\pi(1-\pi)} = \sqrt{20 \times 0.85 \times (1 - 0.85)} = 1.60$$

$$\frac{12 - 17}{1.6} = -3.125$$

Thus, y = 12 seeds is more than 3 standard deviations below the mean number of seeds, µ = 17; it is not likely that in 20 seeds we would obtain only 12 germinated seeds if π really is equal to 0.85.
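The arithmetic in this answer can be checked in a few lines. This is a minimal sketch with an assumed helper name, `binomial_mean_sd`; because it keeps σ unrounded (about 1.597 rather than 1.60), the standardized distance comes out near −3.13 instead of −3.125.

```python
from math import sqrt

def binomial_mean_sd(n, pi):
    """Mean n*pi and standard deviation sqrt(n*pi*(1 - pi)) of a binomial count."""
    return n * pi, sqrt(n * pi * (1 - pi))

n, pi, observed = 20, 0.85, 12
mu, sigma = binomial_mean_sd(n, pi)
z = (observed - mu) / sigma  # standard deviations between 12 and the mean

print("mean:", round(mu, 2))
print("sd:  ", round(sigma, 2))
print("z:   ", round(z, 2))
```

Either way the observed count lies more than 3 standard deviations below the mean, so the conclusion does not depend on how σ is rounded.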


The binomial distribution for n = 20 and π=0.85

[Figure: histogram of Count (0 to 2500) against Number of Germinated Seeds (0 to 20)]


Example 8

• The mean daily milk production of a herd of Guernsey cows has a normal distribution with µ = 70 pounds and σ = 13 pounds.
– What is the probability that the milk production for a cow chosen at random will be less than 60 pounds?
– What is the probability that the milk production for a cow chosen at random will be greater than 90 pounds?
– What is the probability that the milk production for a cow chosen at random will be between 60 pounds and 90 pounds?


Answer to Example 8 (1)

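• Compute the z value corresponding to 60 pounds, then use the z table to find the probability of a value less than 60 pounds.

$$z = \frac{y - \mu}{\sigma} = \frac{60 - 70}{13} = -0.7692$$

[Figure: normal density curve (horizontal axis from 20 to 120 pounds, centered at µ; vertical axis "Normal Density"); labeled area to the left of 60: 0.2206]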


Answer to Example 8 (2)

• Compute the z value corresponding to 90 pounds, then use the z table to find the probability of a value greater than 90 pounds.

$$z = \frac{y - \mu}{\sigma} = \frac{90 - 70}{13} = 1.5385$$

[Figure: normal density curve (horizontal axis from 20 to 120 pounds, centered at µ; vertical axis "Normal Density"); labeled area to the right of 90: 0.0618]


Answer to Example 8 (3)

• The area between the two values 60 and 90 is determined by finding the difference between the areas to the left of the two values.

$$P(y < 90) = 1 - 0.0618 = 0.9382$$

$$P(60 < y < 90) = 0.9382 - 0.2206 = 0.7176$$

[Figure: normal density curve (horizontal axis from 20 to 120 pounds, centered at µ; vertical axis "Normal Density"); labeled area between 60 and 90: 0.7176]
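The three probabilities in Example 8 can also be computed without a z table. The sketch below is one possible approach, not the one used in the slides: it evaluates the normal distribution function through `math.erf`, so its results differ from the table-based answers in the third or fourth decimal place because z is not rounded to two decimals.

```python
from math import erf, sqrt

def normal_cdf(y, mu, sigma):
    """P(Y <= y) for a normal distribution, via the z score and erf."""
    z = (y - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 70, 13  # pounds, from Example 8

below_60 = normal_cdf(60, mu, sigma)
above_90 = 1 - normal_cdf(90, mu, sigma)
between = normal_cdf(90, mu, sigma) - normal_cdf(60, mu, sigma)

print("P(y < 60):     ", round(below_60, 4))
print("P(y > 90):     ", round(above_90, 4))
print("P(60 < y < 90):", round(between, 4))
```

The three results partition the whole distribution, so they should add up to 1.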


Example 9

• The Scholastic Assessment Test (SAT) is an examination used to measure a person’s readiness for college. The mathematics scores are assumed to have a normal distribution with mean 500 and standard deviation 100.
– What proportion of the people taking the SAT will score below 350?
– To identify a group of students needing remedial assistance, say, the lower 10% of all scores, what is the cutoff score on the SAT?


Answer to Example 9

• To find the proportion of scores below 350:

$$z = \frac{y - \mu}{\sigma} = \frac{350 - 500}{100} = -1.5$$

[Figure: normal density curve (horizontal axis from 200 to 800, centered at µ; vertical axis "Normal Density"); labeled area to the left of 350: 0.0668]

• To find the 10th percentile, we first find $z_{0.1}$ in the z table. Since 0.1003 is the value nearest 0.1000 and its corresponding z is −1.28, we take $z_{0.1} = -1.28$ and then compute:

$$y_{0.1} = \mu + z_{0.1}\,\sigma = 500 + (-1.28)(100) = 500 - 128 = 372$$
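Both parts of Example 9 can be reproduced with Python's standard library, assuming Python 3.8 or later, where `statistics.NormalDist` provides `cdf` and `inv_cdf`. This is a sketch for checking the table lookups; the variable names are illustrative.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)  # SAT mathematics scores from Example 9

p_below_350 = sat.cdf(350)             # proportion scoring below 350
z_10 = NormalDist().inv_cdf(0.10)      # 10th percentile of the standard normal
y_10 = sat.inv_cdf(0.10)               # 10th percentile of the SAT scores

print("P(y < 350):     ", round(p_below_350, 4))
print("z_0.1:          ", round(z_10, 4))
print("10th percentile:", round(y_10, 1))
```

Because `inv_cdf` does not round z to two decimals, the percentile comes out a fraction of a point below the table-based value of 372.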


Random Numbers


Example 10 (1)

• The population consists of 500 pennies, for each of which we compute the age: age = 2000 - date on penny. What are the sampling distributions of $\bar{y}$ based on samples of sizes n = 5, 10 and 25? (Given the population mean µ = 15.070 and the population standard deviation σ = 10.597.)

[Figure: histogram of Frequency (0 to 25) against Ages (0 to 40) for the 500 pennies]


Example 10 (2)

Sampling distribution of $\bar{y}$ for n = 5, 10 and 25

[Figure: three histograms of Frequency (0 to 2000) against Mean Age (0 to 40), one for each sample size]

Sample Size   Mean of $\bar{y}$   Standard Deviation of $\bar{y}$   $10.597/\sqrt{n}$
1                        15.070                            10.597              10.597
5                        15.042                             4.728               4.739
10                       15.039                             3.324               3.351
25                       15.078                             2.075               2.119


Example 11

• Use the normal approximation to the binomial to compute the probability of observing 460 or fewer in a sample of 1000 favoring consolidation if we assume that 50% of the entire population favor the change.

Answer:

$$\mu = n\pi = 1000 \times 0.5 = 500$$

$$\sigma = \sqrt{n\pi(1-\pi)} = \sqrt{1000 \times 0.5 \times 0.5} = 15.8$$

$$z = \frac{y - \mu}{\sigma} = \frac{460 - 500}{15.8} = -2.53$$

$$P(y \le 460) \approx P(z \le -2.53) = 0.0057$$

[Figure: normal density f(y) (0.000 to 0.025) against y (430 to 550); labeled area to the left of 460: 0.0057]
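For comparison, the exact binomial probability in Example 11 is within reach of exact integer arithmetic. The sketch below is an illustration, not part of the original answer: it relies on the fact that with π = 0.5 every sequence of 1000 responses has probability 1/2^1000, and it evaluates the normal approximation with `math.erf` rather than a z table.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal P(Z <= z), computed from the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, pi, k = 1000, 0.5, 460
mu = n * pi
sigma = sqrt(n * pi * (1 - pi))

# Normal approximation without continuity correction, as in the worked answer.
approx = normal_cdf((k - mu) / sigma)

# Exact probability: count the outcomes with at most k successes, then
# divide by the 2**n equally likely outcomes (valid because pi = 0.5).
exact = sum(comb(n, y) for y in range(k + 1)) / 2**n

print("normal approximation:", round(approx, 4))
print("exact binomial:      ", round(exact, 4))
```

Both values are well under 1%, so observing 460 or fewer would be unusual if half of the population really favored the change.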