Binomial Distribution - Cornell University (pi.math.cornell.edu/~web1105/slides/9_17_17.pdf)


Binomial Distribution

Binomial Experiment

1. The same experiment is repeated a fixed number of times.

2. There are only two possible outcomes, success and failure; P(success) = p, P(failure) = 1 − p.

3. The repeated trials are independent, so that the probability of success remains the same for each trial.

The Binomial Distribution is

\[ P(\text{exactly } k \text{ successes in } n \text{ trials}) = C(n, k)\, p^{k} (1 - p)^{n-k}. \]

Examples are a 2 showing in a roll of dice, or H in a toss of coins from before, but NOT birthdays.
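As a minimal sketch of the binomial formula (the function name `binomial_pmf` and the sample numbers are illustrative choices, not part of the slides), the probability can be evaluated with Python's standard `math.comb`:

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly one 2 in four rolls of a fair die.
one_two_in_four_rolls = binomial_pmf(1, 4, 1 / 6)

# The probabilities for k = 0, ..., n add up to 1.
total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))
```

Summing over all k is a quick sanity check that the formula defines a probability distribution.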

Dan Barbasch Math 1105 Chapter 8 Week of September 17 1 / 22


Binomial Distribution, Examples I

Example (#48 Section 8.4)

A hospital receives 1/5 = 0.2 of its flu vaccine shipments from Company X and the remainder of its shipments from other companies. Each shipment contains a very large number of vaccine vials. For Company X’s shipments, 10% of the vials are ineffective. For every other company, 2% of the vials are ineffective. The hospital tests 30 randomly selected vials from a shipment and finds that one vial is ineffective. What is the probability that this shipment came from Company X?


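Binomial Distribution, Examples II

Answer.

Write X for "the shipment came from Company X", NX for "the shipment came from another company", D for "a vial is ineffective", and 1D/30 for "exactly one of the 30 tested vials is ineffective". For X, p = 0.1 and 1 − p = 0.9; for NX, p = 0.02 and 1 − p = 0.98.

P(X) = 0.2    P(NX) = 0.8

P(D | X) = 0.1    P(D | NX) = 0.02

\[ P(1D/30 \mid X) = C(30, 1) \times (0.1)^{1} \times (0.9)^{29} \]

\[ P(1D/30 \mid NX) = C(30, 1) \times (0.02)^{1} \times (0.98)^{29} \]

Draw the usual tree diagram for Bayes’s theorem and compute.

\[ P(X \mid 1D/30) = \frac{P(1D/30 \text{ and } X)}{P(1D/30)} = \frac{C(30, 1) \cdot (0.1)^{1} \cdot (0.9)^{29} \cdot 0.2}{C(30, 1) \cdot (0.1)^{1} \cdot (0.9)^{29} \cdot 0.2 + C(30, 1) \cdot (0.02)^{1} \cdot (0.98)^{29} \cdot 0.8}. \]

The answer is approximately 0.096, that is, close to 0.1.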
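The Bayes computation for the vaccine example can be checked numerically. This is a small sketch using only the numbers stated in the problem; the variable names are illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

prior_x, prior_nx = 0.2, 0.8           # P(X), P(NX)
like_x = binomial_pmf(1, 30, 0.10)     # P(1D/30 | X)
like_nx = binomial_pmf(1, 30, 0.02)    # P(1D/30 | NX)

# Bayes's theorem: P(X | 1D/30) = P(1D/30 and X) / P(1D/30)
posterior_x = prior_x * like_x / (prior_x * like_x + prior_nx * like_nx)
print(round(posterior_x, 3))  # 0.096
```

The factor C(30, 1) appears in every term, so it cancels in the ratio; it is kept here only to mirror the formula.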


Pascal’s Triangle I

The triangular array of numbers shown below is called Pascal’s triangle in honor of the French mathematician Blaise Pascal (1623 - 1662), who was one of the first to use it extensively. The triangle was known long before Pascal’s time and appears in Chinese and Islamic manuscripts from the eleventh century.

          1
        1   1
      1   2   1
    1   3   3   1
  1   4   6   4   1
1   5  10  10   5   1

The array provides a quick way to find binomial probabilities. The nth row of the triangle, where n = 0, 1, 2, 3, . . . , gives the coefficients C(n, r) for r = 0, 1, 2, 3, . . . , n. For example, for n = 4, 1 = C(4, 0), 4 = C(4, 1), 6 = C(4, 2), and so on. Each number in the


Pascal’s Triangle II

triangle is the sum of the two numbers directly above it. For example, in the row for n = 4, 1 is the sum of 1, the only number above it, 4 is the sum 1 + 3, 6 = 3 + 3, and so on. The general formula is

\[ C(n, r) = C(n - 1, r - 1) + C(n - 1, r). \]

The number of ways to choose r out of n is the sum of two counts: the ways to choose r out of n − 1 (all r choices come from 1, . . . , n − 1), plus the ways to choose r − 1 out of n − 1 (choose n itself, and then r − 1 more out of 1, . . . , n − 1).
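The recurrence C(n, r) = C(n − 1, r − 1) + C(n − 1, r) is enough to build the whole triangle row by row. A short sketch follows; the helper name `pascal_rows` is an illustrative choice.

```python
from math import comb

def pascal_rows(n_max: int) -> list[list[int]]:
    """Rows 0..n_max of Pascal's triangle, built from the row above each time."""
    rows = [[1]]
    for n in range(1, n_max + 1):
        prev = rows[-1]
        # The two edges are 1; each interior entry is the sum of the two entries above it.
        row = [1] + [prev[r - 1] + prev[r] for r in range(1, n)] + [1]
        rows.append(row)
    return rows

rows = pascal_rows(6)
print(rows[4])  # [1, 4, 6, 4, 1]
```

Each entry can be compared against `math.comb(n, r)` to confirm that row n lists C(n, 0), . . . , C(n, n).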


Example, Sports I

In many sports championships, such as the World Series in baseball and the Stanley Cup final series in hockey, the winner is the first team to win four games. For this exercise, assume that each game is independent of the others, with a constant probability p that one specified team (say, the National League team) wins.

a. Find the probability that the series lasts for four, five, six, and seven games when p = 0.5.

b. Morrison and Schmittlein have found that the Stanley Cup finals can be described by letting p = 0.73 be the probability that the better team wins each game. Find the probability that the series lasts for four, five, six, and seven games.

Source: Chance.


Example, Sports II

Answer.

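Call the teams A and B. For part a (p = 0.5), the series ends in exactly n games when the eventual winner wins exactly three of the first n − 1 games and then wins game n.

\[ P(\text{End in exactly 4}) = P(AAAA \text{ or } BBBB) = 2 \cdot (0.5)^{4}, \]

\[ P(\text{End in exactly 5}) = C(4, 1)(0.5)^{5} + C(4, 3)(0.5)^{5}, \]

\[ P(\text{End in exactly 6}) = C(5, 2)(0.5)^{6} + C(5, 3)(0.5)^{6}, \]

\[ P(\text{End in exactly 7}) = C(6, 3)(0.5)^{7} + C(6, 3)(0.5)^{7}. \]

From the triangle,

\[ C(4, 1) = C(3, 1) + C(3, 0) = 3 + 1 = 4, \]

\[ C(4, 3) = C(3, 3) + C(3, 2) = 1 + 3 = 4, \]

\[ C(5, 2) = C(4, 2) + C(4, 1) = 6 + 4 = 10, \]

\[ C(5, 3) = C(4, 3) + C(4, 2) = 4 + 6 = 10, \]

\[ C(6, 3) = C(5, 3) + C(5, 2) = 10 + 10 = 20. \]

So the four probabilities are 0.125, 0.25, 0.3125, and 0.3125, which add up to 1.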
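The same counting argument works for any p, including p = 0.73 in part b: the eventual winner takes three of the first n − 1 games and then game n. The sketch below assumes that reasoning; the function name `series_length_probs` is illustrative.

```python
from math import comb

def series_length_probs(p: float, wins_needed: int = 4) -> dict[int, float]:
    """P(a first-to-wins_needed series lasts exactly n games), for each possible n."""
    q = 1 - p
    probs = {}
    for n in range(wins_needed, 2 * wins_needed):
        # The winner takes wins_needed - 1 of the first n - 1 games, then game n.
        ways = comb(n - 1, wins_needed - 1)
        losses = n - wins_needed
        # First term: the team with probability p wins; second term: the other team wins.
        probs[n] = ways * (p**wins_needed * q**losses + q**wins_needed * p**losses)
    return probs

part_a = series_length_probs(0.5)   # {4: 0.125, 5: 0.25, 6: 0.3125, 7: 0.3125}
part_b = series_length_probs(0.73)
```

For any p the four probabilities add up to 1, since some team must reach four wins by game seven.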


Fermat and Pascal I

F and P are playing a game. They toss a coin, p = P(H) = 0.3. F wins if H, P wins if T. F leads 8 to 7. The game ends whenever one of them reaches 20. What is the probability the game ends after another

1. 12

2. 20

tosses?


Fermat and Pascal II

Answer.

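F needs 12 more heads to reach 20, and P needs 13 more tails.

For (1), \(p^{12}\). Only F can win, and all 12 tosses must be H.

For (2), \(C(19, 11)\,p^{12}(1 - p)^{8} + C(19, 12)\,p^{7}(1 - p)^{13}\), the sum of the probabilities that F wins and that P wins on toss 20. F wins on toss 20 when exactly 11 of the first 19 tosses are H and toss 20 is H; P wins on toss 20 when exactly 12 of the first 19 tosses are T and toss 20 is T.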
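A small sketch of this computation, under the reading that the game stops as soon as F has 12 more heads or P has 13 more tails. A player wins on toss n when they already have all but one of the points they need after n − 1 tosses and then win toss n. The function name `ends_after` and its parameters are illustrative.

```python
from math import comb

def ends_after(n: int, p: float, f_needs: int = 12, p_needs: int = 13) -> float:
    """P(the game ends on exactly the n-th further toss)."""
    q = 1 - p
    total = 0.0
    # F wins on toss n: f_needs - 1 heads among the first n - 1 tosses, then a head.
    # P must still be short of p_needs tails, so n - f_needs < p_needs.
    if n >= f_needs and n - f_needs < p_needs:
        total += comb(n - 1, f_needs - 1) * p**f_needs * q ** (n - f_needs)
    # P wins on toss n: p_needs - 1 tails among the first n - 1 tosses, then a tail.
    if n >= p_needs and n - p_needs < f_needs:
        total += comb(n - 1, p_needs - 1) * q**p_needs * p ** (n - p_needs)
    return total

after_12 = ends_after(12, 0.3)
after_20 = ends_after(20, 0.3)
```

The game must end within 24 further tosses, so the values of `ends_after(n, 0.3)` for n = 12, . . . , 24 add up to 1.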


Random Variables I

Random Variable  A random variable X is a function that assigns a real number to each outcome of an experiment.

Probability Distribution  The probability distribution of a random variable is \(\{P(X = k) = p_k\}\) with \(0 \le p_k \le 1\) and \(\sum_{k=0}^{n} p_k = 1\). This definition is for when X takes finitely many values only.

Expected Value  \(E(X) = \sum_{k} k\,P(X = k)\).

Example

Toss a coin. Let X = 1 if H, and X = 0 if T. The coin satisfies P(H) = p and P(T) = 1 − p. The probability distribution is P(X = 1) = p and P(X = 0) = 1 − p. Then EX = 1 · p + 0 · (1 − p) = p.

If X = 1 if H, and X = −1 if T, then EX = 1 · p + (−1) · (1 − p) = 2p − 1.
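The definition of expected value for a finite distribution translates directly into code. In this sketch the distribution is a dictionary mapping each value k to P(X = k); the function name and the choice p = 0.3 are illustrative assumptions.

```python
def expected_value(dist: dict[float, float]) -> float:
    """E(X) = sum of k * P(X = k), for a finite distribution given as {value: probability}."""
    if abs(sum(dist.values()) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(k * pk for k, pk in dist.items())

p = 0.3  # an arbitrary illustrative value for P(H)
zero_one = expected_value({1: p, 0: 1 - p})     # X = 1 if H, 0 if T: equals p
plus_minus = expected_value({1: p, -1: 1 - p})  # X = 1 if H, -1 if T: equals 2p - 1
```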


Random Variables II

Example

Toss two fair dice. Let X be the sum of the faces.

P(X = 2) = 1/36 P(X = 3) = 2/36 P(X = 4) = 3/36

P(X = 5) = 4/36 P(X = 6) = 5/36 P(X = 7) = 6/36

P(X = 8) = 5/36 P(X = 9) = 4/36 P(X = 10) = 3/36

P(X = 11) = 2/36 P(X = 12) = 1/36

Then EX = 2 · 1/36 + 3 · 2/36 + 4 · 3/36 + 5 · 4/36 + 6 · 5/36 + 7 · 6/36 + 8 · 5/36 + 9 · 4/36 + 10 · 3/36 + 11 · 2/36 + 12 · 1/36 = 252/36 = 7.
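The distribution of the sum of two fair dice can be reproduced by listing all 36 equally likely outcomes. The sketch below uses exact fractions, so the results match the hand computation exactly.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die 1, die 2) outcomes give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

# Expected value: sum of k * P(X = k).
mean = sum(s * prob for s, prob in dist.items())
print(mean)  # 7
```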


Motivation

Suppose you have a coin that comes up H 30% of the time and T 70% of the time. You get paid $2 if H, and you pay out $1 if T. What do you expect to have after 100 tosses? The intuition says it should be the average, 2 · 30 − 1 · 70 = −10. For one toss you’d expect 2 · 0.3 + (−1) · 0.7 = −0.1. Repeat 100 times, and you expect to have lost $10.

The mathematics is the Expected Value. We interpret P(H) = 0.3 and P(T) = 0.7. The expected value is EX = 2 · P(H) + (−1) · P(T). For n tosses you expect nEX.


Expected Value of a Sum of Independent Variables I

Definition

Two random variables X1, X2 are called independent if P(X1 = a, X2 = b) = P(X1 = a) · P(X2 = b) for all values a and b. More generally, X1, . . . , Xn are called independent if

\[ P(X_{i_1} = a_1, \dots, X_{i_k} = a_k) = P(X_{i_1} = a_1) \cdots P(X_{i_k} = a_k) \]

for any choice of a subset of the variables and any values a1, . . . , ak.

Theorem

Let X1, . . . , Xn be independent random variables. Then E(X1 + · · · + Xn) = EX1 + · · · + EXn.

We illustrate the proof for the case n = 2, E(X1 + X2) = EX1 + EX2. This is the warmup. Suppose each variable takes only two values:

P(X1 = a1) = p    P(X1 = a2) = 1 − p

P(X2 = b1) = q    P(X2 = b2) = 1 − q.


Expected Value of a Sum of Independent Variables II

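\[ EX_1 = a_1 p + a_2 (1 - p) \]

\[ EX_2 = b_1 q + b_2 (1 - q) \]

By independence, each pair of values occurs with the product of the two individual probabilities, so

\[ E(X_1 + X_2) = (a_1 + b_1)pq + (a_1 + b_2)p(1 - q) + (a_2 + b_1)(1 - p)q + (a_2 + b_2)(1 - p)(1 - q) \]

Gather the terms according to the a’s and b’s, and do the algebra.

\[ a_1(pq + p(1 - q)) = a_1 p \]

\[ a_2((1 - p)q + (1 - p)(1 - q)) = a_2(1 - p) \]

\[ b_1((1 - p)q + pq) = b_1 q \]

\[ b_2(p(1 - q) + (1 - p)(1 - q)) = b_2(1 - q). \]

Adding these four lines gives E(X1 + X2) = EX1 + EX2.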
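The n = 2 case can also be checked numerically. The sketch below uses made-up values and probabilities for two independent two-valued variables (all numbers are hypothetical), with exact fractions so the two sides can be compared for equality.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distributions, chosen only for illustration: {value: probability}.
x1 = {Fraction(2): Fraction(1, 3), Fraction(-1): Fraction(2, 3)}  # a1, a2 with p, 1 - p
x2 = {Fraction(5): Fraction(1, 4), Fraction(0): Fraction(3, 4)}   # b1, b2 with q, 1 - q

def mean(dist):
    """Expected value of a finite distribution."""
    return sum(value * prob for value, prob in dist.items())

# Independence: P(X1 = a, X2 = b) = P(X1 = a) * P(X2 = b).
mean_of_sum = sum(
    (a + b) * pa * pb for (a, pa), (b, pb) in product(x1.items(), x2.items())
)
sum_of_means = mean(x1) + mean(x2)
```

Here both `mean_of_sum` and `sum_of_means` come out to the same fraction, as the algebra predicts.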


Expected Value of a Sum of Independent Variables III

Example (Binomial distribution)

For n independent identical trials each with two possible outcomes, S and F, with probability p and 1 − p, and X the number of S, the distribution is \(P(X = k) = C(n, k)\,p^{k}(1 - p)^{n-k}\). The expected value is

\[ EX = \sum_{k=0}^{n} k\,C(n, k)\,p^{k}(1 - p)^{n-k} = np. \]

For a single trial, EX = p · 1 + 0 · (1 − p) = p. The general case can be computed directly using algebra.
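One way to carry out that direct computation, given here as a sketch since the slides do not spell it out, uses the identity k C(n, k) = n C(n − 1, k − 1) and the binomial theorem. The k = 0 term is zero and is dropped in the first line.

```latex
\begin{align*}
EX &= \sum_{k=1}^{n} k\,C(n,k)\,p^{k}(1-p)^{n-k} \\
   &= \sum_{k=1}^{n} n\,C(n-1,k-1)\,p^{k}(1-p)^{n-k} \\
   &= np \sum_{j=0}^{n-1} C(n-1,j)\,p^{j}(1-p)^{(n-1)-j} \qquad (j = k-1) \\
   &= np\,\bigl(p + (1-p)\bigr)^{n-1} = np.
\end{align*}
```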


Binomial Distribution

We apply this to the binomial distribution, X1, . . . , Xn i.i.d. (independent identically distributed random variables) with probability distribution P(X = 1) = p, P(X = 0) = 1 − p. Then

EX = 1 · p + 0 · (1 − p) = p.

So

\[ E(X_1 + \cdots + X_n) = \underbrace{p + \cdots + p}_{n} = np. \]

The case of two dice is similar; X = X1 + X2. Then

EX1 = EX2 = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + 4 · 1/6 + 5 · 1/6 + 6 · 1/6 = 21/6 = 7/2.

So E(X1 + X2) = 7/2 + 7/2 = 7.


Statistics, Measures of Central Tendency I

We are considering a random variable X with a probability distribution which has some parameters. We want to get an idea what these parameters are. We perform an experiment n times and record the outcome. This means we have X1, . . . , Xn i.i.d. random variables, with the same probability distribution as X. We want to use the outcome to infer what the parameters are.

Mean  The outcomes are x1, . . . , xn. The Sample Mean is \(\bar{x} := \dfrac{x_1 + \cdots + x_n}{n}\), also sometimes called the average. The expected value of X, EX, is also called the mean of X. It is often denoted by µ and sometimes called the population mean.

Median  The number so that half the values are below, half above. If the sample is of even size, you take the average of the two middle terms.

Mode  The number that occurs most frequently. There could be several modes, or no mode.
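These three measures are available in Python's standard `statistics` module. The sample below is made up purely for illustration.

```python
import statistics

sample = [3, 7, 7, 2, 9, 4]  # hypothetical outcomes x1, ..., xn

sample_mean = statistics.mean(sample)        # (3 + 7 + 7 + 2 + 9 + 4) / 6
sample_median = statistics.median(sample)    # even size: average of the two middle terms
sample_modes = statistics.multimode(sample)  # list of all most frequent values
```

`multimode` returns every value tied for most frequent, so a sample with several modes yields several entries, and a sample whose values are all distinct yields the whole list.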


Statistics, Measures of Central Tendency II

Example

You have a coin for which you know that P(H) = p and P(T) = 1 − p. You would like to estimate p. You toss it n times. You count the number of heads. The sample mean should be an estimate of p.

EX = p, and E(X1 + · · · + Xn) = np. So

\[ E\left(\frac{X_1 + \cdots + X_n}{n}\right) = p. \]


Descriptive Statistics I

Frequency Distribution  Divide the range of the data into a number of equal disjoint intervals. For each interval count the number of elements of the sample occurring in it.

Histogram  See the next slide.

Grouped Data Mean  Essentially calculate the mean of the frequency distribution. Intervals are used, rather than single values. It is assumed that all these values are located at the midpoint of the interval. The letter \(x_M\) is used to represent the midpoints and f represents the frequencies:

\[ \bar{x} = \frac{\sum x_M f}{n} \]

Frequency Polygon  Connect the midpoints of the tops of the histogram bars.


Histogram

A histogram is a graphical representation of the distribution of numerical data. It is a kind of bar graph. To construct a histogram, the first step is to "bin" the range of values, that is, divide the entire range of values into a series of intervals, and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of a variable. The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size.

Bin               Count
−3.5 to −2.51         9
−2.5 to −1.51        32
−1.5 to −0.51       109
−0.5 to 0.49        180
0.5 to 1.49         132
1.5 to 2.49          34
2.5 to 3.49           4

Mean:

\[ \frac{(-3) \cdot 9 + (-2) \cdot 32 + (-1) \cdot 109 + 0 \cdot 180 + 1 \cdot 132 + 2 \cdot 34 + 3 \cdot 4}{500} = \frac{12}{500} = 0.024 \]
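The grouped-data mean for the histogram bins can be computed in a few lines. This sketch takes the midpoints −3, . . . , 3 used in the mean formula and the counts from the bin table.

```python
# (midpoint x_M, frequency f) for each histogram bin
bins = [(-3, 9), (-2, 32), (-1, 109), (0, 180), (1, 132), (2, 34), (3, 4)]

n = sum(f for _, f in bins)                         # sample size
grouped_mean = sum(x_m * f for x_m, f in bins) / n  # sum of x_M * f, divided by n
print(n, grouped_mean)  # 500 0.024
```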


Example

The table on the next page gives the number of days in June and July of recent years in which the temperature reached 90 degrees or higher in New York’s Central Park. Source: The New York Times and Accuweather.com.

a. Prepare a frequency distribution with a column for intervals and frequencies. Use seven intervals, starting with [0, 4].

b. Sketch a histogram and a frequency polygon, using the intervals in part a.

c. Find the mean for the original data.

d. Find the mean using the grouped data from part a.

e. Explain why your answers to parts c and d are different.

f. Find the median and the mode for the original data.


Temperature Data

9.1 Frequency Distributions; Measures of Central Tendency (p. 417)

a. Use this table to estimate the mean income for white households in 2008.

b. Compare this estimate with the estimate found in Exercise 39. Discuss whether this provides evidence that white American households have higher earnings than African American households.

41. Airlines  The number of consumer complaints against the top U.S. airlines during the first six months of 2010 is given in the following table. Source: U.S. Department of Transportation.

Airline           Complaints   Complaints per 100,000 Passengers Boarding
Delta                   1175   2.19
American                 660   1.56
United                   487   1.84
US Airways               428   1.69
Continental              350   1.64
Southwest                149   0.29
Skywest                   77   0.65
American Eagle            68   0.87
Expressjet                56   0.70
Alaska                    34   0.44

a. By considering the numbers in the column labeled “Complaints,” calculate the mean and median number of complaints per airline.

b. Explain why the averages found in part a are not meaningful.

c. Find the mean and median of the numbers in the column labeled “Complaints per 100,000 Passengers Boarding.” Discuss whether these averages are meaningful.

Life Sciences

42. Pandas  The size of the home ranges (in square kilometers) of several pandas were surveyed over a year’s time, with the following results.

Home Range   Frequency
0.1–0.5             11
0.6–1.0             12
1.1–1.5              7
1.6–2.0              6
2.1–2.5              2
2.6–3.0              1
3.1–3.5              1

a. Sketch a histogram and frequency polygon for the data.

b. Find the mean for the data.

43. Blood Types  The number of recognized blood types varies by species, as indicated by the table below. Find the mean, median, and mode of this data. Source: The Handy Science Answer Book.

Animal            Number of Blood Types
Pig               16
Cow               12
Chicken           11
Horse              9
Human              8
Sheep              7
Dog                7
Rhesus monkey      6
Mink               5
Rabbit             5
Mouse              4
Rat                4
Cat                2

General Interest

44. Temperature  The following table gives the number of days in June and July of recent years in which the temperature reached 90 degrees or higher in New York’s Central Park. Source: The New York Times and Accuweather.com.

Year  Days    Year  Days    Year  Days
1972    11    1985     4    1998     5
1973     8    1986     8    1999    24
1974    11    1987    14    2000     3
1975     3    1988    21    2001     4
1976     8    1989    10    2002    13
1977    11    1990     6    2003    11
1978     5    1991    21    2004     1
1979     7    1992     4    2005    12
1980    12    1993    25    2006     5
1981    12    1994    16    2007     4
1982    11    1995    14    2008    10
1983    20    1996     0    2009     0
1984     7    1997    10    2010    20

a. Prepare a frequency distribution with a column for intervals and frequencies. Use six intervals, starting with 0–4.

b. Sketch a histogram and a frequency polygon, using the intervals in part a.

c. Find the mean for the original data.

d. Find the mean using the grouped data from part a.

e. Explain why your answers to parts c and d are different.

f. Find the median and the mode for the original data.
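As a sketch of how the Central Park temperature exercise could be worked by computer, the code below enters the 39 yearly counts (1972 through 2010) and uses intervals of width 5 starting at 0–4. Six such intervals already cover the largest value, 25. The variable names are illustrative, and a hand solution would follow the same steps.

```python
import statistics
from collections import Counter

# Days per year at 90 degrees or higher, 1972 through 2010, in order.
days = [
    11, 8, 11, 3, 8, 11, 5, 7, 12, 12, 11, 20, 7,   # 1972-1984
    4, 8, 14, 21, 10, 6, 21, 4, 25, 16, 14, 0, 10,  # 1985-1997
    5, 24, 3, 4, 13, 11, 1, 12, 5, 4, 10, 0, 20,    # 1998-2010
]

width = 5  # intervals 0-4, 5-9, 10-14, ...
freq = Counter(d // width for d in days)
# Frequency distribution as {(low, high): count}, including any empty intervals.
intervals = {(i * width, i * width + width - 1): freq[i] for i in range(max(freq) + 1)}

mean_original = statistics.mean(days)
# Grouped mean: treat every value as if it sat at the midpoint of its interval.
mean_grouped = sum((lo + hi) / 2 * f for (lo, hi), f in intervals.items()) / len(days)
median = statistics.median(days)
modes = statistics.multimode(days)
```

The two means differ because the grouped version replaces each data value by the midpoint of its interval, which loses the exact values.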
