
Math 13 - Statistics - Poster Handout - Ch 1–13 - (Version 1.3)

1. Introduction to Statistical Thinking

• The study of Statistics concerns data about things.

• A unit, case or subject is one of the things we are talking about. Units that are people are called subjects.

• A variable names a property a unit can have.

• A variable has a type describing its possible values.

Variable                  Possible Values                   Type of Variable
gender                    female, male                      dichotomous categorical
race/ethnicity            Black, Hispanic, white, other     unordered categorical
height (nearest inch)     55, 56, 57, ..., 77, 78           discrete numerical
age at last birthday      16, 17, 18, ..., 56, 57, ...      discrete numerical
age group                 16–19, 20–24, 25–70               grouped numerical
if unit's age ≥ 21        yes, no                           dichotomous categorical
writing hand              left, right                       dichotomous categorical
expected difficulty       1, 2, 3, 4, 5                     ordered categorical or discrete numerical
GPA (within 0.01)         2.00, 2.01, ..., 3.99, 4.00       almost continuous numerical
sleep/night (to 0.5 hr)   4, 4.5, 5, ..., 9, 9.5, 10        discrete numerical
feelings about math       A, B, C, D, E                     ordered categorical
major                     AH, BF, CESM, HE, other           unordered categorical

• The intrinsic type of a variable is the type the variable would have if there were no approximation or grouping.

• A categorical variable places units in categories according to their characteristics. A characteristic is a value a categorical variable can have.

• The frequency of a characteristic is the number of units with that characteristic.

• A subgroup is a smaller group within a larger group.

• The percentage of units in a group or a subgroup which have a certain characteristic is:

p = (# units w/ characteristic / total # units in group) × 100.

• There is an association between two categorical variables if the subgroups formed by one variable differ in percentages of a characteristic from the other variable. Two variables being associated means if you know a unit's value for one variable, you can say something about the tendency of the unit's value for the other variable.

• E.g., You hear, "He likes broccoli." Can you say anything about his height? No. So, the variables likes broccoli and height are not associated.

You hear, "He is in the NBA." Can you say anything about his height? Yes, people in the NBA tend to be tall. So, the variables in the NBA and height are associated.

• E.g., In the raw data for 50 students on page 7, is there an association between gender and Math feelings?

The variables can be taken in either order, so make the 1st variable gender, whose values are "female" and "male". The 2nd variable is Math feelings with D or E meaning "positive" and A, B and C meaning "negative."

Using gender, divide the group into female and male. From the data, there are 32 females and 18 males. Of the females, there are 4 with positive feelings. There are 8 males with positive feelings. Calculate the percentages:

pfp = #(females & positive)(100%)/#(females) = 4(100%)/32 = 12.5%,

pmp = #(males & positive)(100%)/#(males) = 8(100%)/18 = 44.44%.

The percentages are unequal so there is an association. Since pfp < pmp, the direction is that males tend to have more positive feelings about math than females. The very different percentages mean a strong association.
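The subgroup-percentage calculation above is easy to script. A minimal Python sketch, assuming only the counts from the example (4 of 32 females and 8 of 18 males with positive feelings); the dictionary layout is ours, not the handout's:

    # Counts taken from the worked example above; the layout is illustrative.
    counts = {"female": {"positive": 4, "total": 32},
              "male":   {"positive": 8, "total": 18}}

    for group, c in counts.items():
        p = 100 * c["positive"] / c["total"]   # percentage formula from above
        print(f"{group}: {p:.2f}% positive")
    # Unequal percentages (12.50% vs 44.44%) -> the variables are associated.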

• Association is symmetric. If variable A is associated with variable B, then B is associated with A.

• For numerical variables, there are three commonly used "average" values:

• Find the mean by adding all the values and then dividing by the number of values. E.g., the mean of 11, 12, 13, 15 is (11 + 12 + 13 + 15)/4 = 12.75.

• The median is the middle value. Half the data is less than the median and half is more. To find it, list the values in order, including repeated values. Find the middle number. If there is no one middle number, take the mean of the middle two. E.g., the median of 11, 12, 13, 15 is (12 + 13)/2 = 12.5.

• The mode is the value seen most often. E.g., the mode of 91, 92, 92, 92, 93, 94, 95 is 92.
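Python's statistics module computes all three "averages" directly; a quick check using the example values above:

    import statistics

    print(statistics.mean([11, 12, 13, 15]))             # 12.75
    print(statistics.median([11, 12, 13, 15]))           # 12.5 (mean of middle two)
    print(statistics.mode([91, 92, 92, 92, 93, 94, 95])) # 92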

• For a categorical variable and a numerical variable, the variables are associated if the subgroups differ in some numerical way, usually the mean or median.

• Procedure: To see if a categorical variable and a numerical one are associated:

1. Choose which average to use, mean or median, etc.

2. Find the average for each subgroup made by the categorical variable.

3. If the averages are all the same, there is no association. If the average differs between subgroups, there is association. The association is strong if the averages are very different. Tell the direction of association by saying which subgroups have the higher or lower average.

• E.g., look at gender and height, from example 7.4, p23.

1. The median has been chosen for this example.

2. The gender variable separates the group into female and male subgroups. We find the median female height is 64 inches. Similarly, we find the median male height is 69 inches.

3. Because the medians are not the same, there is an association. The direction is that males tend to be taller than females, and the association is pretty strong.
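A minimal sketch of the procedure in Python. The individual heights are hypothetical; only the medians 64 and 69 come from the example:

    import statistics

    # Hypothetical raw heights; chosen so the medians match the example.
    heights = {"female": [62, 63, 64, 65, 66], "male": [67, 68, 69, 70, 71]}
    medians = {g: statistics.median(h) for g, h in heights.items()}
    print(medians)  # {'female': 64, 'male': 69} -> unequal, so associated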


• We usually study a small group to find out about a large group. The large group is the population. The small group is a sample. The population size is the number of units in the population. The sample size is the number in the sample. Parameters are data about the population. Statistics are data about the sample. We give different symbols to corresponding parameters and statistics.

Variable type   Name         Population parameter   Sample statistic
—               Size         N                      n
Categorical     Frequency    F                      f
Categorical     Percentage   p = F/N                p̂ = f/n
Numerical       Mean         µx                     x̄
Numerical       Median       MDx                    mdx

• For a large enough, properly chosen sample, the sample statistics are usually close to the population parameters. You know or can calculate sample statistics exactly, but you may never know the population parameters for sure.

2. Introduction to Probability and Sampling

• A random procedure is a procedure whose outcome cannot be known ahead of time. Depending on the point of view, the outcome may be interpreted different ways. E.g., a roulette ball stops on 32 red, but we only care about red.

• A simple outcome is an outcome connected to one unit. The simple outcomes of a random procedure are combined to form any other outcome. Simple outcomes cannot be divided into smaller outcomes.

• A composite outcome is an outcome made up of 2 or more simple outcomes.

• An event is an outcome, simple or composite.

• ELSOs are Equally Likely Simple Outcomes, like a fair die showing an integer 1–6.

• If the outcome of a random procedure can be any one of N ELSOs, and an event consists of F of them, then the Fundamental Principle of ELSOs says: the probability of the event is F/N.

• A dichotomous trial is a random procedure with exactly 2 outcomes, success or failure. The outcomes may be composite, made up of one or more ELSOs. The success probability is p, and the probability of failure is q. p + q = 1 in numbers, or p + q = 100% in percentages. E.g., flipping a coin, which could be unfair, with a success being heads and a failure being tails and p = probability of heads = 0.6, and q = probability of tails = 0.4.

• In repeated dichotomous trials a trial is done many times. Over n trials, we record f successes and find p̂ = f/n = observed success rate = relative frequency = percentage of successes.

• Repeated dichotomous trials can be visualized with a triangle grid where the n-th level represents the possibilities after n trials.

• Identical dichotomous trials are trials with the same probability of success p. Independent dichotomous trials are trials where the outcome of one trial does not affect the p of another. Independent identical dichotomous trials, like flipping the same coin many times, are called IID trials.

• The Fundamental Principle for IID Trials: When many IID trials are done, p̂ is probably close to p.

• In sampling, one or more units are chosen from a population.

• In random sampling, the probability of being selected is the same for all units.

• In sampling with replacement, a unit is chosen and then put back. (E.g., when picking cards from a deck with replacement, it is assumed that the deck is reshuffled after a picked card is put back.)

• Simple random sampling is sampling without replacement. Chosen units are set aside and not used again.

• Suppose that percentage p of a population has a characteristic and a coin has probability p of showing heads. Then, sampling the population is equivalent to repeated dichotomous trials, i.e., flipping the coin many times.

Correspondences between Sampling & IID Trials

Random Sampling                           Coin Flips (IID Trials)
Pick unit                                 Flip coin
Unit has the characteristic               Coin shows heads
p = % with characteristic                 p = probability of heads
n = sample size                           n = # of flips
f = # chosen with characteristic          f = # of heads seen
p̂ = f/n = % chosen w/ characteristic     p̂ = f/n = % heads seen
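A small simulation of this correspondence, with an illustrative p = 30% that is not from the handout; as n grows, p̂ = f/n settles near p:

    import random

    p = 0.30                      # true % with the characteristic (illustrative)
    for n in [100, 1000, 10000]:
        f = sum(random.random() < p for _ in range(n))  # n "coin flips"
        print(n, f / n)           # p-hat = f/n, probably close to p as n grows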

• There are basically three good ways to sample.

1. Sampling With Replacement: a large sample probably gives p̂ close to p.

2. Sampling Without Replacement from a Population Much Larger than the Sample: with N >> n, p̂ is probably close to p.

3. Sampling Without Replacement from a Population Not Much Larger than the Sample: this is like asking almost everybody in the population, so p̂ is likely closer to p than the p̂ found in (1).

• Fundamental Principles of Sample Size/Number of Trials apply to both sampling and IID trials because of the correspondences in the above table.

1. The larger n is, the less likely p̂ = p exactly, "hitting the nail on the head."

2. The larger n is, the more likely p̂ is close to p.

3. The larger n is, the less likely p̂ is far from p.

Comment: p̂ = f/n, so as n gets large, p̂ becomes a long decimal number. But, while it probably gets closer to p, it is harder for it to be exactly p.

• With good sampling, p̂ is probably close to p, so p is the expected value of p̂.

• Use the Seesaw Charts at the back of the book to answer questions related to the Fundamental Principles of Sample Size/Number of Trials above.


• Chart EX relates sample size n, the probability that p̂ = p, and true percentage p. Given any two, you can find the third. Use the Independent Trials side when:

• Sampling with replacement,

• Sampling w/o replacement from a large population,

• Independent identical dichotomous (IID) trials.

Use the Sampling without Replacement side when you are sampling n = 20% of the population without replacement.

• Chart C has a page for each p. It relates the sample size n, deviation D and the probability that p̂ is in the interval (p−D, p+D).

• Chart T is like Chart C, but instead tells you the probability that p̂ is outside (p−D, p+D). The "two tails" probability is the probability of being above or below the interval. The "one tail" probability is the probability of being on one specified side only, either above or below, but not both.

3. Introduction to Surveys, Confidence Intervals, and Hypothesis Testing

Survey Methods

• The goal of sampling design is to get a representative sample, similar to the population for some variable.

• From a survey we calculate a statistic p̂, which is an estimate of the population parameter p, but

p̂ = p + sampling error = p + bias + random error.

• Bias is a systematic error caused by improper sampling. Proper random sampling can make the bias zero.

• Non-random (Biased) Sampling Designs: Limited response; Biased selection; and Volunteer, Convenience or Judgment sampling, are all probably useless.

• Random Sampling Designs (scientific sampling): With a large sample and good design, an estimate of the population parameter can be made and the size of the random error can be calculated. Methods include:

– Random sampling with replacement.

– Random sampling without replacement (aka Simple Random Sampling, or SRS).

– Stratified sampling: population is divided into groups, then each group is randomly sampled.

– Cluster sampling: divide population into groups, choose some groups at random, take all units in those groups into the sample.

– Systematic sampling: order units independently of the variable of interest, then pick every 10th unit, for example.

– Multi-stage sampling: a combination of the above methods.

– Haphazard sampling: use existing randomness to choose.

Confidence Intervals

• We use p̂ from random sampling to estimate the unknown value p. A confidence interval is an interval that contains the unknown population parameter with a probability c called the confidence level. The margin of error E is the likely maximum error when estimating p using p̂. (Using p̂ to approximate p, the biggest mistake we can make is probably ±E.) The chance that p ∈ (p̂−E, p̂+E) is c.

• There is a "nicE" relationship between sample size n, confidence level c and margin of error E. Think of the letters n, c, and E as possible pivot points on a 3-pivot seesaw.

1. If c is constant, it is a regular seesaw. If n goes up, E goes down, and vice versa. So, we can trade off n and E while keeping the same c. We can shrink the margin of error by increasing the sample size while keeping the same confidence level.

2. If E is constant, then n and c increase or decrease together. For a given margin of error, we can increase the confidence level by increasing the sample size.

3. If n is the constant pivot, then c and E go up or down together. This means that for the same sample size, the confidence level goes up with the margin of error. If the interval gets bigger, we have more confidence that the parameter lives inside.

4. Seesaw Chart CI puts actual numbers to the above relationships.

• Unless told otherwise, use a confidence level c = 95%.

• For c = 95%, this table for Margin of Error E can be used when sampling with replacement, or without replacement with n << N, and p̂ ends up between about 10% and 90%.

p̂ is between:               E is approximately:
38% – 62%                   100%/√n
26% – 37% or 63% – 74%      90%/√n
18% – 25% or 75% – 82%      80%/√n
13% – 17% or 83% – 87%      70%/√n
9% – 12% or 88% – 91%       60%/√n

Note: Since √n is in the denominator, to shrink E by a factor of 2, we would need to survey 2² = 4 times as many units. As long as n << N, N itself does not matter. Usually, but not always, the confidence interval will contain the parameter p. You cannot know p for sure without asking every unit in the population. p is unknown, but not random.
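The table lends itself to a small lookup function. A sketch, with the brackets and numerators copied from the table above; p̂ outside the 9%–91% range is left unhandled, as in the handout:

    import math

    def margin_of_error(p_hat, n):
        """c = 95% margin of error from the handout's table."""
        d = min(p_hat, 1 - p_hat)   # distance from the nearer of 0% / 100%
        if   d >= 0.38: k = 1.00
        elif d >= 0.26: k = 0.90
        elif d >= 0.18: k = 0.80
        elif d >= 0.13: k = 0.70
        elif d >= 0.09: k = 0.60
        else: raise ValueError("p-hat outside the table's 9%-91% range")
        return k / math.sqrt(n)

    print(margin_of_error(0.60, 400))  # 38%-62% row -> 100%/sqrt(400) = 0.05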

Hypothesis Testing

• In hypothesis testing we check how likely something is to happen by chance; if that probability is less than 5%, we conclude that something more than chance is at work. Assuming only chance is at work is the Null Hypothesis. For now, we only handle a situation like a guessing game of n IID trials, each with a probability p of success, and we only have tables for p = 20%, 30%, 40%, 50%, 60%, 70%, or 80%.


• Procedure for simple hypothesis testing

1. Assuming only chance is at work, find the probability of success p for each independent trial.
(ESP example: A person claiming ESP guesses the symbol on the backs of 250 randomly chosen cards. She knows there are 5 possible symbols, so the chance of success is p = 1/5 = 20%.)

2. Do the trials and find the success rate p̂ = f/n. Think about how it would be if something more than just chance was at work.
(ESP: She may have ESP if her success rate is high. So, we want to know the probability that she did as well as she did or better. This is a 1-tail probability.)

3. Find the deviation D = |p̂ − p| because the seesaw chart needs D to find a p-value. If p is 20% or 80% use Chart T-20/80. If p is 30% or 70% use Chart T-30/70. If p is 40% or 60% use Chart T-40/60. If p is 50% use Chart T-50.
(ESP: D = |p̂ − p| = 22% − 20% = 2%. Use this D, and n = 250, in Chart T-20/80 to find the p-value for independent trials.)

4. Compare the p-value to 5%. A p-value < 5% is considered evidence that it is not just chance at work.
(ESP: p-value ≈ 21% > 5%. We conclude that only chance is at work, not ESP. If we had tested 100 people without ESP, about 21 would have done as well or better. People with ESP should do much better. If she had gotten a lot of cards right, the p-value would have been below 5% and we would believe she had ESP.)
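For comparison, the 1-tail probability in the ESP example can be computed exactly rather than read off Chart T-20/80. A sketch, assuming the guesser got f = 55 of n = 250 cards right (so p̂ = 22%):

    from math import comb

    n, p, f = 250, 0.20, 55
    # P(X >= 55) for X ~ Binomial(250, 0.2): sum the exact binomial terms.
    p_value = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(f, n + 1))
    print(p_value)  # roughly 0.24, in the same ballpark as the chart's ~21%
    # Either way, well above 0.05, so "just chance" stands.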

4. Univariate and Bivariate Categorical Data

Univariate Categorical Data: Tables

• A sample (size n) is part of a population (size N). A categorical variable separates a sample into categories. The number of units in a category is its frequency f. If the sample is the entire population, the frequency of a category is F. The relative frequency in the sample is p̂ = f/n expressed as a percent. For a population, the relative frequency would be p = F/N as a percent.

• For a sample, the sum of all the frequencies f is the sample size n. For a population, the sum of all the frequencies F is the population size N.

∑f = n,  ∑F = N.

• If all the categories have the same frequency, the units are uniformly distributed. If some categories have more units than others, the distribution is non-uniform.

• In a non-uniform distribution, the category with the highest frequency is the modal category.

• The modal category is not necessarily the majority.

Univariate Categorical Data: Graphs and Charts

• Bar charts can show frequency or relative frequency.

• Stacked bar charts of relative frequency combine the bars so that the total height can be seen to be 100%.

• Truncated bar charts do not start at zero and may be deceptive.

• Charts may be placed side by side, or back to back, for comparison.

• Dot charts are simplified bar or stacked bar charts with additional scales to convey more data.

• Pictographs use pictures to represent data. Pictographs may be deceptive because we tend to extrapolate an object's volume and not look just at its height.

• Pie charts show relative frequency and are especially good for unordered categorical data.

Bivariate Categorical Data: Tables

• Let's look at contingency tables with the example from Sec. 4.3, but using symbols instead of numbers. Let N = sample size, and the number in each category be: M = Male, F = Female, y = yes, n = no. Fig. 4.23 summarizes the results of the survey.

(4.23)          Male       Female     Row Total
yes             yM = My    yF = Fy    y
no              nM = Mn    nF = Fn    n
Column Total    M          F          N

where yM = "yes responders who are Male," while Fn = "Females who said no," etc. Notice that yM = My because the number of "yes responders who are Male" is the same as "Males who said yes," etc.

• Next, we analyze by gender by dividing each gender column by the number of the gender at the top of the column and summing them at the bottom. For nicety, the last column was divided by the sample size N. The book does this in Fig. 4.24. We can think of My/M = "Males who said yes divided by all Males" or of yM/M = "yes sayers who are Male divided by all Males." They are the same number.

(4.24)          Male              Female            Row Total
yes             My/M              Fy/F              y/N
no              Mn/M              Fn/F              n/N
Column Total    (My+Mn)/M = 1     (Fy+Fn)/F = 1     (y+n)/N = 1

• Now, we analyze by the yes/no response by going back to the data in (4.23) and dividing each row by the number who gave the response at the head of the row. The book does this in Fig. 4.25. It may be easier to think of yM/y = "yes responders who are Male divided by all yes responders", etc. Again, for nicety, the last row was divided through by N.

(4.25)          Male     Female    Row Total
yes             yM/y     yF/y      (yM+yF)/y = 1
no              nM/n     nF/n      (nM+nF)/n = 1
Column Total    M/N      F/N       (M+F)/N = 1

• Finally, we analyze by overall percentage by again going back to (4.23) and dividing by the total sample size N. The book does this in Fig. 4.26.


(4.26)          Male            Female          Row Total
yes             yM/N = My/N     yF/N = Fy/N     (yM+yF)/N = (My+Fy)/N = y/N
no              nM/N = Mn/N     nF/N = Fn/N     (nM+nF)/N = (Mn+Fn)/N = n/N
Col. Tot.       M/N             F/N             (M+F)/N = (y+n)/N = N/N = 1

• This completes the analysis. (4.26) is true, but not as interesting as (4.24) or (4.25). From the original data in (4.23), we got (4.24) by dividing each column by the total in that column. We got (4.25) by dividing each row by the total in that row. These tables were done with symbols, but filling in numbers from the book's Fig. 4.23 will make all the book's results come out.

Bivariate Categorical Data: Association and Independence

• Let us fully work out the example of Age versus Hours of sleep on p129. The original data is:

                      Hours of sleep
Age           6 or less   6.5–7.5   8 or more   Total
under 21      7           7         4           18
21 or over    10          13        7           30
Total         17          20        11          48

• These all ask the same question: "Is Age associated with Hours of sleep?", "Do different Ages have different Sleep hours?" Age is the 1st variable, so divide each Age row by the row total to get:

Age           6 or less       6.5–7.5         8 or more
under 21      7/18 = 38.9%    7/18 = 38.9%    4/18 = 22.2%
21 or over    10/30 = 33.3%   13/30 = 43.3%   7/30 = 23.3%

• Comparing the percentages for under 21 vs. 21 or over, by looking across the table, we see there is a small trend for older people to get more sleep.

• The same question reversed: "Is Sleep time associated with Age?", "Do different sleep times imply different Ages?" Sleep hours is the 1st variable, so divide each sleep column by the column total:

Age           6 or less       6.5–7.5         8 or more
under 21      7/17 = 41.2%    7/20 = 35.0%    4/11 = 36.4%
21 or over    10/17 = 58.8%   13/20 = 65.0%   7/11 = 63.6%

• For each sleep time, compare the percentages by Age down the table. We see that if a subject gets more sleep, he/she tends to be older, by a medium-strong trend.

• We have analyzed the data by rows and by columns and saw an association both times because association is symmetric. Together, the analyses give a very good understanding of the data. Sometimes analyzing just by rows or just by columns already gives enough understanding of the data. Sometimes, one way is more compelling than the other.

• No association is Independence, in which case all parallel percentages (row or column) would be equal.
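Both analyses can be reproduced with a few lines of Python; a sketch using the counts from the table above:

    rows = ["under 21", "21 or over"]
    counts = [[7, 7, 4],      # columns: 6 or less, 6.5-7.5, 8 or more
              [10, 13, 7]]

    for r, row in zip(rows, counts):                 # analyze by Age (rows)
        total = sum(row)
        print(r, [f"{100*v/total:.1f}%" for v in row])

    col_totals = [sum(col) for col in zip(*counts)]  # analyze by sleep (columns)
    for r, row in zip(rows, counts):
        print(r, [f"{100*v/t:.1f}%" for v, t in zip(row, col_totals)])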

5. Univariate Numerical Data

Frequency and Relative Frequency Tables

• Our data is a list of the values x with frequency f being the number of times each value occurs. The sum of the frequencies is n, the total number of values. The relative frequency p̂ is the frequency divided by n. E.g.,

Hours x   Frequency f   Relative frequency p̂
4         1             1/7 = 14.3%
5         2             2/7 = 28.6%
8         4             4/7 = 57.1%
Totals    7             100%

• A group like "Age 20–24" is intrinsically continuous. It actually includes ages from 20.000 to 24.999. So the group or class boundaries are 20 and 25, and the class width or span is 25 − 20 = 5 years. The density is the frequency divided by the span in units/year. E.g., one line from a table using spans of ages, with n = 40, could be:

Age span   span   freq.   rel. freq.   freq. density   rel. freq. density
20–24      5      10      25%          10/5 = 2         25%/5 = 5%

Histograms

• A histogram plots frequency or relative frequency versus equally grouped values, with the height representing the frequency. For unequally grouped values, use the frequency density, but note that the height is now density and the area represents the frequency.

The “Average” Value: Mean, Median, and Mode

• Summation notation: ∑x can be read "sum over all x" and means "add up all the x values."

• The mean is found by adding all the values and dividing by the number of values. (µ is pronounced "mu".)

                Population                             Sample
mean            µx = (1/N)∑x = (1/N)∑(Fx) = ∑(px)      x̄ = (1/n)∑x = (1/n)∑(fx) = ∑(p̂x)
deviation       x − µx                                 x − x̄
                ∑(x − µx) = 0                          ∑(x − x̄) = 0

• Beware: Extreme values, big or small, shift the mean.

5

Page 6: Math 13 - Statistics - Poster Handout - Ch 1–13 - (Version 1.3) · Math 13 - Statistics - Poster Handout - Ch 1–13 - (Version 1.3) 1. Introduction to Statistical Thinking •

• The median is the middle value in a set of ordered values. E.g., find the median from a frequency or a relative frequency table:

Left table:             Right table:
x       f    p̂         x       f    p̂
19      3    30%        19      4    40%
22      2    20%        22      2    20%
25      2    20%        25      2    20%
26      2    20%        26      1    10%
67      1    10%        67      1    10%
totals  10   100%       totals  10   100%

In the left table, start adding up the f values. 3 + 2 = 5 is exactly half of 10, so the median is between 22 and the next value 25. Take the middle value, (22+25)/2 = 23.5.

In the right table, use the relative frequency for practice. Adding, we get 40% + 20% = 60%, which is already more than 50%. We must have just passed the median, so it must be 22.

• The mode is the most popular value, if there is one. In a frequency histogram, the modal class has the highest bar. In a frequency density histogram, the modal class has the biggest area.
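A sketch of reading the median off a frequency table, using the left table above; expanding the (x, f) pairs stands in for "list the values in order, including repeated values":

    values = [(19, 3), (22, 2), (25, 2), (26, 2), (67, 1)]   # (x, f) pairs
    n = sum(f for _, f in values)                            # n = 10

    data = [x for x, f in values for _ in range(f)]          # expand repeats
    mid = n // 2
    median = (data[mid - 1] + data[mid]) / 2 if n % 2 == 0 else data[mid]
    print(median)  # (22 + 25) / 2 = 23.5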

The Weighted Average

• The weighted average is an average that puts more importance on some values than others. To compute it, multiply each value by its weight and add them all up. Then, divide by the sum of the weights.

weighted average x̄ = ∑(wx) / ∑w.

• E.g., Present and Future GPA: In your school career, you already have 20 units with an accumulated GPA of 3.500. This semester, you are taking classes worth 3, 4 and 5 units. If you get an 'A' in the 3 and 4 unit classes and a 'B' in the 5 unit class, what will your GPA for the semester be? And, what will your new accumulated GPA be?

• Solution using weighted averages. First, find the GPA for this semester. The values x are from your grade in a class, 4 for 'A', 3 for 'B'. The weights w are the units, because a high unit class is worth more than a low unit class.

semester GPA = ∑(wx) / ∑w = [3(4.0) + 4(4.0) + 5(3.0)] / (3 + 4 + 5) = 43/12 = 3.583.

Now, for your accumulated GPA: at semester's end, you will have an accumulated GPA from 20 units, and a semester GPA from 12 units. You combine them with a weighted average, again with the units as weights, because a GPA from more units counts more than a GPA with fewer units behind it.

new accumulated GPA = ∑(wx) / ∑w = [20(3.500) + 12(3.583)] / (20 + 12) = 113.000/32 = 3.531.

Your accumulated GPA can drag you down or keep you up, and it gains weight every semester. Get your 'A's early on instead of betting on the future, so the weighted average is on your side.
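The GPA arithmetic above as a small weighted-average helper; a sketch with the handout's numbers:

    def weighted_average(values, weights):
        return sum(w * x for w, x in zip(weights, values)) / sum(weights)

    semester = weighted_average([4.0, 4.0, 3.0], [3, 4, 5])   # 43/12 = 3.583
    new_gpa  = weighted_average([3.500, semester], [20, 12])  # 113/32 = 3.531
    print(round(semester, 3), round(new_gpa, 3))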

The Standard Deviation

• The standard deviation tells the average distance the data values are from the mean.

                      Population                Sample
standard deviation    σx = √(∑(x−µx)²/N)        sx = √(∑(x−x̄)²/(n−1))
z-score               zx = (x−µx)/σx            zx = (x−x̄)/sx

• The standard deviations (SDs) make a two-sided ruler with 0 at the mean and marks at ±1, ±2 and ±3 SDs. For any set of data, you can say:

• About 67% of the data is within 1 SD of the mean.

• About 95% of the data is within 2 SDs of the mean.

• Almost all of the data is within 3 SDs of the mean.

• The z-score for a data value tells how many standard deviations it is away from the mean. So, for any set of data: about 67% of the z-scores will be between −1 and 1; about 95% will be between −2 and 2; and almost all will be between −3 and 3.
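A sketch of the population vs. sample formulas in Python, whose statistics module has both: pstdev divides by N, stdev divides by n − 1. The data values are arbitrary illustrations:

    import statistics

    data = [2, 4, 4, 4, 5, 5, 7, 9]
    mu, sigma = statistics.mean(data), statistics.pstdev(data)  # population
    s = statistics.stdev(data)                                  # sample (n-1)

    z_scores = [(x - mu) / sigma for x in data]
    print(sigma, s, [round(z, 2) for z in z_scores])  # sigma = 2, s = 2.14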

Quartiles, Percentiles and Boxplots

• The median of a data set divides it in two. The quartile values divide each half into two again. There are three quartile values, Q1, Q2, and Q3. Q2 is the median of all the data. Q1 is the median of the lower half; Q3 is the median of the upper half.

• To find Q2, find the median. If n is even, Q2 is between two values (their sum divided by 2), clearly cutting the set in two. Find Q1 as the median of the lower half and Q3 as the median of the upper half. But, if n is odd, then Q2 lands on a data value. We choose to include the Q2 data point in both sides, and again find the medians of the lower and upper halves to get Q1 and Q3.

• The quartile regions are the regions marked out by the quartile values. Quartile values and regions work best for data that is almost continuous with a large sample size.

• To make a simple boxplot, place small marks at each end of the data. Place taller marks at the three quartile values. Connect the tall marks to form two boxes. Draw lines ("whiskers") from the end values to the boxes.

• A more sophisticated boxplot is made in the same way, but instead of the end values, the small marks are placed at the first data points inside the fences. The fences are at [Q1 − (1.5)(IQR)] and [Q3 + (1.5)(IQR)], where IQR = Q3 − Q1 is the interquartile range. Data outside of the fences are displayed as separate dots and are outliers.

• On a plot or list, cutting the data with 99 marks, with the k-th mark greater than k percent of the data, gives percentiles. E.g., the 30th mark would be greater than 30% of the data, but less than the other 70%. Also, the 25th percentile = Q1, the 50th percentile = Q2 = median, and the 75th percentile = Q3.
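A sketch of the quartile and fence arithmetic, following the handout's convention of including an odd-n median in both halves; the data values are arbitrary:

    import statistics

    data = sorted([10, 12, 13, 14, 15, 16, 18, 20, 45])
    n = len(data)
    q2 = statistics.median(data)
    lower = data[: n // 2 + (n % 2)]   # odd n: keep Q2's point in both halves
    upper = data[n // 2 :]
    q1, q3 = statistics.median(lower), statistics.median(upper)

    iqr = q3 - q1
    fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outliers = [x for x in data if x < fences[0] or x > fences[1]]
    print(q1, q2, q3, fences, outliers)  # 13 15 18 (5.5, 25.5) [45]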


6. Bivariate Numerical Data: Scatterplots, Correlation and Regression

Scatterplots

• A scatterplot displays bivariate numerical data by plotting them as points (x, y) on the x-y plane. x is the predictor or independent variable. y is the response or dependent variable.

• If y increases as x increases, the association is positive. If y decreases as x increases, the association is negative. If the points fall around a line, the association is linear and the variables form a linear pairing. (If the points fall on a horizontal line or a vertical line, there is actually only one variable, not two, because one is constant. Use the methods for univariate numerical data.) If the points follow a curve, the association is curvilinear and there is a curvilinear pairing. The clearer the trend, the stronger the association. If there is no trend, there is no association.

• For population data, the means of the two variables are µx and µy. The point of the means is the point (µx, µy). A horizontal line and a vertical line through this point divide the plane into 4 deviation quadrants (DQ), numbered I–IV, counterclockwise.

          DQ I   DQ II   DQ III   DQ IV
x − µx    +      −       −        +
y − µy    +      +       −        −

• For population data, the standard deviations (SDs) are σx and σy. (σ is pronounced "sigma".) Try to see the trend in the data to tell if the SD line should be drawn sloping up or sloping down. The SD line goes through the point (µx, µy) and the point (µx+σx, µy+σy) for a positive association. For a negative association, use the point (µx+σx, µy−σy). If the sign of the association is not clear, use the sign of the correlation coefficient (see below). The closer the points bunch around the SD line, the stronger the linear association.

The Correlation Coefficient

• The correlation coefficient ρxy ("rho x y") tells the sign and strength of the association between two linearly associated numerical variables x and y. If the data falls perfectly on a rising line then ρxy = 1; if on a falling line, ρxy = −1; otherwise it is somewhere between −1 and 1.

• On a standardized scatterplot, the scales on the x and y axes are adjusted so that σx = σy in visual appearance on the graph, not in numerical value. It allows us to compare variables that have very different values (e.g., height vs. shoe size) while keeping the scatterplot nice. On a standardized scatterplot, the SD line slants ±45°, depending on positive or negative association. ρxy tells how closely the points bunch around the SD line.

• On a standardized scatterplot, ρxy can be estimated by drawing an oval around the points and comparing the short width (a) of the oval to the long width (b):

a/b   ρxy          a/b   ρxy          a/b   ρxy
0.0   ≈ 1          0.4   0.70–0.75    0.8   0.20–0.25
0.1   0.95–0.99    0.5   ≈ 0.6        0.9   ≈ 0.1
0.2   0.90–0.95    0.6   0.45–0.50    1.0   ≈ 0.0
0.3   0.80–0.85    0.7   0.30–0.35

• Beware: Data with curvilinear association can have ρxy near zero. ρxy is only good for linear association.

• Finding ρxy for population data ⟨⟨r for sample data⟩⟩ x, y:

1. Find means µx, µy and SDs σx, σy ⟨⟨means x̄, ȳ and SDs sx, sy⟩⟩.

2. Find deviations x − µx, y − µy ⟨⟨x − x̄, y − ȳ⟩⟩.

3. For each point (x, y), multiply the deviations and add them all up.

4. Divide by N·σx·σy ⟨⟨(n−1)·sx·sy⟩⟩.

ρxy = ∑[(x−µx)(y−µy)] / (N·σx·σy) for a population;   r = ∑[(x−x̄)(y−ȳ)] / ((n−1)·sx·sy) for a sample.

The Regression Line

• The regression line is the line that best summarizes the data. It can be used to make predictions of points not in the data. Corresponding quantities are used for populations and samples. For sample data, the regression line goes through the point of the means (x̄, ȳ) and (x̄ + sx, ȳ + r·sy). As association strengthens, the regression line gets closer to becoming the SD line.

• The regression line is also the "Line of Means," the "Line of Averages," and the "Prediction Line" because for any x value, not necessarily in the data, we have an average or expected value for y.

• We can find the equation of the regression line because, for sample data, we know it goes through the point (x̄, ȳ) and has slope r·sy/sx. So from the point-slope form, with yp being y-predicted:

(yp − ȳ)/(x − x̄) = r·(sy/sx)  ⇒  yp = ȳ + r·(sy/sx)·(x − x̄).

• For an actual data point (x, y), the residual error is ye = y − yp. The sum of the residuals for all the points is zero: ∑(y − yp) = 0.

• Of all possible lines through the data points, the regression line is the "Line of Best Fit" and the "Least Squares Line" because it minimizes the sum of the squares of all the residual errors, ∑(y − yp)².

• On a standardized scatterplot, r is the slope of the regression line because the plot has been scaled so that sx = sy in appearance (not in reality).
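A sketch implementing the 4-step recipe for r and the regression-line prediction for sample data; the (x, y) points are arbitrary illustrations:

    import statistics

    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 5, 4, 5]
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)

    # Steps 2-4: multiply deviations, sum, divide by (n-1)*sx*sy.
    r = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    slope = r * sy / sx           # regression line: yp = ybar + slope*(x - xbar)
    print(r, slope, ybar + slope * (6 - xbar))  # r = 0.77, slope = 0.6, yp(6) = 5.8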


7. Introduction to Probability

• If a random procedure has N equally likely simple outcomes (ELSOs), and F of them make up an event E, then the probability of the event is P(E) = F/N. We can also use odds to say this. E.g., if the probability of an event is 1/6, then the odds for it are 1 to 5, while the odds against it are 5 to 1.

• The Fundamental Principle for Long-Term Relative Frequency or Law of Averages says that if a large number n of independent, identical trials of a random procedure are done, and an event appears f times, its relative frequency p̂ = f/n tends to its true probability p as n gets large. The Law of Averages works by swamping, not compensation. E.g., a fair coin showing heads 10 times in a row does not mean it will show more tails soon to "balance" things out. Instead, as more flips happen, the relative frequency gets closer and closer to 50%, overwhelming any blips.

• In a random process, the probability of an event is the number approached by the long-term relative frequency of the event as the number of independent repetitions of the process increases.

• Suppose after n trials of a random procedure, we find the relative frequency of an event is p̂ = f/n. We can use the margin of error table in Ch. 3 to find how close p̂ probably is to p. Find p̂ in the table and get E. We are then 95% certain that p is between p̂−E and p̂+E. Use Seesaw table CI for levels of confidence of 80% or 99%.

• A random variable is a variable whose value depends on the result of a random procedure.

• The probability distribution table of a random variable shows the probability of each possible value of the random variable. The table can be thought of as giving the "ideal" relative frequencies we would get from doing the random procedure many, many, many times. E.g., for a ball bouncing on a spinning roulette wheel, a random variable could be the color the ball stops on, and the possibilities are red, black and green. The wheel has 38 spots, 18 are red, 18 are black and 2 are green. So, we can make a probability distribution table:

x (color)   P(x) (probability of color)
red         18/38 = 47.4%
black       18/38 = 47.4%
green       2/38 = 5.3%
Total       38/38 = 100% = ∑P(x)

• The expected value of a numerical random variable x is ∑xP(x), the sum over all possible values times the probability of that value. For n trials of a random process generating possible values x, we can calculate x̄ = ∑x(f/n). The expected value can be looked at as the number that x̄ approaches as the number of trials n gets very large. The expected value does not have to be a value that x can have. E.g., x may have to be an integer, but x̄ could be between two integers. The expected value is the mean of a probability distribution.

8. More Topics in Probability

• If A is an event, the complement of A, aka "not A", consists of the outcomes that are not in event A. The event A or B consists of outcomes that are either in event A or event B or both. The event A & B consists of outcomes in both A and B. E.g., for a die, make event A = roll a 3, and make event B = roll an odd number. Then, not A = {1,2,4,5,6}, A or B = {1,3,5}, A & B = {3}.

• Events A and B are complementary if B is the complement of A. The Rule of 100% for Complementary Events says that if A and B are complementary random events, then P(A) + P(B) = 100%. That is, either A happens or B happens, so it is a certainty that one of them will happen, so their probabilities must add up to 100%. E.g., if P(heads) of an unfair coin is 60%, then P(tails) = 100% − 60% = 40%. Since the coin must either land heads or tails, they are complementary.

• Two events A and B are:

mutually exclusive/incompatible             compatible
They cannot both happen.                    They can both happen.
If one occurs, the other cannot.            If one occurs, the other still can occur.
Their intersection is the empty event.      Their intersection is not the empty event.
They have no outcomes in common.            They have outcomes in common.
The event A & B is impossible.              The event A & B is possible.
P(A & B) = 0.                               P(A & B) ≠ 0.

• Complementary events are incompatible, but incompatible events are not necessarily complementary.

• If A and B are mutually exclusive, then

P(A or B) = P(A) + P(B).

• If A and B are compatible events, then

max of {P(A), P(B)} ≤ P(A or B) < P(A) + P(B).

• For random sampling from bivariate categorical data, the conditional probability of E given F is

P(E, given F) = P(E | F) = #(E & F)/#F = % of units in F also in E.

The data used with this formula usually comes in a table of several rows and columns.

• Two events are independent if the probability that one occurs is not affected by whether or not the other occurred. Otherwise, they are dependent. For dependent or associated events, the probability of one event is different depending on whether or not the other event occurred.

• Two events E and F are independent if

P(E) = P(E, given F) = P(E, given not F).

If any two of the above are not equal, the events are associated or dependent. If the three are almost the same, but not equal, the association is weak. If the three are very different, the association is strong.

• Successive events result from random procedures that occur one after the other. The Multiplication Rule for P(A & B) for Successive or Associated Events A and B:

(MR1) P(A & B) = P(A)·P(B, given A).


• The Multiplication Rule for P(A & B) for Independent Events A and B:

(MR2) P(A & B) = P(A)·P(B).

• In Sampling with Replacement each selection is independent, so use (MR2).

• In Sampling without Replacement, the selections are dependent, with the conditional probabilities changing at each step. So, use (MR1).

• If sampling is done without replacement in a very small population, the conditional probabilities after any selection step are very different from the probabilities before any selection is done.

• But if sampling is done without replacement and the sample size n is very small compared to the population N (i.e., n < (5%)N), the conditional probabilities do not change much as the selections are made.
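A sketch contrasting (MR2) and (MR1), assuming we draw two cards from a standard 52-card deck (13 hearts) and ask for the probability that both are hearts:

    from fractions import Fraction

    with_repl    = Fraction(13, 52) * Fraction(13, 52)   # (MR2): independent draws
    without_repl = Fraction(13, 52) * Fraction(12, 51)   # (MR1): P(B, given A)
    print(with_repl, without_repl)  # 1/16 vs. 1/17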

9. Binomial Distributions

Binomial Procedures and Distributions

• A binomial procedure is a sequence of repeated independent identical dichotomous random trials. E.g., any procedure that is like flipping a coin (fair or unfair) over and over again is a binomial procedure.

• A binomial random variable is the frequency of success f of a binomial procedure.

• A binomial distribution is defined by a binomial procedure, together with all possible values of the random variable f and their probabilities. E.g., for multiple coin flips, f could be the number of heads resulting, i.e., {0, 1, 2, ...}. The probability of each value is {P(0), P(1), P(2), ...}. Listing this information together makes a binomial distribution table, and a spike diagram is a picture of it.

Binomial Distributions for p = 50%

• A triangle diagram can be used to envision repeated dichotomous trials. You can think of it as a bunch of streets: at every intersection you flip a coin to decide which fork to take, but you can never turn back. Think of the grid points reachable after the same number of decisions as having the same level of progress towards the final level. For p = 50%, the probability of getting to a grid point is the number of paths to get there divided by the number of paths to get to that same level.

probability of destination = (# paths to it)/(total # paths to same level)

E.g., after 3 coin flips, there is 1 way of getting 3 heads, while there are 8 ways that flipping a coin 3 times can happen. So, the probability of getting 3 heads is 1/8.

• In a binomial procedure with n trials, there are 2^n paths. E.g., for flipping a coin 3 times, there are 2³ = 8 paths: {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

• For a binomial procedure of n trials with p = 50%, the probability of any one path is 1/2^n. E.g., one way flipping a coin 3 times could happen is getting "HTH", and there are 8 possible ways, so the probability of getting "HTH" is 1/8 = 12.5%. In particular, when p = 50% the probability of n trials ending in all heads (or, likewise, in all tails) is 1/2^n.

• For small values of n, we can use Pascal's Triangle to see the number of paths to a destination.

1 1            n = 1
1 2 1          n = 2
1 3 3 1        n = 3
1 4 6 4 1      n = 4

• k! = 1·2·3·...·k ("k factorial"). Note: 0! = 1.

• For a binomial procedure, if there are f successes in n trials, there must be g = n − f failures.

• For n trials of a binomial procedure, the number of paths leading to f successes and g = n − f failures is:

nC f = (n choose f) = n!/(f!·g!) = n!/(f!·(n−f)!).

• Think of the triangle grid as a street map. You walk, and at each intersection (each trial), you choose the right or the left fork. No matter the order, f rights and g lefts take you to the same final destination. How many different paths are there to each destination? Suppose for a right fork, we keep a blue marker, and for a left fork, a gold marker. We end up with f blue and g gold, with f + g = n. The number of paths is the same as the number of ways we can arrange these markers. There are n! ways to arrange n different things. But the blue markers are all alike and the gold markers are all alike, so there are repeats in the arrangements. We have to divide out these repeats. The number of ways to arrange f things is f! and for g things, g!, giving us n!/(f!·g!) as the number of ways to arrange n things, when f of them are alike, and another g are alike.

• For n trials with p = 1/2, the probability of any one path is 1/2^n, so the probability of any one of n!/(f!·g!) paths is [n!/(f!·g!)](1/2^n). This is the probability of f successes and g failures in n trials if p = 1/2.

Binomial Distributions for p = Any Value; SD

• For a binomial procedure, with probability of success p and probability of failure q = 1 − p, the probability of a single path with f successes and g failures is p^f·q^g. The probability of n trials ending up with f successes and g = n − f failures is n!·p^f·q^g/(f!·g!). To see this, again walk the grid but now, choose the right fork with probability p and the left fork with probability q. The probability of a path with f rights and g lefts is p^f·q^g. Ending up with f successes and g failures is like having picked f rights and g lefts, in any order. There are n!/(f!·g!) paths like this, so we multiply it by the probability of any one path to get the above result.
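A sketch of this path-counting formula as a function; it reproduces the fair-coin example (3 heads in 3 flips gives 1/8):

    from math import comb

    def binom_prob(n, f, p):
        """P(f successes in n trials) = C(n, f) * p**f * q**g, with g = n - f."""
        g = n - f
        return comb(n, f) * p**f * (1 - p)**g

    print(binom_prob(3, 3, 0.5))   # 0.125 = 1/8
    print(binom_prob(3, 2, 0.5))   # 3 paths / 8 = 0.375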

• In n trials, the probability of 100% success is p^n, and the probability of 100% failure is q^n.


• For n trials, the expected value of f is Exp(f) = np. Since each trial has probability of success p, we would expect np successes in n trials. p̂ is the observed success percentage. Its expected value should be the expected number of successes f divided by the number of trials n, so Exp(p̂) = np/n = p.

• The mode of the distribution of f (the most likely value of f) is the whole number closest to np. Exp(f) is the mean of its distribution, the balance point of the spike diagram. Exp(p̂) is the mean of its distribution, which is the spike diagram with the heights of the spikes divided by n.

• The following two approximations are especially good when both np > 5 and n(1−p) > 5.

• For a binomial random variable f, SD(f) = √(npq). After n trials, probably 2/3 of the results are within √(npq) of np; about 95% are within 2√(npq) of np; and almost all are within 3√(npq) of np. As n increases, the SD gets wider, so the deviation from Exp(f) tends to grow.

• The standard deviation of the sample probability is SD(p̂) = √(pq/n). As n increases, the SD narrows, confining p̂. The "2/3, 95%, Virtually All" rule applies.

10. The Normal Distribution

The Normal Distribution; Using the Table

• The normal curve, aka the bell curve, is everywhere. E.g., over time, the footsteps of many, many people can carve the normal curve into stone steps.

• The normal curve is like a hill with a dome at the peak and a ski slope towards the bottom. The center is straight under the peak, and the inflection point is where the dome becomes a ski slope. The spread is the horizontal distance from the center to the inflection point. The larger the spread, the lower the peak; the smaller the spread, the higher the peak.

• The normal curve is continuous, but often it can be used to approximate the discrete binomial distribution.

• By "changing our ruler" or normalizing, we can match any normal curve to the standard normal curve which is in Table A-26. We use the table because no simple formula exists. Before normalization, the center is at µ and the spread is σ. When normalized to the standard normal curve, the center is at 0, the spread is 1 (unit is SD), and the area under the curve is 1 (unit is probability).

• The general and standard normal distributions are:

y = [1/(√(2π)·σ)]·e^(−(x−µ)²/2σ²)  ⟹  y = [1/√(2π)]·e^(−z²/2).

• For a normal curve with mean µ ("mu") and standard deviation σ ("sigma"), each x has a z-score z = (x−µ)/σ. Using this z is how we "change our ruler". The z-score gives the distance from 0 in terms of SDs. E.g., we have a curve with µx = 3 and σx = 2 and need to find the area under the curve between x1 = 2 and x2 = 5. To do this, convert to z-scores: z1 = (x1−µx)/σx = (2−3)/2 = −0.5 and z2 = (x2−µx)/σx = (5−3)/2 = 1.0. Now, use these z-scores with Table A-26, which gives the area from 0 to the z-score. The area from z = −0.5 to z = 1.0 has to be broken up into the area from z = −0.5 to z = 0 and the area from z = 0 to z = 1.0. Looking up z = −0.5, we find the area between it and z = 0 is 19.15%. For z = 1.0, we find the area between it and z = 0 is 34.13%. So, add the areas to get 19.15% + 34.13% = 53.28%.
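A check of the area example using Python's statistics.NormalDist, which plays the role of Table A-26 here:

    from statistics import NormalDist

    curve = NormalDist(mu=3, sigma=2)
    area = curve.cdf(5) - curve.cdf(2)   # area between x1 = 2 and x2 = 5
    print(f"{area:.2%}")                 # 53.28%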

• Recall, the percentile of a value is the percentage of data less than that value. Follow the examples in the table below to find the percentile from a z-score.

           z = −1.645 ≤ 0        z = 1.34 > 0
table      45%                   41%
adjust     50% − 45% = 5%        50% + 41% = 91%
result     5th percentile        91st percentile

• Formulas relating an x value to its z-score are:

z = (x−µx)/σx  ⇐⇒  x = z·σx + µx.

We can answer questions like: "The test scores have a normal distribution with mean 200 and SD 40, and I am at the k-th percentile; what is my raw score?"

             k = 42% ≤ 50%           k = 87% > 50%
adjust       50% − 42% = 8%          87% − 50% = 37%
table        z = −0.20               z = 1.13
formula      x = −0.20(40) + 200     x = 1.13(40) + 200
raw score    x = 192                 x = 245

Note: in the above, for a percentile less than 50%, we used the negative z-score from Table A-26.
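A sketch of the percentile-to-raw-score conversion with inv_cdf standing in for the table lookup:

    from statistics import NormalDist

    scores = NormalDist(mu=200, sigma=40)
    print(scores.inv_cdf(0.42))  # about 192 (42nd percentile)
    print(scores.inv_cdf(0.87))  # about 245 (87th percentile)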

• If we cannot assume the normal distribution, we cannot use the above two methods to answer those questions.

• For normally distributed data, the "About 2/3, About 95%, Virtually All" rule for data lying within 1, 2, and 3 SDs of the mean becomes the "68.3%, 95.4%, 99.7%" rule.

Normal Approximation of the Binomial Distribution

• The binomial distribution of f resulting from a binomial process can be approximated by a normal distribution if np > 5 and nq > 5. Since q = 1 − p, algebra gives

1 − 5/n > p > 5/n  ⇒  95% > p > 5%, if n = 100.

For larger n, a larger range of p would qualify.

• The approximate normal distribution has mean np and SD √(npq). Furthermore, we convert questions about f to ones about its z-score, z = (f − np)/√(npq), which has the standard normal distribution.

• E.g., we take a sample of n = 1200 from a very large population, asking a yes/no question, if the probability of 'Yes' is p = 50%. Find the probability that we get f = the number of 'Yes' responses being 552 or less.

1. Make sure the procedure is binomial.
(a) The trials are almost independent and identical because the population is very large. ✓
(b) There are only two possibilities, 'Yes' and 'No', so each trial is dichotomous (like flipping a coin). ✓

2. Check that we can use the normal approximation:
(a) p = q = 50%, so np = nq = 1200(50%) = 600 > 5. ✓
(b) n = 1200 > 100. ✓


3. Find the mean and SD of the approximating normal distribution.
µ = np = 1200(0.5) = 600, σ = √(npq) = √(1200(0.5)(0.5)) = 17.3.

4. Do the calculation to answer the question. We need the probability P(f ≤ 552). The whole point of our approximation is:

P[f ≤ 552]                ≈   P[z ≤ (552 − µ)/σ]
{using binomial distr}        {using std normal distr}
{troublesome to find}         {easy, just use the table}

(552 − µ)/σ = (552 − 600)/17.3 = −2.77 is the z-score, and the table gives P(z ≤ −2.77) = 50% − 49.72% = 0.28%. This means that if we took the survey and p was really 50%, it would be very unlikely to get 552 or fewer 'Yes' responses. Now, 0.28% is very much less than 5%, the usual percentage below which we say an event is very unlikely. If we were doing hypothesis testing, we would stop believing p = 50%.
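A check of step 4 with NormalDist in place of the table:

    from math import sqrt
    from statistics import NormalDist

    n, p = 1200, 0.5
    mu, sigma = n * p, sqrt(n * p * (1 - p))     # 600 and about 17.3
    print(NormalDist().cdf((552 - mu) / sigma))  # about 0.0028, i.e., 0.28%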

Normal Approximation of the Distribution of p̂

• If n and p qualify, the distribution of the sample probability p̂ of a binomial process can be approximated by a normal distribution with mean p and SD √(pq/n).

• This helps us to answer questions like, "If the true probability is p, what is the probability that in a sample of n, the sample probability turns out to be p̂ ≥ k?"

• E.g., the true probability is p = 48%. Find the probability that in a sample of n = 200, we will see a sample probability p̂ ≥ 55%.

1. Make sure the procedure is binomial.
(a) The trials are independent and identical because the cases are all the same and do not influence each other. ✓
(b) There are only two possibilities, 'Yes' and 'No', so each trial is dichotomous (like flipping a coin). ✓

2. Check that we can use the normal approximation:
(a) p = 48%, q = 52%, so np = 200(48%) = 96 > 5, nq = 200(52%) = 104 > 5. ✓
(b) n = 200 > 100. ✓

3. Find the mean and SD of the approximating normal distribution.
µ = p = 48%, σ = √(pq/n) = √(0.48(0.52)/200) = 3.53%.

4. Do the calculation to answer the question. We need the probability P(p̂ ≥ 55%). The whole point of our approximation is:

P[p̂ ≥ 55%]                ≈   P[z ≥ (0.55 − µ)/σ]
{using binomial distr}         {using std normal distr}
{troublesome to find}          {easy, just use the table}

(0.55 − µ)/σ = (0.55 − 0.48)/0.0353 = 1.983 is the z-score, and the table gives P(z ≥ 1.983) = 50% − 47.61% = 2.39%. This means that if the probability really is p = 48%, the chance that we will see a sample probability p̂ ≥ 55% is only 2.39%.

11. Confidence Intervals

A Bit More About Normal Distributions

• x ∼ N(µ,σ) means variable x has a normal distribution with mean µ and SD σ. The probability that x is between a and b is the area under the normal curve between a and b. There is no formula for the area under N(µ,σ), but we have Table A-26 for the area under N(0,1). To answer a probability question about x with x ∼ N(µ,σ), we “change our ruler” so that x becomes its z-score and the question becomes one about z with z ∼ N(0,1). We answer this new question using the table and then use algebra to get back to the answer in terms of x if needed. We will do this again and again. Mathematically, we say,

P[x ∈ (a,b)] = area under N(µ,σ) above (a,b)
z-scores: a → (a − µ)/σ = za, b → (b − µ)/σ = zb
= area under N(0,1) above (za, zb)
= P[z ∈ (za, zb)].

• Typical problem: For x ∼ N(µ,σ), find E such that the probability of x being within E of µ is c. Mathematically:

P[x ∈ (µ − E, µ + E)] = c, x ∼ N(µ,σ), E = ?

• As above, rewrite the condition using z-scores:

x → (x − µ)/σ = z,
(µ − E) → ((µ − E) − µ)/σ = −E/σ,
(µ + E) → ((µ + E) − µ)/σ = E/σ.

• The question is also converted.

P[z ∈ (−E/σ,E/σ)]= c, z ∼ N(0,1), E =?

The normal distribution is symmetric, so equivalently,

P[z ∈ (0,E/σ)]= c/2, z ∼ N(0,1), E =?

Now use Table A-26 to look up c/2 in the M-column and read off the value zc, which means

P[z ∈ (0, zc)]≈ c/2, z ∼ N(0,1).

So, we see E/σ is approximated by zc:

E/σ≈ zc ⇒ E ≈ zcσ.
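When Table A-26 is not at hand, the inverse normal CDF gives zc directly; a minimal Python sketch (SciPy assumed):

    from scipy.stats import norm

    # P[z in (0, zc)] = c/2 is the same as norm.cdf(zc) = (1 + c)/2.
    for c in (0.90, 0.95, 0.99):
        print(c, round(norm.ppf((1 + c) / 2), 3))  # 1.645, 1.96, 2.576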

The 6 Contexts - Confidence Intervals

Populations↓ \ Variable(s)→   1 Dichotomous    1 Numerical (Large Sample)   1 Numerical (Small Sample)
1                             p (abc)          µx (ad)                      µx (aef)
2                             p1 − p2 (abc)    µ1 − µ2 (ad)                 µ1 − µ2 (aefg)

Necessary Conditions key:

(a) Random sampling with replacement, or random sampling without replacement with N ≫ n.
(b) p̂n > 5 and q̂n = (1 − p̂)n > 5.
(c) n > 100.
(d) n ≥ 30.
(e) n < 30.
(f) Variable is approximately normal in population.
(g) σ1 ≈ σ2, i.e., SDs are about the same.


1 - One Population, One Dichotomous Variable; Confidence Interval for p

• A typical example of this context is a survey, meeting conditions (abc), asking a y/n question. We want to estimate p = F/N from our sample p̂ = f/n and state the confidence interval for a given confidence level c.

• From our survey, we get f ‘y’ responses out of a sample of n. We know from before that

f/n = p̂ ∼ N(p, σ), σ = √(pq/n).

But p is unknown, so we approximate σ with σ̂:

p̂ ∼ N(p, σ̂), σ̂ = √(p̂q̂/n), q̂ = 1 − p̂.

• Our goal is to find a margin of error E such that

(T1a) P[p̂ ∈ (p − E, p + E)] = c.

• Following exactly what we did before, we convert to z-scores, getting the key quantity in the process:

p̂ → z = (p̂ − p)/σ̂, p − E → −E/σ̂, p + E → E/σ̂.

• The goal is converted to finding E such that:

P[z ∈ (−E/σ̂, E/σ̂)] = c, z ∼ N(0,1).

Exactly as before, this leads us to

(T1b) E = zc σ̂ = zc √(p̂q̂/n),

where zc comes from looking up c/2 in the M-column of Table A-26. So, we can quote (T1a) now with E known. But, it sounds better to turn it around because p is the unknown. We argue† that saying p̂ is within E of p is the same as saying p is within E of p̂. So, we say,

P[p ∈ (p̂ − E, p̂ + E)] = c.

We are c sure that p is between p̂ − E and p̂ + E.

• E.g., we take a survey with n = 400, finding p̂ = 240/400 = 60%, meeting (abc). We want to find a margin of error E so that we can be c = 99% certain that the true p = F/N (the percentage of ‘y’ answers there would be if we surveyed the entire population) is between p̂ − E and p̂ + E.

Since we have done the work already, we skip to the end, find the numbers, and plug into (T1b). zc = 2.575 comes from looking up c/2 = 49.50% in Table A-26; p̂ = 0.60, q̂ = 1 − p̂ = 0.40, and n = 400.

E = zc √(p̂q̂/n) ≈ 2.575(0.0245) = 6.3%.

60% − 6.3% = 53.7%, 60% + 6.3% = 66.3%.

So, with 99% certainty, p is between 53.7% and 66.3%.
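The same interval can be reproduced in a few lines of Python (SciPy assumed; the names phat and zc are just illustrative):

    from scipy.stats import norm

    n, phat, c = 400, 0.60, 0.99
    zc = norm.ppf((1 + c) / 2)               # ~2.576
    E = zc * (phat * (1 - phat) / n) ** 0.5  # ~0.063
    print(phat - E, phat + E)                # ~(0.537, 0.663)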

• Algebra gives us a guideline for how large our sample n has to be to meet a confidence requirement (see the sketch below).

E = zc √(p̂q̂/n) ⇒ n = zc² p̂q̂ / E².

E.g., for confidence level c = 95%, margin of error E = 2%, and noting that p̂q̂ is at most 0.25, we need n = 1.96²(0.25)/0.02² = 2401.
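A sketch of the sample-size guideline in Python (SciPy assumed); ceil rounds up since n must be a whole number:

    from math import ceil
    from scipy.stats import norm

    zc = norm.ppf((1 + 0.95) / 2)     # ~1.96 for c = 95%
    E = 0.02
    print(ceil(zc**2 * 0.25 / E**2))  # 2401, using the worst case pq = 0.25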

2 - Two Populations, One Dichotomous Variable; Confidence Interval for p1 − p2

• Do a y/n survey of two populations, both (abc). Find the confidence interval with confidence level c for p1 − p2.

• Theory tells us subtracting two normally distributed variables gives a result which is also normally distributed, but with different parameters:

p̂1 ∼ N(p1, σ1), p̂2 ∼ N(p2, σ2) ⇒ (p̂1 − p̂2) ∼ N(p1 − p2, √(σ1² + σ2²)).

• From before, we know that for each survey:

p̂1 ∼ N(p1, σ1), σ1 = √(p1q1/n1) ≈ √(p̂1q̂1/n1) = σ̂1,
p̂2 ∼ N(p2, σ2), σ2 = √(p2q2/n2) ≈ √(p̂2q̂2/n2) = σ̂2.

Together, theory and these approximations say

(p̂1 − p̂2) ∼ N(p1 − p2, √(σ̂1² + σ̂2²)).

• Goal: with (p̂1 − p̂2) distributed as above, and a given confidence level c, find a margin of error E such that

(T2a) P[(p̂1 − p̂2) ∈ (p1 − p2 − E, p1 − p2 + E)] = c.

• Convert to z-scores, getting this context’s key quantity:

(p̂1 − p̂2) → z = ((p̂1 − p̂2) − (p1 − p2)) / √(σ̂1² + σ̂2²),
p1 − p2 ± E → ±E / √(σ̂1² + σ̂2²).

• This converts our goal to finding an E such that

P[z ∈ (−E/√(σ̂1² + σ̂2²), E/√(σ̂1² + σ̂2²))] = c, z ∼ N(0,1).

This leads to

(T2b) E = zc √(σ̂1² + σ̂2²) = zc √(p̂1q̂1/n1 + p̂2q̂2/n2),

where we find zc by looking up c/2 in the M-column of Table A-26. So, we can say with confidence c that the actual difference (p1 − p2) is between (p̂1 − p̂2 − E) and (p̂1 − p̂2 + E).

• E.g., we take two surveys, with (abc): n1 = 200, p̂1 = 0.40; n2 = 120, p̂2 = 0.30. We want a margin of error E for a confidence level c = 95%. Use (T2b):

E = 1.96 √(0.4(0.6)/200 + 0.3(0.7)/120) ≈ 10.6%.

p̂1 − p̂2 = 0.4 − 0.3 = 10%,
p̂1 − p̂2 − E = −0.6%, p̂1 − p̂2 + E = 20.6%.

With 95% confidence, p1 − p2 is between −0.6% and 20.6%.
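A quick cross-check of this two-survey interval in Python (SciPy assumed):

    from scipy.stats import norm

    n1, p1 = 200, 0.40
    n2, p2 = 120, 0.30
    zc = norm.ppf((1 + 0.95) / 2)                       # ~1.96
    E = zc * (p1*(1-p1)/n1 + p2*(1-p2)/n2) ** 0.5       # ~0.106
    print((p1 - p2) - E, (p1 - p2) + E)                 # ~(-0.006, 0.206)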


3 - One Population, One Numerical Variable (Large Sample Case); Confidence Interval for µx

• From a survey meeting conditions (ad), use the sample mean to estimate the actual mean with a confidence interval.

• Theory tells us x̄ is normally distributed:

x̄ ∼ N(µx, σx/√n) ≈ N(µx, sx/√n).

The population SD σx can be replaced by the sample SD sx since condition (d) is met.

• The goal is to find E such that

(T3a) P[x̄ ∈ (µx − E, µx + E)] = c.

• As usual, convert to the z-score, using sx ≈ σx, encountering the key quantity for this context:

x̄ → z = (x̄ − µx)/(sx/√n), µx ± E → ±E/(sx/√n).

• This converts the goal to finding an E such that

P[z ∈ (−E/(sx/√n), E/(sx/√n))] = c, z ∼ N(0,1).

This is the same as finding E such that

P[z ∈ (0, E/(sx/√n))] = c/2, z ∼ N(0,1),

because the normal distribution is symmetric. Introduce zc to make the goal more clear:

zc = E/(sx/√n) ⇒ P[z ∈ (0, zc)] = c/2, z ∼ N(0,1).

We can find zc by looking up M = c/2 in Table A-26. Then we can find E:

(T3b) E = zc sx/√n.

We can now say: with confidence c, the actual mean µx is between x̄ − E and x̄ + E.

• E.g., a survey meeting (ad) has been done: n = 400, x̄ = 1255 and sx = 620. Find the confidence interval for c = 98%. Look up zc for M = c/2 = 49% in Table A-26: zc = 2.326. Use (T3b):

E = zc sx/√n = 2.326(620)/√400 = 72.1.

x̄ − E = 1255 − 72.1 = 1182.9, x̄ + E = 1327.1.

With 98% confidence, µx is between 1182.9 and 1327.1.
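The same computation as a Python sketch (SciPy assumed):

    from scipy.stats import norm

    n, xbar, sx, c = 400, 1255, 620, 0.98
    zc = norm.ppf((1 + c) / 2)   # ~2.326
    E = zc * sx / n ** 0.5       # ~72.1
    print(xbar - E, xbar + E)    # ~(1182.9, 1327.1)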

4 - Two Populations, One Numerical Variable (Large Sample Case); Confidence Interval for µ1 − µ2

• Two surveys with (ad) give x̄1, s1 and x̄2, s2. Find the confidence interval with confidence level c for µ1 − µ2.

• Each sample mean is normally distributed:

x̄1 ∼ N(µ1, σ1/√n1), σ1 ≈ s1,
x̄2 ∼ N(µ2, σ2/√n2), σ2 ≈ s2.

Since (d) is met, the approximation is OK. Theory says the difference between two normal variables is normal with different parameters. Together, they say

(x̄1 − x̄2) ∼ N(µ1 − µ2, √(s1²/n1 + s2²/n2)).

• The goal is to find E such that

(T4a) P[(x̄1 − x̄2) ∈ (µ1 − µ2 − E, µ1 − µ2 + E)] = c.

• Convert to z-scores, getting the key quantity:

(x̄1 − x̄2) → z = ((x̄1 − x̄2) − (µ1 − µ2)) / √(s1²/n1 + s2²/n2),
µ1 − µ2 ± E → ±E / √(s1²/n1 + s2²/n2).

• The goal is converted to finding an E such that:

P[z ∈ (−E/√(s1²/n1 + s2²/n2), E/√(s1²/n1 + s2²/n2))] = c, z ∼ N(0,1).

Using the same reasoning as before, this leads to:

(T4b) E = zc √(s1²/n1 + s2²/n2),

where zc is from looking up c/2 in Table A-26. So, we can say with confidence c that the actual difference µ1 − µ2 is between (x̄1 − x̄2 − E) and (x̄1 − x̄2 + E).

• E.g., two surveys with (ad) have n1 = 200, x̄1 = 34.2, s1 = 5.8; n2 = 120, x̄2 = 31.7, s2 = 6.3; c = 90%.

Look up c/2 = 45% in the M-column of Table A-26 and read off zc = 1.645. Use (T4b):

E = 1.645 √(5.8²/200 + 6.3²/120) = 1.2,
(x̄1 − x̄2) − E = 34.2 − 31.7 − 1.2 = 1.3,
(x̄1 − x̄2) + E = 34.2 − 31.7 + 1.2 = 3.7.

With 90% confidence, µ1 − µ2 is between 1.3 and 3.7.
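A cross-check in Python (SciPy assumed):

    from scipy.stats import norm

    n1, x1, s1 = 200, 34.2, 5.8
    n2, x2, s2 = 120, 31.7, 6.3
    zc = norm.ppf((1 + 0.90) / 2)            # ~1.645
    E = zc * (s1**2/n1 + s2**2/n2) ** 0.5    # ~1.2
    print((x1 - x2) - E, (x1 - x2) + E)      # ~(1.3, 3.7)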


5 - One Population, One Numerical Variable (Small Sample Case); Confidence Interval for µx

• A small-n survey, with (aef) met, gives x̄, sx. Find a confidence interval with confidence level c for µx. The distribution of x̄ is not normal, but similar reasoning leads to a similar procedure.

• The goal is to find E such that

(T5a) P[x̄ ∈ (µx − E, µx + E)] = c,

but the distribution of x̄ is complicated.

• Theory says to go to the t-score to get the key quantity:

x̄ → t = (x̄ − µx)/(sx/√n), µx ± E → ±E/(sx/√n).

• This converts the goal to finding E such that

P[t ∈ (−E/(sx/√n), E/(sx/√n))] = c, t ∼ td,

where td is the t-distribution with degrees of freedom d = n − 1 in Table A-28. As before, this leads to

(T5b) E = tc sx/√n,

where tc is found by going to the c column and the (n − 1)-th row of Table A-28. So, with confidence c, the actual mean µx is between x̄ − E and x̄ + E.

• E.g., a small survey, with (aef), has n = 20, x̄ = 180, sx = 10. Find the c = 99% confidence interval for µx.

Use (T5b). From the 99% column and the d = n − 1 = 19-th row of Table A-28, read tc = 2.861.

E = tc sx/√n = 2.861(10)/√20 = 6.4.

x̄ − E = 180 − 6.4 = 173.6, x̄ + E = 180 + 6.4 = 186.4.

With 99% confidence, µx is between 173.6 and 186.4.
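A cross-check with the t-distribution in Python (SciPy assumed); t.ppf plays the role of the Table A-28 lookup:

    from scipy.stats import t

    n, xbar, sx, c = 20, 180, 10, 0.99
    tc = t.ppf((1 + c) / 2, df=n - 1)   # ~2.861
    E = tc * sx / n ** 0.5              # ~6.4
    print(xbar - E, xbar + E)           # ~(173.6, 186.4)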

† x̄ is within E of µx if and only if µx is within E of x̄:

µx + E ≥ x̄ ≥ µx − E ⇔ E ≥ x̄ − µx ≥ −E ⇔ −x̄ + E ≥ −µx ≥ −x̄ − E ⇔ x̄ − E ≤ µx ≤ x̄ + E.

So, x̄ ∈ (µx − E, µx + E) ⇔ µx ∈ (x̄ − E, x̄ + E).

6 - Two Populations, One Numerical Variable (Small Sample Case); Confidence Interval for µ1 − µ2

• For two small surveys with (aefg), and n1, x̄1, s1, and n2, x̄2, s2, find the confidence interval for the true difference µ1 − µ2 with confidence level c.

• The goal is to find E such that

(T6a) P[(x̄1 − x̄2) ∈ (µ1 − µ2 − E, µ1 − µ2 + E)] = c,

but the distribution of (x̄1 − x̄2) is complicated.

• Theory says to go to the t-score to get the key quantity:

d1 = n1 − 1, d2 = n2 − 1, d = d1 + d2,
s² = (d1s1² + d2s2²)/d, s12 = √(s²(1/n1 + 1/n2)),
(x̄1 − x̄2) → t = ((x̄1 − x̄2) − (µ1 − µ2))/s12,
µ1 − µ2 ± E → ±E/s12.

• This converts the goal to finding E such that

P[t ∈ (−E/s12, E/s12)] = c, t ∼ td.

As usual, this leads to

(T6b) E = tc s12,

where tc is found in column c, row d of Table A-28. We can say with confidence c that µ1 − µ2 is between x̄1 − x̄2 − E and x̄1 − x̄2 + E.

• E.g., two surveys have: n1 = 15, x̄1 = 47.3, s1 = 2.4, and n2 = 10, x̄2 = 33.4, s2 = 3.1. Find the c = 95% confidence interval for µ1 − µ2.

Use (T6b), but find the intermediate values first.

d1 = 14, d2 = 9, d = 14 + 9 = 23.
s² = [14(2.4²) + 9(3.1²)]/23 = 7.267.
s12 = √(7.267(1/15 + 1/10)) = 1.101.
tc = 2.069 from the 95% column, row 23 of Table A-28.
E = tc s12 = 2.069(1.101) = 2.278.
x̄1 − x̄2 − E = 11.6, x̄1 − x̄2 + E = 16.2.

So, we are 95% confident that the actual difference of the means µ1 − µ2 is between 11.6 and 16.2.
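The pooled-SD interval, sketched in Python (SciPy assumed):

    from scipy.stats import t

    n1, x1, s1 = 15, 47.3, 2.4
    n2, x2, s2 = 10, 33.4, 3.1
    d = n1 + n2 - 2                              # 23
    s2p = ((n1-1)*s1**2 + (n2-1)*s2**2) / d      # pooled variance, ~7.267
    s12 = (s2p * (1/n1 + 1/n2)) ** 0.5           # ~1.101
    tc = t.ppf((1 + 0.95) / 2, df=d)             # ~2.069
    E = tc * s12                                 # ~2.278
    print((x1 - x2) - E, (x1 - x2) + E)          # ~(11.6, 16.2)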


12. Hypothesis Testing

A Hypothesis Testing Dream

• It is a dark and stormy night and you are making a hypothesis about something that can be answered by a survey. You believe this hypothesis and call it the null hypothesis because you will not have to change your mind if it turns out to be true. You fall asleep, thinking to call the opposite of the null hypothesis the alternative hypothesis and getting a notion of them defining different universes.

You wake up, desperate to figure out which universe you are in. In the null universe, the null hypothesis is true. In the alternative universe, the alternative hypothesis is true. Because the null and alternative hypotheses are mutually exclusive, only one can be true in a given universe.

You can survey the inhabitants to figure out which universe you are in. To start, you assume you are in the null universe. In the null universe, the survey results will have a certain distribution. You can use this to find the likelihood of a set of survey results. In order to make an honest decision, before the survey, you decide on a significance level below which you would have to conclude you are actually in the alternative universe.

You take the survey and find the p-value of the results assuming you are in the null universe. If the p-value is above the significance level, you decide you are in the null universe and relax. But no! You find the p-value is below the significance level. It is too unlikely in the null universe to get the survey results you got. You conclude you are actually stranded in the alternative universe and you never come back!

You wake up, this time for real. The storm has passed and you have a full understanding of hypothesis testing.

Hypothesis Testing Stages

• Ask a yes/no question that can be answered by a survey. Consider the situation and decide which one of the contexts below is applicable.

• Decide on a parameter to be tested. The value of this parameter should answer the question. The survey will give the sample statistic corresponding to the parameter, e.g., if we are testing the true mean µx, then our survey should give us the sample mean x̄.

• Form two hypotheses which are mutually exclusive. E.g., “either p = 50% or p > 50%,” or “either µx = 10 or µx ≠ 10,” or “either µ1 = µ2 or µ1 ≠ µ2.”

• Make the hypothesis with the = sign the null hypothesis H0 because this will give us a specific null distribution to work with. Call the other hypothesis the alternative hypothesis HA. If HA has a < or > sign, we will need a one-tailed test, while a ≠ sign calls for a two-tailed test.

• Assume the null hypothesis H0 is true. The survey will yield a sample statistic. Theory says that the sample statistic will be distributed some way around the true parameter, and we will be able to calculate how likely it is for a range of values of the sample statistic to appear. Decide now, ahead of the survey, how unlikely a sample statistic has to be before we can no longer believe H0. This likelihood is the significance level α.

• Assuming H0 is true, the sample statistic will have some complicated distribution around the true parameter. To make finding the likelihood easier, we “change our ruler” by calculating the test statistic, which theory says has a known distribution, the null distribution of the test statistic, which will be in the back of the book. The test statistic to use depends on the context.

• Using the significance level α, whether the test is one- or two-tailed, and the correct table (A-26–28) for the context, we can find the rejection region and decision rule for our test. Do this by looking up α (or α/2) in the body of the table and reading the z- or t-scores which will be the cut-off values. For a one-tailed test the rejection region is one of the tails. For two-tailed tests, it is both tails. So, the rejection region may be to the left, right, or left-or-right of the cut-off value(s). The decision rule is to reject H0 if the test statistic is in the rejection region.

• After the above steps, do a survey that gives the sample statistic. The survey must meet the conditions for the context. Tests that use the t-distribution are robust, so we can proceed even if the condition of normality is not fully met.

• Compute the test statistic from the sample statistic and plot it on the distribution which theory says it has.

• If the test statistic is in the rejection region, we say, “At the α level of significance, we conclude that HA is true.” If the test statistic is not in the rejection region, we keep on believing H0 and say, “At the α level of significance, we cannot conclude HA is true.”

• Whether or not we have determined the cut-off values, we can still find the p-value for the test statistic. The p-value is the probability of getting the test statistic or worse if H0 is true. It is the area under the tail(s) of the null distribution of the test statistic, found by looking up the test statistic in Tables A-26–28.

The 6 Contexts - Hypothesis Testing

Populations↓ \ Variable(s)→   1 Dichotomous        1 Numerical (Large Sample)   1 Numerical (Small Sample)
1 (conditions)                H0: p = p0 (abc)     H0: µx = µ0 (ad)             H0: µx = µ0 (aef)
2 (conditions)                H0: p1 = p2 (ahc)    H0: µ1 = µ2 (ad)             H0: µ1 = µ2 (aefg)

Necessary Conditions key:
(a) Random sampling with replacement, or random sampling without replacement with N ≫ n (or N1 ≫ n1 and N2 ≫ n2).
(b) p0n > 5 and q0n = (1 − p0)n > 5.
(c) n > 100 (or n1 > 100 and n2 > 100).
(d) n ≥ 30 (or n1 ≥ 30 and n2 ≥ 30).
(e) n < 30.
(f) Variable is approximately normal in population.
(g) σ1 ≈ σ2, i.e., SDs are about the same.
(h) ppooled n1, ppooled n2, qpooled n1, qpooled n2 all > 5.


1 - One Population, One Dichotomous Variable; Testing H0: p = p0

• A typical example of this context would be our hearing that p0 percent of a population have a certain trait, but wondering if perhaps the percentage is actually more. We then do a survey, meeting conditions (abc), finding p̂ and asking: if the percentage really is p0, how likely is our getting p̂ or more?

• Our parameter is the relative frequency p. Our survey will give us the sample relative frequency p̂.

• The null hypothesis H0 is that p = p0. In this example, our alternative hypothesis HA is that p > p0. Since HA uses the > sign, we will have a one-tailed test.

• We assume H0 is true, so the sample statistic will be centered around p0 with a certain distribution. We decide before the survey that if the probability of getting p̂ or above is less than a significance level α (usually 5%), then we can no longer believe H0 is true.

• “Changing our ruler” by going to the z-score lets us use the null distribution of the test statistic centered around 0, with 0 representing no deviation from p0. Theory says the test statistic for this context is approximately normal,

z = (p̂ − p0)/√(p0q0/n) ∼ N(0,1).

So, we will use a normal distribution table, like Table A-26.

• The area under the curve to the right of some zα is α. We are using a one-tailed test, so find z in the table that has M = 0.50 − α. This is zα, which marks the rejection region. The corresponding decision rule is: if z > zα, then we reject H0.

• E.g., we think that p = p0 = 30%. We plan an investigation to see if p > p0. The null hypothesis is H0: p = p0 and the alternative hypothesis is HA: p > p0. We decide that if we assume H0 to be true and end up with a p̂ that has less than an α = 5% chance of appearing, then HA must be true instead. We are using a one-tailed test because of the > sign. In this context, the test statistic will be normal, N(0,1). So, from α = 5%, we look up 50% − 5% = 45% in the M% column of Table A-26. This gives us zα = 1.645, which will mark our rejection region. Having done all this, we take the survey with conditions (abc) and n = 1000 and get p̂ = 32.3%. From this, we compute the test statistic:

z = (0.323 − 0.300)/√(0.30(0.70)/1000) = 0.023/0.01449 = 1.59 < 1.645.

So, we are not in the rejection region and cannot conclude that HA: p > 30% is true. So, “With a significance level of 5%, we cannot conclude p > 30%.”
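The same one-tailed test in Python (SciPy assumed); norm.sf gives the upper-tail p-value directly:

    from scipy.stats import norm

    n, p0, phat = 1000, 0.30, 0.323
    z = (phat - p0) / (p0 * (1 - p0) / n) ** 0.5  # ~1.59
    print(norm.sf(z))                             # p-value ~0.056 > 0.05: cannot reject H0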

2 - Two Populations, One Dichotomous Variable; Testing H0: p1 = p2

• We want to see if the percentage of a certain trait in two populations is the same. We take surveys, with conditions (ahc), getting sample relative frequencies p̂1 and p̂2. The pooled quantities in condition (h) are:

ppooled = (f1 + f2)/(n1 + n2), qpooled = 1 − ppooled.

• The parameter is p1 − p2. The survey gives the sample statistic p̂1 − p̂2. H0 is p1 − p2 = 0, and HA is p1 − p2 ≠ 0. The ≠ indicates a two-tailed test.

• Assuming H0 is true, theory says the sample statistic will have some distribution around the parameter. The decision rule is: if the probability of getting p̂1 − p̂2 is less than α = 5%, then we stop believing H0.

• By the usual “changing our ruler,” we go to the z-score, which theory says is approximately normal,

z = (p̂1 − p̂2)/√(ppooled qpooled (1/n1 + 1/n2)) ∼ N(0,1).

• We need the two-tailed probability, so we need zα such that P(|z| > zα) = α. The normal curve is symmetric, so look for a zα with P(z > zα) = α/2. In the table, find zα for M% = 0.50 − α/2. If z is in either tail, i.e., z < −zα or z > zα, we reject H0 and believe HA.

• E.g., two surveys give: n1 = 200, f1 = 80, p̂1 = 40%, and n2 = 120, f2 = 36, p̂2 = 30%. Assume H0: p1 = p2. Find the probability of getting our results (or anything more extreme) if H0 really is true. Compute the test statistic, the z-score of p̂1 − p̂2.

ppooled = (f1 + f2)/(n1 + n2) = (80 + 36)/(200 + 120) = 116/320 = 36.3%.
qpooled = 1 − ppooled = 100% − 36.3% = 63.7%.

z = (0.40 − 0.30)/√((0.363)(0.637)(1/200 + 1/120)) = 1.81.

Now, we find zα for α = 5%. Because HA: p1 ≠ p2, we need the two-tailed probability. Find z in Table A-26 such that M = 0.50 − α/2 = 0.500 − 0.025 = 47.5%. The table gives zα = 1.96. Mark both zα and −zα because we need the two-tailed probability. The rejection region is z < −zα or z > zα. Because −zα < z < zα, our test statistic z is not in the rejection region, so we cannot reject the null hypothesis, but keep on believing it.

We compared z-scores, but we could have compared probabilities. Table A-26 gives the two-tailed probability: P(|z| > z) = 2P(z > 1.81) = 2(0.50 − P[z ∈ (0, 1.81)]) = 2(0.50 − 0.4649) = 0.07. Since 0.07 > 0.05 = α, we stick with H0 just as before.
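A sketch of the pooled two-proportion test in Python (SciPy assumed):

    from scipy.stats import norm

    n1, f1, n2, f2 = 200, 80, 120, 36
    p1, p2 = f1/n1, f2/n2
    pp = (f1 + f2) / (n1 + n2)                     # pooled, ~0.3625
    se = (pp * (1 - pp) * (1/n1 + 1/n2)) ** 0.5
    z = (p1 - p2) / se                             # ~1.80
    print(2 * norm.sf(z))                          # two-tailed p ~0.07 > 0.05: keep H0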


3 - One Population, One Numerical Variable (Large Sample Case); Testing H0: µx = µ0

• We take a survey with conditions (ad) and get a sample mean x̄, but we think the mean for the population is µ0. How likely is it to get the sample mean x̄ if the true mean really is µ0?

• The null hypothesis H0 is that the true mean is µ0. The parameter is µx, the sample statistic is x̄. The alternative hypothesis HA is that µx ≠ µ0. This means we will use a two-tailed probability.

• We are assuming H0 is true, so the sample mean will be distributed around µ0. According to theory, we can again “change our ruler” and get a test statistic z-score with an approximate normal distribution

z = (x̄ − µ0)/(sx/√n) ∼ N(0,1).

• Suppose we want a significance level α, and let zα be the z-score such that P(|z| > zα) = α. ±zα marks the rejection region used by the decision rule. We need to make some adjustments before using Table A-26.

P(|z| > zα) = 1 − P(|z| < zα) = 1 − 2P[z ∈ (0, zα)] = α
⇒ P[z ∈ (0, zα)] = (1 − α)/2 = M.

So, we look in the table for the value of z = zα that gives the M value. Because we need a two-tailed probability, −zα and zα mark the rejection region. The decision rule is: if z < −zα or z > zα, we stop believing H0 and switch to HA.

• E.g., we take a survey with conditions (ad) with n = 400 and find a sample mean x̄ = 1255, sx = 620. We would like to honestly say the true mean µx > 1000. We decide on a significance level of α = 1%, meaning that we consider ourselves honest if what we say has less than a 1% probability of being false if the null hypothesis is true. The null hypothesis is H0: µx = 1000. Can we honestly say µx > 1000?

Find the test statistic:

z = (1255 − 1000)/(620/√400) = 255/31 = 8.23.

From here, we should be able to find the associated two-tailed probability. Try to look up z = 8.23 in Table A-26 and read M. It is off the scale, meaning that the tail probability is virtually 0. This means that if H0: µx = 1000 were true, the chance of us getting x̄ = 1255 or more is virtually zero. Since our sample mean is much greater than 1000, we can confidently say that the true mean µx > 1000. Our sample mean being greater than 1000 is important because we are dealing with two-tailed probabilities. If we had gotten a sample mean of 1000 − 255 = 745, we would have been able to say with equal certainty that the true mean is less than 1000.

Just for fun, let’s find the rejection region for our α = 0.01. The M value needed is 0.50 − 0.005 = 0.495, which corresponds to zα = 2.575. So, we would make the same conclusion, that µx > 1000, for any test statistic z > 2.575. We have 8.23, which is way past 2.575.
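A sketch of this z-test in Python (SciPy assumed):

    from scipy.stats import norm

    n, xbar, sx, mu0 = 400, 1255, 620, 1000
    z = (xbar - mu0) / (sx / n ** 0.5)  # ~8.23
    print(2 * norm.sf(z))               # two-tailed p-value, effectively 0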

4 - Two Populations, One Numerical Variable (Large Sample Case); Testing H0: µ1 = µ2

• We take two surveys from two populations, each with conditions (ad), and get two sample means. Are their true means the same? We answer by finding how likely it would be, if the true means were the same (µ1 = µ2), to get sample means with a difference x̄1 − x̄2.

• The null hypothesis H0 is µ1 = µ2, or µ1 − µ2 = 0. The parameter is the difference µ1 − µ2. The sample statistic is the difference of the sample means x̄1 − x̄2. The alternative hypothesis HA is µ1 ≠ µ2. Because of the ≠, we are looking for a two-tailed probability.

• We assume the null hypothesis H0 is true, so x̄1 − x̄2 will be distributed some way around µ1 − µ2. Theory says that if we “change our ruler” to get the z-score, its distribution will be approximately the standard normal distribution:

z = (x̄1 − x̄2)/√(s1²/n1 + s2²/n2) ∼ N(0,1).

• We choose a significance level α, and let zα be the z-score such that P(|z| > zα) = α. Because it is a two-tailed probability, the rejection region is marked by ±zα. If z is left of −zα or right of zα, we reject H0 and embrace HA. This is the decision rule. To find this zα, we have to turn things around a little before using Table A-26. The M-column of the table tells P[z ∈ (0, zα)], where we look up zα in the z-column. So, we have to turn our requirement around to use the table.

P(|z| > zα) = 1 − P(|z| < zα) = 1 − 2P[z ∈ (0, zα)] = α
⇒ P[z ∈ (0, zα)] = (1 − α)/2 = M.

So, we look up M = (1 − α)/2 in the table and find zα; then P(|z| > zα) = α. We see if z from our data is to the left of −zα or right of zα. If it is, the probability of our getting z if H0 is true is less than the significance level α. We would reject H0 and start believing HA.

• E.g., we take two surveys, each with conditions (ad), with n1 = 200, n2 = 120 and get x̄1 = 34.2, s1 = 5.8, x̄2 = 31.7 and s2 = 6.3. Compute the test statistic:

z = (34.2 − 31.7)/√(5.8²/200 + 6.3²/120) = 2.5/√0.49895 = 3.54.

We did not specify a significance level α to help our decision process, so we will just go ahead and find the probability of getting what we got or more, if the true means are the same.

P(|z| > z) = 1 − P(|z| < z) = 1 − 2P[z ∈ (0, z)].

We find P[z ∈ (0, z)] by looking up z = 3.54 in Table A-26 and reading M = 49.98%. So,

P(|z| > z) = 1 − 2M = 1 − 2(0.4998) = 0.0004 = 0.04%.

This tells us that if H0 is true, our chance of getting what we got or more is tiny. So, we no longer believe H0: µ1 = µ2, and switch our belief to HA: µ1 ≠ µ2.
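And a sketch of the two-sample version in Python (SciPy assumed):

    from scipy.stats import norm

    n1, x1, s1 = 200, 34.2, 5.8
    n2, x2, s2 = 120, 31.7, 6.3
    z = (x1 - x2) / (s1**2/n1 + s2**2/n2) ** 0.5  # ~3.54
    print(2 * norm.sf(z))                         # ~0.0004, the 0.04% above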


5 - One Population, One Numerical Variable (Small Sample Case); Testing H0: µx = µ0

• We take a survey, but the sample is small. We want to see if the sample mean for a variable x is compatible with what we assume is the population mean µ0. We have conditions (aef). Condition (f) is that the variable, for which we are finding the sample mean, is normal in the population. It is hard to know this for sure, but our procedure will work unless the population distribution is far from being bell-shaped.

• The null hypothesis H0 is that the true mean is µ0. The parameter is µx and the sample statistic is x̄. The alternative hypothesis HA is that µx ≠ µ0. The ≠ means we will use a two-tailed probability.

• Assuming H0 is true, the sample mean x̄ will be distributed in some way around µ0. Theory tells us that the t-score will have the t-distribution

t = (x̄ − µ0)/(sx/√n) ∼ td, d = n − 1,

where d = n − 1 is the degrees of freedom needed to use Table A-28.

• If we assign a significance level α, we can find the cut-off points ±tα that have

P(|t| > tα) = α

from the table. The table only works for α = 10%, 5%, 2% or 1%, so we have to select one of those values. With the chosen α and degrees of freedom d, we can find tα and compare it with t. If t < −tα or t > tα, then we no longer believe H0 and switch over to HA. This is the decision rule.

• E.g., we take a survey, with conditions (aef), getting n = 20, x̄ = 180 and sx = 10. We want to see how compatible our data is with the belief H0 that the true mean is µ0 = 185. Calculate the test statistic:

t = (180 − 185)/(10/√20) = −5/2.236 = −2.24.

To use Table A-28, we need d = n − 1 = 19 and decide on a significance level of α = 5%. The table gives tα = 2.093. We see that −2.24 < −2.093, so our t is in the rejection region and our decision rule says to stop believing H0 and believe HA, which says that the true mean is not 185: µx ≠ 185.

If we had chosen α = 1% instead, we would have found tα = 2.861. We would have had −tα < t because −2.861 < −2.24, so t is not in the rejection region and we stick with our belief that µx = 185. Our final belief depends on our choice for the significance level.
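A sketch of this t-test in Python (SciPy assumed); the two-sided p-value shows why the conclusion flips between α = 5% and α = 1%:

    from scipy.stats import t

    n, xbar, sx, mu0 = 20, 180, 10, 185
    ts = (xbar - mu0) / (sx / n ** 0.5)  # ~-2.24
    p = 2 * t.sf(abs(ts), df=n - 1)      # ~0.04, between 0.01 and 0.05
    print(p)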

6 - Two Populations, One Numerical Variable (Small Sample Case); Testing H0: µ1 = µ2

• We have to decide if the means of two populations are the same by using two small surveys with conditions (aefg). The same concerns about condition (f) are present. Condition (g) is also uncertain, but the data itself will give an indication if it is true.

• The null hypothesis H0 is that the means are equal, µ1 = µ2. HA and H0 must be mutually exclusive, but not necessarily exact opposites. This time, we make HA: µ1 > µ2, calling for a one-tailed probability.

• Assuming H0 is true, the difference in sample means will be distributed some way around the difference in true means. Theory says that the t-score will have the t-distribution with degrees of freedom d:

t = (x̄1 − x̄2)/√(s²(1/n1 + 1/n2)) ∼ td,

d = n1 − 1 + n2 − 1 = n1 + n2 − 2,
s² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2).

• Suppose we want a significance level α. Then to find the region of rejection, we find tα such that P(t > tα) = α. Table A-28 only has one-tailed probabilities of α = 5%, 2.5%, 1% or 0.5%, so we have to use one of these values. Find tα by looking at the α-column and row d for the degrees of freedom. Since this is a one-tailed probability, the decision rule is to reject H0: µ1 = µ2 if t > tα, embracing HA: µ1 > µ2 instead.

• E.g., we do 2 surveys with conditions (aefg) and get n1 = 15, x̄1 = 47.3, s1 = 2.4 and n2 = 10, x̄2 = 33.4, s2 = 3.1. To see if the data is compatible with the null hypothesis H0: µ1 = µ2, make the alternative hypothesis HA: µ1 > µ2.

Calculate the test statistic:

s² = (14(2.4²) + 9(3.1²))/(15 + 10 − 2) = 167.1/23 = 7.267,

t = (47.3 − 33.4)/√(7.267(1/15 + 1/10)) = 12.6.

Use the usual level of significance α = 5%. The degrees of freedom is d = n1 + n2 − 2 = 23. In Table A-28, at row d = 23 and the column for 5% in each tail, we find tα = 1.714. Since 12.6 > 1.714, we reject H0.

Now, suppose we had required a level of significance of α = 0.5%. Then, using the column for 0.5% in each tail in Table A-28, read tα = 2.807. Since 12.6 > 2.807 also, we reject H0 and embrace HA even at this small α.
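A sketch of the pooled one-tailed t-test in Python (SciPy assumed):

    from scipy.stats import t

    n1, x1, s1 = 15, 47.3, 2.4
    n2, x2, s2 = 10, 33.4, 3.1
    d = n1 + n2 - 2
    s2p = ((n1-1)*s1**2 + (n2-1)*s2**2) / d        # ~7.267
    ts = (x1 - x2) / (s2p * (1/n1 + 1/n2)) ** 0.5  # ~12.6
    print(t.sf(ts, df=d))                          # one-tailed p-value, essentially 0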


Significance, Confidence Intervals, Errors and Power

• The results of a hypothesis test are statistically significant if any of four equivalent conditions is true:
(a) the test statistic is in the rejection region,
(b) the p-value is less than the significance level α,
(c) the null hypothesis H0 is rejected,
(d) we conclude the alternative hypothesis HA is true.

• “Statistically significant” does not necessarily mean “significant” in the ordinary sense. Large-sample results can be the first while not being the second; small-sample results can be vice versa.

• The “HTest & CI” table lists results from Context 2 hypothesis tests and confidence intervals for two sets of sample probabilities and sample sizes.

• Let ∆̂ = sample difference = p̂1 − p̂2. Let ∆ = true difference = p1 − p2.

HTest & CI             n1 = n2 = 100            n1 = 2400, n2 = 2000
p̂1 = 44%, p̂2 = 41%     z = 0.43                 z = 2.00
                       p-value: 67% > 5%        p-value: 4.5% < 5%
HTest:                 keep H0; NOT stat sig    reject H0; stat sig
∆̂ = 3%                 NOT sig                  NOT sig
95% CI ⇒               (−10.7%, 16.7%)          (0.11%, 5.89%)
p̂1 = 55%, p̂2 = 42%     z = 1.84                 z = 8.589
                       p-value: 6.58% > 5%      p-value: 0.0% < 5%
HTest:                 keep H0; NOT stat sig    reject H0; stat sig
∆̂ = 13%                sig                      sig
95% CI ⇒               (−0.7%, 26.7%)           (10.06%, 15.94%)

• Ordinary significance is based on ∆̂, but the 95% confidence interval for ∆ tells a fuller story.

Left column, upper or lower (small samples): ∆ could be big or small.
Right column, upper or lower (large samples): ∆ is confined to a narrow range.

• A powerful test is more likely to detect when H0 is false.

• Scenarios in hypothesis testing and their probabilities:

Truth⇓ \ Decision⇒   declare H0, reject HA.           reject H0, declare HA.
H0                   Correct, Pr = 100% − α           Type I error, “false alarm”, Pr = α
HA                   Type II error, “miss”, Pr = β    Correct, Pr = 100% − β (Power of the Test)

• The power of a hypothesis test is 100% − β. The α is the same significance level used previously.

• Sample size n, power, and significance level α can be put on sort of a seesaw. Holding n constant and increasing α also increases the power, 100% − β. This means β, the probability of a “miss,” is decreased. (Sounding the alarm every time at the slightest hint also means you have a better chance of being right when there really is something going on.)

• Increasing the sample size n used in a test increases the powerof the test and decreases the probability of both Type I andType II errors (“false alarms” and “misses”).

A Bayesian Approach

• In our hypothesis testing, we see how likely it is to get a result assuming a hypothesis is true. If it is too unlikely, we say the hypothesis must be false. But we do not say anything about how likely that hypothesis is in the first place.

• The Prosecutor’s Fallacy is an example of how this issue arises. For a defendant J, let H0 = Innocent, HA = Guilty, and M = Matching DNA. The prosecutor argues: P(a match) = 0.01, J matches, so P(J innocent, given J matches) = 0.01, and P(J guilty, given J matches) = 0.99. Symbolically:

P(M) = 0.01, J is M, so P(H0 | M) = 0.01,
P(HA | M) = 1 − P(H0 | M) = 0.99 ≈ 99%.

• J is in trouble, but we have P(H0 & M) = P(M & H0), and Ch. 8 (MR1) with algebra gives Bayes’ Theorem:

(H1) P(H0 | M) = P(M | H0)P(H0)/P(M).

• To write a plain probability with conditioned probabilities:

(H2) P(M)= P(M |H0)P(H0)+P(M |HA)P(HA).

• A Bayesian statistician calls P(H0) and P(HA) prior probabilities; their values come from data had before the evidence M is added. P(H0 | M) is a posterior probability, a new probability recomputed after accounting for the evidence M.

• If J is innocent, his probability of matching is the same as for the general population, so P(M | H0) = 0.01. If J is guilty, a match is almost certain, so P(M | HA) ≈ 1.

• Bayesian methods call for prior probabilities. Suppose from earlier evidence, we know P(H0) = 0.7 and P(HA) = 0.3. Put all these values into (H2).

P(M) = 0.01(0.7) + 1(0.3) = 0.307.

Then (H1) gives the posterior probability for J being innocent given his DNA matches:

P(H0 | M) = (0.01)(0.7)/0.307 = 0.023.

• J’s probability of being guilty, given his DNA matches, is

P(HA | M) = 1 − P(H0 | M) = 0.977 ≈ 98%.
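The posterior arithmetic in a few lines of Python (plain arithmetic, no library needed):

    # Priors from earlier evidence, likelihoods from the DNA test:
    P_H0, P_HA = 0.7, 0.3
    P_M_H0, P_M_HA = 0.01, 1.0
    P_M = P_M_H0 * P_H0 + P_M_HA * P_HA  # (H2): 0.307
    P_H0_M = P_M_H0 * P_H0 / P_M         # (H1): ~0.023
    print(P_H0_M, 1 - P_H0_M)            # innocent ~2.3%, guilty ~97.7%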

• J looks pretty guilty, but the prosecutor’s reasoning was bad. He mistook P(M | H0) for P(H0 | M). In this example, J’s verdict was unchanged by correct Bayesian methods, but there have been legal cases where correctness has made a difference. A lawyer intentionally using the Prosecutor’s Fallacy is subject to disbarment.


13. Chi-Square Tests

The Chi-Square Test for Independence

• Context: We have one population and two categorical variables. Assuming the variables are independent in the population, how likely is it to get what we got? To perform this test of independence, we do a fancy comparison of our data to perfectly independent data.

• The table of our observed data looks like this:

         1     2     3     Total
a        T1a   T2a   T3a   Ta
b        T1b   T2b   T3b   Tb
Total    T1    T2    T3    T

• Using Ch. 4 and algebra, we can construct a table of independent data, the expected values table, from the row and column totals {Ta, Tb} and {T1, T2, T3}:

         1         2         3         Total
a        T1Ta/T    T2Ta/T    T3Ta/T    Ta
b        T1Tb/T    T2Tb/T    T3Tb/T    Tb
Total    T1        T2        T3        T

• Theory says we can boil down the difference between the observed data and the expected values into one number called the chi-square statistic:

χ² = Σ (Observed − Expected)² / Expected.

To compute χ², we take each cell in the observed data table and subtract from it the same cell in the expected values table, square the result, then divide by the expected value. Finally, we add up the results from all the cells.

• Now, χ² will be exactly 0 if our sample data shows no association between the variables. But the stronger the association in the sample data, the larger χ² is. Theory says that χ² has a distribution called the chi-square distribution, which depends on the degrees of freedom (d.f.) in our data, given by

d.f. = (#rows − 1)(#columns − 1) = (R − 1)(C − 1).

• To justify using this theory, we must have two conditions:

(1) Random sampling with replacement, or without replacement with N ≫ n,
(2) All expected values ≥ 5 or so.

Notice that we cannot know (2) until after our survey is done. If (2) is not met, consult a statistician.

• Using this theory, we will look up in Table A-29 the tail probability of getting the χ² computed from our data if the variables were actually independent. In other words, we look up the probability of getting sample data showing an association of amount χ² if there actually was no association in the population.

• E.g., we take a survey and get this observed data:

⇓Time; Age⇒   < 30   30–40   > 40   Total
day            88     28      4     120
evening        32     32     16      80
Total         120     60     20     200

• Construct a table of independent data for comparison. First, for each cell, form the product of the row and column marginal totals.

⇓Time; Age⇒   < 30       30–40     > 40      Total
day           120(120)   60(120)   20(120)   120
evening       120(80)    60(80)    20(80)     80
Total         120        60        20        200

Then, divide each cell by the grand total, 200, getting the expected value table:

⇓Time; Age⇒   < 30   30–40   > 40   Total
day            72     36     12     120
evening        48     24      8      80
Total         120     60     20     200

• Now, make sure of conditions (1) and (2). Condition (1) is met by our having correctly designed a survey. (2) is met because the entries in the expected value table are all more than 5. With both conditions met, find the degrees of freedom in this survey.

d.f. = (2 − 1)(3 − 1) = 1(2) = 2.

• Calculate the χ² statistic by first subtracting the expected values from the observed values:

          < 30      30–40     > 40
day       88 − 72   28 − 36   4 − 12
evening   32 − 48   32 − 24   16 − 8

Then, square each cell and divide by the expected value:

          < 30        30–40      > 40
day       16²/72      (−8)²/36   (−8)²/12
evening   (−16)²/48   8²/24      8²/8

Finally, find χ² by adding up all the cells:

          < 30   30–40   > 40
day       3.56   1.78    5.33
evening   5.33   2.67    8.00

χ² = 26.67

• Now, do the actual test. Use the usual 5% level of significance. Table A-29 shows χ² = 26.67 is beyond the 0.5% cut-off on the d.f. = 2 row, so its tail probability is less than 0.5%. So, if H0: “Time and Age are not associated in the population” is true, the probability of getting the sample data we got is less than 0.5%. Since 0.5% < 5%, reject H0 and say there is an association in the population.
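The whole test can be cross-checked with SciPy's chi2_contingency (a sketch; not part of the handout):

    from scipy.stats import chi2_contingency

    obs = [[88, 28, 4],
           [32, 32, 16]]
    chi2, p, dof, expected = chi2_contingency(obs)
    print(chi2, dof, p)  # ~26.67, 2, p ~ 1.6e-06 (far below 5%)
    print(expected)      # matches the expected value table above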


The Chi-Square Test for Goodness-of-Fit

• Context: We have one population, one categorical variable, and think we know the distribution of that variable. We are asking: if the variable has the suspected distribution, how likely is it to get the distribution we got? A test for goodness-of-fit provides the answer.

• Typically, we would be comparing two surveys with the same categorical variable. That variable should have the same categories, not just be about the same thing, because we have to compare distributions.

• Suppose we have the relative frequency distribution from an old survey. There is one categorical variable with k = 4 categories {1, 2, 3, 4}.

old   1    2    3    4    Total
%     p1   p2   p3   p4   100%

• Now, suppose we have observed the frequency (not relative frequency) distribution from a new survey using the same categorical variable. There are n units in the survey in all.

Obs   1    2    3    4    Total
f     n1   n2   n3   n4   n

• Now, we need to compare these two distributions. There are two ways to go. We could convert the new data to relative frequencies and then compare the relative frequency distributions. Or, we could use the old distribution to make up a table of data and compare the frequency distributions. We will do the latter. (The old data has been boiled down to relative frequencies. We need to “unboil” it to make the comparison.) So, apply the old distribution to the present n, getting an expected value table.

Exp   1     2     3     4     Total
f     np1   np2   np3   np4   n

• So, now we have two frequency distributions, the observed and the expected, and want to compare them. The null hypothesis H0 will be that the distribution for the new survey is the same as the old. The question is: assuming H0, that the distribution is the same, how likely is it for us to get the distribution we got? If the likelihood is less than some significance level, say 5%, we stop believing H0 and say the new data must have a different distribution.

• To justify using this theory, we must have both:

(1) Random sampling with replacement, or without replacement with N ≫ n,
(2) All expected values ≥ 5 or so. (Known after the survey.)

• Theory says we can compare the two frequency distributions by boiling their differences down to one number, the chi-square statistic,

χ² = Σ (Observed − Expected)² / Expected.

• To find χ², find the value for each category and then add them all up:

1               2               3               4               Total
(n1−np1)²/np1   (n2−np2)²/np2   (n3−np3)²/np3   (n4−np4)²/np4   χ²

• Find the degrees of freedom for this variable, which is

d.f. = #Categories − 1 = k − 1.

Suppose we have decided on a 5% level of significance. Use the (d.f.)-th row of Table A-29 for 5% to see where χ² lies. If the p-value is less than 5%, we can no longer believe H0, that our new data is distributed the same as the old survey.

• E.g., an old survey’s relative frequency distribution is:

Age   < 30   31–40   41–50   > 50   Total
%     30%    42%     18%     10%    100%

We do a new survey with n = 50, getting

Obs   < 30   31–40   41–50   > 50   Total
f     9      25      13      3      50

To do the comparison, we build a frequency distribution from the old distribution by multiplying the percentages by n in each category.

Exp   < 30       31–40      41–50      > 50       Total
fE    (0.30)50   (0.42)50   (0.18)50   (0.10)50
      = 15       = 21       = 9        = 5        50

Now, for each category, find the difference between the Obs table and the Exp table, square it, and divide by the same entry in the Exp table.

χ²    < 30         31–40         41–50        > 50       Total
      (9−15)²/15   (25−21)²/21   (13−9)²/9    (3−5)²/5
      = 2.4        = 0.76        = 1.78       = 0.8      χ² = 5.74

The d.f. for this variable is 4 − 1 = 3. The conditions are both met. We decide on a 5% level of significance. Then, in Table A-29, we see that the cut-off for 5%, d.f. 3, is 7.81. Since our χ² = 5.74 < 7.81, we are not past the cut-off, and our p-value is not less than 5%. So, we cannot say that the distribution for the new survey is different from that of the old survey. We keep on believing H0.
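A cross-check with SciPy's chisquare (a sketch; not part of the handout):

    from scipy.stats import chisquare

    obs = [9, 25, 13, 3]
    exp = [15, 21, 9, 5]
    chi2, p = chisquare(f_obs=obs, f_exp=exp)
    print(chi2, p)  # ~5.74, p ~ 0.12 > 0.05: keep H0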
