Stat 13, Intro. to Statistical Methods for the Life and Health Sciences.

1. Power.
2. Confidence Intervals for a proportion and the dog sniffing cancer example.
3. CIs for a proportion and the Affordable Care Act example.
4. CIs for a population mean and the used cars example.

Start reading chapter 4.
http://www.stat.ucla.edu/~frederic/13/F16 .
HW2 is due Oct 18 and is problems 2.3.15, 3.3.18, and 4.1.23.


1. Power.

• Power is 1 − P(Type II error). Usually expressed as a function of μ.
• Recall Type I and Type II errors.
  – A type I error is a false positive. Rejecting the null when it is true.
  – A type II error is a false negative. Failing to reject the null when the null is false.

Power

• The probability of rejecting the null hypothesis when it is false is called the power of a test.
• Power is 1 minus the probability of type II error.
• We want a test with high power and this is aided by
  – A large effect size, i.e. true μ far from the parameter in the null hypothesis.
  – A large sample size.
  – A small standard deviation.
  – Significance level. A higher significance level means greater power. The downside is that you increase the chance of making a type I error.
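The four factors above can be checked numerically. The following is a minimal sketch, assuming a one-sided z-test with a known standard deviation (the slides do not fix a particular test); the function name `power_one_sided_z` and all scenario numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(mu_true, mu_null, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu_null against Ha: mu > mu_null."""
    z = NormalDist()                     # standard normal distribution
    critical = z.inv_cdf(1 - alpha)      # reject H0 when the z-statistic exceeds this
    shift = (mu_true - mu_null) / (sigma / sqrt(n))  # true effect in standard-error units
    return 1 - z.cdf(critical - shift)   # P(reject H0) when the true mean is mu_true


# Hypothetical baseline scenario, then one factor changed at a time.
baseline = power_one_sided_z(mu_true=1.0, mu_null=0.0, sigma=4.0, n=25)
bigger_effect = power_one_sided_z(mu_true=2.0, mu_null=0.0, sigma=4.0, n=25)
bigger_sample = power_one_sided_z(mu_true=1.0, mu_null=0.0, sigma=4.0, n=100)
smaller_sd = power_one_sided_z(mu_true=1.0, mu_null=0.0, sigma=2.0, n=25)
higher_alpha = power_one_sided_z(mu_true=1.0, mu_null=0.0, sigma=4.0, n=25, alpha=0.10)

for label, value in [
    ("baseline", baseline),
    ("bigger effect", bigger_effect),
    ("bigger sample", bigger_sample),
    ("smaller SD", smaller_sd),
    ("higher alpha", higher_alpha),
]:
    print(f"{label:15s} power = {value:.3f}")
```

Each changed scenario should report a higher power than the baseline. When the true mean equals the null value, the function returns alpha, the type I error rate.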

Estimation and confidence intervals.

Chapter 3

Chapter Overview

• So far, we can only say things like
  – “We have strong evidence that the long-run probability Buzz pushes the correct button is larger than 0.5.”
  – “We do not have strong evidence kids have a preference between candy and a toy when trick-or-treating.”
• We want a method that says
  – “I believe 68 to 75% of all elections can be correctly predicted by the competent face method.”

Confidence Intervals

• Interval estimates of a population parameter are called confidence intervals.
• We will find confidence intervals three ways.
  – Through a series of tests of significance to see which proportions are plausible values for the parameter.
  – Using the standard deviation of the simulated null distribution to help us determine the width of the interval.
  – Through traditional theory-based methods.

Statistical Inference: Confidence Intervals

Section 3.1

Can Dogs Sniff Out Cancer?

Section 3.1

Can Dogs Sniff Out Cancer?

Sonoda et al. (2011). Marine, a dog originally trained for water rescues, was tested to see if she could detect if a patient had colorectal cancer by smelling a sample of their breath.
• She first smells a bag from a patient with colorectal cancer.
• Then she smells 5 other samples; 4 from normal patients and 1 from a person with colorectal cancer.
• She is trained to sit next to the bag that matches the scent of the initial bag (the “cancer scent”) by being rewarded with a tennis ball.

Can Dogs Sniff Out Cancer?

In Sonoda et al. (2011), Marine was tested in 33 trials.
• Null hypothesis: Marine is randomly guessing which bag is the cancer specimen (π = 0.20)
• Alternative hypothesis: Marine can detect cancer better than guessing (π > 0.20)

π represents her long-run probability of identifying the cancer specimen.

Can Dogs Sniff Out Cancer?

• 30 out of 33 trials resulted in Marine correctly identifying the bag from the cancer patient.
• So our sample proportion is

  $\hat{p} = 30/33 \approx 0.909$

• Do you think Marine can detect cancer?
• What sort of p-value will we get?

Can Dogs Sniff Out Cancer?

Our sample proportion lies more than 10 standard deviations above the mean of the null distribution and hence our p-value ≈ 0.
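The "more than 10 standard deviations" claim can be reproduced with a few lines of arithmetic. This is a sketch only: it assumes the null standard deviation is the theoretical value √(π(1 − π)/n) rather than one read off a simulation, and it computes the one-sided p-value exactly from the binomial distribution. The variable names are illustrative.

```python
from math import comb, sqrt

n, successes, pi_null = 33, 30, 0.20
p_hat = successes / n

# Standard deviation of the sample proportion if the null hypothesis is true.
sd_null = sqrt(pi_null * (1 - pi_null) / n)

# Standardized statistic: how many null SDs the observed proportion sits above 0.20.
z = (p_hat - pi_null) / sd_null

# Exact one-sided p-value: P(X >= 30) when X ~ Binomial(33, 0.20).
p_value = sum(
    comb(n, k) * pi_null**k * (1 - pi_null) ** (n - k)
    for k in range(successes, n + 1)
)

print(f"p_hat = {p_hat:.3f}, standardized statistic = {z:.2f}, p-value = {p_value:.2e}")
```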

Can Dogs Sniff Out Cancer?

• Can we estimate Marine’s long-run frequency of picking the correct specimen?
• Since our sample proportion is about 0.909, it is plausible that 0.909 is a value for this frequency. What about other values?
• Is it plausible that Marine’s frequency is actually 0.70 and she had a lucky day?
• Is a sample proportion of 0.909 unlikely if π = 0.70?

Can Dogs Sniff Out Cancer?

• H0: π = 0.70   Ha: π ≠ 0.70
• We get a small p-value (0.0090) so we can essentially rule out 0.70 as her long-run frequency.

Can Dogs Sniff Out Cancer?

• What about 0.80?
• Is 0.909 unlikely if π = 0.80?

Can Dogs Sniff Out Cancer?

• H0: π = 0.80   Ha: π ≠ 0.80
• We get a large p-value (0.1470) so 0.80 is a plausible value for Marine’s long-run frequency.

Developing a range of plausible values

• If we get a small p-value (like we did with 0.70) we will conclude that the value under the null is not plausible. This is when we reject the null hypothesis.
• If we get a large p-value (like we did with 0.80) we will conclude the value under the null is plausible. This is when we can’t reject the null.

Developing a range of plausible values

• One could use software (like the one-proportion applet the book recommends) to find a range of plausible values for Marine’s long-term probability of choosing the correct specimen.
• We will keep the sample proportion the same and change the possible values of π.
• We will use 0.05 as our cutoff value for whether a p-value is small or large. (Recall that this is called the significance level.)

Can Dogs Sniff Out Cancer?

• It turns out values between 0.761 and 0.974 are plausible values for Marine’s probability of picking the correct specimen.

Probability under null   0.759   0.760   0.761   0.762   …     0.973   0.974   0.975   0.976
p-value                  0.042   0.043   0.063   0.063   …     0.059   0.054   0.049   0.044
Plausible?               No      No      Yes     Yes     Yes   Yes     Yes     No      No

Can Dogs Sniff Out Cancer?

• (0.761, 0.974) is called a confidence interval.
• Since we used 5% as our significance level, this is a 95% confidence interval. (100% − 5%)
• 95% is the confidence level associated with the interval of plausible values.
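The repeated-tests procedure can be sketched in code. The slides do not say how the applet defines a two-sided p-value, so this sketch assumes one common exact rule (the one used by R's `binom.test`): add up the probabilities of all outcomes that are no more likely than the observed one. The function names and the grid of candidate values are illustrative, and the endpoints it prints may differ slightly from the table above if the applet uses another rule or simulation.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(successes, n, pi_null):
    """Sum the probabilities of every outcome no more likely than the observed count."""
    probabilities = [binom_pmf(k, n, pi_null) for k in range(n + 1)]
    cutoff = probabilities[successes] * (1 + 1e-7)  # small tolerance for rounding error
    return sum(p for p in probabilities if p <= cutoff)


n, successes, alpha = 33, 30, 0.05

# Candidate values of pi: 0.500, 0.501, ..., 0.999.
candidates = [i / 1000 for i in range(500, 1000)]

# A candidate is plausible when the test of H0: pi = candidate is NOT rejected.
plausible = [pi for pi in candidates if two_sided_p_value(successes, n, pi) > alpha]

print(f"p-value for pi = 0.70: {two_sided_p_value(successes, n, 0.70):.4f}")
print(f"p-value for pi = 0.80: {two_sided_p_value(successes, n, 0.80):.4f}")
print(f"plausible values run from {min(plausible)} to {max(plausible)}")
```

Changing `alpha` to 0.01 gives the corresponding 99% interval, which should come out wider.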

Can Dogs Sniff Out Cancer?

• We would say we are 95% confident that Marine’s probability of correctly picking the bag with breath from the cancer patient from among 5 bags is between 0.761 and 0.974.
• This is a more precise statement than our initial significance test which concluded Marine’s probability was more than 0.20.
• Sidenote: We do not say P{π is in (.761, .974)} = 95%, because π is not random. The interval is random, and would change with a different sample. If we calculate an interval this way, then P(interval contains π) = 95%.

Confidence Level

• If we increase the confidence level from 95% to 99%, what will happen to the width of the confidence interval?

Can Dogs Sniff Out Cancer?

• Since the confidence level gives an indication of how sure we are that we captured the actual value of the parameter in our interval, to be more sure our interval should be wider.
• How would we obtain a wider interval of plausible values to represent a 99% confidence level?
  – Use a 1% significance level in the tests.
  – Values that correspond to 2-sided p-values larger than 0.01 should now be in our interval.

2SD and Theory-Based Confidence Intervals for a Single Proportion

Section 3.2

Introduction

• Section 3.1 found confidence intervals by doing repeated tests of significance (changing the value in the null hypothesis) to find a range of values that were plausible for the population parameter (long-run probability or population proportion).
• This is a very tedious way to construct a confidence interval.
• We will now look at two other ways to construct confidence intervals [2SD and Theory-Based].

The Affordable Care Act

Example 3.2

The Affordable Care Act

• A November 2013 Gallup poll based on a random sample of 1,034 adults asked whether the Affordable Care Act had affected the respondents or their family.
• 69% of the sample responded that the act had no effect. (This number went down to 59% in May 2014 and 54% in Oct 2014.)
• What can we say about the proportion of all adult Americans that would say the act had no effect?

The Affordable Care Act

• We could construct a confidence interval just like we did last time.
• We find we are 95% confident that the proportion of all adult Americans that felt unaffected by the ACA is between 0.661 and 0.717.

Probability under null     0.659    0.660    0.661    …    0.717    0.718    0.719
Two-sided p-value          0.0388   0.0453   0.0514   …    0.0517   0.0458   0.0365
Plausible value (0.05)?    No       No       Yes      …    Yes      No       No

Shortcut?

• The method we used last time to find our interval of plausible values for the parameter is tedious and time consuming.
• Might there be a shortcut?
• Our sample proportion should be the middle of our confidence interval.
• We just need a way to find out how wide it should be.

2SD method

• When a statistic is normally distributed, about 95% of the values fall within 2 standard deviations of its mean, with the other 5% outside this region.

2SD method

• So we could say that a parameter value is plausible if it is within 2 standard deviations (SD) of our best estimate of the parameter, our observed sample statistic.
• This gives us the simple formula for a 95% confidence interval of

  $\hat{p} \pm 2\,SD$

Where do we get the SD?

• Null distribution for ACA with π = 0.5.

2SD method

• Using the 2SD method on our ACA data we get a 95% confidence interval

  0.69 ± 2(0.016) = 0.69 ± 0.032

• The ± part, like 0.032 in the above, is called the margin of error.
• The interval can also be written as we did before using just the endpoints: (0.658, 0.722).
• This is approximately what we got with our range of plausible values method (a bit wider).
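The 2SD calculation can be sketched by simulating the null distribution directly. This is an illustration under stated assumptions: the null value π = 0.5 and n = 1034 come from the slides, while the seed, the number of repetitions, and the variable names are arbitrary choices. Because the SD is simulated, the printed numbers will be close to, but not exactly, the slide's 0.016 and 0.032.

```python
import random
from statistics import stdev

random.seed(13)  # arbitrary seed so the run is repeatable

n, p_hat, pi_null = 1034, 0.69, 0.5
reps = 1000  # number of simulated samples (arbitrary)

# Each simulated sample: n yes/no responses with P(yes) = pi_null, reduced to a proportion.
simulated = [
    sum(random.random() < pi_null for _ in range(n)) / n
    for _ in range(reps)
]

sd_null = stdev(simulated)   # SD of the simulated null distribution
margin = 2 * sd_null         # margin of error for the 2SD method
interval = (p_hat - margin, p_hat + margin)

print(f"SD of null distribution = {sd_null:.4f}")
print(f"2SD interval = ({interval[0]:.3f}, {interval[1]:.3f})")
```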

Theory-Based Methods

• The 2SD method only gives us a 95% confidence interval.
• If we want a different level of confidence, we can use the range of plausible values (hard) or theory-based methods (easy).
• The theory-based method is valid provided there are at least 10 successes and 10 failures in your sample.

Theory-Based Methods

• With theory-based methods we use normal distributions to approximate our simulated null distributions.
• Therefore we can develop a formula for confidence intervals.

  $\hat{p} \pm \text{multiplier} \times \sqrt{\hat{p}(1-\hat{p})/n}$

For a 95% CI, the book suggests a multiplier of 2. Actually people use 1.96, not 2.

In R, qnorm(.975) = 1.96 (for a 95% CI) and qnorm(.995) = 2.58 (for a 99% CI).
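For readers without R at hand, the same multipliers can be obtained from Python's standard library, where `NormalDist().inv_cdf` plays the role of `qnorm`. The helper name `normal_multiplier` is made up for this sketch.

```python
from statistics import NormalDist


def normal_multiplier(confidence):
    """Standard normal multiplier for a two-sided interval at the given confidence level."""
    tail = (1 - confidence) / 2             # area left in each tail
    return NormalDist().inv_cdf(1 - tail)   # same quantile qnorm(1 - tail) returns in R


m95 = normal_multiplier(0.95)
m99 = normal_multiplier(0.99)
print(f"95% multiplier = {m95:.2f}, 99% multiplier = {m99:.2f}")
```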

• Let’s check out this example using the theory-based method.
• Remember 69% of 1034 respondents were not affected.

  $\hat{p} \pm \text{multiplier} \times \sqrt{\hat{p}(1-\hat{p})/n}$
  $= 69\% \pm 2 \times \sqrt{0.69(1-0.69)/1034}$
  $= 69\% \pm 2.88\%$.

With 1.96 instead of 2 it would be 69% ± 2.82%.
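The arithmetic above is easy to verify. The sketch below wraps the formula in a hypothetical helper, `proportion_margin`, and also applies the validity condition stated earlier (at least 10 successes and 10 failures), using approximate counts derived from the reported 69%.

```python
from math import sqrt


def proportion_margin(p_hat, n, multiplier):
    """Margin of error: multiplier * sqrt(p_hat * (1 - p_hat) / n)."""
    return multiplier * sqrt(p_hat * (1 - p_hat) / n)


p_hat, n = 0.69, 1034

# Validity condition: at least 10 successes and 10 failures (counts are approximate).
valid = p_hat * n >= 10 and (1 - p_hat) * n >= 10

margin_2 = proportion_margin(p_hat, n, 2)        # the book's multiplier
margin_196 = proportion_margin(p_hat, n, 1.96)   # the usual normal multiplier

print(f"validity condition met: {valid}")
print(f"multiplier 2:    {p_hat:.2%} +/- {margin_2:.2%}")
print(f"multiplier 1.96: {p_hat:.2%} +/- {margin_196:.2%}")
```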

2SD and Theory-Based Confidence Intervals for a Single Mean

Section 3.3

Used Cars

Example 3.3

Used Cars

The following histogram displays data for the selling price of 102 Honda Civics that were listed for sale on the Internet in July 2006.

Used Cars

• The average of this sample is x̄ = $13,292 with a standard deviation of s = $4,535.
• What can we say about μ, the average price of all used Honda Civics?

Used Cars

• While we should be cautious about our sample being representative of the population, let’s treat it as such.
• μ might not equal $13,292 (the sample mean), but it should be close.
• To determine how close, we can construct a confidence interval.

Confidence Intervals

• Remember the basic form of a confidence interval is:

  statistic ± multiplier × (SD of statistic)

  SD of statistic is also called Standard Error (SE).
• In our case, the statistic is x̄ so we write our 2SD confidence interval as:

  $\bar{x} \pm 2\,(SE)$

Confidence Intervals

• It is important to note that the SD of x̄ (the SE) and the SD of our sample (s = $4,535) are not the same.
• There is more variability in the data (the car-to-car variability) than in sample means.
• The SE is s/√n, which means we can write a 2SD confidence interval as:

  $\bar{x} \pm 2 \times \dfrac{s}{\sqrt{n}}$

• This method will be valid when the null distribution is bell-shaped.

Summary Statistics

• A theory-based confidence interval is quite similar except it uses a multiplier that is based on a t-distribution and is dependent on the sample size and confidence level.
• For a theory-based confidence interval for a population mean (called a one-sample t-interval) to be valid, the observations should be (approx.) independent, and either the population should be normal or n should be large. Check the sample distribution for skew and asymmetry.

Confidence Intervals

• We find our 95% CI for the mean price of all used Honda Civics is from $12,401.20 to $14,182.80.
• Notice that this is a much narrower range than the prices of all used Civics.
• For a 99% confidence interval, it would be wider. The multiplier would be about 2.6 instead of about 1.98 (with a normal distribution rather than a t-distribution, 2.58 instead of 1.96).
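The used-car interval can be rebuilt from the summary statistics alone. One caveat: Python's standard library has no t-distribution quantile function, so this sketch hard-codes 1.984 as an approximate 95% multiplier for 101 degrees of freedom. That value is an assumption taken from a t table, not something stated in the slides.

```python
from math import sqrt

n, x_bar, s = 102, 13292.0, 4535.0

se = s / sqrt(n)  # standard error of the sample mean

# 2SD interval: x_bar +/- 2 * SE.
two_sd_interval = (x_bar - 2 * se, x_bar + 2 * se)

# One-sample t-interval. ASSUMPTION: 1.984 approximates the 97.5th percentile
# of the t-distribution with n - 1 = 101 degrees of freedom (from a t table).
T_MULTIPLIER_95 = 1.984
t_interval = (x_bar - T_MULTIPLIER_95 * se, x_bar + T_MULTIPLIER_95 * se)

print(f"SE = {se:.2f}")
print(f"2SD interval: ({two_sd_interval[0]:.2f}, {two_sd_interval[1]:.2f})")
print(f"t-interval:   ({t_interval[0]:.2f}, {t_interval[1]:.2f})")
```

Note how much smaller the SE is than s: the interval describes uncertainty about the mean price, not the spread of individual car prices.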

Factors that Affect the Width of a Confidence Interval

Section 3.4

Factors Affecting Confidence Interval Widths

• Level of confidence (e.g., 90% vs. 95%)
  – As we increase the confidence level, we increase the width of the interval.
• Sample size
  – As sample size increases, variability decreases and hence the standard error will be smaller. This will result in a narrower interval.
• Sample standard deviation
  – A larger standard deviation, s, will yield a wider interval.
  – For sample proportions, wider intervals when p̂ is closer to 0.5. s = √[p̂(1 − p̂)].

Level of Confidence

• If we have a wider interval, we should be more confident that we have captured the population proportion or population mean.
• We could see this with repeated tests of significance.
  – A higher confidence level corresponds to a lower significance level, and one must go farther to the left and farther to the right in our tables to get our confidence interval.

Sample Size

• We know as sample size increases, the variability (and thus standard deviation) in our null distribution decreases.

[Figure: null distributions for n = 90 (SD = 0.054), n = 361 (SD = 0.026), and n = 1444 (SD = 0.013).]

Sample size            90               361              1444
SD of null distr.      0.053            0.027            0.013
Margin of error        2 × SD = 0.106   2 × SD = 0.054   2 × SD = 0.026
Confidence interval    (0.091, 0.303)   (0.143, 0.251)   (0.171, 0.223)

Sample Size

• (With everything else staying the same) increasing the sample size will make a confidence interval narrower.

Notice:
• The observed sample proportion is the midpoint (that won’t change).
• Margin of error is a multiple of the standard deviation, so as the standard deviation decreases, so will the margin of error.
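The pattern in the table can be reproduced with the theoretical formula SD = √(π(1 − π)/n). Two assumptions go into this sketch: the slides do not state the null value behind the table, and π = 0.5 is used here because it gives SDs close to the tabulated ones; the midpoint 0.197 is read off the tabulated intervals. The table's SDs appear to come from simulation, so small differences in the third decimal are expected.

```python
from math import sqrt

pi_null = 0.5   # ASSUMPTION: null value that approximately reproduces the tabulated SDs
p_hat = 0.197   # midpoint of every interval in the table

rows = []
for n in (90, 361, 1444):
    sd = sqrt(pi_null * (1 - pi_null) / n)  # theoretical SD of the null distribution
    margin = 2 * sd                         # 2SD margin of error
    rows.append((n, sd, margin, (p_hat - margin, p_hat + margin)))

for n, sd, margin, (low, high) in rows:
    print(f"n = {n:4d}  SD = {sd:.3f}  margin = {margin:.3f}  interval = ({low:.3f}, {high:.3f})")
```

Each step roughly quadruples the sample size, which halves the SD and therefore the margin of error.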

Value of p̂ (or the value used for π under the null)

• As the value that is used under the null gets farther away from 0.5, the standard deviation of the null distribution decreases.
• When this standard deviation is used in the 2SD method, the interval gets gradually narrower.
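A quick tabulation shows how the standard deviation √(p(1 − p)/n) shrinks as p moves away from 0.5. The sample size n = 100 and the grid of values are arbitrary choices for this sketch, and `two_sd_margin` is a hypothetical helper name.

```python
from math import sqrt


def two_sd_margin(p, n):
    """2SD margin of error when the SD is sqrt(p * (1 - p) / n)."""
    return 2 * sqrt(p * (1 - p) / n)


n = 100  # arbitrary sample size for illustration
values = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
margins = {p: two_sd_margin(p, n) for p in values}

for p, margin in margins.items():
    print(f"p = {p:.1f}  margin of error = {margin:.3f}")

widest = max(margins, key=margins.get)
print(f"widest interval at p = {widest}")
```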

Standard Deviation

• Suppose we are taking repeated samples of a population.
• How do we estimate what the standard deviation of the null distribution (standard error) will be? s/√n.

[Figure: means of samples of size 10.]

Standard Deviation

• The SD of the null distribution is approximated by s/√n.
• Remember that 2(s/√n) is approximately the margin of error for a 95% confidence interval, so as the standard deviation of the sample data (s) increases so does the width of the confidence interval.
• Intuitively this should make sense: more variability in the data should be reflected by a wider confidence interval.

Formulas for Theory-Based Confidence Intervals

  $\hat{p} \pm \text{multiplier} \times \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

  $\bar{x} \pm \text{multiplier} \times \dfrac{s}{\sqrt{n}}$

• The width of the confidence interval increases as level of confidence increases (multiplier).
• The width of the confidence interval decreases as the sample size increases.
• The value p̂ also has a more subtle effect. The farther it is from 0.5 the smaller the width.
• The width of the confidence interval increases as the sample standard deviation increases.

What does 95% confidence mean?

• If we repeatedly sampled from a population and constructed 95% confidence intervals, about 95% of our intervals would contain the population parameter.
• Notice the interval is the random event here.

What does 95% confidence mean?

• Suppose a 95% confidence interval for a mean is 2.5 to 4.3. We would say we are 95% confident that the population mean is between 2.5 and 4.3.
  – Does that mean that 95% of the data fall between 2.5 and 4.3?
    • No
  – Does that mean that in repeated sampling, 95% of the sample means will fall between 2.5 and 4.3?
    • No
  – Does that mean that there is a 95% chance the population mean is between 2.5 and 4.3?
    • Not quite but close.

What does 95% confidence mean?

• What does it mean when we say we are 95% confident that the population mean is between 2.5 and 4.3?
  – It means that if we repeated this process many times (taking random samples of the same size from the same population and computing 95% confidence intervals for the population mean), 95% of the confidence intervals we find would contain the population mean.
  – P(confidence interval contains μ) = 95%.
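The repeated-sampling interpretation can be illustrated with a small simulation. Everything about the population here is hypothetical (a normal population with mean 3.4 and SD 4.5, samples of size 100), and the interval uses the normal multiplier 1.96 rather than a t multiplier, so the observed coverage should be near, but not exactly, 95%.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(2016)  # arbitrary seed so the run is repeatable

# Hypothetical population and sampling plan, chosen only for illustration.
MU, SIGMA = 3.4, 4.5
N, REPS = 100, 2000
MULTIPLIER = 1.96  # normal multiplier for 95% confidence

covered = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    se = stdev(sample) / sqrt(N)                 # estimated standard error
    low, high = x_bar - MULTIPLIER * se, x_bar + MULTIPLIER * se
    if low <= MU <= high:                        # did this interval capture mu?
        covered += 1

coverage = covered / REPS
print(f"{coverage:.1%} of the {REPS} intervals contained the population mean")
```

The population mean never changes between repetitions; only the interval does, which is why the probability statement is about the interval.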