Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods...

57
Stat 13, Intro. to Statistical Methods for the Life and Health Sciences. 1. Midterms. 2. Polls. 3. Two quantitative variables, correlation. 4. Linear regression. No class Thu Nov 24, Thanksgiving. Read ch10. Hw4 is 10.1.8, 10.3.14, 10.3.21, 10.4.11 and is due Tue Nov 29. The final Fri Dec 9, 8am-11, right here, will be on ch1-10. Bring a PENCIL and CALCULATOR and any books or notes you want. No computers. http://www.stat.ucla.edu/~frederic/13/F16 . 1. Midterms. The scores are listed in midtermscores.txt . They are out of 20. The mean was 16.628 = 83.14%. Median = 85%. SD = 15%. I do reward improvement on the final. 1

Transcript of Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods...

Page 1: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Stat 13, Intro. to Statistical Methods for the Life and Health Sciences.

1.Midterms.2.Polls.3.Twoquantitativevariables,correlation.4.Linearregression.

NoclassThuNov24,Thanksgiving.Readch10.Hw4is10.1.8,10.3.14,10.3.21,10.4.11andisdueTueNov29.

ThefinalFri Dec9,8am-11,right here,willbeonch1-10.BringaPENCILandCALCULATORandanybooksornotesyouwant.Nocomputers.http://www.stat.ucla.edu/~frederic/13/F16.

1.Midterms.Thescoresarelistedinmidtermscores.txt .Theyareoutof20.

Themeanwas16.628=83.14%.Median=85%.SD=15%.Idorewardimprovementonthefinal.

1

Page 2: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

2.Polls.a.Whyweretheysofaroff?

Beforewegetintothis,whenitcomestotheelection,pleasemakesuretorespectfreedomofspeechandalsotorespecteveryone'srighttotheirbeliefseveniftheyarefarfromyourown.

Ourdept's officialstatement:Intheaftermathofthepresidentialelection,inwhichpassionsranhighsurroundingissuesoftolerancefordiversity,itisimportanttorememberthatharassmentanddiscriminationbasedonsuchthingsas:•race,ethnicity,ancestry,color•sex,gender,genderidentity,genderexpression,sexualorientation•nationalorigin,citizenshipstatus•religionarenotacceptableatUCLA,andmayhaveseriousconsequences.Informationforhowtoobtainredressorcounselingifyouaresubjectedtosuchharassmentordiscriminationcanbefoundat:https://equity.ucla.edu/report-an-incident/

2

Page 3: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

2.Polls.a.Whyweretheysofaroff?

3

Page 4: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

2.Polls.

Intotalthismakes17,104likelyvotersinthoseWisconsinpollsputtogether.Theyaveraged40.3% forTrump,andClinton46.8%.Thedifferenceis6.5%.Combined,themarginoferrorfora95%confidenceintervalaroundTrump'spercentagewouldbe0.735%.Thestandarderroris0.375%ontheestimateofTrump'spercentageof40.3%,andhegot47.9%.Sotheywereoffby7.6%whichismorethan20standarderrors.Theprobabilityis1in10^90thatthepollswouldbeoffbythatmuchormorejustbychance,iftheanswerstothepollswerejustarandomsampleofhowpeoplewereactuallygoingtovote.Technically,thereareundecidedvotersinthepollsalso.JusttakingthedifferenceinpercentagesbetweenTrumpandHillaryClintonratherthanthepercentageforTrumpintoaccount,theresultswereoffbyabout10SEs,not20,andthismakestheprobabilityofsomethingthisextremeormoreextremestillastronomical,about1.5*10-23.Thechanceofamonkeyrandomlytyping15letterscompletelyatrandomandhappeningtochoose"hillary rclinton"inorder,wouldbe6*10-22,soit'sabout40timesmorelikely.Whatdoweconclude?

4

Page 5: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

2.Polls.

Whatdoweconclude?Either*lotsofpeoplechangedtheirminds,*thepollswerebiased,*theofficialresultswereincorrect,or*thepollsweren'tindependentofeachother.Thosearereallytheonlytenableexplanations.

5

Page 6: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

2. Polls.

b.IfClintonhadoutperformedthepolls,whatwouldhavebeenanexplanation?*fundraisingadvantage.* moreexperiencedcampaignstaff.*moreorganizedgroundgame.*Latinopopulationshadsurged.*AhigherpercentageofLatinosvoted.*Earlyvotinghadenabledmorepoorpeopleandminoritiestovote.Thuspeoplewhomightbeconsideredunlikelyvotersduetovotingtrendsinpreviouselectionsmightactuallybevoting,andthemajorityofthesewouldbeexpectedtobeDemocrats.*Mostlygoodnewsforherrightbeforetheelection.TheFBIsaidtheywentthroughtheemailsandclearedherofcharges.Lotsofbigstarswereperformingandgettingpeopleouttothepollsandrallies.Obama,Michelle,BillClinton,andmanyotherswerecampaigninghardforherinthefinaldays.*Meanwhile,manytopRepublicanswerenotevensupportingTrump.Hisgroundgameandfieldofficesweredisorganizedornonexistent.EvensomerightwingradiohostswerecriticizingTrump.Hehadfiredhiscampaignmanagermidwaythroughthecampaignandhadaninexperiencedhodge podgeofsupportersandstaff.

6

Page 7: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

2. Polls.

c.Otherpossiblepro-Clintonexplanations.*Hillarypreparedveryearlyforherrunforoffice.Trumpwasalatecomer.*TheDemocratsmostlycoalescedaroundHillary,allowinghertopileupnumerousendorsementsveryearly.Onlyacoupleofpeopleevenranagainsther,andnonewerereallypromisingcandidates.EvenSanderswasnotseenasaveryviablecandidatewhentheracebegan.*OntheRepublicanside,Trumphadtofendoff16othercandidateswhileHillarywasraisingmoneyandholdingontoit.*AfterTrumpgotthenomination,barely,hehadlittleconventionbounce,andHillaryhadahugeoneandgotabigleadinthepolls.*Obama'spopularityhasbeenhigh.*Theeconomy,whilenottoostrong,ismuchmuchstrongerthanwhenObamabeganinoffice,andhewasabletocampaignstronglyforHillary.*Shealsohadanincrediblycharismaticandgreatspeakerinherhusband,andMichellemadegreatspeechesaswell.*Thefatherofamuslim fallensoldierspokeeloquentlyagainstTrump.*Melania gotcaughtblatantlyplagiarizingMichelle'sspeechfrom2008.*TrumpgotsuedforfraudforTrumpUniversity,andcriticizedthejudgeasunfitbecausehewasofMexicandescent.*Trumpmadefunofareporterinawheelchair.*Clintonwonall3debatesaccordingtomostpollsandsurveys.

7

Page 8: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

2. Polls.

c.Otherpossiblepro-Clintonexplanations.*Trumpcontinuedtorefusetoreleasehistaxreturnsdespiteprodding.*ThetapecameoutwhereTrumpbraggedaboutgrabbingwomen.Clintonsurgedwayaheadinthepolls.*ManyimportantRepublicansstoppedsupportinghim.*DownballotRepublicanstriedtodistancethemselvesfromhim.*ClintonwasraisingmorefundsthanTrump,andTrumpwasnotspendingmuchofhisownmoneyonhiscampaign.

d.Bayesianstatistics,NateSilver,and538.com.

Inhismostrecentarticleon538.com,NateSilverstatesveryclearlythatheexpectedHillarytowin,andhismodelsfavoredher,buthealsoemphasizesthatthepollsaretypicallyoffbytheamountstheywereoffthistimeandpointsoutthathismodelsgaveTrumpabouta30%chanceofwinningthedaybeforetheelection.

Healsoadmitsthatthepollstypicallymissbyamountsgreaterthanwhatwewouldexpectduetosamplingerroralone.

8

Page 9: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

d.Bayesianstatistics,NateSilver,and538.com.

AsNateSilvernotes,thepollsindicatedalargeproportionofundecidedvoters,whichmeanttherewasmoreuncertaintyandthereforealargerchanceforamajorpollingerror.

NateSilverand538.comuseaBayesianmodeltoforecasttheelection.HowdoesBayesianmodelingwork?Themodelstartswithpriordistributionswhicharesupposedtoreflecttheresearcher'sbeliefsabouttheprobabilityofsomethingbeforecollectingdata.Forinstance,youmighthaveapriordistributionthatthepercentageµofvotestheDemocraticcandidatewouldgetisspreaduniformlybetween40%and60%.Thenyoucollectdata,frompolls,economicdata,etc.,andgraduallyupdateyourdistribution,formingaposteriordistribution forµ.NateSilver'smainideawastoweightthedifferentpollsbyhowaccuratetheywerehistorically.

Bayesianmodelersareoftencorrectlycriticizedfornotadequatelyvalidatingtheirmodels.NateSilvermightbeagoodmodeler,butitishardtotell.

9

Page 10: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

d.Bayesianstatistics,NateSilver,and538.com."Apartfromcalibration,arethereothergoodmethodstoevaluateaprobabilisticforecast?Notreally,althoughsometimesitcanbeworthwhiletolookforsignsofwhetheranupsetwinnerbenefitedfromgoodluckorquirky,one-offcircumstances."NateSilver,May18,2016.http://fivethirtyeight.com/features/how-i-acted-like-a-pundit-and-screwed-up-on-donald-trump

Calibration.Ifwetakeallcandidateswherethemodeloutputsaprobabilityof35%,dothecandidateswin35%ofthetime?

IcouldjustsayeveryDemocratandRepublicanhasa50%chanceeveryelection.Thismodelmightbewellcalibrated,butitisnotveryinformative.Validation.Isthemodelaccurate?Doesitoutperformcompetingmodels?e.g.Log-likelihood:∑i won log(pi)+∑i lost log(1-pi).

TherearepotentiallimitationsofBayesianstatistics.*WhymightonewanttoknowifClinton'sprobabilityofwinningis67%or72%?Perhapsifoneisonlyverysuperficiallyinterested?*WhatifmypriorisdifferentfromNateSilver's?*Ifyouaredeeplyinterested,whynotlookatthepollresultsthemselves,directly?*Bayesianmodelsmightbemoredifficulttovalidate,insomecases.

10

Page 11: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

e.ArethereotherpredictorsofsuccessinPresidentialelections?

Herearethepresidentialelectionsthathaveoccurredinmylifetime.1972.Nixonvs.McGovern.1976.Cartervs.Ford.1980.Reaganvs.Carter.1984.Reaganvs.Mondale.1988.Bushvs.Dukakis.1992.Clintonvs.Bush.1996.Clintonvs.Dole.2000.Bushvs.Gore.2004.Bushvs.Kerry.2008.Obamavs.McCain.2012.Obamavs.Romney.2016.Trump vs.Clinton.

Many would say the candidate with morehumor,style,andcharisma won11/12ofthese elections.The two-sided p-value =0.63%.Experience?Hard tosay,but perhaps the moreexperienced candidate won3/12.The two-sided p-value =14.6%.

11

Page 12: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

3.TwoQuantitativeVariablesChapter10

Page 13: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

TwoQuantitativeVariables:ScatterplotsandCorrelationSection10.1

Page 14: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

ScatterplotsandCorrelation

Time 30 41 41 43 47 48 51 54 54 56 56 56 57 58

Score 100 84 94 90 88 99 85 84 94 100 65 64 65 89

Time 58 60 61 61 62 63 64 66 66 69 72 78 79

Score 83 85 86 92 74 73 75 53 91 85 62 68 72

Supposewecollecteddataontherelationshipbetweenthetimeittakesastudenttotakeatestandtheresultingscore.

Page 15: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Scatterplot

Putexplanatoryvariableonthehorizontalaxis.

Putresponsevariableontheverticalaxis.

Page 16: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

DescribingScatterplots•Whenwedescribedatainascatterplot,wedescribethe• Direction(positiveornegative)• Form(linearornot)• Strength(strong-moderate-weak,wewillletcorrelationhelpusdecide)• UnusualObservations• Howwouldyoudescribethetimeandtestscatterplot?

Page 17: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Correlation• Correlationmeasuresthestrengthanddirectionofalinear associationbetweentwoquantitative variables.• Correlationisanumberbetween-1and1.• Withpositivecorrelationonevariableincreases,onaverage,astheotherincreases.• Withnegativecorrelationonevariabledecreases,onaverage,astheotherincreases.• Thecloseritistoeither-1or1thecloserthepointsfittoaline.• Thecorrelationforthetestdatais-0.56.

Page 18: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

CorrelationGuidelinesCorrelationValue Strengthof

AssociationWhatthismeans

0.7to1.0 Strong Thepointswillappeartobenearlyastraightline

0.3to0.7 Moderate Whenlookingatthegraphtheincreasing/decreasingpatternwillbeclear,but thereisconsiderablescatter.

0.1to0.3 Weak Withsomeeffortyouwillbeabletoseeaslightlyincreasing/decreasingpattern

0to0.1 None Nodiscernibleincreasing/decreasingpattern

Same StrengthResultswithNegativeCorrelations

Page 19: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

BacktothetestdataActuallythelastthreepeopletofinishthetesthadscoresof93,93,and97.

Whatdoesthisdotothecorrelation?

Page 20: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

InfluentialObservations• Thecorrelationchangedfrom-0.56(afairlymoderatenegativecorrelation)to-0.12(aweaknegativecorrelation).• Pointsthatarefartotheleftorrightandnotintheoveralldirectionofthescatterplotcangreatlychangethecorrelation.(influentialobservations)

Page 21: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Correlation• Correlationmeasuresthestrengthanddirectionofalinear associationbetweentwoquantitativevariables.• -1< r< 1• Correlationmakesnodistinctionbetweenexplanatoryandresponsevariables.• Correlationhasnounits.• Correlationisnotresistanttooutliers.Itissensitive.

Page 22: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

LearningObjectivesforSection10.1• Summarizethecharacteristicsofascatterplotbydescribingitsdirection,form,strengthandwhetherthereareanyunusualobservations.• Recognizethatthecorrelationcoefficientisappropriateonlyforsummarizingthestrengthanddirectionofascatterplotthathaslinearform.• Recognizethatascatterplotistheappropriategraphfordisplayingtherelationshipbetweentwoquantitativevariablesandcreateascatterplotfromrawdata.• Recognizethatacorrelationcoefficientof0meansthereisnolinearassociationbetweenthetwovariablesandthatacorrelationcoefficientof-1or1meansthatthescatterplotisexactlyastraightline.• Understandthatthecorrelationcoefficientisinfluencedbyextremeobservations.

Page 23: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

InferencefortheCorrelationCoefficient:Simulation-BasedApproachSection10.2

Page 24: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Wewilllookatasmallsampleexampletoseeifbodytemperatureisassociatedwithheartrate.

Page 25: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

TemperatureandHeartRateHypotheses

• Null:Thereisnoassociationbetweenheartrateandbodytemperature.(ρ=0)• Alternative:Thereisapositivelinearassociationbetweenheartrateandbodytemperature.(ρ>0)

ρ=rho

Page 26: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

InferenceforCorrelationwithSimulation(Section10.2)

1.Computetheobservedstatistic.(Correlation)2.Scrambletheresponsevariable,computethesimulatedstatistic,andrepeatthisprocessmanytimes.

3.Rejectthenullhypothesisiftheobservedstatisticisinthetailofthenulldistribution.

Page 27: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

TemperatureandHeartRate

Tmp 98.3 98.2 98.7 98.5 97.0 98.8 98.5 98.7 99.3 97.8HR 72 69 72 71 80 81 68 82 68 65Tmp 98.2 99.9 98.6 98.6 97.8 98.4 98.7 97.4 96.7 98.0HR 71 79 86 82 58 84 73 57 62 89

CollecttheData

Page 28: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

TemperatureandHeartRate

r=0.378

ExploretheData

Page 29: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

TemperatureandHeartRate• Iftherewasnoassociationbetweenheartrateandbodytemperature,whatistheprobabilitywewouldgetacorrelationashighas0.378justbychance?

• Ifthereisnoassociation,wecanbreakapartthetemperaturesandtheircorrespondingheartrates.Wewilldothisbyshufflingoneofthevariables.

Page 30: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

ShufflingCards• Let’sremindourselveswhatwedidwithcardstofindoursimulatedstatistics.• Withtwoproportions,wewrotetheresponseonthecards,shuffledthecardsandplacedthemintotwopilescorrespondingtothetwocategoriesoftheexplanatoryvariable.• Withtwomeanswedidthesamethingexceptthistimetheresponseswerenumbersinsteadofwords.

Page 31: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

20.0% Improvers

66.7% Improvers

DolphinTherapyControlNon-

improver

Improver

Improver

Improver

Improver

Improver

Improver

Improver

ImproverImprover

Improver

Improver

Improver

ImproverNon-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

Non-improver

40.0% Improvers

46.7% Improvers0.400 – 0.467 = -0.067

Difference in Simulated Proportions

Page 32: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

mean = 3.90mean = 19.82

Music Nomusic

14.5

25.2

11.6

12.6

18.6

12.1

30.534.5

-7.0

45.6 10.0

9.6

-10.7

7.2-14.7

21.3

2.2

4.5

-10.7

21.8

2.4

mean = 6.38 mean = 16.126.38 – 16.12 = -9.74

Difference in Simulated Means

Page 33: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

ShufflingCards• Nowhowwillthisshufflingbedifferentwhenboththeresponseandtheexplanatoryvariablearequantitative?• Wecan’tputthingsintwopilesanymore.• Westillshufflevaluesoftheresponsevariable,butthistimeplacethemnexttotwovaluesoftheexplanatoryvariable.

Page 34: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

98.3° 98.2° 97.7° 98.5° 97.0° 98.8° 98.5° 98.7° 99.3° 97.8°

98.2° 99.9° 98.6° 98.6° 97.8° 98.4° 98.7° 97.4° 96.7° 98.0°

r = 0.378

6972 8180 82687172

r = 0.073

Simulated Correlations

Body Temperature and Heart Rate

68 65

7971 8458 57738286 62 89

Page 35: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

MoreSimulations0.054

-0.253 -0.3450.062 0.259

0.339

0.447-0.008

-0.229

-0.0290.059 -0.006

-0.034

-0.327 0.1000.067

0.212

0.097

0.447

0.034

0.167

0.3290.020

-0.042

0.232

0.2000.314

Only one simulated statistic out of 30 was as large or larger than our observed correlation of 0.378, hence our p-value for this null distribution is 1/30 ≈ 0.03.

Simulated Correlations 0.378

Page 36: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

TemperatureandHeartRate• Wecanlookattheoutputof1000shuffleswithadistributionof1000simulatedcorrelations.

Page 37: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

TemperatureandHeartRate• Noticeournulldistributioniscenteredat0andsomewhatsymmetric.• Wefoundthat530/10000timeswehadasimulatedcorrelationgreaterthanorequalto0.378.

Page 38: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

TemperatureandHeartRate• Withap-valueof0.053=5.3%,wealmostbutdonotquitehavestatisticalsignificance.Thisismoderateevidenceofapositivelinearassociationbetweenbodytemperatureandheartrate.Perhapsalargersamplewouldgiveasmallerp-value.

Page 39: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

4.LeastSquaresRegressionSection10.3

Page 40: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Introduction• Ifwedecideanassociationislinear,itishelpfultodevelopamathematicalmodelofthatassociation.• Helpsmakepredictionsabouttheresponsevariable.• Theleast-squaresregressionline isthemostcommonwayofdoingthis.

Page 41: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Introduction• Unlessthepointsareperfectlylinearlyalligned,therewillnotbeasinglelinethatgoesthrougheverypoint.• Wewantalinethatgetsascloseaspossibletoallthepoints.

Page 42: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Introduction• Wewantalinethatminimizestheverticaldistancesbetweenthelineandthepoints• Thesedistancesarecalledresiduals.• Thelinewewillfindactuallyminimizesthesumofthesquaresoftheresiduals.• Thisiscalledaleast-squaresregressionline.

Page 43: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

AreDinnerPlatesGettingLarger?Example10.3

Page 44: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

GrowingPlates?• TherearemanyrecentarticlesandTVreportsabouttheobesityproblem.• Onereasonsomehavegivenisthatthesizeofdinnerplatesareincreasing.• Aretheseblackcirclesthesamesize,orisonelargerthantheother?

Page 45: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

GrowingPlates?• Theyappeartobethesamesizeformany,buttheoneontherightisabout20%largerthantheleft.

• Thissuggeststhatpeoplewillputmorefoodonlargerdinnerplateswithoutknowingit.

• Thereisnameforthisphenomenon:Delboeufillusion

Page 46: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

GrowingPlates?• Researchersgathereddatatoinvestigatetheclaimthatdinnerplatesaregrowing• Americandinnerplatessoldonebay onMarch30,2010(VanIttersum andWansink,2011)• Yearmanufacturedanddiameteraregiven.

Page 47: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

GrowingPlates?• Bothyear(explanatoryvariable)anddiameterininches(responsevariable)arequantitative.• Eachdotrepresentsoneplateinthisscatterplot.• Describetheassociationhere.

Page 48: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

GrowingPlates?• Theassociationappearstoberoughlylinear• Theleastsquaresregressionlineisadded• Howcanwedescribethisline?

Page 49: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

RegressionLineTheregressionequationis𝑦" = 𝑎 + 𝑏𝑥:• a isthey-intercept• b istheslope• x isavalueoftheexplanatoryvariable• ŷ isthepredictedvaluefortheresponsevariable

• Foraspecificvalueofx,thecorrespondingdistancey − 𝑦" (oractual– predicted)isaresidual

Page 50: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

RegressionLine• Theleastsquareslineforthedinnerplatedatais𝑦" = −14.8 + 0.0128𝑥• Ordiameter7 = −14.8 + 0.0128(year)• Thisallowsustopredictplatediameterforaparticularyear.

Page 51: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Slope𝑦" = −14.8 + 0.0128𝑥

• Whatisthepredicteddiameterforaplatemanufacturedin2000?• -14.8+0.0128(2000)=10.8in.

• Whatisthepredicteddiameterforaplatemanufacturedin2001?• -14.8+0.0128(2001)=10.8128in.

• Howdoesthiscomparetoourpredictionfortheyear2000?• 0.0128larger

• Slopeb =0.0128meansthatdiametersarepredictedtoincreaseby0.0128inchesperyearonaverage

Page 52: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Slope• Slopeisthepredictedchangeintheresponsevariableforone-unitchangeintheexplanatoryvariable.• Boththeslopeandthecorrelationcoefficientforthisstudywerepositive.• Theslopeis0.0128• Thecorrelationis0.604

• Theslopeandcorrelationcoefficientwillalwayshavethesamesign.

Page 53: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

y-intercept• They-interceptiswheretheregressionlinecrossesthey-axisorthepredictedresponsewhentheexplanatoryvariableequals0.• Wehaday-interceptof-14.8inthedinnerplateequation.Whatdoesthistellusaboutourdinnerplateexample?• Dinnerplatesinyear0were-14.8inches.

• Howcanitbenegative?• Theequationworkswellwithintherangeofvaluesgivenfortheexplanatoryvariable,butfailsoutsidethatrange.

• Ourequationshouldonlybeusedtopredictthesizeofdinnerplatesfromabout1950to2010.

Page 54: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

Extrapolation• Predictingvaluesfortheresponsevariableforvaluesoftheexplanatoryvariablethatareoutsideoftherangeoftheoriginaldataiscalledextrapolation.

Page 55: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

CoefficientofDetermination

• Whiletheinterceptandslopehavemeaninginthecontextofyearanddiameter,rememberthatthecorrelationdoesnot.Itisjust0.604.• However,thesquareofthecorrelation(coefficientofdeterminationorr2)doeshavemeaning.• r2 =0.6042=0.365or36.5%• 36.5%ofthevariationinplatesize(theresponsevariable)canbeexplainedbyitslinearassociationwiththeyear(theexplanatoryvariable).

Page 56: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

LearningObjectivesforSection10.3• Understandthatonewayascatterplotcanbesummarizedisbyfittingthebest-fit(leastsquaresregression)line.• Beabletointerpretboththeslopeandinterceptofabest-fitlineinthecontextofthetwovariablesonthescatterplot.• Findthepredictedvalueoftheresponsevariableforagivenvalueoftheexplanatoryvariable.• Understandtheconceptofresidualandfindandinterprettheresidualforanobservationalunitgiventherawdataandtheequationofthebestfit(regression)line.• Understandtherelationshipbetweenresidualsandstrengthofassociationandthatthebest-fit(regression)linethisminimizesthesumofthesquaredresiduals.

Page 57: Stat 13, Intro. to Statistical Methods for the Life and ...Stat 13, Intro. to Statistical Methods for the Life and ... The chance of a monkey randomly typing 15 letters completely

LearningObjectivesforSection10.3• Findandinterpretthecoefficientofdetermination(r2)asthesquaredcorrelationandasthepercentoftotalvariationintheresponsevariablethatisaccountedforbythelinearassociationwiththeexplanatoryvariable.• Understandthatextrapolationiswhenaregressionlineisusedtopredictvaluesoutsideoftherangeofobservedvaluesfortheexplanatoryvariable.• Understandthatwhenslope=0meansnoassociation,slope<0meansnegativeassociation,slope>0meanspositiveassociation,andthatthesignoftheslopewillbethesameasthesignofthecorrelationcoefficient.• Understandthatinfluentialpointscansubstantiallychangetheequationofthebest-fitline.