© Prof. Andy Field, 2012 www.discoveringstatistics.com
DSUR Errata

Wilcox Functions

Wilcox’s website address has changed since the book was published. The latest versions of Wilcox’s functions for robust analysis are available by executing:
source("http://dornsife.usc.edu/assets/sites/239/docs/Rallfun-v26.txt")
Code changes/Package Updates.
• Chapter 4 (ggplot2): After the book was published Hadley Wickham updated ggplot2, and some of the syntax changed considerably (see http://docs.ggplot2.org/current/). Please let me know of anything that doesn’t work, but here are a few problems that I know about already.
• Line graphs not working: This is a bug introduced in ggplot2 0.9.3. There will likely be a fix soon (a version 0.9.3.1). In the meantime, a temporary fix can be found by executing the code below (I didn’t write this fix and it could create other problems; see footnote 1). See https://github.com/hadley/ggplot2/issues/732
install.packages("devtools")
library(devtools)
source_gist("https://gist.github.com/4578531")
• Page 155 (R’s Souls’ Tip 4.3): scale_fill_manual("Gender", c("Female" = "Blue", "Male" = "Green")) should be scale_fill_manual("Gender", values = c("Female" = "Blue", "Male" = "Green")). [Thanks Steffen Wild].
• The opts() function is deprecated and has been replaced by the theme() function. This has implications for anything in the chapter that uses opts(). There is a very good transition guide to help you transfer from opts() to theme() here. Needless to say I will have to update the chapter/code at some point. If you correct any code then please email it to me if you feel so inclined :-) To get rid of the legend use theme(legend.position = "none") instead of opts().
• P. 156: the factor() function has changed, so you’ll get an error using:
hiccups$Intervention_Factor<-factor(hiccups$Intervention, levels = hiccups$Intervention)
Instead, you need to execute this (to order the levels as they are in the book rather than alphabetically):
hiccups$Intervention_Factor<-factor(hiccups$Intervention, levels(hiccups$Intervention)[c(1, 4, 2, 3)])
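As a minimal sketch of the reordering idea, the snippet below uses a small made-up factor (the vector name intervention and its labels are illustrative assumptions, not the book’s hiccups data). It shows that factor() sorts levels alphabetically by default, and that indexing levels() picks them out in a chosen order.

```r
# Hypothetical stand-in for hiccups$Intervention (labels are illustrative only)
intervention <- factor(rep(c("Baseline", "Tongue", "Carotid", "Rectum"), times = 2))

# By default the levels are sorted alphabetically
levels(intervention)

# Re-create the factor, picking the existing levels in the desired order
reordered <- factor(intervention, levels = levels(intervention)[c(1, 4, 2, 3)])
levels(reordered)

# The data values themselves are unchanged; only the level order differs
all(as.character(reordered) == as.character(intervention))
```

The index vector c(1, 4, 2, 3) only makes sense relative to the alphabetical order, so it is worth printing levels() first to check which position each label holds.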
• P. 199 (R’s Souls’ Tip 5.4): the final command:
dlf$meanHygiene<-ifelse(dlf$daysMissing < 2, NA, rowMeans(cbind(dlf$day1, dlf$day2, dlf$day3), na.rm = TRUE))
should be (note the position of NA: it has moved to the end of the command):
dlf$meanHygiene<-ifelse(dlf$daysMissing < 2, rowMeans(cbind(dlf$day1, dlf$day2, dlf$day3), na.rm = TRUE), NA)
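To see why the argument order matters, here is a small sketch on made-up numbers (the scores below are hypothetical, and the way daysMissing is computed here is an assumption for illustration). ifelse() takes its arguments as test, value-if-true, value-if-false, so the row mean has to come second and NA third.

```r
# Hypothetical hygiene scores for four people (NA = missing day)
dlf <- data.frame(
  day1 = c(2.0, 1.5, 3.0, NA),
  day2 = c(1.0, NA, NA, NA),
  day3 = c(1.5, 2.5, NA, 1.0)
)

# Count how many of the three days are missing for each person
dlf$daysMissing <- rowSums(is.na(dlf[, c("day1", "day2", "day3")]))

# ifelse(test, yes, no): the mean is the 'yes' value, NA is the 'no' value
dlf$meanHygiene <- ifelse(
  dlf$daysMissing < 2,
  rowMeans(cbind(dlf$day1, dlf$day2, dlf$day3), na.rm = TRUE),
  NA
)
dlf
```

With the arguments in the book’s original order, the people with fewer than two missing days would receive NA and everyone else a mean, which is the opposite of what is intended.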
1 I have used this patch on three different machines (Macs) and had no issues at all. However, Isaac van Patten emailed to say that the patch had messed up his system. He said he:
“… had to delete R 2.15.2 altogether and reinstall it. What will work is to remove ggplot2 0.9.3 from the library and then go to the archives and load ggplot2 0.9.1 from the source code … using the older version it draws the graphs as needed.”
Like I said, it’s not my patch, so use it at your own risk. It works fine for me, but it can cause problems. See
• P. 216 (the cor() function): If you throw the examData dataframe into cor() you’ll get an error saying that x must be numeric. The problem is the gender variable, which is a non-numeric factor (Male and Female). The way around this is to either select only the first 4 variables of the dataframe (the numeric variables):
cor(examData[,1:4])
You could also convert Gender to a 0, 1 dummy coded variable, then run cor() on the whole dataframe (in which case correlations involving gender will be the point biserial correlations). In the code below as.numeric() converts the Gender variable to numbers, but R will use 1 and 2 by default, so the minus 1 changes these values to 0 and 1 as per dummy coding:
examData$Gender<-as.numeric(examData$Gender)-1
cor(examData)
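Both workarounds can be tried on a toy dataframe. This is only a sketch: the values are invented, and the stand-in has three numeric columns rather than the four in the real examData, so the column index is 1:3 here.

```r
# Hypothetical stand-in for examData: three numeric columns plus a factor
examData <- data.frame(
  Revise  = c(4, 11, 27, 53, 4, 22),
  Exam    = c(40, 65, 80, 80, 40, 70),
  Anxiety = c(86.3, 88.7, 70.2, 61.3, 89.5, 60.5),
  Gender  = factor(c("Male", "Female", "Male", "Male", "Female", "Female"))
)

# Option 1: correlate only the numeric columns
cor(examData[, 1:3])

# Option 2: dummy-code the factor, then correlate everything.
# as.numeric() returns the level codes (Female = 1, Male = 2, alphabetical),
# so subtracting 1 gives Female = 0, Male = 1.
examData$Gender <- as.numeric(examData$Gender) - 1
cor(examData)
```

Note that option 2 overwrites the factor, so the Male/Female labels are lost unless you store the dummy variable under a new name.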
• P. 226 (Section 6.5.7): bootTau<-function(liarData,i)cor(liarData… won’t run without a space before cor; there is a space in the book, but because of the typesetting that isn’t necessarily clear. It’s safer to bracket the function with {}, so you could write this function as (Thanks Jan Dittrich):
bootTau<- function(liarData,i){cor(liarData … etc. )}
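The bracketed form can be illustrated with a self-contained sketch. Everything here is hypothetical (the dataframe toyData, its columns x and y, and the function name tauStat); it does not reproduce the elided body of the book’s function, only the general shape of a statistic function that takes data and a vector of row indices.

```r
# Hypothetical data; the column names are illustrative, not the book's
toyData <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(2, 1, 4, 3, 6, 5, 8, 7)
)

# Braces make the start and end of the function body unambiguous
tauStat <- function(data, i) {
  cor(data$x[i], data$y[i], method = "kendall")
}

# Using every row once reproduces the ordinary sample statistic
tauStat(toyData, seq_len(nrow(toyData)))

# One bootstrap resample: draw row indices with replacement
set.seed(1)
i <- sample(nrow(toyData), replace = TRUE)
tauStat(toyData, i)
```

A function of this shape is what a bootstrap routine calls repeatedly, each time with a different set of resampled indices i.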
• P. 235 (section 6.6.2): a required dependency for the ggm package is no longer supported by CRAN: the graph package is no longer available. It is being maintained at Bioconductor.org but requires individual download and installation. It also requires some other dependencies from Bioconductor, BiocGenerics & RBGL, to be downloaded and installed in your library folder. To do this execute:
source("http://bioconductor.org/biocLite.R")
biocLite(c("BiocGenerics", "RBGL"))
install.packages("ggm")
library(ggm)
Once this is done, the material in Section 6.6.2 will work. Without it you cannot load the ggm package. One other thing that is not obvious is that the dataframe supplied to the var() argument must contain just the variables included in the partial correlation (e.g., it’ll choke if you forget to strip out the subject numbers!). [Thanks, Isaac T. Van Patten, Radford University and Jeff P.]
• P. 299: bootReg<-function(formula,data,indices)
o indices should be i to match the data[i,] two lines below. The code sample is correct, it’s just the book that’s wrong.
• P. 895: Growth models. If you use the file Honeymoon Period Restructured.sav everything will be fine. However, if you use the Honeymoon Period.dat file and restructure the data in R (using melt()) then you will get an error message resulting from the fact that the variable Time is treated as a factor rather than a numeric variable. In my code sample the data are prepared as follows:
satisfactionData = read.delim("Honeymoon Period.dat", header = TRUE)
restructuredData<-melt(satisfactionData, id = c("Person", "Gender"), measured = c("Satisfaction_Base", "Satisfaction_6_Months", "Satisfaction_12_Months", "Satisfaction_18_Months"))
names(restructuredData)<-c("Person", "Gender", "Time", "Life_Satisfaction")
restructuredData$Time<-as.numeric(restructuredData$Time)-1
However, in the book, I don’t talk about this in detail (because of space) and I really should have flagged the need for the final line because it converts time into a numeric variable. In fact, I also subtract 1 from the numeric values because the as.numeric() function will convert the Time factor into values of 1, 2, 3, 4 and I want them to be 0, 1, 2, 3 (because the baseline value of time is a meaningful zero point).
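The effect of that final line can be checked on a toy long-format dataframe. This is a sketch under assumptions: toyLong and timeLevels are hypothetical names, and the factor is built by hand rather than with melt(). The point it shows is that as.numeric() returns the level codes in level order, not anything parsed from the labels.

```r
# Hypothetical long-format data; the level order mirrors the four
# measurement occasions named in the melt() call
timeLevels <- c("Satisfaction_Base", "Satisfaction_6_Months",
                "Satisfaction_12_Months", "Satisfaction_18_Months")
toyLong <- data.frame(
  Person = rep(1:2, times = 4),
  Time   = factor(rep(timeLevels, each = 2), levels = timeLevels)
)

# as.numeric() on a factor returns the level codes 1, 2, 3, 4 ...
as.numeric(toyLong$Time)

# ... so subtracting 1 makes baseline the zero point: 0, 1, 2, 3
toyLong$Time <- as.numeric(toyLong$Time) - 1
toyLong
```

Because the codes follow the level order, it is worth confirming with levels() that the occasions are in chronological order before converting.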
Typos
• Page 14 line 11. 'j14the' >>> 'the'
• page 58, subsection Statistical power: "...as long as we know three of these properties" - shouldn't this mean "...two of these properties.."?
• Page 194: dlt should read dlf (thanks Bastian Wimmer).
• Page 212 (third variable problem): Reference to Jane Superbrain Box 1.1 should be 1.4.
• page 218: within the two last cor.test() functions there is a bracket too much after "less"
• page 291: The parentheses within the formula calculating the average leverage are wrong, it should be (k+1)/n rather than k+1/n.
• Page 299, line 3 and 5 from the bottom. advert >>> adverts; time. >>> time).
• Page 224, line 3 and 6 from the top, miss-typing. liarData = >>> liarData <-
• Page 329: Variable name Cured should say Intervention.
• Page 379, line 12 from the top. statistics). ------> statistics.
• Page 382, line 4 from paragraph 2. -40 and 47 ------> 40 and 47
• Page 388: Equation's equal sign is omitted.
• page 415, line 6: There should not be a double dot
• page 428, heading: "hoc" should be also in italic letters
• page 455, below: calculates.es does not exist, the name is compute.es ;-)
• page 472, "...robust version of ANOVA,..." should be rather "...robust version of ANCOVA,..."
• page 474, R's Souls' Tip: "...to the effects in the overall ANOVA..." should be also "...ANCOVA..."
• page 475, Jane Superbrain: First sentence: As far as I see type IV sums of squares has not been introduced
• page 476, last paragraph of Jane Superbrain: "...main choice in ANOVA designs is between Type II and Type III sums..." This contradicts Cramming Sam's tips on page 491 where in the third point it is written "you need to decide whether to use Type I or Type III sums of squares"
• page 482, R-code: should be "plot(viagraModel)" instead of "plots(viagraModel)"
• page 488, last paragraph: What is the small "x"? Should this be the capital "X" appearing in the sentence before?
• page 493, fourth R-code "mes(5.988117,...)". You have taken the wrong values here, these are not the mean and the adjusted mean but the values of the 95% confidence interval shown in output 11.4
• page 537, last R-code: This works (at least at my PC) only in case we have additionally specified "est=mom". Otherwise, only NA's are shown.
• page 538, Output 12.8: the right hand side of the output is completely missing
• page 543: "calculate.es" should rather be "compute.es"
• page 556, Figure 13.2: on the second level, SS_B respectively SS_W is one time above, one time below
• page 562 "General procedure...", point 4: "Depending on what you find in the previous step.." it is not the previous step but the step before that
• page 566, R-code (last line): Should this be in blue, or is it rather a part of output 13.1?
• page 579: I am not sure if the "hat Psi" symbol has been introduced yet
• page 595, first R-code: This should be named "drinkModel", the "baseline" has been already defined before
• page 661, output 15.2: I think we need the package "car" to perform the Levene's Test. This package has not been mentioned at the beginning of the chapter
• page 678, second last paragraph: "Output 15.8 shows that the Kruskal-Wallis test..." should be rather "...Shapiro-Wilk test..."
• page 689, last paragraph: "Friedman's ANOVA is significant..." should be replaced by "The Shapiro-Wilk Test is significant..."
• page 727, Figure 16.6: In the graph on the left hand side below: Should the outlier (26) be in blue?
• page 768, fourth last line: The name of the file is "raq.dat" instead of "RAQ.dat"
• page 778, "R's Souls' Tip": You should change the names "pc2" to "pc1", since these models are the same compared to the pc1 models above on this page. pc2 in contrast is defined on page 781 as the rerun of pc1 using only relevant factors.
• page 783, R-code after last sentence on this page: Should be in blue color and separated from the output.
• page 788, in the middle of the page: R-command "pc2" should be changed to "pc3"
• page 818, assumption 2 for the chi-square test: There are rules regarding frequencies >5 or <5 in the two first sentences, this excludes the case =5. So what happens if all frequencies are equal to 5?
• P. 818, line 3 from bottom, begins with "catData". Subsequently, when I refer to this data-frame (e.g., on p. 821, line 7 from bottom), I call it "catsData". I meant to call it "catsData" throughout. [Ronald Wyllys]
• page 839, R-codes: Here you suddenly use "=" instead of "<-" to define objects. A comment would be nice if "=" works always analogously to "<-"
• page 845, Figure 18.5: The title within the figure is wrong, it should be "Cats: Expected values"
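The page 291 correction in the list above is a matter of operator precedence, which a quick check in R makes concrete. The values of k and n below are arbitrary illustrative numbers, not taken from the book.

```r
# Illustrative values only: k predictors, n cases
k <- 3
n <- 200

# Average leverage as corrected: the sum is divided by n
(k + 1) / n

# Without brackets, division binds tighter than addition,
# so only the 1 is divided by n
k + 1 / n
```

With these numbers the bracketed form gives 0.02, while the unbracketed form gives 3.005.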
Thanks to everyone spotting mistakes, [email protected]