Indices - Amherst College


“K23166” — 2015/1/9 — 17:35 — page 255 — #281 ii


Appendix D

Indices

Separate indices are provided for subject (concept or task) and R command. References to the examples are denoted in italics.

D.1 Subject index

3-D
    histogram, 128
    plot, 130
95% confidence interval
    mean, 52
    proportion, 53

absolute value, 36
accelerated failure time model, 99
access
    Dropbox files, 6
    elements in R, 221
    files, 50
    variables, 11
add
    lines to plot, 146
    marginal rug plot, 147
    matrices, 39
    noise, 146
    normal density, 147
    straight line, 145
    text, 147
    variables, 13
age variable, 64, 239
agreement, 54
AIC, 86, 102
airline delays, 207
Akaike information criterion (AIC), 86, 102
alcohol abuse, 241
alcoholic drinks
    HELP dataset, 240
Allaire, J.J., xxii
altitude, 193
Amazon sales rank, 195
analysis of variance
    interaction plot, 130
    one-way, 70
    two-way, 70, 84
analytic power calculations, 58
and operator, 28
angular plot, 131
annotating datasets, 26
ANOVA
    interaction plot, 130
    one-way, 70
    tables, 102
Aotearoa (New Zealand), 211
API (application programming interface), 199, 200, 202
Apple R FAQ, 213
application programming interface (API), 199, 200, 202
arbitrary quantiles, 52
area under the curve, 132
ARIMA model, 98
arrays, 27, 46
    extract elements, 223
arrows, 148


ArXiv.org, 202
ASCII
    datasets, 5, 8
    encoding, 17
assertions, 47
assignment operators in R, 221
association plot, 131
attributable risk, 53
attributes
    R, 226
AUC (area under the curve), 132
Auckland, University of, 211
automated report generation, 63, 171
autoregressive model, 98
available datasets in R, 236
AvantGarde font, 150
average
    running, 188
average number of drinks
    HELP dataset, 240
axes
    labels, 151
    multiple, 127
    omit, 152
    range, 151
    style, 151
    values, 151

barchart
    error bars, 126
barplot, 123
baseline interview, 237
batch mode, 216
Bates, Douglas, 211
Bayesian
    external software, 174
    inference task view, 173, 176, 232
    information criterion, 102
    logistic regression, 175
    methods, 186
BCA intervals, 181
Beatles, 199
best linear unbiased predictors, 96
beta
    distribution, 33, 53
    function, 37
beta-binomial distribution, 33
beta-normal distribution, 33
bias corrected and accelerated, 182
bias-corrected and accelerated, 181
BIC, 102
big data, 2, 18, 207
    regression, 69
Bike ride, 193
binned scatterplot, 128
binomial distribution, 33
binomial family, 91
binomial probabilities
    tabulation, 188
bitmap image file, 153
bivariate
    loess, 94
    relationship, 60, 127, 128
Bland–Altman plot, 133
BMDP files, 3, 8
BMP export, 153
Bonferroni correction, 71
book website, xxii
Boolean
    operations, 16, 19, 28, 140
    R, 222
bootstrapping, 20, 181
box around plots, 150
boxplot, 125
    side-by-side, 113, 125
Bradley International Airport, 207
break lines, 202
Breslow estimator, 98
Breslow–Day test, 55
Breusch–Pagan test, 73
“broken stick” models, 97
bug reports, 236
byte code compiler, 231

c statistic, 91
calculate derivatives, 38
calculus, 38
calling functions from R, 226
capture output, 50
cartoon guide, 195
case
    sensitivity, 214
    statement, 14
categorical data, 30
    as predictor, 68
    from continuous, 13
    generation, 155
    parameterization, 68, 177
    plot, 131
    tables, 61
Cauchy
    distribution, 53


    link function, 91
causal inference, 177
censored data, 98, 133, 165
    simulate data, 158
Center for Epidemiologic Studies Depression (CESD) scale, 239
centering, 52
Central Limit Theorem, 161
CESD, 27
cesd variable, 27, 239
chained equation models, 183, 186
Chambers, John, 211
change working directory, 50
character translations, 17
character variable, see string variable
characteristics, test, 54
characters, plotting, 145
chemometrics task view, 232
chi-square
    distribution, 53
    statistic, 55
Cholesky decomposition, 96
choose function, 37
choropleth maps, 130, 193
circadian plot, 131
circular plot, 131
class methods, 226
class variable, 30
    creating, 68
    ordering of levels, 68
classification, 100, 119
cleaning data, 219
clinical trial, 237
    task view, 232
clinical trials, 186
clock
    system, 34
closest values, 187
closing a graphic device, 153
cluster analysis
    task view, 232
clustering
    hierarchical, 101
    task view, 100, 101
cocaine, 241
Cochran–Mantel–Haenszel test, 55
code completion, 211
code examples
    downloading, xxii
coding numbers, 7
coefficient
    of determination, 75
    of variation, 181
    regression, 73
coercing
    character variable from numeric, 15
    dataframes into matrices, 224
    date from character, 4
    factor variable from numeric, 14
    matrices into dataframes, 224
    numeric from character, 4
    string variable from numeric, 13
collinearity, 95
color
    palettes, 151
    selection, 151
column width, 25
combine matrices, 39
Comic Sans font, 150
comma-separated value (CSV) files, 2, 8
command history, 49
    R, 213
comments, 223
comparison
    floating-point variables, 38
    operators, 221
compiler, 231
complementary log-log link function, 91
complex fixed format files, 3, 190
    two lines, 196
complex numbers, 38
complex survey design, 101
component-wise matrix multiplication, 40
Comprehensive R archive network, 212
computational economics task view, 232
computational physics task view, 232
concatenate, 170
    datasets, 22
    matrices, 39
    strings, 15
conditional execution, 45
conditional logistic regression, 92
conditional logistic regression model, 91
conditional probability, 163
conditioning plot, 129, 135
confidence interval, 48
    for parameter estimates, 74
    for predicted observations, 132
    for the mean, 132
    proportion, 53
confidence level



    default, 48
confidence limits
    for individual (new) observations, 75
    for the mean, 74
    plotting, 74
conflicts, 224, 230
confounding, 177
constrained optimization, 208
contingency table, 55, 61
    plot, 131
contour plots, 130
contrasts, 68, 88
    Helmert, 68
    polynomial, 68
    SAS, 68
    treatment, 68
control flow, 45
control structures, 45, 217
control widgets, 205
controlling graph size, 149
controlling Type-I error rate, 71
convergence diagnosis for MCMC, 173, 174, 176
converting characters, 17
converting covariance to correlation matrix, 76
converting datasets
    long (tall) to wide format, 21
    wide to long (tall) format, 21
Cook’s distance, 72
cookies, 201
coordinate systems (maps), 192
corpus, 202
correlated data, 112
    generating, 157
    regression models, 96
    residuals, 96
correlation
    Kendall, 54
    matrix, 60, 76, 141
    Pearson, 54
    Spearman, 54
cosine function, 37
count models
    goodness of fit, 103
    negative binomial regression, 93, 107
    Poisson regression, 93, 105
    zero-inflated negative binomial, 94
    zero-inflated Poisson regression, 93, 106
Courier font, 150
courses
    swirl, 217
covariance matrix, 75, 76, 112
covariate imbalance, 177
Cowles, Kate, 173
Cox proportional hazards model, 98, 117
    frailty, 99
    proportionality test, 99
    simulate data, 158
    time-varying covariate, 100
CPU time, 49
Cramer’s V, 56
CRAN (Comprehensive R Archive Network), 212
CRAN task views, see task views
create
    ASCII datasets, 8
    categorical variable from continuous, 13
    categorical variable using logic, 14
    CSV (comma-separated value) files, 8
    dataset from counts, 53
    datasets for other packages, 8
    date variable, 23
    factors, 68
    files for other packages, 8
    functions, 48
    lagged variable, 17
    matrix, 39
    numeric variable from string, 15
    observation number, 20
    recode categorical variable, 14
    string variable from numeric, 13
    time variables, 24
Cronbach’s α, 100, 117
cross-classification table, 29, 55
crosstabs, 55, 61
CSV (comma-separated value) files, 2, 8
cumulative
    density function, 33
    hazard, 99
    hazard plots, 133
    product, 189
    sum, 189
curated guide to learning R, 217
Curran, James, 98
curve plotting, 131
custom graphic layouts, 149

Dalgaard, Peter, 211



dashed line, 151
data
    display, 12
    entry, 7
    generation, 45
    input, 25
    mining, 202
    scraping, 195
Data Expo 2009, 207
data input, 1
    two lines, 196
data step
    repeat steps for a set of variables, 46
data structures in R, 220
data technologies, 9
data viewer, 211
database system, 18, 69, 207
dataframes, 221
    comparison with column bind, 224
    comparison with matrix, 224
    detaching, 11
    R, 223
    remove from workspace, 224
dataset
    comments, 12
    from counts, 53
    HELP study, 239
    in book, xxii
    other packages, 3
    R, 236
date and time variables
    create date, 23
    create time, 24
    extract month, 24
    extract quarter, 24
    extract weekday, 24
    extract year, 24
    reading, 3
dayslink variable, 66, 239
DBF files, 3, 8
debugging, 47
    RStudio, 47
decimal representation, 38
decomposition
    singular value, 41
Deducer, 213
default confidence level, 48
defining functions, 48
delete objects, 221
density
    estimation, 124, 128
    overlapping, 126
    plot, 60, 65, 124, 128
density functions, 33
    generate random, 33
    probability, 33
    quantiles, 33
dependency management, 231
depressive symptoms, 27
derivatives, 38
derived variable, 13, 27, 28
design matrix, 75, 87
    specification, 68, 177
design of experiments task view, 232
design weights, 101
detach
    dataframes, 11, 83, 224
    packages, 11, 109, 225
determinant, 41
detoxification program, 237, 239
deviance
    tables, 102
DFFITs, 73
diagnostic
    agreement, 132
    plots, 73
    tests, 73
diagnostic agreement, 54
    ROC curve, 138
diagnostic plots, 82
diagnostics from linear regression, 81
diagonal elements, 40, 41
difference in log-likelihoods, 102
difference in sets, 16
differential equations
    task view, 232
dimension, 40
diploma problem, 162
directory delimiter, 1
directory structure, 1
dispersion parameter, 107
display missing categories, 55
displaying
    data, 12, 26
    model results, 7
    objects, 226
    scientific notation, 12
distance metric, 16
distribution
    beta, 53
    Cauchy, 53
    chi-squared, 53



    empirical probability density plot, 125
    exponential, 53
    F, 53
    gamma, 53
    geometric, 53
    logistic, 53
    lognormal, 53
    negative binomial, 53
    normal, 33, 53
    parameters, 53
    Poisson, 53
    probability, 33
    q-q plot, 131
    quantile, 33
    quantile–quantile plot, 131
    stem plot, 124
    t, 53
    Weibull, 53
divert output, 50
DocBook document type definition, 9
document mining, 202
document term matrix, 203
document type definition, 9
documentation
    R, 216
dotplot, 124
downloading
    code examples, xxii
dplyr, see library(dplyr) in R index
drinks of alcohol
    HELP dataset, 240
drinkstat variable, 28
Dropbox, 6
dropping variables, 19
drugrisk variable, 141, 239
DTD, 9
duplicated values, 20
dynamic
    web applications, 205, 211
dynamic graphics task view, 232
dynamite plot, 126

ecological data task view, 232
econometrics task view, 232
edit distance, 16
editing data, 7
efficiency
    vector operations, 45
Efron, Bradley, 202
Efron estimator, 98
eigenvalues and eigenvectors, 41
elapsed time, 24
else statement, 217
empirical
    density plot, 65
    estimation, 162
    finance task view, 232
    power calculations, 169
    probability density plot, 125
    variance, 97, 115
encoding
    ASCII, 17
entering data, 7
environment, 226, 230
environmental task view, 232
Epi Info files, 3
equal variance test, 57
error bars
    bar chart, 126
error recovery, 47
etiquette
    R, 236
evaluate integrals, 38
Evans, Michael, 159
exact
    confidence intervals, 53
    logistic regression model, 92
    test of proportions, 56
example code
    downloading, xxii
    R, 215
Excel
    creating, 8
    reading, 2
excess
    kurtosis, 52
    zeroes, 93, 94
exchangeable working correlation, 97
execution
    conditional, 45
    in operating system, 49
    profiling, 47
expansion
    wildcard, 50
expected cell counts, 63
expected values, 162
experimental design task view, 232
exponential
    distribution, 53
    random variables, 36, 161, 165
    scientific notation, 12



exponentiation, 36
export
    BMP, 153
    datasets for other packages, 8
    Excel, 8
    graphs, 152
    JPEG, 153
    PDF, 152
    PNG, 153
    postscript, 152
    TIFF, 153
    WMF (Windows metafile format), 153
expressions
    R, 221
extensible markup language (XML), 6, 9, 202
extract characters from string, 15
extract from objects, 54, 223

F distribution, 53
f1 variables, 27, 28, 117, 239
factor
    analysis, 100, 118
    levels, 68, 177
    reordering, 68
    variable, 30, 68
factor object, 68
factorial function, 37
failure time data, 98
Falcon, Seth, 211
false discovery rate correction, 71
false positive, 132
family
    binomial, 91
    Gamma, 91
    Gaussian, 91
    inverse Gaussian, 91
    Poisson, 91
FAQ
    Apple R, 213
    R, 217, 236
    Windows R, 212
female variable, 28, 240
Fibonacci sequence, 189
file
    browsing, 50
    temporary, 50
    variable format, 4
filtering, 19
finance task view, 232
find
    approximate string, 16
    closest values, 187
    string within a string, 16
    working directory, 49
finite mixture models
    task view, 186, 232
finite population correction, 101
Fisher’s exact test, 56, 61
fit model separately by group, 83
fixed format files, 1
fixed width files, 2, 3
flight delays, 207
floating-point representation, 38
follow-up interviews, 237
fonts in graphics, 150
footnotes, 147
for statement, 217
foreign format, 26
formatted
    data, 8
    model results, 7
    output, 171
    variables, 18
formula object, 55, 67
forward stagewise regression, 103
Foundation for Statistical Computing
    R, 211
Fox, John, 100
fraction of missing information, 185
frailty model, 99
frequently asked questions
    see FAQ, 217
Friedman’s super smoother, 146
functions, 48
    plotting, 131
    R, 48, 226
fuzzy search, 16

G-rho family of Harrington and Fleming, 65
g1b variable, 135, 240
g1btv variable, 110, 112, 115
GAM, 94
Gamma
    distribution, 53
    family, 91
    function, 37, 159
    gamma distribution, 33
    regression, 91
Gaussian



    distribution, 33
    family, 91
Gelman, Andrew, 159, 160
gender variable, 30, 240
general linear model for correlated data, 96, 112
generalized additive model, 94, 109
generalized estimating equation, 115
    exchangeable working correlation, 97
    independence working correlation, 97
    unstructured working correlation, 97
generalized linear mixed model, 97, 116
generalized linear model, 91, 104
    big data, 69
generalized logit model, 93, 108
generalized multinomial model, 93
generate
    arbitrary random variables, 36
    categorical data, 155
    correlated binary variables, 157
    Cox model, 158
    dataset from counts, 53
    exponential random variables, 36
    generalized linear model random effects, 156
    grid of values, 47
    logistic regression, 156
    multinomial random variables, 35
    multivariate normal random variables, 35
    normal random variables, 35
    other random variables, 36
    pattern of repeated values, 46
    predicted values, 72
    random variables, 33
    residuals, 72
    sequence of values, 46
    truncated normal random variables, 36
    uniform random variables, 34
genetics task view, 232
genf variable, 84
Gentleman, Robert, 211
geometric distribution, 33, 53
getting
    and cleaning data, 219
    help in R, 236
ggplot2, see library(ggplot2) in R index
GitHub, 211, 230
goodness of fit, 103, 106
    ROC curve, 138
Google Maps, 193
GPS coordinates, 193
graduation, 162
grammar of graphics, 193
graphical layouts, 149
graphical models task view, 232
graphical reporting, 186
graphical settings, 150
graphical user interface
    deducer, 213
    R, 213
    RStudio, 211
graphics
    boxplot, 125
    choropleth, 193
    exporting, 152
    side-by-side boxplots, 125
    size, 149
    task view, 123, 232
greater than operator, 28
grid
    graphics, 232
    of values, 47
    rectangular, 148
    search, 208
grouping variable
    linear model, 168
    summary statistics, 167
growth curve models, 97
Gruen, Bettina, 186
guide to packages
    R, 231
guidelines
    R-help postings, 236

Hadoop, 19
hanging rootogram, 103
Harrell, Frank, 76, 126, 186, 229
Harrington and Fleming G-rho family, 65
harvesting data, 195
hat matrix, 72
hat-check problem, 162
hazard plots, 133
Health Evaluation and Linkage to Primary Care (HELP) study, 237
health survey
    SF-36, 240
Helmert contrasts, 68
HELP study



    clinic, 241
    dataset, 239
    introduction, 237
    results, 237
help system
    other resources, 236
    R, 215, 216
    R packages, 231
Helvetica font, 150
heroin, 241
Hesterberg, Tim, 161
heteroscedasticity test, 73
hierarchical clustering, 101, 121
high-performance computing task view, 232
histogram, 124
    comparing, 125
history
    of commands, 49, 213
    R, 211
Hochberg correction, 71
Holm correction, 71
homeless variable, 61, 104, 240
homelessness, 239
homogeneity of odds ratio, 55
honest significant difference, 71, 87
Hornik, Kurt, 211
Hosmer–Lemeshow test, 103
hospitalization, 239
Hotelling’s t, 98
HSD (honest significant difference), 87
HTML files, 8
    harvesting data, 195
    reproducible output, 172
    table, 6, 198
HTTP/HTTPS, 5, 197
Huber variance, 115
hypergeometric distribution, 33
hypertext markup language format (HTML), 8
hypertext transport protocol (HTTP), 5

i1 variable, 28, 105, 240
i2 variable, 28, 240
Iacus, Stefano, 211
id number, 20
id variable, 240
identifying points, 148
identity link function, 91
if statement, 19, 45, 217
Ihaka, Ross, 211
ill-conditioned problems, 95
image plot, 130
imaginary numbers, 38
imaging task view, 232
import data, 3
imputation, 183
in statement, 217
income inequality, 94
incomplete data, 182, 183
independence working correlation, 97
indexing, 191
    in R, 27
    lists, 222
    matrix, 40
    vector, 221
indicator variable, 68, 177
individual level data, 53
indtot variable, 104, 135, 240
InDUC (Inventory of Drug Use Consequences), 240
infinite values, 182
influence, 72
information criterion (AIC), 86
information matrix, 75
inner join, 23
installing
    packages in R, 229
    R, 212
    RStudio, 213
integer
    functions, 37
    problems, 210
integration, 38
interaction, 69
    linear regression, 77
    plot, 84, 130
    testing, 85
    two-way ANOVA, 84
interactive
    courses in swirl, 217
    visualization, 203
    web applications, 205
intercept
    no, 69
intersection, 16
interval censored data, 133
introduction
    R, 211, 216
    RStudio, 211
invalid locale, 5



Inventory of Drug Use Consequences, see indtot variable
inverse
    Gaussian family, 91
    link function, 91
    matrix, 40
    probability integral transform, 36
iterative proportional fitting, 93

JAGS, 174
JavaScript Object Notation (JSON) format, 6
jitter points, 146
joining datasets, 22
joins, 19
JPEG export, 153
JSON format, 6

Kaplan, Danny, xxii, 131
Kaplan–Meier plot, 133, 137
Kappa, 54
keeping variables, 19
Kendall correlation, 54
kernel smoother plot, 124, 128
knapsack problem, 208
knitr, 171
Knuth, Donald, 171
Kolmogorov–Smirnov test, 57, 64
Kruskal–Wallis test, 57
kurtosis, 52

L1-constrained fitting, 102
labels for variables, 12
Laplace distribution, 33
large data, 2, 18, 207
large sample assumption, 161
lasso method, 102
latent class analysis, 101
LaTeX output, 171
    R, 80
Lavine, Michael, 160
Lawrence, Michael, 211
learning R, 217
least absolute shrinkage and selection operator, 102
least angle regression, 103
least squares
    linear, 67
    nonlinear, 94
legend, 42, 148
Leisch, Friedrich, 171, 186, 211
length
    of string, 15
    of vector, 40
less than operator, 28
Levene’s test for equal variances, 57
Levenshtein edit distance, 16
leverage, 72
library
    help, 231
    R, 229
Ligges, Uwe, 211
likelihood ratio test, 85, 102
line
    on plot, 146
    style, 151
    types, 151
    width, 151
line wrap, 202
linear combinations of parameters, 71
linear discriminant analysis, 100, 120
linear models, 67
    big data, 69
    by grouping variable, 168
    categorical predictor, 68
    diagnostic plots, 73
    diagnostic tests, 73
    diagnostics, 81
    generalized, 91
    interaction, 69, 77
    no intercept, 69
    parameterization, 68, 177
    R object, 67
    residuals, 72
        standardized, 72
        studentized, 72
    standardized residuals, 72
    stratified analysis, 168
    studentized residuals, 72
    test for heteroscedasticity, 73
linear programming, 210
link function
    cauchit, 91
    cloglog, 91
    identity, 91
    inverse, 91
    log, 91
    logit, 91
    probit, 91
    square root, 91
linkage to primary care, 239
linkstatus variable, 66, 240



Linux installation
    R, 212
Lipsitz, Stuart, 157
list files, 50
lists, 222
    extract elements, 54, 223
literate programming, 171
Little, Roderick, 183
Liverpool, England, 198
local polynomial regression, 146
locating points, 148
loess
    bivariate, 94
log
    base 10, 36
    base 2, 36
    base e, 36
    link function, 91
log file
    R, 49
log scale, 152
log-likelihood, 102
log-linear model, 93
log-normal distribution, 33
logic, 14
logical expressions, 13, 14
logical operator, 13, 221
logistic
    distribution, 53
    generalized, 108
logistic regression, 91, 104
    Bayesian, 175
    c statistic, 91
    generating, 156
    goodness of fit, 103
    Nagelkerke R², 91
    ROC curve, 138
logit link function, 91
lognormal
    distribution, 33, 53
    regression, 91
logrank test, 58, 65
long (tall) to wide format conversion, 21
longitudinal regression, 96
    reshaping datasets, 110
looping, 45
lower to upper case conversions, 17
lowess, 94, 109, 146
lubridate, see library(lubridate) in R index
Lucida font, 150
Lumley, Thomas, 101, 211

M estimation, 95
machine learning
    task view, 100, 232
machine precision, 38
Macintosh R FAQ, 213
macros, 48
MAD regression, 95
Maechler, Martin, 211
mailing list
    R-help, 236
make variables available, 11
manipulate string variables, 15–17
    remove spaces, 17
    split, 17
MANOVA, 98
Mantel–Haenszel test, 55
maps
    choropleth, 130, 193
    coordinate systems, 192
    Google Maps, 193
    plotting, 190
margin specification, 150
marginal
    histograms, 135
    plot, 147
Markdown, 8, 171
    in Shiny, 205
Markov Chain Monte Carlo, 92, 159, 173, 176
Masarotto, Guido, 211
masking, 224, 230
matching, 177
mathematical constants, 37
mathematical expressions, 42, 148
mathematical functions
    absolute value, 36
    beta, 37
    choose, 37
    exponential, 36
    factorial, 37
    Fibonacci sequence, 189
    gamma, 37
    integer functions, 37
    log, 36
    maximum value, 36
    mean value, 36
    minimum value, 36
    modulus, 36
    natural log, 36



    permute, 37
    square root, 36
    standard deviation, 36
    sum, 36
    trigonometric functions, 37
mathematical symbols
    adding, 148
mathematics task view, 39, 232
matrix
    addition, 39
    combine, 39
    component-wise multiplication, 40
    concatenate, 39
    correlation, 76
    covariance, 75, 76
    creation, 39
    design, 75
    dimension, 40
    document term, 203
    extract elements, 223
    graphs, 73
    hat, 72
    indexing, 40, 223
    information, 75
    inverse, 40
    large, 39
    multiplication, 35, 40, 75, 222
    overview, 39
    plots, 129
    R, 223
    structured, 7
    transposition, 40
maximum likelihood estimation, 53
maximum number of drinks
    HELP dataset, 240
maximum value, 36
MCMC, 92, 159, 173, 176
McNemar’s test, 56
mcs variable, 60, 240
mean, 36, 51, 52
    by group, 167
    trimmed, 52
    weighted, 51
mean–difference plot, 133
median regression, 95
medical imaging task view, 232
medical problems, 239
memory usage, 47
merging datasets, 22
meta analysis task view, 232
metadata, 226
methods, 226, 232
metric for distance, 16
Metropolis–Hastings algorithm, 159
MICE (chained equations), 183
Microsoft rtf format, 152
Microsoft Word format, 152, 171, 172
minimum absolute deviation regression, 95
minimum value, 36
mining
    text, 202
Minitab files, 3
missing data, 27, 182, 183, 186
    tables, 55
missing information fraction, 185
missing values
    recoding, 183
mixed model, 96
    generating, 156
    logistic, 97
    logistic regression, 116
mode of storage, 226
model
    comparisons, 86, 102
    diagnostics, 81
    selection, 86, 102
    specification, 69, 77
modeling language, 55, 67, 167
modulus, 36
moments, 52
Mongo databases, 19
month variable, 24
Monty Hall problem, 163
Morgan, Martin, 211
mosaic plot, 131
Mosteller, Fred, 162
motivational interview, 237
movies in Liverpool, 198
moving average model, 98
Mplus, 101
multicollinearity, 95
multilevel models, 97
multinomial model
    generalized, 93
    logit, 108
    nominal outcome, 93
    ordered outcome, 92
multinomial random variable, 35
multiple comparisons, 71, 87
multiple imputation, 183, 186
multiple plots per page, 149



multiple y axes, 127, 134
multiplication
    matrix, 35, 40
multivariate statistics
    task view, 100, 232
multivariate test, 98
multiway tables, 55
Murdoch, Duncan, 211
Murrell, Paul, 9, 123, 134, 211

Nagelkerke R² for logistic regression, 91
name conflicts, 224, 230
named arguments in R, 48, 227
named lists, 222
names and variable types, 11
native data files, 8
native files, 1
natural language processing, 202
    task view, 203
natural language processing task view, 232
negative binomial distribution, 53
negative binomial regression, 93, 107
    zero-inflated, 94
negative-binomial distribution, 33
Nelson–Aalen estimate, 99
nested models, 91
nested quotes, 12
New Century Schoolbook font, 150
new users
    R, 216
New Zealand (Aotearoa), 211
next statement, 217
NIAAA, 237
NIDA, 237
NLP optimization, 39
no intercept, 69
noise
    add to points, 146
non-ASCII, 5
non-randomized studies, 177
nonlinear least squares, 94
nonparametric tests, 57, 64
normal density, 147
normal distribution, 33, 42, 52, 53
normal random variables, 35
    truncated, 36
normality testing, 56
normalizing, 52
    constant, 159
    residuals from linear model, 72
    residuals from mixed model, 96
not operator, 182
notched boxplot, 125
NP completeness, 208
number coding, 7
number of digits to display, 7
numeric from string, 15
numerical mathematics task view, 232

object-oriented programming, 226
objects
    displaying, 226
    R, 220, 221
    remove, 221
observation number, 20
observational studies, 177
Octave files, 3
ODBC, 19
odds ratio, 53, 62
    homogeneity, 55
official statistics, 101
    task view, 101, 232
Omegahat, 6, 230
omit axes, 152
one-way analysis of variance, 70
open-source, xxiii
OpenBUGS, 174
operating system
    change working directory, 50
    execute command, 49
    find working directory, 49
    list files, 50
    pause execution, 49
    temporary files, 50
optimization, 39
    task view, 39, 232
    with constraints, 208
options
    R, 226
    scientific notation, 12
OR (odds ratio), 53
or operator, 28, 221
order statistics, 51
ordered factor, 68
ordered logistic model, 92, 108
ordered multinomial model, 92
ordering of levels, 68
ordinal logit, 92, 108
orientation
    axis labels, 151
    boxplot, 125



outer join, 23
output file formats
    R, 171
overdispersion, 91
overplotting, 128

packages
    conflicts, 230
    detaching, 11
    help, 231
    R, 229
    remove from workspace, 225
Packrat projects, 231
page, multiple plots per, 149
pairs plot, 138
pairwise differences, 87
Palatino font, 150
palettes of colors, 151
Pandoc, 152, 171
Parade magazine, 163
parallel
    boxplots, 113, 125
    computation, 232
    computing task view, 232
    processing, 228
parameter estimates
    confidence interval, 74
    standard errors, 74
    univariate distribution, 53
    used as data, 73
parameterization of categorical variable, 68, 177
    reference category, 87
Parel, Daniel, xxii
partial file read, 1
pathological distribution
    sampling, 159
pause execution for a time interval, 49
pcs variable, 60, 240
pdf output
    creating, 171, 172
    exporting, 152
peakedness, 52
Pearson correlation, 54
Pearson’s χ² test, 55, 61, 103
percentiles
    probability density function, 33
Perl
    interface, 18
    modules, 8
permutation test, 57, 64
permute function, 37
permuted sample, 20
pharmacokinetic task view, 232
phi coefficient, 56
phylogenetics
    task view, 232
Pi (π), 37
Pioneer Valley, 193
pipe operator, 21, 111, 228
plot
    adding arrows, 148
    adding footnotes, 147
    adding polygons, 148
    adding shapes, 148
    adding text, 147
    arbitrary function, 131
    characters, 145
    conditioning, 129
    curve, 131
    limits, 76
    maps, 190
    predicted lines, 132
    predicted values, 132
    regression diagnostics, 73
    rotating text, 147
    symbols, 145
    time series data, 197
    titles, 147
Plummer, Martyn, 211
PNG export, 153
point size specification, 150
points, 146
    locating, 148
Poisson distribution, 53
Poisson family, 91
Poisson regression, 91, 93, 105
    Bayesian, 176
    zero-inflated, 93, 106
polygons, 148
polynomial contrasts, 68
polynomial regression, 94
posterior probability, 173, 176
posting guide (R-help), 236
postscript, 150, 152
power calculations
    analytic, 58
    empirical, 169
practical extraction and report language (Perl), 8, 18
predicted values, 71
    generating from linear model, 72



preprints, 202
presentations in RStudio, 172
primary care
    linkage, 239
    visits, 240
primary sampling unit, 101
primary substance of abuse, 241
printing model results, 7
prior distribution, 173, 176
probability density, 33, 125
probability distributions, 42
    parameter estimation, 53
    quantiles, 33
    random variables, 33
    simulation, 155, 162
    task view, 33, 232
probability integral transform, 36
probit link function, 91
probit regression, 91
productivity, xxi
profiling of execution, 47
programming, 45
projection, 192
projects, 211
propensity scores, 177
proportion, 53
proportional hazards model, 98, 117
    frailty, 99
    proportionality test, 99
    simulate data, 158
    time-varying covariate, 100
proportional odds model, 92, 108
proportionality test, 99
Pruim, Randall, xxii, 126, 131
pseudo R², 91
pseudo-random number
    generation, 33
    set seed, 34
pss_fr variable, 141, 240
psychometrics, 100, 117
    task view, 100, 232
punctuation, 203

QQ plot, 82, 131
quadratic growth curve models, 97
quantile regression, 95, 107
quantile–quantile plot, 131
quantile-quantile plot, 82
quantiles, 52
    probability density function, 33
    t distribution, 48
quarter variable, 24
quasi-complete separation, 176
quitting R, 215
quotes, nested, 12

R
    available datasets, 236
    bug reports, 236
    command history, 213
    data structures, 220
    detach packages, 109
    Development Core Team, 211
    exiting, 214
    export SAS dataset, 8
    FAQ, 217, 236
    Foundation for Statistical Computing, 211
    graphical user interface, 213
    help system, 215, 216
    history, 211
    installation, 212
    introduction, 211
    libraries, 229
    Linux installation, 212
    Markdown, 8, 172
    Markdown in Shiny, 205
    objects, 221
    packages, 229, 231
    programming, 219
    Project, 236
    questions, 200, 217
    R Commander, 213
    R-help mailing list, 236
    reading SAS files, 3
    resources for new users, 216
    sample session, 214
    starting, 214
    support, 236
    task views, 231
    warranty, 215
    Windows installation, 212
R²
    linear regression, 75
    logistic regression, 91
R-help mailing list, 236
ragged data, 190
rail trails, 193
random coefficient model, 96, 97
random effects model, 96, 113
    estimate, 96
    generating, 156



random intercept model, 96
random number
    seed, 34, 189
random slopes model, 96
random variables
    density, 33
    generate, 33
    generation, 33
    probability, 33
    quantiles, 33
randomization group, 241
randomized clinical trial, 237
range
    axes, 151
rank sum test, 57
reading
    bytes, 5
    comma-separated value (CSV) files, 2
    data, 25
    data with two lines per obs, 196
    dates, 3
    fixed format files, 1
    HTML table, 6, 198
    HTTP from URL, 5
    long lines, 3
    more complex fixed format files, 3
    native format files, 1
    other files, 2
    other packages, 3
    R into SAS, 2
    R objects, 1
    SAS into R, 3
    spreadsheets, 2
    variable format files, 4, 190
    XML files, 6
receiver operating characteristic curve, 132, 138
recoding
    missing values, 183
    variables, 13, 14
recover from error, 47
rectangular grid, 148
recursive partitioning, 100, 119
redirect output, 50
reference category, 68, 87, 177
regression
    big data, 69
    categorical predictor, 68
    coefficients, 73
    diagnostic tests, 73
    diagnostics, 71, 73, 81
    forward stagewise, 103
    Gamma, 91
    interaction, 69, 77
    least angle, 103
    linear, 46, 67
    logistic, 91
    lognormal, 91
    no intercept, 69
    overdispersed binomial, 91
    overdispersed Poisson, 91
    parameterization, 68, 177
    Poisson, 91
    probit, 91
    residuals, 72
    standardized coefficients, 73
    standardized residuals, 72
    stratified analysis, 168
    studentized residuals, 72
    test for heteroscedasticity, 73
regular expressions, 16, 17, 19
rejection sampling, 159
relative risk, 53
reliability measures, 100, 117
remove
    dataframe from workspace, 224
    numbers, 203
    objects, 221
    package from workspace, 225
    punctuation, 203
    spaces from a string, 17
    whitespace, 203
rename variables, 13
repeat statement, 45, 217
replace a string within a string, 17
replicable variates, 34
replicating examples from the book, 215
report generation, 8, 63, 171
repository of preprints, 202
reproducible analysis, 8, 63, 186, 211
    knitr, 171
    packages, 231
    random numbers, 34
    rich text format, 152
    Statweave, 171
    tangle, 171
    task view, 171, 232
    weave, 171
resampling-based inference, 181
reserved commands, 217
reshaping datasets, 21, 110



residuals, 72analysis, 81correlated, 96plots, 82standardized, 72studentized, 72

results from HELP study, 237rich text format (rtf), 152ridge regression, 95right censored data, 133Ripley, Brian, 211Risk Assessment Battery, 239robust statistical methods

empirical variance, 97, 115regression, 95task view, 95, 232

ROC curve, 132, 138
RODBC, 69
Rosenthal, Jeffrey, 159
rotating
    axis labels, 151
    text, 147
round results, 25, 37
RR (relative risk), 53
RSeek, 217
RStudio, xxi, xxii, 211
    curated guide to learning R, 217
    exporting graphs, 152
    installation, 213
    Packrat projects, 231
    presentations, 172
    reproducible analysis, 172
RTF (rich text format), 152
Rubin, Donald, 183
rug plot, 147
running a script, 216
running average, 188

sales rank, 195
Samet, Jeffrey, 237
sample size calculations
    analytic, 58
    empirical, 169
sampling
    challenging distribution, 159
    dataset, 20
sampling distribution, 161
sandwich variance, 97, 115
Sarkar, Deepayan, 123, 134, 186, 211
SAS
    files from R, 3
saving
    data, 26
    graphs, 152
    R history, 213
scale
    log, 152
scaling, 52
scatterplot, 61, 76, 127
    binned, 128
    lines, 146
    marginal histograms, 129, 135
    matrix, 129
    multiple y values, 127
    points, 146
    separate plotting characters per group, 145
    smoother, 76, 146

Schoenfeld residuals, 99
Schwarte, Heiner, 211
scientific notation, 12
scraping data, 195
script file, 215, 216
search for approximate string, 16
seed, random number, 34, 161
sensitivity, 54, 132
separate model fitting by group, 83
separate plotting characters per group, 145
server version, 211
session information, 224
set names, 18
set operations, 16
settings, graphical, 150
sexrisk variable, 104, 108, 241
SF-36 short form health survey, 240
shapes, 148
Shiny, 205, 211
short form (SF) health survey, 240
shrinkage method, lasso, 102
side-by-side boxplots, 113, 125
sideways orientation
    boxplot, 125
significance stars in R, 67, 77
simulate
    categorical data, 155
    Cox model, 158
    generalized linear model random effects, 156
    linear regression, 46
    logistic regression, 156
    power calculations, 169



simulation studies, 156
sine function, 37
singular value decomposition, 41
sink output, 50
size of graph, 149
skewness, 52
slides in RStudio, 172
Smith College, 162
smoothing spline, 76, 94, 109, 124, 128, 146
social sciences
    task view, 67, 76, 91, 103, 232
social supports, 240
SOCR (Statistics Online Computational Resource), 213
solve optimization problems, 39
sorting, 22, 31
sourcing commands, 215
sparse matrices, 39
spatial statistics
    choropleth, 193
    task view, 103, 192, 232
spatio-temporal data
    task view, 232
Spearman correlation, 54
specificity, 54, 132
specifying
    box around plots, 150
    color, 151
    design matrix, 68, 177
    margin, 150
    point size, 150
    text size, 150

splines, 232
split string, 17
spreadsheet, 2, 7
SPSS files, 3, 8
SQL, 18, 207
square root, 36
    link function, 91
stack exchange, 200
stack overflow, 217
stagewise regression, 103
standard deviation, 36, 51
standard error, 47
standardized regression coefficients, 73
standardized residuals, 72
    mixed model, 96
Stata files, 3, 8
statistical genetics task view, 232
statistical learning task view, 232
Statistics Online Computational Resource (SOCR), 213
status codes, 201
stem plot, 124
stop words, 203
storage mode, 226
straight line
    adding, 145
stratification, 101
stratified analysis, 83, 168
string variable
    concatenating strings, 15
    extract characters, 15
    find a string, 16
    find approximate string, 16
    from numeric variable, 13
    length, 15
    remove spaces, 17
    replace a string, 17

structural equation modeling
    latent class analysis, 101
structured matrices, 7
structured query language (SQL), 18, 207
Student’s t-test, 56, 161
studentized residuals, 72
styles
    axes, 151
    line, 151
sub variable, 76, 84
submatrix, 40
subsetting, 19, 29, 31
substance abuse treatment, 240
substance of abuse, 241
substance variable, 61, 241
sum, 36
summary statistics, 59
    mean, 51
    separately by group, 31, 167
    weighted mean, 51
sums of squares
    cross products, 75
    Type III, 77, 102
support, 236
survey methodology, 101
    task view, 101, 232
    weighted mean, 51
survival analysis, 98, 165
    accelerated failure time model, 99
    Cox model, 117
    frailty, 99



    Kaplan–Meier plot, 133, 137
    logrank test, 58, 65
    proportional hazards model, 98, 99
    simulate data, 158
    task view, 98, 133, 232
suspend execution for a time interval, 49
Sweave, 8, 171
sweep operator, 52
swirl interactive courses, 217
symbolic numbers, 7
symbols
    mathematical, 148
    plot, 145
syntax highlighting, 211
Systat files, 3
system clock, 34

t distribution, 42, 53
    quantile, 48
t-test, 56
t-test, 64, 161
table
    cross-classification, 55
    reading HTML, 6, 198

tabulate binomial probabilities, 188
tagged image file format, 153
tangent function, 37
tangle, 171, 172
task view, 231
    analysis of spatial data, 103
    Bayesian inference, 173, 176
    clustering, 100, 101
    finite mixture models, 186
    graphics, 123
    machine learning, 100
    multivariate statistics, 100
    natural language processing, 203
    official statistics, 101
    optimization and mathematical programming, 39
    probability distributions, 33
    psychometrics, 100
    reproducible analysis, 171
    robust statistical methods, 95
    social sciences, 67, 76, 91, 103
    spatial statistics, 192
    survival analysis, 98, 133
    time series, 98

Temple Lang, Duncan, 211
temporal data
    task view, 232
temporary files, 50
test
    characteristics, 54
    heteroscedasticity, 73
    interaction, 85
    joint null hypotheses, 70
    normality, 56
    proportionality, 99
text
    adding, 147
    analytics, 202
    files, 8
    mining, 202
    rotating, 147
    size specification, 150
Tibshirani, Rob, 102
tick marks, 151
tidyr, see library(tidyr) in R index
Tierney, Luke, 211
TIFF export, 153
time
    elapsed, 24
    variables, 24
time series, 98
    plotting, 197
    task view, 98, 232
time variable, 112
time-to-event analysis, 98
time-varying covariate, 100
Times font, 150
timing commands, 49
titles, 147
tolerance, 38
tracing memory usage, 47
transformed residuals, 96
translations, character, 17
transparent plot symbols, 128
transposing
    long (tall) to wide format, 21
    matrix, 40
    wide to long (tall) format, 21
trap error, 47
treat variable, 66, 241
treatment contrasts, 68
trigonometric functions, 37
trimmed mean, 52
true positive, 132
truncated normal random variables, 36
truncation, 37
Tufte, Edward, 126, 134
Tukey, John, 134



    honest significant differences, 71, 87
    mean–difference plot, 133
    notched boxplot, 125
two line data input, 196
two sample t-test, 56, 64
two-way ANOVA, 70, 84
    interaction plot, 130
two-way tables, 61
Type III sums of squares, 77, 102
UCLA, 213
uniform random variables, 34
union, 16
unique filename, 50
unique values, 20
univariate distribution parameter estimation, 53
univariate loess, 94
universal resource identifier (URI), 202
universal resource locator (URL), 5
University of Auckland, 211
unnamed function, 169
unstructured covariance matrix, 112
unstructured working correlation, 97
upper to lower case conversions, 17, 203
Urbanek, Simon, 211
URI (universal resource identifier), 202
URL, 5
    harvesting data, 195

values of variables, 12
van Buuren, Stef, 183
Vanderbilt University, 126
variable display, 12
variable format files, 4, 190
variable labels, 12
variables
    add, 13
    rename, 13
variance, 51, 162
    weighted, 51
variance equality test, 57
variance–covariance matrix, 96
varimax rotation, 100, 118
vectors
    efficiency, 45
    extract elements, 223
    from a matrix, 41
    indexing, 221
    recycling, 221
version number, 224, 231
Verzani, John, 25
violin plots, 125
visualization
    interactive, 203
    matrices, 7
visualize correlation matrix, 141
vos Savant, Marilyn, 163

warranty for R, 215
weave, 171, 172
web applications, 211
    in Shiny, 205
web technologies, 6, 9, 198
    task view, 232
website for book, xxii
weekday variable, 24
Weibull distribution, 33, 53, 158
weighted least squares, 95
weighted mean, 51
weighted variance, 51
where to begin, xxiv
while statement, 45, 217
White variance, 115
whitespace, 203
Wickham, Hadley, xxii, 19, 25, 123, 134, 167, 169, 193, 228
wide-to-long (tall) format conversion, 21
widgets
    control, 205
width of line, 151
Wikipedia, 198
Wilcoxon test, 57, 64
wildcard, 16, 17, 19
wildcard expansion, 50
Wilkinson dotplot, 124
WinBUGS, 174
Windows
    installation of R, 212
    metafile, 153
    R FAQ, 212
word boundaries, 202
Word format, 152, 172
workflow, xxi, 171
working correlation matrix, 97, 115
working directory, 49, 50
workspace, 226, 230
    browser, 211
    conflicts, 224, 230
wrap strings, 202
writing



    CSV (comma-separated value) files, 8
    native format files, 8
    other packages, 8
    text files, 8
X’X matrix, 75
x-y plot, see scatterplot
Xie, Yihui, 171
XML, 6, 8, 202
    create file, 8
    DocBook DTD, 9
    read file, 3
    write files, 9
year variable, 24
zero-inflated
    negative binomial regression, 94
    Poisson regression, 93, 106
