Indices - Amherst College
“K23166” — 2015/1/9 — 17:35 — page 255 — #281 ii
Appendix D
Indices
Separate indices are provided for subject (concept or task) and R command. References to the examples are denoted in italics.
D.1 Subject index
3-Dhistogram, 128plot, 130
95% confidence intervalmean, 52proportion, 53
absolute value, 36accelerated failure time model, 99access
Dropbox files, 6elements in R, 221files, 50variables, 11
addlines to plot, 146marginal rug plot, 147matrices, 39noise, 146normal density, 147straight line, 145text, 147variables, 13
age variable, 64, 239agreement, 54AIC, 86, 102airline delays, 207Akaike information criterion (AIC), 86,
102alcohol abuse, 241
alcoholic drinksHELP dataset, 240
Allaire, J.J., xxiialtitude, 193Amazon sales rank, 195analysis of variance
interaction plot, 130one-way, 70two-way, 70, 84
analytic power calculations, 58and operator, 28angular plot, 131annotating datasets, 26ANOVA
interaction plot, 130one-way, 70tables, 102
Aotearoa (New Zealand), 211API (application programming
interface), 199, 200, 202Apple R FAQ, 213application programming interface
(API), 199, 200, 202arbitrary quantiles, 52area under the curve, 132ARIMA model, 98arrays, 27, 46
extract elements, 223arrows, 148
ArXiv.org, 202ASCII
datasets, 5, 8encoding, 17
assertions, 47assignment operators in R, 221association plot, 131attributable risk, 53attributes
R, 226AUC (area under the curve), 132Auckland, University of, 211automated report generation, 63, 171autoregressive model, 98available datasets in R, 236AvantGarde font, 150average
running, 188average number of drinks
HELP dataset, 240axes
labels, 151multiple, 127omit, 152range, 151style, 151values, 151
barcharterror bars, 126
barplot, 123baseline interview, 237batch mode, 216Bates, Douglas, 211Bayesian
external software, 174inference task view, 173, 176, 232information criterion, 102logistic regression, 175methods, 186
BCA intervals, 181Beatles, 199best linear unbiased predictors, 96beta
distribution, 33, 53function, 37
beta-binomial distribution, 33beta-normal distribution, 33bias corrected and accelerated, 182bias-corrected and accelerated, 181BIC, 102
big data, 2, 18, 207regression, 69
Bike ride, 193binned scatterplot, 128binomial distribution, 33binomial family, 91binomial probabilities
tabulation, 188bitmap image file, 153bivariate
loess, 94relationship, 60, 127, 128
Bland–Altman plot, 133BMDP files, 3, 8BMP export, 153Bonferroni correction, 71book website, xxiiBoolean
operations, 16, 19, 28, 140R, 222
bootstrapping, 20, 181box around plots, 150boxplot, 125
side-by-side, 113, 125Bradley International Airport, 207break lines, 202Breslow estimator, 98Breslow–Day test, 55Breusch–Pagan test, 73“broken stick” models, 97bug reports, 236byte code compiler, 231
c statistic, 91calculate derivatives, 38calculus, 38calling functions from R, 226capture output, 50cartoon guide, 195case
sensitivity, 214statement, 14
categorical data, 30as predictor, 68from continuous, 13generation, 155parameterization, 68, 177plot, 131tables, 61
Cauchydistribution, 53
link function, 91causal inference, 177censored data, 98, 133, 165
simulate data, 158Center for Epidemiologic Studies
Depression (CESD) scale, 239centering, 52Central Limit Theorem, 161CESD, 27cesd variable, 27, 239chained equation models, 183, 186Chambers, John, 211change working directory, 50character translations, 17character variable, see string variablecharacteristics, test, 54characters, plotting, 145chemometrics task view, 232chi-square
distribution, 53statistic, 55
Cholesky decomposition, 96choose function, 37choropleth maps, 130, 193circadian plot, 131circular plot, 131class methods, 226class variable, 30
creating, 68ordering of levels, 68
classification, 100, 119cleaning data, 219clinical trial, 237
task view, 232clinical trials, 186clock
system, 34closest values, 187closing a graphic device, 153cluster analysis
task view, 232clustering
hierarchical, 101task view, 100, 101
cocaine, 241Cochran–Mantel–Haenszel test, 55code completion, 211code examples
downloading, xxiicoding numbers, 7coefficient
of determination, 75of variation, 181regression, 73
coercingcharacter variable from numeric, 15dataframes into matrices, 224date from character, 4factor variable from numeric, 14matrices into dataframes, 224numeric from character, 4string variable from numeric, 13
collinearity, 95color
palettes, 151selection, 151
column width, 25combine matrices, 39Comic Sans font, 150comma-separated value (CSV) files, 2, 8command history, 49
R, 213comments, 223comparison
floating-point variables, 38operators, 221
compiler, 231complementary log-log link function, 91complex fixed format files, 3, 190
two lines, 196complex numbers, 38complex survey design, 101component-wise matrix multiplication,
40Comprehensive R archive network, 212computational economics task view, 232computational physics task view, 232concatenate, 170
datasets, 22matrices, 39strings, 15
conditional execution, 45conditional logistic regression, 92conditional logistic regression model, 91conditional probability, 163conditioning plot, 129, 135confidence interval, 48
for parameter estimates, 74for predicted observations, 132for the mean, 132proportion, 53
confidence level
default, 48confidence limits
for individual (new) observations, 75for the mean, 74plotting, 74
conflicts, 224, 230confounding, 177constrained optimization, 208contingency table, 55, 61
plot, 131contour plots, 130contrasts, 68, 88
Helmert, 68polynomial, 68SAS, 68treatment, 68
control flow, 45control structures, 45, 217control widgets, 205controlling graph size, 149controlling Type-I error rate, 71convergence diagnosis for MCMC, 173,
174, 176converting characters, 17converting covariance to correlation
matrix, 76converting datasets
long (tall) to wide format, 21wide to long (tall) format, 21
Cook’s distance, 72cookies, 201coordinate systems (maps), 192corpus, 202correlated data, 112
generating, 157regression models, 96residuals, 96
correlationKendall, 54matrix, 60, 76, 141Pearson, 54Spearman, 54
cosine function, 37count models
goodness of fit, 103negative binomial regression, 93, 107Poisson regression, 93, 105zero-inflated negative binomial, 94zero-inflated Poisson regression, 93,
106Courier font, 150
coursesswirl, 217
covariance matrix, 75, 76, 112covariate imbalance, 177Cowles, Kate, 173Cox proportional hazards model, 98, 117
frailty, 99proportionality test, 99simulate data, 158time-varying covariate, 100
CPU time, 49Cramer’s V, 56CRAN (Comprehensive R Archive
Network), 212CRAN task views, see task viewscreate
ASCII datasets, 8categorical variable from continuous,
13categorical variable using logic, 14CSV (comma-separated value) files,
8dataset from counts, 53datasets for other packages, 8date variable, 23factors, 68files for other packages, 8functions, 48lagged variable, 17matrix, 39numeric variable from string, 15observation number, 20recode categorical variable, 14string variable from numeric, 13time variables, 24
Cronbach’s α, 100, 117cross-classification table, 29, 55crosstabs, 55, 61CSV (comma-separated value) files, 2, 8cumulative
density function, 33hazard, 99hazard plots, 133product, 189sum, 189
curated guide to learning R, 217Curran, James, 98curve plotting, 131custom graphic layouts, 149
Dalgaard, Peter, 211
dashed line, 151data
display, 12entry, 7generation, 45input, 25mining, 202scraping, 195
Data Expo 2009, 207data input, 1
two lines, 196data step
repeat steps for a set of variables, 46data structures in R, 220data technologies, 9data viewer, 211database system, 18, 69, 207dataframes, 221
comparison with column bind, 224comparison with matrix, 224detaching, 11R, 223remove from workspace, 224
datasetcomments, 12from counts, 53HELP study, 239in book, xxiiother packages, 3R, 236
date and time variablescreate date, 23create time, 24extract month, 24extract quarter, 24extract weekday, 24extract year, 24reading, 3
dayslink variable, 66, 239DBF files, 3, 8debugging, 47
RStudio, 47decimal representation, 38decomposition
singular value, 41Deducer, 213default confidence level, 48defining functions, 48delete objects, 221density
estimation, 124, 128
overlapping, 126plot, 60, 65, 124, 128
density functions, 33generate random, 33probability, 33quantiles, 33
dependency management, 231depressive symptoms, 27derivatives, 38derived variable, 13, 27, 28design matrix, 75, 87
specification, 68, 177design of experiments task view, 232design weights, 101detach
dataframes, 11, 83, 224packages, 11, 109, 225
determinant, 41detoxification program, 237, 239deviance
tables, 102DFFITs, 73diagnostic
agreement, 132plots, 73tests, 73
diagnostic agreement, 54ROC curve, 138
diagnostic plots, 82diagnostics from linear regression, 81diagonal elements, 40, 41difference in log-likelihoods, 102difference in sets, 16differential equations
task view, 232dimension, 40diploma problem, 162directory delimiter, 1directory structure, 1dispersion parameter, 107display missing categories, 55displaying
data, 12, 26model results, 7objects, 226scientific notation, 12
distance metric, 16distribution
beta, 53Cauchy, 53chi-squared, 53
empirical probability density plot, 125
exponential, 53F, 53gamma, 53geometric, 53logistic, 53lognormal, 53negative binomial, 53normal, 33, 53parameters, 53Poisson, 53probability, 33q-q plot, 131quantile, 33quantile–quantile plot, 131stem plot, 124t, 53Weibull, 53
divert output, 50DocBook document type definition, 9document mining, 202document term matrix, 203document type definition, 9documentation
R, 216dotplot, 124downloading
code examples, xxiidplyr, see library(dplyr) in R indexdrinks of alcohol
HELP dataset, 240drinkstat variable, 28Dropbox, 6dropping variables, 19drugrisk variable, 141, 239DTD, 9duplicated values, 20dynamic
web applications, 205, 211dynamic graphics task view, 232dynamite plot, 126
ecological data task view, 232econometrics task view, 232edit distance, 16editing data, 7efficiency
vector operations, 45Efron, Bradley, 202
Efron estimator, 98
eigenvalues and eigenvectors, 41elapsed time, 24else statement, 217empirical
density plot, 65estimation, 162finance task view, 232power calculations, 169probability density plot, 125variance, 97, 115
encodingASCII, 17
entering data, 7environment, 226, 230environmental task view, 232Epi Info files, 3equal variance test, 57error bars
bar chart, 126error recovery, 47etiquette
R, 236evaluate integrals, 38Evans, Michael, 159exact
confidence intervals, 53logistic regression model, 92test of proportions, 56
example codedownloading, xxiiR, 215
Excelcreating, 8reading, 2
excesskurtosis, 52zeroes, 93, 94
exchangeable working correlation, 97execution
conditional, 45in operating system, 49profiling, 47
expansionwildcard, 50
expected cell counts, 63expected values, 162experimental design task view, 232exponential
distribution, 53random variables, 36, 161, 165scientific notation, 12
exponentiation, 36export
BMP, 153datasets for other packages, 8Excel, 8graphs, 152JPEG, 153PDF, 152PNG, 153postscript, 152TIFF, 153WMF (Windows metafile format),
153expressions
R, 221extensible markup language (XML), 6, 9,
202extract characters from string, 15extract from objects, 54, 223
F distribution, 53f1 variables, 27, 28, 117, 239factor
analysis, 100, 118levels, 68, 177reordering, 68variable, 30, 68
factor object, 68factorial function, 37failure time data, 98Falcon, Seth, 211false discovery rate correction, 71false positive, 132family
binomial, 91Gamma, 91Gaussian, 91inverse Gaussian, 91Poisson, 91
FAQApple R, 213R, 217, 236Windows R, 212
female variable, 28, 240Fibonacci sequence, 189file
browsing, 50temporary, 50variable format, 4
filtering, 19finance task view, 232
findapproximate string, 16closest values, 187string within a string, 16working directory, 49
finite mixture modelstask view, 186, 232
finite population correction, 101Fisher’s exact test, 56, 61fit model separately by group, 83fixed format files, 1fixed width files, 2, 3flight delays, 207floating-point representation, 38follow-up interviews, 237fonts in graphics, 150footnotes, 147for statement, 217foreign format, 26formatted
data, 8model results, 7output, 171variables, 18
formula object, 55, 67forward stagewise regression, 103Foundation for Statistical Computing
R, 211Fox, John, 100fraction of missing information, 185frailty model, 99frequently asked questions
see FAQ, 217Friedman’s super smoother, 146functions, 48
plotting, 131R, 48, 226
fuzzy search, 16
G-rho family of Harrington and Fleming, 65
g1b variable, 135, 240g1btv variable, 110, 112, 115GAM, 94Gamma
distribution, 53family, 91function, 37, 159gamma distribution, 33regression, 91
Gaussian
distribution, 33family, 91
Gelman, Andrew, 159, 160gender variable, 30, 240general linear model for correlated data,
96, 112generalized additive model, 94, 109generalized estimating equation, 115
exchangeable working correlation, 97independence working correlation,
97unstructured working correlation, 97
generalized linear mixed model, 97, 116generalized linear model, 91, 104
big data, 69generalized logit model, 93, 108generalized multinomial model, 93generate
arbitrary random variables, 36categorical data, 155correlated binary variables, 157Cox model, 158dataset from counts, 53exponential random variables, 36generalized linear model random
effects, 156grid of values, 47logistic regression, 156multinomial random variables, 35multivariate normal random
variables, 35normal random variables, 35other random variables, 36pattern of repeated values, 46predicted values, 72random variables, 33residuals, 72sequence of values, 46truncated normal random variables,
36uniform random variables, 34
genetics task view, 232genf variable, 84Gentleman, Robert, 211geometric distribution, 33, 53getting
and cleaning data, 219help in R, 236
ggplot2, see library(ggplot2) in R indexGitHub, 211, 230goodness of fit, 103, 106
ROC curve, 138Google Maps, 193GPS coordinates, 193graduation, 162grammar of graphics, 193graphical layouts, 149graphical models task view, 232graphical reporting, 186graphical settings, 150graphical user interface
deducer, 213R, 213RStudio, 211
graphicsboxplot, 125choropleth, 193exporting, 152side-by-side boxplots, 125size, 149task view, 123, 232
greater than operator, 28grid
graphics, 232of values, 47rectangular, 148search, 208
grouping variablelinear model, 168summary statistics, 167
growth curve models, 97Gruen, Bettina, 186guide to packages
R, 231guidelines
R-help postings, 236
Hadoop, 19hanging rootogram, 103Harrell, Frank, 76, 126, 186, 229Harrington and Fleming G-rho family, 65harvesting data, 195hat matrix, 72hat-check problem, 162hazard plots, 133Health Evaluation and Linkage to
Primary Care (HELP) study, 237
health surveySF-36, 240
Helmert contrasts, 68HELP study
clinic, 241dataset, 239introduction, 237results, 237
help systemother resources, 236R, 215, 216R packages, 231
Helvetica font, 150heroin, 241Hesterberg, Tim, 161heteroscedasticity test, 73hierarchical clustering, 101, 121high-performance computing task view,
232histogram, 124
comparing, 125history
of commands, 49, 213R, 211
Hochberg correction, 71Holm correction, 71homeless variable, 61, 104, 240homelessness, 239homogeneity of odds ratio, 55honest significant difference, 71, 87Hornik, Kurt, 211Hosmer–Lemeshow test, 103hospitalization, 239Hotelling’s t, 98HSD (honest significant difference), 87HTML files, 8
harvesting data, 195reproducible output, 172table, 6, 198
HTTP/HTTPS, 5, 197Huber variance, 115hypergeometric distribution, 33hypertext markup language format
(HTML), 8hypertext transport protocol (HTTP), 5
i1 variable, 28, 105, 240i2 variable, 28, 240Iacus, Stefano, 211id number, 20id variable, 240identifying points, 148identity link function, 91if statement, 19, 45, 217Ihaka, Ross, 211
ill-conditioned problems, 95image plot, 130imaginary numbers, 38imaging task view, 232import data, 3imputation, 183in statement, 217income inequality, 94incomplete data, 182, 183independence working correlation, 97indexing, 191
in R, 27lists, 222matrix, 40vector, 221
indicator variable, 68, 177individual level data, 53indtot variable, 104, 135, 240InDUC (Inventory of Drug Use
Consequences), 240infinite values, 182influence, 72information criterion (AIC), 86information matrix, 75inner join, 23installing
packages in R, 229R, 212RStudio, 213
integerfunctions, 37problems, 210
integration, 38interaction, 69
linear regression, 77plot, 84, 130testing, 85two-way ANOVA, 84
interactivecourses in swirl, 217visualization, 203web applications, 205
interceptno, 69
intersection, 16interval censored data, 133introduction
R, 211, 216RStudio, 211
invalid locale, 5
Inventory of Drug Use Consequences, see indtot variable
inverseGaussian family, 91link function, 91matrix, 40probability integral transform, 36
iterative proportional fitting, 93
JAGS, 174JavaScript Object Notation (JSON)
format, 6jitter points, 146joining datasets, 22joins, 19JPEG export, 153JSON format, 6
Kaplan, Danny, xxii, 131Kaplan–Meier plot, 133, 137Kappa, 54keeping variables, 19Kendall correlation, 54kernel smoother plot, 124, 128knapsack problem, 208knitr, 171Knuth, Donald, 171Kolmogorov–Smirnov test, 57, 64Kruskal–Wallis test, 57kurtosis, 52
L1-constrained fitting, 102labels for variables, 12Laplace distribution, 33large data, 2, 18, 207large sample assumption, 161lasso method, 102latent class analysis, 101LaTeX output, 171
R, 80Lavine, Michael, 160Lawrence, Michael, 211learning R, 217least absolute shrinkage and selection
operator, 102least angle regression, 103least squares
linear, 67nonlinear, 94
legend, 42, 148Leisch, Friedrich, 171, 186, 211
lengthof string, 15of vector, 40
less than operator, 28Levene’s test for equal variances, 57Levenshtein edit distance, 16leverage, 72library
help, 231R, 229
Ligges, Uwe, 211likelihood ratio test, 85, 102line
on plot, 146style, 151types, 151width, 151
line wrap, 202linear combinations of parameters, 71linear discriminant analysis, 100, 120linear models, 67
big data, 69by grouping variable, 168categorical predictor, 68diagnostic plots, 73diagnostic tests, 73diagnostics, 81generalized, 91interaction, 69, 77no intercept, 69parameterization, 68, 177R object, 67residuals, 72
standardized, 72studentized, 72
standardized residuals, 72stratified analysis, 168studentized residuals, 72test for heteroscedasticity, 73
linear programming, 210link function
cauchit, 91cloglog, 91identity, 91inverse, 91log, 91logit, 91probit, 91square root, 91
linkage to primary care, 239linkstatus variable, 66, 240
Linux installationR, 212
Lipsitz, Stuart, 157list files, 50lists, 222
extract elements, 54, 223literate programming, 171Little, Roderick, 183Liverpool, England, 198local polynomial regression, 146locating points, 148loess
bivariate, 94log
base 10, 36base 2, 36base e, 36link function, 91
log fileR, 49
log scale, 152log-likelihood, 102log-linear model, 93log-normal distribution, 33logic, 14logical expressions, 13, 14logical operator, 13, 221logistic
distribution, 53generalized, 108
logistic regression, 91, 104Bayesian, 175c statistic, 91generating, 156goodness of fit, 103Nagelkerke R², 91ROC curve, 138
logit link function, 91lognormal
distribution, 33, 53regression, 91
logrank test, 58, 65long (tall) to wide format conversion, 21longitudinal regression, 96
reshaping datasets, 110looping, 45lower to upper case conversions, 17lowess, 94, 109, 146lubridate, see library(lubridate) in R
indexLucida font, 150
Lumley, Thomas, 101, 211
M estimation, 95machine learning
task view, 100, 232machine precision, 38Macintosh R FAQ, 213macros, 48MAD regression, 95Maechler, Martin, 211mailing list
R-help, 236make variables available, 11manipulate string variables, 15–17
remove spaces, 17split, 17
MANOVA, 98Mantel–Haenszel test, 55maps
choropleth, 130, 193coordinate systems, 192Google Maps, 193plotting, 190
margin specification, 150marginal
histograms, 135plot, 147
Markdown, 8, 171in Shiny, 205
Markov Chain Monte Carlo, 92, 159, 173, 176
Masarotto, Guido, 211masking, 224, 230matching, 177mathematical constants, 37mathematical expressions, 42, 148mathematical functions
absolute value, 36beta, 37choose, 37exponential, 36factorial, 37Fibonacci sequence, 189gamma, 37integer functions, 37log, 36maximum value, 36mean value, 36minimum value, 36modulus, 36natural log, 36
permute, 37square root, 36standard deviation, 36sum, 36trigonometric functions, 37
mathematical symbolsadding, 148
mathematics task view, 39, 232matrix
addition, 39combine, 39component-wise multiplication, 40concatenate, 39correlation, 76covariance, 75, 76creation, 39design, 75dimension, 40document term, 203extract elements, 223graphs, 73hat, 72indexing, 40, 223information, 75inverse, 40large, 39multiplication, 35, 40, 75, 222overview, 39plots, 129R, 223structured, 7transposition, 40
maximum likelihood estimation, 53maximum number of drinks
HELP dataset, 240maximum value, 36MCMC, 92, 159, 173, 176McNemar’s test, 56mcs variable, 60, 240mean, 36, 51, 52
by group, 167trimmed, 52weighted, 51
mean–difference plot, 133median regression, 95medical imaging task view, 232medical problems, 239memory usage, 47merging datasets, 22meta analysis task view, 232metadata, 226
methods, 226, 232metric for distance, 16Metropolis–Hastings algorithm, 159MICE (chained equations), 183Microsoft rtf format, 152Microsoft Word format, 152, 171, 172minimum absolute deviation regression,
95minimum value, 36mining
text, 202Minitab files, 3missing data, 27, 182, 183, 186
tables, 55missing information fraction, 185missing values
recoding, 183mixed model, 96
generating, 156logistic, 97logistic regression, 116
mode of storage, 226model
comparisons, 86, 102diagnostics, 81selection, 86, 102specification, 69, 77
modeling language, 55, 67, 167modulus, 36moments, 52Mongo databases, 19month variable, 24Monty Hall problem, 163Morgan, Martin, 211mosaic plot, 131Mosteller, Fred, 162motivational interview, 237movies in Liverpool, 198moving average model, 98Mplus, 101multicollinearity, 95multilevel models, 97multinomial model
generalized, 93logit, 108nominal outcome, 93ordered outcome, 92
multinomial random variable, 35multiple comparisons, 71, 87multiple imputation, 183, 186multiple plots per page, 149
multiple y axes, 127, 134multiplication
matrix, 35, 40multivariate statistics
task view, 100, 232multivariate test, 98multiway tables, 55Murdoch, Duncan, 211Murrell, Paul, 9, 123, 134, 211
Nagelkerke R² for logistic regression, 91name conflicts, 224, 230named arguments in R, 48, 227named lists, 222names and variable types, 11native data files, 8native files, 1natural language processing, 202
task view, 203natural language processing task view,
232negative binomial distribution, 53negative binomial regression, 93, 107
zero-inflated, 94negative-binomial distribution, 33Nelson–Aalen estimate, 99nested models, 91nested quotes, 12New Century Schoolbook font, 150new users
R, 216New Zealand (Aotearoa), 211next statement, 217NIAAA, 237NIDA, 237NLP optimization, 39no intercept, 69noise
add to points, 146non-ASCII, 5non-randomized studies, 177nonlinear least squares, 94nonparametric tests, 57, 64normal density, 147normal distribution, 33, 42, 52, 53normal random variables, 35
truncated, 36normality testing, 56normalizing, 52
constant, 159residuals from linear model, 72
residuals from mixed model, 96not operator, 182notched boxplot, 125NP completeness, 208number coding, 7number of digits to display, 7numeric from string, 15numerical mathematics task view, 232
object-oriented programming, 226objects
displaying, 226R, 220, 221remove, 221
observation number, 20observational studies, 177Octave files, 3ODBC, 19odds ratio, 53, 62
homogeneity, 55official statistics, 101
task view, 101, 232Omegahat, 6, 230omit axes, 152one-way analysis of variance, 70open-source, xxiiiOpenBUGS, 174operating system
change working directory, 50execute command, 49find working directory, 49list files, 50pause execution, 49temporary files, 50
optimization, 39task view, 39, 232with constraints, 208
optionsR, 226scientific notation, 12
OR (odds ratio), 53or operator, 28, 221order statistics, 51ordered factor, 68ordered logistic model, 92, 108ordered multinomial model, 92ordering of levels, 68ordinal logit, 92, 108orientation
axis labels, 151boxplot, 125
outer join, 23output file formats
R, 171overdispersion, 91overplotting, 128
packagesconflicts, 230detaching, 11help, 231R, 229remove from workspace, 225
Packrat projects, 231page, multiple plots per, 149pairs plot, 138pairwise differences, 87Palatino font, 150palettes of colors, 151Pandoc, 152, 171Parade magazine, 163parallel
boxplots, 113, 125computation, 232computing task view, 232processing, 228
parameter estimatesconfidence interval, 74standard errors, 74univariate distribution, 53used as data, 73
parameterization of categorical variable, 68, 177
reference category, 87Parel, Daniel, xxiipartial file read, 1pathological distribution
sampling, 159pause execution for a time interval, 49pcs variable, 60, 240pdf output
creating, 171, 172exporting, 152
peakedness, 52Pearson correlation, 54Pearson’s χ² test, 55, 61, 103percentiles
probability density function, 33Perl
interface, 18modules, 8
permutation test, 57, 64
permute function, 37permuted sample, 20pharmacokinetic task view, 232phi coefficient, 56phylogenetics
task view, 232Pi (π), 37Pioneer Valley, 193pipe operator, 21, 111, 228plot
adding arrows, 148adding footnotes, 147adding polygons, 148adding shapes, 148adding text, 147arbitrary function, 131characters, 145conditioning, 129curve, 131limits, 76maps, 190predicted lines, 132predicted values, 132regression diagnostics, 73rotating text, 147symbols, 145time series data, 197titles, 147
Plummer, Martyn, 211PNG export, 153point size specification, 150points, 146
locating, 148Poisson distribution, 53Poisson family, 91Poisson regression, 91, 93, 105
Bayesian, 176zero-inflated, 93, 106
polygons, 148polynomial contrasts, 68polynomial regression, 94posterior probability, 173, 176posting guide (R-help), 236postscript, 150, 152power calculations
analytic, 58empirical, 169
practical extraction and report language (Perl), 8, 18
predicted values, 71generating from linear model, 72
preprints, 202presentations in RStudio, 172primary care
linkage, 239visits, 240
primary sampling unit, 101primary substance of abuse, 241printing model results, 7prior distribution, 173, 176probability density, 33, 125probability distributions, 42
parameter estimation, 53quantiles, 33random variables, 33simulation, 155, 162task view, 33, 232
probability integral transform, 36probit link function, 91probit regression, 91productivity, xxiprofiling of execution, 47programming, 45projection, 192projects, 211propensity scores, 177proportion, 53proportional hazards model, 98, 117
frailty, 99proportionality test, 99simulate data, 158time-varying covariate, 100
proportional odds model, 92, 108proportionality test, 99Pruim, Randall, xxii, 126, 131pseudo R², 91pseudo-random number
generation, 33set seed, 34
pss_fr variable, 141, 240psychometrics, 100, 117
task view, 100, 232punctuation, 203
QQ plot, 82, 131quadratic growth curve models, 97quantile regression, 95, 107quantile–quantile plot, 131quantile-quantile plot, 82quantiles, 52
probability density function, 33t distribution, 48
quarter variable, 24quasi-complete separation, 176quitting R, 215quotes, nested, 12
Ravailable datasets, 236bug reports, 236command history, 213data structures, 220detach packages, 109Development Core Team, 211exiting, 214export SAS dataset, 8FAQ, 217, 236Foundation for Statistical
Computing, 211graphical user interface, 213help system, 215, 216history, 211installation, 212introduction, 211libraries, 229Linux installation, 212Markdown, 8, 172Markdown in Shiny, 205objects, 221packages, 229, 231programming, 219Project, 236questions, 200, 217R Commander, 213R-help mailing list, 236reading SAS files, 3resources for new users, 216sample session, 214starting, 214support, 236task views, 231warranty, 215Windows installation, 212
R²
linear regression, 75logistic regression, 91
R-help mailing list, 236ragged data, 190rail trails, 193random coefficient model, 96, 97random effects model, 96, 113
estimate, 96generating, 156
random intercept model, 96random number
seed, 34, 189random slopes model, 96random variables
density, 33generate, 33generation, 33probability, 33quantiles, 33
randomization group, 241randomized clinical trial, 237range
axes, 151rank sum test, 57reading
bytes, 5comma-separated value (CSV) files,
2data, 25data with two lines per obs, 196dates, 3fixed format files, 1HTML table, 6, 198HTTP from URL, 5long lines, 3more complex fixed format files, 3native format files, 1other files, 2other packages, 3R into SAS, 2R objects, 1SAS into R, 3spreadsheets, 2variable format files, 4, 190XML files, 6
receiver operating characteristic curve, 132, 138
recodingmissing values, 183variables, 13, 14
recover from error, 47rectangular grid, 148recursive partitioning, 100, 119redirect output, 50reference category, 68, 87, 177regression
big data, 69categorical predictor, 68coefficients, 73diagnostic tests, 73
diagnostics, 71, 73, 81forward stagewise, 103Gamma, 91interaction, 69, 77least angle, 103linear, 46, 67logistic, 91lognormal, 91no intercept, 69overdispersed binomial, 91overdispersed Poisson, 91parameterization, 68, 177Poisson, 91probit, 91residuals, 72standardized coefficients, 73standardized residuals, 72stratified analysis, 168studentized residuals, 72test for heteroscedasticity, 73
regular expressions, 16, 17, 19rejection sampling, 159relative risk, 53reliability measures, 100, 117remove
dataframe from workspace, 224numbers, 203objects, 221package from workspace, 225punctuation, 203spaces from a string, 17whitespace, 203
rename variables, 13repeat statement, 45, 217replace a string within a string, 17replicable variates, 34replicating examples from the book, 215report generation, 8, 63, 171repository of preprints, 202reproducible analysis, 8, 63, 186, 211
knitr, 171packages, 231random numbers, 34rich text format, 152Statweave, 171tangle, 171task view, 171, 232weave, 171
resampling-based inference, 181reserved commands, 217reshaping datasets, 21, 110
residuals, 72analysis, 81correlated, 96plots, 82standardized, 72studentized, 72
results from HELP study, 237rich text format (rtf), 152ridge regression, 95right censored data, 133Ripley, Brian, 211Risk Assessment Battery, 239robust statistical methods
empirical variance, 97, 115regression, 95task view, 95, 232
ROC curve, 132, 138RODBC, 69Rosenthal, Jeffrey, 159rotating
axis labels, 151text, 147
round results, 25, 37RR (relative risk), 53RSeek, 217RStudio, xxi, xxii, 211
curated guide to learning R, 217exporting graphs, 152installation, 213Packrat projects, 231presentations, 172reproducible analysis, 172
RTF (rich text format), 152Rubin, Donald, 183rug plot, 147running a script, 216running average, 188
sales rank, 195
Samet, Jeffrey, 237
sample size calculations
    analytic, 58
    empirical, 169
sampling
    challenging distribution, 159
    dataset, 20
sampling distribution, 161
sandwich variance, 97, 115
Sarkar, Deepayan, 123, 134, 186, 211
SAS
    files from R, 3
saving
    data, 26
    graphs, 152
    R history, 213
scale
    log, 152
scaling, 52
scatterplot, 61, 76, 127
    binned, 128
    lines, 146
    marginal histograms, 129, 135
    matrix, 129
    multiple y values, 127
    points, 146
    separate plotting characters per group, 145
    smoother, 76, 146
Schoenfeld residuals, 99
Schwarte, Heiner, 211
scientific notation, 12
scraping data, 195
script file, 215, 216
search for approximate string, 16
seed, random number, 34, 161
sensitivity, 54, 132
separate model fitting by group, 83
separate plotting characters per group, 145
server version, 211
session information, 224
set names, 18
set operations, 16
settings, graphical, 150
sexrisk variable, 104, 108, 241
SF-36 short form health survey, 240
shapes, 148
Shiny, 205, 211
short form (SF) health survey, 240
shrinkage method, lasso, 102
side-by-side boxplots, 113, 125
sideways orientation
    boxplot, 125
significance stars in R, 67, 77
simulate
    categorical data, 155
    Cox model, 158
    generalized linear model random effects, 156
    linear regression, 46
    logistic regression, 156
    power calculations, 169
simulation studies, 156
sine function, 37
singular value decomposition, 41
sink output, 50
size of graph, 149
skewness, 52
slides in RStudio, 172
Smith College, 162
smoothing spline, 76, 94, 109, 124, 128, 146
social sciences
    task view, 67, 76, 91, 103, 232
social supports, 240
SOCR (Statistics Online Computational Resource), 213
solve optimization problems, 39
sorting, 22, 31
sourcing commands, 215
sparse matrices, 39
spatial statistics
    choropleth, 193
    task view, 103, 192, 232
spatio-temporal data
    task view, 232
Spearman correlation, 54
specificity, 54, 132
specifying
    box around plots, 150
    color, 151
    design matrix, 68, 177
    margin, 150
    point size, 150
    text size, 150
splines, 232
split string, 17
spreadsheet, 2, 7
SPSS files, 3, 8
SQL, 18, 207
square root, 36
    link function, 91
stack exchange, 200
stack overflow, 217
stagewise regression, 103
standard deviation, 36, 51
standard error, 47
standardized regression coefficients, 73
standardized residuals, 72
    mixed model, 96
Stata files, 3, 8
statistical genetics task view, 232
statistical learning task view, 232
Statistics Online Computational Resource (SOCR), 213
status codes, 201
stem plot, 124
stop words, 203
storage mode, 226
straight line
    adding, 145
stratification, 101
stratified analysis, 83, 168
string variable
    concatenating strings, 15
    extract characters, 15
    find a string, 16
    find approximate string, 16
    from numeric variable, 13
    length, 15
    remove spaces, 17
    replace a string, 17
structural equation modeling
    latent class analysis, 101
structured matrices, 7
structured query language (SQL), 18, 207
Student’s t-test, 56, 161
studentized residuals, 72
styles
    axes, 151
    line, 151
sub variable, 76, 84
submatrix, 40
subsetting, 19, 29, 31
substance abuse treatment, 240
substance of abuse, 241
substance variable, 61, 241
sum, 36
summary statistics, 59
    mean, 51
    separately by group, 31, 167
    weighted mean, 51
sums of squares
    cross products, 75
    Type III, 77, 102
support, 236
survey methodology, 101
    task view, 101, 232
    weighted mean, 51
survival analysis, 98, 165
    accelerated failure time model, 99
    Cox model, 117
    frailty, 99
    Kaplan–Meier plot, 133, 137
    logrank test, 58, 65
    proportional hazards model, 98, 99
    simulate data, 158
    task view, 98, 133, 232
suspend execution for a time interval, 49
Sweave, 8, 171
sweep operator, 52
swirl interactive courses, 217
symbolic numbers, 7
symbols
    mathematical, 148
    plot, 145
syntax highlighting, 211
Systat files, 3
system clock, 34
t distribution, 42, 53
    quantile, 48
t-test, 56
t-test, 64, 161
table
    cross-classification, 55
    reading HTML, 6, 198
tabulate binomial probabilities, 188
tagged image file format, 153
tangent function, 37
tangle, 171, 172
task view, 231
    analysis of spatial data, 103
    Bayesian inference, 173, 176
    clustering, 100, 101
    finite mixture models, 186
    graphics, 123
    machine learning, 100
    multivariate statistics, 100
    natural language processing, 203
    official statistics, 101
    optimization and mathematical programming, 39
    probability distributions, 33
    psychometrics, 100
    reproducible analysis, 171
    robust statistical methods, 95
    social sciences, 67, 76, 91, 103
    spatial statistics, 192
    survival analysis, 98, 133
    time series, 98
Temple Lang, Duncan, 211
temporal data
    task view, 232
temporary files, 50
test
    characteristics, 54
    heteroscedasticity, 73
    interaction, 85
    joint null hypotheses, 70
    normality, 56
    proportionality, 99
text
    adding, 147
    analytics, 202
    files, 8
    mining, 202
    rotating, 147
    size specification, 150
Tibshirani, Rob, 102
tick marks, 151
tidyr, see library(tidyr) in R index
Tierney, Luke, 211
TIFF export, 153
time
    elapsed, 24
    variables, 24
time series, 98
    plotting, 197
    task view, 98, 232
time variable, 112
time-to-event analysis, 98
time-varying covariate, 100
Times font, 150
timing commands, 49
titles, 147
tolerance, 38
tracing memory usage, 47
transformed residuals, 96
translations, character, 17
transparent plot symbols, 128
transposing
    long (tall) to wide format, 21
    matrix, 40
    wide to long (tall) format, 21
trap error, 47
treat variable, 66, 241
treatment contrasts, 68
trigonometric functions, 37
trimmed mean, 52
true positive, 132
truncated normal random variables, 36
truncation, 37
Tufte, Edward, 126, 134
Tukey, John, 134
    honest significant differences, 71, 87
    mean–difference plot, 133
    notched boxplot, 125
two line data input, 196
two sample t-test, 56, 64
two-way ANOVA, 70, 84
    interaction plot, 130
two-way tables, 61
Type III sums of squares, 77, 102
UCLA, 213
uniform random variables, 34
union, 16
unique filename, 50
unique values, 20
univariate distribution parameter estimation, 53
univariate loess, 94
universal resource identifier (URI), 202
universal resource locator (URL), 5
University of Auckland, 211
unnamed function, 169
unstructured covariance matrix, 112
unstructured working correlation, 97
upper to lower case conversions, 17, 203
Urbanek, Simon, 211
URI (universal resource identifier), 202
URL, 5
    harvesting data, 195
values of variables, 12
van Buuren, Stef, 183
Vanderbilt University, 126
variable display, 12
variable format files, 4, 190
variable labels, 12
variables
    add, 13
    rename, 13
variance, 51, 162
    weighted, 51
variance equality test, 57
variance–covariance matrix, 96
varimax rotation, 100, 118
vectors
    efficiency, 45
    extract elements, 223
    from a matrix, 41
    indexing, 221
    recycling, 221
version number, 224, 231
Verzani, John, 25
violin plots, 125
visualization
    interactive, 203
    matrices, 7
visualize correlation matrix, 141
vos Savant, Marilyn, 163
warranty for R, 215
weave, 171, 172
web applications, 211
    in Shiny, 205
web technologies, 6, 9, 198
    task view, 232
website for book, xxii
weekday variable, 24
Weibull distribution, 33, 53, 158
weighted least squares, 95
weighted mean, 51
weighted variance, 51
where to begin, xxiv
while statement, 45, 217
White variance, 115
whitespace, 203
Wickham, Hadley, xxii, 19, 25, 123, 134, 167, 169, 193, 228
wide-to-long (tall) format conversion, 21
widgets
    control, 205
width of line, 151
Wikipedia, 198
Wilcoxon test, 57, 64
wildcard, 16, 17, 19
wildcard expansion, 50
Wilkinson dotplot, 124
WinBUGS, 174
Windows
    installation of R, 212
    metafile, 153
    R FAQ, 212
word boundaries, 202
Word format, 152, 172
workflow, xxi, 171
working correlation matrix, 97, 115
working directory, 49, 50
workspace, 226, 230
    browser, 211
    conflicts, 224, 230
wrap strings, 202
writing
    CSV (comma-separated value) files, 8
    native format files, 8
    other packages, 8
    text files, 8
X’X matrix, 75
x-y plot, see scatterplot
Xie, Yihui, 171
XML, 6, 8, 202
    create file, 8
    DocBook DTD, 9
    read file, 3
    write files, 9
year variable, 24
zero-inflated
    negative binomial regression, 94
    Poisson regression, 93, 106