Using R in a Microsoft Office World (aka: hostile environment) Harold Baize, PhD SF-DPH

Post on 25-Feb-2016

22 views 0 download

Tags:

description

Using R in a Microsoft Office World (aka: hostile environment) Harold Baize, PhD SF-DPH. First: Advice to the R Newbie. Use the assignment arrow!

Transcript of Using R in a Microsoft Office World (aka: hostile environment) Harold Baize, PhD SF-DPH

Using R in a Microsoft Office

World (aka: hostile environment)

Harold Baize, PhD SF-DPH

First: Advice to the R Newbie

Use the assignment arrow! <- Directional and not ambiguous Clearly shows an object created or modified

X = Y X <- Y Y -> X Use caution with attach() !! * Does not work like SPSS or SAS! * Data manipulations are not part of the data frame

Attach HeadacheAn Introduction to Statistical Learning: with Applications in R James, Witten, Hastie and Tibshirani (2013)“To refer to a variable, we must type the data set and the variable name joined with a $ symbol. Alternatively, we can use the attach() function in order to tell R to make the variables in this data frame available by name.”

Google R Style Guide:“AttachThe possibilities for creating errors when using attach are numerous. Avoid it.”

Attach HeadacheR is fundamentally different from SAS and SPSS, you’re not restricted to one dataset at a time! var1 <- gear * carb #Error: object 'gear' not foundattach(mtcars) var2 <- gear + carb

mtcars$var3 <- gear + carb

There are now two mtcars, the one attached was not changed!

mtcars$var4 <- mtcars$cyl + mtcars$var3

Formula notation lets you skip $

Attach not necessary with formula notation (data = )

REPRODUCIBILITY

REPRODUCIBILITY

REPRODUCIBILITY

https://www.scienceexchange.com/reproducibility

Ewww… It’s sorta GUI

R in a Microsoft Office World IT does not trust open source software Your supervisors don’t trust open source software (but want to save $)

Everyone else only knows MS-Office (you might get hit by a bus) Management only knows Excel (and they think its great!) You have to be able to share data between R and MS-Office apps Need to publish results in MS-Office formats but want reproducibility

R in a Microsoft Office World Two basic strategies:

Work in MS Office with R plug-ins to execute R from within the app a. not open source b. not easily reproducible due to GUI

Work in R and use packages to tailor output for MS Office

MS Office Plug-ins

statconnDCOM Server: server that allows integration of R into other applications. rscproxy: R package required for rcom and for the statconn (D)COM Server. rcom: R package for R and other apps to communicate RExcel: embeds R in Excel SWord: embeds R in Wordmodified from:http://www.londonr.org/Presentations/May'10%20Markus%20Gesmann.pdf

Working in R:Data Exchange ODBC CSV files Excel

RODBC Use RODBC to perform SQL queries ODBC drivers need to be set up in

your operating system. Once set up, you use the

Data Source Name (dsn) as a connection

Control Panel > Administrative Tools > Data Sources (ODBC)

Enter a name Description optional Select server from drop down menu Defaults should work for the rest…

library(RODBC)AvDW <- odbcConnect( dsn = "Avatar_Reports")

AvLv <- odbcConnect( dsn = "AVCALPMLIVE", believeNRows = FALSE) ## , uid="User", pwd="password")

X <- sqlQuery(AvDW, "select * from tableName")

Returns a data frame or SQL error message to the data object, no R error until you use X Dates are returned in POSIXct format Text fields are returned as factors, unless you specify stringsAsFactors = FALSE

sqlSave(OpenChannel, dfName, append = FALSE)

StDate <- "2012-07-01"EnDate <- "2013-06-30"

idEpFY <- sqlQuery(AvDW, paste( "select distinct id, epi_num from billing_tx_history where date_of_service Between {d '",StDate,"'} and {d '",EnDate,"'} and rrg = 'Mental Health'"))odbcClose(AvDW) # {d '2012-07-01'}

Accesslibrary(RODBC)Ac <- odbcConnectAccess("fileName.mdb") ## For newer versions of Access, 2007 or later... "accdb"## dataObj <- odbcConnectAccess2007("fileName.accdb") ## In this example "Demos_FY" is a table in an Access database## If the table name contains underscore, then the ## table name needs to be in back tick quote`

DemoDat <- sqlQuery(Ac, "select * from `Demos_FY`")

CSV Comma Separated Values (in plain text) Almost a universal format… ALMOST! Microsoft opens CSV files with Excel by default

## To read a CSV fileAnx <- read.csv("ANexx.csv")str(Anx)

## Returns dates as factorAnx$disDate <- as.Date(Anx$disDate, "%m/%d/%Y")## To write CSVwrite.csv(Anx, file = "Ane.csv", row.names = FALSE)

## If you don’t use ## row.names = FALSE

str(Anx)'data.frame': 2515 obs. of 23 variables: $ X : int 1 2 3 4 5 6 7 8 ... $ X.3 : int 1 2 3 4 5 6 7 8 ... $ X.2 : int 1 2 3 4 5 6 7 8 ... $ X.1 : int 1 2 3 4 5 6 7 8 ... $ program_code : chr "88121" "8919F1"...

Excel CSV “Feature”Microsoft wants to be sure your data looks good, so…. Leading zeroes are removed! program_code 00201 becomes 201Not an option, it’s automatic.

Only work around in Excel is use GUI and open as “text”Note: This problem can occur in R too, although not as common. The solution in R is to use the colClasses option to define data types with read.csv().

ExcelPackage RODBC has a function odbcConnectExcel()Package gdata read.xls()and read.xlsx() Requires separate installation of pearl executable.

Packages xlsx and XLConnect both use Java run time to parse MS Office “Open XML” format. They can both read and write.

Best package to use to read and write Excel is…

XLConnect Uses Java runtime installed on mostcomputers Multiplatform Provided by Mirai Solutions GmbHCan write Excel files with formatting

Read Excellibrary(XLConnect)prg <- readWorksheetFromFile( "ExcelFile.xls", sheet = 1)

## Other arguments:## startRow = 1, startCol = 5, endRow = 1, endCol = 5

1 2 3 4## sr,sc,er,ec

Excel file as often misused

## Create data frame from multiple Excel files## Read file names from the directoryfilenames <- list.files("./XLfiles", pattern = "*.xls", full.names = TRUE)

lpd <- lapply(filenames, function(x) {cbind( readWorksheetFromFile(x,1,13, 1,75, 6), # main data readWorksheetFromFile(x,1, 1, 5, 1, 5, header = FALSE), # prog code sr 1,sc 5,er 1,ec 5 readWorksheetFromFile(x,1, 3, 5, 3, 5, header = FALSE), # program name readWorksheetFromFile(x,1, 5,13, 5,13, header = FALSE), # Date Submitted readWorksheetFromFile(x,1, 2,13, 2,13, header = FALSE)) # FY } )## Change the list object into one data frameddf <- do.call(rbind,lpd)## Give names to the columnsnames(ddf) <- c("A","B","LFname","Hours","Worked", "ClinHrsWk","prog","progN","DateSub","FY")

Write ExcelBasically the same as reading…

To create a new Excel file from a data frame:writeWorksheetToFile("Name.xls", data = dfName, sheet = "Some Name", startRow = 3, startCol = 4)

Add a sheet to an existing Excel file, put your data frame (dfName) into it. wb <- loadWorkbook("xlFile.xls")createSheet(wb, name = "Sheet Name")writeWorksheet(wb, dfName, sheet = "Sheet Name")saveWorkbook(wb)

Reports and Publishing Export data to Crystal Reports Tableau integration Word

R2DOCX Markdown to Word (knitr) xtable sjPlot

Tableau 8.1 integration involves installing the package Rserve which establishes connection to Tableau. Short R scripts can be executed in one line Tableau functions return either integer, real, string or boolean data type R code is the part between red single quotes below.

SCRIPT_INT('kmeans(data.frame(.arg1,.arg2,.arg3,.arg4),3)$cluster;',SUM([Petal length]), SUM([Petal width]),SUM([Sepal length]),SUM([Sepal width]))

R2DOCXR2DOCX is a package on GitHub that allows you to write Word documents directly in R by using functions!

library(R2DOCX)## Create a document new objectdoc <- new("Docx",

title = "My example", basefile = template.file)

doc <- addParagraph(doc, stylename = "Normal", value = paste("This is a rather short paragraph. ", RObj," Not a lot of room on a PowerPoint screen."))

Markdown to Word Use knitr and knit2html() Use Pandoc, or… A simple trick allows you to open HTML documents in Word You can compose reproducible reports in R as markdown documents with plots and tables Plots can be created as any device including Windows Metafile Tables can be formatted using packages xtable or sjPlot (but not both)

Markdown: A simplified Heading=======

Sub-heading----------- Paragraphs are separated by a blank line. Text attributes *italic*, **bold**, `monospace`. A [link](http://example.com). Shopping list: * apples * oranges

Numbered list:

1. apples 2. oranges

The rain---not the reign---in Spain.

# Heading

## Sub

_italic___bold__

<h1>Heading</h1> <h2>Sub-heading</h2>

<p>Paragraphs are separated by a blank line.</p>

<p>Text attributes <em>italic</em>, <strong>bold</strong>, <code>monospace</code>.</p> <p>A <a href="http://example.com">link</a>.</p>

<p>Shopping list:</p> <ul> <li>apples</li> <li>oranges</li> <li>pears</li> </ul>

<p>Numbered list:</p> <ol> <li>apples</li> <li>oranges</li> <li>pears</li> </ol>

<p>The rain&mdash;not the reign&mdash;in Spain.</p>

R in Markdown## Code chunks

A _paragraph_ here. A code chunk below

```{r}1+1 ## defaults: echo=TRUE, include=TRUE .4-.7+.3 ## Use trunc() or round()```## Inline code

The value of pi is **`r pi`**, and 3 times pi is **`r sprintf("%.2f",3*pi)`**.

HTML to Word: Pandoc

HTML to Wordlibrary(knitr)knit2html("RMD_Web.rmd")Creates RMD_Web.html and RMD_Web.md

How do you turn it into a Word file?

file.copy("fileName.html", "fileName.doc")(not fileName.docx)

Change Windows to show file extensions Organize > Folder and search options

HTML to Wordknit2html("RMD_Word.rmd", options = c('mathjax','smartypants', 'use_xhtml','highlight_code'))

Rename RMD_Word.html as RMD_Word.doc

Double click, it opens in Word as a web page, not text

Change the view from web page… View > Print Layout

Save as “docx” it is now a Word document!

Plots and Images By default knitr renders plots as PNG and stores in sub-folder “figure”

Default is to embed plots in the HTML as base 64 encryption. Word does not read that. If you render plots directly to output, then you need to turn off the base_64images option.

More control by rendering plots using R device (PNG, JPEG, PDF…) then reading in the file If you render image files then read them in, you don’t need to change options to turn off base_64images. To use in Word, render plots using win.metafile() as device, then read in the file. ![plotName](plotName.wmf)

Generate image file in R chunk, read it into document in markdown

```{r, echo=FALSE, include=FALSE}win.metafile("wmDense.wmf", width = 6, height = 4)

densityplot(~mpg|as.factor(cyl), main = "Density Plot by Cylinders", xlab = "Miles per Gallon ", data = mtcars)

dev.off()```![wmhist](wmDense.wmf)

xtable Nice plots are not enough. You need to be able to print well formatted results and tables. The crude text of R output won’t do in a publication. knit2html() produces HTML Tags can be used in your text! (i.e. Greek: &beta; prints out β) xtable prints tables and data frames as HTML directly in the text

**Regression on Income**```{r, echo = FALSE, results = 'asis'}reg <- lm(income ~ age + IQ, data = donner)

rxl <- summary(reg)print(xtable(rxl), type = "HTML")

```Residual standard error `r sprintf("%.1f", rxl$sigma)`

on `r rxl$df[2]` degrees of freedom.

xtable

Another Example of In Line Statistics

**Chi Square** Statistical test from a cross tabulation.

**&chi;^2** = `r xt$chisq$statistic`,df = `r xt$chisq$parameter`,*p* = `r xt$chisq$p.value`

sjPlot sj is for the blog “Strenge Jacke!” German title: “Strict Jacket, the majority do not want to hear the Daniel.” (Daniel Lüdecke) A Package for plots and to generate well formatted output for linear models, frequency tables, and cross tabs similar to SPSS output. Originally designed to produce stand alone web pages or output to R Studio viewer window. Should work with knitr…

sjPlot Use sjPlot functions in markdown in-line R statements Internal objects of function page.style and page.content provide the HTML Double in-line calls to sjPlot functions Caused conflicts with CSS of the web page Automatically opened web browser Not well suited for knitr use…

sjPlot So… Comments on Daniel’s blog page New version of sjPlot to be more knitr friendly Still using in line R in markdown but… Embedded CSS Option for no output to browser knitr object added, only need one function

sjPlot table functions: sjt.corr, sjt.df, sjt.frq, sjt.glm, sjt.xtab, sjt.pca, sjt.stackfrq

```{r echo = FALSE}library(sjPlot) ## One line```

`r sjt.frq(efc$c160age, no.output = TRUE, variableLabels = list("Carer's Age"), autoGroupAt = 10)$knitr`

Do it…