R for Pirates. ESCCONF October 27, 2011

R for Pirates Mandi Walls @lnxchk EscConf, Boston, MA October 27, 2011


Page 1: R for Pirates. ESCCONF October 27, 2011

R for Pirates

Mandi Walls

@lnxchk

EscConf, Boston, MA

October 27, 2011

Page 2: R for Pirates. ESCCONF October 27, 2011

whoami

• stats misfit

• R tinkerer

• large-farm runner

• not a professional statistician :D

Page 3: R for Pirates. ESCCONF October 27, 2011

What is R

• Scripting language for stats work

• Inspired by the earlier S language (S for statistics), developed at AT&T

• FOSS

• Syntax descends from the Algol family, so it looks somewhat like C/C++

Page 4: R for Pirates. ESCCONF October 27, 2011

What Does R Do?

• Manipulate data

• Complex Modeling and Computation

• Graphics and Visualization

Page 5: R for Pirates. ESCCONF October 27, 2011

Why R?

• WHY NOT!?

Page 6: R for Pirates. ESCCONF October 27, 2011

But Other Math Stuff!

• Mathematica

• MATLAB

• Minitab

• Maple

• Excel (yes. shutup h8rs. ask your CFOs what they use)

• R provides sophisticated statistical and modeling capabilities, and is extensible through your own code

Page 7: R for Pirates. ESCCONF October 27, 2011

Get R

• Available for Linux, Mac, Windows

• http://www.r-project.org/

Page 8: R for Pirates. ESCCONF October 27, 2011

Fire!

• R console on Mac

• Interactive interpreter for your R needs

• Can also run from the command line: R

Page 9: R for Pirates. ESCCONF October 27, 2011

R Basics

• R considers all elements to be vectors

• A single number is a one-element vector

• Use <- for assignment

• Use c() to concatenate values into a vector
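
A minimal sketch of the basics above. The variable names and values are made up for illustration and are not from the talk.

```r
# A single number is a vector of length one
n <- 42
length(n)

# c() concatenates values into a longer vector
loads <- c(0.5, 1.2, 3.4)

# Appending is just concatenating again
loads <- c(loads, 2.0)
length(loads)
```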

Page 10: R for Pirates. ESCCONF October 27, 2011

Let’s see that again

Page 11: R for Pirates. ESCCONF October 27, 2011

Practice Datasets

• data()

• shows the sample sets included with your R
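
As a rough example, one of the bundled sets can be loaded and inspected like this. The choice of `faithful` (Old Faithful eruption data from the standard datasets package) is an assumption for illustration; the talk does not name a particular set.

```r
# Load a dataset that ships with base R
data(faithful)

# Peek at the first few rows and the column names
head(faithful, 3)
names(faithful)

# Quick numeric summary of one column
summary(faithful$eruptions)
```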

Page 12: R for Pirates. ESCCONF October 27, 2011

Functions

• Looks familiar!

• Let’s see one!

• “evencount” counts the number of even ints in a vector
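
The slide's own code is not in this transcript, so the following is only a sketch of what a function like "evencount" could look like, assuming it loops over the vector and tests each value with the modulo operator.

```r
# Count the even integers in a vector x (illustrative reconstruction)
evencount <- function(x) {
  k <- 0
  for (n in x) {
    # %% is the modulo operator; a remainder of 0 means n is even
    if (n %% 2 == 0) k <- k + 1
  }
  k  # the last expression evaluated is the return value
}

evencount(c(1, 2, 4, 7, 10))
```

Because comparisons work on whole vectors, `sum(x %% 2 == 0)` would give the same count without the loop.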

Page 13: R for Pirates. ESCCONF October 27, 2011
Page 14: R for Pirates. ESCCONF October 27, 2011

Datatypes

• Vectors, the important ones

• Scalars are really single-element vectors

• Character strings

• Matrices, rectangular arrays of numbers

• Lists

• Tables, useful for data transitions and temp work

Page 15: R for Pirates. ESCCONF October 27, 2011

Vectors

• R’s most-used data structure

• All elements in a vector must have the same mode or data type

• To add values to a vector, you concatenate into it with the c() function

• Many mathematical functions can be applied to a whole vector; vectors can also be traversed like arrays

• Index starts at 1, not 0!
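
A short sketch of the vector behavior listed above, using made-up values:

```r
v <- c(10, 20, 30)

v[1]       # first element: indexing starts at 1
v * 2      # arithmetic applies to every element
mean(v)    # many functions take the whole vector

# Grow the vector by concatenating onto it
v <- c(v, 40)

# Mixing types coerces every element to a single mode
mixed <- c(1, "two", 3)
mode(mixed)
```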

Page 16: R for Pirates. ESCCONF October 27, 2011

Scalars

• One-element vectors

> x <- 8

> x[1]

[1] 8

• also climb your rigging

©Disney.

Page 17: R for Pirates. ESCCONF October 27, 2011

Character Strings

• Single-element vectors with mode character

> y <- "abc"

> length(y)

[1] 1

> mode(y)

[1] "character"

• Can do normal string things, like

> t <- paste("yo","dawg")

> t

[1] "yo dawg"

> u <- strsplit(t,"")

> u

[[1]]

[1] "y" "o" " " "d" "a" "w" "g"

Page 18: R for Pirates. ESCCONF October 27, 2011

Matrices

• Two-dimensional array

> m <- rbind(c(1,4),c(2,2))

> m

     [,1] [,2]
[1,]    1    4
[2,]    2    2

> m[1,2]

[1] 4

> m[1,]

[1] 1 4

Page 19: R for Pirates. ESCCONF October 27, 2011

Lists

• Contain elements of different types

• Have a particular syntax

> x <- list(u=2, v="abc")

> x

$u
[1] 2

$v
[1] "abc"

> x$u

[1] 2

Page 20: R for Pirates. ESCCONF October 27, 2011

Data Frames

• Matrices are limited to a single type for all elements

• A data frame can contain different types of data, and can be read in from a file or created on the fly

> df <- data.frame(list(kids=c("Olivia","Madison"),ages=c(10,8)))

> df

     kids ages
1  Olivia   10
2 Madison    8

> df$ages

[1] 10 8

Page 21: R for Pirates. ESCCONF October 27, 2011

Putting R to Work

• Read in a log file:

> access <- read.table("access.log", header=FALSE)

> head(access)

V1 V2 V3 V4 V5 V6 V7 V8

1 192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] GET /menu/menu.js HTTP/1.1 401 401

2 192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] GET /menu/menu.js HTTP/1.1 200 1970

3 192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] GET /menu/menu.css HTTP/1.1 200 2258
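
To see why the return code ends up in column 7, here is a small sketch that parses two made-up log lines. It assumes the log quotes the request string ("GET ... HTTP/1.1"), as the common access-log layout does; the printed output above does not show the quotes. `text=` stands in for the file so the example is self-contained.

```r
# Two invented lines in the assumed access-log layout
lines <- c(
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 200 1970',
  '192.168.1.10 - - [23/Oct/2011:07:03:34 -0500] "GET /missing.html HTTP/1.1" 404 512'
)

# read.table splits on whitespace but keeps a quoted string as one field
access <- read.table(text = lines, header = FALSE, stringsAsFactors = FALSE)

ncol(access)        # V1..V8
access[, 6]         # the whole request string
access[, 7]         # the return codes
table(access[, 7])  # count of each return code
```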

Page 22: R for Pirates. ESCCONF October 27, 2011

Fun with Plots

• This plot series is going to make use of the “return codes” from the access log

• We’ll do a series of plots that gradually get more sophisticated

• This is a basic histogram of the data; it’s not much fun

Page 23: R for Pirates. ESCCONF October 27, 2011

Barplot

barplot(table(access[,7]))

Page 24: R for Pirates. ESCCONF October 27, 2011

Barplot v2

barplot(table(access[,7]),ylab="Number of Pages",xlab="Return Code",main="Plot of Return Codes")

Page 25: R for Pirates. ESCCONF October 27, 2011

Barplot v3

barplot(table(access[,7]),ylab="Number of Pages",xlab="Return Code",main="Plot of Return Codes", col=heat.colors(length(table(access[,7]))))

Page 26: R for Pirates. ESCCONF October 27, 2011

Barplot v4

Source: wikipedia, http://en.wikipedia.org/wiki/Bar_%28establishment%29

Page 27: R for Pirates. ESCCONF October 27, 2011

Writing Graphical Output to Files

• Set up the output target by calling a graphics function:

• pdf(), png(), jpeg(), etc

• jpeg("/var/www/images/returncodes-date.jpg")

• Call the plot function you have chosen, then call dev.off()

• Can be used in batch mode to create graphics from your data
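
The open, plot, close sequence above might look like the sketch below. It uses pdf() and a temporary file so it runs anywhere; the output path and the return codes are made up for illustration.

```r
# Hypothetical output location; swap in a real path for batch use
out <- file.path(tempdir(), "returncodes.pdf")

# Invented return codes standing in for access[,7]
codes <- c(200, 200, 404, 200, 500)

pdf(out)                       # open the file device
barplot(table(codes), xlab = "Return Code", ylab = "Count")
invisible(dev.off())           # close the device, which writes the file

file.exists(out)
```

Nothing reaches the file until dev.off() is called, so a script that skips it tends to leave an empty or truncated image behind.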

Page 28: R for Pirates. ESCCONF October 27, 2011

Shopping is Hard, Let’s Do Math

• Read in some load averages (one-min)

loadavg<-read.table("load_avg.txt")

head(loadavg)
    V1
1 3.79
2 3.11
3 2.94
4 4.81

Page 29: R for Pirates. ESCCONF October 27, 2011

Summary Stats

• Summarize the data with one function call

• Gives the min, max, mean, median, and quartiles

summary(loadavg)
       V1
 Min.   :0.760
 1st Qu.:1.390
 Median :1.970
 Mean   :2.302
 3rd Qu.:3.080
 Max.   :5.070
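
For a feel of what summary() reports, here is a sketch on a tiny made-up vector (not the talk's data), next to the individual functions that give the same numbers.

```r
# Invented sample values
load <- c(1, 2, 3, 4, 10)

s <- summary(load)
s

# The same figures, one function at a time
min(load); max(load)
mean(load); median(load)
quantile(load)   # 0%, 25%, 50%, 75%, 100%
```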

Page 30: R for Pirates. ESCCONF October 27, 2011

Summary Stats as Boxplot
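
The plotting call for this slide is not in the transcript. As a sketch under that caveat, boxplot() can also just compute what it would draw; the data here is made up.

```r
# Invented one-minute load averages
load <- c(1, 2, 3, 4, 10)

# plot = FALSE returns the numbers without drawing anything
b <- boxplot(load, plot = FALSE)

b$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out    # points beyond the whiskers, drawn as outliers

# To draw it instead: boxplot(load, ylab = "One-Minute Load Average")
```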

Page 31: R for Pirates. ESCCONF October 27, 2011

Same Thing, 3 Datacenters

> cpu<-read.table("cpu")

> head(cpu)

V1 V2

1 3.78 smq

2 2.57 smq

3 3.69 smq

4 0.86 smq

• Looks like there are outliers. That could spell trouble! You found them with R awesomeness. Hooray!

boxplot(cpu[,1] ~ cpu[,2], xlab="Load Average at Time t, by Datacenter", ylab="One-Minute Load Average", main="Box Plot of One-Minute Load Average, FEs", col=topo.colors(3))
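
If you want the outliers as data rather than dots on a plot, something like this sketch could pull them out per datacenter. The values and the second datacenter name ("abc") are invented; only "smq" appears in the talk.

```r
# Made-up load averages for two datacenters
cpu <- data.frame(
  V1 = c(1.0, 1.2, 1.1, 1.3, 9.0, 2.0, 2.1, 2.2, 2.3, 2.4),
  V2 = factor(rep(c("smq", "abc"), each = 5))
)

# Same formula as the plot, but only compute the statistics
b <- boxplot(V1 ~ V2, data = cpu, plot = FALSE)

# b$group indexes into b$names, one entry per outlier in b$out
outliers <- data.frame(datacenter = b$names[b$group], load = b$out)
outliers
```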

Page 32: R for Pirates. ESCCONF October 27, 2011

Running R in Your Workflow

• The little bit of boxplotting we did earlier, in a script:

[mandi@mandi ~]$ cat sample.R
#!/usr/bin/env Rscript
cpu<-read.table("cpu")
jpeg("./sample.jpg")
boxplot(cpu[,1] ~ cpu[,2], xlab="Load Average at Time t, by Datacenter", ylab="One-Minute Load Average", main="Box Plot of One-Minute Load Average, FEs", col=heat.colors(3))
dev.off()
[mandi@mandi ~]$ Rscript sample.R > /dev/null
[mandi@mandi ~]$ ls -l sample.jpg
-rw-rw-r-- 1 mandi staff 20137 Oct 24 20:44 sample.jpg

Page 33: R for Pirates. ESCCONF October 27, 2011

Hey!

• I made a graph with a script!

Page 34: R for Pirates. ESCCONF October 27, 2011

What Else?

• R can read data input from a variety of files with regular formats

• R can also fetch data from the internet using the url() function

• R has a number of functions available for dealing with reading data, creating data frames or other structures, and converting string text into numerical data modes

• Extended packages provide support for structured data formats like JSON.
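
A small sketch of the reading and converting mentioned above. The host names, values, and URL are placeholders; `text=` stands in for a real file so the example is self-contained, and the url() line is left commented out because it needs a network.

```r
# Convert string text into numeric mode
raw <- c("3.79", "3.11", "2.94")
nums <- as.numeric(raw)
mode(nums)

# Read regularly formatted (CSV) input into a data frame
df <- read.csv(text = "host,load\nweb1,3.79\nweb2,3.11")
df$load

# url() opens a connection that the read functions accept (placeholder address):
# remote <- read.csv(url("http://example.com/load.csv"))
```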