nytimes/2009/01/07/technology/business-computing/07program.html?pagewanted=all
http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?pagewanted=all
Source Code- Tons of Lines of
Code Simplified
Package: Code, Documentation, Datasets
Workspace: Fewer Lines of Code, Efficiency, Capability
The next data visual was produced with about 150 lines of R code
Workflow
Statistics & Analysis
Data Analysis Goals
Data Input
Visualization & Reporting
Data Management
Enter Manually
Combine Variables, Add Variable, Select a Subset
Input a Comma-Separated Values (CSV) File
R Installation Already Includes Several Libraries
Integrated Development Environment (IDE)
Write Code / Program: Input Data, Analyze, Graphics
Datasets, etc.
Enter Commands, View Results
The R Graphics Package
Graphing Parameters
Titles, X-Axis Title, Y-Axis Title, Legend, Scales, Color, Gridlines
library(help="graphics")
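As a minimal sketch of the graphing parameters listed above, the following uses only base R graphics. The data (x, y1, y2) and the name plot_file are made up for illustration and are not from the slides. The plot is drawn to a temporary PDF so the example also runs without a display.

```r
# Hypothetical data for illustration only (not from the slides).
x  <- 1:10
y1 <- x^2
y2 <- 10 * x

# Draw into a temporary PDF file so no screen device is needed.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

plot(x, y1,
     type = "b",                    # points joined by lines
     col  = "blue",                 # color
     main = "Two Example Series",   # title
     xlab = "x",                    # x-axis title
     ylab = "value",                # y-axis title
     ylim = c(0, 100))              # y-axis scale
lines(x, y2, type = "b", col = "red")   # add the second series
grid()                                  # gridlines
legend("topleft", legend = c("x^2", "10 * x"),
       col = c("blue", "red"), lty = 1, pch = 1)

invisible(dev.off())                # close the device and finish the file
```

Dropping the pdf() and dev.off() calls draws the same plot in the interactive graphics window instead.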
Basic Chart Types
Currently, how many R Packages?
At the command line enter: dim(available.packages()) available.packages()
Correlation Matrix: library(car); scatterplotMatrix(h)
In ggplot2 a plot is made up of layers.
ggplot2
Plot
Grammar of Graphics
Layer
- Data
- Mapping
- Geom
- Stat
- Position
Scale
Coord
Facet
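The grammar components above can be sketched in one plot specification. This assumes the ggplot2 package is installed and uses R's built-in mtcars dataset, which is not part of the slides. The choice of variables is arbitrary, and each component is spelled out even where ggplot2 would supply the same default.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                           # data
            mapping = aes(x = wt, y = mpg)) +        # mapping
  geom_point(stat = "identity",                      # geom and stat
             position = "identity") +                # position
  scale_x_continuous(name = "Weight (1000 lbs)") +   # scale
  coord_cartesian() +                                # coord
  facet_wrap(~ cyl)                                  # facet: one panel per cylinder count

# print(p) draws the plot on the active graphics device.
```

Adding another geom with + would add a second layer that shares the same data and mapping.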
Character Vector: b <- c("one", "two", "three")
Numeric Vector: a <- c(1, 2, 5.3, 6, -2, 4)
Matrix: y <- matrix(1:20, nrow=5, ncol=4)
Dataframe:
d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(d, e, f)
names(mydata) <- c("ID", "Color", "Passed")
List: w <- list(name="Fred", age=5.3)
Data Structures
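A short sketch of how these structures can be inspected once created. The object names are the ones used on the slide; the inspection calls are an illustrative selection, not part of the original deck.

```r
a <- c(1, 2, 5.3, 6, -2, 4)                 # numeric vector
b <- c("one", "two", "three")               # character vector
y <- matrix(1:20, nrow = 5, ncol = 4)       # matrix, filled column by column

d <- c(1, 2, 3, 4)
e <- c("red", "white", "red", NA)
f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(d, e, f)               # data frame: columns may differ in type
names(mydata) <- c("ID", "Color", "Passed")

w <- list(name = "Fred", age = 5.3)         # list: elements may differ in type

class(a)        # "numeric"
class(b)        # "character"
dim(y)          # 5 4
y[2, 3]         # 12, because column 3 holds 11 to 15
str(mydata)     # compact summary of every column
mydata$Passed   # one column, accessed by name
w$name          # "Fred"
```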
Framework Source: Hadley Wickham
Actor Heights
1) Create Vectors of Actor Names, Heights, Date of Birth, Gender
2) Combine the 4 Vectors into a DataFrame
• Numeric: e.g. heights
• String: e.g. names
• Dates: e.g. "12-03-2013"
• Factor: e.g. gender
• Boolean: TRUE, FALSE
Variable Types
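One way to check which type R has assigned is class(). The sketch below uses made-up single values. It assumes the example date "12-03-2013" is written month-day-year, which the slide does not state. Note that R reports Boolean values as "logical".

```r
height <- 70                                  # numeric
name   <- "Meryl"                             # string (character)
# Assumption: "12-03-2013" is month-day-year; the slide does not say.
when   <- as.Date("12-03-2013", format = "%m-%d-%Y")
gender <- factor("female")                    # factor
passed <- TRUE                                # Boolean (logical)

class(height)   # "numeric"
class(name)     # "character"
class(when)     # "Date"
class(gender)   # "factor"
class(passed)   # "logical"
```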
• We use the c() function and put each value in quotation marks so that R treats it as string data.
• Create a variable called ActorNames as follows:
ActorNames <- c("John", "Meryl", "Jennifer", "Andre")
Creating a Character / String Vector
Class, Length, Index
class(ActorNames)
length(ActorNames)
ActorNames[2]
• Create a variable called ActorHeights (inches):
ActorHeights <- c(77, 66, 70, 90)
Creating a Numeric Vector / Variable
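Because ActorHeights is numeric, arithmetic and comparisons apply to every element at once. A brief sketch using the slide's values; the conversion to centimetres and the 70-inch cutoff are illustrative additions.

```r
ActorHeights <- c(77, 66, 70, 90)   # heights in inches, from the slide

mean(ActorHeights)                  # 75.75
ActorHeights * 2.54                 # every height converted to centimetres
ActorHeights[ActorHeights > 70]     # 77 90: only the heights above 70 inches
```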
• Use the as.Date() function:
ActorDoB <- as.Date(c("1930-10-27", "1949-06-22", "1990-08-15", "1946-05-19"))
• Each date is entered as a text string (in quotation marks) in the format yyyy-mm-dd.
• Wrapping the strings in as.Date() converts them to Date objects.
Creating a Date Variable
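Once the strings are Date objects, they can be sorted, subtracted, and reformatted, which plain strings do not support. A small sketch using the slide's dates; the operations shown are illustrative choices.

```r
ActorDoB <- as.Date(c("1930-10-27", "1949-06-22", "1990-08-15", "1946-05-19"))

class(ActorDoB)             # "Date"
format(ActorDoB, "%Y")      # birth years as strings: "1930" "1949" "1990" "1946"
order(ActorDoB)             # 1 4 2 3: positions from earliest to latest
ActorDoB[2] - ActorDoB[1]   # time difference between two dates, in days
```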
• Use the factor() function:
ActorGender <- c("male", "female", "female", "male")
ActorGender <- factor(ActorGender)
Creating a Categorical / Factor Variable
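A factor stores each value as an integer code that points to a level. The sketch below shows what factor() produces for the slide's gender values; the inspection calls are illustrative.

```r
ActorGender <- factor(c("male", "female", "female", "male"))

levels(ActorGender)       # "female" "male": levels are sorted alphabetically by default
as.integer(ActorGender)   # 2 1 1 2: the integer code behind each value
table(ActorGender)        # count per level: 2 female, 2 male
```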
Actor.DF <- data.frame(Name=ActorNames, Height=ActorHeights, BirthDate=ActorDoB, Gender=ActorGender)
Vectors and DataFrames
dim(Actor.DF)
Actor.DF[2]
Actor.DF[2,]
Actor.DF[1,3]
Actor.DF[2,2]
Actor.DF[2:3,]
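The indexing commands above follow the pattern [row, column]. The sketch below rebuilds the data frame from the slide's vectors so that it runs on its own, then annotates what each call returns.

```r
ActorNames   <- c("John", "Meryl", "Jennifer", "Andre")
ActorHeights <- c(77, 66, 70, 90)
ActorDoB     <- as.Date(c("1930-10-27", "1949-06-22", "1990-08-15", "1946-05-19"))
ActorGender  <- factor(c("male", "female", "female", "male"))

Actor.DF <- data.frame(Name = ActorNames, Height = ActorHeights,
                       BirthDate = ActorDoB, Gender = ActorGender)

dim(Actor.DF)      # 4 4: four rows, four columns
Actor.DF[2]        # second column (Height), returned as a one-column data frame
Actor.DF[2, ]      # second row, all columns
Actor.DF[1, 3]     # row 1, column 3: the first birth date
Actor.DF[2, 2]     # row 2, column 2: 66
Actor.DF[2:3, ]    # rows 2 and 3, all columns
Actor.DF$Height    # a column by name, returned as a plain vector
```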
> getwd()
[1] "C:/Users/johnp_000/Documents"
> setwd()
getwd() setwd()
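setwd() needs a directory path as its argument. A minimal sketch that switches to R's temporary directory and back again; using tempdir() as the target and the name old_dir are illustrative choices, not from the slides.

```r
old_dir <- getwd()     # remember the current working directory

setwd(tempdir())       # change to the session's temporary directory
getwd()                # now reports the temporary directory

setwd(old_dir)         # restore the original working directory
```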
• write.table(Actor.DF, "ActorData.txt", sep="\t", row.names = TRUE)
• write.csv(Actor.DF, "ActorData.csv")
Write / Create a File
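A sketch of writing both files and reading them back to confirm the contents. It writes to R's temporary directory rather than the working directory, and it sets row.names = FALSE (unlike the slide) so that no extra column of row numbers appears on re-import. The reduced two-column data frame and the names csv_path, txt_path, from_csv, and from_txt are illustrative.

```r
Actor.DF <- data.frame(Name   = c("John", "Meryl", "Jennifer", "Andre"),
                       Height = c(77, 66, 70, 90))

csv_path <- file.path(tempdir(), "ActorData.csv")
txt_path <- file.path(tempdir(), "ActorData.txt")

write.csv(Actor.DF, csv_path, row.names = FALSE)                 # comma-separated
write.table(Actor.DF, txt_path, sep = "\t", row.names = FALSE)   # tab-separated

from_csv <- read.csv(csv_path)      # read the CSV file back in
from_txt <- read.delim(txt_path)    # read.delim expects tab-separated text

from_csv
from_txt
```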