Dr. Aijun Zhang · STAT3622 Data Visualization · 12 September 2016
Exploratory Data Analysis Simple Base Graphics Using Lattice Package
Exploratory Data Analysis
Dr. Aijun Zhang
STAT3622 Data Visualization
12 September 2016
StatSoft.org 1
Outline
1 Exploratory Data Analysis
2 Simple Base Graphics
3 Using Lattice Package
John Tukey
John Tukey (1915 – 2000)
Proposed “Exploratory Data Analysis”
Coined terms: Boxplot, Stem-and-Leaf plot, ANOVA (Analysis of Variance)
Coined terms “Bit” and “Software”
Co-developed the Fast Fourier Transform algorithm, Projection Pursuit, Jackknife estimation
Famous quote: “The best thing about being a statistician is that you get to play in everyone’s backyard.”
“The greatest value of a picture is when it forces us to notice what we never expected to see.”
John Tukey (1977)
Tables
Five-number summary
Scatter plot
Box-plot
Residual plot
Smoother
Stem-and-Leaf plot
Bag plot
Median Polish
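Several of the tools listed above have direct counterparts in base R. A minimal sketch, assuming the built-in iris data as an illustrative input (the data choice is not taken from this slide):

```r
# Illustrative data choice: sepal lengths from the built-in iris dataset
x <- iris$Sepal.Length

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# Stem-and-leaf plot, printed to the console
stem(x)

# Box plot built on the same hinges and median
boxplot(x, horizontal = TRUE, main = "Sepal length")
```

The hinges returned by fivenum() are Tukey's version of the quartiles and may differ slightly from quantile() output on small samples.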
Example 1: Anscombe Dataset
         x1     y1     x2     y2     x3     y3     x4     y4
1     10.00   8.04  10.00   9.14  10.00   7.46   8.00   6.58
2      8.00   6.95   8.00   8.14   8.00   6.77   8.00   5.76
3     13.00   7.58  13.00   8.74  13.00  12.74   8.00   7.71
4      9.00   8.81   9.00   8.77   9.00   7.11   8.00   8.84
5     11.00   8.33  11.00   9.26  11.00   7.81   8.00   8.47
6     14.00   9.96  14.00   8.10  14.00   8.84   8.00   7.04
7      6.00   7.24   6.00   6.13   6.00   6.08   8.00   5.25
8      4.00   4.26   4.00   3.10   4.00   5.39  19.00  12.50
9     12.00  10.84  12.00   9.13  12.00   8.15   8.00   5.56
10     7.00   4.82   7.00   7.26   7.00   6.42   8.00   7.91
11     5.00   5.68   5.00   4.74   5.00   5.73   8.00   6.89
Mean   9.00   7.50   9.00   7.50   9.00   7.50   9.00   7.50
Sd     3.32   2.03   3.32   2.03   3.32   2.03   3.32   2.03
Cor           0.82          0.82          0.82          0.82
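The summary rows of the table can be reproduced from the anscombe data frame that ships with R. The sketch below is one possible way to do it, not necessarily the code behind the slide; the names stats and set1..set4 are illustrative. It also plots the four pairs, since identical summaries can hide very different shapes.

```r
# anscombe ships with R (datasets package): columns x1..x4 and y1..y4
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    sd_x = sd(x), sd_y = sd(y), cor = cor(x, y))
})
colnames(stats) <- paste0("set", 1:4)
print(round(stats, 2))

# Plot the four pairs on a 2 x 2 grid, each with its least-squares line
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Set", i), xlim = c(3, 20), ylim = c(2, 14))
  abline(lm(y ~ x), col = "red")
}
par(op)
```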
Exploratory Data Analysis
EDA is a statistical approach to making sense of data using a variety of techniques (mostly graphical). It may help to:
Assess assumptions about the distributions of variables
Identify relationships between variables
Extract important variables
Suggest the use of appropriate models
Detect problems with the collected data (e.g. outliers, missing data, measurement errors)
Base Statistical Graphics
Univariate
  Histogram, Stem-and-Leaf, Dot, Q-Q, Density plots
  Boxplot, Box-and-whisker
  Bar, Pie, Polar, Waterfall charts
Bivariate
  XYplot, Line, Area, Scatter, Bubble charts
Trivariate
  3D Scatter, Contour, Level/Heatmap, Surface plots
Which Chart to Use?
Indeed, experience matters!
2 Simple Base Graphics
Iris Dataset
Let’s play with the Iris data in RStudio. Refer to the R Markdown source code and HTML output files (reproducible).
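A first look at the data might go as follows. The name DataX appears on later slides; assigning the built-in iris data frame to it is an assumption made here for illustration.

```r
# Assumption: DataX (the name used on later slides) holds the built-in iris data
DataX <- iris

str(DataX)       # structure: 150 observations of 5 variables
head(DataX)      # first six rows
summary(DataX)   # per-column summaries, with counts for the Species factor
```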
Histogram
Options: title(., main, xlab, ylab), hist(., breaks, freq, col)
Figure layout by par(mfrow/mfcol = c(nr, nc))
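A minimal sketch of these options in use, assuming DataX holds the iris data; the break counts and colors are arbitrary choices for illustration.

```r
DataX <- iris  # assumption: DataX holds the built-in iris data

op <- par(mfrow = c(1, 2))   # one row, two columns, panels filled row by row

# Suppress the default labels, then add them with title()
h <- hist(DataX$Sepal.Length, breaks = 10, freq = TRUE, col = "lightblue",
          main = "", xlab = "", ylab = "")
title(main = "Sepal length", xlab = "cm", ylab = "Frequency")

# Labels can also be passed to hist() directly
hist(DataX$Petal.Length, breaks = 20, col = "salmon",
     main = "Petal length", xlab = "cm")

par(op)                      # restore the previous layout
```

Note that breaks given as a single number is only a suggestion; R picks "pretty" break points near that count.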
Histogram with Density Plot
Options: hist(., freq = FALSE); lines(density(.), lty, lwd)
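With freq = FALSE the bars are scaled to total area 1, so a kernel density estimate can be overlaid on the same axis. A sketch, assuming the iris sepal widths as input:

```r
x <- iris$Sepal.Width   # illustrative variable choice

# Density-scaled histogram: bar areas sum to 1
h <- hist(x, freq = FALSE, col = "grey90", main = "Sepal width", xlab = "cm")

# Kernel density estimate drawn on top as a dashed, thicker line
d <- density(x)
lines(d, lty = 2, lwd = 2)
```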
Boxplot
Remarks: outliers are points $x \notin [Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR}]$
Options: plotting x (vector), X (matrix) and x ~ c (formula with grouping variable c)
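The outlier rule and the three input forms can be sketched as follows, assuming the iris data. The fences here use quantile()-based quartiles; boxplot() itself uses Tukey's hinges, which can differ slightly on small samples.

```r
DataX <- iris  # assumption: DataX holds the built-in iris data
x <- DataX$Sepal.Width

# Fences from the 1.5 * IQR rule
q <- unname(quantile(x, c(0.25, 0.75)))
fences <- c(q[1] - 1.5 * IQR(x), q[2] + 1.5 * IQR(x))
outliers <- x[x < fences[1] | x > fences[2]]
print(outliers)

op <- par(mfrow = c(1, 3))
boxplot(x, main = "vector")                                      # single box
boxplot(as.matrix(DataX[, 1:4]), main = "matrix")                # one box per column
boxplot(Sepal.Width ~ Species, data = DataX, main = "formula")   # one box per group
par(op)
```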
Plotting Categorical Variables
Tricks: data selection/subsetting; see UCLA R-site
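One way to combine subsetting with a categorical plot is sketched below. The 3 cm cutoff and the variable choice are illustrative assumptions, not taken from the slide.

```r
DataX <- iris  # assumption: DataX holds the built-in iris data

# Subsetting: keep flowers with sepal width above an illustrative 3 cm cutoff
wide <- subset(DataX, Sepal.Width > 3)
# Equivalent with logical indexing: DataX[DataX$Sepal.Width > 3, ]

# Tabulate the categorical variable, then plot the counts
counts <- table(wide$Species)
print(counts)

op <- par(mfrow = c(1, 2))
barplot(counts, main = "Bar chart", ylab = "Count", col = 2:4)
pie(counts, main = "Pie chart", col = 2:4)
par(op)
```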
Relationship Between Variables
Tricks: mathematical annotations in plots; see plotmath.html
Tricks: color indexing by unclass(DataX$Species); adding legend/text at different locations
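The tricks on this slide can be sketched in one scatter plot, assuming DataX holds the iris data. The fitted line and the text position are illustrative choices.

```r
DataX <- iris  # assumption: DataX holds the built-in iris data

# unclass() on a factor exposes its integer codes (1, 2, 3), usable as colors
idx <- unclass(DataX$Species)

plot(DataX$Petal.Length, DataX$Petal.Width, col = idx, pch = 19,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")

# Least-squares line for reference
fit <- lm(Petal.Width ~ Petal.Length, data = DataX)
abline(fit, lty = 2)

# Legend placed by keyword; coordinates (x, y) work as well
legend("topleft", legend = levels(DataX$Species), col = 1:3, pch = 19)

# plotmath: expression() renders mathematical notation inside the plot
text(5.5, 0.5, expression(hat(y) == hat(beta)[0] + hat(beta)[1] * x))
```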
Pairwise Scatter Plots
Tricks: color indexing by unclass(DataX$Species); adding legend/text at different locations
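A minimal pairs() sketch with the same color indexing, assuming DataX holds the iris data. The legend is left out here because placing one on a pairs() plot needs extra margin handling; the correlation matrix is added as a numerical companion.

```r
DataX <- iris  # assumption: DataX holds the built-in iris data
idx <- unclass(DataX$Species)   # integer codes used as colors

# Scatter plot matrix of the four numeric columns, colored by species
pairs(DataX[, 1:4], col = idx, pch = 19,
      main = "Iris: pairwise scatter plots")

# Pairwise correlations of the same columns
cors <- cor(DataX[, 1:4])
print(round(cors, 2))
```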
3 Using Lattice Package
Lattice Package
Sarkar (2008; Springer)
Using trellis graphs for multivariate data
Multipanel conditioning and grouping
Elegant high-level data visualization
Covers most statistical charts
Figures and code can be found at http://lmdvr.r-forge.r-project.org/
Plot customization is not straightforward
Univariate Distributions with Conditioning and Grouping
Refer to the R Markdown files for source code and outputs (reproducible)
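Conditioning and grouping can be sketched with two lattice calls. This is an illustration using the iris data, not the course's R Markdown code.

```r
library(lattice)   # recommended package included with standard R installations

# Conditioning: "| Species" gives one panel per species
p1 <- histogram(~ Sepal.Length | Species, data = iris, layout = c(3, 1))

# Grouping: overlaid curves in a single panel, distinguished by color
p2 <- densityplot(~ Sepal.Length, data = iris, groups = Species,
                  auto.key = list(columns = 3), plot.points = FALSE)

# Lattice functions return "trellis" objects; print() draws them
print(p1)
print(p2)
```

Explicit print() calls matter inside functions, loops, and sourced scripts, where auto-printing does not happen.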
Exploring Bivariate Relationships
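A bivariate relationship conditioned on a third variable might be drawn as below. The variables and layout are illustrative assumptions.

```r
library(lattice)

# One panel per species; type "p" draws points, "r" adds a regression line
p <- xyplot(Petal.Width ~ Petal.Length | Species, data = iris,
            type = c("p", "r"), layout = c(3, 1),
            xlab = "Petal length (cm)", ylab = "Petal width (cm)")
print(p)
```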
Trivariate Heatmap and 3D Plots
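The three chart types can be sketched with lattice as follows. The volcano matrix (a built-in grid of elevations) and the iris columns are illustrative inputs, not necessarily those used in the lecture.

```r
library(lattice)

# Heatmap of a numeric matrix: cell color encodes the value
p_heat <- levelplot(volcano, xlab = "row", ylab = "column", main = "Heatmap")

# Surface plot of the same matrix
p_surf <- wireframe(volcano, shade = TRUE, zlab = "height", main = "Surface")

# 3D scatter plot of three variables, colored by group
p_3d <- cloud(Sepal.Length ~ Petal.Length * Petal.Width, data = iris,
              groups = Species, auto.key = list(columns = 3))

print(p_heat)
print(p_surf)
print(p_3d)
```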