Math 141 Lecture 13: Graphics!
Albyn Jones¹
¹ Library [email protected]
www.people.reed.edu/~jones/courses/141
Albyn Jones Math 141
Time for a Graphical Interlude!
Probability theory is mathematics: we start with axioms and deduce consequences. There are theorems.

Statistics is about inference; it is not an axiomatizable discipline, and a statistical argument from data to a conclusion is not a proof! While there are theorems, assumptions like independence are not guaranteed to be true statements about the world.

Graphical methods are important tools for validation of model assumptions or conditions.
Exploring the shapes of univariate distributions
The primary graphical tools for direct assessment of the features of a sample for a single variable include:
histograms
stem plots
density plots
quantile-quantile plots
Populations
Most univariate plots are methods of exploring features of a population, typically the distribution of some measurement taken from samples of a population.

We are interested in features like the location, spread, and shape of the distribution of our measurements. In addition, we may see other features, such as the existence of outliers: isolated observations far from the bulk of the data.
Density Curves
There are many ways to display a distribution; we will focus on density curves. A density curve is a nonnegative function f(x) such that the total area under the curve is 1, and the probability that an observation lies in the interval (a, b) is the area under the curve between a and b. For example, the Normal density curve with mean µ and standard deviation σ is
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right)$$
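As a quick numerical check of the definition, here is a minimal R sketch. The helper `normal_density` is a hypothetical name of our own, not part of the lecture; it codes the formula directly, compares it with R's built-in `dnorm()`, and uses `integrate()` to confirm that the total area is 1 and that the probability of landing in an interval (a, b) is the area between a and b. The values µ = 10, σ = 2, a = 8, b = 12 are arbitrary.

```r
# Hand-coded Normal density curve; normal_density is a hypothetical helper name
normal_density <- function(x, mu = 0, sigma = 1) {
  1 / sqrt(2 * pi * sigma^2) * exp(-0.5 * ((x - mu) / sigma)^2)
}

# Agrees with R's built-in dnorm() (mu = 10, sigma = 2 chosen arbitrarily)
pts <- c(4, 8, 10, 12.5, 17)
all.equal(normal_density(pts, mu = 10, sigma = 2), dnorm(pts, mean = 10, sd = 2))

# Total area under the curve; mu +/- 10 sigma holds essentially all of it
total_area <- integrate(normal_density, lower = -10, upper = 30, mu = 10, sigma = 2)$value
total_area

# P(a < X < b) is the area under the curve between a and b
a <- 8
b <- 12
prob_ab <- integrate(normal_density, lower = a, upper = b, mu = 10, sigma = 2)$value
c(prob_ab, pnorm(b, mean = 10, sd = 2) - pnorm(a, mean = 10, sd = 2))
```

The last line prints the same probability twice: once as an area, once from the Normal cumulative distribution function `pnorm()`.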
Example: Normal Density Curve
In addition to areas, density curves reveal location, spread, andshape.
[Figure: "Area under Normal curve between −1 and 1". Normal density curve, horizontal axis Z from −3 to 3, vertical axis density; the shaded region between −1 and 1 is labeled Area = .683.]
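The shaded area can be reproduced in R two ways: from the Normal cumulative distribution function `pnorm()`, and by numerically integrating `dnorm()`. The plotting calls are a sketch of one way to redraw the figure, not necessarily how the original was made; the fill color and label position are our own choices.

```r
# Area under the standard Normal curve between -1 and 1, two ways
area_cdf <- pnorm(1) - pnorm(-1)                           # from the cumulative distribution function
area_int <- integrate(dnorm, lower = -1, upper = 1)$value  # by numerical integration
round(c(area_cdf, area_int), 3)
# [1] 0.683 0.683

# Redraw the curve and shade the region between -1 and 1
curve(dnorm(x), from = -3, to = 3, xlab = "Z", ylab = "density")
z <- seq(-1, 1, length.out = 200)
polygon(c(-1, z, 1), c(0, dnorm(z), 0), col = "lightblue")
text(0, 0.15, paste("Area =", round(area_cdf, 3)))
```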
Histograms
Histograms come in two flavors: the vertical axis may be frequency (counts) or probability (relative frequency, density). Histograms are very crude density estimates!
[Figure: two histograms of CogB (horizontal axis CogB). Left: "frequency histogram", vertical axis Frequency. Right: "probability histogram", vertical axis Density.]
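The CogB data are not reproduced in these notes, so this R sketch uses a simulated stand-in: 28 values drawn from two Normal groups (the seed, group sizes, means, and standard deviations are assumptions made purely for illustration). It draws both flavors of histogram from the same breaks with `hist()`, whose `freq` argument switches between counts and density, and checks that the density bars have total area 1.

```r
set.seed(141)
# Simulated stand-in for CogB (hypothetical data, not the lecture's): 28 values
x <- c(rnorm(18, mean = 0, sd = 3), rnorm(10, mean = 15, sd = 3))

par(mfrow = c(1, 2))
h_freq <- hist(x, main = "frequency histogram")                 # bar height = count
h_dens <- hist(x, freq = FALSE, main = "probability histogram") # bar height = density
par(mfrow = c(1, 1))

sum(h_freq$counts)                           # counts add up to n = 28
sum(h_dens$density * diff(h_dens$breaks))    # bar areas add up to 1
```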
Warning!!
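Frequency count histograms with irregular interval sizes can be misleading! A wider interval collects more observations simply because it is wider, so its bar looks taller. A probability histogram divides each count by n times the interval width, so bar areas, not heights, represent proportions.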
[Figure: two histograms of CogB with irregular interval sizes (horizontal axis CogB). Left: "frequency histogram", vertical axis Frequency. Right: "probability histogram", vertical axis Density.]
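A minimal R sketch of the density correction, on a simulated stand-in for CogB (hypothetical data; the irregular break points are our own choice). With `plot = FALSE`, `hist()` just returns the counts and densities, and the density column is each count divided by n times the interval width.

```r
set.seed(141)
# Simulated stand-in for CogB (hypothetical data, not the lecture's)
x <- c(rnorm(18, mean = 0, sd = 3), rnorm(10, mean = 15, sd = 3))

# Irregular interval boundaries: the outer bins are wider than the inner ones
# (the min/max guards make sure the breaks span all of the data)
brks <- c(min(-15, min(x)), 0, 5, 10, max(25, max(x)))
h <- hist(x, breaks = brks, plot = FALSE)

# Raw counts reward wide bins; density = count / (n * width)
cbind(width = diff(h$breaks), count = h$counts, density = h$density)
```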
Warning!!
The appearance of histograms depends on the interval boundaries!
[Figure: two panels, each titled "Histogram of CogB" (horizontal axis CogB, vertical axis Density), drawn with different interval boundaries.]
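To see the effect on your own data, draw two density-scale histograms with the same interval width but boundaries shifted by half a bin. The R sketch below does this for a simulated stand-in for CogB (hypothetical data; width 4 and shift 2 are arbitrary choices). How much the two pictures differ depends on the sample.

```r
set.seed(141)
# Simulated stand-in for CogB (hypothetical data, not the lecture's)
x <- c(rnorm(18, mean = 0, sd = 3), rnorm(10, mean = 15, sd = 3))

# Same bin width (4), boundaries shifted by 2 units; both sets span the data
b1 <- seq(floor(min(x)) - 4, ceiling(max(x)) + 4, by = 4)
b2 <- b1 + 2

par(mfrow = c(1, 2))
h1 <- hist(x, breaks = b1, freq = FALSE, main = "Histogram of x")
h2 <- hist(x, breaks = b2, freq = FALSE, main = "Histogram of x (shifted)")
par(mfrow = c(1, 1))
```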
Stem Plots
A stem plot is essentially a histogram turned on its side, with regular interval boundaries:
> stem(CogB)

  The decimal point is 1 digit(s) to the right of the |

  -0 | 98
  -0 | 444111100
   0 | 0111222244
   0 | 6
   1 | 23
   1 | 557
   2 | 1
Stem Plots
There is some control over the display scale. If you pick the right scale, you can recover two digits of each case from the stem plot!
stem(CogB, scale=4):

  -9 | 1
  -8 |
  -7 | 6
  -6 |
  -5 |
  -4 | 0
  -3 | 75
  ..etc..

stem(CogB, scale=3):

  -8 | 1
  -6 | 6
  -4 | 0
  -2 | 75
  -0 | 487731
  ..etc..
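The `scale` argument of R's `stem()` controls how stretched the display is. A sketch on simulated stand-in data (hypothetical; rounded to one decimal place, as the CogB values above appear to be):

```r
set.seed(141)
# Simulated stand-in for CogB (hypothetical data), rounded to one decimal place
x <- round(c(rnorm(18, mean = 0, sd = 3), rnorm(10, mean = 15, sd = 3)), 1)

stem(x)              # default display scale
stem(x, scale = 4)   # stretched display: more stems, so more digits show

# stem() only prints (it returns NULL invisibly); capture.output() keeps the text
out <- capture.output(stem(x, scale = 4))
head(out)
```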
Density Plots
For publication graphics, I recommend density plots. A density plot is a direct estimate of the population density curve.
plot(density(CogB), col = "purple")
[Figure: density plot of CogB in purple; horizontal axis label "N = 28 Bandwidth = 1.785", vertical axis Density.]
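A minimal R sketch of the same steps on a simulated stand-in for CogB (hypothetical data, so N and the bandwidth will differ from the CogB plot). `density()` returns an object whose components include the evaluation grid `x`, the estimate `y`, the sample size `n`, and the bandwidth `bw`; with no `bw` argument it uses the `"nrd0"` rule. `rug()` adds one tick per observation. The trapezoid-rule area check is our own addition.

```r
set.seed(141)
# Simulated stand-in for CogB (hypothetical data, not the lecture's)
x <- c(rnorm(18, mean = 0, sd = 3), rnorm(10, mean = 15, sd = 3))

d <- density(x)                 # Gaussian kernel, default bandwidth rule "nrd0"
plot(d, col = "purple")         # x-axis label reports N and the bandwidth
rug(x, lwd = 2, col = "red")    # one tick mark per observation

d$n                             # sample size
d$bw                            # bandwidth chosen by the default rule

# Trapezoid-rule area under the estimate: should be close to 1
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area
```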
Density Plots: details
The density plot is essentially little bumps, one centered at each observation, added up!
[Figure: "CogB density"; horizontal axis label "N = 28 Bandwidth = 1.785", vertical axis Density.]
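To make the "little bumps" concrete, the R sketch below rebuilds a kernel density estimate by hand: one Normal curve with standard deviation equal to the bandwidth, centered at each observation, each scaled by 1/n, then summed. This assumes R's default Gaussian kernel; `density()` itself computes the sum with a fast binned approximation, so the two curves agree closely but not to the last digit. The data are a simulated stand-in for CogB (hypothetical).

```r
set.seed(141)
# Simulated stand-in for CogB (hypothetical data, not the lecture's)
x     <- c(rnorm(18, mean = 0, sd = 3), rnorm(10, mean = 15, sd = 3))
n_obs <- length(x)
bw    <- bw.nrd0(x)              # the default bandwidth density() would pick

d <- density(x, bw = bw)         # R's binned, FFT-based estimate

# Same estimate by hand: average of Normal bumps (sd = bw) centered at each observation
by_hand <- sapply(d$x, function(g) mean(dnorm(g, mean = x, sd = bw)))

plot(d, main = "density estimate = sum of bumps")
for (xi in x) {
  # each individual bump, scaled by 1/n so all bumps together have area 1
  curve(dnorm(t, mean = xi, sd = bw) / n_obs, from = min(d$x), to = max(d$x),
        xname = "t", add = TRUE, col = "grey")
}
lines(d$x, by_hand, col = "red", lty = 2)   # hand-built sum, overlaid

max(abs(by_hand - d$y))          # small: the two estimates agree closely
```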
Density Plots: add the rug!
[Figure: "CogB density" with a rug of tick marks along the horizontal axis; horizontal axis label "N = 28 Bandwidth = 1.785", vertical axis Density.]
rug(CogB, lwd = 2, col = "red")
Density Estimation: Bandwidth
When you plot a density curve, the text at the bottom lists something called the bandwidth. That determines the width of the little bumps (their standard deviation, for the default Gaussian bumps) that density() adds up to get the density estimate. R will make a pretty good guess, probably better than you can, so normally you should just use the default bandwidth selection. But just for fun . . .
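For the curious, R's default rule (`bw.nrd0`, a version of Silverman's rule of thumb) is short enough to write out, and the `bw` argument of `density()` overrides it. The sketch below, on a simulated stand-in for CogB (hypothetical data), checks the hand-written rule against `bw.nrd0()` and draws estimates with bandwidths 30, 0.1, and the default.

```r
set.seed(141)
# Simulated stand-in for CogB (hypothetical data, not the lecture's)
x <- c(rnorm(18, mean = 0, sd = 3), rnorm(10, mean = 15, sd = 3))

# R's default rule (bw.nrd0), written out by hand
bw_default <- 0.9 * min(sd(x), IQR(x) / 1.34) * length(x)^(-1/5)
c(by_hand = bw_default, bw.nrd0 = bw.nrd0(x))

# Too big, too small, and the default, side by side
par(mfrow = c(1, 3))
for (b in list(30, 0.1, "nrd0")) {
  plot(density(x, bw = b), main = paste("bw =", b))
}
par(mfrow = c(1, 1))
```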
This BW is TOOO Big!
[Figure: "CogB density"; horizontal axis from −100 to 100, label "N = 28 Bandwidth = 30"; vertical axis Density.]
This BW is TOOO Small!
[Figure: "CogB density"; horizontal axis from −10 to 20, label "N = 28 Bandwidth = 0.1"; vertical axis Density.]
The Default BW is JUST RIGHT!
[Figure: "CogB density"; horizontal axis label "N = 28 Bandwidth = 1.785", vertical axis Density.]
What Does The Density Plot Reveal?
It looks like a bimodal density, with peaks near 0 and 15. There are possible outliers on the left side of the main peak. It is definitely not a Normal distribution, though it might be a mixture of two Normal components.
[Figure: "CogB density"; horizontal axis label "N = 28 Bandwidth = 1.785", vertical axis Density.]
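What a mixture of two Normal components looks like can be sketched directly in R. The weight 0.6, means 0 and 15, and common standard deviation 3 below are illustrative guesses loosely matching the peaks described for CogB, not values fitted to the data. The code checks that the mixture is a valid density (area 1) and locates its peaks on a fine grid.

```r
# Hypothetical two-component Normal mixture; weight, means, sds are illustrative guesses
mix_density <- function(x, w = 0.6, mu = c(0, 15), sigma = c(3, 3)) {
  w * dnorm(x, mu[1], sigma[1]) + (1 - w) * dnorm(x, mu[2], sigma[2])
}

# A valid density: total area 1 (each component +/- 10 sds lies inside these limits)
total_area <- integrate(mix_density, lower = -30, upper = 45)$value
total_area

# Locate the peaks (local maxima) on a fine grid
grid  <- seq(-15, 30, by = 0.01)
y     <- mix_density(grid)
modes <- grid[which(diff(sign(diff(y))) == -2) + 1]
modes

curve(mix_density(x), from = -15, to = 30, ylab = "density")
```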
Summary
The primary graphical tools for looking at a single population or sample from a population are in essence density estimates:
histograms
stem plots
density plots
For publication quality graphics, density plots are best. Stem plots are often handy for exploration, since we can recover two significant digits of the data from the plot!
Next time: quantile-quantile plots!