Math 141
Lecture 13: Graphics!

Albyn Jones
Library 304, [email protected]
www.people.reed.edu/~jones/courses/141

Source: people.reed.edu/~jones/Courses/P13.pdf

Albyn Jones Math 141


Time for a Graphical Interlude!

- Probability theory is mathematics: we start with axioms and deduce consequences. There are theorems.
- Statistics is about inference. It is not an axiomatizable discipline, and a statistical argument from data to a conclusion is not a proof! While there are theorems, assumptions like independence are not guaranteed to be true statements about the world.
- Graphical methods are important tools for validation of model assumptions or conditions.


Exploring the shapes of univariate distributions

The primary graphical tools for direct assessment of the features of a sample for a single variable include:

- histograms
- stem plots
- density plots
- quantile-quantile plots


Populations

Most univariate plots are methods of exploring features of a population, typically the distribution of some measurement taken from samples of a population.

We are interested in features like the location, spread, and shape of the distribution of our measurements. In addition, we may see other features, such as the existence of outliers: isolated observations far from the bulk of the data.


Density Curves

There are many ways to display a distribution; we will focus on density curves. A density curve is a nonnegative function f(x) such that the total area under the curve is 1, and the probability that an observation lies in the interval (a, b) is the area under the curve between a and b. For example, the Normal density curve is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right)$$
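As a quick numerical check of this definition, the R sketch below codes the Normal density formula by hand and compares it with R's built-in dnorm(). The helper name normal_density and the defaults mu = 0, sigma = 1 are illustrative choices, not part of the lecture.

```r
# Normal density written out from the formula for f(x).
# `normal_density`, `mu`, and `sigma` are illustrative names, not from the lecture.
normal_density <- function(x, mu = 0, sigma = 1) {
  1 / sqrt(2 * pi * sigma^2) * exp(-0.5 * ((x - mu) / sigma)^2)
}

x <- seq(-3, 3, by = 0.5)

# The hand-coded formula should agree with R's built-in dnorm().
max_diff <- max(abs(normal_density(x) - dnorm(x)))

# The total area under a density curve should be 1 (up to numerical error).
total_area <- integrate(normal_density, lower = -Inf, upper = Inf)$value

print(max_diff)
print(total_area)
```

Changing mu or sigma in the call moves or stretches the curve, but the total area should stay at 1.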


Example: Normal Density Curve

In addition to areas, density curves reveal location, spread, and shape.

[Figure: standard Normal density curve; horizontal axis Z from −3 to 3, vertical axis density from 0.0 to 0.4. Area under Normal curve between −1 and 1: Area = .683]
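The area quoted for the interval from −1 to 1 can be reproduced in R. This is a small sketch that computes it two ways, by numerical integration of dnorm() and from the Normal cumulative distribution function pnorm(); the variable names are illustrative.

```r
# Area under the standard Normal density between -1 and 1, computed two ways.
area_integrate <- integrate(dnorm, lower = -1, upper = 1)$value  # numerical integration
area_cdf <- pnorm(1) - pnorm(-1)                                 # difference of CDF values

print(round(area_integrate, 3))  # about 0.683
print(round(area_cdf, 3))        # about 0.683
```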


Histograms

Histograms come in two flavors: the vertical axis may be frequency (counts) or probability (relative frequency, density). Histograms are very crude density estimates!

[Figure: two histograms of CogB: a frequency histogram (vertical axis: Frequency) and a probability histogram (vertical axis: Density).]
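The two flavors can be compared in R with hist(). The CogB data are not included in these notes, so the sketch below uses cogb_approx, a stand-in vector read off the lecture's stem plot to the displayed digit; it is an assumption, not the original data.

```r
# Stand-in for CogB: 28 values read off the lecture's stem plot (an approximation,
# not the original data).
cogb_approx <- c(-9, -8, -4, -4, -4, -1, -1, -1, -1, 0, 0, 0, 1, 1,
                 1, 2, 2, 2, 2, 4, 4, 6, 12, 13, 15, 15, 17, 21)

# hist() returns both scales; plot = FALSE computes them without drawing.
h <- hist(cogb_approx, plot = FALSE)
print(h$counts)    # frequency (count) scale
print(h$density)   # probability (density) scale

# On the density scale, the bar areas (height * width) sum to 1.
bar_area <- sum(h$density * diff(h$breaks))
print(bar_area)

# Draw the two flavors side by side.
op <- par(mfrow = c(1, 2))
hist(cogb_approx, main = "frequency histogram")
hist(cogb_approx, freq = FALSE, main = "probability histogram")
par(op)
```

Only the vertical scale differs between the two panels; the bar shapes are the same when the intervals have equal width.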


Warning!!

Frequency count histograms with irregular interval sizes can be misleading!

[Figure: two histograms of CogB: a frequency histogram (vertical axis: Frequency) and a probability histogram (vertical axis: Density).]
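A small R sketch of why counts mislead when interval widths differ. Both the data vector cogb_approx (values read off the lecture's stem plot) and the break points irregular_breaks are assumptions chosen for illustration.

```r
# Stand-in for CogB: values read off the lecture's stem plot (an approximation).
cogb_approx <- c(-9, -8, -4, -4, -4, -1, -1, -1, -1, 0, 0, 0, 1, 1,
                 1, 2, 2, 2, 2, 4, 4, 6, 12, 13, 15, 15, 17, 21)

# Hypothetical irregular intervals: narrow in the middle, wide in the tails.
irregular_breaks <- c(-10, -2, 0, 2, 5, 25)
h <- hist(cogb_approx, breaks = irregular_breaks, plot = FALSE)

# Counts alone suggest the widest interval is as "tall" as the narrow ones...
print(h$counts)
# ...but density divides by interval width, so area (not height) shows proportion.
print(round(h$density, 4))

bar_area <- sum(h$density * diff(h$breaks))
print(bar_area)
```

When the breaks are not equally spaced, hist() draws on the density scale by default and warns if freq = TRUE is requested.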


Warning!!

The appearance of histograms depends on the interval boundaries!

[Figure: two panels, each titled "Histogram of CogB", with horizontal axis CogB and vertical axis Density, drawn with different interval boundaries.]
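To see the effect of the boundaries, the sketch below bins the same values twice with intervals of the same width, shifted by half an interval. The data vector cogb_approx (read off the lecture's stem plot) and both sets of breaks are assumptions for illustration.

```r
# Stand-in for CogB: values read off the lecture's stem plot (an approximation).
cogb_approx <- c(-9, -8, -4, -4, -4, -1, -1, -1, -1, 0, 0, 0, 1, 1,
                 1, 2, 2, 2, 2, 4, 4, 6, 12, 13, 15, 15, 17, 21)

# Same interval width (5), boundaries shifted by half an interval.
h_a <- hist(cogb_approx, breaks = seq(-10, 25, by = 5), plot = FALSE)
h_b <- hist(cogb_approx, breaks = seq(-12.5, 22.5, by = 5), plot = FALSE)

print(h_a$counts)
print(h_b$counts)

# Draw both on the density scale for a visual comparison.
op <- par(mfrow = c(1, 2))
plot(h_a, freq = FALSE, main = "breaks at multiples of 5")
plot(h_b, freq = FALSE, main = "breaks shifted by 2.5")
par(op)
```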


Stem Plots

A stem plot is essentially a histogram turned on its side, with regular interval boundaries:

    > stem(CogB)

      The decimal point is 1 digit(s) to the right of the |

      -0 | 98
      -0 | 444111100
       0 | 0111222244
       0 | 6
       1 | 23
       1 | 557
       2 | 1


Stem Plots

There is some control over the display scale. If you pick the right scale, you can recover two digits of each case from the stem plot!

    stem(CogB,scale=4)      stem(CogB,scale=3)

      -9 | 1                  -8 | 1
      -8 |                    -6 | 6
      -7 | 6                  -4 | 0
      -6 |                    -2 | 75
      -5 |                    -0 | 487731
      -4 | 0                  ..etc..
      -3 | 75
      ..etc..
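The scale argument can be tried on any numeric vector. The sketch below uses cogb_approx, a stand-in read off the first stem plot; because those values carry only the displayed digit, the output will not reproduce the second digits shown in the lecture.

```r
# Stand-in for CogB: values read off the lecture's stem plot (an approximation).
cogb_approx <- c(-9, -8, -4, -4, -4, -1, -1, -1, -1, 0, 0, 0, 1, 1,
                 1, 2, 2, 2, 2, 4, 4, 6, 12, 13, 15, 15, 17, 21)

# Default display.
stem(cogb_approx)

# A larger scale spreads the same values over more stems.
stem(cogb_approx, scale = 4)
```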


Density Plots

For publication graphics, I recommend density plots. A density plot is a direct estimate of the population density curve.

plot(density(CogB),col="purple")

[Figure: density plot of CogB; N = 28, Bandwidth = 1.785; vertical axis: Density.]
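A sketch of the same call on stand-in data, with a check that the estimated curve behaves like a density. The vector cogb_approx is read off the lecture's stem plot and is an assumption, so the bandwidth R chooses here need not match the one on the slide.

```r
# Stand-in for CogB: values read off the lecture's stem plot (an approximation).
cogb_approx <- c(-9, -8, -4, -4, -4, -1, -1, -1, -1, 0, 0, 0, 1, 1,
                 1, 2, 2, 2, 2, 4, 4, 6, 12, 13, 15, 15, 17, 21)

d <- density(cogb_approx)   # kernel density estimate with the default bandwidth
print(d$bw)                 # bandwidth chosen by R for this stand-in data

plot(d, col = "purple", main = "density of cogb_approx")

# Trapezoid-rule area under the estimated curve; it should be close to 1.
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)
```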


Density Plots: details

The density plot is essentially little bumps, added up!

[Figure: CogB density; N = 28, Bandwidth = 1.785; vertical axis: Density.]
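The "little bumps, added up" idea can be sketched by hand: place one Normal bump on each observation, give each bump area 1/n, and sum them. The sketch assumes Gaussian bumps (the default kernel of density()), uses the bandwidth 1.785 reported on the plot as the bump standard deviation, and uses cogb_approx, a stand-in read off the lecture's stem plot.

```r
# Stand-in for CogB: values read off the lecture's stem plot (an approximation).
cogb_approx <- c(-9, -8, -4, -4, -4, -1, -1, -1, -1, 0, 0, 0, 1, 1,
                 1, 2, 2, 2, 2, 4, 4, 6, 12, 13, 15, 15, 17, 21)

bw <- 1.785                       # bandwidth reported on the lecture's plot
n_obs <- length(cogb_approx)
grid <- seq(-15, 27, length.out = 200)

# One Normal bump per observation, each with area 1/n_obs (a 200 x 28 matrix).
bumps <- sapply(cogb_approx, function(xi) dnorm(grid, mean = xi, sd = bw) / n_obs)

# Adding the bumps up gives the density estimate.
kde_by_hand <- rowSums(bumps)

# density() on the same grid should agree up to its internal approximation.
d <- density(cogb_approx, bw = bw, from = -15, to = 27, n = 200)
max_gap <- max(abs(d$y - kde_by_hand))
print(max_gap)

# Draw the individual bumps and their sum.
plot(grid, kde_by_hand, type = "l", ylab = "Density")
matlines(grid, bumps, lty = 1, col = "grey")
```

density() works on a binned version of the data, so a small gap between the two curves is expected.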


Density Plots: add the rug!

[Figure: CogB density with a rug of the observations along the horizontal axis; N = 28, Bandwidth = 1.785; vertical axis: Density.]

rug(CogB, lwd = 2, col = "red")


Density Estimation: Bandwidth

When you plot a density curve, the text at the bottom lists something called bandwidth. That determines the size of the little bumps that density() adds up to get the density estimate. R will make a pretty good guess, probably better than you can, so normally you should just use the default bandwidth selection. But just for fun . . .
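A sketch of overriding the bandwidth through the bw argument of density(), using the two extreme values that appear in the lecture's plots (30 and 0.1). The data vector cogb_approx is a stand-in read off the lecture's stem plot, so its default bandwidth need not equal 1.785.

```r
# Stand-in for CogB: values read off the lecture's stem plot (an approximation).
cogb_approx <- c(-9, -8, -4, -4, -4, -1, -1, -1, -1, 0, 0, 0, 1, 1,
                 1, 2, 2, 2, 2, 4, 4, 6, 12, 13, 15, 15, 17, 21)

# density() picks its default bandwidth with the rule implemented in bw.nrd0().
default_bw <- bw.nrd0(cogb_approx)

d_default <- density(cogb_approx)            # default bandwidth
d_big     <- density(cogb_approx, bw = 30)   # very wide bumps: oversmoothed
d_small   <- density(cogb_approx, bw = 0.1)  # very narrow bumps: spiky

print(c(default = d_default$bw, big = d_big$bw, small = d_small$bw))

# Wide bumps flatten the curve; narrow bumps produce tall spikes.
print(c(default = max(d_default$y), big = max(d_big$y), small = max(d_small$y)))

op <- par(mfrow = c(1, 3))
plot(d_big, main = "bw = 30")
plot(d_small, main = "bw = 0.1")
plot(d_default, main = "default bw")
par(op)
```

For the default Gaussian kernel, bw is the standard deviation of each bump.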


This BW is TOOO Big!

[Figure: CogB density; N = 28, Bandwidth = 30; horizontal axis from −100 to 100; vertical axis: Density.]


This BW is TOOO Small!

[Figure: CogB density; N = 28, Bandwidth = 0.1; vertical axis: Density.]


The Default BW is JUST RIGHT!

[Figure: CogB density; N = 28, Bandwidth = 1.785; vertical axis: Density.]


What Does The Density Plot Reveal?

It looks like a bimodal density, with peaks near 0 and 15. There are possible outliers on the left side of the main peak. It is definitely not a Normal distribution, though it might be a mixture of two Normal components.

[Figure: CogB density; N = 28, Bandwidth = 1.785; vertical axis: Density.]
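To illustrate what a mixture of two Normal components looks like, the sketch below builds one with components centered at 0 and 15 and locates its peaks. The mixing weights (0.75 and 0.25) and the common standard deviation (3) are hypothetical values picked for illustration; they are not estimates from the CogB data.

```r
# Hypothetical two-component Normal mixture (weights and sd are assumptions).
grid <- seq(-15, 30, by = 0.1)
mix_density <- 0.75 * dnorm(grid, mean = 0, sd = 3) +
               0.25 * dnorm(grid, mean = 15, sd = 3)

# A peak is a grid point where the slope changes from positive to negative.
slope_sign <- sign(diff(mix_density))
peak_index <- which(diff(slope_sign) < 0) + 1
peaks <- grid[peak_index]
print(peaks)

plot(grid, mix_density, type = "l", xlab = "x", ylab = "Density",
     main = "mixture of two Normal components")
```

The result is a bimodal curve with a main peak and a smaller second peak, the same qualitative shape described for the CogB density plot.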


Summary

The primary graphical tools for looking at a single population or sample from a population are in essence density estimates:

- histograms
- stem plots
- density plots

For publication-quality graphics, density plots are best.
Stem plots are often handy for exploration, since we can recover two significant digits of the data from the plot!
Next time: quantile-quantile plots!
