Math 141
Quantile-Quantile Plots

Albyn Jones
Library 304

www.people.reed.edu/~jones/courses/141

Outline: Quantile-Quantile Plots

Quantiles and Order Statistics

Quantile-Quantile Plots

Normal Quantile Plots

Quantiles and Order Statistics

Definition: p-th quantile $q_p$

Earlier we defined the p-th quantile of the distribution of a random variable $X$ as any number $q_p$ satisfying

$$P(X \le q_p) \ge p \quad \text{and} \quad P(X \ge q_p) \ge 1 - p.$$
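
As a quick numerical check of this definition, the sketch below verifies both inequalities for one discrete distribution. The choice of a Binomial(10, 0.5) variable and p = 0.5 is an assumption made for illustration; it does not come from the slides.

```r
# Illustrative check of the quantile definition (distribution chosen arbitrarily).
n_trials <- 10
prob     <- 0.5
p        <- 0.5

# qbinom() returns the smallest q with P(X <= q) >= p.
q_p <- qbinom(p, size = n_trials, prob = prob)

# For integer-valued X, P(X >= q_p) = 1 - P(X <= q_p - 1).
lower <- pbinom(q_p, size = n_trials, prob = prob)
upper <- 1 - pbinom(q_p - 1, size = n_trials, prob = prob)

cat("q_p =", q_p, " P(X <= q_p) =", lower, " P(X >= q_p) =", upper, "\n")
cat("Definition satisfied:", lower >= p && upper >= 1 - p, "\n")
```

Because the distribution is discrete, both probabilities exceed their bounds strictly here; for a continuous distribution they would hold with equality.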

Order Statistics

Sample quantiles are based on order statistics. Let $X_1, X_2, \ldots, X_n$ be a sample of size n. The order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are just the observations sorted into ascending order.

    # The data
    > x
    [1] 76 92 83 105 102 109 106 91 110 89

    # The order statistics
    > sort(x)
    [1] 76 83 89 91 92 102 105 106 109 110

Order Statistics and Sample Quantiles

There are numerous definitions of sample quantiles, chosen to perform well under various conditions. All involve interpolation between neighboring order statistics.

Suppose that we want the p-th quantile, where p lies between k/n and (k + 1)/n. There are variations, but all are, for some choice of $a \in [0,1]$:

$$\hat{q}_p = a X_{(k)} + (1 - a) X_{(k+1)}$$

Example: when n is even, the sample median $q_{.5}$ is usually taken to be the average of the two middle observations.
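
A small sketch of this interpolation in R, reusing the ten observations from the order-statistics example. The helper name `interp_quantile` is hypothetical; base R's `quantile()` implements several interpolation rules through its `type` argument.

```r
# Ten observations, as in the order-statistics example.
x  <- c(76, 92, 83, 105, 102, 109, 106, 91, 110, 89)
xs <- sort(x)              # order statistics X_(1), ..., X_(n)
n  <- length(xs)

# Hypothetical helper: interpolate between X_(k) and X_(k+1) with weight a.
interp_quantile <- function(xs, k, a) {
  a * xs[k] + (1 - a) * xs[k + 1]
}

# n is even, so the median averages the two middle order statistics (a = 1/2).
manual_median <- interp_quantile(xs, k = n / 2, a = 0.5)

cat("manual median:", manual_median, "\n")
cat("median(x):    ", median(x), "\n")

# quantile() offers nine rules; they can disagree for other values of p.
print(sapply(1:9, function(type) quantile(x, 0.25, type = type, names = FALSE)))
```

The last line is there to show that the choice of rule matters in small samples, which is why the slide speaks of "numerous definitions".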

Order Statistics as Sample Quantiles

Let’s turn it around, and ask: what sample quantiles correspond to the order statistics?

Consider 4 observations from a population. On average, we expect them to at least approximately divide the population into equal chunks corresponding to equally spaced percentiles:

$$\frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}$$

In other words, they correspond to the sample quantiles

$$q_{.2}, q_{.4}, q_{.6}, q_{.8}$$
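
In general, this reasoning assigns the k-th of n order statistics to the probability k/(n + 1). The sketch below computes those values for n = 4. It also prints R's `ppoints()`, which uses a slightly different plotting-position convention, so its output need not match exactly.

```r
n <- 4
k <- seq_len(n)

# Equally spaced probabilities k / (n + 1).
probs <- k / (n + 1)
print(probs)

# R's ppoints() computes (k - a) / (n + 1 - 2a), with a = 3/8 when n <= 10.
print(ppoints(n))
```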

Comparing Two Samples

Suppose we have two samples of size n, $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$.

If they were samples from the same distribution, then the order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ and $Y_{(1)}, Y_{(2)}, \ldots, Y_{(n)}$ would be estimates of the same quantiles. Thus we expect that $X_{(1)} \approx Y_{(1)}$, $X_{(2)} \approx Y_{(2)}$, etc.

The QQ plot

The quantile–quantile plot, or QQplot, is a simple graphical method for comparing two sets of sample quantiles. Plot the pairs of order statistics

$$(X_{(k)}, Y_{(k)}).$$

If the two datasets come from the same distribution, the points should lie roughly on a line through the origin with slope 1.
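
A minimal sketch of such a plot, assuming two simulated standard normal samples of equal size (the seed and sample size are arbitrary choices, not taken from the slides):

```r
set.seed(141)            # arbitrary seed, for reproducibility only
n <- 200
x <- rnorm(n)
y <- rnorm(n)

# Plot the pairs of order statistics (X_(k), Y_(k)).
plot(sort(x), sort(y), xlab = "X", ylab = "Y",
     main = "QQplot of two N(0,1) samples")
abline(0, 1, col = "green")   # reference line y = x

# Base R's qqplot() returns the same pairs when the sample sizes are equal.
qq <- qqplot(x, y, plot.it = FALSE)
```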

QQ plot example

[Figure: QQplot of two N(0,1) samples of size 200; x-axis X, y-axis Y.]

QQ plot example, small sample!

Alert!! With small samples, expect variation!

[Figure: QQplot of two N(0,1) samples of size 20; x-axis X, y-axis Y.]

QQ plot example: location shift

Two samples from similar distributions which differ only in location: the green reference line is y = x.

[Figure: QQplot of two samples of size 200; x-axis X, y-axis Y.]

QQ plot example: different spread

Two samples from similar distributions which differ only in spread, with reference line.

[Figure: QQplot of two samples of size 200; x-axis X, y-axis Y.]
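
The location-shift and spread examples can be imitated with simulated data. In the sketch below the shift of 3 and the scale factor of 3 are assumptions chosen to roughly match the axis ranges of the figures; the slides do not state how the samples were generated. A location shift should move the intercept of the QQ plot while leaving the slope near 1, and a change in spread should change the slope.

```r
set.seed(141)                 # arbitrary seed
n <- 200
x <- rnorm(n)

y_shift  <- rnorm(n) + 3      # assumed: same shape, location shifted by 3
y_spread <- rnorm(n) * 3      # assumed: same shape, spread multiplied by 3

# Summarize each QQ plot by a least-squares line through the sorted pairs.
fit_shift  <- coef(lm(sort(y_shift)  ~ sort(x)))
fit_spread <- coef(lm(sort(y_spread) ~ sort(x)))

print(round(fit_shift, 2))    # expect intercept near 3, slope near 1
print(round(fit_spread, 2))   # expect intercept near 0, slope near 3

op <- par(mfrow = c(1, 2))
qqplot(x, y_shift,  main = "Location shift");   abline(0, 1, col = "green")
qqplot(x, y_spread, main = "Different spread"); abline(0, 1, col = "green")
par(op)
```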

QQ plot example: different shape

Two samples from distributions which differ in shape, as well as location and spread, with reference line.

[Figure: QQplot of two samples of size 200; x-axis X, y-axis Y.]

QQ plot example: Anorexia data

The Family Therapy group had 17 subjects, the Control Therapy 26. qqplot() uses estimated quantiles for the larger dataset.

[Figure: QQplot of Family Therapy vs Control; x-axis Control, y-axis Family.]
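
To see how `qqplot()` handles unequal sample sizes, the sketch below uses simulated stand-ins for the two groups (17 and 26 values). The actual anorexia data are not reproduced here, and the means and SDs used are made up. The returned coordinate vectors have the length of the smaller sample, because the larger sample's quantiles are interpolated.

```r
set.seed(141)                              # arbitrary seed
family  <- rnorm(17, mean = 7, sd = 7)     # made-up stand-in, n = 17
control <- rnorm(26, mean = 0, sd = 8)     # made-up stand-in, n = 26

qq <- qqplot(control, family, xlab = "Control", ylab = "Family",
             main = "QQplot of Family Therapy vs Control (simulated)")
abline(0, 1)

# Both coordinate vectors have the size of the smaller sample.
cat("points plotted:", length(qq$x), length(qq$y), "\n")
```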

Normal Quantile Plots

Often we wish to compare a dataset to the Normal distribution, a theoretical population, rather than to a second dataset. R has a function that plots the order statistics of a sample against the corresponding quantiles of the standard normal distribution:

    qqnorm(X)

If the plot is roughly linear, then our data are approximately normally distributed.
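
A minimal usage sketch, assuming a simulated sample stands in for the data X. `qqline()` adds a reference line through the first and third quartiles, which makes departures from linearity easier to judge.

```r
set.seed(141)                       # arbitrary seed
X <- rnorm(50, mean = 10, sd = 5)   # simulated stand-in for a real dataset

qq <- qqnorm(X)                     # sample quantiles vs. standard normal quantiles
qqline(X)                           # reference line through the quartiles

# qqnorm() returns the plotted coordinates: theoretical (x) and sample (y).
# Sorting both recovers the pairs (theoretical quantile, order statistic).
head(cbind(theoretical = sort(qq$x), sample = sort(qq$y)))
```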

Normal Quantile Plot: Normal data

Don’t expect a perfectly straight plot even with normal data!

[Figure: Normal Q-Q Plot of Normal data; x-axis Theoretical Quantiles, y-axis Sample Quantiles.]

Normal Quantile Plot: Mean and SD

We can estimate the mean and SD from a Normal quantile plot: the mean is roughly equal to the median (plotted above 0), and the slope is roughly the SD.

[Figure: Normal Q-Q Plot; x-axis Theoretical Quantiles, y-axis Sample Quantiles; annotated "rise: 4.8".]
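
The same reading can be done numerically. The sketch below, which assumes a simulated Normal sample with mean 10 and SD 5 (values chosen for illustration only), fits a least-squares line to the Normal quantile plot and compares its intercept and slope with the sample mean and SD.

```r
set.seed(141)                        # arbitrary seed
X <- rnorm(50, mean = 10, sd = 5)    # assumed parameters, for illustration

theoretical <- qnorm(ppoints(X))     # standard normal quantiles
sample_q    <- sort(X)               # order statistics

fit <- coef(lm(sample_q ~ theoretical))

# Height of the line above 0 (intercept) vs. sample mean and median;
# rise per unit run (slope) vs. sample SD.
cat("intercept:", fit[[1]], " mean:", mean(X), " median:", median(X), "\n")
cat("slope:    ", fit[[2]], " SD:  ", sd(X), "\n")
```

Because the theoretical quantiles are symmetric about 0, the fitted intercept coincides with the sample mean; the slope only approximates the SD.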

Normal Quantile Plot: Short Tails

[Figure: Normal Q-Q Plot of short-tailed data; x-axis Theoretical Quantiles, y-axis Sample Quantiles.]

Normal Quantile Plot: Short Tails

Short tails are hard to diagnose from a density plot!

[Figure: density plot titled "Short Tails"; N = 50, Bandwidth = 2.436; y-axis Density.]
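
To see the contrast for yourself, the sketch below draws both plots for a short-tailed sample. Using a Uniform(0, 20) sample of size 50 is an assumption; the slides do not say which distribution produced their figure. In the quantile plot, short tails typically show up as an S-shape that flattens at both ends, while the density estimate looks only mildly different from a bell curve.

```r
set.seed(141)                        # arbitrary seed
X <- runif(50, min = 0, max = 20)    # assumed short-tailed distribution

op <- par(mfrow = c(1, 2))
qqnorm(X); qqline(X)                 # look for ends bending toward the horizontal
plot(density(X), main = "Short Tails")
par(op)

# Numerical hint: compare the standardized extremes of the sample with the
# extremes of the corresponding standard normal quantiles.
z_range <- range((X - mean(X)) / sd(X))
print(z_range)
print(range(qnorm(ppoints(X))))
```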

Normal Quantile Plot: Long Tails

[Figure: Normal Q-Q Plot of long-tailed data; x-axis Theoretical Quantiles, y-axis Sample Quantiles.]

Density Plot: Long Tails

[Figure: density plot titled "Long Tails"; N = 50, Bandwidth = 0.3177; y-axis Density.]

Normal Quantile Plot: Positive Skewness

[Figure: Normal Q-Q Plot of positively skewed data; x-axis Theoretical Quantiles, y-axis Sample Quantiles.]

Density Plot: Positive Skewness

[Figure: density plot titled "Positive Skewness"; N = 50, Bandwidth = 0.9877; y-axis Density.]

Normal Quantile Plot: Negative Skewness

[Figure: Normal Q-Q Plot of negatively skewed data; x-axis Theoretical Quantiles, y-axis Sample Quantiles.]

Normal Quantile Plot: Bimodal Data

[Figure: Normal Q-Q Plot of bimodal data; x-axis Theoretical Quantiles, y-axis Sample Quantiles.]

Density Plot: Bimodal Data

[Figure: density plot titled "Bimodal data"; N = 50, Bandwidth = 1.372; y-axis Density.]

Other distributions

Suppose we would like to make a theoretical quantile plot for a dataset X to compare to some other distribution, say a Chi-squared distribution with 5 degrees of freedom. Easy:

Sort your dataset:

    SampleQuantiles <- sort(X)

Compute the theoretical quantiles:

    ChiSqQuantiles <- qchisq(ppoints(X), 5)

Plot:

    plot(ChiSqQuantiles, SampleQuantiles, pch=19)
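
Put together as a runnable sketch: the three steps above, with a simulated sample standing in for X (an assumption, since the slides do not supply the data) and a y = x reference line added for orientation.

```r
set.seed(141)                     # arbitrary seed
X <- rchisq(50, df = 5)           # simulated stand-in for the dataset X

SampleQuantiles <- sort(X)                       # order statistics
ChiSqQuantiles  <- qchisq(ppoints(X), df = 5)    # theoretical quantiles

plot(ChiSqQuantiles, SampleQuantiles, pch = 19,
     main = "Chi-squared (5 df) Quantile Plot")
abline(0, 1)                      # points near this line suggest a good fit
```

Swapping `qchisq()` for another quantile function such as `qgamma()` or `qexp()` gives the corresponding plot for that family, provided its parameters are specified.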

Gamma Quantile Plot

[Figure: Gamma Quantile Plot; x-axis Gq, y-axis sort(X).]

Summary

QQplots are an excellent graphical tool for comparing two samples to each other, or one sample to a theoretical distribution like the Normal. They reveal differences in location, spread and shape more clearly than do density plots or histograms.
