1
Flexible smoothing with P-splines: some applications
Maria Durban
Department of Statistics, Universidad Carlos III de Madrid, Spain
Joint work with Raymond Carroll, Iain Currie, Paul Eilers, Jarek Harezlak and Matt Wand
Department of Economics, Bielefeld University, June 2003
2
What is this talk about?
• Introduction
  - Smoothing
  - Why P-splines?
  - Mixed model representation of P-splines
• Applications
  - Additive models
  - Models with heteroscedastic errors
  - Smoothing and correlation
  - Generalised additive models
• P-splines for longitudinal data
3
Canadian Occupational Prestige Data (B. Blishen, 1971)
Data consist of prestige scores, average income (in $1000) and education (in years) for 102 occupations.
[Figure: two scatterplots of prestige (20 to 80), against income (0 to 25000) and against education (6 to 16).]
4
Smoothing
• Prestige score varies smoothly along the income range
• A suitable model for these data could be:
y = f(x) + ε
where x is the covariate (income) and f is a smooth function of x which depends on a smoothing parameter λ
• Smoothing methods fall into two groups:
  - Specified by the fitting procedure: kernels
  - Solution of a minimisation problem: splines
5
[Figure: prestige (0 to 100) against income (0 to 25000).]
6
P-spline
• Eilers and Marx, 1996.
• They are a generalisation of ordinary regression.
• Modify the log-likelihood by a penalty on the regression coefficients.
$$y = f(x) + \varepsilon, \qquad f(x) \approx Ba, \qquad S = (y - Ba)'(y - Ba) + \lambda a'Pa$$
$$\hat{a} = (B'B + \lambda P)^{-1} B'y$$
P-splines also go by other names:
• Penalised splines
• Pseudosplines
• Low-rank smoothers
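The penalised least squares solution can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the implementation used in the talk: the basis is a cubic B-spline basis on equally spaced knots, the penalty is P = D'D with D a second-order difference matrix (the choice made by Eilers and Marx), and the data, the function names and the value of λ are hypothetical.

```python
import math
import numpy as np

def bspline_basis(x, xl, xr, nseg=10, deg=3):
    """Uniform B-spline basis on [xl, xr], built from differences of truncated powers."""
    dx = (xr - xl) / nseg
    knots = xl + dx * np.arange(-deg, nseg + deg + 1)
    # Truncated power functions (x - t)_+^deg, one column per knot.
    T = np.maximum(x[:, None] - knots[None, :], 0.0) ** deg
    # Differences of order deg + 1 turn truncated powers into B-splines.
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    return (-1) ** (deg + 1) * T @ D.T / (math.factorial(deg) * dx ** deg)

def pspline_fit(x, y, lam, nseg=10, deg=3, pord=2):
    """Return (B, a_hat) with a_hat = (B'B + lam * D'D)^{-1} B'y."""
    B = bspline_basis(x, x.min(), x.max(), nseg, deg)
    D = np.diff(np.eye(B.shape[1]), n=pord, axis=0)   # difference penalty, P = D'D
    a_hat = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B, a_hat

# Hypothetical noisy data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

B, a_hat = pspline_fit(x, y, lam=1.0)
fitted = B @ a_hat
```

With a second-order penalty, a very large λ pushes the coefficients towards a straight line, so the fit approaches ordinary linear regression; a small λ leaves an unpenalised B-spline regression.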
7
Basis for P-splines
B-splines, truncated polynomial basis, radial basis, etc.
B-splines
• B-spline: bell-shaped, like a Gaussian curve
• Polynomial pieces joining smoothly at the knots
Truncated polynomial
For example, the truncated linear basis for knots κ1, ..., κk is:
1, x, (x − κ1)+, ..., (x − κk)+
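As a small illustration, the truncated linear basis can be built directly from its definition. The function name, the knot locations and the evaluation points below are hypothetical; the split into X (columns 1 and x) and Z (the truncated lines) anticipates the mixed model notation.

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Return X = [1, x] and Z = [(x - k_1)_+, ..., (x - k_K)_+]."""
    X = np.column_stack([np.ones_like(x), x])
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)
    return X, Z

x = np.linspace(0.0, 40.0, 9)          # 0, 5, ..., 40
knots = np.array([10.0, 20.0, 30.0])   # hypothetical knots
X, Z = truncated_linear_basis(x, knots)
```

Each column of Z is zero to the left of its knot and rises with slope one to the right, so a linear combination of the columns is a piecewise linear function with slope changes at the knots.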
8
[Figure: three panels over the range 0 to 40: B-spline basis; Scaled B-splines and their sum; Truncated lines basis.]
9
Why P-splines?
• The number of basis functions used to construct the function estimates does not grow with the sample size
• Quite insensitive to the choice of knots (Ruppert, 2000)
• Computationally simpler
• No need for backfitting in the case of additive models
• Easily extended to two or more dimensions and to non-Gaussian errors
10
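P-splines: mixed model approach
$$y = f(x) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 R)$$
We write f(x) = Ba. It can be shown that Ba may be written as
$$Ba = \underbrace{X\beta}_{\text{fixed}} + \underbrace{Zu}_{\text{random}}, \qquad u \sim N(0, \sigma_u^2 I), \qquad \lambda = \sigma^2/\sigma_u^2$$
$$y = X\beta + Zu + \varepsilon, \qquad \mathrm{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \sigma_u^2 I & 0 \\ 0 & \sigma^2 R \end{bmatrix}, \qquad \mathrm{Cov}[y] = V = \sigma^2 R + \sigma_u^2 ZZ'$$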
11
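Use REML for variance parameters
$$\ell(V) = -\tfrac{1}{2}\log|V| - \tfrac{1}{2}\log|X'V^{-1}X| - \tfrac{1}{2}\, y'\left(V^{-1} - V^{-1}X(X'V^{-1}X)^{-1}X'V^{-1}\right)y$$
Given R, σ² and σ_u², β and u are solutions to:
$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + \lambda I \end{bmatrix} \begin{bmatrix} \beta \\ u \end{bmatrix} = \begin{bmatrix} X'R^{-1} \\ Z'R^{-1} \end{bmatrix} y$$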
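The two steps on this slide can be sketched numerically. The code below is only an illustration and makes several assumptions not in the talk: R = I, a truncated linear basis with hypothetical knots, simulated data, σ² held at an assumed value, and a crude grid search over σ_u² in place of a proper REML optimiser. The function names are hypothetical.

```python
import numpy as np

def solve_mixed_model(X, Z, y, lam):
    """Solve the mixed model equations for (beta, u), assuming R = I."""
    p = X.shape[1]
    C = np.hstack([X, Z])
    G = np.zeros((C.shape[1], C.shape[1]))
    G[p:, p:] = lam * np.eye(Z.shape[1])      # penalise the random effects only
    sol = np.linalg.solve(C.T @ C + G, C.T @ y)
    return sol[:p], sol[p:]

def reml_loglik(X, Z, y, sigma2, sigma2_u):
    """REML criterion (up to a constant) for V = sigma2 * I + sigma2_u * ZZ'."""
    V = sigma2 * np.eye(len(y)) + sigma2_u * (Z @ Z.T)
    V_inv = np.linalg.inv(V)
    XtVX = X.T @ V_inv @ X
    P = V_inv - V_inv @ X @ np.linalg.solve(XtVX, X.T @ V_inv)
    return -0.5 * (np.linalg.slogdet(V)[1] + np.linalg.slogdet(XtVX)[1] + y @ P @ y)

# Hypothetical data and basis.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
X = np.column_stack([np.ones_like(x), x])
Z = np.maximum(x[:, None] - np.linspace(0.1, 0.9, 10)[None, :], 0.0)

sigma2 = 0.04                                  # assumed error variance
grid = 10.0 ** np.arange(-3, 4)                # candidate values of sigma_u^2
best = max(grid, key=lambda s2u: reml_loglik(X, Z, y, sigma2, s2u))
beta, u = solve_mixed_model(X, Z, y, lam=sigma2 / best)
fitted = X @ beta + Z @ u
```

The smoothing parameter then follows from the variance components as λ = σ²/σ_u², which is what makes its selection automatic in this framework.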
12
Advantages
• Unified approach
• Automatic selection of smoothing parameter
• Likelihood ratio test for model selection
• Already implemented in standard software: S-PLUS, SAS, R.
13
APPLICATIONS
14
Additive models: Prestige data revisited
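$$y = \underbrace{f(\text{income})}_{X_1\beta_1 + Z_1u_1} + \underbrace{f(\text{education})}_{X_2\beta_2 + Z_2u_2} + \varepsilon = X\beta + Zu + \varepsilon$$
$$\mathrm{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \sigma_{u1}^2 I & 0 & 0 \\ 0 & \sigma_{u2}^2 I & 0 \\ 0 & 0 & \sigma^2 I \end{bmatrix}$$
$$\beta = (\beta_1', \beta_2')', \qquad u = (u_1', u_2')', \qquad X = [X_1 : X_2], \qquad Z = [Z_1 : Z_2]$$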
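A minimal sketch of how the combined matrices might be assembled, assuming truncated linear bases with knots at sample quantiles and fixed smoothing parameters (in the talk the variance components come from REML). The data are simulated stand-ins for income and education, not the prestige data, and the sketch keeps a single intercept column in X to avoid a rank-deficient fixed part; that is an assumption of this example.

```python
import numpy as np

def truncated_lines(x, n_knots):
    """Z block: truncated lines (x - k)_+ with knots at sample quantiles."""
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    return np.maximum(x[:, None] - knots[None, :], 0.0)

def fit_additive(x1, x2, y, lam1, lam2, n_knots=8):
    X = np.column_stack([np.ones_like(x1), x1, x2])        # fixed part, one intercept
    Z = np.hstack([truncated_lines(x1, n_knots),
                   truncated_lines(x2, n_knots)])          # Z = [Z1 : Z2]
    C = np.hstack([X, Z])
    # One smoothing parameter per smooth term, none on the fixed effects.
    penalty = np.diag(np.concatenate([np.zeros(3),
                                      np.full(n_knots, lam1),
                                      np.full(n_knots, lam2)]))
    coef = np.linalg.solve(C.T @ C + penalty, C.T @ y)
    return X, C @ coef

rng = np.random.default_rng(5)
x1 = rng.uniform(0.0, 1.0, 150)   # stand-in for income (rescaled)
x2 = rng.uniform(0.0, 1.0, 150)   # stand-in for education (rescaled)
y = np.sqrt(x1) + np.sin(3.0 * x2) + rng.normal(scale=0.1, size=150)
X, fitted = fit_additive(x1, x2, y, lam1=0.1, lam2=0.1)
```

Both smooth terms are estimated in one linear solve, which is why no backfitting is needed.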
15
Partial residuals plot
[Figure: partial residuals against income (0 to 25000) and against education (6 to 16).]
16
Is the model additive?: Conditional plots
[Figure: conditional plots of prestige (20 to 80) against education (6 to 16).]
17
Two-dimensional P-splines
Now y = f(income, education) + ε = Ba + ε, where
$$B = B_1 \otimes B_2, \qquad P = \lambda_1 P_1 \otimes I_{n_2} + \lambda_2 I_{n_1} \otimes P_2$$
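The Kronecker structure can be illustrated with NumPy. This sketch assumes data on a regular grid (so that the full Kronecker product applies), linear B-splines (hat functions) as marginal bases, and second-order difference penalties; the grid sizes, knots, smoothing parameters and the planar test surface are all hypothetical.

```python
import numpy as np

def hat_basis(x, knots):
    """Linear B-splines (hat functions) on the given knots."""
    return np.column_stack([np.interp(x, knots, e) for e in np.eye(len(knots))])

x1 = np.linspace(0.0, 1.0, 30)                  # grid in the first covariate
x2 = np.linspace(0.0, 1.0, 25)                  # grid in the second covariate
B1 = hat_basis(x1, np.linspace(0.0, 1.0, 6))
B2 = hat_basis(x2, np.linspace(0.0, 1.0, 5))
n1, n2 = B1.shape[1], B2.shape[1]

B = np.kron(B1, B2)                             # two-dimensional basis
D1 = np.diff(np.eye(n1), n=2, axis=0)
D2 = np.diff(np.eye(n2), n=2, axis=0)
P1, P2 = D1.T @ D1, D2.T @ D2
lam1, lam2 = 1.0, 10.0                          # separate smoothing per direction
P = lam1 * np.kron(P1, np.eye(n2)) + lam2 * np.kron(np.eye(n1), P2)

# A plane has zero second differences, so it should pass through unchanged.
Y = 1.0 + 2.0 * x1[:, None] + 3.0 * x2[None, :]
a = np.linalg.solve(B.T @ B + P, B.T @ Y.ravel())
fitted = (B @ a).reshape(Y.shape)
```

The row-major `ravel` of Y matches the ordering implied by `np.kron(B1, B2)`. For scattered (non-gridded) data a row-wise product of the marginal bases would be needed instead.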
[Figure: fitted surface of prestige (20 to 80) over education (8 to 14) and income (5000 to 25000).]
18
Smoothing and correlation (Currie and Durban, 2002)
AIC and GCV lead to underestimation of the smoothing parameter in the presence of positive serial correlation. The general approach to modelling with P-splines takes care of this problem.
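As a rough illustration of how correlation enters the fit, the sketch below builds an AR(1) correlation matrix R and solves the penalised generalised least squares equations with R⁻¹. It assumes that the correlation parameter φ and the smoothing parameter λ are known, whereas in the mixed model approach they would be estimated jointly (for example by REML); the data, knots and names are hypothetical.

```python
import numpy as np

def ar1_corr(n, phi):
    """AR(1) correlation matrix with entries phi^|i - j|."""
    idx = np.arange(n)
    return phi ** np.abs(idx[:, None] - idx[None, :])

rng = np.random.default_rng(4)
n, phi, lam = 120, 0.6, 5.0                      # phi and lam assumed known here
x = np.linspace(0.0, 1.0, n)
R = ar1_corr(n, phi)
errors = np.linalg.cholesky(R) @ rng.normal(scale=0.3, size=n)   # correlated noise
y = np.sin(2 * np.pi * x) + errors

X = np.column_stack([np.ones(n), x])
Z = np.maximum(x[:, None] - np.linspace(0.1, 0.9, 10)[None, :], 0.0)
C = np.hstack([X, Z])
G = np.diag([0.0, 0.0] + [1.0] * Z.shape[1])     # penalise u only
R_inv = np.linalg.inv(R)
coef = np.linalg.solve(C.T @ R_inv @ C + lam * G, C.T @ R_inv @ y)
fitted = C @ coef
```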
Wood profile data
320 measurements of the profile of a block of wood subject to grinding.
[Figure: profile (70 to 120) against sampling distance (0 to 300).]
19
[Figure: ACF against lag (0 to 25). Panel title: Residuals AR(1).]
20
[Figure: profile (70 to 120) against sampling distance (0 to 300).]
21
[Figure: ACF against lag (0 to 25). Panel title: Residuals AR(2).]
Other examples in Durban and Currie (2003), Computational Statistics.
22
Smoothing and heteroscedasticity (Currie and Durban, 2002)
Simulated experiment to test crash helmets: 133 head accelerations and times after impact
[Figure: acceleration (g) against time (ms).]
23
Fit y = Ba + ε with $\mathrm{Var}(\varepsilon) = \sigma^2 V$ and $V = W^{-1}$, $W = \mathrm{diag}(w_1, \ldots, w_n)$.
Use P-splines to smooth $R_i = \log r_i^2$, where $r_i^2 = (y_i - \hat{y}_i)^2/\sigma^2$, and set $w_i^{-1} \propto \exp(R_i)$.
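One way to read this recipe is as an alternating scheme: fit the mean with the current weights, smooth the log squared residuals, update the weights, and repeat. The sketch below follows that reading under several assumptions that are not in the talk: a hat-function basis, fixed smoothing parameters, a simple weighted mean of squared residuals as the estimate of σ², a small floor inside the logarithm, five iterations, and simulated data rather than the crash helmet data.

```python
import numpy as np

def hat_basis(x, knots):
    """Linear B-splines (hat functions) on the given knots."""
    return np.column_stack([np.interp(x, knots, e) for e in np.eye(len(knots))])

def weighted_pspline(x, y, w, lam, n_knots=15):
    """Weighted P-spline fit: solve (B'WB + lam * D'D) a = B'Wy."""
    B = hat_basis(x, np.linspace(x.min(), x.max(), n_knots))
    D = np.diff(np.eye(n_knots), n=2, axis=0)
    a = np.linalg.solve(B.T @ (w[:, None] * B) + lam * D.T @ D, B.T @ (w * y))
    return B @ a

# Hypothetical data whose noise level grows with x.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 400)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1 + 2.0 * x)

w = np.ones_like(x)
for _ in range(5):
    fit = weighted_pspline(x, y, w, lam=1.0)
    resid2 = (y - fit) ** 2
    sigma2 = np.mean(w * resid2)                    # assumed scale estimate
    R = np.log(resid2 / sigma2 + 1e-12)             # floor avoids log(0)
    R_smooth = weighted_pspline(x, R, np.ones_like(x), lam=10.0)
    inv_w = np.exp(R_smooth)                        # inverse weights
    w = 1.0 / inv_w
```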
[Figure: two panels against time (ms): squared residuals (0 to 10) and inverse weights (0 to 3).]
24
Generalised additive models: Count data
The one-parameter exponential family model, with canonical link, has joint density
$$f(y \mid \eta) = \exp\{y'\eta - 1'b(\eta) + 1'c(y)\}$$
with linear predictor η = Ba. Using the mixed model representation of P-splines we rewrite Ba = Xβ + Zu, so that for Poisson counts
$$f(y \mid u) = \exp\{y'(X\beta + Zu) - 1'\exp(X\beta + Zu) - 1'\log\Gamma(y + 1)\}$$
and $u \sim N(0, \sigma_u^2 I)$.
Iterate between penalised quasi-likelihood (PQL) of Breslow and Clayton (1993) (to estimate β and u) and REML (to estimate the variance components).
In the case of count data, $\lambda = 1/\sigma_u^2$.
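For a fixed λ, the step that estimates β and u amounts to penalised iteratively reweighted least squares. The sketch below shows only that inner step for Poisson counts; the REML update of σ_u² is left out, and the simulated data, the truncated linear basis, the starting value and the fixed number of iterations are all assumptions of the example.

```python
import numpy as np

def poisson_pspline(X, Z, y, lam, n_iter=50):
    """Penalised IRLS for a Poisson model with log link and a ridge penalty on u."""
    p = X.shape[1]
    C = np.hstack([X, Z])
    G = np.zeros((C.shape[1], C.shape[1]))
    G[p:, p:] = np.eye(Z.shape[1])
    coef = np.zeros(C.shape[1])
    coef[0] = np.log(y.mean())            # start from a constant mean (first column is the intercept)
    for _ in range(n_iter):
        eta = C @ coef
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response
        coef = np.linalg.solve(C.T @ (mu[:, None] * C) + lam * G, C.T @ (mu * z))
    return coef, C, G

# Hypothetical count data.
rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 200)
y = rng.poisson(np.exp(1.0 + np.sin(2 * np.pi * x))).astype(float)
X = np.column_stack([np.ones_like(x), x])
Z = np.maximum(x[:, None] - np.linspace(0.1, 0.9, 10)[None, :], 0.0)

lam = 1.0                                  # corresponds to sigma_u^2 = 1
coef, C, G = poisson_pspline(X, Z, y, lam)
mu_hat = np.exp(C @ coef)
```

At convergence the penalised score C'(y − μ) − λGa is zero, which gives a simple check on the iterations.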
25
The data
Male policyholders, source: Continuous Mortality Investigation Bureau (CMIB).
For each calendar year (1947-1999) and each age (11-100) we have:
• Number of years lived (the exposure).
• Number of policy claims (deaths).
Mortality of male policyholders has improved rapidly over the last 30 years
⇓ Model mortality trends over time and dependence on age.
26
27
Additive model: Fitted curves for Ages 34 and 60
[Figure: log(mu) against year (1950 to 2000), one panel for Age 34 and one for Age 60.]
28
Tensor model: Fitted curves for Ages 34 and 60
[Figure: log(mu) against year (1950 to 2000), one panel for Age 34 and one for Age 60.]
29
Age-Period
Age-Period-Cohort
Tensor
30
Forecasting with P-splines
Treat the forecasting of future values as a missing value problem.
• We have data for ny years and na ages and wish to forecast nf years
• Define a weight matrix V = blockdiagonal(I, 0), where I is an identity matrix of size ny na and 0 is a square zero matrix of size nf na
• Define a new basis, VB, in place of B and proceed as before
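The missing-value idea can be illustrated in one dimension. The sketch below assumes a single series (one age), an identity basis with one coefficient per year (a Whittaker-type smoother, used here only to keep the example short) and a second-order difference penalty; the series, the function name and λ are hypothetical. Future years get weight zero, so their fitted values are determined by the penalty alone.

```python
import numpy as np

def forecast_by_weights(y, n_f, lam=10.0):
    """Smooth n observed values and extrapolate n_f further ones via zero weights."""
    n = len(y)
    y_aug = np.concatenate([y, np.zeros(n_f)])         # placeholders for the future values
    v = np.concatenate([np.ones(n), np.zeros(n_f)])    # diagonal of V = blockdiag(I, 0)
    B = np.eye(n + n_f)                                # assumed basis: one coefficient per year
    VB = v[:, None] * B                                # the weighted basis
    D = np.diff(np.eye(n + n_f), n=2, axis=0)
    a = np.linalg.solve(VB.T @ VB + lam * D.T @ D, VB.T @ (v * y_aug))
    return B @ a

# Hypothetical downward-trending series.
rng = np.random.default_rng(6)
years = np.arange(40)
y = -7.0 - 0.02 * years + rng.normal(scale=0.05, size=years.size)
z = forecast_by_weights(y, n_f=10)
```

With a second-order penalty the forecast region has zero second differences, so the forecast is a linear continuation of the smoothed trend; a different penalty order would give a different extrapolation.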
31
Forecast
[Figure: log(mu) against year, one panel for Age 34 and one for Age 60. Legend: True, Prediction, C.I.]
32
P-splines for longitudinal data
33
The data
Objective: determine the effect of 4 surgical treatments on coronary sinus potassium in dogs
• 36 dogs
• 4 treatments
• 7 measurements per dog
34
[Figure: potassium against time (2 to 12), one panel per treatment group (Group 1 to Group 4).]
35
Models for longitudinal data
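Basic Model: $y_{ij} = \alpha_0 + \alpha_1 t_{ij} + \beta_{i0} + \varepsilon_{ij}$, $1 \le j \le 7$, $1 \le i \le 36$
⇓ Relax linearity assumption
Model A: $y_{ij} = f_{gr(i)}(t_{ij}) + \beta_{i0} + \varepsilon_{ij}$, $1 \le gr(i) \le 4$
⇓ Add random slope + general covariance matrix
Model B: $y_{ij} = f_{gr(i)}(t_{ij}) + \beta_{i0} + \beta_{i1} t_{ij} + \varepsilon_{ij}$
⇓ Subject-specific curves
Model C: $y_{ij} = f_{gr(i)}(t_{ij}) + g_i(t_{ij}) + \varepsilon_{ij}$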
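36
The mixed model associated with Model A is:
$$y = X\beta + Zu + \varepsilon, \qquad \mathrm{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \Sigma_{gr} & 0 & 0 \\ 0 & \sigma_{\beta_0}^2 I & 0 \\ 0 & 0 & \sigma^2 I \end{bmatrix}$$
$$X = \begin{bmatrix} X_{time} \\ \vdots \\ X_{time} \end{bmatrix}, \qquad X_{time} = \begin{bmatrix} 1 & t_1 \\ \vdots & \vdots \\ 1 & t_7 \end{bmatrix}$$
$$Z = \left[\, \mathrm{blockdiag}(Z_1, Z_2, Z_3, Z_4) \;:\; I_{36} \otimes 1_7 \,\right], \qquad Z_{gr(i)} = \begin{bmatrix} Z_{time} \\ \vdots \\ Z_{time} \end{bmatrix}$$
$$\Sigma_{gr} = \mathrm{blockdiag}(\sigma_1^2 I, \sigma_2^2 I, \sigma_3^2 I, \sigma_4^2 I)$$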
37
[Figure: four panels of potassium against time (2 to 12).]
38
The mixed model associated with Model B is:
$$y = X\beta + Zu + \varepsilon, \qquad \mathrm{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \Sigma_{gr} & 0 & 0 \\ 0 & \mathrm{blockdiag}(\Sigma) & 0 \\ 0 & 0 & \sigma^2 I \end{bmatrix}$$
$$Z = \left[\, \mathrm{blockdiag}(Z_1, Z_2, Z_3, Z_4) \;:\; I_{36} \otimes X_{time} \,\right]$$
39
The mixed model associated with Model C is:
$$y = X\beta + Zu + \varepsilon, \qquad \mathrm{Cov}\begin{bmatrix} u \\ \varepsilon \end{bmatrix} = \begin{bmatrix} \Sigma_{gr} & 0 & 0 & 0 \\ 0 & \mathrm{blockdiag}(\Sigma) & 0 & 0 \\ 0 & 0 & \sigma_c^2 I & 0 \\ 0 & 0 & 0 & \sigma^2 I \end{bmatrix}$$
$$Z = \left[\, \mathrm{blockdiag}(Z_1, Z_2, Z_3, Z_4) \;:\; I_{36} \otimes X_{time} \;:\; I_{36} \otimes Z_{time} \,\right]$$
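The design matrices for Models A, B and C can be assembled with Kronecker products. The sketch below is one possible layout, assuming observations are ordered by group and then by dog, that the group-specific spline block is block diagonal, and that every dog is measured at the same 7 times. The group sizes, measurement times and knots are hypothetical, since the slides do not give them.

```python
import numpy as np

n_per_group = [9, 9, 9, 9]               # hypothetical split of the 36 dogs
t = np.linspace(1.0, 13.0, 7)            # hypothetical measurement times
knots = np.array([4.0, 7.0, 10.0])       # hypothetical knots

X_time = np.column_stack([np.ones_like(t), t])
Z_time = np.maximum(t[:, None] - knots[None, :], 0.0)
n_dogs, n_t, K = sum(n_per_group), len(t), len(knots)

# Fixed effects: X_time stacked once per dog.
X = np.kron(np.ones((n_dogs, 1)), X_time)

# Group-specific spline block: blockdiag(Z_1, ..., Z_4), Z_g = Z_time stacked per dog.
Z_spline = np.zeros((n_t * n_dogs, 4 * K))
row = 0
for g, n_g in enumerate(n_per_group):
    Z_g = np.kron(np.ones((n_g, 1)), Z_time)
    Z_spline[row:row + Z_g.shape[0], g * K:(g + 1) * K] = Z_g
    row += Z_g.shape[0]

Z_intercept = np.kron(np.eye(n_dogs), np.ones((n_t, 1)))   # random intercept per dog
Z_slope = np.kron(np.eye(n_dogs), X_time)                  # random intercept and slope per dog
Z_curve = np.kron(np.eye(n_dogs), Z_time)                  # subject-specific spline terms

Z_A = np.hstack([Z_spline, Z_intercept])
Z_B = np.hstack([Z_spline, Z_slope])
Z_C = np.hstack([Z_spline, Z_slope, Z_curve])
```

Each model adds a block of columns to Z, so moving from A to C changes only the random-effects design and the corresponding variance components.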
40
[Figure: eight panels of potassium against time (2 to 12).]
41
Conclusions and work in progress
• P-splines are a useful tool for modelling data in many situations
• P-splines as mixed models
• Easy to implement in standard software
• Model selection
42
References
• Currie, I., Durban, M. and Eilers, P. (2003). Smoothing and forecasting mortality rates.
• Currie, I. and Durban, M. (2002). Flexible smoothing with P-splines: a unified approach. Statistical Modelling, 2.
• Durban, M. and Currie, I. (2003). A note on P-spline additive models with correlated errors. Computational Statistics, 18.
• Durban, M., Harezlak, J., Carroll, R. and Wand, M. (2003). Simple fitting of subject-specific curves for longitudinal data.
• Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Statist. Sci., 11.
• Ruppert, D., Wand, M.P. and Carroll, R.J. (2003). Semiparametric Regression. Cambridge University Press.
• Wand, M.P. (2003). Smoothing and mixed models. Comput. Stat., 18.