Flexible Regression and Smoothing: The gamlss.family Distributions
Bob Rigby Mikis Stasinopoulos
Dipartimento di Scienze Cliniche e di Comunità, Università degli Studi di Milano, Italy, January 2018
Bob Rigby, Mikis Stasinopoulos Flexible Regression and Smoothing 2016 1 / 58
Outline
1 Types of distribution in GAMLSS
2 Continuous distributions
    Skewness and kurtosis
    Types of continuous distributions
    Distributions on ℝ
    Distributions on ℝ+
    Distributions on ℝ(0,1)
    Summary of methods of generating distributions
    Theoretical Comparison
    Types of tails
    Fitting a distribution
3 End
Information
Types of distribution in GAMLSS
Different types of distribution in GAMLSS
1 continuous distributions: f_Y(y | θ), usually defined on (−∞,+∞), (0,+∞) or (0, 1).
2 discrete distributions: P(Y = y | θ), defined on y = 0, 1, 2, . . . , n, where n is a known finite value or n is infinite, i.e. usually discrete (count) values.
3 mixed distributions: (finite mixture distributions) mixtures of continuous and discrete distributions, i.e. continuous distributions where the range of Y has been expanded to include some discrete values with non-zero probabilities.
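The three types can be sketched in base R, using only stats functions so that it runs without the gamlss packages. The gamma parameterisation (mean mu, coefficient of variation sigma) and the zero adjusted gamma with point probability nu at zero are assumptions chosen to mirror GA(µ, σ) and ZAGA(µ, σ, ν); the helper names d_ga, p_po and d_zaga are hypothetical.

```r
# Continuous: gamma density with mean mu and coefficient of variation sigma
# (assumed to correspond to GA(mu, sigma)).
d_ga <- function(y, mu = 1, sigma = 0.5) {
  dgamma(y, shape = 1 / sigma^2, scale = mu * sigma^2)
}

# Discrete: Poisson probabilities on y = 0, 1, 2, ...
p_po <- function(y, mu = 2) dpois(y, lambda = mu)

# Mixed: probability nu at y = 0 and (1 - nu) times a gamma density on y > 0
# (assumed to correspond to ZAGA(mu, sigma, nu)).
d_zaga <- function(y, mu = 1, sigma = 0.5, nu = 0.1) {
  ifelse(y == 0, nu, (1 - nu) * d_ga(y, mu, sigma))
}

# The continuous part carries probability 1 - nu, so the total is one.
cont_mass <- integrate(function(y) d_zaga(y), lower = 0, upper = Inf)$value
total <- d_zaga(0) + cont_mass
```

The point probability at zero is what makes the distribution mixed: d_zaga(0) is a probability, while d_zaga(y) for y > 0 is a density.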
Different types of distribution: demos
1 demo.GA()
2 demo.PO()
3 demo.BI()
4 demo.BE()
5 demo.BEINF()
Example of continuous distribution: SEP1
[Figure: four panels (a)-(d) of SEP1 probability density functions, f(y) against y.]
Example of discrete distribution: Sichel
[Figure: four panels of Sichel probability functions, f(y) against y = 0, ..., 20: SICHEL(mu = 5, sigma = 11.04, nu = 0.98), SICHEL(mu = 5, sigma = 1.151, nu = 0), SICHEL(mu = 5, sigma = 0.9602, nu = −1) and SICHEL(mu = 5, sigma = 43.9, nu = −3).]
Example of mixed distribution: ZAGA
[Figure: two panels for the zero adjusted gamma (ZAGA). Left: the pdf against y, with a point probability marked at y = 0. Right: the c.d.f. F(y) against y.]
Continuous distributions
Skewness and kurtosis
Figure: Showing different types of continuous distributions (four panels of f(y) against y: negative skewness, positive skewness, platykurtosis and leptokurtosis).
Types of continuous distributions
(−∞,∞): the real line ℝ
(0,∞): the positive real line ℝ+
(0, 1): the real line on the interval ℝ(0,1) (not containing zero or one)
Distributions on ℝ
Location and scale family
For these distributions, if
Y ∼ D(µ, σ, ν, τ)
then
ε = (Y − µ)/σ ∼ D(0, 1, ν, τ),
i.e. Y = µ + σε, so Y is a scaled and shifted version of the random variable ε.
All (−∞,∞) distributions in GAMLSS except exGAUS(µ, σ, ν) are location-scale families.
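The location-scale property implies f_Y(y) = f_ε((y − µ)/σ)/σ. A minimal base R check of that identity, using the normal and logistic densities from the stats package as stand-ins for NO(µ, σ) and LO(µ, σ) (this correspondence is an assumption):

```r
y <- seq(-4, 4, by = 0.5)
mu <- 1
sigma <- 0.7
z <- (y - mu) / sigma   # standardised values, eps = (Y - mu) / sigma

# Density of Y recovered from the standardised density: f_Y(y) = f_eps(z) / sigma.
diff_no <- max(abs(dnorm(y, mean = mu, sd = sigma) - dnorm(z) / sigma))
diff_lo <- max(abs(dlogis(y, location = mu, scale = sigma) - dlogis(z) / sigma))
```

Both differences are zero up to rounding error, so changing µ only shifts the density and changing σ only stretches it.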
Location-scale family
[Figure: Gumbel densities, f(y) against y. (a) changing the location parameter: GU(0,1), GU(−1,1), GU(1,1); (b) changing the scale parameter: GU(0,1), GU(0,0.7), GU(0,1.5).]
Two parameter on ℝ
Two parameter distributions on (−∞,∞)
Gumbel: GU(µ, σ), left skew
Logistic: LO(µ, σ), leptokurtic
Normal: NO(µ, σ) and NO2(µ, σ)
Reverse Gumbel: RG(µ, σ), right skew
[Figure: two-parameter densities, f(y) against y. (a) NO and LO distributions: NO(0,1), LO(0,1); (b) NO, GU and RG distributions: NO(0,1), GU(0,1), RG(0,1).]
Three parameter on ℝ
exponential Gaussian: exGAUS(µ, σ, ν) for modelling right skew data;
normal family: NOF(µ, σ, ν) for modelling mean and variance relationships following the power law;
power exponential: PE(µ, σ, ν) and PE2(µ, σ, ν) for modelling lepto- and platykurtic data;
t family: TF(µ, σ, ν) and TF2(µ, σ, ν) for modelling leptokurtic data;
skew normal: SN1(µ, σ, ν) and SN2(µ, σ, ν) for modelling skewness in data.
Three parameter on ℝ: skew normal type 1
[Figure: SN1 densities, f(y) against y, in two panels: ν = −100, −3, −1, 0 and ν = 100, 3, 1, 0.]
Three parameter on ℝ: PE and TF distributions
[Figure: densities f(y) against y. (a) power exponential for ν = 1, 2, 10, 100; (b) the t family.]
Four parameter on ℝ
exponential generalised beta type 2: EGB2(µ, σ, ν, τ), skewness and leptokurtosis;
generalised t: GT(µ, σ, ν, τ), kurtosis;
Johnson's SU: JSU(µ, σ, ν, τ) and JSUo(µ, σ, ν, τ), skewness and leptokurtosis;
normal-exponential-t: NET(µ, σ, ν, τ), for modelling location and scale robustly;
skew exponential power: SEP1(µ, σ, ν, τ), SEP2(µ, σ, ν, τ), SEP3(µ, σ, ν, τ) and SEP4(µ, σ, ν, τ), skewness and lepto-platy kurtosis;
sinh-arcsinh: SHASH(µ, σ, ν, τ), SHASHo(µ, σ, ν, τ) and SHASHo2(µ, σ, ν, τ), skewness and lepto-platy kurtosis;
skew t: ST1(µ, σ, ν, τ), ST2(µ, σ, ν, τ), ST3(µ, σ, ν, τ), ST4(µ, σ, ν, τ), ST5(µ, σ, ν, τ) and SST(µ, σ, ν, τ), skewness and leptokurtosis.
Example of four parameter on ℝ: SEP1
[Figure: four panels (a)-(d) of SEP1 densities, f(y) against y.]
The NET distribution
[Figure: the NET density f_Y(y) against Y, with regions labelled normal, exponential and t-distribution.]
Summary of GAMLSS family distributions defined on (−∞,+∞)
Distributions     family        no par.  skewness     kurtosis
Exp. Gaussian     exGAUS        3        positive     -
Exp.G. beta 2     EGB2          4        both         lepto
Gen. t            GT            4        (symmetric)  lepto
Gumbel            GU            2        (negative)   -
Johnson's SU      JSU, JSUo     4        both         lepto
Logistic          LO            2        (symmetric)  (lepto)
Normal-Expon.-t   NET           2,(2)    (symmetric)  lepto
Normal            NO-NO2        2        (symmetric)
Normal Family     NOF           3        (symmetric)  (meso)
Power Expon.      PE-PE2        3        (symmetric)  both
Reverse Gumbel    RG            2        positive
Sinh Arcsinh      SHASH,        4        both         both
Skew Exp. Power   SEP1-SEP4     4        both         both
Skew t            ST1-ST5, SST  4        both         lepto
t Family          TF            3        (symmetric)  lepto
Distributions on ℝ+
Continuous distribution on ℝ+: scale family
If a random variable is distributed as
Y ∼ D(µ, σ, ν, τ)
and
ε = (Y/µ) ∼ D(1, σ, ν, τ)
then Y has a scale family distribution and the random variable
Y = µε
is a scaled version of the random variable ε. µ does not affect the shape.
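For a scale family the density satisfies f_Y(y) = f_ε(y/µ)/µ. A small base R check with the Weibull density, assuming that WEI(µ, σ) corresponds to dweibull() with scale = mu and shape = sigma:

```r
y <- seq(0.1, 6, by = 0.1)
mu <- 2
sigma <- 1.5

# eps = Y / mu has scale parameter 1 and the same shape parameter sigma.
lhs <- dweibull(y, shape = sigma, scale = mu)
rhs <- dweibull(y / mu, shape = sigma, scale = 1) / mu
diff_wei <- max(abs(lhs - rhs))
```

The difference is zero up to rounding error: µ rescales the horizontal axis while σ alone controls the shape.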
Continuous distribution on ℝ+: Weibull pdf
[Figure: Weibull pdf, f(y) against y, for σ = 0.5, 1, 6, in two panels: µ = 1 and µ = 2.]
Continuous distribution on ℝ+: one and two parameters
exponential: EXP(µ);
gamma: GA(µ, σ), a member of the exponential family;
inverse gamma: IGAMMA(µ, σ);
inverse Gaussian: IG(µ, σ), a member of the exponential family;
log-normal: LOGNO(µ, σ) and LOGNO2(µ, σ);
Pareto: PARETO(µ, σ), PARETO2o(µ, σ) and GP(µ, σ) for heavy tails;
Weibull: WEI(µ, σ), WEI2(µ, σ) and WEI3(µ, σ), used in survival analysis.
Continuous distribution on ℝ+: Weibull survival and hazard functions
[Figure: Weibull (a) survival function S(y) and (b) hazard function h(y) against y, for σ = 0.5, 1, 6.]
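The Weibull survival and hazard functions can be computed as S(y) = 1 − F(y) and h(y) = f(y)/S(y). A minimal base R sketch, assuming that WEI(µ, σ) corresponds to dweibull() and pweibull() with scale = mu and shape = sigma; the helper names surv and haz are hypothetical:

```r
y <- c(0.5, 1, 2)
mu <- 1

# Survival function S(y) = P(Y > y).
surv <- function(y, sigma) {
  pweibull(y, shape = sigma, scale = mu, lower.tail = FALSE)
}

# Hazard function h(y) = f(y) / S(y).
haz <- function(y, sigma) {
  dweibull(y, shape = sigma, scale = mu) / surv(y, sigma)
}

h_decreasing <- haz(y, sigma = 0.5)  # sigma < 1: hazard falls with y
h_constant   <- haz(y, sigma = 1)    # sigma = 1: exponential, constant hazard
h_increasing <- haz(y, sigma = 6)    # sigma > 1: hazard rises with y
```

The three values of σ match those in the figure and cover the decreasing, constant and increasing hazard cases.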
Continuous distribution on ℝ+: three parameters
Box-Cox Cole and Green: BCCG(µ, σ, ν), also known as the LMS method in centile estimation;
generalised gamma: GG(µ, σ, ν);
generalised inverse Gaussian: GIG(µ, σ, ν);
log-normal family: LNO(µ, σ, ν), based on the standard Box-Cox transformation
$$Z = \begin{cases} (Y^{\nu} - 1)/\nu & \text{if } \nu \neq 0 \\ \log(Y) & \text{if } \nu = 0 \end{cases} \qquad (1)$$
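Transformation (1) can be written directly as an R function; box_cox is a hypothetical helper name. The ν = 0 branch is the limit of the other branch as ν tends to zero, which the sketch checks numerically:

```r
# Standard Box-Cox transformation of positive values y.
box_cox <- function(y, nu) {
  if (nu == 0) log(y) else (y^nu - 1) / nu
}

y <- c(0.5, 1, 2, 4)
z_log   <- box_cox(y, 0)      # log(y)
z_small <- box_cox(y, 1e-6)   # close to log(y) for nu near zero
z_one   <- box_cox(y, 1)      # y - 1: a shift only, no change of shape
```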
Continuous distribution on ℝ+: four parameters
Box-Cox power exponential: BCPE(µ, σ, ν, τ) and BCPEo(µ, σ, ν, τ), skewness and platy-lepto kurtosis;
Box-Cox t: BCT(µ, σ, ν, τ) and BCTo(µ, σ, ν, τ), skewness and leptokurtosis;
generalised beta type 2: GB2(µ, σ, ν, τ), skewness and platy-lepto kurtosis.
Continuous distribution on ℝ+: four parameters, BCT
Figure: Box-Cox t distribution BCT(µ, σ, ν, τ). Parameter values µ = 1, σ = (0.15, 0.20, 0.50), ν = (−4, 0, 2), τ = (1, 5, 100). (Grid of density panels: one column per σ, one row per τ, one curve per ν.)
Continuous GAMLSS family distributions defined on (0,+∞)
Distributions         family    no par.  skewness    kurtosis
BCCG                  BCCG      3        both
BCPE                  BCPE      4        both        both
BCT                   BCT       4        both        lepto
Exponential           EXP       1        (positive)  -
Gamma                 GA        2        (positive)  -
Gen. Beta type 2      GB2       4        both        both
Gen. Gamma            GG-GG2    3        positive    -
Gen. Inv. Gaussian    GIG       3        positive    -
Inv. Gaussian         IG        2        (positive)  -
Log Normal            LOGNO     2        (positive)  -
Log Normal family     LNO       2,(1)    positive
Reverse Gen. Extreme  RGE       3        positive    -
Weibull               WEI-WEI3  2        (positive)  -
Transformation from (−∞,∞) to (0,+∞)
Any distribution for Z on (−∞,∞) can be transformed to a corresponding distribution for Y = exp(Z) on (0,+∞).
For example: from the t distribution to the log t distribution
gen.Family("TF", type="log")
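What the log transformation does to a density can be sketched in base R without generating a new family. By change of variable, Y = exp(Z) has density f_Y(y) = f_Z(log y)/y. The helpers d_tf, d_log_tf and p_log_tf below are hypothetical names, and treating TF(µ, σ, ν) as a location-scale t distribution is an assumption:

```r
# Location-scale t density for Z (assumed to correspond to TF(mu, sigma, nu)).
d_tf <- function(z, mu = 0, sigma = 1, nu = 10) {
  dt((z - mu) / sigma, df = nu) / sigma
}

# Density of Y = exp(Z): f_Y(y) = f_Z(log y) / y.
d_log_tf <- function(y, mu = 0, sigma = 1, nu = 10) {
  d_tf(log(y), mu, sigma, nu) / y
}

# Distribution function of Y: P(Y <= q) = P(Z <= log q).
p_log_tf <- function(q, mu = 0, sigma = 1, nu = 10) {
  pt((log(q) - mu) / sigma, df = nu)
}

# The density integrated over (0.5, 5) should match the cdf difference.
area <- integrate(function(y) d_log_tf(y, 0, 0.5, 10), lower = 0.5, upper = 5)$value
prob <- p_log_tf(5, 0, 0.5, 10) - p_log_tf(0.5, 0, 0.5, 10)
```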
Positive response: log T distribution
[Figure: log t densities dlogTF(x, mu, sigma, nu) against x. (a) sigma = 1, nu = 10 with mu = −5, −1, 0, 1, 2; (b) mu = 0, nu = 10 with sigma = 0.5, 1, 2, 5; (c) mu = 0, sigma = 0.5 with nu = 1000, 10, 2, 1.]
[Figure: four panels for the log t distribution with mu = 0: the pdf dlogTF(x, mu = 0), the cdf plogTF(x, mu = 0), the inverse cdf qlogTF(x, mu = 0) and a histogram of Y.]
Truncating from (−∞,∞) to (0,+∞)
Any distribution for Z on (−∞,∞) can be truncated to a corresponding distribution for Y on (0,+∞).
For example: from the SST(µ, σ, ν, τ) distribution to the flipped-SST(µ, σ, ν, τ) distribution
gen.trun(0,"SST",type="left")
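The effect of left truncation at zero can be sketched in base R: the density is set to zero below the truncation point and rescaled by the probability remaining above it, f_tr(y) = f(y)/(1 − F(0)) for y > 0. Base R has no SST distribution, so a location-scale t distribution is used here as a stand-in; the helper names are hypothetical.

```r
# Location-scale t density and distribution function (stand-in for SST).
d_t <- function(y, mu, sigma, nu) dt((y - mu) / sigma, df = nu) / sigma
p_t <- function(q, mu, sigma, nu) pt((q - mu) / sigma, df = nu)

# Left truncation at 0: rescale by the probability of exceeding 0.
d_t_trunc <- function(y, mu = 1, sigma = 0.5, nu = 10) {
  ifelse(y > 0, d_t(y, mu, sigma, nu) / (1 - p_t(0, mu, sigma, nu)), 0)
}

# The truncated density integrates to one over (0, Inf).
area <- integrate(d_t_trunc, lower = 0, upper = Inf)$value
```

Truncation on both sides follows the same pattern, dividing by F(upper) − F(lower) instead.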
[Figure: truncated SST densities dSSTtr(x, mu, sigma, nu, tau) against x on (0, 4). (a) sigma = 0.5, nu = 1, tau = 10 with mu = 0.1, 1, 2; (b) mu = 2, nu = 1, tau = 10 with sigma = 0.5, 1, 2; (c) mu = 2, sigma = 0.5, tau = 10 with nu = 0.1, 1, 2; (d) mu = 2, sigma = 0.5, nu = 1 with tau = 3, 5, 100.]
Distributions on ℝ(0,1)
Continuous GAMLSS family distributions defined on (0, 1)
Distributions            family   no par.  skewness  kurtosis
Beta                     BE       2        (both)    -
Beta original            BEo      2        (both)    -
Logit normal             LOGITNO  2        (both)    -
Generalized beta type 1  GB1      4        (both)    (both)
Transformation from (−∞,∞) to (0, 1)
Any distribution for Z on (−∞,∞) can be transformed to a corresponding distribution for Y = 1/(1 + e^{−Z}) on (0, 1).
For example: from the t distribution to the logit t distribution
gen.Family("TF", "logit")
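The logit case can be sketched in the same way in base R. With Z = log(Y/(1 − Y)), the density of Y is f_Y(y) = f_Z(logit y)/(y(1 − y)). The helpers d_tf and d_logit_tf are hypothetical names, and treating TF(µ, σ, ν) as a location-scale t distribution is an assumption:

```r
# Location-scale t density for Z (assumed to correspond to TF(mu, sigma, nu)).
d_tf <- function(z, mu = 0, sigma = 1, nu = 10) {
  dt((z - mu) / sigma, df = nu) / sigma
}

# Density of Y = 1 / (1 + exp(-Z)) on (0, 1).
d_logit_tf <- function(y, mu = 0, sigma = 1, nu = 10) {
  z <- qlogis(y)                           # Z = log(y / (1 - y))
  d_tf(z, mu, sigma, nu) / (y * (1 - y))   # Jacobian dz/dy = 1 / (y (1 - y))
}

# The density integrated over (0.1, 0.9) should match the probability
# that Z lies between logit(0.1) and logit(0.9).
area <- integrate(d_logit_tf, lower = 0.1, upper = 0.9)$value
prob <- pt(qlogis(0.9), df = 10) - pt(qlogis(0.1), df = 10)

# With mu = 0 the density is symmetric about 0.5.
sym_diff <- abs(d_logit_tf(0.3) - d_logit_tf(0.7))
```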
logit T distribution
[Figure: logit t densities dlogitTF(x, mu, sigma, nu) against x on (0, 1). (a) sigma = 1, nu = 10 with mu = −5, −1, 0, 1, 5; (b) mu = 0, nu = 10 with sigma = 0.5, 1, 2, 5; (c) mu = 0, sigma = 1 with nu = 1000, 10, 5, 1.]
[Figure: four panels for the logit t distribution with mu = 0: the pdf dlogitTF(x, mu = 0), the cdf plogitTF(x, mu = 0), the inverse cdf qlogitTF(x, mu = 0) and a histogram of Y.]
Truncating from (−∞,∞) to (0, 1)
Any distribution for Z on (−∞,∞) can be truncated to a corresponding distribution for Y on (0, 1).
For example: from the SST(µ, σ, ν, τ) distribution to the trun-SST(µ, σ, ν, τ) distribution
gen.trun(c(0,1),"SST",type="both")
Truncated SST on (0, 1)
[Figure: truncated SST densities dSSTtr(x, mu, sigma, nu, tau) against x on (0, 1). (a) sigma = 0.1, nu = 1, tau = 10 with mu = 0.1, 0.5, 0.9; (b) mu = 0.5, nu = 1, tau = 10 with sigma = 0.1, 0.2, 0.5; (c) mu = 0.5, sigma = 0.2, tau = 10 with nu = 0.1, 2, 0.5; (d) mu = 0.5, sigma = 0.2, nu = 1 with tau = 3, 4, 10.]
Summary of methods of generating distributions
1 univariate transformation from a single random variable (LOGNO, BCCG, BCPE, BCT, EGB2, SHASHo, inverse log and logit transformations, . . .)
2 transformation from two or more random variables (TF, ST2, exGAUS)
3 truncation distributions
4 a (continuous or finite) mixture of distributions (TF, GT, GB2, EGB2)
5 Azzalini type methods (SN1, SEP1, SEP2, ST1, ST2)
6 splicing distributions (SN2, SEP3, SEP4, ST3, ST4, NET)
7 stopped sums
8 systems of distributions
Theoretical Comparison
Comparison of properties of distributions
1 Explicit pdf, cdf and inverse cdf
2 Explicit centiles and centile based measures (e.g. median)
3 Explicit moment based measures (e.g. mean)
4 Explicit mode(s)
5 Continuity of the pdf and its derivatives with respect to y
6 Continuity of the pdf with respect to µ, σ, ν and τ
7 Flexibility in modelling skewness and kurtosis
8 Range of Y
Theoretical comparison of distributions
OK, we have a lot of distributions in GAMLSS. Do we need any more?
Do we need all of them, or are some of them redundant?
Is there any way to compare them theoretically?
We can compare the tail behaviour of the distributions.
Since most of them are location-scale families, we can compare their flexibility in terms of skewness and kurtosis.
Comparing tails: real line
[Figure: log(pdf) against x on (−10, 10) for the Normal, Cauchy and Laplace distributions.]
Comparing tails: positive real line
[Figure: log(pdf) against x on (0, 30) for the exponential, Pareto 2 and log Normal distributions.]
Types of tails
There are three main forms for $\log f_Y(y)$ for a tail of Y:
Type I: $-k_2 (\log |y|)^{k_1}$,
Type II: $-k_4 |y|^{k_3}$,
Type III: $-k_6 e^{k_5 |y|}$;
decreasing the k's results in a heavier tail.
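The three distributions compared on the real line illustrate the first two types. A minimal base R sketch, with the Laplace log density written by hand since base R does not provide it; the identification of the constants is worked out here for illustration and is not taken from the slides:

```r
y <- c(100, 1000, 10000)

log_normal  <- dnorm(y, log = TRUE)    # -y^2/2 + const: Type II, k3 = 2, k4 = 1/2
log_laplace <- log(0.5) - abs(y)       # -|y| + const:   Type II, k3 = 1, k4 = 1
log_cauchy  <- dcauchy(y, log = TRUE)  # about -2 log|y|: Type I, k1 = 1, k2 = 2

# What is left after removing the leading tail term is (nearly) constant.
rest_normal <- log_normal + y^2 / 2     # exactly -0.5 * log(2 * pi)
rest_cauchy <- log_cauchy + 2 * log(y)  # approaches -log(pi)

# Heavier tails have larger log densities far from the centre.
heavier <- log_cauchy > log_laplace & log_laplace > log_normal
```

The ordering Cauchy, Laplace, Normal from heaviest to lightest tail agrees with the plot of log(pdf) on the real line.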
Types of tails: classification (real line)
Types of tails: classification (positive real line)
Fitting a distribution
How to fit distributions in R
optim() or mle(): require initial parameter values;
gamlssML(): mle using a variation of the mle() function;
gamlss(): mle using the RS, CG or mixed algorithms;
histDist(): mle using the RS, CG or mixed algorithms, for a univariate sample only; plots a histogram of the data with the fitted distribution;
fitDist(): fits a set of distributions and chooses the one with the smallest GAIC.
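The first option can be sketched with optim() from base R on simulated data. The data, the choice of a normal distribution and the log parameterisation of sigma are all assumptions made for illustration; they are not the glass fibre analysis.

```r
set.seed(1)
y <- rnorm(200, mean = 1.5, sd = 0.3)   # simulated sample, for illustration only

# Negative log-likelihood of a normal distribution.
# sigma is parameterised on the log scale so that it stays positive.
negloglik <- function(par) {
  -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))
}

# optim() requires initial parameter values.
start <- c(mean(y), log(sd(y)))
fit <- optim(start, negloglik, method = "BFGS")

mu_hat    <- fit$par[1]
sigma_hat <- exp(fit$par[2])
```

For the normal distribution the maximum likelihood estimates are known in closed form (the sample mean and the square root of the mean squared deviation), which gives a direct check on the optimiser.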
Strength of glass fibres data
Data summary:
R data file: glass in package gamlss.data, of dimensions 63 × 1
source: Smith and Naylor (1987)
strength: the strength of glass fibres (the units of measurement are not given)
purpose: to demonstrate the fitting of a parametric distribution to the data
conclusion: a SEP4 distribution fits adequately
Strength of glass fibres data: SBC
[Figure: histogram of glass$strength (frequency against strength), titled "glass strength".]
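Strength of glass fibres data: creating truncated distribution
library(gamlss)     # provides fitDist() and loads gamlss.data (glass data)
library(gamlss.tr)  # provides gen.trun()
data(glass)
# generate a t family distribution truncated on the left at 0
gen.trun(par = 0, family = TF)
# A truncated family of distributions from TF has been generated
# and saved under the names:
# dTFtr pTFtr qTFtr rTFtr TFtr
# The type of truncation is left and the truncation parameter is 0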
Strength of glass fibres data: fitting the models
> m1 <- fitDist(strength, data=glass, k=2, extra="TFtr")  # AIC
> m2 <- fitDist(strength, data=glass, k=log(length(glass$strength)), extra="TFtr")  # SBC
> m1$fit[1:8]
    SEP4     SEP3   SHASHo     EGB2      JSU    BCPEo      ST2     BCTo
27.68790 27.98191 28.01800 28.38691 30.41380 31.17693 31.40101 31.47677
> m2$fit[1:8]
    SEP4     SEP3   SHASHo     EGB2       GU     WEI3      JSU       PE
36.26044 36.55444 36.59054 36.95945 38.19839 38.69995 38.98634 39.00792
Strength of glass fibres data: Results
histDist(glass$strength, SEP4, nbins = 13,
main = "SEP4 distribution",
method = mixed(20, 50))
Strength of glass fibres data: SBC
[Figure: histogram of glass$strength with the fitted density f() overlaid, titled "SEP4 distribution".]
Strength of glass fibres data: summary
Call: gamlss(formula = y~1, family = FA)
Mu Coefficients:
(Intercept)
1.581
Sigma Coefficients:
(Intercept)
-1.437
Nu Coefficients:
(Intercept)
-0.1280
Tau Coefficients:
(Intercept)
0.3183
Degrees of Freedom for the fit: 4 Residual Deg. of Freedom 59
Global Deviance: 19.8111
AIC: 27.8111
SBC: 36.3836
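The AIC and SBC in the summary are both instances of the generalised Akaike information criterion, GAIC(k) = global deviance + k × degrees of freedom, with k = 2 and k = log(n) respectively. A short arithmetic check using the values printed above and n = 63 observations; gaic is a hypothetical helper name:

```r
gd <- 19.8111   # global deviance from the summary
df <- 4         # degrees of freedom for the fit
n  <- 63        # number of observations in the glass data

# GAIC(k) = global deviance + k * degrees of freedom.
gaic <- function(k) gd + k * df

aic <- gaic(2)        # penalty k = 2
sbc <- gaic(log(n))   # penalty k = log(n)
```

Both values agree with the printed AIC and SBC to four decimal places.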
End
END
for more information see
www.gamlss.org