Archimedean Copulas Theodore Charitos MSc. Student CROSS.
-
Upload
rosalyn-heath -
Category
Documents
-
view
217 -
download
2
Transcript of Archimedean Copulas Theodore Charitos MSc. Student CROSS.
Archimedean Copulas
Theodore Charitos
MSc. Student
CROSS
Task-Goal
• Examination of relations between Archimedean copulas and diagonal band or minimum information copulas given correlation constraints
• Computation of relative information with respect to uniform distribution for each family.
Accomplishment of tasks
• Use of the algorithm provided in the paper of Christian Genest and Louis-Paul Rivest “Statistical Inference Procedures for Bivariate Archimedean Copulas”.
• Use of small program in Matlab for calculating numerically the relative information with respect to uniform distribution.
Structure of presentation
• Theoretical Background
• Explanation and description of the whole procedure proposed by Genest and Rivest.
• Analysis of datasets sampled from Unicorn software.
• Results
• Conclusions-Discussions
Theoretical Background
Definition: A bivariate distribution function with marginals and is said to be generated by an Archimedean copula if it can be expressed in the form
for some convex,
decreasing function on in such a way that
xF yG
yxH , yGxF 1
1,0 01
Proposition: Let and be uniform random variates whose
dependence function is of the form
for some convex decreasing function defined on with the
property that . Set ,
and . Then is uniformly distributed on ,
is distributed as and , are
independent random variables.
X Y
yxH , yx 1
1,0
01
YX
XU
YXHV ,
u
uu
' U 1,0
uuuK UV V
Proposition: Let X and Y be uniform random variables with dependence function . For let
and define
The function is convex,decreasing and satisfies if
and only if for all .
It is obvious from the above propositions that is determined as long as can be determined from the dataset. This will be done in our case via a nonparametric estimation of the distribution
of V based on a decomposition of Kendall’s tau.
yxH , 10 u
uYXHuK ,Pr tKuKut
lim
u 01
uuK 10 u
u u
uK
A pair of random variables is concordant if large values of one tend to be associated with large values of the other and vice versa. More precisely, if we have two observations
and from a vector of continuous random variables we say that and
are concordant if and . Similarly,
and are discordant if
and or vice versa.
ii yx ,
jj yx , YX , ii yx , jj yx ,
ji xx ji yy
ii yx , jj yx , ji xx
ji yy
DefinitionThe Kendall’s tau for the sample is defined as
where c is the number of concordant pairs, d is the number of discordant pairs from n observations of a vector and is the number of distinct pairs of observations in the sample.
2
n
dc
YX ,
2
n
For Archimedean copulas the Kendall’s tau statistic can be conveniently computed via the identity
Apparently, the problem now of estimating the bivariate dependence function relies on the estimation of . Genest and Rivest provide a nonparametric procedure for estimating
and also .
1
0
1414 duuVE
uK
Analysis of various datasetsThe algorithm proposed by Genest and Rivest uses
the variables where
the symbol # stands for the cardinality of a set. If
denotes the distribution function of a point mass at the origin, then a nonparametric estimator of is given by
Knowing that , a sample equivalent for the estimation of is
1
,:,#
n
YYXXYXV ijijjj
i
t
uK
ni
n n
VuuK
1
14 VE 14 Vn
Family
Clayton
Frank
Gumbel
-
a
v a 1 a
vv a12a
a
av
a
exp1
exp1log
av
a
ava
av
exp1
exp1log
exp
exp1 a
aD 141 1
1log av
1
log
a
vv
1a
a
The next step of the analysis concerns the performance of a Pearson chi-squared goodness of fit test statistic for each family in order to assess the fit of the various models. This means that a classification of the dataset is made each time constituting the observed frequencies. However, since the chi-squared test requires predicted values for its computation, it is necessary to generate random variates
whose joint distribution belongs to one of the mentioned Archimedean families
vu,
Algorithm for sampling from Archimedean families
1.Generate two independent uniform variates
u and t.
2. Set
3. Set v =
4. The desired pair is
1,0
t
uw
'' 1
uw 1
vu,
Archimedean Family
Clayton
Frank
Gumbel
yxH ,
aaa yx/1
1
1
111ln
1a
ayax
e
ee
a
1/111 lnlnexp
aaa yx
Clayton’s joint density with a=1.514 Frank’s joint density with a=4.604
Gumbel’s joint density with a=0.757
In general, the relative information with respect to uniform distribution for the bivariate case is
computed as
where is the joint density of and
An approximation of the real solution in each case will be provided, which however is enough to indicate what should someone expect from each Archimedean family.
1
0
1
0
,log,/ dxdyyxhyxhuhI
yxh , X Y
• To illustrate the above procedure six datasets (n=1000) were at first sampled and thoroughly analyzed. The correlations were 0.2, 0.65 and 0.9 for both the diagonal band and the minimum information copulas. A classification of the frequencies was also decided.
• For the sake of completeness, six more datasets with similar correlations constraints but different size (n=5000) and classification were also analyzed in order to compare results.
44
66
Recapitulation-Steps
• Sample from diagonal band and minimum information copulas.
• Estimate Kendal’s tau and the empirical lambda function.
• Estimate the parameters for each family according to the previous results.
• Estimate the lambda functions for each family.
• Classify the dataset in categories and simulate values from each family according to their estimated parameters.
• Perform chi-square goodness of fit test and compare the resulting fits.
• Compute the relative information with respect to uniform distribution.
• Repeat the whole procedure for different correlations and size of the dataset.
Examples of classifications from diag.band with 0.2 Cross-Classification of X and Y (Observed values) X\Y
8 43 36 0 47 166 144 38 40 131 186 52 0 49 46 14 Cross-Classification of X and Y (Observed values) X\Y
67 126 118 111 78 1 105 222 230 173 118 78 129 211 207 131 203 131 126 196 121 199 244 128 80 119 183 233 239 127 0 84 123 152 126 77
1.0,0
5.0,1.0 9.0,5.0
1,9.0
44 1.0,0 5.0,1.0 9.0,5.0 1,9.0
66 1.0,0 3.0,1.0 5.0,3.0 7.0,5.0 9.0,7.0 1,9.0
1.0,0
3.0,1.0
5.0,3.0
7.0,5.0
9.0,7.0
1,9.0
=0.0877273
Statistic df
Clayton 47.2727 8
Frank 34.4113 8
Gumbel 49.5668 8
=0.1076116
Statistic df
Clayton 52.3819 7
Frank 32.7662 8
Gumbel 47.2727 8
n2
n2
=0.431007
Statistic df
Clayton 219.842 5
Frank 66.528 7
Gumbel 129.861 6
=0.424140
Statistic df
Clayton 113.601 5
Frank 24.808 7
Gumbel 119.047 7
n
n2
2
=0.6820981
Statistic df
Clayton 265.805 3
Frank 67.031 3
Gumbel 182.181 3
=0.6991151
Statistic df
Clayton 215.628 3
Frank 39.309 3
Gumbel 140.177 3
n
n2
2
=0.1083165
Statistic df
Clayton 356.184 25
Frank 301.967 25
Gumbel 357.828 25
=0.1040581
Statistic df
Clayton 148.598 25
Frank 65.967 25
Gumbel 94.2128 25
n
n2
2
=0.4270299
Statistic df
Clayton 1707.548 22
Frank 604.8511 23
Gumbel 1398.062 23
=0.4398811
Statistic df
Clayton 1044.165 23
Frank 127.982 23
Gumbel 557.023 23
n
n
2
2
=0.6801317
Statistic df
Clayton 1685.918 15
Frank 473.3928 17
Gumbel 1128.659 17
=0.6906168
Statistic df
Clayton 1142.479 14
Frank 124.384 16
Gumbel 677.908 17
n
n2
2
Relative Information with respect to uniform distribution
Clayton Frank Gumbel
Diag.band 0.2 (n=1000) 0.0144 0.8261 0.0887
Min.inf.0.2 (n=1000) 0.0215 0.7925 0.1099
Diag.band 065 (n=1000) 0.3207 0.4932 0.5222
Min.inf.0.65 (n=1000) 0.3107 0.4944 0.5126
Diag.band 0.9 (n=1000) 0.8653 0.6884 0.8794
Min.inf.0.9 (n=1000) 0.9202 0.7247 0.9015
Diag.band0.2 (n=5000) 0.0218 0.7913 0.1106
Min.inf.0.2 (n=5000) 0.0202 0.7983 0.1060
Diag.band 0.65 (n=5000) 0.3150 0.4939 0.5167
Min.inf.0.65 (n=5000) 0.3343 0.4920 0.5351
Diag.band0.9 (n=5000) 0.8592 0.6844 0.8768
Min.inf.0.9 (n=5000) 0.8924 0.7060 0.8905
Conclusions-Comments
• for correlation 0.2 all the three families seem to fit reasonably well when n=1000,but when n=5000 only Frank’s and also Gumbel’s for the min.information.
• for correlation 0.65 the results are quite promising only when n=1000
• for correlation 0.9 Frank’s and Gumbel’s family seem to fit the data when n=1000 and only Frank’s family when n=5000.
• the results are more promising in the cases of minimum information copula and this is actually a fact that holds for all the datasets no matter what the correlation is.
• the results are much better when the size of the dataset is smaller.
• It is obvious that the chi-square test statistic is sensitive to the number of cells.For greater size n and 6x6 cross-classification the results in almost all cases are disappointing. A performance of another goodness of fit test might result in more encouraging conclusions.
• for correlation 0.2 Clayton’s family has the smallest values of relative information with respect to uniform distribution.
• Nonetheless, for greater correlations, Frank’s family has the smallest values.