Inference for the mean vector. Univariate Inference Let x 1, x 2, …, x n denote a sample of n from...
donald-reynolds -
Inference for the mean vector
Univariate Inference
Let $x_1, x_2, \dots, x_n$ denote a sample of n from the normal distribution with mean $\mu$ and variance $\sigma^2$.
Suppose we want to test
$$H_0: \mu = \mu_0 \quad\text{vs}\quad H_A: \mu \neq \mu_0$$
The appropriate test is the t test:
The test statistic:
$$t = \sqrt{n}\,\frac{\bar{x} - \mu_0}{s}$$
Reject $H_0$ if $|t| > t_{\alpha/2}$
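As a small illustration of the t statistic above, the following sketch computes $t = \sqrt{n}(\bar{x} - \mu_0)/s$ in plain Python. The function name `one_sample_t` and the five scores are hypothetical and are not part of the original slides.

```python
import math

def one_sample_t(x, mu0):
    """Return (t, df) for H0: mu = mu0, with t = sqrt(n) * (xbar - mu0) / s."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # sample variance
    t = math.sqrt(n) * (xbar - mu0) / math.sqrt(s2)
    return t, n - 1

# Hypothetical scores, used only to exercise the formula (H0: mu = 60)
scores = [62.0, 58.0, 65.0, 61.0, 59.0]
t, df = one_sample_t(scores, 60.0)
print(f"t = {t:.4f} on {df} d.f.")
```

The statistic would then be compared with the tabled value $t_{\alpha/2}$ on n - 1 degrees of freedom.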
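The multivariate Test
Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ denote a sample of n from the p-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$.
Suppose we want to test
$$H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0 \quad\text{vs}\quad H_A: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$$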
Example
For n = 10 students we measure scores on
– Math proficiency test (x1),
– Science proficiency test (x2),
– English proficiency test (x3) and
– French proficiency test (x4)
The average score for each of the tests in previous years was 60. Has this changed?
The data
Student  Math  Science  Eng  French
   1      81     89      73    74
   2      73     79      73    74
   3      61     86      81    81
   4      55     70      76    73
   5      61     71      61    66
   6      52     70      56    58
   7      56     74      56    56
   8      65     87      73    69
   9      54     76      69    72
  10      48     71      62    63
Summary Statistics
the mean vector
$$\bar{\mathbf{x}} = \begin{pmatrix} 60.6 \\ 77.3 \\ 68.0 \\ 68.6 \end{pmatrix}$$
the sample covariance matrix
$$S = \begin{pmatrix} 102.044 & 56.689 & 41.222 & 39.489 \\ 56.689 & 56.456 & 42.000 & 35.356 \\ 41.222 & 42.000 & 75.778 & 65.111 \\ 39.489 & 35.356 & 65.111 & 61.378 \end{pmatrix}$$
the hypothesized mean vector
$$\boldsymbol{\mu}_0 = \begin{pmatrix} 60 \\ 60 \\ 60 \\ 60 \end{pmatrix}$$
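The summary statistics can be reproduced from the raw scores. The sketch below assumes NumPy is available; `np.cov` with `rowvar=False` uses the divisor n - 1, which matches the sample covariance matrix S used in these slides.

```python
import numpy as np

# Scores from the example: columns are Math, Science, English, French
X = np.array([
    [81, 89, 73, 74],
    [73, 79, 73, 74],
    [61, 86, 81, 81],
    [55, 70, 76, 73],
    [61, 71, 61, 66],
    [52, 70, 56, 58],
    [56, 74, 56, 56],
    [65, 87, 73, 69],
    [54, 76, 69, 72],
    [48, 71, 62, 63],
], dtype=float)

xbar = X.mean(axis=0)         # sample mean vector
S = np.cov(X, rowvar=False)   # sample covariance matrix (divisor n - 1)

print(np.round(xbar, 1))
print(np.round(S, 3))
```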
Roy’s Union-Intersection Principle
This is a general procedure for developing a multivariate test from the corresponding univariate test.
Let
$$\mathbf{X} = \begin{pmatrix} X_1 \\ \vdots \\ X_p \end{pmatrix}$$
denote the observation vector.
1. Convert the multivariate problem to a univariate problem by considering an arbitrary linear combination of the observation vector:
$$U = \mathbf{a}'\mathbf{X} = a_1 X_1 + \dots + a_p X_p$$
2. Perform the test for the arbitrary linear combination of the observation vector.
3. Repeat this for all possible choices of $\mathbf{a} = (a_1, \dots, a_p)'$.
4. Reject the multivariate hypothesis if $H_0$ is rejected for any one of the choices for $\mathbf{a}$.
5. Accept the multivariate hypothesis if $H_0$ is accepted for all of the choices for $\mathbf{a}$.
6. Set the type I error rate for the individual tests so that the type I error rate for the multivariate test is $\alpha$.
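Application of Roy’s principle to the following situation
Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ denote a sample of n from the p-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$.
Suppose we want to test
$$H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0 \quad\text{vs}\quad H_A: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$$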
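Let
$$u_i = \mathbf{a}'\mathbf{x}_i = a_1 x_{1i} + \dots + a_p x_{pi}$$
Then $u_1, \dots, u_n$ is a sample of n from the normal distribution with mean $\mathbf{a}'\boldsymbol{\mu}$ and variance $\mathbf{a}'\Sigma\mathbf{a}$.
To test
$$H_0^{\mathbf{a}}: \mathbf{a}'\boldsymbol{\mu} = \mathbf{a}'\boldsymbol{\mu}_0 \quad\text{vs}\quad H_A^{\mathbf{a}}: \mathbf{a}'\boldsymbol{\mu} \neq \mathbf{a}'\boldsymbol{\mu}_0$$
we would use the test statistic:
$$t_{\mathbf{a}} = \sqrt{n}\,\frac{\bar{u} - \mathbf{a}'\boldsymbol{\mu}_0}{s_u}$$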
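Now
$$\bar{u} = \frac{1}{n}\sum_{i=1}^{n} u_i = \frac{1}{n}\sum_{i=1}^{n} \mathbf{a}'\mathbf{x}_i = \mathbf{a}'\left(\frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i\right) = \mathbf{a}'\bar{\mathbf{x}}$$
and
$$s_u^2 = \frac{1}{n-1}\sum_{i=1}^{n} (u_i - \bar{u})^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(\mathbf{a}'\mathbf{x}_i - \mathbf{a}'\bar{\mathbf{x}}\right)^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left[\mathbf{a}'(\mathbf{x}_i - \bar{\mathbf{x}})\right]^2$$
$$= \frac{1}{n-1}\sum_{i=1}^{n} \mathbf{a}'(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'\mathbf{a} = \mathbf{a}'\left[\frac{1}{n-1}\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'\right]\mathbf{a} = \mathbf{a}'S\mathbf{a}$$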
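Thus
$$t_{\mathbf{a}} = \sqrt{n}\,\frac{\mathbf{a}'\bar{\mathbf{x}} - \mathbf{a}'\boldsymbol{\mu}_0}{\sqrt{\mathbf{a}'S\mathbf{a}}} = \frac{\sqrt{n}\,\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)}{\sqrt{\mathbf{a}'S\mathbf{a}}}$$
We will reject $H_0^{\mathbf{a}}: \mathbf{a}'\boldsymbol{\mu} = \mathbf{a}'\boldsymbol{\mu}_0$ if
$$|t_{\mathbf{a}}| = \frac{\sqrt{n}\,|\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)|}{\sqrt{\mathbf{a}'S\mathbf{a}}} \ge t_{\alpha/2}$$
or
$$t_{\mathbf{a}}^2 = \frac{n\left[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]^2}{\mathbf{a}'S\mathbf{a}} \ge t_{\alpha/2}^2$$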
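Using Roy’s Union-Intersection principle:
We will reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ in favour of $H_A: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$ if
$$t_{\mathbf{a}}^2 = \frac{n\left[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]^2}{\mathbf{a}'S\mathbf{a}} \ge t_{\alpha/2}^2 \quad\text{for at least one } \mathbf{a}$$
We accept $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if
$$t_{\mathbf{a}}^2 = \frac{n\left[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]^2}{\mathbf{a}'S\mathbf{a}} < t_{\alpha/2}^2 \quad\text{for all } \mathbf{a}$$
i.e. we reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if
$$\max_{\mathbf{a}} \frac{n\left[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]^2}{\mathbf{a}'S\mathbf{a}} \ge t_{\alpha/2}^2$$
and we accept $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if
$$\max_{\mathbf{a}} \frac{n\left[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]^2}{\mathbf{a}'S\mathbf{a}} < t_{\alpha/2}^2$$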
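Consider the problem of finding:
$$\max_{\mathbf{a}} h(\mathbf{a}) = \max_{\mathbf{a}} \frac{n\left[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]^2}{\mathbf{a}'S\mathbf{a}}$$
where
$$h(\mathbf{a}) = \frac{n\left[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]^2}{\mathbf{a}'S\mathbf{a}} = n\,\frac{\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'\mathbf{a}}{\mathbf{a}'S\mathbf{a}}$$
Setting the derivative with respect to $\mathbf{a}$ equal to zero:
$$\frac{\partial h(\mathbf{a})}{\partial \mathbf{a}} = n\,\frac{(\mathbf{a}'S\mathbf{a})\,2(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'\mathbf{a} - \mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'\mathbf{a}\;2S\mathbf{a}}{(\mathbf{a}'S\mathbf{a})^2} = \mathbf{0}$$
or
$$(\mathbf{a}'S\mathbf{a})\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) = \left[\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right] S\mathbf{a}$$
or
$$\mathbf{a}_{opt} = \frac{\mathbf{a}'S\mathbf{a}}{\mathbf{a}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)}\,S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) = k\,S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$$
thus
$$\max_{\mathbf{a}} h(\mathbf{a}) = h(\mathbf{a}_{opt}) = \frac{n\left[\mathbf{a}_{opt}'(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]^2}{\mathbf{a}_{opt}'S\mathbf{a}_{opt}} = \frac{n\,k^2\left[(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]^2}{k^2\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}SS^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)}$$
$$= n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$$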
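The maximization result can be checked numerically. The sketch below uses synthetic data (an assumption, not data from the slides) and NumPy: it evaluates $h(\mathbf{a})$ at $\mathbf{a}_{opt} = S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$ (taking k = 1) and at many random directions, none of which should exceed the maximum.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))   # synthetic sample, for illustration only
mu0 = np.zeros(p)             # hypothesized mean vector (assumed)

d = X.mean(axis=0) - mu0      # xbar - mu0
S = np.cov(X, rowvar=False)   # sample covariance matrix

def h(a):
    """Squared t statistic for the linear combination a'x."""
    return n * (a @ d) ** 2 / (a @ S @ a)

a_opt = np.linalg.solve(S, d)             # S^{-1}(xbar - mu0), i.e. k = 1
T2 = n * (d @ np.linalg.solve(S, d))      # n (xbar - mu0)' S^{-1} (xbar - mu0)

# h at 1000 random directions; each value should be at most T2
random_h = [h(rng.normal(size=p)) for _ in range(1000)]
print(h(a_opt), T2, max(random_h))
```

Because h is unchanged when $\mathbf{a}$ is rescaled, the constant k does not matter and any multiple of $S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$ attains the maximum.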
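Thus Roy’s Union-Intersection principle states:
We reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if
$$n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) \ge t_{\alpha/2}^2$$
We accept $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if
$$n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) < t_{\alpha/2}^2$$
The statistic
$$T^2 = n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$$
is called Hotelling’s $T^2$ statistic.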
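Choosing the critical value for Hotelling’s $T^2$ statistic
We reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if
$$T^2 = n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) \ge t_{\alpha/2}^2$$
To determine $t_{\alpha/2}^2$, we need to find the sampling distribution of $T^2$ when $H_0$ is true.
It turns out that if $H_0$ is true then
$$F = \frac{n-p}{p(n-1)}\,T^2 = \frac{n(n-p)}{p(n-1)}\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$$
has an F distribution with $\nu_1 = p$ and $\nu_2 = n - p$.
Thus
Hotelling’s $T^2$ test
We reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ if
$$T^2 = n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) \ge \frac{p(n-1)}{n-p}\,F_\alpha(p, n-p) = T_\alpha^2$$
or if
$$F = \frac{n-p}{p(n-1)}\,T^2 \ge F_\alpha(p, n-p)$$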
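A minimal sketch of the one-sample test statistic follows, assuming NumPy. The function name `hotelling_one_sample` and the data are hypothetical. As a sanity check, with p = 1 the statistic $T^2$ should equal the square of the univariate t statistic, and F should have (1, n - 1) degrees of freedom.

```python
import numpy as np

def hotelling_one_sample(X, mu0):
    """Return (T2, F, df1, df2) for H0: mu = mu0 from an n x p data matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    d = X.mean(axis=0) - np.asarray(mu0, dtype=float)
    S = np.atleast_2d(np.cov(X, rowvar=False))   # p x p, also when p = 1
    T2 = float(n * (d @ np.linalg.solve(S, d)))
    F = (n - p) / (p * (n - 1)) * T2             # F(p, n - p) under H0
    return T2, F, p, n - p

# Hypothetical univariate data: here T2 reduces to t squared
x = np.array([62.0, 58.0, 65.0, 61.0, 59.0]).reshape(-1, 1)
T2, F, df1, df2 = hotelling_one_sample(x, [60.0])
t = np.sqrt(len(x)) * (x.mean() - 60.0) / x.std(ddof=1)
print(T2, t ** 2, F, (df1, df2))
```

The F statistic would then be compared with a tabled value $F_\alpha(p, n-p)$; looking up that quantile is left out of the sketch.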
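Another derivation of Hotelling’s $T^2$ statistic
Another method of developing statistical tests is the Likelihood ratio method.
Suppose that the data vector, $\mathbf{x}$, has joint density $f(\mathbf{x}; \boldsymbol{\theta})$.
Suppose that the parameter vector, $\boldsymbol{\theta}$, belongs to the set $\Omega$. Let $\omega$ denote a subset of $\Omega$.
Finally we want to test
$$H_0: \boldsymbol{\theta} \in \omega \quad\text{vs}\quad H_A: \boldsymbol{\theta} \notin \omega$$
The Likelihood ratio test rejects $H_0$ if
$$\lambda = \frac{\max_{\boldsymbol{\theta} \in \omega} f(\mathbf{x}; \boldsymbol{\theta})}{\max_{\boldsymbol{\theta} \in \Omega} f(\mathbf{x}; \boldsymbol{\theta})} = \frac{\max_{\boldsymbol{\theta} \in \omega} L(\boldsymbol{\theta})}{\max_{\boldsymbol{\theta} \in \Omega} L(\boldsymbol{\theta})} = \frac{L(\hat{\hat{\boldsymbol{\theta}}})}{L(\hat{\boldsymbol{\theta}})} < \lambda_\alpha$$
where $\hat{\boldsymbol{\theta}}$ = the MLE of $\boldsymbol{\theta}$
and $\hat{\hat{\boldsymbol{\theta}}}$ = the MLE of $\boldsymbol{\theta}$ when $H_0$ is true.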
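The situation
Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ denote a sample of n from the p-variate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Sigma$.
Suppose we want to test
$$H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0 \quad\text{vs}\quad H_A: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$$
The Likelihood function is:
$$L(\boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}}\,\exp\left\{-\frac{1}{2}\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}_i - \boldsymbol{\mu})\right\}$$
and the Log-likelihood function is:
$$l(\boldsymbol{\mu}, \Sigma) = \ln L(\boldsymbol{\mu}, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}_i - \boldsymbol{\mu})$$
The Maximum Likelihood estimators of $\boldsymbol{\mu}$ and $\Sigma$ are
$$\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$$
and
$$\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' = \frac{n-1}{n}\,S$$
and the Maximum Likelihood estimators of $\boldsymbol{\mu}$ and $\Sigma$ when $H_0$ is true are:
$$\hat{\hat{\boldsymbol{\mu}}} = \boldsymbol{\mu}_0$$
and
$$\hat{\hat{\Sigma}} = \frac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu}_0)(\mathbf{x}_i - \boldsymbol{\mu}_0)'$$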
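The Likelihood function is:
$$L(\boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}}\,\exp\left\{-\frac{1}{2}\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu})'\Sigma^{-1}(\mathbf{x}_i - \boldsymbol{\mu})\right\}$$
now
$$\sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})'\hat{\Sigma}^{-1}(\mathbf{x}_i - \hat{\boldsymbol{\mu}}) = \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})'\left[\tfrac{n-1}{n}S\right]^{-1}(\mathbf{x}_i - \bar{\mathbf{x}})$$
$$= \frac{n}{n-1}\sum_{i=1}^{n} \mathrm{tr}\left[(\mathbf{x}_i - \bar{\mathbf{x}})'S^{-1}(\mathbf{x}_i - \bar{\mathbf{x}})\right] = \frac{n}{n-1}\,\mathrm{tr}\left[S^{-1}\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})'\right]$$
$$= \frac{n}{n-1}\,\mathrm{tr}\left[S^{-1}(n-1)S\right] = n\,\mathrm{tr}(I_p) = np$$
Thus
$$L(\hat{\boldsymbol{\mu}}, \hat{\Sigma}) = \frac{1}{(2\pi)^{np/2}\,\left|\tfrac{n-1}{n}S\right|^{n/2}}\,e^{-np/2}$$
similarly
$$L(\hat{\hat{\boldsymbol{\mu}}}, \hat{\hat{\Sigma}}) = \frac{1}{(2\pi)^{np/2}\,|\hat{\hat{\Sigma}}|^{n/2}}\,e^{-np/2}$$
and
$$\lambda = \frac{L(\hat{\hat{\boldsymbol{\mu}}}, \hat{\hat{\Sigma}})}{L(\hat{\boldsymbol{\mu}}, \hat{\Sigma})} = \frac{|\hat{\Sigma}|^{n/2}}{|\hat{\hat{\Sigma}}|^{n/2}} = \frac{\left|\tfrac{n-1}{n}S\right|^{n/2}}{\left|\tfrac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu}_0)(\mathbf{x}_i - \boldsymbol{\mu}_0)'\right|^{n/2}} = \frac{|(n-1)S|^{n/2}}{\left|\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu}_0)(\mathbf{x}_i - \boldsymbol{\mu}_0)'\right|^{n/2}}$$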
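Note: Let
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} u & \mathbf{w}' \\ \mathbf{w} & V \end{pmatrix}$$
Then
$$|A| = |A_{11}|\,\left|A_{22} - A_{21}A_{11}^{-1}A_{12}\right| = |A_{22}|\,\left|A_{11} - A_{12}A_{22}^{-1}A_{21}\right|$$
so that
$$\begin{vmatrix} u & \mathbf{w}' \\ \mathbf{w} & V \end{vmatrix} = u\,\left|V - \tfrac{1}{u}\mathbf{w}\mathbf{w}'\right| = |V|\,\left(u - \mathbf{w}'V^{-1}\mathbf{w}\right)$$
Thus
$$\left|V - \tfrac{1}{u}\mathbf{w}\mathbf{w}'\right| = \frac{|V|\,\left(u - \mathbf{w}'V^{-1}\mathbf{w}\right)}{u}$$
and
$$\frac{\left|V - \tfrac{1}{u}\mathbf{w}\mathbf{w}'\right|}{|V|} = 1 - \frac{1}{u}\,\mathbf{w}'V^{-1}\mathbf{w}$$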
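The determinant identity in this note is easy to check numerically. The sketch below assumes NumPy and builds an arbitrary positive definite V, vector w and nonzero scalar u; these values are synthetic and serve only as a check.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 4
B = rng.normal(size=(p, p))
V = B @ B.T + p * np.eye(p)   # symmetric positive definite matrix (synthetic)
w = rng.normal(size=p)
u = -3.0                      # any nonzero scalar

# |V - (1/u) w w'|  versus  |V| (u - w' V^{-1} w) / u
lhs = np.linalg.det(V - np.outer(w, w) / u)
rhs = np.linalg.det(V) * (u - w @ np.linalg.solve(V, w)) / u
print(lhs, rhs)
```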
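Now
$$\lambda = \frac{|(n-1)S|^{n/2}}{\left|\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu}_0)(\mathbf{x}_i - \boldsymbol{\mu}_0)'\right|^{n/2}}$$
and
$$\lambda^{2/n} = \frac{|(n-1)S|}{\left|\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu}_0)(\mathbf{x}_i - \boldsymbol{\mu}_0)'\right|}$$
Also
$$\sum_{i=1}^{n} (\mathbf{x}_i - \boldsymbol{\mu}_0)(\mathbf{x}_i - \boldsymbol{\mu}_0)' = \sum_{i=1}^{n} \left[(\mathbf{x}_i - \bar{\mathbf{x}}) + (\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]\left[(\mathbf{x}_i - \bar{\mathbf{x}}) + (\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\right]'$$
$$= \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' + \left[\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})\right](\bar{\mathbf{x}} - \boldsymbol{\mu}_0)' + (\bar{\mathbf{x}} - \boldsymbol{\mu}_0)\left[\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})\right]' + n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'$$
$$= \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' + n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'$$
$$= (n-1)S + n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'$$
(the two cross terms vanish because $\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}}) = \mathbf{0}$).
Thus
$$\lambda^{2/n} = \frac{|(n-1)S|}{\left|(n-1)S + n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'\right|} = \frac{|S|}{\left|S + \frac{n}{n-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'\right|}$$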
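Using
$$\frac{\left|V - \tfrac{1}{u}\mathbf{w}\mathbf{w}'\right|}{|V|} = 1 - \frac{1}{u}\,\mathbf{w}'V^{-1}\mathbf{w}$$
with
$$u = -(n-1), \quad V = S \quad\text{and}\quad \mathbf{w} = \sqrt{n}\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$$
Then
$$\lambda^{2/n} = \frac{1}{1 + \frac{n}{n-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)}$$
Thus we reject $H_0$ if $\lambda < \lambda_\alpha$, i.e. $\lambda^{2/n} < \lambda_\alpha^{2/n}$,
or $\lambda^{-2/n} > \lambda_\alpha^{-2/n}$,
and
$$1 + \frac{n}{n-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) > \lambda_\alpha^{-2/n}$$
or
$$n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) > (n-1)\left(\lambda_\alpha^{-2/n} - 1\right)$$
This is the same as Hotelling’s $T^2$ test if
$$(n-1)\left(\lambda_\alpha^{-2/n} - 1\right) = T_\alpha^2 = \frac{p(n-1)}{n-p}\,F_\alpha(p, n-p)$$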
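The relation between the likelihood ratio and Hotelling’s $T^2$ can be verified numerically. The sketch below assumes NumPy and a synthetic sample: it computes $\lambda^{2/n}$ directly as the ratio of the determinants of the two maximum likelihood covariance estimates and compares it with $1/(1 + T^2/(n-1))$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 15, 3
X = rng.normal(loc=0.3, size=(n, p))   # synthetic sample, for illustration only
mu0 = np.zeros(p)                      # hypothesized mean vector (assumed)

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)

Sigma_hat = (X - xbar).T @ (X - xbar) / n   # MLE of Sigma, mu unrestricted
Sigma_hat0 = (X - mu0).T @ (X - mu0) / n    # MLE of Sigma when mu = mu0

lam_2n = np.linalg.det(Sigma_hat) / np.linalg.det(Sigma_hat0)   # lambda^(2/n)
T2 = n * ((xbar - mu0) @ np.linalg.solve(S, xbar - mu0))
print(lam_2n, 1.0 / (1.0 + T2 / (n - 1)))
```

Since $\lambda^{2/n}$ is a decreasing function of $T^2$, rejecting for small $\lambda$ is the same as rejecting for large $T^2$.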
Example
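Summary Statistics
$$\bar{\mathbf{x}} = \begin{pmatrix} 60.6 \\ 77.3 \\ 68.0 \\ 68.6 \end{pmatrix}, \qquad \boldsymbol{\mu}_0 = \begin{pmatrix} 60 \\ 60 \\ 60 \\ 60 \end{pmatrix}$$
$$S = \begin{pmatrix} 102.044 & 56.689 & 41.222 & 39.489 \\ 56.689 & 56.456 & 42.000 & 35.356 \\ 41.222 & 42.000 & 75.778 & 65.111 \\ 39.489 & 35.356 & 65.111 & 61.378 \end{pmatrix}$$
Note:
$$S^{-1} = \begin{pmatrix} 0.0245 & -0.0255 & 0.0195 & -0.0218 \\ -0.0255 & 0.0567 & -0.0405 & 0.0267 \\ 0.0195 & -0.0405 & 0.1782 & -0.1783 \\ -0.0218 & 0.0267 & -0.1783 & 0.2040 \end{pmatrix}$$
$$T^2 = n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) = 151.135$$
$$T_{0.05}^2 = \frac{p(n-1)}{n-p}\,F_{0.05}(p, n-p) = \frac{4(9)}{6}\,F_{0.05}(4, 6) = \frac{4(9)}{6}(4.53) = 27.18$$
Since $T^2 = 151.135 > 27.18$, we reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ at the 5% level.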
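The computation in this example can be sketched in a few lines, assuming NumPy. The critical value uses the tabled $F_{0.05}(4, 6) = 4.53$ quoted in the example rather than a computed quantile. Because the slide works with rounded intermediate values, the computed $T^2$ should be close to, but need not exactly match, the reported 151.135.

```python
import numpy as np

# Scores from the example: columns are Math, Science, English, French
X = np.array([
    [81, 89, 73, 74],
    [73, 79, 73, 74],
    [61, 86, 81, 81],
    [55, 70, 76, 73],
    [61, 71, 61, 66],
    [52, 70, 56, 58],
    [56, 74, 56, 56],
    [65, 87, 73, 69],
    [54, 76, 69, 72],
    [48, 71, 62, 63],
], dtype=float)
mu0 = np.full(4, 60.0)        # average score in previous years

n, p = X.shape
d = X.mean(axis=0) - mu0
S = np.cov(X, rowvar=False)
T2 = n * (d @ np.linalg.solve(S, d))

F_crit = 4.53                 # F_0.05(4, 6), tabled value quoted in the example
T2_crit = p * (n - 1) / (n - p) * F_crit

print(round(T2, 3), round(T2_crit, 2), T2 > T2_crit)
```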
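Hotelling’s $T^2$ statistic and test
We reject $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ in favour of $H_A: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$ if
$$T^2 = n\,(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)'S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0) \ge T_\alpha^2$$
where
$$T_\alpha^2 = \frac{p(n-1)}{n-p}\,F_\alpha(p, n-p)$$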
The two sample problem
Univariate Inference
Let $x_1, x_2, \dots, x_n$ denote a sample of n from the normal distribution with mean $\mu_x$ and variance $\sigma^2$.
Let $y_1, y_2, \dots, y_m$ denote a sample of m from the normal distribution with mean $\mu_y$ and variance $\sigma^2$.
Suppose we want to test
$$H_0: \mu_x = \mu_y \quad\text{vs}\quad H_A: \mu_x \neq \mu_y$$
The appropriate test is the t test:
The test statistic:
$$t = \frac{\bar{x} - \bar{y}}{s_{pooled}\sqrt{\dfrac{1}{n} + \dfrac{1}{m}}}, \qquad s_{pooled} = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}}$$
Reject $H_0$ if $|t| > t_{\alpha/2}$, d.f. = n + m - 2
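As an illustration of the pooled two-sample t statistic, here is a plain-Python sketch. The function name `pooled_t` and the two small samples are hypothetical.

```python
import math

def pooled_t(x, y):
    """Return (t, df) for H0: mu_x = mu_y, assuming a common variance."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    sx2 = sum((v - xbar) ** 2 for v in x) / (n - 1)   # sample variance of x
    sy2 = sum((v - ybar) ** 2 for v in y) / (m - 1)   # sample variance of y
    s_pooled = math.sqrt(((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2))
    t = (xbar - ybar) / (s_pooled * math.sqrt(1 / n + 1 / m))
    return t, n + m - 2

# Hypothetical samples, used only to exercise the formula
x = [5.0, 7.0, 9.0]
y = [1.0, 2.0, 3.0, 6.0]
t, df = pooled_t(x, y)
print(f"t = {t:.4f} on {df} d.f.")
```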
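The multivariate Test
Let $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ denote a sample of n from the p-variate normal distribution with mean vector $\boldsymbol{\mu}_x$ and covariance matrix $\Sigma$.
Let $\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_m$ denote a sample of m from the p-variate normal distribution with mean vector $\boldsymbol{\mu}_y$ and covariance matrix $\Sigma$.
Suppose we want to test
$$H_0: \boldsymbol{\mu}_x = \boldsymbol{\mu}_y \quad\text{vs}\quad H_A: \boldsymbol{\mu}_x \neq \boldsymbol{\mu}_y$$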
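Hotelling’s $T^2$ statistic for the two sample problem
$$T^2 = (\bar{\mathbf{x}} - \bar{\mathbf{y}})'\left[\left(\frac{1}{n} + \frac{1}{m}\right)S_{pooled}\right]^{-1}(\bar{\mathbf{x}} - \bar{\mathbf{y}})$$
where
$$S_{pooled} = \frac{n-1}{n+m-2}\,S_x + \frac{m-1}{n+m-2}\,S_y$$
If $H_0$ is true then
$$F = \frac{n+m-p-1}{p(n+m-2)}\,T^2$$
has an F distribution with $\nu_1 = p$ and $\nu_2 = n + m - p - 1$.
Thus
Hotelling’s $T^2$ test
We reject $H_0: \boldsymbol{\mu}_x = \boldsymbol{\mu}_y$ if
$$F = \frac{n+m-p-1}{p(n+m-2)}\,T^2 \ge F_\alpha(p, n+m-p-1)$$
with $T^2$ and $S_{pooled}$ as defined above.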
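A minimal sketch of the two-sample statistic follows, assuming NumPy. The function name `hotelling_two_sample` and the synthetic samples are hypothetical. The function returns $T^2$, the F statistic and its degrees of freedom, and leaves the comparison with a tabled $F_\alpha(p, n+m-p-1)$ to the reader.

```python
import numpy as np

def hotelling_two_sample(X, Y):
    """Return (T2, F, df1, df2) for H0: mu_x = mu_y with a common covariance."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    n, p = X.shape
    m = Y.shape[0]
    d = X.mean(axis=0) - Y.mean(axis=0)
    Sx = np.cov(X, rowvar=False)
    Sy = np.cov(Y, rowvar=False)
    S_pooled = ((n - 1) * Sx + (m - 1) * Sy) / (n + m - 2)
    T2 = float(d @ np.linalg.solve((1 / n + 1 / m) * S_pooled, d))
    F = (n + m - p - 1) / (p * (n + m - 2)) * T2
    return T2, F, p, n + m - p - 1

rng = np.random.default_rng(3)
X = rng.normal(loc=[0.0, 0.0], size=(12, 2))   # synthetic samples,
Y = rng.normal(loc=[1.0, 0.5], size=(10, 2))   # for illustration only
T2, F, df1, df2 = hotelling_two_sample(X, Y)
print(T2, F, (df1, df2))
```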
Example 2
Annual financial data are collected for firms approximately 2 years prior to bankruptcy and for financially sound firms at about the same point in time. The data on the four variables
• x1 = CF/TD = (cash flow)/(total debt),
• x2 = NI/TA = (net income)/(total assets),
• x3 = CA/CL = (current assets)/(current liabilities), and
• x4 = CA/NS = (current assets)/(net sales)
are given in the following table:

        Bankrupt Firms                       |         Nonbankrupt Firms
Firm   x1       x2       x3      x4          | Firm   x1       x2       x3      x4
       CF/TD    NI/TA    CA/CL   CA/NS       |        CF/TD    NI/TA    CA/CL   CA/NS
  1   -0.4485  -0.4106   1.0865  0.4526      |   1    0.5135   0.1001   2.4871  0.5368
  2   -0.5633  -0.3114   1.5314  0.1642      |   2    0.0769   0.0195   2.0069  0.5304
  3    0.0643   0.0156   1.0077  0.3978      |   3    0.3776   0.1075   3.2651  0.3548
  4   -0.0721  -0.0930   1.4544  0.2589      |   4    0.1933   0.0473   2.2506  0.3309
  5   -0.1002  -0.0917   1.5644  0.6683      |   5    0.3248   0.0718   4.2401  0.6279
  6   -0.1421  -0.0651   0.7066  0.2794      |   6    0.3132   0.0511   4.4500  0.6852
  7    0.0351   0.0147   1.5046  0.7080      |   7    0.1184   0.0499   2.5210  0.6925
  8   -0.6530  -0.0566   1.3737  0.4032      |   8   -0.0173   0.0233   2.0538  0.3484
  9    0.0724  -0.0076   1.3723  0.3361      |   9    0.2169   0.0779   2.3489  0.3970
 10   -0.1353  -0.1433   1.4196  0.4347      |  10    0.1703   0.0695   1.7973  0.5174
 11   -0.2298  -0.2961   0.3310  0.1824      |  11    0.1460   0.0518   2.1692  0.5500
 12    0.0713   0.0205   1.3124  0.2497      |  12   -0.0985  -0.0123   2.5029  0.5778
 13    0.0109   0.0011   2.1495  0.6969      |  13    0.1398  -0.0312   0.4611  0.2643
 14   -0.2777  -0.2316   1.1918  0.6601      |  14    0.1379   0.0728   2.6123  0.5151
 15    0.1454   0.0500   1.8762  0.2723      |  15    0.1486   0.0564   2.2347  0.5563
 16    0.3703   0.1098   1.9914  0.3828      |  16    0.1633   0.0486   2.3080  0.1978
 17   -0.0757  -0.0821   1.5077  0.4215      |  17    0.2907   0.0597   1.8381  0.3786
 18    0.0451   0.0263   1.6756  0.9494      |  18    0.5383   0.1064   2.3293  0.4835
 19    0.0115  -0.0032   1.2602  0.6038      |  19   -0.3330  -0.0854   3.0124  0.4730
 20    0.1227   0.1055   1.1434  0.1655      |  20    0.4875   0.0910   1.2444  0.1847
 21   -0.2843  -0.2703   1.2722  0.5128      |  21    0.5603   0.1112   4.2918  0.4443
                                             |  22    0.2029   0.0792   1.9936  0.3018
                                             |  23    0.4746   0.1380   2.9166  0.4487
                                             |  24    0.1661   0.0351   2.4527  0.1370
                                             |  25    0.5808   0.0371   5.0594  0.1268
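Hotelling’s $T^2$ test
A graphical explanation
Hotelling’s $T^2$ statistic for the two sample problem
$$T^2 = (\bar{\mathbf{x}} - \bar{\mathbf{y}})'\left[\left(\frac{1}{n} + \frac{1}{m}\right)S_{pooled}\right]^{-1}(\bar{\mathbf{x}} - \bar{\mathbf{y}})$$
where
$$S_{pooled} = \frac{n-1}{n+m-2}\,S_x + \frac{m-1}{n+m-2}\,S_y$$
$$T^2 = \max_{\mathbf{a}} t^2(\mathbf{a}) = \max_{\mathbf{a}} \frac{\left[\mathbf{a}'(\bar{\mathbf{x}} - \bar{\mathbf{y}})\right]^2}{\left(\dfrac{1}{n} + \dfrac{1}{m}\right)\mathbf{a}'S_{pooled}\,\mathbf{a}}$$
Note:
$$t(\mathbf{a}) = \frac{\mathbf{a}'\bar{\mathbf{x}} - \mathbf{a}'\bar{\mathbf{y}}}{\sqrt{\left(\dfrac{1}{n} + \dfrac{1}{m}\right)\mathbf{a}'S_{pooled}\,\mathbf{a}}}$$
is the test statistic for testing:
$$H_0(\mathbf{a}): \mathbf{a}'\boldsymbol{\mu}_x = \mathbf{a}'\boldsymbol{\mu}_y \quad\text{vs}\quad H_A(\mathbf{a}): \mathbf{a}'\boldsymbol{\mu}_x \neq \mathbf{a}'\boldsymbol{\mu}_y$$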
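[Figure: Hotelling’s $T^2$ test. Populations A and B plotted against axes X1 and X2.]
[Figure: Univariate test for X1. Populations A and B plotted against axes X1 and X2.]
[Figure: Univariate test for X2. Populations A and B plotted against axes X1 and X2.]
[Figure: Univariate test for a1X1 + a2X2. Populations A and B plotted against axes X1 and X2.]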
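Mahalanobis distance
A graphical explanation
Euclidean distance
$$d^2(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{p} (a_i - b_i)^2 = (\mathbf{a} - \mathbf{b})'(\mathbf{a} - \mathbf{b})$$
[Figure: points equidistant from $\mathbf{a}$ under Euclidean distance.]
Mahalanobis distance: $\Sigma$, a covariance matrix
$$d_M^2(\mathbf{a}, \mathbf{b}; \Sigma) = (\mathbf{a} - \mathbf{b})'\Sigma^{-1}(\mathbf{a} - \mathbf{b})$$
[Figure: points equidistant from $\mathbf{a}$ under Mahalanobis distance.]
Hotelling’s $T^2$ statistic for the two sample problem
$$T^2 = (\bar{\mathbf{x}} - \bar{\mathbf{y}})'\left[\left(\frac{1}{n} + \frac{1}{m}\right)S_{pooled}\right]^{-1}(\bar{\mathbf{x}} - \bar{\mathbf{y}}) = d_M^2\left(\bar{\mathbf{x}}, \bar{\mathbf{y}}; \left(\frac{1}{n} + \frac{1}{m}\right)S_{pooled}\right)$$
$$= \frac{nm}{n+m}\,(\bar{\mathbf{x}} - \bar{\mathbf{y}})'S_{pooled}^{-1}(\bar{\mathbf{x}} - \bar{\mathbf{y}}) = d_M^2\left(\bar{\mathbf{x}}, \bar{\mathbf{y}}; \frac{n+m}{nm}\,S_{pooled}\right)$$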
[Figure: Case I. Populations A and B plotted against axes X1 and X2.]
[Figure: Case II. Populations A and B plotted against axes X1 and X2.]
In Case I the Mahalanobis distance between the mean vectors is larger than in Case II, even though the Euclidean distance is smaller. In Case I there is more separation between the two bivariate normal distributions.
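The contrast between the two cases can be reproduced with a small numerical sketch, assuming NumPy. The covariance matrix and the two difference vectors below are hypothetical, chosen so that one pair of means differs across the short axis of the ellipse (Case I) and the other along the long axis (Case II).

```python
import numpy as np

def mahalanobis_sq(a, b, Sigma):
    """Squared Mahalanobis distance (a - b)' Sigma^{-1} (a - b)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(diff @ np.linalg.solve(Sigma, diff))

Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])          # strongly correlated X1 and X2 (assumed)
origin = np.zeros(2)

case1 = np.array([1.0, -1.0])           # means differ across the correlation
case2 = np.array([2.0, 2.0])            # means differ along the correlation

e1, e2 = float(case1 @ case1), float(case2 @ case2)   # squared Euclidean
m1 = mahalanobis_sq(case1, origin, Sigma)
m2 = mahalanobis_sq(case2, origin, Sigma)
print(e1, m1)
print(e2, m2)
```

With these numbers the squared Euclidean distance is smaller in the first case while the squared Mahalanobis distance is larger, which is the situation described for Case I.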
Discrimination and Classification
Discrimination
Situation:
We have two or more populations $\pi_1$, $\pi_2$, etc.
(possibly p-variate normal).
The populations are known (or we have data from each population).
We have data for a new case (population unknown) and we want to identify the population of which the new case is a member.
Examples (populations $\pi_1$ and $\pi_2$, with measured variables X1, X2, X3, ... , Xn)
1. Populations: Solvent and distressed property-liability insurance companies. Variables: Total assets, cost of stocks and bonds, market value of stocks and bonds, loss expenses, surplus, amount of premiums written.
2. Populations: Nonulcer dyspeptics (those with stomach problems) and controls ("normal"). Variables: Measures of anxiety, dependence, guilt, perfectionism.
3. Populations: Federalist papers written by James Madison and those written by Alexander Hamilton. Variables: Frequencies of different words and length of sentences.
4. Populations: Good and poor credit risks. Variables: Income, age, number of credit cards, family size, education.
5. Populations: Successful and unsuccessful (fail to graduate) college students. Variables: Entrance examination scores, high-school grade-point average, number of high-school activities.
6. Populations: Purchasers and nonpurchasers of a home computer. Variables: Income, education, family size, previous purchase of other home computers, occupation.
7. Populations: Two species of chickweed. Variables: Sepal length, petal length, petal cleft depth, bract length, scarious tip length, pollen diameter.
The Basic Problem
Suppose that the data from a new case x1, … , xp has joint density function either:
$\pi_1$: f(x1, … , xp) or
$\pi_2$: g(x1, … , xp)
We want to make the decision to
D1: Classify the case in $\pi_1$ (f is the correct distribution) or
D2: Classify the case in $\pi_2$ (g is the correct distribution)
The Two Types of Errors
1. Misclassifying the case in $\pi_1$ when it actually lies in $\pi_2$.
Let P[1|2] = P[D1|$\pi_2$] = probability of this type of error
2. Misclassifying the case in $\pi_2$ when it actually lies in $\pi_1$.
Let P[2|1] = P[D2|$\pi_1$] = probability of this type of error
This is similar to Type I and Type II errors in hypothesis testing.
Note:
A discrimination scheme is defined by splitting p-dimensional space into two regions.
1. C1 = the region where we make the decision D1.
(the decision to classify the case in $\pi_1$)
2. C2 = the region where we make the decision D2.
(the decision to classify the case in $\pi_2$)
There can be several approaches to determining the regions C1 and C2, all concerned with taking into account the probabilities of misclassification P[2|1] and P[1|2].
1. Set up the regions C1 and C2 so that one of the probabilities of misclassification, P[2|1] say, is at some low acceptable value $\alpha$. Accept the level of the other probability of misclassification P[1|2] = $\beta$.
2. Set up the regions C1 and C2 so that the total probability of misclassification:
P[Misclassification] = P[1] P[2|1] + P[2] P[1|2]
is minimized, where
P[1] = P[the case belongs to $\pi_1$]
P[2] = P[the case belongs to $\pi_2$]
3. Set up the regions C1 and C2 so that the total expected cost of misclassification:
E[Cost of Misclassification]
= c2|1 P[1] P[2|1] + c1|2 P[2] P[1|2]
is minimized, where
P[1] = P[the case belongs to $\pi_1$]
P[2] = P[the case belongs to $\pi_2$]
c2|1 = the cost of misclassifying the case in $\pi_2$ when the case belongs to $\pi_1$.
c1|2 = the cost of misclassifying the case in $\pi_1$ when the case belongs to $\pi_2$.
4. Set up the regions C1 and C2 so that the two types of error are equal:
P[2|1] = P[1|2]
Computer security:
$\pi_1$: Valid users
$\pi_2$: Imposters
P[1] = P[valid user]
P[2] = P[imposter]
P[2|1] = P[identifying a valid user as an imposter]
P[1|2] = P[identifying an imposter as a valid user]
c2|1 = the cost of identifying the user as an imposter when the user is a valid user.
c1|2 = the cost of identifying the user as a valid user when the user is an imposter.
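To make the two criteria concrete, the sketch below evaluates the total probability of misclassification and the expected cost of misclassification for the computer security setting. All numerical values (priors, error probabilities and costs) are hypothetical and chosen only for illustration; the function names are likewise assumptions.

```python
def total_probability_of_misclassification(p1, p2, p21, p12):
    """P[1] P[2|1] + P[2] P[1|2]."""
    return p1 * p21 + p2 * p12

def expected_cost_of_misclassification(p1, p2, p21, p12, c21, c12):
    """c2|1 P[1] P[2|1] + c1|2 P[2] P[1|2]."""
    return c21 * p1 * p21 + c12 * p2 * p12

# Hypothetical values: 90% valid users, 10% imposters
p1, p2 = 0.9, 0.1
p21 = 0.05             # valid user flagged as an imposter
p12 = 0.20             # imposter accepted as a valid user
c21, c12 = 1.0, 10.0   # accepting an imposter assumed ten times as costly

tpm = total_probability_of_misclassification(p1, p2, p21, p12)
ecm = expected_cost_of_misclassification(p1, p2, p21, p12, c21, c12)
print(tpm, ecm)
```

Under these assumed numbers the rarer error dominates the expected cost, which is why criterion 3 can lead to different regions C1 and C2 than criterion 2.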
This problem can be viewed as a Hypothesis testing problem
H0: $\pi_1$ is the correct population
HA: $\pi_2$ is the correct population
P[2|1] = $\alpha$
P[1|2] = $\beta$
Power = 1 - $\beta$
The Neyman-Pearson Lemma
Suppose that the data x1, … , xn has joint density function
f(x1, … , xn; $\theta$)
where $\theta$ is either $\theta_1$ or $\theta_2$. Let
g(x1, … , xn) = f(x1, … , xn; $\theta_1$) and
h(x1, … , xn) = f(x1, … , xn; $\theta_2$)
We want to test
H0: $\theta = \theta_1$ (g is the correct distribution) against
HA: $\theta = \theta_2$ (h is the correct distribution)
The Neyman-Pearson Lemma states that the Uniformly Most Powerful (UMP) test of size $\alpha$ is to reject H0 if:
$$\lambda = \frac{L(\theta_2)}{L(\theta_1)} = \frac{h(x_1, \dots, x_n)}{g(x_1, \dots, x_n)} \ge k$$
and accept H0 if:
$$\lambda = \frac{L(\theta_2)}{L(\theta_1)} = \frac{h(x_1, \dots, x_n)}{g(x_1, \dots, x_n)} < k$$
where k is chosen so that the test is of size $\alpha$.
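As a concrete instance of the likelihood ratio rule, the sketch below takes i.i.d. N($\theta$, 1) data with $\theta_2 > \theta_1$ (an assumed model, not one used in the slides). In that case $\ln\lambda = n(\theta_2 - \theta_1)\bar{x} - n(\theta_2^2 - \theta_1^2)/2$, so rejecting when $\lambda \ge k$ is the same as rejecting when $\bar{x}$ exceeds a cutoff. The function names, data and value of k are hypothetical; choosing k to give size $\alpha$ is not shown.

```python
import math

def likelihood_ratio(x, theta1, theta2):
    """lambda = h(x) / g(x) for i.i.d. N(theta, 1) observations."""
    log_lr = sum(-(xi - theta2) ** 2 / 2 + (xi - theta1) ** 2 / 2 for xi in x)
    return math.exp(log_lr)

def mean_cutoff(k, n, theta1, theta2):
    """Value c with lambda >= k exactly when xbar >= c (needs theta2 > theta1)."""
    return math.log(k) / (n * (theta2 - theta1)) + (theta1 + theta2) / 2

# Hypothetical data and threshold
x = [0.2, 1.1, 0.7, -0.3, 0.9]
theta1, theta2, k = 0.0, 1.0, 1.5

lam = likelihood_ratio(x, theta1, theta2)
xbar = sum(x) / len(x)
cutoff = mean_cutoff(k, len(x), theta1, theta2)
print(lam, lam >= k, xbar, cutoff, xbar >= cutoff)
```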
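Proof: Let C be the critical region of any test of size $\alpha$. Let
$$C^* = \left\{(x_1, \dots, x_n) : \frac{h(x_1, \dots, x_n)}{g(x_1, \dots, x_n)} \ge k\right\}$$
Then
$$\int_{C^*} g(x_1, \dots, x_n)\,dx_1 \cdots dx_n = \int_{C} g(x_1, \dots, x_n)\,dx_1 \cdots dx_n = \alpha$$
Note:
$$C^* = (C^* \cap C) \cup (C^* \cap \bar{C}), \qquad C = (C^* \cap C) \cup (\bar{C}^* \cap C)$$
We want to show that
$$\int_{C^*} h(x_1, \dots, x_n)\,dx_1 \cdots dx_n \ge \int_{C} h(x_1, \dots, x_n)\,dx_1 \cdots dx_n$$
Since the two tests have the same size,
$$\int_{C^* \cap C} g\,dx_1 \cdots dx_n + \int_{C^* \cap \bar{C}} g\,dx_1 \cdots dx_n = \int_{C^* \cap C} g\,dx_1 \cdots dx_n + \int_{\bar{C}^* \cap C} g\,dx_1 \cdots dx_n$$
Thus
$$\int_{C^* \cap \bar{C}} g(x_1, \dots, x_n)\,dx_1 \cdots dx_n = \int_{\bar{C}^* \cap C} g(x_1, \dots, x_n)\,dx_1 \cdots dx_n$$
and
$$\int_{C^* \cap \bar{C}} g(x_1, \dots, x_n)\,dx_1 \cdots dx_n \le \frac{1}{k}\int_{C^* \cap \bar{C}} h(x_1, \dots, x_n)\,dx_1 \cdots dx_n$$
since $g(x_1, \dots, x_n) \le \frac{1}{k}\,h(x_1, \dots, x_n)$ in $C^*$, while
$$\int_{\bar{C}^* \cap C} g(x_1, \dots, x_n)\,dx_1 \cdots dx_n \ge \frac{1}{k}\int_{\bar{C}^* \cap C} h(x_1, \dots, x_n)\,dx_1 \cdots dx_n$$
since $g(x_1, \dots, x_n) > \frac{1}{k}\,h(x_1, \dots, x_n)$ in $\bar{C}^*$.
Thus
$$\int_{C^* \cap \bar{C}} h(x_1, \dots, x_n)\,dx_1 \cdots dx_n \ge \int_{\bar{C}^* \cap C} h(x_1, \dots, x_n)\,dx_1 \cdots dx_n$$
and
$$\int_{C^*} h(x_1, \dots, x_n)\,dx_1 \cdots dx_n \ge \int_{C} h(x_1, \dots, x_n)\,dx_1 \cdots dx_n$$
when we add the common quantity
$$\int_{C^* \cap C} h(x_1, \dots, x_n)\,dx_1 \cdots dx_n$$
to both sides. Q.E.D.
[Figure: Venn diagram of the regions $C^*$ and $C$, showing $C^* \cap \bar{C}$, $C^* \cap C$ and $\bar{C}^* \cap C$.]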
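Fisher’s Linear Discriminant Function.
Suppose that $\mathbf{x} = (x_1, \dots, x_p)'$ is data from a p-variate Normal distribution with mean vector either $\boldsymbol{\mu}_1$ or $\boldsymbol{\mu}_2$.
The covariance matrix $\Sigma$ is the same for both populations $\pi_1$ and $\pi_2$.
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\,e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_1)'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_1)}$$
$$g(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\,e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_2)'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_2)}$$
The Neyman-Pearson Lemma states that we should classify into populations $\pi_1$ and $\pi_2$ using:
$$\lambda = \frac{f(\mathbf{x})}{g(\mathbf{x})} = e^{\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_2)'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_2) - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_1)'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_1)}$$
That is, make the decision
D1 : population is $\pi_1$
if $\lambda \ge k$
or
$$\ln\lambda = \tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_2)'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_2) - \tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_1)'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_1) \ge \ln k$$
or
$$(\mathbf{x} - \boldsymbol{\mu}_2)'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_2) - (\mathbf{x} - \boldsymbol{\mu}_1)'\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu}_1) \ge 2\ln k$$
and
$$\mathbf{x}'\Sigma^{-1}\mathbf{x} - 2\boldsymbol{\mu}_2'\Sigma^{-1}\mathbf{x} + \boldsymbol{\mu}_2'\Sigma^{-1}\boldsymbol{\mu}_2 - \left(\mathbf{x}'\Sigma^{-1}\mathbf{x} - 2\boldsymbol{\mu}_1'\Sigma^{-1}\mathbf{x} + \boldsymbol{\mu}_1'\Sigma^{-1}\boldsymbol{\mu}_1\right) \ge 2\ln k$$
or
$$(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\Sigma^{-1}\mathbf{x} \ge \ln k + \tfrac{1}{2}\left(\boldsymbol{\mu}_1'\Sigma^{-1}\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2'\Sigma^{-1}\boldsymbol{\mu}_2\right)$$
Finally we make the decision
D1 : population is $\pi_1$
if
$$\mathbf{a}'\mathbf{x} \ge K$$
where
$$\mathbf{a} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) \quad\text{and}\quad K = \ln k + \tfrac{1}{2}\left(\boldsymbol{\mu}_1'\Sigma^{-1}\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2'\Sigma^{-1}\boldsymbol{\mu}_2\right)$$
The function
$$\mathbf{a}'\mathbf{x} = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\Sigma^{-1}\mathbf{x}$$
is called Fisher’s linear discriminant function.
In the case where the populations are unknown but estimated from data, Fisher’s linear discriminant function is
$$\hat{\mathbf{a}}'\mathbf{x} = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)'S^{-1}\mathbf{x}$$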
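A minimal sketch of the estimated discriminant rule follows, assuming NumPy. Two choices here are assumptions rather than statements from the slides: S is taken to be the pooled sample covariance matrix, and the cutoff uses ln k = 0, which gives $K = \frac{1}{2}\hat{\mathbf{a}}'(\bar{\mathbf{x}}_1 + \bar{\mathbf{x}}_2)$, the midpoint between the projected sample means. The function name `fisher_ldf` and the data are hypothetical.

```python
import numpy as np

def fisher_ldf(X1, X2):
    """Return (a, K): a = S^{-1}(xbar1 - xbar2) and the midpoint cutoff K."""
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    n1, n2 = len(X1), len(X2)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled covariance matrix (assumed choice for S)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(S, m1 - m2)
    K = 0.5 * (a @ (m1 + m2))        # corresponds to ln k = 0
    return a, K

rng = np.random.default_rng(4)
X1 = rng.normal(loc=[2.0, 1.0], size=(30, 2))   # synthetic population 1
X2 = rng.normal(loc=[0.0, 0.0], size=(30, 2))   # synthetic population 2
a, K = fisher_ldf(X1, X2)

new_case = np.array([1.5, 0.8])                 # hypothetical new observation
decision = 1 if a @ new_case >= K else 2
print(a, K, decision)
```

A different cutoff K would follow from unequal prior probabilities or costs, as in the approaches listed earlier.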
[Figure: A pictorial representation of Fisher's procedure for two populations. Axes x1 and x2; the plane is split into a region classified as $\pi_1$ and a region classified as $\pi_2$.]
Example 1
$\pi_1$: Riding-mower owners          $\pi_2$: Nonowners
x1 (Income     x2 (Lot size           x1 (Income     x2 (Lot size
in $1000s)     in 1000 sq ft)         in $1000s)     in 1000 sq ft)
  20.0            9.2                   25.0            9.8
  28.5            8.4                   17.6           10.4
  21.6           10.8                   21.6            8.6
  20.5           10.4                   14.4           10.2
  29.0           11.8                   28.0            8.8
  36.7            9.6                   16.4            8.8
  36.0            8.8                   19.8            8.0
  27.6           11.2                   22.0            9.2
  23.0           10.0                   15.8            8.2
  31.0           10.4                   11.0            9.4
  17.0           11.0                   17.0            7.0
  27.0           10.0                   21.0            7.4
[Figure: scatter plot of lot size (in thousands of square feet) against income (in thousands of dollars) for riding-mower owners and nonowners.]
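The estimated discriminant function can be computed for the riding-mower data. The sketch below assumes NumPy, takes S to be the pooled sample covariance matrix, and uses the midpoint cutoff (ln k = 0). Both are assumptions for illustration and not necessarily the choices made in the original lecture.

```python
import numpy as np

# Columns: income ($1000s), lot size (1000 sq ft)
owners = np.array([
    [20.0, 9.2], [28.5, 8.4], [21.6, 10.8], [20.5, 10.4],
    [29.0, 11.8], [36.7, 9.6], [36.0, 8.8], [27.6, 11.2],
    [23.0, 10.0], [31.0, 10.4], [17.0, 11.0], [27.0, 10.0],
])
nonowners = np.array([
    [25.0, 9.8], [17.6, 10.4], [21.6, 8.6], [14.4, 10.2],
    [28.0, 8.8], [16.4, 8.8], [19.8, 8.0], [22.0, 9.2],
    [15.8, 8.2], [11.0, 9.4], [17.0, 7.0], [21.0, 7.4],
])

n1, n2 = len(owners), len(nonowners)
m1, m2 = owners.mean(axis=0), nonowners.mean(axis=0)
S = ((n1 - 1) * np.cov(owners, rowvar=False)
     + (n2 - 1) * np.cov(nonowners, rowvar=False)) / (n1 + n2 - 2)

a = np.linalg.solve(S, m1 - m2)   # discriminant coefficients
K = 0.5 * (a @ (m1 + m2))         # midpoint cutoff (assumed ln k = 0)

# Classify the training cases: a'x >= K means "owner"
owners_as_owner = int(np.sum(owners @ a >= K))
nonowners_as_nonowner = int(np.sum(nonowners @ a < K))
print(np.round(m1, 4), np.round(m2, 4))
print(a, K, owners_as_owner, nonowners_as_nonowner)
```

Counting how many training cases fall on the correct side of the cutoff gives an optimistic estimate of the error rates, since the same data were used to fit the rule.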
Example 2
The bankruptcy data: annual financial data on x1 = CF/TD, x2 = NI/TA, x3 = CA/CL and x4 = CA/NS for 21 bankrupt firms and 25 nonbankrupt firms, as tabulated in Example 2 of the two sample problem.