Stochastic Majorization: A Characterization
by
David C. Nachman
Department of Finance
J. Mack Robinson College of Business
Georgia State University
Atlanta, Georgia 30303-3083
September, 2005
Abstract
Stochastic majorization is a pre-order on the space of probability measures on a finite
dimensional Euclidean space induced by the majorization pre-order on this underlying
space. Taking advantage of techniques used in mathematical economics and the
continuity properties of the majorization relation, we provide a general characterization
of stochastic majorization.
Keywords: Majorization, stochastic majorization, Schur-convex functions and sets,
continuous correspondences.
1. Introduction
The applications of majorization and various notions of stochastic majorization are
extensive in mathematics, especially in probability and statistics. The excellent treatise by
Marshall and Olkin (1979) presents the theory of majorization and an extensive display
of its applications and extensions. See this treatise for appropriate references to the
original work. In particular, Marshall and Olkin, 1979, Chapter 11, presents and
characterizes various notions of stochastic majorization. The major one of import here is
the one induced by the cone of Schur-convex functions.
Kamae et al., 1977, Theorem 1, present a general characterization of the partial
ordering of probability measures induced by a partial ordering on the underlying space.
Majorization is a pre-order but not a partial order since it is not antisymmetric (Marshall
and Olkin, 1979, 1.B). In this paper, we exploit the continuity properties of majorization
and theorems of Strassen (1965) and Himmelberg and Van Vleck (1975) to provide a
Kamae et al.-style characterization of stochastic majorization. The basics, including the
continuity properties, are presented in Section 2. To this author’s knowledge, these
continuity properties have not been noticed before. The characterization of stochastic
majorization is presented in Section 3. To this author’s knowledge, this characterization is
new as well. We borrow much from Marshall and Olkin (1979).
2. Majorization
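Let $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ be n-tuples of real numbers and let
$(x_{[1]}, \dots, x_{[n]})$ and $(y_{[1]}, \dots, y_{[n]})$ denote the vectors $x$ and $y$ with coordinates
rearranged in decreasing order, i.e., $x_{[1]} \ge \dots \ge x_{[n]}$ and $y_{[1]} \ge \dots \ge y_{[n]}$. The vector $x$ is
majorized by the vector $y$ (or $y$ majorizes $x$), written $x \prec y$, if for each $k = 1, \dots, n$,
$\sum_{i=1}^{k} x_{[i]} \le \sum_{i=1}^{k} y_{[i]}$,
with equality holding for $k = n$ (Marshall and Olkin, 1979, A.1, p. 7).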
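The definition above translates directly into a partial-sum test. The following is a minimal sketch, not part of the original paper: the function name `is_majorized` and the tolerance parameter `tol` are illustrative assumptions, and the tolerance is only there to absorb floating-point rounding.

```python
from itertools import accumulate


def is_majorized(x, y, tol=1e-12):
    """Return True if x is majorized by y, i.e. x ≺ y (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if not x:
        return True
    # Decreasing rearrangements and their partial sums.
    px = list(accumulate(sorted(x, reverse=True)))
    py = list(accumulate(sorted(y, reverse=True)))
    # Sum of the k largest components of x must not exceed that of y, k < n.
    if any(a > b + tol for a, b in zip(px[:-1], py[:-1])):
        return False
    # Equality must hold for k = n (equal totals).
    return abs(px[-1] - py[-1]) <= tol


print(is_majorized([2, 2, 2], [3, 2, 1]))  # True
print(is_majorized([3, 2, 1], [6, 0, 0]))  # True
print(is_majorized([6, 0, 0], [3, 2, 1]))  # False
```

Sorting dominates the cost, so the check runs in O(n log n) time.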
In words, $x$ is majorized by $y$ if the components of $x$ are more evenly spread
out than the components of $y$ or the components of $y$ are more concentrated than the
components of $x$. This intuition is reinforced by noting the following. Let $e = (1, \dots, 1)$,
the n-tuple whose coordinates are all equal to one. Then for a vector $x$ the inner product
$x \cdot e$ is the sum of the components of $x$. Let $\bar{x} = \frac{x \cdot e}{n}\, e$ and let
$x^{i} = (0, \dots, 0, x \cdot e, 0, \dots, 0)$, where $x \cdot e$ appears in the $i$th component. The vectors $x$, $\bar{x}$,
and $x^{i}$ all have the same total sum of components, but the components of $\bar{x}$ are more
evenly spread out than those of $x$. Clearly $x^{i}$ concentrates this sum in one component.
In this sense, $\bar{x}$ is the most evenly spread of this sum of components and $x^{i}$ is the most
concentrated of this sum. Indeed, we have that $\bar{x} \prec x \prec x^{i}$, $i = 1, \dots, n$ (the second
relation requires the components of $x$ to be nonnegative).
We note that the majorization relation is reflexive and transitive (established
below in Lemma 3) and hence is a pre-ordering. It is not a partial ordering, however,
since it is not antisymmetric.
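The two extreme cases and the failure of antisymmetry can be checked on a small example. This is an illustrative sketch only: the vector, the helper `is_majorized`, and the variable names are assumptions made for the example, and exact rational arithmetic is used so that no tolerance is needed.

```python
from fractions import Fraction


def is_majorized(x, y):
    """Exact check of x ≺ y via decreasing rearrangements and partial sums."""
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    sx = sy = 0
    for a, b in zip(xs, ys):
        sx, sy = sx + a, sy + b
        if sx > sy:
            return False
    return len(xs) == len(ys) and sx == sy


x = [Fraction(5), Fraction(1), Fraction(0)]   # illustrative nonnegative vector
n, total = len(x), sum(x)
x_bar = [total / n] * n                       # every component equals the mean
x_conc = [[total if j == i else Fraction(0) for j in range(n)] for i in range(n)]

print(is_majorized(x_bar, x))                          # True
print(all(is_majorized(x, xi) for xi in x_conc))       # True

# A vector and a permutation of it majorize each other without being equal.
p = [x[1], x[0], x[2]]
print(is_majorized(x, p), is_majorized(p, x), x == p)  # True True False
```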
Let $\mathbb{R}^n$ denote n-dimensional Euclidean space. All topological properties in the
sequel will be with respect to the usual metric on $\mathbb{R}^n$. Let $\Pi_n$ denote the set of $n \times n$
permutation matrices and let $\mathcal{D}_n$ denote the set of $n \times n$ doubly stochastic matrices. Then
$P \in \Pi_n$ if and only if there is exactly one one in each row and each column of $P$ and all other
entries are zero. Similarly, $D \in \mathcal{D}_n$ if and only if the entries in $D$ are nonnegative and
each row and each column sum to one.
Theorem 1. For $x, y \in \mathbb{R}^n$, the following are equivalent:
i. $x \prec y$;
ii. $x = yD$, some $D \in \mathcal{D}_n$;
iii. $x = \sum_{j} \alpha_j\, y P_j$, for some $\alpha_j \ge 0$, $\sum_{j} \alpha_j = 1$, and some $P_j \in \Pi_n$.
Proof: The equivalence of i and ii is due to Hardy, Littlewood and Polya. See Marshall
and Olkin, 1979, Theorem 2.B.2. The equivalence of ii and iii is due to Birkhoff. See
Marshall and Olkin, 1979, Theorem 2.A.2.
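A small numerical illustration of Theorem 1 may help: build a doubly stochastic matrix as a convex combination of permutation matrices, apply it to a row vector, and confirm that the result is majorized by the original vector. The sketch below is an assumption-laden example, not the paper's construction; the weights, the permutations, and all helper names are chosen purely for illustration, and vectors are treated as row vectors as in Marshall and Olkin (1979).

```python
from fractions import Fraction


def perm_matrix(p):
    """Permutation matrix with a one in row i, column p[i]."""
    n = len(p)
    return [[Fraction(int(p[i] == j)) for j in range(n)] for i in range(n)]


def mix(weights, mats):
    """Convex combination sum_j weights[j] * mats[j] of square matrices."""
    n = len(mats[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, mats)) for j in range(n)]
            for i in range(n)]


def row_times(y, d):
    """Row vector times matrix: (yD)_j = sum_i y_i * D_ij."""
    return [sum(y[i] * d[i][j] for i in range(len(y))) for j in range(len(d[0]))]


def is_doubly_stochastic(d):
    n = len(d)
    return (all(v >= 0 for row in d for v in row)
            and all(sum(row) == 1 for row in d)
            and all(sum(d[i][j] for i in range(n)) == 1 for j in range(n)))


def is_majorized(x, y):
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    sx = sy = 0
    for a, b in zip(xs, ys):
        sx, sy = sx + a, sy + b
        if sx > sy:
            return False
    return sx == sy


weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # illustrative weights
perms = [(0, 1, 2), (1, 0, 2), (2, 1, 0)]                    # illustrative permutations
mats = [perm_matrix(p) for p in perms]
D = mix(weights, mats)

y = [Fraction(6), Fraction(3), Fraction(0)]
x = row_times(y, D)                                          # condition ii: x = yD

# Condition iii: the same x as a weighted average of permuted copies of y.
x_iii = [sum(w * row_times(y, m)[j] for w, m in zip(weights, mats)) for j in range(3)]

print(is_doubly_stochastic(D), x == x_iii, is_majorized(x, y))
```

Here $x = (4, 4, 1)$, which is majorized by $y = (6, 3, 0)$, as condition i requires.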
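For each $y \in \mathbb{R}^n$ let $\gamma(y) = \{x \in \mathbb{R}^n : x \prec y\}$, the set of n-tuples that are majorized
by $y$. For a picture of this set in the case n = 3 see Marshall and Olkin, 1979, Figure 3, p.
9. Let $G = \{(y, x) \in \mathbb{R}^n \times \mathbb{R}^n : x \in \gamma(y)\}$, the graph of the relation $\gamma$. The following are
properties of this relation.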
Theorem 2. $\gamma$ is a compact convex valued continuous correspondence in $\mathbb{R}^n$.
Consequently $G$ is closed in $\mathbb{R}^n \times \mathbb{R}^n$.
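Proof: Clearly for each $y \in \mathbb{R}^n$, $y \in \gamma(y)$, so $\gamma$ is a correspondence in terms of
Hildenbrand, 1974, p. 5. $\gamma(y)$ is convex, by Theorem 1.ii (convex combinations of
doubly stochastic matrices are doubly stochastic) and compact since, by Theorem 1.iii, it
is the convex polyhedron generated by the finite number of permutations of $y$.
Suppose $y_m \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ with $y_m \to y$. If $x \in \gamma(y)$, by Theorem 1.ii,
$x = yD$, some $D \in \mathcal{D}_n$. Then again by Theorem 1.ii, $y_m D \in \gamma(y_m)$ and $y_m D \to yD = x$, so
$\gamma$ is lower hemi-continuous (Hildenbrand, 1974, Theorem 2, p. 27).
Let $y_m \to y$ with $x_m \in \gamma(y_m)$ arbitrary, and by Theorem 1.ii write $x_m = y_m D_m$ with
$D_m \in \mathcal{D}_n$. By Birkhoff’s theorem $\mathcal{D}_n$ is the
convex polyhedron generated by the permutation matrices and hence is compact in $\mathbb{R}^{n \times n}$.
Thus there is a subsequence of the $D_m$ that converges to an element $D \in \mathcal{D}_n$. For this
subsequence indexed by $m'$, $x_{m'} = y_{m'} D_{m'} \to yD \in \gamma(y)$. Thus $\gamma$ is upper hemi-continuous
(Hildenbrand, 1974, Theorem 1, p. 24). The closure of $G$ then follows by the same result.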
As obvious as these properties of $\gamma$ are, except for convexity of $\gamma(y)$, they
appear nowhere in the literature on majorization to this author’s knowledge. The
following result establishes the transitivity of majorization and will be used later in the
characterization of stochastic majorization.
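Lemma 3. For $x, y \in \mathbb{R}^n$, if $x \prec y$, then $\gamma(x) \subset \gamma(y)$.
Proof: For $x, y, z \in \mathbb{R}^n$, suppose $z \prec x$ and $x \prec y$. Then by Theorem 1.ii $z = xD_1$ and
$x = yD_2$, some $D_1, D_2 \in \mathcal{D}_n$. But then $z = yD_2D_1$ and $D_2D_1 \in \mathcal{D}_n$ (Marshall and Olkin, 1979,
2.A.3, p. 20). Again by Theorem 1.ii, $z \prec y$, that is, $z \in \gamma(y)$.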
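The proof of Lemma 3 rests on the fact that a product of doubly stochastic matrices is doubly stochastic. A minimal numerical check of that step follows; the matrices, the vector, and the helper names are illustrative assumptions, and vectors are treated as row vectors.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def row_times(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def is_doubly_stochastic(d):
    n = len(d)
    return (all(v >= 0 for row in d for v in row)
            and all(sum(row) == 1 for row in d)
            and all(sum(d[i][j] for i in range(n)) == 1 for j in range(n)))


def is_majorized(x, y):
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    sx = sy = 0
    for a, b in zip(xs, ys):
        sx, sy = sx + a, sy + b
        if sx > sy:
            return False
    return sx == sy


# Two illustrative doubly stochastic matrices (each averages a pair of coordinates).
D2 = [[F(1, 2), F(1, 2), 0], [F(1, 2), F(1, 2), 0], [0, 0, 1]]
D1 = [[1, 0, 0], [0, F(1, 2), F(1, 2)], [0, F(1, 2), F(1, 2)]]

y = [6, 2, 1]
x = row_times(y, D2)          # x = y D2, so x ≺ y
z = row_times(x, D1)          # z = x D1, so z ≺ x
D = matmul(D2, D1)            # z = y (D2 D1)

print(is_doubly_stochastic(D), z == row_times(y, D), is_majorized(z, y))
```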
We are interested in probability measures on $\mathbb{R}^n$. Let $\mathcal{B}(\mathbb{R}^n)$ denote the Borel sets in
$\mathbb{R}^n$. Let $\mathcal{P}(\mathbb{R}^n)$ denote the set of probability measures on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ endowed with the
topology of weak convergence (Hildenbrand, 1974, pp. 48-53). For the rest of the paper we
drop the reference space and just write $\mathcal{P}$. However, the reference space is different and
will be mentioned explicitly for one part of the characterization of stochastic majorization
in Section 3.
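For $\mu \in \mathcal{P}$ denote by $\mathrm{supp}(\mu)$ the support of $\mu$, the smallest closed subset of
$\mathbb{R}^n$ with measure one (Chung, 1974, p. 31). For each $y \in \mathbb{R}^n$, let
$\Gamma(y) = \{\mu \in \mathcal{P} : \mathrm{supp}(\mu) \subset \gamma(y)\}$. Let $G_\Gamma = \{(y, \mu) \in \mathbb{R}^n \times \mathcal{P} : \mu \in \Gamma(y)\}$, the graph of
the correspondence $\Gamma$.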
Theorem 4. $\Gamma$ is a compact convex valued continuous correspondence in $\mathcal{P}$. The graph
$G_\Gamma$ is closed in $\mathbb{R}^n \times \mathcal{P}$.
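Proof: Let $y \in \mathbb{R}^n$ and denote by $\delta_y$ the probability measure with $\delta_y(\{y\}) = 1$. Then
$\delta_y \in \Gamma(y)$. Let $\mu_1, \mu_2 \in \Gamma(y)$ and let $\alpha \in [0, 1]$. Then
$(\alpha\mu_1 + (1 - \alpha)\mu_2)(\gamma(y)) = 1$, so $\alpha\mu_1 + (1 - \alpha)\mu_2 \in \Gamma(y)$.
Thus $\Gamma$ is convex valued. By Himmelberg and Van Vleck, 1975, Theorem 3.i, $\Gamma$ inherits
the continuity and compact valuedness of $\gamma$ established in Theorem 2. The closure of
$G_\Gamma$ follows from Hildenbrand, 1974, Theorem 1, p. 24.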
We use the correspondence to characterize stochastic majorization in the next
section.
3. Stochastic Majorization
The functions $\varphi : \mathbb{R}^n \to \mathbb{R}$ that are increasing (non-decreasing) in the
majorization relation are called Schur-convex. See Marshall and Olkin, 1979, Ch. 1.D,
Ch. 3, for the origins of this terminology and the characterizations of this class of
functions. Denote by $\mathcal{S}$ the class of Borel measurable Schur-convex functions. The
measurability requirement is a restriction (Marshall and Olkin, 1979, 3.C.4, p. 70). We
can extend the relation $\prec$ in $\mathbb{R}^n$ to a relation in $\mathcal{P}$. For $\mu, \nu \in \mathcal{P}$ we say that
$\nu$ majorizes $\mu$ (or that $\mu$ is majorized by $\nu$), and write $\mu \prec \nu$, if and only if
$\int \varphi \, d\mu \le \int \varphi \, d\nu$ for every $\varphi \in \mathcal{S}$ for which both these integrals exist. The range of
integration is all of $\mathbb{R}^n$ unless specifically mentioned. This is truly an extension since for
$x, y \in \mathbb{R}^n$, $x \prec y$ in $\mathbb{R}^n$ if and only if $\delta_x \prec \delta_y$ in $\mathcal{P}$.
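For finitely supported measures the defining inequality can be probed numerically. The sketch below integrates a few well-known Schur-convex functions (the maximum, the sum of squares, and the range) against two illustrative measures. The measures, the function family, and the names are assumptions made for the example. Passing such a test for a finite family is only a necessary condition, since the definition quantifies over every Borel measurable Schur-convex function.

```python
from fractions import Fraction as F


def integrate(phi, measure):
    """Integral of phi against a finitely supported measure {point: mass}."""
    return sum(mass * phi(point) for point, mass in measure.items())


# A few symmetric convex (hence Schur-convex) test functions.
schur_convex = {
    "max": lambda x: max(x),
    "sum of squares": lambda x: sum(v * v for v in x),
    "range": lambda x: max(x) - min(x),
}

# Illustrative measures on R^2; mu shifts half of nu's mass to the point (2, 2).
nu = {(4, 0): F(1, 2), (3, 1): F(1, 2)}
mu = {(4, 0): F(1, 4), (3, 1): F(1, 4), (2, 2): F(1, 2)}

for name, phi in schur_convex.items():
    lhs, rhs = integrate(phi, mu), integrate(phi, nu)
    print(name, lhs, rhs, lhs <= rhs)
```

Every point of both supports has coordinate sum 4, which is what makes the comparison meaningful: majorization only compares vectors with equal totals.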
Intuitively, $\mu \prec \nu$ if $\nu$ puts more weight on vectors that are extreme in the
relation $\prec$ on $\mathbb{R}^n$ than does $\mu$. The relation $\prec$ in $\mathcal{P}$ is the version of stochastic
majorization studied in Marshall and Olkin, 1979, Ch. 11. There, definitions are given
in terms of $\mathbb{R}^n$ valued random vectors, say $X$ and $Y$. The relation is then stated as
$X \prec_{E_1} Y$ ($Y$ stochastically majorizes $X$ or $X$ is stochastically majorized by $Y$ in the
sense of $E_1$) if $E\varphi(X) \le E\varphi(Y)$ for all $\varphi \in \mathcal{S}$ for which these expectations exist.
It is easy to see that this is equivalent to the above definition since these expectations are
given by integration with respect to the distributions in $\mathcal{P}$ of these random vectors and
given these distributions there are $\mathbb{R}^n$ valued random variables with these distributions.
There is another definition of stochastic majorization that Marshall and Olkin,
1979, pp. 282-283, call $P_1$ that implies and appears ostensibly to be stronger than $E_1$.
There $X \prec_{P_1} Y$ if $\varphi(X) \le^{st} \varphi(Y)$ for all $\varphi \in \mathcal{S}$, where $\le^{st}$ is the typical meaning of
stochastically larger (Marshall and Olkin, 1979, 17.A.1). Clearly $P_1 \Rightarrow E_1$ since
stochastically larger random variables have larger expectations. It turns out that in this
particular case we also have $E_1 \Rightarrow P_1$. See the argument in Marshall and Olkin,
1979, top of p. 283. We will use this argument to show one part of the characterization
of the relation $\prec$ in $\mathcal{P}$ defined above.
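A Markov kernel on $\mathbb{R}^n$ is a map $k : \mathbb{R}^n \times \mathcal{B}(\mathbb{R}^n) \to [0, 1]$ such that for each set
$B \in \mathcal{B}(\mathbb{R}^n)$ the map $y \mapsto k(y, B)$ is Borel measurable and $k(y, \cdot) \in \mathcal{P}$ for fixed
$y \in \mathbb{R}^n$. For such a Markov kernel $k$ and a probability measure $\nu \in \mathcal{P}$
denote by $\nu \times k$ the element of $\mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n)$ defined by $(\nu \times k)(A \times B) = \int_A k(y, B)\, \nu(dy)$, for
measurable rectangles $A \times B$. We say that the first marginal of $\nu \times k$ is $\nu$ and denote
the second marginal $\nu k$. Finally, we say that a set $A \subset \mathbb{R}^n$ is Schur-convex if its indicator
function is Schur-convex. These designations are borrowed from Kamae et al., 1977, pp.
899-900.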
The following characterization of the relation $\prec$ on $\mathcal{P}$ is new and fleshes out
the intuition given above.
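Theorem 5. For $\mu, \nu \in \mathcal{P}$ the following are equivalent:
i. $\mu \prec \nu$;
ii. There exists a Markov kernel $k$ on $\mathbb{R}^n$ such that $\mu = \nu k$ and $k(y, \gamma(y)) = 1$, almost
every $y$ ($\nu$);
iii. There exists a probability measure $\lambda \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n)$ with $\mathrm{supp}(\lambda) \subset G$ with first
marginal $\nu$ and second marginal $\mu$;
iv. There exists a real valued random variable $Z$ and two measurable functions
$f, g : \mathbb{R} \to \mathbb{R}^n$ with $f \prec g$ ($f(z) \prec g(z)$, $z \in \mathbb{R}$) such that the distribution of $f(Z)$ is
$\mu$ and the distribution of $g(Z)$ is $\nu$;
v. There exist $\mathbb{R}^n$ valued random variables $X$ and $Y$ such that $X \prec_{P_1} Y$ and the
distribution of $X$ is $\mu$ and the distribution of $Y$ is $\nu$;
vi. $\mu(A) \le \nu(A)$ for every Schur-convex set $A \in \mathcal{B}(\mathbb{R}^n)$.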
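Conditions ii, iii, and vi can be illustrated on a finitely supported example. The sketch below uses a hypothetical kernel that leaves half of the mass at a point and moves the other half to the equalized vector, which is always majorized by the original point. The kernel, the starting measure, and the family of Schur-convex sets $\{x : \max_i x_i > t\}$ are assumptions chosen for illustration. The check covers only this family of sets, not every Schur-convex set.

```python
from fractions import Fraction as F


def is_majorized(x, y):
    xs, ys = sorted(x, reverse=True), sorted(y, reverse=True)
    sx = sy = 0
    for a, b in zip(xs, ys):
        sx, sy = sx + a, sy + b
        if sx > sy:
            return False
    return sx == sy


def kernel(y):
    """Hypothetical kernel k(y, .): mass 1/2 at y, mass 1/2 at the equalized vector."""
    mean = F(sum(y), len(y))
    out = {}
    for point in (tuple(y), (mean,) * len(y)):
        out[point] = out.get(point, 0) + F(1, 2)
    return out


nu = {(4, 0): F(1, 2), (3, 1): F(1, 2)}   # illustrative starting measure

# Joint measure on pairs (y, x) and its second marginal mu (condition ii).
joint, mu = {}, {}
for y, mass in nu.items():
    for x, p in kernel(y).items():
        joint[(y, x)] = joint.get((y, x), 0) + mass * p
        mu[x] = mu.get(x, 0) + mass * p

# Condition iii: the joint measure is carried by pairs with x majorized by y.
on_graph = all(is_majorized(x, y) for (y, x) in joint)


def mass_above(measure, t):
    """Measure of the Schur-convex set {x : max(x) > t}."""
    return sum(m for pt, m in measure.items() if max(pt) > t)


# Condition vi, checked on the sets {x : max(x) > t} for a few thresholds t.
thresholds = [0, 1, 2, F(5, 2), 3, F(7, 2), 4]
vi_holds = all(mass_above(mu, t) <= mass_above(nu, t) for t in thresholds)

print(on_graph, vi_holds)
print(sorted(mu.items()))
```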
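Proof: The key equivalence is i. and ii. The rest follow easily. Let $\mu, \nu \in \mathcal{P}$ and assume
that ii holds. Let $\varphi \in \mathcal{S}$ be such that the integrals exist. Then
$\int \varphi \, d\mu = \int \int \varphi(x)\, k(y, dx)\, \nu(dy) \le \int \varphi(y)\, \nu(dy)$, since $\int \varphi(x)\, k(y, dx) \le \varphi(y)$, almost every
$y$ ($\nu$). This establishes i.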
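Therefore assume i. For every bounded continuous function $f$ define
$f^*(y) = \sup\{\int f \, d\rho : \rho \in \Gamma(y)\}$. By Theorem 4 and Hildenbrand, 1974, Corollary p. 30,
$f^*$ is continuous in $y$ and $f^*(y) \ge f(y)$ for each $y$, since $\delta_y \in \Gamma(y)$
and $\int f \, d\delta_y = f(y)$. Thus $f^*$ is bounded as well, so all integrals below exist. Finally,
$f^*$ is also Schur-convex in $y$. For if $x \prec y$ and $\rho \in \Gamma(x)$ then by Lemma 3
$\gamma(x) \subset \gamma(y)$, implying that $\rho \in \Gamma(y)$, and hence $f^*(x) \le f^*(y)$. It follows that
$\int f \, d\mu \le \int f^* \, d\mu \le \int f^* \, d\nu$, the last inequality from i. Condition ii then
follows from Strassen, 1965, Theorem 3.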
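Assume ii and let $\lambda = \nu \times k$. Then $\lambda(G) = \int k(y, G_y)\, \nu(dy) = 1$, since the $y$-section
$G_y = \gamma(y)$ for every $y$. From Theorem 2, $G$ is closed in $\mathbb{R}^n \times \mathbb{R}^n$, so $\mathrm{supp}(\lambda) \subset G$. This gives iii. Therefore
assume iii. The construction in Kamae et al., 1977, Theorem 1 (iii) goes through here as
well and this gives iv. Assuming iv let $X = f(Z)$ and $Y = g(Z)$. Then clearly $X \prec_{E_1} Y$
and v. follows from the fact that $E_1 \Rightarrow P_1$ (Marshall and Olkin, 1979, top of p.
283). Therefore assume v. If $A$ is Schur-convex, then
$\mu(A) = E 1_A(X) \le E 1_A(Y) = \nu(A)$, where $1_A$ is the indicator of the set $A$ and the
inequality follows from the fact that $X \prec_{P_1} Y$ and so $X \prec_{E_1} Y$.
It remains to show that vi implies i. Assume vi. For $\varphi \in \mathcal{S}$, $\{x : \varphi(x) > t\}$ is Schur-convex for
all real $t$. It follows from vi that $\mu\{\varphi > t\} \le \nu\{\varphi > t\}$ for all real $t$ and
hence i. follows from Marshall and Olkin, 1979, 17.A.1.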
Kamae et al. (1977) use a theorem of Strassen (1965) to characterize stochastic
orderings induced by a partial order on the underlying space. Their result, Kamae et al.,
1977, Theorem 1, is the model for Theorem 5 above, but the relevant theorem of Strassen
used in the proof of Theorem 5 is not the one used by Kamae et al. (1977). We
emphasize here that the relation $\prec$ on $\mathbb{R}^n$ is not a partial order.
The crucial implication i. implies ii. in Theorem 5 relies on Theorem 3 of Strassen
(1965) and this theorem applies more generally. It can be used to obtain the same
implication for any pre-order on any Polish space that is sufficiently regular for the
function $f^*$, defined above in the second paragraph of the proof of Theorem 5, to be
Borel measurable in $y$. Weaker conditions than those of Theorem 2 above suffice for this
function to be Borel measurable. See for example Hildenbrand, 1974, Proposition 3, p. 60.
The result Hildenbrand, 1974, Corollary p. 30, is referred to in the mathematical
economics literature as the maximum theorem and is used there to establish continuity of
consumer demand in various situations. Transitivity of the pre-order gives monotonicity
of $f^*$ in the pre-order. Reflexivity of the pre-order gives $f^* \ge f$ and $\Gamma(y)$ is
convex whether $\gamma(y)$ is or not. This convex valuedness is essential to apply Strassen,
1965, Theorem 3, but comes at no cost.
Theorem 5 is reminiscent of the characterization of dilations as given for example
in Phelps, 1966, Ch. 13. In this case, a dilation moves probability weight toward extreme
points of a compact convex set. Here the Markov kernel $k$ of Theorem 5.ii moves
probability weight away from extreme points $y$ (which is an extreme point of $\gamma(y)$) to
less extreme points, in the sense of majorization, in $\gamma(y)$. Borrowing with a little license
from the terminology in Kamae et al., 1977, p. 900, we could call the Markov kernel $k$
of Theorem 5.ii downward.
As in the case of dilations, it is natural to ask about maximal measures for the
relation $\prec$ on $\mathcal{P}(K)$, where $K$ is a compact convex subset of $\mathbb{R}^n$, and the support of
these measures if they exist. This of course is complicated by the fact that $\prec$ is only a
pre-order and not a partial order. This is a project for future research.
REFERENCES
Chung, K. L. (1974), A Course in Probability Theory (Academic Press,
New York, 2nd ed.).
Hildenbrand, W. (1974), Core and Equilibria of a Large Economy
(Princeton University Press, Princeton).
Himmelberg, C. J. and Van Vleck, F. S. (1975),
Multifunctions with values in a space of probability measures. J. Math. Anal. Appl. 50,
108-112.
Kamae, T., Krengel, U. and O’Brien, G. L. (1977), Stochastic
inequalities on partially ordered spaces. Ann. Probab. 5, 899-912.
Marshall, A. W. and Olkin, I. (1979), Inequalities: Theory of
Majorization and Its Applications (Academic Press, New York).
Phelps, R. R. (1966), Lectures on Choquet’s Theorem (Van Nostrand,
Princeton).
Strassen, V. (1965), The existence of probability measures with given
marginals. Ann. Math. Statist. 36, 423-439.
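Derivation of Theorem 5.iv from Theorem 5.iii. (taken from Kamae et al., 1977,
Theorem 1 (iii) from Theorem 1 (ii)). The probability space $(G, \mathcal{B}(G), \lambda)$ is isomorphic mod 0
to $(I, \mathcal{B}(I), m)$ where $I$ is a Borel subset of $\mathbb{R}$, $\mathcal{B}(I)$ is the collection of Borel subsets of $I$,
and $m$ is a probability measure on $I$. The reference for this result is given by
Kamae et al., 1977, p. 900. Let $\psi : I \to G$ be the isomorphism and let $f = \pi_2 \circ \psi$ and
let $g = \pi_1 \circ \psi$, where $\pi_1$ and $\pi_2$ are the projections of $\mathbb{R}^n \times \mathbb{R}^n$ onto the first and
second factors. This defines $f$ and $g$ on $I$. For $z \in \mathbb{R}$, but $z \notin I$, take
$f(z) = g(z) = 0$. Since $\psi(z) \in G$ for $z \in I$, $f(z) \prec g(z)$ for every $z \in \mathbb{R}$. Let $Z$ be a real
valued random variable with distribution $m$. For each $B \in \mathcal{B}(\mathbb{R}^n)$,
$P(f(Z) \in B) = \lambda(\pi_2^{-1}(B)) = \mu(B)$ and $P(g(Z) \in B) = \lambda(\pi_1^{-1}(B)) = \nu(B)$, and thus the
distribution of $f(Z)$ is $\mu$ and the distribution of $g(Z)$ is $\nu$.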