
Stochastic Majorization: A Characterization

by

David C. Nachman

Department of Finance

J. Mack Robinson College of Business

Georgia State University

Atlanta, Georgia 30303-3083

September, 2005

Abstract

Stochastic majorization is a pre-order on the space of probability measures on a finite

dimensional Euclidean space induced by the majorization pre-order on this underlying

space. Taking advantage of techniques used in mathematical economics and the

continuity properties of the majorization relation, we provide a general characterization

of stochastic majorization.

Keywords: Majorization, stochastic majorization, Schur-convex functions and sets,

continuous correspondences.


1. Introduction

The applications of majorization and various notions of stochastic majorization are

extensive in mathematics, especially in probability and statistics. The excellent treatise by

Marshall and Olkin (1979) presents the theory of majorization and an extensive display

of its applications and extensions. See this treatise for appropriate references to the

original work. In particular, Marshall and Olkin, 1979, Chapter 11, presents and

characterizes various notions of stochastic majorization. The major one of import here is

the one induced by the cone of Schur-convex functions.

Kamae et al., 1977, Theorem 1, present a general characterization of the partial

ordering of probability measures induced by a partial ordering on the underlying space.

Majorization is a pre-order but not a partial order since it is not antisymmetric (Marshall

and Olkin, 1979, 1.B). In this paper, we exploit the continuity properties of majorization

and theorems of Strassen (1965) and Himmelberg and Van Vleck (1975) to provide a

characterization of stochastic majorization in the style of Kamae et al. The basics, including the

continuity properties, are presented in Section 2. To this author’s knowledge, these

continuity properties have not been noticed. The characterization of stochastic

majorization is presented in Section 3. To this author’s knowledge, this characterization is

new as well. We borrow much from Marshall and Olkin (1979).

2. Majorization

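Let $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ be n-tuples of real numbers and let $x_{\downarrow} = (x_{[1]}, \dots, x_{[n]})$ and $y_{\downarrow} = (y_{[1]}, \dots, y_{[n]})$ denote the vectors $x$ and $y$ with coordinates rearranged in decreasing order, i.e., $x_{[1]} \ge \dots \ge x_{[n]}$ and $y_{[1]} \ge \dots \ge y_{[n]}$. The vector $x$ is majorized by the vector $y$ (or $y$ majorizes $x$), written $x \prec y$, if for each $k = 1, \dots, n$,

$$\sum_{i=1}^{k} x_{[i]} \le \sum_{i=1}^{k} y_{[i]},$$

with equality holding for $k = n$ (Marshall and Olkin, 1979, A.1, p. 7).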
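The partial-sum definition translates directly into a short check. The following is a minimal sketch, not part of the original paper; the function name `is_majorized` and the floating-point tolerance `tol` are assumptions made for illustration.

```python
from itertools import accumulate


def is_majorized(x, y, tol=1e-12):
    """Return True if x is majorized by y under the partial-sum definition.

    Hypothetical helper: the name and the tolerance are illustrative choices.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Partial sums of the coordinates rearranged in decreasing order.
    px = list(accumulate(sorted(x, reverse=True)))
    py = list(accumulate(sorted(y, reverse=True)))
    if not px:
        return True
    # Inequality for k = 1, ..., n-1 and equality of the totals (k = n).
    partial_ok = all(a <= b + tol for a, b in zip(px[:-1], py[:-1]))
    totals_equal = abs(px[-1] - py[-1]) <= tol
    return partial_ok and totals_equal


print(is_majorized([2, 2, 2], [3, 2, 1]))  # True
print(is_majorized([3, 2, 1], [6, 0, 0]))  # True
print(is_majorized([6, 0, 0], [3, 2, 1]))  # False
print(is_majorized([1, 1, 1], [3, 2, 1]))  # False: the totals differ
```

Sorting both vectors first makes the check invariant under permutations of the coordinates, which is also why the relation cannot be antisymmetric.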

In words, $x$ is majorized by $y$ if the components of $x$ are more evenly spread out than the components of $y$, or the components of $y$ are more concentrated than the components of $x$. This intuition is reinforced by noting the following. Let $e = (1, \dots, 1)$, the n-tuple whose coordinates are all equal to one. Then for a vector $x$ the inner product $x \cdot e$ is the sum of the components of $x$. Let $\bar{x} = ((x \cdot e)/n)\, e$ and let $x^{i} = (0, \dots, 0, x \cdot e, 0, \dots, 0)$, where $x \cdot e$ appears in the $i$th component. The vectors $x$, $\bar{x}$, and $x^{i}$ all have the same total sum of components, but the components of $\bar{x}$ are more evenly spread out than those of $x$. Clearly $x^{i}$ concentrates this sum in one component. In this sense, $\bar{x}$ is the most evenly spread of this sum of components and $x^{i}$ is the most concentrated of this sum. Indeed, for $x$ with nonnegative components we have that $\bar{x} \prec x \prec x^{i}$, $i = 1, \dots, n$.
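As a worked instance of this chain, a hypothetical vector with n = 3 (not taken from the paper) can be checked against the partial-sum definition. Here $\bar{x}$ denotes the vector whose coordinates all equal the average of the coordinates of $x$, and $x^{1}$ the vector that places the whole sum in the first coordinate.

```latex
\[
x = (3, 2, 1), \qquad x \cdot e = 6, \qquad \bar{x} = (2, 2, 2), \qquad x^{1} = (6, 0, 0).
\]
Partial sums of the decreasing rearrangements, for $k = 1, 2, 3$:
\[
\bar{x}:\ 2,\ 4,\ 6; \qquad x:\ 3,\ 5,\ 6; \qquad x^{1}:\ 6,\ 6,\ 6.
\]
Since $2 \le 3 \le 6$, $4 \le 5 \le 6$, and all three totals equal $6$,
\[
\bar{x} \prec x \prec x^{1}.
\]
```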

We note that the majorization relation is reflexive and transitive (established

below in Lemma 3) and hence is a pre-ordering. It is not a partial ordering, however,

since it is not antisymmetric.

Let $\mathbb{R}^n$ denote n-dimensional Euclidean space. All topological properties in the sequel will be with respect to the usual metric on $\mathbb{R}^n$. Let $\mathcal{P}_n$ denote the set of $n \times n$ permutation matrices and let $\mathcal{D}_n$ denote the set of $n \times n$ doubly stochastic matrices. Then $\Pi \in \mathcal{P}_n$ if and only if there is exactly one entry equal to one in each row and each column of $\Pi$ and all other entries are zero. Similarly, $D \in \mathcal{D}_n$ if and only if the entries in $D$ are nonnegative and each row and each column sum to one.

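Theorem 1. For $x, y \in \mathbb{R}^n$, the following are equivalent:

i. $x \prec y$;

ii. $x = yD$, some doubly stochastic matrix $D \in \mathcal{D}_n$;

iii. $x = \sum_{j} \alpha_j\, y\Pi_j$, for some $\alpha_j \ge 0$, $\sum_{j} \alpha_j = 1$, and some permutation matrices $\Pi_j \in \mathcal{P}_n$.

Proof: The equivalence of i and ii is due to Hardy, Littlewood and Pólya. See Marshall and Olkin, 1979, Theorem 2.B.2. The equivalence of ii and iii is due to Birkhoff. See Marshall and Olkin, 1979, Theorem 2.A.2.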
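The three conditions of Theorem 1 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: the weights, the vector `y`, and all helper names are hypothetical. It builds a doubly stochastic matrix as a convex combination of permutation matrices (condition iii), applies it to a row vector (condition ii), and checks the partial-sum definition (condition i) in exact rational arithmetic.

```python
from fractions import Fraction
from itertools import accumulate, permutations


def is_majorized(x, y):
    """Partial-sum test for 'x is majorized by y' (exact arithmetic)."""
    px = list(accumulate(sorted(x, reverse=True)))
    py = list(accumulate(sorted(y, reverse=True)))
    return all(a <= b for a, b in zip(px[:-1], py[:-1])) and px[-1] == py[-1]


def permutation_matrix(perm):
    """Matrix Pi such that (y Pi)_j = y[perm[j]] for a row vector y."""
    n = len(perm)
    return [[Fraction(int(perm[j] == i)) for j in range(n)] for i in range(n)]


def mix(weights, matrices):
    """Entrywise convex combination of square matrices."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]


def is_doubly_stochastic(d):
    n = len(d)
    return (all(v >= 0 for row in d for v in row)
            and all(sum(row) == 1 for row in d)
            and all(sum(d[i][j] for i in range(n)) == 1 for j in range(n)))


def row_times_matrix(y, d):
    n = len(y)
    return [sum(y[i] * d[i][j] for i in range(n)) for j in range(n)]


# Hypothetical data: a vector y and convex weights on the 3! permutations.
y = [Fraction(6), Fraction(3), Fraction(0)]
perms = list(permutations(range(3)))
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8),
           Fraction(1, 8), Fraction(0), Fraction(0)]

D = mix(weights, [permutation_matrix(p) for p in perms])  # condition iii
x = row_times_matrix(y, D)                                # condition ii

print(is_doubly_stochastic(D))
print(is_majorized(x, y))                                 # condition i
```

Exact fractions are used so that the equality of totals required by the definition is tested without rounding error.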

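For each $y \in \mathbb{R}^n$ let $\Gamma(y) = \{x \in \mathbb{R}^n : x \prec y\}$, the set of n-tuples that are majorized by $y$. For a picture of this set in the case n = 3 see Marshall and Olkin, 1979, Figure 3, p. 9. Let $G = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^n : x \prec y\}$, the graph of the relation $\prec$. The following are properties of this relation.

Theorem 2. $\Gamma$ is a compact convex valued continuous correspondence in $\mathbb{R}^n$. Consequently $G$ is closed in $\mathbb{R}^n \times \mathbb{R}^n$.

Proof: Clearly for each $y \in \mathbb{R}^n$, $y \in \Gamma(y)$, so $\Gamma$ is a correspondence in terms of Hildenbrand, 1974, p. 5. $\Gamma(y)$ is convex, by Theorem 1.ii (convex combinations of doubly stochastic matrices are doubly stochastic) and compact since, by Theorem 1.iii, it is the convex polyhedron generated by the finite number of permutations of $y$.

Suppose $y_m \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$ with $y_m \to y$. If $x \in \Gamma(y)$, by Theorem 1.ii, $x = yD$, some doubly stochastic matrix $D$. Then again by Theorem 1.ii, $x_m = y_m D \in \Gamma(y_m)$ and $x_m \to x$, so $\Gamma$ is lower hemi-continuous (Hildenbrand, 1974, Theorem 2, p. 27).

Let $x_m = y_m D_m \in \Gamma(y_m)$ with the doubly stochastic matrices $D_m$ arbitrary. By Birkhoff’s theorem the set of doubly stochastic matrices is the convex polyhedron generated by the permutation matrices and hence is compact in $\mathbb{R}^{n \times n}$. Thus there is a subsequence of the $D_m$ that converges to a doubly stochastic matrix $D$. For this subsequence indexed by $m'$, $x_{m'} = y_{m'} D_{m'} \to yD \in \Gamma(y)$. Thus $\Gamma$ is upper hemi-continuous (Hildenbrand, 1974, Theorem 1, p. 24). The closure of $G$ then follows by the same result.

As obvious as these properties of $\Gamma$ are, except for convexity of $\Gamma(y)$, they appear nowhere in the literature on majorization to this author’s knowledge. The following result establishes the transitivity of majorization and will be used later in the characterization of stochastic majorization.

Lemma 3. For $x, y \in \mathbb{R}^n$, if $x \prec y$, then $\Gamma(x) \subset \Gamma(y)$.

Proof: For $x, y \in \mathbb{R}^n$, suppose $x \prec y$ and $z \in \Gamma(x)$. Then by Theorem 1.ii $z = xD_1$ and $x = yD_2$, some doubly stochastic matrices $D_1, D_2$. But then $z = yD_2D_1$ and $D_2D_1$ is doubly stochastic (Marshall and Olkin, 1979, 2.A.3, p. 20). Again by Theorem 1.ii, $z \prec y$.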
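The mechanism behind this transitivity argument, that a product of doubly stochastic matrices is again doubly stochastic, can be checked on a small example. This is a sketch with hypothetical matrices `D1`, `D2` and a hypothetical starting vector `z`; none of these values come from the paper.

```python
from fractions import Fraction as F
from itertools import accumulate


def is_majorized(x, y):
    """Partial-sum test for 'x is majorized by y' (exact arithmetic)."""
    px = list(accumulate(sorted(x, reverse=True)))
    py = list(accumulate(sorted(y, reverse=True)))
    return all(a <= b for a, b in zip(px[:-1], py[:-1])) and px[-1] == py[-1]


def row_times_matrix(v, d):
    """Row vector v times square matrix d."""
    n = len(v)
    return [sum(v[i] * d[i][j] for i in range(n)) for j in range(n)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_doubly_stochastic(d):
    n = len(d)
    return (all(v >= 0 for row in d for v in row)
            and all(sum(row) == 1 for row in d)
            and all(sum(d[i][j] for i in range(n)) == 1 for j in range(n)))


# Two hypothetical doubly stochastic matrices.
D1 = [[F(1, 2), F(1, 2), F(0)],
      [F(1, 2), F(1, 4), F(1, 4)],
      [F(0),    F(1, 4), F(3, 4)]]
D2 = [[F(1, 3), F(1, 3), F(1, 3)],
      [F(1, 3), F(2, 3), F(0)],
      [F(1, 3), F(0),    F(2, 3)]]

z = [F(9), F(3), F(0)]
y = row_times_matrix(z, D1)   # y = z D1, so y is majorized by z
x = row_times_matrix(y, D2)   # x = y D2, so x is majorized by y
D = matmul(D1, D2)            # x = z (D1 D2)

print(is_doubly_stochastic(D))
print(x == row_times_matrix(z, D))
print(is_majorized(x, y), is_majorized(y, z), is_majorized(x, z))
```

The last line confirms the chain on this example: each step is a majorization, and so is the composite.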

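We are interested in probability measures on $\mathbb{R}^n$. Let $\mathcal{B}(\mathbb{R}^n)$ denote the Borel sets in $\mathbb{R}^n$. Let $\mathcal{M}(\mathbb{R}^n)$ denote the set of probability measures on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ endowed with the topology of weak convergence (Hildenbrand, 1974, pp. 48-53). For the rest of the paper we drop the reference space and just write $\mathcal{M}$. However, the reference space is different and will be mentioned explicitly for one part of the characterization of stochastic majorization in Section 3.

For $P \in \mathcal{M}$ denote by $\operatorname{supp}(P)$ the support of $P$, the smallest closed subset of $\mathbb{R}^n$ with measure one (Chung, 1974, p. 31). For each $y \in \mathbb{R}^n$, let $\Lambda(y) = \{P \in \mathcal{M} : \operatorname{supp}(P) \subset \Gamma(y)\}$, where $\Gamma(y)$ is the set of vectors majorized by $y$. Let $G_\Lambda = \{(y, P) \in \mathbb{R}^n \times \mathcal{M} : P \in \Lambda(y)\}$, the graph of the correspondence $\Lambda$.

Theorem 4. $\Lambda$ is a compact convex valued continuous correspondence in $\mathcal{M}$. The graph $G_\Lambda$ is closed in $\mathbb{R}^n \times \mathcal{M}$.

Proof: Let $y \in \mathbb{R}^n$ and denote by $\delta_y$ the probability measure with $\delta_y(\{y\}) = 1$. Then $\delta_y \in \Lambda(y)$. Let $P_1, P_2 \in \Lambda(y)$ and let $\alpha \in [0, 1]$. Then $(\alpha P_1 + (1 - \alpha) P_2)(\Gamma(y)) = 1$, so $\alpha P_1 + (1 - \alpha) P_2 \in \Lambda(y)$. Thus $\Lambda$ is convex valued. By Himmelberg and Van Vleck, 1975, Theorem 3.i, $\Lambda$ inherits the continuity and compact valuedness of $\Gamma$ established in Theorem 2. The closure of $G_\Lambda$ follows from Hildenbrand, 1974, Theorem 1, p. 24.

We use the correspondence $\Lambda$ to characterize stochastic majorization in the next section.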

3. Stochastic Majorization

The functions $\phi : \mathbb{R}^n \to \mathbb{R}$ that are increasing (non-decreasing) in the majorization relation are called Schur-convex. See Marshall and Olkin, 1979, Ch. 1.D, Ch. 3, for the origins of this terminology and the characterizations of this class of functions. Denote by $\Phi$ the class of Borel measurable Schur-convex functions. The measurability requirement is a restriction (Marshall and Olkin, 1979, 3.C.4, p. 70). We can extend the relation $\prec$ in $\mathbb{R}^n$ to a relation in $\mathcal{M}$, the set of probability measures on $\mathbb{R}^n$. For $P, Q \in \mathcal{M}$ we say that $Q$ majorizes $P$ (or that $P$ is majorized by $Q$), and write $P \prec Q$, if and only if $\int \phi \, dP \le \int \phi \, dQ$ for every $\phi \in \Phi$ for which both these integrals exist. The range of integration is all of $\mathbb{R}^n$ unless specifically mentioned. This is truly an extension since for $x, y \in \mathbb{R}^n$, $\delta_x \prec \delta_y$ in $\mathcal{M}$ if and only if $x \prec y$ in $\mathbb{R}^n$, where $\delta_x$ is the point mass at $x$.

Intuitively, $P \prec Q$ if $Q$ puts more weight on vectors that are extreme in the relation $\prec$ on $\mathbb{R}^n$ than does $P$. The relation $\prec$ in $\mathcal{M}$ is the version of stochastic majorization studied in Marshall and Olkin, 1979, Ch. 11. There, definitions are given in terms of $\mathbb{R}^n$ valued random vectors, say $X$ and $Y$. The relation is then stated as $X \prec_{E_1} Y$ ($Y$ stochastically majorizes $X$, or $X$ is stochastically majorized by $Y$, in the sense of $E_1$) if $E\phi(X) \le E\phi(Y)$ for all $\phi \in \Phi$ for which these expectations exist. It is easy to see that this is equivalent to the above definition since these expectations are given by integration with respect to the distributions in $\mathcal{M}$ of these random vectors and given these distributions there are $\mathbb{R}^n$ valued random variables with these distributions.
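For finitely supported measures the integrals in this definition reduce to finite sums, so the defining inequality can be evaluated for any particular Schur-convex function. The sketch below uses two hypothetical measures `P` and `Q` on the plane and three standard Schur-convex functions (the maximum, the sum of squares, and the range). Checking finitely many functions gives only a necessary condition, so this illustrates the definition rather than testing the relation.

```python
from fractions import Fraction as F


def expectation(measure, phi):
    """Integral of phi against a finitely supported measure {point: mass}."""
    return sum(p * phi(x) for x, p in measure.items())


# A few standard Schur-convex functions (illustrative selection).
schur_convex = {
    "max": max,
    "sum of squares": lambda x: sum(v * v for v in x),
    "range": lambda x: max(x) - min(x),
}

# Hypothetical measures: Q moves the mass P places at (2, 2) out to (4, 0).
P = {(2, 2): F(1, 2), (3, 1): F(1, 2)}
Q = {(4, 0): F(1, 2), (3, 1): F(1, 2)}

for name, phi in schur_convex.items():
    e_p, e_q = expectation(P, phi), expectation(Q, phi)
    print(f"{name}: E_P = {e_p}, E_Q = {e_q}, E_P <= E_Q: {e_p <= e_q}")
```

Here each point in the support of `P` is majorized by a matching point in the support of `Q` carrying the same mass, which is why every comparison goes in the same direction.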

There is another definition of stochastic majorization that Marshall and Olkin, 1979, pp. 282-283, call $P_1$, which implies and appears ostensibly to be stronger than the relation $E_1$. There $X \prec_{P_1} Y$ if $\phi(X) \le_{st} \phi(Y)$ for all Borel measurable Schur-convex $\phi$, where $\le_{st}$ is the typical meaning of stochastically larger (Marshall and Olkin, 1979, 17.A.1). Clearly $P_1 \Rightarrow E_1$ since stochastically larger random variables have larger expectations. It turns out that in this particular case we also have $E_1 \Rightarrow P_1$. See the argument in Marshall and Olkin, 1979, top of p. 283. We will use this argument to show one part of the characterization of the relation $\prec$ in $\mathcal{M}$ defined above.
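The argument cited from Marshall and Olkin is not reproduced in the text. The following sketch gives one standard route to the converse implication; it is an assumption about what that argument amounts to, not a quotation of it. Here $\Phi$ is the class of Borel measurable Schur-convex functions and $\mathbf{1}\{\cdot\}$ is an indicator (notation chosen for this sketch).

```latex
Fix $\phi \in \Phi$ and $t \in \mathbb{R}$, and put $\psi_t(x) = \mathbf{1}\{\phi(x) > t\}$.
If $x \prec y$ then $\phi(x) \le \phi(y)$, hence $\psi_t(x) \le \psi_t(y)$; so $\psi_t$ is
bounded, Borel measurable and Schur-convex, i.e. $\psi_t \in \Phi$. Applying $E_1$ to $\psi_t$:
\[
\Pr\{\phi(X) > t\} = E\,\psi_t(X) \le E\,\psi_t(Y) = \Pr\{\phi(Y) > t\}
\quad \text{for all } t \in \mathbb{R},
\]
which is the statement $\phi(X) \le_{st} \phi(Y)$. As $\phi \in \Phi$ was arbitrary,
$E_1 \Rightarrow P_1$.
```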

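A Markov kernel on $\mathbb{R}^n$ is a map $k : \mathbb{R}^n \times \mathcal{B}(\mathbb{R}^n) \to [0, 1]$ such that for each Borel set $B \in \mathcal{B}(\mathbb{R}^n)$ the map $x \mapsto k(x, B)$ is Borel measurable and for fixed $x \in \mathbb{R}^n$, $k(x, \cdot) \in \mathcal{M}$. For such a Markov kernel $k$ and a probability measure $P \in \mathcal{M}$ denote by $P \times k$ the element of $\mathcal{M}(\mathbb{R}^n \times \mathbb{R}^n)$ defined by $(P \times k)(A \times B) = \int_A k(x, B)\, P(dx)$, for measurable rectangles, $A \times B$. We say that the first marginal of $P \times k$ is $P$ and denote the second marginal $Pk$. Finally, we say that a set $A \in \mathcal{B}(\mathbb{R}^n)$ is Schur-convex if its indicator function is Schur-convex. These designations are borrowed from Kamae et al., 1977, pp. 899-900.

The following characterization of the relation $\prec$ on $\mathcal{M}$ is new and fleshes out the intuition given above.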

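Theorem 5. For $P, Q \in \mathcal{M}$ the following are equivalent:

i. $P \prec Q$;

ii. There exists a Markov kernel $k$ on $\mathbb{R}^n$ such that $P = Qk$ and $k(y, \Gamma(y)) = 1$, almost every $y$ ($Q$), where $\Gamma(y) = \{x : x \prec y\}$;

iii. There exists a probability measure $\lambda \in \mathcal{M}(\mathbb{R}^n \times \mathbb{R}^n)$ with $\operatorname{supp}(\lambda) \subset G = \{(x, y) : x \prec y\}$ with first marginal $P$ and second marginal $Q$;

iv. There exists a real valued random variable $Z$ and two measurable functions $f, g : \mathbb{R} \to \mathbb{R}^n$ with $f \prec g$ ($f(z) \prec g(z)$, $z \in \mathbb{R}$) such that the distribution of $f(Z)$ is $P$ and the distribution of $g(Z)$ is $Q$;

v. There exist $\mathbb{R}^n$ valued random variables $X$ and $Y$ such that $X \prec_{P_1} Y$ and the distribution of $X$ is $P$ and the distribution of $Y$ is $Q$;

vi. $P(A) \le Q(A)$ for every Schur-convex set $A \in \mathcal{B}(\mathbb{R}^n)$.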

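Proof: The key equivalence is that of i and ii. The rest follow easily. Let $P, Q \in \mathcal{M}$ and assume that ii holds. Let $\phi \in \Phi$ be such that the integrals exist. Then

$$\int \phi \, dP = \int \int \phi(x)\, k(y, dx)\, Q(dy) \le \int \phi(y)\, Q(dy),$$

since $k(y, \Gamma(y)) = 1$, almost every $y$ ($Q$). This establishes i.

Therefore assume i. For every bounded continuous function $f : \mathbb{R}^n \to \mathbb{R}$ define $\hat{f}(y) = \max\{\int f \, d\mu : \mu \in \Lambda(y)\}$, where $\Lambda(y)$ is the set of probability measures with support in $\Gamma(y)$. By Theorem 4 and Hildenbrand, 1974, Corollary p. 30, $\hat{f}$ is continuous in $y$ and $f(y) \le \hat{f}(y) \le \sup f$ for each $y$, since $\delta_y \in \Lambda(y)$ and $\mu(\mathbb{R}^n) = 1$ for $\mu \in \Lambda(y)$. Thus $\hat{f}$ is bounded as well, so all integrals below exist. Finally, $\hat{f}$ is also Schur-convex in $y$. For if $x \prec y$ and $\mu \in \Lambda(x)$ then by Lemma 3 $\operatorname{supp}(\mu) \subset \Gamma(x) \subset \Gamma(y)$, implying that $\Lambda(x) \subset \Lambda(y)$, and hence $\hat{f}(x) \le \hat{f}(y)$. It follows that $\int f \, dP \le \int \hat{f} \, dP \le \int \hat{f} \, dQ$, the last inequality from i. Condition ii then follows from Strassen, 1965, Theorem 3.

Assume ii and let $\lambda(A \times B) = \int_B k(y, A)\, Q(dy)$ for measurable rectangles $A \times B$. Then $\lambda(G) = 1$, since the $y$-section of $G$ is $\Gamma(y)$ for every $y$. From Theorem 2, $G$ is closed in $\mathbb{R}^n \times \mathbb{R}^n$. This gives iii. Therefore assume iii. The construction in Kamae et al., 1977, Theorem 1 (iii), goes through here as well and this gives iv. Assuming iv let $X = f(Z)$ and $Y = g(Z)$. Then clearly $X \prec_{E_1} Y$ and v. follows from the fact that $E_1 \Rightarrow P_1$ (Marshall and Olkin, 1979, top of p. 283). Therefore assume v. If $A$ is Schur-convex, then $P(A) = E\, 1_A(X) \le E\, 1_A(Y) = Q(A)$, where $1_A$ is the indicator of the set $A$ and the inequality follows from the fact that $X \prec_{P_1} Y$ and so $1_A(X) \le_{st} 1_A(Y)$.

It remains to show that vi implies i. Assume vi. For $\phi \in \Phi$, $\{x : \phi(x) > t\}$ is Schur-convex for all real $t$. It follows from vi that $P(\{x : \phi(x) > t\}) \le Q(\{x : \phi(x) > t\})$ and hence i. follows from Marshall and Olkin, 1979, 17.A.1.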
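A discrete example can illustrate how a kernel of the kind described in Theorem 5.ii produces the set inequality of Theorem 5.vi. The sketch below is illustrative only: the measure `Q`, the helper names, and the particular kernel (half the mass stays at y, half moves to the vector obtained by averaging the largest and smallest coordinates of y) are all assumptions, and the Schur-convex sets tested are the upper level sets of the maximum coordinate.

```python
from fractions import Fraction as F
from itertools import accumulate


def is_majorized(x, y):
    """Partial-sum test for 'x is majorized by y' (exact arithmetic)."""
    px = list(accumulate(sorted(x, reverse=True)))
    py = list(accumulate(sorted(y, reverse=True)))
    return all(a <= b for a, b in zip(px[:-1], py[:-1])) and px[-1] == py[-1]


def average_extremes(y):
    """Replace the largest and smallest coordinates of y by their average."""
    z = list(y)
    i, j = z.index(max(z)), z.index(min(z))
    z[i] = z[j] = (z[i] + z[j]) / 2
    return tuple(z)


def kernel(y):
    """Hypothetical kernel k(y, .): mass 1/2 at y, mass 1/2 at the averaged vector."""
    out = {}
    for x in (tuple(y), average_extremes(y)):
        out[x] = out.get(x, F(0)) + F(1, 2)
    return out


def apply_kernel(Q, k):
    """Second marginal Qk of a finitely supported Q under the kernel k."""
    P = {}
    for y, q in Q.items():
        for x, w in k(y).items():
            P[x] = P.get(x, F(0)) + q * w
    return P


def mass(measure, event):
    return sum(p for x, p in measure.items() if event(x))


Q = {(F(4), F(0), F(0)): F(1, 2), (F(3), F(1), F(0)): F(1, 2)}
P = apply_kernel(Q, kernel)

# The kernel only moves mass to vectors majorized by the starting point.
print(all(is_majorized(x, y) for y in Q for x in kernel(y)))

# Upper level sets of the maximum coordinate are Schur-convex sets.
for t in (1, 2, 3, 4):
    p_t = mass(P, lambda x: max(x) >= t)
    q_t = mass(Q, lambda x: max(x) >= t)
    print(t, p_t, q_t, p_t <= q_t)
```

Averaging two coordinates is a convex combination of the identity and a transposition, so the averaged vector is always majorized by the original one.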

Kamae et al. (1977) use a theorem of Strassen (1965) to characterize stochastic orderings induced by a partial order on the underlying space. Their result, Kamae et al., 1977, Theorem 1, is the model for Theorem 5 above, but the relevant theorem of Strassen used in the proof of Theorem 5 is not the one used by Kamae et al. (1977). We emphasize here that the relation $\prec$ on $\mathbb{R}^n$ is not a partial order.

The crucial implication i. implies ii. in Theorem 5 relies on Theorem 3 of Strassen (1965) and this theorem applies more generally. It can be used to obtain the same implication for any pre-order on any Polish space that is sufficiently regular for the function $\hat{f}$, defined above in the second paragraph of the proof of Theorem 5, to be Borel measurable in $y$. Weaker conditions than those of Theorem 2 above suffice for this function to be Borel measurable. See for example Hildenbrand, 1974, Proposition 3, p. 60. The result Hildenbrand, 1974, Corollary p. 30, is referred to in the mathematical economics literature as the maximum theorem and is used there to establish continuity of consumer demand in various situations. Transitivity of the pre-order gives monotonicity of $\hat{f}$ in the pre-order. Reflexivity of the pre-order gives $\delta_y \in \Lambda(y)$ and $\Lambda(y)$ is convex whether $\Gamma(y)$ is or not. This convex valuedness is essential to apply Strassen, 1965, Theorem 3, but comes at no cost.

Theorem 5 is reminiscent of the characterization of dilations as given for example in Phelps, 1966, Ch. 13. In this case, a dilation moves probability weight toward extreme points of a compact convex set. Here the Markov kernel $k$ of Theorem 5.ii moves probability weight away from extreme points $y$ (which is an extreme point of $\Gamma(y)$) to less extreme points, in the sense of majorization, in $\Gamma(y)$. Borrowing with a little license from the terminology in Kamae et al., 1977, p. 900, we could call the Markov kernel $k$ of Theorem 5.ii downward.

As in the case of dilations, it is natural to ask about maximal measures for the relation $\prec$ on $\mathcal{M}(K)$, where $K$ is a compact convex subset of $\mathbb{R}^n$, and the support of these measures if they exist. This of course is complicated by the fact that $\prec$ is only a pre-order and not a partial order. This is a project for future research.


REFERENCES

Chung, K. L. (1974), A Course in Probability Theory (Academic Press, New York, 2nd ed.).

Hildenbrand, W. (1974), Core and Equilibria of a Large Economy (Princeton University Press, Princeton).

Himmelberg, C. J. and Van Vleck, F. S. (1975), Multifunctions with values in a space of probability measures. J. Math. Anal. Appl. 50, 108-112.

Kamae, T., Krengel, U. and O’Brien, G. L. (1977), Stochastic inequalities on partially ordered spaces. Ann. Probab. 5, 899-912.

Marshall, A. W. and Olkin, I. (1979), Inequalities: Theory of Majorization and Its Applications (Academic Press, New York).

Phelps, R. R. (1966), Lectures on Choquet’s Theorem (Van Nostrand, Princeton).

Strassen, V. (1965), The existence of probability measures with given marginals. Ann. Math. Statist. 36, 423-439.


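Derivation of Theorem 5.iv from Theorem 5.iii (taken from the derivation in Kamae et al., 1977, of Theorem 1 (iii) from Theorem 1 (ii)). The probability space $(\mathbb{R}^n \times \mathbb{R}^n, \mathcal{B}(\mathbb{R}^n \times \mathbb{R}^n), \lambda)$ is isomorphic mod 0 to $(S, \mathcal{B}(S), m)$ where $S$ is a Borel subset of $\mathbb{R}$, $\mathcal{B}(S)$ is the collection of Borel subsets of $S$, and $m$ is a probability measure on $(S, \mathcal{B}(S))$. The reference for this result is given by Kamae et al., 1977, p. 900. Let $\psi : S \to \mathbb{R}^n \times \mathbb{R}^n$ be the isomorphism and let $f = \pi_1 \circ \psi$ and let $g = \pi_2 \circ \psi$, where $\pi_1$ and $\pi_2$ are the projections of $\mathbb{R}^n \times \mathbb{R}^n$ onto the first and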

second factors. This defines and on . For , but , take

. For each , and

and and . Also and and thus the

distribution of is and the distribution of is .
