Multilinear Form

From Wikipedia, the free encyclopedia

Contents

1 Majorization
    1.1 Examples
    1.2 Geometry of Majorization
    1.3 Equivalent conditions
    1.4 In linear algebra
    1.5 In recursion theory
    1.6 Generalizations
    1.7 See also
    1.8 Notes
    1.9 References
    1.10 External links
    1.11 Software

2 MATLAB
    2.1 History
    2.2 Syntax
        2.2.1 Variables
        2.2.2 Vectors and matrices
        2.2.3 Structures
        2.2.4 Functions
        2.2.5 Function handles
        2.2.6 Classes and Object-Oriented Programming
    2.3 Graphics and graphical user interface programming
    2.4 Interfacing with other languages
    2.5 License
    2.6 Alternatives
    2.7 Release history
    2.8 File extensions
        2.8.1 MATLAB
        2.8.2 Simulink
        2.8.3 Simscape
        2.8.4 MuPAD
        2.8.5 Third-party
    2.9 Easter eggs
    2.10 See also
    2.11 Notes
    2.12 References
    2.13 External links

3 Matrix addition
    3.1 Entrywise sum
    3.2 Direct sum
    3.3 Kronecker sum
    3.4 See also
    3.5 Notes
    3.6 References
    3.7 External links

4 Matrix analysis
    4.1 Matrix spaces
    4.2 Determinants
    4.3 Eigenvalues and eigenvectors of matrices
        4.3.1 Definitions
        4.3.2 Perturbations of eigenvalues
    4.4 Matrix similarity
        4.4.1 Unitary similarity
    4.5 Canonical forms
        4.5.1 Row echelon form
        4.5.2 Jordan normal form
        4.5.3 Weyr canonical form
        4.5.4 Frobenius normal form
    4.6 Triangular factorization
        4.6.1 LU decomposition
    4.7 Matrix norms
        4.7.1 Definition and axioms
        4.7.2 Frobenius norm
    4.8 Positive definite and semidefinite matrices
    4.9 Functions
        4.9.1 Functions of matrices
        4.9.2 Matrix-valued functions
    4.10 See also
        4.10.1 Other branches of analysis
        4.10.2 Other concepts of linear algebra
        4.10.3 Types of matrix
        4.10.4 Matrix functions
    4.11 Footnotes
    4.12 References
        4.12.1 Notes
        4.12.2 Further reading

5 Matrix calculus
    5.1 Scope
        5.1.1 Relation to other derivatives
        5.1.2 Usages
    5.2 Notation
        5.2.1 Alternatives
    5.3 Derivatives with vectors
        5.3.1 Vector-by-scalar
        5.3.2 Scalar-by-vector
        5.3.3 Vector-by-vector
    5.4 Derivatives with matrices
        5.4.1 Matrix-by-scalar
        5.4.2 Scalar-by-matrix
        5.4.3 Other matrix derivatives
    5.5 Layout conventions
        5.5.1 Numerator-layout notation
        5.5.2 Denominator-layout notation
    5.6 Identities
        5.6.1 Vector-by-vector identities
        5.6.2 Scalar-by-vector identities
        5.6.3 Vector-by-scalar identities
        5.6.4 Scalar-by-matrix identities
        5.6.5 Matrix-by-scalar identities
        5.6.6 Scalar-by-scalar identities
        5.6.7 Identities in differential form
    5.7 See also
    5.8 Notes
    5.9 External links

6 Matrix Chernoff bound
    6.1 Matrix Gaussian and Rademacher series
        6.1.1 Self-adjoint matrices case
        6.1.2 Rectangular case
    6.2 Matrix Chernoff inequalities
        6.2.1 Matrix Chernoff I
        6.2.2 Matrix Chernoff II
    6.3 Matrix Bennett and Bernstein inequalities
        6.3.1 Bounded case
        6.3.2 Subexponential case
        6.3.3 Rectangular case
    6.4 Matrix Azuma, Hoeffding, and McDiarmid inequalities
        6.4.1 Matrix Azuma
        6.4.2 Matrix Hoeffding
        6.4.3 Matrix bounded difference (McDiarmid)
    6.5 Survey of related theorems
    6.6 Derivation and proof
        6.6.1 Ahlswede and Winter
        6.6.2 Tropp
    6.7 References

7 Matrix congruence
    7.1 Congruence over the reals
    7.2 See also
    7.3 References

8 Matrix decomposition into clans
    8.1 Motivation
    8.2 Near relation
    8.3 Clan relation
    8.4 Decomposed matrix structure
    8.5 Employing decomposition into clans
    8.6 References

9 Matrix determinant lemma
    9.1 Statement
    9.2 Proof
    9.3 Application
    9.4 Generalization
    9.5 See also
    9.6 References

10 Matrix difference equation
    10.1 Non-homogeneous first-order matrix difference equations and the steady state
    10.2 Stability of the first-order case
    10.3 Solution of the first-order case
    10.4 Extracting the dynamics of a single scalar variable from a first-order matrix system
    10.5 Solution and stability of higher-order cases
    10.6 Nonlinear matrix difference equations: Riccati equations
    10.7 See also
    10.8 References

11 Matrix norm
    11.1 Definition
    11.2 Induced norm
    11.3 Entrywise norms
        11.3.1 L2,1 norm
        11.3.2 Frobenius norm
        11.3.3 Max norm
    11.4 Schatten norms
    11.5 Consistent norms
    11.6 Compatible norms
    11.7 Equivalence of norms
        11.7.1 Examples of norm equivalence
    11.8 Notes
    11.9 References

12 Matrix pencil
    12.1 Applications
    12.2 Pencil generated by commuting matrices
    12.3 See also
    12.4 Notes
    12.5 References

13 Mixed linear complementarity problem
    13.1 References

14 Mode of a linear field

15 Multilinear form
    15.1 See also
    15.2 References
    15.3 Text and image sources, contributors, and licenses
        15.3.1 Text
        15.3.2 Images
        15.3.3 Content license

Chapter 1

    Majorization

This article is about a partial ordering of vectors on $\mathbb{R}^d$. For functions, see Lorenz ordering.

In mathematics, majorization is a preorder on vectors of real numbers. For a vector $a \in \mathbb{R}^d$, we denote by $a^{\downarrow} \in \mathbb{R}^d$ the vector with the same components, but sorted in descending order. Given $a, b \in \mathbb{R}^d$, we say that $a$ weakly majorizes (or dominates) $b$ from below, written as $a \succ_w b$, iff

$$\sum_{i=1}^{k} a_i^{\downarrow} \ge \sum_{i=1}^{k} b_i^{\downarrow} \quad \text{for } k = 1, \dots, d,$$

where $a_i^{\downarrow}$ and $b_i^{\downarrow}$ are the elements of $a$ and $b$, respectively, sorted in decreasing order. Equivalently, we say that $b$ is weakly majorized (or dominated) by $a$ from below, denoted as $b \prec_w a$. Similarly, we say that $a$ weakly majorizes $b$ from above, written as $a \succ^w b$, iff

$$\sum_{i=k}^{d} a_i^{\downarrow} \le \sum_{i=k}^{d} b_i^{\downarrow} \quad \text{for } k = 1, \dots, d.$$

Equivalently, we say that $b$ is weakly majorized by $a$ from above, denoted as $b \prec^w a$. If $a \succ_w b$ and in addition $\sum_{i=1}^{d} a_i = \sum_{i=1}^{d} b_i$, we say that $a$ majorizes (or dominates) $b$, written as $a \succ b$.

Equivalently, we say that $b$ is majorized (or dominated) by $a$, denoted as $b \prec a$. It is easy to see that $a \succ b$ if and only if $a \succ_w b$ and $a \succ^w b$.

Note that the majorization order does not depend on the order of the components of the vectors $a$ or $b$. Majorization is not a partial order, since $a \succ b$ and $b \succ a$ do not imply $a = b$; they only imply that the components of each vector are equal, but not necessarily in the same order.

Regrettably, to confuse the matter, some literature sources use the reverse notation, e.g., $\succ$ is replaced with $\prec$, most notably in Horn and Johnson, Matrix Analysis (Cambridge Univ. Press, 1985), Definition 4.3.24, while the same authors switch to the traditional notation, introduced here, later in their Topics in Matrix Analysis (1994).

A function $f : \mathbb{R}^d \to \mathbb{R}$ is said to be Schur convex when $a \succ b$ implies $f(a) \ge f(b)$. Similarly, $f$ is Schur concave when $a \succ b$ implies $f(a) \le f(b)$.

The majorization partial order on finite sets, described here, can be generalized to the Lorenz ordering, a partial order on distribution functions.
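The definition above translates directly into a few lines of MATLAB/Octave (the Software section below links to published checking code; this is only an illustrative sketch, and the function name majorizes is hypothetical):

function tf = majorizes(a, b)
% MAJORIZES  True if a majorizes b, i.e. a >- b (illustrative sketch).
a = sort(a(:), 'descend');                 % components of a in decreasing order
b = sort(b(:), 'descend');                 % components of b in decreasing order
tol = 1e-12;                               % tolerance for floating-point sums
tf = all(cumsum(a) >= cumsum(b) - tol) ... % partial-sum inequalities
     && abs(sum(a) - sum(b)) <= tol;       % equal totals
end

For instance, majorizes([0 3 3], [1 2 3]) returns true, matching the example $(1,2,3) \prec (0,3,3)$ below.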

1.1 Examples

The order of the entries does not affect the majorization, e.g., the statement $(1,2) \prec (0,3)$ is simply equivalent to $(2,1) \prec (3,0)$.



(Strong) majorization: $(1,2,3) \prec (0,3,3) \prec (0,0,6)$. For vectors with $n$ components:

$$\left(\frac{1}{n}, \dots, \frac{1}{n}\right) \prec \left(\frac{1}{n-1}, \dots, \frac{1}{n-1}, 0\right) \prec \dots \prec \left(\frac{1}{2}, \frac{1}{2}, 0, \dots, 0\right) \prec (1, 0, \dots, 0).$$

(Weak) majorization: $(1,2,3) \prec_w (1,3,3) \prec_w (1,3,4)$. For vectors with $n$ components:

$$\left(\frac{1}{n}, \dots, \frac{1}{n}\right) \prec_w \left(\frac{1}{n-1}, \dots, \frac{1}{n-1}, 1\right).$$

    1.2 Geometry of Majorization

    Figure 1. 2D Majorization Example

For $x, y \in \mathbb{R}^n$, we have $x \prec y$ if and only if $x$ is in the convex hull of all vectors obtained by permuting the coordinates of $y$.

Figure 1 displays the convex hull in 2D for the vector $y = (3,1)$. Notice that the center of the convex hull, which is an interval in this case, is the vector $x = (2,2)$. This is the smallest vector satisfying $x \prec y$ for this given vector $y$.

Figure 2 shows the convex hull in 3D. The center of the convex hull, which is a 2D polygon in this case, is the smallest vector $x$ satisfying $x \prec y$ for this given vector $y$.


    Figure 2. 3D Majorization Example

1.3 Equivalent conditions

Each of the following statements is true if and only if $a \succ b$:

• $b = Da$ for some doubly stochastic matrix $D$ (see Arnold,[1] Theorem 2.1). This is equivalent to saying $b$ can be represented as a weighted average of the permutations of $a$.

• From $a$ we can produce $b$ by a finite sequence of "Robin Hood" operations, where we replace two elements $a_i$ and $a_j < a_i$ with $a_i - \varepsilon$ and $a_j + \varepsilon$, respectively, for some $\varepsilon \in (0, a_i - a_j)$ (see Arnold,[1] p. 11); a minimal sketch of one such step follows this list.

• For every convex function $h : \mathbb{R} \to \mathbb{R}$, $\sum_{i=1}^{d} h(a_i) \ge \sum_{i=1}^{d} h(b_i)$ (see Arnold,[1] Theorem 2.9).

• $\forall t \in \mathbb{R}: \sum_{j=1}^{d} |a_j - t| \ge \sum_{j=1}^{d} |b_j - t|$ (see Nielsen and Chuang, Exercise 12.17[2]).
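The Robin Hood operation is easy to see in MATLAB/Octave terms; the indices and the transfer amount below are illustrative, not prescribed by the theorem:

a = [6 0 0];        % starting vector
i = 1; j = 2;       % pick indices with a(j) < a(i)
e = 3;              % any transfer amount in the open interval (0, a(i)-a(j))
a(i) = a(i) - e;    % take e from the larger element...
a(j) = a(j) + e;    % ...and give it to the smaller one
disp(a)             % [3 3 0], which (6, 0, 0) majorizes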

1.4 In linear algebra

• Suppose that for two real vectors $v, v' \in \mathbb{R}^d$, $v$ majorizes $v'$. Then it can be shown that there exists a set of probabilities $(p_1, p_2, \dots, p_d)$, $\sum_{i=1}^{d} p_i = 1$, and a set of permutations $(P_1, P_2, \dots, P_d)$ such that $v' = \sum_{i=1}^{d} p_i P_i v$. Alternatively it can be shown that there exists a doubly stochastic matrix $D$ such that $Dv = v'$.

• We say that a Hermitian operator, $H$, majorizes another, $H'$, if the set of eigenvalues of $H$ majorizes that of $H'$.


1.5 In recursion theory

Given $f, g : \mathbb{N} \to \mathbb{N}$, $f$ is said to majorize $g$ if, for all $x$, $f(x) \ge g(x)$. If there is some $n$ such that $f(x) \ge g(x)$ for all $x > n$, then $f$ is said to dominate (or eventually dominate) $g$. Alternatively, the preceding terms are often defined requiring the strict inequality $f(x) > g(x)$ instead of $f(x) \ge g(x)$ in the foregoing definitions.

1.6 Generalizations

Various generalizations of majorization are discussed in chapters 14 and 15 of the reference work Inequalities: Theory of Majorization and Its Applications. Albert W. Marshall, Ingram Olkin, Barry Arnold. Second edition. Springer Series in Statistics. Springer, New York, 2011. ISBN 978-0-387-40087-7.

1.7 See also

• Muirhead's inequality
• Schur-convex function
• Schur–Horn theorem relating diagonal entries of a matrix to its eigenvalues
• For positive integer numbers, weak majorization is called Dominance order.

1.8 Notes

[1] Barry C. Arnold. Majorization and the Lorenz Order: A Brief Introduction. Springer-Verlag Lecture Notes in Statistics, vol. 43, 1987.

[2] Nielsen and Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2000.

1.9 References

• J. Karamata. "Sur une inégalité relative aux fonctions convexes." Publ. Math. Univ. Belgrade 1, 145–158, 1932.
• G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, 2nd edition, 1952, Cambridge University Press, London.
• Inequalities: Theory of Majorization and Its Applications. Albert W. Marshall, Ingram Olkin, Barry Arnold. Second edition. Springer Series in Statistics. Springer, New York, 2011. ISBN 978-0-387-40087-7.
• Inequalities: Theory of Majorization and Its Applications (1980) Albert W. Marshall, Ingram Olkin, Academic Press, ISBN 978-0-12-473750-1.
• A tribute to Marshall and Olkin's book "Inequalities: Theory of Majorization and its Applications".
• Quantum Computation and Quantum Information (2000) Michael A. Nielsen and Isaac L. Chuang, Cambridge University Press, ISBN 978-0-521-63503-5.
• Matrix Analysis (1996) Rajendra Bhatia, Springer, ISBN 978-0-387-94846-1.
• Topics in Matrix Analysis (1994) Roger A. Horn and Charles R. Johnson, Cambridge University Press, ISBN 978-0-521-46713-1.
• Majorization and Matrix Monotone Functions in Wireless Communications (2007) Eduard Jorswieck and Holger Boche, Now Publishers, ISBN 978-1-60198-040-3.
• The Cauchy Schwarz Master Class (2004) J. Michael Steele, Cambridge University Press, ISBN 978-0-521-54677-5.


1.10 External links

• Majorization in MathWorld
• Majorization in PlanetMath

1.11 Software

• OCTAVE/MATLAB code to check majorization

Chapter 2

    MATLAB

For the region in Bangladesh, see Matlab (Bangladesh). Not to be confused with MATHLAB.

MATLAB (matrix laboratory) is a multi-paradigm numerical computing environment and fourth-generation programming language. Developed by MathWorks, MATLAB allows matrix manipulations, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs written in other languages, including C, C++, Java, Fortran and Python.

Although MATLAB is intended primarily for numerical computing, an optional toolbox uses the MuPAD symbolic engine, allowing access to symbolic computing capabilities. An additional package, Simulink, adds graphical multi-domain simulation and Model-Based Design for dynamic and embedded systems.

In 2004, MATLAB had around one million users across industry and academia.[3] MATLAB users come from various backgrounds of engineering, science, and economics. MATLAB is widely used in academic and research institutions as well as industrial enterprises.

2.1 History

Cleve Moler, the chairman of the computer science department at the University of New Mexico, started developing MATLAB in the late 1970s.[4] He designed it to give his students access to LINPACK and EISPACK without them having to learn Fortran. It soon spread to other universities and found a strong audience within the applied mathematics community. Jack Little, an engineer, was exposed to it during a visit Moler made to Stanford University in 1983. Recognizing its commercial potential, he joined with Moler and Steve Bangert. They rewrote MATLAB in C and founded MathWorks in 1984 to continue its development. These rewritten libraries were known as JACKPAC.[5] In 2000, MATLAB was rewritten to use a newer set of libraries for matrix manipulation, LAPACK.[6]

MATLAB was first adopted by researchers and practitioners in control engineering, Little's specialty, but quickly spread to many other domains. It is now also used in education, in particular for teaching linear algebra and numerical analysis, and is popular amongst scientists involved in image processing.[4]

2.2 Syntax

The MATLAB application is built around the MATLAB scripting language. Common usage of the MATLAB application involves using the Command Window as an interactive mathematical shell or executing text files containing MATLAB code.[7]

2.2.1 Variables

Variables are defined using the assignment operator, =. MATLAB is a weakly typed programming language because types are implicitly converted.[8] It is an inferred typed language because variables can be assigned without declaring their type, except if they are to be treated as symbolic objects,[9] and their type can change. Values can come from constants, from computation involving values of other variables, or from the output of a function. For example:

>> x = 17
x =
    17

>> x = 'hat'
x =
hat

>> y = x + 0
y =
   104    97   116

>> x = [3*4, pi/2]
x =
   12.0000    1.5708

>> y = 3*sin(x)
y =
   -1.6097    3.0000

    2.2.2 Vectors and matrices

A simple array is defined using the colon syntax: init:increment:terminator. For instance:

>> array = 1:2:9
array =
   1   3   5   7   9

defines a variable named array (or assigns a new value to an existing variable with the name array) which is an array consisting of the values 1, 3, 5, 7, and 9. That is, the array starts at 1 (the init value), increments with each step from the previous value by 2 (the increment value), and stops once it reaches (or to avoid exceeding) 9 (the terminator value).

>> array = 1:3:9
array =
   1   4   7

The increment value can actually be left out of this syntax (along with one of the colons), to use a default value of 1.

>> ari = 1:5
ari =
   1   2   3   4   5

assigns to the variable named ari an array with the values 1, 2, 3, 4, and 5, since the default value of 1 is used as the increment.

Indexing is one-based,[10] which is the usual convention for matrices in mathematics, although not for some programming languages such as C, C++, and Java.

Matrices can be defined by separating the elements of a row with blank space or comma and using a semicolon to terminate each row. The list of elements should be surrounded by square brackets []. Parentheses () are used to access elements and subarrays (they are also used to denote a function argument list).

>> A = [16 3 2 13; 5 10 11 8; 9 6 7 12; 4 15 14 1]
A =
   16    3    2   13
    5   10   11    8
    9    6    7   12
    4   15   14    1

>> A(2,3)
ans =
   11

Sets of indices can be specified by expressions such as 2:4, which evaluates to [2, 3, 4]. For example, a submatrix taken from rows 2 through 4 and columns 3 through 4 can be written as:

>> A(2:4,3:4)
ans =
   11    8
    7   12
   14    1

A square identity matrix of size n can be generated using the function eye, and matrices of any size with zeros or ones can be generated with the functions zeros and ones, respectively.

>> eye(3,3)
ans =
   1   0   0
   0   1   0
   0   0   1

>> zeros(2,3)
ans =
   0   0   0
   0   0   0

>> ones(2,3)
ans =
   1   1   1
   1   1   1

Most MATLAB functions can accept matrices and will apply themselves to each element. For example, mod(2*J,n) will multiply every element in J by 2, and then reduce each element modulo n. MATLAB does include standard for and while loops, but (as in other similar applications such as R), using the vectorized notation often produces code that is faster to execute. This code, excerpted from the function magic.m, creates a magic square M for odd values of n (MATLAB function meshgrid is used here to generate square matrices I and J containing 1:n).

[J,I] = meshgrid(1:n);
A = mod(I + J - (n + 3) / 2, n);
B = mod(I + 2 * J - 2, n);
M = n * A + B + 1;

    2.2.3 Structures

MATLAB has structure data types.[11] Since all variables in MATLAB are arrays, a more adequate name is "structure array", where each element of the array has the same field names. In addition, MATLAB supports dynamic field names[12] (field look-ups by name, field manipulations, etc.). Unfortunately, the MATLAB JIT does not support MATLAB structures, so even a simple bundling of various variables into a structure comes at a cost.[13]
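A minimal sketch of a structure array with a dynamic field look-up (the variable and field names here are illustrative):

s(1).name = 'alpha'; s(1).value = 1;   % every element of the structure array
s(2).name = 'beta';  s(2).value = 2;   % shares the same field names
f = 'value';                           % field name chosen at run time
s(2).(f)                               % dynamic field access; ans = 2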

    2.2.4 Functions

When creating a MATLAB function, the name of the file should match the name of the first function in the file. Valid function names begin with an alphabetic character, and can contain letters, numbers, or underscores.
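For example, a file named addtwo.m (a hypothetical name chosen for illustration) whose first function carries the same name:

function y = addtwo(a, b)
% ADDTWO  Return the sum of the two inputs (illustrative example).
y = a + b;
end

Calling addtwo(2, 3) from the Command Window then returns 5.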

    2.2.5 Function handles

MATLAB supports elements of lambda calculus by introducing function handles,[14] or function references, which are implemented either in .m files or anonymous[15]/nested functions.[16]
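A minimal sketch (the variable names are illustrative; integral is available in newer MATLAB releases, with quad serving a similar role in older ones):

sq = @(x) x.^2;        % anonymous function stored in a function handle
sq(4)                  % ans = 16
h = @sin;              % handle referencing an existing named function
h(pi/2)                % ans = 1
integral(sq, 0, 1)     % handles can be passed to other functions; ans ~ 1/3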

    2.2.6 Classes and Object-Oriented Programming

MATLAB's support for object-oriented programming includes classes, inheritance, virtual dispatch, packages, pass-by-value semantics, and pass-by-reference semantics.[17] However, the syntax and calling conventions are significantly different from other languages. MATLAB has value classes and reference classes, depending on whether the class has handle as a super-class (for reference classes) or not (for value classes).[18]

Method call behavior is different between value and reference classes. For example, a call to a method

object.method();

can alter any member of object only if object is an instance of a reference class. An example of a simple class is provided below.

classdef hello
    methods
        function greet(this)
            disp('Hello!')
        end
    end
end

When put into a file named hello.m, this can be executed with the following commands:

>> x = hello;
>> x.greet();
Hello!

    2.3 Graphics and graphical user interface programming

MATLAB supports developing applications with graphical user interface features. MATLAB includes GUIDE[19] (GUI development environment) for graphically designing GUIs.[20] It also has tightly integrated graph-plotting features. For example, the function plot can be used to produce a graph from two vectors x and y. The code:

x = 0:pi/100:2*pi;
y = sin(x);
plot(x,y)

produces a figure of the sine function.


A MATLAB program can produce three-dimensional graphics using the functions surf, plot3 or mesh. In MATLAB, graphical user interfaces can be programmed with the GUI design environment (GUIDE) tool.[21]
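For instance, a minimal sketch of a 3-D surface plot (the surface chosen here is arbitrary):

[x, y] = meshgrid(-pi:pi/20:pi);   % grid of sample points in the plane
z = sin(x) .* cos(y);              % surface height at each grid point
surf(x, y, z)                      % render the shaded 3-D surface
xlabel('x'); ylabel('y'); zlabel('z')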

2.4 Interfacing with other languages

MATLAB can call functions and subroutines written in the C programming language or Fortran.[22] A wrapper function is created allowing MATLAB data types to be passed and returned. The dynamically loadable object files created by compiling such functions are termed "MEX-files" (for MATLAB executable).[23][24] Since 2014, increasing two-way interfacing with Python has been added.[25][26]

Libraries written in Perl, Java, ActiveX or .NET can be directly called from MATLAB,[27][28] and many MATLAB libraries (for example XML or SQL support) are implemented as wrappers around Java or ActiveX libraries. Calling MATLAB from Java is more complicated, but can be done with a MATLAB toolbox[29] which is sold separately by MathWorks, or using an undocumented mechanism called JMI (Java-to-MATLAB Interface)[30][31] (which should not be confused with the unrelated Java Metadata Interface that is also called JMI). As alternatives to the MuPAD-based Symbolic Math Toolbox available from MathWorks, MATLAB can be connected to Maple or Mathematica.[32][33]

    Libraries also exist to import and export MathML.[34]

2.5 License

MATLAB is a proprietary product of MathWorks, so users are subject to vendor lock-in.[3][35] Although MATLAB Builder products can deploy MATLAB functions as library files which can be used with .NET[36] or Java[37] application-building environments, future development will still be tied to the MATLAB language.

Each toolbox is purchased separately. If an evaluation license is requested, the MathWorks sales department requires detailed information about the project for which MATLAB is to be evaluated. If granted (which it often is), the evaluation license is valid for two to four weeks. A student version of MATLAB is available, as is a home-use license for MATLAB, Simulink, and a subset of MathWorks toolboxes at substantially reduced prices.

It has been reported that EU competition regulators are investigating whether MathWorks refused to sell licenses to a competitor.[38]

2.6 Alternatives

See also: List of numerical analysis software and Comparison of numerical analysis software


MATLAB has a number of competitors.[39] Commercial competitors include Mathematica, TK Solver, Maple, and IDL. There are also free open-source alternatives to MATLAB, in particular GNU Octave, Scilab, FreeMat, Julia, and Sage, which are intended to be mostly compatible with the MATLAB language. Among other languages that treat arrays as basic entities (array programming languages) are APL, Fortran 90 and higher, S-Lang, as well as the statistical languages R and S. There are also libraries to add similar functionality to existing languages, such as IT++ for C++, Perl Data Language for Perl, ILNumerics for .NET, NumPy/SciPy for Python, and Numeric.js for JavaScript.

GNU Octave stands apart from the other alternatives because it treats incompatibility with MATLAB as a bug (see MATLAB Compatibility of GNU Octave); it therefore attempts to provide a software clone of MATLAB.

2.7 Release history

The number (or Release number) is the version reported by the Concurrent License Manager program FLEXlm. For a complete list of changes of both MATLAB and official toolboxes, consult the MATLAB release notes.[72]

    2.8 File extensions

    2.8.1 MATLAB

.fig MATLAB figure

.m MATLAB code (function, script, or class)

.mat MATLAB data (binary file for storing variables)

.mex... (.mexw32, .mexw64, .mexglx, ...) MATLAB executable MEX-files[73] (platform specific, e.g. ".mexmac" for the Mac, ".mexglx" for Linux, etc.[74])

.p MATLAB content-obscured .m file (P-code[75])

.mlappinstall MATLAB packaged App Installer[76]

.mlpkginstall support package installer (add-on for third-party hardware)[77]

.mltbx packaged custom toolbox[78]

.prj project file used by various solutions (packaged app/toolbox projects, MATLAB Compiler/Coder projects, Simulink projects)

.rpt report setup file created by MATLAB Report Generator[79]

    2.8.2 Simulink

    .mdl Simulink Model

    .mdlp Simulink Protected Model

    .slx Simulink Model (SLX format)

    .slxp Simulink Protected Model (SLX format)

    2.8.3 Simscape

    .ssc Simscape[80] Model


    2.8.4 MuPAD

    .mn MuPAD Notebook

    .mu MuPAD Code

    .xvc, .xvz MuPAD Graphics

    2.8.5 Third-party

    .jkt GPU Cache le generated by Jacket for MATLAB (AccelerEyes)

    .mum MATLAB CAPE-OPEN Unit Operation Model File (AmsterCHEM)

    2.9 Easter eggs

    Screen capture of two easter eggs in MATLAB 3.5.

Several easter eggs exist in MATLAB.[81] These include hidden pictures[82] and jokes. For example, typing in "spy" used to generate a picture of the spies from Spy vs Spy; spy was changed to an image of a dog in recent releases (R2011b). Typing in "why" randomly outputs a philosophical answer. Other commands include "penny", "toilet", "image", and "life". Not every easter egg appears in every version of MATLAB.

2.10 See also

• List of numerical analysis software
• Comparison of numerical analysis software


2.11 Notes

[1] The L-Shaped Membrane. MathWorks. 2003. Retrieved 7 February 2014.

    [2] System Requirements and Platform Availability. MathWorks. Retrieved 2013-08-14.

    [3] Richard Goering, "Matlab edges closer to electronic design automation world, EE Times, 10/04/2004

    [4] Cleve Moler (December 2004). The Origins of MATLAB. Retrieved April 15, 2007.

    [5] MATLAB Programming Language. Altius Directory. Retrieved 2010-12-17.

[6] Moler, Cleve (January 2000). MATLAB Incorporates LAPACK. Cleve's Corner. MathWorks. Retrieved December 20, 2008.

    [7] MATLAB Documentation. MathWorks. Retrieved 2013-08-14.

    [8] Comparing MATLAB with Other OO Languages. MATLAB. MathWorks. Retrieved 14 August 2013.

    [9] Create Symbolic Variables and Expressions. Symbolic Math Toolbox. MathWorks. Retrieved 14 August 2013.

    [10] Matrix Indexing. MathWorks. Retrieved 14 August 2013.

    [11] Structures. MathWorks. Retrieved 14 August 2013.

    [12] Generate Field Names from Variables. MathWorks. Retrieved 14 August 2013.

[13] Considering Performance in Object-Oriented MATLAB Code, Loren Shure, MATLAB Central, March 26, 2012: "function calls on structs, cells, and function handles will not benefit from JIT optimization of the function call and can be many times slower than function calls on purely numeric arguments"

    [14] Function Handles. MathWorks. Retrieved 14 August 2013.

    [15] Anonymous Functions. MathWorks. Retrieved 14 August 2013.

    [16] Nested Functions. MathWorks.

    [17] Object-Oriented Programming. MathWorks. Retrieved 2013-08-14.

    [18] Comparing Handle and Value Classes. MathWorks.

    [19] Create a Simple GUIDE GUI. MathWorks. Retrieved 14 August 2014.

    [20] MATLAB GUI. MathWorks. 2011-04-30. Retrieved 2013-08-14.

    [21] Smith, S. T. (2006). Matlab: Advanced GUI Development. Dog Ear Publishing. ISBN 978-1-59858-181-2.

    [22] Application Programming Interfaces to MATLAB. MathWorks. Retrieved 14 August 2013.

    [23] Create MEX-Files. MathWorks. Retrieved 14 August 2013.

[24] Spielman, Dan (2004-02-10). Connecting C and Matlab. Yale University, Computer Science Department. Retrieved 2008-05-20.

    [25] MATLAB Engine for Python. MathWorks. Retrieved 13 June 2015.

    [26] Call Python Libraries. MathWorks. Retrieved 13 June 2015.

    [27] External Programming Language Interfaces. MathWorks. Retrieved 14 August 2013.

    [28] Call Perl script using appropriate operating system executable. MathWorks. Retrieved 7 November 2013.

    [29] MATLAB Builder JA. MathWorks. Retrieved 2010-06-07.

    [30] Altman, Yair (2010-04-14). Java-to-Matlab Interface. Undocumented Matlab. Retrieved 2010-06-07.

    [31] Kaplan, Joshua. matlabcontrol JMI.

[32] Germundsson, Roger (1998-09-30). MaMa: Calling MATLAB from Mathematica with MathLink. Wolfram Research. Wolfram Library Archive.

    [33] rsmenon; szhorvat (2013). MATLink: Communicate with MATLAB from Mathematica. Retrieved 14 August 2013.


    [34] Weitzel, Michael (2006-09-01). MathML import/export. MathWorks - File Exchange. Retrieved 2013-08-14.

[35] Stafford, Jan (21 May 2003). The Wrong Choice: Locked in by license restrictions. SearchOpenSource.com. Retrieved 14 August 2013.

    [36] MATLAB Builder NE. MathWorks. Retrieved 14 August 2013.

    [37] MATLAB Builder JA. MathWorks. Retrieved 14 August 2013.

    [38] MathWorks Software Licenses Probed by EU Antitrust Regulators. Bloomberg news. 2012-03-01.

    [39] Steinhaus, Stefan (February 24, 2008). Comparison of mathematical programs for data analysis.

[40] Moler, Cleve (January 2006). The Growth of MATLAB and The MathWorks over Two Decades. News & Notes Newsletter. MathWorks. Retrieved 14 August 2013.

    [41] Memory Mapping. MathWorks. Retrieved 22 January 2014.

    [42] MATLAB bsxfun. MathWorks. Retrieved 22 January 2014.

[43] "Do MATLAB versions prior to R2007a run under Windows Vista?". MathWorks. 2010-09-03. Retrieved 2011-02-08.

    [44] OOP Compatibility with Previous Versions. MathWorks. Retrieved March 11, 2013.

    [45] Packages Create Namespaces. MathWorks. Retrieved 22 January 2014.

    [46] Map Containers. MathWorks. Retrieved 22 January 2014.

    [47] Creating and Controlling a Random Number Stream. MathWorks. Retrieved 22 January 2014.

    [48] New MATLAB External Interfacing Features in R2009a. MathWorks. Retrieved 22 January 2014.

    [49] Ignore Function Outputs. MathWorks. Retrieved 22 January 2014.

    [50] Ignore Function Inputs. MathWorks. Retrieved 22 January 2014.

    [51] Working with Enumerations. MathWorks. Retrieved 22 January 2014.

[52] What's New in Release 2010b. MathWorks. Retrieved 22 January 2014.

[53] New RNG Function for Controlling Random Number Generation in Release 2011a. MathWorks. Retrieved 22 January 2014.

    [54] MATLAB rng. MathWorks. Retrieved 22 January 2014.

    [55] Replace Discouraged Syntaxes of rand and randn. MathWorks. Retrieved 22 January 2014.

[56] MATLAB matfile. MathWorks. Retrieved 22 January 2014.

    [57] MATLAB max workers. Retrieved 22 January 2014.

[58] Loren Shure (September 2012). The MATLAB R2012b Desktop – Part 1: Introduction to the Toolstrip.

    [59] MATLAB Apps. MathWorks. Retrieved August 14, 2013.

    [60] MATLAB Unit Testing Framework. MathWorks. Retrieved August 14, 2013.

    [61] MathWorks Announces Release 2013b of the MATLAB and Simulink Product Families. MathWorks. September 2013.

    [62] MATLAB Tables. MathWorks. Retrieved 14 September 2013.

[63] MathWorks Announces Release 2014a of the MATLAB and Simulink Product Families. MathWorks. Retrieved 11 March 2014.

    [64] Graphics Changes in R2014b. MathWorks. Retrieved 3 October 2014.

    [65] uitab: Create tabbed panel. MathWorks. Retrieved 3 October 2014.

    [66] Create and Share Toolboxes. MathWorks. Retrieved 3 October 2014.

    [67] Dates and Time. MathWorks. Retrieved 3 October 2014.

    [68] Source Control Integration. MathWorks. Retrieved 3 October 2014.


    [69] MATLAB MapReduce and Hadoop. MathWorks. Retrieved 3 October 2014.

    [70] Call Python Libraries. MathWorks. Retrieved 3 October 2014.

    [71] MATLAB Engine for Python. MathWorks. Retrieved 3 October 2014.

    [72] MATLAB Release Notes. MathWorks. Retrieved 25 January 2014.

    [73] Introducing MEX-Files. MathWorks. Retrieved 14 August 2013.

    [74] Binary MEX-File Extensions. MathWorks. Retrieved 14 August 2013.

    [75] Protect Your Source Code. MathWorks. Retrieved 14 August 2013.

    [76] MATLAB App Installer File. MathWorks. Retrieved 14 August 2013.

    [77] Support Package Installation. MathWorks. Retrieved 3 October 2014.

    [78] Manage Toolboxes. MathWorks. Retrieved 3 October 2014.

    [79] MATLAB Report Generator. MathWorks. Retrieved 3 October 2014.

    [80] Simscape. MathWorks. Retrieved 14 August 2013.

[81] "What MATLAB Easter eggs do you know?". MathWorks – MATLAB Answers. 2011-02-25. Retrieved 2013-08-14.

    [82] Eddins, Steve (2006-10-17). The Story Behind the MATLAB Default Image. Retrieved 14 August 2013.

2.12 References

• Gilat, Amos (2004). MATLAB: An Introduction with Applications, 2nd Edition. John Wiley & Sons. ISBN 978-0-471-69420-5.
• Quarteroni, Alfio; Saleri, Fausto (2006). Scientific Computing with MATLAB and Octave. Springer. ISBN 978-3-540-32612-0.
• Ferreira, A.J.M. (2009). MATLAB Codes for Finite Element Analysis. Springer. ISBN 978-1-4020-9199-5.
• Lynch, Stephen (2004). Dynamical Systems with Applications using MATLAB. Birkhäuser. ISBN 978-0-8176-4321-8.

2.13 External links

• Official website
• MATLAB Central File Exchange – library of over 20,000 user-contributed MATLAB files and toolboxes, mostly distributed under the BSD License
• MATLAB at DMOZ
• MATLAB Central Newsreader – a web-based newsgroup reader hosted by MathWorks for comp.soft-sys.matlab
• LiteratePrograms (MATLAB)
• MATLAB Central Blogs
• Physical Modeling in MATLAB by Allen B. Downey, Green Tea Press, PDF, ISBN 978-0-615-18550-7. An introduction to MATLAB.
• Writing Fast MATLAB Code by Pascal Getreuer
• Calling MATLAB from Java: MatlabControl JMI Wrapper, The MatlabJava Server, MatlabControl
• International Online Workshop on MATLAB and Simulink by WorldServe Education
• MATLAB tag on Stack Overflow
• MATLAB Answers – a collaborative environment for finding the best answers to questions about MATLAB, Simulink, and related products
• Cody – a MATLAB Central game that challenges and expands your knowledge of MATLAB
• MATLAB Online Programming Contest
• Trendy – a MATLAB-based web service for tracking and plotting trends
• Undocumented Matlab – a blog on undocumented/non-official aspects of MATLAB
• Hazewinkel, Michiel, ed. (2001), "Linear algebra software packages", Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4

Chapter 3

    Matrix addition

In mathematics, matrix addition is the operation of adding two matrices by adding the corresponding entries together. However, there are other operations which could also be considered as a kind of addition for matrices: the direct sum and the Kronecker sum.

3.1 Entrywise sum

Two matrices must have an equal number of rows and columns to be added.[1] The sum of two matrices A and B will be a matrix which has the same number of rows and columns as do A and B. The sum of A and B, denoted A + B, is computed by adding corresponding elements of A and B:[2][3]

$$A + B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ b_{21} & b_{22} & \cdots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{m1} & b_{m2} & \cdots & b_{mn} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{bmatrix}$$

For example:

$$\begin{bmatrix} 1 & 3 \\ 1 & 0 \\ 1 & 2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 7 & 5 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 1+0 & 3+0 \\ 1+7 & 0+5 \\ 1+2 & 2+1 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 8 & 5 \\ 3 & 3 \end{bmatrix}$$

We can also subtract one matrix from another, as long as they have the same dimensions. A − B is computed by subtracting corresponding elements of A and B, and has the same dimensions as A and B. For example:

$$\begin{bmatrix} 1 & 3 \\ 1 & 0 \\ 1 & 2 \end{bmatrix} - \begin{bmatrix} 0 & 0 \\ 7 & 5 \\ 2 & 1 \end{bmatrix} = \begin{bmatrix} 1-0 & 3-0 \\ 1-7 & 0-5 \\ 1-2 & 2-1 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ -6 & -5 \\ -1 & 1 \end{bmatrix}$$
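In MATLAB (Chapter 2), the entrywise sum and difference are simply the built-in + and - operators; a minimal sketch reproducing the example above:

A = [1 3; 1 0; 1 2];
B = [0 0; 7 5; 2 1];
A + B     % entrywise sum:        [1 3; 8 5; 3 3]
A - B     % entrywise difference: [1 3; -6 -5; -1 1]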

3.2 Direct sum

Another operation, which is used less often, is the direct sum (denoted by ⊕). Note the Kronecker sum is also denoted ⊕; the context should make the usage clear. The direct sum of any pair of matrices A of size m × n and B of size p × q is a matrix of size (m + p) × (n + q) defined as:[4][2]

$$A \oplus B = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} & 0 & \cdots & 0 \\ 0 & \cdots & 0 & b_{11} & \cdots & b_{1q} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & b_{p1} & \cdots & b_{pq} \end{bmatrix}$$

For instance,

$$\begin{bmatrix} 1 & 3 & 2 \\ 2 & 3 & 1 \end{bmatrix} \oplus \begin{bmatrix} 1 & 6 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 2 & 0 & 0 \\ 2 & 3 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 6 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

The direct sum of matrices is a special type of block matrix; in particular, the direct sum of square matrices is a block diagonal matrix. The adjacency matrix of the union of disjoint graphs or multigraphs is the direct sum of their adjacency matrices. Any element in the direct sum of two vector spaces of matrices can be represented as a direct sum of two matrices.

In general, the direct sum of n matrices is:[2]

$$\bigoplus_{i=1}^{n} A_i = \operatorname{diag}(A_1, A_2, A_3, \dots, A_n) = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_n \end{bmatrix}$$

where the zeros are actually blocks of zeros, i.e. zero matrices.
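In MATLAB (Chapter 2), the built-in blkdiag function computes exactly this block-diagonal direct sum; a minimal sketch reproducing the example above:

A = [1 3 2; 2 3 1];
B = [1 6; 0 1];
blkdiag(A, B)    % 4-by-5 block-diagonal matrix, the direct sum of A and B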

3.3 Kronecker sum

Main article: Kronecker sum

The Kronecker sum is different from the direct sum, but is also denoted by ⊕. It is defined using the Kronecker product ⊗ and normal matrix addition. If A is n-by-n, B is m-by-m and $I_k$ denotes the k-by-k identity matrix then the Kronecker sum is defined by:

$$A \oplus B = A \otimes I_m + I_n \otimes B.$$
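A sketch of this definition in MATLAB terms, built from the built-in kron and eye functions (the test matrices A and B are arbitrary):

A = [1 2; 3 4];                          % n-by-n with n = 2
B = [0 1 0; 0 0 1; 1 0 0];               % m-by-m with m = 3
K = kron(A, eye(3)) + kron(eye(2), B);   % 6-by-6 Kronecker sum of A and B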

3.4 See also

• Matrix multiplication
• Vector addition

3.5 Notes

[1] Elementary Linear Algebra by Rorres Anton 10e, p. 53.

[2] Lipschutz & Lipson.


    [3] Riley, K.F.; Hobson, M.P.; Bence, S.J. (2010). Mathematical methods for physics and engineering. Cambridge UniversityPress. ISBN 978-0-521-86153-3.

    [4] Weisstein, Eric W., Matrix Direct Sum, MathWorld.

3.6 References

• Lipschutz, S.; Lipson, M. (2009). Linear Algebra. Schaum's Outline Series. ISBN 978-0-07-154352-1.

3.7 External links

• Direct sum of matrices at PlanetMath.org
• Abstract nonsense: Direct Sum of Linear Transformations and Direct Sum of Matrices
• Mathematics Source Library: Arithmetic Matrix Operations
• Matrix Algebra and R

Chapter 4

    Matrix analysis

In mathematics, particularly in linear algebra and applications, matrix analysis is the study of matrices and their algebraic properties.[1] Some particular topics out of many include: operations defined on matrices (such as matrix addition, matrix multiplication and operations derived from these), functions of matrices (such as matrix exponentiation and matrix logarithm, and even sines and cosines etc. of matrices),[2] and the eigenvalues of matrices (eigendecomposition of a matrix, eigenvalue perturbation theory).

4.1 Matrix spaces

The set of all m × n matrices over a number field F, denoted in this article $M_{mn}(F)$, forms a vector space. Examples of F include the set of rational numbers $\mathbb{Q}$, the real numbers $\mathbb{R}$, and the set of complex numbers $\mathbb{C}$. The spaces $M_{mn}(F)$ and $M_{pq}(F)$ are different spaces if m and p are unequal or if n and q are unequal; for instance $M_{32}(F) \neq M_{23}(F)$. Two m × n matrices A and B in $M_{mn}(F)$ can be added together to form another matrix in the space $M_{mn}(F)$:

$$A, B \in M_{mn}(F) \;\Rightarrow\; A + B \in M_{mn}(F)$$

and multiplied by a number α in F, to obtain another matrix in $M_{mn}(F)$:

$$\alpha \in F \;\Rightarrow\; \alpha A \in M_{mn}(F)$$

Combining these two properties, a linear combination of matrices A and B in $M_{mn}(F)$ is another matrix in $M_{mn}(F)$:

$$\alpha A + \beta B \in M_{mn}(F)$$

where α and β are numbers in F.

Any matrix can be expressed as a linear combination of basis matrices, which play the role of the basis vectors for the matrix space. For example, for the set of 2 × 2 matrices over the field of real numbers, $M_{22}(\mathbb{R})$, one legitimate basis set of matrices is:

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix},$$

because any 2 × 2 matrix can be expressed as:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = a\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + d\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix},$$

where a, b, c, d are all real numbers. This idea applies to other fields and matrices of higher dimensions.



4.2 Determinants

Main article: Determinant

The determinant of a square matrix is an important property. The determinant indicates if a matrix is invertible (i.e. the inverse of a matrix exists when the determinant is nonzero). Determinants are used for finding eigenvalues of matrices (see below), and for solving a system of linear equations (see Cramer's rule).
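As a quick numerical illustration, a minimal MATLAB sketch (cf. Chapter 2; the matrix is arbitrary):

A = [2 1; 1 3];
det(A)     % 5; nonzero, so A is invertible
inv(A)     % the inverse exists because the determinant is nonzero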

4.3 Eigenvalues and eigenvectors of matrices

Main article: Eigenvalues and eigenvectors

4.3.1 Definitions

An n × n matrix A has eigenvectors x and eigenvalues λ defined by the relation:

$$A\mathbf{x} = \lambda \mathbf{x}$$

In words, multiplying the matrix A by an eigenvector x (here an n-dimensional column matrix) gives the same result as multiplying the eigenvector by the eigenvalue λ. For an n × n matrix, there are n eigenvalues. The eigenvalues are the roots of the characteristic polynomial:

$$p_A(\lambda) = \det(A - \lambda I) = 0$$

where I is the n × n identity matrix.

Roots of polynomials, in this context the eigenvalues, can all be different, or some may be equal (in which case an eigenvalue has multiplicity, the number of times it occurs). After solving for the eigenvalues, the eigenvectors corresponding to the eigenvalues can be found from the defining equation.
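A minimal MATLAB sketch of this definition (cf. Chapter 2; the matrix is an arbitrary example):

A = [2 0; 1 3];
[V, D] = eig(A);   % columns of V are eigenvectors, diag(D) the eigenvalues
diag(D)            % eigenvalues 2 and 3, the roots of det(A - lambda*I)
norm(A*V - V*D)    % ~0, confirming A*x = lambda*x column by column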

4.3.2 Perturbations of eigenvalues

Main article: Eigenvalue perturbation

4.4 Matrix similarity

Main articles: Matrix similarity and Change of basis

Two n × n matrices A and B are similar if they are related by a similarity transformation:

$$B = PAP^{-1}$$

    The matrix P is called a similarity matrix, and is necessarily invertible.

4.4.1 Unitary similarity

Main article: Unitary matrix


4.5 Canonical forms

For other uses, see Canonical form.

4.5.1 Row echelon form

Main article: Row echelon form

4.5.2 Jordan normal form

Main article: Jordan normal form

4.5.3 Weyr canonical form

Main article: Weyr canonical form

4.5.4 Frobenius normal form

Main article: Frobenius normal form

    4.6 Triangular factorization

4.6.1 LU decomposition

Main article: LU decomposition

LU decomposition splits a matrix into a matrix product of an upper triangular matrix and a lower triangular matrix.
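A minimal MATLAB sketch (cf. Chapter 2; note that the built-in lu routine also returns a row-permutation matrix P for numerical stability):

A = [4 3; 6 3];
[L, U, P] = lu(A);   % P*A = L*U, L lower triangular, U upper triangular
norm(P*A - L*U)      % ~0, confirming the factorization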

4.7 Matrix norms

Main article: Matrix norm

Since matrices form vector spaces, one can form axioms (analogous to those of vectors) to define a "size" of a particular matrix. The norm of a matrix is a positive real number.

4.7.1 Definition and axioms

For all matrices A and B in $M_{mn}(F)$, and all numbers α in F, a matrix norm, delimited by double vertical bars ‖ ... ‖, fulfills:[note 1]

• Nonnegative:

$$\|A\| \ge 0$$

with equality only for A = 0, the zero matrix.

• Scalar multiplication:

$$\|\alpha A\| = |\alpha| \, \|A\|$$

• The triangle inequality:

$$\|A + B\| \le \|A\| + \|B\|$$

4.7.2 Frobenius norm

The Frobenius norm is analogous to the dot product of Euclidean vectors; multiply matrix elements entry-wise, add up the results, then take the positive square root:

$$\|A\| = \sqrt{A : A} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} (A_{ij})^2}$$

It is defined for matrices of any dimension (i.e. no restriction to square matrices).
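A minimal MATLAB sketch comparing the built-in norm with the definition (cf. Chapter 2; the matrix is arbitrary):

A = [1 2; 3 4];
norm(A, 'fro')          % built-in Frobenius norm
sqrt(sum(sum(A.^2)))    % the same value, computed directly from the definition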

4.8 Positive definite and semidefinite matrices

Main article: Positive definite matrix

4.9 Functions

Main article: Function (mathematics)

Matrix elements are not restricted to constant numbers; they can be mathematical variables.

4.9.1 Functions of matrices

A function of a matrix takes in a matrix and returns something else (a number, vector, matrix, etc.).

    4.9.2 Matrix-valued functionsA matrix valued function takes in something (a number, vector, matrix, etc...) and returns a matrix.

    4.10 See also

    4.10.1 Other branches of analysis

    Mathematical analysis
    Tensor analysis
    Matrix calculus
    Numerical analysis

    4.10.2 Other concepts of linear algebra

    Tensor product
    Spectrum of an operator
    Matrix geometrical series

    4.10.3 Types of matrix

    Orthogonal matrix, unitary matrix
    Symmetric matrix, antisymmetric matrix
    Stochastic matrix

    4.10.4 Matrix functions

    Matrix polynomial
    Matrix exponential

    4.11 Footnotes

    [1] Some authors, e.g. Horn and Johnson, use triple vertical bars instead of double: |||A|||.

    4.12 References

    4.12.1 Notes

    [1] R. A. Horn, C. R. Johnson (2012). Matrix Analysis (2nd ed.). Cambridge University Press. ISBN 052-183-940-8.

    [2] N. J. Higham (2000). Functions of Matrices: Theory and Computation. SIAM. ISBN 089-871-777-9.

    4.12.2 Further reading

    C. Meyer (2000). Matrix Analysis and Applied Linear Algebra Book and Solutions Manual. Matrix Analysis and Applied Linear Algebra 2. SIAM. ISBN 089-871-454-0.

    T. S. Shores (2007). Applied Linear Algebra and Matrix Analysis. Undergraduate Texts in Mathematics. Springer. ISBN 038-733-195-6.

    Rajendra Bhatia (1997). Matrix Analysis. Graduate Texts in Mathematics 169. Springer. ISBN 038-794-846-5.

    Alan J. Laub (2012). Computational Matrix Analysis. SIAM. ISBN 161-197-221-3.

    Chapter 5

    Matrix calculus

    In mathematics, matrix calculus is a specialized notation for doing multivariable calculus, especially over spaces of matrices. It collects the various partial derivatives of a single function with respect to many variables, and/or of a multivariate function with respect to a single variable, into vectors and matrices that can be treated as single entities. This greatly simplifies operations such as finding the maximum or minimum of a multivariate function and solving systems of differential equations. The notation used here is commonly used in statistics and engineering, while the tensor index notation is preferred in physics.

    Two competing notational conventions split the field of matrix calculus into two separate groups. The two groups can be distinguished by whether they write the derivative of a scalar with respect to a vector as a column vector or a row vector. Both of these conventions are possible even when the common assumption is made that vectors should be treated as column vectors when combined with matrices (rather than row vectors). A single convention can be somewhat standard throughout a single field that commonly uses matrix calculus (e.g. econometrics, statistics, estimation theory and machine learning). However, even within a given field different authors can be found using competing conventions. Authors of both groups often write as though their specific convention were standard. Serious mistakes can result when combining results from different authors without carefully verifying that compatible notations are used. Therefore great care should be taken to ensure notational consistency. Definitions of these two conventions and comparisons between them are collected in the layout conventions section.

    5.1 Scope

    Matrix calculus refers to a number of different notations that use matrices and vectors to collect the derivative of each component of the dependent variable with respect to each component of the independent variable. In general, the independent variable can be a scalar, a vector, or a matrix, while the dependent variable can be any of these as well. Each different situation leads to a different set of rules, or a separate calculus, using the broader sense of the term. Matrix notation serves as a convenient way to collect the many derivatives in an organized way.

    As a first example, consider the gradient from vector calculus. For a scalar function of three independent variables, $f(x_1, x_2, x_3)$, the gradient is given by the vector equation

    $$\nabla f = \frac{\partial f}{\partial x_1} \hat{x}_1 + \frac{\partial f}{\partial x_2} \hat{x}_2 + \frac{\partial f}{\partial x_3} \hat{x}_3$$

    where $\hat{x}_i$ represents a unit vector in the $x_i$ direction for $1 \le i \le 3$. This type of generalized derivative can be seen as the derivative of a scalar, f, with respect to a vector, x, and its result can be easily collected in vector form:

    $$\nabla f = \frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \dfrac{\partial f}{\partial x_3} \end{bmatrix}.$$

    More complicated examples include the derivative of a scalar function with respect to a matrix, known as the gradient matrix, which collects the derivative with respect to each matrix element in the corresponding position in the resulting matrix. In that case the scalar must be a function of each of the independent variables in the matrix. As another example, if we have an n-vector of dependent variables, or functions, of m independent variables, we might consider the derivative of the dependent vector with respect to the independent vector. The result could be collected in an m×n matrix consisting of all of the possible derivative combinations. There are, of course, a total of nine possibilities using scalars, vectors, and matrices. Notice that as we consider higher numbers of components in each of the independent and dependent variables we can be left with a very large number of possibilities. The six kinds of derivatives that can be most neatly organized in matrix form are collected in the following table.[1]

    Here, we have used the term matrix in its most general sense, recognizing that vectors and scalars are simply matrices with one column and one row respectively. Moreover, we have used bold letters to indicate vectors and bold capital letters for matrices. This notation is used throughout. Notice that we could also talk about the derivative of a vector with respect to a matrix, or any of the other unfilled cells in our table. However, these derivatives are most naturally organized in a tensor of rank higher than 2, so that they do not fit neatly into a matrix. In the following three sections we will define each one of these derivatives and relate them to other branches of mathematics. See the layout conventions section for a more detailed table.

    5.1.1 Relation to other derivatives

    The matrix derivative is a convenient notation for keeping track of partial derivatives for doing calculations. The Fréchet derivative is the standard way in the setting of functional analysis to take derivatives with respect to vectors. In the case that a matrix function of a matrix is Fréchet differentiable, the two derivatives will agree up to translation of notations. As is the case in general for partial derivatives, some formulae may extend under weaker analytic conditions than the existence of the derivative as an approximating linear mapping.

    5.1.2 Usages

    Matrix calculus is used for deriving optimal stochastic estimators, often involving the use of Lagrange multipliers. This includes the derivation of:

    Kalman filter
    Wiener filter
    Expectation-maximization algorithm for Gaussian mixture

    5.2 Notation

    The vector and matrix derivatives presented in the sections to follow take full advantage of matrix notation, using a single variable to represent a large number of variables. In what follows we will distinguish scalars, vectors and matrices by their typeface. We will let M(n,m) denote the space of real n×m matrices with n rows and m columns. Such matrices will be denoted using bold capital letters: A, X, Y, etc. An element of M(n,1), that is, a column vector, is denoted with a boldface lowercase letter: a, x, y, etc. An element of M(1,1) is a scalar, denoted with lowercase italic typeface: a, t, x, etc. X^T denotes matrix transpose, tr(X) is the trace, and det(X) is the determinant. All functions are assumed to be of differentiability class $C^1$ unless otherwise noted. Generally, letters from the first half of the alphabet (a, b, c, ...) will be used to denote constants, and letters from the second half (t, x, y, ...) to denote variables.

    NOTE: As mentioned above, there are competing notations for laying out systems of partial derivatives in vectors and matrices, and no standard appears to be emerging yet. The next two introductory sections use the numerator layout convention simply for the purposes of convenience, to avoid overly complicating the discussion. The section after them discusses layout conventions in more detail. It is important to realize the following:

    1. Despite the use of the terms "numerator layout" and "denominator layout", there are actually more than two possible notational choices involved. The reason is that the choice of numerator vs. denominator (or in some situations, numerator vs. mixed) can be made independently for scalar-by-vector, vector-by-scalar, vector-by-vector, and scalar-by-matrix derivatives, and a number of authors mix and match their layout choices in various ways.


    2. The choice of numerator layout in the introductory sections below does not imply that this is the "correct" or "superior" choice. There are advantages and disadvantages to the various layout types. Serious mistakes can result from carelessly combining formulas written in different layouts, and converting from one layout to another requires care to avoid errors. As a result, when working with existing formulas the best policy is probably to identify whichever layout is used and maintain consistency with it, rather than attempting to use the same layout in all situations.

    5.2.1 Alternatives

    The tensor index notation with its Einstein summation convention is very similar to matrix calculus, except one writes only a single component at a time. It has the advantage that one can easily manipulate arbitrarily high-rank tensors, whereas tensors of rank higher than two are quite unwieldy with matrix notation. All of the work here can be done in this notation without use of the single-variable matrix notation. However, many problems in estimation theory and other areas of applied mathematics would result in too many indices to properly keep track of, pointing in favor of matrix calculus in those areas. Also, Einstein notation can be very useful in proving the identities presented here, as an alternative to typical element notation, which can become cumbersome when the explicit sums are carried around. Note that a matrix can be considered a tensor of rank two.

    5.3 Derivatives with vectors

    Main article: Vector calculus

    Because vectors are matrices with only one column, the simplest matrix derivatives are vector derivatives. The notations developed here can accommodate the usual operations of vector calculus by identifying the space M(n,1) of n-vectors with the Euclidean space $\mathbb{R}^n$, and the scalar M(1,1) is identified with $\mathbb{R}$. The corresponding concept from vector calculus is indicated at the end of each subsection.

    NOTE: The discussion in this section assumes the numerator layout convention for pedagogical purposes. Some authors use different conventions. The section on layout conventions discusses this issue in greater detail. The identities given further down are presented in forms that can be used in conjunction with all common layout conventions.

    5.3.1 Vector-by-scalar

    The derivative of a vector $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$, by a scalar x is written (in numerator layout notation) as

    $$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} \\ \dfrac{\partial y_2}{\partial x} \\ \vdots \\ \dfrac{\partial y_m}{\partial x} \end{bmatrix}.$$

    In vector calculus the derivative of a vector y with respect to a scalar x is known as the tangent vector of the vector y, $\frac{\partial \mathbf{y}}{\partial x}$. Notice here that $\mathbf{y} : \mathbb{R} \to \mathbb{R}^m$.

    Example: Simple examples of this include the velocity vector in Euclidean space, which is the tangent vector of the position vector (considered as a function of time). Also, the acceleration is the tangent vector of the velocity.
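    As a hedged illustration of the tangent-vector idea (Python with NumPy assumed; the curve position() is a hypothetical example, not from the text), the velocity below is the derivative of a position vector with respect to time, approximated by a central finite difference.

    import numpy as np

    def position(t):
        # A hypothetical curve in R^3: circular motion plus a linear drift
        return np.array([np.cos(t), np.sin(t), t])

    t, h = 1.0, 1e-6
    velocity = (position(t + h) - position(t - h)) / (2 * h)   # dy/dt, numerically

    exact = np.array([-np.sin(t), np.cos(t), 1.0])
    assert np.allclose(velocity, exact, atol=1e-6)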

  • 5.4. DERIVATIVES WITH MATRICES 27

    5.3.2 Scalar-by-vector

    The derivative of a scalar y by a vector $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, is written (in numerator layout notation) as

    $$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}.$$

    In vector calculus, the gradient of a scalar field y in the space $\mathbb{R}^n$ (whose independent coordinates are the components of x) is the derivative of a scalar by a vector. In physics, the electric field is the negative vector gradient of the electric potential.

    The directional derivative of a scalar function f(x) of the space vector x in the direction of the unit vector u is defined using the gradient as follows:

    $$\nabla_{\mathbf{u}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{u}$$

    Using the notation just defined for the derivative of a scalar with respect to a vector, we can re-write the directional derivative as $\nabla_{\mathbf{u}} f = \frac{\partial f}{\partial \mathbf{x}} \mathbf{u}$. This type of notation will be nice when proving product rules and chain rules that come out looking similar to what we are familiar with for the scalar derivative.
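    A hedged numeric check (Python with NumPy assumed; f and its gradient are a hypothetical test case): the identity $\nabla_{\mathbf{u}} f = \frac{\partial f}{\partial \mathbf{x}} \mathbf{u}$ agrees with a finite-difference quotient along the direction u.

    import numpy as np

    def f(x):
        return x[0]**2 + 3.0 * x[1] * x[2]

    def grad_f(x):
        # Hand-computed gradient of f, laid out as a vector
        return np.array([2.0 * x[0], 3.0 * x[2], 3.0 * x[1]])

    x = np.array([1.0, 2.0, -1.0])
    u = np.array([1.0, 1.0, 1.0]) / np.sqrt(3.0)      # unit direction vector

    h = 1e-6
    numeric = (f(x + h * u) - f(x - h * u)) / (2 * h)
    assert np.isclose(grad_f(x) @ u, numeric, atol=1e-6)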

    5.3.3 Vector-by-vector

    Each of the previous two cases can be considered as an application of the derivative of a vector with respect to a vector, using a vector of size one appropriately. Similarly we will find that the derivatives involving matrices will reduce to derivatives involving vectors in a corresponding way.

    The derivative of a vector function (a vector whose components are functions) $\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$, with respect to an input vector, $\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, is written (in numerator layout notation) as

    $$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \dfrac{\partial y_m}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

    In vector calculus, the derivative of a vector function y with respect to a vector x whose components represent a space is known as the pushforward (or differential), or the Jacobian matrix. The pushforward along a vector function f with respect to vector v in $\mathbb{R}^m$ is given by $d\,\mathbf{f}(\mathbf{v}) = \frac{\partial \mathbf{f}}{\partial \mathbf{x}} \mathbf{v}$.
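    A sketch of the Jacobian as the matrix of all first partial derivatives (Python with NumPy assumed; the map f from R^2 to R^3 is a hypothetical example), assembled column by column with finite differences in numerator layout (m×n):

    import numpy as np

    def f(x):
        return np.array([x[0]**2, x[0] * x[1], np.sin(x[1])])

    def jacobian_fd(f, x, h=1e-6):
        m, n = len(f(x)), len(x)
        J = np.zeros((m, n))
        for j in range(n):
            e = np.zeros(n)
            e[j] = h
            J[:, j] = (f(x + e) - f(x - e)) / (2 * h)   # column j holds d f / d x_j
        return J

    x = np.array([1.0, 2.0])
    J_exact = np.array([[2.0, 0.0],
                        [2.0, 1.0],
                        [0.0, np.cos(2.0)]])
    assert np.allclose(jacobian_fd(f, x), J_exact, atol=1e-6)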

    5.4 Derivatives with matrices

    There are two types of derivatives with matrices that can be organized into a matrix of the same size. These are the derivative of a matrix by a scalar and the derivative of a scalar by a matrix, respectively. These can be useful in minimization problems found in many areas of applied mathematics, and have adopted the names tangent matrix and gradient matrix, respectively, after their analogs for vectors.

    NOTE: The discussion in this section assumes the numerator layout convention for pedagogical purposes. Some authors use different conventions. The section on layout conventions discusses this issue in greater detail. The identities given further down are presented in forms that can be used in conjunction with all common layout conventions.


    5.4.1 Matrix-by-scalar

    The derivative of a matrix function Y by a scalar x is known as the tangent matrix and is given (in numerator layout notation) by

    $$\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_{11}}{\partial x} & \dfrac{\partial y_{12}}{\partial x} & \cdots & \dfrac{\partial y_{1n}}{\partial x} \\ \dfrac{\partial y_{21}}{\partial x} & \dfrac{\partial y_{22}}{\partial x} & \cdots & \dfrac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y_{m1}}{\partial x} & \dfrac{\partial y_{m2}}{\partial x} & \cdots & \dfrac{\partial y_{mn}}{\partial x} \end{bmatrix}.$$

    5.4.2 Scalar-by-matrix

    The derivative of a scalar function y of a p×q matrix X of independent variables, with respect to the matrix X, is given (in numerator layout notation) by

    $$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{21}} & \cdots & \dfrac{\partial y}{\partial x_{p1}} \\ \dfrac{\partial y}{\partial x_{12}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y}{\partial x_{1q}} & \dfrac{\partial y}{\partial x_{2q}} & \cdots & \dfrac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$

    Notice that the indexing of the gradient with respect to X is transposed as compared with the indexing of X. Important examples of scalar functions of matrices include the trace of a matrix and the determinant. In analogy with vector calculus this derivative is often written as the following:

    $$\nabla_{\mathbf{X}} y(\mathbf{X}) = \frac{\partial y(\mathbf{X})}{\partial \mathbf{X}}$$

    Also in analogy with vector calculus, the directional derivative of a scalar f(X) of a matrix X in the direction of a matrix Y is given by

    $$\nabla_{\mathbf{Y}} f = \operatorname{tr}\left( \frac{\partial f}{\partial \mathbf{X}} \mathbf{Y} \right).$$

    It is the gradient matrix, in particular, that finds many uses in minimization problems in estimation theory, particularly in the derivation of the Kalman filter algorithm, which is of great importance in the field.
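    A hedged numeric check (Python with NumPy assumed; A and X are arbitrary test matrices, not from the article) of a classic scalar-by-matrix result: for y = tr(AX), the array of entry-wise partials is A^T, so the numerator-layout derivative, which transposes the indexing, is A itself.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3))
    X = rng.standard_normal((3, 3))

    def y(X):
        return np.trace(A @ X)

    h = 1e-6
    G = np.zeros_like(X)                    # G[i, j] approximates d y / d X[i, j]
    for i in range(3):
        for j in range(3):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (y(X + E) - y(X - E)) / (2 * h)

    assert np.allclose(G, A.T, atol=1e-6)   # hence dy/dX = A in numerator layout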

    5.4.3 Other matrix derivatives

    The three types of derivatives that have not been considered are those involving vectors-by-matrices, matrices-by-vectors, and matrices-by-matrices. These are not as widely considered, and a notation is not widely agreed upon. As for vectors, the other two types of higher matrix derivatives can be seen as applications of the derivative of a matrix by a matrix by using a matrix with one column in the correct place. For this reason, in this subsection we consider only how one can write the derivative of a matrix by another matrix.

    The differential or the matrix derivative of a matrix function F(X) that maps from n×m matrices to p×q matrices, F : M(n,m) → M(p,q), is an element of M(p,q) ⊗ M(m,n), a fourth-rank tensor (the reversal of m and n here indicates the dual space of M(n,m)). In short, it is an m×n matrix each of whose entries is a p×q matrix:

    $$\frac{\partial \mathbf{F}}{\partial \mathbf{X}} = \begin{bmatrix} \dfrac{\partial \mathbf{F}}{\partial X_{1,1}} & \cdots & \dfrac{\partial \mathbf{F}}{\partial X_{n,1}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \mathbf{F}}{\partial X_{1,m}} & \cdots & \dfrac{\partial \mathbf{F}}{\partial X_{n,m}} \end{bmatrix},$$

    and note that each $\partial \mathbf{F} / \partial X_{ij}$ is a p×q matrix defined as above. Note also that this matrix has its indexing transposed: m rows and n columns. The pushforward along F of an n×m matrix Y in M(n,m) is then

    $$d\,\mathbf{F}(\mathbf{Y}) = \operatorname{tr}\left( \frac{\partial \mathbf{F}}{\partial \mathbf{X}} \mathbf{Y} \right).$$

    Note that this definition encompasses all of the preceding definitions as special cases. According to Jan R. Magnus and Heinz Neudecker, the following notations are both unsuitable, as the determinant of the second resulting matrix would have no interpretation and "a useful chain rule does not exist" if these notations are being used:[2]

    Given φ, a differentiable function of an n×q matrix X = (x_{i,j}),

    $$\frac{\partial \varphi(\mathbf{X})}{\partial \mathbf{X}} = \begin{bmatrix} \dfrac{\partial \varphi}{\partial x_{1,1}} & \cdots & \dfrac{\partial \varphi}{\partial x_{1,q}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \varphi}{\partial x_{n,1}} & \cdots & \dfrac{\partial \varphi}{\partial x_{n,q}} \end{bmatrix}$$

    Given F = (f_{s,t}), a differentiable m×p function of an n×q matrix X,

    $$\frac{\partial \mathbf{F}(\mathbf{X})}{\partial \mathbf{X}} = \begin{bmatrix} \dfrac{\partial f_{1,1}}{\partial \mathbf{X}} & \cdots & \dfrac{\partial f_{1,p}}{\partial \mathbf{X}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_{m,1}}{\partial \mathbf{X}} & \cdots & \dfrac{\partial f_{m,p}}{\partial \mathbf{X}} \end{bmatrix}$$

    The Jacobian matrix, according to Magnus and Neudecker,[2] is

    $$D\mathbf{F}(\mathbf{X}) = \frac{\partial\, \operatorname{vec} \mathbf{F}(\mathbf{X})}{\partial\, (\operatorname{vec} \mathbf{X})'}.$$

    5.5 Layout conventions

    This section discusses the similarities and differences between notational conventions that are used in the various fields that take advantage of matrix calculus. Although there are largely two consistent conventions, some authors find it convenient to mix the two conventions in forms that are discussed below. After this section, equations will be listed in both competing forms separately.

    The fundamental issue is that the derivative of a vector with respect to a vector, i.e. $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, is often written in two competing ways. If the numerator y is of size m and the denominator x of size n, then the result can be laid out as either an m×n matrix or an n×m matrix, i.e. the elements of y laid out in columns and the elements of x laid out in rows, or vice versa. This leads to the following possibilities:

    1. Numerator layout, i.e. lay out according to y and x^T (i.e. contrarily to x). This is sometimes known as the Jacobian formulation.

    2. Denominator layout, i.e. lay out according to y^T and x (i.e. contrarily to y). This is sometimes known as the Hessian formulation. Some authors term this layout the gradient, in distinction to the Jacobian (numerator layout), which is its transpose. (However, "gradient" more commonly means the derivative $\frac{\partial y}{\partial \mathbf{x}}$, regardless of layout.)

    3. A third possibility sometimes seen is to insist on writing the derivative as $\frac{\partial \mathbf{y}}{\partial \mathbf{x}'}$ (i.e. the derivative is taken with respect to the transpose of x) and follow the numerator layout. This makes it possible to claim that the matrix is laid out according to both numerator and denominator. In practice this produces results the same as the numerator layout.

    When handling the gradient $\frac{\partial y}{\partial \mathbf{x}}$ and the opposite case $\frac{\partial \mathbf{y}}{\partial x}$, we have the same issues. To be consistent, we should do one of the following:


    1. If we choose numerator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a row vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a column vector.

    2. If we choose denominator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a column vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a row vector.

    3. In the third possibility above, we write $\frac{\partial y}{\partial \mathbf{x}'}$ and $\frac{\partial \mathbf{y}}{\partial x}$, and use numerator layout.

    Not all math textbooks and papers are consistent in this respect throughout. That is, sometimes different conventions are used in different contexts within the same paper. For example, some choose denominator layout for gradients (laying them out as column vectors), but numerator layout for the vector-by-vector derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$. Similarly, when it comes to scalar-by-matrix derivatives $\frac{\partial y}{\partial \mathbf{X}}$ and matrix-by-scalar derivatives $\frac{\partial \mathbf{Y}}{\partial x}$, consistent numerator layout lays out according to Y and X^T, while consistent denominator layout lays out according to Y^T and X. In practice, however, following a denominator layout for $\frac{\partial \mathbf{Y}}{\partial x}$ and laying the result out according to Y^T is rarely seen, because it makes for ugly formulas that do not correspond to the scalar formulas. As a result, the following layouts can often be found:

    1. Consistent numerator layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to Y and $\frac{\partial y}{\partial \mathbf{X}}$ according to X^T.

    2. Mixed layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to Y and $\frac{\partial y}{\partial \mathbf{X}}$ according to X.

    3. Use the notation $\frac{\partial y}{\partial \mathbf{X}'}$, with results the same as consistent numerator layout.

    In the following formulas, we handle the five possible combinations $\frac{\partial y}{\partial x}$, $\frac{\partial \mathbf{y}}{\partial x}$, $\frac{\partial y}{\partial \mathbf{x}}$, $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ and $\frac{\partial y}{\partial \mathbf{X}}$ separately. We also handle cases of scalar-by-scalar derivatives that involve an intermediate vector or matrix. (This can arise, for example, if a multi-dimensional parametric curve is defined in terms of a scalar variable, and then a derivative of a scalar function of the curve is taken with respect to the scalar that parameterizes the curve.) For each of the various combinations, we give numerator-layout and denominator-layout results, except in the cases above where denominator layout rarely occurs. In cases involving matrices where it makes sense, we give numerator-layout and mixed-layout results. As noted above, cases where vector and matrix denominators are written in transpose notation are equivalent to numerator layout with the denominators written without the transpose.

    Keep in mind that various authors use different combinations of numerator and denominator layouts for different types of derivatives, and there is no guarantee that an author will consistently use either numerator or denominator layout for all types. Match up the formulas below with those quoted in the source to determine the layout used for that particular type of derivative, but be careful not to assume that derivatives of other types necessarily follow the same kind of layout.

    When taking derivatives with an aggregate (vector or matrix) denominator in order to find a maximum or minimum of the aggregate, it should be kept in mind that using numerator layout will produce results that are transposed with respect to the aggregate. For example, in attempting to find the maximum likelihood estimate of a multivariate normal distribution using matrix calculus, if the domain is a k×1 column vector, then the result using the numerator layout will be in the form of a 1×k row vector. Thus, either the results should be transposed at the end, or the denominator layout (or mixed layout) should be used.

    The results of operations will be transposed when switching between numerator-layout and denominator-layout notation.

    5.5.1 Numerator-layout notation

    Using numerator-layout notation, we have:[1]

    $$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} & \dfrac{\partial y}{\partial x_2} & \cdots & \dfrac{\partial y}{\partial x_n} \end{bmatrix}.$$

    $$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} \\ \dfrac{\partial y_2}{\partial x} \\ \vdots \\ \dfrac{\partial y_m}{\partial x} \end{bmatrix}.$$

    $$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \dfrac{\partial y_m}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

    $$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{21}} & \cdots & \dfrac{\partial y}{\partial x_{p1}} \\ \dfrac{\partial y}{\partial x_{12}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y}{\partial x_{1q}} & \dfrac{\partial y}{\partial x_{2q}} & \cdots & \dfrac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$

    The following definitions are only provided in numerator-layout notation:

    $$\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_{11}}{\partial x} & \dfrac{\partial y_{12}}{\partial x} & \cdots & \dfrac{\partial y_{1n}}{\partial x} \\ \dfrac{\partial y_{21}}{\partial x} & \dfrac{\partial y_{22}}{\partial x} & \cdots & \dfrac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y_{m1}}{\partial x} & \dfrac{\partial y_{m2}}{\partial x} & \cdots & \dfrac{\partial y_{mn}}{\partial x} \end{bmatrix}.$$

    $$d\mathbf{X} = \begin{bmatrix} dx_{11} & dx_{12} & \cdots & dx_{1n} \\ dx_{21} & dx_{22} & \cdots & dx_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ dx_{m1} & dx_{m2} & \cdots & dx_{mn} \end{bmatrix}.$$

    5.5.2 Denominator-layout notation

    Using denominator-layout notation, we have:[3]

    $$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\ \dfrac{\partial y}{\partial x_2} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix}.$$

    $$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x} & \dfrac{\partial y_2}{\partial x} & \cdots & \dfrac{\partial y_m}{\partial x} \end{bmatrix}.$$

    $$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\ \dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y_1}{\partial x_n} & \dfrac{\partial y_2}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

    $$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_{11}} & \dfrac{\partial y}{\partial x_{12}} & \cdots & \dfrac{\partial y}{\partial x_{1q}} \\ \dfrac{\partial y}{\partial x_{21}} & \dfrac{\partial y}{\partial x_{22}} & \cdots & \dfrac{\partial y}{\partial x_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial y}{\partial x_{p1}} & \dfrac{\partial y}{\partial x_{p2}} & \cdots & \dfrac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$


    5.6 Identities

    As noted above, in general, the results of operations will be transposed when switching between numerator-layout and denominator-layout notation. To help make sense of all the identities below, keep in mind the most important rules: the chain rule, product rule and sum rule. The sum rule applies universally, and the product rule applies in most of the cases below, provided that the order of matrix products is maintained, since matrix products are not commutative. The chain rule applies in some of the cases, but unfortunately does not apply in matrix-by-scalar derivatives or scalar-by-matrix derivatives (in the latter case, mostly involving the trace operator applied to matrices). In the latter case, the product rule can't quite be applied directly, either, but the equivalent can be done with a bit more work using the differential identities.

    5.6.1 Vector-by-vector identities

    This is presented first because all of the operations that apply to vector-by-vector differentiation apply directly to vector-by-scalar or scalar-by-vector differentiation simply by reducing the appropriate vector in the numerator or denominator to a scalar.

    5.6.2 Scalar-by-vector identities

    The fundamental identities are placed above the thick black line.

    5.6.3 Vector-by-scalar identities

    NOTE: The formulas involving the vector-by-vector derivatives $\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}$ and $\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}$ (whose outputs are matrices) assume the matrices are laid out consistent with the vector layout, i.e. numerator-layout matrix when numerator-layout vector and vice versa; otherwise, transpose the vector-by-vector derivatives.

    5.6.4 Scalar-by-matrix identities

    Note that exact equivalents of the scalar product rule and chain rule do not exist when applied to matrix-valued functions of matrices. However, the product rule of this sort does apply to the differential form (see below), and this is the way to derive many of the identities below involving the trace function, combined with the fact that the trace function allows transposing and cyclic permutation, i.e.:

    $$\operatorname{tr}(\mathbf{A}) = \operatorname{tr}(\mathbf{A}^\top)$$

    $$\operatorname{tr}(\mathbf{ABCD}) = \operatorname{tr}(\mathbf{BCDA}) = \operatorname{tr}(\mathbf{CDAB}) = \operatorname{tr}(\mathbf{DABC})$$

    For example, to compute $\frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C})}{\partial \mathbf{X}}$:

    $$\begin{aligned}
    d\operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C}) &= d\operatorname{tr}(\mathbf{C}\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top) = \operatorname{tr}(d(\mathbf{C}\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top)) \\
    &= \operatorname{tr}(\mathbf{C}\mathbf{A}\mathbf{X}\, d(\mathbf{B}\mathbf{X}^\top) + d(\mathbf{C}\mathbf{A}\mathbf{X})\, \mathbf{B}\mathbf{X}^\top) \\
    &= \operatorname{tr}(\mathbf{C}\mathbf{A}\mathbf{X}\, d(\mathbf{B}\mathbf{X}^\top)) + \operatorname{tr}(d(\mathbf{C}\mathbf{A}\mathbf{X})\, \mathbf{B}\mathbf{X}^\top) \\
    &= \operatorname{tr}(\mathbf{C}\mathbf{A}\mathbf{X}\mathbf{B}\, d(\mathbf{X}^\top)) + \operatorname{tr}(\mathbf{C}\mathbf{A}\,(d\mathbf{X})\, \mathbf{B}\mathbf{X}^\top) \\
    &= \operatorname{tr}(\mathbf{C}\mathbf{A}\mathbf{X}\mathbf{B}\,(d\mathbf{X})^\top) + \operatorname{tr}(\mathbf{C}\mathbf{A}\,(d\mathbf{X})\,\mathbf{B}\mathbf{X}^\top) \\
    &= \operatorname{tr}\left( (\mathbf{C}\mathbf{A}\mathbf{X}\mathbf{B}\,(d\mathbf{X})^\top)^\top \right) + \operatorname{tr}(\mathbf{C}\mathbf{A}\,(d\mathbf{X})\,\mathbf{B}\mathbf{X}^\top) \\
    &= \operatorname{tr}((d\mathbf{X})\,\mathbf{B}^\top\mathbf{X}^\top\mathbf{A}^\top\mathbf{C}^\top) + \operatorname{tr}(\mathbf{C}\mathbf{A}\,(d\mathbf{X})\,\mathbf{B}\mathbf{X}^\top) \\
    &= \operatorname{tr}(\mathbf{B}^\top\mathbf{X}^\top\mathbf{A}^\top\mathbf{C}^\top\,(d\mathbf{X})) + \operatorname{tr}(\mathbf{B}\mathbf{X}^\top\mathbf{C}\mathbf{A}\,(d\mathbf{X})) \\
    &= \operatorname{tr}\left( (\mathbf{B}^\top\mathbf{X}^\top\mathbf{A}^\top\mathbf{C}^\top + \mathbf{B}\mathbf{X}^\top\mathbf{C}\mathbf{A})\, d\mathbf{X} \right)
    \end{aligned}$$

    Therefore,

    $$\frac{\partial \operatorname{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C})}{\partial \mathbf{X}} = \mathbf{B}^\top\mathbf{X}^\top\mathbf{A}^\top\mathbf{C}^\top + \mathbf{B}\mathbf{X}^\top\mathbf{C}\mathbf{A}.$$

    (For the last step, see the 'Conversion from differential to derivative form' section.)
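    The result can be sanity-checked numerically; this hedged sketch (Python with NumPy assumed; A, B, C, X are arbitrary test matrices, not from the article) compares entry-wise finite-difference partials against the transpose of the numerator-layout derivative derived above.

    import numpy as np

    rng = np.random.default_rng(2)
    A, B, C, X = (rng.standard_normal((3, 3)) for _ in range(4))

    def y(X):
        return np.trace(A @ X @ B @ X.T @ C)

    h = 1e-6
    G = np.zeros_like(X)                    # G[i, j] approximates d y / d X[i, j]
    for i in range(3):
        for j in range(3):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (y(X + E) - y(X - E)) / (2 * h)

    numerator_layout = B.T @ X.T @ A.T @ C.T + B @ X.T @ C @ A
    assert np.allclose(G, numerator_layout.T, atol=1e-5)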

    5.6.5 Matrix-by-scalar identities

    See also Derivative of the exponential map.

    5.6.6 Scalar-by-scalar identities

    With vectors involved

    With matrices involved

    5.6.7 Identities in differential form

    It is often easier to work in differential form and then convert back to normal derivatives. This only works well using the numerator layout.

    To convert to normal derivative form, first convert the expression to one of the following canonical forms, and then use these identities:

    5.7 See also

    Derivative (generalizations)
    Product integral


    5.8 Notes

    [1] Minka, Thomas P. "Old and New Matrix Algebra Useful for Statistics." December 28, 2000.

    [2] Magnus, Jan R.; Neudecker, Heinz (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics. Wiley Series in Probability and Statistics (2nd ed.). Wiley. pp. 171–173.

    [3]

    [4] Here, 0 refers to a column vector of all 0s, of size n, where n is the length of x.

    [5] Here, 0 refers to a matrix of all 0s, of the same shape as X.

    [6] Petersen, Kaare Brandt and Michael Syskind Pedersen. The Matrix Cookbook. November 14, 2008. http://matrixcookbook.com. This book uses a mixed layout, i.e. by Y in $\frac{\partial \mathbf{Y}}{\partial x}$, by X in $\frac{\partial y}{\partial \mathbf{X}}$.

    [7] Duchi, John C. "Properties of the Trace and Matrix Derivatives" (PDF). University of California at Berkeley. Retrieved 19 July 2011.

    [8] See Determinant#Derivative for the derivation.

    [9] The constant a disappears in the result. This is intentional. In general,

    $$\frac{d \ln au}{dx} = \frac{1}{au} \frac{d(au)}{dx} = \frac{1}{au} \, a \frac{du}{dx} = \frac{1}{u} \frac{du}{dx} = \frac{d \ln u}{dx}.$$

    5.9 External links

    Linear Algebra: Determinants, Inverses, Rank (appendix D from the Introduction to Finite Element Methods book at the University of Colorado at Boulder). Uses the Hessian (transpose of the Jacobian) definition of vector and matrix derivatives.

    Matrix Reference Manual, Mike Brookes, Imperial College London.

    The Matrix Cookbook (2006), with a derivatives chapter. Uses the Hessian definition.

    The Matrix Cookbook (2012), an updated version of the Matrix Cookbook.

    Linear Algebra and its Applications (author information page; see Chapter 9 of book), Peter Lax, Courant Institute.

    Matrix Differentiation (and some other stuff), Randal J. Barnes, Department of Civil Engineering, University of Minnesota.

    Notes on Matrix Calculus, Paul L. Fackler, North Carolina State University.

    Matrix Differential Calculus (slide presentation), Zhang Le, University of Edinburgh.

    Introduction to Vector and Matrix Differentiation (notes on matrix differentiation, in the context of Econometrics), Heino Bohn Nielsen.

    A note on differentiating matrices (notes on matrix differentiation), Pawel Koval, from Munich Personal RePEc Archive.

    Vector/Matrix Calculus: more notes on matrix differentiation.

    Matrix Identities (notes on matrix differentiation), Sam Roweis.

    Chapter 6

    Matrix Chernoff bound

    For certain applications in linear algebra, it is useful to know properties of the probability distribution of the largest eigenvalue of a finite sum of random matrices. Suppose {X_k} is a finite sequence of random matrices. Analogous to the well-known Chernoff bound for sums of scalars, a bound on the following is sought for a given parameter t:

    $$\Pr\left( \lambda_{\max}\left( \sum_k \mathbf{X}_k \right) \ge t \right)$$

    The following theorems answer this general question under various assumptions; these assumptions are named below by analogy to their classical, scalar counterparts. All of these theorems can be found in (Tropp 2010), as the specific application of a general result which is derived below. A summary of related works is given.

    6.1 Matrix Gaussian and Rademacher series

    6.1.1 Self-adjoint matrices case

    Consider a finite sequence {A_k} of fixed, self-adjoint matrices with dimension d, and let {ξ_k} be a finite sequence of independent standard normal or independent Rademacher random variables. Then, for all t ≥ 0,

    $$\Pr\left( \lambda_{\max}\left( \sum_k \xi_k \mathbf{A}_k \right) \ge t \right) \le d \cdot e^{-t^2/2\sigma^2}$$

    where

    $$\sigma^2 = \left\| \sum_k \mathbf{A}_k^2 \right\|.$$
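    An illustrative Monte Carlo sketch (not from the source; Python with NumPy assumed, a small dimension d, Gaussian coefficients, and the A_k rescaled so that σ² = 1): the empirical tail probability should stay below the stated bound d·e^{−t²/2σ²}.

    import numpy as np

    rng = np.random.default_rng(3)
    d, n_mats, n_trials, t = 4, 5, 20000, 3.0

    # Fixed self-adjoint (here: real symmetric) matrices A_k, rescaled so that
    # sigma^2 = || sum_k A_k^2 || = 1
    As = [0.5 * (M + M.T) for M in rng.standard_normal((n_mats, d, d))]
    sigma2 = np.linalg.norm(sum(A @ A for A in As), 2)
    As = [A / np.sqrt(sigma2) for A in As]

    exceed = 0
    for _ in range(n_trials):
        g = rng.standard_normal(n_mats)               # independent standard normals
        S = sum(gk * A for gk, A in zip(g, As))
        if np.linalg.eigvalsh(S).max() >= t:
            exceed += 1

    empirical = exceed / n_trials
    bound = d * np.exp(-t**2 / 2.0)                   # d * exp(-t^2/(2 sigma^2)), sigma^2 = 1
    print(f"empirical tail: {empirical:.4f}  <=  bound: {bound:.4f}")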

    6.1.2 Rectangular case

    Consider a finite sequence {B_k} of fixed matrices with dimension d₁ × d₂, and let {ξ_k} be a finite sequence of independent standard normal or independent Rademacher random variables. Define the variance parameter

    $$\sigma^2 = \max\left( \left\| \sum_k \mathbf{B}_k \mathbf{B}_k^* \right\|, \left\| \sum_k \mathbf{B}_k^* \mathbf{B}_k \right\| \right).$$