Your Dreams May Come True with MTP2...
Caroline Uhler (MIT)
Joint work with Shaun Fallat, Steffen Lauritzen, Kayvan Sadeghi, Nanny Wermuth, and Piotr Zwiernik
Optimization and Parsimonious Modeling, IMA
January 26, 2016
Caroline Uhler (MIT) MTP2 distributions Minneapolis, January 2016 1 / 18
Problem
Given large data sets, for example from medical tests or IQ tests, determine a sparse graph that describes the dependencies between the variables.
[Figure: two data matrices: problems (1, ..., p) × individuals (1, ..., n), and measurements (1, ..., p) × patients (1, ..., n)]
Two common approaches:
Machine Learning: Graphical lasso
Applied Statistics: Chow-Liu and subsequent stepwise selection
Overview
1. MTP2 distributions
2. Properties of MTP2 distributions related to sparsity
3. Models that imply MTP2
4. Maximum likelihood estimation under MTP2
Positive dependence and MTP2 distributions
A distribution (i.e. density function) p on X is multivariate totally positive of order 2 (MTP2) if
p(x)p(y) ≤ p(x ∧ y)p(x ∨ y) for all x, y ∈ X ⊂ R^m,
where ∧ and ∨ denote the coordinatewise minimum and maximum.
A random vector X is positively associated if for any non-decreasing functions φ, ψ : R^m → R
cov{φ(X), ψ(X)} ≥ 0.
Theorem (Fortuin-Kasteleyn-Ginibre inequality, 1971)
MTP2 implies positive association.
Discrete and Gaussian MTP2 distributions
Example: A binary vector X = (X1, X2, X3) ∈ {0, 1}^3 is MTP2 if and only if
p001 p110 ≤ p000 p111    p010 p101 ≤ p000 p111    p100 p011 ≤ p000 p111
p011 p101 ≤ p001 p111    p011 p110 ≤ p010 p111    p101 p110 ≤ p100 p111
p001 p010 ≤ p000 p011    p001 p100 ≤ p000 p101    p010 p100 ≤ p000 p110
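The nine inequalities above are the MTP2 condition written out for every incomparable pair of states. A minimal brute-force sketch of that check for a probability table on a finite product space follows; the function name `is_mtp2` and the two test tables are illustrative assumptions, not part of the talk.

```python
import itertools
import numpy as np

def is_mtp2(p, tol=1e-12):
    """Brute-force check of p(x) p(y) <= p(x ∧ y) p(x ∨ y) over all pairs of cells.

    p is an array indexed by states; ∧ and ∨ are the coordinatewise min and max.
    """
    p = np.asarray(p, dtype=float)
    cells = list(itertools.product(*(range(k) for k in p.shape)))
    for x, y in itertools.combinations(cells, 2):
        meet = tuple(map(min, x, y))
        join = tuple(map(max, x, y))
        if p[x] * p[y] > p[meet] * p[join] + tol:
            return False
    return True

# Independent coordinates: every inequality holds with equality.
marg = np.array([0.3, 0.7])
indep = np.einsum("i,j,k->ijk", marg, marg, marg)

# Hypothetical table that puts most mass on the "disagreeing" states 011 and 100.
anti = np.full((2, 2, 2), 0.05)
anti[0, 1, 1] = anti[1, 0, 0] = 0.35

print(is_mtp2(indep), is_mtp2(anti))
```

The loop is exponential in the number of variables, so it is only meant for small examples such as the three-variable case above.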
Theorem (Horn and Johnson, 1991)
A multivariate Gaussian distribution p(x; θ) is MTP2 if and only if the inverse covariance matrix θ is an M-matrix, that is
θij ≤ 0 for all i ≠ j.
Theorem (Karlin and Rinott, 1980)
If p(x) > 0 and p is MTP2 for any pair of coordinates when the others are held constant, then p is MTP2.
Properties of MTP2 distributions
Theorem (FLSUWZ, 2015)
If X = (X1, . . . ,Xm) is MTP2, then
(i) any marginal distribution is MTP2
(ii) any conditional distribution is MTP2
(iii) marginal independence structure:
Xi ⊥⊥ Xj ⟺ cov(Xi, Xj) = 0
(iv) conditional independence structure:
XA ⊥⊥ XB | XC ⟹ XA ⊥⊥ XB | XC∪{k}
(v) composition property:
XA ⊥⊥ XB | XC and XA ⊥⊥ XD | XC ⟹ XA ⊥⊥ XB∪D | XC
(vi) singleton transitivity property:
Xi ⊥⊥ Xj | XC and Xi ⊥⊥ Xj | XC∪{k} ⟹ Xi ⊥⊥ Xk | XC or Xj ⊥⊥ Xk | XC
Occurrence of MTP2 distributions
MTP2 constraints appear to be extremely restrictive:
3-dim. Gaussian distributions: about 5% are MTP2
4-dim. Gaussian distributions: about 0.09% are MTP2
3-dim. binary distributions: about 2% are MTP2
4-dim. binary distributions: about 0% are MTP2
Constraints are less restrictive with additional Markov structure!
For 3-dim. Gaussian distributions:
if 1 ⊥⊥ 2 | 3: 25% are MTP2,
if in addition 1 ⊥⊥ 3 | 2: 50% are MTP2,
if 1 ⊥⊥ 2 ⊥⊥ 3: 100% are MTP2.
Example: EPH-gestosis
Dataset collected 40 years ago in a study on “Pregnancy and Child Development” by the German Research Foundation and recently analyzed by Wermuth and Marchetti (2014).
EPH-gestosis: disease syndrome for pregnant women; three symptoms
edema (high body water retention)
proteinuria (high amounts of urinary proteins)
hypertension (elevated blood pressure)
Observed counts:
[ n000 n010 n001 n011 ]   [ 3299  107  1012  58 ]
[ n100 n110 n101 n111 ] = [   78   11    65  19 ]
This sample distribution is MTP2!
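The claim can be checked directly on the counts, since the normalizing constant cancels on both sides of each MTP2 inequality. The sketch below transcribes the table as `counts[i, j, k] = n_ijk` (the index order used on the slide) and lists every violated inequality; the helper name `mtp2_violations` is an assumption made for illustration.

```python
import itertools
import numpy as np

# counts[i, j, k] = n_ijk, transcribed from the observed counts on the slide
counts = np.zeros((2, 2, 2), dtype=np.int64)
counts[0, 0, 0], counts[0, 1, 0], counts[0, 0, 1], counts[0, 1, 1] = 3299, 107, 1012, 58
counts[1, 0, 0], counts[1, 1, 0], counts[1, 0, 1], counts[1, 1, 1] = 78, 11, 65, 19

def mtp2_violations(n):
    """Return all pairs (x, y) with n[x] n[y] > n[x ∧ y] n[x ∨ y]."""
    cells = list(itertools.product((0, 1), repeat=n.ndim))
    bad = []
    for x, y in itertools.combinations(cells, 2):
        meet = tuple(map(min, x, y))   # coordinatewise minimum
        join = tuple(map(max, x, y))   # coordinatewise maximum
        if n[x] * n[y] > n[meet] * n[join]:
            bad.append((x, y))
    return bad

violations = mtp2_violations(counts)
print("MTP2" if not violations else violations)
```

On these counts the list of violations comes back empty, which agrees with the statement on the slide.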
Example: Math grades
Data: grades of 88 students in
Mechanics, Vectors, Algebra, Analysis, Statistics
S =
             mechanics   vectors   algebra  analysis  statistics
mechanics     305.7680  127.2226  101.5794  106.2727    117.4049
vectors       127.2226  172.8422   85.1573   94.6729     99.0120
algebra       101.5794   85.1573  112.8860  112.1134    121.8706
analysis      106.2727   94.6729  112.1134  220.3804    155.5355
statistics    117.4049   99.0120  121.8706  155.5355    297.7554

S^−1 = 10^−3 ·
             mechanics   vectors   algebra  analysis  statistics
mechanics       5.2446   −2.4351   −2.7395    0.0116     −0.1430
vectors        −2.4351   10.4268   −4.7078   −0.7928     −0.1660
algebra        −2.7395   −4.7078   26.9548   −7.0486     −4.7050
analysis        0.0116   −0.7928   −7.0486    9.8829     −2.0184
statistics     −0.1430   −0.1660   −4.7050   −2.0184      6.4501
Although the sample distribution is not quite MTP2 (S^−1 has one small positive off-diagonal entry, between mechanics and analysis), any reasonable fitted Gaussian graphical model is MTP2.
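For a Gaussian, the MTP2 check reduces to the signs of the off-diagonal entries of the concentration matrix. A small sketch of that check on the S^−1 shown above (entries transcribed from the slide; the helper name `positive_offdiagonals` is an illustrative assumption):

```python
import numpy as np

names = ["mechanics", "vectors", "algebra", "analysis", "statistics"]

# Concentration matrix S^-1 as printed on the slide (note the 10^-3 factor).
K = 1e-3 * np.array([
    [ 5.2446, -2.4351, -2.7395,  0.0116, -0.1430],
    [-2.4351, 10.4268, -4.7078, -0.7928, -0.1660],
    [-2.7395, -4.7078, 26.9548, -7.0486, -4.7050],
    [ 0.0116, -0.7928, -7.0486,  9.8829, -2.0184],
    [-0.1430, -0.1660, -4.7050, -2.0184,  6.4501],
])

def positive_offdiagonals(K, names):
    """List the off-diagonal entries K[i, j] > 0, i.e. the M-matrix violations."""
    m = K.shape[0]
    return [(names[i], names[j], K[i, j])
            for i in range(m) for j in range(i + 1, m) if K[i, j] > 0]

offenders = positive_offdiagonals(K, names)
for a, b, value in offenders:
    print(f"{a} - {b}: {value:.3e}")
```

An empty list would mean the matrix is an M-matrix in the sense used here (K is positive definite as the inverse of a covariance matrix, so only the sign condition needs checking).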
MTP2 constraints are often implicit
Pairwise interaction model for a graph G = (V, E):
p(x) = (1/Z) ∏_{i∈V} ψi(xi) ∏_{(i,j)∈E} ψij(xi, xj),
where the ψij are positive functions and Z is the normalizing constant.
Theorem (FLSUWZ, 2015)
p is MTP2 if and only if ψij are MTP2 functions.
Example: Ferromagnetism in Ising models
ψij(xi, xj) = exp(−θij xi xj), θij ≤ 0
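To see the ferromagnetic condition at work, the sketch below builds the unnormalized table of a three-node Ising model on states {0, 1}^3 and checks MTP2 by brute force, once with all θij ≤ 0 and once with one sign flipped. The coupling values, the choice of {0, 1} states, and setting the node potentials ψi to 1 are assumptions made for illustration.

```python
import itertools
import numpy as np

def ising_table(m, theta):
    """Unnormalized p(x) = prod over edges of exp(-theta_ij x_i x_j) on {0,1}^m.

    Node potentials are set to 1 here; theta maps edges (i, j) to couplings.
    """
    p = np.zeros((2,) * m)
    for x in itertools.product((0, 1), repeat=m):
        energy = sum(t * x[i] * x[j] for (i, j), t in theta.items())
        p[x] = np.exp(-energy)
    return p

def is_mtp2(p, tol=1e-12):
    """Check p(x) p(y) <= p(x ∧ y) p(x ∨ y) for all pairs, up to relative tol."""
    cells = list(itertools.product(*(range(k) for k in p.shape)))
    for x, y in itertools.combinations(cells, 2):
        meet = tuple(map(min, x, y))
        join = tuple(map(max, x, y))
        if p[x] * p[y] > p[meet] * p[join] * (1 + tol):
            return False
    return True

ferro = {(0, 1): -1.0, (1, 2): -0.5, (0, 2): -0.2}   # all couplings <= 0
mixed = {(0, 1): -1.0, (1, 2): +0.5, (0, 2): -0.2}   # one coupling > 0

print(is_mtp2(ising_table(3, ferro)), is_mtp2(ising_table(3, mixed)))
```

The normalizing constant Z is omitted because it cancels in every inequality.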
Signed MTP2 distributions
A Gaussian / discrete random vector X = (X1, ..., Xm) has a signed MTP2 distribution if and only if:
Discrete: The distribution of X is MTP2 up to a permutation of the values in each Xi
Gaussian: There exists a diagonal matrix D with diagonal entries in {−1, +1} such that DX is MTP2.
The following models are signed MTP2:
Gaussian / binary pairwise interaction models on trees
Binary latent class models (Allman, Rhodes, Sturmfels & Zwiernik, 2013)
Gaussian / binary latent tree models
Single factor analysis models
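For the Gaussian definition of signed MTP2 above, DX has concentration matrix DKD when X has concentration matrix K, so the question is whether some sign flip makes all off-diagonal entries nonpositive. A brute-force sketch, with a hypothetical helper `find_sign_flip` and two made-up 3 × 3 matrices (a chain, which is a tree, and a triangle with all-positive couplings):

```python
import itertools
import numpy as np

def find_sign_flip(K, tol=1e-12):
    """Return a ±1 vector d such that diag(d) K diag(d) has off-diagonals <= 0, else None."""
    m = K.shape[0]
    off = ~np.eye(m, dtype=bool)
    # d and -d give the same matrix, so d[0] is fixed to +1.
    for rest in itertools.product((1, -1), repeat=m - 1):
        d = np.array((1,) + rest)
        flipped = K * np.outer(d, d)     # entrywise form of D K D
        if np.all(flipped[off] <= tol):
            return d
    return None

# Chain 0 - 1 - 2 with one positive coupling: a sign flip repairs it.
K_tree = np.array([[2.0, 0.5, 0.0],
                   [0.5, 2.0, -0.5],
                   [0.0, -0.5, 2.0]])

# Triangle with all couplings positive: no sign flip works.
K_triangle = np.array([[2.0, 0.5, 0.5],
                       [0.5, 2.0, 0.5],
                       [0.5, 0.5, 2.0]])

print(find_sign_flip(K_tree), find_sign_flip(K_triangle))
```

The search visits 2^(m−1) sign patterns, so it is only a sketch for small m.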
ML Estimation for Gaussian graphical models
Graph G = (V, E)
Primal (Max-Likelihood):
maximize over θ ⪰ 0:  log det(θ) − trace(θS)
subject to  θuv = 0, ∀ uv ∉ E, u ≠ v.
Dual (Min-Entropy):
minimize over Σ ⪰ 0:  −log det(Σ) − m
subject to  Σuv = Suv, ∀ uv ∈ E, and u = v.
[Figure: geometry of the two problems; panels: Concentration matrices, Covariance matrices]
ML Estimation for Gaussian MTP2 distributions
Primal (Max-Likelihood):
maximize over θ ⪰ 0:  log det(θ) − trace(θS)
subject to  θuv ≤ 0, ∀ u ≠ v.
Dual (Min-Entropy):
minimize over Σ ⪰ 0:  −log det(Σ) − m
subject to  Σvv = Svv, Σuv ≥ Suv.
Theorem
The MLE based on S exists if and only if there exists Σ ≻ 0 with Σ ≥ S. It is then equal to the unique element θ̂ = Σ̂^−1 ≻ 0 that satisfies the following system of equations and inequalities:
(a) Primal feasibility: θ̂uv ≤ 0 ∀ u ≠ v,
(b) Dual feasibility: Σ̂vv − Svv = 0 ∀ v, Σ̂uv − Suv ≥ 0 ∀ u ≠ v,
(c) Complementary slackness: (Σ̂uv − Suv) θ̂uv = 0 ∀ u ≠ v.
Note: We get sparsity for free! By (c), θ̂uv = 0 whenever Σ̂uv > Suv.
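Conditions (a) to (c) are easy to verify numerically for a candidate Σ̂. The sketch below is only a checker, not a solver; the function name `check_kkt` and the 2 × 2 example with a negative sample correlation are assumptions chosen so that the answer can be read off by hand.

```python
import numpy as np

def check_kkt(S, Sigma_hat, tol=1e-8):
    """Evaluate conditions (a), (b), (c) for a candidate covariance Sigma_hat."""
    theta_hat = np.linalg.inv(Sigma_hat)
    off = ~np.eye(S.shape[0], dtype=bool)
    gap = Sigma_hat - S
    return {
        # (a) off-diagonal entries of theta_hat are nonpositive
        "primal": bool(np.all(theta_hat[off] <= tol)),
        # (b) diagonals match, off-diagonals of Sigma_hat dominate those of S
        "dual": bool(np.allclose(np.diag(gap), 0, atol=tol) and np.all(gap[off] >= -tol)),
        # (c) each off-diagonal pair has a zero gap or a zero theta_hat entry
        "slackness": bool(np.allclose((gap * theta_hat)[off], 0, atol=tol)),
    }

# Hypothetical sample correlation matrix with a negative correlation.
S = np.array([[1.0, -0.5],
              [-0.5, 1.0]])

print(check_kkt(S, np.eye(2)))   # candidate: identity
print(check_kkt(S, S))           # candidate: S itself
```

In this example the identity passes all three conditions, while S itself fails primal feasibility because its inverse has a positive off-diagonal entry; the resulting θ̂ has a zero off-diagonal, illustrating the sparsity remark.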
ML Estimation for Gaussian MTP2 distributions
Theorem (Slawski and Hein, 2015)
The MLE in a Gaussian MTP2 model exists with probability 1 when n ≥ 2.
Theorem (LUZ, 2016)
Let S be a sample correlation matrix and θ̂ the MLE of the concentration matrix in the Gaussian MTP2 model. Let G_MST(S) be the maximum weight spanning tree of S and G(θ̂) the concentration graph. Then
G_MST(S) ⊂ G(θ̂).
Algorithm:
Input: Sample correlation matrix S
Output: Graph under Gaussian signed MTP2 model
1. Let D be a diagonal matrix with entries in {−1, 1} such that the Chow-Liu tree of DSD is positive
2. Compute the MLE Σ̂ based on DSD under the Gaussian MTP2 model
3. Output G(Σ̂^−1)
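Step 1 of the algorithm can be sketched as follows. The sketch assumes the Chow-Liu tree is taken as the maximum weight spanning tree on |Sij| (for Gaussians, mutual information is increasing in the absolute correlation), builds it with Prim's algorithm, and propagates signs along the tree. Steps 2 and 3 need a solver for the MTP2 MLE and are not shown. Function names and the example matrix are illustrative assumptions.

```python
import numpy as np

def max_spanning_tree(W):
    """Prim's algorithm on a dense symmetric weight matrix; returns (parent, child) edges."""
    m = W.shape[0]
    in_tree = [0]
    edges = []
    while len(in_tree) < m:
        best = None
        for i in in_tree:
            for j in range(m):
                if j not in in_tree and (best is None or W[i, j] > W[best]):
                    best = (i, j)
        edges.append(best)
        in_tree.append(best[1])
    return edges

def tree_signs(S):
    """Signs d in {-1, +1} making every tree edge positive in D S D, D = diag(d)."""
    edges = max_spanning_tree(np.abs(S))
    d = np.ones(S.shape[0], dtype=int)
    for i, j in edges:   # Prim adds edges parent -> child, so d[i] is already final
        d[j] = d[i] * (1 if S[i, j] >= 0 else -1)
    return d, edges

# Hypothetical sample correlation matrix.
S = np.array([[ 1.0, -0.8,  0.3],
              [-0.8,  1.0, -0.4],
              [ 0.3, -0.4,  1.0]])

d, edges = tree_signs(S)
DSD = S * np.outer(d, d)   # entrywise form of D S D
print(d, edges)
print(DSD)
```

Because a tree has no cycles, the sign propagation never runs into a conflict, which is why a suitable D always exists for the tree edges.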
Example: Carcass
344 measurements of the thickness of meat and fat layers at different locations of a slaughter pig
S^−1 =
          Fat11  Meat11   Fat12  Meat12   Fat13  Meat13
Fat11      0.40    0.04   −0.23   −0.07   −0.19    0.05
Meat11     0.04    0.15   −0.01   −0.06   −0.05   −0.06
Fat12     −0.23   −0.01    0.51    0.07   −0.23   −0.05
Meat12    −0.07   −0.06    0.07    0.14   −0.00   −0.09
Fat13     −0.19   −0.05   −0.23   −0.00    0.54    0.03
Meat13     0.05   −0.06   −0.05   −0.09    0.03    0.16

S =
          Fat11  Meat11   Fat12  Meat12   Fat13  Meat13
Fat11     11.34    0.74    8.42    2.06    7.66   −0.76
Meat11     0.74   32.97    0.67   35.94    2.01   31.97
Fat12      8.42    0.67    8.91    0.31    6.84   −0.60
Meat12     2.06   35.94    0.31   51.79    2.18   41.47
Fat13      7.66    2.01    6.84    2.18    7.62    0.38
Meat13    −0.76   31.97   −0.60   41.47    0.38   41.44
[Figure: two estimated graphs on the nodes Fat11, Meat11, Fat12, Meat12, Fat13, Meat13; panels: Under MTP2 constraint, Using glasso]
Example: BodyFat
241 observations on 15 variables: age, weight, height, percentage of body fat, body density, and the circumferences of various body parts.
[Figure: two estimated graphs on the nodes Density, BodyFat, Age, Weight, Height, Neck, Chest, Abdomen, Hip, Thigh, Knee, Ankle, Biceps, Forearm, Wrist; panels: Under MTP2 constraint, Using glasso]
Conclusions and future work
MTP2 constraints reflect real processes and models
ferromagnetism
latent class models with positive associations
latent Gaussian/binary tree models
they lead to some beautiful theory (exponential families, convexity, combinatorics, semialgebraic geometry)
they are useful in high-dimensional settings
References
Fallat, Lauritzen, Sadeghi, Uhler, Wermuth, and Zwiernik: Total positivity in Markov structures (arXiv:1510.01290)
Lauritzen, Uhler, and Zwiernik: Totally positive exponential families and graphical models (on the arXiv shortly)
Sturmfels & CU: Multivariate Gaussians, semidefinite matrix completion, and convex algebraic geometry (AISM 62, 2010)
CU: Geometry of maximum likelihood estimation in Gaussian graphical models (hopefully on the arXiv by Sunday)
Chandrasekaran, Shah, CU: Asymptotics of maximum likelihood estimation in Gaussian cycles (in progress)