Conditional Chow-Liu Tree Structures for Modeling Discrete-Valued Vector Time Series
Sergey Kirshner, UC Irvine
Padhraic Smyth, UC Irvine
Andrew Robertson, IRI
July 10, 2004
UAI-2004 © Sergey Kirshner, UC Irvine
Overview
• Data and its modeling aspects
• Model description
  – General approach
    • Hidden Markov models
  – Capturing data properties
    • Chow-Liu trees
    • Conditional Chow-Liu trees
• Inference and Learning
• Experimental Results
• Summary and Future Extensions
Data Aspects
• Correlation
  – Spatial dependence
• Temporal structure
  – First order dependence
• Variability of individual series
  – Interannual variability
Modeling Precipitation Occurrence
Southwestern Australia, 1978-92
Western US, 1952-90
A Bit of Notation
• Vector time series R
  – $R_{1:T} = R_1, \dots, R_T$
• Vector observation of R at time t
  – $R_t = (A_t, B_t, \dots, Z_t)$
[Figure: each observation R_1, R_2, …, R_T drawn as a column of its components A_t, B_t, C_t, …, Z_t]
Weather Generator
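[Figure: observations R_1, R_2, …, R_T, each expanded into components A_t, B_t, C_t, …, Z_t; every component forms its own chain over time]
$$P(R_{1:T}) = P(R_1) \prod_{t=2}^{T} P(R_t \mid R_{t-1}) = \prod_{c \in \{A, \dots, Z\}} \left[ P(c_1) \prod_{t=2}^{T} P(c_t \mid c_{t-1}) \right]$$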
• Does not take correlation into account
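A minimal sketch of evaluating this factorization, assuming binary occurrence data. The parameter tables `init` and `trans`, the function name, and all numbers are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import math

def weather_generator_loglik(series, init, trans):
    """Log P(R_{1:T}) when each station is an independent first-order Markov chain.

    series         : list of T observation tuples, one value (0 or 1) per station
    init[c][v]     : P(c_1 = v) for station c               (hypothetical parameters)
    trans[c][u][v] : P(c_t = v | c_{t-1} = u) for station c (hypothetical parameters)
    """
    loglik = 0.0
    n_stations = len(series[0])
    for c in range(n_stations):
        # First day of the chain for station c.
        loglik += math.log(init[c][series[0][c]])
        # Remaining days depend only on the same station's previous day.
        for t in range(1, len(series)):
            prev, cur = series[t - 1][c], series[t][c]
            loglik += math.log(trans[c][prev][cur])
    return loglik

# Two stations, three days; all numbers are made up.
init = [[0.7, 0.3], [0.6, 0.4]]
trans = [
    [[0.8, 0.2], [0.5, 0.5]],
    [[0.9, 0.1], [0.4, 0.6]],
]
series = [(0, 1), (0, 1), (1, 0)]
ll = weather_generator_loglik(series, init, trans)
```

Because the log-likelihood is a plain sum over stations, no term ever couples two stations, which is the limitation the slide points out.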
Hidden Markov Model
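[Figure: hidden state chain S_1 → S_2 → … → S_t → … → S_{T-1} → S_T, with each S_t emitting the observation R_t]
$$P(R_{1:T}, S_{1:T}) = P(S_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1}) \prod_{t=1}^{T} P(R_t \mid S_t)$$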
HMM-Conditional-Independence
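[Figure: the emission R_t given S_t expands into components A_t, B_t, C_t, …, Z_t, each depending only on S_t; below it, the HMM chain S_1, …, S_T with emissions R_1, …, R_T]
$$P(R_t \mid S_t) = P(A_t, \dots, Z_t \mid S_t) = \prod_{c \in \{A, \dots, Z\}} P(c_t \mid S_t)$$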
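As a small illustration of the conditionally independent emission, the sketch below multiplies per-station probabilities for one hidden state. It assumes binary data; the table `p_wet` and the function name are hypothetical, not taken from the slides.

```python
def ci_emission(obs, state, p_wet):
    """P(R_t = obs | S_t = state) under conditional independence, binary data.

    p_wet[state][c] is P(c_t = 1 | S_t = state)  (hypothetical parameter table).
    """
    prob = 1.0
    for c, value in enumerate(obs):
        p = p_wet[state][c]
        # Each station contributes its own factor; stations never interact.
        prob *= p if value == 1 else 1.0 - p
    return prob

# Two hidden states, three stations; numbers are made up.
p_wet = [[0.1, 0.2, 0.1], [0.8, 0.7, 0.9]]
example = ci_emission((1, 0, 1), 1, p_wet)  # 0.8 * (1 - 0.7) * 0.9
```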
HMM-CI: Is It Sufficient?
• Simple yet effective
• Requires large number of values for St
• Emissions can be made to capture more spatial dependencies
Chow-Liu Trees
• Approximation of a joint distribution with a tree-structured distribution [Chow and Liu 68]
Illustration of CL-Tree Learning
Pair   Pairwise joint distribution   MI
AB     (0.56, 0.11, 0.02, 0.31)      0.3126
AC     (0.51, 0.17, 0.17, 0.15)      0.0229
AD     (0.53, 0.15, 0.19, 0.13)      0.0172
BC     (0.44, 0.14, 0.23, 0.19)      0.0230
BD     (0.46, 0.12, 0.26, 0.16)      0.0183
CD     (0.64, 0.04, 0.08, 0.24)      0.2603
[Figure: complete graph on nodes A, B, C, D and the tree selected from it]
• Learning the structure and the probabilities
  – Compute individual and pairwise marginal distributions for all pairs of variables
  – Compute mutual information (MI) for each pair of variables
  – Build a maximum spanning tree for a complete graph with variables as nodes and MIs as weights
• Properties
  – Efficient: O(#samples × (#variables)^2 × (#values per variable)^2)
  – Optimal: the best tree-structured approximation in KL divergence (maximum likelihood)
$$\mathrm{MI}(X, Y) = \sum_{X, Y} P(X, Y) \log \frac{P(X, Y)}{P(X)\,P(Y)}$$
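The learning steps can be sketched as follows. This is an illustrative implementation, not the authors' code: it assumes binary variables, reads each pairwise table as (p00, p01, p10, p11), uses natural logarithms, and builds the maximum spanning tree with Kruskal's algorithm. The function names are hypothetical; the MI weights are the ones listed in the illustration.

```python
import math

def mutual_information(joint):
    """MI (in nats) of two binary variables; joint = (p00, p01, p10, p11) is assumed."""
    p00, p01, p10, p11 = joint
    px = (p00 + p01, p10 + p11)   # marginal of the first variable
    py = (p00 + p10, p01 + p11)   # marginal of the second variable
    table = {(0, 0): p00, (0, 1): p01, (1, 0): p10, (1, 1): p11}
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in table.items() if p > 0)

def chow_liu_edges(nodes, weights):
    """Maximum spanning tree over `nodes` via Kruskal's algorithm with union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    # Visit edges from the largest MI to the smallest.
    for (u, v), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:              # the edge joins two components, so keep it
            parent[ru] = rv
            tree.append((u, v))
    return tree

# MI values from the CL-tree illustration.
slide_mi = {
    ("A", "B"): 0.3126, ("A", "C"): 0.0229, ("A", "D"): 0.0172,
    ("B", "C"): 0.0230, ("B", "D"): 0.0183, ("C", "D"): 0.2603,
}
tree = chow_liu_edges("ABCD", slide_mi)
mi_ab = mutual_information((0.56, 0.11, 0.02, 0.31))
```

With these weights the procedure keeps A–B, C–D, and B–C. Recomputing the MI for the pair AB from its rounded table gives a value close to, but not exactly, the listed 0.3126, which is expected since the tabulated probabilities are rounded to two decimals.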
HMM-Chow-Liu
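[Figure: HMM chain S_1, …, S_T with emissions R_1, …, R_T. The emission P(R_t | S_t) is a separate Chow-Liu tree over A_t, B_t, C_t, D_t for each state value: T_1(R_t) for S_t = 1, T_2(R_t) for S_t = 2, T_3(R_t) for S_t = 3]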
Improving on Chow-Liu Trees
• Tree edges with low MI add little to the approximation.
• Observations from the previous time point can be more relevant than from the current one.
• Idea: build a Chow-Liu tree that may include variables from both the current and the previous time point.
Conditional Chow-Liu Forests
• Extension of Chow-Liu trees to conditional distributions
  – Approximation of a conditional multivariate distribution with a tree-structured distribution
  – Uses MI to build maximum spanning trees (forest)
    • Variables of two consecutive time points as nodes
    • All nodes corresponding to the earlier time point considered connected before the tree construction
  – Same asymptotic complexity as Chow-Liu trees
    • O(#samples × (#variables)^2 × (#values per variable)^2)
  – Optimal
Example of CCL-Forest Learning
Pair   Pairwise joint distribution   MI
AB     (0.56, 0.11, 0.02, 0.31)      0.3126
AC     (0.51, 0.17, 0.17, 0.15)      0.0229
BC     (0.44, 0.14, 0.23, 0.19)      0.0230
A’A    (0.57, 0.11, 0.11, 0.21)      0.1207
A’B    (0.51, 0.17, 0.07, 0.25)      0.1253
A’C    (0.54, 0.14, 0.14, 0.18)      0.0623
B’A    (0.52, 0.07, 0.16, 0.25)      0.1392
B’B    (0.48, 0.10, 0.11, 0.31)      0.1700
B’C    (0.47, 0.11, 0.21, 0.21)      0.0559
C’A    (0.48, 0.20, 0.20, 0.12)      0.0033
C’B    (0.41, 0.26, 0.17, 0.16)      0.0030
C’C    (0.53, 0.14, 0.14, 0.19)      0.0625
[Figure: nodes A’, B’, C’ (previous time point) and A, B, C (current time point), with the forest selected from the pairs above]
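A sketch of the forest construction on the example's MI values, under stated assumptions: it reuses Kruskal's algorithm, merges all previous-time nodes into one component before any edge is considered (as the slide on conditional Chow-Liu forests describes), and skips edges with non-positive weight, which is an assumption of this sketch. Function and variable names are hypothetical.

```python
def conditional_chow_liu_edges(current, previous, weights):
    """Kruskal-style maximum spanning forest with `previous` nodes pre-connected."""
    parent = {v: v for v in list(current) + list(previous)}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Treat all previous-time nodes as already connected, so each
    # current-time subtree can attach to at most one of them.
    root = previous[0]
    for v in previous[1:]:
        parent[find(v)] = find(root)

    forest = []
    for (u, v), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        if w <= 0:                 # assumption: edges carrying no information are dropped
            continue
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest

# MI values from the CCL-forest example; primed names are previous-time variables.
example_mi = {
    ("A", "B"): 0.3126, ("A", "C"): 0.0229, ("B", "C"): 0.0230,
    ("A'", "A"): 0.1207, ("A'", "B"): 0.1253, ("A'", "C"): 0.0623,
    ("B'", "A"): 0.1392, ("B'", "B"): 0.1700, ("B'", "C"): 0.0559,
    ("C'", "A"): 0.0033, ("C'", "B"): 0.0030, ("C'", "C"): 0.0625,
}
forest = conditional_chow_liu_edges(["A", "B", "C"], ["A'", "B'", "C'"], example_mi)
```

On these weights the sketch keeps A–B, B’–B, and C’–C: once B is tied to the previous time point, the stronger-looking links B’–A and A’–B would close a cycle and are rejected.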
AR-HMM
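$$P(R_{1:T}, S_{1:T}) = P(S_1) \prod_{t=2}^{T} P(S_t \mid S_{t-1}) \; P(R_1 \mid S_1) \prod_{t=2}^{T} P(R_t \mid R_{t-1}, S_t)$$
[Figure: hidden state chain S_1, S_2, …, S_{t-1}, S_t, …, S_T; each observation R_t depends on both S_t and the previous observation R_{t-1}]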
HMM-Conditional-Chow-Liu
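[Figure: autoregressive HMM with hidden chain S_1, …, S_T and observations R_1, …, R_T, where R_t depends on S_t and R_{t-1}. For each state value (S_t = 1, 2, 3), P(R_t | R_{t-1}, S_t) is a separate conditional Chow-Liu forest over A_t, B_t, C_t, D_t with possible parents among A_{t-1}, B_{t-1}, C_{t-1}, D_{t-1}]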
Inference and Learning for HMM-CL and HMM-CCL
• Inference (calculating P(S | R, Θ))
  – Recursively calculate P(R_{1:t}, S_t | Θ) and P(R_{t+1:T} | S_t, Θ) (Forward-Backward)
• Learning (Baum-Welch or EM)
  – E-step: calculate P(S | R, Θ)
    • Forward-Backward
    • Calculate P(S_t | R, Θ) and P(S_t, S_{t+1} | R, Θ)
  – M-step:
    • Maximize E_{P(S | R, Θ)}[log P(S, R | Θ’)]
    • Similar to mixtures of Chow-Liu trees
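The forward-backward recursion can be sketched as below for a plain HMM whose emission likelihoods have already been evaluated (for example from per-state trees). This is a generic scaled implementation, not the authors' code; the function name and the toy numbers are hypothetical, and the pairwise posteriors P(S_t, S_{t+1} | R) are omitted for brevity.

```python
import math

def forward_backward(pi, A, B):
    """Scaled forward-backward pass.

    pi[i]   : P(S_1 = i)
    A[i][j] : P(S_{t+1} = j | S_t = i)
    B[t][i] : P(R_t | S_t = i), emission likelihoods evaluated beforehand
    Returns (gamma, loglik) with gamma[t][i] = P(S_t = i | R_{1:T}).
    """
    T, K = len(B), len(pi)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[i] * B[0][i] for i in range(K)]
        else:
            a = [B[t][j] * sum(alpha[t - 1][i] * A[i][j] for i in range(K))
                 for j in range(K)]
        c = sum(a)                          # equals P(R_t | R_{1:t-1})
        alpha.append([x / c for x in a])    # normalized forward message
        scale.append(c)

    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(A[i][j] * B[t + 1][j] * beta[t + 1][j]
                             for j in range(K)) / scale[t + 1]

    gamma = [[alpha[t][i] * beta[t][i] for i in range(K)] for t in range(T)]
    loglik = sum(math.log(c) for c in scale)
    return gamma, loglik

# Toy example: 2 states, 3 time steps; all numbers are made up.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.2, 0.8]]
B = [[0.5, 0.1], [0.4, 0.3], [0.2, 0.9]]
gamma, loglik = forward_backward(pi, A, B)
```

Normalizing each forward message keeps the recursion numerically stable on long sequences, and the log-likelihood falls out as the sum of the log scaling factors.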
Chain Chow-Liu Forest (CCLF)
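[Figure: chain R_1 → R_2 → … → R_{t-1} → R_t → … → R_T with no hidden states; the link between R_{t-1} and R_t is expanded into a graph over the component variables A_t, B_t, C_t, D_t]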
Complexity Analysis
Criterion              HMM-CI           HMM-CL                      HMM-CCL
# params               K^2 + MK(V-1)    K^2 + K(M-1)(V^2-1)         K^2 + KM(V^2-1)
Time (per iteration)   O(NTK(K+M))      O(NTK(K + M^2 V^2))         O(NTK(K + M^2 V^2))
Space                  O(NTK(K+M))      O(NTK(K+M) + K M^2 V^2)     O(NTK(K+M) + K M^2 V^2)

N – number of sequences
T – length of each sequence
K – number of hidden states
M – dimensionality of each vector
V – number of possible values for each vector component
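To make the parameter counts concrete, the sketch below evaluates the three formulas from the table. The function name is hypothetical, and K = 5 in the example is an arbitrary choice for illustration; only M = 30 and V = 2 correspond to the binary, 30-station Australian data mentioned in the experimental setup.

```python
def n_params(model, K, M, V):
    """Parameter counts as given in the complexity table."""
    transitions = K * K
    if model == "HMM-CI":
        return transitions + M * K * (V - 1)
    if model == "HMM-CL":
        return transitions + K * (M - 1) * (V * V - 1)
    if model == "HMM-CCL":
        return transitions + K * M * (V * V - 1)
    raise ValueError(f"unknown model: {model}")

# K = 5 is a made-up number of hidden states.
counts = {name: n_params(name, K=5, M=30, V=2)
          for name in ("HMM-CI", "HMM-CL", "HMM-CCL")}
```

For these values the tree-based emissions use roughly 2.6 times as many parameters as the conditionally independent ones, while HMM-CL and HMM-CCL differ only slightly from each other.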
Experimental Setup
• Data
  – Australia
    • 15 seasons, 184 days each, 30 stations
  – Western U.S.
    • 39 seasons, 90 days each, 8 stations
• Measuring predictive performance
  – Choose K (number of states)
  – Leave-one-out cross-validation
  – Log-likelihood
  – Error for prediction of a single entry given the rest
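A rough sketch of leave-one-out evaluation by held-out log-likelihood, assuming one fold per season. The `fit` and `loglik` callables are hypothetical placeholders, and the Bernoulli model used to exercise the loop is a stand-in, not one of the models from the slides.

```python
import math

def leave_one_out_loglik(seasons, fit, loglik):
    """Average held-out log-likelihood, leaving out one season at a time."""
    scores = []
    for i, held_out in enumerate(seasons):
        train = seasons[:i] + seasons[i + 1:]   # all seasons except the i-th
        model = fit(train)
        scores.append(loglik(model, held_out))
    return sum(scores) / len(scores)

# Stand-in model: a single rain probability with add-one smoothing.
def fit_bernoulli(train):
    days = [d for season in train for d in season]
    return (sum(days) + 1) / (len(days) + 2)

def bernoulli_loglik(p, season):
    return sum(math.log(p if d else 1 - p) for d in season)

# Three tiny made-up "seasons" of binary occurrence for one station.
seasons = [[1, 0, 0, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
score = leave_one_out_loglik(seasons, fit_bernoulli, bernoulli_loglik)
```

The same loop could be repeated for each candidate K, comparing the averaged scores to choose the number of states.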
Summary
• Efficient approximation for finite-valued conditional distributions
  – Conditional Chow-Liu forests
• New models for spatio-temporal finite-valued data
  – HMM with Chow-Liu trees
  – HMM with conditional Chow-Liu forests
  – Chain Chow-Liu forests
• Applied to precipitation modeling
Future Work
• Extension to real-valued data
• Priors on tree structure and parameters [Jaakkola and Meila 00]
  – Locations of the stations
• Interannual variability
  – Atmospheric variables as inputs to non-homogeneous HMM [Robertson et al 04]
• Other approximations for finite-valued multivariate data
  – Maximum Entropy
  – Multivariate probit models (binary)