Gap filling using a Bayesian-regularized neural network
B.H. Braswell, University of New Hampshire
Proper Credit
MacKay DJC (1992) A practical Bayesian framework for backpropagation networks. Neural Computation, 4, 448-472.
Bishop C (1995) Neural Networks for Pattern Recognition. New York: Oxford University Press.
Nabney I (2002) NETLAB: Algorithms for Pattern Recognition. Advances in Pattern Recognition. New York: Springer-Verlag.
Two-layer ANN is a nonlinear regression
Hidden layer: usually nonlinear (e.g., tanh())
Output layer: usually linear
Neural networks are efficient with respect to the number of estimated parameters. Consider a problem with d input variables:
Polynomial of order M: Np ~ d^M
Neural net with M hidden nodes: Np ~ d∙M
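The scaling claim above can be checked with a toy count; the values of d and M below are illustrative, not from the slides:

```python
# Rough parameter-count comparison (orders of magnitude only): a full
# polynomial of order M in d variables has ~d^M coefficients, while a
# two-layer ANN with M hidden nodes has ~d*M weights.
d, M = 10, 4
poly_params = d ** M   # ~ d^M
ann_params = d * M     # ~ d*M
```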
Avoiding the problem of overfitting:
Early stopping
Regularization
Bayesian methods
Artificial neural networks

An artificial neural network (ANN) is a functional mapping of a vector x containing d inputs into a vector y containing c outputs. An ANN consists of "layers", each having M "nodes". A node is a linear transformation of inputs followed by application of a prescribed function. The outputs of all nodes in a layer are collected into a new vector and input into the next layer. For a two-layer network,

$$\hat{y}_k(\mathbf{x}) = f\!\left(\sum_{j=0}^{M} w_{kj}^{(2)}\,\tilde{f}\!\left(\sum_{i=0}^{d} w_{ji}^{(1)} x_i\right)\right), \qquad (1)$$

where f is a function, typically f(a) = a, and f̃ is a different function that is usually nonlinear (e.g., tanh(a)). The two matrices w(1) and w(2) represent the free parameters in the regression and include bias terms for j = 0 and i = 0.
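A minimal numerical sketch of the two-layer mapping in Equation 1, assuming tanh hidden units and linear outputs; all sizes, seeds, and weight values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

d, M, c = 3, 5, 2                   # inputs, hidden nodes, outputs
W1 = rng.normal(size=(M, d + 1))    # w(1); column 0 holds the biases (i = 0)
W2 = rng.normal(size=(c, M + 1))    # w(2); column 0 holds the biases (j = 0)

def ann_forward(x, W1, W2):
    """Two-layer ANN: y_k = f( sum_j w2_kj * tanh( sum_i w1_ji * x_i ) )."""
    x_aug = np.concatenate(([1.0], x))   # prepend x_0 = 1 for the bias term
    z = np.tanh(W1 @ x_aug)              # hidden layer, f~ = tanh
    z_aug = np.concatenate(([1.0], z))   # prepend z_0 = 1 for the bias term
    return W2 @ z_aug                    # output layer, f = identity

y = ann_forward(np.array([0.2, -1.0, 0.5]), W1, W2)
```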
For parameter estimation, the standard backpropagation algorithm is typically used. This method updates the weights and biases w for the N pairs of observed data vectors x and y in order to minimize the error:

$$E(\mathbf{w}) = \sum_{n=1}^{N} \sum_{k=1}^{c} \left( \hat{y}_k^{(n)}(\mathbf{w}) - y_k^{(n)} \right)^2 \qquad (2)$$

by estimating the derivative of E(w) with respect to w.
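The derivative of E(w) that backpropagation computes analytically can be sanity-checked with finite differences; the toy one-parameter model ŷ = w·x below is purely illustrative:

```python
import numpy as np

# Equation 2's error for a one-parameter model yhat = w * x, plus a central
# finite-difference estimate of dE/dw (backpropagation would compute the
# same gradient analytically; data values here are made up).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 3.9, 6.1])

def E(w):
    return np.sum((y - w * x) ** 2)

w0 = 1.5
h = 1e-6
dE_numeric = (E(w0 + h) - E(w0 - h)) / (2 * h)   # finite-difference estimate
dE_exact = -2.0 * np.sum((y - w0 * x) * x)       # analytic derivative
```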
Bayesian regularization

In the Bayesian framework, model parameters are treated as probability distributions, and the posterior probability of the weights given the set of observed outputs D ≡ {y(n); n = 1…N} is:

$$p(\mathbf{w} \mid D) = \frac{p(D \mid \mathbf{w})\, p(\mathbf{w})}{p(D)} \qquad (3)$$

where p(D | w) is the probability of the observations given a choice of weights (the likelihood), p(w) is a prior distribution of weight values, and the denominator is a normalization constant. Assuming Gaussian distributions for both the likelihood and the prior, the posterior distribution is given by

$$p(\mathbf{w} \mid D) = \left(\frac{1}{Z_D}\exp(-\beta E_D)\right)\left(\frac{1}{Z_W}\exp(-\alpha E_W)\right) = \frac{1}{Z_S}\exp(-\beta E_D - \alpha E_W) = \frac{1}{Z_S}\exp\left(-S(\mathbf{w})\right) \qquad (4)$$
where Z_S is a constant and S can be rewritten as

$$S = \frac{\beta}{2}\sum_{n=1}^{N}\sum_{k=1}^{c}\left(\hat{y}_k^{(n)}(\mathbf{w}) - y_k^{(n)}\right)^2 + \frac{\alpha}{2}\sum_{i=1}^{W} w_i^2 \qquad (5)$$

The parameters β and α represent the noise in the data and the variance of the weights, respectively. Thus, a solution to this problem is found by maximizing the posterior probability with respect to w, or minimizing the negative log of the probability. This amounts to minimizing the modified error function S(w) in Equation 5.
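Equation 5 is straightforward to evaluate; the sketch below uses made-up residuals, weights, and hyperparameter values:

```python
import numpy as np

# The regularized objective of Equation 5:
# S(w) = beta * E_D + alpha * E_W, with E_D and E_W the usual half
# sums of squares. All numeric inputs are illustrative.
def regularized_error(residuals, weights, alpha, beta):
    E_D = 0.5 * np.sum(residuals ** 2)   # data misfit term
    E_W = 0.5 * np.sum(weights ** 2)     # weight-decay (prior) term
    return beta * E_D + alpha * E_W

S = regularized_error(np.array([0.1, -0.2]),
                      np.array([1.0, 0.5, -0.5]),
                      alpha=0.01, beta=50.0)
```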
Gaussian approximation to the posterior distribution

To estimate the uncertainty of the predictions, we use a Gaussian approximation to the posterior distribution for the weights and perform a Taylor series expansion around the most probable values w_MP,

$$S(\mathbf{w}) \cong S(\mathbf{w}_{MP}) + \frac{1}{2}(\mathbf{w} - \mathbf{w}_{MP})^T \mathbf{A}\, (\mathbf{w} - \mathbf{w}_{MP}), \qquad (6)$$

where A is equal to:

$$\mathbf{A} = \nabla\nabla S(\mathbf{w}_{MP}) = \beta\,\nabla\nabla E_D(\mathbf{w}_{MP}) + \alpha \mathbf{I} \qquad (7)$$

This is the Hessian matrix of the error function (Equation 5), and its elements can be calculated numerically during the backpropagation. The expansion in Equation 6 allows us to rewrite the posterior distribution for the weights as:

$$p(\mathbf{w} \mid D) = \frac{1}{Z_S^*}\exp\left(-S(\mathbf{w}_{MP}) - \frac{1}{2}\Delta\mathbf{w}^T \mathbf{A}\,\Delta\mathbf{w}\right), \qquad (8)$$

where Δw = w − w_MP, and Z_S^* is given by

$$Z_S^*(\alpha,\beta) = (2\pi)^{W/2}\, \left|\mathbf{A}\right|^{-1/2} \exp\left(-S(\mathbf{w}_{MP})\right). \qquad (9)$$
Posterior distribution of outputs

The distribution of network outputs (and thus the uncertainty of the prediction) is estimated by assuming that the width of the posterior distribution is not extremely broad, and so the predictions are expanded as

$$\hat{y}(\mathbf{w}) = \hat{y}(\mathbf{w}_{MP}) + \mathbf{g}^T (\mathbf{w} - \mathbf{w}_{MP}) \qquad (10)$$

where

$$\mathbf{g} \equiv \nabla_{\mathbf{w}}\, \hat{y}\,\big|_{\mathbf{w}_{MP}}. \qquad (11)$$

The posterior distribution of the predictions is given by

$$p(y \mid D) = \int p(y \mid \mathbf{w})\, p(\mathbf{w} \mid D)\, d\mathbf{w} \qquad (12)$$

Using the Gaussian approximation for the posterior of the weights (Equation 8), and assuming zero-mean Gaussian noise, this becomes

$$p(y \mid D) \propto \int \exp\left(-\frac{\beta}{2}\left[y - \hat{y}(\mathbf{w})\right]^2\right) \exp\left(-\frac{1}{2}\Delta\mathbf{w}^T \mathbf{A}\,\Delta\mathbf{w}\right) d\mathbf{w} \qquad (13)$$

Thus, substituting Equation 10, and given Equations 7 and 9, the posterior distribution for the outputs p(y | x, D) is normal, with variance

$$\sigma_y^2 = \frac{1}{\beta} + \mathbf{g}^T \mathbf{A}^{-1} \mathbf{g} \qquad (14)$$
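Equation 14 in code, with an assumed 2×2 Hessian A, sensitivity vector g, and noise precision β chosen only for illustration:

```python
import numpy as np

# Predictive variance sigma_y^2 = 1/beta + g^T A^{-1} g (Equation 14).
beta = 25.0                               # assumed inverse noise variance
A = np.array([[4.0, 1.0], [1.0, 3.0]])    # assumed Hessian of S at w_MP
g = np.array([0.5, -0.2])                 # assumed gradient of yhat at w_MP

# Solve A z = g rather than forming A^{-1} explicitly.
sigma2 = 1.0 / beta + g @ np.linalg.solve(A, g)
```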
Determining the regularization coefficients

The most likely values of the "hyperparameters" α and β can be determined in a hierarchical fashion. The posterior from Equation 3, now a joint distribution containing these additional parameters, is first approximated as

$$p(\mathbf{w} \mid D) \cong p(\mathbf{w} \mid \alpha_{MP}, \beta_{MP}, D), \qquad (15)$$

Thus we must alternately estimate α and β, then use Equation 5 to calculate the weights. Using Bayes' theorem, the posterior distribution for the hyperparameters is given by

$$p(\alpha,\beta \mid D) = \frac{p(D \mid \alpha,\beta)\, p(\alpha,\beta)}{p(D)}. \qquad (16)$$

Including the explicit dependence on the hyperparameters, the normalization of Equation 3 can be written as

$$p(D \mid \alpha,\beta) = \int p(D \mid \mathbf{w},\alpha,\beta)\, p(\mathbf{w} \mid \alpha,\beta)\, d\mathbf{w} = \int p(D \mid \mathbf{w},\beta)\, p(\mathbf{w} \mid \alpha)\, d\mathbf{w} \qquad (17)$$

Using the exponential formulation for the prior and likelihood, this expression is now framed in terms of the original normalizing constants:

$$p(D \mid \alpha,\beta) = \frac{1}{Z_D(\beta)}\,\frac{1}{Z_W(\alpha)}\int \exp\left(-S(\mathbf{w})\right) d\mathbf{w} = \frac{Z_S(\alpha,\beta)}{Z_D(\beta)\, Z_W(\alpha)} \qquad (18)$$

Substituting the Gaussian approximation for the posterior evaluated in the neighborhood of the optimal weights w_MP (Equation 9), the function that must be minimized now is the negative log of the likelihood p(D | α, β), which is equal to

$$-\ln p(D \mid \alpha,\beta) = \alpha E_W^{MP} + \beta E_D^{MP} + \frac{1}{2}\ln\left|\mathbf{A}\right| - \frac{W}{2}\ln\alpha - \frac{N}{2}\ln\beta + \frac{N}{2}\ln 2\pi \qquad (19)$$

Evaluating the derivative of this expression with respect to the two parameters involves calculating the eigenvalues of A, and we are left with expressions for the optimal values of α and β, given the most probable values of the weights w. Thus,

$$\alpha^{(n+1)} = \frac{\gamma^{(n)}}{2 E_W^{(n)}}; \qquad \beta^{(n+1)} = \frac{N - \gamma^{(n)}}{2 E_D^{(n)}} \qquad (20)$$

where

$$\gamma \equiv \sum_{i=1}^{W} \frac{\lambda_i}{\lambda_i + \alpha} \qquad (21)$$

and the λ_i are the eigenvalues of β∇∇E_D. In practice, the parameters and hyperparameters are solved together by alternately updating w using standard backpropagation, and updating α and β using Equations 20 and 21 (hence the index n+1 above).
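One re-estimation sweep of Equations 20 and 21 can be sketched as follows; the eigenvalues λ_i (of β∇∇E_D), error terms, and sample count are all made-up inputs:

```python
import numpy as np

# One update of alpha and beta per Equations 20-21. `eigvals` are the
# eigenvalues lambda_i of beta * grad grad E_D; gamma is the effective
# number of well-determined parameters.
def update_hyperparams(alpha, eigvals, E_W, E_D, N):
    gamma = np.sum(eigvals / (eigvals + alpha))   # Equation 21
    alpha_new = gamma / (2.0 * E_W)               # Equation 20, left
    beta_new = (N - gamma) / (2.0 * E_D)          # Equation 20, right
    return alpha_new, beta_new, gamma

alpha_new, beta_new, gamma = update_hyperparams(
    alpha=0.1, eigvals=np.array([5.0, 2.0, 0.05]), E_W=1.5, E_D=40.0, N=200)
```

In a full run this update alternates with backpropagation steps on w, as the text describes.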
Previous Work

Braswell BH, Hagen SC, Frolking SE, Salas WE (2003) A multivariable approach for mapping subpixel land cover distributions using MISR and MODIS: An application in the Brazilian Amazon. Remote Sensing of Environment, 87, 243-256.
Hagen SC, Braswell BH, Frolking S, Richardson A, Hollinger D, Linder E (2006) Statistical uncertainty of eddy flux based estimates of gross ecosystem carbon exchange at Howland Forest, Maine. Journal of Geophysical Research, 111.

ANN Regression for Land Cover Estimation
Inputs: Band1, Band2, Band3, Band4 → Outputs: Forest Fraction, Cleared Fraction, Secondary Fraction
Training data supplied by classified ETM imagery
[Figure: ETM+ observed vs. MISR predicted fractions for Forest (mean val. error = 0.045 km², R² = 0.62), Secondary (mean val. error = 0.038 km², R² = 0.58), and Cleared (mean val. error = 0.025 km², R² = 0.47)]
ANN Estimation of GEE and Resp, with Monte Carlo simulation of Total Prediction Uncertainty
[Figure: climate drivers → flux; weekly GEE from Howland Forest, ME, based on NEE]
Some demonstrations of the MacKay/Bishop ANN regression with 1 input and 1 output
[Figure panels: linear regression vs. ANN regression fits at noise levels 0.02, 0.05, 0.10, and 0.20]
Issues associated with multidimensional problems
Sufficient sampling of the input space
Data normalization (column mean zero and standard deviation one)
Processing time
Algorithm parameter choices
Our gap-filling algorithm
1. Assemble meteorological and flux data in an N×d table
2. Create five additional columns for sin() and cos() of time of day and day of year, and potential PAR
3. Standardize all columns
4. First iteration: identify columns with no gaps; use these to fill all the others, one at a time
5. Create an additional column, NEE(t-1), flux lagged by one time interval
6. Second iteration: remove filled points from the NEE time series, refill with all other columns
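Steps 2 and 3 of the recipe above (periodic time covariates plus column standardization) can be sketched as follows; the function names and array layout are assumptions, not the project's actual code:

```python
import numpy as np

def add_time_covariates(hour_of_day, day_of_year):
    """sin/cos encodings of time of day and day of year (step 2, minus PAR)."""
    return np.column_stack([
        np.sin(2 * np.pi * hour_of_day / 24.0),
        np.cos(2 * np.pi * hour_of_day / 24.0),
        np.sin(2 * np.pi * day_of_year / 365.25),
        np.cos(2 * np.pi * day_of_year / 365.25),
    ])

def standardize(X):
    """Column mean zero, standard deviation one (step 3); NaNs mark gaps."""
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0)
    return (X - mu) / sd, mu, sd

# Illustrative half-day/seasonal sample times:
hours = np.array([0.0, 6.0, 12.0, 18.0])
doy = np.array([1.0, 90.0, 180.0, 270.0])
T = add_time_covariates(hours, doy)
Z, mu, sd = standardize(T)
```

Encoding time cyclically (rather than as raw hour or day numbers) keeps midnight adjacent to 23:30 and December adjacent to January in the input space.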
Room for Improvement
1. Don't extrapolate wildly; revert to time-based filling in areas with low sampling density, especially at the beginning and end of the record
2. Carefully evaluate the sensitivity to internal settings (e.g., alpha, beta, Nnodes)
3. Stepwise analysis for relative importance of driver variables
4. Migrate to C or another faster environment
5. Include uncertainty estimates in the output
6. At a minimum, clean up the code and make it available to others in the project and/or the broader community