Gap filling using a Bayesian-regularized neural network


Page 1: Gap filling using a Bayesian-regularized neural network

Gap filling using a Bayesian-regularized neural network

B.H. Braswell, University of New Hampshire

Page 2: Gap filling using a Bayesian-regularized neural network

Proper Credit

MacKay DJC (1992) A practical Bayesian framework for backpropagation networks. Neural Computation, 4, 448-472.

Bishop C (1995) Neural Networks for Pattern Recognition. New York: Oxford University Press.

Nabney I (2002) NETLAB: Algorithms for Pattern Recognition. Advances in Pattern Recognition. New York: Springer-Verlag.

Page 3: Gap filling using a Bayesian-regularized neural network

Two-layer ANN is a nonlinear regression

Page 4: Gap filling using a Bayesian-regularized neural network

Two-layer ANN is a nonlinear regression

[Diagram: two-layer network schematic; the hidden-layer function is usually nonlinear (e.g., tanh()), the output function usually linear.]

Page 5: Gap filling using a Bayesian-regularized neural network

Neural networks are efficient with respect to the number of estimated parameters

Consider a problem with d input variables:

Polynomial of order M: Np ~ d^M

Neural net with M hidden nodes: Np ~ d∙M
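As a quick numeric check of this scaling (a sketch, not from the slides; exact counts depend on how bias and cross terms are tallied):

```python
# Rough parameter counts for d inputs and order / hidden-layer size M.
d, M = 10, 8

n_poly = d ** M                 # full polynomial of order M: ~d^M terms
n_ann = (d + 1) * M + (M + 1)   # two-layer ANN, one output, with biases

print(f"polynomial ~ {n_poly:,} parameters, ANN ~ {n_ann} parameters")
```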

Page 6: Gap filling using a Bayesian-regularized neural network

Avoiding the problem of overfitting:

Early stopping

Regularization

Bayesian methods


Page 9: Gap filling using a Bayesian-regularized neural network

Artificial neural networks

An artificial neural network (ANN) is a functional mapping of a vector x containing d inputs into a vector y containing c outputs. An ANN consists of "layers", each having M "nodes". A node is a linear transformation of the inputs followed by application of a prescribed function. The outputs of all nodes in a layer are collected into a new vector and input into the next layer. For a two-layer network,

$$\hat{y}_k(\mathbf{x}) = f\!\left( \sum_{j=0}^{M} w_{kj}^{(2)}\, \tilde{f}\!\left( \sum_{i=0}^{d} w_{ji}^{(1)} x_i \right) \right), \qquad (1)$$

where f is a function, typically f(a) = a, and \tilde{f} is a different function that is usually nonlinear (e.g., tanh(a)). The two matrices w^(1) and w^(2) represent the free parameters in the regression and include bias terms for j = 0 and i = 0.

For parameter estimation, the standard backpropagation algorithm is typically used. This method updates the weights and biases w for the N pairs of observed data vectors x and y in order to minimize the error

$$E(\mathbf{w}) = \sum_{n=1}^{N} \sum_{k=1}^{c} \left( \hat{y}_k^{(n)}(\mathbf{w}) - y_k^{(n)} \right)^2 \qquad (2)$$

by estimating the derivative of E(w) with respect to w.
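A minimal NumPy sketch of Equations 1 and 2 (the function and variable names here are illustrative, not from the slides or NETLAB):

```python
import numpy as np

def ann_forward(x, w1, w2):
    """Two-layer ANN, Eq. 1: hidden f~(a) = tanh(a), linear output f(a) = a.
    x: (N, d); w1: (M, d+1) incl. bias column; w2: (c, M+1) incl. bias column."""
    xb = np.hstack([np.ones((x.shape[0], 1)), x])   # prepend bias input x_0 = 1
    h = np.tanh(xb @ w1.T)                          # hidden-layer activations
    hb = np.hstack([np.ones((h.shape[0], 1)), h])   # bias node for second layer
    return hb @ w2.T                                # y_hat, shape (N, c)

def sum_squared_error(y, y_hat):
    """Eq. 2: E(w) = sum over n and k of the squared residuals."""
    return np.sum((y_hat - y) ** 2)

# Example usage with random data and weights (N=5, d=3, M=4, c=2):
rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 3)), rng.normal(size=(5, 2))
w1, w2 = rng.normal(size=(4, 4)), rng.normal(size=(2, 5))
print(sum_squared_error(y, ann_forward(x, w1, w2)))
```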

Page 10: Gap filling using a Bayesian-regularized neural network

Bayesian regularization

In the Bayesian framework, model parameters are treated as probability distributions, and the posterior probability of the weights given the set of observed outputs D ≡ {y^(n); n = 1…N} is:

$$p(\mathbf{w} \mid D) = \frac{p(D \mid \mathbf{w})\, p(\mathbf{w})}{p(D)} \qquad (3)$$

where p(D | w) is the probability of the observations given a choice of weights (the likelihood), p(w) is a prior distribution of weight values, and the denominator is a normalization constant.

Assuming Gaussian distributions for both the likelihood and the prior, the posterior distribution is given by

$$p(\mathbf{w} \mid D) = \left( \frac{1}{Z_D} e^{-\beta E_D} \right)\left( \frac{1}{Z_W} e^{-\alpha E_W} \right) = \frac{1}{Z_S} e^{-\beta E_D - \alpha E_W} = \frac{1}{Z_S} e^{-S(\mathbf{w})} \qquad (4)$$

where Z_S is a constant and S can be rewritten as

$$S = \frac{\beta}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} \left( \hat{y}_k^{(n)}(\mathbf{w}) - y_k^{(n)} \right)^2 + \frac{\alpha}{2} \sum_{i=1}^{W} w_i^2 \qquad (5)$$

The parameters β and α represent the noise in the data and the variance of the weights, respectively. Thus, a solution to this problem is found by maximizing the posterior probability with respect to w, or minimizing the negative log of the probability. This amounts to minimizing the modified error function S(w) in Equation 5.
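A sketch of the modified objective in Equation 5, assuming the weights are collected into one flat vector (illustrative names, reusing ann_forward from the earlier sketch):

```python
import numpy as np

def regularized_objective(y, y_hat, weights, alpha, beta):
    """Eq. 5: S = (beta/2) * sum of squared residuals + (alpha/2) * sum of w_i^2."""
    data_term = 0.5 * beta * np.sum((y_hat - y) ** 2)   # beta * E_D
    prior_term = 0.5 * alpha * np.sum(weights ** 2)     # alpha * E_W
    return data_term + prior_term
```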

Page 11: Gap filling using a Bayesian-regularized neural network

Gaussian approximation to the posterior distribution

To estimate the uncertainty of the predictions, we use a Gaussian approximation to the posterior distribution for the weights and perform a Taylor series expansion around the most probable values w_MP,

$$S(\mathbf{w}) \cong S(\mathbf{w}_{MP}) + \frac{1}{2}(\mathbf{w} - \mathbf{w}_{MP})^T \mathbf{A}\, (\mathbf{w} - \mathbf{w}_{MP}), \qquad (6)$$

where A is equal to:

$$\mathbf{A} = \nabla\nabla S(\mathbf{w}_{MP}) = \nabla\nabla E_D(\mathbf{w}_{MP}) + \alpha \mathbf{I} \qquad (7)$$

This is the Hessian matrix of the error function (Equation 5), and its elements can be calculated numerically during the backpropagation. The expansion in Equation 6 allows us to rewrite the posterior distribution for the weights as:

$$p(\mathbf{w} \mid D) = \frac{1}{Z_S^*} \exp\!\left( -S(\mathbf{w}_{MP}) - \frac{1}{2}\Delta\mathbf{w}^T \mathbf{A}\, \Delta\mathbf{w} \right), \qquad (8)$$

where Δw = w − w_MP, and Z_S^* is given by

$$Z_S^*(\alpha, \beta) = (2\pi)^{W/2}\, |\mathbf{A}|^{-1/2} \exp\!\left(-S(\mathbf{w}_{MP})\right). \qquad (9)$$
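One way to obtain A numerically, as a hedged stand-in: a brute-force finite-difference Hessian of E_D plus αI. NETLAB and similar packages instead accumulate the Hessian during backpropagation; the helper below is only an illustrative sketch:

```python
import numpy as np

def hessian_fd(f, w, eps=1e-4):
    """Finite-difference Hessian of a scalar function f at weight vector w."""
    W = w.size
    H = np.zeros((W, W))
    for i in range(W):
        for j in range(W):
            e_i, e_j = np.eye(W)[i] * eps, np.eye(W)[j] * eps
            H[i, j] = (f(w + e_i + e_j) - f(w + e_i - e_j)
                       - f(w - e_i + e_j) + f(w - e_i - e_j)) / (4 * eps ** 2)
    return H

# Eq. 7 would then read (with E_D a function of the flat weight vector):
# A = hessian_fd(E_D, w_mp) + alpha * np.eye(w_mp.size)
```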

Page 12: Gap filling using a Bayesian-regularized neural network

Posterior distribution of outputs

The distribution of network outputs (and thus the uncertainty of the prediction) is estimated by assuming that the width of the posterior distribution is not extremely broad, and so the predictions are expanded as

$$\hat{y}(\mathbf{w}) = \hat{y}(\mathbf{w}_{MP}) + \mathbf{g}^T (\mathbf{w} - \mathbf{w}_{MP}) \qquad (10)$$

where

$$\mathbf{g} \equiv \nabla_{\mathbf{w}}\, \hat{y}\, \big|_{\mathbf{w}_{MP}}. \qquad (11)$$

The posterior distribution of the predictions is given by

$$p(y \mid D) = \int p(y \mid \mathbf{w})\, p(\mathbf{w} \mid D)\, d\mathbf{w} \qquad (12)$$

In the Gaussian approximation for the posterior of the weights (Equation 8), and assuming zero-mean Gaussian noise, this becomes

$$p(y \mid D) \propto \int \exp\!\left( -\frac{\beta}{2} \left[ y - \hat{y}(\mathbf{w}) \right]^2 \right) \exp\!\left( -\frac{1}{2}\Delta\mathbf{w}^T \mathbf{A}\, \Delta\mathbf{w} \right) d\mathbf{w} \qquad (13)$$

Thus, substituting Equation 10, and given Equations 7 and 9, the posterior distribution for the outputs p(y | x, D) is normal, with variance

$$\sigma_y^2 = \frac{1}{\beta} + \mathbf{g}^T \mathbf{A}^{-1} \mathbf{g} \qquad (14)$$
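A one-line sketch of Equation 14 (the gradient g from Equation 11 would come from, e.g., finite differences of the output with respect to the weights at w_MP; names are illustrative):

```python
import numpy as np

def predictive_variance(g, A, beta):
    """Eq. 14: sigma_y^2 = 1/beta + g^T A^{-1} g, for one input x."""
    # solve(A, g) computes A^{-1} g without forming the explicit inverse,
    # which is cheaper and more numerically stable.
    return 1.0 / beta + g @ np.linalg.solve(A, g)
```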

Page 13: Gap filling using a Bayesian-regularized neural network

Determining the regularization coefficients

The most likely values of the "hyperparameters" α and β can be determined in a hierarchical fashion. The posterior from Equation 3, now a joint distribution containing these additional parameters, is first approximated as

$$p(\mathbf{w} \mid D) \cong p(\mathbf{w} \mid \alpha_{MP}, \beta_{MP}, D), \qquad (15)$$

Thus we must alternately estimate α and β, then use Equation 5 to calculate the weights. Using Bayes' theorem, the posterior distribution for the hyperparameters is given by

$$p(\alpha, \beta \mid D) = \frac{p(D \mid \alpha, \beta)\, p(\alpha, \beta)}{p(D)}. \qquad (16)$$

Including the explicit dependence on the hyperparameters, the normalization of Equation 3 can be written as

$$p(D \mid \alpha, \beta) = \int p(D \mid \mathbf{w}, \alpha, \beta)\, p(\mathbf{w} \mid \alpha, \beta)\, d\mathbf{w} = \int p(D \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha)\, d\mathbf{w} \qquad (17)$$

Using the exponential formulation for the prior and likelihood, this expression is now framed in terms of the original normalizing constants:

$$p(D \mid \alpha, \beta) = \frac{1}{Z_D(\beta)}\, \frac{1}{Z_W(\alpha)} \int \exp(-S(\mathbf{w}))\, d\mathbf{w} = \frac{Z_S(\alpha, \beta)}{Z_D(\beta)\, Z_W(\alpha)} \qquad (18)$$

Page 14: Gap filling using a Bayesian-regularized neural network

Substituting the Gaussian approximation for the posterior evaluated in the neighborhood of the optimal weights w_MP (Equation 9), the function that must be minimized now is the negative log of the likelihood p(D | α, β), which is equal to

$$-\ln P(D \mid \alpha, \beta) = \alpha E_W(\mathbf{w}_{MP}) + \beta E_D(\mathbf{w}_{MP}) + \frac{1}{2}\ln|\mathbf{A}| - \frac{W}{2}\ln(\alpha) - \frac{N}{2}\ln(\beta) + \frac{N}{2}\ln(2\pi) \qquad (19)$$

Evaluating the derivative of this expression with respect to the two parameters involves calculating the eigenvalues of A, and we are left with expressions for the optimal values of α and β, given the most probable values of the weights w. Thus,

$$\alpha^{(n+1)} = \frac{\gamma^{(n)}}{2 E_W^{(n)}}; \qquad \beta^{(n+1)} = \frac{N - \gamma^{(n)}}{2 E_D^{(n)}} \qquad (20)$$

where

$$\gamma \equiv \sum_{i=1}^{W} \frac{\lambda_i}{\lambda_i + \alpha} \qquad (21)$$

In practice, the parameters and hyperparameters are solved together by alternately updating w using standard backpropagation, and updating α and β using Equations 20 and 21 (hence the index n+1 above).
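A sketch of the re-estimation step in Equations 20 and 21, assuming E_W and E_D are evaluated at the current most probable weights (all names illustrative):

```python
import numpy as np

def update_hyperparameters(A, alpha, E_W, E_D, N):
    """One evidence-framework update of alpha and beta (Eqs. 20-21)."""
    lam = np.linalg.eigvalsh(A)            # eigenvalues of the (symmetric) Hessian A
    gamma = np.sum(lam / (lam + alpha))    # Eq. 21: effective number of parameters
    alpha_new = gamma / (2.0 * E_W)        # Eq. 20
    beta_new = (N - gamma) / (2.0 * E_D)   # Eq. 20
    return alpha_new, beta_new, gamma
```

In a full run this would alternate with backpropagation weight updates until both w and (α, β) stabilize.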

Page 15: Gap filling using a Bayesian-regularized neural network

Previous Work

Hagen SC, Braswell BH, Frolking SE, Richardson A, Hollinger D, Linder E (2006) Statistical uncertainty of eddy flux based estimates of gross ecosystem carbon exchange at Howland Forest, Maine. Journal of Geophysical Research, 111.

Braswell BH, Hagen SC, Frolking SE, Salas WE (2003) A multivariable approach for mapping subpixel land cover distributions using MISR and MODIS: An application in the Brazilian Amazon. Remote Sensing of Environment, 87:243-256.

Page 16: Gap filling using a Bayesian-regularized neural network

ANN Regression for Land Cover Estimation

[Diagram: ANN mapping four spectral inputs (Band 1, Band 2, Band 3, Band 4) to three output fractions (forest, cleared, secondary); training data supplied by classified ETM+ imagery.]

Page 17: Gap filling using a Bayesian-regularized neural network

ANN Regression for Land Cover Estimation

[Figure: ETM+ observed vs. MISR predicted fractions for three classes. Forest: mean validation error = 0.045 km² (R² = 0.62); Secondary: mean validation error = 0.038 km² (R² = 0.58); Cleared: mean validation error = 0.025 km² (R² = 0.47).]

Page 18: Gap filling using a Bayesian-regularized neural network

ANN Estimation of GEE and Resp, with Monte Carlo simulation of Total Prediction uncertainty

[Diagram: climate drivers ("Clim") mapped to fluxes ("Flux").]

Page 19: Gap filling using a Bayesian-regularized neural network

Weekly GEE from Howland Forest, ME based on NEE

ANN Estimation of GEE and Resp, with Monte Carlo simulation of Total Prediction uncertainty

Page 20: Gap filling using a Bayesian-regularized neural network

Some demonstrations of the MacKay/Bishop ANN regression with 1 input and 1 output

Page 21: Gap filling using a Bayesian-regularized neural network

Noise=0.10


Page 22: Gap filling using a Bayesian-regularized neural network

Noise=0.10

Linear Regression

Page 23: Gap filling using a Bayesian-regularized neural network

Noise=0.10

ANN Regression

Page 24: Gap filling using a Bayesian-regularized neural network

Noise=0.02

ANN Regression

Page 25: Gap filling using a Bayesian-regularized neural network

Noise=0.20

ANN Regression

Page 26: Gap filling using a Bayesian-regularized neural network

Noise=0.20

ANN Regression

Page 27: Gap filling using a Bayesian-regularized neural network

Noise=0.10

ANN Regression

Page 28: Gap filling using a Bayesian-regularized neural network

Noise=0.05

ANN Regression

Page 29: Gap filling using a Bayesian-regularized neural network

Noise=0.05

ANN Regression

Page 30: Gap filling using a Bayesian-regularized neural network

Noise=0.05

ANN Regression

Page 31: Gap filling using a Bayesian-regularized neural network

Issues associated with multidimensional problems

Sufficient sampling of the input space

Data normalization (column mean zero and standard deviation one; see the sketch after this list)

Processing time

Algorithm parameter choices
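A minimal sketch of that standardization step, keeping the column statistics so filled values can be mapped back to original units (np.nanmean / np.nanstd skip the gaps):

```python
import numpy as np

def standardize(table):
    """Scale each column to mean 0, sd 1, ignoring NaN gaps."""
    mu, sd = np.nanmean(table, axis=0), np.nanstd(table, axis=0)
    return (table - mu) / sd, mu, sd

def unstandardize(z, mu, sd):
    """Map standardized values back to original units."""
    return z * sd + mu
```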

Page 32: Gap filling using a Bayesian-regularized neural network

Our gap-filling algorithm

1. Assemble meteorological and flux data in an N × d table

2. Create five additional columns for sin() and cos() of time of day and day of year, and potential PAR

3. Standardize all columns

4. First iteration: identify columns with no gaps; use these to fill all the others, one at a time (see the sketch after this list)

5. Create an additional column, NEE(t−1), the flux lagged by one time interval

6. Second iteration: remove filled points from the NEE time series, then refill with all other columns
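A minimal sketch of the first-iteration fill (steps 3–4); fit_ann and predict stand in for the Bayesian-regularized training and prediction routines and are assumptions, not the slides' actual code:

```python
import numpy as np

def fill_gaps(table, fit_ann, predict):
    """table: (N, d) array of met + flux columns, with NaN marking gaps."""
    # Step 4: identify the gap-free columns to use as predictors.
    complete = [j for j in range(table.shape[1])
                if not np.isnan(table[:, j]).any()]
    # Fill each gappy column, one at a time, from the complete columns.
    for j in range(table.shape[1]):
        gaps = np.isnan(table[:, j])
        if gaps.any():
            model = fit_ann(table[~gaps][:, complete], table[~gaps, j])
            table[gaps, j] = predict(model, table[gaps][:, complete])
    return table
```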

Page 33: Gap filling using a Bayesian-regularized neural network

Room for Improvement

1. Don't extrapolate wildly; revert to time-based filling in regions with low sampling density, especially at the beginning and end of the record

2. Carefully evaluate the sensitivity to internal settings (e.g., alpha, beta, Nnodes)

3. Stepwise analysis of the relative importance of driver variables

4. Migrate to C or another faster environment

5. Include uncertainty estimates in the output

6. At the least, clean up the code and make it available to others in the project and/or the broader community