
Distributed nonparametric regression in

multi-agent systems

Damiano Varagnolo

(joint work with Gianluigi Pillonetto and Luca Schenato)

Department of Information Engineering - University of Padova

March, 25th 2011

entirely written in LaTeX 2ε using Beamer and TikZ

Damiano Varagnolo (DEI) varagnolo@dei.unipd.it March, 25th 2011 1 / 42

introduction

Multi-agent systems: examples of applications

- Multi-Robot Coordination
- Underwater Exploration
- Smart Grids Deployment
- Wind Power Management
- Wildlife Monitoring
- Smart Highways Deployment
- Smart Surveillance Implementation


Generic and important problem

Assumption: noisy measurements of

$$ f(x, t) : \mathbb{R}^3 \times \mathbb{R} \to \mathbb{R} $$

that are
- non-uniformly sampled in space x
- non-uniformly sampled in time t
- taken by different agents

Objective: smoothing in space (x) and forecasting in time (t) the quantity f(x, t)


Example 1 - channel gains in geographical areas

source: Dall'Anese et al., 2011

x ∈ R2 : position

t : time

f (x , t) : channel gain


Example 2 - waves power extraction

source: www.graysharboroceanenergy.com

x ∈ R2 : position

t : time

f (x , t) : sea level


Example 3 - multi robot exploration

source: http://www-robotics.jpl.nasa.gov

x ∈ R2 : position

f (x) : ground level


Problems and difficulties

Information-related problems

non-uniform samplings both in time and in space

unknown dynamics of f

unknown or extremely complex correlations in time and space

Hardware-related problems

energy & computational & memory & bandwidth limitations

Framework-related problems

mobile and time varying network


State of the art

Proposed distributed solutions:
- Least Squares: Choi et al. 2009, Predd et al. 2006, Boyd et al. 2005
- Maximum Likelihood: Schizas et al. 2008, Barbarossa and Scutari 2007, Boyd et al. 2010
- Kalman Filtering: Cressie and Wikle 2002, Olfati-Saber 2007, Carli et al. 2008
- Kriging: Dall'Anese et al. 2011, Cortés 2009
- Other Learning Techniques: Nguyen et al. 2005, Bazerque et al. 2010

(successive overlay frames highlight which of these solutions are nonparametric and which address dynamic rather than static scenarios)


State of the art - Vision

Obtain

$$ f(x, t) = \Psi(\text{past measurements}, \theta) $$

being
- distributed
- capable of both smoothing and prediction

Our approach is nonparametric: Ψ(·) lives in an infinite-dimensional space (so no finite parameter vector θ is needed).


Why should we use a nonparametric approach?

Motivations

- it could be difficult or even impossible to define a parametric model (e.g. when only regularity assumptions are available)
- parametric models could involve a large number of parameters (and could require nonlinear optimization techniques)
- nonparametric approaches lead to convex optimization problems
- nonparametric estimators are consistent, i.e. f̂ → f when the number of measurements → ∞ (De Nicolao and Ferrari-Trecate, 1999)


State of the art - where we actually contribute

Our small puzzle-piece: a Regularization Network approach (Poggio and Girosi 1990) for the case where
- the agents estimate the same f
- the scenario is static (f independent of t), e.g. exploration
- there is no need to discard old measurements


Goal of the current presentation

completely self-organizing multi-agent regression strategy

Benefits of self-organization
- widest applicability
- robustness w.r.t. human error

Requirements
- simple strategy
- quantifiable performance
- automatic tuning of the parameters


framework & assumptions

Framework

Sensors:
- S in number, identical
- sample the same noisy function
- limited computational & communication capabilities
- want to collaborate
- 1 measurement per sensor (for ease of notation)


Measurement model

$$ y_i = f(x_i) + \nu_i \quad (1) $$

- f : X ⊂ R^d → R unknown, with X compact
- ν_i ⊥ x_i, zero mean and variance σ²
- x_i ~ μ i.i.d. (the sensors know μ!)
- examples of μ: uniform, jitter, generic

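To make the measurement model concrete, it can be simulated in a few lines; the particular f below (a sum of sinusoids, as in the simulation sections later in the talk) and its coefficients are illustrative assumptions, not the talk's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

S = 1000        # number of sensors, one measurement each
sigma = 0.25    # noise standard deviation, as in the first simulation section

def f(x):
    # hypothetical unknown map: a sum of sinusoids (illustrative choice)
    return np.sin(2 * np.pi * x) + 0.5 * np.sin(6 * np.pi * x)

x = rng.uniform(0.0, 1.0, S)            # x_i ~ mu = U[0, 1], i.i.d.
nu = sigma * rng.standard_normal(S)     # nu_i: zero mean, variance sigma^2
y = f(x) + nu                           # y_i = f(x_i) + nu_i, model (1)
```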

distributed estimators

Regularization Networks

Minimize the functional

$$ Q(f) = \sum_{i=1}^{S} \big(y_i - f(x_i)\big)^2 + \gamma \, \|f\|_K^2 $$

where f lives in an infinite-dimensional space, γ is the regularization factor, and K : X × X → R is a Mercer kernel.

Centralized optimal solution:

$$ f_c = \sum_{i=1}^{S} c_i K(x_i, \cdot) $$

$$ \begin{bmatrix} c_1 \\ \vdots \\ c_S \end{bmatrix} = \left( \begin{bmatrix} K(x_1, x_1) & \cdots & K(x_1, x_S) \\ \vdots & \ddots & \vdots \\ K(x_S, x_1) & \cdots & K(x_S, x_S) \end{bmatrix} + \gamma I \right)^{-1} \begin{bmatrix} y_1 \\ \vdots \\ y_S \end{bmatrix} $$


Example - cubic splines smoothing: effects of the variation of γ for a given K

[figure: estimates f(x) over x for high γ and low γ]


Drawbacks

Recalling the centralized solution $f_c = \sum_{i=1}^{S} c_i K(x_i, \cdot)$, whose coefficients require inverting the S × S matrix K + γI:

- computational cost: O(S³) (inversion of an S × S matrix)
- transmission cost: O(S) (requires knowledge of the whole {x_i, y_i}, i = 1, ..., S)

⇒ need to find alternative solutions

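The centralized solution is just a kernel ridge regression, and its O(S³) cost sits in a single dense linear solve. A minimal numpy sketch, where the Gaussian kernel, its width, and the value of γ are assumptions made for illustration:

```python
import numpy as np

def fit_regularization_network(x, y, kernel, gamma):
    """Centralized solution: solve (K + gamma I) c = y, an O(S^3) operation,
    and return f_c(.) = sum_i c_i K(x_i, .)."""
    K = kernel(x[:, None], x[None, :])                  # S x S Gram matrix
    c = np.linalg.solve(K + gamma * np.eye(len(x)), y)
    return lambda xq: kernel(np.asarray(xq)[:, None], x[None, :]) @ c

# assumed Gaussian kernel with width 0.05 (the slides only require a Mercer kernel)
gauss = lambda a, b: np.exp(-((a - b) ** 2) / (2 * 0.05 ** 2))

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.25 * rng.standard_normal(200)

f_c = fit_regularization_network(x, y, gauss, gamma=1.0)
```

Both the O(S³) solve and the need to gather every pair (x_i, y_i) at one node motivate the alternatives that follow.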

Alternative centralized optimal solution (1 of 2)

1. Use the kernel eigen-expansion

$$ K(x_1, x_2) = \sum_{e=1}^{+\infty} \lambda_e \, \phi_e(x_1) \, \phi_e(x_2) \quad (2) $$

with λ_e the eigenvalues and φ_e the eigenfunctions.

2. Rewrite the measurement model:

$$ y_i = C_i b + \nu_i, \qquad C_i := [\phi_1(x_i), \phi_2(x_i), \ldots], \quad b := [b_1, b_2, \ldots]^T \quad (3) $$

3. Rewrite the cost function:

$$ Q(b) = \sum_{i=1}^{S} (y_i - C_i b)^2 + \gamma \sum_{e=1}^{+\infty} \frac{b_e^2}{\lambda_e} \quad (4) $$


Alternative centralized optimal solution (2 of 2)

$$ b_c = \left( \frac{1}{S}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + \frac{1}{S} \sum_{i=1}^{S} C_i^T C_i \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} C_i^T y_i \right) \quad (5) $$

This expression involves infinite-dimensional objects (infinite matrices and vectors) ⇒ it cannot be computed exactly.


Hypothesis space reduction

Connection with PCA: span⟨φ_1, ..., φ_E⟩ is the E-dimensional subspace that captures the biggest part of the energy of the estimate (E ≪ S).

Implication: new measurement model

$$ y_i = C_i^E b + \nu_i, \qquad C_i^E := [\phi_1(x_i), \ldots, \phi_E(x_i)], \quad b := [b_1, \ldots, b_E]^T $$

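The reduced model relies on the eigenfunctions being orthonormal with respect to μ. A quick numerical check under an assumed setting, with μ = U[0, 1] and a cosine basis standing in for the kernel eigenfunctions φ_e:

```python
import numpy as np

def features(x, E):
    """Rows are C_i^E = [phi_1(x_i), ..., phi_E(x_i)] for an assumed basis
    orthonormal w.r.t. mu = U[0, 1]:
    phi_1(x) = 1, phi_e(x) = sqrt(2) cos((e - 1) pi x) for e >= 2."""
    cols = [np.ones_like(x)]
    cols += [np.sqrt(2.0) * np.cos(e * np.pi * x) for e in range(1, E)]
    return np.stack(cols, axis=1)                   # S x E matrix

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 5000)                     # x_i ~ mu, i.i.d.
C = features(x, E=6)

# orthonormality w.r.t. mu makes (1/S) sum_i (C_i^E)^T C_i^E close to I
M = C.T @ C / len(x)
```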

Suboptimal finite-dimensional solution

New estimator:

$$ b_r = \left( \frac{1}{S}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T C_i^E \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T y_i \right) $$

- computable (involves E × E matrices and E-dimensional vectors)
- minimizes $Q_E(b) := \sum_{i=1}^{S} (y_i - C_i^E b)^2 + \gamma \sum_{e=1}^{E} \frac{b_e^2}{\lambda_e}$

Drawbacks:
1. O(E³) computational effort
2. O(E²) transmission effort
3. must know S

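As a sanity check, b_r can be computed from its closed form and verified to be the minimizer of Q_E, since the gradient of Q_E must vanish at b_r. The eigenvalues λ_e and the orthonormal basis below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
S, E, gamma = 400, 5, 0.1
lam = 0.8 ** np.arange(E)                 # assumed kernel eigenvalues lambda_e

# assumed basis orthonormal w.r.t. mu = U[0, 1], standing in for the phi_e
x = rng.uniform(0.0, 1.0, S)
C = np.stack([np.ones(S)] + [np.sqrt(2.0) * np.cos(e * np.pi * x)
                             for e in range(1, E)], axis=1)
y = np.sin(2 * np.pi * x) + 0.25 * rng.standard_normal(S)

# b_r = ( (1/S) diag(gamma/lambda_e) + (1/S) C^T C )^{-1} ( (1/S) C^T y )
A = np.diag(gamma / lam) / S + C.T @ C / S
b_r = np.linalg.solve(A, C.T @ y / S)

# gradient of Q_E at b_r: -2 C^T (y - C b) + 2 gamma b / lambda, should be ~ 0
grad = -2.0 * C.T @ (y - C @ b_r) + 2.0 * gamma * b_r / lam
```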

Derivation of the distributed estimator

Starting from

$$ b_r = \left( \frac{1}{S}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T C_i^E \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T y_i \right) $$

consider the approximations
- S → S_g (a guess, to be tuned later on)
- $\frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T C_i^E \;\to\; \mathbb{E}\left[(C_i^E)^T C_i^E\right] = I$ (the φ_e are orthonormal w.r.t. μ)


Derivation of the distributed estimator

obtain: bd =

(1

Sgdiag

λe

)+ I

)−1(1

S

S∑i=1

(CEi

)Tyi

)

Advantages

1 O (E ) computational e�ort

2 O (E ) transmission e�ort

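The distributed estimator is attractive because the only network-wide quantity is an E-dimensional average, which standard average-consensus protocols can provide, and the remaining algebra is diagonal, hence O(E). A sketch under the same assumed basis and eigenvalues as before:

```python
import numpy as np

rng = np.random.default_rng(4)
S, E, gamma = 400, 5, 0.1
S_g = 380                                  # guessed number of sensors
lam = 0.8 ** np.arange(E)                  # assumed eigenvalues lambda_e

x = rng.uniform(0.0, 1.0, S)
C = np.stack([np.ones(S)] + [np.sqrt(2.0) * np.cos(e * np.pi * x)
                             for e in range(1, E)], axis=1)
y = np.sin(2 * np.pi * x) + 0.25 * rng.standard_normal(S)

# the only global quantity: (1/S) sum_i (C_i^E)^T y_i, an E-vector;
# in a real network this average would come from a consensus protocol
avg = (C * y[:, None]).mean(axis=0)

# b_d = ( (1/S_g) diag(gamma/lambda_e) + I )^{-1} avg: diagonal, so O(E)
b_d = avg / (gamma / (S_g * lam) + 1.0)
```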

Summary of proposed estimation schemes

$$ b_c = \left( \frac{1}{S}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + \frac{1}{S} \sum_{i=1}^{S} C_i^T C_i \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} C_i^T y_i \right) $$

$$ b_r = \left( \frac{1}{S}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T C_i^E \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T y_i \right) $$

$$ b_d = \left( \frac{1}{S_g}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + I \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T y_i \right) $$


Summary of proposed estimation schemes

- original hypothesis space: b_c requires O(S³) computation and O(S) transmission (for f_c)
- reduced hypothesis space: b_r requires O(E³) computation and O(E²) transmission, and uses S
- reduced hypothesis space: b_d requires O(E) computation and O(E) transmission, and uses S_g


Open problems

quantification of performance

how to tune E and Sg


bounds on errors

Quantification of performance: first results

Recalling the expressions of b_c, b_r and b_d, it holds that

$$ \|b_c - b_d\|_2 \;\le\; \alpha \, \|b_r - b_d\|_2 \;+\; \frac{1}{S} \sum_{i=1}^{S} |r_i| $$

where the factor α can be computed offline and the last term is the average of the local residuals r_i.


Quantification of performance

The term ‖b_r − b_d‖₂ in

$$ \|b_c - b_d\|_2 \;\le\; \alpha \, \|b_r - b_d\|_2 + |r_{\mathrm{ave}}| $$

is not distributedly computable; it can be bounded as

$$ \|b_r - b_d\|_2 \;\le\; \|U_S b_d\|_2 + \|U_C b_d\|_2 $$

with $U_S \propto \frac{1}{S_{\min}} - \frac{1}{S_{\max}}$ and $U_C \propto I - \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T C_i^E$.


Distributed Monte Carlo Estimation (assumption: b_d and |r_ave| already computed)

$$ \|b_c - b_d\|_2 \;\le\; |r_{\mathrm{ave}}| + \alpha \, \|U_S b_d\|_2 + \alpha \, \|U_C b_d\|_2 $$

1. (locally) generate a sample of ‖U_C b_d‖₂
2. (distributedly) estimate E[‖U_C b_d‖₂] and var(‖U_C b_d‖₂)
3. use Cantelli's inequality

$$ \mathbb{P}\big[\star \le \mathbb{E}[\star] + k \cdot \mathrm{stdev}(\star)\big] \;\ge\; 1 - \frac{1}{1 + k^2} $$

and obtain

$$ \|b_c - b_d\|_2 \;\le\; |r_{\mathrm{ave}}| + \alpha \, \|U_S b_d\|_2 + \alpha \left( \mathbb{E}\big[\|U_C b_d\|_2\big] + k \cdot \mathrm{stdev}\big(\|U_C b_d\|_2\big) \right) $$

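Step 3 relies only on Cantelli's one-sided inequality, which holds for any distribution with finite variance. A quick Monte Carlo illustration with an arbitrary skewed distribution standing in for ‖U_C b_d‖₂:

```python
import numpy as np

rng = np.random.default_rng(5)
k = 3.0

# stand-in samples for the Monte Carlo draws of ||U_C b_d||_2;
# any finite-variance distribution works, here a skewed exponential
samples = rng.exponential(scale=1.0, size=100_000)

mu, sd = samples.mean(), samples.std()
coverage = np.mean(samples <= mu + k * sd)     # empirical P[X <= E[X] + k stdev]
cantelli = 1.0 - 1.0 / (1.0 + k ** 2)          # Cantelli lower bound, 0.9 for k = 3
```

The empirical coverage always dominates the Cantelli bound, which is why k = 3 yields the 0.9 confidence level quoted on the next slides.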

Description of the simulations

- f : R → R
- μ = U[0, 1]
- composition of f: sum of sinusoids
- K: Gaussian kernel
- measurement noise standard deviation: 0.25 (average SNR ≈ 10)


Regression strategy effectiveness example (S = 1000, E = 40, motivated later)

[figure: the true f and the estimates obtained from b_c and b_d, plotted for x ∈ [0, 1]]


Examples of the effectiveness of the bounds (S = 1000, E = 40, S_min = 900, S_max = 1100, 100 Monte Carlo runs)

[figure: computed bound versus ‖b_c − b_d‖₂ / ‖b_d‖₂, for k = 0 (confidence level 0) and k = 3 (confidence level 0.9)]


Swept under the carpet, i.e. (some of) the things we should have proved:

- how to compute the bound
- why the estimates of E[‖U_C b_d‖₂] and stdev(‖U_C b_d‖₂) are good estimates
- why we can use Cantelli's inequality


auto-tuning procedures

The need for automatic tuning procedures

$$ b_d(S_g, E) = \left( \frac{1}{S_g}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + I \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T y_i \right) $$

Here S_g and E are to be distributedly computed, while the remaining quantities can be considered fixed.


Tuning under centralized scenarios

General aim

maximize the generalization capabilities of the estimators

Classical approaches

cross validation

maximum (marginal) likelihood

. . .

Peculiarities of our distributed framework

no test sets available

involves unknown quantities

Damiano Varagnolo (DEI) varagnolo@dei.unipd.it March, 25th 2011 35 / 42

auto-tuning procedures

Proposed solution - key ideas

Parameters to be estimated:
- the number of eigenfunctions E: chosen to assure that ‖b_c − b_r(E)‖ is sufficiently small
- the number of sensors S_g: chosen to minimize the bound on ‖b_c − b_d(S_g, E)‖


results

Description of the simulations

- f : R → R
- μ = U[0, 1]
- composition of f: sum of sinusoids
- K: Gaussian kernel
- measurement noise standard deviation: 0.75 (implies an average SNR of ≈ 2.5)


Regression strategy effectiveness example (S = 100, E = 20, S_min = 90, S_max = 110)

[figure: the estimates obtained from b_c and b_d, plotted for x ∈ [0, 1]]


Regression strategy effectiveness example (S = 100, E = 20, S_min = 20, S_max = 2000)

[figure: the estimates obtained from b_c and b_d, plotted for x ∈ [0, 1]]


Comparisons with naïve and oracle strategies (S = 100, E = 20, S_min = 20, S_max = 2000; naïve strategy: S_g = S_ave)

[figure: scatter plots of ‖b_d^ours − b_c‖₂ against ‖b_d^naïve − b_c‖₂ and against ‖b_d^oracle − b_c‖₂]


conclusions

Conclusions and future work

Conclusions: the strategy is
- effective and easy to implement
- endowed with tight performance bounds
- equipped with self-tuning capabilities

Future work:
- exploit statistical knowledge about S
- incorporate the effects of a finite number of steps in consensus algorithms
- extend to dynamic scenarios (long-term objective)


Distributed nonparametric regression in multi-agent systems

damiano.varagnolo@dei.unipd.it
www.dei.unipd.it/~varagnolo/
google: damiano varagnolo

licensed under the Creative Commons BY-NC-SA 2.5 Italy License

References

Barbarossa, S. and Scutari, G. (2007). Decentralized maximum-likelihood estimation for sensor networks composed of nonlinearly coupled dynamical systems. IEEE Transactions on Signal Processing, 55(7):3456–3470.

Bazerque, J., Mateos, G., and Giannakis, G. (2010). Distributed lasso for in-network linear regression. In IEEE International Conference on Acoustics, Speech and Signal Processing.

Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2010). Distributed optimization and statistical learning via the alternating direction method of multipliers. Technical report, Stanford University.

Carli, R., Chiuso, A., Schenato, L., and Zampieri, S. (2008). Distributed Kalman filtering based on consensus strategies. IEEE Journal on Selected Areas in Communications, 26:622–633.

Choi, J., Oh, S., and Horowitz, R. (2009). Distributed learning and cooperative control for multi-agent systems. Automatica, 45(12):2802–2814.

Cortés, J. (2009). Distributed Kriged Kalman filter for spatial estimation. IEEE Transactions on Automatic Control, 54(12):2816–2827.

Cressie, N. and Wikle, C. K. (2002). Space-time Kalman filter. Encyclopedia of Environmetrics, 4:2045–2049.

Dall'Anese, E., Kim, S.-J., and Giannakis, G. B. (2011). Channel gain map tracking via distributed kriging. IEEE Transactions on Vehicular Technology.

De Nicolao, G. and Ferrari-Trecate, G. (1999). Consistent identification of NARX models via Regularization Networks. IEEE Transactions on Automatic Control, 44(11):2045–2049.

Nguyen, X., Wainwright, M. J., and Jordan, M. I. (2005). Nonparametric decentralized detection using kernel methods. IEEE Transactions on Signal Processing, 53(11):4053–4066.

Olfati-Saber, R. (2007). Distributed Kalman filtering for sensor networks. In IEEE Conference on Decision and Control.

Predd, J. B., Kulkarni, S. R., and Poor, H. V. (2006). Distributed learning in wireless sensor networks. IEEE Signal Processing Magazine, 23(4):56–69.

Schizas, I. D., Ribeiro, A., and Giannakis, G. B. (2008). Consensus in ad hoc WSNs with noisy links - part I: Distributed estimation of deterministic signals. IEEE Transactions on Signal Processing, 56(1):350–364.

Varagnolo, D., Pillonetto, G., and Schenato, L. (2011a). Auto-tuning procedures for distributed nonparametric regression algorithms. IEEE Conference on Decision and Control, submitted.

Varagnolo, D., Pillonetto, G., and Schenato, L. (2011b). Distributed parametric and nonparametric regression with on-line performance bounds computation. Automatica, submitted.