
Distributed nonparametric regression in

multi-agent systems

Damiano Varagnolo

(joint work with Gianluigi Pillonetto and Luca Schenato)

Department of Information Engineering - University of Padova

March, 25th 2011

entirely written in LaTeX 2ε using Beamer and TikZ

Damiano Varagnolo (DEI) varagnolo@dei.unipd.it March, 25th 2011 1 / 42

introduction

Multi-agent systems: examples of applications

- Multi-Robot Coordination
- Underwater Exploration
- Smart Grids Deployment
- Wind Power Management
- Wildlife Monitoring
- Smart Highways Deployment
- Smart Surveillance Implementation


Generic and important problem

Assumption: noisy measurements of

$$ f(x, t) : \mathbb{R}^3 \times \mathbb{R} \to \mathbb{R} $$

that are
- non-uniformly sampled in space x
- non-uniformly sampled in time t
- taken by different agents

Objective: smoothing in space (x) and forecasting in time (t) the quantity f(x, t)


Example 1 - channel gains in geographical areas

source: Dall'Anese et al., 2011

x ∈ R2 : position

t : time

f (x , t) : channel gain


Example 2 - waves power extraction

source: www.graysharboroceanenergy.com

x ∈ R2 : position

t : time

f (x , t) : sea level


Example 3 - multi robot exploration

source: http://www-robotics.jpl.nasa.gov

x ∈ R2 : position

f (x) : ground level


Problems and difficulties

Information-related problems

non-uniform samplings both in time and in space

unknown dynamics of f

unknown or extremely complex correlations in time and space

Hardware-related problems

energy & computational & memory & bandwidth limitations

Framework-related problems

mobile and time varying network


State of the art

Proposed distributed solutions:
- Least Squares: Choi et al. 2009, Predd et al. 2006, Boyd et al. 2005
- Maximum Likelihood: Schizas et al. 2008, Barbarossa and Scutari 2007, Boyd et al. 2010
- Kalman Filtering: Cressie and Wikle 2002, Olfati-Saber 2007, Carli et al. 2008
- Kriging: Dall'Anese et al. 2011, Cortés 2009
- Other Learning Techniques: Nguyen et al. 2005, Bazerque et al. 2010

(successive overlay frames highlight which of these solutions are nonparametric and which address dynamic rather than static scenarios)


State of the art - Vision

Obtain

$$ f(x, t) = \Psi(\text{past measurements}, \theta) $$

being
- distributed
- capable of both smoothing and prediction

Our approach is nonparametric: Ψ(·) lives in an infinite-dimensional space (so no finite parameter vector θ is needed).


Why should we use a nonparametric approach?

Motivations

- it could be difficult or even impossible to define a parametric model (e.g. when only regularity assumptions are available)
- parametric models could involve a large number of parameters (and could require nonlinear optimization techniques)
- nonparametric approaches lead to convex optimization problems
- nonparametric estimators are consistent, i.e. f̂ → f when the number of measurements → ∞ (De Nicolao and Ferrari-Trecate, 1999)


State of the art - where we actually contribute

Our small puzzle-piece: a Regularization Network approach (Poggio and Girosi 1990) for the case where
- the agents estimate the same f
- the scenario is static (f independent of t), e.g. exploration
- there is no need to discard old measurements


Goal of the current presentation

completely self-organizing multi-agent regression strategy

Benefits of self-organization
- widest applicability
- robustness w.r.t. human error

Requirements
- simple strategy
- quantifiable performance
- automatic tuning of the parameters


framework & assumptions

Framework

Sensors:
- S in number, identical
- sample the same noisy function
- limited computational & communication capabilities
- want to collaborate
- 1 measurement per sensor (for ease of notation)


Measurement model

$$ y_i = f(x_i) + \nu_i \quad (1) $$

- f : X ⊂ R^d → R unknown, with X compact
- ν_i ⊥ x_i, zero mean and variance σ²
- x_i ~ μ i.i.d. (the sensors know μ!)
- examples of μ: uniform, jitter, generic

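To make the measurement model concrete, it can be simulated in a few lines; the particular f below (a sum of sinusoids, as in the simulation sections later in the talk) and its coefficients are illustrative assumptions, not the talk's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

S = 1000        # number of sensors, one measurement each
sigma = 0.25    # noise standard deviation, as in the first simulation section

def f(x):
    # hypothetical unknown map: a sum of sinusoids (illustrative choice)
    return np.sin(2 * np.pi * x) + 0.5 * np.sin(6 * np.pi * x)

x = rng.uniform(0.0, 1.0, S)            # x_i ~ mu = U[0, 1], i.i.d.
nu = sigma * rng.standard_normal(S)     # nu_i: zero mean, variance sigma^2
y = f(x) + nu                           # y_i = f(x_i) + nu_i, model (1)
```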

distributed estimators

Regularization Networks

Minimize the functional

$$ Q(f) = \sum_{i=1}^{S} \big(y_i - f(x_i)\big)^2 + \gamma \, \|f\|_K^2 $$

where f lives in an infinite-dimensional space, γ is the regularization factor, and K : X × X → R is a Mercer kernel.

Centralized optimal solution:

$$ f_c = \sum_{i=1}^{S} c_i K(x_i, \cdot) $$

$$ \begin{bmatrix} c_1 \\ \vdots \\ c_S \end{bmatrix} = \left( \begin{bmatrix} K(x_1, x_1) & \cdots & K(x_1, x_S) \\ \vdots & \ddots & \vdots \\ K(x_S, x_1) & \cdots & K(x_S, x_S) \end{bmatrix} + \gamma I \right)^{-1} \begin{bmatrix} y_1 \\ \vdots \\ y_S \end{bmatrix} $$


Example - cubic splines smoothing: effects of the variation of γ for a given K

[figure: estimates f(x) over x for high γ and low γ]


Drawbacks

Recalling the centralized solution $f_c = \sum_{i=1}^{S} c_i K(x_i, \cdot)$, whose coefficients require inverting the S × S matrix K + γI:

- computational cost: O(S³) (inversion of an S × S matrix)
- transmission cost: O(S) (requires knowledge of the whole {x_i, y_i}, i = 1, ..., S)

⇒ need to find alternative solutions

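The centralized solution is just a kernel ridge regression, and its O(S³) cost sits in a single dense linear solve. A minimal numpy sketch, where the Gaussian kernel, its width, and the value of γ are assumptions made for illustration:

```python
import numpy as np

def fit_regularization_network(x, y, kernel, gamma):
    """Centralized solution: solve (K + gamma I) c = y, an O(S^3) operation,
    and return f_c(.) = sum_i c_i K(x_i, .)."""
    K = kernel(x[:, None], x[None, :])                  # S x S Gram matrix
    c = np.linalg.solve(K + gamma * np.eye(len(x)), y)
    return lambda xq: kernel(np.asarray(xq)[:, None], x[None, :]) @ c

# assumed Gaussian kernel with width 0.05 (the slides only require a Mercer kernel)
gauss = lambda a, b: np.exp(-((a - b) ** 2) / (2 * 0.05 ** 2))

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.25 * rng.standard_normal(200)

f_c = fit_regularization_network(x, y, gauss, gamma=1.0)
```

Both the O(S³) solve and the need to gather every pair (x_i, y_i) at one node motivate the alternatives that follow.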

Alternative centralized optimal solution (1 of 2)

1. Use the kernel eigen-expansion

$$ K(x_1, x_2) = \sum_{e=1}^{+\infty} \lambda_e \, \phi_e(x_1) \, \phi_e(x_2) \quad (2) $$

with λ_e the eigenvalues and φ_e the eigenfunctions.

2. Rewrite the measurement model:

$$ y_i = C_i b + \nu_i, \qquad C_i := [\phi_1(x_i), \phi_2(x_i), \ldots], \quad b := [b_1, b_2, \ldots]^T \quad (3) $$

3. Rewrite the cost function:

$$ Q(b) = \sum_{i=1}^{S} (y_i - C_i b)^2 + \gamma \sum_{e=1}^{+\infty} \frac{b_e^2}{\lambda_e} \quad (4) $$


Alternative centralized optimal solution (2 of 2)

$$ b_c = \left( \frac{1}{S}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + \frac{1}{S} \sum_{i=1}^{S} C_i^T C_i \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} C_i^T y_i \right) \quad (5) $$

This expression involves infinite-dimensional objects (infinite matrices and vectors) ⇒ it cannot be computed exactly.


Hypothesis space reduction

Connection with PCA: span⟨φ_1, ..., φ_E⟩ is the E-dimensional subspace that captures the biggest part of the energy of the estimate (E ≪ S).

Implication: new measurement model

$$ y_i = C_i^E b + \nu_i, \qquad C_i^E := [\phi_1(x_i), \ldots, \phi_E(x_i)], \quad b := [b_1, \ldots, b_E]^T $$

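The reduced model relies on the eigenfunctions being orthonormal with respect to μ. A quick numerical check under an assumed setting, with μ = U[0, 1] and a cosine basis standing in for the kernel eigenfunctions φ_e:

```python
import numpy as np

def features(x, E):
    """Rows are C_i^E = [phi_1(x_i), ..., phi_E(x_i)] for an assumed basis
    orthonormal w.r.t. mu = U[0, 1]:
    phi_1(x) = 1, phi_e(x) = sqrt(2) cos((e - 1) pi x) for e >= 2."""
    cols = [np.ones_like(x)]
    cols += [np.sqrt(2.0) * np.cos(e * np.pi * x) for e in range(1, E)]
    return np.stack(cols, axis=1)                   # S x E matrix

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 5000)                     # x_i ~ mu, i.i.d.
C = features(x, E=6)

# orthonormality w.r.t. mu makes (1/S) sum_i (C_i^E)^T C_i^E close to I
M = C.T @ C / len(x)
```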

Suboptimal finite-dimensional solution

New estimator:

$$ b_r = \left( \frac{1}{S}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T C_i^E \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T y_i \right) $$

- computable (involves E × E matrices and E-dimensional vectors)
- minimizes $Q_E(b) := \sum_{i=1}^{S} (y_i - C_i^E b)^2 + \gamma \sum_{e=1}^{E} \frac{b_e^2}{\lambda_e}$

Drawbacks:
1. O(E³) computational effort
2. O(E²) transmission effort
3. must know S

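As a sanity check, b_r can be computed from its closed form and verified to be the minimizer of Q_E, since the gradient of Q_E must vanish at b_r. The eigenvalues λ_e and the orthonormal basis below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
S, E, gamma = 400, 5, 0.1
lam = 0.8 ** np.arange(E)                 # assumed kernel eigenvalues lambda_e

# assumed basis orthonormal w.r.t. mu = U[0, 1], standing in for the phi_e
x = rng.uniform(0.0, 1.0, S)
C = np.stack([np.ones(S)] + [np.sqrt(2.0) * np.cos(e * np.pi * x)
                             for e in range(1, E)], axis=1)
y = np.sin(2 * np.pi * x) + 0.25 * rng.standard_normal(S)

# b_r = ( (1/S) diag(gamma/lambda_e) + (1/S) C^T C )^{-1} ( (1/S) C^T y )
A = np.diag(gamma / lam) / S + C.T @ C / S
b_r = np.linalg.solve(A, C.T @ y / S)

# gradient of Q_E at b_r: -2 C^T (y - C b) + 2 gamma b / lambda, should be ~ 0
grad = -2.0 * C.T @ (y - C @ b_r) + 2.0 * gamma * b_r / lam
```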

Derivation of the distributed estimator

Starting from

$$ b_r = \left( \frac{1}{S}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T C_i^E \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T y_i \right) $$

consider the approximations
- S → S_g (a guess, to be tuned later on)
- $\frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T C_i^E \;\to\; \mathbb{E}\left[(C_i^E)^T C_i^E\right] = I$ (the φ_e are orthonormal w.r.t. μ)


Derivation of the distributed estimator

obtain: bd =

(1

Sgdiag

λe

)+ I

)−1(1

S

S∑i=1

(CEi

)Tyi

)

Advantages

1 O (E ) computational e�ort

2 O (E ) transmission e�ort

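The distributed estimator is attractive because the only network-wide quantity is an E-dimensional average, which standard average-consensus protocols can provide, and the remaining algebra is diagonal, hence O(E). A sketch under the same assumed basis and eigenvalues as before:

```python
import numpy as np

rng = np.random.default_rng(4)
S, E, gamma = 400, 5, 0.1
S_g = 380                                  # guessed number of sensors
lam = 0.8 ** np.arange(E)                  # assumed eigenvalues lambda_e

x = rng.uniform(0.0, 1.0, S)
C = np.stack([np.ones(S)] + [np.sqrt(2.0) * np.cos(e * np.pi * x)
                             for e in range(1, E)], axis=1)
y = np.sin(2 * np.pi * x) + 0.25 * rng.standard_normal(S)

# the only global quantity: (1/S) sum_i (C_i^E)^T y_i, an E-vector;
# in a real network this average would come from a consensus protocol
avg = (C * y[:, None]).mean(axis=0)

# b_d = ( (1/S_g) diag(gamma/lambda_e) + I )^{-1} avg: diagonal, so O(E)
b_d = avg / (gamma / (S_g * lam) + 1.0)
```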

Summary of proposed estimation schemes

$$ b_c = \left( \frac{1}{S}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + \frac{1}{S} \sum_{i=1}^{S} C_i^T C_i \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} C_i^T y_i \right) $$

$$ b_r = \left( \frac{1}{S}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T C_i^E \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T y_i \right) $$

$$ b_d = \left( \frac{1}{S_g}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + I \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T y_i \right) $$


Summary of proposed estimation schemes

- original hypothesis space: b_c requires O(S³) computation and O(S) transmission (for f_c)
- reduced hypothesis space: b_r requires O(E³) computation and O(E²) transmission, and uses S
- reduced hypothesis space: b_d requires O(E) computation and O(E) transmission, and uses S_g


Open problems

quantification of performance

how to tune E and Sg


bounds on errors

Quantification of performance: first results

Recalling the expressions of b_c, b_r and b_d, it holds that

$$ \|b_c - b_d\|_2 \;\le\; \alpha \, \|b_r - b_d\|_2 \;+\; \frac{1}{S} \sum_{i=1}^{S} |r_i| $$

where the factor α can be computed offline and the last term is the average of the local residuals r_i.


Quantification of performance

The term ‖b_r − b_d‖₂ in

$$ \|b_c - b_d\|_2 \;\le\; \alpha \, \|b_r - b_d\|_2 + |r_{\mathrm{ave}}| $$

is not distributedly computable; it can be bounded as

$$ \|b_r - b_d\|_2 \;\le\; \|U_S b_d\|_2 + \|U_C b_d\|_2 $$

with $U_S \propto \frac{1}{S_{\min}} - \frac{1}{S_{\max}}$ and $U_C \propto I - \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T C_i^E$.


Distributed Monte Carlo Estimation (assumption: b_d and |r_ave| already computed)

$$ \|b_c - b_d\|_2 \;\le\; |r_{\mathrm{ave}}| + \alpha \, \|U_S b_d\|_2 + \alpha \, \|U_C b_d\|_2 $$

1. (locally) generate a sample of ‖U_C b_d‖₂
2. (distributedly) estimate E[‖U_C b_d‖₂] and var(‖U_C b_d‖₂)
3. use Cantelli's inequality

$$ \mathbb{P}\big[\star \le \mathbb{E}[\star] + k \cdot \mathrm{stdev}(\star)\big] \;\ge\; 1 - \frac{1}{1 + k^2} $$

and obtain

$$ \|b_c - b_d\|_2 \;\le\; |r_{\mathrm{ave}}| + \alpha \, \|U_S b_d\|_2 + \alpha \left( \mathbb{E}\big[\|U_C b_d\|_2\big] + k \cdot \mathrm{stdev}\big(\|U_C b_d\|_2\big) \right) $$

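Step 3 relies only on Cantelli's one-sided inequality, which holds for any distribution with finite variance. A quick Monte Carlo illustration with an arbitrary skewed distribution standing in for ‖U_C b_d‖₂:

```python
import numpy as np

rng = np.random.default_rng(5)
k = 3.0

# stand-in samples for the Monte Carlo draws of ||U_C b_d||_2;
# any finite-variance distribution works, here a skewed exponential
samples = rng.exponential(scale=1.0, size=100_000)

mu, sd = samples.mean(), samples.std()
coverage = np.mean(samples <= mu + k * sd)     # empirical P[X <= E[X] + k stdev]
cantelli = 1.0 - 1.0 / (1.0 + k ** 2)          # Cantelli lower bound, 0.9 for k = 3
```

The empirical coverage always dominates the Cantelli bound, which is why k = 3 yields the 0.9 confidence level quoted on the next slides.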

Description of the simulations

- f : R → R
- μ = U[0, 1]
- composition of f: sum of sinusoids
- K: Gaussian kernel
- measurement noise standard deviation: 0.25 (average SNR ≈ 10)


Regression strategy effectiveness example (S = 1000, E = 40, motivated later)

[figure: the true f and the estimates obtained from b_c and b_d, plotted for x ∈ [0, 1]]


Examples of the effectiveness of the bounds (S = 1000, E = 40, S_min = 900, S_max = 1100, 100 Monte Carlo runs)

[figure: computed bound versus ‖b_c − b_d‖₂ / ‖b_d‖₂, for k = 0 (confidence level 0) and k = 3 (confidence level 0.9)]


Swept under the carpet, i.e. (some of) the things we should have proved:

- how to compute the bound
- why the estimates of E[‖U_C b_d‖₂] and stdev(‖U_C b_d‖₂) are good estimates
- why we can use Cantelli's inequality


auto-tuning procedures

The need for automatic tuning procedures

$$ b_d(S_g, E) = \left( \frac{1}{S_g}\,\mathrm{diag}\!\left(\frac{\gamma}{\lambda_e}\right) + I \right)^{-1} \left( \frac{1}{S} \sum_{i=1}^{S} (C_i^E)^T y_i \right) $$

Here S_g and E are to be distributedly computed, while the remaining quantities can be considered fixed.


Tuning under centralized scenarios

General aim

maximize the generalization capabilities of the estimators

Classical approaches

cross validation

maximum (marginal) likelihood

. . .

Peculiarities of our distributed framework

no test sets available

involves unknown quantities

Damiano Varagnolo (DEI) varagnolo@dei.unipd.it March, 25th 2011 35 / 42

auto-tuning procedures

Proposed solution - key ideas

Parameters to be estimated:
- the number of eigenfunctions E: chosen to assure that ‖b_c − b_r(E)‖ is sufficiently small
- the number of sensors S_g: chosen to minimize the bound on ‖b_c − b_d(S_g, E)‖


results

Description of the simulations

- f : R → R
- μ = U[0, 1]
- composition of f: sum of sinusoids
- K: Gaussian kernel
- measurement noise standard deviation: 0.75 (implies an average SNR of ≈ 2.5)


Regression strategy effectiveness example (S = 100, E = 20, S_min = 90, S_max = 110)

[figure: the estimates obtained from b_c and b_d, plotted for x ∈ [0, 1]]


Regression strategy effectiveness example (S = 100, E = 20, S_min = 20, S_max = 2000)

[figure: the estimates obtained from b_c and b_d, plotted for x ∈ [0, 1]]


Comparisons with naïve and oracle strategies (S = 100, E = 20, S_min = 20, S_max = 2000; naïve strategy: S_g = S_ave)

[figure: scatter plots of ‖b_d^ours − b_c‖₂ against ‖b_d^naïve − b_c‖₂ and against ‖b_d^oracle − b_c‖₂]


conclusions

Conclusions and future work

Conclusions: the strategy is
- effective and easy to implement
- endowed with tight performance bounds
- equipped with self-tuning capabilities

Future work:
- exploit statistical knowledge about S
- incorporate the effects of a finite number of steps in consensus algorithms
- extend to dynamic scenarios (long-term objective)


Distributed nonparametric regression in multi-agent systems

damiano.varagnolo@dei.unipd.it
www.dei.unipd.it/~varagnolo/
google: damiano varagnolo

licensed under the Creative Commons BY-NC-SA 2.5 Italy License

References

Barbarossa, S. and Scutari, G. (2007). Decentralized maximum-likelihood estimation for sensor networks composed of nonlinearly coupled dynamical systems. IEEE Transactions on Signal Processing, 55(7):3456–3470.

Bazerque, J., Mateos, G., and Giannakis, G. (2010). Distributed lasso for in-network linear regression. In IEEE International Conference on Acoustics, Speech and Signal Processing.

Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. (2010). Distributed optimization and statistical learning via the alternating direction method of multipliers. Technical report, Stanford University.

Carli, R., Chiuso, A., Schenato, L., and Zampieri, S. (2008). Distributed Kalman filtering based on consensus strategies. IEEE Journal on Selected Areas in Communications, 26:622–633.

Choi, J., Oh, S., and Horowitz, R. (2009). Distributed learning and cooperative control for multi-agent systems. Automatica, 45(12):2802–2814.

Cortés, J. (2009). Distributed Kriged Kalman filter for spatial estimation. IEEE Transactions on Automatic Control, 54(12):2816–2827.

Cressie, N. and Wikle, C. K. (2002). Space-time Kalman filter. Encyclopedia of Environmetrics, 4:2045–2049.

Dall'Anese, E., Kim, S.-J., and Giannakis, G. B. (2011). Channel gain map tracking via distributed kriging. IEEE Transactions on Vehicular Technology.

De Nicolao, G. and Ferrari-Trecate, G. (1999). Consistent identification of NARX models via Regularization Networks. IEEE Transactions on Automatic Control, 44(11):2045–2049.

Nguyen, X., Wainwright, M. J., and Jordan, M. I. (2005). Nonparametric decentralized detection using kernel methods. IEEE Transactions on Signal Processing, 53(11):4053–4066.

Olfati-Saber, R. (2007). Distributed Kalman filtering for sensor networks. In IEEE Conference on Decision and Control.

Predd, J. B., Kulkarni, S. R., and Poor, H. V. (2006). Distributed learning in wireless sensor networks. IEEE Signal Processing Magazine, 23(4):56–69.

Schizas, I. D., Ribeiro, A., and Giannakis, G. B. (2008). Consensus in ad hoc WSNs with noisy links - part I: Distributed estimation of deterministic signals. IEEE Transactions on Signal Processing, 56(1):350–364.

Varagnolo, D., Pillonetto, G., and Schenato, L. (2011a). Auto-tuning procedures for distributed nonparametric regression algorithms. IEEE Conference on Decision and Control, submitted.

Varagnolo, D., Pillonetto, G., and Schenato, L. (2011b). Distributed parametric and nonparametric regression with on-line performance bounds computation. Automatica, submitted.