a user point of view

61
Eperimental Eperimental modeling: modeling: learning models from data learning models from data a user point of view a user point of view Mario Milanese Dip. di Automatica e Informatica Politecnico di Torino The Logic of Modeling Alta Scuola Politecnica Milano, April 21, 2006

Transcript of a user point of view

Page 1: a user point of view

Eperimental Eperimental modeling:modeling:learning models from datalearning models from data

a user point of viewa user point of view

Mario MilaneseDip. di Automatica e Informatica

Politecnico di Torino

The Logic of Modeling

Alta Scuola Politecnica

Milano, April 21, 2006

Page 2: a user point of view

OutlineOutline

Models as tools for making inferences from system dataprediction, simulation, control, filtering, fault detection

Model structuresphysical law based, input-output description, linear, nonlinear

Model estimationstatistical/parametric, set membership, structured

Model quality evaluation (vs. model validation)

Application examplesPrediction of atmospheric pollutionSimulation of dam crest dynamicsIdentification of vehicles with controlled suspensions

Page 3: a user point of view

Regression form of system representationRegression form of system representation

System So produces output signal y when driven by input signal u :

1 2

1

1 1 2 2

( )

[ ]y u u

t o t

t n t n t nt t t t

y f w

w y y u u u u

+

− − −

=

=

So

u1 y

Output y is related to input u by the regression function f o :

u2

Page 4: a user point of view

Regression form of system representationRegression form of system representation

Linear system

If ny=0 : MA (FIR) system

f o is linear in :tw

1 1 11 1

y u

y u

t n t nt t t t to n o ny a y a y a y b u b u b u− −+ − −= + + + + +

If nu=0 : AR system

ARMA system

If f o nonlinear : NARMA, NFIR, NAR systems

Page 5: a user point of view

Making inferences from dataMaking inferences from dataIt is desired to make an inference on system So :

prediction, identification, simulation, control, filtering, fault detection

The inference is described by the operator I( f o,wT)

one-step prediction

identification

I(f o,wT)=f o(wT)

I(f o,wT)=f o

The system So is unknown, but a finite number of noise corrupted measurements of yt,wt are available:

1 ( ) , 1, ,t o t ty f w d t T+ = + =dt accounts for errors in data ,t ty w

Page 6: a user point of view

Making inferences from dataMaking inferences from data

Problems :for given estimates

evaluate the inference error

The inference error cannot be exactly evaluatedsince f o and wT are not known

ˆ ˆ,o T Tf f w wˆ ˆ( , ) ( , )o T TI f w I f w−

find estimates ˆ ˆ,o T Tf f w w“minimizing” the inference error

Need of prior assumptions on f o and d t for deriving finite bounds on inference error

~~

~ ~

Page 7: a user point of view

The model is described by:

type of function ftype of noise d

Model structuresModel structures

which inputs u1, u2,…lag values ny, nu1, nu2,…

1 2

1

1 1 2 2

( )

[ ]y u u

t t t

t n t n t nt t t t

y f w d

w y y u u u u

+

− − −

= +

=

Model structure is defined by:

Page 8: a user point of view

Typical assumptions in literature:on system:

known lag values ny, nu1, nu2,…

Functional form of F(θ) required:derived from physical lawsσi : “basis” function (polynomial, sigmoid,..)

1

( ) ( , ) ( , )r

oi i i

i

f f w wθ θ ασ β=

∈ =

∑F =

on noise: iid stochastic noise

Parameters θ are estimated by optimizingLeast Squares (LS) or Maximum Likelihood functionals

Statistical/parametric approachStatistical/parametric approachModel structuresModel structures

Page 9: a user point of view

If possible, physical laws are used to obtain theparametric representation of ( ),f w θ

When the physical laws are not well known or too complex, input-output parameterizations are used

“Fixed” basis parametrizationPolinomial, trigonometric, etc.

“Tunable” basis parametrizationNeural networks, wawelets , etc.

often called black-box models

Statistical/parametric approachStatistical/parametric approachModel structuresModel structures

Page 10: a user point of view

( ) [ ]

( )

11

( , )

:

r

i i ri

i

f w w

w

θ ασ θ α α

σ=

′= =∑“Basis”

Problem: Can σi ’s be found such that

( )( , ) orf w f wθ →∞→ ?

Statistical/parametric approachStatistical/parametric approachModel structures: Model structures: “fixed” basis“fixed” basis

Page 11: a user point of view

For continuous f o, bounded and σi

polynomial of degree i (Weierstrass):

nW ⊂ℜ

( )lim sup ( , ) 0o

r w Wf w f w θ

→∞ ∈− =

Polynomial NARX models

Statistical/parametric approach Statistical/parametric approach Model structures: Model structures: “fixed” basis“fixed” basis

Page 12: a user point of view

( )1

1 11

( , ) ,

,

r

i ii

qr rq i

f w wθ ασ β

θ α α β β β

=

=

′ = ∈ℜ

One of the most common “tunable” parameterizationis the one-hidden layer sigmoidal neural network

( ) ( ), Ti i iw w a bσ β σ= +

( )σ •

Statistical/parametric approach Statistical/parametric approach Model structures: Model structures: “tunable” basis“tunable” basis

sigmoid

Page 13: a user point of view

2 1 1

3 2 2

1

( , )( , )

( , )

o

o

T T o T

y f w dy f w d

y f w d

θ

θ

θ+

= +

= +

= +

( )oY F Dθ= +

Given T noise-corrupted measurements of y t,w t:

Measured output

Known function

( )1

( , ) ,i i

ro o o o

i

f f w wθ α σ β=

= = ∑

Error

Statistical/parametric approach Statistical/parametric approach Model estimationModel estimation

Page 14: a user point of view

( )oY F Dθ= + Gaussian pdf

Maximum Likelihood –Least Squares estimate

( )ˆ arg min Rθ

θ θ=

Problem: is in general non-convex

( ) ( ) ( )1 1R D D Y F Y FT T

θ θ θ′

′= = − −

( )R θ

Statistical/parametric approach Statistical/parametric approach Model estimationModel estimation

Page 15: a user point of view

“Fixed” basis:“Fixed” basis:

Estimation of θ is a linear problem: oY L Dθ= +

( ) [ ]11

( , )r

i i ri

f w wθ α σ θ α α=

′= =∑

If D is iid gaussian: ( ) 1ˆML L L L Yθ −′ ′=

( ) ( )

( ) ( )

1 1 12 3 1

1

rT

T r T

w wL Y y y y

w w

σ σ

σ σ

+

′ = =

Statistical/parametric approach Statistical/parametric approach Model estimationModel estimation

Statistical/parametric approach Statistical/parametric approach Model estimationModel estimation

Page 16: a user point of view

For fixed basis and D iid gaussian:

For tunable basis this results holds asymptotically (T→∞) with:

( ) 1ˆ 2 . . 0.95o MLi i i

iiL L w pϑ θ σ− ′− ≤

Statistical/parametric approach Statistical/parametric approach Estimation accuracyEstimation accuracy

standard deviation of noise component d i

o

FLϑ ϑϑ =

∂ = ∂

Page 17: a user point of view

Model structure choice:- “basis” type- Number r of “basis”- Number n of regressors

Problem: “curse of dimensionality”The number r of basis needed to obtain “accurate” approximation of f o may grow exponentiallywith the dimension n of regressor space

More relevant in the case of “fixed” basis

Statistical/parametric approach Statistical/parametric approach Model structures: Model structures: propertiesproperties

Page 18: a user point of view

Under suitable regularity conditions on the function to approximate, the number of parameters r required to obtain “accurate” models grows linearly with n

Estimation of θ requires to solve a non-convexminimization problem

Trapping in local minima

Statistical/parametric approach Statistical/parametric approach Model structures: Model structures: propertiesproperties

Using tunable basis:

Page 19: a user point of view

Basic to the statistical/parametric approach is the assumption of no modeling error

Statistical/parametric approach Statistical/parametric approach Modeling errorsModeling errors

: ( , )o o of f wϑ ϑ∃ =

stochastic variinde

able pendent of input u

( , )t t od y f w ϑ= −

Page 20: a user point of view

Statistical/parametric approach Statistical/parametric approach Modeling errorsModeling errors

Searches for the functional form of unknown f o are time consuming and lead to approximate model structures

d t is no more a stochastic variable independent of u

Statistical estimation in presence of modeling errors is a hard problem

Set Membership approach:no assumption on the functional form of f o

no statistical assumption on d t

Page 21: a user point of view

Set Membership approachSet Membership approach

Significant improvements obtained by:

use of “local” bound

bounded set ∈ Rn

{ }1

2

'( ) : ( ) ,of f C f w w Wγ γ∈ ∈ ≤ ∀ ∈F =

scaling of regressors w to adapt to data

, 1 , . . ,t t td t Tε γ δ =≤ +

SM assumptions:

on system:

on noise:

'

2( ) ( )f w wγ≤

Page 22: a user point of view

Inference algorithm Φ maps all informationinto estimated inference:

( ) : | ( ) | , 1, ,T t tt tFSS f y f w t Tγ ε γδ ∈ − ≤ =

+= F

All information (prior and data) are summarizedin the Feasible Systems Set:

ˆ ( ) ( , )T o TI FSS I f w=Φ

FSST is the set of all systems ∈F (γ) that could have generated the data

~

Set Membership approachSet Membership approach

Page 23: a user point of view

Set Membership approachSet Membership approachPrior assumptions validationPrior assumptions validation

The fact that the priors are validated by using the present data does not exclude that they may beinvalidated by future data (Popper, “Conjectures and Refutations: the Growth of Scientific Knowledge”, 1969)

Prior assumptions are invalidated by dataif FSST is empty

Prior assumptions are considered validatedif FSST ≠ Ø

Page 24: a user point of view

Set Membership approach Set Membership approach Prior assumptions validationPrior assumptions validation

Theorem:Conditions for assumptions to be validated are:

necessary:

sufficient:

( ) , 1, ..,t tf w h t T≥ =

( ) , 1, ..,t tf w h t T> =

Define: 21,.., 1( ) min ( || || )t t

t Tf w h w wγ

= −= + −

1 1,t t t t t t t th y h yε γδ ε γδ+ += + + = − −21,.., 1

( ) max ( || || )t t

t Tf w h w wγ

= −= + −

Administrator
Rectangle
Administrator
Line
Page 25: a user point of view

Set Membership approach Set Membership approach Prior assumptions validationPrior assumptions validation

Used for thechoice of γ,εvalues

In space (γ,ε) the surface * ( ) infTFSS

γ ε γ≠∅

=separates falsified values from validated ones

validated

falsified

Page 26: a user point of view

Set Membership approach Set Membership approach Error and optimality conceptsError and optimality concepts

(Local) Inference error:

An algorithm Φ* is optimal if:

| |

ˆ( ) [ ( )] sup sup || ( ) ( , ) ||T T T T T

T T T

f FSS w wE I E FSS FSS I f w

γδε∈ +− ≤= Φ = Φ −

*[ ( )] inf [ ( )]T T TE FSS E FSS r FSSΦ

Φ = Φ = ∀

An algorithm Φα is α-optimal if:

[ ( )] inf [ ( )]T T TE FSS E FSS FSSα αΦ

Φ ≤ Φ ∀

r: (local) radius of information

Page 27: a user point of view

InferenceInference

Let || I(f,wT)||=|| f ||p=

Theorem:

1( ) [ ( ) ( )]2

cf w f w f w= +

i) The identification algorithm Φc(FSST)=f c

is optimal for any Lp norm, 1≤ p ≤∞

1[ ] ||2

||cpE f r f f= = −

Define

1/[ | ( ) | d ]p p

W

f w w∫

ii) The radius of information r is:

Identification: Identification: I(I(f,f,wwTT)=)=ff

Set Membership approachSet Membership approach

Page 28: a user point of view

InferenceInference

Let: || I(f,wT)||=| f (wT) |

Assume:

Prediction: Prediction: I(I(f,f,wwTT)=)=f f ((wwTT))

Let: { }2( ) :t t tB w w W w wδ δ= ∈ − ≤

t t td ε γδ≤ +

Set Membership approachSet Membership approach

Page 29: a user point of view

InferenceInference

Theorem:i) The prediction algorithm

ii) If

Prediction: Prediction: I(I(f,f,wwTT)=)=ff((wwTT))

( ) ( )c T c TFSS f wΦ =

( ) ,T T TB w C Cδ ⊂ ∩ then prediction 1ˆ ( )T c Ty f w+ =is optimal and the radius of information is:

is 2-optimal, with prediction error bounded by:

( ) 1 ( ) ( )2

c T T T TE FSS f w f w γδ Φ ≤ − +

1 ( ) ( )2

c T T TE r f w f w γδ Φ = = − +

Set Membership approachSet Membership approach

Page 30: a user point of view

Structured identificationStructured identification

In the case of large dimension of regressor space it is often very hard to obtain satisfactory modeling accuracy.

Structured (block-oriented) identification

The high-dimensional problem is reduced to the identification of lower dimensional subsystems and to the estimation of their interactions

Page 31: a user point of view

Structured identificationStructured identification

Not measured

Nonlinear Linear

Typical cases: Wiener, Hammerstein and Lur’e systems

Page 32: a user point of view

Iterative identification algorithm:- Initialisation: get an initial guess M2

(0) of M2

- Step k:1) Compute v(k) such that M2

(k-1)[v(k)]=y2) Identify M1

(k) using u and y as inputs, v(k) as output3) Identify M2

(k) using v(k) = M2(k)[u,y] as input, y as output

and return to step 1)

Key feature: The identification error is non-increasing for increasing iteration.

Structured identificationStructured identification

Page 33: a user point of view

Model quality evaluationModel quality evaluation

The usual approach is to look for model validity

Model invalidity only can be surely asserted, when the model does not explain the measured data

Even more, infinitely many models exactly explaining the data can be derived

| |t tMy y− > expected noise size

Infinitely many not-invalidated models can be derived

“overfitting” danger

Page 34: a user point of view

Model quality evaluationModel quality evaluation

choose #r of basis functions = #T of measured data

( ) ( )

( ) ( )

1 1 1

1

T

T T T

w wL

w w

σ σ

σ σ

=

' 1 'ˆ ( )LL LYϑ −=

invertible

Finding models exactly explaining the data

' 1 'ˆ ( )MY L L LL LY Yϑ −= = =

Page 35: a user point of view

Model quality evaluationModel quality evaluation

Example:

1 2 3 42 0.5 0.8 0.5u u u u= =− −= =

1 11 1 1 2( ) t t t

MM y u uϑ ϑ ϑ+ −⇒ = +1 1 2

2 2 1 2( ) ( )t t tMM y u uϑ ϑ ϑ+ −⇒ = +

1 1 33 3 1 2( ) ( )t t t

MM y u uϑ ϑ ϑ+ −⇒ = +

1 2 3 40 1 8 0.125y y y y−= = = =

input

output

candidate model structures

Page 36: a user point of view

Model quality evaluationModel quality evaluation

Estimation of M1, M2, M3

1( )M ϑ ⇒ 1

2

8 0.5 20.125 0.8 0.5

ϑϑ

− − =

2t = →Y

3t = →

L= θ

1 1

2

ˆ 2.03ˆ 3.49

L Yϑ

ϑ−

− = =

2 ( )M ϑ ⇒ 1

2

8 0.5 40.125 0.8 0.25

ϑϑ

− − =

2t = →3t = →

1 1

2

ˆ 0.81ˆ 2.10

L Yϑ

ϑ−

= = −

3( )M ϑ ⇒ 1

2

8 0.5 80.125 0.8 0.125

ϑϑ

− − =

2t = →3t = →

1 1

2

ˆ 0ˆ 1

L Yϑ

ϑ−

= =

Page 37: a user point of view

y

1 2 3 4 5 6 7 8-8

-6

-4

-2

0

2

4

Model quality evaluationModel quality evaluation

All models M1, M2, M3 explain exactly the given data y

How to choose among them ?

choose the one with the best “predictive ability”

measured by accuracyin simulating data not used for model estimation

estimation data

y,yM1, yM2, yM3

Page 38: a user point of view

Model quality evaluationModel quality evaluation

Several indexes have been proposed for estimating the predictive ability of models:

ˆ( )T nFPE RT n

ϑ +=

−2ˆln ( ) nAIC RT

ϑ= +

lnˆln ( ) n TBIC RT

ϑ= +They provide quite crude approximations, especially

for nonlinear systems

A simple but effective approach: splitting of data

• estimation data: estimate candidate models Mi, i=1,..,m• calibration data: choose the best one among Mi

:T numberof data:n numberof parametersϑ

( ) [ ] [ ]1R Y L Y LT

θ ϑ ϑ′= − −

Page 39: a user point of view

1 2 3 4 5 6 7 8-8

-6

-4

-2

0

2

4

Model quality evaluationModel quality evaluation

Best model among candidate ones Mi

estimation data

y,yM1, yM2, yM3

calibration data

minimum simulation error on the “calibration” data

Example: M3 is the best one among M1, M2, M3

Page 40: a user point of view

ApplicationsApplications

Prediction of atmospheric pollutionSimulation of dam crest dynamicsIdentification of vehicles with controlled suspensions

Page 41: a user point of view

Prediction of urban ozone peaks

NOx

SOx

VOCCH4 NH3N2O

COCO2

NO2

NO

O3

O

O2

O2

RH

RO2

smogfotochimico

Page 42: a user point of view

Prediction of urban ozone peaks

hCombustion processes and high solar radiation cause high tropospheric ozone concentrationshPrediction of ozone concentrations is important

for authorities in charge of pollution control and preventionhStudies in the literature show that physical models

are not able to reliably forecast the links between precursor emissions (Nox, VOC), methereological conditions and ozone concentrations

Sillman “The relation between ozone, Nox and hydrocarbons”, Atmos. Environ., 1999

Jenkin-Clemitshaw “Ozone and other photochemical polluttants: chemical processes governing their formation”, Atmos. Environ., 1999

Page 43: a user point of view

Prediction of urban ozone peaks

Value to bepredicted

[O3]

[µg/m3]

day t-1 day t+1

Used measurements

day t

typical data at Broletto (Bs)

Page 44: a user point of view

Prediction of urban ozone peaks

1

1 2 3 4

( )[ ]

t o t

t t t t t t

y f ww y u u u u

+ =

=

• Structure of used models:

- : max O3 concentration at day t- : mean NO2 concentration at 4-8 pm of day t

- : mean O3 concentration at 4-8 pm of day t

- : max temperature at day t

- : forecast of max temperature at day t+14tu

ty1tu

2tu

3tu

Page 45: a user point of view

Prediction of urban ozone peaks

• Prediction methods tested:

PERS:

ARCX: periodic ARX

NN: sigmoidal neural net

NF: neuro-fuzzy

NSM: nonlinear set membership

1t ty y+ =

• Hourly data measured at Brescia center:

1995-1998: estimation data set

1999: calibration data set

2000-2001: testing data set

Page 46: a user point of view

Prediction of urban ozone peaks

Indexes measuring the ability to predict concentrations exceeding a given threshold:

observed

predicted yes no total

yes a f – a f

no m - a N + a – m – f

N – f

total m N - m N

fraction of Correct Predictions: CP=(a/m)%

fraction of False Alarms: FA=(1-a/f)%

Success index: SI=[(a/m)+((N+a-m-f)/(N-m))-1]%

European Environmental Agency, Tech. Report 9, 1998

Page 47: a user point of view

Prediction of urban ozone peaks

PERS ARCX NN NF NSM CP 65.1 61.9 69.8 63.5 71

FA 33.9 25 27.9 25.9 27.4

SI 47.6 51.1 55.7 51.8 51.2

Calibration data set: m=63 exceeded thresholds

PERS ARCX NN NF NSM CP 41.5 35.9 53.8 66.7 71.8

FA 57.5 51.7 40 44.7 44

SI 34.4 31.3 49.6 60.2 63.5

Testing data set: m=39 exceeded thresholds

Page 48: a user point of view

Model of Schlegeis Arch Dam

• Model to simulate the crest displacement of the dam as function of:

water level

concrete temperature

air temperature

• Difficulties in deriving reliable physical models

• Models tested: ARX, NN, NSM

• Daily data available in period 1992-2000

Page 49: a user point of view

Model of Schlegeis Arch Dam

1

1 1 1 1 11 1 1 2 2 3 3

( )[ ]

t o t

t t t t t t t t t t

y f ww y y u u u u u u u

+

− + − + +

=

=

• Structure of used models:

- : crest displacement at day t

- : concrete temperature at day t

- : mean air temperature at day t

ty1tu

2tu

3tu

- : water level at day t

• Daily data:1992-1996: estimation data set

1997-1998: calibration data set

1999-2000: testing data set

Page 50: a user point of view

Model of Schlegeis Arch Dam

ARX model NN modelexperimental datamodel

• Simulation results on the testing data set:

Page 51: a user point of view

Model of Schlegeis Arch Dam

NN model NSM modelexperimental datamodel

• Simulation results on the testing data set:

Page 52: a user point of view

Identification of vehicles with controlled suspensions

Virtual design and tuning of Continuous Damping Control systems

Derive a model for simulation ofchassis and wheels accelerations as function of road profile and damper control

GOAL:

USE:

Page 53: a user point of view

Experimental settinghC-segment prototype vehicle with controlled dampers

and CDC-Skyhook (Continuous Damping Control system).

hMeasurements are performed on a four-postertest bench of FIAT-Elasis Research Center.

Page 54: a user point of view

Experimental setting

Road profiles:

hRandom: random road.hEnglish Track: road with irregularly spaced holes and bumps.hShort Back: impulse road.hMotorway: level road.hPavé track: road with small amplitude irregularities.hDrain well: negative impulse road.

Note: The road profiles are symmetric (left=right).

Page 55: a user point of view

Experimental setting

Data set: 93184 data, collected with a sampling frequency of 512 Hz, partitioned as follows:

hEstimation data set: 0-5 seconds of each acquisition. hCalibration data set: 5-7 seconds of each acquisition.hTesting set: 7-14 seconds of each acquisition.

Page 56: a user point of view

Structure of vehicles vertical dynamics

Since the road profiles are symmetric, a Half-car model has been considered:

Page 57: a user point of view

Structured Identification ofvehicles vertical dynamics

Structure decomposition: - CE: chassis + engine- SWT: suspension +

wheel + tire

Measured variables:- prf and prr : front and rear

road profiles. - isf and isr : control currents of

front and rear suspensions. - acf and acr : front and rear

chassis vertical accelerations.

Note: Fcf and Fcr are not measured.

Page 58: a user point of view

Results on testing set of NSM modelFront wheel acceleration: english track road

measurements, NSM model

Page 59: a user point of view

Results on testing set of NSM modelChassis front accelerations: random road

measurements, NSM model

Page 60: a user point of view

Results on testing set of NSM modelChassis rear accelerations: random road

measurements, NSM model.

Page 61: a user point of view

Comparison with physical model

Chassis front accelerations: random roadmeasurements, NSM model, physical model