
How to Make the Most Out of Evolving Information: Function Identification Using Epi-Splines

Johannes O. Royset

Operations Research, NPS, in collaboration with L. Bonfiglio, S. Buttrey, S. Brizzolara, J. Carbaugh, I. Rios, D. Singham, K. Spurkel, G. Vernengo, J.-P. Watson, R. Wets, D. Woodruff

Smart Energy and Stochastic Optimization, ENPC ParisTech, June 2015

This material is based upon work supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under grants 00101-80683, W911NF-10-1-0246 and W911NF-12-1-0273, as well as by DARPA under HR0011-14-1-0060.

1 / 126

Goal: estimate, predict, forecast

2 / 126

Copper prices

[Figure: copper prices; price (USD) vs. months]

3 / 126

Copper prices

[Figure: copper prices; price (USD) vs. months; real prices together with E(hist+market) and E(hist+market) ± σ]

4 / 126

Hourly electricity loads

5 / 126

Hard information: data

◮ Observations

◮ Usually scarce or excessive, corrupted, uncertain


6 / 126

Soft information

◮ Indirect observations, external information

◮ Knowledge about

◮ structures
◮ established “laws”
◮ physical restrictions
◮ shapes, smoothness of curves

7 / 126

Evolving information

data acquisition, information growth

8 / 126

Function Identification Problem

Identify a function that

best represents all available information

9 / 126

Applications of function identification approach

◮ energy

◮ natural resources

◮ financial markets

◮ image reconstruction

◮ uncertainty quantification

◮ variograms

◮ nonparametric regression

◮ deconvolution

◮ density estimation

10 / 126

Outline

◮ Illustrations and formulations (slides 12-33)

◮ Framework (slides 34-60)

◮ Epi-splines and approximations (slides 61-77)

◮ Implementations (slides 78-85)

◮ Examples (slides 86-124)

11 / 126

Illustrations and formulations

12 / 126


Identifying financial curves

Estimate discount factor curve given instruments $i = 1, 2, \ldots, I$ and payments $p^i_0, p^i_1, \ldots, p^i_{N_i}$ at times $0 = t^i_0, t^i_1, \ldots, t^i_{N_i}$

Identify a nonnegative, nonincreasing function $f$ with $f(0) = 1$ such that

$$\sum_{j=0}^{N_i} f(t^i_j)\, p^i_j \approx 0 \quad \text{for all } i$$

Constrained infinite-dimensional optimization problem:

$$\text{minimize} \sum_{i=1}^{I} \Big( \sum_{j=0}^{N_i} f(t^i_j)\, p^i_j \Big)^2 \quad \text{such that } f(0) = 1,\ f \ge 0,\ f' \le 0$$

13 / 126
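To make the formulation concrete, here is a minimal numerical sketch (mine, not from the talk): discretize $f$ as a piecewise-linear function on a time grid, penalize squared pricing errors, and impose $f(0) = 1$, $f \ge 0$, $f' \le 0$ as linear constraints. The grid, the toy instrument, and the choice of cvxpy are all assumptions.

```python
# Hedged sketch: piecewise-linear discount-factor curve fit with cvxpy.
import numpy as np
import cvxpy as cp

grid = np.linspace(0.0, 10.0, 41)   # time grid in years (an assumption)
f = cp.Variable(len(grid))          # discount factors at the grid nodes

# Toy instrument: a 2-year 5% annual-coupon bond bought at par; real market
# data (payments p^i_j at times t^i_j) would replace this.
instruments = [(np.array([0.0, 1.0, 2.0]), np.array([-1.0, 0.05, 1.05]))]

def pricing_error(times, payments):
    """sum_j f(t_j) p_j, with f(t_j) linearly interpolated on the grid."""
    idx = np.clip(np.searchsorted(grid, times) - 1, 0, len(grid) - 2)
    w = (times - grid[idx]) / (grid[idx + 1] - grid[idx])
    f_at_times = cp.multiply(1 - w, f[idx]) + cp.multiply(w, f[idx + 1])
    return cp.sum(cp.multiply(payments, f_at_times))

objective = cp.sum_squares(cp.hstack([pricing_error(t, p) for t, p in instruments]))
constraints = [f[0] == 1, f >= 0, cp.diff(f) <= 0]   # f(0)=1, f>=0, f'<=0
cp.Problem(cp.Minimize(objective), constraints).solve()
print(np.round(f.value[:9], 4))
```

With a single toy instrument the problem is underdetermined, so the solver returns one of many curves consistent with the constraints; more instruments pin the curve down.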

Day-ahead electricity load forecast


Data: predicted temperature, dew point; observed load

14 / 126

Day-ahead load forecast problem

Using relevant data and the weather forecast for tomorrow, determine tomorrow's load curve and its uncertainty

◮ $j = 1, 2, \ldots, J$: days in data set

◮ $h = 1, 2, \ldots, 24$: hours of the day

◮ $t^j_h$: predicted temperature in hour $h$ of day $j$

◮ $d^j_h$: predicted dew point in hour $h$ of day $j$

◮ $l^j_h$: actual observed load in hour $h$ of day $j$

15 / 126

Day-ahead load forecast problem (cont.)

“Functional” regression model:

$$l^j_h = f^{\text{tmp}}(h)\, t^j_h + f^{\text{dpt}}(h)\, d^j_h + e^j_h$$

◮ regression “coefficients” $f^{\text{tmp}}$ and $f^{\text{dpt}}$ are functions of time

◮ $e^j_h$: error between the observed and predicted loads

Regression problem: find functions $f^{\text{tmp}}$ and $f^{\text{dpt}}$ on $[0, 24]$ that

$$\text{minimize} \sum_{j=1}^{J} \sum_{h=1}^{24} \Big| l^j_h - \big[ f^{\text{tmp}}(h)\, t^j_h + f^{\text{dpt}}(h)\, d^j_h \big] \Big|$$

subject to smoothness, curvature conditions

Constrained infinite-dimensional optimization problem

16 / 126
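A hedged sketch of one way to solve this after discretization (my construction, not the authors' implementation): represent $f^{\text{tmp}}$ and $f^{\text{dpt}}$ by their 24 hourly values, stand in for the smoothness condition with a bound on hour-to-hour changes, and minimize the absolute errors with cvxpy. The synthetic data and the bound 0.5 are assumptions.

```python
# Hedged sketch: least-absolute-deviations functional regression with cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
J, H = 60, 24                                      # days in data set, hours
t = rng.normal(15, 5, (J, H))                      # temperatures t^j_h (synthetic)
d = rng.normal(8, 4, (J, H))                       # dew points d^j_h (synthetic)
l = 1.5 * t + 0.7 * d + rng.normal(0, 2, (J, H))   # observed loads l^j_h (synthetic)

f_tmp = cp.Variable(H)                             # f^tmp(h), h = 1, ..., 24
f_dpt = cp.Variable(H)                             # f^dpt(h), h = 1, ..., 24

loss = sum(cp.sum(cp.abs(l[:, h] - (t[:, h] * f_tmp[h] + d[:, h] * f_dpt[h])))
           for h in range(H))
smoothness = [cp.abs(cp.diff(f_tmp)) <= 0.5,       # discrete stand-in for the
              cp.abs(cp.diff(f_dpt)) <= 0.5]       # smoothness/curvature conditions
cp.Problem(cp.Minimize(loss), smoothness).solve()

# Day-ahead forecast for new hourly forecasts t_new, d_new (as on slide 17):
t_new, d_new = rng.normal(15, 5, H), rng.normal(8, 4, H)
load_forecast = f_tmp.value * t_new + f_dpt.value * d_new
```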

Day-ahead load forecast problem (cont.)

Given temperature $t$ and dew point $d$ forecasts (functions of time) for tomorrow, the load forecast becomes $f^{\text{tmp}}(h)\, t(h) + f^{\text{dpt}}(h)\, d(h)$, $h \in [0, 24]$

But, what about uncertainty?

17 / 126

Estimation of errors

For hour $h$:

minimized errors $e^j_h = l^j_h - [f^{\text{tmp}}(h)\, t^j_h + f^{\text{dpt}}(h)\, d^j_h]$, $j = 1, \ldots, J$

= samples from actual error density

But that wouldn't capture the uncertainty! One would expect:

[Figure: error densities varying with hour $h$]

Probability density estimation problem with very small data set

18 / 126

Estimation of errors (cont.)

Probability density estimation problem:

Given iid sample $x^1, x^2, \ldots, x^\nu$, find a function $h \in H$ that maximizes $\prod_{i=1}^{\nu} h(x^i)$

$H$ = constraints on $h$: $h \ge 0$, $\int h(x)\, dx = 1$, and more

Constrained infinite-dimensional optimization problem

19 / 126

Estimation of errors (cont.)

20 / 126

Generating load scenarios

Also need to consider conditioning:

◮ if load above regression function early, then likely above later

◮ error data for density estimation reduced further

21 / 126

Forecast precision (Connecticut 2010-12)

Season   Mean % error
Fall         3.99
Spring       2.73
Summer       4.19
Winter       3.47

22 / 126

Uncertainty quantification (UQ)

Engineering, biological, physical systems

◮ Input: random vector $V$ (“known” distribution)

◮ System function $g$; implicitly defined, e.g., by simulation

◮ Output: random variable $X = g(V)$

23 / 126

Illustration of UQ challenges

Output amplitude of dynamical system:

[Figure: density of $X$]

How to estimate densities like this with a small sample?

24 / 126

Probability density estimation

Sample $X^1, \ldots, X^\nu$: maximize $\prod_{i=1}^{\nu} h(X^i)$ s.t. $h \in H^\nu \subset H$

◮ Sample size might be growing

◮ Constraint set Hν might be evolving

25 / 126

Evolving probability density estimation

maximize $\prod_{i=1}^{\nu} \big(h(X^i)\big)^{1/\nu}$ s.t. $h \in H^\nu \subset H$

maximize $\frac{1}{\nu} \sum_{i=1}^{\nu} \log h(X^i)$ s.t. $h \in H^\nu \subset H$

actual problem: maximize $E[\log h(X)]$ s.t. $h \in H \subset \mathcal{H}$

26 / 126

Evolving constraints

Constraints: nonnegative support, smooth, curvature bound

[Figure: true density, epi-spline estimate, kernel estimate, empirical density]

MSE = 0.2765 (epi-spline) and 0.3273 (kernel)

27 / 126

Evolving constraints (cont.)

Also log-concave

[Figure: true density, epi-spline estimate, kernel estimate, empirical density]

MSE = 0.1144 (epi-spline) and 0.3273 (kernel)

28 / 126

Evolving constraints (cont.)

Also nonincreasing

[Figure: true density, epi-spline estimate, kernel estimate, empirical density]

MSE = 0.0470 (epi-spline) and 0.3273 (kernel)

29 / 126

Evolving constraints (cont.)

Also slope bounds

[Figure: true density, epi-spline estimate, kernel estimate, empirical density]

MSE = 0.0416 (epi-spline) and 0.3273 (kernel)

30 / 126

Take-aways so far

◮ Challenges in forecasting and estimation are function identification problems

◮ Day-ahead forecasting of electricity demand involves

◮ functional regression to get trend
◮ probability density estimation to get errors

◮ Soft information supplements hard information (data)

31 / 126

Questions?

32 / 126

Outline

◮ Illustrations and formulations (slides 12-33)

◮ Framework (slides 34-60)

◮ Epi-splines and approximations (slides 61-77)

◮ Implementations (slides 78-85)

◮ Examples (slides 86-124)

33 / 126

Framework

34 / 126

Actual problem

◮ Constrained infinite-dimensional optimization problem

$$\min \psi(f) \quad \text{such that } f \in F \subseteq \mathcal{F} \subseteq \text{some function space}$$

◮ criterion $\psi$ (norms, error measures, etc.)

◮ feasible set $F$ (shape restrictions, external info)

◮ subspace of interest $\mathcal{F}$ (continuity, smoothness)

35 / 126

Special case: linear regression

[Figure: food expenditure vs. household income, with least-squares, 0.75-quantile, and 0.75-superquantile regression fits]

find affine $f$ that minimizes $\sum_j \big(y^j - f(x^j)\big)^2$ or other error measures

36 / 126

Special case: interpolation

[Figure: data with smoothing spline and interpolating spline]

$$\min \|f''\|^2 \quad \text{such that } f(x^j) = y^j \text{ for all } j \text{ and } f \in \text{Sobolev space}$$

37 / 126

Special case: smoothing

[Figure: data with smoothing spline and interpolating spline]

$$\min \sum_j \big(y^j - f(x^j)\big)^2 + \lambda \|f''\|^2 \quad \text{such that } f \in \text{Sobolev space}$$

38 / 126
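For these two classical special cases, standard spline routines already solve the discretized problems; a brief SciPy illustration (ordinary splines, not epi-splines) follows. Note that SciPy's smoothing parameter `s` bounds the total squared residual, which plays the role of the trade-off weight $\lambda$ above.

```python
# Interpolating vs. smoothing spline on noisy data (SciPy, for illustration).
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline, UnivariateSpline

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 25)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.15, x.size)   # noisy samples y^j

interp = InterpolatedUnivariateSpline(x, y)   # enforces f(x^j) = y^j for all j
smooth = UnivariateSpline(x, y, s=0.5)        # allows residuals, smoother curve

xs = np.linspace(0.0, 1.0, 5)
print(np.round(interp(xs), 3), np.round(smooth(xs), 3))
```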

Evolving problems

◮ Evolving infinite-dimensional optimization problems

$$\min \psi^\nu(f) \quad \text{such that } f \in F^\nu \subseteq \mathcal{F} \subseteq \text{some function space}$$

◮ evolving/approximating criterion $\psi^\nu$

◮ evolving/approximating feasible set $F^\nu$

39 / 126


Function space

extended real-valued lower semicontinuous functions on $\mathbb{R}^n$

[Figure: a lower semicontinuous function $f$]

$(\text{lsc-fcns}(\mathbb{R}^n), dl)$: complete separable metric space, $dl$ = epi-distance

41 / 126

Modeling possibilities with lsc functions

◮ functions with jumps and high growth

◮ response surface building with implicit constraints

◮ system identification with subsequent minimization

◮ nonlinear transformations requiring ±∞

◮ functions with unbounded domain

42 / 126

Jumps

[Figure: fits by polynomial and lsc function]

43 / 126


Nonlinear transformations

Recall: probability density estimation with sample $X^1, \ldots, X^\nu$:

maximize $\frac{1}{\nu} \sum_{i=1}^{\nu} \log h(X^i)$ s.t. $h \in H^\nu \subset H$, including $h \ge 0$

Exponential transformation: $h(x) = \exp(-s(x))$

minimize $\sum_{i=1}^{\nu} s(X^i)$ s.t. $s \in S^\nu \subset S$, excluding nonnegativity

◮ $h$ log-concave $\iff$ $s$ convex

◮ If $s = \langle c(\cdot), r \rangle$, then certain expressions are linear in $r$

But need $s$ to take on $\infty$ for $h = \exp(-s(\cdot))$ to vanish

45 / 126

Identifying functions with unbounded domains

Complications for “standard” spaces

Is $f^\nu$ near $f^0$?

[Figure: $f^\nu$ and $f^0$]

Not in $L^p$-norm, but epi-distance $\le 1/\nu$

46 / 126


Epi-graph

[Figure: a function $f$]

48 / 126

Epi-graph

[Figure: the function $f$ and its epigraph, epi $f$]

49 / 126

Geometric view of lsc functions

$f \in \text{lsc-fcns}(\mathbb{R}^n) \iff \text{epi} f$ is a nonempty closed subset of $\mathbb{R}^{n+1}$

Notation:

◮ $\rho\mathbb{B} = \mathbb{B}(0, \rho)$ = origin-centered ball with radius $\rho$

◮ $d(y, S) = \inf_{y' \in S} \|y - y'\|$ = distance between $y$ and $S$

50 / 126

Distances between epi-graphs

[Figure: epi $f$ and epi $g$]

51 / 126

Distances between epi-graphs (cont.)

$\rho$-epi-distance = $dl_\rho(f, g) = \sup_{\bar{x} \in \rho\mathbb{B}} |d(\bar{x}, \text{epi} f) - d(\bar{x}, \text{epi} g)|$, $\rho \ge 0$

[Figure: functions $f$ and $g$]

52 / 126

Distances between epi-graphs (cont.)

$\rho$-epi-distance = $dl_\rho(f, g) = \sup_{\bar{x} \in \rho\mathbb{B}} |d(\bar{x}, \text{epi} f) - d(\bar{x}, \text{epi} g)|$, $\rho \ge 0$

[Figure: epi $f$ and epi $g$, with $dl_\rho(f, g) = \delta$]

53 / 126

Distances between epi-graphs (cont.)

$\rho$-epi-distance = $dl_\rho(f, g) = \sup_{\bar{x} \in \rho\mathbb{B}} |d(\bar{x}, \text{epi} f) - d(\bar{x}, \text{epi} g)|$, $\rho \ge 0$

[Figure: distances from a point $\bar{x}$ to epi $f$ and to epi $g$]

54 / 126

Metric on the lsc functions

$\rho$-epi-distance $dl_\rho$ is a pseudo-metric on $\text{lsc-fcns}(\mathbb{R}^n)$

epi-distance = $dl(f, g) = \int_0^\infty dl_\rho(f, g)\, e^{-\rho}\, d\rho$

$dl$ is a metric on $\text{lsc-fcns}(\mathbb{R}^n)$

The epi-distance induces the epi-topology on $\text{lsc-fcns}(\mathbb{R}^n)$; $(\text{lsc-fcns}(\mathbb{R}^n), dl)$ is a complete separable metric space (Polish)

55 / 126
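For intuition, here is a rough numerical approximation of $dl_\rho$ (my own sketch, not a library routine): discretize each epigraph as a point cloud over a truncation window and take the sup of $|d(\bar{x}, \text{epi} f) - d(\bar{x}, \text{epi} g)|$ over grid points $\bar{x}$ in the ball $\rho\mathbb{B} \subset \mathbb{R}^2$. The window, truncation height, and grid sizes are assumptions.

```python
# Rough numerical approximation of the rho-epi-distance for functions on R.
import numpy as np
from scipy.spatial import cKDTree

def epi_cloud(f, lo=-3.0, hi=3.0, top=5.0, nx=300, ny=40):
    """KD-tree over points (x, y) with y >= f(x): a truncated epigraph."""
    pts = [(x, y) for x in np.linspace(lo, hi, nx)
           for y in np.linspace(f(x), top, ny)]
    return cKDTree(np.array(pts))

def dl_rho(f, g, rho=2.0, n=80):
    grid = np.linspace(-rho, rho, n)
    xbar = np.array([(u, v) for u in grid for v in grid
                     if u * u + v * v <= rho * rho])   # sample of the ball rho*B
    d_f, _ = epi_cloud(f).query(xbar)                  # d(xbar, epi f)
    d_g, _ = epi_cloud(g).query(xbar)                  # d(xbar, epi g)
    return float(np.max(np.abs(d_f - d_g)))

# Nearby functions have small rho-epi-distance:
print(dl_rho(lambda x: x * x, lambda x: x * x + 0.1))  # roughly 0.1
```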

Epi-convergence

$f^\nu \in \text{lsc-fcns}(\mathbb{R}^n)$ epi-converges to $f$ if $dl(f^\nu, f) \to 0$

56 / 126

Actual problem

◮ Constrained infinite-dimensional optimization problem

$$\min \psi(f) \quad \text{such that } f \in F \subseteq \mathcal{F} \subseteq \text{lsc-fcns}(\mathbb{R}^n)$$

◮ criterion $\psi$ (norms, error measures, etc.)

◮ feasible set $F$ (shape restrictions, external info)

◮ subspace of interest $\mathcal{F}$ (continuity, smoothness)

57 / 126

Take-aways in this section

◮ Actual problem:

◮ find best extended real-valued lower semicontinuous function
◮ captures regression, curve fitting, interpolation, density estimation, etc.
◮ allows functions on $\mathbb{R}^n$, jumps, transformations requiring $\pm\infty$

◮ Evolving problem due to changes in information and approximations

◮ Theory about lsc functions:

◮ metric space under epi-distance
◮ epi-distance = distance between epi-graphs

58 / 126

Questions?

59 / 126

Outline

◮ Illustrations and formulations (slides 12-33)

◮ Framework (slides 34-60)

◮ Epi-splines and approximations (slides 61-77)

◮ Implementations (slides 78-85)

◮ Examples (slides 86-124)

60 / 126

Epi-splines and approximations

61 / 126

Epi-splines = piecewise polynomial + lower semicont.

Piecewise polynomial not automatic; must be constructed

[Figure: a piecewise-polynomial $s(x)$ on mesh points $m_1, m_2, \ldots, m_{N-1}$]

Optimization over a new class of piecewise polynomial functions

62 / 126

n-dim epi-splines

63 / 126

n-dim epi-splines (cont.)

An epi-spline $s$ of order $p$ defined on $\mathbb{R}^n$ with partition $\mathcal{R} = \{R_k\}_{k=1}^{N}$ is a real-valued function that

on each $R_k$, $k = 1, \ldots, N$, is a polynomial of total degree $p$, and

for every $x \in \mathbb{R}^n$, has $s(x) = \liminf_{x' \to x} s(x')$

$\text{e-spl}^p_n(\mathcal{R})$ = epi-splines of order $p$ on $\mathbb{R}^n$ with partition $\mathcal{R}$

64 / 126
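A small one-dimensional illustration of the definition (my own sketch): a first-order epi-spline stores one affine piece per mesh interval, and the liminf rule assigns each mesh point the smaller of its one-sided limits.

```python
# First-order epi-spline on a mesh m_0 < ... < m_N, evaluated with the
# lower-semicontinuity rule s(x) = liminf_{x' -> x} s(x').
def eval_epi_spline(x, m, a0, a1):
    """m: mesh points; a0[k], a1[k]: intercept and slope on (m[k], m[k+1])."""
    vals = [a0[k] + a1[k] * x               # one-sided limits from every
            for k in range(len(a0))         # interval whose closure contains x
            if m[k] <= x <= m[k + 1]]
    return min(vals)                        # liminf picks the smaller limit

m, a0, a1 = [0.0, 1.0, 2.0], [0.0, 1.5], [1.0, -1.0]
print(eval_epi_spline(1.0, m, a0, a1))      # jump at x = 1: min(1.0, 0.5) = 0.5
```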

Evolving and approximating problems

$$\min \psi^\nu(s) \quad \text{such that } s \in S^\nu = F^\nu \cap \mathcal{S}^\nu$$

◮ subspace of interest $\mathcal{S}^\nu \subseteq \text{e-spl}^{p^\nu}_n(\mathcal{R}^\nu) \cap \mathcal{F}$

◮ evolving/approximate feasible set $F^\nu$

◮ evolving/approximate criterion $\psi^\nu$

65 / 126

Function identification framework

Actual Problem: complete information; infinite dimensional; conceptual

Evolving Problems: information growth; criterion and constraint approximations; function approximations; solutions = epi-splines

Solutions of the evolving problems approach solutions of the actual problem: consistency, rates

66 / 126

Convergence of evolving problems

Epi-convergence of functionals on a metric space:

Definition: $\{\psi^\nu : S^\nu \to \mathbb{R}\}_{\nu=1}^{\infty}$ epi-converge to $\psi : F \to \mathbb{R}$ if and only if

(i) for every $s^\nu \to f \in \text{lsc-fcns}(\mathbb{R}^n)$, with $s^\nu \in S^\nu$, we have $\liminf \psi^\nu(s^\nu) \ge \psi(f)$ if $f \in F$ and $\psi^\nu(s^\nu) \to \infty$ otherwise;

(ii) for every $f \in F$, there exists $\{s^\nu\}_{\nu=1}^{\infty}$, with $s^\nu \in S^\nu$, such that $s^\nu \to f$ and $\limsup \psi^\nu(s^\nu) \le \psi(f)$.

Equivalent to the previous definition

Also say that the evolving optimization problems epi-converge to the actual problem.

67 / 126

Consequence of epi-convergence

If the evolving problems epi-converge to the actual problem, then

$$\limsup \Big( \inf_{s \in S^\nu} \psi^\nu(s) \Big) \le \inf_{f \in F} \psi(f).$$

Moreover, if $s^k$ are optimal for the evolving problems $\min_{s \in S^{\nu_k}} \psi^{\nu_k}(s)$ and $s^k \to f^0$, then $f^0$ is optimal for the actual problem and

$$\lim_{k \to \infty} \inf_{s \in S^{\nu_k}} \psi^{\nu_k}(s) = \inf_{f \in F} \psi(f) = \psi(f^0).$$

68 / 126

When will evolving problems epi-converge to actual?

$$\min \psi(f) \text{ s.t. } f \in F \subseteq \mathcal{F} \subseteq \text{lsc-fcns}(\mathbb{R}^n) \qquad\qquad \min \psi^\nu(s) \text{ s.t. } s \in S^\nu = F^\nu \cap \mathcal{S}^\nu$$

Theorem: A sufficient condition for epi-convergence is

◮ $\psi^\nu$ converges continuously to $\psi$ relative to $F$

◮ Epi-splines dense in lsc functions

◮ $F^\nu$ set-converges to $F$

◮ $F$ is solid

69 / 126

Epi-splines dense in lsc functions

$\{\mathcal{R}^\nu\}_{\nu=1}^{\infty}$, with $\mathcal{R}^\nu = \{R^\nu_1, \ldots, R^\nu_{N^\nu}\}$, is an infinite refinement if for every $x \in \mathbb{R}^n$ and $\epsilon > 0$, there exists a positive integer $\bar\nu$ s.t. $R^\nu_k \subset \mathbb{B}(x, \epsilon)$ for every $\nu \ge \bar\nu$ and $k$ satisfying $x \in \text{cl} R^\nu_k$.

Theorem: For any $p = 0, 1, 2, \ldots$ and infinite refinement $\{\mathcal{R}^\nu\}_{\nu=1}^{\infty}$,

$$\bigcup_{\nu=1}^{\infty} \text{e-spl}^p_n(\mathcal{R}^\nu) \text{ is dense in } \text{lsc-fcns}(\mathbb{R}^n)$$

70 / 126

Dense in continuous functions under simplex partitioning

Definition: A simplex $S$ in $\mathbb{R}^n$ is the convex hull of $n + 1$ points $x^0, x^1, \ldots, x^n \in \mathbb{R}^n$, with $x^1 - x^0, x^2 - x^0, \ldots, x^n - x^0$ linearly independent.

Definition: A partition $R_1, R_2, \ldots, R_N$ of $\mathbb{R}^n$ is a simplex partition of $\mathbb{R}^n$ if $\text{cl} R_1, \ldots, \text{cl} R_N$ are “mostly” simplexes.

71 / 126

Dense in cnts fcns under simplex partitioning (cont.)

Theorem: For any $p = 1, 2, \ldots$ and $\{\mathcal{R}^\nu\}_{\nu=1}^{\infty}$, an infinite refinement of $\mathbb{R}^n$ consisting of simplex partitions of $\mathbb{R}^n$,

$$\bigcup_{\nu=1}^{\infty} \text{e-spl}^p_n(\mathcal{R}^\nu) \cap C^0(\mathbb{R}^n) \text{ is dense in } C^0(\mathbb{R}^n).$$

72 / 126

Rates in univariate probability density estimation

For a convex and finite-dimensional estimation problem:

Theorem: For any “correct” soft information,

$$\nu^{1/2}\, d_{KL}\big(h^0 \,\|\, h^\nu_\epsilon\big) = O_p(1) \quad \text{for some near-optimal solution } h^\nu_\epsilon$$

73 / 126

Decomposition

Theorem: For every $s \in \text{e-spl}^p_n(\mathcal{R})$, with $n > p \ge 1$ and $\mathcal{R} = \{R_k\}_{k=1}^{N}$, there exist $q_{k,i} \in \text{poly}^p(\mathbb{R}^{n-1})$, $i = 1, 2, \ldots, n$ and $k = 1, 2, \ldots, N$, such that

$$s(x) = \sum_{i=1}^{n} q_{k,i}(x_{-i}), \quad \text{for all } x \in R_k.$$

An $n$-dim epi-spline is the sum of $n$ $(n-1)$-dim epi-splines

74 / 126

Take-aways in this section

◮ Epi-splines are piecewise polynomials defined on an arbitrary partition of $\mathbb{R}^n$

◮ Only lower semicontinuity required (not smoothness)

◮ Dense in the space of extended real-valued lower semicontinuous functions under the epi-distance (i.e., epi-splines can approximate every lsc function to arbitrary accuracy)

◮ Solutions of the evolving problem (in terms of epi-splines) are approximate solutions of the actual problem

75 / 126

Questions?

76 / 126

Outline

◮ Illustrations and formulations (slides 12-33)

◮ Framework (slides 34-60)

◮ Epi-splines and approximations (slides 61-77)

◮ Implementations (slides 78-85)

◮ Examples (slides 86-124)

77 / 126

Implementations

78 / 126

Implementation considerations

◮ Selection of epi-spline order and composition $h = \exp(-s)$

◮ Selection of partition: go fine!

◮ Implementation of criterion functional and soft info

We provide details for one-dimensional epi-splines of order 1, focusing on criteria and constraints for probability densities

79 / 126

Implementation of criteria functionals

Probability densities $h \in \text{e-spl}_1(m)$, with mesh $m = \{m_0, m_1, \ldots, m_N\}$, $-\infty < m_0 < m_1 < \cdots < m_N < \infty$

First-order epi-splines (piecewise linear): $h(x) = a^k_0 + a^k x$ for $x \in (m_{k-1}, m_k)$

80 / 126

Implementation of criteria functionals (cont.)

Log-likelihood for observations $x^1, \ldots, x^\nu$:

$$\log \prod_{i=1}^{\nu} h(x^i) = \sum_{i=1}^{\nu} \log\big(a^{k_i}_0 + a^{k_i} x^i\big), \quad \text{where } k_i \text{ is such that } x^i \in (m_{k_i - 1}, m_{k_i})$$

Entropy:

$$-\int h(x) \log h(x)\, dx = -\sum_{k=1}^{N} \int_{m_{k-1}}^{m_k} \big(a^k_0 + a^k x\big) \log\big(a^k_0 + a^k x\big)\, dx$$

Both concave in the epi-parameters $a^k_0$ and $a^k$, $k = 1, \ldots, N$

81 / 126

Soft information

Nonnegativity: $h \ge 0$ if $a^k_0 + a^k m_{k-1} \ge 0$ and $a^k_0 + a^k m_k \ge 0$, $k = 1, \ldots, N$

Integrate to one:

$$\int h(x)\, dx = \sum_{k=1}^{N} a^k_0 (m_k - m_{k-1}) + \sum_{k=1}^{N} \frac{a^k}{2}\big(m_k^2 - m_{k-1}^2\big) = 1$$

Continuity: $a^k_0 + a^k m_k = a^{k+1}_0 + a^{k+1} m_k$ for $k = 1, \ldots, N-1$

Log-concavity, convexity, monotonicity can be imposed similarly

82 / 126
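Putting slides 80-82 together, a hedged end-to-end sketch (mine, not the XSPL/expepi toolboxes): maximize the log-likelihood over the epi-parameters $a^k_0, a^k$ subject to the nonnegativity, integrate-to-one, and continuity constraints above. The exponential sample, the mesh, and the added nonincreasing constraint are assumptions.

```python
# Hedged sketch: max-likelihood first-order epi-spline density via cvxpy.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100)          # observations x^1, ..., x^nu

N = 20
m = np.linspace(0.0, x.max() + 0.5, N + 1)        # mesh m_0 < m_1 < ... < m_N
a0, a1 = cp.Variable(N), cp.Variable(N)           # epi-parameters a^k_0, a^k

k = np.clip(np.searchsorted(m, x) - 1, 0, N - 1)  # interval index k_i of each x^i
loglik = cp.sum(cp.log(a0[k] + cp.multiply(a1[k], x)))   # sum_i log h(x^i)

constraints = [
    a0 + cp.multiply(a1, m[:-1]) >= 0,            # h >= 0 at left endpoints
    a0 + cp.multiply(a1, m[1:]) >= 0,             # h >= 0 at right endpoints
    cp.sum(cp.multiply(a0, np.diff(m))            # integral of h equals one
           + cp.multiply(a1, (m[1:] ** 2 - m[:-1] ** 2) / 2)) == 1,
    a0[:-1] + cp.multiply(a1[:-1], m[1:-1])       # continuity at interior
        == a0[1:] + cp.multiply(a1[1:], m[1:-1]), # mesh points
    a1 <= 0,                                      # soft info: nonincreasing h
]
cp.Problem(cp.Maximize(loglik), constraints).solve()
print(np.round(a0.value, 3), np.round(a1.value, 3))
```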

Soft information (cont.)

Deconvolution $Y = X + W$: Given densities $h_W$ and $h_Y$,

$$h_Y(y) = \int h(x)\, h_W(y - x)\, dx = \sum_{k=1}^{N} a^k_0 \int_{m_{k-1}}^{m_k} h_W(y - x)\, dx + \sum_{k=1}^{N} a^k \int_{m_{k-1}}^{m_k} x\, h_W(y - x)\, dx \quad \text{for all } y$$

Inverse problem $Y = g(X)$: Given $v_q = \int y^q h_Y(y)\, dy$, $q = 1, 2, \ldots$,

$$v_q = \int [g(x)]^q h(x)\, dx \approx \sum_{k=1}^{N} \sum_{j=1}^{M_k} w^j_k\, [g(x^j_k)]^q\, \big(a^k_0 + a^k x^j_k\big)$$

Convex optimization problems

83 / 126
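A hedged numerical sketch of the deconvolution constraint (my own, assuming $W \sim N(0, \sigma^2)$): the per-interval convolution integrals have closed forms for normal noise, so $\int h(x)\, h_W(y - x)\, dx$ is linear in the epi-parameters and enters the convex program as linear constraints. The mesh, the tolerance, and the placeholder $h_Y$ are assumptions; a real run would use a separately estimated $h_Y$, as on slide 108.

```python
# Hedged sketch: deconvolution soft information as linear constraints (cvxpy).
import numpy as np
from scipy.stats import norm
import cvxpy as cp

sigma, N = 1.0, 30
m = np.linspace(0.0, 15.0, N + 1)              # mesh for the density h of X
a0, a1 = cp.Variable(N), cp.Variable(N)        # epi-parameters of h

def conv_row(y):
    """Coefficients (c0, c1) with  int h(x) hW(y-x) dx = c0 @ a0 + c1 @ a1."""
    cdf = norm.cdf(y - m, scale=sigma)
    pdf = norm.pdf(y - m, scale=sigma)
    c0 = cdf[:-1] - cdf[1:]                    # int_{m_{k-1}}^{m_k} hW(y-x) dx
    c1 = y * c0 + sigma**2 * (pdf[:-1] - pdf[1:])  # int x hW(y-x) dx, per piece
    return c0, c1

hY_hat = lambda y: norm.pdf(y - 5.0, scale=3.0)    # placeholder for estimated hY
deconv = []
for y in np.linspace(-5.0, 15.0, 101):
    c0, c1 = conv_row(y)
    deconv.append(cp.abs(hY_hat(y) - (c0 @ a0 + c1 @ a1)) <= 0.005)
# ... combine with the likelihood objective and density constraints above.
```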

Software, references

◮ Papers and tutorials: http://faculty.nps.edu/joroyset

◮ Software for univariate probability density estimation

◮ Matlab toolbox: http://faculty.nps.edu/joroyset/XSPL.html
◮ R toolbox (S. Buttrey): http://faculty.nps.edu/sebuttre/home/Software/expepi/index.html

84 / 126

Outline

◮ Illustrations and formulations (slides 12-33)

◮ Framework (slides 34-60)

◮ Epi-splines and approximations (slides 61-77)

◮ Implementations (slides 78-85)

◮ Examples (slides 86-124)

85 / 126

Examples: forecasting and fitting

86 / 126

Copper price forecast

Stochastic differential equation: estimate drift, volatility

Steps:

◮ identify discount factor curve using epi-splines and futures market information ⟹ future spot prices

◮ identify drift using epi-splines, historical data, future spot prices, etc.

◮ identify volatility using epi-splines and observed/estimated errors

87 / 126

Copper price forecast (cont.)

[Figure: copper price forecast; price (USD) vs. months; real prices with E(hist+market) and E(hist+market) ± σ]

88 / 126

Surface reconstruction

$f(x) = (\cos(\pi x_1) + \cos(\pi x_2))^3$ on $[-3, 3]^2$

continuity; second-order epi-splines; $N = 400$; 900 uniform points

[Figure: actual function and epi-spline approximation]

89 / 126

Surface reconstruction

$f(x) = \sin(\pi\|x\|)/(\pi\|x\|)$ for $x \in [-5, 5]^2 \setminus \{0\}$, $f(0) = 1$

cont. diff.; second-order epi-splines; $N = 225$; 600 random points

[Figure: actual function and epi-spline approximation]

90 / 126

Examples: density estimation

91 / 126

Examples of probability density estimation

Find a density $h$ that maximizes the log-likelihood of the observations and satisfies the soft information

Examples: earthquake losses, queuing, robustness, deconvolution, UQ, mixture, bivariate densities

92 / 126

Earthquake losses

93 / 126

Earthquake losses, Vancouver Region*

Comprehensive damage model (284 random variables); 100,000 simulations of 50-year loss

[Figure: Vancouver metropolitan region]

*Data from Mahsuli, 2012; see also Mahsuli & Haukaas, 2013

94 / 126

Earthquake losses (cont.)

Probability density of 50-year loss (billion CAD)

[Figure: estimated density of the 50-year loss]

95 / 126

Use fewer simulations?

Using 100 simulations only and a kernel estimator

[Figure: kernel density estimate of the 50-year loss from 100 simulations]

96 / 126

Using function identification approach

Max likelihood of 100 simulations; second-order exp. epi-splines

Eng. knowledge: nonincreasing, smooth, nonnegative support

[Figure: loss densities; legend: 100,000 sample, 100 sample, 100 kernel]

97 / 126

Using function identification approach (cont.)

Varying number of simulations; same soft information

[Figure: loss densities from 100,000, 10,000, 1,000, 100, and 10 samples]

98 / 126

Using function identification approach (cont.)

30 meta-replications of 100 simulations

[Figure: loss densities across meta-replications]

                       100,000 sim.   100 sim.
mean                        3.2       3.1 (±1.3)
average 10% highest        10.6       10.3 (±4.4)

99 / 126

Using function identification approach (cont.)

Also pointwise Fisher info.: $h'(y)/h(y) \in [-0.5, -0.1]$

[Figure: loss densities across meta-replications]

                       100,000 sim.   100 sim.
mean                        3.2       3.2 (±1.2)
average 10% highest        10.6       10.6 (±4.0)

100 / 126

Using function identification approach (cont.)

Also pointwise Fisher info.: $h'(x)/h(x) \in [-0.35, -0.25]$

[Figure: loss densities across meta-replications]

                       100,000 sim.   100 sim.
mean                        3.2       3.3 (±0.5)
average 10% highest        10.6       10.9 (±1.7)

101 / 126

Building robustness

102 / 126

Diversity of estimates using KL-divergence

Returning to the exponential density

Continuously diff., nonincreasing, nonnegative support

[Figure: original estimate, empirical density, true density, and estimates at KL = 0.001, 0.01, 0.1]

103 / 126

Queuing

104 / 126

M/M/1; 50% of customers delayed for a fixed time

$X$ = customer time-in-service; 100 obs.; exp. epi-spline

Soft info: lsc, $X \ge 0$, pointwise Fisher, unimodal upper tail

[Figure: density $h(x)$; true density, epi-spline estimate, kernel estimate, empirical density]

105 / 126

Deconvolution

106 / 126

True density Gamma(5,1)

First-order epi-spline on $[0, 23]$, $N = 1000$; three sample points

Contin., unimodal, convex tails, bounds on gradient jumps

[Figure: true density of $X$, epi-spline estimate, sample data]

107 / 126

Also noisy observations

5000 observations of $Y = X + W$

$W$ independent normal noise; mean 0, stdev 3.2

Estimate $h_Y$ separately

$\big| h_Y(y) - \int h(x)\, h_W(y - x)\, dx \big| \le 0.005$ for 101 $y$ points

[Figure: true density of $X$, epi-spline estimate, sample data, error density]

108 / 126

Uncertainty quantification

109 / 126

Dynamical system

Recall the true density of the amplitude

[Figure: density of $X$]

110 / 126

Density of amplitude

Sample size 100; cont. diff.; unimodal tails, exp. epi-splines

[Figure: true density, epi-spline estimate, kernel estimate, sample]

111 / 126

Gradient information

Gradient information for bijective $g : \mathbb{R} \to \mathbb{R}$

Recall: If $X = g(V)$, then

$$h(x) = h_V\big(g^{-1}(x)\big) / \big|g'\big(g^{-1}(x)\big)\big|$$

Present context, without a bijection, and with data $x^i = g(v^i)$, $g'(v^i)$:

$$h(x^i) \ge \frac{h_V(v^i)}{|g'(v^i)|}$$

Value of the pdf is bounded from below at $x^i$

112 / 126
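A small sketch (mine) of how this bound enters the convex program: for a first-order epi-spline, $h(x^i)$ is affine in the epi-parameters, so each data point contributes one linear lower-bound constraint. The mesh `m` and variables `a0`, `a1` are as in the implementation section.

```python
# Hedged sketch: gradient information as linear lower bounds on h (cvxpy).
import numpy as np
import cvxpy as cp

def gradient_info_constraints(m, a0, a1, x, lb):
    """Constraints h(x^i) >= lb_i, where lb_i = hV(v^i) / |g'(v^i)|."""
    k = np.clip(np.searchsorted(m, x) - 1, 0, a0.shape[0] - 1)
    return [a0[k] + cp.multiply(a1[k], x) >= lb]
```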

Gradient information (cont.)

Sample size 20

[Figure: true density, epi-spline estimate, kernel estimate, sample]

113 / 126

Fluid dynamics

Drag/lift estimates:

High-fidelity RANSE solves (each 4 hours on 8 cores)

Low-fidelity potential flow solves (each 5 sec on 1 core)

114 / 126

Predicting high-fidelity performance

898 high- and low-fidelity solves

[Figure: drag/lift using high-fidelity solver vs. drag/lift using low-fidelity solver]

Learn from low-fidelity solves and avoid (many) high-fidelity solves

115 / 126

Density using high or low solves

Exponential epi-splines of second order; mesh $N = 50$

Soft info: log-concavity and bounds on second-order derivatives

[Figure: densities of the response; legend: high 898, sample, low 800]

116 / 126

Estimating conditional error

$X$ = high-fidelity; $Y$ = low-fidelity

$$h_X(x) = \int h_{X|Y}(x|y)\, h_Y(y)\, dy$$

Normal linear least-squares regression model on training data of size 50 → $h_{X|Y}(x|y)$

[Figure: densities of the response; legend: high 898, cond ls, sample, low 800]

117 / 126

Information fusion

Max likelihood using 10 high-fidelity simulations

$$0.5\, h_X(x) \le \int h_{X|Y}(x|y)\, h_Y(y)\, dy \le 1.5\, h_X(x)$$

[Figure: densities of the response; legend: high 898, cond ls, sample, high 10 + cond, low 800]

118 / 126

Only 10 high-fidelity

[Figure: densities of the response; legend: high 898, cond ls, sample, high 10 + cond, high 10, low 800]

119 / 126

Uniform mixture density

120 / 126

Uniform mixture density

Sample size 100; lsc, slope constraints; exp. epi-splines

[Figure: true density, epi-spline estimate, kernel estimate, empirical density]

121 / 126

Bivariate normal density

122 / 126

Bivariate normal probability density

123 / 126

Bivariate normal probability density

Curvature, log-concave, 25 sample points, exp. epi-spline

124 / 126

Conclusions

◮ Function identification problems: rich class

◮ lsc functions provide modeling flexibility

◮ Epi-convergence allows evolution of info./approx.

◮ Epi-splines central approximation tools

125 / 126

References

◮ Singham, Royset & Wets, Density estimation of simulation output using exponential epi-splines

◮ Royset, Sukumar & Wets, Uncertainty quantification using exponential epi-splines

◮ Royset & Wets, From data to assessments and decisions: epi-spline technology

◮ Royset & Wets, Fusion of hard and soft information in nonparametric density estimation

◮ Royset & Wets, Multivariate epi-splines and evolving function identification problems

◮ Feng, Rios, Ryan, Spurkel, Watson, Wets & Woodruff, Toward scalable stochastic unit commitment - Part 1

◮ Rios, Wets & Woodruff, Multi-period forecasting with limited information and scenario generation with limited data

◮ Royset & Wets, On function identification problems

126 / 126