How to Make the Most Out of Evolving Information: Function Identification Using Epi-Splines
Johannes O. Royset
Operations Research, NPS, in collaboration with L. Bonfiglio, S. Buttrey, S. Brizzolara, J. Carbaugh, I. Rios, D. Singham, K. Spurkel, G. Vernengo, J.-P. Watson, R. Wets, D. Woodruff
Smart Energy and Stochastic Optimization, ENPC ParisTech, June 2015
This material is based upon work supported in part by the U.S. Army Research Laboratory and the U.S. Army Research Office under grants 00101-80683, W911NF-10-1-0246 and W911NF-12-1-0273, as well as by DARPA under HR0011-14-1-0060
1 / 126
Goal: estimate, predict, forecast
2 / 126
Copper prices
[figure: copper price (USD) versus months]
3 / 126
Copper prices
[figure: copper price (USD) versus months; real prices, E(hist+market), and E(hist+market) ± σ]
4 / 126
Hourly electricity loads
5 / 126
Hard information: data
◮ Observations
◮ Usually scarce, excessive, corrupted, uncertain
6 / 126
Soft information
◮ Indirect observations, external information
◮ Knowledge about
◮ structures
◮ established “laws”
◮ physical restrictions
◮ shapes, smoothness of curves
7 / 126
Evolving information
data acquisition, information growth
8 / 126
Function Identification Problem
Identify a function that
best represents all available information
9 / 126
Applications of function identification approach
◮ energy
◮ natural resources
◮ financial markets
◮ image reconstruction
◮ uncertainty quantification
◮ variograms
◮ nonparametric regression
◮ deconvolution
◮ density estimation
10 / 126
Outline
◮ Illustrations and formulations (slides 12-33)
◮ Framework (slides 34-60)
◮ Epi-splines and approximations (slides 61-77)
◮ Implementations (slides 78-85)
◮ Examples (slides 86-124)
11 / 126
Illustrations and formulations
12 / 126
Identifying financial curves
Estimate discount factor curve given instruments $i = 1, 2, \ldots, I$ and payments $p^i_0, p^i_1, \ldots, p^i_{N_i}$ at times $0 = t^i_0, t^i_1, \ldots, t^i_{N_i}$

Identify nonnegative, nonincreasing function $f$ with $f(0) = 1$:

$$\sum_{j=0}^{N_i} f(t^i_j)\, p^i_j \approx 0 \quad \text{for all } i$$

Constrained infinite-dimensional optimization problem:

$$\text{minimize } \sum_{i=1}^{I} \Bigl| \sum_{j=0}^{N_i} f(t^i_j)\, p^i_j \Bigr| \quad \text{such that } f(0) = 1,\ f \ge 0,\ f' \le 0$$
13 / 126
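A minimal numerical sketch of this formulation, assuming two made-up instruments whose payment dates lie on a fixed mesh, and with a piecewise-linear discount curve standing in for the epi-spline machinery developed later; the absolute-value criterion and the linear shape constraints make the discretized problem a linear program.

```python
# Hypothetical sketch: fit a piecewise-linear discount curve f on a mesh,
# minimizing sum_i |sum_j f(t_ij) p_ij| s.t. f(0) = 1, f >= 0, f nonincreasing.
# The instrument data below is made up for illustration.
import numpy as np
from scipy.optimize import linprog

grid = np.linspace(0.0, 2.0, 5)             # mesh 0, 0.5, 1, 1.5, 2 (years)
# instruments: (indices into grid, payments); payment dates lie on the mesh
instruments = [
    (np.array([0, 2]),    np.array([-0.97, 1.00])),        # 1y zero-coupon
    (np.array([0, 2, 4]), np.array([-1.01, 0.05, 1.05])),  # 2y coupon bond
]
K, I = len(grid), len(instruments)

# variables x = (f_0, ..., f_{K-1}, u_1, ..., u_I); minimize sum_i u_i
c = np.concatenate([np.zeros(K), np.ones(I)])

A_ub, b_ub = [], []
for i, (idx, pay) in enumerate(instruments):
    row = np.zeros(K + I)
    row[idx] = pay                          #  sum_j f(t_ij) p_ij - u_i <= 0
    row[K + i] = -1.0
    A_ub.append(row.copy()); b_ub.append(0.0)
    row[idx] = -pay                         # -sum_j f(t_ij) p_ij - u_i <= 0
    A_ub.append(row); b_ub.append(0.0)
for k in range(K - 1):                      # nonincreasing: f_{k+1} - f_k <= 0
    row = np.zeros(K + I)
    row[k], row[k + 1] = -1.0, 1.0
    A_ub.append(row); b_ub.append(0.0)

A_eq = np.zeros((1, K + I)); A_eq[0, 0] = 1.0    # f(0) = 1
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * (K + I))      # f >= 0, u >= 0
print("discount factors at mesh points:", res.x[:K].round(4))
```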
Day-ahead electricity load forecast
Data: predicted temperature, dew point; observed load
14 / 126
Day-ahead load forecast problem
Using relevant data and the weather forecast for tomorrow, determine tomorrow’s load curve and its uncertainty
◮ $j = 1, 2, \ldots, J$: days in data set
◮ $h = 1, 2, \ldots, 24$: hours of the day
◮ $t^j_h$: predicted temperature in hour $h$ of day $j$
◮ $d^j_h$: predicted dew point in hour $h$ of day $j$
◮ $l^j_h$: actual observed load in hour $h$ of day $j$
15 / 126
Day-ahead load forecast problem (cont.)
“Functional” regression model:

$$l^j_h = f^{tmp}(h)\, t^j_h + f^{dpt}(h)\, d^j_h + e^j_h$$

◮ regression “coefficients” $f^{tmp}$ and $f^{dpt}$ are functions of time
◮ $e^j_h$: error between the observed and predicted loads

Regression problem: find functions $f^{tmp}$ and $f^{dpt}$ on $[0, 24]$ that

$$\text{minimize } \sum_{j=1}^{J} \sum_{h=1}^{24} \Bigl| l^j_h - \bigl[f^{tmp}(h)\, t^j_h + f^{dpt}(h)\, d^j_h\bigr] \Bigr|$$

subject to smoothness, curvature conditions
Constrained infinite-dimensional optimization problem
16 / 126
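A rough sketch of the discretized regression on synthetic data; a least-squares criterion with a second-difference penalty stands in for the absolute-value criterion and the smoothness constraints above, so a single linear least-squares solve suffices.

```python
# Hypothetical sketch: estimate hourly coefficient "functions" f_tmp, f_dpt
# from synthetic data; least squares + second-difference smoothness penalty
# stand in for the absolute-value criterion and smoothness constraints.
import numpy as np

rng = np.random.default_rng(0)
J, H = 60, 24                                     # days, hours
t = 70 + 10 * rng.standard_normal((J, H))         # predicted temperature
d = 55 + 8 * rng.standard_normal((J, H))          # predicted dew point
a_true = 0.8 + 0.2 * np.sin(np.arange(H) * np.pi / 12)   # "true" f_tmp
b_true = 0.3 + 0.1 * np.cos(np.arange(H) * np.pi / 12)   # "true" f_dpt
l = a_true * t + b_true * d + rng.standard_normal((J, H))  # observed load

# design matrix: columns 0..H-1 carry f_tmp(h), columns H..2H-1 carry f_dpt(h)
A = np.zeros((J * H, 2 * H))
for h in range(H):
    A[h::H, h] = t[:, h]                          # rows j*H + h, hour h
    A[h::H, H + h] = d[:, h]

D = np.zeros((H - 2, H))                          # second-difference operator
for k in range(H - 2):
    D[k, k:k + 3] = [1.0, -2.0, 1.0]
lam = 50.0                                        # smoothness weight
Z = np.zeros_like(D)
A_full = np.vstack([A, lam * np.hstack([D, Z]), lam * np.hstack([Z, D])])
y_full = np.concatenate([l.reshape(-1), np.zeros(2 * (H - 2))])

coef, *_ = np.linalg.lstsq(A_full, y_full, rcond=None)
f_tmp, f_dpt = coef[:H], coef[H:]
print(np.round(f_tmp[:4], 2))                     # should track a_true early on
```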
Day-ahead load forecast problem (cont.)
Given temperature $t$ and dew point $d$ forecasts (functions of time) for tomorrow,
the load forecast becomes $f^{tmp}(h)\, t(h) + f^{dpt}(h)\, d(h)$, $h \in [0, 24]$
But, what about uncertainty?
17 / 126
Estimation of errors
For hour h:
minimized errors $e^j_h = l^j_h - \bigl[f^{tmp}(h)\, t^j_h + f^{dpt}(h)\, d^j_h\bigr]$, $j = 1, \ldots, J$
= samples from actual error density
But that wouldn’t capture the uncertainty!
[figure: error density one would expect at hour h]
Probability density estimation problem with very small data set
18 / 126
Estimation of errors (cont.)
Probability density estimation problem:
Given iid sample $x^1, x^2, \ldots, x^\nu$,
find a function $h \in \mathcal{H}$ that maximizes $\prod_{i=1}^{\nu} h(x^i)$
$\mathcal{H}$ = constraints on $h$: $h \ge 0$, $\int h(x)\, dx = 1$, and more
Constrained infinite-dimensional optimization problem
19 / 126
Estimation of errors (cont.)
20 / 126
Generating load scenarios
Also need to consider conditioning:
◮ if load above regression function early, then likely above later
◮ error data for density estimation reduced further
21 / 126
Forecast precision (Connecticut 2010-12)
Season   Mean % error
Fall     3.99
Spring   2.73
Summer   4.19
Winter   3.47
22 / 126
Uncertainty quantification (UQ)
Engineering, biological, physical systems
◮ Input: random vector V (“known” distribution)
◮ System function g ; implicitly defined e.g. by simulation
◮ Output: random variable
X = g(V )
23 / 126
Illustration of UQ challenges
Output amplitude of dynamical system:
[figure: density of X versus x]
How to estimate densities like this with a small sample?
24 / 126
Probability density estimation
Sample $X^1, \ldots, X^\nu$: maximize $\prod_{i=1}^{\nu} h(X^i)$ s.t. $h \in \mathcal{H}^\nu \subset \mathcal{H}$
◮ Sample size might be growing
◮ Constraint set Hν might be evolving
25 / 126
Evolving probability density estimation
maximize $\prod_{i=1}^{\nu} \bigl(h(X^i)\bigr)^{1/\nu}$ s.t. $h \in \mathcal{H}^\nu \subset \mathcal{H}$

maximize $\frac{1}{\nu} \sum_{i=1}^{\nu} \log h(X^i)$ s.t. $h \in \mathcal{H}^\nu \subset \mathcal{H}$

actual problem: maximize $E[\log h(X)]$ s.t. $h \in H \subset \mathcal{H}$
26 / 126
Evolving constraints
Constraints: nonnegative support, smooth, curvature bound
[figure: density versus x; true density, epi-spline estimate, kernel estimate, empirical density]
MSE = 0.2765 (epi-spline) and 0.3273 (kernel)
27 / 126
Evolving constraints (cont.)
Also log-concave
[figure: density versus x; true density, epi-spline estimate, kernel estimate, empirical density]
MSE = 0.1144 (epi-spline) and 0.3273 (kernel)
28 / 126
Evolving constraints (cont.)
Also nonincreasing
[figure: density versus x; true density, epi-spline estimate, kernel estimate, empirical density]
MSE = 0.0470 (epi-spline) and 0.3273 (kernel)
29 / 126
Evolving constraints (cont.)
Also slope bounds
[figure: density versus x; true density, epi-spline estimate, kernel estimate, empirical density]
MSE = 0.0416 (epi-spline) and 0.3273 (kernel)
30 / 126
Take-aways so far
◮ Challenges in forecasting and estimation are function identification problems
◮ Day-ahead forecasting of electricity demand involves
◮ functional regression to get trend
◮ probability density estimation to get errors
◮ Soft information supplements hard information (data)
31 / 126
Questions?
32 / 126
Outline
◮ Illustrations and formulations (slides 12-33)
◮ Framework (slides 34-60)
◮ Epi-splines and approximations (slides 61-77)
◮ Implementations (slides 78-85)
◮ Examples (slides 86-124)
33 / 126
Framework
34 / 126
Actual problem
◮ Constrained infinite-dimensional optimization problem
min $\psi(f)$
such that $f \in F \subseteq \mathcal{F} \subseteq$ some function space
◮ criterion $\psi$ (norms, error measures, etc.)
◮ feasible set $F$ (shape restrictions, external info)
◮ subspace of interest $\mathcal{F}$ (continuity, smoothness)
35 / 126
Special case: linear regression
[figure: food expenditure versus household income, with 0.75-superquantile, 0.75-quantile, and least-squares regression lines]
find affine $f$ that minimizes $\sum_j (y^j - f(x^j))^2$ or other error measures
36 / 126
Special case: interpolation
[figure: data with smoothing and interpolating splines]
min $\|f''\|^2$
such that $f(x^j) = y^j$ for all $j$ and $f \in$ Sobolev space
37 / 126
Special case: smoothing
[figure: data with smoothing and interpolating splines]
min $\sum_j (y^j - f(x^j))^2 + \lambda \|f''\|^2$
such that $f \in$ Sobolev space
38 / 126
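Both special cases are available in standard tools; a small sketch on synthetic data, assuming a natural cubic spline for the interpolation problem and scipy's UnivariateSpline for the smoothing problem (its parameter s is a reparametrization of the λ trade-off).

```python
# Sketch of the two special cases above on synthetic data: a natural cubic
# interpolating spline and a smoothing spline (scipy's s parameter bounds the
# residual sum, a reparametrized form of the lambda trade-off).
import numpy as np
from scipy.interpolate import CubicSpline, UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

interp = CubicSpline(x, y, bc_type="natural")   # f(x^j) = y^j exactly
smooth = UnivariateSpline(x, y, s=0.2)          # trades data fit for smoothness

print(float(interp(x[7]) - y[7]))               # 0.0: interpolation
print(float(smooth(x[7]) - y[7]))               # generally nonzero: smoothing
```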
Evolving problems
◮ Evolving infinite-dimensional optimization problems
min $\psi^\nu(f)$
such that $f \in F^\nu \subseteq \mathcal{F} \subseteq$ some function space
◮ evolving/approximating criterion $\psi^\nu$
◮ evolving/approximating feasible set $F^\nu$
39 / 126
Function space
extended real-valued lower semicontinuous functions on IR^n
[figure: a lower semicontinuous function f]
(lsc-fcns(IR^n), dl): complete separable metric space, dl = epi-distance
41 / 126
Modeling possibilities with lsc functions
◮ functions with jumps and high growth
◮ response surface building with implicit constraints
◮ system identification with subsequent minimization
◮ nonlinear transformations requiring ±∞
◮ functions with unbounded domain
42 / 126
Jumps
Fits by polynomial and lsc function
43 / 126
Nonlinear transformations
Recall: probability density estimation with sample $X^1, \ldots, X^\nu$:
maximize $\frac{1}{\nu} \sum_{i=1}^{\nu} \log h(X^i)$ s.t. $h \in \mathcal{H}^\nu \subset \mathcal{H}$, including $h \ge 0$

Exponential transformation: $h(x) = \exp(-s(x))$

minimize $\sum_{i=1}^{\nu} s(X^i)$ s.t. $s \in S^\nu \subset S$, excluding nonnegativity
◮ $h$ log-concave ⇐⇒ $s$ convex
◮ If $s = \langle c(\cdot), r \rangle$, then certain expressions are linear in $r$
But need $s$ to take on ∞ for $h = \exp(-s(\cdot))$ to vanish
45 / 126
Identifying functions with unbounded domains
Complications for “standard” spaces
Is $f^\nu$ near $f^0$?
[figure: $f^\nu$ and $f^0$]
Not in $L^p$-norm, but epi-distance ≤ 1/ν
46 / 126
Epi-graph
[figure: a function f]
48 / 126
Epi-graph
[figure: the function f and its epi-graph, epi f]
49 / 126
Geometric view of lsc functions
$f \in$ lsc-fcns(IR^n) ⇐⇒ epi $f$ nonempty closed subset of IR^{n+1}
Notation:
◮ $\rho I\!B = I\!B(0, \rho)$ = origin-centered ball with radius $\rho$
◮ $d(y, S) = \inf_{y' \in S} \|y - y'\|$ = distance between $y$ and $S$
50 / 126
Distances between epi-graphs
[figure: epi f and epi g]
51 / 126
Distances between epi-graphs (cont.)
ρ-epi-distance: $dl_\rho(f, g) = \sup_{\bar x \in \rho I\!B} \bigl| d(\bar x, \text{epi}\, f) - d(\bar x, \text{epi}\, g) \bigr|$, $\rho \ge 0$
[figure: functions f and g]
52 / 126
Distances between epi-graphs (cont.)
[figure: epi f and epi g, with a point $\bar x$ attaining $dl_\rho(f, g) = \delta$]
53 / 126
Distances between epi-graphs (cont.)
[figure: the distances $d(\bar x, \text{epi}\, f)$ and $d(\bar x, \text{epi}\, g)$ from a point $\bar x$]
54 / 126
Metric on the lsc functions
ρ-epi-distance $dl_\rho$ is a pseudo-metric on lsc-fcns(IR^n)

epi-distance: $dl(f, g) = \int_0^\infty dl_\rho(f, g)\, e^{-\rho}\, d\rho$

$dl$ is a metric on lsc-fcns(IR^n)
The epi-distance induces the epi-topology on lsc-fcns(IR^n); (lsc-fcns(IR^n), dl) is a complete separable metric space (Polish)
55 / 126
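A discretized illustration (not a practical algorithm) of the ρ-epi-distance for univariate functions, assuming each epi-graph can be approximated by a truncated point cloud in IR^2:

```python
# Sketch: approximate dl_rho(f, g) for univariate f, g by replacing each
# (truncated) epi-graph with a point cloud and taking the sup of
# |d(xbar, epi f) - d(xbar, epi g)| over a sampled ball of radius rho.
import numpy as np

def epi_cloud(f, xs, y_cap):
    """Points (x, y) with f(x) <= y <= y_cap, approximating epi f."""
    cols = [np.column_stack([np.full(40, x), np.linspace(f(x), y_cap, 40)])
            for x in xs]
    return np.vstack(cols)

def dl_rho(f, g, rho, x_lim=3.0, y_cap=12.0):
    xs = np.linspace(-x_lim, x_lim, 61)
    Ef, Eg = epi_cloud(f, xs, y_cap), epi_cloud(g, xs, y_cap)
    ang = np.linspace(0.0, 2 * np.pi, 36, endpoint=False)
    rad = np.linspace(0.0, rho, 9)
    test = np.array([[r * np.cos(a), r * np.sin(a)] for r in rad for a in ang])
    def dist(P, Q):  # d(p, Q) for every p in P (brute force)
        return np.sqrt(((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)).min(axis=1)
    return np.abs(dist(test, Ef) - dist(test, Eg)).max()

f = lambda x: x ** 2
g = lambda x: x ** 2 + 0.5          # epi g is epi f shifted up by 0.5
print(round(dl_rho(f, g, rho=2.0), 2))   # approximately 0.5
```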
Epi-convergence
$f^\nu \in$ lsc-fcns(IR^n) epi-converges to $f$ if $dl(f^\nu, f) \to 0$
56 / 126
Actual problem
◮ Constrained infinite-dimensional optimization problem
min $\psi(f)$
such that $f \in F \subseteq \mathcal{F} \subseteq$ lsc-fcns(IR^n)
◮ criterion $\psi$ (norms, error measures, etc.)
◮ feasible set $F$ (shape restrictions, external info)
◮ subspace of interest $\mathcal{F}$ (continuity, smoothness)
57 / 126
Take-aways in this section
◮ Actual problem:
◮ find best extended real-valued lower semicontinuous function
◮ captures regression, curve fitting, interpolation, density estimation, etc.
◮ allows functions on IR^n, jumps, transformations requiring ±∞
◮ Evolving problem due to changes in information and approximations
◮ Theory about lsc functions:
◮ metric space under epi-distance
◮ epi-distance = distance between epi-graphs
58 / 126
Questions?
59 / 126
Outline
◮ Illustrations and formulations (slides 12-33)
◮ Framework (slides 34-60)
◮ Epi-splines and approximations (slides 61-77)
◮ Implementations (slides 78-85)
◮ Examples (slides 86-124)
60 / 126
Epi-splines and approximations
61 / 126
Epi-splines = piecewise polynomial + lower semicont.
Piecewise polynomial not automatic; must be constructed
[figure: piecewise-polynomial epi-spline s(x) on mesh m_1, m_2, ..., m_{N−1}]
Optimization over new class of piecewise polynomial functions
62 / 126
n-dim epi-splines
63 / 126
n-dim epi-splines (cont.)
Epi-spline $s$ of order $p$ defined on IR^n with partition $R = \{R_k\}_{k=1}^N$ is a real-valued function that
on each $R_k$, $k = 1, \ldots, N$, is a polynomial of total degree $p$, and
for every $x \in$ IR^n, has $s(x) = \liminf_{x' \to x} s(x')$

e-spl$^p_n(R)$ = epi-splines of order $p$ on IR^n with partition $R$
64 / 126
Evolving and approximating problems
min $\psi^\nu(s)$
such that $s \in S^\nu = F^\nu \cap \mathcal{S}^\nu$
◮ subspace of interest $\mathcal{S}^\nu \subseteq \text{e-spl}^{p^\nu}_n(R^\nu) \cap \mathcal{F}$
◮ evolving/approximate feasible set $F^\nu$
◮ evolving/approximate criterion $\psi^\nu$
65 / 126
Function identification framework
Evolving Problems: information growth; criterion and constraint approximations; function approximations
Actual Problem: complete information; infinite-dimensional; conceptual
[diagram: epi-spline solutions of the evolving problems approximate solutions of the actual problem; consistency, rates]
66 / 126
Convergence of evolving problems
Epi-convergence of functionals on a metric space:
Definition: $\{\psi^\nu : S^\nu \to \text{IR}\}_{\nu=1}^\infty$ epi-converge to $\psi : F \to \text{IR}$ if and only if
(i) for every $s^\nu \to f \in$ lsc-fcns(IR^n), with $s^\nu \in S^\nu$, we have $\liminf \psi^\nu(s^\nu) \ge \psi(f)$ if $f \in F$ and $\psi^\nu(s^\nu) \to \infty$ otherwise;
(ii) for every $f \in F$, there exists $\{s^\nu\}_{\nu=1}^\infty$, with $s^\nu \in S^\nu$, such that $s^\nu \to f$ and $\limsup \psi^\nu(s^\nu) \le \psi(f)$.
Equivalent to previous definition
Also say that the evolving optimization problems epi-converge to the actual problem.
67 / 126
Consequence of epi-convergence
If evolving problems epi-converge to the actual problem, then
$$\limsup \Bigl( \inf_{s \in S^\nu} \psi^\nu(s) \Bigr) \le \inf_{f \in F} \psi(f).$$

Moreover, if $s^k$ are optimal for the evolving problems $\min_{s \in S^{\nu_k}} \psi^{\nu_k}(s)$ and $s^k \to f^0$, then $f^0$ is optimal for the actual problem and

$$\lim_{k \to \infty} \inf_{s \in S^{\nu_k}} \psi^{\nu_k}(s) = \inf_{f \in F} \psi(f) = \psi(f^0).$$
68 / 126
When will evolving problems epi-converge to actual?
Actual: min $\psi(f)$ such that $f \in F \subseteq \mathcal{F} \subseteq$ lsc-fcns(IR^n)
Evolving: min $\psi^\nu(s)$ such that $s \in S^\nu = F^\nu \cap \mathcal{S}^\nu$
Theorem: A sufficient condition for epi-convergence is
◮ ψν converges continuously to ψ relative to F
◮ Epi-splines dense in lsc functions
◮ F ν set-converges to F
◮ F is solid
69 / 126
Epi-splines dense in lsc functions
$\{R^\nu\}_{\nu=1}^\infty$, with $R^\nu = \{R^\nu_1, \ldots, R^\nu_{N^\nu}\}$, is an infinite refinement if
for every $x \in$ IR^n and $\epsilon > 0$, there exists a positive integer $\bar\nu$ s.t. $R^\nu_k \subset I\!B(x, \epsilon)$ for every $\nu \ge \bar\nu$ and $k$ satisfying $x \in \text{cl}\, R^\nu_k$.

Theorem: For any $p = 0, 1, 2, \ldots$ and infinite refinement $\{R^\nu\}_{\nu=1}^\infty$,

$$\bigcup_{\nu=1}^\infty \text{e-spl}^p_n(R^\nu) \ \text{is dense in lsc-fcns(IR}^n\text{)}$$
70 / 126
Dense in continuous functions under simplex partitioning
Definition: A simplex $S$ in IR^n is the convex hull of $n+1$ points $x^0, x^1, \ldots, x^n \in$ IR^n, with $x^1 - x^0, x^2 - x^0, \ldots, x^n - x^0$ linearly independent.

Definition: A partition $R_1, R_2, \ldots, R_N$ of IR^n is a simplex partition of IR^n if $\text{cl}\, R_1, \ldots, \text{cl}\, R_N$ are “mostly” simplexes.
71 / 126
Dense in cnts fcns under simplex partitioning (cont.)
Theorem: For any $p = 1, 2, \ldots$ and $\{R^\nu\}_{\nu=1}^\infty$ an infinite refinement of IR^n consisting of simplex partitions of IR^n,

$$\bigcup_{\nu=1}^\infty \text{e-spl}^p_n(R^\nu) \cap C^0(\text{IR}^n) \ \text{is dense in}\ C^0(\text{IR}^n).$$
72 / 126
Rates in univariate probability density estimation
For convex and finite-dimensional estimation problem:
Theorem: For any “correct” soft information,
$\nu^{1/2}\, d_{KL}(h^0 \,\|\, h^\nu_\epsilon) = O_p(1)$ for some near-optimal solution $h^\nu_\epsilon$
73 / 126
Decomposition
Theorem: For every $s \in \text{e-spl}^p_n(R)$, with $n > p \ge 1$ and $R = \{R_k\}_{k=1}^N$, there exist $q_{k,i} \in \text{poly}^p(\text{IR}^{n-1})$, $i = 1, 2, \ldots, n$ and $k = 1, 2, \ldots, N$, such that

$$s(x) = \sum_{i=1}^{n} q_{k,i}(x_{-i}) \quad \text{for all } x \in R_k.$$

An n-dim epi-spline is the sum of n (n−1)-dim epi-splines
74 / 126
Take-aways in this section
◮ Epi-splines are piecewise polynomials defined on arbitrary partition of IR^n
◮ Only lower semicontinuity required (not smoothness)
◮ Dense in space of extended real-valued lower semicontinuous functions under epi-distance (i.e., epi-splines can approximate every lsc function to arbitrary accuracy)
◮ Solutions of evolving problem (in terms of epi-splines) are approximate solutions of actual problem
75 / 126
Questions?
76 / 126
Outline
◮ Illustrations and formulations (slides 12-33)
◮ Framework (slides 34-60)
◮ Epi-splines and approximations (slides 61-77)
◮ Implementations (slides 78-85)
◮ Examples (slides 86-124)
77 / 126
Implementations
78 / 126
Implementation considerations
◮ Selection of epi-spline order and composition h = exp(−s)
◮ Selection of partition: go fine!
◮ Implementation of criterion functional and soft info
Provide details for one-dimensional epi-splines of order 1
Focusing on criteria and constraints for probability densities
79 / 126
Implementation of criteria functionals
Probability densities $h \in \text{e-spl}^1(m)$, with mesh $m = \{m_0, m_1, \ldots, m_N\}$,
$-\infty < m_0 < m_1 < \cdots < m_N < \infty$
First-order epi-splines (piecewise linear): $h(x) = a^k_0 + a^k x$ for $x \in (m_{k-1}, m_k)$
80 / 126
Implementation of criteria functionals (cont.)
Log-likelihood for observations $x^1, \ldots, x^\nu$:

$$\log \prod_{i=1}^{\nu} h(x^i) = \sum_{i=1}^{\nu} \log\bigl(a^{k_i}_0 + a^{k_i} x^i\bigr), \quad \text{where } k_i \text{ is such that } x^i \in (m_{k_i-1}, m_{k_i})$$

Entropy:

$$-\int h(x) \log h(x)\, dx = -\sum_{k=1}^{N} \int_{m_{k-1}}^{m_k} \bigl(a^k_0 + a^k x\bigr) \log\bigl(a^k_0 + a^k x\bigr)\, dx$$

Both concave in the epi-parameters $a^k_0$ and $a^k$, $k = 1, \ldots, N$
81 / 126
Soft information
Nonnegativity: $h \ge 0$ if $a^k_0 + a^k m_{k-1} \ge 0$ and $a^k_0 + a^k m_k \ge 0$, $k = 1, \ldots, N$

Integrate to one:

$$\int h(x)\, dx = \sum_{k=1}^{N} a^k_0 (m_k - m_{k-1}) + \sum_{k=1}^{N} \frac{a^k}{2} \bigl(m_k^2 - m_{k-1}^2\bigr) = 1$$

Continuity: $a^k_0 + a^k m_k = a^{k+1}_0 + a^{k+1} m_k$ for $k = 1, \ldots, N-1$.

Log-concavity, convexity, monotonicity
82 / 126
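A minimal sketch of this first-order construction, parametrizing a continuous piecewise-linear density by its values at the mesh points (so the continuity constraint above holds by construction) and maximizing the log-likelihood; a generic SLSQP solver and made-up exponential data stand in for a dedicated convex solver and real observations.

```python
# Sketch: max-likelihood first-order epi-spline density, parametrized by its
# node values h_0, ..., h_N (continuity holds by construction). The objective
# is concave and all constraints are linear, so a convex solver would be the
# natural choice; SLSQP is used here for brevity.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
data = rng.exponential(scale=1.0, size=100)      # made-up sample
mesh = np.linspace(0.0, data.max() + 0.5, 21)    # m_0 < m_1 < ... < m_N
dm = np.diff(mesh)

def neg_loglik(h):                               # -sum_i log h(x^i)
    vals = np.interp(data, mesh, h)              # h is linear on each segment
    return -np.sum(np.log(np.maximum(vals, 1e-12)))

cons = [
    # integrate to one (trapezoid rule is exact for a piecewise-linear h)
    {"type": "eq", "fun": lambda h: np.sum(0.5 * (h[:-1] + h[1:]) * dm) - 1.0},
    # soft information: nonincreasing density, h_k - h_{k+1} >= 0
    {"type": "ineq", "fun": lambda h: h[:-1] - h[1:]},
]
h0 = np.full(mesh.size, 1.0 / (mesh[-1] - mesh[0]))   # feasible uniform start
res = minimize(neg_loglik, h0, method="SLSQP",
               bounds=[(0.0, None)] * mesh.size, constraints=cons)
print(np.round(res.x[:5], 3))                    # density values at first nodes
```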
Soft information (cont.)
Deconvolution $Y = X + W$: Given densities $h_W$ and $h_Y$,

$$h_Y(y) = \int h(x)\, h_W(y-x)\, dx = \sum_{k=1}^{N} a^k_0 \int_{m_{k-1}}^{m_k} h_W(y-x)\, dx + \sum_{k=1}^{N} a^k \int_{m_{k-1}}^{m_k} x\, h_W(y-x)\, dx \quad \text{for all } y$$

Inverse problem $Y = g(X)$: Given $v_q = \int y^q h_Y(y)\, dy$, $q = 1, 2, \ldots$

$$v_q = \int [g(x)]^q h(x)\, dx \approx \sum_{k=1}^{N} \sum_{j=1}^{M_k} w^j_k\, [g(x^j_k)]^q\, \bigl(a^k_0 + a^k x^j_k\bigr)$$

Convex optimization problems
83 / 126
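In the deconvolution constraint, the inner integrals depend only on the mesh and $h_W$, so they can be precomputed; a sketch assuming normal noise W, where both integrals have closed forms in terms of the normal cdf and pdf:

```python
# Sketch: precompute c0[k] = int_{m_{k-1}}^{m_k} h_W(y - x) dx and
# c1[k] = int_{m_{k-1}}^{m_k} x h_W(y - x) dx for normal W ~ N(0, sigma^2);
# the constraint h_Y(y) = sum_k (a0_k c0[k] + a_k c1[k]) is then linear in
# the epi-parameters, one row per chosen y point.
import numpy as np
from scipy.stats import norm

def deconv_coeffs(y, mesh, sigma):
    lo, hi = y - mesh[1:], y - mesh[:-1]      # limits after u = y - x
    c0 = norm.cdf(hi, scale=sigma) - norm.cdf(lo, scale=sigma)
    # int x h_W(y-x) dx = y*c0 - int_lo^hi u phi(u) du, and
    # int_a^b u phi_sigma(u) du = sigma^2 (phi_sigma(a) - phi_sigma(b))
    c1 = y * c0 - sigma**2 * (norm.pdf(lo, scale=sigma) - norm.pdf(hi, scale=sigma))
    return c0, c1

mesh = np.linspace(0.0, 23.0, 11)
c0, c1 = deconv_coeffs(5.0, mesh, sigma=3.2)
print(c0.round(4), c1.round(3))               # coefficients for y = 5.0
```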
Software, references
◮ Papers and tutorials: http://faculty.nps.edu/joroyset
◮ Software for univariate probability density estimation
◮ Matlab toolbox: http://faculty.nps.edu/joroyset/XSPL.html
◮ R toolbox (S. Buttrey): http://faculty.nps.edu/sebuttre/home/Software/expepi/index.html
84 / 126
Outline
◮ Illustrations and formulations (slides 12-33)
◮ Framework (slides 34-60)
◮ Epi-splines and approximations (slides 61-77)
◮ Implementations (slides 78-85)
◮ Examples (slides 86-124)
85 / 126
Examples: forecasting and fitting
86 / 126
Copper price forecast
Stochastic differential equation: estimate drift, volatility
Steps:
◮ identify discount factor curve using epi-splines and futures market information =⇒ future spot prices
◮ identify drift using epi-splines, historical data, future spot prices, etc.
◮ identify volatility using epi-splines and observed/estimated errors
87 / 126
Copper price forecast (cont.)
[figure: copper price (USD) versus months; real prices, E(hist+market), and E(hist+market) ± σ]
88 / 126
Surface reconstruction
$f(x) = (\cos(\pi x_1) + \cos(\pi x_2))^3$ on $[-3, 3]^2$
continuity; second-order epi-splines; N = 400; 900 uniform points
[figure: actual function and epi-spline approximation]
89 / 126
Surface reconstruction
$f(x) = \sin(\pi \|x\|)/(\pi \|x\|)$ for $x \in [-5, 5]^2 \setminus \{0\}$, $f(0) = 1$
cont. diff.; second-order epi-splines; N = 225; 600 random points
[figure: actual function and epi-spline approximation]
90 / 126
Examples: density estimation
91 / 126
Examples of probability density estimation
Find a density h that maximizes the log-likelihood of observations and satisfies soft information
Examples: earthquake losses, queuing, robustness, deconvolution,UQ, mixture, bivariate densities
92 / 126
Earthquake losses
93 / 126
Earthquake losses, Vancouver Region*
Comprehensive damage model (284 random variables)
100,000 simulations of 50-year loss
[figure: Vancouver metropolitan region]
*Data from Mahsuli, 2012; see also Mahsuli & Haukaas, 2013
94 / 126
Earthquake losses (cont.)
Probability density of 50-year loss (billion CAD)
[figure: density versus loss]
95 / 126
Use fewer simulations?
Using 100 simulations only and kernel estimator
[figure: kernel density estimate versus loss]
96 / 126
Using function identification approach
Max likelihood of 100 simulations; second-order exp. epi-splines
Eng. knowledge: nonincreasing, smooth, nonnegative support
[figure: density versus loss; 100,000-sample density, 100-sample epi-spline estimate, 100-sample kernel estimate]
97 / 126
Using function identification approach (cont.)
Varying number of simulations; same soft information
[figure: density versus loss; estimates from 100,000, 10,000, 1,000, 100, and 10 samples]
98 / 126
Using function identification approach (cont.)
30 meta-replications of 100 simulations
[figure: density versus loss]
                        100,000 sim.   100 sim.
mean                    3.2            3.1 (±1.3)
average 10% highest     10.6           10.3 (±4.4)
99 / 126
Using function identification approach (cont.)
Also pointwise Fisher info.: $h'(y)/h(y) \in [-0.5, -0.1]$
[figure: density versus loss]
                        100,000 sim.   100 sim.
mean                    3.2            3.2 (±1.2)
average 10% highest     10.6           10.6 (±4.0)
100 / 126
Using function identification approach (cont.)
Also pointwise Fisher info.: $h'(x)/h(x) \in [-0.35, -0.25]$
[figure: density versus loss]
                        100,000 sim.   100 sim.
mean                    3.2            3.3 (±0.5)
average 10% highest     10.6           10.9 (±1.7)
101 / 126
Building robustness
102 / 126
Diversity of estimates using KL-divergence
Returning to exponential density
Continuously diff., nonincreasing, nonnegative support
[figure: density versus x; original estimate, empirical density, true density, and estimates at KL = 0.001, 0.01, 0.1]
103 / 126
Queuing
104 / 126
M/M/1; 50% of customers delayed for fixed time
X = customer time-in-service; 100 obs.; exp. epi-spline
Soft info: lsc, X ≥ 0, pointwise Fisher, unimodal upper tail
[figure: density h(x) versus x; true density, epi-spline estimate, kernel estimate, empirical density]
105 / 126
Deconvolution
106 / 126
True density Gamma(5,1)
First-order epi-spline on [0, 23], N = 1000
Three sample points
Contin., unimodal, convex tails, bounds on gradient jumps
[figure: density versus x; true density of X, epi-spline estimate, sample data]
107 / 126
Also noisy observations
5000 observations of $Y = X + W$
W independent normal noise; mean 0, stdev 3.2
Estimate $h_Y$ separately
$\bigl| h_Y(y) - \int h(x)\, h_W(y-x)\, dx \bigr| \le 0.005$ for 101 y points
[figure: density versus x; true density of X, epi-spline estimate, sample data, error density]
108 / 126
Uncertainty quantification
109 / 126
Dynamical system
Recall the true density of the amplitude
[figure: density of X versus x]
110 / 126
Density of amplitude
Sample size 100; cont. diff.; unimodal tails, exp. epi-splines
[figure: density versus x; true density, epi-spline estimate, kernel estimate, sample]
111 / 126
Gradient information
Gradient information for bijective g : IR → IR
Recall: If $X = g(V)$, then

$$h(x) = h_V(g^{-1}(x)) \,/\, |g'(g^{-1}(x))|$$

Present context without a bijection and data $x^i = g(v^i)$, $g'(v^i)$:

$$h(x^i) \ge \frac{h_V(v^i)}{|g'(v^i)|}$$

Value of pdf bounded from below at $x^i$
112 / 126
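A tiny sketch of these pointwise bounds, assuming a made-up non-bijective map g(v) = v² with V standard normal; each simulation then contributes one linear lower-bound constraint on the density estimate.

```python
# Sketch: pointwise lower bounds on the output density from gradient
# information, for a made-up non-bijective map g(v) = v**2 (the density of
# X = g(V) sums over preimage branches, so one branch gives a lower bound).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
v = rng.standard_normal(20)                   # input sample, V ~ N(0,1)
x = v**2                                      # output data x^i = g(v^i)
gprime = 2.0 * v                              # g'(v^i), from the simulation
lb = norm.pdf(v) / np.abs(gprime)             # h(x^i) >= h_V(v^i)/|g'(v^i)|
for xi, b in sorted(zip(x, lb)):              # one linear constraint per point
    print(f"h({xi:6.3f}) >= {b:.3f}")
```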
Gradient information (cont.)
Sample size 20
[figure: density versus x; true density, epi-spline estimate, kernel estimate, sample]
113 / 126
Fluid dynamics
Drag/lift estimates:
High-fidelity RANSE solves (each 4 hours on 8 cores)
Low-fidelity potential flow solves (each 5 sec on 1 core)
114 / 126
Predicting high-fidelity performance
898 high- and low-fidelity solves
[figure: drag/lift from the high-fidelity solver versus drag/lift from the low-fidelity solver]
Learn from low-fidelity solves and avoid (many) high-fidelity solves
115 / 126
Density using high or low solves
Exponential epi-splines of second order; mesh N = 50
Soft info: log-concavity and bounds on second-order derivatives
[figure: density versus response; high 898, sample, low 800]
116 / 126
Estimating conditional error
X = high-fidelity; Y = low-fidelity

$$h_X(x) = \int h_{X|Y}(x|y)\, h_Y(y)\, dy$$

Normal linear least-squares regression model on training data of size 50 → $h_{X|Y}(x|y)$
[figure: density versus response; high 898, cond ls, sample, low 800]
117 / 126
Information fusion
Max likelihood using 10 high-fidelity simulations

$$0.5\, h_X(x) \le \int h_{X|Y}(x|y)\, h_Y(y)\, dy \le 1.5\, h_X(x)$$

[figure: density versus response; high 898, cond ls, sample, high 10 + cond, low 800]
118 / 126
Only 10 high-fidelity
[figure: density versus response; high 898, cond ls, sample, high 10 + cond, high 10, low 800]
119 / 126
Uniform mixture density
120 / 126
Uniform mixture density
Sample size 100; lsc, slope constraints; exp. epi-splines
[figure: density versus x; true density, epi-spline estimate, kernel estimate, empirical density]
121 / 126
Bivariate normal density
122 / 126
Bivariate normal probability density
123 / 126
Bivariate normal probability density
Curvature, log-concave, 25 sample points, exp. epi-spline
124 / 126
Conclusions
◮ Function identification problems: rich class
◮ lsc functions provide modeling flexibility
◮ Epi-convergence allows evolution of info./approx.
◮ Epi-splines central approximation tools
125 / 126
References
◮ Singham, Royset & Wets, Density estimation of simulation output using exponential epi-splines
◮ Royset, Sukumar & Wets, Uncertainty quantification using exponential epi-splines
◮ Royset & Wets, From data to assessments and decisions: epi-spline technology
◮ Royset & Wets, Fusion of hard and soft information in nonparametric density estimation
◮ Royset & Wets, Multivariate epi-splines and evolving function identification problems
◮ Feng, Rios, Ryan, Spurkel, Watson, Wets & Woodruff, Toward scalable stochastic unit commitment - Part 1
◮ Rios, Wets & Woodruff, Multi-period forecasting with limited information and scenario generation with limited data
◮ Royset & Wets, On function identification problems
126 / 126