
References:

1. Hasan, Ali, et al. "Learning Partial Differential Equations from Data Using Neural Networks." arXiv preprint arXiv:1910.10262 (2019).

2. Ogunmolu, Olalekan, et al. "Nonlinear systems identification using deep dynamic neural networks." arXiv preprint arXiv:1610.01439 (2016).

3. Qin, Tong, Kailiang Wu, and Dongbin Xiu. "Data driven governing equations approximation using deep neural networks." Journal of Computational Physics 395 (2019): 620-635.

4. Rudy, Samuel H., et al. "Data-driven discovery of partial differential equations." Science Advances 3.4 (2017): e1602614.

5. Sahoo, Subham S., Christoph H. Lampert, and Georg Martius. "Learning equations for extrapolation and control." arXiv preprint arXiv:1806.07259 (2018).

6. Schaeffer, Hayden. "Learning partial differential equations via data discovery and sparse optimization." Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473.2197 (2017): 20160446.

Consider the system of ODEs: $\dot{x}(t) = f(t, x(t))$,

where:

• $x(t) \in \mathbb{R}^d$ is the state vector at time $t$,

• $\dot{x}(t) \in \mathbb{R}^d$ is the velocity.

Assumptions:

• $f : \mathbb{R}^{1+d} \to \mathbb{R}^d$ is an unknown Lipschitz function.

• We observe $x(t)$ at time points $t_1, \dots, t_M$ and initial conditions $x_1(0), \dots, x_K(0)$, where $x_i(0) \in \mathbb{R}^d$.

Goal:

• Learn the function $f(\cdot)$ given a finite number of observations of the state vector $x(t)$, with no prior knowledge of the ODE system.

Elisa Negrini (WPI)

Advisors: Professors Luca Capogna (WPI) and Giovanna Citti (UNIBO)

In this work we reconstruct the right-hand side of a system of ODEs $\dot{x}(t) = f(t, x(t))$ directly from observed data, using a Feed Forward Network (FFN) with ReLU activation.

Since FFNs are universal approximators, we need no prior knowledge of the ODE system, in contrast with sparse regression approaches ([6]). Moreover, the recovered function remains a good approximation outside the training data.

We test our model on autonomous and non-autonomous systems of ODEs, with and without noise in the data.

• Rigorously explain the different stability behaviors with respect to noise in the data in low and high dimensions.

• Investigate the stability of the architecture: can we design problem-specific stable networks?

• Add Lipschitz and $L^2$ regularization terms to the loss to improve recovery from noisy real-world data (see the sketch after this list).

• Recovery of parabolic PDEs. Consider:

$u_t(x, t) = f(t, u, u_x, u_{xx})$

Given a finite number of observations of the state vector $u(x, t)$, we want to recover $f(\cdot)$.
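For the regularization item, a minimal sketch of what such a penalized loss could look like, assuming a PyTorch model; the penalty form and the weight `lam` are hypothetical, not a method stated above:

```python
# Hypothetical sketch: data misfit plus an L2 penalty on all parameters.
# `lam` is an assumed hyperparameter; a Lipschitz penalty could similarly
# constrain, e.g., the layers' operator norms.
import torch
import torch.nn as nn

def penalized_loss(model: nn.Module, X: torch.Tensor, Y: torch.Tensor,
                   lam: float = 1e-4) -> torch.Tensor:
    misfit = ((model(X) - Y) ** 2).sum(dim=1).mean()      # (1/KM) sum ||Y_h - N(X_h)||^2
    l2 = sum((p ** 2).sum() for p in model.parameters())  # L2 penalty on W, b
    return misfit + lam * l2
```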

• We can recover $f(\cdot)$ in both autonomous and non-autonomous cases, with and without noise.

• In all cases the test error decreases with the time step, since the quality of the difference quotients improves.

• Noise in the data in high-dimensional cases results in a large increase in the error.

• The recovery is more accurate in a neighborhood of the domain of the training trajectories.

Input:

$X_h = \big(t_j,\, x_i^1(t_j), \dots, x_i^d(t_j)\big) \in \mathbb{R}^{1+d}$,

where $x_i(t_j) \in \mathbb{R}^d$ is an observation of $x(t)$ at time $t_j$ for initial condition $x_i(0)$.

For time points $t_1, \dots, t_M$ and initial conditions $x_1(0), \dots, x_K(0) \in \mathbb{R}^d$ define:

$h = j + (i - 1)M$, for $i = 1, \dots, K$, $j = 1, \dots, M$.
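For example, with $K = 2$ trajectories and $M = 3$ time points, the flattened sample index runs $h = 1, 2, 3$ over the first trajectory and $h = 4, 5, 6$ over the second, giving $KM = 6$ input vectors in total.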

Network: Feed Forward Network with $L$ layers.

• Weight matrices: $W_i \in \mathbb{R}^{n_i \times n_{i-1}}$, $n_0 = 1 + d$, $n_L = d$

• Bias vectors: $b_i \in \mathbb{R}^{n_i}$

• Activation: $\sigma : \mathbb{R} \to \mathbb{R}$, Leaky ReLU

For input $X_h \in \mathbb{R}^{1+d}$ define

$N(X_h) = \big(\cdots \sigma\big(\sigma(X_h W_1^T + b_1)\, W_2^T + b_2\big) \cdots\big)\, W_L^T + b_L$

Loss Function: let $\theta = \{W_i, b_i\}$ denote the model parameters and define

$L(\theta) = \frac{1}{KM} \sum_{h=1}^{KM} \lVert Y_h - N(X_h) \rVert_2^2$

The predicted approximation of $f(t, x)$ is the network $N$ corresponding to $\arg\min_\theta L(\theta)$.
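A minimal PyTorch sketch of the network and loss defined above; only $L = 8$ layers and Leaky ReLU are fixed above, so the hidden width and the optimizer here are assumptions:

```python
# Sketch of the FFN N and the loss L(theta). Hidden width 50 and the Adam
# optimizer are assumptions; L = 8 layers and Leaky ReLU follow the setup above.
import torch
import torch.nn as nn

d, L_layers, width = 2, 8, 50
sizes = [1 + d] + [width] * (L_layers - 1) + [d]       # n_0 = 1 + d, n_L = d

modules = []
for k in range(L_layers):
    modules.append(nn.Linear(sizes[k], sizes[k + 1]))  # x -> x W_k^T + b_k
    if k < L_layers - 1:
        modules.append(nn.LeakyReLU())                 # sigma, omitted on the output layer
N = nn.Sequential(*modules)

def loss_fn(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    # L(theta) = (1/KM) * sum_h ||Y_h - N(X_h)||_2^2
    return ((N(X) - Y) ** 2).sum(dim=1).mean()

optimizer = torch.optim.Adam(N.parameters())           # optimizer choice is an assumption
```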

[Figure: network schematic, with input layer, hidden layers 1 through L, and output layer.]

• Sparse regression has been used for equation recovery ([4],[6]), but prior information on the equation is needed. We don’t need prior knowledge because neural networks are universal approximators.

• The use of dictionaries of functions for equation recovery is a common choice, since it allows better extrapolation, but only for a limited class of functions ([1],[5]). We do not use dictionaries, to allow for more general $f(t, x)$.

• ResNets [3] and dynamical nets [2] have been used for equation recovery on trajectory data. Our method fits in this category, but is simpler in nature, and the recovered function is a good approximation on non-trajectory data as well.

Lotka-Volterra system: mean relative test error of the two network components.

Noise percentage   Mean Rel. Error N1(x1, x2)   Mean Rel. Error N2(x1, x2)
No Noise           0.173%                       0.216%
5%                 0.280%                       0.258%
10%                0.290%                       0.270%

Test Data: the network $N$ is compared with $f(t, x)$ on arbitrary pairs $(t, x)$ in the domain of $f(t, x)$.
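A hypothetical version of this protocol for a scalar equation ($d = 1$); the evaluation grid and the reference `f` are placeholders, not specified above:

```python
# Hypothetical test protocol: compare the trained network N with the true f on
# arbitrary (t, x) pairs in the domain of f (here d = 1; the grid is assumed).
import numpy as np
import torch

def mean_relative_error(N, f, t_grid, x_grid) -> float:
    errs = []
    with torch.no_grad():
        for t in t_grid:
            for x in x_grid:
                pred = N(torch.tensor([[t, x]], dtype=torch.float32)).item()
                true = f(t, x)
                errs.append(abs(pred - true) / abs(true))  # assumes f != 0 on the grid
    return 100.0 * float(np.mean(errs))                    # percent, as in the tables
```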

Autonomous ODE: $\dot{x}(t) = f(x)$

• Data generated from $\dot{x}(t) = x \cos x$

• $\Delta t = 5 \times 10^{-1}$, $t \in [0, 3]$, $K = 200$, $L = 8$

Data from the Lotka-Volterra system: $\dot{x}_1 = 1.5\,x_1 - x_1 x_2$, $\dot{x}_2 = -3\,x_2 + x_1 x_2$

• $\Delta t = 10^{-2}$, $t \in [0, 5]$, $K = 200$, $L = 8$
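A hypothetical sketch of how such training pairs could be generated for this system; the solver, the range for the initial conditions, and the forward-difference targets are assumptions (only $\Delta t$, the time interval, and $K$ are fixed above):

```python
# Hypothetical data generation for the Lotka-Volterra experiment above.
import numpy as np
from scipy.integrate import solve_ivp

def lotka_volterra(t, x):
    return [1.5 * x[0] - x[0] * x[1], -3.0 * x[1] + x[0] * x[1]]

t_eval = np.linspace(0.0, 5.0, 501)          # dt = 1e-2 on [0, 5]
dt = t_eval[1] - t_eval[0]
K = 200                                      # number of initial conditions
rng = np.random.default_rng(0)

X, Y = [], []
for _ in range(K):
    x0 = rng.uniform(0.5, 3.0, size=2)       # assumed sampling range for x_i(0)
    x = solve_ivp(lotka_volterra, (0.0, 5.0), x0, t_eval=t_eval).y.T  # (M, 2)
    xdot = (x[1:] - x[:-1]) / dt             # forward difference quotients
    X.append(np.column_stack([t_eval[:-1], x[:-1]]))  # inputs X_h = (t_j, x(t_j))
    Y.append(xdot)                           # targets Y_h
X, Y = np.concatenate(X), np.concatenate(Y)  # KM rows each
```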

[Figure: autonomous ODE system: predicted velocity field and trajectories, no noise.]

Noise percentage   Mean Relative Error
No Noise           1.11%
5%                 1.17%
10%                1.47%

Target:

$Y_h = \dot{X}_h = \big(\dot{x}_i^1(t_j), \dots, \dot{x}_i^d(t_j)\big) \in \mathbb{R}^d$,

where $\dot{x}_i(t_j) \in \mathbb{R}^d$ is the finite difference approximation of the first-order derivative of $x(t)$ at time $t_j$ for initial condition $x_i(0)$.

Training Data: $(X_h, Y_h)$, $h = 1, \dots, KM$
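The difference scheme is not fixed above; assuming a forward difference and a twice-differentiable trajectory, a Taylor expansion gives

$\frac{x_i(t_{j+1}) - x_i(t_j)}{\Delta t} = \dot{x}_i(t_j) + \frac{\Delta t}{2}\,\ddot{x}_i(\xi), \quad \xi \in (t_j, t_{j+1}),$

so the targets $Y_h$ carry an $O(\Delta t)$ error, consistent with the observation that the test error decreases as the time step shrinks.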

[Figure: training target; graph of predicted N(x) and true f(x) vs. x, no noise.]

Noise percentage   Mean Relative Error
No Noise           0.371%
5%                 1.91%
10%                4.46%

Non-Autonomous ODE: $\dot{x}(t) = f(t, x(t))$

• Data generated from $\dot{x}(t) = e^{-x} \log t - t^2$

• $\Delta t = 5 \times 10^{-1}$, $t \in [0.1, 2]$, $K = 200$, $L = 8$

[Figure: training target; true f(t, x) and predicted N(t, x), no noise.]

[Figures: training input $(t, x(t))$ with sampled trajectory points $x_i(t_j)$; training targets $Y_h^1$ and $Y_h^2$; true $f_1(x_1, x_2)$ and $f_2(x_1, x_2)$ vs. predicted $N_1(x_1, x_2)$ and $N_2(x_1, x_2)$, no noise.]