Application of sparse identiﬁcation of nonlinear dynamics ...

Application of sparse identification of nonlineardynamics for physics-informed learning

Matteo CorbettaSGT Inc NASA Ames Research Center

Moffett Field CA 94035matteocorbettanasagov

AbstractmdashAdvances in machine learning and deep neural net-works has enabled complex engineering tasks like image recog-nition anomaly detection regression and multi-objective opti-mization to name but a few The complexity of the algorithmarchitecture eg the number of hidden layers in a deep neuralnetwork typically grows with the complexity of the problemsthey are required to solve leaving little room for interpreting(or explaining) the path that results in a specific solution Thisdrawback is particularly relevant for autonomous aerospaceand aviation systems where certifications require a completeunderstanding of the algorithm behavior in all possible scenar-ios Including physics knowledge in such data-driven tools mayimprove the interpretability of the algorithms thus enhancingmodel validation against events with low probability but rele-vant for system certification Such events include for examplespacecraft or aircraft sub-system failures for which data maynot be available in the training phase This paper investigatesa recent physics-informed learning algorithm for identificationof system dynamics and shows how the governing equations ofa system can be extracted from data using sparse regressionThe learned relationships can be utilized as a surrogate modelwhich unlike typical data-driven surrogate models relies onthe learned underlying dynamics of the system rather thanlarge number of fitting parameters The work shows that thealgorithm can reconstruct the differential equations underlyingthe observed dynamics using a single trajectory when no uncer-tainty is involved However the training set size must increasewhen dealing with stochastic systems eg nonlinear dynamicswith random initial conditions

TABLE OF CONTENTS

1 INTRODUCTION 12 SUMMARY OF SPARSE REGRESSION FOR SYS-

TEM IDENTIFICATION 23 APPLICATION 34 CONCLUSIONS 5ACKNOWLEDGMENTS 7REFERENCES 7BIOGRAPHY 7

1 INTRODUCTIONMachine learning is revolutionizing several branches of sci-ence and engineering It has enhanced the capability toanalyze and interpret data [1] perceive external environmentthrough sensors image processing [2] enable compressedrepresentations of complex systems [3] and many othersAs tasks became more complex so the complexity of themachine learning solutions utilized to accomplish them Theresults are complex algorithms capable of grinding large

978-1-7281-2734-720$3100 ccopy2020 IEEEUS Government work not protected by US copyright

amount of data and producing outstanding results Thiscomes with costs First the obvious need of data Largeamount of data to train deep networks are not always avail-able In the optimal case training sets should span the domainexpected to be observed after the algorithm deploymentbut this condition too may be prohibitive for a variety ofscenarios Second the complexity of the tasks drive up thenumber of parameters to be learned during training whichmeans risk of overfitting and reduced generalization thusrequiring even more data to avoid the latter Then the lackof interpretability of such complex algorithms has raisedconcerns in recent years Those concerns are driven by theneed to trust a machine decision and even more importantlyto understand the reasoning behind it [4] Consequently therisk-adverse aerospace and aeronautical domains are slowerthan others in embracing advanced machine learning and arti-ficial intelligence Systems have to be certified by regulatorsand besides some recent efforts [5] methodologies have notbeen consolidated yet The algorithms should be capable ofhandling scenarios not observed during the training phase forexample sensor signals representative of sub-system failuresIn addition it is unclear how to handle the stochastic orprobabilistic nature of some algorithms Those producesdifferent results at different runs because the solutions reflecta local minimum of the objective function thus hinderingdeployment in safety-critical areas

The need for interpretable algorithms lead torwards newresearch areas like explainable AI [4] interpretable machinelearning [6] as well as physics-informed learning Some re-cent works on physics-informed learning attempted at solvingphysics problems described by sets of differential equationsby adopting machine learning algorithms In those casesthe cost functions were opportunely modified by introducingphysical constraints and symmetries that lead to solutionsconforming to the underlying phenomenon [7] [8] Otherworks focused on the discovery of governing equations ofdynamical systems from data [9] which has some potentialadvantages It would intrinsically enhance extrapolation andgeneralization If an algorithm were capable of identifyingthe links between the observed quantities through physical re-lationships it could be applied to other specimens of the samesystem with no or minimal tuning Physical relationshipswould also help extrapolating the system behavior outside ofthe training domain Aerospace and aeronautics would bene-fit of this last property since a variety of scenarios especiallyrelated to anomalies component and sub-system failuresmay have no data available for training andor validation

This paper was inspired by the work of Brunton et al [9]where sparse regression was proposed as a tool to learngoverning equations of dynamical systems from data Thispaper explores the approach by applying it to two dynamicalsystems a modified Van der Pol oscillator where a nonlinear-ity was added in the zero-order term (ie stiffness term) and

1

a first order three-dimensional system with random initialconditions on two of the three dimensions The paper showsthe application of the algorithm to those two case studieswith some qualitative but critical analysis Sparse regressionappears to require very limited amount of data when com-pared to typical machine learning methods However the3D stochastic system required considerably mode data thanthe modified Van der Pol oscillator to learn the relationshipsbetween the quantities of interest Finally the paper discussessome aspects related to the application and deployment onreal systems

2 SUMMARY OF SPARSE REGRESSION FORSYSTEM IDENTIFICATION

This section summarizes the algorithm utilized in this workoriginally proposed in [9] The notation utilized in thissection and throughout the manuscript is the following Low-ercase letters eg x indicate scalar quantities and boldlowercase letters eg x indicates vectors Matrices are de-fined through bold capital letters eg X while parenthesisstress the dependency in functions and vector functions egf(middot) Time dependencies eg x(t) are emphasized whennecessary

The algorithm proposed in [9] called sparse identification ofnonlinear dynamics (SINDy) aims at estimating the structureof a potentially highly-nonlinear differential equation

x(t) = f (x(t))

where x(t) isin IRntimes1 is the n-dimensional state vector attime t and f(middot) IRntimes1 rarr IRntimes1 is a n-dimensional statemapping function The goal is to estimate the actual form ofthe function f(middot) using data

As the authors stressed in the original paper the assumptionallowing the algorithm to converge to the solution is thatonly a few elements compose function f(middot) making it sparsein the space of possible functions [9] Sparsity balancesmodel complexity and accuracy and the method weights theregression by the number of non-zero elements The goal ofSINDy can be defined through a `1-regularized regression

ξk = arg minξk

||X minus ξkΘ (X) ||2 + λ ||ξk||1 (1)

where all elements are described hereafter

MatrixX isin IRmtimesn is composed of m snapshots of the statevector at different time steps as in Eq (2) where subscripts1 2 middot middot middot n indicates the elements within the n-dimensionalstate vector and ti indicates the i-th time step

state dimensionminusminusminusminusminusminusminusminusminusminusminusrarr

X(t) =

x1(ti) x2(ti) xn(ti)

ytime(2)

The snapshot of the state derivatives X follows the samestructure It can be approximated by differentiation of theelements in X although some filtering may be required toremove noise-driven oscillations [9] Matrix Θ(X) repre-sents one of the key novelties of the method It is a candidatefunction library where each column represents a potential

candidate for the elements in f(middot) to be discovered Theselection of functions to populate the library is arbitraryand may be composed of polynomial terms trigonometricfunctions etc [9] Eq (3)

Θ (X) =

1 X Xp2 sinX cosX

(3)

Matrix Θ(X) is constructed by stacking together column bycolumn candidate nonlinear functions of X For exampleelement 1 is a column-vector of ones element X is alreadydefined in (2) elementXp2 is the matrix containing the set ofall quadratic polynomial functions of the state vector x andis constructed as follows

Xp2 (t) =

x21(t1) x1 x2(t1) x2

2(t1) middot middot middot x2n(t1)

x21(t2) x1 x2(t2) x2


x21(tm) x1 x2(tm) x2

2(tm) middot middot middot x2n(tm)

The superscript p2 has been used to define the set of quadraticpolynomial functions and should not be confused with forexample rdquox to the power of p2rdquo Similarly Xp3 defines theset of cubic polynomial functions and so on

Vector ξ collects the coefficients of the candidate functions inΘ(X) and is the objective of the minimization The numberof vectors equals the dimension of the state vector n sok = 1 n Since only a few of the candidate functionsare expected to have an effect on the system dynamics allvectors ξ are expected to be sparse Symbols || middot ||2 and || middot ||1indicates norm-2 and norm-1 respectively while λ is a scalarmultiplier Element λ ||ξk||1 is the regularization term whichpenalizes coefficients different from 0 in a linear fashion It isthe actual promoter of sparsity in the minimization problem

Once all sparse vectors ξk have been estimated (see followingsubsection) they can be collected in the sparse matrix Ξ

Ξ =

[

ξ1 ξ2 ξn

]

so that the system dynamics can be computed as in Eq (4)

X(t) = Θ(X(t))Ξ (4)

Solution of sparse regression

Equation (1) poses the estimation problem in the same termsof the LASSO (least absolute shrinkage and selection op-erator) [10] However [9] suggests the use of sequentialthresholded least square which will be also used to producethe graphs in this paper While the LASSO solution requiresconvex optimization algorithms or the solution by least angleregression [11] sequential thresholded least square imposessparsity by rdquomanuallyrdquo setting all coefficients smaller thanλ to 0 in an iterative fashion The two methods producedvery similar results in many of the tested cases Howeverthe results from the LASSO were not reported for the sakeof brevity since do not add intuition to the discussion Thesequential thresholded least square from [9] is summarizedby the Python code in Table 12 The result is the sparse

2A Matlab version of the code can be found in the supporting material of theoriginal paper from Brunton et al

2

Table 1 Sequential thresholded least square in PythonThe linear equation is solved using the QR

decomposition

import numpy as np

def linearEqSolver(A b)Q R = nplinalgqr(A)p = npdot(QT b)return npdot( nplinalginv(R) p )

Xi = linearEqSolver(Theta dxdt) Initialize Xi valuesfor iteration in range(niter)

smallCoeff = abs(Xi) lt lmbd get coeff lt lambdaXi[smallCoeff] = 00 set them to 0for dim in range(ndim) regress over large coeffs

bigCoeff = nplogical not(smallCoeff[ dim])reshape((-1))Theta tmp = Theta[ bigCoeff]dxdt tmp = dxdt[ dim]Xi[bigCoeff dim] = linearEqSolver(Theta tmp dxdt tmp)

matrix Ξ collecting the coefficients of the candidate functionsin Θ(X) If as expected the dynamic of the system is sparsewith respect to the space of possible functions most of theelements in Ξ will be zero Otherwise Ξ will be a full matrixsuggesting the solution is not sparse

The number of iterations (niter in Table 1) is not defined hereSeveral trials showed that a few iterations (from a couple upto 5 or 10) were sufficient to estimate the dynamics of thetwo systems presented in Section 3 Performance appearsindependent on the number of iterations in those test cases Ifthe algorithm was not capable of identifying the correct dy-namics after a few iterations then it never converged Addingmore iterations did not improve the selection capability Thisbehavior may not hold for larger systems (large n)

3 APPLICATIONThis section presents the application of the SINDy algorithmto two nonlinear systems All results and graphics reproducedhere were generated using Python 36 [12] libraries NumPyand SciPy [13] and matplotlib [14]

Modified Van der Pol Oscillator

This subsection shows the application of SINDy to the Vander Pol oscillator equation [15] modified by an extra non-linear term and perturbed by sinusoidal excitation Thenonlinear dynamics is defined by the scalar second-orderdifferential equation

xminus c

m(1minus x2)x+ x

(k

m+knlmx2)

=1

mF (t) (5)

The element (1 minus x2) in the second term on the left-handside distinguish the Van der Pol oscillator from the standardlinear oscillator and the term knl x

2 has been added here tointroduce further non-linearity Mass-normalized parameterscm km and knlm are positive scalar quantities whileF (t) is the sinusoidal excitation The values utilized in thissimulation are reported in Table 2 Since Eq (5) is a second-order differential equation it has been transformed into astate-space formulation ie x = [x x]T The dynamicwas integrated using Eulerrsquos forward method with time step

Table 2 Parameters of the modified Van der Poloscillator and initial conditions for the simulation

Parameters Valuescm 016km 025knlm 125

F0 = 200 NF (t) = F0 sin (2πft+ φ) f = 02 Hz

φ = 0 radInitial conditionsx(t = 0) 275 m

x(t = 0) 0 ms

0 20 40 60 80 100time s

40

30

20

10

0

10

20

30

40

acce

lera

tion

ms

2

noisy = 20filtered

20 25 30 35 40

Figure 1 Acceleration signal corrupted by r sim N (0 2)and filtered

size ∆t = 1e minus 3 s and results were used as input data forSINDy The acceleration was corrupted by Gaussian noisewith standard deviation σ = 20 to replicate possible noisefrom acceleration measurements Figure 1

In this mechanical case study the solution through SINDyrequires availability of position velocity and accelerationThe full kinematic profile may not be available in experimen-tal settings where kinematics is measured through sensorsNumerical differentiation and subsequent filtering may helpovercoming the problem as already suggested in [9] The useof two sensors eg displacement transducers and accelerom-eters could potentially help the algorithm thus requiring onlyone numerical differentiation

Figure 2 shows position and velocity of the modified forcedVan der Pol oscillator in time domain A phase plot is alsoshown to appreciate the nonlinear dynamics The 3D plotshows the state variables as time passes by where the firsthalf of the simulation was used as training data (blue line)The dynamics learned from the SINDy algorithm (dashedgreen line) is then compared against the second half of thesimulation (orange line) for validation

The dynamics learned through SINDy is reported in Table

3

x m 3210123

x ms64202468

time

s

0

20

40

60

80

trainvalSINDy

x m

x m

sphase plot

Figure 2 Dynamics of the modified forced Van der Poloscillator learned through SINDy

Table 3 Sparse vector ξ2 from SINDy

Θ-column ξ2 Reference Error 1 00 0 -x -024938066 -025 025x 015618899 016 238x2 00 00 -xx 00 00 -x2 00 00 -x3 -125049414 -125 004x2x -015509556 -016 30xx2 00 00 -x3 00 00 -F (t) 005003716 005 007

3 (vector ξ1 because of the state-space formulation is all0 except for the velocity term x equal to 1) The candidatefunction library was built using sets of polynomials up to thethird order so using the matrix Θ(X) presented in Eq (3) upto the element Xp3 and adding as last column the forcingF (t) Similar results were obtained by increasing the size ofthe candidate function library as well as adding trigonometricfunctions all estimates of the non-necessary coefficients werenull Adding the external forcing is fundamental If F (t)were not included then SINDy would try to link erroneouslythe effect of F (t) to the other columns in Θ

The regularization parameter λ is the sparsity promoterSmall λ would cause the resulting matrix Ξ not to be sparseand large λ would force excessive sparsity with the limitcase ξk = 0 forall k = 1 n A (sub-)optimal λ can beselected in a typical machine learning fashion by splitting theavailable kinematic profile in training and validation Firstselect an initial λ A segment of the kinematic profile isthen used to train the algorithm by imposing the selected λas regularizer and the remaining segment of the kinematicprofile is simulated through SINDy Eq (4) The errorbetween the SINDy-replicated kinematics and validation datais then used as metric to assess the goodness of the selected

00 02 04 06 08 10 -

102

103

104

105

106

squa

red

sum

of r

esid

uals

-

best

Figure 3 Selection of regularized parameters throughAIC based on squared sum of residuals The dashed-grayline represents the error in the limit case ξ = 0 attained

when λ is too high

λ The procedure can be repeated sweeping through a setof possible λ values In the proposed example 28 valuesbetween 0 and 1 were used to define the set of possible λ[0 1e-5 2e-5 0001 0002 01 02 1] The bestλ was selected using the Akaike information criterion (AIC)computed over the squared sum of residuals as in [16]

ε =

msumi=1

(x(ti)minus xSINDy(ti))2

AICγ = m logε

m+ 2γ

where m is the number of datapoints in the validation set(similarly to the notation in the previous sectionm representsthe number of data points over time) and γ is the number ofnon-zero parameters in the SINDy model In this instancethe squared sum of residuals was computed using the positionprofile only The resulting AICγ is the value corresponding toγ non-zero parameters and the model with the lowest AIC isselected as the best performing Figure 3 shows the resultingsum of squared residuals and the best regularization valuechosen with AIC λ = 002 (the results shown in Fig 2 weregenerated with such λ)

Nonlinear system with random initial conditions

This subsection shows the application of SINDy to thestochastic system

dy1dt

= y1y3

dy2dt

= minusy2y3

dy3dt

= minusy21 + y22

(6)

The initial conditions are y1(0) = 1 y2(0) = 01u andy3(0) = 0 where u sim U [minus1 1] The system was first uti-lized in [17] to test deep-residual recurrent neural networksSINDy is not capable of learning the correct dynamics using

4

a single trajectory because of the random initial condition ofy2 The result is intuitive since the random initial conditiondrives trajectory y2 on either positive or negative side of theset of real numbers and that changes at every run (see Fig4 discussed below) The algorithm is obviously incapable ofunderstanding the true dynamic with a single trajectory Theaddition of several trajectories to the dataset helps identifyingthe correct sparse coefficients The latter can be performedby simply row-stacking all trajectories together Thereforethe number of rows of matrix X (Eq (2)) will increasewith the number of sample trajectories In this examplen = 3 and the number of sample points in time domain ism so X isin IRmtimes3 By stacking T trajectories together thedimension of X becomes Tm times 3 All other dimensions ofthe matrices required by SINDy will follow accordingly

To test the effect of randomness on the system identificationcapabilities a number of trajectories y(t) were simulated for20 seconds using ∆t = 005 s The derivatives dydt werecomputed by numerical differentiation and adding Gaussiannoise with σ = 02 Figure 4 and 5 show the systemsimulation unrolled in time domain and a 3D plot wheretime is represented by colors (from blue to red) If onlyone of those trajectories were fed into SINDy the resultingvectors ξk were not sparse Several values of λ rangingfrom 1e minus 3 up to the order of 1e3 were tested in a trial anderror fashion but none of them produced reasonably accurateresults Instead by using multiple sample trajectories thecorrect sparse vectors were identified even by varying λFigure 6 shows the result after training SINDy using 200sample trajectories and λ = 05 (dot-dashed red line) againstthe results after training with a single trajectory (dashedorange line also with λ = 05) The thick dark gray curvewas left out of the training and used as validation Thestructure of the function learned with a single curve producesa trajectory that becomes unstable after a couple of iterationsfalling outside of the range of values while adding multiplesample paths during training allows SINDy to successfullyreconstruct Eq (6) with the following non-zero elements

y1 y3 = 0996069

y2 y3 = minus0994354

y21 = minus0997278 y22 = 099549

where each row refers to the first second and third equationrespectively

The introduction of multiple trajectories in the dataset ap-pears to be robust to more randomness The original initialconditions were modified by adding y3(0) = u where uis again a random realization from the uniform distributionwith range [-11] (different from the realization for y2(0))Figure 7 shows the system dynamics unrolled in time whileFigure 8 shows a 3D representation In this case thenumber of sample trajectories used for training had to beincreased to 400 to correctly capture the system dynamic andso to provide accurate sparse vectors ξ The regularizationparameter λ was kept to 05 Nonetheless the estimation of a(sub-)optimal λ as done in the previous example would helpthe regression problem and a lower number of trajectoriesmay be required Figure 9 shows the SINDy results using400 samples compared against a reference validation curveand the results using a single curve for training The outcomecorroborates what was observed in the previous case studywith one random initial condition only

0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -

Figure 4 1500 sample trajectories y(t) unrolled in timedomain For clarity only a subset of all the trajectories

has been shown

y1 -00

05

10

y2 -1

0

1

y3 -

1

0

1

dy1dt = y1y3dy2dt = y2y3dy3dt = y2

1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) = 0

0

20

time s

Figure 5 3D representation of the system dynamics Forclarity only a subset of all the trajectories has been

shown

4 CONCLUSIONSThis work proposed a qualitative analysis of the SINDyalgorithms in two test scenarios where nonlinear dynamicalsystems served as reference From the short analysis pre-sented here a number of advantages and drawbacks arisefrom the use of SINDy

Sparsity enables simultaneous model selection and parameteridentification which helps generating parsimonious models(with low number of parameters and functions) when the truedynamics of the system is unknown SINDy produces modelsbased on candidate functions of the state variables so itpaves the way to easy interpretation and may also aid causalinference This is a clear advantage when comparing SINDyagainst other machine learning techniques were the param-eters utilized for regression lose (completely or partially)physical meaning The computational cost appears to be low

5

0

1

2y 1

-SINDy - 1 SINDy - T val

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -

Figure 6 System identification using a single curve and200 random trajectories For clarity only a subset of all

the trajectories has been shown

0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -

Figure 7 1500 sample trajectories y(t) unrolled in timedomain with the additional random initial conditiony3(0) = u For clarity only a subset of all the

trajectories has been shown

at least for the low-dimension problems shown in this paperMoreover the algorithm allows the straightforward stackingof multiple sample trajectories in the dataset of snapshotsX thus enhancing the regression capability for stochasticsystems

A number of issues emerged by the implementation of thealgorithm The primary concern is the candidate functionslibrary Θ(X) If one of the snapshots of the true functionsis missing from the library then the algorithm will try toassociate the effect of that missing term to the columns inΘ(X) As a result the coefficient vectors ξk might be in-correct or not sparse failing to identify the correct dynamicsIn those cases the extrapolation capabilities of SINDy usingEq (4) can be extremely poor Further improvements of themethodology are likely to be required for real applications

y1 -0

1

2

y2 -

2

0

2

y3 -

2

0

2


1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) [ 1 1]

0

20

time s

Figure 8 3D representation of system dynamics with theadditional random initial condition y3(0) = u Forclarity only a subset of all the trajectories has been

shown

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -

Figure 9 System identification with y3(0) = u using asingle curve and 400 random trajectories For clarityonly a subset of all the trajectories has been shown

because when the true dynamic of the system is unknown itis not efficient nor sound to select a huge set of candidatefunctions in the library hoping to include the correct oneSuch a problem does not apply to the simple cases shownhere where the dimension of the state variable was either 2or 3 and the elements constituting the differential equationswere fairly simple

Systems subjects to exogenous forces may be identified eas-ily provided that external forcing is measured and embeddedin the candidate function library On the other hand systemswith feedback correction loops eg K(x minus xdes) posea problem If the control system is simple enough thatfeedback correction can be split andKx andK(xdes) can beembedded in the function library (for example in proportionalor proportional-derivative control) then SINDy still works

6

Otherwise it would become impossible to disambiguate theeffect of the feedback control from the un-controlled systemdynamic [9]

The outcome from this qualitative analysis may change whentesting SINDy in high-dimensional spaces If the systemdynamic is described by a combination of non-rational andrational functions the sequential thresholded least squareshould be modified by first constructing the library Θ(X)in a specific fashion and then compute the null space of thenew library as proposed in [18] The sampling frequency hasnot been discussed here but it can also pose challenges Ifmeasurements of the system state variables were coarse ielow sampling rate the algorithm might not converge to thecorrect sparse vectors

Uncertainty quantification is also relevant for this type ofalgorithms and Bayesian alternative could be explored Asa matter of fact Park and Casella proposed in 2008 theBayesian LASSO [19] which is based on Gibbs samplingand could be implemented to solve Eq (1) Howeverseveral attempts have been carried out to apply the BayesianLASSO to the case studies proposed here and none of themwas successful Specifically the target non-zero elementswere underestimated (closer to 0 than they should be) andthe target zero elements were overestimated (larger than 0)Although the cause of those unsuccessful results was notidentified the use of Bayesian methods in sparse regressionproblems is still a matter of discussion

ACKNOWLEDGMENTSThis work was supported by the System-Wide Safety (SWS)project under the Airspace Operations and Safety Programwithin the NASA Aeronautics Research Mission Directorate(ARMD)

REFERENCES[1] S Ekins A C Puhl K M Zorn T R Lane D P

Russo J J Klein A J Hickey and A M Clark ldquoEx-ploiting machine learning for end-to-end drug discoveryand developmentrdquo Nature materials vol 18 no 5 p435 2019

[2] S Sabour N Frosst and G E Hinton ldquoDynamicrouting between capsulesrdquo in Advances in neural infor-mation processing systems 2017 pp 3856ndash3866

[3] S L Brunton and J N Kutz Data-driven science andengineering Machine learning dynamical systemsand control Cambridge University Press 2019

[4] D Gunning ldquoExplainable artificial intelligence (xai)rdquoDefense Advanced Research Projects Agency (DARPA)nd Web vol 2 2017

[5] C Wilkinson J Lynch R Bharadwaj and K Wood-ham ldquoVerification of adaptive systemsrdquo 2016 fAATechnical Report

[6] F Doshi-Velez and B Kim ldquoTowards a rigorous sci-ence of interpretable machine learningrdquo arXiv preprintarXiv170208608 2017

[7] A Karpatne W Watkins J Read and V KumarldquoPhysics-guided neural networks (pgnn) An appli-cation in lake temperature modelingrdquo arXiv preprintarXiv171011431 2017

[8] M Raissi P Perdikaris and G E Karniadakis

ldquoPhysics informed deep learning (part i) Data-drivensolutions of nonlinear partial differential equationsrdquoarXiv preprint arXiv171110561 2017

[9] S L Brunton J L Proctor and J N Kutz ldquoDiscov-ering governing equations from data by sparse identifi-cation of nonlinear dynamical systemsrdquo Proceedings ofthe National Academy of Sciences vol 113 no 15 pp3932ndash3937 2016

[10] R Tibshirani ldquoRegression shrinkage and selection viathe lassordquo Journal of the Royal Statistical SocietySeries B (Methodological) vol 58 no 1 pp 267ndash2881996

[11] B Efron T Hastie I Johnstone R Tibshirani et alldquoLeast angle regressionrdquo The Annals of statisticsvol 32 no 2 pp 407ndash499 2004

[12] G v Rossum ldquoPython tutorial technical report cs-r9526rdquo in Centrum voor Wiskunde en Informatica(CWI) Amsterdam 1995

[13] E Jones T Oliphant P Peterson et al ldquoSciPy Opensource scientific tools for Pythonrdquo 2001 [Online]Available rdquohttpwwwscipyorgrdquo

[14] J D Hunter ldquoMatplotlib A 2d graphics environmentrdquoComputing In Science amp Engineering vol 9 no 3 pp90ndash95 2007

[15] B Van der Pol ldquoA theory of the amplitude of free andforced triode vibrations radio rev 1 (1920) 701-710754-762 selected scientific papers vol irdquo 1960

[16] N M Mangan J N Kutz S L Brunton and J LProctor ldquoModel selection for dynamical systems viasparse regression and information criteriardquo Proceedingsof the Royal Society A Mathematical Physical andEngineering Sciences vol 473 no 2204 p 201700092017

[17] J N Kani and A H Elsheikh ldquoDr-rnn A deep residualrecurrent neural network for model reductionrdquo arXivpreprint arXiv170900939 2017

[18] N M Mangan S L Brunton J L Proctor and J NKutz ldquoInferring biological networks by sparse iden-tification of nonlinear dynamicsrdquo IEEE Transactionson Molecular Biological and Multi-Scale Communica-tions vol 2 no 1 pp 52ndash63 2016

[19] T Park and G Casella ldquoThe bayesian lassordquo Journal ofthe American Statistical Association vol 103 no 482pp 681ndash686 2008

BIOGRAPHY[

Matteo Corbetta is a Research Engi-neer with SGT Inc at NASA Ames Re-search Center CA where he is investi-gating uncertainty quantification meth-ods model-based and data-driven algo-rithms for diagnostics and prognosticsapplied to autonomous systems Prior tojoining NASA he worked as RampD Con-dition Monitoring Systems Engineer atSiemens Wind Power Denmark and as

Post-Doctoral Researcher and Teaching Assistant at Politec-nico di Milano Italy where he received PhD MSc and BScin Mechanical Engineering His research interests includestochastic processes algorithms for uncertainty quantifica-tion machine learning and system health management He

7

is member of AIAA IEEE and member of the Editorial Boardof the PHM Society

8

Introduction

Summary of sparse regression for system identification

Application

Conclusions

Acknowledgments

References

Biography

a first order three-dimensional system with random initialconditions on two of the three dimensions The paper showsthe application of the algorithm to those two case studieswith some qualitative but critical analysis Sparse regressionappears to require very limited amount of data when com-pared to typical machine learning methods However the3D stochastic system required considerably mode data thanthe modified Van der Pol oscillator to learn the relationshipsbetween the quantities of interest Finally the paper discussessome aspects related to the application and deployment onreal systems

2 SUMMARY OF SPARSE REGRESSION FORSYSTEM IDENTIFICATION

This section summarizes the algorithm utilized in this workoriginally proposed in [9] The notation utilized in thissection and throughout the manuscript is the following Low-ercase letters eg x indicate scalar quantities and boldlowercase letters eg x indicates vectors Matrices are de-fined through bold capital letters eg X while parenthesisstress the dependency in functions and vector functions egf(middot) Time dependencies eg x(t) are emphasized whennecessary

The algorithm proposed in [9] called sparse identification ofnonlinear dynamics (SINDy) aims at estimating the structureof a potentially highly-nonlinear differential equation

x(t) = f (x(t))

where x(t) isin IRntimes1 is the n-dimensional state vector attime t and f(middot) IRntimes1 rarr IRntimes1 is a n-dimensional statemapping function The goal is to estimate the actual form ofthe function f(middot) using data

As the authors stressed in the original paper the assumptionallowing the algorithm to converge to the solution is thatonly a few elements compose function f(middot) making it sparsein the space of possible functions [9] Sparsity balancesmodel complexity and accuracy and the method weights theregression by the number of non-zero elements The goal ofSINDy can be defined through a `1-regularized regression

ξk = arg minξk

||X minus ξkΘ (X) ||2 + λ ||ξk||1 (1)

where all elements are described hereafter

MatrixX isin IRmtimesn is composed of m snapshots of the statevector at different time steps as in Eq (2) where subscripts1 2 middot middot middot n indicates the elements within the n-dimensionalstate vector and ti indicates the i-th time step

state dimensionminusminusminusminusminusminusminusminusminusminusminusrarr

X(t) =

x1(ti) x2(ti) xn(ti)

ytime(2)

The snapshot of the state derivatives X follows the samestructure It can be approximated by differentiation of theelements in X although some filtering may be required toremove noise-driven oscillations [9] Matrix Θ(X) repre-sents one of the key novelties of the method It is a candidatefunction library where each column represents a potential

candidate for the elements in f(middot) to be discovered Theselection of functions to populate the library is arbitraryand may be composed of polynomial terms trigonometricfunctions etc [9] Eq (3)

Θ (X) =

1 X Xp2 sinX cosX

(3)

Matrix Θ(X) is constructed by stacking together column bycolumn candidate nonlinear functions of X For exampleelement 1 is a column-vector of ones element X is alreadydefined in (2) elementXp2 is the matrix containing the set ofall quadratic polynomial functions of the state vector x andis constructed as follows

Xp2 (t) =

x21(t1) x1 x2(t1) x2


x21(t2) x1 x2(t2) x2


x21(tm) x1 x2(tm) x2

2(tm) middot middot middot x2n(tm)

The superscript p2 has been used to define the set of quadraticpolynomial functions and should not be confused with forexample rdquox to the power of p2rdquo Similarly Xp3 defines theset of cubic polynomial functions and so on

Vector ξ collects the coefficients of the candidate functions inΘ(X) and is the objective of the minimization The numberof vectors equals the dimension of the state vector n sok = 1 n Since only a few of the candidate functionsare expected to have an effect on the system dynamics allvectors ξ are expected to be sparse Symbols || middot ||2 and || middot ||1indicates norm-2 and norm-1 respectively while λ is a scalarmultiplier Element λ ||ξk||1 is the regularization term whichpenalizes coefficients different from 0 in a linear fashion It isthe actual promoter of sparsity in the minimization problem

Once all sparse vectors ξk have been estimated (see followingsubsection) they can be collected in the sparse matrix Ξ

Ξ =

[

ξ1 ξ2 ξn

]

so that the system dynamics can be computed as in Eq (4)

X(t) = Θ(X(t))Ξ (4)

Solution of sparse regression

Equation (1) poses the estimation problem in the same termsof the LASSO (least absolute shrinkage and selection op-erator) [10] However [9] suggests the use of sequentialthresholded least square which will be also used to producethe graphs in this paper While the LASSO solution requiresconvex optimization algorithms or the solution by least angleregression [11] sequential thresholded least square imposessparsity by rdquomanuallyrdquo setting all coefficients smaller thanλ to 0 in an iterative fashion The two methods producedvery similar results in many of the tested cases Howeverthe results from the LASSO were not reported for the sakeof brevity since do not add intuition to the discussion Thesequential thresholded least square from [9] is summarizedby the Python code in Table 12 The result is the sparse

2A Matlab version of the code can be found in the supporting material of theoriginal paper from Brunton et al

2


decomposition

import numpy as np










xminus c

m(1minus x2)x+ x

(k

m+knlmx2)

=1

mF (t) (5)







x(t = 0) 0 ms

0 20 40 60 80 100time s

40

30

20

10

0

10

20

30

40

acce

lera

tion

ms

2

noisy = 20filtered

20 25 30 35 40






3

x m 3210123

x ms64202468

time

s

0

20

40

60

80

trainvalSINDy

x m

x m

sphase plot






00 02 04 06 08 10 -

102

103

104

105

106

squa

red

sum

of r

esid

uals

-

best


when λ is too high


ε =

msumi=1


AICγ = m logε

m+ 2γ




dy1dt

= y1y3

dy2dt

= minusy2y3

dy3dt

= minusy21 + y22

(6)


4



y1 y3 = 0996069

y2 y3 = minus0994354

y21 = minus0997278 y22 = 099549



0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -


has been shown

y1 -00

05

10

y2 -1

0

1

y3 -

1

0

1


1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) = 0

0

20

time s


shown



5

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -



0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -





y1 -0

1

2

y2 -

2

0

2

y3 -

2

0

2


1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) [ 1 1]

0

20

time s


shown

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -




6


























BIOGRAPHY[



7


8

Introduction


Application

Conclusions

Acknowledgments

References

Biography


decomposition

import numpy as np










xminus c

m(1minus x2)x+ x

(k

m+knlmx2)

=1

mF (t) (5)







x(t = 0) 0 ms

0 20 40 60 80 100time s

40

30

20

10

0

10

20

30

40

acce

lera

tion

ms

2

noisy = 20filtered

20 25 30 35 40






3

x m 3210123

x ms64202468

time

s

0

20

40

60

80

trainvalSINDy

x m

x m

sphase plot






00 02 04 06 08 10 -

102

103

104

105

106

squa

red

sum

of r

esid

uals

-

best


when λ is too high


ε =

msumi=1


AICγ = m logε

m+ 2γ




dy1dt

= y1y3

dy2dt

= minusy2y3

dy3dt

= minusy21 + y22

(6)


4



y1 y3 = 0996069

y2 y3 = minus0994354

y21 = minus0997278 y22 = 099549



0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -


has been shown

y1 -00

05

10

y2 -1

0

1

y3 -

1

0

1


1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) = 0

0

20

time s


shown



5

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -



0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -





y1 -0

1

2

y2 -

2

0

2

y3 -

2

0

2


1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) [ 1 1]

0

20

time s


shown

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -




6


























BIOGRAPHY[



7


8

Introduction


Application

Conclusions

Acknowledgments

References

Biography

x m 3210123

x ms64202468

time

s

0

20

40

60

80

trainvalSINDy

x m

x m

sphase plot






00 02 04 06 08 10 -

102

103

104

105

106

squa

red

sum

of r

esid

uals

-

best


when λ is too high


ε =

msumi=1


AICγ = m logε

m+ 2γ




dy1dt

= y1y3

dy2dt

= minusy2y3

dy3dt

= minusy21 + y22

(6)


4



y1 y3 = 0996069

y2 y3 = minus0994354

y21 = minus0997278 y22 = 099549



0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -


has been shown

y1 -00

05

10

y2 -1

0

1

y3 -

1

0

1


1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) = 0

0

20

time s


shown



5

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -



0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -





y1 -0

1

2

y2 -

2

0

2

y3 -

2

0

2


1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) [ 1 1]

0

20

time s


shown

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -




6


























BIOGRAPHY[



7


8

Introduction


Application

Conclusions

Acknowledgments

References

Biography



y1 y3 = 0996069

y2 y3 = minus0994354

y21 = minus0997278 y22 = 099549



0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -


has been shown

y1 -00

05

10

y2 -1

0

1

y3 -

1

0

1


1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) = 0

0

20

time s


shown



5

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -



0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -





y1 -0

1

2

y2 -

2

0

2

y3 -

2

0

2


1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) [ 1 1]

0

20

time s


shown

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -




6


























BIOGRAPHY[



7


8

Introduction


Application

Conclusions

Acknowledgments

References

Biography

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -



0

1

2

y 1 -

2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -





y1 -0

1

2

y2 -

2

0

2

y3 -

2

0

2


1 + y22

y1(0) = 1y2(0) [ 01 01]y3(0) [ 1 1]

0

20

time s


shown

0

1

2y 1


2

0

2

y 2 -

00 25 50 75 100 125 150 175 200time s

2

0

2

y 3 -




6


























BIOGRAPHY[



7


8

Introduction


Application

Conclusions

Acknowledgments

References

Biography


























BIOGRAPHY[



7


8

Introduction


Application

Conclusions

Acknowledgments

References

Biography


8

Introduction


Application

Conclusions

Acknowledgments

References

Biography

Application of sparse identiﬁcation of nonlinear dynamics ...

Documents

Transcript of Application of sparse identiﬁcation of nonlinear dynamics ...