
DeepOPF: A Deep Neural Network Approach for Security-Constrained DC Optimal Power Flow

Xiang Pan, Tianyu Zhao, Minghua Chen, and Shengyu Zhang

Abstract—We develop DeepOPF as a Deep Neural Network (DNN) approach for solving security-constrained direct current optimal power flow (SC-DCOPF) problems, which are critical for reliable and cost-effective power system operation. DeepOPF is inspired by the observation that solving SC-DCOPF problems for a given power network is equivalent to depicting a high-dimensional mapping from the load inputs to the generation and phase angle outputs. We first train a DNN to learn the mapping and predict the generations from the load inputs. We then directly reconstruct the phase angles from the generations and loads by using the power flow equations. Such a predict-and-reconstruct approach significantly reduces the dimension of the mapping to learn, subsequently cutting down the size of the DNN and the amount of training data needed. We further characterize a condition for tuning the size of the DNN according to the desired approximation accuracy of the load-generation mapping. We develop an efficient post-processing procedure based on ℓ1-projection to ensure the feasibility of the obtained solution, which can be of independent interest. Simulation results for IEEE test cases show that DeepOPF generates feasible solutions with minor optimality loss, while speeding up the computation time by up to two orders of magnitude as compared to a state-of-the-art solver.

NOMENCLATURE

Variable: Definition
$\mathcal{N}$: Set of buses; $N \triangleq |\mathcal{N}|$.
$\mathcal{G}$: Set of generators.
$\mathcal{D}$: Set of loads.
$\mathcal{E}$: Set of branches.
$\mathcal{C}$: Set of contingency cases.
$P_G$: Power generation injection vector, $[P_{G_i}, i \in \mathcal{N}]$.
$P_G^{\min}$: Minimum generator output vector, $[P_{G_i}^{\min}, i \in \mathcal{N}]$.
$P_G^{\max}$: Maximum generator output vector, $[P_{G_i}^{\max}, i \in \mathcal{N}]$.
$P_D$: Power load vector, $[P_{D_i}, i \in \mathcal{N}]$.
$\Theta_c$: Voltage angle vector under the c-th contingency.
$\theta_{c,i}$: Voltage angle under the c-th contingency for bus i.
$B_c$: Admittance matrix under the c-th contingency.
$x_{ij,c}$: Line susceptance from bus i to j under the c-th contingency.
$P_{T_{ij},c}^{\max}$: Line transmission limit from bus i to j under the c-th contingency.
$N_{hid}$: Number of hidden layers in the neural network.

We use $|\cdot|$ to denote the size of a set. Note that for the buses without generators, the corresponding active generator output as well as the minimum/maximum bounds of the generator output are equal to 0, i.e., $P_{G_i} = P_{G_i}^{\min} = P_{G_i}^{\max} = 0$, $i \notin \mathcal{G}$. Similarly, $P_{D_i} = 0$, $i \notin \mathcal{D}$.

Xiang Pan and Tianyu Zhao are with the Department of Information Engineering, The Chinese University of Hong Kong. Minghua Chen is with the School of Data Science, City University of Hong Kong. Shengyu Zhang is with Tencent Quantum Laboratory. Corresponding author: Minghua Chen.

I. INTRODUCTION

The “deep learning revolution” largely sparked by the October 2012 ImageNet victory [1] has transformed various industries in human society, including artificial intelligence, health care, online advertising, transportation, and robotics. As the most widely-used and mature model in deep learning, the Deep Neural Network (DNN) [2] demonstrates superb performance in complex engineering tasks such as recommendation [3], bio-informatics [4], mastering difficult games like Go [5], and human pose estimation [6]. The capability of approximating continuous mappings and the desirable scalability make DNN a favorable choice in the arsenal for solving large-scale optimization and decision problems in engineering systems. In this paper, we apply DNN to power systems for solving the essential security-constrained direct current optimal power flow (SC-DCOPF) problem in power system operation.

The OPF problem, first posed by Carpentier in 1962 [7], is to minimize an objective function, such as the cost of power generation, subject to all physical, operational, and technical constraints, by optimizing the dispatch and transmission decisions. These constraints include Kirchhoff's laws, operating limits of generators, voltage levels, and loading limits of transmission lines [8]. The OPF problem is central to power system operations as it underpins various applications including economic dispatch, unit commitment, stability and reliability assessment, and demand response. While OPF with a full AC power flow formulation (AC-OPF) is the most accurate, it is a non-convex problem and its complexity obscures practicability. Meanwhile, based on linearized power flows, DC-OPF is a convex problem admitting a wide variety of applications, including electricity market clearing and power transmission management. See, e.g., [9], [10] for a survey.

The SC-DCOPF problem, a variant of DC-OPF, is critical for reliable power system operation against contingencies caused by equipment failure [11]. It considers not only constraints under normal operation, but also additional steady-state security constraints for each possible contingency¹ [13]. Meanwhile, solving SC-DCOPF incurs excessive computational complexity, limiting its applicability in large-scale power networks [14].

¹There are two types of SC-DCOPF problems, namely the preventive SC-DCOPF problem and the corrective SC-DCOPF problem. Both of them are critical in practice. We focus on the preventive SC-DCOPF problem in this paper, in which the system operating decisions stay unchanged once determined and need to satisfy both the pre- and post-contingency constraints. Usually, only line contingencies are considered in the preventive SC-DCOPF problem [12]. Our DeepOPF approach is also useful for the corrective SC-DCOPF problem, where the system operator only has a short time to adjust the operating points after the occurrence of a contingency. With DeepOPF, the system operator can obtain new operating points in a fraction of the time used by conventional solvers.




To this end, we propose a machine learning approach for solving the SC-DCOPF problem efficiently. Our approach is based on the following observations.

• Given a power network, solving the SC-DCOPF problem is equivalent to depicting a high-dimensional mapping between the load inputs and the generation and voltage outputs.

• In practice, the SC-DCOPF problem is usually solved repeatedly for the same power network, e.g., every 5 minutes [11], with different load inputs at different time epochs.

As such, it is conceivable to leverage the universal approximation capability of deep feed-forward neural networks [15], [16] to learn the input-to-output mapping for a given power network, and then apply the mapping to obtain operating decisions once load inputs are given (e.g., every 5 minutes).

Specifically, we develop DeepOPF as a DNN-based solution for the SC-DCOPF problem. As compared to conventional approaches based on interior-point methods [17], DeepOPF excels in (i) reducing the computing time and (ii) scaling well with the problem size. These salient features are particularly appealing for solving large-scale SC-DCOPF problems. Note that the complexity of constructing and training a DNN model is minor if amortized over the many problem instances (e.g., one per every 5 minutes) that can be solved using the same model. We summarize our contributions as follows.

First, after reviewing the SC-DCOPF problem in Sec. III, we propose DeepOPF as a DNN framework for solving the SC-DCOPF problem in Sec. IV. In DeepOPF, we first train a DNN to learn the load-generation mapping and predict the generations from the load inputs. We then directly reconstruct the phase angles from the generations and loads by using the (linearized) power flow equations. Such a predict-and-reconstruct two-step procedure significantly reduces the dimension of the mapping to learn, subsequently cutting down the size of our DNN and the amount of training data/time needed. We also design an efficient post-processing procedure based on ℓ1-projection to ensure the feasibility of the final solution, which can be of independent interest.

Then in Sec. V, we derive a condition suggesting that the worst-case approximation error of the neural network in DeepOPF decreases exponentially in the number of layers and polynomially in the number of neurons per layer. This allows us to systematically tune the size of the neural network in DeepOPF according to the desired performance. We also analyze the computational complexity of DeepOPF.

Finally, we carry out simulations and summarize the results in Sec. VI. Simulation results of IEEE test cases show that DeepOPF always generates feasible solutions with minor optimality loss while speeding up the computation time by up to two orders of magnitude as compared to a state-of-the-art solver. The results also highlight a trade-off between the prediction accuracy and the running time of DeepOPF.

Due to the space limitation, all proofs are in the supplementary material.

II. RELATED WORK

Existing studies on solving SC-OPF focus on four lines of approaches. The first is iteration-based algorithms. The SC-OPF problem is first approximated as an optimization problem, e.g., quadratic programming [18] or linear programming [19]. Then iteration-based algorithms, e.g., the interior-point method [20], are applied to obtain solutions for the approximated problems. The time complexity of iteration-based algorithms, however, can be substantial for large-scale power systems, limiting their applicability in practice. This is due to the significant number of constraints introduced by the consideration of a large number of contingencies. See, e.g., [13] for a survey on iteration-based algorithms for solving SC-OPF problems.

The second line of approach comprises computational intelligence-based schemes, including ones based on evolutionary programming [21], [22], [23], [24]. For instance, the authors of [21] propose a particle swarm optimization method for solving SC-OPF problems, in which they apply the particle swarm optimization (PSO) algorithm with reconstruction operators (PSO-RO) to find the solutions and design an external penalty to ensure the feasibility of the obtained solution. Two limitations of this approach are the lack of performance guarantees and high computational complexity [25].

The third is learning-based methods. There has been research applying machine learning to various tasks in power systems, e.g., power system state estimation (PSSE) [26], [27], [28]; see [29] for a comprehensive survey. On solving OPF problems, existing studies mainly focus on integrating learning techniques into conventional algorithms to facilitate the process of solving SC-OPF problems [30], [31], [32], [33]. For instance, [30] applies a neural network to learn the system security boundaries as an explicit function to be used in the OPF formulation. The authors of [32] and [33] develop a decision tree model to extract tractable rules from large data sets of operating points. The model is then used to identify the feasible region and derive possible solutions accordingly. It remains an active research topic to understand the potential of this line of approach and develop corresponding performance guarantees.

Recently, there is a line of research on determining the active/inactive constraint set to reduce the size of power system optimization problems, e.g., unit commitment and the OPF problem, so as to accelerate the solving process [34], [35], [36], [37], [38]. While both DeepOPF and the approaches on removing inactive/determining active constraints can reduce the computing time for solving OPF problems, they are based on orthogonal ideas and can be combined to achieve an even better speedup. Specifically, the approaches on removing inactive constraints/determining active constraints achieve speedup by reducing the size of the OPF problems. In contrast, DeepOPF achieves speedup by employing a DNN-based OPF solver. It is conceivable to first reduce the size of an OPF problem by removing the inactive constraints and then apply DeepOPF to solve the size-reduced problem, so as to achieve a speedup not possible by individual approaches alone.


To the best of our knowledge, DeepOPF is the first work to develop a DNN-based solver for directly solving OPF problems. It learns the mapping from the load inputs to the generation and voltage outputs and directly obtains solutions to the SC-DCOPF problem with feasibility guarantees. As compared to our previous study on DeepOPF in [39], this paper studies the more challenging SC-DCOPF problem and, more importantly, characterizes a useful condition that allows us to tune the size of the DNN according to a pre-specified performance guarantee. The predict-and-reconstruct DNN framework for solving OPF problems outlined in [39] (and this paper) applies to the AC-OPF setting as well. It has received growing interest, with initial results reported in [40], [41], which demonstrate the speedup potential and highlight the challenges of ensuring feasibility under the AC-OPF setting.

III. SECURITY-CONSTRAINED DCOPF PROBLEM

We focus on the widely-studied (N − 1) SC-DCOPF problem considering contingencies due to the outage of any single transmission line. The objective is to minimize the total generation cost subject to the generator operation limits, the power balance equations, and the transmission line capacity constraints under all contingencies [42]. Assuming that the power network remains connected upon contingency, the SC-DCOPF problem is formulated as follows²:

$$\min_{\Theta_c, P_G} \ \sum_{i\in\mathcal{G}} g_i\left(P_{G_i}\right) \qquad (1)$$
$$\text{s.t.} \quad P_{G_i}^{\min} \le P_{G_i} \le P_{G_i}^{\max}, \quad i \in \mathcal{G}, \qquad (2)$$
$$B_c \cdot \Theta_c = P_G - P_D, \quad c \in \mathcal{C}, \qquad (3)$$
$$\frac{1}{x_{ij,c}}\left(\theta_{i,c} - \theta_{j,c}\right) \le P_{T_{ij},c}^{\max}, \quad (i,j) \in \mathcal{E},\ c \in \mathcal{C}. \qquad (4)$$

Here c = 0 denotes the case without any contingencies. $P_{T_{ij},c}^{\max}$ is the transmission limit for the branch connecting buses i and j. $B_c$ is the admittance matrix for the c-th contingency, which is an $N \times N$ matrix with entries

$$B_{ij,c} = \begin{cases} 0, & \text{if } (i,j) \notin \mathcal{E},\ i \ne j;\\[2pt] -\dfrac{1}{x_{ij,c}}, & \text{if } (i,j) \in \mathcal{E};\\[4pt] \displaystyle\sum_{k=1, k\ne i}^{N} \dfrac{1}{x_{ik,c}}, & \text{if } i = j. \end{cases}$$

The first set of constraints describes the generation limits. The second set of constraints comprises the power flow equations with contingencies taken into account. The third set of constraints captures the line transmission capacity for both pre-contingency and post-contingency cases. In the objective, $g_i(P_{G_i})$ is the cost function for the generator at the i-th bus, commonly modeled as a quadratic function [45]:

$$g_i\left(P_{G_i}\right) = \lambda_{1i} P_{G_i}^2 + \lambda_{2i} P_{G_i} + \lambda_{3i}, \qquad (5)$$

²We note that there is another formulation involving only the generations, as the phase angles can be uniquely determined by the generations and loads; see, e.g., [43]. We focus on the standard formulation; both formulations incur the same order of running time complexity [44].

Fig. 1: Overview of the predict-and-reconstruct framework.

where $\lambda_{1i}$, $\lambda_{2i}$, and $\lambda_{3i}$ are the model parameters and can be obtained from measured data of the heat rate curve [42]. We note that the SC-DCOPF problem is a strictly convex (quadratic) problem and thus has a unique optimal solution. While the SC-DCOPF problem is important for reliable power system operation, solving it for large-scale power networks incurs excessive running time, limiting its practicability [14]. In the next section, we propose a neural network approach to solve the SC-DCOPF problem in a fraction of the time used by existing solvers.

IV. DEEPOPF FOR SOLVING SC-DCOPF

A. A Neural-Network Approach for Solving OPF Problems

We outline a general predict-and-reconstruct framework for solving OPF in Fig. 1. Specifically, we exploit the dependency induced by the equality constraints among the decision variables in the OPF formulation. Given the load inputs, the learning model (e.g., a DNN) is applied to predict only a set of independent variables. The remaining variables are then determined by leveraging the (power balance) equality constraints. This way, we not only reduce the number of variables to predict but also guarantee that the obtained solution always satisfies the equality constraints, which is usually difficult for generic learning-based approaches. In this paper, we follow this general approach to develop DeepOPF for solving the SC-DCOPF problem.

B. Overview of DeepOPF

The framework of DeepOPF is shown in Fig. 2; it is divided into a training stage and an inference stage. We first train a DNN to learn the load-generation mapping and predict the generations from the load inputs. We then directly compute the voltages from the generations and loads by using the (linearized) power flow equations.

We discuss the process of constructing and training the DNN model in the following subsections. In particular, we discuss the preparation of the training data in Sec. IV-C, the variable prediction and reconstruction in Sec. IV-D, and the design and training of the DNN in Sec. IV-E.

In the inference stage, we directly apply DeepOPF to solve the SC-DCOPF problem with given load inputs. This is different from recent learning-based approaches for solving OPF, where machine learning helps to facilitate existing solvers, e.g., by identifying the active constraints [37]. We describe an efficient post-processing procedure based on ℓ1-projection to ensure the feasibility of the obtained solutions in Sec. IV-F.


Fig. 2: The flow chart of DeepOPF.

C. Load Sampling and Pre-processing

We sample the loads within $[(1-x)\cdot P_{D_i},\ (1+x)\cdot P_{D_i}]$ uniformly at random, where $P_{D_i}$ is the default power load at the i-th bus and x is the percentage of the sampling range, e.g., 10%. Each sampled load is then fed into a traditional quadratic programming solver [46] to generate the optimal solutions. Uniform sampling is applied to avoid the over-fitting issue, which is common in generic DNN approaches³. After that, the training data is normalized (using the statistical mean and standard deviation) to improve the training efficiency.
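As an illustration, the following Python sketch mirrors the sampling and normalization steps described above. It is a minimal sketch, not the authors' released code: `pd_default`, the range `x`, and the sample count are placeholders, and the solver call that produces the training labels is only indicated in a comment.

```python
import numpy as np

def sample_loads(pd_default, x=0.1, num_samples=50000, seed=0):
    """Sample load vectors uniformly within [(1-x)*PD, (1+x)*PD]."""
    rng = np.random.default_rng(seed)
    low, high = (1 - x) * pd_default, (1 + x) * pd_default
    return rng.uniform(low, high, size=(num_samples, pd_default.size))

def normalize(loads):
    """Standardize each load dimension by its mean and standard deviation."""
    mean, std = loads.mean(axis=0), loads.std(axis=0)
    std[std == 0] = 1.0  # guard buses with zero load against division by zero
    return (loads - mean) / std, mean, std

# Usage sketch: pd_default would come from the test-case data, and each
# sampled load vector would be fed to a QP solver (e.g., Gurobi) to obtain
# the ground-truth optimal generations used as training labels.
```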

D. Generation Prediction and Phase Angle Reconstruction

We express $P_{G_i}$ as follows, for $i \in \mathcal{G}$,

$$P_{G_i} = \alpha_i \cdot \left(P_{G_i}^{\max} - P_{G_i}^{\min}\right) + P_{G_i}^{\min}, \qquad (6)$$

where $\alpha_i \in [0, 1]$ is a scaling factor. It is clear that $\alpha_i$ and $P_{G_i}$ have a one-to-one correspondence. Thus the scaling factors used in the training phase can be directly computed from the generated data. Meanwhile, instead of predicting the generations with diverse value ranges, we predict the scaling factor $\alpha_i \in [0, 1]$ and recover $P_{G_i}$ by using (6). This simplifies the DNN output layer design, to be discussed later. Note that the generation of the slack bus is obtained by subtracting the generations of the other buses from the total load.

Once we obtain $P_G$, we directly compute the phase angles by a useful property of the admittance matrices. We first obtain an $(N-1)\times(N-1)$ matrix $\tilde{B}_c$ by eliminating the row and column corresponding to the slack bus from the admittance matrix $B_c$ for the c-th contingency. It is well understood that $\tilde{B}_c$ is a full-rank matrix [47], [48]. Then we compute an $(N-1)$-dimensional phase angle vector $\tilde{\Theta}_c$ as

$$\tilde{\Theta}_c = \tilde{B}_c^{-1}\left(\tilde{P}_G - \tilde{P}_D\right), \qquad (7)$$

³For load inputs of large dimensions, the uniform mechanism may not be sufficient to guarantee enough good samples, especially near the boundary. In those cases, Markov chain Monte Carlo (MCMC) methods can be applied to sample according to a pre-specified probability distribution, to collect sufficient samples near the boundary of the sampling space.

where $\tilde{P}_G$ and $\tilde{P}_D$ stand for the $(N-1)$-dimensional generation and load vectors for the buses excluding the slack bus under each contingency, respectively. In the end, we output the N-dimensional phase angle vector $\Theta_c$ by inserting a constant phase angle for the slack bus into $\tilde{\Theta}_c$.

There are two advantages to this design. On one hand, we use the property of the admittance matrix to reduce the number of variables to predict by our neural network, cutting down the size of our DNN model and the amount of training data/time needed. On the other hand, the equality constraints involving the generations and the phase angles are satisfied automatically, which can be difficult to handle in alternative learning-based approaches.
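For concreteness, the following NumPy sketch mirrors (6) and (7) for a single contingency. The reduced admittance matrix `B_red` and the slack-bus index are assumed given; solving the linear system is preferred over forming the explicit inverse, which matches the equations above in exact arithmetic.

```python
import numpy as np

def recover_generation(alpha, pg_min, pg_max):
    """Map predicted scaling factors back to generations via (6)."""
    return alpha * (pg_max - pg_min) + pg_min

def reconstruct_angles(B_red, pg, pd, slack, slack_angle=0.0):
    """Solve the reduced DC power flow (7) for one contingency.

    B_red: (N-1)x(N-1) admittance matrix with the slack row/column removed.
    pg, pd: length-N generation and load vectors.
    """
    injection = np.delete(pg - pd, slack)          # exclude the slack bus
    theta_red = np.linalg.solve(B_red, injection)  # avoids an explicit inverse
    return np.insert(theta_red, slack, slack_angle)  # reinstate the slack angle
```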

E. The DNN Model

The core of DeepOPF is the DNN model, which is applied to approximate the load-generation mapping for a given power network. The DNN model is established based on the multi-layer feed-forward neural network structure, which consists of a typical three-level architecture: one input layer, several hidden layers, and one output layer. More specifically, the DNN model is defined as:

$$h_0 = P_D,$$
$$h_i = \sigma\left(W_i h_{i-1} + b_i\right), \quad i = 1, \ldots, N_{hid},$$
$$\hat{\alpha} = \sigma'\left(W_o h_{N_{hid}} + b_o\right),$$

where $h_0$ denotes the input vector of the network, $h_i$ is the output vector of the i-th hidden layer, and $\hat{\alpha}$ is the generated scaling-factor vector for the generators.

1) The architecture: The i-th hidden layer models the interactions between features by introducing a connection weight matrix $W_i$ and a bias vector $b_i$. The activation function $\sigma(\cdot)$ further introduces non-linearity into the hidden layers. We adopt the Rectified Linear Unit (ReLU) as the activation function of the hidden layers, which helps to accelerate the convergence and alleviate the vanishing-gradient problem [1]. In addition, the Sigmoid function [2], $\sigma'(x) = \frac{1}{1+e^{-x}}$, is applied at the output layer to constrain the outputs of the network to (0, 1).
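A minimal PyTorch sketch of this architecture is given below, assuming the Case30 sizes from Table I (21 load inputs, hidden layers of 32/16/8 neurons, 2 generator outputs). The class name and sizes are illustrative rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class DeepOPFNet(nn.Module):
    """Feed-forward net: load vector in, scaling factors alpha in (0, 1) out."""
    def __init__(self, num_loads=21, hidden=(32, 16, 8), num_gens=2):
        super().__init__()
        layers, dim = [], num_loads
        for width in hidden:
            layers += [nn.Linear(dim, width), nn.ReLU()]  # hidden layers with ReLU
            dim = width
        layers += [nn.Linear(dim, num_gens), nn.Sigmoid()]  # output constrained to (0, 1)
        self.net = nn.Sequential(*layers)

    def forward(self, pd):
        return self.net(pd)
```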

2) The loss function: After constructing the DNN model, we need to design the corresponding loss function to guide the training. Since there exists a one-to-one correspondence between $P_G$ and $\Theta_c$, it suffices to focus on the loss of $P_G$, which is defined as the mean square error between the predicted scaling factors $\hat{\alpha}_i$ and the optimal scaling factors $\alpha_i$:

$$\mathcal{L}_{P_G} = \frac{1}{|\mathcal{G}|}\sum_{i\in\mathcal{G}} \left(\hat{\alpha}_i - \alpha_i\right)^2. \qquad (8)$$

Meanwhile, we introduce a penalty term related to the inequality constraints into the loss function. We first introduce an $N_a \times N$ matrix $A_c$ for each contingency c, where $N_a$ is the number of adjacent bus pairs. Each row in $A_c$ corresponds to an adjacent bus pair. Given the k-th adjacent bus pair $(i_k, j_k) \in \mathcal{E}$, $k = 1, \ldots, N_a$, under the c-th contingency, suppose the power flows from bus $i_k$ to bus $j_k$. The corresponding entries $a_{k i_k, c}$ and $a_{k j_k, c}$ of the matrix $A_c$ are then given as:

$$a_{k i_k, c} = \frac{1}{P_{T_{i_k j_k},c}^{\max} \cdot x_{i_k j_k,c}} \quad \text{and} \quad a_{k j_k, c} = \frac{-1}{P_{T_{i_k j_k},c}^{\max} \cdot x_{i_k j_k,c}}. \qquad (9)$$

Based on (7) and (9), the capacity constraints for the transmission lines in (4) can be expressed as:

$$-1 \le \left(A_c \Theta_c\right)_k \le 1, \quad k = 1, \ldots, N_a,\ c \in \mathcal{C}, \qquad (10)$$

where $(A_c \Theta_c)_k$ represents the k-th element of $A_c \Theta_c$. Note that $\Theta_c$ is the phase angle vector generated based on (7) and the discussion below it, and it is computed from $P_G$ and $P_D$. We can then calculate $(A_c \Theta_c)_k$. The penalty term capturing the feasibility of the generated solutions is defined as:

$$\mathcal{L}_{pen} = \frac{1}{N_a}\sum_{k=1}^{N_a} \max\left(\left(A_c \Theta_c\right)_k^2 - 1,\ 0\right). \qquad (11)$$

In summary, the loss function consists of two parts: the difference between the generated solution and the reference solution, and the penalty on solutions violating the inequality constraints. The total loss is a weighted sum of the two:

$$\mathcal{L}_{total} = w_1 \cdot \mathcal{L}_{P_G} + w_2 \cdot \mathcal{L}_{pen}, \qquad (12)$$

where $w_1$ and $w_2$ are positive weighting factors balancing the influence of each term in the training phase.
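The sketch below expresses (8), (11), and (12) as a PyTorch loss for one contingency, under the assumption that the scaled flow matrix `A_c` from (9) is precomputed and that the angle vector `theta` is obtained via a differentiable version of (7); the default weights match the experimental setting $w_1 = w_2 = 1$.

```python
import torch

def deepopf_loss(alpha_pred, alpha_opt, theta, A_c, w1=1.0, w2=1.0):
    """Total training loss (12): prediction MSE (8) plus feasibility penalty (11)."""
    loss_pg = torch.mean((alpha_pred - alpha_opt) ** 2)           # eq. (8)
    flows = A_c @ theta                                           # (A_c * Theta_c)_k in (10)
    penalty = torch.mean(torch.clamp(flows ** 2 - 1.0, min=0.0))  # eq. (11)
    return w1 * loss_pg + w2 * penalty                            # eq. (12)
```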

3) The training process: The training process can be regarded as minimizing the average loss over the given training data by tuning the parameters of the DNN model:

$$\min_{W_i, b_i} \ \frac{1}{N_T}\sum_{k=1}^{N_T} \mathcal{L}_{total,k}, \qquad (13)$$

where we recall that $W_i$ and $b_i$, $i = 1, \ldots, N_{hid}$, represent the connection weight matrix and bias vector for layer i. $N_T$ is the amount of training data and $\mathcal{L}_{total,k}$ is the loss of the k-th item in the training.

We apply the stochastic gradient descent (SGD) method with momentum [49] to solve the problem in (13), which is effective for large-scale datasets and economizes on the computational cost of each iteration by evaluating the loss on a subset of the training data at every step.
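A compact training-loop sketch under the stated settings (SGD with momentum, 300 epochs, batch size 64) follows. The momentum value 0.9 and learning rate are assumed typical choices, and `train_loader`, `reconstruct_batch`, and `A_c` are hypothetical placeholders for the data pipeline and the differentiable reconstruction of (6)-(7).

```python
import torch

# Hypothetical setup: `model` is a DeepOPFNet instance, `train_loader` yields
# (pd, alpha_opt) mini-batches of size 64, and A_c is the scaled flow matrix
# from eq. (9) for the contingency being penalized.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

for epoch in range(300):
    for pd, alpha_opt in train_loader:
        alpha_pred = model(pd)                     # predict scaling factors
        theta = reconstruct_batch(alpha_pred, pd)  # placeholder: differentiable (6)-(7)
        loss = deepopf_loss(alpha_pred, alpha_opt, theta, A_c)
        optimizer.zero_grad()
        loss.backward()                            # back-propagate total loss (12)
        optimizer.step()
```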

F. Post-Processing

After obtaining a solution including the generations and phase angles, we check its feasibility by examining whether it violates the generation limits or the line transmission limits. We output the solution if it passes the feasibility test. Otherwise, we solve the following ℓ1-projection problem with linear constraints to obtain a feasible solution:

$$\min \ \|\hat{P}_G - U\|_1 \quad \text{s.t.} \quad U \text{ satisfies (2)-(4)}, \qquad (14)$$

where $\hat{P}_G$ is the solution predicted by the DNN. We remark that such an ℓ1-projection problem is indeed an LP and can be solved efficiently by off-the-shelf solvers.
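The ℓ1-projection in (14) can be cast as an LP via the standard variable-splitting trick, minimizing auxiliary bounds $t \ge |\hat{P}_G - U|$. The sketch below uses `scipy.optimize.linprog`, with the feasibility constraints (2)-(4) abstracted into generic matrices `A_ub, b_ub, A_eq, b_eq` over U; assembling those matrices from the network data is assumed done elsewhere.

```python
import numpy as np
from scipy.optimize import linprog

def l1_projection(pg_hat, A_ub, b_ub, A_eq, b_eq):
    """Project the DNN prediction pg_hat onto the feasible set (2)-(4).

    Decision vector z = [U, t]; minimize sum(t) subject to
    -t <= U - pg_hat <= t and the supplied linear constraints on U.
    """
    n = pg_hat.size
    c = np.concatenate([np.zeros(n), np.ones(n)])  # objective: sum of t
    I = np.eye(n)
    # |U - pg_hat| <= t, rewritten as two sets of linear inequalities
    abs_ub = np.block([[I, -I], [-I, -I]])
    abs_rhs = np.concatenate([pg_hat, -pg_hat])
    A_ub_full = np.vstack([np.hstack([A_ub, np.zeros((A_ub.shape[0], n))]), abs_ub])
    b_ub_full = np.concatenate([b_ub, abs_rhs])
    A_eq_full = np.hstack([A_eq, np.zeros((A_eq.shape[0], n))])
    res = linprog(c, A_ub=A_ub_full, b_ub=b_ub_full,
                  A_eq=A_eq_full, b_eq=b_eq, bounds=[(None, None)] * (2 * n))
    return res.x[:n]  # the feasible generation vector U
```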

V. PERFORMANCE ANALYSIS OF DEEPOPF

A. Approximation Error of the Load-to-Generation Mapping

Given a power network, the SC-DCOPF problem is a quadratic programming problem with linear constraints. We denote the mapping between the load input $P_D$ and the optimal generation $P_G$ as $f^*(\cdot)$. Following the common practice in deep-learning analysis (e.g., [50], [51], [52]) and without loss of generality, we focus on the case of one-dimensional output in the following analysis, i.e., $f^*(\cdot)$ is a scalar.⁴ Assuming the load input domain is compact, which usually holds in practice, $f^*(\cdot)$ has certain properties.

Lemma 1. The function $f^*(\cdot)$ is piece-wise linear and Lipschitz-continuous. That is, there exists a constant $\Lambda > 0$ such that for any $x_1, x_2$ in the domain of $f^*(\cdot)$,

$$|f^*(x_2) - f^*(x_1)| \le \Lambda \cdot \|x_1 - x_2\|_2.$$

Define $f(\cdot)$ as the mapping between $P_D$ and the generation obtained by DeepOPF using a neural network with depth $N_{hid}$ and maximum number of neurons per layer M. We focus on the case of one-dimensional output. As $f(\cdot)$ is generated from a neural network with ReLU activation functions, it is also piece-wise linear [53].

By exploiting the piece-wise linearity and the Lipschitz continuity, we analyze the approximation error between $f^*(\cdot)$ and $f(\cdot)$.

Theorem 2. Let $\mathcal{H}$ be the class of all possible $f^*(\cdot)$ with a Lipschitz constant $\Lambda > 0$. Let $\mathcal{G}$ be the class of all $f(\cdot)$ generated by a neural network with depth $N_{hid}$ and at most M neurons per layer. Then

$$\max_{f^*\in\mathcal{H}} \ \min_{f\in\mathcal{G}} \ \max_{x\in\mathcal{S}} \ |f^*(x) - f(x)| \ \ge \ \frac{\Lambda \cdot d}{4\cdot(2M)^{N_{hid}}}, \qquad (15)$$

where d is the diameter of the load input domain $\mathcal{S}$.

⁴To extend the results for mappings with one-dimensional output to mappings with multi-dimensional outputs, one can view the latter as multiple mappings each with one-dimensional output, apply the results for one-dimensional output multiple times, and combine them to get the one for multi-dimensional output.


The theorem characterizes a lower bound on the worst-case error of using neural networks to approximate load-generation mappings in SC-DCOPF problems. The bound is linear in d, which captures the size of the load input domain, and in $\Lambda$, which captures the “curveness” of the mapping to learn. Meanwhile, interestingly, the bound decreases exponentially in the number of layers but only polynomially in the number of neurons per layer. This suggests the benefits of using “deep” neural networks in mapping approximation, similar to the observations in [50], [51], [52]⁵.

A useful corollary suggested by Theorem 2 is the following.

Corollary 3. The following gives a condition on the neural network parameters such that it is ever possible to approximate the most difficult load-to-generation mapping with a Lipschitz constant $\Lambda$ up to an error of $\epsilon > 0$:

$$(2M)^{N_{hid}} \ \ge \ \frac{\Lambda \cdot d}{4 \cdot \epsilon}, \qquad (16)$$

where d is the diameter of the input domain $\mathcal{S}$.

The condition in (16) gives a necessary “size” of the neural network to achieve the preferred approximation accuracy. If (16) is not satisfied, then there may exist a difficult mapping for which even the smallest possible approximation error exceeds $\epsilon$.
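As a numerical illustration with made-up values (not taken from the paper): suppose $\Lambda = 10$, $d = 100$, and a target error $\epsilon = 0.25$. Then (16) requires

$$(2M)^{N_{hid}} \ \ge \ \frac{10 \cdot 100}{4 \cdot 0.25} = 1000,$$

so a depth-3 network needs $2M \ge 10$, i.e., at least $M = 5$ neurons per layer, whereas a depth-1 network would need $M \ge 500$. This illustrates why, in this bound, depth is exponentially more economical than width.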

B. Computational Complexity

Recall that N is the number of buses. The number of optimization variables in SC-DCOPF, including the generations and the phase angles at all buses under all possible contingencies, as well as the number of constraints, is $O(N^3)$.

The computational complexity of interior point methods for solving SC-DCOPF as a convex quadratic problem is $O((N^3)^4) = O(N^{12})$, measured as the number of elementary operations, assuming that each elementary operation takes a fixed amount of time to perform [54].

The computational complexity of DeepOPF consists of three parts. The first is the complexity of predicting the generations using the DNN, which is $O(N_{hid} M^2)$, where M is the maximum number of neurons in each layer and $N_{hid}$ is the number of hidden layers in the DNN. See Appendix E for details of the analysis. To achieve satisfactory performance in terms of optimality loss and speed-up, we set M to be O(N) and $N_{hid}$ to be 3. As such, the complexity of predicting the generations by our DNN is $O(N^2)$.

The second is the complexity of computing the phase angles from the generations by directly solving the (linearized) power flow equations and checking the feasibility of the results. The process involves solving $O(N^2)$ sets of linear equations, one set for each contingency, and checking the transmission line limit constraints. The total complexity is $O(N^5)$.

The third is the complexity of the ℓ1-projection, if the post-processing procedure is invoked to ensure feasibility of the obtained solutions. The ℓ1-projection is a linear programming problem and can be solved in $O((N^3)^{2.5}) = O(N^{7.5})$ time by using algorithms based on fast matrix multiplication [55].

⁵While our observations are similar to those in [50], [51], [52], there is a distinct difference in the results and the proof techniques, as we exploit the piece-wise linearity of the function unique to our setting.

Fig. 3: The detailed architecture of the DNN model for IEEE Case30.

TABLE I: Parameters for test cases.

Case          | N   | |G| | |D| | |K| | N_hid | Neurons per hidden layer
IEEE Case30   | 30  | 2   | 21  | 41  | 3     | 32/16/8
IEEE Case57   | 57  | 4   | 42  | 80  | 3     | 32/16/8
IEEE Case118  | 118 | 19  | 99  | 186 | 3     | 128/64/32
IEEE Case300  | 300 | 57  | 199 | 411 | 3     | 256/128/64

* The number of load buses is calculated based on the default load on each bus. A bus is considered a load bus if its default active power consumption is positive.

Overall, the total computational complexity of DeepOPF is $O(N^5)$ if the post-processing procedure is not invoked, for example when the power system is operated in the light-load regime. Otherwise, it is $O(N^{7.5})$. In both cases, the complexity is significantly lower than that of the conventional interior point method, which is $O(N^{12})$.

Our simulation results in Sec. VI corroborate the above observations. For both typical and highly-congested settings, DeepOPF obtains quality solutions for SC-DCOPF problems in a fraction of the time used by a state-of-the-art solver. We also note that the ℓ1-projection in the post-processing procedure is an LP and can be solved efficiently by many off-the-shelf solvers.

C. Trade-off between Accuracy and Complexity

The results in Theorem 2 and Proposition 6 suggest a trade-off between accuracy and complexity. In particular, we can tune the number of hidden layers $N_{hid}$ and the maximum number of neurons per layer M to trade between the approximation accuracy and the computational complexity of the DNN approach. It appears desirable to design multi-layer neural networks in DeepOPF, as increasing $N_{hid}$ may reduce the approximation error exponentially while only increasing the complexity linearly.

VI. NUMERICAL EXPERIMENTS

A. Experiment Setup

1) Simulation environment: The experiments are conducted in CentOS 7.6 on a quad-core CPU workstation with 16 GB RAM.


TABLE II: Performance comparison under typical operating conditions.

Test case     | # Contingencies | # Variables | Feasibility before ℓ1-projection (%) | Average cost ($/hr): DeepOPF / Ref. | Optimality loss (%) | Running time (ms): DeepOPF / Ref. | Speedup
IEEE Case30   | 38  | 1172  | 100  | 225.7 / 225.7       | <0.1 | 0.72 / 17   | ×24
IEEE Case57   | 79  | 4564  | 100  | 9022.9 / 9021.6     | <0.1 | 0.76 / 102  | ×133
IEEE Case118  | 177 | 21023 | 100  | 29197.9 / 29149.0   | <0.2 | 2.48 / 698  | ×281
IEEE Case300  | 318 | 95757 | 81.7 | 156601.8 / 156542.5 | <0.1 | 81.4 / 5766 | ×318

Fig. 4: Empirical cumulative distributions of the speedup (a) and the optimality loss (b) for the IEEE Case118 under typical operating conditions.

2) Test cases: We consider four IEEE standard cases in the Power Grid Lib [56] (version 19.04): the IEEE Case30/57/118/300 test systems, representing small-scale, medium-scale, and large-scale power networks. Their illustrations are in [57], [58] and their parameters are shown in Table I. For each case, we consider the typical operating conditions [56], where the active power loads are within the normal region and the branch limits are not binding during both the pre-/post-contingency cases. Note that the power flow balance constraints are active, so the SC-DCOPF problem under the typical operating conditions is still a constrained optimization problem. We illustrate the detailed architecture of our DNN model for the IEEE Case30 in Fig. 3.

3) Data preparation: In the training stage, the load data is sampled uniformly at random within [90%, 110%] of the default value on each bus [56]. As the Power Grid Lib only has linear cost functions for generators, we use the cost functions from the test cases with the same number of buses from MATPOWER [59] (version 7.0), while all other parameters are taken from the Power Grid Lib cases. We then obtain the solutions of the SC-DCOPF problems by Gurobi [46] (version 8.1.1). We sample 50,000 training instances and 5,000 test instances for each test case.

4) The implementation of the DNN model: We design the DNN model based on the PyTorch platform and apply the stochastic gradient descent (SGD) method with momentum [49] to train the neural network. The number of epochs is set to 300 and the batch size is 64. We set the weighting factors in the loss function in (12) to be $w_1 = w_2 = 1$, based on empirical experience. The remaining parameters are shown in Table I, including the number of hidden layers and the number of neurons in each layer.

TABLE III: Performance under the high-variation load condition.

Operating conditions             | Feasibility rate (%) | Optimality loss (%) | Speedup
Typical                          | 100  | <0.1 | ×338
Congested, with ℓ1-projection    | 100  | <0.2 | ×56
Congested, without ℓ1-projection | 15.7 | <0.2 | ×315

5) Evaluation metrics: We compare the performance of DeepOPF and the state-of-the-art Gurobi solver⁶ using the following metrics, averaged over 5,000 test instances. The first is the percentage of feasible solutions obtained by each approach. The second is the objective cost obtained by each approach. The third is the running time, i.e., the average computation time for obtaining solutions for the 5,000 instances. The fourth is the speedup, i.e., the average of the running-time ratios of the Gurobi solver to DeepOPF over all test instances. It captures the average gain in computation time of using DeepOPF over the Gurobi solver. We note that the speedup is the average of ratios, which is different from the ratio of the average running times between the Gurobi solver and DeepOPF.
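To see why the two aggregates differ, consider a hypothetical two-instance example with made-up running times:

```python
import numpy as np

solver_time = np.array([100.0, 10.0])  # ms per instance (made-up values)
deepopf_time = np.array([1.0, 2.0])    # ms per instance (made-up values)

avg_of_ratios = np.mean(solver_time / deepopf_time)       # (100 + 5) / 2 = 52.5
ratio_of_avgs = solver_time.mean() / deepopf_time.mean()  # 55.0 / 1.5 ≈ 36.7
```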

B. Performance under the Typical Operating Condition

The simulation results for the test cases under the typical operating conditions are shown in Table II, and we have several observations. First, as compared to the Gurobi solver, DeepOPF speeds up the computing time by up to two orders of magnitude. The speedup is increasingly significant as the test cases get larger, suggesting that DeepOPF is more efficient for large-scale power networks. Second, DeepOPF without invoking the post-processing procedure always generates feasible solutions for IEEE Case30, IEEE Case57, and IEEE Case118, which justifies our design. We note that for IEEE Case300, DeepOPF achieves an 81.7% feasibility rate before the post-processing procedure and an overall ×318 average speedup. Further analysis shows that the average speedup for the test instances with feasible solutions generated by the DNN (thus without invoking the post-processing procedure) is ×385, with an average running time of 15 ms.

⁶The Gurobi solver by default uses a multi-threading technique, which affects the computing time due to the threads' communication overhead. For a fair comparison, we use the single-threading setting in our simulations.


For the remaining 18.3% of test instances, for which the DNN generates infeasible solutions, the infeasibility is due to the violation of one or two line capacity limit constraints. The ℓ1-projection based post-processing procedure is invoked to obtain feasible solutions, and the average running time of DeepOPF with ℓ1-projection is 378 ms. Overall, the average DeepOPF running time for all the IEEE Case300 test instances is 81.4 ms and the average speedup is ×318. Third, the cost difference between the DeepOPF solution and the reference Gurobi solution is minor, which means the generated solution has decent accuracy as compared to the optimal solution.

To further understand the performance of DeepOPF, we plot the empirical cumulative distributions of the speedup and the optimality loss for the IEEE Case118 in Fig. 4(a) and Fig. 4(b), respectively. As seen, DeepOPF consistently achieves excellent optimality-loss and speedup performance for all the test instances. Overall, our results show that DeepOPF can generate solutions with minor optimality loss within a fraction of the time used by the Gurobi solver.

C. Performance with High-Variation Load and under Congested Conditions

To stress-test DeepOPF, we enlarge the sampling range of the load on each bus to 50% and carry out simulations on IEEE Case118 under both the typical and the congested settings. Note that under typical operating conditions, almost no line constraints are binding, while under the congested setting, the loads are larger than those under the typical operating condition and the transmission line constraints are more likely to be binding. The numbers of training and test instances are 50,000 and 5,000, respectively. Our analysis shows that more than 98% of the 5,000 test instances have at least one branch line constraint binding, i.e., at least one power line is congested at the optimal solution. On average, each instance has 2 line constraints binding. The testing results for both cases are reported in Table III. We observe that DeepOPF achieves a 100% feasibility rate as well as a desirable speedup, with less than 0.1% optimality loss, under typical operation with 50% load variation. This implies that DeepOPF works well under the high-variation typical load condition. In addition, under the congested setting, DeepOPF with post-processing still generates 100% feasible solutions, with less than 0.2% optimality loss and a ×56 speedup, as compared to the state-of-the-art solver Gurobi. These observations demonstrate that DeepOPF achieves substantial speedups at the expense of minor optimality loss under both the typical and the stressful conditions.

D. Performance with Different DNN Scales and Training Data Sizes

When applying DNN approaches, it is of interest to evaluate the influence of the DNN's size and the amount of training data on the performance. In addition to the corresponding performance analysis w.r.t. the DNN's size in Sec. V, we carry out experiments to compare the optimality loss and speedup of DeepOPF with different neural network sizes and training data sizes for IEEE Case118 under the typical operating condition. Three DNN models of different scales are used for comparison:


training data size for IEEE case118 under the typical operationcondition. Three DNN models of different scales are used forcomparison:• DeepOPF-V1: A simple neural network with one hidden

layer; the number of neurons is 16.• DeepOPF-V2: A simple neural network with two hidden

layers; the numbers of neurons per layer are 32 and 16,respectively.

• DeepOPF-V3: A simple neural network with three hid-den layers; the numbers of neurons per layer are 64, 32,and 16, respectively.

The training data size varies from 10,000 to 30,000. The results are shown in Fig. 5(a) and Fig. 5(b). We observe that a larger training data size contributes to a smaller optimality loss. Furthermore, we observe that as the depth and size of the neural network increase, DeepOPF achieves better optimality loss but less speedup. These results corroborate our theoretical analysis of computational complexity and prediction accuracy in Sec. V-B and Sec. V-A: a larger DNN tends to have better prediction accuracy (smaller optimality loss) but also higher computational complexity. That said, the over-fitting issue may appear in practice if we keep increasing the depth and size. Thus, for different power networks (such as the IEEE test cases), the DNN model can be determined by educated guesses and iterative tuning, which is by far the common practice for generic DNN approaches in various engineering domains.

E. Performance with different Weighting Factors in LossFunction

As shown Sec. IV-E, there are two weighting factors w1 andw2 in the loss function to balance between the training loss andthe penalty of violating the inequality constraints. We carryout comparative experiments to evaluate the influence of thetwo hyper-parameters on the performance. More specifically,we use IEEE Case118 with 50% sampling range for testing,where the penalty is more likely to take effect as severaltransmission lines are binding. Three variants of the weightingfactors in the loss function and the corresponding results areshown in Table IV. As seen, larger value of w2 enhances thefeasibility rate (before `1-projection) and the speedup as thepost-processing step is involved in fewer test instances. In


TABLE IV: Performance comparisons of different combinations of weights in the loss function.

Weight setting    |                       | Feasibility rate (%) | Optimality loss (%) | Speedup
w1 = 1, w2 = 1    | with ℓ1-projection    | 100  | <0.2 | ×56
                  | without ℓ1-projection | 15.7 | <0.2 | ×315
w1 = 1, w2 = 10   | with ℓ1-projection    | 100  | <0.3 | ×83
                  | without ℓ1-projection | 23.8 | <0.3 | ×324
w1 = 10, w2 = 1   | with ℓ1-projection    | 100  | <0.1 | ×53
                  | without ℓ1-projection | 14.5 | <0.1 | ×324

In practice, the weight factors can be determined by educated guesses and iteratively adjusted to balance the influence of the two terms in the loss function.

VII. CONCLUSION

We develop DeepOPF for solving SC-DCOPF problems. DeepOPF is inspired by the observation that solving SC-DCOPF problems for a given power network is equivalent to learning a high-dimensional mapping from the load inputs to the dispatch and transmission decisions. DeepOPF employs a DNN to learn such a mapping. With the learned mapping, it first obtains the generations from the load inputs and then directly computes the phase angles from the generations and loads. We also develop an ℓ1-projection based post-processing procedure to ensure the feasibility of the obtained solution, which can be of independent interest. We characterize the approximation capability and computational complexity of DeepOPF. Simulation results show that DeepOPF scales well with the problem size and speeds up the computing time by up to two orders of magnitude as compared to conventional approaches. Future directions include extending DeepOPF to the AC-OPF setting and exploring joint learning-based and optimization-based algorithm design.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proceedings of the International Conference on Neural Information Processing Systems, vol. 1, Lake Tahoe, Nevada, USA, 2012, pp. 1097–1105.

[2] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep Learning. MIT Press Cambridge, 2016, vol. 1.

[3] P. Covington, J. Adams, and E. Sargin, “Deep Neural Networks for YouTube Recommendations,” in Proceedings of the ACM Conference on Recommender Systems, New York, NY, USA, Sep 2016, pp. 191–198.

[4] F. Wan, L. Hong, A. Xiao, T. Jiang, and J. Zeng, “NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions,” Bioinformatics, vol. 35, no. 1, pp. 104–111, Jul 2018.

[5] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan 2016.

[6] A. Toshev and C. Szegedy, “Deeppose: Human pose estimation via deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, June 2014, pp. 1653–1660.

[7] J. Carpentier, “Contribution to the economic dispatch problem,” Bulletin de la Societe Francoise des Electriciens, vol. 3, no. 8, pp. 431–447, 1962.

[8] D. E. Johnson, J. R. Johnson, J. L. Hilburn, and P. D. Scott, Electric Circuit Analysis. Prentice Hall Englewood Cliffs, 1989, vol. 3.

[9] S. Frank, I. Steponavice, and S. Rebennack, “Optimal power flow: a bibliographic survey I,” Energy Systems, vol. 3, no. 3, pp. 221–258, Sep 2012.

[10] ——, “Optimal power flow: a bibliographic survey II,” Energy Systems, vol. 3, no. 3, pp. 259–289, Sep 2012.

[11] M. B. Cain, R. P. O'Neill, and A. Castillo, “History of optimal power flow and formulations,” Federal Energy Regulatory Commission, vol. 1, pp. 1–36, 2012.

[12] A. J. Ardakani and F. Bouffard, “Identification of umbrella constraints in DC-based security-constrained optimal power flow,” IEEE Transactions on Power Systems, vol. 28, no. 4, pp. 3924–3934, 2013.

[13] F. Capitanescu, J. M. Ramos, P. Panciatici, D. Kirschen, A. M. Marcolini, L. Platbrood, and L. Wehenkel, “State-of-the-art, challenges, and future trends in security constrained optimal power flow,” Electric Power Systems Research, vol. 81, no. 8, pp. 1731–1741, 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0378779611000885

[14] N. Chiang and A. Grothey, “Solving security constrained optimal power flow problems by a structure exploiting interior point method,” Optimization and Engineering, vol. 16, no. 1, pp. 49–71, 2015.

[15] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.

[16] B. Karg and S. Lucia, “Efficient representation and approximation of model predictive control laws via deep learning,” arXiv preprint arXiv:1806.10644, 2018.

[17] J. A. Momoh and J. Z. Zhu, “Improved interior point method for OPF problems,” IEEE Transactions on Power Systems, vol. 14, no. 3, pp. 1114–1120, Aug 1999.

[18] J. A. Momoh, “A generalized quadratic-based model for optimal power flow,” in Proceedings of IEEE International Conference on Systems, Man and Cybernetics, vol. 1, Cambridge, MA, USA, Nov 1989, pp. 261–271.

[19] S. H. Low, “Convex relaxation of optimal power flow, part I: Formulations and equivalence,” IEEE Transactions on Control of Network Systems, vol. 1, no. 1, pp. 15–27, March 2014.

[20] A. A. Sousa and G. L. Torres, “Globally convergent optimal power flow by trust-region interior-point methods,” in 2007 IEEE Lausanne Power Tech, Lausanne, Switzerland, Jul 2007, pp. 1386–1391.

[21] P. E. O. Yumbla, J. M. Ramirez, and C. A. C. Coello, “Optimal power flow subject to security constraints solved with a particle swarm optimizer,” IEEE Transactions on Power Systems, vol. 23, no. 1, pp. 33–40, 2008.

[22] N. Amjady, H. Fatemi, and H. Zareipour, “Solution of optimal power flow subject to security constraints by a new improved bacterial foraging method,” IEEE Transactions on Power Systems, vol. 27, no. 3, pp. 1311–1323, 2012.

[23] Y. Xu, Z. Y. Dong, R. Zhang, K. P. Wong, and M. Lai, “Solving preventive-corrective SCOPF by a hybrid computational strategy,” IEEE Transactions on Power Systems, vol. 29, no. 3, pp. 1345–1355, 2013.

[24] J. Cao, W. Du, and H. Wang, “An improved corrective security constrained OPF with distributed energy storage,” IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 1537–1545, 2015.

[25] P. A. Vikhar, “Evolutionary algorithms: A critical review and its future prospects,” in 2016 International Conference on Global Trends in Signal Processing, Information Computing and Communication (ICGTSPICC). IEEE, 2016, pp. 261–265.

[26] A. S. Zamzam and N. D. Sidiropoulos, “Physics-aware neural networks for distribution system state estimation,” arXiv preprint arXiv:1903.09669, 2019.

[27] L. Zhang, G. Wang, and G. B. Giannakis, “Real-time power system state estimation via deep unrolled neural networks,” in IEEE Global Conference on Signal and Information Processing (GlobalSIP). IEEE, 2018, pp. 907–911.


[28] ——, “Real-time power system state estimation and forecasting via deep unrolled neural networks,” IEEE Transactions on Signal Processing, vol. 67, no. 15, pp. 4069–4077, 2019.

[29] L. A. Wehenkel, Automatic Learning Techniques in Power Systems. Springer Science & Business Media, 2012.

[30] V. J. Gutierrez-Martinez, C. A. Canizares, C. R. Fuerte-Esquivel, A. Pizano-Martinez, and X. Gu, “Neural-network security-boundary constrained optimal power flow,” IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 63–72, 2010.

[31] R. Canyasse, G. Dalal, and S. Mannor, “Supervised learning for optimal power flow as a real-time proxy,” in 2017 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT). IEEE, 2017, pp. 1–5.

[32] F. Thams, L. Halilbasic, P. Pinson, S. Chatzivasileiadis, and R. Eriksson, “Data-driven security-constrained OPF,” in X Bulk Power Systems Dynamics and Control Symposium, 2017.

[33] L. Halilbasic, F. Thams, A. Venzke, S. Chatzivasileiadis, and P. Pinson, “Data-driven security-constrained AC-OPF for operations and markets,” in 2018 Power Systems Computation Conference (PSCC). IEEE, 2018, pp. 1–7.

[34] Q. Zhai, X. Guan, J. Cheng, and H. Wu, “Fast identification of inactive security constraints in SCUC problems,” IEEE Transactions on Power Systems, vol. 25, no. 4, pp. 1946–1954, 2010.

[35] L. A. Roald and D. K. Molzahn, “Implied constraint satisfaction in power system optimization: The impacts of load variations,” in 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2019, pp. 308–315.

[36] S. Pineda, J. Morales, and A. Jimenez-Cordero, “Data-driven screening of network constraints,” arXiv preprint arXiv:1907.04694, 2019.

[37] Y. Ng, S. Misra, L. A. Roald, and S. Backhaus, “Statistical Learning For DC Optimal Power Flow,” arXiv preprint arXiv:1801.07809, 2018.

[38] D. Deka and S. Misra, “Learning for DC-OPF: classifying active sets using neural nets,” arXiv preprint arXiv:1902.05607, 2019.

[39] X. Pan, T. Zhao, and M. Chen, “DeepOPF: Deep Neural Network for DC Optimal Power Flow,” in IEEE SmartGridComm, Oct. 2019.

[40] K. Baker, “Learning Warm-Start Points for AC Optimal Power Flow,” arXiv preprint arXiv:1905.08860, 2019.

[41] X. Pan, T. Zhao, M. Chen, and S. Low, “DeepOPF: A feasibility-optimized deep neural network approach for AC optimal power flow problems,” in preparation, 2020.

[42] R. D. Christie, B. F. Wollenberg, and I. Wangensteen, “Transmission management in the deregulated environment,” Proceedings of the IEEE, vol. 88, no. 2, pp. 170–195, Feb 2000.

[43] X. Cheng and T. J. Overbye, “PTDF-based power system equivalents,” IEEE Transactions on Power Systems, vol. 20, no. 4, pp. 1868–1876, 2005.

[44] V. H. Hinojosa and F. Gonzalez-Longatt, “Preventive Security-Constrained DCOPF Formulation Using Power Transmission Distribution Factors and Line Outage Distribution Factors,” Energies, vol. 11, no. 6, 2018.

[45] J. H. Park, Y. S. Kim, I. K. Eom, and K. Y. Lee, “Economic load dispatch for piecewise quadratic cost function using hopfield neural network,” IEEE Transactions on Power Systems, vol. 8, no. 3, pp. 1030–1038, Aug 1993.

[46] L. Gurobi Optimization, “Gurobi optimizer reference manual,” 2019. [Online]. Available: http://www.gurobi.com

[47] P. J. Martínez-Lacanina, J. L. Martínez-Ramos, A. de la Villa-Jaen, and A. Marano-Marcolini, “DC corrective optimal power flow based on generator and branch outages modelled as fictitious nodal injections,” IET Generation, Transmission & Distribution, vol. 8, no. 3, pp. 401–409, 2013.

[48] S. Chatzivasileiadis, “Lecture Notes on Optimal Power Flow (OPF),” 2018. [Online]. Available: http://arxiv.org/abs/1811.00943

[49] N. Qian, “On the momentum term in gradient descent learning algorithms,” Neural Networks, vol. 12, no. 1, pp. 145–151, 1999.

[50] D. Yarotsky, “Error bounds for approximations with deep ReLU networks,” Neural Networks, vol. 94, pp. 103–114, 2017.

[51] I. Safran and O. Shamir, “Depth-width Tradeoffs in Approximating Natural Functions with Neural Networks,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML'17, 2017, pp. 2979–2987.

[52] S. Liang and R. Srikant, “Why deep neural networks for function approximation?” arXiv preprint arXiv:1610.04161, 2016.

[53] G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, "On the number of linear regions of deep neural networks," in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2924–2932. [Online]. Available: http://papers.nips.cc/paper/5422-on-the-number-of-linear-regions-of-deep-neural-networks.pdf

[54] Y. Ye and E. Tse, "An extension of Karmarkar's projective algorithm for convex quadratic programming," Mathematical Programming, vol. 44, no. 1, pp. 157–179, May 1989.

[55] P. M. Vaidya, "Speeding-up linear programming using fast matrix multiplication," in IEEE FOCS, 1989, pp. 332–337.

[56] S. Babaeinejadsarookolaee, A. Birchfield, R. D. Christie, C. Coffrin, C. DeMarco, R. Diao, M. Ferris, S. Fliscounakis, S. Greene, R. Huang et al., "The power grid library for benchmarking AC optimal power flow algorithms," arXiv preprint arXiv:1908.02788, 2019.

[57] C. H. Liang, C. Y. Chung, K. P. Wong, and X. Z. Duan, "Parallel optimal reactive power flow based on cooperative co-evolutionary differential evolution and power system decomposition," IEEE Transactions on Power Systems, vol. 22, no. 1, pp. 249–257, Feb 2007.

[58] "IEEE case300 topology," 2018, https://www.al-roomi.org/power-flow/300-bus-system.

[59] R. D. Zimmerman, C. E. Murillo-Sanchez, R. J. Thomas et al., "MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education," IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 12–19, 2011.

[60] M. Telgarsky, "Benefits of depth in neural networks," arXiv preprint arXiv:1602.04485, 2016.

[61] S.-G. Chen and P. Hsieh, "Fast computation of the nth root," Computers & Mathematics with Applications, vol. 17, no. 10, pp. 1423–1427, 1989.

[62] D. M. Gordon et al., "A survey of fast exponentiation methods," J. Algorithms, vol. 27, no. 1, pp. 129–146, 1998.

APPENDIX A
PROOF OF LEMMA 1

Proof. We now show that the considered piece-wise linear one-dimensional output function f∗(·) is Lipschitz-continuous on the input domain S, which can be partitioned into r different convex polyhedral regions Ri, i = 1, ..., r. The mapping f∗(·) is piece-wise linear and can be defined as follows:

f∗(x) = ai·x + bi, if x ∈ Ri, i = 1, ..., r,

where x ∈ R^{n×1}, ai ∈ R^{1×n}, and bi ∈ R, i = 1, ..., r. Then, for any x1, x2 lying in the same region Ri, we have

|f∗(x1) − f∗(x2)| = |ai·(x1 − x2)| ≤ ‖ai‖ · ‖x1 − x2‖.

For x1, x2 in different regions, the line segment connecting them crosses finitely many region boundaries; since f∗(·) is continuous, applying the above bound on each sub-segment together with the triangle inequality yields the same estimate with the largest slope norm. Thus, let Λ = max{‖a1‖, ..., ‖ar‖}. We have

|f∗(x1) − f∗(x2)| ≤ Λ · ‖x1 − x2‖, ∀x1, x2 ∈ S.

Therefore, f∗(·) is Lipschitz-continuous.
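As a quick numerical illustration (not part of the proof), the following Python sketch builds a hypothetical three-piece continuous function and checks that Λ = max_i ‖ai‖ empirically bounds the slope between random point pairs; all breakpoints and slopes below are made-up example values.

import numpy as np

# Hypothetical three-piece continuous f* on S = [-1, 1.5]; breakpoints
# and slopes are arbitrary example values.
breaks = np.array([-1.0, 0.2, 0.7, 1.5])       # region boundaries
slopes = np.array([2.0, -0.5, 1.2])            # a_1, a_2, a_3
intercepts = np.zeros(3)
for i in range(1, 3):                          # enforce continuity at breaks
    intercepts[i] = intercepts[i - 1] + (slopes[i - 1] - slopes[i]) * breaks[i]

def f_star(x):
    idx = np.clip(np.searchsorted(breaks, x, side="right") - 1, 0, 2)
    return slopes[idx] * x + intercepts[idx]

Lam = np.max(np.abs(slopes))                   # Lipschitz constant of Lemma 1
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1.0, 1.5, size=(2, 100000))
ratio = np.abs(f_star(x1) - f_star(x2)) / np.abs(x1 - x2)
print(ratio.max(), "<=", Lam)                  # empirical slope never exceeds Lam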


Supplementary Materials

APPENDIX B
PROOF OF LEMMA 4

Before we proceed, we present a result on the approximation error between two scalar function classes.

Lemma 4. Let H be the class of two-segment piece-wise linear functions with a Lipschitz constant Λ > 0 over an interval [−µ, µ] (µ > 0). Let K be the class of all linear scalar functions over [−µ, µ]. Then, the following holds:

max_{h∈H} min_{g∈K} max_{x∈[−µ,µ]} |h(x) − g(x)| ≥ Λ·µ/2. (17)

Essentially, the lemma gives a lower bound on the worst-case error of using a linear function to approximate a two-segment piece-wise linear function.

Proof. We can derive the lower bound on the worst-case L∞-based approximation error as follows. Suppose we want to find a function g(·) in the linear scalar function class K to approximate a function h(·) in the two-segment piece-wise linear function class H with a Lipschitz constant Λ > 0, over an interval [−µ, µ] (µ > 0). An illustration is shown in Fig. 6. Let g(x) = a·x + b for x ∈ [−µ, µ], and let h ∈ H be the following:

h(x) = Λ(x + µ), if x ∈ [−µ, 0];
h(x) = −Λ(x − µ), if x ∈ [0, µ]. (18)

[Fig. 6: Illustration of approximating a two-segment piece-wise Lipschitz-continuous function h(·) by a linear function g(·).]

Then, we can obtain the lower bound on the L∞-based approximation error between h(·) and g(·) by a case analysis on the intercept b.

• If b ≤ Λ·µ/2, then we can get:

max_{x∈[−µ,µ]} |h(x) − g(x)| ≥ |h(0) − g(0)| = |Λµ − b| ≥ Λ·µ/2.

• Otherwise b > Λ·µ/2. If a > 0, then we have:

max_{x∈[−µ,µ]} |h(x) − g(x)| ≥ |h(µ) − g(µ)| = aµ + b > (Λ + a)·µ/2 ≥ Λ·µ/2.

Otherwise, for a ≤ 0, we can consider the point x = −µ and obtain the same result.

Thus overall, for this choice of h we observe

min_{g∈K} max_{x∈[−µ,µ]} |h(x) − g(x)| ≥ Λ·µ/2.

For the worst-case L∞-based approximation error, we therefore have

max_{h∈H} min_{g∈K} max_{x∈[−µ,µ]} |h(x) − g(x)| ≥ min_{g∈K} max_{x∈[−µ,µ]} |h(x) − g(x)| ≥ Λ·µ/2.
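The bound in (17) can also be checked numerically. The following Python sketch computes, for the hat function in (18), the best worst-case linear fit via a grid over slopes; for each fixed slope, the Chebyshev-optimal intercept is the midrange of the residual. The values of Λ and µ below are arbitrary example values.

import numpy as np

# Numerical check (illustrative, not part of the proof): for the hat
# function h in (18), the best linear approximation over [-mu, mu]
# attains worst-case error Lam*mu/2, matching the bound in (17).
Lam, mu = 1.5, 2.0                      # example values, arbitrary
x = np.linspace(-mu, mu, 4001)
h = np.where(x <= 0, Lam * (x + mu), -Lam * (x - mu))

def best_error(a):
    r = h - a * x                       # residual for slope a
    return (r.max() - r.min()) / 2.0    # optimal b = (r.max()+r.min())/2

errs = [best_error(a) for a in np.linspace(-2 * Lam, 2 * Lam, 801)]
print(min(errs), Lam * mu / 2)          # both ~ 1.5: the bound is tight here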

APPENDIX C
PROOF OF THEOREM 2

Proof. Suppose K is the family of piece-wise linear functions generated by a neural network with depth Nhid and maximum number of neurons per layer M, on the load input domain S with diameter d. Let n denote the maximum number of segments any function in K can have. Let H be the class of all possible f∗(·) with a Lipschitz constant Λ > 0. Let [ai, ai+1], 0 ≤ i ≤ 2n − 1, be 2n intervals of equal length partitioning the diameter of the input domain. Define f̄ ∈ H as follows:

f̄(x) = Λ(x − ai), if x ∈ [ai, ai+1], i = 0, 2, ..., 2n − 2;
f̄(x) = −Λ(x − ai+2), if x ∈ [ai+1, ai+2], i = 0, 2, ..., 2n − 2.

Consider any f ∈ K. Since f is piece-wise linear with at most n segments over the input domain, it must be linear over at least one of the n segments [ai, ai+2], i = 0, 2, ..., 2n − 2. Each such segment has length d/n, so over that particular segment we apply Lemma 4 with µ = d/(2n) to bound the approximation error as in (17). Overall, we have

min_{f∈K} max_{x∈S} |f̄(x) − f(x)| ≥ Λ·d / (4n). (19)

Since the above inequality holds for a particular choice f̄ ∈ H, we must have

max_{f∗∈H} min_{f∈K} max_{x∈S} |f∗(x) − f(x)| ≥ Λ·d / (4n). (20)

Meanwhile, we use the result in [60], of which the following is an immediate corollary.

Corollary 5. The maximum number of linear segments generated from the family of ReLU neural networks with depth (the number of hidden layers) l and maximal width (neurons on the hidden layer) m is (2m)^l.

By the above corollary, we have n ≤ (2M)^Nhid. Plugging the relationship into (20), we have

max_{f∗∈H} min_{f∈K} max_{x∈S} |f∗(x) − f(x)| ≥ Λ·d / (4·(2M)^Nhid). (21)
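For intuition, the following Python sketch counts the linear segments realized by a small, randomly weighted 1-D ReLU network and compares the count against the (2m)^l bound of Corollary 5; the architecture, weights, grid, and detection threshold are all assumptions made for this example.

import numpy as np

# Illustrative sketch: segments of a random 1-D ReLU net vs. (2m)^l.
rng = np.random.default_rng(1)
l, m = 2, 3                                   # hidden layers, max width
dims = [1] + [m] * l
Ws = [rng.standard_normal((dims[i], dims[i + 1])) for i in range(l)]
bs = [rng.standard_normal(dims[i + 1]) for i in range(l)]
w_out = rng.standard_normal((m, 1))

def net(x):
    a = x.reshape(-1, 1)
    for W, b in zip(Ws, bs):
        a = np.maximum(a @ W + b, 0.0)        # ReLU layer
    return (a @ w_out).ravel()

x = np.linspace(-5.0, 5.0, 200001)
y = net(x)
flags = np.abs(np.diff(y, 2)) > 1e-8          # curvature marks a breakpoint
# adjacent flags usually belong to one breakpoint; count runs of flags
n_kinks = int(np.count_nonzero(flags[1:] & ~flags[:-1]) + flags[0])
print(n_kinks + 1, "segments <= bound", (2 * m) ** l)   # a handful vs. 36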


APPENDIX D
PROOF OF COROLLARY 3

Proof. We now show how to derive Corollary 3. Suppose ε is an upper bound on the worst-case approximation error, that is:

max_{f∗∈H} min_{f∈K} max_{x∈S} |f∗(x) − f(x)| ≤ ε. (22)

Then, we can derive the following inequality from the above definition and Theorem 2:

Λ·d / (4·(2M)^Nhid) ≤ ε. (23)

Rearranging this inequality yields the necessary condition on the DNN's scale stated in Corollary 3, which must hold for the designed DNN to be able to approximate the most difficult load-to-generation mapping with a Lipschitz constant Λ up to an error of ε > 0:

(2M)^Nhid ≥ Λ·d / (4ε). (24)
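As an illustration, the sizing condition (24) can be turned into a width rule for a fixed depth, since it implies M ≥ (1/2)·(Λd/(4ε))^{1/Nhid}. A minimal Python sketch follows; the numerical values of Λ, d, ε, and Nhid are hypothetical.

import math

# Smallest per-layer width M satisfying (2M)^Nhid >= Lam*d/(4*eps).
def min_width(Lam, d, eps, Nhid):
    return math.ceil(0.5 * (Lam * d / (4.0 * eps)) ** (1.0 / Nhid))

print(min_width(Lam=10.0, d=100.0, eps=0.01, Nhid=3))   # -> 15, as (2*15)^3 >= 25000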

APPENDIX E
COMPUTATIONAL COMPLEXITY OF DEEPOPF FOR PREDICTING THE GENERATIONS

Recall that the numbers of buses and contingencies are N and C, respectively. The input and the output of the DNN model have Kin and Kout dimensions, and the DNN model has Nhid hidden layers, each with at most M neurons. Specifically, in our setting, Kin equals the number of buses with load and Kout equals the number of generators, so the input and output dimensions are both of order N. From empirical experience, we set M to be of the same order as N and set Nhid to be a constant. Once we finish training the DNN model, the complexity of generating solutions by using DeepOPF is characterized in the following proposition.

Proposition 6. The computational complexity (measured as the number of arithmetic operations) to generate the generations for the SC-DCOPF problem by using DeepOPF is

T = Kin·K1 + Σ_{i=1}^{Nhid−1} Ki·Ki+1 + K_{Nhid}·Kout, (25)

which is O(Nhid·M²).

Note that Nhid is set to 3 and M is set to be O(N). The complexity of DeepOPF for predicting the generations is thus O(N²), significantly smaller than that of the interior point method. Our simulation results in the next section corroborate this observation.
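For concreteness, the operation count in (25) can be evaluated directly. The following Python sketch does so for a hypothetical instance with a 300-bus-scale input, three hidden layers of width 300, and 69 generators; the layer sizes are assumptions made for illustration.

# Concrete evaluation of the operation count (25).
def dnn_ops(K_in, hidden, K_out):
    """Multiply-accumulate count of one fully-connected forward pass."""
    sizes = [K_in] + hidden + [K_out]
    return sum(sizes[i] * sizes[i + 1] for i in range(len(sizes) - 1))

print(dnn_ops(300, [300, 300, 300], 69))   # 290700, dominated by Nhid*M^2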

Proof. We next show how to derive the computational complexity of using the DNN model to obtain the generation output from a given input. Recall that the input and the output of the DNN model in DeepOPF have Kin and Kout dimensions, respectively, and that the DNN model has Nhid hidden layers, where the i-th hidden layer has Ki neurons, i = 1, ..., Nhid, with at most M neurons on any hidden layer. We can regard each basic arithmetic operation carried out at a neuron as O(1). As we apply the fully-connected architecture, the output of each neuron is calculated by taking a weighted sum of the outputs of the neurons on the previous layer and passing it through an activation function.

Thus, the computational complexity (measured as the number of arithmetic operations) to generate the output from the input by a DNN model consists of the following three parts:
• Complexity of computation from the input to the first hidden layer. As each neuron on the first hidden layer takes the entire input, the corresponding complexity is O(Kin·K1).
• Complexity of computation between consecutive hidden layers. Since each neuron on the current hidden layer takes the output of each neuron on the previous hidden layer as input, the corresponding complexity is O(Σ_{i=1}^{Nhid−1} Ki·Ki+1).
• Complexity of computation from the last hidden layer to the output. As the output of each neuron on the last hidden layer is used to calculate each output element, the corresponding complexity is O(K_{Nhid}·Kout).

The Sigmoid function is applied element-wise to the output layer in order to guarantee that each element of the final output lies within (0, 1). The Sigmoid function takes the form

S(x) = 1 / (1 + e^{−x}) = e^x / (e^x + 1),

and computing it involves one addition operation, one division operation, and one exponentiation operation. The exponentiation is essentially a combination of an n-th power operation and an m-th root operation, where n and m are integers depending on the output element x, i.e., x = n/m. Previous works show that the computational complexity of n-th power and m-th root operations is O(log n · log m) [61], [62]. Therefore, a Sigmoid evaluation requires O(log n · log m) operations. In practice, both n and m are bounded by some constant integer M0 in the actual computation process, and therefore the computational complexity of each Sigmoid evaluation is O((log M0)²), which is a constant too. Since the output layer of the DNN has Kout neurons with Sigmoid activation, the corresponding computational complexity (the number of arithmetic operations) for the output layer is O(Kout).
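Before assembling the total count, here is a minimal Python sketch of this element-wise output step; the pre-activation values below are hypothetical.

import numpy as np

# Element-wise Sigmoid on the Kout-dimensional output layer: O(Kout).
z = np.array([-2.0, 0.0, 3.5])        # Kout = 3 hypothetical pre-activations
p = 1.0 / (1.0 + np.exp(-z))          # each element lies in (0, 1)
print(p)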

Hence, the overall complexity of the calculation by a DNN model is:

T = O(Kin·K1 + Nhid·M² + K_{Nhid}·Kout) = O(Nhid·M²).