
Deep Reinforcement Learning based Smart Building Energy Management Minh N. H. Nguyen, Tuyen Le Pham, Nguyen H. Tran and Choong Seon Hong

Department of Computer Science and Engineering, Kyung Hee University Email: {minhnhn, tuyenple, nguyenth, cshong}@khu.ac.kr

Abstract: The increasing number of electronic devices in modern smart buildings leads to the need for efficient energy management based on smart meters. Furthermore, utilities in the U.S. and several European countries have launched dynamic energy pricing (Time-of-Use) tariff programs to incentivize consumers to regulate their energy loads according to time-varying electricity prices. Toward this end, electronic device scheduling becomes an essential feature of future smart building controllers, which can recommend economical plans to customers. In this paper, we apply a deep reinforcement learning technique to control electric devices more efficiently and reduce the energy cost while accounting for the user dissatisfaction cost.

1. Introduction

Nowadays, the proliferation of smart devices, household renewable energy systems, and smart grids has opened a new research area, i.e., energy management for smart buildings. Different from conventional research, the customer's role has shifted from passive to active in the energy management problem [1], [2]. Given foreknowledge of electricity prices from the public utility, customers can schedule their own electrical device loads with respect to their daily energy profiles.

In this paper, we apply a state-of-the-art actor-critic approach, deep deterministic policy gradient (DDPG) [3], to energy usage scheduling, which can be considered an efficient approximate solution to the cost minimization problem for nanogrid controllers [2].

2. Smart Building Energy Management

A. System Model

In this section, we propose our nanogrid energy management controller model for smart buildings, as shown in Fig. 1. Our model considers the presence of a renewable generation system and battery storage. The nanogrid controller makes decisions on device operation to minimize the energy cost and the user dissatisfaction cost.

Fig. 1: Nanogrid controller model.

The energy cost of a smart building can be defined as the cost of buying/selling an amount of energy $E_t^{\mathrm{grid}}$ from/to the grid as follows:

$$\mathcal{E}(x^d, x^v, x^b) = \sum_{t=1}^{T} r_t\big(E_t^{\mathrm{grid}}\big), \qquad (1)$$

where

$$r_t(E) = \begin{cases} r_t^{\mathrm{buy}}(E), & \text{if } E \ge 0 \ \text{(buying)}, \\ r_t^{\mathrm{sell}}(E), & \text{otherwise (selling)}. \end{cases}$$

The electric price function is also a piecewise linear function of energy usage at a certain time. In addition to the energy cost, we include the user dissatisfaction cost based on two assumptions for the controlled electric devices:

A1. The nanogrid controller expects to provide deferrable loads as soon as it can (e.g., electric vehicle charging.)

A2. The higher the variable loads, the better the user comfort when using electronic devices (e.g., air conditioner, lighting.)

Accordingly, we define the following user discomfort cost as

$$\mathcal{D}(x^d, x^v) = \sum_{t=t_s}^{t_e} (t - t_s)\, x_t^d e^d + \sum_{t=1}^{T} \frac{Q_t^{\max} - x_t^v Q_t^{\max}}{Q_t^{\max} - Q_t^{\min}}. \qquad (2)$$

B. Problem Formulation

$$\begin{aligned}
\min_{x^d,\, x^v,\, x^b} \quad & \mathcal{E}(x^d, x^v, x^b) + \mathcal{D}(x^d, x^v) \\
\text{subject to} \quad & E_t^{\mathrm{grid}} = x_t^d e^d + e_t^u + x_t^v Q_t^{\max} + x_t^b e^b, \quad \forall t, && (3) \\
& x_t^d \in [0,1], \; x_t^v \in [0,1], \; x_t^b \in [-1,1], \quad \forall t, && (4) \\
& \sum_{t=1}^{T} x_t^d e^d = E^d \quad \text{(Deferrable load)}, && (5) \\
& x_t^d = 0, \quad \forall t \notin [t_s, t_e], && (6) \\
& Q_t^{\min} \le x_t^v Q_t^{\max} \le Q_t^{\max}, \quad \forall t \quad \text{(Variable load)}, && (7) \\
& E_{t+1}^{b} = E_t^{b} + x_t^b e^b, \quad t = 0, \ldots, T-1 \quad \text{(Battery)}, && (8) \\
& E^{b}_{\min} \le E_t^{b} \le E^{b}_{\max}, \quad \forall t. && (9)
\end{aligned}$$

The variables of this problem are the decisions on the deferrable load $x^d$, the variable load $x^v$, and the battery $x^b$. This optimization problem minimizes the energy and user dissatisfaction costs over the control horizon $T$ (e.g., one day with 48 half-hour time slots). Since we assume that the full state information is only available at the beginning of each time slot, a closed-form solution of this problem is unattainable.
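For reference, the per-slot grid exchange in constraint (3) can be computed directly from the three decisions. The following is a minimal Python sketch under the reconstructed notation above, with illustrative argument names (e_u denotes the uncontrollable load of the slot).

```python
def grid_energy(x_d, x_v, x_b, e_d, e_u, q_max, e_b):
    """Per-slot grid energy E_t^grid as in constraint (3).

    x_d, x_v in [0, 1] and x_b in [-1, 1] are the controller decisions;
    e_d and e_b are the deferrable-load and battery control units (Wh),
    q_max is the maximum variable load of the slot, e_u the uncontrollable load.
    """
    return x_d * e_d + e_u + x_v * q_max + x_b * e_b
```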



3. Deep Deterministic Policy Gradients

In this section, we investigate the DDPG technique [3] as a solution approach for the cost minimization problem. Using this state-of-the-art deep reinforcement learning approach, we iteratively update the actor network and the critic network as in Fig. 2. The three actions of the controller are the output of the actor network, while the critic network approximates the action-value function Q(s, x).

Fig. 2: DDPG actor, critic network for nanogrid controllers.

Both networks have two hidden layers: 400 nodes in the first layer and 300 nodes in the second layer.
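As a concrete illustration, a minimal sketch of such an actor-critic pair is given below, assuming PyTorch; the layer sizes (400 and 300) and the three-dimensional action follow the description above, while the class names, the state dimension, and the output activation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps the controller state to the three actions (x^d, x^v, x^b)."""
    def __init__(self, state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, 3), nn.Tanh(),   # bounded outputs; x^d, x^v can be rescaled to [0, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Approximates the action-value function Q(s, x)."""
    def __init__(self, state_dim, action_dim=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 400), nn.ReLU(),
            nn.Linear(400, 300), nn.ReLU(),
            nn.Linear(300, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```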

In addition, we define the infeasible cost to relax the constraints of the optimization control problem as

$$\phi(x^d, x^v, x^b) = \phi_1(x^d) + \phi_2(x^v) + \phi_3(x^b). \qquad (10)$$

where the deferrable-load constraint (5) becomes a penalty when the allocated amount of energy is less than the required deferrable load, as follows:

$$\phi_1(x^d) = \Big[\, E^d - \sum_{t=1}^{T} x_t^d e^d \Big]^{+} \Big/\, E^d. \qquad (11)$$

The variable-load constraint (7) becomes a penalty when the controller decides to use less than the minimum or more than the required variable energy:

$$\phi_2(x^v) = \sum_{t=1}^{T} \frac{\big[\, x_t^v Q_t^{\max} - Q_t^{\max} \big]^{+} + \big[\, Q_t^{\min} - x_t^v Q_t^{\max} \big]^{+}}{Q_t^{\max} - Q_t^{\min}}. \qquad (12)$$

Similarly, the penalty of the battery constraint (9) becomes

$$\phi_3(x^b) = \sum_{t=1}^{T} \frac{\big[\, E_t^{b} + x_t^b e^b - E^{b}_{\max} \big]^{+} + \big[\, E^{b}_{\min} - E_t^{b} - x_t^b e^b \big]^{+}}{E^{b}_{\max} - E^{b}_{\min}}. \qquad (13)$$
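The three penalties can be evaluated slot-wise; the following numpy sketch mirrors the reconstructed forms of (11)-(13), with illustrative argument names and array shapes (one entry per time slot).

```python
import numpy as np

def infeasible_cost(x_d, x_v, x_b, e_d, e_b, E_d, q_min, q_max, soc, E_b_min, E_b_max):
    """Infeasible cost phi_1 + phi_2 + phi_3 from (10)-(13).

    x_d, x_v, x_b: decision vectors over the T slots of one day;
    soc: battery state of charge per slot (Wh); the remaining arguments are
    the load and battery parameters introduced in Section 2.
    """
    relu = lambda z: np.maximum(z, 0.0)

    # (11): unmet fraction of the required deferrable energy E_d
    phi1 = relu(E_d - np.sum(x_d * e_d)) / E_d

    # (12): variable load outside [q_min, q_max], normalized per slot
    v_load = x_v * q_max
    phi2 = np.sum((relu(v_load - q_max) + relu(q_min - v_load)) / (q_max - q_min))

    # (13): battery state of charge pushed outside [E_b_min, E_b_max]
    next_soc = soc + x_b * e_b
    phi3 = np.sum((relu(next_soc - E_b_max) + relu(E_b_min - next_soc)) / (E_b_max - E_b_min))

    return phi1 + phi2 + phi3
```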

Moreover, constraint (6) is integrated into the user dissatisfaction cost (2) when deferrable loads are scheduled outside the given range $[t_s, t_e]$, as follows:

$$\mathcal{D}(x^d, x^v) = \sum_{t=1}^{T} \big(t + t_t^{\mathrm{pen}} - t_s\big)\, x_t^d e^d + \sum_{t=1}^{T} \frac{Q_t^{\max} - x_t^v Q_t^{\max}}{Q_t^{\max} - Q_t^{\min}}, \qquad (14)$$

where the penalty time $t_t^{\mathrm{pen}}$ is nonzero only for $t \notin [t_s, t_e]$.

The negative total cost, which consists of the energy cost, the user dissatisfaction cost, and the infeasible cost, is used as the reward function with respect to the current state and decision of the nanogrid controller.
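A minimal sketch of this reward, assuming the three cost terms are available as Python callables with the (illustrative) signatures shown:

```python
def reward(state, action, energy_cost_fn, discomfort_cost_fn, infeasible_cost_fn):
    """Negative total cost used as the DDPG reward for one control decision.

    energy_cost_fn, discomfort_cost_fn and infeasible_cost_fn correspond to
    (1), (14) and (10), respectively; their exact arguments depend on how the
    environment exposes the state, so they are kept abstract here.
    """
    total_cost = (energy_cost_fn(state, action)
                  + discomfort_cost_fn(state, action)
                  + infeasible_cost_fn(state, action))
    return -total_cost
```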

For the energy price, we apply a dynamic electricity price, namely the Time-of-Use (TOU) tariff, which includes on-peak, mid-peak, and off-peak rates according to the time period of the day in the city of Austin [4]. The selling rate follows the Value-of-Solar rate in Austin [4].

Type        Weekdays (hours)
On-Peak     [14:00, 20:00]
Mid-Peak    [6:00, 14:00], [20:00, 22:00]
Off-Peak    [22:00, 6:00]

Table 1: TOU Tariff 2016 in Austin, U.S. [4]
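The weekday schedule in Table 1 can be encoded as a simple hour-of-day lookup; the sketch below only classifies the period, since the actual rate values come from the tariff document [4].

```python
def tou_period(hour):
    """Return the TOU period for a weekday hour (0-23), following Table 1."""
    if 14 <= hour < 20:
        return "on-peak"
    if 6 <= hour < 14 or 20 <= hour < 22:
        return "mid-peak"
    return "off-peak"   # 22:00-6:00
```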

4. Simulation Results

The energy loads of the devices are extracted and generated from a UK household dataset [5]. We train on the first 430 days and test on the last 31 days. Each day is divided into 48 time slots, and each time slot lasts 30 minutes. At the beginning of each time slot, the controller collects the state information (i.e., the electric price function, the required variable load, the battery status, and the renewable generation) and produces the actions $(x_t^d, x_t^v, x_t^b)$. The required period of the deferrable load is set to 17:00 ~ 6:00. The renewable energy generation follows a normal distribution with a mean of 400 Wh in the daytime and is zero in the nighttime, since we consider a solar energy source only. The battery is limited to 100 Wh ~ 500 Wh. The control units of the deferrable load and the battery are 200 Wh and 100 Wh, respectively.
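The setup can be summarized as a small configuration sketch; the constants below are taken from the description above, while the daytime window and the standard deviation of the solar generation are assumptions added only for illustration.

```python
import numpy as np

# Simulation parameters from Section 4 (one day = 48 half-hour time slots).
T_SLOTS = 48                        # time slots per day
SLOT_MINUTES = 30
E_D_UNIT = 200.0                    # deferrable-load control unit (Wh)
E_B_UNIT = 100.0                    # battery control unit (Wh)
E_B_MIN, E_B_MAX = 100.0, 500.0     # battery limits (Wh)
DEFERRABLE_WINDOW = (17, 6)         # required period: 17:00 to 6:00 the next day

def renewable_generation(hour, rng=np.random.default_rng()):
    """Solar generation per slot: ~400 Wh mean in the daytime, zero at night.

    The 6:00-18:00 daytime window and the 50 Wh standard deviation are
    illustrative assumptions; the paper states only the 400 Wh daytime mean.
    """
    if 6 <= hour < 18:
        return max(0.0, rng.normal(loc=400.0, scale=50.0))
    return 0.0
```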

Fig. 3 shows the cost reduction in the training stage. After high variance in the first 220 days, the costs tend to become more stable and decrease.

Fig. 3: Cost reduction in the training stage.

We next evaluate the learning performance of DDPG against a heuristic strategy over the last 31 days. The heuristic strategy, illustrated in Fig. 5a, tries to provide the deferrable loads as soon as possible to reduce the user dissatisfaction cost and does not use the battery. The average cost improvement from using DDPG is 16% of the total costs. Specifically, in the first 4 days of the testing data, as shown in Fig. 4b, the energy cost is reduced by 28% compared to the heuristic strategy.



Fig. 4: Test Performance in 31 days.

Finally, Fig. 5 illustrates the first 4 days of the testing data. With DDPG, deferrable loads are postponed to reduce the energy usage during the on-peak tariff period.


Fig. 5: Actions and costs in the first 4 days of testing.

5. Conclusion & Future Work

In this paper, we investigate an efficient energy management approach in a nanogrid controller, which helps reduce the energy cost while sacrificing only a small degree of user satisfaction compared to a simple heuristic strategy. In future work, we plan to extend this work to the cooperation of nanogrid controllers accessing a common renewable energy source.

Acknowledgment

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A1B01015320).

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-2013-0-00717) supervised by the IITP (Institute for Information & communications Technology Promotion). *Dr. CS Hong is the corresponding author.

References:

[1] E. Mocanu, D. C. Mocanu, P. H. Nguyen, A. Liotta, M. E. Webber, M. Gibescu, and J. G. Slootweg, "On-line Building Energy Optimization using Deep Reinforcement Learning," arXiv:1707.05878, Jul. 2017.
[2] D. Burmester, R. Rayudu, W. Seah, and D. Akinyele, "A review of nanogrid topologies and technologies," Renewable and Sustainable Energy Reviews, vol. 67, pp. 760-775, Jan. 2017.
[3] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, "Continuous control with deep reinforcement learning," arXiv:1509.02971, Sep. 2015.
[4] "City of Austin FY 2017 Electric Tariff," Austin City Council, Tech. Rep., 2016. [Online]. Available: https://austinenergy.com/wps/wcm/connect/c4f3dd41-c714-43bc-9279-e92940094731/FY2017aeElectricRateSchedule.pdf?MOD=AJPERES
[5] D. Murray, L. Stankovic, and V. Stankovic, "An electrical load measurements dataset of United Kingdom households from a two-year longitudinal study," Scientific Data, vol. 4, p. 160122, Jan. 2017.
