Article

A DRL-Based Load Shedding Strategy Considering Communication Delay for Mitigating Power Grid Cascading Failure

1
Information & Telecommunications Company, State Grid Shandong Electric Power Company, Jinan 250013, China
2
The School of Information Science and Engineering, Shandong Provincial Key Laboratory of Wireless Communication Technologies, Shandong University, Qingdao 266237, China
*
Authors to whom correspondence should be addressed.
Electronics 2023, 12(14), 3024; https://doi.org/10.3390/electronics12143024
Submission received: 16 June 2023 / Revised: 5 July 2023 / Accepted: 7 July 2023 / Published: 10 July 2023
(This article belongs to the Special Issue Power Systems Stability in Smart Grid Era)

Abstract

Successive failures in power transmission lines can cause cascading failures in the power grid, which may eventually affect large parts of the power grid and even cause the power grid system to go down. Collecting and transmitting primary equipment information and issuing load-shedding action commands in the power grid depend on the power communication network. With the help of the power communication network, we can better observe the situation of the power grid in real time and provide a guarantee for the regular working of the power grid. However, the communication network also has the problem of communication delay causing latency in load-shedding action. On the premise of preserving the key physical properties and operational characteristics of the power grid, this paper uses the IEEE 14 and 30 bus systems as examples to establish a direct current (DC) power flow simulation environment. We establish a communication network model based on the power grid topology and the corresponding communication channels. For the problem of cascading failures occurring in the power grid after transmission line failures, a load-shedding strategy using soft actor-critic (SAC) based on deep reinforcement learning (DRL) was developed to effectively mitigate cascading failures in the power grid while considering the impact of communication delay. The corresponding communication delay is obtained by calculating the shortest communication path using the Dijkstra algorithm. The simulation verifies the feasibility and effectiveness of the SAC algorithm to mitigate cascading failures. The trained network can decide on actions and give commands quickly when a specific initial failure is encountered, reducing the scale of cascading failures.

1. Introduction

The power grid is a vital infrastructure for modern society, as almost all activities depend on electricity. Power grid systems must be robust enough to cope with failures that occur internally and disruptive events that arise externally. In addition, power communication network failures and human misoperation may also lead to the propagation of initial failures [1,2]. This phenomenon of continuous and uncontrolled successive failures of power grid components is known as cascading failures [3].
An important cause of major blackout accidents in smart grids is cascading failure. Severe power system cascading failures can lead to the collapse of the entire power network, triggering large-scale power outages and bringing great economic losses to society. In addition to economic losses, grid disintegration and large blackout accidents will also affect people’s normal lives and endanger public safety, causing a serious social impact. With the continuous development of the power system and the formation of the power grid, operation of the power grid is getting closer to its limit level. Thus, it is very important to study and analyze power system cascading failures, and it is also necessary to study how to mitigate a power grid’s cascading failures when they occur. Mitigating cascading failures can not only effectively reduce economic losses but also avoid the inconvenience of large-scale power outages for the people.
The operation of power grid systems faces various security threats, and building a green and resilient power grid system is urgent. In recent years, research on new energy sources in the power grid has deepened, and improving the power generation efficiency of sources such as wind and solar energy has become a research hotspot [4,5].
In 2003, initial failures of transmission lines and generator sets led to massive power outages in the United States and Canada [6], costing up to USD 30 billion in economic losses. In 2005, a widespread blackout in the Moscow metropolitan area shut down all 321 substations in the Moscow power grid, paralyzing nearly half of the city’s industrial production and transportation and causing economic losses of up to USD 1 billion [7]. In 2012, India experienced a massive power outage that disrupted the power supply to nearly half of the country, affecting a record 600 million people [8].
The above-mentioned large-scale outages were closely related to cascading failures in the power grid from occurrence to a large-scale spread, which shows that cascading failures are highly destructive to the power grid. Therefore, it is essential to conduct in-depth research on cascading failures and how to reduce their scale.
However, it is difficult to model the evolutionary process when cascading failures occur in a power grid system, because a grid contains many components and has a complex topology. The factors that cause the state of a grid to change are numerous and uncertain. In addition, operational characteristics such as the transmission line load levels in a grid can also affect the cascading state of the grid. With the advent of the Information Age, the transmission of data and operating instructions in the grid system has become inextricably linked to the power communication network. When the power communication network fails to transmit information correctly or on time, cascading failures in the power grid system are also affected. The photovoltaic power generation system constructed in [5] could reduce sudden load connections or cut-offs, improve system stability, and avoid cascading failures.
Recently, researchers have put great effort into modeling and understanding cascading failures in power systems. In [9], the authors showed that models based on graph theory and percolation theory do not accurately capture cascading failures. The reason for this is that when a transmission line fails, the following failed lines could be anywhere on the grid and not necessarily on an adjacent line, which is usually a typical assumption in epidemic models. There is also a discontinuous line failure propagation phenomenon in real-world power outages.
In 2015, Typhoon Rainbow hit Guangdong’s power grid, and many transmission lines tripped, preventing electricity transmission from the supply side to the load side [10]. In this case, the grid operation mode calculated beforehand was no longer feasible, and the grid dispatcher needed to weigh multiple objectives (guaranteeing the power supply to important users, ensuring that the power grid does not split, minimizing load shedding, and returning to the regular grid structure as soon as possible) to make the optimal judgment. Veteran dispatchers also tended to rely on their intuition and experience when conducting operations.
In theory, the grid dispatching problem can be solved by exhaustive enumeration to obtain the best results, but the vast action space, extremely long decision sequences, highly complex topology, and randomly occurring contingencies make the problem too large to solve this way. DRL has significant advantages in dealing with nonlinear problems with large state and action spaces, and a well-trained network can significantly reduce the decision time.
DRL combines the feature extraction capability of deep learning (DL) and the decision making capability of reinforcement learning (RL), and it is an end-to-end decision control algorithm that is widely used in dynamic decision making and real-time prediction among other uses [11].
The power communication network allows the collection of real-time information about the operation and equipment status in the power grid extremely quickly. It can automatically classify the power data to better transfer the information to the control center (CC) for further processing. The power communication network can optimize the power distribution network, make the network architecture more efficient and practical, and improve the quality of the power supply.
The proper operation of the power grid system depends on information transported via the power communication network. Interruptions or delays in the power communication network can impact the grid system’s stable operation. The larger the scale of the power grid, the more information is required to be transmitted. When the transmission distance becomes more extensive, the impact of communication delays on the power grid’s operation and control will become more and more significant.
In this paper, we use data from DC optimal power flow simulations to study how to prevent the evolution of cascading failures, considering transmission line capacity constraints and communication delays; the failures studied are mainly transmission line failures. We use PYPOWER to numerically simulate the IEEE 14 and 30 bus systems. We propose a DRL-based SAC algorithm for mitigating cascading failures by load shedding. Load shedding reduces the burden on the network and the power flow in the grid, which helps avoid transmission line overloads. Then, we use the Dijkstra algorithm for route planning to find the shortest communication path for delivering the load-shedding command and analyze the effect of communication delay on cascading failures. The rest of the paper is structured as follows. Section 2 details some research methods for mitigating power grid cascading failures. Section 3 describes the construction of the grid and communication network models and the simulation conditions. Section 4 presents the application of RL in power grid cascading failure mitigation. Section 5 gives simulation examples which show that the proposed algorithm can effectively reduce the cascading failure scale.

2. Related Works

To clarify the occurrence and propagation mechanism of cascading failures in a power grid, researchers have analyzed the problem from different perspectives, such as complex network topology and actual network characteristics, to find ways to block and reduce cascading failures. Depending on the purpose of the modeling, research on power system cascading failures is mainly divided into the following two approaches: (1) models with complex system theories such as power flow calculation and stability analysis as the core, including the CASCADE model [12], branching process model [13], and optimal power flow (OPF) model [14], and (2) abstract modeling of power systems and analysis of their characteristics, using complex network theories to reveal the relationship between the topology characteristics and cascading failure evolution, such as the small-world network model. This paper mainly uses the DC power flow calculation model.
The cascading failure mitigation problem can be viewed as a stochastic dynamic programming problem with unknown damage risk information. Previous studies have mainly used mathematical programming or heuristic methods to solve this problem. There are three main directions of research on cascading failure contingency control problems: (1) security-constrained alternating current (AC) optimal power flow (SC-ACOPF) [15]; (2) optimal control [16]; and (3) traditional machine learning, such as decision trees [17] and classic RL [18].
In the early phase of power grid cascading failure analysis, most research concentrated on modeling power grid failure in a single non-interactive environment. In recent years, cascading failure studies of power grids considering the effect of power communication networks have increased. In [19], the authors proposed a two-phase control strategy to mitigate cascading failures in a power grid by exploiting the interdependence of the power grid and the communication network.
In [20], Motter and Lai proposed the maximum load (ML) model, which has since become one of the most widely used models for studying cascading failures in complex networks. A node in a network fails and is removed from the grid when its load exceeds its maximum capacity in the ML model. At the same time, the load connected to that failed node is redistributed by some distribution method.
In [21], Cordova-Garcia proposed a load-shedding cascading control algorithm that considered the time delay of the power communication network. The authors of [22] proposed a node state influence matrix to analyze the cascading failures at the beginning of the development process. In addition, a matrix solution method based on the quadratic programming optimization model was given. Data-driven RL [23] algorithms also gained significant interest in the power systems community. A systematic RL framework was proposed to solve multilevel cascading failure problems, using the concept of “two-player games” and implementing a co-simulation platform based on DIgSILENT and MATLAB for RL [24]. The RL method based on DC-OPF was proposed to solve the cascading failure problem to some extent [25]. In [26], a new method was proposed for power system vulnerability analysis based on a double Q-learning algorithm to obtain the optimal attack results under sequential attack conditions, which provided new ideas for future research about sequential attacks on power systems.
The DRL algorithm was used for short-term voltage control of the system [27] and the determination of generator set tripping in emergency situations [28]. A deep Q-network (DQN) optimization framework which combined deep neural networks (DNNs) and RL was proposed in the single-core networks which considered mixed failure modes [29]. It provided an effective solution to reduce failure propagation and improve the robustness of SCNs.
DQN uses a greedy strategy for exploration which tends to overestimate the Q value and is not conducive to algorithm convergence. The SAC algorithm is a maximum entropy-based DRL algorithm with a stochastic strategy. Compared with the deep deterministic policy gradient (DDPG) algorithm, it can increase the randomness of exploration and the training speed to avoid obtaining local optimal solutions. Moreover, DDPG can only be applied to a continuous space. Therefore, we adopted the SAC algorithm to mitigate cascading failures.
In this paper, the SAC algorithm based on the DRL technique is used for emergency control to mitigate cascading failures when power grid failures occur. Since buses in the grid are physical entities that cannot be added or removed at will, and most transmission lines are equipped with automatic protection relay devices, the devices can trip when the power or temperature exceeds a certain threshold. Therefore, this paper mainly focuses on line failures rather than bus failures.

3. System Model Construction

3.1. Power Grid Model Construction

We used the IEEE 14 bus system in Figure 1 as an example to introduce the power grid model construction method. The system consists of $N$ bus nodes, $M$ transmission lines, $K$ generators, and $R$ loads, which can be represented by an undirected graph. The bus nodes are represented by a node set $V = \{v_i \mid i \in \Lambda\}$, where $\Lambda = \{1, 2, 3, \ldots, N\}$. In Figure 1, $N = 14$ and $V = \{v_1, v_2, \ldots, v_{14}\}$.
The connectivity of the bus nodes is represented by the adjacency matrix $\mathbf{H}$. If there is a transmission line between bus $v_i$ and bus $v_j$, then $H_{i,j} = 1$; otherwise, $H_{i,j} = 0$. In the constructed undirected graph, the sign of the power flow in a transmission line represents its direction, and power on a single transmission line can flow in either the forward or reverse direction. An undirected transmission line can be considered as two directed transmission lines, and the set of directed transmission lines can be expressed as
$$E_d = \{ e_{i,j} \mid H(i,j) = 1,\ i \in \Lambda,\ j \in \Lambda \}. \tag{1}$$
In fact, $e_{i,j}$ and $e_{j,i}$ represent the same transmission line. To represent the actual grid structure, only one directed transmission line is used for each physical transmission line, which defines the set of physical transmission lines as follows:
$$E_u = \{ e_{i,j} \mid H(i,j) = 1,\ i < j,\ i \in \Lambda,\ j \in \Lambda \}. \tag{2}$$
The number of elements in the set is $M$. In Figure 1, we have
$$E_u = \{ e_{1,2}, e_{1,5}, e_{2,3}, e_{2,4}, e_{2,5}, e_{3,4}, e_{4,5}, e_{4,7}, e_{4,9}, e_{5,6}, e_{6,11}, e_{6,12}, e_{6,13}, e_{7,8}, e_{7,9}, e_{9,10}, e_{9,14}, e_{10,11}, e_{12,13}, e_{13,14} \}. \tag{3}$$
Thus, the grid can be modeled as $G = (V, E_u)$.
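The construction above can be sketched in a few lines of Python. This is a minimal illustration, assuming 1-based bus numbers and the line list given for Figure 1; the function names are hypothetical and not part of the paper.

```python
# Physical lines of the IEEE 14 bus system as listed in E_u (1-based bus numbers).
E_U_LIST = [(1, 2), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5), (4, 7), (4, 9), (5, 6),
            (6, 11), (6, 12), (6, 13), (7, 8), (7, 9), (9, 10), (9, 14), (10, 11), (12, 13), (13, 14)]
N = 14


def adjacency_matrix(n, lines):
    """Symmetric 0/1 adjacency matrix H of the undirected grid graph."""
    H = [[0] * n for _ in range(n)]
    for i, j in lines:
        H[i - 1][j - 1] = 1
        H[j - 1][i - 1] = 1
    return H


def directed_lines(H):
    """E_d: every ordered pair (i, j) with H(i, j) = 1."""
    n = len(H)
    return [(i + 1, j + 1) for i in range(n) for j in range(n) if H[i][j] == 1]


def physical_lines(H):
    """E_u: keep only the pairs with i < j, one per physical line."""
    return [(i, j) for i, j in directed_lines(H) if i < j]


H = adjacency_matrix(N, E_U_LIST)
E_d = directed_lines(H)
E_u = physical_lines(H)
```

Each physical line appears twice in `E_d` and once in `E_u`, so `len(E_u)` equals $M$.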

3.2. Communication Network Model Construction

We considered the power communication network from the information transmission layer, setting it to have the same topology as the power grid and to host the control and measurement agents, with the control agents having actuators that enabled remote control of power grid actions such as load shedding, circuit breaker tripping, and line disconnection. Here, we assumed that the power communication network sent appropriate signals to the grid to remotely control the grid’s load-shedding action based on the DRL load shedding strategy results.
The power communication network was established with the same topology as the power grid, and the switch nodes in the power communication network had a one-to-one correspondence with the bus nodes in the power grid. We assumed that the forwarding processing delay of every switch node in the power communication network was the same, fixed at $\tau_v$. In the link length matrix $\mathbf{L}$, if the transmission line and its fiber optic cable link $e_{i,j} \in E_d$, then the element in row $i$ and column $j$ is the length $d_{e_{i,j}}$ of the line, and if $e_{i,j} \notin E_d$, then the element is zero:
$$L(i,j) = \begin{cases} d_{e_{i,j}}, & e_{i,j} \in E_d \\ 0, & e_{i,j} \notin E_d \end{cases}, \quad i, j \in \Lambda. \tag{4}$$
The routing and policy of load-shedding commands were studied on the basis of the power grid and power communication network models. We defined the matrix $\boldsymbol{\varepsilon}$ to represent one of the reachable routes. If $\varepsilon_{i,j} = 1$, then the reachable route contains the link $e_{i,j}$.
To calculate the command transmission delay, the fiber optic links and switch nodes on a route must be collected. According to the definition of route $\boldsymbol{\varepsilon}$, the set of fiber optic links of this route is
$$E_c = \{ e_{i,j} \mid \varepsilon(i,j) = 1,\ i \in \Lambda,\ j \in \Lambda \}. \tag{5}$$
To represent the set of switch nodes that the route passes, we took the starting nodes of all the optical links to form a node set $\{ v_i \mid \sum_{j=1}^{N} \varepsilon(i,j) = 1 \}$ and then added the target node to the set. The set of switch nodes can be denoted by
$$V_\varepsilon = \Big\{ v_i \,\Big|\, \sum_{j=1}^{N} \varepsilon(i,j) = 1 \Big\} \cup \{ d(n) \} \tag{6}$$
where $d(n)$ is the target node.
The control command delivery delay $\tau_\varepsilon$ for route $\boldsymbol{\varepsilon}$ is the sum of the propagation delay of the fiber optic links and the processing and forwarding delay of the intermediate nodes. The intermediate nodes of the route are the switch nodes on the route other than the start and target nodes. The number of switch nodes on the route is $|V_\varepsilon|$, the number of intermediate nodes is $|V_\varepsilon| - 2$, and the processing and forwarding delay of all intermediate nodes is
$$\tau_1 = (|V_\varepsilon| - 2)\,\tau_v. \tag{7}$$
From the link length matrix and the set of control command distribution route links, the control command distribution route length is
$$L_e = \sum_{e_{i,j} \in E_c} l(e_{i,j}). \tag{8}$$
Then, the propagation delay of the control command data in the transmission line is
$$\tau_2 = \frac{1}{c / n_r} L_e \tag{9}$$
where $c / n_r$ is the control command transmission speed, $c$ is the speed of light, and $n_r$ is the refractive index of the fiber optic cable. The control command delivery delay $\tau_\varepsilon$ is $\tau_1 + \tau_2$.
Let the information collection delay be $\tau_3$, which is the maximum delay incurred when the CC collects the instantaneous data of the whole network and equals the command transmission delay of the node farthest from the CC:
$$\tau_3 = (\tau_1 + \tau_2)_{\max}. \tag{10}$$
The delay in generating and issuing load-shedding control commands from the CC is $\tau_4$.
Therefore, the total delay for the control command to be generated and take effect is
$$\tau = \tau_1 + \tau_2 + \tau_3 + \tau_4. \tag{11}$$
This time delay is the power communication delay. The Dijkstra algorithm [30] is used to calculate the shortest route between the control node and the node where the action is applied, as well as the route to the node farthest from the CC, which determines the data collection delay.
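The delay model can be illustrated with a short Python sketch. It assumes a hypothetical four-node network, link lengths in metres, and illustrative values for $\tau_v$, $n_r$, and $\tau_4$; none of these numbers come from the paper, and $\tau_3$ is taken here as the largest delivery delay over all nodes.

```python
import heapq

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


def dijkstra(lengths, source):
    """Shortest route lengths and predecessors from `source`.

    `lengths` maps node -> {neighbour: link length in metres}.
    """
    dist = {v: float("inf") for v in lengths}
    prev = {v: None for v in lengths}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in lengths[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def route_nodes(prev, target):
    """Switch nodes on the route, ordered from the source to `target`."""
    path = []
    while target is not None:
        path.append(target)
        target = prev[target]
    return path[::-1]


def delivery_delay(lengths, source, target, tau_v, n_r):
    """tau_1 + tau_2 along the shortest route between two switch nodes."""
    dist, prev = dijkstra(lengths, source)
    hops = route_nodes(prev, target)
    tau_1 = max(len(hops) - 2, 0) * tau_v   # forwarding at intermediate nodes
    tau_2 = dist[target] / (C_LIGHT / n_r)  # propagation in the fibre
    return tau_1 + tau_2


def total_delay(lengths, cc, target, tau_v, n_r, tau_4):
    """tau = tau_1 + tau_2 + tau_3 + tau_4 for a command sent from the CC."""
    tau_3 = max(delivery_delay(lengths, cc, v, tau_v, n_r) for v in lengths if v != cc)
    return delivery_delay(lengths, cc, target, tau_v, n_r) + tau_3 + tau_4


# Hypothetical network: node 1 hosts the CC; lengths in metres.
links = {
    1: {2: 10_000.0, 3: 50_000.0},
    2: {1: 10_000.0, 3: 20_000.0},
    3: {1: 50_000.0, 2: 20_000.0, 4: 30_000.0},
    4: {3: 30_000.0},
}
dist, prev = dijkstra(links, 1)
path_to_4 = route_nodes(prev, 4)
tau = total_delay(links, cc=1, target=4, tau_v=1e-3, n_r=1.5, tau_4=5e-3)
```

In this toy network the route to node 4 passes through nodes 2 and 3, so two forwarding delays are added to the propagation delay over 60 km of fibre.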

3.3. Constraints

The upper limit of the capacity of the transmission line with head node $i$ and tail node $j$ is
$$C^{L}_{ij} = (1 + \beta)\, C_{ij} \tag{12}$$
where $C_{ij}$ is the initial power flow of the transmission line with head node $i$ and tail node $j$. Here, $\beta$ denotes the safety margin of the line. The safety margin in the actual power grid is generally not large for economic reasons, and thus $\beta = 0.5$ is used in the subsequent simulations.

3.3.1. DC Power Flow Constraint

The matrix representation of the DC power flow model is as follows:
$$\mathbf{P} = \mathbf{B} \boldsymbol{\varphi} \tag{13}$$
where $\mathbf{B}$ is the nodal admittance matrix. In the DC model, the power flow $f_{ij}$ depends on the reactance $x_{ij}$ and the voltage angle $\boldsymbol{\varphi}$ (i.e., $\varphi_i - \varphi_j = x_{ij} f_{ij}$). The element $P_i$ of the active power vector $\mathbf{P} = [P_1, \ldots, P_i, \ldots, P_N]^T$ satisfies $\sum_{j \in V(i)} f_{ij} = P_i$, with $i, j \in V$ and $(i, j) \in E_d$, where $V(i)$ is the set of neighbors of node $i$.

3.3.2. Basic Kirchhoff’s Law and Ohm’s Law Constraints

The matrix representation of the constraints is as follows:
$$\mathbf{X} \mathbf{f} = \mathbf{g}, \qquad \mathbf{f} = \mathbf{W} \mathbf{X}^T \boldsymbol{\varphi} \tag{14}$$
where $\mathbf{f}$ represents the power flow in the transmission lines, the connection matrix $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_M]$ is a matrix with $N$ rows and $M$ columns, the $m$th ($m = 1, 2, \ldots, M$) column vector $\mathbf{x}_m$ corresponds to transmission line $m$, and the entries $+1$ and $-1$ in $\mathbf{x}_m$ mark the head and tail nodes of the transmission line, respectively. The entries for the remaining nodes are zero, while $\mathbf{g}$ is the supply and demand vector. The node is connected to a generator if $g_i > 0$ and to a load if $g_i < 0$. If $g_i = 0$, then the node is connected to neither a generator nor a load. Each diagonal element of $\mathbf{W}$ denotes the susceptance (weight) of the corresponding line [31].
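As a worked illustration of these constraints, the sketch below solves a hypothetical three-bus network with NumPy. It assumes 0-based bus indices, a connected network, a slack bus whose angle is fixed at zero, and $\mathbf{W} = \mathrm{diag}(1/x)$; the numbers are illustrative only.

```python
import numpy as np


def dc_power_flow(n_bus, lines, reactance, g, slack=0):
    """Solve X f = g and f = W X^T phi for a connected network."""
    X = np.zeros((n_bus, len(lines)))
    for k, (i, j) in enumerate(lines):
        X[i, k] = 1.0    # head node of line k
        X[j, k] = -1.0   # tail node of line k
    W = np.diag(1.0 / np.asarray(reactance, dtype=float))
    B = X @ W @ X.T      # P = B phi
    keep = [b for b in range(n_bus) if b != slack]
    phi = np.zeros(n_bus)  # slack angle stays at zero
    phi[keep] = np.linalg.solve(B[np.ix_(keep, keep)], np.asarray(g, dtype=float)[keep])
    f = W @ X.T @ phi
    return X, phi, f


# Hypothetical data: bus 0 generates 1.0, buses 1 and 2 consume 0.4 and 0.6.
lines = [(0, 1), (0, 2), (1, 2)]
reactance = [0.1, 0.2, 0.1]
g = [1.0, -0.4, -0.6]
X, phi, f = dc_power_flow(3, lines, reactance, g)
```

Because the entries of `g` sum to zero, the balance at the slack bus is satisfied automatically once the other buses are solved.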

3.3.3. Supply and Demand Balance Constraint

The power generated by the generators is balanced with the power consumed by the loads:
$$\mathbf{1}^T \mathbf{p} = 0. \tag{15}$$

3.4. Cascading Failure Evolution Process

Removing a transmission line changes the network’s topology. When a transmission line fails, the network begins a series of cascading iterations. First, subnet detection is performed to determine whether the subnet contains generator nodes, and if there are no generator nodes, then all unserviced nodes are removed. The power flow distribution is calculated for the subnet containing generator nodes. The nodes and transmission lines that exceed their capacity will be removed. The process is repeated until all surviving nodes and transmission lines work stably, at which point the cascading failure process is completed. Figure 2 shows the power grid cascading failures occurrence flow chart.
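The iteration described above can be sketched as follows. This is a simplified illustration rather than the authors' simulator: it assumes DC flows, 0-based bus indices, overload checks on lines only, and a hypothetical rebalancing rule in which the larger of generation or load in an island is scaled down to match the other.

```python
import numpy as np


def islands(n_bus, lines, alive):
    """Connected components over the lines that are still in service."""
    adj = {b: [] for b in range(n_bus)}
    for k, (i, j) in enumerate(lines):
        if alive[k]:
            adj[i].append(j)
            adj[j].append(i)
    seen, comps = set(), []
    for b in range(n_bus):
        if b in seen:
            continue
        seen.add(b)
        stack, comp = [b], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


def island_flows(buses, lines, x, alive, g):
    """DC flows inside one island, returned as {line index: flow}."""
    idx = {b: n for n, b in enumerate(buses)}
    ks = [k for k, (i, _) in enumerate(lines) if alive[k] and i in idx]
    if not ks:
        return {}
    X = np.zeros((len(buses), len(ks)))
    for c, k in enumerate(ks):
        X[idx[lines[k][0]], c] = 1.0
        X[idx[lines[k][1]], c] = -1.0
    W = np.diag([1.0 / x[k] for k in ks])
    B = X @ W @ X.T
    phi = np.zeros(len(buses))  # first bus of the island is the angle reference
    phi[1:] = np.linalg.solve(B[1:, 1:], g[buses][1:])
    return dict(zip(ks, W @ X.T @ phi))


def cascade(n_bus, lines, x, cap, g0, failed):
    """Iterate island detection, power flow, and overload tripping until stable."""
    alive = [k not in failed for k in range(len(lines))]
    g = np.array(g0, dtype=float)
    while True:
        tripped = []
        for buses in islands(n_bus, lines, alive):
            gi = g[buses]
            gen, load = gi[gi > 0].sum(), -gi[gi < 0].sum()
            if gen == 0 or load == 0:
                g[buses] = 0.0  # island cannot be served
                continue
            # Assumption: scale the larger side down so the island is balanced.
            g[buses] = np.where(gi > 0, gi * min(1.0, load / gen), gi * min(1.0, gen / load))
            for k, fk in island_flows(buses, lines, x, alive, g).items():
                if abs(fk) > cap[k]:
                    tripped.append(k)
        if not tripped:
            return alive, g
        for k in tripped:
            alive[k] = False


# Hypothetical three-bus example; capacities are 1.5 times the initial flows.
lines = [(0, 1), (0, 2), (1, 2)]
x = [0.1, 0.2, 0.1]
g0 = [1.0, -0.4, -0.6]
alive, g_final = cascade(3, lines, x, cap=[0.9, 0.6, 0.3], g0=g0, failed={0})
```

In this toy case the loss of line 0 overloads both remaining lines, every bus ends up isolated, and all load is lost, which is the kind of outcome that load shedding is meant to prevent.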

3.5. Simulation Tool PYPOWER

The power system analysis toolkit PYPOWER can be understood as a Python version of MATPOWER, with similar functions to MATPOWER.
MATPOWER is an open-source MATLAB-based power system simulation package widely used for research and education on AC and DC power flow and OPF simulation. Many sample power flow and OPF cases are included, ranging from trivial four-bus examples to real-life cases with several thousand buses.
The simulation data mainly include the baseMVA, bus, branch, gen, and gencost, where baseMVA is a scalar and the rest are matrices. The simulations in this paper mainly used the general power flow calculation function in PYPOWER and especially considered the DC power flow. The primary data of interest for the simulation included the active part of the bus and branch matrices, and the partial structures of the two matrices are shown in Table 1 and Table 2.
From left to right, the parameters in Table 1 indicate the bus number, bus type, active power of the bus load, reactive power of the bus load, bus shunt conductance, bus shunt susceptance, grid area number, initial magnitude of the bus voltage, initial phase angle of the bus voltage, and base voltage of the bus.
The parameters in Table 2 from left to right indicate the branch starting node number, branch ending node number, branch resistance, branch reactance, branch charging susceptance, branch long-term allowable power, branch short-term allowable power, branch emergency allowable power, transformer tap ratio, phase shift angle, and branch operating state.
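To make the column layout concrete, the following sketch builds a small case dictionary with NumPy instead of loading one from PYPOWER. The column positions follow the MATPOWER/PYPOWER case format; the three-bus data, the helper names, and the truncation of the bus matrix to the ten columns described above are assumptions made for illustration.

```python
import numpy as np

# 0-based column positions in the MATPOWER/PYPOWER case format.
BUS_I, BUS_TYPE, PD, QD, GS, BS, BUS_AREA, VM, VA, BASE_KV = range(10)
F_BUS, T_BUS, BR_R, BR_X, BR_B, RATE_A, RATE_B, RATE_C, TAP, SHIFT, BR_STATUS = range(11)

# Hypothetical three-bus case (illustrative values only, not IEEE data).
case = {
    "baseMVA": 100.0,
    "bus": np.array([
        # bus type  Pd    Qd    Gs   Bs  area  Vm   Va   baseKV
        [1, 3,  0.0,  0.0, 0.0, 0.0, 1, 1.0, 0.0, 110.0],
        [2, 1, 40.0, 10.0, 0.0, 0.0, 1, 1.0, 0.0, 110.0],
        [3, 1, 60.0, 15.0, 0.0, 0.0, 1, 1.0, 0.0, 110.0],
    ]),
    "branch": np.array([
        # from to   r     x    b   rateA rateB rateC ratio angle status
        [1, 2, 0.01, 0.1, 0.0, 90.0, 90.0, 90.0, 0.0, 0.0, 1],
        [1, 3, 0.02, 0.2, 0.0, 60.0, 60.0, 60.0, 0.0, 0.0, 1],
        [2, 3, 0.01, 0.1, 0.0, 30.0, 30.0, 30.0, 0.0, 0.0, 1],
    ]),
}


def copy_case(case):
    """Copy the case so that edits do not touch the original matrices."""
    return {k: (v.copy() if isinstance(v, np.ndarray) else v) for k, v in case.items()}


def shed_load(case, bus_number, fraction=0.2):
    """Reduce the active load at one bus by `fraction` (applied once to the initial load)."""
    new = copy_case(case)
    rows = new["bus"][:, BUS_I] == bus_number
    new["bus"][rows, PD] *= 1.0 - fraction
    return new


def trip_line(case, from_bus, to_bus):
    """Mark the branch between two buses as out of service."""
    new = copy_case(case)
    br = new["branch"]
    rows = (br[:, F_BUS] == from_bus) & (br[:, T_BUS] == to_bus)
    br[rows, BR_STATUS] = 0
    return new


shed = shed_load(case, bus_number=2)
outage = trip_line(case, 1, 2)
```

A modified case of this shape is what would be handed to a power flow routine after a load-shedding action or a line outage.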

4. Deep Reinforcement Learning

RL can interact with the environment and learn specific knowledge so that it can solve related problems. Interacting with the environment is similar to the evolution process in a power grid system. To apply RL to a practical power grid problem, it is first necessary to map the physical quantities of the power grid to the components of the RL framework (i.e., the agent, state, action, reward, policy, and Q-function).
For cascading failure problems in practical power grid systems, the state space and action space can become massive due to many bus nodes in the power grid system, and the traditional RL represented by Q-learning is no longer applicable. DNNs have obvious advantages in dealing with nonlinear large space problems. Therefore, DNNs are used to approximate the policy function and Q-function, which are combined with RL as DRL.

4.1. The Structure of RL

  • Agent: This is used to transmit data information and intervention operations in the power grid.
  • State $s_t$: This represents the observation at time step $t$ in the power grid environment. It includes the generator active power $P_g$, load active power $P_d$, load voltage $V_d$, line state $\mathbf{o}$, and power flow $\mathbf{f}$.
  • Action $a_t$: This denotes the output at time step $t$ according to the state $s_t$ and the policy network $\pi$. In our simulation environment, an action reduced the power of a load by 20% of its initial power [32].
  • Policy $\pi$: This represents the probability that the agent takes action $a_t$ in state $s_t$.
  • Reward $r_t$: The reward $r_t$ is used as a performance metric to evaluate how good the action $a_t$ is in the given state $s_t$.
  • Q-function: This indicates the expected return of taking action $a_t$ in state $s_t$.
During the interaction with the power grid environment, the agent searches for an ideal policy that maximizes the long-term cumulative discount reward. The state-action Q-function describes the expectation of the cumulative discount reward and is expressed as follows:
$$Q_\pi(s_t, a_t) = \mathbb{E}_\pi \Big[ \sum_{\tau=0}^{\infty} \gamma^{\tau} r_{t+\tau+1} \,\Big|\, s_t, a_t \Big] \tag{16}$$
where $\gamma \in (0, 1]$ denotes the discount factor.
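For a finite reward sequence, the discounted sum inside the expectation can be computed directly. The sketch below is a plain-Python illustration with hypothetical rewards; `rewards[0]` plays the role of $r_{t+1}$.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], accumulated from the last reward backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


example_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

The Q-function is the expectation of this quantity over the trajectories generated by the policy, which the critic network approximates.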
We used an actor network $\mu(\theta^\pi \mid s)$ to approximate the policy function and a critic network $Q(\theta^Q \mid s, a)$ to approximate the state-action Q-function. The actor network $\mu(\theta^\pi \mid s)$ takes the state $s$ as the input and gives the action $a$ as the output. The action $a$ is the input to the critic network $Q(\theta^Q \mid s, a)$ along with the state $s$, and the output is the Q value of the state-action pair, while $\theta^Q$ and $\theta^\pi$ denote the critic network and actor network parameters.

4.2. SAC

Since the load-shedding action in the power grid system is discrete, we adopted the maximum entropy-based SAC [33,34] method to address the cascading failure problem. In contrast to a deterministic policy, SAC maximizes the sum of the cumulative discounted reward and the entropy of the policy, instead of simply maximizing the cumulative discounted reward. The desired actor network is defined as follows:
$$\mu(\theta^\pi \mid s_t) = \arg\max_{\mu} \mathbb{E}_\mu \Big[ \sum_{\tau=0}^{\infty} \gamma^{\tau} \big( r_{t+\tau+1} + \alpha H(\mu(\theta^\pi \mid s_t)) \big) \Big] \tag{17}$$
where $\alpha$ is a coefficient that controls the importance of entropy and $H(\mu(\theta^\pi \mid s_t))$ is the entropy of the actor network $\mu(\theta^\pi \mid s_t)$, denoted by
$$H(\mu(\theta^\pi \mid s)) = -\mathbb{E}_\mu \big[ \log \mu(\theta^\pi \mid s) \big]. \tag{18}$$
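For a discrete action set, the entropy term is easy to evaluate. The following sketch assumes the policy output is a list of action probabilities; the function name is hypothetical.

```python
import math


def policy_entropy(probs):
    """H = -sum_a p(a) log p(a); zero-probability actions contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


uniform_entropy = policy_entropy([0.25, 0.25, 0.25, 0.25])  # equals log(4)
greedy_entropy = policy_entropy([1.0, 0.0, 0.0, 0.0])       # equals 0
```

A uniform policy has the largest entropy and a deterministic one has none, so a larger $\alpha$ pushes the agent toward more exploratory load-shedding choices.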
In the SAC algorithm, the Q-function is rewritten as
$$Q(\theta^Q \mid s, a) = \mathbb{E}_\mu \Big[ \sum_{\tau=0}^{\infty} \gamma^{\tau} \big( r_{t+\tau+1} + \alpha H(\mu(\theta^\pi \mid s_t)) \big) \Big]. \tag{19}$$
The SAC algorithm consists of five networks: an actor network $\mu(\theta^\pi \mid s)$, two critic networks $Q(\theta_1^Q \mid s, a)$ and $Q(\theta_2^Q \mid s, a)$, and two target critic networks $Q(\theta_{1,\text{target}}^Q \mid s, a)$ and $Q(\theta_{2,\text{target}}^Q \mid s, a)$. In the learning process, the smaller output of the two critic networks is used as the Q value in each step of the SAC algorithm to avoid overestimating the value. Each neural network has a four-layer structure (i.e., an input layer, two fully connected layers, and an output layer).
At time step $t$, the action $a_t$ is obtained from the actor network. Based on the state $s_t$, the corresponding reward $r_t$ is calculated, and then the state is updated to $s_{t+1}$, while the four-tuple $(s_t, a_t, r_t, s_{t+1})$ is stored in an experience replay buffer $R_b$ of size $D$. Due to the time delay of the power communication network, the state $s_t$ is not available, and the load-shedding action $a_t$ does not take effect, in real time. According to the action $a_t$ selected by the CC in the power communication network, the corresponding time delay $\tau_{a_t}$ is calculated. The action $a_t$ takes effect only after the time $\tau_{a_t}$ has elapsed. The number of delay steps $T_{RL}^{a_t}$ in RL is expressed as follows:
$$T_{RL}^{a_t} = \left\lfloor \frac{\tau_{a_t}}{\tau_{RL}} \right\rfloor \tag{20}$$
where $\lfloor \cdot \rfloor$ denotes rounding down and $\tau_{RL}$ is the time corresponding to each time step. Figure 3 depicts the mechanism by which the actions take effect during training.
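One way to picture the delayed effect of actions is a small queue that releases each action only after its number of delay steps has passed. The class and method names below are hypothetical, and the delays are illustrative values in seconds.

```python
import math


def delay_steps(tau_a, tau_rl):
    """Number of RL steps before an action takes effect: floor(tau_a / tau_rl)."""
    return math.floor(tau_a / tau_rl)


class DelayedActions:
    """Holds issued actions until their communication delay has elapsed."""

    def __init__(self):
        self._pending = []  # (step at which the action takes effect, action)

    def issue(self, step, action, tau_a, tau_rl):
        self._pending.append((step + delay_steps(tau_a, tau_rl), action))

    def due(self, step):
        """Pop and return every action that takes effect at or before `step`."""
        ready = [a for s, a in self._pending if s <= step]
        self._pending = [(s, a) for s, a in self._pending if s > step]
        return ready


queue = DelayedActions()
queue.issue(step=0, action="shed_bus_3", tau_a=0.025, tau_rl=0.01)
at_step_1 = queue.due(1)  # nothing yet: the action needs two steps
at_step_2 = queue.due(2)  # the action takes effect here
```

While an action is still pending, the environment keeps evolving without it, which is why longer communication routes can let a cascade spread further.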
When the experience replay buffer is full, newly stored four-tuples replace the oldest ones. At each time step $t$, a minibatch of $B$ four-tuples is randomly sampled for a network update. The target critic networks are used to compute
$$y_{I_i}^{t} = r_{I_i}^{t} + \gamma \Big( \min_{j=1,2} Q(\theta_{j,\text{target}}^Q \mid s_{I_i}^{t+1}, a_{I_i}^{t+1}) + \alpha H(\mu(\theta^\pi \mid s_{I_i}^{t})) \Big). \tag{21}$$
The loss functions of the two critic networks are represented by
$$L_F^{t}(\theta_j^Q) = \frac{1}{B} \sum_{i=1}^{B} \big( y_{I_i}^{t} - Q(\theta_j^Q \mid s_{I_i}^{t}, a_{I_i}^{t}) \big)^2, \quad j = 1, 2. \tag{22}$$
The loss function of the actor network is expressed as follows:
$$J^{t}(\theta^\pi) = \frac{1}{B} \sum_{i=1}^{B} \Big( \alpha \log \mu(\theta^\pi \mid s_{I_i}^{t}, a_{I_i}^{t}) - \min_{j=1,2} Q(\theta_j^Q \mid s_{I_i}^{t}, a_{I_i}^{t}) \Big) \tag{23}$$
where $a_{I_i}^{t}$ is the action sampled for the state $s_{I_i}^{t}$ by reparameterization. The loss function of the entropy coefficient $\alpha$ is expressed as follows:
$$J^{t}(\alpha) = \frac{1}{B} \sum_{i=1}^{B} -\alpha \big( \log \mu(\theta^\pi \mid s_{I_i}^{t}) + \bar{H} \big) \tag{24}$$
where $\bar{H}$ is the target entropy. Gradients are used to update the two critic networks, the actor network, and $\alpha$:
$$\theta_j^Q \leftarrow \theta_j^Q - l_Q \nabla_{\theta_j^Q} L_F^{t}(\theta_j^Q), \quad j = 1, 2 \tag{25}$$
$$\theta^\pi \leftarrow \theta^\pi - l_\pi \nabla_{\theta^\pi} J^{t}(\theta^\pi) \tag{26}$$
$$\alpha \leftarrow \alpha - l_\alpha \nabla_{\alpha} J^{t}(\alpha) \tag{27}$$
where $l_Q$, $l_\pi$, and $l_\alpha$ denote the learning rates of the corresponding parameters and take values between 0 and 1. The two target critic networks are updated by a soft update:
$$\theta_{j,\text{target}}^Q \leftarrow \eta\, \theta_j^Q + (1 - \eta)\, \theta_{j,\text{target}}^Q, \quad j = 1, 2 \tag{28}$$
where $\eta$ denotes the learning rate of the target critic networks.
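The arithmetic of the target, the critic loss, and the soft update can be sketched with scalars and plain lists. This is only an illustration under assumptions: network outputs are passed in as numbers, parameters are flat lists, and the entropy bonus enters the target with a positive sign so that it agrees with the soft Q-function defined earlier.

```python
def td_target(reward, q1_next, q2_next, entropy, gamma, alpha):
    """y = r + gamma * (min of the two target critics + alpha * entropy)."""
    return reward + gamma * (min(q1_next, q2_next) + alpha * entropy)


def critic_loss(targets, q_values):
    """Mean squared error between the targets y and the critic outputs."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def soft_update(target_params, online_params, eta):
    """theta_target <- eta * theta + (1 - eta) * theta_target, element by element."""
    return [eta * p + (1.0 - eta) * tp for tp, p in zip(target_params, online_params)]


y = td_target(reward=1.0, q1_next=2.0, q2_next=3.0, entropy=0.5, gamma=0.5, alpha=0.2)
loss = critic_loss([1.0, 2.0], [0.0, 4.0])
new_target = soft_update([0.0, 1.0], [1.0, 1.0], eta=0.25)
```

Taking the minimum of the two target critics keeps the target conservative, and a small $\eta$ makes the target networks trail the trained critics slowly.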
The structure of the SAC algorithm is shown in Figure 4, and the algorithm details of SAC are shown in Algorithm 1. At each episode, the interaction process between the SAC algorithm and the power grid environment is shown in Figure 5.
Algorithm 1 SAC algorithm.
  • Input: $P_g$, $P_d$, $V_d$, $o$, $f$, $l_\pi$, $l_Q$, $l_\alpha$, $\eta$, $D$, and $B$.
  • Output: The load-shedding action for mitigating power grid cascading failures.
  • Initialize: Obtain the initial topology and data of the power grid, set the initial failures, initialize the neural network parameters $\theta_j^{Q}$, $\theta_{j,\mathrm{target}}^{Q}$, $\theta^{\pi}$, $j = 1, 2$, and initialize the experience replay buffer $R_b$.
  • for $episode = 1, 2, \ldots, U$ do
  •     Initialize the power grid environment and get state $s_1$.
  •     for $t = 1, 2, \ldots, T$ do
  •         Obtain action $a_t = \mu(\theta^{\pi} \mid s_t)$ based on state $s_t$.
  •         Calculate the time delay $\tau_{a_t}$ of action $a_t$, let $a_t$ take effect according to $\tau_{a_t}$, calculate reward $r_t$, and obtain the new state $s_{t+1}$.
  •         Store the four-tuple $(s_t, a_t, r_t, s_{t+1})$ in $R_b$.
  •         Randomly sample a minibatch of size $B$ from $R_b$.
  •         Update the two critic networks with (21), (22), and (25).
  •         Update the actor network with (23) and (26).
  •         Update the entropy coefficient $\alpha$ with (24) and (27).
  •         Update the target networks according to (28).
  •     end for
  • end for
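Two bookkeeping pieces of Algorithm 1, the bounded experience replay buffer and the delayed activation of an action, can be sketched as follows. This is a minimal sketch under assumptions rather than the authors' implementation: the class and function names are hypothetical, and the rule that an action issued at step $t$ takes effect after a whole number of steps obtained by rounding the delay down is an assumed reading of the rounding described in the text. The step length of 0.3 ms in the usage line is made up.

```python
import math
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of (s, a, r, s_next) four-tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item once it is full.
        self.data = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.data.append((s, a, r, s_next))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored four-tuples.
        return random.sample(list(self.data), batch_size)

    def __len__(self):
        return len(self.data)

def effective_step(t, delay_ms, step_ms):
    """Step at which an action issued at step t takes effect (assumed floor rule)."""
    return t + math.floor(delay_ms / step_ms)

buf = ReplayBuffer(capacity=3)
for k in range(5):
    buf.push(k, 0.0, 0.0, k + 1)   # toy transitions; states are plain integers
batch = buf.sample(2)
step = effective_step(t=4, delay_ms=0.712, step_ms=0.3)
```

In the setting of Table 4, the buffer capacity would be $D$ = 10,000 and the minibatch size $B$ = 32.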
First, we obtained the data of the power grid system in which the transmission line failure occurred, from which we derived the state $s_t$, comprising the active generator power $P_g$, active load power $P_d$, load voltage $V_d$, line state $o$, and power flow $f$. Given the state $s_t$, the actor network produced the action $a_t$, and the load-shedding action was performed at the corresponding load after the power communication network delay. Then, the active load power $P_d$ was updated, and the updated data were sent to PYPOWER to simulate the evolution of the grid. After the evolution finished, the power flow was obtained, and the line state was updated according to Equation (29):
$$o_{i,j} = \begin{cases} 0, & \text{if } f_{i,j} > C_{i,j} \\ 1, & \text{otherwise} \end{cases} \tag{29}$$
where $f_{i,j}$ indicates the power flow of the transmission line $e_{i,j}$ and $o_{i,j}$ denotes the state of that line: one means the line is connected, and zero means it is disconnected. The system is stable once no new overloaded line appears in the grid. The interaction with the power grid environment for the current episode then ends, and the interaction for the next episode starts. If overloaded lines remain, the iteration continues until the power grid reaches a stable state or the maximum number of steps $T$ is reached.
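The line-state rule of Equation (29) and the iterate-until-stable loop can be illustrated on a toy grid. The sketch below is not the PYPOWER environment used in the paper. It assumes a hypothetical network of identical parallel lines between one generator bus and one load bus, so that the in-service lines share the demand equally, and it compares the flow magnitude with the capacity. All names and numbers are illustrative.

```python
import numpy as np

def update_line_state(flows, capacity, state):
    """Equation (29): open every connected line whose flow exceeds its capacity."""
    overloaded = (np.abs(flows) > capacity) & (state == 1)
    return np.where(overloaded, 0, state)

def parallel_line_flows(demand, state):
    """Toy flow model: identical parallel lines share the demand equally."""
    n_on = int(state.sum())
    if n_on == 0:
        return np.zeros(len(state))       # no path left, nothing flows
    return np.where(state == 1, demand / n_on, 0.0)

def evolve(demand, capacity, state, max_steps=100):
    """Recompute flows and trip overloaded lines until no new line trips."""
    state = state.copy()
    for _ in range(max_steps):
        flows = parallel_line_flows(demand, state)
        new_state = update_line_state(flows, capacity, state)
        if np.array_equal(new_state, state):
            break                          # stable: no new overloaded line
        state = new_state
    return state

capacity = np.array([40.0, 40.0, 40.0])
initial = np.array([0, 1, 1])                      # line 0 is the initial failure
no_action = evolve(90.0, capacity, initial)        # 45 per line, both lines trip
with_shedding = evolve(80.0, capacity, initial)    # 40 per line, grid stays up
```

In this toy case, shedding 10 units of load before the flows are recomputed keeps the two remaining lines at their limit instead of losing every line, which is the mechanism the load-shedding agent exploits.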

5. Numerical Results and Evaluations

We used the IEEE 14 and 30 bus systems as simulation cases. The IEEE 14 bus system consists of $N = 14$ bus nodes, $M = 20$ transmission lines, $K = 5$ generators, and $R = 11$ loads. We assumed that every transmission line was 20 km long and that the communication CC was located at bus node 6. Node 6 was chosen at random; any other node could serve as the CC, which would change the transmission delay and therefore the simulation results. The node processing and forwarding delay $\tau_v$ was 0.022 ms. Table 3 lists the total time delay of the load-shedding action for each load when the CC was at node 6 or node 10. The IEEE 30 bus system environment is not described further.
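To make the delay figures concrete, the sketch below computes the propagation delay of one 20 km fiber link from the refractive index in Table 4 and then finds minimum-delay paths with Dijkstra's algorithm. It is a sketch under assumptions: the five-node graph is hypothetical and is not the IEEE 14 bus topology, and charging one processing delay $\tau_v$ per hop is an assumed accounting that is not meant to reproduce the totals in Table 3.

```python
import heapq

C_KM_PER_MS = 299.792458  # speed of light in vacuum, in km per millisecond

def link_delay_ms(length_km, n_r=1.45):
    """Propagation delay of a fiber link: length * refractive index / c."""
    return length_km * n_r / C_KM_PER_MS

def shortest_delays(edges, source, hop_delay_ms):
    """Dijkstra over an undirected graph where every hop costs hop_delay_ms."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry, already improved
        for v in adj.get(u, []):
            nd = d + hop_delay_ms
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Assumed per-hop cost: one 20 km link plus one node processing delay (0.022 ms).
hop = link_delay_ms(20.0) + 0.022
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]
delays = shortest_delays(edges, "A", hop)
```

A 20 km link with $n_r$ = 1.45 contributes roughly 0.097 ms of propagation delay, so the propagation term is several times larger than the 0.022 ms node processing delay.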
All layers in the actor network and the fully connected layer in the critic networks used the ReLU activation function, and the softmax function was used in the output layers of the critic networks. All networks were optimized with Adam, and the two fully connected layers of each network contained 256 and 64 neurons, respectively. The simulations ran on a computer with an RTX3050Ti GPU and an i5-11400H CPU. The specific simulation parameters are shown in Table 4.
For the rewards in the simulation process, we defined the following sub-rewards: load-shedding sub-reward $r_1$, line disconnection sub-reward $r_2$, line usage sub-reward $r_3$, and residual load sub-reward $r_4$:
$$r_1 = -\frac{10\, n_1}{R}$$
$$r_2 = -\frac{10\, n_2}{M}$$
$$r_3 = r_{ij,\mathrm{line\_usage}}$$
$$r_4 = \frac{5\, \lVert P_d \rVert_1}{\lVert P_{d,\mathrm{ini}} \rVert_1}$$
where $n_1$ denotes the total number of removed loads and $n_2$ denotes the total number of failed lines, while $r_{ij,\mathrm{line\_usage}}$ denotes the utilization reward of line $e_{i,j}$, defined as
$$r_{ij,\mathrm{line\_usage}} = \begin{cases} \cos\left( \dfrac{(b_{i,j} - 0.8)\, \pi}{0.2} \right) - 1, & \text{if } b_{i,j} > 0.8 \\ 0, & \text{otherwise} \end{cases}$$
where $b_{i,j}$ denotes the utilization of the line $e_{i,j}$. This is expressed as follows:
$$b_{i,j} = \frac{f_{i,j}}{C_{L_{i,j}}}, \quad i < j, \; i \in \Lambda, \; j \in \Lambda$$
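The sub-rewards can be evaluated with a few lines of Python. This is a hedged sketch, not the authors' reward code: it assumes that $r_1$ and $r_2$ are penalties (negative), that $r_3$ sums the line usage term over all lines, and that $r_4$ uses the sums of absolute load values. The function names and example inputs are hypothetical, while the defaults $R = 11$ and $M = 20$ are the IEEE 14 bus values given in the text.

```python
import math

def line_usage_reward(b):
    """Penalty for a line with utilization b above 0.8; zero otherwise."""
    if b > 0.8:
        return math.cos((b - 0.8) * math.pi / 0.2) - 1.0
    return 0.0

def sub_rewards(n_removed, n_failed, utilizations, p_d, p_d_ini, R=11, M=20):
    """Return (r1, r2, r3, r4) under the sign and summation assumptions above."""
    r1 = -10.0 * n_removed / R                       # load-shedding penalty
    r2 = -10.0 * n_failed / M                        # line disconnection penalty
    r3 = sum(line_usage_reward(b) for b in utilizations)
    r4 = 5.0 * sum(abs(p) for p in p_d) / sum(abs(p) for p in p_d_ini)
    return r1, r2, r3, r4

# Toy inputs: no load removed, 2 failed lines, one line at 90% utilization,
# and 30 of the initial 40 units of load still served.
r1, r2, r3, r4 = sub_rewards(0, 2, [0.5, 0.9], [10.0, 20.0], [20.0, 20.0])
```

The usage term is 0 at a utilization of 0.8 and falls smoothly to −2 at a utilization of 1, so the agent is penalized progressively as a line approaches its limit rather than only when it trips.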
We calculated the average reward as the running average of the rewards obtained over the episodes completed so far:
$$r_{ave} = \frac{\sum_{i=1}^{epi} r_i}{epi}, \quad epi = 1, 2, \ldots, U.$$
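As a small worked example of this averaging, the following sketch returns the running average after each episode. The function name and the reward values are hypothetical.

```python
def running_average(rewards):
    """Average of the first epi rewards, for epi = 1, 2, ..., len(rewards)."""
    averages = []
    total = 0.0
    for epi, r in enumerate(rewards, start=1):
        total += r
        averages.append(total / epi)
    return averages

curve = running_average([2.0, 4.0, 6.0])
```

Because every past episode keeps its weight, early exploratory episodes with low rewards pull the curve down for a long time, which is worth keeping in mind when reading a reward curve of this kind.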
In a power grid and power communication network, the time delay of the power communication network will impact the collection of data and the distribution of actions in the power grid. In response to failures in the power grid, the load-shedding action made by the CC will lag for some time before it takes effect. Figure 6 depicts the reward varying with episodes under the influence of a delay in the power communication network. As can be seen in Figure 6, the delay reduced the average reward obtained by the SAC algorithm. At the beginning of the training process, as the agent was in the exploration phase, it took actions that may have aggravated the cascading failures, thus reducing the average reward.
When using the SAC algorithm without considering the time delay, the actions taken by the agent take effect immediately in the exploration phase. In contrast, when considering the time delay, the actions that may aggravate the cascading failures at the current moment will take effect with a delay. By the time the actions take effect, they may be transformed into actions that mitigate the cascading failures for the current power grid, which in turn increases the average reward.
Table 5 depicts the network cascading failure scale for the IEEE 14 and 30 bus systems after the system is stabilized by free evolution of the power grid. F in Table 5 indicates the scale of the cascading failures. The failure scale is the ratio of the number of transmission lines that fail in the system to the total number of transmission lines. Figure 7 and Figure 8 show the failure scale of the IEEE 14 and 30 bus systems with the SAC algorithm for the load-shedding action with or without considering the time delay compared to the failure scale without taking load-shedding action.
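The failure scale follows directly from the failed-line counts in Table 5. The short sketch below does the arithmetic, assuming $M = 20$ lines for the IEEE 14 bus system, as stated in the text, and 41 lines for the IEEE 30 bus system, which is the branch count of the standard test case and is not stated in this section. The names are hypothetical.

```python
def failure_scale(failed_lines, total_lines):
    """Ratio of failed transmission lines to all transmission lines."""
    return failed_lines / total_lines

# Failed-line counts from Table 5 for the four initial failures.
scale_14 = [round(failure_scale(n, 20), 2) for n in (11, 10, 6, 6)]
scale_30 = [round(failure_scale(n, 41), 2) for n in (41, 25, 16, 16)]
```

These ratios are the no-action baselines against which the SAC results are compared.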
In Figure 7, the even-numbered lines show a larger failure scale than the odd-numbered lines under the SAC algorithm. This is a coincidence. Each numbered line has a different connection relationship and a different power limit, and because the lines were numbered randomly, whether a line number is odd or even has no systematic relation to the failure scale that its failure causes.
As seen in Figure 7 and Figure 8, for the four initial line failures, the SAC algorithm reduced the cascading failure scale from 0.55, 0.5, 0.3, and 0.3 to 0.2, 0.25, 0.1, and 0.25 in the IEEE 14 bus system, and from 1, 0.61, 0.39, and 0.39 to 0.2, 0.15, 0.17, and 0.27 in the IEEE 30 bus system. With the network trained by the SAC algorithm, the CC can quickly issue the load-shedding action that corresponds to the failure, and the time from the initial failure state to the stable state was 42.86 ms. The less time the power grid system takes to return to a stable state, the smaller the scale of the resulting cascading failure.
From Figure 6, Figure 7, and Figure 8, we conclude that the load-shedding strategy based on the SAC algorithm clearly reduces the cascading failure scale of a power grid. Because of the communication network delay, the action loses some of its timeliness, so the strategy is slightly less effective at mitigating cascading failures than when the communication delay is not considered.
The percentage of remaining load under different initial line failures after load shedding is shown in Figure 9. The remaining load for some numbered loads was zero because those loads were disconnected from the network by line disconnections during the evolution of the grid. Since the network structure of the IEEE 14 bus system is relatively simple, the grid with SAC reached a stable state after 2–3 steps of load shedding. Therefore, the amount of load shed by SAC in Figure 9 is relatively small. Compared with free evolution without action, the SAC algorithm retains most of the load.
Figure 10 depicts the transmission line utilization of the IEEE 14 bus system with the SAC algorithm after setting different initial failure lines. As can be seen in Figure 10, most of the line utilization in the IEEE 14 bus system was below 85% with the SAC algorithm, avoiding highly loaded transmission lines, which can significantly alleviate the cascading failure of the power grid.

6. Conclusions

In this paper, we addressed the problem of grid cascading failures triggered by a single transmission line failure and proposed a DRL-based SAC algorithm to mitigate grid cascading failures, considering communication delay by load shedding. The simulation environment was established with IEEE 14 and 30 bus systems, and the simulation process considered the latency of a load-shedding action due to communication delay and verified the feasibility and effectiveness of the SAC algorithm for mitigating cascading failures. The trained network can decide on actions and give commands quickly when a specific initial failure is encountered, reducing the scale of cascading failures. However, affected by the information transmission delay of the power communication network, there was a noticeable gap in the effectiveness of cascading failure mitigation compared with when there was no delay. In future research, we will consider different control actions, such as changing line connection relationships and node voltages, as well as simulations for different failure types in the power grid and power communication network, such as node failure, switch failure, and communication route blockage, to verify the effectiveness of the algorithm for various scenarios.

Author Contributions

Conceptualization, Y.W. and W.Z.; Methodology, Y.W., A.T., Y.J. and L.M. (Liqiang Ma); Software, A.T., Y.J. and L.M. (Liqiang Ma); Validation, L.M. (Liqiang Ma); Formal analysis, Y.W. and W.Z.; Investigation, A.T.; Resources, Y.J., W.Z. and L.M. (Liang Ma); Data curation, L.M. (Liang Ma) and C.S.; Writing—original draft, Y.W. and W.Z.; Writing—review & editing, A.T., L.M. (Liqiang Ma) and J.S.; Visualization, L.M. (Liang Ma) and C.S.; Supervision, J.S.; Project administration, A.T.; Funding acquisition, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology project of the State Grid Corporation of China (Research on Dispatching Fusion Communication Oriented to Power Communication Network and Its Cooperative Control with Power Network Operation, 52060022001B).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shuvro, R.A.; Wangt, Z.; Das, P.; Naeini, M.R.; Hayat, M.M. Modeling cascading-failures in power grids including communication and human operator impacts. In Proceedings of the 2017 IEEE Green Energy and Smart Systems Conference (IGESSC), Long Beach, CA, USA, 6–7 November 2017; pp. 1–6. [Google Scholar] [CrossRef]
  2. Wang, Z.; Rahnamay-Naeini, M.; Abreu, J.M.; Shuvro, R.A.; Das, P.; Mammoli, A.A.; Ghani, N.; Hayat, M.M. Impacts of Operators’ Behavior on Reliability of Power Grids During Cascading Failures. IEEE Trans. Power Syst. 2018, 33, 6013–6024. [Google Scholar] [CrossRef]
  3. Dobson, I.; Carreras, B.; Newman, D. Branching Process Models for the Exponentially Increasing Portions of Cascading Failure Blackouts. In Proceedings of the 38th Annual Hawaii International Conference on System Sciences, Big Island, HI, USA, 3–6 January 2005; p. 64a. [Google Scholar] [CrossRef]
  4. Saleh, B.; Yousef, A.M.; Abo-Elyousr, F.K.; Mohamed, M.; Abdelwahab, S.A.M.; Elnozahy, A. Performance Analysis of Maximum Power Point Tracking for Two Techniques with Direct Control of Photovoltaic Grid -Connected Systems. Energy Sources Part A Recover. Util. Environ. Eff. 2021, 44, 413–434. [Google Scholar] [CrossRef]
  5. Eid, M.A.E.; Abdelwahab, S.A.M.; Ibrahim, H.A.; Alaboudy, A.H.K. Improving the Resiliency of a PV Standalone System Under Variable Solar Radiation and Load Profile. In Proceedings of the 2018 Twentieth International Middle East Power Systems Conference (MEPCON), Cairo, Egypt, 18–20 December 2018; pp. 570–576. [Google Scholar] [CrossRef]
  6. Anderson, R. Final Report on the August 14, 2003 Blackout in the United States and Canada: Causes and Recommendations; Technical Report; U.S.-Canada Power System Outage Task Force: Washington, DC, USA, 2004. [Google Scholar]
  7. Makarov, Y.; Reshetov, V.; Stroev, A.; Voropai, I. Blackout Prevention in the United States, Europe, and Russia. Proc. IEEE 2005, 93, 1942–1955. [Google Scholar] [CrossRef]
  8. Lai, L.L.; Zhang, H.T.; Lai, C.S.; Xu, F.Y.; Mishra, S. Investigation on July 2012 Indian blackout. In Proceedings of the 2013 International Conference on Machine Learning and Cybernetics, Tian**, China, 14–17 July 2013; Volume 1, pp. 92–97. [Google Scholar] [CrossRef]
  9. Bernstein, A.; Bienstock, D.; Hay, D.; Uzunoglu, M.; Zussman, G. Power grid vulnerability to geographically correlated failures-Analysis and control implications. In Proceedings of the IEEE INFOCOM 2014—IEEE Conference on Computer Communications, Toronto, ON, Canada, 27 April–2 May 2014; pp. 2634–2642. [Google Scholar] [CrossRef]
  10. Lin, Z.; Wen, F.; Wang, H.; Lin, G.; Mo, T.; Ye, X. CRITIC-Based Node Importance Evaluation in Skeleton-Network Reconfiguration of Power Grids. IEEE Trans. Circuits Syst. II Express Briefs 2018, 65, 206–210. [Google Scholar] [CrossRef]
  11. Huang, Q.; Huang, R.; Hao, W.; Tan, J.; Fan, R.; Huang, Z. Adaptive Power System Emergency Control using Deep Reinforcement Learning. IEEE Trans. Smart Grid 2020, 11, 2933191. [Google Scholar] [CrossRef] [Green Version]
  12. Qu, Y.; Gao, M.; Chen, Y.; **g, F.; Zhang, S.; Zhang, L. An analysis of the invulnerability for communication networks base on Cascading failure model. In Proceedings of the 2020 International Conference on Robots Intelligent System (ICRIS), Sanya, China, 7–8 November 2020; pp. 154–157. [Google Scholar] [CrossRef]
  13. Qi, J.; Ju, W.; Sun, K. Estimating the Propagation of Interdependent Cascading Outages With Multi-Type Branching Processes. IEEE Trans. Power Syst. 2017, 32, 1212–1223. [Google Scholar] [CrossRef] [Green Version]
  14. Li, P.; Sheng, W.; Duan, Q. Optimal Power Flow Calculation Method for AC/DC Hybrid Distribution Network Based on Power Router. In Proceedings of the 2021 6th Asia Conference on Power and Electrical Engineering (ACPEE), Chongqing, China, 8–11 April 2021; pp. 1694–1699. [Google Scholar] [CrossRef]
  15. Misra, S.; Roald, L.; Vuffray, M.; Chertkov, M. Fast and Robust Determination of Power System Emergency Control Actions. ar** Error Reduction. ar**+Error+Reduction&author=Kumar,+A.&author=Fu,+J.&author=Tucker,+G.&author=Levine,+S.&publication_year=2019&journal=ar** Under Emergency Circumstances Based on Deep Reinforcement Learning. Zhongguo Dianji Gongcheng Xuebao/Proc. Chin. Soc. Electr. Eng. 2018, 38, 109–119. [Google Scholar] [CrossRef]
  16. Zhang, L.; Zhou, J.; Ma, Y.; Shen, L. Sequential Topology Attack of Supply Chain Networks Based on Reinforcement Learning. In Proceedings of the 2022 International Conference on Cyber-Physical Social Intelligence (ICCSI), Nan**g, China, 18–21 November 2022; pp. 744–749. [Google Scholar] [CrossRef]
  17. Kumar, V.; Jangir, S.; Patanvariya, D.G. Traffic Load Balancing in SDN Using Round-Robin and Dijkstra Based Methodology. In Proceedings of the 2022 International Conference for Advancement in Technology (ICONAT), Goa, India, 21–22 January 2022; pp. 1–4. [Google Scholar] [CrossRef]
  18. Ba, Q.; Savla, K. A dynamic programming approach to optimal load shedding control of cascading failure in DC power networks. In Proceedings of the 55th IEEE Conference on Decision and Control, CDC 2016, Las Vegas, NV, USA, 12–14 December 2016; pp. 3648–3653. [Google Scholar] [CrossRef]
  19. Kuiava, R.; Bogodorova, T.; Fernandes, T.C.C.; Ramos, R.A. A Study on the Relation between the Maximum Loadability Point and Undervoltage Load Shedding Schemes. In Proceedings of the 2020 IEEE Power & Energy Society General Meeting (PESGM), Montreal, QC, Canada, 2–6 August 2020; pp. 1–5. [Google Scholar] [CrossRef]
  20. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 1861–1870. [Google Scholar]
  21. Ding, F.; Ma, G.; Chen, Z.; Gao, J.; Li, P. Averaged Soft Actor-Critic for Deep Reinforcement Learning. In Proceedings of the 2017 International Conferenc Ma, Guanfeng e on Computing, Networking and Communications (ICNC), Silicon Valley, CA, USA, 26–29 January 2021; pp. 1488–1493. [Google Scholar] [CrossRef]
Figure 1. IEEE 14 bus system topology diagram.
Figure 2. Network cascade failures occurrence flow chart.
Figure 3. Schematic diagram of the actions taking effect at step t.
Figure 4. The structure of the SAC algorithm.
Figure 5. The interaction process between the SAC algorithm and the power grid environment in an episode.
Figure 6. Average rewards varying with increasing episodes.
Figure 7. Relationship between initial failure line and cascading failure scale of IEEE 14 bus system under different algorithms.
Figure 8. Relationship between initial failure line and cascading failure scale of IEEE 30 bus system under different algorithms.
Figure 9. Residual load of different initial failure lines after system stabilization using the SAC algorithm in the IEEE 14 bus system.
Figure 10. Line utilization of different initial failure lines after system stabilization using the SAC algorithm in the IEEE 14 bus system.
Table 1. Partial parameters of bus matrix.
| Bus_i | Type | Pd | Qd | Gs | Bs | Area | Vm | Va | baseKV |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 97.6 | 44.2 | 0 | 0 | 1 | 1.0364 | −13.5373 | 11 |
Table 2. Partial parameters of branch matrix.
| fbus | tbus | r | x | b | rateA | rateB | rateC | ratio | angle | status |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 0.3 | 0.4 | 0.6 | 60 | 60 | 60 | 0 | 0 | 1 |
Table 3. Total delay of each load number (LN) when the CC is at different nodes.
| CC location (delay/ms per LN) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Node 6 | 0.712 | 0.809 | 0.831 | 0.712 | 0.593 | 0.593 | 0.831 | 0.949 | 0.831 | 0.712 | 0.593 |
| Node 10 | 0.949 | 0.831 | 0.831 | 0.712 | 0.831 | 0.712 | 0.712 | 0.831 | 0.593 | 0.593 | 0.593 |
Table 4. Simulation parameters.
| Parameters | Value | Parameters | Value |
|---|---|---|---|
| $U$ | 1200 | Soft update learning rate $\eta$ | 0.001 |
| $T$ | 100 | $\alpha$ learning rate $l_\alpha$ | 0.001 |
| Batch size $B$ | 32 | Discount factor | 0.96 |
| Experience replay buffer $D$ | 10,000 | Fiber refractive index $n_r$ | 1.45 |
| Actor network learning rate $l_\pi$ | 0.001 | Critic network learning rate $l_Q$ | 0.001 |
Table 5. Failure scale caused by partial transmission line failure of IEEE 14 and 30 bus systems.
| Line Number | $e_{i,j}$ (14) | F (14) | $e_{i,j}$ (30) | F (30) |
|---|---|---|---|---|
| 1 | $e_{4,5}$ | 11 | $e_{3,4}$ | 41 |
| 2 | $e_{1,2}$ | 10 | $e_{8,28}$ | 25 |
| 3 | $e_{4,9}$ | 6 | $e_{1,3}$ | 16 |
| 4 | $e_{9,14}$ | 6 | $e_{12,16}$ | 16 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wei, Y.; Tian, A.; Jiang, Y.; Zhang, W.; Ma, L.; Ma, L.; Sun, C.; Sun, J. A DRL-Based Load Shedding Strategy Considering Communication Delay for Mitigating Power Grid Cascading Failure. Electronics 2023, 12, 3024. https://doi.org/10.3390/electronics12143024
