False data injection attack in smart grid: Attack model and reinforcement learning-based detection method

Lin, Xixiang; An, Dou; Cui, Feifei; Zhang, Feiye

doi:10.3389/fenrg.2022.1104989

ORIGINAL RESEARCH article

Front. Energy Res., 24 January 2023

Sec. Process and Energy Systems Engineering

Volume 10 - 2022 | https://doi.org/10.3389/fenrg.2022.1104989

This article is part of the Research Topic Energy Efficiency Analysis and Intelligent Optimization of Process Industry


School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi, China

The smart grid, as a cyber-physical system, is vulnerable to attacks due to its diversified and open environment. The false data injection attack (FDIA) threatens grid security by constructing and injecting a falsified attack vector that bypasses the system's detection. Due to the diversity of attacks, it is impractical to detect FDIAs by fixed methods. This paper proposes a false data injection attack model and a corresponding detection method based on deep reinforcement learning (DRL). First, we study an attack model under the assumption of unlimited attack resources and complete topology information. Different types of FDIAs are also enumerated. Then, we formulate the attack detection problem as a Markov decision process (MDP). A deep reinforcement learning-based method is proposed to detect FDIAs with a combined dynamic-static detection mechanism. To address the sparse reward problem, experiences with discrepant rewards are stored in different replay buffers to improve training efficiency. Moreover, the state space is extended by considering the most recent states to improve the perception capability. Simulations were performed on the IEEE 9-, 14-, 30-, and 57-bus systems. The results validate the attack model and demonstrate the efficacy of the detection method in different scenarios.

1 Introduction

The smart grid is a representative cyber-physical system (Pasqualetti et al., 2013), permitting the bidirectional communication of both information and electric power between the utility and users. In the energy management system (EMS), state estimation plays a critical part in information-physical integration. Through state estimation, the EMS can recognize the actual state of electricity transmission by filtering out possible noise to improve the reliability of the real-time data (Katiraei and Iravani, 2006).

Due to the decentralized and multi-temporal coupled characteristics of measurement devices (Annaswamy and Amin, 2013), the terminal equipment often lacks effective physical protection, which makes it susceptible to attacks. Diversified cyber-physical system attacks have been proposed, and among them FDIA is a representative one. The FDIA attacker constructs attack vectors through specific algorithms conditioned on the power grid topology information, injects them through weak points of the grid, and avoids being detected in order to damage data integrity (An et al., 2019). FDIAs can directly affect the state estimation and the subsequent control elements, causing the system to lose stability and even break down. A large number of smart grid security incidents have shown that, compared with traditional attacks, cyber-physical attacks are more difficult to detect and defend against (Liang et al., 2017).

Research efforts against FDIAs can be categorized into two main types: defense and detection. First, to defend against FDIAs before an attack occurs, researchers have studied the deployment of the grid at the cyber-physical level, such as optimizing the distribution of key nodes and devices according to their coupling characteristics (Lei et al., 2020; Wu et al., 2021). Second, to detect FDIAs after an attack, substantial efforts have been made, such as dynamic state estimation methods (An et al., 2022), tracking the deviation of measurements (Alnowibet et al., 2021; Mohamed et al., 2021; Sinha et al., 2022) and some stochastic game methods (Wei et al., 2018; Oozeer and Haykin, 2019). In summary, the above research has shown that both cyber and physical methods are required in FDIA studies.

However, the grid operation is full of uncertainties, in which the system states and attacks are diverse (An and Liu, 2019). Due to the diversity, enumerating all attacks is not realistic with limited resources. Moreover, empirical or off-line attack detection strategies are not optimal solutions for online network attack detection (Ashok et al., 2018; Tsobdjou et al., 2022). Therefore, reinforcement learning (RL)-based methods are introduced to avoid the complexity of empirical methods and gain the ability of detecting attacks in multiple scenarios (Wang et al., 2018; Kurt et al., 2019; Haque et al., 2021).

Deep reinforcement learning (DRL) learns the optimum strategy of sequential decision problems by exploring and interacting with the environment. The agent receives rewards that guide its behavior, with the goal of maximizing the long-term return (Sutton and Barto, 1998). DRL combines the feature-extraction capacity of neural networks with the decision-making capability of reinforcement learning in unknown environments to achieve direct control from state to action (Arulkumaran et al., 2017). The Deep Q-Network (DQN), used in conjunction with a replay buffer and a target network, is a representative DRL algorithm that can be adapted to environments with uncertainty (Mnih et al., 2015).

As for the detection process of FDIA, after the unknown start of an attack, state estimation results are falsified by an attack vector with an unknown attack model (Kurt et al., 2019). Moreover, the detection process of FDIA has the feature of sequential decision, and the state transition can be described as model-free (An et al., 2019, 2022). Thus, FDIA detection can be described as an MDP and trained utilizing the DQN algorithm to achieve detection by neural networks.

The rest of this paper is organized as follows. In Section 2, related studies are reviewed. In Section 3, smart grid state estimation is introduced, including the static and dynamic methods. The empirical bad data detection is also introduced. In Section 4, an FDIA model is introduced and three types of FDIAs are discussed based on the attack model. In Section 5, a DRL-based, combined dynamic-static FDIA detection method is proposed and optimized. In Section 6, simulations of attacks and detections are performed on IEEE grid systems in multiple scenarios. In Section 7, this paper is concluded.

2 Related work

Weighted Least Squares (WLS) is the basic and widely used method for power grid state estimation (Schweppe and Rom, 1970; Schweppe and Wildes, 1970). Debs and Larson (1970) applied the Kalman filter (KF) to the power grid. As the study deepened, the extended Kalman filter (EKF) extended the KF to non-linear systems. Moreover, the unscented Kalman filter (UKF) and particle filtering (PF) were applied to state estimation, which improved the accuracy and stability of filtering (Wan and Van Der Merwe, 2000; Julier and Uhlmann, 2004).

Liu et al. (2009) proposed the FDIA and proved that the attack vector can bypass the detection element and cause damage to the system. Pang et al. (2016) studied an attack method that avoids anomaly detection with the minimum cost. He et al. (2017) constructed a robust parallel FDIA detection scheme that utilizes static-dynamic state estimation to detect attacks. Li and Wang (2019) investigated how to construct a less costly and undetectable attack vector from partial topology information. Li et al. (2019) studied the selection of optimal buses during the attack and proposed a data-driven optimal bus attack method. Jiang et al. (2020) studied two types of FDIAs and proposed a detection-defense method. Chen and Wang (2020) proposed a new state estimation method that estimates the grid state by sequential Monte Carlo filtering to detect multiple attacks.

Deep reinforcement learning matured later, but has been widely used in sequential decision-making problems in recent years. Mnih et al. (2013) proposed the DQN algorithm in 2013 and published a paper in 2015 in which DQN reached a level above human players (Mnih et al., 2015). In the field of smart grid security, Wang et al. (2018) proposed an autonomous FDIA method adopting nearest sequence memory Q-learning. Liu et al. (2020) investigated the vulnerability of power grids with new energy based on DRL. Wang et al. (2021) studied a hybrid cyber-physical topological attack in power grids and proposed a DRL-based method for detecting attacks with minimum cost. Luo and Xiao (2021) proposed an FDIA method based on reinforcement learning (RL), utilizing measurements, grid states and other parameters to construct attacks, without dependence on topology information.

As for RL-based FDIA detection, Kurt et al. (2019) formulated the detection process as an MDP and proposed a model-free RL-based detection scheme. Zhang and Wu (2021) proposed an RL-based detection method without the attack model, utilizing a Q-table to detect attacks by the Sarsa algorithm. To address the complexity of storing the Q-table, An et al. (2019) and Sinha et al. (2022) applied the DQN algorithm to detect FDIAs by neural networks. Moreover, Alnowibet et al. (2021) and Mohamed et al. (2021) studied FDIA detection on energy trading and energy management systems by an intelligent priority selection-based RL method.

Research has proved the efficacy of RL-based methods in FDIA detection. However, most studies focused on detection against a single attack model, and studies on different types of FDIAs are not sufficient. Moreover, few studies combined multiple state estimation methods in the detection scheme, which is the focus of this paper.

3 Preliminaries

In this section, static and dynamic state estimation algorithms are introduced, laying the foundation of the combined dynamic-static FDIA detection mechanism. The bad data detection method is shown in order to verify the efficacy of FDIA. Basic information about the DQN algorithm is also introduced.

3.1 Measurement equations

Relation between the measured power flows and state of the grid is (Schweppe and Wildes, 1970):

\begin{aligned} z & = h (x) + υ \\ x & = {[\begin{matrix} φ_{i}, V_{i} \end{matrix}]}^{T} \\ z & = {[\begin{matrix} V_{i}, P_{i}, Q_{i}, P_{i j}, Q_{i j} \end{matrix}]}^{T} \end{aligned} (1)

where, z denotes the system measurement, h denotes the measurement equation, x denotes the state of the grid, υ denotes the measurement noise, φ_i and V_i denote the voltage phase angle and magnitude of node i, P_i and Q_i denote the active and reactive power injections of node i, and P_ij and Q_ij denote the branch power flows from i to j, whose detailed equations are:

\begin{aligned} P_{i} & = \sum_{j \in N_{i}} V_{i} V_{j} (G_{i j} \cos (φ_{i} - φ_{j}) + B_{i j} \sin (φ_{i} - φ_{j})) \\ Q_{i} & = \sum_{j \in N_{i}} V_{i} V_{j} (G_{i j} \sin (φ_{i} - φ_{j}) - B_{i j} \cos (φ_{i} - φ_{j})) \\ P_{i j} & = V_{i}^{2} (g_{s i} + g_{i j}) - V_{i} V_{j} g_{i j} \cos (φ_{i} - φ_{j}) - V_{i} V_{j} b_{i j} \sin (φ_{i} - φ_{j}) \\ Q_{i j} & = - V_{i}^{2} (b_{s i} + b_{i j}) - V_{i} V_{j} g_{i j} \sin (φ_{i} - φ_{j}) + V_{i} V_{j} b_{i j} \cos (φ_{i} - φ_{j}) \end{aligned} (2)

where, G_ij, B_ij are the real and imaginary parts of the (i, j) element of the nodal admittance matrix, g_ij, b_ij denote the conductance and susceptance of the branch between i and j, g_si + jb_si denotes the admittance of the shunt branch at i, and g_sj + jb_sj denotes the admittance of the shunt branch at j.

3.2 Static state estimation

Due to the pervasive noise, measurements can be unreliable and inconsistent with the actual state. Static state estimation filters the noise based on the currently measured data (Schweppe and Wildes, 1970). Because it relies only on the current measurements, the results of static state estimation deviate significantly from the true state when the grid is attacked by FDIAs. That deviation is utilized in the attack detection mechanism of this paper. The WLS method is adopted in this paper, whose iterative form is:

\{\begin{matrix} Δ z^{(k)} = z - h ({\hat{x}}^{(k)}) \\ Δ {\hat{x}}^{(k)} = {[H^{T} R^{- 1} H]}^{- 1} H^{T} R^{- 1} Δ z^{(k)} \\ {\hat{x}}^{(k + 1)} = {\hat{x}}^{(k)} + Δ {\hat{x}}^{(k)} \end{matrix} (3)

where, k denotes the step of iteration, $\hat{x}$ denotes the estimated state, $H (x)$ denotes the Jacobian matrix of the measurement equation, with the dimension of m × n. The iteration ends when $Δ \hat{x}$ is sufficiently small.
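The iteration in Eq. 3 can be illustrated with a short NumPy sketch. This is only a minimal illustration under stated assumptions: the function name `wls_estimate`, the tolerance, and the two-state toy measurement model `h_toy` are hypothetical and are not the power-flow model of Eq. 2.

```python
import numpy as np

def wls_estimate(z, h, jac, R, x0, tol=1e-8, max_iter=50):
    """Gauss-Newton form of the WLS iteration in Eq. 3 (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    R_inv = np.linalg.inv(R)
    for _ in range(max_iter):
        dz = z - h(x)                                  # measurement mismatch
        H = jac(x)                                     # m x n Jacobian at the current estimate
        gain = H.T @ R_inv @ H
        dx = np.linalg.solve(gain, H.T @ R_inv @ dz)   # state correction
        x = x + dx
        if np.max(np.abs(dx)) < tol:                   # stop when the correction is small
            break
    return x

# Hypothetical toy model with two states and four measurements.
def h_toy(x):
    return np.array([x[0], x[1], x[0] * x[1], x[0] ** 2])

def jac_toy(x):
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [x[1], x[0]],
                     [2.0 * x[0], 0.0]])

x_true = np.array([1.05, 0.20])
z = h_toy(x_true)                                      # noise-free measurements
x_hat = wls_estimate(z, h_toy, jac_toy, 0.01 * np.eye(4), x0=[1.0, 0.0])
```

With noise-free measurements the estimate should return to `x_true`; with noisy measurements the weighting matrix R determines how strongly each measurement is trusted.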

3.3 Dynamic state estimation

Dynamic state estimation is based on the KF to estimate the state and eliminate noise. While the static method focuses primarily on real-time states, the dynamic method also predicts the state of the next step, so its estimation result stays closer to the true state under attacks. Due to the non-linearity of the power system, EKF and UKF are applied in this work.

3.3.1 Extended kalman filter

EKF is effective for non-linear models, and performs better in systems with weak non-linearities and perturbations (Li et al., 2015).

A Taylor expansion of $h (\hat{x})$ around $\tilde{x}$ is:

h (\hat{x}) = h (\tilde{x}) + H (\tilde{x}) Δ x + S (4)

where, $Δ x = \hat{x} - \tilde{x}$ , $\tilde{x}$ is the prior estimation of state, $\hat{x}$ is the posterior estimation of state, H denotes the Jacobian matrix of the measurement function, and S is the remainder term of the second and higher order. Omitting S, a linearized model of grid state is obtained.

\begin{aligned} x_{k + 1} & = F_{k} x_{k} + Q_{k} \\ z_{k + 1} & = H x_{k + 1} + R_{k} \end{aligned} (5)

where, F_k denotes the state-transition function, Q_k and R_k denote the system and measurement noise.

Equations of EKF are shown in Table 1, and the explanation is:

1. Prior estimation: Calculate ${\tilde{x}}_{k + 1}$ and covariance matrix of prior estimation M_k+1 by the post estimation results of step k.

2. Kalman gain: Calculate gain K_k+1 by M_k+1 and H.

3. Post estimation: Calculate ${\hat{x}}_{k + 1}$ and covariance matrix of post estimation Σ_k+1 for the next step.


TABLE 1. Equations of EKF.
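The three steps listed above can be sketched as follows. The body of Table 1 is not reproduced in this text, so the sketch uses the textbook EKF recursion and should be read as an assumption about the table, not as a transcription of it. The function name `ekf_step` is hypothetical, and `Q` and `R` are taken here to be the covariance matrices of the system and measurement noise.

```python
import numpy as np

def ekf_step(x_post, Sigma_post, z_next, F, Q, R, h, jac):
    """One EKF step using the textbook recursion (assumed form of Table 1)."""
    # 1. Prior estimation from the posterior results of step k.
    x_prior = F @ x_post
    M = F @ Sigma_post @ F.T + Q
    # 2. Kalman gain from M and the Jacobian H.
    H = jac(x_prior)
    K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)
    # 3. Posterior estimation for the next step.
    x_next = x_prior + K @ (z_next - h(x_prior))
    Sigma_next = (np.eye(len(x_prior)) - K @ H) @ M
    return x_next, Sigma_next

# Toy check with an identity measurement function (hypothetical values).
x1, S1 = ekf_step(np.zeros(2), np.eye(2), np.array([2.0, 2.0]),
                  np.eye(2), np.zeros((2, 2)), np.eye(2),
                  lambda x: x, lambda x: np.eye(2))
```

In the toy check the prior and the measurement are equally uncertain, so the posterior estimate lies halfway between them.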

3.3.2 Unscented kalman filter

UKF applies KF to non-linear systems utilizing the Unscented Transformation (UT). UKF performs better under systems with strong non-linearity compared with EKF (Julier and Uhlmann, 2004). The non-linear form of grid state is:

\{\begin{aligned} x_{k + 1} = f (x_{k}) + ω_{k} \\ z_{k} = h (x_{k}) + υ_{k} \end{aligned} (6)

where, $f (x_{k})$ is n × 1 dimensional non-linear state-transition function, ω_k and υ_k are n × 1 and m × 1-dimensional Gaussian white noise with the zero mean.

Equations of UKF are shown in Table 2, and the explanation is:

1. Generate sigma-points: Generate 2n+1 sigma-points (Julier and Uhlmann, 2004).

2. Prior estimation: Utilizing sigma points to calculate the prior estimation ${\tilde{x}}_{k + 1}$ , and the prior estimation error covariance M_k+1.

3. Measurement correction: Calculate the prior estimated measurement ${\tilde{z}}_{k + 1}$ . Difference between ${\tilde{z}}_{k + 1}$ and z_k+1 is used to calculate covariance matrices $Σ_{k + 1}^{z z}$ and $Σ_{k + 1}^{x z}$ .

4. Kalman gain: Gain K_k+1 and post estimated state ${\hat{x}}_{k + 1}$ are calculated by $Σ_{k + 1}^{z z}$ and $Σ_{k + 1}^{x z}$ .


TABLE 2. Equations of UKF.
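As a rough illustration of the first two steps, the sketch below generates 2n+1 sigma points and propagates them through a function. Table 2 is not reproduced in this text, so the weights follow the basic unscented transform of Julier and Uhlmann with a scaling parameter `kappa`; this choice, and the names `sigma_points` and `unscented_transform`, are assumptions.

```python
import numpy as np

def sigma_points(x, P, kappa=1.0):
    """Generate 2n+1 sigma points and weights (basic unscented transform)."""
    n = len(x)
    L = np.linalg.cholesky((n + kappa) * P)      # L @ L.T = (n + kappa) * P
    pts = np.vstack([x, x + L.T, x - L.T])       # one sigma point per row
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return pts, w

def unscented_transform(f, x, P, kappa=1.0):
    """Propagate mean x and covariance P through f using the sigma points."""
    pts, w = sigma_points(x, P, kappa)
    y = np.array([f(p) for p in pts])
    mean = w @ y
    diff = y - mean
    cov = (w[:, None] * diff).T @ diff
    return mean, cov

# For a linear map the transform reproduces A x and A P A^T.
A = np.array([[1.0, 2.0], [0.0, 1.0]])
x0 = np.array([1.0, 2.0])
P0 = np.array([[2.0, 0.5], [0.5, 1.0]])
mean, cov = unscented_transform(lambda p: A @ p, x0, P0)
```

In a full UKF the same transform would be applied once to f for the prior estimation and once to h for the measurement correction, with the noise covariances added to the resulting matrices.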

3.4 Bad data detection

Errors in the initial measurement data can distort the estimated states, leading to wrong decisions by the EMS. Therefore, bad data detection is applied to the measurements to find possible errors. The most common method is to construct an empirical threshold and test the residual against it (Merrill and Schweppe, 1971):

τ < {‖z - h (\hat{x})‖}_{2} (7)

where, $\hat{x}$ denotes the state estimation results, ${‖z - h (\hat{x})‖}_{2}$ denotes the ℓ2-norm of residuals and τ denotes the empirical threshold generated from historical data.

When Eq. 7 holds, the residuals of the estimated states exceed the threshold. A bad data alarm is then triggered, indicating the existence of bad data.
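The residual test of Eq. 7 can be sketched in a few lines. The linear measurement matrix `H`, the threshold value and the function name `bad_data_alarm` below are hypothetical and serve only to show the mechanism.

```python
import numpy as np

def bad_data_alarm(z, h, x_hat, tau):
    """Return (alarm, residual norm) for the test tau < ||z - h(x_hat)||_2."""
    residual_norm = float(np.linalg.norm(z - h(x_hat), ord=2))
    return residual_norm > tau, residual_norm

# Hypothetical linear model with three measurements of two states.
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
h = lambda x: H @ x

z_clean = H @ np.array([1.0, 2.0])
z_bad = z_clean + np.array([0.0, 0.0, 1.5])       # one grossly wrong measurement

x_clean = np.linalg.lstsq(H, z_clean, rcond=None)[0]
x_bad = np.linalg.lstsq(H, z_bad, rcond=None)[0]

alarm_clean, r_clean = bad_data_alarm(z_clean, h, x_clean, tau=0.5)
alarm_bad, r_bad = bad_data_alarm(z_bad, h, x_bad, tau=0.5)
```

The gross error cannot be explained by any state, so part of it remains in the residual and triggers the alarm, while the consistent measurement set leaves a residual of essentially zero.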

3.5 DQN algorithm

The DQN algorithm, used with replay buffer and target network, is a representative DRL algorithm. Applying DQN algorithm can overcome the complexity of storing Q-table in Q-learning. Other improvements of DQN over Q-learning are (Mnih et al., 2015):

1. Construct replay buffer: At each step, store the experiences in buffer $D$ . When updating the neural network, a mini batch is extracted to update weights θ. Format of experience e_t is:

e_{t} = (s_{t}, a_{t}, r_{t}, s_{t + 1}) (8)

where, s_t, a_t, and r_t denote the state, action and reward of step t during the interacting between the agent and environment.

2. Use target Q network: DQN is a dual-network model. A target network is defined and periodically updated, generating the target Q value. Thus, the gradient of the loss used in gradient descent is:

\begin{aligned} \nabla_{θ_{i}} L (θ_{i}) = E_{s, a, r, s^{'}} [ (r + γ \max_{a_{t + 1}} Q (s_{t + 1}, a_{t + 1}, θ_{i}^{-}) - Q (s_{t}, a_{t}, θ_{i})) \nabla_{θ_{i}} Q (s_{t}, a_{t}, θ_{i}) ] \end{aligned} (9)

where, $Q (s_{t + 1}, a_{t + 1}, θ_{i}^{-})$ , $Q (s_{t}, a_{t}, θ_{i})$ are generated by the weights of target and current Q network, respectively.

3. Normalize reward: Restrict the reward r to $(- 1,1)$ , which limits the magnitude of the gradient during updating.

4. Adopt ɛ-greedy strategy: At each step, take a random action with a probability of 1 − ɛ and the greedy action otherwise; ɛ increases with training.
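Two of the listed ingredients, the target value computed from the target network and the ɛ-greedy rule in the convention used here (random action with probability 1 − ɛ), can be sketched without any neural network library. The function names, the discount factor and the handling of terminal steps via `done` are assumptions; `np.clip` also uses the closed interval [−1, 1] as a simplification.

```python
import numpy as np

def td_target(r, q_next_target, gamma=0.99, done=False):
    """Target value r + gamma * max Q(s', a'; theta^-) with a clipped reward."""
    r = float(np.clip(r, -1.0, 1.0))              # reward normalization (item 3)
    if done:                                      # assumed: no bootstrap at episode end
        return r
    return r + gamma * float(np.max(q_next_target))

def select_action(q_values, eps, rng):
    """Random action with probability 1 - eps, greedy action otherwise."""
    if rng.random() < 1.0 - eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
target = td_target(5.0, [0.2, 0.7], gamma=0.9)    # reward is clipped to 1 before use
action = select_action([0.1, 0.9], eps=1.0, rng=rng)
```

With ɛ = 1 the rule is fully greedy, which matches the statement that ɛ increases as training proceeds.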

4 Smart grid FDIA

4.1 FDIA model based on complete topology information

In this section, we construct FDIAs under the assumption of complete topology information and unlimited cost. Thus, the attacker can extract the whole measurement function h(x) and construct attacks without considering the cost, rendering the empirical bad data detection mechanism ineffective.

Equations for constructing attacks are (Liu et al., 2009):

\begin{aligned} z_{a} & = z + a \\ {\hat{x}}_{a} & = H^{- 1} z_{a} = \hat{x} + c \end{aligned} (10)

where, z_a denotes the attacked measurement values that the system obtains, z denotes the real measurement values of the grid, a denotes the attack vector, ${\hat{x}}_{a}$ denotes the estimated states under attacks, $\hat{x}$ denotes the estimated states without attacks, c denotes the change of state values.

As for a non-linear power system, an FDIA can also satisfy the measurement equation $z_{a} = h (x_{a})$ by:

a = z_{a} - z = h (x + c) - h (x) = h (x_{a}) - h (x) (11)

\begin{aligned} {‖z_{a} - h ({\hat{x}}_{a})‖}_{2} & = {‖(z - h (\hat{x})) + (a + h (\hat{x}) - h ({\hat{x}}_{a}))‖}_{2} \\ & = {‖z - h (\hat{x})‖}_{2} \end{aligned} (12)

If static state estimation operates as usual, then ${\hat{x}}_{a} \approx x_{a}$ and $h ({\hat{x}}_{a}) \approx h (x_{a})$ . Comparing Eq. 12 with Eq. 7 and ignoring the inherent Gaussian noise in Eq. 6, the residuals under valid FDIAs are the same as the residuals without an attack, namely ${‖z_{a} - h ({\hat{x}}_{a})‖}_{2} = {‖z - h (\hat{x})‖}_{2}$ .

In other words, an FDIA constructed this way does not change the residuals in Eq. 7. Therefore, a valid FDIA does not trigger the bad data detection alarm mentioned in Section 3.4.
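The residual invariance can be checked numerically in the linear case, where the attack vector takes the form a = Hc. The sketch below is a simplified illustration, not the non-linear construction of Eq. 11: the random matrix `H`, the unweighted least-squares estimator and all numerical values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
H = rng.normal(size=(m, n))                  # hypothetical linearized measurement matrix
x = rng.normal(size=n)                       # true state
z = H @ x + 0.01 * rng.normal(size=m)        # noisy measurements

def estimate(z_meas):
    """Unweighted least-squares state estimate (R = I is an assumption)."""
    return np.linalg.lstsq(H, z_meas, rcond=None)[0]

def residual_norm(z_meas):
    return float(np.linalg.norm(z_meas - H @ estimate(z_meas)))

c = np.array([0.1, 0.0, -0.05])              # intended change of the state
a_valid = H @ c                              # attack vector consistent with the model
a_invalid = 0.1 * rng.normal(size=m)         # arbitrary vector, ignores the topology

r_clean = residual_norm(z)
r_valid = residual_norm(z + a_valid)
r_invalid = residual_norm(z + a_invalid)
```

Here `r_valid` equals `r_clean` up to rounding while the estimate is shifted by c, which is the behaviour described by Eq. 12. An attack vector built without the measurement model generally changes the residual and is therefore exposed to the detector.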

4.2 Types of attacks

Considering the diversity of attacks and attackers’ intentions, types of FDIAs are also diverse. In this paper, we classified FDIAs by duration and variation of intensity.

According to the duration, attacks can be divided into transient attack and continuous attack (Jiang et al., 2020). The transient attack tends to have stronger perturbation in a short period, while the continuous attack can remain undetected for a longer period by applying weak perturbation.

According to the intensity, the attack can be divided into constant-intensity and variable-intensity attack. The constant-intensity attack vectors are similar in magnitude, while the variable-intensity attack vectors can be stochastic or asymptotic.

Detailed classification of FDIA is shown in Table 3. Three types of attacks are selected and studied in this paper. Their strategies are shown in Algorithm 1, and the detailed equations are:


TABLE 3. Classification of FDIAs.

Algorithm 1 Strategies of three attacks.

1. Attack-1. Continuous-constant-intensity attack:

Define the start and end of the attack as t_start and t_end. While t_start < t < t_end, construct and inject the attack by Eq. 13.

\{\begin{matrix} \begin{aligned} x_{a} = x + c \cdot ω \\ z_{a} = h (x_{a}) \end{aligned} \end{matrix} (13)

where, c = [c_φ1, c_φ2, ⋯c_φn, c_V1, c_V2, ⋯c_Vn] denotes the intended deviation of phase angle c_φi and magnitude c_Vi on the node voltage, ω is a standardized normal variable.

Attack-1 is a typical form of FDIA. When c_φi = c_Vi = 0, no attack is injected on bus i; otherwise, c depends on the intention of attackers. Since we focus on detection, the programming problem of determining c is replaced by a Gaussian variable ω. Multiplying by ω, the attack vector varies in a reasonable range (0, c). Thus, diversified attack intentions can be included, and the value of c can be fixed. For example, if we define c_V1 = 0.1 p.u., all attacks with an intensity between 0 and 0.1 p.u. on bus 1 are considered as long as there are enough episodes. Moreover, ω also makes FDIAs hard to detect by empirical methods.

2. Attack-2. Transient-constant-intensity attack:

While t_start < t < t_end, at each step the attack is constructed and injected by Eq. 13 with a probability of ɛ_attack; otherwise, no attack is conducted. Attack-2 aims to test the response speed of the detection method.

3. Attack-3. Continuous-variable-intensity (incremental) attack:

While t_start < t < t_end, construct the attack by:

\{\begin{matrix} \begin{aligned} x_{a} = x_{a} + \frac{c}{t_{end} - t_{start}} \\ z_{a} = h (x_{a}) \end{aligned} \end{matrix} (14)

Attack-3 is valid during steady-state grid operation, when the state x undergoes little change. One obvious feature of Attack-3 is that x_a is cumulative. Since the deviation accumulates slowly, Attack-3 is hard to detect at an early stage.
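The three strategies of Algorithm 1 can be sketched as one generator that returns the measurement seen by the system at each step. Several points are assumptions made for illustration: the function name and signature, ω drawn as a single scalar per step, and the cumulative term of Eq. 14 implemented as an offset added to the current true state, which matches the equation only in the steady state mentioned above.

```python
import numpy as np

def attack_measurements(kind, x_seq, c, h, t_start, t_end, eps_attack=0.5, rng=None):
    """Yield z_a (or z when no attack is active) for Attack-1, 2 or 3."""
    if rng is None:
        rng = np.random.default_rng()
    c = np.asarray(c, dtype=float)
    offset = np.zeros_like(c)                     # accumulated deviation for Attack-3
    for t, x in enumerate(x_seq):
        x_a = np.asarray(x, dtype=float)
        if t_start < t < t_end:
            if kind == 1:                         # continuous, constant intensity (Eq. 13)
                x_a = x_a + c * rng.standard_normal()
            elif kind == 2 and rng.random() < eps_attack:
                x_a = x_a + c * rng.standard_normal()   # transient version of Eq. 13
            elif kind == 3:                       # incremental attack (Eq. 14)
                offset = offset + c / (t_end - t_start)
                x_a = x_a + offset
        yield h(x_a)

# Toy usage: constant two-dimensional state and identity measurement function.
xs = [np.array([1.0, 0.0])] * 10
out = list(attack_measurements(3, xs, [0.5, 0.0], lambda x: x, t_start=2, t_end=7))
```

In the toy run the first state component drifts upward by c/(t_end − t_start) at every attacked step and returns to its true value once the attack window closes.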

5 DQN-based FDIA detection

In this section, we first introduce a combined dynamic-static detection mechanism. Then, to avoid its complexity and improve effectiveness, we propose a DQN-based FDIA detection method.

5.1 Combined dynamic-static empirical FDIA detection

According to Section 4.1, it is hard to detect well-constructed FDIAs by bad data detection. However, when attacked by FDIAs, results of different state estimation methods produce a significant difference: Result of static state estimation will deviate from the true state, since it only depends on the real-time measurements. Result of dynamic method is closer to the true state due to the prediction steps.

So in this work, we combine the results of the static method (WLS) with those of the dynamic methods (EKF and UKF) and detect attacks based on their inconsistency:

τ_{1} = {‖x_{K F} - x_{WLS}‖}_{2} > τ_{attack} (15)

where, x_KF and x_WLS denote the result of KF and WLS, τ_attack is the threshold for determining an attack. When Eq. 15 holds, the system is determined to be attacked.

However, grid states sometimes change abruptly due to other factors, which can also lead to a deviation between the state estimation results. Thus, Eq. 15 can produce false positives, so we combine it with bad data detection:

τ_{2} = {‖z - h (x_{WLS})‖}_{2} > τ_{baddata} (16)

The mechanism is summarized in Figure 1. When Eq. 16 holds, the system is determined to have bad data. When Eq. 16 does not hold but Eq. 15 holds, the system is determined to be attacked by FDIAs.

FIGURE 1. FDIA detection mechanism.
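The decision logic of Eqs. 15 and 16 can be written as a small function. The label strings, the function name and the threshold values in the usage lines are hypothetical.

```python
import numpy as np

def classify_step(z, x_kf, x_wls, h, tau_attack, tau_baddata):
    """Combined dynamic-static decision: bad data first, then FDIA."""
    tau_2 = np.linalg.norm(z - h(x_wls))          # residual test, Eq. 16
    if tau_2 > tau_baddata:
        return "bad_data"
    tau_1 = np.linalg.norm(x_kf - x_wls)          # estimator inconsistency, Eq. 15
    if tau_1 > tau_attack:
        return "fdia"
    return "normal"

# Toy usage with an identity measurement function.
h = lambda x: x
x_ref = np.array([1.0, 1.0])
normal = classify_step(x_ref, x_ref, x_ref, h, 0.1, 0.5)
fdia = classify_step(x_ref, np.array([1.0, 0.8]), x_ref, h, 0.1, 0.5)
bad = classify_step(np.array([1.0, 2.0]), x_ref, x_ref, h, 0.1, 0.5)
```

Checking Eq. 16 before Eq. 15 reflects the ordering described above, where a large residual is attributed to bad data and only a consistent but shifted static estimate is attributed to an FDIA.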

However, since the grid is vulnerable to disturbances, evaluating the performance of the method only by accuracy is incomplete. Considering the time sensitivity, an effective detection of FDIA in this paper is defined in Eq. 17. Utilizing Eq. 17, the performance of detection is evaluated by the detection rate.

t_{start} ⩽ t_{alarm} ⩽ t_{start} + 2 (17)

where, t_start denotes the start of attack, and t_alarm denotes the time that the attack is detected.

If Eq. 17 holds, it shows that the attack is detected within a short period of time. Thus the detection is effective and the safety of grid can be protected.
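A minimal sketch of this criterion is given below. It assumes that the detection rate is the fraction of episodes whose alarm time satisfies Eq. 17; the function names and the episode list are hypothetical.

```python
def is_effective(t_alarm, t_start, window=2):
    """Eq. 17: the alarm falls within `window` steps after the attack starts."""
    return t_alarm is not None and t_start <= t_alarm <= t_start + window

def detection_rate(episodes, window=2):
    """Fraction of (t_alarm, t_start) pairs that count as effective detections."""
    hits = sum(is_effective(t_alarm, t_start, window) for t_alarm, t_start in episodes)
    return hits / len(episodes)

# Timely, timely, late, missed (no alarm), and premature alarm.
episodes = [(10, 10), (12, 10), (13, 10), (None, 10), (9, 10)]
rate = detection_rate(episodes)
```

Late alarms, missed attacks and alarms raised before the attack starts all count against the rate, so the measure penalizes both slow detection and false positives.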

5.2 DQN-based FDIA detection scheme

Due to the changing load of grid and randomness of attacks, the threshold of Eq. 15 varies greatly in different scenarios. Therefore, it is impractical and costly to apply a certain empirical threshold τ_attack in a wide range of grids.

To address the shortcomings of empirical detection, FDIA detection is formulated as an MDP and trained utilizing the DQN algorithm. Detection is achieved through a neural network, which acts as a dynamic threshold in place of the empirical threshold τ_attack in Eq. 15. The neural network is trained by interacting with the environment during the MDP of FDIA detection (An et al., 2019; Kurt et al., 2019). After a detection action, the agent receives feedback (a reward) from the environment, which guides its actions by updating the neural network (Sutton and Barto, 1998).

5.2.1 MDP-based attack detection model

An MDP is a model for sequential decision making (Baxter, 1995). When the state of the environment is Markovian, an MDP can simulate the strategies and rewards that an agent can achieve. We formulate the FDIA detection process as an MDP due to its sequential decision feature and the uncertainty of the attack model.

Main components of MDP are state space S, action space A, state transition P and reward R, denoted by $\{S, A, P, R\}$ (Luong et al., 2019). For the FDIA detection, we defined S and A as:

\begin{aligned} S = [s_{n}, s_{a}] \\ A = [a_{c}, a_{s}] \end{aligned} (18)

where, s_n represents that no attack exists in the grid, s_a represents that the grid is under an attack, a_c denotes that no attack is detected and the system continues to operate, and a_s denotes that the attack is detected. The MDP ends when an attack is detected or time ends.

P represents the state transition function. To address the random and unpredictable characteristics of cyber attacks, a model-free approach is taken to define state transition, i.e., the state transition probability $p (s^{'} | s, a)$ is unknown (An et al., 2019). When the system chooses to continue the operation, the state of the next step is calculated by state estimation and perceived by the agent.

R represents the reward function. Given the sparse characteristics of the power grid under attack, we define R by the efficacy of detection. When the agent raises an alarm during normal operation, the reward is negative. When the agent raises an alarm while the grid is under attack, the reward is positive, and the more timely the detection, the greater the reward. Rewards in the other cases are 0. The detailed function is:

r_{t} (s_{t}, a_{t}) = \{\begin{matrix} 0 & s_{t} = s_{n} & a_{t} = a_{c} \\ 0 & s_{t} = s_{a} & a_{t} = a_{c} \\ - β^{-} & s_{t} = s_{n} & a_{t} = a_{s} \\ β^{+} \frac{t - t_{start}}{t_{end} - t_{start}} & s_{t} = s_{a} & a_{t} = a_{s} \end{matrix} (19)

where, t_start, t_end denote the start and end of the attack, respectively, β⁻, β⁺ denote the reward coefficient.
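Eq. 19 can be transcribed directly into a function. The sketch follows the equation exactly as printed, including the factor (t − t_start)/(t_end − t_start) in the positive branch; the string labels for states and actions and the default coefficients are hypothetical.

```python
def reward(state, action, t, t_start, t_end, beta_neg=1.0, beta_pos=1.0):
    """Reward of Eq. 19 for state in {s_n, s_a} and action in {a_c, a_s}."""
    if action == "a_c":                  # continue operating: reward 0 in both states
        return 0.0
    if state == "s_n":                   # alarm during normal operation
        return -beta_neg
    # alarm while under attack, scaled as printed in Eq. 19
    return beta_pos * (t - t_start) / (t_end - t_start)

r_false_alarm = reward("s_n", "a_s", t=5, t_start=10, t_end=20)
r_detect = reward("s_a", "a_s", t=15, t_start=10, t_end=20, beta_pos=2.0)
```

Because only alarm actions produce a non-zero value, most transitions carry a reward of 0, which is the sparse reward situation mentioned above.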

5.2.2 Optimized DQN-based detection scheme

Framework of the detection scheme is shown in Figure 2. The detailed training and testing algorithms are given in Algorithm 2 and Algorithm 3.


FIGURE 2. Framework of optimized DQN-based detection scheme.

Algorithm 2 Training of the optimized DQN-based FDIA detection algorithm.

Algorithm 3 Testing of the FDIA detection algorithm.
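The bodies of Algorithms 2 and 3 are not reproduced in this text. As a rough illustration of the two optimizations named in this paper, storing experiences with discrepant rewards in different replay buffers and extending the state with the most recent states, the sketch below shows one possible realization. The class and function names, the split into zero-reward and non-zero-reward buffers, the sampling ratio and the padding rule are all assumptions.

```python
import random
from collections import deque

class SplitReplayBuffer:
    """Two replay buffers keyed by whether the reward is zero (assumed split)."""

    def __init__(self, capacity=10000, seed=None):
        self.zero = deque(maxlen=capacity)
        self.nonzero = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next):
        target = self.nonzero if r != 0 else self.zero
        target.append((s, a, r, s_next))

    def sample(self, batch_size, nonzero_fraction=0.5):
        # Draw a fixed share from the rare non-zero rewards, fill up with the rest.
        k = min(int(batch_size * nonzero_fraction), len(self.nonzero))
        batch = self.rng.sample(list(self.nonzero), k)
        rest = min(batch_size - k, len(self.zero))
        batch += self.rng.sample(list(self.zero), rest)
        return batch

def extended_state(history, n):
    """Concatenate the n most recent observations (history must be non-empty)."""
    recent = list(history)[-n:]
    while len(recent) < n:
        recent.insert(0, recent[0])      # pad early steps with the oldest observation
    return [v for obs in recent for v in obs]

buf = SplitReplayBuffer(seed=0)
for i in range(10):
    buf.push([i], 0, 0.0, [i + 1])       # frequent zero-reward experiences
buf.push([10], 1, 1.0, [11])             # rare rewarded experiences
buf.push([11], 1, -1.0, [12])
batch = buf.sample(4)
```

Sampling from the two buffers separately keeps the rare rewarded transitions present in every mini batch, which is one way to counter the sparse reward problem.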

6 Simulations

6.1 Simulations setup

Extensive simulations were performed to simulate actual scenarios of the FDIA attack-detection process. First, to ensure practicability, simulations are based on the IEEE 9-, 14-, 30- and 57-bus networks in MATPOWER (Zimmerman et al., 2011). Second, due to the diversity of attacks and attack intentions (An and Liu, 2019), three types of FDIAs are adopted, namely Attack-1, 2, and 3. Cases with a single attack and with multiple attacks are both considered during simulations. The attacks target the magnitude of node voltages, with an intensity sufficient to cause voltage violation (Zhu and Liu, 2016; Zheng et al., 2020). Third, WLS is adopted for static estimation, while EKF and UKF are adopted for dynamic state estimation in different cases. In addition, random seeds were used to reduce the random error. Details of the settings are shown in Table 4.

TABLE 4. Simulation settings.

6.2 Simulations and effects of attacks

Considering the change in voltage magnitude of the static state estimation result, the effects of three typical attacks are shown in Figure 3. The dashed lines indicate the start and end of attacks.

FIGURE 3. Effects of three typical attacks. (A) Effect of Attack-1, (B) Effect of Attack-2, and (C) Effect of Attack-3.

Figure 3A shows effect of Attack-1, namely continuous-constant-intensity attack. Attack-1 injects attack continuously and magnitude of the attack vector follows the same Gaussian distribution. Thus, deviation in voltage magnitude is obvious under Attack-1.

Figure 3B shows effect of Attack-2, namely transient-constant-intensity attack. Attack-2 injects the attack vector intermittently and magnitude of the vector follows the same Gaussian distribution. Deviation generated by Attack-2 is also considerable, but the attack duration is compressed. Thus, for the detector, higher response speed is required.

Figure 3C shows effect of Attack-3, namely continuous-variable-intensity (incremental) attack. Attack-3 injects the attack vector continuously and magnitude of the vector is cumulative. The deviation between two steps generated by Attack-3 is smaller than other attacks, which can avoid being detected. However, the deviation accumulates over a period of time and the amplitude at the end of attack is also considerable, so the consequence of Attack-3 can be severe.

Taking Attack-1 as an example, Figure 4 shows the differences between valid and invalid attacks. In Figure 4A, the residual modulus of the no-attack case is about 0 and almost overlaps with that of the valid-attack case. In Figure 4B, however, the residual modulus fluctuates greatly at a high level under an invalid attack, exposing the attack to the detector. In summary, valid and invalid attacks differ clearly in their ability to bypass the bad data detection.

FIGURE 4. Bad data detection results (A) Bad data detection results under valid attack and (B) Bad data detection results under invalid attack.

Changes in the node voltages of the IEEE 9-bus system when attacked by the FDIA are shown in Figure 5. Since the intended deviation of the angle is c_φi = 0°, the phase angle deviates only slightly during the attack in Figure 5A. Moreover, the intended deviation of the magnitude is c_Vi = 0.1 p.u., so the intensity of the attack is within (0, c_Vi). Correspondingly, the bus voltage magnitude undergoes a large deviation in Figure 5B, misleading the following grid operation.

FIGURE 5. Static state estimation results under valid attack. (A) Change of node voltage angle after the valid attack and (B) Change of node voltage magnitude after the valid attack.

6.3 Effectiveness of optimized DQN-based method

Simulations in this section compare the optimized DQN-based method with the original DQN-based method and the empirical threshold method. Moreover, cases with different dimensions of the state space are also compared. The effectiveness of adopting sampled replay buffers and extending the state space is demonstrated.

First, to verify the effectiveness of extending the state space, we changed the dimension of the state space in different cases. Results are shown in Figure 6. The original method is unstable during training and converges only after 3,000 episodes, and its attack detection rate fluctuates at a low level (43%).

FIGURE 6. Attack detection rate under different state spaces.

With the extended state space, the detection rate increases by 21%–64%, and the fluctuation decreases significantly. Comparing the above cases, the detection rate reaches nearly 100% after convergence when N ≥ 4, so N = 4 is used in the rest of this paper.

Second, to demonstrate the efficacy of the optimized method, we simulated detection with the optimized DQN-based method, the original DQN-based method, and the empirical threshold method.

The empirical threshold method uses a fixed threshold constructed from experience. In this paper, the procedure is as follows: using $\tau = \| x_{KF} - x_{WLS} \|_{2}$, calculate τ₁ in no-attack cases and τ₂ in attacked cases, and set τ_attack = τ₂ − τ₁. The resulting detection rate is a fixed number and appears as a horizontal line, since the empirical threshold is hardly ever updated online in practice.
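The threshold construction can be sketched as follows. Two points are assumptions made for illustration, since the text does not specify them: τ₁ and τ₂ are taken as mean deviations over the recorded samples, and an alarm is raised when the current deviation exceeds τ_attack. All function names are hypothetical.

```python
import math

def deviation(x_kf, x_wls):
    """tau: Euclidean norm of the gap between the KF and WLS state estimates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_kf, x_wls)))

def empirical_threshold(no_attack_pairs, attacked_pairs):
    """tau_attack = tau_2 - tau_1, with tau_1 and tau_2 as mean deviations (assumed)."""
    tau_1 = sum(deviation(k, w) for k, w in no_attack_pairs) / len(no_attack_pairs)
    tau_2 = sum(deviation(k, w) for k, w in attacked_pairs) / len(attacked_pairs)
    return tau_2 - tau_1

def is_attack(x_kf, x_wls, tau_attack):
    # Assumed decision rule: flag an attack when the deviation exceeds the threshold.
    return deviation(x_kf, x_wls) > tau_attack

# Illustrative (x_KF, x_WLS) pairs, not taken from the simulations.
no_attack = [([1.0, 1.0], [1.0, 1.0])]
attacked = [([1.0, 1.0], [1.3, 1.4])]
tau_attack = empirical_threshold(no_attack, attacked)
```

Because `tau_attack` is computed once from historical samples, the detector cannot adapt to attacks whose intensity differs from those samples, which is consistent with the flat detection-rate line described above.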

Results are shown in Figure 7. Each case is simulated in five parallel groups using the random seeds in Table 4. Detection rates are averaged, and the shaded areas denote the standard deviations across groups.
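The averaging across groups can be sketched as a per-episode mean and standard deviation. The helper `summarize` is hypothetical, the use of the population standard deviation is an assumption, and the numbers are placeholders rather than simulation results.

```python
import statistics

def summarize(runs):
    """Per-episode mean and standard deviation across independent runs."""
    means, stds = [], []
    for values in zip(*runs):              # one tuple per episode
        means.append(statistics.fmean(values))
        stds.append(statistics.pstdev(values))
    return means, stds

# Placeholder detection rates: three runs, two episodes each.
runs = [
    [0.5, 0.9],
    [0.7, 0.9],
    [0.6, 0.9],
]
means, stds = summarize(runs)
# The shaded band at each episode would span means[i] - stds[i] to means[i] + stds[i].
```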

FIGURE 7. Detection with optimized-DQN, original-DQN and empirical method against attacks. (A) Detection with EKF against Attack-1, (B) Detection with EKF against Attack-2, (C) Detection with EKF against Attack-3, (D) Detection with UKF against Attack-1, (E) Detection with UKF against Attack-2 and (F) Detection with UKF against Attack-3.

Comparing the performances in Figure 7, the empirical method does not perform well in detection. Its detection rate is 61% against Attack-1 and 80% against Attack-2, and only 30% against Attack-3. The original DQN-based method performs well at certain episodes in Figures 7C, F, but its detection rate fluctuates substantially throughout training in Figures 7A, D, E. Training with the original DQN is also difficult to converge.

As for the optimized DQN-based method, cases with EKF converge at around 8,000 episodes, and the converged detection rate is 98.42% against Attack-1, 99.70% against Attack-2, and 100% against Attack-3, with some fluctuation. Cases with UKF converge at around 5,000 episodes, and the converged detection rate is 96.95% against Attack-1, 98.99% against Attack-2, and 100% against Attack-3, with little fluctuation. After convergence, the fluctuation of the detection rate is restricted to within 4%. In addition, the training process of the EKF-based method is less stable than that of the UKF-based method.

In conclusion, the detection rate is improved by at least 15.95% with the optimized DQN-based method. The stability of training is also fundamentally improved over the original DQN-based method, especially for the UKF-based method.

6.4 Simulation in multiple cases

In this section, we compared three types of FDIAs on different networks to show that the proposed method is effective in multiple scenarios. Each case was repeated at least three times, and the results are averaged. Results are shown in Figure 8 and Table 5.

FIGURE 8. Attack detection rate while training against multiple attacks under multiple systems based on EKF or UKF. (A) Training against Attack-1 based on EKF, (B) Training against Attack-2 based on EKF, (C) Training against Attack-3 based on EKF, (D) Training against Attack-1 based on UKF, (E) Training against Attack-2 based on UKF, and (F) Training against Attack-3 based on UKF.

TABLE 5. Performance of detection in different systems against multiple attacks.

In Figure 8, the training process converges after about 8,000 episodes against Attack-1 and 6,000 episodes against Attack-2. Convergence is faster in the UKF-based cases, especially against Attack-3. In the final stages of training in the above cases, the detection rates fluctuate by 2%. In addition, convergence is slightly faster for a more complex network, since the state deviation accumulates faster.

First, in Table 5, the detection performance is similar across networks, since the detection mechanism depends only on the state estimation performance and is not affected by the network complexity. Second, the detection rates in the different cases are consistently close to 100% after convergence. Third, the UKF-based method performs better in detection than the EKF-based method. In summary, the method performs well under different attacks in multiple scenarios.

6.5 Simulation against hybrid attacks

To demonstrate the utility of the detection method, a hybrid attack model is constructed. In each episode, the type and the start time of the attack are random and unknown. One of Attack-1, Attack-2, and Attack-3 is randomly selected and launched on the IEEE 14-bus network.
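The per-episode randomization can be sketched as below. The uniform choice among the three attack types, the uniform start step, and the constraint that the attack begins after step 0 are assumptions made for illustration; `sample_episode` is a hypothetical helper.

```python
import random

ATTACK_TYPES = ("Attack-1", "Attack-2", "Attack-3")

def sample_episode(episode_len, rng):
    """Pick one attack type and a random start step for a single episode."""
    attack = rng.choice(ATTACK_TYPES)
    start = rng.randrange(1, episode_len)   # assumed: the attack begins after step 0
    return attack, start

# Usage: a seeded generator makes the schedule reproducible across runs.
rng = random.Random(0)
schedule = [sample_episode(200, rng) for _ in range(5)]
```

The detector never sees `attack` or `start`; in such a setup they would serve only as ground truth for computing the reward and the detection rate.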

The training result is shown in Figure 9. Detection rates are again averaged over five groups, and the standard deviations are shown as shaded areas.

FIGURE 9. Attack detection rate against the hybrid attack.

After training, the detection rate of the EKF-based method reaches 99.01%, and that of the UKF-based method reaches 99.71%. In Figure 9, since the attack is hybrid, the detection rate fluctuates in the early stage of training. Training converges more slowly than in the single-attack cases. The EKF-based method converges at about 5,500 episodes and the UKF-based method at about 8,500 episodes.

7 Conclusion

In this paper, an FDIA model with complete topology information and unlimited cost is first introduced. Attacks constructed under this model are verified to be able to bypass the empirical bad data detection. FDIAs are classified by duration and intensity, and three types of attacks and their effects are simulated. A detection mechanism is then proposed by combining static and dynamic state estimation. Second, the FDIA detection process is formulated as an MDP, and a DQN-based detection method is constructed. To address the problems arising in training and detection, optimizations are made to improve efficacy. The DQN-based method is adaptive and has a non-deterministic threshold. Third, extensive simulations were conducted across a variety of cases, laying the foundation for studying multiple types of FDIAs. Simulation results show that the detection rate against FDIAs is improved by at least 15.95% over the empirical threshold method. The fluctuation of the detection rate is restricted to within 4% during the final stage of training. Moreover, the highest detection rate against the proposed hybrid attack reaches 99.71%.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

XL developed the methodology, performed the experiment, analyzed the data, and wrote the manuscript; DA contributed to the conception of the study and manuscript preparation; FC helped perform the analysis with constructive discussions; FZ contributed significantly to analysis and manuscript preparation.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62173268, Grant 61803295, Grant 61973247, and Grant 61673315; in part by the Major Research Plan of the National Natural Science Foundation of China under Grant 61833015; in part by the National Postdoctoral Innovative Talents Support Program of China under Grant BX20200272; in part by the National Key Research and Development Program of China under Grant 2019YFB1704103; and in part by the China Postdoctoral Science Foundation under Grant 2018M643659.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Alnowibet, K., Annuk, A., Dampage, U., and Mohamed, M. A. (2021). Effective energy management via false data detection scheme for the interconnected smart energy hub–microgrid system under stochastic framework. Sustainability 13, 11836. doi:10.3390/su132111836

An, D., Yang, Q., Liu, W., and Zhang, Y. (2019). Defending against data integrity attacks in smart grid: A deep reinforcement learning-based approach. IEEE Access 7, 110835–110845. doi:10.1109/ACCESS.2019.2933020

An, D., Zhang, F., Yang, Q., and Zhang, C. (2022). Data integrity attack in dynamic state estimation of smart grid: Attack model and countermeasures. IEEE Trans. Autom. Sci. Eng. 19, 1631–1644. doi:10.1109/TASE.2022.3149764

An, Y., and Liu, D. (2019). Multivariate Gaussian-based false data detection against cyber-attacks. IEEE Access 7, 119804–119812. doi:10.1109/ACCESS.2019.2936816

Annaswamy, A. M., and Amin, M. (2013). IEEE vision for smart grid controls: 2030 and beyond. IEEE Vis. Smart Grid Controls 2030 Beyond. doi:10.1109/IEEESTD.2013.6577608

Arulkumaran, K., Deisenroth, M. P., Brundage, M., and Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 34, 26–38. doi:10.1109/MSP.2017.2743240

Ashok, A., Govindarasu, M., and Ajjarapu, V. (2018). Online detection of stealthy false data injection attacks in power system state estimation. IEEE Trans. Smart Grid 9, 1–1646. doi:10.1109/TSG.2016.2596298

Baxter, L. A. (1995). Markov decision processes: Discrete stochastic dynamic programming. Technometrics 37, 353. doi:10.1080/00401706.1995.10484354

Chen, L., and Wang, X. (2020). Quickest attack detection in smart grid based on sequential Monte Carlo filtering. IET Smart Grid 3, 686–696. doi:10.1049/iet-stg.2019.0320

Debs, A. S., and Larson, R. E. (1970). A dynamic estimator for tracking the state of a power system. IEEE Trans. Power Apparatus Syst. 89, 1670–1678. doi:10.1109/TPAS.1970.292822

Haque, N. I., Shahriar, M. H., Dastgir, M. G., Debnath, A., Parvez, I., Sarwat, A., et al. (2021). “A survey of machine learning-based cyber-physical attack generation, detection, and mitigation in smart-grid,” in 2020 52nd North American Power Symposium (NAPS). doi:10.1109/NAPS50074.2021.9449635

He, Y., Mendis, G. J., and Wei, J. (2017). Real-time detection of false data injection attacks in smart grid: A deep learning-based intelligent mechanism. IEEE Trans. Smart Grid 8, 2505–2516. doi:10.1109/TSG.2017.2703842

Jiang, Q., Chen, H., Xie, L., and Wang, K. (2020). Learning-based cooperative false data injection attack and its mitigation techniques in consensus-based distributed estimation. IEEE Access 8, 166852–166869. doi:10.1109/ACCESS.2020.3023117

Julier, S., and Uhlmann, J. (2004). Unscented filtering and nonlinear estimation. Proc. IEEE 92, 401–422. doi:10.1109/JPROC.2003.823141

Katiraei, F., and Iravani, M. (2006). Power management strategies for a microgrid with multiple distributed generation units. IEEE Trans. Power Syst. 21, 1821–1831. doi:10.1109/TPWRS.2006.879260

Kurt, M. N., Ogundijo, O., Li, C., and Wang, X. (2019). Online cyber-attack detection in smart grid: A reinforcement learning approach. IEEE Trans. Smart Grid 10, 5174–5185. doi:10.1109/TSG.2018.2878570

Lei, D., Zhao, J., Hu, M., Chang, X., Zhang, X., and Song, X. (2020). “Optimized configuration scheme of harmonic measuring device considering practical situations of grid nodes and monitoring device,” in 2020 IEEE 4th Conference on Energy Internet and Energy System Integration (EI2). doi:10.1109/EI250167.2020.9346993

Li, Q., Li, R., Ji, K., and Dai, W. (2015). “Kalman filter and its application,” in 2015 8th International Conference on Intelligent Networks and Intelligent Systems (ICINIS), 74–77. doi:10.1109/ICINIS.2015.35

Li, Q., Li, S., Xu, B., and Liu, Y. (2019). Optimal node attack on causality analysis in cyber-physical systems: A data-driven approach. IEEE Access 7, 16066–16077. doi:10.1109/ACCESS.2019.2891772

Li, Y., and Wang, Y. (2019). False data injection attacks with incomplete network topology information in smart grid. IEEE Access 7, 3656–3664. doi:10.1109/ACCESS.2018.2888582

Liang, G., Zhao, J., Luo, F., Weller, S. R., and Dong, Z. Y. (2017). A review of false data injection attacks against modern power systems. IEEE Trans. Smart Grid 8, 1630–1638. doi:10.1109/TSG.2015.2495133

Liu, X., Ospina, J., and Konstantinou, C. (2020). Deep reinforcement learning for cybersecurity assessment of wind integrated power systems. IEEE Access 8, 208378–208394. doi:10.1109/ACCESS.2020.3038769

Liu, Y., Reiter, M., and Ning, P. (2009). False data injection attacks against state estimation in electric power grids. ACM Trans. Inf. Syst. Secur. 14, 21–33. doi:10.1145/1952982.1952995

Luo, W., and Xiao, L. (2021). “Reinforcement learning based vulnerability analysis of data injection attack for smart grids,” in 2021 40th Chinese Control Conference (CCC), 6788–6792. doi:10.23919/CCC52363.2021.9550523

Luong, N. C., Hoang, D. T., Gong, S., Niyato, D., Wang, P., Liang, Y. C., et al. (2019). Applications of deep reinforcement learning in communications and networking a survey. IEEE Commun. Surv. Tutorials 21, 3133–3174. doi:10.1109/COMST.2019.2916583

Merrill, H. M., and Schweppe, F. C. (1971). Bad data suppression in power system static state estimation. IEEE Trans. Power Apparatus Syst. 90, 2718–2725. doi:10.1109/TPAS.1971.292925

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., et al. (2013). Playing Atari with deep reinforcement learning. CoRR abs/1312.5602.

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A., Veness, J., Bellemare, M., et al. (2015). Human-level control through deep reinforcement learning. Nature 518, 529–533. doi:10.1038/nature14236

Mohamed, M. A., Hajjiah, A., Alnowibet, K. A., Alrasheedi, A. F., Awwad, E. M., and Muyeen, S. M. (2021). A secured advanced management architecture in peer-to-peer energy trading for multi-microgrid in the stochastic environment. IEEE Access 9, 92083–92100. doi:10.1109/ACCESS.2021.3092834

Oozeer, M. I., and Haykin, S. (2019). Cognitive dynamic system for control and cyber-attack detection in smart grid. IEEE Access 7, 78320–78335. doi:10.1109/ACCESS.2019.2922410

Pang, Z.-H., Liu, G.-P., Zhou, D., Hou, F., and Sun, D. (2016). Two-channel false data injection attacks against output tracking control of networked systems. IEEE Trans. Ind. Electron. 63, 3242–3251. doi:10.1109/TIE.2016.2535119

Pasqualetti, F., Dörfler, F., and Bullo, F. (2013). Attack detection and identification in cyber-physical systems. IEEE Trans. Autom. Contr. 58, 2715–2729. doi:10.1109/TAC.2013.2266831

Schweppe, F. C., and Rom, D. B. (1970). Power system static-state estimation, part ii: Approximate model. IEEE Trans. Power Apparatus Syst. 89, 125–130. doi:10.1109/TPAS.1970.292679

Schweppe, F. C., and Wildes, J. (1970). Power system static-state estimation, part i: Exact model. IEEE Trans. Power Apparatus Syst. 89, 120–125. doi:10.1109/TPAS.1970.292678

Sinha, A., Thukkaraju, A. R., and Vyas, O. P. (2022). “A multi agent framework to detect in progress false data injection attacks for smart grid,” in Advanced network technologies and intelligent computing. Editors I. Woungang, S. K. Dhurandher, K. K. Pattanaik, A. Verma, and P. Verma (Cham: Springer International Publishing), 123–141.

Sutton, R., and Barto, A. (1998). Reinforcement learning: An introduction. IEEE Trans. Neural Netw. 9, 1054. doi:10.1109/TNN.1998.712192

Tsobdjou, L. D., Pierre, S., and Quintero, A. (2022). An online entropy-based ddos flooding attack detection system with dynamic threshold. IEEE Trans. Netw. Serv. Manage. 19, 1679–1689. doi:10.1109/TNSM.2022.3142254

Wan, E., and Van Der Merwe, R. (2000). “The unscented kalman filter for nonlinear estimation,” in Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium, 153–158. doi:10.1109/ASSPCC.2000.882463

Wang, Z., Chen, Y., Liu, F., Xia, Y., and Zhang, X. (2018). Power system security under false data injection attacks with exploitation and exploration based on reinforcement learning. IEEE Access 6, 48785–48796. doi:10.1109/ACCESS.2018.2856520

Wang, Z., He, H., Wan, Z., and Sun, Y. (2021). Coordinated topology attacks in smart grid using deep reinforcement learning. IEEE Trans. Ind. Inf. 17, 1407–1415. doi:10.1109/TII.2020.2994977

Wei, L., Sarwat, A. I., Saad, W., and Biswas, S. (2018). Stochastic games for power grid protection against coordinated cyber-physical attacks. IEEE Trans. Smart Grid 9, 684–694. doi:10.1109/TSG.2016.2561266

Wu, Z., He, L., Li, S., Zhang, H., Hu, S., Zhang, M., et al. (2021). “Reinforcement learning based multistage optimal pmu placement against data integrity attacks in smart grid,” in 2021 4th IEEE International Conference on Industrial Cyber-Physical Systems (ICPS). doi:10.1109/ICPS49255.2021.9468170

Zhang, K., and Wu, Z.-G. (2021). “A reinforcement learning-based detection method for false data injection attack in distributed smart grid,” in 2021 8th International Conference on Information, Cybernetics, and Computational Social Systems (ICCSS), 38–43. doi:10.1109/ICCSS53909.2021.9722027

Zheng, Y., Hill, D. J., Song, Y., Zhao, J., and Hui, S. Y. R. (2020). Optimal electric spring allocation for risk-limiting voltage regulation in distribution systems. IEEE Trans. Power Syst. 35, 273–283. doi:10.1109/TPWRS.2019.2933240

Zhu, H., and Liu, H. J. (2016). Fast local voltage control under limited reactive power: Optimality and stability analysis. IEEE Trans. Power Syst. 31, 3794–3803. doi:10.1109/TPWRS.2015.2504419

Zimmerman, R. D., Murillo-Sánchez, C. E., and Thomas, R. J. (2011). Matpower: Steady-state operations, planning, and analysis tools for power systems research and education. IEEE Trans. Power Syst. 26, 12–19. doi:10.1109/TPWRS.2010.2051168

Keywords: state estimation, deep reinforcement learning, attack detection, smart grid, false data injection attack

Citation: Lin X, An D, Cui F and Zhang F (2023) False data injection attack in smart grid: Attack model and reinforcement learning-based detection method. Front. Energy Res. 10:1104989. doi: 10.3389/fenrg.2022.1104989

Received: 22 November 2022; Accepted: 05 December 2022;
Published: 24 January 2023.

Edited by:

Yongming Han, Beijing University of Chemical Technology, China

Reviewed by:

Yu Ma, Chang’an University, China
Zhijiang Chen, Frostburg State University, United States
Yalong Wu, University of Houston–Clear Lake, United States

Copyright © 2023 Lin, An, Cui and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Dou An
