Load frequency optimal control of the hydropower-photovoltaic hybrid microgrid system based on the off-policy integral reinforcement learning algorithm

Wang, Enzhong; Yuan, Lin; Zeng, Fanfei; Liu, Xiaoheng; Liu, Jiannan; Sun, Lingfang; Zhuang, Min

doi:10.3389/fenrg.2024.1464722

ORIGINAL RESEARCH article

Front. Energy Res., 10 October 2024

Sec. Smart Grids

Volume 12 - 2024 | https://doi.org/10.3389/fenrg.2024.1464722

This article is part of the Research Topic: Modeling and Control of Power Electronics for Renewables

Enzhong Wang¹

Lin Yuan¹

Fanfei Zeng²

Xiaoheng Liu²

Jiannan Liu²

Lingfang Sun³*

Min Zhuang³

¹Guoneng Qinghai Yellow River MaerDang Hydropower Development Co., Ltd., Qinghai, China
²Beijing SP Zhishen Control Technology Co., Ltd., Beijing, China
³School of Automation Engineering, Northeast Electric Power University, Jilin, China

With the promotion and development of clean energy, output power fluctuations of photovoltaic generation make it challenging to guarantee optimal frequency control performance in a hydropower-photovoltaic hybrid microgrid system. In this study, an optimal load frequency controller (LFC) for a hydropower-photovoltaic hybrid microgrid system was designed based on the off-policy integral reinforcement learning algorithm to improve the dynamic response when the load and photovoltaic output power are perturbed. First, a mechanism model of the hydropower-photovoltaic hybrid microgrid system was established. Next, the LFC problem was transformed into a zero-sum game control problem based on the characteristics of the power system. Subsequently, three neural networks were employed to approximate the Nash equilibrium solution of the zero-sum game with historical input and output data when the system dynamics are completely unknown. Finally, simulation experiments were conducted to verify the effectiveness and optimality of the proposed method. The proposed method provides a new perspective on frequency control for the hydropower-photovoltaic hybrid microgrid system.

1 Introduction

With the development of the national economy and society, the contradiction between increasing energy demand and energy shortages has become increasingly obvious (Gilani et al., 2020; Patnaik et al., 2020; Zhang and Kong, 2022). Traditional thermal power generation causes problems such as the consumption of nonrenewable energy and excessive carbon emissions (Ahmad et al., 2018; Cowie et al., 2020; Olabi and Abdelkareem, 2022). Hydropower and solar energy have attracted the attention of researchers owing to their renewable and environment-friendly nature (Gielen et al., 2019; Zepter et al., 2019).

However, photovoltaic (PV) power generation is intermittent, leading to unstable output power and microgrid frequency oscillations (Thirunavukkarasu and Sawle, 2021; Chen et al., 2022; Wu and Yang, 2023). To ensure the frequency stability of a microgrid, it is necessary to supplement controllable power sources, such as hydroelectric units or energy storage devices, to fill the power deficit, which can effectively maintain the microgrid frequency stability (Coban et al., 2022). The power quality of PV power systems can be improved by utilizing a control algorithm for controllable power sources, which is applied to obtain an optimal load frequency controller (LFC) system (Papaefthymiou et al., 2010; Ma et al., 2014; Dhundhara and Verma, 2020). Some researchers focus on the suppression of local load fluctuations and their interactions with the distribution system (Khalid et al., 2022). Additionally, the role of ancillary services and the integration of renewable energy should also be addressed upon introduction to minimize fluctuations and cover intermittency (Khalid et al., 2022; Rehman et al., 2024; Osman et al., 2022).

Owing to their simple structure and ease of implementation, proportional-integral-derivative (PID) control methods are widely used in microgrid LFC (Mohamed et al., 2020; Nisha and Jamuna, 2022). Ray et al. (2011) utilized a PI controller to regulate the frequency of a microgrid and achieve the required frequency ratings. Guha et al. (2021) designed a fractional-order PID method to solve the frequency stabilization problem of microgrid systems with uncertain parameters. Huang et al. (2021) used fuzzy reasoning in PID to improve the control performance of a hydraulic turbine regulation system.

Many practical power systems can only be partially modeled, and models of unknown parts are unavailable (Ganguly et al., 2018; Li et al., 2023; Wu and Yang, 2023). The dynamic characteristics of droop-controlled inverters have been evaluated with a reduced-order small-signal transfer function model, designed on the basis of the Jordan continued-fraction expansion, which provides a preprocessing method for real-time power system simulation (Wang et al., 2020). Because they are insensitive to the dynamics of the unmodeled parts of the controlled object, adaptive control methods, which continuously identify system parameters, have been proposed to achieve the desired control effect. Adaptive control methods can be used to resolve problems arising from parameter variations in the LFC of a power system. Zeng et al. (2015) designed a port-controlled Hamiltonian system that decomposed nonlinear control into stabilizing control with a given equilibrium point and proposed L-2 adaptive control for application to a hydroelectric generator unit. Fang et al. (2011) effectively improved the dynamic performance of the hydraulic turbine regulation process using an improved particle swarm optimization algorithm, which was applied to the optimal tuning of the parameters of a hydraulic turbine regulation system. Tran et al. (2021) used a combination of second-order sliding mode control and a state estimator for frequency regulation to reduce the number of overtones. Although these methods can achieve better control performance, they have not been widely adopted in practical power systems owing to their complexity and difficulty.

The adaptive dynamic programming (ADP) algorithm is an emerging intelligent control algorithm that mitigates the curse of dimensionality of the traditional dynamic programming (DP) method (Werbos, 1992; Vamvoudakis and Lewis, 2010; Lewis et al., 2012; Bellman and Dreyfus, 2015) and is suitable for systems with a high degree of nonlinearity. Shuai et al. (2020) used a hybrid ADP algorithm to achieve optimal operation of gas and electric systems. Xue et al. (2022) used ADP for the real-time scheduling of battery heat storage tank integrated heat and power systems, providing optimal economic operation strategies. The off-policy integral reinforcement learning (IRL) algorithm builds on the theory of the ADP algorithm and can explore system information with historical input and output data, thereby overcoming the difficulty that traditional ADP relies on finding a persistent excitation signal when training the neural network weights. Chai et al. (2017) used game theory to solve multi-objective trajectory optimization problems for aerial vehicles. Song et al. (2019) proposed an off-policy IRL algorithm to solve an optimal control problem with partially known system dynamics. Based on the ADP algorithm, this paper proposes an integral reinforcement learning method that requires only the historical input-output data of the system, allowing for optimal solutions even when the system dynamics are completely unknown.

To the best of our knowledge, in the hydropower-photovoltaic hybrid microgrid system, the challenges of considering system disturbances and employing model-free methods for frequency control are quite evident. Traditional frequency control methods typically rely on a mathematical model of the system and assume that disturbances are known or predictable. However, in real microgrid systems, disturbances such as load variations and fluctuations in renewable energy output are often unpredictable, and obtaining an accurate model of the system can be difficult or complex. The existing reinforcement learning methods for frequency control in the hydropower-photovoltaic hybrid microgrid systems have not simultaneously addressed disturbances in the system and utilized the model-free approaches, which motivates our study. The focus of this paper is on how to abstract a hybrid power generation system with disturbances as a zero-sum game problem and solve it using the proposed model-free method. This approach provides a theoretical foundation and basis for the grid integration of a series of photovoltaic combined power generation systems. The main contributions of this article are as follows:

1. A hydropower-photovoltaic hybrid microgrid system model was constructed on the basis of the mechanistic modeling of the hydraulic turbine and photovoltaic power generation, with the fluctuation of photovoltaic power generation treated as a disturbance term.

2. Based on the power generation characteristics, the secondary frequency modulation control signal was used as the control vector, and the load change and solar power entering the system were used as the disturbance vector of the hydropower-photovoltaic microgrid power system, which transforms the LFC problem into a zero-sum optimal control problem. By solving for the Nash equilibrium of the zero-sum game, the optimal control law and the maximum disturbance that the system can withstand can be obtained, thereby controlling the load frequency of the hydropower-photovoltaic hybrid microgrid system.

3. An off-policy IRL algorithm was adopted to resolve the zero-sum optimal control problem in which three networks were employed to approximate the Nash equilibrium point of the zero-sum game to obtain the optimal LFC of the hybrid system. The proposed method overcomes the limitation of existing solution methods that require precise system model information.

2 Problem statement

The hydropower-photovoltaic microgrid power system effectively exploits the inherent frequency regulation advantages of hydropower units while integrating solar energy generation resources within the same regional grid. This hybrid system aims to enhance the overall frequency quality of the microgrid by balancing both renewable energy inputs and electrical load demand. However, such integration significantly increases the operational requirements for the Load Frequency Control (LFC) controller. In this study, Figure 1 outlines the core structure of the system: a power busbar connects hydropower units (HP), photovoltaic generation units (PV), and electrical loads. Specifically, the PV units are connected to the alternating current (AC) microgrid through direct current (DC) to alternating current (AC) conversion using DC/AC inverters.

Figure 1. Main structure of the independent microgrid.

The frequency stability of this isolated microgrid relies heavily on maintaining an active power balance within the network. Variations in electrical load and the intermittent, fluctuating output from photovoltaic sources can disturb this balance, leading to changes in system frequency. A central feature of the hydropower-photovoltaic microgrid system is the hydro-turbine generator, which is responsible for providing rotational reserves that help regulate frequency by adjusting the mechanical input to the turbine. This response compensates for any mismatch between generation and demand, ensuring system stability.

The hydropower units play a critical role in Load Frequency Control (LFC) tasks. The primary function of the LFC system is to regulate water flow into the turbines of the hydropower generators. It achieves this by dynamically adjusting the active power output of the hydropower units in real-time, depending on the load and intermittent power output from the solar resources. This real-time control is vital for compensating fluctuations in both solar power production and load changes, stabilizing the generator speed, and ultimately controlling the frequency of the microgrid.

To improve the effectiveness of the system and minimize control costs, an advanced optimal load frequency controller was designed, utilizing an off-policy integral reinforcement learning (IRL) algorithm. This controller ensures the stability of the grid-connected voltage in the hydropower-photovoltaic microgrid by optimizing the dynamic allocation of power resources. In essence, it manages the trade-offs between ensuring grid frequency stability and maintaining operational cost efficiency, leading to a robust, reliable, and sustainable microgrid power system.

3 Materials and methods

3.1 Establishment of the hydro turbine group model

The hydro turbine group consisted of hydro turbines, governors, and generators. A turbine group model was established for each part.

The equations of moment and flow of the hydro turbine are expressed as Equation 1:

\left\{\begin{array}{l} Δ m_{t}(t) = n_{x} Δ x(t) + n_{y} Δ y(t) + n_{h} Δ h(t) \\ Δ q(t) = n_{qx} Δ x(t) + n_{qy} Δ y(t) + n_{qh} Δ h(t), \end{array}\right. (1)

where $Δ m_{t} (t)$ and $Δ q (t)$ represent the increments in torque and flow, respectively; $Δ y (t)$ , $Δ h (t)$ , and $Δ x (t)$ indicate the relative deviation variables of the guide vane opening, water head, and hydro turbine speed, respectively; $n_{x}, n_{y},$ and $n_{h}$ represent the transfer coefficients of the hydro turbine torque to the speed, guide vane opening and water head, respectively; $n_{q x}, n_{q y}$ and $n_{q h}$ are transfer coefficients of the hydro turbine flow to the speed, guide vane opening and the water head, respectively. Under stable conditions, each transfer coefficient is regarded as a constant.

Defining $n = \frac{n_{qy} n_{h}}{n_{y}} - n_{qh}$, the transfer function from the guide vane increment $Δ y(t)$ to the torque increment $Δ m_{t}(t)$ of the hydro turbine is as follows:

G_{ym}(s) = \frac{Δ m_{t}(s)}{Δ y(s)} = n_{y} \frac{1 - n T_{w} s}{1 + n_{qh} T_{w} s}, (2)

where $T_{w}$ indicates the inertia time constant of the water flow.

The hydro turbine governor can be simplified as a first-order inertial link by ignoring the nonlinear factors, as Equation 3:

G_{y} (s) = \frac{Δ y (s)}{u (s)} = \frac{1}{T_{y} s + 1}, (3)

where $T_{y}$ represents the response time constant of the hydro-turbine governor. The corresponding differential equation is as follows:

\frac{d Δ y (t)}{d t} = - \frac{1}{T_{y}} Δ y (t) + \frac{1}{T_{y}} u (t) (4)

Cross-multiplying Equation 2 and taking the inverse Laplace transform gives a differential equation in $Δ m_{t}(t)$ and $Δ y(t)$; substituting Equation 4 for $d Δ y(t) / d t$ then yields the hydro turbine differential equation as follows:

\frac{d Δ m_{t} (t)}{d t} = \frac{1}{n_{q h} T_{w}} [- Δ m_{t} (t) + (n_{y} + n_{y} n \frac{T_{w}}{T_{y}}) Δ y (t) - n_{y} n \frac{T_{w}}{T_{y}} u (t)] (5)
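For readers who want the intermediate algebra, the steps leading from Equation 2 to Equation 5 can be written out as below. Nothing beyond Equations 2 and 4 is assumed; dividing the last line by $n_{qh} T_{w}$ gives Equation 5.

```latex
\begin{aligned}
(1 + n_{qh} T_{w} s)\, Δ m_{t}(s) &= n_{y} (1 - n T_{w} s)\, Δ y(s) \\
n_{qh} T_{w} \frac{d Δ m_{t}(t)}{dt} + Δ m_{t}(t) &= n_{y} Δ y(t) - n_{y} n T_{w} \frac{d Δ y(t)}{dt} \\
\frac{d Δ y(t)}{dt} &= -\frac{1}{T_{y}} Δ y(t) + \frac{1}{T_{y}} u(t) \\
n_{qh} T_{w} \frac{d Δ m_{t}(t)}{dt} &= -Δ m_{t}(t) + \Bigl(n_{y} + n_{y} n \frac{T_{w}}{T_{y}}\Bigr) Δ y(t) - n_{y} n \frac{T_{w}}{T_{y}} u(t)
\end{aligned}
```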

The second-order model of the generator includes the rotor rotation motion equation and the equation that characterizes the relationship between the power angle and speed, as follows:

\left\{\begin{array}{l} \frac{d Δ δ(t)}{dt} = w_{0} Δ x(t) \\ \frac{d Δ x(t)}{dt} = \frac{1}{T_{J}} [Δ m_{t}(t) - Δ P_{e}(t) - D Δ x(t)], \end{array}\right. (6)

where $δ$ indicates the generator power angle, which remains constant under stable operating conditions; $Δ P_{e}$ indicates the electromagnetic power increment, which is equivalent to the relative value of the load increment $Δ P_{L}$; $w_{0} = 100 π$ indicates the synchronous electrical angular velocity; $D$ is the damping coefficient; and $T_{J}$ indicates the inertia time constant of the generator. Let $Δ P_{S}$ be the sum of the incremental relative values of power supplied by all sources within the microgrid. From Equation 6, we get $Δ P_{S} - Δ P_{L} = T_{M} \frac{d Δ f(t)}{dt} + D_{M} Δ f(t)$, where $T_{M}$ represents the equivalent inertia time coefficient of the system, which is equal to the weighted sum of the inertia coefficients of all the generators in the system, and $D_{M}$ represents the equivalent load-damping factor of the system.

By combining Equations 4–6, the following mathematical model of the hydro-turbine group can be obtained:

\left\{\begin{array}{l} \frac{d Δ δ(t)}{dt} = w_{0} Δ x(t) \\ \frac{d Δ x(t)}{dt} = \frac{1}{T_{J}} [Δ m_{t}(t) - Δ P_{e}(t) - D Δ x(t)] \\ \frac{d Δ m_{t}(t)}{dt} = \frac{1}{n_{qh} T_{w}} [- Δ m_{t}(t) + (n_{y} + n_{y} n \frac{T_{w}}{T_{y}}) Δ y(t) - n_{y} n \frac{T_{w}}{T_{y}} u(t)] \\ \frac{d Δ y(t)}{dt} = - \frac{1}{T_{y}} Δ y(t) + \frac{1}{T_{y}} u(t) \end{array}\right. (7)

3.2 Establishment of the photovoltaic model

PV panels convert solar energy into electrical energy through the photovoltaic effect. The hydropower unit is the main frequency-regulating unit in this study. Therefore, in this subsection, a first-order model with time constant $T_{solar}$ was used to express the frequency characteristics of the PV power as follows:

\frac{d Δ P_{PV}}{dt} = - \frac{1}{T_{solar}} Δ P_{PV} + \frac{1}{T_{solar}} Δ P_{solar}, (8)

where $T_{solar}$ represents the time constant of the PV power system, $Δ P_{PV}$ represents the output power of the PV power system, and $Δ P_{solar}$ represents the solar power. Solar power is highly volatile because PV panels are easily affected by light intensity and ambient temperature. Therefore, solar power is regarded as a disturbance term. A block diagram of the model is shown in Figure 2.
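As a quick numerical illustration of Equation 8, the sketch below integrates the first-order PV model for a step change in solar power and compares the result with the closed-form step response $Δ P_{PV}(t) = Δ P_{solar} (1 - e^{-t / T_{solar}})$. The time constant, step size, and the helper name `simulate_pv_step` are illustrative assumptions, not values or code from this study.

```python
import math

def simulate_pv_step(t_solar, dp_solar, t_end, dt=1e-3):
    """Forward-Euler integration of Equation 8 for a constant solar step."""
    dp_pv = 0.0  # PV output deviation starts at equilibrium
    steps = int(round(t_end / dt))
    for _ in range(steps):
        dp_pv += dt * (-dp_pv + dp_solar) / t_solar
    return dp_pv

# Assumed values, for illustration only
T_SOLAR = 1.8    # s, hypothetical PV time constant
DP_SOLAR = 0.1   # p.u., hypothetical step in solar power
T_END = 3.0      # s, simulated horizon

numeric = simulate_pv_step(T_SOLAR, DP_SOLAR, T_END)
analytic = DP_SOLAR * (1.0 - math.exp(-T_END / T_SOLAR))
print(f"Euler: {numeric:.6f}  closed form: {analytic:.6f}")
```

The PV output follows the solar step with a lag set by $T_{solar}$, which is why the study can treat $Δ P_{solar}$ as an external disturbance filtered by the PV unit.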

Figure 2. Block diagram of the model.

3.3 Establishment of the hydropower-photovoltaic microgrid power system model

The transient changes in the voltage and power angle of the system can be ignored in the frequency control analysis. Therefore, in the analysis process of LFC, $Δ m_{t} (t) = Δ P_{m} (t)$ is the incremental relative value of the output power of the hydro turbine group, the relative value of the electromagnetic power increment $Δ P_{e} (t)$ is replaced by the relative value of the load increment $Δ P_{L} (t)$ , and the relative value of the speed deviation $Δ x (t)$ is equal to the relative value of the frequency deviation $Δ f (t)$ .

By combining Equations 7 and 8, the hydropower-photovoltaic microgrid power system can be described by Equation 9:

\left\{\begin{array}{l} \frac{d Δ δ(t)}{dt} = w_{0} Δ f(t) \\ \frac{d Δ f(t)}{dt} = \frac{1}{T_{J}} [Δ P_{m}(t) - Δ P_{L}(t) - D Δ f(t)] \\ \frac{d Δ P_{m}(t)}{dt} = \frac{1}{n_{qh} T_{w}} [- Δ P_{m}(t) + (n_{y} + n_{y} n \frac{T_{w}}{T_{y}}) Δ y(t) - n_{y} n \frac{T_{w}}{T_{y}} u(t)] \\ \frac{d Δ y(t)}{dt} = - \frac{1}{T_{y}} Δ y(t) + \frac{1}{T_{y}} u(t) \\ \frac{d Δ P_{PV}(t)}{dt} = - \frac{1}{T_{solar}} Δ P_{PV}(t) + \frac{1}{T_{solar}} Δ P_{solar}(t). \end{array}\right. (9)

Here, $x(t) = [x_{1}(t), x_{2}(t), x_{3}(t), x_{4}(t), x_{5}(t)]^{T} = [Δ δ(t), Δ f(t), Δ P_{m}(t), Δ y(t), Δ P_{PV}(t)]^{T}$ is the state vector. The load change $Δ P_{L}(t)$ and the solar power $Δ P_{solar}(t)$ entering the system are taken as the elements of the disturbance vector, because PV output power fluctuations and load power changes upset the balance between power supply and demand in the microgrid. The system disturbance vector is therefore $w(t) = [Δ P_{L}(t), Δ P_{solar}(t)]^{T}$, and $u = u(t)$ is the control signal. The load frequency model of the hydropower-photovoltaic system can then be obtained as follows:

\dot{x} = A x + B u + F w, (10)

where $x \in R^{5 \times 1}$ is the system state variable, $u \in R^{1 \times 1}$ is the control variable, and $w \in R^{2 \times 1}$ is the disturbance variable. Reading the coefficients off Equation 9, the system matrices are expressed as follows:

A = \left[\begin{array}{ccccc} 0 & w_{0} & 0 & 0 & 0 \\ 0 & - \frac{D}{T_{J}} & \frac{1}{T_{J}} & 0 & 0 \\ 0 & 0 & - \frac{1}{n_{qh} T_{w}} & \frac{n_{y}}{n_{qh} T_{w}} (1 + n \frac{T_{w}}{T_{y}}) & 0 \\ 0 & 0 & 0 & - \frac{1}{T_{y}} & 0 \\ 0 & 0 & 0 & 0 & - \frac{1}{T_{solar}} \end{array}\right], B = \left[\begin{array}{c} 0 \\ 0 \\ - \frac{n_{y} n}{n_{qh} T_{y}} \\ \frac{1}{T_{y}} \\ 0 \end{array}\right], F = \left[\begin{array}{cc} 0 & 0 \\ - \frac{1}{T_{J}} & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & \frac{1}{T_{solar}} \end{array}\right]
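To make the state-space form of Equation 10 concrete, the following sketch assembles $A$, $B$, and $F$ from a parameter dictionary and evaluates $\dot{x} = A x + B u + F w$ once. All numerical parameter values and the function names `build_lfc_matrices` and `state_derivative` are hypothetical and chosen only for illustration. The load entry of $F$ is taken from Equation 9, where $Δ P_{L}$ enters the frequency equation through $1 / T_{J}$.

```python
import math

def build_lfc_matrices(p):
    """Assemble A, B, F of Equation 10 as nested lists from a parameter dict."""
    n = p["n_qy"] * p["n_h"] / p["n_y"] - p["n_qh"]  # auxiliary coefficient n
    k = 1.0 / (p["n_qh"] * p["T_w"])                 # common turbine factor
    A = [
        [0.0, p["w0"], 0.0, 0.0, 0.0],
        [0.0, -p["D"] / p["T_J"], 1.0 / p["T_J"], 0.0, 0.0],
        [0.0, 0.0, -k, k * p["n_y"] * (1.0 + n * p["T_w"] / p["T_y"]), 0.0],
        [0.0, 0.0, 0.0, -1.0 / p["T_y"], 0.0],
        [0.0, 0.0, 0.0, 0.0, -1.0 / p["T_solar"]],
    ]
    B = [0.0, 0.0, -p["n_y"] * n / (p["n_qh"] * p["T_y"]), 1.0 / p["T_y"], 0.0]
    F = [
        [0.0, 0.0],
        [-1.0 / p["T_J"], 0.0],   # load change acts on the frequency equation
        [0.0, 0.0],
        [0.0, 0.0],
        [0.0, 1.0 / p["T_solar"]],
    ]
    return A, B, F

def state_derivative(A, B, F, x, u, w):
    """Evaluate x_dot = A x + B u + F w for scalar u and two-element w."""
    return [
        sum(A[i][j] * x[j] for j in range(5))
        + B[i] * u
        + sum(F[i][j] * w[j] for j in range(2))
        for i in range(5)
    ]

# Hypothetical parameters (not taken from the study)
params = {
    "w0": 100.0 * math.pi, "D": 0.5, "T_J": 8.0, "T_w": 1.0, "T_y": 0.2,
    "T_solar": 1.8, "n_y": 1.0, "n_h": 1.5, "n_qy": 1.0, "n_qh": 0.5,
}
A, B, F = build_lfc_matrices(params)

# State order: [delta, f, P_m, y, P_PV] deviations; w = [load step, solar step]
dx = state_derivative(A, B, F, x=[0.0] * 5, u=0.0, w=[0.1, 0.0])
print(dx)
```

With the system at rest, a positive load step produces a negative initial rate of change of the frequency deviation, as expected from Equation 9.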

Thus far, the load frequency control problem of the hydropower-photovoltaic microgrid power system has been transformed into a zero-sum game optimal control problem, wherein the input of the governor was taken as the control variable and the load change and solar power were taken as the disturbance variables.

4 Results of the optimal controller based on off-policy IRL algorithm

The hydropower-photovoltaic microgrid power system model was established using Equation 10, where $x, u$ and $w$ are the state, control input, and disturbance input of the system, respectively. $x = 0$ is the equilibrium point of the hydropower-photovoltaic microgrid power system. The infinite-horizon performance index function can be designed as follows:

J(x(0), u, w) = \int_{0}^{\infty} r(x(t), u, w) \, dt, (11)

where the utility function is given by Equation 12:

r (x (t), u, w) = x^{T} R x + u^{T} S u - w^{T} T w, (12)

where the coefficient matrices $R$, $S$, and $T$ are real symmetric positive definite matrices. This performance index has a clear economic interpretation: $x^{T} R x$ is the penalty cost incurred when the quality of power supply at node i deviates from the system's stable value, and $u^{T} S u - w^{T} T w$ collects the control and disturbance costs incurred by node i to reduce that penalty.

The purpose of the zero-sum game is to solve for an optimal control that satisfies Equation 13.

V^{*}(x(0)) = \inf_{u} \sup_{w} J(x(0), u, w) (13)

In the zero-sum game, player $u$ seeks to minimize the performance index while player $w$ seeks to maximize it. The saddle point $(u^{*}, w^{*})$ must satisfy the inequality in Equation 14:

J(x, u^{*}, w) \leq J(x, u^{*}, w^{*}) \leq J(x, u, w^{*}) (14)

When a unique solution satisfies the Nash equilibrium condition in Equation 15,

V^{*}(x) = \inf_{u} \sup_{w} J(x, u, w) = \sup_{w} \inf_{u} J(x, u, w), (15)

the cost function of every player can be written as Equation 16:

V(x(t)) = \int_{t}^{\infty} \{x^{T} R x + u^{T} S u - w^{T} T w\} \, dτ (16)

Differentiating Equation 16 with the Leibniz rule gives the Bellman equation of the zero-sum game, $H(x, \nabla V, u, w) = 0$, with the Hamiltonian

H(x, \nabla V, u, w) = x^{T} R x + u^{T} S u - w^{T} T w + \nabla V^{T} (A x + B u + F w), (17)

where $\nabla V = \frac{\partial V}{\partial x}$ . The control and disturbance inputs can be obtained as Equations 18, 19:

\frac{\partial H}{\partial u} = 2 u^{T} S + \nabla V^{T} B = 0 (18)

\frac{\partial H}{\partial w} = - 2 w^{T} T + \nabla V^{T} F = 0 (19)

The optimal control policy $u^{*}$ and the optimal disturbance $w^{*}$ can be derived as follows:

u^{*} = - \frac{1}{2} S^{- 1} B^{T} \nabla V (20)

w^{*} = \frac{1}{2} T^{- 1} F^{T} \nabla V (21)

The Hamilton-Jacobi-Bellman equation can be obtained by substituting Equations 20, 21 into Equation 17 as follows:

0 = x^{T} R x + \nabla V^{T} A x - \frac{1}{4} \nabla V^{T} B S^{- 1} B^{T} \nabla V + \frac{1}{4} \nabla V^{T} F T^{- 1} F^{T} \nabla V (22)
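Because the model in Equation 10 is linear and the utility in Equation 12 is quadratic, a quadratic value function is a natural candidate. Under that assumption (the symmetric matrix $P$ is introduced here purely for illustration and is not part of the original derivation), Equations 20–22 reduce to a game algebraic Riccati equation:

```latex
\begin{aligned}
V^{*}(x) &= x^{T} P x, \qquad \nabla V^{*} = 2 P x, \\
u^{*} &= - S^{-1} B^{T} P x, \qquad w^{*} = T^{-1} F^{T} P x, \\
0 &= A^{T} P + P A + R - P B S^{-1} B^{T} P + P F T^{-1} F^{T} P .
\end{aligned}
```

Solving this equation directly requires $A$, $B$, and $F$, which is exactly the model knowledge that the off-policy IRL algorithm is designed to avoid.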

The following equations were used to update the control and disturbance policies as Equations 23, 24:

u^{[k + 1]} = - \frac{1}{2} S^{- 1} B^{T} \nabla V^{[k]} (23)

w^{[k + 1]} = \frac{1}{2} T^{- 1} F^{T} \nabla V^{[k]}, (24)

where the superscript $[k]$ represents the number of steps in the iteration process.
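The iteration in Equations 23 and 24 can be illustrated on a scalar system, where $V^{[k]} = p_{k} x^{2}$ and each policy is a single gain. The sketch below is model-based and uses hypothetical numbers; it only shows how alternating policy evaluation and policy improvement converges to the root of the scalar form of Equation 22, and it is not the data-driven procedure of this paper.

```python
import math

def policy_iteration_scalar(a, b, f, R, S, T, iters=20):
    """Iterate Equations 23-24 for x_dot = a x + b u + f w with V = p x^2.

    Policies are u = -ku * x and w = kw * x. Each pass evaluates the current
    policy pair (scalar Lyapunov equation) and then improves both policies.
    """
    ku, kw = 0.0, 0.0  # admissible start, assuming a < 0
    history = []
    for _ in range(iters):
        a_cl = a - b * ku + f * kw  # closed-loop pole under current policies
        if a_cl >= 0.0:
            raise ValueError("policy pair is not stabilizing")
        p = (R + S * ku ** 2 - T * kw ** 2) / (-2.0 * a_cl)  # policy evaluation
        ku, kw = b * p / S, f * p / T  # Eq. 23-24 with grad V = 2 p x
        history.append(p)
    return p, ku, kw, history

# Hypothetical scalar plant and weights
a, b, f, R, S, T = -1.0, 1.0, 0.5, 1.0, 1.0, 4.0
p, ku, kw, history = policy_iteration_scalar(a, b, f, R, S, T)

# Positive root of the scalar HJB: R + 2 a p - (b^2/S - f^2/T) p^2 = 0
g = b ** 2 / S - f ** 2 / T
p_star = (a + math.sqrt(a ** 2 + g * R)) / g
print(history[:4], p_star)
```

For these numbers the sequence settles within a few passes, which reflects the Newton-like character of the simultaneous update.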

Equation 10 can be rewritten as Equation 25:

\dot{x} = A x + B u^{[k]} + F w^{[k]} + B (u - u^{[k]}) + F (w - w^{[k]}) (25)

Along the trajectories of Equation 25, and using the Bellman equation (Equation 17) evaluated at the policies $u^{[k]}$ and $w^{[k]}$, the increment of the value function can be written as follows:

\begin{array}{l} V^{[k]}(x(t + Δt)) - V^{[k]}(x(t)) = \int_{t}^{t + Δt} \nabla V^{[k]T} \dot{x} \, dτ \\ = \int_{t}^{t + Δt} \nabla V^{[k]T} (A x + B u^{[k]} + F w^{[k]}) \, dτ + \int_{t}^{t + Δt} \nabla V^{[k]T} (B (u - u^{[k]}) + F (w - w^{[k]})) \, dτ \\ = - \int_{t}^{t + Δt} (x^{T} R x + u^{[k]T} S u^{[k]} - w^{[k]T} T w^{[k]}) \, dτ + \int_{t}^{t + Δt} \nabla V^{[k]T} (B (u - u^{[k]}) + F (w - w^{[k]})) \, dτ \end{array} (26)

From Equations 23 and 24, we get $\nabla V^{[k]T} B = - 2 u^{[k + 1]T} S$ and $\nabla V^{[k]T} F = 2 w^{[k + 1]T} T$; substituting these expressions into Equation 26 yields the following equation:

\begin{array}{l} V^{[k]}(x(t + Δt)) - V^{[k]}(x(t)) = - \int_{t}^{t + Δt} (x^{T} R x + u^{[k]T} S u^{[k]} - w^{[k]T} T w^{[k]}) \, dτ \\ + \int_{t}^{t + Δt} - 2 (u^{[k + 1]T} S (u - u^{[k]}) - w^{[k + 1]T} T (w - w^{[k]})) \, dτ \end{array} (27)

In passing from Equation 26 to Equation 27, the system matrices A, B, and F are eliminated. Equation 27 therefore overcomes the difficulty of obtaining the dynamic information of the system in practical applications. $(V^{[k]}, u^{[k + 1]}, w^{[k + 1]})$ is the unique solution of the off-policy IRL algorithm at iteration $k$. Three neural networks were employed to approximate $(V^{[k]}, u^{[k + 1]}, w^{[k + 1]})$; the expressions are as follows:

\hat{V}^{[k]}(x) = P_{V}^{T}(x) \hat{θ}_{V}^{[k]} (28)

u^{[k]}(x) = (P_{u}^{a}(x))^{T} \hat{θ}_{u_{a}}^{[k]} (29)

w^{[k]}(x) = (P_{w}^{b}(x))^{T} \hat{θ}_{w_{b}}^{[k]}, (30)

where the basis functions $P_{V}(x)$, $P_{u}(x)$, and $P_{w}(x)$ satisfy $P_{V}(0) = 0$, $P_{u}(0) = 0$, and $P_{w}(0) = 0$, respectively, and are linearly independent. $\hat{V}^{[k]}$ is approximated by a critic neural network (CNN), and $u^{[k]}(x)$ and $w^{[k]}(x)$ are approximated by an action neural network (ANN) and a disturbance neural network (DNN), respectively. Here, $\hat{θ}_{V}^{[k]}$, $\hat{θ}_{u_{a}}^{[k]}$, and $\hat{θ}_{w_{b}}^{[k]}$ indicate the weights of the CNN, ANN, and DNN, respectively.

Replacing $V^{[k]}$ in Equation 27 with its approximation from Equation 28, the residual can be written as follows:

\begin{array}{l} ϵ^{[k]}(x, u, w) = \hat{V}^{[k]}(x(t)) - \hat{V}^{[k]}(x(t + Δt)) \\ - \int_{t}^{t + Δt} (x^{T} R x + u^{[k]T} S u^{[k]} - w^{[k]T} T w^{[k]}) \, dτ \\ + \int_{t}^{t + Δt} - 2 (u^{[k + 1]T} S (u - u^{[k]}) - w^{[k + 1]T} T (w - w^{[k]})) \, dτ \end{array} (31)

Substituting Equations 28–30 into Equation 31 yields

\begin{array}{l} ϑ^{[k]}(x, u, w) = [P_{V}(x(t)) - P_{V}(x(t + Δt))]^{T} \hat{θ}_{V}^{[k + 1]} - \int_{t}^{t + Δt} x^{T} R x \, dτ \\ - \sum_{a = 1}^{n} \int_{t}^{t + Δt} \hat{θ}_{u_{a}}^{[k]T} P_{u}^{a}(x(t)) S (P_{u}^{a}(x(t)))^{T} \hat{θ}_{u_{a}}^{[k]} \, dτ \\ + \sum_{b = 1}^{m} \int_{t}^{t + Δt} \hat{θ}_{w_{b}}^{[k]T} P_{w}^{b}(x(t)) T (P_{w}^{b}(x(t)))^{T} \hat{θ}_{w_{b}}^{[k]} \, dτ \\ + 2 \sum_{a = 1}^{n} \int_{t}^{t + Δt} \hat{θ}_{u_{a}}^{[k]T} P_{u}^{a}(x(t)) S (P_{u}^{a}(x(t)))^{T} \hat{θ}_{u_{a}}^{[k + 1]} \, dτ \\ - 2 \sum_{a = 1}^{n} \int_{t}^{t + Δt} u^{T} S (P_{u}^{a}(x(t)))^{T} \hat{θ}_{u_{a}}^{[k + 1]} \, dτ \\ - 2 \sum_{b = 1}^{m} \int_{t}^{t + Δt} \hat{θ}_{w_{b}}^{[k]T} P_{w}^{b}(x(t)) T (P_{w}^{b}(x(t)))^{T} \hat{θ}_{w_{b}}^{[k + 1]} \, dτ \\ + 2 \sum_{b = 1}^{m} \int_{t}^{t + Δt} w^{T} T (P_{w}^{b}(x(t)))^{T} \hat{θ}_{w_{b}}^{[k + 1]} \, dτ . \end{array} (32)

In order to simplify Equation 32, the following parameters are defined as Equations 33–39:

Q_{FA}(x(t)) = P_{V}^{T}(x(t)) - P_{V}^{T}(x(t + Δt)) (33)

Q_{FB}(x(t)) = 2 \sum_{a = 1}^{n} \int_{t}^{t + Δt} \hat{θ}_{u_{a}}^{[k]T} P_{u}^{a}(x(t)) S (P_{u}^{a}(x(t)))^{T} \, dτ (34)

Q_{FC}(x(t), u) = 2 \sum_{a = 1}^{n} \int_{t}^{t + Δt} u^{T} S (P_{u}^{a}(x(t)))^{T} \, dτ (35)

Q_{FD}(x(t)) = - 2 \sum_{b = 1}^{m} \int_{t}^{t + Δt} \hat{θ}_{w_{b}}^{[k]T} P_{w}^{b}(x(t)) T (P_{w}^{b}(x(t)))^{T} \, dτ (36)

Q_{FE}(x(t), w) = - 2 \sum_{b = 1}^{m} \int_{t}^{t + Δt} w^{T} T (P_{w}^{b}(x(t)))^{T} \, dτ (37)

Q_{M}(x(t)) = \int_{t}^{t + Δt} x^{T} R x \, dτ (38)

\begin{array}{l} Q_{N}(x(t)) = \sum_{a = 1}^{n} \int_{t}^{t + Δt} \hat{θ}_{u_{a}}^{[k]T} P_{u}^{a}(x(t)) S (P_{u}^{a}(x(t)))^{T} \hat{θ}_{u_{a}}^{[k]} \, dτ \\ - \sum_{b = 1}^{m} \int_{t}^{t + Δt} \hat{θ}_{w_{b}}^{[k]T} P_{w}^{b}(x(t)) T (P_{w}^{b}(x(t)))^{T} \hat{θ}_{w_{b}}^{[k]} \, dτ \end{array} (39)

Then, Equation 32 can be written as Equation 40:

\begin{array}{l} ϑ^{[k]}(x, u, w) = Q_{FA}(x(t)) \hat{θ}_{V}^{[k + 1]} + Q_{FB}(x(t)) \hat{θ}_{u}^{[k + 1]} \\ - Q_{FC}(x(t), u) \hat{θ}_{u}^{[k + 1]} + Q_{FD}(x(t)) \hat{θ}_{w}^{[k + 1]} - Q_{FE}(x(t), w) \hat{θ}_{w}^{[k + 1]} \\ - Q_{M}(x(t)) - Q_{N}(x(t)) \end{array} (40)

The Equations 41–43 are then generated to obtain the optimal solutions:

{\hat{W}}^{[k]} = [{\hat{θ}}_{V}^{[k + 1]}, {\hat{θ}}_{u 1}^{[k + 1]} \dots {\hat{θ}}_{u a}^{[k + 1]}, {\hat{θ}}_{w 1}^{[k + 1]} \dots {\hat{θ}}_{w b}^{[k + 1]}] (41)

\begin{array}{l} X_{A}^{[k]}(x, u, w) = [Q_{FA}(x(t)), Q_{FB}^{1}(x(t)) - Q_{FC}^{1}(x(t), u), \dots, Q_{FB}^{n}(x(t)) - Q_{FC}^{n}(x(t), u), \\ Q_{FD}^{1}(x(t)) - Q_{FE}^{1}(x(t), w), \dots, Q_{FD}^{m}(x(t)) - Q_{FE}^{m}(x(t), w)] \end{array} (42)

X_{B}^{[k]} = Q_{M} (x (t)) + Q_{N} (x (t)) (43)

Finally, Equation 32 can be written as follows:

ϑ^{[k]} (x, u, w) = X_{A}^{[k]} (x, u, w) {\hat{W}}^{[k]} - X_{B}^{[k]} (44)

To solve for the weight vector $\hat{W}^{[k]}$, the residual error $ϑ^{[k]}(x, u, w)$ is driven toward zero in the least-squares sense. This is done by imposing the inner-product condition in Equation 45:

\left\langle \frac{d ϑ^{[k]}(x, u, w)}{d \hat{W}^{[k]}}, ϑ^{[k]}(x, u, w) \right\rangle_{D} = 0. (45)

Substituting Equation 44 into Equation 45 yields Equation 46:

\langle X_{A}^{[k]}, X_{A}^{[k]} \rangle_{D} \hat{W}^{[k]} - \langle X_{A}^{[k]}, X_{B}^{[k]} \rangle_{D} = 0 (46)

$\hat{W}^{[k]}$ can then be calculated as follows:

\hat{W}^{[k]} = \langle X_{A}^{[k]}, X_{A}^{[k]} \rangle_{D}^{- 1} \langle X_{A}^{[k]}, X_{B}^{[k]} \rangle_{D} (47)

Evaluating $\langle X_{A}^{[k]}, X_{A}^{[k]} \rangle_{D}$ and $\langle X_{A}^{[k]}, X_{B}^{[k]} \rangle_{D}$ requires numerical integration over the domain D, for which Monte Carlo integration was used. Define $μ_{D} ≜ \int_{D} d(x, u, w)$ and the sample set $Q_{M} ≜ \{(x_{i}, u_{i}, w_{i}) \mid (x_{i}, u_{i}, w_{i}) \in D, i = 1, 2, \dots, M\}$ drawn from D. M is the number of samples in $Q_{M}$ and should be as large as possible so that the sample set covers D well. Then $\langle X_{A}^{[k]}, X_{A}^{[k]} \rangle_{D}$ can be obtained as follows:

\begin{array}{l} \langle X_{A}^{[k]}, X_{A}^{[k]} \rangle_{D} = \int_{D} (X_{A}^{[k]}(x, u, w))^{T} X_{A}^{[k]}(x, u, w) \, d(x, u, w) \\ \approx \frac{μ_{D}}{M} \sum_{i = 1}^{M} (X_{A}^{[k]}(x_{i}, u_{i}, w_{i}))^{T} X_{A}^{[k]}(x_{i}, u_{i}, w_{i}) \\ = \frac{μ_{D}}{M} (γ^{[k]})^{T} γ^{[k]} \end{array} (48)

in which $γ^{[k]} = [X_{A}^{[k]T}(x_{1}, u_{1}, w_{1}), X_{A}^{[k]T}(x_{2}, u_{2}, w_{2}), \dots, X_{A}^{[k]T}(x_{M}, u_{M}, w_{M})]^{T}$. Similarly,

\begin{array}{l} \langle X_{A}^{[k]}, X_{B}^{[k]} \rangle_{D} \approx \frac{μ_{D}}{M} \sum_{i = 1}^{M} (X_{A}^{[k]}(x_{i}, u_{i}, w_{i}))^{T} X_{B}^{[k]}(x_{i}) \\ = \frac{μ_{D}}{M} (γ^{[k]})^{T} β^{[k]} \end{array} (49)

where $β^{[k]} = [X_{B}^{[k]}(x_{1}), X_{B}^{[k]}(x_{2}), \dots, X_{B}^{[k]}(x_{M})]^{T}$.
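The Monte Carlo estimate used in Equations 48 and 49 replaces an integral over D with the sample average scaled by $μ_{D}$. The sketch below applies the same estimator to a toy inner product whose exact value is known. The domain, integrands, sample count, and the function name `mc_inner_product` are assumptions made only for illustration.

```python
import random

def mc_inner_product(g, h, sample, volume, m, seed=0):
    """Estimate <g, h>_D = integral over D of g*h as (volume / m) * sum(g_i * h_i)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(m):
        z = sample(rng)        # one point drawn uniformly from D
        total += g(z) * h(z)
    return volume * total / m

# Toy domain D = [0, 1] x [0, 1], for which <x, y>_D = 1/4 exactly
estimate = mc_inner_product(
    g=lambda z: z[0],
    h=lambda z: z[1],
    sample=lambda rng: (rng.random(), rng.random()),
    volume=1.0,
    m=20000,
)
print(estimate)
```

The statistical error of such an estimate shrinks roughly as $1 / \sqrt{M}$, which is one reason a large sample set is recommended.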

Upon substituting Equations 48, 49 in Equation 47, the following equation is obtained:

{\hat{W}}^{[k]} = {[{(γ^{[k]})}^{T} γ^{[k]}]}^{- 1} {(γ^{[k]})}^{T} β^{[k]} (50)

The zero-sum problem can be solved using Algorithm 1 as follows:

Algorithm 1. Off-policy IRL method to solve the optimal control problem.

Step 1: Apply the signals $u$ and $w$ and collect the hydropower-photovoltaic cogeneration system data $(x_{p}, u_{p}, w_{p})$ to build the set $Q_{M}$; then, calculate $Q_{FA}(x), Q_{FB}(x), Q_{FC}(x, u), Q_{FD}(x), Q_{FE}(x, w), Q_{M}(x)$, and $Q_{N}(x)$.

Step 2: Set $k = 0$ and choose initial admissible weight vectors $\hat{θ}_{V}^{[0]}$, $\hat{θ}_{u 1}^{[0]} \dots \hat{θ}_{u a}^{[0]}$, and $\hat{θ}_{w 1}^{[0]} \dots \hat{θ}_{w b}^{[0]}$ for the cost function, control, and disturbance, respectively.

Step 3: Calculate $γ^{[k]}$ and $β^{[k]}$ and update $\hat{W}^{[k]}$ using Equation 50.

Step 4: Let $k = k + 1$ and return to Step 3.

Step 5: Repeat until $‖ \hat{W}^{[k + 1]} - \hat{W}^{[k]} ‖ \leq α$, where $α$ is a small positive constant. Then stop the iteration and use the weights in $\hat{W}^{[k]}$ (Equation 41) to obtain the control policy.

Note that when the system dynamics are completely unknown, Algorithm 1 relies on the input and output data of the hydropower-photovoltaic microgrid power system to solve the zero-sum problem.
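The loop structure of Steps 2 to 5 can be sketched as follows. This is a minimal sketch under strong assumptions and does not reproduce the off-policy IRL regressors of the paper: the plant is a hypothetical scalar linear system dx/dt = a x + b u with cost ∫(q x² + r u²) dt, there is no disturbance player, and γ and β are built from the known model (a model-based policy iteration) only to keep the example self-contained. Only the least-squares update of Equation 50 and the stopping test $‖ {\hat{W}}^{[k + 1]} - {\hat{W}}^{[k]} ‖\leq α$ mirror Algorithm 1; the function and parameter names are illustrative.

```python
import numpy as np


def least_squares_weights(gamma, beta):
    """Equation 50 solved as a least-squares problem."""
    W, *_ = np.linalg.lstsq(gamma, beta, rcond=None)
    return W


def run_iteration(x_samples, a=-1.0, b=1.0, q=1.0, r=1.0,
                  alpha=1e-8, max_iter=50):
    """Iterate W until the change between iterations is at most alpha.

    Assumed toy setup: value function V(x) = W * x**2, policy u = -K * x.
    """
    K = 0.0            # Step 2: initial admissible gain (a < 0, so K = 0 is stable)
    W = np.zeros(1)
    for k in range(max_iter):
        # Step 3: regressors of the policy-evaluation equation
        # 2 W (a - b K) x^2 = -(q + r K^2) x^2, one row per sample.
        gamma = (2.0 * (a - b * K) * x_samples ** 2).reshape(-1, 1)
        beta = -(q + r * K ** 2) * x_samples ** 2
        W_new = least_squares_weights(gamma, beta)
        # Step 5: stopping test on the weight change.
        if np.linalg.norm(W_new - W) <= alpha:
            return W_new, k + 1
        # Step 4: accept the update, improve the policy, and repeat.
        W = W_new
        K = b * W[0] / r
    raise RuntimeError("weights did not converge within max_iter")


rng = np.random.default_rng(0)
x_samples = rng.uniform(-1.0, 1.0, size=200)   # assumed sample set on D = [-1, 1]
W_final, iterations = run_iteration(x_samples)
print(W_final[0], iterations)
```

For these assumed parameters the converged weight should satisfy the scalar Riccati equation 2aP − b²P²/r + q = 0, whose positive root is √2 − 1, which gives a simple check that the loop has reached the optimal value function of the toy problem.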

5 Discussion

The hydropower-photovoltaic microgrid power system model was established, the proposed Algorithm 1 was used to solve the LFC problem, and the simulation was carried out on the MATLAB platform. The simulation results verified that the microgrid can maintain frequency stability despite local load and PV disturbances. As shown in Figure 3, the control and disturbance curves eventually approach 0. Figure 3 plots the two variables over 10 time steps on the x-axis, with the Control Value on the y-axis ranging from −0.5 to 1.0. It contains two sets of trajectories for the control u and the disturbance ω: the dashed lines are the curves under the initial admissible control, and the solid lines are the curves obtained using Algorithm 1. The variable ω starts from a lower value and likewise converges towards zero. Overall, the trajectories obtained using Algorithm 1 converge to 0 more rapidly for both u and ω than their respective initial trajectories, and the frequency finally stabilizes. This demonstrates the improved performance of Algorithm 1 over the initial admissible control method.

Figure 3. State traces of the system.

The weight convergence curves of the three networks are shown in Figures 4–6. These figures illustrate the convergence of the weights of the networks $θ_{V}$ , $θ_{u}$ , and $θ_{ω}$ over 15 iterations, shown on the x-axis. The y-axis denotes the Weight of Value Function, with values ranging from −10 to 15. At the beginning of the iterations, the weights start from various initial values. Some weights fluctuate significantly, with noticeable oscillations up to around the 5th iteration. As the iterations proceed, the weights of all networks gradually stabilize and converge to constant values between 0 and 1, no longer changing significantly with additional iterations. By the 15th iteration, all weights have reached their steady-state values. The convergence of the weights within 15 iterations suggests that the optimization process is effective, allowing the system to arrive at a control policy that approximates the Nash equilibrium point of the zero-sum game problem.

Figure 4. Convergence curves of the $θ_{V}$ .

Figure 5. Convergence curves of the $θ_{u}$ .

Figure 6. Convergence curves of the $θ_{ω}$ .

Compared to traditional dynamic programming methods, the proposed method effectively overcomes the “curse of dimensionality,” significantly reducing the computational burden when solving high-dimensional matrices. In contrast to previous reinforcement learning approaches to optimal frequency control of hydropower-photovoltaic microgrid power systems, this method takes disturbance factors into account, providing a robust theoretical basis for the grid integration of hybrid power generation systems.

The desired voltage and current are 50 Hz sinusoidal waves, so the system exhibits high-frequency dynamics. However, the IRL method depends on a process of collecting control and state data from the system under a quasi-optimal control, which may lead to power oscillation and lengthen the time the system needs to move from the transient state to the steady state. Therefore, the limitation of this method is that it is currently only applicable to offline systems.

6 Conclusion

This paper focused on the hydropower-photovoltaic hybrid microgrid system and designed an optimal LFC using the IRL algorithm. First, the mechanism models of the hydro turbine generator and the photovoltaic generator were established. Second, a state-space model of the hydropower-photovoltaic hybrid microgrid system was developed, and based on the power generation characteristics, the LFC problem was transformed into a zero-sum game problem. Third, the IRL algorithm was employed to approximate the Nash equilibrium point of the zero-sum game problem using three neural networks. Finally, simulation experiments were conducted to verify the effectiveness of the proposed method.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

EW: Conceptualization, Writing–original draft, Writing–review and editing. LY: Methodology, Writing–review and editing. FZ: Writing–review and editing. XL: Writing–review and editing. JL: Writing–original draft. LS: Writing–original draft. MZ: Writing–original draft.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by a grant for the project “Research and engineering demonstration of a safe, autonomous, and controllable intelligent control system for 10 million kilowatts of clean energy” (CSIEKJ220700539).

Conflict of interest

Authors EW and LY were employed by Guoneng Qinghai Yellow River MaerDang Hydropower Development Co., Ltd. Authors FZ, XL, and JL were employed by Beijing SP Zhishen Control Technology Co., Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abbreviations

LFC, load frequency controller; PV, photovoltaic; PID, proportional-integral-derivative; ADP, adaptive dynamic programming; IRL, integral reinforcement learning.

References

Ahmad, S., Ahmad, A., and Yaqub, R. (2018). Optimized energy consumption and demand side management in smart grid. Smart Grid as a Solut. Renew. Effic. Energy, 1–25. doi:10.4018/978-1-5225-0072-8.ch001

CrossRef Full Text | Google Scholar

Bellman, R. E., and Dreyfus, S. E. (2015). Applied dynamic programming. Princeton, New Jersey: Princeton University Press.

Google Scholar

Chai, R., Savvaris, A., Tsourdos, A., and Chai, S. (2017). Multi-objective trajectory optimization of space manoeuvre vehicle using adaptive differential evolution and modified game theory. Acta Astronaut. 136, 273–280. doi:10.1016/j.actaastro.2017.02.023

CrossRef Full Text | Google Scholar

Chen, Z., Chen, J., Fu, K., and Xue, L. (2022). Power coordination control strategy of microgrid based on photovoltaic generation. MATEC Web Conf. 355, 03065. doi:10.1051/matecconf/202235503065

CrossRef Full Text | Google Scholar

Coban, H. H., Rehman, A., and Mousa, M. (2022). Load frequency control of microgrid system by battery and pumped-hydro energy storage. Water 14, 1818. doi:10.3390/w14111818

CrossRef Full Text | Google Scholar

Cowie, P., Townsend, L., and Salemink, K. (2020). Smart rural futures: will rural areas be left behind in the 4th Industrial Revolution?. J. Rural. Stud. 79, 169–176. doi:10.1016/j.jrurstud.2020.08.042

PubMed Abstract | CrossRef Full Text | Google Scholar

Dhundhara, S., and Verma, Y. P. (2020). Application of micro pump hydro energy storage for reliable operation of microgrid system. IET Renew. Power Gener. 14, 1368–1378. doi:10.1049/iet-rpg.2019.0822

CrossRef Full Text | Google Scholar

Fang, H., Chen, L., and Shen, Z. (2011). Application of an improved PSO algorithm to optimal tuning of PID gains for water turbine governor. Energy Convers. Manag. 52, 1763–1770. doi:10.1016/j.enconman.2010.11.005

CrossRef Full Text | Google Scholar

Ganguly, S., Shiva, C. K., and Mukherjee, V. (2018). Frequency stabilization of isolated and grid connected hybrid power system models. J. Energy Storage. 19, 145–159. doi:10.1016/j.est.2018.07.014

CrossRef Full Text | Google Scholar

Gielen, D., Boshell, F., Saygin, D., Bazilian, M. D., Wagner, N., and Gorini, R. (2019). The role of renewable energy in the global energy transformation. Energy Strategy Rev. 24, 38–50. doi:10.1016/j.esr.2019.01.006

CrossRef Full Text | Google Scholar

Gilani, M. A., Kazemi, A., and Ghasemi, M. (2020). Distribution system resilience enhancement by microgrid formation considering distributed energy resources. Energy 191, 116442. doi:10.1016/j.energy.2019.116442

CrossRef Full Text | Google Scholar

Guha, D., Roy, P. K., and Banerjee, S. (2021). Equilibrium optimizer-tuned cascade fractional-order 3DOF−PID controller in load frequency control of power system having renewable energy resource integrated. Int. Trans. Electr. Energy Syst. 31, e12702. doi:10.1002/2050-7038.12702

CrossRef Full Text | Google Scholar

Huang, Z., Liu, X., Fu, H., and Du, Z. (2021). A novel parameter optimisation method of hydraulic turbine regulating system based on fuzzy differential evolution algorithm and fuzzy PID controller. Int. J. Bio Inspired Comput. 18, 153–164. doi:10.1504/IJBIC.2021.119203

CrossRef Full Text | Google Scholar

Khalid, H. M., Muyeen, S. M., and Kamwa, I. (2022). An improved decentralized finite-time approach for excitation control of multi-area power systems. Sustain. Energy, Grids Netw. 31, 100692. doi:10.1016/j.segan.2022.100692

CrossRef Full Text | Google Scholar

Lewis, F. L., Vrabie, D., and Syrmos, V. L. (2012). Optimal control. Hoboken, New Jersey: John Wiley & Sons.

Google Scholar

Li, J., Guo, W., and Liu, Y. (2023). Nonlinear state feedback-synergetic control for low frequency oscillation suppression in grid-connected pumped storage-wind power interconnection system. J. Energy Storage. 73, 109281. doi:10.1016/j.est.2023.109281

CrossRef Full Text | Google Scholar

Ma, T., Yang, H., Lu, L., and Peng, J. (2014). Technical feasibility study on a standalone hybrid solar-wind system with pumped hydro storage for a remote island in Hong Kong. Renew. Energy. 69, 7–15. doi:10.1016/j.renene.2014.03.028

CrossRef Full Text | Google Scholar

Mohamed, R., Helaimi, M., Taleb, R., Gabbar, H. A., and Othman, A. M. (2020). Frequency control of microgrid system based renewable generation using fractional PID controller. IJEECS 19, 745–755. doi:10.11591/ijeecs.v19.i2.pp745-755

CrossRef Full Text | Google Scholar

Nisha, G., and Jamuna, K. (2022). Frequency stabilization of stand-alone microgrid with tuned PID controller. ECS Trans. 107, 773–782. doi:10.1149/10701.0773ecst

CrossRef Full Text | Google Scholar

Olabi, A. G., and Abdelkareem, M. A. (2022). Renewable energy and climate change. Renew. Sustain. Energy Rev. 158, 112111. doi:10.1016/j.rser.2022.112111

CrossRef Full Text | Google Scholar

Osman, N., Khalid, H. M., Tha’er, O. S., Abuashour, M. I., and Muyeen, S. M. (2022). A PV powered DC shunt motor: study of dynamic analysis using maximum power Point-Based fuzzy logic controller. Energy Convers. Manag. X 15, 100253. doi:10.1016/j.ecmx.2022.100253

CrossRef Full Text | Google Scholar

Papaefthymiou, S. V., Karamanou, E. G., Papathanassiou, S. A., and Papadopoulos, M. P. (2010). A wind-hydro-pumped storage station leading to high RES penetration in the autonomous island system of Ikaria. IEEE Trans. Sustain. Energy. 1, 163–172. doi:10.1109/TSTE.2010.2059053

CrossRef Full Text | Google Scholar

Patnaik, B., Mishra, M., Bansal, R. C., and Jena, R. K. (2020). AC microgrid protection–A review: current and future prospective. Appl. Energy. 271, 115210. doi:10.1016/j.apenergy.2020.115210

CrossRef Full Text | Google Scholar

Ray, P. K., Mohanty, S. R., and Kishor, N. (2011). Proportional–integral controller based small-signal analysis of hybrid distributed generation systems. Energy Convers. Manag. 52, 1943–1954. doi:10.1016/j.enconman.2010.11.011

CrossRef Full Text | Google Scholar

Rehman, A. U., Ullah, Z., Qazi, H. S., Hasanien, H. M., and Khalid, H. M. (2024). Reinforcement learning-driven proximal policy optimization-based voltage control for PV and WT integrated power system. Renew. Energy 227, 120590. doi:10.1016/j.renene.2024.120590

CrossRef Full Text | Google Scholar

Shuai, H., Ai, X., Fang, J., Ding, T., Chen, Z., and Wen, J. (2020). Real-time optimization of the integrated gas and power systems using hybrid approximate dynamic programming. Int. J. Electr. Power Energy Syst. 118, 105776. doi:10.1016/j.ijepes.2019.105776

CrossRef Full Text | Google Scholar

Song, R., Wei, Q., and Li, Q. (2019). Off-policy integral reinforcement learning method for multi-player non-zero-sum games. Adapt. Dyn. Program. Single Multiple Control., 227–249. doi:10.1007/978-981-13-1712-5_12

CrossRef Full Text | Google Scholar

Thirunavukkarasu, M., and Sawle, Y. (2021). Smart microgrid integration and optimization. Act. Electr. Distrib. Netw. Smart Approach, 201–235. doi:10.1002/9781119599593.ch11

CrossRef Full Text | Google Scholar

Tran, A. T., Minh, B. L. N., Huynh, V. V., Tran, P. T., Amaefule, E. N., Phan, V. D., et al. (2021). Load frequency regulator in interconnected power system using second-order sliding mode control combined with state estimator. Energies 14, 863. doi:10.3390/en14040863

CrossRef Full Text | Google Scholar

Vamvoudakis, K. G., and Lewis, F. L. (2010). Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem. Automatica 46, 878–888. doi:10.1016/j.automatica.2010.02.018

CrossRef Full Text | Google Scholar

Wang, R., Sun, Q., Pinjia, Z., Yonghao, G., Dehao, Q., and Peng, W. (2020). Reduced-order transfer function model of the droop-controlled inverter via Jordan continued-fraction expansion. IEEE Trans. Energy 35, 1585–1595. doi:10.1109/TEC.2020.2980033

CrossRef Full Text | Google Scholar

Werbos, P. (1992). Approximate dynamic programming for real-Time control and neural modeling. Handb. Intelligent Control.

Google Scholar

Wu, J., and Yang, F. (2023). A dual-driven predictive control for photovoltaic-diesel microgrid secondary frequency regulation. Appl. Energy. 334, 120652. doi:10.1016/j.apenergy.2023.120652

CrossRef Full Text | Google Scholar

Xue, X., Ai, X., Fang, J., Yao, W., and Wen, J. (2022). Real-time schedule of integrated heat and power system: a multi-dimensional stochastic approximate dynamic programming approach. Int. J. Electr. Power Energy Syst. 134, 107427. doi:10.1016/j.ijepes.2021.107427

CrossRef Full Text | Google Scholar

Zeng, Y., Zhang, L. X., Guo, Y. K., and Qian, J. (2015). Hamiltonian stabilization additional L 2 adaptive control and its application to hydro turbine generating sets. Int. J. Control Autom. Syst. 13, 867–876. doi:10.1007/s12555-013-0460-7

CrossRef Full Text | Google Scholar

Zepter, J. M., Lüth, A., Crespo del Granado, P. C., and Egging, R. (2019). Prosumer integration in wholesale electricity markets: synergies of peer-to-peer trade and residential storage. Energy Build. 184, 163–176. doi:10.1016/j.enbuild.2018.12.003

CrossRef Full Text | Google Scholar

Zhang, D., and Kong, Q. (2022). Green energy transition and sustainable development of energy firms: an assessment of renewable energy policy. Energy Econ. 111, 106060. doi:10.1016/j.eneco.2022.106060

CrossRef Full Text | Google Scholar

Keywords: hydropower-photovoltaic hybrid microgrid system, load frequency controller, off-policy integral reinforcement learning algorithm, data-based optimal control, neural networks

Citation: Wang E, Yuan L, Zeng F, Liu X, Liu J, Sun L and Zhuang M (2024) Load frequency optimal control of the hydropower-photovoltaic hybrid microgrid system based on the off-policy integral reinforcement learning algorithm. Front. Energy Res. 12:1464722. doi: 10.3389/fenrg.2024.1464722

Received: 15 July 2024; Accepted: 26 September 2024;
Published: 10 October 2024.

Edited by:

Wenping Zhang, Tianjin University, China

Reviewed by:

Linfei Yin, Guangxi University, China
Haris M. Khalid, University of Dubai, United Arab Emirates
Qiuye Sun, Northeastern University, China

Copyright © 2024 Wang, Yuan, Zeng, Liu, Liu, Sun and Zhuang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Lingfang Sun, sunlf@neepu.edu.cn
