- 1 College of Electric Power, South China University of Technology, Guangzhou, China
- 2 China Electric Power Research Institute, Nanjing, China
A data-driven output voltage control method for the proton exchange membrane fuel cell (PEMFC) is proposed, together with an improved deep deterministic policy gradient (IDDPG) algorithm for this method. The algorithm introduces three techniques, clipped multiple Q-learning, delayed policy updates, and policy smoothing, to improve the robustness of the control policy. In this algorithm, the hydrogen controller is treated as an agent, which is pre-trained through extensive interaction with the environment to obtain the optimal control policy. The effectiveness of the proposed algorithm is demonstrated experimentally.
Introduction
The fuel cell is regarded as the fourth type of power generation technology after hydroelectric, thermal and nuclear power generation. It converts the chemical energy stored in fuel and oxidizer directly into electricity through electrode reactions in an isothermal environment (Yang et al., 2021a; Yang et al., 2021b). Unlike thermal power generation, a fuel cell does not burn its fuel directly, so its power generation efficiency is not limited by the Carnot cycle and its emission of harmful substances is extremely low (Yang et al., 2020; Yang et al., 2018). Its energy conversion rate is as high as 80%, and its actual efficiency is double that of an ordinary internal combustion engine (Bougrine et al., 2013). The fuel cell is therefore an efficient and clean power source that combines new technologies in energy, chemicals, materials and automatic control (Yang et al., 2019a; Yang et al., 2021c).
However, the PEMFC system is a complex, high-order, nonlinear and time-varying system with multiple inputs and outputs and random disturbances (Yang et al., 2019b; Li and Yu, 2021a), so it is difficult to achieve satisfactory control results with traditional PID control (Li et al., 2021). To obtain accurate and fast responses, various advanced control strategies have been applied to PEMFC output control (Zhang et al., 2019; Li and Yu, 2021b; Zhang et al., 2021).
In recent years, researchers have studied PEMFC control extensively. The work falls mainly into the following categories:
1) Model-based control methods (Wang and Kim, 2014), including internal model control (IMC) (Danzer et al., 2008), model predictive control (MPC) (Kim, 2010), model-based adaptive control (Zhang et al., 2008), nonlinear model predictive control (NMPC) (Park and Gajic, 2014), nonlinear multivariable control (Talj et al., 2010), time delay control (Liu et al., 2016), and generalized model control (Damour et al., 2014).
2) Sliding mode control. Sliding mode control methods have also been applied to the PEMFC (Ou et al., 2015), such as first-order sliding mode control, higher-order super-twisting sliding mode control, and higher-order sliding mode control with an observer (Chen et al., 2018).
3) PID and its improved variants. Improved PID algorithms have also been used extensively, for example, the neural PID controller (Zhao et al., 2020), fuzzy PID control (Sun et al., 2018), an algorithm combining PID and a fuzzy controller (Ou et al., 2017), the feedback linearization controller, and the fractional-order PID (FOPID) controller.
4) Adaptive control. Adaptive control has also been applied to the PEMFC, for example, a data-driven adaptive controller, adaptive control based on parameter identification, and an adaptive pole search controller.
5) Compound control. Compound controllers have also been proposed, for example, PID-neural network control, interval type-2 fuzzy PID control, and fuzzy adaptive PID control.
The existing research has two shortcomings:
1) There is no model-free control algorithm that can effectively adapt to the nonlinear characteristics of the PEMFC.
2) There is no optimal control algorithm that combines adaptive capability with low computational effort.
A model-free controller with strong adaptive capability is therefore better suited to such systems.
This paper therefore proposes a data-driven PEMFC output voltage control method, together with an improved deep deterministic policy gradient (IDDPG) algorithm for this method. The algorithm introduces three techniques, double Q-learning, delayed policy updates, and policy smoothing, to improve the robustness of the control policy. In this algorithm, the hydrogen controller is treated as an agent, which is pre-trained through extensive interaction with the environment to obtain the optimal control policy. The effectiveness of the proposed algorithm is demonstrated experimentally.
The contributions of this paper are:
1) A data-driven PEMFC output voltage control method is proposed.
2) An improved deep deterministic policy gradient algorithm is proposed.
The remainder of this paper comprises the following sections: the PEMFC model is demonstrated in PEMFC Model, and the proposed algorithm is described in Proposed Method; the experimental results are analysed and discussed in Case Studies, and the findings in this paper are summarised in Conclusion.
PEMFC Model
PEMFC Output Voltage
The dynamic model of the PEMFC is derived from the electrochemical model. Ideally, the voltage released by the complete reaction is 1.229 V. The actual potential is lower because of irreversible losses, which in practice are known as polarization overvoltages. During power generation in a PEMFC, polarization overvoltage appears mainly as activation overvoltage, ohmic overvoltage and concentration overvoltage, so the single-cell voltage is inevitably less than the ideal standard potential. Besides temperature, pressure and current density, chemical and material factors such as the electrode material and the electrolyte also influence the polarization overvoltage of the electrodes.
For a fuel cell stack consisting of N single cells connected in series, the output voltage V can be expressed as
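A form consistent with the three overvoltages named above can be sketched as follows. The symbols E_nernst, V_act, V_ohm and V_con are notation assumed here for illustration, not taken from the original equation.

```latex
% Assumed notation: E_nernst is the thermodynamic potential of a single cell;
% V_act, V_ohm and V_con are the activation, ohmic and concentration overvoltages.
V = N \, V_{\mathrm{cell}}
  = N \left( E_{\mathrm{nernst}} - V_{\mathrm{act}} - V_{\mathrm{ohm}} - V_{\mathrm{con}} \right)
```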
Ohmic Overvoltage
The ohmic polarization overvoltage is caused mainly by the resistance of the proton exchange membrane to the transfer of protons and by the resistance of the electrodes and current collectors to the transfer of electrons. In the Amphlett model, the PEMFC ohmic overvoltage is the voltage drop across two impedances: the equivalent membrane impedance of the proton exchange membrane, and the resistance that hinders protons from passing through the membrane, which is usually taken as a constant. According to the law of resistance, the equivalent membrane impedance Rm can be obtained by the following formula:
In the formula,
Empirically, the internal resistance of the battery is
Ohmic polarization overvoltage can be expressed as:
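In Amphlett-type formulations, the two relations described in this subsection are commonly written as below. Apart from Rm, the symbols are assumptions introduced here for illustration.

```latex
% Assumed notation: \rho_m is the membrane resistivity, l the membrane thickness,
% A the active area, R_c the constant resistance and i the cell current.
R_m = \frac{\rho_m \, l}{A}
\qquad
V_{\mathrm{ohm}} = i \left( R_m + R_c \right)
```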
Activation Overvoltage
Activation overvoltage is the deviation of an electrode’s potential from its equilibrium potential due to a delay in its electrochemical reaction. The activation polarization overvoltage of the cathode can be obtained as:
Anode activation polarization overvoltage:
The total activation polarization overvoltage is the sum of the anode activation overvoltage and the cathode activation overvoltage, expressed as:
Thermodynamic Electric Potential
According to the empirical formula for the PEMFC, the thermodynamic electric potential can be obtained as follows:
A PEMFC converts chemical energy directly into electrical energy. The chemical energy released by a fuel cell can be calculated from the change in the Gibbs free energy:
The corresponding change in the Gibbs free energy is:
The change in the Gibbs free energy is a function of temperature and pressure:
We can deduce the voltage of the fuel cell:
When specific values are substituted, the equation becomes:
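As a minimal sketch of the kind of numerical expression this derivation leads to, the function below evaluates a widely quoted empirical form of the thermodynamic potential. The coefficient 0.85e-3 V/K, the use of partial pressures in atm, and the function name are assumptions made for illustration; only the 1.229 V ideal potential comes from the text.

```python
import math

R = 8.314      # universal gas constant, J/(mol K)
F = 96485.0    # Faraday constant, C/mol


def nernst_potential(temp_k, p_h2_atm, p_o2_atm):
    """Thermodynamic potential of one cell in volts (assumed empirical form)."""
    e0 = 1.229  # ideal potential at 298.15 K and unit partial pressures
    # Temperature correction with a commonly quoted coefficient (assumption).
    entropy_term = 0.85e-3 * (temp_k - 298.15)
    # Pressure correction: RT/(2F) * [ln(p_H2) + 0.5 * ln(p_O2)].
    pressure_term = (R * temp_k / (2.0 * F)) * (
        math.log(p_h2_atm) + 0.5 * math.log(p_o2_atm)
    )
    return e0 - entropy_term + pressure_term
```

At 298.15 K and unit partial pressures both correction terms vanish and the function returns the ideal 1.229 V; raising the temperature at fixed pressure lowers the potential.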
Concentration Polarization Overvoltage
Concentration overvoltage arises when the electrode potential deviates from the equilibrium potential because the ion concentration in the solution at the electrode interface layer differs from that of the bulk solution. It can be expressed as:
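One commonly used expression for this loss is a logarithmic function of the ratio between the current density and its limiting value. The sketch below assumes that form; the coefficient b_coeff, the limiting current density and the function name are hypothetical and are not taken from the original equation.

```python
import math


def concentration_overvoltage(current_density, max_current_density, b_coeff):
    """Concentration overvoltage in volts, assuming V_con = -B * ln(1 - J / J_max)."""
    if not 0.0 <= current_density < max_current_density:
        raise ValueError("current density must lie in [0, max_current_density)")
    return -b_coeff * math.log(1.0 - current_density / max_current_density)
```

Under this assumed form the loss is zero at zero current and grows without bound as the current density approaches the limiting value, which is why the function rejects values at or above it.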
Dynamic and Capacitive Characteristics of the Double Layer Charge
The “double layer of charge” in a proton exchange membrane fuel cell is particularly important for the dynamics of the PEMFC. Electrons collect on the surface of the electrode and hydrogen ions collect on the surface of the electrolyte. The potential difference between them stores charge and energy and therefore acts as an equivalent capacitance. This capacitance smooths out the voltage loss across the equivalent resistance, so adding it to the electrochemical model gives a more realistic dynamic model of the PEMFC: when the current changes, the output voltage of the fuel cell responds smoothly over a transition time rather than instantaneously.
In Figure 1, the polarization voltage across Rd is
Thus, the voltage of the stack can be expressed as:
The output power and efficiency of the stack can therefore be expressed as:
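The smoothing effect of the equivalent capacitance can be illustrated with a first-order model of the polarization voltage across Rd. The sketch below assumes the common form dVd/dt = i/C - Vd/(Rd*C) and integrates it with the forward Euler method; the capacitance symbol, the function name and the numerical values in the usage note are hypothetical.

```python
def simulate_double_layer(current_a, r_d_ohm, c_farad, v_d0=0.0, dt=1e-3, steps=5000):
    """Integrate dVd/dt = i/C - Vd/(Rd*C) with forward Euler; return the Vd history."""
    v_d = v_d0
    history = []
    for _ in range(steps):
        # Charging by the load current minus discharge through Rd.
        dv_dt = current_a / c_farad - v_d / (r_d_ohm * c_farad)
        v_d += dt * dv_dt
        history.append(v_d)
    return history
```

For example, with a 100 A load, Rd = 0.01 ohm and C = 3 F, the time constant Rd*C is 0.03 s and the polarization voltage rises smoothly toward the steady-state value i*Rd = 1 V instead of jumping there.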
Proposed Method
DDPG
The DDPG method combines deep neural networks with the deterministic policy gradient (DPG) algorithm and uses an actor-critic framework as its basic architecture. The actor network is used to update the policy, and the critic network is used to approximate the state-action value function. Using a nonlinear neural network as the function approximator can make training unstable. Inspired by DQN, DDPG addresses this problem by setting up a target actor network and a target critic network, together with an experience replay mechanism. Unlike DQN, which copies the current network directly to the target network, DDPG updates the target network in a “soft” way, ensuring that each parameter update is small and thus stabilizing training.
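The “soft” target update can be sketched as a weighted average of the target and online parameters. This is a minimal illustration on flat lists of numbers; the function name and the mixing factor tau are assumptions, and a real implementation would apply the same rule to every network weight tensor.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Return target parameters moved a fraction tau toward the online parameters."""
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [
        (1.0 - tau) * target + tau * online
        for target, online in zip(target_params, online_params)
    ]
```

A small tau keeps each change to the target network small, whereas tau = 1 reproduces the hard copy used by DQN.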
In DDPG, the objective function is defined as the expected sum of discounted rewards:
where
The actor network parameters are updated by applying the chain rule to the objective function:
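In the notation commonly used for DDPG, the discounted objective and the chain-rule gradient referred to above can be written as follows. The symbols are assumptions introduced here for illustration, since the original equations are not reproduced in this text.

```latex
% Assumed notation: \mu(s \mid \theta^{\mu}) is the actor, Q(s, a \mid \theta^{Q}) the critic,
% \gamma the discount factor and r(s_t, a_t) the reward.
J(\theta^{\mu}) = \mathbb{E}\left[ \sum_{t=1}^{T} \gamma^{\,t-1} \, r(s_t, a_t) \right]

\nabla_{\theta^{\mu}} J \approx \mathbb{E}\left[
  \nabla_{a} Q(s, a \mid \theta^{Q}) \big|_{a = \mu(s \mid \theta^{\mu})} \,
  \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})
\right]
```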
To address the under-exploration caused by the actor mapping states to deterministic actions in the DPG approach, the DDPG algorithm generates temporally correlated noise through the Ornstein-Uhlenbeck (OU) process, which improves the exploration capability of the algorithm under a deterministic policy.
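A discretized OU process can be sketched in a few lines. The class name and the default values of theta, sigma and dt below are assumptions for illustration and are not the settings used in the paper.

```python
import math
import random


class OrnsteinUhlenbeckNoise:
    """Temporally correlated exploration noise for a scalar action (illustrative)."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, x0=0.0, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = x0
        self._rng = random.Random(seed)

    def sample(self):
        # Mean-reverting drift pulls the state back toward mu.
        drift = self.theta * (self.mu - self.x) * self.dt
        # Gaussian increment scaled by sqrt(dt) (Euler-Maruyama discretization).
        diffusion = self.sigma * math.sqrt(self.dt) * self._rng.gauss(0.0, 1.0)
        self.x += drift + diffusion
        return self.x
```

Because each sample starts from the previous state, consecutive noise values are correlated in time, unlike independent Gaussian noise. The sample would be added to the actor output before the action is applied to the environment.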
DDPG uses an experience replay mechanism based on random sampling, but it suffers from Q-value overestimation.
The IDDPG algorithm uses two critics. The target value formula is as follows:
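A target of this kind is usually computed in the style of the TD3 algorithm, which combines a minimum over two target critics with clipped noise on the target action. The sketch below assumes that form for a scalar action; the function names, the callables passed in and all default hyperparameters are hypothetical and are not the authors' implementation.

```python
import random


def clip(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def clipped_double_q_target(reward, next_state, done, target_actor,
                            target_critic_1, target_critic_2,
                            gamma=0.99, noise_std=0.2, noise_clip=0.5,
                            action_low=-1.0, action_high=1.0, rng=random):
    """Compute y = r + gamma * min(Q1', Q2') at a smoothed target action."""
    # Policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_state) + noise, action_low, action_high)
    # Clipped double Q-learning: use the smaller of the two target estimates.
    q_next = min(target_critic_1(next_state, next_action),
                 target_critic_2(next_state, next_action))
    # No bootstrapping from terminal states.
    return reward + gamma * (0.0 if done else 1.0) * q_next
```

Taking the minimum of the two estimates counteracts the overestimation mentioned above. The third technique, delayed policy updates, would amount to updating the actor and the target networks only once every few critic updates.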
Case Studies
The DDPG control strategy, a fuzzy PID controller (Fuzzy-PID), a PSO-optimized fuzzy PID controller (PSO-PID), and a conventional PID controller are used in this paper for comparison. A step load disturbance is applied at 1 s, raising the load current from 100 A to 127 A. The results are shown in Figures 1A,B.
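For reproducing a test of this kind in simulation, the load disturbance can be described by a simple step function. The function name is hypothetical, and treating the instant t = 1 s as already belonging to the higher load level is an assumption.

```python
def load_current(t_s, step_time_s=1.0, initial_a=100.0, final_a=127.0):
    """Load current in amperes: a step from initial_a to final_a at step_time_s."""
    return initial_a if t_s < step_time_s else final_a
```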
1) According to Figures 1A,B, the IDDPG algorithm is more robust because it uses dedicated techniques to address the Q-value overestimation problem of conventional deep reinforcement learning algorithms. In contrast, the DDPG algorithm has no such mechanism, so it tends to fall into local optima, which makes the final control policy sub-optimal and less robust. The other algorithms lack comparable optimal control capability and have difficulty adapting to the nonlinear characteristics of the PEMFC, so their output voltage control performance is poorer.
2) Fuzzy-based algorithms mostly perform better than optimization-based algorithms because they can adjust their coefficients automatically, but the simplicity of the fuzzy rules limits their accuracy.
3) Optimization-based algorithms are neither adaptive nor robust because they cannot adjust their coefficients in real time, which ultimately leads to overshoot and instability of the output voltage.
In summary, in Case 1 the IDDPG algorithm has better static and dynamic performance and is able to control the output voltage effectively.
Conclusion
A data-driven PEMFC output voltage control method is proposed. An improved deep deterministic policy gradient algorithm is proposed for this method, which introduces three techniques, clipped multiple Q-learning, delayed policy updates and policy smoothing, to improve the robustness of the control policy. In this algorithm, the hydrogen controller is treated as an agent, which is pre-trained through extensive interaction with the environment to obtain the optimal control policy. The effectiveness of the proposed algorithm is demonstrated experimentally.
The IDDPG algorithm has a short response time and good dynamic and static performance indicators, enabling timely and effective output voltage control.
Data Availability Statement
The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding authors.
Author Contributions
JL: conceptualization, methodology, software, data curation, writing (original draft preparation), visualization, investigation, validation. YL: writing (reviewing and editing). TY: supervision.
Funding
This work was supported by the National Natural Science Foundation of China (U2066212).
Conflict of Interest
Author Y Li was employed by the company China Electric Power Research Institute.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Bougrine, M. D., Benalia, A., and Benbouzid, M. H. (2013). “Nonlinear Adaptive Sliding Mode Control of a Powertrain Supplying Fuel Cell Hybrid Vehicle,” in 3rd International Conference on Systems and Control (Algiers, Algeria: IEEE), 714–719. doi:10.1109/icosc.2013.6750938
Chen, J., Liu, Z., Wang, F., Ouyang, Q., and Su, H. (2018). Optimal Oxygen Excess Ratio Control for PEM Fuel Cells. IEEE Trans. Contr. Syst. Technol. 26, 1711–1721. doi:10.1109/TCST.2017.2723343
Damour, C., Benne, M., Lebreton, C., Deseure, J., and Grondin-Perez, B. (2014). Real-time Implementation of a Neural Model-Based Self-Tuning PID Strategy for Oxygen Stoichiometry Control in PEM Fuel Cell. Int. J. Hydrogen Energ. 39, 12819–12825. doi:10.1016/j.ijhydene.2014.06.039
Danzer, M. A., Wilhelm, J., Aschemann, H., and Hofer, E. P. (2008). Model-based Control of Cathode Pressure and Oxygen Excess Ratio of a PEM Fuel Cell System. J. Power Sourc. 176, 515–522. doi:10.1016/j.jpowsour.2007.08.049
Kim, Y.-B. (2010). Improving Dynamic Performance of Proton-Exchange Membrane Fuel Cell System Using Time Delay Control. J. Power Sourc. 195, 6329–6341. doi:10.1016/j.jpowsour.2010.04.042
Li, J., and Yu, T. (2021a). A New Adaptive Controller Based on Distributed Deep Reinforcement Learning for PEMFC Air Supply System. Energ. Rep. 7, 1267–1279. doi:10.1016/j.egyr.2021.02.043
Li, J., and Yu, T. (2021b). Distributed Deep Reinforcement Learning for Optimal Voltage Control of PEMFC. IET Renew. Power Generation 15, 2778–2798. doi:10.1049/rpg2.12202
Li, J., Yu, T., Zhang, X., Li, F., Lin, D., and Zhu, H. (2021). Efficient Experience Replay Based Deep Deterministic Policy Gradient for AGC Dispatch in Integrated Energy System. Appl. Energ. 285, 116386. doi:10.1016/j.apenergy.2020.116386
Liu, J., Luo, W., Yang, X., and Wu, L. (2016). Robust Model-Based Fault Diagnosis for PEM Fuel Cell Air-Feed System. IEEE Trans. Ind. Electron. 63, 3261–3270. doi:10.1109/TIE.2016.2535118
Ou, K., Wang, Y.-X., and Kim, Y.-B. (2017). Performance Optimization for Open-Cathode Fuel Cell Systems with Overheating Protection and Air Starvation Prevention. Fuel Cells 17, 299–307. doi:10.1002/fuce.201600181
Ou, K., Wang, Y.-X., Li, Z.-Z., Shen, Y.-D., and Xuan, D.-J. (2015). Feedforward Fuzzy-PID Control for Air Flow Regulation of PEM Fuel Cell System. Int. J. Hydrogen Energ. 40, 11686–11695. doi:10.1016/j.ijhydene.2015.04.080
Park, G., and Gajic, Z. (2014). A Simple Sliding Mode Controller of a Fifth-Order Nonlinear PEM Fuel Cell Model. IEEE Trans. Energ. Convers. 29, 65–71. doi:10.1109/TEC.2013.2288064
Sun, L., Shen, J., Hua, Q., and Lee, K. Y. (2018). Data-driven Oxygen Excess Ratio Control for Proton Exchange Membrane Fuel Cell. Appl. Energ. 231, 866–875. doi:10.1016/j.apenergy.2018.09.036
Talj, R. J., Hissel, D., Ortega, R., Becherif, M., and Hilairet, M. (2010). Experimental Validation of a PEM Fuel-Cell Reduced-Order Model and a Moto-Compressor Higher Order Sliding-Mode Control. IEEE Trans. Ind. Electron. 57, 1906–1913. doi:10.1109/TIE.2009.2029588
Wang, Y.-X., and Kim, Y.-B. (2014). Real-time Control for Air Excess Ratio of a PEM Fuel Cell System. IEEE/ASME Trans. Mechatron. 19, 852–861. doi:10.1109/TMECH.2013.2262054
Yang, B., Li, D., Zeng, C., Chen, Y., Guo, Z., Wang, J., et al. (2021a). Parameter Extraction of PEMFC via Bayesian Regularization Neural Network Based Meta-Heuristic Algorithms. Energy 228, 120592. doi:10.1016/j.energy.2021.120592
Yang, B., Swe, T., Chen, Y., Zeng, C., Shu, H., Li, X., et al. (2021b). Energy Cooperation between Myanmar and China under One Belt One Road: Current State, Challenges and Perspectives. Energy 215, 119130. doi:10.1016/j.energy.2020.119130
Yang, B., Wang, J., Zhang, X., Yu, T., Yao, W., Shu, H., et al. (2020). Comprehensive Overview of Meta-Heuristic Algorithm Applications on PV Cell Parameter Identification. Energ. Convers. Manag. 208, 112595. doi:10.1016/j.enconman.2020.112595
Yang, B., Yu, T., Shu, H., Dong, J., and Jiang, L. (2018). Robust Sliding-Mode Control of Wind Energy Conversion Systems for Optimal Power Extraction via Nonlinear Perturbation Observers. Appl. Energ. 210, 711–723. doi:10.1016/j.apenergy.2017.08.027
Yang, B., Yu, T., Zhang, X., Li, H., Shu, H., Sang, Y., et al. (2019a). Dynamic Leader Based Collective Intelligence for Maximum Power Point Tracking of PV Systems Affected by Partial Shading Condition. Energ. Convers. Manag. 179, 286–303. doi:10.1016/j.enconman.2018.10.074
Yang, B., Zeng, C., Wang, L., Guo, Y., Chen, G., Guo, Z., et al. (2021c). Parameter Identification of Proton Exchange Membrane Fuel Cell via Levenberg-Marquardt Backpropagation Algorithm. Int. J. Hydrogen Energ. 46, 22998–23012. doi:10.1016/j.ijhydene.2021.04.130
Yang, B., Zhong, L., Zhang, X., Shu, H., Yu, T., Li, H., et al. (2019b). Novel Bio-Inspired Memetic Salp Swarm Algorithm and Application to MPPT for PV Systems Considering Partial Shading Condition. J. Clean. Prod. 215, 1203–1222. doi:10.1016/j.jclepro.2019.01.150
Zhang, J., Liu, G., Yu, W., and Ouyang, M. (2008). Adaptive Control of the Airflow of a PEM Fuel Cell System. J. Power Sourc. 179, 649–659. doi:10.1016/j.jpowsour.2008.01.015
Zhang, X., Li, S., He, T., Yang, B., Yu, T., Li, H., et al. (2019). Memetic Reinforcement Learning Based Maximum Power Point Tracking Design for PV Systems under Partial Shading Condition. Energy 174, 1079–1090. doi:10.1016/j.energy.2019.03.053
Zhang, X., Tan, T., Zhou, B., Yu, T., Yang, B., and Huang, X. (2021). Adaptive Distributed Auction-Based Algorithm for Optimal Mileage Based AGC Dispatch with High Participation of Renewable Energy. Int. J. Electr. Power Energ. Syst. 124, 106371. doi:10.1016/j.ijepes.2020.106371
Keywords: output voltage, proton exchange membrane fuel cell, deep deterministic policy gradient algorithm, robustness, PEMFC
Citation: Li J, Li Y and Yu T (2021) Control Method for PEMFC Using Improved Deep Deterministic Policy Gradient Algorithm. Front. Energy Res. 9:753064. doi: 10.3389/fenrg.2021.753064
Received: 04 August 2021; Accepted: 13 September 2021;
Published: 30 September 2021.
Edited by:
Yaxing Ren, University of Warwick, United Kingdom
Reviewed by:
Du Gang, North China Electric Power University, China
Yunzheng Zhao, The University of Hong Kong, Hong Kong, SAR China
Copyright © 2021 Li, Li and Yu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Jiawen Li; Tao Yu