
ORIGINAL RESEARCH article

Front. Neurosci., 21 December 2023
Sec. Decision Neuroscience
This article is part of the Research Topic Theory and Application of Artificial Neural Networks in Control and Decision-making of Autonomous Systems

Realizing asynchronous finite-time robust tracking control of switched flight vehicles by using nonfragile deep reinforcement learning

Haoyu Cheng1*, Ruijia Song2, Haoran Li1, Wencheng Wei3, Biyu Zheng1 and Yangwang Fang1
  • 1Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an, China
  • 2Xi’an Modern Control Technology Research Institute, Xi’an, China
  • 3School of Astronautics, Northwestern Polytechnical University, Xi’an, China

In this study, a novel nonfragile deep reinforcement learning (DRL) method was proposed to realize the finite-time control of switched unmanned flight vehicles. Control accuracy, robustness, and intelligence were enhanced in the proposed control scheme by combining conventional robust control and DRL characteristics. In the proposed control strategy, the tracking controller consists of a dynamics-based controller and a learning-based controller. The conventional robust control approach for the nominal system was used for realizing a dynamics-based baseline tracking controller. The learning-based controller based on DRL was developed to compensate model uncertainties and enhance transient control accuracy. The multiple Lyapunov function approach and mode-dependent average dwell time approach were combined to analyze the finite-time stability of flight vehicles with asynchronous switching. The linear matrix inequalities technique was used to determine the solutions of dynamics-based controllers. Online optimization was formulated as a Markov decision process. The adaptive deep deterministic policy gradient algorithm was adopted to improve efficiency and convergence. In this algorithm, the actor–critic structure was used and adaptive hyperparameters were introduced. Unlike the conventional DRL algorithm, nonfragile control theory and adaptive reward function were used in the proposed algorithm to achieve excellent stability and training efficiency. We demonstrated the effectiveness of the presented algorithm through comparative simulations.

1 Introduction

Aerospace technology has developed rapidly since the 20th century (Wang et al., 2021; Giacomin and Hemerly, 2022; Wang and Xu, 2022). To satisfy the requirements of scientific exploration, military attack, transportation, industrial assistance, and other domains (Bao et al., 2021), flight vehicle systems are becoming increasingly complex (Wu et al., 2021; Lee and Kim, 2022). As an effective tool for the analysis of complex nonlinear systems, switched systems exhibit considerable potential for use in fast time-variation (Hu et al., 2019), full envelope, structural model mutation (Grigorie et al., 2022), re-modeling (Yue et al., 2019), among others (Chen et al., 2022; Yang et al., 2022).

A switched system consists of a family of discrete-time or continuous-time subsystems and a switching signal that governs the switching logic among them (Zhang et al., 2019). Switched systems exhibit considerable potential for use in theoretical research and engineering applications (Sun and Lei, 2021), such as modeling (Huang et al., 2020), stability analysis (Yang et al., 2020; Zhang and Zhu, 2020), and control problems (Gong et al., 2020; Xiao et al., 2020). The stability analysis of switched systems is typically the basis for controller design (Liu et al., 2020). The common Lyapunov function (CLF) method is widely used for stability analysis under arbitrary switching (Jiang et al., 2020). However, ensuring that a CLF is shared by all the subsystems remains challenging, and the method is conservative to some degree, which has motivated research on the multiple Lyapunov function (MLF) and average dwell time (ADT) methods. Zhao et al. (2012) first studied the stability of switched systems with ADT switching. In another study, the linear copositive function was extended to the MLF, and the multiple linear copositive Lyapunov function method was used to obtain a sufficient stability criterion for switched systems (Cheng et al., 2017). To obtain tighter bounds on the dwell time, the mode-dependent average dwell time (MDADT) method was proposed; it removes the need for all modes to share common parameters, a requirement that forces the ADT method to consider the worst case. The results were extended to a general case in which the properties of the individual subsystems were considered. Generally, unstable modes may exist during the switching intervals. Therefore, a piecewise multi-Lyapunov function method was proposed in Zhao et al. (2017) for the stability analysis of unstable modes. Within the MDADT framework, slow switching is typically applied to stable modes and fast switching to unstable modes, which avoids dwelling for a long time in subsystems with poor performance. Xu et al. (2019) proposed a time-dependent quadratic Lyapunov function method to solve the stability problem when all subsystems are unstable. The bounded maximum ADT method was used to obtain the stability conditions of the linear switched system. However, these studies have focused only on infinite-time stability, so the performance of the systems over a finite time interval cannot be guaranteed. Unlike conventional Lyapunov stability, finite-time stability (FTS) can guarantee superior transient performance over a finite interval. Wei et al. (2020) proposed a novel MDADT switching signal. The dynamic decomposition technique was used to generate the switching signals, and sufficient conditions for FTS were detailed. For nonlinear switched systems with time delay, the Lyapunov-Razumikhin approach and the Lyapunov-Krasovskii function method were used to investigate FTS problems (Wang et al., 2020). Furthermore, tracking control is widely applied in flight vehicles (Liu et al., 2021). The finite-time tracking control problems studied in Wang et al. (2017) motivate further research on finite-time robust tracking control of switched flight vehicles.

The tracking control problem for uncertain systems has been investigated along three lines (Liu et al., 2019; Chen et al., 2020; Lu et al., 2022): (1) constant parameter control, such as robust control, proportional integral derivative control, and optimal control, in which the worst case is considered for the bounded uncertainties and disturbances; (2) variable parameter control, such as adaptive and observer-based controls, in which the uncertainties and disturbances are compensated in real time; (3) learning-based control policy, such as reinforcement learning, which compensates uncertainties without prior knowledge and learns a control law through trial and error. In constant parameter control, the model uncertainties and external disturbances are assumed to be bounded with known boundaries, which results in performance degradation and conservative control laws. The variable parameter control method can be used to mitigate the problem of time-varying uncertainties with unknown boundaries. However, the model uncertainties are assumed to be linearly parameterized with a predefined structure and unknown time-varying parameters. The learning-based control method can be used for addressing system uncertainties with unknown boundaries and unknown structures (Yuan et al., 2017). However, this method cannot ensure stability, and its computational complexity is higher. A novel model-reference adaptive law and a switching logic were developed for uncertain switched systems. Ban et al. (2018) designed an H∞ controller for polytopic uncertain switched systems. Introducing scalar parameters reduced the conservatism of the linear matrix inequality (LMI) conditions and simultaneously ensured robust H∞ performance of the system. The problems of nonfragile control for nonlinear switched systems considering actuator failures and parametric uncertainties were studied in Sakthivel et al. (2018).
The Lyapunov-Krasovskii function method and the ADT approach were used to design a nonfragile reliable sampled-data controller. These studies have focused on control in an ideal environment. However, in practice, because of limited network bandwidth, network delays and packet loss always exist, which cause inevitable asynchronous switching. Thus, the controller switching lags behind the switching of the system mode. This phenomenon results in performance degradation and even instability. Li and Deng (2018) investigated the pth moment exponential input-to-state stability (ISS) of switched systems with asynchronous switching. The indefinite differentiable Lyapunov function was combined with ADT to establish the ISS conditions of switched systems with Lévy noise. The results of Zhang and Zhu (2019) were generalized in Li and Deng (2018), and the ISS, stochastic-ISS, and integral-ISS problems for asynchronously switched systems were investigated. Fast ADT switching was introduced to mitigate the increase in the Lyapunov-Krasovskii function when the active subsystem matches the controller. However, in most existing results on controller design for flight vehicles, although stability and robustness can be attained, achieving optimal control performance in real time remains challenging.

With improvements in the computing power of hardware, machine learning has been widely applied in many fields, including the control field (Cheng and Zhang, 2018; Guo et al., 2019; Gheisarnejad and Khooban, 2021). Xu et al. (2019) proposed a model-driven deep deterministic policy gradient (DDPG) algorithm for robotic multi-peg-in-hole assembly to avoid the analysis of the contact model. A feedback strategy and a fuzzy reward function were proposed to improve data efficiency and learning efficiency. In Tailor and Izzo (2019), the optimal trajectory for a quadcopter model in two dimensions was investigated. A near-optimal policy was proposed to construct trajectories that satisfy Pontryagin's principle of optimality through supervised learning. As aircraft performance improves, guidance and control systems are required to be fast, stable, and robust. Deep learning and reinforcement learning offer an effective solution to problems of this kind that cannot be solved using conventional control. Cheng et al. (2019) and Gaudet et al. (2020) studied the fuel-optimal landing problems based on DRL. The optimal control algorithms were designed considering the uncertainties of environment and system parameters by using deep neural networks and policy gradient methods to ensure the real-time performance and optimality of the landing mission. The design of the reward function is a critical factor for controller/filter design with DRL. It determines the final performance of the trained networks, yet it has not been treated satisfactorily. This study is motivated to solve this problem.

However, the methods proposed in Tailor and Izzo (2019) and Gaudet et al. (2020) could not ensure the robustness and stability of the given system. Considering the advantages and limitations of the model-based and model-free methods, we propose a novel nonfragile DRL method for achieving asynchronous finite-time robust tracking control of switched flight vehicles. In this method, a compromise is realized between system stability, robustness, and rapidity. The intelligent controller based on nonfragile H∞ control and DRL is proposed to compensate model uncertainties and realize superior control performance. The FTS and finite-time robustness are realized by nonfragile H∞ control, and the transient performance is optimized by using the adaptive deep deterministic policy gradient (ADDPG) algorithm. Because of the significance of reward function design in the training process, adaptive hyperparameters are introduced to construct a generalized reward function to improve the performance and achieve robustness. The contributions of the paper can be summarized as follows:

1. A novel control structure consisting of dynamics-based and learning-based controllers was proposed for the finite-time tracking control of switched flight vehicles. Robust control focuses on the worst case of the uncertainties, so transient performance is not ensured. A learning-based method, such as DRL, can address uncertainties with unknown boundaries and structures, but stability is not guaranteed. The proposed structure combines the advantages of the conventional robust control method and pure DRL: the DRL enhances control performance without exploiting the structures or boundaries of the uncertainties, and the robustness is guaranteed by model-based robust control. Thus, a compromise between robustness and dynamic performance was achieved.

2. The stability and robustness of closed-loop system were guaranteed by using non-fragile control theory. The restricted DRL algorithm was proposed, in which the boundaries of scheduling intervals were predefined. The scheduling of parameters can be viewed as the perturbation of parameters within a given interval. Compared with pure DRL, the proposed method improved training efficiency and ensured stability of the closed-loop system.

3. Adaptive reward functions were proposed to realize rapid training convergence. The reward function is crucial for the DRL algorithm. Conventionally, the design of the reward function depends on the experience of the researchers, which degrades training efficiency and requires trial and error. Therefore, in the proposed method, adaptive factors for the reward functions were used to improve training efficiency.

The rest of the paper is organized as follows. In Section 2, the structure of the intelligent switched controllers is presented. In Section 3, the finite-time robust tracking control algorithm using DRL and H∞ control is proposed. A numerical example is provided in Section 4. Finally, Section 5 presents the summary and directions for future studies.

2 Problem statement

The HiMAT vehicle, an unmanned flight vehicle, is studied. Its nonlinear model is described by Eq. (1).

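$$
\begin{cases}
m_f \dot{v} = T\cos\alpha - D - m_f g \sin(\theta - \alpha) \\
\dot{\alpha} = -\dfrac{T\sin\alpha}{m_f v} - \dfrac{L}{m_f v} + q + \dfrac{g\cos(\theta - \alpha)}{v} \\
\dot{\varphi} = q \\
I_y \dot{q} = M_{yy} \\
\dot{h} = v\sin(\theta - \alpha) \\
\theta = \varphi - \alpha
\end{cases} \tag{1}
$$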

where $m_f$ and $v$ denote the mass and velocity of the flight vehicle, respectively. Here, $\alpha$, $\theta$, $\varphi$, and $q$ are the attack angle, flight path angle, pitch angle, and pitch rate, respectively; $M_{yy}$ and $I_y$ are the pitch moment and the moment of inertia about the pitch axis, respectively; and $g$ denotes the gravitational acceleration. The notations $T$, $D$, and $L$ represent the thrust, drag force, and lift force, which can be expressed as follows:

$$
\begin{cases}
T = Q S C_T \\
D = Q S C_D \\
L = Q S C_L
\end{cases} \tag{2}
$$

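where $C_L(\alpha) = C_{L0}^{0} + C_{L1}^{\alpha}\alpha$, $C_D(\alpha) = C_{D1}^{0} + C_{D2}^{\alpha}\alpha + C_{D3}^{\alpha^2}\alpha^2$, $C_T(\delta_c) = C_T^{0} + C_T^{\delta_c}\delta_c$, and $Q = 0.5\rho V^2$, in which $\rho$ and $\delta_c$ are the air density and the throttle setting, respectively.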

Based on Jacobian linearization, the nonlinear model of HiMAT vehicle can be converted into the linear model to bridge the connection between complex nonlinear and linear models. Therefore, the longitudinal short-period model of the HiMAT vehicle can be modeled as switched systems as follows:

$$
\begin{cases}
x(k+1) = A_i x(k) + B_i u(k) + D_i \omega(k) \\
y(k) = C_i x(k)
\end{cases} \tag{3}
$$

where $x(k) = [\alpha \;\; q]^T \in \mathbb{R}^{x}$ is the state vector; $\omega(k) \in \mathbb{R}^{\omega}$ is the external disturbance, which belongs to $L_2[0, \infty)$; $u(k) = [\delta_e \;\; \delta_v \;\; \delta_c]^T \in \mathbb{R}^{u}$ is the control input, with $\delta_e$, $\delta_v$, and $\delta_c$ representing the elevator, elevon, and canard deflections; and $y(k) \in \mathbb{R}^{y}$ is the output signal. Here, $\sigma(k) = i \in \Omega = \{1, 2, \ldots, n\}$ is the switching function, which is piecewise constant, and $n > 1$ is the number of subsystems. The subsystem dynamics are assumed to depend on the switching signal, which is known a priori. Here, $A_i$, $B_i$, $C_i$, and $D_i$ are system matrices with appropriate dimensions.

In a networked environment, packet dropouts should be considered because of the limited network bandwidth. The packet dropouts in the sensor-to-controller channel are assumed to follow a Bernoulli distribution (Cheng et al., 2018). Therefore, the measured output is described as follows:

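$$
\begin{cases}
\tilde{y}(k) = \theta(k)\, y(k) \\
\mathrm{Prob}\{\theta(k) = 1\} = E\{\theta(k)\} = \rho \\
\mathrm{Prob}\{\theta(k) = 0\} = 1 - E\{\theta(k)\} = 1 - \rho
\end{cases} \tag{4}
$$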

where $\tilde{y}(k)$ is the measured output, $\theta(k)$ is a stochastic variable that follows the Bernoulli distribution and takes values in $\{0, 1\}$, and $\rho \in [0, 1]$ is the probability that a packet is received, so that $1 - \rho$ is the probability of a packet dropout.
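The dropout model in Eq. (4) can be illustrated with a short simulation. The following is a minimal sketch, not the authors' implementation: the function names `measured_output` and `arrival_rate` are hypothetical, and the value of ρ used in the usage note is an arbitrary illustrative choice.

```python
import numpy as np

def measured_output(y, rho, rng):
    """Return (theta * y, theta) with theta ~ Bernoulli(rho), as in Eq. (4).

    theta = 1 means the packet arrives; theta = 0 means it is dropped.
    """
    theta = 1.0 if rng.random() < rho else 0.0
    return theta * y, theta

def arrival_rate(rho, n_steps=20000, seed=0):
    """Empirical mean of theta(k), which should approach E{theta(k)} = rho."""
    rng = np.random.default_rng(seed)
    y = np.array([1.0])  # placeholder output sample (assumption)
    thetas = [measured_output(y, rho, rng)[1] for _ in range(n_steps)]
    return float(np.mean(thetas))
```

For example, `arrival_rate(0.8)` should return a value close to 0.8, which matches the expectation in Eq. (4).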

The control structure of switched flight vehicles to ensure stability and improve transient performance is displayed in Figure 1.


Figure 1. Structure of the controller.

The controller diagram reveals that the controller is composed of two parts:

$$
u(k) = u_n(k) + u_c(k) \tag{5}
$$

where $u_n(k)$ is the dynamics-based controller and $u_c(k)$ is the learning-based controller, which are developed based on finite-time $H_\infty$ control and DRL, respectively. The FTS and the prescribed attenuation index are ensured by $u_n(k)$, whose parameters can be obtained by the LMI technique. The transient performance is improved by $u_c(k)$, whose parameters are scheduled by the ADDPG algorithm.

The tracking error of the output is defined as $e(k) = r_c(k) - y(k)$, and the objective of tracking control is as follows:

$$
\lim_{k \to \infty} e(k) = 0 \tag{6}
$$

where $r_c(k)$ denotes the command signal.

We set the integral of the tracking error as follows:

$$
g(k) = \sum_{l=0}^{k-1} e(l) = \sum_{l=0}^{k-1} \left[ r_c(l) - y(l) \right] \tag{7}
$$

The feedback controller is proposed as follows:

$$
u_n(k) = K_{n,i}\,\tilde{x}(k) = K_{n,1i}\,x(k) + K_{n,2i}\,g(k) \tag{8}
$$

where $\tilde{x}(k) = [x^T(k) \;\; g^T(k)]^T$, and $K_{n,1i}$ and $K_{n,2i}$ are the gain matrices to be determined.
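The state-plus-integral feedback structure of Eqs. (7) and (8) can be sketched for a single subsystem without disturbances or packet dropouts. This is a minimal illustration under stated assumptions: the function name `simulate_tracking` is hypothetical, and the scalar plant and gains in the usage note are hand-picked for illustration rather than obtained from the LMI conditions of this paper.

```python
import numpy as np

def simulate_tracking(A, B, C, K1, K2, r, n_steps):
    """Simulate x(k+1) = A x + B u with u = K1 x + K2 g and g(k+1) = g(k) + e(k).

    Returns the sequence of tracking errors e(k) = r - C x(k).
    """
    x = np.zeros(A.shape[0])
    g = np.zeros(C.shape[0])      # g(0) = 0: the sum in Eq. (7) is empty at k = 0
    errors = []
    for _ in range(n_steps):
        e = r - C @ x             # tracking error
        u = K1 @ x + K2 @ g       # feedback law of Eq. (8)
        x = A @ x + B @ u         # nominal plant update
        g = g + e                 # integral of the tracking error
        errors.append(e.copy())
    return np.array(errors)
```

With the illustrative values `A = [[0.9]]`, `B = [[1.0]]`, `C = [[1.0]]`, `K1 = [[-0.4]]`, `K2 = [[0.2]]`, and a constant command `r = [1.0]`, the closed loop is stable and the integral action drives the tracking error toward zero, consistent with the objective in Eq. (6).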

The nominal controller parameters $K_{n,1i}$ and $K_{n,2i}$ can be designed by $H_\infty$ control, and the variation interval of the learning-based controller $u_c(k)$ in subsystem $i$ can be regarded as an additional bounded uncertainty of the dynamics-based controller. Thus, the parameters vary in the interval $[K_{n,i} - \Delta\bar{K}_{c,i},\; K_{n,i} + \Delta\bar{K}_{c,i}]$, and the stability of the learning-based controller can be analyzed by using nonfragile control theory. Here, $\Delta K_{c,i}$ is defined as the additional compensation, which yields the actual gain matrices as follows:

$$
K_i = K_{n,i} + \Delta K_{c,i} \tag{9}
$$

where $-\Delta\bar{K}_{c,i}$ and $\Delta\bar{K}_{c,i}$ denote the lower and upper bounds of $\Delta K_{c,i}$. We set $\Delta K_{c,i} = M_i F_i N_i$, where $M_i$ and $N_i$ are known matrices with appropriate dimensions and $F_i$ are uncertain matrices satisfying the following inequality:

$$
F_i^T F_i \leq I \tag{10}
$$
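The norm-bounded structure of the gain compensation in Eqs. (9) and (10) can be sketched as follows. This is a minimal illustration, assuming that a raw learning-based action is mapped to an admissible uncertain matrix by rescaling; the rescaling step and the names `project_to_unit_ball` and `scheduled_gain` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def project_to_unit_ball(F):
    """Rescale F so that its spectral norm is at most 1, i.e., F^T F <= I."""
    s = np.linalg.norm(F, 2)  # largest singular value of F
    return F if s <= 1.0 else F / s

def scheduled_gain(K_n, M, N, F_raw):
    """Return (K, F) with K = K_n + M F N, as in Eq. (9), and F admissible per Eq. (10)."""
    F = project_to_unit_ball(F_raw)
    return K_n + M @ F @ N, F
```

Because the perturbation always has the form $M_i F_i N_i$ with $F_i^T F_i \leq I$, any gain produced this way stays inside the set covered by the nonfragile analysis, whatever raw action the learning-based part proposes.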

Remark 1: The model of the flight vehicle can be given based on switched systems. The variation of states in the envelope can be viewed as the switching between subsystems. The tracking controller is composed of two parts: the dynamics-based controller $u_n(k)$, which is developed based on finite-time $H_\infty$ control to ensure stability and the prescribed attenuation index, and the learning-based controller $u_c(k)$, which is based on the ADDPG algorithm to achieve superior performance in real time. The output of $u_c(k)$ varies in a neighborhood of $u_n(k)$ with given bounds. Therefore, nonfragile control can be used to ensure stability in the presence of $u_c(k)$. As mentioned, ensuring stability, robustness, and optimal performance simultaneously remains difficult. To improve training efficiency, adaptive factors for the reward functions were applied in the DDPG algorithm. Inspired by the achievements in the DDPG algorithm and robust control, the advantages of the model-based method ($H_\infty$ control) and the model-free method (DRL) were combined to address the problem.

Remark 2: The compensation of the learning-based controller is considered as an additional gain on the controller parameters with known bounds, which can be predefined and represented by $M_i$ and $N_i$. The optimal control policy can be realized within the scheduling interval by using the ADDPG algorithm.

The switching of the controller always lags the switching of the system mode because of packet dropouts. The $i$th subsystem is assumed to be activated at $k_i$, and the controller of the $i$th subsystem is activated at $k_i + \Delta_i$, where $\Delta_i$ denotes the length of the unmatched period. The condition in which unmatched and matched periods both exist is called asynchronous switching. The Lyapunov-like function decreases in matched periods and increases in unmatched periods with bounded rates, where $a_i$ represents the decreasing rate in matched periods and $b_i$ represents the increasing rate in unmatched periods. The increasing coefficients of the Lyapunov-like function at switching instants are denoted by $\mu_i$.

For proof, the following assumptions are introduced.

Assumption 1 (Cheng et al., 2017): For a given positive constant $N_f$, the time-varying exogenous disturbance $\omega(k)$ satisfies the following inequality:

$$
\sum_{k=0}^{N_f} \omega^T(k)\,\omega(k) \leq \bar{\omega} \tag{11}
$$

where $\bar{\omega}$ is the upper bound of the external disturbance.

Assumption 2 (Cheng et al., 2017): The maximum number of consecutive data missing is set to be $N_1$, and the maximum probability of data missing is set to be $\bar{\rho}$.

According to the aforementioned statement, the closed-loop switched systems can be described as follows:

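$$
\begin{aligned}
&\begin{cases}
\tilde{x}(k+1) = \tilde{A}_{ii}\tilde{x}(k) + \tilde{\theta}(k)\tilde{A}_{1i}\tilde{x}(k) + \tilde{B}_i\tilde{\omega}(k) \\
e(k) = \tilde{C}_i\tilde{x}(k) + \tilde{\theta}(k)\tilde{C}_{1i}\tilde{x}(k) + \tilde{D}_i\tilde{\omega}(k)
\end{cases}, \quad k \in [k_i + \Delta_i,\, k_{i+1}) \\
&\begin{cases}
\tilde{x}(k+1) = \tilde{A}_{ij}\tilde{x}(k) + \tilde{\theta}(k)\tilde{A}_{1i}\tilde{x}(k) + \tilde{B}_i\tilde{\omega}(k) \\
e(k) = \tilde{C}_i\tilde{x}(k) + \tilde{\theta}(k)\tilde{C}_{1i}\tilde{x}(k) + \tilde{D}_i\tilde{\omega}(k)
\end{cases}, \quad k \in [k_i,\, k_i + \Delta_i)
\end{aligned} \tag{12}
$$

where $\tilde{\omega}(k) = [\omega^T(k) \;\; r_c^T(k)]^T$, $\tilde{A}_{ii} = \begin{bmatrix} A_i + B_i K_{1i} & B_i K_{2i} \\ -\rho C_i & I \end{bmatrix}$, $\tilde{A}_{1i} = \begin{bmatrix} 0 & 0 \\ -C_i & 0 \end{bmatrix}$, $\tilde{A}_{ij} = \begin{bmatrix} A_i + B_i K_{1j} & B_i K_{2j} \\ -\rho C_i & I \end{bmatrix}$, $\tilde{B}_i = \begin{bmatrix} D_i & 0 \\ 0 & I \end{bmatrix}$, $\tilde{C}_i = \begin{bmatrix} -\rho C_i & 0 \end{bmatrix}$, $\tilde{C}_{1i} = \begin{bmatrix} -C_i & 0 \end{bmatrix}$, $\tilde{D}_i = \begin{bmatrix} 0 & I \end{bmatrix}$, and $\tilde{\theta}(k) = \theta(k) - \rho$.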
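The augmented matrices of the matched closed loop in Eq. (12) can be assembled numerically as a sanity check. The sketch below is an assumption-laden illustration: it takes the tracking error to be formed from the measured output, so that $e(k) = r_c(k) - \theta(k) C_i x(k)$ with $\tilde{\theta}(k) = \theta(k) - \rho$, and the function name `augmented_matrices` is hypothetical.

```python
import numpy as np

def augmented_matrices(A, B, C, D, K1, K2, rho):
    """Build the matched-mode matrices of Eq. (12) for one subsystem.

    Assumed sign convention: e(k) = r_c(k) - theta(k) C x(k), theta_tilde = theta - rho.
    """
    nx, ny, nw = A.shape[0], C.shape[0], D.shape[1]
    A_t = np.block([[A + B @ K1, B @ K2],
                    [-rho * C, np.eye(ny)]])
    A1_t = np.block([[np.zeros((nx, nx)), np.zeros((nx, ny))],
                     [-C, np.zeros((ny, ny))]])
    B_t = np.block([[D, np.zeros((nx, ny))],
                    [np.zeros((ny, nw)), np.eye(ny)]])
    C_t = np.hstack([-rho * C, np.zeros((ny, ny))])
    C1_t = np.hstack([-C, np.zeros((ny, ny))])
    D_t = np.hstack([np.zeros((ny, nw)), np.eye(ny)])
    return A_t, A1_t, B_t, C_t, C1_t, D_t
```

One step of the augmented model, `A_t @ xt + (theta - rho) * (A1_t @ xt) + B_t @ wt` with `xt = [x; g]` and `wt = [w; r_c]`, should then reproduce a direct update of the plant state and the error integral for both `theta = 0` and `theta = 1`.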

Furthermore, the definitions of finite-time stability, finite-time boundedness, and finite-time $H_\infty$ performance for switched systems are expressed as follows:

Definition 1 (Wei et al., 2020): For a given positive definite matrix $R_s$ and positive constants $c_1$, $c_2$, and $N_f$ with $c_1 < c_2$, the switched system in Eq. (12) with $u(k) \equiv 0$ and $\omega(k) \equiv 0$ is finite-time stable with respect to $(c_1, c_2, N_f, R_s)$ if Eq. (13) holds.

$$
x^T(k_0) R_s x(k_0) \leq c_1 \;\Rightarrow\; x^T(k) R_s x(k) \leq c_2, \quad \forall k \in \{1, 2, \ldots, N_f\} \tag{13}
$$

Definition 2 (Wei et al., 2020): For a given positive definite matrix $R_s$ and constants $c_1 > 0$, $c_2 > 0$, $\bar{\omega}$, and $N_f$ with $c_1 < c_2$, the switched system in Eq. (12) is finite-time bounded (FTB) with respect to $(c_1, c_2, \bar{\omega}, N_f, R_s)$ if the following expression holds:

$$
x^T(k_0) R_s x(k_0) \leq c_1 \;\Rightarrow\; x^T(k) R_s x(k) \leq c_2, \quad \forall k \in \{1, 2, \ldots, N_f\} \tag{14}
$$

where the external disturbance satisfies Assumption 1.

Definition 3 (Wei et al., 2020): For a given positive definite matrix $R_s$ and constants $c_1 > 0$, $c_2 > 0$, $\bar{\omega}$, and $N_f$ with $c_1 < c_2$, the system in Eq. (12) exhibits finite-time $H_\infty$ performance $\gamma_d$ if the system is FTB and satisfies the following expression:

$$
\sum_{s=0}^{N_f} e^T(s)\,e(s) \leq \gamma_d^2 \sum_{s=0}^{N_f} \tilde{\omega}^T(s)\,\tilde{\omega}(s) \tag{15}
$$
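Definitions 2 and 3 can be checked empirically on a recorded trajectory. The following is a minimal sketch that only tests one simulated run and therefore cannot replace the LMI-based guarantees; the function names `finite_time_bounded` and `hinf_gain_ok` are hypothetical, and the trajectory layout (one row per time step from $k_0$ to $N_f$) is an assumption.

```python
import numpy as np

def finite_time_bounded(xs, R, c1, c2):
    """Check the implication of Eq. (14) on one trajectory xs of shape (N_f + 1, n)."""
    q = np.einsum('ki,ij,kj->k', xs, R, xs)  # x(k)^T R x(k) for every k
    if q[0] > c1:
        return True  # premise of the implication not met, so nothing to check
    return bool(np.all(q[1:] <= c2))

def hinf_gain_ok(es, ws, gamma_d):
    """Check Eq. (15): sum of e^T e <= gamma_d^2 * sum of w^T w over the horizon."""
    return float(np.sum(es ** 2)) <= gamma_d ** 2 * float(np.sum(ws ** 2))
```

A trajectory that starts inside the $c_1$ level set of $x^T R_s x$ and never leaves the $c_2$ level set passes the first check; the second check compares the accumulated error energy with the accumulated energy of $\tilde{\omega}$.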

Thus, the main purpose of the controller design is to ensure that the switched system is FTS with the prescribed $H_\infty$ performance $\gamma_d$ with respect to $(c_1, c_2, \bar{\omega}, N_f, R_s)$, which is equivalent to designing a robust controller such that the following conditions are satisfied:

1. The switched system in Eq. (12) is FTB.

2. For a given constant $\gamma_d > 0$, the system in Eq. (12) satisfies Eq. (15) under the zero-initial condition for all external disturbances satisfying Eq. (11).

Based on the structure of control diagram, the design process is categorized into two steps:

Step 1: The scheduling interval of the control parameters is treated as an uncertain compensation of the dynamics-based controller. Considering the controller uncertainties and the asynchronous switching caused by packet dropouts, the finite-time H∞ controllers are derived as the dynamics-based controller according to nonfragile control theory and finite-time robust control theory in terms of LMIs.

Step 2: The variations of controller parameters are assumed to be the action, and the dynamic model of flight vehicles is assumed to be the environment. The DRL algorithm was introduced to derive the learning-based controller to realize optimal control policy, in which the ADDPG algorithm was proposed as the model-free method in the actor–critic framework.

3 Main results

A dynamics-based controller is proposed to ensure stability and a prescribed performance index. The ADDPG algorithm is developed to improve performance and to ensure that the controllers can adaptively schedule their parameters.

3.1 Dynamics-based controller design

Definition 4 (Zhao et al., 2017): Given a switching signal $\sigma(k)$ and any $0 \leq k_1 \leq k_2$, let $N_{\sigma i}(k_1, k_2)$ be the number of times that the $i$th subsystem is activated over the time interval $[k_1, k_2]$, and let $T_i(k_1, k_2)$ denote the total running time of the $i$th subsystem during the time interval $[k_1, k_2]$, $i \in \Omega$. If positive numbers $N_{0i}$ and $\tau_{ai}$ exist such that

$$
N_{\sigma i}(k_1, k_2) \leq N_{0i} + \frac{T_i(k_1, k_2)}{\tau_{ai}} \tag{16}
$$

then $\tau_{ai}$ is called the MDADT and $N_{0i}$ is called the mode-dependent chatter bound.
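Definition 4 can be checked directly on a sampled switching sequence. The sketch below is a brute-force illustration under stated assumptions: time is counted in samples, a run of the mode that is already active at the start of a window counts as one activation, and the function names `mode_statistics` and `satisfies_mdadt` are hypothetical.

```python
def mode_statistics(sigma, mode):
    """Return (number of activations, total running time) of `mode` in the sequence sigma."""
    activations = 0
    run_time = 0
    prev = None
    for s in sigma:
        if s == mode:
            run_time += 1
            if prev != mode:
                activations += 1  # a new run of `mode` starts here
        prev = s
    return activations, run_time

def satisfies_mdadt(sigma, mode, tau_a, n0):
    """Check Eq. (16) for `mode` over every window [k1, k2) of the sequence."""
    n = len(sigma)
    for k1 in range(n):
        for k2 in range(k1 + 1, n + 1):
            n_act, t_run = mode_statistics(sigma[k1:k2], mode)
            if n_act > n0 + t_run / tau_a:
                return False
    return True
```

For the sequence `[1, 1, 1, 2, 2, 1, 1, 1, 2, 2]`, mode 1 satisfies the bound with `tau_a = 2.0` and `n0 = 1.0` but violates it with `tau_a = 3.0`, because a window that only touches the end of the first run and the start of the second contains two activations and only two samples of running time.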

Lemma 1 (Cheng et al., 2017): For a given symmetric matrix $Y$ and matrices $F$, $\tilde{M}$, and $\tilde{N}$, if a scalar $\varepsilon > 0$ exists such that

$$
Y + \varepsilon^{-1}\tilde{M}^T\tilde{M} + \varepsilon\tilde{N}^T\tilde{N} < 0 \tag{17}
$$

then we can obtain the following:

$$
Y + \tilde{M}^T F \tilde{N} + \tilde{N}^T F^T \tilde{M} < 0 \tag{18}
$$

where $F$ satisfies $F^T F < I$.

Lemma 2 (Aristidou et al., 2014): Consider a given matrix $Q$ partitioned as

$$
Q = \begin{bmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{bmatrix} \tag{19}
$$

where $Q_{12} = Q_{21}^T$, and $Q_{11}$ and $Q_{22}$ are invertible symmetric matrices. Then the following three conditions are equivalent, a result known as the Schur complement:

(1) $Q < 0$; (2) $Q_{11} < 0$, $Q_{22} - Q_{12}^T Q_{11}^{-1} Q_{12} < 0$; (3) $Q_{22} < 0$, $Q_{11} - Q_{12} Q_{22}^{-1} Q_{12}^T < 0$.
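The equivalence stated in Lemma 2 can be verified numerically on small examples. The following is a minimal sketch in which the three conditions are the standard Schur complement conditions for a symmetric block matrix; the helper names `is_neg_def` and `schur_conditions` are hypothetical.

```python
import numpy as np

def is_neg_def(X, tol=1e-10):
    """True if the symmetric part of X is negative definite."""
    return bool(np.max(np.linalg.eigvalsh((X + X.T) / 2)) < -tol)

def schur_conditions(Q11, Q12, Q22):
    """Evaluate the three Schur complement conditions for Q = [[Q11, Q12], [Q12^T, Q22]]."""
    Q = np.block([[Q11, Q12], [Q12.T, Q22]])
    c1 = is_neg_def(Q)
    c2 = is_neg_def(Q11) and is_neg_def(Q22 - Q12.T @ np.linalg.solve(Q11, Q12))
    c3 = is_neg_def(Q22) and is_neg_def(Q11 - Q12 @ np.linalg.solve(Q22, Q12.T))
    return c1, c2, c3
```

For any symmetric partition, the three returned flags should agree: all true when $Q$ is negative definite and all false otherwise.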

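Theorem 1: Given the system in Eq. (12) and constant scalars $0 < a_i < 1$, $b_i > 0$, $\mu_i \geq 1$, and $\gamma > 0$, if there exist matrices $S_i > 0$, $S_j > 0$, $S_{ij} > 0$, and $W_i$, $\forall i, j \in \Omega$, $i \neq j$, such that

$$
S_j \leq \mu_i S_i \tag{20}
$$

$$
\begin{bmatrix}
-S_i & 0 & \tilde{A}_{ii} S_i & \tilde{B}_i \\
* & -S_i & \tilde{\rho}\tilde{A}_{1i} S_i & 0 \\
* & * & -(1 - a_i) S_i & 0 \\
* & * & * & -\gamma^2 W_i
\end{bmatrix} < 0 \tag{21}
$$

$$
\begin{bmatrix}
-S_{ij} & 0 & \tilde{A}_{ij} S_j & \tilde{B}_i \\
* & -S_{ij} & \tilde{\rho}\tilde{A}_{1i} S_j & 0 \\
* & * & (1 + b_i)(S_{ij} - S_j - S_j^T) & 0 \\
* & * & * & -\gamma^2 W_i
\end{bmatrix} < 0 \tag{22}
$$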

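then the switched system in Eq. (12) is FTB with respect to $(c_1, c_2, \bar{\omega}, N_f, R_s)$ if the MDADT satisfies the following conditions:

$$
\tau_{ai} \geq \tau_{ai}^{*} = \frac{N_f \ln\mu_i + N_f \Delta_i \ln(\tilde{b}_i/\tilde{a}_i)}{\ln(c_2/\eta_1) - \ln\!\left(c_1/\eta_2 + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega}\right) - N_f \ln\tilde{a}_i} \tag{23}
$$

$$
\left(c_1/\eta_2 + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega}\right)\tilde{a}_i^{N_f} < c_2/\eta_1 \tag{24}
$$

where $\eta_1 = \max_{i \in \Omega}\{\lambda_{\max}(\bar{S}_i), \lambda_{\max}(\bar{S}_{ij})\}$, $\eta_2 = \min_{i \in \Omega}\{\lambda_{\min}(\bar{S}_i), \lambda_{\min}(\bar{S}_{ij})\}$, $\eta_3 = \lambda_{\max}(W_i)$, $\tilde{\rho} = \sqrt{\rho(1 - \rho)}$, $\bar{S}_i = R_s^{1/2} S_i R_s^{1/2}$, $\bar{S}_j = R_s^{1/2} S_j R_s^{1/2}$, $\tilde{a}_i = 1 - a_i$, $\tilde{b}_i = 1 + b_i$, and $\tilde{b}_{\max} = \max_i \tilde{b}_i$.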

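Proof: For a positive constant $k$, we define $k_0 = 0$ and $k_1, k_2, \ldots, k_i, k_{i+1}, \ldots, k_n$ as the switching instants over the interval $[0, k]$. Suppose that the following Lyapunov functions exist:

$$
V_i(k) = \tilde{x}^T(k) P_i \tilde{x}(k) \tag{25}
$$

and that class $\kappa$ functions $\kappa_{1i}$ and $\kappa_{2i}$ exist such that

$$
\kappa_{1i}(\|\tilde{x}(k)\|) \leq V_i(k) \leq \kappa_{2i}(\|\tilde{x}(k)\|) \tag{26}
$$

$$
\Delta V_i(k) \leq
\begin{cases}
-a_i V_i(k), & k \in [k_0, k_1) \cup [k_i + \Delta_i,\, k_{i+1}) \\
b_i V_i(k), & k \in [k_i,\, k_i + \Delta_i)
\end{cases} \tag{27}
$$

$$
V_i(k) \leq \mu_i V_j(k) \tag{28}
$$

where $P_i > 0$ are Lyapunov matrices.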

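Defining $\xi(k) = [\tilde{x}^T(k) \;\; \tilde{\omega}^T(k)]^T$ and combining Eqs. (12) and (27), we can obtain the following expressions:

$$
\begin{aligned}
&\Delta V_i(k) + a_i V_i(k) - \gamma^2\tilde{\omega}^T(k) W_i \tilde{\omega}(k) \\
&= V_i(k+1) - V_i(k) + a_i V_i(k) - \gamma^2\tilde{\omega}^T(k) W_i \tilde{\omega}(k) \\
&= E\{\tilde{x}^T(k+1) P_i \tilde{x}(k+1)\} - (1 - a_i)\tilde{x}^T(k) P_i \tilde{x}(k) - \gamma^2\tilde{\omega}^T(k) W_i \tilde{\omega}(k) \\
&= \xi^T(k)\left\{ \begin{bmatrix} \tilde{A}_{ii}^T \\ \tilde{B}_i^T \end{bmatrix} P_i \begin{bmatrix} \tilde{A}_{ii} & \tilde{B}_i \end{bmatrix} + \tilde{\rho}^2 \begin{bmatrix} \tilde{A}_{1i}^T \\ 0 \end{bmatrix} P_i \begin{bmatrix} \tilde{A}_{1i} & 0 \end{bmatrix} + \begin{bmatrix} -(1 - a_i) P_i & 0 \\ 0 & -\gamma^2 W_i \end{bmatrix} \right\}\xi(k) \\
&= \xi^T(k)\,\Pi_{ii}\,\xi(k)
\end{aligned} \tag{29}
$$

$$
\begin{aligned}
&\Delta V_i(k) - b_i V_i(k) - \gamma^2\tilde{\omega}^T(k) W_i \tilde{\omega}(k) \\
&= V_i(k+1) - V_i(k) - b_i V_i(k) - \gamma^2\tilde{\omega}^T(k) W_i \tilde{\omega}(k) \\
&= E\{\tilde{x}^T(k+1) P_{ij} \tilde{x}(k+1)\} - (1 + b_i)\tilde{x}^T(k) P_{ij} \tilde{x}(k) - \gamma^2\tilde{\omega}^T(k) W_i \tilde{\omega}(k) \\
&= \xi^T(k)\left\{ \begin{bmatrix} \tilde{A}_{ij}^T \\ \tilde{B}_i^T \end{bmatrix} P_{ij} \begin{bmatrix} \tilde{A}_{ij} & \tilde{B}_i \end{bmatrix} + \tilde{\rho}^2 \begin{bmatrix} \tilde{A}_{1i}^T \\ 0 \end{bmatrix} P_{ij} \begin{bmatrix} \tilde{A}_{1i} & 0 \end{bmatrix} + \begin{bmatrix} -(1 + b_i) P_{ij} & 0 \\ 0 & -\gamma^2 W_i \end{bmatrix} \right\}\xi(k) \\
&= \xi^T(k)\,\Pi_{ij}\,\xi(k)
\end{aligned} \tag{30}
$$

where, in Schur-complement form (Lemma 2),

$$
\Pi_{ii} = \begin{bmatrix}
-P_i & 0 & P_i\tilde{A}_{ii} & P_i\tilde{B}_i \\
* & -P_i & \tilde{\rho} P_i \tilde{A}_{1i} & 0 \\
* & * & -(1 - a_i) P_i & 0 \\
* & * & * & -\gamma^2 W_i
\end{bmatrix},
$$

$$
\Pi_{ij} = \begin{bmatrix}
-P_{ij} & 0 & P_{ij}\tilde{A}_{ij} & P_{ij}\tilde{B}_i \\
* & -P_{ij} & \tilde{\rho} P_{ij} \tilde{A}_{1i} & 0 \\
* & * & -(1 + b_i) P_{ij} & 0 \\
* & * & * & -\gamma^2 W_i
\end{bmatrix}.
$$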

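Setting $S_i = P_i^{-1}$ and $S_{ij} = P_{ij}^{-1}$, and performing congruence transformations on $\Pi_{ii} < 0$ and $\Pi_{ij} < 0$ with the matrices $\mathrm{diag}\{S_i, S_i, S_i, I\}$ and $\mathrm{diag}\{S_{ij}, S_{ij}, S_j, I\}$, respectively, we can obtain the following expressions:

$$
\begin{bmatrix}
-S_i & 0 & \tilde{A}_{ii} S_i & \tilde{B}_i \\
* & -S_i & \tilde{\rho}\tilde{A}_{1i} S_i & 0 \\
* & * & -(1 - a_i) S_i & 0 \\
* & * & * & -\gamma^2 W_i
\end{bmatrix} < 0 \tag{31}
$$

$$
\begin{bmatrix}
-S_{ij} & 0 & \tilde{A}_{ij} S_j & \tilde{B}_i \\
* & -S_{ij} & \tilde{\rho}\tilde{A}_{1i} S_j & 0 \\
* & * & -(1 + b_i) S_j^T S_{ij}^{-1} S_j & 0 \\
* & * & * & -\gamma^2 W_i
\end{bmatrix} < 0 \tag{32}
$$

The inequality $(S_{ij} - S_j)^T S_{ij}^{-1} (S_{ij} - S_j) \geq 0$ implies the following:

$$
S_{ij} - S_j - S_j^T \geq -S_j^T S_{ij}^{-1} S_j \tag{33}
$$

Eq. (31) coincides with Eq. (21), and, by Eq. (33), Eq. (22) guarantees Eq. (32). Consequently, the following expression holds true:

$$
\Delta V_i(k) \leq
\begin{cases}
-a_i V_i(k) + \gamma^2\tilde{\omega}^T(k) W_i \tilde{\omega}(k), & k \in [k_0, k_1) \cup [k_i + \Delta_i,\, k_{i+1}) \\
b_i V_i(k) + \gamma^2\tilde{\omega}^T(k) W_i \tilde{\omega}(k), & k \in [k_i,\, k_i + \Delta_i)
\end{cases} \tag{34}
$$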

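Combining Eqs. (25), (26), (28), (34), we can obtain the following expression by iteration:

$$
\begin{aligned}
V_{\sigma(k)}(k)
&\leq \tilde{a}_{\sigma(k_i+\Delta_i)}^{\,k-k_i-\Delta_i} V_{\sigma(k_i+\Delta_i)}(k_i+\Delta_i) + \gamma^2 \sum_{s=k_i+\Delta_i}^{k-1} \tilde{a}_{\sigma(k_i)}^{\,k-1-s}\,\tilde{\omega}^T(s) W_i \tilde{\omega}(s) \\
&\leq \mu_{\sigma(k_i)}\,\tilde{a}_{\sigma(k_i+\Delta_i)}^{\,k-k_i-\Delta_i}\,\tilde{b}_{\sigma(k_i)}^{\,\Delta_i} V_{\sigma(k_i)}(k_i+\Delta_i) + \gamma^2 \sum_{s=k_i+\Delta_i}^{k-1} \tilde{a}_{\sigma(k_i)}^{\,k-1-s}\,\tilde{\omega}^T(s) W_i \tilde{\omega}(s) \\
&\quad + \gamma^2 \sum_{s=k_i}^{k_i+\Delta_i-1} \tilde{a}_{\sigma(k_i)}^{\,k-k_i-\Delta_i}\,\tilde{b}_{\sigma(k_i)}^{\,k_i+\Delta_i-s-1}\,\tilde{\omega}^T(s) W_i \tilde{\omega}(s) \\
&= \mu_{\sigma(k_i)}\,\tilde{a}_{\sigma(k_i+\Delta_i)}^{\,k-k_i} \left(\frac{\tilde{b}_{\sigma(k_i)}}{\tilde{a}_{\sigma(k_i)}}\right)^{\Delta_i} V_{\sigma(k_i)}(k_i+\Delta_i) + \gamma^2 \sum_{s=k_i+\Delta_i}^{k-1} \tilde{a}_{\sigma(k_i)}^{\,k-1-s}\,\tilde{\omega}^T(s) W_i \tilde{\omega}(s) \\
&\quad + \gamma^2 \sum_{s=k_i}^{k_i+\Delta_i-1} \tilde{a}_{\sigma(k_i)}^{\,k-s-1} \left(\frac{\tilde{b}_{\sigma(k_i)}}{\tilde{a}_{\sigma(k_i)}}\right)^{k_i+\Delta_i-s-1} \tilde{\omega}^T(s) W_i \tilde{\omega}(s) \\
&\leq \mu_{\sigma(k_i)}\,\tilde{a}_{\sigma(k_i)}^{\,k-k_i} \left(\frac{\tilde{b}_{\sigma(k_i)}}{\tilde{a}_{\sigma(k_i)}}\right)^{\Delta_i} V_{\sigma(k_{i-1})}(k_i) + \gamma^2 \sum_{s=k_i}^{k-1} \tilde{b}_{\sigma(k_i)}^{\,k-1-s}\,\tilde{\omega}^T(s) W_i \tilde{\omega}(s) \\
&\leq \mu_{\sigma(k_i)}\,\tilde{a}_{\sigma(k_i)}^{\,k-k_i} \left(\frac{\tilde{b}_{\sigma(k_i)}}{\tilde{a}_{\sigma(k_i)}}\right)^{\Delta_i} \Bigg[ \mu_{\sigma(k_{i-1})}\,\tilde{a}_{\sigma(k_{i-1})}^{\,k_i-k_{i-1}} \left(\frac{\tilde{b}_{\sigma(k_{i-1})}}{\tilde{a}_{\sigma(k_{i-1})}}\right)^{\Delta_{i-1}} V_{\sigma(k_{i-1})}(k_{i-1}) \\
&\quad + \gamma^2 \sum_{s=k_{i-1}+\Delta_{i-1}}^{k_i-1} \tilde{b}_{\sigma(k_{i-1})}^{\,k_i-1-s}\,\tilde{\omega}^T(s) W_i \tilde{\omega}(s) \Bigg] + \gamma^2 \sum_{s=k_i}^{k-1} \tilde{b}_{\sigma(k_i)}^{\,k-1-s}\,\tilde{\omega}^T(s) W_i \tilde{\omega}(s) \\
&\leq \cdots \\
&\leq \prod_{l=1}^{i} \mu_l^{N_{\sigma,l}(k_0,k)}\,\tilde{a}_l^{\,T_l(k_0,k)} \left(\frac{\tilde{b}_l}{\tilde{a}_l}\right)^{\Delta_l N_{\sigma,l}(k_0,k)} V_{\sigma(k_0)}(k_0) \\
&\quad + \prod_{l=1}^{i} \mu_l^{N_{\sigma,l}(k_0,k)}\,\tilde{a}_l^{\,T_l(k_0,k)} \left(\frac{\tilde{b}_l}{\tilde{a}_l}\right)^{\Delta_l N_{\sigma,l}(k_0,k)} \gamma^2 \sum_{s=k_0+\Delta_0}^{k_1-1} \tilde{b}_{\sigma(k_0)}^{\,k_1-1-s}\,\tilde{\omega}^T(s) W_i \tilde{\omega}(s) \\
&\quad + \cdots + \gamma^2 \sum_{s=k_i+\Delta_i}^{k-1} \tilde{b}_i^{\,k-1-s}\,\tilde{\omega}^T(s) W_i \tilde{\omega}(s) \\
&\leq \prod_{i=1}^{n} \mu_i^{N_{\sigma,i}(k_0,k)}\,\tilde{a}_i^{\,T_i(k_0,k)} \left(\frac{\tilde{b}_i}{\tilde{a}_i}\right)^{\Delta_i N_{\sigma,i}(k_0,k)} V_{\sigma(k_0)}(k_0) \\
&\quad + \gamma^2 \tilde{b}_{\max}^{N_f} \sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \mu_i^{N_{\sigma,i}(s,k)}\,\tilde{a}_i^{\,T_i(s,k)} \left(\frac{\tilde{b}_i}{\tilde{a}_i}\right)^{\Delta_i N_{\sigma,i}(s,k)} \tilde{\omega}^T(s) W_i \tilde{\omega}(s) \\
&\leq \prod_{i=1}^{n} \mu_i^{N_{\sigma,i}(k_0,k)}\,\tilde{a}_i^{\,T_i(k_0,k)} \left(\frac{\tilde{b}_i}{\tilde{a}_i}\right)^{\Delta_i N_{\sigma,i}(k_0,k)} \left[ V_{\sigma(k_0)}(k_0) + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega} \right]
\end{aligned} \tag{35}
$$

With the definitions of $\eta_1$ and $\eta_2$, we have the following expression:

$$
V_{\sigma(k_0)}(k_0) = \tilde{x}^T(k_0) P_i \tilde{x}(k_0) = \tilde{x}^T(k_0) R_s^{1/2} \bar{S}_i^{-1} R_s^{1/2} \tilde{x}(k_0) \leq \frac{1}{\eta_2}\,\tilde{x}^T(k_0) R_s \tilde{x}(k_0) \tag{36}
$$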

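Moreover, using $\tilde{x}^T(k_0) R_s \tilde{x}(k_0) \leq c_1$, we can obtain the following expression:

$$
\begin{aligned}
\tilde{x}^T(k) R_s \tilde{x}(k)
&\leq \eta_1 V_{\sigma(k)}(k) \\
&\leq \eta_1 \prod_{i=1}^{n} \mu_i^{N_{\sigma,i}(k_0,k)}\,\tilde{a}_i^{\,T_i(k_0,k)} \left(\frac{\tilde{b}_i}{\tilde{a}_i}\right)^{\Delta_i N_{\sigma,i}(k_0,k)} \left( \frac{c_1}{\eta_2} + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega} \right) \\
&\leq \exp\left\{ \ln\eta_1 + \sum_{i=1}^{n} \left[ \frac{T_i(k_0,k)}{\tau_{ai}}\ln\mu_i + T_i(k_0,k)\ln\tilde{a}_i + \Delta_i \frac{T_i(k_0,k)}{\tau_{ai}} \ln\frac{\tilde{b}_i}{\tilde{a}_i} \right] + \ln\left( \frac{c_1}{\eta_2} + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega} \right) \right\} \\
&= \exp\left\{ \ln\eta_1 + \sum_{i=1}^{n} \left[ \frac{\ln\mu_i}{\tau_{ai}} + \ln\tilde{a}_i + \frac{\Delta_i \ln(\tilde{b}_i/\tilde{a}_i)}{\tau_{ai}} \right] T_i(k_0,k) + \ln\left( \frac{c_1}{\eta_2} + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega} \right) \right\}
\end{aligned}
$$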

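Based on Definition 2, we require $\tilde{x}^T(k) R_s \tilde{x}(k) \leq c_2$, which can be expressed as follows:

$$
\begin{aligned}
&\exp\left\{ \ln\eta_1 + \sum_{i=1}^{n} \left[ \frac{\ln\mu_i}{\tau_{ai}} + \ln\tilde{a}_i + \frac{\Delta_i \ln(\tilde{b}_i/\tilde{a}_i)}{\tau_{ai}} \right] T_i(k_0,k) + \ln\left( \frac{c_1}{\eta_2} + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega} \right) \right\} \leq c_2 \\
&\Leftarrow\; \frac{\ln\mu_i + \Delta_i \ln(\tilde{b}_i/\tilde{a}_i)}{\tau_{ai}}\, N_f \leq \ln\frac{c_2}{\eta_1} - \ln\left( \frac{c_1}{\eta_2} + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega} \right) - N_f \ln\tilde{a}_i \\
&\Leftrightarrow\; \tau_{ai} \geq \frac{N_f \ln\mu_i + N_f \Delta_i \ln(\tilde{b}_i/\tilde{a}_i)}{\ln(c_2/\eta_1) - \ln\left( c_1/\eta_2 + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega} \right) - N_f \ln\tilde{a}_i}
\end{aligned} \tag{37}
$$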

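If Eqs. (23), (24) hold, then we can conclude that the following expression is true:

$$
\ln\frac{c_2}{\eta_1} - \ln\left( \frac{c_1}{\eta_2} + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega} \right) - N_f \ln\tilde{a}_i > 0, \quad
\tau_{ai} \geq \frac{N_f \ln\mu_i + N_f \Delta_i \ln(\tilde{b}_i/\tilde{a}_i)}{\ln(c_2/\eta_1) - \ln\left( c_1/\eta_2 + \gamma^2 \tilde{b}_{\max}^{N_f}\eta_3\bar{\omega} \right) - N_f \ln\tilde{a}_i} \tag{38}
$$

which ensures $\tilde{x}^T(k) R_s \tilde{x}(k) \leq c_2$. Thus, the switched system in Eq. (12) is FTB, which completes the proof.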

Sufficient conditions for finite-time boundedness are given in Theorem 1, and the prescribed attenuation performance is discussed in Theorem 2.

Theorem 2: Given system Eq. (12) and constant scalars 0 < a i < 1 , b i > 0 , μ i 1 , γ > 0 , if matrices S i > 0 , S j > 0 , S i j > 0 , and W i , i , j Ω , i j , such that the following expression holds:

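$$
S_j \le \mu_i S_i \tag{39}
$$

$$
\begin{bmatrix}
-S_i & 0 & 0 & 0 & \tilde A_{ii} S_i & \tilde B_i \\
* & -S_i & 0 & 0 & \tilde\rho \tilde A_{1i} S_i & 0 \\
* & * & -I & 0 & \tilde C_i S_i & \tilde D_i \\
* & * & * & -I & \tilde\rho \tilde C_{1i} S_i & 0 \\
* & * & * & * & -(1-a_i) S_i & 0 \\
* & * & * & * & * & -\gamma^2 I
\end{bmatrix} < 0 \tag{40}
$$

$$
\begin{bmatrix}
-S_{ij} & 0 & 0 & 0 & \tilde A_{ij} S_j & \tilde B_i \\
* & -S_{ij} & 0 & 0 & \tilde\rho \tilde A_{1i} S_j & 0 \\
* & * & -I & 0 & \tilde C_i S_j & \tilde D_i \\
* & * & * & -I & \tilde\rho \tilde C_{1i} S_j & 0 \\
* & * & * & * & (1+b_i)\left(S_{ij} - S_j - S_j^T\right) & 0 \\
* & * & * & * & * & -\gamma^2 I
\end{bmatrix} < 0 \tag{41}
$$

then the system with MDADT satisfying the following expressions is FTS with $H_\infty$ performance $\gamma_d$ with respect to $(0, c_2, \bar\omega, N_f, R_s, \gamma_d)$.

$$
\tau_{ai} \ge \tau_{ai}^{*} = \max\left\{ \frac{N_f \ln\mu_i + N_f \Delta_i \ln(\tilde b_i/\tilde a_i)}{\ln\dfrac{c_2}{\eta_1} - \ln\left(\gamma^2 \tilde b_{\max}^{N_f} \bar\omega\right) - N_f \ln\tilde a_i},\ -\frac{\Delta_i \ln(\tilde b_i/\tilde a_i) + \ln\mu_i}{\ln\tilde a_i} \right\} \tag{42}
$$

$$
\gamma^2 \tilde b_{\max}^{N_f} \bar\omega\, \tilde a_i^{N_f} < \frac{c_2}{\eta_1} \tag{43}
$$

where $\eta_1 = \max_{i \in \Omega}\{\lambda_{\max}(\bar S_i), \lambda_{\max}(\bar S_{ij})\}$, $\tilde\rho = \sqrt{\rho(1-\rho)}$, $\bar S_i = R_s^{1/2} S_i R_s^{1/2}$, $\bar S_j = R_s^{1/2} S_j R_s^{1/2}$, $\tilde a_i = 1 - a_i$, $\tilde b_i = 1 + b_i$, $\gamma_d = \gamma \left(\tilde a_{\max}/\tilde a_{\min}\right)^{N_f/2}$, $\tilde a_{\max} = \max \tilde a_i$, $\tilde a_{\min} = \min \tilde a_i$, $\tilde b_{\max} = \max \tilde b_i$, and the ratio $\tilde b_i/\tilde a_i$ enters the dwell-time bound in Eq. (42).

Proof: The Lyapunov functions are chosen as in Eq. (25). Under the zero-initial condition, we can obtain the following equations.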

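$$
\begin{aligned}
& \Delta V_i(k) + a_i V_i(k) + E\{e^T(k) e(k)\} - \gamma^2 \tilde\omega^T(k) \tilde\omega(k) \\
&= E\{\tilde x^T(k+1) P_i \tilde x(k+1)\} - (1-a_i)\, \tilde x^T(k) P_i \tilde x(k) + E\{e^T(k) e(k)\} - \gamma^2 \tilde\omega^T(k) \tilde\omega(k) \\
&= \xi^T(k) \left\{ \begin{bmatrix} \tilde A_{ii}^T \\ \tilde B_i^T \end{bmatrix} P_i \begin{bmatrix} \tilde A_{ii} & \tilde B_i \end{bmatrix} + \tilde\rho^2 \begin{bmatrix} \tilde A_{1i}^T \\ 0 \end{bmatrix} P_i \begin{bmatrix} \tilde A_{1i} & 0 \end{bmatrix} + \begin{bmatrix} \tilde C_i^T \\ \tilde D_i^T \end{bmatrix} \begin{bmatrix} \tilde C_i & \tilde D_i \end{bmatrix} + \tilde\rho^2 \begin{bmatrix} \tilde C_{1i}^T \\ 0 \end{bmatrix} \begin{bmatrix} \tilde C_{1i} & 0 \end{bmatrix} + \begin{bmatrix} -(1-a_i) P_i & 0 \\ 0 & -\gamma^2 I \end{bmatrix} \right\} \xi(k) \\
&= \xi^T(k) Z_{ii}\, \xi(k)
\end{aligned}
\tag{44}
$$

$$
\begin{aligned}
& \Delta V_i(k) - b_i V_i(k) + E\{e^T(k) e(k)\} - \gamma^2 \tilde\omega^T(k) \tilde\omega(k) \\
&= E\{\tilde x^T(k+1) P_{ij} \tilde x(k+1)\} - (1+b_i)\, \tilde x^T(k) P_{ij} \tilde x(k) + E\{e^T(k) e(k)\} - \gamma^2 \tilde\omega^T(k) \tilde\omega(k) \\
&= \xi^T(k) \left\{ \begin{bmatrix} \tilde A_{ij}^T \\ \tilde B_i^T \end{bmatrix} P_{ij} \begin{bmatrix} \tilde A_{ij} & \tilde B_i \end{bmatrix} + \tilde\rho^2 \begin{bmatrix} \tilde A_{1i}^T \\ 0 \end{bmatrix} P_{ij} \begin{bmatrix} \tilde A_{1i} & 0 \end{bmatrix} + \begin{bmatrix} \tilde C_i^T \\ \tilde D_i^T \end{bmatrix} \begin{bmatrix} \tilde C_i & \tilde D_i \end{bmatrix} + \tilde\rho^2 \begin{bmatrix} \tilde C_{1i}^T \\ 0 \end{bmatrix} \begin{bmatrix} \tilde C_{1i} & 0 \end{bmatrix} + \begin{bmatrix} -(1+b_i) P_{ij} & 0 \\ 0 & -\gamma^2 I \end{bmatrix} \right\} \xi(k) \\
&= \xi^T(k) Z_{ij}\, \xi(k)
\end{aligned}
\tag{45}
$$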

where

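$$
Z_{ii} = \begin{bmatrix}
-P_i & 0 & 0 & 0 & P_i \tilde A_{ii} & P_i \tilde B_i \\
* & -P_i & 0 & 0 & \tilde\rho P_i \tilde A_{1i} & 0 \\
* & * & -I & 0 & \tilde C_i & \tilde D_i \\
* & * & * & -I & \tilde\rho \tilde C_{1i} & 0 \\
* & * & * & * & -(1-a_i) P_i & 0 \\
* & * & * & * & * & -\gamma^2 I
\end{bmatrix},
$$

$$
Z_{ij} = \begin{bmatrix}
-P_{ij} & 0 & 0 & 0 & P_{ij} \tilde A_{ij} & P_{ij} \tilde B_i \\
* & -P_{ij} & 0 & 0 & \tilde\rho P_{ij} \tilde A_{1i} & 0 \\
* & * & -I & 0 & \tilde C_i & \tilde D_i \\
* & * & * & -I & \tilde\rho \tilde C_{1i} & 0 \\
* & * & * & * & -(1+b_i) P_{ij} & 0 \\
* & * & * & * & * & -\gamma^2 I
\end{bmatrix}
$$

The system in Eq. (12) is stable with the predefined performance if

$$
Z_{ii} < 0 \tag{46}
$$

$$
Z_{ij} < 0 \tag{47}
$$

Setting $S_i = P_i^{-1}$ and $S_{ij} = P_{ij}^{-1}$, and performing congruence transformations on the aforementioned inequalities through $\mathrm{diag}\{S_i, S_i, I, I, S_i, I\}$ and $\mathrm{diag}\{S_{ij}, S_{ij}, I, I, S_j, I\}$, we can obtain the following expressions:

$$
\begin{bmatrix}
-S_i & 0 & 0 & 0 & \tilde A_{ii} S_i & \tilde B_i \\
* & -S_i & 0 & 0 & \tilde\rho \tilde A_{1i} S_i & 0 \\
* & * & -I & 0 & \tilde C_i S_i & \tilde D_i \\
* & * & * & -I & \tilde\rho \tilde C_{1i} S_i & 0 \\
* & * & * & * & -(1-a_i) S_i & 0 \\
* & * & * & * & * & -\gamma^2 I
\end{bmatrix} < 0 \tag{48}
$$

$$
\begin{bmatrix}
-S_{ij} & 0 & 0 & 0 & \tilde A_{ij} S_j & \tilde B_i \\
* & -S_{ij} & 0 & 0 & \tilde\rho \tilde A_{1i} S_j & 0 \\
* & * & -I & 0 & \tilde C_i S_j & \tilde D_i \\
* & * & * & -I & \tilde\rho \tilde C_{1i} S_j & 0 \\
* & * & * & * & -(1+b_i) S_j^T S_{ij}^{-1} S_j & 0 \\
* & * & * & * & * & -\gamma^2 I
\end{bmatrix} < 0 \tag{49}
$$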

Similar to the transformation in Eq. (33), we can obtain the following expression:

$$
S_{ij} - S_j - S_j^T \ge -S_j^T S_{ij}^{-1} S_j \tag{50}
$$

With Eqs. (40), (41), we have $Z_{ii} < 0$ and $Z_{ij} < 0$, which implies the following:

$$
\Delta V_i(k) \le
\begin{cases}
-a_i V_i(k) - E\{e^T(k) e(k)\} + \gamma^2 \tilde\omega^T(k) \tilde\omega(k), & k \in [k_i + \Delta_i,\ k_{i+1}) \\
\ \ b_i V_i(k) - E\{e^T(k) e(k)\} + \gamma^2 \tilde\omega^T(k) \tilde\omega(k), & k \in [k_i,\ k_i + \Delta_i)
\end{cases}
\tag{51}
$$

By setting $W_i = I$ and $c_1 = 0$, the system in Eq. (12) is FTB with respect to $(0, c_2, \bar\omega, N_f, R_s)$. Moreover, replacing $\gamma^2 \tilde\omega^T(k) W_i \tilde\omega(k)$ with $\psi(s) = -E\{e^T(s) e(s)\} + \gamma^2 \tilde\omega^T(s) \tilde\omega(s)$, the following inequality can be obtained:

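$$
V_i(k) \le \prod_{i=1}^{n} \mu_i^{N_{\sigma,i}(k_0,k)} \tilde a_i^{T_i(k_0,k)} \left(\frac{\tilde b_i}{\tilde a_i}\right)^{\Delta_i N_{\sigma,i}(k_0,k)} V_{\sigma(k_0)}(k_0) + \tilde b_{\max}^{N_f} \sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \mu_i^{N_{\sigma,i}(s,k)} \tilde a_i^{T_i(s,k)} \left(\frac{\tilde b_i}{\tilde a_i}\right)^{\Delta_i N_{\sigma,i}(s,k)} \psi(s) \tag{52}
$$

According to $V_{\sigma(k)}(k) \ge 0$ and the zero-initial condition, we have the following expression:

$$
\begin{aligned}
& \sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \mu_i^{N_{\sigma,i}(s,k)} \tilde a_i^{T_i(s,k)} \left(\frac{\tilde b_i}{\tilde a_i}\right)^{\Delta_i N_{\sigma,i}(s,k)} \psi(s) \ge 0, \\
& \sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \mu_i^{N_{\sigma,i}(s,k)} \tilde a_i^{T_i(s,k)} \left(\frac{\tilde b_i}{\tilde a_i}\right)^{\Delta_i N_{\sigma,i}(s,k)} e^T(s) e(s) \le \gamma^2 \sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \mu_i^{N_{\sigma,i}(s,k)} \tilde a_i^{T_i(s,k)} \left(\frac{\tilde b_i}{\tilde a_i}\right)^{\Delta_i N_{\sigma,i}(s,k)} \tilde\omega^T(s) \tilde\omega(s)
\end{aligned}
\tag{53}
$$

Multiplying both sides of Eq. (53) by $\prod_{i=1}^{n} \left[ (\tilde b_i/\tilde a_i)^{\Delta_i} \mu_i \right]^{-N_{\sigma,i}(k_0,k)}$, we obtain the following inequality:

$$
\sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \mu_i^{-N_{\sigma,i}(k_0,s)} \tilde a_i^{T_i(s,k)} \left(\frac{\tilde b_i}{\tilde a_i}\right)^{-\Delta_i N_{\sigma,i}(k_0,s)} e^T(s) e(s) \le \gamma^2 \sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \mu_i^{-N_{\sigma,i}(k_0,s)} \tilde a_i^{T_i(s,k)} \left(\frac{\tilde b_i}{\tilde a_i}\right)^{-\Delta_i N_{\sigma,i}(k_0,s)} \tilde\omega^T(s) \tilde\omega(s) \tag{54}
$$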

Based on the definition of MDADT and Eq. (42), we have the following:

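$$
0 \le N_{\sigma,i}(k_0,s) \le \frac{T_i(k_0,s)}{\tau_{ai}} \le \frac{-T_i(k_0,s) \ln\tilde a_i}{\Delta_i \left( \ln\tilde b_i - \ln\tilde a_i \right) + \ln\mu_i} \tag{55}
$$

Combining with Eqs. (43), (45), we infer the following:

$$
\begin{aligned}
& \sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \left[ \mu_i \left(\frac{\tilde b_i}{\tilde a_i}\right)^{\Delta_i} \right]^{\frac{T_i(k_0,s) \ln\tilde a_i}{\Delta_i (\ln\tilde b_i - \ln\tilde a_i) + \ln\mu_i}} \tilde a_i^{T_i(s,k)}\, e^T(s) e(s) \\
& \quad \le \gamma^2 \sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \left[ \mu_i \left(\frac{\tilde b_i}{\tilde a_i}\right)^{\Delta_i} \right]^{\frac{T_i(k_0,s) \ln\tilde a_i}{\Delta_i (\ln\tilde b_i - \ln\tilde a_i) + \ln\mu_i}} \tilde a_i^{T_i(s,k)}\, \tilde\omega^T(s) \tilde\omega(s)
\end{aligned}
\tag{56}
$$

Thus, we have the following equation:

$$
\left[ \mu_i \left(\frac{\tilde b_i}{\tilde a_i}\right)^{\Delta_i} \right]^{\frac{T_i(k_0,s) \ln\tilde a_i}{\Delta_i (\ln\tilde b_i - \ln\tilde a_i) + \ln\mu_i}} = \tilde a_i^{T_i(k_0,s)} \tag{57}
$$

Next, we have the following expression:

$$
\sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \tilde a_i^{T_i(k_0,k)}\, e^T(s) e(s) \le \gamma^2 \sum_{s=k_0}^{k-1} \prod_{i=1}^{n} \tilde a_i^{T_i(k_0,k)}\, \tilde\omega^T(s) \tilde\omega(s) \tag{58}
$$

Setting $k - 1 = N_f$, we can obtain the following:

$$
\begin{aligned}
& \sum_{s=k_0}^{N_f} \prod_{i=1}^{n} \tilde a_i^{T_i(k_0,k)}\, e^T(s) e(s) \le \gamma^2 \sum_{s=k_0}^{N_f} \prod_{i=1}^{n} \tilde a_i^{T_i(k_0,k)}\, \tilde\omega^T(s) \tilde\omega(s) \\
\Rightarrow\ & \tilde a_{\min}^{N_f} \sum_{s=k_0}^{N_f} e^T(s) e(s) \le \gamma^2 \tilde a_{\max}^{N_f} \sum_{s=k_0}^{N_f} \tilde\omega^T(s) \tilde\omega(s) \\
\Rightarrow\ & \sum_{s=k_0}^{N_f} e^T(s) e(s) \le \gamma^2 \left( \frac{\tilde a_{\max}}{\tilde a_{\min}} \right)^{N_f} \sum_{s=k_0}^{N_f} \tilde\omega^T(s) \tilde\omega(s)
\end{aligned}
$$

Therefore, the system in Eq. (12) is FTB with the given attenuation index $\gamma_d = \gamma \left(\tilde a_{\max}/\tilde a_{\min}\right)^{N_f/2}$, which completes the proof.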
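
The dwell-time bound of Eq. (42), the feasibility condition of Eq. (43), and the attenuation index $\gamma_d$ are plain scalar formulas, so they can be checked numerically before any LMI is solved. The following is a minimal sketch under stated assumptions: the function names, argument names, and all numerical values are hypothetical illustrations and are not taken from the paper, and the signs lost in the typeset formulas are assumed to be $\tilde a_i = 1 - a_i$ and $\tilde b_i = 1 + b_i$.

```python
import math


def mdadt_bound(a_i, b_i, mu_i, delta_i, gamma, b_tilde_max, omega_bar, n_f, c2, eta1):
    """Evaluate the right-hand side of Eq. (42) for a single mode i (illustrative helper)."""
    a_t = 1.0 - a_i                      # a~_i, assumed to be 1 - a_i
    b_t = 1.0 + b_i                      # b~_i, assumed to be 1 + b_i
    log_ratio = math.log(b_t / a_t)      # ln(b~_i / a~_i)
    # Denominator of the first term; Eq. (43) is exactly the condition that it is positive.
    slack = (math.log(c2 / eta1)
             - math.log(gamma ** 2 * b_tilde_max ** n_f * omega_bar)
             - n_f * math.log(a_t))
    if slack <= 0.0:
        raise ValueError("Eq. (43) is violated for these parameters")
    first = n_f * (math.log(mu_i) + delta_i * log_ratio) / slack
    second = -(delta_i * log_ratio + math.log(mu_i)) / math.log(a_t)
    return max(first, second)


def attenuation_index(gamma, a_tilde_values, n_f):
    """gamma_d = gamma * (a~_max / a~_min) ** (N_f / 2)."""
    return gamma * (max(a_tilde_values) / min(a_tilde_values)) ** (n_f / 2.0)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only so that Eq. (43) is satisfied.
    tau = mdadt_bound(a_i=0.2, b_i=0.05, mu_i=1.1, delta_i=1, gamma=0.5,
                      b_tilde_max=1.05, omega_bar=5.0, n_f=25, c2=1.5, eta1=1.0)
    print("lower bound on the mode-dependent average dwell time:", tau)
    print("attenuation index:", attenuation_index(0.5, [0.8, 0.9], 25))
```

Raising an error when the denominator is non-positive mirrors the role of Eq. (43): the bound is only meaningful when that condition holds.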

Based on Theorems 1 and 2, the parameters of the finite-time tracking controller of the switched system are derived in Theorem 3.

Theorem 3: Given system Eq. (12) and constant scalars $0 < a_i < 1$, $b_i > 0$, $\mu_i \ge 1$, $\gamma > 0$, if positive matrices $S_i$, $S_j$, and $S_{ij}$, $\forall i, j \in \Omega$, $i \ne j$, exist such that the following holds true:

$$
S_j \le \mu_i S_i \tag{59}
$$

$$
\begin{bmatrix}
\Phi_{ii} & \Sigma_{1i} & \Sigma_{2i} \\
* & -\Xi_{1i} & 0 \\
* & * & -\Xi_{2i}
\end{bmatrix} < 0 \tag{60}
$$

$$
\begin{bmatrix}
\Phi_{ij} & \Sigma_{1j} & \Sigma_{2j} \\
* & -\Xi_{1j} & 0 \\
* & * & -\Xi_{2j}
\end{bmatrix} < 0 \tag{61}
$$

then system Eq. (12) with MDADT satisfying Eqs. (42), (43) is finite-time stable with the predefined attenuation index $\gamma_d$ with respect to $(0, c_2, \bar\omega, N_f, R_s, \gamma_d)$, and the parameters of the robust controller can be expressed as follows:

$$
K_{n,1i} = U_{1i} S_{1i}^{-1} \tag{62}
$$

$$
K_{n,2i} = U_{2i} S_{2i}^{-1} \tag{63}
$$

where

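$$
S_i = \begin{bmatrix} S_{1i} & 0 \\ 0 & S_{2i} \end{bmatrix}, \qquad
S_j = \begin{bmatrix} S_{1j} & 0 \\ 0 & S_{2j} \end{bmatrix},
$$

$$
\Phi_{ii} = \begin{bmatrix}
-S_i & 0 & 0 & 0 & \varphi_{ii} & \tilde B_i \\
* & -S_i & 0 & 0 & \tilde\rho \tilde A_{1i} S_i & 0 \\
* & * & -I & 0 & \tilde C_i S_i & \tilde D_i \\
* & * & * & -I & \tilde\rho \tilde C_{1i} S_i & 0 \\
* & * & * & * & -(1-a_i) S_i & 0 \\
* & * & * & * & * & -\gamma^2 I
\end{bmatrix},
$$

$$
\Phi_{ij} = \begin{bmatrix}
-S_{ij} & 0 & 0 & 0 & \varphi_{ij} & \tilde B_i \\
* & -S_{ij} & 0 & 0 & \tilde\rho \tilde A_{1i} S_j & 0 \\
* & * & -I & 0 & \tilde C_i S_j & \tilde D_i \\
* & * & * & -I & \tilde\rho \tilde C_{1i} S_j & 0 \\
* & * & * & * & (1+b_i)\left(S_{ij} - S_j - S_j^T\right) & 0 \\
* & * & * & * & * & -\gamma^2 I
\end{bmatrix},
$$

$$
\Sigma_{1i} = \begin{bmatrix} \tilde M_{1i}^T & \tilde N_{1i}^T \end{bmatrix}, \quad
\Sigma_{2i} = \begin{bmatrix} \tilde M_{2i}^T & \tilde N_{2i}^T \end{bmatrix}, \quad
\Sigma_{1j} = \begin{bmatrix} \tilde M_{1j}^T & \tilde N_{1j}^T \end{bmatrix}, \quad
\Sigma_{2j} = \begin{bmatrix} \tilde M_{2j}^T & \tilde N_{2j}^T \end{bmatrix},
$$

$$
\Xi_{1i} = \mathrm{diag}\{\varepsilon_{1i}, \varepsilon_{1i}\}, \quad
\Xi_{2i} = \mathrm{diag}\{\varepsilon_{2i}, \varepsilon_{2i}\}, \quad
\Xi_{1j} = \mathrm{diag}\{\varepsilon_{1j}, \varepsilon_{1j}\}, \quad
\Xi_{2j} = \mathrm{diag}\{\varepsilon_{2j}, \varepsilon_{2j}\},
$$

$$
\varphi_{ii} = \begin{bmatrix} A_i S_{1i} + B_i U_{1i} & B_i U_{2i} \\ \rho C_i S_{1i} & S_{2i} \end{bmatrix}, \qquad
\varphi_{ij} = \begin{bmatrix} A_i S_{1j} + B_i U_{1j} & B_i U_{2j} \\ \rho C_i S_{1j} & S_{2j} \end{bmatrix},
$$

$$
\begin{aligned}
\tilde M_{1i} &= \begin{bmatrix} M_{1i}^T \tilde B_i^T & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, &
\tilde N_{1i} &= \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & N_{1i} S_{1i} & 0 & 0 & 0 \end{bmatrix}, \\
\tilde M_{2i} &= \begin{bmatrix} M_{2i}^T \tilde B_i^T & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, &
\tilde N_{2i} &= \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & N_{2i} S_{2i} & 0 & 0 \end{bmatrix}, \\
\tilde M_{1j} &= \begin{bmatrix} M_{1j}^T \tilde B_i^T & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, &
\tilde N_{1j} &= \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & N_{1j} S_{1j} & 0 & 0 & 0 \end{bmatrix}, \\
\tilde M_{2j} &= \begin{bmatrix} M_{2j}^T \tilde B_i^T & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, &
\tilde N_{2j} &= \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & N_{2j} S_{2j} & 0 & 0 \end{bmatrix},
\end{aligned}
$$

$$
\varepsilon_{1i} > 0, \quad \varepsilon_{2i} > 0, \quad \varepsilon_{2j} > 0.
$$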

Proof: According to the Schur complement (Aristidou et al., 2014) and Lemma 1, we can obtain the following inequalities:

$$
\Phi_{ii} + \tilde M_{1i}^T F_{1i} \tilde N_{1i} + \tilde N_{1i}^T F_{1i}^T \tilde M_{1i} + \tilde M_{2i}^T F_{2i} \tilde N_{2i} + \tilde N_{2i}^T F_{2i}^T \tilde M_{2i} < 0 \tag{64}
$$

$$
\Phi_{ij} + \tilde M_{1j}^T F_{1j} \tilde N_{1j} + \tilde N_{1j}^T F_{1j}^T \tilde M_{1j} + \tilde M_{2j}^T F_{2j} \tilde N_{2j} + \tilde N_{2j}^T F_{2j}^T \tilde M_{2j} < 0 \tag{65}
$$

Let $U_{1i} = K_{n,1i} S_{1i}$, $U_{2i} = K_{n,2i} S_{2i}$, $U_{1j} = K_{n,1j} S_{1j}$, and $U_{2j} = K_{n,2j} S_{2j}$; then Eq. (60) is equivalent to Eq. (40), and Eq. (61) is equivalent to Eq. (41). Therefore, the parameters of the controller are given by Eqs. (62), (63) once the linear matrix inequalities in Eqs. (59)–(61) have been solved.

3.2 Online scheduling based on the ADDPG algorithm

Based on the finite-time $H_\infty$ control, the sufficient conditions that ensure FTS and the prescribed performance have been presented. The process of online scheduling can be formulated as a Markov decision process (MDP). Because the control process is a sequence of continuous decisions, the ADDPG algorithm is proposed on the basis of the actor–critic framework to realize superior control performance of switched flight vehicles.

DRL involves an agent and the environment with which it interacts. At each time step, the agent observes a state $s_k$, selects an action $a_k$, and receives a reward $r_k$ and the next state $s_{k+1}$ by interacting with the environment, where $r_k$ evaluates the performance of the state-action pair at that time instant. In this study, the switched tracking controller is viewed as the agent, whose purpose is to maximize the expected discounted return over a series of future steps:

$$
R_k = r_k + \gamma_d r_{k+1} + \gamma_d^2 r_{k+2} + \cdots + \gamma_d^{K_f - k} r_{K_f} = r_k + \gamma_d R_{k+1} \tag{66}
$$

where $\gamma_d \in [0, 1]$ denotes the discount factor and $K_f$ denotes the terminal step of reinforcement learning. The value of the reward depends on the current state and the action taken. The action and state are defined as follows:

$$
a_k = \begin{bmatrix} \Delta K_{c,1i}(k) & \Delta K_{c,2i}(k) \end{bmatrix} \tag{67}
$$

$$
s_k = \begin{bmatrix} \alpha(k) & q(k) & r_c(k) & u_n(k) \end{bmatrix} \tag{68}
$$
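
The recursion $R_k = r_k + \gamma_d R_{k+1}$ in Eq. (66) means that the return of every step in an episode can be obtained in one backward pass over the recorded rewards. The sketch below illustrates this; the function name and the reward values are hypothetical, and the return after the terminal step is assumed to be zero.

```python
def discounted_returns(rewards, gamma_d):
    """Return [R_0, ..., R_Kf] for one episode using R_k = r_k + gamma_d * R_{k+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0  # return beyond the terminal step, assumed to be zero
    for k in range(len(rewards) - 1, -1, -1):
        running = rewards[k] + gamma_d * running
        returns[k] = running
    return returns


if __name__ == "__main__":
    # Hypothetical per-step rewards of a short episode.
    print(discounted_returns([-1.0, -0.5, -0.25, 0.0], gamma_d=0.99))
```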

The ADDPG algorithm builds on the DDPG algorithm, which combines the advantages of deep Q-learning and the actor–critic framework and updates the action in continuous action spaces according to policy gradient theory. The ADDPG algorithm consists of two parts: the action value at each step is approximated by the critic network $Q(s_k, a_k | \varsigma^{Q})$ with weights $\varsigma^{Q}$, and the current control policy is given by the actor network $\varpi(s_k | \varsigma^{\varpi})$ with weights $\varsigma^{\varpi}$. The weights of the critic network are updated by minimizing the loss function, which can be described as follows:

$$
L(\varsigma^{Q}) = E_{s,a}\left[ \left( Q(s_k, a_k | \varsigma^{Q}) - \bar y_k \right)^2 \right] \tag{69}
$$

where

$$
\bar y_k = r_k(s_k, a_k) + \gamma_d\, Q\left(s_{k+1}, \varpi(s_k | \varsigma^{\varpi}) \,\middle|\, \varsigma^{Q}\right).
$$
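
The critic update of Eq. (69) amounts to a mean-squared error between the critic output and a bootstrapped target. The following sketch shows the computation on a mini-batch with plain Python callables standing in for the networks. It is an illustration under assumptions rather than the authors' implementation: `actor` and `critic` are hypothetical scalar functions, and the target evaluates the actor at the next state $s_{k+1}$, which is the usual DDPG convention.

```python
def td_targets(rewards, next_states, gamma_d, actor, critic):
    """Targets y_k = r_k + gamma_d * Q(s_{k+1}, actor(s_{k+1})) for a mini-batch."""
    return [r + gamma_d * critic(s_next, actor(s_next))
            for r, s_next in zip(rewards, next_states)]


def critic_loss(states, actions, targets, critic):
    """Mean of (Q(s_k, a_k) - y_k)^2 over the mini-batch, as in Eq. (69)."""
    errors = [critic(s, a) - y for s, a, y in zip(states, actions, targets)]
    return sum(err * err for err in errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical stand-ins for the critic and actor networks.
    critic = lambda s, a: 2.0 * s + a
    actor = lambda s: -s
    y = td_targets([1.0], [3.0], 0.5, actor, critic)
    print(y, critic_loss([1.0], [0.0], y, critic))
```

In a full implementation the callables would be neural networks and the loss would be minimized by a gradient-based optimizer; only the target and loss arithmetic is shown here.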

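The weights of the actor network are updated according to the policy gradient in the following equations:

$$
\varsigma^{\varpi}_{k+1} = \varsigma^{\varpi}_{k} + L_{an} \nabla_{\varsigma^{\varpi}} J \tag{70}
$$

$$
\begin{aligned}
\nabla_{\varsigma^{\varpi}} J &= E_{\pi}\left[ \nabla_{\varsigma^{\varpi}} Q^{\pi}\left(s_k, \pi(s_k | \varsigma^{\varpi}) \,\middle|\, \varsigma^{Q}\right) \Big|_{s = s_k,\ a = \pi(s_k | \varsigma^{\varpi})} \right] \\
&= E_{\pi}\left[ \nabla_{a} Q^{\pi}(s_k, a | \varsigma^{Q}) \Big|_{a = \pi(s_k | \varsigma^{\varpi})}\, \nabla_{\varsigma^{\varpi}} \pi(s_k | \varsigma^{\varpi}) \right]
\end{aligned}
\tag{71}
$$

where $L_{an}$ is the learning rate of $\varpi(s_k | \varsigma^{\varpi})$.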

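To overcome the divergence of Q-learning, two separate networks were adopted: the actor target network $\varpi(s_k | \varsigma^{\varpi'})$ and the critic target network $Q(s_k, a_k | \varsigma^{Q'})$. These two networks update their weights as follows:

$$
\varsigma^{\varpi'}_{k+1} = L_{atn}\, \varsigma^{\varpi}_{k} + (1 - L_{atn})\, \varsigma^{\varpi'}_{k} \tag{72}
$$

$$
\varsigma^{Q'}_{k+1} = L_{ctn}\, \varsigma^{Q}_{k} + (1 - L_{ctn})\, \varsigma^{Q'}_{k} \tag{73}
$$

where $L_{atn}$ and $L_{ctn}$ are the learning rates of the actor target network and the critic target network, respectively.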
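
Eqs. (72), (73) are convex combinations of the online and target weights, so the target networks track the online networks slowly. A minimal sketch of this soft update on flat weight lists is given below; the function name and the numerical values are hypothetical.

```python
def soft_update(target_weights, online_weights, rate):
    """Blend online weights into target weights: rate * online + (1 - rate) * target."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return [rate * w + (1.0 - rate) * w_t
            for w, w_t in zip(online_weights, target_weights)]


if __name__ == "__main__":
    # Hypothetical weights; a small rate keeps the target network close to its previous value.
    print(soft_update([0.0, 1.0], [1.0, 1.0], rate=0.1))
```

A rate of 1 copies the online weights outright, and a rate of 0 freezes the target network.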

Moreover, an exploration noise $N_a$ is added to the output of the actor to realize exploration, so that the actual control policy can be rewritten as follows:

$$
a_k = \pi(s_k | \varsigma^{\varpi}) + N_a \tag{74}
$$
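
Eq. (74) can be illustrated with zero-mean Gaussian noise whose variance shrinks during training. The sketch below is only one possible realization and rests on assumptions that the text does not state: the noise is taken to be Gaussian, the variance is assumed to decay multiplicatively, and the perturbed action is clipped to a hypothetical admissible range `[lower, upper]`.

```python
import random


def noisy_action(policy_action, variance, lower, upper, rng):
    """a_k = pi(s_k) + N_a with Gaussian N_a, clipped to the admissible range."""
    std = variance ** 0.5
    return [min(upper, max(lower, a + rng.gauss(0.0, std))) for a in policy_action]


def decay_variance(variance, decay_rate, variance_min=0.0):
    """Assumed multiplicative decay of the exploration variance."""
    return max(variance_min, variance * (1.0 - decay_rate))


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed for repeatability
    variance = 0.1
    for _ in range(3):
        print(noisy_action([0.0, 0.0], variance, -0.5, 0.5, rng))
        variance = decay_variance(variance, 1e-5)
```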

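Unlike the conventional DDPG algorithm, adaptive parameters were introduced to achieve superior convergence and robustness. By treating robustness as a continuous parameter, the reward function allows convenient exploration and adaptive training. The control policy aims to reduce the tracking error with a small control input and an unsaturated actuator; therefore, the reward function depends on the tracking error, the amplitude of the control signal, and the saturation of the actuator, and can be expressed as follows:

$$
r(k) = g_1 r_{e1}(k) + g_2 r_{e2}(k) + g_3 r_{e3}(k) \tag{75}
$$

$$
r_{e1} = \frac{|\upsilon_1 - 2|}{\upsilon_1} \left[ \left( \frac{\left(e(k)/l_1\right)^2}{|\upsilon_1 - 2|} + 1 \right)^{\upsilon_1/2} - 1 \right] \tag{76}
$$

$$
r_{e2} = \frac{|\upsilon_2 - 2|}{\upsilon_2} \left[ \left( \frac{\left(u(k)/l_2\right)^2}{|\upsilon_2 - 2|} + 1 \right)^{\upsilon_2/2} - 1 \right] \tag{77}
$$

$$
r_{e3} = \begin{cases} \delta_p, & u(k) > \bar u(k) \\ 0, & u(k) \le \bar u(k) \end{cases} \tag{78}
$$

where $r_{e1}(k)$ represents the reward of the tracking error, $r_{e2}(k)$ denotes the reward of the control input, and $r_{e3}(k)$ is the reward of saturation. Here, $g_1$, $g_2$, and $g_3$ denote the weights of $r_{e1}(k)$, $r_{e2}(k)$, and $r_{e3}(k)$ in the reward function. Furthermore, $\upsilon_1$ and $\upsilon_2$ are the adaptive shape parameters, which determine the robustness of the reward function, and $l_1 > 0$ and $l_2 > 0$ are the parameters that control the size of the quadratic bowl near the origin. Here, $\delta_p$ is a predefined constant and $\bar u(k)$ denotes the upper bound of the actuator. Next, the final reward functions $r_{e1}(k)$ and $r_{e2}(k)$ with adaptive parameters can be rewritten as follows:

$$
r_{e1}(k) = \begin{cases}
\dfrac{1}{2} \left( e(k)/l_1 \right)^2 & \text{if } \upsilon_1 = 2 \\[2mm]
\log\left( \dfrac{1}{2} \left( e(k)/l_1 \right)^2 + 1 \right) & \text{if } \upsilon_1 = 0 \\[2mm]
1 - \exp\left( -\dfrac{1}{2} \left( e(k)/l_1 \right)^2 \right) & \text{if } \upsilon_1 = -\infty \\[2mm]
\dfrac{|\upsilon_1 - 2|}{\upsilon_1} \left[ \left( \dfrac{\left(e(k)/l_1\right)^2}{|\upsilon_1 - 2|} + 1 \right)^{\upsilon_1/2} - 1 \right] & \text{otherwise}
\end{cases}
\tag{79}
$$

$$
r_{e2}(k) = \begin{cases}
\dfrac{1}{2} \left( u(k)/l_2 \right)^2 & \text{if } \upsilon_2 = 2 \\[2mm]
\log\left( \dfrac{1}{2} \left( u(k)/l_2 \right)^2 + 1 \right) & \text{if } \upsilon_2 = 0 \\[2mm]
1 - \exp\left( -\dfrac{1}{2} \left( u(k)/l_2 \right)^2 \right) & \text{if } \upsilon_2 = -\infty \\[2mm]
\dfrac{|\upsilon_2 - 2|}{\upsilon_2} \left[ \left( \dfrac{\left(u(k)/l_2\right)^2}{|\upsilon_2 - 2|} + 1 \right)^{\upsilon_2/2} - 1 \right] & \text{otherwise}
\end{cases}
\tag{80}
$$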
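
The special cases in Eqs. (79), (80) are the limits of the general expression, which is undefined at shape values 2 and 0, so an implementation has to branch on them. The following sketch shows one way to evaluate the terms and combine them as in Eq. (75). It rests on assumptions: the function and argument names are hypothetical, the saturation test in Eq. (78) is applied to the magnitude of the input, and the weights and $\delta_p$ are supplied by the caller (negative values turn the terms into penalties).

```python
import math


def robust_term(x, scale, shape):
    """Shape-adaptive term of Eqs. (79)/(80) for a scalar x, scale l > 0, and shape parameter."""
    z = 0.5 * (x / scale) ** 2
    if shape == 2.0:
        return z
    if shape == 0.0:
        return math.log(z + 1.0)
    if shape == -math.inf:
        return 1.0 - math.exp(-z)
    d = abs(shape - 2.0)
    return (d / shape) * (((x / scale) ** 2 / d + 1.0) ** (shape / 2.0) - 1.0)


def reward(e, u, u_bar, weights, shapes, scales, delta_p):
    """Weighted sum of Eq. (75); weights, shapes, and scales are caller-supplied tuples."""
    r_e1 = robust_term(e, scales[0], shapes[0])
    r_e2 = robust_term(u, scales[1], shapes[1])
    r_e3 = delta_p if abs(u) > u_bar else 0.0   # assumed to test |u(k)| against the bound
    return weights[0] * r_e1 + weights[1] * r_e2 + weights[2] * r_e3


if __name__ == "__main__":
    # Hypothetical values: quadratic tracking term, logarithmic input term.
    print(reward(e=0.2, u=0.5, u_bar=1.0, weights=(-1.0, -0.1, 1.0),
                 shapes=(2.0, 0.0), scales=(0.1, 1.0), delta_p=-10.0))
```

Smaller shape values flatten the term for large arguments, which is what makes the reward less sensitive to outliers in the tracking error.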

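The adaptive updating laws of the hyperparameters are defined as follows to improve the transient performance and robustness of the algorithm:

$$
\begin{cases}
\upsilon_1(s) = \left( \upsilon_{1\max} - \upsilon_{1\min} \right) \mathrm{sigmoid}\left( \upsilon_{1p}(s) \right) + \upsilon_{1\min} \\
\upsilon_2(s) = \left( \upsilon_{2\max} - \upsilon_{2\min} \right) \mathrm{sigmoid}\left( \upsilon_{2p}(s) \right) + \upsilon_{2\min}
\end{cases}
\tag{81}
$$

$$
\begin{cases}
l_1(s) = \mathrm{softplus}\left( l_{1p}(s) \right) + l_{1\min} \\
l_2(s) = \mathrm{softplus}\left( l_{2p}(s) \right) + l_{2\min}
\end{cases}
\tag{82}
$$

where $\upsilon_{1\max}$ and $\upsilon_{1\min}$ denote the maximum and minimum values of $\upsilon_1$. The quantities $\upsilon_{2\max}$, $\upsilon_{2\min}$, $l_{1\min}$, and $l_{2\min}$ are defined similarly. The length of each segment is determined by the training episodes.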
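
Eqs. (81), (82) map unconstrained latent variables to a bounded shape parameter and a strictly positive scale, so that any real-valued update keeps the reward well defined. A small sketch of these two maps follows; the function names and the numerical bounds are hypothetical, and numerically stable forms of the sigmoid and softplus are used.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def shape_from_latent(latent, v_min, v_max):
    """Eq. (81): the shape parameter stays inside [v_min, v_max]."""
    return (v_max - v_min) * sigmoid(latent) + v_min


def scale_from_latent(latent, l_min):
    """Eq. (82): the scale never drops below l_min."""
    return softplus(latent) + l_min


if __name__ == "__main__":
    # Hypothetical bounds for illustration only.
    for latent in (-5.0, 0.0, 5.0):
        print(latent, shape_from_latent(latent, 0.0, 2.0), scale_from_latent(latent, 0.01))
```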

Based on the statement, the pseudocode for the ADDPG algorithm proposed in this paper is presented in Algorithm 1.

Remark 3: Although the conventional DDPG algorithm can realize parameter optimization (Xu et al., 2019; Gaudet et al., 2020; Gheisarnejad and Khooban, 2021), guaranteeing data efficiency and system stability is difficult because the algorithm attempts to explore the optimal control policy over all possible actions in the action space. Moreover, because the reward function determines the training performance, the proposed adaptive hyperparameters can increase robustness and cover a more general case.

4 Numerical examples

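In this study, the HiMAT vehicle is used to validate the proposed method. The three-view drawing and the trim conditions at the operating points can be obtained from Wang et al. (2015). The flight conditions and the model of the longitudinal motion dynamics are given in Wang et al. (2015).

Based on the trim conditions within the flight envelope, the longitudinal motion dynamics can be described by a switched system. We set the sampling time $T_s = 0.02$ and obtain the system matrices $A_i$ and $B_i$, which can be described as follows:

$$A_1 = \begin{bmatrix} 0.9804 & 0.0188 \\ 0.1768 & 0.9720 \end{bmatrix}, \quad B_1 = \begin{bmatrix} 0.0049 & 0.0034 & 0.0007 \\ 0.1579 & 0.0979 & 0.0993 \end{bmatrix}$$

$$A_2 = \begin{bmatrix} 0.9728 & 0.0188 \\ 0.3773 & 0.9622 \end{bmatrix}, \quad B_2 = \begin{bmatrix} 0.0075 & 0.0050 & 0.0014 \\ 0.2941 & 0.1765 & 0.1831 \end{bmatrix}$$

$$A_4 = \begin{bmatrix} 0.9688 & 0.0187 \\ 0.4968 & 0.9560 \end{bmatrix}, \quad B_4 = \begin{bmatrix} 0.0096 & 0.0065 & 0.0021 \\ 0.4334 & 0.2895 & 0.2547 \end{bmatrix}$$

$$A_8 = \begin{bmatrix} 0.9766 & 0.0190 \\ 0.3312 & 0.9668 \end{bmatrix}, \quad B_8 = \begin{bmatrix} 0.0077 & 0.0054 & 0.0018 \\ 0.3759 & 0.2798 & 0.2113 \end{bmatrix}$$

$$A_9 = \begin{bmatrix} 0.9725 & 0.0189 \\ 0.3344 & 0.9594 \end{bmatrix}, \quad B_9 = \begin{bmatrix} 0.0099 & 0.0068 & 0.0026 \\ 0.5374 & 0.3793 & 0.2890 \end{bmatrix}$$

$$A_{12} = \begin{bmatrix} 0.9649 & 0.0188 \\ 0.2242 & 0.9509 \end{bmatrix}, \quad B_{12} = \begin{bmatrix} 0.0136 & 0.0094 & 0.0042 \\ 0.9015 & 0.6166 & 0.4367 \end{bmatrix}$$

$$A_{18} = \begin{bmatrix} 0.9657 & 0.0191 \\ 0.9772 & 0.9523 \end{bmatrix}, \quad B_{18} = \begin{bmatrix} 0.0061 & 0.0033 & 0.0023 \\ 0.4595 & 0.2426 & 0.2576 \end{bmatrix}$$

$$A_{19} = \begin{bmatrix} 0.9635 & 0.0192 \\ 1.2369 & 0.9507 \end{bmatrix}, \quad B_{19} = \begin{bmatrix} 0.0066 & 0.0032 & 0.0019 \\ 0.5334 & 0.2569 & 0.2163 \end{bmatrix}$$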

The switching of subsystems in the flight envelope is supposed to be 19-18-12-9-8-4-2-1, which is described in Figure 2.

Figure 2. Switching logic of HiMAT in the flight envelope.

A harmonic wind gust is considered in this paper, which is described in Eq. (83).

$$
\begin{cases}
p(k+1) = \begin{bmatrix} 0.9922 & 0.1247 \\ -0.1247 & 0.9922 \end{bmatrix} p(k) \\
d(k) = \begin{bmatrix} 1 & 0 \end{bmatrix} p(k)
\end{cases}
\tag{83}
$$

where $p(k)$ represents the state of the external disturbance with the initial value $\begin{bmatrix} 0.01 & 0 \end{bmatrix}^T$.

Furthermore, a command filter was provided to improve the performance of the intelligent tracking controller, which can be generated as follows:

{ J k + 1 = J 1 k + 1 J 2 k + 1 = J 2 k 2 ζ n ω n S v ω n 2 2 ζ n ω n S a J 1 k J 2 k z k = z 1 k z 2 k = J 1 k J 2 k     (84)

where $J(k)$ denotes the state vector; $z(k)$ represents the output of the filter; $\zeta_n$ and $\omega_n$ are the damping ratio and bandwidth; $S_a$ and $S_v$ denote the transfer functions of the amplitude limiting and the rate limiting filters.
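
The gust model of Eq. (83) can be stepped directly to see the disturbance that the controller has to reject. The sketch below is illustrative and rests on an assumption that should be checked against the original typeset equation: the state matrix is taken to be a planar rotation, with one off-diagonal entry negative, because that is what makes $d(k)$ a sustained oscillation. The function name and the number of steps are hypothetical.

```python
def simulate_gust(steps, p0=(0.01, 0.0), c=0.9922, s=0.1247):
    """Return d(0), ..., d(steps - 1) for the assumed rotation form of Eq. (83)."""
    p1, p2 = p0
    d = []
    for _ in range(steps):
        d.append(p1)                                   # d(k) = [1 0] p(k)
        p1, p2 = c * p1 + s * p2, -s * p1 + c * p2     # assumed sign pattern of the state matrix
    return d


if __name__ == "__main__":
    gust = simulate_gust(200)
    print("first samples:", gust[:5])
    print("amplitude range:", min(gust), max(gust))
```

With this sign pattern the amplitude stays close to the initial value of 0.01; if both off-diagonal entries were positive the state would grow instead of oscillating.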

The parameters of the switched systems are given as $c_1 = 0$, $c_2 = 1.5$, $N_f = 25$, $\bar\omega = 5$, and $R = I$. Compared with the conventional ADT method, tighter bounds can be obtained in the FTS analysis. The ADT method can be considered a special case of the MDADT method, and we can obtain that $\tau_{ai} \le \tau_a$, which is illustrated in Table 1. Therefore, the proposed method yields less conservative results than the ADT method. We set the probability of data missing as $\rho = 0.95$, and the maximum number of consecutive data missing, N1, is set to 5. Moreover, the matrices $U_{1i}$, $U_{2i}$, $S_{1i}$, and $S_{2i}$ can be solved by Eqs. (62), (63) in Theorem 3. The dynamics-based controller was constructed, and its parameter matrices are given as follows:

$$
\begin{aligned}
K_1 &= \begin{bmatrix} 138.4164 & 2.4407 & 6.7702 \\ 167.8987 & 0.9985 & 7.5232 \\ 383.8513 & 4.9224 & 17.4799 \end{bmatrix}, &
K_2 &= \begin{bmatrix} 101.5134 & 1.6037 & 3.8630 \\ 119.6668 & 0.8277 & 4.3093 \\ 276.3462 & 1.8812 & 10.0604 \end{bmatrix}, \\
K_4 &= \begin{bmatrix} 96.5387 & 1.2329 & 2.9000 \\ 93.0939 & 0.8237 & 2.1852 \\ 268.1340 & 0.7194 & 7.2606 \end{bmatrix}, &
K_8 &= \begin{bmatrix} 175.2877 & 1.6404 & 6.3880 \\ 61.4511 & 1.1225 & 2.3991 \\ 391.6398 & 0.1709 & 14.2824 \end{bmatrix}, \\
K_9 &= \begin{bmatrix} 143.3570 & 1.4171 & 6.7477 \\ 71.8536 & 0.9124 & 3.4568 \\ 359.6986 & 0.5128 & 16.8429 \end{bmatrix}, &
K_{12} &= \begin{bmatrix} 135.7652 & 1.2744 & 7.2143 \\ 84.5744 & 0.9719 & 5.9774 \\ 399.1680 & 1.8255 & 23.1364 \end{bmatrix}, \\
K_{18} &= \begin{bmatrix} 221.0958 & 4.4685 & 11.2652 \\ 162.9903 & 3.1023 & 8.3565 \\ 555.0112 & 7.9902 & 29.2663 \end{bmatrix}, &
K_{19} &= \begin{bmatrix} 251.0396 & 5.1211 & 13.9648 \\ 113.9665 & 2.3356 & 5.9639 \\ 762.3699 & 11.0262 & 40.3752 \end{bmatrix}
\end{aligned}
\tag{85}
$$

Table 1. Dwell time of various switching logics.

Moreover, to overcome the problem of operation points with static instability, an angular rate compensator was introduced as follows:

T f s = k q s + 1 / t q s     (86)

where T f s denotes the transfer function of angular rate compensator, t q and k q are the parameters of compensator.

Next, we present two examples to validate the proposed method.

Example 1: According to the data in Table 1, tighter bounds on the dwell time can be obtained by the proposed method. Moreover, because the characteristics of each subsystem are considered, better transient performance can be achieved by using the MDADT method. The switching of subsystems is displayed in Figure 2. Notably, the parameters of the flight vehicle switch at the switching instants. To compare the two switching logic mechanisms, the simulation results under the ADT switching logic and the MDADT switching logic are displayed in Figures 3, 4, in which the curves are labeled ADT and MDADT, respectively. The curves of the attack angle in Figures 3, 4 show the tracking performance of the switched controllers in the flight envelope under both switching logics. The tracking error converges within the given time interval, and the transient performance of the MDADT method is superior. Figures 3, 4 also provide enlarged views of the simulation curves near the switching instants and during the steady process. Switched controllers with the MDADT logic achieve better transient performance and a smoother response than controllers with the ADT logic. Thus, the switched controllers with the MDADT logic obtain excellent transient performance with tighter bounds on the dwell time, which is less conservative than the ADT logic.

Figure 3. Response of the attack angle.

Figure 4. Tracking error.

Example 2. In this section, the feasibility of the ADDPG algorithm for flight aircraft is validated. The weights of actor network and critic network are updated such that the learning-based controller adaptively compensates the model uncertainties and external disturbance in the environment. The action of supplementary control is added to the dynamics-based controller, which constitutes the real-time finite-time adaptive tracking control for the flight vehicles. The design parameters of the ADDPG algorithm are defined in Table 2.

Table 2. Parameters setting of the ADDPG.

The input of the critic networks is divided into two paths, corresponding to the observation and the action. The number of neurons in the input layer of the observation path equals the dimension of the observed states, which is denoted by obs. The number of neurons in the input layer of the action path corresponds to the number of controller parameters. The critic networks are updated with the adaptive moment estimation (Adam) algorithm. The regularization factor is set to $2 \times 10^{-4}$.

The input of the actor network is the observed states, and its output is the compensated controller parameters. The activation function of the fully connected layers is ReLU, and the activation function of the output layer is tanh. The weights of the actor network are updated with the Adam algorithm. The variance of the noise is set to 0.1, and the variance decay rate is $1 \times 10^{-5}$. Because the stability and robustness of the closed-loop system are guaranteed by switched control theory and robust control theory, only the wind gust is considered in the training environment, whereas both the perturbations of the aerodynamic parameters and the wind gust are introduced in the testing environment. The algorithms were implemented on a desktop with an Intel Core i7-10700K CPU @ 3.80 GHz, 16.00 GB of RAM, and the Windows 10 operating system.

The DDPG algorithm was also simulated to verify the advantages of the proposed method in terms of control performance and convergence. The robust controller designed with the MDADT method served as the dynamics-based controller. Both the ADDPG and DDPG algorithms were used in the simulation as the learning-based controller to compensate for the unexpected uncertainties in the flight environment. The simulation results are displayed in Figures 5–9, in which the MDADT method, the MDADT with DDPG method, and the MDADT with ADDPG method are labeled MDADT, DDPG, and ADDPG, respectively. As displayed in Figures 5, 6, the episode reward of the ADDPG algorithm converged better than that of the DDPG algorithm and required fewer episodes to converge to the neighborhood of the origin. Therefore, the ADDPG algorithm outperformed the conventional DDPG algorithm in terms of control performance and steady-state error. The responses of the attack angle are displayed in Figure 7. Both the DDPG and ADDPG algorithms achieved convergence and efficient performance; however, the transient convergence of the ADDPG algorithm was superior to that of the DDPG algorithm. The tracking errors are displayed in Figure 8. The controllers compensated with the DDPG and ADDPG algorithms exhibited improved steady-state response; however, the steady-state error of the ADDPG algorithm was smaller than that of the DDPG algorithm. The reward function of an episode is displayed in Figure 9. The ADDPG algorithm achieved superior final performance.

Figure 5. Episodes reward of the ADDPG.

Figure 6. Episodes reward of the DDPG.

Figure 7. Response of the attack angle.

Figure 8. Tracking error.

Figure 9. Response of reward function.

The average tracking errors of methods are presented in Table 3. The online scheduling through DDPG and ADDPG can efficiently reduce the average tracking error; the adaptive reward function can improve the tracking performance. The proposed method can overcome the undesirable response caused by asynchronous switching and uncertainties in the flight environment.

Table 3. Average tracking errors.

Moreover, to show the effectiveness to deal with system uncertainties and disturbance, we give the simulation results of HiMAT vehicle with disturbances and uncertainties of aerodynamic parameters, which can also illustrate the potential application prospects for practical environment. The results are described in Figures 10, 11, in which we consider the cases where the aerodynamic parameter perturbations are 10, 15, and 20%. The responses of attack angle are given in Figure 10 and the tracking errors are given in Figure 11. The average tracking errors in the presence of aerodynamic perturbations are also provided in Table 4. We can see that the stability and tracking performance can be guaranteed with uncertainties and disturbances by using the proposed method, which illustrates that the proposed method can ensure the control accuracy, stability, and robustness simultaneously.

Figure 10. Response of the attack angle.

Figure 11. Tracking error.

Table 4. Average tracking errors in the presence of aerodynamic perturbations.

Remark 4: We draw inspiration from the traditional method of dealing with the sim-to-real transfer issue. Firstly, the nonlinear model is converted to a linear model by employing Jacobian linearization. Then we can design the nominal controller on the reference points. In most engineering applications, the stability margin is introduced and analyzed to ensure the robustness. Similarly, in this paper, we developed finite-time robust control theory to ensure the stability and attenuation performance. The uncertainties and disturbances in practical environment can be overcome. However, we noticed that it is difficult to realize optimal compromise between robustness and transient performance. The ADDPG algorithm is given to improve the control accuracy. Moreover, the non-fragile control theory is introduced, which ensures the stability and prescribed attenuation performance on the scheduling intervals.

Remark 5: The problem of finite-time tracking control for switched flight vehicles was investigated. According to the numerical examples, the advantages of the suggested control method to address the flight vehicle considering disturbances and uncertainties over the existing control methods are demonstrated, which can be described as follows: (1) Unlike the conventional model-based control methods, the proposed method was developed by using DRL, which can improve control performance and overcome the undesirable response caused by uncertainties. (2) In the proposed method, the advantages of model-based and model-free method are combined. The dynamics-based controller was developed to ensure stability and robustness, and the learning-based controller was proposed to compensate the uncertainties in the flight environment. (3) The established adaptive generalized reward function can improve convergence and robustness.

5 Conclusion

The finite-time control of switched flight vehicles with asynchronous switching was realized using a novel nonfragile DRL method. The flight vehicles were modeled as the switched system, and the asynchronous switching caused by packet dropouts was considered. The MDADT and MLF methods were used to ensure FTS and weighted prescribed attenuation index. LMIs were used to determine the solutions of the finite-time tracking controller. To compensate the external disturbance and improve tracking performance, the ADDPG algorithm based on the actor–critic framework was provided to optimize the parameters of tracking controllers. To improve optimization efficiency and decrease computational complexity, parameter optimization was assumed to be limited in the given range. The compensation of control policy in a given range is considered as the uncertainties of the controller parameters, and the FTS is ensured by nonfragile control theory. Compared with the conventional DDPG algorithm, the adaptive hyper parameters of reward function were introduced to achieve superior control performance and realize a general case. The FTS, robustness, and transient performance were ensured simultaneously by the proposed method. In the future, the following four points should be studied: (1) The event-triggered control structure should be considered to reduce the load and improve the robustness of information transformation. (2) The parallel optimization methods should be presented to improve training efficiency. (3) The fitting ability and generalization ability of neural networks should be studied to improve the robustness in the complex environment. (4) The semi physical simulations and flight tests of mini drones should be developed to further demonstrate the engineering feasibility of the proposed method.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.

Author contributions

HC: Writing – original draft, Writing – review & editing. RS: Writing – original draft, Writing – review & editing. HL: Writing – review & editing, Writing – original draft. WW: Writing – review & editing, Writing – original draft. BZ: Writing – review & editing, Writing – original draft. YF: Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was co-supported by the National Natural Science Foundation of China (No. 62303380, 62176214, 62003268, 62101590) and the Aeronautical Science Foundation of China (No. 201907053001).

Acknowledgments

The authors would like to thank all the reviewers who participated in the review.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Aristidou, P., Fabozzi, D., and Cutsem, V. T. (2014). Dynamic simulation of large-scale power systems using a parallel Schur-complement-based decomposition method. IEEE Trans. Para. and Dis. Sys. 25, 2561–2570. doi: 10.1109/TPDS.2013.252

Ban, J., Kwon, W., Won, S., and Kim, S. (2018). Robust H∞ finite-time control for discrete-time polytopic uncertain switched linear systems. Nonlinear Anal-Hybri 29, 348–362. doi: 10.1016/j.nahs.2018.03.005

Bao, C. Y., Wang, P., and Tang, G. J. (2021). Integrated method of guidance, control and morphing for hypersonic morphing vehicle in glide phase. Chin. J. Aeronaut. 34, 535–553. doi: 10.1016/j.cja.2020.11.009

Chen, L. H., Fu, S. S., Zhao, Y. X., Liu, M., and Qiu, J. B. (2020). State and fault observer design for switched systems via an adaptive fuzzy approach. IEEE Trans. Fuzzy Syst. 28, 2107–2118. doi: 10.1109/TFUZZ.2019.2930485

Chen, S. Z., Ning, C. Y., Liu, Q., and Liu, Q. P. (2022). Improved multiple Lyapunov functions of input–output-to-state stability for nonlinear switched systems. Inf. Sci. 608, 47–62. doi: 10.1016/j.ins.2022.06.025

Cheng, H. Y., Dong, C. Y., Jiang, W. L., Wang, Q., and Hou, Y. Z. (2017). Non-fragile switched H∞ control for morphing aircraft with asynchronous switching. Chin. J. Aeronaut. 30, 1127–1139. doi: 10.1016/j.cja.2017.01.008

Cheng, H. Y., Fu, W. X., Dong, C. Y., Wang, Q., and Hou, Y. Z. (2018). Asynchronously finite-time H∞ control for morphing aircraft. Trans. Inst. Meas. Control. 40, 4330–4344. doi: 10.1177/0142331217746737

Cheng, L., Wang, Z. B., and Jiang, F. H. (2019). Real-time control for fuel-optimal moon landing based on an interactive deep reinforcement learning algorithm. Astrodynamics 3, 375–386. doi: 10.1007/s42064-018-0052-2

Cheng, Y., and Zhang, W. D. (2018). Concise deep reinforcement learning obstacle avoidance for underactuated unmanned marine vessels. Neurocomputing 272, 63–73. doi: 10.1016/j.neucom.2017.06.066

Gaudet, B., Linares, R., and Furfaro, R. (2020). Deep reinforcement learning for six degree-of-freedom planetary landing. Adv. Space Res. 65, 1723–1741. doi: 10.1016/j.asr.2019.12.030

Gheisarnejad, M., and Khooban, H. M. (2021). An intelligent non-integer PID controller-based deep reinforcement learning: implementation and experimental results. IEEE Trans. Ind. Electron. 68, 3609–3618. doi: 10.1109/TIE.2020.2979561

Giacomin, P. A. S., and Hemerly, E. M. (2022). A distributed, real-time and easy-to-extend strategy for missions of autonomous aircraft squadrons. Inf. Sci. 608, 222–250. doi: 10.1016/j.ins.2022.06.043

Gong, L. G., Wang, Q., Hu, C. H., and Liu, C. (2020). Switching control of morphing aircraft based on Q-learning. Chin. J. Aeronaut. 33, 672–687. doi: 10.1016/j.cja.2019.10.005

Grigorie, L. T., Khan, S., Botez, M. R., Mamou, M., and Mébarki, Y. (2022). Design and experimental testing of a control system for a morphing wing model actuated with miniature BLDC motors. Chin. J. Aeronaut. 33, 1272–1287. doi: 10.1016/j.cja.2019.08.007

Guo, Q., Zhang, Y., Celler, G. B., and Su, W. S. (2019). Neural adaptive backstepping control of a robotic manipulator with prescribed performance constraint. IEEE Trans. Neural Netw. Learn. Syst. 30, 3572–3583. doi: 10.1109/TNNLS.2018.2854699

Hu, Q. L., Xiao, L., and Wang, C. L. (2019). Adaptive fault-tolerant attitude tracking control for spacecraft with time-varying inertia uncertainties. Chin. J. Aeronaut. 32, 674–687. doi: 10.1016/j.cja.2018.12.015

Huang, L. T., Li, Y. M., and Tong, S. C. (2020). Fuzzy adaptive output feedback control for MIMO switched nontriangular structure nonlinear systems with unknown control directions. IEEE Trans. Syst. Man. Cybern. Syst. 50, 550–564. doi: 10.1109/TSMC.2017.2778099

Jiang, W. L., Wu, K. S., Wang, Z. L., and Wang, Y. N. (2020). Gain-scheduled control for morphing aircraft via switching polytopic linear parameter-varying systems. Aerosp. Sci. Technol. 107:106242. doi: 10.1016/j.ast.2020.106242

Lee, S., and Kim, D. (2022). Deep learning based recommender system using cross convolutional filters. Inf. Sci. 592, 112–122. doi: 10.1016/j.ins.2022.01.033

Li, M. L., and Deng, F. Q. (2018). Moment exponential input-to-state stability of non-linear switched stochastic systems with Lévy noise. IET Contr. Theory Appl. 12, 1208–1215. doi: 10.1049/iet-cta.2017.1229

Liu, Y., Dong, C. Y., Zhang, W. Q., and Wang, Q. (2021). Phase plane design based fast altitude tracking control for hypersonic flight vehicle with angle of attack constraint. Chin. J. Aeronaut. 34, 490–503. doi: 10.1016/j.cja.2020.04.026

Liu, T. J., Du, X., Sun, X. M., Richter, H., and Zhu, F. (2019). Robust tracking control of aero-engine rotor speed based on switched LPV model. Aerosp. Sci. Technol. 91, 382–390. doi: 10.1016/j.ast.2019.05.031

Liu, L. J., Zhao, X. D., Sun, X. M., and Zong, G. D. (2020). Stability and l2-gain analysis of discrete-time switched systems with mode-dependent average dwell time. IEEE Trans. Syst. Man. Cybern. Syst. 50, 2305–2314. doi: 10.1109/TSMC.2018.2794738

Lu, Y., Jia, Z., Liu, X., and Lu, K. F. (2022). Output feedback fault-tolerant control for hypersonic flight vehicles with non-affine actuator faults. Acta Astronaut. 193, 324–337. doi: 10.1016/j.actaastro.2022.01.023

Sakthivel, R., Wang, C., Santra, S., and Kaviarasan, B. (2018). Non-fragile reliable sampled-data controller for nonlinear switched time-varying systems. Nonlinear Anal. Hybrid Syst. 27, 62–76. doi: 10.1016/j.nahs.2017.08.005

Sun, Y. M., and Lei, Z. (2021). Fixed-time adaptive fuzzy control for uncertain strict feedback switched systems. Inf. Sci. 546, 742–752. doi: 10.1016/j.ins.2020.08.059

Tailor, D., and Izzo, D. (2019). Learning the optimal state-feedback via supervised imitation learning. Astrodynamics 3, 361–374. doi: 10.1007/s42064-019-0054-0

Wang, J. H., Ha, L., Dong, X. W., Li, Q. D., and Ren, Z. (2021). Distributed sliding mode control for time-varying formation tracking of multi-UAV system with a dynamic leader. Aerosp. Sci. Technol. 111:106549. doi: 10.1016/j.ast.2021.106549

Wang, Z. C., Sun, J., Chen, J., and Bai, Y. Q. (2020). Finite-time stability of switched nonlinear time-delay systems. Int. J. Robust Nonlinear Control 30, 2906–2919. doi: 10.1002/rnc.4928

Wang, Z. L., Wang, Q., Dong, C. Y., and Gong, L. G. (2015). Closed-loop fault detection for full-envelope flight vehicle with measurement delays. Chin. J. Aeronaut. 28, 832–844. doi: 10.1016/j.cja.2015.04.009

Wang, H., and Xu, R. (2022). Heuristic decomposition planning for fast spacecraft reorientation under multiaxis constraints. Acta Astronaut. 198, 286–294. doi: 10.1016/j.actaastro.2022.06.012

Wang, F., Zhang, X. Y., Chen, B., Chong, L., Li, X. H., and Zhang, J. (2017). Adaptive finite-time tracking control of switched nonlinear systems. Inf. Sci. 421, 126–135. doi: 10.1016/j.ins.2017.08.095

Wei, J. M., Zhang, X. X., Zhi, H. M., and Zhu, X. L. (2020). New finite-time stability conditions of linear discrete switched singular systems with finite-time unstable subsystems. J. Frankl. Inst. 357, 279–293. doi: 10.1016/j.jfranklin.2019.03.045

Wu, C. H., Yan, J. G., Lin, H., Wu, X. W., and Xiao, B. (2021). Fixed-time disturbance observer-based chattering-free sliding mode attitude tracking control of aircraft with sensor noises. Aerosp. Sci. Technol. 111:106565. doi: 10.1016/j.ast.2021.106565

Xiao, X. Q., Park, H. J., Zhou, L., and Lu, G. P. (2020). Event-triggered control of discrete-time switched linear systems with network transmission delays. Automatica 111:108585. doi: 10.1016/j.automatica.2019.108585

Xu, J., Hou, Z. M., Wang, W., Xu, B. H., Zhang, K. G., and Chen, K. (2019). Feedback deep deterministic policy gradient with fuzzy reward for robotic multiple peg-in-hole assembly tasks. IEEE Trans. Ind. Inform. 15, 1658–1667. doi: 10.1109/TII.2018.2868859

Xu, X. Z., Mao, X., Li, Y., and Zhang, H. B. (2019). New result on robust stability of switched systems with all subsystems unstable. IET Contr. Theory Appl. 13, 2138–2145. doi: 10.1049/iet-cta.2019.0018

Yang, D., Li, X. D., and Song, S. J. (2020). Design of state-dependent switching laws for stability of switched stochastic neural networks with time-delays. IEEE Trans. Neural Netw. Learn. Syst. 31, 1808–1819. doi: 10.1109/TNNLS.2019.2927161

Yang, D., Zong, G. D., Liu, Y. J., and Ahn, C. K. (2022). Adaptive neural network output tracking control of uncertain switched nonlinear systems: an improved multiple Lyapunov function method. Inf. Sci. 606, 380–396. doi: 10.1016/j.ins.2022.05.071

Yuan, S., De Schutter, B., and Baldi, S. (2017). Adaptive asymptotic tracking control of uncertain time-driven switched linear systems. IEEE Trans. Autom. Control 62, 5802–5807. doi: 10.1109/TAC.2016.2639479

Yue, T., Xu, Z. J., Wang, L. X., and Wang, T. (2019). Sliding mode control design for oblique wing aircraft in wing skewing process. Chin. J. Aeronaut. 32, 263–271. doi: 10.1016/j.cja.2018.11.002

Zhang, L. X., Nie, L., Cai, B., Yuan, S., and Wang, D. Z. (2019). Switched linear parameter-varying modeling and tracking control for flexible hypersonic vehicle. Aerosp. Sci. Technol. 95:105445. doi: 10.1016/j.ast.2019.105445

Zhang, M., and Zhu, Q. X. (2019). Input-to-state stability for non-linear switched stochastic delayed systems with asynchronous switching. IET Contr. Theory Appl. 13, 351–359. doi: 10.1049/iet-cta.2018.5956

Zhang, M., and Zhu, Q. X. (2020). Stability analysis for switched stochastic delayed systems under asynchronous switching: a relaxed switching signal. Int. J. Robust Nonlinear Control 30, 8278–8298. doi: 10.1002/rnc.5240

Zhao, X. D., Shi, P., Yin, Y. F., and Nguang, S. K. (2017). New results on stability of slowly switched systems: a multiple discontinuous Lyapunov function approach. IEEE Trans. Autom. Control 62, 3502–3509. doi: 10.1109/TAC.2016.2614911

Zhao, X. D., Zhang, L. X., Shi, P., and Liu, P. (2012). Stability of switched positive linear systems with average dwell time switching. Automatica 48, 1132–1137. doi: 10.1016/j.automatica.2012.03.008

Keywords: switched systems, asynchronous switching, deep reinforcement learning, nonfragile control, finite-time H∞ control

Citation: Cheng H, Song R, Li H, Wei W, Zheng B and Fang Y (2023) Realizing asynchronous finite-time robust tracking control of switched flight vehicles by using nonfragile deep reinforcement learning. Front. Neurosci. 17:1329576. doi: 10.3389/fnins.2023.1329576

Received: 29 October 2023; Accepted: 20 November 2023;
Published: 21 December 2023.

Edited by:

Ziming Zhang, Worcester Polytechnic Institute, United States

Reviewed by:

Weilai Jiang, Hunan University, China
Lin Cheng, Beihang University, China

Copyright © 2023 Cheng, Song, Li, Wei, Zheng and Fang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Haoyu Cheng, chenghaoyu@nwpu.edu.cn

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.