ORIGINAL RESEARCH article

Front. Neurorobot., 29 January 2025

Volume 19 - 2025 | https://doi.org/10.3389/fnbot.2025.1549414

Privacy-preserving ADP for secure tracking control of AVRs against unreliable communication

  • 1School of Astronautics, Beihang University, Beijing, China
  • 2School of Electrical Engineering, University of Jinan, Jinan, China
  • 3LAAS-CNRS, University of Toulouse, CNRS, Toulouse, France
  • 4Department of Aeronautical and Automotive Engineering, Loughborough University, Loughborough, United Kingdom

In this study, we developed an encrypted guaranteed-cost tracking control scheme for autonomous vehicles or robots (AVRs), by using the adaptive dynamic programming technique. To construct the tracking dynamics under unreliable communication, the AVR's motion is analyzed. To mitigate information leakage and unauthorized access in vehicular network systems, an encrypted guaranteed-cost policy iteration algorithm is developed, incorporating encryption and decryption schemes between the vehicle and the cloud based on the tracking dynamics. Building on a simplified single-network framework, the Hamilton-Jacobi-Bellman equation is approximately solved, avoiding the complexity of dual-network structures and reducing the computational costs. The input-constrained issue is successfully handled using a non-quadratic value function. Furthermore, the approximate optimal control is verified to stabilize the tracking system. A case study involving an AVR system validates the effectiveness and practicality of the proposed algorithm.

1 Introduction

Autonomous vehicles or robots (AVRs) have rapidly transformed from a futuristic concept into a tangible reality, driving significant advancements in automotive technology. Research on autonomous vehicle technology has increasingly focused on improving tracking control systems, which are crucial for effective vehicle guidance (Pan et al., 2023). However, a persistent issue is the unreliable communication between a local vehicle and a reference vehicle, which leads to discrepancies in signal reception and degrades tracking precision. In addition, the emergence of connected vehicles (Li et al., 2019a; Liu et al., 2023b), which leverage cloud computing for data processing and optimization, presents both opportunities and challenges. These systems function as cyber–physical systems (He et al., 2014; Zhang et al., 2014; Mohan et al., 2020), integrating computational and physical processes to enhance real-time data exchange and improve overall traffic management (Jiang et al., 2022; Li et al., 2019b). However, during communication between the vehicle and the cloud, the network's homogeneous and civilian nature makes it particularly vulnerable to attacks. This vulnerability, especially in the absence of robust security protocols, exposes these systems to cyber threats, including eavesdropping.

To enhance the security of vehicular cyber-physical systems, researchers from various fields, such as communication, control systems, and information theory, have developed a range of strategies to address cyberattacks across different layers (Han et al., 2024; Deng and Wen, 2021; Liu et al., 2021, 2023a). Several types of attacks, including denial-of-service (DoS) attacks, false data injection (FDI) attacks, and replay attacks, have been extensively studied (Teixeira et al., 2012; Li et al., 2024; Hu et al., 2023). These attacks share the characteristic of being active strategies designed to disrupt system functionality or manipulate transmitted data. Although defense mechanisms have made progress in countering such threats, the majority of existing methods concentrate on detecting and mitigating explicit attacks, often overlooking the fundamental challenge of ensuring communication security. In vehicular cybersecurity, one of the critical issues is the threat of eavesdropping attacks (Yang et al., 2020; Wu et al., 2022). Unlike the direct and active nature of DoS and FDI attacks, eavesdropping operates passively, enabling attackers to intercept sensitive information while remaining undetected. This makes it a significant long-term threat that can compromise communication confidentiality and can even enable more destructive attacks. Addressing this challenge requires advanced encryption and privacy-preserving techniques to ensure secure communication. Although these methods are effective, they do not ensure optimal control performance at minimal energy cost, as they do not incorporate the principles of optimal control.

Optimal tracking control has become a cornerstone of modern control theory, with adaptive dynamic programming (ADP) algorithms attracting considerable interest in recent years (Lu et al., 2020; Mu et al., 2017b). For non-linear optimal control problems, the principal challenge lies in solving the Hamilton-Jacobi-Bellman (HJB) equation, a problem that is nearly intractable through exact mathematical methods. ADP techniques offer a promising alternative by leveraging neural networks (NNs) to approximate optimal solutions, leading to significant advancements across fields such as automatic control and artificial intelligence (Mu et al., 2017a; Guo et al., 2024). For example, El-Sousy et al. (2021) designed a three-network structure to approximate the solution of the HJB equation for permanent-magnet synchronous motor servo drives. Wang et al. (2020) proposed a dual-network structure to approximate local Q-functions and control policies, solving optimal consensus control for non-linear multiagent systems. Furthermore, ADP-based optimal tracking control has been widely investigated (Dong et al., 2022; Song et al., 2023), including efforts to address input-constrained systems (Yang et al., 2023; Zhang et al., 2018). However, conventional ADP approaches, particularly those employing actor-critic frameworks, are frequently hindered by significant approximation errors introduced during iterative processes and NN training, which restricts their practical applicability.

To address these challenges, researchers have proposed several single-network ADP methodologies designed to streamline system architectures and enhance computational efficiency in handling nonlinear systems (Xu et al., 2023; Chen et al., 2021; Zou and Zhang, 2023). Chen et al. (2021) developed an event-triggered optimal control scheme for a macro–micro stage system, using a single critic NN to solve the modified HJB equation. In Guo et al. (2024), a distributed control strategy for attitude-constrained quadrotor unmanned aerial vehicles is proposed based on a critic network. Among the core ADP algorithms, value iteration and policy iteration (PI) have been widely employed, demonstrating robust performance in numerous applications (Zhang et al., 2020; Lin et al., 2023). However, the two-stage iterative procedures inherent in these methods frequently involve information transmission, which makes them susceptible to interception by adversaries. This vulnerability necessitates additional security measures, thereby increasing computational complexity and further constraining their applicability to complex systems. Although eliminating actor networks has reduced the computational burden, current ADP methods still inadequately address essential issues such as input saturation and reliable system performance, leaving these critical areas open for further research.

Unlike the previous studies, this article proposes an encrypted guaranteed-cost tracking control scheme for input-constrained tracking systems with unreliable communication. The main contributions are summarized as follows:

1. This article introduces an encrypted guaranteed-cost tracking control scheme for AVRs under unreliable communication. Compared with existing works, this is the first attempt to integrate ADP with encryption techniques, addressing both control performance and information security challenges in vehicular networks.

2. The designed privacy-preserving control method introduces a strategy to address eavesdropping attacks in control systems. By applying consistent output masking and encryption mechanisms at both the vehicle side and the cloud side, sensitive data and critical control information are effectively protected from potential breaches. This integrated approach ensures secure data transmission while maintaining the integrity and privacy of the control system.

3. A single-network structure with enhanced computational efficiency is proposed to approximate the HJB equation. Compared to conventional dual-network designs, the single-network structure reduces computational complexity while maintaining theoretical guarantees on weight error convergence and system stability. Additionally, input saturation is explicitly addressed through the adoption of a nonlinear value function, further enhancing the robustness.

2 Preliminaries and problem formulation

Consider an AVR operating in the X-Y plane. The position and orientation of the vehicle's mass center are represented by the posture vector

$$Z := [x(t),\; y(t),\; \vartheta(t)]^{\top},$$

where x(t) and y(t) denote the horizontal and vertical positions, respectively, and ϑ(t) denotes the heading direction measured counterclockwise from the X-axis. The vehicle's motion is governed by the following kinematic model:

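$$\dot{Z} = K u(t) = \begin{bmatrix} \cos(\vartheta(t)) & Y\sin(\vartheta(t)) \\ \sin(\vartheta(t)) & -Y\cos(\vartheta(t)) \\ 0 & 1 \end{bmatrix} \begin{bmatrix} v(t) \\ w(t) \end{bmatrix}. \tag{1}$$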

Here, v(t) and w(t) represent the vehicle's linear and rotational velocities, respectively, Y is the distance between the vehicle's mass center and its drive axle, and K is the Jacobian matrix that links the control inputs to the vehicle's motion. The control objectives are summarized as follows.
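To make the kinematic model of Equation 1 concrete, the following minimal sketch evaluates its right-hand side and advances the posture by one forward-Euler step. The function names, the step size `dt`, and the choice of Euler integration are illustrative assumptions and are not part of the original study.

```python
import math


def posture_rate(theta, v, w, Y):
    """Right-hand side of Equation 1: Z_dot = K(theta) @ [v, w].

    theta: heading angle, v: linear velocity, w: rotational velocity,
    Y: distance between the mass center and the drive axle.
    """
    x_dot = math.cos(theta) * v + Y * math.sin(theta) * w
    y_dot = math.sin(theta) * v - Y * math.cos(theta) * w
    theta_dot = w
    return (x_dot, y_dot, theta_dot)


def euler_step(Z, v, w, Y, dt):
    """One forward-Euler step of the posture Z = (x, y, theta).

    Euler integration is an assumption made for illustration only.
    """
    x, y, theta = Z
    dx, dy, dth = posture_rate(theta, v, w, Y)
    return (x + dt * dx, y + dt * dy, theta + dt * dth)
```

For instance, `euler_step((0.0, 0.0, 0.0), 0.5, 0.04, 1.2, 0.01)` moves the vehicle slightly forward along the X-axis while the heading starts to rotate counterclockwise.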

Control objective: For an AVR under unreliable communication, design an ADP-based robust optimal controller with secure information exchange to drive the vehicle along the target, such that the following objectives are achieved:

1) Robust tracking control objective: The AVR posture Zc := [x(t); y(t); ϑ(t)] should track the desired orbit Zd := [xd(t); yd(t); ϑd(t)] under malicious cyberattacks on the tracking process, as shown in Figure 1. When an attack occurs, a small deviation arises between the received signal and the actual signal. This deviation is defined as Za := [xa(t); ya(t); ϑa(t)]. We assume that Za and its derivative are bounded.

Figure 1. Proposed scheme for tracking process of AVRs.

With the minor difference Za caused by unreliable communication, following the framework in Zhang et al. (2022), we derive the tracking error system as

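$$\dot{Z}_e = \begin{bmatrix} \cos(\vartheta_e(t))\,v_d(t) + y_e(t)\,w_d(t) - v_c(t) + \gamma_x(t) \\ \sin(\vartheta_e(t))\,v_d(t) - x_e(t)\,w_d(t) - w_c(t) + \gamma_y(t) \\ w_d(t) - w_c(t) + \gamma_\vartheta(t) \end{bmatrix}, \tag{2}$$

where Ze := [xe(t); ye(t); ϑe(t)] denotes the tracking error posture, vd(t) and wd(t) are the desired linear and rotational velocities, vc(t) and wc(t) are the control inputs of the vehicle, and [γx(t); γy(t); γϑ(t)] captures the effect of cyberattacks on the received signals and is given by

$$\begin{bmatrix} \gamma_x(t) \\ \gamma_y(t) \\ \gamma_\vartheta(t) \end{bmatrix} = \begin{bmatrix} \cos(\vartheta_c(t))\,\dot{x}_a + \sin(\vartheta_c(t))\,\dot{y}_a \\ -\sin(\vartheta_c(t))\,\dot{x}_a + \cos(\vartheta_c(t))\,\dot{y}_a \\ \dot{\vartheta}_a \end{bmatrix}.$$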

This model describes the dynamic behavior of the tracking error in AVR control.

To facilitate system description and control implementation, let us consider that z = [xe(t);ye(t);ϑe(t)], f(z) = [cos(ϑe(t))vd(t); sin(ϑe(t))vd(t);wd(t)], g(z) = [−1, ye(t);0, Lxe(t);0, 1], and u = [vc(t), wc(t)]. The system (Equation 2) is rewritten as

ż=f(z)+g(z)u+γ,    (3)

where u is the control input, which satisfies the constraint set 𝔒 = {u : |u| ≤ ℏ}. To follow the conventional optimal tracking architecture, we rewrite the reference trajectory as

żd=fd(zd)+gd(zd)ud,    (4)

where ud is the steady-state control input taking the following form

$$u_d = g_d^{-1}(z_d)\big(\dot{z}_d - f_d(z_d)\big), \tag{5}$$

where $g_d^{-1}(z_d)\,g_d(z_d) = I_n$ and $I_n$ denotes the n × n identity matrix.

Assumption 1. The unreliable communication γ(t) is bounded, that is, ‖γ(t)‖ ≤ γ̄, where γ̄ is a positive constant.

2) Prevent eavesdropping objective: As shown in Figure 1, the cloud handles monitoring, scheduling, optimization, and computation tasks, while the local controller is responsible for distributing control signals, albeit with limited data storage and processing capabilities. The cyberattack considered here is eavesdropping, where unauthorized interception of data during transmission allows attackers to steal sensitive system information, such as real-time control signals and operational states. To mitigate these risks, encryption and decryption mechanisms are implemented to safeguard the confidentiality and integrity of the transmitted data, ensuring secure communication and enhancing the system's overall reliability.

3) Optimal control objective: Based on the optimal control strategy, the AVR can achieve a compromise between performance and cost when running along a target, such that

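$$\min\; J(z) = \int_t^{\infty} \big(\gamma_1\bar{\gamma}^2 + T(z,u)\big)\,ds, \quad \text{s.t.}\;\; \dot{z} = f(z) + g(z)u,\;\; u \in \mathfrak{O}, \tag{6}$$

where $T(z,u) = z^{\top}Qz + \bar{U}(\mu)$ is the utility function with feedback control μ = u − ud, γ1 is a positive constant, $Q = Q^{\top} > 0$, and Ū(·) is a positive definite non-quadratic integrand function.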

3 Iterative algorithm design

In this section, based on the preceding analysis, the tracking problem is reformulated into a stabilization problem for the error dynamics. To address this, a cryptography-based controller is designed, which not only mitigates the impact of unreliable communication but also ensures the security of information transmission against eavesdropping.

3.1 Encryption and decryption algorithm design

To effectively counter eavesdropping attacks on data transmitted between the vehicle side and the cloud side, privacy-preserving rules are designed for both sides. The encryption and decryption formulas (Han et al., 2024) for each iteration are provided in the following.

1) AVR to Cloud:

Encryption process: At the vehicle side, the data z to be sent are extracted from Equation 3 and encrypted using Equation 7, resulting in the encrypted data zs. These encrypted data are then transmitted to the cloud, where they are received as zr. The encryption formula is as follows:

$$\begin{cases} z_s = a(t)\,z + A\,\xi(t), & \text{(7a)} \\ a(t) = e^{\left(\delta_1 \|V_r(z)(t-1)\|_2^2\right)}, & \text{(7b)} \\ \xi(t) = \rho_1\, e^{(t \bmod \rho_2)}, & \text{(7c)} \end{cases}$$

where a(t) and ξ(t) are encryption operators, δ1, ρ1, and ρ2 are constants, and A is the channel assignment matrix. To simplify the presentation of the method, it is assumed that Vs(z)(t − 1) is already stored in the cloud. The value V(z) needs to be calculated on the cloud side. Its design is detailed in Section 3.2, and it serves as an essential component of the controller μ.

Decryption process: The cloud side receives the encrypted data zr and decrypts it to recover the original data z. The decryption formula is as follows:

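$$\begin{cases} z = \dfrac{z_r - A\,\xi(t)}{c(t)}, & \text{(8a)} \\ c(t) = e^{\left(\delta_1 \|V_s(z)(t-1)\|_2^2\right)}, & \text{(8b)} \end{cases}$$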

where c(t) is the counterpart of a(t). It is observed that the design forms of the encryption operators a(t) and ξ(t), and encrypted expressions are shared between the vehicle side and the cloud. Furthermore, the parameters A, δ1, ρ1, and ρ2 are also shared.

2) Cloud to AVR:

Encryption process: After policy evaluation, the computed V(z) is encrypted using Equation 9 and sent back to the vehicle.

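$$\begin{cases} V_s(z) = b(t)\,V(z) + B\,\zeta(t), & \text{(9a)} \\ b(t) = e^{\left(\delta_2 \|z_r\|_2^2\right)}, & \text{(9b)} \\ \zeta(t) = \varrho_1\, e^{(t \bmod \varrho_2)}, & \text{(9c)} \end{cases}$$

where b(t) and ζ(t) are encryption operators, δ2, ϱ1, and ϱ2 are constants, and B is the channel assignment matrix.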

Decryption process: At the vehicle side, the received encrypted data Vr(z) is decrypted using Equation 10 to recover V(z) for policy improvement.

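$$\begin{cases} V(z) = \dfrac{V_r(z) - B\,\zeta(t)}{d(t)}, & \text{(10a)} \\ d(t) = e^{\left(\delta_2 \|z_s\|_2^2\right)}, & \text{(10b)} \end{cases}$$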

where d(t) is the counterpart of b(t). Similarly, the design forms of the encryption operators b(t) and ζ(t), and encrypted expressions are shared between the vehicle side and the cloud. Furthermore, the parameters B, δ2, ϱ1, and ϱ2 are also shared. At this point, a complete iteration of privacy-preserving processing has been completed.

From the above encryption and decryption processes, it can be observed that the introduced masking signals ξ(t) and ζ(t) and the encryption formula designs effectively ensure privacy during data transmission between the vehicle and the cloud. Notably, the data transmitted over the network are not the raw values z and V(z) but their encrypted counterparts, zs, zr, Vs(z), and Vr(z), which effectively prevent unauthorized entities from intercepting sensitive information.
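The masking round trip described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the channel assignment matrix is taken as the scalar 1 (as in the simulation section), the sign of the exponent in a(t) is read directly from the extracted formula, the helper names are hypothetical, and `V_prev` is a made-up stand-in for the previously exchanged encrypted value function.

```python
import math


def mask(t, rho1, rho2):
    """Masking signal xi(t) = rho1 * exp(t mod rho2), as in Equation 7c."""
    return rho1 * math.exp(t % rho2)


def encrypt(value, key_norm_sq, t, delta, rho1, rho2, A=1.0):
    """Vehicle-side masking: a(t) * value + A * xi(t).

    key_norm_sq is the squared 2-norm of the quantity shared by both
    sides (the previously exchanged encrypted value function).
    """
    a = math.exp(delta * key_norm_sq)
    return a * value + A * mask(t, rho1, rho2)


def decrypt(cipher, key_norm_sq, t, delta, rho1, rho2, A=1.0):
    """Cloud-side recovery: (cipher - A * xi(t)) / c(t), with c(t) = a(t)."""
    c = math.exp(delta * key_norm_sq)
    return (cipher - A * mask(t, rho1, rho2)) / c


# Simulation-section parameters; V_prev and t are hypothetical values.
rho1, rho2, delta1 = 1.1, 1.03, 0.3e-5
V_prev = 12.0
key = V_prev ** 2
t = 3.7
z = [2.5, -2.5, 0.5]
z_s = [encrypt(zi, key, t, delta1, rho1, rho2) for zi in z]
z_rec = [decrypt(ci, key, t, delta1, rho1, rho2) for ci in z_s]
```

Because both sides hold the same parameters and the same previous value, the cloud recovers z up to floating-point error, while the transmitted values `z_s` differ from z by a time-varying scale and offset. The cloud-to-vehicle direction follows the same pattern with δ2, ϱ1, ϱ2, and B.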

3.2 Encrypted iterative algorithm design

The objective is to stabilize the system in Equation 3 by constructing an encrypted iterative algorithm that minimizes the performance index function, thereby reducing control costs and enhancing system security. Recalling Equation 6, the performance index is

$$V(z) = \int_t^{\infty} \big(\gamma_1\bar{\gamma}^2 + z^{\top}Qz + \bar{U}(\mu)\big)\,ds, \tag{11}$$

where

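$$\bar{U}(\mu) = \sum_{i=1}^{m} 2\theta_1 \int_0^{u_i - u_{d,i}} h^{-1}\!\left(\frac{\iota_i}{\theta_1}\right) r_i \, d\iota_i = 2\theta_1 \int_0^{u - u_d} h^{-1}\!\left(\frac{\iota}{\theta_1}\right) R \, d\iota, \tag{12}$$

where R = diag{[r1, ..., rm]} > 0 and $\iota = [\iota_1, \ldots, \iota_m]^{\top}$. The function h(·) is assumed to be a monotonic odd function satisfying h(0) = 0. For the purposes of this article, h(·) is specifically selected as the hyperbolic tangent, $h(x) = (e^{x} - e^{-x})/(e^{x} + e^{-x}) = \tanh(x)$.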
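For a single input channel with h(·) = tanh(·), the integral in Equation 12 has a closed form, which the sketch below checks against a simple midpoint quadrature. The closed-form expression follows from integrating the inverse hyperbolic tangent by parts; the function names and the quadrature resolution `n` are assumptions made for illustration only.

```python
import math


def cost_closed_form(mu, theta, r):
    """Scalar case of Equation 12 with h = tanh:

    2*theta*r * integral_0^mu atanh(s/theta) ds
      = 2*theta*r*mu*atanh(mu/theta) + theta^2 * r * ln(1 - (mu/theta)^2),
    valid for |mu| < theta.
    """
    ratio = mu / theta
    return (2.0 * theta * r * mu * math.atanh(ratio)
            + theta ** 2 * r * math.log(1.0 - ratio ** 2))


def cost_numeric(mu, theta, r, n=20000):
    """Midpoint-rule approximation of the same integral (for checking)."""
    h = mu / n
    total = sum(math.atanh((k + 0.5) * h / theta) for k in range(n))
    return 2.0 * theta * r * total * h
```

With θ1 = 1.5 and r = 1.5, the two evaluations agree closely for any |μ| < 1.5. The cost is zero at μ = 0 and grows steeply as |μ| approaches the bound, which is how the non-quadratic term penalizes inputs near saturation.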

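According to optimal control theory, Equation 11 is a Lyapunov function for the system in Equation 3, and the Hamiltonian function can be derived as

$$H(z,\mu,\nabla V(z)) = \gamma_1\bar{\gamma}^2 + z^{\top}Qz + \bar{U}(\mu) + \nabla V^{\top}(z)\big(f(z) + g(z)u + \gamma\big), \tag{13}$$

with $\nabla V(z) = \partial V(z)/\partial z$. Defining V*(z) as the minimum value of Equation 11 and applying Bellman's principle of optimality, we have

$$0 = H(z,\mu,\nabla V^{*}(z)) = \gamma_1\bar{\gamma}^2 + z^{\top}Qz + \bar{U}(\mu) + \nabla V^{*\top}(z)\big(f(z) + g(z)u^{*} + \gamma\big), \tag{14}$$

and the optimal control u* is obtained from $\partial H(z,\mu,\nabla V^{*}(z))/\partial u^{*} = 0$:

$$u^{*} = \theta_1 \tanh\!\left(-\frac{1}{2\theta_1} R^{-1} g^{\top}(z)\,\nabla V^{*}(z)\right) + u_d. \tag{15}$$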

Substituting Equation 15 into Equation 12 yields

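$$\bar{U}(\mu^{*}) = \nabla V^{*\top}(z)\,g(z)\tanh(D(z)) + \theta_1^{2}\sum_{i=1}^{m}\ln\!\big(1 - \tanh^{2}(D_i(z))\big), \tag{16}$$

where $D(z) = \frac{1}{2\theta_1}R^{-1}g^{\top}(z)\,\nabla V^{*}(z)$ and $\mu^{*} = u^{*} - u_d$. Then, the HJB equation can be derived as

$$H(z,\mu^{*},\nabla V^{*}(z)) = \gamma_1\bar{\gamma}^2 + z^{\top}Qz + \nabla V^{*\top}(z)\big(f(z) + \gamma\big) + \theta_1^{2}\sum_{i=1}^{m}\ln\!\big(1 - \tanh^{2}(D_i(z))\big) = 0. \tag{17}$$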

As highlighted in the preceding analysis, obtaining the optimal controller in Equation 15 necessitates solving the HJB Equation 17, a task well-known for its considerable computational and analytical challenges. To overcome this challenge, an iterative algorithm based on ADP is employed to obtain an approximate solution. The details of this iterative algorithm are presented in Algorithm 1.

Algorithm 1. Encrypted guaranteed cost policy iteration algorithm.

Lemma 1. By utilizing the encrypted PI process as described in Algorithm 1, which incorporates encryption and decryption steps for secure control of the tracking error dynamics in an AVR, the resulting control uς ensures the asymptotic stability of the system dynamics. Additionally, Vς(z) will converge to the optimal value function V*(z) as ς → ∞, ensuring that uς converges to the optimal control u*.

Proof. Initially, without iterations, the control u1 is considered admissible. For any uς produced during the iterations, consider the Lyapunov function Vς(z), which satisfies

$$\dot{V}_{\varsigma}(z) = \nabla V_{\varsigma}^{\top}(z)\,\dot{z} = \nabla V_{\varsigma}^{\top}(z)\big(f(z) + g(z)u_{\varsigma} + \gamma\big). \tag{20}$$

According to the HJB Equation 17, we can derive

$$\nabla V_{\varsigma}^{\top}(z)\big(f(z) + g(z)u_{\varsigma} + \gamma\big) = -\gamma_1\bar{\gamma}^2 - z^{\top}Qz - \bar{U}(\mu_{\varsigma}), \tag{21}$$

where μς = uς − ud. Then, substituting Equation 21 into Equation 20 yields

$$\dot{V}_{\varsigma}(z) = -\gamma_1\bar{\gamma}^2 - z^{\top}Qz - \bar{U}(\mu_{\varsigma}) \le 0. \tag{22}$$

Therefore, the iteration process ensures that the error dynamics remain asymptotically stable. Moreover, policy improvement is achieved by minimizing the associated value function, consistent with the Kleinman method, which guarantees convergence. As the iteration count ς → ∞, Vς(z) → V*(z) and uς → u* hold. This concludes the proof.

Based on Lemma 1, the iterative process, enhanced with secure encryption and decryption, converges, leading to optimal control as the approximation errors diminish.
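The evaluate-then-improve structure behind Lemma 1 is easiest to see in the classical Kleinman setting that the proof refers to. The sketch below applies it to a scalar, unconstrained, linear-quadratic problem without encryption or disturbance, which is a deliberate simplification and not the algorithm of this article; the system parameters and the function name are hypothetical.

```python
import math


def kleinman_scalar(a, b, q, r, k0, iters=20):
    """Policy iteration for x_dot = a*x + b*u with cost integral of q*x^2 + r*u^2.

    Policy evaluation: for u = -k*x and V = p*x^2, solve
        2*p*(a - b*k) + q + r*k^2 = 0   for p.
    Policy improvement: k <- b*p / r.
    The initial gain k0 must be stabilizing (a - b*k0 < 0).
    """
    k = k0
    history = []
    p = None
    for _ in range(iters):
        a_cl = a - b * k
        if a_cl >= 0.0:
            raise ValueError("policy is not admissible (closed loop unstable)")
        p = (q + r * k * k) / (-2.0 * a_cl)   # policy evaluation
        history.append(p)
        k = b * p / r                          # policy improvement
    return p, k, history


# Hypothetical example: unstable open loop (a = 1), admissible initial gain.
p_final, k_final, p_history = kleinman_scalar(1.0, 1.0, 1.0, 1.0, k0=3.0)
p_riccati = 1.0 * (1.0 + math.sqrt(1.0 + 1.0)) / 1.0  # r*(a + sqrt(a^2 + b^2*q/r))/b^2
```

In this example the value coefficients decrease monotonically (2.5, 2.4167, 2.4142, ...) toward the Riccati solution 1 + √2, mirroring the convergence Vς(z) → V*(z) stated in Lemma 1.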

4 Critic neural network design

In this section, this study employs the fundamental update equations of PI to design a NN, utilizing the critic neural network (CNN) to approximate the solution of the HJB Equation 17 during each iteration step. Therefore, based on the universal approximation property of NNs, there exist ideal weights W* such that the ideal value function can be approximated as

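$$V^{*}(z) = W^{*\top}\varphi(z) + \epsilon_1(z), \tag{23}$$

where $\varphi(z) \in \mathbb{R}^{\alpha}$ denotes the activation functions and α is the number of neurons. Utilizing Equation 23, the HJB Equation 17 becomes

$$\gamma_1\bar{\gamma}^2 + z^{\top}Qz + \big(W^{*\top}\nabla\varphi(z) + \nabla\epsilon_1^{\top}(z)\big)\big(f(z) + \gamma\big) + \theta_1^{2}\sum_{i=1}^{m}\ln\!\big(1 - \tanh^{2}(H_i(z))\big) = 0, \tag{24}$$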

where

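$$H_i(z) = H_{1i}(z) + H_{2i}(z) = \frac{1}{2\theta_1}R^{-1}g^{\top}(z)\,\nabla\varphi^{\top}(z)\,W^{*} + \frac{1}{2\theta_1}R^{-1}g^{\top}(z)\,\nabla\epsilon_1^{\top}(z), \tag{25}$$

with $\nabla\varphi(z) = \partial\varphi(z)/\partial z$ and $\nabla\epsilon_1(z) = \partial\epsilon_1(z)/\partial z$. Therefore, by defining the residual error $\epsilon_H$, Equation 24 can be rewritten as

$$\gamma_1\bar{\gamma}^2 + z^{\top}Qz + W^{*\top}\nabla\varphi(z)\big(f(z) + \gamma\big) + \epsilon_H + \theta_1^{2}\sum_{i=1}^{m}\ln\!\big(1 - \tanh^{2}(H_{1i}(z))\big) = 0, \tag{26}$$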

where

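$$\epsilon_H = \nabla\epsilon_1^{\top}(z)\big(f(z) + \gamma\big) - \theta_1^{2}\sum_{i=1}^{m}\frac{1}{O_{1i}(z)}\tanh(O_{2i}(z))\big(1 - \tanh^{2}(O_{2i}(z))\big), \tag{27}$$

with $O_{1i}(z) \in [1 - \tanh^{2}(D_i(z)),\, 1 - \tanh^{2}(H_{1i}(z))]$ and $O_{2i}(z) \in [D_i(z),\, H_{1i}(z)]$. Note that if the number of hidden layer neurons α is sufficiently large, the residual error $\epsilon_H$ will approach zero. Based on the Lipschitz assumption of the system dynamics, $\epsilon_H$ is bounded within a compact set, that is, $\|\epsilon_H\| \le \bar{\epsilon}_H$. Therefore, based on Equation 23, the ideal optimal control is

$$u^{*} = \theta_1\tanh\!\left(-\frac{1}{2\theta_1}R^{-1}g^{\top}(z)\,\nabla\varphi^{\top}W^{*}\right) + u_d + \epsilon_2, \tag{28}$$

where $\epsilon_2 = -\frac{1}{2}\sum_{i=1}^{m}\big(1 - \tanh^{2}(\psi_i)\big)R^{-1}g^{\top}(z)\,\nabla\epsilon_1$ and $\psi_i \in [D_i,\, H_{1i}]$.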

Since the ideal weight is unknown, the approximated value function is

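$$\hat{V}(z) = \hat{W}^{\top}\varphi(z), \tag{29}$$

where $\hat{W}$ is the approximated value of $W^{*}$. Then, we obtain

$$\hat{u} = -\theta_1\tanh\!\left(\frac{1}{2\theta_1}R^{-1}g^{\top}(z)\,\nabla\varphi^{\top}(z)\,\hat{W}\right) + u_d. \tag{30}$$

Thus, the approximated Hamiltonian function is

$$H(z,\hat{\mu},\nabla\hat{V}(z)) = \gamma_1\bar{\gamma}^2 + z^{\top}Qz + \hat{W}^{\top}\nabla\varphi(z)\big(f(z) + \gamma\big) + \theta_1^{2}\sum_{i=1}^{m}\ln\!\big(1 - \tanh^{2}(\hat{H}_{1i}(z))\big) := \hat{\epsilon}_H, \tag{31}$$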

where ϵ^H is the residual error due to NN approximation error.

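Furthermore, let us consider $E = \frac{1}{2}\hat{\epsilon}_H^{\top}\hat{\epsilon}_H$. To ensure that $\hat{W}$ converges toward the optimal weights $W^{*}$, the weight update formula (Zhang et al., 2018) is

$$\dot{\hat{W}} = -\eta\frac{\tau}{\varpi^{2}}\hat{\epsilon}_H + \frac{\eta}{2}\kappa\,\nabla\varphi\,\big(g(I - M(\hat{H}_1))g^{\top}\big)\nabla V_a + \eta\left(-\theta_1\nabla\varphi\,gS\frac{\tau^{\top}}{\varpi}\hat{W} - (K_2 - K_1\tau^{\top})\hat{W}\right), \tag{32}$$

where η is the learning rate, $\tau = \nabla\varphi(z)(f(z) + g(z)\hat{u} + \gamma)$, $\varpi = \tau^{\top}\tau + 1$, and K1 and K2 are tuning matrices. In addition, $M = \mathrm{diag}\{\tanh^{2}(\hat{H}_{1i})\}$ and $S = \mathrm{sgn}(\hat{H}_1) - \tanh(\hat{H}_1)$. Following Lemma 2 of Zhang et al. (2018), $V_a$ denotes a Lyapunov function; if $\nabla V_a^{\top}(f(z) + g(z)\hat{u} + \gamma) < 0$, then κ = 0, otherwise κ = 1. Defining $\tilde{W} = W^{*} - \hat{W}$, we obtain

$$\dot{\tilde{W}} = -\eta\tau\tau^{\top}\tilde{W} + \eta\frac{\tau}{\varpi}\big(\theta_1\tilde{W}^{\top}\nabla\varphi\,gS + \epsilon_H\big) - \frac{\eta}{2}\kappa\,\nabla\varphi\,g(I - M(\hat{H}_1))g^{\top}\nabla V_a + \eta\theta_1\nabla\varphi\,gS\frac{\tau^{\top}}{\varpi}\hat{W} + \eta(K_2 - K_1\tau^{\top})\hat{W}, \tag{33}$$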

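with $\epsilon_H = \theta_1 W^{\top}\nabla\varphi\,g\big(\mathrm{sgn}(H_1) - \mathrm{sgn}(\hat{H}_1)\big) + 2\theta_1^{2}\bar{H} - \epsilon_H$ and $\bar{H} = \sum_{i=1}^{m}\ln\dfrac{1 + \exp(-2H_{1i})}{1 + \exp(-2\hat{H}_{1i})}$.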

Theorem 1. For the optimal control policy described in Equation 30, the weight tuning law of the CNN is determined by the update formula provided in Equation 32. Under this design, the error dynamic system z and the weight errors W~ are uniformly ultimately bounded (UUB).

Proof. Define the Lyapunov function as L = L1 + L2, where

L1=12W~Tη-1W~,   L2=Va(z).    (34)

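First, along Equation 33, the derivative of L1 is

$$\begin{aligned}
\dot{L}_1 &= \tilde{W}^{\top}\eta^{-1}\dot{\tilde{W}} \\
&= \tilde{W}^{\top}\eta^{-1}\Big\{-\eta\tau\tau^{\top}\tilde{W} + \eta\frac{\tau}{\varpi}\big(\theta_1\tilde{W}^{\top}\nabla\varphi\,gS + \epsilon_H\big) - \frac{\eta}{2}\kappa\,\nabla\varphi\,\big(g(I - M(\hat{H}_1))g^{\top}\big)\nabla V_a \\
&\qquad + \eta\theta_1\nabla\varphi\,gS\frac{\tau^{\top}}{\varpi}\hat{W} + \eta(K_2 - K_1\tau^{\top})\hat{W}\Big\} \\
&= -\tilde{W}^{\top}\tau\tau^{\top}\tilde{W} + \theta_1\tilde{W}^{\top}\nabla\varphi\,gS\frac{\tau^{\top}}{\varpi}\tilde{W} + \epsilon_H\frac{\tau^{\top}}{\varpi}\tilde{W} - \frac{1}{2}\kappa\,\nabla V_a^{\top}g(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W} \\
&\qquad + \theta_1\tilde{W}^{\top}\nabla\varphi\,gS\frac{\tau^{\top}}{\varpi}W^{*} - \theta_1\tilde{W}^{\top}\nabla\varphi\,gS\frac{\tau^{\top}}{\varpi}\tilde{W} + \tilde{W}^{\top}(K_2 - K_1\tau^{\top})\hat{W} \\
&= -\tilde{W}^{\top}\tau\tau^{\top}\tilde{W} + \epsilon_H\frac{\tau^{\top}}{\varpi}\tilde{W} - \frac{1}{2}\kappa\,\nabla V_a^{\top}g(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W} \\
&\qquad + \theta_1\tilde{W}^{\top}\nabla\varphi\,gS\frac{\tau^{\top}}{\varpi}W^{*} + \tilde{W}^{\top}(K_2 - K_1\tau^{\top})\hat{W} \\
&= -P^{\top}AP + P^{\top}B - \frac{1}{2}\kappa\,\nabla V_a^{\top}g(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W},
\end{aligned} \tag{35}$$

where

$$P = \begin{bmatrix} \tilde{W}^{\top}\tau \\ \tilde{W} \end{bmatrix}, \quad A = \begin{bmatrix} I & -\frac{1}{2}K_1^{\top} \\ -\frac{1}{2}K_1 & K_2 \end{bmatrix}, \quad B = \begin{bmatrix} -\frac{1}{\varpi}\epsilon_H \\ \big(\theta_1\tilde{W}^{\top}\nabla\varphi\,gS\frac{\tau^{\top}}{\varpi} + K_2 - K_1\tau^{\top}\big)W^{*} \end{bmatrix}.$$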

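Suppose that $\|W^{*}\| \le \bar{W}_1$ with $\bar{W}_1 > 0$. Since $\epsilon_H$ is bounded, $\|B\| \le \bar{B}$ with $\bar{B} > 0$. Therefore, $\dot{L}_1$ satisfies

$$\dot{L}_1 \le -\lambda_{\min}(A)\|P\|^{2} + \bar{B}\|P\| - \frac{1}{2}\kappa\,\nabla V_a^{\top}g(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W}. \tag{36}$$

Depending on the value of κ, the analysis of $\dot{L}$ is divided into two cases. For κ = 0, we have

$$\dot{L} \le \nabla V_a^{\top}\dot{z} - \lambda_{\min}(A)\|P\|^{2} + \bar{B}\|P\|. \tag{37}$$

From Rudin et al. (1964), we know that $\nabla V_a^{\top}\dot{z} < -\|\nabla V_a\|z_m < 0$, where ‖z‖ ≤ zm and zm > 0. Completing the square in ‖P‖, $\dot{L}$ becomes

$$\dot{L} \le -\|\nabla V_a\|z_m - \lambda_{\min}(A)\left(\|P\| - \frac{\bar{B}}{2\lambda_{\min}(A)}\right)^{2} + \frac{\bar{B}^{2}}{4\lambda_{\min}(A)}.$$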

Moreover, L˙<0 if

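$$\|\nabla V_a\| > \frac{\bar{B}^{2}}{4z_m\lambda_{\min}(A)}, \tag{38}$$

or

$$\|P\| > \frac{\bar{B}}{2\lambda_{\min}(A)}. \tag{39}$$

According to Equation 39, we can derive

$$\|\tilde{W}\| > \frac{2\bar{B}}{5\lambda_{\min}(A)}. \tag{40}$$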

For κ = 1, L˙ is

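$$\begin{aligned}
\dot{L} &\le \nabla V_a^{\top}\dot{z} - \lambda_{\min}(A)\|P\|^{2} + \bar{B}\|P\| - \frac{1}{2}\kappa\,\nabla V_a^{\top}g(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W} \\
&\le \nabla V_a^{\top}(f + g\hat{u} + \gamma) - \lambda_{\min}(A)\|P\|^{2} + \bar{B}\|P\| - \frac{1}{2}\nabla V_a^{\top}g(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W}.
\end{aligned} \tag{41}$$

Regarding $\tanh(H_1) - \tanh(\hat{H}_1) := H$, using the Taylor series, we know

$$H = \frac{1}{2\theta_1}(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W} + o\big((H_1 - \hat{H}_1)^{2}\big),$$

where $o((H_1 - \hat{H}_1)^{2})$ is the higher-order term and satisfies

$$\begin{aligned}
\big\|o\big((H_1 - \hat{H}_1)^{2}\big)\big\| &\le \|H\| + \left\|\frac{1}{2\theta_1}(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W}\right\| \\
&\le \|\tanh(H_1)\| + \|\tanh(\hat{H}_1)\| + \left\|\frac{1}{2\theta_1}(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W}\right\| \\
&= \left(\sum_{i=1}^{m}|\tanh(H_{1i})|^{2}\right)^{\frac{1}{2}} + \left(\sum_{i=1}^{m}|\tanh(\hat{H}_{1i})|^{2}\right)^{\frac{1}{2}} + \left\|\frac{1}{2\theta_1}(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W}\right\| \\
&\le 2m + \frac{1}{\theta_1}\bar{\varphi}\|\tilde{W}\|,
\end{aligned} \tag{42}$$

where ‖g‖ ≤ ḡ with ḡ > 0, and $\|\nabla\varphi\| \le \bar{\varphi}$ with $\bar{\varphi} > 0$.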

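Recalling Equations 28–30, the term in Equation 41 with respect to $\nabla V_a^{\top}g$ can be written as

$$\begin{aligned}
\nabla V_a^{\top}\left(g\hat{u} - \frac{1}{2}g(I - M(\hat{H}_1))g^{\top}\nabla\varphi^{\top}\tilde{W}\right) &= -\theta_1\nabla V_a^{\top}g\tanh(H_1) + \theta_1\nabla V_a^{\top}g\,o\big((H_1 - \hat{H}_1)^{2}\big) \\
&= \nabla V_a^{\top}gu^{*} - \nabla V_a^{\top}\epsilon_2 + \theta_1\nabla V_a^{\top}g\,o\big((H_1 - \hat{H}_1)^{2}\big).
\end{aligned} \tag{43}$$

Equation 41 can now be rewritten as

$$\begin{aligned}
\dot{L} &\le \nabla V_a^{\top}(f + gu^{*} + \gamma) - \lambda_{\min}(A)\|P\|^{2} + \bar{B}\|P\| - \nabla V_a^{\top}\epsilon_2 + \theta_1\nabla V_a^{\top}g\,o\big((H_1 - \hat{H}_1)^{2}\big) \\
&\le \|\nabla V_a\|\,\|f + gu^{*}\| + \|\nabla V_a\|\,\|\gamma\| - \lambda_{\min}(A)\|P\|^{2} + \bar{B}\|P\| - \nabla V_a^{\top}\epsilon_2 + \theta_1\nabla V_a^{\top}g\,o\big((H_1 - \hat{H}_1)^{2}\big) \\
&\le -\lambda_{\min}(A)\|P\|^{2} + \bar{B}\|P\| - \lambda_{\min}(C)\|\nabla V_a\|^{2} + \bar{\epsilon}_2\|\nabla V_a\| + 2\theta_1 m\|\nabla V_a\| + \bar{\gamma}\|\nabla V_a\| + 2\bar{\varphi}\|\nabla V_a\|\,\|\tilde{W}\| \\
&= -\lambda_{\min}(A)\|P\|^{2} + \bar{B}\|P\| - \lambda_{\min}(C)\|\nabla V_a\|^{2} + \omega\|\nabla V_a\| + 2\bar{\varphi}\|\nabla V_a\|\,\|\tilde{W}\|,
\end{aligned} \tag{44}$$

where $\|\epsilon_2\| \le \bar{\epsilon}_2$ with $\bar{\epsilon}_2 > 0$. Let ℓ1 and ℓ2 satisfy 0 < ℓ1 < 1, 0 < ℓ2 < 1, and ℓ1 + ℓ2 = 1. Then, Equation 44 can be rewritten as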

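$$\begin{aligned}
\dot{L} &\le -\frac{4\ell_2\lambda_{\min}(C)\lambda_{\min}(A) - 4\bar{\varphi}^{2}}{4\ell_2\lambda_{\min}(C)}\left(\|P\| - \frac{2\ell_2\lambda_{\min}(C)\bar{B}}{4\ell_2\lambda_{\min}(C)\lambda_{\min}(A) - 4\bar{\varphi}^{2}}\right)^{2} \\
&\qquad - \ell_1\lambda_{\min}(C)\left(\|\nabla V_a\| - \frac{\omega}{2\ell_1\lambda_{\min}(C)}\right)^{2} - \ell_1\lambda_{\min}(C)\left(\|\nabla V_a\| - \frac{2\bar{\varphi}}{2\ell_1\lambda_{\min}(C)}\|\tilde{W}\|\right)^{2} \\
&\qquad + \frac{\ell_2\lambda_{\min}(C)\bar{B}^{2}}{4\ell_2\lambda_{\min}(C)\lambda_{\min}(A) - 4\bar{\varphi}^{2}} + \frac{\omega^{2}}{4\ell_1\lambda_{\min}(C)} \\
&= -\frac{\omega_1}{4\ell_2\lambda_{\min}(C)}\left(\|P\| - \frac{2\ell_2\lambda_{\min}(C)\bar{B}}{\omega_1}\right)^{2} - \ell_1\lambda_{\min}(C)\left(\|\nabla V_a\| - \frac{\omega}{2\ell_1\lambda_{\min}(C)}\right)^{2} \\
&\qquad - \ell_1\lambda_{\min}(C)\left(\|\nabla V_a\| - \frac{2\bar{\varphi}}{2\ell_1\lambda_{\min}(C)}\|\tilde{W}\|\right)^{2} + \omega_2.
\end{aligned}$$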

Therefore, L˙<0 if

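$$\|\nabla V_a\| > \frac{2\bar{\varphi}}{2\ell_1\lambda_{\min}(C)} + \frac{\omega_1}{\ell_1\lambda_{\min}(C)}, \tag{45}$$

or

$$\|P\| > \frac{2\ell_2\lambda_{\min}(C)\bar{B}}{\omega_1} + \frac{2\ell_2\lambda_{\min}(C)\omega_2}{\omega_1}. \tag{46}$$

Similar to Equation 40, we can derive

$$\|\tilde{W}\| > \frac{4\ell_2\lambda_{\min}(C)\bar{B}}{5\omega_1} + \frac{4\ell_2\lambda_{\min}(C)\omega_2}{5\omega_1}. \tag{47}$$

By considering the two cases, κ = 0 and κ = 1, and based on the results in Equations 38–40 and Equations 45–47, we can conclude that the function $\nabla V_a$ and the weight errors $\tilde{W}$ are UUB. Furthermore, since $V_a$ is in polynomial form, it follows that the error z is also UUB.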

Remark 1. The algorithm designed in this article is depicted in Figure 2, where Algorithm 1 is implemented using a CNN. The CNN generates the estimated value function $\hat{V}$, which is subsequently used to derive the approximated optimal control law $\hat{u}$ based on Equation 30. In contrast to the constrained optimal control designs presented by Zou and Zhang (2023) and Chen et al. (2021), this work integrates privacy-preserving mechanisms during information transmission by leveraging encryption and decryption techniques. This incorporation not only safeguards data confidentiality but also enhances the overall security and reliability of the proposed algorithm.

Figure 2. Illustration of tracking for AVRs subject to privacy protection.

5 Simulation results

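To analyze the tracking performance of the AVR, we conduct simulations based on a predefined tracking error dynamic model. The tracking error dynamics $\dot{Z}_e$ are modeled as

$$\begin{bmatrix} \dot{x}_e \\ \dot{y}_e \\ \dot{\vartheta}_e \end{bmatrix} = \begin{bmatrix} \cos(\vartheta_e)\,v_d \\ \sin(\vartheta_e)\,v_d \\ w_d \end{bmatrix} + \begin{bmatrix} -1 & y_e \\ 0 & Y - x_e \\ 0 & -1 \end{bmatrix} u + \gamma, \tag{48}$$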

where Y represents the distance from the vehicle's center of mass to the rear axle, set to Y = −1.2m in this article. The desired reference trajectory is initialized with the state:

[xd(0),yd(0),ϑd(0)]T=[0,0,0]T,

and the vehicle's trajectory is initialized with the state:

[xc(0),yc(0),ϑc(0)]T=[-2.5,2.5,-0.5]T.

Consequently, the initial tracking error is

[xe(0),ye(0),ϑe(0)]T=[2.5,-2.5,0.5]T.

The reference trajectory's desired velocities are vd = 0.5 and wd = 0.04. Under the input constraints, ℏ = 1.5, meaning the constraint range is [−1.5, 1.5]. The unreliable communication γ is defined as

$$\gamma(t) = \sigma\begin{bmatrix} \sin(\sigma)\,x_e \\ \cos(\sigma)\,y_e \\ \sin(\sigma)\,x_e y_e \end{bmatrix},$$

where σ is a random variable uniformly distributed in σ ∈ [−0.1, 0.1]. For the performance evaluation, we define the cost function using the weighting matrices

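$$Q = \begin{bmatrix} 10 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 10 \end{bmatrix}, \quad R = \begin{bmatrix} 1.5 & 0 \\ 0 & 0.5 \end{bmatrix}.$$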
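The simulation model can be written down compactly as follows. This is a minimal sketch under stated assumptions: the drift, input matrix, and disturbance are transcribed from Equation 48 and the definition of γ(t) as they appear in this section, the function names are hypothetical, and the integration scheme and controller are left out on purpose.

```python
import math
import random


def drift(z, v_d, w_d):
    """f(z) = [cos(theta_e)*v_d, sin(theta_e)*v_d, w_d]."""
    _, _, th_e = z
    return [math.cos(th_e) * v_d, math.sin(th_e) * v_d, w_d]


def input_matrix(z, Y):
    """g(z) as read from Equation 48 (3 rows, 2 columns)."""
    x_e, y_e, _ = z
    return [[-1.0, y_e], [0.0, Y - x_e], [0.0, -1.0]]


def disturbance(z, sigma):
    """gamma = sigma * [sin(sigma)*x_e, cos(sigma)*y_e, sin(sigma)*x_e*y_e]."""
    x_e, y_e, _ = z
    return [sigma * math.sin(sigma) * x_e,
            sigma * math.cos(sigma) * y_e,
            sigma * math.sin(sigma) * x_e * y_e]


def error_rate(z, u, v_d, w_d, Y, sigma):
    """z_dot = f(z) + g(z) u + gamma for a given sample of sigma."""
    f = drift(z, v_d, w_d)
    g = input_matrix(z, Y)
    gam = disturbance(z, sigma)
    return [f[k] + g[k][0] * u[0] + g[k][1] * u[1] + gam[k] for k in range(3)]


# Values from the simulation section; sigma is drawn uniformly in [-0.1, 0.1].
z0 = [2.5, -2.5, 0.5]
sigma0 = random.uniform(-0.1, 0.1)
rate0 = error_rate(z0, [0.0, 0.0], 0.5, 0.04, -1.2, sigma0)
```

Setting `sigma` to zero removes the disturbance, which is a convenient way to compare the nominal and perturbed error rates at the same state.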

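The activation function vector of the CNN is $\varphi(z) = [z_1^{4},\, z_2^{4},\, z_3^{4},\, z_1^{2}z_2^{2},\, z_2^{2}z_3^{2},\, z_1^{2}z_3^{2},\, z_1^{2}z_2,\, z_2^{2}z_3,\, z_1z_2^{2},\, z_3^{3},\, \sin(z_1),\, \sin(z_2),\, \sin(z_3),\, \cos(z_1),\, \cos(z_2),\, \cos(z_3)]^{\top}$. The encryption parameters are ρ1 = 1.1, ρ2 = 1.03, ϱ1 = 3.2, ϱ2 = 1.08, $\delta_1 = 0.3 \times 10^{-5}$, $\delta_2 = 0.4 \times 10^{-2}$, A = 1, and B = 1.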
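As an illustration of how the critic evaluates Equation 29 with this activation vector, the following sketch builds the 16 basis functions and forms the weighted sum. The function names and the example weight vector are hypothetical; the weights actually learned in the study are not reported in this section.

```python
import math


def activation(z):
    """The 16 critic basis functions listed above, for z = (z1, z2, z3)."""
    z1, z2, z3 = z
    return [
        z1 ** 4, z2 ** 4, z3 ** 4,
        z1 ** 2 * z2 ** 2, z2 ** 2 * z3 ** 2, z1 ** 2 * z3 ** 2,
        z1 ** 2 * z2, z2 ** 2 * z3, z1 * z2 ** 2, z3 ** 3,
        math.sin(z1), math.sin(z2), math.sin(z3),
        math.cos(z1), math.cos(z2), math.cos(z3),
    ]


def value_estimate(W_hat, z):
    """V_hat(z) = W_hat^T phi(z), as in Equation 29."""
    phi = activation(z)
    if len(W_hat) != len(phi):
        raise ValueError("weight vector must have 16 entries")
    return sum(w * p for w, p in zip(W_hat, phi))


# Hypothetical weights, used only to show the call pattern.
W_example = [0.1] * 16
V_example = value_estimate(W_example, [2.5, -2.5, 0.5])
```

Note that the cosine terms equal 1 at the origin, so with this basis the estimate at z = 0 is the sum of the last three weights rather than zero.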

Using the proposed method, Figure 3A illustrates the two-dimensional trajectory of the AVR. The vehicle quickly adjusts its direction and begins tracking the reference trajectory with good accuracy. After the initial phase, the vehicle follows the desired trajectory smoothly and closely. Figures 3B–G depict the tracking performance and error, demonstrating that both the position error and the directional error gradually reduce to zero, which ensures precise tracking throughout the process.

Figure 3. AVR driving trajectories. (A) The X-Y plot of tracking trajectories. (B–D) Tracking trajectories. (E–G) Tracking errors.

Figure 4 displays the evolution of the designed controller during the vehicle's tracking process. The dashed lines indicate the upper and lower bounds of the input constraints, which are set to [−1.5, 1.5]. The privacy-preserving characteristics of the proposed scheme are illustrated in Figure 5. It is evident that masking the vehicle-side output z effectively safeguards its privacy from potential attackers. Meanwhile, as shown in Figure 6, masking on the cloud side further prevents the leakage of critical information related to the designed control strategy. Therefore, these results ensure robust privacy protection during data transmission.

Figure 4. The constrained control input.

Figure 5. Encrypted error and decrypted error.

Figure 6. Encrypted value function and decrypted value function.

6 Conclusion

This study develops an encrypted guaranteed-cost tracking control scheme to address the challenges of information security and computational efficiency in AVR systems using the adaptive dynamic programming technique. By leveraging ADP and integrating encryption mechanisms between the vehicle and the cloud, the proposed method ensures stable tracking performance under unreliable communication. The input constraints are successfully managed using a nonlinear value function, while the CNN facilitates an efficient solution to the HJB equation. Simulation results from a case study confirm the stability and effectiveness of the designed algorithm, demonstrating its potential for real-world applications in AVR networks. Future work will focus on ensuring the security of cloud-based computations by processing encrypted data, further enhancing the safety and reliability of cloud operations in vehicular network systems.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

KZ: Conceptualization, Methodology, Writing – original draft. KH: Methodology, Writing – review & editing. ZH: Conceptualization, Supervision, Writing – review & editing. GT: Validation, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work is supported by Beijing Nova Program (20240484516), the Fundamental Research Funds for the Central Universities (KG16314701), and Beihang World TOP University Cooperation Program.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Chen, X., Chen, X., Bai, W., and Guo, Z. (2021). Event-triggered optimal control for macro–micro composite stage system via single-network ADP method. IEEE Trans. Indust. Elect. 68, 4190–4198. doi: 10.1109/TIE.2020.2984462

Deng, C., and Wen, C. (2021). MAS-based distributed resilient control for a class of cyber-physical systems with communication delays under DoS attacks. IEEE Trans. Cybern. 51, 2347–2358. doi: 10.1109/TCYB.2020.2972686

PubMed Abstract | Crossref Full Text | Google Scholar

Dong, H., Zhao, X., and Luo, B. (2022). Optimal tracking control for uncertain nonlinear systems with prescribed performance via critic-only ADP. IEEE Trans. Syst. Man, Cybernet.: Syst. 52, 561–573. doi: 10.1109/TSMC.2020.3003797

El-Sousy, F. F. M., Amin, M. M., and Al-Durra, A. (2021). Adaptive optimal tracking control via actor-critic-identifier based adaptive dynamic programming for permanent-magnet synchronous motor drive system. IEEE Trans. Ind. Appl. 57, 6577–6591. doi: 10.1109/TIA.2021.3110936

Guo, Z., Li, H., Ma, H., and Meng, W. (2024). Distributed optimal attitude synchronization control of multiple QUAVs via adaptive dynamic programming. IEEE Trans. Neural Netw. Learn. Syst. 35, 8053–8063. doi: 10.1109/TNNLS.2022.3224029

Han, K., Zhang, K., Wang, Z.-P., and Su, R. (2024). Resilient predictive load frequency control of multi-area interconnected power systems with privacy preserving and active detection against stealthy cyber attacks. IEEE Intern. Things J. 7, 4387–4394. doi: 10.1109/JIOT.2024.3507291

He, W., Yan, G., and Xu, L. D. (2014). Developing vehicular data cloud services in the IoT environment. IEEE Trans. Indust. Inform. 10, 1587–1595. doi: 10.1109/TII.2014.2299233

Hu, S., Ge, X., Chen, X., and Yue, D. (2023). Resilient load frequency control of islanded AC microgrids under concurrent false data injection and denial-of-service attacks. IEEE Trans. Smart Grid 14, 690–700. doi: 10.1109/TSG.2022.3190680

Jiang, M., Wu, T., Wang, Z., Gong, Y., Zhang, L., and Liu, R. P. (2022). A multi-intersection vehicular cooperative control based on end-edge-cloud computing. IEEE Trans. Vehicular Technol. 71, 2459–2471. doi: 10.1109/TVT.2022.3143828

Li, Y., Tang, C., Li, K., He, X., Peeta, S., and Wang, Y. (2019a). Consensus-based cooperative control for multi-platoon under the connected vehicles environment. IEEE Trans. Intellig. Transport. Syst. 20, 2220–2229. doi: 10.1109/TITS.2018.2865575

Li, Y., Tang, C., Peeta, S., and Wang, Y. (2019b). Nonlinear consensus-based connected vehicle platoon control incorporating car-following interactions and heterogeneous time delays. IEEE Trans. Intellig. Transport. Syst. 20, 2209–2219. doi: 10.1109/TITS.2018.2865546

Li, Z., Shi, Y., Xu, S., Xu, H., and Dong, L. (2024). Distributed model predictive consensus of MASs against false data injection attacks and denial-of-service attacks. IEEE Trans. Automat. Contr. 69, 5538–5545. doi: 10.1109/TAC.2024.3371895

Lin, Z., Ma, J., Duan, J., Li, S. E., Ma, H., Cheng, B., et al. (2023). Policy iteration based approximate dynamic programming toward autonomous driving in constrained dynamic environment. IEEE Trans. Intellig. Transp. Syst. 24, 5003–5013. doi: 10.1109/TITS.2023.3237568

Liu, K., Zhang, H., Zhang, Y., and Sun, C. (2023a). False data-injection attack detection in cyber–physical systems with unknown parameters: a deep reinforcement learning approach. IEEE Trans. Cybern. 53, 7115–7125. doi: 10.1109/TCYB.2022.3225236

Liu, R., Hao, F., and Yu, H. (2021). Optimal SINR-based DoS attack scheduling for remote state estimation via adaptive dynamic programming approach. IEEE Trans. Syst. Man, Cybernet.: Syst. 51, 7622–7632. doi: 10.1109/TSMC.2020.2981478

Liu, T., Cui, L., Pang, B., and Jiang, Z.-P. (2023b). A unified framework for data-driven optimal control of connected vehicles in mixed traffic. IEEE Trans. Intellig. Vehicl. 8, 4131–4145. doi: 10.1109/TIV.2023.3287131

Lu, J., Wei, Q., and Wang, F.-Y. (2020). Parallel control for optimal tracking via adaptive dynamic programming. IEEE/CAA J. Automat. Sinica 7, 1662–1674. doi: 10.1109/JAS.2020.1003426

Mohan, A. M., Meskin, N., and Mehrjerdi, H. (2020). A comprehensive review of the cyber-attacks and cyber-security on load frequency control of power systems. Energies 13:3860. doi: 10.3390/en13153860

Mu, C., Ni, Z., Sun, C., and He, H. (2017a). Air-breathing hypersonic vehicle tracking control based on adaptive dynamic programming. IEEE Trans. Neural Netw. Learn. Syst. 28, 584–598. doi: 10.1109/TNNLS.2016.2516948

Mu, C., Ni, Z., Sun, C., and He, H. (2017b). Data-driven tracking control with adaptive dynamic programming for a class of continuous-time nonlinear systems. IEEE Trans. Cybern. 47, 1460–1470. doi: 10.1109/TCYB.2016.2548941

Pan, H., Zhang, C., and Sun, W. (2023). Fault-tolerant multiplayer tracking control for autonomous vehicle via model-free adaptive dynamic programming. IEEE Trans. Reliab. 72, 1395–1406. doi: 10.1109/TR.2022.3208467

Rudin, W. (1964). Principles of Mathematical Analysis, Volume 3. New York: McGraw-Hill.

Song, S., Gong, D., Zhu, M., Zhao, Y., and Huang, C. (2023). Data-driven optimal tracking control for discrete-time nonlinear systems with unknown dynamics using deterministic ADP. IEEE Trans. Neural Netw. Learn. Syst. 36, 1184–1198. doi: 10.1109/TNNLS.2023.3323142

Teixeira, A., Pérez, D., Sandberg, H., and Johansson, K. H. (2012). “Attack models and scenarios for networked control systems,” in Proceedings of the 1st International Conference on High Confidence Networked Systems (New York, NY: Association for Computing Machinery), 55–64.

Wang, W., Chen, X., Fu, H., and Wu, M. (2020). Model-free distributed consensus control based on actor–critic framework for discrete-time nonlinear multiagent systems. IEEE Trans. Syst. Man, Cybernet.: Syst. 50, 4123–4134. doi: 10.1109/TSMC.2018.2883801

Wu, H., Li, M., Gao, Q., Wei, Z., Zhang, N., and Tao, X. (2022). Eavesdropping and anti-eavesdropping game in UAV wiretap system: a differential game approach. IEEE Trans. Wireless Commun. 21, 9906–9920. doi: 10.1109/TWC.2022.3180395

Xu, Y., Li, T., Yang, Y., Tong, S., and Chen, C. L. P. (2023). Simplified ADP for event-triggered control of multiagent systems against FDI attacks. IEEE Trans. Syst. Man, Cybernet.: Syst. 53, 4672–4683. doi: 10.1109/TSMC.2023.3257031

Yang, W., Zheng, Z., Chen, G., Tang, Y., and Wang, X. (2020). Security analysis of a distributed networked system under eavesdropping attacks. IEEE Trans. Circuits Systems II: Express Briefs 67, 1254–1258. doi: 10.1109/TCSII.2019.2928558

Yang, X., Xu, M., and Wei, Q. (2023). Approximate dynamic programming for event-driven H∞ constrained control. IEEE Trans. Syst. Man, Cybernet.: Syst. 53, 5922–5932. doi: 10.1109/TSMC.2023.3277737

Zhang, H., Qu, Q., Xiao, G., and Cui, Y. (2018). Optimal guaranteed cost sliding mode control for constrained-input nonlinear systems with matched and unmatched disturbances. IEEE Trans. Neural Netw. Learn. Syst. 29, 2112–2126. doi: 10.1109/TNNLS.2018.2791419

Zhang, K., Liang, X., Lu, R., and Shen, X. (2014). Sybil attacks and their defenses in the internet of things. IEEE Intern. Things J. 1, 372–383. doi: 10.1109/JIOT.2014.2344013

Zhang, K., Zhang, H., Cai, Y., and Su, R. (2020). Parallel optimal tracking control schemes for mode-dependent control of coupled Markov jump systems via integral RL method. IEEE Trans. Autom. Sci. Eng. 17, 1332–1342. doi: 10.1109/TASE.2019.2948431

Zhang, K., Zhang, H., Xue, W., and Zhang, R. (2022). “A robust control scheme for autonomous vehicles path tracking under unreliable communication,” in 2022 IEEE 11th Data Driven Control and Learning Systems Conference (DDCLS) (Chengdu: IEEE), 1413–1418. doi: 10.1109/DDCLS55054.2022.9858512

Zou, H., and Zhang, G. (2023). Dynamic event-triggered-based single-network ADP optimal tracking control for the unknown nonlinear system with constrained input. Neurocomputing 518, 294–307. doi: 10.1016/j.neucom.2022.11.015

Keywords: adaptive dynamic programming, encryption and decryption, tracking control, optimal control, autonomous vehicle

Citation: Zhang K, Han K, Hu Z and Tan G (2025) Privacy-preserving ADP for secure tracking control of AVRs against unreliable communication. Front. Neurorobot. 19:1549414. doi: 10.3389/fnbot.2025.1549414

Received: 21 December 2024; Accepted: 10 January 2025;
Published: 29 January 2025.

Edited by:

Ming-Feng Ge, China University of Geosciences Wuhan, China

Reviewed by:

Xiao-Wei Zhang, Tiangong University, China
Jian Han, Ludong University, China

Copyright © 2025 Zhang, Han, Hu and Tan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Zhijian Hu, zhijian.hu@laas.fr
