
ORIGINAL RESEARCH article

Front. Robot. AI, 14 February 2025

Sec. Robot Learning and Evolution

Volume 11 - 2024 | https://doi.org/10.3389/frobt.2024.1491907

This article is part of the Research Topic "Advancements in Neural Learning Control for Enhanced Multi-Robot Coordination".

Adaptive formation learning control for cooperative AUVs under complete uncertainty

  • 1Department of Mechanical, Industrial and Systems Engineering, University of Rhode Island, Kingston, RI, United States
  • 2Graduate School of Oceanography, University of Rhode Island, Kingston, RI, United States
  • 3Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, United States

Introduction: This paper addresses the critical need for adaptive formation control of Autonomous Underwater Vehicles (AUVs) without requiring knowledge of the system dynamics or environmental data. Current methods often assume partial knowledge of the dynamics, such as a known mass matrix, which limits adaptability across settings.

Methods: Our proposed two-layer framework treats all system dynamics, including the mass matrix, as entirely unknown, achieving configuration-agnostic control applicable to a wide range of underwater scenarios. The first layer features a cooperative estimator for inter-agent communication that is independent of global data, while the second employs a decentralized deterministic learning (DDL) controller that uses local feedback for precise trajectory control. The framework's radial basis function neural networks (RBFNN) store learned dynamic information, eliminating the need for relearning after system restarts.

Results: This robust approach addresses uncertainties from unknown parametric values and unmodeled interactions internally, as well as external disturbances such as varying water currents and pressures, enhancing adaptability across diverse environments.

Discussion: Comprehensive and rigorous mathematical proofs are provided to confirm the stability of the proposed controller, while simulation results validate each agent’s control accuracy and signal boundedness, confirming the framework’s stability and resilience in complex scenarios.

1 Introduction

Robotics and autonomous systems have a wide range of applications, spanning from manufacturing and surgical procedures to exploration in challenging environments (Ghafoori et al., 2024; Jandaghi et al., 2023). However, controlling robots in such settings, especially in space and underwater, presents significant difficulties due to unpredictable dynamics. In the context of underwater exploration, AUVs have become essential tools, offering cost-effective, reliable, and versatile solutions for adapting to dynamic conditions. Effective use of AUVs is critical for unlocking the mysteries of marine environments, making advancements in their control and operation essential. As the demand for efficient underwater exploration increases and the complexity of tasks assigned to AUVs grows, there is a pressing need to enhance their operational capabilities. This includes developing sophisticated formation control strategies that allow multiple AUVs to operate in coordination, drawing inspiration from natural behaviors observed in fish schools and bird flocks (Zhou et al., 2023; Yang et al., 2021). By leveraging multi-agent systems, AUVs can work in coordinated groups, enhancing efficiency, stability, and coverage while navigating dynamic and complex underwater environments. These strategies are essential for ensuring precise operations in varied underwater tasks, ranging from pipeline inspections and seafloor mapping to environmental monitoring (Yan et al., 2023).

Despite challenges from intricate nonlinear dynamics, complex interactions among AUVs, and the uncertain dynamic nature of underwater environments, effective multi-AUV formation control is increasingly critical in modern ocean industries (Yan et al., 2018; Hou and Cheah, 2009). Historically, formation control research has predominantly utilized the behavioral approach (Balch and Arkin, 1998; Lawton, 2000), which divides the overall control design into subproblems, with each vehicle’s action determined by a weighted average of solutions, though selecting appropriate weighting parameters can be challenging. The leader-following approach (Cui et al., 2010; Rout and Subudhi, 2016) designates one vehicle as the leader while others follow, maintaining predefined geometric relationships, and controlling formation behavior by designing specific motions for the leader. Alternatively, the virtual structure approach (Millán et al., 2013) treats the entire formation as a single rigid body, with each vehicle tracking the motion of its assigned point in that structure.

Despite advancements in formation control and path planning for multi-AUV systems, challenges such as environmental disturbances, complex underwater dynamics, and communication limitations continue to pose difficulties (Hadi et al., 2021). To address these challenges, there is a critical need for controllers that are independent of both robot dynamics and environmental disturbances. Developing such controllers would enhance formation control by allowing for decentralized application, which increases flexibility in formation structures and improves robustness against communication constraints. Addressing these gaps is essential for advancing the capabilities and reliability of multi-AUV systems. On the other hand, communication constraints in underwater environments make decentralized control with a virtual leader-following topology ideal for AUVs, enabling coordination using local information despite communication delays or interruptions (Yan et al., 2023).

Reinforcement learning (RL) has also been extensively applied in robotic control (Christen et al., 2021; Cao et al., 2022). RL approaches, such as deep reinforcement learning (DRL), offer advantages in learning complex, nonlinear control policies directly from data. However, RL methods generally lack mathematical stability proofs and guarantees for the controller’s behavior, making it challenging to ensure safety and reliability, especially in critical applications. Moreover, while Zhang et al. (2018) developed direct neural adaptive laws whose oscillations grow with higher adaptation gains, indirect neural adaptive laws using prediction-error methods were proposed to mitigate this issue, though they could not guarantee parameter convergence. In contrast, NN-based learning control methods, such as those utilizing adaptive neural networks or deterministic learning frameworks Jandaghi et al. (2024), can incorporate stability analysis and provide rigorous mathematical proofs for parameter convergence. These methods enable researchers to establish theoretical guarantees for the stability and robustness of the controller, which is essential for deploying controllers in real-world applications where safety and reliability are critical. Most recently, Tutsoy et al. (2024) proposed an optimization-based approach for path planning in Unmanned Air Vehicles (UAVs) with actuator failures using particle swarm optimization and genetic algorithms. Their method minimizes both time and distance by optimizing predefined cost functions through heuristic methods, while incorporating system constraints such as actuator limits, kinematic and dynamic constraints, and parametric uncertainties.

Despite extensive literature in the field, to the best of our knowledge, existing research assumes homogeneous dynamics and known system parameters for all AUV agents, which is unrealistic in unpredictable underwater environments. Factors such as buoyancy, drag, and varying water viscosity significantly alter system dynamics and behavior. Additionally, AUVs may change shape during tasks like underwater sampling or when equipped with robotic arms, further complicating control. Typically, designing multi-AUV formation control involves planning desired formation paths and developing tracking controllers for each AUV. However, accurately tracking these paths is challenging due to the complex nonlinear dynamics of AUVs, especially when precise models are unavailable. Implementing a fully distributed and decentralized formation control system is also difficult, as centralized control designs become exceedingly complex with larger AUV groups. To address these challenges, previous work, such as Yuan et al. (2017) and Dong et al. (2019), developed adaptive learning controllers that relied on the assumption of a known mass matrix, which is not practical in real-world applications. Such controllers depend on known system parameters and can fail when internal forces vary with changing external environmental conditions. The solution is to develop environment-independent controllers that do not rely on any specific system dynamical parameters.

The framework’s control architecture is divided into a first-layer Cooperative Estimator Observer and a second-layer Decentralized Deterministic Learning (DDL) Controller. The first-layer observer is pivotal in enhancing inter-agent communication by sharing crucial system estimates, operating independently of any global information. Concurrently, the second-layer DDL controller utilizes local feedback to finely adjust each AUV’s trajectory, ensuring resilient operation under dynamic conditions heavily influenced by hydrodynamic forces and torques, while treating the system uncertainty as completely unknown. This dual-layer setup not only facilitates rapid adaptation to uncertain AUV dynamics but also leverages RBFNN for precise local learning and effective knowledge storage. Such capabilities enable AUVs to efficiently reapply previously learned dynamics after the system restarts. This two-layer framework achieves a significant advancement by considering all system dynamic parameters as unknown, enabling universal application across all AUVs, regardless of their operating environments. This universality is crucial for adapting to environmental variations such as water flow, which increases the AUV’s effective mass via the added-mass phenomenon and affects the vehicle’s inertia. Additionally, buoyancy forces that vary with depth, along with hydrodynamic forces and torques stemming from water flow variations, the AUV’s unique shape, its appendages, and drag forces due to water viscosity, significantly impact the damping matrix in the AUV’s dynamics. This framework not only improves operational efficiency but also significantly advances the field of autonomous underwater vehicle control by laying a robust foundation for future enhancements in distributed adaptive control systems and fostering enhanced collaborative intelligence among multi-agent networks in marine environments.
Extensive simulations have underscored the effectiveness of the framework, demonstrating its potential to elevate the adaptability and resilience of AUV systems under the most demanding conditions. In summary, the contributions of this paper are as follows:

• A universal controller that works in any environment and condition, such as varying currents or operating depths.

• Each AUV controller operates independently.

• The controller functions without needing information about the robot’s dynamic parameters, like mass, damping, or inertia. Each AUV can also have different dynamic parameters.

• The system learns the dynamics once and reuses the pre-trained weights, avoiding the need for retraining.

• The use of localized RBFNN reduces real-time computational demands.

• Rigorous stability analysis of the controller, with mathematical proofs that guarantee its reliability.

The rest of the paper is organized as follows: Section 2 provides an initial overview of graph theory, RBFNN, and the problem statement. The design of the distributed cooperative estimator and the decentralized deterministic learning controller are discussed in Section 3. The formation adaptive control and formation control using pre-learned dynamics are explored in Section 4 and Section 5, respectively. Simulation studies are presented in Section 6, and Section 7 concludes the paper.

2 Preliminaries and problem statement

2.1 Notation and graph theory

Denoting the set of real numbers as ℝ, we define ℝ^{m×n} as the set of m×n real matrices and ℝ^n as the set of n×1 real vectors. The identity matrix is symbolized as I. The vector with all elements equal to 1 in an n-dimensional space is represented as 1_n. The sets S^n and S^n_+ stand for the real symmetric n×n matrices and the positive definite ones, respectively. A block diagonal matrix with matrices X_1, X_2, …, X_p on its main diagonal is denoted by diag{X_1, X_2, …, X_p}. A ⊗ B signifies the Kronecker product of matrices A and B. For a matrix A, vec(A) is the vectorization of A obtained by stacking its columns on top of each other. For a series of column vectors x_1, …, x_n, col{x_1, …, x_n} represents the column vector formed by stacking them together. Given two integers k_1 and k_2 with k_1 < k_2, I[k_1, k_2] = {k_1, k_1+1, …, k_2}. For a vector x ∈ ℝ^n, its norm is defined as ‖x‖ ≜ (x^T x)^{1/2}. For a square matrix A, λ_i(A) denotes its i-th eigenvalue, while λ_min(A) and λ_max(A) represent its minimum and maximum eigenvalues, respectively.

A directed graph G = (V, E) comprises nodes in the set V = {1, 2, …, N} and edges in E ⊆ V × V. An edge from node i to node j is represented as (i, j), with i as the parent node and j as the child node. Node i is also termed a neighbor of node j. N_i denotes the subset of V consisting of the neighbors of node i. A sequence of edges in G, (i_1, i_2), (i_2, i_3), …, (i_k, i_{k+1}), is called a path from node i_1 to node i_{k+1}; node i_{k+1} is then reachable from node i_1. A directed tree is a graph in which each node, except for a root node, has exactly one parent; the root node can reach all other nodes. A directed graph G contains a directed spanning tree if at least one node can reach all other nodes. The weighted adjacency matrix of G is a non-negative matrix A = [a_ij] ∈ ℝ^{N×N}, where a_ii = 0 and a_ij > 0 if and only if (j, i) ∈ E. The Laplacian of G is denoted as L = [l_ij] ∈ ℝ^{N×N}, where l_ii = Σ_{j=1}^N a_ij and l_ij = −a_ij for i ≠ j. It is established that L has at least one eigenvalue at the origin, and all nonzero eigenvalues of L have positive real parts. From Ren and Beard (2005), L has exactly one zero eigenvalue, with all remaining eigenvalues having positive real parts, if and only if G has a directed spanning tree.
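These Laplacian properties can be checked numerically. The sketch below (an illustrative four-node chain with unit weights, chosen by us and not taken from the paper) builds L and verifies that, with a directed spanning tree, it has exactly one zero eigenvalue and the rest in the open right-half plane:

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian with l_ii = sum_j a_ij and l_ij = -a_ij (i != j)."""
    return np.diag(A.sum(axis=1)) - A

# Illustrative directed chain 1 -> 2 -> 3 -> 4; entry A[i, j] is the
# weight a_ij of edge (j, i), so node 1 can reach every other node.
A = np.array([[0, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]], dtype=float)
eigs = np.linalg.eigvals(laplacian(A))
n_zero = int(np.sum(np.isclose(eigs, 0)))
# Spanning tree present: exactly one zero eigenvalue, all others with
# positive real parts (Ren and Beard, 2005).
```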

2.2 Radial basis function neural networks (RBFNN)

An RBF neural network can be described as f_nn(Z) = Σ_{i=1}^{N} w_i s_i(Z) = W^T S(Z), where Z ∈ Ω_Z ⊂ ℝ^q is the input vector and W = [w_1, …, w_N]^T ∈ ℝ^N is the weight vector (Park and Sandberg, 1991). N indicates the number of NN nodes, and S(Z) = [s_1(‖Z − μ_1‖), …, s_N(‖Z − μ_N‖)]^T, where s_i(·) is a radial basis function and μ_i (i = 1, …, N) are distinct points in the state space. The Gaussian function s_i(‖Z − μ_i‖) = exp(−(Z − μ_i)^T(Z − μ_i)/η_i²) is generally used as the radial basis function, where μ_i ∈ ℝ^q is the center and η_i is the width of the receptive field. The Gaussian function is a localized radial basis function in the sense that s_i(‖Z − μ_i‖) → 0 as ‖Z − μ_i‖ → ∞. Moreover, for any bounded trajectory Z(t) within the compact set Ω_Z, f(Z) can be approximated using a limited number of neurons located in a local region along the trajectory: f(Z) = W_ζ*^T S_ζ(Z) + ε_ζ, where ζ denotes the indices of the active RBFNN nodes, i.e., those with |s_{j_i}(Z)| > ι for the current state Z(t). Here ε_ζ is the approximation error, with ε_ζ = O(ε) = O(ε*), S_ζ(Z) = [s_{j_1}(Z), …, s_{j_ζ}(Z)]^T ∈ ℝ^{N_ζ}, W_ζ* = [w*_{j_1}, …, w*_{j_ζ}]^T ∈ ℝ^{N_ζ}, N_ζ < N, and the integers j_1, …, j_ζ are defined by |s_{j_i}(Z_p)| > ι (ι > 0 a small positive constant) for some point Z_p on Z(t). This holds as long as ‖Z(t) − ξ_{j_i}‖ < ε for t > 0. The following lemma regarding the persistent excitation (PE) condition for RBFNN is recalled from Wang and Hill (2018).
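The localized approximation property can be illustrated with a small numerical sketch (a hypothetical 1-D example; the lattice of centers, width, and target function are our choices, not the paper's). Weights are fit by least squares, and only nodes near the current input are significantly active:

```python
import numpy as np

def rbf_eval(Z, centers, eta, W):
    """f_nn(Z) = W^T S(Z) with Gaussian RBFs s_i = exp(-||Z - mu_i||^2 / eta^2)."""
    d2 = np.sum((centers - Z) ** 2, axis=1)   # squared distances ||Z - mu_i||^2
    S = np.exp(-d2 / eta ** 2)                # regressor vector S(Z)
    return W @ S, S

# Centers on a regular lattice covering [0, 1] (q = 1), width eta = 0.2.
centers = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
eta = 0.2
zs = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
Phi = np.exp(-((zs - centers.T) ** 2) / eta ** 2)       # design matrix
W, *_ = np.linalg.lstsq(Phi, np.sin(2 * np.pi * zs).ravel(), rcond=None)

f_hat, S = rbf_eval(np.array([0.25]), centers, eta, W)
# Localization: the node centered at mu = 1.0 is essentially inactive
# when evaluated at Z = 0.25.
```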

Lemma 1. Consider any continuous recurrent trajectory Z(t): [0, ∞) → ℝ^q that remains in a bounded compact set Ω_Z ⊂ ℝ^q. Then, for an RBFNN W^T S(Z) with centers placed on a regular lattice (large enough to cover the compact set Ω_Z), the regressor subvector S_ζ(Z), consisting of RBFs with centers located in a small neighborhood of Z(t), is persistently exciting.

2.3 Problem statement

A multi-agent system comprising N AUVs with heterogeneous nonlinear uncertain dynamics is considered. The dynamics of each AUV can be expressed as Fossen (1999):

η̇_i = J_i(η_i) ν_i,
M_i ν̇_i + C_i(ν_i) ν_i + D_i(ν_i) ν_i + g_i(η_i) + Δ_i(χ_i) = τ_i.  (1)

In this study, the subscript i ∈ I[1, N] identifies each AUV within the multi-agent system. For every i ∈ I[1, N], the vector η_i = [x_i, y_i, ψ_i]^T ∈ ℝ³ represents the i-th AUV’s position coordinates and heading in the global coordinate frame, while ν_i = [u_i, v_i, r_i]^T ∈ ℝ³ denotes its linear velocities and angular rate of heading relative to a body-fixed frame. The positive definite inertia matrix M_i = M_i^T ∈ S³₊, Coriolis and centripetal matrix C_i(ν_i) ∈ ℝ^{3×3}, and damping matrix D_i(ν_i) ∈ ℝ^{3×3} characterize the AUV’s dynamic response to motion. The vector g_i(η_i) ∈ ℝ³ accounts for the restoring forces and moments due to gravity and buoyancy. The term Δ_i(χ_i) ∈ ℝ³, with χ_i ≜ col{η_i, ν_i}, describes the vector of generalized deterministic unmodeled uncertain dynamics for each AUV.

The vector τiR3 represents the control inputs for each AUV. The associated rotation matrix Ji(ηi) is given by:

J_i(η_i) = [ cos ψ_i   −sin ψ_i   0
             sin ψ_i    cos ψ_i   0
               0          0       1 ],

Unlike previous work Yuan et al. (2017), which assumed known values for the AUV’s inertia and rotation matrices, this study considers all matrix coefficients, including C_i(ν_i), D_i(ν_i), g_i(η_i), and Δ_i(χ_i), as well as the inertia matrix M_i, as completely unknown. The adaptive estimation process inherently addresses the effects of external forces and disturbances on the system dynamics. This eliminates the need for explicit parameter estimation of these forces, as disturbances like water flow, varying currents, or depth variations are directly incorporated into the control input through adaptive estimation. This makes the controller universally applicable to any AUV, regardless of its design, weight, or environmental conditions, by addressing both internal and external dynamic variations at the same time.

Internally, it handles unknown parameters such as mass and damping coefficients, as well as unmodeled nonlinear interactions and couplings. Externally, it accounts for unpredictable disturbances, including fluctuating water currents, depth-dependent pressures, and changes in hydrodynamic forces.

By avoiding reliance on predefined models, the proposed approach is robust and adaptable to diverse mission scenarios and unexpected environmental changes, ensuring reliable performance even in highly uncertain conditions.
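To make the model concrete, here is a minimal numerical sketch of the horizontal-plane dynamics (Equation 1). The diagonal M_i and D_i values are illustrative assumptions, and C_i, g_i, and Δ_i are omitted for brevity; in the proposed framework every one of these matrices is unknown to the controller:

```python
import numpy as np

def J(eta):
    """Rotation matrix J_i(eta_i) for eta = [x, y, psi]."""
    c, s = np.cos(eta[2]), np.sin(eta[2])
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def auv_step(eta, nu, tau, dt, M, D):
    """One Euler step of Equation 1, keeping only the inertia M and a
    linear damping D (C, g, Delta dropped in this sketch)."""
    eta = eta + dt * (J(eta) @ nu)                    # kinematics
    nu = nu + dt * np.linalg.solve(M, tau - D @ nu)   # dynamics
    return eta, nu

M = np.diag([25.0, 25.0, 5.0])    # assumed inertia values
D = np.diag([10.0, 10.0, 2.0])    # assumed damping values
eta, nu = np.zeros(3), np.zeros(3)
for _ in range(1000):             # 10 s of constant surge thrust
    eta, nu = auv_step(eta, nu, np.array([10.0, 0.0, 0.0]), 0.01, M, D)
# Surge speed settles near tau_u / d_u = 1 m/s.
```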

In the context of leader-following formation tracking control, the following virtual leader dynamics generates the tracking reference signals:

χ̇_0 = A_0 χ_0,  (2)

with “0” marking the leader node, the leader state is χ_0 ≜ col{η_0, ν_0} with η_0 ∈ ℝ³ and ν_0 ∈ ℝ³, and A_0 ∈ ℝ^{6×6} is a constant matrix available only to the leader’s neighboring AUV agents.
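A leader matrix of this kind, with imaginary-axis eigenvalues, produces bounded, non-decaying sinusoidal references. A minimal sketch of one harmonic block of such an A_0 (the frequency is an assumed example; the full A_0 is 6×6):

```python
import numpy as np

omega = 0.5                          # assumed reference frequency (rad/s)
A0_block = np.array([[0.0, -omega],
                     [omega, 0.0]])  # one harmonic block of A0
eigs = np.linalg.eigvals(A0_block)
# Eigenvalues are +/- j*omega, purely imaginary, so chi_0(t) is a
# bounded sinusoid rather than a decaying or diverging signal.
```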

Considering the system dynamics of the multiple AUVs (Equation 1) along with the leader dynamics (Equation 2), we establish a non-negative matrix A = [a_ij], i, j ∈ I[0, N], such that for each i ∈ I[1, N], a_i0 > 0 if and only if agent i has access to the reference signals η_0 and ν_0. All remaining elements of A are arbitrary non-negative values, with a_ii = 0 for all i. Correspondingly, we establish G = (V, E) as a directed graph derived from A, where V = {0, 1, …, N} designates node 0 as the leader, and the remaining nodes correspond to the N AUV agents. We proceed under the following assumptions:

Assumption 1. All the eigenvalues of A0 in the leader’s dynamics (Equation 2) are located on the imaginary axis.

Assumption 2. The directed graph G contains a directed spanning tree with the node 0 as its root.

Assumption 1 is crucial for ensuring that the leader dynamics produce stable, meaningful reference trajectories for formation control. It ensures that all states of the leader, represented by χ_0 = col{η_0, ν_0}, remain within a compact set Ω_0 ⊂ ℝ⁶ for all t ≥ 0. The trajectory of the system starting from χ_0(0), denoted by φ_0(χ_0(0)), is periodic. This periodicity is essential for maintaining the persistent excitation (PE) condition, which is pivotal for achieving parameter convergence in distributed adaptive (DA) control systems. Modifications to the eigenvalue constraints on A_0 in Assumption 1 may be considered when focusing primarily on formation tracking control performance, as discussed later.

Additionally, Assumption 2 reveals key insights into the structure of the Laplacian matrix L of the network graph G. Let Ψ be an N×N non-negative diagonal matrix where each i-th diagonal element is ai0 for iI[1,N]. The Laplacian L is formulated as:

L = [ Σ_{j=1}^N a_0j   −[a_01, …, a_0N]
         −Ψ 1_N               H          ],

where a_0j > 0 if (j, 0) ∈ E and a_0j = 0 otherwise. This yields H 1_N = Ψ 1_N, since L 1_{N+1} = 0. As cited in Su and Huang (2011), under Assumption 2 all eigenvalues of H have positive real parts, confirming that H is nonsingular.
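This property of H is easy to verify numerically. The sketch below uses an illustrative three-node graph (leader 0 → agent 1 → agent 2, a spanning tree rooted at node 0), extracts H as the follower block of the Laplacian, and confirms its eigenvalues lie in the open right-half plane:

```python
import numpy as np

# Illustrative chain over nodes {0, 1, 2}; A[i, j] = a_ij is the weight
# of edge (j, i), so agent 1 hears the leader and agent 2 hears agent 1.
A = np.array([[0, 0, 0],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
L_full = np.diag(A.sum(axis=1)) - A   # (N+1) x (N+1) Laplacian
H = L_full[1:, 1:]                    # follower-to-follower block
eigs_H = np.linalg.eigvals(H)
# Under Assumption 2 every eigenvalue of H has a positive real part,
# so H is nonsingular.
```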

Problem 1. In the context of a multi-AUV system (Equation 1) integrated with virtual leader dynamics (Equation 2) and operating within a directed network topology G, the aim is to develop a distributed NN learning control protocol that leverages only local information. The specific goals are twofold:

1) Formation Control: Each of the N AUV agents will adhere to a predetermined formation pattern relative to the leader, maintaining a specified distance from the leader’s position η0.

2) Decentralized Learning: The nonlinear uncertain dynamics of each AUV will be identified and learned autonomously during the formation control process. The insights gained from this learning process will be utilized to enhance the stability and performance of the formation control system.

Remark 1. The leader dynamics described in Equation 2 are designed as a neutrally stable LTI system. This design choice facilitates the generation of sinusoidal reference trajectories at various frequencies, which is essential for effective formation tracking control. This approach to leader dynamics is prevalent in the literature on multi-agent leader-following distributed control systems, such as Yuan (2017) and Jandaghi et al. (2024).

Remark 2. It is important to emphasize that the formulation assumes formation control is required only within the horizontal plane, suitable for AUVs operating at a constant depth, and that the vertical dynamics of the 6 degrees of freedom (DOF) AUV system, as detailed in Prestero (2001), are entirely decoupled from the horizontal dynamics.

As shown in Figure 1, a two-layer hierarchical design approach is proposed to address the aforementioned challenges. The first layer, the Cooperative Estimator, enables information exchange among neighboring agents. The second layer, known as the Decentralized Deterministic Learning (DDL) controller, processes only local data from each individual AUV. The development and formulation of the first layer are discussed in detail in Section 3.1, while the DDL control strategy, along with its corresponding controller design and analysis, is provided in Section 3.2.


Figure 1. Proposed two-layer distributed controller architecture for each AUV.

3 Two-layer distributed controller architecture

3.1 First layer: cooperative estimator

In the context of leader-following formation control, not all AUV agents may have direct access to the leader’s information, including tracking reference signals (χ0) and the system matrix (A0). This necessitates collaborative interactions among the AUV agents to estimate the leader’s information effectively. Drawing on principles from multiagent consensus and graph theories Ren and Beard (2008), we propose to develop a distributed adaptive observer for the AUV systems as:

χ̂̇_i(t) = Â_i(t) χ̂_i(t) + β_i1 Σ_{j=0}^{N} a_ij (χ̂_j(t) − χ̂_i(t)),  i ∈ I[1, N],  (3)

where χ̂_0(t) ≜ χ_0(t).

The observer state of the i-th AUV, denoted χ̂_i = col{η̂_i, ν̂_i} ∈ ℝ⁶, aims to estimate the leader’s state χ_0 = col{η_0, ν_0} ∈ ℝ⁶. As t → ∞, these estimates are expected to converge, such that η̂_i approaches η_0 and ν̂_i approaches ν_0, the leader’s position and velocity, respectively. Equation 3 accounts for the communication graph by including the adjacency matrix information through the term a_ij. Note that Â_i(t) ∈ ℝ^{6×6} represents an estimate computed by agent i of the leader’s system matrix A_0 ∈ ℝ^{6×6}, which is likewise not available to all agents. Therefore, each agent estimates this matrix using a cooperative adaptation law:

Â̇_i(t) = β_i2 Σ_{j=0}^{N} a_ij (Â_j(t) − Â_i(t)),  i ∈ I[1, N],  (4)

where Â_0(t) ≜ A_0.

which is likewise adapted from Ren and Beard (2008). The constants β_i1 and β_i2 are positive design parameters.

Remark 3. Each AUV agent in the group is equipped with an observer configured as specified in Equations 3, 4, comprising two state variables, χ̂_i and Â_i. For each i ∈ I[1, N], χ̂_i estimates the virtual leader’s state χ_0, while Â_i estimates the leader’s system matrix A_0. The real-time data necessary for operating the i-th observer include: (1) the estimated state χ̂_i and matrix Â_i, obtained from the i-th AUV itself, and (2) the estimated states χ̂_j and matrices Â_j for all j ∈ N_i, obtained from the i-th AUV’s neighbors. Note that in Equations 3, 4, if j ∉ N_i, then a_ij = 0, indicating that the i-th observer does not utilize information from the j-th AUV agent. This configuration ensures that the proposed distributed observer can be implemented on each local AUV agent using only locally estimated data from the agent itself and its immediate neighbors, without the need for global information such as the size of the AUV group or the network interconnection topology.
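A minimal sketch of the estimator (Equations 3, 4) under Euler discretization is shown below. The chain graph, gains, step size, and toy 6×6 harmonic leader matrix are all illustrative assumptions; index 0 holds the true leader, so each follower converges using only neighbor data:

```python
import numpy as np

def observer_step(chi_hat, A_hat, A_adj, beta1, beta2, dt):
    """One Euler step of the cooperative estimator (Equations 3, 4).
    Row 0 holds the true leader (chi_hat[0] = chi_0, A_hat[0] = A_0) and
    is only propagated, never corrected; rows 1..N are the followers."""
    n = chi_hat.shape[0]
    chi_new, A_new = chi_hat.copy(), A_hat.copy()
    for i in range(1, n):
        d_chi = sum(A_adj[i, j] * (chi_hat[j] - chi_hat[i]) for j in range(n))
        d_A = sum(A_adj[i, j] * (A_hat[j] - A_hat[i]) for j in range(n))
        chi_new[i] = chi_hat[i] + dt * (A_hat[i] @ chi_hat[i] + beta1 * d_chi)
        A_new[i] = A_hat[i] + dt * beta2 * d_A
    chi_new[0] = chi_hat[0] + dt * (A_hat[0] @ chi_hat[0])  # leader: Equation 2
    return chi_new, A_new

# Chain graph 0 -> 1 -> 2; toy harmonic leader matrix; assumed gains.
A_adj = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=float)
omega = 0.5
A0 = np.zeros((6, 6))
A0[0, 3], A0[3, 0] = -omega, omega
chi = np.zeros((3, 6))
chi[0, 0] = 1.0                                   # leader initial state
A_hat = np.stack([A0, np.zeros((6, 6)), np.zeros((6, 6))])
for _ in range(20000):                            # 20 s at dt = 1 ms
    chi, A_hat = observer_step(chi, A_hat, A_adj, 5.0, 5.0, 1e-3)
err_state = np.linalg.norm(chi[1] - chi[0])       # follower 1 state error
err_matrix = np.linalg.norm(A_hat[2] - A0)        # follower 2 matrix error
```

Both errors decay toward zero, consistent with the exponential convergence established in Theorem 1 below.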

To verify the convergence properties, we compute the error dynamics. Defining the estimation errors of the state and the system matrix for agent i as χ̃_i = χ̂_i − χ_0 and Ã_i = Â_i − A_0, we derive:

χ̃̇_i(t) = Â_i(t) χ̂_i(t) − A_0 χ_0(t) + β_i1 Σ_{j=0}^N a_ij (χ̂_j(t) − χ̂_i(t))
 = (A_0 + Ã_i(t))(χ_0(t) + χ̃_i(t)) − A_0 χ_0(t) + β_i1 Σ_{j=0}^N a_ij (χ̃_j(t) − χ̃_i(t))
 = A_0 χ̃_i(t) + Ã_i(t) χ̃_i(t) + Ã_i(t) χ_0(t) + β_i1 Σ_{j=0}^N a_ij (χ̃_j(t) − χ̃_i(t)),
Ã̇_i(t) = β_i2 Σ_{j=0}^N a_ij (Ã_j(t) − Ã_i(t)),  i ∈ I[1, N],
with χ̃_0 ≜ 0 and Ã_0 ≜ 0.

Define the collective error states and adaptation matrices: χ̃ = col{χ̃_1, …, χ̃_N} for the state errors, Ã = col{Ã_1, …, Ã_N} for the adaptive parameter errors, Ã_b = diag{Ã_1, …, Ã_N} for the block diagonal of adaptive parameter errors, and B_β1 = diag{β_11, …, β_N1} and B_β2 = diag{β_12, …, β_N2} for the diagonal matrices of design constants. With these definitions, the network-wide error dynamics can be expressed as:

χ̃̇(t) = (I_N ⊗ A_0 − (B_β1 H) ⊗ I_6) χ̃(t) + Ã_b(t) χ̃(t) + Ã_b(t)(1_N ⊗ χ_0(t)),
Ã̇(t) = −((B_β2 H) ⊗ I_6) Ã(t).  (5)

Theorem 1. Consider the error system (Equation 5). Under Assumptions 1 and 2, and given β_i1, β_i2 > 0, it follows that for all i ∈ I[1, N] and any initial conditions χ_0(0), χ̂_i(0), Â_i(0), the error dynamics of the adaptive parameters and the states converge to zero exponentially; specifically, lim_{t→∞} Ã_i(t) = 0 and lim_{t→∞} χ̃_i(t) = 0.

This convergence is facilitated by the independent adaptation of each agent’s parameters within their respective error dynamics, represented by the block diagonal structure of Ãb and control gains Bβ1 and Bβ2. These matrices ensure that each agent’s parameter updates are governed by local interactions and error feedback, consistent with the decentralized control framework.

Proof: We begin by examining the estimation error dynamics for Ã as presented in Equation 5, which can be rewritten in vector form:

Ã̇(t) = −((B_β2 H) ⊗ I_6) Ã(t).  (6)

Under Assumption 2, all eigenvalues of H possess positive real parts (Su and Huang, 2011). Consequently, for any β_i2 > 0, the matrix −((B_β2 H) ⊗ I_6) is Hurwitz, which implies the exponential stability of system (Equation 6). Hence, lim_{t→∞} Ã(t) = 0 exponentially, and thus lim_{t→∞} Ã_i(t) = 0 exponentially for all i ∈ I[1, N]. Next, we analyze the error dynamics for χ̃ in Equation 5. From the previous discussion, Ã_b(t) → 0 exponentially, so the term Ã_b(t)(1_N ⊗ χ_0(t)) decays to zero exponentially as well, since χ_0 is bounded under Assumption 1. Based on Cai et al. (2015), if the system defined by

χ̃̇(t) = (I_N ⊗ A_0 − (B_β1 H) ⊗ I_6) χ̃(t)  (7)

is exponentially stable, then lim_{t→∞} χ̃(t) = 0 exponentially. By Assumption 1, all eigenvalues of A_0 have zero real parts, and since H is nonsingular with all its eigenvalues in the open right-half plane, system (Equation 7) is exponentially stable for any β_i1 > 0. Consequently, lim_{t→∞} χ̃(t) = 0, i.e., lim_{t→∞} χ̃_i(t) = 0 exponentially for all i ∈ I[1, N]. ∎

Now, each individual agent can accurately estimate both the state and the system matrix of the leader through cooperative observer estimation Equations 3, 4. This information will be utilized in the DDL controller design for each agent’s second layer, which will be discussed in the following subsection.

3.2 Second layer: decentralized deterministic learning controller

To fulfill the overall formation learning control objectives, in this section we develop the DDL control law for the multi-AUV system defined in Equation 1. We use d_i* to denote the desired offset between the position of the i-th AUV agent η_i and the virtual leader’s position η_0. The formation control problem is then framed as a position tracking control task, in which each local AUV agent’s position η_i is required to track the reference signal η_d,i ≜ η_0 + d_i*. Moreover, because the leader’s state information χ_0 is inaccessible to some AUV agents, the tracking reference signal η̂_d,i ≜ η̂_0,i + d_i* is employed instead of η_d,i. As established in Theorem 1, η̂_d,i is autonomously generated by each local agent and converges exponentially to η_d,i. This ensures that the DDL controller is feasible and the formation control objectives are achievable for all i ∈ I[1, N] using η̂_d,i.

To design a DDL control law that simultaneously addresses formation tracking control and precise learning of the AUVs’ completely unknown nonlinear dynamics, we integrate the well-established backstepping adaptive control design method of Krstic et al. (1995) with deterministic learning techniques using RBFNN from Wang and Hill (2018) and Yuan et al. (2017). Specifically, for the i-th AUV agent described in system (Equation 1), we define the position tracking error as z_1,i = η_i − η̂_d,i for all i ∈ I[1, N]. Noting that J_i(η_i) J_i^T(η_i) = I for all i ∈ I[1, N], we proceed to:

ż_1,i = J_i(η_i) ν_i − η̂̇_d,i,  i ∈ I[1, N].  (8)

To frame the problem in a more tractable way, we treat ν_i as a virtual control input and design a desired virtual control α_i; implementing them in the above system gives:

z_2,i = ν_i − α_i,  α_i = J_i^T(η_i)(−K_1,i z_1,i + η̂̇_d,i),  i ∈ I[1, N].  (9)

A positive definite gain matrix K_1,i ∈ S³₊ is used for tuning the performance. Substituting ν_i = z_2,i + α_i into Equation 8 yields:

ż_1,i = J_i(η_i) z_2,i − K_1,i z_1,i,  i ∈ I[1, N].
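The backstepping step can be sketched numerically. Assuming the velocity loop tracks perfectly (ν_i = α_i, i.e., z_2,i = 0), the position error then contracts as ż_1,i = −K_1,i z_1,i; the gains and initial pose below are illustrative assumptions:

```python
import numpy as np

def J(eta):
    c, s = np.cos(eta[2]), np.sin(eta[2])
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def virtual_control(eta, eta_d_hat, eta_d_hat_dot, K1):
    """Desired virtual velocity alpha_i of Equation 9:
    alpha = J^T(eta) (-K1 z1 + d/dt eta_hat_d)."""
    z1 = eta - eta_d_hat
    return J(eta).T @ (-K1 @ z1 + eta_d_hat_dot), z1

K1 = np.diag([2.0, 2.0, 1.0])       # assumed gains
eta = np.array([1.0, -0.5, 0.3])    # assumed initial pose
for _ in range(1000):               # 10 s; reference fixed at the origin
    alpha, z1 = virtual_control(eta, np.zeros(3), np.zeros(3), K1)
    eta = eta + 0.01 * (J(eta) @ alpha)   # kinematics with nu = alpha
# Each component of z1 decays at the rate set by K1.
```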

Next, we derive the first derivatives of the velocity tracking error and of the desired virtual control input as follows:

ż_2,i = ν̇_i − α̇_i = M_i^{-1}(−C_i(ν_i) ν_i − D_i(ν_i) ν_i − g_i(η_i) − Δ_i(χ_i) + τ_i) − α̇_i,
α̇_i = J̇_i^T(η_i)(−K_1,i z_1,i + η̂̇_d,i) + J_i^T(η_i)(K_1,i η̂̇_d,i − K_1,i J_i(η_i) ν_i + η̂̈_d,i),  i ∈ I[1, N].  (10)

As previously discussed, unlike earlier research that treated only the matrix coefficients C_i(ν_i), D_i(ν_i), g_i(η_i), and Δ_i(χ_i) as unknown system nonlinearities while assuming the mass matrix M_i to be known, this work advances significantly by also considering M_i as unknown. Consequently, all system dynamic parameters are treated as completely unknown, making the controller fully independent of the robot’s configuration, such as its dimensions, mass, or any appendages, and of the uncertain environmental conditions it encounters, like depth, water flow, and viscosity. This independence is critical, as it ensures that the controller does not rely on predefined assumptions about the dynamics, aligning with the main goal of this research. To address these challenges, we define a nonlinear function F_i(Z_i) that encapsulates all the nonlinear uncertainties as follows:

F_i(Z_i) = M_i α̇_i + C_i(ν_i) ν_i + D_i(ν_i) ν_i + g_i(η_i) + Δ_i(χ_i),  (11)

where F_i(Z_i) = [f_1,i(Z_i), f_2,i(Z_i), f_3,i(Z_i)]^T and Z_i = col{η_i, ν_i} ∈ Ω_Zi ⊂ ℝ⁶, with Ω_Zi a bounded compact set. We then employ the following RBFNN to approximate the model dynamics in (Equation 11), expressed by the nonlinear functions f_k,i for all i ∈ I[1, N] and k ∈ I[1, 3]:

f_k,i(Z_i) = W*_k,i^T S_k,i(Z_i) + ε_k,i(Z_i),  (12)

where W*_k,i is the ideal constant NN weight vector and ε_k,i(Z_i) is the approximation error, satisfying |ε_k,i(Z_i)| ≤ ε*_k,i for some constant ε*_k,i > 0, for all i ∈ I[1, N] and k ∈ I[1, 3]. This error can be made arbitrarily small given a sufficient number of neurons in the network. A self-adaptation law is designed to estimate the unknown W*_k,i online. We estimate W*_k,i with Ŵ_k,i by constructing the DDL feedback control law as follows:

τi = −JiT(ηi)z1,i − K2,iz2,i + ŴiTSiF(Zi). (13)

K2,i ∈ S+3 is a feedback gain matrix that can be tuned to achieve the desired performance. To approximate the unknown nonlinear function vector Fi(Zi) in Equation 11 along the trajectory Zi within the compact set ΩZi, we use:

ŴiTSiF(Zi) = [Ŵ1,iTS1,i(Zi), Ŵ2,iTS2,i(Zi), Ŵ3,iTS3,i(Zi)]T.
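As a concrete illustration of how the stacked output ŴiTSiF(Zi) is evaluated, the following sketch builds a small Gaussian RBFNN; the grid, width, and (zero) weights are hypothetical, not the paper's trained values:

```python
import numpy as np

def gaussian_regressor(Z, centers, width):
    """S(Z): Gaussian RBF activations of input Z at the given centers."""
    d2 = np.sum((centers - Z) ** 2, axis=1)   # squared distance to each center
    return np.exp(-d2 / width ** 2)           # one activation per neuron

def rbfnn_output(Z, centers, width, W):
    """Stacked approximation [W_1^T S(Z), W_2^T S(Z), W_3^T S(Z)]^T."""
    S = gaussian_regressor(Z, centers, width)  # regressor shared by all k
    return W.T @ S                             # W has shape (n_neurons, 3)

# Hypothetical small network: 27 centers on a 3x3x3 grid over [-1, 1]^3.
axis = np.linspace(-1.0, 1.0, 3)
centers = np.array(np.meshgrid(axis, axis, axis)).reshape(3, -1).T
W = np.zeros((27, 3))                          # untrained weights -> zero output
out = rbfnn_output(np.array([0.1, -0.2, 0.3]), centers, 1.0, W)
print(out)  # [0. 0. 0.]
```

The same regressor vector S(Z) feeds all three output channels; only the weight columns differ.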

Then, from Equations 1, 13 we have:

Miν̇i + Ci(νi)νi + Di(νi)νi + gi(ηi) + Δi(χi) = τi = −JiT(ηi)z1,i − K2,iz2,i + ŴiTSiF(Zi).

By subtracting Wk,i*TSk,i(Zi)+ϵk,i(Zi) from both sides and considering Equations 9, 11, we define W̃k,iW^k,iWk,i*, leading to:

ż2,i = Mi⁻¹(−JiT(ηi)z1,i − K2,iz2,i + W̃iTSiF(Zi) − ϵi(Zi)).

For updating Ŵk,i online, a robust self-adaptation law is constructed using the σ-modification technique of Ioannou and Sun (1996) as follows:

Ŵ̇k,i = −Γk,i(Sk,i(Zi)z2k,i + σk,iŴk,i), (14)

where z2,i=[z21,i,z22,i,z23,i]T, Γk,i=Γk,iT>0, and σk,i>0 are free parameters to be designed for all iI[1,N] and kI[1,3]. Integrating Equations 9, 13, 14 yields the following closed-loop system:

ż1,i = −K1,iz1,i + Ji(ηi)z2,i,
ż2,i = Mi⁻¹(−JiT(ηi)z1,i − K2,iz2,i + W̃iTSiF(Zi) − ϵi(Zi)),
W̃̇k,i = −Γk,i(Sk,i(Zi)z2,k,i + σk,iŴk,i), (15)

where, for all iI[1,N] and kI[1,3], W̃iTSi(Zi)=[W̃1,iTS1,i(Zi),W̃2,iTS2,i(Zi),W̃3,iTS3,i(Zi)]T, and ϵi(Zi)=[ϵ1,i(Zi),ϵ2,i(Zi),ϵ3,i(Zi)]T.
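The self-adaptation law (Equation 14) can be sketched as a simple Euler integration. Note that the printed equation lost its signs; the leakage form below, Ŵ̇ = −Γ(S z + σŴ), follows the standard σ-modification of Ioannou and Sun (1996), and all gains are illustrative, not the paper's tuned values:

```python
import numpy as np

def sigma_mod_step(W_hat, S, z2k, Gamma, sigma, dt):
    """One Euler step of W_hat_dot = -Gamma (S z2k + sigma W_hat)."""
    W_dot = -Gamma * (S * z2k + sigma * W_hat)  # sigma term adds leakage/robustness
    return W_hat + dt * W_dot

# Illustrative scalar gains and a frozen regressor/error, to show the update direction.
S = np.array([0.8, 0.2, 0.05])   # Gaussian activations near / far from the input
W = np.zeros(3)
for _ in range(2000):            # 2 s of simulated adaptation at dt = 1 ms
    W = sigma_mod_step(W, S, z2k=0.5, Gamma=10.0, sigma=0.0001, dt=0.001)
print(W)                         # weights grow along -S * z2k, fastest for active neurons
```

The small σ-leakage keeps the weights bounded under persistent disturbance instead of drifting without limit, which is exactly the robustness role it plays in (Equation 14).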

Remark 4. Unlike the first-layer DA observer design, the second-layer control law is fully decentralized for each local agent. It utilizes only the local agent's information for feedback control, including χi, χ̂i, and Ŵk,i, without involving any information exchange among neighboring AUVs.

The following theorem summarizes the stability and tracking control performance results of the overall system:

Theorem 2. Consider the local closed-loop system (Equation 15). For each i ∈ I[1,N], if there exists a sufficiently large compact set ΩZi such that Zi ∈ ΩZi for all t ≥ 0, then for any bounded initial conditions, we have: 1) All signals in the closed-loop system remain uniformly ultimately bounded (UUB). 2) The position tracking error ηi − ηd,i converges exponentially to a small neighborhood around zero in finite time Ti > 0 by choosing the design parameters with sufficiently large λ̲(K1,i) > 0 and λ̲(K2,i) > 2λ̄(K1,i) > 0, and sufficiently small σk,i > 0 for all i ∈ I[1,N] and k ∈ I[1,3].

Proof: 1) Consider the following Lyapunov function candidate for the closed-loop system (Equation 15):

Vi = ½z1,iTz1,i + ½z2,iTMiz2,i + ½Σk=1³ W̃k,iTΓk,i⁻¹W̃k,i.

Evaluating the derivative of Vi along the trajectory of Equation 15 for all iI[1,N] yields:

V̇i = z1,iT(−K1,iz1,i + Ji(ηi)z2,i) + z2,iT(−JiT(ηi)z1,i − K2,iz2,i + W̃iTSiF(Zi) − ϵi(Zi)) − Σk=1³ W̃k,iT(Sk,i(Zi)z2k,i + σk,iŴk,i) = −z1,iTK1,iz1,i − z2,iTK2,iz2,i − z2,iTϵi(Zi) − Σk=1³ σk,iW̃k,iTŴk,i, ∀i ∈ I[1,N].

Choose K2,i=K1,i+K22,i such that K1,i,K22,iS3+. Using the completion of squares, we have:

−σk,iW̃k,iTŴk,i ≤ −σk,i‖W̃k,i‖²/2 + σk,i‖Wk,i*‖²/2,
−z2,iTK22,iz2,i − z2,iTϵi(Zi) ≤ ϵiT(Zi)ϵi(Zi)/(4λ̲(K22,i)) ≤ ‖ϵi*‖²/(4λ̲(K22,i)),

where ϵi*=[ϵ1,i*,ϵ2,i*,ϵ3,i*]T. Then, we obtain:

V̇i ≤ −z1,iTK1,iz1,i − z2,iTK1,iz2,i + ‖ϵi*‖²/(4λ̲(K22,i)) + Σk=1³(−σk,i‖W̃k,i‖²/2 + σk,i‖Wk,i*‖²/2).

It follows that V̇i is negative definite whenever:

‖z1,i‖ > ‖ϵi*‖/(2√(λ̲(K1,i)λ̲(K22,i))) + Σk=1³ √(σk,i/(2λ̲(K1,i))) ‖Wk,i*‖,
‖z2,i‖ > ‖ϵi*‖/(2√(λ̲(K1,i)λ̲(K22,i))) + Σk=1³ √(σk,i/(2λ̲(K1,i))) ‖Wk,i*‖,
‖W̃k,i‖ > ‖ϵi*‖/(2√(σk,iλ̲(K22,i))) + Σk=1³ ‖Wk,i*‖ ≜ W̃k,i*,

for all i ∈ I[1,N], k ∈ I[1,3]. This leads to the uniformly ultimately bounded (UUB) behavior of the signals z1,i, z2,i, and W̃k,i for all i ∈ I[1,N] and k ∈ I[1,3]. As a result, it can be easily verified that since ηd,i = η̂i + di* with η̂i bounded (according to Theorem 1 and Assumption 1), ηi = z1,i + ηd,i is bounded for all i ∈ I[1,N]. Similarly, the boundedness of νi = z2,i + αi can be confirmed by the fact that αi in Equation 9 is bounded. In addition, Ŵk,i = W̃k,i + Wk,i* is also bounded for all i ∈ I[1,N] and k ∈ I[1,3] because of the boundedness of W̃k,i and Wk,i*. Moreover, in light of Equation 10, α̇i is bounded, as all the terms on its right-hand side are bounded. This leads to the boundedness of the control signal τi in Equation 13, since the Gaussian function vector SiF(Zi) is guaranteed to be bounded for any Zi. As such, all the signals in the closed-loop system remain UUB, which completes the proof of the first part.

2) For the second part, it will be shown that ηi will converge arbitrarily close to ηdi in some finite time Ti>0 for all iI[1,N]. To this end, we consider the following Lyapunov function candidate for the dynamics of z1,i and z2,i in Equation 15:

Vz,i = ½z1,iTz1,i + ½z2,iTMiz2,i, ∀i ∈ I[1,N]. (16)

The derivative of Vz,i is:

V̇z,i = z1,iT(−K1,iz1,i + Ji(ηi)z2,i) + z2,iT(−JiT(ηi)z1,i − K2,iz2,i + W̃iTSiF(Zi) − ϵi(Zi)) = −z1,iTK1,iz1,i − z2,iTK2,iz2,i + z2,iTW̃iTSiF(Zi) − z2,iTϵi(Zi), ∀i ∈ I[1,N].

Similar to the proof of part one, we let K2,i=K1,i+2K22,i with K1,i,K22,iS3+. According to Wang and Hill (2018), the Gaussian RBFNN regressor SiF(Zi) is bounded by SiF(Zi)si* for any Zi and for all iI[1,N] with some positive number si*>0. Through completion of squares, we have:

−z2,iTK22,iz2,i + z2,iTW̃iTSiF(Zi) ≤ ‖W̃i*‖²si*²/(4λ̲(K22,i)),
−z2,iTK22,iz2,i − z2,iTϵi(Zi) ≤ ‖ϵi*‖²/(4λ̲(K22,i)).

Also W̃i*=[W̃1,i*,W̃2,i*,W̃3,i*]T. This leads to:

V̇z,i ≤ −z1,iTK1,iz1,i − z2,iTK1,iz2,i + δi ≤ −2λ̲(K1,i)·½z1,iTz1,i − (2λ̲(K1,i)/λ̄(Mi))·½z2,iTMiz2,i + δi ≤ −ρiVz,i + δi, ∀i ∈ I[1,N], (17)

where ρi=min{2λ̲(K1,i),2λ̲(K1,i)/λ̄(Mi)} and δi=(W̃i*2si*2/4λ̲(K22,i))+(ϵi*2/4λ̲(K22,i)), iI[1,N]. Solving the inequality Equation 17 yields:

0 ≤ Vz,i(t) ≤ Vz,i(0)exp(−ρit) + δi/ρi,

which together with Equation 16 implies that:

min{1, λ̲(Mi)}·½(‖z1,i‖² + ‖z2,i‖²) ≤ Vz,i(0)exp(−ρit) + δi/ρi, ∀t ≥ 0, i ∈ I[1,N],

also

‖z1,i‖² + ‖z2,i‖² ≤ (2/min{1, λ̲(Mi)})Vz,i(0)exp(−ρit) + 2δi/(ρi min{1, λ̲(Mi)}).

Consequently, it is straightforward that given δ̄i > √(2δi/(ρi min{1, λ̲(Mi)})), there exists a finite time Ti > 0 for all i ∈ I[1,N] such that for all t ≥ Ti, both z1,i and z2,i satisfy ‖z1,i(t)‖ ≤ δ̄i and ‖z2,i(t)‖ ≤ δ̄i, ∀i ∈ I[1,N], where δ̄i can be made arbitrarily small by choosing sufficiently large λ̲(K1,i) > 0 and λ̲(K2,i) > 2λ̄(K1,i) > 0 for all i ∈ I[1,N]. This ends the proof.
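The comparison argument above reduces to the scalar inequality V̇ ≤ −ρV + δ, whose bound V(t) ≤ V(0)exp(−ρt) + δ/ρ can be checked numerically on a surrogate system; the constants here are illustrative, not the paper's:

```python
import numpy as np

# Scalar surrogate saturating the bound: V_dot = -rho V + delta.
rho, delta, V0 = 2.0, 0.1, 5.0
dt, T = 1e-3, 6.0
V = V0
for _ in range(int(T / dt)):
    V += dt * (-rho * V + delta)            # forward-Euler integration
bound = V0 * np.exp(-rho * T) + delta / rho  # V(0) e^{-rho T} + delta / rho
print(V, bound)   # V approaches the ultimate bound delta / rho = 0.05
```

The transient term decays exponentially at rate ρ, leaving the ultimate bound δ/ρ, which is exactly what is shrunk by choosing larger control gains (larger ρ) or smaller σk,i (smaller δ).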

By integrating the outcomes of Theorems 1, 2, the following theorem is established, which can be presented without additional proof:

Theorem 3. Consider the multi-AUV system (Equation 1) and the virtual leader dynamics (Equation 2) with the network communication topology G, under Assumptions 1 and 2. Objective 1 of Problem 1 (i.e., ηi converges to η0 + di* exponentially for all i ∈ I[1,N]) can be achieved by using the cooperative observer Equations 3, 4 and the DDL control law Equations 13, 14, with all the design parameters satisfying the requirements in Theorems 1 and 2, respectively.

Remark 5. With the proposed two-layer formation learning control architecture, inter-agent information exchange occurs solely in the first-layer DA observation. Only the observer’s estimated information, and not the physical plant state information, needs to be shared among neighboring agents. Additionally, since no global information is required for the design of each local AUV control system, the proposed formation learning control protocol can be designed and implemented in a fully distributed manner.

Remark 6. It is important to note that the eigenvalue constraints on A0 in Assumption 1 are not needed for cooperative observer estimation (as detailed in Section 3) or for achieving formation tracking control performance (as discussed in this section). This indicates that formation tracking control can be attained for general reference trajectories, including both periodic paths and straight lines, provided they are bounded. However, these constraints will become necessary in the next section to ensure the accurate learning capability of the proposed method.

4 Accurate learning from formation control

It is necessary to demonstrate the convergence of the RBFNN weights in Equations 13, 14 to their optimal values for accurate learning and identification. The main result of this section is summarized in the following theorem.

Theorem 4. Consider the local closed-loop system (Equation 15) with Assumptions 1, 2. For each i ∈ I[1,N], if there exists a sufficiently large compact set ΩZi such that Zi ∈ ΩZi for all t ≥ 0, then for any bounded initial conditions with Ŵk,i(0) = 0, ∀i ∈ I[1,N], k ∈ I[1,3], the local estimated neural weights Ŵζ,k,i converge to small neighborhoods of their optimal values Wζ,k,i* along the periodic reference tracking orbit ϕζ,i(Zi(t))|t≥Ti (denoting the orbit of the NN input signal Zi(t) starting from time Ti). This leads to locally accurate approximations of the nonlinear uncertain dynamics fk,i(Zi), k ∈ I[1,3], in Equation 11 being obtained by Ŵk,iTSk,i(Zi), as well as by W̄k,iTSk,i(Zi), where i ∈ I[1,N], k ∈ I[1,3]. Here, ζ denotes the subset of neurons (or nodes) that are active when the system state Zi(t) is within a specific neighborhood of the state space.

W̄k,i = mean{Ŵk,i(t) | t ∈ [ta,i, tb,i]}, (18)

where [ta,i,tb,i] (tb,i>ta,i>Ti) represents a time segment after the transient process.

Proof: From Theorem 3, we have shown that for all iI[1,N], ηi will closely track the periodic signal ηd,i=η0+di* in finite time Ti. In addition, (Equation 9) implies that νi will also closely track the signal JiT(ηi)η̇0i since both z1,i and z2,i will converge to a small neighborhood around zero according to Theorem 2. Moreover, since η̇0i will converge to η̇0 according to Theorem 1, and Ji(ηi) is a bounded rotation matrix, νi will also be a periodic signal after finite time Ti, because η̇0 is periodic under Assumption 1. Consequently, since the RBFNN input Zi(t)=col{ηi,νi} becomes a periodic signal for all tTi, the PE condition of some internal closed-loop signals, i.e., the RBFNN regression subvector Sζ,k,i(Zi) (tTi), is satisfied according to Lemma 1. As mentioned in Section 2.2, ζ represents the subset of RBFNN nodes and weights that are specifically utilized along the recurrent trajectory of the system state Zi(t). This subset focuses on the active neural components required for approximating the system’s nonlinear dynamics locally, ensuring that the learning and adaptation processes are efficient and accurate within the compact region where the trajectory resides. It should be noted that the periodicity of Zi(t) leads to the PE of the regression subvector Sζ,k,i(Zi), but not necessarily the PE of the whole regression vector Sk,i(Zi). Thus, we term this as a partial PE condition, and we will show the convergence of the associated local estimated neural weights Wζ,k,iWζ,k,i*, rather than Wk,iWk,i*.

Thus, to prove accurate convergence of local neural weights Wζ,k,i associated with the regression subvector Sζ,k,i(Zi) under the satisfaction of the partial PE condition, we first rewrite the closed-loop dynamics of z1,i and z2,i along the periodic tracking orbit ϕζ,i(Zi(t))|tTi by using the localization property of the Gaussian RBFNN:

ż1,i = −K1,iz1,i + Ji(ηi)z2,i,
ż2,i = Mi⁻¹(−Wζ,i*TSζ,iF(Zi) − ϵζ,i − JiT(ηi)z1,i − K2,iz2,i + Ŵζ,iTSζ,iF(Zi) + Ŵζ̄,iTSζ̄,iF(Zi)) = Mi⁻¹(−JiT(ηi)z1,i − K2,iz2,i + W̃ζ,iTSζ,iF(Zi) − ϵ′ζ,i),

where Fi(Zi) = Wζ,i*TSζ,iF(Zi) + ϵζ,i with Wζ,i*TSζ,iF(Zi) = [Wζ,1,i*TSζ,1,i(Zi), Wζ,2,i*TSζ,2,i(Zi), Wζ,3,i*TSζ,3,i(Zi)]T and ϵζ,i = [ϵζ,1,i, ϵζ,2,i, ϵζ,3,i]T being the approximation error. Additionally, Ŵζ,iTSζ,iF(Zi) + Ŵζ̄,iTSζ̄,iF(Zi) = ŴiTSiF(Zi), with subscripts ζ and ζ̄ denoting the regions close to and far away from the periodic trajectory ϕζ,i(Zi(t))|t≥Ti, respectively. According to Wang and Hill (2018), Ŵζ̄,iTSζ̄,iF(Zi) is small, and the NN local approximation error ϵ′ζ,i = ϵζ,i − Ŵζ̄,iTSζ̄,iF(Zi), with ϵ′ζ,i = O(ϵζ,i), is also small. Thus, the overall closed-loop adaptive learning system can be described by:

[Equation 19, the closed-loop subsystem in (z1,i, z2,i, W̃ζ,k,i), is displayed as an image in the original.]

and

W̃̇ζ̄,1,i = −Γζ̄,1,i(Sζ̄,1,i(Zi)z21,i + σ1,iŴζ̄,1,i),
W̃̇ζ̄,2,i = −Γζ̄,2,i(Sζ̄,2,i(Zi)z22,i + σ2,iŴζ̄,2,i),
W̃̇ζ̄,3,i = −Γζ̄,3,i(Sζ̄,3,i(Zi)z23,i + σ3,iŴζ̄,3,i),

where

Ξi=0Mi1Sζ,1,iTZi000Sζ,2,iTZi000Sζ,3,iTZi,

for all iI[1,N]. The exponential stability property of the nominal part of subsystem (Equation 19) has been well-studied in Wang and Hill (2018), Yuan and Wang (2011), and Yuan and Wang (2012), where it is stated that PE of Sζ,k,i(Zi) will guarantee exponential convergence of (z1,i,z2,i,W̃ζ,k,i)=0 for all iI[1,N] and kI[1,3]. Based on this, since ϵζ,i=O(ϵζ,i)=O(ϵi), and σk,iΓζ,k,iW^ζ,k,i can be made small by choosing sufficiently small σk,i for all iI[1,N], kI[1,3], both the state error signals (z1,i,z2,i) and the local parameter error signals W̃ζ,k,i (iI[1,N],kI[1,3]) will converge exponentially to small neighborhoods of zero, with the sizes of the neighborhoods determined by the RBFNN ideal approximation error ϵi as in Equation 12 and σk,iΓζ,k,iW^ζ,k,i. The convergence of Wζ,k,iWζ,k,i* implies that along the periodic trajectory ϕζ,i(Zi(t))|tTi, we have

fk,iZi=Wζ,k,i*TSζ,k,iZi+ϵζ,k,i=W^ζ,k,iTSζ,k,iZiW̃ζ,k,iTSζ,k,iZi+ϵζ,k,i=W^ζ,k,iTSζ,k,iZi+ϵζ1,k,i=W̄ζ,k,iTSζ,k,iZi+ϵζ2,k,i,

where for all i ∈ I[1,N], k ∈ I[1,3], ϵζ1,k,i = ϵζ,k,i − W̃ζ,k,iTSζ,k,i(Zi) = O(ϵζ,k,i) due to the convergence of W̃ζ,k,i → 0. The last equality is obtained according to the definition of (Equation 18), with W̄ζ,k,i being the corresponding subvector of W̄k,i along the periodic trajectory ϕζ,i(Zi(t))|t≥Ti, and ϵζ2,k,i being the approximation error using W̄ζ,k,iTSζ,k,i(Zi). Clearly, after the transient process, we will have ϵζ2,k,i = O(ϵζ1,k,i), i ∈ I[1,N], k ∈ I[1,3]. Conversely, for the neurons whose centers are distant from the trajectory ϕζ,i(Zi(t))|t≥Ti, the values of Sζ̄,k,i(Zi) will be very small due to the localization property of the Gaussian RBFNN. From the adaptation law (Equation 14) with Ŵk,i(0) = 0, it can be observed that these small values of Sζ̄,k,i(Zi) will only minimally activate the adaptation of the associated neural weights Ŵζ̄,k,i. As a result, both Ŵζ̄,k,i and Ŵζ̄,k,iTSζ̄,k,i(Zi), as well as W̄ζ̄,k,i and W̄ζ̄,k,iTSζ̄,k,i(Zi), will remain very small for all i ∈ I[1,N], k ∈ I[1,3] along the periodic trajectory ϕζ,i(Zi(t))|t≥Ti. This indicates that the entire RBFNN Ŵk,iTSk,i(Zi), as well as W̄k,iTSk,i(Zi), can be used to accurately approximate the unknown function fk,i(Zi) locally along the periodic trajectory ϕζ,i(Zi(t))|t≥Ti, meaning that:

fk,iZi=W^ζ,k,iTSζ,k,iZi+ϵζ1,k,i=W^k,iTSk,iZi+ϵ1,k,i
=W̄ζ,k,iTSζ,k,iZi+ϵζ2,k,i=W̄k,iTSk,iZi+ϵ2,k,i,

with the approximation accuracy level of ϵ1,k,i=ϵζ1,k,iWζ̄,k,iTSζ̄,k,i(Zi)=O(ϵζ1,k,i)=O(ϵk,i) and ϵ2,k,i=ϵζ2,k,iW̄ζ̄,k,iTSζ̄,k,i(Zi)=O(ϵζ2,k,i)=O(ϵk,i) for all iI[1,N], kI[1,3]. This ends the proof.

Remark 7. The key idea in the proof of Theorem 4 is inspired by Wang and Hill (2018). For more detailed analysis of the learning performance, including quantitative analysis of the learning accuracy levels ϵ1,k,i and ϵ2,k,i as well as the learning speed, please refer to Yuan and Wang (2011). Furthermore, the AUV nonlinear dynamics (Equation 11) to be identified do not contain any time-varying random disturbances. This is important to ensure accurate identification/learning performance under the deterministic learning framework. To understand the effects of time-varying external disturbances on deterministic learning performance, interested readers are referred to Yuan and Wang (2012) for more details.

Remark 8. Based on Equation 18, to obtain the constant RBFNN weights W̄k,i for all iI[1,N], kI[1,3], one needs to implement the formation learning control law Equations 13, 14 first. Then, according to Theorem 4, after a finite-time transient process, the RBFNN weights Wk,i will converge to constant steady-state values. Thus, one can select a time segment [ta,i,tb,i] with tb,i>ta,i>Ti for all iI[1,N] to record and store the RBFNN weights Wk,i(t) for t[ta,i,tb,i]. Finally, based on these recorded data, W̄k,i can be calculated off-line using Equation 18.
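The off-line averaging in Remark 8 (Equation 18) is simply a mean of the recorded weight trajectory over the post-transient window; a sketch with a synthetic history (the settling value and window below are made up for illustration):

```python
import numpy as np

def averaged_weights(W_history, t, ta, tb):
    """W_bar: mean of the recorded W_hat(t) over the window [ta, tb]."""
    mask = (t >= ta) & (t <= tb)      # keep only post-transient samples
    return W_history[mask].mean(axis=0)

# Synthetic recorded history: weights settle near 1.0 with a small residual
# oscillation plus a decaying transient.
t = np.linspace(0.0, 10.0, 1001)
W_history = 1.0 + 0.01 * np.sin(5 * t) + np.exp(-t)
W_bar = averaged_weights(W_history[:, None], t, ta=6.0, tb=10.0)
print(W_bar)  # close to 1.0
```

Averaging over a full post-transient window cancels the residual oscillation of the adaptive weights, which is why W̄k,i is a better constant representative than a single snapshot Ŵk,i(tb,i).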

Remark 9. It is shown in Theorem 4 that locally accurate learning of each individual AUV’s nonlinear uncertain dynamics can be achieved using localized RBFNNs along the periodic trajectory ϕζ,i(Zi(t))|tTi. The learned knowledge can be further represented and stored in a time-invariant fashion using constant RBFNN, i.e., W̄k,iTSk,i(Zi) for all iI[1,N], kI[1,3]. In contrast to many existing techniques (e.g., Peng et al., 2017; Peng et al., 2015), this is the first time, to the authors’ best knowledge, that locally accurate identification and knowledge representation using constant RBFNN are accomplished and rigorously analyzed for multi-AUV formation control under complete uncertain dynamics.

5 Formation control with pre-learned dynamics

In this section, we will further address objective 2 of Problem 1, which involves achieving formation control without readapting to the AUV's nonlinear uncertain dynamics. To this end, consider the multiple AUV systems (Equation 1) and the virtual leader dynamics (Equation 2). We employ the estimator observer Equations 3, 4 to cooperatively estimate the leader's state information. Instead of using the DDL feedback control law (Equation 13) and the self-adaptation law (Equation 14), we introduce the following constant RBFNN controller, which does not require online adaptation of the NN weights:

τi = −JiT(ηi)z1,i − K2,iz2,i + W̄iTSiF(Zi), (20)

where W̄iTSiF(Zi)=[W̄1,iTS1,i(Zi),W̄2,iTS2,i(Zi),W̄3,iTS3,i(Zi)]T is obtained from Equation 18. The term W̄k,iTSk,i(Zi) represents the locally accurate RBFNN approximation of the nonlinear uncertain function fk,i(Zi) along the trajectory ϕζ,i(Zi(t))|tTi, and the associated constant neural weights W̄k,i are obtained from the formation learning control process as discussed in Remark 8.
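Computationally, the only change from the adaptive law (Equation 13) is that the stored constant weights W̄ replace the online estimates, so no weight update runs inside the control loop. A schematic sketch, with hypothetical gains, weights, and a trivial rotation matrix:

```python
import numpy as np

def control_prelearned(J, z1, z2, K2, W_bar, S):
    """tau = -J^T z1 - K2 z2 + W_bar^T S(Z); no online adaptation needed."""
    return -J.T @ z1 - K2 @ z2 + W_bar.T @ S

J = np.eye(3)                      # placeholder rotation J_i(eta_i)
K2 = 2.0 * np.eye(3)               # hypothetical feedback gain
z1 = np.array([0.1, 0.0, -0.1])    # position tracking error
z2 = np.array([0.0, 0.2, 0.0])     # velocity tracking error
W_bar = np.zeros((5, 3))           # hypothetical stored constant weights
S = np.ones(5)                     # regressor activations at the current Z
tau = control_prelearned(J, z1, z2, K2, W_bar, S)
print(tau)  # [-0.1 -0.4  0.1]
```

Per control step this is two matrix-vector products and one regressor evaluation; the per-step cost of the adaptation integrals in (Equation 14) disappears entirely.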

Theorem 5. Consider the multi-AUV system (Equation 1) and the virtual leader dynamics (Equation 2) with the network communication topology G. Under Assumptions 1, 2, the formation control performance (i.e., ηi converges to η0 + di* exponentially, with the same η0 and di* defined in Theorem 3, for all i ∈ I[1,N]) can be achieved by using the DA observer Equations 3, 4 and the constant RBFNN control law (Equation 20) with the constant NN weights obtained from Equation 18.

Proof: The closed-loop system for each local AUV agent can be established by integrating the controller (Equation 20) with the AUV dynamics (Equation 1).

ż1,i = −K1,iz1,i + Ji(ηi)z2,i,
ż2,i = Mi⁻¹(−JiT(ηi)z1,i − K2,iz2,i + W̄iTSiF(Zi) − Fi(Zi)) = Mi⁻¹(−JiT(ηi)z1,i − K2,iz2,i − ϵ2,i), ∀i ∈ I[1,N],

where ϵ2,i = [ϵ21,i, ϵ22,i, ϵ23,i]T. Consider the Lyapunov function candidate Vz,i = ½z1,iTz1,i + ½z2,iTMiz2,i, whose derivative along the closed-loop system described above is given by:

V̇z,i = z1,iT(−K1,iz1,i + Ji(ηi)z2,i) + z2,iT(−JiT(ηi)z1,i − K2,iz2,i − ϵ2,i) = −z1,iTK1,iz1,i − z2,iTK2,iz2,i − z2,iTϵ2,i.

Selecting K2,i=K1,i+K22,i where K1,i,K22,iS3+, we can utilize the method of completing squares to obtain:

−z2,iTK22,iz2,i − z2,iTϵ2,i ≤ ‖ϵ2,i‖²/(4λ̲(K22,i)) ≤ ‖ϵ2,i*‖²/(4λ̲(K22,i)),

which implies that:

V̇z,i ≤ −z1,iTK1,iz1,i − z2,iTK1,iz2,i + ‖ϵ2,i*‖²/(4λ̲(K22,i)) ≤ −ρiVz,i + δi, ∀i ∈ I[1,N],

where ρi = min{2λ̲(K1,i), 2λ̲(K1,i)/λ̄(Mi)} and δi = ‖ϵ2,i*‖²/(4λ̲(K22,i)). Using similar reasoning to that in the proof of Theorem 2, it is evident from the derived inequality that all signals within the closed-loop system remain bounded. Additionally, ηi − ηd,i will converge to a small neighborhood around zero within a finite period. The magnitude of this neighborhood can be minimized by appropriately choosing large values for λ̲(K1,i) > 0 and λ̲(K2,i) > λ̄(K1,i) across all i ∈ I[1,N]. In line with Theorem 1, under Assumptions 1, 2, the implementation of the DA observer Equations 3, 4 facilitates the exponential convergence of η̂i towards η0. This conjunction of factors assures that ηi rapidly aligns with ηd,i = η0 + di*, achieving the objectives set out for formation control.

Remark 10. Building on the locally accurate learning outcomes discussed in Section 4, the newly developed distributed control protocol comprising Equations 3, 4, 20 facilitates stable formation control across a repeated formation pattern. Unlike the formation learning control approach outlined in Section 3.2, which involves Equations 3, 4 coupled with Equations 13, 14, the current method eliminates the need for online RBFNN adaptation for all AUV agents. This significantly reduces the computational demands, thereby enhancing the practicality of implementing the proposed distributed RBFNN formation control protocol. This innovation marks a significant advancement over many existing techniques in the field.

6 Simulation

We consider a heterogeneous multi-AUV system composed of 5 AUVs for the simulation. The dynamics of these AUVs are described by the system model (Equation 1). The system parameters for each AUV are specified as follows:

Mi = [[m11,i, 0, 0], [0, m22,i, m23,i], [0, m23,i, m33,i]],
Ci(νi) = [[0, 0, −m22,ivi − m23,iri], [0, 0, m11,iui], [m22,ivi + m23,iri, −m11,iui, 0]],
Di(νi) = [[d11,i(νi), 0, 0], [0, d22,i(νi), d23,i(νi)], [0, d32,i(νi), d33,i(νi)]],
gi = 0, Δi = [Δ1,i(χi), Δ2,i(χi), Δ3,i(χi)]T, ∀i ∈ I[1,5],

where the mass and damping matrix components for each AUV i are defined as:

m11,i = mi − Xu̇,i, m22,i = mi − Yv̇,i, m23,i = mixg,i − Yṙ,i, m33,i = Iz,i − Nṙ,i,
d11,i = −Xu,i − Xuu,i|ui|, d22,i = −Yv,i − Yvv,i|vi| − Yrv,i|ri|, d23,i = −Yr,i − Yvr,i|vi| − Yrr,i|ri|,
d32,i = −Nv,i − Nvv,i|vi| − Nrv,i|ri|, d33,i = −Nr,i − Nvr,i|vi| − Nrr,i|ri|.

Following the notation of Prestero (2001) and Skjetne et al. (2005), the coefficients {X(·), Y(·), N(·)} are hydrodynamic parameters. The associated system parameters, borrowed from Skjetne et al. (2005) (with slight modifications across the different AUV agents) for simulation purposes, are listed in Table 1. For all i ∈ I[1,5], we set xg,i = 0.05 and Yṙ,i = Yrv,i = Yvr,i = Yrr,i = Nrv,i = Nrr,i = Nvv,i = Nvr,i = Nr,i = 0. Model uncertainties are given by:

Δ1=0,Δ2=0.2u22+0.3v20.950.33r2TΔ3=0.58+cosv30.23r330.74u32TΔ4=0.3100.38u42+v43TΔ5=sinv5cosu5+r50.65T.
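Under the matrix reconstruction above, which follows the standard 3-DOF surge/sway/yaw notation of Skjetne et al. (2005) (signs should be checked against the original article), the mass and Coriolis matrices can be assembled as a quick sanity check. The parameter values below are illustrative, not those of Table 1:

```python
import numpy as np

def mass_matrix(m, Iz, xg, Xud, Yvd, Yrd, Nrd):
    """M_i from rigid-body mass/inertia and added-mass derivatives."""
    m11 = m - Xud
    m22 = m - Yvd
    m23 = m * xg - Yrd
    m33 = Iz - Nrd
    return np.array([[m11, 0.0, 0.0],
                     [0.0, m22, m23],
                     [0.0, m23, m33]])

def coriolis_matrix(M, nu):
    """C_i(nu) built from the mass-matrix entries and velocities (u, v, r)."""
    u, v, r = nu
    c13 = -M[1, 1] * v - M[1, 2] * r
    c23 = M[0, 0] * u
    return np.array([[0.0, 0.0, c13],
                     [0.0, 0.0, c23],
                     [-c13, -c23, 0.0]])   # skew-symmetric by construction

# Illustrative parameters (hydrodynamic derivatives are typically negative).
M = mass_matrix(m=25.0, Iz=2.0, xg=0.05, Xud=-2.0, Yvd=-10.0, Yrd=0.0, Nrd=-1.0)
C = coriolis_matrix(M, nu=np.array([1.0, 0.5, 0.1]))
print(M[0, 0], C[0, 2])  # 27.0 -17.625
```

Checking that M is symmetric positive definite and C(ν) is skew-symmetric is a useful guard when entering the Table 1 values, since those two structural properties are what the Lyapunov analysis relies on.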


Table 1. Parameters of AUVs.

Figure 2 illustrates the communication topology and the spanning tree where agent 0 is the virtual leader and is considered as the root, in accordance with Assumption 2. The desired formation pattern requires each AUV, ηi, to track a periodic signal generated by the virtual leader η0. The dynamics of the leader are defined as follows:

[η̇0; ν̇0] = [[03×3, I3], [−I3, 03×3]] [η0; ν0], [η0(0); ν0(0)] = [0, 80, 0, 80, 0, 80]T. (21)


Figure 2. The communication network topology and spanning tree of multi-AUV system of the simulation with 0 as virtual leader.

The initial conditions and system matrix are structured to ensure all eigenvalues of A0 lie on the imaginary axis, thus satisfying Assumption 1. The reference trajectory for η0 is defined as [80sin(t),80cos(t),80sin(t)]T. The predefined offsets di*, which determine the relative positions of the AUVs to the leader, are specified as follows:

d1*=0,0,0T,d4*=10,10,0T,d2*=10,10,0T,d5*=10,10,0T,d3*=10,10,0T.

Each AUV tracks its respective position in the formation by adjusting its location to ηi=η0+di*.
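The reference geometry can be generated directly from the closed-form leader trajectory. The offset used below is hypothetical, since the printed signs of the di* vectors are not recoverable from the extraction:

```python
import numpy as np

def leader_position(t):
    """eta_0(t) = [80 sin t, 80 cos t, 80 sin t]^T, the periodic reference."""
    return np.array([80 * np.sin(t), 80 * np.cos(t), 80 * np.sin(t)])

def formation_target(t, d_star):
    """Each follower i tracks eta_0(t) + d_i^*."""
    return leader_position(t) + d_star

d2 = np.array([10.0, 10.0, 0.0])   # hypothetical offset for agent 2
print(formation_target(0.0, d2))   # leader starts at [0, 80, 0]
```

Note that this trajectory satisfies η̈0 = −η0, matching the block structure of A0 in (Equation 21) whose eigenvalues lie on the imaginary axis, as required by Assumption 1 for the learning results.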

6.1 DDL formation learning control simulation

The estimated virtual leader's state, derived from the cooperative estimator in the first layer (see Equations 3, 4), is used by each agent's DDL controller (second layer, Equations 13, 14) to handle its completely uncertain dynamics. The uncertain nonlinear functions Fi(Zi) for each agent are approximated using RBFNN, as described in Equation 11. Specifically, for each agent i ∈ {1,…,5}, the nonlinear uncertain functions Fi(Zi), dependent on νi, are modeled. The NN input Zi = [ui, vi, ri]T allows the construction of Gaussian RBFNN, represented by Wk,iTSk,i(Zi), utilizing 4,096 neurons arranged in a 16×16×16 grid. The centers of these neurons are evenly distributed over the state space [−100,100]×[−100,100]×[−100,100], and each has a width γk,i = 60, ensuring bounded and structured parameter optimization for all i ∈ {1,…,5} and k ∈ {1,2,3}.
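The described 16×16×16 center grid over [−100,100]³ with width γ = 60 can be constructed as follows (a sketch of the stated layout, not the authors' code):

```python
import numpy as np

# 16 evenly spaced center coordinates per axis over [-100, 100].
axis = np.linspace(-100.0, 100.0, 16)
cx, cy, cz = np.meshgrid(axis, axis, axis, indexing="ij")
centers = np.stack([cx, cy, cz], axis=-1).reshape(-1, 3)   # shape (4096, 3)

def regressor(Z, centers, width=60.0):
    """Gaussian activations S(Z), shared by each W_{k,i}^T S_{k,i}(Z_i)."""
    d2 = np.sum((centers - Z) ** 2, axis=1)
    return np.exp(-d2 / width ** 2)

S = regressor(np.zeros(3), centers)
print(centers.shape, S.max())   # (4096, 3) and a peak activation near 1
```

With width 60 against a grid spacing of about 13.3, neighboring Gaussians overlap substantially, which is what lets the localized network interpolate smoothly between centers along the recurrent trajectory.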

The observer and controller parameters are chosen as β1=β2=5, and the diagonal matrices K1,i=800diag{1.2,1,1} and K2,i=1200diag{1.2,1,1}, with Γk,i=10 and σk,i=0.0001 for all i{1,,5} and k{1,2,3}. The initial conditions for the agents are set as η1(0)=[30,60,0]T, η2(0)=[40,70,0]T, η3(0)=[50,80,0]T, η4(0)=[10,70,0]T, and η5(0)=[10,50,0]T. Zero initial conditions are assumed for all the distributed observer states (χi,0,Ai,0) and the DDL controller states Wk,i for all i{1,,5} and k{1,2,3}. Time-domain simulation is carried out using the DDL formation learning control laws as specified in Equations 13, 14, along with Equations 3, 4.

Figure 3 displays the simulation results of the cooperative estimator (first layer) for all five agents. It illustrates how each agent's estimated states η̂i converge to the leader's states η0 through Equations 3, 4. Figure 4 presents the position tracking control responses of all agents. Figures 4A–C illustrate the tracking performance of the AUVs along the x-axis, y-axis, and vehicle heading, respectively, demonstrating effective tracking of the leader's position signal. While the first AUV exactly tracks the leader's states, agents 2 through 5 successfully follow agent 1, maintaining the prescribed distances and alignment along the x and y axes and matching the same heading angle. These results underscore the robustness of the real-time tracking control system, which enforces the predefined formation pattern depicted in Figure 2. Additionally, Figure 5 highlights the real-time control performance for all agents, showcasing the effectiveness of the tracking strategy in maintaining the formation pattern.


Figure 3. Simulation results of the cooperative observer (first layer) for all three states (x-axis, y-axis, and vehicle heading) of each AUV: (A) x̂ix0(m), (B) ŷiy0(m), (C) ψ̂iψ0(deg).


Figure 4. Simulation results of position tracking control performance of all agents: (A) xix0(m), (B) yiy0(m), (C) ψiψ0(deg).


Figure 5. Real-time control performance in simulation for all agents, demonstrating the tracking strategy’s effectiveness in maintaining the formation pattern.

Figure 6 shows the sum of the absolute values of the neural network weights for the third agent. Its convergence reflects the network's ability to maintain consistent performance, as further adjustments to the weights become minimal. The updating of the neural network weights and their convergence to their optimal values throughout the learning process is depicted in Figure 7. This convergence of all neural network weights to their optimal values during training aligns with Theorem 4 and leads to accurate function approximation in the second layer. Figure 8 presents the successful function approximation results for the unknown system dynamics F3(Z3), as defined in Equation 11, for the third AUV using RBFNN. The approximations are plotted for both Wk,3TSk,3(Z3) and W̄k,3TSk,3(Z3) for all k ∈ I[1,3], as defined in Theorem 4. The results confirm that locally accurate approximations of the AUV's nonlinear dynamics were achieved. Moreover, this learned knowledge of the dynamics is effectively stored and represented using localized constant RBFNN.


Figure 6. Sum of the absolute values of neural network weights in simulation for the third agent, showing the network stabilized and learns a consistent tracking pattern.


Figure 7. Convergence of neural network weights of each state to their optimal values in simulation for the third agent: (A) Ŵ1,3, (B) Ŵ2,3, (C) Ŵ3,3. The stabilized weights demonstrate accurate learning in the second layer throughout the training process.


Figure 8. Simulation results of successful function approximation for all three states (k = 1, 2, 3) of the 3rd AUV: (A) k=1, (B) k=2, (C) k=3. Comparison of fk,3(Z3), Wk,3TSk,3(Z3), and W̄k,3TSk,3(Z3) using stored constant NN weights (W̄k,3).

6.2 Simulation for formation control with pre-learned dynamics

To evaluate the distributed control performance of the multi-AUV system, we implemented the pre-learned distributed formation control law. This strategy integrates the estimator observer Equations 3, 4, this time coupled with the constant RBFNN controller (Equation 20). We employed the virtual leader dynamics described in Equation 21 to generate consistent position tracking reference signals, as previously discussed in Section 6.1. To ensure a fair comparison, identical initial conditions, control gains, and inputs were used across all simulations. Figure 9 compares the tracking control results from Equations 13, 14 with the results using the pre-trained weights W̄ in Equation 20.


Figure 9. Simulation results of successful performance of position tracking control using pretrained weights (W̄): (A) xix0(m), (B) yiy0(m), (C) ψiψ0(deg).

The control experiments and simulation results presented demonstrate that the constant RBFNN control law (Equation 20) can achieve satisfactory tracking control performance comparable to that of the adaptive control laws (Equations 13, 14), but without the computational demand of online weight adaptation. Eliminating online recalculation or readaptation of the NN weights under this control strategy significantly reduces the computational load whenever the system restarts, since no retraining is needed. This reduction is particularly advantageous in scenarios involving extensive neural networks with a large number of neurons, thereby conserving system energy and enhancing operational efficiency in real-time applications.

Before concluding the paper, a brief summary of its contributions is provided:

• Distributed Observer Results: Simulations showed that the distributed observer effectively estimated the leader’s state, allowing for accurate formation control without needing global information.

• Tracking Control Results: The controller demonstrated reliable tracking of reference signals, maintaining performance even under varying conditions and unknown system dynamics.

• Formation Control: The proposed controller maintained accurate formation control relative to a virtual leader in simulations, even when the system dynamics were unknown with different AUVs.

• Neural Network Weight Convergence: The simulation results demonstrated that the neural network weights converged effectively, ensuring accurate function approximation and reliable performance in controlling AUVs under uncertainties.

• Adaptability and Stability: The framework ensured stable tracking performance across various environmental conditions by relying on the RBFNN’s learning capabilities, allowing the AUVs to use prelearned information and maintain formation control without needing to relearn dynamics whenever system restarts.

• Reduction in Computational Load: The use of pre-trained neural network weights significantly reduced the computational burden during real-time operation, particularly when large neural networks were employed.

7 Conclusion

In conclusion, this paper has introduced a novel two-layer control framework designed for Autonomous Underwater Vehicles (AUVs), aimed at universal applicability across various AUV configurations and environmental conditions. This framework assumes all system dynamics to be unknown, thereby enabling the controller to operate independently of specific dynamic parameters and effectively handle any environmental challenges, including hydrodynamic forces and torques. The framework consists of a first-layer distributed observer estimator that captures the leader's dynamics using information from adjacent agents, and a second-layer decentralized deterministic learning controller. Each AUV utilizes the estimated signals from the first layer to determine the desired trajectory, simultaneously learning its own dynamics using Radial Basis Function Neural Networks (RBFNN). This approach not only sustains stability and performance in dynamic and unpredictable environments but also allows AUVs to efficiently reuse previously learned dynamics after system restarts, facilitating rapid resumption of optimal operations. The robustness and versatility of this framework have been rigorously confirmed through comprehensive simulations, demonstrating its potential to significantly enhance the adaptability and resilience of AUV systems. By embracing total uncertainty in system dynamics, this framework establishes a new benchmark in autonomous underwater vehicle control and lays a solid groundwork for future developments aimed at minimizing energy use and maximizing system flexibility. We plan to expand this framework by accommodating more general leader dynamics and conducting experimental applications to validate its performance in real-world settings. Moreover, more accurate modeling of certain sources of uncertainty could further improve performance, which we will address in future research.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

EJ: Conceptualization, Writing–original draft, Writing–review and editing. MZ: Conceptualization, Funding acquisition, Supervision, Writing–review and editing. PS: Conceptualization, Supervision, Writing–review and editing. CY: Conceptualization, Formal Analysis, Funding acquisition, Supervision, Writing–review and editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported in part by the National Science Foundation under Grants CMMI-1952862 and CMMI-2154901.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

1A recurrent trajectory represents a large set of periodic and periodic-like trajectories generated from linear/nonlinear dynamical systems. A detailed characterization of recurrent trajectories can be found in Wang and Hill (2018).

References

Balch, T., and Arkin, R. (1998). Behavior-based formation control for multirobot teams. IEEE Trans. Robotics Automation 14, 926–939. doi:10.1109/70.736776

Cai, H., Lewis, F. L., Hu, G., and Huang, J. (2015). “Cooperative output regulation of linear multi-agent systems by the adaptive distributed observer,” in 2015 54th IEEE Conference on Decision and Control (CDC) (IEEE), 5432–5437.

Cao, X., Ren, L., and Sun, C. (2022). Dynamic target tracking control of autonomous underwater vehicle based on trajectory prediction. IEEE Trans. Cybern. 53, 1968–1981. doi:10.1109/tcyb.2022.3189688

Christen, S., Jendele, L., Aksan, E., and Hilliges, O. (2021). Learning functionally decomposed hierarchies for continuous control tasks with path planning. IEEE Robotics Automation Lett. 6, 3623–3630. doi:10.1109/lra.2021.3060403

Cui, R., Ge, S. S., How, B. V. E., and Choo, Y. S. (2010). Leader–follower formation control of underactuated autonomous underwater vehicles. Ocean. Eng. 37, 1491–1502. doi:10.1016/j.oceaneng.2010.07.006

Dong, X., Yuan, C., Stegagno, P., Zeng, W., and Wang, C. (2019). Composite cooperative synchronization and decentralized learning of multi-robot manipulators with heterogeneous nonlinear uncertain dynamics. J. Frankl. Inst. 356, 5049–5072. doi:10.1016/j.jfranklin.2019.04.028

Fossen, T. I. (1999). Guidance and control of ocean vehicles. Doctoral thesis, University of Trondheim, Norway.

Ghafoori, S., Rabiee, A., Cetera, A., and Abiri, R. (2024). Bispectrum analysis of noninvasive eeg signals discriminates complex and natural grasp types. arXiv Prepr. arXiv:2402.01026, 1–5. doi:10.1109/embc53108.2024.10782163

Hadi, B., Khosravi, A., and Sarhadi, P. (2021). A review of the path planning and formation control for multiple autonomous underwater vehicles. J. Intelligent and Robotic Syst. 101, 67–26. doi:10.1007/s10846-021-01330-4

Hou, S. P., and Cheah, C. C. (2009). “Coordinated control of multiple autonomous underwater vehicles for pipeline inspection,” in Proceedings of the 48h IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference (IEEE), 3167–3172.

Ioannou, P. A., and Sun, J. (1996). Robust adaptive control, 1. Upper Saddle River, NJ: PTR Prentice-Hall.

Jandaghi, E., Chen, X., and Yuan, C. (2023). “Motion dynamics modeling and fault detection of a soft trunk robot,” in 2023 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM) (IEEE), 1324–1329.

Jandaghi, E., Stein, D. L., Hoburg, A., Stegagno, P., Zhou, M., and Yuan, C. (2024). “Composite distributed learning and synchronization of nonlinear multi-agent systems with complete uncertain dynamics,” in 2024 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), 1367–1372. doi:10.1109/aim55361.2024.10637197

Krstic, M., Kokotovic, P. V., and Kanellakopoulos, I. (1995). Nonlinear and adaptive control design. John Wiley and Sons, Inc.

Lawton, J. R. T. (2000). A behavior-based approach to multiple spacecraft formation flying. Ph.D. thesis, Brigham Young University, Provo, UT, United States.

Millán, P., Orihuela, L., Jurado, I., and Rubio, F. R. (2013). Formation control of autonomous underwater vehicles subject to communication delays. IEEE Trans. Control Syst. Technol. 22, 770–777. doi:10.1109/tcst.2013.2262768

Park, J., and Sandberg, I. W. (1991). Universal approximation using radial-basis-function networks. Neural Comput. 3, 246–257. doi:10.1162/neco.1991.3.2.246

Peng, Z., Wang, D., Shi, Y., Wang, H., and Wang, W. (2015). Containment control of networked autonomous underwater vehicles with model uncertainty and ocean disturbances guided by multiple leaders. Inf. Sci. 316, 163–179. doi:10.1016/j.ins.2015.04.025

Peng, Z., Wang, J., and Wang, D. (2017). Distributed maneuvering of autonomous surface vehicles based on neurodynamic optimization and fuzzy approximation. IEEE Trans. Control Syst. Technol. 26, 1083–1090. doi:10.1109/tcst.2017.2699167

Prestero, T. (2001). "Development of a six-degree of freedom simulation model for the REMUS autonomous underwater vehicle," in MTS/IEEE Oceans 2001: An Ocean Odyssey, Conference Proceedings (IEEE Cat. No.01CH37295), Honolulu, HI, USA, vol. 1, 450–455. doi:10.1109/OCEANS.2001.968766

Ren, W., and Beard, R. W. (2005). Consensus seeking in multiagent systems under dynamically changing interaction topologies. IEEE Trans. automatic control 50, 655–661. doi:10.1109/tac.2005.846556

Ren, W., and Beard, R. W. (2008). Distributed consensus in multi-vehicle cooperative control, 27. Springer.

Rout, R., and Subudhi, B. (2016). A backstepping approach for the formation control of multiple autonomous underwater vehicles using a leader–follower strategy. J. Mar. Eng. and Technol. 15, 38–46. doi:10.1080/20464177.2016.1173268

Skjetne, R., Fossen, T. I., and Kokotović, P. V. (2005). Adaptive maneuvering, with experiments, for a model ship in a marine control laboratory. Automatica 41, 289–298. doi:10.1016/j.automatica.2004.10.006

Su, Y., and Huang, J. (2011). Cooperative output regulation of linear multi-agent systems. IEEE Trans. Automatic Control 57, 1062–1066. doi:10.1109/TAC.2011.2169618

Tutsoy, O., Asadi, D., Ahmadi, K., Nabavi-Chashmi, S. Y., and Iqbal, J. (2024). Minimum distance and minimum time optimal path planning with bioinspired machine learning algorithms for faulty unmanned air vehicles. IEEE Trans. Intelligent Transp. Syst. 25, 9069–9077. doi:10.1109/tits.2024.3367769

Wang, C., and Hill, D. J. (2018). Deterministic learning theory for identification, recognition, and control. CRC Press. doi:10.1201/9781315221755

Yan, T., Xu, Z., Yang, S. X., and Gadsden, S. A. (2023). Formation control of multiple autonomous underwater vehicles: a review. Intell. and Robotics 3, 1–22. doi:10.20517/ir.2023.01

Yan, Z., Liu, X., Zhou, J., and Wu, D. (2018). Coordinated target tracking strategy for multiple unmanned underwater vehicles with time delays. IEEE Access 6, 10348–10357. doi:10.1109/access.2018.2793338

Yang, Y., Xiao, Y., and Li, T. (2021). A survey of autonomous underwater vehicle formation: performance, formation control, and communication capability. IEEE Commun. Surv. and Tutorials 23, 815–841. doi:10.1109/comst.2021.3059998

Yuan, C. (2017). Leader-following consensus of parameter-dependent networks via distributed gain-scheduling control. Int. J. Syst. Sci. 48, 2013–2022. doi:10.1080/00207721.2017.1309597

Yuan, C., Licht, S., and He, H. (2017). Formation learning control of multiple autonomous underwater vehicles with heterogeneous nonlinear uncertain dynamics. IEEE Trans. Cybern. 48, 2920–2934. doi:10.1109/tcyb.2017.2752458

Yuan, C., and Wang, C. (2011). Persistency of excitation and performance of deterministic learning. Syst. and control Lett. 60, 952–959. doi:10.1016/j.sysconle.2011.08.002

Yuan, C., and Wang, C. (2012). Performance of deterministic learning in noisy environments. Neurocomputing 78, 72–82. doi:10.1016/j.neucom.2011.05.037

Zhang, Y., Li, S., and Liu, X. (2018). Neural network-based model-free adaptive near-optimal tracking control for a class of nonlinear systems. IEEE Trans. neural Netw. Learn. Syst. 29, 6227–6241. doi:10.1109/tnnls.2018.2828114

Zhou, J., Si, Y., and Chen, Y. (2023). A review of subsea auv technology. J. Mar. Sci. Eng. 11, 1119. doi:10.3390/jmse11061119

Keywords: environment-independent controller, autonomous underwater vehicles (AUV), dynamic learning, formation learning control, multi-agent systems, neural network control, adaptive control, robotics

Citation: Jandaghi E, Zhou M, Stegagno P and Yuan C (2025) Adaptive formation learning control for cooperative AUVs under complete uncertainty. Front. Robot. AI 11:1491907. doi: 10.3389/frobt.2024.1491907

Received: 05 September 2024; Accepted: 12 December 2024;
Published: 14 February 2025.

Edited by:

Giovanni Iacca, University of Trento, Italy

Reviewed by:

Önder Tutsoy, Adana Science and Technology University, Türkiye
Di Wu, Harbin University of Science and Technology, China

Copyright © 2025 Jandaghi, Zhou, Stegagno and Yuan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Chengzhi Yuan, cyuan@uri.edu
