Toward a physics-guided machine learning approach for predicting chaotic systems dynamics

Feng, Liu; Liu, Yang; Shi, Benyun; Liu, Jiming

doi:10.3389/fdata.2024.1506443

ORIGINAL RESEARCH article

Front. Big Data, 17 January 2025

Sec. Big Data Networks

Volume 7 - 2024 | https://doi.org/10.3389/fdata.2024.1506443

This article is part of the Research Topic "Interdisciplinary Approaches to Complex Systems: Highlights from FRCCS 2023/24".


Liu Feng¹

Yang Liu¹

Benyun Shi²

Jiming Liu¹*

¹Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
²College of Computer and Information Engineering, Nanjing Tech University, Nanjing, China

Predicting the dynamics of chaotic systems is crucial across various practical domains, including the control of infectious diseases and responses to extreme weather events. Such predictions provide quantitative insights into the future behaviors of these complex systems, thereby guiding the decision-making and planning within the respective fields. Recently, data-driven approaches, renowned for their capacity to learn from empirical data, have been widely used to predict chaotic system dynamics. However, these methods rely solely on historical observations while ignoring the underlying mechanisms that govern the systems' behaviors. Consequently, they may perform well in short-term predictions by effectively fitting the data, but their ability to make accurate long-term predictions is limited. A critical challenge in modeling chaotic systems lies in their sensitivity to initial conditions; even a slight variation can lead to significant divergence in actual and predicted trajectories over a finite number of time steps. In this paper, we propose a novel Physics-Guided Learning (PGL) method, aiming at extending the scope of accurate forecasting as much as possible. The proposed method aims to synergize observational data with the governing physical laws of chaotic systems to predict the systems' future dynamics. Specifically, our method consists of three key elements: a data-driven component (DDC) that captures dynamic patterns and mapping functions from historical data; a physics-guided component (PGC) that leverages the governing principles of the system to inform and constrain the learning process; and a nonlinear learning component (NLC) that effectively synthesizes the outputs of both the data-driven and physics-guided components. Empirical validation on six dynamical systems, each exhibiting unique chaotic behaviors, demonstrates that PGL achieves lower prediction errors than existing benchmark predictive models. 
The results highlight the efficacy of our design of data-physics integration in improving the precision of chaotic system dynamics forecasts.

1 Introduction

Chaotic systems are ubiquitous, from academic research in physics (Pecora and Carroll, 1990; Grassberger and Procaccia, 1983) and chemistry (Hess, 1990; Field et al., 1993) to real-world domains such as epidemiology (Aguiar et al., 2008; Mishra et al., 2020) and climatology (Palmer, 1993; Olsen et al., 2019). By predicting the dynamics of these systems, we can gain valuable insights into their future behaviors, which can not only help us understand the underlying mechanisms of these systems but, more importantly, effectively inform and guide the decision-making process in real-world problems within the respective fields. For example, forecasting the dynamical behaviors in the spread of epidemics can help us uncover the disease transmission patterns and, accordingly, deploy effective intervention strategies to control infectious diseases (Mangiarotti et al., 2016). Predicting the dynamics of variables in the climate system, such as temperature and precipitation, can help us be well prepared for extreme weather events (Toreti et al., 2013).

In recent years, with the availability of large amounts of data and the advancement of computing power, many studies have utilized data-driven approaches to analyze and predict the dynamics of chaotic systems. These methods generally utilize the given data to learn the mapping function between historical observations and the future value of the target variable, and then use the learned mapping function to conduct the prediction. Typical data-driven methods that have been widely used in chaotic system dynamics prediction include long short-term memory networks (LSTM) (Hochreiter, 1997; Chattopadhyay et al., 2020) and reservoir computing (Jaeger, 2001; Pathak et al., 2018). These methods have proven effective for the short-term prediction of chaotic systems, demonstrating an ability to capture the instantaneous dynamics (Chantry et al., 2021). However, their ability to make long-term predictions is limited, especially for rapidly evolving chaotic dynamical systems, where even a slight initial variation can result in significant differences as the system evolves over time (Lorenz, 1963). The reason could be that such data-driven methods rely solely on historical observations during the learning process but ignore the underlying mechanisms of chaotic systems, which are, in fact, of great importance in characterizing the systems' dynamical behaviors.

To overcome the limitations of pure data-driven models in predicting chaotic system dynamics and to enhance prediction performance, several existing studies have combined data with physical mechanisms. For example, PIESN (Doan et al., 2020) and its variant (Na et al., 2023) encode the systems' governing equations into the models' loss functions, penalizing predictions that deviate from physical laws. Furthermore, other methods utilize physical knowledge to help reconstruct and predict the dynamics of chaotic systems with unmeasured variables (Racca and Magri, 2021; Özalp et al., 2023). These methods, however, typically require complete and precise knowledge of the governing differential equations of the systems, including the equation parameters, to effectively guide the predictive models, which limits their applicability. Meanwhile, the reconciliation between data-driven approaches and prior physical knowledge remains an open yet essential problem in the prediction of chaotic systems' dynamics.

To effectively extend the capability for chaotic dynamics prediction, in this paper, we introduce a novel method called Physics-Guided Learning (PGL). Inspired by the recently developed physics-informed neural network (PINN), which was originally designed for solving forward and inverse problems in nonlinear partial differential equations (Raissi et al., 2019), our PGL method seeks to synergize observational data with the governing physical laws of chaotic systems. In our study, we operate under the assumption that the knowledge of the dynamical system we aim to predict is partially available. Specifically, we assume familiarity with the structure of the ordinary differential equations, while the parameters of these equations remain unknown and will be inferred throughout the learning process. This modest assumption has been widely adopted in recent research in physics-informed machine learning and aligns with many real-world scenarios where precise governing equations are not accessible (Misyris et al., 2020; Nath et al., 2023; Ning et al., 2023). For example, in climate modeling, researchers often rely on the well-established Navier-Stokes equations, despite the challenges in determining their exact parameters and solutions (Yang et al., 2023; Gao et al., 2024). The architecture of PGL is composed of three integral components: a data-driven component that learns the dynamical patterns and mapping functions from historical observations, a physics-guided component that exploits and represents the systems' governing mechanisms, and a nonlinear learning component that integrates the outputs of the data-driven and physics-guided components. The objective functions of these three components are jointly optimized to achieve the desired goal of chaotic dynamics prediction.

Several related works have explored the use of neural networks to generate chaotic dynamics. Notably, Hopfield Neural Networks (Hopfield, 1984) with memristors (Chua, 1971) have attracted much attention due to their flexible network architecture and bio-inspired characteristics. These models have been employed to produce a variety of chaotic dynamics, including multi-scroll, coexisting, and hyperchaotic attractors (Li et al., 2022; Kong et al., 2024; Deng et al., 2024). In contrast to approaches that generate dynamics with chaotic characteristics for applications such as image encryption (Liu et al., 2019) and privacy protection (Hu et al., 2024), and that do not necessitate reference to a specific dynamical system, our study seeks to predict the dynamical behaviors of a particular chaotic system. We employ data-driven methods, specifically neural networks, leveraging historical observations and partial knowledge of the chaotic system being modeled. By integrating data with physical principles, we aim to extend the scope and accuracy of chaotic dynamics prediction.

The remainder of this paper is organized as follows. Section 2 outlines the proposed methodology, with a detailed explanation of its core principles, architecture design, and learning processes. In Section 3, we present the settings and results of our experiments on six typical chaotic systems, which are designed to validate the effectiveness of the proposed method in the task of chaotic dynamics prediction. Finally, we conclude our work in Section 4.

2 Methodology

In this section, we will outline the formalism and computational mechanism of the proposed PGL method. We begin by defining the learning problem and providing an overview of the method. Subsequently, we present the mathematical definition and formulation of the proposed method for chaotic system dynamics prediction, which integrates data and physical understanding. To enhance the clarity, we detail the method's structure, workflow, and objective function.

2.1 Problem statement

First, we state the definition of chaotic system dynamics prediction. For a chaotic system with N state variables, we represent the system's state observations at time t as $X_{t} = [x_{t}^{1}, x_{t}^{2}, \dots, x_{t}^{N}]$, and $X_{t-L+1:t} = [X_{t-L+1}, X_{t-L+2}, \dots, X_{t}]$ denotes the historical data containing L time steps. Meanwhile, the time point sequence $T_{t-L+1:t} = [t-L+1, t-L+2, \dots, t]$ corresponding to the system's state value sequence $X_{t-L+1:t}$ is also recorded. The target of chaotic system dynamics prediction is to learn the underlying state transition function and the potential dynamics of the system based on the historical data and governing physical laws, and then forecast the subsequent state of the chaotic system, denoted as $\tilde{X}_{t+1}$. To achieve this goal, we devise a PGL method that makes use of both the observational data and the underlying dynamical mechanism of the chaotic system. Specifically, the proposed method comprises three core components: a data-driven component (DDC), a physics-guided component (PGC), and a nonlinear learning component (NLC). In the subsequent section, we will furnish a more detailed exposition of our design.
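The windowing implied by this problem statement can be sketched as follows. This is a minimal illustration, not the authors' data pipeline: the function name `make_windows` and the toy series are assumptions introduced here, and integer indices stand in for the recorded time points.

```python
from typing import List, Tuple

State = List[float]  # one observation X_t with N state variables


def make_windows(series: List[State], L: int) -> List[Tuple[List[State], List[int], State]]:
    """Build (X_{t-L+1:t}, T_{t-L+1:t}, X_{t+1}) triples from a state series."""
    samples = []
    # t is the index of the last observed step in each window.
    for t in range(L - 1, len(series) - 1):
        history = series[t - L + 1 : t + 1]          # L consecutive states
        time_points = list(range(t - L + 1, t + 1))  # matching time indices
        target = series[t + 1]                       # label for the next step
        samples.append((history, time_points, target))
    return samples


# Hypothetical toy series with N = 2 state variables and 6 time steps.
toy = [[float(i), float(10 * i)] for i in range(6)]
windows = make_windows(toy, L=3)
```

Each triple pairs a length-L history and its time points with the next state, which serves as the supervised label during training only.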

2.2 Physics-guided learning

Figure 1 illustrates the architecture of the proposed PGL method, consisting of the DDC, PGC, and NLC. For the DDC, we use a three-layer LSTM with 20 hidden units each, followed by a dense layer. For the PGC, we refer to the PINN configuration (Raissi et al., 2019), using a 10-layer neural network with 32 neurons in each layer. The third component, the NLC, is intentionally designed to affirm the feasibility of the proposed idea of integrating data-driven and physics-guided components. Because real-world data often exhibit different complex nonlinear patterns, our model, which can be seen as a physics-guided learning framework, is designed with flexibility, allowing for the incorporation of different sophisticated neural network architectures to accommodate and adapt to these higher levels of complexity. In this paper, we utilize two typical architectures, the multi-layer perceptron (MLP)¹ and the attention mechanism, as examples to demonstrate our design of the NLC. Specifically, the MLP-based NLC has two layers: one input layer and one output layer. In the attention-based NLC, we utilize the cross-attention mechanism to capture the nonlinearity in the outputs of the DDC and PGC (Vaswani, 2017; Shi et al., 2024). Note that other deep learning modules or architectures can also be flexibly integrated into our framework as the NLC. Next, we elaborate on how these three components work together to predict the dynamical behaviors of chaotic systems.


Figure 1. Illustration of the architecture of the proposed method PGL, which is composed of three core components: a data-driven component (DDC), a physics-guided component (PGC), and a nonlinear learning component (NLC).

2.2.1 Data-driven component

First, we obtain the prediction of the data-driven branch for the next time step, denoted by $X_{t+1}^{\mathrm{data}} = \mathrm{DDC}(X_{t-L+1:t})$. We expect the long short-term memory (LSTM) structure in the DDC to capture both short-term and long-term temporal dependencies in the historical state sequence through its unique gating mechanism and make predictions for the next time step.

2.2.2 Physics-guided component

Afterward, we extend the time sequence $T_{t-L+1:t}$ to $T_{t-L+1:t+L}$, which is then fed into the PGC. The PGC generates system state predictions of the same length as the extended time sequence $T_{t-L+1:t+L}$. This process is shown in the following equation:

\begin{array}{ll} X_{t-L+1:t+L}^{\mathrm{phy}} = \mathrm{PGC}(t-L+1, t-L+2, \dots, t+L), & (1) \end{array}

where $X_{i}^{\mathrm{phy}} = [x_{i}^{\mathrm{phy}}, y_{i}^{\mathrm{phy}}, z_{i}^{\mathrm{phy}}]$. We expect that, with the guidance of physical knowledge, the PGC can learn the dynamics of the system and assist the entire model in making predictions. Note that the design of the PGC is general and can be used in various chaotic systems. Here, for a better explanation, we use the typical Lorenz system (Lorenz, 1963) as an example to show how the PGC works. The only information that we have is the form of the system's equations shown in Equation 2; we do not know the crucial initial values and system parameters.

\begin{array}{l} \frac{d x}{d t} = a (y - x), \\ \frac{d y}{d t} = c x - y - x z, \\ \frac{d z}{d t} = x y - b z . & (2) \end{array}

Following the work on physics-informed neural networks in Raissi et al. (2019), we utilize the automatic differentiation tools within the deep learning framework PyTorch (Paszke et al., 2017) to compute the derivative of the PGC's output $X_{t-L+1:t+L}^{\mathrm{phy}}$ with respect to its input $T_{t-L+1:t+L}$, yielding the following:

\begin{array}{ll} \frac{\partial X_{t-L+1:t+L}^{\mathrm{phy}}}{\partial t} = \left[ \frac{\partial X_{t-L+1}^{\mathrm{phy}}}{\partial t}, \frac{\partial X_{t-L+2}^{\mathrm{phy}}}{\partial t}, \dots, \frac{\partial X_{t+L}^{\mathrm{phy}}}{\partial t} \right], & (3) \end{array}

where $\frac{\partial X_{i}^{\mathrm{phy}}}{\partial t} = \left[ \frac{\partial x_{i}^{\mathrm{phy}}}{\partial t}, \frac{\partial y_{i}^{\mathrm{phy}}}{\partial t}, \frac{\partial z_{i}^{\mathrm{phy}}}{\partial t} \right]$. We expect the approximate derivatives to conform to the definition of the Lorenz system, and we therefore calculate the residuals with respect to the physics-guided component, as shown below.

\begin{array}{ll} \mathrm{loss}_{\mathrm{phy}} = \lambda_{1} \mathrm{loss}_{x} + \lambda_{2} \mathrm{loss}_{y} + \lambda_{3} \mathrm{loss}_{z}, \\ \mathrm{loss}_{x} = \sum_{i=t-L+1}^{t+L} \left| \frac{\partial x_{i}^{\mathrm{phy}}}{\partial t} - \tilde{a} (y_{i}^{\mathrm{phy}} - x_{i}^{\mathrm{phy}}) \right|^{2}, \\ \mathrm{loss}_{y} = \sum_{i=t-L+1}^{t+L} \left| \frac{\partial y_{i}^{\mathrm{phy}}}{\partial t} - (\tilde{c} x_{i}^{\mathrm{phy}} - y_{i}^{\mathrm{phy}} - x_{i}^{\mathrm{phy}} z_{i}^{\mathrm{phy}}) \right|^{2}, \\ \mathrm{loss}_{z} = \sum_{i=t-L+1}^{t+L} \left| \frac{\partial z_{i}^{\mathrm{phy}}}{\partial t} - (x_{i}^{\mathrm{phy}} y_{i}^{\mathrm{phy}} - \tilde{b} z_{i}^{\mathrm{phy}}) \right|^{2}, & (4) \end{array}

where $\lambda_{1}$, $\lambda_{2}$, and $\lambda_{3}$ are hyperparameters, which in practice can be selected by a grid search over a predefined coarse range, and $\tilde{a}$, $\tilde{b}$, and $\tilde{c}$ are trainable parameters of the model. Note that the true parameters of the chaotic systems remain unidentified for the PGC and for the proposed PGL model, a scenario that is typical in real-world applications. It is our expectation that the proposed model is capable of learning and characterizing the systems' dynamics even in the presence of such uncertainties. Additionally, since we have the ground truth $X_{t-L+1:t}$, we conduct supervised learning by minimizing the following $\mathrm{loss}_{\mathrm{data}}$:

\begin{array}{ll} \mathrm{loss}_{\mathrm{data}} = \frac{1}{L} \sum_{i=t-L+1}^{t} \left| X_{i}^{\mathrm{phy}} - X_{i} \right|^{2} . & (5) \end{array}

By incorporating penalty terms based on physics and data, we hope that the PGC can rely on known physical knowledge and work in collaboration with the DDC to predict chaotic systems.
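As a rough illustration of Equations 4 and 5, the two penalty terms can be written out for the Lorenz case as below. This is only a sketch under stated assumptions: in the paper the time derivatives come from automatic differentiation of the PGC network, whereas here they are passed in as plain numbers. The function names, the default weights, and the reading of $|\cdot|^{2}$ as a squared Euclidean norm are illustrative choices, not taken from the source.

```python
from typing import List, Sequence


def lorenz_rhs(state: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Right-hand side of the Lorenz equations (Equation 2)."""
    x, y, z = state
    return [a * (y - x), c * x - y - x * z, x * y - b * z]


def loss_phy(states, derivs, a_hat, b_hat, c_hat, lambdas=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of squared equation residuals, following Equation 4.

    `derivs` holds d/dt of each predicted state; here they are supplied by
    the caller (an assumption of this sketch, replacing autodiff).
    """
    totals = [0.0, 0.0, 0.0]  # accumulators for loss_x, loss_y, loss_z
    for state, deriv in zip(states, derivs):
        rhs = lorenz_rhs(state, a_hat, b_hat, c_hat)
        for k in range(3):
            totals[k] += (deriv[k] - rhs[k]) ** 2
    return sum(lam * tot for lam, tot in zip(lambdas, totals))


def loss_data(predicted, observed) -> float:
    """Mean squared distance between PGC outputs and observations (Equation 5)."""
    L = len(observed)
    return sum(
        sum((p - o) ** 2 for p, o in zip(pred, obs))
        for pred, obs in zip(predicted, observed)
    ) / L


# Hypothetical check: derivatives generated with a = 10, b = 8/3, c = 28.
states = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0]]
derivs = [lorenz_rhs(s, 10.0, 8.0 / 3.0, 28.0) for s in states]
residual_true = loss_phy(states, derivs, 10.0, 8.0 / 3.0, 28.0)
residual_wrong = loss_phy(states, derivs, 9.0, 8.0 / 3.0, 28.0)
```

The residual vanishes when the trainable parameters match those that generated the derivatives and grows when they do not, which is the signal that lets the parameters be learned alongside the network weights.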

2.2.3 Nonlinear learning component

Next, a nonlinear learning component balances the predictions $X_{t+1}^{\mathrm{data}}$ and $X_{t+1}^{\mathrm{phy}}$ from the DDC and PGC to provide the final prediction $\tilde{X}_{t+1}$ for the system at time step t+1. In the following, we introduce the MLP-based NLC and the attention-based NLC separately.

2.2.3.1 MLP-based NLC

In the MLP-based NLC, we utilize a classical structure of MLP to conduct the nonlinear learning task, which can be described as the following equation:

\begin{array}{ll} \tilde{X}_{t+1} = \mathrm{NLC}(\mathrm{concatenate}(X_{t+1}^{\mathrm{data}}, X_{t+1}^{\mathrm{phy}})), & (6) \end{array}

where $\tilde{X}_{t+1}$ represents the predicted value for the next time step. To constrain the learning process, we also calculate the loss, which is formulated as follows:

\begin{array}{ll} \mathrm{loss}_{\mathrm{NLC}} = \left| \tilde{X}_{t+1} - X_{t+1} \right|^{2}, & (7) \end{array}

where $X_{t+1}$ denotes the ground truth value of the system's state variable at time step t+1, which serves as the label in our supervised learning. It is important to note that $X_{t+1}$ in Equation 7 is accessible only during the training phase. This information is not available during the testing phase, where the model must predict $X_{t+1}$ without the aid of ground truth values.
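A minimal forward pass for the MLP-based NLC might look like the sketch below. The source states only that the MLP has an input layer and an output layer. The hidden width of 8, the tanh activation, the random initialization, and the helper names are all assumptions made here for illustration.

```python
import math
import random
from typing import List


def linear(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Affine map; each row of W produces one output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def mlp_nlc(x_data, x_phy, W1, b1, W2, b2) -> List[float]:
    """Equation 6: concatenate both branch outputs, then apply two layers."""
    features = list(x_data) + list(x_phy)                       # concatenation
    hidden = [math.tanh(h) for h in linear(features, W1, b1)]   # assumed tanh
    return linear(hidden, W2, b2)


def loss_nlc(pred: List[float], target: List[float]) -> float:
    """Equation 7: squared error against the ground truth X_{t+1}."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


# Hypothetical weights for a 3-variable system (6 inputs, 8 hidden, 3 outputs).
random.seed(0)
W1 = [[random.uniform(-0.5, 0.5) for _ in range(6)] for _ in range(8)]
b1 = [0.0] * 8
W2 = [[random.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(3)]
b2 = [0.0] * 3
x_tilde = mlp_nlc([1.0, 2.0, 3.0], [1.1, 1.9, 3.2], W1, b1, W2, b2)
```

In training, `loss_nlc(x_tilde, x_true)` would be evaluated against the held ground truth; at test time only `x_tilde` is produced.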

2.2.3.2 Attention-based NLC

In the Attention-based NLC, we use a specifically designed attention mechanism, i.e., cross-attention, to learn the nonlinearity and make the final predictions. First, the attention mechanism generates the query $Q_{\mathrm{data}}$, the key $K_{\mathrm{data}}$, and the value $V_{\mathrm{data}}$ by applying linear transformations to $X_{t+1}^{\mathrm{data}}$, i.e., $Q_{\mathrm{data}} = X_{t+1}^{\mathrm{data}} \cdot W_{q}^{\mathrm{data}}$, $K_{\mathrm{data}} = X_{t+1}^{\mathrm{data}} \cdot W_{k}^{\mathrm{data}}$, and $V_{\mathrm{data}} = X_{t+1}^{\mathrm{data}} \cdot W_{v}^{\mathrm{data}}$, where $W_{q}^{\mathrm{data}}$, $W_{k}^{\mathrm{data}}$, and $W_{v}^{\mathrm{data}}$ are trainable matrices. Similarly, we obtain the query $Q_{\mathrm{phy}}$, the key $K_{\mathrm{phy}}$, and the value $V_{\mathrm{phy}}$ by performing the same calculation for the output of the PGC, $X_{t+1}^{\mathrm{phy}}$. Then, we calculate the attention feature maps $A^{\mathrm{data}}$ and $A^{\mathrm{phy}}$ based on these units:

\begin{array}{ll} \begin{matrix} A^{\mathrm{data}} = \mathrm{softmax} \left( \frac{Q_{\mathrm{data}} K_{\mathrm{data}}^{T}}{\sqrt{d_{K_{\mathrm{data}}}}} \right) \cdot V_{\mathrm{data}} + X_{t+1}^{\mathrm{data}}, \\ A^{\mathrm{phy}} = \mathrm{softmax} \left( \frac{Q_{\mathrm{phy}} K_{\mathrm{phy}}^{T}}{\sqrt{d_{K_{\mathrm{phy}}}}} \right) \cdot V_{\mathrm{phy}} + X_{t+1}^{\mathrm{phy}}, \end{matrix} & (8) \end{array}

where $d_{K_{\mathrm{data}}}$ and $d_{K_{\mathrm{phy}}}$ denote the dimensions of $K_{\mathrm{data}}$ and $K_{\mathrm{phy}}$, respectively, and softmax is an activation function. By doing so, we intend to learn the important information in the outputs from the DDC and PGC separately, so as to support the prediction performance.

Next, we attempt to capture the nonlinear relationships between $X_{t+1}^{\mathrm{data}}$ and $X_{t+1}^{\mathrm{phy}}$ by applying the cross-attention mechanism. The cross-attention feature map $CA^{\mathrm{data}}$ can be obtained by using $Q_{\mathrm{phy}}$ to query the key-value pair $(K_{\mathrm{data}}, V_{\mathrm{data}})$:

\begin{array}{ll} CA^{\mathrm{data}} = \mathrm{softmax} \left( \frac{Q_{\mathrm{phy}} K_{\mathrm{data}}^{T}}{\sqrt{d_{K_{\mathrm{data}}}}} \right) \cdot V_{\mathrm{data}} + X_{t+1}^{\mathrm{data}} . & (9) \end{array}

Similarly, the cross-attention feature map $CA^{\mathrm{phy}}$ can be calculated as follows:

\begin{array}{ll} CA^{\mathrm{phy}} = \mathrm{softmax} \left( \frac{Q_{\mathrm{data}} K_{\mathrm{phy}}^{T}}{\sqrt{d_{K_{\mathrm{phy}}}}} \right) \cdot V_{\mathrm{phy}} + X_{t+1}^{\mathrm{phy}} . & (10) \end{array}

Finally, all the feature maps obtained above are concatenated and fed into the output layer to make the final prediction $\tilde{X}_{t+1}$:

\begin{array}{ll} \tilde{X}_{t+1} = F(\mathrm{concatenate}(A^{\mathrm{data}}, A^{\mathrm{phy}}, CA^{\mathrm{data}}, CA^{\mathrm{phy}})), & (11) \end{array}

where F denotes the output layer in the Attention-based NLC. As in the MLP-based NLC, we also calculate the loss $\mathrm{loss}_{\mathrm{NLC}} = \left| \tilde{X}_{t+1} - X_{t+1} \right|^{2}$ to constrain the learning process.
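The feature maps in Equations 8 to 11 can be sketched with plain nested lists as below. The source does not specify tensor shapes, so this sketch assumes each of the N state variables is treated as one token with a single feature (X is N×1, the query and key weights are 1×d_K, and the value weight is 1×1 so the residual addition is well defined). The helper names and weight values are hypothetical, and the output layer F is left out.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def softmax_rows(A: Matrix) -> Matrix:
    out = []
    for row in A:
        m = max(row)                           # subtract max for stability
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def attend(X_query: Matrix, X_source: Matrix, Wq: Matrix, Wk: Matrix, Wv: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_K)) V + X_source, with Q from X_query and K, V from X_source."""
    Q, K, V = matmul(X_query, Wq), matmul(X_source, Wk), matmul(X_source, Wv)
    d_k = len(K[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, transpose(K))]
    weighted = matmul(softmax_rows(scores), V)
    return [[w + x for w, x in zip(wr, xr)] for wr, xr in zip(weighted, X_source)]


# Hypothetical branch outputs and weights for a 3-variable system.
x_data = [[1.0], [2.0], [3.0]]
x_phy = [[1.1], [1.9], [3.2]]
Wq_d, Wk_d, Wv_d = [[0.5, -0.2]], [[0.3, 0.1]], [[0.4]]
Wq_p, Wk_p, Wv_p = [[-0.1, 0.6]], [[0.2, 0.2]], [[0.7]]

A_data = attend(x_data, x_data, Wq_d, Wk_d, Wv_d)    # Equation 8, data branch
A_phy = attend(x_phy, x_phy, Wq_p, Wk_p, Wv_p)       # Equation 8, physics branch
CA_data = attend(x_phy, x_data, Wq_p, Wk_d, Wv_d)    # Equation 9
CA_phy = attend(x_data, x_phy, Wq_d, Wk_p, Wv_p)     # Equation 10
# Concatenated input to the output layer F of Equation 11 (F itself omitted).
features = [row[0] for m in (A_data, A_phy, CA_data, CA_phy) for row in m]
```

The same `attend` helper covers both self-attention and cross-attention; only the source of the query changes.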

2.2.4 Objective function

The final optimization objective function, which takes account of both data and physics, is given as follows:

\begin{array}{ll} \min \left( w_{1} \mathrm{loss}_{\mathrm{NLC}} + w_{2} \mathrm{loss}_{\mathrm{data}} + w_{3} \mathrm{loss}_{\mathrm{phy}} \right), & (12) \end{array}

where $w_{1}$, $w_{2}$, and $w_{3}$ are hyperparameters that weight the three loss terms.

3 Experimental results

In this section, we use six dynamical systems with different chaotic behaviors, i.e., the Rossler, Aizawa, Lorenz, Chua, Chen, and Halvorsen systems, which are widely used in chaotic systems dynamics prediction (Nasiri and Ebadzadeh, 2022; Cheng et al., 2021; Na et al., 2021; Wu et al., 2024; Kennedy et al., 2024; Gilpin, 2021), to validate the performance of the proposed PGL method in long-term forecasting of chaotic dynamics. We also perform an ablation study to analyze the contributions of different components of the proposed method to the chaotic dynamics prediction.

3.1 Descriptions of chaotic systems

3.1.1 Rossler system

In 1976, Rössler (1976) proposed the well-known Rossler system, which exhibits chaotic phenomena and nonlinear dynamical behavior. The system is defined by the following differential equations:

\begin{array}{l} \frac{d x}{d t} = - y - z, \\ \frac{d y}{d t} = x - a y, \\ \frac{d z}{d t} = b + x z - c z . & (13) \end{array}

3.1.2 Aizawa system

In 1982, Aizawa and Uezu (1982) introduced a new chaotic system, which has multiple third-order nonlinear terms. The Aizawa system can be described by the following equations:

\begin{array}{l} \frac{d x}{d t} = (z - b) x - d y, \\ \frac{d y}{d t} = d x + (z - b) y, \\ \frac{d z}{d t} = c + a z - \frac{z^{3}}{3} - (x^{2} + y^{2}) (1 + e z) + f z x^{3} . & (14) \end{array}

3.1.3 Lorenz system

In 1963, Lorenz (1963) discovered the existence of a peculiar “butterfly effect” in meteorological systems when studying convective instability. The Lorenz system can be described by the following equations:

\begin{array}{l} \frac{d x}{d t} = a (y - x), \\ \frac{d y}{d t} = c x - y - x z, \\ \frac{d z}{d t} = x y - b z . & (15) \end{array}

3.1.4 Chua system

In 1986, Chua et al. (1986) introduced the Chua system, marking an advancement in the study of chaotic systems by linking chaos and nonlinear circuits. The equations of the Chua system are given as follows:

\begin{array}{l} \frac{d x}{d t} = a (y - x - G (x)), \\ \frac{d y}{d t} = x - y + z, \\ \frac{d z}{d t} = - b y, \\ G (x) = c x + (d + c) (| x + 1 | - | x - 1 |) . & (16) \end{array}

3.1.5 Chen system

In 1999, Chen and Ueta (1999) identified a chaotic attractor that bears similarities to the Lorenz system but is topologically distinct in their research on chaotic control. The Chen system can be described by the following equations:

\begin{array}{l} \frac{d x}{d t} = a (y - x), \\ \frac{d y}{d t} = (c - a) x - x z + c y, \\ \frac{d z}{d t} = x y - b z . & (17) \end{array}

3.1.6 Halvorsen system

The Halvorsen system (Sprott, 2010), proposed by Arne Dehli Halvorsen, is a 3-D chaotic flow whose governing equations are cyclically symmetric and can be described as follows:

\begin{array}{l} \frac{d x}{d t} = - a x - 4 y - 4 z - y^{2}, \\ \frac{d y}{d t} = - a y - 4 z - 4 x - z^{2}, \\ \frac{d z}{d t} = - a z - 4 x - 4 y - x^{2} . & (18) \end{array}

All six dynamical systems above exhibit nonlinear and chaotic behaviors, posing great challenges for long-term prediction. We use the fourth-order Runge-Kutta method with a step size of 0.01 to obtain chaotic time series containing 10,000 steps, which are divided into training, validation, and testing datasets in a ratio of 6:2:2. Specifically, we use the first 6,000 time steps for training, the subsequent 2,000 time steps for validation, and the final 2,000 time steps to test the performance of our model. Table 1 provides the details of system parameters and initial values. For parameters λ₁, λ₂, and λ₃ in Equation 4 of the proposed method, we determine their values through a grid search strategy. Specifically, the parameter values are empirically constrained within the range of [0.05, 0.35], with a search step size of 0.05.
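The data generation described above can be sketched as follows for the Lorenz system. The integrator is the classical fourth-order Runge-Kutta scheme with the stated step size and split. However, the parameter values (a = 10, b = 8/3, c = 28) and the initial state are the commonly used textbook choices and are assumptions here, since the values in Table 1 are not reproduced in this text.

```python
from typing import Callable, List, Sequence

State = List[float]


def lorenz(state: Sequence[float], a: float = 10.0, b: float = 8.0 / 3.0, c: float = 28.0) -> State:
    """Lorenz right-hand side; default parameters are assumed, not from Table 1."""
    x, y, z = state
    return [a * (y - x), c * x - y - x * z, x * y - b * z]


def rk4_step(f: Callable[[Sequence[float]], State], state: Sequence[float], dt: float) -> State:
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [
        s + dt / 6.0 * (p + 2.0 * q + 2.0 * r + w)
        for s, p, q, r, w in zip(state, k1, k2, k3, k4)
    ]


def simulate(f, x0: Sequence[float], dt: float, n_steps: int) -> List[State]:
    """Return a trajectory of n_steps states, starting from x0."""
    traj = [list(x0)]
    for _ in range(n_steps - 1):
        traj.append(rk4_step(f, traj[-1], dt))
    return traj


# 10,000 steps with step size 0.01; the initial state is a hypothetical choice.
series = simulate(lorenz, [1.0, 1.0, 1.0], dt=0.01, n_steps=10_000)

# Chronological 6:2:2 split into training, validation, and testing sets.
train_set, val_set, test_set = series[:6000], series[6000:8000], series[8000:]
```

The split is chronological rather than shuffled, so the test segment lies strictly after the data seen in training.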


Table 1. The system parameters and initial values of six chaotic systems used in our study.

3.2 Comparison models and evaluation metrics

We select five representative methods as the baselines for performance comparison in our experiments. They are the long short-term memory (LSTM) (Hochreiter, 1997), the echo state network (ESN) (Pathak et al., 2017), the next generation reservoir computing method (NG-RC) (Gauthier et al., 2021), the knowledge-based neural ordinary differential equations method (K-NODE) (Jiahao et al., 2021) and DLinear (Zeng et al., 2023). Here, LSTM is a classic recurrent neural network model for time series prediction; ESN and NG-RC are representative methods specifically designed and widely used for chaotic system dynamics prediction; DLinear is a state-of-the-art deep learning method developed for complex time series forecasting; and K-NODE is a hybrid-learning approach which integrates the first principles knowledge, specifically the ordinary differential equations, with data-driven technologies, to predict chaotic systems dynamics. For LSTM, we use a three-layer architecture with a uniform hidden state size. To achieve its optimal performance, we experiment with a variety of hidden state sizes, specifically 8, 16, and 32, and report the best result. For ESN, we implement it with a spectral radius of 1.4 and a reservoir size of 300. For NG-RC and DLinear, we follow the default settings reported in their original papers. For K-NODE, we set the prior knowledge as the form of the governing equations with the approximated parameters learned by classic symbolic regression.

When assessing the effectiveness of the methods in capturing and forecasting the dynamical behavior of chaotic systems over the long term, it is common practice to use the model's own prediction as the input for forecasting subsequent time steps during the test phase. This iterative process can result in an increase in errors as the forecast horizon extends, especially in chaotic systems, where small deviations at the beginning can lead to significant differences in later outcomes. The mean absolute error (MAE), the root mean square error (RMSE), and the coefficient of determination R² (Amaranto and Mazzoleni, 2023) are used as evaluation metrics to measure the prediction performance. They are defined as follows:

\begin{array}{ll} \mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} | \hat{y}_{t} - y_{t} |, & (19) \end{array}

\begin{array}{ll} \mathrm{RMSE} = \sqrt{\frac{1}{T} \sum_{t=1}^{T} (\hat{y}_{t} - y_{t})^{2}}, & (20) \end{array}

\begin{array}{ll} R^{2} = 1 - \frac{\sum_{t=1}^{T} (\hat{y}_{t} - y_{t})^{2}}{\sum_{t=1}^{T} (\bar{y} - y_{t})^{2}}, & (21) \end{array}

where ŷ_t denotes the predicted value of the model, y_t denotes the ground truth, ȳ represents the average value of the ground truth, and T is the corresponding forecast horizon.
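For concreteness, the three metrics can be computed for a scalar series as in the sketch below. The function names and the toy numbers are illustrative and not taken from the experiments.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error (Equation 19)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Root mean square error (Equation 20)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def r2(pred: Sequence[float], true: Sequence[float]) -> float:
    """Coefficient of determination (Equation 21)."""
    mean_true = sum(true) / len(true)
    ss_res = sum((p - t) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((mean_true - t) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values over a horizon of T = 4 steps.
y_true = [1.0, 2.0, 3.0, 4.0]
y_good = [1.5, 2.0, 2.5, 4.0]   # close to the ground truth
y_bad = [4.0, 3.0, 2.0, 1.0]    # reversed, worse than predicting the mean

scores_good = (mae(y_good, y_true), rmse(y_good, y_true), r2(y_good, y_true))
scores_bad = (mae(y_bad, y_true), rmse(y_bad, y_true), r2(y_bad, y_true))
```

For the first prediction the MAE is 0.25 and R² is 0.9. For the reversed prediction R² is -3, which shows that under Equation 21 the score is bounded above by 1 but not below by 0.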

3.3 Analysis of results

Figures 2 and 3 compare the ground-truth dynamics of the Rossler, Aizawa, Lorenz, Chua, Chen, and Halvorsen systems over 2,000 time steps, shown in blue in each sub-figure, with the predictions generated by the proposed PGL-MLP (Figure 2) and PGL-ATT (Figure 3) methods, shown in red. From these two figures, we can observe that both PGL-MLP and PGL-ATT capture the dynamical patterns of these six chaotic systems. Although the iterative prediction process used in the prediction phase makes long-term forecasting very challenging, the integration of data and physics enables our method to produce predictions that are consistent with the actual dynamics.
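The iterative prediction process used at test time can be sketched as a simple rollout loop in which each prediction is appended to the input window. The `rollout` helper and the toy linear-extrapolation predictor below are assumptions for illustration and stand in for a trained PGL model.

```python
from typing import Callable, List

State = List[float]


def rollout(predict_next: Callable[[List[State]], State],
            history: List[State], horizon: int) -> List[State]:
    """Autoregressive forecast: feed each prediction back in as input."""
    window = [list(s) for s in history]
    preds: List[State] = []
    for _ in range(horizon):
        nxt = predict_next(window)
        preds.append(nxt)
        window = window[1:] + [nxt]   # drop the oldest state, append the prediction
    return preds


def extrapolate(window: List[State]) -> State:
    """Toy stand-in for a trained model: linear extrapolation of the last two states."""
    last, prev = window[-1], window[-2]
    return [2.0 * a - b for a, b in zip(last, prev)]


forecast = rollout(extrapolate, [[0.0], [1.0], [2.0]], horizon=4)
```

After L rollout steps the window contains only predicted values, which is why errors can compound as the horizon grows.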


Figure 2. Comparison between the ground truth of dynamics of (A) Rossler, (B) Aizawa, (C) Lorenz, (D) Chua, (E) Chen, and (F) Halvorsen systems (blue) and the predictions generated by the proposed PGL-MLP method (red).


Figure 3. Comparison between the ground truth of dynamics of the (A) Rossler, (B) Aizawa, (C) Lorenz, (D) Chua, (E) Chen, and (F) Halvorsen systems (blue) and the predictions generated by the proposed PGL-ATT method (red).

To further evaluate the performance of our predictions, we also visualize the temporal evolution of the ground truth and predictions of the state variables in these chaotic systems in Figures 4 and 5. Generally, both PGL-MLP and PGL-ATT make satisfactory predictions of the state variables X(t), Y(t), and Z(t) for these chaotic systems. However, the performance of each method varies slightly across systems. For the Rossler system, the predicted curves of both PGL-MLP and PGL-ATT closely match the ground truth, accurately characterizing even the irregular patterns in the Z(t) component; only one peak is missed by PGL-ATT. This indicates that the proposed method successfully captures the dynamics of this chaotic system and is thus able to make accurate predictions over such a long horizon. For the Aizawa system, PGL-MLP shows very good performance; its prediction is consistent with the ground truth across all 2,000 steps. The performance of PGL-ATT is also acceptable; the predicted dynamics match the actual curve well in the first 1,000 steps. For the Lorenz system, both PGL-MLP and PGL-ATT achieve high accuracy up to around 1,100 time steps on the component Z(t) and around 600 time steps on the components X(t) and Y(t). For the Chua system, PGL-MLP and PGL-ATT perform similarly, making accurate predictions up to about 1,250 time steps and then exhibiting notable discrepancies in the components X(t), Y(t), and Z(t). Such discrepancies appear earlier in the Chen and Halvorsen systems than in the Chua system. Interestingly, PGL-MLP's predictions for both the Chen and Halvorsen systems initially achieve high accuracy but subsequently exhibit noticeable disturbances. After these disturbances, the model regains accuracy in its predictions, which we attribute to its ability to balance data and physical knowledge.


Figure 4. Comparison between the ground truth of the state variables of the (A) Rossler, (B) Aizawa, (C) Lorenz, (D) Chua, (E) Chen, and (F) Halvorsen systems (blue) and the predictions generated by the proposed PGL-MLP method (red) over time.


Figure 5. Comparison between the ground truth of the state variables of the (A) Rossler, (B) Aizawa, (C) Lorenz, (D) Chua, (E) Chen, and (F) Halvorsen systems (blue) and the predictions generated by the proposed PGL-ATT method (red) over time.

To quantitatively compare the performance of our methods (i.e., PGL-MLP and PGL-ATT) with that of existing methods, we report the MAE and RMSE of all methods for different prediction horizons in Tables 2 and 3, respectively. The proposed methods achieve the lowest prediction errors in most of the settings, demonstrating their effectiveness in making long-term predictions of chaotic system dynamics. An interesting observation is that the performance of PGL-MLP is generally better than that of PGL-ATT, despite the latter employing a more sophisticated attention mechanism. One potential explanation is that the complexity of the attention mechanism may lead to overfitting in the predictive model when compared to PGL-MLP. It is important to note that the task of predicting chaotic system dynamics differs from natural language processing, where attention mechanisms have demonstrated notable effectiveness. The former focuses on capturing the intrinsic, evolving patterns of dynamical systems, which may change over time, whereas the latter is primarily concerned with understanding consistent contextual relationships in input data. Consequently, a model that is overly complex or overfitted to historical data may not yield the expected performance in predicting chaotic systems dynamics.


Table 2. MAE of LSTM, ESN, NG-RC, DLinear, K-NODE, and the proposed PGL in different prediction horizons on six chaotic systems.


Table 3. RMSE of LSTM, ESN, NG-RC, DLinear, K-NODE, and the proposed PGL-ATT and PGL-MLP in different prediction horizons on six chaotic systems.
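For readers who wish to reproduce this kind of comparison, the sketch below computes MAE and RMSE over several prediction horizons. It assumes that each horizon h is evaluated on the first h predicted steps and that errors are pooled over all state variables; the paper's exact aggregation is not stated in this section, and the function name `mae_rmse_by_horizon` is hypothetical.

```python
import numpy as np

def mae_rmse_by_horizon(truth, pred, horizons):
    """Return {h: (MAE, RMSE)} computed over the first h predicted steps.

    truth, pred: arrays of shape (T, D); horizons: iterable of ints with h <= T.
    Errors are pooled over time steps and state variables (an assumption).
    """
    truth = np.asarray(truth, dtype=float)
    pred = np.asarray(pred, dtype=float)
    results = {}
    for h in horizons:
        diff = pred[:h] - truth[:h]
        mae = float(np.mean(np.abs(diff)))
        rmse = float(np.sqrt(np.mean(diff ** 2)))
        results[h] = (mae, rmse)
    return results

# Toy usage: errors of 1, 1, 3, 3 on a single state variable.
truth = np.zeros((4, 1))
pred = np.array([[1.0], [1.0], [3.0], [3.0]])
scores = mae_rmse_by_horizon(truth, pred, horizons=[2, 4])
```

Because RMSE squares the errors before averaging, it penalizes the occasional large deviation more heavily than MAE, which is why the two tables can rank methods differently at long horizons.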

In addition to the MAE and RMSE, we further analyze the performance of the comparison baselines and our proposed methods using the R² metric, which attains its maximum of 1 for a perfect prediction and can fall below 0 when a prediction is worse than simply predicting the mean of the ground truth. As illustrated in Figure 6, we plot the trend of the R² score as the number of predicted time steps increases and report the corresponding Lyapunov time for the different forecasting horizons. The Lyapunov time, the inverse of the largest Lyapunov exponent, is a critical indicator of a system's chaotic behavior: it characterizes the time scale over which two initially close trajectories diverge significantly (Sangiorgio and Dercole, 2020; Sangiorgio et al., 2021, 2022; Pathak et al., 2018; Patel et al., 2021; Vlachas et al., 2020). Our results show that the proposed methods achieve improved performance across the six chaotic systems. However, the performance of all methods varies from system to system. This variability is likely due to each system's distinct Lyapunov time, which implies a different level of prediction difficulty.


Figure 6. R² of LSTM, ESN, NG-RC, DLinear, K-NODE, and the proposed PGL-ATT and PGL-MLP in different prediction horizons on six chaotic systems. (A) Rossler, (B) Aizawa, (C) Lorenz, (D) Chua, (E) Chen, and (F) Halvorsen.
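The two quantities used in this analysis can be sketched as follows. The R² below follows the standard definition (one minus the ratio of residual to total sum of squares), pooled over state variables, which is an assumption about how the scores were aggregated. The conversion to Lyapunov times assumes a known integration step `dt` and largest Lyapunov exponent; the numerical values in the usage lines are placeholders, not values taken from this paper.

```python
import numpy as np

def r2_score(truth, pred):
    """Coefficient of determination, pooled over time steps and state variables."""
    truth = np.asarray(truth, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((truth - pred) ** 2)
    ss_tot = np.sum((truth - truth.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def steps_to_lyapunov_times(n_steps, dt, max_lyapunov_exponent):
    """Express a horizon of n_steps in units of the Lyapunov time.

    The Lyapunov time is 1 / max_lyapunov_exponent, so the horizon in
    Lyapunov times is (n_steps * dt) * max_lyapunov_exponent.
    """
    return n_steps * dt * max_lyapunov_exponent

# Toy usage: a perfect prediction, a mean prediction, and a reversed prediction.
truth = np.array([0.0, 1.0, 2.0])
r2_perfect = r2_score(truth, truth)
r2_mean = r2_score(truth, np.full(3, 1.0))
r2_reversed = r2_score(truth, truth[::-1])

# Placeholder values for dt and the exponent (assumptions for illustration).
horizon_in_lyapunov_times = steps_to_lyapunov_times(1000, dt=0.01, max_lyapunov_exponent=0.9)
```

The reversed prediction illustrates why R² is not bounded below by 0: its residual sum of squares exceeds the total sum of squares, giving a negative score.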

3.4 Ablation study

In this subsection, we conduct an ablation study to understand the individual contributions of the different components of our proposed method to the prediction of chaotic dynamics. Specifically, we examine prediction performance on the Lorenz system using four distinct configurations of our method: (1) employing only the DDC, which is an LSTM network; (2) integrating the DDC and the PGC through a simple linear combination, referred to as PGL-Linear; (3) implementing the proposed method with the attention-based NLC described in this manuscript, referred to as PGL-ATT; and (4) implementing the proposed method with the MLP-based NLC described in this manuscript, referred to as PGL-MLP. All experimental settings in this ablation study remain consistent with those used in the previous experiments, including the initial conditions, the ratio of training to testing data, and the prediction horizons.

Table 4 presents the results of the ablation study in terms of MAE and RMSE across various forecast horizons. The DDC alone exhibits relatively high MAE and RMSE across all prediction horizons. Integrating the DDC with the PGC through a simple linear combination (PGL-Linear) yields an observable improvement over the DDC alone. However, the improvement achieved by PGL-Linear falls short of our expectations. One potential reason is that the relationship between the observational data and the physical principles governing the system's dynamics is likely nonlinear, so a straightforward linear combination may be insufficient to capture the complexity of these interactions. This highlights the need for the proposed nonlinear learning component (NLC) to integrate the DDC and the PGC effectively and thereby enhance prediction accuracy. This need is further supported by the results of PGL-ATT and PGL-MLP, which improve on PGL-Linear in terms of MAE and RMSE across all prediction horizons.


Table 4. MAE and RMSE of DDC, PGL-Linear, PGL-ATT, and PGL-MLP in different prediction horizons on the Lorenz system.
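The structural difference between a linear and a nonlinear combination of the two component outputs can be sketched as follows. This is a minimal illustration, not the architecture used in this paper: the mixing weight `alpha`, the single tanh hidden layer, the layer sizes, and the function names `fuse_linear`, `init_mlp`, and `fuse_mlp` are all assumptions, and the weights are randomly initialized rather than trained.

```python
import numpy as np

def fuse_linear(ddc_out, pgc_out, alpha=0.5):
    """Linear fusion: a fixed convex combination of the two component outputs."""
    return alpha * ddc_out + (1.0 - alpha) * pgc_out

def init_mlp(rng, in_dim, hidden_dim, out_dim):
    """Randomly initialize a one-hidden-layer MLP (hypothetical sizes)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def fuse_mlp(ddc_out, pgc_out, params):
    """Nonlinear fusion: an MLP applied to the concatenated component outputs."""
    x = np.concatenate([ddc_out, pgc_out], axis=-1)
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

# Toy usage: 5 time steps of a 3-variable state from each component.
rng = np.random.default_rng(0)
ddc_out = rng.normal(size=(5, 3))  # stand-in for data-driven predictions
pgc_out = rng.normal(size=(5, 3))  # stand-in for physics-guided predictions
linear_pred = fuse_linear(ddc_out, pgc_out, alpha=0.3)
mlp_pred = fuse_mlp(ddc_out, pgc_out, init_mlp(rng, in_dim=6, hidden_dim=16, out_dim=3))
```

In the linear case each output variable depends only on the matching variable of each component, with the same weight at every state. The MLP can instead weight the components differently in different regions of state space and mix information across variables, which is the kind of flexibility the ablation results suggest is needed.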

4 Conclusion and discussion

In this paper, we proposed a physics-guided learning approach to predict the dynamics of chaotic systems. We experimentally evaluated the performance of our method on the Rossler, Aizawa, Lorenz, Chua, Chen, and Halvorsen dynamical systems. The experimental results demonstrated that our method outperforms other baselines in terms of prediction accuracy.

To our knowledge, PINN is among several representative techniques that employ neural networks to solve ordinary and partial differential equations. Other noteworthy methods include those based on the Deep Galerkin Method (DGM) (Sirignano and Spiliopoulos, 2018; Aristotelous et al., 2023) and Neurodifferential approaches (Lagaris et al., 1998; Ramuhalli et al., 2005), each offering unique contributions to the field. In our work, we utilize PINN as a typical example to demonstrate the efficacy of integrating data-driven structures with physical knowledge to accurately predict the dynamics of chaotic systems. This exemplification paves the way for further exploration into the integration of other physics-guided modules with data-driven components, potentially leading to enhanced predictive capabilities.

In our future work, we aim to extend our framework to scenarios where observations are noisy and the underlying governing differential equations are not fully known in advance. Moreover, in the current study, we used only six representative chaotic systems that exhibit distinct dynamical patterns, such as the spiral-type chaos in the Rossler system (Rössler, 1977), the butterfly-shaped pattern in the Lorenz system (Li and Yin, 2009), and the double-scroll attractor in the Chua system (Chua, 2007), to demonstrate the feasibility of the proposed idea. Moving forward, we plan to conduct more comprehensive tests on 131 diverse chaotic systems across various domains (Gilpin, 2021) to further validate the robustness of our learning framework. Further, we intend to apply the proposed method to various real-world applications, such as infectious disease risk prediction, climate forecasting, and traffic flow prediction. Additionally, we plan to conduct a comprehensive theoretical analysis of the proposed learning framework, attempting to quantitatively characterize its learning capacity and prediction error bounds using key properties of chaotic systems, such as the Lyapunov exponent and the Hurst exponent.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

LF: Conceptualization, Data curation, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing. YL: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing. BS: Funding acquisition, Investigation, Methodology, Writing – original draft, Writing – review & editing. JL: Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported in part by the Ministry of Science and Technology of China (2021ZD0112500), in part by the National Natural Science Foundation of China and the Research Grants Council (RGC) of Hong Kong Joint Research Scheme (No. 62261160387, N_HKBU222/22), in part by the Hong Kong Research Grants Council General Research Fund (RGC/HKBU12202220, RGC/HKBU12203122, and RGC/HKBU12200124), and in part by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (Grant no. SJCX23_0435).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that Gen AI was used in the creation of this manuscript. We acknowledge the use of ChatGPT (GPT-4, OpenAI's language model: http://openai.com) in polishing some of the wording in the manuscript.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fdata.2024.1506443/full#supplementary-material

Footnotes

1. A preliminary version of this work appeared in the 4th French Regional Conference on Complex Systems (FRCCS 2024) (Feng et al., 2024).

References

Aguiar, M., Kooi, B., and Stollenwerk, N. (2008). Epidemiology of dengue fever: a model with temporary cross-immunity and possible secondary infection shows bifurcations and chaotic behaviour in wide parameter regions. Math. Model. Nat. Phenom. 3, 48–70. doi: 10.1051/mmnp:2008070