ORIGINAL RESEARCH article

Front. Comput. Neurosci., 22 December 2022
This article is part of the Research Topic: Deep Neural Network Based Decision-Making Interpretability

Geometric algebra based recurrent neural network for multi-dimensional time-series prediction

Yanping Li1,2, Yi Wang1, Yue Wang1, Chunhua Qian3* and Rui Wang1*
  • 1School of Communication and Information Engineering, Shanghai University, Shanghai, China
  • 2Office of Academic Affairs, Shanghai University, Shanghai, China
  • 3Department of Endocrinology and Metabolism, Shanghai Tenth People's Hospital, School of Medicine, Tongji University, Shanghai, China

Recent RNN models treat the various dimensions of multi-dimensional time-series (MTS) as independent channels, which may lead to the loss of dependencies between different dimensions or of the associated information between each dimension and the global signal. To process MTS holistically without losing the inter-relationships among dimensions, this paper proposes a novel Long- and Short-term Time-series network based on geometric algebra (GA), dubbed GA-LSTNet. Specifically, taking advantage of GA, the multi-dimensional data at each time point of the MTS are represented as GA multi-vectors to capture the inherent structures and preserve the correlations among those dimensions. In particular, the traditional real-valued RNN, real-valued LSTM, and back-propagation through time are extended to the GA domain. We evaluate the prediction performance of the proposed GA-LSTNet model on four well-known MTS datasets and compare it with six other methods. The experimental results indicate that GA-LSTNet outperforms the traditional real-valued LSTNet with higher prediction accuracy, providing a more accurate solution to the existing shortcomings of MTS prediction models.

1. Introduction

Multi-dimensional time-series (MTS) are ubiquitous in our daily lives, including stock market prices, traffic flow on highways, output from solar power plants, temperatures in different cities, and so on. Prediction of these time-series can serve as the basis for many practical applications. However, there are usually complex dynamic interdependencies among the variables of these data (Faloutsos et al., 2018), and how to capture and utilize this information for efficient and reliable prediction is a long-standing research hotspot.

Traditional methods for MTS prediction include linear support vector regression (Cao and Tay, 2003), autoregressive integrated moving average models (Gurland and Whittle, 1951), vector autoregression (VAR) (Han et al., 2015), and so on. The most common approach is to treat the observations at a point in time as a vector and to model the dynamics with VARs and linear dynamical systems. Essentially, these models rely on AR coefficient matrices or dynamical matrices to capture the correlation structure between different time-series. Chen et al. (2021) extended the VAR model to third-order tensor time-series by introducing two AR coefficient matrices to characterize the correlation structure. Mohammad et al. (2019) also achieved excellent results in time-series prediction using a stochastic model approach. However, the large number of parameters in the coefficient matrices and the high computational cost make these models difficult to estimate and prone to overfitting. At the same time, these traditional models perform poorly on time-series with mixed long- and short-term patterns and cannot accurately capture the complex nonlinear relationships among sequence data.

Due to the excellent performance of deep learning in applications such as image recognition and machine translation, its potential in the field of MTS prediction has also attracted a lot of attention. Recent studies have indicated that modern deep learning techniques not only achieve state-of-the-art prediction performance but also systematically reduce the complexity of the prediction process significantly, thus improving maintainability (Hochreiter and Schmidhuber, 1997).

Recurrent neural networks (RNNs) and long short-term memory (LSTM) networks (Hochreiter and Schmidhuber, 1997) were among the earliest neural networks used to process time-series. Subsequently, gated recurrent units (Cho et al., 2014) reduced the number of network parameters, lowering the risk of overfitting compared with LSTMs. At the same time, many works have shown that specific convolutional neural network structures can also achieve good results, such as convolution-based gated linear units (Dauphin et al., 2017) and temporal convolutional networks (Bai et al., 2018). Because of the complex structural information among the dimensions of multi-dimensional time-series, it is difficult for a single network to process them well. Therefore, the emergence of hybrid deep networks has brought prediction accuracy to a new level. LSTNet (Ai et al., 2017) combines a CNN, an LSTM, an attention mechanism, and an autoregressive (AR) component to extract short-term local dependence patterns among variables and long-term dependence patterns of the time-series. DeepState (Syama et al., 2018) combines state-space models with deep recurrent neural networks and learns the parameters of the whole network by maximizing the log-likelihood. DeepGLO (Sen et al., 2019) is a hybrid model that includes a global matrix factorization model regularized by a temporal convolutional network and a temporal network that captures the local properties of each time-series and associated covariates.

In addition, graph convolutional neural networks (GCNs) have also been shown to capture the correlation between partial time-series. The spatio-temporal graph convolutional network (ST-GCN) (Yu et al., 2018) is a deep learning framework for traffic prediction that fully exploits the graph structure of road networks by integrating graph convolutions with gated linear units for faster training. Li et al. (2017) directly stacked graph convolution and temporal modules to capture spatial and temporal dependencies in traffic data streams, but the network requires a predefined relational topology. Graph WaveNet (Wu et al., 2019) combines graph convolution layers, adaptive adjacency matrices, and dilated causal convolutions to capture spatio-temporal dependencies. However, most of these methods either ignore the correlation between data or rely on a graph given a priori. In addition, the Fourier transform has shown its advantages in previous work, especially the joint Fourier transform (Grassi et al., 2018; Isufi et al., 2019; Loukas and Perraudin, 2019), which enables prediction tasks on weather information, traffic data, and seismic waveforms. The discrete Fourier transform can also be used for time-series analysis; for example, state-frequency memory networks (Zhang et al., 2017) combine the advantages of the discrete Fourier transform and LSTM for stock price prediction. Nevertheless, none of the existing solutions jointly captures temporal patterns and multivariate correlations in the spectral domain (Parcollet et al., 2019).

Accurate prediction based on historical time-series data is challenging because it requires joint modeling of the temporal patterns of the data and the correlations among the data. How to capture and exploit the dynamic correlations among multiple variables is a major research challenge for MTS prediction. At present, many studies address multi-dimensional time-series prediction, but they are all based on real numbers, and real-valued networks inevitably suffer from information loss when the multiple dimensions are processed and finally fused. It is worth noting that Parcollet et al. (2019) constructed quaternion recurrent neural networks and quaternion long short-term memory networks, placing the three feature values of speech signals on the three imaginary parts of a quaternion and exploiting the latent structural dependence within the quaternion to obtain better performance than real-valued RNNs and LSTMs in practical automatic speech recognition. However, for signals with more features, such as MTS, which usually have dozens or even hundreds of features, quaternion recurrent neural networks and quaternion long short-term memory networks are insufficient.

Geometric algebra (GA) has opened up new directions for the study and application of MTS. Through the latent structural dependence within a multi-vector, the multi-dimensional features are combined into a single entity that the network processes as a whole, capturing the internal relationships between the sequence features and preserving the structural information inherent in the multi-dimensional features.

Therefore, this paper proposes a new geometric algebra based recurrent neural network (GA-RNN) and a geometric algebra based long short-term memory network (GA-LSTM), and constructs a new geometric algebra based long- and short-term time-series network (GA-LSTNet). Firstly, the multi-dimensional time-series is represented as GA multi-vectors, preserving the correlation among its channels. Secondly, each layer of the network and the training algorithm are extended to the GA space, and the corresponding processing algorithms are provided for the input GA multi-vectors to ensure that multi-channel information is retained during signal processing. Essentially, each feature of the multi-dimensional signal is mapped to a component of a GA multi-vector, and the computation is then carried out on the whole multi-vector to maximize the retention of the latent features of the multi-dimensional signal.

The rest of this paper is organized as follows. Section 2 introduces the basics of GA and neural networks based on GA. Section 3 describes the proposed GA-RNN and GA-LSTM. Comparison experimental results between GA-LSTNet and real-valued methods are provided in Section 4, followed by concluding remarks drawn in Section 5.

2. Preliminary

GA, also called Clifford algebra, was first described by William Kingdon Clifford. For multi-dimensional signals, GA is not only an effective framework to handle representation and computation issues but also a tool in widespread use in mathematics and physics (Hestenes, 1986; Rafal, 2004; López-González et al., 2016).

Mathematically, suppose $\mathbb{G}_n$ denotes a $2^n$-dimensional vector space with a set of orthogonal basis vectors $\{e_1, e_2, \ldots, e_n\}$. The power set of $\gamma = \{1, \ldots, n\}$ turns the basis into an ordered one with the index set $\Gamma$:

$\Gamma := \{(a_1, \ldots, a_r) \subseteq \gamma,\ 1 \le a_1 < \cdots < a_r \le n\}$    (1)

Then, the basis of $\mathbb{G}_n$ is denoted by

$\{e_I := e_{a_1} \cdots e_{a_r} \mid I \in \Gamma\}$    (2)

For example, the basis of the $2^3$-dimensional space $\mathbb{G}_3$ can be written as

$\{1, e_1, e_2, e_3, e_{12}, e_{13}, e_{23}, e_{123}\}$.    (3)

For convenience, in the rest of the paper, $e_1 \cdots e_r$ will be denoted by $e_{1 \cdots r}$. In general, multiplication in GA follows the rules

$\begin{cases} e_i^2 = 1, & i = 1, \ldots, p \\ e_i^2 = -1, & i = p+1, \ldots, p+q \end{cases}$    (4)

and $\mathbb{G}_n$ can also be denoted as $\mathbb{G}_{p,q}$, with $n = p + q$. An arbitrary element of the GA is given by

$x = \sum_{t=0}^{n} \langle x \rangle_t = \sum_{I \in \Gamma} [x]_I\, e_I$    (5)

where $[x]_I \in \mathbb{R}$ represents the value of each component of the multi-vector and $\langle x \rangle_t$ denotes its grade-$t$ part. For example, an element of $\mathbb{G}_3$ can be represented as

$x = \langle x \rangle_0 + \langle x \rangle_1 + \langle x \rangle_2 + \langle x \rangle_3 = x_0 + x_1 e_1 + x_2 e_2 + x_3 e_3 + x_{12} e_{12} + x_{13} e_{13} + x_{23} e_{23} + x_{123} e_{123}$    (6)

The addition in GA can be defined as

$x + y = \sum_{I \in \Gamma} \left([x]_I + [y]_I\right) e_I$    (7)

The geometric product in GA can be written in the following form

$x \otimes_{p,q} y = x \cdot y + x \wedge y$    (8)

where $x \cdot y$ and $x \wedge y$ represent the inner and outer products in GA, respectively.

The geometric product between two multi-vectors can also be converted into matrix operations. Assuming a multi-vector $x$, it can be expressed as

$x = \big[[x]_0, [x]_1, [x]_2, \ldots, [x]_I, \ldots\big] \cdot \big[1, e_1, e_2, \ldots, e_I, \ldots\big]^T = F_x \cdot N_x$    (9)

where $F_x \in \mathbb{R}^{1 \times 2^n}$ is the coefficient matrix of the multi-vector $x$ and $N_x$ is the corresponding orthogonal basis matrix. According to the calculation rules between the different $e_I$, $R(x)$ can be defined as the real representation matrix of $x$ (Roy et al., 2020). Then,

$x \otimes_{p,q} y = \left[R(y) \cdot (F_x)^T\right] \cdot N_{x(y)}$    (10)

The inversion of a multi-vector is denoted by

$\tilde{x} = \sum_{t=0}^{n} (-1)^{\frac{t(t-1)}{2}} \langle x \rangle_t$    (11)

The conjugation of a multi-vector is denoted by

$x^* = \sum_{t=0}^{n} (-1)^{\frac{t(t+1)}{2}} \langle x \rangle_t$    (12)

For any two multi-vectors $x, y \in \mathbb{G}_n$, the dot product is defined by

$x \odot y = \sum_{I \in \Gamma} [x]_I [y]_I\, e_I$    (13)

In addition, similar to quaternions, the basic elements of GA also have a modulus. For any multi-vector, its modulus is defined by

$\|x\| = \sqrt{\sum_{I \in \Gamma} \left([x]_I\right)^2}$    (14)
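To make the algebra above concrete, the sketch below implements a minimal multi-vector toolkit for $\mathbb{G}_{p,q}$ in Python. It is our own illustration rather than the authors' code, and all names are ours: coefficients are stored in a vector of length $2^n$ indexed by bitmask-encoded blades, the geometric product of Equations (4) and (8) is computed blade by blade, the real representation matrix $R(\cdot)$ of Equation (10) is built column by column, and the modulus of Equation (14) follows directly.

```python
import numpy as np

def blade_reorder_sign(a: int, b: int) -> float:
    """Sign picked up when reordering the basis vectors of blades a and b
    (blades encoded as bitmasks) into canonical order."""
    a >>= 1
    swaps = 0
    while a:
        swaps += bin(a & b).count("1")
        a >>= 1
    return -1.0 if swaps & 1 else 1.0

def geometric_product(x: np.ndarray, y: np.ndarray, p: int, q: int) -> np.ndarray:
    """Geometric product x ⊗_{p,q} y of two coefficient vectors of length 2**(p+q)."""
    n = p + q
    out = np.zeros(2 ** n)
    for a in range(2 ** n):
        if x[a] == 0.0:
            continue
        for b in range(2 ** n):
            if y[b] == 0.0:
                continue
            sign = blade_reorder_sign(a, b)
            common = a & b                      # basis vectors shared by both blades
            for i in range(n):
                if common & (1 << i) and i >= p:
                    sign = -sign                # e_i^2 = -1 for the last q basis vectors, Eq. (4)
            out[a ^ b] += sign * x[a] * y[b]
    return out

def real_representation(y: np.ndarray, p: int, q: int) -> np.ndarray:
    """R(y) such that R(y) @ x gives the coefficients of x ⊗_{p,q} y, cf. Eq. (10)."""
    D = 2 ** (p + q)
    R = np.zeros((D, D))
    for j in range(D):
        e_j = np.zeros(D)
        e_j[j] = 1.0
        R[:, j] = geometric_product(e_j, y, p, q)
    return R

def modulus(x: np.ndarray) -> float:
    """Modulus of a multi-vector, Eq. (14)."""
    return float(np.sqrt(np.sum(x ** 2)))

# Sanity check in G_{3,0}: e1 ⊗ e2 = e12 and e12 ⊗ e12 = -1.
e1 = np.zeros(8); e1[0b001] = 1.0
e2 = np.zeros(8); e2[0b010] = 1.0
e12 = geometric_product(e1, e2, p=3, q=0)      # coefficient 1 at blade 0b011
assert geometric_product(e12, e12, 3, 0)[0] == -1.0
```

Because all $2^n$ components travel together through these operations, a single multi-vector can carry an entire multi-dimensional observation, which is the property the proposed networks exploit.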

3. Methods

GA provides a new direction for the research and application of MTS. By exploiting the latent structural dependencies within a multi-vector, the multi-dimensional features are combined into a single entity that the network processes as a whole, capturing the internal relationships between the sequence features so that the structural information inherent in the multi-dimensional features is well preserved.

In this section, we extend RNN and LSTM from the real-value domain to the GA domain. In our proposed networks, inputs, outputs and weights are represented by GA multi-vectors. The operations in each layer and the training algorithm will be introduced in the following.

3.1. Geometric algebra based recurrent network layer

The learning process of the geometric algebra based RNN layer (GA-RNN) is similar to that of the real-valued RNN; the difference is that the inputs and network parameters become multi-vectors, as shown in Figure 1. The multi-dimensional features at each time point are converted into multi-vectors as the GA-RNN input, and the weights applied to the input features are also multi-vectors.

Figure 1. Multi-vectorization of real-valued RNN layer.

Suppose that the dimension of the input vector $x^t$ at time $t$ is $N$, the number of neurons in the hidden layer is $H$, and the number of neurons in the output layer is $K$. In addition, $\sigma$ and $\alpha$ represent the sigmoid and tanh activation functions, respectively. Then the forward propagation of the GA-RNN basic unit is given by

$\begin{aligned} a^t &= \sigma\!\left(U \otimes_{p,q} x^t + W \otimes_{p,q} a^{t-1} + \theta_a\right) \\ b^t &= \alpha\!\left(V \otimes_{p,q} a^t + \theta_b\right) \\ a^t &= [a_1^t \ \cdots \ a_h^t \ \cdots \ a_H^t]^T \\ b^t &= [b_1^t \ \cdots \ b_k^t \ \cdots \ b_K^t]^T \end{aligned}$    (15)

where $x^t$ and $a^t$ are the multi-vectors formed from the original real-valued input data and the hidden state, respectively; $U$, $W$, and $V$ are the weight matrices of the input, hidden state, and output, respectively; $\theta_a$ and $\theta_b$ are the bias terms of the hidden layer and output layer, respectively; and $b^t$ is the output.

In addition, for a multi-vector x, assuming that f is any standard activation function, then

$f(x) = \sum_{I \in \Gamma} f\!\left([x]_I\right) e_I$    (16)

In other words, applying an activation function to a multi-vector amounts to applying the real-valued activation to each of its components.
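As an illustration of Equations (15) and (16), the following sketch performs one GA-RNN forward step, reusing `geometric_product` from the sketch in Section 2. The shapes, names, and layout of the weight arrays are our own assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def ga_rnn_step(x_t, a_prev, U, W, V, theta_a, theta_b, p, q):
    """One GA-RNN forward step, Eq. (15); activations act component-wise as in Eq. (16).

    With D = 2 ** (p + q) multi-vector components:
      x_t: (N, D), a_prev: (H, D), U: (N, H, D), W: (H, H, D),
      V: (H, K, D), theta_a: (H, D), theta_b: (K, D)."""
    N, H, K = U.shape[0], W.shape[0], V.shape[1]
    z = np.array(theta_a, copy=True)               # hidden pre-activations z_h^t
    for h in range(H):
        for i in range(N):
            z[h] += geometric_product(U[i, h], x_t[i], p, q)
        for h2 in range(H):
            z[h] += geometric_product(W[h2, h], a_prev[h2], p, q)
    a_t = sigmoid(z)                               # component-wise sigmoid
    m = np.array(theta_b, copy=True)               # output pre-activations m_k^t
    for k in range(K):
        for h in range(H):
            m[k] += geometric_product(V[h, k], a_t[h], p, q)
    b_t = np.tanh(m)                               # component-wise tanh
    return a_t, b_t
```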

3.2. Geometric algebra based back-propagation through time

The principle of the back-propagation through time algorithm for GA-RNN (GA-BPTT) is the same as for the real-valued RNN. After the sample is modeled with GA, it is propagated from the input layer to the output layer to obtain the actual output, and the loss function is then used to compute the error E between the actual output and the true labels. Backpropagation then corrects the weight matrices and bias parameters until the error converges below a given threshold.

Similar to the multi-vector activation function, the loss function of a multi-vector output is essentially the real-domain loss function applied to each component. In the real domain, the loss depends only on all previously connected neurons, whereas in GA-BPTT the multi-vector loss is calculated for each component of the multi-vector neuron parameters, which can act as a regularizer during training.

According to Equation (15), the output bkt can be written as

$b_k^t = \sum_{I \in \Gamma} [b_k^t]_I\, e_I$    (17)

Suppose that $y^t$ denotes the true labels and $b^t$ in Equation (15) is the actual output; the final loss function is defined as the sum of the squared errors at each time step

$E = \sum_{t=1}^{T} E^t = \frac{1}{2} \sum_{t=1}^{T} \left\|y^t - b^t\right\|^2$    (18)

Because the loss function is computed separately for each component of the multi-vector, the loss E is also a multi-vector.
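A minimal sketch of this component-wise loss (ours; the array shapes are assumptions): the squared error of Equation (18) is accumulated per blade component, so the returned loss is itself a multi-vector.

```python
import numpy as np

def ga_mse_loss(y_seq: np.ndarray, b_seq: np.ndarray) -> np.ndarray:
    """Multi-vector loss of Eq. (18).

    y_seq, b_seq: arrays of shape (T, K, D) holding the target and predicted
    multi-vector coefficients; the result has one entry per component e_I."""
    diff = y_seq - b_seq
    return 0.5 * np.sum(diff ** 2, axis=(0, 1))
```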

3.2.1. GA-RNN output layer weight matrix

The weight matrix V of the output layer is used to calculate the actual output bt, that is,

$\frac{\partial E}{\partial V} = \sum_{t=1}^{T} \frac{\partial E^t}{\partial V}, \quad \frac{\partial E^t}{\partial V} = \begin{pmatrix} \frac{\partial E^t}{\partial v_{11}} & \cdots & \frac{\partial E^t}{\partial v_{H1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial E^t}{\partial v_{1K}} & \cdots & \frac{\partial E^t}{\partial v_{HK}} \end{pmatrix} = \begin{pmatrix} E_{v_{11}}^t & \cdots & E_{v_{H1}}^t \\ \vdots & \ddots & \vdots \\ E_{v_{1K}}^t & \cdots & E_{v_{HK}}^t \end{pmatrix}$    (19)

Each item in Equation (19) can be calculated individually, i.e.,

$E_{v_{hk}}^t = \sum_{I \in \Gamma} \frac{\partial E^t}{\partial [v_{hk}]_I}\, e_I$    (20)

For each component in Equation (20), the chain rule is applied for parameter updating, that is,

$\frac{\partial E^t}{\partial [v_{hk}]_I} = \sum_{B \in \Gamma} \frac{\partial E^t}{\partial [m_k^t]_B} \cdot \frac{\partial [m_k^t]_B}{\partial [v_{hk}]_I}$    (21)

The calculation of Equation (21) can be divided into the activation function part and the propagation function part, as shown in Equations (22, 23), respectively.

$\frac{\partial E^t}{\partial [m_k^t]_B} = \frac{\partial E^t}{\partial [b_k^t]_B} \cdot \frac{\partial [b_k^t]_B}{\partial [m_k^t]_B} = \left([b_k^t]_B - [y_k^t]_B\right) \cdot \alpha'\!\left([m_k^t]_B\right) = [\delta_k^t]_B$    (22)

$\frac{\partial [m_k^t]_B}{\partial [v_{hk}]_I} = \{R(a_h^t)\}_{(b,i)}, \quad \frac{\partial [m_k^t]_B}{\partial [a_h^t]_B} = [v_{hk}]_0$    (23)

where $\{R(a_h^t)\}_{(b,i)}$ is the value in row $b$, column $i$ of the real representation matrix $R(a_h^t)$ (the row and column corresponding to the components $B$ and $I$), and $m_k^t = \sum_{h=1}^{H} v_{hk} \otimes_{p,q} a_h^t + \theta_k^b$ is the pre-activation of output neuron $k$ (cf. Equation 28). Therefore,

$\frac{\partial E^t}{\partial [v_{hk}]_I} = \sum_{B \in \Gamma} [\delta_k^t]_B \cdot \{R(a_h^t)\}_{(b,i)}$    (24)

That is,

$E_{v_{hk}}^t = \sum_{I \in \Gamma} \left[ \sum_{B \in \Gamma} [\delta_k^t]_B \cdot \{R(a_h^t)\}_{(b,i)} \right]_I e_I$    (25)
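In coefficient form, Equations (22)–(25) reduce to a matrix-vector product with the transposed real representation matrix of the hidden state. The sketch below is our own illustration, reusing `real_representation` from Section 2 and assuming $\alpha = \tanh$; it is not taken from the authors' code.

```python
import numpy as np

def output_delta(b_k: np.ndarray, y_k: np.ndarray, m_k: np.ndarray) -> np.ndarray:
    """Component-wise error [δ_k^t]_B = ([b_k^t]_B - [y_k^t]_B) · α'([m_k^t]_B), Eq. (22)."""
    return (b_k - y_k) * (1.0 - np.tanh(m_k) ** 2)

def output_weight_grad(delta_k: np.ndarray, a_h: np.ndarray, p: int, q: int) -> np.ndarray:
    """Gradient of E^t w.r.t. the multi-vector weight v_hk, Eqs. (24)-(25):
    component I is Σ_B [δ_k^t]_B · {R(a_h^t)}_(B,I)."""
    return real_representation(a_h, p, q).T @ delta_k
```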

3.2.2. GA-RNN hidden layer weight matrix

The derivation process of the backpropagation for W is as follows:

$\frac{\partial E}{\partial W} = \sum_{t=1}^{T} \frac{\partial E^t}{\partial W}$    (26)

Similarly,

$\frac{\partial E^t}{\partial W} = \begin{pmatrix} \frac{\partial E^t}{\partial w_{11}} & \cdots & \frac{\partial E^t}{\partial w_{H1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial E^t}{\partial w_{1H}} & \cdots & \frac{\partial E^t}{\partial w_{HH}} \end{pmatrix} = \begin{pmatrix} E_{w_{11}}^t & \cdots & E_{w_{H1}}^t \\ \vdots & \ddots & \vdots \\ E_{w_{1H}}^t & \cdots & E_{w_{HH}}^t \end{pmatrix}$    (27)

Since the weights of the hidden layer are related to the state of the previous moment, consider

$\begin{aligned} a_h^{t+1} &= \sigma\!\left(\sum_{i=1}^{N} u_{ih} \otimes_{p,q} x_i^{t+1} + \sum_{h=1}^{H} w_{hh} \otimes_{p,q} a_h^{t} + \theta_h^a\right) \\ b_k^{t+1} &= \alpha\!\left(m_k^{t+1}\right) = \alpha\!\left(\sum_{h=1}^{H} v_{hk} \otimes_{p,q} a_h^{t+1} + \theta_k^b\right) \end{aligned}$    (28)

We have

$E_{w_{hh}}^t = \sum_{I \in \Gamma} \frac{\partial E^t}{\partial [w_{hh}]_I}\, e_I$    (29)

In Equation (29), the calculation of each component is denoted by

$\frac{\partial E^t}{\partial [w_{hh}]_I} = \sum_{B \in \Gamma} \frac{\partial E^t}{\partial [m_k^t]_B} \cdot \frac{\partial [m_k^t]_B}{\partial [a_h^t]_B} \cdot \frac{\partial [a_h^t]_B}{\partial [z_h^t]_B} \cdot \frac{\partial [z_h^t]_B}{\partial [w_{hh}]_I} + \sum_{B \in \Gamma} \frac{\partial E^t}{\partial [m_k^{t+1}]_B} \cdot \frac{\partial [m_k^{t+1}]_B}{\partial [a_h^{t+1}]_B} \cdot \frac{\partial [a_h^{t+1}]_B}{\partial [z_h^{t+1}]_B} \cdot \frac{\partial [z_h^{t+1}]_B}{\partial [w_{hh}]_I}$    (30)

where $z_h^t$ denotes the pre-activation of hidden neuron $h$ (i.e., $a_h^t = \sigma(z_h^t)$), and

$\begin{aligned} \sum_{B \in \Gamma} \frac{\partial E^t}{\partial [m_k^t]_B} \cdot \frac{\partial [m_k^t]_B}{\partial [a_h^t]_B} \cdot \frac{\partial [a_h^t]_B}{\partial [z_h^t]_B} \cdot \frac{\partial [z_h^t]_B}{\partial [w_{hh}]_I} &= \sum_{B \in \Gamma} [\delta_k^t]_B \cdot [v_{hk}]_0 \cdot \sigma'\!\left([z_h^t]_B\right) \cdot \{R(a_h^{t-1})\}_{(b,i)} \\ \sum_{B \in \Gamma} \frac{\partial E^t}{\partial [m_k^{t+1}]_B} \cdot \frac{\partial [m_k^{t+1}]_B}{\partial [a_h^{t+1}]_B} \cdot \frac{\partial [a_h^{t+1}]_B}{\partial [z_h^{t+1}]_B} \cdot \frac{\partial [z_h^{t+1}]_B}{\partial [w_{hh}]_I} &= \sum_{B \in \Gamma} [\delta_k^{t+1}]_B \cdot [v_{hk}]_0 \cdot \sigma'\!\left([z_h^{t+1}]_B\right) \cdot \{R(a_h^{t})\}_{(b,i)} \end{aligned}$    (31)

3.2.3. GA-RNN input layer weight matrix

The updating of input layer weight matrix U is the same as that of the hidden layer, that is,

$\frac{\partial E}{\partial U} = \sum_{t=1}^{T} \frac{\partial E^t}{\partial U}, \quad \frac{\partial E^t}{\partial U} = \begin{pmatrix} \frac{\partial E^t}{\partial u_{11}} & \cdots & \frac{\partial E^t}{\partial u_{N1}} \\ \vdots & \ddots & \vdots \\ \frac{\partial E^t}{\partial u_{1H}} & \cdots & \frac{\partial E^t}{\partial u_{NH}} \end{pmatrix} = \begin{pmatrix} E_{u_{11}}^t & \cdots & E_{u_{N1}}^t \\ \vdots & \ddots & \vdots \\ E_{u_{1H}}^t & \cdots & E_{u_{NH}}^t \end{pmatrix}$    (32)

Where

$\begin{aligned} E_{u_{ih}}^t &= \sum_{I \in \Gamma} \frac{\partial E^t}{\partial [u_{ih}]_I}\, e_I \\ \frac{\partial E^t}{\partial [u_{ih}]_I} &= \sum_{B \in \Gamma} \frac{\partial E^t}{\partial [m_k^t]_B} \cdot \frac{\partial [m_k^t]_B}{\partial [a_h^t]_B} \cdot \frac{\partial [a_h^t]_B}{\partial [z_h^t]_B} \cdot \frac{\partial [z_h^t]_B}{\partial [u_{ih}]_I} = \sum_{B \in \Gamma} [\delta_k^t]_B \cdot [v_{hk}]_0 \cdot \sigma'\!\left([z_h^t]_B\right) \cdot \{R(x_i^t)\}_{(b,i)} \end{aligned}$    (33)

That is,

$E_{u_{ih}}^t = \sum_{I \in \Gamma} \left[ \sum_{B \in \Gamma} [\delta_k^t]_B \cdot [v_{hk}]_0 \cdot \sigma'\!\left([z_h^t]_B\right) \cdot \{R(x_i^t)\}_{(b,i)} \right]_I e_I$    (34)

3.2.4. GA-RNN output layer bias

$\frac{\partial E}{\partial \theta^b}$ can be written as

$\frac{\partial E}{\partial \theta^b} = \sum_{t=1}^{T} \frac{\partial E^t}{\partial \theta^b} = \sum_{t=1}^{T} \begin{pmatrix} \frac{\partial E^t}{\partial \theta_1^b} \\ \vdots \\ \frac{\partial E^t}{\partial \theta_K^b} \end{pmatrix} = \sum_{t=1}^{T} \begin{pmatrix} E_{\theta_1^b}^t \\ \vdots \\ E_{\theta_K^b}^t \end{pmatrix}$    (35)

Similarly,

$\begin{aligned} E_{\theta_k^b}^t &= \sum_{I \in \Gamma} \frac{\partial E^t}{\partial [\theta_k^b]_I}\, e_I \\ \frac{\partial E^t}{\partial [\theta_k^b]_I} &= \sum_{B \in \Gamma} \frac{\partial E^t}{\partial [m_k^t]_B} \cdot \frac{\partial [m_k^t]_B}{\partial [\theta_k^b]_I} = \sum_{B \in \Gamma} [\delta_k^t]_B \cdot \frac{\partial [m_k^t]_B}{\partial [\theta_k^b]_I} = [\delta_k^t]_I \end{aligned}$    (36)

Therefore,

$E_{\theta_k^b}^t = \sum_{I \in \Gamma} [\delta_k^t]_I\, e_I$    (37)

3.2.5. GA-RNN hidden layer bias

Analogous to Equations (35) and (36):

$E_{\theta_h^a}^t = \sum_{I \in \Gamma} \frac{\partial E^t}{\partial [\theta_h^a]_I}\, e_I$    (38)

The difference is that the weight updating of the hidden layer takes into account the state of the previous moment, namely:

$\begin{aligned} \frac{\partial E^t}{\partial [\theta_h^a]_I} &= \sum_{B \in \Gamma} \frac{\partial E^t}{\partial [m_k^t]_B} \cdot \frac{\partial [m_k^t]_B}{\partial [a_h^t]_B} \cdot \frac{\partial [a_h^t]_B}{\partial [z_h^t]_B} \cdot \frac{\partial [z_h^t]_B}{\partial [\theta_h^a]_I} \\ &= \sum_{B \in \Gamma} [\delta_k^t]_B \cdot [v_{hk}]_0 \cdot \sigma'\!\left([z_h^t]_B\right) \cdot \frac{\partial [z_h^t]_B}{\partial [\theta_h^a]_I} \\ &= [\delta_k^t]_I \cdot [v_{hk}]_0 \cdot \sigma'\!\left([z_h^t]_I\right) \end{aligned}$    (39)

Therefore,

$E_{\theta_h^a}^t = \sum_{I \in \Gamma} [\delta_k^t]_I \cdot [v_{hk}]_0 \cdot \sigma'\!\left([z_h^t]_I\right) e_I$    (40)
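The component-wise derivations in Equations (19)–(40) can be verified numerically: perturbing a single component of a single weight multi-vector and differencing the loss should reproduce the corresponding analytic gradient. The sketch below is our own check, built on the `ga_rnn_step` and `ga_mse_loss` sketches above; it is not part of the paper.

```python
import numpy as np

def numeric_grad_check(x_seq, y_seq, params, p, q, h=0, k=0, comp=0, eps=1e-5):
    """Finite-difference estimate of ∂E/∂[v_hk]_comp for comparison with GA-BPTT."""
    U, W, V, theta_a, theta_b = params

    def total_loss(V_mod):
        a = np.zeros_like(theta_a)
        outputs = []
        for x_t in x_seq:                                   # unrolled forward pass
            a, b = ga_rnn_step(x_t, a, U, W, V_mod, theta_a, theta_b, p, q)
            outputs.append(b)
        return ga_mse_loss(y_seq, np.stack(outputs)).sum()  # scalar total loss

    V_plus, V_minus = V.copy(), V.copy()
    V_plus[h, k, comp] += eps
    V_minus[h, k, comp] -= eps
    return (total_loss(V_plus) - total_loss(V_minus)) / (2.0 * eps)
```

Alternatively, since every geometric product can be written as a multiplication with the real representation matrix $R(\cdot)$ of Equation (10), the same component-wise gradients can be obtained from an automatic differentiation framework.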

3.3. Geometric algebra based long short-term memory network layer

LSTM was introduced to address the "long-term dependence" problem caused by the vanishing gradients of RNNs; it is able to learn long-distance dependencies. This mechanism can easily be extended to GA (GA-LSTM). Gates are the core components of LSTM, and in the GA gates each component of the multi-vector input signal is fused with the components of the multi-vector gate parameters after multiplication. Let $f^t$, $i^t$, $o^t$, $c^t$, and $h^t$ be the forget gate, input gate, output gate, cell state, and hidden state of the GA-LSTM unit at time step $t$, respectively. Then the GA-LSTM propagation process can be defined as:

$\begin{aligned} f^t &= \sigma\!\left(U_f \otimes_{p,q} x^t + W_f \otimes_{p,q} h^{t-1} + \theta_f\right) \\ i^t &= \sigma\!\left(U_i \otimes_{p,q} x^t + W_i \otimes_{p,q} h^{t-1} + \theta_i\right) \\ c^t &= f^t \odot c^{t-1} + i^t \odot \alpha\!\left(U_c \otimes_{p,q} x^t + W_c \otimes_{p,q} h^{t-1} + \theta_c\right) \\ o^t &= \sigma\!\left(U_o \otimes_{p,q} x^t + W_o \otimes_{p,q} h^{t-1} + \theta_o\right) \\ h^t &= o^t \odot \alpha\!\left(c^t\right) \end{aligned}$    (41)
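Below is a minimal sketch of the GA-LSTM step in Equation (41), again reusing `geometric_product` and `sigmoid` from the earlier sketches; it is our own illustration, not the authors' code. The gate nonlinearities act component-wise as in Equation (16), and the products between gates and states are the component-wise products of Equation (13).

```python
import numpy as np

def ga_affine(U, W, x_t, h_prev, theta, p, q):
    """U ⊗ x_t + W ⊗ h_prev + θ for one gate (shapes as in the GA-RNN sketch)."""
    z = np.array(theta, copy=True)
    for j in range(W.shape[0]):
        for i in range(U.shape[0]):
            z[j] += geometric_product(U[i, j], x_t[i], p, q)
        for h2 in range(W.shape[0]):
            z[j] += geometric_product(W[h2, j], h_prev[h2], p, q)
    return z

def ga_lstm_step(x_t, h_prev, c_prev, gates, p, q):
    """One GA-LSTM step, Eq. (41). gates maps 'f', 'i', 'c', 'o' to (U, W, θ) triples."""
    Uf, Wf, tf = gates["f"]
    Ui, Wi, ti = gates["i"]
    Uc, Wc, tc = gates["c"]
    Uo, Wo, to = gates["o"]
    f_t = sigmoid(ga_affine(Uf, Wf, x_t, h_prev, tf, p, q))    # forget gate
    i_t = sigmoid(ga_affine(Ui, Wi, x_t, h_prev, ti, p, q))    # input gate
    c_hat = np.tanh(ga_affine(Uc, Wc, x_t, h_prev, tc, p, q))  # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat                           # component-wise products, Eq. (13)
    o_t = sigmoid(ga_affine(Uo, Wo, x_t, h_prev, to, p, q))    # output gate
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```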

4. Results and discussion

4.1. Datasets

This experiment uses four publicly available datasets, namely Traffic, Electricity, Solar-Energy, and Exchange-Rate. As shown in Table 1, Traffic records the occupancy rates (0–1) measured by 862 different sensors on San Francisco Bay Area freeways over 2 years (2015–2016), collected once per hour. Electricity records the power consumption of 321 customers from 2012 to 2014; the data were originally collected every 15 min and are converted here to hourly consumption. Solar-Energy contains the solar power generation records of 137 photovoltaic power stations in Alabama in 2006, sampled every 10 min. Exchange-Rate is a summary of the daily exchange rates of eight countries, including Australia, the United Kingdom, Canada, Switzerland, China, Japan, New Zealand, and Singapore, from 1990 to 2016. These are real-world data and contain linear and nonlinear interdependencies (Jordan et al., 2003). All datasets are divided into a training set (60%), a validation set (20%), and a test set (20%) in chronological order. The four datasets are publicly available for download.
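Because the split is chronological rather than random, it can be reproduced with a few lines such as the following sketch (ours; the file name is hypothetical).

```python
import numpy as np

def chronological_split(data: np.ndarray, train: float = 0.6, valid: float = 0.2):
    """Split an MTS array of shape (T, n_series) into train/validation/test in time order."""
    T = len(data)
    t1, t2 = int(T * train), int(T * (train + valid))
    return data[:t1], data[t1:t2], data[t2:]

# Example (hypothetical file name):
# data = np.loadtxt("electricity.txt", delimiter=",")
# train_set, valid_set, test_set = chronological_split(data)
```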

Table 1. Datasets used in the experiment.

4.2. Experimental design

In order to verify the performance of the proposed GA-based neural networks, the following MTS prediction algorithms are compared with the proposed network:

1) AR: Standard autoregressive model,

2) LSVR (Li and Cyrus, 2018): Vector autoregressive model with support vector regression objective function,

3) Lridge: Vector autoregressive model with L2 regularization,

4) TRMF (Hsiang et al., 2016): Autoregressive model with temporal regularized matrix factorization,

5) GP (Roberts et al., 2012): Gaussian process for time-series modeling,

6) LSTNet (Lai et al., 2018): Deep neural network for modeling long-term and short-term time patterns,

7) GA-LSTNet: The geometric algebra based deep neural network for modeling long-term and short-term temporal patterns proposed in this paper.

For the first six comparison methods, the parameter settings used in this experiment are the same as those of Lai et al. (2018). That is, grid search is performed over all adjustable hyperparameters on the validation set of each method and each dataset. Specifically, the regularization coefficient of AR is selected from {0.1, 1, 10} to achieve the best performance. The search range of the LSVR and LRidge regularization coefficients is $\{2^{-10}, 2^{-9}, \ldots, 2^{9}, 2^{10}\}$. For TRMF, the search ranges of the hidden dimension and regularization coefficient are $\{2^2, 2^3, \ldots, 2^6\}$ and {0.1, 1, 10}, respectively.

The GA-LSTNet used in this paper replaces the convolutional layer and LSTM of LSTNet with their GA counterparts. LSTNet and GA-LSTNet have the same network structure and parameter settings; they differ only in the form of the data input and the computation performed in each layer. Specifically, the hidden dimensions of the LSTM and the convolutional layer are selected from {50, 100, 200} and {20, 50, 100}, respectively. The skip length of the RNN-skip layer is set to 24 for Electricity and Traffic, and tuned over $2^1$–$2^6$ for Solar-Energy and Exchange-Rate. Apart from the input and output layers, dropout with a rate of 0.1 or 0.2 is applied after each layer. Both models are optimized with Adam.

In order to quantify all experimental results, the seven methods in this section are assessed with the same evaluation metrics (Hsiang et al., 2016): relative absolute error (RAE), root relative squared error (RSE), and correlation coefficient (CORR). The first evaluation criterion, RAE, is defined as:

$\mathrm{RAE} = \frac{\sum_{t=t_0}^{t_1} \sum_{i=1}^{n} \left| y_{t,i} - \hat{y}_{t,i} \right|}{\sum_{t=t_0}^{t_1} \sum_{i=1}^{n} \left| \hat{y}_{t,i} - \overline{\hat{y}_{t_0:t_1,1:n}} \right|}$    (42)

The second evaluation criterion RSE is defined as:

$\mathrm{RSE} = \frac{\sqrt{\sum_{t=t_0}^{t_1} \sum_{i=1}^{n} \left( y_{t,i} - \hat{y}_{t,i} \right)^2}}{\sqrt{\sum_{t=t_0}^{t_1} \sum_{i=1}^{n} \left( \hat{y}_{t,i} - \overline{\hat{y}_{t_0:t_1,1:n}} \right)^2}}$    (43)

The third evaluation criterion CORR is defined as:

$\mathrm{CORR} = \frac{1}{n} \sum_{i=1}^{n} \frac{\sum_{t=t_0}^{t_1} \left( y_{t,i} - \overline{y_{t_0:t_1,i}} \right)\left( \hat{y}_{t,i} - \overline{\hat{y}_{t_0:t_1,i}} \right)}{\sqrt{\sum_{t=t_0}^{t_1} \left( y_{t,i} - \overline{y_{t_0:t_1,i}} \right)^2 \left( \hat{y}_{t,i} - \overline{\hat{y}_{t_0:t_1,i}} \right)^2}}$    (44)

where $y$ is the predicted value, $\hat{y}$ is the real value of the test set, $\bar{y}$ represents the mean of the corresponding set, and $t \in [t_0, t_1]$ indexes the test set. For RAE and RSE, lower values indicate better predictions; for CORR, higher values indicate better predictions.
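For reference, the three metrics translate directly into code. The sketch below is ours and follows the conventions stated above, with `y` the prediction and `y_hat` the ground truth; CORR is computed as the usual empirical correlation per series and then averaged, which is how Equation (44) is normally evaluated.

```python
import numpy as np

def rae(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Relative absolute error, Eq. (42); y, y_hat have shape (T, n)."""
    return float(np.abs(y - y_hat).sum() / np.abs(y_hat - y_hat.mean()).sum())

def rse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Root relative squared error, Eq. (43)."""
    return float(np.sqrt(((y - y_hat) ** 2).sum())
                 / np.sqrt(((y_hat - y_hat.mean()) ** 2).sum()))

def corr(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Empirical correlation coefficient, Eq. (44), averaged over the n series."""
    y_c = y - y.mean(axis=0, keepdims=True)
    y_hat_c = y_hat - y_hat.mean(axis=0, keepdims=True)
    num = (y_c * y_hat_c).sum(axis=0)
    den = np.sqrt((y_c ** 2).sum(axis=0) * (y_hat_c ** 2).sum(axis=0))
    return float((num / den).mean())
```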

4.3. Experimental results and analysis

In this part, seven methods will be used to conduct prediction experiments on four datasets, and the prediction range horizon is set to {3, 6, 12, 24}. According to Table 1, for Electricity and Traffic, the prediction range is set to {3, 6, 12, 24} h. For Solar-Energy, the prediction range is set to {30, 60, 120, 240} min. The prediction range for Exchange-Rate is set to {3, 6, 12, 24} days.

We first compare the convergence curves of the two networks on Traffic and Solar-Energy. Figure 2A shows the convergence curves of GA-LSTNet and LSTNet when predicting power generation over the next 240 min on the Solar-Energy dataset; Figure 2B shows the convergence curves of GA-LSTNet and LSTNet when predicting traffic flow over the next 6 h on the Traffic dataset. Both experiments show that, under the same network configuration, GA-LSTNet converges faster than LSTNet and reaches a smaller final training error.

Figure 2. Convergence curves of GA-LSTNet and LSTNet on different datasets. (A) Solar-Energy and (B) Traffic.

Tables 2 and 3 show the RAE and RSE of the prediction results of the seven methods, respectively; the best results are shown in bold.

Table 2. Prediction results using RAE as indicator.

Table 3. Prediction results using RSE as indicator.

The results in Tables 2 and 3 show that, on the Traffic and Solar-Energy datasets, the proposed GA-LSTNet model has a clear advantage over the LSTNet model with the same structure and achieves lower prediction errors. On the Electricity dataset, although LSTNet achieves the best results for some horizons, the difference between GA-LSTNet and LSTNet is no more than 0.002, a very small gap. These results demonstrate the feasibility of GA-LSTNet: by representing MTS as GA multi-vectors, it captures more useful information across data channels. Therefore, the GA-based LSTM performs better than the real-valued LSTM in MTS prediction.

In addition, to observe the prediction results more intuitively, Figure 3 shows the prediction results of the seven methods under different conditions using CORR as the metric. In Figure 3, a higher bar indicates a larger CORR for the corresponding prediction task and thus higher prediction accuracy. As shown in Figure 3, GA-LSTNet obtains the highest CORR value in every prediction task on Electricity, Traffic, and Exchange-Rate. For the Solar-Energy dataset at horizon = 3, the accuracy is slightly lower than that of LSTNet, but as the prediction horizon grows, the superiority of GA-LSTNet becomes increasingly apparent. This shows that introducing GA into real-valued RNNs and LSTMs does not change their basic properties, while the correlations between different features captured by GA enhance the long-term dependency learning ability of the real-valued networks.

Figure 3. Prediction results using CORR. (A) Electricity, (B) Traffic, (C) Solar-Energy, and (D) Exchange-Rate.

From the above quantitative results, it can be seen that the overall accuracy of GA-LSTNet is improved compared with LSTNet, which means that the predicted values at some points are more consistent with the real labels. As shown in Figures 4 and 5, the two networks are compared by visualizing all the prediction results on Electricity and part of the prediction results on the Traffic dataset.

Figure 4. Prediction results of electricity by LSTNet and GA-LSTNet when horizon = 12.

Figure 5. Prediction results of traffic by LSTNet and GA-LSTNet when horizon = 6.

In Figure 4, red represents the real value of power consumption and blue represents the predicted value. Both GA-LSTNet and LSTNet follow the overall trend of the real values, but GA-LSTNet performs better in the details, especially at some prominent troughs. For example, between hours 2,000 and 2,500, the gap between the GA-LSTNet predictions and the real values is smaller.

In Figure 5, the red curve represents the real Traffic occupancy over 160 h of the test set, the green curve represents the predicted values of LSTNet, and the blue curve represents the predicted values of GA-LSTNet. Both are basically consistent with the real values at the troughs, but at five of the six peaks shown in Figure 5, GA-LSTNet is closer to the real values. The comparison shows that, after replacing the corresponding real-valued layers with GA convolution and GA-LSTM, GA-LSTNet retains more useful information from the original data thanks to the multi-dimensional consistency of GA, and the prediction performance is clearly improved.

5. Conclusion

This paper focuses on the construction of geometric algebra based RNN and LSTM models for the processing of MTS. Under the framework of GA, the MTS is encoded into GA multi-vectors to avoid the loss of structural relationships among the multi-dimensional variables. The forward and backpropagation algorithms for the proposed GA-RNN and GA-LSTM are then derived. The experimental results show that GA-LSTNet has good convergence and higher prediction accuracy in MTS prediction, and has clear advantages over the real-valued LSTNet. GA-RNN and GA-LSTM thus provide a more accurate solution to the existing shortcomings of MTS prediction models. In future work, we will focus on more practical MTS applications with the proposed networks.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding authors.

Author contributions

YL and YueW proposed the new idea of the paper and participated in the outage performance analysis. YiW performed the simulations and drafted the paper. CQ played an important role in interpreting the results and revised the manuscript. RW conceived of the study, participated in its design, coordination, and helped to draft the manuscript. All authors have read and approved the final manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61771299 and 62071286, and by the Clinical Research Funds of the Shanghai Municipal Health Commission (202040170).

Acknowledgments

We thank the reviewers for their constructive feedback on the manuscript and all participants in the study.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Ai, G., Chang, W. C., and Yang, Y. (2017). Modeling long- and short-term temporal patterns with deep neural networks. arXiv:1703.07015 [cs.LG]. doi: 10.48550/arXiv.1703.07015

Bai, S., Kolter, J. Z., and Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv:1803.01271 [cs.LG]. doi: 10.48550/arXiv.1803.01271

Cao, L. J., and Tay, F. E. H. (2003). Support vector machine with adaptive parameters in financial time series forecasting. IEEE Trans. Neural Netw. 14, 1506–1518. doi: 10.1109/TNN.2003.820556

Chen, R., Xiao, H., and Yang, D. (2021). Autoregressive models for matrix-valued time series. J. Econometr. 222(1, Part B), 539–560. doi: 10.1016/j.jeconom.2020.07.015

Cho, K., Merrienboer, B. V., and Bahdanau, D. (2014). “On the properties of neural machine translation: encoder-decoder approaches,” in Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (Doha: Association for Computational Linguistics), 103–111.

Dauphin, Y. N., Fan, A., and Auli, M. (2017). “Language modeling with gated convolutional networks,” in Proceedings of the 34th International Conference on Machine Learning, Volume 70 of ICML'17 (Sydney, NSW), 933–941. Available online at: https://www.JMLR.org

Faloutsos, C., Gasthaus, J., Januschowski, T., and Wang, Y. (2018). Forecasting big time series: old and new. Proc. VLDB Endowment 11, 2102–2105. doi: 10.14778/3229863.3229878

Grassi, F., Loukas, A., and Perraudin, N. (2018). A time-vertex signal processing framework: scalable processing and meaningful representations for time-series on graphs. IEEE Trans. Signal Process. 66, 817–829. doi: 10.1109/TSP.2017.2775589

Gurland, J., and Whittle, P. (1951). Hypothesis testing in time series analysis. J. Am. Stat. Assoc. 49, 197–199. doi: 10.2307/2281054

Han, F., Lu, H., and Liu, H. (2015). A direct estimation of high dimensional stationary vector autoregressions. J. Mach. Learn. Res. 16, 3115–3150. doi: 10.48550/arXiv.1307.0293

Hestenes, D. (1986). New Foundations for Classical Mechanics. New York, NY: Kluwer Academic Publishers.

Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural Comput. 9, 1735–1780. doi: 10.1162/neco.1997.9.8.1735

Hsiang, F. Y., Nikhil, R., and Dhillon, I. S. (2016). “Temporal regularized matrix factorization for high-dimensional time series prediction,” in NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS'16 (Red Hook, NY: Curran Associates Inc.), 847–855.

Isufi, E., Loukas, A., and Perraudin, N. (2019). Forecasting time series with varma recursions on graphs. IEEE Trans. Signal Process. 67, 4870–4885. doi: 10.1109/TSP.2019.2929930

Jordan, N., Becker, R., and Gunsolus, J. (2003). Knowledge networks: an avenue to ecological management of invasive weeds. Weed Sci. 51, 271–277. doi: 10.1614/0043-1745(2003)051[0271:KNAATE]2.0.CO;2

Lai, G., Chang, W. C., and Yang, Y. (2018). “Modeling long- and short-term temporal patterns with deep neural networks,” in SIGIR '18: The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '18 (New York, NY: Association for Computing Machinery), 95–104.

Li, Y., Yu, R., and Shahabi, C. (2017). Diffusion convolutional recurrent neural network: data-driven traffic forecasting. arXiv:1707.01926 [cs.LG]. doi: 10.48550/arXiv.1707.01926

Li, Y. G., and Cyrus, S. (2018). A brief overview of machine learning methods for short-term traffic forecasting and future directions. SIGSPATIAL Special 10, 3–9. doi: 10.1145/3231541.3231544

López-González, G., Altamirano-Gómez, G., and Bayro-Corrochano, E. (2016). Geometric entities voting schemes in the conformal geometric algebra framework. Adv. Appl. Clifford Algebras 26, 1045–1059. doi: 10.1007/s00006-015-0589-y

Loukas, A., and Perraudin, N. (2019). Stationary time-vertex signal processing. EURASIP J. Adv. Signal Process 36, 1–19. doi: 10.1186/s13634-019-0631-7

Mohammad, Z., Hossein, B., and Isa, E. (2019). A reliable linear stochastic daily soil temperature forecast model. Soil Tillage Res. 189, 73–87. doi: 10.1016/j.still.2018.12.023

Parcollet, T., Ravanelli, M., and Morchid, M. (2019). “Bidirectional quaternion long-short term memory recurrent neural networks for speech recognition,” in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (Brighton: IEEE), 8519–8523.

Rafal, A. (2004). Lectures on Clifford (Geometric) Algebras and Applications. Boston, MA. Birkhäuser.

Roberts, S., Osborne, M., and Ebden, M. (2012). Gaussian processes for time-series modelling. Philos. Trans. 371, 20110550. doi: 10.1098/rsta.2011.0550

Roy, S. K., Krishna, G., and Dubey, S. R. (2020). Hybridsn: exploring 3-d-2-d cnn feature hierarchy for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 17, 277–281. doi: 10.1109/LGRS.2019.2918719

Sen, R., Yu, H. F., and Dhillon, I. (2019). Think globally, act locally: a deep neural network approach to high-dimensional time series forecasting. arXiv:1905.03806 [stat.ML]. doi: 10.48550/arXiv.1905.03806

Syama, S. R., Matthias, W. S., and Jan, G. (2018). “Deep state space models for time series forecasting,” in NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS'18 (Red Hook, NY: Curran Associates Inc.), 7796–7805.

Wu, Z. H., Pan, S. R., and Guo, D. L. (2019). “Graph wavenet for deep spatial-temporal graph modeling,” in IJCAI'19: Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI'19 (Macao: AAAI Press), 1907–1913.

Yu, B., Yin, H., and Zhu, Z. (2018). “Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting,” in IJCAI'18: Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI'18 (Stockholm: AAAI Press), 3634–3640.

Zhang, L., Aggarwal, C., and Qi, G. J. (2017). “Stock price prediction via discovering multi-frequency trading patterns,” in KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '17 (New York, NY: Association for Computing Machinery), 2141–2149.

Keywords: geometric algebra, recurrent neural network, long-and short-term time-series network, prediction, multi-dimensional time-series

Citation: Li Y, Wang Y, Wang Y, Qian C and Wang R (2022) Geometric algebra based recurrent neural network for multi-dimensional time-series prediction. Front. Comput. Neurosci. 16:1078150. doi: 10.3389/fncom.2022.1078150

Received: 24 November 2022; Accepted: 01 December 2022;
Published: 22 December 2022.

Edited by:

Guitao Cao, East China Normal University, China

Reviewed by:

Yancheng Ji, Nantong University, China
Yong Cai, Shanghai Astronomical Observatory (CAS), China

Copyright © 2022 Li, Wang, Wang, Qian and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Chunhua Qian, chqian2003@126.com; Rui Wang, rwang@shu.edu.cn
