Solution of the Fokker–Planck Equation by Cross Approximation Method in the Tensor Train Format

Chertkov, Andrei; Oseledets, Ivan

doi:10.3389/frai.2021.668215

ORIGINAL RESEARCH article

Front. Artif. Intell., 02 August 2021

Sec. Machine Learning and Artificial Intelligence

Volume 4 - 2021 | https://doi.org/10.3389/frai.2021.668215

This article is part of the Research Topic: Tensor Methods for Deep Learning


Andrei Chertkov*

Ivan Oseledets

Skolkovo Institute of Science and Technology, Moscow, Russia

We propose a novel numerical scheme for the solution of the multidimensional Fokker–Planck equation. The scheme is based on Chebyshev interpolation and spectral differentiation techniques, as well as on low-rank tensor approximations, namely the tensor train decomposition and the multidimensional cross approximation method. In combination, these make it possible to drastically reduce the number of degrees of freedom required to maintain accuracy as the dimensionality increases. We demonstrate the effectiveness of the proposed approach on a number of multidimensional problems, including the Ornstein-Uhlenbeck process and the dumbbell model. The developed computationally efficient solver can be used in a wide range of practically significant problems, including density estimation in machine learning applications.

1 Introduction

The Fokker–Planck equation (FPE) is an important tool for studying the properties of dynamical systems, and it has attracted a lot of attention in different fields. In recent years, the FPE has become widespread in the machine learning community in the context of the important problems of density estimation (Grathwohl et al., 2018) for neural ordinary differential equations (ODE) (Chen et al., 2018; Chen and Duvenaud, 2019), generative models (Kidger et al., 2021), etc.

Consider a stochastic dynamical system which is described by stochastic differential equation (SDE) of the form¹

d x = f (x, t) d t + S (x, t) d β, d β d β^{⊤} = Q (t) d t, x = x (t) \in ℝ^{d}, (1)

where $d β$ is a q-dimensional space-time white noise, f is a known d-dimensional vector-function and $S \in ℝ^{d \times q}$ , $Q \in ℝ^{q \times q}$ are known matrices. The FPE for the corresponding probability density function (PDF) $ρ (x, t)$ of the spatial variable $x$ has the form

\frac{\partial ρ (x, t)}{\partial t} = \sum_{i = 1}^{d} \sum_{j = 1}^{d} \frac{\partial}{\partial x_{i}} \frac{\partial}{\partial x_{j}} [D_{i j} (x, t) ρ (x, t)] - \sum_{i = 1}^{d} \frac{\partial}{\partial x_{i}} [f_{i} (x, t) ρ (x, t)], (2)

where $D (x, t) = \frac{1}{2} S (x, t) Q (t) S^{⊤} (x, t)$ is a diffusion tensor.

One of the major complications in the solution of the FPE is the high dimensionality of practically significant computational problems. The complexity of a grid-based representation of the solution grows exponentially with d, so a low-parametric representation is required. One promising direction is the use of low-rank tensor methods, studied in (Dolgov et al., 2012). The equation is discretized on a tensor-product grid, such that the solution is represented as a d-dimensional tensor, and this tensor is approximated in the low-rank tensor train format (TT-format) (Oseledets, 2011). Even with such complexity reduction, the computations often take a long time. In this paper we propose another way of using low-rank tensor methods for the solution of the FPE, based on its intimate connection to dynamical systems.

The key idea can be illustrated for $S = 0$ , i.e. in the deterministic case. For this case the evolution of the PDF along the trajectory is given by the formula

\frac{\partial ρ (x, t)}{\partial t} = - Tr (\frac{\partial f (x, t)}{\partial x}) ρ (x, t), (3)

where $Tr (\cdot)$ is the trace of a matrix. Hence, to compute the value of $ρ (x, t)$ at a specific point $x = \hat{x}$ , it is sufficient to find a preimage ${\hat{x}}_{0}$ such that, if it is used as an initial condition for eq. 1, we arrive at $\hat{x}$ . To find the preimage, we integrate eq. 1 backwards in time, and then, to find the PDF value, we integrate the system of eqs 1, 3 forward in time. Since we can evaluate $ρ (x, t)$ at any $\hat{x}$ , we can use the cross approximation method (CAM) (Oseledets and Tyrtyshnikov, 2010; Savostyanov and Oseledets, 2011; Dolgov and Savostyanov, 2020) in the TT-format to recover a supposedly low-rank tensor from its samples. In this way we do not need any compact representation of f; we only need to solve the corresponding ODE numerically. For $S \neq 0$ the situation is more complicated, but we develop splitting and multidimensional interpolation schemes that allow us to efficiently recompute the values of the density from some time moment t to the next step $t + h$ .

To summarize, the main contributions of our paper are the following:

• we derive a formula to recompute the values of the PDF on each time step, using the second order operator splitting, Chebyshev interpolation and spectral differentiation techniques;

• we propose to use a TT-format and CAM to approximate the solution of the FPE which makes it possible to drastically reduce the number of degrees of freedom required to maintain accuracy as dimensionality increases;

• we implement an FPE solver, based on the proposed approach, as publicly available Python code², and we test our approach on several examples, including the multidimensional Ornstein-Uhlenbeck process and the dumbbell model, which demonstrate its efficiency and robustness.

2 Computation of the Probability Density Function

For ease of demonstration of the proposed approach, we suppose that the noise $β \in ℝ^{q}$ has the same dimension as the spatial variable $x \in ℝ^{d}$ ( $q = d$ ), and the matrices in eq. 1 and eq. 2 have the form³

Q (t) \equiv I_{d}, S (x, t) \equiv \sqrt{2 χ} I_{d}, D (x, t) \equiv χ I_{d}, (4)

where $χ \geq 0$ is a scalar diffusion coefficient. Then eqs 1, 2 can be rewritten in a more compact form

d x = f (x, t) d t + \sqrt{2 χ} d β, d β d β^{⊤} = I_{d} d t, (5)

\frac{\partial ρ}{\partial t} = χ Δ ρ - div [f (x, t) ρ], (6)

where d-dimensional spatial variable $x = x (t) \in Ω \subset ℝ^{d}$ has the corresponding PDF $ρ (x, t)$ with initial conditions

x (0) = x_{0} \sim ρ (x, 0), ρ (x, 0) = ρ_{0} (x) . (7)

To construct the PDF at some moment τ ( $τ > 0$ ) for the known initial distribution $ρ_{0} (x)$ , we discretize eqs 5, 6 on the uniform time grid with M ( $M \geq 2$ ) points

t_{m} = m h, h = \frac{τ}{M - 1}, m = 0,1, \dots, M - 1, (8)

and introduce the notation $x_{m} = x (t_{m})$ for value of the spatial variable at the moment $t_{m}$ and $ρ_{m} (\cdot) = ρ (\cdot, t_{m})$ for values of the PDF at the same moment.

2.1 Splitting Scheme

Let $\hat{V}$ and $\hat{W}$ be diffusion and convection operators from the eq. 6

\hat{V} v \equiv χ Δ v, \hat{W} w \equiv - div [f (x, t) w], (9)

then on each time step m ( $m = 0,1, \dots, M - 2$ ) we can integrate equation

\frac{\partial ρ}{\partial t} = (\hat{V} + \hat{W}) ρ, ρ (\cdot, t_{m}) = ρ_{m} (\cdot), (10)

on the interval $(t_{m}, t_{m} + h)$ , to find $ρ_{m + 1}$ for the known value $ρ_{m}$ from the previous time step. Its solution can be represented in the form of the product of an initial solution with the matrix exponential

ρ_{m + 1} = e^{h (\hat{V} + \hat{W})} ρ_{m}, (11)

and if we apply the standard second order operator splitting technique (Glowinski et al., 2017), then

ρ_{m + 1} \approx e^{\frac{h}{2} \hat{V}} e^{h \hat{W}} e^{\frac{h}{2} \hat{V}} ρ_{m}, (12)

which is equivalent to the sequential solution of the following equations

\frac{\partial v^{(1)}}{\partial t} = χ Δ v^{(1)}, v^{(1)} (\cdot, t_{m}) = ρ_{m} (\cdot), (13)

\frac{\partial w}{\partial t} = - div [f (x, t) w], w (\cdot, t_{m}) = v^{(1)} (\cdot, t_{m} + \frac{h}{2}), (14)

\frac{\partial v^{(2)}}{\partial t} = χ Δ v^{(2)}, v^{(2)} (\cdot, t_{m}) = w (\cdot, t_{m} + h), (15)

with the final approximation of the solution $ρ_{m + 1} (\cdot) = v^{(2)} (\cdot, t_{m} + \frac{h}{2})$ .
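The second order of the splitting in eq. 12 can be checked numerically on small matrices. The sketch below is only an illustration under stated assumptions: the operators $\hat{V}$ and $\hat{W}$ are replaced by random symmetric matrices (so that the matrix exponential can be taken from an eigendecomposition), and the helper names `expm_sym` and `splitting_error` are hypothetical, not part of the authors' solver.

```python
import numpy as np

def expm_sym(A):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    lam, Q = np.linalg.eigh(A)
    return (Q * np.exp(lam)) @ Q.T

def splitting_error(V, W, h):
    """Norm of the one-step error of the symmetric splitting of eq. 12."""
    exact = expm_sym(h * (V + W))
    half = expm_sym(0.5 * h * V)
    return np.linalg.norm(exact - half @ expm_sym(h * W) @ half)

rng = np.random.default_rng(0)
V = rng.standard_normal((5, 5)); V = 0.5 * (V + V.T)   # stand-in for the diffusion operator
W = rng.standard_normal((5, 5)); W = 0.5 * (W + W.T)   # stand-in for the convection operator

err_h = splitting_error(V, W, 0.02)
err_h2 = splitting_error(V, W, 0.01)
ratio = err_h / err_h2   # close to 8 when the one-step error scales as h**3
```

A one-step error of order $h^{3}$ accumulates to a global error of order $h^{2}$ over the $M - 1$ steps of the time grid, which is what "second order" refers to.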

2.2 Interpolation of the Solution

To efficiently solve the convection eq. 14, we need the ability to calculate the solution of the diffusion eq. 13 at arbitrary spatial points, hence the natural choice for the discretization in the spatial domain are Chebyshev nodes, which makes it possible to interpolate the corresponding function on each time step by the Chebyshev polynomials (Trefethen, 2000).

We introduce the d-dimensional spatial grid $X^{(g)}$ as a tensor product of the one-dimensional grids⁴

x_{k}^{(g)} \in ℝ^{N_{k}}, x_{k}^{(g)} [n_{k}] = cos \frac{π \cdot (n_{k} - 1)}{N_{k} - 1}, n_{k} = 1,2, \dots, N_{k}, (16)

where $N_{k}$ ( $N_{k} \geq 2$ ) is the number of points along the kth spatial axis ( $k = 1,2, \dots, d$ ), and the total number of grid points is $N = N_{1} \cdot N_{2} \cdot \dots \cdot N_{d}$ . Note that this grid can also be represented in flattened form as the following matrix

X^{(g)} \in ℝ^{d \times N}, X^{(g)} [k, n] = x_{k}^{(g)} [mind (n) [k]], (17)

where $n = 1,2, \dots, N$ , $k = 1,2, \dots, d$ and $mind (n) = [n_{1}, n_{2}, \dots, n_{d}]^{⊤}$ denotes the construction of the multi-index from the flattened long index according to the big-endian convention

n = n_{d} + (n_{d - 1} - 1) N_{d} + \dots + (n_{1} - 1) N_{2} N_{3} \dots N_{d} . (18)
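The grid of eqs 16, 17 and the index map of eq. 18 can be sketched in a few lines of numpy. The function names `cheb_nodes`, `mind` and `flat_grid` are chosen here for illustration; the only assumption is that arrays are stored with 0-based indices, while the arguments and results of `mind` keep the 1-based convention of the text. The big-endian convention of eq. 18 coincides with numpy's default C (row-major) ordering.

```python
import numpy as np

def cheb_nodes(N):
    """One-dimensional Chebyshev grid of eq. 16 (array position n - 1 holds node n)."""
    n = np.arange(1, N + 1)
    return np.cos(np.pi * (n - 1) / (N - 1))

def mind(n, shape):
    """1-based multi-index for the 1-based flat index n, big-endian as in eq. 18."""
    return np.array(np.unravel_index(n - 1, shape)) + 1

def flat_grid(shape):
    """Matrix X of eq. 17 with shape (d, N); column n - 1 holds grid point number n."""
    grids = [cheb_nodes(N) for N in shape]
    mesh = np.meshgrid(*grids, indexing="ij")
    return np.stack([m.reshape(-1) for m in mesh])

shape = (3, 4, 5)
X = flat_grid(shape)        # 3 x 60 matrix of grid points
idx = mind(27, shape)       # multi-index of the 27th grid point
```

For this shape, eq. 18 gives $27 = 2 + (2 - 1) \cdot 5 + (2 - 1) \cdot 4 \cdot 5$ , so the 27th point has the multi-index $[2, 2, 2]^{⊤}$ .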

Suppose that we calculated PDF $ρ_{m}$ on some time step m ( $m \geq 0$ ) at the nodes of the spatial grid $X^{(g)}$ [note that for the case $m = 0$ , the corresponding values come from the known initial condition $ρ_{0} (x)$ ]. These values can be collected as elements of a tensor⁵ $ℛ_{m} \in ℝ^{N_{1} \times N_{2} \times \dots \times N_{d}}$ such that

ℛ_{m} [n_{1}, n_{2}, \dots, n_{d}] = ρ_{m} (x_{1}^{(g)} [n_{1}], x_{2}^{(g)} [n_{2}], \dots, x_{d}^{(g)} [n_{d}]), (19)

where $n_{k} = 1,2, \dots, N_{k}$ ( $k = 1,2, \dots, d$ ).

Let us interpolate PDF $ρ_{m}$ via the system of orthogonal Chebyshev polynomials of the first kind

T_{0} (x) = 1, T_{1} (x) = x, T_{k + 1} (x) = 2 x T_{k} (x) - T_{k - 1} (x) \text{ for } k = 1,2, \dots, (20)

in the form of the naturally cropped sum

\begin{array}{l} ρ_{m} (x) \approx \tilde{ρ_{m}} (x) = \\ = \sum_{n_{1} = 1}^{N_{1}} \sum_{n_{2} = 1}^{N_{2}} \cdot \cdot \cdot \sum_{n_{d} = 1}^{N_{d}} A_{m} [n_{1}, n_{2}, \dots, n_{d}] T_{n_{1} - 1} (x_{1}) T_{n_{2} - 1} (x_{2}) \dots T_{n_{d} - 1} (x_{d}), \end{array} (21)

where $x = (x_{1}, x_{2}, \dots, x_{d})$ is some spatial point and interpolation coefficients are elements of the tensor $A_{m} \in ℝ^{N_{1} \times N_{2} \times \dots \times N_{d}}$ . For construction of this tensor we should set equality in the interpolation nodes eq. 16

\begin{array}{l} \tilde{ρ_{m}} (x_{1}^{(g)} [n_{1}], x_{2}^{(g)} [n_{2}], \dots, x_{d}^{(g)} [n_{d}]) = \\ ρ_{m} (x_{1}^{(g)} [n_{1}], x_{2}^{(g)} [n_{2}], \dots, x_{d}^{(g)} [n_{d}]) \end{array} (22)

for all combinations of $n_{k} = 1,2, \dots, N_{k}$ ( $k = 1,2, \dots, d$ ).

Therefore the interpolation process can be represented as a transformation of the tensor $ℛ_{m}$ to the tensor $A_{m}$ according to the system of eq. 22. If the Chebyshev polynomials and nodes are used for interpolation, then a good way is to apply a fast Fourier transform (FFT) (Trefethen, 2000) for this transformation. However the exponential growth of computational complexity and memory consumption with the growth of the number of spatial dimensions makes it impossible to calculate and store related tensors for the multidimensional case in the dense data format. Hence in the next sections we present an efficient algorithm for construction of the tensor $A_{m}$ in the low-rank TT-format.
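For a small two-dimensional grid, the map from nodal values $ℛ_{m}$ to coefficients $A_{m}$ defined by eqs 21, 22 can be written out densely. The sketch below is an illustration only: it solves the linear systems of eq. 22 directly instead of using the FFT, the function names `cheb_nodes` and `interpolate_dense` are hypothetical, and the test function is an arbitrary smooth choice.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def cheb_nodes(N):
    """Chebyshev nodes of eq. 16 with 0-based storage."""
    return np.cos(np.pi * np.arange(N) / (N - 1))

def interpolate_dense(R):
    """Coefficient tensor A of eq. 21 from nodal values R (eq. 19), two dimensions."""
    N1, N2 = R.shape
    V1 = cheb.chebvander(cheb_nodes(N1), N1 - 1)   # V1[n, j] = T_j(x_1[n])
    V2 = cheb.chebvander(cheb_nodes(N2), N2 - 1)
    # Eq. 22 reads R = V1 @ A @ V2.T, so A follows from two linear solves.
    return np.linalg.solve(V2, np.linalg.solve(V1, R).T).T

rho = lambda x, y: np.exp(-x**2 - 2.0 * y**2)      # arbitrary smooth test function
x1, x2 = cheb_nodes(24), cheb_nodes(26)
R = rho(x1[:, None], x2[None, :])
A = interpolate_dense(R)

# Evaluate the interpolant of eq. 21 at points that are not grid nodes.
pts = np.random.default_rng(1).uniform(-1.0, 1.0, size=(2, 100))
approx = cheb.chebval2d(pts[0], pts[1], A)
err = np.max(np.abs(approx - rho(pts[0], pts[1])))
```

The dense tensors `R` and `A` have $N_{1} N_{2}$ entries each; in d dimensions the same construction needs $N_{1} N_{2} \dots N_{d}$ entries, which is the exponential growth that motivates the TT-format.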

2.3 Solution of the Diffusion Equation

To solve the diffusion eqs 13, 15 on the Chebyshev grid, we discretize Laplace operator using the second order Chebyshev differential matrices [see, for example, (Trefethen, 2000)] $D_{k} \in ℝ^{N_{k} \times N_{k}}$ such that $D_{k} = {\tilde{D}}_{k} {\tilde{D}}_{k}$ , where for each spatial dimension $k = 1,2, \dots, d$

{\tilde{D}}_{k} [i, j] = \begin{cases} \frac{2 {(N_{k} - 1)}^{2} + 1}{6}, & i = j = 1, \\ \frac{- x_{k}^{(g)} [j]}{2 (1 - {(x_{k}^{(g)} [j])}^{2})}, & i = j = 2,3, \dots, N_{k} - 1, \\ \frac{c_{i}}{c_{j}} \frac{{(- 1)}^{i + j}}{x_{k}^{(g)} [i] - x_{k}^{(g)} [j]}, & i \neq j, \; i, j = 1,2, \dots, N_{k}, \\ - \frac{2 {(N_{k} - 1)}^{2} + 1}{6}, & i = j = N_{k}, \end{cases} (23)

with $c_{i} = 2$ if $i = 1$ or $i = N_{k}$ and $c_{i} = 1$ otherwise, and one dimensional grid points $x_{k}^{(g)}$ defined from eq. 16. Then discretized Laplace operator has the form⁶

Δ = D_{1} \otimes I_{N_{2}} \otimes \dots \otimes I_{N_{d}} + I_{N_{1}} \otimes D_{2} \otimes \dots \otimes I_{N_{d}} + \dots + I_{N_{1}} \otimes I_{N_{2}} \otimes \dots \otimes D_{d} . (24)
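The differentiation matrix of eq. 23 and the Kronecker-sum Laplacian of eq. 24 can be assembled directly for small grids. The following is a minimal dense sketch with hypothetical function names `cheb_diff` and `laplacian`; it uses 0-based array indices and forms the full $N \times N$ matrix, which is feasible only for illustration.

```python
import numpy as np

def cheb_diff(N):
    """First-order Chebyshev differentiation matrix (eq. 23) and its nodes (eq. 16)."""
    x = np.cos(np.pi * np.arange(N) / (N - 1))
    c = np.ones(N)
    c[0] = c[-1] = 2.0
    i = np.arange(N)
    D = np.zeros((N, N))
    off = ~np.eye(N, dtype=bool)                      # mask of off-diagonal entries
    dx = x[:, None] - x[None, :]
    sign = (-1.0) ** (i[:, None] + i[None, :])
    D[off] = ((c[:, None] / c[None, :]) * sign)[off] / dx[off]
    inner = np.arange(1, N - 1)
    D[inner, inner] = -x[inner] / (2.0 * (1.0 - x[inner] ** 2))
    D[0, 0] = (2.0 * (N - 1) ** 2 + 1.0) / 6.0
    D[-1, -1] = -D[0, 0]
    return D, x

def laplacian(shape):
    """Discretized Laplace operator of eq. 24 as a dense Kronecker sum."""
    total = 0.0
    for k, N in enumerate(shape):
        Dt, _ = cheb_diff(N)
        factors = [np.eye(M) for M in shape]
        factors[k] = Dt @ Dt                          # second-order matrix D_k
        L = factors[0]
        for F in factors[1:]:
            L = np.kron(L, F)
        total = total + L
    return total

D, x = cheb_diff(8)
err_d = np.max(np.abs(D @ x**3 - 3.0 * x**2))         # derivative of x^3

shape = (6, 7)
x1, x2 = cheb_diff(6)[1], cheb_diff(7)[1]
U = x1[:, None] ** 2 + x2[None, :] ** 3               # u = x_1^2 + x_2^3
lap = (laplacian(shape) @ U.reshape(-1)).reshape(shape)
err_lap = np.max(np.abs(lap - (2.0 + 6.0 * x2[None, :])))
```

Both checks use polynomials of degree below $N_{k}$ , for which spectral differentiation is exact up to rounding errors.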

Let $V_{m} \in ℝ^{N_{1} \times N_{2} \times \dots \times N_{d}}$ be the known initial condition for the diffusion equation on the time step m ( $t_{m} = m h$ ), then for the solution $V_{m + \frac{1}{2}}$ at the moment $t_{m} + \frac{h}{2}$ we have

vec (V_{m + \frac{1}{2}}) = e^{\frac{h}{2} χ Δ} vec (V_{m}), (25)

where an operation $vec (\cdot)$ constructs a vector from the tensor by a standard reshaping procedure like eq. 18. And finally due to the well known property of the matrix exponential, we come to

vec (V_{m + \frac{1}{2}}) = (e^{\frac{h}{2} χ D_{1}} \otimes e^{\frac{h}{2} χ D_{2}} \otimes \dots \otimes e^{\frac{h}{2} χ D_{d}}) vec (V_{m}) . (26)

If we can represent the initial condition $V_{m}$ in the form of Kronecker product of the one-dimensional tensors (for example, in terms of the TT-format in the form of the Kronecker products of the TT-cores, as will be presented below in this work), then we can efficiently evaluate the formula eq. 26 to obtain the desired approximation for solution $vec (V_{m + \frac{1}{2}})$ .
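The step from eq. 25 to eq. 26 can be verified numerically on a small tensor. The sketch below rests on assumptions that are not from the paper: random matrices stand in for the $D_{k}$ (the identity holds for any square matrices), the helper `expm` is a simple scaling-and-squaring Taylor approximation written for this example, and the tensor is kept dense rather than in the TT-format.

```python
import numpy as np

def expm(A, terms=18):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    norm = max(np.linalg.norm(A, 2), 1e-16)
    s = max(0, int(np.ceil(np.log2(norm))) + 1)       # scale so that ||A / 2^s|| <= 1/2
    B = A / 2.0 ** s
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ B / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

rng = np.random.default_rng(0)
shape = (3, 4, 5)
h, chi = 0.1, 0.5
D = [rng.standard_normal((N, N)) for N in shape]      # stand-ins for D_1, D_2, D_3
V = rng.standard_normal(shape)                        # tensor V_m in the dense format

# Eq. 25: exponential of the full Kronecker sum, a 60 x 60 matrix.
I = [np.eye(N) for N in shape]
Lap = (np.kron(np.kron(D[0], I[1]), I[2])
       + np.kron(np.kron(I[0], D[1]), I[2])
       + np.kron(np.kron(I[0], I[1]), D[2]))
full = (expm(0.5 * h * chi * Lap) @ V.reshape(-1)).reshape(shape)

# Eq. 26: one small exponential per mode, applied along the matching axis.
E = [expm(0.5 * h * chi * Dk) for Dk in D]
fast = np.einsum("ia,jb,kc,abc->ijk", E[0], E[1], E[2], V)
err = np.max(np.abs(full - fast))
```

The second variant never forms an $N \times N$ matrix, and each factor acts on a single mode. This is why the same operation can be applied to the TT-cores one at a time.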

2.4 Solution of the Convection Equation

Convection eq. 14 can be reformulated in terms of the FPE without diffusion part, when the corresponding ODE has the form

d x = f (x, t) d t, x = x (t) \in ℝ^{d}, x \sim ρ (x, t) . (27)

If we consider the differentiation along the trajectory of the particles, as was briefly described in the Introduction, then

\begin{matrix} {(\frac{\partial w}{\partial t})}_{x = x (t)} & = \sum_{k = 1}^{d} \frac{\partial w}{\partial x_{k}} \frac{\partial x_{k}}{\partial t} + \frac{\partial w}{\partial t} = \sum_{k = 1}^{d} \frac{\partial w}{\partial x_{k}} \frac{\partial x_{k}}{\partial t} - div [f w] = \\ = \sum_{k = 1}^{d} \frac{\partial w}{\partial x_{k}} f_{k} - \sum_{k = 1}^{d} \frac{\partial f_{k}}{\partial x_{k}} w - \sum_{k = 1}^{d} f_{k} \frac{\partial w}{\partial x_{k}} = - \sum_{k = 1}^{d} \frac{\partial f_{k}}{\partial x_{k}} w, \end{matrix} (28)

where we replaced the term $\frac{\partial w}{\partial t}$ by the right hand side of eq. 14 and $\frac{\partial x_{k}}{\partial t}$ by the right hand side of the corresponding equation in eq. 27.

Hence equation for w may be rewritten in terms of the trajectory integration of the following system

\begin{cases} \frac{\partial x}{\partial t} = f (x, t), \\ \frac{\partial w}{\partial t} = - Tr (\frac{\partial f}{\partial x} (x, t)) w. \end{cases} (29)

Let us integrate eq. 29 on a time step m ( $m = 0,1, \dots, M - 2$ ). If we set any spatial grid point $x^{*} = X^{(g)} [:, n]$ ( $n = 1,2, \dots, N$ ) as initial condition for the spatial variable, then we’ll obtain solution ${\hat{w}}_{m + 1}$ for some point ${\hat{x}}_{m + 1}$ outside the grid (see Figure 1 with the illustration for the two-dimensional case). Hence we should firstly solve eq. 27 backward in time to find the corresponding spatial point ${\hat{x}}_{m}$ that will be transformed to the grid point $x^{*}$ by the step $m + 1$ . If we select this point ${\hat{x}}_{m}$ and the related value ${\hat{w}}_{m} = w ({\hat{x}}_{m}, t_{m})$ as initial conditions for the system eq. 29, then its solution $w_{m + 1}$ will be related to the point of interest $x^{*}$ .


FIGURE 1. Evolution of the spatial variable and the corresponding PDF for two consecutive time steps related to the fixed Chebyshev grid in the case of two dimensions.

Note that, according to our splitting scheme, we solve the convection part eq. 14 after the corresponding diffusion eq. 13, and hence the initial condition $w_{m}$ is already known and defined as a tensor $W_{m} \in ℝ^{N_{1} \times N_{2} \times \dots \times N_{d}}$ on the Chebyshev spatial grid. Using this tensor, we can perform interpolation according to the formula eq. 22 and calculate the tensor of interpolation coefficients $A_{m}$ . Then we can evaluate the approximated value at the point ${\hat{x}}_{m}$ as $\tilde{w_{m}} ({\hat{x}}_{m})$ according to eq. 21.

Hence our solution strategy for convection equation is the following. For the given spatial grid point $x^{*} = X^{(g)} [:, n]$ we integrate equation

\frac{\partial x}{\partial t} = f (x, t), x (t_{m + 1}) = x^{*}, (30)

backward in time to find the corresponding point ${\hat{x}}_{m} = x (t_{m})$ . Then we find the value of w at this point, using interpolation $\tilde{w_{m}}$ , and then we solve the system eq. 29 on the time interval $(t_{m}, t_{m} + h)$ with initial condition $({\hat{x}}_{m}, \tilde{w_{m}} ({\hat{x}}_{m}))$ to obtain the value $w_{m + 1}$ at the point $x^{*}$ . The described process should be repeated for each grid point ( $n = 1,2, \dots, N$ ) and, ultimately, we’ll obtain a tensor $W_{m + 1} \in ℝ^{N_{1} \times N_{2} \times \dots \times N_{d}}$ which is the approximated solution of convection part eq. 14 of the splitting scheme on the Chebyshev spatial grid.
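The strategy described above can be sketched in one dimension, where the tensor $W_{m}$ is a vector and the trace in eq. 29 is the scalar derivative of f. This is a minimal illustration, not the authors' implementation: the names `rk4_step` and `convection_step` are hypothetical, the interpolation coefficients are found by a direct linear solve, every grid node is processed (no CAM), and the drift $f = - x$ is chosen because the exact solution of eq. 14 is then known.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def rk4_step(rhs, t, y, h):
    """One step of the classical fourth-order Runge-Kutta method (h may be negative)."""
    k1 = rhs(t, y)
    k2 = rhs(t + h / 2, y + h / 2 * k1)
    k3 = rhs(t + h / 2, y + h / 2 * k2)
    k4 = rhs(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def convection_step(W, x_grid, f, dfdx, t, h):
    """Advance nodal values W of w from t to t + h on a one-dimensional Chebyshev grid."""
    N = x_grid.size
    # Interpolation coefficients of w at time t (eqs 21, 22 in one dimension).
    A = np.linalg.solve(cheb.chebvander(x_grid, N - 1), W)
    # Eq. 30: trace every grid node back from t + h to t.
    x_hat = rk4_step(lambda s, x: f(x, s), t + h, x_grid, -h)
    w_hat = cheb.chebval(x_hat, A)
    # Eq. 29: integrate position and density forward along the trajectory.
    rhs = lambda s, y: np.vstack([f(y[0], s), -dfdx(y[0], s) * y[1]])
    y = rk4_step(rhs, t, np.vstack([x_hat, w_hat]), h)
    return y[1]

x_grid = np.cos(np.pi * np.arange(40) / 39)
w0 = lambda x: np.exp(-8.0 * x**2)                 # initial condition for w
f = lambda x, t: -x                                # linear drift
dfdx = lambda x, t: -np.ones_like(x)

h = 0.01
W1 = convection_step(w0(x_grid), x_grid, f, dfdx, 0.0, h)
exact = w0(x_grid * np.exp(h)) * np.exp(h)         # exact solution of eq. 14 for f = -x
err = np.max(np.abs(W1 - exact))
```

For this drift the preimages $x^{*} e^{h}$ of the boundary nodes lie slightly outside $[- 1, 1]$ , so the interpolant is mildly extrapolated there. This is harmless for a small step and a rapidly decaying density, but it is one reason to choose the region so that the PDF almost vanishes on the boundary.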

An important contribution of this paper is an indication of the possibility and a practical implementation of the usage of the multidimensional CAM in the TT-format to recover a supposedly low-rank tensor $W_{m + 1}$ from computations on only a part of specially selected spatial grid points. This scheme will be described in more details later in the work after setting out the fundamentals of the TT-format.

3 Low-Rank Representation

There has been much interest lately in the development of data-sparse tensor formats for high-dimensional problems. A very promising tensor format is provided by the tensor train (TT) approach (Oseledets and Tyrtyshnikov, 2009; Oseledets, 2011), which was proposed for compact representation and approximation of high-dimensional tensors. It can be computed via standard decompositions (such as SVD and QR-decomposition) but does not suffer from the curse of dimensionality⁷.

In many analytical considerations and practical cases a tensor is given implicitly by a procedure enabling us to compute any of its elements, so the tensor appears rather as a black box. For example, to construct the convection part of PDF (i.e., the tensor $W_{m}$ introduced above), we should compute the corresponding function for all possible sets of indices. This process requires an extremely large number of operations and can be time-consuming, so it may be useful to find some suitable low-parametric approximation of this tensor using only a small portion of all tensor elements. CAM (Oseledets and Tyrtyshnikov, 2010) which is a widely used method for approximation of high-dimensional tensors looks appropriate for this case.

In this section we describe the properties of the TT-format and multidimensional CAM that are necessary for efficient solution of our problem, as well as the specific features of the practical implementation of interpolation by the Chebyshev polynomials in terms of the TT-format and CAM.

3.1 Tensor Train Format

A tensor $ℛ \in ℝ^{N_{1} \times N_{2} \times \dots \times N_{d}}$ is said to be in the TT-format (Oseledets, 2011), if its elements are represented by the formula

\begin{array}{l} ℛ [n_{1}, n_{2}, \dots, n_{d}] = \sum_{r_{1} = 1}^{R_{1}} \sum_{r_{2} = 1}^{R_{2}} \cdot \cdot \cdot \sum_{r_{d - 1} = 1}^{R_{d - 1}} G_{1} [1, n_{1}, r_{1}] G_{2} [r_{1}, n_{2}, r_{2}] \dots \\ G_{d - 1} [r_{d - 2}, n_{d - 1}, r_{d - 1}] G_{d} [r_{d - 1}, n_{d}, 1] \end{array} (31)

where $n_{k} = 1,2, \dots, N_{k}$ ( $k = 1,2, \dots, d$ ), three-dimensional tensors $G_{k} \in ℝ^{R_{k - 1} \times N_{k} \times R_{k}}$ are named TT-cores, and integers $R_{0}, R_{1}, \dots, R_{d}$ (with convention $R_{0} = R_{d} = 1$ ) are named TT-ranks. The latter formula can be also rewritten in a more compact form

ℛ [n_{1}, n_{2}, \dots, n_{d}] = G_{1} (n_{1}) G_{2} (n_{2}) \dots G_{d} (n_{d}), (32)

where $G_{k} (n_{k}) = G_{k} [:, n_{k}, :]$ is an $R_{k - 1} \times R_{k}$ matrix for each fixed $n_{k}$ (since $R_{0} = R_{d} = 1$ , the result of matrix multiplications in eq. 32 is a scalar). And a vector form of the TT-decomposition looks like

vec (ℛ) = \sum_{r_{1} = 1}^{R_{1}} \sum_{r_{2} = 1}^{R_{2}} \cdot \cdot \cdot \sum_{r_{d - 1} = 1}^{R_{d - 1}} G_{1} [1, :, r_{1}] \otimes G_{2} [r_{1}, :, r_{2}] \otimes \dots \otimes G_{d} [r_{d - 1}, :, 1], (33)

where the slices of the TT-cores $G_{k}$ are vectors of length $N_{k}$ ( $k = 1,2, \dots, d$ ).

The benefit of the TT-decomposition is the following. Storage of the TT-cores $G_{1}, G_{2}, \dots, G_{d}$ requires at most $d \times {max}_{1 \leq k \leq d} (N_{k} R_{k}^{2})$ memory cells (instead of $N = N_{1} N_{2} \dots N_{d} \sim N_{0}^{d}$ cells for the uncompressed tensor, where $N_{0}$ is an average size of the tensor modes), and hence the TT-decomposition is free from the curse of dimensionality if the TT-ranks are bounded.
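Eqs 31, 32 and the storage estimate can be illustrated with plain numpy arrays standing in for TT-cores. The function names `tt_element` and `tt_full` are chosen for this sketch and are not the ttpy API; the cores are random and indices are 0-based.

```python
import numpy as np

def tt_element(cores, index):
    """Entry of a TT-tensor as the product of core slices (eq. 32); index is 0-based."""
    res = np.ones((1, 1))
    for G, n in zip(cores, index):
        res = res @ G[:, n, :]
    return res[0, 0]

def tt_full(cores):
    """Dense tensor obtained by contracting all TT-cores (eq. 31); small sizes only."""
    res = cores[0]
    for G in cores[1:]:
        res = np.tensordot(res, G, axes=([-1], [0]))
    return res[0, ..., 0]

rng = np.random.default_rng(0)
shape, ranks = (4, 5, 6, 7), (1, 2, 3, 2, 1)
cores = [rng.standard_normal((ranks[k], shape[k], ranks[k + 1])) for k in range(4)]

full = tt_full(cores)
value = tt_element(cores, (1, 2, 3, 4))
n_tt = sum(G.size for G in cores)     # memory cells in the TT-format: 88
n_full = full.size                    # memory cells in the dense format: 840
```

Evaluating one entry costs a chain of d small matrix products, and no part of the dense tensor is needed.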

The detailed description of the TT-format and linear algebra operations in terms of this format⁸ is given in works (Oseledets and Tyrtyshnikov, 2009; Oseledets, 2011). It is important to note that for a given tensor $\hat{ℛ}$ in the full format, the TT-decomposition (compression) can be performed by a stable TT-SVD algorithm. This algorithm constructs an approximation $ℛ$ in the TT-format to the given tensor $\hat{ℛ}$ with a prescribed accuracy $ϵ_{TT}$ in the Frobenius norm⁹

{‖ ℛ - \hat{ℛ} ‖}_{F} \leq ϵ_{TT} \cdot {‖ \hat{ℛ} ‖}_{F}, (34)

but a procedure of the tensor approximation in the full format is too costly, and is even impossible for large dimensions due to the curse of dimensionality. Therefore more efficient algorithms like CAM are needed to quickly construct the tensor in the low rank TT-format.
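For reference, the TT-SVD compression of a small dense tensor can be sketched as a sequence of truncated SVDs of unfoldings. This is an illustrative numpy version with hypothetical names `tt_svd` and `tt_full`, not the ttpy implementation; the per-step truncation threshold $ϵ {‖ \hat{ℛ} ‖}_{F} / \sqrt{d - 1}$ is the standard choice that yields the bound of eq. 34.

```python
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """TT-cores of a dense tensor (d >= 2) with relative Frobenius error at most eps."""
    shape, d = tensor.shape, tensor.ndim
    delta = eps * np.linalg.norm(tensor) / np.sqrt(d - 1)
    cores, r_prev = [], 1
    mat = np.asarray(tensor, dtype=float)
    for k in range(d - 1):
        mat = mat.reshape(r_prev * shape[k], -1)
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        # tail[i] is the norm of the singular values s[i:]; keep the smallest
        # rank whose discarded tail does not exceed delta.
        tail = np.sqrt(np.cumsum(s[::-1] ** 2))[::-1]
        r = max(1, int(np.sum(tail > delta)))
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        mat = s[:r, None] * Vt[:r]
        r_prev = r
    cores.append(mat.reshape(r_prev, shape[-1], 1))
    return cores

def tt_full(cores):
    """Dense tensor from TT-cores, used here only to check the accuracy."""
    res = cores[0]
    for G in cores[1:]:
        res = np.tensordot(res, G, axes=([-1], [0]))
    return res[0, ..., 0]

x = np.linspace(-1.0, 1.0, 12)
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
T = np.exp(-(X**2 + Y**2 + Z**2)) + np.sin(X + Y + Z)

cores = tt_svd(T, eps=1e-10)
ranks = [G.shape[2] for G in cores[:-1]]
rel_err = np.linalg.norm(tt_full(cores) - T) / np.linalg.norm(T)
```

The test tensor is the sum of a separable Gaussian (TT-ranks 1) and $sin (x + y + z)$ (TT-ranks 2), so its TT-ranks cannot exceed 3. Note that the algorithm touches every entry of the dense tensor, which is exactly the cost the CAM avoids.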

3.2 Cross Approximation Method

The CAM allows one to construct a TT-approximation of the tensor with prescribed accuracy $ϵ_{CA}$ , using only part of the full tensor elements. This method is a multi-dimensional analogue of the simple cross approximation method for matrices (Tyrtyshnikov, 2000), which allows one to approximate large matrices in $O (N_{0} R^{2})$ time by computing only $O (N_{0} R)$ elements, where $N_{0}$ is an average size of the matrix modes and R is the rank of the matrix. The CAM and the TT-format can significantly speed up the computation and reduce the amount of consumed memory, as will be illustrated in the next sections on the solution of the model equations.

The CAM constructs a TT-approximation $ℛ$ to the tensor $\hat{ℛ}$ , given as a function $f (n_{1}, n_{2}, \dots, n_{d})$ that returns the $(n_{1}, n_{2}, \dots, n_{d})$ th entry of $\hat{ℛ}$ for a given set of indices. This method requires only $O (d \times {max}_{1 \leq k \leq d} (N_{k} R_{k}^{3}))$ operations for the construction of the approximation with a prescribed accuracy $ϵ_{CA}$ , where $R_{0}, R_{1}, \dots, R_{d}$ ( $R_{0} = R_{d} = 1$ ) are TT-ranks of the tensor $ℛ$ [see detailed discussion of the CAM in (Oseledets and Tyrtyshnikov, 2010)]. It should be noted that TT-ranks can depend on the value of the selected accuracy $ϵ_{CA}$ , but for a wide class of practically interesting tasks the TT-ranks are bounded or depend polylogarithmically on $ϵ_{CA}$ [see (Oseledets, 2010; Oseledets, 2011) for more details and examples]. Algorithm 1 describes the construction of the tensor in the TT-format on the Chebyshev grid by the CAM (we refer to it as the function $cross_{X} (\cdot)$ below). We prepare a function func, which transforms the given indices into spatial grid points and returns an array of the corresponding values of the target $r (\cdot)$ . Then this function is passed as an argument to the standard rank adaptive method $tt_rectcross$ from the ttpy package. The CAM is described in more detail in the original papers (Oseledets and Tyrtyshnikov, 2010; Savostyanov and Oseledets, 2011), as well as in a recent work (Dolgov and Savostyanov, 2020), which formulates a computationally efficient parallel implementation of the algorithm.
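The idea behind the CAM is easiest to see in the matrix case mentioned above. The sketch below is a simple adaptive cross approximation with partial pivoting, written for illustration only: it is not the multidimensional TT algorithm and not the ttpy routine, the function name `cross_matrix` and its stopping rule are assumptions of this example, and the test kernel is an arbitrary smooth function.

```python
import numpy as np

def cross_matrix(entry, n_rows, n_cols, eps=1e-10, max_rank=40):
    """Adaptive cross approximation A ~ U @ V built from a few rows and columns.

    entry(i, j) must accept index arrays and return the matching entries of A.
    """
    rows, cols = np.arange(n_rows), np.arange(n_cols)
    U, V = np.zeros((n_rows, 0)), np.zeros((0, n_cols))
    used, i = set(), 0
    for _ in range(max_rank):
        used.add(i)
        row = entry(i, cols) - U[i] @ V               # residual of row i
        j = int(np.argmax(np.abs(row)))
        if abs(row[j]) < 1e-15:
            break
        col = entry(rows, j) - U @ V[:, j]            # residual of column j
        U = np.hstack([U, (col / row[j])[:, None]])
        V = np.vstack([V, row[None, :]])
        norm_term = np.linalg.norm(U[:, -1]) * np.linalg.norm(V[-1])
        norm_all = np.sqrt(np.sum((U.T @ U) * (V @ V.T)))   # Frobenius norm of U @ V
        if norm_term <= eps * norm_all:
            break
        candidates = np.abs(U[:, -1]).copy()
        candidates[list(used)] = -1.0                 # never reuse a pivot row
        i = int(np.argmax(candidates))
    return U, V

x = np.linspace(0.0, 1.0, 200)
y = np.linspace(0.0, 1.0, 150)
entry = lambda i, j: 1.0 / (1.0 + x[i] + y[j])
U, V = cross_matrix(entry, x.size, y.size)

A = 1.0 / (1.0 + x[:, None] + y[None, :])             # dense matrix, used only for checking
rel_err = np.linalg.norm(U @ V - A) / np.linalg.norm(A)
rank = U.shape[1]
```

Each step reads one row and one column, so the number of evaluated entries is about `rank * (n_rows + n_cols)` instead of `n_rows * n_cols`. The stopping rule is heuristic, since the true error is never computed.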

3.3 Multidimensional Interpolation

As was discussed in the previous sections, we discretize the FPE on the multidimensional Chebyshev grid and interpolate solution of the first diffusion equation in the splitting scheme eq. 13 by the Chebyshev polynomials to obtain its values on custom spatial points (different from the grid nodes) and then perform efficient trajectory integration of the convection eq. 14.

The desired interpolation may be constructed from the solution of the system of eq. 22 in terms of the FFT (Trefethen, 2000), but for a large number of dimensions the computational complexity and memory consumption grow exponentially. Hence it is natural to construct the tensor of the nodal values and the corresponding interpolation coefficients in the TT-format.

Consider a TT-tensor $ℛ \in ℝ^{N_{1} \times N_{2} \times \dots \times N_{d}}$ with the list of TT-cores $[G_{1}, G_{2}, \dots G_{d}]$ , which collects PDF values on the nodes of the Chebyshev grid at some time step (the related function is $r (x)$ , and this tensor is obtained, for example, by the CAM or according to TT-SVD procedure from the tensor in the full format). Then the corresponding TT-tensor $A \in ℝ^{N_{1} \times N_{2} \times \dots \times N_{d}}$ of interpolation coefficients with the TT-cores $[{\tilde{G}}_{1}, {\tilde{G}}_{2}, \dots {\tilde{G}}_{d}]$ can be constructed according to the scheme, which is presented in Algorithm 2 [we’ll call it as a function $interpolate (\cdot)$ below].

In this Algorithm we use the standard linear algebra operations $swapaxes$ and $reshape$ , which rearrange the axes and change the shape of the given tensor respectively, the function $fft$ , which computes the one-dimensional FFT of the given vector, and the function $tt_round$ from the ttpy package, which rounds the given tensor to the prescribed accuracy ϵ. Note that the inner loop in Algorithm 2 for $r^{*}$ may be replaced by the vectorized computation of the corresponding two-dimensional FFT.
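The structure of this construction can be sketched without the FFT. Since the map from nodal values to Chebyshev coefficients acts on each mode separately, it can be applied to the middle index of every TT-core. The sketch below is a simplified stand-in for Algorithm 2 under stated assumptions: the cores are plain numpy arrays, the per-mode transform is an explicit inverse of the Chebyshev Vandermonde matrix rather than an FFT, no rounding is performed, and the names `interpolate_tt` and `tt_full` are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def interpolate_tt(cores):
    """TT-cores of the coefficient tensor A from TT-cores of the nodal values."""
    new_cores = []
    for G in cores:
        N = G.shape[1]
        nodes = np.cos(np.pi * np.arange(N) / (N - 1))
        P = np.linalg.inv(cheb.chebvander(nodes, N - 1))   # values -> coefficients
        new_cores.append(np.einsum("jn,rns->rjs", P, G))   # TT-ranks are unchanged
    return new_cores

def tt_full(cores):
    """Dense tensor from TT-cores, used here only for checking."""
    res = cores[0]
    for G in cores[1:]:
        res = np.tensordot(res, G, axes=([-1], [0]))
    return res[0, ..., 0]

# Nodal values of r(x) = g(x1) g(x2) g(x3) + s(x1) s(x2) s(x3) as a rank-2 TT-tensor.
N = 20
x = np.cos(np.pi * np.arange(N) / (N - 1))
g, s = np.exp(-x**2), np.sin(x)
G1 = np.stack([g, s], axis=1)[None, :, :]                       # shape (1, N, 2)
G2 = np.zeros((2, N, 2)); G2[0, :, 0] = g; G2[1, :, 1] = s      # shape (2, N, 2)
G3 = np.stack([g, s], axis=0)[:, :, None]                       # shape (2, N, 1)

A = tt_full(interpolate_tt([G1, G2, G3]))                       # dense coefficients

pts = np.random.default_rng(2).uniform(-1.0, 1.0, size=(3, 50))
approx = cheb.chebval3d(pts[0], pts[1], pts[2], A)
exact = np.exp(-np.sum(pts**2, axis=0)) + np.prod(np.sin(pts), axis=0)
err = np.max(np.abs(approx - exact))
```

The cost is one small transform per core, so it grows linearly with d for bounded TT-ranks.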

For the known tensor $A$ we can perform a fast computation of the function value at any given spatial point $x = [x_{1}, x_{2}, \dots, x_{d}]^{⊤}$ by a matrix product of the convolutions of the TT-cores of $A$ with appropriate column vectors of Chebyshev polynomials

\begin{matrix} r (x) \approx \sum_{r_{1} = 1}^{R_{1}} \sum_{r_{2} = 1}^{R_{2}} \dots \sum_{r_{d - 1} = 1}^{R_{d - 1}} (\sum_{n_{1} = 1}^{N_{1}} {\tilde{G}}_{1} [1, n_{1}, r_{1}] T_{n_{1} - 1} (x_{1})) \\ (\sum_{n_{2} = 1}^{N_{2}} {\tilde{G}}_{2} [r_{1}, n_{2}, r_{2}] T_{n_{2} - 1} (x_{2})) \dots (\sum_{n_{d} = 1}^{N_{d}} {\tilde{G}}_{d} [r_{d - 1}, n_{d}, 1] T_{n_{d} - 1} (x_{d})), \end{matrix} (35)

We’ll call the corresponding function as $inter_eval (A, X)$ below. This function constructs a list of $r (\cdot)$ values for the given set of I points $X \in ℝ^{d \times I}$ ( $I \geq 1$ ), using interpolation coefficients $A$ and sequentially applying the formula eq. 35 for each spatial point.
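A possible numpy realization of eq. 35 is sketched below. It is an assumption about how such a function could look rather than the code of the solver: the TT-cores are plain numpy arrays, and the result is checked against a dense evaluation with `chebval3d` for random coefficient cores.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def inter_eval(cores, X):
    """Evaluate eq. 35 at the I points stored as columns of X (shape d x I)."""
    n_points = X.shape[1]
    res = np.ones((n_points, 1))
    for k, G in enumerate(cores):
        T = cheb.chebvander(X[k], G.shape[1] - 1)     # T[i, n] = T_n(x_k of point i)
        M = np.einsum("in,rns->irs", T, G)            # one small matrix per point
        res = np.einsum("ir,irs->is", res, M)         # running matrix product
    return res[:, 0]

rng = np.random.default_rng(3)
shape, ranks = (5, 6, 7), (1, 2, 3, 1)
cores = [rng.standard_normal((ranks[k], shape[k], ranks[k + 1])) for k in range(3)]
X = rng.uniform(-1.0, 1.0, size=(3, 10))
values = inter_eval(cores, X)

# Dense check: contract the cores into the full coefficient tensor first.
A = cores[0]
for G in cores[1:]:
    A = np.tensordot(A, G, axes=([-1], [0]))
A = A[0, ..., 0]
err = np.max(np.abs(values - cheb.chebval3d(X[0], X[1], X[2], A)))
```

All points are processed at once, which matches the vectorized initial conditions used for the ODE solver.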

4 Detailed Algorithms

In Algorithms 3, 4 and 5 we combine the theoretical details discussed in the previous sections of this work and present the final calculation scheme for the solution of the multidimensional FPE in the TT-format, using the CAM (function $cross_{X}$ , Algorithm 1)¹⁰ and interpolation by the Chebyshev polynomials (function $interpolate$ from Algorithm 2, which constructs interpolation coefficients, and function $inter_eval$ , which evaluates the interpolation result at given points according to the formula eq. 35).

We denote by $einsum$ the standard linear algebra operation that evaluates the Einstein summation convention on the operands (see, for example, the numpy python package). The function $vstack$ stacks arrays in sequence vertically, and the function $ode_solve (rhs, t_{1}, t_{2}, Y_{0})$ (where $t_{1}$ and $t_{2}$ are the initial and final times, $rhs$ is the right hand side of the equations, and the matrix $Y_{0}$ collects the initial conditions) solves a system of ODEs with vectorized initial condition by one step of the fourth order Runge-Kutta method.

5 Numerical Examples

In this section we illustrate the proposed computational scheme, which was presented above, with the numerical experiments. All calculations were carried out in the Google Colab cloud interface¹¹ with the standard configuration (without GPU support).

Firstly we consider an equation with a linear convection term—Ornstein-Uhlenbeck process (OUP) (Vatiwutipong and Phewchean, 2019) in one, three and five dimensions. For the one-dimensional case, which is presented for convention, we only solve equation using the dense format (not TT-format), hence the corresponding results are used to verify the general correctness and convergence properties of the proposed algorithm, but not its efficiency. In the case of the multivariate problems we use the proposed tensor based solver, which operates in accordance with the algorithm described above. To check the results of our computations, we use the known analytic stationary solution for the OUP, and for the one-dimensional case we also perform comparison with constructed analytic solution at any time moment.

Then we consider the more complicated dumbbell problem (Venkiteswaran and Junk, 2005), which may be represented as a three-dimensional FPE with a nonlinear convection term. For this case we consider the Kramer expression and compare our computation results with the results from other works for the same problem.

In the numerical experiments we consider the spatial region $Ω$ such that the PDF almost vanishes on the boundaries, $ρ (x, t) |_{\partial Ω} \approx 0$ , and the initial condition is selected in the form of the Gaussian function

ρ (x, 0) = ρ_{0} (x) = {(2 π s)}^{- \frac{d}{2}} exp [- \frac{1}{2 s} {‖ x ‖}^{2}], s \in ℝ, s > 0, (36)

where the parameter s is selected as $s = 1$ . To estimate the accuracy of the obtained PDF ( $ρ$ ) we use the relative 2-norm of the deviation from the exact value ( $ρ_{exact}$ )

e = \frac{{‖ ρ - ρ_{exact} ‖}_{2}}{{‖ ρ_{exact} ‖}_{2}} . (37)

We compute the value $ρ_{exact}$ through the given function, using a CAM with an accuracy parameter two orders of magnitude higher than the one that was used in the solver.

5.1 Numerical Solution of the Ornstein-Uhlenbeck Process

Consider FPE of the form eq. 6 in the d-dimensional case with

f (x, t) = A (μ - x (t)), χ = \frac{1}{2}, x \in Ω = {[x_{min}, x_{max}]}^{d}, t \in [0, τ], (38)

where $A \in ℝ^{d \times d}$ is an invertible real matrix, $μ \in ℝ^{d}$ is the long-term mean, $x_{min} \in ℝ$ and $x_{max} \in ℝ$ ( $x_{min} < x_{max}$ ), $τ \in ℝ$ ( $τ > 0$ ). This equation is the well known multivariate OUP with the following properties [see for example (Singh et al., 2018; Vatiwutipong and Phewchean, 2019)]:

• mean vector is

M (t, x_{0}) = e^{- A t} x_{0} + (I_{d} - e^{- A t}) μ; (39)

• covariance matrix is

Σ (t) = \int_{0}^{t} e^{A (s - t)} S S^{⊤} e^{A^{⊤} (s - t)} d s, (40)

and, in our case as noted above $S = \sqrt{2 χ} I_{d}$ ;

• transitional PDF is

ρ (x, t, x_{0}) = \frac{exp [- \frac{1}{2} {(x - M (t, x_{0}))}^{⊤} Σ^{- 1} (t) (x - M (t, x_{0}))]}{\sqrt{| 2 π Σ (t) |}}; (41)

• stationary solution is

ρ_{st} (x) = \frac{exp [- \frac{1}{2} x^{⊤} W^{- 1} x]}{\sqrt{{(2 π)}^{d} det (W)}}, (42)

where matrix $W \in ℝ^{d \times d}$ can be found from the following equation

A W + W A^{⊤} = 2 χ I_{d}; (43)

• the (multivariate) OUP at any time is a (multivariate) normal random variable;

• the OUP is mean-reverting (the solution tends to its long-term mean $μ$ as time t tends to infinity) if all eigenvalues of A are positive (if $A > 0$ in the one-dimensional case).
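The properties listed above can be checked numerically. The sketch below evaluates the mean eq. 39, the covariance eq. 40 (by adaptive quadrature) and the matrix $W$ of eq. 43 with SciPy, and confirms that the covariance at a large time is close to $W$. The helper names and the $2 \times 2$ test matrix are hypothetical and are not taken from the paper.

```python
import numpy as np
from scipy.integrate import quad_vec
from scipy.linalg import expm, solve_continuous_lyapunov


def oup_mean(A, mu, x0, t):
    """Mean vector M(t, x0) of eq. 39."""
    decay = expm(-A * t)
    return decay @ x0 + (np.eye(A.shape[0]) - decay) @ mu


def oup_cov(A, chi, t):
    """Covariance matrix of eq. 40 with S = sqrt(2 chi) I, by quadrature."""
    SS = 2.0 * chi * np.eye(A.shape[0])

    def integrand(s):
        E = expm(A * (s - t))
        return E @ SS @ E.T  # exp(A^T (s - t)) is the transpose of E

    value, _ = quad_vec(integrand, 0.0, t)
    return value


def oup_stationary_cov(A, chi):
    """Matrix W of eq. 43: A W + W A^T = 2 chi I."""
    return solve_continuous_lyapunov(A, 2.0 * chi * np.eye(A.shape[0]))


# Hypothetical 2x2 example with positive eigenvalues (mean-reverting case).
A = np.array([[1.0, 0.5], [0.0, 2.0]])
mu = np.array([0.3, -0.2])
x0 = np.array([1.0, 1.0])
W = oup_stationary_cov(A, chi=0.5)
cov_late = oup_cov(A, chi=0.5, t=20.0)
mean_late = oup_mean(A, mu, x0, t=20.0)
```

In this example `cov_late` agrees with `W` and `mean_late` agrees with `mu`, as expected for a mean-reverting process.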

5.1.1 One-Dimensional Process

Let us consider the one-dimensional ( $d = 1$ ) OUP with

A = 1, μ = 0, x_{m i n} = - 5, x_{m a x} = 5, τ = 10. (44)

We can calculate the analytic solution in terms of only spatial variable and time via integration of the transitional PDF eq. 41

ρ (x, t) = \int_{- \infty}^{\infty} ρ (x, t, x_{0}) ρ_{0} (x_{0}) d x_{0} . (45)

Carrying out the integration leads to the following formula

ρ (x, t) = \frac{1}{\sqrt{2 π (Σ (t) + s e^{- 2 A t})}} exp [- \frac{x^{2}}{2 (Σ (t) + s e^{- 2 A t})}], (46)

where $Σ (t)$ is defined by eq. 40 and for the one-dimensional case may be represented in the form

Σ (t) = \frac{1 - e^{- 2 A t}}{2 A} . (47)

Using the formulas eq. 42 and eq. 43 we can represent a stationary solution for the one-dimensional case in the explicit form

ρ_{s t a t} (x) = \sqrt{\frac{A}{π}} e^{- A x^{2}} . (48)
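The closed-form expressions above can be cross-checked numerically. The following sketch integrates eq. 45 with `scipy.integrate.quad` and compares the result with eq. 46, and also compares eq. 46 at the final time with the stationary solution eq. 48. The parameter values follow eq. 44 and $s = 1$ ; the function names and the test point are assumptions made for illustration.

```python
import numpy as np
from scipy.integrate import quad

A, s = 1.0, 1.0  # A from eq. 44, s from eq. 36; mu = 0


def sigma(t):
    """Variance of the transitional PDF, eq. 47."""
    return (1.0 - np.exp(-2.0 * A * t)) / (2.0 * A)


def transitional(x, t, x0):
    """Transitional PDF of eq. 41 for d = 1 and mu = 0."""
    mean = np.exp(-A * t) * x0
    return np.exp(-((x - mean) ** 2) / (2.0 * sigma(t))) / np.sqrt(2.0 * np.pi * sigma(t))


def rho0(x0):
    """Initial condition of eq. 36 for d = 1."""
    return np.exp(-(x0**2) / (2.0 * s)) / np.sqrt(2.0 * np.pi * s)


def rho_analytic(x, t):
    """Closed-form solution of eq. 46."""
    var = sigma(t) + s * np.exp(-2.0 * A * t)
    return np.exp(-(x**2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def rho_by_quadrature(x, t):
    """Numerical evaluation of the integral in eq. 45."""
    value, _ = quad(lambda x0: transitional(x, t, x0) * rho0(x0), -np.inf, np.inf)
    return value


def rho_stationary(x):
    """Stationary solution of eq. 48."""
    return np.sqrt(A / np.pi) * np.exp(-A * x**2)


diff_quadrature = abs(rho_by_quadrature(0.7, 0.5) - rho_analytic(0.7, 0.5))
diff_stationary = abs(rho_analytic(0.7, 10.0) - rho_stationary(0.7))
```

Both differences are small in this example, which is consistent with eq. 46 being the result of the integration in eq. 45 and with eq. 48 being its long-time limit.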

We perform the computation for $N_{1} = 50$ spatial points and $M = 1000$ time points and compare the numerical solution with the known analytic solution eq. 46 and the stationary solution eq. 48. Figure 2 presents the corresponding result. Over time, the error of the numerical solution relative to the analytical solution first increases slightly, and then stabilizes at approximately $10^{- 5}$ . At the same time, the numerical solution approaches the stationary one, and the corresponding error at large times also becomes approximately $10^{- 5}$ . Note that the time to build the solution was about 5 s.

FIGURE 2. Relative error of the calculated solution vs known analytic and stationary solutions for the one-dimensional OUP.

5.1.2 Three-Dimensional Process

Our next example is the three-dimensional ( $d = 3$ ) OUP with the following parameters

A = [\begin{matrix} 1.5 & 1 & 0 \\ 0 & 1 & 0 \\ 0.5 & 0.3 & 1 \end{matrix}], μ = 0, x_{m i n} = - 5, x_{m a x} = 5, τ = 5. (49)

When carrying out the numerical calculation, we select $10^{- 4}$ as the accuracy of the CAM, 100 as the total number of time points and 30 as the number of points along each of the spatial dimensions. The computed result is compared with the stationary solution eq. 42, which was obtained by solving the related matrix eq. 43 with a standard solver for the Lyapunov equation.

The result is shown in Figure 3. As can be seen, the TT-rank¹² remains bounded, and the error relative to the stationary solution decreases over time, reaching $10^{- 3}$ by the time $t = 5$ . The time to build the solution was about 26 s.
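The effective TT-rank reported in the figures is defined in footnote 12 as the positive root of a quadratic equation. A minimal sketch of that computation is given below; the function name `effective_tt_rank` and its argument layout are hypothetical and are not taken from the fpcross code.

```python
import numpy as np


def effective_tt_rank(shape, ranks):
    """Effective TT-rank from footnote 12.

    shape: mode sizes N_1, ..., N_d (d >= 2).
    ranks: TT-ranks R_0, ..., R_d with R_0 = R_d = 1.
    """
    N = np.asarray(shape, dtype=float)
    R = np.asarray(ranks, dtype=float)
    d = len(N)
    if d < 2 or len(R) != d + 1 or R[0] != 1 or R[-1] != 1:
        raise ValueError("expected d >= 2 and ranks of length d + 1 with R_0 = R_d = 1")
    total = np.sum(N * R[:-1] * R[1:])  # number of parameters in the TT-cores
    a = np.sum(N[1:-1])                 # coefficient of R^2 (inner modes)
    b = N[0] + N[-1]                    # coefficient of R (boundary modes)
    if a == 0:                          # d = 2: the equation is linear
        return total / b
    return (-b + np.sqrt(b * b + 4.0 * a * total)) / (2.0 * a)
```

For example, a tensor of shape $30 \times 30 \times 30$ with TT-ranks $(1, 4, 4, 1)$ has an effective TT-rank of exactly 4, while non-uniform ranks generally give a non-integer value.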

FIGURE 3. Relative error of the calculated solution vs known stationary solution (A) and the effective TT-rank (B) for the three-dimensional OUP.

To evaluate the efficiency of the proposed algorithm in the TT-format, we also solve this three-dimensional OUP using the dense format (as in the one-dimensional case, all arrays are stored in full form). The corresponding calculation took about 376 s, so in this case the TT-format accelerates the calculation by more than an order of magnitude.
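To illustrate why the TT-format can be cheaper than the dense format, the sketch below samples the stationary solution eq. 42 for the matrix of eq. 49 on a $30 \times 30 \times 30$ Chebyshev grid and compresses it with a plain TT-SVD (Oseledets, 2011) written in NumPy. This is an assumption-laden illustration of storage only: it does not use the CAM or the ttpy package, it says nothing about the solver's run time, and the helper names `tt_svd` and `tt_full` are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def tt_svd(tensor, eps):
    """Plain TT-SVD with relative accuracy eps; returns a list of 3D cores."""
    shape, d = tensor.shape, tensor.ndim
    delta = eps * np.linalg.norm(tensor) / np.sqrt(d - 1)
    cores, r_prev, C = [], 1, tensor
    for k in range(d - 1):
        C = C.reshape(r_prev * shape[k], -1)
        U, S, Vt = np.linalg.svd(C, full_matrices=False)
        # tail[i] is the Frobenius norm of the singular values S[i:].
        tail = np.sqrt(np.cumsum(S[::-1] ** 2))[::-1]
        r = max(1, int(np.sum(tail > delta)))  # smallest rank with tail <= delta
        cores.append(U[:, :r].reshape(r_prev, shape[k], r))
        C = S[:r, None] * Vt[:r]
        r_prev = r
    cores.append(C.reshape(r_prev, shape[-1], 1))
    return cores


def tt_full(cores):
    """Contract the TT-cores back into a full array."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]


A = np.array([[1.5, 1.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.3, 1.0]])  # eq. 49
W = solve_continuous_lyapunov(A, np.eye(3))  # eq. 43 with chi = 1/2

n = 30
x = 5.0 * np.cos(np.pi * np.arange(n) / (n - 1))  # Chebyshev nodes on [-5, 5]
X = np.stack(np.meshgrid(x, x, x, indexing="ij"), axis=-1)
quad_form = np.einsum("...i,ij,...j->...", X, np.linalg.inv(W), X)
rho_st = np.exp(-0.5 * quad_form) / np.sqrt((2.0 * np.pi) ** 3 * np.linalg.det(W))

cores = tt_svd(rho_st, eps=1e-4)
tt_ranks = [core.shape[2] for core in cores[:-1]]
n_tt = sum(core.size for core in cores)  # numbers stored in the TT-format
n_dense = rho_st.size                    # numbers stored in the dense format
rel_err = np.linalg.norm(tt_full(cores) - rho_st) / np.linalg.norm(rho_st)
```

Comparing `n_tt` with `n_dense` shows the storage saving for this particular tensor at the chosen accuracy; the gap widens as the number of dimensions grows, provided the TT-ranks stay bounded.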

5.1.3 Five-Dimensional Process

This multidimensional case is considered in the same manner as the previous one. We select the following parameters

A = [\begin{matrix} 1.5 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0.5 & 0.3 & 0.2 & 0 & 1 \end{matrix}], μ = 0, x_{m i n} = - 5, x_{m a x} = 5, τ = 5. (50)

We select the same values as in the previous example for the CAM accuracy ( $10^{- 4}$ ), the number of time points (100) and the number of spatial points (30), and compare the result of the computation with the stationary solution from eq. 42 and eq. 43.

The results are presented in Figure 4. The effective TT-rank of the solution remains bounded and reaches the value 4.5 at the final time step, and the solution accuracy reaches almost $10^{- 3}$ . The time to build the solution was about 100 s.

FIGURE 4. Relative error of the calculated solution vs known stationary solution (A) and the effective TT-rank (B) for the five-dimensional OUP.

5.2 Numerical Solution of the Dumbbell Problem

Now consider a more complex non-linear example corresponding to the three-dimensional ( $d = 3$ ) dumbbell model of the form eq. 6 with¹³

f (x, t) = A x - \frac{1}{2} \nabla ϕ, A = β [\begin{matrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}], ϕ = \frac{| | x | |^{2}}{2} + \frac{α}{p^{3}} e^{- \frac{| | x | |^{2}}{2 p^{2}}}, (51)

where

χ = \frac{1}{2}, x \in Ω = {[- 10,10]}^{3}, t \in [0,10], α = 0.1, β = 1, p = 0.5. (52)

Making simple calculations (taking into account the specific form of the matrix A), we get explicit expressions for the function and the required partial derivatives ( $k = 1,2,3$ )

f = β [\begin{matrix} x_{2} \\ 0 \\ 0 \end{matrix}] - \frac{1}{2} x + \frac{α}{2 p^{5}} e^{- \frac{| | x | |^{2}}{2 p^{2}}} x, (53)

\frac{\partial f_{k}}{\partial x_{k}} = - \frac{1}{2} + \frac{α}{2 p^{5}} e^{- \frac{| | x | |^{2}}{2 p^{2}}} - \frac{α}{2 p^{7}} e^{- \frac{| | x | |^{2}}{2 p^{2}}} x_{k}^{2} . (54)
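As a sanity check of eq. 53 and eq. 54, the sketch below implements both expressions with NumPy for the parameters of eq. 52 and compares the analytic diagonal derivatives with central finite differences. The function names and the batch layout (one point per row) are assumptions made for illustration; the solver's actual interface may differ.

```python
import numpy as np

ALPHA, BETA, P = 0.1, 1.0, 0.5  # parameters from eq. 52


def f(x):
    """Convection term of eq. 53; x has shape (n_points, 3)."""
    x = np.atleast_2d(x)
    g = np.exp(-np.sum(x**2, axis=1, keepdims=True) / (2.0 * P**2))
    out = -0.5 * x + ALPHA / (2.0 * P**5) * g * x
    out[:, 0] += BETA * x[:, 1]  # contribution of A x with A from eq. 51
    return out


def df_diag(x):
    """Partial derivatives d f_k / d x_k of eq. 54; returns shape (n_points, 3)."""
    x = np.atleast_2d(x)
    g = np.exp(-np.sum(x**2, axis=1, keepdims=True) / (2.0 * P**2))
    return -0.5 + ALPHA / (2.0 * P**5) * g - ALPHA / (2.0 * P**7) * g * x**2


def df_diag_fd(x, h=1e-6):
    """Central finite-difference approximation of d f_k / d x_k."""
    x = np.atleast_2d(x)
    out = np.empty_like(x, dtype=float)
    for k in range(3):
        step = np.zeros(3)
        step[k] = h
        out[:, k] = (f(x + step)[:, k] - f(x - step)[:, k]) / (2.0 * h)
    return out


points = np.random.default_rng(0).uniform(-2.0, 2.0, size=(20, 3))
max_diff = np.max(np.abs(df_diag(points) - df_diag_fd(points)))
```

The two evaluations agree to within the accuracy of the finite-difference approximation at the sampled points.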

Next, we consider the Kramer expression

τ (t) = \int ρ (x, t) [x \otimes \nabla ϕ] d x, (55)

and as the values of interest (as in the works (Venkiteswaran and Junk, 2005; Dolgov et al., 2012)) we select

ψ (t) = \frac{τ_{11} (t) - τ_{22} (t)}{β^{2}} = \frac{1}{β^{2}} \int ρ (x, t) (x_{1} \frac{\partial ϕ}{\partial x_{1}} - x_{2} \frac{\partial ϕ}{\partial x_{2}}) d x, (56)

η (t) = \frac{τ_{12} (t)}{β} = \frac{1}{β} \int ρ (x, t) x_{1} \frac{\partial ϕ}{\partial x_{2}} d x . (57)
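The quantities above are integrals of the density against $x \otimes \nabla ϕ$ , as in eq. 55. The sketch below evaluates them on a dense uniform grid for a given density array, using a simple Riemann sum (which coincides with the trapezoidal rule when the density vanishes on the boundary). It is tested on the isotropic Gaussian of eq. 36, for which $ψ$ and $η$ vanish by symmetry. The uniform grid, the dense storage and the function names are assumptions made for illustration; in the paper the density is kept in the TT-format on a Chebyshev grid.

```python
import numpy as np

ALPHA, BETA, P = 0.1, 1.0, 0.5  # parameters from eq. 52


def grad_phi(X):
    """Gradient of the potential phi from eq. 51; X has shape (..., 3)."""
    g = np.exp(-np.sum(X**2, axis=-1, keepdims=True) / (2.0 * P**2))
    return X * (1.0 - ALPHA / P**5 * g)


def kramer_tensor(rho, X, h):
    """3x3 tensor of eq. 55 by a Riemann sum on a uniform grid with step h."""
    G = grad_phi(X)
    return np.einsum("abc,abci,abcj->ij", rho, X, G) * h**3


# Uniform grid on [-10, 10]^3 (a stand-in for the solver's grid).
n = 81
x = np.linspace(-10.0, 10.0, n)
h = x[1] - x[0]
X = np.stack(np.meshgrid(x, x, x, indexing="ij"), axis=-1)

# Test density: the Gaussian initial condition of eq. 36 with s = 1.
rho = (2.0 * np.pi) ** (-1.5) * np.exp(-0.5 * np.sum(X**2, axis=-1))

tau = kramer_tensor(rho, X, h)
psi = (tau[0, 0] - tau[1, 1]) / BETA**2  # eq. 56
eta = tau[0, 1] / BETA                   # eq. 57

# Closed-form value of tau_11 for this particular test density.
tau11_exact = 1.0 - ALPHA / P**5 * (P**2 / (1.0 + P**2)) ** 2.5
```

For this symmetric test density both `psi` and `eta` are zero up to rounding, and `tau[0, 0]` matches `tau11_exact`; nonzero values of $ψ$ and $η$ appear only once the shear term $A x$ has deformed the density.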

During the calculations we used the following solver parameters:

• the accuracy of the CAM is $10^{- 5}$ ;

• the number of time grid points is 100;

• the number of grid points along each of the spatial dimensions is 60.

The results are presented in Figure 5. The time to build the solution was about 200 s (additional time was also required to calculate the values $ψ (t)$ and $η (t)$ from eq. 56 and eq. 57 respectively). As can be seen, the TT-rank remains bounded, and its stationary value is about 8. We compared the obtained stationary values of the $ψ (t)$ and $η (t)$ variables,

ψ (t = 10) = 2.0707, η (t = 10) = 1.0318, (58)

with the corresponding results from (Dolgov et al., 2012)¹⁴, and obtained the following relative errors

ϵ_{ψ} = 1.9 \times 10^{- 4}, ϵ_{η} = 9.7 \times 10^{- 4} . (59)

FIGURE 5. Computed values (A) and the effective TT-rank (B) for the three-dimensional dumbbell problem.

6 Related Works

The problem of uncertainty propagation through nonlinear dynamical systems subject to stochastic excitation is given by the FPE, which describes the evolution of the PDF, and has been extensively studied in the literature. A number of numerical methods such as the path integral technique (Wehner and Wolfer, 1983; Subramaniam and Vedula, 2017), the finite difference and the finite element method (Kumar and Narayanan, 2006; Pichler et al., 2013) have been proposed to solve the FPE.

These methods inevitably require a mesh or associated transformations, which increase the amount of computation. The problem becomes worse when the system dimension increases. To maintain accuracy in traditional discretization-based numerical methods, the number of degrees of freedom of the approximation, i.e. the number of unknowns, grows exponentially as the dimensionality of the underlying state space increases.

On the other hand, the Monte Carlo method, which is common for this kind of problem (Kikuchi et al., 1991; Küchlin and Jenny, 2017), has a slow rate of convergence, causing it to become computationally burdensome as the underlying dimensionality increases. Hence, the so-called curse of dimensionality fundamentally limits the use of the FPE for uncertainty quantification in high-dimensional systems.

In recent years, low-rank tensor approximations have become especially popular for solving multidimensional problems in various fields of knowledge (Cichocki et al., 2016). However, for the FPE, this approach is not yet widely used. We note the works (Dolgov et al., 2012; Sun and Kumar, 2014; Sun and Kumar, 2015; Dolgov, 2019; Fox et al., 2020) in which the low-rank TT-decomposition was proposed for the solution of the multidimensional FPE. In these works, the differential operator and the right-hand side of the system are represented in the form of a TT-tensor. Moreover, in the paper (Dolgov et al., 2012) the joint discretization of the solution in space-time is considered. Our approach differs from these works in its more explicit iterative form for time integration, as well as in the absence of the need to represent the right-hand side of the system in a low-rank format, which allows this approach to be used in machine learning applications.

7 Conclusion

In this paper we proposed a novel numerical scheme for the solution of the multidimensional Fokker–Planck equation, which is based on the Chebyshev interpolation and spectral differentiation techniques as well as low-rank tensor approximations, namely, the tensor train decomposition and the cross approximation method, which in combination make it possible to drastically reduce the number of degrees of freedom required to maintain accuracy as dimensionality increases.

The proposed approach can be used for the numerical analysis of uncertainty propagation through nonlinear dynamical systems subject to stochastic excitations, and we demonstrated its effectiveness on a number of multidimensional problems, including the Ornstein-Uhlenbeck process and the dumbbell model.

As part of the further development of this work, we plan to derive more rigorous estimates of the convergence of the proposed scheme, as well as to formulate a set of heuristics for the optimal choice of the number of time and spatial grid points and of the tensor train rank. Another promising direction for further research is the application of the established approaches and the developed solver to the problem of density estimation for machine learning models.

Data Availability Statement

Program code, input data and calculation results can be found here: https://github.com/AndreiChertkov/fpcross

Author Contributions

IO contributed to general formulation of the problem and formulation of the preliminary version of the algorithm. AC contributed to development of the final version of algorithms and program code, carrying out numerical experiments.

Funding

The work was supported by Ministry of Science and Higher Education grant No. 075-10-2021-068.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

¹Vectors and matrices are denoted hereinafter by lower case bold letters ( $a, b, c, \dots$ ) and upper case letters ( $A, B, C, \dots$ ) respectively. We denote the $(i_{1}, i_{2})$ th element of an $N_{1} \times N_{2}$ matrix A as $A [i_{1}, i_{2}]$ and assume that $1 \leq i_{1} \leq N_{1}$ , $1 \leq i_{2} \leq N_{2}$ . For vectors we use the same notation: $a [i]$ is the i-th element of the vector $a$ ( $i = 1,2, \dots, N$ ). In addition, for a compact representation of the i-th ( $i = 1,2, \dots, d$ , where $d \geq 1$ ) element of a vector function $f = {[f_{1}, f_{2}, \dots, f_{d}]}^{⊤}$ , we will use the notation $f_{i}$ , which means $f_{i} (\cdot) = f (\cdot) [i]$ .

²The code is publicly available from https://github.com/AndreiChertkov/fpcross.

³We use notation $I_{k}$ for the $k \times k$ ( $k = 1,2, \dots$ ) identity matrix.

⁴We suppose that for each spatial dimension the variable x varies within $[- 1, 1]$ . In other cases, an appropriate scaling can be easily applied.

⁵By tensors we mean multidimensional arrays with a number of dimensions d ( $d \geq 1$ ). A two-dimensional tensor ( $d = 2$ ) is a matrix, and when $d = 1$ it is a vector. For tensors with $d > 2$ we use upper case calligraphic letters ( $\mathcal{A}, \mathcal{B}, \mathcal{C}, \dots$ ). The $(n_{1}, n_{2}, \dots, n_{d})$ th entry of a d-dimensional tensor $\mathcal{A} \in ℝ^{N_{1} \times N_{2} \times \dots \times N_{d}}$ is denoted by $\mathcal{A} [n_{1}, n_{2}, \dots, n_{d}]$ , where $n_{k} = 1,2, \dots, N_{k}$ ( $k = 1,2, \dots, d$ ) and $N_{k}$ is the size of the k-th mode, and the mode-k slice of such a tensor is denoted by $\mathcal{A} [n_{1}, \dots, n_{k - 1}, :, n_{k + 1}, \dots, n_{d}]$ .

⁶Note that for the case $N_{1} = N_{2} = \dots = N_{d} \equiv N_{0}$ , we have only one matrix $D_{1} = D_{2} = \dots = D_{d} \equiv D_{0} \in ℝ^{N_{0} \times N_{0}}$ which greatly simplifies the computation process.

⁷By the full format tensor representation or uncompressed tensor we mean the case, when one calculates and saves in the memory all tensor elements. The number of elements of an uncompressed tensor (hence, the memory required to store it) and the amount of operations required to perform basic operations with such tensor grows exponentially in the dimensionality, and this problem is called the curse of dimensionality.

⁸All basic operations in the TT-format are implemented in the ttpy python package https://github.com/oseledets/ttpy and its MATLAB version https://github.com/oseledets/TT-Toolbox.

⁹An exact TT-representation exists for the given full tensor $\hat{ℛ}$ , and TT-ranks of such representation are bounded by ranks of the corresponding unfolding matrices (Oseledets, 2011). Nevertheless, in practical applications it is more useful to construct TT-approximation with a prescribed accuracy $ϵ_{T T}$ , and then carry out all operations (summations, products, etc) in the TT-format, maintaining the same accuracy $ϵ_{T T}$ of the result.

¹⁰Note that in Algorithm 4 we use the solution of the convection part from the previous time step ( $k - 1$ ) as an initial guess $W_{0}$ for CAM on the next time step (k). It may seem more logical to use the solution of the diffusion part from the same time step (k) as an initial guess, but we found empirically that it leads to higher TT-ranks of the result.

¹¹Actual links to the corresponding Colab notebooks are available in our public repository https://github.com/AndreiChertkov/fpcross.

¹²Hereinafter, we present effective TT-rank of the computation result. For TT-tensor $X \in ℝ^{N_{1} \times N_{2} \times \dots \times N_{d}}$ with TT-ranks $R_{0}, R_{1}, \dots, R_{d}$ ( $R_{0} = R_{d} = 1$ ) the effective TT-rank $\hat{R}$ is a solution of quadratic equation $N_{1} \hat{R} + \sum_{α = 2}^{d - 1} N_{α} {\hat{R}}^{2} + N_{d} \hat{R} = \sum_{α = 1}^{d} N_{α} R_{α - 1} R_{α} .$ The representation with a constant TT-rank $\hat{R}$ ( ${\hat{R}}_{0} = 1$ , ${\hat{R}}_{1} = {\hat{R}}_{2} \dots = {\hat{R}}_{d - 1} = \hat{R}$ , ${\hat{R}}_{d} = 1$ ) yields the same total number of parameters as in the original decomposition of the tensor $X$ .

¹³This choice of parameters corresponds to the problem of polymer modeling from the work (Venkiteswaran and Junk, 2005). In the corresponding model, the molecules of the polymer are represented by beads and interactions are indicated by connecting springs. Accordingly, for the case of only two particles we come to the dumbbell problem, which can be mathematically written in the form of the FPE.

¹⁴As values for comparison, we used the result of the most accurate calculation from work (Dolgov et al., 2012), within which $\hat{ψ} (t = 10) = 2.071143$ , and $\hat{η} (t = 10) = 1.0328125$ .

References

Chen, R. T., and Duvenaud, D. (2019). Neural Networks with Cheap Differential Operators. New York, NY: arXiv preprint arXiv:1912.03579.

Chen, T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. (2018). Neural Ordinary Differential Equations. Adv. Neural Inf. Process. Syst., 6571–6583.

Cichocki, A., Lee, N., Oseledets, I., Phan, A.-H., Zhao, Q., and Mandic, D. P. (2016). Tensor Networks for Dimensionality Reduction and Large-Scale Optimization: Part 1 Low-Rank Tensor Decompositions. FNT Machine Learn. 9, 249–429. doi:10.1561/2200000059

Dolgov, S., and Savostyanov, D. (2020). Parallel Cross Interpolation for High-Precision Calculation of High-Dimensional Integrals. Comp. Phys. Commun. 246, 106869. doi:10.1016/j.cpc.2019.106869

Dolgov, S. V. (2019). A Tensor Decomposition Algorithm for Large ODEs with Conservation Laws. Comput. Methods Appl. Math. 19, 23–38. doi:10.1515/cmam-2018-0023

Dolgov, S. V., Khoromskij, B. N., and Oseledets, I. V. (2012). Fast Solution of Parabolic Problems in the Tensor Train/Quantized Tensor Train Format with Initial Application to the Fokker–Planck Equation. SIAM J. Sci. Comput. 34, A3016–A3038. doi:10.1137/120864210

Fox, C., Dolgov, S., Morrison, M. E., and Molteno, T. C. (2020). Grid Methods for Bayes-Optimal Continuous-Discrete Filtering and Utilizing a Functional Tensor Train Representation. Inverse Probl. Sci. Eng., 1–19. doi:10.1080/17415977.2020.1862109

Glowinski, R., Osher, S. J., and Yin, W. (2017). Splitting Methods in Communication, Imaging, Science, and Engineering. Springer.

Grathwohl, W., Chen, R. T., Bettencourt, J., Sutskever, I., and Duvenaud, D. (2018). FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models. arXiv preprint arXiv:1810.01367.

Kidger, P., Foster, J., Li, X., Oberhauser, H., and Lyons, T. (2021). Neural SDEs as Infinite-Dimensional GANs. arXiv preprint arXiv:2102.03657.

Kikuchi, K., Yoshida, M., Maekawa, T., and Watanabe, H. (1991). Metropolis Monte Carlo Method as a Numerical Technique to Solve the Fokker-Planck Equation. Chem. Phys. Lett. 185, 335–338. doi:10.1016/s0009-2614(91)85070-d

Küchlin, S., and Jenny, P. (2017). Parallel Fokker-Planck-DSMC Algorithm for Rarefied Gas Flow Simulation in Complex Domains at All Knudsen Numbers. J. Comput. Phys. 328, 258–277. doi:10.1016/j.jcp.2016.10.018

Kumar, P., and Narayanan, S. (2006). Solution of Fokker-Planck Equation by Finite Element and Finite Difference Methods for Nonlinear Systems. Sadhana 31, 445–461. doi:10.1007/bf02716786

Oseledets, I., and Tyrtyshnikov, E. (2010). TT-cross Approximation for Multidimensional Arrays. Linear Algebra its Appl. 432, 70–88. doi:10.1016/j.laa.2009.07.024

Oseledets, I. V. (2010). Approximation of $2^d\times2^d$ Matrices Using Tensor Decomposition. SIAM J. Matrix Anal. Appl. 31, 2130–2145. doi:10.1137/090757861

Oseledets, I. V. (2011). Tensor-train Decomposition. SIAM J. Sci. Comput. 33, 2295–2317. doi:10.1137/090752286

Oseledets, I. V., and Tyrtyshnikov, E. E. (2009). Breaking the Curse of Dimensionality, or How to Use SVD in Many Dimensions. SIAM J. Sci. Comput. 31, 3744–3759. doi:10.1137/090748330

Pichler, L., Masud, A., and Bergman, L. A. (2013). “Numerical Solution of the Fokker-Planck Equation by Finite Difference and Finite Element Methods-A Comparative Study,” in Computational Methods in Stochastic Dynamics (Springer), 69–85. doi:10.1007/978-94-007-5134-7_5

Savostyanov, D., and Oseledets, I. (2011). “Fast Adaptive Interpolation of Multi-Dimensional Arrays in Tensor Train Format,” in The 2011 International Workshop on Multidimensional (nD) Systems (IEEE), 1–8. doi:10.1109/nds.2011.6076873

Singh, R., Ghosh, D., and Adhikari, R. (2018). Fast Bayesian Inference of the Multivariate Ornstein-Uhlenbeck Process. Phys. Rev. E 98, 012136. doi:10.1103/PhysRevE.98.012136

Subramaniam, G. M., and Vedula, P. (2017). A Transformed Path Integral Approach for Solution of the Fokker-Planck Equation. J. Comput. Phys. 346, 49–70. doi:10.1016/j.jcp.2017.06.002

Sun, Y., and Kumar, M. (2015). A Numerical Solver for High Dimensional Transient Fokker-Planck Equation in Modeling Polymeric Fluids. J. Comput. Phys. 289, 149–168. doi:10.1016/j.jcp.2015.02.026

Sun, Y., and Kumar, M. (2014). Numerical Solution of High Dimensional Stationary Fokker-Planck Equations via Tensor Decomposition and Chebyshev Spectral Differentiation. Comput. Math. Appl. 67, 1960–1977. doi:10.1016/j.camwa.2014.04.017

Trefethen, L. N. (2000). Spectral Methods in MATLAB, Vol. 10. Philadelphia, PA: SIAM.

Tyrtyshnikov, E. (2000). Incomplete Cross Approximation in the Mosaic-Skeleton Method. Computing 64, 367–380. doi:10.1007/s006070070031

Vatiwutipong, P., and Phewchean, N. (2019). Alternative Way to Derive the Distribution of the Multivariate Ornstein-Uhlenbeck Process. Adv. Differ. Equ 2019, 276. doi:10.1186/s13662-019-2214-1

Venkiteswaran, G., and Junk, M. (2005). A QMC Approach for High Dimensional Fokker-Planck Equations Modelling Polymeric Liquids. Mathematics Comput. Simulation 68, 43–56. doi:10.1016/j.matcom.2004.09.002

Wehner, M. F., and Wolfer, W. G. (1983). Numerical Evaluation of Path-Integral Solutions to Fokker-Planck Equations. Phys. Rev. A. 27, 2663–2670. doi:10.1103/physreva.27.2663

Keywords: fokker-planck equation, probability density function, tensor train format, cross approximation, chebyshev polynomial, ornstein-uhlenbeck process, dumbbell model

Citation: Chertkov A and Oseledets I (2021) Solution of the Fokker–Planck Equation by Cross Approximation Method in the Tensor Train Format. Front. Artif. Intell. 4:668215. doi: 10.3389/frai.2021.668215

Received: 15 February 2021; Accepted: 29 June 2021;
Published: 02 August 2021.

Edited by:

Evangelos Papalexakis, University of California, Riverside, United States

Reviewed by:

Devin Matthews, Southern Methodist University, United States
Prakash Vedula, University of Oklahoma, United States

Copyright © 2021 Chertkov and Oseledets. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Andrei Chertkov, a.chertkov@skoltech.ru
