Skip to main content

ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 12 May 2020
Sec. Mathematical Finance
This article is part of the Research Topic Artificial Intelligence in Insurance and Finance View all 12 articles

Deep Learning Methods for Mean Field Control Problems With Delay

\nJean-Pierre Fouque
Jean-Pierre Fouque*Zhaoyu ZhangZhaoyu Zhang
  • Department of Statistics and Applied Probability, University of California Santa Barbara, Santa Barbara, CA, United States

We consider a general class of mean field control problems described by stochastic delayed differential equations of McKean–Vlasov type. Two numerical algorithms are provided based on deep learning techniques, one is to directly parameterize the optimal control using neural networks, the other is based on numerically solving the McKean–Vlasov forward anticipated backward stochastic differential equation (MV-FABSDE) system. In addition, we establish the necessary and sufficient stochastic maximum principle of this class of mean field control problems with delay based on the differential calculus on function of measures, and the existence and uniqueness results are proved for the associated MV-FABSDE system under suitable conditions.

Mathematical Subject Classification (2000): 93E20, 60G99, 68-04

1. Introduction

Stochastic games were introduced to study the optimal behaviors of agents interacting with each other. They are used to study the topic of systemic risk in the context of finance. For example, in Carmona et al. [1], the authors proposed a linear quadratic inter-bank borrowing and lending model, and solved explicitly for the Nash equilibrium with a finite number of players. Later, this model was extended in Carmona et al. [2] by considering delay in the control in the state dynamic to account for the debt repayment. The authors analyzed the problem via a probabilistic approach which relies on stochastic maximum principle, as well as via an analytic approach which is built on top of an infinite dimensional dynamic programming principle.

Both mean field control and mean field games are used to characterize the asymptotic behavior of a stochastic game as the number of players grows to infinity under the assumption that all the agents behave similarly, but with different notion of equilibrium. The mean field games consist of solving a standard control problem, where the flow of measures is fixed, and solving a fixed point problem such that this flow of measures matches the distribution of the dynamic of a representative agent. Whereas, the mean field control problem is a non-standard control problem in the sense that the law of state is present in the McKean–Vlasov dynamic, and optimization is performed while imposing the constraint of distribution of the state. More details can be found in Carmona and Delarue [3] and Bensoussan et al. [4].

In this paper, we considered a general class of mean field control problem with delay effect in the McKean–Vlasov dynamic. We derived the adjoint process associated with the McKean–Vlasov stochastic delayed differential equation, which is an anticipated backward stochastic differential equation of McKean–Vlasov type due to the fact that the conditional expectation of the future of adjoint process as well as the distribution of the state dynamic are involved. This type of anticipated backward stochastic differential equations (BSDE) was introduced in Peng and Yang [5], and for the general theory of BSDE, we refer Zhang [6]. The necessary and sufficient part of stochastic maximum principle for control problem with delay in state and control can be found in Chen and Wu [7]. Here, we also establish a necessary and sufficient stochastic maximum principle based on differential calculus on functions of measures as we consider the delay in the distribution. In the meantime, we also prove the existence and uniqueness of the system of McKean–Vlasov forward anticipated backward stochastic differential equations (MV-FABSDE) under some suitable conditions using the method of continuation, which can be found in Zhang [6], Peng and Wu [8], Bensoussan et al. [9], and Carmona et al. [2]. For a comprehensive study of FBSDE theory, we refer to Ma and Yong [10].

When there was no delay effect in the dynamic, Ma et al. [11] proved the relation between the solution to the FBSDE and quasi-linear partial differential equation (PDE) via “Four Step Scheme.” E et al. [12] and Raissi [13] explored the use of deep learning for solving these PDEs in high dimensions. However, the class of fully coupled MV-FABSDE considered in our paper has no explicit solution. Here, we present one algorithm to tackle the above problem by means of deep learning techniques. Due to the non-Markovian nature of the state dynamic, we apply the long short-term memory (LSTM) network, which is able to capture the arbitrary long-term dependencies in the data sequence. It also partially solves the vanishing gradient problem in vanilla recurrent neural networks (RNNs), as was shown in Hochreiter and Schmidhuber [14]. The idea of our algorithm is to approximate the solution to the adjoint process and the conditional expectation of the adjoint process. The optimal control is readily obtained after the MV-FABSDE being solved. We may also emphasize that the way that we present here for numerically computing conditional expectation may have a wide range of applications, and it is simple to implement. We also present another algorithm solving the mean field control problem by directly parameterizing the optimal control. Similar idea can be found in the policy gradient method in the regime of reinforcement learning [15] as well as in Han and E [16]. Numerically, the two algorithms that we propose in this paper yield the same results. Besides, our approaches are benchmarked to the case with no delay for which we have explicit solutions.

The paper is organized as follows. We start with an N-player game with delay, and let number of players goes to infinity to introduce a mean field control problem in section 2. Next, in section 3, we mathematically formulate the feedforward neural networks and LSTM networks, and we propose two algorithms to numerically solve the mean field control problem with delay using deep learning techniques. This is illustrated on a simple linear-quadratic toy model, however with delay in the control. One algorithm is based on directly parameterizing the control, and the other depends on numerically solving the MV-FABSDE system. In addition, we also provide an example of solving a linear quadratic mean field control problem with no delay both analytically, and numerically. The adjoint process associated with the delayed dynamic is derived, as well as the stochastic maximum principle is proved in section 4. Finally, the uniqueness and existence solution for this class of MV-FABSDE are proved under suitable assumptions via continuation method in section 5.

2. Formulation of the Problem

We consider an N-player game with delay in both state and control. The dynamic (Xti)0tT for player i ∈ {1, …, N} is given by a stochastic delayed differential equation (SDDE),

dXti=bi(t,Xt,Xt-τ,αti,αt-τi)dt+σi(t,Xt,Xt-τ,αti,αt-τi)dWti,      t(0,T],  X0i=x0i,  Xti=αti=0;t[-τ,0),    (2.1)

for T > 0, τ > 0 given constants, where Xt=(Xt1,,XtN), and where ((Wti)t[0,T])i=1,,N are N independent Brownian motions defined on the space (Ω,F,), (Ft)0tT being the natural filtration of Brownian motions.

(bi,σi):[0,T]××N×N×A×A×,

are progressively measurable functions with values in ℝ. We denote A a closed convex subset of ℝ, the set of actions that player i can take, and denote 𝔸 the set of admissible control processes. For each i ∈ {1, …, N}, A-valued measurable processes (αti)0tT satisfy an integrability condition such that 𝔼[-τT|αti|2dt]<+.

Given an initial condition x0=(x01,,x0N)N, each player would like to minimize his objective functional:

Ji(α)=𝔼[0Tfi(t,Xt,Xt-τ,αti)dt+gi(XT)],    (2.2)

for some Borel measurable functions fi:[0, T] × ℝN × ℝN × A → ℝ, and gi:ℝN → ℝ.

In order to study the mean-field limit of (Xt)t∈[0,T], we assume that the system (2.1) satisfy a symmetric property, that is to say, for each player i, the other players are indistinguishable. Therefore, drift bi and volatility σi in (2.1) take the form of

(bi,σi)(t,Xt,Xt-τ,αti,αt-τi)=(bi,σi)(t,Xti,μtN,Xt-τi,μt-τN,αti,αt-τi),

and the running cost fi and terminal cost gi are of the form

fi(t,Xt,Xt-τ,αti)=fi(t,Xti,μtN,Xt-τi,μt-τN,αti) and gi(XT)=gi(XTi,μTN),

where we use the notation μtN for the empirical distribution of X = (X1, ⋯ , XN) at time t, which is defined as

μtN=1Nj=1NδXtj.

Next, we let the number of players N go to +∞ before we perform the optimization. According to symmetry property and the theory of propagation of chaos, the joint distribution of the N dimensional process (Xt)0tT=(Xt1,,XtN)0tT converges to a product distribution, and the distribution of each single marginal process converges to the distribution of (Xt)0≤tT of the following McKean–Vlasov stochastic delayed differential equation (MV-SDDE). For more detail on the argument without delay, we refer to Carmona and Delarue [3] and Carmona et al. [17].

dXt=b(t,Xt,μt,Xt-τ,μt-τ,αt,αt-τ)dt        +σ(t,Xt,μt,Xt-τ,μt-τ,αt,αt-τ)dWt,t(0,T],  X0=x0,  Xt=αt=0,t[-τ,0).    (2.3)

We then optimize after taking the limit. The objective for each player of (2.2) now becomes

J(α)=𝔼[0Tf(Xt,μt,Xt-τ,μt-τ,αt)dt+g(XT,μT)],    (2.4)

where we denote μt:=L(Xt) the law of Xt.

3. Solving Mean-Field Control Problems Using Deep Learning Techniques

Due to the non-Markovian structure, the above mean-field optimal control problem (2.3 and 2.4) is difficult to solve either analytically or numerically. Here we propose two algorithms together with four approaches to tackle the above problem based on deep learning techniques. We would like to use two types of neural networks, one is called the feedforward neural network, and the other one is called Long Short-Term Memory (LSTM) network.

For a feedforward neural network, we first define the set of layers 𝕄d,hρ, for x ∈ ℝd, as

𝕄d,hρ:={M:dh|M(x)=ρ(Ax+b),Ah×d,bh}.    (3.1)

d is called input dimension, h is known as the number of hidden neurons, A ∈ ℝh×d is the weight matrix, b ∈ ℝh is the bias vector, and ρ is called the activation function. The following activation functions will be used in this paper, for some x ∈ ℝ,

ρReLU(x):=x+=max(0,x);ρs(x):=11+e-x;ρtanh(x):=tanh(x);ρId(x):=x.

Then feedforward neural network is defined as a composition of layers, so that the set of feedforward neural networks with l hidden layers we use in this paper is defined as

     d1,d2l={M~:d1d2|M~=MlM1M0,M0𝕄d1,h1ρReLU,Ml𝕄hl,d2ρId,Mi𝕄hi,hi+1ρReLU,h·+,i=1,,l-1}.    (3.2)

The LSTM network is one of RNN architectures, which are powerful for capturing long-range dependence of the data. It is proposed in Hochreiter and Schmidhuber [14], and it is designed to solve the shrinking gradient effects which basic RNN often suffers from. The LSTM network is a chain of cells. Each LSTM cell is composed of a cell state, which contains information, and three gates, which regulate the flow of information. Mathematically, the rule inside tth cell follows,

Γft=ρs(Afxt+Ufat-1+bf), Γit=ρs(Aixt+Uiat-1+bi),Γot=ρs(Aoxt+Uoat-1+bo),     ct=Γftct-1+Γitρtanh(Acxt+Ucat-1+bc),     at=Γotρtanh(ct),    (3.3)

where the operator ⊙ denotes the Hadamard product. (Γft,Γit,Γot)h×h×h represents forget gate, input gate and output gate, respectively, h refers the number of hidden neurons. xtd is the input vector with d features. ath is known as the output vector with initial value a0 = 0, and cth is known as the cell state with initial value c0 = 0. A·h×d are the weight matrices connecting input and hidden layers, U·h×h are the weight matrix connecting hidden and output layers, and b ∈ ℝh represents bias vector. The weight matrices and bias vectors are shared through all time steps, and are going to be learned during training process by back-propagation through time (BPTT), which can be implemented in Tensorflow platform. Here we define the set of LSTM network up to time t as

𝕃𝕊𝕋𝕄d,h,t={M:(d)t×h×hh×h | M(x0,,xt,a0,c0)=(at,ct),ct=Γftct-1+Γitρtanh(Acxt+Ucat-1+bc),at=Γotρtanh(ct),a0=c0=0},                     (3.4)

where Γf·, Γi·, Γo· are defined in (3.3).

In particular, we specify the model in a linear-quadratic form, which is inspired by Carmona et al. [2] and Fouque and Zhang [18]. The objective function is defined as

J(α)=𝔼[0T(12αt2+cf2(Xt-mt)2)dt+ct2(XT-mT)2],    (3.5)

subject to

        dXt=(αt-αt-τ)dt+σdWt,t[0,T],          X0=x0,Xt=αt=0,t[-τ,0),    (3.6)

where σ, cf, ct > 0 are given constants, and mt:=xdμt(x) denotes the mean of X at time t, and μt:=L(Xt). In the following subsections, we solve the above problem numerically using two algorithms together with four approaches. The first two approaches are to directly approximate the control by either a LSTM network or a feedforward neural network, and minimize the objective (3.5) using stochastic gradient descent algorithm. The third and fourth approaches are to introduce the adjoint process associated with (3.6), and approximate the adjoint process and the conditional expectation of adjoint process using neural networks.

3.1. Approximating the Optimal Control Using Neural Networks

We first set Δt = T/N = τ/D for some positive integer N. The time discretization becomes

-τ=t-Dt-D+1t0=0=t0t1tNT,

for titi−1 = Δt, i ∈ {−D + 1, ⋯ , 0, ⋯ , N − 1, N}. The discretized SDDE (3.6) according to Euler-Maruyama scheme now reads

Xti+1=Xti+(αti-αti-D)Δt+σΔtΔWti, for i{0,,N-1},    (3.7)

where (ΔWti)0≤iN−1 are independent, normal distributed sequence of random variables with mean 0 and variance 1.

First, from the definition of open loop control, and due to non-Markovian feature of (3.6), the open-loop optimal control is a function of the path of the Brownian motions up to time t, i.e., α(t, (Ws)0≤st). We are able to describe this dependency by a LSTM network by parametrizing the control as a function of current time and the discretized increments of Brownian motion path, i.e.,

                    (ati,cti)=φ1(ti,(ΔWs)t0sti|Φ1) for φ1𝕃𝕊𝕋𝕄2,h1,t      and Φ1=(Af,Ai,Ao,Ac,Uf,Ui,Uo,Uc,bf,bi,bo,bc),α(ti,(Ws)t0sti)ψ1(ati|Ψ1) for ψ1𝕄h1,1Id and Ψ1=(A,b),    (3.8)

for some h1+. We remark that the last dense layer is used to match the desired output dimension.

The second approach is again directly approximate the control but with a feedforward neural network. Due to the special structure of our model, where the mean of dynamic in (3.6) is constant, the mean field control problem coincides with the mean field game problem. In Fouque and Zhang [18], authors solved the associated mean field game problem using infinite dimensional PDE approach, and found that the optimal control is a function of current state and the past of control. Therefore, the feedforward neural network with l layers, which we use to approximate the optimal control, is defined as

αti(Xti,(αs)ti-Ds<ti)ψ2(Xti,(αs)ti-Dsti|Ψ2)            for ψ2D+1,1l, Ψ2=(A0,b0,,Al,bl).    (3.9)

From Monte Carlo algorithm, and trapezoidal rule, the objective function (3.5) now becomes

J=1Mj=1M[(12(αt0(j))2+cf2(Xt0(j)-X̄t0)2+     i=1N-1((αti(j))2+cf(Xti(j)-X̄ti)2)+12(αtN(j))2+cf2(XtN(j)-X̄tN)2)Δt2+ct2(XtN(j)-X̄tN)2],    (3.10)

where M denotes the number of realizations and X̄:=1Mj=1MX(j) denotes the sample mean. After plugging in the neural network either given by (3.8) or (3.9), the optimization problem becomes to find the best set of parameters either (Φ1, Ψ1) or Ψ2 such that the objective J1, Ψ1) or J2) is minimized with respect to those parameters.

The algorithm works as follows:

ALGORITHM 1
www.frontiersin.org

Algorithm 1: Algorithms for solving mean field control problem with delay by directly approximating the optimal control using neural networks

In the following graphics, we choose x0 = 0, cf = ct = 1, σ = 1, T = 10, τ = 4, Δt = 0.1, M = 4,000. For approach 1, the neural network φ ∈ ℕℕ, which is defined in (3.2), is composed of 3 hidden layers with d1 = 42, h1 = 64, h2 = 128, h3 = 64, d2 = 1. For approach 2, the LSTM network φ ∈ 𝕃𝕊𝕋𝕄, which is defined in (3.4), consists of 128 hidden neurons. For a specific representative path, the underlying Brownian motion paths are approximately the same for different approaches. Figure 1 compares one representative optimal trajectory of the state dynamic and the control, and they coincide. Figure 2 plots the sample average of optimal trajectory of the state dynamic and the control, which are trajectories of approximately 0, and this is the same as the theoretical mean.

FIGURE 1
www.frontiersin.org

Figure 1. On the left, we compare one representative optimal trajectory of (Xti)t0titN. The plot on the right show the comparison of one representative optimal trajectory of (αti)t0titN between approach 1 and approach 2.

FIGURE 2
www.frontiersin.org

Figure 2. On the left, we compare the sample mean of optimal trajectories of (Xti)t0titN. The plot on the right show the comparison of sample mean of trajectories of optimal control (αti)t0titN between approach 1 and approach 2.

3.2. Approximating the Adjoint Process Using Neural Networks

The third and fourth approaches are based on numerically solving the MV-FABSDE system using LSTM network and feedforward neural networks. From section 4, we derive the adjoint process, and prove the sufficient and necessary parts of stochastic maximum principle. From (4.7), we are able to write the backward stochastic differential equation associated to (3.6) as,

dYt=-cf(Xt-mt)dt+ZtdWt, t[0,T],    (3.11)

with terminal condition YT = ct(XTmT), and Ys = 0 for s ∈ (T, T + τ]. The optimal control (α^t)0tT can be obtained in terms of the adjoint process Yt from the maximum principle, and it is given by

α^t=-Yt+𝔼[Yt+τ|Ft].

From the Euler-Maruyama scheme, the discretized version of (3.6) and (3.11) now reads, for i = {0, …, N − 1},

Xti+1=Xti+(α^ti-α^ti-D)Δt+σΔtΔWti, where ΔWti~N(0,1).    (3.12)
Yti+1=Yti-cf(Xti-X̄ti)Δt+ZtiΔtΔWti,    (3.13)

where we use the sample average X̄t=1Mj=1MXt(j) to approximate the expectation of Xt. In order to solve the above MV-FABSDE system, we need to approximate (Yti,𝔼[Yti+D|Fti],Zti)0titN.

The third approach consists of approximating (Yti,𝔼[Yti+D|Fti],Zti)0titN using three LSTM networks as functions of current time and the discretized path of Brownian motions, respectively, i.e.,

(atiY,ctiY)=φY(ti,(ΔWs)t0sti|ΦY)     for  φY𝕃𝕊𝕋𝕄2,hY,ti and        ΦY= (AfY,AiY,AoY,AcY,UfY,UiY,UoY,UcY,bfY,biY,boY,bcY),   YtiψY(atiY|ΨY) for ψY𝕄hY,1Id and ΨY= (AY,bY),
    (ati𝔼Y,cti𝔼Y)=φ𝔼Y(ti,(ΔWs)t0sti|Φ𝔼Y)       for φ𝔼Y𝕃𝕊𝕋𝕄2,h𝔼Y,ti and          Φ𝔼Y= (Af𝔼Y,Ai𝔼Y,Ao𝔼Y,Ac𝔼Y,Uf𝔼Y,Ui𝔼Y,Uo𝔼Y,Uc𝔼Y,bf𝔼Y,bi𝔼Y,bo𝔼Y,bc𝔼Y),𝔼[Yti+D|Fti]ψ𝔼Y(ati𝔼Y|Ψ𝔼Y) for ψ𝔼Y𝕄h𝔼Y,1Id and Ψ𝔼Y= (A𝔼Y,b𝔼Y),
(atiZ,ctiZ)=φZ(ti,(ΔWs)t0sti|ΦZ)     for φZ𝕃𝕊𝕋𝕄2,hZ,ti     and ΦZ= (AfZ,AiZ,AoZ,AcZ,UfZ,UiZ,UoZ,UcZ,bfZ,biZ,boZ,bcZ),   ZtiψZ(atiZ|ΨZ) for ψZ𝕄hZ,1Id and ΨZ= (AZ,bZ),    (3.14)

for some hY,hEY,hZ+. Again, the last dense layers are used to match the desired output dimension.

Since approach 3 consists of three neural networks with large number of parameters, which is hard to train in general, we would like to make the following simplification in approach 4 for approximating (Yti,𝔼[Yti+D|Fti],Zti)t0titN via combination of one LSTM network and three feedforward neural networks. Specifically,

(ati,cti)=φ(ti,(ΔWs)t0sti|Φ)     for φ𝕃𝕊𝕋𝕄2,h,ti and       Φ= (Af,Ai,Ao,Ac,Uf,Ui,Uo,Uc,bf,bi,bo,bc),   YtiψY(ati|ΨY) for ψYh,1l and       ΨY= (A0Y,b0Y,,AlY,blY),𝔼[Yti+D|Fti]     ψ𝔼Y(ati|Ψ𝔼Y) for ψ𝔼Yh,1l and        Ψ𝔼Y= (A0𝔼Y,b0𝔼Y,,Al𝔼Y,bl𝔼Y),   ZtiψZ(ati|ΨZ) for ψZh,1l and        ΨZ= (A0Z,b0Z,,AlZ,blZ).    (3.15)

In words, the algorithm works as follows. We first initialize the parameters (ΘY, ΘEY, ΘZ) = ((ΦY, ΨY), (ΦEY, ΨEY), (ΦZ, ΨZ)) either in (3.14) or (ΘY, ΘEY, ΘZ) = ((Φ, ΨY), ΨEY), ΨZ) in (3.15). At time 0, X0 = x0, (Y0,𝔼[YtD|F0],Z0)(φ0Y(ΘY),φ0𝔼Y(Θ𝔼Y),φ0Z(ΘZ)) for some network (φY, φEY, φZ) given by either (3.14) or (3.15), and α0 = −Yt0+E[YtD|Ft0]. Next, we update Xti+1 and Yti+1 according to (3.12), and the solution to the backward equation at ti+1 is denoted by Ỹti+1. In the meantime, Yti+1 is also approximated by a neural network. In such case, we refer to Ỹ· as the label, and Y· given by the neural network as the prediction. We would like to minimize the mean square error between these two. At time T, YtN is also supposed to match ct(XtN-X̄tN), from the terminal condition of (3.11). In addition, the conditional expectation 𝔼[Yti+D|Fti] given by a neural network should be the best predictor of Ỹti+D, which implies that we would like to find the set of parameters ΘEY such that 𝔼[(ti+D-φtiEY(ΘEY))2] is minimized for all ti ∈ {t0, …, tND}. Therefore, for M samples, we would like to minimize two objective functions L1 and L2 defined as

L1(ΘY,ΘZ)=1M[j=1Mi=0N(φtiY,(j)-ti(j))2        +j=1M(φtNY,(j)-ct(XtN(j)-X̄tN))2],L2(Θ𝔼Y)=1Mj=1Mi=0N-D(φti𝔼Y,(j)-ti+D(j))2.    (3.16)

The algorithm works as follows:

ALGORITHM 2
www.frontiersin.org

Algorithm 2: Algorithms for solving mean field control problem with delay according to MV-FABSDE

Again, in the following graphics, we choose x0 = 0, cf = ct = 1, σ = 1, T = 10, τ = 4, Δt = 0.1, M = 4, 000. In approach 3, each of the three LSTM networks approximating Yti,𝔼[Yti+D|Fti] and Zti consists of 128 hidden neurons, respectively. In approach 4, the LSTM consists of 128 hidden neurons, and each of the feedforward neural networks has parameters d1 = 128, h1 = 64, h2 = 128, h3 = 64, d2 = 1. For a specific representative path, the underlying Brownian motion paths are the same for different approaches. Figure 3 compares one representative optimal trajectory of the state dynamic and the control via two approaches, and they coincide. Figure 4 plots the sample average of optimal trajectory of the dynamic and the control, which are trajectories of 0, which is the same as the theoretical mean. Comparing to Figures 1, 2, as well as based on numerous experiments, we find that given a path of Brownian motion, the two algorithms would yield similar optimal trajectory of state dynamic and similar path for the optimal control. From Figure 6, the loss L1 as defined in (3.16) becomes approximately 0.02 in 1,000 epochs for both approach 3 and approach 4. This can also be observed from Figure 5, since the red dash line and the blue solid line coincide for both left and right graphs. In addition, from the righthand side of Figure 6, we observe the loss L2 as defined in (3.16) converges to 50 after 400 epochs. This is due to the fact that the conditional expectation can be understood as an orthogonal projection. Figure 7 plots 64 sample paths of the process (Zti)t0titN, which seems to be a deterministic function since σ is constant in this example. Finally, Figure 8 shows the convergence of the value function as number of epochs increases. Both algorithms arrive approximately at the same optimal value which is around 6 after 400 epochs. This confirms that the out control problem has a unique solution. In section 5, we show that the MV-FASBDE system is uniquely solvable. It is also observable that the first algorithm converges faster than the second one, since it directly paramerizes the control using one neural network, instead of solving the MV-FABSDE system, which uses three neural networks.

FIGURE 3
www.frontiersin.org

Figure 3. On the left, we compare one representative optimal trajectory of (Xti)t0titN. The plot on the right shows the comparison of one representative optimal trajectory of (αti)t0titN between approach 3 and approach 4.

FIGURE 4
www.frontiersin.org

Figure 4. On the left, we compare the sample mean of optimal trajectories of (Xti)t0titN. The plot on the right shows the comparison of sample mean of trajectories of optimal control (αti)t0titN between approach 3 and approach 4.

FIGURE 5
www.frontiersin.org

Figure 5. Plots of representative trajectories of [(Yti)t0titN,(ti)t0titN,(𝔼[Yti+D|Fti])t0titN], from approach 3 (one the left) and from approach 4 (on the right).

FIGURE 6
www.frontiersin.org

Figure 6. Convergence of loss L1 (on the left) and convergence of loss L2 (on the right) as defined in (3.16) from approach 3 and approach 4.

FIGURE 7
www.frontiersin.org

Figure 7. 64 trajectories of (Zti)t0titN based on approach 3 (on the left) and approach 4 (on the right).

FIGURE 8
www.frontiersin.org

Figure 8. Comparison convergence of objective values as in (3.10) among four approaches.

3.3. Numerically Solving the Optimal Control Problem With No Delay

Since the algorithms we proposed embrace the case with no delay, we illustrate the comparison between numerical results and the analytical results. By letting τ > T we obtain αt−τ = 0 in (3.6), and we aim at solving the following linear-quadratic mean-field control problem by minimizing

J(α)=𝔼[0T(12αt2+cf2(Xt-mt)2)dt+ct2(XT-mT)2],    (3.17)

subject to

dXt=αtdt+σdWt,t[0,T],  X0=x0.    (3.18)

Again, from section 4, we find the optimal control

α^t=-Yt,

where (Yt, Zt) is the solution of the following adjoint process,

dYt=-cf(Xt-mt)dt+ZtdWt.    (3.19)

Next, we make the ansatz

Yt=ϕt(Xt-mt),    (3.20)

for some deterministic function ϕt, satisfying the terminal condition ϕT = ct. Differentiating the ansatz, the backward equation should satisfy

dYt=(ϕ.t-ϕt2)(Xt-mt)dt+ϕtσdWt,    (3.21)

where ϕt denotes the time derivative of ϕt. Comparing with (3.19), and identifying the drift and volatility term, ϕt must satisfy the scalar Riccati equation,

{ϕ.t=ϕt2-cf,ϕT=ct,    (3.22)

and the process Zt should satisfy

Zt=ϕtσ,    (3.23)

which is deterministic. If we choose x0 = 0, cf = ct = 1, T = 10, ϕt = 1 solves the Riccati equation (3.22), so that Zt = 1, ∀t ∈ [0, T], and from (3.20), the optimal control satisfies

α^t=-(Xt-mt).    (3.24)

Numerically, we apply the two deep learning algorithms proposed in the previous section. The first algorithm directly approximates the control. According to the open loop formulation, we set

   (ati,cti)=φ(ti,(ΔWs)t0sti|Φ)        for φ𝕃𝕊𝕋𝕄2,h,t and          Φ=(Af,Ai,Ao,Ac,Uf,Ui,Uo,Uc,bf,bi,bo,bc),α(ti,(Ws)t0sti)ψ(ati|Ψ) for ψ𝕄h,1Id and Ψ=(A,b),    (3.25)

for some h ∈ ℤ+. We remark that the last dense layer is used to match the desired output dimension. The second algorithm numerically solves the forward backward system as in (3.18) and (3.19). From the ansatz (3.20) and the Markovian feature, we approximate (Yt, Zt)0≤tT using two feedforward neural networks, i.e.,

Ytiψ1(ti,Xti|Ψ1) for ψ12,1l, Ψ1=(A01,b01,,Al1,bl1);Ztiψ2(ti,Xti|Ψ2) for ψ22,1l, Ψ2=(A02,b02,,Al2,bl2).    (3.26)

Figure 9 shows the representative optimal trajectory of (Xti-X̄ti)t0titN and (αti)t0titN from both algorithms, which are exactly the same. The symmetry feature can be seen from the computation (3.24). On the left of Figure 10 confirms the mean of the processes (Xti-X̄ti)t0titN and (αti)t0titN are 0 from both algorithms. The right picture of Figure 10 plots the data points of x against α, and we can observe that the optimal control α is linear in x as a result of (3.24), and the slope tends to be - 1, since ϕ = 1 solves the scalar Riccati equation (3.22). Finally, Figure 11 plots representative optimal trajectories of the solution to the adjoint equations (Yt, Zt). On the left, we observe that the adjoint process (Yt)0≤tT matches the terminal condition, and on the right, (Zt)0≤tT appears to be a deterministic process of value 1, and this matches the result we compute previously.

FIGURE 9
www.frontiersin.org

Figure 9. Representative optimal trajectory of (Xti-X̄ti)t0titN and (αti)t0titN from algorithm 1 (on the left) and from algorithm 2 (on the right).

FIGURE 10
www.frontiersin.org

Figure 10. The picture on the left plots the sample averages of (Xti)t0titN and (αti)t0titN for both algorithm 1 and 2; The plot on the right shows the points of x against α for both algorithms.

FIGURE 11
www.frontiersin.org

Figure 11. The plot on the left shows representative trajectories of [(Yti)t0titN, (Ỹti)t0titN]. The picture on the right plots 64 trajectories of (Zti)t0titN.

4. Stochastic Maximum Principle for Optimality

In this section, we derive the adjoint equation associated to our mean field stochastic control problem (2.3) and (2.4). The necessary and sufficient parts of stochastic maximum principle have been proved for optimality. We assume

(H4.1) b, σ are differentiable with respect to (Xt, μt, Xt−τ, μt−τ, αt, αt−τ); f is differentiable with respect to (Xt, μt, Xt−τ, μt−τ, α); g is differentiable with respect to (XT, μT). Their derivatives are bounded.

In order to simplify our notations, let θt = (Xt, μt, αt). For 0 < ϵ < 1, we denote αϵ the admissible control defined by

αtϵ:=αt+ϵ(βt-αt):=αt+ϵΔαt,

for any (α)0≤tT and (β)0≤tT ∈ 𝔸. Xtϵ:=Xtαϵ is the corresponding controlled process. We define

Xt:=limϵ0Xtϵ-Xtαϵ

to be the variation process, which should follow the following dynamic for t ∈ (0, T],

dXt=[xb(t,θt,θt-τ)Xt+xτb(t,θt,θt-τ)Xt-τ     +𝔼~[μb(t,θt,θt-τ)(X~t)X~t]+𝔼~[μτb(t,θt,θt-τ)(X~t-τ)X~t-τ]     +αb(t,θt,θt-τ)Δαt+ατb(t,θt,θt-τ)Δαt-τ]dt     +[xσ(t,θt,θt-τ)Xt+xτσ(t,θt,θt-τ)Xt-τ     +𝔼~[μσ(t,θt,θt-τ)(Xt~)Xt~]+𝔼~[μτσ(t,θt,θt-τ)(X~t-τ)X~t-τ]     +ασ(t,θt,θt-τ)Δαt+ατσ(t,θt,θt-τ)Δαt-τ]dWt    (4.1)

with initial condition ∇Xt = Δαt = 0, t ∈ [−τ, 0). (X~t,X~t) is a copy of (Xt, ∇Xt) defined on (Ω~,F~,~), where we apply differential calculus on functions of measure, see Carmona and Delarue [3] for detail. ∂xb, ∂xτb, ∂μb, ∂μτb, ∂αb, ∂ατb are derivatives of b with respect to (Xt, Xt−τ, μt, μt−τ, αt, αt−τ), respectively, and we use the same notation for ∂·σ.

In the meantime, the Gateaux derivative of functional α → J(α) is given by

   limϵ0J(αϵ)-J(α)ϵ=𝔼[xg(XT,μT)XT+𝔼~[μg(XT,μT)(X~T)X~T]]   +𝔼0T[xf(θt,Xt-τ,μt-τ)Xt+𝔼~[μf(θt,Xt-τ,μt-τ)(X~t)X~t]   +xτf(θt,Xt-τ,μt-τ)Xt-τ+𝔼~[μτf(θt,Xt-τ,μt-τ)(X~t-τ)X~t-τ]   +αf(θt,Xt-τ,μt-τ)(Δαt)]dt    (4.2)

In order to determine the adjoint backward equation of (Yt, Zt)0≤tT associated to (2.3), we assume it is of the following form:

dYt=-φtdt+ZtdWt,t[0,T],  YT=xg(XT,μT)+𝔼~[μg(XT,μT)(X~T)],  Yt=Zt=0,t(T,T+τ]    (4.3)

Next, we apply integration by part to ∇Xt and Yt. It yields

d(XtYt)=Yt[xb(t,θt,θt-τ)Xt + xτb(t,θt,θt-τ)Xt-τ + 𝔼~[μb(t,θt,θt-τ)(X~t)X~t] + 𝔼~[μτb(t,θt,θt-τ)(X~t-τ)X~t-τ] + αb(t,θt,θt-τ)Δαt + ατb(t,θt,θt-τ)Δαt-τ]dt + Yt[xσ(t,θt,θt-τ) + xτσ(t,θt,θt-τ)Xt-τ + 𝔼~[μσ(t,θt,θt-τ)(Xt~)Xt~] + 𝔼~[μτσ(t,θt,θt-τ)(X~t-τ)X~t-τ] + ασ(t,θt,θt-τ)Δαt + ατσ(t,θt,θt-τ)Δαt-τ]dWt-φtXtdt + XtZtdWt + Zt[xσ(t,θt,θt-τ)Xt + xτσ(t,θt,θt-τ)Xt-τ + 𝔼~[μσ(t,θt,θt-τ)(Xt~)Xt~] + 𝔼~[μτσ(t,θt,θt-τ)(X~t-τ)X~t-τ] + ασ(t,θt,θt-τ)Δαt + ατσ(t,θt,θt-τ)Δαt-τ]dt

We integrate from 0 to T, and take expectation to get

𝔼[XTYT] =𝔼0TYt[xb(t,θt,θt-τ)Xt + xτb(t,θt,θt-τ)Xt-τ + 𝔼~[μb(t,θt,θt-τ)(X~t)X~t] + 𝔼~[μτb(t,θt,θt-τ)(X~t-τ)X~t-τ] + αb(t,θt,θt-τ)Δατ + ατb(t,θt,θt-τ)Δαt-τ]dt -𝔼0TφtXtdt + 𝔼0TZt[xσ(t,θt,θt-τ)Xt + xτσ(t,θt,θt-τ)Xt-τ + 𝔼~[μσ(t,θt,θt-τ)(Xt~)Xt~] + 𝔼~[μτσ(t,θt,θt-τ)(X~t-τ)X~t-τ] + ασ(t,θt,θt-τ)Δαt + ατσ(t,θt,θt-τ)Δαt-τ]dt    (4.4)

Using the fact that Yt = Zt = 0 for t ∈ (T, T + τ], we are able to make a change of time, and by Fubini's theorem, so that (4.4) becomes

𝔼[XTYT]=𝔼0T(Ytxb(t,θt,θt-τ)+Yt+τxτb(t+τ,θt+τ,θt)     +𝔼~[μb(t,θ~t,θ~t-τ)(Xt)]     +𝔼~[μτb(t+τ,θ~t+τ,θ~t)(Xt))Xtdt     +𝔼0T(αb(t,θt,θt-τ)+ατb(t+τ,θt+τ,θt))Δαtdt     -𝔼[0TφtXt]dt+𝔼0T(Ztxσ(t,θt,θt-τ)     +Zt+τxτσ(t+τ,θt+τ,θt)+𝔼~[μσ(t,θ~t,θ~t-τ)(Xt)]     +𝔼~[μτσ(t+τ,θ~t+τ,θ~t)(Xt))Xtdt     +𝔼0T(ασ(t,θt,θt-τ)+ατσ(t+τ,θt+τ,θt))Δαtdt    (4.5)

Now we define the Hamiltonian H for (t,x,μ,xτ,μτ,y,z,α,ατ)[0,T]××P2()××P2()×××A×A as

H(t,x,μ,xτ,μτ,α,ατ,y,z)=b(t,x,μ,xτ,μτ,α,ατ)y+σ(t,x,μ,xτ,μτ,α,ατ)z                  +f(t,x,μ,xτ,μτ,α)    (4.6)

Using the terminal condition of YT, and plugging (4.5) into (4.2), and setting the integrand containing ∇Xt to zero, we are able to obtain the adjoint equation is of the following form

dYt=-{xH(t,Xt,μt,Xt-τ,μt-τ,αt,αt-τ,Yt,Zt)      +𝔼~[μH(t,X~t,μt,X~t-τ,μt-τ,α~t,α~t-τ,t,Z~t)(Xt)]      +𝔼[xτH(t+τ,Xt+τ,μt+τ,Xt,μt,αt+τ,αt,Yt+τ,Zt+τ)|Ft]      +𝔼[𝔼~[μτH(t+τ,X~t+τ,μt+τ,X~t,μt,α~t,α~t-τ,t+τ,Z~t+τ)     (Xt+τ)]|Ft]}dt+ZtdWt YT=xg(XT,μT)+𝔼~[μg(X~T,μT)(XT)].    (4.7)

Theorem 4.1. Lett)0≤tT ∈ A be optimal, (Xt)0≤tT be the associated controlled state, and (Yt, Zt)0≤tT be the associated adjoint processes defined in (4.7). For any β ∈ A, and t ∈ [0, T],

  (αH(t,Xt,μt,Xt-τ,μt-τ,αt,αt-τ,Yt,Zt)+𝔼[ατH(t+τ,Xt+τ,μt+τ,Xt,μt,αt+τ,αt,Yt+τ,Zt+τ)|Ft])           (β-αt)0 a.e.    (4.8)

Proof. Given any (βt)0≤tT ∈ 𝔸, we perturbate αt by ϵ(βt−αt) and we define αtϵ:=αt+ϵ(βt-αt) for 0 ≤ ϵ ≤ 1. Using the adjoint process (4.7), and apply integration by parts formula to (∇XtYt). Then plug the result into (4.2), and the Hamiltonian H is defined in (4.6). Also, since α is optimal, we have

0limϵ0J(αϵ)-J(α)ϵ =𝔼0T([αH(t,θt,θt-τ,Yt,Zt)+   𝔼[ατH(t+τ,θt+τ,θt,Yt+τ,Zt+τ)|Ft])(βt-αt)dt    (4.9)

Now, let CFt be an arbitrary progressively measurable set, and denote C′ the complement of C. We choose βt to be βt:=β1C+αt1C for any given β ∈ A. Then,

𝔼0T([αH(t,θt,θt-τ,Yt,Zt) +𝔼[ατH(t+τ,θt+τ,θt,Yt+τ,Zt+τ)|Ft])(βt-αt)1Cdt0,    (4.10)

which implies,

(αH(t,θt,θt-τ,Yt,Zt)+ 𝔼[ατH(t+τ,θt+τ,θt,Yt+τ,Zt+τ)|Ft])(β-αt)0. a.e.    (4.11)

Remark 4.2. When we further assume that H is convex in (αt, αt−τ), then for any β, βτA in Theorem 4.1, we have

H(t,Xt,μt,Xt-τ,μt-τ,αt,αt-τ,Yt,Zt)H(t,Xt,μt,Xt-τ,μt-τ,β,βτ,Yt,Zt), a.e.

as a direct consequence of (4.8).

Theorem 4.3. Lett)0≤tT ∈ 𝔸 be an admissible control. Let (Xt)0≤tT be the controlled state, and (Yt, Zt)0≤tT be the corresponding adjoint processes. We further assume that for each t, given Yt and Zt, the function (x, μ, xτ, μτ, α, ατ) → H(t, x, μ, xτ, μτ, α, ατ, Yt, Zt), and the function (x, μ) → g(x, μ) are convex. If

H(t,Xt,μt,Xt-τ,μt-τ,αt,αt-τ,Yt,Zt)=infα𝔸H(t,Xt,μt,Xt-τ,μt-τ,αt,αt-τ,Yt,Zt),    (4.12)

for all t, thent)0≤tT is an optimal control.

Proof. Let (αt)0tT𝔸 be a admissible control, and let (Xt)0tT=(Xtα)0tT be the corresponding controlled state. From the definition of the objective function as in (2.4), we first use convexity of g, and the terminal condition of the adjoint process Yt in (4.7), then use the fact that H is convex, and because of (4.12), we have the following

  J(α)-J(α)=𝔼[g(XT,μT)-g(XT,μT)]   + 𝔼0T[f(t,θt,Xt-τ,αt-τ)-f(t,θt,Xt-τ,αt-τ)]dt𝔼[xg(XT,μT)(XT-XT)+𝔼~[μg(XT,μT)(X~T)(X~T-X~T)]] +𝔼0T[f(t,θt,Xt-τ,αt-τ)-f(t,θt,Xt-τ,αt-τ)]dt=𝔼[(xg(XT,μT)+[μg(X~T,μT)(XT)])(XT-XT)] +𝔼0T[f(t,θt,Xt-τ,αt-τ)-f(t,θt,Xt-τ,αt-τ)]dt=𝔼[YT(XT-XT)]+𝔼0T[f(t,θt,Xt-τ,αt-τ) -f(t,θt,Xt-τ,αt-τ)]dt=𝔼0T[(b(t,θt,θt-τ)-b(t,θt,θt-τ))Yt +(σ(t,θt,θt-τ)-σ(t,θt,θt-τ))Zt]dt-𝔼0T[(xH(t,θt,θt-τ,Yt,Zt)  + 𝔼~[μH(t,θ~t,θ~t-τ,t,Z~t)(Xt)])(Xt-Xt)]dt -𝔼0T[(𝔼[xτH(t+τ,θt+τ,θt,Yt+τ,Zt+τ)|Ft] +𝔼[𝔼~[μτH(t+τ,θ~t+τ,θ~t,t+τ,Z~t+τ)(Xt)]|Ft])(Xt-Xt)]dt +𝔼0T[H(t,θt,θt-τ,Yt,Zt)-H(t,θt,θt-τ,Yt,Zt)]dt +𝔼0T[(b(t,θt,θt-τ)-b(t,θt,θt-τ))Yt+(σ(t,θt,θt-τ)  -σ(t,θt,θt-τ))Zt]dt-𝔼0T[xH(t,θt,θt-τ,Yt,Zt)(Xt-Xt)  + 𝔼~[μH(t,θt,θt-τ,Yt,Zt)(X~t)(X~t-Xt~)]dt -𝔼0T[xτH(t,θt,θt-τ,Yt,Zt)(Xt-τ-Xt-τ) +𝔼~[μτH(t,θt,θt-τ,Yt,Zt)(X~t-τ)(X~t-τ-X~t-τ)]]dt +𝔼0T[H(t,θt,θt-τ,Yt,Zt)-H(t,θt,θt-τ,Yt,Zt)]dt 𝔼0T[αH(t,θt,θt-τ,Yt,Zt)(αt-αt)+ατH(t,θt,θt-τ,Yt,Zt)  (αt-τ-αt-τ)]dt𝔼0T(αH(t,θt,θt-τ,Yt,Zt) +𝔼[ατH(t+τ,θt+τ,θt,Yt+τ,Zt+τ)|Ft])(αt-αt)dt0.    (4.13)

5. Existence and Uniqueness Result

Given the necessary and sufficient conditions proven in section 4, we use the optimal control (α^t)0tT defined by

   α^(t,Xt,μt,Xt-τ,μt-τ,Yt,Zt,𝔼[Yt+τ|Ft],𝔼[Zt+τ|Ft])=argminαAH(t,Xt,μt,Xt-τ,μt-τ,αt,αt-τ,Yt,Zt),    (5.1)

to establish the solvability result of the McKean–Vlasov FABSDE (2.3) and (4.7) for t ∈ [0, T]:

dXt=b(t,Xt,μt,Xt-τ,μt-τ,α^t,α^t-τ)dt    +σ(t,Xt,μt,Xt-τ,μt-τ,α^t,α^t-τ)dWt,dYt=-{xH(t,Xt,μt,Xt-τ,μt-τ,α^t,α^t-τ,Yt,Zt)    +𝔼~[μH(t,X~t,μt,X~t-τ,μt-τ,α^~t,α^~t-τ,t,Z~t)(Xt)]    +𝔼[xτH(t+τ,Xt+τ,μt+τ,Xt,μt,α^t+τ,α^t,Yt+τ,Zt+τ)|Ft]    +𝔼[𝔼~[μτH(t+τ,X~t+τ,μt+τ,X~t,μt,α^~t+τ,α^~t,t+τ,Z~t+τ)     (Xt)]|Ft]}dt+ZtdWt    (5.2)

with initial condition X0=x0;Xt=α^t=0fort[-τ,0) and terminal condition YT=xg(XT,μT)+𝔼~[μg(X~T,μT)(XT)]. In addition to assumption (H 4.1), we further assume

(H5.1) The drift and volatility functions b and σ are linear in x, μ, xτ, μτ, α, ατ. For all (t,x,μ,xτ,μτ,α,ατ)[0,T]××P2()×P2()×A×A, we assume that

b(t,x,μ,xτ,μτ,α,ατ)=b0(t)+b1(t)x+b̄1(t)m+b2(t)xτ+b̄2(t)mτ+b3(t)α+b4(t)ατ,σ(t,x,μ,xτ,μτ,α,ατ)=σ0(t)+σ1(t)x+σ̄1(t)m+σ2(t)xτ+σ̄2(t)mτ+σ3(t)α+σ4(t)ατ,    (5.3)

for some measurable deterministic functions b0,b1,b̄1,b2,b̄2,b3,b4,σ0,σ1,σ̄1,σ2,σ̄2,σ3,σ4 with values in ℝ bounded by R, and we have used the notation m = ∫xdμ(x) and mτ = ∫xdμτ(x) for the mean of measures μ and μτ, respectively.

(H5.2) The derivatives of f and g with respect to (x, xτ, μ, μτ, α) and (x, μ) are Lipschitz continuous with Lipschitz constant L.

(H5.3) The function f is strongly L-convex, which means that for any t ∈ [0, T], any x,x,xτ,xτ, any α, α′ ∈ A, any μ,μ,μτ,μτP2(), any random variables X and X′ having μ and μ′ as distribution, and any random variables Xτ and Xτ having μτ and μτ as distribution, then

f(t,x,μ,xτ,μτ,α)-f(t,x,μ,xτ,μτ,α)-xf(t,x,μ,xτ,μτ,α)(x-x)-xτf(t,x,μ,xτ,μτ,α)(xτ-xτ)-𝔼[μf(t,x,μ,xτ,μτ,α)(X)·(X-X)]-𝔼[μτf(t,x,μ,xτ,μτ,α)(Xτ)·(Xτ-Xτ)]-αf(t,x,μ,xτ,μτ,α)(α-α)κ|α-α|2.    (5.4)

The function g is also assumed to be L-convex in (x, μ).

Theorem 5.1. Under assumptions (H5.1-H5.3), the McKean–Vlasov FABSDE (5.2) is uniquely solvable.

The proof is based on continuation methods. Let λ ∈ [0, 1], consider the following class of McKean–Vlasov FABSDEs, denoted by MV-FABSDE(λ), for t ∈ [0, T]:

dXt=(λb(t,θt,θt-τ)+Itb)dt+(λσ(t,θt,θt-τ)+Itσ)dWt,dYt=-{λ(xH(t,θt,θt-τ,Yt,Zt)+𝔼~[μH(t,θ~t,θ~t-τ,t,Z~t)(Xt)]      +𝔼[xτH(t+τ,θt+τ,θt,Yt+τ,Zt+τ)|Ft]      +𝔼[𝔼~[μτH(t+τ,θ~t+τ,θ~t,t+τ,Z~t+τ)(Xt)]|Ft])      +Itf}dt+ZtdWt,    (5.5)

where we denote θt = (Xt, μt, αt), with optimality condition

αt=α^(t,Xt,μt,Xt-τ,μt-τ,Yt,Zt,𝔼[Yt+τ|Ft],𝔼[Zt+τ|Ft]),t[0,T],

and with initial condition X0 = x0; Xt = αt = 0 for t ∈ [−τ, 0) and terminal condition

YT=λ{xg(XT,μT)+𝔼~[μg(X~T,μT)(XT)}+ITg,

and Yt = 0 for t ∈ (T, T + τ], where (Itb,Itσ,Itf)0tT are some square-integrable progressively measurable processes with values in ℝ, and ITgL2(Ω,FT,) is a square integrable FT-measurable random variable with value in ℝ.

Observe that when λ = 0, system (5.5) becomes decoupled standard SDE and BSDE, which has an unique solution. When setting λ=1, Itb=Itσ=Itf=0 for 0 ≤ tT, and ITg=0, we are able to recover the system of (5.2).

Lemma 5.2. Given λ0 ∈ [0, 1), for any square-integrable progressively measurable processes (Itb,Itσ,Itf)0tT, and ITgL2(Ω,FT,), such that system FABSDE0) admits a unique solution, then there exists δ0 ∈ (0, 1), which is independent on λ0, such that the system MV-FABSDE(λ) admits a unique solution for any λ ∈ [λ0, λ0 + δ0].

Proof. Assuming that (Xˇ,Yˇ,Ž,αˇ) are given as an input, for any λ ∈ [λ0, λ0 + δ0], where δ0 > 0 to be determined, denoting δ: = λ − λ0 ≤ δ0, we take

Itbδ[b(t,θˇt,θˇt-τ)]+Itb,Itσδ[σ(t,θˇt,θˇt-τ)]+Itσ,Itfδ[xH(t,θˇt,θˇt-τ,Yt,Zt)+𝔼~[μH(t,θˇ~t,θˇ~t-τ,Yˇ~t,Ž~t)(Xt)]  +𝔼[xτH(t+τ,θˇt+τ,θˇt,Yˇt+τ,Žt+τ)|Ft]  +𝔼[𝔼~[μτH(t+τ,θˇ~t+τ,θˇ~t,Yˇ~t+τ,Ž~t+τ)(Xˇt)]|Ft]]+Itf,ITgδ[xg(XˇT,μT)+𝔼~[μg(Xˇ~T,μT)(XˇT)]+ITg.    (5.6)

According to the assumption, let (X, Y, Z) be the solutions of MV-FABSDE(λ0) corresponding to inputs (Xˇ,Yˇ,Ž), i.e., for t ∈ [0, T]

dXt=(λ0bt+δbˇt+Itb)dt+(λ0σt+δσˇt+Itσ)dWt,dYt=-{λ0(xHt+𝔼~[μH~t(Xt)]      +𝔼[xτHt+τ|Ft]+𝔼[𝔼~[μτH~t+τ(Xt)]|Ft])      +δ(xȞt+𝔼~[μȞ~t(Xˇt)]+𝔼[xτȞt+τ|Ft]      +𝔼[𝔼~[μτȞ~t+τ(Xˇt)]|Ft])+Itf}dt+ZtdWt,    (5.7)

with initial condition, X0 = x0, Xs = αs = 0 for s ∈ [−τ, 0), and terminal condition

YT=λ0(xgT+𝔼~[μg~T(XT)])+δ(xǧT+𝔼~[μǧ~T(XˇT)])+ITg,    (5.8)

and Yt = Zt = 0 for t ∈ (T, T+τ], where we have used simplified notations,

         bt :=b(t,θt,θt-τ);bˇt :=b(t,θˇt,θˇt-τ);         σt :=σ(t,θt,θt-τ);σˇt :=σ(t,θˇt,θˇt-τ);       xHt :=xH(t,θt,θt-τ,Yt,Zt);      𝔼~[μH~t(Xt)] :=𝔼~[μH(t,θ~t,θ~t-τ,t,Z~t)(Xt)]     𝔼[xτHt+τ|Ft] :=𝔼[xτH(t+τ,θt+τ,θt,Yt+τ,Zt+τ)|Ft];𝔼[𝔼~[μH~t(Xt)]|Ft] :=𝔼[𝔼~[μτH(t+τ,θ~t+τ,θ~t,t+τ,Z~t+τ)         (Xt)]|Ft];      xgT :=xg(XT,μT);𝔼~[μg~T(XT)]]          :=𝔼~[μg(X~T,μT)(XT)]similar notation for xȞt, 𝔼~[μȞ~t(Xˇt)], 𝔼[xτȞt+τ|Ft], and 𝔼[𝔼~[μτȞ~t+τ(Xˇt)]|Ft].    (5.9)

We would like to show that the map Φ:(Xˇ,Yˇ,Ž,αˇ)Φ(Xˇ,Yˇ,Ž,αˇ)=(X,Y,Z,α) is a contraction. Consider (ΔX, ΔY, ΔZ, Δα) = (XX′, YY′, ZZ′, α−α′), where (X,Y,Z,α)=Φ(Xˇ,Yˇ,Ž,αˇ). In addition, for the following computation, we have used simplified notation:

            Δbt :=b(t,θt,θt-τ)-b(t,θt,θt-τ);            Δbˇt :=b(t,θˇt,θˇt-τ)-b(t,θˇt,θˇt-τ);            Δσt :=σ(t,θt,θt-τ)-σ(t,θt,θt-τ);            Δσˇt :=σ(t,θˇt,θˇt-τ)-σ(t,θˇt,θˇt-τ)            xgT :=xg(XT,μT)-xg(XT,μT);       Δ𝔼~[μg~T(XT)]] :=𝔼~[μg(X~T,μT)(XT)]            -𝔼~[μg(Xˇ~T,μT)(XT)]          ΔxHt :=xH(t,θt,θt-τ,Yt,Zt)            -xH(t,θt,θt-τ,Yt,Zt)      Δ𝔼~[μH~t(Xt)] :=𝔼~[μH(t,θ~t,θ~t-τ,t,Z~t)(Xt)]            -𝔼~[μH(t,θ~t,θ~t-τ,t,Z~t)(Xt)]    Δ𝔼[xτHt+τ|Ft] :=𝔼[xτH(t+τ,θt+τ,θt,Yt+τ,Zt+τ)|Ft]            -𝔼[xτH(t+τ,θt+τ,θt,Yt+τ,Zt+τ)|Ft]   Δ𝔼[𝔼~[μH~t(Xt)]|Ft] :=𝔼[𝔼~[μτH(t+τ,θ~t+τ,θ~t,t+τ,            Z~t+τ)(Xt)]|Ft]-𝔼[𝔼~[μτH(t+τ,θ~t+τ,θ~t,            t+τ,Z~t+τ)(Xt)]|Ft]similar notation forΔxȞt, Δ𝔼~[μȞ~t(Xˇt)], Δ𝔼[xτȞt+τ|Ft],             andΔ𝔼[𝔼~[μτȞ~t+τ(Xˇt)]|Ft].    (5.10)

Applying integration by parts to ΔXtYt, we have

d(ΔXtYt)=Yt{[λ0Δbt+δΔbˇt]dt+[λ0Δσt+δΔσˇt]dWt}-ΔXt{λ0(xHt+𝔼~[μH~t(Xt)]+𝔼[xτHt+τ|Ft]+𝔼[𝔼~[μτH~t+τ(Xt)]|Ft])+δ(xȞt+𝔼~[μȞ~t(Xˇt)]+𝔼[xτȞt+τ|Ft]+𝔼[𝔼~[μτȞ~t+τ(Xˇt)]|Ft])}dt+ΔXtZtdWt+(λ0Δσt+δΔσˇt)Ztdt.    (5.11)

After integrating from 0 to T, and taking expectation on both sides, we obtain

𝔼[ΔXTYT]=λ0𝔼0T(ΔbtYt+ΔσtZt-ΔXt(xHt+𝔼~[μH~t(Xt)]+𝔼[xτHt+τ|Ft]+𝔼[𝔼~[μτH~t+τ(Xt)]|Ft]))dt+δ𝔼0T(ΔbˇtYt+ΔσˇZt-ΔXt(xȞt+𝔼~[μȞ~t(Xˇt)]+𝔼[xτȞt+τ|Ft]+𝔼[𝔼~[μτȞ~t+τ(Xˇt)]|Ft]))dt    (5.12)

In the meantime, from the terminal condition of YT given in (5.8), and since g is convex, we also have

  𝔼[ΔXTYT]=𝔼[ΔXT(λ0(xgT+𝔼~[μg~T(XT)])+δ(xǧT+𝔼~[μǧ~T(XˇT)])+ITg)]λ0𝔼[g(XT,μT)-g(XT-μT)]+δΔXT(xǧT+𝔼~[μǧ~T(XˇT)])+ΔXTITg    (5.13)

Following the proof of sufficient part of maximum principle and using (5.12), and (5.13), we find

λ0(J(α)-J(α))       =λ0𝔼[g(XT,μT)-g(XT,μT)]       +λ0𝔼0T[f(t,θt,Xt-τ,μt-τ)       -f(t,θt,Xt-τ,μt-τ)]dt       𝔼[ΔXTYT]-δΔXT(xǧT+𝔼~[μǧ~T(XˇT)])-ΔXTITg       +λ0𝔼0T[f(t,θt,Xt-τ,μt-τ)-f(t,θt,Xt-τ,μt-τ)]dt       =λ0𝔼0T[ΔbtYt+ΔσtZt       -ΔXt(xHt+𝔼~[μH~t(Xt)]+𝔼[xτHt+τ|Ft]       +𝔼[𝔼~[μτH~t+τ(Xt)]|Ft])]dt       +δ𝔼0T[ΔbˇtYt+ΔσˇtZt-ΔXt(xȞt+𝔼~[μȞ~t(Xˇt)]       +𝔼[xτȞt+τ|Ft]+𝔼[𝔼~[μτȞ~t+τ(Xt)]|Ft])]dt       +λ0𝔼0T[H(t,θt,θt-τ,Yt,Zt)-H(t,θt,θt-τ,Yt,Zt)]dt       -λ0𝔼0T(ΔbtYt+ΔtσZt)dt-δΔXT(xǧT       +𝔼~[μǧ~T(XˇT)])-ΔXTITg       =λ0𝔼0T[H(t,θt,θt-τ,Yt,Zt)-H(t,θt,θt-τ,Yt,Zt)       -ΔXt(xHt+𝔼~[μH~t(Xt)]+𝔼[xτHt+τ|Ft]       +𝔼[𝔼~[μτH~t+τ(Xt)]|Ft])]dt+δ𝔼0T[ΔbˇtYt       +ΔσˇZt-ΔXt(xȞt+𝔼~[μȞ~t(Xˇt)]+𝔼[xτȞt+τ|Ft]       +𝔼[𝔼~[μτȞ~t+τ(Xˇt)]|Ft])]dt       -δΔXT(xǧT+𝔼~[μǧ~T(XˇT)])-ΔXTITg       -𝔼0Tλ0κ|Δαt|2dt+δ𝔼0T[ΔbˇtYt+ΔσˇtZt       -ΔXt(xȞt+𝔼~[μȞ~t(Xˇt)]+𝔼[xτȞt+τ|Ft]        +𝔼[𝔼~[μτȞ~t+τ(Xˇt)]|Ft])]dt-δΔXT(xǧT       +𝔼~[μǧ~T(XˇT)])-ΔXTITg    (5.14)

Reverse the role of α and α′, we also have

λ0(J(α)-J(α))-𝔼0Tλ0κ|Δαt|2dt+δ𝔼0T[ΔbˇtYt+ΔσˇtZt-ΔXt(xȞt+𝔼~[μȞ~t(Xˇt)]+𝔼[xτȞt+τ|Ft]+𝔼[𝔼~[μτȞ~t+τ(Xˇt)]|Ft])]dt-δΔXT(xǧT+𝔼~[μǧ~T(XˇT)])-ΔXTITg    (5.15)

Summing (5.14) and (5.15), using the fact that b and σ have the linear form, using change of time and Lipschitz assumption, it yields

2λ0κ𝔼0T|Δαt|2dtδ𝔼0T[ΔbˇtΔYt+ΔσˇΔZt-ΔXt(ΔxȞt+Δ𝔼~[μȞ~t(Xt)]+Δ𝔼[xτȞt+τ|Ft]+Δ𝔼[𝔼~[μτȞ~t+τ(Xt)]|Ft])]dt+δΔXT(xǧT-xǧT+𝔼~[μǧ~T(XˇT)]-𝔼~[μǧ~T(XˇT)])12𝔼0T[ϵ(|ΔXt|2+|ΔYt|2+|ΔZt|2)+1ϵδ2(|Δbˇt|2+|Δσˇ|2+|ΔxȞt+Δ𝔼~[μȞ~t(Xt)]+Δ𝔼[xτȞt+τ|Ft]+Δ𝔼[𝔼~[μτȞ~t+τ(Xt)]|Ft]|2)]dt+12(ϵ|ΔXT|2+1ϵδ2|xǧT-xǧT+𝔼~[μǧ~T(XˇT)]-𝔼~[μǧ~T(XˇT)]|2)12ϵ𝔼[0Tϵ(|ΔXt|2+|ΔYt|2+|ΔZt|2+|Δαt|2)dt+|ΔXT|2]+12δCϵ𝔼[0T(|ΔXˇt|2+|ΔYˇt|2+|ΔŽt|2+|Δαˇt|2)]dt+|ΔXˇT|2],    (5.16)

Next, we apply Ito's formula to ΔXt2,

  dΔXt2=2ΔXtdXt+dX,Xt=2ΔXt(λ0Δbt+δΔbˇt)dt+2ΔXt(λ0Δσt+δΔσˇt)dWt+(λ0Δσt+δΔσˇt)2dt    (5.17)

Then integrate from 0 to T, and take expectation,

  𝔼[|ΔXt|2]=2λ0𝔼0t|ΔXsΔbs|ds+2δ𝔼0t|ΔXsΔbˇs|ds+𝔼0t|λ0Δσs+δΔσˇs|2dsλ0𝔼0t(|ΔXs|2+|Δbs|2)ds+𝔼0t(|ΔXs|2+δ2|Δbˇs|2)ds+𝔼0t(2λ02|Δσs|2+2δ2|Δσˇs|2)dsC𝔼0t+τ(|ΔXs|2+|Δαs|2)ds+δC𝔼0t+τ(|ΔXˇs|2+|Δαˇs|2)ds    (5.18)

From Gronwall's inequality, we can obtain

sup0tT𝔼[|Xt|2]C𝔼0T|Δαt|2dt+δC𝔼0T(|ΔXˇt|2+|Δαˇt|2)dt    (5.19)

Similarly, applying Ito's formula to |ΔYt|2, and taking expectation, we have

  𝔼[|ΔYt|2+tT|ΔZs|2ds]=2λ0𝔼tT|ΔYt(ΔxHt+Δ𝔼~[μH~t(Xt)]+Δ𝔼[xτHt+τ|Ft]+Δ𝔼[𝔼~[μτH~t+τ(Xt)]|Ft])|+2δ𝔼tT|ΔYt(ΔxȞt+Δ𝔼~[μȞ~t(Xˇt)]+Δ𝔼[xτȞt+τ|Ft]+Δ𝔼[𝔼~[μτȞ~t+τ(Xˇt)]|Ft])|+𝔼|ΔYT|2𝔼tT(1ϵ|ΔYt|2+ϵλ02|ΔxHt+Δ𝔼~[μH~t(Xt)]+Δ𝔼[xτHt+τ|Ft]+Δ𝔼[𝔼~[μτH~t+τ(Xt)]|Ft]|2)dt+𝔼tT(|ΔYt|2+δ2|ΔxȞt+Δ𝔼~[μȞ~t(Xˇt)]+Δ𝔼[xτȞt+τ|Ft]+Δ𝔼[𝔼~[μτȞ~t+τ(Xˇt)]|Ft]|2)dt+𝔼|λ0(ΔxgT+Δ𝔼~[μg~T(XT)])+δ(ΔxǧT+Δ𝔼~[μǧ~T(XˇT)])|2    (5.20)

Choose ϵ = 96 max{R2, L}, and from assumption (H5.1 - H5.2) and Gronwall's inequality, we obtain a bound for sup0tT𝔼|ΔYt|2; and then substitute the it back to the same inequality, we are able to obtain the bound for 0TE|Zt|2dt. By combining these two bounds, we deduce that

  𝔼[sup0tT|Yt|2+0T|Zt|2dt]C𝔼(sup0tT|ΔXt|2+0T|Δαt|2dt)+δC𝔼[sup0tT(|ΔXˇt|2+|ΔYˇt|2)+0T(|ΔŽt|2+|Δαˇt|2)dt]    (5.21)

Finally, combining (5.19) and (5.21), and (5.16), we deduce

  𝔼[sup0tT|ΔXt|2+sup0tT|ΔYt|2+0T(|ΔZt2|+|Δαt|2)dt]δC𝔼[sup0tT|ΔXˇt|2+sup0tT|ΔYˇt|2+0T(|ΔŽt|2+|Δαˇt|2)dt]    (5.22)

Let δ0=12C, it is clear that the mapping Φ is a contraction for all δ ∈ (0, δ0). It follows that there is a unique fixed point which is the solution of MV-FABSDE(λ) for λ = λ0 + δ, δ ∈ (0, δ0).

Proof of Theorem 5.1: For λ = 0, FABSDE(0) has a unique solution. Using Lemma 5.2, there exists a δ0 > 0 such that FBSDE(δ) has a unique solution for δ ∈ [0, δ0], assuming (n − 1)δ0 < 1 ≤ 0. Following by a induction argument, we repeat Lemma 5.2 for n times, which gives us the existence of the unique solution of FABSDE(1).

6. Conclusion

Overall, we presented a comprehensive study of a general class of mean-field control problems with delay effect. The state dynamics is described by a McKean–Vlasov stochastic delayed differential equation. We derive the adjoint process associated to the dynamics, which is in the form of an anticipated backward stochastic differential equation of McKean–Vlasov type. We also prove a version of stochastic maximum principle, which gives necessary and sufficient conditions for the optimal control. Furthermore, we prove the existence and uniqueness of this class of forward anticipated backward stochastic differential equations under suitable conditions.

However, due to the lack of explicit solutions, numerical methods are needed. The non-linear nature of the problem due to the McKean–Vlasov aspect combined with non-Markovianity due to delay rule out classical numerical methods. Our study show that deep learning methods can deal with these obstacles. we proposed two algorithms based on machine learning to numerically solve the control problem. One is to directly approximate the control using a neural network, while the loss is given by the objective function in the control problem. The other algorithm is based on the system of forward and backward stochastic differential equations. We approximate the adjoint processes (Y·, Z·) and the conditional expectation of the adjoint process 𝔼[Y·+τ|F·] using neural networks. In this case, there are two loss functions that we need to minimize as shown in (3.16). The first loss is associated with the adjoint process Y·, and the other one is related to 𝔼[Y·+τ|F·]. After minimizing the losses, the optimal control can be readily obtained from the adjoint processes. In addition, Figure 8 illustrates that the optimal value to the control problem, that is computed from different methods, converges. Moreover, our method also works when the control problem has no delay. As a sanity check, we provide an example of control problem without delay effect, and we show that the solution obtained from the algorithm matches the explicit solution.

Data Availability Statement

The datasets generated for this study are available on request to the corresponding author.

Author Contributions

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

Funding

This work supported by NSF grant DMS-1814091.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Carmona R, Fouque JP, Sun LH. Mean field games and systemic risk. Commun Math Sci. (2015) 13:911–33. doi: 10.4310/CMS.2015.v13.n4.a4

CrossRef Full Text | Google Scholar

2. Carmona R, Fouque JP, Mousavi SM, Sun LH. Systemic risk and stochastic games with delay. J Optim Appl. (2018) 179:366–99. doi: 10.1007/s10957-018-1267-8

CrossRef Full Text | Google Scholar

3. Carmona R, Delarue F. Probabilistic Theory of Mean Field Games with Applications I & II. Springer International Publishing (2018).

Google Scholar

4. Bensoussan A, Frehse J, Yam SCP. Mean Field Games and Mean Field Type Control Theory. New York, NY: Springer-Verlag (2013).

Google Scholar

5. Peng S, Yang Z. Anticipated backward stochastic differential equations. Ann Probab. (2009) 37:877–902. doi: 10.1214/08-AOP423

CrossRef Full Text | Google Scholar

6. Zhang J. Backward Stochastic Differential Equations. New York, NY: Springer-Verlag (2017).

Google Scholar

7. Chen L, Wu Z. Maximum principle for the stochastic optimal control problem with delay and application. Automatica. (2010) 46:1074–80. doi: 10.1016/j.automatica.2010.03.005

CrossRef Full Text | Google Scholar

8. Peng S, Wu Z. Fully coupled forward-backward stochastic differential equations and applications to optimal control. SIAM J Control Optim. (1999) 37:825–43. doi: 10.1137/S0363012996313549

CrossRef Full Text | Google Scholar

9. Bensoussan A, Yam SCP, Zhang Z. Well-posedness of mean-field type forward-backward stochastic differential equations. Stochastic Process Appl. (2015) 125:3327–54. doi: 10.1016/j.spa.2015.04.006

CrossRef Full Text | Google Scholar

10. Ma J, Yong J. Forward-Backward Stochastic Differential Equations and their Applications. Berlin; Heidelberg: Springer-Verlag (2007).

Google Scholar

11. Ma J, Protter P, Yong J. Solving forward-backward stochastic differential equations explicitly–a four step scheme. Probab Theory Relat Fields. (1994) 98:339–59.

Google Scholar

12. Weinan E, Han J, Jentzen A. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun Math Stat. (2017) 5:349–80. doi: 10.1007/s40304-017-0117-6

CrossRef Full Text | Google Scholar

13. Raissi M. Forward-backward stochastic neural networks: deep learning of high-dimensional partial differential equations. (2018). arxiv[Preprint].arxiv:1804.07010.

Google Scholar

14. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. (1997) 9:1735–80.

PubMed Abstract | Google Scholar

15. Sutton RS, McAllester DA, Singh SP, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems (NIPS). (1999). p. 1057–63. Available online at: https://papers.nips.cc/paper/1713-policy-gradient-methods-for-reinforcement-learning-with-function-approximation

Google Scholar

16. Han J, Weinan E. Deep Learning Approximation for Stochastic Control Problems, Deep Reinforcement Learning Workshop. Conference on Neural Information Processing Systems (NIPS) (2016).

17. Carmona R, Delarue F, Lachapelle A. Control of McKean-Vlasov dynamics versus mean field games. Math Fin Econ. (2013) 7:131–66. doi: 10.1007/s11579-012-0089-y

CrossRef Full Text | Google Scholar

18. Fouque JP, Zhang Z. Mean field game with delay: a toy model. Risks. (2018) 6:1–17. doi: 10.3390/risks6030090

CrossRef Full Text | Google Scholar

Keywords: deep learning, mean field control, delay, McKean-Vlasov, stochastic maximum principal

Citation: Fouque J-P and Zhang Z (2020) Deep Learning Methods for Mean Field Control Problems With Delay. Front. Appl. Math. Stat. 6:11. doi: 10.3389/fams.2020.00011

Received: 02 November 2019; Accepted: 07 April 2020;
Published: 12 May 2020.

Edited by:

Sou Cheng Choi, Illinois Institute of Technology, United States

Reviewed by:

Tomas Tichy, VSB-Technical University of Ostrava, Czechia
Samy Wu Fung, University of California, Los Angeles, United States

Copyright © 2020 Fouque and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jean-Pierre Fouque, Zm91cXVlQHBzdGF0LnVjc2IuZWR1

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.