
ORIGINAL RESEARCH article

Front. Appl. Math. Stat., 24 May 2021
Sec. Optimization
This article is part of the Research Topic Data Science Applications to Inverse and Optimization Problems in Earth Science

Performance Analysis of Trust Region Subproblem Solvers for Limited-Memory Distributed BFGS Optimization Method

Guohua Gao1*, Horacio Florez1, Jeroen C. Vink2, Terence J. Wells2, Fredrik Saaf1 and Carl P. A. Blom2
  • 1Shell Global Solutions (United States) Inc., Houston, TX, United States
  • 2Shell Global Solutions International B.V., Amsterdam, Netherlands

The limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimization method performs very efficiently for large-scale problems. A trust region search method generally performs more efficiently and robustly than a line search method, especially when the gradient of the objective function cannot be accurately evaluated. The computational cost of an L-BFGS trust region subproblem (TRS) solver depends mainly on the number of unknown variables (n) and the number of variable shift vectors and gradient change vectors (m) used for Hessian updating, with m ≪ n for large-scale problems. In this paper, we analyze the performance of different methods to solve the L-BFGS TRS. The first method is the direct method using the Newton-Raphson (DNR) method and Cholesky factorization of a dense n × n matrix, the second is the direct method based on inverse quadratic (DIQ) interpolation, and the third is a new method that combines the matrix inversion lemma (MIL) with an approach to update associated matrices and vectors. The MIL approach is applied to reduce the dimension of the original problem with n variables to a new problem with m variables. Instead of directly using expensive matrix-matrix and matrix-vector multiplications to solve the L-BFGS TRS, a more efficient approach is employed to update matrices and vectors iteratively. The L-BFGS TRS solver using the MIL method performs more efficiently than the solvers using the DNR or DIQ method. Testing on a representative suite of problems indicates that the new method converges to optimal solutions comparable to those obtained using the DNR or DIQ method. Its computational cost represents only a modest overhead over the well-known L-BFGS line-search method but delivers improved stability in the presence of inaccurate gradients. When compared to the solver using the DNR or DIQ method, the new TRS solver can reduce the computational cost by a factor proportional to n²/m for large-scale problems.

Introduction

Decision-making tools based on optimization procedures have been successfully applied to solve practical problems in a wide range of areas. An optimization problem is generally defined as minimizing (or maximizing) an objective function f(x) within a user-defined search domain x ∈ Ω, subject to some linear or nonlinear constraints, where x is an n-dimensional vector that contains all controllable variables.

In the oil and gas industry, an optimal business development plan requires robust production optimization because of the considerable uncertainty of subsurface reservoir properties and volatile oil prices. Many papers have been published on the topic of robust optimization and its applications to cyclic CO2 flooding through the Gas-Assisted Gravity Drainage process [1], well placement optimization in geologically complex reservoirs [2], optimal production schedules of smart wells for water flooding [3] and in naturally fractured reservoirs [4], to mention just a few examples.

Some researchers formulated the optimization problem under uncertainty as a single-objective optimization problem, e.g., only maximizing the mean of net present value (NPV) for simplicity by neglecting the associated risk. However, it is recommended to formulate robust optimization as a bi-objective optimization problem for consistency and completeness. A bi-objective optimization problem is generally defined as minimizing (or maximizing) two different objective functions, f1(x) and f2(x), within a user-defined search domain x ∈ Ω, subject to some linear or nonlinear constraints. For example, we may maximize the mean value, denoted by f1(x), and minimize the standard deviation, denoted by f2(x), of NPV. For a bi-objective optimization problem, the optimal solutions are defined as the Pareto optimal solutions (or Pareto front). It is a very challenging task to find multiple optimal solutions on the Pareto front [5, 6].

It is also a very challenging task to properly characterize the uncertainty of reservoir properties (e.g., porosity and permeability in each grid-block) and to reliably quantify the ensuing uncertainty of production forecasts (e.g., production rates of oil, gas, and water phases) by conditioning to historical production data and 4D seismic data [7, 8]. This requires generating multiple conditional realizations by minimizing a properly defined objective function, e.g., using the randomized maximum likelihood (RML) method [9].

When an adjoint-based gradient of the objective function is available [10–12], we may apply a gradient-based optimization method, e.g., the Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimization method [13, 14]. However, the adjoint-based gradient is not available for many commercial simulators, nor for integer-type or discrete variables (e.g., well locations defined by grid-block indices). In such a case, we must apply a derivative-free optimization (DFO) method [15–18]. For problems with smooth objective functions and continuous variables, model-based DFO methods using either radial-basis functions [19] or quadratic models [16, 20, 21] perform more efficiently than other DFO methods such as direct pattern search methods [22] and stochastic search methods [23].

Traditional optimization methods only locate a single optimal solution, and they are referred to as single-thread optimization methods in this paper. It is unacceptably expensive to use a single-thread optimization method to locate multiple optimal solutions on the Pareto front or generate multiple RML samples [24]. To overcome the limitations of single-thread optimization methods, Gao et al. [17] developed a local-search distributed Gauss-Newton (DGN) DFO method to find multiple best matches concurrently. Later, Chen et al. [25] modified the local-search DGN optimization method and generalized the DGN optimizer for global search. They also integrated the global-search DGN optimization method with the RML method to generate multiple RML samples in parallel.

Because multiple search points are generated in each iteration, multiple optimal solutions can eventually be found in parallel. These distributed optimization methods are referred to as multiple-thread optimization methods. Both the local- and global-search DGN optimization methods are only applicable to a specific class of optimization problems, i.e., history matching or least-squares optimization problems, whose objective function can be expressed in the least-squares form f(x) = (1/2)αx^T x + (1/2)y^T(x)y(x); they cannot be applied to other types of (generic) optimization problems whose objective function cannot be expressed in a least-squares form. Furthermore, the DGN optimization methods may become less efficient for history matching problems when both the number of variables (n) and the number of observed data (m) are large (e.g., on the order of thousands or more).

Recently, Gao et al. [18] developed a well-paralleled distributed quasi-Newton (DQN) DFO method for generic optimization problems by introducing a generalized form of the objective function, F(x,y)=f(x). Here, x represents the vector of controllable variables or model parameters to be optimized (explicit variables) and y(x) denotes the vector of simulated responses (implicit variables) of a reservoir using explicit variables x. Using the generalized form of the objective function, the gradient of the objective function can be evaluated analytically by,

g(x) = \nabla_x F(x,y) + J^T(x)\,\nabla_y F(x,y)    (1)

In Eq. 1, J^T(x) = ∇_x y^T(x) is the transpose of the sensitivity matrix. Using the generalized expression of the objective function F(x,y), different objective functions can be evaluated using the same y(x) that is simulated from the same reservoir model, e.g., for multi-objective optimization problems. A specific case of the generalized expression is F(x,y) = y = f(x), for which the transpose of the sensitivity matrix becomes the gradient of the objective function.
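As a concrete illustration, Eq. 1 is a single matrix-vector product plus a vector addition. A minimal NumPy sketch (the function name and array shapes are ours, chosen only for illustration):

```python
import numpy as np

def generalized_gradient(grad_x_F, grad_y_F, J):
    """Assemble g(x) = grad_x F(x, y) + J^T(x) grad_y F(x, y)   (Eq. 1).

    grad_x_F : (n,) partial derivative of F w.r.t. the explicit variables x
    grad_y_F : (m,) partial derivative of F w.r.t. the simulated responses y
    J        : (m, n) sensitivity matrix dy/dx
    """
    return grad_x_F + J.T @ grad_y_F

# Example with arbitrary shapes (n = 5 variables, m = 3 responses).
rng = np.random.default_rng(0)
g = generalized_gradient(rng.normal(size=5), rng.normal(size=3), rng.normal(size=(3, 5)))
```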

The DQN optimization algorithm runs N_e optimization threads in parallel. In the k-th iteration, there are N_LC^(k) ≤ N_e non-converged threads. The DQN optimizer generates a new search point x_T^(i,k) = x^(i,k) + s^(i,k) for each non-converged thread, where the search step s^(i,k) is solved from a trust region subproblem (TRS). The N_LC^(k) simulation cases are submitted to a high-performance computing (HPC) cluster in parallel to generate the corresponding simulated responses y_T^(i,k) = y(x_T^(i,k)). Then, we evaluate the objective function f_T^(i,k) = F(x_T^(i,k), y_T^(i,k)) and its associated partial derivatives ∇_x F^(i,k) and ∇_y F^(i,k). If the new search point x_T^(i,k) improves the objective function, i.e., f_T^(i,k) < f^(i,k) = f(x^(i,k)), then we update x^(i,k+1) = x_T^(i,k) and f^(i,k+1) = f_T^(i,k).

All simulation results generated by all DQN optimization threads in previous and current iterations are recorded in a training data set, and they are shared among all DQN optimization threads. The sensitivity matrix J(i,k+1)=J(x(i,k+1)) is approximated using a modified QR-method proposed by Gao, et al. [18] by linear interpolation of training data points that are closest to x(i,k+1). Then, we approximate the gradient g(i,k+1)=g(x(i,k+1)) using Eq. 1. Finally, the Hessian H(i,k+1) for each thread is updated using either the BFGS method or the symmetric rank-1 (SR1) method [14]. For simplicity, we will drop the superscript “i” (the optimization thread index) in the following discussions.

Both DGN and DQN optimization methods approximate the objective function by a quadratic model, and they are designed for problems with smooth objective function and continuous variables. Although their convergence is not guaranteed for problems with integer type variables (e.g., well location optimization), if those integers can be treated as truncated continuous variables, our numerical tests indicate that these distributed optimization methods can improve the objective function significantly and locate multiple suboptimal solutions for problems with integer type variables in only a few iterations.

The number of variables for real-world problems may vary from a few to thousands or even more, depending on the problem and the parameterization techniques employed. For example, the number of variables could be in the order of millions if we do not apply any parameter reduction techniques to reduce the dimension of some history matching problems (e.g., to tune permeability and porosity in each grid block). Because reservoir properties are generally correlated with each other with long correlation lengths, we may reduce the number of variables to be tuned to only a few hundred, e.g., using the principal component analysis (PCA) or other parameter reduction techniques [26].

In this paper, our focus is on the performance analysis of different methods to solve the TRS formulated with the limited-memory BFGS (L-BFGS) Hessian updating method for unconstrained optimization problems. In the future, we will further integrate the new TRS solver with our newly proposed limited-memory distributed BFGS (L-DBFGS) optimization algorithm and then apply the new optimizer to some realistic oil/gas field optimization cases and benchmark its overall performance against other distributed optimization methods such as those using only the gradient [27]. We will also continue our investigation of constrained optimization problems (e.g., variables with lower- and/or upper-bounds and problems with nonlinear constraints) in the future. Gao et al. [18] applied the popular Newton-Raphson method [28] to directly solve the TRS using matrix factorization, the DNR method for short. The DNR method is quite expensive when it is applied to solve the N_e TRSs of a distributed optimization method, especially for large-scale problems with thousands or more variables [29]. This paper follows the ideas and concepts presented in the book "Matrix Computations" [30]. Flops are used to quantify the volume of work associated with a computation: one flop is a floating-point add, subtract, multiply, or divide. Computational costs (flops) of some commonly used algebraic operations and numerical methods are summarized in Table 1 for reference.

TABLE 1

TABLE 1. A summary of computational costs of commonly used algebraic operations.

For completeness, we discuss the compact representation of the L-BFGS Hessian updating formulation [31] and the algorithm to directly update the Hessian in the next section. In the third section, we present three different methods to solve the L-BFGS TRS: the DNR method, the direct method using the inverse quadratic (DIQ) interpolation approach proposed by Gao et al. [32], and the technique using the matrix inversion lemma (MIL) together with an efficient matrix updating algorithm. Some numerical tests and performance comparisons are discussed in the fourth section. We finally draw some conclusions in the last section.

The Limited-Memory Hessian Updating Formulation

Let x(k) be the best solution obtained in the current (k-th) iteration, and f(k), g(k), and H(k), respectively, the objective function, its gradient and Hessian evaluated at x(k). The Hessian H(k+1) is updated using the BFGS formulation as follows,

H^{(k+1)} = H^{(k)} + \frac{z^{(k)}[z^{(k)}]^T}{[z^{(k)}]^T s^{(k)}} - \frac{H^{(k)} s^{(k)} [s^{(k)}]^T H^{(k)}}{[s^{(k)}]^T H^{(k)} s^{(k)}}    (2)

In Eq. 2, s^(k) = x^(k+1) − x^(k) and z^(k) = g^(k+1) − g^(k). The Hessian H^(k+1) updated using Eq. 2 is guaranteed to be positive definite if the condition ε^(k) = [z^(k)]^T s^(k) > c_3 ‖z^(k)‖ ‖s^(k)‖ is satisfied, where c_3 > 0 is a small positive number.

Compact Representation

To save memory usage and computational cost, the L-BFGS Hessian updating method limits the maximum history size, or the maximum number of pairs (s^(k), z^(k)) used for the BFGS Hessian updating, to L_M > 1. The recommended value of L_M ranges from 5 to 20, using a smaller number for problems with more variables.

Let 1 ≤ l_k ≤ L_M denote the number of pairs of variable shift vectors and gradient change vectors, (s^(j), z^(j)) for j = 1, 2, ..., l_k, used to update the Hessian using the L-BFGS method in the k-th iteration. Let S^(k) = [s^(1), s^(2), ..., s^(l_k)] be the n × l_k variable shift matrix and Z^(k) = [z^(1), z^(2), ..., z^(l_k)] the n × l_k gradient change matrix. Both matrices S^(k) and Z^(k) are updated iteration by iteration. Let m = 2l_k and let V^(k) = [S^(k), Z^(k)] be an n × m matrix. Let A^(k) = [S^(k)]^T S^(k) and B^(k) = [S^(k)]^T Z^(k). We decompose B^(k) into three parts: the strictly lower triangular part L^(k), the strictly upper triangular part U^(k), and the diagonal part E^(k) that contains the main diagonal of B^(k), i.e.,

B^{(k)} = L^{(k)} + E^{(k)} + U^{(k)}    (3)

The Hessian can be updated using the compact form of the L-BFGS Hessian updating formulation [31, 3335] as follows,

H^{(k+1)} = \alpha^{(k)} I_n - V^{(k)} W^{(k)} [V^{(k)}]^T    (4)

In Eq. 4, In is an n × n identity matrix and W(k) is an m×m symmetric matrix defined as,

[W^{(k)}]^{-1} = \frac{1}{\alpha^{(k)}} \begin{bmatrix} A^{(k)} & L^{(k)} \\ [L^{(k)}]^T & -\alpha^{(k)} E^{(k)} \end{bmatrix}    (5)

In Eq. 4 and Eq. 5, α^(k) = ‖z^(l_k)‖² / ([z^(l_k)]^T s^(l_k)) is a scaling factor. It is required that α^(k) > α_cr > 0 when using Eq. 4 and Eq. 5 to update the Hessian, where α_cr is a small positive number.

The Algorithm to Directly Update the Hessian Using the L-BFGS Compact Representation

Given L_M ≥ 1, k ≥ 1, l_{k-1} ≥ 1, the thresholds c_3 > 0 and α_cr > 0, the search step s = x^(k+1) − x^(k) and gradient change z = g^(k+1) − g^(k), and the two n × l_{k-1} matrices S^(k−1) and Z^(k−1), we can update S^(k) and Z^(k) and directly compute the Hessian H^(k+1) using the compact representation of the L-BFGS Hessian updating Eq. 4, as summarized in Algorithm-1.

Algorithm-1: Updating the Hessian directly using the L-BFGS compact representation

1. Compute ‖s‖, ‖z‖, ε^(k) = z^T s, and α^(k) = ‖z‖²/ε^(k) if ε^(k) > c_3‖s‖‖z‖;

2. If ε^(k) ≤ c_3‖s‖‖z‖ or α^(k) ≤ α_cr, set l_k = l_{k-1}, S^(k) = S^(k−1), Z^(k) = Z^(k−1), and go to step 4;

3. Otherwise,

a. Set l_k = L_M and remove the first column from S^(k−1) and Z^(k−1) if l_{k-1} = L_M; else set l_k = l_{k-1} + 1;

b. Update S^(k) = [S^(k−1)  s] and Z^(k) = [Z^(k−1)  z];

4. Form V^(k) = [S^(k)  Z^(k)];

5. Compute A^(k) = [S^(k)]^T S^(k) and B^(k) = [S^(k)]^T Z^(k);

6. Form the matrix W_I^(k) = [W^(k)]^{-1} using Eq. 5;

7. Compute W^(k) = [W_I^(k)]^{-1} using Cholesky factorization;

8. Compute U^(k) = W^(k)[V^(k)]^T;

9. Compute T^(k) = V^(k) U^(k);

10. Update H^(k+1) = α^(k) I_n − T^(k).

11. Stop.

The computational cost of directly updating the Hessian using the L-BFGS method as described in Algorithm-1 mainly depends on the number of variables (n) and the number of vectors used to update the Hessian (m = 2l_k). We should re-emphasize that m is updated iteratively but limited to m ≤ 2L_M. The computational cost is c_1 = 2mn² + (7 + 3m²)n + O(m³) flops, which includes the following seven operations: 1) computing ‖s‖, ‖z‖, and ε^(k) = z^T s in step 1 (6n flops); 2) computing A^(k) and B^(k) in step 5 (m²n flops); 3) forming the matrix W_I^(k) in step 6 (m² flops); 4) computing W^(k) in step 7 (O(m³) flops); 5) computing U^(k) in step 8 (2m²n flops); 6) computing T^(k) in step 9 (2mn² flops); and 7) updating H^(k+1) in step 10 (n flops).
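For illustration, a minimal NumPy sketch of Algorithm-1 that forms the dense Hessian from S^(k) and Z^(k) via Eqs. 3-5 (the function name is ours; for brevity the small m × m matrix is inverted with a generic routine rather than the factorization named in step 7):

```python
import numpy as np

def lbfgs_dense_hessian(S, Z, alpha):
    """Sketch of Algorithm-1: form H = alpha*I_n - V W V^T (Eq. 4) from the
    variable-shift matrix S and gradient-change matrix Z (both n x l_k)."""
    n, _ = S.shape
    A = S.T @ S                          # A = S^T S
    B = S.T @ Z                          # B = S^T Z
    L = np.tril(B, k=-1)                 # strictly lower triangular part of B
    E = np.diag(np.diag(B))              # diagonal part of B
    # [W]^{-1} = (1/alpha) * [[A, L], [L^T, -alpha*E]]   (Eq. 5)
    WI = np.block([[A, L], [L.T, -alpha * E]]) / alpha
    V = np.hstack([S, Z])                # V = [S, Z], n x m with m = 2*l_k
    W = np.linalg.inv(WI)                # invert the small m x m matrix
    return alpha * np.eye(n) - V @ W @ V.T   # Eq. 4
```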

Solving the L-BFGS Trust Region Subproblem

The Trust Region Subproblem

In the neighborhood of the best solution x^(k), the objective function f(x) can be approximated by a quadratic function of s = x − x^(k),

q^{(k)}(s) = f^{(k)} + [g^{(k)}]^T s + \frac{1}{2} s^T H^{(k)} s    (6)

If the Hessian H^(k) is positive definite, then q^(k)(s) has a unique minimum s_*^(k), which is the solution of H^(k)s = −g^(k),

s_*^{(k)} = -[H^{(k)}]^{-1} g^{(k)}    (7)

When a line search strategy is applied, we accept the full search step s_*^(k) by setting x^(k+1) = x^(k) + s_*^(k) if it improves the objective function sufficiently (e.g., satisfying the two Wolfe conditions). Otherwise, we can find a better search step size 0 < γ^(k) < 1 such that x^(k+1) = x^(k) + γ^(k)s_*^(k) improves the objective function sufficiently. If the gradient g^(k) can be accurately evaluated, it has been theoretically proved that a line search strategy converges [14]. However, for most real-world optimization problems, the gradient cannot be accurately evaluated, and as discussed by Gao et al. [16], a trust region search strategy performs more robustly and more efficiently than a line search strategy.

The trust region search step s_*^(k) for the next iteration is the global minimum of the quadratic model q^(k)(s) within a ball-shaped trust region with radius Δ^(k) ≥ Δ_min > 0, which is solved from the following trust region subproblem (TRS) [28],

\min_s q^{(k)}(s) = f^{(k)} + [g^{(k)}]^T s + \frac{1}{2} s^T H^{(k)} s \quad \text{subject to} \quad \|s\|_2 \le \Delta^{(k)}    (8)

If the solution s_*^(k) given by Eq. 7 satisfies ‖s_*^(k)‖_2 ≤ Δ^(k), then s_*^(k) is the solution of the TRS defined in Eq. 8. Otherwise, the solution of the TRS must lie on the boundary ‖s‖_2 = Δ^(k), and we should solve for the Lagrange multiplier λ_*^(k) together with the search step s_*^(k) from the following nonlinear TRS equations,

[H^{(k)} + \lambda I_n]\, s(\lambda) = -g^{(k)}    (9)
\pi(\lambda) = [s(\lambda)]^T s(\lambda) = [\Delta^{(k)}]^2    (10)

For the trivial case with H^(k) = αI_n, we have s(λ) = −g^(k)/(α+λ) and π(λ) = ‖g^(k)‖²/(α+λ)². The solution of the TRS defined in Eq. 8 is λ_* = 0 and s_*^(k) = −g^(k)/α if ‖g^(k)‖_2 ≤ αΔ^(k), or λ_* = ‖g^(k)‖_2/Δ^(k) − α and s_*^(k) = −g^(k)Δ^(k)/‖g^(k)‖_2 otherwise.
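This trivial case can be written out in a few lines; a small sketch (ours) of the two branches:

```python
import numpy as np

def trs_trivial(g, alpha, delta):
    """TRS solution when H = alpha*I_n: take the full Newton step if it fits
    inside the trust region, otherwise scale the step back to the boundary."""
    gnorm = np.linalg.norm(g)
    if gnorm <= alpha * delta:
        return 0.0, -g / alpha                  # lambda* = 0, interior solution
    lam = gnorm / delta - alpha                 # lambda* on the boundary
    return lam, -g * (delta / gnorm)            # ||s*|| = delta
```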

We may apply the popular DNR method to solve the TRS when H(k) is a dense matrix. However, the DNR method is quite expensive for large-scale problems and we should seek and apply a more efficient method to solve the TRS.

Solving the L-BFGS TRS Directly Using the Newton-Raphson Method

The DNR method solves the TRS defined in Eq. 8 using the Cholesky decomposition. It applies the Newton-Raphson method to iteratively solve the nonlinear TRS equation [28], φ(λ) = 1/‖s(λ)‖ − 1/Δ = 0, which requires computing the first-order derivative φ′(λ) = ‖v(λ)‖²/‖s(λ)‖³, where v is solved from L^T v = s together with the Cholesky factorization D = H^(k) + λI_n = LL^T or an LU-decomposition.

Given the trust region size Δ = Δ^(k) > 0, the threshold of convergence δ_cr > 0, the maximum number of iterations allowed N_TRS,max > 0, and the Hessian H = H^(k) and gradient g = g^(k) evaluated at the current best solution x^(k), both the Lagrange multiplier λ_* and the trust region search step s_* = s(λ_*) can be solved from the TRS defined in Eq. 8 using the DNR method, as summarized in Algorithm-2.

Algorithm-2: Solving the L-BFGS TRS Using the DNR Method

1. Initialize l = 0, λ_0 = 0;

2. Compute D = H + λ_0 I_n and the Cholesky decomposition D = LL^T;

3. Solve u_* from Lu = −g, s_* from L^T s = u_*, and v_* from L^T v = s_*;

4. Compute ‖s_*‖ and ‖v_*‖;

5. If ‖s_*‖ ≤ Δ, then accept λ_* = λ_0 and go to step 8;

6. Set δ = |1 − ‖s_*‖/Δ|;

7. Repeat steps (a) through (f) below, until convergence (δ ≤ δ_cr or l > N_TRS,max):

a. Update λ_{l+1} = λ_l + (‖s_*‖/‖v_*‖)² (‖s_*‖ − Δ)/Δ;

b. Compute D = H + λ_{l+1} I_n and the Cholesky decomposition D = LL^T;

c. Solve u_* from Lu = −g, s_* from L^T s = u_*, and v_* from L^T v = s_*;

d. Compute ‖s_*‖ and ‖v_*‖;

e. Update δ = |1 − ‖s_*‖/Δ|;

f. Set l = l + 1.

8. Stop.
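A compact NumPy sketch of Algorithm-2, assuming H + λI_n stays positive definite over the iterations (function name and defaults are ours):

```python
import numpy as np

def solve_trs_dnr(H, g, delta, tol=1e-4, max_iter=16):
    """Sketch of Algorithm-2: Newton-Raphson iterations on
    phi(lambda) = 1/||s(lambda)|| - 1/delta, with a Cholesky factorization
    of D = H + lambda*I at every trial lambda."""
    n = H.shape[0]
    lam = 0.0
    for _ in range(max_iter + 1):
        L = np.linalg.cholesky(H + lam * np.eye(n))   # D = L L^T
        u = np.linalg.solve(L, -g)                    # L u = -g
        s = np.linalg.solve(L.T, u)                   # L^T s = u
        v = np.linalg.solve(L.T, s)                   # L^T v = s (as in the paper)
        s_norm, v_norm = np.linalg.norm(s), np.linalg.norm(v)
        if lam == 0.0 and s_norm <= delta:
            return lam, s                             # interior solution accepted
        if abs(1.0 - s_norm / delta) <= tol:
            return lam, s                             # converged on the boundary
        lam += (s_norm / v_norm) ** 2 * (s_norm - delta) / delta
    return lam, s
```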

Let N_DNR denote the number of iterations required for the DNR TRS solver to converge. The total computational cost (flops) to solve the TRS using Algorithm-2 is c_2 = (5n + 3n² + n³/3)(N_DNR + 1), including the following three operations: (1) computing D = H + λ_{l+1}I_n and the Cholesky decomposition D = LL^T in step 2 and step 7(b) (n + n³/3 flops); (2) solving u_* from Lu = −g, s_* from L^T s = u_*, and v_* from L^T v = s_* in step 3 and step 7(c) (3n² flops); and (3) computing ‖s_*‖ and ‖v_*‖ in step 4 and step 7(d) (4n flops).

The total computational cost (flops) used to solve the L-BFGS TRS using the DNR method (Algorithm-2) together with the L-BFGS Hessian updating method (Algorithm-1) is,

c_{DNR} = c_1 + c_2 = 2mn^2 + (7 + 3m^2)n + \left(5n + 3n^2 + \frac{n^3}{3}\right)(N_{DNR} + 1) + O(m^3)    (11)

Our numerical results indicate that the DNR TRS solver may fail when tested on the well-known Rosenbrock function, especially for problems with large n. The root cause of the convergence failure of the DNR method is the same as in the GNTRS solver using the traditional Newton-Raphson method, as discussed by Gao et al. [36]: a very small value of |φ′(λ)| may result in a very large search step and thus in failure to converge. Gao et al. [36] proposed integrating the DNR method with a bisection line search to overcome this issue.

The Inverse Quadratic Model Interpolation Method to Directly Solve the L-BFGS TRS

Instead of applying the Newton-Raphson method, which requires evaluating φ′(λ), Gao et al. [32] proposed a method to directly solve the TRS using inverse quadratic model interpolation (called the DIQ method), i.e., approximating π(λ) = [s(λ)]^T s(λ) by the following inverse quadratic function,

q_{INV}(\lambda) = \frac{b}{(\lambda + a)^2}    (12)

Both coefficients a and b in Eq. 12 can be determined by interpolating the values of π(λ) evaluated at two different points, π_1 = π(λ_1) and π_2 = π(λ_2), with 0 ≤ λ_1 < λ_2.

In the first iteration, we set λ_1 = λ_min = 0 and accept λ_min = 0 as the solution if π_1 ≤ Δ². Otherwise, π_1 > Δ² holds and we set λ_2 = λ_max = max[0, ‖g^(k)‖/Δ − α]; π_2 < Δ² holds if λ_2 is not accepted as the solution. In the following iterations, either λ_1 or λ_2 is updated accordingly such that the two conditions π_1 > Δ² and π_2 < Δ² always hold.

Let ρ = √π_1 (√π_1 − √π_2)^{−1}; the following solution of q_INV(λ) = Δ² will be used as the trial search point for the next iteration,

\lambda_* = \lambda_2 - \rho\left(1 - \frac{\sqrt{\pi_2}}{\Delta}\right)(\lambda_2 - \lambda_1)    (13)

We either update λ_1 = λ_* if π(λ_*) > Δ² or update λ_2 = λ_* if π(λ_*) < Δ², iteratively until convergence. We accept λ_* as the desired solution and terminate the iterative process either when |√π(λ_*)/Δ − 1| < δ_cr, the tolerance for convergence, or when the number of iterations used to solve the TRS (N_DIQ) reaches the user-specified maximum iteration number (N_TRS,max), i.e., N_DIQ ≥ N_TRS,max.
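A minimal sketch (ours) of one DIQ update, i.e., Eq. 13 applied to a bracketing pair (λ_1, π_1), (λ_2, π_2) with π_1 > Δ² > π_2:

```python
import numpy as np

def diq_lambda_update(lam1, pi1, lam2, pi2, delta):
    """One inverse-quadratic interpolation step (Eqs. 12-13): return the lambda
    at which the interpolant q_INV(lambda) = b/(lambda + a)^2, fitted through
    (lam1, pi1) and (lam2, pi2), equals delta**2.  Assumes pi1 > delta**2 > pi2."""
    rho = np.sqrt(pi1) / (np.sqrt(pi1) - np.sqrt(pi2))
    return lam2 - rho * (1.0 - np.sqrt(pi2) / delta) * (lam2 - lam1)
```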

Given the scaling factor α > 0, the trust region size Δ > 0, the threshold of convergence δ_cr > 0, the maximum number of iterations allowed N_TRS,max > 0, and the Hessian H and gradient g, both the Lagrange multiplier λ_* and the trust region search step s_* = s(λ_*) can be solved from the TRS defined in Eq. 8 using the DIQ method, as summarized in Algorithm-3.

Algorithm-3: Solving the L-BFGS TRS Using the DIQ Method

(1) Compute ‖g‖;

(2) Calculate λ_max = max{0, ‖g‖/Δ − α};

(3) Initialize k = 0, λ_1 = λ_min = 0, and λ_2 = λ_max;

(4) Compute D = H + λ_1 I_n and the Cholesky decomposition D = LL^T;

(5) Solve u_1 from Lu = −g and s_1 from L^T s = u_1;

(6) Compute π_1 = π(λ_1) = s_1^T s_1;

(7) If π_1 ≤ Δ², then accept λ_* = λ_1 and go to step 12;

(8) If π_1 > Δ², then

a. Compute D = H + λ_2 I_n and the Cholesky decomposition D = LL^T;

b. Solve u_2 from Lu = −g and s_2 from L^T s = u_2;

c. Compute π_2 = π(λ_2) = s_2^T s_2;

(9) If |1 − √π_1/Δ| < |1 − √π_2/Δ|, then δ = |1 − √π_1/Δ| and λ_* = λ_1;

(10) Otherwise, δ = |1 − √π_2/Δ| and λ_* = λ_2;

(11) Repeat steps (a) through (i) below, until convergence (δ ≤ δ_cr or k > N_TRS,max):

a. Calculate ρ = √π_1(√π_1 − √π_2)^{−1};

b. Calculate λ_* = λ_2 − ρ(1 − √π_2/Δ)(λ_2 − λ_1);

c. Compute D = H + λ_* I_n and the Cholesky decomposition D = LL^T;

d. Solve u_* from Lu = −g and s_* from L^T s = u_*;

e. Compute π_* = π(λ_*) = [s_*]^T s_*;

f. Update δ = |1 − √π_*/Δ|;

g. If π_* > Δ², update λ_1 = λ_* and π_1 = π_*;

h. Otherwise, update λ_2 = λ_* and π_2 = π_*;

i. k ← k + 1.

(12) Stop.

Let NDIQ denote the number of iterations required for the DIQ TRS solver to converge. The total computational cost to solve the L-BFGS TRS using Algorithm-3 together with Algorithm-1 is,

c_{DIQ} = 2mn^2 + (7 + 3m^2)n + \left(5n + 2n^2 + \frac{n^3}{3}\right)(N_{DIQ} + 1) + O(m^3)    (14)

Generally, the DIQ method converges faster than the DNR method and performs more efficiently and robustly, especially for large-scale problems. Nevertheless, both the DNR method and the DIQ method become quite expensive for large-scale problems.

Using Matrix Inversion Lemma (MIL) to Solve the L-BFGS TRS

To save both memory usage and computational cost, Gao, et al. [32, 36, 37] proposed an efficient algorithm to solve the Gauss-Newton TRS (GNTRS) for large-scale history matching problems using the matrix inversion lemma (or the Woodbury matrix identity). With appropriate normalization of both parameters and residuals, the Hessian of the objective function for a history matching problem can be approximated by the well-known Gauss-Newton equation as follows,

H^{(k+1)} = \alpha I_n + V^{(k)} [V^{(k)}]^T    (15)

In Eq. 15, V(k)=[J(k+1)]T is an n×m matrix, the transpose of the sensitivity matrix J(k+1) that is evaluated at the current best solution x(k+1). Here, m denotes the number of observed data to be matched and it is not the same m as used in the L-BFGS Hessian updating Eq. 4.

The GNTRS solver proposed by Gao, et al. [32, 36, 37] using the matrix inversion lemma (MIL) has been implemented and integrated with the distributed Gauss-Newton (DGN) optimizer. Because the compact representation of the L-BFGS Hessian updating formula Eq. 4 is similar to Eq. 15, we follow the similar idea as proposed by Gao, et al. [32, 36, 37] to compute [H(k+1)+λIn]1 by applying the matrix inversion lemma and then solve the L-BFGS trust region search step s(λ).

[H^{(k+1)} + \lambda I_n]^{-1} = \frac{1}{\alpha^{(k)} + \lambda} I_n - \frac{1}{[\alpha^{(k)} + \lambda]^2} V^{(k)} \left\{\frac{1}{\alpha^{(k)} + \lambda}[V^{(k)}]^T V^{(k)} - [W^{(k)}]^{-1}\right\}^{-1} [V^{(k)}]^T    (16)

Let P(k)=[V(k)]TV(k) and u(k)=[V(k)]Tg(k). We first solve v(λ) from,

\left\{\frac{1}{\alpha^{(k)} + \lambda} P^{(k)} - [W^{(k)}]^{-1}\right\} v(\lambda) = u^{(k)}    (17)

Then, we compute the trust region search step s(λ) and π(λ) = ‖s(λ)‖² as,

s(\lambda) = -[H^{(k+1)} + \lambda I_n]^{-1} g^{(k)} = -\frac{1}{\alpha^{(k)} + \lambda} g^{(k)} + \frac{1}{[\alpha^{(k)} + \lambda]^2} V^{(k)} v(\lambda)    (18)
\pi(\lambda) = \frac{\|g^{(k)}\|^2}{[\alpha^{(k)} + \lambda]^2} + \frac{[v(\lambda)]^T P^{(k)} v(\lambda)}{[\alpha^{(k)} + \lambda]^4} - \frac{2[u^{(k)}]^T v(\lambda)}{[\alpha^{(k)} + \lambda]^3}    (19)

From Eq. 17, we have P^(k)v(λ) = [α^(k) + λ]{u^(k) + [W^(k)]^{-1}v(λ)}, and Eq. 19 can be rewritten as,

\pi(\lambda) = \frac{\|g^{(k)}\|^2}{[\alpha^{(k)} + \lambda]^2} + \frac{[v(\lambda)]^T [W^{(k)}]^{-1} v(\lambda) - [u^{(k)}]^T v(\lambda)}{[\alpha^{(k)} + \lambda]^3}    (20)

We first try λ = λ_min = 0. If π(0) ≤ Δ² holds, we accept λ_* = 0 and s_* = s(0) as the solution of the TRS defined in Eq. 8. Otherwise, the solution is on the boundary defined in Eq. 10, which can be solved by the Newton-Raphson (NR) method iteratively.

Let φ(λ) = 1/√π(λ) − 1/Δ; we then have φ′(λ) = −(1/2)π′(λ)[π(λ)]^{−3/2}, where π′(λ) is computed by,

\pi'(\lambda) = \frac{\|g^{(k)}\|^2}{[\alpha^{(k)} + \lambda]^3} - \frac{3\pi(\lambda)}{\alpha^{(k)} + \lambda} + \frac{2[v(\lambda)]^T [W^{(k)}]^{-1} w(\lambda) - [u^{(k)}]^T w(\lambda)}{[\alpha^{(k)} + \lambda]^3}    (21)

In Eq. 21, w(λ) = v′(λ) is solved from,

\left\{\frac{1}{\alpha^{(k)} + \lambda} P^{(k)} - [W^{(k)}]^{-1}\right\} w(\lambda) = \frac{1}{[\alpha^{(k)} + \lambda]^2} P^{(k)} v(\lambda)    (22)

Because {P^(k)/(α^(k) + λ) − [W^(k)]^{-1}} is an m × m symmetric and positive definite matrix, it only requires m³/3 + 6m² flops to solve v(λ) and w(λ) from Eq. 17 and Eq. 22 using the Cholesky factorization.
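A small NumPy/SciPy sketch (ours) of this step, reusing one Cholesky factorization of the m × m matrix to solve both Eq. 17 and Eq. 22, assuming the matrix is positive definite as stated above:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_v_and_w(P, WI, u, alpha, lam):
    """Solve Eq. 17 for v(lambda) and Eq. 22 for w(lambda) = v'(lambda), reusing
    a single Cholesky factorization of D = P/(alpha+lam) - W^{-1} (m x m)."""
    D = P / (alpha + lam) - WI
    c, low = cho_factor(D)                                  # one O(m^3/3) factorization
    v = cho_solve((c, low), u)                              # Eq. 17
    w = cho_solve((c, low), P @ v / (alpha + lam) ** 2)     # Eq. 22
    return v, w
```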

Given the n×lk variable shift matrix S(k) and gradient change matrix Z(k), we may directly compute the following three lk×lk matrices, A(k)=[S(k)]TS(k), B(k)=[S(k)]TZ(k) and C(k)=[Z(k)]TZ(k), and then form the m×m matrix P(k)=[V(k)]TV(k) as,

P^{(k)} = \begin{bmatrix} A^{(k)} & B^{(k)} \\ [B^{(k)}]^T & C^{(k)} \end{bmatrix}    (23)

Directly computing the three l_k × l_k matrices A^(k), B^(k), and C^(k) requires 6l_k²n = 1.5m²n flops. We also need to compute the vector u^(k) = [V^(k)]^T g^(k) (2mn flops) and ‖g^(k)‖² (2n flops). To further reduce the computational cost, the three matrices A^(k), B^(k), and C^(k) and the vector u^(k) should instead be updated iteratively.

Updating Matrices and Vectors Used for Solving the L-BFGS TRS

We first initialize k = 0 and H^(0) = I_n. If the trial search point x_T^(0) = x^(0) + s_*^(0) does not improve the objective function, i.e., f(x_T^(0)) ≥ f(x^(0)), we set x^(k+1) = x^(0) and H^(k+1) = H^(0), recompute the sensitivity matrix J^(k+1) = J(x^(0)) and the gradient g^(k+1) = g(x^(0)) when needed (e.g., using updated training data points), and shrink the trust region size Δ^(k+1) = γ_d Δ^(k) with 0 < γ_d < 1. We repeatedly generate trial search points x_T^(k) = x^(k) + s_*^(k) until x_T^(k) improves the objective function, i.e., f(x_T^(k)) < f(x^(0)).

If f(x_T^(k)) < f(x^(0)), we set l_k = 1, x^(k+1) = x_T^(k), evaluate g^(k+1) = g(x^(k+1)), and compute s = x^(k+1) − x^(k) and z = g^(k+1) − g^(k). In the following iterations, we update l_k and the associated matrices and vectors used for solving the L-BFGS TRS for the different cases accordingly. We use S_flag^(k) to indicate whether the new search point x_T^(k) improves the objective function (S_flag^(k) = "True") or not (S_flag^(k) = "False").

The L-BFGS Hessian updating formulation of Eq. 4 requires ε^(k) = z^T s > c_3‖z‖‖s‖ and α^(k) = z^T z/(z^T s) > α_cr > 0, where c_3 and α_cr are small positive numbers. If ε^(k) ≤ c_3‖z‖‖s‖ or α^(k) ≤ α_cr, we simply set l_k = l_{k-1}, s^(l_k) = s^(l_{k-1}), z^(l_k) = z^(l_{k-1}), α^(k) = α^(k−1), S^(k) = S^(k−1), Z^(k) = Z^(k−1), A^(k) = A^(k−1), B^(k) = B^(k−1), C^(k) = C^(k−1), and P^(k) = P^(k−1), with no updating. However, we need to compute u^(k) = [S^(k), Z^(k)]^T g^(k+1) using the new gradient g^(k+1). In the following cases, we assume that both conditions ε^(k) > c_3‖z‖‖s‖ and α^(k) > α_cr are satisfied.

Case-1: Sflag(k) = "False"

Although the best solution x^(k+1) does not change when Sflag(k) = "False", the sensitivity matrix J^(k+1), and thus the gradient of the objective function g^(k+1) evaluated at x^(k+1) using the simulation results and training data points updated in the (k+1)-th (current) iteration, may differ from those evaluated at the same point using the simulation results and training data points obtained in the k-th (previous) iteration. To use the correct gradient, we should replace the last column in Z^(k) with z.

Let Z̃^(k−1) be the submatrix of Z^(k−1) obtained by removing its last column z̃^(l_{k-1}), B̃^(k−1) = [S^(k−1)]^T Z̃^(k−1) the submatrix of B^(k−1) = [S^(k−1)]^T Z^(k−1) obtained by removing its last column, and C̃^(k−1) = [Z̃^(k−1)]^T Z̃^(k−1) the submatrix of C^(k−1) = [Z^(k−1)]^T Z^(k−1) obtained by removing its last row and last column. We first compute the two vectors p_2^(k) = [S^(k−1)]^T z and p̃_4^(k) = [Z̃^(k)]^T z and the two scalars μ^(k) = z^T z and τ^(k) = z^T g^(k+1) − [z̃^(l_{k-1})]^T g^(k), with a computational cost of 2mn flops.

We set l_k = l_{k-1}, s^(l_k) = s^(l_{k-1}), z^(l_k) = z, S^(k) = S^(k−1), and A^(k) = A^(k−1), and update Z^(k) as,

Z^{(k)} = [\tilde{Z}^{(k-1)} \;\; z]    (24)

It is straightforward to derive the following equations to update B(k) and C(k) accordingly,

B^{(k)} = [S^{(k-1)}]^T [\tilde{Z}^{(k-1)} \;\; z] = [\tilde{B}^{(k-1)} \;\; p_2^{(k)}]    (25)
C^{(k)} = \begin{bmatrix} [\tilde{Z}^{(k-1)}]^T \\ z^T \end{bmatrix} [\tilde{Z}^{(k-1)} \;\; z] = \begin{bmatrix} \tilde{C}^{(k-1)} & \tilde{p}_4^{(k)} \\ [\tilde{p}_4^{(k)}]^T & \mu^{(k)} \end{bmatrix}    (26)

By definition,

u^{(k-1)} = [S^{(k-1)}, Z^{(k-1)}]^T g^{(k)} = [S^{(k-1)}, \tilde{Z}^{(k-1)}, \tilde{z}^{(l_{k-1})}]^T g^{(k)}
u^{(k)} = [S^{(k)}, Z^{(k)}]^T g^{(k+1)} = [S^{(k-1)}, \tilde{Z}^{(k-1)}, z]^T g^{(k+1)}

Thus, we have,

u^{(k)} = u^{(k-1)} + \begin{bmatrix} [S^{(k-1)}, \tilde{Z}^{(k)}]^T z \\ z^T g^{(k+1)} - [\tilde{z}^{(l_{k-1})}]^T g^{(k)} \end{bmatrix} = u^{(k-1)} + \begin{bmatrix} p_2^{(k)} \\ \tilde{p}_4^{(k)} \\ \tau^{(k)} \end{bmatrix}    (27)

Case-2: Sflag(k) = “True” and lk−1 < LM

We set l_k = l_{k-1} + 1, s^(l_k) = s, and z^(l_k) = z, and compute the four l_{k-1}-dimensional vectors p_1^(k) = [S^(k−1)]^T s, p_2^(k) = [S^(k−1)]^T z, p_3^(k) = [Z^(k−1)]^T s, and p_4^(k) = [Z^(k−1)]^T z and the five scalars β^(k) = s^T s, ε^(k) = s^T z, μ^(k) = z^T z, γ_s^(k) = s^T g^(k+1), and γ_z^(k) = z^T g^(k+1), with a computational cost of 4mn flops. Both S^(k) and Z^(k) are updated by,

S^{(k)} = [S^{(k-1)} \;\; s]    (28)
Z^{(k)} = [Z^{(k-1)} \;\; z]    (29)

It is straightforward to derive the following equations to update A(k), B(k), and C(k),

A^{(k)} = [S^{(k-1)} \;\; s]^T [S^{(k-1)} \;\; s] = \begin{bmatrix} A^{(k-1)} & p_1^{(k)} \\ [p_1^{(k)}]^T & \beta^{(k)} \end{bmatrix}    (30)
B^{(k)} = [S^{(k-1)} \;\; s]^T [Z^{(k-1)} \;\; z] = \begin{bmatrix} B^{(k-1)} & p_2^{(k)} \\ [p_3^{(k)}]^T & \varepsilon^{(k)} \end{bmatrix}    (31)
C^{(k)} = [Z^{(k-1)} \;\; z]^T [Z^{(k-1)} \;\; z] = \begin{bmatrix} C^{(k-1)} & p_4^{(k)} \\ [p_4^{(k)}]^T & \mu^{(k)} \end{bmatrix}    (32)

We first split the 2l_{k-1}-dimensional vector u^(k−1) into two l_{k-1}-dimensional sub-vectors u_S^(k−1) and u_Z^(k−1), i.e.,

u^{(k-1)} = [S^{(k-1)}, Z^{(k-1)}]^T g^{(k)} = \begin{bmatrix} u_S^{(k-1)} \\ u_Z^{(k-1)} \end{bmatrix}

where u_S^(k−1) = [S^(k−1)]^T g^(k) is composed of the first l_{k-1} rows of u^(k−1) and u_Z^(k−1) = [Z^(k−1)]^T g^(k) is composed of the last l_{k-1} rows of u^(k−1).

Because u^(k) = [S^(k), Z^(k)]^T g^(k+1) = [S^(k−1), s, Z^(k−1), z]^T g^(k+1) and g^(k+1) = g^(k) + z, we thus have,

u^{(k)} = \begin{bmatrix} u_S^{(k-1)} + p_2^{(k)} \\ \gamma_s^{(k)} \\ u_Z^{(k-1)} + p_4^{(k)} \\ \gamma_z^{(k)} \end{bmatrix}    (33)
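A short sketch (ours) of the Case-2 updates, Eqs. 28-33, which append the new pair (s, z) with O(mn) work instead of recomputing A, B, C, and u from scratch:

```python
import numpy as np

def case2_update(S, Z, A, B, C, uS, uZ, s, z, g_new):
    """Case-2 updates (Eqs. 28-33).  S, Z are n x l matrices; A, B, C are l x l;
    uS, uZ are the two halves of u^(k-1); g_new is the new gradient g^(k+1)."""
    p1, p2 = S.T @ s, S.T @ z
    p3, p4 = Z.T @ s, Z.T @ z
    beta, eps, mu = s @ s, s @ z, z @ z
    gs, gz = s @ g_new, z @ g_new
    S_new = np.hstack([S, s[:, None]])                                         # Eq. 28
    Z_new = np.hstack([Z, z[:, None]])                                         # Eq. 29
    A_new = np.block([[A, p1[:, None]], [p1[None, :], np.array([[beta]])]])    # Eq. 30
    B_new = np.block([[B, p2[:, None]], [p3[None, :], np.array([[eps]])]])     # Eq. 31
    C_new = np.block([[C, p4[:, None]], [p4[None, :], np.array([[mu]])]])      # Eq. 32
    u_new = np.concatenate([uS + p2, [gs], uZ + p4, [gz]])                     # Eq. 33
    return S_new, Z_new, A_new, B_new, C_new, u_new
```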

Case-3: Sflag(k) = “True” and lk−1 = LM

We set l_k = l_{k-1} = L_M. Let Ŝ^(k−1) and Ẑ^(k−1) denote the submatrices of S^(k−1) and Z^(k−1) obtained by removing the first columns s^(1) and z^(1) from S^(k−1) and Z^(k−1), respectively. Let Â^(k−1), B̂^(k−1), and Ĉ^(k−1) denote the submatrices of A^(k−1), B^(k−1), and C^(k−1) obtained by removing their first row and first column.

We apply Eq. 28 through Eq. 32 to update S^(k), Z^(k), A^(k), B^(k), and C^(k) by simply replacing S^(k−1), Z^(k−1), A^(k−1), B^(k−1), and C^(k−1) with Ŝ^(k−1), Ẑ^(k−1), Â^(k−1), B̂^(k−1), and Ĉ^(k−1), respectively. We can also apply Eq. 33 to update u^(k) by replacing u_S^(k−1) with û_S^(k−1) = [Ŝ^(k−1)]^T g^(k), which is composed of the second through the l_{k-1}-th rows of u^(k−1), and by replacing u_Z^(k−1) with û_Z^(k−1) = [Ẑ^(k−1)]^T g^(k), which is composed of the last l_{k-1} − 1 rows of u^(k−1).

The Algorithm to Update Matrices and Vectors Used for Solving the L-BFGS TRS

Given L_M ≥ 1, k > 1, l_{k-1} ≥ 1, the thresholds c_3 > 0 and α_cr > 0, the n-dimensional vectors g^(k+1), s, and z, the 2l_{k-1}-dimensional vector u^(k−1), the n × l_{k-1} matrices S^(k−1) and Z^(k−1), the l_{k-1} × l_{k-1} matrices A^(k−1), B^(k−1), and C^(k−1), and the 2l_{k-1} × 2l_{k-1} matrices P^(k−1) and W_I^(k−1) = [W^(k−1)]^{-1}, we can update l_k, the 2l_k-dimensional vector u^(k), the n × l_k matrices S^(k) and Z^(k), the l_k × l_k matrices A^(k), B^(k), and C^(k), and the 2l_k × 2l_k matrices P^(k) and W_I^(k) = [W^(k)]^{-1}, using the algorithm summarized in Algorithm-4.

Algorithm-4: Updating Matrices and Vectors Used for Solving the L-BFGS TRS

1. Compute ε^(k) = z^T s, μ^(k) = z^T z, and β^(k) = s^T s;

2. Compute the scaling factor α^(k) = μ^(k)/ε^(k) if ε^(k) > c_3 √(β^(k)μ^(k)), or set α^(k) = 0 otherwise;

3. If α^(k) ≤ α_cr or ε^(k) ≤ c_3 √(β^(k)μ^(k)), then

a. Set α^(k) = α^(k−1) and l_k = l_{k-1};

b. Update S^(k) = S^(k−1) and Z^(k) = Z^(k−1);

c. Update A^(k) = A^(k−1), B^(k) = B^(k−1), and C^(k) = C^(k−1);

d. Update P^(k) = P^(k−1) and W_I^(k) = W_I^(k−1);

e. Compute u^(k) = [S^(k), Z^(k)]^T g^(k+1);

f. Go to step 10.

4. Else if Sflag(k) = "False" (for Case-1), then

a. Set l_k = l_{k-1};

b. Update S^(k) = S^(k−1);

c. Set z̃^(l_{k-1}) to the last column of Z^(k−1);

d. Set Z̃^(k) by removing the last column z̃^(l_{k-1}) from Z^(k−1);

e. Update Z^(k) = [Z̃^(k)  z];

f. Compute τ^(k) = z^T g^(k+1) − [z̃^(l_{k-1})]^T g^(k);

g. Compute p_2^(k) = [S^(k−1)]^T z and p̃_4^(k) = [Z̃^(k)]^T z;

h. Compute u^(k) using Eq. 27;

i. Update A^(k) = A^(k−1);

j. Set B̃^(k−1) by removing the last column from B^(k−1);

k. Set C̃^(k−1) by removing the last row and the last column from C^(k−1);

l. Update B^(k) and C^(k) using Eq. 25 and Eq. 26;

5. Else if l_{k-1} < L_M (for Case-2), then

a. Set l_k = l_{k-1} + 1;

b. Compute p_1^(k) = [S^(k−1)]^T s, p_2^(k) = [S^(k−1)]^T z, p_3^(k) = [Z^(k−1)]^T s, and p_4^(k) = [Z^(k−1)]^T z;

c. Compute γ_s = s^T g^(k+1) and γ_z = z^T g^(k+1);

d. Update S^(k), Z^(k), A^(k), B^(k), and C^(k) using Eq. 28 through Eq. 32, respectively;

e. Set u_S^(k−1) to the first l_{k-1} rows of u^(k−1) and u_Z^(k−1) to the last l_{k-1} rows of u^(k−1);

f. Update u^(k) using Eq. 33.

6. Else (for Case-3)

a. Set l_k = L_M;

b. Remove the first column from S^(k−1) and Z^(k−1);

c. Remove the first row and first column from A^(k−1), B^(k−1), and C^(k−1);

d. Compute p_1^(k) = [S^(k−1)]^T s, p_2^(k) = [S^(k−1)]^T z, p_3^(k) = [Z^(k−1)]^T s, and p_4^(k) = [Z^(k−1)]^T z;

e. Compute γ_s = s^T g^(k+1) and γ_z = z^T g^(k+1);

f. Update S^(k), Z^(k), A^(k), B^(k), and C^(k) using Eq. 28 through Eq. 32, respectively;

g. Set u_S^(k−1) to the second through the l_{k-1}-th rows of u^(k−1);

h. Set u_Z^(k−1) to the last l_{k-1} − 1 rows of u^(k−1);

i. Update u^(k) using Eq. 33.

7. Form the matrix P^(k) using Eq. 23;

8. Decompose B^(k) = L^(k) + E^(k) + U^(k);

9. Form the matrix W_I^(k) using Eq. 5;

10. End

The computational cost of updating the matrices and vectors used for solving the L-BFGS TRS as described in Algorithm-4 is at most c_4 = 2n + 4mn flops (taking Case-2 in step 5 as an example), including the following three operations: (1) computing ε^(k) = z^T s, μ^(k) = z^T z, and β^(k) = s^T s in step 1 (6n flops); (2) computing p_1^(k) = [S^(k−1)]^T s, p_2^(k) = [S^(k−1)]^T z, p_3^(k) = [Z^(k−1)]^T s, and p_4^(k) = [Z^(k−1)]^T z in step 5(b) (8l_{k-1}n = 4mn − 8n flops); and (3) computing γ_s = s^T g^(k+1) and γ_z = z^T g^(k+1) in step 5(c) (4n flops).

In this paper, we apply the same idea of reducing the dimension from n to m using the matrix inversion lemma as presented in the three papers [32, 36, 37]. However, the implementation is quite different. For the DGN optimization method, the n × m matrix V^(k) (the transpose of the sensitivity matrix) in Eq. 15 is directly computed. Its dimensions (both the number of variables n and the number of observed data m) are fixed, i.e., they remain the same for all iterations. Therefore, we have to directly compute the m × m matrix P^(k) = [V^(k)]^T V^(k) and the m-dimensional vector u^(k) = [V^(k)]^T g^(k), with a computational cost of 2m²n + 2mn flops. Using the algorithm summarized in Algorithm-4, the computational cost can be further reduced to 2n + 4mn, i.e., by a factor of roughly l_k = m/2 (5 to 20).

The Algorithm to Solve the L-BFGS TRS Using the Matrix Inversion Lemma

Given the trust region size Δ = Δ^(k) > 0, the threshold of convergence δ_cr > 0, the maximum number of iterations allowed N_TRS,max > 0, the scaling factor α > α_cr > 0, m ≥ 0, the n × m matrix V = [S^(k), Z^(k)], the m × m matrices P = P^(k) and W_I = [W^(k)]^{-1}, the m-dimensional vector u = u^(k), and the n-dimensional gradient vector g = g^(k) evaluated at the current best solution x^(k), both the Lagrange multiplier λ_* and the trust region search step s_* = s(λ_*) can be solved from the TRS defined in Eq. 8 using the matrix inversion lemma (MIL), as summarized in Algorithm-5.

Algorithm-5: Solving the L-BFGS TRS Using the Matrix Inversion Lemma

1. Compute ‖g‖;

2. Initialize l = 0, λ_0 = 0;

3. If m = 0, then

a. Set λ_* = 0 and s_* = −g/α if ‖g‖ ≤ αΔ;

b. Set λ_* = ‖g‖/Δ − α and s_* = −gΔ/‖g‖ if ‖g‖ > αΔ;

c. Go to step 11.

4. Compute D = P/(α + λ_0) − W_I and the Cholesky decomposition D = LL^T;

5. Solve p_* from Lp = u and v_* from L^T v = p_*;

6. Compute u^T v_*, p = W_I v_*, p^T v_*, and π_0 = π(λ_0) using Eq. 20;

7. If π_0 ≤ Δ², then accept λ_* = λ_0 and go to step 10;

8. Set δ = |1 − √π_0/Δ|;

9. Repeat steps (a) through (j) below, until convergence (δ ≤ δ_cr or l > N_TRS,max):

a. Compute q = P v_*/(α + λ_l)²;

b. Solve p_* from Lp = q and w_* from L^T w = p_*;

c. Compute u^T w_*, p = W_I w_*, p^T v_*, and π′_l = π′(λ_l) using Eq. 21;

d. Compute φ_l = 1/√π_l − 1/Δ and φ′_l = −(1/2)π′_l(π_l)^{−3/2};

e. Update λ_{l+1} = λ_l − φ_l/φ′_l;

f. Set l = l + 1;

g. Compute D = P/(α + λ_l) − W_I and the Cholesky decomposition D = LL^T;

h. Solve p_* from Lp = u and v_* from L^T v = p_*;

i. Compute u^T v_*, p = W_I v_*, p^T v_*, and π_l = π(λ_l) using Eq. 20;

j. Set δ = |1 − √π_l/Δ|.

10. Compute p_* = V v_* and s_* = −g/(α + λ_*) + p_*/(α + λ_*)²;

11. End
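A condensed NumPy sketch of Algorithm-5 (function name, defaults, and the use of a generic solver in place of the Cholesky factorization are ours); all heavy linear algebra is performed on m × m matrices and m-vectors, and the n-dimensional step is assembled only once at the end via Eq. 18:

```python
import numpy as np

def solve_trs_mil(V, P, WI, u, g, alpha, delta, tol=1e-4, max_iter=16):
    """Sketch of Algorithm-5: solve the L-BFGS TRS via the matrix inversion lemma.
    V is n x m, P = V^T V, WI = W^{-1} (both m x m), u = V^T g."""
    g2 = g @ g                                      # ||g||^2
    if V.shape[1] == 0:                             # m = 0: trivial case H = alpha*I_n
        gnorm = np.sqrt(g2)
        if gnorm <= alpha * delta:
            return 0.0, -g / alpha
        return gnorm / delta - alpha, -g * (delta / gnorm)

    def pi_and_v(lam):
        D = P / (alpha + lam) - WI                  # system matrix of Eq. 17
        v = np.linalg.solve(D, u)
        pi = g2 / (alpha + lam) ** 2 + (v @ (WI @ v) - u @ v) / (alpha + lam) ** 3  # Eq. 20
        return pi, v, D

    lam = 0.0
    pi, v, D = pi_and_v(lam)
    if pi > delta ** 2:                             # otherwise lambda* = 0 is accepted
        for _ in range(max_iter):
            if abs(1.0 - np.sqrt(pi) / delta) <= tol:
                break
            w = np.linalg.solve(D, P @ v / (alpha + lam) ** 2)           # Eq. 22
            dpi = (g2 / (alpha + lam) ** 3 - 3.0 * pi / (alpha + lam)
                   + (2.0 * v @ (WI @ w) - u @ w) / (alpha + lam) ** 3)  # Eq. 21
            phi = 1.0 / np.sqrt(pi) - 1.0 / delta
            dphi = -0.5 * dpi * pi ** (-1.5)
            lam -= phi / dphi                       # Newton-Raphson update on phi(lambda)
            pi, v, D = pi_and_v(lam)
    s = -g / (alpha + lam) + V @ v / (alpha + lam) ** 2                  # Eq. 18
    return lam, s
```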

Let N_MIL denote the number of iterations required for the L-BFGS TRS solver to converge. The total computational cost (flops) to solve the L-BFGS TRS using Algorithm-5 is c_5 = 4n + 2mn + (9m + 12m² + m³/3)(N_MIL + 1), including the following eight operations: (1) computing ‖g‖ in step 1 (2n flops); (2) computing D = P/(α + λ_l) − W_I and the Cholesky decomposition D = LL^T in step 4 or step 9(g) (2m² + m³/3 flops); (3) solving p_* from Lp = u and v_* from L^T v = p_* in step 5 or step 9(h) (2m² flops); (4) computing u^T v_*, p = W_I v_*, and p^T v_* in step 6 or step 9(i) (4m + 2m² flops); (5) computing q = P v_*/(α + λ_l)² in step 9(a) (m + 2m² flops); (6) solving p_* from Lp = q and w_* from L^T w = p_* in step 9(b) (2m² flops); (7) computing u^T w_*, p = W_I w_*, and p^T v_* in step 9(c) (4m + 2m² flops); and (8) computing p_* = V v_* and s_* = −g/(α + λ_*) + p_*/(α + λ_*)² in step 10 (2mn + 2n flops).

The total computational cost (flops) used to solve the L-BFGS TRS using Algorithm-5 together with Algorithm-4 is,

c_{MIL} = c_4 + c_5 = 6n + 6mn + \left(9m + 12m^2 + \frac{m^3}{3}\right)(N_{MIL} + 1)    (34)

It is straightforward to compute the normalized computational cost, i.e., the ratio of the computational cost of the DNR method (Algorithm-2) or the DIQ method (Algorithm-3), together with the algorithm that directly updates the Hessian (Algorithm-1), over that of the MIL TRS solver using the matrix inversion lemma (Algorithm-5) together with the efficient matrix updating algorithm (Algorithm-4),

\beta = \frac{c_{DIQ}}{c_{MIL}} = \frac{2mn^2 + (7 + 3m^2)n + \left(5n + 2n^2 + \frac{n^3}{3}\right)(N_{DIQ} + 1) + O(m^3)}{6n + 6mn + \left(9m + 12m^2 + \frac{m^3}{3}\right)(N_{MIL} + 1)}    (35)

Because n ≫ m for large-scale problems, Eq. 35 can be further simplified as,

\beta = \frac{c_{DIQ}}{c_{MIL}} \approx \frac{N_{DIQ} + 1}{18}\, \frac{n^2}{m}    (36)

For some problems the solution s_* = s(0) is accepted with N_DIQ = N_MIL = 0. For other problems, it takes roughly N_DIQ ≈ N_MIL = 5 to 15 iterations for a TRS solver to converge. Figures 1A,B illustrate the plots of β vs. n²/m for different m, fixing N_DIQ = N_MIL = 0 and N_DIQ = N_MIL = 10, respectively. As shown in both Figure 1A and Figure 1B, the MIL TRS solver may reduce the computational cost by a factor of β = 10 to 10⁵ for n²/m = 100 to 10⁶.

FIGURE 1

FIGURE 1. Plots of the ratio of computational cost β vs. n²/m, computed using Eq. 35 by fixing N_DIQ = N_MIL = 0 in (A) and N_DIQ = N_MIL = 10 in (B).
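The curves in Figure 1 can be reproduced directly from Eq. 35; a short numerical check (the input values below are arbitrary and the O(m³) term in the numerator is omitted):

```python
def beta_exact(n, m, n_diq, n_mil):
    """Ratio of computational costs from Eq. 35 (O(m^3) term omitted)."""
    c_diq = 2 * m * n**2 + (7 + 3 * m**2) * n + (5 * n + 2 * n**2 + n**3 / 3) * (n_diq + 1)
    c_mil = 6 * n + 6 * m * n + (9 * m + 12 * m**2 + m**3 / 3) * (n_mil + 1)
    return c_diq / c_mil

print(beta_exact(n=10000, m=10, n_diq=10, n_mil=10))   # exact ratio, Eq. 35
print((10 + 1) / 18 * 10000**2 / 10)                   # large-n approximation, Eq. 36
```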

Numerical Validation

We benchmarked the MIL TRS solver against the DNR (or DIQ) TRS solver on two well-known analytic optimization problems using analytical gradients: the Rosenbrock function and the Sphere function defined as follows,

f_{Rosenbrock}(x) = \sum_{i=1}^{n-1}\left[100\left(x_{i+1} - x_i^2\right)^2 + (1 - x_i)^2\right]    (37)
f_{Sphere}(x) = \sum_{i=1}^{n} x_i^2    (38)

The Rosenbrock function defined in Eq. 37 has one local minimum located at x_1 = −1 and x_i = 1 for i = 2, 3, ..., n, with an objective function value of approximately 4, and one global minimum located at x_i = 1 for i = 1, 2, ..., n, with an objective function value of 0. Occasionally, a local-search optimizer may converge to the local minimum.
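For reference, the two test functions and their analytical gradients are straightforward to code; a NumPy version (ours, not taken from OptLib):

```python
import numpy as np

def rosenbrock(x):
    """Eq. 37; global minimum f = 0 at x = (1, ..., 1)."""
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def rosenbrock_grad(x):
    """Analytical gradient of Eq. 37."""
    g = np.zeros_like(x)
    g[:-1] = -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2.0 * (1.0 - x[:-1])
    g[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
    return g

def sphere(x):
    """Eq. 38; minimum f = 0 at the origin."""
    return np.dot(x, x)

def sphere_grad(x):
    """Analytical gradient of Eq. 38."""
    return 2.0 * x
```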

We implemented the different algorithms described in this paper in our in-house C++ optimization library (OptLib). We employ "Armadillo" as our foundational template-based C++ library for linear algebra (http://arma.sourceforge.net/). We ran all test cases reported in Tables 2-5 on a virtual machine equipped with an Intel Xeon Platinum processor at 2.60 GHz and 16.0 GB of RAM. We varied the number of parameters n as powers of two and fixed twice the maximum number of snapshots at m = 2L_M = 10. Tables 2-5 summarize the preliminary numerical results, including the value of the objective function, the gradient norm, the number of iterations, and the elapsed CPU time (s). The last column displays the resulting speedup, namely, the ratio of the CPU time using the DNR (or DIQ) method over that using the MIL method. For all the numerical experiments in this subsection, we prescribe the tolerance to stop the L-BFGS TRS solver as δ_cr = 10⁻⁴ and the maximum number of iterations as N_TRS,max = 16. We set the initial trust region size to Δ = 0.5.

TABLE 2

TABLE 2. Performance comparison of DNR vs. MIL methods testing on Rosenbrock function.

TABLE 3

TABLE 3. Performance comparison of DIQ vs. MIL methods testing on Rosenbrock function.

TABLE 4

TABLE 4. Performance comparison of DNR vs. MIL methods testing on Sphere function.

TABLE 5

TABLE 5. Performance comparison of DIQ vs. MIL methods testing on Sphere function.

We noticed that the matrix D may become ill-conditioned in a few cases. Indeed, we observed that the first entries on its diagonal can be nearly zero. The same situation occurs for different optimization steps as m grows from 0, 2, 4, 6, up to 10. The Cholesky decomposition of such a matrix fails, i.e., Armadillo throws an exception. We advise replacing the Cholesky decomposition with an LU-decomposition, which can handle pivots (entries on the diagonal of the lower triangular factor) that are nearly zero. The rest of the algorithm remains the same.
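The same safeguard is easy to express outside Armadillo; a NumPy/SciPy sketch (ours) that falls back to a pivoted LU factorization when the Cholesky factorization raises an exception:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, lu_factor, lu_solve

def robust_solve(D, b):
    """Solve D x = b with Cholesky when D is numerically positive definite, and
    fall back to a pivoted LU factorization when the Cholesky factorization
    fails (e.g., for an ill-conditioned D with nearly zero leading pivots)."""
    try:
        return cho_solve(cho_factor(D), b)
    except np.linalg.LinAlgError:
        return lu_solve(lu_factor(D), b)
```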

Table 2 compares the performance of the DNR method vs. the MIL method for the Rosenbrock problem with 8, 16, 32, and 48 parameters, respectively. Beyond those problem sizes, the DNR method could not converge to the solution. Indeed, the DNR method becomes relatively slow, i.e., it renders very large lambda values that imply tiny shifts. Often, after 2,048 iterations, the objective function is still greater than one. It seems that the DNR method is not robust enough to tackle the Rosenbrock function with more than 48 parameters. It appears that the MIL method slightly outperforms the DNR method for problems with fewer parameters.

Table 3 compares performance of the DIQ method vs. the MIL method. These numerical results confirm that the MIL method outperforms the DIQ method. As expected, the speedup improves with the increase of the problem size n.

Table 4 shows similar results for the Sphere function problem. We could benchmark the DNR method against the MIL method with up to 2,048 parameters. Again, the speedup increases with problem size. Finally, Table 5 benchmarks the DIQ method against the MIL method on the Sphere function. The speedup is smaller than in Table 4 for the same problem sizes, but we still clearly see that the performance of the MIL solver exceeds that of its DIQ counterpart. Results shown in Table 4 and Table 5 also confirm that the DIQ method performs better than the DNR method. We believe that the DIQ TRS solver converges in fewer iterations than the DNR TRS solver (i.e., N_DIQ < N_DNR).

The computational costs (in terms of CPU time) summarized in Tables 2-5 are the overall costs for all iterations, including the cost of computing both the value and the gradient of the objective function in addition to the cost of solving the L-BFGS TRS. In contrast, the theoretical computational-cost analysis for β in Eq. 35 only accounts for the cost of solving the L-BFGS TRS in a single iteration. Therefore, the numerical results listed in Tables 2-5 are not quantitatively in agreement with the theoretically derived β.

Conclusion

We can draw the following conclusions based on theoretical analysis and numerical tests:

1) Using the compact representation of the L-BFGS formulation in Eq. 4, the updated Hessian H^(k+1) is composed of a diagonal matrix α^(k)I_n and a low-rank matrix V^(k)W^(k)[V^(k)]^T when m < n.

2) Using the matrix inversion lemma, we are able to reduce the cost of solving the L-BFGS TRS by transforming the original equation with n variables to a new equation with m variables, by taking advantage of the low rank matrix updating feature.

3) Further reduction in computational cost is achieved by iteratively updating the vector u^(k) and the matrices S^(k), Z^(k), A^(k), B^(k), C^(k), P^(k), and [W^(k)]^{-1}, which effectively avoids expensive operations such as matrix-matrix and matrix-vector multiplications.

4) Through seamless integration of matrix inversion lemma with the iterative matrix and vector updating approach, the newly proposed TRS solver for the limited-memory distributed BFGS optimization method performs much more efficiently than the DNR TRS solver.

5) The MIL TRS solver obtains the correct solution with acceptable accuracy, without increasing memory usage, while reducing the computational cost significantly, as confirmed by the numerical results on the Rosenbrock and Sphere functions.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author Contributions

GG, HF, and JV contributed to the conception and algorithmic analysis of the study. HF, TW, FS, and CB contributed to the design and numerical validations. GG and HF wrote the first draft of the article. All authors contributed to article revision, read, and approved the submitted version.

Conflict of Interest

Authors GG, HF, and FS were employed by Shell Global Solutions (United States) Inc. Authors JV, TW and CB were employed by Shell Global Solutions International B.V.

References

1. Al-Mudhafar, WJ, Rao, DN, and Srinivasan, S. Robust Optimization of Cyclic CO2 Flooding through the Gas-Assisted Gravity Drainage Process under Geological Uncertainties. J Pet Sci Eng (2018). 166:490-509. doi:10.1016/j.petrol.2018.03.044

2. Alpak, FO, Jin, L, and Ramirez, BA. Robust Optimization of Well Placement in Geologically Complex Reservoirs. In: Paper SPE-175106-MS Presented at the SPE Annual Technical Conference and Exhibition; September 28-30, 2015; Houston, TX (2015).

3. van Essen, GM, Zandvliet, MJ, Van den Hof, PMJ, Bosgra, OH, and Jansen, JD. Robust Optimization of Oil Reservoir Flooding. In: Paper Presented at the IEEE International Conference on Control Applications; October 4-6, 2006; Munich, Germany (2006).

4. Liu, Z, and Forouzanfar, F. Ensemble Clustering for Efficient Robust Optimization of Naturally Fractured Reservoirs. Comput Geosci (2018). 22:283-96. doi:10.1007/s10596-017-9689-1

5. Liu, X, and Reynolds, AC. Augmented Lagrangian Method for Maximizing Expectation and Minimizing Risk for Optimal Well-Control Problems with Nonlinear Constraints. SPE J (2016). 21(5).

6. Liu, X, and Reynolds, AC. Gradient-based Multi-Objective Optimization with Applications to Waterflooding Optimization. Comput Geosci (2016). 20:677-93. doi:10.1007/s10596-015-9523-6

7. Oliver, DS, Reynolds, AC, and Liu, N. Inverse Theory for Petroleum Reservoir Characterization and History Matching. Cambridge, United Kingdom: Cambridge University Press (2008). doi:10.1017/cbo9780511535642

8. Tarantola, A. Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia, PA: SIAM (2005). doi:10.1137/1.9780898717921

9. Oliver, DS. On Conditional Simulation to Inaccurate Data. Math Geol (1996). 28:811-7. doi:10.1007/bf02066348

10. Jansen, JD. Adjoint-based Optimization of Multi-phase Flow through Porous Media - A Review. Comput Fluids (2011). 46(1):40-51. doi:10.1016/j.compfluid.2010.09.039

11. Li, R, Reynolds, AC, and Oliver, DS. History Matching of Three-phase Flow Production Data. SPE J (2003). 8(4):328-40. doi:10.2118/87336-pa

12. Sarma, P, Durlofsky, L, and Aziz, K. Implementation of Adjoint Solution for Optimal Control of Smart Wells. In: Paper SPE 92864 Presented at the SPE Reservoir Simulation Symposium; January 31-February 2, 2005; The Woodlands, TX (2005).

13. Gao, G, and Reynolds, AC. An Improved Implementation of the LBFGS Algorithm for Automatic History Matching. SPE J (2006). 11(1):5-17. doi:10.2118/90058-pa

14. Nocedal, J, and Wright, SJ. Numerical Optimization. New York City: Springer (1999).

15. Chen, C, et al. Assisted History Matching Using Three Derivative-free Optimization Algorithms. In: Paper SPE-154112-MS Presented at the SPE Europec/EAGE Annual Conference; June 4-7, 2012; Copenhagen, Denmark (2012).

16. Gao, G, Vink, JC, Chen, C, Alpak, FO, and Du, K. A Parallelized and Hybrid Data-Integration Algorithm for History Matching of Geologically Complex Reservoirs. SPE J (2016). 21(6):2155-74. doi:10.2118/175039-pa

17. Gao, G, Vink, JC, Chen, C, El Khamra, Y, and Tarrahi, M. Distributed Gauss-Newton Optimization Method for History Matching Problems with Multiple Best Matches. Comput Geosci (2017). 21(5-6):1325-42. doi:10.1007/s10596-017-9657-9

18. Gao, G, Wang, Y, Vink, J, Wells, T, and Saaf, F. Distributed Quasi-Newton Derivative-free Optimization Method for Optimization Problems with Multiple Local Optima. In: 17th European Conference on the Mathematics of Oil Recovery (ECMOR XVII); September 2020; Edinburgh, United Kingdom (2020).

19. Wild, SM. Derivative Free Optimization Algorithms for Computationally Expensive Functions [PhD thesis]. Ithaca, NY: Cornell University (2009).

20. Powell, MJD. Least Frobenius Norm Updating of Quadratic Models that Satisfy Interpolation Conditions. Math Programming (2004). 100(1):183-215. doi:10.1007/s10107-003-0490-7

21. Zhao, H, Li, G, Reynolds, AC, and Yao, J. Large-scale History Matching with Quadratic Interpolation Models. Comput Geosci (2012). 17:117-38. doi:10.1007/s10596-012-9320-4

22. Audet, C, and Dennis, JE. Mesh Adaptive Direct Search Algorithms for Constrained Optimization. SIAM J Optim (2006). 17:188-217. doi:10.1137/040603371

23. Spall, JC. Introduction to Stochastic Search and Optimization. Hoboken, NJ: John Wiley & Sons (2003). doi:10.1002/0471722138

24. Chen, C, et al. EUR Assessment of Unconventional Assets Using Parallelized History Matching Workflow Together with RML Method. In: Paper Presented at the 2016 Unconventional Resources Technology Conference; August 1-3, 2016; San Antonio, TX (2016).

25. Chen, C, Gao, G, Li, R, Cao, R, Chen, T, and Vink, JC. Global-Search Distributed-Gauss-Newton Optimization Method and its Integration with the Randomized-Maximum-Likelihood Method for Uncertainty Quantification of Reservoir Performance. SPE J (2018). 23(05):1496-517. doi:10.2118/182639-pa

26. Chen, C, Gao, G, Ramirez, BA, Vink, JC, and Girardi, AM. Assisted History Matching of Channelized Models by Use of Pluri-Principal-Component Analysis. SPE J (2016). 21(05):1793-812. doi:10.2118/173192-pa

27. Armacki, A, Jakovetic, D, Krejic, N, and Krklec Jerinkic, N. Distributed Trust-Region Method with First Order Models. In: Paper Presented at IEEE EUROCON 2019, 18th International Conference on Smart Technologies; July 2019; Novi Sad, Serbia (2019). p. 1-4.

28. Moré, JJ, and Sorensen, DC. Computing a Trust Region Step. SIAM J Sci Stat Comput (1983). 4(3):553-72. doi:10.1137/0904038

29. Mohamed, AW. Solving Large-Scale Global Optimization Problems Using Enhanced Adaptive Differential Evolution Algorithm. Complex Intell Syst (2017). 3:205-31. doi:10.1007/s40747-017-0041-0

30. Golub, GH, and Van Loan, CF. Matrix Computations. 4th ed. The Johns Hopkins University Press (2013).

31. Brust, JJ, Leyffer, S, and Petra, CG. Compact Representations of BFGS Matrices. Preprint ANL/MCS-P9279-0120. Lemont, IL: Argonne National Laboratory (2020).

32. Gao, G, Jiang, H, Vink, JC, van Hagen, PPH, and Wells, TJ. Performance Enhancement of Gauss-Newton Trust-Region Solver for Distributed Gauss-Newton Optimization Method. Comput Geosci (2020). 24(2):837-52. doi:10.1007/s10596-019-09830-x

33. Burdakov, O, Gong, L, Zikrin, S, and Yuan, Y. On Efficiently Combining Limited-Memory and Trust-Region Techniques. Math Prog Comp (2017). 9:101-34. doi:10.1007/s12532-016-0109-7

34. Burke, JV, Wiegmann, A, and Xu, L. Limited Memory BFGS Updating in a Trust-Region Framework. Tech. Rep. Seattle, WA: University of Washington (2008).

35. Byrd, RH, Nocedal, J, and Schnabel, RB. Representations of Quasi-Newton Matrices and Their Use in Limited Memory Methods. Math Programming (1994). 63:129-56. doi:10.1007/bf01582063

36. Gao, G, Jiang, H, van Hagen, P, Vink, JC, and Wells, T. A Gauss-Newton Trust Region Solver for Large Scale History Matching Problems. SPE J (2017). 22(6):1999-2011. doi:10.2118/182602-pa

37. Gao, G, Saaf, F, Vink, J, Krymskaya, M, and Wells, T. Gauss-Newton Trust Region Search Optimization Method for Ill-Conditioned Least Squares Problem. In: 17th European Conference on the Mathematics of Oil Recovery (ECMOR XVII); September 2020; Edinburgh, United Kingdom (2020).

Keywords: limited-memory BFGS method, trust region search optimization method, trust region subproblem, matrix inversion lemma, low rank matrix update

Citation: Gao G, Florez H, Vink JC, Wells TJ, Saaf F and Blom CPA (2021) Performance Analysis of Trust Region Subproblem Solvers for Limited-Memory Distributed BFGS Optimization Method. Front. Appl. Math. Stat. 7:673412. doi: 10.3389/fams.2021.673412

Received: 27 February 2021; Accepted: 29 April 2021;
Published: 24 May 2021.

Edited by:

Olwijn Leeuwenburgh, Netherlands Organisation for Applied Scientific Research, Netherlands

Reviewed by:

Andreas Størksen Stordal, Norwegian Research Institute (NORCE), Norway
Sarfraz Ahmad, COMSATS University Islamabad, Pakistan

Copyright © 2021 Gao, Florez, Vink, Wells, Saaf and Blom. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Guohua Gao, guohua.gao@shell.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.