The connection of velocity and impedance sensitivity kernels with scattering-angle filtering and its application in full waveform inversion

Yang, Jidong; Xu, Jie; Huang, Jianping; Yu, Youcai; Sun, Jiaxing

doi:10.3389/feart.2022.961750

ORIGINAL RESEARCH article

Front. Earth Sci., 26 September 2022

Sec. Solid Earth Geophysics

Volume 10 - 2022 | https://doi.org/10.3389/feart.2022.961750

This article is part of the Research TopicJoint Inversion and Imaging in GeophysicsView all 12 articles

The connection of velocity and impedance sensitivity kernels with scattering-angle filtering and its application in full waveform inversion

Department of Geophysics, China University of Petroleum (East China), Qingdao, China

Multi-scale strategies such as starting from the low-frequency and early-arrival part of recorded data are commonly used in full waveform inversion (FWI) to maneuver complex nonlinearity. An alternative way is to apply appropriate filtering and conditioning to the misfit gradient in the model domain. In acoustic constant-density media, we prove that velocity and impedance sensitivity kernels are equivalent to applying a high-pass and a low-pass scattering-angle filter to a conventional single-parameter velocity (CSV) kernel. The high-pass scattering-angle filter allows the velocity kernel to include low-wavenumber updates (tomography component). In contrast, the low-pass scattering-angle filter helps the impedance kernel to yield high-wavenumber updates (migration component). The velocity model can be updated using a hybrid gradient of two components combined with appropriate weights. This FWI scheme is able to overcome the potential nonlinearity and partially mitigate the cycle-skipping problem. Numerical examples for the SEG/EAGE overthrust model and the Marmousi model demonstrate that the hybrid gradient facilitates FWI to converge faster to the true model even in cases when conventional CSV-based FWI fails.

1 Introduction

Full waveform inversion (FWI) aims to estimate subsurface rock parameters by minimizing misfits between observed and synthetic data (Lailly, 1983; Tarantola, 1984; Mora, 1989; Pratt et al., 1998; Pratt, 1999; Pratt and Shipp, 1999). To date, global optimization methods are still expensive in practice due to high computational costs for forward calculations and high dimensionality of model space. Gradient-based local optimization techniques are commonly used to update model parameters (Tromp et al., 2005; Liu and Tromp, 2006; Plessix, 2006). Due to irregular acquisition, limited offset, ambient noise and lack of low-frequency signals, the nonlinearity of FWI becomes much complicated and the gradient-based solvers are prone to be trapped in the local minima (Brossier et al., 2009; Virieux and Operto, 2009; Sears et al., 2010; Vigh et al., 2010).

Multi-scale strategy is a natural way to reduce the potential nonlinearity through data decimation and selection, which helps FWI to mitigate the cycle-skipping problems (Bunks et al., 1995; Pratt et al., 1996; Virieux and Operto, 2009). One implementation of this strategy is to gradually increase frequency bands to ensure that the phase difference between the predicted and observed data is always less than half a period (Pratt, 1999; Ravaut et al., 2004; Sirgue and Pratt, 2004; Brossier et al., 2009; Fichtner et al., 2013; Xue et al., 2016). The success of this approach requires the existence of effective low-frequency signals in the recorded data, which can be achieved by either acquiring broadband data or utilizing low-frequency enhanced techniques (Xie, 2013; Li and Demanet, 2016; Wang and Herrmann, 2016). Another implementation of multi-scale strategy is to gradually include later arrivals by designing particular windows in the time domain (Shipp and Singh, 2002; Sheng et al., 2006; Brossier et al., 2009; Boonyasiriwat et al., 2010) or introducing proper damping terms in the frequency domain (Shin et al., 2002; Brenders and Pratt, 2007; Shin and Cha, 2008). In addition, layer-stripping and offset-dependent windowing can also be combined with the above two strategies to reduce the nonlinearity and improve the success probability of FWI (Shipp and Singh, 2002; Brossier et al., 2009; Virieux and Operto, 2009).

As analyzed by Wu and Toksöz (1987) and Mora (1989), the resolved wavenumbers of velocity model are determined by scattering patterns in diffraction tomography. Such works provide insights in the model domain instead of the data domain, such as filtering and conditioning the misfit gradient, to reduce the potential nonlinearity in FWI (Albertin et al., 2013; Almomin and Biondi, 2013; Tang et al., 2013; Alkhalifah, 2016). Using the slopes of subsurface structures, Guitton et al. (2012) and Ma et al. (2012) design a directional smoothing operator for the gradients, which can help them to generate smooth velocity models and mitigate the cycle-skipping problem. Tang et al. (2013) notice that the FWI gradient includes tomography and migration components. They propose to enhance the tomography part at early stages in order to recover long-wavelength velocity perturbations. From the standpoint of wavenumber continuation, Alkhalifah (2015) design a scattering-angle filter to extract different wavenumber components and prove that even 10 Hz data can produce vertical near-zero wavenumber components in the FWI gradients. This allows him to update the velocity model from low-to high-wavenumbers by successively relaxing the scattering-angle filter (Wu and Alkhalifah, 2015; Alkhalifah, 2016; Kazei et al., 2016). Wu and Alkhalifah (2017) split the velocity model into background and perturbation components, integrate them directly in the wave equation and introduce a new cheap implementation of scattering angle enrichment, achieving the separation of the background and perturbation components efficiently (Wu and Alkhalifah, 2017).

In this study, we parameterize the acoustic wave equation with velocity and impedance and derive their sensitivity kernels using the Lagrange multiplier method. For a constant-density model, we analytically prove that the velocity and impedance kernels are equivalent to applying a high-pass and a low-pass scattering-angle filter to the conventional single-parameter velocity (CSV) kernel. The high-pass scattering-angle filter allows the velocity kernel to recover low-wavenumber perturbations (tomography component) and can be used to estimate macro velocity models. The low-pass scattering-angle filter helps the impedance kernel to update high-wavenumber perturbations (migration component) and produces high-resolution results. Similar to Tang et al. (2013), we combine these two components into a hybrid gradient (HG) to update the velocity model. By emphasizing the velocity kernel at a few early iterations and then relaxing its weights at later iterations, HG-based FWI provides us with a way to reduce FWI nonlinearity and partially mitigate the cycle-skipping problem. Numerical examples for the SEG/EAGE overthrust and the Marmousi models demonstrate that the proposed method is much more accurate than CSV-based FWI for recovering deep low-wavenumber velocity anomalies.

This paper is organized as follows. First, we derive the velocity and impedance sensitivity kernels using the Lagrange multiplier method. Next, we establish a connection between the velocity and impedance kernels with scattering-angle filtering. Then, we combine these two kernels into a hybrid gradient by properly choosing weights, and apply it to FWI to recover velocity models. Finally, two synthetic examples are used to illustrate the performance of the proposed method.

2 Theory

2.1 Velocity and impedance sensitivity kernels

Using velocity v(x) and impedance z(x) as model parameters, the second-order acoustic wave equation can be written as (Plessix and Li, 2013)

\frac{1}{v (x) z (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} - \nabla \cdot (\frac{v (x)}{z (x)} \nabla p (x, t)) = f (x_{s}, t), (1)

where x_s and x are the source and subsurface locations, f (x_s, t) is the source function, p (x, t) is the pressure wavefield, which is subject to the initial conditions: p (x, 0) = 0 and ∂p (x, t)/∂t=0.

To derive the adjoint wave equation and sensitivity kernels, we use the Lagrange multiplier method. The augmented least-squares waveform function can be formulated as (Tromp et al., 2005; Liu and Tromp, 2006; Plessix, 2006)

\begin{aligned} J & = \frac{1}{2} \sum_{x_{r}} \int_{0}^{T} {[p (x_{r}, t) - d (x_{r}, t)]}^{2} d t \\ - \int_{0}^{T} \int_{Ω} q (x, t) [\frac{1}{v (x) z (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} - \nabla \cdot (\frac{v (x)}{z (x)} \nabla p (x, t))] {d x}^{3} d t, \end{aligned} (2)

where x_r is the receiver location, q (x, t) is the Lagrange multiplier, [0, T] denotes the record duration, Ω is the subsurface volume of interest. Taking the variation of J in Eq. 2 and using integration by parts for the spatial derivatives of p (x, t), we obtain

\begin{aligned} δ J & = \int_{0}^{T} \int_{Ω} \sum_{x_{r}} [p (x_{r}, t) - d (x_{r}, t)] δ p (x, t) δ (x - x_{r}) {d x}^{3} d t \\ - \int_{0}^{T} \int_{Ω} \{δ p (x, t) [\frac{1}{v (x) z (x)} \frac{\partial^{2} q (x, t)}{\partial t^{2}} - \nabla \cdot (\frac{v (x)}{z (x)} \nabla q (x, t))] \\ - δ \ln v [\frac{1}{z (x) v (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} q (x, t) - \frac{v (x)}{z (x)} \nabla p (x, t) \cdot \nabla q (x, t)] \\ - δ \ln z [\frac{1}{z (x) v (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} q (x, t) + \frac{v (x)}{z (x)} \nabla p (x, t) \cdot \nabla q (x, t)]\} {d x}^{3} d t, \end{aligned} (3)

where δ(x) is the Dirac delta function, δp (x, t) is the perturbed pressure wavefield. δ ln z = δz(x)/z₀(x), δ ln v = δv(x)/v₀(x), where δz(x) and z₀(x) are the perturbed and background impedances respectively, δv(x) and v₀(x) are the perturbed and background velocities. Note that in (Eq. 3), we only consider the first-order expansion for the pressure wavefield and model parameters, which is known as the Born approximation (Tarantola, 2005; Tromp et al., 2005; Plessix, 2006).

Provided the Lagrange multiplier q (x, t) satisfies

\frac{1}{v (x) z (x)} \frac{\partial^{2} q (x, t)}{\partial t^{2}} - \nabla \cdot (\frac{v (x)}{z (x)} \nabla q (x, t)) = \sum_{x_{r}} [p (x_{r}, t) - d (x_{r}, t)] δ (x - x_{r}) (4)

and is subject to the final conditions

q (x, T) = 0, \frac{\partial q (x, T)}{\partial t} = 0, (5)

(Eq. 3) can be simplified to

\begin{aligned} δ J & = - \int_{0}^{T} \int_{Ω} \{δ \ln v [\frac{1}{z (x) v (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} q (x, t) - \frac{v (x)}{z (x)} \nabla p (x, t) \cdot \nabla q (x, t)] \\ + δ \ln z [\frac{1}{z (x) v (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} q (x, t) + \frac{v (x)}{z (x)} \nabla p (x, t) \cdot \nabla q (x, t)]\} {d x}^{3} d t . \end{aligned} (6)

This equation tells us that the change of the misfit δJ may be caused by the changes of the model parameters δ ln z and δ ln v in terms of the forward wavefield p (x, t) and the Lagrange multiplier wavefield q (x, t). The adjoint wavefield is defined as p^†(x, t) = q (x, T − t) and inserted into Eq. 4 to obtain the adjoint wave equation

\frac{1}{v (x) z (x)} \frac{\partial^{2} p^{†} (x, t)}{\partial t^{2}} - \nabla \cdot (\frac{v (x)}{z (x)} \nabla p^{†} (x, t)) = \sum_{x_{r}} [p (x_{r}, T - t) - d (x_{r}, T - t)] δ (x - x_{r}) . (7)

Then, the variation of the misfit function in (Eq. 6) is reduced to

\begin{aligned} δ J = - \int_{Ω} (δ \ln v K_{v} + δ \ln z K_{z}) {d x}^{3} . \end{aligned} (8)

where the velocity (K_v) and impedance (K_z) kernels are defined as

\begin{aligned} K_{v} (x) & = \int_{0}^{T} [\frac{1}{z (x) v (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} p^{†} (x, T - t) - \frac{v (x)}{z (x)} \nabla p (x, t) \cdot \nabla p^{†} (x, T - t)] d t, \\ K_{z} (x) & = \int_{0}^{T} [\frac{1}{z (x) v (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} p^{†} (x, T - t) + \frac{v (x)}{z (x)} \nabla p (x, t) \cdot \nabla p^{†} (x, T - t)] d t . \end{aligned} (9)

Note that the adjoint wave equation is exactly the same as the forward wave (Eq. 1), except for replacing f (x_s, t) by time-reversed data residual (adjoint source).

For comparison, herein present the CSV kernel (the detailed derivation is given in Appendix A):

K_{csv} (x) = 2 \int_{0}^{T} [\frac{1}{z (x) v (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} p^{†} (x, T - t)] d t . (10)

(Eqs. 9, 10) show that the CSV kernel equal the summation of the velocity and impedance kernels, indicating that the updates in CSV-based FWI include both velocity and impedance information. Zhou et al. (2015) show that due to the different responses to the scattering angles, the velocity and impedance kernels produce low-wavenumber (tomography) and high-wavenumber (migration) components, respectively (Wu and Toksöz, 1987; Mora, 1989; Alkhalifah, 2015). Therefore, conventional CSV-based FWI simultaneously update the macro and detail structures in velocity model building, which increases the nonlinearity and is prone to be trapped into local minima. One way to mitigate this problem in the model domain is to use the wavenumber continuation strategy that combines the two separated components with proper weights (Tang et al., 2013; Alkhalifah, 2016).

2.3 Connection between velocity and impedance kernels with scattering-angle filtering in constant-density media

Douma et al. (2010) prove that the impedance kernel in adjoint tomography is equivalent to the application of Laplacian filtering to reverse-time migration (RTM) images. Zhang and Sun (2009) notice that the application of the Laplacian filtering to RTM images is equivalent to the application of a cos²ϕ filter to angle-domain gathers, where ϕ is the reflection angle. Such conclusions confirm that the impedance-based imaging condition produces fewer low-wavenumber artifacts on RTM images (Zhu et al., 2009; Whitmore and Crawley, 2012; Pestana et al., 2014). In this section, we extend Douma et al. (2010)’s derivation to both velocity and impedance kernels and establish a connection between these kernels with scattering-angle filtering.

Considering the initial and final conditions for the forward and adjoint wavefields

p (x, 0) = p^{†} (x, T) = 0, p (x, T) = p^{†} (x, 0) = 0, (11)

and using integration by parts for the time derivatives, we have the following identity

\begin{aligned} \int_{0}^{T} [\frac{\partial^{2} p (x, t)}{\partial t^{2}} p^{†} (x, T - t)] d t & = - \int_{0}^{T} [\frac{\partial p (x, t)}{\partial t} \frac{d p^{†} (x, T - t)}{d t}] d t \\ = \int_{0}^{T} [p (x, t) \frac{\partial^{2} p^{†} (x, T - t)}{\partial t^{2}}] d t . \end{aligned} (12)

This notation simplifies the velocity and impedance kernels in (Eq. 9) to

\begin{aligned} K_{v} (x) & = \frac{1}{2} \frac{v (x)}{z (x)} \int_{0}^{T} [\frac{1}{v^{2} (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} p^{†} (x, T - t) \\ - 2 \nabla p (x, t) \cdot \nabla p^{†} (x, T - t) + \frac{1}{v^{2} (x)} p (x, t) \frac{\partial^{2} p^{†} (x, T - t)}{\partial^{2} t}] d t, \\ K_{z} (x) & = \frac{1}{2} \frac{v (x)}{z (x)} \int_{0}^{T} [\frac{1}{v^{2} (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} p^{†} (x, T - t) \\ + 2 \nabla p (x, t) \cdot \nabla p^{†} (x, T - t) + \frac{1}{v^{2} (x)} p (x, t) \frac{\partial^{2} p^{†} (x, T - t)}{\partial^{2} t}] d t . \end{aligned} (13)

In isotropic acoustic media with constant density, i.e., z(x)/v(x) = ρ₀, the forward and adjoint wavefields satisfy

\begin{aligned} \frac{1}{v^{2} (x)} \frac{\partial^{2} p (x, t)}{\partial t^{2}} - \nabla^{2} p (x, t) = ρ_{0} f (x_{s}), \frac{1}{v^{2} (x)} \frac{\partial^{2} p^{†} (x, t)}{\partial t^{2}} - \nabla^{2} p^{†} (x, t) = ρ_{0} f^{†} (x, t), \end{aligned} (14)

where f^†(x, t) is the adjoint source, that is, the right-hand side of (Eq 7). Inserting (Eq. 14) into (Eq. 13) and replacing the time-derivatives with the spatial-derivatives of p (x, t) and p^†(x, t) yields

\begin{aligned} K_{v} (x) & = \frac{1}{2 ρ_{0}} \int_{0}^{T} [\nabla^{2} p (x, t) p^{†} (x, T - t) - 2 \nabla p (x, t) \cdot \nabla p^{†} (x, T - t) \\ + p (x, t) \nabla^{2} p^{†} (x, T - t)] d t, \\ K_{z} (x) & = \frac{1}{2 ρ_{0}} \int_{0}^{T} [\nabla^{2} p (x, t) p^{†} (x, T - t) + 2 \nabla p (x, t) \cdot \nabla p^{†} (x, T - t) \\ + p (x, t) \nabla^{2} p^{†} (x, T - t)] d t . \end{aligned} (15)

In (Eq. 15), the terms associated with the source and the adjoint source are neglected because they vanish for far-field wavefields (Douma et al., 2010). Applying Fourier transform to (Eq. 15) and because $|k| = |k^{†}| = ω / v$ , the kernels have the following form:

\begin{aligned} K_{v} (k) & = \frac{1}{ρ_{0}} \sum_{ω} | k |^{2} (1 - \frac{k \cdot k^{†}}{| k |^{2}}) P (k, ω) P^{†} (k, ω), \\ K_{z} (k) & = \frac{1}{ρ_{0}} \sum_{ω} | k |^{2} (1 + \frac{k \cdot k^{†}}{| k |^{2}}) P (k, ω) P^{†} (k, ω) . \end{aligned} (16)

where ω is the angular frequency, p (k, ω) and p^†(k, ω) are the forward and adjoint Fourier-domain wavefields, k and k^† are the forward and adjoint wavenumbers. Considering the relation $\frac{k \cdot k^{†}}{| k |^{2}} = \cos θ$ and using the dispersion relation |k|² = ω²/v², (Eq. 16) can be reformulated as

\begin{aligned} K_{v} (x) & = \frac{2}{ρ_{0}} \sum_{ω} \frac{ω^{2}}{v^{2} (x)} P (x, ω) P^{†} (x, ω) \frac{(1 - \cos θ)}{2}, \\ K_{z} (x) & = \frac{2}{ρ_{0}} \sum_{ω} \frac{ω^{2}}{v^{2} (x)} P (x, ω) P^{†} (x, ω) \frac{(1 + \cos θ)}{2} . \end{aligned} (17)

where θ is the scattering angle (see Figure 1). Similarly, CSV kernel in (Eq. 10) can be rewritten in the frequency domain as

\begin{aligned} K_{csv} (x) = \frac{2}{ρ_{0}} \sum_{ω} \frac{ω^{2}}{v^{2} (x)} P (x, ω) P^{†} (x, ω) . \end{aligned} (18)

(Eqs. 17, 18) show that the differences between K_v and K_z with K_csv are the multiplication with (1 − cos θ)/2 and (1 + cos θ)/2, which appear as band-pass filters associated with scattering angles. Note that K_z includes both velocity and density updates when density varies significantly in the subsurface. Since (Eqs. 12–17) are derived based on the assumption of constant density, K_z and K_v herein represent the velocity perturbations within different scattering-angle bands.

FIGURE 1

FIGURE 1. Geometry of incident and scattering waves. Red lines are the wave paths of the forward and adjoint wavefields. k and k^† denote their propagation directions. (x, z) is the Cartesian coordinate system and θ is the scattering angle.

A simple experiment (Figure 2) is used to illustrate the effects of these two scattering-angle filters. Since the sensitivity kernels in (Eqs. 17, 18) are derived based on Born approximation, we calculate Born modeling results using CSV parameterization as well as velocity and impedance parameterization (Figures 2B–D). These results are also known as radiation patterns (Virieux and Operto, 2009; Wang et al., 2015; Zhou et al., 2015). Figure 2A shows that the CSV kernel has an all-pass response with respect to scattering angles, which corresponds to a homogenous radiation pattern (Figure 2B). This suggests that the CSV kernel includes both large and small scattering-angle updates (Tang et al., 2013). The velocity kernel K_v behaves like applying a high-pass filter to the CVS kernel (red line in Figure 2A), which emphasizes large-angle forward scattering contributions (Figure 2C). This indicates that the tomography component in FWI gradient is enhanced in K_v and can be used to recover large- and intermediate-scale velocity perturbations. In contrast, the impedance kernel K_z is a result of applying a low-pass scattering-angle filter to the CSV kernel (green line in Figure 2A), emphasizing the small-angle backscattering component (Figure 2D). This high-pass scattering-angle filter helps us to produce migration profiles and can be used to resolve detail structures (Luo et al., 2009; Zhu et al., 2009; Pestana et al., 2014).

FIGURE 2

FIGURE 2. Scattering angle filters and radiation patterns for a vertically incident plane wave in a homogenous medium with v =2 km/s (A) Angle filters for different kernels; (B) CSV radiation pattern; (C) velocity and (D) impedance radiation patterns. A unit model perturbation indicated by a green star is located at x =2 km and z =2 km. Red dashed lines in (B), (C) and (D) denote the amplitude variations with respect to different scattering angles.

2.4 Full waveform inversion with a hybrid gradient

In the CSV-based FWI, the velocity model can be updated using the following gradient-based scheme:

m^{k + 1} = m^{k} - α H^{- 1} g^{k} (19)

where m^k+1 and m^k are the velocity models in the next and current iterations, respectively; g^k is the misfit gradient, which can be computed by summing K_csv in (Eq. 10) over sources; α is the step length, which can be computed with a line-search algorithm (Potra and Shi, 1995); H⁻¹ is the Hessian inverse and can be used to speed up convergence (Pratt et al., 1998; Shin et al., 2001; Plessix and Mulder, 2004; Tang, 2009; Métivier et al., 2013).

Using wavefield decomposition, Tang et al. (2013) and Wang et al. (2016) show that the enhancement of tomography components in the misfit gradient helps FWI to reduce nonlinearity and mitigate the cycle-skipping problem. This can also be implemented with a wavenumber continuity strategy, designing an appropriate scattering-angle filter (Wu and Alkhalifah, 2015; Alkhalifah, 2016). In the previous section, the connection between the velocity and impedance kernels is established by scattering-angle filtering in acoustic constant-density media. Similar to Tang et al. (2013), we combine the velocity and impedance kernels into a hybrid gradient:

g_{h} = λ \sum_{x_{s}} K_{v} (x_{s}) + \sum_{x_{s}} K_{z} (x_{s}), (20)

where x_s is the source location, K_v(x_s) and K_z (x_s) are the kernels in (Eq. 9), and λ is an adjustable scalar parameter to balance the relative weights of K_v and K_z.

Note that the large scattering-angle components in the gradient are enhanced by setting λ greater than one in (Eq. 20) at early iterations, the large scattering-angle components in the gradient are enhanced. This allows us to recover large- and intermediate-scale velocity perturbations. In subsequent iterations, reducing λ gradually to one can increase the relative weight of K_z and use more smaller scattering-angle contributions, which enables us to update detail structures. We refer to this workflow as the HG-based FWI. Since we first update large scattering-angle perturbations and then introduce smaller scatting-angle information, HG-based FWI provides us a possible way to reduce potential nonlinearity during FWI iterations.

3 Numerical examples

We present two synthetic examples to illustrate the performance of the proposed HG-based FWI scheme. The first example is the 2D SEG/EAGE overthrust model, which is modified by adding a 175 m thick water layer on the top of the model. The true velocity model is shown in Figure 3A. Initial model in Figure 3B is built by applying a 625 m × 625 m Gaussian filter to the true model. Seismograms are calculated using a staggered-grid finite-difference scheme with eighth-order accuracy in space and second-order accuracy in time. 25 shots are evenly distributed on the surface with a 250 m interval. Each shot is recorded by 250 receivers, which are uniformly deployed on the surface with a 25 m spacing. A Ricker wavelet with a peak frequency of 20 Hz is used as the source function. A high-pass filter is applied to filter out low-frequency signals below 5 Hz.

FIGURE 3

FIGURE 3. FWI experiments for SEG/EAGE overthrust model. (A) True velocity model, (B) initial velocity model, (C) recovered velocity model using CSV-based FWI, and (D) recovered velocity model using HG-based FWI. 40 iterations are performed in (C,D). The lowest effective frequency used in (C,D) is 5 Hz.

Inversion results using CSV-based FWI and our method are shown in Figures 3, 4. 40 iterations are performed and a phase-encoding diagonal Hessian (Figure 6D) is used as a pre-conditioner (Tang, 2009) in both methods. In our method, λ is set to 8.0 in the first 10 iterations, and then gradually reduces to 1.0 during the following iterations as shown in Figure 5. Gradients calculated using different kernels at the first iteration are shown in Figure 6. We notice that the gradient in CSV-based FWI includes almost high-wavenumber updates, i.e., migration components (see Figure 6A). This does not mean there are no low-wavenumber components, but their magnitudes are relatively small in comparison with high-wavenumber components. As shown in Figures 6B,C, the low- and high-pass scattering-angle filters decompose the CSV kernel into a tomography component (K_v) and a migration component (K_z). This favorable scale-separation property allows HG-based FWI to update the low-wavenumber tomography component by enhancing the weight of K_v, and to resolve detail structures by gradually increasing the weight of K_z (see Figures 3D, 4).

FIGURE 4

FIGURE 4. Comparisons of velocity logs for SEG/EAGE overthrust model. (A–C) are located at the midpoints of 1.25, 3.5, and 5.0 km, respectively. Black and green lines are from true and initial models, blue and red lines are from CSV-based FWI and HG-based FWI. HG-based FWI produces more accurate results than CSV-based method, especially at great depths with large-scale velocity anomalies.

FIGURE 5

FIGURE 5. The relationship between the weight factor lambda and iterations. λ is set to 8.0 in the first 10 iterations, and then gradually reduces to 1.0 according to the change of cosine function in subsequent iterations.

FIGURE 6

FIGURE 6. Comparison of different gradients in the first iteration for SEG/EAGE overthrust model. (A) CSV-based gradient, (B) velocity and (C) impedance gradients, and (D) phase-encoding diagonal Gauss-Newton Hessian. (D) Is used as a preconditioner in both CSV-based and HG-based FWI methods. The magnitude of each panel is normalized with respect to their maximum values.

Figure 7 shows the evolutions of data residuals and model errors for these two FWI schemes. Data residuals and model errors are defined as

r = \frac{‖ d_{obs} - d_{syn} ‖^{2}}{‖ d_{obs} ‖^{2}}, e = \frac{‖ m_{tru} - m_{fwi} ‖^{2}}{‖ m_{tru} ‖^{2}}, (21)

where d_obs and d_syn are the observed and synthetic data, m_tru and m_fwi are the true and recovered velocity models. Relative data residuals and model errors are calculated by normalizing r and e by their initial values (r₀ and e₀). Although the data residual of CSV-based FWI has been reduced by about 90%, it is stuck around 10% (blue line in Figure 7A), suggesting that it is trapped into a local minimum. This is also reflected in the corresponding model errors in Figure 7B. With a hybrid gradient, the proposed method recovers low-wavenumber components first and then gradually increases high-wavenumber components, leading to a faster convergence rate and a higher inversion accuracy in comparison with CSV-based FWI (see Figures 4, 7).

FIGURE 7

FIGURE 7. Convergence rates for SEG/EAGE overthrust model. (A,B) are relative data residual and model error, respectively. Blue and red lines are results from CSV-based FWI and the proposed HG-based FWI.

In the second example, the Marmousi model is used to test the robustness of the proposed method for complicated structures. True velocity model is shown in Figure 8A. Starting model (Figure 8B) is built by applying a 1,250 m × 1,250 m Gaussian filter to the true model. Seismograms are generated using the same finite-difference scheme as the previous example. Source function is a Ricker wavelet with a peak frequency of 8 Hz. 38 shots are distributed on the surface with a 250 m spacing. 761 receivers are deployed evenly on the model surface with a 12.5 m interval. A dataset is built by filtering frequency components below 3 Hz and is used for both CVS-based FWI and the proposed method. The setting of λ in HG-based FWI is the same as the previous example. Inversion results are shown in Figures 10A,B. Gradients for the first iteration and velocity logs are shown in Figures 9, 11, respectively. CSV-based FWI produces accurate results at shallow depths for both low- and high-wavenumber perturbations, but fails to recover low-wavenumber velocity anomalies at greater depths (see Figures 10A, 11). This is caused by uneven subsurface scattering-angle illuminations. At shallow depths, there is sufficient illumination for both large and small scattering angles due to large offset-to-depth ratio (O/D). As depth increases, O/D decreases and the CSV-based gradient is dominated by small scattering-angle components (Figure 9A). Although large scattering-angle components in deep areas are very weak, they do exist as proved by Alkhalifah (2015). Instead of using the CSV kernel, HG-based FWI combines the K_v and K_z kernels to update velocity model (Figures 9B,C). This helps us to recover low-wavenumber perturbations at greater depths by enhancing the K_v kernel in FWI gradient. Then, high-wavenumber structures are recovered by gradually reducing the weight of the K_v kernel, producing a final high-resolution result (Figures 10B, 11).

FIGURE 8

FIGURE 8. FWI experiment for Marmousi model. (A) True velocity model and (B) initial velocity model.

FIGURE 9

FIGURE 9. Comparisons of different gradients in the first iteration for Marmousi model. (A) CSV-based gradient, (B) velocity and (C) impedance gradients. The magnitude of each panel is normalized with respect to their maximum values.

FIGURE 10

FIGURE 10. Recovered Marmousi velocity model from different FWI methods. (A,B) are CSV-based FWI and our method using data without frequency components below 3 Hz. 40 iterations are used in (A,B).

FIGURE 11

FIGURE 11. Comparisons of velocity logs for Marmousi model. (A–C) are located at the midpoints of 1.8, 5.0, and 8.1 km, respectively. Black and green lines are from true and initial models; blue and red lines are from CSV-based FWI and the proposed method. The proposed method is much more accurate than CSV-based FWI for recovering deep low-wavenumber velocity anomalies.

To better quantitatively evaluate the inversion results of these two methods, we compare their convergence rates (Figure 12) and predicted seismograms using final recovered models (Figure 13). Compared with CSV-based FWI method (red line in Figure 12), the proposed method has a faster convergence rate and a higher inversion accuracy (blue line in Figure 12), which also can illustrate that the proposed method has better adaptability to complex structures.

FIGURE 12

FIGURE 12. Convergence rates for Marmousi model. (A,B) are relative data residual and model error, respectively. Blue and red lines are results from CSV-based FWI and the proposed method.

FIGURE 13

FIGURE 13. Comparisons of common-shot gathers for Marmousi model. (A) Observed data, (B) synthetic data using initial model; (C) synthetic data using CSV-based FWI velocity model, (D) difference between (A) with (C); (E) synthetic data using recovered velocity model from the proposed method, (F) difference between (A,E).

4 Discussion

By parameterizing the acoustic wave equation using velocity and impedance, we derive their sensitivity kernels based on the Lagrange multiplier method. In a constant-density case, we prove that the velocity (K_v) and impedance (K_z) kernels are equivalent to applying a high-pass and a low-pass scattering-angle filters to the CSV (K_csv) kernel. K_v mainly provides low-wavenumber updates and is helpful to recover large-scale anomalies. Although impedance is defined as z = ρv, K_z represents high-wavenumber velocity perturbations in constant-density media and helps us to resolve detail structures. By choosing weights properly, we can enhance K_v contribution in the hybrid gradient at early iterations and then gradually increase K_z contribution in subsequence iterations. This workflow reduces FWI nonlinearity and partially mitigate the cycle-skipping problem.

The proposed HG-based FWI scheme is similar to the tomography-enhanced FWI presented by Tang et al. (2013). But there are several key differences. First, Tang et al. (2013) use wavefield decomposition to extract the tomography and migration components from the CSV-based gradient. There is no clear relation between these two components with the scattering-angle filtering. Second, in areas with complicated structures, cross-correlations between source-side and receiver-side upgoing waves or between source-side and receiver-side downgoing waves might still produce certain migration components. Third, separating upgoing and downgoing waves requires the construction of an analytical wavefield at every time step. This can be implemented by either solving the wave equation twice (Shen and Albertin, 2015) or calculating a complex-valued wave equation (Zhang and Zhang, 2009; Pestana and Revelo, 2017), thus with a higher computational cost. Our approach only needs to modify the gradient calculation with time- and spatial-derivatives and hence no additional computational costs are required.

In our derivation from (Eqs. 12–18), the subsurface density is assumed to be constant. Therefore, our method is applicable for areas without significant density variations. When density varies greatly, we should consider the spatial derivation of 1/ρ in Eq. 14 and cannot obtain an analytic relation between K_v and K_z with K_csv. In this case, K_z includes both velocity and density perturbations (Prieux et al., 2013; Zhou et al., 2015), and thus cannot be combined with K_v to update the velocity model. How to simultaneously invert multi-parameters, such as velocity and density, velocity and impedance or impedance and density, is beyond the scope of this paper and needs further investigation.

5 Conclusion

We establish a connection between the velocity and impedance kernels with scattering angle filtering in acoustic constant-density media. This allows us to combine these two kernels into a HG-based FWI workflow to update the velocity model. By enhancing the velocity kernel contribution at early iterations, which mainly gives tomography updates, the proposed method enables us to recover large- and intermediate-scale velocity anomalies. In the subsequence iterations, gradually increasing the weight of the impedance kernel helps us to resolve small-scale structures. This workflow provides us with a way to reduce the potential nonlinearity of FWI and partially mitigate the cycle-skipping problem. Synthetic examples demonstrate that the proposed method produces an inversion result with faster convergence and higher accuracy than CSV-based FWI method.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://wiki.seg.org/wiki/SEG/EAGE_Salt_and_Overthrust_Models https://wiki.seg.org/wiki/Dictionary:Marmousi_model/zh.

Author contributions

YY: be responsible for correcting and modifying articles and checking procedures. JS: be responsible for correcting and modifying articles.

Funding

This research is supported by the startup funding (no. 20CX06069A) of Guanghua Scholar at Geophysics Department, China University of Petroleum (East China). We thank the support from the Funds of Creative Research Groups of China (no. 41821002), the National Outstanding Youth Science Foundation (no. 41922028), the National Natural Science Foundation of China (General Program) (no. 41874149), the Strategic Priority Research Program of the Chinese Academy of Sciences (no. XDA14010303), the Key Program for International Cooperation Projects of China (no. 41720104006), the National Key R&D Program of China (no. 2019YFC0605503), the Major Scientific and Technological Projects of CNPC (no. ZD 2019-183-003).

Acknowledgments

We appreciate the comments and suggestions from editor J. Zhang and two reviewers, which significantly improve the manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Albertin, U., Shan, G., and Washbourne, J. (2013). “Gradient orthogonalization in adjoint scattering-series inversion,” in SEG Technical Program Expanded Abstracts, Houston, September 22, 2013, 1058–1062. doi:10.1190/segam2013–0580.1