Building surrogate models of nuclear density functional theory with Gaussian processes and autoencoders

Verriere, Marc; Schunck, Nicolas; Kim, Irene; Marević, Petar; Quinlan, Kevin; Ngo, Michelle N.; Regnier, David; Lasseri, Raphael David

doi:10.3389/fphy.2022.1028370

ORIGINAL RESEARCH article

Front. Phys., 08 November 2022

Sec. Nuclear Physics

Volume 10 - 2022 | https://doi.org/10.3389/fphy.2022.1028370

This article is part of the Research TopicUncertainty Quantification in Nuclear PhysicsView all 16 articles

Building surrogate models of nuclear density functional theory with Gaussian processes and autoencoders

Marc Verriere¹

Nicolas Schunck¹*

Irene Kim^2,3

Petar Marević^1,4,5

Kevin Quinlan⁶

Michelle N. Ngo^6,7

David Regnier^8,9

Raphael David Lasseri⁴

¹Nuclear and Data Theory Group, Nuclear and Chemical Science Division, Lawrence Livermore National Laboratory, Livermore, CA, United States
²Machine Intelligence Group, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, United States
³Sensing and Intelligent Systems Group, Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, CA, United States
⁴Centre Borelli, ENS Paris-Saclay, Université Paris-Saclay, Cachan, France
⁵Physics Department, Faculty of Science, University of Zagreb, Zagreb, Croatia
⁶Applied Statistics Group, Lawrence Livermore National Laboratory, Livermore, CA, United States
⁷Center for Complex Biological Systems, University of California Irvine, Irvine, CA, United States
⁸Université Paris-Saclay, CEA, Laboratoire Matière en Conditions Extrêmes, Bruyères-le-Châtel, France
⁹CEA, DAM, DIF, Bruyères-le-Châtel, France

From the lightest Hydrogen isotopes up to the recently synthesized Oganesson (Z = 118), it is estimated that as many as about 8,000 atomic nuclei could exist in nature. Most of these nuclei are too short-lived to be occurring on Earth, but they play an essential role in astrophysical events such as supernova explosions or neutron star mergers that are presumed to be at the origin of most heavy elements in the Universe. Understanding the structure, reactions, and decays of nuclei across the entire chart of nuclides is an enormous challenge because of the experimental difficulties in measuring properties of interest in such fleeting objects and the theoretical and computational issues of simulating strongly-interacting quantum many-body systems. Nuclear density functional theory (DFT) is a fully microscopic theoretical framework which has the potential of providing such a quantitatively accurate description of nuclear properties for every nucleus in the chart of nuclides. Thanks to high-performance computing facilities, it has already been successfully applied to predict nuclear masses, global patterns of radioactive decay like β or γ decay, and several aspects of the nuclear fission process such as, e.g., spontaneous fission half-lives. Yet, predictive simulations of nuclear spectroscopy—the low-lying excited states and transitions between them—or of nuclear fission, or the quantification of theoretical uncertainties and their propagation to basic or applied nuclear science applications, would require several orders of magnitude more calculations than currently possible. However, most of this computational effort would be spent into generating a suitable basis of DFT wavefunctions. Such a task could potentially be considerably accelerated by borrowing tools from the field of machine learning and artificial intelligence. In this paper, we review different approaches to applying supervised and unsupervised learning techniques to nuclear DFT.

1 Introduction

Predicting all the properties of every atomic nucleus in the nuclear chart, from Hydrogen all the way to superheavy elements, remains a formidable challenge. Density functional theory (DFT) offers a compelling framework to do so, since the computational cost is, in principle, nearly independent of the mass of the system Eschrig [1]. Because of our incomplete knowledge of nuclear forces and of the fact that the nucleus is a self-bound system, the implementation of DFT in nuclei is slightly different from other systems such as atoms or molecules and is often referred to as the energy density functional (EDF) formalism Schunck [2].

Simple single-reference energy density functional (SR-EDF) calculations of atomic nuclei can often be done on a laptop. However, large-scale SR-EDF computations of nuclear properties or higher-fidelity simulations based on the multi-reference (MR-EDF) framework can quickly become very expensive computationally. Examples where such computational load is needed range from microscopic fission theory Schunck and Regnier [3]; Schunck and Robledo [4] to parameter calibration and uncertainty propagation Kejzlar et al. [5]; Schunck et al. [6] to calculations at the scale of the entire chart of nuclides Erler et al. [7]; Ney et al. [8] relevant, e.g., for astrophysical simulations Mumpower et al. [9]. Many of these applications would benefit from a reliable emulator of EDF models.

It may be useful to distinguish two classes of quantities that such emulators should reproduce. What we may call “integral” quantities are quantum-mechanical observables such as, e.g., the energy, radius, or spin of the nucleus, or more complex data such as decay or capture rates. By contrast, we call “differential” quantities the basic degrees of freedom of the theoretical model. In this article, we focus on the Hartree-Fock-Bogoliubov (HFB) theory, which is both the cornerstone of the SR-EDF approach and provides the most common basis of generator states employed in MR-EDF calculations. In the HFB theory, all the degrees of freedom are encapsulated into three equivalent quantities: the quasiparticle spinors, as defined either on some spatial grid or configuration space; the full non-local density matrix ρ(rστ, r′σ′τ′) and pairing tensor κ(rστ, r′σ′τ′), where r refers to spatial coordinates, σ = ±1/2 to the spin projection and τ = ±1/2 to the isopin projection Perlińska et al. [10]; the full non-local HFB mean-field and pairing potentials, often denoted by h (rστ, r′σ′τ′) and Δ(rστ, r′σ′τ′).

Obviously, integral quantities have the clearest physical meaning and can be compared to data immediately. For this reason, they have been the focus of most of the recent efforts in applying techniques of machine learning and artificial intelligence (ML/AI) to low-energy nuclear theory, with applications ranging from mass tables Utama et al. [11]; Utama and Piekarewicz [12,13]; Niu and Liang [14]; Neufcourt et al. [15]; Lovell et al. [16]; Scamps et al. [17]; Mumpower et al. [18], β-decay rates Niu et al. [19], or fission product yields Wang et al. [20]; Lovell et al. [21]. The main limitation of this approach is that it must be repeated for every observable of interest. In addition, incorporating correlations between such observables, for example the fact that β-decay rates are strongly dependent on Q_β-values which are themselves related to nuclear masses, is not easy. This is partly because the behavior of observables such as the total energy or the total spin is often driven by underlying shell effects that can lead to very rapid variations, e.g. at a single-particle crossing. Such effects could be very hard to incorporate accurately in a statistical model of integral quantities.

This problem can in principle be solved by emulating what we called earlier differential quantities. For example, single-particle crossings might be predicted reliably with a good statistical model for the single-particle spinors themselves. In addition, since differential quantities represent, by definition, all the degrees of freedom of the SR-EDF theory, any observable of interest can be computed from them, and the correlations between these observables would be automatically reproduced. In this sense, an emulator of differential quantities is truly an emulator for the entire SR-EDF approach and can be thought of as a variant of intrusive, model-driven, model order reduction techniques discussed in Melendez et al. [22]; Giuliani et al. [23]; Bonilla et al. [24]. In the much simpler case of the Bohr collective Hamiltonian, such a strategy gave promising results Lasseri et al. [25].

The goal of this paper is precisely to explore the feasibility of training statistical models to learn the degrees of freedom of the HFB theory. We have explored two approaches: a simple one based on independent, stationary Gaussian processes and a more advanced one relying on deep neural networks with autoencoders and convolutional layers.

In Section 2, we briefly summarize the nuclear EDF formalism with Skyrme functionals with a focus on the HFB theory preserving axial symmetry. Section 3 presents the results obtained with Gaussian processes. After recalling some general notions about Gaussian processes, we analyze the results of fitting HFB potential across a two-dimensional potential energy surface in ²⁴⁰Pu. Section 4 is devoted to autoencoders. We discuss choices made both for the network architecture and for the training data set. We quantify the performance of autoencoders in reproducing canonical wavefunctions across a potential energy surface in ⁹⁸Zr and analyze the structure of the latent space.

2 Nuclear density functional theory

In very broad terms, the main assumption of density functional theory (DFT) for quantum many-body systems is that the energy of the system of interest can be expressed as a functional of the density of particles Parr and Yang [26]; Dreizler and Gross [27]; Eschrig [1]. Atomic nuclei are a somewhat special case of DFT, since the nuclear Hamiltonian is not known exactly and the nucleus is a self-bound system Engel [28]; Barnea [29]. As a result, the form of the energy density functional (EDF) is often driven by underlying models of nuclear forces, and the EDF is expressed as a function of non-local, symmetry-breaking, intrinsic densities Schunck [2]. In the single-reference EDF (SR-EDF) approach, the many-body nuclear state is approximated by a simple product state of independent particles or quasiparticles, possibly with some constraints reflecting the physics of the problem. We notate |Φ(q)⟩ such as state, with q representing a set of constraints. The multi-reference EDF (MR-EDF) approach builds a better approximation of the exact many-body state by mixing together SR-EDF states.

2.1 Energy functional

The two most basic densities needed to build accurate nuclear EDFs are the one-body density matrix ρ and the pairing tensor κ (and its complex conjugate κ*). The total energy of the nucleus is often written as

E [ρ, κ, κ^{*}] = E_{nuc} [ρ] + E_{Cou} [ρ] + E_{pair} [ρ, κ, κ^{*}], (1)

where E_nuc [ρ] represents the particle-hole, or mean-field, contribution to the total energy from nuclear forces, E_Cou [ρ] the same contribution from the Coulomb force, and E_pair [ρ, κ, κ*] the particle-particle contribution to the energy¹. In this work, we model the nuclear part of the EDF with a Skyrme-like term

E_{nuc} [ρ] = \sum_{t = 0,1} \int d^{3} r χ_{t} (r), (2)

which includes the kinetic energy term and reads generically

χ_{t} (r) = C_{t}^{ρ ρ} ρ_{t}^{2} + C_{t}^{ρ τ} ρ_{t} τ_{t} + C_{t}^{J J} J_{t}^{2} + C_{t}^{ρ Δ ρ} ρ_{t} Δ ρ_{t} + C_{t}^{ρ \nabla J} ρ_{t} \nabla \cdot J_{t} . (3)

In this expression, the index t refers to the isoscalar (t = 0) or isovector (t = 1) channel and the terms $C_{t}^{u u^{'}}$ are the coupling constants associated with the energy functional. The particle density ρ_t(r), kinetic energy density τ_t(r), spin-current tensor $J_{t} (r)$ , and vector density J_t(r) are all derived from the full one-body, non-local density ρ(rστ, r′σ′τ′) where r are spatial coordinates, σ is the intrinsic spin projection, σ = ±1/2, and τ = ±1/2 is the isospin projection; see Engel et al. [30]; Dobaczewski and Dudek [31]; Bender et al. [32]; Perlińska et al. [10]; Lesinski et al. [33] for their actual definition. Since we do not consider any proton-neutron mixing, all densities are diagonal in isospin space. The two remaining terms in Eq. 1 are treated in exactly the same way as in Schunck et al. [34]. In particular, the pairing energy is derived from a surface-volume density-dependent pairing force

V^{(τ)} (r, r^{'}) = V_{0}^{(τ)} [1 - \frac{1}{2} \frac{ρ (r)}{ρ_{c}}] δ (r - r^{'}), (4)

where ρ_c = 0.16 fm⁻³ is the saturation density of nuclear matter.

2.2 Hartree-Fock-Bogoliubov theory

The actual densities in (3) are obtained by solving the Hartree-Fock-Bogoliubov (HFB) equation, which derives from applying a variational principle and imposing that the energy be minimal under variations of the densities Schunck [2]. The HFB equation is most commonly solved in the form of a non-linear eigenvalue problem. The eigenfunctions define the quasiparticle (q.p.) spinors. Without proton-neutron mixing, we can treat neutrons and protons separately. Therefore, for any one type of particles, the HFB equation giving the μ^th eigenstate reads in coordinate space Dobaczewski et al. [35].

\int d^{3} r^{'} \sum_{σ^{'}} (\begin{matrix} h (r σ, r^{'} σ^{'}) - λ δ_{σ σ^{'}} & \tilde{h} (r σ, r^{'} σ^{'}) \\ {\tilde{h}}^{*} (r σ, r^{'} σ^{'}) & - h (r σ, r^{'} σ^{'}) + λ δ_{σ σ^{'}} \end{matrix}) (\begin{matrix} U (E_{μ}, r^{'} σ^{'}) \\ V (E_{μ}, r^{'} σ^{'}) \end{matrix}) = E_{μ} (\begin{matrix} U (E_{μ}, r σ) \\ V (E_{μ}, r σ) \end{matrix}), (5)

where h (rσ, r′σ′) is the mean field, $\tilde{h} (r σ, r^{'} σ^{'})$ the pairing field² and λ the Fermi energy. Such an eigenvalue problem must be solved for protons and for neutrons.

For the case of Skyrme energy functionals and zero-range pairing functionals, both the mean field h and pairing field $\tilde{h}$ become semi-local functions of r (semi-local refers to the fact that these potentials involve differential operators). We refer to Vautherin and Brink [36]; Engel et al. [30] for an outline of the derivations leading to the expressions of the mean field in the case of Skyrme functionals and to, e.g., Dobaczewski and Dudek [37]; Bender et al. [38]; Hellemans et al. [39]; Ryssens et al. [40] for the expression of the mean field in terms of coupling constants rather than the parameters of the Skyrme potential. In the following, we simply recall the essential formulas needed in the rest of the manuscript.

Expression 5 is written in coordinate space. In configuration space, i.e., when the q.p. spinors are expanded on a suitable basis of the single-particle (s.p.) Hilbert space, the same equation becomes a non-linear eigenvalue problem that can be written as

(\begin{matrix} h - λ & \tilde{h} \\ {\tilde{h}}^{*} & - h^{*} + λ \end{matrix}) (\begin{matrix} U & V^{*} \\ V & U^{*} \end{matrix}) = (\begin{matrix} U & V^{*} \\ V & U^{*} \end{matrix}) (\begin{matrix} - E & 0 \\ 0 & E \end{matrix}), (6)

where h, $\tilde{h}$ , U and V are now N_basis × N_basis matrices, with N_basis the number of basis states. Eigenvalues are collected in the diagonal N_basis × N_basis matrix E. The set of all eigenvectors define the Bogoliubov matrix,

W = (\begin{matrix} U & V^{*} \\ V & U^{*} \end{matrix}), (7)

which is unitary: $W W^{†} = W^{†} W = 1$ . Details about the HFB theory can be found in the standard references Valatin [41]; Mang [42]; Blaizot and Ripka [43]; Ring and Schuck [44].

2.3 Mean-field and pairing potentials

The mean fields are obtained by functional differentiation of the scalar-isoscalar energy functional (1) with respect to all relevant isoscalar or isovector densities, ρ₀, ρ₁, τ₀, etc. For the case of a standard Skyrme EDF when time-reversal symmetry is conserved, the corresponding mean-field potentials in the isoscalar-isovector representation become semi-local Dobaczewski and Dudek [37,45]; Stoitsov et al. [46]; Hellemans et al. [39].

h_{t} (r) = - \nabla M_{t}^{*} (r) \nabla + U_{t} (r) + \frac{1}{2 i} \sum_{μ ν} (\nabla_{μ} σ_{ν} B_{t, μ ν} (r) + B_{t, μ ν} (r) \nabla_{μ} σ_{ν}), (8)

where, as before, t = 0, 1 refers to the isoscalar or isovector channel and the various contributions are.

M_{t} (r) = \frac{ℏ^{2}}{2 m} + C_{t}^{ρ τ} ρ_{t}, (9a)

U_{t} (r) = 2 C_{t}^{ρ ρ} ρ_{t} + C_{t}^{ρ τ} τ_{t} + 2 C_{t}^{ρ Δ ρ} Δ ρ_{t} + C_{t}^{ρ \nabla J} \nabla \cdot J_{t} + U_{t}^{(r e a r)}, (9b)

B_{t, μ ν} (r) = 2 C_{t}^{ρ J} J_{t, μ ν} - C_{t}^{ρ Δ J} \nabla_{μ} ρ_{t, ν} . (9c)

In these expressions, μ, ν label spatial coordinates and σ is the vector of Pauli matrices in the chosen coordinate system. For example, in Cartesian coordinates, μ, ν ≡ x, y, z and σ = (σ_x, σ_y, σ_z). The term $U_{t}^{(r e a r)}$ is the rearrangement potential originating from the density-dependent part of the energy. The resulting isoscalar and isovector mean-field and pairing potentials can then recombined to give the neutron and proton potentials,

h^{(n)} = h_{0} + h_{1}, h^{(p)} = h_{0} - h_{1} . (10)

Note that the full proton potential should also contain the contribution from the Coulomb potential.

The pairing field is obtained by functional differentiation of the same energy functional (1), this time with respect to the pairing density. As a result, one can show that it is simply given by

{\tilde{h}}^{(τ)} (r) = V_{0}^{(τ)} [1 - \frac{1}{2} \frac{ρ_{0} (r)}{ρ_{c}}] {\tilde{ρ}}^{(τ)} (r) . (11)

2.4 Collective space

Nuclear fission or nuclear shape coexistence are two prominent examples of large-amplitude collective motion of nuclei Schunck and Regnier [3]; Heyde and Wood [47]. Such phenomena can be accurately described within nuclear DFT by introducing a small-dimensional collective manifold, e.g., associated with the nuclear shape, where we assume the nuclear dynamics is confined Nakatsukasa et al. [48]; Schunck [2]. The generator coordinate method (GCM) and its time-dependent extension (TDGCM) provide quantum-mechanical equations of motion for such collective dynamics Griffin and Wheeler [49]; Wa Wong [50]; Reinhard and Goeke [51]; Bender et al. [32]; Verriere and Regnier [52]. In the GCM, the HFB solutions are generator states, i.e., they serve as a basis in which the nuclear many-body state is expanded. The choice of the collective manifold, that is, of the collective variables, depends on the problem at hand. For shape coexistence or fission, these variables typically correspond to the expectation value of multipole moment operators on the HFB state. A pre-calculated set of HFB states with different values for the collective variables defines a potential energy surface (PES).

In practice, PES are obtained by adding constraints to the solutions of the HFB equation. This is achieved by introducing a set of constraining operators ${\hat{Q}}_{a}$ capturing the physics of the problem at hand. The set of all such constraints q ≡ (q₁, …, q_N) defines a point in the PES. In this work, our goal is to design emulators capable of reproducing the HFB solutions at any given point q of a PES. Throughout this article, we consider exclusively two-dimensional collective spaces spanned by the expectation values of the axial quadrupole ${\hat{Q}}_{20}$ and axial octupole ${\hat{Q}}_{30}$ moment operators. In the presence of constraints, the mean-field potential in the HFB equation is modified as follows

h (r σ, r^{'} σ^{'}) - λ δ_{σ σ^{'}} \to h (r σ, r^{'} σ^{'}) - (λ + \sum_{a} λ_{a} Q_{a} (r)) δ_{σ σ^{'}} . (12)

As is well known, the Fermi energies play in fact the role of the Lagrange parameters λ_a for the constraints on particle number. When performing calculations with constraints on the octupole moment, it is also important to fix the position of the center of mass. This is typically done by adding a constraint on the dipole moment ${\hat{Q}}_{10}$ . In the following, we note q_λμ the expectation value of the operator ${\hat{Q}}_{λ μ}$ on the quasiparticle vacuum, $q_{λ μ} = ⟨ Φ (q) | {\hat{Q}}_{λ μ} | Φ (q) ⟩$ .

Potential energy surfaces are a very important ingredient in a very popular approximation to the GCM called the Gaussian overlap approximation (GOA) Brink and Weiguny [53]; Onishi and Une [54]; Une et al. [55]. By assuming, among other things, that the overlap between two HFB states with different collective variables q and q′ is approximately Gaussian, the GOA allows turning the integro-differential Hill-Wheeler-Griffin equation of the GCM into a much more tractable Schrödinger-like equation. The time-dependent version of this equation reads as Verriere and Regnier [52].

i ℏ \frac{\partial}{\partial t} g (q, t) = [- \frac{ℏ^{2}}{2} \sum_{α β} \frac{\partial}{\partial q_{α}} B_{α β} (q) \frac{\partial}{\partial q_{β}} + V (q)] g (q, t), (13)

where the probability to be at point q of the collective space at time t is given by |g (q, t)|², V(q) is the actual PES, typically the HFB energy as a function of the collective variables q (sometimes supplemented by some zero-point energy correction) and B_αβ(q) the collective inertia tensor. In (13), indices α and β run from 1 to the number N_col of collective variables. While the HFB energy often varies smoothly with respect to the collective variables, the collective inertia tensor can exhibit very rapid variations near level crossings.

2.5 Canonical basis

The Bloch-Messiah-Zumino theorem states that the Bogoliubov matrix $W$ of (7) can be decomposed into a product of three matrices Ring and Schuck [44]; Bloch and Messiah [56]; Zumino [57].

W = D \bar{W} C = (\begin{matrix} D & 0 \\ 0 & D^{*} \end{matrix}) (\begin{matrix} \bar{U} & \bar{V} \\ \bar{V} & \bar{U} \end{matrix}) (\begin{matrix} C & 0 \\ 0 & C^{*} \end{matrix}), (14)

where D and C are unitary matrices. The matrices $\bar{U}$ and $\bar{V}$ take the very simple canonical form

\bar{U} = (\begin{matrix} 0 \\ ⋱ \\ u_{k} & 0 \\ 0 & u_{\bar{k}} \\ ⋱ \\ 0 \end{matrix}), \bar{V} = (\begin{matrix} 0 \\ ⋱ \\ 0 & v_{k} \\ v_{\bar{k}} & 0 \\ ⋱ \\ 0 \end{matrix}) . (15)

Starting from an arbitrary s.p. basis $(\hat{c}, {\hat{c}}^{†})$ of the Hilbert space, the transformation characterized by the matrix $D$ leads to a new basis $(\hat{a}, {\hat{a}}^{†})$ that diagonalizes the density matrix ρ and puts the pairing tensor κ into the canonical form similar to that of $\bar{V}$ . This new basis is called the canonical basis of the HFB theory. Properties of the canonical basis are discussed in details in the literature; see, e.g., Ring and Schuck [44]; Schunck [2]. In the HFB theory, quasiparticles are superpositions of particle operators ${\hat{a}}^{†}$ and hole operators $\hat{a}$ . Thus, the canonical basis is transformed according to the matrix $\bar{W}$ to obtain a set of quasiparticle operators $(\hat{α}, {\hat{α}}^{†})$ . There is another transformation of these operators associated with the matrix $C$ . However, the most important property for the purpose of this paper is that physical observables associated with HFB solutions do not depend on that last transformation.

In addition to simplifying the calculation of many-body observables, the canonical basis is also computationally less expensive than the full Bogoliubov basis³. As an illustration, let us take the example of the local density ρ(r). Assuming the s.p. basis $(\hat{c}, {\hat{c}}^{†})$ is represented by the basis functions ${ψ_{n} (r, σ)}_{n \in N}$ , the local density (for isospin τ) is obtained from the matrix of the Bogoliubov transformation by

ρ (r) = \sum_{σ} \sum_{μ} \sum_{m n} V_{m μ}^{*} V_{n μ} ψ_{m} (r, σ) ψ_{n}^{*} (r, σ) . (16)

Notwithstanding the constraints imposed by the orthonormality of the q.p. spinors, the number of independent parameters in this expression approximately scales like $2 \times N_{basis}^{2} \times N_{qp} \times N_{r}$ , where N_basis is the size of the s.p. basis, N_qp the number of q.p. states μ and N_r the total number of points in the spatial grid r (which depends on the symmetries imposed). In the canonical basis, and assuming that the state ${\hat{a}}_{μ} | 0 〉$ is associated with the wavefunction φ_μ(r, σ), the same object is represented by

ρ (r) = \sum_{σ} \sum_{μ} v_{μ}^{2} | φ_{μ} (r, σ) |^{2} . (17)

The number of data points now scales like 2 × N_qp × N_r + N_qp, or about $N_{basis}^{2}$ smaller than before. For calculations with N_basis ≈ 1,000 the compression enabled by the canonical basis is of the order of 10⁶.

2.6 Harmonic oscillator basis

All calculations in this article were performed with the HFBTHO code Marević et al. [58]. Recall that HFBTHO works by expanding the solutions on the axially-deformed harmonic oscillator basis Stoitsov et al. [46]. Specifically, the HO basis functions are written

ψ_{n} (r, σ) = ψ_{n_{r}}^{Λ} (r) ψ_{n_{z}} (z) \frac{e^{i Λ θ}}{\sqrt{2 π}} χ Σ (σ), (18)

where n ≡ (n_r, n_z, Λ, Ω = Λ ±Σ) are the quantum numbers labeling basis states and.

ψ_{n_{r}}^{Λ} (r) = N_{n_{r}} β_{⊥} \sqrt{2} η^{| Λ | / 2} e^{- η / 2} L_{n_{r}}^{| Λ |} (η), (19a)

ψ_{n_{z}} (z) = N_{n_{z}} \sqrt{β_{z}} e^{- ξ^{2} / 2} H_{n_{z}} (ξ), (19b)

With $η = β_{⊥}^{2} r^{2}$ and ξ = β_zz dimensionless variables, $L_{n_{r}}^{| Λ |}$ the associated Laguerre polynomials of order n_r and $H_{n_{z}}$ the Hermite polynomial of order n_z. The oscillator scaling factors β_⊥ and β_z are the inverse of the oscillator lengths, i.e., β_z = 1/b_z.

All integrations are performed by Gauss quadrature, namely Gauss-Hermite for integrations along the ξ-axis of the intrinsic reference frame and Gauss-Laguerre for integrations along the perpendicular direction characterized by the variable η. In the following, we note N_z the number of Gauss-Hermite nodes and N_⊥ the number of Gauss-Laguerre nodes.

3 Supervised learning with Gaussian processes

Gaussian processes (GPs) are a simple yet versatile tool for regression that has found many applications in low-energy nuclear theory over the past few years, from determining the nuclear equation of state Drischler et al. [59], quantifying the error of nuclear cross sections calculations Kravvaris et al. [60]; Acharya and Bacca [61] to modeling of neutron stars Pastore et al. [62]. In the context of nuclear DFT, they were applied to build emulators of χ₂ objective functions in the UNEDF project Kortelainen et al. [63–65]; Higdon et al. [66]; McDonnell et al. [67]; Schunck et al. [6], of nuclear mass models Neufcourt et al. [15,68–70], or of potential energy surfaces in actinides Schunck et al. [34]. In this section, we test the ability of GPs to learn directly the HFB potentials across a large, two-dimensional collective space.

3.1 Gaussian processes

Gaussian processes are commonly thought of as the generalization of normally-distributed random variables (Gaussian distribution) to functions. There exists a considerable field of applications for GPs and we refer to the reference textbook by Rasmussen and Williams for a comprehensive review of the formalism and applications of GPs Rasmussen and Williams [71]. For the purpose of this work, we are only interested in the ability of GPs to be used as a regression analysis tool and we very briefly outline below some of the basic assumptions and formulas.

We assume that we have a dataset of observations ${y = y_{i}}_{i = 1, \dots, n}$ and that these data represent n realizations of

y = f (x) + ϵ, (20)

where f: x↦f(x) is the unknown function we are seeking to learn from the data. Saying that a function f is a Gaussian process means that every finite collection of function values f = (f (x₁), …, f (x_p)) follows a p-dimensional multivariate normal distribution. In other words, we assume that the unknown function f follows a normal distribution in ‘function space’. This is denoted by

f (x) \sim GP (m (x), k (x, x^{'})), (21)

where m: x↦m(x) is the mean function and k: (x, x′)↦k (x, x′) the covariance function, which is nothing but the generalization to functions of the standard deviation,

k (x, x^{'}) = E [(f (x) - m (x)) (f (x^{'}) - m (x^{'}))] . (22)

Thanks to the properties of Gaussian functions, the mean and covariance functions have analytical expressions as a function of the test data y and covariance k; see Eqs (2.25)-(2.26) in Rasmussen and Williams [71].

The covariance function is the central object in GP regression. It is typically parametrized both with a functional form and with a set of free parameters called hyperparameters. The hyperparameters are determined from the observed data by maximizing the likelihood function. In our tests, the covariance matrix is described by a standard Matérn 5/2 kernel,

k (x, x^{'}) = (1 + \frac{\sqrt{5}}{ℓ} ‖ x - x^{'} ‖ + \frac{5}{3 ℓ^{2}} ‖ x - x^{'} ‖^{2}) \exp (- \frac{\sqrt{5}}{ℓ} ‖ x - x^{'} ‖), (23)

where ℓ is the length-scale that characterizes correlations between values of the data at different locations. The length-scale is a hyper-parameter that is optimized in the training phase of the Gaussian process. In this work, we only considered stationary GPs: the correlation between data points x and x′ only depends on the distance ‖x − x′‖ between these points, not on their actual value.

3.2 Study case

3.2.1 HFB potentials

Section 2.2 showed that the HFB mean-field potential involves several differential operators. When the HFB matrix is constructed by computing expectation values of the HFB potential on basis functions, differentiation is carried over to the basis functions and computed analytically—one of the many advantages of working with the HO basis. In practice, this means that the elements of the HFB matrix are computed by multiplying spatial kernels with different objects representing either the original HO functions or their derivatives. This means that we cannot consider a single emulator for the entire HFB potential. Instead, we have to build several different ones for each of its components: the central potential U (derivative of the EDF with respect to ρ), the r- and z-derivatives of the effective mass M* (derivative with respect to the kinetic density τ), the r- and z-derivatives of the spin-orbit potential W, and the pairing field $\tilde{h}$ . There are six such functions for neutrons and another six for protons. We denote this set of twelve functions as ${f_{i}}_{i = 1, \dots, 12}$ .

At any given point q of the collective space, these functions are all local, scalar functions of η and ξ, f_i(q) ≡ f_i: (η, ξ)↦f_i (η, ξ; q) where (η, ξ) are the nodes of the Gauss-Laguerre and Gauss-Hermite quadrature grid. We note generically f_ik(q) the value at point k of the quadrature grid (linearized) of the sample at point q of the function f_i. When fitting Gaussian process to reproduce mean-field and pairing potentials, we consider a quadrature grid of N_z × N_⊥ = 3,200 points. Our goal is thus to build 3,200 different emulators, one for each point k of that grid, for each of the 12 local functions characterizing the mean-field and pairing potentials. This gives a grand total of 38,400 emulators to build. While this number is large, it is still easily manageable on standard computers. It is also several orders of magnitude smaller than emulating the full set of quasiparticle spinors, as we will see in the next section.

In addition, the value of all the Lagrange parameters used to set the constraints must also be included in the list of data points. In our case, we have 5 of them: the two Fermi energies λ_n and λ_p and the three constraints on the value of the dipole, quadrupole and octupole moments, λ₁, λ₂ and λ₃, respectively. Finally, we also fit the expectation value of the three constraints on ${\hat{Q}}_{10}$ , ${\hat{Q}}_{20}$ and ${\hat{Q}}_{30}$ . We thus have a grand total of 38,408 functions of q to emulate.

3.2.2 Training data and fitting procedure

We show in Figure 1 the potential energy surface that we are trying to reconstruct. This PES is for the ²⁴⁰Pu nucleus and was generated with the SkM* parameterization of the Skyrme energy functional Bartel et al. [72]. The pairing channel is described with the zero-range, density-dependent pairing force of Eq. 4 that has exactly the same characteristics as in Schunck et al. [73].

FIGURE 1

FIGURE 1. Potential energy surface of ²⁴⁰Pu with the SkM* EDF for the grid (q₂₀, q₃₀) ∈ [0 b, 300 b] × [0 b^3/2, 51 b^3/2] with steps δq₂₀ = 6 b and δq₃₀ = 3 b^3/2. The black crosses are the training points, the white circles the validation points. Energies indicated by the color bar are in MeV relatively to −1820 MeV.

We imposed constraints on the axial quadrupole and octupole moments such that: 0 b ≤ q₂₀ ≤ 300 b and 0 b^3/2 ≤ q₃₀ ≤ 51 b^3/2 with steps of δq₂₀ = 6 b and δq₃₀ = 3 b^3/2, respectively. The full PES should thus contain 918 collective points. In practice, we obtained N_p = 887 fully converged solutions. Calculations were performed with the HFBTHO solver by expanding the solutions on the harmonic oscillator basis with N_max = 28 deformed shells and a truncation in the number of states of N_basis = 1,000. At each point of the PES, the frequency ω₀ and deformation β₂ of the HO basis are set according to the empirical formulas given in Schunck et al. [73]. Following standard practice, we divided the full N_p = 887 dataset of points into a training (80% of the points) and validation (20% of the points) set. The selection was done randomly and resulted in N_train = 709 training points and N_valid. = 178 validation points. The training points are marked as small black crosses in Figure 1 while the validation points are marked as larger white circles.

Based on the discussion in Section 3.2.1, we fit a Gaussian process to each of the 38,408 variables needed to characterize completely the HFB matrix. Since we work in a two-dimensional collective space, we have two features and the training data is represented by a two-dimensional array X of dimension (n_samples, n_features) with n_samples = N_p and n_features = 2. The target values Y (= the value at point k on the quadrature grid of any of the functions f_i) are contained in a one-dimensional array of size N_p. Prior to the fit, the data is normalized between 0 and 1. The GP is based on a standard Matérn kernel with ν = 2.5 and length-scale ℓ. In practice, we use different length-scales for the q₂₀ and q₃₀ directions so that ℓ = ℓ is a vector. We initialized these values at the spacing of the grid ℓ = (δq₂₀, δq₃₀). We added a small amount of white noise to the Matérn kernel to account for the global noise level of the data.

3.2.3 Performance

Once the GP has been fitted on the training data, we can estimate its performance on the validation data. For each of the N_valid. = 178 validation points, we used the GP-fitted HFB potentials to perform a single iteration of the HFB self-consistent loop and extract various observables from this single iteration. Figure 2 focuses on the total HFB energy and the zero-point energy correction ɛ₀. Together, these two quantities define the collective potential energy in the collective Hamiltonian (13) of the GCM. The left panel of the figure shows the histogram of the error $Δ E = E_{HFB}^{(t r u e)} - E_{HFB}^{(G P)}$ , where $E_{HFB}^{(t r u e)}$ is the result from the fully converged HFB solution and $E_{HFB}^{(G P)}$ is the value predicted by the Gaussian process. The bin size is 100 keV. Overall, we find that the large majority of the error is within ±200 keV. This is a rather good result considering the span of the PES and the fact that basis truncation errors can easily amount to a few MeV Schunck [74].

FIGURE 2

FIGURE 2. (A): Histogram of the error on the GP-predicted total HFB energy and zero-point energy correction across the validation points. Bin size is 100 keV. (B): Size of the error on the GP-predicted total HFB energy across the validation set. Gray circles have an error lower than 500 keV and the size of the markers correspond to energy bins of 100 keV. Black circles have an error greater than 500 keV and are binned by 400 keV units. Energies indicated by the color bar are in MeV relatively to −1820 MeV.

To gain additional insight, we draw in the right panel of Figure 2 each of the validation points with a marker, the size of which is proportional to the error of the prediction. To further distinguish between most points and the few outliers, we show in gray the points for which the absolute value of the error is less than 500 keV and in black the points for which it is greater than 500 keV. For the gray points, we use 5 different marker sizes corresponding to energy bins of 100 keV: the smaller grey symbol corresponds to an error smaller than 100 keV, the larger one between 400 and 500 keV. Similarly, the larger black circles have all an error greater than 500 keV and are ordered by bins of 400 keV (there are only two points for which the error is larger than 4 MeV). Interestingly, most of the larger errors are concentrated in the region of small elongation q₂₀ < 80 b and high asymmetry q₃₀ > 30 b^3/2. This region of the collective space is very high in energy (more than 100 MeV above the ground state) and plays no role in the collective dynamics.

Note that the expectation values of the multipole moments themselves are not reproduced exactly by the GP: strictly speaking, the contour plot in the right panel of Figure 2 is drawn based on the requested values of the constraints, not their actual values as obtained by solving the HFB equation once with the reconstructed potentials. The histogram in the left panel of Figure 3 quantifies this discrepancy. It shows the absolute error $Δ q_{λ μ} = q_{λ μ}^{(t r u e)} - q_{λ μ}^{(G P)}$ , where $q_{λ μ}^{(t r u e)}$ is the result from the fully converged HFB solution and $q_{λ μ}^{(t r u e)}$ is the value predicted by the Gaussian process. On average, the error remains within ±0.5 b for q₂₀ and ±0.5 b^3/2 for q₃₀, which is significantly smaller than the mesh size.

FIGURE 3

FIGURE 3. (A): Histogram of the error on the GP-predicted values of the multipole moments. The bin size is 0.2 b^λ/2 with λ = 2 (quadrupole moment) or λ = 3 (octupole moment). (B): Histogram of the relative error, in percents, on the GP-predicted values of the components of the collective inertia tensor. The bin size is 1, corresponding to 1% relative errors.

The collective potential energy is only one of the two ingredients used to simulate fission dynamics. As mentioned in Section 2.4, see Eq. 13, the collective inertia tensor is another essential quantity Schunck and Robledo [4]; Schunck and Regnier [3]. In this work, we computed the collective inertia at the perturbative cranking approximation Schunck and Robledo [4]. Since we work in two-dimensional collective spaces, the collective inertia tensor $B$ has three independent components, hereafter labeled B₂₂, B₃₃ and B₃₂ = B₂₃. Figure 3 shows the relative error on these quantities, defined as $ϵ = (B_{λ λ^{'}}^{(t r u e)} - B_{λ λ^{'}}^{(G P)}) / B_{λ λ^{'}}^{(t r u e)}$ . Overall, the error is more spread than for the energy but rarely exceeds five percents⁴.

Both the total energy and the collective mass tensor are computed from the HFB solutions. However, since the GP fit is performed directly on the mean-field and pairing potentials, one can analyze the error on these quantities directly. In Figure 4, we consider two different configurations. The configuration $C_{1} = (q_{20}, q_{30}) = (198 b, 30 b^{3 / 2})$ is very well reproduced by the GP with an error in the HFB energy of 4.4 keV and a relative error on B₂₂ of -0.43% and B₂₂ of -0.84% only. In contrast, the configuration $C_{2} = (q_{20}, q_{30}) = (138 b, 51 b^{3 / 2})$ is one of the worst possible cases, with a total error on the HFB energy of 9.0 MeV and relative errors on B₂₂ of -71.0% and B₂₂ of -13.7%. For each of these two configurations, we look at the central part of the mean-field potential for protons, the term U_p = U₀ − U₁ of (9b). The left side of Figure 4 shows, respectively, the actual value of U_p(r, z) across the quadrature grid (top panel) and the difference between the true value and the GP fit (bottom panel) for the configuration $C_{1}$ . The right side of the figure shows the same quantity for the configuration $C_{2}$ . In all four plots, the energy scale is in MeV; it is identical for the two panels at the top, but it is different for the two at the bottom.

FIGURE 4

FIGURE 4. (A): Central part of the mean-field potential for protons, U_p(r, z) for the configuration (q₂₀, q₃₀) = (198 b, 30 b^3/2); (B): Error in the GP fit for that same configuration. (C): Central part of the mean-field potential for protons, U_p(r, z) for the configuration (q₂₀, q₃₀) = (138 b, 51 b^3/2); (D): Error in the GP fit for that same configuration. In all figures, the energy given by the error bar is in MeV. Note the much smaller energy scale for the bottom left panel.

We see that for the “good” configuration $C_{1}$ , the error is between −0.6 MeV and 1.0 MeV but is mostly occurring at the surface of the nucleus and at the edges of the domain. Conversely, the “bad” configuration $C_{2}$ actually corresponds to a scissioned configuration: the mean-field potential (upper right panel) shows two different regions corresponding to fully separated fragments⁵. Such a geometric configuration is very different from the rest of the potential energy surface shown in Figure 1, which contains mostly non-scissioned configurations. As a result, the error in the GP fit is very large in the region between the two fragments since it predicts this configuration to be non-scissioned. Note that in HFBTHO, the representation of the potentials on the quadrature points does not contain the exponential factor $\exp (- β_{z} ξ^{2}) \exp (- β_{⊥}^{2} ξ^{2})$ which is factored in the quadrature weights. Therefore, the large errors at the edges of the domain, for z ≈ − 30 fm, z ≈ + 30 fm or r ≈ 18 fm are not significant since they are entirely absorbed by this exponential factor.

Overall, Gaussian processes seem to provide an efficient way to predict HFB solutions across potential energy surfaces. Their primary advantage is that they are very simple to implement, with several popular programming environments offering ready-to-use, full GP packages, and very fast to train (a few minutes at most for a few hundreds of samples). As our examples suggest, GPs are very good at interpolating across a domain where solutions behave smoothly. In the case of PES, this implies that the training data must not contain, e.g., scissioned and non-scissioned configurations. More generally, it should not feature too many discontinuities Dubray and Regnier [75]. When these conditions are met, GPs can be used to quickly and precisely densify a PES, e.g., to obtain more precise fission paths in spontaneous fission half-live calculations Sadhukhan [76].

However, Gaussian processes are intrinsically limited. In our example, we treated the value of each potential at each point of the quadrature mesh as an independent GP. Yet, such data are in reality heavily correlated. To incorporate such correlations requires generalizing from scalar GPs to vector, or multi-output GPs Bruinsma et al. [77]. In our example of nuclear potentials, the output space would be $R^{D}$ with D ≈ 32,008. An additional difficulty is related to choosing the kernel that is appropriate to describe the correlated data and identifying what the prior distribution should be Álvarez et al. [78]. Yet another deficiency of standard Gaussian processes, especially in contrast to the deep-learning techniques discussed below, is that they are not capable to learn a latent representation of the data. For these reasons, we consider such techniques helpful mostly to densify existing potential energy surfaces.

4 Deep learning with autoencoders

Even though self-consistent potential energy surfaces are key ingredients in the microscopic theory of nuclear fission Bender et al. [79], we must overcome two significant obstacles to generate reliable and complete PES. First, the computational cost of nuclear DFT limits the actual number of single-particle d.o.f. When solving the HFB equation with basis-expansion methods, for example, the basis must be truncated (up to a maximum of about a few thousand states, typically), making the results strongly basis-dependent Schunck [80]; even in mesh-based methods, the size of the box and lattice spacing also induce truncation effects Ryssens et al. [81]; Jin et al. [82]. Most importantly, the number of collective variables that can be included in the PES is also limited: in spontaneous fission calculations, which do not require a description of the PES up to scission, up to N_col = 5 collective variables have been incorporated Sadhukhan [76]; when simulating the PES up to scission, only 2 collective variables are included with only rare attempts to go beyondRegnier et al. [83]; Zhao et al. [84]. As a consequence, the combination of heavily-truncated collective spaces and the adiabatic hypothesis inherent to such approaches leads to missing regions in the PES and spurious connections between distinct channels with unknown effects on physics predictions Dubray and Regnier [75]; Lau et al. [85]. The field of deep learning may offer an appealing solution to this problem by allowing the construction of low-dimensional and continuous surrogate representations of potential energy surfaces. In the following, we test the ability of autoencoders—a particular class of deep neural networks—to generate accurate low-dimensional representations of HFB solutions.

4.1 Network architecture

The term ‘deep learning’ encompasses many different types of mathematical and computational techniques that are almost always tailored to specific applications. In this section, we discuss some of the specific features of the data we seek to encode in a low-dimensional representation, which in turn help constrain the network architecture. The definition of a proper loss function adapted to quantum-mechanical datasets is especially important.

4.1.1 Canonical states

We aim at building a surrogate model for determining canonical wavefunctions as a function of a set of continuous constraints. Canonical states are denoted generically $φ_{μ}^{(τ)} (r, σ)$ with r ≡ (r, z, θ) the cylindrical coordinates and σ = ±1/2 the spin. Fully characterizing an HFB state requires the set of canonical wavefunctions for both neutrons and protons, which are distinguished by their isospin quantum number τ = +1/2 (neutrons) and τ = −1/2 (protons). As mentioned in Section 2, an HFB solution |Φ(q)⟩ is entirely determined up to a global phase by the set of all canonical states ${φ_{μ}^{(τ)} (r, σ)}_{μ}$ and their associated occupation amplitudes ${v_{μ}^{(τ)}}_{μ}$ .

In this work, we restrict ourselves to axially-symmetric configurations. In that case, the canonical wavefunctions are eigenstates of the projection of the total angular momentum on the symmetry axis ${\hat{J}}_{z}$ with eigenvalue Ω and acquire the same separable structure (18) as the HO basis functions,

φ_{μ}^{(τ)} (r, σ) = φ_{μ}^{(τ)} (r, z, σ) \frac{e^{i Λ θ}}{\sqrt{2 π}}, (24)

where $φ_{μ}^{(τ)} (r, z, σ)$ is the canonical wavefunction at θ = 0. In this initial work, we only consider even-even nuclear systems and time-reversal symmetric nuclear Hamiltonians. Therefore, Kramer’s degeneracy ensures that paired particles in the canonical basis are time-reversal partners of each other: $φ_{\bar{μ}}^{(τ)} (r, σ) = 2 σ φ_{μ}^{(τ) *} (r, - σ)$ . This guarantees that the canonical wavefunction at θ = 0 can be chosen purely real. Incidentally, it also means that we only need to describe one wavefunction per pair of particles. Using these properties, we can completely describe a canonical wavefunction in our model by only predicting a single pair of real-valued functions (one for each spin projection σ).

As shown by Eqs 8, Eqs 9a–9c, all mean-field and pairing potentials are functions of the Skyrme densities. The kinetic energy density τ(r, z), spin-current tensor $J (r, z)$ , and vector density J (r, z) involve derivatives of the quasiparticle spinors or, in the canonical basis, of the canonical wavefunctions on the quadrature grid Stoitsov et al. [46]. We compute these derivatives by first extracting the coefficients $α_{n μ}^{(τ)}$ of the expansion of the canonical wavefunctions $φ_{μ}^{(τ)} (r, σ)$ in the HO basis

φ_{μ}^{(τ)} (r, σ) = \sum_{n} α_{n μ}^{(τ)} ψ_{n} (r, σ) \Rightarrow α_{n μ}^{(τ)} = \int d^{3} r ψ_{n}^{*} (r, σ) φ_{μ}^{(τ)} (r, σ), (25)

using Gauss-Laguerre and Gauss-Hermite quadrature. Since all the derivatives of the HO functions can be computed analytically, the expansion (25) makes it very easy to compute partial derivatives with respect to r or z, for example,

\frac{\partial φ_{μ}^{(τ)}}{\partial z} (r, σ) = \sum_{n} α_{n μ}^{(τ)} \frac{\partial ψ_{n}}{\partial z} (r, σ) . (26)

4.1.2 Structure of the predicted quantity

In the ideal case, the canonical wavefunctions evolve smoothly with the collective variables. The resulting continuity of the many-body state with respect to collective variables is a prerequisite for a rigorous description of the time evolution of fissioning systems, yet it is rarely satisfied in practical calculations. We discuss below the three possible sources of discontinuity of the canonical wavefunctions in potential energy surfaces.

First, the canonical wavefunctions are invariant through a global phase. Since the quantity we want to predict is real, the orbitals can be independently multiplied by an arbitrary sign. Even though this type of discontinuity does not impact the evolution of global observables as a function of deformation, it affects the learning of the model: since we want to obtain continuous functions, a flipping of the sign would be seen by the neural network as a discontinuity in the input data. We address this point through the choice of the loss, as discussed in Section 4.1.3, and through the determination of the training set, as detailed in Section 4.2.

Second, we work within the adiabatic approximation, which consists in building PES by selecting the q.p. vacuum that minimizes the energy at each point. When the number N_col of collective variables of the PES is small, this approximation may lead to discontinuities Dubray and Regnier [75]. These discontinuities correspond to missing regions of the collective space and are related to the inadequate choice of collective variables. Since we want to obtain a continuous description of the fission path, we must give our neural network the ability to choose the relevant degrees of freedom. This could be achieved with autoencoders. Autoencoders are a type of neural networks analogous to the zip/unzip programs. They are widely used and greatly successful for representation learning—the field of Machine Learning that attempts to find a more meaningful representation of complex data Baldi [86]; Burda et al. [87]; Chen et al. [88]; Gong et al. [89]; Bengio et al. [90]; Zhang et al. [91]; Yu et al. [92] and can be viewed as a non-linear generalization of principal component analysis (PCA). As illustrated in Figure 5, an autoencoder Ξ typically consists of two components. The encoder E (T^(φ)) encodes complex and/or high-dimensional data T^(φ) to a typically lower-dimensional representation v^(φ). The latent space is the set of all possible such representations. The decoder D (v^(φ)) takes the low-dimensional representation of the encoder and uncompresses it into a tensor T^(ϕ) as close as possible to T^(φ). Such architectures are trained using a loss function that quantifies the discrepancy between the initial input and the reconstructed output,

L_{rec.} (T^{(ϕ)}) = d (T^{(φ)}, T^{(ϕ)}), (27)

where d (., .) defines the metric in the space of input data. We discuss the choice of a proper loss in more details in Section 4.1.3.

FIGURE 5

FIGURE 5. An autoencoder is the association of two blocks. The first one, on the left, compresses the input data into a lower-dimensional representation, or code, in the latent space. The second one, on the right, decompresses the code back into the original input.

Third, the evolution of the q.p. wavefunctions as a function of the collective variables q may lead to specific values q_i where the q.p. solutions are degenerate. These degeneracies form a sub-manifold of dimension at most D − 2, where D is the number of collective d.o.f.s. As a consequence, they cannot appear in one-dimensional PES: q.p. solutions with the same symmetry “cannot cross” (the famous no-crossing rule von Neuman and Wigner [93]). In multi-dimensional spaces, this rule does not hold anymore: when following a closed-loop trajectory around such a degeneracy, the sign of the q.p. wavefunctions is flipped, in a similar manner that we flip side when winding around a Moebius strip Teller [94]; Longuet-Higgins et al. [95]; Longuet-Higgins [96]. In the field of quantum chemistry, such degeneracies are referred to as diabolical points or conical intersections Domcke et al. [97]; Larson et al. [98]. The practical consequence of conical intersections for deep learning is that the manifold of all the q.p. wavefunctions cannot be embedded in a D-dimensional latent space. Such singularities can be treated in two ways: (i) by using a latent space of higher dimension than needed or (ii) by implementing specific neural network layers capable of handling such cases. For now, we do not consider these situations.

4.1.3 Loss functions and metrics

As already discussed in Section 4.1.2, autoencoders are trained through the minimization of a loss function that contains a reconstruction term of the form (27). As suggested by its name, this term ensures that the autoencoder can correctly reconstruct the input tensor T^(φ) from its compressed representation. It depends on a definition for the metric d (., .) used to compare the different elements of the input space. Since our canonical wavefunctions φ_μ are expanded on the axial harmonic oscillator basis of Section 2.6, they are discretized on the Gauss quadrature mesh without any loss of information. Therefore, both the input and output tensors of our surrogate model are a rank-3 tensor T^(ϕ) ≡ T^(φ) ≡ T_ijk of dimensions N_⊥× N_z × 2, where i is the index of the Gauss-Laguerre node along the r-axis, j the index of the Gauss-Hermite node along the z-axis, and k the index of the spin component.

A standard loss used with autoencoders is the mean-square-error (MSE). Because of the structure of our input data, see Section 4.1.1, the MSE loss reads in our case

d_{MSE} (T^{(φ)}, T^{(ϕ)}) = \frac{1}{N_{⊥} \times N_{z} \times 2} \sum_{i = 0}^{N_{⊥} - 1} \sum_{j = 0}^{N_{z} - 1} \sum_{i = 0}^{1} {(T_{ijk}^{(ϕ)} - T_{ijk}^{(φ)})}^{2} . (28)

The MSE is very general and can be thought of, quite simply, as the mean squared “distance” between the initial and reconstructed data. However, this generality implies that it does not contain any information about the properties of the data one tries to reconstruct.

Indeed, we can define a metric that is better suited to the physics we aim to describe. Let us recall that our goal is to compute potential energy surfaces that can be used, e.g., for fission simulations. These PES are nothing but generator states for the (TD)GCM mentioned in Section 2.4. The GCM relies on the norm kernel $N (q, q^{'})$ and the Hamiltonian kernel $H (q, q^{'})$ , which are defined as.

N (q, q^{'}) = ⟨ Φ (q) | Φ (q^{'}) ⟩, (29)

H (q, q^{'}) = ⟨ Φ (q) | \hat{H} | Φ (q^{'}) ⟩ . (30)

Since the norm kernel involves the standard inner product in the many-body space, it represents the topology of that space. Therefore, it should be advantageous to use for the loss a metric induced by the same inner product that defines the norm kernel.

In our case, we want to build an AE where the encoder v^(φ) = E (T^(φ)) compresses the single-particle, canonical orbitals ${φ_{μ}}_{μ}$ associated with |Φ⟩ into a low-dimensional vector v^(φ) and where the decoder T^(ϕ) = D (v^(φ)) is used to compute the set of reconstructed canonical orbitals ${ϕ_{μ}}_{μ}$ . Most importantly, this reconstruction should be such that the reconstructed many-body state |Ψ⟩ is as close as possible to the original state |Φ⟩. In other words, we need to use a loss that depends on the norm overlap (between many-body states) but since we work with single-particle wavefunctions, we must have a way to relate the norm overlap to these s.p. wavefunctions. This can be achieved with Equations 5.4 and (5.6) of Haider and Gogny [99], which relate the inner product ⟨Φ|Ψ⟩ in the many-body space with the inner product (overlap) ⟨φ_μ|ϕ_ν⟩ between the related canonical orbitals φ_μ and ϕ_ν,

⟨ φ_{μ} | ϕ_{ν} ⟩ \equiv τ_{μ ν}^{(φ ϕ)} = \{{\hat{a}}_{μ}^{(φ) †}, {\hat{a}}_{ν}^{ϕ}\} = \sum_{σ} \int d^{3} r φ_{μ}^{*} (r, σ) ϕ_{ν} (r, σ) (31)

and with the occupation amplitudes. However, it assumes that the canonical wavefunctions of each many-body state are orthogonal. This property is not guaranteed for our reconstructed canonical wavefunctions. In fact, because of this lack of orthogonality, the reconstructed wavefunctions cannot be interpreted as representing the canonical basis of the Bloch-Messiah-Zumino decomposition of the quasiparticle vacuum and the Haider and Gogny formula cannot be applied ‘as is’. However, we show in Supplementary Appendix S1 that it is possible to find a set of transformations of the reconstructed wavefunctions that allows us to define such as genuine canonical basis.

We want the loss function to depend only on the error associated with the reconstructed orbital ϕ_μ. Therefore, we should in principle consider the many-body state $| {\tilde{Φ}}_{μ} 〉$ where only the orbital φ_μ is substituted by its reconstruction ϕ_μ. We can then compute the inner product between |Φ⟩ and $| {\tilde{Φ}}_{μ} 〉$ using Supplementary Appendix S1 and deduce any induced metric f

d_{exact}^{f} (T^{(φ)}, T^{(ϕ)}) = f (\frac{⟨ Φ | {\tilde{Φ}}_{μ} ⟩}{\sqrt{⟨ {\tilde{Φ}}_{μ} | {\tilde{Φ}}_{μ} ⟩}}) . (32)

However, computing this metric is too computationally involved to be carried out explicitly for each training data at each epoch. Instead, we keep this metric for comparing a posteriori the performance of our model.

Instead of explicitly determining $d_{exact}^{f} (T^{(φ)}, T^{(ϕ)})$ , we focus on reproducing canonical orbitals using the metrics of the one-body Hilbert space. In practice we considered the distance noted $d_{◦}^{(0)}$ that is induced by the inner product between normalized functions in the one-body Hilbert space, that is,

d_{◦}^{(0)} (φ, ϕ) \equiv (\frac{〈 φ |}{\sqrt{⟨ φ | φ ⟩}} - \frac{〈 ϕ |}{\sqrt{⟨ ϕ | ϕ ⟩}}) (\frac{| φ 〉}{\sqrt{⟨ φ | φ ⟩}} - \frac{| ϕ 〉}{\sqrt{⟨ ϕ | ϕ ⟩}}), (33)

which is nothing but

d_{◦}^{(0)} (φ, ϕ) = \sum_{σ} \int d^{3} r {|φ (r, σ) - ϕ (r, σ)|}^{2}, (34)

where the φ(r, σ) and ϕ(r, σ) have been normalized. Since all wavefunctions are discretized on the Gauss quadrature mesh, this distance reads

d_{◦}^{(0)} (φ, ϕ) = \sum_{n_{⊥} n_{z} n_{σ}} W_{n_{⊥} n_{z}} {|T_{n_{⊥} n_{z} n_{σ}}^{(φ)} - T_{n_{⊥} n_{z} n_{σ}}^{(ϕ)}|}^{2}, (35)

where the weights W are given by

W_{n_{⊥} n_{z}} = \frac{w_{n_{⊥}}^{GL}}{2 b_{⊥}^{2}} \times 2 π \times \frac{w_{n_{z}}^{GH}}{b_{z}} . (36)

These weights, which depend on the indices n_⊥ and n_z in the summation, are the only difference between the squared distance loss (Eq. 35) and the MSE loss (Eq. 28). Although the distance (Eq. 35) is norm-invariant⁶, it still depends on the global phase of each orbital. We have explored other possible options for the loss based on norm- and phase-invariant distances; see Supplementary Appendix S2 for a list. However, we found in our tests that the distance $d_{◦}^{(0)}$ systematically outperformed the other ones and, for this reason, only show results obtained with this one.

4.1.4 Physics-informed autoencoder

From a mathematical point of view, deep neural networks can be thought of as a series of compositions of functions. Each composition operation defines a new layer in the network. Networks are most often built with alternating linear and nonlinear layers. The linear part is a simple matrix multiplication. Typical examples of nonlinear layers include sigmoid, tanh, Rectified Linear Unit (ReLU) functions. In addition to these linear and nonlinear layers, there could be miscellaneous manipulations of the model for more specific purposes, such as adding batch normalization layers Ioffe and Szegedy [100], applying dropout Srivastava et al. [101] to some linear layers, or skip connection He et al. [102] between layers.

Our data is a smooth function defined over a N_⊥× N_z = 60 × 40 grid and is analogous to a small picture. For this reason, we chose a 2D convolutional network architecture. Convolutional layers are popular for image analysis, because they incorporate the two-dimensional pixel arrangement in the construction of the weights of the network. These two-dimensional weights, or filters, capture local shapes and can model the dependent structure in nearby pixels of image data. Given a 2D m × m input array, a 2D filter F is a n × n matrix, usually with n ≪ m. If we note Iⁿ the space of n × n integer-valued matrices, then the convolutional layer $C$ is an operation of the $C : (I^{n}, I^{n}) \to N$ that is applied to all pairs (F, C) where C is any n × n chunk of the input image; see Figure 6 for an example. This way, the resulting output summarizes the strength and location of that particular filter shape within the image. As the model gets trained, the filter parameters are fitted to a shape that is learned to be important in the training data. Convolutional neural network are very effective for image analysis and are currently widely used Krizhevsky et al. [103]; Zeiler and Fergus [104]; Sermanet et al. [105]; Szegedy et al. [106].

FIGURE 6

FIGURE 6. Schematic example of a convolutional layer. For any 2 × 2 chunk C of the input image on the left, this convolutional layer performs the point-wise multiplication of C with the filter F followed by the addition of all elements. This compresses the initial chunk of the image into a single integer.

In this work, we used the Resnet 18 model as our encoder and constructed the decoder from a transposed convolution architecture of the Resnet 18. The Resnet 18 model was first introduced by He et al. as a convolutional neural network for image analysis He et al. [102]. It was proposed as a solution to the degradation of performance as the network depth increases. Resnet branches an identity-function addition layer to sub-blocks (some sequential layers of composition) of a given network. While a typical neural network sub-block input and output could be represented by x and f(x), respectively, a Resnet sub-block would output f(x) + x for the same input x, as in Figure 2 of He et al. [102]. This architecture is called ‘skip connection’ and was shown to be helpful for tackling multiple challenges in training deep neural network such as vanishing gradient problem and complex loss function Li et al. [107]; He et al. [102]. Since then, the Resnet architecture has been widely successful, often being used as a baseline for exploring new architectures Zhang et al. [108]; Radosavovic et al. [109] or as the central model for many analyses Cubuk et al. [110]; Yun et al. [111]; Zhang et al. [112]. In a few cases, it was also combined with autoencoders for feature learning from high-dimensional data Wickramasinghe et al. [113].

For the decoder part, we designed a near-mirror image of the encoder using transposed convolution. Transposed convolution is essentially the opposite operation to convolution in terms of input and output dimensions. Here the meaning of transpose refers to the form of the filter matrix when the convolution layer is represented by a 1D vector input obtained from linearizing the 2D input. Note that the mirror-located filters in the decoder are independent parameters and not the actual transposed filter matrix of the encoder. Such a construction ensures symmetrical encoder and decoder models, making the decoder model close to the inverse shape of the encoder model. Figure 7 illustrates the operation: one input value is multiplied by the entire kernel (filter) and is added to the output matrix at its corresponding location. The corresponding output location for each colored input number are color-coded and show how the addition is done.

FIGURE 7

FIGURE 7. Schematic illustration of the 2D-transposed convolution. Each input value, e.g., 55, 57, etc., is multiplied by the entire kernel resulting in a 3 × 3 matrix. These matrices are then added to one another in a sliding and overlapping way.

The first and the last layer of the Resnet architecture are mostly for resizing and were minimally modified from the original Resnet 18 model since the size of our input data is significantly smaller than typical image sizes used for Resnet image classification analyses. We also modified the number of input channels of the first layer of the encoder to be 2 (for each of the spin components of the nuclear wavefunction) instead of the usual number 3 (for the RGB colors of colored images) or 1 (for black-and-white images). The spin components are closely related to each other with covariance structure, similar to how colors interact within an image. Therefore, we treat a pair of spin components as a single sample and treat each component as an input channel. The same applies to the output channel of the decoder.

The full network is represented schematically in Figure 8. Parametric Rectified Linear Unit, or PRELU, layers were added to impose nonlinearity in the model He et al. [114]. PRELU layers are controlled by a single hyperparameter that is trained with the data. Batch normalization is a standardizing layer that is applied to each batch by computing its mean and standard deviation. It is known to accelerate training by helping with optimization steps Ioffe and Szegedy [100]. The average pooling layer (bottom left) averages each local batch of the input and produces a downsized output. The upsampling layer (top right) upsamples the input using a bilinear interpolation.

FIGURE 8

FIGURE 8. Representation of our modified Resnet 18 architecture for the encoder (A) and the decoder (B). Large numbers on the left of each side label the different layers. Numbers such as 64, 128, etc. refer to the size of the filer; see text for a discussion of some of the main layers.

4.2 Training

As mentioned in Section 4.1.4, the loss is the discrepancy between the input of the encoder and the output of the decoder. The minimization of the loss with respect to all the model parameters w, such as the filter parameters, is the training process. We used the standard back-propagation algorithm to efficiently compute the gradient of the loss function with respect to the model parameters. The gradient computation is done with the chain rule, iterating from the last layer in the backward direction. We combined this with the mini-batch gradient descent algorithm: ideally, one would need the entire dataset to estimate the gradient at the current model parameter value. However, with large datasets, this becomes computationally inefficient. Instead we use a random subset of the entire data, called mini-batch, to approximate the gradient, and expedite the convergence of the optimization. For each mini-batch, we update each parameter w by taking small steps of gradient descent, $w_{t + 1} = w_{t} - α \frac{\partial L}{\partial w_{t}}$ . At step t, or at tth mini-batch, the average loss $L$ and the gradient with respect to current model parameter w_t are computed. Then α-sized gradient descent step is taken to update the model parameters. Instead of using the current gradient for the update, one can use a weighted average of past gradients. We employed the well-known Adam algorithm, which uses the exponential moving average of current and past gradients Kingma and Ba [115].

Iterating over the entire dataset once, using multiple mini-batches, is called an epoch. Typically a deep neural network needs hundreds to thousands of epochs for the algorithm to converge. Parameters such as the batch size or learning rate, the parameters of the optimizer itself (Adam’s or other), and the number of epochs are hyper-parameters that must be tuned for model fitting. For our training, we used the default initialization method in PyTorch for the model parameters. The linear layers were initialized with a random uniform distribution over [ − 1/k, 1/k], where k is the size of the weight. For example, if there are 2 input channels and 3 × 3 convolution filters are used, k = 2 × 3 × 3. PRELU layers were initialized with their default PyTorch value of 0.25. We proceeded with mini-batches of size 32 with the default β₁ = 0.9, β₂ = 0.999 and ϵ = 10^–8: all these numbers refer to the PyTorch implementation of the Adam’s optimizer. For α, we used 0.001 as starting value and used a learning rate scheduler, which reduces the α value by a factor of 0.5 when there is no improvement in the loss for 15 epochs. After careful observation of the loss curves, we have estimated that at least 1,000 epochs are needed to achieve convergence.

To mitigate the problem of the global phase invariance of the canonical wavefunctions discussed in Section 4.1.2, we doubled the size of the dataset: at each point q of the collective space (=the sample), we added to each canonical wavefunction φ_μ(r, σ) the same function with the opposite sign − φ_μ(r, σ). The resulting dataset was then first split into three components, training, validation and test datasets, which represent 70%, 15%, and 15% of the entire data respectively. Training data is used for minimizing the loss with respect to the model parameters as explained above. Then we choose the model at the epoch that performs the best with the validation dataset as our final model. Finally, the model performance is evaluated using the test data.

4.3 Results

In this section, we summarize some of the preliminary results we have obtained after training several variants of the AE. In Section 4.3.1, we give some details about the training data and the quality of the reconstructed wavefunctions. We discuss some possible tools to analyze the structure of the latent space in Section 4.3.2. In these two sections, we only present results obtained for latent spaces of dimension D = 20. In Section 4.3.3, we use the reconstructed wavefunctions to recalculate HFB observables with the code HFBTHO. We show the results of this physics validation for both D = 20 and D = 10.

4.3.1 Performance of the network

Figure 9 shows the initial potential energy surface in ⁹⁸Zr used in this work. Calculations were performed for the SkM* parametrization of the Skyrme potential with a surface-volume, density-dependent pairing force with $V_{0}^{(n)} = - 199.69$ MeV.fm⁻³, $V_{0}^{(p)} = - 223.59$ MeV.fm⁻³ and an “infinite” pairing cutoff. The basis was identical for all points with oscillator length b₀ = 1.971 fm, deformation β₀ = 0.3 and number of shells N_basis = 20. The PES contains 548 HFB calculations with constraints on the axial quadrupole, $q_{20} = ⟨ {\hat{Q}}_{20} ⟩$ , and axial octupole moment, $q_{30} = ⟨ {\hat{Q}}_{30} ⟩$ . The mesh was: − 12.5 b ≤ q₂₀ ≤ 25.0 b with steps δq₂₀ = 1 b and 0.0 b3/2 ≤ q₃₀ ≤ 3.0 b^3/2 with step δq₃₀ = 0.125 b^3/2. The black dots in Figure 9 indicate the location of the converged solutions. For each solution, the n_p = 60 highest-occupation proton and n_n = 87 highest-occupation neutron canonical wavefunctions were used as training data for the network⁷.

FIGURE 9

FIGURE 9. Potential energy surface of ⁹⁸Zr in the (q₂₀, q₃₀) plane. Converged HFBTHO solutions are represented by black dots. Energies given by the color bar are in MeV relatively to the ground state.

For each of the losses discussed in Supplementary Appendix S2, we trained the AE with the slightly modified Resnet 18 architecture described in Section 4.1.4. It is important to keep in mind that the value of these losses should not be compared with one another. The only rigorous method to compare the performance of both networks would be to compute the many-body norm overlap across all the points in each case—or to perform a posteriori physics validation with the reconstructed data, as will be shown in Section 4.3.3.

To give an idea of the quality of the AE, we show in Figure 10 one example of the original and reconstructed canonical wavefunctions. Specifically, we consider the configuration (q₂₀, q₃₀) = ( − 7.0 b, − 0.25 b^3/2) in the collective space and look at the neutron wavefunction with occupation number $v_{μ}^{2} = 0.945255$ , which is located near the Fermi surface. This example was obtained for an AE trained with the $d_{o}^{(0)}$ loss and compressed to D = 20. The figure shows, in the left panel, the logarithm of the squared norm of the original wavefunction across the quadrature mesh, $\ln | φ_{μ} |^{2} \equiv \ln | T_{n_{⊥} n_{z} n_{σ}}^{(φ_{μ})} |^{2}$ , in the middle panel, the same quantity for the reconstructed wavefunction, and in the right panel the logarithm of the absolute value of the difference between the two. On this example, the AE can reconstruct the wavefunction with about 3% error.

FIGURE 10

FIGURE 10. (A): Contour plot of the logarithm of the squared norm of the neutron canonical wavefunction with occupation number $v_{μ}^{2} = 0.945255$ (without the exponential factor). Middle: Same for the reconstructed wavefunction. (B): Logarithm of the difference between the squared norm of the original and reconstructed wavefunctions.

4.3.2 Structure of the latent space

One of the advantages of AEs is the existence of a low-dimensional representation of the data. In principle, any visible structure in this latent space would be the signal that the network has properly learned, or encoded, some dominant features of the dataset. Here, our latent space has dimension D = 20. This means that every canonical wavefunction, which is originally a matrix of size n = N_⊥× N_z, is encoded into a single vector of size D. From a mathematical point of view, the encoder is thus a function

\begin{matrix} \hat{E} : R^{n} & \to R^{D} \\ φ & ⟼ v = \hat{E} (φ) \end{matrix} (37)

Let us consider some (scalar) quantity P associated with the many-body state |Φ(q)⟩ at point q. Such a quantity could be an actual observable such as the total energy but it could also be an auxiliary object such as the expectation value of the multipole moment operators. In fact, P could also be a quantity associated with the individual degrees of freedom at point q, for example the q.p. energies. In general terms, we can think of P as the output value of the function

\begin{matrix} \hat{P} : R^{n} & \to R \\ φ & ⟼ P = \hat{P} (φ) \end{matrix} (38)

For example, if P represents the s.p. canonical energies, then the function $\hat{P}$ is the one that associates with each canonical wavefunction its s.p. energy. Therefore, for every canonical wavefunction, there is a different value of P. Conversely, if $P = ⟨ {\hat{Q}}_{20} ⟩$ , there is a single value for all the canonical wavefunctions at point q. Since there is a vector in the latent space for each canonical wavefunction, and there is also a value for the quantity P for each such function, we can then define the new function $\hat{P}$ acting on vectors of the latent space and defined as

\begin{matrix} \hat{P} : R^{D} & \to R \\ v & ⟼ P = \hat{P} (v) \end{matrix} (39)

and it is straightforward to see that: $\hat{P} = \hat{P} ◦ \hat{E}$ . Our goal is now to try to analyze where various quantities P are located in the latent space and whether one can identify some specific features of these locations.

Since we have a total of n_t = 147 wavefunctions for each of the N_p = 552 points in the collective space, the encoder yields a set of n_t × N_p vectors of dimension D. This means that, in the latent space, every quantity P above is also represented by a cloud of n_t × N_p such vectors. This is obviously impossible to visualize. For this reason, we introduce the following analysis. First, we perform a linear regression in the D-dimensional latent space of a few select quantities of interest P, that is, we write

P = α \cdot v + b, (40)

where α is a D-dimensional vector, v is the vector associated with the quantity P in the latent space and $b \in R$ . The unit vector u = α/‖α‖ can be interpreted as representing the leading direction in the latent space. The quantity u = u ⋅v is a scalar which we obtain easily from the result of the linear regression. We can thus plot the function P: u↦P(u). Examples of such functions are shown in Figure 11. Each point in the figures represents the value $P = \hat{P} (v)$ of some characteristic quantity at point u = u ⋅v.

FIGURE 11

FIGURE 11. One-dimensional projections of the D-dimensional linear fit for the total energy E_HFB (A), the projection Ω of the canonical state (B) and the neutron Fermi energy λ_n (C). Each point represents one of these quantities for a canonical wavefunction μ and a point q in the collective space.

The three cases shown in Figure 11 illustrate that the network has not always identified relevant features. The case of Ω, middle panel, is the cleanest: there is a clear slope as a function of u: if one sets u = 1, for example, then only values of 7/2 ≤ Ω ≤ 15/2 are possible. Conversely, the AE has not really discovered any feature in the neutron Fermi energy (right panel): for any given value of u, there is a large range of possible values of Fermi energies. In the case of the total energy (left panel), the situation is somewhat intermediate: there is a faint slope suggesting a linear dependency of the energy as a function of u.

4.3.3 Physics validation

The results presented in the Section 4.3.1 suggest the AE has the ability to reproduce the canonical wavefunctions with good precision. To test this hypothesis, we recalculated the HFB solution at all the training, validation and testing points by substituting in the HFBTHO binary files the original canonical wavefunctions by the ones reconstructed by the AE. Recall that only the lowest n_t wavefunctions with the largest occupation were encoded in the AE (n_n = 87 for neutrons and n_p = 60 for protons); the remaining ones were unchanged. In practice, their occupation is so small that their contribution to nuclear observables is very small ( $<$ 10 keV for the total energy, for example).

Figure 12 shows the error on the potential energy across the (q₂₀, q₃₀) collective space obtained with the reconstructed canonical wavefunctions for latent spaces of dimension D = 20 (left) and D = 10 (right). In each case, we only show results obtained when using the $d_{o}^{(0)}$ loss, which gives the best results. The black crosses denote the location of all the original points; the white circles show the location of the validation points. Overall, the results are very encouraging. In both cases, most of the error is concentrated near regions of the PES where there are discontinuities (hence, the lack of converged solutions). Everywhere else, the error is small and mostly randomly distributed across the PES, that is, it is not systematically larger at the validation points. As expected, the quality of the reconstruction is a little worse when D = 10: one can notice about a dozen points for which the error is significantly larger, in absolute value. Examples include (q₂₀, q₃₀) = ( − 5.0 b, 1.125 b^3/2) or (q₂₀, q₃₀) = ( + 8.0 b, 2.5 b^3/2) in the validation set, and (q₂₀, q₃₀) = (0.0 b, 1.75 b^3/2) or the region around 1 b ≤ q₂₀ ≤ 4 b and 1.5 b^3/2 ≤ q₃₀ ≤ 2.25 b^3/2 in the training set. These may suggest that for D = 10, the loss may not have fully converged yet. Because of the existence of discontinuities near these points, this could also be the manifestation that our continuous AE cannot build a continuous representation of the data everywhere. However, the fact that an increase of the compression by a factor 2, from D = 20 to D = 10, does not substantially degrade the performance of the AE is very promising.

FIGURE 12

FIGURE 12. (A): Potential energy surface in the (q₂₀, q₃₀) plane for ⁹⁸Zr obtained after replacing the first n_n = 87 and n_p = 60 highest-occupation canonical wavefunctions by their values reconstructed by the AE for a latent space of dimension D = 20. The black dots show the location of the training points only, the white circles the location of the validation points. (B): same figure for a latent space of dimension D = 10. For both figures, energies are given in MeV.

The two histograms in Figure 13 give another measure of the quality of the AE. The histogram in the left shows the distribution of the error on the HFB energy for two sizes of the latent space, D = 20 and D = 10. For the D = 20 case, the mean error is $\bar{Δ E} = 104$ keV and the standard deviation σ_E = 89 keV. These numbers increase a little bit in the case of D = 10: $\bar{Δ E} = 122$ keV and σ_E = 153 keV. To better estimate the quality of the AE, we applied the same Gaussian process technique as described in Section 3 to the dataset in ⁹⁸Zr. Since GPs are interpolators and reproduce the training data, we can only test them across the validation set, which, in the case of ⁹⁸Zr contains only 110 configurations. After computing the energy with the HFB potentials obtained from the GP across this set, we obtain: ${\bar{Δ E}}^{(G P)} = - 32$ keV and $σ_{E}^{(G P)} = 225$ keV. This is comparable to what the AE predicts. As mentioned before, it is important to bear in mind that, for the AE, the points with the higher error ΔE < − 150 keV or ΔE > 250 keV do not correspond to testing points only.

FIGURE 13

FIGURE 13. (A): Histogram of the difference in total HFB energy between the original HFBTHO calculation and the result obtained by computing the energy in the canonical basis with the reconstructed wavefunctions (see text for details). Calculations were performed both for a D = 20 and D = 10 latent space. (B): Similar histogram for the expectation value of the axial quadrupole moment.

The histogram on the right shows the distribution of the error for q₂₀, in units of b. Here, the mean and standard deviation of the error are: $\bar{Δ q_{20}} = 6.8$ mb and σ_q = 33.0 mb for D = 20, and $\bar{Δ q_{20}} = - 7.4$ mb and σ_q = 48.2 mb for D = 10. For comparison, the results obtained (across the validation set only) with the GP are: $\bar{Δ q_{20}} = - 12.7$ mb and σ_q = 68.3 mb. Once again, the AE performs just as well, if not slightly better, than the GP fit.

5 Conclusion

Nuclear density functional methods are amenable to large-scale calculations of nuclear properties across the entire chart of isotopes relevant for, e.g., nuclear astrophysics simulations or uncertainty quantification. However, such calculations remain computationally expensive and fraught with formal and practical issues associated with self-consistency or reduced collective spaces. In this article, we have analyzed two different techniques to build fast, efficient and accurate surrogate models, or emulators, or DFT objects.

We first showed that Gaussian processes could reproduce reasonably well the values of the mean-field and pairing-field potentials of the HFB theory across a large two-dimensional potential energy surface. The absolute error on the total energy was within ±100 keV and the relative errors on the collective inertia tensor smaller than 5%. However, GPs require the training data to be “smoothly-varying”, i.e., they should not include phenomena such as nuclear scission or, more generally, discontinuities in the PES. It is well known that GPs are not reliable for extrapolation: such a technique can thus be very practical to densify (=interpolate) an existing potential energy surface but must not be applied outside its training range.

Our implementation of standard versions of GPs is fast and simple to use, but it misses many of the correlations that exist between the values of the HFB potentials on the quadrature grid: (i) all components of the full Skyrme mean field (central, spin-orbit, etc.) are in principle related to one another through their common origin in the non-local one-body density matrix; (ii) the correlations between the value of any given potential at point (r, z) and at point (r′, z′) were not taken into account; (iii) the correlations between the variations of the mean fields at different deformations was also neglected. Incorporating all these effects may considerably increase the complexity of the emulator. In such a case, it is more natural to directly use deep-learning techniques. In this work, we reported the first application of autoencoders to emulate the canonical wavefunctions of the HFB theory. Autoencoders are a form of deep neural network that compresses the input data, here the canonical wavefunctions, into a small-dimensional space called the latent representation. The encoder is trained simultaneously with a decoder by enforcing that the training data is left invariant after compression followed by decompression. In practice, the measure of such “invariance” is set by what is called the loss of the network. We discussed possible forms of the loss that are best adapted to learning quantum-mechanical wavefunctions of many-body systems such as nuclei. We showed that such an AE could successfully reduce the data into a space of dimension D = 10 while keeping the total error on the energy lower than ΔE = 150 keV (on average). The analysis of the latent space revealed well-identified structures in a few cases, which suggests the network can learn some of the physics underlying the data. This exploratory study suggests that AE could serve as reliable canonical wavefunctions generators. The next step will involve learning a full sequence of such wavefunctions, i.e., an ordered list, in order to emulate the full HFB many-body state.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

MV led the study of autoencoders for canonical wavefunctions: design of the overall architecture of the autoencoders, development of the software stack and analysis of the results. NS led the study of Gaussian process for mean-field potentials—training and validation runs, code development and HFBTHO calculations—and supervised the whole project. IK implemented, tested and trained different architectures of autoencoders and helped with the analysis of the results. PM developed an HFBTHO module to use canonical wavefunctions in HFB calculations. KQ provided technical expertise about Gaussian processes and supervised MN, who implemented and fitted Gaussian processes to mean-field potentials. DR and RL provided technical consulting on deep neural network and the architecture of autoencoders. All authors contributed to the writing of the manuscript, and read and approved the submitted version.

Funding

This work was performed under the auspices of the United States Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Computing support came from the Lawrence Livermore National Laboratory (LLNL) Institutional Computing Grand Challenge program.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fphy.2022.1028370/full#supplementary-material

Footnotes

¹The pairing contribution lumps together terms coming from nuclear forces, Coulomb forces and possibly rearrangement terms.

²Following Dobaczewski et al. [35,117], we employ the ‘russian’ convention where the pairing field is defined from the pairing density $\tilde{ρ} (r σ, r^{'} σ^{'})$ rather than the pairing tensor. The quantity $\tilde{h}$ is related to the more traditional form of the pairing field Δ through: $\tilde{h} (r σ, r^{'} σ^{'}) = - 2 σ^{'} Δ (r σ, r^{'} - σ^{'})$ .

³This statement is obviously not true when solving the HFB equation directly in coordinate space. In the case of the local density discussed here, the expression $ρ (r) = \sum_{σ} \sum_{μ} V_{μ} (r, σ) V_{μ}^{*} (r, σ)$ is just as computationally expensive as the canonical basis expression $ρ (r) = \sum_{σ} \sum_{μ} v_{μ}^{2} | φ_{μ} (r, σ) |^{2}$ .

⁴Note that B₃₂ vanishes for axially-symmetric shapes. As a result, the relative error can be artificially large for values of q₃₀ ≈ 0 b^3/2.

⁵This particular scission configuration corresponds to what is called cluster radioactivity Warda and Robledo [118]; Warda et al. [119]; Matheson et al. [120]. The heavy fragment is much larger than the light one. Here, ⟨A_H⟩ = 205.6, ⟨A_L⟩ = 34.4.

⁶A distance d(u, v) is norm-invariant if for any positive real number α and β, we have d(αu, βv) = d(u, v).

⁷Since time-reversal symmetry is conserved, the Fermi energy is located around states with indices μ_p ≈ 20 and μ_n ≈ 29. Therefore, our choice implies that in our energy window, about 1/3 of all states are below the Fermi level and about 2/3 of them are above it.

References

1. Eschrig R. Fundamentals of density functional theory. Leipzig: Teubner (1996).

Google Scholar

2. Schunck N. IOP expanding physics. Bristol, UK: IOP Publishing (2019).Energy density functional methods for atomic nuclei.