ORIGINAL RESEARCH article

Front. Comput. Neurosci., 08 March 2021
This article is part of the Research Topic Brain-Inspired Computing: From Neuroscience to Neuromorphic Electronics Driving New Forms of Artificial Intelligence.

Hardware Design for Autonomous Bayesian Networks

Rafatul Faria1*, Jan Kaiser1, Kerem Y. Camsari2 and Supriyo Datta1

  • 1Department of Electrical and Computer Engineering, Purdue University, West Lafayette, IN, United States
  • 2Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, CA, United States

Directed acyclic graphs or Bayesian networks, which are popular in many AI-related sectors for probabilistic inference and causal reasoning, can be mapped to probabilistic circuits built out of probabilistic bits (p-bits), analogous to the binary stochastic neurons of stochastic artificial neural networks. To reproduce standard statistical results, individual p-bits must be updated not only sequentially but also in order from parent to child nodes, necessitating the use of sequencers in software implementations. In this article, we first use SPICE simulations to show that an autonomous hardware Bayesian network can operate correctly without any clocks or sequencers, but only if the individual p-bits are appropriately designed. We then present a simple behavioral model of the autonomous hardware illustrating the essential characteristics needed for correct sequencer-free operation. This model is also benchmarked against SPICE simulations and can be used to simulate large-scale networks. Our results could be useful in the design of hardware accelerators that use energy-efficient building blocks suited for low-level implementations of Bayesian networks. The autonomous, massively parallel operation of our proposed stochastic hardware is also biologically relevant, since neural dynamics in the brain are likewise stochastic and autonomous in nature.

1. Introduction

Bayesian networks (BN) or belief nets are probabilistic directed acyclic graphs (DAG) popular for reasoning under uncertainty and probabilistic inference in real-world applications such as medical diagnosis (Nikovski, 2000), genomic data analysis (Friedman et al., 2000; Jansen et al., 2003; Zou and Conzen, 2004), forecasting (Sun et al., 2006; Ticknor, 2013), robotics (Premebida et al., 2017), image classification (Arias et al., 2016; Park, 2016), neuroscience (Bielza and Larrañaga, 2014), and so on. BNs are composed of probabilistic nodes and edges from parent to child nodes and are defined in terms of conditional probability tables (CPT) that describe how each child node is influenced by its parent nodes (Heckerman and Breese, 1996; Koller and Friedman, 2009; Pearl, 2014; Russell and Norvig, 2016). The CPTs can be obtained from expert knowledge and/or machine learned from data (Darwiche, 2009). Each node and edge in a BN has a meaning, representing a specific probabilistic event or conditional dependency, which makes BNs easier to interpret (Correa et al., 2009) than neural networks, whose hidden nodes do not necessarily have meaning. Unlike neural networks, where useful information is extracted only at the output nodes for prediction purposes, BNs are useful for both prediction and inference by looking not only at the output nodes but also at any other nodes of interest. Computing probabilities from a BN becomes intractable when the network gets deeper and more complicated, with child nodes having many parent nodes. This has inspired various hardware implementations of BNs for efficient inference (Rish et al., 2005; Chakrapani et al., 2007; Weijia et al., 2007; Jonas, 2014; Querlioz et al., 2015; Zermani et al., 2015; Behin-Aein et al., 2016; Friedman et al., 2016; Thakur et al., 2016; Tylman et al., 2016; Shim et al., 2017). In this article, we elucidate the design criteria for an autonomous (clockless) hardware implementation of BNs, unlike other implementations that typically use clocks.

Recently, a new type of hardware computing framework called probabilistic spin logic (PSL) has been proposed (Camsari et al., 2017a), based on a building block called the probabilistic bit (p-bit) that is analogous to the binary stochastic neurons (BSN) (Ackley et al., 1985; Neal, 1992) of the artificial neural network (ANN) literature. p-bits can be interconnected to solve a wide variety of problems such as optimization (Sutton et al., 2017; Borders et al., 2019), inference (Faria et al., 2018), an enhanced type of Boolean logic that is invertible (Camsari et al., 2017a; Faria et al., 2017; Pervaiz et al., 2017, 2018), quantum emulation (Camsari et al., 2019), and in situ learning from probability distributions (Kaiser et al., 2020).

Unlike conventional deterministic networks built out of stable, deterministic bits, stochastic or probabilistic networks are composed of p-bits (Figure 1A) that can be correlated by interconnecting them to construct p-circuits defined by two equations (Ackley et al., 1985; Neal, 1992; Camsari et al., 2017a): (1) a p-bit/BSN equation and (2) a weight logic/synapse equation. The output of a p-bit, mi, is related to its dimensionless input Ii by the equation:

$$ m_i(t+\tau_N) = \mathrm{sgn}\left[\,\mathrm{rand}(-1,+1) + \tanh I_i(t)\,\right] \tag{1a} $$

where rand(−1, +1) is a random number uniformly distributed between −1 and +1, and τN is the neuron evaluation time.


Figure 1. Clocked vs. autonomous p-circuit: (A) A probabilistic (p-)circuit is composed of p-bits interconnected by a weight logic (synapse) that computes the input Ii to the ith p-bit as a function of the outputs of the other p-bits. (B) p-bit design 1, based on a stochastic magnetic tunnel junction (s-MTJ) using a low barrier nanomagnet (LBM), with an NMOS transistor as the tunable component. (C) p-bit design 2, with the s-MTJ itself as the tunable component. Both designs have been used to build a p-circuit as shown in (A). (D) Two types of p-circuits are built: a directed or Bayesian network and a symmetrically connected Boltzmann network. These p-circuits are sequential (labeled SeqPSL), meaning that p-bits are updated sequentially, one at a time, using clock circuitry with a sequencer. For Boltzmann networks the update order does not matter and any random update order produces the correct probability distribution. For Bayesian networks, however, a specific parent-to-child update order is necessary to converge to the correct probability distribution dictated by Bayes' rule. (E) The same Bayesian and Boltzmann p-circuits are implemented on autonomous hardware built with p-bit designs 1 and 2 without any clocks or sequencers. Interestingly, for Bayesian networks design 2 fails to match the probabilities obtained from Bayes' rule, whereas design 1 works quite well as an autonomous Bayesian network. For every histogram in this figure, 10^6 samples have been collected.

The synapse generates the input Ii from a weighted sum of the states of other p-bits. In general, the synapse can be a linear or non-linear function, although a common form is the linear synapse described according to the equation:

$$ I_i(t+\tau_S) = I_0\left(h_i + \sum_j J_{ij}\, m_j(t)\right) \tag{1b} $$

where hi is the on-site bias, Jij is the weight of the coupling from the jth p-bit to the ith p-bit, I0 parameterizes the coupling strength between p-bits, and τS is the synapse evaluation time. Several hardware designs of p-bits based on low barrier nanomagnet (LBM) physics have been proposed and also experimentally demonstrated (Ostwal et al., 2018; Borders et al., 2019; Ostwal and Appenzeller, 2019; Camsari et al., 2020; Debashis, 2020). The thermal energy barrier of the LBM is of the order of a few kBT, instead of the 40–60 kBT used in memory technology to retain stability. Because of thermal noise, the magnetization of the LBM keeps fluctuating as a function of time with an average retention time τ ≈ τ0 exp(EB/kBT) (Brown, 1979), where τ0 is a material-dependent parameter called the attempt time, experimentally found to be in the range of a nanosecond or less, and EB is the thermal energy barrier (Lopez-Diaz et al., 2002; Pufall et al., 2004). The stochasticity of LBMs makes them naturally suitable for p-bit implementation.
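For concreteness, Equations (1a) and (1b) can be written down in a few lines of Python; the sketch below is a minimal software rendering of the two equations, not the hardware model, and the coupling matrix, bias values, and function names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def pbit_update(I_i):
    """Equation (1a): binary stochastic neuron output for a dimensionless input I_i."""
    return np.sign(rng.uniform(-1.0, 1.0) + np.tanh(I_i))

def synapse(m, J, h, I0=1.0):
    """Equation (1b): linear synapse computing the input vector I from the p-bit states m."""
    return I0 * (h + J @ m)

# Illustrative example: two p-bits coupled ferromagnetically (J12 = J21 = 1), no bias.
J = np.array([[0.0, 1.0],
              [1.0, 0.0]])
h = np.zeros(2)
m = np.sign(rng.uniform(-1.0, 1.0, size=2))   # random initial states (+1/-1)
for _ in range(5):
    for i in range(2):                        # sequential update, one p-bit at a time
        I = synapse(m, J, h)                  # refresh the inputs before every update
        m[i] = pbit_update(I[i])
print(m)
```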

Figure 1 shows two p-bit designs: Design 1 (Figure 1B) (Camsari et al., 2017b; Borders et al., 2019) and Design 2 (Figure 1C) (Camsari et al., 2017a; Ostwal and Appenzeller, 2019). Designs 1 and 2 are the fundamental building blocks of spin transfer torque (STT) and spin orbit torque (SOT) magnetoresistive random access memory (MRAM) technologies, respectively (Bhatti et al., 2017). Their technological relevance motivates us to explore their implementations as p-bits. Design 1 is very similar to the commercially available 1T/1MTJ (T: transistor, MTJ: magnetic tunnel junction) embedded MRAM device, with the free layer of the MTJ replaced by an in-plane magnetic anisotropy (IMA) or perpendicular magnetic anisotropy (PMA) LBM. Design 2 is similar to the basic building block of the SOT-MRAM device (Liu et al., 2012), where the thermal fluctuation of the free layer magnetization of the stochastic MTJ (s-MTJ) (Vodenicarevic et al., 2017, 2018; Mizrahi et al., 2018; Parks et al., 2018; Zink et al., 2018; Borders et al., 2019) is tuned by a spin current generated, through the SOT effect, in a heavy metal layer underneath the LBM. The in-plane polarized spin current from the SOT effect in the spin Hall effect (SHE) material in design 2 requires an in-plane LBM to tune its magnetization, although a perpendicular LBM with a tilted anisotropy axis has also been experimentally shown to work (Debashis et al., 2020). However, while design 2 requires spin current manipulation, design 1 does not rely on it as long as circular in-plane LBMs with continuous-valued magnetization states that are hard to pin are used. In-plane LBMs also provide faster fluctuations than perpendicular ones, leading to faster sampling speeds in the probabilistic hardware (Hassan et al., 2019; Kaiser et al., 2019).

The key distinguishing feature of the two p-bit designs (designs 1 and 2) is the time scales in implementing Equation (1a). From a hardware point of view, Equation (1a) has two components: a random number generator (RNG) (rand) and a tunable component (tanh). In design 1, the RNG is the s-MTJ utilizing an LBM and the tunable component is the NMOS transistor, thus having two different time scales in the equation. But in design 2, both the RNG and the tunable component are implemented by a single s-MTJ utilizing an LBM, thus having just one time scale in the equation. This difference in time scales in the two designs is shown in Figure 2. Note that although the two p-bit designs have the same RNG source, namely a fluctuating magnetization, it is the difference in their circuit configuration with or without the NMOS transistor in the MTJ branch that results in different time dynamics of the two designs.


Figure 2. Autonomous behavioral model for the p-bit: (A–D) The behavioral model for the autonomous hardware with design 1 (Figure 1B) is benchmarked against SPICE simulations of the actual device involving experimentally benchmarked modules. The behavioral model (labeled “PPSL”) shows good agreement with SPICE in terms of capturing the fluctuation dynamics (A), the steady-state sigmoidal response (B), and two different time responses: the autocorrelation time of the fluctuating output under zero input, labeled τcorr (C), which is proportional to the LBM retention time τN in the nanosecond range, and the step response time τstep (D), which is proportional to the transistor response time τT, a few picoseconds and much smaller than τN. The magnet parameters used in the simulations are given in section 2. (E–H) Similar benchmarking for p-bit design 2 (Figure 1C). In this case, τstep is proportional to τN. For (B,F), each point of the SPICE simulation was obtained by averaging mi over 1 μs. The step response time for (D,H) is obtained by averaging over 2,000 ensembles where Ii = −5 at t < 0 and Ii = 0 at t > 0.

In traditional software implementations, p-bits are updated sequentially for accurate operation, such that after each τS + τN time interval only one p-bit is updated (Hinton, 2007). This naturally implies the use of sequencers to ensure the sequential update of p-bits. The sequencer generates an Enable signal for each p-bit in the network and ensures that no two p-bits update simultaneously. The sequencer also makes sure that every p-bit is updated at least once per time step, where each time step corresponds to N·(τS + τN), N being the number of p-bits in the network (Roberts and Sahu, 1997; Pervaiz et al., 2018). For symmetrically connected networks (Jij = Jji) such as Boltzmann machines, the update order of p-bits does not matter and any random update order produces the standard probability distribution described by the equilibrium Boltzmann law, as long as p-bits are updated sequentially. But for directed acyclic networks (Jij ≠ 0, Jji = 0) or BNs to be consistent with the expected conditional probability distribution, p-bits need to be updated not only sequentially but also in a specific order, from the parent to the child nodes (Neal, 1992), similar to the concept of forward sampling in belief networks (Henrion, 1988; Guo and Hsu, 2002; Koller and Friedman, 2009). As long as this parent-to-child update order is maintained, the network converges to the correct probability distribution described by the probability chain rule or Bayes' rule. This effect of update order in a sequential p-circuit is shown for a three p-bit network in Figure 1D, and a minimal sketch of such a clocked, parent-to-child update is given below. In the Supplementary Material, an example shows how the CPT of a BN can be mapped to a p-circuit following Faria et al. (2018).
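The clocked, parent-to-child update described above can be sketched as follows for a minimal two-node parent→child network; the coupling and bias values are arbitrary illustrative choices, and the analytic comparison uses P(m = +1 | I) = (1 + tanh I)/2, which follows from Equation (1a).

```python
import numpy as np

rng = np.random.default_rng(1)
J_BA, h_A, h_B = 1.0, 0.5, 0.0                 # child B driven by parent A (illustrative values)
n_samples = 100_000

def sample_pbit(I):
    return np.sign(rng.uniform(-1, 1) + np.tanh(I))   # Equation (1a)

counts = {}
for _ in range(n_samples):
    A = int(sample_pbit(h_A))                  # parent updated first ...
    B = int(sample_pbit(h_B + J_BA * A))       # ... then child, using the fresh parent state
    counts[(A, B)] = counts.get((A, B), 0) + 1

# Analytic check from the chain rule: P(A, B) = P(A) P(B|A), with P(m = b | I) = (1 + b*tanh I)/2
for (A, B), c in sorted(counts.items()):
    analytic = 0.5 * (1 + A * np.tanh(h_A)) * 0.5 * (1 + B * np.tanh(h_B + J_BA * A))
    print((A, B), round(c / n_samples, 4), "analytic:", round(analytic, 4))
```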

Unlike the sequential p-circuits of the ANN literature, the distinguishing feature of our probabilistic hardware is that it is autonomous: each p-bit runs in parallel without any clocks or sequencers. This autonomous p-circuit (ApC) allows massive parallelism, potentially providing petaflips per second sampling speed (Sutton et al., 2020). The complete sequencer-free operation of our “autonomous” p-circuit is very different from the “asynchronous” operation of spiking neural networks (Merolla et al., 2014; Davies et al., 2018). Although p-bits fluctuate in parallel in an ApC, it is very unlikely that two p-bits will update at exactly the same time since random noise controls their dynamics. Therefore, persistent parallel updates are extremely unlikely and are not a concern. Note that even if p-bits update sequentially, each update has to be informed, such that when one p-bit updates it has received the up-to-date input Ii based on the latest states of the other p-bits mj that it is connected to. This informed update can be ensured as long as the synapse response time is much faster than the neuron time (τS ≪ τN), and this is a key design rule for an ApC; the structure of such an autonomous simulation loop is sketched below. If the input of a p-bit is based on old states of neighboring p-bits or on time-integrated synaptic inputs, the ApC operation degrades or fails completely. For τS ≪ τN, the ApC works properly for a Boltzmann network without any clock because no specific update order is required in that case. But it is not at all obvious whether an ApC would work for a BN, because a particular parent-to-child informed update order is required, as shown in Figure 1D. As such, it is not straightforward that a clockless autonomous circuit can naturally ensure this specific informed update order. In Figure 1E, we show that it is possible to design a hardware p-circuit that naturally ensures a parent-to-child informed update order in a BN without any clocks. In Figure 1E, two p-bit designs are evaluated for implementing both a Boltzmann network and a BN. We show that design 1 is suitable for both the Boltzmann network and the BN, but design 2 is suitable for Boltzmann networks only and does not work for BNs in general. The synapse in both types of p-circuits is implemented using a resistive crossbar architecture (Alibart et al., 2013; Camsari et al., 2017b), although there are also other types of hardware synapse implementations based on memristors (Li et al., 2018; Mahmoodi et al., 2019; Mansueto et al., 2019), magnetic tunnel junctions (Ostwal et al., 2019), spin orbit torque driven domain wall motion devices (Zand et al., 2018), phase change memory devices (Ambrogio et al., 2018), and so on. In all the simulations, τS is assumed to be negligible compared to the other time scales in the circuit dynamics.
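For reference, the structure of an autonomous simulation loop with such an informed (effectively instantaneous) synapse is sketched below; the per-p-bit update rule is deliberately left abstract, since it differs between designs 1 and 2 (section 2), and the function signature is our own illustrative choice rather than part of the hardware model.

```python
import numpy as np

def run_autonomous(advance_pbit, m, aux, J, h, I0, dt, n_steps):
    """Sketch of an autonomous (clockless) p-circuit simulation loop.

    advance_pbit(i, I_i, m_i, aux, dt) -> new output of p-bit i after a time step dt,
    following either the design-1 or the design-2 behavioral model of section 2.
    'aux' holds any per-p-bit internal variables (e.g., r_T and r_MTJ for design 1).
    """
    trace = np.empty((n_steps, m.size))
    for step in range(n_steps):
        I = I0 * (h + J @ m)                    # synapse recomputed every dt (tau_S << tau_N)
        for i in range(m.size):                 # every p-bit advances within the same dt
            m[i] = advance_pbit(i, I[i], m[i], aux, dt)
        trace[step] = m
    return trace                                # histogram the columns of interest afterward
```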

Our proposed probabilistic hardware for BNs has significant biological relevance for the following reasons: (1) The brain consists of neurons and synapses; the basic building block of our proposed hardware, the p-bit, mimics the neuron, and the interconnections among p-bits mimic the synapse function. (2) The components of the brain are stochastic or noisy by nature; the p-bits mimicking the neural dynamics in our proposed hardware are also stochastic. (3) The brain does not have a single clock for synchronous operation and can perform massively parallel processing (Strukov et al., 2019); our autonomous hardware likewise has no global clock or sequencer, and each p-bit fluctuates in parallel, allowing massively parallel operation.

Further, we have provided a behavioral model in section 2 for both designs 1 and 2, illustrating the essential characteristics needed for correct sequencer-free operation of BNs. Both models are benchmarked against state-of-the-art device/circuit models (SPICE) of the actual devices and can be used for the efficient simulation of large-scale autonomous networks.

2. Behavioral Model for Autonomous Hardware

In this section, we develop an autonomous behavioral model, which we call parallel probabilistic spin logic (PPSL), for design 1 (Figure 1B) and revisit the behavioral model for design 2 proposed by Sutton et al. (2020). The term “parallel” refers to all the p-bits fluctuating in parallel without any clocks or sequencers. These behavioral models are high-level representations of the p-circuit and p-bit behavior and connect Equations (1a) and (1b) to the hardware p-bit designs. Note that the parameters introduced in these models represent certain parts of the p-bit and synapse behavior, like the MTJ resistance (rMTJ) and the transistor resistance (rT), but are generally dimensionless apart from the time variables (e.g., τT, τN). The advantage of these models is that they are computationally much less expensive than full SPICE simulations while preserving the crucial device and system characteristics.

2.1. Autonomous Behavioral Model: Design 1

The autonomous circuit behavior of design 1 can be explained by slightly modifying the two equations (Equations 1a,b) stated in section 1. The fluctuating resistance of the low barrier nanomagnet-based MTJ is represented by a correlated random number rMTJ with values between −1 and +1 and an average dwell time of the fluctuation denoted by τN. The NMOS transistor tunable resistance is denoted by rT and the inverter is represented by a sgn function. Thus, the normalized output mi = VOUT, i/VDD of the ith p-bit can be expressed as:

$$ m_i(t+\Delta t) = \mathrm{sgn}\left[\, r_{T,i}(t+\Delta t) - r_{\mathrm{MTJ},i}(t+\Delta t) \,\right] \tag{2} $$

where Δt is the simulation time step, rT, i represents the NMOS transistor resistance, tunable by the normalized input Ii = VIN, i/V0 (compare Equation 1a), where V0 is a fitting parameter which is ≈50 mV for the chosen parameters and transistor technology (compare Figure 2B), and rMTJ, i is a correlated random number with an average retention time of τN. For design 1, the transistor is the tunable component that works in conjunction with the unbiased stochastic signal of the MTJ. rT, i as a function of the input Ii is approximated by a tanh function with a response time τT, modeled by the following equation:

$$ r_{T,i}(t+\Delta t) = r_{T,i}(t)\, e^{-\Delta t/\tau_T} + \left(1 - e^{-\Delta t/\tau_T}\right) \tanh\!\big(I_i(t+\Delta t)\big) \tag{3} $$

where it can be seen that the dimensionless quantity rT, i representing the transistor resistance is bounded by −1 ≤ rT, i ≤ 1 for all synaptic inputs. Figure 2B shows, using SPICE simulations, how Ii influences the average output mi and that the average response of the circuit is in good agreement with the tanh function used in Equation (3).

The synapse delay τS in computing the input Ii can be modeled by:

$$ I_i(t+\Delta t) = I_i(t)\, e^{-\Delta t/\tau_S} + \left(1 - e^{-\Delta t/\tau_S}\right) I_0\left(\sum_j J_{ij}\, m_j(t) + h_i\right) \tag{4} $$

For calculating rMTJ, i at time t + Δt, a new random number is picked according to the following equations:

$$ r_{\mathrm{flip},i}(t+\Delta t) = \mathrm{sgn}\left( e^{-\Delta t/\tau_N} - \mathrm{rand}[0,1] \right) \tag{5a} $$

where rand[0, 1] is a uniformly distributed random number between 0 and 1 and τN represents the average retention time of the fluctuating MTJ resistance. If rflip, i is −1, a new random rMTJ, i is chosen between −1 and +1; otherwise, the previous rMTJ, i(t) is kept in the next time step (t + Δt), which can be expressed as

$$ r_{\mathrm{MTJ},i}(t+\Delta t) = \frac{r_{\mathrm{flip},i}(t+\Delta t)+1}{2}\, r_{\mathrm{MTJ},i}(t) \;-\; \frac{r_{\mathrm{flip},i}(t+\Delta t)-1}{2}\, \mathrm{rand}[-1,1] \tag{5b} $$

where −1 ≤ rMTJ,i(t) ≤ 1.
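Equations (2)–(5b) for a single design-1 p-bit can be rendered compactly in Python as sketched below; the time-step and time-constant values follow the calibration quoted later in this section (Δt = 1 ps, τN = 150 ps, τT = 3 ps), and the final sanity check simply verifies that the time-averaged output approaches tanh(I), as in Figure 2B.

```python
import numpy as np

rng = np.random.default_rng(2)
dt, tau_N, tau_T = 1e-12, 150e-12, 3e-12      # calibration values quoted in this section (magnet 1)

def design1_step(I, r_T, r_MTJ):
    """One time step of the design-1 behavioral model, Equations (2)-(5b)."""
    # Equation (3): transistor "resistance" relaxes toward tanh(I) with time constant tau_T
    r_T = r_T * np.exp(-dt / tau_T) + (1 - np.exp(-dt / tau_T)) * np.tanh(I)
    # Equation (5a): decide whether the MTJ keeps its value (+1) or redraws it (-1)
    r_flip = np.sign(np.exp(-dt / tau_N) - rng.uniform(0, 1))
    # Equation (5b): keep r_MTJ if r_flip = +1, otherwise pick a new value uniformly in [-1, 1]
    r_MTJ = (r_flip + 1) / 2 * r_MTJ - (r_flip - 1) / 2 * rng.uniform(-1, 1)
    # Equation (2): inverter output
    return np.sign(r_T - r_MTJ), r_T, r_MTJ

# Sanity check: the time-averaged output should approach tanh(I) (compare Figure 2B)
I, r_T, r_MTJ, acc, n = 1.0, 0.0, rng.uniform(-1, 1), 0.0, 500_000
for _ in range(n):
    m, r_T, r_MTJ = design1_step(I, r_T, r_MTJ)
    acc += m
print(acc / n, "vs tanh(I) =", np.tanh(I))
```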

The charge current flowing through the MTJ branch of p-bit design 1 can get polarized by the fixed layer of the MTJ and generate a spin current Is that can tune/pin the MTJ dynamics by modifying τN. This effect is needed for tuning the output of design 2 but is not desired in design 1. However, the developed behavioral model can account for this pinning effect according to

$$ \tau_N = \tau_{N0}\, \exp\left( r_{\mathrm{MTJ}}\, I_{\mathrm{MTJ}} \right), \tag{6} $$

where τN0 is the retention time of rMTJ when IMTJ = 0. The dimensionless pinning current IMTJ is defined as IMTJ = Is/Is, 0, where Is, 0 can be extracted by following the procedure of Figure 2F. This pinning effect of IMTJ is much smaller in in-plane (IMA) magnets than in perpendicular (PMA) magnets (Hassan et al., 2019) and is ignored for design 1 throughout this paper.

Figures 2A–D show the comparison of this behavioral model for p-bit design 1 with SPICE simulations of the actual hardware in terms of fluctuation dynamics, sigmoidal characteristic response, autocorrelation time (τcorr), and step response time (τstep); in all cases the behavioral model closely matches the SPICE simulations. The SPICE simulation involves experimentally benchmarked modules for the different parts of the device. The SPICE model for the s-MTJ solves the stochastic Landau–Lifshitz–Gilbert equation for the LBM physics. For the transistors, the 14 nm Predictive Technology Model is used. HSPICE is used as the simulator, with the .trannoise function and a time step of 1 ps. The simulation framework has been benchmarked experimentally and against standard simulation tools in the field (Datta, 2012; Torunbalci et al., 2018). The autonomous behavioral model for design 1 is labeled “PPSL: design 1.” The benchmarking is done for two different LBMs: (1) the faster fluctuating magnet 1 with saturation magnetization Ms = 1100 emu/cc, diameter D = 22 nm, thickness th = 2 nm, in-plane easy axis anisotropy Hk = 1 Oe, damping coefficient α = 0.01, and demagnetization field Hd = 4πMs; and (2) the slower fluctuating magnet 2 with the same parameters as magnet 1 except D = 150 nm. The supply voltage was set to VDD = −VSS = 0.4 V. The fast and slow fluctuations of the normalized output mi = VOUT, i/VDD are captured by changing the τN parameter in the PPSL model. In the steady-state sigmoidal response, V0 is a tanh fitting parameter that defines the width of the sigmoid and lies within the range of 40–60 mV, depending on which part of the sigmoid needs to be matched best. In Figure 2B, a V0 value of 50 mV is used to fit the sigmoid from the SPICE simulation. The following parameters have been extracted from the calibration shown in Figure 2, where Δt = 1 ps was used: τN = 150 ps (magnet 1), τN = 1.5 ns (magnet 2), τT = 3 ps, Is, 0 = 120 μA (magnet 1), Is, 0 = 1 mA (magnet 2).

There are two types of time responses: (1) the autocorrelation time under zero input, labeled τcorr, and (2) the step response time τstep. τcorr is defined as the full width at half maximum (FWHM) of the autocorrelation function of the fluctuating output under zero input and is proportional to the retention time τN of the LBM. The step response time τstep is obtained by averaging the p-bit output over many ensembles when the input Ii is stepped from a large negative value to zero at time t = 0 and measuring the time it takes for the ensemble-averaged output to reach the statistically correct value consistent with the new input; both extraction procedures are sketched below. τstep defines how fast the first statistically correct sample can be obtained after the input is changed. For p-bit design 1, τstep is independent of the LBM retention time τN and is set by the NMOS transistor response time τT, which is much faster (a few picoseconds) than the LBM fluctuation time τN. The effect of these two very different time scales in design 1 (τstep ≪ τcorr) on an autonomous BN is described in section 3.
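The two extraction procedures can be implemented generically as sketched below; the routines operate on simulated traces such as those produced by the behavioral models above, and the FFT-based autocorrelation and the settling-tolerance criterion are our own illustrative choices.

```python
import numpy as np

def tau_corr(m_trace, dt):
    """tau_corr: FWHM of the autocorrelation of a fluctuating output trace m(t) at zero input."""
    m = m_trace - m_trace.mean()
    n = m.size
    f = np.fft.rfft(m, 2 * n)                             # FFT-based linear autocorrelation
    ac = np.fft.irfft(f * np.conj(f))[:n]
    ac /= ac[0]                                           # normalize to 1 at zero lag
    below = np.nonzero(ac < 0.5)[0]
    return 2 * below[0] * dt if below.size else np.nan    # FWHM = twice the half-width at half max

def tau_step(mean_response, dt, target, tol=0.05):
    """tau_step: time at which the ensemble-averaged output, after the input is stepped at t = 0,
    first comes within 'tol' of its new steady-state value 'target' (compare Figures 2D,H)."""
    settled = np.nonzero(np.abs(mean_response - target) < tol)[0]
    return settled[0] * dt if settled.size else np.nan
```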

2.2. Autonomous Behavioral Model: Design 2

The autonomous behavioral model for design 2 was proposed in Sutton et al. (2020). In this article, we benchmark this model against SPICE simulations of the single p-bit steady-state and time responses, shown in Figures 2E–H. According to this model, the normalized output mi = VOUT, i/VDD can be expressed as:

$$ m_i(t+\Delta t) = m_i(t)\, \mathrm{sgn}\left( p^{\mathrm{NOT\,flip}}_i(t+\Delta t) - \mathrm{rand}[0,1] \right) \tag{7a} $$
$$ p^{\mathrm{NOT\,flip}}_i(t+\Delta t) = \exp\!\left( \frac{-\Delta t}{\tau_N\, \exp\!\big(I_i\, m_i(t)\big)} \right) \tag{7b} $$

where pNOT flip, i(t + Δt) is the probability that the ith p-bit retains its state (does “not flip”) in the next time step, which is a function of the average neuron flip time τN, the input Ii, and the current p-bit output mi(t). Figure 2 shows how this simple autonomous behavioral model for design 2 matches reasonably well with SPICE simulations of the device in terms of fluctuation dynamics (Figure 2E), sigmoidal characteristic response (Figure 2F), autocorrelation time (τcorr) (Figure 2G), and step response time (τstep) (Figure 2H). In design 2, τstep and τcorr are both proportional to the LBM fluctuation time τN, unlike in design 1.
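A compact Python rendering of Equations (7a) and (7b) is sketched below; as before, the parameter values are illustrative, and the sanity check verifies that the dwell times τN exp(+Ii) and τN exp(−Ii) again give a time-averaged output close to tanh(I).

```python
import numpy as np

rng = np.random.default_rng(3)
dt, tau_N = 1e-12, 150e-12                                # illustrative values

def design2_step(I, m):
    """One time step of the design-2 behavioral model, Equations (7a)-(7b)."""
    p_not_flip = np.exp(-dt / (tau_N * np.exp(I * m)))    # Equation (7b)
    return m * np.sign(p_not_flip - rng.uniform(0, 1))    # Equation (7a)

# Sanity check: dwell times tau_N*exp(+I) and tau_N*exp(-I) give a time average close to tanh(I)
I, m, acc, n = 1.0, 1.0, 0.0, 500_000
for _ in range(n):
    m = design2_step(I, m)
    acc += m
print(acc / n, "vs tanh(I) =", np.tanh(I))
```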

Different time scales in p-bit designs 1 and 2 are also reported in Hassan et al. (2019) in an energy-delay analysis context. In this article, we explain the effect of these time scales in designing an autonomous BN (section 3).

3. Difference Between Designs 1 and 2 in Implementing Bayesian Networks

The behavioral models introduced in section 2 are applied to implement a multi-layer belief/Bayesian network with 19 p-bits and random interconnection strengths between +1 and −1 (Figure 3A). For illustrative purposes, the interconnections are designed such that, although there are no meaningful correlations between the blue and red colored nodes with random couplings, the pairs of intermediate nodes (A, M1) and (M1, B) get negatively correlated because of a net −r² type coupling through each branch connecting the pairs. It is therefore expected that the start and end nodes (A, B) get positively correlated. Figure 3B shows histograms of the four configurations (00, 01, 10, 11) of the pair of nodes A and B obtained from different approaches: Bayes' rule (labeled Analytic), SPICE simulations of design 1 (SPICE: Design 1) and design 2 (SPICE: Design 2), and the autonomous behavioral models for design 1 (PPSL: Design 1) and design 2 (PPSL: Design 2). The results from the SPICE simulation and the behavioral model for design 1 match the standard analytical values reasonably well, showing the 00 and 11 states with the highest probability, whereas the design 2 autonomous hardware does not match the analytical results well and shows approximately equal peaks for all configurations. We have tested this basic conclusion for other networks with more complex topologies as well, as shown in Supplementary Figure 1. The analytical values are obtained by applying the standard joint probability rule for BNs (Pearl, 2014; Russell and Norvig, 2016), which is:

$$ P(x_1, x_2, \ldots, x_N) = \prod_{i=1}^{N} P\big(x_i \,\big|\, \mathrm{Parents}(x_i)\big) \tag{8} $$

The joint probability of two specific nodes xi and xj can be calculated from the above equation by summing over all configurations of the other nodes in the network, which becomes computationally expensive for larger networks; this enumeration is illustrated below for a small example network. One major advantage of our probabilistic hardware is that probabilities of specific nodes can be obtained by looking only at the nodes of interest and ignoring all other nodes in the system, similar to what Feynman envisioned for a probabilistic computer imitating the probabilistic laws of nature (Feynman, 1982). Indeed, in the BN example in Figure 3, the probabilities of the different configurations of nodes A and B were obtained by looking at the fluctuating outputs of these two nodes while ignoring all other nodes. For the SPICE simulation of the design 1 hardware, the tanh fitting parameter V0 = 57 mV is used, and the mapping from the dimensionless coupling terms Jij to the coupling resistances in the hardware is described in Faria et al. (2018). An example of this mapping is given in the Supplementary Material.
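As a concrete illustration of this enumeration and its cost, the snippet below applies Equation (8) to a hypothetical three-node chain A → M → B (not the 19-node network of Figure 3) and marginalizes over the intermediate node; the conditional probabilities use P(m = +1 | I) = (1 + tanh I)/2 implied by Equation (1a), and the coupling and bias values are illustrative.

```python
import itertools
import numpy as np

# Hypothetical three-node chain A -> M -> B, each variable in {-1, +1}; illustrative parameters.
h_A, J_MA, J_BM = 0.5, -1.0, -1.0

def p(x, I):
    """P(x | parents) for x in {-1, +1}, using P(m = +1 | I) = (1 + tanh I)/2 from Equation (1a)."""
    return 0.5 * (1 + x * np.tanh(I))

joint_AB = {}
for A, M, B in itertools.product([-1, +1], repeat=3):      # 2^N terms: exponential in network size
    pr = p(A, h_A) * p(M, J_MA * A) * p(B, J_BM * M)       # Equation (8): P(A) P(M|A) P(B|M)
    joint_AB[(A, B)] = joint_AB.get((A, B), 0.0) + pr      # marginalize over the intermediate node M
for k, v in sorted(joint_AB.items()):
    print(k, round(v, 4))                                  # the hardware gives this directly from A and B alone
```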


Figure 3. Difference between designs 1 and 2: (A) The behavioral models described in Figure 2 are applied to simulate a 19 p-bit BN with random Jij between +1 and −1. The indices i and j of Jij correspond to the numbers inside each circle. The interconnections are designed such that the pairs of intermediate nodes (A, M1) and (M1, B) get anti-correlated and (A, B) gets positively correlated. (B) The probability distributions of the four configurations of AB are shown in a histogram for the different approaches (SPICE, behavioral model, and analytic). The behavioral models for the two designs (labeled PPSL) match reasonably well with the corresponding results from SPICE simulations of the actual hardware. Note that while design 1 matches the standard analytical values quite well, design 2 does not work as an autonomous BN in general. For each histogram, 10^6 samples have been collected.

The reason why design 1 works for a BN and design 2 does not lies in the two very different time responses of the two designs shown in Figure 2, which arise because the tunable component is the transistor in design 1 (τstep ∝ τT) and the MTJ in design 2 (τstep ∝ τN). It is these two different time scales in design 1 (τstep ≪ τcorr) that naturally ensure a parent-to-child informed update order in a BN. When τstep is small, each child node can immediately respond to any change of its parent nodes caused by a random event, which occurs on a much larger time scale ∝ τcorr. Thanks to this fast step response, information about a changed parent p-bit can propagate quickly through the network, and the outputs of the child nodes become conditionally consistent with their parent nodes very fast. Otherwise, if τcorr becomes comparable to τstep, the child nodes cannot keep up with the fast-changing parent nodes, since the information of the parent p-bit state has not yet propagated through the network. As a result, the child nodes produce a substantial number of statistically incorrect samples over the entire time range, and the network deviates from the correct probability distribution. This effect is especially strong for networks where the coupling strength between p-bits is large.

To illustrate this point, the effect of the τstep/τcorr ratio is shown in Figure 4 for the same BN presented in Figure 3, by plotting the histogram of the AB configurations for different τT/τN ratios. When the τT/τN ratio is small, the histogram converges to the correct distribution; as τT becomes comparable to τN, the histogram diverges from it. Thus, the very fast NMOS transistor response in design 1 makes it suitable for autonomous BN hardware. Note that, under certain conditions, results from design 2 can also match the analytical results if the spin current bias is large enough to drive down the step response time and ensure τstep ≪ τcorr. A toy illustration of this τT/τN dependence is sketched below.
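The trend in Figure 4 can be reproduced qualitatively even with a toy two-node parent→child network; the sketch below runs the design-1 behavioral model autonomously for a small and a large τT/τN ratio and compares the sampled joint distribution with the chain-rule values. The network, coupling strength, and sample counts are illustrative choices, not the 19-node network of Figures 3, 4.

```python
import numpy as np

rng = np.random.default_rng(4)
dt, tau_N, J, n_steps = 1e-12, 150e-12, 2.0, 400_000       # strong parent->child coupling (illustrative)

def step(I, r_T, r_MTJ, tau_T):
    """One time step of the design-1 behavioral model (Equations 2-5b) for a single p-bit."""
    r_T = r_T * np.exp(-dt / tau_T) + (1 - np.exp(-dt / tau_T)) * np.tanh(I)
    if rng.uniform(0, 1) > np.exp(-dt / tau_N):             # Equation (5a): MTJ redraw event
        r_MTJ = rng.uniform(-1, 1)
    return np.sign(r_T - r_MTJ), r_T, r_MTJ

for tau_T in (3e-12, 150e-12):                              # tau_T << tau_N vs tau_T = tau_N
    m = np.ones(2); r_T = np.zeros(2); r_MTJ = rng.uniform(-1, 1, 2)
    counts = {}
    for _ in range(n_steps):
        I = np.array([0.0, J * m[0]])                       # A unbiased, child B driven by A
        for i in (0, 1):                                    # both p-bits advance every dt (autonomous)
            m[i], r_T[i], r_MTJ[i] = step(I[i], r_T[i], r_MTJ[i], tau_T)
        key = (int(m[0]), int(m[1]))
        counts[key] = counts.get(key, 0) + 1
    print("tau_T/tau_N =", tau_T / tau_N)
    for (A, B), c in sorted(counts.items()):
        analytic = 0.5 * 0.5 * (1 + B * np.tanh(J * A))     # chain rule: P(A) P(B|A)
        print(" ", (A, B), round(c / n_steps, 3), "analytic:", round(analytic, 3))
```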


Figure 4. Effect of the step response time in design 1: The reason design 1 works accurately as an autonomous Bayesian network, as shown in Figure 3, is the presence of two different time scales (τT and τN) in this design, with τT ≪ τN. The same histogram shown in Figure 3 is plotted using the proposed behavioral model for different τT/τN ratios and compared with the analytical values. It can be seen that as τT becomes comparable to τN, the probability distribution diverges from the standard statistical values. For each histogram, 10^6 samples have been collected.

So, apart from a fast synapse compared to the neuron fluctuation time (τS ≪ τN), which is the general design rule for autonomous probabilistic hardware, an autonomous BN demands an additional p-bit design rule: a step response time of the p-bit much faster than its fluctuation time (τstep ≪ τN), as ensured in design 1. In all the simulations, the LBM was a circular in-plane magnet whose magnetization spans all values between +1 and −1 and which has a negligible pinning effect. If the LBM is a PMA magnet with bipolar fluctuations taking just the two values +1 and −1, design 1 will not provide any sigmoidal response except with a substantial pinning effect (Borders et al., 2019). Under this condition, τstep of design 1 becomes comparable to τN again, and the system will not work as an autonomous BN in general. Therefore, an LBM with continuous-range fluctuations is required for a design 1 p-bit to work properly in a BN.

4. Discussion

In this article, we have elucidated the design criteria for autonomous, clockless hardware for BNs, which require a specific parent-to-child update order when implemented on a probabilistic circuit. By performing SPICE simulations of two autonomous probabilistic hardware designs built out of p-bits (designs 1 and 2 in Figure 1), we have shown that the autonomous hardware will naturally ensure a parent-to-child informed update order without any sequencers if the step response time (τstep) of the p-bit is much smaller than its autocorrelation time (τcorr). This criterion of having two different time scales is met in design 1, where τstep is set by the NMOS transistor response time τT, which is a few picoseconds. We have also proposed an autonomous behavioral model for design 1 and benchmarked it against SPICE simulations of the actual hardware. All the simulations using the behavioral model for design 1 are performed ignoring some non-ideal effects, listed as follows:

• Pinning of the s-MTJ fluctuation due to the STT effect is ignored by assuming IMTJ = 0 in Equation (6). This is a reasonable assumption for circular in-plane magnets, which are very difficult to pin due to the large demagnetization field that is always present irrespective of the energy barrier (Hassan et al., 2019). The effect is more prominent in perpendicular magnetic anisotropy (PMA) magnets. It is important to include the pinning effect in p-bits with bipolar LBM fluctuations because in that case the p-bit does not provide a sigmoidal response without the pinning current; this effect has also been observed experimentally in Borders et al. (2019) for PMA magnets. Such a p-bit design with a bipolar PMA magnet and STT pinning might not work for BNs in general, because in this case τstep will depend on the magnet fluctuation time τN.

• In the proposed behavioral model, the step response time of the NMOS transistor τT in design 1 is assumed to be independent of the input I. But there is a functional dependence of τT on I in real hardware.

• The NMOS transistor resistance rT is approximated as a tanh function for simplicity. To capture the hardware behavior more accurately, the tanh could be replaced by a more complicated function, and the weight matrix [J] would then have to be learned around that function.

All the non-ideal effects listed above are expected to have minimal impact on the probability distributions shown in this article. Real LBMs may suffer from common fabrication defects, resulting in variations of the average magnet fluctuation time τN (Abeed and Bandyopadhyay, 2019). The autonomous BN is quite tolerant to such variations in τN as long as τT ≪ min(τN).

It is important to note that, for design 1 (transistor-controlled) to function as a p-bit whose step response time (τstep) is much smaller than its average fluctuation time (τN), the LBM fluctuation needs to be continuous rather than bipolar. While most experimental implementations of low barrier magnetic tunnel junctions or spin valves exhibit telegraphic (binary) fluctuations (Pufall et al., 2004; Locatelli et al., 2014; Parks et al., 2018; Debashis et al., 2020), theoretical results (Abeed and Bandyopadhyay, 2019; Hassan et al., 2019; Kaiser et al., 2019) indicate that it should be possible to design low barrier magnets with continuous fluctuations. Preliminary experimental results for such circular disk nanomagnets have been presented in Debashis et al. (2016). We believe that the lack of experimental literature on such magnets is partly due to a lack of interest in randomly fluctuating magnets, which have long been dismissed as impractical and irrelevant. The other experimentally demonstrated p-bits (Ostwal et al., 2018; Ostwal and Appenzeller, 2019; Debashis, 2020) fall under the design 2 category, with the LBM magnetization tuned by the SOT effect, and are not suitable for autonomous BN operation in general. It might also be possible to design p-bits using other phenomena such as voltage-controlled magnetic anisotropy (Amiri and Wang, 2012), but this is beyond the scope of the present study. Here, we have specifically focused on two designs that can be implemented with existing MRAM technology based on STT and SOT.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author/s.

Author Contributions

RF and SD wrote the paper. RF performed the simulations. JK helped set up the simulations. KC developed the simulation modules for the BSN in SPICE. All authors discussed the results and helped refine the manuscript.

Funding

This work was supported in part by ASCENT, one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors would like to thank professor Joerg Appenzeller from Purdue University for helpful discussions. This manuscript has been released as a preprint at arXiv (Faria et al., 2020).

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fncom.2021.584797/full#supplementary-material

References

Abeed, M. A., and Bandyopadhyay, S. (2019). Low energy barrier nanomagnet design for binary stochastic neurons: design challenges for real nanomagnets with fabrication defects. IEEE Magnet. Lett. 10, 1–5. doi: 10.1109/LMAG.2019.2929484

Ackley, D. H., Hinton, G. E., and Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cogn. Sci. 9, 147–169. doi: 10.1207/s15516709cog0901_7

Alibart, F., Zamanidoost, E., and Strukov, D. B. (2013). Pattern classification by memristive crossbar circuits using ex situ and in situ training. Nat. Commun. 4, 1–7. doi: 10.1038/ncomms3072

Ambrogio, S., Narayanan, P., Tsai, H., Shelby, R. M., Boybat, I., di Nolfo, C., et al. (2018). Equivalent-accuracy accelerated neural-network training using analogue memory. Nature 558, 60–67. doi: 10.1038/s41586-018-0180-5

Amiri, P. K., and Wang, K. L. (2012). “Voltage-controlled magnetic anisotropy in spintronic devices,” in Spin, Vol. 2 (World Scientific), 1240002. doi: 10.1142/S2010324712400024

Arias, J., Martinez-Gomez, J., Gamez, J. A., de Herrera, A. G. S., and Müller, H. (2016). Medical image modality classification using discrete bayesian networks. Comput. Vis. Image Understand. 151, 61–71. doi: 10.1016/j.cviu.2016.04.002

Behin-Aein, B., Diep, V., and Datta, S. (2016). A building block for hardware belief networks. Sci. Rep. 6:29893. doi: 10.1038/srep29893

Bhatti, S., Sbiaa, R., Hirohata, A., Ohno, H., Fukami, S., and Piramanayagam, S. (2017). Spintronics based random access memory: a review. Mater. Tdy. 20, 530–548. doi: 10.1016/j.mattod.2017.07.007

Bielza, C., and Larrañaga, P. (2014). Bayesian networks in neuroscience: a survey. Front. Comput. Neurosci. 8:131. doi: 10.3389/fncom.2014.00131

Borders, W. A., Pervaiz, A. Z., Fukami, S., Camsari, K. Y., Ohno, H., and Datta, S. (2019). Integer factorization using stochastic magnetic tunnel junctions. Nature 573, 390–393. doi: 10.1038/s41586-019-1557-9

Brown, W. (1979). Thermal fluctuation of fine ferromagnetic particles. IEEE Trans. Magnet. 15, 1196–1208. doi: 10.1109/TMAG.1979.1060329

Camsari, K. Y., Chowdhury, S., and Datta, S. (2019). Scalable emulation of sign-problem-free hamiltonians with room-temperature p-bits. Phys. Rev. Appl. 12:034061. doi: 10.1103/PhysRevApplied.12.034061

Camsari, K. Y., Debashis, P., Ostwal, V., Pervaiz, A. Z., Shen, T., Chen, Z., et al. (2020). From charge to spin and spin to charge: Stochastic magnets for probabilistic switching. Proc. IEEE. 108:1322–1337. doi: 10.1109/JPROC.2020.2966925

Camsari, K. Y., Faria, R., Sutton, B. M., and Datta, S. (2017a). Stochastic p-bits for invertible logic. Phys. Rev. X 7:031014. doi: 10.1103/PhysRevX.7.031014

Camsari, K. Y., Salahuddin, S., and Datta, S. (2017b). Implementing p-bits with embedded mtj. IEEE Electron Dev. Lett. 38, 1767–1770. doi: 10.1109/LED.2017.2768321

Chakrapani, L. N., Korkmaz, P., Akgul, B. E., and Palem, K. V. (2007). Probabilistic system-on-a-chip architectures. ACM Trans. Design Automat. Electron. Syst. 12:29. doi: 10.1145/1255456.1255466

Correa, M., Bielza, C., and Pamies-Teixeira, J. (2009). Comparison of bayesian networks and artificial neural networks for quality detection in a machining process. Expert Syst. Appl. 36, 7270–7279. doi: 10.1016/j.eswa.2008.09.024

Darwiche, A. (2009). Modeling and Reasoning With Bayesian Networks. Cambridge University Press. Available online at: https://www.cambridge.org/core/books/modeling-and-reasoning-with-bayesian-networks/8A3769B81540EA93B525C4C2700C9DE6 doi: 10.1017/CBO9780511811357

Datta, D. (2012). Modeling of spin transport in MTJ devices (Ph.D. thesis), Purdue University Graduate School; ProQuest. Available online at: https://docs.lib.purdue.edu/dissertations/AAI3556189/

Davies, M., Srinivasa, N., Lin, T.-H., Chinya, G., Cao, Y., Choday, S. H., et al. (2018). Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro 38, 82–99. doi: 10.1109/MM.2018.112130359

Debashis, P. (2020). Spintronic devices as P-bits for probabilistic computing (Ph.D. thesis). Purdue University Graduate School. Available online at: https://hammer.figshare.com/articles/thesis/Spintronic_Devices_as_P-bits_for_Probabilistic_Computing/11950395

Debashis, P., Faria, R., Camsari, K. Y., Appenzeller, J., Datta, S., and Chen, Z. (2016). “Experimental demonstration of nanomagnet networks as hardware for ising computing,” in 2016 IEEE International Electron Devices Meeting (IEDM) (San Francisco, CA: IEEE), 34. doi: 10.1109/IEDM.2016.7838539

Debashis, P., Faria, R., Camsari, K. Y., Datta, S., and Chen, Z. (2020). Correlated fluctuations in spin orbit torque coupled perpendicular nanomagnets. Phys. Rev. B 101:094405. doi: 10.1103/PhysRevB.101.094405

Faria, R., Camsari, K. Y., and Datta, S. (2017). Low-barrier nanomagnets as p-bits for spin logic. IEEE Magnet. Lett. 8, 1–5. doi: 10.1109/LMAG.2017.2685358

Faria, R., Camsari, K. Y., and Datta, S. (2018). Implementing bayesian networks with embedded stochastic mram. AIP Adv. 8:045101. doi: 10.1063/1.5021332

Faria, R., Kaiser, J., Camsari, K. Y., and Datta, S. (2020). Hardware design for autonomous bayesian networks. arXiv preprint arXiv:2003.01767.

Feynman, R. P. (1982). Simulating physics with computers. Int. J. Theor. Phys. 21:467–488. doi: 10.1007/BF02650179

Friedman, J. S., Calvet, L. E., Bessiére, P., Droulez, J., and Querlioz, D. (2016). Bayesian inference with muller c-elements. IEEE Trans. Circ. Syst. I Regular Pap. 63, 895–904. doi: 10.1109/TCSI.2016.2546064

Friedman, N., Linial, M., Nachman, I., and Pe'er, D. (2000). Using bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620. doi: 10.1089/106652700750050961

Guo, H., and Hsu, W. (2002). “A survey of algorithms for real-time bayesian network inference,” in Join Workshop on Real Time Decision Support and Diagnosis Systems. American Association for Artificial Intelligence. Available online at: https://www.aaai.org/Papers/Workshops/2002/WS-02-15/WS02-15-001.pdf

Hassan, O., Faria, R., Camsari, K. Y., Sun, J. Z., and Datta, S. (2019). Low-barrier magnet design for efficient hardware binary stochastic neurons. IEEE Magnet. Lett. 10, 1–5. doi: 10.1109/LMAG.2019.2910787

Heckerman, D., and Breese, J. S. (1996). Causal independence for probability assessment and inference using bayesian networks. IEEE Trans. Syst. Man Cybernet A Syst. Hum. 26, 826–831. doi: 10.1109/3468.541341

Henrion, M. (1988). “Propagating uncertainty in bayesian networks by probabilistic logic sampling,” in Machine Intelligence and Pattern Recognition, (Elsevier) 5:149–163. doi: 10.1016/B978-0-444-70396-5.50019-4

Hinton, G. E. (2007). Boltzmann machine. Scholarpedia 2:1668. doi: 10.4249/scholarpedia.1668

Jansen, R., Yu, H., Greenbaum, D., Kluger, Y., Krogan, N. J., Chung, S., et al. (2003). A bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302, 449–453. doi: 10.1126/science.1087361

Jonas, E. M. (2014). Stochastic architectures for probabilistic computation (Ph.D. thesis). Massachusetts Institute of Technology, Cambridge, MA, United States.

Kaiser, J., Faria, R., Camsari, K. Y., and Datta, S. (2020). Probabilistic circuits for autonomous learning: a simulation study. Front. Comput. Neurosci. 14:14. doi: 10.3389/fncom.2020.00014

Kaiser, J., Rustagi, A., Camsari, K., Sun, J., Datta, S., and Upadhyaya, P. (2019). Subnanosecond fluctuations in low-barrier nanomagnets. Phys. Rev. Appl. 12:054056. doi: 10.1103/PhysRevApplied.12.054056

Koller, D., and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press. Available online at: https://mitpress.mit.edu/books/probabilistic-graphical-models

Li, C., Belkin, D., Li, Y., Yan, P., Hu, M., Ge, N., et al. (2018). Efficient and self-adaptive in-situ learning in multilayer memristor neural networks. Nat. Commun. 9, 1–8. doi: 10.1038/s41467-018-04484-2

Liu, L., Pai, C.-F., Li, Y., Tseng, H., Ralph, D., and Buhrman, R. (2012). Spin-torque switching with the giant spin hall effect of tantalum. Science 336, 555–558. doi: 10.1126/science.1218197

Locatelli, N., Mizrahi, A., Accioly, A., Matsumoto, R., Fukushima, A., Kubota, H., et al. (2014). Noise-enhanced synchronization of stochastic magnetic oscillators. Phys. Rev. Appl. 2:034009. doi: 10.1103/PhysRevApplied.2.034009

Lopez-Diaz, L., Torres, L., and Moro, E. (2002). Transition from ferromagnetism to superparamagnetism on the nanosecond time scale. Phys. Rev. B 65:224406. doi: 10.1103/PhysRevB.65.224406

Mahmoodi, M., Prezioso, M., and Strukov, D. (2019). Versatile stochastic dot product circuits based on nonvolatile memories for high performance neurocomputing and neurooptimization. Nat. Commun. 10, 1–10. doi: 10.1038/s41467-019-13103-7

Mansueto, M., Chavent, A., Auffret, S., Joumard, I., Nath, J., Miron, I. M., et al. (2019). Realizing an isotropically coercive magnetic layer for memristive applications by analogy to dry friction. Phys. Rev. Appl. 12:044029. doi: 10.1103/PhysRevApplied.12.044029

Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., et al. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. Science 345, 668–673. doi: 10.1126/science.1254642

Mizrahi, A., Hirtzlin, T., Fukushima, A., Kubota, H., Yuasa, S., Grollier, J., et al. (2018). Neural-like computing with populations of superparamagnetic basis functions. Nat. Commun. 9, 1–11. doi: 10.1038/s41467-018-03963-w

Neal, R. M. (1992). Connectionist learning of belief networks. Artif. Intell. 56, 71–113. doi: 10.1016/0004-3702(92)90065-6

Nikovski, D. (2000). Constructing bayesian networks for medical diagnosis from incomplete and partially correct statistics. IEEE Trans. Knowl. Data Eng. 4, 509–516. doi: 10.1109/69.868904

Ostwal, V., and Appenzeller, J. (2019). Spin-orbit torque-controlled magnetic tunnel junction with low thermal stability for tunable random number generation. IEEE Magnet. Lett. 10, 1–5. doi: 10.1109/LMAG.2019.2912971

Ostwal, V., Debashis, P., Faria, R., Chen, Z., and Appenzeller, J. (2018). Spin-torque devices with hard axis initialization as stochastic binary neurons. Sci. Rep. 8, 1–8. doi: 10.1038/s41598-018-34996-2

Ostwal, V., Zand, R., DeMara, R., and Appenzeller, J. (2019). A novel compound synapse using probabilistic spin-orbit-torque switching for mtj-based deep neural networks. IEEE J. Explorat. Solid State Comput. Dev. Circ. 5, 182–187. doi: 10.1109/JXCDC.2019.2956468

Park, D.-C. (2016). Image classification using naive bayes classifier. Int. J. Comp. Sci. Electron. Eng. 4, 135–139. Available online at: http://www.isaet.org/images/extraimages/P1216004.pdf

Parks, B., Bapna, M., Igbokwe, J., Almasi, H., Wang, W., and Majetich, S. A. (2018). Superparamagnetic perpendicular magnetic tunnel junctions for true random number generators. AIP Adv. 8:055903. doi: 10.1063/1.5006422

Pearl, J. (2014). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Elsevier. Available online at: https://www.elsevier.com/books/probabilistic-reasoning-in-intelligent-systems/pearl/978-0-08-051489-5

Pervaiz, A. Z., Ghantasala, L. A., Camsari, K. Y., and Datta, S. (2017). Hardware emulation of stochastic p-bits for invertible logic. Sci. Rep. 7:10994. doi: 10.1038/s41598-017-11011-8

Pervaiz, A. Z., Sutton, B. M., Ghantasala, L. A., and Camsari, K. Y. (2018). Weighted p-bits for fpga implementation of probabilistic circuits. IEEE Trans. Neural Netw. Learn. Syst. 30, 1920–1926. doi: 10.1109/TNNLS.2018.2874565

Premebida, C., Faria, D. R., and Nunes, U. (2017). Dynamic bayesian network for semantic place classification in mobile robotics. Auton. Robots 41, 1161–1172. doi: 10.1007/s10514-016-9600-2

Pufall, M. R., Rippard, W. H., Kaka, S., Russek, S. E., Silva, T. J., Katine, J., et al. (2004). Large-angle, gigahertz-rate random telegraph switching induced by spin-momentum transfer. Phys. Rev. B 69:214409. doi: 10.1103/PhysRevB.69.214409

Querlioz, D., Bichler, O., Vincent, A. F., and Gamrat, C. (2015). Bioinspired programming of memory devices for implementing an inference engine. Proc. IEEE 103, 1398–1416. doi: 10.1109/JPROC.2015.2437616

Rish, I., Brodie, M., Ma, S., Odintsova, N., Beygelzimer, A., Grabarnik, G., et al. (2005). Adaptive diagnosis in distributed systems. IEEE Trans. Neural Netw. 16, 1088–1109. doi: 10.1109/TNN.2005.853423

Roberts, G. O., and Sahu, S. K. (1997). Updating schemes, correlation structure, blocking and parameterization for the gibbs sampler. J. R. Stat. Soc. Ser. B 59, 291–317. doi: 10.1111/1467-9868.00070

Russell, S. J., and Norvig, P. (2016). Artificial Intelligence: A Modern Approach. Pearson Education Limited. Available online at: https://www.pearson.com/us/higher-education/product/Norvig-Artificial-Intelligence-A-Modern-Approach-Subscription-3rd-Edition/9780133001983.html

Shim, Y., Chen, S., Sengupta, A., and Roy, K. (2017). Stochastic spin-orbit torque devices as elements for bayesian inference. Sci. Rep. 7:14101. doi: 10.1038/s41598-017-14240-z

Strukov, D., Indiveri, G., Grollier, J., and Fusi, S. (2019). Building brain-inspired computing. Nat. Commun. 10. doi: 10.1038/s41467-019-12521-x

Sun, S., Zhang, C., and Yu, G. (2006). A bayesian network approach to traffic flow forecasting. IEEE Trans. Intell. Transport. Syst. 7, 124–132. doi: 10.1109/TITS.2006.869623

Sutton, B., Camsari, K. Y., Behin-Aein, B., and Datta, S. (2017). Intrinsic optimization using stochastic nanomagnets. Sci. Rep. 7:44370. doi: 10.1038/srep44370

Sutton, B., Faria, R., Ghantasala, L. A., Jaiswal, R., Camsari, K. Y., and Datta, S. (2020). Autonomous probabilistic coprocessing with petaflips per second. IEEE Access. 157238–157252. doi: 10.1109/ACCESS.2020.3018682

Thakur, C. S., Afshar, S., Wang, R. M., Hamilton, T. J., Tapson, J., and Van Schaik, A. (2016). Bayesian estimation and inference using stochastic electronics. Front. Neurosci. 10:104. doi: 10.3389/fnins.2016.00104

Ticknor, J. L. (2013). A bayesian regularized artificial neural network for stock market forecasting. Expert Syst. Appl. 40, 5501–5506. doi: 10.1016/j.eswa.2013.04.013

Torunbalci, M. M., Upadhyaya, P., Bhave, S. A., and Camsari, K. Y. (2018). Modular compact modeling of MTJ devices. IEEE Trans. Electron Dev. 65, 4628–4634. doi: 10.1109/TED.2018.2863538

Tylman, W., Waszyrowski, T., Napieralski, A., Kamiński, M., Trafidło, T., Kulesza, Z., et al. (2016). Real-time prediction of acute cardiovascular events using hardware-implemented bayesian networks. Comput. Biol. Med. 69, 245–253. doi: 10.1016/j.compbiomed.2015.08.015

Vodenicarevic, D., Locatelli, N., Mizrahi, A., Friedman, J. S., Vincent, A. F., Romera, M., et al. (2017). Low-energy truly random number generation with superparamagnetic tunnel junctions for unconventional computing. Phys. Rev. Appl. 8:054045. doi: 10.1103/PhysRevApplied.8.054045

Vodenicarevic, D., Locatelli, N., Mizrahi, A., Hirtzlin, T., Friedman, J. S., Grollier, J., et al. (2018). “Circuit-level evaluation of the generation of truly random bits with superparamagnetic tunnel junctions,” in 2018 IEEE International Symposium on Circuits and Systems (ISCAS) (Florence: IEEE), 1–4. doi: 10.1109/ISCAS.2018.8351771

Weijia, Z., Ling, G. W., and Seng, Y. K. (2007). “PCMOs-based hardware implementation of bayesian network,” in 2007 IEEE Conference on Electron Devices and Solid-State Circuits (Tainan: IEEE), 337–340. doi: 10.1109/EDSSC.2007.4450131

Zand, R., Camsari, K. Y., Pyle, S. D., Ahmed, I., Kim, C. H., and DeMara, R. F. (2018). “Low-energy deep belief networks using intrinsic sigmoidal spintronic-based probabilistic neurons,” in Proceedings of the 2018 on Great Lakes Symposium on VLSI Chicago IL, 15–20. doi: 10.1145/3194554.3194558

Zermani, S., Dezan, C., Chenini, H., Diguet, J.-P., and Euler, R. (2015). “FPGA implementation of bayesian network inference for an embedded diagnosis,” in 2015 IEEE Conference on Prognostics and Health Management (PHM) (Austin, TX: IEEE), 1–10. doi: 10.1109/ICPHM.2015.7245057

Zink, B. R., Lv, Y., and Wang, J.-P. (2018). Telegraphic switching signals by magnet tunnel junctions for neural spiking signals with high information capacity. J. Appl. Phys. 124:152121. doi: 10.1063/1.5042444

Zou, M., and Conzen, S. D. (2004). A new dynamic bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics 21, 71–79. doi: 10.1093/bioinformatics/bth463

Keywords: Bayesian network, probabilistic spin logic, binary stochastic neuron, magnetic tunnel junction, inference

Citation: Faria R, Kaiser J, Camsari KY and Datta S (2021) Hardware Design for Autonomous Bayesian Networks. Front. Comput. Neurosci. 15:584797. doi: 10.3389/fncom.2021.584797

Received: 18 July 2020; Accepted: 26 January 2021;
Published: 08 March 2021.

Edited by:

Jonathan Mapelli, University of Modena and Reggio Emilia, Italy

Reviewed by:

Francesco Maria Puglisi, University of Modena and Reggio Emilia, Italy
Alexantrou Serb, University of Southampton, United Kingdom

Copyright © 2021 Faria, Kaiser, Camsari and Datta. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Rafatul Faria, rafatul.faria@gmail.com

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.