- ¹Department of Information Medicine, National Institute of Neuroscience, National Center of Neurology and Psychiatry, Tokyo, Japan
- ²Graduate School of Medical and Dental Sciences, Institute of Science Tokyo, Tokyo, Japan
- ³Laboratory for Advanced Brain Functions, Institute for Protein Research, Osaka University, Osaka, Japan
Accurate interoceptive processing in decision-making is essential to maintain homeostasis and overall health. Disruptions in this process have been associated with various psychiatric conditions, including depression. Recent studies have focused on nutrient homeostatic dysregulation in depression for effective subtype classification and treatment. Neurophysiological studies have associated changes in appetite in depression with altered activation of the mesolimbic dopamine system and interoceptive regions, such as the insular cortex, suggesting that disruptions in reward processing and interoception drive changes in nutrient homeostasis and appetite. This study aimed to explore the potential of computational psychiatry in addressing these issues. Using a homeostatic reinforcement learning model formalizing the link between internal states and behavioral control, we investigated the mechanisms by which altered interoception affects homeostatic behavior and reward system activity via simulation experiments. Simulations of altered interoception demonstrated behaviors similar to those of depression subtypes, such as appetite dysregulation. Specifically, reduced interoception led to decreased reward system activity and increased punishment, mirroring the neuroimaging study findings of decreased appetite in depression. Conversely, increased interoception was associated with heightened reward activity and impaired goal-directed behavior, reflecting an increased appetite. Furthermore, effects of interoception manipulation were compared with traditional reinforcement learning parameters (e.g., inverse temperature β and delay discount γ), which represent cognitive-behavioral features of depression. The results suggest that disruptions in these parameters contribute to depressive symptoms by affecting the underlying homeostatic regulation. 
Overall, the findings of this study emphasize the importance of integrating interoception and homeostasis into decision-making frameworks to enhance subtype classification and facilitate the development of effective therapeutic strategies.
1 Introduction
Interoception that is appropriately integrated into decision-making is essential for maintaining homeostasis and overall health (Cannon, 1929; Friston, 2013; Stephan et al., 2016). Maladaptive homeostasis is associated with eating disorders (Brown et al., 2017; Khalsa et al., 2022), unbalanced feeding in autism spectrum disorder (Fiene and Brownlow, 2015), and depression (Paulus and Stein, 2010; Avery et al., 2014; Stephan et al., 2016). Among these conditions, nutritional homeostasis dysregulation could be a primary diagnostic marker of depression, which is characterized by symptoms of maladaptive appetite and is used as a criterion for classification (Weissenburger et al., 1986; Zimmerman et al., 2011; American Psychiatric Association, 2022). Maladaptive appetite in depression is heterogeneous, either increasing or decreasing in different cases (Maxwell and Cole, 2009; Simmons et al., 2013, 2016, 2020).
Nutrient homeostatic dysregulation underlying depression has been actively studied in recent years, as it is crucial for effective subtype classification of depression and the development of appropriate treatments (Konttinen et al., 2010; Privitera et al., 2013; Cosgrove et al., 2020). For example, from a neurophysiological perspective, research has revealed that changes in appetite in individuals with depression are related to altered activation of the mesolimbic dopamine system and of areas strongly associated with interoceptive processing, such as the insular cortex (Simmons et al., 2016, 2020). These findings suggest that alterations in reward processing and its complex interplay with nutrient interoceptive processing are involved in changes in nutrient homeostasis and altered appetite in depression. However, system-level mechanisms, including those affecting brain activity and behavior, remain largely unclear (Young et al., 2021).
To address these challenges, neurocomputational theory-based methods for revealing the pathophysiology of neuropsychiatric conditions (i.e., “computational psychiatry”) are expected to make important contributions (Montague et al., 2012; Friston et al., 2014; Yamashita, 2021; Takahashi et al., 2023). In the field of depression research, there has been growing interest in using reinforcement learning (RL) theory in combination with behavioral experiments to test hypotheses and further our understanding of the underlying mechanisms (Takahashi et al., 2008; Kunisato et al., 2012; Toyama et al., 2019). However, the integration of interoception and homeostasis into the theoretical frameworks of decision-making has not progressed sufficiently (Paulus, 2007; Rangel, 2013; Morville et al., 2018).
Therefore, in this study, we aimed to provide a systems-level explanation for the dysregulation of nutrient homeostasis in depression using computational psychiatry methods to clarify the mechanisms by which changes in interoceptive processing and alterations in reward system activity are related. Specifically, we attempted to integrate the mechanisms of homeostasis and decision-making and provide a systems-level explanation of the functions of the reward system and nutritional state. To achieve this, we used the homeostatic RL (HRL) model, which formalizes the relationship between internal states and the drive that controls behavior as the “homeostatic space” (Keramati and Gutkin, 2014; Keramati et al., 2017; Morville et al., 2018; Hulme et al., 2019; Uchida et al., 2022). Using the HRL model, we aimed to interpret the changes in homeostatic maintenance behaviors and reward system activity related to changes in interoceptive sensations by conducting a pseudo-manipulation experiment (simulation) of reduced and exaggerated interoceptive sensations. In addition, we aimed to compare the effects on decision-making behaviors of modulating RL parameters previously associated with depression with the effects of changes in interoceptive processing within the HRL model. In particular, the delay discounting parameter has not previously been examined in detail with the HRL model. This approach allows us to discuss the similarities (equifinality) between the behavior of the model with modulated delay discounting and its behavior under modulation of other HRL parameters, such as interoceptive information processing. Through this study, we hope to offer a systems-level explanation of the associations among phenomena previously considered as changes in RL parameters, changes in interoceptive processing, and nutrient homeostasis in depression.
2 Materials and methods
2.1 HRL model
In this study, nutritional homeostasis was modeled using the HRL model. This model assumes that homeostasis is an RL process, in which the minimization of deviations in internal states from an optimal level (i.e., homeostasis) is treated as a computation for maximizing the sum of rewards. In the HRL model, a multidimensional metric space in which each dimension represents an internal state (such as body temperature, blood glucose density, water balance, and sodium level) is defined as a “homeostatic space.” In this homeostatic space, the drive function D(Ht) is defined as the distance between the internal state of the i-th component (e.g., water or sodium) at time t, Hi,t, and the ideal internal state H*i (Equation 1):
$$D(H_t) = \sqrt[m]{\sum_{i=1}^{N} \left| H_i^{*} - H_{i,t} \right|^{n}} \tag{1}$$
where m and n are free parameters that define the distance and N is the total number of dimensions for the internal states (e.g., water and sodium). When the internal state approaches the ideal state, the drive function decreases. Based on this drive function, the reward rt is determined as the change in the value of the drive function from time t to time t + 1. Specifically, to implement nutrient intake, the internal state at time t + 1 should contain the amount of nutrient intake at time t, defined as Kt (Equation 2):
$$r_t = D(H_t) - D(H_{t+1}) = D(H_t) - D(H_t + K_t) \tag{2}$$
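As a minimal sketch of Equations 1 and 2, the following Python snippet computes the drive and the resulting reward. It assumes the drive has the form used by Keramati and Gutkin (2014), that is, the m-th root of the summed n-th powers of the absolute deviations from the setpoint, with (m, n) = (3, 4) and H* = 200 as stated in this paper. The function names `drive` and `reward` and the intake size of 10 are hypothetical choices made only for illustration.

```python
def drive(h, h_star, m=3.0, n=4.0):
    """Drive D(H): distance between internal state h and setpoint h_star.

    h and h_star are sequences with one entry per internal-state dimension.
    """
    return sum(abs(hs - hi) ** n for hi, hs in zip(h, h_star)) ** (1.0 / m)


def reward(h, k, h_star, m=3.0, n=4.0):
    """Reward r_t = D(H_t) - D(H_t + K_t): the reduction in drive caused by intake k."""
    h_next = [hi + ki for hi, ki in zip(h, k)]
    return drive(h, h_star, m, n) - drive(h_next, h_star, m, n)


h_star = [200.0]          # setpoint used in this paper
intake = [10.0]           # hypothetical intake size

print(reward([100.0], intake, h_star))  # deficient state: intake reduces drive
print(reward([200.0], intake, h_star))  # at the setpoint: intake overshoots
```

Under these assumptions, the same intake is rewarding when the agent is below the setpoint and punishing when it is already at the setpoint, which is the property that the homeostatic behavior described below relies on.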
As described later, in the HRL model, the intake of taste stimuli (K̂t) can be modeled as a predictor of the actual nutrient intake (Kt). Under this assumption, the reward is calculated as follows (Equation 3):
$$r_t = D(H_t) - D(H_t + \hat{K}_t) \tag{3}$$
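Q-learning was used to model the RL process. In this model, the value Qt(at) of the selected action at (e.g., intake or do nothing) is updated based on the temporal difference (TD) error (δt) (Equation 4):
$$Q_{t+1}(a_t) = Q_t(a_t) + \alpha_Q \delta_t, \qquad \delta_t = r_t + \gamma \max_{a'} Q_t(a') - Q_t(a_t) \tag{4}$$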
where αQ is the learning rate for Qt(a), a’ is the next candidate action, δt is the TD error, and γ is the discount rate. Action selection depends on the relative magnitudes of the values of each action (Q-value) according to the softmax function (Equation 5):
$$P_t(a_k) = \frac{\exp\left(\beta Q_t(a_k)\right)}{\sum_{j} \exp\left(\beta Q_t(a_j)\right)} \tag{5}$$
where Pt(ak) is the probability that action ak is selected at time t, and β is the inverse temperature, a parameter controlling the randomness of action selection. The initial Q-values of both actions were set to 0; therefore, the first action is chosen at random. When the agent chooses intake, the internal state increases by Kt, a constant that defines the amount of intake. When do nothing is chosen, Kt is set to 0. At t = 0, representing the hungry state, the initial internal state (H0 = 100) was far lower than the ideal state (H* = 200). At this stage, the drive function is large because it corresponds to the distance from the internal state at time t (Ht) to the ideal state (H* = 200; Equation 1). If the agent performs the intake behavior at this moment, the internal state is expected to increase and the drive function is expected to decrease, resulting in a positive reward (Equation 3). In addition, the natural decrease in nutritional balance was implemented as follows using the temporal decay constant τ (Equation 6):
The reward value was calculated as follows (Equation 7):
After updating the Q-values via Q-learning, the agent selects the next action. As previously mentioned, the HRL model assumes that K̂t, the cognition of the stimulus based on the reward from the action at, is renewed through learning. In this study, the following equation was used to update K̂t (Equation 8):
The detailed values of the simulation parameters are listed in Supplementary Table S1. Specifically, as commonly used in previous homeostatic reinforcement learning studies, we employed (m, n) = (3, 4) (Keramati and Gutkin, 2014). The impact of this choice on the modulation of interoception, which is a central theme of this study, is addressed in the Discussion section.
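The Q-learning update and softmax action selection described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses the standard Q-learning TD error for a task with a single external state, and the function names and the numerical values of `alpha_q`, `gamma`, and `beta` are hypothetical (the values actually used are given in Supplementary Table S1).

```python
import math
import random


def softmax_probs(q_values, beta):
    """Action probabilities proportional to exp(beta * Q), computed stably."""
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def choose_action(q_values, beta, rng):
    """Sample an action index from the softmax distribution."""
    probs = softmax_probs(q_values, beta)
    return rng.choices(range(len(q_values)), weights=probs)[0]


def q_update(q_values, action, r, alpha_q, gamma):
    """Apply one Q-learning step in place and return the TD error."""
    delta = r + gamma * max(q_values) - q_values[action]
    q_values[action] += alpha_q * delta
    return delta


rng = random.Random(0)
q = [0.0, 0.0]                       # a0: do nothing, a1: intake
a = choose_action(q, beta=1.0, rng=rng)   # equal Q-values: random choice
delta = q_update(q, a, r=10.0, alpha_q=0.1, gamma=0.9)
print(a, delta, q)
```

With both Q-values initialized to 0, the softmax assigns equal probability to each action, which matches the statement that the first action is chosen at random.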
2.2 Nutritional homeostasis: intake-after-food-restriction task
We performed an intake-after-food-restriction task to investigate the applicability of the HRL model to nutritional homeostasis (Figure 1). The computational algorithm is illustrated in Figure 1A. For simplicity, only one nutritional state was considered. A single external state (S0) and two actions, do nothing (a0) and intake (a1), were defined (Figure 1B).
Figure 1. Nutritional homeostatic maintenance according to the homeostatic reinforcement learning (HRL) model. (A) Schematic of the computational process of the HRL model. Herein, η represents the activity of interoceptive processing, β denotes the inverse temperature, and γ signifies delay discounting; η reflects the degree to which the difference between the internal state and the ideal state is overestimated or underestimated, characterizing the nature of interoceptive information processing until it is conveyed to the reward system. The inverse temperature β indicates the extent to which decision-making reflects the learning history. The delay discounting parameter γ is used in updating the state-action values. (B) Definition of the state and two actions in intake-after-food-restriction simulations. (C) Example of homeostatic behavior. Changes in the internal nutritional state (H), value of each action (Q-value), selected actions (a), probability of intake (P(Intake)), and magnitude of reward (R) are plotted. Solid lines indicate the results of a single trial. Dotted lines in the panel of the internal state indicate the ideal point (H* = 200) of the nutrient. In the panel related to actions (a), action 1 indicates “intake,” and action 0 indicates “do nothing.” At the beginning of the simulation, internal nutritional state was 100, and Q-values for each action were set to 0. After several random selections of action, Q-value of nutrient intake was increased, and the internal nutritional state quickly reached the ideal point, maintaining homeostatic regulation of behavior.
The following formula was adopted to implement the alteration in interoception (Equation 9):
where η is a parameter representing the modulation of interoception, a constant that scales the difference between the ideal and actual internal states. The validity of this implementation is discussed in the Discussion section.
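To make the role of η concrete, the sketch below applies it as a multiplicative gain on the deviation from the setpoint before the drive is computed. This form is an assumption: Equation 9 is not reproduced in this text, and the paper only states that η is a constant acting on the difference between the ideal and actual internal states. The function names, the η values (0.5, 1, 2), and the intake size are hypothetical.

```python
def modulated_drive(h, eta, h_star=200.0, m=3.0, n=4.0):
    """Drive computed from an eta-scaled deviation (assumed form of Equation 9)."""
    return (abs(eta * (h_star - h)) ** n) ** (1.0 / m)


def modulated_reward(h, k, eta, h_star=200.0):
    """Reward for intake k when the deviation is perceived through gain eta."""
    return modulated_drive(h, eta, h_star) - modulated_drive(h + k, eta, h_star)


for eta in (0.5, 1.0, 2.0):          # hypothetical low, control, high interoception
    print(eta, modulated_reward(100.0, 10.0, eta))
```

Under this assumed form, the drive scales with η raised to the power n/m, so the same physical intake produces a smaller reward when η < 1 and a larger reward when η > 1.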
2.3 Mountain-climbing task
Mountain tasks have been utilized to assess whether individuals prioritize short-term, small rewards or long-term, large rewards that can only be obtained by enduring sequences of punishments (e.g., “Mountain car”) (Sutton and Barto, 2018). In this study, a variant of this task was employed. The mountain-climbing task is designed to evaluate the balance of sensitivity between short-term and long-term rewards (Figure 2). In the task, the optimal behavior to maximize rewards involves choosing actions that may incur short-term penalties but lead to greater rewards over the long term. Taking an agent that demonstrates a balanced approach to rewards as the healthy model, this study aimed to examine how changes in parameters, such as those related to internal states, influence the agent’s behavior and reward system activity. The task comprised 8 states (S0-S7; Figure 2A). For each condition (control and low interoception), the experiment consisted of 30 trials, and each trial included 15 episodes (Figure 2B).
Figure 2. Altered interoception in the mountain-climbing task. (A) Definitions of eight states and two actions at each state in the mountain-climbing task. (B) Time series relationships among variables. In all 15 episodes, the agent started its actions from S0. Upon reaching S7 and performing action a71 (major feeding), the episode concluded, and the agent moved to S0 to begin the next episode. The external state was reset to S0, and the internal state was initialized to 100, while the state-action values and the predicted changes in the internal state due to actions were carried over. Completing 15 episodes in this manner constituted one agent’s mountain-climbing task, which was conducted across 30 agents (30 trials). Results from performing this mountain-climbing task under different conditions, such as variations in interoceptive modulation, were compared across multiple metrics. (C) S7-rate of the control, low interoception, and high interoception models. (D) Total timesteps per trial. (E) Total number of minor intakes. (F) Trajectories of each variable in the 5th episode of the control, low interoception, and high interoception models. Significance was determined using the Student’s t-test (B,C) or Wilcoxon rank-sum test (D) (30 trials). ***p < 0.001; N.S., not significant.
At the beginning of each episode in this task, the agent started at S0 with an internal state of 100 (H0 = 100). The agent can choose intake at only two states: the bottom state (S0), where it can consume a small amount, and the summit state (S7), where it can consume a large amount. At S0, the agent has two options: a00 (small intake) or a01 (moving horizontally). In S1-S6, the agent can climb or descend (and move horizontally only in S1). It is important to note that climb actions (a10, a20, …, a60) result in a constant decrease in the internal state, which acts as a punishment representing the nutritional cost of climbing. Actions other than climbing decrease the internal state only through the natural decay governed by the temporal decay constant (τ), whereas climb actions decrease it through both this decay and the additional constant cost. Once the agent selects a70 (large intake), the episode ends, and the state-action values (Q) and the predicted increase in the internal state (K̂) are carried over to the next episode. The initial state of each subsequent episode in the mountain-climbing task was consistently set to S0, and the episodes were iterated 15 times. One trial of the mountain-climbing task was completed when 15 large-intake actions (a70) had been performed (Figures 2A,B). Each condition consisted of 30 trials (Figure 2B). A different set of free parameters was used in the mountain-climbing task than in the intake-after-food-restriction task to ensure that each trial finished within a reasonable number of time steps (Supplementary Tables S1, S2).
3 Results
3.1 Nutritional homeostasis: intake-after-food-restriction task
First, we demonstrated the behavior of a healthy control model using the intake-after-food-restriction task (Figure 1). The detailed process is described in the Materials and Methods section.
At the beginning of the simulation, the internal nutritional state was set to a value far from the ideal state (corresponding to a fasting state), and the Q-values for each action were set to 0. After several random selections of actions, the Q-value of nutrient intake increased, and the internal nutritional state quickly reached the ideal point, maintaining the homeostatic regulation of behavior. The Q-value of do nothing decreased, and the Q-value of intake increased and remained relatively high for some time, even after the internal state exceeded the set point (Ht > H* = 200). Subsequently, the frequency of doing nothing increased because continued punishment from the excess internal state made the Q-value of doing nothing greater than that of intake. Continued selection of do nothing caused the internal state to decline due to natural decay. In response to this decline, the Q-value of intake again exceeded that of doing nothing, resulting in the maintenance of homeostatic regulation (Figure 1C). This simulation expressed one aspect of nutritional homeostatic maintenance behavior: the frequency of intake increased after experiencing nutritional deficiency and decreased after achieving sufficiency. Starting from an energy-deficient state, individuals restore an appropriate energy level through food consumption. Thereafter, they consume food if energy levels decrease and stop consuming after reaching a certain threshold, thereby maintaining homeostasis (where a state of low drive over a certain period is considered normal). Therefore, in Figure 1C, the behavior fluctuating between “do nothing” and “intake” after the internal state has converged to the target state represents a model of normal behavior. In contrast, under abnormal internal state conditions, the agent still primarily aims to maintain homeostasis, but deviations in homeostatic maintenance or reward system activity from the norm are anticipated.
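The dynamics described above can be explored with a compact simulation loop. The sketch below is not the authors' code and rests on several assumptions that are not confirmed by this text: the natural decay is taken to be H ← (1 − 1/τ)·H + K, the intake prediction K̂ is updated with a simple delta rule, and every numerical parameter (`beta`, `gamma`, `alpha_q`, `alpha_k`, `tau`, `intake`) is a hypothetical placeholder rather than a value from Supplementary Table S1.

```python
import math
import random


def drive(h, h_star=200.0, m=3.0, n=4.0):
    """One-dimensional drive with (m, n) = (3, 4) and setpoint 200."""
    return (abs(h_star - h) ** n) ** (1.0 / m)


def simulate(steps=100, beta=0.1, gamma=0.9, alpha_q=0.1, alpha_k=0.2,
             tau=100.0, intake=10.0, seed=0):
    """Run one episode; returns a list of (t, action, internal_state, reward)."""
    rng = random.Random(seed)
    h = 100.0                 # hungry initial state (H0 = 100)
    q = [0.0, 0.0]            # a0: do nothing, a1: intake
    k_hat = [0.0, 0.0]        # predicted intake per action (assumed to start at 0)
    decay = 1.0 - 1.0 / tau   # assumed form of the natural decay
    log = []
    for t in range(steps):
        # Softmax action selection (shifted by max(q) for numerical stability).
        q_max = max(q)
        weights = [math.exp(beta * (x - q_max)) for x in q]
        a = rng.choices([0, 1], weights=weights)[0]

        # Reward from the predicted change in drive.
        k = intake if a == 1 else 0.0
        r = drive(h) - drive(decay * h + k_hat[a])

        # Q-learning update with a single external state.
        delta = r + gamma * max(q) - q[a]
        q[a] += alpha_q * delta

        # Delta-rule update of the intake prediction (assumed form).
        k_hat[a] += alpha_k * (k - k_hat[a])

        # Actual internal-state transition.
        h = decay * h + k
        log.append((t, a, h, r))
    return log


episode = simulate()
print(sum(a for _, a, _, _ in episode), "intakes in", len(episode), "steps")
```

Because the parameter values are placeholders, the trajectories this sketch produces should not be expected to reproduce Figure 1C quantitatively; it is meant only to show how the pieces of the model fit together in one time step.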
3.2 Altered interoception in the intake-after-food-restriction task
Next, we investigated the impact of altered interoception on feeding behavior and nutritional homeostasis using the intake-after-food-restriction task. In the simulation, we assessed the simple behavioral characteristics of each model under nutrient deprivation over 100 time points (Figure 3) and quantified the changes in feeding behavior and nutritional homeostasis induced by altered interoception. As shown in Equation 9, altered interoception was simulated by varying the value of the parameter η, a constant that scales the impact of the difference between the setpoint (H* = 200) and the actual internal state.
Figure 3. Altered interoception in the intake-after-food-restriction task. (A) Average of reward with all intake behavior in a single episode determined from the simulations with altered interoception models. (B) Sum of punishments during one episode with altered interoception. (C) Sum of drive in one episode determined from the simulated lesion models with altered interoception. (D) Total intake in an episode with altered interoception. (E) Example of transition of variable from episodes of the control, low interoception, and high interoception models. In panels (A–D), Student’s t-test or Welch’s t-test was used for between-group comparisons after Levene’s test. **p < 0.01 and ***p < 0.001 (N = 40).
Figure 3A shows the average reward per intake (i.e., the total reward obtained from intake within an episode divided by the number of intakes). Figure 3B shows the sum of the punishments in an episode. In the HRL model, Rt is defined as the change in drive resulting from a particular action. If the internal state moves toward the ideal state, Rt > 0; conversely, if the action moves the internal state away from the ideal state, Rt < 0. Herein, Rt when Rt < 0 is defined as “punishment,” and in Figures 3B, 4A,C, the total value of this punishment is plotted. In the low η model, the average reward per intake decreased (Figure 3A), and the sum of punishment increased (Figure 3B). This occurred because the absolute value of the drive (D) obtained by the behavior was smaller (Figure 3E), and consequently, the absolute value of the reward (R) (i.e., the difference in drive (D)) also became smaller (Figure 3E). In addition, the low η model reduced the frequency of the normal action selection (i.e., intake in the case of nutritional deficiency; Figure 3D) because the input of nutritional deficiency into the model was reduced. In the “Actions” panel of Figure 3E, action a1 represents intake, and it can be observed that the frequency of intake decreased in the low interoception model. Consistent with the decreased intake frequency, the total drive per episode increased (Figure 3C).
Figure 4. Alterations in reinforcement learning (RL) parameters in the intake-after-food-restriction task. (A) Average rewards per intake behavior in single episodes and sum of punishment in each episode determined from the simulations with altered inverse temperature (β). (B) Example transitions of variables of the altered β models. (C) Average rewards with all intake behavior in single episodes and sum of punishment in each episode determined from simulations with altered discount ratio (γ). (D) Example transitions of variables of the altered γ models. (E) Total number of intakes in single episodes determined from simulated lesion models with altered β. (F) Total number of intakes in single episodes determined from models with altered γ. (G) Sum of drive during single episodes of models with altered β. (H) Sum of drive during single episodes of models with altered γ. ***p < 0.001 (N = 40); N.S., not significant. In panels (A), (C), and (E–H), Student’s t-test or Welch’s t-test was used after Levene’s test.
In contrast to the low η model, the high η model exhibited an increase in the average reward per intake (Figure 3A) and a decrease in the sum of punishment (Figure 3B), reflecting the opposite mechanism to that of the low η model (Figure 3E).
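The per-episode summary measures used in this section (average reward per intake, sum of punishment, sum of drive, and number of intakes) can be computed from logged time series as sketched below. The function name, the dictionary keys, and the toy numbers are hypothetical; the sum of punishment is returned as a signed (negative) total, which is an assumption about the plotting convention.

```python
def summarize_episode(actions, rewards, drives, intake_action=1):
    """Summarize one episode from per-step actions, rewards, and drive values."""
    intake_rewards = [r for a, r in zip(actions, rewards) if a == intake_action]
    n_intakes = len(intake_rewards)
    return {
        # Total reward obtained from intake divided by the number of intakes.
        "avg_reward_per_intake": (sum(intake_rewards) / n_intakes
                                  if n_intakes else None),
        # Punishment is any reward below zero, summed over the episode.
        "sum_punishment": sum(r for r in rewards if r < 0),
        "sum_drive": sum(drives),
        "total_intakes": n_intakes,
    }


# Toy episode with made-up values, for illustration only.
summary = summarize_episode(
    actions=[1, 0, 1, 0],
    rewards=[4.0, -1.0, 2.0, -0.5],
    drives=[10.0, 8.0, 5.0, 6.0],
)
print(summary)
```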
3.3 Altered interoception: mountain-climbing task
To investigate the impact of alterations in interoception on the balance between minor immediate and major long-term rewards, we used the mountain-climbing task (Figures 2A,B). To assess changes in behavior, three types of measures were used: (1) S7-rate, which is the number of time steps spent in S7 divided by the total number of time steps within a single trial; (2) total-timesteps, which is the sum of all time steps within a single trial; and (3) a00-timesteps, which is the total number of times that a small intake (a00) was selected during a single trial. A larger S7-rate (Figure 2C) and smaller total-timesteps (Figure 2D) and a00-timesteps (Figure 2E) characterized the priority for long-term, large rewards. The overestimated interoception agent showed a higher S7-rate (Figure 2C), fewer total-timesteps (Figure 2D), and lower a00-timesteps (Figure 2E). This suggests that agents prioritize reaching the summit and receiving a large reward, even if this means short-term nutritional loss, over immediately receiving a small reward. In the underestimated interoception condition, the HRL models showed no significant changes in S7-rate (Figure 2C) and total-timesteps (Figure 2D) but decreased a00-timesteps (Figure 2E), suggesting that the Q-values of actions that moved away from the end of the episode were larger (Figure 2F; Q0-value).
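The three measures defined above can be computed directly from a logged trial. In the sketch below, a trial is represented as a list of (state, action) pairs, one per time step, with states numbered 0 to 7 and the small intake a00 encoded as action index 0 taken in state 0; this encoding, the function name, and the toy trajectory are hypothetical.

```python
def mountain_metrics(trajectory, summit_state=7):
    """Compute S7-rate, total-timesteps, and a00-timesteps for one trial.

    trajectory: list of (state, action) pairs, one entry per time step.
    """
    total = len(trajectory)
    if total == 0:
        raise ValueError("trajectory must contain at least one time step")
    s7_steps = sum(1 for s, _ in trajectory if s == summit_state)
    a00_steps = sum(1 for s, a in trajectory if s == 0 and a == 0)
    return {
        "s7_rate": s7_steps / total,
        "total_timesteps": total,
        "a00_timesteps": a00_steps,
    }


# Toy trajectory: one small intake, a climb to the summit, then back at S0.
toy = [(0, 0), (0, 1), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (0, 0)]
print(mountain_metrics(toy))
```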
3.4 Altered RL parameters
As mentioned earlier, we compared the impact of altering RL parameters that have been linked to depression, namely the inverse temperature (β) and the discount rate (γ), with the effects of modifying interoceptive processing in the HRL model (Figure 4). First, we conducted the intake-after-food-restriction task while manipulating the inverse temperature parameter β, which governs how strongly the Q-values influence action selection (Schweighofer and Doya, 2003). When β was decreased in the HRL model, the reward gained from a single intake increased, while the cumulative punishment also increased significantly (Figures 4A,B). Although this outcome may initially appear counterintuitive, it can be explained as follows. As the normal behavioral choice in the deficient state (i.e., intake behavior) was reduced, the internal state remained far from the set point (Figure 4E). The agent therefore spent more time far from the ideal value of the internal state in the homeostatic space, that is, at positions where the change in drive per intake was greater, resulting in greater rewards being obtained from a single intake (Figure 4B). In fact, when we assessed the total drive within a single episode, it was observed to increase as β decreased (i.e., when homeostasis was altered; Figure 4G).
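The effect of β on action selection can be illustrated numerically. With only two actions, the softmax reduces to a logistic function of β multiplied by the Q-value gap, so lowering β pulls the probability of intake toward 0.5 even when intake has the higher Q-value. The Q-values and β values below are hypothetical and chosen only to show the trend.

```python
import math


def p_intake(q_intake, q_nothing, beta):
    """Two-action softmax: probability of choosing intake."""
    return 1.0 / (1.0 + math.exp(-beta * (q_intake - q_nothing)))


# Same Q-value gap, hypothetical control and low-beta settings.
for beta in (1.0, 0.1):
    print(beta, round(p_intake(2.0, 0.0, beta), 3))
```

In this toy case the Q-values are identical across conditions; only the mapping from Q-values to choice probabilities changes, which is the sense in which a low β model learns the value of intake normally but selects it less often.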
We also manipulated γ in the intake-after-food-restriction task. When γ was decreased, the agent made action selections emphasizing immediate rewards or punishments rather than future ones. This manipulation of the HRL model resulted in a slight increase in the average reward for each intake (Figure 4C), an increase in the sum of drive in a single episode (Figure 4H), and a decrease in total intake (Figure 4F). This is because decreasing γ reduced the absolute value of the second term in Equation 4 (δ: TD error), thereby reducing the range of possible values for δ and the range of possible Q-values. Consequently, the differences between the Q-values of the two actions were reduced, as in the low β model (Figures 4B,D), affecting the reward and drive in a manner similar to that in the low β model.
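The claim that a smaller γ narrows the range of Q-values can be checked in a deliberately simplified setting: a single action that always yields the same reward r and bootstraps from its own value. In that case the Q-learning update converges to r / (1 − γ). The sketch below is a toy illustration of this fixed point, not the HRL task itself, and the values of `alpha_q`, `r`, and γ are hypothetical.

```python
def converged_q(r, gamma, alpha_q=0.1, steps=2000):
    """Iterate Q <- Q + alpha_q * (r + gamma * Q - Q) from Q = 0."""
    q = 0.0
    for _ in range(steps):
        delta = r + gamma * q - q   # TD error with a single self-bootstrapping action
        q += alpha_q * delta
    return q


for gamma in (0.9, 0.5):            # hypothetical control and low-gamma values
    print(gamma, converged_q(1.0, gamma), 1.0 / (1.0 - gamma))
```

For the same reward, the converged value is five times smaller at γ = 0.5 than at γ = 0.9, so differences between action values shrink accordingly, which has an effect on choice probabilities comparable to lowering β.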
We further examined the effects of changing γ and β of the HRL model in the mountain-climbing task. Models with a decreased discount rate γ consistently showed a lower frequency of summit attainment across all metrics compared to the control group (Figures 5A–C). Notably, these models experienced an increase in short-term rewards and stayed at lower altitudes (Figure 5C), underscoring the tendency of smaller γ values to prioritize immediate rewards and punishments over distant future rewards. This observation confirmed the simulation’s assumption that models with lower γ behave more impulsively, thereby supporting the validity of the mountain-climbing task. In the low γ group shown in Figure 5D, the darkest blue line represents the Q-value for a00, and the second darkest line indicates the Q-value for a10, suggesting that the differences in state-action values reflect the elevation in short-term state-action values. Additionally, the increased frequency of stays at states S0 and S1 in this group, as shown in Figure 5D, indicates decreased climbing performance due to the relative rise in the state-action values of lower states in the low γ group. Models with low β also demonstrated a reduced frequency of reaching the summit across all measures compared to the control group (Figures 5A–C). This can be attributed to the increased randomness in behavior caused by low β, leading to less frequent selection of the optimal behaviors necessary for achieving rewards at the summit, especially during periods of nutritional deficiency.
Figure 5. Alterations in RL parameters in the mountain-climbing task. (A) S7-rate of the control, low γ, and low β models, calculated as in Figure 2C. (B) The same measure as in Figure 2D. (C) The same measure as in Figure 2E. (D) Trajectories of each variable in the final episode of the low γ and low β models. N.S., not significant; **p < 0.01 and ***p < 0.001.
4 Discussion
In this study, we attempted to interpret the changes in homeostatic maintenance behaviors and reward system activity by conducting pseudo-manipulation simulations of reduced and exaggerated interoception using the HRL model. Additionally, we compared the effects of modifications in RL parameters associated with depression on decision-making behavior with the effects of changes in interoceptive processing within the HRL model. Through this comparison, we aimed to provide a systems-level explanation of the relationship between phenomena previously considered to be changes in RL parameters, changes in interoceptive processing, and nutrient homeostasis.
In the low η model, the difference between the ideal state and the actual internal state was mitigated, resulting in a smaller reward per intake behavior (Figure 3A). This suppressed the learning of the Q-value for optimal behavior (intake; Figure 3E). As a result, the agent spent more time in a state in which the internal conditions deviated from the optimal values. Due to this prolonged deviation, the accumulated drive remained high, reflecting the failure of the system to regulate itself effectively (Figure 3C). However, no significant changes were observed during the mountain-climbing task (Figure 2). In the high η model, in contrast, the difference between the ideal state and the actual internal state was exaggerated, resulting in a larger reward per intake behavior (Figure 3A). This made it easier to learn the Q-value for optimal behavior, and the total drive decreased quickly (Figure 3E). However, this model demonstrated difficulties in acquiring large distant rewards (completing the task) in the mountain-climbing task (Figures 2C–F). This was due to the tendency to overestimate immediate punishment before a large reward and the small rewards obtained away from the large reward (Figure 2).
The observations in these models were qualitatively similar to depressive symptoms. For example, in the reduced interoception model, we observed a decrease in the range (R, Q) of reward system activity (Figures 3A,E). This can be understood as corresponding to the pattern of reduced activity in the insular cortex, which processes interoceptive information, and in the reward system of patients with depression with reduced appetite, as revealed by fMRI-based research (Simmons et al., 2016). Furthermore, the increase in punishment across tasks (Figure 3B), the decrease in optimal intake behavior in the intake-after-food-restriction task (Figure 3D), and the increase in total drive (Figure 3C) can be understood as corresponding to depressed patients’ subjective feelings of inadequate internal state maintenance and sustained physical strain. In contrast, in the increased interoception (high η) model, increased reward system activity (Figures 3A,E) was observed. This can be understood as a pattern similar to the increased insular cortex activity and increased reward system activity in patients with depression and increased appetite, shown in the same study (Simmons et al., 2016). This increased interoception model demonstrated appropriate behavior in the intake-after-food-restriction task but prevented the completion of the mountain-climbing task. This can be understood to be similar to the deficits in long-term reward-oriented behavior observed in depression. In the literature, body mass index (BMI) was comparable between the groups of subjects who showed clear contrasts in the activities of the insular cortex and reward systems. This is consistent with the fact that the internal state (H) did not differ from that of the control model in either the increased or decreased interoception models. Thus, these manipulated interoception models may represent an aspect of the pathophysiology of a subtype of depression characterized by decreased or increased appetite.
In addition, we examined the effects of RL parameters, which are often discussed in relation to depression. Previous behavioral modeling studies of depression using RL models, such as simple learning, have highlighted increased behavioral randomness and low β-values (Kunisato et al., 2012; Blanco et al., 2013; Huys et al., 2013; Rupprechter et al., 2018), alongside a tendency to overestimate short-term rewards while underestimating long-term rewards, in association with a reduced γ-value (Takahashi et al., 2008; Dombrovski et al., 2011; Cáceda et al., 2014; Imhoff et al., 2014; Mies et al., 2016). Notably, the HRL model exhibited trends similar to those of the conventional RL model. In the low β model, the value of intake behavior increased normally (Figure 4B), but when the action probability (P) was calculated from the Q-values, the relative magnitude of the two behavioral values was underestimated, so that the appropriate behavior, intake, was selected less frequently (Figure 4B). Similarly, the low γ model showed the same trend as the conventional RL model. That is, the future value of the behavior was underestimated, and the prediction of immediate reward or punishment strongly influenced decision-making. Consequently, the climbing task required more time steps to reach the summit (Figure 5).
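The roles of β and γ described here can be illustrated with a short sketch. It assumes the standard softmax action selection and exponential discounting commonly used in RL; the Q-values, the reward sequence, and all parameter values are hypothetical and were chosen only to make the qualitative effect visible, not to reproduce the simulations of this study.

```python
import math


def softmax_policy(q_values, beta):
    """Action probabilities from Q-values with inverse temperature beta."""
    q_max = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def discounted_return(rewards, gamma):
    """Sum of future rewards, each discounted by gamma per time step."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Hypothetical Q-values: index 0 = intake, index 1 = no intake.
q = [1.0, 0.0]
p_control = softmax_policy(q, beta=5.0)
p_low_beta = softmax_policy(q, beta=0.5)

# Hypothetical climb: nine steps of small punishment, then a large reward.
climb = [-1.0] * 9 + [50.0]
v_control = discounted_return(climb, gamma=0.95)
v_low_gamma = discounted_return(climb, gamma=0.6)

print(f"P(intake): control={p_control[0]:.3f}, low beta={p_low_beta[0]:.3f}")
print(f"Value of climbing: control={v_control:.2f}, low gamma={v_low_gamma:.2f}")
```

With identical Q-values, a lower β brings the probability of choosing intake closer to chance, and with an identical reward sequence, a lower γ makes the immediate punishments outweigh the distant large reward.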
The behaviors of these RL parameter modulation models have both similarities to and differences from the results of the interoception modulation models. First, the low η, low β, and low γ models were similar in showing increased drive (Supplementary Figure S1). These results are due to a decrease in the frequency of optimal behavior, resulting from either the smaller range of rewards and Q-values (low η and low γ) or difficulty in reflecting the relative magnitude of the Q-values in action probabilities (low β). For the high η and low γ models, whose performance declined in the mountain-climbing task, responses to immediate rewards and punishments increased, and the reward for each intake was large. However, in the intake-after-food-restriction task, the drive increased in the low γ model but decreased in the high η model, where high η was more adaptive. Although the high η, low β, and low γ models showed a similar increase in reward per action, two different mechanisms were involved: high η overestimated the reward for a change in internal state of a certain magnitude, whereas low β and low γ did not alter the evaluation of rewards for internal states compared to the control. Instead, in the low β and low γ models, more time was spent in regions where the internal state had significantly deviated, and the punishment for failing to choose intake was high.
These results indicate that individuals showing similar results in one task may have different underlying mechanisms and exhibit different behaviors in another task. Therefore, the influence of homeostasis and interoception should be considered when discussing the relationship between RL and depressive symptoms related to nutrition. Although the current model focuses on the deterministic modulation of interoception, actual eating behavior is possibly influenced by various factors, such as hormones and visual stimuli, over different time scales. Thus, further detailed computational modeling is warranted to understand physiological homeostasis and its mechanisms in psychiatric disorders.
In this simulation, the value of the deficient internal state was fixed (H0 = 100); however, in a real organism, this value is not constant. For instance, in a model starting from a more deficient internal state, the reward from intake in the intake-after-food-restriction task would be greater, as would be the reward from minor feeding in the mountain-climbing task.
In this study, we introduced a deterministic modulation of interoception, where the discrepancy between the ideal state and the actual internal state is scaled by a constant factor. This modulation is likely related to the shape of the homeostatic space defined by the free parameters m and n. For example, a model that overestimates the discrepancy between the ideal state and the actual state can be interpreted as having a steeper gradient in the homeostatic space. The relationship and physiological significance of η, m, and n are important questions, whose theoretical understanding remains insufficient. Therefore, further studies are required, along with the accumulation of empirical data.
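One way to make this relationship concrete is the following sketch. It assumes the drive function of Keramati and Gutkin (2014) and assumes that η multiplies the discrepancy inside that function; both are assumptions made here for illustration, since the exact formulation is not restated in this section. The symbols h_i^{*} (ideal value of the i-th internal variable) and h_{i,t} (its actual value at time t) are notation introduced for this sketch.

```latex
\begin{aligned}
D_\eta(H_t) &= \left( \sum_{i} \left| \eta \, (h_i^{*} - h_{i,t}) \right|^{n} \right)^{1/m}
             = \eta^{\,n/m} \left( \sum_{i} \left| h_i^{*} - h_{i,t} \right|^{n} \right)^{1/m}
             = \eta^{\,n/m} \, D(H_t), \qquad \eta > 0, \\
r_t &= D_\eta(H_t) - D_\eta(H_{t+1}) = \eta^{\,n/m} \left[ D(H_t) - D(H_{t+1}) \right].
\end{aligned}
```

Under this assumed form, a constant η acts as a uniform gain of η^{n/m} on the drive and on every reward, which steepens (η > 1) or flattens (η < 1) the homeostatic space without changing its curvature, whereas m and n change the curvature itself.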
In this study, we fixed the interoception parameter of the HRL model (η) to biased values to simplify the investigation of the effects of parameter modulation. However, in the real world, such parameter modulation may dynamically change in response to environmental pressures. In fact, evidence is accumulating in interoception research that supports the idea that interoceptive modulation can be explained in terms of prediction uncertainty or communication accuracy, which fluctuate due to environmental influences (Smith et al., 2020; Young et al., 2021). Similarly, using models that address dynamic interoceptive modulation in response to environmental changes could help elucidate the workings of biological systems and provide valuable clinical insights.
This study examined changes in decision-making tendencies that reflect internal states and may represent an aspect of decision-making in other mental disorders. For example, impulsive decision-making observed in disorders such as obsessive-compulsive disorder (OCD) can be represented as an increase in delay discounting within reinforcement learning (Amlung et al., 2019). However, much remains unclear about how homeostatic regulation functions are involved in these conditions. Focusing on nutritional homeostasis, there is evidence suggesting substantial overlap between OCD and eating disorders (Halmi, 2004), highlighting the importance of deepening our systemic understanding. In OCD, reinforcement learning modeling has progressed, showing that interventions suggested by biologically-based model research can be effective (Sakai et al., 2022). Investigating the connection between these models and nutritional homeostasis may yield practical interventions for impulsive eating behaviors.
Compulsivity has been identified as a transdiagnostic factor across various psychiatric disorders, yet many aspects of the underlying mechanisms remain unclear (Gillan et al., 2017). This study suggests that distinct mechanisms may underlie behaviors viewed as food intake actions aimed at obtaining short-term rewards (an “equifinal” symptom) and that these mechanisms may stem from factors such as interoceptive modulation, inverse temperature, or altered time discounting. This insight points to the potential for identifying practical, personalized interventions.
The results of the current computational model suggest that when individuals make myopic decisions in situations that require rational calculation of delayed rewards from nutritional intake, such as the mountain-climbing task, two underlying mechanisms may be at play: a decreased delay discount factor (γ), that is, steeper delay discounting, or excessive interoception. According to previous research, pharmacological interventions such as 5-hydroxytryptamine (5-HT) 2A receptor blockade may be effective in addressing behaviors characterized by myopic tendencies resulting from steeper delay discounting (Ardayfio et al., 2008; Gillan et al., 2017). Thus, if steeper delay discounting is the underlying mechanism, such pharmacotherapy could be a promising treatment. However, if excessive interoception is the underlying cause, approaches that address sensory hypersensitivity may be beneficial, although further evidence is required before this approach can be discussed in specific terms.
Data availability statement
The original contributions presented in the study are included in the article/Supplementary material; further inquiries can be directed to the corresponding author.
Author contributions
YU: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Resources, Software, Visualization, Writing – original draft, Writing – review & editing, Validation. TH: Funding acquisition, Investigation, Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing. MH: Project administration, Resources, Supervision, Validation, Writing – original draft, Writing – review & editing. YY: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing.
Funding
The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by the following funding sources: JST SPRING, grant number JPMJSP2120 (YU), JSPS KAKENHI (JP23K24205, JP22H00494, and JP23K18163), AMED under grant number (JP21wm0425010 and JP21gm1510006), Salt Science Research Foundation Grant (2438), the Collaborative Research Program of Institute for Protein Research, Osaka University (ICR-24-03) (TH), JSPS KAKENHI (JP20H00625, JP24H00076, and JP24K00499), JST CREST (JPMJCR21P4), Intramural Research Grant (4–6, 6–9) for Neurological and Psychiatric Disorders of NCNP (YY).
Acknowledgments
We would like to thank Editage for English language editing, and the GPT-4o model for providing valuable language assistance in the preparation of this manuscript.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) verify and take full responsibility for the use of generative AI in the preparation of this manuscript. Generative AI was used to improve the readability of the manuscript.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fnhum.2024.1502508/full#supplementary-material
References
American Psychiatric Association (Ed.) (2022). Diagnostic and statistical manual of mental disorders: DSM-5-TR™. 5th Edn., text revision. Washington, DC: American Psychiatric Association Publishing.
Amlung, M., Marsden, E., Holshausen, K., Morris, V., Patel, H., Vedelago, L., et al. (2019). Delay discounting as a transdiagnostic process in psychiatric disorders: a meta-analysis. JAMA Psychiatry 76, 1176–1186. doi: 10.1001/jamapsychiatry.2019.2102
Ardayfio, P. A., Benvenga, M. J., Chaney, S. F., Love, P. L., Catlow, J., Swanson, S. P., et al. (2008). The 5-Hydroxytryptamine 2A receptor antagonist R -(+)-α-(2,3-Dimethoxyphenyl)-1-[2-(4-fluorophenyl)ethyl-4-piperidinemethanol (M100907) attenuates impulsivity after both drug-induced disruption (Dizocilpine) and enhancement (antidepressant drugs) of differential-reinforcement-of-low-rate 72-s behavior in the rat. J. Pharmacol. Exp. Ther. 327, 891–897. doi: 10.1124/jpet.108.143370
Avery, J. A., Drevets, W. C., Moseman, S. E., Bodurka, J., Barcalow, J. C., and Simmons, W. K. (2014). Major depressive disorder is associated with abnormal interoceptive activity and functional connectivity in the insula. Biol. Psychiatry 76, 258–266. doi: 10.1016/j.biopsych.2013.11.027
Blanco, N. J., Otto, A. R., Maddox, W. T., Beevers, C. G., and Love, B. C. (2013). The influence of depression symptoms on exploratory decision-making. Cognition 129, 563–568. doi: 10.1016/j.cognition.2013.08.018
Brown, T. A., Berner, L. A., Jones, M. D., Reilly, E. E., Cusack, A., Anderson, L. K., et al. (2017). Psychometric evaluation and norms for the multidimensional assessment of interoceptive awareness (MAIA) in a clinical eating disorders sample. Eur. Eat. Disord. Rev. 25, 411–416. doi: 10.1002/erv.2532
Cáceda, R., Durand, D., Cortes, E., Prendes-Alvarez, S., Moskovciak, T., Harvey, P. D., et al. (2014). Impulsive choice and psychological pain in acutely suicidal depressed patients. Psychosom. Med. 76, 445–451. doi: 10.1097/PSY.0000000000000075
Cannon, W. B. (1929). Organization for physiological homeostasis. Physiol. Rev. 9, 399–431. doi: 10.1152/physrev.1929.9.3.399
Cosgrove, K. T., Burrows, K., Avery, J. A., Kerr, K. L., DeVille, D. C., Aupperle, R. L., et al. (2020). Appetite change profiles in depression exhibit differential relationships between systemic inflammation and activity in reward and interoceptive neurocircuitry. Brain Behav. Immun. 83, 163–171. doi: 10.1016/j.bbi.2019.10.006
Dombrovski, A. Y., Szanto, K., Siegle, G. J., Wallace, M. L., Forman, S. D., Sahakian, B., et al. (2011). Lethal forethought: delayed reward discounting differentiates high- and low-lethality suicide attempts in old age. Biol. Psychiatry 70, 138–144. doi: 10.1016/j.biopsych.2010.12.025
Fiene, L., and Brownlow, C. (2015). Investigating interoception and body awareness in adults with and without autism spectrum disorder. Autism Res. 8, 709–716. doi: 10.1002/aur.1486
Friston, K. (2013). Life as we know it. J. R. Soc. Interface 10:20130475. doi: 10.1098/rsif.2013.0475
Friston, K. J., Stephan, K. E., Montague, R., and Dolan, R. J. (2014). Computational psychiatry: the brain as a phantastic organ. Lancet Psychiatry 1, 148–158. doi: 10.1016/S2215-0366(14)70275-5
Gillan, C. M., Fineberg, N. A., and Robbins, T. W. (2017). A trans-diagnostic perspective on obsessive-compulsive disorder. Psychol. Med. 47, 1528–1548. doi: 10.1017/S0033291716002786
Halmi, K. A. (2004). Obsessive-compulsive personality disorder and eating disorders. Eat. Disord. 13, 85–92. doi: 10.1080/10640260590893683
Hulme, O. J., Morville, T., and Gutkin, B. (2019). Neurocomputational theories of homeostatic control. Phys Life Rev 31, 214–232. doi: 10.1016/j.plrev.2019.07.005
Huys, Q. J., Pizzagalli, D. A., Bogdan, R., and Dayan, P. (2013). Mapping anhedonia onto reinforcement learning: a behavioural meta-analysis. Biol. Mood Anxiety Disord. 3:12. doi: 10.1186/2045-5380-3-12
Imhoff, S., Harris, M., Weiser, J., and Reynolds, B. (2014). Delay discounting by depressed and non-depressed adolescent smokers and non-smokers. Drug Alcohol Depend. 135, 152–155. doi: 10.1016/j.drugalcdep.2013.11.014
Keramati, M., Durand, A., Girardeau, P., Gutkin, B., and Ahmed, S. H. (2017). Cocaine addiction as a homeostatic reinforcement learning disorder. Psychol. Rev. 124, 130–153. doi: 10.1037/rev0000046
Keramati, M., and Gutkin, B. (2014). Homeostatic reinforcement learning for integrating reward collection and physiological stability. eLife 3:e04811. doi: 10.7554/eLife.04811
Khalsa, S. S., Berner, L. A., and Anderson, L. M. (2022). Gastrointestinal interoception in eating disorders: charting a new path. Curr. Psychiatry Rep. 24, 47–60. doi: 10.1007/s11920-022-01318-3
Konttinen, H., Männistö, S., Sarlio-Lähteenkorva, S., Silventoinen, K., and Haukkala, A. (2010). Emotional eating, depressive symptoms and self-reported food consumption. A population-based study. Appetite 54, 473–479. doi: 10.1016/j.appet.2010.01.014
Kunisato, Y., Okamoto, Y., Ueda, K., Onoda, K., Okada, G., Yoshimura, S., et al. (2012). Effects of depression on reward-based decision making and variability of action in probabilistic learning. J. Behav. Ther. Exp. Psychiatry 43, 1088–1094. doi: 10.1016/j.jbtep.2012.05.007
Maxwell, M. A., and Cole, D. A. (2009). Weight change and appetite disturbance as symptoms of adolescent depression: toward an integrative biopsychosocial model. Clin. Psychol. Rev. 29, 260–273. doi: 10.1016/j.cpr.2009.01.007
Mies, G. W., De Water, E., and Scheres, A. (2016). Planning to make economic decisions in the future, but choosing impulsively now: are preference reversals related to symptoms of ADHD and depression? Int. J. Methods Psychiatr. Res. 25, 178–189. doi: 10.1002/mpr.1511
Montague, P. R., Dolan, R. J., Friston, K. J., and Dayan, P. (2012). Computational psychiatry. Trends Cogn. Sci. 16, 72–80. doi: 10.1016/j.tics.2011.11.018
Morville, T., Friston, K., Burdakov, D., Siebner, H. R., and Hulme, O. J. (2018). The homeostatic logic of reward (preprint). bioRxiv. doi: 10.1101/242974
Paulus, M. P. (2007). Decision-making dysfunctions in psychiatry—altered homeostatic processing? Science 318, 602–606. doi: 10.1126/science.1142997
Paulus, M. P., and Stein, M. B. (2010). Interoception in anxiety and depression. Brain Struct. Funct. 214, 451–463. doi: 10.1007/s00429-010-0258-9
Privitera, G. J., Misenheimer, M. L., and Doraiswamy, P. M. (2013). From weight loss to weight gain: appetite changes in major depressive disorder as a mirror into brain-environment interactions. Front. Psychol. 4:873. doi: 10.3389/fpsyg.2013.00873
Rangel, A. (2013). Regulation of dietary choice by the decision-making circuitry. Nat. Neurosci. 16, 1717–1724. doi: 10.1038/nn.3561
Rupprechter, S., Stankevicius, A., Huys, Q. J. M., Steele, J. D., and Seriès, P. (2018). Major depression impairs the use of reward values for decision-making. Sci. Rep. 8:13798. doi: 10.1038/s41598-018-31730-w
Sakai, Y., Sakai, Y., Abe, Y., Narumoto, J., and Tanaka, S. C. (2022). Memory trace imbalance in reinforcement and punishment systems can reinforce implicit choices leading to obsessive-compulsive behavior. Cell Rep. 40:111275. doi: 10.1016/j.celrep.2022.111275
Schweighofer, N., and Doya, K. (2003). Meta-learning in reinforcement learning. Neural Netw. 16, 5–9. doi: 10.1016/S0893-6080(02)00228-9
Simmons, W. K., Avery, J. A., Barcalow, J. C., Bodurka, J., Drevets, W. C., and Bellgowan, P. (2013). Keeping the body in mind: insula functional organization and functional connectivity integrate interoceptive, exteroceptive, and emotional awareness: functional organization. Hum. Brain Mapp. 34, 2944–2958. doi: 10.1002/hbm.22113
Simmons, W. K., Burrows, K., Avery, J. A., Kerr, K. L., Bodurka, J., Savage, C. R., et al. (2016). Depression-related increases and decreases in appetite: dissociable patterns of aberrant activity in reward and interoceptive neurocircuitry. Am. J. Psychiatry 173, 418–428. doi: 10.1176/appi.ajp.2015.15020162
Simmons, W. K., Burrows, K., Avery, J. A., Kerr, K. L., Taylor, A., Bodurka, J., et al. (2020). Appetite changes reveal depression subgroups with distinct endocrine, metabolic, and immune states. Mol. Psychiatry 25, 1457–1468. doi: 10.1038/s41380-018-0093-6
Smith, R., Kuplicki, R., Feinstein, J., Forthman, K. L., Stewart, J. L., Paulus, M. P., et al. (2020). A Bayesian computational model reveals a failure to adapt interoceptive precision estimates across depression, anxiety, eating, and substance use disorders. PLoS Comput. Biol. 16:e1008484. doi: 10.1371/journal.pcbi.1008484
Stephan, K. E., Manjaly, Z. M., Mathys, C. D., Weber, L. A. E., Paliwal, S., Gard, T., et al. (2016). Allostatic self-efficacy: a metacognitive theory of dyshomeostasis-induced fatigue and depression. Front. Hum. Neurosci. 10:550. doi: 10.3389/fnhum.2016.00550
Sutton, R. S., and Barto, A. G. (2018). Reinforcement learning: An introduction (Adaptive computation and machine learning series). 2nd Edn. Cambridge, MA: The MIT Press.
Takahashi, Y., Murata, S., Ueki, M., Tomita, H., and Yamashita, Y. (2023). Interaction between functional connectivity and neural excitability in autism: a novel framework for computational modeling and application to biological data. Comput. Psychiatry 7, 14–29. doi: 10.5334/cpsy.93
Takahashi, T., Oono, H., Inoue, T., Boku, S., Kako, Y., Kitaichi, Y., et al. (2008). Depressive patients are more impulsive and inconsistent in intertemporal choice behavior for monetary gain and loss than healthy subjects – an analysis based on Tsallis' statistics. Neuro Endocrinol. Lett. 29, 351–358.
Toyama, A., Katahira, K., and Ohira, H. (2019). Reinforcement learning with parsimonious computation and a forgetting process. Front. Hum. Neurosci. 13:153. doi: 10.3389/fnhum.2019.00153
Uchida, Y., Hikida, T., and Yamashita, Y. (2022). Computational mechanisms of osmoregulation: a reinforcement learning model for sodium appetite. Front. Neurosci. 16:857009. doi: 10.3389/fnins.2022.857009
Weissenburger, J., John Rush, A., Giles, D. E., and Stunkard, A. J. (1986). Weight change in depression. Psychiatry Res. 17, 275–283. doi: 10.1016/0165-1781(86)90075-2
Yamashita, Y. (2021). Psychiatric disorders as failures in the prediction machine. Psychiatry Clin. Neurosci. 75, 1–2. doi: 10.1111/pcn.13173
Young, H. A., Gaylor, C. M., de-Kerckhove, D., and Benton, D. (2021). Individual differences in sensory and expectation driven interoceptive processes: a novel paradigm with implications for alexithymia, disordered eating and obesity. Sci. Rep. 11:10065. doi: 10.1038/s41598-021-89417-8
Zimmerman, M., Hrabosky, J. I., Francione, C., Young, D., Chelminski, I., Dalrymple, K., et al. (2011). Impact of obesity on the psychometric properties of the diagnostic and statistical manual of mental disorders, Fourth Edition criteria for major depressive disorder. Compr. Psychiatry 52, 146–150. doi: 10.1016/j.comppsych.2010.05.001
Keywords: appetite, computational neuroscience, computational psychiatry, decision-making, dopamine, homeostasis, homeostatic reinforcement learning
Citation: Uchida Y, Hikida T, Honda M and Yamashita Y (2024) Heterogeneous appetite patterns in depression: computational modeling of nutritional interoception, reward processing, and decision-making. Front. Hum. Neurosci. 18:1502508. doi: 10.3389/fnhum.2024.1502508
Edited by:
Lidia Ghosh, RCC Institute of Information Technology, India
Reviewed by:
Rei Akaishi, RIKEN Center for Brain Science (CBS), Japan
Junichiro Yoshimoto, Fujita Health University, Japan
Copyright © 2024 Uchida, Hikida, Honda and Yamashita. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Yuichi Yamashita, yamay@ncnp.go.jp