Multi-scale asynchronous correlation and 2D convolutional autoencoder for adolescent health risk prediction with limited fMRI data

Gao, Di; Yang, Guanghao; Shen, Jiarun; Wu, Fang; Ji, Chao

doi:10.3389/fncom.2024.1478193

ORIGINAL RESEARCH article

Front. Comput. Neurosci., 15 October 2024

Volume 18 - 2024 | https://doi.org/10.3389/fncom.2024.1478193

This article is part of the Research Topic: Health Data Science and AI in Neuroscience & Psychology

Di Gao¹*

Guanghao Yang¹

Jiarun Shen¹

Fang Wu¹

Chao Ji²*

¹School of Physical Education, China University of Mining & Technology (Beijing), Beijing, China
²Physical Education Teaching and Research Section, Beijing City University, Beijing, China

Introduction: Adolescence is a fundamental period of transformation, encompassing extensive physical, psychological, and behavioral changes. Effective health risk assessment during this stage is crucial for timely intervention, yet traditional methodologies often fail to accurately predict mental and behavioral health risks due to the intricacy of neural dynamics and the scarcity of quality-annotated fMRI datasets.

Methods: This study introduces an innovative deep learning-based framework for health risk assessment in adolescents by employing a combination of a two-dimensional convolutional autoencoder (2DCNN-AE) with multi-sequence learning and multi-scale asynchronous correlation information extraction techniques. This approach facilitates the intricate analysis of spatial and temporal features within fMRI data, aiming to enhance the accuracy of the risk assessment process.

Results: When evaluated on the Adolescent Health Risk Behaviors (AHRB) dataset, which includes fMRI scans from 174 individuals aged 17–22, the proposed method improved markedly on conventional models. It attained a precision of 83.116%, a recall of 84.784%, and an F1-score of 83.942%, surpassing the compared baselines on most evaluation measures.

Discussion: The results indicate that the deep learning-based approach captures and predicts health-related risks in adolescents better than the compared methods. They underscore the value of this methodology in advancing the precision of health risk assessments, offering an enhanced tool for early detection and potential intervention strategies in this sensitive developmental stage.

1 Introduction

Adolescence is a critical period of individual development, during which significant physiological, psychological, and social changes occur, profoundly impacting long-term health outcomes (Su et al., 2020; Tate et al., 2020). However, adolescents face increasing health risks, including mental health issues such as depression and anxiety, behavioral problems like substance abuse and violent tendencies, as well as other health-related behaviors such as eating disorders and lack of physical activity (Bozzini et al., 2020; Zink et al., 2020; Scardera et al., 2020). Early identification and intervention of these risks are crucial for safeguarding the future health of adolescents. In recent years, the development of functional magnetic resonance imaging (fMRI) technology has provided powerful tools for studying the neural changes in adolescents and their relationship with health risks (Baranger et al., 2021; Lee et al., 2023; Sripada et al., 2020). However, acquiring and annotating high-quality fMRI data is often challenging due to the high costs and technical expertise required, limiting its use in large-scale population studies. Moreover, the rise of deep learning and machine learning technologies has further advanced health risk assessment research based on fMRI data. By combining these technologies, researchers can more accurately analyze brain function data in adolescents, predict their health risks, and provide valuable support for clinical decision-making. To address the challenges of limited fMRI data, autoencoders have been introduced to reconstruct and reduce the dimensionality of fMRI data, thereby reducing the cost and improving the efficiency of model building.

Functional magnetic resonance imaging (fMRI) is widely used in neuroscience research as a non-invasive imaging technique that captures brain activity by detecting blood oxygen level-dependent (BOLD) signals (Lauharatanahirun et al., 2023; Agarwal et al., 2023). This allows researchers to study the activity patterns of different brain regions under specific tasks or psychological states, revealing the neural mechanisms associated with health risks. However, the application of fMRI technology faces several challenges. First, the high cost of acquiring and processing fMRI data limits its use in large-scale population studies. Second, fMRI is sensitive to noise and individual differences, requiring careful interpretation of the data, often necessitating additional data sources and expert judgment (Viessmann and Polimeni, 2021; Bollmann and Barth, 2021). Additionally, the scanning process may cause discomfort for some participants, particularly adolescents, potentially impacting data reliability.

To address these issues, researchers have begun applying deep learning and machine learning models to fMRI data analysis. CNNs can automatically extract complex spatial features from fMRI images, improving the efficiency and accuracy of feature extraction without relying on traditional manual feature selection methods (Yin et al., 2022; Lin et al., 2022). Autoencoders, on the other hand, achieve dimensionality reduction and denoising through unsupervised learning, reducing redundant information, and enhancing model robustness (Kim et al., 2021; Qiang et al., 2021). Despite the significant advantages these models offer in fMRI data analysis, they still face certain limitations. Firstly, deep learning models typically require large amounts of labeled data for training, and the high cost of acquiring and labeling fMRI data limits the size of the training datasets. Secondly, the “black box” nature of deep learning models makes them difficult to interpret, which is particularly important in medical applications (Sheu, 2020). Additionally, issues such as overfitting and high computational complexity may limit the performance of these models in practical applications.

Despite the potential demonstrated by the integration of fMRI technology with deep learning models in health risk assessment, there are still several challenges in practical application. Firstly, developing efficient and accurate models with limited data remains a pressing issue (Allen et al., 2022). Secondly, improving the interpretability of the models is crucial for enabling clinicians to understand and trust the predictions made by these models. Moreover, the lack of standardized assessment methods and criteria makes it difficult to generalize these models across different populations and settings. Therefore, the motivation of this paper is to develop an adolescent health risk assessment method that combines fMRI and deep learning. This method aims to efficiently extract and analyze brain function data while improving the accuracy and interpretability of predictions, thereby better serving the health management of adolescent populations.

This paper proposes a method for adolescent health risk prediction that integrates multi-sequence two-dimensional convolutional autoencoders (2DCNN-AE) with multi-scale asynchronous correlation information extraction. Initially, raw fMRI data is preprocessed using the PyReliMRI toolkit, including head motion correction, slice timing correction, and spatial normalization. The 2DCNN-AE model is then employed to extract spatial and temporal features from the preprocessed fMRI data. This model consists of an encoder and a decoder, where the input data is feature-encoded and reconstructed through convolutional layers and upsampling layers. Additionally, a multi-sequence and multi-scale asynchronous correlation information extraction method is introduced, mapping brain partition maps under three-dimensional spatial coordinates to specific brain functional areas and extracting the probability distribution of synchronous expression between different time series. Finally, the extracted multi-scale asynchronous correlation information is used as feature inputs to train and construct the adolescent health risk prediction model.

The contributions of this paper are as follows:

• The proposed multi-sequence two-dimensional convolutional autoencoder (2DCNN-AE) method efficiently extracts spatial and temporal features from fMRI data, significantly improving the efficiency and accuracy of feature extraction.

• By introducing a multi-scale asynchronous correlation information extraction technique, the proposed method captures complex temporal relationships between different brain regions, thereby enhancing the robustness and predictive capability of the health risk assessment model.

• The use of autoencoders allows for the reconstruction of fMRI data samples, reducing the cost and challenges associated with acquiring large-scale annotated datasets, and thereby making the model more feasible for practical applications.

• The proposed method not only enhances the accuracy of model predictions but also improves model interpretability, making the predictions easier to understand and apply in clinical practice.

2 Related work

2.1 Functional magnetic resonance imaging techniques

The application of functional magnetic resonance imaging (fMRI) in adolescent health risk assessment represents a cutting-edge advancement in this field. By detecting blood oxygen level-dependent (BOLD) signals, fMRI indirectly reflects brain activity, offering rich spatial and temporal information (Stiernman et al., 2021). This technology is extensively used to study neurodevelopmental processes in adolescents and to identify brain function characteristics related to mental health issues, such as depression, anxiety, and attention deficit hyperactivity disorder (ADHD) (Wang et al., 2023; McNorgan et al., 2020). For example, fMRI can help pinpoint brain activity patterns associated with these common adolescent mental health challenges.

However, despite the non-invasive nature and high spatial resolution of fMRI, there are certain limitations in its practical application. First, the high cost of fMRI data acquisition and analysis restricts its use in large-scale studies. Second, fMRI is sensitive to noise and individual differences, which necessitates careful interpretation of the results (Uyulan et al., 2023). Additionally, the fMRI scanning process may cause discomfort in some participants, especially adolescents, potentially affecting the accuracy of the data. Therefore, improving data quality while reducing participant discomfort, as well as simplifying the complexity of fMRI data processing, remain significant challenges in this area.

2.2 Deep learning in functional magnetic resonance imaging techniques

With the advancement of deep learning, researchers have increasingly integrated it with fMRI data to enhance the precision and efficiency of health risk assessment (Liu et al., 2022). Deep learning models, particularly convolutional neural networks (CNNs), long short-term memory (LSTM) networks (Saurabh and Gupta, 2024), and autoencoders (AEs), have shown considerable potential in processing and analyzing fMRI data. CNNs, with their hierarchical structure, can effectively extract complex spatial features from fMRI images (Chen et al., 2020), while AEs use unsupervised learning to achieve data dimensionality reduction and reconstruction, thereby alleviating the computational burden associated with high-dimensional data.

The application of deep learning in fMRI data analysis offers several significant advantages. For instance, CNNs can automatically extract brain activity features related to adolescent mental health risks without relying on traditional manual feature selection methods. This not only improves the efficiency of feature extraction but also captures a wider range of potential brain function patterns. Additionally, AEs excel in denoising and feature selection, making the models more robust and stable when handling fMRI data.

However, the application of deep learning to fMRI analysis also faces challenges. First, deep learning models typically require large amounts of labeled data for training, and the high cost of acquiring and labeling fMRI data limits the scale of training datasets. Second, the “black box” nature of deep learning models makes them difficult to interpret, and interpretability is particularly important in medical applications (Iravani et al., 2021). Moreover, issues such as overfitting and high model complexity may lead to suboptimal performance in practical applications. Therefore, balancing model complexity and interpretability, and training efficient models on small sample datasets, remain key areas of focus in this field.

2.3 Adolescent health risk assessment criteria

Adolescent health risk assessment criteria are a critical application area for combining fMRI techniques with deep learning (Ernst et al., 2015). Existing health risk assessment standards are typically based on a variety of factors, including biomarkers, behavioral assessments, and psychological questionnaires, providing essential tools for identifying at-risk adolescents (Bjork et al., 2010). However, traditional assessment methods often rely on expert judgment, which can introduce bias and inconsistency.

In recent years, researchers have sought to develop health risk assessment criteria based on fMRI data and deep learning models. For example, some studies have utilized deep learning models to automatically analyze fMRI data, extracting features related to health risks and predicting individual mental health risks based on these features (Mueller et al., 2010). This approach not only improves the objectivity and accuracy of assessments but also enables early detection of potential health issues, providing critical information for intervention and treatment.

The integration of fMRI and deep learning technologies provides powerful tools for health risk prediction. Despite the significant potential demonstrated by current research and applications, challenges remain in terms of data acquisition, model interpretability, and the standardization of assessment criteria. Future research should focus on developing more efficient, accurate, and scalable health risk assessment models to better serve the health management needs of adolescent populations.

3 Method

This paper proposes a method for predicting adolescent health risks that combines a multi-sequence two-dimensional convolutional autoencoder (2DCNN-AE) with multi-scale asynchronous correlation information extraction. The overall algorithm flow is shown in Figure 1. First, the raw fMRI data are preprocessed using the PyReliMRI toolkit, including head motion correction, slice timing correction, and spatial normalization. Next, the 2DCNN-AE model extracts spatial and temporal features from the preprocessed fMRI data. The model consists of an encoder and a decoder; the input is encoded into features and reconstructed through convolutional and upsampling layers. In parallel, a multi-sequence, multi-scale asynchronous correlation extraction method maps the brain partition map in three-dimensional spatial coordinates to specific brain functional areas and extracts the probability distribution of synchronous expression between different time series. To do so, the preprocessed numerical sequences are mapped to state sequences by dynamic thresholding, and the dynamic correlation between time series is then computed. Finally, the extracted multi-scale asynchronous correlation information is used as the input features to train the adolescent health risk prediction model.

Figure 1. Flow chart of multiple sequence asynchronous correlation algorithm for 2D convolutional auto-encoding.

3.1 2D convolutional autoencoder

An autoencoder (AE) is a neural network model primarily used for unsupervised learning. It achieves dimensionality reduction and feature extraction by learning to encode input data. The basic structure consists of two parts: the encoder and the decoder. The encoder maps the input data to a hidden representation, while the decoder attempts to reconstruct the original input from this hidden representation. By minimizing the error between the input data and the reconstructed data, the autoencoder learns useful features of the data.

We use a convolutional autoencoder, with the layer configuration described below, to extract features from the Adolescent Health Risk Behaviors (AHRB) dataset (Demidenko M. I. et al., 2024). We first preprocess the raw data using the PyReliMRI toolkit (Demidenko M. et al., 2024), which includes standardized steps such as head motion correction, slice timing correction, and spatial normalization.

The goal of the autoencoder is to encode the input x into a hidden representation h, and then reconstruct the input x from h. The encoding and decoding processes are shown in Equations 1 and 2, respectively:

$$ h = f(W_{1} x + b_{1}), \tag{1} $$

$$ \hat{x} = g(W_{2} h + b_{2}), \tag{2} $$

where W₁ and W₂ are the weight matrices of the encoder and decoder, respectively, b₁ and b₂ are the bias terms, and f and g are activation functions (typically nonlinear functions such as ReLU).

The loss function for the reconstruction error is typically the mean squared error (MSE), as shown in Equation 3:

$$ L(x, \hat{x}) = \| x - \hat{x} \|^{2}, \tag{3} $$
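Equations 1 to 3 can be illustrated with a minimal, fully connected sketch in plain Python. This is only an illustration under stated assumptions, not the 2DCNN-AE itself: the helper names (`relu`, `affine`, `autoencode`, `reconstruction_loss`) and the toy weights are hypothetical, both f and g are taken to be ReLU, and the loss is the squared error norm exactly as written in Equation 3 (without averaging).

```python
def relu(v):
    """Element-wise ReLU, assumed here for both activation functions f and g."""
    return [max(0.0, a) for a in v]


def affine(W, x, b):
    """Compute W x + b for a matrix W (list of rows) and a vector x."""
    return [sum(w * a for w, a in zip(row, x)) + bi for row, bi in zip(W, b)]


def autoencode(x, W1, b1, W2, b2):
    """Return (h, x_hat): the code of Equation 1 and the reconstruction of Equation 2."""
    h = relu(affine(W1, x, b1))
    x_hat = relu(affine(W2, h, b2))
    return h, x_hat


def reconstruction_loss(x, x_hat):
    """Squared L2 reconstruction error, as in Equation 3."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


# Toy example: a 3-dimensional input compressed to a 2-dimensional code.
x = [1.0, 2.0, 3.0]
W1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # encoder keeps the first two coordinates
b1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # decoder cannot recover the third coordinate
b2 = [0.0, 0.0, 0.0]

h, x_hat = autoencode(x, W1, b1, W2, b2)
loss = reconstruction_loss(x, x_hat)
```

With these toy weights the third coordinate is lost in the code, so the whole reconstruction error comes from that coordinate. Training would adjust the weights and biases to reduce this error over the dataset.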

In the analysis of functional magnetic resonance imaging (fMRI) data, convolutional autoencoders can be used to extract spatial and temporal features for assessing adolescent health risks. Our convolutional autoencoder comprises an encoder and a decoder. The encoder consists of three convolutional layers, each followed by a ReLU activation function and a max-pooling layer. The decoder consists of four convolutional layers: the first three are paired with upsampling layers, and the last generates an image with the same shape as the input. Details are given in Figure 2 and Table 1.

Figure 2. A 2DCNN-AE model for extracting features from fMRI images.

Table 1. Depth information for our 2DCNN-AE model.

For the input connectivity matrix data, features are first extracted through convolutional layers. The final layer of the encoder provides the hidden representation, as shown in Equation 4:

$$ h = f(W_{1} x + b_{1}), \tag{4} $$

The hidden representation h is fed into the decoder, which reconstructs the input data through convolution and upsampling layers, as shown in Equation 5:

$$ \hat{x} = g(W_{2} h + b_{2}), \tag{5} $$

Finally, the model optimizes its parameters by minimizing the reconstruction error, thereby learning useful features from the fMRI data for subsequent adolescent health risk assessment.

3.2 Multi-sequence multi-scale asynchronous correlation information extraction

We map the voxels in three-dimensional spatial coordinates to specific brain functional regions using a brain partition map. The voxel values within each brain region are averaged to represent the overall fluctuation of blood oxygen concentration levels in that region, focusing our research on regions of interest (ROIs). This process is illustrated in Figure 3.

Figure 3. Transforming brain partition map targets into regions of interest (ROIs). (A) Three-dimensional coordinates of the brain partition map; (B) ROI distribution of fluctuation values of blood oxygen concentration levels in the brain area.
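The voxel-to-ROI averaging described above can be sketched as follows. The function name `roi_time_series`, the dictionary-based data layout, and the toy atlas are assumptions made for illustration; they are not taken from the authors' pipeline or from any particular neuroimaging library.

```python
from collections import defaultdict


def roi_time_series(volumes, atlas):
    """Average the voxel values inside each ROI at every time point.

    volumes: list of time points, each a dict {(x, y, z): BOLD value}
    atlas:   dict {(x, y, z): ROI label}; voxels missing from it are ignored
    Returns {ROI label: [mean value at time 0, time 1, ...]}.
    """
    series = defaultdict(list)
    for volume in volumes:
        sums, counts = defaultdict(float), defaultdict(int)
        for coord, value in volume.items():
            label = atlas.get(coord)
            if label is None:
                continue  # voxel lies outside every region of interest
            sums[label] += value
            counts[label] += 1
        for label in sums:
            series[label].append(sums[label] / counts[label])
    return dict(series)


# Hypothetical atlas with two regions, and two time points of voxel data.
atlas = {(0, 0, 0): "A", (0, 0, 1): "A", (1, 0, 0): "B"}
volumes = [
    {(0, 0, 0): 1.0, (0, 0, 1): 3.0, (1, 0, 0): 5.0, (9, 9, 9): 100.0},
    {(0, 0, 0): 2.0, (0, 0, 1): 4.0, (1, 0, 0): 6.0},
]
roi_series = roi_time_series(volumes, atlas)
```

Each ROI then contributes one time series, which is the input to the state sequence mapping that follows.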

The state sequence mapping process converts the preprocessed numerical sequence data into state sequences. This study uses a dynamic threshold set by the rule of thumb (Figure 4), marking data above the threshold as active (1) and below as inactive (0). The threshold is defined as follows:

$$ \mathrm{mh}(H_{n}^{k}, \eta) = \mu(H_{n}^{k}) + \eta \cdot \sigma(H_{n}^{k}), \tag{6} $$

Figure 4. Rule of thumb and schematic diagram of state sequence mapping.

where $\mu(H_{n}^{k})$ denotes the mean and $\sigma(H_{n}^{k})$ denotes the standard deviation of the time series $H_{n}^{k} = (m_{n,1}^{k}, \dots, m_{n,L}^{k})$ of length $L = |H_{n}^{k}|$, as given in Equations 7 and 8:

$$ \mu(H_{n}^{k}) = \frac{\sum_{l=1}^{L} m_{n,l}^{k}}{|H_{n}^{k}|}, \tag{7} $$

$$ \sigma(H_{n}^{k}) = \sqrt{\frac{\sum_{l=1}^{L} \left( m_{n,l}^{k} - \mu(H_{n}^{k}) \right)^{2}}{|H_{n}^{k}| - 1}}, \tag{8} $$

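We define the mapping function $f(m_{n,l}^{k}, \eta)$, which maps the value $m_{n,l}^{k}$ of time slice $l$ to a state $u$ according to an increasing sequence of threshold parameters $\eta_{1} < \dots < \eta_{S-1}$, as shown in Equation 9. The binary active/inactive mapping used in this study is the special case $S = 2$ with a single parameter $\eta$.

$$ f(m_{n,l}^{k}, \eta) = \begin{cases} \text{State } u_{1}, & m_{n,l}^{k} < \mathrm{mh}(H_{n}^{k}, \eta_{1}) \\ \text{State } u_{2}, & \mathrm{mh}(H_{n}^{k}, \eta_{1}) \leq m_{n,l}^{k} < \mathrm{mh}(H_{n}^{k}, \eta_{2}) \\ \dots \\ \text{State } u_{s}, & \mathrm{mh}(H_{n}^{k}, \eta_{s-1}) \leq m_{n,l}^{k} < \mathrm{mh}(H_{n}^{k}, \eta_{s}) \\ \dots \\ \text{State } u_{S}, & \mathrm{mh}(H_{n}^{k}, \eta_{S-1}) \leq m_{n,l}^{k} \end{cases} \tag{9} $$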
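The binary case of the state sequence mapping can be sketched with the Python standard library. The function names below are hypothetical. The sketch assumes the sample standard deviation (denominator |H| − 1, as in Equation 8), which is what `statistics.stdev` computes, and treats a value equal to the threshold as active, following the inequalities of Equation 9.

```python
from statistics import mean, stdev


def dynamic_threshold(series, eta):
    """Threshold of Equation 6: mean plus eta sample standard deviations."""
    return mean(series) + eta * stdev(series)


def to_state_sequence(series, eta=1.0):
    """Map a numerical series to 0/1 states: 1 (active) at or above the threshold."""
    threshold = dynamic_threshold(series, eta)
    return [1 if value >= threshold else 0 for value in series]


# Toy ROI signal with a single clear activation at the last time slice.
roi_signal = [0.0, 0.0, 0.0, 0.0, 10.0]
states = to_state_sequence(roi_signal, eta=1.0)
```

Here the mean is 2 and the sample standard deviation is about 4.47, so with η = 1 only the final value exceeds the threshold. With η = 0 the threshold reduces to the mean of the series.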

3.3 Probability statistics of synchronous expression between brain regions

We extract discrete probability distributions of synchronous expression between brain regions. First, we define a function ϕ(·) to calculate the dynamic temporal relationship between two time series. The state sequence mapping step has successfully converted the fMRI data from a numerical sequence to a state sequence. Next, we use a measure function ψ(·) to evaluate the degree of synchronous expression between two time series, defined as in Equation 10:

$$ \phi(m_{n_{1},l_{1}}^{k}, m_{n_{2},l_{2}}^{k}) = \psi\left( f(m_{n_{1},l_{1}}^{k}, \eta^{k}), f(m_{n_{2},l_{2}}^{k}, \eta^{k}) \right), \tag{10} $$

where f(·) represents the mapping function under certain prior conditions, converting data from a numerical sequence to a state sequence, and η^k denotes the mapping parameter. We then statistically analyze the frequency information of synchronous expression between brain regions. The function ψ(·) calculates the concurrent activation of two brain regions given a time slice parameter. This concurrent activation could be coincidental or indicative of potential interactions between these regions. With a sufficient number of time slices, we can obtain the probability distribution of whole-brain synchronous activity, distinguishing between coincidental and genuinely related phenomena, as shown in Equation 11:

$$ \psi\left( f(m_{n_{1},l_{1}}^{k}, \eta^{k}), f(m_{n_{2},l_{2}}^{k}, \eta^{k}) \right) = \begin{cases} 1, & m_{n_{1},l_{1}}^{k} > \mathrm{mh}(H_{n_{1}}^{k}, \eta^{k}) \text{ and } m_{n_{2},l_{2}}^{k} > \mathrm{mh}(H_{n_{2}}^{k}, \eta^{k}) \\ 0, & \text{otherwise} \end{cases} \tag{11} $$

For multivariate time series data $H^{k}$, the interaction between any two time series $H_{i}^{k}$ and $H_{j}^{k}$ is defined as follows in Equation 12:

$$ X_{\phi(\cdot)}^{k}(i, j) = \sum_{l=1}^{L} \phi(m_{i,l}^{k}, m_{j,l}^{k}), \tag{12} $$

where $X_{\phi(\cdot)}^{k}(i, j)$ represents the cumulative sum of the function values over all $L$ time slices. Based on this, the interaction between any two time series $H_{i}^{k}$ and $H_{j}^{k}$ is further defined with a time interval parameter $I_{m} = [r_{m}, s_{m}]$, as shown in Equations 13 and 14:

$$ X_{\phi(\cdot)}^{k}(i, j, I_{m}) = \sum_{l=1}^{L} \sum_{g=r_{m}}^{s_{m}'} \phi(m_{i,l}^{k}, m_{j,g+l}^{k}), \tag{13} $$

$$ s_{m}' = \min(s_{m}, L - l), \tag{14} $$

The truncation in Equation 14 keeps the delayed index $g + l$ within the series length $L$.

Equation 13 accumulates co-activations over a user-defined range of delays, in which series i plays the active (leading) role and series j the passive (following) role. In general, $X_{\phi(\cdot)}^{k}(i, j, I_{m}) \neq X_{\phi(\cdot)}^{k}(j, i, I_{m})$, so the resulting interaction matrix $X_{\phi(\cdot)}^{k}$ is asymmetric.
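A minimal sketch of the delayed co-activation count is given below, assuming the series have already been mapped to 0/1 state sequences so that the product of two states equals the indicator of joint activation. The function name `interaction` and the toy sequences are hypothetical, and indices are 0-based, so the upper delay is truncated at `length - 1 - l` rather than at L − l.

```python
def interaction(states_i, states_j, interval):
    """Count co-activations of series i at slice l and series j at slice l + g.

    states_i, states_j: equal-length 0/1 state sequences
    interval: (r, s), the inclusive range of delays g
    """
    r, s = interval
    length = len(states_i)
    total = 0
    for l in range(length):
        last = min(s, length - 1 - l)  # keep l + g inside the series
        for g in range(r, last + 1):
            total += states_i[l] * states_j[l + g]
    return total


# Toy sequences in which b is a copy of a delayed by one time slice.
a = [1, 0, 0, 1, 0]
b = [0, 1, 0, 0, 1]

same_time = interaction(a, b, (0, 0))  # no delay
a_leads_b = interaction(a, b, (1, 1))  # a active, b passive, delay 1
b_leads_a = interaction(b, a, (1, 1))  # roles swapped
```

In this toy case the two series are never active at the same slice, but every activation of `a` is followed by one of `b` a slice later. The count for `(a, b)` at delay 1 therefore differs from the count for `(b, a)`, which illustrates the asymmetry.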

The complete set of multivariate time series interactions, $X_{\phi(\cdot)}^{k} \in R^{N \times N \times T}$ , collects the synchronous expression values for all pairs of time series over all delay intervals, from synchronous to multi-scale asynchronous. Here, $X_{\phi(\cdot)}^{k}$ is a third-order tensor, where N denotes the number of time series and T denotes the number of interval parameters $I_{m}$ in the set I. We convert the tensor $X_{\phi(\cdot)}^{k}$ into a discrete probability distribution form $Q_{\phi(\cdot)}^{k}$ , defined as follows in Equation 15:

$$ Q_{\phi(\cdot)}^{k} = \left\{ q_{\phi(\cdot)}^{k}(i, j, I_{m}) \mid i, j \in [1, N],\ I_{m} \in I \right\}, \tag{15} $$

where $q_{ϕ (\cdot)}^{k} (i, j, I_{m})$ denotes the discrete probability value between the i-th and j-th time series under interval parameter I_m and mapping function ϕ(·), defined as follows in Equation 16:

$$ q_{\phi(\cdot)}^{k}(i, j, I_{m}) = \frac{X_{\phi(\cdot)}^{k}(i, j, I_{m})}{\sum_{i'=1}^{N} \sum_{j'=1}^{N} \sum_{m'=1}^{T} X_{\phi(\cdot)}^{k}(i', j', I_{m'})}, \tag{16} $$
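The normalization in Equation 16 divides every interaction count by the sum over all series pairs and all intervals. It can be sketched as follows; the function name and the dictionary keyed by (i, j, m) are assumptions made for illustration, and the counts are invented toy values.

```python
def to_probability_distribution(counts):
    """Normalise interaction counts so that all entries sum to 1.

    counts: dict {(i, j, m): interaction count for series i, j and interval m}
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no co-activations observed; the distribution is undefined")
    return {key: value / total for key, value in counts.items()}


# Toy counts for two series and two intervals (m = 0 and m = 1).
counts = {(0, 1, 0): 1, (0, 1, 1): 2, (1, 0, 0): 1, (1, 0, 1): 0}
q = to_probability_distribution(counts)
```

Because the entries sum to one, the resulting values are comparable across subjects whose scans contain different numbers of active time slices.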

Finally, we use the extracted multi-scale asynchronous correlation information as features to train the model, resulting in an adolescent health risk prediction model.

4 Experiment

4.1 Dataset

The Adolescent Health Risk Behaviors (AHRB) study (Demidenko M. I. et al., 2024) recruited participants from diverse backgrounds to ensure a representative sample of the adolescent population. Each participant underwent a series of assessments, including neuroimaging, behavioral tests, and self-reported questionnaires. The primary focus of the study is to capture the dynamic changes in behavior and brain function as participants transition from late adolescence to early adulthood. The dataset includes two main cohorts from different years: Year 1 consists of approximately 108 participants aged 17–20, and Year 2 consists of approximately 66 participants aged 19–22. The AHRB study aims to track the developmental trajectory of risk behaviors and their underlying neural mechanisms.

The functional magnetic resonance imaging (fMRI) component of the AHRB study includes tasks designed to probe emotional and reward processing. Specifically, the study utilizes the Emotional Faces task and the Monetary Incentive Delay (MID) task. For our analysis, we use the raw Blood Oxygen Level-Dependent (BOLD) data from the MID task, which aligns with similar tasks used in the MLS and ABCD studies. The MID task requires participants to respond to cues indicating potential monetary rewards or losses. During the anticipatory phase, participants receive cues that indicate whether they can win or lose money based on their performance. The BOLD response during this phase is particularly interesting as it reflects the neural processes involved in anticipation, a critical component of reward-based decision-making. Understanding the neural basis of reward processing is crucial, as it is a key aspect of adolescent risk behavior.

4.2 Evaluation metrics

The evaluation metrics used in this study include accuracy (Acc), Precision (Prec), Recall (Rec), and F1−score. These metrics are defined as follows in Equations 17–20:

$$ \mathrm{Acc} = \frac{TP + TN}{TP + TN + FN + FP}, \tag{17} $$

$$ \mathrm{Prec} = \frac{TP}{TP + FP}, \tag{18} $$

$$ \mathrm{Rec} = \frac{TP}{TP + FN}, \tag{19} $$

$$ \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Prec} \times \mathrm{Rec}}{\mathrm{Prec} + \mathrm{Rec}}. \tag{20} $$

Here, true positives (TP) are positive samples correctly classified as positive, true negatives (TN) are negative samples correctly classified as negative, false positives (FP) are negative samples incorrectly classified as positive, and false negatives (FN) are positive samples incorrectly classified as negative.
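Equations 17 to 20 translate directly into code. The sketch below uses hypothetical function names and invented confusion-matrix counts. As a consistency check, it also recomputes the F1-score from the precision and recall reported for the proposed method in the abstract.

```python
def f1_score(prec, rec):
    """Harmonic mean of precision and recall (Equation 20)."""
    return 2 * prec * rec / (prec + rec)


def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fn + fp)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return acc, prec, rec, f1_score(prec, rec)


# Invented counts, for illustration only.
acc, prec, rec, f1 = classification_metrics(tp=8, tn=7, fp=2, fn=3)

# Precision and recall of the proposed method as reported in the abstract (in %).
reported_f1 = f1_score(83.116, 84.784)
```

The recomputed value agrees with the reported F1-score of 83.942% to three decimal places.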

4.3 Model parameters

We first consider the threshold parameter η in Equation 6, which distinguishes the active or inhibitory states of the brain in fMRI imaging. A dynamic threshold converts a numerical sequence into a 0/1 sequence, where larger η values make the active state determination more stringent, resulting in fewer data points, while smaller values capture more data but may introduce noise. For all experiments, we set η = 1, as it balances avoiding overfitting while maintaining sufficient data points.

To improve model generalization given the limited data, we reduced the complexity of the 2DCNN-AE model by decreasing the number of layers and parameters in the convolutional layers. Additionally, we applied L2 regularization with a weight decay of 0.01 and used a Dropout rate of 0.5 in the fully connected layers to further prevent overfitting.

For model evaluation, we conducted 10-fold cross-validation. The dataset was divided into 10 subsets, with the model trained on 9 subsets and validated on the remaining subset. This process was repeated 10 times, and the final performance metrics were averaged across all folds to ensure robust assessment of the model's stability and generalization. The experimental settings are shown in Table 2.

Table 2. Experimental setup.
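The 10-fold cross-validation protocol can be sketched as below. The splitting scheme is an assumption: the text does not state whether folds were stratified by class or by cohort, so this sketch uses a plain shuffled split with a fixed, hypothetical seed. The sample count of 174 matches the number of individuals reported for the AHRB dataset.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


splits = list(k_fold_indices(174, k=10))
fold_sizes = sorted(len(validation) for _, validation in splits)
```

Each sample appears in exactly one validation fold, and the reported metrics would be the averages of the per-fold values.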

The primary goal of this study is to explore asynchronous functional connectivity between different regions of the adolescent brain. We extracted discrete probability distributions of synchronous expression at varying time intervals between brain regions. The interval set was defined as I = {[0, 0], [1, 1], [2, 2], [3, 6], [7, 12]}, with smaller intervals capturing short-delay interactions and larger intervals representing longer delays. This setup balances sensitivity to synchronous information against the risk of overfitting caused by excessively large delay parameters. The pseudocode for our algorithm is shown in Algorithm 1:

Algorithm 1. Training process of 2DCNN-AE net with multi-scale asynchronous correlation information extraction and synchronous expression probability statistics.

4.4 Experimental results

We first used a two-sample t-test as a feature selection method for dimensionality reduction, setting the significance level p to 0.05, 0.01, 0.005, and 0.001 in turn. The results are shown in Table 3.

Table 3. Feature reduction parameter validation results.

The results show that as the significance level parameter decreases, the number of extracted features also decreases. Initially, the experimental results improve with fewer features, but when the number of features is reduced too much (e.g., p = 0.001), the performance drops sharply. The best classification results were obtained with a significance level parameter of p = 0.005, achieving an accuracy, precision, recall, and F1-score of 70.142%, 66.276%, 71.946%, and 68.995%, respectively.

After selecting features with a significance level of p = 0.005, we further validated the dynamic threshold parameter η (Equation 6) used in the state sequence mapping. The parameter η was tuned within the range [0, 2] via grid search, with the results shown in Table 4.

Table 4. Dynamic threshold parameter η validation results.

As shown in Table 4, using the full temporal mean as the high-activity threshold (η = 0) yields poor results. This is likely because such a low threshold is too permissive: it labels roughly half of the time points as active, which introduces considerable noise. As the threshold increases, the results improve. The best classification accuracy is achieved at η = 1.0 and η = 1.2, as a tighter definition of the “active state” more effectively isolates important activity. For η > 1.2, performance declines because the threshold becomes too high and too few points are labeled active. We therefore take η = 1.0 as the default dynamic threshold parameter. The experimental results for different values of the parameter p are shown in Figure 5.

Figure 5. Experimental results with different parameter (p) values.

Next, we compare our model with methods using Pearson correlation coefficient, higher-order statistics (Wee et al., 2016), and dynamic functional connectivity models (Harlalka et al., 2019). Other popular methods include those by Zhang and Wang (2020), Brown et al. (2019), Abraham et al. (2017), and Yang et al. (2020). The results on the AHRB dataset are summarized in Table 5.

Table 5. Comparison with existing methods on the AHRB dataset.

As shown in Table 5, our proposed method achieves precision, recall, and F1-score results of 83.116%, 84.784%, and 83.942%, respectively. Compared to the aforementioned methods, our method ranks highly in precision, recall, and F1-score. Although the accuracy is slightly lower than the method by Abraham et al. (2017) (83.366%), our method excels in the other three metrics. The experimental results demonstrate that our approach, incorporating 2D convolutional autoencoders and multi-sequence, multi-scale asynchronous information extraction, uncovers more asynchronous correlation information, yielding good classification accuracy in adolescent health risk assessment applications.
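As a quick consistency check on the reported metrics, the F1-score can be recomputed from precision and recall as their harmonic mean. The helper name `f1_score` is chosen for this example only; the input values are the percentages quoted in the text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# Values reported in the text, in percent.
proposed = f1_score(83.116, 84.784)     # proposed method (Table 5)
best_t_test = f1_score(66.276, 71.946)  # best t-test setting, p = 0.005 (Table 3)

print(round(proposed, 3), round(best_t_test, 3))  # 83.942 68.995
```

Both recomputed values agree with the reported F1-scores to three decimal places.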

To validate our proposed improvements, we conducted three groups of ablation experiments. We added complex convolution and channel attention mechanisms to the 2DCNN network, along with phase smoothness and coil sensitivity smoothness as physical priors. The ablation experiments are summarized in Table 6, where coil sensitivity smoothness is denoted as S, phase smoothness as P, complex convolution as C, and channel attention mechanism as A. The baseline model is 2DCNN.

Table 6. Results of 2DCNN-AE network ablation experiments.

As shown in Table 6, each proposed improvement brought about performance enhancements. Individually adding coil sensitivity prior and phase prior led to considerable improvements, while the combination of channel attention mechanism and complex convolution resulted in significant gains. Combining all improvements achieved the best results.

These results indicate that the introduced 2D convolutional autoencoder and multi-sequence, multi-scale asynchronous information extraction methods effectively capture asynchronous correlation information, enhancing model performance in adolescent health risk assessment applications. The proposed modifications lead to significant improvements, as evidenced by the comprehensive ablation studies.

In summary, our method demonstrates superior performance in most metrics compared to existing methods, highlighting its potential in adolescent health risk assessment based on rs-fMRI data.

5 Discussion and conclusion

This study used fMRI and deep learning techniques to tackle challenges in adolescent health risk assessment, aiming to enhance the efficiency, accuracy, and interpretability of feature extraction from fMRI data. We introduced a novel method integrating a multi-sequence 2DCNN-AE with multi-scale asynchronous correlation information extraction, designed to capture spatial and temporal features and address the complex interactions between brain regions. Our experimental evaluation on the AHRB dataset showed that the method outperforms the compared approaches in precision, recall, and F1-score, with accuracy close to that of the best competing method, highlighting its capability to identify critical features and intricate temporal patterns often missed by traditional methods.

However, the study is not without limitations. First, the method's reliance on a relatively small dataset, due to the high cost and complexity of acquiring and processing fMRI data, may limit its generalizability to larger populations. This issue is particularly pronounced in deep learning models, which typically require large amounts of labeled data for effective training. Second, while the method improves interpretability compared to traditional deep learning approaches, the “black box” nature of certain deep learning components still poses challenges in clinical settings, where understanding the rationale behind predictions is crucial.

The superior performance of the proposed model can be attributed to its ability to capture both spatial and temporal dynamics from fMRI data. Specifically, the integration of multi-sequence 2DCNN-AE with multi-scale asynchronous correlation extraction allows for a more nuanced understanding of brain activity. This design choice helps to uncover latent interactions between brain regions that are otherwise overlooked in traditional models, leading to a more accurate assessment of health risks. Moreover, the asynchronous correlation extraction provides a mechanism to account for non-linear and time-shifted relationships between brain regions, which may be critical in identifying early indicators of health risks. These insights not only demonstrate the efficacy of the proposed approach but also open new avenues for exploring brain region connectivity in health-related research.
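The idea of a time-shifted relationship between two regions can be illustrated with a lagged Pearson correlation. This is a single-scale sketch on synthetic signals and is not the multi-scale asynchronous extraction used in the paper; the function name `lagged_correlation`, the lag range, and the simulated delay of three samples are all assumptions made for the example.

```python
import numpy as np

def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] for a non-negative integer lag."""
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]

rng = np.random.default_rng(0)
n, true_lag = 300, 3
x = rng.normal(size=n)                               # synthetic "region A" signal
y = np.roll(x, true_lag) + 0.5 * rng.normal(size=n)  # "region B" follows A by 3 samples

# Scan a small range of lags and keep the one with the strongest correlation.
correlations = {lag: lagged_correlation(x, y, lag) for lag in range(7)}
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 2), round(correlations[0], 2))
```

In this toy case the zero-lag (synchronous) correlation is near zero even though the two signals are strongly coupled, which is the kind of dependency a purely synchronous connectivity measure would miss.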

Looking forward, future research should focus on addressing these limitations. Expanding the dataset size through collaborative efforts and leveraging transfer learning techniques could help improve the model's generalizability. Additionally, integrating more interpretable machine learning methods or developing hybrid models that combine deep learning with rule-based systems could further enhance the clinical applicability of the proposed method. These improvements would not only increase the accuracy and robustness of the predictions but also make them more actionable for healthcare providers.

In conclusion, integrating the multi-sequence 2DCNN-AE with multi-scale asynchronous correlation extraction captures both the spatial and the temporal dynamics of fMRI data, which accounts for the model's strong performance on the AHRB dataset and makes it a promising basis for adolescent health risk assessment.

Data availability statement

The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/supplementary material.

Author contributions

DG: Conceptualization, Data curation, Methodology, Resources, Writing – original draft. GY: Formal analysis, Methodology, Writing – review & editing. JS: Validation, Writing – review & editing. FW: Supervision, Writing – review & editing. CJ: Supervision, Validation, Visualization, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by the China University of Mining and Technology (Beijing) Central Universities Fundamental Scientific Research Operating Expenses Project “Practical Research on Physical Fitness Teaching Mode to Improve College Students' Physical Fitness” (Project No.: 2021YQTY03), the China University of Mining and Technology (Beijing) Central Universities Basic Scientific Research Operating Expenses Project “Path Study on Occupational Health Exercise Promotion for Emergency Rescue Personnel in the Context of ‘Healthy China’” (Project No.: 2023SKPYTY02), and the Fundamental Research Funds for the Central Universities (2024SKPYTY01). The authors sincerely appreciate the financial support provided by these projects.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abbreviations

AHRB, Adolescent Health Risk Behavior; fMRI, Functional Magnetic Resonance Imaging; BOLD, Blood Oxygen Level-Dependent; CNN, Convolutional Neural Network; LSTM, Long Short-Term Memory; 2DCNN-AE, Two-Dimensional Convolutional Autoencoder; MID, Monetary Incentive Delay.

References

Abraham, A., Milham, M. P., Di Martino, A., Craddock, R. C., Samaras, D., Thirion, B., et al. (2017). Deriving reproducible biomarkers from multi-site resting-state data: an autism-based example. Neuroimage 147, 736–745. doi: 10.1016/j.neuroimage.2016.10.045

Agarwal, K., Manza, P., Tejeda, H. A., Courville, A. B., Volkow, N. D., and Joseph, P. V. (2023). Risk assessment of maladaptive behaviors in adolescents: Nutrition, screen time, prenatal exposure, childhood adversities - adolescent brain cognitive development study. J. Adoles. Health. 8:33. doi: 10.1016/j.jadohealth.2023.08.033

Allen, E. J., St-Yves, G., Wu, Y., Breedlove, J. L., Prince, J. S., Dowdle, L. T., et al. (2022). A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nat. Neurosci. 25, 116–126. doi: 10.1038/s41593-021-00962-x

Baranger, D. A., Lindenmuth, M., Nance, M., Guyer, A. E., Keenan, K., Hipwell, A. E., et al. (2021). The longitudinal stability of fMRI activation during reward processing in adolescents and young adults. Neuroimage 232:117872. doi: 10.1016/j.neuroimage.2021.117872

Bjork, J. M., Smith, A. R., Chen, G., and Hommer, D. W. (2010). Adolescents, adults and rewards: comparing motivational neurocircuitry recruitment using fMRI. PLoS ONE 5:e11440. doi: 10.1371/journal.pone.0011440

Bollmann, S., and Barth, M. (2021). New acquisition techniques and their prospects for the achievable resolution of fMRI. Prog. Neurobiol. 207:101936. doi: 10.1016/j.pneurobio.2020.101936

Bozzini, A. B., Bauer, A., Maruyama, J., Simões, R., and Matijasevich, A. (2020). Factors associated with risk behaviors in adolescence: a systematic review. Braz. J. Psychiat. 43, 210–221. doi: 10.1590/1516-4446-2019-0835

Brown, C. J., Miller, S. P., Booth, B. G., Zwicker, J. G., Grunau, R. E., Synnes, A. R., et al. (2019). Predictive connectome subnetwork extraction with anatomical and connectivity priors. Comput. Med. Imag. Graph. 71, 67–78. doi: 10.1016/j.compmedimag.2018.08.009

Chen, Y., Tang, Y., Wang, C., Liu, X., Zhao, L., and Wang, Z. (2020). ADHD classification by dual subspace learning using resting-state functional connectivity. Artif. Intell. Med. 103:101786. doi: 10.1016/j.artmed.2019.101786

Demidenko, M., Mumford, J., and Poldrack, R. (2024). PyReliMRI: an open-source Python tool for estimates of reliability in MRI data [Computer software]. Zenodo. doi: 10.5281/zenodo.12522260

Demidenko, M. I., Huntley, E. D., and Keating, D. P. (2024). Adolescent health risk behavior study. OpenNeuro. [Dataset] doi: 10.18112/openneuro.ds005012.v1.0.2

Ernst, M., Torrisi, S., Balderston, N., Grillon, C., and Hale, E. A. (2015). fMRI functional connectivity applied to adolescent neurodevelopment. Annu. Rev. Clin. Psychol. 11, 361–377. doi: 10.1146/annurev-clinpsy-032814-112753

Harlalka, V., Bapi, R. S., Vinod, P., and Roy, D. (2019). Atypical flexibility in dynamic functional connectivity quantifies the severity in autism spectrum disorder. Front. Hum. Neurosci. 13:6. doi: 10.3389/fnhum.2019.00006

Iravani, B., Arshamian, A., Fransson, P., and Kaboodvand, N. (2021). Whole-brain modelling of resting state fMRI differentiates ADHD subtypes and facilitates stratified neuro-stimulation therapy. Neuroimage 231:117844. doi: 10.1016/j.neuroimage.2021.117844

Kim, J.-H., Zhang, Y., Han, K., Wen, Z., Choi, M., and Liu, Z. (2021). Representation learning of resting state fMRI with variational autoencoder. Neuroimage 241:118423. doi: 10.1016/j.neuroimage.2021.118423

Lauharatanahirun, N., Maciejewski, D. F., Kim-Spoon, J., and King-Casas, B. (2023). Risk-related brain activation is linked to longitudinal changes in adolescent health risk behaviors. Dev. Cogn. Neurosci. 63:101291. doi: 10.1016/j.dcn.2023.101291

Lee, K. S., Hagan, C. N., Hughes, M., Cotter, G., Freud, E. M., Kircanski, K., et al. (2023). Systematic review and meta-analysis: task-based fMRI studies in youths with irritability. J. Am. Acad. Child Adoles. Psychiat. 62, 208–229. doi: 10.1016/j.jaac.2022.05.014

Lin, Q.-H., Niu, Y.-W., Sui, J., Zhao, W.-D., Zhuo, C., and Calhoun, V. D. (2022). SSPNet: an interpretable 3D-CNN for classification of schizophrenia using phase maps of resting-state complex-valued fMRI data. Med. Image Anal. 79:102430. doi: 10.1016/j.media.2022.102430

Liu, S., Zhao, L., Zhao, J., Li, B., and Wang, S.-H. (2022). Attention deficit/hyperactivity disorder classification based on deep spatio-temporal features of functional magnetic resonance imaging. Biomed. Signal Process. Control 71:103239. doi: 10.1016/j.bspc.2021.103239

McNorgan, C., Judson, C., Handzlik, D., and Holden, J. G. (2020). Linking ADHD and behavioral assessment through identification of shared diagnostic task-based functional connections. Front. Physiol. 11:583005. doi: 10.3389/fphys.2020.583005

Mueller, S. C., Maheu, F. S., Dozier, M., Peloso, E., Mandell, D., Leibenluft, E., et al. (2010). Early-life stress is associated with impairment in cognitive control in adolescence: an fMRI study. Neuropsychologia 48, 3037–3044. doi: 10.1016/j.neuropsychologia.2010.06.013

Qiang, N., Dong, Q., Liang, H., Ge, B., Zhang, S., Sun, Y., et al. (2021). Modeling and augmenting of fMRI data using deep recurrent variational auto-encoder. J. Neural Eng. 18:0460b06. doi: 10.1088/1741-2552/ac1179

Saurabh, S., and Gupta, P. (2024). Deep learning-based modified bidirectional LSTM network for classification of ADHD disorder. Arab. J. Sci. Eng. 49, 3009–3026. doi: 10.1007/s13369-023-07786-w

Scardera, S., Perret, L. C., Ouellet-Morin, I., Gariépy, G., Juster, R.-P., Boivin, M., et al. (2020). Association of social support during adolescence with depression, anxiety, and suicidal ideation in young adults. JAMA Netw. Open 3:e2027491. doi: 10.1001/jamanetworkopen.2020.27491

Sheu, Y.-H. (2020). Illuminating the black box: interpreting deep neural network models for psychiatric research. Front. Psychiatry 11:551299. doi: 10.3389/fpsyt.2020.551299

Sripada, C., Rutherford, S., Angstadt, M., Thompson, W. K., Luciana, M., Weigard, A., et al. (2020). Prediction of neurocognition in youth from resting state fMRI. Mol. Psychiatry 25, 3413–3421. doi: 10.1038/s41380-019-0481-6

Stiernman, L. J., Grill, F., Hahn, A., Rischka, L., Lanzenberger, R., Panes Lundmark, V., et al. (2021). Dissociations between glucose metabolism and blood oxygenation in the human default mode network revealed by simultaneous PET-fMRI. Proc. Nat. Acad. Sci. 118:e2021913118. doi: 10.1073/pnas.2021913118

Su, C., Aseltine, R., Doshi, R., Chen, K., Rogers, S. C., and Wang, F. (2020). Machine learning for suicide risk prediction in children and adolescents with electronic health records. Transl. Psychiatry 10:413. doi: 10.1038/s41398-020-01100-0

Tate, A. E., McCabe, R. C., Larsson, H., Lundström, S., Lichtenstein, P., and Kuja-Halkola, R. (2020). Predicting mental health problems in adolescence using machine learning techniques. PLoS ONE 15:e0230389. doi: 10.1371/journal.pone.0230389

Uyulan, C., Erguzel, T. T., Turk, O., Farhad, S., Metin, B., and Tarhan, N. (2023). A class activation map-based interpretable transfer learning model for automated detection of ADHD from fMRI data. Clin. EEG Neurosci. 54, 151–159. doi: 10.1177/15500594221122699

Viessmann, O., and Polimeni, J. R. (2021). High-resolution fMRI at 7 tesla: challenges, promises and recent developments for individual-focused fMRI studies. Curr. Opin. Behav. Sci. 40, 96–104. doi: 10.1016/j.cobeha.2021.01.011

Wang, Z., Zhou, X., Gui, Y., Liu, M., and Lu, H. (2023). Multiple measurement analysis of resting-state fMRI for ADHD classification in adolescent brain from the ABCD study. Transl. Psychiatry 13:45. doi: 10.1038/s41398-023-02309-5

Wee, C.-Y., Yang, S., Yap, P.-T., Shen, D., and Alzheimer's Disease Neuroimaging Initiative (2016). Sparse temporally dynamic resting-state functional connectivity networks for early MCI identification. Brain Imaging Behav. 10, 342–356. doi: 10.1007/s11682-015-9408-2

Yang, X., Schrader, P. T., and Zhang, N. (2020). A deep neural network study of the ABIDE repository on autism spectrum classification. Int. J. Adv. Comput. Sci. Applic. 11, 1–6. doi: 10.14569/IJACSA.2020.0110401

Yin, W., Li, L., and Wu, F.-X. (2022). Deep learning for brain disorder diagnosis based on fMRI images. Neurocomputing 469, 332–345. doi: 10.1016/j.neucom.2020.05.113

Zhang, W., and Wang, Y. (2020). “Deep multimodal brain network learning for joint analysis of structural morphometry and functional connectivity,” in 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI) (IEEE), 1–5. doi: 10.1109/ISBI45749.2020.9098624

Zink, J., Belcher, B. R., Imm, K., and Leventhal, A. M. (2020). The relationship between screen-based sedentary behaviors and symptoms of depression and anxiety in youth: a systematic review of moderating variables. BMC Public Health 20, 1–37. doi: 10.1186/s12889-020-08572-1

Keywords: adolescence, health risk assessment, functional magnetic resonance imaging, deep learning, convolutional autoencoder, behavioral health risks

Citation: Gao D, Yang G, Shen J, Wu F and Ji C (2024) Multi-scale asynchronous correlation and 2D convolutional autoencoder for adolescent health risk prediction with limited fMRI data. Front. Comput. Neurosci. 18:1478193. doi: 10.3389/fncom.2024.1478193

Received: 09 August 2024; Accepted: 23 September 2024;
Published: 15 October 2024.

Edited by:

Zhiqiang Huo, King's College London, United Kingdom

Reviewed by:

Xing Yang, Anhui Science and Technology University, China
Yuxi Liu, University of Florida, United States

Copyright © 2024 Gao, Yang, Shen, Wu and Ji. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Di Gao, Gdcumtb1987@126.com; Chao Ji, ch.jc8@163.com
