Transfer learning-enhanced CNN-GRU-attention model for knee joint torque prediction

Xie, Hao; Wang, Yingpeng; Liu, Tingting; Yan, Songhua; Zeng, Jizhou; Zhang, Kuan

doi:10.3389/fbioe.2025.1530950

ORIGINAL RESEARCH article

Front. Bioeng. Biotechnol., 03 March 2025

Sec. Biomechanics

Volume 13 - 2025 | https://doi.org/10.3389/fbioe.2025.1530950

This article is part of the Research Topic: Use of Digital Human Modeling for Promoting Health, Care and Well-Being

Hao Xie¹

Yingpeng Wang²

Tingting Liu¹

Songhua Yan¹

Jizhou Zeng³

Kuan Zhang¹*

¹School of Biomedical Engineering, Capital Medical University, Beijing, China
²Department of Rehabilitation, Beijing Rehabilitation Hospital, Capital Medical University, Beijing, China
³Department of Orthopedics, Beijing Luhe Hospital, Capital Medical University, Beijing, China

Introduction: Accurate prediction of joint torque is critical for preventing injury by providing precise insights into the forces acting on joints during activities. Traditional approaches, including inverse dynamics, EMG-driven neuromusculoskeletal (NMS) models, and standard machine learning methods, typically use surface EMG (sEMG) signals and kinematic data. However, these methods often struggle to reveal the complex, non-linear relationship between muscle activation and joint motion, particularly with complex or unfamiliar movements. The generalization of joint torque estimation models across different individuals faces a significant challenge, as feature transferability tends to decline in higher, task-specific layers, reducing model performance.

Methods: In this study, we proposed a CNN-GRU-Attention neural network model informed by a neuromusculoskeletal (NMS) solver (hybrid-CNN) and augmented with transfer learning, designed to predict knee joint torque with higher accuracy. The neural network was trained using EMG signals, joint angles, and muscle forces as inputs to predict knee joint torque in different activities, and the predictive performance of the model was evaluated both within and between subjects. Additionally, we developed a transfer learning method for the inter-subject model, which improved the accuracy of knee torque prediction by transferring knowledge learned from previous participants to new participants.

Results: Our results showed that the hybrid-CNN model can predict knee joint torque within subjects with a significantly lower error (root mean square error ≤0.16 Nm/kg). A transfer learning technique was adopted in the inter-subject tests to significantly improve the generalizability with a lower error (root mean square error ≤0.14 Nm/kg).

Conclusion: The transfer learning-enhanced CNN-GRU-Attention with the NMS model shows great potential in the prediction of knee joint torque.

1 Introduction

Accurate estimation of knee joint torque is important not only for understanding joint function but also for developing effective rehabilitation strategies and preventing injuries. Electromyography (EMG) signals provide insight into the electrical activity of muscles during contraction, which is intrinsically linked to the force produced by these muscles. Muscle force subsequently contributes to joint torque via the biomechanical leverage inherent in the musculoskeletal system, establishing EMG as a critical indicator of torque generation (Prilutsky and Gregor, 2000). Torque reflects the forces exerted by the surrounding muscles, making it a key factor in athletic performance and the progression of joint conditions such as osteoarthritis (McErlain-Naylor et al., 2021; Zaman et al., 2022), in the evaluation of surgical outcomes (Berning et al., 2021), and in the design of exoskeletons or prostheses (Sartori et al., 2018; Yao et al., 2018; Pizzolato et al., 2019). Traditionally, the accurate measurement of knee joint torque has required specialized equipment, making real-time application difficult. Thus, predicting knee joint torque using EMG signals and advanced computational models has become an attractive solution.

Current methods for estimating knee joint torque mainly fall into two categories: EMG-driven neuromusculoskeletal (NMS) modeling and deep learning techniques. EMG-driven NMS models aim to capture the complex interactions among muscles, tendons, and joints by relating muscle activation signals to joint torque. While these models are highly informative, their application often involves intricate calibration procedures and the precise determination of numerous physiological parameters, such as muscle dynamics and kinematic variables, making their implementation time-consuming (Jung et al., 2022; Zhang et al., 2022; Zhao et al., 2023). The NMS model must first be calibrated through individual experiments to obtain personalized parameters, such as the optimal fiber length, pennation angle, maximum isometric force, and tendon slack length, before the personalized model can be used to estimate muscle force and joint torque (Serrancolí et al., 2016; Ao et al., 2022; Ao et al., 2023). Therefore, the NMS model is expected to generalize poorly in inter-subject predictions, and the calibration effort limits its practicality for real-time applications in clinical settings.

To address the time-consuming issues of physics-based musculoskeletal models, data-driven models have also been popular. These models can effectively capture nonlinear relationships between inputs such as joint angles and EMG signals, and outputs such as joint torques. Several studies used neural networks for the prediction of knee flexion angles or torques in able-bodied subjects (Dzulkifli et al., 2018; Xu et al., 2018; Hajian et al., 2021; Zhang et al., 2021; Schulte et al., 2022). Huang et al. (2019) proposed a deep-recurrent neural network for the prediction of knee joint angles in real-time. The model used EMG signals together with inertial data from different activities and reported a root mean squared error of 2.93° over a range of approximately 60° (4.9% error). Gautam et al. (2020) used a Long-term Recurrent Convolution Network to classify movements and predict their corresponding knee joint angles, based on EMG. They reported an average mean absolute error of 8.1% in the knee angle prediction of healthy subjects. Zhang et al. (2021) developed an artificial neural network for the prediction of ankle torque from EMG. Root mean squared error (RMSE) values in a range of approximately 1.5 Nm/kg were found for ankle plantar- and dorsiflexion. All these studies indicate that machine learning can be a valuable tool in predicting knee torque or knee angle. However, machine learning models do not account for the underlying mechanisms that link EMG signals to torque generation.

To address this issue, we propose a hybrid modeling approach that combines Convolutional Neural Networks (CNNs), Gated Recurrent Units (GRUs) and an attention mechanism with the NMS model. This framework seeks to integrate the strengths of both NMS modeling and deep learning models, offering a more effective tool for estimating knee joint torque. The CNN component extracts features from time-series data, while the GRU is well suited to modeling temporal dependencies in joint movements. The attention mechanism enhances the model’s ability to focus on key time points and significant features, improving prediction accuracy. However, the effectiveness of generalized models often diminishes significantly when they are applied to previously unseen data. A notable example is the work of Su et al., who introduced a Long Short-Term Memory (LSTM) model for forecasting gait trajectories and phases over several future time frames (Su and Gutierrez-Farewik, 2020). Their findings indicated a significant reduction in the model’s performance under inter-subject testing. In response, researchers have increasingly turned to transfer learning techniques in recent years. Soleimani et al. proposed a transfer learning framework that performed better in inter-subject scenarios (Soleimani and Nazerfard, 2021). Transfer learning has emerged as an effective technique that uses knowledge from previous tasks to address challenges such as small sample sizes (Wu et al., 2023). By leveraging pre-trained models with characteristic parameters, this approach enhances the generalization capabilities of LSTM models, improving their performance across diverse and dynamic scenarios.

To address two key challenges, namely 1) the lack of personalized information in traditional machine learning models and 2) the performance degradation when testing on unseen data, we developed a hybrid deep learning model that integrates transfer learning, CNN-GRU-Attention, and a musculoskeletal model. The objectives of this study are: 1) to compare the accuracy of knee joint torque prediction using a deep learning model integrated with a calibrated musculoskeletal model versus a standard deep learning model; 2) to investigate whether incorporating transfer learning improves the prediction accuracy of the model across subjects.

2 Materials and methods

2.1 Experiment setup

We used G*Power (version 3.1.9.7) to calculate the required sample size from a pilot experiment, which gave a sample size of 9. Ten healthy volunteers (age: 24 ± 3 years, height: 1.74 ± 0.06 m, weight: 70.9 ± 7.0 kg) were ultimately recruited, and the power with 10 subjects was 0.86. All participants provided informed consent prior to participation, and the study was approved by Capital Medical University. Participants were free of any musculoskeletal or neurological impairments, and none reported any recent injuries that could impact gait or knee function. Each participant performed two types of movement: isometric knee flexion-extension and walking at varying speeds. A 3D motion capture system (Vicon, Oxford, United Kingdom) and force plates (AMTI, USA) were used to record the kinematic and kinetic data of each task, as shown in Figure 1.

Figure 1. Marker and EMG sensor placement and the Plug-in Gait model based on captured marker positions. (A) Processing pipeline for the 3D motion capture data; (B) the reflective marker model used in Plug-in Gait; (C) the positions of the EMG sensors during acquisition; (D) the 3D motion capture laboratory.

Participants performed isometric contractions of the knee extensor and flexor muscles using a dynamometer (Biodex, System 4, Shirley, NY, USA). The knee joint was positioned at 30, 45, 60, 75 and 90-degree flexion angles, and participants were instructed to exert maximal voluntary isometric contraction (MVIC) for 5 s. A total of 3 trials were conducted with 3-minute rest intervals between trials. Torque output was recorded continuously to capture peak isometric strength and time-to-fatigue. We placed seven EMG electrodes on the medial gastrocnemius, lateral gastrocnemius, biceps femoris, semitendinosus, rectus femoris, vastus medialis, and vastus lateralis using a wireless surface EMG system at a 1,500 Hz sampling frequency in the two types of movements. Following the isometric tests, participants walked on an instrumented treadmill at three different speeds (0.8 m/s, 1.2 m/s, and 1.4 m/s). Each walking trial lasted 3 min, and participants were given a 2-minute rest between trials. 3D kinematic data were collected using the motion capture system, with reflective markers placed on the lower limbs according to the Plug-in Gait model. Ground reaction forces were recorded simultaneously with the force plates.

2.2 Data processing

The raw EMG data were band-pass filtered (20–450 Hz) to remove noise, followed by full-wave rectification. Signals were then normalized to MVIC values, and a low-pass filter (6 Hz) was applied to obtain the envelope (Mantoan et al., 2015; Derrick et al., 2020). Kinematic data collected from the motion capture system during the gait trials were processed using OpenSim (v4.3), an open-source musculoskeletal modeling software (Pizzolato et al., 2016). Inverse kinematics (IK) was performed in OpenSim to compute joint angles from the 3D trajectories of the Plug-in Gait markers. Using inverse dynamics (ID), the net joint moments at the knee were calculated by combining the kinematic data with the ground reaction forces recorded from the force plates. The muscle forces used in this study were calculated using OpenSim’s Computed Muscle Control (CMC) tool.
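The EMG steps described above (band-pass, rectification, MVIC normalization, low-pass envelope) could be sketched as follows. This is only an illustration under stated assumptions: the Butterworth filter family, the filter order of 4, the zero-phase `filtfilt` filtering, and the names `emg_envelope` and `mvic_peak` are not given in the text and are chosen here for the example.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 1500.0  # sEMG sampling frequency in Hz, as stated in Section 2.1


def emg_envelope(raw, mvic_peak, fs=FS, band=(20.0, 450.0), lp_cutoff=6.0, order=4):
    """Band-pass -> full-wave rectify -> normalize to MVIC -> low-pass envelope.

    `order` and the Butterworth/zero-phase choices are assumptions of this sketch.
    `mvic_peak` is a hypothetical scalar: the reference amplitude from the MVIC trial.
    """
    # 20-450 Hz band-pass to remove motion artifacts and high-frequency noise
    b_bp, a_bp = butter(order, band, btype="bandpass", fs=fs)
    filtered = filtfilt(b_bp, a_bp, raw)
    # Full-wave rectification
    rectified = np.abs(filtered)
    # Normalization to the MVIC reference amplitude
    normalized = rectified / mvic_peak
    # 6 Hz low-pass to obtain the linear envelope
    b_lp, a_lp = butter(order, lp_cutoff, btype="lowpass", fs=fs)
    return filtfilt(b_lp, a_lp, normalized)


# Synthetic example: amplitude-modulated noise standing in for one EMG channel
rng = np.random.default_rng(0)
t = np.arange(0.0, 2.0, 1.0 / FS)
raw = rng.standard_normal(t.size) * (1.0 + np.sin(2.0 * np.pi * 1.0 * t))
envelope = emg_envelope(raw, mvic_peak=np.max(np.abs(raw)))
```

In practice the MVIC reference would come from the dynamometer trials rather than from the signal being processed, as it does in this synthetic example.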

2.3 Neural network architecture: CNN-GRU-attention model for torque prediction

In this study, a CNN-GRU-Attention model was established to predict knee joint torque. As shown in Figure 2, the model consists of a CNN, a GRU and an attention mechanism that together perform regression on EMG and joint angle time-series data. First, the CNN extracts features from each input variable, capturing local characteristics. The GRU then captures long-term dependencies in the data. Finally, the attention module weights the input variables by importance and sums them to enhance prediction performance. Dense layers map the resulting features to the output variable space. The personalized model used in this study is derived from previous work that employed an EMG-driven musculoskeletal model to calculate knee torque and reduce experimental measurement errors through an optimization algorithm.

Figure 2. (A) Architecture of the knee joint torque prediction framework based on the CNN-GRU-Attention network. For the hybrid-CNN model, muscle force computed through the physics-based calibrated NMS model was added as an input. (B) Network structure of CNN-GRU-Attention combined with transfer learning. We extracted layers from a pre-trained model and transferred them to the target model. In the target model, we tuned the learning rate and retrained the last dense layer with data from the target subject.

2.3.1 Convolutional neural networks (CNN)

The CNN is designed to extract spatial features from the time-series EMG signals, which are structured as 2D arrays where one dimension represents time and the other represents multiple EMG channels. The CNN architecture consists of several 2D convolutional layers with 3 × 3 convolution kernels. These kernels slide across the EMG data, detecting localized patterns related to muscle activation that are crucial for predicting joint torque. Each convolutional layer is followed by a Rectified Linear Unit (ReLU) activation function, which introduces non-linearity to the model. This helps the CNN capture complex relationships between muscle activation signals and torque output. The ReLU function is defined as Equation 1:

\mathrm{ReLU}(x) = \max (0, x) (1)

To further reduce the dimensionality of the feature maps while retaining important information, max-pooling layers with a 2 × 2 window size are incorporated after specific convolutional layers. Max-pooling selects the maximum value within each 2 × 2 region, effectively downsampling the feature maps and helping to mitigate overfitting.

The CNN architecture consists of alternating 2D convolutional layers and max-pooling layers, allowing the model to progressively reduce spatial dimensions while capturing increasingly complex patterns in the EMG signals. The convolution operation for the input 2D array X is expressed as Equation 2:

f_{i, j} = \mathrm{ReLU} (\sum_{m = 1}^{3} \sum_{n = 1}^{3} ω_{m, n} \cdot x_{i + m - 1, j + n - 1} + b) (2)

where ω_{m, n} represents the 3 × 3 kernel, x is the input data, and b is the bias term. The output f_{i, j} forms the resulting feature map at position (i, j).
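As an illustration of Equations 1 and 2 and of the 2 × 2 max-pooling step, the following minimal NumPy sketch applies a single 3 × 3 kernel to a small time × channel array. The function names, the absence of padding, and the handling of odd-sized feature maps are assumptions made for the example and are not taken from the implementation used in this study.

```python
import numpy as np


def relu(x):
    """Equation 1: element-wise max(0, x)."""
    return np.maximum(0.0, x)


def conv2d_3x3_relu(x, w, b):
    """Equation 2 for one 3x3 kernel `w` and bias `b`, without padding."""
    rows, cols = x.shape
    out = np.empty((rows - 2, cols - 2))
    for i in range(rows - 2):
        for j in range(cols - 2):
            # Weighted sum over the 3x3 window whose top-left corner is (i, j)
            out[i, j] = np.sum(w * x[i:i + 3, j:j + 3]) + b
    return relu(out)


def max_pool_2x2(f):
    """Non-overlapping 2x2 max-pooling; a trailing odd row or column is dropped."""
    r, c = (f.shape[0] // 2) * 2, (f.shape[1] // 2) * 2
    return f[:r, :c].reshape(r // 2, 2, c // 2, 2).max(axis=(1, 3))


# Toy input: 8 time steps x 7 EMG channels
x = np.arange(56, dtype=float).reshape(8, 7)
kernel = np.full((3, 3), 1.0 / 9.0)      # simple averaging kernel
feature_map = conv2d_3x3_relu(x, kernel, b=0.0)   # shape (6, 5)
pooled = max_pool_2x2(feature_map)                # shape (3, 2)
```

A deep learning framework would vectorize these loops and learn many kernels per layer; the explicit loops are kept here only to mirror the indices of Equation 2.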

2.3.2 Gated recurrent unit (GRU)

The gated recurrent unit (GRU) network is a simplified version of the long short-term memory (LSTM) network, sharing similar functionalities and gating mechanisms. Because it has fewer parameters, it has been reported to train faster and make fewer errors than the LSTM (Xu et al., 2021) and to fit better than the LSTM when less original data are available (Choe et al., 2021). Figure 3 illustrates the structure of the GRU unit at time t, where z_t and r_t represent the update and reset gates, respectively, which selectively remember or forget information and thereby mitigate the problems of vanishing and exploding gradients; h_t denotes the hidden state of the GRU unit at this moment, while \tilde{h}_t represents the candidate state. The sigmoid function σ is applied in the update gate z_t and the reset gate r_t, and tanh in the candidate state, each using the previous hidden state h_{t-1} and the current input x_t. The update gate determines the allocation ratio between the hidden state h_{t-1} at time t-1 and the candidate state \tilde{h}_t at time t; with the convention of Equation 6, a higher value gives more weight to the candidate state and a lower value retains more information from time t-1. Similarly, the reset gate controls how strongly the candidate state at time t depends on the hidden state h_{t-1} at time t-1; a lower value leads to greater forgetting of past information, while a higher value retains more information from time t-1.

Figure 3. Basic structure of the GRU. The star (*) symbol denotes element-wise multiplication.

The forward pass of a GRU can be expressed by Equations 3–6.

r_{t} = σ (ω_{r} \cdot [h_{t - 1}, x_{t}]) (3)

z_{t} = σ (ω_{z} \cdot [h_{t - 1}, x_{t}]) (4)

\tilde{h}_{t} = \tanh (ω \cdot [r_{t} * h_{t - 1}, x_{t}]) (5)

h_{t} = (1 - z_{t}) * h_{t - 1} + z_{t} * \tilde{h}_{t} (6)

where ω_r, ω_z and ω are the corresponding weight matrices, [·, ·] denotes concatenation, \cdot denotes matrix multiplication, and * denotes element-wise multiplication, as in Figure 3.
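Equations 3–6 can be written as a single forward step. The sketch below follows the equations as printed, without bias terms; the function name `gru_step` and the toy dimensions are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(h_prev, x_t, w_r, w_z, w):
    """One GRU forward step following Equations 3-6 (no bias terms).

    h_prev: previous hidden state, shape (H,)
    x_t:    current input, shape (D,)
    w_r, w_z, w: weight matrices, each of shape (H, H + D)
    """
    concat = np.concatenate([h_prev, x_t])
    r_t = sigmoid(w_r @ concat)                                  # Eq. 3, reset gate
    z_t = sigmoid(w_z @ concat)                                  # Eq. 4, update gate
    h_tilde = np.tanh(w @ np.concatenate([r_t * h_prev, x_t]))   # Eq. 5, candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde                  # Eq. 6, new hidden state


# Toy example: hidden size 2, input size 3, all weights zero
h_prev = np.array([1.0, -2.0])
x_t = np.array([0.3, 0.1, 0.2])
zeros = np.zeros((2, 5))
h_t = gru_step(h_prev, x_t, zeros, zeros, zeros)
```

With all weights at zero, both gates equal 0.5 and the candidate state is zero, so the step simply halves the previous hidden state, which makes the roles of the terms in Equation 6 easy to check by hand.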

2.3.3 Attention mechanism

The attention mechanism is a widely used technique in the field of sequence data processing (Liu and Guo, 2019). Its basic idea is to dynamically assign weights, so as to selectively focus on specific parts of the input sequence. This method allows the model to deal with long-distance dependency problems more effectively, thereby improving the performance of sequence modeling. By calculating the similarity between any two positions, the attention mechanism can capture long-distance dependency relationships and is not limited by the sequence length. It is also suitable for processing various feature data and can flexibly adjust the weights of each feature to enhance the model’s performance and adaptability.
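The exact form of the attention layer is not specified here. As one possible reading of "calculating the similarity between any two positions", the sketch below uses scaled dot-product self-attention over a sequence of hidden states, with the hidden states themselves serving as queries, keys and values. The absence of learned projection matrices and the function names are assumptions of this example.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(h):
    """Scaled dot-product self-attention over hidden states h of shape (T, d).

    Returns the re-weighted sequence (T, d) and the attention weights (T, T).
    """
    d = h.shape[1]
    scores = h @ h.T / np.sqrt(d)         # similarity between every pair of time steps
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ h, weights


# Toy example: 5 time steps of 4-dimensional GRU outputs
rng = np.random.default_rng(0)
hidden_states = rng.standard_normal((5, 4))
context, attn = self_attention(hidden_states)
```

Because every position attends to every other position directly, the weights do not depend on how far apart two time steps are, which is the property the paragraph above refers to.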

2.3.4 Transfer learning

A transfer learning technique was employed to adapt the hybrid model to different participants in predicting knee joint torque. The model was pre-trained on EMG data from a set of subjects (S1, S2, …, Sp) to capture general patterns in EMG relevant to torque prediction. For a new participant (St), all layers of the pre-trained CNN-GRU-Attention model, including the convolutional, recurrent (GRU), and attention layers, were retained, and only the final dense layer was re-trained from scratch. This output dense layer, responsible for mapping the learned features to the torque prediction, was reinitialized and re-trained using data from the new subject. The remaining layers were fine-tuned to adjust to the specific characteristics of the new subject while maintaining the knowledge acquired from previous subjects.
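The idea of keeping pre-trained layers and re-training a reinitialized output layer on the new subject can be illustrated with a toy NumPy example. This is a deliberately simplified sketch, not the procedure used in this study: a fixed `tanh` projection stands in for the retained CNN-GRU-Attention layers, those layers are kept frozen rather than fine-tuned, and all names, sizes and the learning rate are hypothetical.

```python
import numpy as np


def features(x, w_frozen):
    """Stand-in for the retained pre-trained layers; `w_frozen` is never updated."""
    return np.tanh(x @ w_frozen)


def retrain_dense(x, y, w_frozen, lr=1e-2, epochs=500, seed=0):
    """Reinitialize the output dense layer and fit it on the new subject's data."""
    rng = np.random.default_rng(seed)
    phi = features(x, w_frozen)
    w_out = rng.normal(scale=0.01, size=phi.shape[1])  # freshly initialized weights
    b_out = 0.0
    for _ in range(epochs):
        err = phi @ w_out + b_out - y
        # Gradient descent on the mean squared error of the dense layer only
        w_out -= lr * 2.0 * (phi.T @ err) / len(y)
        b_out -= lr * 2.0 * err.mean()
    return w_out, b_out


# Synthetic "new subject" data: 200 samples, 8 input features
rng = np.random.default_rng(1)
x_new = rng.standard_normal((200, 8))
w_pretrained = rng.standard_normal((8, 16)) * 0.5     # hypothetical pre-trained weights
y_new = features(x_new, w_pretrained) @ rng.standard_normal(16) + 0.1
w_out, b_out = retrain_dense(x_new, y_new, w_pretrained)
```

In a full implementation the retained layers would additionally be updated at a reduced learning rate, which this toy example omits for brevity.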

2.3.5 Hyperparameter tuning

Hyperparameter tuning was conducted using a coarse-to-fine random search to optimize the CNN-GRU model for predicting knee joint torque from EMG signals (Bergstra and Bengio, 2012). The model was trained with a batch size of 512, using the Adam optimizer with an initial learning rate of 10⁻³, which was reduced to 10⁻⁴ during the transfer learning phase. The mean squared error (MSE) loss function was used to minimize prediction error. The tuning process first explored a wide range of hyperparameters, including learning rates, CNN kernel sizes, and GRU units, and then refined the search in promising regions. Training was limited to a maximum of 1000 epochs, and early stopping based on validation loss was applied to prevent overfitting.
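A coarse-to-fine random search of this kind could look like the sketch below. The objective `toy_validation_loss` is a hypothetical stand-in for training the network and reading off the validation loss; the search ranges, the number of trials and the two hyperparameters shown are likewise assumptions for illustration.

```python
import math
import random


def toy_validation_loss(lr, gru_units):
    """Hypothetical stand-in for 'train the model, return the validation loss'."""
    return (math.log10(lr) + 3.0) ** 2 + ((gru_units - 64) / 64.0) ** 2


def random_search(lr_exp_range, units_range, n_trials, rng):
    """Sample learning rates log-uniformly and GRU units uniformly; keep the best."""
    best = None
    for _ in range(n_trials):
        lr = 10.0 ** rng.uniform(*lr_exp_range)
        units = rng.randint(*units_range)          # inclusive on both ends
        loss = toy_validation_loss(lr, units)
        if best is None or loss < best[0]:
            best = (loss, lr, units)
    return best


def coarse_to_fine(seed=0):
    rng = random.Random(seed)
    # Coarse stage: wide ranges
    coarse = random_search((-5.0, -1.0), (8, 256), 30, rng)
    _, lr, units = coarse
    # Fine stage: narrow ranges centered on the best coarse candidate
    exponent = math.log10(lr)
    fine = random_search((exponent - 0.5, exponent + 0.5),
                         (max(8, units - 16), units + 16), 30, rng)
    return min(coarse, fine)


best_loss, best_lr, best_units = coarse_to_fine(seed=0)
```

Sampling the learning rate on a logarithmic scale is a common choice because plausible values span several orders of magnitude.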

2.4 Evaluation framework

The study examines the accuracy of torque estimation by standard-CNN and hybrid-CNN models under two scenarios: intra-subject and inter-subject. In the inter-subject predictions, transfer learning is introduced to assess and compare the estimation accuracy of both models. For a detailed testing protocol, please refer to Figure 4.

Figure 4. (A) The training and testing data of the hybrid-CNN model for five different cases in intra-subject prediction; (B) the training and testing data of the hybrid-CNN model for five different cases in inter-subject prediction. Intra-subject prediction: For each case, a different movement was used to train the model. Models were trained using data from each motion separately and tested on the same type of motion at different speeds, for each user individually. Inter-subject prediction without transfer learning: Models were trained for each movement on all subjects except one (leave-one-out cross-validation) and then tested on the same type of motion for the remaining new subject. Inter-subject prediction with transfer learning: Models were pre-trained for each movement on all users except one and were shared with a new user with a common structure. We then re-trained the models with data from the same motion of the new participant.

2.4.1 Intra-subject prediction

The model is trained on different movement variations within a single type of activity and tested on remaining variations. Specifically, two trials are used for training, while one trial serves as validation data.

2.4.2 Without transfer learning

Both deep learning models were trained for each movement of multiple subjects except one (leave-one-out cross-validation method) and then tested on the same type of motions for the remaining new subject.

2.4.3 With transfer learning

Both deep learning models were pre-trained for each movement on multiple subjects except one. We then re-trained the model using data from the same type of motion of the new subject and tested it on the remaining trials.
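The two inter-subject protocols can be summarized as a leave-one-subject-out loop. The sketch below is illustrative only: the subject labels, the trial names and the helper `split_target_trials` (including how many trials are used for re-training) are hypothetical and not specified in the text.

```python
def leave_one_subject_out(subjects):
    """Yield (training_subjects, held_out_subject) for every subject in turn."""
    for held_out in subjects:
        yield [s for s in subjects if s != held_out], held_out


def split_target_trials(trials, n_retrain):
    """Hypothetical split of the new subject's trials into re-training and test sets."""
    return trials[:n_retrain], trials[n_retrain:]


subjects = [f"S{i}" for i in range(1, 11)]   # ten participants, as in this study
folds = []
for train_subjects, target in leave_one_subject_out(subjects):
    # Without transfer learning: train on train_subjects, test on the target subject.
    # With transfer learning: pre-train on train_subjects, re-train on part of the
    # target subject's trials, and test on the remaining trials.
    retrain_trials, test_trials = split_target_trials(
        ["trial_1", "trial_2", "trial_3"], n_retrain=1)
    folds.append((train_subjects, target, retrain_trials, test_trials))
```

Keeping the held-out subject entirely out of pre-training is what makes the comparison between the two protocols meaningful.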

For each model, the prediction error is calculated as the root mean square error (RMSE, Equation 7) between predicted and measured joint torques, with the latter obtained via inverse dynamics. This RMSE is then normalized by body mass. To assess differences in prediction error between models, a paired-sample t-test is applied, with significance determined at the p < 0.05 level. In Equation 7, y_{p, i} and y_{i} denote the predicted and measured torque at sample i, and N is the number of samples.

E_{RMS} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{p, i} - y_{i})}^{2}} (7)
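Equation 7, the normalization by body mass, and the paired comparison between models can be sketched as follows. The function names are assumptions for the example, and only the paired t statistic is computed; the corresponding p-value would be taken from Student's t distribution with n − 1 degrees of freedom.

```python
import numpy as np


def rmse(y_pred, y_true):
    """Equation 7: root mean square error between predicted and measured torque."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


def normalized_rmse(y_pred_nm, y_true_nm, body_mass_kg):
    """RMSE of torques given in Nm, expressed in Nm/kg."""
    return rmse(y_pred_nm, y_true_nm) / body_mass_kg


def paired_t_statistic(errors_a, errors_b):
    """t statistic of a paired-sample t-test on per-subject errors of two models."""
    d = np.asarray(errors_a, dtype=float) - np.asarray(errors_b, dtype=float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))


# Toy example: a constant 35 Nm error for a 70 kg participant gives 0.5 Nm/kg
example = normalized_rmse([35.0, 35.0], [0.0, 0.0], body_mass_kg=70.0)
```

The paired form is appropriate here because both models are evaluated on the same subjects, so the per-subject differences are what is tested.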

3 Results

3.1 Intra-subject prediction performance

Overall, compared to the standard-CNN model, the torque predictions from the hybrid-CNN model showed significantly better agreement with the torque calculated through inverse dynamics in the movement tests, and a smaller RMSE was observed (Figure 5; hybrid-CNN: RMSE = 0.13 Nm/kg, standard-CNN: RMSE = 0.21 Nm/kg). In all trained motions, the prediction accuracy of the hybrid-CNN was significantly higher than that of the NMS (slow walking: p = 0.004, self-selected speed walking: p = 0.004, fast walking: p < 0.0001, isometric knee extension 30°: p = 0.028, isometric knee extension 60°: p = 0.028). However, compared to the standard CNN, the hybrid CNN did not always demonstrate superior prediction accuracy. It is worth noting that both the hybrid-CNN and standard-CNN models showed worse agreement between predicted and actual torque in some tested motions, such as self-selected speed walking in the Gait_fast case and fast walking in the Gait_slow case (Figure 6).

Figure 5. Violin plots depicting the distributions of RMSEs between predicted and measured joint torque (normalized by body mass) across subjects during five movements in the intra-subject scenario. ∗ indicates a significant difference between two cases.

Figure 6. One example trial of measured (computed by inverse dynamics) and estimated joint torque via models during five movements in intra-subject scenario. For each case, the motion used as training data was framed with a dashed box and others are testing data.

3.2 Inter-subject prediction performance

Overall, the torque prediction error from the standard CNN model was higher than that from the hybrid CNN model before adopting transfer learning. The standard CNN model performed worst in almost all test scenarios (Figures 7, 8). In almost all trained motions, the prediction accuracy of the standard-CNN was significantly worse than that of the hybrid CNN (slow walking: p = 0.038, self-selected speed walking: p = 0.011, fast walking: p = 0.025, isometric knee extension 30°: p = 0.028, isometric knee extension 60°: p = 0.028). In the tested movements, the standard CNN generally had worse prediction accuracy than the hybrid CNN model. The RMSE between the predicted torque and the torque calculated from inverse dynamics was 0.225 Nm/kg for the standard-CNN model and 0.172 Nm/kg for the hybrid-CNN model.

Figure 7. Violin plots depicting the distributions of RMSEs between predicted and measured joint torque without transfer learning (normalized by body mass) across subjects during five movements in inter-subject scenario. ∗ indicates a significant difference between two cases.

Figure 8. One example trial of measured (computed by inverse dynamics) and estimated joint torque via models without transfer learning during five movements in inter-subject scenario. For each case, the motion used as training data was framed with a dashed box and others are testing data.

After adopting transfer learning in both models, the predicted torque showed good agreement with the calculated torque and the RMSE decreased, although the standard-CNN still had poorer accuracy than the hybrid-CNN model in some movements (Figures 9, 10). The RMSE between the predicted torque and the torque calculated from inverse dynamics was 0.168 Nm/kg for the standard-CNN model and 0.123 Nm/kg for the hybrid-CNN model, corresponding to significant decreases in prediction error of 25.33% and 28.49%, respectively (p < 0.001).

Figure 9. Violin plots depicting the distributions of RMSEs between predicted and measured joint torque with transfer learning (normalized by body mass) across subjects during five movements in inter-subject scenario. ∗ indicates a significant difference between two cases.

Figure 10. One example trial of measured (computed by inverse dynamics) and estimated joint torque via models with transfer learning during five movements in inter-subject scenario. For each case, the motion used as training data was framed with a dashed box and others are testing data.

4 Discussion

This study proposed a modeling method that combines transfer learning with a convolutional neural network, gated recurrent unit and attention mechanism (CNN-GRU-Attention) to predict knee joint torque with higher accuracy. This approach enhances model accuracy and generalization, particularly in predicting joint torque across diverse individuals and movement scenarios. The method extracts common knowledge from a set of data by pre-training the network model and transfers that knowledge to the target by fine-tuning (FT) the network parameters, so that a new adapted model can be obtained quickly from small samples. The muscle forces calculated through the musculoskeletal model are fed into the model as features, and the performance with (hybrid-CNN) and without (standard-CNN) the muscle force inputs is compared. We observed a decline in torque prediction performance when extending the model to inter-subject scenarios. To address this issue, we implemented transfer learning, which significantly improved prediction accuracy and enhanced the generalization capability of the proposed model.

Gated recurrent unit (GRU) networks are frequently employed to predict joint torque and other sequence-based data in biomechanics. Serving as a model-free alternative to EMG-driven NMS models, GRUs enable direct mapping of EMG signals to joint torques, thus supporting real-time applications. Unlike NMS approaches, GRUs do not explicitly model physiological relationships, such as muscle excitation-activation dynamics, muscle force-length properties, and muscle-tendon kinematics at various joint angles. By contrast, NMS models are typically customized to individual subjects, using experimental data to refine parameters like optimal muscle fiber length and tendon slack length, thereby improving joint torque predictions on a subject-specific basis (Pizzolato et al., 2015; Hoang et al., 2018). Due to the individual calibration, NMS models are rarely evaluated for cross-subject generalizability. However, for situations requiring precise, subject-specific accuracy, NMS-based methods are often the preferred choice. Integrating muscle forces derived from musculoskeletal (MSK) models with EMG signals and joint angles as inputs to deep learning models significantly improves knee joint torque prediction over standard CNNs. As shown in Figure 5, the mean RMSE is 0.178 Nm/kg, 0.15 Nm/kg and 0.157 Nm/kg in the case of Gait_slow, Gait_self and Gait_fast using hybrid-CNN. Also, the mean RMSE is 0.255 Nm/kg, 0.231 Nm/kg and 0.217 Nm/kg in the case of Gait_slow, Gait_self and Gait_fast using standard-CNN. The prediction error decreased 18%, 35.06% and 27.64% respectively in intra-subject scenarios. This enhancement stems from incorporating physiologically meaningful data that better reflects the human musculoskeletal system. Muscle force serves as a crucial intermediate variable that bridges the complex relationship between muscle activation and joint dynamics, allowing the model to capture non-linear biomechanical interactions more effectively. 
Combining EMG and joint angles with MSK-derived muscle force provides a multi-source input approach, offering comprehensive insights into movement phases and muscle responses, thus reducing prediction errors. Moreover, muscle force data help the model adapt to inter-subject variability and unseen movement patterns, enhancing robustness against data distribution shifts.

As expected, when transfer learning was not adopted, inter-subject torque prediction was less accurate than intra-subject prediction, regardless of which model was used. Without transfer learning, both the hybrid and standard CNN were trained using data from previous subjects but none from the new subject. Therefore, it is to be expected that the predicted torque would agree less closely with the torque calculated from inverse dynamics. When transfer learning was implemented, the joint torque prediction performance was significantly improved in almost all cases, as shown in Figure 9. With the hybrid CNN model, the mean RMSE was 0.123 Nm/kg with transfer learning and 0.172 Nm/kg without it, a significant reduction in prediction error of 28.49%. Similarly, with the standard CNN model, the mean RMSE was 0.168 Nm/kg with transfer learning and 0.225 Nm/kg without it, a significant reduction of 25.33%. Transfer learning enhances adaptability across patient populations by enabling models to fine-tune efficiently on specific datasets, reducing the need for extensive data collection. Techniques such as model pruning and lightweight architectures ensure low-latency performance, critical for real-time tasks like prosthetic control, gait analysis, and rehabilitation monitoring. Additionally, transfer learning improves generalization to diverse clinical conditions, supporting applications such as fall risk prediction, post-surgical mobility assessment, and remote monitoring of chronic conditions.
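The relative error reductions quoted above follow directly from the mean RMSE values with and without transfer learning. A short check of the arithmetic, using a helper name chosen here for illustration:

```python
def percent_reduction(before, after):
    """Relative reduction of `after` with respect to `before`, in percent."""
    return 100.0 * (before - after) / before


# Mean inter-subject RMSE in Nm/kg, without and with transfer learning
hybrid = percent_reduction(0.172, 0.123)
standard = percent_reduction(0.225, 0.168)

print(f"hybrid-CNN:   {hybrid:.2f}%")    # 28.49%
print(f"standard-CNN: {standard:.2f}%")  # 25.33%
```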

Muscle coordination patterns can be expected to vary across subjects (Safavynia et al., 2011; Rugy et al., 2012). The standard-CNN model may therefore lack sufficient generalizability when no information from the new subject is included in training, particularly when the training data from other subjects are not rich enough. Transfer learning is a common approach in inter-subject cross-validation, improving the generalization of neural networks by transferring knowledge from one domain (previous subjects) to another (new participants) (Soleimani and Nazerfard, 2021). This approach effectively mitigates the decline in predictive accuracy observed when evaluating models on previously unseen data. Kian et al. (2021) found that the effectiveness of EMG-driven NMS model calibration is task-dependent, suggesting the use of diverse tasks to optimize musculotendon and EMG-to-activation parameters. However, individuals with disabilities may struggle to perform a wide range of tasks. Transfer learning proves especially valuable when collecting large datasets from new subjects or movements is costly, time-intensive, or challenging, as in the case of motor disabilities. In this study, incorporating transfer learning significantly improved joint torque prediction accuracy.
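
The idea of transferring knowledge from previous subjects to a new one can be sketched with a toy model. The Python example below assumes one common fine-tuning scheme, in which a shared feature layer stays frozen and only the output head is re-trained on a few calibration samples from the new subject. This is not necessarily the scheme used in this study, and the model, the parameter values, and the synthetic data are all hypothetical.

```python
import math

W_SHARED = 1.5  # frozen feature weight, assumed learned from previous subjects


def feature(x):
    """Frozen lower layer: shared non-linear feature of the input."""
    return math.tanh(W_SHARED * x)


def predict(x, gain, bias):
    """Trainable output head applied on top of the frozen feature."""
    return gain * feature(x) + bias


def mse(xs, ys, gain, bias):
    """Mean squared error of the head on the given samples."""
    return sum((predict(x, gain, bias) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fine_tune(xs, ys, gain, bias, lr=0.1, steps=200):
    """Gradient descent on the head only; the feature layer is never updated."""
    n = len(xs)
    for _ in range(steps):
        errs = [predict(x, gain, bias) - y for x, y in zip(xs, ys)]
        grad_gain = 2.0 / n * sum(e * feature(x) for e, x in zip(errs, xs))
        grad_bias = 2.0 / n * sum(errs)
        gain -= lr * grad_gain
        bias -= lr * grad_bias
    return gain, bias


# Head pretrained on previous subjects (hypothetical values).
pre_gain, pre_bias = 1.0, 0.0

# A few calibration samples from a new subject whose input-output mapping
# differs in gain and offset (synthetic data, not from the study).
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [1.4 * feature(x) + 0.1 for x in xs]

mse_before = mse(xs, ys, pre_gain, pre_bias)
new_gain, new_bias = fine_tune(xs, ys, pre_gain, pre_bias)
mse_after = mse(xs, ys, new_gain, new_bias)
print(f"MSE before fine-tuning: {mse_before:.4f}, after: {mse_after:.6f}")
```

Because only two head parameters are updated, a handful of samples from the new subject is enough to adapt the model, which mirrors the motivation given above for populations in which extensive data collection is difficult.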

One limitation of this study lies in the relatively small sample size, which may limit the diversity of the input data and, accordingly, the model’s capacity to capture a broader range of variability in biomechanical and physiological characteristics. Additionally, the participant group was confined to young adults, which precludes the exploration of how age-related factors, such as alterations in muscle strength or joint stiffness, could impact the model’s performance. Moreover, the accuracy of the model was verified only for walking at different speeds and for isometric test movements. Future studies should expand the dataset to include older adults, pediatric populations, and individuals with musculoskeletal or neurological conditions, as well as other movements such as jumping and running. These diverse cohorts and tasks would allow us to assess the model’s robustness and adaptability across a broader range of biomechanical patterns, ultimately enhancing its clinical applicability.

5 Conclusion

This study developed an advanced hybrid neural network model that integrates biomechanical parameters from an NMS model with CNNs to enhance knee joint torque prediction accuracy. The results reveal that the hybrid model outperforms the standard CNN in intra-subject assessments. This improvement is attributed to the inclusion of physiological variables, such as individualized muscle forces derived from the NMS model, which act as essential intermediaries in knee torque estimation and boost prediction accuracy, especially when faced with new movement patterns. To address the challenges in inter-subject torque prediction, the study incorporates transfer learning into the NMS-informed CNN model. This approach significantly improves predictive accuracy for unseen movements, such as slower walking, notably during the late stance phase when peak knee extension torque occurs. The transfer learning-enhanced CNN-GRU-Attention model informed by the NMS model significantly outperforms both the hybrid CNN without transfer learning and the standard CNN, and shows great potential for the prediction of knee joint torque.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Ethics statement

The studies involving humans were approved by the Ethics Committee of Capital Medical University. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

HX: Conceptualization, Data curation, Formal Analysis, Methodology, Software, Validation, Visualization, Writing–original draft, Writing–review and editing. YW: Data curation, Funding acquisition, Writing–review and editing. TL: Formal Analysis, Methodology, Writing–review and editing. SY: Formal Analysis, Methodology, Writing–review and editing. JZ: Formal Analysis, Methodology, Writing–review and editing. KZ: Conceptualization, Methodology, Writing–review and editing, Project administration.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was supported by the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 12302419).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Ao, D., Li, G., Shourijeh, M. S., Patten, C., and Fregly, B. J. (2023). EMG-driven musculoskeletal model calibration with wrapping surface personalization. IEEE Trans. Neural Syst. Rehabilitation Eng. 31, 4235–4244. doi:10.1109/tnsre.2023.3323516

Ao, D., Vega, M. M., Shourijeh, M. S., Patten, C., and Fregly, B. J. (2022). EMG-driven musculoskeletal model calibration with estimation of unmeasured muscle excitations via synergy extrapolation. Front. Bioeng. Biotechnol. 10, 962959. doi:10.3389/fbioe.2022.962959

Bergstra, J., and Bengio, Y. (2012). Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305. doi:10.5555/2188385.2188395

Berning, J., Francisco, G. E., Chang, S. H., Fregly, B. J., and O'Malley, M. K. (2021). Myoelectric control and neuromusculoskeletal modeling: complementary technologies for rehabilitation robotics. Curr. Opin. Biomed. Eng. 19, 100313. doi:10.1016/j.cobme.2021.100313

Choe, D.-E., Kim, H.-C., and Kim, M.-H. (2021). Sequence-based modeling of deep learning with LSTM and GRU networks for structural damage detection of floating offshore wind turbine blades. Renew. Energy 174, 218–235. doi:10.1016/j.renene.2021.04.025

Derrick, T. R., van den Bogert, A. J., Cereatti, A., Dumas, R., Fantozzi, S., and Leardini, A. (2020). ISB recommendations on the reporting of intersegmental forces and moments during human motion analysis. J. Biomech. 99, 109533. doi:10.1016/j.jbiomech.2019.109533

Dzulkifli, M. A., Hamzaid, N. A., Davis, G. M., and Hasnan, N. (2018). Neural network-based muscle torque estimation using mechanomyography during electrically-evoked knee extension and standing in spinal cord injury. Front. Neurorobotics 12, 50. doi:10.3389/fnbot.2018.00050

Gautam, A., Panwar, M., Biswas, D., and Acharyya, A. (2020). MyoNet: a transfer-learning-based LRCN for lower limb movement recognition and knee joint angle prediction for remote monitoring of rehabilitation progress from sEMG. IEEE J. Transl. Eng. Health Med. 8, 1–10. doi:10.1109/jtehm.2020.2972523

Hajian, G., Morin, E., and Etemad, A. (2021). Convolutional neural network approach for elbow torque estimation during quasi-dynamic and dynamic contractions. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., 665–668. doi:10.1109/EMBC46164.2021.9630287

Hoang, H. X., Pizzolato, C., Diamond, L. E., and Lloyd, D. G. (2018). Subject-specific calibration of neuromuscular parameters enables neuromusculoskeletal models to estimate physiologically plausible hip joint contact forces in healthy adults. J. Biomech. 80, 111–120. doi:10.1016/j.jbiomech.2018.08.023

Huang, Y., He, Z., Liu, Y., Yang, R., Zhang, X., Cheng, G., et al. (2019). Real-time intended knee joint motion prediction by deep-recurrent neural networks. IEEE Sensors J. 19 (23), 11503–11509. doi:10.1109/jsen.2019.2933603

Jung, M. K., Muceli, S., Rodrigues, C., Megia-Garcia, A., Pascual-Valdunciel, A., del-Ama, A. J., et al. (2022). Intramuscular EMG-driven musculoskeletal modelling: towards implanted muscle interfacing in spinal cord injury patients. IEEE Trans. Biomed. Eng. 69 (1), 63–74. doi:10.1109/tbme.2021.3087137

Kian, A., Pizzolato, C., Halaki, M., Ginn, K., Lloyd, D., Reed, D., et al. (2021). The effectiveness of EMG-driven neuromusculoskeletal model calibration is task dependent. J. Biomech. 129, 110698. doi:10.1016/j.jbiomech.2021.110698

Liu, G., and Guo, J. (2019). Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 337, 325–338. doi:10.1016/j.neucom.2019.01.078

Mantoan, A., Pizzolato, C., Sartori, M., Sawacha, Z., Cobelli, C., and Reggiani, M. (2015). MOtoNMS: a MATLAB toolbox to process motion data for neuromusculoskeletal modeling and simulation. Source Code Biol. Med. 10, 12. doi:10.1186/s13029-015-0044-4

McErlain-Naylor, S. A., King, M. A., and Felton, P. J. (2021). A review of forward-dynamics simulation models for predicting optimal technique in maximal effort sporting movements. Appl. Sci. 11 (4), 1450. doi:10.3390/app11041450

Pizzolato, C., Lloyd, D. G., Sartori, M., Ceseracciu, E., Besier, T. F., Fregly, B. J., et al. (2015). CEINMS: a toolbox to investigate the influence of different neural control solutions on the prediction of muscle excitation and joint moments during dynamic motor tasks. J. Biomech. 48 (14), 3929–3936. doi:10.1016/j.jbiomech.2015.09.021

Pizzolato, C., Reggiani, M., Modenese, L., and Lloyd, D. G. (2016). Real-time inverse kinematics and inverse dynamics for lower limb applications using OpenSim. Comput. Methods Biomechanics Biomed. Eng. 20 (4), 436–445. doi:10.1080/10255842.2016.1240789

Pizzolato, C., Saxby, D. J., Palipana, D., Diamond, L. E., Barrett, R. S., Teng, Y. D., et al. (2019). Neuromusculoskeletal modeling-based prostheses for recovery after spinal cord injury. Front. Neurorobotics 13, 97. doi:10.3389/fnbot.2019.00097

Prilutsky, B. I., and Gregor, R. J. (2000). Analysis of muscle coordination strategies in cycling. IEEE Trans. Rehabilitation Eng. 8 (3), 362–370. doi:10.1109/86.867878

Rugy, A., Loeb, G. E., and Carroll, T. J. (2012). Muscle coordination is habitual rather than optimal. J. Neurosci. 32 (21), 7384–7391. doi:10.1523/JNEUROSCI.5792-11.2012

Safavynia, S. A., Torres-Oviedo, G., and Ting, L. H. (2011). Muscle synergies: implications for clinical evaluation and rehabilitation of movement. Top. Spinal Cord Inj. Rehabil. 17 (1), 16–24. doi:10.1310/sci1701-16

Sartori, M., Durandau, G., Došen, S., and Farina, D. (2018). Robust simultaneous myoelectric control of multiple degrees of freedom in wrist-hand prostheses by real-time neuromusculoskeletal modeling. J. Neural Eng. 15 (6), 066026. doi:10.1088/1741-2552/aae26b

Schulte, R. V., Zondag, M., Buurke, J. H., and Prinsen, E. C. (2022). Multi-day EMG-based knee joint torque estimation using hybrid neuromusculoskeletal modelling and convolutional neural networks. Front. Robotics AI 9, 869476. doi:10.3389/frobt.2022.869476

Serrancolí, G., Kinney, A. L., Fregly, B. J., and Font-Llagunes, J. M. (2016). Neuromusculoskeletal model calibration significantly affects predicted knee contact forces for walking. J. Biomechanical Eng. 138 (8), 0810011–08100111. doi:10.1115/1.4033673

Soleimani, E., and Nazerfard, E. (2021). Cross-subject transfer learning in human activity recognition systems using generative adversarial networks. Neurocomputing 426, 26–34. doi:10.1016/j.neucom.2020.10.056

Su, B., and Gutierrez-Farewik, E. M. (2020). Gait trajectory and gait phase prediction based on an LSTM network. Sensors 20 (24), 7127. doi:10.3390/s20247127

Wu, D., Yang, J., and Sawan, M. (2023). Transfer learning on electromyography (EMG) tasks: approaches and beyond. IEEE Trans. Neural Syst. Rehabilitation Eng. 31, 3015–3034. doi:10.1109/tnsre.2023.3295453

Xu, J., Wang, K., Lin, C., Xiao, L., Huang, X., and Zhang, Y. (2021). FM-GRU: a time series prediction method for water quality based on seq2seq framework. Water 13 (8), 1031. doi:10.3390/w13081031

Xu, L., Chen, X., Cao, S., Zhang, X., and Chen, X. (2018). Feasibility study of advanced neural networks applied to sEMG-based force estimation. Sensors 18 (10), 3226. doi:10.3390/s18103226

Yao, S., Zhuang, Y., Li, Z., and Song, R. (2018). Adaptive admittance control for an ankle exoskeleton using an EMG-driven musculoskeletal model. Front. Neurorobotics 12, 16. doi:10.3389/fnbot.2018.00016

Zaman, R., Xiang, Y. J., Rakshit, R., and Yang, A. M. (2022). Hybrid predictive model for lifting by integrating skeletal motion prediction with an OpenSim musculoskeletal model. IEEE Trans. Biomed. Eng. 69 (3), 1111–1122. doi:10.1109/TBME.2021.3114374

Zhang, L. B., Li, Z. J., Hu, Y. B., Smith, C., Farewik, E. M. G., and Wang, R. L. (2021). Ankle joint torque estimation using an EMG-driven neuromusculoskeletal model and an artificial neural network model. IEEE Trans. Automation Sci. Eng. 18 (2), 564–573. doi:10.1109/TASE.2020.3033664

Zhang, Q., Clark, W. H., Franz, J. R., and Sharma, N. (2022). Personalized fusion of ultrasound and electromyography-derived neuromuscular features increases prediction accuracy of ankle moment during plantarflexion. Biomed. Signal Process. Control 71, 103100. doi:10.1016/j.bspc.2021.103100

Zhao, Y., Zhang, J., Li, Z., Qian, K., Xie, S. Q., Lu, Y., et al. (2023). Computationally efficient personalized EMG-driven musculoskeletal model of wrist joint. IEEE Trans. Instrum. Meas. 72, 1–10. doi:10.1109/tim.2022.3225023

Keywords: knee joint torque, convolutional neural network, transfer learning, neuromusculoskeletal model, muscle force

Citation: Xie H, Wang Y, Liu T, Yan S, Zeng J and Zhang K (2025) Transfer learning-enhanced CNN-GRU-attention model for knee joint torque prediction. Front. Bioeng. Biotechnol. 13:1530950. doi: 10.3389/fbioe.2025.1530950

Received: 19 November 2024; Accepted: 12 February 2025;
Published: 03 March 2025.

Edited by:

Sofia Scataglini, University of Antwerp, Belgium

Reviewed by:

Amin Komeili, University of Guelph, Canada
Agnieszka Tomaszewska, Gdansk University of Technology, Poland

Copyright © 2025 Xie, Wang, Liu, Yan, Zeng and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Kuan Zhang, kzhang@ccmu.edu.cn
