Improved prediction of soil shear strength using machine learning algorithms: interpretability analysis using SHapley Additive exPlanations

Ahmad, Mahmood; Al Zubi, Mohammad; Almujibah, Hamad; Sabri Sabri, Mohanad Muayad; Mustafvi, Jawad Bashir; Haq, Shay; Ouahbi, Tariq; Alzlfawi, Abdullah

doi:10.3389/feart.2025.1542291

ORIGINAL RESEARCH article

Front. Earth Sci., 28 February 2025

Sec. Geoscience and Society

Volume 13 - 2025 | https://doi.org/10.3389/feart.2025.1542291

Mahmood Ahmad¹,²*

Mohammad Al Zubi³

Hamad Almujibah⁴

Mohanad Muayad Sabri Sabri⁵

Jawad Bashir Mustafvi⁶

Shay Haq⁷

Tariq Ouahbi⁸

Abdullah Alzlfawi⁹

¹Department of Civil Engineering, University of Engineering and Technology Peshawar (Bannu Campus), Bannu, Pakistan
²Institute of Energy Infrastructure, Universiti Tenaga Nasional, Kajang, Malaysia
³Department of Mechanical Engineering, Hijjawi Faculty for Engineering Technology, Yarmouk University, Irbid, Jordan
⁴Department of Civil Engineering, College of Engineering, Taif University, Taif City, Saudi Arabia
⁵Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russia
⁶Department of Civil Engineering, University of Management and Technology Lahore, Lahore, Pakistan
⁷Department of Geotechnical Engineering, National Institute of Transportation (NIT), National University of Sciences and Technology (NUST), Risalpur, Pakistan
⁸LOMC, UMR 6294 CNRS, Université Le Havre Normandie, Normandie Université, Le Havre, France
⁹Department of Civil and Environmental Engineering, College of Engineering, Majmaah University, Al Majmaah, Saudi Arabia

The shear strength of soil is an important parameter that is used frequently throughout the design phase of construction. Conventional laboratory determination of shear strength is expensive and time-consuming. This study develops models for predicting soil shear strength with improved accuracy using four boosting algorithms: Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), Adaptive Boosting (AdaBoost), and Categorical Boosting (CatBoost). The coefficient of determination (R²), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Deviation (MAD) indices were used to validate each of the developed models. The results demonstrate that, in the training phase, the AdaBoost model achieved the best prediction performance, with R² = 0.99794 and the lowest values of RMSE = 0.00400, MAE = 0.00080, MAPE = 0.24390, and MAD = 0.00080, followed by the CatBoost model with R² = 0.99651, RMSE = 0.00521, MAE = 0.00429, MAPE = 1.33450, and MAD = 0.00429. The developed models also compare favorably with previous models published in the literature, such as multivariate adaptive regression splines and support vector regression. In addition, SHapley Additive exPlanations analysis shows that the liquidity index has the greatest influence on soil shear strength, followed by wet density.

1 Introduction

The shear strength of soil is a significant attribute that is used frequently throughout the design phase of construction projects. Building collapse and ground failure are often associated with the shear strength of the soil (Das, 2021). When evaluating the stability of structures such as high-rise building foundations, retaining walls, embankments, and airfield pavements, geotechnical engineers take the soil’s shear strength into account (Vanapalli and Fredlund, 2000; Zhang et al., 2023). As a result, accurately estimating the shear strength of soil is a crucial task in many geotechnical designs (Gao et al., 2020; Li et al., 2019; Eid and Rabie, 2017; Yu et al., 2021; Zou et al., 2024). Traditional calculations of shear strength rely on the cohesion (c) and internal friction angle (φ) parameters. These parameters (c, φ) can be determined in the laboratory using the triaxial test, the unconfined compressive strength test, or the direct shear test; in the field, they can be estimated using vane shear test equipment or other indirect soil testing techniques (Murthy, 2009; Pham et al., 2018; Xu et al., 2021).

The soil shear strength is affected by specific gravity, void ratio, water content, plastic limit, liquid limit, stress history, and relative density (Pham et al., 2020). Over the past few years, artificial intelligence techniques have developed rapidly, leading to machine learning (ML) algorithms that are now widely used in various fields. ML applications have transformed the way complex problems can be tackled, offering new and innovative solutions. Owing to their learning ability, ML algorithms have become a desirable tool for revealing relationships among many soil parameters. Consequently, interest in applying ML algorithms to geotechnical problems has grown over the past decades (Ahmad et al., 2021; Ahmad et al., 2022a; Ahmad F. et al., 2022; Ahmad F. et al., 2023; Ahmad M. et al., 2023; Barkhordari et al., 2023; Asteris et al., 2022a; Li et al., 2022; Armaghani et al., 2014; Armaghani et al., 2017), including the prediction of soil shear strength (Pham et al., 2018; Nguyen et al., 2021; Tien Bui et al., 2019). Furthermore, several researchers have utilized ML algorithms to solve other specific problems (Fan et al., 2024; Zhou et al., 2022; Lü et al., 2024; Zi et al., 2024; Noman et al., 2024). The parameters for civil works design are frequently estimated using empirical correlations, which are produced by fitting regression equations to a pre-existing database, rather than by direct measurement in the laboratory or field (Hua et al., 2024; Shu et al., 2024; Wang et al., 2024; Shan et al., 2025). Garven and Vanapalli (2006) examined nineteen distinct empirical techniques for predicting the shear strength of unsaturated soils and assessed a number of possible soil parameters for their association with soil shear strength.

Soft computing techniques are known for their proficiency in non-linear modeling, and there is evidence in the literature from a number of technical and scientific fields that these techniques can establish correlations between desired outcomes and a variety of influencing parameters, whether those parameters have direct or indirect impacts (Fan et al., 2024; Zhou et al., 2022; Lü et al., 2024; Asteris et al., 2022b; Koopialipoor et al., 2019). Taking into account the effects of influencing parameters, experimental data can be used to design a high-performance soft computing-based paradigm. However, choosing a suitable soft computing model is challenging for the following reasons: (a) inadequate modeling and validation; (b) the inability of the models in use to identify the precise global optimum; and (c) problems with overfitting. XGBoost, GB, AdaBoost, and CatBoost are all powerful machine learning algorithms, often chosen for their strengths in solving a variety of classification and regression problems (Abdullah et al., 2024; Ahmad et al., 2022c; Ahmad et al., 2022d; Ahmad et al., 2022e; Islam and Amin, 2020; Prokhorenkova et al., 2018; Dorogush et al., 2018; Chen and Guestrin, 2016). These algorithms have proven to be versatile and adaptable to a wide variety of domains, as evidenced by their frequent use in research across fields, but literature surveys indicate that their applications in geotechnical engineering are limited. Therefore, these four well-known machine learning algorithms (XGBoost, GB, AdaBoost, and CatBoost) were chosen for modeling in this study, and the prediction of soil shear strength, an important geotechnical engineering task (Pham et al., 2018; Nguyen et al., 2021; Tien Bui et al., 2019), was selected as the research topic.
Accordingly, this paper addresses the following issues: (1) providing an accurate and efficient ML model for predicting the soil shear strength; (2) examining the prediction accuracy of the best ML model against that of existing models in the literature; and (3) using the SHapley Additive exPlanations (SHAP) approach to describe the importance and contribution of the input variables that influence the soil shear strength.

The paper is organized as follows: Section 2 presents the data collection and correlation analysis; Section 3 explains the theory of Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), Adaptive Boosting (AdaBoost), and Categorical Boosting (CatBoost); Section 4 describes the performance measures used; Section 5 presents and discusses the results; Section 6 presents the Shapley analysis; and Section 7 summarizes the findings derived from the results.

2 Dataset and correlation analysis

A total of 249 soil samples (see supplementary data, Supplementary Table S1) from 65 boreholes were collected during the geotechnical investigation phase of the Le Trong Tan Geleximco Project, located in the west of Hanoi, Vietnam (see Figure 1), as reported by Cao et al. (2022). The depth of the collected soil samples ranges from 1.20 to 39.5 m. The boreholes were drilled using slurry (a mixture of bentonite and water) and thin-walled metal tubes to prevent soil collapse. The soil samples, with a diameter of 91 mm, were gathered using piston samplers. The sample collection process complies with the Vietnamese national standards TCXDVN-194-2006 (High Rise Building - Guide for Geotechnical Investigation) and TCN-259-2000 (the procedure for soil sampling by boring methods). Further details can be found in Cao et al. (2022).

Figure 1. Site location—Le Trong Tan Geleximco, Hanoi, Vietnam.

The factors measured from the soil samples are depth of sample (X1, m), sand percentage (X2), loam percentage (X3), clay percentage (X4), moisture content (X5, %), wet density (X6, g/cm³), dry density (X7, g/cm³), void ratio (X8), liquid limit (X9, %), plastic limit (X10, %), plastic index (X11), and liquidity index (X12). These 12 factors are employed as conditioning variables to estimate the shear strength of the soil (Y, kg/cm²). The descriptive statistics of the dataset are presented in Table 1. As can be inferred from the table, the sample variances of the input parameters are spread over a wide range (0.00539–131.891), while the variance of the output parameter is 0.00769. The standard errors are likewise spread over a wide range, from 0.00465 to 0.72779, and thus confirm the credibility of the dataset. The heat map of the Pearson correlation coefficient (r) between each pair of parameters is shown in Figure 2. It shows that liquidity index (X12) and shear strength of soil (Y) have a strong positive correlation (r = 0.83). In statistical modeling, it is commonly believed that strongly correlated variables can significantly affect the efficiency of a model, on the assumption that such variables introduce redundancy and unnecessary complexity. However, Kutner et al. (2005) argued that correlated variables do not typically affect inferences about mean responses in the data. This suggests that even if variables share a strong correlation, each can still provide unique and valuable insights about the average responses in the dataset, thereby making them essential components of the model.
In contrast, the correlation coefficient between sand percentage (X2) and shear strength of soil (Y) is notably weak and negative (r = −0.02). Because the Pearson coefficient captures only linear relationships, a low value can arise when the relationship between variables is non-linear, so it is worthwhile to explore the relationship through non-linear models.
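The two statistical points made above can be checked with a short sketch. It computes the Pearson coefficient for a purely linear and a purely quadratic toy relationship, and it reproduces the reported standard-error range from the reported variance range. The toy data are hypothetical, and the formula SE = sqrt(variance / n) with n = 249 is an assumption about how Table 1 was produced (it happens to match the reported values).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def standard_error(sample_variance, n):
    """Standard error of the mean, ASSUMING SE = sqrt(variance / n)."""
    return math.sqrt(sample_variance / n)

# Hypothetical toy data: the same x with a linear and a quadratic response.
x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]
quadratic = [v ** 2 for v in x]

r_linear = pearson_r(x, linear)        # perfectly linear relationship
r_quadratic = pearson_r(x, quadratic)  # strong but non-linear relationship

# Variance range and sample count as reported in the text.
se_min = standard_error(0.00539, 249)
se_max = standard_error(131.891, 249)
```

The quadratic response is fully determined by x, yet its Pearson coefficient is zero, which is why a low r alone does not rule out a useful predictor.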

Table 1. Descriptive statistics of the dataset.

Figure 2. Correlation heat map.

3 Machine learning algorithms

3.1 Adaptive boosting

Adaptive Boosting (AdaBoost) is an ensemble of several weak learner decision trees, each of which outperforms random guessing by only a modest margin. To reduce the error of the previous tree, the adaptive feature of the AdaBoost technique passes information about that tree’s errors, in the form of updated sample weights, to subsequent trees. As a result, the continuous process of learning trees at every stage promotes the growth of an efficient learner. The final prediction is the weighted average of the predictions produced by each tree. The weight distribution over the samples in the dataset is changed throughout the training of every tree model: in each iteration, higher weights are assigned to misclassified data points in an attempt to lower the overall classification error. The training outcomes fluctuate in line with the variation of the training data, and the total of all the outcomes is the result of this process (Schapire, 2013). AdaBoost’s significant adaptability improves its robustness against outliers and irrelevant data. Moreover, because the information gathered by prior trees is fed to subsequent trees, the later trees can concentrate on the training data that present prediction challenges (Freund and Schapire, 1997).

A single decision tree is called a weak learner because of its limited capabilities. Researchers asked whether a strong learner could be created by combining many weak learners. In 1990, this conjecture was verified, providing the basic ideas behind the boosting algorithm, which combines multiple weak learners in a sequential fashion (Schapire, 1990).
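The re-weighting idea described above can be illustrated with a minimal sketch of the classic binary-classification form of AdaBoost, using one-dimensional decision stumps as weak learners. This is not the regression variant or the Orange implementation used in the study; the ten data points and all function names are hypothetical and chosen only to show how misclassified samples gain weight and how three weak stumps combine into a strong learner.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: returns +polarity if x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train AdaBoost; returns the ensemble and the final sample weights."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # vote of this weak learner
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples (y * prediction = -1) are up-weighted.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble, weights

def predict(ensemble, x):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy labels that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

ensemble_1, weights_1 = adaboost(xs, ys, rounds=1)
ensemble_3, weights_3 = adaboost(xs, ys, rounds=3)
errors_1 = sum(predict(ensemble_1, x) != y for x, y in zip(xs, ys))
errors_3 = sum(predict(ensemble_3, x) != y for x, y in zip(xs, ys))
```

After the first round the three misclassified points carry more weight than the correctly classified ones, which steers the next stump towards them.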

3.2 Gradient boosting

Gradient boosting (GB) is an ensemble technique that builds several weak models and then combines them to enhance overall performance. GB minimizes the loss function of the model by applying gradient descent: weak learners are added iteratively, and the final prediction is the sum of the contributions of all weak learners, chosen so as to minimize the overall error of the strong learner (Islam and Amin, 2020; Aurélien, 2019). At each iteration, GB fits a new model to the residuals (the difference between the actual and predicted values) of the preceding iteration in order to optimize a user-specified loss function, for example the log loss for classification or the mean squared error for regression. GB involves three main components. The first is the loss function to be optimized, which must be differentiable. A loss function quantifies how well a machine learning model agrees with the observed data, and different loss functions may be chosen depending on the problem at hand. The second is the weak learner, for which gradient boosting uses the decision tree. Regression trees are used because they produce real-valued outputs at their splits, so the outputs of successive trees can be added together to correct the residuals of earlier predictions; they serve as the weak learner for both classification and regression problems. The third is the additive model that combines the weak learners: decision trees are added one at a time, and a gradient descent procedure is applied as each tree is added in order to minimize the loss. Rather than updating the parameters of the weak models, gradient descent is applied to the output of the model. By allowing both the loss function and its gradient to vary, gradient boosting, which can be viewed as an extension of the gradient descent method, permits generalization to different problems (Ngo et al., 2023).
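The three components described above (a differentiable loss, a tree weak learner, and an additive model) can be sketched for the squared-error loss, where the negative gradient is simply the residual. The one-dimensional regression stumps, the learning rate, and the toy data below are illustrative assumptions, not the configuration used in this study.

```python
def fit_stump(xs, residuals):
    """Least-squares regression stump: returns (threshold, left_value, right_value)."""
    best = None
    for threshold in sorted(set(xs))[1:]:   # skip the minimum so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def gradient_boost(xs, ys, n_trees, learning_rate):
    """Additive model: constant base prediction plus shrunken stumps."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    trees = []
    for _ in range(n_trees):
        # For the loss 0.5 * (y - F)^2 the negative gradient is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        trees.append((threshold, left_value, right_value))
        preds = [p + learning_rate * (left_value if x < threshold else right_value)
                 for x, p in zip(xs, preds)]
    return base, trees

def predict(base, trees, learning_rate, x):
    return base + learning_rate * sum(lv if x < t else rv for t, lv, rv in trees)

def mse(base, trees, learning_rate, xs, ys):
    return sum((predict(base, trees, learning_rate, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(ys)

# Hypothetical one-feature data (not taken from the study's dataset).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0.20, 0.22, 0.25, 0.31, 0.38, 0.41, 0.47, 0.52]
rate = 0.1

base, trees_1 = gradient_boost(xs, ys, n_trees=1, learning_rate=rate)
_, trees_50 = gradient_boost(xs, ys, n_trees=50, learning_rate=rate)
mse_base = mse(base, [], rate, xs, ys)
mse_1 = mse(base, trees_1, rate, xs, ys)
mse_50 = mse(base, trees_50, rate, xs, ys)
```

Each added stump lowers the training error a little, so the error with 50 stumps is below the error with one stump, which is below that of the constant base prediction.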

3.3 Extreme gradient boosting

Extreme Gradient Boosting (XGBoost) is a method developed by Chen and Guestrin (2016) based on gradient boosting. In this method, decision trees are usually used as the weak learners (Zounemat-Kermani et al., 2021). The predictions are based on a sequence of weak learners, each of which improves on the output of its predecessors. To address the overfitting problem, XGBoost adds a regularization component to the objective function given in Equation 1.

O = \sum_{i = 1}^{n} (L (y_{i}, F (x_{i}))) + \sum_{k = 1}^{t} R (f_{k}) + C (1)

where O is the objective function, L is the loss function evaluated between the measured value y_i and the prediction F(x_i), R(f_k) denotes the regularization term at the k-th iteration, and C is a constant. The regularization term is expressed in Equation 2 as:

R (f_{k}) = α H + \frac{1}{2} η \sum_{j = 1}^{H} w_{j}^{2} (2)

where α denotes the complexity of leaves, H represents the number of leaves, η denotes the penalty parameter, and w_j is the output of each leaf node. The XGBoost algorithm splits the trees either level-wise or according to depth. At each split, the algorithm examines the candidate features and their corresponding thresholds and selects the branch with the best outcome. The tree structure is extended by consecutive splits.
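As a small worked instance of Equations 1 and 2, the sketch below evaluates the regularization term for one tree and adds it to a loss term. The squared-error loss, the leaf outputs, and the values of α and η are hypothetical choices made only for the arithmetic; they are not the settings used in this study.

```python
def regularization_term(leaf_weights, alpha, eta):
    """Equation 2: R(f) = alpha * H + 0.5 * eta * sum of squared leaf outputs."""
    n_leaves = len(leaf_weights)  # H in Equation 2
    return alpha * n_leaves + 0.5 * eta * sum(w ** 2 for w in leaf_weights)

def objective(y_true, y_pred, trees, alpha, eta, constant=0.0):
    """Equation 1, ASSUMING a squared-error loss L(y, F) = (y - F)^2."""
    loss = sum((y - f) ** 2 for y, f in zip(y_true, y_pred))
    penalty = sum(regularization_term(leaves, alpha, eta) for leaves in trees)
    return loss + penalty + constant

# Hypothetical tree with three leaves and hypothetical penalty settings.
leaves = [0.5, -0.2, 0.1]
penalty_value = regularization_term(leaves, alpha=1.0, eta=2.0)
# = 1.0 * 3 + 0.5 * 2.0 * (0.25 + 0.04 + 0.01) = 3.30

objective_value = objective([1.0, 2.0], [1.5, 1.5], [leaves], alpha=1.0, eta=2.0)
# loss = 0.25 + 0.25 = 0.50, so the objective is 0.50 + 3.30 = 3.80
```

A tree with more leaves or larger leaf outputs incurs a larger penalty, which is how the objective discourages overly complex trees.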

3.4 Categorical boosting (CatBoost)

Categorical Boosting (CatBoost) is a permutation-based approach that differs from conventional boosting algorithms and offers a distinctive method for processing categorical data (Prokhorenkova et al., 2018; Dorogush et al., 2018). The method introduces two new concepts: ordered target statistics and ordered boosting. Hancock and Khoshgoftaar (2020) provided a thorough analysis of this method, looking at how well it works in a variety of domains for classification and regression problems. To handle categorical features, CatBoost uses target statistics as additional numerical features, a highly successful strategy that minimizes information loss (Prokhorenkova et al., 2018). CatBoost uses ordered boosting, a type of gradient-based regularization that prevents overfitting by limiting model complexity. To compute the target statistics, the algorithm arranges the dataset in a random order and then, for each sample, calculates the mean label value of the preceding training samples in that order that fall into the same category. Following Prokhorenkova et al. (2018), if σ = (σ₁, σ₂, . . ., σ_n) is a permutation, the category $x_{σ_{p}, k}$ can be substituted with the average label value $\hat{x}_{σ_{p}, k}$ in Equation 3.

\hat{x}_{σ_{p}, k} = \frac{\sum_{j = 1}^{p - 1} [x_{σ_{j}, k} = x_{σ_{p}, k}] Y_{σ_{j}} + a \cdot P}{\sum_{j = 1}^{p - 1} [x_{σ_{j}, k} = x_{σ_{p}, k}] + a} (3)

where P is a prior value; a is the weight of the prior; $Y_{σ_{j}}$ is a label value; and [·] denotes the Iverson bracket, namely, $[x_{σ_{j}, k} = x_{σ_{p}, k}]$ equals 1 if $x_{σ_{j}, k} = x_{σ_{p}, k}$ and 0 otherwise. For further details regarding CatBoost, interested readers are referred to Prokhorenkova et al. (2018) and Dorogush et al. (2018).
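Equation 3 can be read as a running, per-category average that only looks at samples appearing earlier in the permutation. The sketch below is a direct transcription under that reading; the category names, label values, prior, and prior weight are hypothetical (the dataset in this study contains numerical features only), and the function is not the CatBoost library's API.

```python
def ordered_target_statistics(categories, labels, permutation, prior, prior_weight):
    """Encode each sample's category with Equation 3.

    For the sample at position p of the permutation, only samples at
    positions 1..p-1 with the same category contribute to the average.
    """
    encoded = [0.0] * len(categories)
    label_sums = {}   # running sum of labels seen so far, per category
    counts = {}       # running count of samples seen so far, per category
    for idx in permutation:
        cat = categories[idx]
        seen_sum = label_sums.get(cat, 0.0)
        seen_count = counts.get(cat, 0)
        encoded[idx] = (seen_sum + prior_weight * prior) / (seen_count + prior_weight)
        # Update the running statistics only after encoding the current sample,
        # so a sample's own label never leaks into its encoding.
        label_sums[cat] = seen_sum + labels[idx]
        counts[cat] = seen_count + 1
    return encoded

# Hypothetical categorical feature and labels.
categories = ["clay", "sand", "clay", "clay", "sand"]
labels = [0.30, 0.50, 0.40, 0.20, 0.60]
permutation = [0, 1, 2, 3, 4]   # identity order, for readability
prior = sum(labels) / len(labels)   # 0.4, a common but assumed choice of P

encoded = ordered_target_statistics(categories, labels, permutation,
                                    prior=prior, prior_weight=1.0)
# First "clay" and first "sand" fall back to the prior (0.4);
# the second "clay" gets (0.30 + 0.4) / 2 = 0.35.
```

Because each encoding uses only earlier samples, the target value of a sample is never used to encode that same sample.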

4 Performance evaluation

The evaluation stage involves the computation of several assessment metrics: the coefficient of determination (R²), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Deviation (MAD). These metrics gauge the model’s performance by indicating how closely its predictions agree with the actual target values. The formulations used to calculate these performance metrics are expressed in Equations 4–8.

R^{2} = 1 - \frac{\sum_{i = 1}^{N} (\hat{y}_{i} - y_{i})^{2}}{\sum_{i = 1}^{N} (\bar{y} - y_{i})^{2}} (4)

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} (\hat{y}_{i} - y_{i})^{2}} (5)

MAE = \frac{1}{N} \sum_{i = 1}^{N} |\hat{y}_{i} - y_{i}| (6)

MAPE = \frac{1}{N} \sum_{i = 1}^{N} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \times 100 (7)

MAD = \frac{1}{N} \sum_{i = 1}^{N} |\hat{y}_{i} - \bar{y}| (8)

where $\hat{y}_{i}$ represents the predicted value; $\bar{y}$ represents the average of the measured values; $y_{i}$ represents the measured value; and $N$ is the number of samples in the training or testing set being evaluated.
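A minimal sketch of Equations 4–8 is given below, with every sum taken over the samples of the set being evaluated and the average computed from the measured values (both are assumptions about the intended reading of the equations). MAD is implemented as written in Equation 8. The four measured and predicted values are hypothetical.

```python
import math

def r_squared(y_true, y_pred):
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((mean_y - y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    # Requires every measured value to be non-zero.
    return 100 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)

def mad(y_true, y_pred):
    # Equation 8 as written: mean absolute deviation of the predictions
    # from the average of the measured values.
    mean_y = sum(y_true) / len(y_true)
    return sum(abs(p - mean_y) for p in y_pred) / len(y_pred)

# Hypothetical measured and predicted shear strengths (kg/cm²).
measured = [0.30, 0.40, 0.50, 0.40]
predicted = [0.32, 0.38, 0.47, 0.43]

metrics = {
    "R2": r_squared(measured, predicted),
    "RMSE": rmse(measured, predicted),
    "MAE": mae(measured, predicted),
    "MAPE": mape(measured, predicted),
    "MAD": mad(measured, predicted),
}
```

For these toy values the residual sum of squares is 0.0026 and the total sum of squares is 0.02, so R² = 1 − 0.13 = 0.87.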

Figure 3 presents a schematic view of the steps of the methodology.

Figure 3. The methodology applied during the model development and performance evaluation.

5 Results and discussion

The models for predicting the soil shear strength were developed using Orange, a popular open-source machine learning platform for statistical computation and data mining (Demšar et al., 2013). Orange (version 3.32.0), developed by the Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, in collaboration with the open-source community, was used to analyze the data in this work. It incorporates a comprehensive range of ML algorithms that are widely utilized in research and practice. In this study, depth of sample (X1, m), sand percentage (X2), loam percentage (X3), clay percentage (X4), moisture content (X5, %), wet density (X6, g/cm³), dry density (X7, g/cm³), void ratio (X8), liquid limit (X9, %), plastic limit (X10, %), plastic index (X11), and liquidity index (X12) were the predictor variables, and shear strength of soil (Y, kg/cm²) was the target variable. Every modelling stage requires the selection of appropriately sized training and testing datasets. Thus, 175 data points (70% of the total data) were used to develop the models, and 74 data points (30%) were used to evaluate them. The hyperparameters of the proposed models were tuned by trial and error to obtain the values giving the most accurate soil shear strength prediction. Appropriate hyperparameter tuning leads to more efficient training, better performance, and a more generalizable model. This research identifies the best values for some important model parameters and clarifies the definitions of these hyperparameters. The tuning parameters were adjusted over successive trials until the best values, shown in Table 2, were achieved.
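The 175/74 partition quoted above follows from rounding 70% of 249 samples up to a whole sample. The sketch below reproduces those counts with a seeded random shuffle; the shuffling strategy, the seed, and the function name are assumptions for illustration, since the text does not state how the samples were assigned to each subset.

```python
import math
import random

def split_indices(n_samples, train_fraction, seed):
    """Shuffle sample indices and split them into training and testing sets."""
    n_train = math.ceil(train_fraction * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded for reproducibility
    return indices[:n_train], indices[n_train:]

# 249 samples as reported; the seed value is arbitrary (hypothetical).
train_idx, test_idx = split_indices(n_samples=249, train_fraction=0.7, seed=42)
n_train, n_test = len(train_idx), len(test_idx)
```

Fixing the seed keeps the partition identical across runs, which matters when hyperparameters are tuned by repeated trials on the same split.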

Table 2. Hyperparameter optimization results.

The efficacy of the developed models was assessed using a number of performance metrics, including the coefficient of determination (R²), mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute deviation (MAD). Table 3 and Figures 4, 5 summarize the developed models’ performance during the training and testing stages. In the training phase, the proposed AdaBoost model achieved the highest coefficient of determination (R² = 0.99794) and the lowest values of RMSE = 0.00400, MAE = 0.00080, MAPE = 0.24390, and MAD = 0.00080, followed by the CatBoost model (R² = 0.99651, RMSE = 0.00521, MAE = 0.00429, MAPE = 1.33450, and MAD = 0.00429). In the testing phase, however, the XGBoost model achieved the highest R² (R² = 0.86359, RMSE = 0.03158, MAE = 0.02454, MAPE = 7.53478, and MAD = 0.02454), followed by the AdaBoost model, whose R² was slightly lower (R² = 0.85689, RMSE = 0.03234, MAE = 0.02405, MAPE = 7.36267, and MAD = 0.02405). The scatter plots display the predicted against the actual soil shear strengths together with the line x = y. A point on the line x = y represents an error-free prediction; the closer the points lie to this line, the more accurate the model.

Table 3. Model performance in training and testing phases.

Figure 4. The predicted versus actual soil shear strength: (A) XGBoost, (B) GB, (C) AdaBoost, and (D) CatBoost models based on the training dataset.

Figure 5. The predicted versus actual soil shear strength: (A) XGBoost, (B) GB, (C) AdaBoost, and (D) CatBoost models based on the testing dataset.

The accuracy of all developed models at predicting soil shear strength values is depicted in Figures 6A–D for the training dataset and Figures 7A–D for the testing dataset. For the training dataset, the AdaBoost model (see Figure 6C) provided the most reliable prediction, as its predicted results are sufficiently consistent with the actual shear strength values and its errors are smaller, whereas for the testing dataset the XGBoost model (see Figure 7A) showed reliable prediction. This is visible in the tighter clustering of the AdaBoost results around the zero line (y = 0) in the training and testing datasets, with the exception of a few noise points. The comparative findings show that AdaBoost predicts the soil shear strength values more accurately than the other models (i.e., XGBoost, GB, and CatBoost).

Figure 6. Comparison of the proposed models results in the training dataset (A) XGBoost, (B) GB, (C) AdaBoost, and (D) CatBoost in predicting soil shear strength values.

Figure 7. Comparison of the proposed models results in the testing dataset (A) XGBoost, (B) GB, (C) AdaBoost, and (D) CatBoost in predicting soil shear strength values.

The Taylor diagram, a straightforward graphical representation of the relationship between predicted and actual data, is used to evaluate the effectiveness of various simulation models. It presents standard deviations, correlation coefficients, and root-mean-square (RMS) differences on a two-dimensional graph to illustrate a statistical comparison of multiple models. The radial distance from the origin expresses the standard deviation, and the azimuthal angle represents the correlation coefficient. The centered RMS difference between the predicted and actual fields, in the same units as the standard deviation, is given by the distance between a model’s point and the reference (actual) point. Figure 8 shows the Taylor diagrams for the training and testing datasets. It is evident from Figure 8 that all the developed models, i.e., CatBoost, XGBoost, AdaBoost, and GB, perform well in both the training and testing phases. The points for CatBoost and XGBoost almost coincide, indicating equally good performance; however, AdaBoost appears slightly better in both phases.
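The three quantities shown on a Taylor diagram are tied together by the law of cosines: E² = σ_p² + σ_a² − 2 σ_p σ_a r, where E is the centered RMS difference, σ_p and σ_a are the standard deviations of the predicted and actual values, and r is their correlation coefficient. The sketch below computes these statistics and checks the identity numerically; the sample values are hypothetical, and population standard deviations (division by n) are assumed.

```python
import math

def taylor_statistics(actual, predicted):
    """Return (std_actual, std_predicted, correlation, centered_rms_difference)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    corr = sum((a - mean_a) * (p - mean_p)
               for a, p in zip(actual, predicted)) / (n * std_a * std_p)
    # Centered RMS difference: means are removed before differencing.
    crmsd = math.sqrt(sum(((p - mean_p) - (a - mean_a)) ** 2
                          for a, p in zip(actual, predicted)) / n)
    return std_a, std_p, corr, crmsd

# Hypothetical actual and predicted shear strengths (kg/cm²).
actual = [0.30, 0.40, 0.50, 0.40, 0.35]
predicted = [0.32, 0.38, 0.47, 0.43, 0.36]

std_a, std_p, corr, crmsd = taylor_statistics(actual, predicted)
law_of_cosines = std_p ** 2 + std_a ** 2 - 2 * std_p * std_a * corr
```

This identity is what allows a single point per model to encode all three statistics: a point close to the reference point has both a similar standard deviation and a high correlation.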

Figure 8. Taylor diagram of proposed models for (A) Training, and (B) Testing phases.

The results of the current research were also validated against models reported in the literature over the training and testing phases. In the ANN of Rabbani et al. (2023), weights and biases are the parameters that organize the computational connections among the network’s components. The optimal number of hidden processing units was determined through a procedure of trial and error. In the course of this inquiry, 500 iterations were carried out to ensure that the simulations have a suitable level of reliability. For the optimal hyperparameters of the other models presented in Table 4, such as SVM and BPNN, readers may consult Cao et al. (2022). It is worth mentioning that the data and input parameters in this study were kept the same as those of Rabbani et al. (2023) and Cao et al. (2022) to allow a fair comparison. Table 4 presents the comparative performance of the soft computing models studied for soil shear strength prediction. According to the results, the AdaBoost model was comparatively the best model, with R² = 0.8569 and RMSE = 0.0323, whereas the BPNN model developed by Cao et al. (2022) showed worse performance, with R² = 0.659 and RMSE = 0.047 on the testing data. The comparative analysis indicates that the AdaBoost model can be implemented in future applications.

Table 4. Comparison of the developed models with available models in literature.

After calculating all the performance indices for the training and testing phases, the models were ranked accordingly. The ideal value of R² is 1, whereas for RMSE, MAE, MAPE, and MAD it is 0. For each performance index, the best model received the maximum of four points (as four models were used in this study), while the worst model received one point; two distinct models that produce identical results may receive equal scores. The scores were then summed to give a total rank for each model (Mustafa et al., 2022). The resulting rank analysis of all the developed models is shown in Table 5 and is used to pick the best model (Xue et al., 2023). The overall score of the AdaBoost model over the training and test phases together is 38, considerably higher than XGBoost (29), CatBoost (20), and GB (13). This gives an in-depth evaluation of the models’ predictive ability (Mustafa et al., 2022). As a result, the AdaBoost model outperforms the other developed models in predicting the shear strength of soil.
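The scoring scheme can be sketched as follows: for each metric, a model's score is the number of models minus the number of models that beat it, so the best of four receives 4 and the worst receives 1. The tie rule (tied models both receive the higher score) is an assumption, and the model names and metric values below are hypothetical placeholders rather than entries from Table 3 or Table 5.

```python
def rank_scores(results, higher_is_better):
    """Sum per-metric rank scores for each model.

    results: {model_name: {metric_name: value}}
    higher_is_better: set of metric names where larger values are better;
                      all other metrics are treated as errors (smaller is better).
    """
    models = list(results)
    metrics = list(next(iter(results.values())))
    totals = {m: 0 for m in models}
    for metric in metrics:
        for model in models:
            value = results[model][metric]
            if metric in higher_is_better:
                n_better = sum(1 for o in models if results[o][metric] > value)
            else:
                n_better = sum(1 for o in models if results[o][metric] < value)
            # Tied models share the same (higher) score under this ASSUMED rule.
            totals[model] += len(models) - n_better
    return totals

# Hypothetical metric values for four placeholder models.
results = {
    "model_a": {"R2": 0.90, "RMSE": 0.030, "MAE": 0.020},
    "model_b": {"R2": 0.88, "RMSE": 0.032, "MAE": 0.020},  # ties model_a on MAE
    "model_c": {"R2": 0.85, "RMSE": 0.035, "MAE": 0.026},
    "model_d": {"R2": 0.80, "RMSE": 0.040, "MAE": 0.030},
}

totals = rank_scores(results, higher_is_better={"R2"})
```

With five metrics and two phases, as in this study, the maximum attainable total under this scheme is 5 × 2 × 4 = 40.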

Table 5. Rank analysis of developed models.

6 Shapley analysis

Lundberg and Lee’s SHapley Additive exPlanations (SHAP) technique explains the prediction for an instance x by quantifying the contribution of each feature to that prediction. The SHAP summary plot conveys both the importance and the impact of each feature. Each dot represents the Shapley value of one feature for one instance in the dataset. The horizontal axis gives the Shapley value, whose sign indicates whether the impact on the prediction is positive or negative, and the vertical axis lists the features. Features are ordered by importance, decreasing from top to bottom, so that features with higher SHAP values are more important. Colour indicates the feature value on a gradient: red denotes higher values and blue denotes lower values. The result of the Shapley analysis based on the AdaBoost model is given as a summary plot in Figure 9. Notice that most of the red points for liquidity index (X12) are located on the negative side of the SHAP values. Since red indicates higher values, an increase in liquidity index (X12) yields a negative SHAP value and consequently a negative impact on the output (shear strength in this case). Higher values of liquidity index (X12) increase the shear strength significantly, while lower values of wet density (X6) and loam percentage (X3) decrease the shear strength of soil significantly.
After liquidity index (X12), wet density (X6) is the next most important variable; its red and blue points are spread continuously over a narrower range than those of liquidity index (X12). Furthermore, the void ratio (X8) variable does not have a significant impact on the prediction of shear strength of soil. It is important to mention that the SHAP values on the horizontal axis may be very small because of limited data variety. Since the data were collected from an actual construction project, the soil in the study region has distinctive qualities. This led to lower values for critical elements, which in turn led to lower values of soil shear strength. Small datasets present unique challenges that can affect the Shapley values, which are calculated based on the contribution of each feature to the model’s prediction.
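To make the notion of a feature's contribution concrete, the sketch below computes exact Shapley values for a three-feature toy model by averaging each feature's marginal contribution over all feature orderings. Absent features are replaced by a single baseline value, which is a simplifying assumption; SHAP implementations typically average over background data or use tree-specific algorithms. The model, instance, and baseline are hypothetical and unrelated to the fitted AdaBoost model.

```python
from itertools import permutations

def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating all feature orderings."""
    n = len(instance)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)          # start with every feature "absent"
        previous_output = model(current)
        for j in order:
            current[j] = instance[j]      # switch feature j to its actual value
            new_output = model(current)
            phi[j] += new_output - previous_output   # marginal contribution
            previous_output = new_output
    return [value / len(orderings) for value in phi]

def toy_model(x):
    """Hypothetical model with one interaction term between features 0 and 2."""
    return 0.4 - 0.2 * x[0] + 0.1 * x[1] + 0.05 * x[0] * x[2]

instance = [1.0, 2.0, 2.0]
baseline = [0.0, 0.0, 0.0]

phi = shapley_values(toy_model, instance, baseline)
# Feature 0: -0.2 plus half of the 0.1 interaction = -0.15
# Feature 1: 0.1 * 2.0 = 0.20
# Feature 2: half of the 0.1 interaction = 0.05
```

The contributions sum to the difference between the prediction for the instance (0.5) and the baseline prediction (0.4), which is the additivity property that gives SHAP its name.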

Figure 9. SHAP diagram for the AdaBoost model.

7 Conclusion

In this study, ML algorithms, namely XGBoost, GB, AdaBoost, and CatBoost, were used to predict soil shear strength. The performance of the developed models was evaluated using the statistical metrics R², RMSE, MAE, MAPE, and MAD and compared with soft computing models recently reported in the literature. The main findings are as follows.

(1) The total rank score of the AdaBoost model over the training and test phases combined is 38, considerably higher than those of XGBoost (29), CatBoost (20), and GB (13). AdaBoost therefore outperforms the other developed models in predicting soil shear strength.

(2) The proposed models, i.e., XGBoost, GB, AdaBoost, and CatBoost, perform better than the models recently reported in the literature, with smaller errors between the actual and predicted values in the training and test sets. The coefficient of determination in the training phase is highest for AdaBoost (0.9979). In the testing phase, XGBoost (0.86359) has a slight lead over AdaBoost (0.8569).

(3) The developed models were validated using several error metrics (RMSE, MAE, MAPE, and MAD), and the results showed that the models met the acceptance criteria suggested in the literature.

(4) The Shapley analysis shows that liquidity index (X12) is the variable with the greatest influence on soil shear strength, followed by wet density (X6). Higher values of liquidity index (X12) receive negative SHAP values and therefore decrease the soil shear strength significantly, while lower values of plastic index (X11) and loam percentage (X3) also decrease the soil shear strength significantly. The void ratio (X8) does not have a significant impact on the prediction of soil shear strength.
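
The rank totals in finding (1) sum to 100, which is consistent with a scheme in which, for each of the five metrics in each of the two phases, the four models receive scores from 4 (best) to 1 (worst). The sketch below illustrates such a scheme under that assumption. The model names and metric values are placeholders chosen for illustration, not results from this study, and ties are not handled.

```python
# Placeholder metric values for illustration only; NOT results from the study.
HIGHER_IS_BETTER = {"R2": True, "RMSE": False, "MAE": False}

scores = {
    "ModelA": {"R2": 0.95, "RMSE": 0.010, "MAE": 0.008},
    "ModelB": {"R2": 0.90, "RMSE": 0.020, "MAE": 0.006},
    "ModelC": {"R2": 0.85, "RMSE": 0.015, "MAE": 0.012},
}


def rank_totals(scores, higher_is_better):
    """Sum per-metric rank scores: best model gets n points, worst gets 1."""
    totals = {model: 0 for model in scores}
    n_models = len(scores)
    for metric, higher in higher_is_better.items():
        # Sort from best to worst for this metric
        ordered = sorted(scores, key=lambda m: scores[m][metric], reverse=higher)
        for position, model in enumerate(ordered):
            totals[model] += n_models - position
    return totals


totals = rank_totals(scores, HIGHER_IS_BETTER)
for model, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(model, total)
```

Applying the same function to the metrics of each phase and adding the two results would give a combined training and test total of the kind quoted in finding (1).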

The presented models give more accurate and reliable predictions when interpolating within the range of the input values used to develop them than when extrapolating beyond that range. Therefore, the models should not be applied to input parameter values outside the range covered by the study. It should also be noted that the accuracy and reliability of machine learning algorithms depend on the dataset, including the number and type of samples. Additional samples should therefore be collected and more effective models proposed in future work.
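
The recommendation against extrapolation can be enforced with a simple guard before a prediction is made. The sketch below is one possible way to do this; the feature names and the ranges in `TRAINING_RANGES` are hypothetical placeholders and would need to be replaced with the minimum and maximum values of the dataset actually used to train the model.

```python
# Hypothetical (min, max) ranges; replace with the ranges of the training dataset.
TRAINING_RANGES = {
    "liquidity_index": (0.0, 1.0),
    "wet_density": (1.6, 2.1),
    "void_ratio": (0.6, 1.4),
}


def features_outside_range(sample, ranges):
    """Return names of features that are missing or outside their [min, max] range."""
    outside = []
    for name, (low, high) in ranges.items():
        value = sample.get(name)
        if value is None or not (low <= value <= high):
            outside.append(name)
    return outside


inside_sample = {"liquidity_index": 0.5, "wet_density": 1.9, "void_ratio": 1.0}
outside_sample = {"liquidity_index": 1.3, "wet_density": 1.9, "void_ratio": 0.5}

for sample in (inside_sample, outside_sample):
    flagged = features_outside_range(sample, TRAINING_RANGES)
    if flagged:
        print("Extrapolation; do not rely on the prediction. Out of range:", flagged)
    else:
        print("All inputs lie within the training range.")
```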

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

MA: Conceptualization, Formal Analysis, Investigation, Methodology, Software, Visualization, Writing–original draft, Writing–review and editing. MAZ: Data curation, Funding acquisition, Resources, Software, Validation, Writing–review and editing. HA: Data curation, Formal Analysis, Project administration, Supervision, Writing–review and editing. MS: Funding acquisition, Investigation, Methodology, Resources, Software, Supervision, Validation, Writing–review and editing. JM: Data curation, Formal Analysis, Investigation, Resources, Writing–review and editing. SH: Investigation, Software, Validation, Writing–review and editing. TO: Formal Analysis, Funding acquisition, Methodology, Project administration, Writing–review and editing. AA: Data curation, Formal Analysis, Validation, Visualization, Writing–review and editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This research was funded by Taif University, Saudi Arabia, Project No. (TU-DSPP-2024-33). Furthermore, the research is partially funded by the Ministry of Science and Higher Education of the Russian Federation as part of the World-Class Research Center program: Advanced Digital Technologies (contract No. 075-15-2022-311 dated 20.04.2022).

Acknowledgments

The authors extend their appreciation to Taif University, Saudi Arabia, for supporting this work through project number (TU-DSPP-2024-33).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Generative AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feart.2025.1542291/full#supplementary-material

References

Abdullah, G. M. S., Ahmad, M., Babur, M., Badshah, M. U., Al-Mansob, R. A., Gamil, Y., et al. (2024). Boosting-based ensemble machine learning models for predicting unconfined compressive strength of geopolymer stabilized clayey soil. Sci. Rep. 14 (1), 2323. doi:10.1038/s41598-024-52825-7

Ahmad, F., Tang, X. W., Ahmad, M., González-Lezcano, R. A., Majdi, A., and Arbili, M. M. (2023a). Stability risk assessment of slopes using logistic model tree based on updated case histories. Math. Biosci. Eng. 20 (12), 21229–21245. doi:10.3934/mbe.2023939

Ahmad, F., Tang, X. W., Qiu, J. N., Wróblewski, P., Ahmad, M., and Jamil, I. (2022b). Prediction of slope stability using Tree Augmented Naive-Bayes classifier: modeling and performance evaluation. Math. Biosci. Eng. 19 (5), 4526–4546. doi:10.3934/mbe.2022209

Ahmad, M., Al-Mansob, R. A., Kashyzadeh, K. R., Keawsawasvong, S., Sabri Sabri, M. M., Jamil, I., et al. (2022d). Extreme gradient boosting algorithm for predicting shear strengths of rockfill materials. Complexity 2022 (1), 9415863. doi:10.1155/2022/9415863
