Exploring alternate coupling inputs of a data-driven model for optimum daily streamflow prediction in calibrated SWAT-BiLSTM rainfall-runoff modeling

Ahmad, Khalil; Iqbal, Mudassar; Tariq, Muhammad Atiq Ur Rehman; Khan, Afed Ullah; Nadeem, Abdullah; Chen, Jinlei; Usanova, Kseniia; Almujibah, Hamad; Alyami, Hashem; Abid, Muhammad

doi:10.3389/frwa.2025.1558218

ORIGINAL RESEARCH article

Front. Water, 02 April 2025

Sec. Water and Hydrocomplexity

Volume 7 - 2025 | https://doi.org/10.3389/frwa.2025.1558218

Khalil Ahmad¹,²

Mudassar Iqbal¹*

Muhammad Atiq Ur Rehman Tariq¹

Afed Ullah Khan²

Abdullah Nadeem¹

Jinlei Chen³*

Kseniia Usanova⁴,⁵

Hamad Almujibah⁶

Hashem Alyami⁷

Muhammad Abid⁸

¹Centre of Excellence in Water Resources Engineering, University of Engineering and Technology, Lahore, Pakistan
²Department of Civil Engineering, University of Engineering and Technology Peshawar, Bannu Campus, Bannu, Pakistan
³State Key Laboratory of Cryospheric Science and Frozen Soil Engineering, Northwest Institute of Eco-Environment and Resources, Chinese Academy of Sciences, Lanzhou, China
⁴Scientific and Technological Complex for Digital Engineering in Construction, Peter the Great St. Petersburg Polytechnic University, Saint Petersburg, Russia
⁵Academy of Engineering, RUDN University, Moscow, Russia
⁶Department of Civil Engineering, College of Engineering, Taif University, Taif, Saudi Arabia
⁷Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
⁸College of Aerospace and Civil Engineering, Harbin Engineering University, Harbin, China

Accurate streamflow prediction in mountainous regions is vital for sustaining water resources in downstream areas, ensuring reliable availability for agriculture, energy, and consumption. However, physically based prediction models are prone to substantial uncertainties due to complex processes and the inherent variability in model parameters and parameterization. This study addresses these challenges by exploring alternative coupling inputs for data-driven (DD) models to optimize daily streamflow prediction in a calibrated SWAT-BiLSTM rainfall-runoff model within the Astore sub-basin of the Upper Indus Basin (UIB), Pakistan. The research compares two standalone models (SWAT and BiLSTM) with three alternative coupling inputs in the calibrated SWAT-BiLSTM model: conventional climatic variables (precipitation and temperature), inputs selected by cross-correlation analysis, and exclusion of direct climatic inputs. The calibration, validation, and prediction periods span 2007 to 2011, 2012 to 2015, and 2017 to 2019, respectively. Based on compromise programming (CP) ranking, SWAT-C-BiLSTM (Q_p) and SWAT-C-BiLSTM (T₁Q_p) performed best, followed by BiLSTM, SWAT-C-BiLSTM (PTQ_p), and SWAT. These findings indicate that the alternative that excludes climatic parameters, SWAT-C-BiLSTM (Q_p), is sufficient to enhance the coupled model’s accuracy, and they underscore the potential of this approach to contribute to sustainable water resource management.

1 Introduction

Streamflow is an integral part of the hydrological cycle and underpins water resource planning, flood and drought forecasting, flood damage assessment, and the evaluation of water-related risks. Managing these aspects effectively requires accurate streamflow prediction (Cheng et al., 2020; Cui et al., 2020). Hydrological models are vital for predicting floods and low flows, for both academics and practitioners in river management (Pfannerstill et al., 2014). However, the major difficulty lies in simulating all phases of the hydrological process accurately using a consistent set of model parameters (Madsen, 2000). This matters because underpredicting high flows elevates flood risk, while overpredicting low flows can mask water scarcity. Hydrological modeling has been shown to estimate streamflow efficiently in different ecological settings, and the models can be broadly classified as process-based conceptual models or data-driven models (Fan et al., 2020; Schoppa et al., 2020).

Process-based conceptual hydrological models provide an abstract, mathematical representation of the processes in the water cycle. The Soil and Water Assessment Tool (SWAT) (Gassman et al., 2007), Hydrologiska Byråns Vattenbalansavdelning (HBV) (Lindström et al., 1997), Precipitation-Runoff Modeling System Version IV (PRMS-IV) (Markstrom et al., 2015), Hydrological Predictions for the Environment (HYPE) (Lindström et al., 2010), and Modular Integrated Kinetic Equations – System Hydrologique Europeen (MIKE-SHE) (Jaber and Shukla, 2012) are some of the models that have been developed and used internationally. However, challenges persist in selecting the appropriate model, configuring it effectively, addressing uncertainties in data inputs and their quality, and determining related calibrated parameters (Kavetski et al., 2006; Dakhlalla and Parajuli, 2019; Ghaith and Li, 2020). Additionally, a significant issue in model calibration is equifinality, where similar simulation outcomes can be achieved using different parameter sets. Moreover, these models have overestimated peak flows, especially during flood events, in several places, including Pakistan (Haleem et al., 2022; Masood et al., 2023), Norway (Huang et al., 2019), the US (Shrestha et al., 2019), and elsewhere (Molina-Navarro et al., 2014; Bizuneh et al., 2021; Valeh et al., 2021).

Data-driven (DD) models, often called black-box models, aim to identify relationships, linear or nonlinear, between input and target variables relying only on data, without explicitly representing the underlying processes of the system. Within the DD category, deep learning has been widely promoted and used in many hydrological applications (Maniquiz et al., 2010; Li et al., 2015; Lee J. et al., 2020). For instance, long short-term memory (LSTM), initially used for tasks such as machine translation, speech-to-text conversion, and sequence data processing (Lee T. et al., 2020), has been applied to various hydrological tasks, including flood forecasting (Hu et al., 2018), soil moisture prediction, and groundwater table level prediction. Moreover, bidirectional long short-term memory (BiLSTM), which consists of a pair of LSTMs running in opposite directions, has been found to enhance the performance of hydrological prediction (Ghasemlounia et al., 2021).

Recent advancements in hybrid modeling have further improved streamflow prediction by integrating deep learning with physical models. CNN-LSTM hybrid models, for instance, have demonstrated superior performance in glacier-fed basins by incorporating glacio-hydrological model outputs, significantly improving flood risk assessment and water resource management. A study showed that leveraging glacier-derived features enhanced predictive accuracy (NSE = 0.83, KGE = 0.88), while multi-scale feature analysis further improved high-flow event prediction (NSE = 0.97) (Ougahi and Rowan, 2025). These developments underscore the potential of AI-enhanced hydrological models in addressing the challenges of runoff forecasting.

Combining the strengths of process-based and DD models, SWAT coupled with the LSTM model offers a promising approach for streamflow prediction. This hybrid model leverages various inputs, including meteorological data, to improve the accuracy of streamflow simulations. The SWAT-LSTM approach integrates the interpretability of process-based modeling with the predictive power of machine learning, providing a robust solution for long-duration streamflow simulations in both ungauged and poorly gauged watersheds (Chen et al., 2023). Several coupling options have been tried in recent studies, including an uncalibrated, a partially calibrated, and a fully calibrated process-based model coupled with a data-driven model (Yuan and Forshay, 2022). These studies conclude that calibrated coupled models perform better than partially calibrated and uncalibrated coupled models (Yuan and Forshay, 2022; Noori and Kalin, 2016; Senent-Aparicio et al., 2019; Zhang et al., 2023a; Wang et al., 2023; Yang et al., 2023; Jeong D. S. et al., 2024).

Two main categories of inputs have predominantly been used in the literature. The first involves conventional inputs, typically climatic variables together with the simulated streamflow from process-based models (Yuan and Forshay, 2022; Jin et al., 2024; Jung et al., 2022). The second comprises inputs selected by cross-correlation analysis, which gave reasonably good results in comparison to conventional inputs (Yang et al., 2023; Jeong H. et al., 2024; Zhang et al., 2023b; Afshan et al., 2009; Naeem et al., 2012). However, these approaches have not yet explored excluding climatic parameters during coupling, even though data-driven models may not adequately capture the nonlinearity of climate variables in every region. This study therefore evaluates the coupled model’s performance in streamflow prediction under alternate inputs within a unified framework. This gap presents an opportunity to address uncertainties associated with model inputs and enhance streamflow prediction performance. By advancing the calibrated SWAT-BiLSTM model through the exploration of novel coupling input combinations, this study aims to enhance daily streamflow predictions in hydrology. The inclusion of diverse climate parameters, or their deliberate exclusion, will enable a more nuanced understanding of their roles in reducing uncertainties and improving predictive reliability. Thus, this research contributes to filling a critical gap in the current literature and provides a robust methodology for advancing rainfall-runoff modeling.

The primary focus of this study is to investigate alternative coupling inputs, particularly through the inclusion or exclusion of climatic parameters in data-driven models. This exploration is conducted within a calibrated SWAT-BiLSTM framework for the Astore sub-basin of the UIB, aiming to enhance daily streamflow prediction and improve hydrological modeling accuracy.

2 Description of study area

The Astore sub-basin is located in the northwestern part of the Himalayan range within the UIB, Pakistan, and has an estimated area of 3,988 square kilometers (Figure 1). Elevation ranges from 1,198 m to 8,069 m, and glaciers cover 543 square kilometers. The sub-basin also contains the Nanga Parbat massif, the ninth highest mountain in the world and among the highest in Pakistan and the Himalayas (Afshan et al., 2009). The basin area is mainly covered by glaciers and accumulated seasonal snow, which significantly influence the basin’s hydrological processes (Naeem et al., 2012).

Figure 1. Spatial description of study area (Astore Sub-basin), location in world (left panel) and topographic variation along with hydroclimatic stations (right panel).

The precipitation pattern in the region is predominantly shaped by westerly circulations in the winter and spring, accounting for roughly two-thirds of the total precipitation. The remaining one-third is influenced by the monsoon during the summer and autumn months (Naeem et al., 2012). In the Astore sub-basin, snow cover varies widely, from 7% during the summer to as much as 95% in the winter, playing a critical role in the sustainability of downstream river systems (Tahir et al., 2016). This cryo-nival regime results in a distinct hydrological cycle, where river flows are primarily governed by snow accumulation in winter and subsequent meltwater contributions in spring and summer. The delayed release of water from snowmelt sustains baseflow during drier months, ensuring water availability for agriculture and hydropower generation downstream. However, variations in temperature and snowfall patterns can significantly impact the timing and magnitude of runoff, potentially leading to water scarcity during low-snowfall years or excessive flooding during rapid snowmelt events. Between 1998 and 2012, the mean annual temperatures were recorded at 2.9°C in the Rama valley (elevation 3,179 m) and 9.9°C in the Rattu valley (elevation 2,718 m) (Farhan et al., 2015), highlighting the sensitivity of the region’s hydrology to climatic fluctuations.

3 Materials and methods

This section describes the data sources, gives an overview of the models, and covers model coupling, the exploration of input options in the coupling scenarios, and statistical evaluation with ranking using compromise programming (CP).

3.1 Datasets

Terrestrial data for the study were obtained from various sources. The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Digital Elevation Model (DEM), with a 30 m resolution, was obtained from the National Aeronautics and Space Administration (NASA). Land-use data at 500 m resolution from the Moderate Resolution Imaging Spectroradiometer (MODIS) were retrieved from the United States Geological Survey (USGS). Soil data were collected from the Food and Agriculture Organization (FAO). Figure 2 shows the spatial distribution of MODIS (2020) land use classes, while Figure 3 shows the soil classes.

Figure 2. Description of MODIS land use classes in study area.

Figure 3. Description of FAO soil classes in study area.

The MODIS International Geosphere-Biosphere Program (IGBP) classification scheme provides a global land cover classification system with 17 classes, each representing a distinct land cover type. According to this classification, the major classes in the study area are grassland (57.23%), barren/sparsely vegetated (25.8%), snow/ice (6.51%), and others (10.46%). According to the FAO data, the study area consists of three loamy soils, I-B-U-3712 (83.1%), Be78-2c-3679 (2.5%), and Be72-2a-3669 (0.8%), together with glaciers-6998 (13.6%). Numerous studies in the Upper Indus region have used the same land use and soil datasets and encountered discrepancies between them in the percentage of the snow/glacier class (Haleem et al., 2023; Khan et al., 2023; Mahmood et al., 2024). A possible explanation is the difference in scale, with the FAO Soil Map of the World developed at a much coarser scale of 1:5,000,000 than the MODIS (500 m) land use dataset, together with temporal variation.

Hydroclimatic data covering the period from 2005 to 2019 comprised daily meteorological data (precipitation and maximum and minimum temperature) and daily streamflow data, sourced from the Pakistan Meteorological Department (PMD) and the Water and Power Development Authority (WAPDA), respectively.

3.2 Model description

3.2.1 SWAT model

SWAT is a semi-distributed hydrological model, formulated by Arnold et al. (1998), that assesses the intricate relationships between land and meteorological parameters at the watershed level (Yi and Sophocleous, 2011). Its main modules cover weather generation, water quality, hydrology, and plant growth (Zhang et al., 2023). It includes databases of spatial management practices and agricultural chemical use that permit the prediction of streamflow in large basins with diverse soil, land cover, and management conditions (Song et al., 2022; Raihan et al., 2020). The model requires a minimal set of input data, including terrestrial information such as Digital Elevation Models (DEM), land use, and soil data, along with daily meteorological inputs like precipitation, temperature, wind speed, relative humidity, and solar radiation (Ficklin and Barnhart, 2014; Wisal et al., 2020). Due to the unavailability of observed wind speed, relative humidity, and solar radiation data, the SWAT model’s built-in weather generator was used to simulate these parameters. The weather generator estimates missing climate variables based on statistical distributions derived from historical data, ensuring that meteorological inputs remain consistent with the regional climate conditions. This approach allows for a complete dataset while maintaining the integrity of hydrological simulations.

The SWAT model divides the watershed into a number of hydrological response units (HRUs) based on slope, land use, and soil type, and simulates spatial changes in the watershed from the hydrological dynamics of each HRU to compute water volume in river channels (Arnold et al., 2012). It can estimate snow and glacier melt contributions using a temperature index algorithm (TIA). The Astore sub-basin was subdivided into 5 sub-basins and 47 HRUs. To account for orographic effects, each sub-catchment was split into 10 elevation bands. The SWAT model simulates hydrological processes using the Soil Conservation Service Curve Number (SCS-CN) method for runoff estimation and the Muskingum scheme for channel routing. The Penman-Monteith equation was selected to estimate evapotranspiration; because observed wind speed, solar radiation, and relative humidity data were unavailable, these variables were estimated by the SWAT model’s built-in weather generator from historical climate statistics, which keeps the Penman-Monteith equation applicable. The SWAT model is founded on the principles of the water cycle, as represented by the water balance in Equation 1 (Neitsch et al., 2011):

$$ SW_{t} = SW_{0} + \sum_{i=1}^{t} \left( R_{day} - Q_{surf} - E_{a} - W_{seep} - Q_{gw} \right)_{i} \tag{1} $$

Where:

SW_t: Final soil moisture content (mm).

SW_0: Initial soil moisture content on day i (mm).

t: Time (days).

R_day: Precipitation on day i (mm).

Q_surf: Surface runoff on day i (mm).

E_a: Evapotranspiration on day i (mm).

W_seep: Water entering the vadose zone from the soil profile on day i (mm).

Q_gw: Groundwater return flow on day i (mm).
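As a small illustration of the water balance, the sketch below accumulates daily soil water from the terms defined above. It assumes the standard form of the SWAT water balance, in which surface runoff, evapotranspiration, seepage, and return flow are all subtracted from precipitation. The function name `soil_water_series` and the input values are hypothetical and are not taken from the study.

```python
import numpy as np


def soil_water_series(sw0, r_day, q_surf, e_a, w_seep, q_gw):
    """Return soil water content (mm) at the end of each day.

    Implements SW_t = SW_0 + sum_i (R_day - Q_surf - E_a - W_seep - Q_gw)_i
    as a running sum over the daily series.
    """
    r_day, q_surf, e_a, w_seep, q_gw = (
        np.asarray(a, dtype=float) for a in (r_day, q_surf, e_a, w_seep, q_gw)
    )
    daily_change = r_day - q_surf - e_a - w_seep - q_gw
    return sw0 + np.cumsum(daily_change)


# Toy values in mm, purely illustrative (assumed, not from the paper).
sw = soil_water_series(
    sw0=100.0,
    r_day=[10.0, 0.0, 5.0],
    q_surf=[2.0, 0.0, 1.0],
    e_a=[3.0, 3.0, 3.0],
    w_seep=[1.0, 1.0, 1.0],
    q_gw=[0.5, 0.5, 0.5],
)
```

In this toy run the soil gains water on the first day, when precipitation exceeds the combined losses, and loses water on the two following days.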

The SWAT model can estimate surface runoff (Q_surf) using either the SCS curve number method (Mockus, 1964) or the Green-Ampt infiltration method (Winchell et al., 2010). In this study, the SCS curve number method was chosen, and surface runoff was computed by Equation 2:

$$ Q_{surf} = \frac{\left( R_{day} - 0.2 S \right)^{2}}{R_{day} + 0.8 S} \tag{2} $$

Where Q_surf (mm) is the rainfall excess or surface runoff, R_day (mm) is the rainfall depth for the day, and S (mm) is the maximum retention parameter. The term 0.2S represents the initial abstraction, so runoff occurs only when R_day exceeds 0.2S. The retention parameter varies spatially with land use, management, slope, and soils, and temporally with soil water content. It is given by Equation 3:

$$ S = 25.4 \left( \frac{1000}{CN} - 10 \right) \tag{3} $$

Where S is defined earlier, and CN denotes the curve number.
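The two curve number relations can be checked numerically with a short sketch. The function names and the example inputs (CN = 75, 50 mm of rainfall) are assumptions chosen for illustration only. The zero-runoff branch reflects the usual convention that no runoff is generated until rainfall exceeds the initial abstraction of 0.2S.

```python
def retention_parameter(cn):
    """Maximum retention S (mm) from the curve number (Equation 3)."""
    if not 0 < cn <= 100:
        raise ValueError("curve number must lie in (0, 100]")
    return 25.4 * (1000.0 / cn - 10.0)


def scs_cn_runoff(r_day, cn):
    """Daily surface runoff Q_surf (mm) from daily rainfall (Equation 2)."""
    s = retention_parameter(cn)
    initial_abstraction = 0.2 * s
    if r_day <= initial_abstraction:
        # Rainfall is fully absorbed before any runoff is generated.
        return 0.0
    return (r_day - initial_abstraction) ** 2 / (r_day + 0.8 * s)


# Illustrative inputs (assumed): CN = 75 and a 50 mm rainfall day.
s_example = retention_parameter(75)
q_example = scs_cn_runoff(50.0, 75)
```

With these assumed inputs, S is roughly 84.7 mm and the runoff is roughly 9.3 mm, so most of the 50 mm is retained in the catchment.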

3.2.2 BiLSTM deep learning model

The LSTM is a type of deep learning model developed specifically to address the exploding and vanishing gradient problems that traditional recurrent neural networks (RNNs) experience (Hochreiter and Schmidhuber, 1997). Over time, it has proven effective at processing sequential data (Winchell et al., 2010; Hochreiter and Schmidhuber, 1997; Lipton et al., 2015).

At a given time t, a standard LSTM can learn only from its preceding inputs. By coupling two LSTMs operating in opposite directions, a BiLSTM analyzes information in both forward and backward directions simultaneously, using distinct hidden layers. Because it incorporates the inputs that precede and follow each time t, a BiLSTM has the potential to outperform unidirectional LSTMs, particularly for time-series data. Figure 4 shows the BiLSTM flow chart together with the basic LSTM cell structure and its equations.

Figure 4. LSTM and BiLSTM basic structures along with equations.
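To make the bidirectional idea concrete, the following NumPy sketch runs one LSTM forward in time and a second one backward, then concatenates their hidden states at each time step. It only illustrates the mechanism. The gate ordering, parameter shapes, and the function names `lstm_pass` and `bilstm` are assumptions made here, and the sketch is neither the study's trained network nor a transcription of the equations in Figure 4.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_pass(x_seq, W, U, b, reverse=False):
    """Run one LSTM over x_seq of shape (T, n_in).

    W: (4*n_hid, n_in), U: (4*n_hid, n_hid), b: (4*n_hid,), with gates
    stacked as [input, forget, candidate, output] (an assumed layout).
    Returns hidden states of shape (T, n_hid) in the original time order.
    """
    T = x_seq.shape[0]
    n_hid = U.shape[1]
    h = np.zeros(n_hid)
    c = np.zeros(n_hid)
    out = np.zeros((T, n_hid))
    steps = range(T - 1, -1, -1) if reverse else range(T)
    for t in steps:
        z = W @ x_seq[t] + U @ h + b
        i = sigmoid(z[:n_hid])               # input gate
        f = sigmoid(z[n_hid:2 * n_hid])      # forget gate
        g = np.tanh(z[2 * n_hid:3 * n_hid])  # candidate cell state
        o = sigmoid(z[3 * n_hid:])           # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        out[t] = h
    return out


def bilstm(x_seq, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states at every time step."""
    h_fwd = lstm_pass(x_seq, *fwd_params)
    h_bwd = lstm_pass(x_seq, *bwd_params, reverse=True)
    return np.concatenate([h_fwd, h_bwd], axis=1)
```

At any step, the forward half of the output depends only on inputs up to that step, whereas the backward half depends on that step and all later ones, which is what gives the BiLSTM access to context on both sides.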

The BiLSTM architecture was structured with careful consideration of its components to achieve peak performance and mitigate overfitting. The first hidden layer comprised 512 neurons, setting the foundation for subsequent layers. The following four dense layers, consisting of 256, 64, 32, and 1 neuron, respectively, formed a hierarchical architecture. To prevent overfitting, dropout layers with a dropout rate of 0.2 were placed before and after each fully connected layer (Srivastava et al., 2014).

To enhance computational efficiency and mitigate vanishing gradients, the Rectified Linear Unit (ReLU) was employed as the activation function in the hidden layers (Wu et al., 2020). Compared to activation functions such as Sigmoid and Tanh, ReLU is computationally efficient, mitigates vanishing gradients by passing gradients unchanged for positive inputs, and promotes faster convergence in deep networks. The optimizer was Adaptive Moment Estimation (Adam), known for its adaptive learning rates and effective optimization capabilities. The learning rate was set to 0.001 as part of the hyperparameter combination determined during training.

The model’s loss function was a weighted sum of Mean Absolute Error (MAE) and Mean Square Error (MSE), allowing for a balanced evaluation of prediction errors. MAE measures absolute differences between predicted and observed values, making it robust to outliers, while MSE penalizes larger errors more heavily, encouraging the model to focus on minimizing significant deviations. The weights assigned to MAE (0.7) and MSE (0.3) were chosen to prioritize overall prediction accuracy while preventing excessive sensitivity to large errors. This combination ensures that both small and large discrepancies are accounted for in the training process, leading to improved generalization (Yang et al., 2023).
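The weighted loss described above can be written in a few lines. The sketch below is a plain NumPy version for illustration, using the stated weights of 0.7 for MAE and 0.3 for MSE. The function name `weighted_mae_mse` and the sample values are hypothetical, and a training framework would need the same expression written with its own tensor operations.

```python
import numpy as np


def weighted_mae_mse(y_true, y_pred, w_mae=0.7, w_mse=0.3):
    """Weighted sum of mean absolute error and mean squared error."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    return w_mae * mae + w_mse * mse


# Illustrative values only (assumed, not from the study).
loss_example = weighted_mae_mse([10.0, 20.0, 30.0], [12.0, 20.0, 26.0])
```

Because MAE and MSE carry different units when computed on raw flows, a loss of this kind is usually applied to scaled inputs and targets. Whether the study scaled its data is not stated in this section.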

3.3 Input selection for BiLSTM standalone model

Input selection for the standalone deep learning BiLSTM model was carried out by cross-correlation analysis to determine the most relevant meteorological factors influencing daily streamflow in the Astore sub-basin. This analysis involved computing correlation coefficients between daily streamflow and three key variables over different lag times: daily rainfall on the antecedent nth day (P_t−n), total rainfall in the preceding n days (P_n), and daily temperature on the antecedent nth day (T_t−n). These variables were chosen based on their hydrological significance: rainfall contributes to direct runoff, cumulative rainfall accounts for delayed responses in the watershed, and temperature plays a critical role in snowmelt-driven streamflow, particularly in mountainous regions like Astore. The correlation analysis was performed for lag times ranging from 0 to 50 days, allowing the identification of the most influential time delays for each variable.
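A lagged correlation scan of this kind can be sketched as follows. The code assumes Pearson correlation and assumes that the n-day rainfall total covers days t−n to t−1, excluding day t itself. Neither detail is specified in the text, and the function names `lagged_correlation` and `preceding_total` are hypothetical.

```python
import numpy as np


def lagged_correlation(flow, driver, max_lag=50):
    """Pearson r between flow[t] and driver[t - n] for n = 0..max_lag."""
    flow = np.asarray(flow, dtype=float)
    driver = np.asarray(driver, dtype=float)
    if flow.shape != driver.shape:
        raise ValueError("flow and driver must have the same length")
    r = np.empty(max_lag + 1)
    for n in range(max_lag + 1):
        # Align day t of the flow series with day t - n of the driver.
        r[n] = np.corrcoef(flow[n:], driver[: flow.size - n])[0, 1]
    return r


def preceding_total(precip, n):
    """Total rainfall over the n days before each day t (days t-n .. t-1).

    The result is aligned with days n, n+1, ..., so it can be correlated
    with flow[n:].
    """
    precip = np.asarray(precip, dtype=float)
    if n < 1 or n >= precip.size:
        raise ValueError("n must satisfy 1 <= n < len(precip)")
    csum = np.concatenate([[0.0], np.cumsum(precip)])
    return csum[n:-1] - csum[: -(n + 1)]
```

Calling `np.argmax` on the returned array gives the lag with the strongest positive correlation, which is how a one-day temperature lag would be picked out for a given record.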

3.4 Coupling of calibrated SWAT-BiLSTM

The calibrated SWAT and BiLSTM models were coupled by using the preliminary streamflow from the calibrated SWAT model as one of the inputs to the data-driven BiLSTM model, under three alternate input scenarios, as shown in the schematic diagram of the methodology (Figure 5). These scenarios were designed to explore different levels of input reliance and optimize predictive performance by integrating process-based and data-driven modeling approaches. In the SWAT-C-BiLSTM (PTQ_p) model, three input parameters were used for BiLSTM: precipitation, temperature (both without any selection analysis), and preliminary flow from the calibrated SWAT model. This scenario serves as a baseline, leveraging conventional meteorological inputs that are commonly used in hydrological modeling. The SWAT-C-BiLSTM (T₁Q_p) model included only two input parameters: temperature with a one-day lag (T_t−1) and preliminary flow from the calibrated SWAT model. This scenario was based on the cross-correlation analysis, which identified T_t−1 as the climate variable with the highest correlation to daily streamflow. The selection of T_t−1 is particularly relevant in snowmelt-driven basins like Astore, where delayed temperature effects significantly influence streamflow. The SWAT-C-BiLSTM (Q_p) model used only one input parameter, the preliminary flow from the calibrated SWAT model, without precipitation or temperature. This scenario was designed to evaluate whether simulated streamflow alone, without meteorological inputs, could sufficiently drive BiLSTM predictions, effectively testing the strength of SWAT’s hydrological representation in a hybrid modeling framework.

Figure 5. Schematic diagram of methodology showing data collection, development of standalone models, coupling calibrated SWAT with BiLSTM under alternate input options, and streamflow prediction followed by statistical evaluation and ranking through compromise programming.
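The three coupling scenarios differ only in which columns are handed to the BiLSTM. A minimal sketch of assembling those input matrices is given below. The function name `build_inputs`, the scenario keys, and the choice to drop the first day for the lagged scenario are assumptions made for illustration, and the study's actual preprocessing (for example scaling or windowing) is not described here.

```python
import numpy as np

SCENARIOS = ("PTQ_p", "T1Q_p", "Q_p")


def build_inputs(scenario, q_swat, precip=None, temp=None):
    """Return (X, start) for one coupling scenario.

    X has one row per day. `start` is the index of the first day kept,
    so the observed target should be sliced as q_obs[start:].
    """
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario}")
    q_swat = np.asarray(q_swat, dtype=float)
    if scenario == "Q_p":
        # SWAT preliminary flow only, no climatic inputs.
        return q_swat.reshape(-1, 1), 0
    if temp is None:
        raise ValueError("temperature is required for this scenario")
    temp = np.asarray(temp, dtype=float)
    if scenario == "T1Q_p":
        # Pair T(t-1) with Q_p(t); the first day has no previous-day value.
        return np.column_stack([temp[:-1], q_swat[1:]]), 1
    if precip is None:
        raise ValueError("precipitation is required for PTQ_p")
    precip = np.asarray(precip, dtype=float)
    # Conventional inputs: same-day precipitation and temperature plus Q_p.
    return np.column_stack([precip, temp, q_swat]), 0
```

Keeping the network architecture fixed and swapping only this input matrix is what allows the three scenarios to be compared on equal terms.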

3.5 Calibration, validation, and prediction of models

The standalone SWAT model was calibrated and validated using the SWAT Calibration and Uncertainty Program (SWAT-CUP), following the approach of Garee et al. (2017). Model parameter uncertainty and performance were evaluated through the Sequential Uncertainty Fitting (SUFI-2) method in SWAT-CUP. To obtain the best-fit values for sensitive parameters, SWAT-CUP was run for 10,000 iterations during the calibration phase. The first 2 years (2005–2006) served as a warm-up period for model stability. Daily flow data observed at the Doyian station from 2007 to 2011 were used for calibration, while data from 2012 to 2015 served to validate the simulation results. In parallel, the standalone BiLSTM data-driven model was trained and tested at a daily time scale using the input selected by cross-correlation analysis, namely temperature at a 1-day lag. Keeping the standalone DD model architecture unchanged, the three alternate input sets were then used to train and test the coupled models individually (Figure 5). Following calibration and validation, all five models, two standalone and three coupled under alternate input scenarios, were used to predict daily streamflow from 2017 to 2019.
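The period boundaries above translate directly into boolean masks over a daily date index. The sketch below is one way to build them with NumPy. The dictionary name `PERIODS` and the function `period_masks` are hypothetical, and the split is by calendar year exactly as the periods are stated in the text.

```python
import numpy as np

# Calendar-year boundaries as stated in the text (inclusive).
PERIODS = {
    "warm_up": (2005, 2006),
    "calibration": (2007, 2011),
    "validation": (2012, 2015),
    "prediction": (2017, 2019),
}


def period_masks(dates):
    """Return a boolean mask per period for an array of datetime64[D] dates."""
    years = dates.astype("datetime64[Y]").astype(int) + 1970
    return {
        name: (years >= start) & (years <= end)
        for name, (start, end) in PERIODS.items()
    }


# Daily index covering the 2005-2019 data record.
dates = np.arange("2005-01-01", "2020-01-01", dtype="datetime64[D]")
masks = period_masks(dates)
```

Days in 2016 fall outside all four masks, mirroring the periods as they are given above.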

3.6 Statistical evaluation and ranking using compromise programming

Each modeled daily streamflow series was evaluated against the observed daily streamflow using four statistical metrics: the coefficient of determination (R²), the Nash-Sutcliffe efficiency coefficient (NSE), percent bias (PBIAS), and the ratio of root mean square error to the standard deviation of observations (RSR). The ideal value is 1 for R² and NSE and 0 for PBIAS and RSR. Table 1 gives the mathematical expressions of these metrics along with their possible ranges.

Table 1. Performance metrics describing mathematical expressions with ranges of NSE, R², PBIAS and RSR.
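Because the expressions of Table 1 are not reproduced in the text, the sketch below uses commonly cited definitions of the four metrics. These definitions are assumptions here, in particular the PBIAS sign convention (positive values indicating underestimation) and R² taken as the squared Pearson correlation. The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(obs, sim):
    """Compute R2, NSE, PBIAS (%), and RSR for observed vs. simulated flow."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    resid = obs - sim
    ss_res = np.sum(resid ** 2)
    ss_obs = np.sum((obs - obs.mean()) ** 2)
    return {
        # Squared Pearson correlation between observed and simulated flow.
        "R2": np.corrcoef(obs, sim)[0, 1] ** 2,
        # 1 minus the ratio of residual variance to observed variance.
        "NSE": 1.0 - ss_res / ss_obs,
        # Positive values mean the simulation underestimates on average.
        "PBIAS": 100.0 * np.sum(resid) / np.sum(obs),
        # RMSE divided by the standard deviation of the observations.
        "RSR": np.sqrt(ss_res) / np.sqrt(ss_obs),
    }
```

Under these definitions RSR² equals 1 − NSE, so the two metrics carry the same information on different scales, which is worth keeping in mind when they are combined in a ranking.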

Compromise Programming (CP) combines various statistical measures into a single ranking, as highlighted by Iqbal et al. (2021) and Khan et al. (2023). In this study, CP was applied to assess and rank hydrological models using four core performance indicators, R², NSE, PBIAS, and RSR, which collectively gauge model effectiveness. Within the CP framework, a distance measure known as the L_p metric was computed, as demonstrated by Shiru and Chung (2021). The L_p metric is defined by Equation 4:

$$ L_{p} = \left[ \sum_{n=1}^{N} \left| W_{n}^{*} - W_{n} \right|^{m} \right]^{1/m} \tag{4} $$

Here, N is the number of statistical performance measures (four in this study), the parameter m is set to 1, W_n represents the actual value of the nth statistical performance measure, and $W_{n}^{*}$ denotes the ideal value of that measure, achieved when model simulations perfectly align with observed data. The L_p metric is non-negative, and lower L_p values are preferred as they signify superior model performance.
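The ranking step can be sketched as a distance from the ideal point followed by a sort. The code below assumes the ideal values stated earlier (1 for R² and NSE, 0 for PBIAS and RSR) and m = 1. The function names, the model labels, and the scores are hypothetical toy values rather than results from the study. The text does not state whether PBIAS enters as a percentage or a fraction, so the toy scores treat it as a fraction.

```python
# Ideal value of each performance measure (W_n^* in Equation 4).
IDEAL = {"R2": 1.0, "NSE": 1.0, "PBIAS": 0.0, "RSR": 0.0}


def lp_metric(metrics, m=1):
    """Distance of one model's metrics from the ideal point (Equation 4)."""
    total = sum(abs(IDEAL[k] - metrics[k]) ** m for k in IDEAL)
    return total ** (1.0 / m)


def rank_models(scores, m=1):
    """Return (model, L_p) pairs sorted from best (lowest L_p) to worst."""
    lp = {name: lp_metric(vals, m) for name, vals in scores.items()}
    return sorted(lp.items(), key=lambda item: item[1])


# Toy scores for two hypothetical models (assumed, not from the paper).
toy_scores = {
    "model_A": {"R2": 0.9, "NSE": 0.8, "PBIAS": 0.1, "RSR": 0.4},
    "model_B": {"R2": 0.7, "NSE": 0.5, "PBIAS": 0.3, "RSR": 0.7},
}
ranking = rank_models(toy_scores)
```

With m = 1 the metric is simply the sum of absolute deviations, so a measure on a larger numerical scale would dominate the ranking unless all measures are expressed on comparable scales.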

4 Results

This section presents the selection of the input for the standalone data-driven model, the calibration and validation performance of the models, their prediction performance, and lastly the ranking by compromise programming based on the statistical evaluations.

4.1 Selection for standalone data driven model input

Input selection for the standalone deep learning BiLSTM model was carried out by cross-correlation analysis. Figure 6 illustrates the correlation coefficients between daily streamflow and the rainfall and temperature factors at daily lag times for 2005 to 2015. These factors are daily rainfall on the antecedent nth day (P_t−n), total rainfall in the preceding n days (P_n), and daily temperature on the antecedent nth day (T_t−n). Daily rainfall and total rainfall exhibited maximum correlations of only 0.12 and 0.10, respectively. In contrast, the temperature on the preceding day (T_t−1) had the highest correlation with the basin’s daily streamflow, 0.69. As a result, T_t−1 was chosen as the covariate in the BiLSTM model to simulate daily streamflow in the Astore sub-basin.

Figure 6. Cross correlation analysis between daily streamflow and daily rainfall on the previous nth day, total rainfall in the preceding n-days and daily temperature on the previous nth day, respectively in Astore basin for the duration 2005–2015.

4.2 Sensitivity analysis, calibration, and validation performances of models

The performance of the two standalone models (SWAT and BiLSTM) and three coupled models (SWAT-C-BiLSTM (T₁Q_p), SWAT-C-BiLSTM (Q_p), and SWAT-C-BiLSTM (PTQ_p)) was evaluated during the calibration (2007–2011) and validation (2012–2015) periods at the Doyian station on the Astore River. A warm-up period of 2 years (2005–2006) was used to stabilize the physically based SWAT model.

Sensitivity analysis was conducted during SWAT model calibration, and 24 parameters were identified as critical for simulating daily streamflow (Table 2); their ranges and fitted values are presented in Figure 7. Table 2 also presents the t-statistics and p-values derived from the global sensitivity analysis for the selected parameters in the Astore sub-basin. Among these, groundwater-related parameters, including ALPHA_BF (baseflow recession coefficient), GW_DELAY (groundwater delay time), GWQMN (threshold depth of water in the shallow aquifer required for return flow), and REVAPMN (threshold depth of water in the shallow aquifer for percolation to deep aquifers), played a crucial role in regulating baseflow contributions to the river. These parameters influenced model performance, particularly in low-flow conditions, highlighting the potential interactions between surface water and groundwater in the basin.

Figure 7. Parameters utilized in standalone SWAT model for calibration showing light green range (minimum and maximum) bar and dark red fitted values.

Table 2. Sensitive parameters during SWAT model calibration with t-stat, p-value, and ranks.

The most sensitive parameters, SMTMP, SNO50COV, and CH_N1, were crucial for calibrating the SWAT model. In this area, snow cover and glacier melt are the primary water sources (Ayub et al., 2020; Ali et al., 2023), which explains SWAT’s sensitivity to SMTMP (snowmelt base temperature) and SNO50COV (snow water equivalent at 50% snow cover). Additionally, sub-basin, hydrologic response unit, and groundwater parameters significantly influenced streamflow simulations.

While groundwater parameters were considered during calibration, direct observational evidence of river-aquifer interactions in the Astore basin is limited. Future studies incorporating hydrogeological field measurements, isotopic analysis, or groundwater monitoring networks could further improve the constraint on these parameters and enhance the model’s physical realism.

During the calibration (2007–2011) and validation (2012–2015) phases, the coupled models outperformed the standalone models in both statistical metrics (Figure 8) and flow simulation (Table 3). Among the coupled models, SWAT-C-BiLSTM (T₁Q_p) and SWAT-C-BiLSTM (Q_p) exhibited comparable performance, demonstrating superior accuracy in streamflow prediction. SWAT-C-BiLSTM (Q_p) achieved an NSE of 0.73 and 0.78, an R² of 0.83 and 0.85, and a low PBIAS of 23.78% and 21.74% during the calibration and validation periods, respectively. Similarly, SWAT-C-BiLSTM (T₁Q_p) showed an NSE of 0.80 and 0.73 and an R² of 0.89 and 0.84, and effectively replicated low flows with averages of 32.54 m³/s and 32.65 m³/s. Both models effectively captured peak and low flows, with SWAT-C-BiLSTM (T₁Q_p) slightly better at simulating peak flows, while SWAT-C-BiLSTM (Q_p) exhibited slightly lower bias in terms of PBIAS. Given their close performance, both models can be considered equally robust for accurate daily streamflow predictions. Line plots of calibration and validation are shown in Figures 9 and 10.
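
For readers who wish to reproduce this kind of statistical evaluation, the four metrics can be computed from paired daily observed and simulated flows. The following is a minimal Python sketch rather than the code used in this study. It assumes the commonly used definitions of the metrics, with PBIAS signed so that positive values indicate underestimation, and the function name `evaluation_metrics` is hypothetical.

```python
import math


def evaluation_metrics(observed, simulated):
    """Return R2, NSE, PBIAS (%) and RSR for two equal-length flow series."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("series must have equal length of at least 2")
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_sim = sum(simulated) / n

    # Squared errors, and the spread of each series around its own mean.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    ss_sim = sum((s - mean_sim) ** 2 for s in simulated)
    cov = sum((o - mean_obs) * (s - mean_sim)
              for o, s in zip(observed, simulated))

    r2 = cov ** 2 / (ss_obs * ss_sim)   # squared Pearson correlation
    nse = 1.0 - sse / ss_obs            # Nash-Sutcliffe efficiency
    # Assumed sign convention: positive PBIAS means the model underestimates.
    pbias = 100.0 * sum(o - s for o, s in zip(observed, simulated)) / sum(observed)
    rsr = math.sqrt(sse / ss_obs)       # RMSE divided by std. dev. of observations
    return {"R2": r2, "NSE": nse, "PBIAS": pbias, "RSR": rsr}
```

Under these definitions a simulation that is exactly half of the observed flow on every day has an R² of 1 but a PBIAS of 50%, which illustrates why correlation-based and bias-based metrics are reported together.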

Figure 8. Statistical evaluation of different models during calibration/training, validation/testing, and prediction phase by utilizing R², NSE, PBIAS, and RSR, respectively.

Table 3. Yearly average of daily peak and low flows summary during calibration, validation, and prediction phases, observed and simulated by different models.

Figure 9. Calibration/training of standalone and coupled models under alternate input options at daily time scale.

Figure 10. Validation/testing of standalone and coupled models under alternate inputs options at daily time scale.

In contrast, the standalone SWAT model exhibited significant biases, with PBIAS exceeding 44% and 48.26%, and tended to underestimate both peak and low flows, as reflected in low-flow averages of 1.30 m³/s and 1.19 m³/s compared with the observed values of 27.58 m³/s and 24.13 m³/s (Table 3). The standalone BiLSTM model performed better than SWAT, achieving an NSE of 0.67 and 0.68 and producing low-flow averages of 31.72 m³/s and 31.01 m³/s, yet it still lagged behind the coupled models. Among the coupled models, SWAT-C-BiLSTM (PT₁Q_p) performed the weakest, likely due to uncertainties introduced by including precipitation and temperature, resulting in low NSE values of 0.23 and 0.27 and a higher PBIAS of 31.45% and 32.06%. These results highlight the superiority of SWAT-C-BiLSTM (Q_p) and SWAT-C-BiLSTM (T₁Q_p) in accurately predicting both high and low flows, with neither showing a significant advantage over the other, making both valuable tools for hydrological modeling (Table 4).
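
The peak- and low-flow averages compared above can be derived from a daily series in a few lines. The sketch below assumes one plausible reading of the Table 3 caption, namely that the annual maximum and annual minimum daily flows are extracted for each year and then averaged over the years of a phase. The function name and the four-value record are hypothetical and purely illustrative.

```python
from collections import defaultdict
from datetime import date


def yearly_peak_low_averages(dates, flows):
    """Average the annual maximum and annual minimum daily flow over all years.

    dates: iterable of datetime.date; flows: daily discharge in m3/s.
    """
    by_year = defaultdict(list)
    for day, q in zip(dates, flows):
        by_year[day.year].append(q)
    if not by_year:
        raise ValueError("no data supplied")
    peaks = [max(values) for values in by_year.values()]
    lows = [min(values) for values in by_year.values()]
    return sum(peaks) / len(peaks), sum(lows) / len(lows)


# Hypothetical record spanning two years (values are not from this study).
example_dates = [date(2017, 1, 15), date(2017, 7, 15),
                 date(2018, 1, 15), date(2018, 7, 15)]
example_flows = [20.0, 400.0, 30.0, 500.0]
avg_peak, avg_low = yearly_peak_low_averages(example_dates, example_flows)
```

Applying the same function to the observed series and to each model's simulated series gives directly comparable peak and low-flow summaries.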

Table 4. Difference in metrics and values for ranking of models using compromise programing approach.

4.3 Model prediction performance

During the prediction period (2017–2019), the coupled models continued to outperform the standalone models in terms of statistical metrics and flow simulation accuracy (Figure 11a and Table 4). SWAT-C-BiLSTM (T₁Q_p) and SWAT-C-BiLSTM (Q_p) exhibited nearly identical predictive capabilities, with SWAT-C-BiLSTM (T₁Q_p) achieving an NSE of 0.88 and R² of 0.89, while SWAT-C-BiLSTM (Q_p) attained an NSE of 0.86 and R² of 0.87. Both models effectively minimized bias, with PBIAS values of 4.06% and 5.04%, respectively. These models demonstrated high accuracy in capturing streamflow variations, including peak and low flows, with SWAT-C-BiLSTM (T₁Q_p) showing slightly better peak flow predictions, whereas SWAT-C-BiLSTM (Q_p) maintained lower bias in overall flow representation.

Figure 11. (a) Prediction of standalone and coupled models under alternate input options at daily time scale for streamflow simulation; (b) flow duration curve during prediction phase.

In contrast, the standalone SWAT model exhibited considerable bias, reflected in its high PBIAS of 41.33% and lower NSE of 0.62. It significantly underestimated both peak and low flows, with an average peak flow of 251.03 m³/s compared to the observed 415.47 m³/s and an average low flow of 1.61 m³/s against the observed 21.79 m³/s. On the other hand, SWAT-C-BiLSTM (Q_p) closely matched the observed peak flow with 462.83 m³/s, while its low flow predictions (30.54 m³/s) remained highly consistent with the observed values. SWAT-C-BiLSTM (T₁Q_p) followed a similar trend, maintaining comparable accuracy across different flow conditions.

Among the coupled models, SWAT-C-BiLSTM (PT₁Q_p) performed the weakest, likely due to increased uncertainty from additional climate inputs, resulting in the highest bias in low flow predictions (42.12 m³/s). The flow duration curve (Figure 11b) further illustrates that SWAT-C-BiLSTM (T₁Q_p) and SWAT-C-BiLSTM (Q_p) closely follow the observed flow distribution across different exceedance probabilities, reinforcing their reliability in predictive performance. These findings highlight the effectiveness of coupled models, particularly SWAT-C-BiLSTM (T₁Q_p) and SWAT-C-BiLSTM (Q_p), in accurately simulating streamflow across different hydrological conditions.
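
A flow duration curve of the kind shown in Figure 11b can be built by sorting daily flows in descending order and assigning each an exceedance probability. The sketch below is illustrative only. It assumes the Weibull plotting position, P = m / (n + 1), which may differ from the formula used to draw the figure, and the function name is hypothetical.

```python
def flow_duration_curve(flows):
    """Return (exceedance probability in %, flow) pairs, highest flow first."""
    ordered = sorted(flows, reverse=True)
    n = len(ordered)
    # Assumed Weibull plotting position: rank m of n gives P = m / (n + 1).
    return [(100.0 * rank / (n + 1), q)
            for rank, q in enumerate(ordered, start=1)]
```

Computing the curve separately for the observed series and for each model allows the simulated flow distribution to be compared with the observed one at any exceedance probability.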

4.4 Model ranking using the CP approach based on statistical evaluation

The CP approach effectively evaluated the performance of the models by aggregating statistical metrics into a single ranking criterion, allowing for a holistic comparison across calibration, validation, and prediction periods. The rankings (Table 4) identified SWAT-C-BiLSTM (Q_p) as the best-performing model, achieving the lowest aggregate (3.54) due to its consistent superiority across all metrics, particularly during validation and prediction periods. It exhibited minimal differences in R², NSE, PBIAS, and RSR compared to observed values, reflecting its robust ability to capture both peak and low flow dynamics. Similarly, SWAT-C-BiLSTM (T₁Q_p) secured the second rank with an aggregate of 4.77, demonstrating competitive performance, although slightly less accurate than SWAT-C-BiLSTM (Q_p) in some scenarios.

In contrast, the standalone SWAT model ranked last, with the highest aggregate metric deviation (87.99), emphasizing its limitations in simulating streamflow under complex hydrological conditions. The BiLSTM model ranked third with an aggregate of 18.05, performing well independently but falling behind the coupled models due to its lack of physical hydrological considerations. The SWAT-C-BiLSTM (PTQ_p) model ranked fourth with an aggregate of 65.11, primarily due to higher discrepancies in PBIAS and RSR, which reduced its overall performance.

These results underscore the efficiency of the CP approach in ranking models, affirming the superiority of the coupled models SWAT-C-BiLSTM (Q_p) and SWAT-C-BiLSTM (T₁Q_p) for streamflow prediction. This ranking provides a clear decision-making framework for selecting models tailored to high-performance hydrological simulations at a daily timescale.
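
The ranking logic can be illustrated with a small sketch of compromise programming, in which each model is scored by its distance from an ideal point and a smaller distance gives a better rank. This is not the study's implementation. The ideal values (R² = 1, NSE = 1, PBIAS = 0, RSR = 0), the exponent p = 1, the absence of any normalization, and the two example models are all assumptions made for illustration, and the example numbers are not taken from Table 4.

```python
# Assumed ideal point: a perfect simulation under each metric.
IDEAL = {"R2": 1.0, "NSE": 1.0, "PBIAS": 0.0, "RSR": 0.0}


def cp_distance(metrics, p=1):
    """L_p distance between a model's metrics and the ideal point."""
    total = sum(abs(IDEAL[name] - metrics[name]) ** p for name in IDEAL)
    return total ** (1.0 / p)


def rank_models(scores, p=1):
    """Return (model, distance) pairs ordered from best (smallest) to worst."""
    distances = {model: cp_distance(m, p) for model, m in scores.items()}
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical metric values for two placeholder models.
example_scores = {
    "model_a": {"R2": 0.90, "NSE": 0.80, "PBIAS": 5.0, "RSR": 0.40},
    "model_b": {"R2": 0.70, "NSE": 0.60, "PBIAS": -20.0, "RSR": 0.60},
}
ranking = rank_models(example_scores)
```

Because PBIAS is expressed in percent while the other metrics are of order one, it dominates an unnormalized sum such as this one, so in practice the metrics may need to be brought to a common scale before aggregation.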

5 Discussion

5.1 Results interpretation and comparison

The coupled SWAT-C-BiLSTM (Q_p) model, which excludes climatic variables as inputs to BiLSTM, and the cross-correlation-based coupling scenario, SWAT-C-BiLSTM (T₁ Q_p), emerged as the best-performing scenarios in this study, achieving the highest accuracy and lowest bias in streamflow prediction. Conversely, SWAT-C-BiLSTM (PTQ_P), which included both precipitation and temperature, performed the poorest among the coupled models. These findings highlight the critical role of optimized input selection in enhancing the performance of hybrid hydrological models.

Although the SWAT-C-BiLSTM (Q_p) model does not explicitly incorporate climate variables such as temperature and precipitation as direct inputs to BiLSTM, these variables are inherently embedded within the SWAT-simulated flow (Q_p). SWAT is a process-based model that simulates streamflow based on climatic inputs, land surface interactions, and catchment hydrological processes. Thus, Q_p inherently carries the aggregated influence of precipitation, temperature, and other hydrological drivers. Our results indicate that using Q_p as an input to BiLSTM allows for leveraging SWAT’s physically based outputs while reducing additional uncertainties that may arise from directly incorporating climate variables into BiLSTM. This does not mean that temperature and other climatic factors are disregarded; rather, their effects are already encapsulated in the Q_p variable as simulated by SWAT.
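
The difference between the three coupling scenarios lies only in which columns are passed to BiLSTM alongside the SWAT-simulated flow. The sketch below shows one way the per-day input rows could be assembled. It is a hypothetical illustration, not the study's preprocessing code: the function name, the scenario labels, and the reading of T₁ as temperature lagged by one day are assumptions.

```python
def build_inputs(scenario, precip, temp, q_swat, temp_lag=1):
    """Assemble per-day BiLSTM input rows for one coupling scenario.

    scenario: "PTQp" (precipitation, temperature, SWAT flow),
              "T1Qp" (lagged temperature, SWAT flow) or
              "Qp"   (SWAT flow only).
    Rows start at index `temp_lag` so every scenario covers the same days.
    """
    n = len(q_swat)
    if not (len(precip) == len(temp) == n):
        raise ValueError("all series must have the same length")
    rows = []
    for t in range(temp_lag, n):
        if scenario == "PTQp":
            rows.append([precip[t], temp[t], q_swat[t]])
        elif scenario == "T1Qp":
            # Assumed meaning of T1: temperature observed `temp_lag` days earlier.
            rows.append([temp[t - temp_lag], q_swat[t]])
        elif scenario == "Qp":
            rows.append([q_swat[t]])
        else:
            raise ValueError(f"unknown scenario: {scenario}")
    return rows
```

In the "Qp" case the climate series are not passed to the network at all, yet their influence is still present through `q_swat`, which is the point made in the paragraph above.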

The findings align with those of Jeong D. S. et al. (2024), where including precipitation as a climate variable in hybrid models did not noticeably improve streamflow and suspended solids simulations. Additionally, the coupled models performed better than the standalone SWAT and data-driven models, consistent with previous studies (Yang et al., 2023; Jeong H. et al., 2024; Mei et al., 2024). Daily temperature showed a stronger correlation with streamflow than precipitation, reflecting the snow- and glacier-melt-dominated hydrology of the Astore sub-basin. This may be attributed to the fact that streamflow in the study area is primarily driven by glacier melt, followed by snowmelt and rainfall (Ayub et al., 2020; Ali et al., 2023).
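
The lag at which a climatic driver is most strongly correlated with streamflow can be found by shifting the driver series against the flow series and computing the Pearson correlation at each shift. The following sketch shows the idea under simple assumptions (a linear correlation measure, non-negative lags only, and series that are not constant). The function names and the default `max_lag` are hypothetical and do not come from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def best_lag(driver, flow, max_lag=10):
    """Return (lag, r) maximising corr(driver[t - lag], flow[t]) for lag 0..max_lag."""
    correlations = {}
    for lag in range(max_lag + 1):
        shifted_driver = driver[: len(driver) - lag]  # driver at day t - lag
        aligned_flow = flow[lag:]                     # flow at day t
        correlations[lag] = pearson(shifted_driver, aligned_flow)
    lag = max(correlations, key=correlations.get)
    return lag, correlations[lag]
```

Running `best_lag` once with temperature and once with precipitation as the driver gives the two correlation peaks that can then be compared when selecting coupling inputs.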

The relatively poor performance of the standalone SWAT model, as observed in this study, can be attributed to several factors, including limited observed climate data and the reliance on default settings for critical variables like relative humidity, wind speed, and solar radiation. Ghane and Alvankar (2015) highlight how variations in these parameters can drastically influence runoff estimations, underscoring the importance of accurate meteorological inputs. Furthermore, the inadequate coverage of rain gauges and the coarse-resolution data likely exacerbate uncertainties, as demonstrated by Anderson and Bingner (2010) and Gao et al. (2017), respectively. The SWAT model may struggle to accurately identify the most sensitive parameters (Shen et al., 2008; Cibin et al., 2010), including snow-specific parameters, for each sub-basin due to the coarse resolution of land use data, which hinders its ability to capture unique sub-basin conditions. High-resolution datasets and improved spatial coverage could address these limitations and enhance the model’s ability to capture sub-basin characteristics. Additionally, the lack of preprocessing and calibration for snow-specific parameters may have compounded these challenges, consistent with findings by Jajarmizadeh et al. (2017). In future work, using alternative data sources such as the Climate Forecast System Reanalysis (CFSR) in place of the default SWAT algorithm, as done by Haleem et al. (2023), and employing advanced calibration techniques may improve model performance and reduce uncertainties in hydrological predictions.

These limitations are mitigated by coupling SWAT with BiLSTM, which provides enhanced flexibility and adaptability in modeling nonlinear hydrological processes. The BiLSTM model, in this study, demonstrated its capacity to adapt to dominant hydrological drivers, such as glaciers and snowmelt, effectively capturing low-flow conditions across all phases. However, as a standalone model, BiLSTM faced challenges in simulating peak flows due to the absence of physical process representation.

In the SWAT-C-BiLSTM (Q_p) scenario, BiLSTM leveraged SWAT’s physically based outputs to significantly improve peak flow simulations while addressing uncertainties associated with coarse-resolution land use data and limited climatic variables. This coupling approach underscores the complementary strengths of SWAT and BiLSTM, with selective variable integration playing a pivotal role. Additionally, since SWAT simulates streamflow by incorporating these climatic variables, the coupled scenarios we discuss pertain specifically to BiLSTM’s input selection and do not imply that the SWAT-simulated flow (Q_p) is devoid of climatic influences.

The marginal accuracy difference between the scenario that excludes climatic variables, SWAT-C-BiLSTM (Q_p), and the cross-correlation-based scenario that includes a selected variable, SWAT-C-BiLSTM (T₁ Q_p), highlights the importance of carefully curating input variables rather than including all climatic data. For glacier-fed basins like Astore, where precipitation has limited daily correlation with streamflow, excluding it from the coupling process yielded enhanced performance.

5.2 Implications for hydrology and model development

The findings of this study underscore the critical importance of optimizing input selection and leveraging hybrid modeling approaches in hydrology. The superior performance of the SWAT-C-BiLSTM (Q_p) scenario, which excluded climatic variables from direct BiLSTM input, demonstrates that selective integration of variables aligned with dominant hydrological drivers can enhance model accuracy while reducing uncertainty. This is particularly relevant for glacier-fed basins like Astore, where streamflow is primarily influenced by glacier and snowmelt rather than precipitation. The results highlight the complementary strengths of physically based models, like SWAT, and data-driven approaches, such as BiLSTM, in addressing the limitations of standalone models. By combining physical process representation with nonlinear adaptability, coupled models offer a promising pathway for improving hydrological predictions in data-scarce and complex environments, paving the way for more robust water resource management strategies under changing climatic conditions.

5.3 Operational relevance for water resource management

The findings of this study offer practical implications for water resource management, particularly in streamflow forecasting for flood and drought preparedness. The superior performance of the SWAT-C-BiLSTM (Q_p) model suggests that integrating process-based hydrological models with machine learning can enhance predictive accuracy without requiring direct climate inputs. This is particularly beneficial in data-scarce regions where real-time climate data may be limited.

Improved streamflow predictions can aid water managers in optimizing reservoir operations, flood risk mitigation, and drought preparedness by providing more reliable forecasts. The study’s approach could be further explored for operational hydrological forecasting systems, ensuring robust water resource planning under changing climatic conditions.

5.4 Limitations and future directions

This study faced several limitations that could influence the outcomes. The use of coarse resolution land use data, such as MODIS, introduced uncertainties in sub-basin parameterization, potentially affecting the accuracy of hydrological simulations. Limited preprocessing of hydroclimatic data impacted the ability to accurately simulate peak and low-flow conditions. The reliance on cross-correlation for lag selection between climatic variables, while effective as a preliminary step, may not fully capture the nonlinear relationships inherent in hydrological processes, potentially leading to suboptimal lag time identification. Additionally, considering only precipitation and temperature as climatic variables may not fully capture the hydrological complexities of the Astore sub-basin, particularly in a glacier-fed system. Also, the findings are specific to the Astore sub-basin, and their generalizability to other regions with varying geographic and climatic contexts remains unexplored.

Meteorological data were obtained from the Pakistan Meteorological Department (PMD) and the Water and Power Development Authority (WAPDA); however, these datasets may have limited spatial coverage. Future studies could explore the use of alternative data sources, such as ERA5 or Climate Forecast System Reanalysis (CFSR) datasets, to supplement observed meteorological inputs and potentially enhance model accuracy. The integration of high-resolution reanalysis datasets could improve spatial representation and reduce uncertainties in hydrological simulations, particularly in data-scarce mountainous regions.

Additionally, while groundwater parameters were considered during calibration, direct observational evidence of river-aquifer interactions in the Astore sub-basin is limited. Future studies incorporating hydrogeological field measurements, isotopic analysis, or groundwater monitoring networks could further improve the constraint on these parameters and enhance the model’s physical realism. Groundwater contributions, particularly in low-flow conditions, may play a more significant role than currently represented in the model, warranting further investigation.

Moreover, future studies should incorporate high-resolution land use data to enhance parameter sensitivity and simulation accuracy. Expanding the range of climatic inputs to include additional variables, such as precipitation variables, humidity and solar radiation, could provide a more comprehensive understanding of hydrological processes. Incorporating more robust and exhaustive methods, such as cross-validation, grid search, or advanced feature selection techniques, is recommended to systematically optimize lag times and better align with the nonlinear capabilities of machine learning models. Thorough preprocessing of hydroclimatic data is critical to minimize uncertainties, especially for extreme events. Additionally, testing the exclusion or inclusion approach in diverse geographic and climatic regions will help assess its broader applicability and effectiveness in coupled modeling scenarios.

6 Conclusion

This study explored alternate coupling inputs of a data-driven model to enhance daily streamflow prediction in a calibrated SWAT-BiLSTM rainfall-runoff modeling framework for the Astore sub-basin of the Upper Indus Basin, Pakistan. Five modeling scenarios were analyzed, including standalone SWAT and BiLSTM models and three alternate coupling configurations that utilized conventional climate variables (precipitation and temperature), cross-correlation-based variable selection (temperature or precipitation), and a scenario excluding direct climatic inputs while incorporating SWAT-simulated streamflow. Model calibration, validation, and prediction were conducted for the periods January 2007 to December 2011, January 2012 to December 2015, and January 2017 to December 2019, respectively. Model performance was evaluated using statistical metrics (R², NSE, PBIAS, and RSR), and a ranking was established through Compromise Programming. The key findings of this study are as follows:

Cross-correlation analysis identified temperature as the most influential input (maximum correlation of 0.69 at 1-day lag), emphasizing the need for optimized input selection in coupled modeling.

The SWAT-C-BiLSTM (Q_p) model, which excluded direct climate inputs, emerged as the best-performing configuration, followed closely by the cross correlation based SWAT-C-BiLSTM (T₁Q_p) model. These models exhibited minimal bias and high predictive accuracy.

The compromise programming ranking confirmed the superiority of coupled models, except for SWAT-C-BiLSTM (PTQ_p), over standalone SWAT and BiLSTM models, demonstrating the benefits of integrating physically based and machine-learning approaches.

Data availability statement

The spatial data used in this study, including elevation, land use, and soil data, are freely available and can be accessed from the websites given in the data section of the manuscript. The climatic and streamflow data are the property of the Pakistan Meteorological Department (PMD) and the Water and Power Development Authority (WAPDA), Pakistan, respectively, and can be requested from these departments via official channels. The data that support the findings of this study are available on request from the corresponding author.

Author contributions

KA: Conceptualization, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing. MI: Conceptualization, Formal analysis, Investigation, Methodology, Supervision, Writing – review & editing. MT: Conceptualization, Formal analysis, Investigation, Validation, Writing – review & editing. AK: Conceptualization, Formal analysis, Investigation, Validation, Writing – review & editing. AN: Formal analysis, Investigation, Validation, Visualization. JC: Formal analysis, Resources, Validation, Writing – review & editing, Funding acquisition. KU: Funding acquisition, Investigation, Methodology, Project administration. HamA: Formal analysis, Investigation, Validation, Visualization, Resources. HasA: Conceptualization, Funding acquisition, Supervision, Writing – review & editing. MA: Validation, Investigation, Methodology, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. The research is partially funded by Gansu Provincial Science and Technology Program (25JRRA510), the program of the State Key Laboratory of Cryospheric Science and Frozen Soil Engineering, Chinese Academy of Sciences (CSFSE-ZQ-2411) and the Young Elite Scientists Sponsorship Program by CAST (2023QNRC001), the Ministry of Science and Higher Education of the Russian Federation as part of the World-class Research Center program Advanced Digital Technologies [contract no. 075-15-2022 (contract no. 075-15-311 dated 20.04.2022)]. This research was also funded by Taif University, Saudi Arabia (project no. TU-DSPP-2024-33).

Acknowledgments

The authors are grateful to the Center of Excellence in Water Resources Engineering, UET Lahore, for supporting this study. They also extend their gratitude to the Pakistan Meteorological Department (PMD) and the Water and Power Development Authority (WAPDA), Pakistan, for providing the observed meteorological and hydrological data used in this study. Additionally, the authors sincerely thank the reviewers for their invaluable insights, which have significantly improved the quality of the manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Gen AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Afshan, N. S., Khalid, A. N., Iqbal, S. H., Niazi, A. R., and Sultan, A. (2009). Puccinia subepidermalis Sp. Nov. and new records of rust fungi from fairy meadows, northern Pakistan. Mycotaxon 110, 173–182. doi: 10.5248/110.173
