Flood Susceptibility Modeling in a Subtropical Humid Low-Relief Alluvial Plain Environment: Application of Novel Ensemble Machine Learning Approach

Pandey, Manish; Arora, Aman; Arabameri, Alireza; Costache, Romulus; Kumar, Naveen; Mishra, Varun Narayan; Nguyen, Hoang; Mishra, Jagriti; Siddiqui, Masood Ahsan; Ray, Yogesh; Soni, Sangeeta; Shukla, UK

doi:10.3389/feart.2021.659296

ORIGINAL RESEARCH article

Front. Earth Sci., 20 December 2021

Sec. Environmental Informatics and Remote Sensing

Volume 9 - 2021 | https://doi.org/10.3389/feart.2021.659296

This article is part of the Research Topic: Geospace Observation of Natural Hazards

Manish Pandey^1,2*

Aman Arora^1,3,4*

Alireza Arabameri⁵

Romulus Costache^6,7

Naveen Kumar⁸

Varun Narayan Mishra⁹

Hoang Nguyen^10,11

Jagriti Mishra^12,13

Masood Ahsan Siddiqui⁴

Yogesh Ray¹⁴

Sangeeta Soni¹⁵

UK Shukla¹⁶

¹University Center for Research and Development (UCRD), Chandigarh University, Mohali, India
²Department of Civil Engineering, University Institute of Engineering, Chandigarh University, Mohali, India
³Bihar Mausam Seva Kendra, Planning and Development Department, Government of Bihar, Patna, India
⁴Department of Geography, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi, India
⁵Department of Geomorphology, Tarbiat Modares University, Tehran, Iran
⁶Department of Civil Engineering, Transilvania University of Brasov, Brasov, Romania
⁷Danube Delta National Institute for Research and Development, Tulcea, Romania
⁸Physical Research Laboratory, Ahmedabad, India
⁹Centre for Climate Change and Water Research, Suresh Gyan Vihar University, Jaipur, India
¹⁰Department of Surface Mining, Mining Faculty, Hanoi University of Mining and Geology, Hanoi, Vietnam
¹¹Innovations for Sustainable and Responsible Mining (ISRM) Group, Hanoi University of Mining and Geology, Hanoi, Vietnam
¹²Civil Engineering Research Institute for Cold Region, Sapporo, Japan
¹³Institute of Engineering and Technology, GLA University, Mathura, India
¹⁴National Centre for Polar and Ocean Research, Ministry of Earth Sciences, Government of India, Goa, India
¹⁵School of Computer and Systems Sciences, Jaipur National University, Jaipur, India
¹⁶Center for Advanced Study in Geology, Institute of Science, Banaras Hindu University, Varanasi, India

This study develops a new ensemble model and tests an existing one for flood susceptibility mapping in the Middle Ganga Plain (MGP). The results of the two models are compared quantitatively to assess their performance in zoning flood-susceptible areas of this low-altitude, humid subtropical fluvial floodplain environment. This part of the MGP, which lies in the central Ganga River Basin (GRB), is experiencing worsening floods under a changing climate, causing increasing loss of life and property. With its monsoonal subtropical humid climate, ground subsidence induced by active tectonics, growing population, and shifting landuse/landcover trends and patterns, the MGP is an ideal natural laboratory for testing all genres of susceptibility prediction models, so that the best-performing model can be chosen for a constant number of input parameters in this type of topoclimatic setting. This will help in achieving the goal of model universality, i.e., finding the best-performing susceptibility prediction model for this type of topoclimatic setting with a similar number and type of input variables. Based on a highly accurate flood inventory and 12 flood predictors (FPs) (selected using field experience of the study area and a literature survey), two machine learning (ML) ensemble models, CART-FR and CART-EBF, developed by bagging frequency ratio (FR) and evidential belief function (EBF) with classification and regression tree (CART), were applied for flood susceptibility zonation mapping. Flood and non-flood points randomly generated from the flood inventory were apportioned in a 70:30 ratio for training and validation of the ensembles. 
Based on an evaluation using a threshold-independent statistic, the area under the receiver operating characteristic (AUROC) curve, 14 threshold-dependent evaluation metrics, and the seed cell area index (SCAI), which together assess different aspects of the ensembles, the study suggests that CART-EBF (AUC_SR = 0.843; AUC_PR = 0.819) performed better than CART-FR (AUC_SR = 0.828; AUC_PR = 0.802). The variability in the performance of these novel ensembles, and their comparison with the results of other published models, underline the need to test these and other genres of susceptibility models in other topoclimatic environments. The results of this study are important for natural hazard managers and can be used to compute damages through risk analysis.

1 Introduction

Floods, under changing climatic and anthropogenic conditions over the Holocene period, have been impacting human living conditions (Macklin and Lewin, 2003). Owing to recurring floods and their devastating societal implications worldwide, the United Nations Sustainable Development Goals (UNSDGs) incorporate flood risk management and mitigation as one of their principal aims (UNSDG, 2013). Floods have been variously classified depending upon geological, hydrological, climatic, and societal factors (Sikorska et al., 2015). However, the widely accepted definition of a flood encompasses the views of hydrologists, hazard managers, and sociologists: floods occur when a rise in water levels, caused by meteorological, hydrological, geomorphic, anthropological, and societal factors, inundates areas that otherwise remain dry, thereby causing loss of life, agriculture (including livestock), and property (Hubbart and Jones, 2009). The state of Bihar in India faces annual flooding that incurs losses of life, property, and agriculture (livestock included) to the tune of approximately ₹146,301.71 million (CWC, 2018). Previous studies have suggested that the Ganga River Basin (GRB) in the Himalayan Foreland Basin (HFB) is currently under an active tectonic regime (Kumar, 2020). It is experiencing subsidence due to subsurface structural activity, which accentuates floods that occur for various reasons (Shukla et al., 2012; Gupta et al., 2014). Apart from tectonically induced ground subsidence, factors such as landuse/landcover (LULC) change (Kumar et al., 2018), climate change (Arora et al., 2021a), and river embankment breaches (Bhatt et al., 2010) cause frequent flooding in the GRB. Advances in remote sensing technology have proved helpful in monitoring and predicting flooding (Jiménez-Jiménez et al., 2020). 
Many aspects of floods are quantifiable using continuously growing remote sensing satellite technology and their output products (Plaza et al., 2009).

GRAPHICAL ABSTRACT

Recent developments in remote sensing satellite technologies and sensors (Toth and Jóźków, 2016; Zhang X. et al., 2019; Han et al., 2019; Weiss et al., 2020; Yang et al., 2021), the rising number of platforms for satellite data access (Boerner, 2007; Rizzato et al., 2020), and improvements in low-altitude geospatial technologies such as lightweight unmanned aerial vehicles (UAVs) (Rizzato et al., 2020) have eased the monitoring and analysis of natural hazards and disasters (Gillespie et al., 2007) at various spatial and temporal scales. Since monitoring floods in urban settings is difficult owing to the narrow open spaces among densely built-up areas, UAVs help immensely in monitoring and quantifying flooded areas and flood-induced damage (Yalcin, 2018). The challenges, advantages, and disadvantages of using UAVs for such purposes in urban settings are discussed in detail in the literature (Feng et al., 2015). Freely available remote sensing products such as optical, radar, and hyperspectral datasets are popular in studies quantifying different aspects of natural hazards (Lin and Yan, 2016; Yao et al., 2019). These datasets are used to monitor ongoing flood events (Ban et al., 2017) and to compute the variables that serve as inputs to flood prediction models (Arora et al., 2021a).

In recent years, model development has attracted the attention of many researchers in various scientific disciplines (Cheng and Han, 2016; He et al., 2018; Chen et al., 2021). Multi-criteria decision-making (MCDM) (Opricovic and Tzeng, 2004; Abdullahi et al., 2015; De Brito and Evers, 2016; Turskis et al., 2019) and artificial intelligence (Suman et al., 2016; Guikema, 2020; Sun et al., 2020; Tan et al., 2021) models are very popular among researchers. There is a wide variety of flood inundation prediction models, e.g., statistical models, both bivariate and multivariate (Tehrany et al., 2014), machine learning models, MCDM models (Nachappa et al., 2020), and ensembles of two or more models (Arabameri et al., 2020c). Guerriero et al. (2018) provide a more exhaustive discussion of existing flood inundation prediction methods, with their pros and cons. New models are also being devised and tested regularly (Razavi Termeh et al., 2018). However, different flood susceptibility models perform with different levels of accuracy and sensitivity (Bui et al., 2018), which gives rise to inconsistent model performance across environmental settings. Finding a model with a high level of predictability in diverse topographic and climatic settings therefore remains a challenging task. It requires rigorous testing of various flood susceptibility prediction index (FSPI) models in different topoclimatic settings, such as a low-relief floodplain environment with a humid subtropical monsoon climate (Hong et al., 2018b) and mountainous, high-relief rugged terrain with a semiarid climatic regime (Ahmadlou et al., 2018).

As pointed out in the previous paragraph, different types of susceptibility models differ in accuracy and sensitivity even in a similar or identical topoclimatic setting. Furthermore, new models are constantly being developed and tested to achieve better accuracy and sensitivity and to overcome the disadvantages discussed by researchers (Reichenbach et al., 2018). The need to develop new models and to test previously developed ones in different settings is clearly visible in the hazard modeling community in the present decade (Panahi et al., 2021). To further this practice, we present two ensemble models in this study and test their performance in a typical topoclimatic setting: one recently developed novel ensemble model, CART-FR, and one new ensemble model, CART-EBF, developed by us for the first time, to predict flood occurrences and delineate flood susceptibility zones in a region of the Middle Ganga Plain. We apply 12 widely used flood predictors, namely geomorphology, altitude, slope, aspect, plan curvature, topographic wetness index (TWI), drainage density, distance to the river, distance from the road, soil type, annual rainfall, and landuse/landcover (LULC). This study also attempts to assess the contribution and efficiency of the different flood predictors using the information gain (IG) method, through analysis of the weightage rankings assigned by the ensemble models. This flood predictor ranking may assist flood hazard managers during policy formulation and the implementation of mitigation measures.

2 Materials and Methods

2.1 Study Area

The part of the Middle Ganga Plain (MGP) investigated for flood susceptibility prediction in this study covers an area of ∼10,138.5 km² and is located between the Upper and the Lower Ganga Plains (Figure 1). It lies between latitudes 25°14′48.00″N and 26°14′24.60″N and longitudes 83°51′46.19″E and 85°45′3.25″E. About 55.4% of the GRB (Singh et al., 2007) is covered with a thick layer of alluvium brought and deposited by a dense network of streams. A number of tectonic structures, both in the deep basement and at the surface, produce surface geomorphic markers that reveal continuing tectonic activity in the MGP (Singh, 1996). The Ganga plain is also undergoing subsidence as a result of tectonics as well as excessive groundwater depletion (Sahu et al., 2010). The study area is drained by several tributaries, including the Gomti, Ghaghara, Gandaki, and Kosi rivers (which join the Ganga from the left bank) and the Yamuna, Son, and Punpun rivers (which join from the right bank). This densely populated area has long been on the radar of national disaster management agencies.

FIGURE 1. Location of the study area. (A) Location of the study area marked on the map of India; Tibet and Pakistan are shown to the northeast and northwest, respectively. (B) Elevation of the study area, classified using the Natural Break (NB) method from the SRTM 30 m digital elevation model. (C) Broad geological profile of the study area and its surroundings, also showing the major drainages of the Ganga River Basin, of which the study area is a part. (D) Loss of lives due to the 2008 Bihar floods in 15 districts. (E,F) Field photographs of the flood situation in the study area during the 2008 Bihar flood.

The GRB experiences a humid subtropical climate featuring four seasons: winter (January–March), summer (April–May), monsoon (June–September), and post-monsoon (October–December) (Dimri, 2019). According to the India Meteorological Department (IMD), the average annual mean, maximum, and minimum temperatures in the GRB over the 35 years from 1969 to 2004 are 24.82°C, 31.22°C, and 18.44°C, respectively.

The MGP records average annual rainfall on the order of 100–120 cm, three-quarters of which falls within the four-month monsoon season (Trivedi et al., 2019). The influence of western disturbances (WDs) on Indian monsoonal rainfall is well documented in the form of sporadic rains and hailstorms during the southward migration of the intertropical convergence in the winter months (Dimri and Chevuturi, 2016). The seasonal variability in Ganga River discharge has led hydrologists to describe the discharge of Indian river systems in terms tied to the monsoon, such as monsoonal discharge, post-monsoonal discharge, and summer or winter monsoon discharge (Gupta, 1984). During the monsoon season, discharge in the Ganga River increases 50–100 times owing to heavy rainfall.

2.2 Data and Methodology

Data preparation is the first step in scientific work (Feng et al., 2020). Table 1 lists the datasets used to prepare the flood predictors, derived from the Shuttle Radar Topography Mission (SRTM) digital elevation model (DEM) (30 m resolution) and other data sources, and the flood inventory, computed from Landsat-5 Thematic Mapper (TM) satellite imagery. The input datasets have been discussed in detail in Section 2 and its subsections of Arora et al. (2021b). As the flow diagram in Figure 2 shows, this research work was accomplished in the following six steps: 1) obtain the least cloudy Landsat 5 TM images of the study area from the United States Geological Survey (USGS) EarthExplorer portal (https://earthexplorer.usgs.gov/) and generate the flood polygon for the “2008 Bihar Flood” event using normalized difference water index (NDWI) thresholding; 2) create the flood inventory and the flood predictors; 3) generate flood and non-flood points for model training and validation; 4) test the conditioning factors of the flood and non-flood points for multicollinearity, and apply a feature selection method (here, information gain) to all the flood predictors for a proper understanding of the suitability and contribution potential of each factor; 5) calculate the weightage of all the flood predictors using the bivariate frequency ratio and evidential belief function models, and devise the ensemble models with the MLP and CART machine learning models; and 6) perform model evaluation using various statistical parameters (discussed in the respective sections).
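
Step 3 and the 70:30 apportioning of flood and non-flood points can be illustrated with a minimal sketch. It assumes 300 flood and 300 non-flood points (the counts given in the Table 2 caption) and splits each class separately; the function name `split_points`, the fixed seed, and the per-class (stratified) split are illustrative assumptions, not the authors' documented procedure.

```python
import random


def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a list of points and split it into training and validation parts."""
    shuffled = points[:]                      # copy, so the input list is untouched
    random.Random(seed).shuffle(shuffled)     # seeded shuffle for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical point identifiers; real points would carry coordinates.
flood_points = [("flood", i) for i in range(300)]
non_flood_points = [("non_flood", i) for i in range(300)]

train_flood, valid_flood = split_points(flood_points)
train_non_flood, valid_non_flood = split_points(non_flood_points)

training = train_flood + train_non_flood        # 70% of each class
validation = valid_flood + valid_non_flood      # remaining 30% of each class
```

Splitting each class separately keeps the flood to non-flood balance identical in both subsets, which is one reasonable reading of a 70:30 apportioning.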

TABLE 1. Satellite and DEM data characteristic details used in the study.

FIGURE 2. Flow diagram showing step-by-step methodology employed in this study.

2.2.1 Flood Inventorying

As suggested by Arora et al. (2021b), this step involves the computation of NDWI from the selected satellite scenes. The details of NDWI computation method, based on Gao (1996), are presented in Arora et al. (2021b). Flood pixels are separated from non-flooded pixels by applying a threshold ≥0.20 to the NDWI raster.
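
The thresholding step can be sketched as follows. The sketch assumes the Gao (1996) formulation, in which NDWI is the normalized difference of near-infrared (NIR) and shortwave-infrared (SWIR) reflectance; which Landsat-5 TM bands the authors used is not stated here, so the band inputs, the function names, and the toy reflectance values are all hypothetical. Plain Python lists stand in for raster arrays.

```python
def ndwi(nir, swir):
    """Normalized difference (nir - swir) / (nir + swir) for a single pixel."""
    total = nir + swir
    return 0.0 if total == 0 else (nir - swir) / total


def flood_mask(nir_band, swir_band, threshold=0.20):
    """Return a boolean grid: True where NDWI >= threshold (flood pixel)."""
    return [
        [ndwi(n, s) >= threshold for n, s in zip(nir_row, swir_row)]
        for nir_row, swir_row in zip(nir_band, swir_band)
    ]


# Toy 2 x 2 reflectance grids (hypothetical values).
nir = [[0.30, 0.10], [0.25, 0.40]]
swir = [[0.10, 0.20], [0.20, 0.05]]
mask = flood_mask(nir, swir)
```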

2.2.2 Flood Predictors

We have selected 12 flood predictors (Table 2) based on an extensive literature survey and our knowledge of the geomorphic, hydrologic, and climatic conditions of the study area. The slope angle is defined as the rate of change of elevation with Euclidean distance. Slope is one of the factors that determine and influence soil type, moisture content, and vegetation and, therefore, affects the surface runoff (Yang et al., 2020) and infiltration rates (Nassif and Wilson, 1975; Liu and Singh, 2004). Thus, the slope has both indirect and direct effects on flood inundation (Al-Rawas and Valeo, 2010). We computed the slope degree using ArcGIS 10.3 and DEM data and classified the slope range of 0–42.80° into five categories using the natural break method (Figure 3A).

TABLE 2. Multicollinearity test results of all the conditioning factors; information gain (IG) attribute evaluation method for selection of flood conditioning factors using 300 flood and 300 non-flood points in Weka Software.

FIGURE 3. Flood conditioning factors used for modeling of flood susceptibility. From (A–L) the maps indicate Altitude, Slope; Aspect; TWI; River Density; Distance to Road; Annual Rainfall (mm); Soils; Curvature, Distance to River; LULC; and Geomorphology.

Slope direction is one of the variables that bear a relationship with the availability of soil moisture, geomorphic stability, exposure to radiation at the surface, wind (dry or wet), and rainfall intensity. Hence, it has an established relationship with flooding (Siahkamari et al., 2018). We used 3-D Analyst of SRTM DEM in ArcGIS 10.3 to calculate slope direction and categorized it into ten classes (Figure 3B).

Altitude affects the flood level in two ways: 1) elevation above the channel bed determines how far from the rivers inundation will occur, and 2) height above sea level controls atmospheric phenomena and hence the type and magnitude of precipitation. In this study, the low-altitude surface of the area (13–96 m), derived from the SRTM v.4 30 m digital elevation model (DEM), is classified into seven categories using the natural break method (Figure 3C).

Plan curvature, or planform curvature, is the curvature of the surface perpendicular to the direction of maximum slope and influences the convergence and divergence of flow (Kimerling et al., 2016). We have used the SRTM DEM to compute planform curvature and reclassify it into three categories, namely negative, zero, and positive, indicating concave, flat, and convex surfaces, respectively (Figure 3D).

TWI is a quantitative measure indicating topographic control on hydrological processes. It is computed using the formula $TWI = \ln (\frac{A_{s}}{\tan β})$, where $A_{s}$ denotes the specific catchment area (m²/m) and $β$ is the local slope angle (in degrees). TWI values indicate the surface saturation condition, which is one of the governing factors of surface runoff and hence one of the determinants of potential flooding in a watershed. Here, we categorized the TWI map of the study area into six classes (Figure 3E).
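
A minimal per-cell sketch of the index is given below. It assumes the commonly used form $\ln (A_{s} / \tan β)$ with the slope converted from degrees to radians, and it clamps very small slopes to avoid division by zero on flat cells; the clamp value `min_slope_deg` and the sample inputs are illustrative assumptions.

```python
import math


def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(A_s / tan(beta)) for one raster cell."""
    # Clamp near-flat slopes so tan(beta) never reaches zero.
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))


# A gently sloping cell with a large contributing area versus a steep cell
# with a small contributing area (hypothetical values).
flat_wet_cell = twi(500.0, 0.5)
steep_dry_cell = twi(50.0, 20.0)
```

Larger values indicate cells that collect more upslope water and drain it more slowly, which is why low-relief floodplain cells tend to score high.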

River density, also known as drainage density ( $D_{d}$ ), represents the stream channel length per unit area. It is calculated using the formula suggested by Horton (1932): $D_{d} = \frac{L_{T}}{A},$ where $L_{T}$ is the total stream length serving an area, and $A$ is the contributing area. Flood potential is taken to increase with $D_{d}$: the higher the drainage density, the higher the probability of flooding in a watershed. We have reclassified the $D_{d}$ range of 0–15.35 into six categories using the natural break method in ArcGIS 10.3 (Figure 3F). Drainage density is also established to be an indirect indicator of active tectonics (Han et al., 2003). Since the area is tectonically active, this parameter has been included to incorporate the effects of active tectonics on the flood potential.
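
As a small worked illustration of Horton's ratio, the sketch below sums stream segment lengths and divides by the contributing area. The function name, the units (km and km²), and the sample values are assumptions made for the example.

```python
def drainage_density(stream_lengths_km, area_km2):
    """Total stream length divided by contributing area (km per km^2)."""
    if area_km2 <= 0:
        raise ValueError("contributing area must be positive")
    return sum(stream_lengths_km) / area_km2


# Three hypothetical stream segments draining a 10 km^2 unit.
d_d = drainage_density([12.0, 8.0, 5.0], 10.0)
```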

Euclidean distance from the river channel is an important factor determining the extent of the area inundated by a flood (Khosravi et al., 2018). Areas far from the river channel in a watershed are less prone to flooding than those nearer to the channel. The relationship between flood susceptibility and distance to the river channel is site-specific, as it varies from place to place depending on various factors (Choubin et al., 2019). We calculated the Euclidean distance to river channels in the Spatial Analyst Toolbox of ArcGIS 10.3 and later interpolated and reclassified it into 10 categories to produce the map of “distance to the river” (Figure 3G).
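
The idea behind a Euclidean distance raster can be sketched with a brute-force search over river cells. This is not the ArcGIS implementation; the function name, the 30 m cell size, and the toy grid are assumptions, and the quadratic search is only practical for small grids.

```python
import math


def distance_to_river(river_mask, cell_size=30.0):
    """Distance (in map units) from every cell centre to the nearest river cell."""
    river_cells = [
        (r, c)
        for r, row in enumerate(river_mask)
        for c, is_river in enumerate(row)
        if is_river
    ]
    if not river_cells:
        raise ValueError("river mask contains no river cells")
    return [
        [
            cell_size * min(math.hypot(r - rr, c - cc) for rr, cc in river_cells)
            for c in range(len(row))
        ]
        for r, row in enumerate(river_mask)
    ]


# Toy 3 x 3 grid with a single river cell in the top-left corner.
river = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
dist = distance_to_river(river)
```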

Distance from the road is one of the important independent variables used in flood susceptibility modeling. The road networks in modern-day urban agglomerations increase impervious surfaces, which contribute to changing the surface hydrological properties. Road network data used in this study have been obtained from the OpenStreetMap portal, a collaborative mapping project (CMP). The data quality and its usability are described in Fan et al. (2014). After producing the map using interpolation, the data have been reclassified into seven categories (Figure 3H).

Geomorphology is closely connected to flood susceptibility (Mokarram and Sathyamoorthy, 2016). Floods sculpt landforms through erosion and deposition, and extreme flood events sometimes destroy landforms created by other geomorphic agents. The interrelationship between floods and landforms at different spatial and temporal scales has thus been recognized since the dawn of geomorphology as a discipline. In this study, geomorphological units were extracted from Google Earth Pro© through onscreen digitization at 1:500/1,000 scale. Seventeen microscale geomorphic units were identified through the classification of the study area (Figure 3I). Fine-scale geomorphology, another proxy for the effects of active tectonics and seismic perturbations at the surface, has also been taken as one of the explanatory variables of flood susceptibility, although it has previously been ignored in the flood susceptibility modeling community. Geomorphic markers mapped in this area that represent the effects of active tectonic activity include 1) asymmetrical meander belts (Leeder and Alexander, 1987), 2) abrupt scarp faces, 3) highly sinuous mountain fronts (Taloor et al., 2019), 4) unpaired terraces (Joshi et al., 2016), 5) unilateral migration (Latrubesse, 2015), and 6) shifted fan lobes and terraces (Jolley et al., 1990).

Climate change affects hydrological processes (Tian et al., 2020; He, 2021). Rainfall variations have an impact on flash flooding (Mahtab et al., 2018). Prolonged rainfall events, or a series of closely spaced events of different intensities and magnitudes, often trigger floods. In this study, we have used the global CFSR annual rainfall dataset to extract rainfall conditions in the study area (Trivedi et al., 2019). The dataset is provided at 1,000 m spatial resolution and has been resampled to 30 m resolution using the nearest neighbor (NN) method (Figure 3J). The data range of 1,001–1,081 mm is reclassified into six classes using the natural break method.
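
Nearest neighbor resampling assigns each fine cell the value of the coarse cell that contains its centre, so no new rainfall values are created. The sketch below illustrates this for a coarse grid resampled from 1,000 m to 30 m; the function name and the 2 x 2 toy grid are hypothetical, and grid alignment is simplified to a shared top-left origin.

```python
def resample_nearest(grid, src_cell, dst_cell):
    """Resample a 2-D grid to a finer cell size by nearest neighbor lookup."""
    n_rows = int(len(grid) * src_cell / dst_cell)
    n_cols = int(len(grid[0]) * src_cell / dst_cell)

    def src_index(i, limit):
        # Source cell containing the centre of fine cell i, clipped to the grid.
        return min(int((i + 0.5) * dst_cell / src_cell), limit - 1)

    return [
        [grid[src_index(r, len(grid))][src_index(c, len(grid[0]))]
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical 2 x 2 annual rainfall grid (mm) at 1,000 m resolution.
coarse = [[1001, 1040], [1060, 1081]]
fine = resample_nearest(coarse, src_cell=1000.0, dst_cell=30.0)
```

The resampled grid is finer only in geometry; its information content remains that of the 1,000 m source.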

Different characteristics of soil affect various hydrologic properties of the surface (Zhang et al., 2019b). Soil types with high permeability and high infiltration ratio show less susceptibility to flooding and vice versa (Krogh and Greve, 2006; FAO and ISRIC, 2012). Soil map produced by FAO and ITPS (2015) used in this study is resampled at 30 m resolution and classified into six categories (Figure 3K).

LULC can alter and control factors such as the moisture retention capacity of the surface, infiltration rate, surface runoff, and albedo, and hence bears a well-known relationship with the possibility of flooding in an area. For example, if an area has been converted from forested land to built-up land, the probability of flooding increases owing to the increased imperviousness caused by the altered surface cover type (Rogger et al., 2017). The LULC data produced by the Climate Change Initiative (CCI) program of the European Space Agency (ESA) have been used in this study. A subset of the ESA-archived annual worldwide dataset at 300 m spatial resolution, generated for the period 1992–2015 (Li et al., 2018), has been resampled to 30 m resolution using the nearest neighbor procedure (Figure 3L). The manual provided by the agency gives a full description of the dataset (ESA, 2017).

2.2.3 Multicollinearity Assessment Through the Variance Inflation and Tolerance Analysis

Multicollinearity analysis (Alin, 2010) (also known as collinearity analysis) is an important first step in regression analyses. Multicollinearity refers to predictor variables showing dependency on one another, which Dormann et al. (2013) phrase as “non-independence of the predictor variables.” A noncollinear relationship among the flood predictors (or independent/predictor variables) is required to obtain unbiased model results. Collinearity among the predictor variables is determined through the “variance inflation factor (VIF)” and “tolerance (TOL)”, which for a set of predictors $X = \{X_{1}, X_{2}, X_{3}, \dots, X_{N}\}$ are given as:

TOL = 1 - R_{j}^{2}, (1)

VIF = \frac{1}{TOL}, (2)

where $R_{j}^{2}$ = coefficient of determination of the regression of an explanatory variable on all the other explanatory variables. For the above set $X$, regressing the $j$th explanatory variable $X_{j}$ on all the other explanatory variables yields the $R_{j}^{2}$ needed for the computation of $VIF$ and $TOL$. Values of $VIF$ >10 and $TOL$ <0.1 indicate severe multicollinearity among the explanatory variables. VIF was calculated by “exploratory regression”, which is an embedded tool in the Spatial Statistics Toolbox of ArcGIS 10.3. The $VIF$ and $TOL$ values are presented in Table 2.
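
Equations 1 and 2 and the screening thresholds can be sketched as follows. The sketch takes $R_{j}^{2}$ as a given input (obtaining it requires regressing one predictor on all the others, which is not shown); the function names and sample values are illustrative assumptions.

```python
def tolerance_and_vif(r_squared):
    """Return (TOL, VIF) for a predictor with the given R_j^2."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    tol = 1.0 - r_squared      # Eq. 1
    vif = 1.0 / tol            # Eq. 2
    return tol, vif


def is_severely_collinear(r_squared):
    """Apply the VIF > 10 / TOL < 0.1 rule of thumb."""
    tol, vif = tolerance_and_vif(r_squared)
    return vif > 10.0 or tol < 0.1


tol, vif = tolerance_and_vif(0.5)   # a predictor half explained by the others
```

Because VIF is the reciprocal of TOL, the two thresholds flag the same predictors; reporting both is a convention rather than two independent tests.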

2.2.4 Feature Selection Method for Flood Predictors

For determination of the significance of controlling factors and ranking flood predictors as per their contribution in the prediction of flood phenomena, the information gain (IG) method (Section 2.2.4.1) was applied using the Weka software v3.9.4, developed by the University of Waikato, Hamilton, New Zealand.

2.2.4.1 Information Gain

IG is one of the most widely used methods of feature selection in various machine learning (ML) applications including landslide modeling (Đurić et al., 2019) and flood modeling (Costache and Tien Bui, 2019). This method is found to be one of the fastest and simplest methods used for ranking the features (Hall and Holmes, 2003).

The concept of entropy is one of the main tenets of the “information theory” and serves as the basis of IG:

IG (t) = - \sum_{i = 1}^{| C |} P (C_{i}) \log P (C_{i}) + P (t) \sum_{i = 1}^{| C |} P (C_{i} | t) \log P (C_{i} | t) + P (\bar{t}) \sum_{i = 1}^{| C |} P (C_{i} | \bar{t}) \log P (C_{i} | \bar{t}), (3)

where $C_{i}$ = the $i$th category; $P (C_{i})$ = the probability of the $i$th category; and $P (t)$ and $P (\bar{t})$ represent the probabilities of occurrence and non-occurrence of phenomenon “t”, respectively. For a discrete variable $C$, the entropy can be defined as:

H (C) = - \sum_{i = 1}^{k} P (C_{i}) {log}_{2} (P (C_{i})) (4)

This equation assumes that $C$ takes its values from $\{C_{1}, C_{2}, C_{3}, \dots, C_{k}\}$ and that $P (C_{i})$ is the probability that $C = C_{i}$.

The selection of flood predictors using IG, based on the entropy values of variables computed from a training dataset $D$ comprising $n$ flood predictors, can be expressed as follows (Chapi et al., 2017):

IG (D, F) = \frac{Entropy (D) - Entropy (D, F)}{SplitEntropy (D, F)}, (5)

Entropy (D) = - \sum_{i = 1}^{2} \frac{n (Y_{i}, F)}{| D |} {log}_{2} \frac{n (Y_{i}, F)}{| D |}, (6)

Entropy (D, F) = \sum_{j = 1}^{m} \frac{| D_{j} |}{| D |} Entropy (D_{j}), (7)

SplitEntropy (D, F) = - \sum_{j = 1}^{m} \frac{| D_{j} |}{| D |} {log}_{2} \frac{| D_{j} |}{| D |} . (8)
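
Equations 5 to 8 can be sketched for a single categorical predictor as shown below. The sketch reads $D_{j}$ as the subset of training points falling in class $j$ of predictor $F$ and takes the conditional entropy as the size-weighted average of the subset entropies; both readings, the function names, and the toy data are assumptions. It is not the Weka implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain_ratio(predictor_classes, labels):
    """(Entropy(D) - Entropy(D, F)) / SplitEntropy(D, F) for one predictor F."""
    total = len(labels)
    subsets = {}
    for cls, label in zip(predictor_classes, labels):
        subsets.setdefault(cls, []).append(label)
    conditional = sum(len(s) / total * entropy(s) for s in subsets.values())
    split = -sum(len(s) / total * math.log2(len(s) / total) for s in subsets.values())
    if split == 0:          # predictor has a single class: nothing to gain
        return 0.0
    return (entropy(labels) - conditional) / split


# Toy data: 1 = flood point, 0 = non-flood point.
labels = [1, 1, 0, 0]
informative = ["low", "low", "high", "high"]      # separates the classes perfectly
uninformative = ["low", "high", "low", "high"]    # carries no information

ig_informative = information_gain_ratio(informative, labels)
ig_uninformative = information_gain_ratio(uninformative, labels)
```

Ranking predictors by this score places those whose classes best separate flood from non-flood points at the top.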

3 Models Employed for Flood Susceptibility Prediction Index Mapping

For the present work, two base bivariate statistical models, viz. FR and EBF, have been used to compute the weightage for each of the twelve flood predictors. Subsequently, those weight values have been used to train the CART machine learning model, yielding the CART-FR and CART-EBF ensembles. In the following subsections, each individual model is briefly described, and the way in which the two sets of bivariate weights are combined with CART is then presented.

3.1 Models Applied for Data Preparation

3.1.1 Evidential Belief Function

This algorithm is based on the Dempster–Shafer theory of evidence (Dempster, 1967; Smith and Shafer, 1976). Four functions form the EBF: 1) belief function (Bel), 2) plausibility function (Pls), 3) disbelief function (Dis), and 4) uncertainty function (Unc).

m : 2^{Θ} = \{θ, T_{P}, \bar{T}_{P}, Θ\}, \quad \text{where } Θ = \{T_{P}, \bar{T}_{P}\}, (10)

where $T_{P}$ represents class pixels affected by flooding, $\bar{T}_{P}$ represents class pixels not affected by flooding, and $θ$ is the empty set.

The above equation yields the Bel (belief function) calculated with the help of the following equation (Park, 2011):

λ {(T_{P})}_{A_{ij}} = \frac{N (S \cap A_{ij}) / N (S)}{[N (A_{ij}) - N (S \cap A_{ij})] / [N (P) - N (S)]}, (11)

Belief function (Bel) = \frac{λ {(T_{P})}_{A_{ij}}}{\sum λ {(T_{P})}_{A_{ij}}}, (12)

where $N (S \cap A_{ij})$ = density of flood pixels occurring in $A_{ij}$; $N (S)$ = total density of flood pixels in the study area; $N (A_{ij})$ = the density of pixels in $A_{ij}$; and $N (P)$ = the density of pixels in the whole study area $P$.

The disbelief function (Dis) can be derived as:

λ {({\bar{T}}_{P})}_{A_{ij}} = \frac{[N (S) - N (S \cap A_{ij})] / N (S)}{[N (P) - N (S) - N (A_{ij}) + N (S \cap A_{ij})] / [N (P) - N (S)]}, (13)

Disbelief (Dis) = \frac{λ {({\bar{T}}_{P})}_{A_{ij}}}{\sum λ {({\bar{T}}_{P})}_{A_{ij}}}. (14)

And the following equations are used to compute uncertainty (Unc) and plausibility (PLs):

Uncertainty (Unc) = [(1 - (Belief) - (Disbelief))] (15)

Plausibility (Pls) = [1 - (Disbelief)] (16)
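
The four EBF functions (Eqs 11 to 16) can be sketched for the classes of one predictor as follows. Pixel counts stand in for the densities $N (\cdot)$, the sums in Eqs 12 and 14 are taken over the classes of that predictor, and every class is assumed to contain at least one non-flood pixel; the function name, dictionary layout, and counts are hypothetical.

```python
def evidential_belief(class_pixels, flood_pixels):
    """Bel, Dis, Unc and Pls per class of one predictor, from pixel counts."""
    n_p = sum(class_pixels.values())       # N(P): all pixels
    n_s = sum(flood_pixels.values())       # N(S): all flood pixels
    lam_t, lam_not_t = {}, {}
    for cls, n_a in class_pixels.items():
        n_sa = flood_pixels.get(cls, 0)    # N(S ∩ A_ij)
        lam_t[cls] = (n_sa / n_s) / ((n_a - n_sa) / (n_p - n_s))                      # Eq. 11
        lam_not_t[cls] = ((n_s - n_sa) / n_s) / ((n_p - n_s - n_a + n_sa) / (n_p - n_s))  # Eq. 13
    sum_t, sum_not_t = sum(lam_t.values()), sum(lam_not_t.values())
    result = {}
    for cls in class_pixels:
        bel = lam_t[cls] / sum_t           # Eq. 12
        dis = lam_not_t[cls] / sum_not_t   # Eq. 14
        result[cls] = {"bel": bel, "dis": dis, "unc": 1.0 - bel - dis, "pls": 1.0 - dis}
    return result


# Hypothetical pixel counts for three altitude classes.
pixels = {"low": 100, "mid": 200, "high": 100}
floods = {"low": 30, "mid": 8, "high": 2}
ebf = evidential_belief(pixels, floods)
```

By construction, plausibility equals belief plus uncertainty, so Bel and Pls bracket the degree of support for flooding in each class.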

3.1.2 Frequency Ratio

Frequency ratio is a frequently used bivariate statistical model. It represents the probability of event occurrence; in our case, the event is the flood pixel (Arabameri et al., 2019b).

The frequency ratio (FR) computation uses the following mathematical expression:

FR = \frac{N_{pix} ({SX}_{i}) / \sum_{i = 1}^{m} {SX}_{i}}{N_{pix} (X_{j}) / \sum_{j = 1}^{n} N_{pix} (X_{j})} = \frac{\text{flood occurrence ratio}}{\text{area ratio}}, (17)

where $N_{pix} ({SX}_{i})$ = the number of pixels with flood events within class i of factor variable X; $N_{pix} (X_{j})$ = the number of pixels within the factor variable $X_{j}$ ; m = the number of classes in the parameter variable Xi; and n = the number of factors in the study area (Regmi et al., 2014).
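The ratio in Equation 17 can be sketched in Python as follows. This is an illustrative sketch only: the function and argument names are hypothetical, and it assumes per-class pixel counts for a single factor.

```python
def frequency_ratio(flood_counts, class_counts):
    """Frequency ratio per class of one factor.

    flood_counts[i] : flood pixels in class i (assumed input)
    class_counts[i] : all pixels in class i (assumed input)
    """
    total_flood = sum(flood_counts)
    total_pixels = sum(class_counts)
    fr = []
    for n_flood, n_class in zip(flood_counts, class_counts):
        flood_ratio = n_flood / total_flood   # flood occurrence ratio
        area_ratio = n_class / total_pixels   # area ratio
        fr.append(flood_ratio / area_ratio)
    return fr
```

For two equally sized classes holding 30 and 10 flood pixels, the function returns 1.5 and 0.5: values above 1 indicate a class with more floods than its areal share would suggest.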

3.1.3 Classification and Regression Tree

CART is a powerful nonparametric machine learning (data mining) algorithm proposed by Breiman et al. (1984). As the name suggests, it can perform both classification and regression on numeric, binary, and categorical variables (Haughton and Oulabi, 1993). After the variables are classified in numeric, binary, or categorical format, the improvement offered by a candidate split is computed using the mathematical expression:

I (Split) = \left[ 0.25 \, {(q (1 - q))}^{u} \sum_{k} | PL (k) - PR (k) | \right]^{2}, (18)

where k = index of the target classes; $PL (k)$ and $PR (k)$ = distribution of the probability of the target classes associated with the left and right child nodes, respectively; $u$ = penalty, controlled by the user, applied when unequal-sized child nodes are generated.
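To make Equation 18 concrete, here is a small Python sketch. It is not the Salford Predictive Modeler implementation; the function name is hypothetical, and it assumes that q (not defined in the text) is the fraction of parent-node cases sent to the left child.

```python
def split_improvement(left_labels, right_labels, u=1.0):
    """Score a candidate split following the form of Equation 18.

    Assumption: q is the share of parent-node cases going to the left child.
    """
    n_left, n_right = len(left_labels), len(right_labels)
    q = n_left / (n_left + n_right)
    classes = set(left_labels) | set(right_labels)
    # Sum over classes of |PL(k) - PR(k)|
    diff = sum(
        abs(left_labels.count(k) / n_left - right_labels.count(k) / n_right)
        for k in classes
    )
    return (0.25 * (q * (1 - q)) ** u * diff) ** 2
```

A split that separates flood from non-flood cases perfectly scores higher than one whose child nodes have identical class mixes (score 0), and larger values of u penalize unbalanced splits more strongly.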

The CART model outputs a very complex decision tree, which needs pruning to extract only the most relevant and important information.

3.2 Ensemble Models Applied for Flood Susceptibility Prediction Index Computation

Due to the limitations of stand-alone models (Hapuarachchi and Wang, 2008; Hapuarachchi et al., 2011) and the advantages of ensemble models (Fernández et al., 2018; Zounemat-Kermani et al., 2020), in recent years, the use of ensemble models has expanded among researchers (Fernández et al., 2018; Zounemat-Kermani et al., 2020; Costache et al., 2021). Two ensemble models are used to derive the Flood Susceptibility Prediction Index and corresponding zonation maps. These ensembles are generated through the combination of CART and bivariate statistics models—FR and EBF. The factor class/category coefficients derived with the help of FR and EBF models are used as input in the CART.

3.3 Database Establishment

For the present research work, a database consisting of 12 flood predictors for a total number of 300 flood points was prepared using ArcGIS. Since the flood-prone area identification was performed following a binary classification of pixels, it was necessary to create another data sample, having the same number of points (300), consisting of non-flood locations (Ali et al., 2020). To ensure the objectivity of the results, the non-flood locations were randomly distributed across the entire study area.

3.4 Feature Selection With IG

The involvement of multiple predictors to estimate the susceptibility to a specific natural hazard can lead to issues related to the prediction (Costache, 2019). To overcome this shortcoming and to eliminate the noisy data from the workflow, the predictive ability of the 12 flood predictors was tested using information gain (IG). To determine the flood predictors’ significance, all the three models were applied using Weka 3.9 software.
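For readers unfamiliar with information gain, the sketch below shows the standard entropy-based definition, IG = H(class) − H(class | predictor), in plain Python. It is not the Weka routine used in the study; the function names are hypothetical, and the predictor is assumed to be already discretized into classes.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy remaining after grouping by feature class."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional
```

A predictor whose classes separate flood from non-flood points perfectly yields the maximum gain (1 bit for a balanced sample), whereas a predictor unrelated to flooding yields a gain near zero.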

3.5 EBF and FR Coefficient Normalization

EBF and FR coefficients were used to code the predictor class/category. These two types of coefficients were calculated using the procedures described in Sections 3.1.1 and 3.1.2. Furthermore, to bring the EBF and FR values to the same range, the normalization procedure was applied using Equation 23 proposed by Costache et al. (2020):

y = \frac{(x - \min (d)) \times (\max (n) - \min (n))}{\max (d) - \min (d)} + \min (n), (23)

where $y$ = standardized value of $x$ , $x$ = variable’s current value, $d$ = range limit of the variable values, and $n$ = standardized range limit.
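Equation 23 is a min-max rescaling, which can be sketched as follows. The function name is hypothetical, and the target range is left as an explicit argument because the text does not state which limits were used.

```python
def rescale(values, new_min, new_max):
    """Min-max rescaling of coefficient values to [new_min, new_max] (Equation 23)."""
    lo, hi = min(values), max(values)  # min(d) and max(d)
    return [
        (x - lo) * (new_max - new_min) / (hi - lo) + new_min
        for x in values
    ]
```

For example, rescaling the values 0, 5, and 10 to the range 0–1 gives 0, 0.5, and 1. The sketch assumes that the input values are not all identical, since the denominator would otherwise be zero.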

3.6 Preparation of the Training and Validating Datasets

After obtaining the normalized EBF- and FR-derived weightage database, we need to set up the training and validation samples using this newly generated dataset. Previous studies (Arabameri et al., 2019a; Bui et al., 2019a) suggest that the training sample is established to represent 70% of the total dataset, while the other 30% is apportioned for validating the dataset. Thus, we used 210 flood and 210 non-flood pixels as the training dataset, while 90 flood and 90 non-flood locations were used in the validation process. The Subset Features tool from ArcGIS was used to randomly split the dataset.
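The 70/30 partition described above can be illustrated with a short Python sketch. It is not the ArcGIS Subset Features tool; the function name, the seed, and the use of Python's random module are assumptions made for illustration.

```python
import random

def stratified_split(flood_points, non_flood_points, train_fraction=0.7, seed=42):
    """Randomly split each class separately so both samples stay balanced."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    train, valid = [], []
    for label, points in ((1, flood_points), (0, non_flood_points)):
        shuffled = list(points)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(p, label) for p in shuffled[:cut]]
        valid += [(p, label) for p in shuffled[cut:]]
    return train, valid
```

With 300 flood and 300 non-flood points, this yields 420 training points (210 per class) and 180 validation points (90 per class), matching the sample sizes stated above.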

3.7 Setting the Configuration for Hybrid Ensemble Models

3.7.1 CART-EBF and CART-FR Ensembles

The two CART-based ensembles were trained with the help of Salford Predictive Modeler v8.2 (Costache et al., 2020). The trial process was used to optimize the CART ensembles’ parameters (minimum cases of parent nodes and terminal nodes) whose values were established in accordance with the highest AUC. Finally, the weights of the flood predictors for CART-EBF and CART-FR ensembles were also determined.

3.8 Model Performance Evaluation and Comparison

Performance evaluation is the most important step in scientific works (Zhang et al., 2019a). Because no universally valid guideline could be found for choosing among model evaluation metrics such as AUROC, TSS, RMSE, and others (Zhou et al., 2018; Alam et al., 2020), we have chosen two types of metrics to evaluate the performance of the models used in this study: 1) threshold-independent and 2) threshold-dependent. Under the first category, the area under the receiver operating characteristic (AUROC) curve has been used. For threshold-dependent evaluation, we have used the true skill statistic (TSS) and others (see Table 6). It should be noted that many of the threshold-dependent metrics (West et al., 2016) listed in Table 6 are derived from the AUROC curve (the abbreviations used in this section are expanded in Table 6 and in the note appended just below the table).

The AUROC graph plot is a biaxial plot with “sensitivity” (y-axis parameter) versus “1-specificity” (x-axis parameter) (Jiménez-Valverde, 2012). The AUROC value ranges from 0.5 (inaccurate) to 1 (highly accurate).

The metrics plotted on the two axes of the ROC curve, sensitivity (also called true positive rate) and 1-specificity (or false positive rate), are expressed in mathematical form as:

TPR = Sensitivity = \frac{TP}{TP + FN}, (26)

FPR = 1 - Specificity = \frac{FP}{FP + TN}, (27)

where $FP$ = number of false-positive cases, $TN$ = number of true-negative cases, $TP$ = number of true-positive cases, and $FN$ = number of false-negative cases. The $TPR$, also termed sensitivity, represents the probability of the test predicting true events to be true. The FPR, also known as “1-specificity,” indicates the probability of the test predicting a non-event to be a true event. The AUROC value range of 0.5–1.00 has different implications in terms of the accuracy of model performance.
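As a sketch of how these quantities relate to the ROC curve, the Python snippet below uses the standard definitions TPR = TP/(TP + FN) and FPR = FP/(FP + TN) and integrates the curve with the trapezoidal rule. The function names are hypothetical, and this is not the software routine used in the study.

```python
def rates(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    return tp / (tp + fn), fp / (fp + tn)

def roc_auc(scores, labels):
    """AUROC by sweeping every observed score as a cut-off threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))  # (FPR, TPR) at this threshold
    # Trapezoidal integration along the FPR axis
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Labels are assumed to be coded 1 for flood and 0 for non-flood points, and scores are the modeled susceptibility values.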

The specificity and sensitivity values at different cutoff thresholds for both models, CART-EBF and CART-FR, are provided in Table 4.

The threshold-dependent metric used in this work, the “true skill statistic (TSS)” (Flueck, 1987), is one of the most popular skill scores for categorical datasets in forecast-related studies. The metric traces back to its first proposal by Peirce (1884) and is also widely known as the “Hanssen-Kuipers” discriminant (Wilks, 1995), Kuipers’ performance index, or the true skill statistic (Allouche et al., 2006). Cohen’s kappa depends on the prevalence of sample points, which affects the sensitivity and specificity of the model performance, and the TSS overcomes this disadvantage (Allouche et al., 2006). Besides, accuracy, F-score, Cohen’s kappa (Cohen, 1960), Matthews correlation coefficient (MCC), TPR (sensitivity), TNR (specificity), FPR (fall-out), informedness (bookmaker informedness; BMI), etc., which depend on the same quantities that control AUROC (TP, TN, FP, and FN), have also been calculated for the performance evaluation of the models (see Table 6 and the appended notes for the list of metrics, their expansions, and calculation formulas). All the performance evaluators listed in Table 6 are used for assessing different facets of model performance, viz. accuracy, precision, robustness, sensitivity, consistency, and the goodness of fit between observed and estimated values of natural phenomena; most of them are derived from the 2 × 2 confusion matrix generated from the binary classification scheme. There is a long list of classifier performance evaluation metrics. However, their suitability for a particular type of modeling exercise has not yet been put forward by the ML community (Seliya et al., 2009). Seliya et al. (2009) studied 22 such evaluators, including their meanings, what their higher or lower values imply, and the relationships among them.

The Kappa statistic assesses the agreement between two distinguished sets of classification while catering to the randomness in the classification (Baattrup-Pedersen et al., 2012). The Kappa statistics can be calculated using the following equation:

K = \frac{P_{obs} - P_{exp}}{1 - P_{exp}}, (28)

where $P_{obs}$ = observed agreement = (TP + TN)/N, with N = TP + TN + FP + FN, representing the proportion of correctly classified inundated and non-inundated pixels; $P_{exp}$ = expected agreement = [{(TP + FN) × (TP + FP)} + {(FP + TN) × (FN + TN)}]/N², i.e., the proportion of inundated and non-inundated pixels expected to show agreement on the basis of chance (Hoehler, 2000).

The value of the k-index generally varies between 0 and 1 (negative values indicate agreement worse than chance); values moving towards 0 indicate less agreement, whereas values moving towards 1 indicate that the model’s prediction accuracy approaches perfection. Cohen (1960) presented a classification of the k-index such that: K ≤ 0 (no agreement); 0.01–0.20 (slight agreement); 0.21–0.40 (fair agreement); 0.41–0.60 (moderate agreement); 0.61–0.80 (substantial agreement); and 0.81–1.00 (near-perfect agreement).
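The kappa statistic of Equation 28, together with TSS and MCC, can be computed from the four confusion-matrix counts as sketched below. The function name is hypothetical. The sketch assumes that observed and expected agreement are expressed as proportions (counts divided by the total number of points N and by N², respectively), and it uses the standard definitions TSS = sensitivity + specificity − 1 and MCC = (TP·TN − FP·FN)/√[(TP + FP)(TP + FN)(TN + FP)(TN + FN)].

```python
from math import sqrt

def kappa_tss_mcc(tp, fp, tn, fn):
    """Return (Cohen's kappa, TSS, MCC) from confusion-matrix counts."""
    n = tp + fp + tn + fn
    p_obs = (tp + tn) / n                                        # observed agreement
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2  # chance agreement
    kappa = (p_obs - p_exp) / (1 - p_exp)                        # Equation 28
    tss = tp / (tp + fn) + tn / (tn + fp) - 1                    # sensitivity + specificity - 1
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return kappa, tss, mcc
```

For a balanced sample with 80% of both flood and non-flood points classified correctly, all three statistics equal 0.6, which falls at the upper end of the “moderate agreement” band listed above.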

We have also employed the seed cell area index (SCAI) and frequency ratio plots (FRP) as a second round of validation of the modeled classification results. The SCAI is computed as the ratio of the areal percentage of each classified class to the percentage of flood seed cells falling in that class (Süzen and Doyuran, 2004):

SCAI (\%) = \frac{[N_{pix} (X_{j}) / \sum_{j = 1}^{n} N_{pix} (X_{j})] \times 100}{[N_{pix} ({SX}_{i}) / \sum_{i = 1}^{m} {SX}_{i}] \times 100} = \frac{\text{area ratio (\%)}}{\text{flood susceptible occurrence ratio (\%)}}, (29)

where $N_{pix} ({SX}_{i})$ = number of flood pixels within class i of flood predictor X; $N_{pix} (X_{j})$ = number of pixels within the flood predictor variable $X_{j}$; m = number of classes in the parameter $X_{i}$; and n = number of total factors selected for the study area. In our study, there is an inverse relationship between SCAI and the accuracy of prediction of susceptible classes (Arabameri et al., 2020b). In other words, low SCAI values for the “high” and “very high” FSPI susceptibility classes and high SCAI values for the “low” and “very low” FSPI susceptibility classes validate that the FSPI zones are correctly demarcated in the resultant flood susceptibility maps (Dragićević et al., 2015). The FRP method of classification validation behaves inversely to the SCAI method (Aghdam et al., 2017). Therefore, this second round of validation adds an extra level of confidence in the results of the modeled susceptibility prediction index.
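The SCAI of Equation 29 can be sketched in Python as follows. The function and argument names are hypothetical; the sketch assumes that each susceptibility class contains at least one flood seed cell, since the ratio is otherwise undefined.

```python
def scai(class_areas, class_flood_counts):
    """Seed cell area index per susceptibility class.

    class_areas[i]        : pixels (or area) in class i (assumed input)
    class_flood_counts[i] : flood seed cells falling in class i (assumed input)
    """
    total_area = sum(class_areas)
    total_flood = sum(class_flood_counts)
    values = []
    for area, floods in zip(class_areas, class_flood_counts):
        area_pct = 100 * area / total_area        # areal share of the class
        flood_pct = 100 * floods / total_flood    # share of seed cells in the class
        values.append(area_pct / flood_pct)
    return values
```

A large class with few seed cells (a typical “very low” class) produces a high SCAI, and a small class holding most seed cells (a typical “very high” class) produces a low SCAI, which is the inverse pattern described above.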

4 Results

4.1 Selecting the Flood Predictors

We selected a list of flood predictors based on an extensive literature survey and familiarity with the topographic, hydrologic, climatic, and anthropic settings of the study area. Afterward, we selected the most significant and least redundant features or flood predictors by applying three statistical measures meant for checking multicollinearity, retrieving weights, and ranking of the variable features. We also analyzed interdependence among the flood predictors by applying the test of multicollinearity; furthermore, the application of IG (Figure 4) test methods has helped in ranking the flood predictors in order of their contribution to the flood occurrence probability.

FIGURE 4

FIGURE 4. Importance of flood predictors derived through the use of IG arranged in order of their contribution to flood occurrence prediction.

4.1.1 Multicollinearity and IG Analyses of Feature Selection

Interdependency of flood predictors has been assessed by applying multicollinearity analysis. This analysis shows that all the flood predictors listed in Table 2 have variance inflation factor (VIF) < 10 and tolerance (TOL) > 0.1; they therefore show no sign of collinearity and can be included in the models applied for the flood susceptibility zonation exercise.
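The two collinearity diagnostics are linked through the coefficient of determination R² obtained by regressing one predictor on all the others: TOL = 1 − R² and VIF = 1/TOL. The sketch below is a hedged illustration with hypothetical function names; it assumes R² has already been obtained and uses the commonly cited cut-offs (VIF > 10 or TOL < 0.1 flagging collinearity) as defaults.

```python
def vif_and_tolerance(r_squared):
    """Return (VIF, TOL) for a predictor given the R² of its regression on the others."""
    tol = 1.0 - r_squared
    return 1.0 / tol, tol

def is_collinear(r_squared, vif_limit=10.0, tol_limit=0.1):
    """Flag a predictor as collinear using the assumed cut-off values."""
    vif, tol = vif_and_tolerance(r_squared)
    return vif > vif_limit or tol < tol_limit
```

A predictor with R² = 0.5 has VIF = 2 and TOL = 0.5 and would be retained, whereas one with R² = 0.95 would be flagged.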

The IG weights and ranks shown in Figure 4 are used to rank the flood predictors and are given in Table 2. The calculated IG ranks and weights ascertain the significance of each predictor’s role in flood occurrence prediction. Geomorphology ranks first, signifying that it makes the most important contribution to the flood susceptibility prediction process. The IG method suggests that the four most important flood predictors (in descending order) are non-DEM-derived factors, viz. geomorphology, soil, LULC, and “distance-to-road,” and that the least significant predictors (in increasing order of significance) are rainfall, curvature, “distance-to-river,” and stream density. As per IG, almost all the DEM-derived topography-related parameters, except curvature, are middle-level performers in terms of their significance for flood occurrence.

4.1.2 EBF and FR Coefficients

The FR and EBF model results for each class of every flood predictor have provided base weight values for the training and validation data points of the CART model ensembles. At first, we classified all the flood predictors’ values using the methods listed in Table 3. Subsequently, the EBF and FR weights corresponding to the original class values of each flood predictor at the subclass level were computed using the methods discussed in Sections 3.1.1 and 3.1.2, respectively. The elevation class ranges “0–45.0” and “65.8–96.0” show the maximum (FR = 3.29; EBF = 0.39) and minimum (FR = 0.03; EBF = 0.01) weights, respectively, for both FR and EBF. The slope aspect classes that provide the maximum (FR = 1.54; EBF = 0.17) and minimum (FR = 0.68; EBF = 0.08) weights are flat (−1°–0°) and northwest (292.5°–337.5°), respectively. Out of the three curvature classes, the “flat” class gives the maximum FR (1.11) and EBF (0.37) weights, whereas the “convex” class gives the minimum weights for both FR (0.88) and EBF (0.30). For the flood predictor “distance from river,” the class range “0–600” produced the maximum weights (FR = 1.46; EBF = 0.40), and the class ranges “3,601–4,200” and “4,201–5,383” produced the minimum weights (FR = 0; EBF = 0). Among the geomorphological classes, the FR and EBF models have assigned the maximum (FR = 5.65; EBF = 0.33) and minimum (FR = 4.95; EBF = 0.29) weights to levee and waterlogged areas, respectively. For LULC, the maximum (FR = 3.30; EBF = 0.54) and minimum (FR = 0.18; EBF = 0.03) weights have been assigned to the “water” and “settlement” classes, respectively. The maximum and minimum FR weights for the “rainfall” classes are 1.32 and 0.84, and the corresponding EBF weights are 0.24 and 0.15, respectively. For river density, the FR maximum (2.62) and minimum (0.74) weights fall in the same classes as the EBF maximum (0.29) and minimum (0.08) weights. For the “distance to road” predictor, the classes “3,001–4,000” and “0–500” have the maximum and minimum weights, respectively: 2.02 (max) and 0.60 (min) by FR, and 0.25 (max) and 0.07 (min) by EBF. The maximum (FR = 2.46; EBF = 0.34) and minimum (FR = 0.86; EBF = 0.12) weights for “slope angle” were assigned to the class “7.1–42.8” and to the classes “1.1–3.0” and “3.1–5.0,” respectively. The fifth soil class (FL-Fluvisol-3743) and the third soil class (CL-Calcisol-3694) received the maximum (1.78) and minimum (0.16) FR weights, respectively; the corresponding EBF weights are 0.44 (maximum) and 0.04 (minimum). The TWI class “22.33–31.84” has the maximum weight for both FR and EBF (FR = 4.41; EBF = 0.35), and the TWI class “7.33–10.89” has the minimum weight (FR = 0.80; EBF = 0.06) for both. The classwise FR and EBF weights of every flood predictor are tabulated in Table 3.

TABLE 3

TABLE 3. FR and EBF values of factor class/categories (FR values taken from Arora et al., 2019b).

4.1.3 Flood Susceptibility Prediction Index Zonation Results

All the hybrid models were trained and validated using normalized flood predictor values for each class representing the controls of flood susceptibility in the MGP (see Section 3.6). After estimating the FR- and EBF-based flood predictor weights of all 600 flood and non-flood points, the trial-and-error method using backward and forward propagation was applied to obtain the CART ensemble weights for those points. Four categories of results were obtained from the EBF- and FR-based ensembles: 1) the ensembles have arranged the flood predictors in the sequence of their significance (ascending order of the weight assigned to the flood predictors); 2) by using these weights for each subclass of every flood predictor, the flood susceptibility prediction index (FSPI) of the entire study area was obtained and classified into “very low,” “low,” “medium,” “high,” and “very high” flood susceptible zones using the natural break (NB) method (Figures 5A,B); 3) corresponding to each class of FSPI, the entire study area was delineated into 5 zonation units, with the percentage of area belonging to each class (the areal share of each FSPI zone using four different segmentation methods is presented in Figures 6A–C); and 4) the accuracy, sensitivity, precision, robustness, etc. of all the models, indicating how well the models performed in this low-relief, subhumid monsoon-dominated topoclimatic setting, have been computed. For both ensembles, these different levels of results are presented in the subsections below. Since the natural break (NB) method is the most widely used and quantile (QNTL) accrued the highest areal percentage shares in the “high” and “very high” classes, FSPI percentage shares were computed separately for all the models using these two methods and are presented in Figures 6A,B.

FIGURE 5

FIGURE 5. Flood susceptibility map using six ensemble models computed using methods discussed in Section 5.1; (A, B) show FSPI classified results according to CART-EBF and CART-FR results. Boxed areas A and B are zoomed windows in each of the model output maps to show detailed FSPI conditions nearby confluence zones.

FIGURE 6

FIGURE 6. FSPI histogram classification of both models’ outputs. In parts (A) and (B), Percentage share of areal coverage in “very low,” “low,” “medium,” “high,” and “very high” categories as classified by Natural Break (NB) and Quantile (QNTL) methods, respectively, is visualized, whereas, in part (C), areal coverage (%) by using four methods (EI-Equal Interval; GI-Geometric Interval; NB-Natural break; and QNTL-quantile) for only “high” and “very high” classes is demonstrated.

FIGURE 7

FIGURE 7. Area under receiver operating characteristics (AUROC) curve for the model was constructed in a single graph in order to compare the model’s performance. Validation of the models was performed using 30% of the randomly generated flood points specifically segregated from the points kept for the purpose of validation using AUROC statistical method. Panel (A) is computed using training dataset (represents model success rate) and Panel (B) with validation datasets (shows model prediction rate).

4.1.3.1 CART-EBF and CART-FR

Following the training procedure, and considering the different cut-off thresholds and the corresponding specificity and sensitivity values of both models (Table 4), the minimum cases of parent nodes for the CART-EBF ensemble were established at 24, while the minimum cases of terminal nodes were kept equal to 10. For the CART-FR model, the optimal minimum cases of parent nodes were 27, while the optimal minimum cases of terminal nodes were 12. Based on these settings, the models then computed the weights for each of the flood predictors, which are listed in Table 5. According to Table 5, CART-EBF and CART-FR assigned the highest weights to “land use” (0.115) and “geomorphology” (0.125), respectively. For CART-EBF, land use is followed by soil (0.114), geomorphology (0.111), altitude (0.073), TWI (0.021), aspect (0.020), river density (0.019), distance from river (0.015), road distance (0.015), rainfall (0.01), curvature (0.002), and slope angle (0.001). For CART-FR, geomorphology is followed by altitude (0.121), land use (0.054), soil (0.046), rainfall (0.039), distance from river (0.024), TWI (0.014), aspect (0.008), river density (0.007), curvature (0.005), road distance (0.004), and slope angle (0.002).

TABLE 4

TABLE 4. Specificity and sensitivity values using different cut-off thresholds.

TABLE 5

TABLE 5. Weights of conditioning factors within the applied models.

By applying these flood predictors’ weights, the FSPI values were computed in the Raster Calculator of the Spatial Analyst extension of ArcMap version 10.3 and categorized into 5 classes for flood zonation using four classification methods: QNTL, NB, GI, and EI. For CART-EBF, the highest percentage share of flood pixels in the “very high” class was produced by QNTL (19.43%), followed by GI (9.94%), NB (8.14%), and EI (3.81%). For CART-FR, QNTL again ranked first (19.64%), followed in descending order by GI (5.11%), NB (3.92%), and EI (0.89%).
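The difference between the classification methods can be illustrated with two of them, equal interval (EI) and quantile (QNTL), in a short Python sketch. It is not the ArcMap implementation; the function names are hypothetical, and natural break and geometric interval are omitted for brevity.

```python
import statistics
from bisect import bisect_right

def equal_interval_breaks(values, k=5):
    """k - 1 break values that divide the value range into equal-width classes."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]

def quantile_breaks(values, k=5):
    """k - 1 break values that put roughly equal numbers of pixels in each class."""
    return statistics.quantiles(values, n=k)

def class_shares(values, breaks):
    """Percentage of values falling in each class defined by the breaks."""
    counts = [0] * (len(breaks) + 1)
    for v in values:
        counts[bisect_right(breaks, v)] += 1
    return [100 * c / len(values) for c in counts]
```

Because quantile classes hold roughly equal numbers of pixels, a five-class quantile scheme places about 20% of the area in the top class by construction, which is consistent with the QNTL shares of about 19% reported above, whereas the equal-interval share depends on how skewed the FSPI distribution is.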

4.2 Model Performance Validation Through AUROC and Other Statistical Measures

In Table 6, the model performance evaluation metrics are presented; they belong to two categories of evaluators, viz. cutoff-dependent and cutoff-independent, and most of them are derived from confusion-matrix parameters such as TP, TN, FP, and FN. These are used to assess different aspects of model performance, such as accuracy or efficiency, precision, robustness, and randomness-driven performance. Rahmati et al. (2019) reviewed 21 threshold-dependent model performance evaluation indices to judge different aspects of the functioning of susceptibility models used in the field of natural hazard studies. We have used only 14 of those evaluators (given in Table 6) to keep the model evaluation sections of the paper concise. As suggested by Rahmati et al. (2019), the threshold-independent and threshold-dependent evaluators used in this work are discussed in Sections 4.2.1, 4.2.2, respectively.

TABLE 6

TABLE 6. Minimum and maximum FSPI values for all the flood susceptibility classes as per CART-EBF and CART-FR models.

4.2.1 Threshold-independent Metrics

The area under the receiver operating characteristic (AUROC) curve, success rate curve (SRC: ROC computed using the training dataset), and prediction rate curve (PRC: ROC computed using the validation dataset) for all the modeled results are given in Figures 7A,B and Table 4. For the training dataset, CART-EBF (84.3%) has performed better than CART-FR (82.8%); for the validation dataset, the prediction rate is likewise higher for CART-EBF (81.9%) and slightly lower for CART-FR (80.2%).

4.2.2 Threshold-dependent Metrics

All 14 threshold-dependent evaluation metrics are presented in Table 4. Their detailed definitions, formulae, and interpretation are given by Frattini et al. (2010) and Rahmati et al. (2019). In terms of the overall accuracy of the ensembles (for both training and validation datasets), CART-EBF (Acc_SR = 81.40%; Acc_PR = 79.60%) outperforms CART-FR (Acc_SR = 75.9%; Acc_PR = 74.0%). In this study, both ensembles have exhibited sensitivity (TPR: true positive rate, or the ability of models to correctly predict positives, i.e., flood points) in the range of 78.5–82.4% for the training dataset and 76.0–80.9% for the validation dataset. The models’ ability to correctly predict the negatives, i.e., non-flood points, is judged by the specificity or true negative rate (TNR); for CART-FR, the TNR was 0.738 for the training dataset and 0.723 for the validation dataset. The PPV (positive predictive value), also called confidence or precision, and its complementary metric FDR [false discovery rate, which relates to Type I error; see Frattini et al. (2010) for the definition of Type I and II errors] are used here to see how precisely the ensembles can predict flood pixels and non-flood pixels, respectively. Higher values of PPV and lower values of FDR indicate high precision of the ensembles’ predictions. CART-FR, with PPV: 0.714; FDR: 0.286 (training dataset) and PPV: 0.783; FDR: 0.298 (validation dataset), has been found to perform slightly less precisely than CART-EBF (Table 6). It should be noted that the lower the value of FNR, the better the model performance. Though accuracy and F1-score have been widely used to assess model performance, they can sometimes be misleading. The F1-score does not take into account all four entries of the confusion matrix, and that is where the MCC (Matthews correlation coefficient) plays a decisive role, as it overcomes this shortcoming by incorporating all four entries (Chicco and Jurman, 2020). A look at the F1-score and MCC values of the ensembles in Table 6 reveals that the same ranking of the two models emerges whether they are judged by overall accuracy, F1, or MCC. The k-index or kappa index, one of the most widely used statistics for model accuracy assessment, suffers from overdependence on the prevalence of samples (Allouche et al., 2006). Hence, to overcome this issue, an alternative measure, the true skill statistic (TSS), has been computed and is presented in Table 6. According to both the k-index and TSS, CART-EBF performs more accurately than CART-FR during the training and validation phases. Table 6 also lists the “informedness” or “bookmaker informedness (BMI)” statistic, referred to as the “only unbiased indicator” of model accuracy, which helps with an informed selection of models.

4.2.3 SCAI- and FRP-Based Performance Evaluation

A second round of validation, based on SCAI and FRP, was performed to gain an extra level of confidence in the results of both ensembles. The classwise computation of SCAI and FRP was performed as per the methods discussed in Sections 3.8, 4.1.2, respectively. In this study, as visible in Figure 8B, CART-EBF (SCAI = 5.209 for “very low” class) performs more accurately than CART-FR (SCAI = 83.263 for “very low” class). The FRP-based performance in Figure 8A shows that the FR values of the lower susceptibility classes of both ensembles are lower, just opposite to the behavior of SCAI, and those of the “high” and “very high” susceptibility classes are higher. This pattern of classwise FRP values conforms to the implications of the SCAI values and provides a better level of confidence in the performance accuracy of both models.

FIGURE 8

FIGURE 8. Ensemble model result validation using: (A) Frequency Ratio Plot (FRP); and (B) Seed Cell Area Index (SCAI) methods.

4.3 Flood Pixel Distribution Vis-à-Vis FSPI Classes

Figures 6A–C and Table 7 represent the final zonation of flood susceptible areas modeled by both ensembles (classified using the natural break method). They show the areas highly susceptible to flooding (the “high” and “very high” classes taken together) and the safer zones. The FSPI values in the “high” and “very high” classes were used to compute the distribution of areas falling under these two classes (Figure 6). In most published studies, the final zonal classification is performed using NB; however, since QNTL has been found to delineate the highest percentage of area under the “very high” class, as discussed in Section 4.1.3, the QNTL-based areal shares (in %) of the “high” and “very high” classes are also shown alongside the NB-based shares and discussed in this section using Figures 6A,B. With CART-EBF, 27.24% of the area falls under the high and very high classes as per the NB classification method, whereas as per QNTL the share is 49.49% [see Table 7 (for FSPI) and Supplementary Table S1A for % areal coverage by each class]. CART-FR produced lower areal coverage: 21.94% as per NB and 29.49% as per QNTL.

TABLE 7

TABLE 7. Statistical metrics used for model performance evaluation (Note: All the abbreviations used in this table are expanded and defined below this table itself).

5 Discussion

5.1 Flood Predictor Significance

Predictor selection for natural hazard susceptibility prediction, which aims at determining the significance levels of contributing (control) factors, is done at three stages (Janizadeh et al., 2019). 1) First, multicollinearity analysis (VIF and TOL) is performed before training the models. This helps to root out all the interdependent control factors. 2) Then, the relative contribution of control factors to the computation of natural hazard susceptibility prediction indices (NHSPIs) is analyzed using factor selection techniques such as IG, relief-F, and RF. Only control factors with weights higher than zero are included in the model training and validation processes. 3) Finally, by application of all the models, weights are retrieved for all the conditioning factors. The second-stage results using IG were useful for winnowing out flood predictors with zero weightage, even when the first-stage multicollinearity results allowed them to be included in the further step of model training. The output from the third stage is helpful at the policy formulation level in hazard management. Exact knowledge of the area-specific flood-predicting control factors will help hazard-related policy formulators to allocate funds for managing those flood predictors on a priority basis. There are numerous techniques for assessing the significance of contributing factors at the aforesaid second and third stages. The type and number of conditioning factors (for floods) depend upon several variables, including the type of topographic and climatic setting (Benson, 1963; Merz et al., 2014) as well as the type of flood, e.g., flash flood, precipitation-induced riverine flood, coastal storm flood, tsunami-induced flooding (Pignatelli et al., 2009), glacial lake outburst flood (GLOF) (Aggarwal et al., 2017), or landslide lake outburst flood (LLOF) (Srivastava et al., 2017).
The quality of topographic data (Cook and Merwade, 2009) can also affect predictor significance. High-resolution LiDAR-derived DEMs (Laks et al., 2017) and their derivatives behave differently from those derived from freely available DEMs, such as SRTM 30 m, ASTER 30 m, and AW3D 30 m. For fluvial or riverine floods in mountainous areas, terrain parameters perform better when derived from ALOS World 3D 30 m (AW3D 30 m) (Boulton and Stokes, 2018), but the situation is reversed for flat floodplain settings like the MGP (Tanaka et al., 2019). Since there is no specific guideline for choosing the predictor significance assessment method and several previous studies use medium-resolution DEM datasets for deriving topography-based predictors (Santos et al., 2019), SRTM 30 m version 4 DEM-derived topographic variables were computed (see Section 2), and their significance was analyzed using IG. The results of this analysis show that the geomorphic units derived from detailed geomorphological mapping have played a very important role in flood susceptibility prediction as per the IG method. The second rank assigned to “distance to river” by this method appears plausible, as the 2008 Bihar flood, which is the source of the flood inventory in this work, was a riverine flood caused by overbank flooding following a levee breach in the upstream Kosi megafan area, which suddenly released water to the lower reaches (UNDP Emergency Analyst, 2008). There was a time lag of around 10 days between the excessive rainfall in the upper catchments, the Kosi levee breach, and the 2008 Bihar flooding (for which we have created our flood inventory), which is why “rainfall” has scored the least significance as a flood predictor.
Geomorphology occupies the first place on the significance score scale because, in fluvial environments, most geomorphic forms evolve through processes governed by the hydrological, hydraulic, erosive, and depositional characteristics of rivers. In a study conducted in a mountainous catchment of Lao Cai, in the northeast region of Vietnam, the DEM-derived predictor “slope angle” received the highest predictive value, whereas another DEM-derived parameter “curvature” was reported to be the least predictive factor (Bui et al., 2019a). The same study highlights that four other predictors with high significance scores, out of the 12 used, were DEM-derived parameters. In this study, four DEM-derived parameters (aspect, TWI, altitude, and slope) scored significant weights as per IG, equivalent to second, third, fourth, and fifth ranks, respectively. Despite its apparent image as a dominant control factor of floods, rainfall could not rank among the first five contributing factors in this study, which conforms to the findings of other studies conducted in different parts of the globe (Tien Bui et al., 2020). Khosravi et al. (2019) conducted a study in a hilly, moderate-relief topographic setting (altitude range: 29–1,410 m) located in China and found that rainfall does not have significant predictive value. In their study, altitude scored the highest predictive significance, followed by distance from river, NDVI, soil, slope, lithology, LULC, STI, rainfall, SPI, and curvature. There are variations in the significance scores of DEM-derived contributing factors such as slope, aspect, curvature, stream power index (SPI), terrain surface texture (TST), and topographic position index (TPI), and these may arise from a number of factors related to DEM resolution, algorithms used (IG, relief-F, RF, SWARA, etc.), type of topographic setting (plain or mountainous), and number of factors used in the modeling exercise. We could not find a study that has resolved this issue of variability in the significance scores of conditioning factors due to variability in data quality, the use of different techniques/algorithms, and the number of conditioning factors, among others. Hence, there is a need for future research on this theme.
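For readers unfamiliar with IG, the ranking of a single conditioning factor can be sketched as follows. This is a minimal illustration assuming that the factor has already been discretized into classes and that each sample carries a binary flood/non-flood label. The function names and example values are hypothetical, and the sketch is not the implementation used in this study.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(factor_classes, labels):
    """IG = H(labels) - sum over classes v of p(v) * H(labels | factor = v)."""
    groups = defaultdict(list)
    for value, label in zip(factor_classes, labels):
        groups[value].append(label)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical example: 1 = flood point, 0 = non-flood point.
flood = [1, 1, 1, 0, 0, 0, 1, 0]
geomorphic_unit = ["floodplain", "floodplain", "floodplain", "levee",
                   "levee", "levee", "floodplain", "levee"]
rainfall_class = ["high", "low", "high", "low", "high", "low", "low", "high"]

print("geomorphology IG:", information_gain(geomorphic_unit, flood))
print("rainfall IG:", information_gain(rainfall_class, flood))
```

A factor whose classes separate flood from non-flood points receives a higher IG, while a factor whose classes carry no information about the label receives an IG of zero and would be dropped at the second stage.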

5.2 Nature of Flood, Predictor Selection, Topoclimatic Setting

The literature is replete with studies that apply a new model for susceptibility prediction of different types of natural phenomena like floods (Ngo et al., 2018). Looking at the number and type of conditioning factors used by these studies, it appears that there is no clear guideline as to how many and which conditioning factors should be applied for, say, flood susceptibility, landslide susceptibility, ground subsidence susceptibility, or groundwater potential mapping exercises to obtain optimal model performance. One common practice in flood susceptibility modeling studies is that most researchers use “geology” or lithology as one of the control factors for flash floods, riverine floods, and storm surge-related floods irrespective of whether they occur in mountainous regions, floodplain zones of low-relief settings, high mountain plateau provinces (Ngo et al., 2018), or coastal zones (Dodangeh et al., 2020), and in all these studies, the significance level of “lithology” turned out to be at the sixth or a later rank (Di et al., 2019). However, no study has employed detailed microlevel “geomorphology” as a control factor for riverine flooding events in low-relief topographic zones. Geomorphology has been chosen here as a proxy for flood susceptibility because the area is affected by active tectonic perturbations, and continuous and fast groundwater depletion is causing ground subsidence; geomorphology directly reveals those effects. Vegetation type varies across topographic and climatic settings (Kumar, 2016), and hence, vegetation diversity in different terrain types will alter the characteristics of NDVI and its threshold (Davenport and Nicholson, 1993).
Given these associations between changes in vegetation diversity and topoclimatic environmental variability, the NDVI threshold applied may change the significance score of NDVI, and hence model performance may change as well.

5.3 Comparative Assessment of Ensemble Models’ Performance Vis-à-Vis Topoclimatic Setting

The two EBF- and FR-based ensembles with CART used in this work have yielded accuracy levels, as judged by threshold-independent statistics like AUROC, within widely accepted limits as per the AUROC classification scheme followed by Fressard et al. (2014). The AUROC of both ensemble models was found to be within the range of 0.828–0.843 (for the training dataset) and 0.802–0.819 (for the validation dataset). The higher AUROC for both the training and validation datasets (also known as the success rate, SR, and prediction rate, PR, of the model) was scored by CART-EBF (SR = 0.843; PR = 0.819), with slightly lower values for CART-FR (SR = 0.828; PR = 0.802) (Figures 7A,B). It is worth noting that this study was performed in a low-altitude (altitude range: <45.0–96.0 m AMSL) humid monsoonal climatic region undergoing constant active tectonic perturbations (Valdiya, 1976; Brown and Nicholls, 2015), and hence frequent and more severe flooding occurs in the low-lying subsiding areas. In such a topographic environment, the use of moderate spatial resolution digital elevation data introduces greater uncertainty, which further propagates into other derivatives computed using these data (Oksanen and Sarjakoski, 2005). In such topographic settings, improving topographic data quality has the potential to enhance the accuracy of DEM-derived input parameters (Sanders, 2007) and hence the models’ performance (van Westen et al., 2008). By applying LR-, MLP-, and CART-based ensembles with a different bivariate model, viz. statistical index (SI), for an area located in the mountainous and hilly part of Romania (altitudinal range: 242–1,463 m AMSL, characterized by temperate continental climate), Costache et al. (2020) achieved success rate accuracies of 0.94 (MLP-SI), 0.939 (CART-SI), and 0.925 (LR-SI) and prediction rate accuracies of 0.927 (MLP-SI), 0.922 (CART-SI), and 0.901 (LR-SI).
Costache et al. (2020) selected 10 flood control factors for flash flood occurrence prediction in their study area, four of which, viz. L-S factor, hydrological soil group (HSG), stream power index (SPI), and topographic position index (TPI), are different from those in our study. In another study, Costache and Tien Bui (2019) investigated flash flood susceptibility prediction in different parts of Romania in a similar topoclimatic setting but with 14 flood predictors, five of which are different from ours, and achieved success and prediction rates almost the same as those of the study discussed just above, but better than ours: MLP-FR (0.94) and CART-FR (0.937) for success rate, and MLP-FR (0.981) and CART-FR (0.929) for prediction rate. MLP-EBF, trained and validated with 10 flash flood conditioning factors in a hilly and mountainous catchment dominated by temperate climate, achieved an AUROC success rate accuracy of 0.912 and a prediction rate accuracy of 0.806 in identifying torrential valleys vulnerable to flash floods (Costache et al., 2019). In a flat terrain setting and climate almost similar to ours, Hong et al. (2018a) conducted a study in the southeastern part of China, where the altitude ranged between <40 and 720 m AMSL, to investigate fuzzy weight of evidence (fuzzy-WofE)-based ensembles with logistic regression (LR), random forest (RF), and support vector machine (SVM) using 11 conditioning factors, three of which differed from ours; their reported success rate and prediction rate accuracy levels were in the range of 0.9519 (fWofE-LR)–0.9882 (fWofE-SVM) and 0.9652 (fWofE-LR)–0.9865 (fWofE-SVM), respectively. That study reveals that the SVM- and fuzzy-WofE-based ensemble has the capability to perform much better, accuracy-wise, in MGP-like topoclimatic settings with freely available moderate-quality DEMs like ASTER 30 m.
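The success rate and prediction rate compared above are both AUROC values, computed on the training and validation samples, respectively. A minimal sketch of the calculation is given below, assuming the pairwise (Mann-Whitney) formulation of AUROC. The function name and the susceptibility scores are hypothetical and do not reproduce any value reported in this study.

```python
def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen flood point receives a
    higher susceptibility score than a randomly chosen non-flood point
    (ties count as one half)."""
    flood = [s for s, y in zip(scores, labels) if y == 1]
    non_flood = [s for s, y in zip(scores, labels) if y == 0]
    if not flood or not non_flood:
        raise ValueError("both flood and non-flood points are required")
    wins = sum(1.0 if f > n else 0.5 if f == n else 0.0
               for f in flood for n in non_flood)
    return wins / (len(flood) * len(non_flood))


# Hypothetical susceptibility index values (1 = flood, 0 = non-flood).
train_scores = [0.91, 0.78, 0.66, 0.59, 0.42, 0.35, 0.30, 0.12]
train_labels = [1, 1, 0, 1, 1, 0, 0, 0]
valid_scores = [0.85, 0.40, 0.73, 0.55, 0.20, 0.61]
valid_labels = [1, 1, 0, 1, 0, 0]

success_rate = auroc(train_scores, train_labels)     # training dataset
prediction_rate = auroc(valid_scores, valid_labels)  # validation dataset
print(f"SR = {success_rate:.3f}, PR = {prediction_rate:.3f}")
```

The pairwise form is quadratic in the number of points, which is acceptable for an inventory of a few hundred points; a rank-based implementation would be preferable for larger samples.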

Other factors that affect model performance include the quality of the flood inventory, which may be generated using different methodologies: some studies use NDWI (Jain et al., 2005) or mNDWI (Mohammadi et al., 2017) with different threshold values, while others use satellite datasets from various sensors, such as optical Landsat 7 ETM+ and Landsat 8 OLI imagery (Kumar, 2016) or radar data (Ward et al., 2014), or differencing of a water surface DEM and a bare-earth LiDAR DEM (Guerriero et al., 2018). Variations in the number of flood and non-flood points used for training and validation of models, the resolution of the DEM used to derive topography-based flood predictors, and other related parameters also affect the flood inventory accuracy and hence alter model performance. Regarding DEM data quality, Podhorányi et al. (2013), who used LiDAR-derived DEM data, asserted that DEM data quality has an inverse relationship with the level of uncertainty involved, i.e., the better the data quality, the lower the level of uncertainty in DEM-derived parameters. Hence, for better performance of susceptibility models, a higher quality DEM is warranted. On the other hand, Chen et al. (2020) used different freely available DEMs in the spatial resolution range of 30–90 m (all derived by resampling the 30 m ASTER DEM) and reported that DEM spatial resolution does not necessarily affect the susceptibility model results. It should be noted that the difference between the results of Chen et al. and those of Podhorányi et al. may be because the former derived all seven DEM variants from the same ASTER 30 m data, whereas the latter created their DEM from point clouds collected using LiDAR, which has proven superior in several respects to freely available moderate-resolution DEMs (Goulden et al., 2014).
The different statistical metrics presented in this work indicate different aspects of model performance, such as how good the prediction accuracy is, how sensitive the model is, what the overall performance of the individual ensembles is, and how badly the model fails to predict flood or non-flood pixels.
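How such threshold-dependent statistics capture different aspects of performance can be illustrated with a minimal sketch based on the confusion matrix. The choice of metrics (sensitivity, specificity, accuracy, and the two miss rates), the 0.5 cut-off, the function name, and the sample values are assumptions made for illustration; the metrics actually used in this work are those described in Section 3.8.

```python
def threshold_metrics(scores, labels, threshold=0.5):
    """Confusion-matrix statistics for susceptibility scores cut at a threshold.

    A point is predicted as flood when its score is >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty class) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),    # flood points correctly predicted
        "specificity": ratio(tn, tn + fp),    # non-flood points correctly predicted
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "missed_flood": ratio(fn, tp + fn),   # flood points predicted as non-flood
        "false_alarm": ratio(fp, tn + fp),    # non-flood points predicted as flood
    }


# Hypothetical validation scores (1 = flood, 0 = non-flood).
scores = [0.9, 0.6, 0.4, 0.2, 0.7, 0.1]
labels = [1, 1, 1, 0, 0, 0]
for name, value in threshold_metrics(scores, labels).items():
    print(f"{name}: {value:.3f}")
```

Unlike AUROC, every value here changes with the chosen threshold, which is why such indices are best read together rather than in isolation.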

6 Conclusion and Recommendations for Future Research

To achieve the goal of flood-susceptible area zonation of the MGP based on the FSPI produced by applying different ensembles of models, this study is the next in a series of model tests after Arora et al. (2021b). This study, based on AUROC, has shown that the CART-based ensembles with the bivariate models EBF and FR perform reasonably well in terms of both success and prediction rates. When selecting ensembles for flood zonation mapping, which is an essential requirement for achieving the flood-related sustainable development goals (SDGs) set by the United Nations, with moderate-resolution conditioning factors and as few as 12 of them, it is recommended that CART-EBF be given priority over CART-FR. Different threshold-dependent statistical indices capture different aspects of model performance (detailed in the references cited in Section 3.8), and researchers and agencies are recommended to make their choices based on their requirements. Another point that emerges from the model outputs is that both models have performance accuracies in the “good” range as per the traditional AUROC classification scheme.

Detailed microscale geomorphic mapping showed “geomorphology” to be the best contributor in the susceptibility prediction mapping. The first rank of geomorphology in tectonically active areas and in fluvial floodplain areas affected by regular riverine flooding appears to arise because this factor incorporates the effects of active tectonics and of ground subsidence related to excessive and fast groundwater depletion. Given its significance, it is advised that the governments of areas with a similar topoclimatic setting first have those areas geomorphologically mapped using high-resolution satellite data, to be used as input in flood-susceptible zonation exercises. The research by Arora et al. (2021a) also supports this observation.

Some of the limitations faced in this work are: 1) instead of ground truth points collected using GPS in the field, we used Google Earth Pro® for validation of non-flood points; and 2) a moderate-resolution DEM was used for computation of the input flood predictors. The use of DEMs prepared from point clouds obtained with unmanned aerial vehicles (UAVs), pulsed laser light-based LiDAR DEMs, or terrestrial laser scanning (TLS) device-based DEMs would likely have affected the model performance accuracy and, in turn, the percentage shares of the susceptibility zones. Testing of all kinds of models, both standalone and ensemble, from all families of models (for instance, machine learning, statistical, and multicriteria decision-making models), highlighting their advantages and disadvantages, as well as new model development, is recommended to gain a better understanding of optimality in the behavior of models. Since the coming age will be one of machines, space-based monitoring and quantification of all natural and man-made phenomena with the best possible accuracy and precision will be the prime information needed. In the future, missions like Surface Water and Ocean Topography (SWOT) (Morrow et al., 2019) will be needed to monitor phenomena including floods from space, and near-instantaneous susceptibility zonation of areas could be planned at the control rooms of such missions. Model universalization, through the selection of the best model by rigorous testing and validation of the available models of different genres that perform with higher accuracy in a particular type of topoclimatic environmental setting, will help guide such future missions.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, and further inquiries can be directed to the corresponding authors.

Author Contributions

AArora and MP: Conceptualization, database preparation, and model output presentation in map and graph chart form and manuscript writing; AA and RC: Modeling and analysis; NK, VNM, HN, JM, MAS, YR, SS, and UKS: Manuscript writing, enhancement, guidance during write up, and manuscript revision during first and second rounds of review.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer NL declared a past co-authorship with the authors AA, AA, RC to the handling Editor.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations or those of the publisher, the editors, and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

The authors are grateful to the Department of Geography, JMI, for allowing use of the GIS lab to carry out part of the analyses. The lab facility at the Department of Civil Engineering, Chandigarh University, Punjab, is acknowledged for allowing part of the work to be carried out there. Any use of trade, firm, or product name is for descriptive purposes only and does not imply endorsement by the authors.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feart.2021.659296/full#supplementary-material

References

Abdullahi, S., Pradhan, B., and Jebur, M. N. (2015). GIS-based Sustainable City Compactness Assessment Using Integration of MCDM, Bayes Theorem and RADAR Technology. Geocarto Int. 30, 365–387. doi:10.1080/10106049.2014.911967