Climate change conditions the selection of rust-resistant candidate wild lentil populations for in situ conservation

Civantos-Gómez, Iciar; Rubio Teso, María Luisa; Galeano, Javier; Rubiales, Diego; Iriondo, José María; García-Algarra, Javier

doi:10.3389/fpls.2022.1010799

ORIGINAL RESEARCH article

Front. Plant Sci., 03 November 2022

Sec. Technical Advances in Plant Science

Volume 13 - 2022 | https://doi.org/10.3389/fpls.2022.1010799

This article is part of the Research Topic Wild Plant Genetic Resources: A Hope for Tomorrow

Iciar Civantos-Gómez¹,²

María Luisa Rubio Teso³

Javier Galeano¹

Diego Rubiales⁴

José María Iriondo³

Javier García-Algarra⁵*

¹Complex System Group, Universidad Politécnica de Madrid, Madrid, Spain
²Faculty of Economics and Business Administration, Universidad Pontificia Comillas, Madrid, Spain
³ECOEVO Research Group, Área de Biodiversidad y Conservación, Universidad Rey Juan Carlos, Madrid, Spain
⁴Instituto de Agricultura Sostenible (CSIC) Avenida Menéndez Pidal s/n Campus Alameda del Obispo, Córdoba, Spain
⁵DRACO Research Group, Centro Universitario de Tecnología y Arte Digital, Las Rozas, Spain

Crop Wild Relatives (CWR) are a valuable source of genetic diversity that can be transferred to commercial crops, so their conservation will become a priority in the face of climate change. Paradoxically, in situ conserved CWR populations and the traits one might wish to preserve in them are themselves vulnerable to climate change. In this study, we used a quantitative machine learning predictive approach to project the resistance of CWR populations of lentils to a common disease, lentil rust, caused by the fungus Uromyces viciae-fabae. Resistance is measured through a proxy quantitative value, DSr (Disease Severity relative), which is complex and expensive to obtain. Machine learning is therefore a convenient tool to predict this magnitude using a well-curated georeferenced calibration set. Previous works have provided a binary outcome (resistant vs. non-resistant), but that approach is not fine enough to answer three practical questions: which variables are key to predict rust resistance, which CWR populations are resistant to rust under current environmental conditions, and which of them are likely to keep this trait under different climate change scenarios. We first predict rust resistance at present for crop wild relatives that grow inside protected areas. Then, we use the same models under future climate IPCC (Intergovernmental Panel on Climate Change) scenarios to predict future DSr values. Populations that are rust-resistant both at present and under future conditions are optimal candidates for further evaluation and in situ conservation of this valuable trait. We have found that rust-resistance variation as a result of climate change is not uniform across the geographic scope of the study (the Mediterranean basin), and that candidate populations share some interesting common environmental conditions.

1 Introduction

In the coming decades, food security will be seriously compromised by the lack of adaptive resilience to climate change of cultivars currently used in crops (Smith et al., 2017; Wiebe et al., 2019; Anderson et al., 2020). This lack of adaptive resilience is caused by the low genetic diversity that is inherent to most modern cultivars (Rauf et al., 2010; Van de Wouw et al., 2010; Rufo et al., 2019). Indeed, it has been estimated that major crops are likely to experience significant yield reductions in the coming decades. In their 2015 review for Eastern Africa, Adhikari et al. depict a gloomy scenario for main staples by the end of this century. They predict a 72% drop for wheat, around 40% for other cereals, and 10% for potatoes (Adhikari et al., 2015). A recent study predicts a drop that ranges from 3% to 12% by 2050 and from 11% to 25% by 2090 for rice and soybeans (Wing et al., 2021). This will be the result of losses caused by the arrival of new pests and pathogens, the intensification of the effects of those active right now, and potential mismatches to the new climate regimes, including increasing temperature and drought and higher incidence of extreme events (e.g., hail, strong winds, floods, etc.). However, those statistical projections could hide the fact that climate change may prove beneficial for some crops and regions (Ray et al., 2019).

Crop wild relatives (CWR) are one of the most important sources of genetic diversity to transfer key adaptations to crops, a relevant fact in the context of climate change (Heywood et al., 2007; Maxted, 2008; Zhang et al., 2017). For example, they have been effectively used in plant breeding in crops such as sunflower (Helianthus annuus L.) (Seiler et al., 2017), narrow-leafed lupin (Lupinus angustifolius L.) (Mousavi-Derazmahalleh et al., 2018), durum wheat (Triticum turgidum subsp. durum (Desf.) Husn.) (El Haddad et al., 2021) or pea (Pisum sativum L.) (Rubiales et al., 2020).

Paradoxically, the genetic diversity of crop wild relatives is, at the same time, being severely eroded, mainly due to habitat modification and destruction by human activities (Iriondo et al., 2008; Khoury et al., 2022). A recent survey of 600 species of crop wild relatives in the United States estimated that more than one-half are endangered and 7% are in a critical condition (Khoury et al., 2020). As a result, a global effort is being made to promote the establishment of genetic reserves or the in situ conservation of crop wild relatives (Maxted et al., 2008; van Treuren et al., 2017; Labokas et al., 2018).

In the process of deciding which populations of a given crop wild relative should be selected for in situ conservation, there are a number of considerations that must be taken into account. On the one hand, genetic reserves should be representative of the range of genetic diversity present in the species (Dempewolf et al., 2017). The best way of characterizing genetic diversity is through sequencing, as it provides complete information about the genome and its cost is drastically decreasing. Nevertheless, the characterization of the hundreds or thousands of populations that a given crop wild relative might have is a highly time-consuming task that would also require huge human and economic resources. Ecogeographic information arises as a useful tool that provides a proxy to estimate among-population genetic diversity and helps in the selection of representative populations (Vincent et al., 2019). Plant breeders are often interested in identifying and conserving candidates that display a targeted phenotypic trait (e.g., resistance to a pathogen). Once again, ecogeographic information associated with the CWR populations is critical to identifying these candidate populations, given the impossibility of conducting evaluation experiments with plant material from so many sites. In this sense, predictive characterization techniques based on FIGS (Focused Identification of Germplasm Strategy) have been applied to identify wheat resistance to stem rust, caused by the fungus Puccinia graminis (Endresen et al., 2012; Bari et al., 2014), and barley resistance to leaf rust, caused by Puccinia hordei (Amouzoune et al., 2022). Different methodological approaches have predicted phenotypic traits through calibration, using as a starting point a set of training data where the targeted trait of a set of populations is known and the corresponding ecogeographic information may be retrieved from public repositories (Endresen, 2010; Sánchez et al., 2019).
In the search for resistance to pathogens, several researchers have followed qualitative approaches in which the material is previously evaluated as resistant or non-resistant (Bari et al., 2012; Rubio Teso et al., 2022). However, resistance to pathogens could be evaluated using numerical variables as well (Arojju et al., 2018; Ren et al., 2021). Predictive characterization of quantitative resistance traits, such as DSr (Disease Severity relative) to lentil rust (Uromyces viciae-fabae (Pers.) Schröt), is likely to benefit from machine-learning models with quantitative dependent variables (Rubiales et al., 2013).

The second aspect of great relevance when identifying the most appropriate CWR populations relates to the adequacy of the site for the long term in situ conservation. The land use of the site has to be compatible with the long-term viability and persistence of the candidate population (Hunter, 2012; Hunter et al., 2012). Those within protected areas are less vulnerable to human disturbance and are, therefore, preferred in this context (Maxted et al., 2012). In any case, the in situ conservation of CWR in genetic reserves may also be feasible in other instances (e.g., farms), whenever there is a long-term commitment by the landowners (Maxted and Kell, 2009).

Ex situ conservation is a well-established and adequate approach for conserving and utilizing plant genetic resources. One of the main benefits of complementing it with in situ conservation of CWR is that in situ conserved wild populations are constantly evolving as a result of changing biotic and abiotic environmental factors (Fu, 2017). Adapting to such changes favours genotypes that maximize their fitness under current and potentially future environmental conditions (Meilleur and Hodgkin, 2004). In contrast, the germplasm conserved ex situ in genebanks allows rapid access to genetic variation but represents a static snapshot of the genetic diversity of the population at the moment of sampling (Castañeda-Álvarez et al., 2016), which can nevertheless be useful in recovering populations. In the identification of the most appropriate CWR populations for in situ conservation, one must take into account not only the environmental conditions currently present in the target population but also those expected to occur in the future as a result of climate change, and whether those future conditions are compatible with the preservation of the population or the targeted traits. The vulnerability of protected areas to the effects of climate change is starting to be assessed in terms of global biodiversity or emblematic species but has not been studied in the context of CWR in situ conservation (Hannah, 2010; Triviño et al., 2018).

In this study, we used a predictive characterization approach, based on machine learning, to quantitatively project the rust resistance of crop wild relative populations of lentils (Lens culinaris subsp. culinaris) in the Mediterranean basin. Rust is a severe foliar disease in lentils (Rubiales et al., 2011). Breeding with CWRs to increase the rust resistance of cultivars is a convenient method of controlling this disease in legume crops (Barilli et al., 2009; Rubiales et al., 2011; Negussie and Pretorius, 2012; Sillero et al., 2017).

Climate change is expected to unevenly affect agriculture in different parts of the world (Howden et al., 2007). There is large variation in climatic conditions, soils, land use, infrastructure, and political and economic conditions across the European continent (Olesen and Bindi, 2002; Cramer et al., 2018; Fellmann et al., 2018). These differences are expected to influence the responsiveness of CWR populations to climate change. Here we apply climate change projections and Shared Socio-Economic Pathways (SSPs) to predictive characterization in order to compare the potential changes in rust-resistance of lentil CWR populations in Europe and Turkey (Dufresne et al., 2013; Voldoire et al., 2013; O’Neill et al., 2014; Wu et al., 2014). We, then, aimed to identify a set of rust-resistant candidate populations that could be designated for in situ conservation in genetic reserves, searching for those that occur in protected areas and selecting those with the lowest vulnerability to changes in the environment that might result in the loss of this trait. The results of the calibration method of the predictive characterization techniques should answer the following questions: (i) which variables are the most important to predict rust resistance? (ii) which CWR populations are likely to show strong resistance to rust under present time environmental conditions? and (iii) which of them are likely to keep this trait under forthcoming climate change scenarios? We expect selected populations to have evolved to develop rust resistance over time and consider them to be a valuable asset under present and future conditions.

2 Data description

Lentil was cultivated for the first time in the Fertile Crescent around 5000 BP (Zohary et al., 2012). It probably spread out through the Mediterranean basin, the Indian subcontinent, and the Horn of Africa at a relatively fast pace driven by its high yield (Liber et al., 2021). Thus, it did not face the selective pressure of its wild relatives, the subspecies orientalis and odemensis and the three species Lens nigricans (M. Bieb.) Godr., Lens ervoides (Brign.) Grande and Lens lamottei Czefr. We considered in our study all Lens taxa naturally occurring in Europe and Turkey. These are L. ervoides, L. nigricans and L. lamottei, as well as L. culinaris subsp. orientalis (Boiss.) Ponert. and L. culinaris subsp. odemensis (Ladiz.). Lens taxa distribution data were extracted from a database of crop wild relative populations in Europe and Turkey generated for the Farmer’s Pride project (www.farmerspride.eu) (Rubio Teso et al., 2020). Due to the imbalance in the number of samples of the original database (443 populations of L. nigricans found in 12 countries, 145 of L. ervoides in 9 countries, 29 of L. lamottei and 9 of L. culinaris subsp. orientalis), we decided to build a single model and exclude the species as an input variable. We made this choice to avoid overfitted models, being aware that there is a loss of input information, but taking into account that the four taxa belong to the same genus.

Raw data downloaded were further cleaned and filtered as indicated in (Rubio Teso et al., 2022). The calibration dataset holding rust evaluation data has 351 samples of five Lens taxa, both wild and cultivated (L. culinaris, L. culinaris subsp. culinaris, L. culinaris subsp. orientalis, L. ervoides and L. nigricans). Each sample is georeferenced and its DSr value is the mean of four years’ field trials. Each field trial followed a complete block design with 3 replications, artificially inoculating the samples and including frequent rows of susceptible checks to act as spreaders to ensure a high and uniform disease pressure. Disease Severity on mature plants in the field is assessed as a visual estimation of the leaf area covered by rust pustules, which is influenced by environmental factors. This value (DS) is standardized each year by expressing each DS value as a percentage of the highest one in each location that is set at 100% (DSr) (Sillero et al., 2017).
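As a minimal illustration of the standardization just described, the sketch below rescales raw DS scores so that the highest score in each location and year equals 100, and then averages the resulting DSr values across years for each accession. The record layout, the function names and the numbers are invented for illustration and are not taken from the field trials.

```python
from collections import defaultdict

def relative_severity(records):
    """Express each DS value as a percentage of the highest DS
    observed in the same (location, year) group."""
    max_ds = defaultdict(float)
    for rec in records:
        key = (rec["location"], rec["year"])
        max_ds[key] = max(max_ds[key], rec["ds"])
    scored = []
    for rec in records:
        top = max_ds[(rec["location"], rec["year"])]
        dsr = 100.0 * rec["ds"] / top if top > 0 else 0.0
        scored.append({**rec, "dsr": dsr})
    return scored

def mean_dsr(scored):
    """Average the yearly DSr values of each accession."""
    by_accession = defaultdict(list)
    for rec in scored:
        by_accession[rec["accession"]].append(rec["dsr"])
    return {acc: sum(v) / len(v) for acc, v in by_accession.items()}

# Hypothetical raw scores (percentage of leaf area covered by pustules).
trial = [
    {"accession": "A", "location": "site1", "year": 2010, "ds": 40.0},
    {"accession": "B", "location": "site1", "year": 2010, "ds": 10.0},
    {"accession": "A", "location": "site1", "year": 2011, "ds": 20.0},
    {"accession": "B", "location": "site1", "year": 2011, "ds": 5.0},
]
scored = relative_severity(trial)
mean_by_accession = mean_dsr(scored)
```

In this toy example accession A is the most affected entry in both years, so it plays the role of the 100% reference.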

Each record of the filtered lentil CWR distribution database and of the calibration dataset was associated with the values of 65 bioclimatic, 35 edaphic and 18 geophysic variables at 2.5 arc-min resolution, corresponding to the sites of the populations. This information was obtained from the ecogeographic database of CAPFITOGEN3 (Parra-Quijano et al., 2020). Latitude and longitude were also added to the variable selection procedure. The SelecVar function of CAPFITOGEN3 was used to estimate variable importance according to random forest classification (RFC) and to detect redundant variables through bivariate correlation analysis (Garcia et al., 2017). The first 15 variables of each bioclimatic, edaphic and geophysic component with the highest Mean Decrease Accuracy (MDA) values (Cutler et al., 2007; Rubio Teso et al., 2022) were checked for collinearity. Pairs of variables in the same ecogeographical component with an absolute Pearson correlation coefficient > 0.50 and p-value < 0.05 were identified, and the variable with the lowest MDA was removed. Hence, in the bioclimatic component, only annual mean temperature (°C) and annual precipitation (mm) were kept. In the edaphic component, four non-correlated variables were selected: bulk density (fine earth) of topsoil, topsoil available soil water capacity until wilting point, topsoil total exchangeable bases and topsoil sand fraction. Finally, in the geophysic component, three non-correlated variables were selected: annual solar radiation (kJ/m² per day), December solar radiation, and longitude. Further information and details about the generation and characteristics of this environmental database can be found in Rubio Teso et al. (2022).
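The collinearity filter can be sketched as follows. This is not the SelecVar implementation: it is a greedy variant, assumed here for simplicity, that visits variables from highest to lowest MDA and keeps one only if its absolute Pearson correlation with every variable already kept does not exceed 0.50. The p-value check is omitted, and the variable names, MDA scores and data are made up.

```python
import numpy as np

def filter_collinear(X, names, mda, threshold=0.50):
    """Greedy filter: visit variables from highest to lowest MDA and keep
    one only if |Pearson r| with every already-kept variable <= threshold."""
    corr = np.corrcoef(X, rowvar=False)
    order = sorted(range(len(names)), key=lambda j: mda[j], reverse=True)
    kept = []
    for j in order:
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Synthetic example: two strongly correlated temperature variables
# and one independent precipitation variable.
rng = np.random.default_rng(0)
temp = rng.normal(15.0, 3.0, 200)
temp_warm_quarter = temp + rng.normal(0.0, 0.5, 200)
precip = rng.normal(600.0, 100.0, 200)
X = np.column_stack([temp, temp_warm_quarter, precip])
names = ["annual_mean_temp", "warmest_quarter_temp", "annual_precip"]
mda = [12.0, 9.0, 10.0]  # made-up importance scores

selected = filter_collinear(X, names, mda)
```

Of the two correlated temperature variables, only the one with the higher MDA survives, which mirrors the rule stated above.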

To identify the populations of lentil crop wild relatives in Europe and Turkey that occur within protected areas, we considered the protected areas registered in the World Database of Protected Areas (WDPA) for this territory and those included in the Natura 2000 network. The file with the polygons of the WDPA was downloaded in April 2021 from the website ‘Explore the World’s Protected Areas’ (protectedplanet.net). The polygons of the Natura 2000 network were downloaded from the European Environment Agency website (https://www.eea.europa.eu/data-and-maps/data/natura-11/natura-2000-spatial-data/natura-2000-shapefile-1, last accessed 2021/07/20). The shapefile polygons obtained from Natura 2000 and WDPA were merged into a single shapefile that contained all available protected areas in Europe and Turkey, using the function ‘join vector layers’ in QGIS v.3.18.2-Zürich (QGIS Org, 2021). All protected areas in the resulting shapefile were considered for the selection of candidate populations.

The dataset of lentil CWRs comprises 583 populations of four Lens taxa, 236 of which grow inside a protected area (Supplementary Table S1).

3 Climate change models

Considering that climate change is expected to influence the evolutionary dynamics of CWR populations, one of the aims of this study was also to apply predictive characterization under future climate conditions. To do so, we incorporated climate change projections and Shared Socio-Economic Pathways (SSPs) to assess whether the environmental conditions that are likely to promote at present the presence of rust resistance in wild populations still remain in the forthcoming future. According to this, we combined the ecogeographic variables with the projected temperature and precipitation to quantitatively project the rust resistance of crop wild relative populations in future climate scenarios. A potential role of the results extracted here is their applicability in the selection of candidate populations (Stockwell and Peterson, 2002; Guisan and Thuiller, 2005; Araújo and Guisan, 2006).

The future climate Geographical Information System (GIS) layers were downloaded from the Worldclim database (http://www.worldclim.org/) at 2.5 arc-min resolution (around 5x5 km). From the available periods, we selected 2021-2040 as the future climate scenario. We chose three global circulation models (GCMs) for climate change projections, produced by the Coupled Model Intercomparison Project Phase 6 (CMIP6) (O’Neill et al., 2016). For every GCM, we analysed three different shared socioeconomic pathways (SSPs): the most “pessimistic” or “conservative” scenario, a “balanced” scenario and an “optimistic” scenario, so that the range of expectations is covered in a sensitivity analysis. Only the two bioclimatic variables that had been previously selected (annual mean temperature and annual precipitation) were considered.

4 Methods

4.1 Predictive characterization for evaluated accessions

In this study, we rely on the evaluated accessions that constitute the calibration dataset to carry out a quantitative predictive characterization using a machine learning regression approach. This dataset includes the continuous range of DSr as the dependent variable and the selected ecogeographical variables, at present time, as predictors.

Rubio Teso et al. addressed the issue of predicting resistance to rust by means of the calibration method and a qualitative strategy with the same dataset (Rubio Teso et al., 2022). This method consisted of the binarization of DSr numerical values into qualitative values (resistant, susceptible), prior to prediction. According to this, accessions with the lowest DSr values were classified as rust-resistant, i.e., those located in the first decile of the distribution. The binarized levels of expression (0 = susceptible; 1 = resistant) were used as the dependent variable. Ecogeographical variables were the inputs to predict the binarized DSr resistance through nine classification algorithms. Then, the best predictor model was projected on the non-evaluated populations.

We followed a different path in this research, predicting the numerical value of DSr. We built and evaluated three different families of predictive models with the present-time environmental values and DSr values for evaluated accessions: Ridge Regression (Hoerl, 1962), Random Forest (Breiman et al., 1984) and XGBoost (eXtreme Gradient Boosting) (Friedman, 2001; Chen and Guestrin, 2016).

4.1.1 Regression models

DSr quantitative prediction is a regression problem with tabular data. We have implemented in Python the three regression models to tackle the task to identify populations potentially resistant to lentil rust.

Initially, we used the present-time calibration dataset to train the models. We first cleaned this dataset by discarding duplicate values and samples with incomplete variables, which left a total of 255 samples. Given the small sample size, and in order to avoid cross-validation overfitting (Ng et al., 1997) and achieve robust predictions for the different scenarios proposed, we adopted the following approach. We built 500 models that differ only in the random split of training and testing sets, including all variables. For each model, the dataset was split at random into train and test subsets at a 70/30 ratio. We trained the model on the training subset and then made predictions on the test subset. Once the 500 models had been trained, we collected all the predictions to better understand their distribution.
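The repeated random-split procedure can be sketched with scikit-learn as shown below. This is an illustrative reconstruction under stated assumptions, not the authors' script: the data are synthetic, the Random Forest settings are arbitrary, the function name is hypothetical, and only 20 splits are run here to keep the example fast, whereas the study used 500.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def repeated_split_rmse(X, y, n_models=500, test_size=0.30):
    """Train one model per random 70/30 split and return the test RMSEs."""
    rmses = []
    for seed in range(n_models):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X_tr, y_tr)
        mse = mean_squared_error(y_te, model.predict(X_te))
        rmses.append(np.sqrt(mse))
    return np.array(rmses)

# Synthetic stand-in for the 255-sample calibration set with 9 predictors.
rng = np.random.default_rng(42)
X = rng.normal(size=(255, 9))
y = np.clip(50 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=8, size=255), 0, 100)

rmses = repeated_split_rmse(X, y, n_models=20)
median_rmse = float(np.median(rmses))
```

Looking at the whole distribution of `rmses`, rather than a single split, is what allows the median and the spread of the error to be compared across model families.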

For visualization purposes we used the R programming language. A full list of the packages used is provided at the end of the methods section.

Ridge Regression is the simplest choice to achieve a balance between interpretability and precision. Linear regression establishes a relationship between a dependent variable and one or more independent variables that might be correlated with each other, an issue known as multicollinearity. Ridge Regression imposes a penalty term on the size of the coefficients to overcome this issue (Gruber, 1998).

Both Ordinary Least Squares (OLS) and Ridge Regression seek coefficients that minimize the residual sum of squares (Saleh et al., 2019) and thus the MSE; Ridge Regression additionally penalizes the size of the coefficients. Accordingly, the penalty hyperparameter must be tuned so that the resulting coefficients optimize the model error.

Like linear regression, the Ridge regressor explains the outcome as a function of the multiple input variables. As a result, each input variable has an associated weight that is positive or negative depending on its contribution to the model.
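For reference, the penalized objective usually associated with Ridge Regression can be written as below. The notation is an assumption made here and does not appear in the original text: $x_i$ is the vector of (standardized) predictors of sample $i$, $\beta$ the vector of $p$ weights and $\lambda \ge 0$ the penalty hyperparameter.

$$\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \left( y_{i} - x_{i}^{\top} \beta \right)^{2} + \lambda \sum_{j=1}^{p} \beta_{j}^{2} \right\}$$

With $\lambda = 0$ the expression reduces to OLS; larger values of $\lambda$ shrink the weights towards zero, which stabilizes them when predictors are correlated.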

Random Forest Regression (RFR) is a tree-based ensemble method and belongs to the family of Classification and Regression Trees (CART). An ensemble method is a technique that combines the predictions from multiple machine learning algorithms together to make more accurate predictions than any individual model (Breiman, 2001).

RFR operates by constructing a multitude of decision trees, each trained on a random subset of samples drawn with replacement from the training sample. The number of variables included in each tree is limited to a percentage of the total variables that must be set in advance. This ensures that the ensemble model does not rely too heavily on any individual variable, and makes fair use of all potentially predictive variables. The output estimate is the mean prediction of the individual trees.

Regarding interpretability, decision trees can be easily converted into rules, which increases human interpretability of the results and explains why a decision was made. However, in the case of Random Forest it is not straightforward to find out the contribution of each of the variables (Rogers and Gunn, 2005). In a Random Forest, each decision tree is a set of internal nodes and leaves. At each internal node, the selected variable is used to decide how to divide the data set into two separate sets with similar responses within each. The variables for internal nodes are selected according to some criterion, which for regression tasks is variance reduction. We can measure how much each variable decreases the impurity of the split (the variable with the highest decrease is selected for the internal node). For each variable, we can compute how much it decreases the impurity on average within a tree. The average over all trees in the forest is the measure of variable importance.
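In scikit-learn, this impurity-based measure is exposed through the `feature_importances_` attribute of a fitted forest. The sketch below is a minimal example on synthetic data; the variable names, coefficients and hyperparameters are assumptions chosen only to make the ranking easy to read.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data in which the first variable drives most of the response.
rng = np.random.default_rng(1)
names = ["longitude", "annual_precip", "annual_mean_temp", "noise"]
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=300)

# max_features=0.5 limits each split to half of the variables.
forest = RandomForestRegressor(n_estimators=200, max_features=0.5, random_state=0)
forest.fit(X, y)

# Mean decrease in impurity, averaged over trees and normalized to sum to 1.
importances = forest.feature_importances_
ranking = [names[j] for j in np.argsort(importances)[::-1]]
```

The values in `importances` are relative, so they are useful for ranking variables within one model rather than for comparing across models.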

XGBoost (eXtreme Gradient Boosting) is another ensemble method that relies on the concept of gradient tree boosting. Boosting is a sequential algorithm that makes predictions for several rounds on the entire training sample and iteratively improves the performance of the algorithm with the information from the prior round’s prediction accuracy. However, XGBoost produces black box models, hard to visualize and tune compared to RFR (Agarwal and Das, 2020). Note that our aim is not to compare performance across a wide range of modelling techniques, but to show how different modelling approaches ranging from simple Ridge regression to more complex XGBoost can be explored within our framework.

4.1.2 Variable engineering

To optimize the regressors, we followed a three-step process. In particular, we proceeded as follows:

I. Variable selection. The environmental variables that might be the most relevant for explaining lentil taxa distribution were identified using a modified R script developed for the SelecVar tool from CAPFITOGEN3 (Parra-Quijano et al. 2020).

II. Data normalization. Machine learning regressors typically require variables to be on a similar scale (Kotsiantis, 2011), because scale differences between variables can influence the performance of an ML regressor. Hence, we applied a data normalization step, namely standardization.

III. Variable importance. For the three regression models (Ridge Regression, Random Forest, and XGBoost), we ran a procedure after the training process to estimate variable importance. The importance of the variables explaining rust resistance was estimated by analysing the weights assigned to the predictors by each model during the training step.
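Steps II and III can be illustrated together for the Ridge model: once the predictors are standardized, the fitted weights become comparable across variables. The sketch below uses a scikit-learn pipeline on synthetic data; the variable names, the coefficients used to generate the response and the penalty value `alpha=1.0` are assumptions, not values from the study.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two predictors on very different scales.
rng = np.random.default_rng(7)
precip = rng.normal(600.0, 150.0, 200)  # mm
temp = rng.normal(14.0, 3.0, 200)       # degrees C
X = np.column_stack([precip, temp])
y = 40.0 + 0.02 * precip - 2.0 * temp + rng.normal(scale=2.0, size=200)

# Standardize to zero mean and unit variance, then fit the Ridge regressor.
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
pipe.fit(X, y)

# Weights on the standardized scale: sign gives direction, size gives importance.
coefs = pipe.named_steps["ridge"].coef_
weights = dict(zip(["annual_precip", "annual_mean_temp"], coefs))
```

Without the scaling step, the raw coefficient of precipitation would look negligible simply because precipitation is measured in hundreds of millimetres.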

4.1.3 Model evaluation

To assess the performance of regression models we computed the Root Mean Square Error (RMSE). Although the Pearson correlation coefficient is widely used in quantitative genetics (González-Recio et al., 2014), Mean Squared Error (MSE) and RMSE yield better performance in model selection when the sample size is small and there is a high variance in the outcome variable (Oliveira et al., 2018; Waldmann, 2019). RMSE is a distance between the vectors of recorded values $(y_{i})$ and predicted values $({\hat{y}}_{i})$ (Chai and Draxler, 2014).

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(y_{i} - {\hat{y}}_{i})}^{2}} \qquad (1)$$
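Equation (1) translates directly into a few lines of Python. The sketch below is a plain implementation with made-up recorded and predicted values; the function name is chosen here for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Square Error between recorded and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up DSr values: residuals are -2, 2, -3 and 3.
observed = [10.0, 30.0, 50.0, 70.0]
predicted = [12.0, 28.0, 53.0, 67.0]
error = rmse(observed, predicted)  # sqrt((4 + 4 + 9 + 9) / 4) = sqrt(6.5)
```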

4.2 Non-evaluated projections under current conditions

As previously mentioned, regression models were initially trained using present-time data for rust-resistance evaluated accessions. After that, we applied the best performing model to non-evaluated populations. This dataset had the same present-time ecogeographic variables as the calibration dataset.

We used the trained regression model to perform DSr projections on crop wild relative populations. Thus, we can identify those wild populations that are most likely to be resistant according to their predicted DSr value. Populations whose DSr projection falls within the first quartile of the continuous range of rust-evaluated populations (DSr ≤ 30.48) and which are located in a protected area were selected as candidates for long-term in situ conservation.
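The selection rule amounts to a simple filter on two conditions. The sketch below applies it to a handful of hypothetical populations; only the 30.48 threshold comes from the text, while the identifiers, field names and predicted values are invented.

```python
DSR_THRESHOLD = 30.48  # first-quartile limit of the rust-evaluated populations

def select_candidates(populations, threshold=DSR_THRESHOLD):
    """Keep populations predicted as resistant that lie in a protected area."""
    return [
        pop["id"]
        for pop in populations
        if pop["predicted_dsr"] <= threshold and pop["in_protected_area"]
    ]

# Hypothetical projections for four non-evaluated populations.
populations = [
    {"id": "pop1", "predicted_dsr": 25.1, "in_protected_area": True},
    {"id": "pop2", "predicted_dsr": 25.1, "in_protected_area": False},
    {"id": "pop3", "predicted_dsr": 41.7, "in_protected_area": True},
    {"id": "pop4", "predicted_dsr": 30.48, "in_protected_area": True},
]
candidates = select_candidates(populations)
```

Here `pop2` is excluded because it lies outside a protected area and `pop3` because its predicted DSr is too high.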

4.3 DSr variation under climate change

Climate change models provide future estimates of annual mean temperature and annual precipitation. Accordingly, we replaced both bioclimatic variables with future climate projections and applied these datasets as inputs to the predictive model. Candidate populations for in situ conservation whose projected DSr under future climate conditions still falls within the first quartile of the present-time DSr distribution were considered to be the populations most likely to retain the rust resistance trait and, therefore, were selected as the most valuable populations for the establishment of genetic reserves.
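A hedged sketch of this projection step is given below: a model trained on present-time data is applied twice to the same populations, once with current and once with future values of the two bioclimatic variables, while the remaining predictors stay fixed. Everything here is synthetic. In particular, the future layers are mimicked by a uniform shift (+1.56 °C and +6 mm, the median changes reported later in the text for BCC370), which is a strong simplification of the per-site raster values actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

def make_sites(n):
    """Synthetic sites: annual mean temp (C), annual precip (mm), longitude."""
    return np.column_stack([
        rng.normal(14.0, 3.0, n),
        rng.normal(600.0, 150.0, n),
        rng.uniform(-9.0, 44.0, n),
    ])

# Synthetic calibration set and a model trained on present-time conditions.
X_cal = make_sites(255)
y_cal = np.clip(20 + 1.5 * X_cal[:, 0] + 0.01 * X_cal[:, 1]
                + rng.normal(scale=5.0, size=255), 0, 100)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_cal, y_cal)
threshold = np.quantile(y_cal, 0.25)  # first quartile at present time

# Non-evaluated populations: swap only the two bioclimatic columns.
X_now = make_sites(100)
X_future = X_now.copy()
X_future[:, 0] += 1.56  # assumed uniform warming
X_future[:, 1] += 6.0   # assumed uniform precipitation change

dsr_now = model.predict(X_now)
dsr_future = model.predict(X_future)
resistant_now = dsr_now <= threshold
retained = resistant_now & (dsr_future <= threshold)
```

The boolean mask `retained` marks the populations that stay below the present-time threshold in both scenarios, which is the criterion described above.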

List of statistical packages used

Python: python 3.8.8 (Van Rossum and Drake, 1995), matplotlib 3.3.4 (Hunter, 2007), numpy 1.20.1 (Harris et al., 2020), pandas 1.2.4 (Wes McKinney, 2010), seaborn 0.11.1 (Waskom, 2021), scikit-learn 0.24.1 (Pedregosa et al., 2011), verde 1.6.1 (Uieda, 2018), xgboost 1.4.2 (Chen and Guestrin, 2016). R: r-base 4.1.0 (R Core Team, 2020), countrycode 1.2.0 (Arel-Bundock et al., 2018), dplyr 3.4.0 (Wickham et al., 2022), forcats 0.5.1 (Wickham, 2021), gensysr 1.0.0 (Obreza, 2019), ggplot2 3.3.3 (Wickham, 2016), maps 3.3.0 (Brownrigg, 2018), readxl 1.3.1 (Wickham and Bryan, 2022), rgbif 0.9.8 (Chamberlain et al., 2017), scico 1.3.0 (Pedersen and Crameri, 2021), seewave 2.2.0 (Sueur et al., 2008).

5 Results

5.1 Predictive characterization

The three regression models rendered similar global results. Figure 1A shows the RMSE distributions for each of the three predictive models at present time, evaluated for the samples whose DSr values were known in advance. The Ridge model yielded the lowest median value; however, this measure of centrality is not the only criterion to decide which is the most appropriate predictive model. Indeed, the Ridge model showed a larger error spread than Random Forest and, in addition, it was less accurate for small DSr values, precisely those corresponding to the most resistant samples and, therefore, the most valuable for conservation purposes (Figure 1B, Supplementary Table S2). Finally, a visual comparison of the distribution of the actual values and those predicted by the three methods reveals that the Ridge model tends to overpredict in the intermediate value range. Figures 1C–E include the values of the Kolmogorov-Smirnov distance, which measures how much the predicted and the evaluated distributions differ. Although this distance is slightly lower for XGB than for Random Forest (0.114 vs. 0.116), the RMSE distribution of XGB showed a greater median value and error spread. For all these reasons, we selected Random Forest as the most suitable model to predict the DSr value of non-evaluated populations both at present and under different climate change scenarios.
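The Kolmogorov-Smirnov distance between two samples is the largest vertical gap between their empirical cumulative distribution functions. The sketch below is a small NumPy implementation of that definition, shown on toy values; it is not necessarily the routine used to produce Figures 1C–E, and the function name is chosen here for illustration.

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a - ECDF_b|."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    # Fraction of each sample that is <= every pooled value.
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Toy example: the second sample is the first one shifted upwards by 2.
evaluated = [1.0, 2.0, 3.0, 4.0]
predicted = [3.0, 4.0, 5.0, 6.0]
distance = ks_distance(evaluated, predicted)
```

A distance of 0 means the two empirical distributions coincide and a distance of 1 means they do not overlap at all, so the values of about 0.11 reported above indicate closely matching distributions.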

Figure 1 (A) RMSE of predicted DSr for evaluated samples for each of the three methods, average values are marked as color-filled diamonds. (B) The cumulative RMSE per sample is the sum of RMSE over the number of evaluated samples, and its value is equal to the average value for the last one. Random Forest is the best performer for low values of DSr. (C–E) plots compare the evaluated data distribution of DSr to the predicted one.

We ran the Variable Importance method for the Random Forest predictor. The results show that annual precipitation (mm) is the second most relevant variable, after Longitude (Table 1). This becomes especially relevant when introducing the Climate Change models, since annual precipitation is one of the variables they modify, as is annual average temperature, which ranked fifth in variable importance.

Table 1 Variable importance for the Random Forest predictor.
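The text does not state which importance measure lies behind this ranking. As one possible illustration, the sketch below implements permutation importance, a model-agnostic measure that records how much the RMSE grows when a single column is shuffled. This may differ from the measure the authors used. The toy model, the column names and all values are invented for the example.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def permutation_importance(predict, rows, target, n_repeats=5, seed=0):
    """Mean increase in RMSE when one column is shuffled, per column.

    predict: callable taking a list of feature rows, returning predictions.
    rows: list of equal-length feature lists; target: observed values.
    """
    rng = random.Random(seed)
    baseline = rmse(target, predict(rows))
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            permuted = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(rows, shuffled)
            ]
            increases.append(rmse(target, predict(permuted)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: columns are (longitude, annual precipitation, noise).
rows = [[-5.0, 400.0, 1.0], [2.0, 650.0, 7.0], [12.0, 500.0, 3.0],
        [23.0, 720.0, 9.0], [28.0, 560.0, 2.0], [34.0, 610.0, 5.0]]


def toy_model(batch):
    # Stand-in for a fitted regressor; it ignores the third column.
    return [60.0 - 0.8 * lon - 0.02 * precip for lon, precip, _ in batch]


target = toy_model(rows)
names = ["longitude", "annual_precipitation", "noise"]
for name, score in zip(names, permutation_importance(toy_model, rows, target)):
    print(f"{name}: {score:.3f}")
```

A column that the model never uses, like the noise column here, gets an importance of exactly zero, while columns that drive the prediction get positive scores.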

5.2 DSr variation under climate change

One of the main aims of this research is to assess how climate change may impact the ability of populations to maintain resistance to rust. An interesting result when comparing projections under current conditions and under climate change scenarios is that the global distribution of DSr values does not drift in a clear direction. Figure 2A shows that the distribution of the DSr value is very similar for the present and future predictions under the BCC370 model. The median remains almost unchanged, going from 33.70 to 33.85.

Figure 2 (A) Histogram of DSr prediction using Random Forest for non-evaluated wild populations in present time and in the future under BCC370 climate change model. Vertical lines mark the median value of each distribution. (B), (C) Histograms of variation of average annual precipitation and average annual temperature between present time and future conditions under BCC370 climate model.

We built the Random Forest predictor for all the climate change models described in Methods. In what follows, we always refer to the Random Forest regression under the conditions of change of the BCC370 model.

The variation in annual precipitation is weakly positive, with a median increase of 6 mm (Figure 2B). This change is highly concentrated around the median, ranging from -9 mm at the first quartile to 22 mm at the third quartile. Locations with the lowest precipitation tended to become even drier, whereas the other locations increased their precipitation, especially those that initially had the highest precipitation. Average annual temperature increases, on the other hand, are remarkable, with values of 1.28 °C, 1.56 °C and 1.71 °C for the first quartile, the median and the third quartile, respectively (Figure 2C).
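The quartile summary above can be reproduced for any pair of present and future layers with a few lines of Python. This is a minimal sketch: the function name and the precipitation values are hypothetical, and the "inclusive" quantile method is an assumption, since the text does not say which quantile definition was used.

```python
from statistics import quantiles


def change_quartiles(present, future):
    """Quartiles (Q1, median, Q3) of the per-location change future - present."""
    if len(present) != len(future):
        raise ValueError("present and future must describe the same locations")
    deltas = [f - p for p, f in zip(present, future)]
    # The "inclusive" method treats the data as the whole population
    # (an assumption made for this sketch).
    q1, median, q3 = quantiles(deltas, n=4, method="inclusive")
    return q1, median, q3


# Hypothetical annual precipitation (mm) at five locations.
precip_present = [310.0, 450.0, 520.0, 640.0, 800.0]
precip_future = [300.0, 450.0, 530.0, 660.0, 830.0]
print(change_quartiles(precip_present, precip_future))
```

In this invented example the driest location loses precipitation while the wettest gains the most, which mirrors the pattern described in the paragraph above.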

The highest values of sensitivity to rust are found in some locations in the interior of the Iberian Peninsula, in regions with very hot and dry summers where rust cannot thrive (Figure 3A). Something similar occurs in the interior of France, but in this same country there is a cluster of potentially resistant populations in the final course of the Rhone, an area with much higher humidity. Proximity to the sea seems to foster populations with low DSr, as happens in Greece, the Anatolian shoreline and Southern Crimea. Figure 3B maps the variation of DSr with respect to the present time for the same wild accessions under the BCC370 model. Spatial patterns are easy to spot. According to this projection, DSr values will decrease in the mountains of the Iberian Peninsula, but will be higher in the Rhone region and Southern Greece (see Supplementary Figures S2 and S3 for details). Annual precipitation is the main driver of these changes, with a general trend to drought. Figure 3C shows the variation of precipitation according to the BCC370 model.

Figure 3 (A) DSr predicted median value, using Random Forest, for wild accessions under present climate conditions. (B) Variation of the same parameter according to BCC370 model. Encircled samples in (A) show a low DSr (first quartile) both in present time and in the future and are located inside a protected area. (C) Variation of annual precipitation.

Supplementary Table S1 shows the number of populations that are inside a protected area and whose DSr value at the present time falls under the first quartile (DSr ≤ 30.48): 14 belong to Lens ervoides and 37 to Lens nigricans. Out of these 51 populations, 16 also have a DSr value under 30.48 under the BCC370 climate change scenario: 7 of Lens ervoides and 9 of Lens nigricans. They are encircled in magenta in Figure 3B. They are the most valuable populations for in situ conservation and are listed in Table 2.

Table 2 Selected subset of wild lentil populations for in situ conservation, occurring in protected areas and most likely to be rust-resistant at present and in the future according to the tested climate change projection models.
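The selection rule behind this subset combines three conditions: the population occurs in a protected area, its predicted DSr is within the first quartile at present, and it stays there under the future scenario. The sketch below illustrates that filter. The `Population` record, its field names and all values are invented for the example; only the 30.48 threshold comes from the text, and treating both comparisons as "less than or equal" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Population:
    name: str                 # hypothetical identifier
    species: str
    dsr_present: float        # predicted DSr under current climate
    dsr_future: float         # predicted DSr under a future scenario
    in_protected_area: bool


def select_candidates(populations, threshold):
    """Keep protected-area populations whose predicted DSr is at or below
    the threshold both at present and under the future scenario."""
    return [
        p for p in populations
        if p.in_protected_area
        and p.dsr_present <= threshold
        and p.dsr_future <= threshold
    ]


FIRST_QUARTILE_DSR = 30.48  # threshold reported in the text

# Invented records, for illustration only.
populations = [
    Population("pop_a", "Lens ervoides", 24.1, 27.9, True),
    Population("pop_b", "Lens nigricans", 29.5, 33.2, True),   # loses the trait
    Population("pop_c", "Lens nigricans", 22.0, 21.4, False),  # not protected
    Population("pop_d", "Lens nigricans", 41.7, 28.0, True),   # not resistant now
]
for p in select_candidates(populations, FIRST_QUARTILE_DSR):
    print(p.name, p.species)
```

Keeping the threshold as an explicit argument makes it straightforward to repeat the selection for each climate change scenario and to compare the resulting subsets.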

6 Discussion

In this study, we addressed the pressing problem of the lack of adaptive resilience in modern cultivars in the context of climate change and the need to identify crop wild relative populations that might provide the genetic diversity needed to obtain specific traits. Rust is a severe disease for lentil production, and breeding with resistant CWRs is a convenient way to reduce this problem. However, systematic direct identification of wild samples that have evolved to resist the fungus is not feasible, because the method to estimate Disease Severity is expensive and time-consuming. We tackled this problem by building a Machine Learning model using a rich dataset of calibrated samples grown under controlled environmental conditions. We used the model to predict the sensitivity to rust of a set of samples of wild relatives of Lens culinaris subsp. culinaris, at the present time and under 9 scenarios of Climate Change. As DSr is a quantitative measurement, we used a regression model instead of the qualitative binarized approach of previous works. Our goal was to identify natural CWR populations in protected areas that are likely to be resistant to rust at the present time and to maintain this trait in the future due to the maintenance of selective pressures (i.e., that have a projected low DSr at the present time and in the future scenarios of climate change models). A quantitative estimation is better suited for the purpose of finding extremely resistant accessions.

With this methodological approach we aimed to answer three questions. The first one is which variables have the greatest impact on the model. Variable importance analysis revealed that Longitude is the most relevant one. This comes as no surprise given the East-West distribution of wild lentil species populations across the Mediterranean basin (Ladizinsky et al., 1983; Ladizinsky et al., 1984), and the evidence of westward migration from the Near East of other wild Fabaceae such as wild peas (Smýkal et al., 2017; Hellwig et al., 2022) or wild lupins (Mousavi-Derazmahalleh et al., 2018). Rust sensitivity is higher in the Western Mediterranean basin, with extreme values in inner areas of the Iberian Peninsula and France, where the conditions are less prone to rust development. These results are in line with Singh et al. (2014), who experimentally evaluated 405 wild lentil accessions and identified 27 promising rust-resistant populations, mostly located in the Eastern Mediterranean (Syria and Turkey). Besides longitude, annual precipitation, topsoil bulk density, available soil water capacity and annual average temperature are the most relevant variables for the Random Forest predictor.

In response to the second question, we identified 51 populations whose DSr values are under the first quartile at the present time and that grow inside a protected area. These are good rust-resistant candidate populations, which should be evaluated for this trait and could be easily conserved in situ.

The third question, a relevant landmark of this work, was to identify which of those 51 populations are likely to maintain the environmental conditions that are favourable for the occurrence of rust-resistant genotypes under the projected Climate Change scenarios. Although longitude is the variable with the greatest weight in the prediction of our model, it is constant through time and not affected by climate change. Annual precipitation and average annual temperature were the variables that had an impact on the future sensitivity to rust in the different scenarios. The effect on the variation of DSr was similar for all scenarios (Supplementary Figure S1; Supplementary Table S3), possibly because of the short span over which the change is projected (year 2040). Locations with the lowest precipitation tended to become even drier, whereas the other locations increased their precipitation, especially those that initially had the highest precipitation. Average annual temperature increases, on the other hand, are remarkable, with values of 1.28 °C, 1.56 °C and 1.71 °C for the first quartile, the median and the third quartile, respectively. The almost unchanged global distribution of DSr could suggest that rust resistance is not affected by the changes in annual precipitation and mean temperature. Quite the opposite: we found clear patterns of DSr variation by geographic area. Annual precipitation was found to be the main driver of change, and regions with increased precipitation were associated with conditions more favourable for rust-resistant populations (lower DSr values). In particular, the accessions of South Crimea, the Iberian plateaus and Western France would experience the greatest increase in precipitation and, consequently, a decrease in projected DSr. Just the opposite would happen on the Southern shore of Anatolia, where precipitation is projected to decrease sharply, reducing the chances of finding rust-resistant genotypes in those locations.

We identified 16 populations in protected areas with DSr values below 30.48 (first quartile of the distribution of the DSr projection) at the present time and under the future environmental conditions (Table 2). Eight populations are located in the Eastern Mediterranean basin (6 belonging to Lens ervoides and 2 to Lens nigricans). In the Western basin, 7 populations are in the South of France (all of them belonging to Lens nigricans) and only 1 is in the Iberian Peninsula (Lens ervoides). There is a remarkable difference among the CWR species. While 7 out of 14 populations of Lens ervoides selected at present time will remain of high interest in the future from the rust-resistance point of view, only 9 of 37 populations of Lens nigricans fall within this category. The 16 selected populations share a common geographic feature: they are very close to the coast, and 5 of them are on islands (Rhodes, Zakynthos, Crete and Cyprus). There are no populations of Lens lamottei or Lens culinaris subsp. orientalis in the selected subset of high-interest populations for in situ conservation, but the initial number of populations for these species was small compared to the other species. This suggests that future field work should focus on these underrepresented taxa.

7 Conclusion

Crop Wild Relatives of Lens culinaris subsp. culinaris are a source of genetic resistance to a common rust disease caused by the fungus Uromyces viciae-fabae. The quantitative field evaluation of rust resistance is a hard, expensive, and time-consuming task. The use of Machine Learning approaches provided a way to mitigate this obstacle, using a carefully built calibration set. Our results identified 16 populations that are likely to be resistant to rust, occur in protected areas, and are expected to be resilient under predicted Climate Change conditions. Thus, they are sound candidates for the establishment of genetic reserves for in situ conservation. Further characterization by field evaluation of these populations is needed to check the validity of Machine Learning predictions and improve the genetic value of the calibration set. The same method may be extended to predict pest and pathogen resistance traits of CWRs of other crops.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: https://zenodo.org/record/6883274.

Author contributions

Conceptualization: IC-G, MLRT, JI, JG, JG-A. Data curation: DR, MLRT, JI. Funding acquisition: JI, JG, DR. Methodology: IC-G, JG-A, JG, JI. Machine learning models: IC-G, JG-A, MLRT. Visualization: JG-A. Writing, original draft: IC-G, JG-A, JI, JG. Writing, review & editing: IC-G, MLRT, JG-A, JG, DR, JI. All authors contributed to the article and approved the submitted version.

Funding

This research was partially funded by the Farmer’s Pride project: Networking, Partnerships, and Tools to Enhance in situ Conservation of European Plant Genetic Resources, a 3-year project funded by the Horizon 2020 Framework Programme of the European Union, grant agreement no. 774271. J.G. acknowledges financial support provided by the Ministerio de Ciencia, Innovación y Universidades (PGC2018-093854-B-100). This research was also partially funded by the I + D + i Plan Andaluz Investigación LEGAND project (P20_0986).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpls.2022.1010799/full#supplementary-material

References

Adhikari, U., Nejadhashemi, A. P., Woznicki, S. A. (2015). Climate change and Eastern Africa: A review of impact on major crops. Food Energy Secur. 4, 110–132. doi: 10.1002/fes3.61
