AUTHOR=Sainz-Villegas Samuel, de la Hoz Camino Fernández, Juanes José A., Puente Araceli TITLE=Predicting non-native seaweeds global distributions: The importance of tuning individual algorithms in ensembles to obtain biologically meaningful results JOURNAL=Frontiers in Marine Science VOLUME=9 YEAR=2022 URL=https://www.frontiersin.org/journals/marine-science/articles/10.3389/fmars.2022.1009808 DOI=10.3389/fmars.2022.1009808 ISSN=2296-7745 ABSTRACT=

Modelling the distributions of non-native marine species remains challenging. This study aims to predict the global distribution of five widespread introduced seaweed species by focusing on two main aspects of the ensemble modelling process: (1) Does enforcing less complex models (in terms of number of predictors) help to obtain better predictions? (2) What are the implications of tuning the configuration of individual algorithms in terms of ecological realism? To address the first aspect, two datasets with different numbers of predictors were created. To address the second, four algorithms and three configurations were tested. Models were evaluated using common evaluation metrics (AUC, TSS, Boyce index and TSS-derived sensitivity) and ecological realism. Finally, a stepwise procedure for model selection was applied to build the ensembles. Models trained with the large predictor dataset generally performed better than models trained with the reduced dataset, with some exceptions. Regarding algorithms and configurations, Random Forest (RF) and Generalized Boosting Models (GBM) scored the highest metric values on average, although RF response curves were the most unrealistic and least smooth, and GBM showed overfitting for some species. Generalized Linear Models (GLM) and MAXENT, despite their lower scores, fitted smoother curves (especially at intermediate complexity levels). Reliable and biologically meaningful predictions were achieved. Inspecting the number of predictors to include in the final ensembles and selecting the algorithms and their complexity proved crucial for this purpose. Additionally, we highlight the importance of combining quantitative (based on multiple evaluation metrics) and qualitative (based on ecological realism) methods for selecting optimal configurations.
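
As an illustration of two of the metrics named above, the True Skill Statistic is commonly defined as TSS = sensitivity + specificity - 1, and the "TSS-derived sensitivity" can be read as the sensitivity at the threshold that maximises TSS. The following is a minimal sketch under that assumption, not the code used in the study: the function names and the toy presence/absence data are hypothetical, and AUC and the Boyce index are not shown.

```python
def tss_at(y_true, scores, threshold):
    """Return (TSS, sensitivity) when scores >= threshold count as predicted presence."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_presence = s >= threshold
        if y == 1 and predicted_presence:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_presence:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # share of true presences predicted as presence
    specificity = tn / (tn + fp)  # share of true absences predicted as absence
    return sensitivity + specificity - 1.0, sensitivity


def best_tss(y_true, scores):
    """Scan every observed score as a candidate threshold and keep the first TSS maximum."""
    best = None
    for t in sorted(set(scores)):
        tss, sens = tss_at(y_true, scores, t)
        if best is None or tss > best[0]:
            best = (tss, sens, t)
    return best


# Hypothetical toy data: 1 = presence record, 0 = absence/background record.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]  # invented model suitability values

tss, sensitivity, threshold = best_tss(y_true, scores)
print(f"max TSS = {tss:.2f} at threshold {threshold}, sensitivity = {sensitivity:.2f}")
```

TSS ranges from -1 to 1, with values near 0 indicating performance no better than random; reporting the sensitivity alongside it shows how many known presences the chosen threshold actually captures.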