AUTHOR=de Lutio Riccardo, Park John Y., Watson Kimberly A., D'Aronco Stefano, Wegner Jan D., Wieringa Jan J., Tulig Melissa, Pyle Richard L., Gallaher Timothy J., Brown Gillian, Guymer Gordon, Franks Andrew, Ranatunga Dhahara, Baba Yumiko, Belongie Serge J., Michelangeli Fabián A., Ambrose Barbara A., Little Damon P. TITLE=The Herbarium 2021 Half–Earth Challenge Dataset and Machine Learning Competition JOURNAL=Frontiers in Plant Science VOLUME=12 YEAR=2022 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2021.787127 DOI=10.3389/fpls.2021.787127 ISSN=1664-462X ABSTRACT=

Herbarium sheets present a unique view of the world's botanical history, evolution, and biodiversity, which makes them an essential data source for botanical research. The increasing digitization of herbaria worldwide, together with advances in fine-grained visual classification that can facilitate automatic identification of herbarium specimen images, opens many opportunities to support and expand research in this field. However, existing datasets are either too small or not diverse enough in terms of represented taxa, geographic distribution, and imaging protocols. Furthermore, aggregating datasets is difficult because taxa are recognized under a multitude of names and must be aligned to a common reference. We introduce the Herbarium 2021 Half–Earth dataset, to date the largest and most diverse dataset of herbarium specimen images for automatic taxon recognition. We also present the results of the Herbarium 2021 Half–Earth challenge, a competition that was part of the Eighth Workshop on Fine-Grained Visual Categorization (FGVC8) and was hosted by Kaggle to encourage the development of models that automatically identify taxa from herbarium sheet images.