AUTHOR=Zhan Weihui, Chen Bowen, Wu Xiaolian, Yang Zhen, Lin Che, Lin Jinguo, Guan Xin TITLE=Wood identification of Cyclobalanopsis (Endl.) Oerst based on microscopic features and CTGAN-enhanced explainable machine learning models JOURNAL=Frontiers in Plant Science VOLUME=Volume 14 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/plant-science/articles/10.3389/fpls.2023.1203836 DOI=10.3389/fpls.2023.1203836 ISSN=1664-462X ABSTRACT=Accurate and fast identification of wood at the species level has become critical technical support for protecting and conserving tree species resources. To address the low efficiency of manual identification, the high cost of using wood images as training data for deep learning algorithms, and the difficulty of industrial deployment caused by the complexity of trained models, a wood species identification model based on wood anatomy was proposed. The model uses a geometric feature dataset of Cyclobalanopsis wood cells, enhanced by the conditional tabular generative adversarial network (CTGAN) deep learning algorithm. A simulated cell geometric feature dataset was generated with CTGAN from the measured cell geometric feature dataset of three Cyclobalanopsis species. Two machine learning models, a back-propagation neural network (BPNN) and a support vector machine (SVM), were each trained on simulated vessel cells and on simulated wood fiber cells to recognize the three Cyclobalanopsis species, and were then evaluated and tested on the real datasets of vessel cells and wood fiber cells. The two models were also interpreted with Local Interpretable Model-Agnostic Explanations (LIME) to explore how they identify tree species from the wood cell geometric features of the three Cyclobalanopsis species. 
The results showed that the SVM and BPNN models trained on the CTGAN-generated vessel dataset achieved recognition accuracies of 96.4% and 99.6%, respectively, on the real dataset, while the BPNN and SVM models trained on the CTGAN-generated wood fiber dataset achieved recognition accuracies of 75.5% and 77.9%, respectively, on the real dataset. Machine learning models trained on cell geometric feature data enhanced by CTGAN could achieve good recognition of Cyclobalanopsis species, with the SVM model having a higher prediction accuracy than the BPNN model.
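
The evaluation protocol described in the abstract (train on simulated cells, test on real cells) can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions: the "real" measurements are made-up placeholder numbers, the four features are hypothetical, and a simple per-class Gaussian sampler stands in for CTGAN so that the example needs only NumPy and scikit-learn. scikit-learn's `MLPClassifier` is used here as a stand-in for the BPNN.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
N_SPECIES, N_FEATURES = 3, 4  # hypothetical: 3 species, 4 geometric features

# Placeholder "real" measurements: one Gaussian cluster per species.
# These are made-up numbers, not data from the paper.
centers = rng.normal(0.0, 3.0, size=(N_SPECIES, N_FEATURES))
y_real = np.repeat(np.arange(N_SPECIES), 60)
X_real = centers[y_real] + rng.normal(0.0, 1.0, size=(y_real.size, N_FEATURES))


def sample_synthetic(X, y, n_per_class, rng):
    """Stand-in generator (NOT CTGAN): sample from a Gaussian fitted per class."""
    Xs, ys = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        mean, cov = Xc.mean(axis=0), np.cov(Xc, rowvar=False)
        Xs.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        ys.append(np.full(n_per_class, label))
    return np.vstack(Xs), np.concatenate(ys)


# Simulated dataset, generated from the real one and larger than it.
X_syn, y_syn = sample_synthetic(X_real, y_real, n_per_class=500, rng=rng)

# Both classifiers share the same feature scaling step.
models = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "BPNN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

# Train on simulated rows only, then score on the real rows.
real_accuracy = {}
for name, model in models.items():
    model.fit(X_syn, y_syn)
    real_accuracy[name] = model.score(X_real, y_real)
    print(f"{name}: accuracy on real data = {real_accuracy[name]:.3f}")
```

To move this sketch closer to the described setup, `sample_synthetic` would be replaced by a CTGAN model fitted on the measured vessel or wood fiber features, and the fitted classifiers could then be passed to a LIME tabular explainer. Both steps are omitted here to keep the example self-contained.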