ORIGINAL RESEARCH article

Front. Nutr.

Sec. Food Chemistry

Volume 12 - 2025 | doi: 10.3389/fnut.2025.1598875

This article is part of the Research Topic: Applications of metabolomics in the formation of food flavor

Characterization and feature selection of volatile metabolites in Yangxian colored rice through GC-MS and machine learning

Provisionally accepted
Kaiqi Cheng1, Ruonan Dong1, Fei Pan2, Wen Su1, Lingjie Xi1, Meng Zhang1, Jingzhang Geng1, Ruichang Gao3, Wengang Jin1*, A. M. Abd El-Aty4*
  • 1Shaanxi University of Technology, Hanzhong, China
  • 2Chinese Academy of Agricultural Sciences (CAAS), Beijing, Beijing Municipality, China
  • 3Jiangsu University, Zhenjiang, Jiangsu Province, China
  • 4Cairo University, Giza, Giza, Egypt

The final, formatted version of the article will be published soon.

Pigmented rice appeals to consumers for its abundant phytochemicals and unique aroma. In this study, GC-MS-based metabolomics was performed on Yangxian colored rice varieties to characterize their volatile metabolites using multivariate statistics and machine learning algorithms. A total of 357 metabolites were detected and classified into 9 groups, including 96 organooxygen compounds (26.89%), 52 carboxylic acids and derivatives (14.57%), 42 fatty acyls (11.76%), 16 benzene and substituted derivatives (4.48%), and 11 hydroxy acids and derivatives (3.08%). Multivariate statistics (PLS-DA) were used to screen 127 differentially abundant metabolites. Principal component analysis showed that PC1 and PC2 accounted for 52.48% and 27.09% of the variance, respectively. After the differentially abundant metabolites were filtered for high multicollinearity (above 0.8) and subjected to a chi-square test (retaining 20% of the features), only 7 metabolites were found to represent the overall metabolite profile among the colored rice varieties. Four machine learning models were then used to classify the colored rice varieties, and the random forest model performed best, with an accuracy of 0.97. Moreover, SHAP analysis indicated that the 7 metabolites can serve as potential markers representing the metabolomic profiles. These results imply that GC-MS-based metabolomics combined with random forest may be effective for extracting key features among different pigmented rice varieties.
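The multicollinearity filter mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the 0.8 cutoff was applied, so the greedy rule below (keep a metabolite only if its absolute Pearson correlation with every metabolite already kept is at most 0.8) is an assumption, not the authors' procedure. The function names `pearson` and `drop_collinear`, the placeholder metabolite names, and the toy abundances are all hypothetical. The chi-square, random forest, and SHAP steps are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / (sx * sy)


def drop_collinear(features, threshold=0.8):
    """Greedily keep features whose |r| with every kept feature is <= threshold.

    `features` maps a metabolite name to its abundances across samples.
    The greedy order-dependent rule is an assumption made for illustration.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical abundances for three placeholder metabolites in five samples.
toy = {
    "metabolite_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "metabolite_B": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly proportional to A
    "metabolite_C": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly related to A
}
selected = drop_collinear(toy)
print(selected)
```

In this toy input, `metabolite_B` is dropped because it is almost perfectly correlated with `metabolite_A`, while `metabolite_C` is kept. Because the rule is greedy, the result depends on the order in which metabolites are visited; a real analysis would need to state which member of a correlated pair is retained.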

Keywords: pigmented rice, metabolites, multivariate statistics, machine learning, volatiles

Received: 24 Mar 2025; Accepted: 22 Apr 2025.

Copyright: © 2025 Cheng, Dong, Pan, Su, Xi, Zhang, Geng, Gao, Jin and Abd El-Aty. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Wengang Jin, Shaanxi University of Technology, Hanzhong, China
A. M. Abd El-Aty, Cairo University, Giza, 12613, Giza, Egypt

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.