ORIGINAL RESEARCH article

Front. Nutr.
Sec. Nutrition Methodology
Volume 11 - 2024 | doi: 10.3389/fnut.2024.1469878
This article is part of the Research Topic: Revolutionizing Personalized Nutrition: AI's Role in Chronic Disease Management and Health Improvement

Visual Nutrition Analysis: Leveraging Segmentation and Regression for Food Nutrient Estimation

Provisionally accepted
  • 1 School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
  • 2 Changshu Key Laboratory of Medical Artificial Intelligence and Big Data, Suzhou, China
  • 3 Department of Scientific Research, The Changshu Affiliated Hospital of Soochow University, Suzhou, China

The final, formatted version of the article will be published soon.

    Nutrition is closely tied to health. A well-balanced diet not only meets the body's needs for various nutrients but also helps prevent many chronic diseases. However, most people lack systematic nutritional knowledge and find it difficult to assess the nutritional content of food accurately. Image-based nutritional evaluation can provide significant assistance here, so we aim to predict the nutritional content of dishes directly from images. Most related research estimates the volume or area of food through image segmentation and then calculates nutritional content from the food category. However, this approach usually lacks ground-truth nutritional labels as a reference, which makes the accuracy of its predictions difficult to verify. To address this issue, we combined segmentation and regression tasks: we manually added segmentation annotations to the Nutrition5k dataset, which contains detailed nutritional content labels but no segmentation labels. Based on these annotated data, we developed a nutritional content prediction model that performs segmentation first and regression afterward. Specifically, we first applied a UNet model to segment the food, then used a backbone network to extract features, and strengthened the feature representation with a Squeeze-and-Excitation structure. Finally, the extracted features were passed through several fully connected layers to predict weight, calories, fat, carbohydrates, and protein content. Our model achieved an average percentage mean absolute error (PMAE) of 17.06% across these five components. All manually annotated segmentation labels can be found at
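The abstract reports PMAE without defining it. The following is a minimal Python sketch of how such a metric could be computed, assuming the convention commonly used with Nutrition5k: the mean absolute error of a component divided by the mean ground-truth value of that component, expressed as a percentage. The function names `pmae` and `average_pmae` and the dictionary layout are hypothetical and are not taken from the article.

```python
from statistics import fmean


def pmae(y_true, y_pred):
    """Percentage MAE: mean absolute error as a percentage of the mean
    ground-truth value (assumed definition, see the note above)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    mae = fmean([abs(t - p) for t, p in zip(y_true, y_pred)])
    return 100.0 * mae / fmean(y_true)


def average_pmae(targets, predictions):
    """Average the per-component PMAE over all components.

    `targets` and `predictions` map a component name (for example
    "calories") to a list of per-dish values; this layout is an assumption.
    """
    return fmean([pmae(targets[name], predictions[name]) for name in targets])


# Illustrative numbers only, not data from the article.
print(pmae([100.0, 200.0], [110.0, 180.0]))  # 10.0
```

Under this definition, the reported figure would correspond to calling `average_pmae` with the five components (weight, calories, fat, carbohydrates, protein) as keys.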

    Keywords: nutrition estimation, Nutrition5k, deep learning, image segmentation, regression

    Received: 24 Jul 2024; Accepted: 03 Dec 2024.

    Copyright: © 2024 Zhao, Zhu, Jiang and Xia. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Kaijian Xia, Department of Scientific Research, The Changshu Affiliated Hospital of Soochow University, Suzhou, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.