REVIEW article

Front. Nutr., 27 March 2025

Sec. Nutrition Methodology

Volume 12 - 2025 | https://doi.org/10.3389/fnut.2025.1501946

This article is part of the Research Topic: Smart Dietary Management for Precision Diabetes Mellitus Care.

Image-based food monitoring and dietary management for patients living with diabetes: a scoping review of calorie counting applications

  • 1School of Psychology, Université de Moncton, Moncton, NB, Canada
  • 2Centre de formation médicale du Nouveau-Brunswick, Université de Moncton, Moncton, NB, Canada
  • 3Department of Computer Science, Université de Moncton, Moncton, NB, Canada

Accurate dietary intake estimation is crucial for managing weight-related chronic diseases, such as diabetes, where precise measurement of food volume and caloric content is essential. Traditional calorie counting methods are often error-prone and may not meet the specific needs of individuals with diabetes. Recent advancements in computer science offer promising solutions through automated systems that estimate calorie intake from food images using deep learning techniques. These systems provide personalized dietary recommendations, helping individuals with diabetes make informed choices. As smartphones and wearable devices become more accessible, the utilization of electronic apps for dietary monitoring is increasing, highlighting the need for more research into safe, secure, and evidence-based IoT solutions. However, challenges such as standardization, validation across diverse populations, and data privacy concerns need to be addressed. This review focuses on the role of computer science in dietary intake estimation, specifically food segmentation, classification, and volume estimation for calorie calculation. By synthesizing existing literature, this review provides insights into current methods, key challenges, and potential future directions. The review also explores advancements in technology that can improve the accuracy of dietary assessments, contributing to personalized disease management and the prevention of weight-related chronic conditions.

1 Introduction

Weight-related diseases, including diabetes, have been described as a pandemic and represent an alarmingly increasing global public health issue. The prevalence of diabetes has tripled over the past 15 years, rising more rapidly in low- and middle-income countries than in high-income countries. In 2021, approximately 537 million adults worldwide were living with diabetes, a figure expected to rise to 783 million by 2045 if current trends continue (1). Diabetes is a major cause of serious health complications, including blindness, kidney failure, heart attacks, strokes, and lower limb amputations. There are different types of diabetes, with Type 2 Diabetes (T2D) being the most prevalent and largely preventable. Managing T2D involves adopting healthy behaviors such as following a balanced diet (particularly one low in carbohydrates and fat), engaging in regular physical activity, and, when necessary, taking medication. Consistent medical follow-up is also essential for effective T2D management.

Managing diabetes effectively requires accurate monitoring of dietary intake, particularly caloric consumption (2). Traditional methods, such as food diaries and self-reporting, have long been used to estimate dietary intake (3). However, these methods are liable to errors due to underreporting, overreporting, and recall biases, which can significantly impact the accuracy of calorie calculations (4). This is especially problematic for individuals with diabetes, where precise management of caloric intake is crucial to maintaining stable blood glucose levels.

Recent advancements in computer science, particularly in the fields of artificial intelligence (AI) and computer vision, have introduced innovative solutions to these challenges. By leveraging deep learning algorithms, researchers have developed systems that can automatically segment and classify food items in images and estimate their volume and caloric content, eliminating the need for manual entry and reducing the potential for human error. These approaches offer the potential to revolutionize dietary monitoring by providing accurate, real-time assessments of food intake (5).

Previous research in this domain has explored various methods for improving dietary intake estimation, including the use of specialized hardware, such as 3D scanners and depth sensors, to capture more accurate food measurements (6). While these methods have shown promise, their reliance on specialized equipment limits their accessibility and widespread adoption. In contrast, image analysis can now be performed using standard smartphone cameras, thanks to ongoing improvements in smartphone imaging hardware and processing power, making these solutions more accessible and practical for daily use.

Theories surrounding personalized medicine and precision health underscore the importance of tailoring interventions to individual needs (7). In the context of diabetes management, this means providing dietary recommendations that align with a person’s specific metabolic profile, dietary habits, lifestyle, knowledge and capacity. AI-driven dietary monitoring tools align well with these theories by enabling more personalized and adaptive approaches to diabetes care.

However, despite the promise of these technologies, several challenges remain, specifically in achieving accurate automated volume estimation without user input or specialized devices. Additionally, issues related to the standardization of methods, validation across diverse populations, and privacy concerns in handling sensitive health data must be addressed to ensure the reliability and ethical use of dietary monitoring tools (8).

Given the critical role of accurate dietary monitoring in diabetes management and the rapid advancements in AI and computer vision, this paper provides a comprehensive review of 14 widely studied calorie-counting applications whose working principles are publicly documented and can therefore be examined. It critically evaluates the computer science methodologies employed in their development, focusing on food segmentation, classification, volume estimation, and calorie calculation. By synthesizing recent advancements introduced through reputable platforms such as IEEE, Springer, and ACM, this review emphasizes the technological innovations driving more accurate and personalized dietary assessments. It compares the effectiveness and accuracy of these methodologies across applications, derives recommendations for practical use and future research, and serves as a foundation for developing next-generation calorie-counting tools by highlighting the strengths and limitations of current approaches.

2 Materials and methods

This review synthesizes literature from several well-established computer science databases, including IEEE, Springer, ACM, and ScienceDirect, to evaluate advancements in calorie-counting applications. The focus was on image-based food monitoring systems and calorie-counting tools, both manual and automated, that utilize computer science methodologies such as food segmentation, classification, and volume estimation to enhance dietary intake accuracy. These tools are particularly relevant for individuals managing weight-related chronic diseases like diabetes.

2.1 Search terms and databases

Key search terms were carefully crafted based on initial scoping exercises and included combinations of the keywords “calorie counting apps,” “food image segmentation,” “food volume estimation,” “image processing,” “dietary intake estimation,” and “diabetes management.” The search was limited to studies and applications published or introduced between 2010 and 2024 to focus on advancements spanning the past 15 years.

2.2 Article retrieval and screening protocols

Articles and application descriptions were retrieved through queries across databases. A multi-step screening process was implemented, beginning with the review of titles and abstracts to identify relevant studies. Full-text analysis was conducted to ensure methodological transparency and relevance to calorie-counting tools. Duplicates and irrelevant studies were excluded during this process.

2.3 Inclusion and exclusion criteria

Studies and applications were included if they explicitly focused on calorie-counting tools, whether manual or automated, and presented the computer science methodologies employed in their design, such as food segmentation, classification, or volume estimation. Excluded were those lacking sufficient methodological detail, not within the specified time range, or irrelevant to dietary monitoring.

2.4 Selection of most frequently studied applications

The 14 calorie-counting applications analyzed in this review represent a mix of manual, semi-automated, and AI-driven tools. To select the most studied calorie-counting applications, we used the following criteria: (1) frequent citation in the literature (more than three different articles), (2) availability of public documentation describing the methodologies used, and (3) contributions to advancing dietary monitoring practices. By including both manual and automated tools, the review provides a comprehensive overview of the progression and diversity in calorie-counting methodologies. Figure 1 provides a flow diagram summarizing the screening and selection process for identifying the 14 applications included in this review.

Figure 1. PRISMA flow diagram.

This approach allowed for an in-depth analysis of advancements in calorie-counting applications, along with an evaluation of their limitations, reliance on user input, sensitivity to image quality variations, and scalability challenges. These findings aim to highlight areas where future research can further improve the accuracy and accessibility of these tools.

3 Core stages and working principles of calorie counting apps

This section provides a detailed analysis of the 14 prominent calorie-counting applications selected for this review, as introduced in the previous section. These applications were highlighted for their contributions to computer science methodologies and were introduced through prominent publishers. We explore the main stages involved in calorie counting applications, focusing on three critical steps: food segmentation, food recognition, and food volume estimation. These steps are fundamental to the accurate calculation of nutritional information and calorie content from food images. Additionally, these applications often rely on well-known food datasets, which play a crucial role in ensuring the accuracy and comprehensiveness of food recognition and calorie estimation.

By analyzing these stages across the 14 applications, we aim to identify trends, strengths, and potential areas for improvement in the current state of calorie-counting technology. This analysis provides insights into how each application approaches the challenges of food segmentation, recognition, and volume estimation, all of which are crucial for accurate calorie calculation and effective dietary monitoring. The block diagram in Figure 2 illustrates the core stages involved in the calorie estimation process. The process begins with food image segmentation, where food items are isolated from the background or other objects in the image. Following segmentation, food classification identifies the specific type of food, such as distinguishing between white rice, brown rice, or meat. The classified food items then undergo volume estimation, where their portion sizes are calculated using appropriate techniques. Finally, the results of classification and volume estimation are integrated with nutritional datasets to perform calorie counting, determining the caloric value of the food items. These tasks are sequentially dependent, with each step building upon the previous one to achieve accurate dietary assessment. Table 1 presents the selected applications and provides their general information.
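To make the sequential dependency of these four stages concrete, the short Python sketch below shows how the final calorie-counting step can combine the outputs of classification and volume estimation with a nutrition lookup. The function, food labels, density values, and energy densities are illustrative assumptions for this review, not values taken from any of the reviewed applications.

```python
from dataclasses import dataclass
from typing import List

# Illustrative lookups; a production system would query a curated nutrition
# database (e.g., USDA FoodData Central) instead of hard-coded values.
KCAL_PER_GRAM = {"white rice": 1.30, "grilled chicken": 1.65}
DENSITY_G_PER_CM3 = {"white rice": 0.86, "grilled chicken": 1.04}

@dataclass
class FoodItem:
    label: str          # output of the classification stage (stage 2)
    volume_cm3: float   # output of the volume-estimation stage (stage 3)

def calories_for_meal(items: List[FoodItem]) -> float:
    """Stage 4: convert classified, measured items into a calorie total."""
    total_kcal = 0.0
    for item in items:
        grams = item.volume_cm3 * DENSITY_G_PER_CM3[item.label]
        total_kcal += grams * KCAL_PER_GRAM[item.label]
    return total_kcal

# Example plate: 150 cm^3 of rice and 120 cm^3 of chicken.
meal = [FoodItem("white rice", 150.0), FoodItem("grilled chicken", 120.0)]
print(f"Estimated meal energy: {calories_for_meal(meal):.0f} kcal")
```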

Figure 2. An automated image-based nutrition assessment tool.

Table 1. The selected 14 calorie counting applications.

3.1 Food image datasets

Food image datasets are foundational for the development and evaluation of food recognition systems. These datasets vary widely in their attributes, including the number of images, food categories, and methods of data acquisition. A well-structured food image database is critical for training and benchmarking machine learning models, impacting their performance and generalizability.

Food image datasets are categorized by several factors. Different datasets focus on various food types, ranging from generic classifications to specific cuisines. For example, datasets such as Food-101 (9) and UEC-Food256 (10) cover a broad spectrum of food categories, while others, like Turkish-Foods-15 (11) and Japanese Foods (10, 12–16), focus on specific regional cuisines. Also, the source and method of image collection play a significant role in the quality and applicability of the database. Images may be captured in controlled environments, such as studios with standardized lighting, or in natural settings, like restaurants and social media platforms. For instance, Food-85 (17) and Diabetes (18) use controlled environments, whereas Foodlog (19) and Instagram 800k (20) leverage user-contributed images and web crawls.

The number of images and their diversity within each class are crucial for model robustness. Datasets like FoodX-251 (21) and Fruits 360 Dataset (22) offer extensive image collections, which are essential for training deep learning models. High diversity in images helps the model generalize better to new, unseen data. Food image datasets are often designed for specific tasks, such as classification or segmentation. For example, FOOD201-Segmented (7) contains images specifically segmented for classification tasks, while datasets like VIREO Food-172 (23) may serve both classification and segmentation needs. NutriNet (24) is another influential database designed for deep learning applications in food and drink image recognition. It plays a pivotal role in dietary assessment and nutritional analysis, further enhancing AI’s capabilities in health informatics.
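As an illustration of how such datasets are typically consumed during model development, the sketch below loads Food-101 (9) through the torchvision interface and applies the ImageNet-style preprocessing expected by most of the deep models discussed later. This assumes torchvision 0.13 or newer; the root directory and batch size are arbitrary choices.

```python
import torch
from torchvision import datasets, transforms

# Standard preprocessing for ImageNet-pretrained backbones.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Downloads roughly 5 GB of images on first use.
train_set = datasets.Food101(root="data", split="train",
                             transform=preprocess, download=True)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32,
                                           shuffle=True, num_workers=4)
print(len(train_set), "training images across", len(train_set.classes), "classes")
```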

Recent food image datasets like CNFOOD-241 (25), AI4FoodDB (26), and MyFoodRepo-273 (27) have made significant contributions to the field. AI4FoodDB (26), launched in 2023, is particularly notable as it forms part of a larger initiative aimed at advancing personalized nutrition and e-Health solutions. What sets AI4FoodDB apart is its integration of food images with data from wearable devices, validated questionnaires, and biological samples. This holistic approach seeks to create a digital twin of the human body, providing a valuable benchmark for personalized nutrition research and aiding in the fight against non-communicable diseases.

As mentioned, many existing food image datasets are predominantly focused on specific countries or cultural contexts, which can introduce significant biases in the development of food recognition models and constrain their generalizability across diverse dietary habits globally. The limited cultural diversity in these datasets often results in AI systems that underperform when encountering food items from underrepresented regions or cuisines. This lack of inclusivity poses a critical challenge to the development of robust, globally applicable dietary assessment tools. To address this, there is a pressing need for the creation of comprehensive datasets that capture the breadth of global food practices. Initiatives such as AI4FoodDB, which integrate diverse food categories alongside multimodal data sources, exemplify a forward-looking approach to enhancing model generalization and reducing biases in food recognition systems.

Table 2 provides a summary of notable food image datasets, highlighting their unique attributes.

Table 2. Food image datasets.

While Table 2 focuses on publicly documented, food-focused image datasets, certain calorie-counting applications also reference specialized datasets that do not strictly fit these criteria. For example, Im2Calories (7) utilizes NYU Depth V2 (28) for initial depth training; however, we do not include it here because it is a general-purpose indoor scene dataset rather than a food-specific resource. These cases illustrate that some applications leverage additional or proprietary datasets for specialized tasks, particularly volume estimation, that fall outside the scope of publicly available food-image collections.

3.2 Image segmentation

Image segmentation is a foundational technique in computer vision, involving the partitioning of an image into distinct regions or segments that correspond to different objects or areas of interest. In food recognition, segmentation is particularly important because it enables the precise identification and isolation of individual food items on a plate. This accuracy is critical for tasks such as portion size estimation, calorie counting, and nutrient analysis, all of which are essential components of dietary assessment systems.

In food recognition applications, segmentation plays a vital role in ensuring that each food item is accurately identified and analyzed, regardless of how it is presented on the plate. Given the variability in food presentation due to different cuisines, cooking methods, and serving styles, segmentation methods must be robust and adaptable. These methods range from traditional approaches like edge detection and region-based segmentation to advanced deep learning models that can learn complex features from large datasets.

Numerous mobile applications and systems have been developed that incorporate image segmentation as a key component for dietary assessment. These applications often utilize various segmentation techniques, each chosen based on the specific requirements and constraints of the application, such as processing power, real-time capabilities, and the complexity of food items being analyzed. The following Table 3 summarizes the segmentation methods utilized in 14 prominent food recognition applications, detailing their approaches:

Table 3. Segmentation strategies employed in food image processing applications.

This table outlines the variety of segmentation methods employed across different food recognition applications, each tailored to the unique challenges posed by food imagery. The methods range from manual approaches like PlateMate, where crowd workers draw bounding boxes by hand, to fully automated techniques seen in Im2Calories and goFOOD™, which utilize advanced models like DeepLab and Mask R-CNN for precise segmentation. However, advanced segmentation models such as Mask R-CNN and DeepLab, while achieving high accuracy in food recognition tasks, are computationally intensive, making them less suitable for mobile or real-time applications where efficiency is paramount. These models involve complex architectures with multiple layers and extensive parameter sets, resulting in significant processing time and memory requirements. Such computational demands can hinder their deployment on resource-constrained devices like smartphones or in scenarios requiring immediate responses. Addressing these limitations often necessitates lightweight alternatives, such as MobileNet or YOLO-based frameworks, or optimization techniques like model pruning and quantization, to make these advanced models feasible in practical, real-time settings.
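To give a rough sense of the footprint gap that motivates these lightweight alternatives, the following sketch compares the parameter counts of a Mask R-CNN instance with those of a MobileNetV3-based SSDLite detector, as exposed by torchvision (assuming torchvision 0.13 or newer; SSDLite is a detector rather than an instance-segmentation model, so the comparison only indicates relative scale).

```python
import torchvision

# weights=None builds the architectures without downloading pretrained checkpoints.
heavy = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=None)
light = torchvision.models.detection.ssdlite320_mobilenet_v3_large(weights=None)

def n_params(model):
    """Total number of learnable parameters."""
    return sum(p.numel() for p in model.parameters())

print(f"Mask R-CNN (ResNet-50 FPN): {n_params(heavy) / 1e6:.1f} M parameters")
print(f"SSDLite (MobileNetV3):      {n_params(light) / 1e6:.1f} M parameters")
```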

Applications like FoodLog and Snap-n-Eat adopt simpler, yet effective, block-wise analysis and saliency-based sampling for segmenting food items. Some, such as GoCARB and FoodLog, are optimized for the specific characteristics of food images, enhancing accuracy, while others, like YOLOv2 in Food Tracker and the RPN in DeepFood, use more general object detection frameworks. Interactive methods in goFOOD™ offer a balance between automation and user input, whereas fully automated approaches like NU-InNet and mobile food record (mFR) prioritize efficiency, especially in mobile contexts. Fine-grained segmentation in Im2Calories and DeepFood focuses on individual food items, while coarser methods like those in Snap-n-Eat are faster and suitable for broader region identification. It is worth noting that, building on the foundation of the GoCARB system, the same team introduced goFOOD™ (29). For semi-automatic segmentation, they continued utilizing region growing and merging algorithms. In addition, they developed a fully automated food segmentation method using Mask R-CNN (30). The recognition module was upgraded with an enhanced Inception V3 model, enabling more effective hierarchical food recognition. While GoCARB was designed primarily for carbohydrate calculation, goFOOD™ expands its functionality to estimate the calories and nutritional content of entire meals.

Recent advancements in image segmentation have introduced powerful methods like the Segment Anything Model (SAM) (31), which has gained significant attention for its versatility and accuracy. SAM, developed by Meta AI, is designed to handle a wide range of segmentation tasks with minimal fine-tuning, making it particularly useful for applications requiring high adaptability to diverse data types. Unlike traditional segmentation models that often require extensive training on specific datasets, SAM leverages prompt engineering to perform zero-shot segmentation across various domains, including medical imaging, object detection, and food image analysis. Its ability to generalize well across different tasks has set a new benchmark in segmentation accuracy and efficiency, outperforming earlier models in terms of both speed and precision.
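As a hedged sketch of how SAM could be dropped into a food-image pipeline, the code below uses the official segment-anything package with a locally downloaded ViT-B checkpoint to propose candidate regions on a plate photo; the image path and the choice to keep the five largest regions are illustrative assumptions.

```python
import cv2
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

# Load a SAM checkpoint downloaded from the official repository.
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects RGB input; OpenCV loads BGR, hence the conversion.
image = cv2.cvtColor(cv2.imread("plate.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # one dict per proposed region

# Keep the largest regions as candidate food items for downstream classification.
candidates = sorted(masks, key=lambda m: m["area"], reverse=True)[:5]
print(f"{len(candidates)} candidate food regions proposed")
```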

By employing these segmentation techniques, food recognition applications can enhance their ability to provide accurate dietary assessments, offering users more reliable insights into their food intake. As the field continues to evolve, it is expected that further advancements in segmentation algorithms, particularly those powered by deep learning, will continue to improve the precision and usability of dietary assessment tools.

3.3 Image classification

Food image classification is a critical step in many food assessment applications, where the goal is to accurately identify and categorize food items from images. This process typically involves two main components: feature extraction and classification. Feature extraction involves identifying and quantifying the relevant attributes of an image, such as color, texture, and shape, which can then be used to distinguish different types of food. Classification refers to the process of assigning a label to the image based on these extracted features, determining the specific food item or category.

Traditional machine learning approaches to food image classification rely on manually engineered features and classical classifiers. In these methods, the feature extraction process involves using techniques such as edge detection, color histograms, and texture analysis to represent the image in a feature space. Once the features are extracted, classifiers like Support Vector Machines (SVM), k-Nearest Neighbors (k-NN), and Random Forests are employed to categorize the food items. These approaches require careful selection and design of features, which can be a time-consuming process, and often struggle with the variability and complexity of food images. The performance of traditional methods is also heavily dependent on the quality and relevance of the extracted features.
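The sketch below illustrates this traditional recipe, pairing a coarse color histogram and a HOG texture descriptor with an SVM (assuming scikit-image 0.19+ and scikit-learn; the bin counts, HOG settings, and SVM hyperparameters are illustrative choices rather than values from any reviewed system).

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def handcrafted_features(image_rgb):
    """Concatenate a coarse RGB color histogram with HOG texture/shape features."""
    img = resize(image_rgb, (128, 128), anti_aliasing=True)  # floats in [0, 1]
    color_hist, _ = np.histogramdd(img.reshape(-1, 3), bins=(8, 8, 8),
                                   range=((0, 1), (0, 1), (0, 1)))
    texture = hog(img, pixels_per_cell=(16, 16), cells_per_block=(2, 2),
                  channel_axis=-1)
    return np.concatenate([color_hist.ravel() / color_hist.sum(), texture])

# Illustrative training call, given images X (list of RGB arrays) and labels y:
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10))
# clf.fit([handcrafted_features(img) for img in X], y)
```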

In contrast, deep learning approaches have revolutionized food image classification by automating the feature extraction process using convolutional neural networks (CNNs). CNNs can learn hierarchical features directly from the raw pixel data, capturing intricate patterns and relationships within the image that are often difficult to detect with traditional methods. This ability to learn from data has led to significant improvements in classification accuracy, particularly for complex and diverse food items. Deep learning models, such as those based on CNN architectures like AlexNet, ResNet, and Inception, are capable of handling large-scale datasets and can generalize well to new, unseen food items. These models have become the standard in food image classification, outperforming traditional approaches in both accuracy and scalability.

Table 4 highlights how food assessment applications employ diverse classification methods, from traditional machine learning to advanced deep learning, each tailored to specific tasks. PlateMate utilizes a manual, user-driven approach where food items are described and matched to a database, relying on crowdsourced voting to refine classification accuracy. This method, while interactive, is heavily dependent on user input, which may limit scalability and consistency.

Table 4. Classification strategies employed in food image processing applications.

In contrast, FoodLog and Snap-n-Eat employ more automated methods, using global image features and support vector machines (SVM) for classification. FoodLog combines block-wise analysis with global features like color histograms and Bag of Features (BoF), whereas Snap-n-Eat focuses on texture and shape information using HOG and SIFT descriptors, further enhanced by Fisher Vector encoding. These methods are more efficient but may struggle with complex food images where handcrafted features are insufficient to capture the necessary detail.

Deep learning approaches have significantly advanced the field of food image classification. For example, Im2Calories uses a CNN-based multi-label classifier, fine-tuned on large food datasets, to handle multiple food items in a single image. This method exemplifies the power of deep learning in capturing intricate patterns within food images, allowing for more accurate and scalable classification.

NU-InNet and Food Tracker further illustrate the effectiveness of deep learning, with architectures specifically optimized for mobile devices. NU-InNet modifies the inception modules from GoogLeNet to balance processing time with accuracy, while Food Tracker uses a deep convolutional neural network (DCNN) based on MobileNet and YOLOv2, achieving impressive performance with minimal computational cost.

DietLens and MyDietCam leverage transfer learning, using pre-trained models like ResNet-50 and DenseNet201 to extract features, which are then classified using either traditional SVMs or innovative methods like ARCIKELM for adaptive learning. These approaches demonstrate how deep learning models, pre-trained on extensive datasets like ImageNet, can be adapted to specific food classification tasks with high accuracy.
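A minimal sketch of this transfer-learning pattern follows: an ImageNet-pretrained ResNet-50 with its classification head removed serves as a frozen feature extractor, and the resulting 2048-dimensional embeddings feed a linear SVM. This mirrors the general recipe rather than the exact configurations of DietLens or MyDietCam, and assumes torchvision 0.13+ and scikit-learn.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import LinearSVC

weights = models.ResNet50_Weights.IMAGENET1K_V2
backbone = models.resnet50(weights=weights)
backbone.fc = nn.Identity()   # drop the ImageNet head; keep 2048-d pooled features
backbone.eval()
preprocess = weights.transforms()

@torch.no_grad()
def embed(pil_images):
    """Map a list of PIL images to fixed-length feature vectors."""
    batch = torch.stack([preprocess(im) for im in pil_images])
    return backbone(batch).numpy()

# Illustrative usage, given PIL images and labels from one of the datasets in Table 2:
# svm = LinearSVC().fit(embed(train_images), train_labels)
# predictions = svm.predict(embed(test_images))
```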

Finally, applications like goFOOD™ and DeepFood highlight the utility of hierarchical classification and CNNs in handling fine-grained food categories. goFOOD™ employs a hierarchical classification scheme using an Inception V3 model to recognize food items at different levels of granularity, from broad categories to specific dishes. DeepFood utilizes the VGG-16 model, which combines region-based feature extraction with bounding box regression, ensuring precise classification even in complex food images.

One of the most recent classification approaches for food recognition involves the use of Vision Transformers (ViTs) (32) and Self-Supervised Learning (SSL) techniques (33). Vision Transformers adapt the transformer architecture, originally developed for natural language processing, to image classification (32) and are gaining popularity due to their ability to capture long-range dependencies and global image context more effectively than traditional convolutional neural networks (CNNs). Self-Supervised Learning, on the other hand, leverages large amounts of unlabeled data to pre-train models, which are then fine-tuned on specific food datasets. This approach reduces the reliance on labeled data, which is often scarce in food recognition tasks, and improves the generalization capabilities of the model across different food categories.
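As a hedged example of adapting a ViT to food recognition, the sketch below loads an ImageNet-pretrained ViT-B/16 from torchvision, replaces its head with a 101-class output (matching Food-101), and trains only the new head; full fine-tuning or an SSL-pretrained backbone would follow the same outline. The learning rate and freezing strategy are illustrative choices.

```python
import torch
from torchvision import models

# ImageNet-pretrained Vision Transformer (ViT-B/16), re-headed for 101 food classes.
vit = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
vit.heads.head = torch.nn.Linear(vit.heads.head.in_features, 101)

# Freeze the transformer encoder and train only the new classification head.
for p in vit.parameters():
    p.requires_grad = False
for p in vit.heads.head.parameters():
    p.requires_grad = True

optimizer = torch.optim.AdamW(vit.heads.head.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
# Training loop (images/labels would come from a loader such as the Food-101 one above):
# for images, labels in train_loader:
#     optimizer.zero_grad()
#     loss = criterion(vit(images), labels)
#     loss.backward(); optimizer.step()
```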

Overall, the trend in food image classification is moving towards deep learning-based methods, which offer superior accuracy, scalability, and the ability to handle complex and diverse food images with minimal manual intervention. These methods have set a new benchmark in the field, outperforming traditional machine learning approaches, especially in terms of efficiency and adaptability to new data.

3.4 Volume estimation

Image volume estimation is a critical aspect of food recognition systems, particularly in dietary assessment applications. Accurate volume estimation allows these systems to determine the portion sizes of food items, which is essential for calculating nutritional intake, including calories, macronutrients, and micronutrients. The challenge in estimating food volume from images lies in the inherent variability in food presentation, such as different shapes, sizes, and textures, as well as varying camera angles and lighting conditions.

Several methods have been developed to estimate food volume from images, ranging from traditional geometric approaches to advanced machine learning techniques. Geometric methods typically involve using reference objects (like a standard-sized plate or utensil) to scale the food item in the image, enabling volume calculations based on known shapes (e.g., spheres, cylinders). On the other hand, machine learning approaches often leverage deep learning models trained on large datasets of food images to estimate volume directly from pixel data.
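A minimal sketch of the reference-object idea follows: the known diameter of the plate converts pixel measurements into centimetres, and the segmented food region is approximated as a flat cylinder. The plate diameter and the assumed food height are illustrative values; real systems replace such fixed assumptions with shape templates, multi-view geometry, or learned depth.

```python
def cm_per_pixel(plate_diameter_px: float, plate_diameter_cm: float = 26.0) -> float:
    """Scale factor derived from a reference object of known physical size."""
    return plate_diameter_cm / plate_diameter_px

def cylinder_volume_cm3(food_area_px: float, plate_diameter_px: float,
                        assumed_height_cm: float = 2.0) -> float:
    """Approximate a food mound as a flat cylinder: base area x assumed height."""
    scale = cm_per_pixel(plate_diameter_px)
    area_cm2 = food_area_px * scale ** 2
    return area_cm2 * assumed_height_cm

# Example: a segmented rice region covering 40,000 pixels on a plate spanning 800 pixels.
print(f"{cylinder_volume_cm3(40_000, 800):.0f} cm^3")  # ~84 cm^3 under these assumptions
```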

Table 5 highlights the diversity of methods used for volume estimation in food recognition applications.

Table 5. Volume estimation strategies employed in 14 food image processing applications.

The volume estimation methods across these food assessment applications vary significantly in complexity and accuracy. PlateMate and FoodLog utilize crowd-sourced input and Bayesian personalization, respectively, which can improve accuracy but are dependent on user input and initial classification quality. Snap-n-Eat and DietLens use simpler techniques like pixel counting and reference images, respectively, making them user-friendly but potentially less precise. Menu-Match avoids direct volume estimation by relying on predefined data, which simplifies the process but limits its applicability to custom meals. Im2Calories and GoCARB employ advanced 3D reconstruction and pose estimation methods, providing high accuracy but requiring complex setups and multiple images. FoodCam relies on user input for volume estimation, which can be inconsistent. NU-InNet does not address volume estimation, focusing instead on food recognition accuracy. MyDietCam lacks specific details on its volume estimation method. goFOOD™ combines 3D reconstruction with gravity data for improved accuracy, though it requires additional hardware (sensor-assisted volume estimation). DeepFood estimates nutritional content without specific volume estimation, assuming standard portion sizes. Finally, mobile food record (mFR) uses a combination of geometric models and deep learning for portion size estimation, offering a sophisticated approach but demanding significant computational resources. Overall, methods like Im2Calories and GoCARB are superior in terms of accuracy due to their advanced techniques, while applications like NU-InNet and MyDietCam fall short by not presenting specific volume estimation methods.

One of the recent advancements in volume estimation processes is FOODCAM 2022 (34), an imaging-based method specifically designed for food portion size estimation (FPSE). It employs a novel capturing device that delivers greater accuracy compared to traditional methods. The system integrates a stereo camera, PIR sensor, and infrared projector, enabling precise meal portion size estimation. FOODCAM was primarily designed for monitoring food intake and cooking activities in kitchen and dining environments.

Since the focus of FOODCAM 2022 is exclusively on portion size estimation, it was excluded from our discussion on food recognition applications. It is important to note that FOODCAM 2022 is distinct from the FoodCam calorie-counting application developed in 2015, which was included in the tables. While both share the name “FoodCam,” the FOODCAM 2022 device is dedicated to volume estimation, whereas the FoodCam application developed in 2015 focuses on calorie tracking.

Accurate volume estimation in most of these apps relies on user input for near-perfect estimation, which can be prone to human measurement inaccuracies. A parallel challenge arises in automated food volume estimation, where overfitting becomes a critical issue, especially when models are trained on narrow or non-representative datasets that fail to capture the full diversity of real-world conditions. For instance, many existing volume estimation methods inadvertently memorize dataset-specific artifacts (e.g., background noise, lighting conditions, or food presentation styles) rather than learning generalizable features. This over-reliance on training data idiosyncrasies results in poor performance when deployed in variable, uncontrolled environments, compromising their accuracy in practical applications.

4 Ethical and privacy concerns

The ethical implications of AI-driven dietary monitoring tools are significant, particularly in the areas of data privacy, potential biases, and transparency. Sensitive dietary and health data collected by these applications must comply with stringent data protection frameworks, such as the General Data Protection Regulation (GDPR), which emphasize anonymization, secure storage, and encryption to mitigate the risks of misuse. Furthermore, biases stemming from imbalanced datasets, which may inadequately represent diverse cultural and demographic contexts, pose challenges to model accuracy and fairness. To address this, the diversification of datasets and the implementation of regular fairness audits are essential for achieving equitable outcomes. Transparency and accountability also play a vital role; explainable AI (XAI) techniques can elucidate model decision-making processes, fostering user trust and confidence in technology. As highlighted in recent literature (35), adopting comprehensive ethical guidelines and promoting collaboration among stakeholders are critical steps in addressing risks. These efforts, combined with robust audit mechanisms and user education programs, ensure the responsible development and deployment of AI-driven dietary monitoring tools.

5 App store availability, user ratings, and performance metrics

To gauge real-world user adoption and satisfaction, we conducted a search for these applications in major consumer app stores (Google Play and Apple’s App Store). However, we found that most of the listed calorie-counting solutions are academic prototypes or research-oriented tools rather than commercial products. Consequently, they are not publicly available in mainstream app stores, and official user ratings are unavailable. Some studies [e.g., Snap-n-Eat (36), FoodLog (37)] do discuss small-scale pilot tests or user acceptance evaluations within controlled research settings, but these do not constitute widespread consumer feedback akin to star ratings or download counts. Where a limited pilot or spin-off was mentioned (e.g., DietLens (38)), we could not locate any corresponding listing under the same app name. These findings highlight the research-focused nature of most solutions, emphasizing the need for future work on broader deployment, real-world user engagement, and the potential transition to public app store availability.

Our primary emphasis, however, is on the computer science methodologies underpinning these applications. By examining their underlying algorithms, we can better determine how effectively each solution tackles existing challenges, such as accurate portion estimation and real-time processing. Focusing on these methodological aspects allows us to identify limitations and highlight advantages relevant to the future design and deployment of calorie-counting applications.

Table 6 summarizes the reported performance metrics for each calorie-counting or food recognition application, as documented in their respective publications. Whenever possible, we include classification accuracy, mean error rates, or other relevant statistics. Despite differences in datasets and validation protocols, the reported accuracy and error rates provide insight into how well each application addresses calorie estimation or food recognition. For instance, PlateMate overestimates caloric content by +7.4%, which is close to the +5.5% error reported for a trained dietitian in the same study (39). Classification-based approaches generally report accuracy metrics in the 70–90% range, but they are measured on diverse datasets (e.g., Food-101, UEC-Food256), making direct comparisons challenging. Some solutions, such as Menu-Match and DeepFood, achieve relatively high Top-5 accuracies of over 90% on specific datasets. Others, like FoodCam, show a lower Top-1 accuracy (around 50%), yet still improve markedly with Top-5 predictions (74.4%). Meanwhile, MyDietCam reports strong results of over 80% across multiple datasets, and even reaches 100% accuracy under certain controlled conditions (PFID dataset). Overall, although performance varies by dataset and study design, these figures highlight the progress in automated food recognition and the ongoing need to refine algorithms for real-world calorie counting applications.
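For readers comparing the Top-1 and Top-5 figures above, the sketch below shows how these metrics are typically computed from a matrix of per-class scores; the random scores over 101 classes are purely illustrative and should produce accuracies near chance (about 1% and 5%).

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """scores: (N, C) class scores; labels: (N,) ground-truth class indices."""
    top_k = np.argsort(scores, axis=1)[:, -k:]   # indices of the k highest-scoring classes
    hits = [label in row for row, label in zip(top_k, labels)]
    return float(np.mean(hits))

rng = np.random.default_rng(0)
scores = rng.random((1000, 101))                 # e.g., class scores for 101 food classes
labels = rng.integers(0, 101, size=1000)
print("Top-1:", top_k_accuracy(scores, labels, k=1),
      "Top-5:", top_k_accuracy(scores, labels, k=5))
```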

Table 6. Reported performance metrics in 14 food image processing applications.

6 Key findings and remaining challenges

This review sheds light on key aspects of image-based food monitoring systems, focusing on segmentation, classification, and volume estimation techniques used in calorie-counting applications. It highlights recent advancements in deep learning and image processing that have significantly improved the accuracy of food recognition and dietary intake estimation. However, challenges such as standardization, validation across diverse food types, and privacy concerns remain significant hurdles.

As technology continues to advance, the integration of machine learning and computer vision techniques is expected to further enhance the accuracy and reliability of food volume estimation in dietary assessment applications. These advancements will contribute to the development of more sophisticated tools for personalized nutrition and health management, providing users with detailed insights into their dietary habits and improving overall outcomes. Despite this progress, the review identifies that many applications still rely on user input and suffer from inconsistencies in image quality, highlighting the need for continued innovation.

To address these challenges, this review presents the following recommendations aimed at enhancing digital remote healthcare for patients with diabetes and weight-related chronic diseases:

• Develop robust, standardized algorithms capable of handling a wide range of food types and presentation styles.

• Enhance volume estimation techniques to minimize reliance on user input and reduce the need for specialized hardware, thereby improving the accessibility and accuracy of next-generation calorie-counting systems.

• Prioritize secure, privacy-preserving methods for managing sensitive dietary and health-related data.

• Collaborate with healthcare professionals to ensure tools provide effective support for patient education, health behavior monitoring, and personalized dietary recommendations.

• Collaborate with patients living with T2D to ensure feasibility, acceptability, ease of navigation, and appropriateness of tools and supportive material, including education and support.

6.1 Future opportunities in emerging technologies

As AI-based architecture continues to evolve, new opportunities arise to enhance the accuracy, scalability, and usability of calorie-counting applications while maintaining user-friendliness. For instance, 3D sensing technologies, such as LiDAR sensors integrated into Apple smartphones, can be leveraged to improve the precision of volume estimation modules. These sensors provide high-resolution depth information, enabling more reliable food volume assessments. Additionally, advanced computational approaches for 3D reconstruction, for instance, monocular depth estimation (40), offer promising alternatives by generating detailed three-dimensional representations from single or multiple images. Incorporating such techniques can enhance model robustness across diverse food types and real-world conditions, further improving the reliability of dietary assessment tools.
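As a hedged illustration of the monocular route, the sketch below obtains a relative depth map from a single plate photo using the publicly available MiDaS model via torch.hub (the model variant and the image path are illustrative; weights are downloaded on first use). The resulting map is relative rather than metric, so a scale reference such as a plate, a card, or LiDAR data is still needed before converting it into absolute food volume.

```python
import cv2
import torch

# Lightweight MiDaS model for monocular depth estimation.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("plate.jpg"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    depth = midas(transform(img)).squeeze().numpy()   # relative (unitless) depth map

print(depth.shape, depth.min(), depth.max())
```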

To address the generalizability issues affecting most current calorie-counting applications, federated learning (41) can be explored as a promising approach. By enabling image classification and volume estimation models to be trained collaboratively across multiple user devices while keeping data local, federated learning enhances the robustness of the global model. This approach not only improves generalization across diverse food types and real-world conditions but also allows for personalization without compromising user privacy. Additionally, it reduces the risk of data centralization vulnerabilities while leveraging distributed computing to adapt models to individual dietary habits and variations in food presentation.
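A minimal sketch of the central aggregation step (FedAvg) is shown below: each client fine-tunes the shared model on its own food photos and uploads only the resulting parameters, which the server averages in proportion to local dataset size. This assumes a PyTorch model with floating-point parameters; secure aggregation, client sampling, and differential privacy are omitted for brevity.

```python
import copy
import torch

def federated_average(global_model: torch.nn.Module,
                      client_states: list, client_sizes: list) -> torch.nn.Module:
    """FedAvg: weight each client's state_dict by its local dataset size."""
    total = float(sum(client_sizes))
    avg_state = copy.deepcopy(client_states[0])
    for key in avg_state:
        avg_state[key] = sum(state[key].float() * (n / total)
                             for state, n in zip(client_states, client_sizes))
    global_model.load_state_dict(avg_state)
    return global_model

# Each round: clients train locally on their own meal photos and share only
# model parameters (never images); the server calls federated_average(...)
# and broadcasts the updated global model back to the clients.
```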

The challenges and limitations identified in this research, along with the proposed future directions, pave the way for the development of a new generation of calorie-counting applications. These next-generation applications can leverage cutting-edge techniques to address key issues such as real-time performance, accuracy across diverse dietary scenarios, and user-specific personalization. By overcoming these limitations, future calorie-counting solutions will become more reliable, user-friendly, and seamlessly integrated into clinical settings, ultimately supporting dietary management and chronic disease prevention.

7 Conclusion

In conclusion, this review underscores the critical role of advanced computer science techniques in enhancing the accuracy and effectiveness of calorie-counting applications. By evaluating the methodologies employed in existing systems, we have identified both their strengths and areas in need of improvement. This work represents a significant step forward in the field, contributing to the ongoing evolution of digital health tools aimed at better managing dietary intake. The insights gained from this review will guide the development of a new, state-of-the-art calorie-counting app designed to address the existing challenges and provide more reliable, personalized dietary assessments to support patients living with T2D and their clinicians in the international fight against T2D.

Author contributions

AR: Conceptualization, Data curation, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing. GR: Conceptualization, Methodology, Resources, Supervision, Validation, Visualization, Writing – review & editing. JJ: Conceptualization, Methodology, Resources, Supervision, Validation, Visualization, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. This study was supported through grant programs allocated to Dr Jbilou from ResearchNB-Strategic Initiative Grant (SIG_2025_014) and the PLEIADE Grant from Université de Moncton. This work was also supported by ResearchNB’s Talent Recruitment Fund program under Application No. TRF-0000000170 allocated to Dr G. Rouhafzay. Special thanks go to Centre de Formation Médicale du Nouveau-Brunswick (CFMNB) for the Knowledge Transfer Grant provided to Dr Jbilou in support for this publication.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. International Diabetes Federation (2021). IDF Diabetes Atlas, 10th edition. Brussels, Belgium. Available online at: https://diabetesatlas.org (Accessed August 21, 2024).

2. Awuchi, CG, Echeta, CK, and Igwe, VS. Diabetes and the nutrition and diets for its prevention and treatment: a systematic review and dietetic perspective. Health Sci Res. (2020) 6:05–19.

3. Tosi, M, Radice, D, Carioni, G, Vecchiati, T, Fiori, F, Parpinel, M, et al. Accuracy of applications to monitor food intake: evaluation by comparison with 3-d food diary. Nutrition. (2021) 84:111018. doi: 10.1016/j.nut.2020.111018

4. Connor, S. Underreporting of dietary intake: key issues for weight management clinicians. Curr Cardiovasc Risk Rep. (2020) 14:16. doi: 10.1007/s12170-020-00652-6

5. Ciocca, G, Napoletano, P, and Schettini, R. Food recognition: a new dataset, experiments, and results. IEEE J Biomed Health Inform. (2017) 21:588–98. doi: 10.1109/JBHI.2016.2636441

6. Pouladzadeh, P, Kuhad, P, Peddi, SVB, Yassine, A, and Shirmohammadi, S. (2016). “Food calorie measurement using deep learning neural network.” in 2016 IEEE International Instrumentation and Measurement Technology Conference Proceedings. IEEE. pp. 1–6.

7. Myers, A, Johnston, N, Rathod, V, Korattikara, A, Gorban, A, Silberman, N, et al. (2015). “Im2Calories: towards an automated Mobile vision food diary.” in 2015 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 1233–1241.

8. Topol, EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. (2019) 25:44–56. doi: 10.1038/s41591-018-0300-7

9. Bossard, L, Guillaumin, M, and Van Gool, L. Food-101 – Mining Discriminative Components with Random Forests In: D Fleet, T Pajdla, B Schiele, and T Tuytelaars, editors. Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8694. Cham: Springer (2014). 446–61.

10. Kawano, Y, and Yanai, K. (2014). “FoodCam-256.” in Proceedings of the 22nd ACM international conference on multimedia. New York, NY, USA: ACM. pp. 761–762.

11. Gungor, C, Baltaci, F, Erdem, A, and Erdem, E. (2017). “Turkish cuisine: A benchmark dataset with Turkish meals for food recognition.” in 2017 25th signal processing and communications applications conference (SIU). IEEE. pp. 1–4.

12. Okamoto, K, and Yanai, K. UEC-FoodPix Complete: A Large-Scale Food Image Segmentation Dataset In: A Del Bimbo et al., editors. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol 12665. Cham: Springer (2021). 647–59.

13. Ege, T, Shimoda, W, and Yanai, K. (2019). “A New large-scale food image segmentation dataset and its application to food calorie estimation based on grains of Rice.” in Proceedings of the 5th International Workshop on Multimedia Assisted Dietary Management. New York, NY, USA: ACM. pp. 82–87.

14. Gao, J, Tan, W, Ma, L, Wang, Y, and Tang, W. (2019). “MaUSEFood: Multi-Sensor-Based Food Volume Estimation on Smartphones.” in 2019 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). IEEE. pp. 899–906.

15. Yu, Q, Anzawa, M, Amano, S, Ogawa, M, and Aizawa, K. (2018). “Food image recognition by personalized classifier.” in 2018 25th IEEE international conference on image processing (ICIP). IEEE. pp. 171–175.

16. Matsuda, Y, and Yanai, K. (2012). “Multiple-food recognition considering co-occurrence employing manifold ranking.” in The 21st International Conference on Pattern Recognition (ICPR2012). pp. 2017–2020.

17. Hoashi, H, Joutou, T, and Yanai, K. (2010). “Image recognition of 85 food categories by feature fusion.” in 2010 IEEE international symposium on multimedia. IEEE. pp. 296–301.

18. Anthimopoulos, MM, Gianola, L, Scarnato, L, Diem, P, and Mougiakakou, SG. A food recognition system for diabetic patients based on an optimized bag-of-features model. IEEE J Biomed Health Inform. (2014) 18:1261–71. doi: 10.1109/JBHI.2014.2308928

19. Miyazaki, T, De Silva, GC, and Aizawa, K. (2011). “Image-based calorie content estimation for dietary assessment.” in 2011 IEEE International Symposium on Multimedia. IEEE. pp. 363–368.

20. Rich, J, Haddadi, H, and Hospedales, TM. (2016). “Towards bottom-up analysis of social food.” in Proceedings of the 6th International Conference on Digital Health Conference. New York, NY, USA: ACM. pp. 111–120.

21. Kaur, P, Sikka, K, Wang, W, Belongie, S, and Divakaran, A. (2019). FoodX-251: A dataset for fine-grained food classification.

22. Mureşan, H, and Oltean, M. Fruit recognition from images using deep learning. Acta Univ Sapientiae Inform. (2018) 10:26–42. doi: 10.2478/ausi-2018-0002

23. Chen, J, and Ngo, C. (2016). “Deep-based ingredient recognition for cooking recipe retrieval.” Proceedings of the 24th ACM International Conference on Multimedia. New York, NY, USA: ACM. pp. 32–41.

24. Mezgec, S, and Koroušić, SB. NutriNet: a deep learning food and drink image recognition system for dietary assessment. Nutrients. (2017) 9:657. doi: 10.3390/nu9070657

25. Chen, C-S, Chen, G-Y, Zhou, D, Jiang, D, and Chen, D-S. (2024). Res-VMamba: Fine-grained food category visual classification using selective state space models with deep residual learning.

26. Romero-Tapiador, S, Lacruz-Pleguezuelos, B, Tolosana, R, Freixer, G, Daza, R, Fernández-Díaz, CM, et al. AI4FoodDB: a database for personalized e-health nutrition and lifestyle through wearable devices and artificial intelligence. Database. (2023) 2023:49. doi: 10.1093/database/baad049

27. Mohanty, SP, Singhal, G, Scuccimarra, EA, Kebaili, D, Héritier, H, Boulanger, V, et al. The food recognition benchmark: using deep learning to recognize food in images. Front Nutr. (2022) 9:9. doi: 10.3389/fnut.2022.875143

28. Silberman, N, Hoiem, D, Kohli, P, and Fergus, R. (2012). “Indoor segmentation and support inference from RGBD images.” in Computer Vision – ECCV 2012. Lecture Notes in Computer Science. Springer. pp. 746–760.

29. Lu, Y, Stathopoulou, T, Vasiloglou, MF, Pinault, LF, Kiley, C, Spanakis, EK, et al. goFOODTM: an artificial intelligence system for dietary assessment. Sensors. (2020) 20:4283. doi: 10.3390/s20154283

30. He, K, Gkioxari, G, Dollar, P, and Girshick, R. (2017). “Mask R-CNN.” in 2017 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 2980–2988.

31. Kirillov, A, Mintun, E, Ravi, N, Mao, H, Rolland, C, Gustafson, L, et al. (2023). “Segment anything.” in 2023 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE. pp. 3992–4003.

32. Dosovitskiy, A, Beyer, L, Kolesnikov, A, Weissenborn, D, Zhai, X, Unterthiner, T, et al. (2020). An image is worth 16x16 words: Transformers for image recognition at scale.

33. Balestriero, R, Ibrahim, M, Sobal, V, Morcos, A, Shekhar, S, Goldstein, T, et al. (2023). A cookbook of self-supervised learning.

34. Raju, VB, and Sazonov, E. FOODCAM: a novel structured light-stereo imaging system for food portion size estimation. Sensors. (2022) 22:3300. doi: 10.3390/s22093300

35. Detopoulou, P, Voulgaridou, G, Moschos, P, Levidi, D, Anastasiou, T, Dedes, V, et al. Artificial intelligence, nutrition, and ethical issues: a mini-review. Clin Nutr Open Sci. (2023) 50:46–56. doi: 10.1016/j.nutos.2023.07.001

36. Zhang, W, Yu, Q, Siddiquie, B, Divakaran, A, and Sawhney, H. Snap-n-Eat. J Diabetes Sci Technol. (2015) 9:525–33. doi: 10.1177/1932296815582222

37. Aizawa, K, Maruyama, Y, Li, H, and Morikawa, C. Food balance estimation by using personal dietary tendencies in a multimedia food log. IEEE Trans Multimedia. (2013) 15:2176–85. doi: 10.1109/TMM.2013.2271474

38. Ming, Z-Y, Chen, J, Cao, Y, Forde, C, Ngo, C-W, and Chua, TS. Food Photo Recognition for Dietary Tracking: System and Experiment In: K Schoeffmann, editor. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10705. Cham: Springer (2018). 129–41.

39. Noronha, J, Hysen, E, Zhang, H, and Gajos, KZ. (2011). “Platemate.” Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology. New York, NY, USA: ACM. pp. 1–12.

40. Rajapaksha, U, Sohel, F, Laga, H, Diepeveen, D, and Bennamoun, M. Deep learning-based depth estimation methods from monocular image and videos: a comprehensive survey. ACM Comput Surv. (2024) 56:1–51. doi: 10.1145/3677327

41. Yurdem, B, Kuzlu, M, Gullu, MK, Catak, FO, and Tabassum, M. Federated learning: overview, strategies, applications, tools and future directions. Heliyon. (2024) 10:e38137. doi: 10.1016/j.heliyon.2024.e38137

42. Beijbom, O, Joshi, N, Morris, D, Saponas, S, and Khullar, S. (2015). “Menu-match: restaurant-specific food logging from images.” in 2015 IEEE winter conference on applications of computer vision. IEEE. pp. 844–851.

43. Kawano, Y, and Yanai, K. (2014). FoodCam: A real-time Mobile food recognition system employing fisher vector. pp. 369–373.

44. Rhyner, D, Loher, H, Dehais, J, Anthimopoulos, M, Shevchik, S, Botwey, RH, et al. Carbohydrate estimation by a Mobile phone-based system versus self-estimations of individuals with type 1 diabetes mellitus: a comparative study. J Med Internet Res. (2016) 18:e101. doi: 10.2196/jmir.5567

45. Anthimopoulos, M, Dehais, J, Shevchik, S, Ransford, BH, Duke, D, Diem, P, et al. Computer vision-based carbohydrate estimation for type 1 patients with diabetes using smartphones. J Diabetes Sci Technol. (2015) 9:507–15. doi: 10.1177/1932296815580159

46. Termritthikun, C, Muneesawang, P, and Kanprachar, S. NU-InNet: Thai food image recognition using convolutional neural networks on smartphone. J Telecommun Elect Comput Eng. (2017) 9:63–7.

47. Sun, J, Radecka, K, and Zilic, Z. (2019). FoodTracker: A real-time food detection Mobile application by deep convolutional neural networks.

48. Tahir, GA, and Loo, CK. An open-ended continual learning for food recognition using class incremental extreme learning machines. IEEE Access. (2020) 8:82328–46. doi: 10.1109/ACCESS.2020.2991810

49. Jiang, L, Qiu, B, Liu, X, Huang, C, and Lin, K. DeepFood: food image analysis and dietary assessment via deep model. IEEE Access. (2020) 8:47477–89. doi: 10.1109/ACCESS.2020.2973625

50. Shao, Z, Han, Y, He, J, Mao, R, Wright, J, Kerr, D, et al. (2021). An integrated system for mobile image-based dietary assessment.

51. Godwin, S, Chambers, E, Cleveland, L, and Ingwersen, L. A new portion size estimation aid for wedge-shaped foods. J Am Diet Assoc. (2006) 106:1246–50. doi: 10.1016/j.jada.2006.05.006

52. Chen, M, Dhingra, K, Wu, W, Yang, L, Sukthankar, R, and Yang, J. (2009). “PFID: Pittsburgh fast-food image dataset.” in 2009 16th IEEE International Conference on Image Processing (ICIP). IEEE. pp. 289–292.

53. Mariappan, A, Bosch, M, Zhu, F, Boushey, CJ, Kerr, DA, Ebert, DS, et al. Personal dietary assessment using mobile devices. Proc SPIE Int Soc Opt Eng. (2009) 7246:72460Z. doi: 10.1117/12.813556

54. Bosch, M, Schap, T, Zhu, F, Khanna, N, Boushey, CJ, and Delp, EJ. (2011). “Integrated database system for mobile dietary assessment and analysis.” in 2011 IEEE International Conference on Multimedia and Expo. IEEE. pp. 1–6.

55. Chen, M-Y, Yang, Y-H, Ho, C-J, Wang, S-H, Liu, S-M, Chang, E, et al. (2012). “Automatic Chinese food identification and quantity estimation.” in SIGGRAPH Asia 2012 technical briefs. New York, NY, USA: ACM. pp. 1–4.

56. Stutz, T, Dinic, R, Domhardt, M, and Ginzinger, S. (2014). “Can mobile augmented reality systems assist in portion estimation? A user study.” in 2014 IEEE international symposium on mixed and augmented reality - media, art, social science, humanities and design (ISMAR-MASH’D). IEEE. pp. 51–57.

57. Farinella, GM, Allegra, D, and Stanco, F. A Benchmark Dataset to Study the Representation of Food Images In: L Agapito, M Bronstein, and C Rother, editors. Computer Vision - ECCV 2014 Workshops. ECCV 2014. Lecture Notes in Computer Science, vol 8927. Cham: Springer (2015). 584–99.

58. Wang, X, Kumar, D, Thome, N, Cord, M, and Precioso, F. (2015). “Recipe recognition with large multimodal food dataset.” in 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). IEEE. pp. 1–6.

59. Ciocca, G, Napoletano, P, and Schettini, R. Food Recognition and Leftover Estimation for Daily Diet Monitoring In: V Murino, E Puppo, D Sona, M Cristani, and C Sansone, editors. New Trends in Image Analysis and Processing -- ICIAP 2015 Workshops. ICIAP 2015. Lecture Notes in Computer Science, vol 9281. Cham: Springer (2015). 334–41.

60. Fang, S, Liu, C, Zhu, F, Delp, EJ, and Boushey, CJ. (2015). “Single-view food portion estimation based on geometric models.” in 2015 IEEE International Symposium on Multimedia (ISM). IEEE. pp. 385–390.

61. Herranz, L, Xu, R, and Jiang, S. (2015). “A probabilistic model for food image recognition in restaurants.” in 2015 IEEE International Conference on Multimedia and Expo (ICME). IEEE. pp. 1–6.

62. Zhou, F, and Lin, Y. (2016). “Fine-grained image classification by exploring bipartite-graph labels.” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. pp. 1124–1133.

63. Bolanos, M, and Radeva, P. (2016). “Simultaneous food localization and recognition.” in 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE. pp. 3140–3145.

64. Wu, H, Merler, M, Uceda-Sosa, R, and Smith, JR. (2016). “Learning to make better mistakes.” in Proceedings of the 24th ACM International Conference on Multimedia. New York, NY, USA: ACM. pp. 172–176.

65. Singla, A, Yuan, L, and Ebrahimi, T. (2016). “Food/non-food image classification and food categorization using pre-trained GoogLeNet model.” in Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management. New York, NY, USA: ACM. pp. 3–11.

66. Farinella, GM, Allegra, D, Moltisanti, M, Stanco, F, and Battiato, S. Retrieval and classification of food images. Comput Biol Med. (2016) 77:23–39. doi: 10.1016/j.compbiomed.2016.07.006

67. Liang, Y, and Li, J. (2017). Computer vision-based food calorie estimation: dataset, method, and experiment.

68. Pandey, P, Deepthi, A, Mandal, B, and Puhan, NB. FoodNet: recognizing foods using ensemble of deep networks. IEEE Signal Process Lett. (2017) 24:1758–62. doi: 10.1109/LSP.2017.2758862

69. Termritthikun, C, Kanprachar, S, and Muneesawang, P. NU-LiteNet: mobile landmark recognition using convolutional neural networks. ECTI Trans Comp Inform Technol. (2018) 13:21–8. doi: 10.37936/ecti-cit.2019131.165074

70. Ciocca, G, Napoletano, P, and Schettini, R. Learning CNN-based Features for Retrieval of Food Images In: S Battiato, G Farinella, M Leo, and G Gallo, editors. New Trends in Image Analysis and Processing – ICIAP 2017. ICIAP 2017. Lecture Notes in Computer Science, vol 10590. Cham: Springer (2017).

71. Hou, S, Feng, Y, and Wang, Z. (2017). “VegFru: a domain-specific dataset for fine-grained visual categorization.” in 2017 IEEE International Conference on Computer Vision (ICCV). IEEE. pp. 541–549.

72. Waltner, G, Schwarz, M, Ladstätter, S, Weber, A, Luley, P, Lindschinger, M, et al. Personalized dietary self-management using mobile vision-based assistance In: S Battiato, G Farinella, M Leo, and G Gallo, editors. New Trends in Image Analysis and Processing – ICIAP 2017. ICIAP 2017. Lecture Notes in Computer Science, vol 10590. Cham: Springer (2017).

73. Donadello, I, and Dragoni, M. Ontology-Driven Food Category Classification in Images In: E Ricci, S Rota Bulò, C Snoek, O Lanz, S Messelodi, and N Sebe, editors. Image Analysis and Processing – ICIAP 2019. ICIAP 2019. Lecture Notes in Computer Science, vol 11752. Cham: Springer (2019).

74. Wang, Y, Chen, J, Ngo, C-W, Chua, T-S, Zuo, W, and Ming, Z. (2019). “Mixed dish recognition through multi-label learning.” in Proceedings of the 11th workshop on multimedia for cooking and eating activities. New York, NY, USA: ACM. pp. 1–8.

75. Aslan, S, Ciocca, G, Mazzini, D, and Schettini, R. Benchmarking algorithms for food localization and semantic segmentation. Int J Mach Learn Cybern. (2020) 11:2827–47. doi: 10.1007/s13042-020-01153-z

76. Konstantakopoulos, F, Georga, EI, and Fotiadis, DI. (2021). “3D reconstruction and volume estimation of food using stereo vision techniques.” in 2021 IEEE 21st international conference on bioinformatics and bioengineering (BIBE). IEEE. pp. 1–4.

77. Wu, X, Fu, X, Liu, Y, Lim, E-P, Hoi, SCH, and Sun, Q. (2021). “A large-scale benchmark for food image segmentation.” in Proceedings of the 29th ACM international conference on multimedia. New York, NY, USA: ACM. pp. 506–515.

78. Eigen, D, and Fergus, R. (2014). Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture.

Keywords: calorie estimation, image segmentation, volume estimation, diabetes management, methodology, image classification, scoping review, calorie counting apps

Citation: Rouhafzay A, Rouhafzay G and Jbilou J (2025) Image-based food monitoring and dietary management for patients living with diabetes: a scoping review of calorie counting applications. Front. Nutr. 12:1501946. doi: 10.3389/fnut.2025.1501946

Received: 01 October 2024; Accepted: 10 March 2025;
Published: 27 March 2025.

Edited by:

Radwa Hassan, Cairo University, Egypt

Reviewed by:

Sean Rocke, The University of the West Indies St. Augustine, Trinidad and Tobago
Maria Valero, Kennesaw State University, United States
Rekha Phadke, Nitte Meenakshi Institute of Technology, India

Copyright © 2025 Rouhafzay, Rouhafzay and Jbilou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jalila Jbilou, jalila.jbilou@umoncton.ca

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
