A Classification Algorithm-Based Hybrid Diabetes Prediction Model

Edeh, Michael Onyema; Khalaf, Osamah Ibrahim; Tavera, Carlos Andrés; Tayeb, Sofiane; Ghouali, Samir; Abdulsahib, Ghaida Muttashar; Richard-Nnabu, Nneka Ernestina; Louni, AbdRahmane

doi:10.3389/fpubh.2022.829519

ORIGINAL RESEARCH article

Front. Public Health, 31 March 2022

Sec. Digital Public Health

Volume 10 - 2022 | https://doi.org/10.3389/fpubh.2022.829519

This article is part of the Research TopicBig Data Analytics for Smart Healthcare applicationsView all 109 articles

A Classification Algorithm-Based Hybrid Diabetes Prediction Model

Michael Onyema Edeh¹^*

Osamah Ibrahim Khalaf²

Carlos Andrés Tavera³

Sofiane Tayeb⁴

Samir Ghouali^5,6

Ghaida Muttashar Abdulsahib⁷

Nneka Ernestina Richard-Nnabu⁸

AbdRahmane Louni⁹

¹Department of Mathematics and Computer Science, Coal City University, Enugu, Nigeria
²Al-Nahrain Nanorenewable Energy Research Center, Al-Nahrain University, Baghdad, Iraq
³COMBA R&D Laboratory, Faculty of Engineering, Universidad Santiago de Cali, Cali, Colombia
⁴Department of Electrical Engineering, Faculty of Sciences and Technology, Mustapha Stambouli University of Mascara, Mascara, Algeria
⁵Department of Electrical Engineering, Faculty of Science and Technology, University of Mascara, Mascara, Algeria
⁶STIC Laboratory, Faculty of Engineering Science, Tlemcen, Algeria
⁷Department of Computer Engineering, University of Technology, Baghdad, Iraq
⁸Department of Computer Science/Informatics, Alex Ekwueme Federal University Ndufu Alike Ikwo (AE-FUNAI), Abakaliki, Nigeria
⁹Faculty of Sciences and Technology, Mustapha Stambouli University of Mascara, Mascara, Algeria

Diabetes is considered to be one of the leading causes of death globally. If diabetes is not treated and detected early, it can lead to a variety of complications. The aim of this study was to develop a model that can accurately predict the likelihood of developing diabetes in patients with the greatest amount of precision. Classification algorithms are widely used in the medical field to classify data into different categories based on some criteria that are relatively restrictive to the individual classifier, Therefore, four machine learning classification algorithms, namely supervised learning algorithms (Random forest, SVM and Naïve Bayes, Decision Tree DT) and unsupervised learning algorithm (k-means), have been a technique that was utilized in this investigation to identify diabetes in its early stages. The experiments are per-formed on two databases, one extracted from the Frankfurt Hospital in Germany and the other from the database. PIMA Indian Diabetes (PIDD) provided by the UCI machine learning repository. The results obtained from the database extracted from Frankfurt Hospital, Germany, showed that the random forest algorithm outperformed with the highest accuracy of 97.6%, and the results obtained from the Pima Indian database showed that the SVM algorithm outperformed with the highest accuracy of 83.1% compared to other algorithms. The validity of these results is confirmed by the process of separating the data set into two parts: a training set and a test set, which is described below. The training set is used to develop the model's capabilities. The test set is used to put the model through its paces and determine its correctness.

Introduction

Diabetes is a chronic disease also known as a silent disease. The World Health Organization (WHO) defines diabetes as a disease that prevents the body from properly using the energy provided by the food it consumes. In addition, the disease occurs when there are problems with the hormone insulin, which is naturally produced by the pancreas to help the body use sugar and fat and store some it (1).

More clearly, when we eat, food is broken down into glucose (sugar). This glucose provides energy for the body to function properly by drawing on its resources. During digestion, the blood carries the glucose throughout the body and supplies the cells. However, in order for the sugar in the blood to be delivered to the cells, the body needs insulin, a hormone secreted by the pancreas, which acts as a key to get the glucose from the blood into the cells of our body (2). There are three most common types of diabetes:

Type 1 diabetes: it is a condition in which the pancreas cannot produces enough insulin or does nor produces any insulin. It accounts for 5–10% of all diabetes cases, commonly affecting childhood and adolescence. Then, Type 2 diabetes–it is a condition in which the insulin produced does not effectively used to maintain the blood sugar level in the body. This type diabetes is common in people age 40 and above, but also appears in younger people. From all diagnosed diabetes cases worldwide, type 2 diabetes accounts for 90–95 percent. There is another type of diabetes called “gestational diabetes”, caused by the lack response of the insulin receptors on the body tissues, even if the insulin levels are normal, which makes this condition different from the second type, and this case is very rare, account for 1–2% of all diabetes cased and it also increase the risk of developing type 2 diabetes later (3).

People with diabetes must be treated according to their type of diabetes. The goal of treatment is to keep the patient's blood sugar level within a normal range.

Related Works

Methodology Used

Model Diagram and Study Explanation

The proposed process is presented in the form of a model diagram in Figure 1 below. The following diagram depicts the flow of the research done in the process of building the model:

FIGURE 1

Figure 1. Proposed model diagram.

In this study, we split the data set into two parts: a training set and a test set. The training set is used to train the model. The test set is used to test the model and evaluate the accuracy.

In the second step we use a K-means algorithm for data correction in order to improve performance and control the classified model (By changing the number of clusters).

Next, we invite the learned algorithms to be tested using the test database. Finally model evaluation with other related work.

Algorithms Used

Support Vector Machine

Support Vector Machine is based on statistical learning theory. SVMs were originally developed for binary classification, but can be effectively extended to multi-layer problems. SVM or Sequential Minimal Optimization (SMO) is a learning system that use a hypothesis space for linear functions in a high-dimensional space, and that has been trained using an optimization theory learning algorithm that employs a learning bias de-rived from statistical learning theory to achieve its results. SVM implements nonlinear class boundaries by translating nonlinear input vectors into a high-dimensional feature space using a linear model, which is implemented using the kernel of the SVM. Support vectors are training in-stances that are closer to the maximal hyper maximum level than the rest of the training examples. In order to define the binary layer boundaries support, all other training samples are rendered inapplicable.

As a result, the vectors are utilized to construct the ideal level hyper linear separation function (in the case of pattern recognition) or linear regression function (in the case of regression) in the feature space in question (13).

K-Means Clustering Algorithm

The k-means clustering algorithm is a machine learning algorithm that groups nearby points into clusters. In this algorithm, there is no learning model construction because we will locate the new point in any cluster based on its distance from all the clusters (mainly its distance from the cluster center or its arithmetic mean) and it is placed in the cluster that is closest. For example, imagine that you want to divide the points of a line into 3 groups. To Determine how close a point is to a particular group, we will use a measure of its distance from the group (for example, the distance between two points) (14).

Naive Bayes

NB is a classification approach in which the idea of independence and relatedness of all characteristics is defined as follows: Specifically, it specifies that the state of a given feature inside a class has no effect on the status of any other feature within the class. As a result of its foundation in conditional probability, it is regarded as a strong algorithm that may be utilized for classification applications. It performs effectively when dealing with data that has imbalance issues and missing values (15):

\begin{array}{l} P (A ∣ B) = \frac{P (B ∣ A) P (A)}{P (B)} & (1) \end{array}

• P(A|B): conditional probability that the response variable has a certain value given the input characteristics. Additionally, this is referred to as the posterior probability.

• P(A): The response variable's a priori probability.

• P(B): The likelihood that the training data or evidence is correct.

• P(B|A): This is referred to as the probability training data.

Decision Tree

DT learning is one of the predictive modeling techniques used in statistics, data mining, and machine learning. Use a decision tree (as a predictive model) to move from observations on an item (represented in the branches) to conclusions about the target value of the item (represented in the paper). They use a hierarchical representation the data structure in the form of a sequence decisions (tests) to predict an outcome or a category. Each individual (or observation), which must be allocated to a class, is represented by a collection of variables, which are tested in the nodes of the tree. In the internal nodes, testing is carried out, and choices are taken in the paper nodes (16). In graph theory, a tree is a linked graph that is undirected, acyclic, and has no edges. There are three categories of nodes:

• Root node: This is the base of the tree and the most sensitive element when the tree is created and before it is exploded.

• Internal node: refers to nodes that have offspring that are themselves nodes.

• Final nodes: that do not contain any branches.

There are many DT's algorithms, and we can cite: ID3, C4.5, CART, C5, CHAID, SLIQ, UFFT, VFDT... (16).

Random Forest

Random Forest algorithm (17) is for statistics and machines that employs several learning methods to improve prediction performance. The two-part algorithm A. Tree bagging b. Each tree is produced from tree bagging to random forest:

1. Sample N instances at random - but with replacement, from the original data if the number cases are N inside the training set. The training set for developing the tree will be this sample.

2. A random number of characteristics and the optimal division utilized for dividing the node are picked when there are M input variables. During the forest growth, the value M is kept constant.

3. Each tree is cultivated as much as possible.

Dataset Used

We used two different databases in this study; Pima Indian Diabetes provided by the UCI Machine Learning repository (18) and a database extracted from the hospital in Frankfurt, Germany (19). Database extracted from the hospital in Frankfurt the first data is 2000 Pima Indian has 768 patient data with 8 attributes/features and one out-put with the patient's label/outcome (0: Not diabetic, 1: Diabetic). Two databases together consist some distinct medical variables, such as:

1. Pregnancies: number of pregnancies.

2. BMI: Body Mass Index (weight in kg / (height in m)²).

3. Insulin: Dose of insulin (mu U/ ml).

4. Age: Age at least 21 years.

5. Glucose: Plasma glucose concentration.

6. Blood Pressure: Diastolic blood pressure (mm Hg).

7. Skin Thickness: Thickness of the triceps skin fold (mm).

8. Diabetes Pedigree Function: Diabetes pedigree function (heredity).

And two classes (1 and 0)

• If class =1 implies diabetic patient.

• If class =0 implies non-diabetic patient.

The choice of these two bases is justified by the following criteria:

• The size of the database.

• Number of attributes.

• Number of classes.

Data Correction

Data cleaning is the next step in machine learning. It is considered one of the main steps in the working stages, and it is either building the model or breaking it. There is a saying that “the best data beats the most complex algorithms” in machine learning. Several aspects of data cleansing must be considered:

1. Discordances and omissions

2. Data mislabeling, same category repeated.

3. Invalid or missing data.

4. Outliers.

Unexpected Outliers in Both Bases

The observation and analysis of the two databases are presented in Table 1.

TABLE 1

Table 1. The observation and analysis of the two databases.

Methods of Handling Invalid Data Values

The Existing Works

• Ignore/delete these cases: delete all the observations with zero values but in this method we get a significant loss of data (about 50% of the data set).

• Put the mean values: calculate the median value of a specific column and replace this value in that column where we have zero.

• Avoid using parameters: The model can avoid using parameters with too many incorrect values. This may help thicken skin, although it's hard to tell.

In This Study

• Use of a classification algorithm: use a classification algorithm to recover the missing data where we have zero and replace them with the value found.

In our case, we have to apply this method and we have chosen the k-means algorithm with a variable K number of cluster (group) and replacing each column needs cleaning by the representative of cluster.

Evaluation Method

Train/Test

The data set is split into two parts: training and testing. The training set teaches the model. The test set is used to evaluate the model's correctness.

Pima Indian Database

“Test size = 0.2.” That is, 20% for the test and the rest 80% for the training.

Hospital Frankfurt Database

“Test size = 0.3.” That is, 30% for the test and the rest 70% for the training.

Accuracy Measures

This study uses Naive Bayes, Random Forest, SVM, and DT algorithms. Train/Test Split is used in experiments. This study uses Accuracy, F1-Measure, Recall, and Precision metrics to classify. See Table 2 for accuracy measures (20).

• True positives (TP), False positives (FP)

• True negatives (TN), False negatives (FN).

TABLE 2

Table 2. Accuracy measures.

Experimental Results

This is clear from Table 3, which compares the different performance measures (accuracy, recall, and F1 score) used to evaluate the investigated machine learning models. Random Forest (RF) demonstrated the best accuracy when used in the tuned configuration (97.6 percent). In addition to Random Forest (RF), additional algorithms such as Decision Tree (DT) have demonstrated sufficient accuracy (97.5 percent). We can see that DT, Gaussian Naive Bayes, Random Forest, SVM, performed better. From the basic level, we can observe that Random Forest and DT work better than other algorithms (Table 4).

TABLE 3

Table 3. Evaluation attributes results for different models (Frankfurt Germany).

TABLE 4

Table 4. Evaluation attributes results for the different models (Pima Indian).

Based on different performance measures such as accuracy, recall, and F1 score, it is clear that the examined ML models are comparable; this is seen in the Table 5, SVM (Support Vector Machine) demonstrated the best accuracy when used in its optimized form (83.1 percent). Other algorithms, such as the Random Forest (RF), have demonstrated sufficient accuracy in addition to the Support Vector Machine (SVM) (80.5 percent). We can see that DT, Gaussian Naive Bayes, Random Forest, SVM, performed better. From the basic level, we can observe that Support Vector Machine and Random Forest work well than other algorithms.

TABLE 5

Table 5. Comparison of the proposed work with the existing works (Pima Indian).

From the results of this experimentation, we observe that the accuracy values for this database with all measurements are satisfactory with a disturbance sometimes the rate increases and sometimes it decreases by a small difference when changing the number of cluster as well as the random initialization of cluster centers can influence the results.

Note: After running the model several times, different results can be obtained in the same cluster number. This depends on the step of random initialization of the cluster centers.

According to the above table, the SVM model obtained the best accuracy which is equal to 83.1%. That is, among 153 attributes that were chosen for testing this model are classified 127 patients correctly. We select the SVM model as the most optimal model that works best for our dataset because it's high accuracy.

According to the above table, the Random Forest model obtained the best accuracy which is equal to 97.6%. That is, among 600 attributes that were chosen for testing this model are classified 582 patients correctly. We select the SVM model as the most optimal model that works best for our dataset because it's high accuracy.

Discussion

Based on the results given in Tables 5, 6, the goal of the four algorithms is to better classify future observations while reducing classification errors. it can be concluded that the suggested models are more accurate than other type 2 diabetes prediction models that have been investigated in the research indicated in these tables. When comparing the above findings, it is obvious that the notion of utilizing the k-means algorithm was successful in our work; as a consequence, we infer that improving the quality of the data enhances the outcomes. This is consistent with other studies (26–35) which shows the predictive accuracy of machine learning algorithms.

TABLE 6

Table 6. Comparison of the proposed work with the existing works (Frankfort Allemagne).

Conclusion

In this study, we proposed a supportive diagnosis system based on the comparison four models of prediction algorithms to predict diabetes in two different databases. On the basis of several performance assessment methodologies like as accuracy and recall, as well as the F1 score, different machine learning algorithms are compared and assessed. Using the classification results obtained, it can be concluded that the random forest machine learning technique provides more accurate prediction and higher performance than the other methods described in this study. However, when compared to other research accessible in the current literature, some of the other approaches utilized in this study, such as naive Bayes, DT and SVM, Random Forest, and others, produce the most optimum outcomes.

The main objective of this study is to help diabetologist to establish an accurate treatment routine for their diabetic patients. Due to the high accuracy and diagnose the disease in a shorter time and the rapid treatment, this study could open a window in the development of an electronic health system for diabetic patients. There are also a few aspects in this study that could be improved or expanded in the future. In perspective term:

• Creation of diabetes database for Algerian patients

• Diabetes prediction with the deep learning approach.

• Developed a solution based on an Android application in order to help people predict if they have diabetes.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author Contributions

ME: conceptualization, method, and analysis. OK: validation and conceptualization. ST: conceptualization and analysis. SG: methodology and investigation. NR-N: discussion and conclusion. CT: review, editing, and data curation. GA: results and conclusion. AL: editing and analysis. All authors contributed to the article and approved the submitted version.

Funding

This research has been funded by the Research General Direction at Universidad Santiago de Cali under call No 01-2021.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Abbreviations

UCI, University California Irvine; DT, Decision Trees; PIDD, Pima Indians Diabetes Database; WHO, World Health Organization; ML, Machine Learning; AI, Artificial Intelligence; FBG, Fasting Blood Glucose; SVM, Support Vector Machine; SMO, Sequential Minimal Optimization; KNN, K-Nearest Neighbors.

References

1. Organisation mondiale de la santé. Rapport mondial sur le diabète. (2016). p. 88

Google Scholar

2. Medtronic. Le Diabète En Quelque Mots. [En ligne]. Available online at: https://www.parlonsdiabete.com/parlonsdiabete/le-diabete-en-quelques-mots (accessed November, 2021).

3. Max Ray,. DIABETES -TYPE 2: The Review of Diabetic Studies. (2019). Available online at: https://www.researchgate.net/publication/336634065_DIABETES_-TYPE_2 (accessed September, 2021).

4. Kumari VA, Chitra R. Classification de la maladie du diabète à l'aide d'une machine à vecteur de soutien. IJERA. (2013) 3:1797–801.

Google Scholar

5. Ahmed TM. Using data mining to develop model for classifying diabetic patient control level based on historical medical records. J Theor Appl Inf Technol. (2016) 87:316–23.

Google Scholar

6. Shetty D, Rit K, Shaikh S, Patil N. Diabetes disease prediction using data mining. In: Innovations in information, embedded and communication systems (ICIIECS) international conference. Coimbatore, India (2017). p. 1–5.

Google Scholar

7. Bhoia SK, Pandab SK, Jenaa KK, Abhisekhc PA, Sahood KS, Samae NU, et al. ‘Prediction of Diabetes in Females of PimaIndian Heritage: A Complete Supervised Learning Approach'. Turk J Comput Math Educ. (2021) 12:3074–84.

Google Scholar

8. Kandhasamy JP, Balamurali S. Performance analysis of classifier models to predict diabetes mellitus. Procedia Comput Sci. (2015) 47:45–51. doi: 10.1016/j.procs.2015.03.182

CrossRef Full Text | Google Scholar

9. Vijayanv V, Ravikumar A. Study of data mining algorithms for prediction and diagnosis of diabetes mellitus. Int J Comput Appl. (2014) 95:12–6. doi: 10.5120/16685-6801

PubMed Abstract | CrossRef Full Text | Google Scholar

10. Soleh M, Ammar N, Sukmadi I. Website-based application for classification of diabetes using logistic regression method. Jurnal Ilmiah Merpati. (2021) 9. Retrieved from: https://ojs.unud.ac.id/index.php/merpati/article/view/66691 (accessed November, 2021).

11. Rajput DS, Basha SM, Xin Q, Gadekallu TR, Kaluri R, Lakshmanna K, et al. Providing diagnosis on diabetes using cloud computing environment to the people living in rural areas of India. J Ambient Intell Humaniz Comput. (2021).

Google Scholar

12. Deepa N, Prabadevi B, Maddikunta PK, Gadekallu TR, Baker T, Khan MA, et al. An AI-based intelligent system for healthcare analysis using Ridge-Adaline Stochastic Gradient Descent Classifier. J Supercomput. (2021) 77:4–16. doi: 10.1007/s11227-020-03347-2

CrossRef Full Text | Google Scholar

13. Kumar DA, Govindasamy R. Performance et évaluation des techniques d'exploration de données de classification dans le diabète. IJCSIT. (2015) 6:1312–9.

Google Scholar

14. Ouamri O. Contribution des arbres dirigés et les k-means pour l'indexation et recherche d'images par contenu, Mémoire de Magister en Informatique, by H. fizazi izabatene, Université des Sciences et de la Technologie d'Oran- Mohamed Boudiaf, Département d'Informatique (2011). p. 113

Google Scholar

15. Java T point. Naïve Bayes Algorithm. [En ligne]. Available online at: https://www.javatpoint.com/machinelearning-naive-bayes-classifier (accessed July, 2021).

16. Rokach L, Maimon. Exploration de données avec arbres de décision: théorie et applications. 2e édition, World Scientific Pub Co Inc. (2015). Available from: http://cedric.cnam.fr/vertigo/cours/ml2/coursArbresDecision.html

17. Kandhasamy JP, Balamurali, S,. Procedia Computer Science. Elsevier (2015). Available online at: https://scholar.google.com/citations?user=LYSOeWMAAAAJ&hl=fr&oi=sra

18. Kaggle UCI Machine Learning, Base de donnée Pima Indian Diabetes. Available online at: https://www.kaggle.com/uciml/pima-indians-diabetes-database (accessed November, 2021).

19. Kaggle Johan Ensemble de données sur le diabète, extrait de l'hôpital de Francfort Allemagne. Available online at: https://www.kaggle.com/johndasilva/diabetes (accessed November, 2021).

20. Sisodia D, Sisodia DS. Prediction of Diabetes using Classification Algorithms. Procedia Comput Sci. (2018) 132:1578–85. doi: 10.1016/j.procs.2018.05.122

CrossRef Full Text | Google Scholar

21. Amel S, Karima R. The prediction of diabetes using machine learning algorithms, Master's thesis in Computer Science, supervised by Brahimi Farida, University AMO of Bouira Faculty of Sciences and Applied Sciences, Department of Computer Science (2019). p. 85.

Google Scholar

22. Nishat MM, Faisal F, Mahbub MA, Mahbub MH, Islam S, Hoque MA. Performance assessment of different machine learning algorithms in predicting diabetes mellitus. Biosc Biotech Res Comm. (2021) 14. doi: 10.21786/bbrc/14.1/10

CrossRef Full Text | Google Scholar

23. Daanouni O, Cherradi B, Tmiri A. Predicting Diabetes Diseases Using Mixed Data and Supervised Machine Learning Algorithms. SCA2019 (2019).

PubMed Abstract | Google Scholar

24. Onyema EM. Opportunities and challenges of use of mobile phone technology in teaching and learning in Nigeria-a review. IJREI. (2019) 3:352–8. doi: 10.36037/IJREI.2019.3601

CrossRef Full Text | Google Scholar

25. Onyema EM, Elhaj MAE, Bashir SG, Abdullahi I, Hauwa AA, Hayatu AS. Evaluation of the Performance of K-Nearest Neighbor Algorithm in Determining Student Learning Styles. Int J of Innovative Sci, Eng & Techn. (2020) 7:91–102.

Google Scholar

26. Shariq AB, Muhammad WA, Syed AH, Arindam G, Onyema EM. Smart Health Application for Remote Tracking of Ambulatory Patients. In: Hafizul Islam SK, Samanta D, editors. Smart Healthcare System design: Security and Privacy Aspects. Wiley Online Library (2021). p. 33–55.

Google Scholar

27. Jo O, Iwendi C, Bashir AK, Peshkar A, Sujatha R, Chatterjee JM, et al. COVID-19 patient health prediction using boosted random forest algorithm. Front Public Health. (2020) 8:357. doi: 10.3389/fpubh.2020.00357

PubMed Abstract | CrossRef Full Text | Google Scholar

28. Celestine I, Suresh P. An efficient and unique TF/IDF algorithmic model-based data analysis for handling applications with big data streaming. Electronics. (2019) 8:28. doi: 10.3390/electronics8111331

CrossRef Full Text | Google Scholar

29. Celestine I, Mueen U, James A, Pascal N, Joseph HA, Ali KB. On detection of Sybil attack in large scale VANETs using spider-monkey technique. IEEE Access. (2018) 6:47258–67. doi: 10.1109/ACCESS.2018.2864111

CrossRef Full Text | Google Scholar

30. Jalil P, Celestine I, Praveen K, Gadekallu TR, Lakshmanna K, Bashir AK. A metaheuristic optimization approach for energy efficiency in the IoT networks. Softw Pract Exp. (2020) 14:2–9. doi: 10.1002/spe.2797

CrossRef Full Text | Google Scholar

31. Rajendran S, Khalaf OI, Alotaibi Y, Alghamdi S. MapReduce-based big data classification model using feature subset selection and hyperparameter tuned deep belief network. Sci Rep. (2021) 11:24138. doi: 10.1038/s41598-021-03019-y

PubMed Abstract | CrossRef Full Text | Google Scholar

32. Rajalakshmi M, Saravanan V, Arunprasad V, Romero CAT, Khalaf OI, Karthik C. Machine learning for modeling and control of industrial clarifier process. IASC. (2022) 32:339–59. doi: 10.32604/iasc.2022.021696

CrossRef Full Text | Google Scholar

33. Alsubari SN, Deshmukh SN, Alqarni AA, Alsharif N, Aldhyani THH, Alsaade FW, et al. Data Analytics for the Identification of Fake Reviews Using Supervised Learning. Comput Mater Contin. (2022) 70:3189–204. doi: 10.32604/cmc.2022.019625

CrossRef Full Text | Google Scholar

34. Khalaf OI, Abdulsahib GM. Optimized dynamic storage of data (ODSD) in IoT based on blockchain for wireless sensor networks. Peer-to-Peer Netw Appl. (2021) 14:2858–73. doi: 10.1007/s12083-021-01115-4

CrossRef Full Text | Google Scholar

35. Surendran R, Khalaf OI, Andres C. Deep learning based intelligent industrial fault diagnosis model. Comput Mater Contin. (2022) 70:6323–38. doi: 10.32604/cmc.2022.021716

CrossRef Full Text | Google Scholar

Keywords: decision tree, random forest, Support Vector Machine (SVM), Bayesian Naive, diabetes, AI, ML, classification

Citation: Edeh MO, Khalaf OI, Tavera CA, Tayeb S, Ghouali S, Abdulsahib GM, Richard-Nnabu NE and Louni A (2022) A Classification Algorithm-Based Hybrid Diabetes Prediction Model. Front. Public Health 10:829519. doi: 10.3389/fpubh.2022.829519

Received: 05 December 2021; Accepted: 03 February 2022;
Published: 31 March 2022.

Edited by:

Thippa Reddy Gadekallu, VIT University, India

Reviewed by:

Abha Sharma, GD Goenka University, India
Adesegun Osijirin, Federal College of Dental Technology and Therapy Enugu, Nigeria
Praveen Kumar, VIT University, India

Copyright © 2022 Edeh, Khalaf, Tavera, Tayeb, Ghouali, Abdulsahib, Richard-Nnabu and Louni. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Michael Onyema Edeh, bWljaGFlbC5lZGVoQGNjdS5lZHUubmc=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.