ORIGINAL RESEARCH article

Front. Public Health
Sec. Children and Health
Volume 12 - 2024 | doi: 10.3389/fpubh.2024.1369041

Multimodal Machine Learning for Analysing Multifactorial Causes of Disease - The Case of Childhood Overweight and Obesity in Mexico

Provisionally accepted
Rosario Silva Sepulveda 1, Magnus Boman 1,2*
  • 1 Karolinska Institutet (KI), Solna, Stockholm, Sweden
  • 2 BioClinicum, Karolinska University Hospital, Stockholm, Stockholm, Sweden

The final, formatted version of the article will be published soon.

    Background: Mexico has one of the highest global incidences of paediatric overweight and obesity. Public health interventions have shown only moderate success, possibly because they rely on knowledge extracted with a limited range of statistical data analysis methods.

    Purpose: To explore whether multimodal machine learning can improve the identification of predictive features from obesogenic environments and the investigation of complex disease or social patterns, using the Mexican National Health and Nutrition Survey.

    We grouped features into five data modalities corresponding to exogenous factors of the paediatric population, and analysed them in two multimodal machine learning pipelines, compared against a unimodal early fusion baseline. The supervised pipeline employed four methods: a linear classifier with Elastic Net regularisation, k-Nearest Neighbour, Decision Tree, and Random Forest. The unsupervised pipeline used traditional methods, k-Means and hierarchical clustering, with the optimal number of clusters calculated to be k = 2.

    The decision tree classifier in the supervised early fusion approach produced the best quantitative results. The five most important features for classifying child or adolescent health were measures of a randomly selected adult in the household: BMI, obesity diagnosis, being single, seeking private healthcare, and having paid TV in the home. Unsupervised learning approaches varied in the optimal number of clusters but agreed on the importance of home environment features when analysing inter-cluster patterns. The main findings of this study differed from those of previous studies that applied only traditional statistical methods to the same database. Notably, the BMI of a randomly selected adult in the household emerged as the most important feature, rather than maternal BMI as reported in previous literature, where unwanted cultural bias went undetected.

    Our general conclusion is that multimodal machine learning is a promising approach for comprehensively analysing obesogenic environments. The modalities allowed for a multimodal approach designed to critically analyse data signal strength and reveal sources of unwanted bias. In particular, it may aid in developing more effective public health policies to address the ongoing paediatric obesity epidemic in Mexico. This article is written in British English.
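
To make the pipeline description in the abstract concrete, the following is a minimal sketch of an early fusion comparison across the four named classifier families, followed by a k-Means and hierarchical clustering step. It is not the authors' code. Everything in it is an assumption for illustration: the data are synthetic rather than drawn from the Mexican National Health and Nutrition Survey, the three modality names and their sizes are hypothetical (the study used five modalities), scikit-learn is assumed as the library, and all hyperparameters and the silhouette criterion for choosing k are arbitrary choices.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import silhouette_score
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_children = 300

# Hypothetical modalities: names and widths are invented, data are synthetic.
modalities = {
    "household_adult": rng.normal(size=(n_children, 4)),
    "home_environment": rng.normal(size=(n_children, 3)),
    "healthcare_access": rng.normal(size=(n_children, 2)),
}

# Synthetic binary label, driven mainly by the first household-adult feature.
signal = modalities["household_adult"][:, 0]
y = (signal + 0.5 * rng.normal(size=n_children) > 0).astype(int)

# Early fusion: concatenate every modality block into one feature matrix.
X = np.hstack(list(modalities.values()))
feature_names = [
    f"{name}_{i}"
    for name, block in modalities.items()
    for i in range(block.shape[1])
]

# The four supervised method families named in the abstract.
classifiers = {
    "elastic_net_linear": make_pipeline(
        StandardScaler(),
        SGDClassifier(penalty="elasticnet", l1_ratio=0.5, random_state=0),
    ),
    "k_nearest_neighbour": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy per classifier.
scores = {
    name: cross_val_score(clf, X, y, cv=5).mean()
    for name, clf in classifiers.items()
}

# Rank features by impurity-based importance from a fitted decision tree.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
top_feature = feature_names[int(np.argmax(tree.feature_importances_))]

# Unsupervised step: choose k for k-Means by silhouette score (an assumed
# criterion), then run hierarchical clustering with the same k.
X_scaled = StandardScaler().fit_transform(X)
silhouettes = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    silhouettes[k] = silhouette_score(X_scaled, labels)
best_k = max(silhouettes, key=silhouettes.get)
hier_labels = AgglomerativeClustering(n_clusters=best_k).fit_predict(X_scaled)

for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
print("most important tree feature:", top_feature)
print("k chosen by silhouette:", best_k)
```

A multimodal variant would fit or cluster each modality block separately and compare the results with this fused baseline. That per-modality comparison is what would let one inspect signal strength and possible bias modality by modality.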

    Keywords: Supervised machine learning, Unsupervised machine learning, multimodal machine learning, Bias, Paediatric obesity, obesogenic environment, Mexico

    Received: 11 Jan 2024; Accepted: 16 Dec 2024.

    Copyright: © 2024 Silva Sepulveda and Boman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Magnus Boman, Karolinska Institutet (KI), Solna, 171 77, Stockholm, Sweden

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.