ORIGINAL RESEARCH article

Front. Psychol.
Sec. Quantitative Psychology and Measurement
Volume 15 - 2024 | doi: 10.3389/fpsyg.2024.1449272

Transformers Deep Learning Models for Missing Data Imputation: An Application of the ReMasker Model on a Psychometric Scale

Provisionally accepted
  • University of Naples Federico II, Naples, Italy

The final, formatted version of the article will be published soon.

    Missing data in psychometric research presents a substantial challenge, affecting the reliability and validity of study outcomes. Several factors contribute to the problem, including participant nonresponse, dropout, and technical errors during data collection. Traditional methods commonly used to handle missing data, such as mean imputation or regression, rely on assumptions that may not hold for psychological data and can lead to distorted results. This study evaluates the effectiveness of transformer-based deep learning for missing data imputation by comparing ReMasker, a masking autoencoding transformer model, with conventional imputation techniques (mean and median imputation, the Expectation-Maximization algorithm) and machine learning approaches (K-nearest neighbors, MissForest, and an Artificial Neural Network). Using a psychometric dataset from the COVID distress repository, we assessed imputation performance through the Root Mean Squared Error (RMSE) between the original and imputed data matrices. Results indicate that machine learning techniques, particularly ReMasker, achieve lower reconstruction error than conventional imputation techniques across all tested scenarios. This finding underscores the potential of transformer-based models to provide robust imputation in psychometric research, enhancing data integrity and generalizability.
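
    The RMSE-based evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the synthetic Likert-style data, the 20% missing-completely-at-random mask, and the helper names `rmse` and `impute_columnwise` are assumptions introduced here for illustration, and only the two simplest baselines (mean and median imputation) are shown. Whether the study computed the RMSE over the full matrix or only over the removed cells is not stated in the abstract, so the helper supports both.

```python
import numpy as np

def rmse(original, imputed, mask=None):
    """RMSE between two matrices; if mask is given, only over the masked cells."""
    diff = np.asarray(original, dtype=float) - np.asarray(imputed, dtype=float)
    if mask is not None:
        diff = diff[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def impute_columnwise(data, statistic):
    """Fill NaNs in each column with the mean or median of that column's observed values."""
    if statistic == "mean":
        fill = np.nanmean(data, axis=0)
    else:
        fill = np.nanmedian(data, axis=0)
    # fill has one value per column and broadcasts across rows
    return np.where(np.isnan(data), fill, data)

rng = np.random.default_rng(0)

# Hypothetical complete dataset: 200 respondents x 10 items on a 1-5 scale
complete = rng.integers(1, 6, size=(200, 10)).astype(float)

# Assumed missingness mechanism: remove about 20% of cells completely at random
mask = rng.random(complete.shape) < 0.20
incomplete = complete.copy()
incomplete[mask] = np.nan

for stat in ("mean", "median"):
    imputed = impute_columnwise(incomplete, stat)
    print(stat, "RMSE on removed cells:", round(rmse(complete, imputed, mask), 3))
```

    A model-based imputer would slot into the same loop: it receives `incomplete`, returns a filled matrix, and is scored with the same `rmse` call against `complete`.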

    Keywords: Missing Data, Machine Learning, Artificial Intelligence, Deep Learning, Psychometrics

    Received: 14 Jun 2024; Accepted: 26 Nov 2024.

    Copyright: © 2024 Casella, Milano, Dolce and Marocco. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Monica Casella, University of Naples Federico II, Naples, Italy

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.