
ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Medicine and Public Health
Volume 7 - 2024 | doi: 10.3389/frai.2024.1374162
This article is part of the Research Topic Soft Computing and Machine Learning Applications for Healthcare Systems

Modeling Disagreement in Automatic Data Labelling for Semi-Supervised Learning in Clinical Natural Language Processing

Provisionally accepted
  • 1 Imperial College London, London, England, United Kingdom
  • 2 Department of Applied Mathematics and Theoretical Physics, Faculty of Mathematics, School of Physical Sciences, University of Cambridge, Cambridge, England, United Kingdom
  • 3 Queen Mary University of London, London, United Kingdom

The final, formatted version of the article will be published soon.

    Computational models that provide accurate estimates of their uncertainty are crucial for managing the risks associated with decision making in healthcare. This is especially true because many state-of-the-art systems are trained on automatically labelled data (self-supervised mode) and tend to overfit. In this work, we investigate the quality of uncertainty estimates from a range of current state-of-the-art predictive models applied to the problem of observation detection in radiology reports, a problem that remains understudied in Natural Language Processing for the healthcare domain. We demonstrate that Gaussian Processes (GPs) provide superior performance in quantifying the risks of three uncertainty labels, as measured by the negative log predictive probability (NLPP) and the mean maximum predicted confidence level (MMPCL), whilst retaining strong predictive performance.
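    The two evaluation metrics named above can be illustrated with a short sketch. It assumes the common definitions: NLPP as the mean negative log of the probability a model assigns to the true label (lower is better), and MMPCL as the mean, over examples, of the model's highest predicted class probability. The function names `nlpp` and `mmpcl`, the `eps` clipping constant, and the toy probabilities are hypothetical illustrations, not the authors' implementation; the paper's exact formulation (for instance, how the three uncertainty labels are aggregated) may differ.

```python
import math
from typing import Sequence


def nlpp(probs: Sequence[Sequence[float]], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Mean negative log predictive probability of the true labels.

    Assumed definition: NLPP = -(1/N) * sum_i log p(y_i | x_i).
    `eps` (hypothetical choice) clips zero probabilities to avoid log(0).
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        # Penalise the model by the log-probability it gave the correct class.
        total -= math.log(max(p[y], eps))
    return total / len(labels)


def mmpcl(probs: Sequence[Sequence[float]]) -> float:
    """Mean maximum predicted confidence level.

    Assumed definition: average over examples of max_k p(k | x_i),
    i.e. how confident the model is in its top prediction on average.
    """
    if not probs:
        raise ValueError("probs must be non-empty")
    return sum(max(p) for p in probs) / len(probs)


# Toy predictions over three hypothetical labels for three examples.
toy_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
toy_labels = [0, 1, 0]
print(round(nlpp(toy_probs, toy_labels), 4), round(mmpcl(toy_probs), 4))
```

    On the toy data, the third example is both wrong (its top class is 2) and fairly unconfident, so it contributes most of the NLPP of about 0.595, while MMPCL is about 0.633. Reporting the two together distinguishes a model that is confidently wrong from one whose confidence tracks its accuracy.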

    Keywords: automated labeling, Clinical text, Natural Language Processing, Radiology, Semi-Supervised Learning, uncertainty

    Received: 21 Jan 2024; Accepted: 05 Sep 2024.

    Copyright: © 2024 Liu, Seedat and Ive. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Julia Ive, Queen Mary University of London, London, United Kingdom

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.