
ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Machine Learning and Artificial Intelligence
Volume 8 - 2025 | doi: 10.3389/frai.2025.1413820
This article is part of the Research Topic "Harnessing Artificial Intelligence for Multimodal Predictive Modeling in Orthopedic Surgery"

Deep Learning in Gonarthrosis Classification: A Comparative Study of Model Architectures and Single vs. Multi-Model Methods

Provisionally accepted
Şahika Betül Yaylı 1, Kutay Kılıç 1, Salih Beyaz 2*
  • 1 Turkcell (Turkey), Istanbul, Türkiye
  • 2 Başkent University, Ankara, Türkiye

The final, formatted version of the article will be published soon.

    Purpose: This study aims to classify Kellgren-Lawrence (KL) osteoarthritis stages from knee anteroposterior (AP) X-ray images by comparing two deep learning (DL) methodologies: a traditional single-model approach and a proposed multi-model approach. Specifically, we investigated:
      • the effectiveness of single-model and multi-model DL approaches in KL stage classification;
      • the performance of seven convolutional neural network (CNN) architectures across four DL tasks;
      • the impact of CLAHE (Contrast Limited Adaptive Histogram Equalization) augmentation on classification outcomes.

    Approach: We created a dataset of 14,607 annotated knee AP X-rays from three hospitals. The knee joint region was isolated using a YOLOv5 object detection model. The multi-model approach used three DL models: one for osteophyte detection, one for joint space narrowing analysis, and a third that combined their outputs with demographic and image data to predict the KL grade. The single-model approach, which classified KL stages directly, served as the benchmark. Seven CNN architectures, including NFNet-F0/F1, EfficientNet-B0/B3, Inception-ResNet-v2, and VGG16, were each trained with and without CLAHE augmentation.

    Results: The single-model approach achieved an F1-score of 0.763 and an accuracy of 0.767, outperforming the multi-model strategy (F1-score 0.736, accuracy 0.740). The best-performing architecture differed from task to task, underscoring the need for task-specific architecture selection. CLAHE degraded performance for most models; only one model showed a marginal improvement of 0.3%.

    Conclusion: The single-model approach was more effective for KL grading, surpassing the metrics reported in the existing literature. These findings emphasize the importance of task-specific choices of architecture and preprocessing. Future studies should explore ensemble modeling, advanced augmentations, and clinical validation to improve clinical applicability.
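    The results compare models by F1-score and accuracy, but the abstract does not state how the F1-score is averaged over the five KL grades (0 to 4). As a hedged illustration only, the sketch below computes accuracy and two common multi-class F1 variants (macro and support-weighted) from predicted grades. The function names and the toy labels are hypothetical, not data or code from the study, and which averaging the authors used is an assumption not confirmed by the source.

```python
from collections import Counter

# KL grading uses five grades (0-4). The toy labels below are hypothetical
# and are NOT data from the study.
toy_true = [0, 1, 2, 2, 3, 4, 4, 1]
toy_pred = [0, 1, 2, 1, 3, 4, 3, 1]


def accuracy(y_true, y_pred):
    """Fraction of images whose predicted KL grade equals the reference grade."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred, label):
    """One-vs-rest F1 for a single KL grade (0.0 when there are no true positives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def _labels(y_true, y_pred):
    # Grades that occur in either the reference labels or the predictions.
    return sorted(set(y_true) | set(y_pred))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-grade F1 scores: every grade counts equally."""
    labels = _labels(y_true, y_pred)
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)


def weighted_f1(y_true, y_pred):
    """Per-grade F1 weighted by how often each grade occurs in y_true."""
    support = Counter(y_true)
    return sum(
        per_class_f1(y_true, y_pred, c) * support[c] for c in _labels(y_true, y_pred)
    ) / len(y_true)


if __name__ == "__main__":
    print(f"accuracy    {accuracy(toy_true, toy_pred):.4f}")     # accuracy    0.7500
    print(f"macro F1    {macro_f1(toy_true, toy_pred):.4f}")     # macro F1    0.7600
    print(f"weighted F1 {weighted_f1(toy_true, toy_pred):.4f}")  # weighted F1 0.7417
```

    Macro averaging gives rare grades (for example KL 4) the same weight as common ones, whereas weighted averaging follows class frequency, so the two can diverge on imbalanced KL datasets. This is why the averaging scheme matters when comparing an F1-score against values reported elsewhere.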

    Keywords: artificial intelligence, deep learning, transfer learning, Kellgren-Lawrence, gonarthrosis, medical image, multimodal learning

    Received: 07 Apr 2024; Accepted: 14 Jan 2025.

    Copyright: © 2025 Yaylı, Kılıç and Beyaz. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Salih Beyaz, Başkent University, Ankara, Türkiye

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.