METHODS article

Front. Artif. Intell.
Sec. Natural Language Processing
Volume 7 - 2024 | doi: 10.3389/frai.2024.1381290
This article is part of the Research Topic Natural Language Processing for Recommender Systems.

Efficient Incremental Training using a Novel NMT-SMT Hybrid Framework for Translation of Low-resource Languages

Provisionally accepted
  • 1 School of Computer Science Engineering and Information Systems, Vellore Institute of Technology, Vellore, India
  • 2 School of Computer Science and Engineering, VIT University, Vellore, India

The final, formatted version of the article will be published soon.

    The data-hungry Statistical Machine Translation (SMT) and Neural Machine Translation (NMT) models offer state-of-the-art results for languages with abundant data resources, but extensive research is required to make them perform equally well for low-resource languages. This paper proposes a novel approach that integrates the best features of NMT and SMT systems to improve translation performance for the low-resource English-Tamil language pair. A sub-optimal NMT model, trained on a small parallel corpus, translates a monolingual corpus, and only the best translations are selected to retrain the model in the next iteration. The proposed method employs the SMT phrase-pair table to determine the best translations, based on the maximum match between the words of the phrase-pair dictionary and each individual translation. This repeating cycle of translation and retraining generates a large quasi-parallel corpus, making the NMT model more powerful. SMT-integrated incremental training yields a substantial improvement in translation performance over existing approaches to incremental training. The model is strengthened further by adopting a beam search decoding strategy to produce the k best possible translations for each input sentence. Empirical findings show that the proposed model, with BLEU scores of 19.56 and 23.49, outperforms the baseline NMT, which scores 11.06 and 17.06 for English-to-Tamil and Tamil-to-English translation, respectively. METEOR score evaluation further corroborates these results, confirming the superiority of the proposed model.
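
    To make the selection step concrete, below is a minimal illustrative sketch (not the authors' published code) of how k-best beam-search candidates might be filtered against an SMT phrase table during one iteration of incremental training. The helper names (translate_k_best, min_score) and the word-overlap threshold are assumptions introduced for illustration; the paper selects the best translation by maximum match between the phrase-pair dictionary words and each candidate.

    # Sketch of phrase-table-guided candidate selection for one iteration of
    # incremental NMT training. translate_k_best and min_score are hypothetical.

    def phrase_table_vocab(phrase_table_path):
        """Collect target-side words from a Moses-style phrase table
        (lines of the form: src phrase ||| tgt phrase ||| scores ...)."""
        vocab = set()
        with open(phrase_table_path, encoding="utf-8") as f:
            for line in f:
                fields = line.split("|||")
                if len(fields) >= 2:
                    vocab.update(fields[1].split())
        return vocab

    def overlap_score(candidate, vocab):
        """Fraction of the candidate's words found in the phrase-table vocabulary."""
        words = candidate.split()
        if not words:
            return 0.0
        return sum(w in vocab for w in words) / len(words)

    def select_best(candidates, vocab):
        """Pick the beam candidate with the maximum phrase-table word match."""
        return max(candidates, key=lambda c: overlap_score(c, vocab))

    def build_quasi_parallel(mono_sentences, translate_k_best, vocab, min_score=0.5):
        """One cycle: translate monolingual text with the current NMT model,
        keep only well-matched (source, translation) pairs for retraining."""
        corpus = []
        for src in mono_sentences:
            candidates = translate_k_best(src)           # k-best beam outputs
            best = select_best(candidates, vocab)
            if overlap_score(best, vocab) >= min_score:  # filter weak translations
                corpus.append((src, best))
        return corpus

    The retained pairs would be appended to the training data and the NMT model retrained, after which the cycle repeats on fresh monolingual text, progressively enlarging the quasi-parallel corpus.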

    Keywords: Hybrid NMT-SMT, Incremental training, Beam search, SMT phrase table, Low-resource languages

    Received: 03 Feb 2024; Accepted: 27 Aug 2024.

    Copyright: © 2024 K and M. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: VARALAKSHMI M, School of Computer Science and Engineering, VIT University, Vellore, India

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.