AUTHOR=Ivanov Vladimir Vladimirovich TITLE=Sentence-level complexity in Russian: An evaluation of BERT and graph neural networks JOURNAL=Frontiers in Artificial Intelligence VOLUME=5 YEAR=2022 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2022.1008411 DOI=10.3389/frai.2022.1008411 ISSN=2624-8212 ABSTRACT=Introduction

Sentence-level complexity evaluation (SCE) can be formulated as assigning a given sentence a complexity score, either as a category or as a single value. The SCE task can be treated as an intermediate step for text complexity prediction, text simplification, lexical complexity prediction, etc. Moreover, robustly predicting the complexity of a single sentence requires much shorter text fragments than those typically needed for robust text-level complexity evaluation. Morphosyntactic and lexical features have proved to be vital predictors in state-of-the-art deep neural models for sentence categorization. However, the interpretability of deep neural network results remains a common issue.

Methods

This paper tests and compares several approaches to predicting both absolute and relative sentence complexity in Russian. The evaluation involves Russian BERT, a Transformer, an SVM with features from sentence embeddings, and a graph neural network. Such a comparison is performed for the first time for the Russian language.
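For illustration, the Python snippet below sketches how a Russian BERT checkpoint can be wrapped as a sentence-complexity classifier with Hugging Face Transformers. The checkpoint name (DeepPavlov/rubert-base-cased) and the number of complexity classes are assumptions made for the sketch, not the paper's exact configuration, and the classification head would still need fine-tuning on labeled sentences before its predictions are meaningful.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "DeepPavlov/rubert-base-cased"  # assumed Russian BERT checkpoint
NUM_CLASSES = 3                              # assumed number of complexity categories

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_CLASSES
)
model.eval()

def predict_complexity(sentence: str) -> int:
    """Return the index of the most probable complexity class for one sentence."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: [1, NUM_CLASSES]
    return int(logits.argmax(dim=-1))

# Example call (untrained head, so the class index is arbitrary until fine-tuning):
print(predict_complexity("Мама мыла раму."))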

Results and discussion

Pre-trained language models outperform graph neural networks that incorporate the syntactic dependency tree of a sentence. The graph neural networks, in turn, perform better than Transformer and SVM classifiers that employ sentence embeddings. Predictions of the proposed graph neural network architecture can be easily explained.
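A minimal sketch of a graph neural network over a sentence's syntactic dependency tree is shown below, using PyTorch Geometric. The two-layer GCN, the mean pooling, and the toy dependency arcs are illustrative assumptions rather than the paper's actual architecture; in practice, node features (e.g., word embeddings) and edges would come from a dependency parser such as UDPipe.

import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class DependencyGNN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)   # message passing over dependency arcs
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.head = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index, batch):
        # x: [num_tokens, in_dim] token features
        # edge_index: [2, num_edges] dependency arcs (treated as undirected here)
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        h = global_mean_pool(h, batch)             # one vector per sentence
        return self.head(h)                        # complexity-class logits

# Toy example: a 3-token sentence whose syntactic head is token 1.
x = torch.randn(3, 16)                             # random stand-in word embeddings
edge_index = torch.tensor([[0, 2], [1, 1]])        # dependents 0 and 2 attach to head 1
batch = torch.zeros(3, dtype=torch.long)           # all tokens belong to sentence 0
model = DependencyGNN(in_dim=16, hidden_dim=32, num_classes=3)
print(model(x, edge_index, batch).shape)           # torch.Size([1, 3])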