AUTHOR=Proksch Sebastian, Schühle Julia, Streeb Elisabeth, Weymann Finn, Luther Teresa, Kimmerle Joachim
TITLE=The impact of text topic and assumed human vs. AI authorship on competence and quality assessment
JOURNAL=Frontiers in Artificial Intelligence
VOLUME=7
YEAR=2024
URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2024.1412710
DOI=10.3389/frai.2024.1412710
ISSN=2624-8212
ABSTRACT=

Background

While Large Language Models (LLMs) are viewed positively with respect to technological progress and capabilities, people tend to oppose machines making moral decisions. However, the circumstances under which algorithm aversion or algorithm appreciation is more likely to occur with respect to LLMs have not yet been sufficiently investigated. Therefore, the aim of this study was to investigate how texts on moral or technological topics, allegedly written either by a human author or by ChatGPT, are perceived.

Methods

In a randomized controlled experiment, n = 164 participants read six texts, three of which had a moral and three a technological topic (predictor text topic). The alleged author of each text was randomly labeled either “ChatGPT” or “human author” (predictor authorship). We captured three dependent variables: assessment of author competence, assessment of content quality, and participants' intention to submit the text in a hypothetical university course (sharing intention). We hypothesized interaction effects: we expected ChatGPT to score lower than alleged human authors for moral topics and higher than alleged human authors for technological topics.
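
The labeling procedure described above can be sketched as follows. This is only an illustrative sketch, not the authors' actual procedure: it assumes each text's label is drawn independently with equal probability, and the function name `assign_labels`, the seeding scheme, and the dictionary keys are hypothetical.

```python
import random

# Design facts taken from the abstract: three texts per topic, two labels.
TEXTS_PER_TOPIC = {"moral": 3, "technological": 3}
LABELS = ("ChatGPT", "human author")


def assign_labels(participant_id, seed=0):
    """Return one randomly labeled presentation plan for a participant.

    Assumption (not stated in the abstract): every text is labeled
    independently, with both labels equally likely.
    """
    # A per-participant generator keeps the assignment reproducible.
    rng = random.Random(f"{seed}-{participant_id}")
    plan = []
    for topic, count in TEXTS_PER_TOPIC.items():
        for text_number in range(1, count + 1):
            plan.append({
                "topic": topic,
                "text": text_number,
                "label": rng.choice(LABELS),
            })
    return plan


if __name__ == "__main__":
    for entry in assign_labels(participant_id=1):
        print(entry["topic"], entry["text"], entry["label"])
```

Under independent labeling, a given participant may see unequal numbers of “ChatGPT” and “human author” labels within a topic. A balanced scheme would be an equally plausible alternative that the abstract does not rule out.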

Results

We found a small interaction effect only for perceived author competence, p = 0.004, d = 0.40, and none for the other dependent variables. However, ChatGPT was consistently devalued compared to alleged human authors across all dependent variables: there were main effects of authorship for assessment of author competence, p < 0.001, d = 0.95; for assessment of content quality, p < 0.001, d = 0.39; and for sharing intention, p < 0.001, d = 0.57. There was also a small main effect of text topic on the assessment of content quality, p = 0.002, d = 0.35.

Conclusion

These results are more in line with previous findings on algorithm aversion than with those on algorithm appreciation. We discuss the implications of these findings for the acceptance of LLM use in text composition.