AUTHOR=Albalawi Rania, Yeap Tet Hin, Benyoucef Morad TITLE=Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis JOURNAL=Frontiers in Artificial Intelligence VOLUME=3 YEAR=2020 URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2020.00042 DOI=10.3389/frai.2020.00042 ISSN=2624-8212 ABSTRACT=
With the growth of online social network platforms and applications, large amounts of textual user-generated content are created daily in the form of comments, reviews, and short-text messages. As a result, users often find it challenging to discover useful information, or to learn more about the topic being discussed, from such content. Machine learning and natural language processing algorithms are used to analyze the massive amount of textual social media data available online; among them, topic modeling techniques have gained popularity in recent years. This paper investigates topic modeling and its common application areas, methods, and tools. We also examine and compare five frequently used topic modeling methods, as applied to short textual social data, to show their practical benefits in detecting important topics. These methods are latent semantic analysis, latent Dirichlet allocation, non-negative matrix factorization, random projection, and principal component analysis. Two textual datasets were selected to evaluate the performance of the included topic modeling methods based on topic quality and some standard statistical evaluation metrics, such as recall, precision,