AUTHOR=Kassab Lara , Kryshchenko Alona , Lyu Hanbaek , Molitor Denali , Needell Deanna , Rebrova Elizaveta , Yuan Jiahong TITLE=Sparseness-constrained nonnegative tensor factorization for detecting topics at different time scales JOURNAL=Frontiers in Applied Mathematics and Statistics VOLUME=10 YEAR=2024 URL=https://www.frontiersin.org/journals/applied-mathematics-and-statistics/articles/10.3389/fams.2024.1287074 DOI=10.3389/fams.2024.1287074 ISSN=2297-4687 ABSTRACT=

Temporal text data, such as news articles or Twitter feeds, often comprises a mixture of long-lasting trends and transient topics. Effective topic modeling strategies should detect both types and clearly locate them in time. We first demonstrate that nonnegative CANDECOMP/PARAFAC decomposition (NCPD) can automatically identify topics of variable persistence. We then introduce sparseness-constrained NCPD (S-NCPD) and its online variant to control the duration of the detected topics more effectively and efficiently, along with theoretical analysis of the proposed algorithms. Through an extensive study on both semi-synthetic and real-world datasets, we find that our S-NCPD and its online variant can identify both short- and long-lasting temporal topics in a quantifiable and controlled manner, which traditional topic modeling methods are unable to achieve. Additionally, the online variant of S-NCPD shows a faster reduction in reconstruction error and results in more coherent topics compared to S-NCPD, thus achieving both computational efficiency and quality of the resulting topics. Our findings indicate that S-NCPD and its online variant are effective tools for detecting and controlling the duration of topics in temporal text data, providing valuable insights into both persistent and transient trends.