AUTHOR=Navarro-Colorado Borja
TITLE=On Poetic Topic Modeling: Extracting Themes and Motifs From a Corpus of Spanish Poetry
JOURNAL=Frontiers in Digital Humanities
VOLUME=5
YEAR=2018
URL=https://www.frontiersin.org/journals/digital-humanities/articles/10.3389/fdigh.2018.00015
DOI=10.3389/fdigh.2018.00015
ISSN=2297-2668
ABSTRACT=

This paper analyzes the application of LDA topic modeling to a corpus of poetry. First, it explains how the most coherent LDA-topics were established by running several tests and automatically evaluating the coherence of the resulting LDA-topics. The results show, on the one hand, that lemmatization is not advisable when dealing with a corpus of poetry, because several poetic features are lost in the process; and, on the other hand, that a standard LDA algorithm performs better than LF-LDA, a version of LDA designed for short texts. The resulting LDA-topics were then manually analyzed in order to define the relation between word topics and poems. The analysis shows that there are mainly two kinds of semantic relation: an LDA-topic can represent the subject or theme of the poem, but it can also represent a poetic motif. All these analyses were carried out on a large corpus of Golden Age Spanish sonnets. Finally, the paper presents the most relevant themes and motifs in this corpus: themes such as “love,” “religion,” “heroics,” “moral,” or “mockery,” and motifs such as “rhyme,” “marine,” “music,” or “painting.”
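
As a rough illustration of what automatically evaluating topic coherence can look like, the sketch below scores a topic's top words by how often they co-occur in the same documents. It is not the paper's procedure: the abstract does not say which coherence measure was used, so the UMass-style score (averaged over word pairs here), the function name `umass_coherence`, and the toy documents are all assumptions made for illustration.

```python
import math


def umass_coherence(top_words, documents, eps=1.0):
    """UMass-style coherence of a topic's top words (hypothetical helper).

    For each pair (w_i, w_j) with j < i in the ranked word list, add
    log((D(w_i, w_j) + eps) / D(w_j)), where D counts the documents that
    contain all the given words. The mean over pairs is returned
    (an assumption: the usual formulation reports the sum).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # skip words that never occur in the corpus
            scores.append(math.log((doc_freq(w_i, w_j) + eps) / d_j))
    return sum(scores) / len(scores) if scores else float("nan")


# Toy, invented documents (NOT taken from the paper's corpus).
toy_docs = [
    ["amor", "fuego", "ojos", "alma"],
    ["amor", "ojos", "alma", "pena"],
    ["mar", "nave", "viento", "puerto"],
    ["mar", "viento", "nave", "amor"],
]

coherent_topic = ["amor", "ojos", "alma"]   # words that tend to co-occur
mixed_topic = ["amor", "nave", "puerto"]    # words that rarely co-occur

coherent_score = umass_coherence(coherent_topic, toy_docs)
mixed_score = umass_coherence(mixed_topic, toy_docs)
print(coherent_score > mixed_score)
```

On this toy data the topic whose words share documents receives the higher score. Comparing such scores across runs (for example, with and without lemmatization) is one way to rank configurations automatically, under the assumptions stated above.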