The Language of Extremism on Social Media: An Examination of Posts, Comments, and Themes on Reddit

Hiaeshutter-Rice, Dan; Hawkins, Ian

doi:10.3389/fpos.2022.805008

ORIGINAL RESEARCH article

Front. Polit. Sci., 10 May 2022

Sec. Elections and Representation

Volume 4 - 2022 | https://doi.org/10.3389/fpos.2022.805008

This article is part of the Research Topic: Negativity, Incivility, and Toxicity in Political Discussions

Dan Hiaeshutter-Rice¹*

Ian Hawkins²

¹Department of Advertising and Public Relations, Michigan State University, East Lansing, MI, United States
²Department of Communication Studies, University of Alabama at Birmingham, Birmingham, AL, United States

Digital media give the public a voice to discuss or share their thoughts about political and social events. However, these discussions can often include language that contributes to creating toxic or uncivil online environments. Using data from Reddit, we examine the language surrounding three major events in the United States that occurred in 2020 and early 2021 from the comments and posts of 65 communities identified for their focus on extreme content. Our results suggest that social and political events in the U.S. triggered increased hostility in discussions as well as the formation of a set of shared language for describing and articulating information about these major political/social moments. Findings further reveal shifts in language toward more negativity, incivility, and specific language surrounding non-White outgroups. Finally, these shifts in language online were found to be durable and last after the events occurred. Our project identifies that negative language is frequently present on social media and is not necessarily exclusive to one group, topic, or real-world event. We discuss the implications of language as a powerful tool to engage, recruit, and radicalize those within communities online.

Introduction

The year 2020 saw a tremendous number of political and social changes worldwide as individuals in every country dealt with societal and medical crises resulting from the COVID-19 pandemic (Zoumpourlis et al., 2020). In addition to the global pandemic, the United States contended with nationwide protests against racism (e.g., George Floyd murder protests, Taylor, 2021) and experienced a Presidential election fraught with emotions and hostility (American Psychological Association, 2020). Given these highly salient and pressing issues, it is understandable that in 2020 individuals regularly sought information about these important real-world events. Inevitably, the source individuals turn to for political and social information matters a great deal, whether that be friends, family, cable news, or online discussions (e.g., Lambert et al., 1988; Althaus and Tewksbury, 2000; Johnson and Kaye, 2013).

Diverse sources, both in ideology and platform, have been demonstrated to give citizens broad understanding of politics as well as heightened political tolerance (e.g., Dubois and Blank, 2018). However, there are concerns about the degree to which individuals are exposed to diverse content on social media platforms, in part driven by the structure of the communication platform (Barberá, 2020; Hiaeshutter-Rice et al., 2021). Given the often emotionally charged nature of politics, especially around major events, we might expect that negative, hostile, and uncivil language can be used to describe political/social moments and become part of the standard discourse within like-minded communities (Valentino et al., 2011; Brader and Marcus, 2013). In addition, when major events occur, how online communities respond and the resultant content on platforms can shape the ways in which users view and perceive political and social actions. This has important implications for seeking information, as digital platforms provide a space where political discussions often feature increased toxicity, incivility, negativity, and polarization (Kim et al., 2020). This paper seeks to understand how like-minded communities that are generally focused around intolerance, hate, or extremist views respond to major political and social events. It is particularly focused on how language changes from before and after a political moment in these online spaces. One digital community where this is likely commonplace, and the focus of the present research, is Reddit.

Reddit is one of the fastest growing social media platforms and is one of only two online platforms that has experienced statistically significant growth in recent years (Pew Research Center, 2019, 2021).¹ While it calls itself the “front page of the Internet,” Reddit has also received extensive attention for the presence of hate and extremist-related content (Massanari, 2020). For example, once-popular subreddits like r/The_Donald and r/MGTOW provided a space for toxic political discussions (Flores-Saviaga et al., 2018; Gaudette et al., 2020).² Despite Reddit's crackdown on some of the largest problematic communities (Allyn, 2020), less visible, yet still controversial, subreddits continue to operate freely. More notably, these are often a place on Reddit for like-minded individuals to gather to communicate with each other and provide information about current events. This has the potential to create a space for shared language and hostility to develop around recent influential events in the United States (e.g., Presidential elections, social movements, nationwide crises, etc.). The implications of this are important as major social/political moments are a time for individuals to learn about politics, processes, government policy, and given the potentially insular nature of a subreddit, a place for specific narratives to develop around these moments.

Although research has begun to explore Reddit as a platform (e.g., Proferes et al., 2021), little research has provided a macro level examination of the specific language engaged with and viewed by its users (e.g., see Nithyanand et al., 2017a,b). Even less research has investigated this in the context of particularly impactful recent social and political events throughout 2020 and early 2021. Through an analysis of 55,797 posts and 5,087,644 comments across 65 subreddits, the present study seeks to advance this agenda with an exploration into the language used in communities on Reddit identified for their incivility and hostile communication dynamics. In addressing a series of research questions and predictions, the current research examined social media content surrounding three major political/social events that occurred during our time frame: 1) the killing of George Floyd and the subsequent Black Lives Matter protests starting on May 25th, 2) 2020 US Presidential Election Day on November 3rd, and 3) the 2021 storming of the US Capitol in an attempted insurrection on January 6th.

Our investigation is predicated on the contention that, especially in times of major political moments, self-reinforcing centers of information such as like-minded online communities of equals, have the potential to exacerbate existing group tensions. In addition, how individuals within a group discuss events and moments furthers the development of an uncivil or toxic shared set of language. These are not new arguments (e.g., Kim et al., 2020), but the present research extends the literature in two important ways. First, we conceptually examine how recent/current major events are dealt with in spaces specifically designed to encourage horizontal political discussion and discourse. We propose that major political moments theoretically increase the use of hostile or negative language, and because they are durable, will shape future discourses along these lines in communities that regularly engage in these approaches. Second, from a practical standpoint, we highlight various ways to evaluate content on Reddit using a series of content-analytic tools. By focusing on the nature of language around recent major US political events that occurred in 2020 and early 2021 we offer current and crucial insights into how real-world situations contribute to the use and spread of negative, uncivil, and substantive durable discussions among individuals on Reddit.

Negativity, Incivility, Toxicity, and Language on Social Media

Individuals turn to social media for many different reasons, including searches for information, escape from negative emotions, and pursuit of positive emotions (Brailovskaia et al., 2020). Most commonly, public opinion data show that the majority of people use online platforms to communicate with others (Pew Research Center, 2011). Individuals typically interact through posting content themselves as well as engaging with content created by others, for example by commenting. Indeed, it is important to distinguish between negative interactions and negative content. Toxic or uncivil language in posts and comments is frequent across almost all social media platforms (Leite et al., 2020). Additionally, online incivility is often linked to contextual factors including quotes in an article or the article's topic (Coe et al., 2014). The degree to which online interactions are positive or negative can be further moderated by factors such as the size of an individual's social network (Kim, 2020). Although the concepts are similar, uncivil language can be described as language that is rude or impolite (Merriam-Webster, n.d.), whereas toxic language is considered to be poisonous or used with the intent to harm others. Kim et al. (2020) argue that definitions of incivility, and the implications or consequences of it online, often vary by study. However, despite differing theoretical and methodological criteria, disrespect for others is a common theme in the literature. Existing research describes incivility/toxicity as “expressing disrespect for someone by using insulting language, profanity, or name-calling; by engaging in personal attacks; and/or by employing racist, sexist, and xenophobic terms” (Kim et al., 2020, p. 924). Interestingly, different types of incivility might elicit a range of outcomes from online users. For example, messaging that contains name-calling or vulgarity is often considered particularly uncivil behavior or speech (Kenski et al., 2020). Altogether, uncivil and toxic forms of communication can contribute to a negative information environment. One topic where incivility has experienced a steady increase is in politically focused discussions within the media (Gervais, 2014).

Indeed, when users are exposed to uncivil political content that conflicts with their own beliefs this results in more anger and less satisfaction with discourse online (Gervais, 2015). However, it is also critical to consider participants' agreement or disagreement with the toxic information they examine. For example, viewing uncivil information that aligns with an individual's beliefs is likely to increase the incivility of political comments that these individuals use online (Gervais, 2015). People are also typically angered by incivility that specifically targets their ingroup and not angered by like-minded incivility (Gervais, 2017). In other words, we become angered by incivility we disagree with and might choose to engage with incivility that agrees with our beliefs. This means that the connection between negative content and social media is not necessarily one size fits all and might lead some individuals to immerse themselves further into these, often, harmful discussions.

In mediated environments this has broad implications as it can socialize individuals to see negativity as a behavior that is generally acceptable online (Hmielowski et al., 2014). This has critical considerations for not only how people interact with each other, but likely how they process information they view in digital spaces. Indeed, the presence of uncivil comments connected to news blogs can increase participants' perceptions that a story is biased (Anderson et al., 2018). Given the frequency of negative comments online this is an important example of how incivility can influence or skew the ways in which individuals understand the content they are exposed to. Further, the relationship between social media and incivility is not inconsequential as toxic language in some situations could be harmful to democratic political discussions (Rossini, 2020). For example, when individuals view uncivil discourse in a digital format it is likely to lead to more issue polarization (Anderson et al., 2014).

Altogether, the research above highlights the relationship between uncivil language and digital spaces. Considering the influence that elements of language can have on engagement (e.g., Noguti, 2016) and incivility (Gervais, 2015, 2017), it is important to consider how it affects the ways in which individuals communicate with each other online. One common form of interaction on social media is horizontal communication. This is described as participatory communication among at least two individuals who are at the same level (Costanza-Chock, 2006; Maia et al., 2021). This differs from other forms of communication that might involve information coming from political or media elites to the general public. Given the features of many social media platforms, the Internet is used extensively by individuals to engage in horizontal communication (Castells, 2007). Because horizontal interaction takes place between those who are considered equals, it might be especially suitable for digital media websites that encourage communities to form. Indeed, one platform applicable to this, and the discussion of negativity online above, is Reddit.

In this context, Reddit is an especially relevant website to consider for a few reasons. Mainly, it is important to recognize the forum or norms of the social media platform where the uncivil content is taking place. This might be notable in online communities where conversations are centered on uncivil topics (e.g., racism, sexism, discrimination), user identity is both more homogeneous and anonymous, and horizontal forms of communication, even when negative, are encouraged.

Content and Communities on Reddit

Launched in 2005, Reddit is currently the 7th most visited website in the United States and 19th in the world (as of February 2021: Alexa Internet, 2021). Data on the demographics of Reddit users identifies that they are mostly male (67%), younger (64% ages 18–29), White (70%), and from the United States (48%) (Pew Research Center, 2016). In addition to its large user base (52 million daily active users) and number of communities (100,000+), this platform is known for its use of topic-oriented communities called subreddits which allow individuals to post content and subsequently provide comments on focused topics (Reddit, n.d.). Additionally, Reddit uses a unique voting system which gives users the ability to upvote or downvote any content that they choose to anonymously. The voting system underlies the algorithm that determines what people see when they view a subreddit, as highly upvoted and commented on posts rise to the top. This is important to the present research for two reasons.

First, it causes content that is highly upvoted to move to the top (e.g., becomes more visible) and vice versa for content that is downvoted (e.g., becomes less visible). Second, because voting is anonymous, individuals do not have to be concerned about potential backlash from others for what they upvote or downvote. This aspect of anonymity in evaluating others' content, and creating a profile more generally, is one way that Reddit differs from other social media platforms such as Facebook, Twitter, and Instagram (Van der Nagel and Frith, 2015). Anonymity on social media platforms can have important implications. Indeed, as Boyd (2012) discusses, individuals may be viewed by others as more accountable when they have to make their identity known and are unable to stay anonymous online. Additionally, when individuals can conceal their identity they are likely to engage in more aggressive forms of communication (Hutchens et al., 2015). Altogether, the features of Reddit allow individuals who do not want to actively and publicly participate a way to still contribute to what content is popularized (Kilgo et al., 2018).

In the context of problematic content, the voting system may be a positive technical feature by allowing users the ability to downvote harmful content without fear of retaliation. However, that only works in communities where the norm is to punish problematic content. In subreddits where that type of content is tolerated or encouraged, the voting system can lead to hate related or toxic content being anonymously pushed to the top of what a user might view. Indeed, an emerging area of research has begun to explore the reasons and content behind Reddit's issues with hate speech (for a review see Massanari, 2020).

In a sample of individual users on Reddit, 43% were found to be those who steadily used or gradually started to use more toxic comments over time (Mall et al., 2020). Other research points to communities that discuss discriminatory views related to race/ethnicity and politics that have become popular on Reddit (Topinka, 2018; Gaudette et al., 2020; Mittos et al., 2020). Subreddits identified for their sexist concentration on Men's rights and anti-feminist activism also contribute to harmful dynamics (Massanari, 2017; LaViolette and Hogan, 2019). Common across hate related topics on this platform is the use of negative language to both form an ingroup identity and create toxic conversations around “others” or outgroups. Indeed, on the subreddit r/The_Donald users were found to use hostile language by describing Muslim immigrants as terror threats to western civilization (Gaudette et al., 2020). While Reddit has attempted to crack down and moderate explicitly racist, sexist, and hate speech content (Chandrasekharan et al., 2017, 2020) much of it is still present and available to be viewed by individuals on this platform.

Exposure to uncivil language in online communities is important as it can have an effect on other users. Specific to Reddit, toxic language is associated with a comment receiving more direct replies (Xia et al., 2020). Toxic comments were also found to be positively associated with a higher user score (e.g., upvotes compared to downvotes) in some subreddits. This is problematic as it can lead to negative content having a more influential or prominent spot within networks of information online (Chipidza, 2021). While much of this research employs cross sectional or big data approaches, experimental evidence finds that when shown comments on social media that are considered highly toxic participants in turn are more likely to also write toxic comments themselves (Kim et al., 2020). This relationship was not found when participants were exposed to civil comments.

Altogether the literature indicates that negative language online can have a contagion effect that has a harmful influence on those exposed to this content (Gervais, 2015; Anderson et al., 2018; Kim et al., 2020). Relevant to the present study, if individuals regularly view uncivil language related to real world events on Reddit this could have implications for both how they understand these events as well as how they engage with the content surrounding them. However, to this point little research has examined 1) if the incivility of language changes before and after events in the real world on platforms such as Reddit, 2) whether these shifts in language are durable or last after the event occurs, or 3) provided an examination of this process across the most recent year of data (e.g., 2020 and early 2021). Considering the political and social volatility in 2020, it is likely that real world events and experiences were regularly discussed across digital media websites.

Events in the Real World and Communication Online

One function of social media is for individuals to share their thoughts, opinions, and experiences about what is happening in the real world with those in the digital world. As a result, real-world events guide topics of conversation online, which is especially true in the context of politics. For example, the 2016 Presidential election was recognized for being highly contentious (American Psychological Association, 2020). The heightened tension and support of the candidates in the real world were found to subsequently occur in the digital world as well. Indeed, in the subreddit for Donald Trump individuals were consistently supportive of his candidacy over the election cycle, whereas support for Hillary Clinton in her subreddit trended negatively over time (Hale and Grabe, 2018).

Other studies examining incivility on platforms like Reddit show that political content has become significantly more offensive since the start of the 2016 Presidential election (Nithyanand et al., 2017a). More specifically, while the amount of uncivil apolitical comments has stayed steady, the number of uncivil political comments has increased since 2016 (Nithyanand et al., 2017b). Interestingly, Nithyanand et al. (2017a) also find that when Trump did well in the polls this was positively related to increased incivility online. Of course, this boost in negative content is likely coming from different directions, both as pushback from his opponents and as support from like-minded conservatives. Altogether, this is an example of how what is happening in the real world can trickle into the ways in which individuals interact in the digital world. However, data from Nithyanand et al. (2017a,b) is focused on large trends in language through 2017 (e.g., the end of Donald Trump's first presidential election) and is primarily focused on presidential politics.

In other words, less is known about whether trends of negativity, incivility, and content on platforms such as Reddit are currently playing out in a similar manner. Considering this, a strong argument can be made that an updated analysis is needed around the 2020 Presidential election and discussions online. Additionally, much of the previous research has focused on trends over time (e.g., a large number of years) or around specific political events. While an examination of language across a number of years and political events is important, it is also critical to investigate language in the context of social events. For example, it is likely that events in the recent past, such as the murder of George Floyd and the subsequent national protests, as well as current events like the COVID-19 pandemic, also affect conversations on social media. Indeed, an investigation of language online before and after both political and social events in the real world provides a more precise understanding of how individual events influence discourse and horizontal communication on digital platforms. As discussed above, throughout the year 2020 and into 2021, a large number of critical social and political events took place. The present research provides an examination into how these influential moments in 2020 and the beginning of 2021 shaped discussions along with language online. Given the literature outlined above, we propose the following research questions.

RQ1: How does the negativity and positivity of the tone in online content change in response to real world events?

RQ2: How does incivility in online content change in response to real world events?

Further, we anticipate that these problematic communities are also havens of election disinformation and conspiracy theories. Consequently, content about election overturning, voting conspiracies, and related topics will be prevalent on these subreddits. Accordingly, we make the following prediction.

H1: Election, voting, and overturning content will increase as a share of all content in the subreddits examined between the 2020 Presidential election and the insurrection at the capitol on January 6th, 2021.

Data and Methods

Like many social media sites, Reddit has an API that users can access and collect data from. This particular API allows access to a variety of endpoints, including subreddit posts and the corresponding comments. We began our data collection process by creating a list of over 400 subreddits. Our list was self-created and is based on a variety of different sources. First, we collected subreddits from lists maintained by anti-hate subreddits. The primary list came from r/AgainstHateSubreddits in an, unfortunately, since deleted post.³ In addition, we spent time looking through various political subreddits⁴ and other aggregations of problematic communities.⁵

The list is by no means comprehensive and is skewed toward those pages that have been identified by other users of Reddit. Moreover, our selection criteria relied on qualitatively evaluating the subreddits for their content to see if they warranted inclusion. We discuss the implications of this process toward the end of the paper. That being said, it is a reasonable starting point and covers the majority of the most commonly cited problematic pages. A number of these subreddits were unavailable for data collection for a variety of reasons: some had been shut down or quarantined by Reddit, while others were deleted or made private. Consequently, this left us with 65 subreddits that we were able to access through the Reddit API. From these 65, we collected posts from January 14th, 2020, until January 11th, 2021, giving us a total of 55,797 posts and 5,087,644 comments.⁶

As discussed above, we were primarily interested in how major political and social events influence the language in problematic communities. As such, we examined the language associated with events, how language changes, and the durability of these changes afterwards. In order to do so, we employed structural topic modeling (STM). This is a powerful tool that looks at the co-occurrence of words in our corpus and classifies them into themes, or topics. One of the key advantages of STM is that it allows us to structure our results at the document level, meaning that we can use the associated metadata to organize results. Put simply, STM lets us evaluate how language changes across these communities based on the posting time and subreddit.

Running the STM requires some pre-processing of the text to yield understandable topics. This included the removal of topically meaningless text (emoticons, URLs, etc.) as well as stopwords. From here, we made the decision to stem the remaining text (e.g., conspiracy becomes conspiraci).⁷ While the subsequent analyses rely on spectral initialization (Mimno and Lee, 2014; Roberts et al., 2019), we include a series of optimization tests in the Appendix. Briefly, for comments we used a random sample of 10,000 drawn from the dataset. We began with the fixed beta and varied the K values (20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140) that are around the K produced by the spectral initialization (73). We then used a fixed K of 73 and varied the alpha (0.01, 0.05, 0.1). These models are included in the Appendix along with the top-7 words that distinguish each of the topics. These diagnostics confirm the validity of using the spectral initialization and default meta variables.
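The cleaning steps described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the stopword list and the regular expressions are hypothetical stand-ins, and stemming is omitted because it would require a separate stemmer library.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real list is far longer.
STOPWORDS = {"the", "a", "an", "is", "and", "of", "to", "in", "at"}

# Assumed URL pattern; the actual cleaning rules are not given in the text.
URL_RE = re.compile(r"https?://\S+|www\.\S+")

def preprocess(text):
    """Lowercase, strip URLs, keep alphabetic tokens, drop stopwords."""
    text = URL_RE.sub(" ", text.lower())
    # Keeping only runs of letters also discards emoticons and punctuation.
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOPWORDS]

print(preprocess("The election is RIGGED :) see https://example.com/x"))
```

The resulting token lists would then be stemmed and passed to the topic model.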

In addition to our machine learning models, we also employ the Lexicoder Sentiment Dictionary (LSD, Young and Soroka, 2012) to evaluate the tone of our corpus as well as the incivility dictionary from Muddiman et al. (2019). The LSD is a widely used tool for evaluating sentiment (Murthy, 2015; Sabel and Dal Cin, 2016; Soroka et al., 2018). Functionally, the LSD uses a positive and negative word dictionary, with each word in every post title, post text, and comment coded as either positive or negative based on its presence in the dictionary. Negative words are coded as −1 and positive words as 1. For each observation, the score is the number of positive words minus the number of negative words, divided by the total number of words in the text. For instance, the sentence “They Hate Us” would be coded as zero positive words minus one negative word, divided by three total words, for a total score of −0.33. Each post text, post title, and comment was assigned a sentiment score, with a mean score of −0.397 and a median of 0. Here, higher numbers represent more positive sentiment, whereas negative numbers indicate more negative sentiment.⁸
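The scoring rule can be sketched in a few lines. The word sets below are hypothetical placeholders rather than entries taken from the LSD, whose dictionary is much larger and whose matching rules may differ from the exact-match lookup assumed here.

```python
import re

# Hypothetical toy dictionaries standing in for the real LSD word lists.
POSITIVE = {"love", "good", "great"}
NEGATIVE = {"hate", "bad", "terrible"}

def tone_score(text):
    """(positive words - negative words) / total words; 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens)

print(round(tone_score("They Hate Us"), 2))  # -0.33
```

With these toy lists the example sentence from the text reproduces the score of −0.33.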

For incivility, we employ a similar strategy of counting the number of words that appear from the incivility dictionary developed by Muddiman and colleagues (2019). After counting the number of words from the dictionary that appear in the text, we divide that number by the total number of words in the text and multiply by 100. Higher numbers indicate more incivility, with the total corpus having a mean of 0.758 and a median score of 0.
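As a rough illustration of this percentage measure, the sketch below counts dictionary hits per 100 words. The dictionary passed in is a hypothetical two-word placeholder, not the Muddiman et al. (2019) list, and exact-match lookup is an assumption.

```python
import re

def incivility_score(text, dictionary):
    """Share of tokens found in the dictionary, expressed per 100 words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(t in dictionary for t in tokens)
    return 100 * hits / len(tokens)

# Hypothetical placeholder entries for demonstration only.
toy_dictionary = {"idiot", "stupid"}
print(incivility_score("You are an idiot", toy_dictionary))  # 25.0
```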

Findings

To start, we first examined the distribution of posts and comments over our time window. Figures 1, 2 show the posting and commenting frequency along with vertical lines indicating the three major events we were interested in. The line represents a moving average of a 7-day window for each day in our corpus. What the data show is a relatively steady stream of posting behavior for most of the year with the murder of George Floyd on May 25th, and the subsequent release of recorded footage the next day, producing a small uptick. As the presidential election nears, posting slowly then more dramatically increases, leading up to a huge spike in posts before and after the January 6th Capitol assault. For commenting behavior, the largest increases come during the election, as well as another large jump associated with the Capitol assault. Here, George Floyd does not drive as many comments as our other major events do.

Figure 1. Post count by day.

Figure 2. Comment count by day.

Posting and commenting behavior give us some insights into how these problematic groups responded to major events. The Capitol assault was, by far, the most posted about event over the roughly 1 year of our time window. What is also noteworthy is the ramping up that occurred before the assault itself. Posting behavior notably changed right before the election and continuously increased leading up to the actual attempted insurrection at the U.S. Capitol. This suggests that the Capitol assault was not an isolated event in these communities, but the culmination of a period of increasing engagement and discussion. This is not surprising of course, as from the election onwards there were a series of events that took place both as part of the regular election schedule and as a result of both the closeness of the election and Trump's attempts to subvert the process of a peaceful exchange of power. Each likely contributed to the ongoing dialogs identified in our corpus.

That being said, the primary purpose of this study is to evaluate shifts in content around these events. Consequently, we now turn to how sentiment, or tone, responds to these events. As noted above, these analyses used the Lexicoder Sentiment Dictionary. Figures 3, 4 below show the overall trend in tone across our corpora. Due to high variation in individual day tone, especially for posts, we smooth the line over a 7-day window.⁹ Post and comment tone is an aggregate of all content on that day. Negative numbers indicate negative content and positive numbers indicate positive content. What we are looking at is the relative change in tone. For instance, post content is relatively positive, but events cause changes in the overall sentiment of the text. Some interesting dynamics immediately stand out. Post tone takes a large drop immediately following the murder of George Floyd in May of 2020. Conversely, we see an uptick in tone of comments and posts leading up to, but not following, the presidential election. Overall trends reflect a priori expectations regarding how the members of these communities might respond to the specific events we examined.
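The 7-day smoothing can be sketched as a simple rolling mean over daily values. Whether the window is trailing or centered is not stated, so the trailing window below (with shorter windows for the first few days) is an assumption.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; early entries average over the days available."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Made-up daily values, smoothed with a 2-day window for brevity.
print(rolling_mean([1, 2, 3, 4], window=2))  # [1.0, 1.5, 2.5, 3.5]
```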

FIGURE 3

Figure 3. Post tone by day.

FIGURE 4

Figure 4. Comment tone by day.
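The 7-day smoothing applied to the daily tone series can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes a trailing window (the current day plus the six days before it) and, at the start of the series, averages over however many days are available. The function name `rolling_mean` and the sample scores are hypothetical.

```python
def rolling_mean(values, window=7):
    """Trailing moving average over `window` observations.

    Assumption: the first few days are averaged over the days seen so far,
    since the handling of the series start is not described in the text.
    """
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the observation that just left the window.
            running -= values[i - window]
        n_in_window = min(i + 1, window)
        smoothed.append(running / n_in_window)
    return smoothed


# Hypothetical daily tone scores (negative = negative content).
daily_tone = [0.4, 0.1, -0.3, -0.6, -0.2, 0.0, 0.2, 0.3, 0.1]
smoothed_tone = rolling_mean(daily_tone, window=7)
```

A centered window would be an equally plausible reading of "a 7-day window"; the trailing form is used here only because it never relies on days that have not yet occurred.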

Thus far we have identified that major events precipitate changes in tone and that those changes last for extended periods of time post-event. Notably, the murder of George Floyd resulted in a shift in overall tone that did not quite return to pre-event levels until multiple months later. On their own, the figures are rather striking, but we supplement them with interrupted time series analyses (ITSA) for both posts and comments around the George Floyd murder as additional evidence. Using ITSA allows us to isolate the effect of an event on the trend line of tone. We first subset the dataset to the 2 months before and 2 months after the murder and then code the event (equal to 1) as the 7 days after George Floyd was killed. We use the week after as our event window both because search trends from Google indicate that attention waned significantly after this time and because, according to Chen et al. (2007), news events often last about a week. The ITSA produces significant variation between time periods for tone (p < 0.001, see Table 1), as shown in Figure 5.¹⁰
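As a rough sketch of the event coding described above, the snippet below labels each day as pre-event, event, or post-event and computes a one-way ANOVA F statistic across the three periods. It is an illustration under stated assumptions, not the authors' model: the event window is assumed to start on the day of the murder (25 May 2020) and run for 7 days, the helper names `label_period` and `one_way_f` are hypothetical, and the tone values are made up.

```python
from datetime import date, timedelta

EVENT_DAY = date(2020, 5, 25)  # day George Floyd was killed


def label_period(day, event_day=EVENT_DAY, event_length=7):
    """Classify a day as 'pre', 'event', or 'post'.

    Assumption: the 7-day event window includes the event day itself.
    """
    if day < event_day:
        return "pre"
    if day < event_day + timedelta(days=event_length):
        return "event"
    return "post"


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of numbers."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, means)
    )
    ss_within = sum(
        (v - m) ** 2 for group, m in zip(groups, means) for v in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Made-up daily tone values keyed by date, for illustration only.
tone_by_day = {
    EVENT_DAY + timedelta(days=offset): tone
    for offset, tone in [(-3, 0.2), (-2, 0.3), (-1, 0.1),
                         (0, -0.4), (1, -0.5), (2, -0.3),
                         (8, -0.1), (9, 0.0), (10, -0.2)]
}

by_period = {"pre": [], "event": [], "post": []}
for day, tone in tone_by_day.items():
    by_period[label_period(day)].append(tone)

f_statistic = one_way_f(list(by_period.values()))
```

A full interrupted time series model would also estimate level and slope changes in the trend; the period comparison here only mirrors the "variation between time periods" reported in the ANOVA tables.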

TABLE 1

Table 1. ANOVA results for comment tone.

FIGURE 5

Figure 5. Interrupted time series of George Floyd's murder on comment tone.

We view this as strong evidence that events in the real world can have long-standing effects on how online communities talk, although future research is needed to confirm this. As such, we investigate incivility using the prebuilt dictionary described above (Muddiman et al., 2019), as shown in Figure 6 below.

FIGURE 6

Figure 6. Comment incivility by day (7-day rolling average).

Much like with tone, we see long-standing effects of events on incivility, especially after the killing of George Floyd, with content becoming significantly more uncivil after the murder. We show another ITSA model (using the same procedures as above) in Figure 7 below, again with significant variation between time periods (p < 0.001, see Table 2). Consequently, we view this as strong evidence that events play a role in the tone and content of communications in these problematic subreddits. More importantly, though, we view this as evidence that the influence of some events can be durable and have effects on content beyond the actual day or so of the event itself.

FIGURE 7

Figure 7. Interrupted time series of George Floyd's murder on comment incivility.

TABLE 2

Table 2. ANOVA results for comment incivility.

Finally, we explore how post language responds to events by specifically focusing on the election and subsequent Capitol attack in this piece. We chose this event because we know that these communities were discussing voting, the election, and presidential politics both before and after the election. Thus, it serves as a useful topic to look at how language presence might change in response to events. Further, there is a clear connection between the election and the eventual assault on the Capitol. That connection means that we can look at the roughly 2 months between these events as a window into how these extremist communities responded to major political moments.

In order to do so, we needed to isolate politically relevant topics in our dataset. This approach allows us to look specifically at posts where discussion of the election, politics, and voting may occur. To begin, we first took all post titles and text (which is text the original poster added under the title) and coded each as 0 or 1 based on the presence of any of a set of words.¹¹ Posts which contained these words were used for analysis. This subset consisted of 13,404 posts from the original 55,797. We used a binary classification for each post being either pre- or post-election (from 9/3/2020 to 11/2/2020 as before and 11/3/2020 through 1/13/2021). Finally, we took the comments from these posts and ran our Structural Topic Model on them. This smaller dataset consisted of 215,334 comments over 132 days. In so doing, we have pared the dataset down to a level that will allow us to track discussion of the election and voting over time.
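The filtering and period coding described above can be sketched as follows. This is a hedged illustration only: the keyword set is hypothetical (the study's actual word list is given in its own footnote), exact lowercase token matching is an assumption, and the function names `mentions_keyword` and `election_period` are invented for the example. The date boundaries are the ones stated in the text.

```python
import re
from datetime import date

WINDOW_START = date(2020, 9, 3)
ELECTION_DAY = date(2020, 11, 3)
WINDOW_END = date(2021, 1, 13)

# Hypothetical keywords; NOT the list used in the study.
KEYWORDS = {"election", "vote", "ballot"}


def mentions_keyword(title, body, keywords=KEYWORDS):
    """Return 1 if the post title or body contains any keyword, else 0.

    Assumption: exact matching on lowercase word tokens, so 'voting'
    would not match 'vote' unless stemming is applied first.
    """
    tokens = set(re.findall(r"[a-z']+", f"{title} {body}".lower()))
    return int(bool(tokens & keywords))


def election_period(day):
    """Return 0 for pre-election, 1 for post-election, None if out of window."""
    if not (WINDOW_START <= day <= WINDOW_END):
        return None
    return 0 if day < ELECTION_DAY else 1


# Made-up posts for illustration.
posts = [
    {"title": "Mail-in ballot count", "body": "", "day": date(2020, 11, 5)},
    {"title": "Weekend thread", "body": "what are you up to", "day": date(2020, 10, 1)},
]
subset = [
    {**post, "post_election": election_period(post["day"])}
    for post in posts
    if mentions_keyword(post["title"], post["body"])
]
```

Only the comments attached to posts in `subset` would then be passed on to the topic model.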

We present the results of our STM for the 215,334 comments in Figure 8 below. For this figure, topics to the left of the vertical line represent those groupings of words that are more associated with the pre-election timeframe. Topics to the right are more closely associated with the post-election window. The further the topic is from the vertical, the more closely associated it is with the respective category. What this tells us is that some topics are more closely related to the pre- vs. post-election time window. More importantly, it tells us that some topics that are of interest to this paper, namely 1, 44, and 62, are more closely associated with post-election content. These topics (and a few others) are the ones that revolve around conspiracy theories, voter fraud, faithless electors, and overturning the democratic process. Specifically, Topic 1 is focused on voting mechanisms, with words like count, ballot, machine. Topic 44 is the conspiracy topic. It includes words like fraud, investigate, prove, fact. Finally, Topic 62 is about the winner of the election as illustrated by words like Trump, Biden, and win. That all three are associated with the post-election window is important as it reflects the narratives in digital spaces going on around the alt-right and conservative spheres at the time.

FIGURE 8

Figure 8. STM for election time period.

While these results may seem self-explanatory, they highlight the important relationships proposed by the present research. Individuals who looked at these social media sites after the election would likely be exposed to a fundamentally different set of conversations than beforehand. These problematic communities took up the narratives and language of voter fraud, conspiracy, and election overturning following the November election. To further highlight these trends, we look at the prevalence of a few topics over the entire 132 days of the smaller timeline. We specifically look at topics 1, 44, and 62.¹² These are all closely related to the election and attempts to overturn it. Because we do not necessarily know the functional form of how these topics vary, we apply a cubic spline to the covariate day and let the model produce an appropriate trend. Results are shown in Figures 9 through 11.

FIGURE 9

Figure 9. Topic 1 proportion over time.

FIGURE 10

Figure 10. Topic 44 proportion over time.

FIGURE 11

Figure 11. Topic 62 proportion over time.

Here we see that topic 1, which is mostly associated with voting, is focused heavily on the pre-election timeframe (as evidenced by the first vertical gray line). After the election the topic remains part of the conversation but does not return to its earlier heights once ballots were cast. Topic 44, however, sees an almost 3-fold increase from its Election Day level. Similarly, topic 62 sees discussion spike pre-election, dip during the week or so after voting day, then rise back up to pre-election levels before trailing off again. Taken together, we argue that events have long-term impacts on how these communities talked and communicated, much like we saw with tone and incivility. Perhaps more importantly, though, discussion of election fraud, conspiracies, and various recourses saw large increases in the proportion of total content post-election, and this trend continued far beyond the few days following the end of ballot casting.

Discussion

Uncivil language on social media is steadily increasing, especially related to political topics (Gervais, 2014). This is a critical issue given both that the public relies on digital spaces for information and that the sources where they consume content matter (Johnson and Kaye, 2013). This is likely even more important in times of crisis as experienced throughout 2020 and into 2021. Problematically, if the content users are exposed to about real-world events is hostile this has the potential to harmfully exacerbate group-based tensions. This could be especially true on platforms such as Reddit, which allow individuals the opportunity to form communities and virtually gather around topics that can be extremely negative (Flores-Saviaga et al., 2018; Gaudette et al., 2020). Accordingly, the present research examines incivility, valence of tone, and language in some of the most problematic communities on Reddit to understand the online content associated with significant political and social moments that recently occurred in the real world.

Our results highlight both how events change posting and engagement behavior and how content shifts in response to events. Using a series of dictionary and machine-learning tools, we find meaningful differences in the substance, tone, and content of Reddit posting and commenting behavior surrounding important events that occurred throughout 2020 and early 2021. More specifically, our data show that events in the real world substantively altered the language being used within the online communities examined and that major political/social events are associated with the content of far-right and extremist communities on social media becoming more negative, increasing in incivility, and coalescing language around a shared set of words and phrases. For the election and subsequent leadup to the attempted insurrection, we find that post-election language focused on messaging that the election results were fraudulent and needed to be overturned. Further, our findings reveal that these changes were durable beyond the initial event itself (e.g., the 2020 Presidential election). Conceptually, this suggests that the influence of offline events can have a meaningful and potentially long-lasting effect on the content of these online communities.

Given this shift in language online and the specific topics discussed in the immediate aftermath of the 2020 Presidential election, there are critical theoretical implications to consider. One example is how narrative shifts on social media can influence users in the real world. Indeed, existing research has argued that the messaging in digital spaces played a significant role in the buildup to the Capitol riot (Hawkins and Saleem, 2021). We find conceptual support for this in that content around the insurrection was not present before the election but was frequent after. Given that some individuals may have turned to social media platforms like Reddit for information after the election, the presence of this messaging and its durability are important. For those upset with the outcome, these online narratives around fraud and insurrection may have been influential in further engaging and radicalizing their beliefs that the election was stolen.

While this might not necessarily have resulted in these individuals participating in the Capitol riot themselves, it is possible that this messaging influenced support for the rioters or distrust in the investigations that followed. Our contention is that language on social media is not inconsequential. Additionally, we argue that the distinctions between information and actions in the real and digital world are blurring. This has implications for the socializing influence of horizontal communication (Costanza-Chock, 2006; Maia et al., 2021) between peers in online spaces. In addition to shifts in language around specific topics, we also find that events in the real world altered incivility and valence of tone.

This is important as our data show the use of negativity and incivility might be especially problematic and common when considering conversations that are emotionally charged and about political/social events in the United States. The growth of uncivil language and increased negativity in the aftermath of the George Floyd murder are key examples of how events have significant impacts on the tone and substance of these communities. Simply put, Floyd's murder resulted in an increase of negative and hostile content in these online communities. The context under which these events are framed is important to understanding how users may interpret and react to the moment itself. This is not inconsequential given the socializing influence of online narratives and discussions to perpetuate biased ways of thinking, especially in the context of outgroup beliefs. Altogether, our research highlights the need for examining the associations between digital media, hostility, and the rising social/political unrest that many are experiencing in the United States.

In addition to the important theoretical and practical implications discussed above, the present study also provides a framework for future research to examine the relationship between social media and negative language outside the focus area and context of our data. Indeed, the content-analytic tools used in this paper are applicable to other topics, communities, and subreddits on this platform. For example, these techniques might be especially useful to more in-depth examinations of issues like cyberbullying (Rakib and Soon, 2018), body image (Sowles et al., 2018), and anxiety or depression (Shen and Rudzicz, 2017). While the present research focuses on the content associated with political and social events, it is possible that any issue for which individuals turn to social media for information will develop its own set of shared language. As our data indicate, this has implications for how users on social media like Reddit will likely process the messaging that they view.

Limitations and Future Research

The current research has important limitations that require attention. First, the present study only examined posts and comments on the Reddit platform. The features of Reddit (Van der Nagel and Frith, 2015) make it a particularly influential place for discussion. However, Reddit is clearly not the only website or app where conversations about political and social moments co-occur with incivility, toxicity, and hate speech (Scrivens and Amarasingam, 2020; Chen et al., 2021; Hawkins and Saleem, 2021). The field would benefit from future studies which investigate and synthesize research across multiple platforms as we know that platforms differ in their content (Hiaeshutter-Rice et al., 2021). While our findings highlight these trends on one platform, we cannot speak to the experiences of users of other sources and, despite Reddit's size, it remains small compared to social media giants like Facebook and Twitter. A broader investigation across social media would allow researchers to understand similarities and differences across platforms when considering information environments and hate speech online.

Second, the events chosen to be examined in our corpus, while relevant, only represent a small number of the social and political moments that occurred throughout 2020 and 2021. Additionally, this study only examined digital content from a single year. Future research should continue to explore in different contexts how social/political events in the real world influence conversations that occur online. Third, although big data approaches are useful for understanding macro-level findings on social media platforms, they do not allow us to make an effects-based link between viewing this content and incivility or hostile attitudes. Future research should investigate how individuals' self-reported and actual exposure to negative posts/comments on digital media websites like Reddit influences attitudes, emotions, and behaviors among users in the real world.

Finally, our selection of subreddits is inherently limiting as many of the most problematic ones were shut down by Reddit. Moreover, we are undoubtedly missing subreddits that are private or small enough to avoid scrutiny. These may be some of the most problematic ones on the site, but we do not have a way to look at their content. In addition, how we chose which groups to include and which to exclude leaves open the criticism that our selection is inherently biased. That being said, we believe that we have a reasonable cross-section of subreddits that covers some of the more notable and problematic communities and that these communities, given their visibility, may be more likely to attract new members.

Conclusion

Altogether, this research makes critical theoretical and practical contributions to the literature concerning politics, social media platforms, and the use of uncivil language. Our data show that online discussions around important real-world events are becoming increasingly negative. Not only do political and social moments in the real world shift tone and incivility, but these changes last after the events take place. Additionally, we find that a unique set of language, focused on insurrection and overturning the results, developed after the 2020 Presidential election. This is problematic as these findings explain how what is happening in the real world can shape and contribute to hostile and harmful information environments online. Given that individuals seek out information in times of uncertainty, and often turn to digital media, this has direct implications for understanding how people process the content they view. The processing of this information is important and, as shown in the context of the present study, can promote further engagement with extremist ideas and content online. If social media users are exposed to negative political discussions about events that are salient and pressing, this is likely to shape conversations along with further exacerbating tensions about what is occurring in the real world.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author Contributions

Both authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpos.2022.805008/full#supplementary-material

Footnotes

1. ^YouTube, being the other.

2. ^The_Donald was a subreddit devoted, broadly, to Donald Trump and the numerous topics surrounding him. MGTOW, or Men Going Their Own Way, was focused on men's rights, though frequently was misogynistic at the very best.

3. ^https://www.reddit.com/r/AgainstHateSubreddits/

4. ^https://www.reddit.com/r/redditlists/comments/josdr/list_of_political_subreddits/

5. ^e.g.: https://www.reddit.com/r/AgainstHateSubreddits/comments/8i0qhm/list_of_known_hate_subreddits_suggestions_will_be/.

6. ^Full lists of both the initial subreddits and the ones used in the paper are in Tables A1, A2.

7. ^Stemming is not without its downsides, though (see Schofield and Mimno, 2016). However, stemming is appropriate here as the vast size of the corpus makes it difficult to evaluate words in the context they appear. To try and compensate, we rely on the Porter stemmer as recommended by Schofield and Mimno.

8. ^Using a dictionary is not without its downsides, of course. All methods, including other computational approaches are subject to errors. While a computational approach may have low correlation with the LSD on a 1:1 level, our argument is that the errors between a dictionary and a human supervised approach likely cancel out in the aggregate, which is how we are using the LSD here. Moreover, there is some evidence that there are marginal gains in accuracy to be made by using human coders (Dun et al., 2021).

9. ^This is also a moving average of each day's content.

10. ^As a robustness test, we look at comments made during the same time-frame for a series of non-political but popular subreddits (r/community, r/diy, r/doesanybodyelse, r/dundermifflin, r/pettyrevenge, r/science, r/starwars, r/talesfromretail, and r/trees). Results are in the appendix but show no significant effect of our event on tone or. We argue this confirms that our event was significant in these communities in a notable way and likely not driven by any confounding events or variables.

12. ^A full list of the Highest Probability words and FREX (most distinguishing) words is in Tables A3, A4.

Topic 1 Highest Prob: count, ballot, mail, poll, recount, absente, machin.

Topic 44 Highest Prob: fraud, evid, claim, fact, believ, prove, investig.

Topic 62 Highest Prob: biden, win, trump, lose, won, elector, chanc.

References

Alexa Internet (2021). Competitive analysis, marketing mix and traffic. reddit.com. Available online at: https://www.alexa.com/siteinfo/reddit.com#section_traffic (accessed April 20, 2022).

Allyn, B. (2020). Reddit Bans The_Donald, Forum of Nearly 800,000 Trump Fans, Over Abusive Posts. Available online at: https://www.npr.org/2020/06/29/884819923/reddit-bans-the_donald-forum-of-nearly-800-000-trump-fans-over-abusive-posts (accessed April 20, 2022).

Althaus, S. L., and Tewksbury, D. (2000). Patterns of Internet and traditional news media use in a networked community. Polit. Commun. 17, 21–45. doi: 10.1080/105846000198495
