- 1Normal School, Jinhua Polytechnic, Jinhua, China
- 2National Engineering Research Center for Educational Big Data, Central China Normal University, Wuhan, China
- 3School of Marxism, Shanghai University of Finance and Economics, Shanghai, China
- 4College of Economics and Management, Zhejiang Normal University, Jinhua, China
- 5Institute of Scientific and Technical Information of China, Beijing, Beijing, China
Purpose: The risk of mental illness arising from graduate students’ academic emotions and academic pressure has received widespread attention. Discovering hidden academic emotions by mining graduate students’ posts on social networks has strong practical significance for detecting their mental state.
Design/methodology/approach: Using data collected from an online academic forum, a text-based BiGRU-Attention model was built to recognize and classify academic emotions, and keyword statistics and topic analysis were performed to examine the topics discussed in graduate students’ posts.
Findings: Female graduate students post more than male students, and graduate students majoring in chemistry post the most. The BiGRU-Attention model identifies and classifies academic emotions with precision, recall and F1 score all above 95%, and the PA (Positive Activating) category has the best classification performance. Analysis of post topics and keywords shows that graduate students’ academic emotions mainly stem from academic pressure, interpersonal relationships and career concerns.
Originality: A deep learning based BiGRU-Attention model is proposed that incorporates the classical academic emotion categories to recognize academic emotions in text from user generated content.
1. Introduction
Graduate education is an important part of higher education. In pursuing a master’s or doctoral degree, graduate students actively produce scientific research achievements. They take advantage of their intelligence and make the best use of high-quality scientific and technological resources. At the same time, they also face great pressure (Scott and Takarangi, 2019). Graduate students’ mental health concerns have been discussed for nearly 10 years and have attracted more and more attention (Gewin, 2012). These achievements and stresses are expressed in many ways, of which emotions are the most significant. Emotions have a great influence on human learning, memory, motivation, mental health, and neurological function. Positive emotions can increase the level of dopamine in the brain, which in turn affects memory, learning attention, and creative problem solving ability (Ashby and Isen, 1999; Meinhardt and Pekrun, 2003; Ainley et al., 2005), whereas negative emotions reduce learning motivation and effort. Pekrun et al. (2002) first clearly put forward the concept of academic emotion in the early 21st century. Academic emotions refer to the various emotional experiences related to students’ academic work in the teaching or learning process.
However, emotions are sometimes not easy to identify. We cannot simply confront other graduate students with a series of rude and personal questions such as “Are you happy doing your research today?” or “Why are you so angry about your teammates?” (Gratz and Roemer, 2004). So, if we want to discover the hidden emotions of graduate students, we first need them to be willing to express their emotions. With the continuous development of network technology, various social platforms have become centers for discussions and opinions (Sobkowicz et al., 2012). In such an anonymous environment, topics are discussed, opinions are published, and emotions are expressed. Meanwhile, opinion mining methods based on machine learning and deep learning technologies have been widely accepted (Ramakrishnan et al., 2020), and sentiment analysis and emotion recognition on large-scale data sets have become feasible (Sánchez-Núñez et al., 2020). Thus, mining the topics and comments published by these graduate students is a worthwhile way to discover their emotional states and care for their mental health. Our research focuses on the following research questions:
1. What is the current state and distribution of graduate students who publish information on social platforms?
2. Based on these students’ posts, how can we discover the hidden academic emotions?
3. What are these graduate students really talking about online?
To address these questions, we conducted the following research on real Internet user generated content. First, we conducted a descriptive statistical analysis of the collected data to find out the distribution of posters’ information. Second, we transformed the task of discovering academic emotions from posters’ topics and replies into a supervised, deep learning based classification task, using natural language processing methods to recognize and classify the content into several emotion categories. Finally, a semantic-based topic evolution analysis was conducted to find out what the posters are actually discussing.
2. Related studies
2.1. Academic emotion recognition and measurement
Academic emotion recognition can be considered as the process of discovering hidden emotions from data produced in students’ study process. Linnenbrink and Pintrich (2002) drew a dichotomy between positive and negative valence as an early exploration of academic emotion recognition and classification. In terms of research methods and technology, three types of academic emotion studies are widely conducted: empirical studies based on qualitative or quantitative questionnaires, model-based academic emotion recognition and measurement, and case studies based on meta-analysis. The Achievement Emotions Questionnaire (AEQ) is an important foundational questionnaire structure for the first type of studies. Pekrun et al. (2011) developed the AEQ and classified academic emotion into 9 categories (enjoyment, hope, pride, relief, anger, anxiety, shame, hopelessness, and boredom) across three settings: learning-related emotions, class-related emotions, and exam-related emotions. Based on these categories, researchers have expanded the applicable field of the AEQ and proposed new versions such as AEQ-S (Bieleke et al., 2021), AEQ-ES (Lichtenfeld et al., 2012) and AEQ-PE (Fierro-Suero et al., 2020). The second type of studies focuses on the optimization of academic emotion models. These studies address the construction of academic emotion models, such as the OCC model (Clore and Ortony, 2013), categorical emotion models (D'mello and Graesser, 2007; Sreeja and Mahalaksmi, 2015) and dimensional emotion models (Sreeja and Mahalakshmi, 2017), which help describe emotional states more accurately and comprehensively and provide references for research on emotion recognition. The third type of studies is based on different cases and uses meta-analysis to comprehensively analyze the factors influencing academic emotions. MacCann et al. (2020) discussed the relationship between emotional intelligence and academic performance. Camacho-Morles et al. (2021) analyzed how activity achievement emotions influence academic achievement.
Recent studies have been conducted primarily on specific groups or by incorporating specific environmental factors. Wang et al. (2017) discussed the relationship between academic emotion and psychological well-being among Chinese rural-to-urban migrant adolescents. Wu et al. (2022) used a questionnaire survey to study whether taking important exams during the COVID-19 pandemic affected students’ academic emotions and found a certain correlation with negative deactivating emotions. Fried et al. (2022) assessed mental health, academic emotion, and resilience among undergraduate and graduate students and found that graduate students generally feel more stressed about their studies.
Meanwhile, research on academic emotion recognition from an artificial intelligence perspective mainly focuses on the practical exploration of emotion recognition in the fields of natural language processing (Nandwani and Verma, 2021), computer vision, speech recognition, and physiological information recognition, mainly using technologies such as text mining, speech recognition, facial expression recognition, gesture recognition, and biological information recognition (Feng et al., 2020).
2.2. Emotion recognition based on text
Sentiment classification and analysis are an important research front in natural language processing, and emotion analysis is considered an extension and complement (Thelwall et al., 2012). Sentiment classification usually focuses only on binary polarity, that is, whether a text is positive or negative (Martín-Valdivia et al., 2013; Younis, 2021). However, according to emotional psychology research, although positive and negative are important emotional dimensions, there are many other emotion types and criteria for measuring emotional intensity, so a positive/negative split alone cannot meet the needs of emotion classification (Cornelius, 1996).
Both unsupervised and supervised approaches are used in emotion detection research. Unsupervised approaches mainly include dictionary-based and rule-based methods. Mac Kim et al. (2010) used three datasets to build an emotion word dictionary as the classification scheme, and classified emotion texts into anger, fear, joy, and happiness. Agrawal and An (2012) added syntactic relations on top of emotional words, built a recognition model, and found that accuracy improved. Supervised approaches are mainly based on machine learning models (Asghar et al., 2019; Hasan et al., 2019; Tiwari et al., 2017), deep learning or hybrid models (Basiri et al., 2020; Xu et al., 2020; Singh et al., 2021), and transfer learning techniques (Ahmad et al., 2020).
Research on academic emotions has been widely carried out, and many questionnaires are used for different research subjects. However, these methods are not suitable for academic emotion recognition on large-scale datasets. Meanwhile, given the achievements of text-based emotion analysis, its methods and techniques can readily be applied to academic emotion recognition research.
3. Data and methodology
3.1. Data collection
The dataset used in our research needs to meet two main requirements. First, to support opinion mining, it should be user generated content. Second, it should be posted by graduate students. We chose to collect posts and comments from “Xiaomuchong”1, a well-known online academic forum in China that provides a place for postgraduate students to exchange academic opinions, discuss study methods, and share their daily lives. The forum has more than 5 million topics, 140 million comments and 25 million registered users. Besides the considerable volume of data, the registration restrictions are critical: registration requires postgraduate certification, and registrants are strongly encouraged to provide their major. Because this study focuses on academic emotions, we restricted the data source to the “postgraduate study mood” section.
We collected the web page text of the Xiaomuchong website over 2 weeks with a web spider program written in Python. The data set contains three types of useful information: topic information, detail information and poster information. The topic information is the set of topics posted by the website’s users. It contains the title of the topic, the post time, the latest reply time, the id of the topic poster and the numbers of views and replies. The detail information contains the full content of the discussions: every reply to every topic in our dataset was collected, including the reply text and the replier’s information such as id and reply time. The poster information contains the personal information of both posters and repliers, such as age, gender and major. In total, we collected 18,831 topic items, 511,269 detail items and 131,755 poster information items and stored them in MongoDB.
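The three record types described above can be sketched as simple Python data classes. This is only an illustration of the data layout: all class and field names are hypothetical, since the text lists the kinds of information collected but not the actual schema, and `to_document` merely produces the plain dictionary shape that a document store such as MongoDB accepts.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Topic:
    """Topic information: one record per posted topic (field names are hypothetical)."""
    topic_id: str
    title: str
    post_time: str
    last_reply_time: str
    poster_id: str
    views: int
    replies: int


@dataclass
class Detail:
    """Detail information: one record per reply to a topic."""
    topic_id: str
    replier_id: str
    reply_time: str
    text: str


@dataclass
class Poster:
    """Poster information; optional fields stay None when the user did not set them."""
    poster_id: str
    age: Optional[int] = None
    gender: Optional[str] = None
    major: Optional[str] = None


def to_document(record) -> dict:
    """Convert any of the records above to a plain dict ready for a document store."""
    return asdict(record)
```

Keeping the poster attributes optional reflects the fact that many users leave fields such as gender unset.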
3.2. Academic emotion recognition method
3.2.1. Academic emotion recognition framework
We used Bi-GRU, a deep learning model, for training and testing to identify academic emotions. The pipeline of the entire model includes three main steps, as shown in Figure 1.
The first step is data pre-processing. For Chinese text, word segmentation must be performed before word vector embedding. The second step is data labeling, which is essential for training a supervised learning model. In the word embedding process, because we needed to train and test different models and compare their results, we used pretrained word2vec vectors instead of randomly initialized word vectors. We selected a Chinese word vector library developed by Tencent Artificial Intelligence Laboratory (Song et al., 2018). This library covers more than 8 million Chinese words and contains many emerging words and Internet terms, which makes it well suited to machine learning and deep learning downstream tasks based on user generated content. After word vector embedding, each segmented Chinese word is converted into a 200-dimensional word vector. The last step is model training. We split the dataset into a training set and a test set, trained for multiple epochs until the classification model was stable, and then evaluated it on the test set.
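As a rough illustration of the embedding step, the sketch below maps an already segmented post to a fixed-length sequence of 200-dimensional vectors. The function name, the toy vector table, the zero vector for out-of-vocabulary words, and the padding and truncation to `max_len` are all assumptions made for the example; the text only states that pretrained 200-dimensional vectors were used.

```python
EMBED_DIM = 200  # dimensionality stated in the text


def embed_tokens(tokens, vectors, max_len):
    """Map segmented tokens to a max_len x EMBED_DIM list of vectors.

    Assumptions: unknown tokens and padding positions get a zero vector,
    and posts longer than max_len are truncated.
    """
    zero = [0.0] * EMBED_DIM
    rows = [list(vectors.get(tok, zero)) for tok in tokens[:max_len]]
    rows += [list(zero) for _ in range(max_len - len(rows))]
    return rows


# Toy stand-in for the pretrained vector table (real tables hold millions of words).
toy_vectors = {
    "实验": [0.1] * EMBED_DIM,
    "开心": [0.2] * EMBED_DIM,
}

matrix = embed_tokens(["今天", "实验", "开心"], toy_vectors, max_len=5)
```

Here `matrix` has 5 rows: a zero row for the unknown word, two looked-up rows, and two padding rows.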
3.2.2. Data labelling
We labeled the dataset with four academic emotion categories: “Positive activating,” “Positive deactivating,” “Negative activating” and “Negative deactivating.” Because of the large amount of data, manually labeling the whole data set is not practical. To ensure the efficiency and accuracy of labeling, we divided the process into two rounds. The first round is unsupervised labeling, and the second round combines unsupervised and manual labeling. Unsupervised labeling determines whether the target text contains emotional words corresponding to a dimension. We first constructed an emotional word list based on the academic emotion categories (Pekrun et al., 2011) and an existing Chinese emotional ontology (Xu et al., 2008). The list contains the dimensions of academic emotion classification and the emotion words related to each dimension (as shown in Table 1). The ontology provides several attributes for each word, including lexicon, emotional category, and emotional strength.
Three cases occurred during the process. First, if the text contains only one emotional word, it is directly labeled with the corresponding dimension. Second, if the text does not contain any emotional words, it is marked as invalid data. The third case is the most complicated: if the text contains multiple emotional words, we calculate an academic emotional strength to capture the emotional tendency. For an unlabeled text, the academic emotional strength of a dimension is the sum of the emotional strengths of all emotional words belonging to that dimension, and the text is labeled with the dimension that has the maximum emotional strength. For example, the unlabeled text “今天做完了实验,真开心, 但导师却挑剔的说结果不够好 (I’m really happy that I finished the experiment today, but my supervisor was critical and said that the result was not good enough)” contains two emotional words <开心 (happy), 挑剔 (critical)>, which belong to two emotional word categories <happy, derogatory> and two academic emotion dimensions <Positive activating, Negative deactivating>. According to the emotional ontology, the emotional strength of “happy” is 7 and that of “critical” is 5, so the academic emotion dimension strengths of this text are <Positive activating: 7, Positive deactivating: 0, Negative activating: 0, Negative deactivating: 5>. Since 7 is the maximum strength value, this text is labeled “Positive activating.”
However, this unsupervised labeling method is only suitable for texts that are not too long. If an unlabeled text contains more than 10 emotional words, it is treated as long text (Deng and Ren, 2021). We manually checked these long texts, selected and labeled the items with obvious emotional tendencies, and discarded the others. In the end, 203,350 detail items were labeled, and the distribution of each academic emotion dimension is shown in Figure 2.
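The strength-based labeling rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the lexicon holds only the two words from the example, emotional words are found by substring counting rather than on segmented text, ties are broken by the order of `DIMENSIONS`, and the names `label_text`, `LEXICON` and `LONG_TEXT_THRESHOLD` are hypothetical.

```python
DIMENSIONS = (
    "Positive activating",
    "Positive deactivating",
    "Negative activating",
    "Negative deactivating",
)

# Toy lexicon: emotional word -> (academic emotion dimension, emotional strength).
LEXICON = {
    "开心": ("Positive activating", 7),    # happy
    "挑剔": ("Negative deactivating", 5),  # critical
}

LONG_TEXT_THRESHOLD = 10  # texts with more emotional words go to manual checking


def label_text(text, lexicon=LEXICON):
    """Return (label, per-dimension strengths) for one unlabeled text."""
    strengths = {dim: 0 for dim in DIMENSIONS}
    n_hits = 0
    for word, (dim, strength) in lexicon.items():
        occurrences = text.count(word)  # assumption: substring match, no segmentation
        n_hits += occurrences
        strengths[dim] += occurrences * strength
    if n_hits == 0:
        return "invalid", strengths
    if n_hits > LONG_TEXT_THRESHOLD:
        return "manual_review", strengths
    # Assumption: ties go to the first dimension in DIMENSIONS.
    return max(DIMENSIONS, key=strengths.get), strengths


example = "今天做完了实验,真开心, 但导师却挑剔的说结果不够好"
label, strengths = label_text(example)
```

For the example sentence, the summed strengths are 7 for Positive activating and 5 for Negative deactivating, so the label is “Positive activating.”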
3.2.3. BiGRU-attention model
From a natural language processing perspective, identifying academic emotion from textual content can be considered feature-based text sequence classification, so among deep learning models and frameworks we adopted a recurrent neural network (RNN), which can extract and process sequence features. GRU (Gated Recurrent Units) and LSTM (Long Short-Term Memory) are both optimized recurrent neural network models (Chung et al., 2014) that are suitable for dealing with long sequences. Compared with the traditional RNN model, they add a gating mechanism that enhances memory ability and reduces the chance of vanishing and exploding gradients. Both models are widely applied in text classification, machine translation, and speech recognition. As an improvement of the LSTM model, the GRU model merges the input gate and forget gate into a single update gate in the hidden-layer unit (Yang et al., 2020). A gated recurrent unit contains an update gate $z_t$ and a reset gate $r_t$. At time $t$, given the previous state $h_{t-1}$ and the current input $x_t$, $z_t$ and $r_t$ can be formulated as:
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$$
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$
where $\sigma$ is the sigmoid function and $W_z$ and $W_r$ are weight matrices. After the previous state is passed through the reset gate to give $r_t \odot h_{t-1}$, this reset state is combined with the current input $x_t$, and the gated recurrent unit calculates the candidate state $\tilde{h}_t$ as:
$$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])$$
$\tanh$ is used as the activation function. $\tilde{h}_t$ is a function of the current input $x_t$ and is also considered the candidate hidden layer. $z_t$ controls the memory size and combines the current $\tilde{h}_t$ and the previous $h_{t-1}$ to calculate the final hidden layer state. Finally, the current state is updated as:
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$
Here $z_t$ is the gate control signal, with $z_t \in [0, 1]$, and it controls the amount of information forgotten. The closer the value is to 0, the more information needs to be forgotten.
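The gated update described above can be made concrete with a small pure-Python sketch of a single GRU step. It assumes the standard GRU formulation without bias terms and follows the convention stated in the text, in which an update gate value close to 0 discards more of the previous state (some references swap the roles of the gate and its complement). The function and parameter names are illustrative only.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(h_prev, x, Wz, Wr, W):
    """One GRU step; each weight matrix has shape (hidden, hidden + input).

    Bias terms are omitted for brevity (an assumption of this sketch).
    """
    concat = h_prev + x                                 # [h_{t-1}, x_t]
    z = [sigmoid(a) for a in matvec(Wz, concat)]        # update gate
    r = [sigmoid(a) for a in matvec(Wr, concat)]        # reset gate
    reset = [ri * hi for ri, hi in zip(r, h_prev)]      # r_t * h_{t-1}
    h_cand = [math.tanh(a) for a in matvec(W, reset + x)]  # candidate state
    # z close to 0 forgets the previous state; z close to 1 keeps it.
    return [zi * hp + (1.0 - zi) * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


# Hidden size 2, input size 1, all-zero weights: both gates equal 0.5 and
# the candidate state is 0, so the new state is half of the previous state.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
h_new = gru_step([1.0, -2.0], [0.3], zeros, zeros, zeros)
```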
However, in sentiment classification tasks, using an RNN to extract text features has a defect: a one-way RNN can only extract features from the preceding context, although the following context is also very important. A bidirectional recurrent neural network structure can effectively solve this problem (Tang et al., 2019), so we built a bidirectional GRU network. The Bi-GRU model extracts the sequence features of the text, and to perform classification the model needs a classification structure on top. Because there are four result classes, we attach a softmax classifier. In addition, since we have performed word segmentation on the text, the importance of individual words affects the classification results, so an attention layer is added between the hidden layer and the softmax layer. The structure of the entire classification model is shown in Figure 3.
Meanwhile, to assess the performance of the model we adopted, we also used the machine learning classification algorithms support vector machine (SVM) and multinomial Naive Bayes, as well as a one-way GRU neural network, as baselines for comparison.
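To illustrate how the layers stack, the sketch below takes precomputed forward and backward hidden states, concatenates them position by position, pools them with attention weights, and applies a four-way softmax. The text does not specify the form of the attention layer, so the dot-product scoring against a learned context vector is an assumption, as are all function and parameter names; the GRU passes themselves are taken as given.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bidirectional_states(forward, backward):
    """Concatenate forward and backward hidden states for each word position.

    backward[t] is the state of the right-to-left pass at position t.
    """
    return [f + b for f, b in zip(forward, backward)]


def attention_pool(states, context):
    """Weight each position by softmax(state . context) and sum the states."""
    weights = softmax([dot(h, context) for h in states])
    dim = len(states[0])
    pooled = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(dim)]
    return pooled, weights


def classify(pooled, W_out):
    """Linear layer followed by softmax over the four emotion categories."""
    return softmax([dot(row, pooled) for row in W_out])


# Two word positions, hidden size 1 per direction, untrained (zero) parameters.
states = bidirectional_states([[1.0], [0.0]], [[0.0], [1.0]])
pooled, weights = attention_pool(states, context=[0.0, 0.0])
probs = classify(pooled, W_out=[[0.0, 0.0]] * 4)
```

With zero parameters every word receives equal attention and every class equal probability; training would move the context vector and output weights away from this uniform starting point.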
4. Results and discussion
4.1. Descriptive statistical analysis of posts
The research data we collected cover the period since the establishment of the Xiaomuchong website, from 2010 to 2020. In total, we collected 18,831 topic items, 511,269 detail items and 131,755 poster information items. The distribution of topics by year and month is shown in Figure 4. From 2010 to 2015, these topics did not attract much discussion, but after November 2016 the volume of topics increased dramatically and peaked around 2017 to 2019. Although the volume of topics decreased slightly in 2020, academic emotion among graduate students still attracts a lot of attention.
Meanwhile, to find out which topics received the most discussion, we drew boxplots of the number of replies per topic, as shown in Figure 5. Overall, the mean and median number of replies show that a topic’s post time is related to its number of replies: the earlier a topic is posted, the more exposure it gets, and therefore the more replies it receives. Most topics received fewer than 1,000 replies, but some received more than 3,000, and the most-replied topic, “PhD thesis defence successfully passed,” posted in 2017, reached an astounding 5,227 replies.
4.2. Descriptive analysis of posters’ gender and major
To find out the academic emotions hidden in viewpoints from multiple perspectives, it is also very important to conduct statistical analysis on the direct data that can be obtained, such as the distribution of posters’ gender and major.
The gender distribution of the posters is shown in Figure 6. Surprisingly, nearly half of the posters (about 46.38%) chose not to set their gender or to keep it confidential. When it comes to personal matters such as sentiments and emotions, even in online communities where opinions are shared anonymously, people still tend to keep personal information such as gender secret. Of all posters, 21.80% set their gender to ‘Male’ and 31.82% to ‘Female’. In such online communities, female graduate students appear more willing to express their thoughts and opinions.
Graduate students in different majors face different academic pressures. For example, students majoring in engineering may suffer from unsatisfactory optimization of an experimental model and a lack of ideal experimental data, while students majoring in literature might suffer from a lack of inspiration when creating new chapters. It is therefore important to figure out the distribution of the posters’ majors. However, it is inappropriate to show the distribution directly because there are too many different majors, and some majors are quite alike. So, we manually classified the majors into 13 main subjects and 111 subcategories according to the Catalogue of Degree Granting and Talent Training Subjects published by the Ministry of Education of the People’s Republic of China.2 Finally, we conducted a statistical analysis of the subjects of the posters’ majors, and the result is shown in Figure 7.
The volume of posters on online platforms varies widely among graduate students in different majors. At the main subject level, “engineering” not only has the largest total number of posters but also the most subcategories. Due to the sensitivity of the field and the quantity of samples collected, military science has the fewest posters, with no more than 100 records. The difference between the posters’ majors is more pronounced at the subcategory level. Among the graduate students who post on the Xiaomuchong website, those majoring in “chemical” are the most numerous, reaching 30,317. This is not accidental: many studies have reported on the academic pressure and academic depression of graduate students majoring in chemistry (Rodrigues, 2020; Stockard et al., 2021). Students majoring in “material science and engineering” rank second with 13,915. These are the only two majors with over 10,000 records. Is this because students of these two majors are more inclined to express their opinions and views online? Or do they have more academic emotions to express? We found some plausible reasons in other literature (Tang et al., 2018; Woolston and O'Meara, 2019). One of the main reasons is that students in these two majors, especially graduate students, face greater academic pressure. Their daily work is based on a huge number of experiments. A lot of work means extra working hours, which leads to a lack of work-life balance, anxiety, and depression.
4.3. Result of academic emotion classification
We randomly split the 203,350 labeled data records, taking about 80% as the training set and rounding its size to a round number. The training set contains 160,000 data records, and the remaining 43,350 are used as the test set.
We evaluated the performance of the classification model in two respects. First, we measured overall model performance using overall indicators. Then, to examine the classification effect on each category, we also evaluated the precision, recall and F1 score of each category. For the overall metrics, we used macro-averaged scores. The results are shown in Table 2 (PA: Positive activating, PD: Positive deactivating, NA: Negative activating, ND: Negative deactivating).
Comparing the models, the deep learning models perform better than the traditional machine learning models. The BiGRU-Attention model we used has the best scores, with all indicators above 95%, which shows that the model performs well. The overall scores of the one-way GRU model also exceed 80%, which is acceptable. However, the performance of the machine learning models, multinomial NB and SVM, is unsatisfactory. The SVM model barely reaches 70% on all three indicators, while the naive Bayes model performs even worse, at only slightly more than 60%.
Furthermore, the classification model performs well on all four categories. The best-performing category, PA (Positive Activating), scores more than 98% on all three indicators. This category has the largest sample size and a wider range of emotional words, which makes it easier to recognize. The weakest of the four categories is PD (Positive Deactivating), with an F1 score of 93% and a recall of 89%. Taken on its own, this performance is good enough, but there is a gap of roughly 5 to 10 percentage points from the scores of the other three categories. This gap arises because PD (Positive Deactivating) is one of the academic emotions that is most difficult to articulate or define. According to Pekrun et al. (2011), academic emotions in the PD category are considered to have positive sentiment but a negative effect on academic effort. For example, if a graduate student completes light work with ample time, the student will perceive it as an “easy” task and will feel “relieved” and “comfortable.” Such emotions then have a negative effect on subsequent work, because they may lead graduate students to think that research and learning are very simple, which is not conducive to concentrating on tasks. Yet the emotional words of PD are very close to those of the PA (Positive Activating) category. Thus, while “happy” and “relieved” are easy to distinguish, it is difficult to decide whether “satisfaction” will have a positive or negative impact on academic effort. This is the main reason for the low recall of this category.
In general, in such a high-dimensional multi-class natural language processing task, a classification model built on a recurrent neural network combined with an attention mechanism can achieve good performance, with overall scores above 90%. These performance indicators can support Chinese text-based academic emotion recognition applications built on such a model.
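The split sizes can be reproduced with a short sketch. It assumes that the 80% share (162,680 records) was rounded to the nearest 10,000 to arrive at 160,000, which is one reading of the description rather than a documented procedure; the function name and the fixed seed are illustrative.

```python
import random


def split_records(n_records, train_size, seed=0):
    """Shuffle record indices and cut them into training and test index lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    return indices[:train_size], indices[train_size:]


n = 203_350
# Assumption: 80% of the records, rounded to the nearest 10,000.
train_size = round(0.8 * n / 10_000) * 10_000
train_idx, test_idx = split_records(n, train_size)
```

Under this assumption the training set holds 160,000 indices and the test set the remaining 43,350, matching the figures in the text.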
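For reference, per-category and macro-averaged scores can be computed as in the sketch below. The toy labels are invented for illustration and are unrelated to Table 2, and the macro F1 here is the unweighted mean of the per-category F1 scores, which is one common convention and an assumption about how the overall score was obtained.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} for each category."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of precision, recall and F1 over all categories."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


# Invented toy predictions, only to exercise the functions.
labels = ["PA", "PD", "NA", "ND"]
y_true = ["PA", "PA", "PD", "NA", "ND", "PD"]
y_pred = ["PA", "PD", "PD", "NA", "ND", "PD"]
scores = per_class_metrics(y_true, y_pred, labels)
macro_p, macro_r, macro_f1 = macro_average(scores)
```

Macro averaging gives every category the same weight regardless of its sample size, so a weak minority category lowers the overall score as much as a weak majority category would.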
4.4. Topic analysis of graduate students’ emotional engagement
Discovering posters’ hidden academic emotions can effectively help us deal with the academic stress of graduate students, but it does not tell us what they are really talking or worrying about. To better understand the graduate students who currently use online platforms, the discussion needs to be taken to the semantic and topic level.
In addition to identifying hidden academic emotions in the website’s posts, we carried out a more detailed analysis to understand the specific topics that graduate students are discussing. After text pre-processing, we not only built the recognition model on the resulting corpus but also conducted a simple topic analysis of the collected topics. Using word frequency statistics, we excluded keywords with a term frequency of less than 1,000 and obtained the high-frequency words shown in Figure 8.
We reused the Chinese word vectors obtained in the word embedding step and applied the k-means algorithm to cluster the words in each topic and post. The number of clusters was determined by calculating the silhouette coefficient, and this unsupervised procedure finally divided the words into eight clusters. Based on these eight clusters, we invited 3 experts in postgraduate admissions and employment and in postgraduate mental health to summarize and manually classify them to improve the readability of the results. Finally, we divided these subject headings into three categories: academic pressure related topics, interpersonal related topics, and career related topics.
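The frequency filter can be sketched with the standard library. The function name and the toy posts are hypothetical, and the example uses a threshold of 2 instead of 1,000 so that the tiny sample produces output.

```python
from collections import Counter


def high_frequency_words(tokenized_posts, min_count=1000):
    """Count tokens over all posts and keep those with frequency >= min_count,
    ordered from most to least frequent."""
    counts = Counter(tok for post in tokenized_posts for tok in post)
    return [(word, c) for word, c in counts.most_common() if c >= min_count]


# Toy segmented posts; a real run would use the full segmented corpus.
posts = [["论文", "导师"], ["论文", "工作"], ["论文", "导师"]]
frequent = high_frequency_words(posts, min_count=2)
```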
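Choosing the number of clusters by silhouette coefficient can be illustrated with a small pure-Python sketch. It is a toy under several assumptions: 2-dimensional points stand in for the 200-dimensional word vectors, the k-means initialization is a simple deterministic choice rather than whatever the authors used, and the function names are hypothetical.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; initial centers are evenly spaced input points."""
    step = len(points) // k
    centers = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; points in singleton clusters score 0."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        total += (b - a) / max(a, b)
    return total / len(points)


def best_k(points, candidates):
    """Return the candidate k with the highest silhouette and all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Three well-separated toy groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10],
          [20, 0], [20, 1], [21, 0]]
k, scores = best_k(points, [2, 3, 4])
```

On this toy data the silhouette coefficient is highest for three clusters, which is the kind of comparison used to settle on the number of clusters.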
4.4.1. Academic pressure related topics
Academic pressure is a major source of the academic emotions that postgraduate students express. It comes not only from good or bad exam scores but also from the pressure to produce scientific research achievements, such as experiments and papers. The high-frequency words related to the academic pressure of graduate students include “papers,” “projects,” “laboratories,” and “scientific research.” Academic pressure, that is, the dual pressure from study and scientific research, is the topic that postgraduates are most concerned about, and may also be their major source of pressure. Under such academic pressure, psychological problems or practical problems may occur among graduate students. Psychological problems can lead to mental illness, such as depression, obsessive–compulsive disorder, anxiety disorder, and even schizophrenia (O'Connor and Yanos, 2021). Suicidal behavior is not uncommon among master’s and doctoral students (Poreddi et al., 2021). Practical problems, on the other hand, mainly concern “how to graduate.” In China’s postgraduate training programs, the number of scientific research achievements a postgraduate produces during the period of study is closely related to whether they can successfully graduate and obtain a graduation certificate and a degree certificate. Therefore, for graduate students, publishing academic papers and conducting scientific research is not only a matter of personal interest or enthusiasm but also a necessary condition for graduating successfully.
4.4.2. Interpersonal related topics
Another important emotion topic comes from lack of social interaction. The high-intensity research and study of graduate students makes their life trajectory very simple. The daily life of most graduate students during the semester is to commute among the laboratory or classroom, dormitory and canteen. They do not have the opportunity, nor the extra energy, to engage in social activities. This situation is more prominent among Chinese students (Moore-Jones, 2022). The high-frequency words related to the interpersonal pressure of graduate students include “friends,” “relationships,” “lovers,” and “families,” etc. The lack of social activities and the unmet need for friendship are mutually influencing and sometimes reinforcing (Suwinyattichaiporn and Johnson, 2022). Somehow, the need for a spouse or couple cannot be ignored in the social emotions of graduate students. Keywords like couples, boyfriends and girlfriends, and marriage also have high word frequencies. Among these posts, many of them are marriage and friendship posts. For graduate students who are socially deficient, this may be a relatively effective way for them to be more familiar with, or to hope for O’Day and Heimberg (2021). There is also a certain number of posts discussing family and relationships. These discussions are not the same as those about finding the other half, which are mostly about the relationship between graduates themself and their parents. According to China’s higher education system, graduate students at the master’s level are usually around 25 years old, and those at the doctoral level are around 27–30 years old. At this age, the relationship with parents is at a low standard in lifetime (Kremer, 2016). 
There is a significant work–family conflict (WFC) between graduate students and their parents (Dolson and Deemer, 2022). It can be summarized as follows: the income of graduate students is not comparable to that of people in employment, parents need to continue supporting their children’s living expenses, and graduate students bear serious material and mental pressure in this relationship.
4.4.3. Career related topics
Finally, one of the topics graduate students discuss most frequently online is personal career development. These topics resemble the WFC mentioned in the previous section but focus more on graduate students’ own concerns about their career development. High-frequency words related to the career pressure of graduate students include “work,” “career,” “job,” and “income.” These pressures and emotions arise from comparison. It is widely accepted that academic qualifications and degrees can determine the type and starting point of one’s future work, but in an actual working environment, personal career development is often influenced by a variety of factors, among which working years is a very important consideration (Tang et al., 2008; Purohit et al., 2020). This brings up the first comparison, the comparison between peers. Peers who did not continue to study for a master’s or doctoral degree but started working after graduating from an undergraduate or junior college program are likely to have a higher income than fresh graduate students because of their accumulated working years (Yusuf et al., 2020). The second comparison is with one’s own efforts. Pursuing a master’s degree can take 3 years or more, and a doctorate requires at least 6–7 years of additional study time compared to a bachelor’s degree. This demands willpower, endurance, and a great deal of mental work. However, the input–output ratio of a graduate student’s first job is often suboptimal: the value created and the income obtained may fall short of what the graduate student considers commensurate. These two comparisons can easily leave graduate students with a psychological gap and self-doubt, feeling that their efforts are worthless, that they have no prospects, and that their living standard has not improved.
Graduate students not only in China but all over the world share these worries and anxieties (Are et al., 2018; McConnell et al., 2018; Sharif et al., 2019).
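As a rough illustration of how the high-frequency words reported in Sections 4.4.1–4.4.3 relate posts to the three topics, the following minimal Python sketch tags a post with the topic whose keywords it mentions most often. It is a simplified keyword-matching stand-in and not the word-vector-based topic analysis used in this study. The keyword lists are English stand-ins for the reported words, and the names `TOPIC_KEYWORDS`, `topic_counts`, and `assign_topic` are hypothetical.

```python
from collections import Counter

# Assumption: English stand-ins for the high-frequency words reported in
# Sections 4.4.1-4.4.3; the original forum posts are written in Chinese.
TOPIC_KEYWORDS = {
    "academic": ["paper", "project", "laboratory", "scientific research"],
    "interpersonal": ["friend", "relationship", "lover", "family"],
    "career": ["work", "career", "job", "income"],
}


def topic_counts(post: str) -> Counter:
    """Count keyword occurrences per topic (case-insensitive substring match)."""
    text = post.lower()
    return Counter(
        {
            topic: sum(text.count(keyword) for keyword in keywords)
            for topic, keywords in TOPIC_KEYWORDS.items()
        }
    )


def assign_topic(post: str):
    """Return the topic with the most keyword hits, or None if nothing matches."""
    topic, hits = topic_counts(post).most_common(1)[0]
    return topic if hits > 0 else None


if __name__ == "__main__":
    examples = [
        "My paper was rejected and the laboratory project is behind schedule.",
        "I have no friends here and I miss my family.",
        "Will my income ever catch up with a job I could have taken years ago?",
    ]
    for post in examples:
        print(assign_topic(post), "<-", post)
```

Substring matching is deliberately crude (for example, “work” also matches “homework”), so a real analysis would need proper word segmentation and a larger, validated lexicon.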
5. Conclusion
Measuring academic emotions is an important way to discover graduate students’ learning status and mental health. Because academic emotions are concealed and diverse, it is difficult to discover them in texts using traditional methods. The academic emotions and academic pressures of graduate students are a long-standing concern and are receiving increasing attention. What is distinctive about graduates’ academic emotions is that their sources of stress lie not only in their studies but also in research, family, and career planning. At the same time, graduate students lack effective ways to express and talk about the academic emotions these pressures generate. In the long run, this can easily give rise to psychological problems and lead to serious consequences. Many studies have analyzed and mined academic emotions.
To address the three research questions we raised, we applied a series of academic emotion recognition and analysis methods to large-scale datasets. For our first research question, we conducted a statistical analysis of the collected postgraduates’ posts on the Xiaomuchong platform, mainly by gender and major, and found that, excluding users who did not want to disclose their gender or did not fill it in, female users post more on the platform. According to the majors marked by posters, we found that graduate students in science and engineering published most of the posts on the platform, especially those majoring in chemistry. This can be attributed to the features of the major and the way in which its research work is carried out. For our second research question, we transformed the academic emotion recognition task into a sequential process of constructing, training, and testing an emotion classification model based on user-generated text content. To address the shortcomings of traditional academic emotion recognition research when applied to large-scale data sets, we constructed a pipeline based on a recurrent neural network, which can identify and classify academic emotions in an unsupervised manner and has relatively good model performance. Finally, for our third research question, we performed a topic analysis of the graduate students’ posts based on the word vectors. We clustered graduate posts on the Xiaomuchong platform into three main categories: academic pressure related topics, interpersonal related topics, and career related topics. We also discussed the main problems and sources of stress faced by graduate students from these three main categories. Our research also has deficiencies that need to be addressed in future research. The first is the problem of data labeling. The vocabulary-based heuristic rules we used may be insufficient, and a decision tree model could be considered instead.
Second, the topic of a post is not necessarily a simple emotional expression but may also be a discussion of a related topic. Relying only on the organization of information provided by the website may be insufficient, and a better way to filter the data set should be considered.
Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Author contributions
QX: research design and paper writing. SC: data collection and data analysis. YX: paper revision and editing. CM: paper writing. All authors contributed to the article and approved the submitted version.
Funding
This work is supported by the National Natural Science Foundation of China (Nos. 72104219 and 62207016), the MOE Project of Humanities and Social Sciences (No. 21YJC870013), and Major Humanities and Social Sciences Research Projects in Zhejiang higher education institutions (No. 2023QN129).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Footnotes
1. A forum for academic information exchange among researchers. http://muchong.com/bbs/
2. The Catalogue follows the version published in 2018 by MOE. http://www.moe.gov.cn/s78/A22/xwb_left/moe_833/201804/t20180419_333655.html
References
Agrawal, A., and An, A. (2012). Unsupervised emotion detection from text using semantic and syntactic relations. 2012 IEEE/WIC/ACM international conferences on web intelligence and intelligent agent technology, Melbourne, Australia 1, 346–353
Ahmad, Z., Jindal, R., Ekbal, A., and Bhattacharyya, P. (2020). Borrow from rich cousin: transfer learning for emotion detection using cross lingual embedding. Expert Syst. Appl. 139:112851. doi: 10.1016/j.eswa.2019.112851
Ainley, M., Corrigan, M., and Richardson, N. (2005). Students, tasks and emotions: identifying the contribution of emotions to students' reading of popular culture and popular science texts. Learn. Instr. 15, 433–447. doi: 10.1016/j.learninstruc.2005.07.011
Are, C., Stoddard, H. A., Nelson, K. L., Huggett, K., Carpenter, L., and Thompson, J. S. (2018). The influence of medical school on career choice: a longitudinal study of students’ attitudes toward a career in general surgery. Am. J. Surg. 216, 1215–1222. doi: 10.1016/j.amjsurg.2018.10.036
Asghar, M. Z., Subhan, F., Imran, M., Kundi, F. M., Shamshirband, S., Mosavi, A., et al. (2019). Performance evaluation of supervised machine learning techniques for efficient detection of emotions from online content. arXiv preprint arXiv:1908.01587. doi: 10.48550/arXiv.1908.01587
Ashby, F. G., and Isen, A. M. (1999). A neuropsychological theory of positive affect and its influence on cognition. Psychol. Rev. 106, 529–550. doi: 10.1037/0033-295X.106.3.529
Basiri, M. E., Abdar, M., Cifci, M. A., Nemati, S., and Acharya, U. R. (2020). A novel method for sentiment classification of drug reviews using fusion of deep and machine learning techniques. Knowl.-Based Syst. 198:105949. doi: 10.1016/j.knosys.2020.105949
Bieleke, M., Gogol, K., Goetz, T., Daniels, L., and Pekrun, R. (2021). The AEQ-S: a short version of the achievement emotions questionnaire. Contemp. Educ. Psychol. 65:101940. doi: 10.1016/j.cedpsych.2020.101940
Camacho-Morles, J., Slemp, G. R., Pekrun, R., Loderer, K., Hou, H., and Oades, L. G. (2021). Activity achievement emotions and academic performance: a meta-analysis. Educ. Psychol. Rev. 33, 1051–1095. doi: 10.1007/s10648-020-09585-3
Chung, J., Gulcehre, C., Cho, K., and Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. doi: 10.48550/arXiv.1412.3555
Clore, G. L., and Ortony, A. (2013). Psychological construction in the OCC model of emotion. Emot. Rev. 5, 335–343. doi: 10.1177/1754073913489751
Cornelius, R. R. (1996). The science of emotion: Research and tradition in the psychology of emotions. Hoboken Prentice-Hall, Inc.
Deng, J., and Ren, F. (2021). A survey of textual emotion recognition and its challenges. IEEE Trans. Affect. Comput. 14, 49–67. doi: 10.1109/TAFFC.2021.3053275
D'mello, S., and Graesser, A. (2007). "Mind and body: dialogue and posture for affect detection in learning environments," Proceedings of the 2007 Conference on Artificial Intelligence in Education: Building Technology Rich Learning Contexts that Work, 161–168.
Dolson, J. M., and Deemer, E. D. (2022). The relationship between perceived discrimination and school/work–family conflict among graduate student-parents. J. Career Dev. 49, 174–187. doi: 10.1177/0894845320916245
Feng, X., Wei, Y., Pan, X., Qiu, L., and Ma, Y. (2020). Academic emotion classification and recognition method for large-scale online learning environment—based on A-CNN and LSTM-ATT deep learning pipeline method. Int. J. Environ. Res. Public Health 17:1941. doi: 10.3390/ijerph17061941
Fierro-Suero, S., Almagro, B. J., and Sáenz-López, P. (2020). Validation of the achievement emotions questionnaire for physical education (AEQ-PE). Int. J. Environ. Res. Public Health 17:4560. doi: 10.3390/ijerph17124560
Fried, R. R., Karmali, S., and Irwin, J. D. (2022). Minding many minds: an assessment of mental health and resilience among undergraduate and graduate students; a mixed methods exploratory study. J. Am. Coll. Heal. 70, 898–910. doi: 10.1080/07448481.2020.1781134
Gratz, K. L., and Roemer, L. (2004). Multidimensional assessment of emotion regulation and dysregulation: development, factor structure, and initial validation of the difficulties in emotion regulation scale. J. Psychopathol. Behav. Assess. 26, 41–54. doi: 10.1023/B:JOBA.0000007455.08539.94
Hasan, M., Rundensteiner, E., and Agu, E. (2019). Automatic emotion detection in text streams by analyzing twitter data. Int. J. Data Sci. Anal. 7, 35–51. doi: 10.1007/s41060-018-0096-z
Kremer, I. (2016). The relationship between school-work-family-conflict, subjective stress, and burnout. J. Manag. Psychol. 31, 805–819. doi: 10.1108/JMP-01-2015-0014
Lichtenfeld, S., Pekrun, R., Stupnisky, R. H., Reiss, K., and Murayama, K. (2012). Measuring students' emotions in the early years: the achievement emotions questionnaire-elementary school (AEQ-ES). Learn. Individ. Differ. 22, 190–201. doi: 10.1016/j.lindif.2011.04.009
Linnenbrink, E. A., and Pintrich, P. R. (2002). Achievement goal theory and affect: An asymmetrical bidirectional model. Educ. Psychol. 37, 69–78. doi: 10.1207/S15326985EP3702_2
Mac Kim, S., Valitutti, A., and Calvo, R. A. (2010). Evaluation of unsupervised emotion models to textual affect recognition. Proceedings of the NAACL HLT 2010 workshop on computational approaches to analysis and generation of emotion in text, 62–70. Stroudsburg, PA
MacCann, C., Jiang, Y., Brown, L. E., Double, K. S., Bucich, M., and Minbashian, A. (2020). Emotional intelligence predicts academic performance: a meta-analysis. Psychol. Bull. 146, 150–186. doi: 10.1037/bul0000219
Martín-Valdivia, M.-T., Martínez-Cámara, E., Perea-Ortega, J.-M., and Ureña-López, L. A. (2013). Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches. Expert Syst. Appl. 40, 3934–3942. doi: 10.1016/j.eswa.2012.12.084
McConnell, S. C., Westerman, E. L., Pierre, J. F., Heckler, E. J., and Schwartz, N. B. (2018). United States National Postdoc Survey results and the interaction of gender, career choice and mentor impact. eLife 7:e40189. doi: 10.7554/eLife.40189
Meinhardt, J., and Pekrun, R. (2003). Attentional resource allocation to emotional events: An ERP study. Cognit. Emot. 17, 477–500. doi: 10.1080/02699930244000039
Moore-Jones, P. (2022). Self-segregation, sense of belonging, and social support: An inquiry into the practices and perceptions of Chinese graduate students at an American Mid-Atlantic university. J. Glob. Educat. Res. 6, 1–12. doi: 10.5038/2577-509X.6.1.1114
Nandwani, P., and Verma, R. (2021). A review on sentiment analysis and emotion detection from text. Soc. Netw. Anal. Min. 11:81. doi: 10.1007/s13278-021-00776-6
O’Day, E. B., and Heimberg, R. G. (2021). Social media use, social anxiety, and loneliness: a systematic review. Comput. Hum. Behav. Rep. 3:100070. doi: 10.1016/j.chbr.2021.100070
O'Connor, L. K., and Yanos, P. T. (2021). Where are all the psychologists? A review of factors impacting the underrepresentation of psychology in work with serious mental illness. Clin. Psychol. Rev. 86:102026. doi: 10.1016/j.cpr.2021.102026
Pekrun, R., Goetz, T., Frenzel, A. C., Barchfeld, P., and Perry, R. P. (2011). Measuring emotions in students’ learning and performance: the Achievement Emotions Questionnaire (AEQ). Contemp. Educ. Psychol. 36, 36–48. doi: 10.1016/j.cedpsych.2010.10.002
Pekrun, R., Goetz, T., Titz, W., and Perry, R. P. (2002). Academic emotions in students' self-regulated learning and achievement: a program of qualitative and quantitative research. Educ. Psychol. 37, 91–105. doi: 10.1207/S15326985EP3702_4
Poreddi, V., Anjanappa, S., and Reddy, S. (2021). Attitudes of under graduate nursing students to suicide and their role in caring of persons with suicidal behaviors. Arch. Psychiatr. Nurs. 35, 583–586. doi: 10.1016/j.apnu.2021.08.005
Purohit, D., Jayswal, M., and Muduli, A. (2020). Factors influencing graduate job choice–a systematic literature review. Eur. J. Train. Develop. 45, 381–401. doi: 10.1108/EJTD-06-2020-0101
Ramakrishnan, J., Mavaluru, D., Srinivasan, K., Mubarakali, A., Narmatha, C., and Malathi, G. (2020). Opinion mining using machine learning approaches: a critical study. 2020 international conference on computing and information technology (ICCIT-1441) 09–10 September 2020
Rodrigues, A. E. (2020). Chemical engineering and environmental challenges. Cyclic adsorption/reaction technologies: materials and process together! J. Environ. Chem. Eng. 8:103926. doi: 10.1016/j.jece.2020.103926
Sánchez-Núñez, P., Cobo, M. J., De Las Heras-Pedrosa, C., Peláez, J. I., and Herrera-Viedma, E. (2020). Opinion mining, sentiment analysis and emotion understanding in advertising: a bibliometric analysis. IEEE Access 8, 134563–134576. doi: 10.1109/ACCESS.2020.3009482
Scott, H., and Takarangi, M. K. (2019). Measuring PhD student’s psychological well-being: are we seeing the whole picture? Student Success 10, 14–24. doi: 10.5204/ssj.v10i3.1294
Sharif, N., Ahmad, N., and Sarwar, S. (2019). Factors influencing career choices. IBT J. Bus. Stud. 15, 33–45. doi: 10.46745/ilma.jbs.2019.15.01.03
Singh, M., Jakhar, A. K., and Pandey, S. (2021). Sentiment analysis on the impact of coronavirus in social life using the BERT model. Soc. Netw. Anal. Min. 11:33. doi: 10.1007/s13278-021-00737-z
Sobkowicz, P., Kaschesky, M., and Bouchard, G. (2012). Opinion mining in social media: modeling, simulating, and forecasting political opinions in the web. Gov. Inf. Q. 29, 470–479. doi: 10.1016/j.giq.2012.06.005
Song, Y., Shi, S., Li, J., and Zhang, H. (2018). Directional skip-gram: explicitly distinguishing left and right context for word embeddings. Proceedings of the 2018 conference of the north American chapter of the Association for Computational Linguistics: human language technologies, 2 (Short Papers), 175–180 New Orleans, Louisiana. Association for Computational Linguistics
Sreeja, P. S., and Mahalakshmi, G. (2017). Emotion models: a review. Int. J. Control Theory Applicat. 10, 651–657.
Sreeja, P., and Mahalakshmi, G. (2015). Applying vector space model for poetic emotion recognition. Adv. Nat. Appl. Sci. 9, 486–491.
Stockard, J., Noviski, M., Rohlfing, C. M., Richmond, G. L., and Lewis, P. (2021). The chemistry graduate student experience: findings from an ACS survey. J. Chem. Educ. 99, 461–468. doi: 10.1021/acs.jchemed.1c00610
Suwinyattichaiporn, T., and Johnson, Z. D. (2022). The impact of family and friends social support on Latino/a first-generation college students’ perceived stress, depression, and social isolation. J. Hisp. High. Educ. 21, 297–314. doi: 10.1177/1538192720964922
Tang, F., Byrne, M., and Qin, P. (2018). Psychological distress and risk for suicidal behavior among university students in contemporary China. J. Affect. Disord. 228, 101–108. doi: 10.1016/j.jad.2017.12.005
Tang, X., Dai, Y., Wang, T., and Chen, Y. (2019). Short-term power load forecasting based on multi-layer bidirectional recurrent neural network. IET Generat. Transm. Distrib. 13, 3847–3854. doi: 10.1049/iet-gtd.2018.6687
Tang, M., Pan, W., and Newmeyer, M. D. (2008). Factors influencing high school students’ career aspirations. Prof. Sch. Couns. 11:2156759X0801100. doi: 10.1177/2156759X0801100502
Thelwall, M., Buckley, K., and Paltoglou, G. (2012). Sentiment strength detection for the social web. J. Am. Soc. Inf. Sci. Technol. 63, 163–173. doi: 10.1002/asi.21662
Tiwari, P., Kumar, S., Kumar, V., and Mishra, B. K. (2017). Implementation of n-gram methodology for rotten tomatoes review dataset sentiment analysis. International Journal of Knowledge Discovery in Bioinformatics 7, 30–41. doi: 10.4018/IJKDB.2017010103
Wang, D., Li, S., Hu, M., Dong, D., and Tao, S. (2017). Negative academic emotion and psychological well-being in Chinese rural-to-urban migrant adolescents: examining the moderating role of cognitive reappraisal. Front. Psychol. 8:1312. doi: 10.3389/fpsyg.2017.01312
Woolston, C., and O'Meara, S. (2019). PhD students in China report misery and hope. Nature 575, 711–713. doi: 10.1038/d41586-019-03631-z
Wu, P., Li, M., Zhu, F., and Zhong, W. (2022). Empirical investigation of the academic emotions of Gaokao applicants during the COVID-19 pandemic. SAGE Open 12:215824402210798. doi: 10.1177/21582440221079886
Xu, G., Li, W., and Liu, J. (2020). A social emotion classification approach using multi-model fusion. Futur. Gener. Comput. Syst. 102, 347–356. doi: 10.1016/j.future.2019.07.007
Xu, L., Lin, H., Pan, Y., Ren, H., and Chen, J. (2008). Constructing the affective lexicon ontology. J. China Soc. Sci. Tech. Informat. 27, 180–185.
Yang, S., Yu, X., and Zhou, Y. (2020). LSTM and GRU neural network performance comparison study: taking yelp review dataset as an example. 2020 International workshop on electronic communication and artificial intelligence (IWECAI), Qingdao, 98–101.
Younis, S. B. (2021). Opinion mining on web-based communities using optimised clustering algorithms. Turkish J. Comput. Math. Educat. 12, 438–447. doi: 10.17762/turcomat.v12i9.3099
Keywords: academic emotion, emotion recognition, emotion classification, graduate mental health, deep learning
Citation: Xu Q, Chen S, Xu Y and Ma C (2023) Detection and analysis of graduate students’ academic emotions in the online academic forum based on text mining with a deep learning approach. Front. Psychol. 14:1107080. doi: 10.3389/fpsyg.2023.1107080
Edited by: Cody Ding, University of Missouri–St. Louis, United States
Reviewed by: Ravi Kiran, Thapar Institute of Engineering & Technology, India; Jeeta Sarkar, XIM University, India
Copyright © 2023 Xu, Chen, Xu and Ma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Chao Ma, machao456@hotmail.com