- 1 School of Management, Wuhan University of Technology, Wuhan, China
- 2 Key Laboratory of Adolescent Cyberpsychology and Behavior, Ministry of Education, Wuhan, China
- 3 School of Psychology, Central China Normal University, Wuhan, China
- 4 Central China Normal University Branch, Collaborative Innovation Center of Assessment for Basic Education Quality at Beijing Normal University, Wuhan, China
- 5 School of Music, Henan University, Kaifeng, China
- 6 Institute of Digital Commerce, Wuhan Technology and Business University, Wuhan, China
Social question answering based online counseling (SQA-OC) offers easy access for people seeking professional mental health information and services, and has become a crucial pre-consultation and application stage toward online counseling. However, few efforts have been made to efficiently evaluate and explain counselors' service quality in such an asynchronous online questioning and answering (QA) format. This study applied the notion of perceived helpfulness as the public's perception of counselors' service quality in SQA-OC, used computational linguistic and explainable machine learning (XML) methods suited for large-scale QA discourse analysis to build a predictive model, and explored how various sources and types of linguistic cues [i.e., Linguistic Inquiry and Word Count (LIWC), topic consistency, linguistic style similarity, emotional similarity] contributed to perceived helpfulness. Results show that linguistic cues from counselees, counselors, and the synchrony between them are important predictors, and that linguistic cues combined with XML can effectively predict and explain the perceived helpfulness of SQA-OC and support operational decision-making for counselors. Five helpful counseling experiences, characterized by the linguistic styles of “talkative”, “empathy”, “thoughtful”, “concise with distance”, and “friendliness and confident”, were identified in SQA-OC. The paper proposes a method to evaluate the perceived helpfulness of SQA-OC services automatically, effectively, and explainably, shedding light on the understanding of SQA-OC service outcomes and the design of better mechanisms for SQA-OC systems.
Introduction
Throughout the world, people are affected by mental health disorders at staggering rates (1). In many contexts, appropriate treatment is lacking and people with mental health conditions experience severe human rights violations, discrimination, and stigma (2). Moreover, there are direct and indirect consequences of COVID-19 on mental health conditions, which challenged traditional mental health systems, leading to increased demand and interrupted delivery of essential services at the same time.
With the rapid development of the mobile internet and the extensive practice of the concept of “Internet Plus”, online counseling services have become an emerging market (3) and a large number of online counseling apps have emerged (4). Scholars have defined online counseling as the delivery of counseling services via the Internet when the pastoral/spiritual counselor or psychologist and the counselee are not in the same physical area and communicate using computer-mediated communication innovations (3, 5, 6). Online counseling encompasses a wide range of techniques, including but not limited to instant messaging, synchronous chat, text messaging, video conferencing, and asynchronous email (7). There is evidence that the outbreak of COVID-19 has boosted the use of online counseling worldwide (7–9). Still, access to mental health care remains a global challenge, with widespread workforce shortages (10). Facing limited in-person treatment options and other barriers such as stigma (11), millions of people are turning to SQA-OC platforms such as TalkLife (talklife.co) and OnePsychology (xinli001.com) to express emotions, share stigmatized experiences, and receive peer support or voluntary help from counselors (12).
One of the major information and knowledge sources that have arisen from Internet-mediated social practice is social questioning and answering (SQA) sites. People use social question answering sites to express their knowledge demands (13–15), to seek information (16, 17), and to construct and maintain relationships (18). SQA, as an example of collective intelligence, allows users to pose questions, contribute answers and comments, and evaluate questions and responses among peers. Moreover, compared with face-to-face counseling, the privacy and anonymity of SQA-OC help avoid privacy and ethical issues in the counseling process, which can greatly enhance the initiative, accessibility, and immediacy of psychological counseling services and is in line with epidemic prevention policies such as the social isolation prompted by COVID-19. However, while peer supporters on these platforms are motivated by the intention of helping others seeking support (henceforth seekers), most of them are not well trained and typically lack knowledge of best practices in the therapeutic process. Such informal help has been demonstrated to be more comfortable for the public but less impactful than formal counseling services (19). Given the growing relevance of social Q&A platforms for psychological counselees, there is a large and urgent need to evaluate and forecast the content quality of counselors' responses. Specifically, what influences counselees' perception of counselors' service quality in SQA-OC, and how the various factors interact, remains unclear.
This study introduces the notion of perceived helpfulness to measure the extent to which SQA-OC platforms help the public. The notion of perceived helpfulness has been widely used in the study of online consumers (20, 21). It refers to the degree to which consumers find online information or services useful and is usually measured by the total number of helpful votes cast by consumers (22–24). This voting mechanism brings huge revenue to commercial platforms (25) and filters out valuable information for consumers faced with massive numbers of options. The success of helpfulness voting for online services has been carried over to the SQA-OC platform, which provides counselees with options to vote on whether counselors' responses are useful; the public, including counselees, can then check the number of peer votes to judge a counselor's helpfulness. This voting mechanism makes useful responses from counselors stand out on an SQA-OC platform and helps counselees and the public make decisions (22, 26). Therefore, we applied the notion of perceived helpfulness to measure the outcome of SQA-OC in this study, as the total number of helpful votes provides an intuitive standard for measuring and evaluating the service quality of counselors on the online platform.
In formal and face-to-face counseling, the quality of counseling service is mostly measured via interviews and questionnaires, during which the counseling competence, counseling strategy, and verbal skills of the counselor are assessed (27). More specifically, the investigation of talk and conversation in the counseling process has become an important way to measure counseling outcomes (28). Benefiting from the development of natural language processing (NLP) and the Linguistic Inquiry and Word Count (LIWC) tool, the linguistic features of counselors and counselees can be studied on a large scale and in a quantitative manner. Using NLP analysis, existing research has found that changes in counselees' linguistic features (e.g., first-person use, negative emotion words) can positively predict the therapeutic effect (29–31). In addition, there are significant differences in linguistic features (e.g., positive/negative emotion words, causation words) between good counselors and poor counselors (32, 33).
In terms of counselor-counselee interaction, several scholars have used a computational approach to measure empathy in text (34) and counselor-counselee emotional similarity (35). According to Ardito and Rabellino (36), the emotional connection between the counselor and the counselee is closely related to the working alliance and the evaluation of therapy outcomes. In the field of commercial research, language style matching (LSM) (37) between the manager and the reviewer has been demonstrated to be an important factor in predicting the public's perceived helpfulness (38). Using the same approach, LSM between the counselor and the counselee has been shown to be effective in predicting the counseling outcome (35). However, the above studies that employed NLP methods mainly focused on the formal and synchronous counseling process, and little is known about the informal and asynchronous online SQA-OC context. Therefore, this study analyzed the linguistic cues and psychological topics in SQA-OC, built a model to effectively predict and explain the perceived helpfulness of counselors' responses, and identified helpful counseling experiences. This research aims to address the following two questions:
Research question 1: Can linguistic cues and machine learning methods be utilized to predict the perceived helpfulness of SQA-OC effectively?
Research question 2: How do linguistic cues contribute to the perceived helpfulness of SQA-OC?
In recent years, SQA-OC has shown explosive growth, but its quality is uneven (39), which makes it difficult to monitor manually. By using machine learning methods, this study aims to propose an automatic detection approach to measure and understand the counseling outcome and service quality of SQA-OC. The method can reduce the need for large amounts of questionnaire evaluation or qualitative analysis while providing feedback for the platform and online users more quickly and efficiently on a large scale. Likewise, by applying NLP and XML methods, this study aims to identify influential linguistic cues and their impact on perceived helpfulness, in order to advance our understanding of SQA-OC outcomes, which could afford actionable and real-time feedback to counselors and the SQA-OC platform.
Data, Methods, and Measures
Data Crawling
Data were crawled from one of the largest Chinese online counseling platforms, “One Psychology Community”,1 on which nearly 20 million people have asked for psychological help. In the Q&A section of the platform, help-seekers can post their psychological distress and problems and anonymously seek psychological services and support from the platform's counselors. We used “bazhuayu”,2 a web scraping software, to crawl 5,169 questions from counselees, as well as 15,058 responses from counselors. The SQA-OC data span the period from June 17, 2013, to December 16, 2020. A report by a well-known Chinese online counseling platform, “JianDanXinLi”,3 showed that among the users of online counseling, there were about three times as many female visitors as male visitors, and visitors in early adulthood (21–35 years old) accounted for 77.57%.
Each question has the following three sections: the title of the description, the description of the psychological problem, and the asking time. Questions contain an average of 185 words and may include components such as the title of the post, age, gender, course of the psychological problem, inner feelings, duration of the problem, and the label (i.e., occupation, marriage, romantic relationship, family, etc.). Each question is followed by several responses from counselors, which contain an average of 388 words. The number of user likes given to a counselor ranges from 1 to 39, with an average of 4.362.
Word Embedding Based Psychological Topic Detection
Word embedding is a popular machine learning method that represents each word by a vector, such that the geometry between these vectors captures semantic relations between the corresponding words. Since it was demonstrated that word embedding can encode rich semantic relationships between words as geometrical relationships in low-dimensional vector space (40, 41), the embedding models have offered novel opportunities and solutions to challenging problems, including language evolution (42), gender and stereotypes (43, 44), culture and identities (45, 46), and even the prediction of material properties (47). For the analysis of psychological topics of counselees and counselors, word embedding was utilized to extract the symptoms and influencing topics from the SQA texts.
Following earlier research that applied word embedding to identify psychological topics in online psychological help-seeking texts (9, 48), we proposed the following four steps. First, a predefined lexicon of psychological symptoms and influencing factors was constructed. The seed words of the lexicon were extracted by two Ph.D. candidates in psychology from three text resources: the Kessler 10, the Patient Health Questionnaire (49), and the question tag system4 of the One Psychology website. Second, we built the psychological lexicon of the SQA-OC community. Using the Jieba tool (i.e., a Python segmentation package for Chinese)5 and the Baidu stop-word list, the SQA-OC text was segmented and stop words were removed. The resulting text was used as the training corpus. The Word2vec word embedding method in the Gensim software6 was used to construct a latent semantic model for the large-scale SQA text, in order to obtain domain lexicons of psychological symptoms and influencing factors, respectively. Specifically, the cosine similarity between the words in the model vocabulary and the predefined lexicon was calculated based on the model. Two graduate students were recruited to set the cosine similarity thresholds used to remove words in the SQA text that were irrelevant to the predefined vocabulary, and to check the retained words manually. The resulting psychological lexicons contain two parts: 2,567 words related to psychological symptoms and 1,077 words related to psychological factors.
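The lexicon expansion in the second step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy three-dimensional vectors stand in for a trained Word2vec model, the English words stand in for segmented Chinese tokens, and the function name `expand_lexicon` and the threshold of 0.8 are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_lexicon(embeddings, seed_words, threshold):
    """Keep vocabulary words whose best similarity to any seed word meets the threshold."""
    seeds = [w for w in seed_words if w in embeddings]
    kept = {}
    for word, vec in embeddings.items():
        if word in seeds:
            continue
        best = max((cosine(vec, embeddings[s]) for s in seeds), default=0.0)
        if best >= threshold:
            kept[word] = best
    return kept

# Toy vectors (assumption): in practice these would come from the trained embedding model.
toy_embeddings = {
    "anxiety": [0.9, 0.1, 0.0],
    "worry": [0.8, 0.2, 0.1],
    "insomnia": [0.7, 0.3, 0.0],
    "weather": [0.0, 0.1, 0.9],
}
candidates = expand_lexicon(toy_embeddings, ["anxiety"], threshold=0.8)
print(sorted(candidates))
```

The retained candidates would still need the manual check described above; the threshold only narrows the list that human raters review.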
The third step was to obtain the topics of the psychological symptoms and influencing factors of the counselees and counselors. According to the lexicon we built, the psychological words in the SQA-OC text were selected. Using the average word embedding method, word vector representations of symptoms and influencing factors were obtained for counselees' questions and counselors' responses (50). We used the k-means algorithm (Python implementation in scikit-learn) and its evaluation index (i.e., the silhouette coefficient) to obtain and evaluate the clustering performance with different numbers of cluster centers. Fourth, we selected the best k-means clustering model for topic detection: the number of clusters with the optimal silhouette coefficient was selected to construct the clusters of psychological symptoms and influencing factors. Finally, the topics related to psychological problems were named depression and anxiety, suffering, social phobia, lack of interest, suicidal tendencies, worry (afraid), and anger. The topics related to influencing factors were named love, marriage, psychotherapy, work, interpersonal relationship, character, and family (see Table A2 in Appendix for detailed topic information).
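To illustrate how the silhouette coefficient can guide the choice of the number of clusters, the sketch below scores two candidate clusterings of the same points and keeps the better one. It is a pure-Python illustration rather than the scikit-learn pipeline used in the study; the two-dimensional points and the label assignments (imagined as k-means outputs for k = 2 and k = 3) are assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette_score(points, labels):
    """Mean silhouette coefficient; requires at least two clusters, singletons score 0."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy document vectors (assumption): averaged word embeddings would be used in practice.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
candidate_labels = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
scores = {k: silhouette_score(points, labels) for k, labels in candidate_labels.items()}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```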
Measures
Dependent Variable: Perceived Helpfulness of Counselor's Responses
The One Psychology platform provides counselees with the opportunity to vote on whether a counselor's responses are useful or not. Owing to the anonymity of the platform, the questions and answers are visible to the public, and a counselor's responses can be voted as useful or not (i.e., a binary measure of helpfulness) by both the counselee and others who are browsing the questions. We selected the number of helpful votes to measure perceived helpfulness, which is in line with previous studies in online settings (22–24, 27–35, 37, 38).
Explanatory Variable 1: Linguistic Cues in Text From Counselors and Counselees
Linguistic Cues in Counselees' Text
Given that counselees' questions on the One Psychology platform are visible to the public,7 other counselees may vote a counselor's responses as useful or not after reading a counselee's question. Therefore, we hypothesized that the linguistic cues in the counselees' text would influence the perceived helpfulness of SQA-OC.
Linguistic Cues in Counselors' Text
Previous studies have found that the therapeutic outcome can be predicted from linguistic cues in counselors' use of language (32, 33). We hypothesized that the linguistic cues in counselors' responses would influence the perceived helpfulness of SQA-OC.
To measure the linguistic cues of both the counselors' text and the counselees' text, we used the Simplified Chinese version of LIWC (SCLIWC) to extract linguistic cues including affective processes (AP), social processes (SP), cognitive processes (CP), perceptual processes (PP), biological processes (BP), drives (Dr), time orientations (TO), relativity (Rev), personal concerns (PC), and informal language (IL). We also included the number of words, words per sentence, number of sentences, and number of function words, verbs, nouns, personal pronouns, etc., and grouped them as stylistic (St) linguistic cues (see Table A1 in Appendix for detailed information).
Explanatory Variable 2: Synchrony Between Counselor and Counselee
The notion of synchrony is usually used to describe concurrent non-verbal behaviors (e.g., postures, gestures, facial expressions) that happened in the context of interpersonal communication (51, 52), referring to an interactive outcome that can only be achieved when participants share a common course of action/goal and constrain their behavior in a mutual relationship (53). We borrowed this terminology to refer to a synchronized conversation [cf. (54)] between the counselor and the counselee in SQA-OC, and examined it quantitatively through the following three aspects, i.e., topic consistency (TC), linguistic style matching (LSM), and emotional similarity (ES).
Topic Consistency Measurement
The topic consistency analysis explored whether counselors respond to the multiple topics mentioned by counselees in their questions. Based on the word embedding based psychological topic detection method, we found seven topics related to psychological symptoms (i.e., depression and anxiety, suffering, social phobia, lack of interest, suicidal tendency, worried and afraid, and angry) and seven topics related to psychological factors (i.e., love, marriage, psychotherapy, work, interpersonal relationship, personal characteristic, and family). To accurately identify psychological topics, the distribution of high-frequency feature words in the text can be used (55). Therefore, we set up two seven-dimensional vectors, each corresponding to seven topics, to represent the topic diversity of psychological symptoms and psychological factors, respectively. First, we matched the counselees' text against the words of each psychological topic one by one: if a word under a topic appeared in the text, the corresponding element in the vector was set to 1; otherwise, it was 0. The same operation was performed on the response corpus. Thus, for each piece of SQA text, we obtained topic vector representations of the counselee's question and of the counselor's corresponding response. Then, the Jaccard similarity was calculated to represent the topic consistency between the two texts by measuring the attributes shared by the two binary vectors. The Jaccard similarity coefficient is a method for measuring the similarity of asymmetric binary attributes (56). The Jaccard similarity coefficient between counselee i and counselor j is: $J(R_i, M_j) = \frac{|R_i \cap M_j|}{|R_i \cup M_j|}$, where $R_i$ is the topic vector representation of the counselee's question and $M_j$ is the topic vector representation of the counselor's response. $|R_i \cap M_j|$ is the number of topics that co-occur in both the counselee's question and the counselor's response.
$|R_i \cup M_j|$ is the number of topics that appear in the counselee's question or the counselor's response.
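The topic consistency computation described above can be sketched in a few lines. This is an illustrative sketch only: the three toy topic lexicons (the study uses seven per vector), the English tokens, and the function names `topic_vector` and `jaccard` are assumptions introduced for the example.

```python
def topic_vector(tokens, topic_lexicons):
    """Binary vector: 1 if any word of the topic's lexicon appears in the text, else 0."""
    token_set = set(tokens)
    return [1 if token_set & words else 0 for words in topic_lexicons]

def jaccard(r, m):
    """Jaccard coefficient of two binary vectors (0.0 when neither text mentions any topic)."""
    both = sum(1 for a, b in zip(r, m) if a and b)
    either = sum(1 for a, b in zip(r, m) if a or b)
    return both / either if either else 0.0

# Toy lexicons (assumption): one set of words per topic.
topic_lexicons = [{"anxious", "panic"}, {"lonely", "isolated"}, {"angry", "furious"}]

question = ["i", "feel", "anxious", "and", "lonely"]
response = ["feeling", "lonely", "is", "understandable"]

r = topic_vector(question, topic_lexicons)  # topics mentioned by the counselee
m = topic_vector(response, topic_lexicons)  # topics addressed by the counselor
consistency = jaccard(r, m)
print(r, m, consistency)
```

Here the question raises two topics and the response addresses one of them, so one shared topic out of two mentioned in total gives a consistency of 0.5.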
Linguistic Style Matching Measurement
To operationalize the interactive and implicit aspects of the alliance in psychoanalytic psychotherapy, the language style matching metric was proposed, which is based on computerized text analyses performed using the LIWC software (37, 57). Rather than content-based aspects of language (e.g., using the counselee's description of feeling “livid” rather than “angry”), LSM represents the degree to which two people produce similar rates of function words (e.g., pronouns, prepositions, and conjunctions) in their dialogue (57, 58). Function words comprise nine types: prepositions, auxiliary verbs, adverbs, conjunctions, articles, quantifiers, negations, personal pronouns, and impersonal pronouns (37). Hence, we first used the CLIWC (Chinese Linguistic Inquiry and Word Count) program to calculate the proportion of function words in the text. Then, according to the LSM method introduced by Ireland and Pennebaker (37), the LSM score of prepositions (preps) between the texts of counselor and counselee is: $LSM_{preps} = 1 - \frac{|preps_1 - preps_2|}{preps_1 + preps_2 + 0.0001}$, where $preps_1$ represents the percentage of prepositions in the counselee's text and $preps_2$ represents the percentage of prepositions in the counselor's text. The 0.0001 is added to the denominator to prevent division by zero when a function word category is absent from both texts, i.e., when its percentage of the entire text is zero. This calculation is repeated for each of the nine function word categories. The nine category-level LSM scores are then averaged to yield a composite LSM score bounded by 0 and 1, where higher numbers represent greater LSM between counselee and counselor.
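A minimal sketch of the LSM calculation follows. The category percentages are invented toy values, and only three of the nine function word categories are shown for brevity; the dictionary keys and function names are assumptions, not output of the CLIWC program.

```python
def lsm_category(p1, p2, eps=0.0001):
    """Category-level LSM from two usage percentages; eps avoids division by zero."""
    return 1 - abs(p1 - p2) / (p1 + p2 + eps)

def lsm_score(counselee, counselor, categories):
    """Composite LSM: the mean of the category-level scores."""
    return sum(lsm_category(counselee[c], counselor[c]) for c in categories) / len(categories)

# Toy percentages of each function word category in the two texts (assumption).
categories = ["preps", "conj", "negate"]
counselee = {"preps": 10.0, "conj": 4.0, "negate": 0.0}
counselor = {"preps": 5.0, "conj": 4.0, "negate": 0.0}

composite = lsm_score(counselee, counselor, categories)
print(round(composite, 4))
```

Identical usage rates, including a category absent from both texts, yield a category score of 1, while diverging rates pull the composite toward 0.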
Emotional Similarity Measurement
Empathy is critical to successful mental health support and is part of the therapeutic strategies taught in the training of counselors (59, 60). Empathy measurement has predominantly occurred in synchronous, face-to-face settings (61, 62). It is unknown whether such a computational approach to studying empathy can be applied to an asynchronous, text-based context (63). Also, while previous NLP research has focused predominantly on empathy as reacting with emotions of warmth and compassion (64), or on speech-based settings (61, 65), a separate but key aspect of empathy is communicating a cognitive understanding of others (66). Given that millions of people use text-based platforms for mental health support, understanding empathy in SQA-OC has practical significance.
In this study, we present a novel computational approach to understanding how empathy is expressed in SQA-OC. Empathy is a complex multi-dimensional construct with two broad aspects related to emotion and cognition (67). The emotional aspect relates to the emotional stimulation in reaction to the experiences and feelings expressed by a counselee. The cognitive aspect is a more deliberate process of understanding and interpreting the experiences and feelings of the counselees and communicating that understanding to them (60). Here, we study expressed empathy in text-based mental health support – empathy expressed or communicated by peer supporters in their textual interactions with seekers [cf. (68)].
Specifically, we used CLIWC to extract the emotion-related linguistic cues from the texts of the counselee's question and the counselor's corresponding response, covering seven dimensions: emotion, positive emotion, negative emotion, anxiety, anger, sadness, and love (see Table A2 for detailed information). We used the similarity between the counselees' and the counselors' emotion-related linguistic cues to quantify emotional similarity and thereby characterize empathy. We applied cosine similarity to the vectors of emotion (“emo”)-related linguistic cues of counselee i and counselor j, denoted $emo_{counselee\,i}$ and $emo_{counselor\,j}$. The cosine similarity was calculated as follows:
$$ES_{ij} = \frac{emo_{counselee\,i} \cdot emo_{counselor\,j}}{\lVert emo_{counselee\,i} \rVert \, \lVert emo_{counselor\,j} \rVert}$$
where the numerator is the dot product of the emotion-related linguistic cue vectors of counselee i and counselor j, and the denominator is the product of the norms of these two vectors.
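As a small worked example of the emotional similarity measure, the sketch below computes the cosine similarity between two seven-dimensional emotion vectors. The numeric values are invented for illustration and the function name `emotional_similarity` is an assumption; in the study the vectors come from the CLIWC emotion categories.

```python
import math

def emotional_similarity(emo_counselee, emo_counselor):
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(emo_counselee, emo_counselor))
    norm = math.sqrt(sum(a * a for a in emo_counselee)) * math.sqrt(
        sum(b * b for b in emo_counselor)
    )
    return dot / norm if norm else 0.0

# Order (assumption): emotion, positive, negative, anxiety, anger, sadness, love.
question_emo = [6.0, 1.0, 5.0, 2.0, 0.5, 2.5, 0.0]
response_emo = [5.0, 3.0, 2.0, 1.0, 0.0, 1.0, 0.5]

es = emotional_similarity(question_emo, response_emo)
print(round(es, 3))
```

Because the measure depends only on the direction of the two vectors, a long response and a short question can still score as highly similar when their emotional profiles have the same proportions.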
Explainable Machine Learning Method
Taking the public's perceived helpfulness of SQA-OC as the dependent variable, and the linguistic cues from counselees' questions, counselors' responses, and the synchrony between them as the independent variables, we built XML regression models to predict the perceived helpfulness of SQA-OC.
Specifically, we used linear machine learning regressors (linear regression, ridge regression, Lasso regression, and support vector regression with a linear kernel) as well as a non-linear regressor (random forest) to predict the perceived helpfulness of SQA-OC. We used the mean absolute error (MAE) to evaluate the performance of different algorithms and feature sets, and used ten-fold cross-validation to select the best predictive model. Shapley values are a widely used approach from cooperative game theory that comes with desirable properties. Using an explainable artificial intelligence (XAI) method based on Shapley values, we identified the influential features among all independent variables, as well as how they contribute to the perceived helpfulness of SQA-OC. SHAP values represent a feature's responsibility for a change in the model output (69). SHAP values offer two important benefits. The first is global interpretability: the SHAP values can show how much each predictor contributes, either positively or negatively, to the target variable. The second is local interpretability: each observation gets its own set of SHAP values. Traditional variable importance algorithms only show results across the entire population and not for each individual case, whereas local interpretability enables us to pinpoint and contrast the impacts of the factors. SHAP values greatly increase the transparency of machine learning and have been used in many research and industry scenarios (70).
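To make the idea of local interpretability concrete, the following sketch computes exact Shapley values for one observation of a toy model by enumerating feature subsets. It is not the SHAP procedure applied to the random forest in this study; the model `toy_model`, the feature names, and the baseline values are assumptions, and brute-force enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one observation; absent features take baseline values."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(features):
    """Hypothetical helpfulness model with one additive term and one interaction."""
    word_count, cognitive, similarity = features
    return 2.0 * word_count + cognitive * similarity

x = [3.0, 2.0, 0.5]          # the observation to explain
baseline = [1.0, 0.0, 0.0]   # reference input (assumption)
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for the observation and the prediction for the baseline, which is the additivity property that lets per-observation values be aggregated into global summaries.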
By accumulating the SHAP values of each feature, we quantified the positive and negative influence of different types of features on perceived helpfulness. Let the number of observations be M. If feature set F contains the features {1, 2, …, P}, the SHAP values of these features form an M × P matrix, with one SHAP value for each observation and each feature.
The positive SHAP value of the feature set F is then obtained by accumulating the positive SHAP values of each feature i in F, taking into account the number of positive SHAP values for that specific feature. We calculated the negative SHAP value in the same way. In addition, to group features with different predictive power and influence, we calculated the relationships between the SHAP values of the features using the Pearson correlation coefficient.
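One plausible way to accumulate positive and negative SHAP values for a feature set is sketched below. The exact normalization used in the study is not recoverable from the text, so this sketch assumes that positive and negative SHAP values are each summed over the features in the set and averaged over the M observations; the matrix values, feature names, and the function `cumulative_shap` are all hypothetical.

```python
def cumulative_shap(shap_matrix, feature_names, feature_set):
    """Return (positive, negative, overall) cumulative SHAP values for a feature set.

    Assumption: contributions are averaged over the M observations (rows) and the
    overall value adds the magnitudes of the positive and negative parts.
    """
    m = len(shap_matrix)
    cols = [feature_names.index(f) for f in feature_set]
    positive = sum(max(row[c], 0.0) for row in shap_matrix for c in cols) / m
    negative = sum(min(row[c], 0.0) for row in shap_matrix for c in cols) / m
    return positive, negative, positive - negative

# Toy SHAP matrix (assumption): M = 2 observations, P = 3 features.
feature_names = ["word_count", "cognitive", "emotional_similarity"]
shap_matrix = [
    [0.5, -0.2, 0.1],
    [-0.1, 0.4, 0.0],
]
pos, neg, overall = cumulative_shap(shap_matrix, feature_names, ["word_count", "cognitive"])
print(pos, neg, overall)
```

Keeping the positive and negative parts separate shows whether a feature type only raises, only lowers, or moves predictions in both directions.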
The research methods and processes we proposed are shown in Figure 1.
Results
The Predictive Model of Perceived Helpfulness of SQA-OC
To build a predictive model of the perceived helpfulness of SQA-OC with good performance and interpretability, we proposed feature sets of linguistic cues and specific combinations of their sources, including counselees' questions (i.e., “counselee” in Table 1), counselors' responses (i.e., “counselor” in Table 1), and the synchrony between counselor and counselee (i.e., counselee_counselor_sync in Table 1), and used machine learning regressors (i.e., linear regression, ridge, lasso, SVR, and random forest in Table 1) to build the predictions. Further, we used the MAE to evaluate the performance of the different predictive models, and used SHAP values to explain the model with the best performance.
Table 1 reports the performance, in terms of MAE, of the predictive models with different algorithms and feature sets. The random forest based on the combination of features from counselees, counselors, and the synchrony between them achieved the lowest MAE among all predictions. Specifically, for predictions with a larger number of features, the non-linear random forest algorithm achieves a lower MAE than the linear prediction algorithms. For predictions with a small number of features, the support vector regressor with a linear kernel achieves a lower MAE than the non-linear prediction algorithm (i.e., random forest). In general, the random forest based on the combined feature sets of counselees, counselors, and their synchrony achieved the best performance, and its MAPEs are 0.20108, 0.211445, and 0.228419, respectively. The non-linear random forest and linear support vector algorithms outperform the other algorithms in the prediction.
In addition, we selected the effective features that can improve the performance of the model from the feature set containing all linguistic cue variables. Specifically, using the random forest and the SHAP value-based XML method, we calculated the SHAP values for each feature and ranked the features from highest to lowest based on their SHAP values. Then, we added the features to the random forest regressor one at a time in ranked order, and calculated the MAE of the regressor after each addition. Finally, as shown in Table 1, we found that the top 52 most important features achieved the best performance, with an MAE of 1.8556.
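The ranked feature-addition procedure can be outlined as a short loop. This sketch leaves the model abstract: `evaluate_mae` is a hypothetical callable that would, in practice, run cross-validation of the random forest on the given feature subset, and the stub used here simply fabricates a toy error curve to show the control flow.

```python
def best_prefix(ranked_features, evaluate_mae):
    """Add features in ranked order and keep the prefix length with the lowest MAE."""
    best_k, best_mae = 0, float("inf")
    for k in range(1, len(ranked_features) + 1):
        mae = evaluate_mae(ranked_features[:k])
        if mae < best_mae:
            best_k, best_mae = k, mae
    return best_k, best_mae

# Features sorted from most to least important by SHAP value (toy names, assumption).
ranked = ["word_count", "cognitive", "pronoun", "anxiety", "relativity"]

def stub_mae(subset):
    """Toy error curve (assumption): improves up to three features, then degrades."""
    return 1.0 + 0.1 * abs(len(subset) - 3)

k, mae = best_prefix(ranked, stub_mae)
print(k, ranked[:k], mae)
```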
The Influence of Linguistic Cues on the Perceived Helpfulness
The Influence of Linguistic Cues With Different Sources and Types on the Perceived Helpfulness
To further analyze the impact of different sources (i.e., counselees' questions, counselors' responses, and the synchrony between them) and types (i.e., AP, SP, CP, PP, BP, Dr, TO, Rev, PC, IL, St, and CSS) of linguistic cues on the predictive model of the perceived helpfulness of SQA-OC, we calculated the cumulative SHAP values of the features mentioned above. The results are shown in Table 2.
Table 2. Cumulative SHAP values for different sources and types of linguistic cues in the perceived helpfulness predictive model.
Regarding the influence of different feature sources, we find that all three sources improve the performance of the predictive model, and that each has an incremental effect on prediction performance. In terms of relative differences, the features from counselors' responses have the greatest impact on perceived helpfulness: their cumulative SHAP value is 6.093 (the positive value is 3.2436 and the negative value is −2.8497), accounting for 93.38% of the total SHAP value. This is much higher than the influence of the linguistic cues from counselees (overall SHAP value of 0.1699, accounting for 2.60%) or of the synchrony between counselees and counselors (overall SHAP value of 0.2621, accounting for 4.02%).
Regarding the influence of different types of features, we calculated and analyzed the cumulative SHAP values of the different feature types in the predictive model. Within the counselor-sourced feature set, stylistic, affective processes, biological processes, cognitive processes, drives, informal language, perceptual processes, personal concerns, personal pronouns, prepositions, relativity, social processes, and time orientations were the influential types of linguistic cues in predicting perceived helpfulness, accounting for 85.82% of the total effect. Among them, stylistic, cognitive processes, and personal pronouns were the three most influential types, with SHAP values of 1.8539, 1.2500, and 0.8139, respectively, accounting for 60.04% of the total effect. Within the counselee-sourced feature set, informal language (SHAP value of 0.0625) and stylistic (SHAP value of 0.1074) features were the influential types of linguistic cues. Within the feature set of synchrony between counselor and counselee, emotional similarity (0.0936) and topic consistency (0.0665) were the influential types.
In addition, we analyzed the positive and negative influence of different types of linguistic cues on perceived helpfulness, as shown in Table 2. The results show that, among counselor-sourced linguistic cues, relativity and social processes only reduce perceived helpfulness, whereas the other types may either enhance or reduce it. Among counselee-sourced linguistic cues, stylistic cues only increase perceived helpfulness, while informal language may either decrease or increase it. Among the linguistic cues from the synchrony between counselor and counselee, emotional similarity, linguistic style matching, and topic consistency of symptom may either increase or decrease perceived helpfulness. Thus, the counselor-sourced linguistic cues of relativity and social processes are risk factors for perceived helpfulness, while counselee-sourced stylistic cues are facilitators.
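The reported shares can be checked directly from the three cumulative SHAP values given above. The short calculation below uses only numbers stated in the text; the dictionary keys are labels chosen for the example.

```python
# Cumulative SHAP values by feature source, as reported in the text.
totals = {"counselor": 6.093, "counselee": 0.1699, "synchrony": 0.2621}

grand_total = sum(totals.values())
shares = {source: round(100 * value / grand_total, 2) for source, value in totals.items()}
print(shares)

# The counselor total is the positive part plus the magnitude of the negative part.
counselor_total = 3.2436 + abs(-2.8497)
print(round(counselor_total, 3))
```

The shares come out as 93.38%, 2.60%, and 4.02%, matching the reported percentages, and the positive and negative counselor components add up to the reported 6.093 after rounding.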
The Way That Top-Ranked Linguistic Cues Influence the Perceived Helpfulness
First, to obtain the most influential linguistic cues for the perceived helpfulness, we calculated and ranked the SHAP values of the individual features, as shown in Figure 2. In terms of cumulative SHAP values, the top 33 features contributed about 90% of the total influence, and the top 40 features contributed more than 95%. Specifically, the top 33 features include (1) counselee-sourced linguistic cues, i.e., stylistic; (2) counselor-sourced linguistic cues, i.e., stylistic, personal pronouns, cognitive processes, biological processes, personal concerns, affective processes, time orientations, and perceptual processes; and (3) linguistic cues from the synchrony between counselor and counselee, i.e., topic consistency of symptom.
Figure 2. Cumulative SHAP values for the 40 most important linguistic cues in the perceived helpfulness predictive model (the figure shows the top 40 linguistic cues affecting perceived helpfulness, in order of importance, as determined by the SHAP summary output).
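The ranking and cumulative-share computation behind statements such as "the top 33 features contributed about 90%" can be sketched as follows. The sketch assumes each feature's importance is summarized by a single non-negative number (for example its mean absolute SHAP value); the function names and the toy importances are hypothetical and do not reproduce the study's values.

```python
def rank_features(importance):
    """Return (name, importance) pairs sorted from most to least important."""
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


def top_k_for_share(importance, target_share):
    """Smallest k such that the top-k features reach target_share of the total."""
    ranked = rank_features(importance)
    total = sum(value for _, value in ranked)
    if total == 0:
        return 0
    running = 0.0
    for k, (_, value) in enumerate(ranked, start=1):
        running += value
        if running / total >= target_share:
            return k
    return len(ranked)


# Toy importances, invented for illustration only.
toy_importance = {"Assent": 0.05, "WordCount": 0.5, "I": 0.15, "Cause": 0.3}

ranking = rank_features(toy_importance)
k_for_90 = top_k_for_share(toy_importance, 0.90)
k_for_75 = top_k_for_share(toy_importance, 0.75)
print([name for name, _ in ranking], k_for_90, k_for_75)
```

Reporting the smallest k for a given share makes the cut-off explicit instead of choosing a round number of features by eye.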
Second, as shown in Figure 3, we clustered the SHAP values of the influential linguistic cues, divided them into five categories, and analyzed their influence in each category using the global interpretability of the model, as shown in Figures 4–8. In these figures, each plot is made up of thousands of individual points from the training data set, with higher feature values shown in red and lower values in blue, as depicted by the “feature value” bar on the right of each plot. Therefore, if the dots on one side of the central line are increasingly red or blue, increasing or decreasing values, respectively, move the predicted perceived helpfulness in that direction. Taking Figure 4 as an example, lower word count values in the counselees' stylistic cues (blue dots) are associated with relatively lower perceived helpfulness. We summarize the five typical patterns below according to the clustering results.
(1) As shown in Figure 4, the influential linguistic cues in the first category are stylistic cues from counselees and counselors. Their influence on perceived helpfulness falls into two types. For the first type, a high value of the linguistic cue improves the perceived helpfulness and a low value reduces it; examples are WordCount, You, and Number from counselees and Period from counselors. For the second type, a high value decreases the perceived helpfulness and a low value improves it; examples are Parenth and OtherP from counselors.
(2) As shown in Figure 5, the second category of influential linguistic cues includes stylistic, social processes, personal concerns, and affective processes from counselors, and AffectSIM and SymptomsSIM from the synchrony between counselor and counselee. Apart from the cue I from counselors, the other linguistic cues show a consistent impact on the perceived helpfulness: a high value of the cue improves the perceived helpfulness, and a low value decreases it.
(3) As shown in Figure 6, the third category of influential linguistic cues includes stylistic, drive, and cognitive processes from counselors, which show a consistent impact on the perceived helpfulness: a high value of the cue improves the perceived helpfulness, and a low value reduces it. Examples are stylistic (i.e., WordPerSentence, Preps, Conj), drive (i.e., Achieve), and cognitive processes (i.e., Cause, CogMech, Insight, Inclusive).
(4) As shown in Figure 7, the fourth category of influential linguistic cues includes time orientations, stylistic, perceptual processes, informal language, cognitive processes, relativity, and personal concerns from counselors, which show a complex impact on the perceived helpfulness. First, for some cues a high value slightly improves the perceived helpfulness, whereas a low value significantly reduces or improves it; examples are stylistic (i.e., Quote, We, SpecArt, Colon, RateFourCharWord), cognitive processes (i.e., Inhibition), relativity (i.e., Nonfl, Time), perceptual processes (i.e., Hear), and personal concerns (i.e., Leisure). Second, for other cues a high value improves the perceived helpfulness and a low value reduces it, such as stylistic (i.e., PrepEnd) and relativity (i.e., Time, Motion) from counselors. Third, for the remaining cues a high value reduces the perceived helpfulness and a low value improves it, such as time orientations (i.e., TenseM).
(5) As shown in Figure 8, the fifth category of influential linguistic cues includes stylistic, time orientations, informal language, cognitive processes, and perceptual processes from counselors, stylistic and informal language from counselees, and linguistic cues from the synchrony between counselor and counselee, which show two kinds of effect on perceived helpfulness. First, for some cues a high value improves the perceived helpfulness and a low value reduces it, such as stylistic cues (i.e., MultiFun, PrepEnd) from counselors, stylistic cues (i.e., Preps, MultiFun) from counselees, and cues from the synchrony between counselor and counselee (i.e., Factors SIM). Second, for other cues a high value reduces the perceived helpfulness and a low value improves it, such as stylistic (i.e., You, Adverb, Verb, Pronoun), cognitive processes (i.e., Tentat), and informal language (i.e., Assent) from counselors, and informal language (i.e., Assent, Adverb) from counselees.
Figure 4. SHAP summary plot of the adjustment to the predicted perceived helpfulness (x-axis) for each feature of the first type.
Figure 5. SHAP summary plot of the adjustment to the predicted perceived helpfulness (x-axis) for each feature of the second type.
Figure 6. SHAP summary plot of the adjustment to the predicted perceived helpfulness (x-axis) for each feature of the third type.
Figure 7. SHAP summary plot of the adjustment to the predicted perceived helpfulness (x-axis) for each feature of the fourth type.
Figure 8. SHAP summary plot of the adjustment to the predicted perceived helpfulness (x-axis) for each feature of the fifth type.
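The way the summary plots are read above (red dots on the right mean that high feature values raise the prediction, red dots on the left mean that they lower it) can be approximated numerically. The sketch below is a hypothetical shortcut, not part of the study's method: it correlates a feature's values with its SHAP values and labels the direction. The threshold of 0.3 and the toy numbers are assumptions chosen for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def effect_direction(feature_values, shap_values, threshold=0.3):
    """Label how a feature's value relates to its SHAP contribution."""
    r = pearson(feature_values, shap_values)
    if r >= threshold:
        return "positive"   # high values raise predicted helpfulness
    if r <= -threshold:
        return "negative"   # high values lower predicted helpfulness
    return "mixed"          # no clear monotonic pattern


# Toy data, invented for illustration only.
word_count_values = [20, 50, 120, 300]
word_count_shap = [-0.4, -0.1, 0.2, 0.5]
parenth_values = [0, 1, 3, 6]
parenth_shap = [0.3, 0.1, -0.1, -0.4]

word_count_direction = effect_direction(word_count_values, word_count_shap)
parenth_direction = effect_direction(parenth_values, parenth_shap)
print(word_count_direction, parenth_direction)
```

A linear correlation cannot capture the non-monotonic cases described for the fourth category, where low values may either raise or lower the prediction; such cues fall under the "mixed" label and still need visual inspection of the plot.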
Discussion
This exploratory research investigated automatic predictive methods and linguistic cues for the perceived helpfulness of SQA-OC. It identified the best-performing prediction algorithm and feature sets, and then used model interpretability to discover the relevant influential factors. The findings of this study can be summarized in three parts: (1) the algorithms and linguistic cues that perform best in predicting the perceived helpfulness; (2) the importance of influential linguistic cues from different sources and of different types; and (3) the influence of these linguistic cues on the perceived helpfulness. We explain each part below and summarize the main contributions of this study.
The Predictive Model
In terms of the predictive model of the perceived helpfulness of SQA-OC, this study found that the random forest algorithm combining counselees' features, counselors' features, and counselor-counselee interaction features achieved the best predictive performance and has potential practical significance. Comparing the different predictive algorithms, our findings showed that the non-linear regression model performs better than the linear model, which is in line with previous studies on automatically predicting the perceived helpfulness of online services (22, 71, 72). More specifically, this study showed that the linear model achieves better performance with a smaller number of features, while the non-linear models can represent non-linear relationships between features and are more suitable for highly complex predictive situations.
In terms of the different sources of feature sets, the combination of features from counselees' questions, counselors' responses, and the synchrony between counselees and counselors achieves the optimal performance. One possible explanation is that the counselee source and the counselor-counselee source contribute only two or four influential features to the predictive model, while the counselor source contributes 46. It has been suggested that the number of features positively predicts the complexity of the predictive model (73), which explains why the counselor source is indispensable in the complicated predictive model owing to its large number of highly influential features.
Furthermore, we found that the linguistic cues from counselors are the most important, although all three sources incrementally improve the performance of the prediction. This is in line with previous findings showing that the linguistic features of counselors, counselees, and counselor-counselee interactions can predict the therapeutic outcome (62, 74, 75). For the perceived helpfulness and therapeutic outcome of online counseling in an asynchronous and one-off service environment, these findings also suggest that the strategies and skills of counselors play a major role.
The Importance of the Influential Linguistic Cues With Different Sources and Types to the Perceived Helpfulness
Interpretability gives machine learning models the ability to explain or present their behavior in terms understandable to humans (76), which makes it an effective tool for understanding and improving the perceived helpfulness.
First, the global interpretability of the model identified and clarified the influential linguistic cues related to the perceived helpfulness, as well as their relative importance. The influential linguistic cues reflect the influence of counselees' and counselors' attentional focus, thought processes, emotional states, and social relationships on the perceived helpfulness of SQA-OC. Specifically, for the counselor-sourced linguistic cues, the global interpretability of the model indicated that stylistic, personal pronouns, cognitive processes, time orientations, personal concerns, affective processes, perceptual processes, informal language, biological processes, prepositions, numbers, drives, relativity, multifunction, and social processes are the top-ranked predictors of perceived helpfulness. These linguistic cues are widely believed to be related to individual psychological processes, such as cognitive processes, emotion, and social relations (77). We provided evidence that previous findings on counselors' linguistic cues and therapeutic outcomes hold in the online, asynchronous, and one-off service environment, and extended the known influencing factors of perceived helpfulness to the SQA-OC context. For counselee-sourced linguistic cues, stylistic (i.e., length of texts, second-person pronouns, numbers) and informal language are the influential factors. These linguistic cues usually explain who dominates a conversation and how the parties engage in it, and they predict the quality of relationships (77). For the linguistic cues from counselor-counselee synchrony, emotional similarity, topic (symptom) consistency, and language style similarity are the determining predictors of perceived helpfulness. These findings are in line with research on both perceived helpfulness (24, 38) and mainstream Therapeutic Change Process Research (TCPR) using computerized text analysis (78, 79).
In general, the global interpretability of the prediction provides insights into what makes a good SQA-OC and offers policy suggestions for the counseling platform to undertake professional training strategies for counselors.
Second, the local interpretability of this method produces both a prediction and an explanation for each response from counselors. As shown in Figure 9, for an unvoted response with a predicted value of 4.70 votes, the emotional similarity was 0.996 and ranked third among the positive factors, while the linguistic cues SpecArt and Cause were the top-ranked negative factors. Local interpretability of the prediction allows counselors to evaluate and improve their service in advance and facilitates the use of targeted counseling strategies.
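A local explanation of this kind rests on the additivity of SHAP values: the prediction for one response equals a base value plus the sum of that response's per-feature contributions. The sketch below assumes the contributions have already been computed by an explainer; the function name, the base value, and all numbers are invented for illustration and do not reproduce Figure 9.

```python
def explain_response(base_value, contributions, top_n=3):
    """Rebuild a prediction from SHAP contributions and rank its drivers.

    base_value: the model's expected output over the background data.
    contributions: dict mapping feature name -> SHAP value for one response.
    Returns (prediction, top positive factors, top negative factors).
    """
    prediction = base_value + sum(contributions.values())
    positives = sorted(
        ((name, value) for name, value in contributions.items() if value > 0),
        key=lambda item: item[1],
        reverse=True,
    )[:top_n]
    negatives = sorted(
        ((name, value) for name, value in contributions.items() if value < 0),
        key=lambda item: item[1],
    )[:top_n]
    return prediction, positives, negatives


# Toy contributions for one hypothetical response (not the study's data).
toy_contributions = {
    "WordCount": 0.9,
    "Period": 0.6,
    "AffectSIM": 0.4,
    "SpecArt": -0.5,
    "Cause": -0.3,
}

predicted_votes, top_positive, top_negative = explain_response(2.0, toy_contributions)
print(round(predicted_votes, 2), top_positive, top_negative)
```

A counselor-facing tool could show the top negative factors of a draft response as suggestions for revision before the response is posted.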
Influence of the Influential Linguistic Cues on the Perceived Helpfulness
This study examined the influence of influential factors on the perceived helpfulness of SQA-OC and summarized them into five styles of linguistic cues that can improve the perceived helpfulness of SQA-OC, namely “talkative”, “empathy”, “thoughtful”, “concise with distance”, and “friendliness and confident”.
The first pattern is characterized by a “talkative” style, with a high level of stylistic cues (WordCount, You, and Number) from counselees and stylistic cues (Period) from counselors, which reflects the attentional allocation and engagement of counselors and counselees in SQA-OC. This is consistent with previous research indicating that a greater word count marks people who are more dominant and engaged in the conversation. However, it is inconsistent with the finding that the use of second-person words predicts lower-quality relationships (37). The use of stylistic linguistic cues by counselors and counselees differs from that in traditional TCPR, possibly because of the single QA format and text-based nature of SQA-OC. This format encourages both counselees and counselors to provide as much information as possible at one time; in this context, counselors want to circumvent ambiguity in their responses (32) or to distinguish their authorship from other counselors' responses (80).
The second pattern is characterized by an “empathy” style, which reflects the emotional state and social relationships of counselors and counselees in SQA-OC. This style contains stylistic (i.e., I), social processes (i.e., Humans), personal concerns (i.e., Psychology), and affective processes (i.e., Affect, NegEmo) from counselors, and the synchrony between counselor and counselee (i.e., AffectSIM, SymptomsSIM). People who are experiencing physical or emotional pain tend to have their attention drawn to themselves and subsequently use more first-person singular pronouns (Tausczik and Pennebaker, 2010). In line with previous studies using a computerized approach to study counseling progress, a high level of first-person pronouns and emotional words, along with similarity in affect and language style (35), are important factors in a higher-quality relationship between counselor and counselee. Emotional similarity reflects the emotional aspect of empathy, which predicts a counselor's competency and conversation skills (34, 81), while LSM and topic consistency (i.e., SymptomsSIM) represent the cognitive dimension of empathy (54) and reflect unconscious interpersonal communication behavior that promotes mutual understanding and increases intimacy between the two parties (77). These three dimensions influence the counselees' perception of the overall counseling experience in SQA-OC. Therefore, the more first-person singular pronouns a counselor uses, and the more similar the counselor's response text is to the counselee's language style, emotional disposition, and symptoms, the more likely counselees are to vote the SQA-OC experience as useful and helpful.
The third pattern is characterized by a “thoughtful” style, consisting mainly of stylistic, drive, and cognitive process cues from counselors, which reflects counselors' thinking styles and intentions in SQA-OC. Counselors with this style use more cognitive process words (i.e., Cause, CogMech, Insight, Inclusive) and prepositions, which implies that they put more effort into analyzing the symptoms and causes of counselees' psychological distress, proposing treatment strategies, and promoting their implementation, so as to relieve that distress. Cognitive process words such as exclusion words and conjunctions capture people's cognitive complexity (77). By using the “thoughtful” style, counselors can create causal explanations to organize their thoughts in counseling.
The fourth pattern is characterized by a “concise with distance” style, which reflects engagement, cognitive load, and psychological distance between counselors and counselees in SQA-OC. This style contains linguistic cues of stylistic, first-person plural, cognitive processes, relativity, perceptual processes, and personal concerns from counselors. Counselors' responses with high perceived helpfulness tend to use a moderate number of words (i.e., Quote, SpecArt, Colon, RateFourCharWord), first-person plural (i.e., We), cognitive processes (i.e., Inhibition), relativity (i.e., Nonfl, Time), perceptual processes (i.e., Hear), and personal concerns (i.e., Leisure), as many prepositions (i.e., PrepEnd) and relativity words (i.e., Time, Motion) as possible, and fewer words of time orientations (TenseM). On the one hand, these findings somewhat contrast with the observation that good speakers are more biased toward group focus [plural pronouns “we”, (82)], which shows group cohesion (37). On the other hand, research has indicated that under high load in conversation, people speak more, use longer sentences, and use more words and more plural personal pronouns (83). Likewise, during high-quality counseling, counselors achieve a more balanced exchange of words with counselees as the conversation progresses (62). These findings may also confirm that a reasonable length of response and an appropriate expression of first-person, cognitive processes, perceptual processes, and relativity are effective strategies for counselors in SQA-OC. In addition, regarding the use of words of biological and perceptual processes and time orientations, a previous study suggested that there are no significant differences between good and poor counselors (33). This is inconsistent with the results of the present study, which may be because SQA-OC uses a single QA format instead of a stepwise, multi-session counseling process, leading counselors to use as many cognitive/sensory/affective descriptions as possible to meet the counselee's information demand and relieve their psychological distress.
The fifth pattern is characterized by a “friendliness and confident” style, which reflects the formality and thinking style of counselors and counselees in SQA-OC. As mentioned earlier, experienced counselors and counselees with this style use more function words and conjunctions (i.e., MultiFun, PrepEnd), providing more complex and, often, more concrete information about a topic. In particular, experienced counselors and counselees tend to use fewer second-person pronouns (i.e., You), verbs, tentative cognitive process words (i.e., Tentat), and informal language (i.e., Assent), while being more similar to each other in analyzing the influencing factors of psychological distress (i.e., Factors SIM). Because less informal language (i.e., Assent, Adverb) from counselees improves perceived helpfulness, this finding complements previous NLP approaches that study counseling conversations from the perspective of the counselees' linguistic features, namely that the counselees' language output is as important to the counseling process as the counselors'. For these counselees, using more informal language may reduce the perceived helpfulness of SQA-OC. Furthermore, when people are uncertain or insecure about their topic, they use tentative language (37). Therefore, successful counselors are better at handling ambiguity in the conversation, and using more words is one of the effective strategies to make the conversation less uncertain and more concrete (32).
Contribution and Limitation
The first contribution of this study is the application of the notion of perceived helpfulness in measuring the public's perception of SQA-OC. Although this notion has been applied in the field of online counseling research (3, 84), it is not always quantifiable and easily accessible in the online context. In addition, studies that use text mining technology to automatically measure, model, and predict the helpfulness of online counseling are still rare. Because helpfulness votes reflect both the extent to which a counselor's response is helpful (i.e., the social aspect) and the ranking/popularity of the counselor on the platform (i.e., the economic aspect), investigating the helpfulness votes of counselors' responses may help to understand both the social and the economic impact of online counseling. With the aid of text mining technology, we found that the linguistic features of the counselor's responses and the counselee's questions play a decisive role in the predictive model. Compared to the studies of shopping websites (e.g., TripAdvisor), whose helpful votes are transformed into direct economic benefits for the platform, the helpful votes of SQA-OC are transformed into a more powerful, comfortable, and secure psychological resource for the counselees. Furthermore, we found synchrony between counselor and counselee in the SQA-OC context, including language style matching and topic consistency, but emotional and symptom similarity received more attention than in studies of other online service platforms. Previous scholars have expressed concerns about the capacity and professionalism of online counselors and argued that the service quality of online counseling is hard to guarantee (5). The results of this study showed that the public's perception of helpfulness can only be improved if the response is synchronized with and resonates with the counselees at the semantic, cognitive, and emotional levels.
Online counselors may improve their professional competence from the above points.
The second contribution is that we examined the online therapeutic relationship and developed a novel method to computerize the counselor-counselee interaction from the aspects of LSS, emotional similarity, and topic consistency. Previous studies have suggested that counselees may open up at a faster rate when communicating with counselors online (85, 86) and get straight to the point rather than peacefully easing into a problem, due to the “disinhibiting effect” (84). However, little is known about how to measure the characteristics of text-based communication and the interaction pattern between counselor and counselee. Using large-scale discourse analysis, our study identified five styles (“talkative”, “empathy”, “thoughtful”, “concise with distance”, and “friendliness and confident”) that may occur in the online therapeutic relationship. However, it is noteworthy that counselors may adopt more than one style in actual practice settings.
We acknowledge that the working alliance is considered to be one of the most crucial factors in the counseling process (87, 88). Although we did not directly measure working alliance as a dependent variable, the measure of perceived helpfulness may be an indicator and reflection of working alliance. More specifically, the five styles lead to either higher or lower levels of perceived helpfulness in the counseling process, which is in line with the “repair-rupture” process of working alliance (89–91). This finding implies that online counselors should be more aware of their verbal responses at the first contact in the counseling process and adjust their future communication style to maintain the working alliance with the counselees.
This study has several limitations. Firstly, there was no demographic data such as age, gender, or occupation-level data in the estimation model due to the anonymity of the SQA-OC platform. Therefore, it is hard to identify to what extent the factor of age, gender, and occupations interact with the linguistic features of counselees and counselors. Although most counselees do not expose their personal information on the platform explicitly, their demographic information can be predicted from SQA behavior by utilizing advanced machine learning technology (92). Secondly, the computerized text-based analysis in this study has several limitations, such as the absence of non-verbal cues (93), issues of inhibition, and temporal fluidity (94). While this may adversely influence the strategies used in traditional counseling, web-based communication still has several strengths including stronger emotional disclosure (95) and a higher level of client empowerment (96). Future analysis can be improved by incorporating self-report or interview data of the counselees, to verify the perceived helpfulness of SQA-OC from a personal level.
In terms of physical distancing, web-based psychological services can effectively address crisis-related issues. While SQA-OC works as an effective and convenient pre-consultation service, its service quality can be further improved by integrating with the internet cognitive-behavior therapy (ICBT) services (97, 98). The integration may result in better treatment outcomes and wider usage of online mental health services for someone whose treatment outcome is not ideal in the traditional face-to-face modality or has the limitations of stigma, cost, or transportation (99).
Conclusions
We presented a large-scale quantitative study on the online and asynchronous conversation between psychological help-seekers (i.e., counselees) and counselors on the SQA-OC platform. We proposed an interpretable predictive model to automatically measure the perceived helpfulness of SQA-OC and investigated the impact of linguistic cues from the three sources (counselor, counselee, counselor-counselee synchrony) on the model performance. We hope that this work can inspire future improvement of online counseling platforms and of online counselors' practice, for instance by using actionable conversation strategies to improve the public's perception of the helpfulness of online counseling services.
Data Availability Statement
Publicly available datasets were analyzed in this study. This data can be found here: https://www.xinli001.com/qa?source=pc-home.
Author Contributions
YH, HL, and SL: conceptualization and writing—original draft preparation. YH and SL: methodology. HL and YH: formal analysis. HL, YH, SL, ZZ, and WW: writing—review and editing. HL: visualization. WW, ZZ, and YH: funding acquisition. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China (Nos. 71974072 and 72204095) and the Science Foundation of Ministry of Education of China (No. 22YJC880022), and was also supported by the Collaborative Innovation Center for Informatization and Balanced Development of K-12 Education by MOE and Hubei Province (Grant Number: xtzd2021-013) and the China Basic Education Quality Monitoring Collaborative Innovation Center (No. 2022-04-028-BZPK01).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary Material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpubh.2022.817570/full#supplementary-material
Footnotes
1. ^Available online at: https://www.xinli001.com/ (accessed April 20, 2022).
3. ^Available online at: https://www.Jiandanxinli.Com/Public/2020/ (accessed April 12, 2022).
4. ^https://www.xinli001.com/qa/ask
5. ^https://github.com/fxsjy/jieba
6. ^Available online at: https://radimrehurek.com/gensim/models/word2vec.html (accessed August 1, 2021).
7. ^The One Psychology platform categorizes the Q&A section into five blocks: Recommended answers (with questions), Recommended questions (with answers), Newest questions, Fine selection in last 30 days, and Offer a reward. Online counselees may browse the first four blocks to get help, and counselors may get paid in the last block if they answer some of the questions raised by the counselees.
References
1. Holmes EA, Ghaderi A, Harmer CJ, Ramchandani PG, Cuijpers P, Morrison AP, et al. The lancet psychiatry commission on psychological treatments research in tomorrow's science. Lancet Psychiatry. (2018) 5:237–86. doi: 10.1016/S2215-0366(17)30513-8
2. WHO. The Impact of COVID-19 on Mental, Neurological and Substance Use Services: Results of a Rapid Assessment. WHO (2020).
3. Richards D, Viganó N. Online counseling: a narrative and critical review of the literature. J Clin Psychol. (2013) 69:994–1011. doi: 10.1002/jclp.21974
4. Yin H, Wardenaar K, Wang Y, Wang N, Chen W, Zhang Y, et al. Mobile mental health apps in China: systematic app store search. J Med Internet Res. (2020) 22:e14915. doi: 10.2196/14915
5. Baker KD, Ray M. Online counseling: the good, the bad, and the possibilities. Couns Psychol Q. (2011) 24:341–6. doi: 10.1080/09515070.2011.632875
6. Abney PC, Maddux CD. Counseling and technology: some thoughts about the controversy. J Technol Human Serv. (2004) 22:1–24. doi: 10.1300/J017v22n03_01
7. Situmorang DDB. Online/cyber counseling services in the COVID-19 outbreak: are they really new? J Pastoral Care Counsel. (2020) 74:166–74. doi: 10.1177/1542305020948170
8. Ifdil I, Fadli RP, Suranata K, Zola N, Ardi Z. Online mental health services in Indonesia during the COVID-19 outbreak. Asian J Psychiatr. (2020) 51:102153. doi: 10.1016/j.ajp.2020.102153
9. Huang Y, Liu H, Zhang L, Li S, Wang W, Ren Z, et al. The psychological and behavioral patterns of online psychological help-seekers before and during COVID-19 pandemic: a text mining-based longitudinal ecological study. Int J Environ Res Public Health. (2021) 18:11525. doi: 10.3390/ijerph182111525
10. Olfson M. Building the mental health workforce capacity needed to treat adults with serious mental illnesses. Health Aff. (2016) 35:983–90. doi: 10.1377/hlthaff.2015.1619
11. White M, Dorman SM. Receiving social support online: implications for health education. Health Educ Res. (2001) 16:693–707.
12. Eysenbach G, Powell J, Englesakis M, Rizo C, Stern A. Health related virtual communities and electronic support groups: systematic review of the effects of online peer to peer interactions. Br Med J. (2004) 328:1166. doi: 10.1136/bmj.328.7449.1166
13. Liu Z, Jansen BJ. Almighty Twitter, what are people asking for? Proc Am Soc Inform Sci Technol. (2012) 49:1–10. doi: 10.1002/meet.14504901134
14. Paul SA, Hong L, Chi EH. Who Is Authoritative? Understanding Reputation Mechanisms in Quora. (2012). Available online at: https://arxiv.org/abs/1204.3724v1 (accessed May 7, 2022).
15. Shah C, Oh JS, Oh S. Exploring characteristics and effects of user participation in online social Q&A sites. First Monday. (2008) 13. doi: 10.5210/fm.v13i9.2182
16. Jansen BJ, Sobel K, Cook G. Classifying ecommerce information sharing behaviour by youths on social networking sites. J Inform Sci. (2011) 37:120–36. doi: 10.1177/0165551510396975
17. Morris MR, Teevan J, Panovich K. What do people ask their social networks, and why? A survey study of status message Q&A behavior. Conf Hum Factors Comput Syst Proc. (2010) 3:1739–48. doi: 10.1145/1753326.1753587
18. Zhang M, Jansen BJ, Chowdhury A. Business engagement on Twitter: a path analysis. Electronic Markets. (2011) 21:161–75. doi: 10.1007/s12525-011-0065-z
19. McLennan J. Formal and informal counselling help: students' experiences. Br J Guid Couns. (1991) 19:149–59. doi: 10.1080/03069889108253599
20. März A, Schubach S, Schumann JH. “Why would I read a mobile review?” device compatibility perceptions and effects on perceived helpfulness. Psychol Mark. (2017) 34:119–37. doi: 10.1002/mar.20979
21. Schindler RM, Bickart B. Perceived helpfulness of online consumer reviews: the role of message content and style. J Consum Behav. (2012) 11:234–43.
22. Cao Q, Duan W, Gan Q. Exploring determinants of voting for the “helpfulness” of online user reviews: a text mining approach. Decis Support Syst. (2011) 50:511–21. doi: 10.1016/j.dss.2010.11.009
23. Chen Z, Lurie NH. Temporal contiguity and negativity bias in the impact of online word of mouth. J Market Res. (2013) 50:463–76. doi: 10.1509/jmr.12.0063
24. Fang B, Ye Q, Kucukusta D, Law R. Analysis of the perceived value of online tourism reviews: influence of readability and reviewer characteristics. Tourism Manage. (2016) 52:498–506. doi: 10.1016/j.tourman.2015.07.018
25. Spool J. The Magic Behind Amazon's 2.7 Billion Dollar Question. (2009). Available online at: https://articles.uie.com/magicbehindamazon/
26. Zhu L, Yin G, He W. Is this opinion leader's review useful? Peripheral cues for online review helpfulness. J Electron Commer Res. (2014) 15:267.
27. Jørgensen C, Hougaard E, Rosenbaum B, Valbak K, Rehfeld E. The dynamic assessment interview (DAI), interpersonal process measured by structural analysis of social behavior (SASB) and therapeutic outcome. Psychother Res. (2010) 10:181–95. doi: 10.1093/ptr/10.2.181
28. Russell RL, Trull TJ. Sequential analyses of language variables in psychotherapy process research. J Consult Clin Psychol. (1986) 54:16–21. doi: 10.1037/0022-006X.54.1.16
29. Arntz A, Hawke LD, Bamelis L, Spinhoven P, Molendijk ML. Changes in natural language use as an indicator of psychotherapeutic change in personality disorders. Behav Res Ther. (2012) 50:191–202. doi: 10.1016/j.brat.2011.12.007
30. Brockmeyer T, Zimmermann J, Kulessa D, Hautzinger M, Bents H, Friederich H-C, et al. Me, myself, and I: self-referent word use as an indicator of self-focused attention in relation to depression and anxiety. Front Psychol. (2015) 6:1564. doi: 10.3389/fpsyg.2015.01564
31. Consedine NS, Krivoshekova YS, Magai C. Play it (again) sam: linguistic changes predict improved mental and physical health among older adults. J Lang Soc Psychol. (2012) 31:240–62. doi: 10.1177/0261927X12446736
32. Althoff T, Clark K, Leskovec J. Large-scale analysis of counseling conversations: an application of natural language processing to mental health. Trans Assoc Comput Ling. (2016) 4:463–76. doi: 10.1162/tacl_a_00111
33. Huston JM. Language Use as a Progress Monitoring Marker: An Exploratory Study of Change in Psychotherapy. (Doctoral thesis), State University of New York (2018). Available online at: https://search.proquest.com/docview/2124411808?accountid=13375 (accessed May 7, 2022).
34. Sharma A, Miner A, Atkins D, Althoff T. A computational approach to understanding empathy expressed in text-based mental health support. arXiv [Preprint]. (2020). arXiv: 2009.08441.
35. Lai L. Language Use and Sentiment in Psychotherapy: Predicting the Process and Outcome. (Master's thesis), Central China Normal University (2019).
36. Ardito RB, Rabellino D. Therapeutic alliance and outcome of psychotherapy: historical excursus, measurements, and prospects for research. Front Psychol. (2011) 2:270. doi: 10.3389/fpsyg.2011.00270
37. Ireland ME, Pennebaker JW. Language style matching in writing: synchrony in essays, correspondence, and poetry. J Pers Soc Psychol. (2010) 99:549–71. doi: 10.1037/a0020386
38. Yang S, Zhou C, Chen Y. Do topic consistency and linguistic style similarity affect online review helpfulness? An elaboration likelihood model perspective. Inform Process Manage. (2021) 58:102521. doi: 10.1016/j.ipm.2021.102521
39. Lai L, Tao R, Ren Z, Jiang G. The current practice and ethical issues of online counseling in Chinese mainland: evidence comes from the “big data” and ethical assessment for online counseling websites. Psychol Sci. (2018) 41:1214–20. doi: 10.16719/j.cnki.1671-6981.20180528
40. An J, Kwak H, Ahn YY. SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment. (2018). Available online at: https://github.com/ghdi6758/SemAxis (accessed November 11, 2021).
41. Dong Y, Chawla NV, Swami A. Metapath2vec: Scalable representation learning for heterogeneous networks. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM (2017). p. 135–44.
42. Rudolph M, Blei D. Dynamic Embedding for Language Evolution. (2018). Available online at: https://dl.acm.org/doi/pdf/10.1145/3178876.3185999 (accessed May 7, 2022).
43. Bolukbasi T, Chang KW, Zou JY, Saligrama V, Kalai AT. Man Is to Computer Programmer as Woman Is to Homemaker? Debiasing Word Embeddings. (2016). Available online at: https://proceedings.neurips.cc/paper/2016/file/a486cd07e4ac3d270571622f4f316ec5-Paper.pdf
44. Garg N, Schiebinger L, Jurafsky D, Zou J. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proc Natl Acad Sci. (2018) 115:E3635–44. doi: 10.1073/pnas.1720347115
45. Caliskan A, Bryson JJ, Narayanan A. Semantics Derived Automatically From Language Corpora Necessarily Contain Human Biases. (2017). Available online at: http://opus.bath.ac.uk/55288/ (accessed November 11, 2021).
46. Kozlowski AC, Taddy M, Evans JA. The geometry of culture: analyzing the meanings of class through word embeddings. Am Sociol Rev. (2019) 84:905–49. doi: 10.1177/0003122419877135
47. Tshitoyan V, Dagdelen J, Weston L, Dunn A, Rong Z, Kononova O, et al. Unsupervised word embeddings capture latent knowledge from materials science literature. Nature. (2019) 571:95–8. doi: 10.1038/s41586-019-1335-8
48. Liu H, Zhang L, Wang W, Huang Y, Li S, Ren Z, et al. Prediction of online psychological help-seeking behavior during the COVID-19 pandemic: an interpretable machine learning method. Front Public Health. (2022) 10:814366. doi: 10.3389/fpubh.2022.814366
49. Hides L, Lubman DI, Devlin H, Cotton S, Aitken C, Gibbie T, et al. Reliability and validity of the kessler 10 and patient health questionnaire among injecting drug users. Aust New Zeal J Psychiatry. (2016) 41:166–8. doi: 10.1080/00048670601109949
50. Socher R, Perelygin A, Wu J, Chuang J, Manning CD, Ng AY, et al. Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (2013). p. 1631–42.
51. Chartrand TL, Bargh JA. The chameleon effect: the perception–behavior link and social interaction. J Pers Soc Psychol. (1999) 76:893. doi: 10.1037/0022-3514.76.6.893
52. White J, Gardner J. The Classroom X-Factor: The Power of Body Language and Non-verbal Communication in Teaching. New York, NY: Routledge (2013).
53. Zhou J, Zhou J. Analysis of interactive synchrony of non-verbal behavior between teachers and students in classroom teaching. J Beijing Univ Technol. (2006) 6:89–92.
54. Lord SP, Sheng E, Imel ZE, Baer J, Atkins DC. More than reflections: empathy in motivational interviewing includes language style synchrony between therapist and client. Behav Ther. (2015) 46:296. doi: 10.1016/j.beth.2014.11.002
55. Alsumait L, Barbará D, Domeniconi C. Online LDA: adaptive topic model for mining text streams with application on topic detection and tracking. In: Proceedings of IEEE International Conference on Data Mining (ICDM08). (2008). Available online at: http://citeseerx.ist.psu.edu/viewdoc/summary (accessed May 7, 2022).
56. Niwattanakul S, Singthongchai J, Naenudorn E, Wanapu S. Using of Jaccard coefficient for keywords similarity. In: Proceedings of the International Multiconference of Engineers and Computer Scientists, Vol. 1. (2013). p. 80–4.
57. Gonzales AL, Hancock JT, Pennebaker JW. Language style matching as a predictor of social dynamics in small groups. Communic Res. (2009) 37:3–19. doi: 10.1177/0093650209351468
58. Crijns H, Cauberghe V, Hudders L, Claeys AS. How to deal with online consumer comments during a crisis? The impact of personalized organizational responses on organizational reputation. Comput Human Behav. (2017) 75:619–31. doi: 10.1016/j.chb.2017.05.046
59. Bohart AC, Elliott R, Greenberg LS, Watson JC. Empathy. In: Norcross JC, editor. Psychotherapy Relationships That Work: Therapist Contributions and Responsiveness to Patients. New York, NY: Oxford University Press (2002). p. 89–108.
60. Elliott R, Bohart AC, Watson JC, Murphy D. Therapist empathy and client outcome: an updated meta-analysis. Psychotherapy. (2018) 55:399. doi: 10.1037/pst0000175
61. Gibson J, Can D, Xiao B, Imel ZE, Atkins DC, Georgiou P, et al. A deep learning approach to modeling empathy in addiction counseling. In: Proceedings of Interspeech 2016. (2016). doi: 10.21437/Interspeech.2016-554
62. Pérez-Rosas V, Wu X, Resnicow K, Mihalcea R. What makes a good counselor? Learning to distinguish between high-quality and low-quality counseling conversations. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. (2019). p. 926–35.
63. Patel S, Pelletier-Bui A, Smith S, Roberts MB, Kilgannon H, Trzeciak S, et al. Curricula for empathy and compassion training in medical education: a systematic review. PLoS ONE. (2019) 14:e0221412. doi: 10.1371/journal.pone.0221412
64. Buechel S, Buffone A, Slaff B, Ungar L, Sedoc J. Modeling empathy and distress in reaction to news stories. arXiv [Preprint]. (2018). arXiv: 1808.10399. doi: 10.18653/v1/D18-1507
65. Pérez-Rosas V, Mihalcea R, Resnicow K, Singh S, An L. Understanding and predicting empathic behavior in counseling therapy. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). (2017). p. 1426–35.
66. Selman RL. The Growth of Interpersonal Understanding: Developmental and Clinical Analyses. New York, NY: Academic Press (1980).
68. Barrett-Lennard GT. The empathy cycle: refinement of a nuclear concept. J Couns Psychol. (1981) 28:91. doi: 10.1037/0022-0167.28.2.91
69. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. (2017). p. 4768–77.
70. Barredo Arrieta A, Díaz-Rodríguez N, del Ser J, Bennetot A, Tabik S, Barbado A, et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inform Fusion. (2020) 58:82–115. doi: 10.1016/j.inffus.2019.12.012
71. Ghose A, Ipeirotis PG. Estimating the helpfulness and economic impact of product reviews: mining text and reviewer characteristics. IEEE Trans Knowl Data Eng. (2010) 23:1498–512. doi: 10.1109/TKDE.2010.188
72. Liu Y, Huang X, An A, Yu X. Modeling and predicting the helpfulness of online reviews. In: 2008 Eighth IEEE International Conference on Data Mining. (2008). p. 443–52.
74. Klonek FE, Quera V, Kauffeld S. Coding interactions in motivational interviewing with computer-software: what are the advantages for process researchers? Comput Human Behav. (2015) 44:284–92. doi: 10.1016/j.chb.2014.10.034
75. Xiao B, Bone D, Van Segbroeck M, Imel ZE, Atkins DC, Georgiou PG, et al. Modeling therapist empathy through prosody in drug addiction counseling. In: Fifteenth Annual Conference of the International Speech Communication Association. (2014). p. 213–7.
76. Doshi-Velez F, Kim B. Towards a rigorous science of interpretable machine learning. arXiv [Preprint]. (2017). arXiv: 1702.08608. doi: 10.48550/arXiv.1702.08608
77. Tausczik YR, Pennebaker JW. The psychological meaning of words: LIWC and computerized text analysis methods. J Lang Soc Psychol. (2010) 29:24–54. doi: 10.1177/0261927X09351676
78. Shapira N, Lazarus G, Goldberg Y, Gilboa-Schechtman E, Tuval-Mashiach R, Juravski D, et al. Using computerized text analysis to examine associations between linguistic features and clients' distress during psychotherapy. J Couns Psychol. (2021) 68:77. doi: 10.1037/cou0000440
79. Smink WAC, Fox J-P, Tjong Kim Sang E, Sools AM, Westerhof GJ, Veldkamp BP. Understanding therapeutic change process research through multilevel modeling and text mining. Front Psychol. (2019) 10:1186. doi: 10.3389/fpsyg.2019.01186
80. Chen X, Hao P, Chandramouli R, Subbalakshmi KP. Authorship similarity detection from email messages. In: Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6871 LNAI. (2011). p. 375–86.
81. Constantine MG. Social desirability attitudes, sex, and affective and cognitive empathy as predictors of self-reported multicultural counseling competence. Couns Psychol. (2016) 28:857–72. doi: 10.1177/0011000000286008
82. Pennebaker JW, Mehl MR, Niederhoffer KG. Psychological aspects of natural language use: our words, our selves. Annu Rev Psychol. (2003) 54:547–77. doi: 10.1146/annurev.psych.54.101601.145041
83. Khawaja MA, Chen F, Marcus N. Analysis of collaborative communication for linguistic cues of cognitive load. Hum Factors. (2012) 54:518–29. doi: 10.1177/0018720811431258
84. Barak A, Boniel-Nissim M, Suler J. Fostering empowerment in online support groups. Comput Human Behav. (2008) 24:1867–83. doi: 10.1016/j.chb.2008.02.004
85. Barnett JE. Online counseling: new entity, new challenges. Couns Psychol. (2016) 33:872–80. doi: 10.1177/0011000005279961
86. Rochlen AB, Zack JS, Speyer C. Online therapy: review of relevant definitions, debates, and current empirical support. J Clin Psychol. (2004) 60:269–83. doi: 10.1002/jclp.10263
87. Cook JE, Doyle C. Working alliance in online therapy as compared to face-to-face therapy: preliminary results. Cyber Psychol Behav. (2004) 5:95–105. doi: 10.1089/109493102753770480
88. Zhu X, Hu Y, Jiang G. The developmental patterns of working alliance in counseling: relationships to therapeutic outcomes. Acta Psychol Sinica. (2015) 47:1279. doi: 10.3724/SP.J.1041.2015.01279
89. Safran JD. Breaches in the therapeutic alliance: an arena for negotiating authentic relatedness. Psychotherapy. (1993) 30:11–24. doi: 10.1037/0033-3204.30.1.11
90. Safran JD, McMain S, Crocker P, Murray P. Therapeutic alliance rupture as a therapy event for empirical investigation. Psychotherapy. (1990) 27:154–65. doi: 10.1037/0033-3204.27.2.154
91. Safran JD, Muran JC. The resolution of ruptures in the therapeutic alliance. J Consult Clin Psychol. (1996) 64:447–58. doi: 10.1037/0022-006X.64.3.447
92. Cesare N, Grant C, Nsoesie EO. Detection of User Demographics on Social Media: A Review of Methods and Recommendations for Best Practices. (2017). Available online at: http://adsabs.harvard.edu/abs/2017arXiv170201807C
93. Liess A, Simon W, Yutsis M, Owen JE, Piemme KA, Golant M, et al. Detecting emotional expression in face-to-face and online breast cancer support groups. J Consult Clin Psychol. (2008) 76:517–23. doi: 10.1037/0022-006X.76.3.517
94. Mehta VS, Parakh M, Ghosh D. Web based interventions in psychiatry: an overview. Int J Mental Health Psychiatry. (2015) 2016:3. doi: 10.4172/2471-4372.1000108
95. Bar-Lev S. “We are here to give you emotional support”: performing emotions in an online HIV/AIDS support group. Qual Health Res. (2008) 18:509–21. doi: 10.1177/1049732307311680
96. Barker KK. Electronic support groups, patient-consumers, and medicalization: the case of contested illness. J Health Soc Behav. (2008) 49:20–36. doi: 10.1177/002214650804900103
97. Ren Z, Li X, Zhao L, Yu X, Li Z, Lai L, et al. Effectiveness and mechanism of internet-based self-help intervention for depression: the Chinese version of MoodGYM. Acta Psychol Sinica. (2016) 48:818. doi: 10.3724/SP.J.1041.2016.00818
98. Williams AD, Andrews G. The effectiveness of internet cognitive behavioural therapy (iCBT) for depression in primary care: a quality assurance study. PLoS ONE. (2013) 8:e57447. doi: 10.1371/journal.pone.0057447
Keywords: perceived helpfulness, social question answering, online counseling, explainable machine learning, topic consistency, linguistic style similarity, emotional similarity
Citation: Huang Y, Liu H, Li S, Wang W and Zhou Z (2022) Effective Prediction and Important Counseling Experience for Perceived Helpfulness of Social Question and Answering-Based Online Counseling: An Explainable Machine Learning Model. Front. Public Health 10:817570. doi: 10.3389/fpubh.2022.817570
Received: 18 November 2021; Accepted: 09 May 2022;
Published: 22 December 2022.
Edited by:
Li Wang, Nantong University, China
Reviewed by:
Weiqing Li, Hubei University of Technology, China
Xianglian Yu, Jianghan University, China
Copyright © 2022 Huang, Liu, Li, Wang and Zhou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Shen Li, blue.she.n@163.com; Weijun Wang, wangwj@mail.ccnu.edu.cn