- 1Signal Analysis Research (SAR) Group, Department of Electrical, Computer, and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON, Canada
- 2Interventional Psychiatry Program, St. Michael’s Hospital, Department of Psychiatry, University of Toronto, Toronto, ON, Canada
COVID-19 has led to an increase in anxiety among Canadians. Canadian Perspectives Survey Series (CPSS) is a dataset created by Statistics Canada to monitor the effects of COVID-19 among Canadians. Survey data were collected to evaluate health and health-related behaviours. This work evaluates CPSS2 and CPSS4, which were collected in May and July of 2020, respectively. The survey data consist of up to 102 questions. This work proposes the use of the survey data characteristics to identify the level of anxiety within the Canadian population during the first- and second-phases of COVID-19 and is validated by using the General Anxiety Disorder (GAD)-7 questionnaire. Minimum redundancy maximum relevance (mRMR) is applied to select the top features to represent user anxiety, and support vector machine (SVM) is used to classify the separation of anxiety severity. We employ SVM for binary classification with 10-fold cross validation to separate the labels of Minimal and Severe anxiety to achieve an overall accuracy of and for CPSS2 and CPSS4, respectively. After analysis, we compared the results of the first and second phases of COVID-19 and determined a subset of the features that could be represented as pseudo passive (PP) data. The accurate classification provides a proxy on the potential onsets of anxiety to provide tailored interventions. Future works can augment the proposed PP data for carrying out a more detailed digital phenotyping.
1. Introduction
Mental health is one of the greatest sources of inequality across the globe in terms of prevalence, with up to 80% of cases involving some form of psychosis occurring in low- and middle-income countries (1). Treatment for mental health disorders is consistently expensive in countries around the world (2). This can lead to unequal access to mental health treatment for patients in poorer countries. Mental health disorders in low- and middle-income countries have been studied (3, 4), allowing for a better understanding of mental health applications in subpopulations. Digital phenotyping applications offer an opportunity to provide low-cost aid for the diagnosis of mental health disorders and for digital interventions (5, 6).
There are various aspects that can affect a person’s mental health, including internal and external factors. Internal factors include physical health and genetic predisposition (7), whereas external factors include financial insecurity, food insecurity, and lifestyle changes (8). Mental health is an obscure topic as it can affect everyone personally (9). Due to the COVID-19 pandemic, there has been a deterioration in the general public’s mental wellbeing, causing an increase in discussions related to mental health (10, 11).
The main aim of this work is to identify characteristics from the Canadian Perspectives Survey Series (CPSS) (12) data to evaluate the level of anxiety within the Canadian labour force population. The CPSS is a series of datasets collected by Statistics Canada and is used to evaluate the physical and mental health of Canadians at different stages of the COVID-19 pandemic. This work focuses on the Canadian Perspectives Survey Series 2, 2020: Monitoring the Effects of COVID-19 (CPSS2) and the Canadian Perspectives Survey Series 4, 2020: Information Sources Consulted During the Pandemic (CPSS4), to evaluate the mental health of users within the Canadian labour force population. Both datasets were collected online. CPSS2 was collected during May 2020 to survey the mental and physical health effects of the COVID-19 pandemic on Canadians, and it is associated with the beginning of the first lockdown (12, 13). CPSS4 is a later dataset in the series, which was collected during July 2020 (14). It continues CPSS2 and additionally collects information about the sources consulted during the pandemic. This dataset is associated with the end of the first lockdown (13, 14). The labour force is broken down into two groups, namely, the employed and the unemployed population. The employed are defined as persons holding a job or owning a business, and the unemployed are defined as those without work who are actively seeking work.
The current literature uses the CPSS dataset to evaluate user anxiety through self-perceived mental health. We hypothesize that self-perceived anxiety can be assessed indirectly through the successful identification of survey data characteristics. Instead of the general self-perceived mental health response labels used in Findlay et al. (10) and Zajacova et al. (15), we propose the use of the more quantified General Anxiety Disorder (GAD)-7 labels to assess anxiety among the general public during the COVID-19 pandemic. Using the GAD-7 severity levels, we apply feature selection and machine learning classification techniques to better understand what contributes to anxiety and how to provide early interventions.
This work aims to study how survey data can inform the future of Ecological Momentary Assessment (EMA) in mental health. EMA is the sampling of a subject’s current behavior and experiences in real time, typically in their natural environment (16). This work uses CPSS, in which the survey questions were sampled throughout the pandemic, and analyzes the characteristics of the CPSS dataset to evaluate the anxiety of the Canadian population. Once anxiety has been successfully evaluated using the CPSS data, the results of this paper can be used in future work to offer improved and more efficient data collection. This will allow continuous monitoring of the trends of user anxiety (17).
The rest of this paper is organized as follows: Section 2 presents a literature review of the key related works. Section 3.1 discusses the CPSS data in further detail, and Sections 3.2 to 3.6 present the methodologies used for pre-processing, feature selection, and classification. Finally, the results are presented in Section 4, with a discussion of the conclusions drawn in Section 5.
2. Related works
Studies that have involved mental health research during COVID-19 include the work by Dagklis et al. (18), which focuses on perinatal mental health during the lockdown in Greece. The motivation for this work stems from the hypotheses of previous pandemics (SARS and MERS) that pregnant women were more likely to be psychologically affected (19, 20), which could lead to potential negative consequences for perinatal outcomes (21). To quantitatively monitor perinatal anxiety and depression, the study used the State–Trait Anxiety Inventory and the Edinburgh Postnatal Depression Scale (22, 23) and followed their score ranges and cut-offs. A total of 269 women consented to participate in the study. The results revealed that 37.5% of the participants experienced a state anxiety score of 42 (mild anxiety) and 13.0% of participants experienced a trait anxiety score of 35 (no anxiety) (18). The State–Trait Anxiety Inventory scores were assessed during weeks 1, 3, and 6, and it was discovered that participants had feelings of tension, strain, and confusion. During week 6, they were feeling more frightened. The mass quarantine negatively affected the anxiety levels of the majority of pregnant women in Greece. Given these findings, it is evident that the COVID-19 lockdown has had a negative effect on mental health, regardless of geographical location.
In addition to these effects, the COVID-19 pandemic is having a significant socioeconomic impact on the vast majority of the general public (10). CPSS is a series of surveys undertaken by Statistics Canada to assess the impacts of the COVID-19 pandemic on the Canadian labour force (12). A few studies have been conducted on CPSS using perceived mental health categories (10, 15). These perceived mental health labels are Excellent, Very Good, Good, Fair, and Poor. CPSS contains questions about individual impressions of the pandemic from both the health and the economic standpoint. The questionnaire clearly pertains to mental health, as it asks numerous questions regarding self-perceived mental health and the factors associated with positive and negative self-assessments. In particular, the GAD-7 questionnaire is one such metric validated by the Diagnostic and Statistical Manual of Mental Disorders (DSM) for the rating of anxiety severity (24–26). GAD-7 is a more quantified measure of the severity of anxiety, as illustrated in Figure 3. Perceived mental health has traditionally been used as the standardized label for mental health studies. Polsky and Gilmour utilized self-perceived mental health to compare food insecurity among Canadians during the COVID-19 pandemic (8). This study used logistic regression with sociodemographic covariate adjustment. Based on the study, individuals with moderate food insecurity had three times higher odds of reporting lower levels of mental health and higher levels of anxiety. For individuals with severe food insecurity, the odds ratios for mental health and anxiety increased to 4 and 7.6, respectively.
In a similar work, Bulloch et al. (27) used CPSS2 to determine that the COVID-19 pandemic was associated with a decrease in mental health in those under the age of 65. The evaluation was based on self-reported mental health and GAD questionnaires. Lin (28) used CPSS4 and extracted information about GAD-7, exposure to COVID-19 misinformation, records of precarious employment, and health behaviour changes to explore gender-specific mental health during the pandemic. Anxiety levels differed between male and female participants: female participants had nearly twice the prevalence of moderate-to-severe anxiety scores on the GAD-7 survey (17.2% vs. 9.9% for female and male participants, respectively) (28).
Among other studies that have used CPSS datasets, Nguyen et al. (29) utilized GAD-7 scores from the CPSS2 dataset as a label for identifying indicators of anxiety in Canadians at the beginning of the first lockdown in Canada. CPSS2 comprises 62 questions, and the authors employed minimum redundancy maximum relevance (mRMR) to reduce the feature set to the top 20 features. Hierarchical classification was implemented, and a support vector machine (SVM) binary classification with 10-fold cross validation was employed to classify Minimal and Severe anxiety, achieving an overall accuracy of 94.77%. That work proposes the term pseudo passive (PP) data, which can be considered active data that can be augmented as passive data. PP data offer many potential benefits, such as a reduction in survey fatigue and the possibility of passive data collection (29).
The adoption of PP data collection through digital platforms and wearables allows for different perspectives on affective computing and digital phenotyping. Affective computing is defined as the study of emotional states through the use of technologies such as systems and devices that recognize, interpret, process, and simulate emotion (30). This is a multidisciplinary field that encompasses engineering, computer science, psychology, sociology, cognitive science, and others. Moreover, digital phenotyping is defined by Torous et al. (31) as the moment-by-moment evaluation of the personalized human phenotype through the use of smartphones and digital devices. The data collected fall into two subgroups, passive and active data. Only a limited number of studies have used machine learning or statistical analysis to classify mental health from active, passive, and PP data. Studies that have incorporated these techniques and data streams to identify mental health markers include (32–35).
The StudentLife project is a publicly available dataset collected at Dartmouth College (32) that contains active and passive data from 60 participants over 10 weeks. Studies by Farhan et al. (34) and Nguyen et al. (33) have applied techniques such as multiview biclustering and decision tree (DT) classification to the StudentLife dataset to classify depression severity, achieving overall classification accuracies of 87.1% and 94.7%, respectively.
In a similar study, Melcher et al. (35) collected passive and active data from college students to determine how digital biomarkers of behavior correlate with mental health. Statistical analysis revealed a correlation of sleep variance with depression scores and stress scores.
Currently, EMA data can be collected using smartphones for affect and stress assessments (36). We believe that a subset of this EMA data, which still requires active engagement from users for responses, can be substituted with PP data collection. An example of the aforementioned includes “What type of physical activity are you doing right now?” (37). This EMA can be replaced by PP by using an accelerometer (29).
Studies by Curtis et al. (38) and Rivenbark et al. (39) have examined census data collected from Scotland and the USA, respectively, to evaluate the mental health of the target population. Similarly, this paper aims to analyze correlates of anxiety symptoms among the Canadian labour force in the CPSS dataset. In doing so, the term PP can be further developed, creating a foundation for future studies to potentially use PP in the replacement of EMA and active data collection. This has the potential to advance the field of digital phenotyping, offering users more flexibility to collect data.
3. Methods
3.1. Dataset
Presently, the CPSS dataset comprises six series, collected in April, May, June, July, and September of 2020 and January of 2021. The datasets used in this paper are CPSS2 (12) and CPSS4 (14). The study has a total of 31,896 user sign-ups, which are divided between the six series, and has a participation rate of 23%.
The target populations of these surveys are Canadians that are 15 years or older and part of the labour force, with the exception of full-time members of the Canadian Armed Forces. One participant per household is randomly selected to engage in CPSS. The purpose of the data collection exercise is to obtain information from the participants about any alterations that they experienced in their health condition and in their health behaviours during the COVID-19 pandemic.
3.1.1. First phase of COVID-19
CPSS2 was collected between May 4, 2020, and May 10, 2020. We will refer to this as the first phase of COVID-19 as it encompasses the start of the first wave and beginning of the lockdown. This dataset had 7,242 eligible participants, of whom 4,600 responded at a rate of 63.5%. This series contained 62 variables that were grouped into Behaviour (BH), Demographics (DEM), Derived Variables (DV), Food security (FSC), Labour market impacts (LM), Mental health impacts (MH), and Survey related variables (SRV). The groups BH, DEM, DV, FSC, LM, MH, and SRV contain 29, 9, 4, 1, 8, 8, and 3 variables, respectively.
Figure 1 visualizes the probability distribution of the demographics (household, age group, and marital status) of participants in the first phase of the pandemic with respect to the severity of anxiety. In the CPSS2 dataset, 76% of the participants were born in Canada and the remaining 24% were not; 49.4% of the participants were male and the remaining 50.6% were female.
3.1.2. Second phase of COVID-19
CPSS4 was collected from July 20, 2020, until July 26, 2020, and we will refer to it as the second phase of COVID-19. This dataset had 7,242 eligible participants, with 4,218 responding at a rate of 58.2%. This series contained 102 variables that were grouped into BH, DEM, MH, SRV, Checking Information Sources (FC), and People in Contact (PBH). The groups BH, DEM, MH, SRV, FC, and PBH contained 45, 10, 12, 3, 30, and 2 variables, respectively.
Figure 2 visualizes the probability distribution of the demographics (household, age group, and marital status) of participants in the second phase of COVID-19 with respect to the severity of anxiety. In the CPSS4 dataset, 84% of the participants were born in Canada and the remaining 16% were not; 46.1% of the participants were male and the remaining 53.9% were female.
3.1.3. GAD-7
This paper focuses on the first and second phases of COVID-19, as these are the only series that contain mental health survey questions, which include perceived mental health and GAD-7. GAD-7 determines the severity of anxiety disorder based on a self-diagnostic survey. The survey consists of seven questions, each scored between 0 and 3, for a maximum total score of 21 (24). The survey has four levels of anxiety severity, namely, Minimal, Mild, Moderate, and Severe Anxiety, which are separated by the score cut-off points of 5, 10, and 15 (24).
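The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and it assumes that a total equal to a cut-off point falls into the higher band (0 to 4 Minimal, 5 to 9 Mild, 10 to 14 Moderate, 15 to 21 Severe).

```python
from typing import Sequence

# Lower bounds of each severity band, derived from the cut-offs 5, 10, and 15.
# Assumption: a score equal to a cut-off belongs to the higher band.
GAD7_BANDS = [(15, "Severe"), (10, "Moderate"), (5, "Mild"), (0, "Minimal")]


def gad7_total(item_scores: Sequence[int]) -> int:
    """Sum the seven GAD-7 item scores, each of which must be 0, 1, 2, or 3."""
    if len(item_scores) != 7:
        raise ValueError("GAD-7 has exactly seven items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def gad7_severity(total: int) -> str:
    """Map a GAD-7 total score (0 to 21) to one of the four severity levels."""
    if not 0 <= total <= 21:
        raise ValueError("GAD-7 total must lie between 0 and 21")
    for lower_bound, label in GAD7_BANDS:
        if total >= lower_bound:
            return label
    raise AssertionError("unreachable: every valid total matches a band")
```

For example, a respondent who answers 2 on every item has a total of 14 and would be labelled Moderate under these assumptions.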
3.1.4. Demographics
In CPSS2, the demographic information collected included household size, the age of the respondent, immigration status, the sex of the respondent, the presence of the dependent child as of May 4, 2020, the marital status of the respondent, the type of dwelling, the highest level of education completed, and rural/urban indicators. Similarly, CPSS4 collected the same demographic information, in addition to the employment status of the respondent. Due to the anonymization of the data, the survey response relationship for each user could not be tracked. This made it difficult to create a direct relationship of any findings with subpopulations. Instead, the findings could be generalized to the general Canadian population.
It could be seen that household size categorized by GAD severity had a right-skewed distribution, where a small household size dominated each category of severity. However, age groups categorized by GAD had a Gaussian-type distribution, where the age groups were distributed evenly across each category of severity. Lastly, it could be seen that the trends in the first and second phases of COVID-19 were very similar. Although the difference was very small, there were fewer instances of the Severe category and more instances of the Minimal category in the second phase of COVID-19 than in the first phase.
3.2. Pre-processing
Prior to analysis, the data were pre-processed. Because the GAD-7 severity metric (ANXDVSEV column header) was used as the class label, pre-processing involved the removal of the features that were directly derived from the GAD-7 survey. The removed features were the seven GAD questions (MH15A, MH15B, MH15C, MH15D, MH15E, MH15F, MH15G), the GAD score (ANXDVGAD), and the GAD cut-off (ANXDVGAC). Further pre-processing removed any data samples for which a GAD-7 severity metric response was not provided. The data were then normalized using min–max normalization (40). The normalization equation is represented in Equation 1, where $x$ represents the respective feature column.
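The pre-processing steps can be illustrated with a short, standard-library sketch. The column names come from the text above, but everything else is an assumption made for illustration: rows are represented as dictionaries, a missing severity label is encoded as `None`, all remaining feature values are numeric and present, and min–max normalization takes its usual form, x' = (x − min(x)) / (max(x) − min(x)).

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]

LABEL = "ANXDVSEV"  # GAD-7 severity, used as the class label
# Features derived directly from the GAD-7 survey, removed to avoid label leakage.
GAD_COLUMNS = {"MH15A", "MH15B", "MH15C", "MH15D", "MH15E", "MH15F", "MH15G",
               "ANXDVGAD", "ANXDVGAC"}


def preprocess(rows: List[Row]) -> List[Row]:
    """Drop unlabelled samples and GAD-7 features, then min-max scale the rest.

    Assumes every row has the same keys and that a missing label is None
    (the survey's real missing-value coding may differ).
    """
    # 1. Keep only samples that have a severity label.
    kept = [r for r in rows if r.get(LABEL) is not None]
    # 2. Remove the GAD-7 item, score, and cut-off columns (copies the rows).
    kept = [{k: v for k, v in r.items() if k not in GAD_COLUMNS} for r in kept]
    if not kept:
        return kept
    # 3. Min-max normalise each feature column to [0, 1]; the label is untouched.
    for col in [k for k in kept[0] if k != LABEL]:
        values = [r[col] for r in kept]
        low, span = min(values), max(values) - min(values)
        for r in kept:
            # A constant column carries no information; map it to 0.0.
            r[col] = 0.0 if span == 0 else (r[col] - low) / span
    return kept
```

The input rows are copied rather than modified, so the raw survey records remain available for inspection after scaling.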
3.3. Feature learning
The full list of features can be seen in Statistics Canada (12, 14). To identify the significant features of the data, we applied feature learning techniques. Two feature learning techniques were employed, and we found that mRMR provided the best outcome.
3.3.1. Minimum redundancy maximal relevance
For feature selection, the mRMR algorithm was applied. This approach optimizes the mutual information values, represented as $I(x; y)$, where $x$ and $y$ represent the random variables (41). The aim of this approach is to maximize the difference between the max-dependency and min-redundancy terms, as in Equation 2. However, due to the computational cost of maximum dependency, a simpler approximation, maximum relevance, was introduced. Maximum relevance ($D$) between the subset of features and the target class was obtained as in Equation 3. Redundancy for features was estimated by using the mutual information values between pairs of features. The minimum redundancy calculation is provided in Equation 4.
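For reference, the quantities described in this subsection are commonly written as follows in the mRMR literature. This block is a reconstruction under assumed notation ($S$ for the selected feature subset, $c$ for the target class, $D$ for relevance, $R$ for redundancy, and $\Phi$ for the combined criterion), and it is only presumed to correspond to Equations 2 to 4.

```latex
\begin{align}
  % Mutual information between two random variables x and y
  I(x; y) &= \iint p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}\,\mathrm{d}x\,\mathrm{d}y \\
  % Combined criterion: relevance minus redundancy (cf. Equation 2)
  \max_{S}\ \Phi(D, R), \quad \Phi &= D - R \\
  % Maximum relevance between the subset S and the class c (cf. Equation 3)
  D(S, c) &= \frac{1}{|S|}\sum_{x_i \in S} I(x_i; c) \\
  % Minimum redundancy among the features in S (cf. Equation 4)
  R(S) &= \frac{1}{|S|^{2}}\sum_{x_i,\,x_j \in S} I(x_i; x_j)
\end{align}
```

In practice, the subset is usually built greedily: at each step, the feature that maximizes its relevance to the class minus its average mutual information with the features already selected is added.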
3.3.2. Relieff feature learning
Another feature learning algorithm that was applied was Relieff (42). This algorithm was proposed by Kira and Rendell (42) to enhance learning times and the accuracy of learned concepts. The original algorithm was proposed for binary classification but can be extended to multinomial classification by decomposition into a number of binary problems. Given the feature set $F$, with each instance denoted by the $p$-dimensional vector $X$, the Relieff algorithm was used to detect features that are statistically relevant to the target concept. The feature vector was iterated $m$ times, and the near-hit and near-miss instances were found using the $p$-dimensional Euclidean distance. The near-hit and near-miss values were used to update the weight vector $W$ at index $i$, as represented in Equation 5. The feature weight, also known as relevance, was calculated for every triplet sample. Lastly, Relieff selected the features whose relevance values were above a given threshold $\tau$.
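The weight update described above can be sketched as follows. This is a minimal illustration of the original binary Relief procedure as it is usually presented, not the authors' implementation: the function names, the parameters `m`, `seed`, and `tau`, and the toy data are all hypothetical, and the features are assumed to be scaled to [0, 1] beforehand. For each sampled instance, the weight of feature $i$ is decreased by the squared difference to the near-hit and increased by the squared difference to the near-miss.

```python
import random
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (same nearest neighbour as the plain distance)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def relief_weights(X: List[Sequence[float]], y: Sequence[int],
                   m: int, seed: int = 0) -> List[float]:
    """Estimate per-feature relevance with the binary Relief update.

    Assumes two classes, each with at least two instances, and features
    already normalised to [0, 1].
    """
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    for _ in range(m):
        i = rng.randrange(len(X))
        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        if not hits or not misses:
            raise ValueError("each class needs at least two instances")
        near_hit = X[min(hits, key=lambda j: squared_distance(X[i], X[j]))]
        near_miss = X[min(misses, key=lambda j: squared_distance(X[i], X[j]))]
        for f in range(len(weights)):
            # Penalise features that differ within a class; reward features
            # that differ between classes.
            weights[f] += -(X[i][f] - near_hit[f]) ** 2 + (X[i][f] - near_miss[f]) ** 2
    return [w / m for w in weights]


def select_features(weights: Sequence[float], tau: float) -> List[int]:
    """Return the indices of features whose relevance exceeds the threshold tau."""
    return [f for f, w in enumerate(weights) if w > tau]
```

On a toy dataset in which the first feature separates the classes and the second varies only within each class, the first feature receives a positive weight and the second a negative one, so only the first passes a threshold of zero.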
3.4. Label separation
We separated the labels to evaluate four cases. The case separations were proposed to enhance the understanding of classifying GAD within the Canadian labour force population during the first and second phases of COVID-19. In the first case, preliminary verification of the selected features was achieved using the greatest distance between the labels, i.e., Minimal and Severe Anxiety.
The second case involved a more granular separation between adjacent labels, where hierarchical grouping was used to further test the robustness of these representative features. We followed GAD-7’s hierarchical structure illustrated in Figure 3 for the robustness studies.
The third case was a binary classification with a GAD score cut-off of 10, which has been suggested as a reasonable cut-off for identifying cases of GAD (24). In a study by Spitzer et al. (24), 965 patients completed a telephone interview with a mental health professional to determine the presence of a GAD diagnosis. The cut-off of 10 was found to be significant, as it offers an optimal balance between sensitivity (89%) and specificity (82%) for GAD (24, 43).
Lastly, in the fourth case, the labels were separated into their respective classes of Minimal, Mild, Moderate, and Severe. We conducted a four-class classification in an attempt to separate users into their respective GAD severities.
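Three of the four label separations can be expressed as simple relabelling functions, sketched below. The function names and integer encodings are hypothetical, and the hierarchical second case is omitted because its exact grouping is defined by Figure 3 and is not reproduced here.

```python
from typing import List, Sequence, Tuple

SEVERITIES = ("Minimal", "Mild", "Moderate", "Severe")


def minimal_vs_severe(labels: Sequence[str]) -> List[Tuple[int, int]]:
    """Case 1: keep only the two extremes.

    Returns (sample index, class) pairs, with Minimal = 0 and Severe = 1;
    Mild and Moderate samples are dropped.
    """
    mapping = {"Minimal": 0, "Severe": 1}
    return [(i, mapping[lab]) for i, lab in enumerate(labels) if lab in mapping]


def cutoff_10(labels: Sequence[str]) -> List[int]:
    """Case 3: binary split at a GAD-7 score of 10.

    Minimal and Mild (scores below 10) map to 0; Moderate and Severe map to 1.
    """
    return [0 if lab in ("Minimal", "Mild") else 1 for lab in labels]


def four_class(labels: Sequence[str]) -> List[int]:
    """Case 4: encode all four severities as ordered integers 0 to 3."""
    return [SEVERITIES.index(lab) for lab in labels]
```

Note that the first case changes the number of samples, whereas the third and fourth cases only change the encoding, which is one reason the accuracies of the cases are not directly comparable.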
3.5. Classification
To validate the selected features, SVM and DT classifiers were used with 10-fold cross-validation to check the veracity of the features under the various label separations. SVM and DT classifiers are supervised machine learning algorithms. Our work utilized a one-vs-all approach in conjunction with a linear SVM and a binary classification for DT. Other SVM kernels, such as radial basis function (RBF) and polynomial, were tested, and we found that the linear kernel achieved a similar performance. We chose to report the results of the linear kernel because of its explainability and lower power consumption compared with the alternative kernels. In addition, this manuscript used DT and linear SVM to be consistent with the models used in Nguyen et al. (29).
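The 10-fold cross-validation procedure itself is independent of the classifier and can be sketched with the standard library alone. In the sketch below, `fit_predict` is a hypothetical callback that stands in for training an SVM or DT on the training folds and predicting the held-out fold; a majority-class baseline is used only so that the example runs without external libraries. The shuffling, the seed, and the choice to pool predictions across folds before computing accuracy are assumptions, since the text does not specify them.

```python
import random
from typing import Callable, List, Sequence

FitPredict = Callable[[list, list, list], list]


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(X: Sequence, y: Sequence, fit_predict: FitPredict,
                             k: int = 10, seed: int = 0) -> float:
    """Hold out each fold once, train on the rest, and pool the correct predictions."""
    correct = 0
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predictions = fit_predict([X[i] for i in train_idx],
                                  [y[i] for i in train_idx],
                                  [X[i] for i in test_idx])
        correct += sum(p == y[i] for p, i in zip(predictions, test_idx))
    return correct / len(X)


def majority_class(train_X: list, train_y: list, test_X: list) -> list:
    """Baseline stand-in for a real classifier: always predict the commonest label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)
```

A baseline of this kind is also a useful sanity check, because any reported accuracy should be read against the accuracy obtained by always predicting the most frequent severity class.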
SVM is a supervised learning method for classification, which is developed through the construction of a set of hyper-planes that separate the respective classes (44). DT is a non-parametric supervised learning method for classification, which predicts the class label through learning simple decision rules from the features. DT can also be represented as a piecewise constant approximation (45).
These classifiers were chosen because of their high performance, high explainability, and low complexity, and because they suit the given dataset size. Both SVM and DT can be evaluated using accuracy, precision, recall, and F1-score, and the resulting performance is a fundamental factor in choosing a model.
3.6. Performance metrics
Accuracy, precision, recall, and F1 score were used as performance metrics for classification on the selected features (40), as provided in Equations 6, 7, 8 and 9, respectively.
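These four metrics have standard definitions in terms of the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts, sketched below under the assumption that Equations 6 to 9 follow the usual forms. The function names are illustrative.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of true positives that were recovered."""
    return tp / (tp + fn)


def f1_score(precision_value: float, recall_value: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision_value * recall_value / (precision_value + recall_value)
```

As a consistency check, the F1 scores reported in Section 4 agree with these definitions: a precision of 95.72% and a recall of 98.62% give an F1 score of about 97.15%, and a precision of 98.39% and a recall of 99.03% give about 98.71%.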
4. Results
4.1. First phase of COVID-19
After pre-processing, 4,512 samples and 49 features were used for analysis. The samples were separated into the GAD severity groups Minimal, Mild, Moderate, and Severe. Following pre-processing, the proposed feature selection techniques were applied, and mRMR was found to achieve the best performance. Our work found that 20 was the optimal number of features required without having to sacrifice the classification accuracy of anxiety severity. The reduced features are described in Table 1.
During label separation, the first case separated the classes into Minimal and Severe, and were classified using a 10-fold SVM and DT, achieving an accuracy of and , respectively. The 10-fold SVM approach achieved a recall, precision, and F1 score of 98.62%, 95.72%, and 97.15%, respectively. To justify the robustness of our approach, this paper used a hierarchical classification approach where the labels were separated between adjacent labels (Figure 3) and tested using an SVM and DT classifier, as shown in Table 2. In the third case, a binary classification with a GAD score cut-off of 10 was conducted. SVM and DT achieved a binary classification accuracy of and , respectively. Lastly, a four-class classification of minimal, mild, moderate, and severe GAD severities was conducted. The 10-fold SVM and DT achieved an accuracy of and , respectively.
Alternative kernels, including RBF and polynomial, were also tested for the four cases, and the respective results can be seen in Table 3. The alternative kernels achieved results similar to those of the linear SVM kernel. The greater simplicity of the linear SVM further supported our choice of kernel over its alternatives.
4.2. Second phase of COVID-19
After pre-processing, 4,087 samples and 89 features were used for the analysis. The samples were separated into the GAD severity groups Minimal, Mild, Moderate, and Severe. Following pre-processing, the proposed feature selection techniques were applied. Similar to the first phase, mRMR achieved the best performance. Our work found that 20 was the optimal number of features required without having to sacrifice the classification accuracy of anxiety severity. The reduced feature set is described in Table 4.
During label separation, the first case separated the classes into Minimal and Severe, which were classified using a 10-fold SVM and DT, achieving an accuracy of and , respectively. The 10-fold SVM approach achieved a recall, precision, and F1 score of 99.03%, 98.39%, and 98.71%, respectively. To justify the robustness of our approach, this paper used a hierarchical classification approach where the labels were separated between adjacent labels (Figure 3) and tested using an SVM and DT classifier, as shown in Table 5. In the third case, a binary classification with a GAD score cut-off of 10 was conducted. SVM and DT achieved a binary classification accuracy of and , respectively. Lastly, a four-class classification of minimal, mild, moderate, and severe GAD severities was conducted. The 10-fold SVM and DT achieved an accuracy of and , respectively.
Similar to the first-phase COVID-19 analysis, alternative kernels were tested for the four cases and the respective results can be seen in Table 6.
4.3. Probability distribution analysis
The probability distribution for each of the selected features was analyzed, providing support for the selection of the reduced feature set. Figure 4 represents BH_35C, BH_40B, BH_35B, and BH_60C (Table 1). These probability distributions were assessed for each severity level. Each probability was calculated as the number of sample points per response divided by the total number of samples per severity level. Figure 4A shows that the probability of engaging in physical exercise declined as the severity of anxiety increased, which matches the finding in Anderson and Shivakumar (46). Figure 4B shows a positive association between the severity of anxiety and the usage of tobacco: increased levels of anxiety present an increased probability of tobacco usage. This result supports the findings in King et al. (47). Figure 4C shows an increase in meditation for mental and physical health as anxiety severity increases. Previous studies report mixed results on the effectiveness of meditation in helping reduce anxiety in users (48–51). A potential reason for the increased number of users engaging in meditation might be that they were attempting to reduce their anxiety level or that their previous meditation attempts were unsuccessful due to its various challenges (52). Figure 4D represents the use of delivery services (Daily, 4 or 5 times, 1 to 3 times, and Never) in the previous week. The figure shows an increase in the use of delivery services with an increase in GAD severity. During the COVID-19 pandemic, people may have increased their use of delivery services to minimize the risk of being infected (53).
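The per-severity probabilities described above (the response count divided by the number of samples at that severity level) can be computed as in the following sketch. The function name and the example response values are illustrative and are not taken from the survey codebook.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def response_probabilities(severities: Sequence[Hashable],
                           responses: Sequence[Hashable]) -> Dict[Hashable, Dict[Hashable, float]]:
    """For each severity level, return the share of samples giving each response."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for severity, response in zip(severities, responses):
        counts[severity][response] += 1
    return {
        severity: {resp: n / sum(counter.values()) for resp, n in counter.items()}
        for severity, counter in counts.items()
    }
```

Because each distribution is normalized within its own severity level, the levels can be compared directly even when they contain very different numbers of respondents.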
Figure 4. Probability distribution of (A) exercises outdoors, (B) tobacco usage, (C) meditation, and (D) use of delivery services, in respect to severity.
5. Conclusion and discussion
The purpose of this work is to analyze the correlates of anxiety symptoms among the Canadian labour force during the first and second phases of COVID-19. This work proposes the use of GAD-7 as the anxiety severity label, whereas other similar studies used perceived mental health (10, 11, 15, 54, 55). GAD-7 was chosen because it is a psychometrically validated scale for anxiety (24). The novelty of this work is that we conduct a longitudinal analysis of the first and second phases of COVID-19, whereas Bulloch et al. (27) evaluated GAD severities of only the first phase of COVID-19. To the authors’ knowledge, this is the first paper to conduct a longitudinal analysis of the first and second phases of COVID-19 CPSS datasets using the GAD-7 survey.
5.1. Feature analysis
Pre-processing and feature selection techniques were utilized to reduce the features used from a maximum of 102 to 20 features, in order to improve the efficiency and accuracy of the classifiers. The mRMR algorithm was used to reduce the feature set. Following the analysis of the reduced feature set, it was determined that many of the available features can be augmented as PP data. PP data are qualitative data that can be collected as passive data. For example, within the reduced feature sets of the first- and second-phase COVID-19 datasets, BH_35B, BH_35C, BH_40A, BH_40B, BH_40C, BH_40D, BH_40E, BH_40F, BH_110/PBH_110, and RURURB can be regarded as PP data (Tables 1 and 4). RURURB could be determined from a participant’s location using the GPS signal, BH_35C could be captured using an accelerometer for activity recognition, and BH_40E could be inferred from the audio environment to determine whether the participant is watching TV. PP data can be collected through various means, such as digital health devices and wireless and mobile systems. These platforms have the ability to capture PP data in addition to continuous passive data. The passive data can determine whether the user exercises outdoors (BH_35C) as well as offer additional insights such as the frequency, duration, and location of outdoor exercise. Future work can adopt PP data to reduce survey fatigue and capture objective measurements.
5.2. Classification
During classification, we tested four cases, namely, Minimal-Severe, hierarchical, binary classification (GAD-7 score cut-off of 10), and four-class classification. In the first case, the classes Minimal and Severe were separated. The model used the reduced 20-feature subset and an SVM with 10-fold cross-validation for the first phase of COVID-19 and the second phase of COVID-19 to achieve an overall accuracy of and , respectively. We expected the highest accuracy when classifying Minimal and Severe, as these labels are the opposite extremes of the GAD-7 severity scale. This is therefore a reasonable result, and it is further supported by our hierarchical classification results.
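To make the evaluation protocol concrete, the sketch below shows how a 10-fold cross-validated overall accuracy can be computed in plain Python. It rests on several assumptions: the folds are stratified by class, a simple nearest-centroid rule stands in for the SVM (no SVM is implemented here), and the helper names and two-dimensional toy points are hypothetical and are not CPSS data.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign every sample index to one of k folds, class by class,
    so that each fold has roughly the same class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Overall accuracy: correct held-out predictions pooled over all folds."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(model, X[i]) == y[i] for i in test_idx)
    return correct / len(y)


def fit_centroids(X_train, y_train):
    """Stand-in classifier (NOT an SVM): store the mean vector of each class."""
    sums, counts = {}, defaultdict(int)
    for row, label in zip(X_train, y_train):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(row, model[label])),
    )


# Hypothetical, well-separated toy points for the two extreme classes.
X_toy = [[0.01 * i, 0.0] for i in range(20)] + [[5.0 + 0.01 * i, 5.0] for i in range(20)]
y_toy = ["Minimal"] * 20 + ["Severe"] * 20

accuracy = cross_val_accuracy(X_toy, y_toy, fit_centroids, predict_centroid, k=10)
print(f"10-fold accuracy on toy data: {accuracy:.2%}")
```

Pooling the correct predictions over all held-out folds gives a single overall accuracy, which is one common way to report a k-fold result. Averaging per-fold accuracies is an alternative that gives the same value when all folds have equal size.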
Our second case employed hierarchical classification according to Figure 3, as it allows for a granular perspective and comparison between GAD severities. The third case involved a binary test with a GAD-7 score cut-off of 10. This test classifies users into two classes (Minimal and Mild vs. Moderate and Severe). The binary test achieved accuracies of 87.15% and 91.41% for the first and second phases of COVID-19, respectively. Given the high accuracy, the model can serve as a proxy for identifying user anxiety. This suggests that augmenting the features with PP data may likewise provide a proxy for user anxiety. This is significant because PP and passive data are more obtainable than active data, as they do not require user input.
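The label construction for the four-class and binary cases can be sketched as follows. The severity bands are the standard GAD-7 score ranges (0 to 4 Minimal, 5 to 9 Mild, 10 to 14 Moderate, 15 to 21 Severe) (24), and the binary label applies the cut-off of 10 described above. The function names `gad7_severity` and `gad7_binary` are hypothetical and are used only for illustration.

```python
def gad7_severity(score):
    """Map a GAD-7 total score (0-21) to its severity band."""
    if not 0 <= score <= 21:
        raise ValueError("GAD-7 total score must be between 0 and 21")
    if score <= 4:
        return "Minimal"
    if score <= 9:
        return "Mild"
    if score <= 14:
        return "Moderate"
    return "Severe"


def gad7_binary(score, cutoff=10):
    """Binary label for the cut-off test: 1 if score >= cutoff, else 0."""
    gad7_severity(score)  # reuse the range check
    return int(score >= cutoff)


# With a cut-off of 10, the positive class is exactly Moderate + Severe.
for total in (0, 4, 5, 9, 10, 14, 15, 21):
    print(total, gad7_severity(total), gad7_binary(total))
```

With the cut-off at 10, the binary positive class coincides with the union of the Moderate and Severe bands, which matches the two-class grouping used in the third case.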
Lastly, we classified four classes using 10-fold SVM and DT to achieve an accuracy of and for the first phase of COVID-19, respectively, and and for the second phase of COVID-19, respectively. When comparing the label separations, the four-class classifier achieved the lowest accuracies. This was expected, as we were classifying more classes and adjacent classes share overlapping features. GAD severity is not a black-and-white separation, as there are common symptoms that users will experience when feeling anxious (56). This is reflected in feelings, behaviours, thoughts, and physical sensations. We can consider anxiety as a spectrum of severities, and therefore the features of one class may be common to those of the adjacent classes.
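One way to examine the overlap between adjacent classes is to check how many four-class errors fall into a neighbouring severity band. The sketch below builds a confusion matrix over the ordered severity labels and reports that share. The helper names and the toy predictions are hypothetical; they are not results from this study.

```python
SEVERITY_ORDER = ["Minimal", "Mild", "Moderate", "Severe"]


def confusion_matrix(y_true, y_pred, classes=SEVERITY_ORDER):
    """Rows are true classes and columns are predicted classes, in severity order."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def adjacent_error_share(matrix):
    """Fraction of misclassifications that land in a neighbouring severity class."""
    errors = adjacent = 0
    for i, row in enumerate(matrix):
        for j, count in enumerate(row):
            if i != j:
                errors += count
                if abs(i - j) == 1:
                    adjacent += count
    return adjacent / errors if errors else 0.0


# Made-up labels for illustration only.
toy_true = ["Minimal", "Minimal", "Mild", "Mild",
            "Moderate", "Moderate", "Severe", "Severe"]
toy_pred = ["Minimal", "Mild", "Mild", "Moderate",
            "Moderate", "Severe", "Severe", "Minimal"]

toy_matrix = confusion_matrix(toy_true, toy_pred)
print(toy_matrix)                        # [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 0, 0, 1]]
print(adjacent_error_share(toy_matrix))  # 0.75
```

A high adjacent-error share would be consistent with the view of anxiety as a spectrum, where most confusions occur between neighbouring severities rather than between the extremes.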
5.3. Longitudinal analysis
A comparison of the first and second phases of COVID-19 reveals that we achieved a higher accuracy for the Minimal and Severe separation, hierarchical classification, and GAD significance for the second phase of COVID-19. Perhaps the reason for this was that the second phase of COVID-19 contained more features, allowing for more perspectives to classify anxiety. In contrast, for the first phase of COVID-19, we achieved a higher four-class classification accuracy. A likely reason is that the data were collected during the early stages of the pandemic, when users were more mentally healthy. We expect the user population to have lower rates of mental illnesses at the beginning of the pandemic, whereas mental health of users in the general population (57, 58), older individuals () (59), and adolescents (mean ) (60) declined with the onset and progression of the pandemic. Overall, we were able to classify and compare CPSS2 and CPSS4 with relatively high accuracy. Future studies can collect the reduced feature set as EMA for continuous and long-term sampling. The use of EMA will allow increased sampling, which can offer more interoperability and help predict trends in a user’s mental health.
5.4. Ethical concerns
The data were collected in accordance with the ethical and privacy principles laid down in the Statistics Act, Revised Statutes of Canada, 1985, Chapter S-19 (61, 62). The datasets used in this study are publicly available and were anonymized prior to publication. Anonymization is the process of removing personally identifiable information from data for the purposes of participant confidentiality and privacy. The data were also volunteered with the informed consent and approval of participants.
5.5. Limitations
Because the data are anonymized and confidential, the findings of this paper cannot be applied to a specialized demographic of users. As this paper focuses on the general analysis of anxiety of Canadians, the results between subpopulations may vary.
The model was developed using CPSS data that were collected online through surveys during the pandemic. A limitation of this work is that only two datasets involved the collection of self-perceived anxiety, so the longitudinal analysis was conducted on two time points. Additional datasets collected at regular intervals, or additional sampling time points, would further enhance the findings and offer a better understanding.
In addition, during the four-label separation, cases can be considered in retrospective analysis. Therefore, the proposed proxy needs to be validated in other datasets and implemented for future studies to determine the capabilities of identifying prospective anxiety in users.
5.6. Application
The findings of this work indicate that anxiety severity increased from the first phase to the second phase of COVID-19. This implies a general decline in mental health during the pandemic, which has been confirmed by prior work (18, 27). As previously mentioned, data collection is not continuous, which makes mental health monitoring difficult. However, these models can be applied to similar paradigms using wearables to collect passive data unobtrusively. The use of wearables will allow continuous collection of information similar to that collected during the CPSS, which can be used to monitor and determine trends in participant mental health over time (63, 64). Future studies can incorporate PP data for flexible collection of active data. This would lower survey fatigue and enable the capture of objective measurements. Moreover, this will allow interventions to be developed and oriented around the features studied in this paper. For example, users who increase their tobacco use due to anxiety episodes can be detected by systems such as mPuff and mobile devices (65), which would allow an intervention to be delivered. Furthermore, studies can be specialized for subpopulations, allowing better insight into and understanding of specific demographics.
With the ability to have an increased sampling of data, we can offer personalized interventions such as ecological momentary interventions, which can be provided to patients in their natural environments (17).
5.7. Future work
The commonality between the datasets was limited due to the objectives of Statistics Canada’s data collection. The common features are related to demographics (RURURB, SEX, AGEGRP) and mental health questions (PBH_110/BH_110, MH_20D, BH_20M, BH_40D, BH_40F, MHDVMHI). Given these common features, future work can evaluate the effect of demographics on GAD severity for the first and second phases.
The original CPSS surveys contained up to 102 survey questions, which can lead to survey fatigue. Survey fatigue is defined as a participant becoming apathetic or bored due to an excessive number of questions, resulting in the abandonment of the survey. This work reduced the feature set to 20, which also reduces the potential for survey fatigue. The ability to augment the PP data with a passive sensor, in combination with efficient classifiers, could allow more detailed digital phenotyping. The classification of Minimal and Severe provides proxy correlates for population anxiety, as well as the ability to prepare and provide interventions accordingly. Moreover, future studies can replicate this work and use passive and PP features to analyze whether public health policies are leading to decreased stress and anxiety in the population. With the presence of COVID-19, mental health has been a common discussion topic. A study with continuous long-term data collection can further explore and understand how people cope during this pandemic.
Data availability statement
The datasets analyzed for this study can be found in the Canadian Perspectives Survey Series 2: Monitoring the effects of COVID-19, May 2020 and Canadian Perspectives Survey Series 4: Information sources consulted during the pandemic, July 2020.
Author contributions
BN is the lead author who explored the literature, summarized the findings, developed the machine learning models, summarized the results, and wrote the manuscript. MI helped revise the manuscript. VB is the co-supervisor and offered expertise in psychiatry and assisted with manuscript writing. SK is the principal investigator of this research project, who provided biweekly feedback on the project, and assisted with manuscript writing. All authors contributed to the article and approved the submitted version.
Funding
This research is funded through Natural Sciences and Engineering Research Council of Canada (NSERC) RGPIN-2020-04628 and Ontario Graduate Scholarship (OGS).
Acknowledgments
The authors would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Ontario Graduate Scholarship (OGS) for funding the project.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Jacob KS, Patel V. Classification of mental disorders: a global mental health perspective. Lancet. (2014) 383:1433–5. doi: 10.1016/S0140-6736(13)62382-X
2. Christensen MK, Lim CC, Saha S, Plana-Ripoll O, Cannon D, Momen NC, et al. The cost of mental disorders: a systematic review. Epidemiol Psychiatr Sci. (2020) 29:e161. doi: 10.1017/S204579602000075X
3. Patel V. Recognition of common mental disorders in primary care in African countries: should mental be dropped? Lancet. (1996) 347:742–4. doi: 10.1016/S0140-6736(96)90083-5
4. Kauye F, Jenkins R, Rahman A. Training primary health care workers in mental health, its impact on diagnoses of common mental disorders in primary care of a developing country, Malawi: a cluster-randomized controlled trial. Psychol Med. (2014) 44:657–66. doi: 10.1017/S0033291713001141
5. Huckvale K, Venkatesh S, Christensen H. Toward clinical digital phenotyping: a timely opportunity to consider purpose, quality, safety. npj Digit Med. (2019) 2(1):1–11. doi: 10.1038/s41746-019-0166-1
6. Waring OM, Majumder MS. Introduction to digital phenotyping for global health. In: Leveraging data science for global health. Cham, Switzerland: Springer International Publishing (2020). p. 251–61. doi: 10.1007/978-3-030-47994-7_15
7. Martinengo L, Van Galen L, Lum E, Kowalski M, Subramaniam M, Car J. Suicide prevention and depression apps’ suicide RA and management: a systematic assessment of adherence to clinical guidelines. BMC Med. (2019) 17:231. doi: 10.1186/s12916-019-1461-z
8. Polsky JY, Gilmour H. Food insecurity and MH during the COVID-19 pandemic. Health Rep. (2020) 31:3–11. doi: 10.25318/82-003-x202001200001-eng
9. Moskowitz DS, Young SN. Ecological momentary assessment: what it is and why it is a method of the future in clinical psychopharmacology. J Psychiatry Neurosci. (2006) 31:13–20. PMID: 16496031
10. Findlay LC, Arim R, Kohen D. Understanding the perceived mental health of canadians during the COVID-19 pandemic. Health Rep. (2020) 31:22–7. doi: 10.25318/82-003-x202000400003-eng
11. Zajacova A, Jehn A, Stackhouse M, Denice P, Ramos H. Changes in health behaviours during early COVID-19, socio-demographic disparities: a cross-sectional analysis. Can J Public Health. (2020) 111:953–62. doi: 10.17269/s41997-020-00434-y
12. Statistics Canada. Canadian perspectives survey series 2: monitoring the effects of COVID-19 (2020).
14. Statistics Canada. Canadian perspective survey series 4, 2020: information sources consulted during the pandemic study documentation metadata production (2020).
15. Zajacova A, Jehn A, Stackhouse M, Choi KH, Denice P, Haan M, et al. Mental health, economic concerns from March to May during the COVID-19 pandemic in Canada: Insights from an analysis of repeated cross-sectional surveys. SSM - Popul Health. (2020) 12:100704. doi: 10.1016/j.ssmph.2020.100704
16. Shiffman S, Stone AA, Hufford MR. Ecological momentary assessment. Annu Rev Clin Psychol. (2008) 4:1–32. doi: 10.1146/ANNUREV.CLINPSY.3.022806.091415
17. Parmar A, Sharma P. Ecological momentary interventions delivered by smartphone apps: applications in substance use treatment in indian scenario. Indian J Psychol Med. (2017) 39:102. doi: 10.4103/0253-7176.198942
18. Dagklis T, Tsakiridis I, Mamopoulos A, Athanasiadis A, Pearson R, Papazisis G. Impact of the COVID 19 lockdown on antenatal mental health in Greece. Psychiatry Clin Neurosci. (2020) 74:616–7. doi: 10.1111/pcn.13135
19. Brooks SK, Webster RK, Smith LE, Woodland L, Wessely S, Greenberg N, et al. The psychological impact of quarantine and how to reduce it: rapid review of the evidence. Lancet. (2020) 395:912–20. doi: 10.1016/S0140-6736(20)30460-8
20. Schwartz DA, Graham AL. Potential maternal and infant outcomes from coronavirus 2019-NCOV (SARS-CoV-2) infecting pregnant women: lessons from SARS, MERS, and other human coronavirus infections. Viruses. (2020) 12(2):194–210. doi: 10.3390/v12020194
21. Grigoriadis S, Graves L, Peer M, Mamisashvili L, Tomlinson G, Vigod SN, et al. Maternal anxiety during pregnancy and the association with adverse perinatal outcomes: Systematic review and meta-analysis. J Clin Psychiatry. (2018) 79(5):813–835. doi: 10.4088/JCP.17r12011
22. Skapinakis P. Spielberger state-trait anxiety inventory. In: Encyclopedia of quality of life and well-being research. Netherlands: Springer (2014). p. 6261–6264. doi: 10.1007/978-94-007-0753-5_2825
23. Murray D, Cox JL. Screening for depression during pregnancy with the Edinburgh Depression Scale (EPDS). J Reprod Infant Psychol. (1990) 8:99–107. doi: 10.1080/02646839008403615
24. Spitzer RL, Kroenke K, Williams JBW, Löwe B. A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch Intern Med. (2006) 166:1092–7. doi: 10.1001/archinte.166.10.1092
25. Locke AB, Kirst N, Shultz CG. Diagnosis and management of GAD and panic disorder in adults. Technical Report 9 (2015).
26. Curtiss J, Klemanski DH. Identifying individuals with GAD: A receiver operator characteristic analysis of theoretically relevant measures. Behav Change. (2015) 32:255–72. doi: 10.1017/bec.2015.15
27. Bulloch A, Zulyniak S, Williams J, Bajgai J, Bhattarai A, Dores A, et al. Poor mental health during the COVID-19 pandemic: effect modification by age (2021). doi: 10.1177/0706743721994408
28. Lin SL. Generalized anxiety disorder during COVID-19 in Canada: gender-specific association of COVID-19 misinformation exposure, precarious employment, and health behavior change. J Affect Disord. (2022) 302:280–92. doi: 10.1016/j.jad.2022.01.100
29. Nguyen B, Nigro M, Rueda A, Kolappan S, Bhat V, Krishnan S, et al. Feature analysis and hierarchical classification of anxiety severity during early COVID-19. In: Proceedings of 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Guadalajara, Mexico: Institute of Electrical and Electronics Engineers Inc. (2021). p. 1678–1681.
30. Daily SB, James MT, Cherry D, Porter JJ, Darnell SS, Isaac J, et al. Affective computing: historical foundations, current applications, and future trends. In: Emotions and affect in human factors and human–computer interaction. San Diego, United States: Academic Press (2017). p. 213–31. doi: 10.1016/B978-0-12-801851-4.00009-4
31. Torous J, Kiang MV, Lorme J, Onnela JP. New tools for new research in psychiatry: a scalable, customizable platform to empower data driven smartphone research. JMIR Ment Health. (2016) 3:e16. doi: 10.2196/mental.5165
32. Wang R, Chen F, Chen Z, Li T, Harari G, Tignor S, et al. StudentLife: assessing mental health, academic performance, behavioral trends of college students using smartphones. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing. New York, NY, USA: ACM (2014). UbiComp ’14. p. 3–14. doi: 10.1145/2632048.2632054
33. Nguyen B, Kolappan S, Bhat V, Krishnan S. Clustering and feature analysis of smartphone data for depression monitoring. In: Proceedings of 43rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Guadalajara, Mexico: Institute of Electrical and Electronics Engineers Inc. (2021). p. 113–16.
34. Farhan AA, Lu J, Bi J, Russell A, Wang B, Bamis A. Multi-view bi-clustering to identify smartphone sensing features indicative of depression. In: Proceedings—2016 IEEE 1st International Conference on Connected Health: Applications, Systems, Engineering Technologies, CHASE 2016. Institute of Electrical, Electronics Engineers (2016). p. 264–273. doi: 10.1109/CHASE.2016.27
35. Melcher J, Lavoie J, Hays R, D’Mello R, Rauseo-Ricupero N, Camacho E, et al. Digital phenotyping of student mental health during COVID-19: an observational study of 100 college students. J Am Coll Health. (2021) 69(1):1–13. doi: 10.1080/07448481.2021.1905650
36. Yang YS, Ryu GW, Han I, Oh S, Choi M. Ecological momentary assessment using smartphone-based mobile application for affect, stress assessment. Healthc Inform Res. (2018) 24:381. doi: 10.4258/HIR.2018.24.4.381
37. Dunton GF, Liao Y, Kawabata K, Intille S. Momentary assessment of adults’ physical activity and sedentary behavior: Feasibility and validity. Front Psychol. (2012) 3:260–9. doi: 10.3389/FPSYG.2012.00260
38. Curtis S, Pearce J, Cherrie M, Dibben C, Cunningham N, Bambra C. Changing labour market conditions during the ‘great recession’ and mental health in Scotland 2007–2011: an example using the Scottish Longitudinal Study and data for local areas in Scotland. Soc Sci Med. (2019) 227:1–9. doi: 10.1016/J.SOCSCIMED.2018.08.003
39. Rivenbark JG, Copeland WE, Davisson EK, Gassman-Pines A, Hoyle RH, Piontak JR, et al. Perceived social status and mental health among young adolescents: Evidence from census data to cellphones. Dev Psychol. (2019) 55:574–85. doi: 10.1037/DEV0000551
40. Krishnan S. Biomedical signal analysis for connected healthcare. Elsevier London, UK: Academic Press (2021). doi: 10.1016/B978-0-12-813086-5.00005-0
41. Peng H, Long F, Ding C. Feature selection based on mutual information. IEEE Trans Pattern Anal Mach Intell. (2005) 27:1226–38. doi: 10.1109/cita.2015.7349827
42. Kira K, Rendell LA. A practical approach to feature selection. Machine Learning Proceedings 1992. Elsevier (1992). p. 249–256. doi: 10.1016/b978-1-55860-247-2.50037-1
43. Byrd-Bredbenner C, Eck K, Quick V. Psychometric properties of the generalized anxiety disorder-7 and generalized anxiety disorder-mini in United States university students. Front Psychol. (2020) 11:2512–21. doi: 10.3389/FPSYG.2020.550533
44. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees (2017). p. 1–358. doi: 10.1201/9781315139470
45. Bishop CM. Pattern recognition and machine learning. Information Science and Statistics. New York, United States: Springer (2006).
46. Anderson E, Shivakumar G. Effects of exercise and physical activity on anxiety. Front Psychiatry. (2013) 4:27–31. doi: 10.3389/fpsyt.2013.00027
47. King JL, Reboussin BA, Spangler J, Cornacchione Ross J, Sutfin EL. Tobacco product use and mental health status among young adults. Addict Behav. (2018) 77:67–72. doi: 10.1016/j.addbeh.2017.09.012
48. Delmonte MM. Meditation and anxiety reduction: A literature review. Clin Psychol Rev. (1985) 5:91–102. doi: 10.1016/0272-7358(85)90016-9
49. Toneatto T, Nguyen L. Does mindfulness meditation improve anxiety and mood symptoms? A review of the controlled research. Can J Psychiatry. (2007) 52(4):260–6. doi: 10.1177/070674370705200409
50. Breedvelt JJF, Amanvermez Y, Harrer M, Karyotaki E, Gilbody S, Bockting CLH, et al. The effects of meditation, yoga, mindfulness on depression, anxiety, stress in tertiary education students: a meta-analysis. Front Psychiatry. (2019) 20:193. doi: 10.3389/FPSYT.2019.00193
51. Mosavi SV, Faraji R, Zebardast A, Esmaiel Zade J. Effectiveness of meditation as a meta-cognitive therapy in reducing anxiety in pregnant women in the last trimester of pregnancy. J Guilan Univ Med Sci. (2018) 27:32–43. ISSN: 0393-6384 (printed) / 2283-9720 (online)
52. Lomas T, Cartwright T, Edginton T, Ridge D. A qualitative analysis of experiential challenges associated with meditation practice. Mindfulness. (2014) 6(4):848–60. doi: 10.1007/S12671-014-0329-8
53. Janssen M, Chang BPI, Hristov H, Pravst I, Profeta A, Millard J. Changes in food consumption during the COVID-19 pandemic: analysis of consumer survey data from the first lockdown period in Denmark, Germany, and Slovenia. Front Nutr. (2021) 8:60. doi: 10.3389/FNUT.2021.635859
54. Béland LP, Brodeur A, Mikola D, Wright T. The short-term economic consequences of COVID-19: occupation tasks and mental health in Canada. IZA Discussion Papers 13254, Bonn (2020).
55. Li Y. Sources of COVID-19 information seeking and their associations with self-perceived mental health among Canadians (2021). doi: 10.33137/ijidi.v5i3.36193
56. Lang PJ, McTeague LM. The anxiety disorder spectrum: Fear imagery, physiological reactivity, and differential diagnosis. Anxiety Stress Coping. (2009) 22:5. doi: 10.1080/10615800802478247
57. Rossi R, Socci V, Talevi D, Mensi S, Niolu C, Pacitti F, et al. COVID-19 pandemic and lockdown measures impact on mental health among the general population in Italy. Front Psychiatry. (2020) 11:790. doi: 10.3389/FPSYT.2020.00790
58. Ueda M, Stickley A, Sueki H, Matsubayashi T. Mental health status of the general population in Japan during the COVID-19 pandemic. Psychiatry Clin Neurosci. (2020) 74:505–6. doi: 10.1111/PCN.13105
59. Bailey L, Ward M, DiCosimo A, Baunta S, Cunningham C, Romero-Ortuno R, et al. Physical and mental health of older people while cocooning during the COVID-19 pandemic. QJM. (2021) 114(9):648–53. doi: 10.1093/QJMED/HCAB015
60. Magson NR, Freeman JYA, Rapee RM, Richardson CE, Oar EL, Fardouly J. Risk and Protective Factors for Prospective Changes in Adolescent Mental Health during the COVID-19 Pandemic. J Youth Adolesc. (2020) 50(1):44–57. doi: 10.1007/S10964-020-01332-9
63. Jacobson NC, Lekkas D, Huang R, Thomas N. Deep learning paired with wearable passive sensing data predicts deterioration in anxiety disorder symptoms across 17–18 years. J Affect Disord. (2021) 282:104–11. doi: 10.1016/j.jad.2020.12.086
64. Pedrelli P, Fedor S, Ghandeharioun A, Howe E, Ionescu DF, Bhathena D, et al. Monitoring changes in depression severity using wearable and mobile sensors. Front Psychiatry. (2020) 11:1413–24. doi: 10.3389/fpsyt.2020.584711
65. Ahsan Ali A, Monowar Hossain S, Hovsepian K, Mahbubur Rahman M, Plarre K, Kumar S. mPuff: automated detection of cigarette smoking puffs from respiration measurements. In: Proceedings of the 11th International Conference on Information Processing in Sensor Networks—IPSN ’12. Vol. 12. Beijing, China: Association for Computing Machinery (2012). doi: 10.1145/2185677
Keywords: digital phenotyping, machine learning, COVID-19, anxiety, mental health
Citation: Nguyen B, Ivanov M, Bhat V and Krishnan S (2022) Digital phenotyping for classification of anxiety severity during COVID-19. Front. Digit. Health 4:877762. doi: 10.3389/fdgth.2022.877762
Received: 17 February 2022; Accepted: 14 September 2022;
Published: 13 October 2022.
Edited by:
Ahsan H. Khandoker, Khalifa University, United Arab Emirates
Reviewed by:
Kim Mathiasen, University of Southern Denmark, Denmark
L.J. Muhammad, Federal University Kashere, Nigeria
© 2022 Nguyen, Ivanov, Bhat and Krishnan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Binh Nguyen binh.nguyen@ryerson.ca
Specialty Section: This article was submitted to Digital Mental Health, a section of the journal Frontiers in Digital Health