
CONCEPTUAL ANALYSIS article

Front. Psychiatry, 19 February 2024
Sec. Psychopathology

Leveraging big data for causal understanding in mental health: a research framework

  • 1Sapien Labs, Arlington, VA, United States
  • 2Department of Psychiatry, University of California, San Diego, La Jolla, CA, United States
  • 3Rady Children’s Hospital – San Diego, San Diego, CA, United States

Over the past 30 years there have been numerous large-scale and longitudinal psychiatric research efforts to improve our understanding and treatment of mental health conditions. However, despite the huge effort by the research community and considerable funding, we still lack a causal understanding of most mental health disorders. Consequently, the majority of psychiatric diagnosis and treatment still operates at the level of symptomatic experience, rather than measuring or addressing root causes. This results in a trial-and-error approach that is a poor fit to underlying causality and yields poor clinical outcomes. Here we discuss how a research framework that originates from exploration of causal factors, rather than symptom groupings, applied to large-scale multi-dimensional data can help address some of the current challenges facing mental health research and, in turn, clinical outcomes. Firstly, we describe some of the challenges and complexities underpinning the search for causal drivers of mental health conditions, focusing on current approaches to the assessment and diagnosis of psychiatric disorders, the many-to-many mappings between symptoms and causes, the search for biomarkers of heterogeneous symptom groups, and the multiple, dynamically interacting variables that influence our psychology. Secondly, we put forward a causal-oriented framework in the context of two large-scale datasets arising from the Adolescent Brain Cognitive Development (ABCD) Study, the largest long-term study of brain development and child health in the United States, and the Global Mind Project, which is the largest database in the world of mental health profiles along with life context information from 1.4 million people across the globe. Finally, we describe how analytical and machine learning approaches such as clustering and causal inference can be used on datasets such as these to help elucidate a more causal understanding of mental health conditions to enable diagnostic approaches and preventative solutions that tackle mental health challenges at their root cause.

1 Introduction

In the last three decades there have been many large-scale and longitudinal research initiatives aimed at enhancing our knowledge of mental health disorders and refining their treatment. Collectively, endeavors such as the Global Burden of Disease study (1), World Mental Health Surveys (2) and Psychiatric Genomics Consortium (3) have documented the prevalence of different mental health symptoms and their associated disorders; expanded our understanding of potential risk factors; and given us a better understanding of the complex genomic underpinnings that may result in symptoms associated with disorders such as bipolar disorder, depression, and schizophrenia.

However, despite intensive effort by the research community and considerable funding, a causal understanding of most mental health disorders remains elusive (4) and the majority of psychiatric diagnosis and treatment still operates at the level of symptomatic experience, rather than addressing root causes. This is analogous, within the domain of physical health, to physicians selecting treatments for conditions such as pneumonia, Covid-19, cancer, heart disease or diabetes based solely on a patient’s symptoms and sensations such as fever, pain, or fatigue, without having the necessary diagnostic or screening tests to know what has caused them or what is happening at a biological level. Furthermore, we know from physical conditions that the mapping between cause and symptom is typically a many-to-many mapping whereby the same constellation of symptoms can arise from multiple different causes, and the same set of causes can result in different constellations of symptoms across individuals. Having to diagnose and treat disorders based on symptoms alone therefore risks a trial-and-error approach that is a poor fit to underlying causality and results in poor clinical outcomes (5). Moreover, within the field of psychiatry, diagnostic criteria for different disorders are theoretical constructs that are neither validated against underlying biology or cause, nor empirically demonstrated as separable symptom clusters (6, 7). Thus, the way symptoms are grouped within this system of disorder labeling may be grossly mismatched with underlying causality (7–9). Such mismatch creates substantial confusion in upstream efforts to identify treatments and biomarkers.

As a result of these challenges, progress in psychiatry lags behind many other medical specialties (10–14) and clinical outcomes for many patients remain poor (15–17). For example, an analysis of 102 meta-analyses covering 3,782 randomized clinical trials (RCTs) from over 650,000 participants, spanning most major mental disorders, concluded “After more than half a century of research, thousands of RCTs and millions of invested funds, the effect sizes of psychotherapies and pharmacotherapies for mental disorders are limited” (18). In addition, the prevalence of suicide and mental health symptoms is high, and on the rise (19, 20), with suicide remaining the fourth leading cause of death among 15-29 year-olds globally (20) and the second leading cause of death for people aged 10-14 and 20-34 in the United States (US) (21). The prevalence of depression and anxiety in young people has also steadily increased, exacerbated by the Covid-19 pandemic (22–24). This latter finding is visible as a striking shift in mental wellbeing trends across age groups. In the early 2000s, studies showed that young adults (ages 18-21) had the highest psychological wellbeing, with wellbeing dipping in middle age and rising again in older age groups, a phenomenon that came to be known as the U-shaped curve of happiness (25). However, since 2011, the Centers for Disease Control and Prevention (CDC) has shown that, in the US, younger age groups increasingly express feelings of sadness (19), while a trend of diminishing mental wellbeing in young people is observed on virtually every continent (26). It is also significant that the overall burden of psychiatric disease is greater in western English-speaking countries despite greater per capita income, a larger number of psychiatrists per population and mental health spending that is 5-7 times higher than other countries with lower incidence levels (1, 27, 28).

However, recent advances in large-scale data acquisition, open datasets and analytical/machine learning approaches present a new era of opportunity within mental health research (4, 29–32) to deal with the multitude of biological, social and environmental factors which can influence the brain and mental health and unpack their complex relationships. This allows a refocus of the mental health research paradigm to deliver a more coherent understanding of both the causal factors and physiological underpinnings of psychiatric conditions to enable better prevention, diagnosis and treatment.

In this paper, we propose a shift in the existing paradigm of mental health research from one that starts with theoretically defined categorical symptom groupings to one that embraces a multidimensional approach using large datasets to develop testable causal hypotheses. To that end, we discuss the challenges and complexities associated with uncovering underlying causes of mental illness, focusing on current diagnostic frameworks, the many-to-many mapping between cause and symptoms, and the interplay between root causes, physiological markers and symptomatic experience. We then describe two large-scale multi-dimensional data projects, the Adolescent Brain Cognitive Development (ABCD) Study and Global Mind Project and discuss how machine learning approaches applied to datasets such as these can aid in identifying causal factors and help psychiatry take a much-needed leap forward.

2 Challenges and complexities of identifying causal drivers of mental illness

2.1 Current diagnostic frameworks preclude a causal understanding

Historically, the classification of psychiatric disorders has been driven by clinical observations combined with a theoretical framework that groups symptoms into diagnostic criteria. These criteria are laid out in manuals such as the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (33) and the International Classification of Diseases (ICD-11) (34) and are used by clinicians to assign patients to particular diagnostic labels based on the alignment of their particular symptom profile to diagnostic criteria (e.g. Major depression: 5 or more depressive symptoms for ≥ 2 weeks; must have either depressed mood or loss of interest/pleasure). In turn, this guides their treatment and care management pathway, helps with ease of communication and documentation, and, in countries such as the US, determines the amount of health insurance support that patients will receive. In the context of research, these diagnostic categories are also used to determine how patients are selected for, and allocated to, different experimental groups, particularly in clinical trials.

In an ideal world, a robust diagnostic system should have high sensitivity to correctly identify patients who have a particular condition, and high specificity to correctly exclude individuals who do not have the condition. In addition, the definition of this condition should have biological validity, correlating with neuroimaging, genetic, or other biomarkers. However, despite multiple iterations of these classification manuals over the past few decades, the current system of classifying individuals based on symptom criteria does not meet this ideal (6, 35, 36) and encounters several challenges that hinder the establishment of a causal understanding.
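As a brief illustration of the two properties named above, the following sketch computes sensitivity and specificity from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from any study cited here.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp / fn: people WITH the condition who were / were not given the label.
    tn / fp: people WITHOUT the condition who were not / were given the label.
    """
    sensitivity = tp / (tp + fn)  # share of true cases correctly identified
    specificity = tn / (tn + fp)  # share of non-cases correctly excluded
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=80, fn=20, tn=90, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Both quantities presuppose a reference standard that defines who truly has the condition, which is what a symptom-only classification lacks.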

First, these symptom-based disorder classifications often share similar symptoms within their criteria, making it difficult to distinguish between them. For example, impaired sleep is common to several disorders including attention-deficit/hyperactivity disorder (ADHD), anxiety disorder, autism spectrum disorder (ASD), mood disorders, substance use disorder, and post-traumatic stress disorder (PTSD) (37–39). Consequently, it is common for patients to be comorbid across multiple disorders, rather than having symptoms that only align with one disorder (40, 41). Second, there is substantial heterogeneity within diagnostic categories, where individuals with the same diagnostic label can have diverse symptom profiles and treatment needs. For example, there are over a hundred different symptom combinations that can lead to a diagnosis of depression, ADHD or PTSD (42–44). Third, it is common for a patient’s symptom profile to evolve and shift over time, crisscrossing different disorder categories, especially within child and adolescent psychiatry where developmental factors create a moving target of symptoms (45–47).
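To make the heterogeneity point concrete, the sketch below enumerates the symptom combinations that satisfy the major depression rule quoted earlier (5 or more symptoms, at least one of which is depressed mood or loss of interest/pleasure). It assumes a list of nine criterion symptoms, as in DSM-5. The indices are placeholders, and the duration requirement and exclusion criteria are ignored.

```python
from itertools import combinations

N_SYMPTOMS = 9    # assumption: nine criterion symptoms, as in DSM-5
CORE = {0, 1}     # placeholder indices: depressed mood, loss of interest/pleasure
MIN_SYMPTOMS = 5  # "5 or more depressive symptoms"


def qualifying_profiles():
    """List every symptom combination that meets the simplified rule."""
    profiles = []
    for k in range(MIN_SYMPTOMS, N_SYMPTOMS + 1):
        for combo in combinations(range(N_SYMPTOMS), k):
            if CORE & set(combo):  # at least one core symptom present
                profiles.append(combo)
    return profiles


profiles = qualifying_profiles()
print(len(profiles))  # 227

# Smallest possible overlap between two qualifying profiles.
min_shared = min(len(set(a) & set(b)) for a, b in combinations(profiles, 2))
print(min_shared)  # 1
```

Under these assumptions, two people can receive the same label while sharing only a single symptom.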

This mismatch between symptomatic experience and disorder classifications was evidenced in a recent study of over 100,000 individuals that showed that the heterogeneity of their symptom profiles was almost as high within a psychiatric disorder category as between any two disorder categories (6). Furthermore, no individual disorder category was separable from randomly selected groups of individuals with at least one disorder, indicating that DSM-5 disorder criteria failed to separate individuals by symptom profiles any better than random assignment.

Altogether, the dominance of theoretical classification frameworks based on symptom groupings that are difficult to distinguish from one another and not tied to underlying causes, disrupts the search for linkages between symptoms and their root causes. New transdiagnostic frameworks have emerged such as Research Domain Criteria (RDoC) (7, 48, 49) put forward by the National Institute of Mental Health (NIMH) that considers mental health and psychopathology in the context of major functional neuroscientific domains (e.g. cognition, social processes, sensorimotor, positive/negative arousal, regulation). However, while this framework focuses on functional classifications and physiological criteria, it also does not provide a method of linking causes to symptoms presented in clinical practice.

2.2 Causes to symptoms have a many-to-many mapping

Another challenge that hinders the identification of causal drivers of mental illness is the complex interplay between cause and symptoms. To illustrate this complexity, we provide here an analogy to the physical illness of Covid-19. Covid-19 is a viral infection caused by SARS-CoV-2 with the likelihood of infection dependent on a whole host of secondary biological and social contributing factors. The constellation of symptoms it evokes in people is highly heterogeneous and can include anything from cough, cold, fever and breathing difficulty to fatigue, chills, headaches and brain fog, while some people can be asymptomatic. Conversely, the same constellation of symptoms can be evoked by other causal agents such as bacterial infections, fungal infections, poisons and toxins, poor diet or smoking. Therefore, there is not a 1-to-1 mapping between cause and symptoms, but instead a many-to-many mapping (Table 1).


Table 1 Example of many-to-many mapping between cause and symptoms.
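One way to picture a many-to-many mapping is as a lookup that can be read in both directions. The sketch below is illustrative only: the cause and symptom assignments are simplified placeholders loosely based on the Covid-19 analogy in the text, not a reproduction of Table 1 and not clinical data.

```python
from collections import defaultdict

# Illustrative placeholder mapping: cause -> symptoms it can produce.
CAUSE_TO_SYMPTOMS = {
    "SARS-CoV-2": {"cough", "fever", "fatigue", "headache"},
    "bacterial infection": {"cough", "fever", "breathing difficulty"},
    "smoking": {"cough", "breathing difficulty"},
    "poor diet": {"fatigue", "headache"},
}


def invert(mapping):
    """Build the reverse lookup: symptom -> causes that can produce it."""
    inverse = defaultdict(set)
    for cause, symptoms in mapping.items():
        for symptom in symptoms:
            inverse[symptom].add(cause)
    return dict(inverse)


def is_many_to_many(mapping):
    """True if some cause has several symptoms AND some symptom has several causes."""
    one_to_many = any(len(s) > 1 for s in mapping.values())
    many_to_one = any(len(c) > 1 for c in invert(mapping).values())
    return one_to_many and many_to_one


def candidate_causes(observed, mapping):
    """Causes whose symptom set covers every observed symptom."""
    return {cause for cause, symptoms in mapping.items() if observed <= symptoms}


SYMPTOM_TO_CAUSES = invert(CAUSE_TO_SYMPTOMS)
print(sorted(SYMPTOM_TO_CAUSES["cough"]))
print(sorted(candidate_causes({"cough", "fever"}, CAUSE_TO_SYMPTOMS)))
```

Even in this toy mapping, observing a set of symptoms narrows the candidate causes but does not identify one, which is the difficulty a symptom-only diagnosis faces.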

If the symptom-based approach of psychiatry was applied in this context, then the physical symptoms that often tend to present together (cough, cold, fever, sore throat, and breathing difficulty to fatigue, muscle weakness, chills and headaches etc.) might be, in aggregate, labeled “Body Depression Disorder” where having at least 3 or 4 of these symptoms may qualify you for the diagnosis. However, it would be impossible to identify the specific cause (e.g. Covid-19 vs worms vs poor diet) and individuals who were not responsive to a commonly prescribed medication for “Body Depression Disorder” such as an antibiotic, may simply be considered ‘treatment resistant’.

Within the domain of mental illness, this same many-to-many mapping applies between symptoms and root causes. For example, pathogens such as syphilis and streptococcus (if they cross the blood-brain barrier) have been shown to evoke a set of symptoms that align with the diagnostic criteria associated with the disorder labels of schizophrenia and obsessive compulsive disorder (OCD), respectively (50, 51). In turn, these disorder labels have also been associated with multiple other causal factors and heterogeneous symptom and physiological profiles (52–54). Similarly, the symptom-based diagnosis of depression has been associated with a host of environmental factors including ultra-processed food consumption, traumatic experiences and brain injury (55–57), while individuals given a diagnosis of depression can exhibit highly heterogeneous symptom profiles beyond the diagnostic criteria outlined in DSM-5 (6, 58). Altogether, mental illness has a greater range of potential causes extending beyond pathogens, toxins and injury to include social experience and sensory stimulus.

Compounding this challenge is that the emergence of symptoms from these causal assaults depends critically on the physiology and genetics of the individual (Figure 1). In the case of Covid-19, those with immune compromise or obesity are more likely to experience a broader array of severe symptoms while those who are young and healthy may remain entirely asymptomatic (59). Similarly, a traumatic experience or diet profile could result in very different mental health outcomes depending on individual physiology or genetics (60, 61). Thus, no perfect relationship exists for any cause and symptom combination, and even less so for any theoretically defined symptom grouping. This illustrates why traditional experimental approaches that are set up to evaluate differences between a symptom-based diagnostic group and healthy controls have not been very successful, resulting in considerable debate around treatment efficacy (e.g. for antidepressants (62–64)). It also highlights the need for a multi-dimensional approach which considers a range of symptoms, physiological underpinnings and causal factors from the outset.


Figure 1 Various environmental assaults interact with the physiology of the individual to produce diverse symptomatic outcomes (individual symptoms represented by boxes). For example, a pathogen such as SARS-CoV-2 interacts with the immune system and various organs to produce anything from no symptoms to very severe symptoms of different types such as breathing difficulty or extreme fatigue and high fever. A broader range of environmental exposures as shown above can interact with the body to deliver a wide range of mental symptoms.

2.3 Biomarkers of symptoms can be misleading

The search for biomarkers of mental health disorders has been an active area of investigation (65). However, despite decades of research there are still no biomarkers that form a crucial part of accepted diagnosis (8, 66, 67). Why is this so? One reason is that mental disorders are highly heterogeneous groupings of symptoms with multiple potential causes so no single physiological marker can be definitive, as it will likely apply only to a subset of those with the symptom-defined diagnosis. Consequently, none have passed a threshold of accepted statistical significance. A second aspect, however, is that biomarkers of symptoms can be misleading.

To illustrate this, we return again to the example of “Body Depression Disorder” which we laid out above. One may find that elevated white blood cell (WBC) counts are fairly reliably associated with a diagnosis of “Body Depression Disorder”. However, the subset of those with injury or physical trauma which still aligns with the criteria for “Body Depression Disorder” would not be associated with this WBC biomarker. Similarly, one might find a reduced electromyography (EMG) signal (a measure of the strength of muscle contraction) is commonly associated with “Body Depression Disorder” although it may be more reliably associated with a particular constellation of sub-symptoms, (e.g., muscle weakness, fatigue, and fever) rather than the diagnosis in general. While these markers may indicate a physiological challenge, they do not necessarily inform treatment. If the function of WBCs was not well understood, it would be tempting to declare WBCs as the cause of “Body Depression Disorder” and thereby seek treatments that eliminate or manipulate levels of WBCs. Alternatively, if one considered that a weak EMG was the cause, then one might electrically stimulate muscles in the hope that it will spur them into action. Thus, while these “biomarkers” act as reasonably good predictors of the symptoms, targeting these factors as a treatment pathway would be a grave mistake.
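The WBC argument can be sketched as a toy simulation in which an infection raises both the WBC count and the symptom score, while the WBC count itself has no effect on symptoms. The model structure, effect sizes and function names are hypothetical, chosen only to illustrate the reasoning above.

```python
import random
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0, suppress_wbc=False, remove_infection=False):
    """Toy model (all numbers hypothetical): infection drives WBC and symptoms."""
    rng = random.Random(seed)
    wbc, symptoms = [], []
    for _ in range(n):
        # Draw the same random numbers in every scenario so runs are comparable.
        infected = rng.random() < 0.5
        wbc_noise = rng.gauss(0, 1)
        symptom_noise = rng.gauss(0, 1)
        if remove_infection:
            infected = False  # intervene on the cause
        level = 5 + 5 * infected + wbc_noise
        if suppress_wbc:
            level = 5 + wbc_noise  # intervene on the biomarker only
        wbc.append(level)
        # Symptoms depend on the infection, never on the WBC level.
        symptoms.append(3 * infected + symptom_noise)
    return wbc, symptoms


base_wbc, base_symptoms = simulate()
_, symptoms_wbc_treated = simulate(suppress_wbc=True)
_, symptoms_cause_treated = simulate(remove_infection=True)

print("corr(WBC, symptoms):", round(pearson(base_wbc, base_symptoms), 2))
print("mean symptoms, untreated:", round(mean(base_symptoms), 2))
print("mean symptoms, WBC suppressed:", round(mean(symptoms_wbc_treated), 2))
print("mean symptoms, infection removed:", round(mean(symptoms_cause_treated), 2))
```

In this toy model the biomarker predicts symptoms well, yet suppressing it leaves symptoms unchanged, whereas removing the infection reduces them.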

When considered in the context of mental illness, with the absence of causal understanding, we may find that specific metabolites in blood or cerebrospinal fluid (CSF), or particular physiological characteristics within a single-photon emission computed tomography (SPECT) scan or electroencephalography (EEG) readout are predictive of symptom subsets. However, they may be an indirect biomarker of those symptoms (much like WBCs) rather than the direct cause of them, and therefore targeting them for treatment would not be appropriate. Instead, a more useful biomarker would be an assay for the causal factor itself (analogous to an antigen test for Covid-19) which can in turn inform both prevention and treatment at the causal level.

We thus propose a shift in the framework from searching for biomarkers of symptoms to searching for biomarkers of causes that can serve as potential diagnostic criteria, and potential targets for treatments (Figure 2).


Figure 2 The current paradigm involves searching for biomarkers as physiological correlates of symptom-based disorder definitions (top). As these disorder definitions are likely to have multiple potential causes, and encompass highly heterogeneous symptom profiles, we propose a shift toward identifying biomarkers that are physiological correlates of both causal factors and specific symptoms (bottom).

3 Strategies for a causal-oriented research framework

3.1 Developing a multidimensional approach to generate testable hypotheses

Considering the broad range of biological, social, and environmental factors that may be at play, moving from a categorical symptom framework of diagnosis to a causal one requires researchers to initially cast a wide net to explore multiple possibilities of cause in order to generate testable hypotheses and identify candidate biomarkers (Figure 3). From the perspective of mental illness this includes not only pathogens and a host of chemical toxins but also aspects of the stimulus environment such as social and technological experiences and lifestyle.


Figure 3 Generating causal hypotheses in observational data: Moving toward a causal diagnostic framework requires generating testable causal hypotheses from large-scale multidimensional observational data consisting of environmental exposures, physiological assays (e.g., from blood, urine, saliva) along with comprehensive symptom profiles and analyzing these using various multivariate techniques such as causal inference to generate maps of cause to physiology to symptoms.

Secondly, these causal possibilities must be considered in relation to comprehensive individual symptom profiles rather than being limited to theoretical groupings of symptoms categorized as pre-defined disorders. This approach can then identify well-substantiated hypotheses that can be tested through various methods.

Finally, the inclusion of measures of physiology and neurophysiology can aid in the refinement of such hypotheses and the discovery of diagnostic biomarkers. For example, blood markers could help distinguish between the same symptoms caused by pathogens and toxins versus by injury or traumatic experience.

Achieving this requires a shift away from the common approach of comparing a single causal possibility for pre-determined symptom-based diagnostic groups, to a multidimensional approach using large datasets. Such datasets should include comprehensive symptom profiles, assessment of a wide range of potential causal factors and physiological readouts, coupled with analytical approaches amenable to untangling the complex relationships of variables. With new tools for both data acquisition and analysis we are now entering an era that makes this possible.

3.2 The big data opportunity

Over the years, psychiatry research has faced limitations in both acquiring and handling vast amounts of data to explore a wide breadth of variables. Fortunately, recent advancements in data science have opened new doors for transformative progress in the field. Here we talk about two substantially different ongoing large-scale data acquisition efforts, and the potential they have in moving the field toward a causal diagnostic framework: the Adolescent Brain Cognitive Development (ABCD) Study and the Global Mind Project. Briefly, the ABCD study follows 11,864 young people in the US recruited at ages 9 or 10 into adulthood, annually characterizing various potentially causal aspects of their environment and their mental health status along with neuroimaging and genetic studies (Table 2). The Global Mind Project dynamically tracks detailed mental health symptom profiles of individuals around the globe along with demographic information and various social, technological, and environmental factors that are potential causes of mental health challenges. Since 2020 the project has collected responses from over 1.4 million people 18 years and older across 70+ countries in 12+ languages.


Table 2 A summary of the ABCD Study and Global Mind Project.

3.3 ABCD Study®

The ABCD Study® is a 21-site US-based project integrating longitudinal neuroimaging, genetics and behavioral assessments of 11,864 youth and has been described in detail in multiple publications (68–73). Youth were recruited into the study beginning at age 9 or 10 in 2015 and are tracked along various dimensions through their childhood with the goal of understanding how social, behavioral, physical, and environmental factors affect brain development and other health outcomes through the second decade of life. The earliest cohort is now in its 9th year of assessment. Study assessments are conducted annually or biannually and include an extensive neurocognitive battery and psychological/behavioral assessments covering various disorders as well as questionnaires on family history and structure, substance use history and screen time (Table 3) (74, 78). In addition, the study includes Magnetic Resonance Imaging (MRI) at two-year intervals (structural imaging, diffusion tensor imaging (DTI), and task-based and resting-state functional imaging) (71). Biospecimens are collected annually and include hair samples, deciduous baby teeth, and body fluids (blood, saliva and urine) to assess exposure to illicit and recreational drugs, pubertal hormones, genomics and epigenomics, and pre- and post-natal exposure to environmental neurotoxicants and drugs of abuse (77). The total number of survey questions is approximately 1200 (depending on the number of answers that trigger additional questions), although many of the questions in the psychological surveys overlap because symptoms recur across disorder-specific tools. Importantly, the ABCD Study is longitudinal, which allows not just snapshot views but also the ability to follow the trajectory of symptoms over time.


Table 3 ABCD Study assessments.

The ABCD data are curated, released annually and accessible via the NIMH Data Archive (NDA). The ABCD Study’s Data Release 5.0 is now available (https://dx.doi.org/10.15154/8873-zj65; see also https://abcdstudy.org/). Only researchers with an approved NDA Data Use Certification (DUC) may obtain ABCD Study data. Access requires verification through one of three NIH Researcher Auth Service (RAS) identities.

While the data includes recruitment across 21 sites with wide socio-economic and ethnic representation, it has a unique set of challenges and potential biases given its longitudinal nature. These include attrition over time and data gaps due to suspension of certain aspects of data collection during the Covid-19 pandemic. More specifics on the ABCD data and these potential biases are described in Saragosa-Harris et al. (73).

3.3.1 Key findings of interest

Thus far, major findings reported from the ABCD Study have concerned the potentially causal impact of genetics, sleep, exercise, music, nutrition, trauma, and social media use on brain structure and function, particularly with respect to executive functions such as planning, decision making, and impulse control (79, 80). These functions are generally subserved by neural circuitry involving the prefrontal cortex, which is known to be dynamically developing well into the third decade of life (81). Some specific findings of public health interest have included:

The negative impact of recreational screen use in adolescents. Data from the ABCD Study have added to a growing body of literature highlighting the negative association between screen use and cognitive and mental health outcomes in youth (80, 82). Research has also revealed the impacts of screen use on sleep, showing that screen use (television or interconnected devices) at bedtime was significantly associated with sleep disturbances in children aged 11-12 (83).

The relationship between sleep quality, neurocognitive development, and mental health symptoms. To date, findings have shown how sleep quality and duration are robustly associated with neurocognitive development, mental health symptoms, and brain anatomy/physiology in children and adolescents (84, 85). In particular, children with shorter sleep duration have smaller brain volumes in areas related to cognition and higher psychiatric problems scores (as do their parents) (84). In addition, insufficient sleep (defined as < 9 hours) has been shown to have widespread effects on baseline behavioral and functional connectivity measures (85).

However, thus far, most studies have focused on the relationships between specific environmental factors and outcomes. This leaves open a vast opportunity for multidimensional analysis that identifies the relative contributions of different environmental factors to both physiological and mental health outcomes and the consequences of their interactions. We outline some possible approaches in Section 4.

3.4 Global Mind Project

The Global Mind Project, launched in 2020, dynamically tracks detailed mental wellbeing profiles of individuals around the globe along with demographic information and various social, technological, and environmental factors that are potential causes of mental health challenges. Since inception, the project has collected responses from over 1.4 million people 18 years and older across 70+ countries in 12+ languages (Table 2). Approximately 1000-2000 new responses are added per day across diverse demographics. More recently data from ages 13 to 17 have also been included.

The study uses a transdiagnostic assessment of mental wellbeing, called the Mental Health Quotient (MHQ) that is completed online and collects life impact ratings across 47 different elements of mental feeling and function spanning all possible symptoms of 10 major mental health disorders, as well as positive aspects of functioning. In addition, the assessment captures detailed demographics, information on various aspects of the social and technology environment as well as lifestyle factors. Data is collected through inviting participation through online advertising that targets a broad range of demographics in each country. The sample is thus specific to the internet-enabled populations of each country. The US sample, where internet penetration is over 90%, has been shown to be broadly representative of the national population, closely matching various demographic and mental health patterns in the American Community and Household Pulse Surveys conducted by the US Census Bureau (86).

Additionally, the dynamic nature of the Global Mind Project offers a view of the ongoing evolution of mental wellbeing and the agility to quickly probe new potential causal factors at scale to understand the impact of emerging social and environmental factors. A summary of the demographic and potential causal factors considered thus far is shown in Table 4.


Table 4 Global Mind Project data elements.

The data are openly available in real-time to not-for-profit researchers in structured format and the data can be searched and downloaded by time period, country, language, age and gender. Access to the dynamically updated data is available through Sapien Labs’ proprietary platform Brainbase for which access must be requested through the request form at this url: https://sapienlabs.org/global-mind-project/researcher-hub/.

The MHQ assessment is comprehensive in its coverage of mental health symptoms, yet compact, which allows for more streamlined analysis of symptom profiles without the need to stitch together various assessments which are characterized by significant overlap of symptoms and a lack of standardization (87). The overall aggregate metric of mental wellbeing, dimensional scores, as well as individual ratings (1-9 life impact Likert scale) for 47 individual elements of feeling and functioning, also allows for outcome analysis at different levels of granularity.

3.4.1 Key findings of interest

Thus far, the Global Mind Project has identified relationships between key potentially causal environmental factors and specific symptoms that are of public health interest:

Age of first smartphone and adult mental health. The data has shown that younger ages of first smartphone ownership in childhood are progressively associated with poorer mental wellbeing in adulthood and in particular a greater incidence of “Suicidal thoughts & intentions”, “Feelings of being detached from reality” and “Feelings of aggression toward others” in early adulthood, particularly for girls (88). This trend persists when controlling for childhood traumas and adversities.

Ultra-processed food consumption and symptoms of depression and cognitive/emotional control. Data from 300,000 people in 2023 showed that more frequent consumption of ultra-processed food is associated with significantly lower mental wellbeing, independent of differences in exercise frequency and household income (89). In particular, “Appetite regulation”, “Feelings of sadness, distress or hopelessness” as well as various challenges with emotional and cognitive control were most significantly increased with higher frequencies of ultra-processed food consumption.

The wide range of life context factors that are potentially causal also allows for a rich analysis of interactions. A recent multidimensional analysis that included multiple potential causal factors has used supervised learning to show that social behavior has a far greater impact on overall mental wellbeing outcomes in the population compared to exercise, traumas and adversities and substance use (90).

Altogether the Global Mind Project enables rapid generation of causal hypotheses as well as understanding of the hierarchy of impact of causal factors that can then be tested in follow-on studies.

4 Analytical approaches for causal understanding

The application of data science techniques to the large-scale datasets described above provides a powerful way to understand the many-to-many relationships between causes, symptoms and physiology. We present here examples of approaches that can be applied to the ABCD and Global Mind datasets.

4.1 Understanding groupings of symptoms and causes using clustering approaches

In contrast to the present symptom-based approach which groups symptoms theoretically, clustering or unsupervised learning approaches (91) can be used to determine if there are indeed empirically separable symptom groupings. Such empirically separable groups that map more strongly to specific physiological metrics and/or social or environmental factors could then suggest a specific underlying cause or disease grouping that can be more rigorously tested. Such symptom clustering can be easily achieved in the Global Mind dataset across a large and culturally diverse population where 47 symptoms are collected in a single assessment. While the ABCD Study queries symptoms across a number of different assessments which would have to be combined to construct a comprehensive symptom profile for each individual, it offers an opportunity to both determine how symptom clusters emerge during adolescence and how they might evolve over time.

We show in Figure 4 an example of clustering of symptom phenotypes using data from 29,993 people from the Global Mind data with five or more symptoms. Here, each individual either has or does not have each of 47 symptoms queried (Figure 4A) based on whether or not their rating crosses the threshold to be considered a symptom. A clustering algorithm then seeks to group individuals based on the similarity of their symptom profile. As one example, a 3-D projection of potential clusters using Uniform Manifold Approximation and Projection (UMAP) is shown in Figure 4B (92). Visually, there is poor separability of groupings overall, which suggests that there are no clear symptom phenotypes. However, it is possible that on closer examination some clusters may separate better than others. Moreover, there are numerous approaches to clustering, as well as different levels at which the clustering can be performed, which may confer better separability. The first challenge is to determine the best way of computing similarity of symptom profiles. Drawing again the analogy to physical symptoms, fatigue may be a common symptom of almost all diseases whereas other symptoms such as a cough and cold can clearly restrict possible etiology to viral or bacterial pathogens. However, having many common symptoms like fatigue will reduce the separability of symptom profiles. Thus, understanding of the hierarchy of how symptoms behave can inform how one should approach clustering and which method(s) out of the many available should be implemented (see below for some examples). Conversely, the problem can be approached from the opposite direction where social and environmental factors can be clustered to identify life context phenotypes that map to particular symptoms and/or physiological phenotypes. The presence of empirically separable symptom clusters, especially if enriched for particular life context factors, could be substantially informative about the underlying cause.

Figure 4 An example of clustering of symptom profiles using 29,993 records from the Global Mind data. (A) The construction of symptom profiles across 47 symptoms (columns) for 29,993 individuals (rows). (B) Uniform manifold approximation and projection of symptom clusters.
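As a minimal sketch of the profile construction described above, the following Python snippet converts ratings into binary symptom profiles and computes a Jaccard similarity between two individuals. The threshold value, the direction of the rating scale and the example ratings are illustrative assumptions and not the scoring rules of the MHQ.

```python
# Illustrative only: the threshold and the ratings below are assumptions,
# not the MHQ scoring rules.
SYMPTOM_THRESHOLD = 8  # hypothetical cut-off on the 1-9 life impact scale


def to_profile(ratings, threshold=SYMPTOM_THRESHOLD):
    """Return the set of items whose rating meets or exceeds the threshold."""
    return {item for item, rating in ratings.items() if rating >= threshold}


def jaccard_similarity(profile_a, profile_b):
    """Shared symptoms divided by all symptoms present in either profile."""
    union = profile_a | profile_b
    if not union:
        return 1.0  # two empty profiles are treated as identical
    return len(profile_a & profile_b) / len(union)


# Two invented individuals rated on three items.
person_1 = {"Fear and anxiety": 9, "Mood swings": 8, "Memory": 3}
person_2 = {"Fear and anxiety": 8, "Mood swings": 4, "Memory": 9}

profile_1 = to_profile(person_1)
profile_2 = to_profile(person_2)
similarity = jaccard_similarity(profile_1, profile_2)
print(similarity)
```

Jaccard similarity ignores symptoms that both individuals lack, which is one way to stop shared absences from inflating similarity. Down-weighting very common symptoms, in the spirit of the fatigue analogy, would be a further design choice.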

Within the toolkit of machine learning there are many clustering approaches. For example, hierarchical clustering (93) organizes elements (such as symptoms) into a tree-like structure, which reveals both higher-level clusters and individual symptom relationships. On the other hand, K-Means clustering (94) creates a specified number of symptom groups by assigning each symptom to the cluster with the nearest center, thereby minimizing the distance between symptoms within the same cluster. Other approaches include Density-Based Spatial Clustering of Applications with Noise (DBSCAN) (95), which dynamically determines the number of clusters based on the density of data points, as well as the Gaussian Mixture Model (GMM), which is a probabilistic model that accounts for overlapping symptom patterns (96).
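The K-Means procedure mentioned above can be sketched in a few lines. The example below is a minimal illustration in plain Python, assuming that each row is a binary symptom profile and using a deterministic farthest-point initialization chosen here only for reproducibility. The data are invented, and in practice an established library implementation would normally be preferred.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def k_means(points, k, max_iterations=50):
    """Lloyd's algorithm: assign to nearest centroid, then update centroids."""
    # Deterministic start: first point, then the point farthest from
    # the centroids chosen so far (an assumption made for this sketch).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return labels, centroids


# Invented binary profiles over four hypothetical symptoms.
profiles = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1],
]
labels, centroids = k_means(profiles, k=2)
print(labels)
```

Euclidean distance on binary vectors is only one possible choice. As noted above, the distance measure itself is part of what needs to be explored.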

Depending on the structure of the data, different clustering approaches can sometimes produce substantially different results (97, 98). Cluster validity and reliability can be affected by factors such as the distance metric used, the presence of outliers, and the distribution of the data. Additionally, dimensionality can pose a challenge, making techniques such as Principal Component Analysis (PCA) useful for preprocessing. Thus, multiple methods have to be explored and compared to identify robust and reproducible results.
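One simple way to quantify how far two clustering solutions agree is the Rand index, the fraction of pairs of individuals that both solutions treat the same way. The sketch below is an assumption-laden illustration with invented cluster labels; it is not the comparison procedure used in the studies cited.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings place the two
    items together, or both place them apart. Label names do not matter.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    if len(labels_a) < 2:
        raise ValueError("at least two items are needed to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# Hypothetical outputs of two clustering methods on the same six people.
method_1 = [0, 0, 0, 1, 1, 1]
method_2 = ["b", "b", "a", "a", "a", "a"]

agreement = rand_index(method_1, method_2)
relabelled = rand_index(method_1, [5, 5, 5, 9, 9, 9])
print(agreement, relabelled)
```

The plain Rand index does not correct for agreement expected by chance. The adjusted Rand index does, and is often preferred when the number of clusters differs between methods.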

4.2 Identifying the hierarchy of causal risk factors in symptom outcomes

One obvious application of large multidimensional data is the ability to identify the hierarchy of potential causal factors. While many factors may drive each symptom or grouping of symptoms, they may do so to different degrees. Various supervised learning approaches can be used to determine how combinations of potential causal factors, such as social determinants or physiological metrics, predict mental health outcomes. These include logistic regression (99, 100), gradient boosting (101, 102), random forest (103, 104) and naïve Bayes (105), which are all described in detail in various machine learning textbooks and tutorials. As a corollary to such predictive models, different techniques can be used to determine how important each input is for the prediction. Such ranking can help determine which factors should be the focus of further research or intervention. In Figure 5, we show one such method called SHAP (106) applied to a tree-based gradient boosting prediction model (XGBoost) trained on 270,000 records from the Global Mind Project obtained in 2022. This example includes demographics as well as lifestyle factors, demonstrating that certain factors contribute to negative mental wellbeing outcomes (negative MHQ outcomes) while others contribute to positive mental wellbeing outcomes. While this is for illustration purposes only, it is evident that various lifestyle and life context factors such as lack of social behavior and exercise as well as sleeping pills, job stress and sexual abuse contribute to negative mental health while good sleep, regular exercise and regular socializing contribute to positive mental health. In addition, age is a significant factor, with younger age contributing to poorer mental health and posing the question of what type of causal factors associated with young age may be missing (for example, early age of smartphone ownership or social media use).

Figure 5 SHAP analysis of factors showing various demographic and lifestyle categories and their impact on the prediction of mental health status as determined by the MHQ score using a gradient boosting model. Red indicates a significant impact, while values to the left of 0 on the scale indicate that it contributes to a negative mental health status, and values to the right of 0 indicate a contribution to a positive mental health status.
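SHAP builds on Shapley values, which divide the difference between a prediction and a baseline prediction among the input factors. To make the idea concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring rule by enumerating all subsets of factors. The scoring rule, factor names and baseline are hypothetical; this is neither the SHAP library nor the XGBoost model used for Figure 5.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Factors outside a coalition are set to their baseline value. The cost
    grows as 2 ** n_factors, so this is only suitable for tiny examples.
    """
    features = list(instance)
    n = len(features)

    def value(coalition):
        mixed = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return predict(mixed)

    attributions = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {f}) - value(coalition))
        attributions[f] = total
    return attributions


def toy_model(x):
    # Hypothetical wellbeing score, not fitted to any data.
    return (
        10 * x["good_sleep"]
        + 6 * x["regular_exercise"]
        - 8 * x["job_stress"]
        + 4 * x["good_sleep"] * x["regular_exercise"]  # interaction term
    )


person = {"good_sleep": 1, "regular_exercise": 1, "job_stress": 1}
reference = {"good_sleep": 0, "regular_exercise": 0, "job_stress": 0}
contributions = shapley_values(toy_model, person, reference)
print(contributions)
```

In this toy case the interaction between sleep and exercise is split equally between the two factors, and the contributions sum to the difference between the prediction for the person and the prediction for the reference profile.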

While such methods can uncover degree of importance, they do not provide insight into relative causality. Recent years have thus seen the emergence of causal graphical models (CGMs) or Bayesian networks as potent tools for modeling complex causal relationships (107) and unveiling the hierarchical structure of causal effects (108). These offer new possibilities for disentangling complex cause-and-effect relationships in observational data using directed acyclic graphs (DAGs) that represent causal relationships between variables, where nodes symbolize variables and edges indicate causal links. For illustrative purposes, we present in Figure 6 a Bayesian network depicted as a causal inference graph, which prioritizes the hierarchical relationship among symptoms. This model applies Bayesian inference techniques to analyze 47 distinct symptoms across a dataset of 270,000 records from the Global Mind Project, spanning 70 countries, to elucidate relationships in the patterns of symptom manifestation. For example, in this graph, “Unwanted, strange, or obsessive thoughts” is a nodal symptom that appears to cause others such as “Fear and anxiety”, “Mood Swings”, “Sense of being detached from reality”, “Repetitive or compulsive actions” and “Avoidance and withdrawal”. Similarly, “Focus & Concentration” has a nodal position with a causal path to “Ability to Learn”, “Memory”, “Planning and Organization” and “Emotional Control”. However, there are many nuances to this method and the strength of causality must be evaluated, which again can be done using various methodologies. While this example restricts analysis to the symptoms alone, application of causal inference to the Global Mind dataset with the inclusion of life context factors could similarly be used to uncover how substance use drives lifestyle behaviors and mental health outcomes, or how ultra-processed food consumption, life traumas and lifestyle factors interact to cause mental health symptoms.

Figure 6 Causal inference graph of 47 symptoms across 270,000 records from the Global Mind data.
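To illustrate how such a graph can be represented and queried, the sketch below encodes only the two nodal symptoms and the paths named in the text as a directed acyclic graph, checks that it is acyclic and lists what lies downstream of a node. The dictionary representation and helper functions are assumptions made for illustration; the full graph in Figure 6 and the method used to learn it are not reproduced here. The `graphlib` module requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Parent -> children edges taken from the two examples named in the text.
causal_edges = {
    "Unwanted, strange, or obsessive thoughts": [
        "Fear and anxiety",
        "Mood Swings",
        "Sense of being detached from reality",
        "Repetitive or compulsive actions",
        "Avoidance and withdrawal",
    ],
    "Focus & Concentration": [
        "Ability to Learn",
        "Memory",
        "Planning and Organization",
        "Emotional Control",
    ],
}


def parents_of(edges):
    """Invert a parent -> children mapping into child -> set of parents."""
    parents = {node: set() for node in edges}
    for parent, children in edges.items():
        for child in children:
            parents.setdefault(child, set()).add(parent)
    return parents


def descendants(edges, node):
    """All nodes reachable from `node` by following directed edges."""
    found, stack = set(), list(edges.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(edges.get(current, []))
    return found


parents = parents_of(causal_edges)
# static_order() raises graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(parents).static_order())
root_nodes = sorted(node for node, p in parents.items() if not p)
downstream = descendants(causal_edges, "Focus & Concentration")
print(root_nodes)
```

Nodes without parents are the candidates for "nodal" symptoms in this small example. In a learned network, the strength of each edge would also need to be evaluated.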

While data from the ABCD Study has been available for several years already, we found the use of causal inference only in one paper, which looked at the relationship between prenatal cannabis exposure, sleep hours and internalizing problems (109). Thus, a host of possibilities remain unexplored. For example, comparing toxicity profiles obtained from biological samples with MRI metrics and behavioral measures such as UPPS-P for Children - Short Form [which assesses impulsivity (110)] and Prodromal Psychosis Scale [which assesses psychosis risk syndromes (111)] could provide a better understanding of the biological basis of certain behaviors. Within the ABCD data there is also the unique opportunity to look at causal trajectories over time. For example, when combining physiological measures and outcomes across subsequent years, how much do different life experiences at age 12 impact outcomes at age 17?

Like all data science applications, numerous other methods for identifying causal relationships are also available. Decision trees, when extended to causal relationships, estimate causal effects through Causal Tree-based Methods (112, 113), while Structural Causal Models (114) detail and simulate interventions based on explicit causal relationships and can predict the magnitude of changes in mental health outcomes due to various influences. Thus, multiple methods of causal exploration will have to be tried and differences in results debated. Finally, while such methods cannot definitively prove causality, they can provide a framework for identifying the most likely causal candidates that can then be tested in more rigorous studies.
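The distinction between observing an association and simulating an intervention can be made concrete with a toy structural causal model. In the sketch below, a hypothetical common cause Z influences both a factor X and an outcome Y; all variable names and coefficients are invented for illustration and do not come from either dataset.

```python
import random


def simulate(n, do_x=None, seed=1):
    """Draw n samples from a toy linear model: Z -> X, Z -> Y, X -> Y.

    When do_x is given, X is fixed to that value regardless of Z, which
    is what an intervention does to the causal graph.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0, 1)        # hypothetical common cause
        noise_x = rng.gauss(0, 1)  # always drawn so that runs stay aligned
        noise_y = rng.gauss(0, 1)
        x = z + noise_x if do_x is None else do_x
        y = 2.0 * x + 3.0 * z + noise_y  # true effect of X on Y is 2.0
        rows.append((x, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def ols_slope(rows):
    """Slope of a simple least-squares regression of y on x."""
    mean_x = mean(x for x, _ in rows)
    mean_y = mean(y for _, y in rows)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    variance = sum((x - mean_x) ** 2 for x, _ in rows)
    return covariance / variance


observed = simulate(20000)
naive_slope = ols_slope(observed)  # inflated by the common cause Z

# Same random draws, but X is set by intervention rather than by Z.
effect = mean(y for _, y in simulate(20000, do_x=1.0)) - mean(
    y for _, y in simulate(20000, do_x=0.0)
)
print(naive_slope, effect)
```

The observational slope overstates the effect because Z drives both variables, whereas the simulated intervention recovers the coefficient built into the model. With real data the model is unknown, which is why results from different causal methods need to be compared.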

4.3 Challenges and limitations

Big data approaches provide an important opportunity to explore a vast landscape of interacting factors to identify structures, patterns, and hierarchies. Furthermore, big data facilitates causal learning by exposing nuanced patterns not visible in small samples. Mining millions of records aids the discovery of subgroup-specific risks obscured by population averages. The demographic breadth of samples also enables the evaluation of assumptions and their robustness. For instance, testing regression consistency over various locations, periods, and subpopulations can identify relationships that are robust to culture. In addition, big data mitigates the risks of overfitting limited data.

However, there are also inherent limitations and cautions. These include data quality, completeness, structures, security and privacy and challenges of assumptions and interpretations.

Data quality. Causal discovery algorithms specifically model nonlinearity and stochasticity. In this context, noisy inputs propagate inaccuracies. Moreover, the assessment of symptoms is inherently subjective, contributing to the noise. Ensemble and aggregated modeling within the machine learning paradigm are designed to account for inherent variability by integrating multiple estimation approaches. In addition, debiasing training data and accounting for missingness causally can alleviate noise and incomplete information issues. However, there is no substitute for good quality data, and rigorous evaluation and cleaning of data is essential. This can include using internal checks and controls to determine whether the same symptom queried in different ways elicits similar responses and whether any values are out of plausible range, comparing outcomes across different sites or researchers involved in multi-site data acquisition to identify anomalies (as in the case of the ABCD Study), and comparing data across studies (e.g. Global Mind data to national surveys such as the American Community Survey or Household Pulse Survey) to check for consistency of overlapping variables.
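Two of the internal checks described above, plausible-range checks and consistency between items that query the same symptom in different ways, can be sketched as follows. The item names, the pairing of items and the tolerated gap are hypothetical and would need to be defined for each dataset.

```python
RATING_MIN, RATING_MAX = 1, 9  # bounds of the 1-9 life impact scale


def out_of_range(record, low=RATING_MIN, high=RATING_MAX):
    """Return the items whose rating is missing or outside the plausible range."""
    return sorted(
        item
        for item, rating in record.items()
        if rating is None or not low <= rating <= high
    )


def inconsistent_pairs(record, paired_items, max_gap=2):
    """Flag item pairs that query the same symptom but differ by more than max_gap."""
    flagged = []
    for first, second in paired_items:
        a, b = record.get(first), record.get(second)
        if a is not None and b is not None and abs(a - b) > max_gap:
            flagged.append((first, second))
    return flagged


# Invented record with hypothetical item names.
record = {
    "sleep_quality": 8,
    "sleep_quality_reworded": 3,
    "memory": 11,   # outside the 1-9 scale
    "focus": None,  # missing response
}
paired = [("sleep_quality", "sleep_quality_reworded")]

range_problems = out_of_range(record)
consistency_problems = inconsistent_pairs(record, paired)
print(range_problems, consistency_problems)
```

Flagged records would typically be reviewed rather than dropped automatically, since the reason for an inconsistency can itself be informative.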

Completeness of the data. The omission of important factors can provide an incomplete view of the data. Outcomes of clustering, predictive analysis and causal inference can all shift with the omission of a key nodal factor that is correlated to other factors considered. The opportunity of big data is thus enhanced by rapid exploration of new data points of relevance at large scale that allow iterative exploration. The Global Mind Project is specifically designed for such exploration by being an ongoing data acquisition program where the exploration of causal life factors can evolve with new hypotheses and ideas. In this paradigm, new questions can be swapped in and out and a hundred thousand records can be gathered in a few months.

A second aspect of data completeness is the sampling and whether it covers the breadth of populations that can enable a generalizable view. The Global Mind data for instance has wide global and demographic coverage where results can therefore be compared across language, country and ethnic groups. However, it is restricted to the internet-enabled audience and therefore not generalizable to the typically lower-income offline audience. In contrast, the ABCD Study is US specific and therefore does not allow cross-national comparisons. On the other hand, it recruits participants across a range of socio-economic groups and US geographies and enables a robust comparison within the country.

Data structures. A major challenge in working with big data is that of integrating across multiple data layers (e.g. symptom assessments and physiological metrics). Multiple data layers stored in different forms and associated with a single participant pose a considerable data management challenge and a barrier to effective use. Often integrating data layers can take more time than the analysis itself and researchers are loath to spend their time on this exercise. Moreover, when the data is not properly annotated or documented it can lead to the introduction of considerable errors. Thus, a central goal of open data projects should be to develop data structures that are well annotated and easy to use whereby variables can be easily discovered and downloaded for analysis without substantial effort.

Data security and privacy. The power of large-scale data as we have described is best realized when it is open sourced for many researchers to use. However, aggregation and sharing of health-related information raises considerations relating to privacy, data-sharing and usage rights. Checks need to be in place to ensure that participants have appropriate rights over their personal data and that their data is only used in the manner outlined to the participant at time of consent. Attention should also be paid to data security to avoid breaches to participant privacy. In particular, identifiable information should be encrypted with encryption keys available only to a very small number of individuals who are bound by security protocols with periodic penetration testing conducted to identify and remedy any security weaknesses.

Assumptions and interpretations of results. While most machine learning algorithms are available as easy-to-use packages and code libraries, there are many nuances and choices that one must make in setting up the problem and processing the data. Thus, multiple perspectives on these problems are essential. This can pit methods and assumptions against one another to identify results that are robust to methods and data structures versus those that shift with methodology. Methods can then iteratively improve.

In summary, there are now both large datasets and powerful tools that can advance our understanding of multidimensional causal pathways with many-to-many interactions. Altogether this approach provides a framework for wide exploration and identification of the most likely causal candidates that can then be tested in more rigorous studies such as controlled, interventional studies. However, dataset characteristics and specific research objectives influence the choice of method which in turn may emphasize certain patterns and relationships while downplaying others. It is thus important to determine whether outcomes are consistent across methods as well as datasets, and reproducible by different research groups.

5 In conclusion

In conclusion, multidimensional data coupled with powerful analytical approaches have the potential to transform our definition and diagnosis of mental illness from symptom-based to one that is causal, enabling a revolution in psychiatric research and the prevention and treatment of mental health challenges.

However, the potential of these approaches depends crucially on the data. As in all large data exercises, wider breadth and larger scale of the data can contribute to deeper and more accurate insights by covering more relevant factors in the causal chain. Furthermore, the quality and accessibility of the data are paramount for research success. This includes easy understanding of open datasets and lower barriers to access and interpretation of the data through well-defined data dictionaries, database search tools and data structures that are easy to work with. In addition, framing of the questions and goals of any analysis effort, refining analytical methods, and translating findings into tangible improvements in mental healthcare are fundamental to the success of these efforts. Despite progress in computational psychiatry around precision approaches with a focus on tailored treatment regimens and efficacy prediction [e.g., (30, 32)], there has been limited application of these approaches to multidimensional datasets such as the ABCD Study with a focus on prevention and causality. Establishing multi-disciplinary research teams with domain expertise spanning prevention psychiatry, sociology, computational science and data science would help to drive forward this research opportunity.

Altogether these large datasets and analytical toolkits now present the opportunity to untangle the many-to-many relationships between causal factors, physiology and symptoms and enable the development of strong hypotheses that can be tested in more controlled settings.

Author contributions

JN: Conceptualization, Writing – original draft, Writing – review & editing. JB: Writing – original draft, Conceptualization, Writing – review & editing. JG: Conceptualization, Writing – original draft, Writing – review & editing. BM: Conceptualization, Writing – review & editing, Writing – original draft. TT: Conceptualization, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by funding from Sapien Labs.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. GBD 2019 Mental Disorders Collaborators. Global, regional, and national burden of 12 mental disorders in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet Psychiatry. (2022) 9:137–50. doi: 10.1016/S2215-0366(21)00395-3

2. McGrath JJ, Al-Hamzawi A, Alonso J, Altwaijri Y, Andrade LH, Bromet EJ, et al. Age of onset and cumulative risk of mental disorders: a cross-national analysis of population surveys from 29 countries. Lancet Psychiatry. (2023) 10(9):668–81. doi: 10.1016/S2215-0366(23)00193-1

3. Sullivan PF, Agrawal A, Bulik CM, Andreassen OA, Børglum AD, Breen G, et al. Psychiatric genomics: an update and an agenda. AJP. (2018) 175:15–27. doi: 10.1176/appi.ajp.2017.17030283

4. Saxe GN, Bickman L, Ma S, Aliferis C. Mental health progress requires causal diagnostic nosology and scalable causal discovery. Front Psychiatry. (2022) 13:898789. doi: 10.3389/fpsyt.2022.898789

5. Harris MG, Kazdin AE, Chiu WT, Sampson NA, Aguilar-Gaxiola S, Al-Hamzawi A, et al. Findings from world mental health surveys of the perceived helpfulness of treatment for patients with major depressive disorder. JAMA Psychiatry. (2020) 77:830–41. doi: 10.1001/jamapsychiatry.2020.1107

6. Newson JJ, Pastukh V, Thiagarajan TC. Poor separation of clinical symptom profiles by DSM-5 disorder criteria. Front Psychiatry. (2021) 12:775762. doi: 10.3389/fpsyt.2021.775762

7. Insel T, Cuthbert B, Garvey M, Heinssen R, Pine DS, Quinn K, et al. Research domain criteria (RDoC): toward a new classification framework for research on mental disorders. AJP. (2010) 167:748–51. doi: 10.1176/appi.ajp.2010.09091379

8. Kapur S, Phillips AG, Insel TR. Why has it taken so long for biological psychiatry to develop clinical tests and what to do about it? Mol Psychiatry. (2012) 17:1174–9. doi: 10.1038/mp.2012.105

9. Brückl TM, Spoormaker VI, Sämann PG, Brem A-K, Henco L, Czamara D, et al. The biological classification of mental disorders (BeCOME) study: a protocol for an observational deep-phenotyping study for the identification of biological subtypes. BMC Psychiatry. (2020) 20:213. doi: 10.1186/s12888-020-02541-z

10. Caetano Dos Santos FL, Wojciechowska U, Michalek IM, Didkowska J. Progress in cancer survival across last two decades: A nationwide study of over 1.2 million Polish patients diagnosed with the most common cancers. Cancer Epidemiol. (2022) 78:102147. doi: 10.1016/j.canep.2022.102147

11. Fauci AS, Lane HC. Four decades of HIV/AIDS — Much accomplished, much to do. New Engl J Med. (2020) 383:1–4. doi: 10.1056/NEJMp1916753

12. Nathan DM. Diabetes: advances in diagnosis and treatment. JAMA. (2015) 314:1052–62. doi: 10.1001/jama.2015.9536

13. Norrving B. Stroke management - recent advances and residual challenges. Nat Rev Neurol. (2019) 15:69–71. doi: 10.1038/s41582-018-0129-1

14. Weisfeldt ML, Zieman SJ. Advances in the prevention and treatment of cardiovascular disease. Health Aff (Millwood). (2007) 26:25–37. doi: 10.1377/hlthaff.26.1.25

15. Cuijpers P, Stringaris A, Wolpert M. Treatment outcomes for depression: challenges and opportunities. Lancet Psychiatry. (2020) 7:925–7. doi: 10.1016/S2215-0366(20)30036-5

16. Freedland KE, Zorumski CF. Success rates in psychiatry. JAMA Psychiatry. (2023) 80:407–8. doi: 10.1001/jamapsychiatry.2023.0056

17. Zipursky RB. Why are the outcomes in patients with schizophrenia so poor? J Clin Psychiatry. (2014) 75 Suppl 2:20–4. doi: 10.4088/JCP.13065su1.05

18. Leichsenring F, Steinert C, Rabung S, Ioannidis JPA. The efficacy of psychotherapies and pharmacotherapies for mental disorders in adults: an umbrella review and meta-analytic evaluation of recent meta-analyses. World Psychiatry. (2022) 21:133–45. doi: 10.1002/wps.20941

19. CDC. Youth Risk Behavior Survey: Data Summary & Trends Report (2023). Available online at: https://www.cdc.gov/media/releases/2023/p0213-yrbs.html.

20. World Health Organization. Suicide (2021). Available online at: https://www.who.int/news-room/fact-sheets/detail/suicide (Accessed July 19, 2023).

21. CDC. Preventing Suicide (2023). Available online at: https://www.cdc.gov/suicide/pdf/NCIPC-Suicide-FactSheet-508_FINAL.pdf.

22. Santomauro DF, Herrera AMM, Shadid J, Zheng P, Ashbaugh C, Pigott DM, et al. Global prevalence and burden of depressive and anxiety disorders in 204 countries and territories in 2020 due to the COVID-19 pandemic. Lancet. (2021) 398:1700–12. doi: 10.1016/S0140-6736(21)02143-7

23. World Health Organization. Mental Health and COVID-19: Early evidence of the pandemic’s impact: Scientific brief (2022). Available online at: https://www.who.int/publications-detail-redirect/WHO-2019-nCoV-Sci_Brief-Mental_health-2022.1 (Accessed July 12, 2023).

24. Xiong J, Lipsitz O, Nasri F, Lui LMW, Gill H, Phan L, et al. Impact of COVID-19 pandemic on mental health in the general population: A systematic review. J Affect Disord. (2020) 277:55–64. doi: 10.1016/j.jad.2020.08.001

25. Stone AA, Schwartz JE, Broderick JE, Deaton A. A snapshot of the age distribution of psychological well-being in the United States. Proc Natl Acad Sci USA. (2010) 107:9985–90. doi: 10.1073/pnas.1003744107

26. Sapien Labs. Mental State of the World 2022 (2022). Available online at: https://mentalstateoftheworld.report/.

27. Stein DJ, Shoptaw SJ, Vigo DV, Lund C, Cuijpers P, Bantjes J, et al. Psychiatric diagnosis and treatment in the 21st century: paradigm shifts versus incremental integration. World Psychiatry. (2022) 21:393–414. doi: 10.1002/wps.20998

28. Steel Z, Marnane C, Iranpour C, Chey T, Jackson JW, Patel V, et al. The global prevalence of common mental disorders: a systematic review and meta-analysis 1980–2013. Int J Epidemiol. (2014) 43:476–93. doi: 10.1093/ije/dyu038

29. Huys QJM, Maia TV, Frank MJ. Computational psychiatry as a bridge from neuroscience to clinical applications. Nat Neurosci. (2016) 19:404–13. doi: 10.1038/nn.4238

30. Bzdok D, Meyer-Lindenberg A. Machine learning for precision psychiatry: opportunities and challenges. Biol Psychiatry: Cogn Neurosci Neuroimaging. (2018) 3:223–30. doi: 10.1016/j.bpsc.2017.11.007

31. Ray A, Bhardwaj A, Malik YK, Singh S, Gupta R. Artificial intelligence and Psychiatry: An overview. Asian J Psychiatr. (2022) 70:103021. doi: 10.1016/j.ajp.2022.103021

32. Rutledge RB, Chekroud AM, Huys QJ. Machine learning and big data in psychiatry: toward clinical applications. Curr Opin Neurobiol. (2019) 55:152–9. doi: 10.1016/j.conb.2019.02.006

33. APA. Diagnostic and statistical manual of mental disorders, 5th ed (Arlington, VA, United States: American Psychiatric Publishing, Inc.). (2013).

34. WHO. International statistical classification of diseases and related health problems, 11th ed (Geneva, Switzerland: World Health Organization (WHO)). (2018).

35. Allsopp K, Read J, Corcoran R, Kinderman P. Heterogeneity in psychiatric diagnostic classification. Psychiatry Res. (2019) 279:15–22. doi: 10.1016/j.psychres.2019.07.005

36. Wakefield JC. Diagnostic issues and controversies in DSM-5: return of the false positives problem. Annu Rev Clin Psychol. (2016) 12:105–32. doi: 10.1146/annurev-clinpsy-032814-112800

37. Alfano CA, Gamble AL. The role of sleep in childhood psychiatric disorders. Child Youth Care Forum. (2009) 38:327–40. doi: 10.1007/s10566-009-9081-y

38. Giannakopoulos G, Kolaitis G. Sleep problems in children and adolescents following traumatic life events. World J Psychiatry. (2021) 11:27–34. doi: 10.5498/wjp.v11.i2.27

39. Phiri D, Amelia VL, Muslih M, Dlamini LP, Chung M-H, Chang P-C. Prevalence of sleep disturbance among adolescents with substance use: a systematic review and meta-analysis. Child Adolesc Psychiatry Ment Health. (2023) 17:100. doi: 10.1186/s13034-023-00644-5

40. Plana-Ripoll O, Pedersen CB, Holtz Y, Benros ME, Dalsgaard S, de Jonge P, et al. Exploring comorbidity within mental disorders among a Danish national population. JAMA Psychiatry. (2019) 76:259. doi: 10.1001/jamapsychiatry.2018.3658

41. McGrath JJ, Lim CCW, Plana-Ripoll O, Holtz Y, Agerbo E, Momen NC, et al. Comorbidity within mental disorders: a comprehensive analysis based on 145 990 survey respondents from 27 countries. Epidemiol Psychiatr Sci. (2020) 29:e153. doi: 10.1017/S2045796020000633

42. Zimmerman M, Ellison W, Young D, Chelminski I, Dalrymple K. How many different ways do patients meet the diagnostic criteria for major depressive disorder? Compr Psychiatry. (2015) 56:29–34. doi: 10.1016/j.comppsych.2014.09.007

43. Galatzer-Levy IR, Bryant RA. 636,120 ways to have posttraumatic stress disorder. Perspect Psychol Sci. (2013) 8:651–62. doi: 10.1177/1745691613504115

44. Silk TJ, Malpas CB, Beare R, Efron D, Anderson V, Hazell P, et al. A network analysis approach to ADHD symptoms: More than the sum of its parts. PloS One. (2019) 14:e0211053. doi: 10.1371/journal.pone.0211053

45. Caspi A, Houts RM, Ambler A, Danese A, Elliott ML, Hariri A, et al. Longitudinal assessment of mental health disorders and comorbidities across 4 decades among participants in the Dunedin birth cohort study. JAMA Netw Open. (2020) 3:e203221. doi: 10.1001/jamanetworkopen.2020.3221

46. Copeland WE, Adair CE, Smetanin P, Stiff D, Briante C, Colman I, et al. Diagnostic transitions from childhood to adolescence to early adulthood. J Child Psychol Psychiatry. (2013) 54:791–9. doi: 10.1111/jcpp.12062

47. Costello EJ, Mustillo S, Erkanli A, Keeler G, Angold A. Prevalence and development of psychiatric disorders in childhood and adolescence. Arch Gen Psychiatry. (2003) 60:837–44. doi: 10.1001/archpsyc.60.8.837

48. Cuthbert BN, Insel TR. Toward the future of psychiatric diagnosis: the seven pillars of RDoC. BMC Med. (2013) 11:126. doi: 10.1186/1741-7015-11-126

49. Insel TR. The NIMH research domain criteria (RDoC) project: precision medicine for psychiatry. Am J Psychiatry. (2014) 171:395–7. doi: 10.1176/appi.ajp.2014.14020138

50. Dop D, Marcu IR, Padureanu R, Niculescu CE, Padureanu V. Pediatric autoimmune neuropsychiatric disorders associated with streptococcal infections (Review). Exp Ther Med. (2021) 21:94. doi: 10.3892/etm.2020.9526

51. Taki El-Din Z, Iqbal H, Sharma A. Neurosyphilis-induced psychosis: A unique presentation of syphilis with a primary psychiatric manifestation. Cureus. (2023) 15:e36080. doi: 10.7759/cureus.36080

52. Stilo SA, Murray RM. Non-genetic factors in schizophrenia. Curr Psychiatry Rep. (2019) 21:100. doi: 10.1007/s11920-019-1091-3

53. Alnæs D, Kaufmann T, van der Meer D, Córdova-Palomera A, Rokicki J, Moberget T, et al. Brain heterogeneity in schizophrenia and its association with polygenic risk. JAMA Psychiatry. (2019) 76:739–48. doi: 10.1001/jamapsychiatry.2019.0257

54. Van Schalkwyk GI, Bhalla IP, Griepp M, Kelmendi B, Davidson L, Pittenger C. Toward Understanding the Heterogeneity in OCD: Evidence from narratives in adult patients. Aust N Z J Psychiatry. (2016) 50:74–81. doi: 10.1177/0004867415579919

55. Samuthpongtorn C, Nguyen LH, Okereke OI, Wang DD, Song M, Chan AT, et al. Consumption of ultraprocessed food and risk of depression. JAMA Netw Open. (2023) 6:e2334770. doi: 10.1001/jamanetworkopen.2023.34770

56. LeMoult J, Humphreys KL, Tracy A, Hoffmeister J-A, Ip E, Gotlib IH. Meta-analysis: exposure to early life stress and risk for depression in childhood and adolescence. J Am Acad Child Adolesc Psychiatry. (2020) 59:842–55. doi: 10.1016/j.jaac.2019.10.011

57. Albrecht JS, Barbour L, Abariga SA, Rao V, Perfetto EM. Risk of depression after traumatic brain injury in a large national sample. J Neurotrauma. (2019) 36:300–7. doi: 10.1089/neu.2017.5608

58. Fried EI, Nesse RM. Depression is not a consistent syndrome: An investigation of unique symptom patterns in the STAR*D study. J Affect Disord. (2015) 172:96–102. doi: 10.1016/j.jad.2014.10.010

59. Cheng WA, Turner L, Marentes Ruiz CJ, Tanaka ML, Congrave-Wilson Z, Lee Y, et al. Clinical manifestations of COVID-19 differ by age and obesity status. Influenza Other Respir Viruses. (2022) 16:255–64. doi: 10.1111/irv.12918

60. Center for Substance Abuse Treatment (US). Understanding the Impact of Trauma, in: Trauma-Informed Care in Behavioral Health Services (2014). Substance Abuse and Mental Health Services Administration (US). Available online at: https://www.ncbi.nlm.nih.gov/books/NBK207191/ (Accessed October 19, 2023).

61. Firth J, Gangwisch JE, Borsini A, Wootton RE, Mayer EA. Food and mood: how do diet and nutrition affect mental wellbeing? BMJ. (2020) 369:m2382. doi: 10.1136/bmj.m2382

62. Moncrieff J. Against the stream: Antidepressants are not antidepressants – an alternative approach to drug action and implications for the use of antidepressants. BJPsych Bull. (2018) 42:42–4. doi: 10.1192/bjb.2017.11

63. Almohammed OA, Alsalem AA, Almangour AA, Alotaibi LH, Yami MSA, Lai L. Antidepressants and health-related quality of life (HRQoL) for patients with depression: Analysis of the medical expenditure panel survey from the United States. PloS One. (2022) 17:e0265928. doi: 10.1371/journal.pone.0265928

64. Cipriani A, Furukawa TA, Salanti G, Chaimani A, Atkinson LZ, Ogawa Y, et al. Comparative efficacy and acceptability of 21 antidepressant drugs for the acute treatment of adults with major depressive disorder: a systematic review and network meta-analysis. Lancet. (2018) 391:1357–66. doi: 10.1016/S0140-6736(17)32802-7

65. Abi-Dargham A, Moeller SJ, Ali F, DeLorenzo C, Domschke K, Horga G, et al. Candidate biomarkers in psychiatric disorders: state of the field. World Psychiatry. (2023) 22:236–62. doi: 10.1002/wps.21078

66. Berk M. Biomarkers in psychiatric disorders: status quo, impediments and facilitators. World Psychiatry. (2023) 22:174–6. doi: 10.1002/wps.21071

67. Yatham LN. Biomarkers for clinical use in psychiatry: where are we and will we ever get there? World Psychiatry. (2023) 22:263. doi: 10.1002/wps.21079

68. Volkow ND, Koob GF, Croyle RT, Bianchi DW, Gordon JA, Koroshetz WJ, et al. The conception of the ABCD study: From substance use to a broad NIH collaboration. Dev Cogn Neurosci. (2018) 32:4–7. doi: 10.1016/j.dcn.2017.10.002

69. Karcher NR, Barch DM. The ABCD study: understanding the development of risk for mental and physical health outcomes. Neuropsychopharmacol. (2021) 46:131–42. doi: 10.1038/s41386-020-0736-6

70. Auchter AM, Hernandez Mejia M, Heyser CJ, Shilling PD, Jernigan TL, Brown SA, et al. A description of the ABCD organizational structure and communication framework. Dev Cogn Neurosci. (2018) 32:8–15. doi: 10.1016/j.dcn.2018.04.003

71. Casey BJ, Cannonier T, Conley MI, Cohen AO, Barch DM, Heitzeg MM, et al. The Adolescent Brain Cognitive Development (ABCD) study: Imaging acquisition across 21 sites. Dev Cogn Neurosci. (2018) 32:43–54. doi: 10.1016/j.dcn.2018.03.001

72. Garavan H, Bartsch H, Conway K, Decastro A, Goldstein RZ, Heeringa S, et al. Recruiting the ABCD sample: Design considerations and procedures. Dev Cogn Neurosci. (2018) 32:16–22. doi: 10.1016/j.dcn.2018.04.004

73. Saragosa-Harris NM, Chaku N, MacSweeney N, Guazzelli Williamson V, Scheuplein M, Feola B, et al. A practical guide for researchers and reviewers using the ABCD Study and other large longitudinal datasets. Dev Cogn Neurosci. (2022) 55:101115. doi: 10.1016/j.dcn.2022.101115

74. Barch DM, Albaugh MD, Avenevoli S, Chang L, Clark DB, Glantz MD, et al. Demographic, physical and mental health assessments in the adolescent brain and cognitive development study: Rationale and description. Dev Cogn Neurosci. (2018) 32:55–66. doi: 10.1016/j.dcn.2017.10.010

75. Luciana M, Bjork JM, Nagel BJ, Barch DM, Gonzalez R, Nixon SJ, et al. Adolescent neurocognitive development and impacts of substance use: Overview of the adolescent brain cognitive development (ABCD) baseline neurocognition battery. Dev Cogn Neurosci. (2018) 32:67–79. doi: 10.1016/j.dcn.2018.02.006

76. Fan CC, Marshall A, Smolker H, Gonzalez MR, Tapert SF, Barch DM, et al. Adolescent Brain Cognitive Development (ABCD) study Linked External Data (LED): Protocol and practices for geocoding and assignment of environmental data. Dev Cogn Neurosci. (2021) 52:101030. doi: 10.1016/j.dcn.2021.101030

77. Uban KA, Horton MK, Jacobus J, Heyser C, Thompson WK, Tapert SF, et al. Biospecimens and the ABCD study: Rationale, methods of collection, measurement and early data. Dev Cogn Neurosci. (2018) 32:97–106. doi: 10.1016/j.dcn.2018.03.005

78. Lisdahl KM, Sher KJ, Conway KP, Gonzalez R, Feldstein Ewing SW, Nixon SJ, et al. Adolescent brain cognitive development (ABCD) study: Overview of substance use assessment methods. Dev Cogn Neurosci. (2018) 32:80–96. doi: 10.1016/j.dcn.2018.02.007

79. Freis SM, Morrison CL, Lessem JM, Hewitt JK, Friedman NP. Genetic and environmental influences on executive functions and intelligence in middle childhood. Dev Sci. (2022) 25:e13150. doi: 10.1111/desc.13150

80. Paulich KN, Ross JM, Lessem JM, Hewitt JK. Screen time and early adolescent mental health, academic, and social outcomes in 9- and 10- year old children: Utilizing the Adolescent Brain Cognitive Development SM (ABCD) Study. PloS One. (2021) 16:e0256591. doi: 10.1371/journal.pone.0256591

81. Kolk SM, Rakic P. Development of prefrontal cortex. Neuropsychopharmacol. (2022) 47:41–57. doi: 10.1038/s41386-021-01137-9

82. Nagata JM, Cortez CA, Cattle CJ, Ganson KT, Iyer P, Bibbins-Domingo K, et al. Screen time use among US adolescents during the COVID-19 pandemic: findings from the adolescent brain cognitive development (ABCD) study. JAMA Pediatr. (2022) 176:94–6. doi: 10.1001/jamapediatrics.2021.4334

83. Nagata JM, Singh G, Yang JH, Smith N, Kiss O, Ganson KT, et al. Bedtime screen use behaviors and sleep outcomes: Findings from the Adolescent Brain Cognitive Development (ABCD) Study. Sleep Health: J Natl Sleep Foundation. (2023) 9:497–502. doi: 10.1016/j.sleh.2023.02.005

84. Cheng W, Rolls E, Gong W, Du J, Zhang J, Zhang X-Y, et al. Sleep duration, brain structure, and psychiatric and cognitive problems in children. Mol Psychiatry. (2021) 26:3992–4003. doi: 10.1038/s41380-020-0663-2

85. Yang FN, Xie W, Wang Z. Effects of sleep duration on neurocognitive development in early adolescents in the USA: a propensity score matched, longitudinal, observational study. Lancet Child Adolesc Health. (2022) 6:705–12. doi: 10.1016/S2352-4642(22)00188-2

86. Taylor J, Sukhoi O, Newson J, Thiagarajan T. Representativeness of the Global Mind Project Data for the United States (2023). Available online at: https://osf.io/p9ur6 (Accessed December 22, 2023).

87. Newson JJ, Hunter D, Thiagarajan TC. The heterogeneity of mental health assessment. Front Psychiatry. (2020) 11:76. doi: 10.3389/fpsyt.2020.00076

88. Sapien Labs. Age of first Smartphone/Tablet and Mental Wellbeing Outcomes (2023). Available online at: https://sapienlabs.org/age-of-first-smartphone-tablet-and-mental-wellbeing-outcomes/.

89. Sapien Labs. Ultra-processed food consumption and mental wellbeing outcomes (2023). Available online at: https://sapienlabs.org/consumption-of-ultra-processed-food-and-mental-wellbeing-outcomes/.

90. Bala J, Newson J, Thiagarajan T. Hierarchy of Demographic and Social Determinants of Mental Health (Charlottesville, VA, USA). (2023). doi: 10.31219/osf.io/k8h3u.

91. Jain AK, Murty MN, Flynn PJ. Data clustering: a review. ACM Comput Surv. (1999) 31:264–323. doi: 10.1145/331499.331504

92. Jolliffe I. Principal Component Analysis. 2nd ed. New York, USA: Springer (2014).

93. Zhang Z, Murtagh F, Van Poucke S, Lin S, Lan P. Hierarchical cluster analysis in clinical research with heterogeneous study population: highlighting its visualization with R. Ann Transl Med. (2017) 5:75. doi: 10.21037/atm.2017.02.05

94. Hartigan JA, Wong MA. Algorithm AS 136: A K-Means clustering algorithm. J R Stat Soc Ser C (Applied Statistics). (1979) 28:100–8. doi: 10.2307/2346830

95. Hahsler M, Piekenbrock M, Doran D. dbscan: fast density-based clustering with R. J Stat Software. (2019) 91:1–30. doi: 10.18637/jss.v091.i01

96. Reynolds D. Gaussian Mixture Models. In: Encyclopedia of Biometrics. Springer, Boston, MA (2009).

97. Estivill-Castro V. Why so many clustering algorithms: a position paper. SIGKDD Explor Newsl. (2002) 4:65–75. doi: 10.1145/568574.568575

98. Rodriguez MZ, Comin CH, Casanova D, Bruno OM, Amancio DR, Costa L da F, et al. Clustering algorithms: A comparative approach. PloS One. (2019) 14:e0210236. doi: 10.1371/journal.pone.0210236

99. Cox DR. The regression analysis of binary sequences. J R Stat Soc Ser B (Methodological). (1958) 20:215–42.

100. Yan X. Linear Regression Analysis: Theory and Computing (2009). Available online at: https://www.worldscientific.com/doi/epdf/10.1142/6986 (Accessed December 21, 2023).

101. Friedman JH. Greedy function approximation: A gradient boosting machine. Ann Stat. (2001) 29:1189–232. doi: 10.1214/aos/1013203451

102. Chen T, Guestrin C. (2016). XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA. pp. 785–94. New York, USA: Association for Computing Machinery. doi: 10.1145/2939672.2939785

103. Ho TK. (1995). Random decision forests, in: Proceedings of 3rd International Conference on Document Analysis and Recognition, (Piscataway, NJ, USA: Institute of Electrical and Electronics Engineers), Vol. 1. pp. 278–82. doi: 10.1109/ICDAR.1995.598994

104. Breiman L. Random forests. Mach Learn. (2001) 45:5–32. doi: 10.1023/A:1010933404324

105. Vikramkumar, Vijaykumar B, Trilochan. Bayes and Naive Bayes Classifier. (2014). doi: 10.48550/arXiv.1404.0933

106. Lundberg S, Lee S-I. A unified approach to interpreting model predictions. (2017). doi: 10.48550/arXiv.1705.07874

107. Freedman DA. Graphical models for causation, and the identification problem. Eval Rev. (2004) 28:267–93. doi: 10.1177/0193841X04266432

108. Pearl J. Causal diagrams for empirical research. Biometrika. (1995) 82:669–88. doi: 10.1093/biomet/82.4.669

109. Spechler P, Thompson W, Paulus M. P17. Prenatal cannabis exposure moderates the relationship between sleep hours and internalizing problems: A causal inference analysis of ABCD data. Biol Psychiatry. (2022) 91:S94. doi: 10.1016/j.biopsych.2022.02.252

110. Cyders MA, Littlefield AK, Coffey S, Karyadi KA. Examination of a short version of the UPPS-P impulsive behavior scale. Addict Behav. (2014) 39:1372–6. doi: 10.1016/j.addbeh.2014.02.013

111. Loewy RL, Pearson R, Vinogradov S, Bearden CE, Cannon TD. Psychosis risk screening with the prodromal questionnaire – brief version (PQ-B). Schizophr Res. (2011) 129:42–6. doi: 10.1016/j.schres.2011.03.029

112. Athey S, Imbens G. Recursive partitioning for heterogeneous causal effects. Proc Natl Acad Sci. (2016) 113:7353–60. doi: 10.1073/pnas.1510489113

113. Wager S, Athey S. Estimation and inference of heterogeneous treatment effects using random forests. J Am Stat Assoc. (2018) 113:1228–42. doi: 10.1080/01621459.2017.1319839

114. Steiner PM, Shadish WR, Sullivan KJ. Frameworks for causal inference in psychological science. In: APA handbook of research methods in psychology: Foundations, planning, measures, and psychometrics, Vol. 1, 2nd ed. APA Handbooks in Psychology®. American Psychological Association, Washington, DC, US (2023). p. 23–56. doi: 10.1037/0000318-002

Keywords: big data, mental health, MHQ, ABCD, Global Mind Project, machine learning, AI, causal factors

Citation: Newson JJ, Bala J, Giedd JN, Maxwell B and Thiagarajan TC (2024) Leveraging big data for causal understanding in mental health: a research framework. Front. Psychiatry 15:1337740. doi: 10.3389/fpsyt.2024.1337740

Received: 13 November 2023; Accepted: 01 February 2024;
Published: 19 February 2024.

Edited by:

Jasmin Vassileva, Virginia Commonwealth University, United States

Reviewed by:

Robert Whelan, Trinity College Dublin, Ireland
Francesco Monaco, Azienda Sanitaria Locale Salerno, Italy

Copyright © 2024 Newson, Bala, Giedd, Maxwell and Thiagarajan. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Tara C. Thiagarajan, tara@sapienlabs.org
