
OPINION article

Front. Behav. Econ.
Sec. Behavioral Public Policy
Volume 3 - 2024 | doi: 10.3389/frbhe.2024.1503793

Harnessing pluralism in behavioural public policy requires insights from computational social science

Provisionally accepted
  • 1 VU Amsterdam, Amsterdam, Netherlands
  • 2 University of Trento, Trento, Trentino-Alto Adige/Südtirol, Italy


    The current focus of behavioural research is highly applied and interdisciplinary, which has blurred the conceptual boundaries between different behavioural techniques (see Table 1 below for an overview). This trend has made it difficult to distinguish between the various toolkits, reducing them to mere labels. For instance, the concept of nudging, originally centred on reflexive psychological prompts such as defaults, has expanded to include more deliberative interventions like educative nudges, overlapping with other techniques, such as boosting, that aim to enhance human capabilities. This overlap is evident in how educative nudges and short-term boosts both address specific problems in a similar manner [2, p. 397]. Additionally, nudging and boosting share similarities with "thinks" [3], large-scale educational policies designed to improve societal decision-making. Similarly, nudge+ interventions, which combine nudging with reflective prompts, overlap with system-2 nudges, as both promote deliberative responses to behavioural problems. Recently, scholars have argued that nudge+ and boosting both enhance agency, grouping them together within a behavioural agency framework [4]. The lack of clear distinctions between these toolkits has made them less useful for practitioners and potentially more confusing to navigate.

Table 1: Pluralism in behavioural public policy toolkits (table content not reproduced here).

Despite this, practitioners have extensively drawn on pluralism in behavioural public policy by relying on different behavioural interventions (BIs). For instance, behavioural "nudge units" have implemented approaches that are predominantly pluralistic rather than strictly labelled: the MINDSPACE report, an early classification attempt in behavioural public policy, assembled a variety of BIs that went beyond simple nudges. Scholars advocating for these alternative toolkits argue that conceptual clarification aids comparative analysis.
However, why has this not been particularly useful thus far? One explanation is that efforts to delineate BIs and establish clear guidelines have been hampered by limitations in the data and methods available for inferring causal mechanisms and establishing precursors. For instance, factors such as motivation and conscientiousness have been proposed as necessary prerequisites for boosting people's competencies [5]. However, determining whether individuals meet these criteria requires identifying specific groups, either through behavioural profiling before intervention delivery or by reassessing them afterwards based on their response to treatment; both are challenging tasks. In general, understanding heterogeneity and meeting the prerequisites for effectively administering BIs has remained a methodological challenge tied to causally identifying diverse treatment effects. The difficulty is amplified by the lack of a clear conceptual framework for tailoring interventions to individuals, and by the inability to causally infer tailoring effects even when this is attempted, because self-selection bias violates experimental conditions such as the stable unit treatment value assumption (SUTVA).

In this article, we outline new advances in computational social science methods, including large language models, and discuss how these methods can be used alongside conventional behavioural public policy (BPP) toolkits to uncover heterogeneity in effect sizes and thereby understand the mechanisms of behaviour change. We propose that these recent computational developments make it possible to harness pluralism in the policymaker's toolkit, which in turn is key to personalising the delivery of behavioural interventions. Recent progress in causal inference, particularly the integration of machine learning algorithms within computational social science, has marked a significant advance.
We now possess various techniques for estimating heterogeneous treatment effects using causal machine learning, applicable to both experimental and observational data. From causal trees and forests to meta-learners and Bayesian approaches (see Table 2 for an overview of their features), these methodologies have transformed our understanding of the subtleties of treatment effects across subpopulations, enabling us to identify precursors and mechanisms of behaviour change.

Causal trees and forests, pioneered by Athey and Imbens [6], use the hierarchical structure of decision trees to divide data into subgroups with distinct treatment effects. By iteratively splitting the data on interacting covariates, these methods pinpoint the most relevant variations in treatment effects. For example, a causal tree might first split the data by age group and then by cognitive reflection capacity (measured with cognitive reflection tests, CRTs) within each age group, highlighting how treatment effects differ across these dimensions. The resulting tree, or ensemble of trees, offers an interpretable, data-driven approach to exploring treatment effect diversity across groups of people who vary in their reflective potential. Causal trees and forests are particularly useful when the treatment effect varies with observable characteristics, as they can identify the specific subgroups that benefit most from the treatment. If, for example, an age group with higher reflective potential shows larger uptake of the treatment, this localised effect could suggest that such groups are more amenable to reflective BIs, such as a nudge+ rather than a nudge.

Meta-learners such as the X-learner [7] and the R-learner [8] adopt a different strategy, combining multiple machine learning models to estimate the conditional average treatment effect (CATE).
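The splitting logic behind causal trees can be illustrated with a deliberately simple toy sketch: pick the covariate threshold whose two leaves differ most in their estimated treatment effects. The data, variable names, and single-split procedure below are all hypothetical simplifications; the actual estimator of Athey and Imbens [6] splits recursively and uses honest sample-splitting.

```python
# Toy illustration of one causal-tree-style split: choose the threshold that
# maximises the divergence in estimated treatment effects between the leaves.
# Hypothetical data; real causal trees split recursively with honest samples.

def avg_effect(rows):
    """Difference in mean outcomes between treated (d=1) and control (d=0)."""
    treated = [r["y"] for r in rows if r["d"] == 1]
    control = [r["y"] for r in rows if r["d"] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def best_split(rows, covariate, thresholds):
    """Pick the threshold whose two leaves differ most in treatment effect."""
    best = None
    for t in thresholds:
        left = [r for r in rows if r[covariate] <= t]
        right = [r for r in rows if r[covariate] > t]
        # Each leaf needs both treated and control units to estimate an effect.
        if any(not [r for r in leaf if r["d"] == d]
               for leaf in (left, right) for d in (0, 1)):
            continue
        gap = abs(avg_effect(left) - avg_effect(right))
        if best is None or gap > best[1]:
            best = (t, gap, avg_effect(left), avg_effect(right))
    return best

# Hypothetical sample: y = outcome, d = treatment indicator, age = covariate.
sample = [
    {"age": 25, "d": 1, "y": 1.0}, {"age": 25, "d": 0, "y": 0.9},
    {"age": 30, "d": 1, "y": 1.1}, {"age": 30, "d": 0, "y": 1.0},
    {"age": 60, "d": 1, "y": 2.0}, {"age": 60, "d": 0, "y": 1.0},
    {"age": 65, "d": 1, "y": 2.2}, {"age": 65, "d": 0, "y": 1.1},
]
threshold, gap, effect_young, effect_old = best_split(sample, "age", [28, 45, 62])
print(threshold, round(effect_young, 2), round(effect_old, 2))  # 45 0.1 1.05
```

Here the procedure recovers an age split at which the younger leaf shows a small effect and the older leaf a large one, the kind of localised heterogeneity described above.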
Typically, meta-learners entail training separate models for the treatment and control groups and then merging their predictions to estimate the CATE. For instance, continuing the example above, the X-learner would train one model on the treated units and another on the control units within each age group and CRT level, using the difference between their predictions to estimate the treatment effect for each subgroup. By leveraging the strengths of diverse machine learning algorithms, meta-learners can yield more precise and robust estimates of treatment effect variability. They are advantageous when the relationship between the covariates and the treatment effect is complex and cannot easily be captured by a single model.

Bayesian techniques such as Bayesian Additive Regression Trees (BART) [9] and Bayesian Causal Forests (BCF) [10] provide a probabilistic framework for estimating heterogeneous treatment effects. These methods integrate prior knowledge and uncertainty quantification into the estimation process. By sampling from the posterior distribution of treatment effects, Bayesian approaches furnish point estimates and credible intervals that quantify the uncertainty surrounding the estimated effects. For example, BART can analyse treatment effects within the same subgroups of age and CRT level, generating a range of plausible treatment effects and giving researchers a sense of how confident they can be in the estimates and where the true effect is likely to lie. Bayesian methods are particularly useful when prior knowledge about the treatment effect is available or when quantifying the uncertainty of the estimates is crucial.

The latest opportunities come from applying large language models (LLMs) in this context. LLMs offer significant potential to enhance experimental designs aimed at exploring heterogeneity in treatment effects.
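The meta-learner recipe described earlier, separate outcome models for treated and control units whose predictions are differenced, can be sketched with a deliberately crude "model": the mean outcome within each (age group, CRT level) cell. The data and cell labels are hypothetical; a real X-learner [7] would plug in flexible regressors and add a second imputation stage with propensity weighting.

```python
# Minimal meta-learner-style sketch: fit one outcome "model" per arm, then
# estimate the CATE for each subgroup as the difference of predictions.
# Each "model" here is just a per-cell mean; real meta-learners use flexible
# regressors (boosting, forests, neural networks) in its place.

from collections import defaultdict

def fit_subgroup_means(rows):
    """Outcome model: average y within each (age_group, crt_level) cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for r in rows:
        key = (r["age_group"], r["crt"])
        sums[key][0] += r["y"]
        sums[key][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

def estimate_cate(data):
    mu1 = fit_subgroup_means([r for r in data if r["d"] == 1])  # treated model
    mu0 = fit_subgroup_means([r for r in data if r["d"] == 0])  # control model
    # CATE per cell = predicted treated outcome minus predicted control outcome.
    return {k: round(mu1[k] - mu0[k], 2) for k in mu1 if k in mu0}

# Hypothetical observations: d = treatment indicator, y = outcome.
data = [
    {"age_group": "young", "crt": "high", "d": 1, "y": 0.8},
    {"age_group": "young", "crt": "high", "d": 0, "y": 0.5},
    {"age_group": "old", "crt": "low", "d": 1, "y": 0.6},
    {"age_group": "old", "crt": "low", "d": 0, "y": 0.55},
]
print(estimate_cate(data))  # {('young', 'high'): 0.3, ('old', 'low'): 0.05}
```

The output makes the heterogeneity explicit: the estimated effect for young, high-CRT participants is several times that for the older, low-CRT cell.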
By leveraging LLMs, researchers can generate synthetic participants to complement real study participants, addressing underrepresentation and allowing the exploration of rare trait combinations. This capability enables better covariate balancing and more robust model validation. LLMs can also be integrated into adaptive randomisation strategies, propensity score modelling, and outcome prediction, creating a multi-stage process that improves dynamically as the study progresses. These models can generate hypothetical scenarios, identify confounding factors, and refine propensity score models, ultimately improving the allocation of participants to different interventions. Furthermore, LLMs can simulate potential outcomes, aiding sequential randomisation and response-adaptive allocation. This allows a more flexible and efficient exploration of treatment effect heterogeneity, as the experimental design can be continuously updated based on insights from both real and synthetic data. Integrating LLMs in this way promises to enhance study efficiency and the validity of findings on heterogeneous treatment effects, potentially revolutionising how we design and conduct experiments.

Uncovering heterogeneity in the uptake of behavioural interventions helps us understand their varying effectiveness, which in turn sheds light on the different mechanisms through which BIs operate. This understanding not only clarifies how these tools function, leading to conceptual refinement, but also enables us to customise and personalise interventions. For example, Krefeld-Schwalb and colleagues [11], through a series of large-scale online and offline experiments, underscore the need to understand omitted moderators that can explain why treatments vary in their implementation intensity.
Understanding this segmentation can, in turn, enable practitioners to choose between policies. Personalisation methods vary but can be broadly classified, as we propose here, into top-down and bottom-up approaches.

A top-down approach to personalisation relies on behavioural profiling. Clusters of individuals are first identified using previously available information on behavioural characteristics, such as demographics, socio-economic preferences, and cognitive abilities. Using clustering algorithms [12], it is possible to identify distinct clusters, predict the underlying economic and cognitive barriers that hinder the uptake of desirable behaviours, and assign specific behavioural interventions to each cluster. The top-down approach is often synonymous with ex-ante personalisation, that is, personalisation before an intervention is delivered.

By contrast, a bottom-up approach relies on response efficacy. A generic intervention is first administered to all individuals. Average causal effects of the treatment are then measured across different clusters or groups using computational approaches such as heterogeneity analysis via causal forests [13]. Identifying heterogeneous treatment effects enables us to determine what works best, and for whom, and to tailor behavioural interventions based on such a ranking. This data-driven bottom-up approach is often synonymous with ex-post personalisation, that is, personalisation after the intervention has been delivered.

By employing computational social science techniques, we can ultimately create metarules that enable practitioners to classify and design alternative behavioural toolkits in a streamlined and practical manner [14].
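The bottom-up (ex-post) logic can be sketched as a simple loop: estimate each intervention's average effect within each cluster, then assign every cluster the intervention that worked best for it. The cluster labels, intervention names, and outcome numbers below are hypothetical stand-ins; in practice the per-group effect estimates would come from methods such as causal forests [13].

```python
# Bottom-up (ex-post) personalisation sketch: after piloting several generic
# interventions, estimate each intervention's average effect within each
# cluster, then map every cluster to its best-performing intervention.
# All data here are hypothetical placeholders for causal-forest output.

def group_effects(results):
    """Mean outcome difference (treated - control) per (cluster, intervention)."""
    effects = {}
    for (cluster, intervention), obs in results.items():
        treated = [y for d, y in obs if d == 1]
        control = [y for d, y in obs if d == 0]
        effects[(cluster, intervention)] = (
            sum(treated) / len(treated) - sum(control) / len(control)
        )
    return effects

def personalise(effects):
    """Assign each cluster the intervention with the largest estimated effect."""
    best = {}
    for (cluster, intervention), effect in effects.items():
        if cluster not in best or effect > best[cluster][1]:
            best[cluster] = (intervention, effect)
    return {cluster: intervention for cluster, (intervention, _) in best.items()}

# (d, y) pairs: treatment indicator and observed outcome, hypothetical data.
results = {
    ("time-poor", "reminder"):    [(1, 0.9), (1, 0.8), (0, 0.4), (0, 0.5)],
    ("time-poor", "boost"):       [(1, 0.5), (1, 0.6), (0, 0.4), (0, 0.5)],
    ("deliberative", "reminder"): [(1, 0.5), (1, 0.5), (0, 0.45), (0, 0.5)],
    ("deliberative", "boost"):    [(1, 0.9), (1, 0.9), (0, 0.5), (0, 0.5)],
}
print(personalise(group_effects(results)))
# {'time-poor': 'reminder', 'deliberative': 'boost'}
```

The resulting cluster-to-intervention mapping is one concrete form such a metarule could take: a ranking of what works best, and for whom, derived after delivery rather than from prior profiling.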
Creating such metarules involves accurately identifying subpopulations with varying treatment effects and uncovering previously unrecognised sources of heterogeneity. For example, following field experiments on using reminders to improve the uptake of student financial aid, Athey, Keleher and Spiess [15] applied a bottom-up approach and found that text and email reminders worked best for students who were already somewhat predisposed to applying for financial aid. In contrast, students who were less likely to file for aid remained largely unaffected by the reminders. On this basis, they suggest avoiding expensive efforts to engage individuals who are unlikely to respond. Similarly, in the context of modern contraceptive methods, Athey and colleagues [16] suggest that "low-cost individualised recommendations can potentially be as effective in increasing unfamiliar technology adoption as providing large subsidies." While the evidence on the benefits of computational social science methods in behavioural science is growing, direct tests of personalised interventions against "one-size-fits-all" policies are largely missing.

Implementing computational methods for personalising behavioural interventions raises ethical concerns. While personalisation can improve effectiveness, it risks reinforcing societal inequalities if certain groups are excluded on the basis of predicted response rates. This "optimisation-fairness trade-off" could neglect vulnerable populations. There are also questions of algorithmic transparency and accountability, as both practitioners and subjects must be able to understand how interventions are assigned. Additionally, extensive data collection may infringe on privacy rights and autonomy, reinforcing claims of a "nanny-state" government.
Addressing these challenges requires clear governance frameworks that balance optimisation with equity, for example through equity audits of algorithms and transparent processes that allow individuals to understand and contest their intervention assignments. Overall, we advocate greater use of these machine learning methods to harness the diversity within behavioural public policy.

    Keywords: Behavioural public policy, computational social science, Large language models, heterogeneity, mechanisms, personalisation

    Received: 29 Sep 2024; Accepted: 14 Nov 2024.

    Copyright: © 2024 Banerjee and Veltri. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Sanchayan Banerjee, VU Amsterdam, Amsterdam, Netherlands

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.