CONCEPTUAL ANALYSIS article

Front. Artif. Intell., 08 May 2020
Sec. Machine Learning and Artificial Intelligence
This article is part of the Research Topic: Ethical Machine Learning and Artificial Intelligence (AI)

On Consequentialism and Fairness

  • 1Computer Science Department, Stanford University, Stanford, CA, United States
  • 2Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle, WA, United States
  • 3Allen Institute for AI, Seattle, WA, United States

Recent work on fairness in machine learning has primarily emphasized how to define, quantify, and encourage “fair” outcomes. Less attention has been paid, however, to the ethical foundations which underlie such efforts. Among the ethical perspectives that should be taken into consideration is consequentialism, the position that, roughly speaking, outcomes are all that matter. Although consequentialism is not free from difficulties, and although it does not necessarily provide a tractable way of choosing actions (because of the combined problems of uncertainty, subjectivity, and aggregation), it nevertheless provides a powerful foundation from which to critique the existing literature on machine learning fairness. Moreover, it brings to the fore some of the tradeoffs involved, including the problem of who counts, the pros and cons of using a policy, and the relative value of the distant future. In this paper we provide a consequentialist critique of common definitions of fairness within machine learning, as well as a machine learning perspective on consequentialism. We conclude with a broader discussion of the issues of learning and randomization, which have important implications for the ethics of automated decision making systems.

1. Introduction

In recent years, computer scientists have increasingly come to recognize that artificial intelligence (AI) systems have the potential to create harmful consequences. Especially within machine learning, there have been numerous efforts to formally characterize various notions of fairness and develop algorithms to satisfy these criteria. However, most of this research has proceeded without any nuanced discussion of ethical foundations. Partly as a response, there have been several recent calls to think more broadly about the ethical implications of AI (Barabas et al., 2018; Hu and Chen, 2018b; Torresen, 2018; Green, 2019).

Among the most prominent approaches to ethics within philosophy is a highly influential position known as consequentialism. Roughly speaking, the consequentialist believes that outcomes are all that matter, and that people should therefore endeavor to act so as to produce the best consequences, based on an impartial perspective as to what is best.

Although there are numerous difficulties with consequentialism in practice (see section 4), it nevertheless provides a clear and principled foundation from which to critique proposals which fall short of its ideals. In this paper, we analyze the literature on fairness within machine learning, and show how it largely depends on assumptions which the consequentialist perspective reveals immediately to be problematic. In particular, we make the following contributions:

• We provide an accessible overview of the main ideas of consequentialism (section 3), as well as a discussion of its difficulties (section 4), with a special emphasis on computational limitations.

• We review the dominant ideas about fairness in the machine learning literature (section 5), and provide the first critique of these ideas explicitly from the perspective of consequentialism (section 6).

• We conclude with a broader discussion of the ethical issues raised by learning and randomization, highlighting future directions for both AI and consequentialism (section 7).

2. Motivating Examples

Before providing a formal description of consequentialism (section 3), we will begin with a series of motivating examples which illustrate some of the difficulties involved. We consider three variations on decisions about lending money, a frequently-used example in discussions about fairness, and an area in which AI could have significant real-world consequences.

First, imagine being asked by a relative for a small personal loan. This would seem to be a relatively low-stakes decision involving a simple tradeoff (e.g., financial burden vs. familial strife). Although this decision could in principle have massive long term consequences (perhaps the relative will start a business that will have a large impact, etc.), it is the immediate consequences which will likely dominate the decision. On the other hand, treating this as a simple yes-or-no decision fails to recognize the full range of possibilities. A consequentialist might suggest that we consider all possible uses of the money, such as investing it, or lending it to someone in even greater need. Whereas commonsense morality might direct us to favor our relatives over strangers, the notion of impartiality inherent in consequentialism presents a challenge to this perspective, thus raising the problem of demandingness (section 4.4).

Second, consider a bank executive creating a policy to determine who will or will not be granted a loan. This policy will affect not only would-be borrowers, but also the financial health of the bank, its employees, etc. In this case, the bank will likely be bound by various forms of regulation which will constrain the policy. Even a decision maker with an impartial perspective will be bound by these laws (the breaking of which might entail severe negative consequences). In addition, the bank might wish to create a policy that will be perceived as fair, yet knowing the literature on machine learning fairness, they will know that no policy will simultaneously satisfy all criteria that have been proposed (section 5). Moreover, there may be a tradeoff between short-term profits and long-term success (section 4.2).

Finally, consider a legislator trying to craft legislation that will govern the space of policies that banks are allowed to use in determining who will get a loan. This is an even more high-level decision that could have even more far reaching consequences. As a democratic society, we may hope that those in government will work for the benefit of all (though this hope may often be disappointed in practice), but it is unclear how even a selfless legislator should balance all competing interests (section 4.1). Moreover, even if there were consensus on the desired outcome, determining the expected consequences of any particular governing policy will be extremely difficult, as banks will react to any such legislation, trying to maximize their own interests while respecting the letter of the law, thus raising the problem of uncertainty (section 4.3).

Although these scenarios are distinct, each of the issues raised applies to some extent in each case. As we will discuss, work on fairness within machine learning has focused primarily on the intermediate, institutional case, and has largely ignored the broader context. We will begin with an in-depth overview of consequentialism that engages with these difficulties, and then show that it nevertheless provides a useful critical perspective on conventional thinking about fairness within machine learning (section 6).

3. Consequentialism Defined

3.1. Overview

The literature on consequentialism is vast, including many nuances that will not concern us here. The most well-known expressions can be found in the writings of Jeremy Bentham (1970 [1781]) and John Stuart Mill (1979[1863]), later refined by philosophers such as Henry Sidgwick (1967), Elizabeth Anscombe (1958), Derek Parfit (1984), and Peter Singer (1993). The basic idea which unifies all of this thinking is that only the outcomes that result from our actions (i.e., the relative value of possible worlds that might exist in the future) have moral relevance.

Before proceeding, it is helpful to consider three lenses through which we can make sense of an ethical theory. First, we can consider a statement to be a claim about what would be objectively best, given some sort of full knowledge and understanding of the universe. Second, we can think of an ethical theory as a proposed guide for how someone should choose to act in a particular situation (which may only align partially with an objective perspective, due to limited information). Third, although less conventional, we can think of ethics as a way to interpret the actions taken by others. In the sense that “actions speak louder than words,” we can treat people's behavior as revealing of their view of what is morally correct (Greene and Haidt, 2002).

Although consequentialism is typically presented in a more abstract philosophical form (often illustrated via thought experiments), we will begin with a concise mathematical formulation of the two most common forms of consequentialism, known as act consequentialism and rule consequentialism. For the moment, we will intentionally adopt the objective perspective, before returning to practical difficulties below.

3.2. Act Consequentialism

First, consider the proposal known as act consequentialism. This theory says, simply, that the best action to take in any situation is the one that will produce the best outcomes (Smart and Williams, 1973; Railton, 1984). To be precise, let us define the set of possible actions, A, and an evaluation function v(·). According to act consequentialism, the best action to take is the one that will lead to the consequences with the greatest value, i.e.,

$a^* = \operatorname{argmax}_{a \in A} \, v(c_a),$    (1)

where v(c_a) computes the value of the consequences, c_a, which follow from taking action a. Importantly, note that c_a here represents not just the local or immediate consequences of a, but all consequences (Kagan, 1998). In other words, we can think of the decision as a branching point in the universe, and want to evaluate how it will unfold based on the action that is taken at a particular moment in time (Portmore, 2011).

While Equation (1) might seem tautological, it is by no means a universally agreed upon definition of what is best. For example, many deontological theories posit that certain actions should never be permitted (or that some might always be required), no matter what the consequences. In addition, there are some obvious difficulties with Equation (1), especially the question of how to define the evaluation function v(·). We will return to this and other difficulties below (section 4), but for the moment we will put them aside.

One might object that perhaps there is inherent randomness in the universe, leading to uncertainty about ca. In that case, we can sensibly define the optimal action in terms of the expected value of all future consequences, i.e.,

$a^* = \operatorname{argmax}_{a \in A} \, \mathbb{E}_{p(c \mid a)}[v(c)],$    (2)

where p(c | a) represents the true probability (according to the universe) that consequences c will follow from action a. That is, for each possible action, we would consider all possible outcomes which might result from that action, and sum their values, weighted by the respective probabilities that they will occur, recommending the action with the highest expected value.

To make the dependence on future consequences more explicit, it can be helpful to factor the expected value into a summation over time, optionally with some sort of discounting. Although consequentialism does not require that we factorize the value of the future in this way, it will prove convenient in further elaboration of these ideas. For the sake of simplicity, we will assume that time can be discretized into finite steps. A statement of act consequentialism using a simple geometric discounting factor would then be:

$a^* = \operatorname{argmax}_{a \in A} \, \sum_{t=0}^{\infty} \gamma^t \cdot \mathbb{E}_{p(s_{t+1} \mid a)}[v(s_{t+1})],$    (3)

where p(s_{t+1} | a) represents the probability that the universe will be in state s_{t+1} at time t + 1 if we take action a at time t = 0, and 0 ≤ γ ≤ 1 represents the discount factor. A discount factor of 0 means that only the immediate consequences of an action are relevant, whereas a discount factor of 1 means that all times in the future are valued equally^1.
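To make the computation implied by Equation (3) concrete, here is a minimal Python sketch that scores two candidate actions for the personal-loan example from section 2. The states, probabilities, values, and discount factor are hypothetical numbers chosen purely for illustration, and idealized access to the true distribution over futures is assumed.

```python
# A minimal sketch of act consequentialism with geometric discounting (Eq. 3).
# All states, probabilities, and values below are hypothetical illustrations;
# a real agent would not have access to the true distribution over futures.

GAMMA = 0.9  # discount factor (0 = only immediate consequences matter)

# For each action, a per-time-step distribution over named future states.
futures = {
    "lend": [
        {"repaid": 0.7, "defaulted": 0.3},           # consequences at t + 1
        {"trust_built": 0.6, "strained_ties": 0.4},  # consequences at t + 2
    ],
    "decline": [
        {"no_change": 0.8, "resentment": 0.2},
        {"no_change": 0.9, "estrangement": 0.1},
    ],
}

# v(s): the (contested!) evaluation function over states.
value = {
    "repaid": 1.0, "defaulted": -2.0, "trust_built": 2.0, "strained_ties": -1.0,
    "no_change": 0.0, "resentment": -0.5, "estrangement": -1.5,
}

def discounted_expected_value(per_step_distributions):
    """Sum over t of gamma^t * E[v(s_{t+1})], as in Equation (3)."""
    return sum(
        GAMMA ** t * sum(p * value[s] for s, p in dist.items())
        for t, dist in enumerate(per_step_distributions)
    )

best_action = max(futures, key=lambda a: discounted_expected_value(futures[a]))
print(best_action, {a: round(discounted_expected_value(f), 3) for a, f in futures.items()})
```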

3.3. Rule Consequentialism

The main alternative to act consequentialism is a variant known as rule consequentialism (Harsanyi, 1977; Hooker, 2002). As the name suggests, rule consequentialism is similar to act consequentialism, except that rather than focusing on the best action in each unique situation, it suggests that we should act according to a set of rules governing all situations, and adopt the set of rules which will lead to the best overall outcomes^2.

Here, we will refer to a set of rules as a policy, and allow for the policy to be stochastic. In other words, a policy, π, is a probability distribution over possible actions conditional on the present state s, i.e., π(s) ≜ p(a | s). To make a decision, an action is sampled randomly from this distribution^3. Using the same temporal factorization as above, we can formalize rule consequentialism as

$\pi^* = \operatorname{argmax}_{\pi \in \Pi} \, \mathbb{E}_{p(s_{t+1} \mid s_t, a_t)\,\pi(a_t \mid s_t)}\!\left[\sum_{t=0}^{\infty} \gamma^t \cdot v(s_{t+1})\right],$    (4)

where Π represents the space of possible policies, and the expectation is now taken with respect to the governing dynamics, in which actions are selected based on the state of the world, i.e., a_t ~ π(a_t | s_t), and the next state depends on the current state of the world and the action taken, i.e., s_{t+1} ~ p(s_{t+1} | s_t, a_t).
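As a rough illustration of the objective in Equation (4), the following sketch estimates the discounted value of two hypothetical stochastic policies by Monte Carlo rollouts in a made-up two-state world; the states, dynamics, and value function are assumptions introduced only for this example.

```python
import random

# A toy illustration of Equation (4): estimating the discounted value of a
# stochastic policy by sampling rollouts. The states, actions, dynamics, and
# value function below are all hypothetical.

GAMMA, HORIZON, ROLLOUTS = 0.95, 50, 2000
ACTIONS = ["cautious", "aggressive"]

def transition(state, action):
    """Sample s_{t+1} ~ p(s_{t+1} | s_t, a_t) for a made-up two-state world."""
    p_crisis = {"cautious": 0.05, "aggressive": 0.30}[action]
    if state == "crisis":
        p_crisis = min(1.0, p_crisis + 0.40)  # crises tend to persist
    return "crisis" if random.random() < p_crisis else "calm"

def value(state):
    """v(s): calm states are mildly good, crises are costly (invented numbers)."""
    return 1.0 if state == "calm" else -3.0

def policy_value(policy):
    """Monte Carlo estimate of E[sum_t gamma^t * v(s_{t+1})] under the policy."""
    total = 0.0
    for _ in range(ROLLOUTS):
        state = "calm"
        for t in range(HORIZON):
            action = random.choices(ACTIONS, weights=policy[state])[0]  # a_t ~ pi(a_t | s_t)
            state = transition(state, action)                           # s_{t+1} ~ p(. | s_t, a_t)
            total += GAMMA ** t * value(state)
    return total / ROLLOUTS

# Two candidate sets of rules (policies), as distributions over actions per state.
policies = {
    "mostly_cautious": {"calm": [0.9, 0.1], "crisis": [1.0, 0.0]},
    "mostly_aggressive": {"calm": [0.1, 0.9], "crisis": [0.5, 0.5]},
}
print({name: round(policy_value(p), 2) for name, p in policies.items()})
```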

While some have suggested that rule consequentialism is strictly inferior to act consequentialism, in that it fails to treat each situation as unique (Railton, 1984), others have argued for it, citing the inability of individuals to accurately determine the best action in each unique situation (Hooker, 2002), as well as benefits from coordination and incentives (Harsanyi, 1977). As noted by various papers (e.g., Abel et al., 2016), Equation (4) bears a striking resemblance to the problem of reinforcement learning^4. While this similarity is provocative, we will defer discussion of it (and the more general question of learning) until section 7.

It is important to emphasize that the above formulation is a highly stylized discussion of morality, largely divorced from reality, which tries to encapsulate a large body of philosophical writing put forward under the name “consequentialism.” Thinking about what this formulation has to tell us about how individuals make (or should make) choices requires further elaboration, which we revisit below (section 4).

3.4. Competing Ethical Frameworks

The primary contrasting proposals to consequentialism are (a) deontology; and (b) theories in the social contract tradition. As mentioned above, deontological theories posit that there are certain restrictions or requirements on action, a priori, which cannot be violated. For example, various religious traditions place restrictions on lending money, or require a certain level of charitable giving. Using the framework established above, we can describe deontological theories as constraints on the action space, A, or policy space, Π (Kagan, 1998). While they may accord more with our commonsense notions of morality (see section 4.4), deontological theories are open to challenge because of their inability to justify the particular constraints they specify, as well as the implication that they would fail to produce the best outcomes in certain scenarios (Smart and Williams, 1973; Scheffler, 1994).

By contrast, social contract theories are more concerned with determining the rules, or ways of organizing society, that a group of free and reasonable people would agree to in an idealized deliberative scenario^5. Most famously in this tradition, John Rawls suggested that we should imagine people designing society behind a “veil of ignorance,” not knowing what position they will hold in that society (Rawls, 1971). We cannot possibly do justice to these other schools of thought in the space available, but we note that there is value in thinking about sociotechnical systems from multiple ethical perspectives, and encourage others to elaborate on these points^6.

In this paper, we focus on consequentialism not because it is necessarily superior to the alternatives, but because it is influential, and because it might seem, at first glance, to have a natural affinity with machine learning and optimization. While there have been many papers providing brief summaries of various ethical theories and their relevance to AI, we believe that a more in-depth treatment is required to fully unpack the implications of each, and would encourage similar consideration of the above traditions, as well as virtue ethics, feminist ethics, etc.

Before discussing the problems with consequentialism, it is useful to note that the formulation given in Equation (4) highlights three important matters about which reasonable people might disagree, with respect to how we should act (alluded to in section 2): we might disagree about the relative value of different outcomes [the evaluation function, v(·)]; we might disagree about the likely effects of different actions [the probability of outcomes, p(s_{t+1} | s_t, a_t)]; and we might disagree about how much weight to place on the distant future (the discount factor, γ).

4. Difficulties of Consequentialism

Even if one accepts the idea in Equation (2), that the best action is the one that will produce the best outcome in expectation, with no a priori restrictions on the action space, there are still numerous difficulties with consequentialism, both in theory and in practice.

4.1. Value

Perhaps the most vexing part of consequentialism is the evaluation function, v(·). Even if one had perfect knowledge of how the universe would unfold conditional on each possible action, choosing the best action would still require some sort of objective way of characterizing the relative value of each possible outcome. Most writers on consequentialism agree that the specification of value should be impartial, in that it should not give arbitrary priority to particular individuals (Singer, 1993; Kagan, 1998), but this is far from sufficient for resolving this difficulty^7.

By far the most common way of simplifying the evaluation of outcomes, both within writings on consequentialism and in decision theory, is to adopt the classic utilitarian perspective (Smart and Williams, 1973; Mill, 1979[1863]). Although there are many variations, the most common statement of utilitarianism is that the value of a state is equal to the sum of the well-being experienced by all individual entities^8. The most common social welfare function is thus

$v(s) = \sum_{e \in E} w_e(s),$    (5)

where E represents the set of entities under consideration, and w_e(s) measures the absolute well-being of entity e in state s^9.
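As a small illustration of Equation (5), and of the distributional worry raised below, the following sketch compares the plain utilitarian sum with one possible concave alternative (in the spirit of footnote 10) on two invented welfare profiles that have the same total well-being.

```python
import math

# Hypothetical welfare profiles for four individuals: the utilitarian sum in
# Equation (5) cannot tell an equal distribution from a highly unequal one with
# the same total well-being. All numbers are invented for illustration.

equal_world = [5.0, 5.0, 5.0, 5.0]
unequal_world = [17.0, 1.0, 1.0, 1.0]

def utilitarian_value(wellbeing):
    """v(s) = sum over entities of w_e(s): the classic utilitarian aggregation."""
    return sum(wellbeing)

def concave_value(wellbeing):
    """One possible remedy (cf. footnote 10): apply a concave transform, here
    log(1 + w), before summing, so that gains to the worst off count for more."""
    return sum(math.log1p(w) for w in wellbeing)

for world in (equal_world, unequal_world):
    print(world, round(utilitarian_value(world), 2), round(concave_value(world), 2))
```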

Although utilitarianism is highly influential, there are fundamental difficulties with it. First, aggregating well-being requires measuring individual welfare, but it is unclear that it can be measured in a way that allows for fair comparisons, at least given current technology. Even if we restrict the set of morally relevant entities to humans, issues of subjectivity, disposition, and self-reporting make it difficult if not impossible to meaningfully compare across individuals (Binmore, 2009).

Second, even if there were a satisfactory way of measuring individual well-being, there are computational difficulties involved in estimating these values for hypothetical worlds. Given that well-being could depend on fine-grained details of the state of the world, it is unclear what level of precision would be required of a model in order to evaluate well-being for each entity. Thus, even estimating the overall value of a single state of the world might be infeasible, let alone a progression of them over time.

Third, any function which maps from the welfare of multiple entities to a single scalar will fail to distinguish between dramatically different distributions. Using the sum, for example, will treat as equivalent two states with the same total value, but with different levels of inequality (Parfit, 1984). While this failing is not necessarily insurmountable, most solutions seem to undermine the inherent simplicity of the utilitarian ideal^10.

Fourth, others have challenged the ideal of impartiality on the grounds that it is subtly paternalist, emphasizes individual autonomy over relationships and care, and ignores existing relations of power (Smart and Williams, 1973; Friedman, 1991; Driver, 2005; Kittay, 2009). Undoubtedly, there is a long and troubling history of otherwise enlightened philosophers presuming to know what is best for others, and being blind to the harms of institutions such as colonialism, while believing that certain classes of people either don't count or are incapable of full rationality (Mills, 1987; Schultz and Varouxakis, 2005).

Ultimately, it seems inescapable to conclude that there is no universally acceptable evaluation function for consequentialism. Rather, we must acknowledge that every action will entail an uneven distribution of costs and benefits. Even in the case where an action literally makes everyone better off, it will almost certainly benefit some more than others. As such, the most credible position is to view the idea of valuation (utilitarian or otherwise) as inherently contested and political. While we might insist that an admissible evaluation function conform to certain criteria, such as disinterestedness, or not being self-defeating (Parfit, 1984), we must also acknowledge that advocating for a particular notion of value as correct is fundamentally a political act.

4.2. Temporal Discounting

Even if there were an unproblematic way of assessing the relative value of a state of the world, the extent to which we should value the distant future is yet another point of potential disagreement. It is common (for somewhat orthogonal reasons) to apply temporal discounting in economics, but it is not obvious that there is any good reason to do so when it comes to moral value (Cowen and Parfit, 1992; Cowen, 2006). Just as philosophers such as Peter Singer have argued that we should not discount the value of a human life simply because a person happens to live far away (Singer, 1972), one could argue that the lives of those who will live in the future should count for as much as the lives of people who are alive today.

Unfortunately, it is difficult to avoid discounting in practice, as it becomes increasingly difficult to predict the consequences of our actions farther into the future. Even if we assume a finite action space, the number of possible worlds to consider will grow exponentially over time. Moreover, because of the chaotic nature of complex systems, even if we had complete knowledge of the causal structure of the universe, we would be limited in our ability to predict the future by lack of precision in our knowledge about the present.

Despite these difficulties, consequentialism would suggest that we should, to the extent that we are able, think not only about the immediate consequences of our actions, but about the longer-term consequences as well (Cowen, 2006). Indeed, considering the political nature of valuation, we arguably bear even greater responsibility for thinking about future generations than the present, given that those who have not yet been born are unable to directly advocate for their interests.

4.3. Uncertainty

In practice, of course, we do not know with any certainty what the consequences of our actions will be, especially over the long term. Again, from the perspective of determining the objectively morally correct action, one might argue that all that matters is the (unknown) probability according to the universe. For individual decision makers, however, any person's ability to predict the future will be limited, and, indeed, will likely vary across individuals. In other words, it is not just our uncertainty about consequences that is a problem, but our uncertainty about our uncertainty: we don't know how well or poorly our own model of the universe matches the true likelihood of what will happen (Kagan, 1998; Cowen, 2006).

The subjective interpretation of consequentialism suggests that, regardless of what the actual consequences may be, the morally correct thing for an individual to do is whatever they have reason to believe will produce the best consequences (Kagan, 1998). This, however, is problematic for two reasons: first, it ignores the computational effort involved in trying to determine which action would be best (which is itself a kind of action); and second, it seemingly absolves people from wrong-doing who happen to have a poor model of the world.

Rule consequentialism arguably provides a (philosophical) solution for these problems, in that it involves a direct mapping from states to actions, without requiring that each decision maker independently determine the expected value of each possible action (Kagan, 1998; Hooker, 2002)^11. It still has the problem, however, of determining what policy is optimal, given our uncertainty about the world. Nevertheless, we should not overstate the problem of uncertainty; we are not in a state of total ignorance, and in general, trying to help people is likely to do more good than trying to harm them (de Lazari-Radek and Singer, 2017).

4.4. Conflicts With Commonsense Morality

A final set of arguments against consequentialism take the form of thought experiments in which consequentialism (and utilitarianism in particular) would seemingly require us to take actions that violate our own notions of commonsense morality. A particularly common example is the “trolley problem” and its variants, in which it is asked whether or not it is correct to cause one person to die in order to save multiple others (Foot, 1967; Greene, 2013).

We will not dwell on these thought experiments, except to note that many of the seeming conflicts from this type of scenario vanish once we take a longer term view, or adopt a broader notion of value than a simple sum over individuals. Killing one patient to save five might create greater aggregate well-being if we only consider the immediate consequences. If we consider all consequences of such an action, however, it should be obvious why we would not wish to adopt such a policy (Kagan, 1991).

It is worth commenting, however, on one particular conflict with commonsense morality, namely the claim that consequentialism is, in some circumstances, excessively demanding. Given the present amount of suffering in the world, and the diminishing marginal utility of wealth, taking consequentialism seriously would seem to require that we sacrifice nearly all of our resources in an effort to improve the well-being of the worst off (Smart and Williams, 1973; Driver, 2012). While to some extent this concern is mitigated by the same logic as above (reducing ourselves to ruin would be less valuable over the long term than sacrificing a smaller but sustainable amount), we should take seriously the possibility that the best action might not agree with our moral intuitions.

5. Fairness in Machine Learning

With the necessary background on consequentialism in place, we now review and summarize ideas about fairness in machine learning. Note that “fairness” is arguably an ambiguous and overloaded term in general usage; our focus here is on how it has been conceptualized and formalized within the machine learning literature^12. In order to lay the foundation for a critical perspective on this literature, we first summarize the general framework that is commonly used for discussing fairness, and then summarize the most prominent ways in which it has been defined^13.

The typical setup is to assume that there are two or more groups of individuals which are distinguished by some “protected attribute,” A, such as race or gender. All other information about each individual is represented by a feature vector, X. The purpose of the system is to make a prediction about each individual, Ŷ, which we will assume to be binary, for the sake of simplicity. Moreover, we will assume that the two possible predictions (1 or 0) are asymmetric, such that one is in some sense preferable. Finally, we assume that, for some individuals, we can observe the true outcome, Y. We will use X to refer to a set of individuals.

To make this more concrete, consider the case of deciding whether or not to approve a loan. An algorithmic decision making system would take the applicant's information (X and possibly A), and return a prediction about whether or not the applicant will repay the loan, Ŷ. For those applicants who are approved, we can then check to see who actually pays it back on time (Y = 1) and who does not (Y = 0). Note, however, that in this setup, we are unable to observe the outcome for those applicants who are denied a loan, and thus cannot know what their outcome would have been in the counterfactual scenario.

The overriding concern in this literature is to make predictions that are highly accurate while respecting some notion of fairness. Because reducing complex social constructs such as race and gender to simplistic categories is inherently problematic, as a running example we will instead use biological age as a hypothetical protected attribute^14. Using the same notation as above, we would say that an automated system instantiates a policy, π, in making a prediction for each applicant. Thus, for instance i, a threshold classifier would predict

$\hat{y}_i = \operatorname{argmax}_{y \in \{0, 1\}} \, \pi(Y = y \mid X = x_i, A = a_i),$    (6)

though we might equally consider a randomized predictor.
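For concreteness, the following sketch instantiates Equation (6) with a hand-set logistic score standing in for π, together with the randomized alternative mentioned above; the features and coefficients are invented for illustration and are not drawn from any of the works discussed here.

```python
import math
import random

def predicted_distribution(x, a):
    """A placeholder for pi(Y = y | X = x, A = a): a hand-set logistic score over
    two hypothetical features. Any probabilistic model could play this role; the
    coefficients are invented, and this toy model happens to ignore a."""
    score = 0.8 * x["income"] - 1.2 * x["debt"]
    p_positive = 1.0 / (1.0 + math.exp(-score))
    return {1: p_positive, 0: 1.0 - p_positive}

def threshold_decision(x, a):
    """Equation (6): predict the label with the highest probability under pi."""
    dist = predicted_distribution(x, a)
    return max(dist, key=dist.get)

def randomized_decision(x, a):
    """The randomized alternative: sample y ~ pi(Y | X = x, A = a)."""
    dist = predicted_distribution(x, a)
    return random.choices(list(dist), weights=list(dist.values()))[0]

applicant = {"income": 2.0, "debt": 1.0}
print(threshold_decision(applicant, a=0), randomized_decision(applicant, a=0))
```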

Much of the work in fairness has drawn inspiration from two legal doctrines: disparate treatment and disparate impact (Ruggieri et al., 2010; Barocas and Selbst, 2016). Disparate treatment, roughly speaking, says that two people should not be treated differently if they differ only in terms of a protected attribute. For our running example, this would be equivalent to saying that one cannot deny someone a loan simply because of their age.

Disparate impact, on the other hand, prohibits the adoption of policies that would have consequences that are unevenly distributed according to the protected attribute, even if they are neutral on their face. Thus a policy which denies loans to people with no credit history might have a disparate impact on younger borrowers, and could therefore (hypothetically) be considered discriminatory.

While research in machine learning fairness is ongoing, most proposals can be classified into two types, which to some extent map onto the two legal doctrines mentioned above. Some definitions are specified without reference to outcomes (section 5.1). Others are specified exclusively with regard to a particular set of outcomes (which must be evaluated using real data; section 5.2). We summarize the dominant proposals of each type below.

5.1. Fairness Constraints Specified Without Regard to Outcomes

The first type of approach to fairness advocates constraints that are specified without reference to actual effects. In a formal sense, we can think of these as placing restrictions, a priori, on the space of policies which will be considered morally acceptable. We provide three examples of this type of approach below.

5.1.1. Fairness Through Unawareness

A commonsense but naive notion is to disallow policies which use the protected attribute in making a prediction. Equivalently, this requires that for any x,

$\pi(y \mid x, A = 0) = \pi(y \mid x, A = 1)$    (7)

Although this seems like a strict translation of the prohibition against disparate treatment, it is generally considered to be unhelpful (Hardt et al., 2016; Kleinberg et al., 2018). Because of correlations among features, it may be possible to infer the protected attribute from the remaining features, so prohibiting the use of a single piece of information may have little effect in practice.
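A toy sketch of this point, using entirely invented records: a model that never sees the protected attribute can still recover it almost perfectly from a correlated feature.

```python
# A toy illustration of why "fairness through unawareness" can be vacuous: even
# if the protected attribute A (here, an age group) is dropped, a correlated
# feature can act as a near-perfect proxy for it. All records are invented.

records = [
    # (age_group A, years_of_credit_history)
    (0, 2), (0, 1), (0, 3), (0, 4),      # younger applicants
    (1, 18), (1, 25), (1, 30), (1, 22),  # older applicants
]

def inferred_age_group(years_of_history):
    """A crude proxy 'model' that never sees A, only a correlated feature."""
    return 1 if years_of_history >= 10 else 0

accuracy = sum(inferred_age_group(x) == a for a, x in records) / len(records)
print(f"A recovered from the correlated feature alone: {accuracy:.0%} accuracy")
```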

5.1.2. Individual Fairness

A more general application of the same idea argues that models must make similar predictions for similar individuals (in terms of their representations, X) (Dwork et al., 2012). This proposal was originally framed as being in the Rawlsian tradition, suggesting it should be a matter of public deliberation to determine who counts as similar. However, as has been noted, the effects of this framework are highly dependent on the particular notion of similarity that is chosen (Green and Hu, 2018).

5.1.3. Randomization

A further way of avoiding disparate treatment is through randomization (Kroll et al., 2017). The basic idea is that a policy should not look at the protected attribute or any other attribute when making a decision, except perhaps to verify that some minimal criteria are met. For example, a policy might assign 0 probability to instances that do not meet the criteria, and an equal probability to all others. Although this is a severe limitation on the space of policies, we do see instances of it being used in practice, such as in the U.S. Diversity Visa Lottery (Perry and Zarsky, 2015; Kroll et al., 2017)^15.
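A minimal sketch of such a lottery-style policy, with a hypothetical pool of applicants and a single invented eligibility criterion:

```python
import random

# A sketch of a lottery-style policy (in the spirit of the Diversity Visa example):
# verify only minimal eligibility criteria, then select uniformly at random among
# those who qualify. The applicant records and criterion are hypothetical.

applicants = [
    {"name": "p1", "complete_application": True},
    {"name": "p2", "complete_application": True},
    {"name": "p3", "complete_application": False},
    {"name": "p4", "complete_application": True},
]

def lottery(pool, n_awards, eligible=lambda p: p["complete_application"]):
    """Assign zero probability to ineligible applicants and equal probability to
    everyone else, then draw without replacement."""
    qualified = [p for p in pool if eligible(p)]
    return random.sample(qualified, k=min(n_awards, len(qualified)))

print([p["name"] for p in lottery(applicants, n_awards=2)])
```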

5.2. Fairness Constraints Specified in Terms of Outcomes

The other major approach to fairness in machine learning is to specify requirements on the actual outcomes of a policy. In other words, while the above fairness criteria can be evaluated without data, the following criteria can only be checked using an actual dataset. These notions of fairness are often justified in terms of the doctrine of disparate impact—that is, policies should not be adopted which have adverse outcomes for protected groups. Three examples are presented below:

5.2.1. Demographic/Statistical Parity

The notion of parity implies that the proportion of positive predictions should be the same, or approximately the same, for each group. For example, this might require that an equal proportion of older and younger applicants receive a loan. Formally, this requirement says that in order to be acceptable, a policy must satisfy

$\frac{\sum_{i \in X} \mathbb{I}[a_i = 0] \cdot \hat{y}_i}{\sum_{i \in X} \mathbb{I}[a_i = 0]} = \frac{\sum_{j \in X} \mathbb{I}[a_j = 1] \cdot \hat{y}_j}{\sum_{j \in X} \mathbb{I}[a_j = 1]},$    (8)

where $\mathbb{I}[\cdot]$ equals 1 if the condition holds (and 0 otherwise). Demographic parity is a strong statement about what the consequences of a policy must be (in terms of a very focused set of short-term consequences). Note, however, that enforcing this constraint may result in suboptimal outcomes from the perspective of other criteria (Corbett-Davies et al., 2017).
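A minimal sketch of how the demographic parity gap in Equation (8) might be computed for a set of hypothetical predictions (the groups and labels below are invented):

```python
# A minimal check of demographic parity (Eq. 8) on hypothetical predictions: the
# rate of positive predictions should be (approximately) equal across groups.

def positive_rate(groups, predictions, group):
    members = [p for a, p in zip(groups, predictions) if a == group]
    return sum(members) / len(members)

a     = [0, 0, 0, 0, 1, 1, 1, 1]   # protected attribute (e.g., age group)
y_hat = [1, 1, 0, 1, 1, 0, 0, 0]   # predictions made by some policy

gap = abs(positive_rate(a, y_hat, 0) - positive_rate(a, y_hat, 1))
print(gap)  # 0.75 vs. 0.25 -> a demographic parity gap of 0.5
```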

5.2.2. Equality of Odds/Opportunity

Another outcome-based fairness criteria looks at the outcomes that result from the policy, and compares the rates of true positives and/or false positives among a held-out dataset (Hardt et al., 2016). Equal opportunity would require that, for example, an equal proportion of applicants from each group who will pay back a loan are in fact approved. Formally,

$\frac{\sum_{i \in X} \mathbb{I}[a_i = 0, y_i = 1] \cdot \hat{y}_i}{\sum_{i \in X} \mathbb{I}[a_i = 0, y_i = 1]} = \frac{\sum_{j \in X} \mathbb{I}[a_j = 1, y_j = 1] \cdot \hat{y}_j}{\sum_{j \in X} \mathbb{I}[a_j = 1, y_j = 1]}.$    (9)

Equality of odds is similar, except that it requires that the rates of both true positives and false positives be the same across groups.
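A corresponding sketch for equal opportunity, comparing the groups' true positive rates as in Equation (9), again on invented data:

```python
# A sketch of the equal opportunity check (Eq. 9) on hypothetical data: among
# individuals whose true outcome is positive (y = 1), the rate of positive
# predictions should match across groups. Equality of odds would additionally
# compare false positive rates.

def true_positive_rate(groups, y_true, y_pred, group):
    relevant = [p for a, y, p in zip(groups, y_true, y_pred) if a == group and y == 1]
    return sum(relevant) / len(relevant)

a     = [0, 0, 0, 0, 1, 1, 1, 1]   # protected attribute
y     = [1, 1, 0, 1, 1, 1, 0, 1]   # observed outcomes (e.g., repayment)
y_hat = [1, 0, 0, 1, 1, 1, 0, 1]   # predictions made by some policy

print(true_positive_rate(a, y, y_hat, 0), true_positive_rate(a, y, y_hat, 1))
```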

5.2.3. Equal Calibration

An alternative to equality of odds is to ask that the predictions be equally well calibrated across groups. That is, if we bin the predicted probabilities into a set of bins, then within each bin, the proportion of instances whose true outcome is positive should be the same for all groups. In other words, equal calibration tries to ensure that

$\frac{\sum_{i \in X} \mathbb{I}[a_i = 0, \hat{p}_i \in [b, c)] \cdot y_i}{\sum_{i \in X} \mathbb{I}[a_i = 0, \hat{p}_i \in [b, c)]} = \frac{\sum_{j \in X} \mathbb{I}[a_j = 1, \hat{p}_j \in [b, c)] \cdot y_j}{\sum_{j \in X} \mathbb{I}[a_j = 1, \hat{p}_j \in [b, c)]}$    (10)

for each interval [b, c), where $\hat{p}_i = \pi(Y = 1 \mid x_i, a_i)$ according to the policy.
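A minimal sketch of the equal calibration check in Equation (10), binning invented predicted probabilities into two bins and comparing the within-bin positive rates across groups:

```python
# A sketch of the equal calibration check (Eq. 10) on hypothetical data: within
# each probability bin, the fraction of individuals whose true outcome is
# positive should be the same for both groups.

def calibration_by_group(groups, y_true, p_hat, bins=((0.0, 0.5), (0.5, 1.01))):
    rates = {}
    for g in (0, 1):
        for lo, hi in bins:
            in_bin = [y for a, y, p in zip(groups, y_true, p_hat) if a == g and lo <= p < hi]
            rates[(g, (lo, hi))] = sum(in_bin) / len(in_bin) if in_bin else None
    return rates

a     = [0, 0, 0, 0, 1, 1, 1, 1]                  # protected attribute
y     = [0, 1, 1, 1, 0, 0, 1, 1]                  # observed outcomes
p_hat = [0.2, 0.4, 0.7, 0.9, 0.3, 0.4, 0.8, 0.6]  # predicted probabilities

print(calibration_by_group(a, y, p_hat))
```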

Note that whereas demographic parity only requires the set of predictions (Ŷ) made for all individuals in a dataset, equal opportunity and equal calibration also require that we know the true outcome (Y) for all such individuals, even those who are given a negative prediction. As a result, the latter two requirements can only be properly verified on a dataset for which we can independently observe the true outcome (e.g., based on assigning treatment randomly).

As has been shown by multiple authors, certain fairness criteria will necessarily be in conflict with others, under mild conditions, indicating that we will be unable to satisfy all simultaneously (Chouldechova, 2017; Kleinberg et al., 2017).

6. A Consequentialist Perspective on Machine Learning Fairness

As previously mentioned, most fairness metrics have been proposed with only limited discussion of ethical foundations. In this section, we provide commentary on the criteria described above from the perspective of consequentialism. As a reminder, we are not suggesting that consequentialism provides the last word on what is morally correct. Rather, we can think of consequentialism as providing one of several possible ethical perspectives which should be considered.

First, consider the fairness proposals that are specified without regard to outcomes (section 5.1). As mentioned above, these can be seen as restrictions on the set of policies that are acceptable. By definition, these constraints are not determined by the actual consequences of adopting them, nor do they possess an in-built verification mechanism to assess the nature of the consequences being produced. As such, these have more of a deontological flavor, reflecting a prior stipulation that similar people should be treated similarly, or that everyone deserves an equal chance. For example, Equation (7) specifies precisely the constraint on the policy space required by fairness through unawareness, and similarly for the other proposals. In principle, of course, these criteria could have been developed with the expectation that using them would produce the best outcomes, but it is far from obvious that this is the case.

By contrast, the fairness criteria specified explicitly in terms of outcomes (section 5.2) might seem to be closer to a form of consequentialism, given that they are evaluated by looking at actual impacts. However, upon closer inspection we see that they imply a severely restricted form of consequentialism in terms of how they think about value, time horizon, and who counts. In particular, while the proposals differ in terms of the precise values that are being emphasized, all of these proposals have some features in common:

• They only evaluate outcomes in terms of the people who are the direct object of the decision being made, not others who may be affected by these decisions;

• They only explicitly consider the immediate consequences of each decision, equivalent to using a discount factor of 0;

• They presuppose that a particular function of the distribution of predictions and outcomes (e.g., calibration) is the only value that is morally relevant.

Again, it is entirely possible that these constraints were developed with the intention of producing more broadly beneficial consequences over the long term. The point is that there is nothing in the constraints themselves that points to or tries to verify this broader impact, despite the fact that they are evaluated in terms of (a narrow set of) outcomes.

To make this concrete, consider again the case of trying to regulate algorithms which will be used by banks in making loans. Requiring satisfaction of any of the above fairness constraints will alter the set of loan applicants who are approved (and denied). While it is possible that some of these criteria might lead to broadly beneficial changes (e.g., demographic parity might enhance access to credit among those who have been historically marginalized), from the perspective of consequentialism it is insufficient to evaluate the outcome only in terms of the probabilities or labels assigned to each group. Rather, it is necessary to consider the full range of consequences to individuals and society. In some cases, a loan might positively transform a person's life, or the life of their community, via mechanisms such as education and entrepreneurship. In other cases, easier access to credit could lead to speculative borrowing and financial ruin. For example, while not directly related to concerns about fairness, the potentially devastating effects of lending policies which ignore long-term and systemic effects can easily be seen in the aftermath of the subprime mortgage crisis, which derived, in part, from perverse incentives and risky lending (Bianco, 2008).

Crafting effective financial regulation is obviously extremely difficult, and this is not meant to suggest that any particular fairness constraint is likely to lead to disaster. Nevertheless, it is important to remember that fairness criteria which are specified only in terms of a narrow set of short term metrics do not guarantee positive outcomes beyond what they measure, and may in some cases lead to overall greater harm.

In sum, adopting a consequentialist perspective reveals numerous ways in which the existing proposals for thinking about fairness in machine learning are fatally flawed. While all have their merits, none have been adequately justified in terms of their likely consequences, broadly considered. Moreover, most are highly restricted in terms of the types of outcomes they take into consideration, and largely ignore broader systemic effects of adopting a single policy.

It is, of course, understandable that most approaches to machine learning fairness have focused on a priori constraints and tractable short term consequences. Avoiding negative consequences from new technologies is challenging in general, and many of the difficulties of consequentialism also apply directly to machine learning, especially in social contexts (uncertainty about the future, lack of agreement about value, etc.). Even in relatively controlled environments, it is easy to find examples of undesirable outcomes resulting from ill-specified value functions, improper time horizons, and the kinds of computational difficulties described in section 4 (Amodei et al., 2016).

Although consequentialism does not provide any easy answers about how to make AI systems more fair or just, several important considerations follow from its tenets. First, consequentialism reminds us of the need to consider outcomes broadly; technical systems are embedded in social contexts, and policies can have widespread effects on communities, not merely those who are subject to classification. Second, the political nature of valuation means that a broad range of perspectives on what is desirable should be sought out and considered, not for a reductive utilitarian calculus, but so as to be informed as to the diversity of opinions. Third, the phenomenon of diminishing marginal utility suggests that efforts should be directed to helping those who are worst off, rather than trying to make life better for the already well off, without, of course, presuming to automatically know what is best for others. Fourth, while we might disagree about the discount rate, the moral value of the future necessitates that we take downstream effects into account, rather than only focusing on immediate consequences. Sweeping attempts at regulation, such as GDPR, may have outsized effects here, as they will partially determine how we think about fairness going forward, and what it is legitimate to measure. Finally, because it is particularly difficult to predict consequences in the distant future, a high standard should be required for any policy that would place a definite burden on the present for a possible future gain.

7. Randomization and Learning

Before concluding, we will attempt to draw together a number of threads related to uncertainty, learning, and randomization. As described earlier, most philosophical presentations of consequentialism are highly abstract, without considering how one would practically determine what actions or rules are best. Given that statistics and machine learning arose specifically to deal with the problem of uncertainty, it is natural to ask whether there is any role for learning in consequentialism.

Indeed, an entire subfield of machine learning exists precisely to deal with the problem of action selection in the face of uncertainty (so-called “bandit” problems, or reinforcement learning more broadly). As noted in the introduction, the reinforcement learning objective explicitly encodes the goal of maximizing some benefit over the long term. Algorithms designed to optimize this objective typically rely initially on random exploration to reduce uncertainty, thereby facilitating long-term “exploitation” of rewards.
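As a small illustration of this exploration/exploitation dynamic, here is a minimal epsilon-greedy bandit sketch with two hypothetical arms; it is not intended as a model of any of the fairness-aware algorithms discussed below.

```python
import random

# A minimal epsilon-greedy bandit sketch of the exploration/exploitation tradeoff
# described above. The two "arms" and their payoff probabilities are hypothetical,
# and real sociotechnical settings are far messier and less stationary.

true_payoff = {"arm_a": 0.4, "arm_b": 0.6}   # unknown to the learner
counts = {arm: 0 for arm in true_payoff}
means = {arm: 0.0 for arm in true_payoff}
EPSILON, STEPS = 0.1, 5000

for _ in range(STEPS):
    if random.random() < EPSILON:             # explore: try a random arm
        arm = random.choice(list(true_payoff))
    else:                                      # exploit: use the best arm so far
        arm = max(means, key=means.get)
    reward = 1.0 if random.random() < true_payoff[arm] else 0.0
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update

print(counts, {arm: round(m, 3) for arm, m in means.items()})
```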

Not surprisingly, a number of papers have proposed using similar strategies as a way of achieving fair outcomes over the long-term. For example, Kroll et al. (2017) suggest that adding randomness to hiring algorithms could help to debias them over time. Joseph et al. (2016b) consider the problem of learning a policy for making loans, and present an algorithm to do so without violating a particular notion of fairness^16. Liu et al. (2017) extend this work, again trying to satisfy fairness in the contextual bandit setting. Meanwhile, Barabas et al. (2018) suggest using randomization to facilitate causal inference about the “social, structural, and psychological drivers” of crime.

Randomization in decision making is a deep and important topic, and has been the focus of much past work in ethics (Lockwood and Anscombe, 1983; Freedman, 1987; Bird et al., 2016; Haushofer et al., 2019). As noted above, it can be a source of fairness, if we take “fair” to mean that everyone deserves an equal chance. It may also be useful to prevent strategic manipulation of a system, and has a definite role in some parts of American law (Perry and Zarsky, 2015; Kroll et al., 2017).

Although temporal discounting in consequentialism is typically discussed in terms of present vs. future value (e.g., helping people today vs. investing in the future), a similar trade off applies to costly experimentation for the purpose of reducing future uncertainty. Indeed, this sort of approach has been widely adopted in industry in the form of A/B testing, as well as for adaptive trials in domains such as medicine (Lai et al., 2015). Moreover, there is clearly something appealing about the idea that it should be morally incumbent upon people to improve their understanding of the world over time, not merely to act on their current understanding. However, randomization also raises a number of serious concerns.

First, as always, there is the problem of value, and the question of who gets to decide how to balance present costs against future benefits. Second, there are good reasons to think that such an approach is unlikely to work in complex sociotechnical systems. Although reinforcement learning has been extraordinarily successful in limited domains, such as game playing and online advertising, making reinforcement learning tractable generally requires assuming the existence of a stable environment, a limited space of actions, a clear reward signal, and a massive amount of training data. In most policy domains, we can expect to have none of these. Third, there may be real costs associated with participation in such a process; while a bank could conceivably choose to add randomness to a policy for granting loans (for the purpose of better learning who is likely to pay them back), giving loans to people who cannot afford them could have severe negative consequences for those individuals.

There are clearly some domains where randomization is widely used, and seems well-justified, especially from the perspective of consequentialism. The best example of this is clinical trials in medicine, which are not only favored, but required. Medicine, however, is a special domain for several reasons: there is general agreement about ends (saving lives and reducing suffering), there is good reason to think that findings will generalize across people, and there is a well-established framework for experimentation, with safeguards in place to protect the participants.

Where things get more complicated is using the same logic to establish the efficacy of social interventions, such as randomized trials in development economics. Although controlled experiments do provide good evidence about whether an intervention was effective, it is less clear that the conclusions will generalize to different situations (Barrett and Carter, 2010).

Ultimately, while randomization can be an important tool in learning policies that promote long term benefits, especially in relatively static, generalizable domains, the limitations of both consequentialism and of statistical learning theory mean that we should be highly skeptical of any attempt to use it as the basis for creating policies or automated decision making systems to deal with complex social problems.

8. Additional Related Work

Beyond the criteria mentioned in section 5, numerous other fairness metrics have been proposed, such as procedural fairness (Grgić-Hlača et al., 2016) and causal effects (Madras et al., 2018; Khademi et al., 2019). Meanwhile, other papers have emphasized that simply satisfying a particular definition of fairness is no guarantee of the broader outcomes people care about, such as justice (Hu and Chen, 2018b). Selbst et al. (2019) discuss five common “traps” in thinking about sociotechnical systems, and Friedler et al. (2019) demonstrate how outcomes differ depending on preprocessing and the choice of fairness metric.

Others have explored various types of consequences in particular settings, such as cost to the community in criminal justice (Corbett-Davies et al., 2017), runaway feedback loops in predictive policing (Ensign et al., 2018), disparities in the labor market (Hu and Chen, 2018a), and the potential for strategic manipulation of policies (Hu et al., 2019; Milli et al., 2019). Liu et al. (2018) demonstrate the importance of modeling the delayed impact of adopting various fairness metrics, even when focused narrowly on outcomes such as demographic parity. In a discussion of racial bias in the criminal justice system, Huq (2019) uses broadly consequentialist logic, arguing that the systems should be evaluated in terms of costs and benefits to minority groups. For surveys discussing the intersection of ethics and AI more broadly, see Brundage (2014) and Yu et al. (2018). For a book-length treatment of the subject, see Wallach and Allen (2008).

9. Conclusions

Consequentialism represents one of the most important pillars of ethical thinking in philosophy, including (but not limited to) utilitarianism. In brief, the central tenet of consequentialism is that actions should be evaluated in terms of the relative goodness of the expected outcomes, according to an impartial perspective on what is best. Despite a number of serious problems that limit its practical application, including computational problems involving value, uncertainty, and discounting, consequentialism still provides a useful basis for thinking about the limitations of other normative frameworks.

Within the context of automated decision making, a consequentialist perspective underscores that merely satisfying a particular fairness metric is no guarantee of ethical conduct. Rather, consequentialism requires that we consider all possible options (including the possibility of not deploying an automated system), and weigh the likely consequences that will result, considered broadly, including possible implications for the long term future. Moreover, we must consider not only those who will be directly affected, but broader impacts on communities, and systemic effects of replacing many human decision makers with a single policy. While there are contexts in which it is reasonable, even required, to attempt to learn from the present for the benefit of the future, we should be skeptical of any randomization schemes which make unrealistic assumptions about the generalizability of what can be learned from social systems.

The political nature of valuation means we are unlikely to ever have agreement on what outcomes are best, and long term consequences will always remain to some extent unpredictable. Nevertheless, through ongoing efforts to take into consideration a diverse set of perspectives on value, and systematic attempts to learn from our experiences, we can strive to move toward policies which are likely to lead to a better world, over both the short and long term future.

Author Contributions

DC conceived of the scope of this article. DC and NS contributed to the writing and editing of the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors would like to thank Jared Moore, Emily Kalah Gade, Maarten Sap, Dan Hendrycks, and reviewers for their thoughtful feedback and comments on this work.

Footnotes

1. ^One could similarly augment Equation (3) to make any epistemic uncertainty about the evaluation function or discount factor explicit.

2. ^In some cases, rule consequentialism is formulated as the problem of choosing the set of rules which, if internalized by the vast majority of the community, would lead to the best consequences (Hooker, 2002).

3. ^Most treatments of consequentialism assume that the rules determine a single correct action for each situation. However, the formulation presented here is strictly more general; deterministic policies are those that assign all probability mass to a single action for each state.

4. ^Equation (4) is equivalent to the standard formulation of a Markov decision process if we restrict ourselves to a finite set of states s ∈ S, actions a ∈ A, transition probabilities p(s_{t+1} | s_t, a_t), and discount factor γ.

5. ^E.g., “An act is wrong if its performance under the circumstances would be disallowed by any set of principles for the general regulation of behavior that no one could reasonably reject as a basis for informed, unforced, general agreement” (Scanlon, 1998).

6. ^For a review of how Rawls has been applied within the information sciences, see Hoffmann (2017).

7. ^Sidgwick (1967) writes, “I obtain the self-evident principle that the good of any one individual is of no more importance, from the point of view (if I may say so) of the Universe, than the good of any other; unless, that is, there are special grounds for believing that more good is likely to be realized in the one case than in the other.”

8. ^The philosophical literature in some cases uses happiness or the satisfaction of preferences, rather than well-being, but this distinction is not essential for our purposes.

9. ^Note that using a separate value function for each entity accounts for variation in preferences, and allows for some entities to “count” for more than others, as when the set of relevant entities includes animals, or all sentient beings (Kagan, 1998).

10. ^For example, one could model well-being as a non-linear, increasing, concave (e.g., logarithmic) function of other attributes such as wealth (i.e., diminishing marginal utility), which would encourage a more equal distribution of resources. Alternatively, one could try to incorporate people's suffering due to inequality into their value functions (de Lazari-Radek and Singer, 2017).

11. ^To use a somewhat farcical example, we could imagine using a neural network to map from states to actions; the time to compute what action to take would therefore be constant for any scenario.

12. ^Extensive discussion of the idea of fairness can be found in much of the philosophical and technical literature cited throughout. In particular, we refer the reader to Rawls (1958), Kagan (1998), and Binns (2018).

13. ^While there is also some work on fairness in the unsupervised setting (e.g., Benthall and Haynes, 2019; Kleindessner et al., 2019), in this paper we focus on the supervised case.

14. ^Age is a particularly interesting example of a protected attribute, as it is explicitly used to discriminate in some domains (as in restricting the right to vote), but afforded some protections in others (such as the U.S. Age Discrimination in Employment Act).

15. ^Additional examples of randomization include jury selection, military service, sortition in ancient Athenian government, and which members of a firing squad have guns with real bullets. Of course, as Kroll et al. (2017) point out, randomization is only fair if the system cannot be manipulated by either applicants or decision makers.

16. ^In a companion paper, Joseph et al. (2016a) proclaim their approach to be Rawlsian, but this seems to miss the key point of Rawls—namely, that we must account for inequalities due to circumstances (i.e., “regardless of their initial place in the social system”; Rawls, 1958). Rather, the approach of Joseph et al. (2016b) merely says we should learn to give loans to people who will best be able to pay them back.

References

Abel, D., MacGlashan, J., and Littman, M. L. (2016). “Reinforcement learning as a framework for ethical decision making,” in Proceedings of the Workshop on AI, Ethics, and Society at AAAI (Phoenix, AZ).

Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., and Mané, D. (2016). Concrete problems in AI safety. arXiv:1606.06565.

Anscombe, G. E. M. (1958). Modern moral philosophy. Philosophy 33, 1–19. doi: 10.1017/S0031819100037943

Barabas, C., Virza, M., Dinakar, K., Ito, J., and Zittrain, J. (2018). “Interventions over predictions: Reframing the ethical debate for actuarial risk assessment,” in Proceedings of FAT* (New York, NY).

Barocas, S., and Selbst, A. D. (2016). Big data's disparate impact. California Law Rev. 104, 671–732. doi: 10.2139/ssrn.2477899

Barrett, C. B., and Carter, M. R. (2010). The power and pitfalls of experiments in development economics: some non-random reflections. Appl. Econ. Perspect. Policy 32, 515–548. doi: 10.1093/aepp/ppq023

Benthall, S., and Haynes, B. D. (2019). “Racial categories in machine learning,” in Proceedings of FAT* (Atlanta, GA). doi: 10.1145/3287560.3287575

Bentham, J. (1970 [1781]). An Introduction to the Principles of Morals and Legislation. Oxford: Oxford University Press.

Bianco, K. M. (2008). The Subprime Lending Crisis: Causes and Effects of the Mortgage Meltdown. CCH, 1–21. Available online at: business.cch.com/images/banner/subprime.pdf

Binmore, K. (2009). “Chapter 20: Interpersonal comparison of utility,” in The Oxford Handbook of Philosophy of Economics, editors D. Ross and H. Kincaid (New York, NY: Oxford University Press). doi: 10.1093/oxfordhb/9780195189254.003.0020

Binns, R. (2018). “Fairness in machine learning: Lessons from political philosophy,” in Proceedings of FAT* (New York, NY).

Bird, S., Barocas, S., Crawford, K., and Wallach, H. (2016). “Exploring or exploiting? Social and ethical implications of autonomous experimentation in AI,” in Proceedings of FAT/ML (New York, NY).

Brundage, M. (2014). Limitations and risks of machine ethics. J. Exp. Theor. Artif. Intell. 26, 355–372. doi: 10.1080/0952813X.2014.895108

Chouldechova, A. (2017). Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. arXiv:1610.07524. doi: 10.1089/big.2016.0047

Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., and Huq, A. (2017). “Algorithmic decision making and the cost of fairness,” in Proceedings of KDD (Halifax, NS). doi: 10.1145/3097983.3098095

Cowen, T. (2006). The epistemic problem does not refute consequentialism. Utilitas 18, 383–399. doi: 10.1017/S0953820806002172

Cowen, T., and Parfit, D. (1992). “Against the social discount rate,” in Philosophy, Politics, and Society, eds P. Laslett and J. Fishkin (New Haven, CT: Yale University Press), 144–161. doi: 10.2307/j.ctt211qw3x.11

de Lazari-Radek, K., and Singer, P. (2017). Utilitarianism: A Very Short Introduction. New York, NY: Oxford University Press. doi: 10.1093/actrade/9780198728795.001.0001

Driver, J. (2005). Consequentialism and feminist ethics. Hypatia 20, 183–199. doi: 10.1111/j.1527-2001.2005.tb00543.x

Driver, J. (2012). Consequentialism. New York, NY: Routledge. doi: 10.4324/9780203149256

Dwork, C., Hardt, M., Pitassi, T., Reingold, O., and Zemel, R. (2012). “Fairness through awareness,” in Proceedings of ITCS (Cambridge, MA). doi: 10.1145/2090236.2090255

Ensign, D., Friedler, S. A., Neville, S., Scheidegger, C., and Venkatasubramanian, S. (2018). “Runaway feedback loops in predictive policing,” in Proceedings of FAT* (New York, NY).

Foot, P. (1967). The problem of abortion and the doctrine of the double effect. Oxford Rev. 5, 5–15.

Freedman, B. (1987). Equipoise and the ethics of clinical research. N. Engl. J. Med. 317, 141–145. doi: 10.1056/NEJM198707163170304

Friedler, S. A., Scheidegger, C., Venkatasubramanian, S., Choudhary, S., Hamilton, E. P., and Roth, D. (2019). “A comparative study of fairness-enhancing interventions in machine learning,” in Proceedings of FAT* (Atlanta, GA). doi: 10.1145/3287560.3287589

Friedman, M. (1991). The practice of partiality. Ethics 101, 818–835. doi: 10.1086/293345

Green, B. (2019). “‘Good’ isn't good enough,” in Proceedings of the AI for Social Good workshop at NeurIPS (Vancouver, BC).

Green, B., and Hu, L. (2018). “The myth in the methodology: towards a recontextualization of fairness in machine learning,” in Proceedings of the Debates workshop at ICML (Stockholm).

Greene, J., and Haidt, J. (2002). How (and where) does moral judgment work? Trends Cogn. Sci. 6, 517–523. doi: 10.1016/S1364-6613(02)02011-9

Greene, J. D. (2013). Moral Tribes: Emotion, Reason, and the Gap between Us and Them. New York, NY: The Penguin Press.

Grgić-Hlača, N., Zafar, M. B., Gummadi, K. P., and Weller, A. (2016). “The case for process fairness in learning: feature selection for fair decision making,” in Proceedings of the Symposium on Machine Learning and the Law at NeurIPS (Barcelona).

Hardt, M., Price, E., and Srebro, N. (2016). “Equality of opportunity in supervised learning,” in Proceedings of NeurIPS (Barcelona).

Harsanyi, J. C. (1977). Rule utilitarianism and decision theory. Erkenntnis 11, 25–53. doi: 10.1007/BF00169843

Haushofer, J., Riis-Vestergaard, M. I., and Shapiro, J. (2019). Is there a social cost of randomization? Soc. Choice Welfare 52, 709–739. doi: 10.1007/s00355-018-1168-7

Hoffmann, A. L. (2017). Beyond distributions and primary goods: assessing applications of Rawls in information science and technology literature since 1990. J. Assoc. Inform. Sci. Technol. 68, 1601–1618. doi: 10.1002/asi.23747

Hooker, B. (2002). Ideal Code, Real World: A Rule-Consequentialist Theory of Morality. New York, NY: Clarendon Press. doi: 10.1093/0199256578.001.0001

Hu, L., and Chen, Y. (2018a). “A short-term intervention for long-term fairness in the labor market,” in Proceedings of WWW (Lyon). doi: 10.1145/3178876.3186044

Hu, L., and Chen, Y. (2018b). “Welfare and distributional impacts of fair classification,” in Proceedings of FAT/ML (Stockholm).

Hu, L., Immorlica, N., and Vaughan, J. W. (2019). “The disparate effects of strategic manipulation,” in Proceedings of FAT* (Atlanta, GA). doi: 10.1145/3287560.3287597

Huq, A. Z. (2019). Racial equity in algorithmic criminal justice. Duke Law J. 68, 1043–1134. Available online at: https://ssrn.com/abstract=3144831

Joseph, M., Kearns, M., Morgenstern, J. H., Neel, S., and Roth, A. (2016a). “Rawlsian fairness for machine learning,” in Proceedings of FAT/ML (New York, NY).

Joseph, M., Kearns, M., Morgenstern, J. H., and Roth, A. (2016b). “Fairness in learning: classic and contextual bandits,” in Proceedings of NeurIPS (Barcelona).

Kagan, S. (1991). The Limits of Morality. New York, NY: Clarendon Press. doi: 10.1093/0198239165.001.0001

Kagan, S. (1998). Normative Ethics. Boulder, CO: Westview Press.

Khademi, A., Lee, S., Foley, D., and Honavar, V. (2019). “Fairness in algorithmic decision making: An excursion through the lens of causality,” in Proceedings of WWW (San Francisco, CA). doi: 10.1145/3308558.3313559

Kittay, E. F. (2009). “Chapter 8: The ethics of philosophizing: Ideal theory and the exclusion of people with severe cognitive disabilities,” in Feminist Ethics and Social and Political Philosophy: Theorizing the Non-Ideal, ed L. Tessman (New York, NY: Springer), 121–146. doi: 10.1007/978-1-4020-6841-6_8

Kleinberg, J. M., Ludwig, J., Mullainathan, S., and Rambachan, A. (2018). Algorithmic fairness. AEA Papers Proc. 108, 22–27. doi: 10.1257/pandp.20181018

Kleinberg, J. M., Mullainathan, S., and Raghavan, M. (2017). “Inherent trade-offs in the fair determination of risk scores,” in Proceedings of ITCS (Berkeley, CA).

Kleindessner, M., Samadi, S., Awasthi, P., and Morgenstern, J. (2019). “Guarantees for spectral clustering with fairness constraints,” in Proceedings of ICML (Long Beach, CA).

Kroll, J. A., Barocas, S., Felten, E. W., Reidenberg, J. R., Robinson, D. G., and Yu, H. (2017). Accountable algorithms. Univ. Pennsylvania Law Rev. 165, 633–705. Available online at: https://ssrn.com/abstract=2765268

Lai, T. L., Lavori, P. W., and Tsang, K. W. (2015). Adaptive design of confirmatory trials: Advances and challenges. Contemp. Clin. Trials 45, 93–102. doi: 10.1016/j.cct.2015.06.007

Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., and Hardt, M. (2018). “Delayed impact of fair machine learning,” in Proceedings of ICML (Stockholm). doi: 10.24963/ijcai.2019/862

Liu, Y., Radanovic, G., Dimitrakakis, C., Mandal, D., and Parkes, D. C. (2017). “Calibrated fairness in bandits,” in Proceedings of FAT/ML (Halifax, NS).

Lockwood, M., and Anscombe, G. E. M. (1983). Sins of omission? The non-treatment of controls in clinical trials. Aristotel. Soc. Suppl. 57, 207–227. doi: 10.1093/aristoteliansupp/57.1.207

Madras, D., Pitassi, T., and Zemel, R. (2018). “Predict responsibly: Improving fairness and accuracy by learning to defer,” in Proceedings of NeurIPS (Montreal, QC).

Mill, J. S. (1979 [1863]). Utilitarianism. Indianapolis, IN: Hackett Publishing Company, Inc.

Milli, S., Miller, J., Dragan, A. D., and Hardt, M. (2019). “The social cost of strategic classification,” in Proceedings of FAT* (Atlanta, GA). doi: 10.1145/3287560.3287576

Mills, C. W. (1987). The Racial Contract. Ithaca, NY: Cornell University Press.

Parfit, D. (1984). Reasons and Persons. New York, NY: Oxford University Press.

Perry, R., and Zarsky, T. (2015). “May the odds be ever in your favour”: Lotteries in law. Alabama Law Rev. 66, 1035–1098. doi: 10.2139/ssrn.2494550

Portmore, D. W. (2011). Commonsense Consequentialism. New York, NY: Oxford University Press. doi: 10.1093/acprof:oso/9780199794539.003.0007

Railton, P. (1984). Alienation, consequentialism, and the demands of morality. Philos. Public Affairs 13, 134–171.

Rawls, J. (1958). Justice as fairness. Philos. Rev. 67, 164–194. doi: 10.2307/2182612

Rawls, J. (1971). A Theory of Justice. Cambridge, MA: The Belknap Press of Harvard University.

Ruggieri, S., Pedreschi, D., and Turini, F. (2010). Data mining for discrimination discovery. ACM Trans. Knowl. Discov. Data 4, 9:1–9:40. doi: 10.1145/1754428.1754432

Scanlon, T. M. (1998). What We Owe to Each Other. Cambridge, MA: Harvard University Press.

Scheffler, S. (1994). The Rejection of Consequentialism: A Philosophical Investigation of the Considerations Underlying Rival Moral Conceptions. New York, NY: Oxford University Press.

Schultz, B., and Varouxakis, G. (eds.). (2005). Utilitarianism and Empire. Oxford: Lexington Books.

Selbst, A. D., boyd, d., Friedler, S. A., Venkatasubramanian, S., and Vertesi, J. (2019). “Fairness and abstraction in sociotechnical systems,” in Proceedings of FAT* (Atlanta, GA). doi: 10.1145/3287560.3287598

Sidgwick, H. (1967). The Methods of Ethics. New York, NY: Macmillan.

Singer, P. (1972). Famine, affluence, and morality. Philos. Public Affairs 1, 229–243.

Singer, P. (1993). Practical Ethics. New York, NY: Cambridge University Press.

Smart, J. J. C., and Williams, B. (1973). Utilitarianism: For & Against. New York, NY: Cambridge University Press. doi: 10.1017/CBO9780511840852

Torresen, J. (2018). A review of future and ethical perspectives of robotics and AI. Front. Robot. AI 4:75. doi: 10.3389/frobt.2017.00075

Wallach, W., and Allen, C. (2008). Moral Machines: Teaching Robots Right from Wrong. New York, NY: Oxford University Press, Inc.

Yu, H., Shen, Z., Miao, C., Leung, C., Lesser, V. R., and Yang, Q. (2018). “Building ethics into artificial intelligence,” in Proceedings of IJCAI (Stockholm). doi: 10.24963/ijcai.2018/779

Keywords: consequentialism, fairness, ethics, machine learning, randomization

Citation: Card D and Smith NA (2020) On Consequentialism and Fairness. Front. Artif. Intell. 3:34. doi: 10.3389/frai.2020.00034

Received: 13 January 2020; Accepted: 17 April 2020;
Published: 08 May 2020.

Edited by:

Novi Quadrianto, University of Sussex, United Kingdom

Reviewed by:

Deepak P, Queen's University Belfast, United Kingdom
Animesh Mukherjee, Indian Institute of Technology, India
Tiberio Caetano, Gradient Institute, Australia

Copyright © 2020 Card and Smith. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Dallas Card, dcard@stanford.edu
