- 1Department of Computer Science, Illinois Institute of Technology, Chicago, IL, United States
- 2Stuart School of Business, Illinois Institute of Technology, Chicago, IL, United States
Modern machine learning (ML) models are becoming increasingly popular and are widely used in decision-making systems. However, studies have shown critical issues of ML discrimination and unfairness, which hinder their adoption in high-stakes applications. Recent research on fair classifiers has drawn significant attention to developing effective algorithms that achieve both fairness and good classification performance. Despite the great success of these fairness-aware machine learning models, most existing models require sensitive attributes to pre-process the data, regularize the model learning, or post-process the prediction in order to achieve fair predictions. However, sensitive attributes are often incomplete or even unavailable due to privacy, legal, or regulatory restrictions. Though we lack the sensitive attributes for training a fair model in the target domain, there might exist a similar domain that has sensitive attributes. Thus, it is important to exploit auxiliary information from such a similar domain to help improve fair classification in the target domain. Therefore, in this paper, we study a novel problem of exploring domain adaptation for fair classification. We propose a new framework that can learn to adapt the sensitive attributes from a source domain for fair classification in the target domain. Extensive experiments on real-world datasets illustrate the effectiveness of the proposed model for fair classification, even when no sensitive attributes are available in the target domain.
1. Introduction
Machine learning models are increasingly used for high-stakes decision making such as filtering loan applicants (Hamid and Ahmed, 2016) and deploying police officers (Mena, 2011). However, a prominent concern is that a learned model may be biased against specific demographic groups defined by attributes such as race or gender. For example, a recent report shows that software used by schools to filter student applications may be biased toward a specific racial group1; and COMPAS (Redmond, 2011), a tool for crime prediction, is shown to be more likely to assign a higher risk score to African-American offenders than to Caucasians with the same profile. Bias in algorithms can emanate from unrepresentative or incomplete training data or from flawed information that reflects historical inequalities (Du et al., 2020), which can lead to unfair decisions that have a collective and disparate impact on certain groups of people. Therefore, it is important to ensure fairness in machine learning models.
Recent research on fair machine learning has drawn significant attention to developing effective algorithms that achieve fairness while maintaining good prediction performance (Barocas et al., 2017). However, the majority of these models require sensitive attributes to pre-process the data, regularize the model learning, or post-process the prediction in order to achieve fair predictions. For example, Kamiran and Calders (2012) propose to reweight training samples, assigning each sample a different weight to ensure fairness before model training. In addition, Madras et al. (2018) utilize adversarial training on sensitive attributes and prediction labels to debias classification results. However, sensitive attributes are often incomplete or even unavailable due to privacy, legal, or regulatory restrictions. For example, under regulations of the US Consumer Financial Protection Bureau (CFPB), creditors may not request or collect information about an applicant's race, color, sex, etc.2
Although we lack the sensitive attribute for training a fair model in the target domain, there might exist a similar domain that has sensitive attributes, which paves a way for us to adapt fairness from a related source domain to the target. For example, as shown in Figure 1, in credit risk assessment, a company expanding into a new market may have historical features and risk labels but not users' sensitive attributes (i.e., gender) in that market; it can then learn models from its existing market data (source), where sensitive attributes are available, to score people in the new market (target).
Figure 1. An illustration of cross-domain fair classification. The information of the sensitive attribute is available in the source domain, while unavailable in the target domain.
Though there are extensive works on domain adaptation, they overwhelmingly focus on classification or knowledge transfer (Tan et al., 2018; Zhuang et al., 2020), while work on fair classification through domain adaptation is rather limited.
Very few recent works study a shallow scenario of transfer between different groups of sensitive attributes in the same domain for fair classification (Coston et al., 2019; Schumann et al., 2019). We study a general cross-domain fair classification problem in terms of a specific sensitive attribute, which, to the best of our knowledge, has not been explored yet.
Therefore, we propose a dual adversarial learning framework that learns to adapt sensitive attributes for fair classification in the target domain. In essence, we investigate the following challenges: (1) how to adapt the sensitive attributes to the target domain by transferring knowledge from the source domain; and (2) how to predict the labels accurately while satisfying fairness criteria. The problem is non-trivial since there are discrepancies across domains and no anchors are explicitly available for knowledge transfer. Our solutions to these challenges result in a novel framework called FAIRDA for fair classification via domain adaptation. Our main contributions are summarized as follows:
• We study a novel problem of fair classification via domain adaptation.
• We propose a new framework FAIRDA that simultaneously transfers knowledge to adapt sensitive attributes and learns a fair classifier in the target domain with a dual adversarial learning approach.
• We conduct theoretical analysis demonstrating that fairness in the target domain can be achieved with estimated sensitive attributes derived from the source domain.
• We perform extensive experiments on real-world datasets to demonstrate the effectiveness of the proposed method for fair classification without sensitive attributes.
2. Related work
In this section, we briefly review the related works on fairness machine learning and deep domain adaptation.
2.1. Fairness in machine learning
Recent research on fairness in machine learning has drawn significant attention to developing effective algorithms that achieve fairness while maintaining good prediction performance, and generally focuses on individual fairness (Kang et al., 2020) or group fairness (Hardt et al., 2016; Zhang et al., 2017). Individual fairness requires the model to give similar predictions to similar individuals (Caton and Haas, 2020; Cheng et al., 2021). In group fairness, similar predictions are desired among multiple groups categorized by a specific sensitive attribute (e.g., gender). Other niche notions of fairness include subgroup fairness (Kearns et al., 2018) and Max-Min fairness (Lahoti et al., 2020), which aims to maximize the minimum expected utility across groups. In this work, we focus on group fairness. To improve group fairness, debiasing techniques have been applied at different stages of a machine learning model: (1) pre-processing approaches (Kamiran and Calders, 2012) apply dedicated transformations to the original dataset to remove intrinsic discrimination and obtain unbiased training data prior to modeling; (2) in-processing approaches (Bechavod and Ligett, 2017; Agarwal et al., 2018) tackle this issue by incorporating fairness constraints or fairness-related objective functions into the design of machine learning models; and (3) post-processing approaches (Dwork et al., 2018) revise the biased prediction labels through debiasing mechanisms.
Although the aforementioned methods can improve group fairness, they generally require access to sensitive attributes, which is often infeasible. Very few recent works study fairness with limited sensitive attributes. For example, Zhao et al. (2022) and Gupta et al. (2018) explore and use related features as proxies of the sensitive attribute to achieve better fairness. Lahoti et al. (2020) propose an adversarially reweighted method to achieve the Rawlsian Max-Min fairness objective, which aims at improving the accuracy of the worst-case protected group. However, these methods may require domain knowledge to approximate sensitive attributes or are not suitable for ensuring group fairness. In addition, though Madras et al. (2018) mention fairness transfer, what they study is cross-task transfer (Tan et al., 2021), which is different from our setting.
In this paper, we study the novel problem of cross-domain fair classification, which aims to estimate the sensitive attributes of the target domain to achieve group fairness.
2.2. Deep domain adaptation
Domain adaptation (Pan and Yang, 2009) aims at mitigating the generalization bottleneck introduced by domain shift. With the rapid growth of deep neural networks, deep domain adaptation has drawn much attention lately. In general, deep domain adaptation methods aim to learn a domain-invariant feature space that can reduce the discrepancy between the source and target domains. This goal is accomplished either by transforming the features from one domain to be closer to the other domain, or by projecting both domains into a domain-invariant latent space (Shu et al., 2019). For instance, TLDA (Zhuang et al., 2015) is a deep autoencoder-based model for learning domain-invariant representations for classification. Inspired by the idea of Generative Adversarial Network (GAN) (Goodfellow et al., 2014), researchers also propose to perform domain adaptation in an adversarial training paradigm (Ganin et al., 2016; Tzeng et al., 2017; Shu et al., 2019). By exploiting a domain discriminator to distinguish the domain labels while learning deep features to confuse the discriminator, DANN (Ganin et al., 2016) achieves superior domain adaptation performance. ADDA (Tzeng et al., 2017) learns a discriminative representation using labeled source data and then maps the target data to the same space through an adversarial loss. Recently, very few works apply transfer learning techniques for fair classification (Coston et al., 2019; Schumann et al., 2019). Although sensitive attributes are not available in the target domain, there may exist some publicly available datasets which can be used as auxiliary sources. However, these methods mostly consider a shallow scenario of transfer between different sensitive attributes in the same dataset.
In this paper, we propose a new domain adaptive approach based on dual adversarial learning to achieve fair classification in the target domain.
3. Problem statement
We first introduce the notations used in this paper and then give the formal problem definition. Let 𝒟1 = {X1, A1, Y1} denote the data in the source domain, where X1, A1, and Y1 represent the set of data samples, sensitive attributes, and corresponding labels, respectively. Let 𝒟2 = {X2, Y2} be the target domain, where the sensitive attributes A2 of the target domain are unknown. The data distributions of the source and target domains are similar but differ due to domain discrepancy. Following existing work on fair classification (Barocas et al., 2017; Mehrabi et al., 2019), we evaluate fairness performance using metrics such as equal opportunity and demographic parity. Without loss of generality, we consider binary classification. Equal opportunity requires that the probabilities of positive instances with arbitrary sensitive attributes A being assigned a positive outcome are equal: 𝔼(Ŷ ∣ A = a, Y = 1) = 𝔼(Ŷ ∣ A = b, Y = 1), ∀a, b, where Ŷ is the predicted label. Demographic parity requires the behavior of the prediction model to be fair to different sensitive groups. Concretely, it requires that the positive rates across sensitive attributes are equal: 𝔼(Ŷ ∣ A = a) = 𝔼(Ŷ ∣ A = b), ∀a, b. The problem of fair classification with domain adaptation is formally defined as follows:
Problem Statement: Given the training data 𝒟1 = {X1, A1, Y1} from the source domain and 𝒟2 = {X2, Y2} from the target domain, learn an effective classifier for the target domain while satisfying fairness criteria such as demographic parity.
4. FAIRDA: Fair classification with domain adaptation
In this section, we present the details of the proposed framework for fair classification with domain adaptation. As shown in Figure 2, our framework consists of two major modules: (1) an adversarial domain adaptation module that estimates sensitive attributes for the target domain; and (2) an adversarial debiasing module to learn a fair classifier in the target domain.
Figure 2. An illustration of the proposed framework FAIRDA. It consists of two major modules: (1) an adversarial domain adaptation module containing a sensitive attribute predictor and a domain classifier; and (2) an adversarial debiasing module consisting of a label predictor and a bias predictor.
Specifically, the adversarial domain adaptation module contains a sensitive attribute predictor fθA, which models the prediction of sensitive attributes, and a domain classifier fθd, which transfers knowledge to the target domain for estimating sensitive attributes. The adversarial debiasing module consists of a label predictor fθY, which models label classification in the target domain, and a bias predictor fθa, which attempts to recover the estimated sensitive attributes from the target domain data.
4.1. Estimating target sensitive attributes
Since the sensitive attributes in the target domain are unknown, it is necessary to estimate them in order to build fair classifiers. Recent advances in unsupervised domain adaptation have shown promising results in adapting a classifier trained on a source domain to unlabeled data in the target domain (Ganin and Lempitsky, 2015). Therefore, we propose to investigate an unsupervised domain adaptation framework to infer the sensitive attributes in the target domain. Specifically, we minimize the prediction error of sensitive attributes in the source domain with the following objective function:
\min_{\theta_A, h_1} \mathcal{L}_A = \mathbb{E}_{(x_1, a_1) \sim \mathcal{D}_1} \left[ \ell\left( f_{\theta_A}(h_1(x_1)), a_1 \right) \right],     (1)
where fθA is used to predict sensitive attributes, ℓ denotes the loss function for minimizing the prediction error, such as the cross-entropy loss, and the embedding function h1(·) learns the representation for predicting sensitive attributes. To ensure that fθA trained on 𝒟1 can well estimate the sensitive attributes of 𝒟2, we further introduce a domain classifier fθd to align the representations of the source and target domains in the same feature space. Specifically, fθd aims to differentiate whether a representation is from the source or the target, while h1(·) aims to learn a domain-invariant representation that can fool fθd, i.e.,
\min_{\theta_d} \max_{h_1} \mathcal{L}_d = \mathbb{E}_{x_1 \sim \mathcal{D}_1} \left[ \ell\left( f_{\theta_d}(h_1(x_1)), 1 \right) \right] + \mathbb{E}_{x_2 \sim \mathcal{D}_2} \left[ \ell\left( f_{\theta_d}(h_1(x_2)), 0 \right) \right],     (2)
where θd are the parameters of the domain classifier, and the domain label is 1 for source samples and 0 for target samples. The overall objective function for this module is a min-max game between the sensitive attribute predictor fθA (together with the encoder h1) and the domain classifier fθd as follows:
\min_{\theta_A, h_1} \max_{\theta_d} \mathcal{L}_{DA} = \mathcal{L}_A - \alpha \mathcal{L}_d,     (3)
where α controls the importance of the adversarial learning of the domain classifier. To estimate the sensitive attributes in the target domain, we utilize the trained predictor: Â2 = fθA(h1(X2)).
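To make the min-max game above concrete, the following is a minimal PyTorch-style sketch of the adversarial domain adaptation module implemented with a gradient reversal layer; the class names, layer sizes, and the choice of gradient reversal (rather than explicit alternating updates) are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients in the backward
    pass so that the encoder h1 learns to fool the domain classifier."""
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.alpha * grad_output, None

class SensitiveAttributeEstimator(nn.Module):
    def __init__(self, in_dim, hid_dim=32, alpha=0.01):
        super().__init__()
        self.alpha = alpha
        self.h1 = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU())  # shared encoder h1
        self.f_A = nn.Linear(hid_dim, 2)  # sensitive attribute predictor f_{theta_A}
        self.f_d = nn.Linear(hid_dim, 2)  # domain classifier f_{theta_d}

    def forward(self, x1, a1, x2):
        z1, z2 = self.h1(x1), self.h1(x2)
        # Sensitive attribute prediction loss on the labeled source domain.
        loss_A = F.cross_entropy(self.f_A(z1), a1)
        # Domain classification loss on gradient-reversed features: f_d learns to
        # separate source (label 1) from target (label 0), while h1 receives the
        # reversed gradient and learns a domain-invariant representation.
        z = GradReverse.apply(torch.cat([z1, z2], dim=0), self.alpha)
        d = torch.cat([torch.ones(len(z1)), torch.zeros(len(z2))]).long()
        loss_d = F.cross_entropy(self.f_d(z), d)
        return loss_A + loss_d

    @torch.no_grad()
    def estimate_target_attributes(self, x2):
        # Estimated target sensitive attributes: f_{theta_A}(h1(X2)).
        return self.f_A(self.h1(x2)).argmax(dim=1)
```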
4.2. Adversarial debiasing for fair classification
To learn a fair classifier for the target domain, we leverage the derived sensitive attributes Â2 and the labels Y2 in the target domain. Specifically, we aim to learn a representation that can predict the labels accurately while being irrelevant to the sensitive attributes. To this end, we feed the representation into a label predictor and a bias predictor. The label predictor minimizes the prediction error of the labels with the following objective function:
\min_{\theta_Y, h_2} \mathcal{L}_Y = \mathbb{E}_{(x_2, y_2) \sim \mathcal{D}_2} \left[ \ell\left( f_{\theta_Y}(h_2(x_2)), y_2 \right) \right],     (4)
where fθY is used to predict the labels, h2(·) is an embedding layer that encodes the features into a latent representation space, and ℓ is the cross-entropy loss. Furthermore, to learn fair representations and make fair predictions in the target domain, we incorporate a bias predictor fθa to predict the estimated sensitive attributes, while h2(·) tries to learn a representation that can fool fθa:
\min_{\theta_a} \max_{h_2} \mathcal{L}_a = \mathbb{E}_{x_2 \sim \mathcal{D}_2} \left[ \ell\left( f_{\theta_a}(h_2(x_2)), \hat{a}_2 \right) \right],     (5)
where θa are the parameters of the adversary that predicts sensitive attributes. Finally, the overall objective function of adversarial debiasing for fair classification is a min-max function:
\min_{\theta_Y, h_2} \max_{\theta_a} \mathcal{L}_{AD} = \mathcal{L}_Y - \beta \mathcal{L}_a,     (6)
where β controls the importance of the bias predictor.
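As a companion to the objective above, here is a minimal sketch of the two loss terms used in the adversarial debiasing module, assuming the estimated sensitive attributes a2_hat come from the first module; the full alternating update schedule with RMSProp is sketched after Section 4.3. The function and argument names are ours, not from the paper's released code.

```python
import torch.nn.functional as F

def debiasing_losses(h2, f_Y, f_a, x2, y2, a2_hat, beta):
    """h2: encoder, f_Y: label predictor, f_a: bias predictor (all nn.Modules).
    Returns (encoder_loss, adversary_loss) for the min-max game: the encoder loss
    is minimized w.r.t. h2 and f_Y, the adversary loss w.r.t. f_a only."""
    z2 = h2(x2)
    loss_label = F.cross_entropy(f_Y(z2), y2)        # accurate label prediction
    loss_bias = F.cross_entropy(f_a(z2), a2_hat)     # how well the adversary recovers a2_hat
    encoder_loss = loss_label - beta * loss_bias     # h2 hides sensitive information
    adversary_loss = F.cross_entropy(f_a(z2.detach()), a2_hat)  # adversary trained on frozen features
    return encoder_loss, adversary_loss
```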
4.3. The proposed framework: FAIRDA
We have introduced how to estimate sensitive attributes via adversarial domain adaptation and how to ensure fair classification via adversarial debiasing. We integrate the two components, and the overall objective function of the dual adversarial learning is as follows:
\min_{\theta_A, h_1, \theta_Y, h_2} \max_{\theta_d, \theta_a} \mathcal{L} = \mathcal{L}_{DA} + \mathcal{L}_{AD} = \left( \mathcal{L}_A - \alpha \mathcal{L}_d \right) + \left( \mathcal{L}_Y - \beta \mathcal{L}_a \right).     (7)
The parameters in the objective are learned with RMSProp, an adaptive learning rate method that divides the learning rate by an exponentially decaying average of squared gradients. We choose RMSProp as the optimizer because it is a popular and effective method for adapting the learning rate and is widely used for training adversarial neural networks (Dou et al., 2019; Li et al., 2022; Zhou and Pan, 2022). Adam is another widely adopted optimizer that extends RMSProp with momentum terms; however, the momentum terms may make Adam unstable (Mao et al., 2017; Luo et al., 2018; Clavijo et al., 2021). We also prioritize the training of fθA to ensure a good estimation of Â2. Next, we conduct a theoretical analysis of the dual adversarial learning for the fairness guarantee.
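The following is a compact, self-contained sketch of the dual adversarial optimization schedule with RMSProp on random toy data; the network sizes, learning rates, warm-up length, and update order are illustrative assumptions rather than the paper's exact training recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x1, a1 = torch.randn(128, 10), torch.randint(0, 2, (128,))  # source features / sensitive attributes
x2, y2 = torch.randn(128, 10), torch.randint(0, 2, (128,))  # target features / labels

h1, f_A, f_d = nn.Linear(10, 16), nn.Linear(16, 2), nn.Linear(16, 2)
h2, f_Y, f_a = nn.Linear(10, 16), nn.Linear(16, 2), nn.Linear(16, 2)
alpha, beta = 0.01, 0.01

opt_da = torch.optim.RMSprop(list(h1.parameters()) + list(f_A.parameters()), lr=1e-3)
opt_d = torch.optim.RMSprop(f_d.parameters(), lr=1e-3)
opt_fc = torch.optim.RMSprop(list(h2.parameters()) + list(f_Y.parameters()), lr=1e-3)
opt_adv = torch.optim.RMSprop(f_a.parameters(), lr=1e-3)

for step in range(200):
    # Module 1: adversarial domain adaptation (prioritized during the warm-up phase).
    z1, z2 = torch.relu(h1(x1)), torch.relu(h1(x2))
    d_labels = torch.cat([torch.ones(len(z1)), torch.zeros(len(z2))]).long()
    loss_dom = F.cross_entropy(f_d(torch.cat([z1, z2]).detach()), d_labels)
    opt_d.zero_grad(); loss_dom.backward(); opt_d.step()        # domain classifier step

    loss_A = F.cross_entropy(f_A(z1), a1)
    loss_fool = -F.cross_entropy(f_d(torch.cat([z1, z2])), d_labels)
    loss_da = loss_A + alpha * loss_fool
    opt_da.zero_grad(); loss_da.backward(); opt_da.step()        # encoder + attribute predictor step

    if step < 100:  # warm-up: prioritize training f_theta_A before debiasing
        continue

    # Module 2: adversarial debiasing with the estimated sensitive attributes.
    with torch.no_grad():
        a2_hat = f_A(torch.relu(h1(x2))).argmax(dim=1)           # estimated A2
    z = torch.relu(h2(x2))
    loss_adv = F.cross_entropy(f_a(z.detach()), a2_hat)
    opt_adv.zero_grad(); loss_adv.backward(); opt_adv.step()     # bias predictor step

    loss_fc = F.cross_entropy(f_Y(z), y2) - beta * F.cross_entropy(f_a(z), a2_hat)
    opt_fc.zero_grad(); loss_fc.backward(); opt_fc.step()        # fair classifier step
```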
5. A theoretical analysis on fairness guarantee
In this section, we perform a theoretical analysis of the fairness guarantee under the proposed framework FAIRDA with key assumptions. The model essentially contains two modules: (1) the adversarial domain adaptation for estimating sensitive attributes (Equation 3); and (2) the adversarial debiasing for learning a fair classifier (Equation 6). The performance of the second module relies on the output (i.e., Â2) of the first module.
To understand the theoretical guarantee for the first module, we follow recent work on analyzing conventional unsupervised domain adaptation (Zhang et al., 2019; Liu et al., 2021). The majority of these works consider model effectiveness under different types of domain shifts, such as label shift, covariate shift, and conditional shift (Kouw and Loog, 2018; Liu et al., 2021). When analyzing domain adaptation models, researchers usually consider the existence of one shift while assuming the other shifts are invariant across domains (Kouw and Loog, 2018). The existence of label shift is a common setting of unsupervised domain adaptation (Chen et al., 2018; Lipton et al., 2018; Azizzadenesheli et al., 2019; Wu et al., 2019). We follow the above papers and make the same assumption on label shift, which in our scenario is a sensitive attribute shift across the two domains, i.e., the prior distribution changes as p(A1) ≠ p(A2). In addition, we assume that the other shifts are invariant across domains, for example, p(X1|A1) = p(X2|A2). Under the assumption of the sensitive attribute shift, we can derive the risk in the target domain following Kouw and Loog (2018):
R_2 = \mathbb{E}_{(x, a) \sim \mathcal{D}_1} \left[ \frac{p(A_2 = a)}{p(A_1 = a)} \, \ell\left( f_{\theta_A}(h_1(x)), a \right) \right],
where the ratio p(A2)/p(A1) represents the change in the proportions of the sensitive attribute. Since we do not have samples with sensitive attributes from the target domain, we can use the samples drawn from the source distribution to estimate the target sensitive attribute distribution with mean matching (Gretton et al., 2009) by minimizing the following function:
\min_{w} \left\| M_1 w - \mu_2 \right\|^2,
where M1 is the matrix whose columns are the empirical sample means of the encoded source features within each sensitive group, w contains the estimated target sensitive attribute proportions, and μ2 is the encoded feature mean for the target. We incorporate the above strategy for estimating the target sensitive attribute proportions with gradient descent during the adversarial training (Li et al., 2019).
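To illustrate the mean-matching step, here is a small self-contained NumPy sketch, assuming binary sensitive attributes and using a least-squares solve (with an extra row encouraging the proportions to sum to one) followed by projection onto the simplex; the data, function name, and solver choice are toy illustrations rather than the paper's exact procedure.

```python
import numpy as np

def estimate_target_proportions(z1, a1, z2):
    """z1: encoded source features (n1 x d), a1: source sensitive attributes (0/1),
    z2: encoded target features (n2 x d). Returns estimated [p(A2=0), p(A2=1)]."""
    M1 = np.stack([z1[a1 == 0].mean(axis=0), z1[a1 == 1].mean(axis=0)], axis=1)  # d x 2
    mu2 = z2.mean(axis=0)                                                        # d
    # Solve min_w ||M1 w - mu2||^2 with an extra row encouraging sum(w) = 1,
    # then clip and renormalize so the result lies on the probability simplex.
    A = np.vstack([M1, np.ones((1, 2))])
    b = np.append(mu2, 1.0)
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    w = np.clip(w, 1e-6, None)
    return w / w.sum()

rng = np.random.default_rng(0)
z1 = np.concatenate([rng.normal(0, 1, (700, 8)), rng.normal(2, 1, (300, 8))])  # source: 70/30 groups
a1 = np.concatenate([np.zeros(700, dtype=int), np.ones(300, dtype=int)])
z2 = np.concatenate([rng.normal(0, 1, (400, 8)), rng.normal(2, 1, (600, 8))])  # target: shifted to 40/60
print(estimate_target_proportions(z1, a1, z2))  # roughly [0.4, 0.6]
```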
The noise induced by estimating the target sensitive attributes Â2 is non-negligible, which may influence the adversarial debiasing in the second module. Next, we theoretically show that, under mild conditions, we can still satisfy fairness metrics such as demographic parity. First, we can prove that the global optimum of the adversarial debiasing objective in Equation (6) can be achieved if and only if p(Z2|Â2 = 1) = p(Z2|Â2 = 0), where Z2 = h2(X2) is the representation of X2, according to Proposition 1 in Goodfellow et al. (2014). Next, we introduce the following theorem under two reasonable assumptions:
Theorem 1. Let Ŷ2 denote the predicted label of the target domain, and suppose that:
(1) The estimated sensitive attributes Â2 and the representation of X2 are independent conditioned on the true sensitive attributes, i.e.,
p(Â2, Z2|A2) = p(Â2|A2)p(Z2|A2);
(2) The estimated sensitive attributes are not random, i.e.,
p(A2 = 1|Â2 = 1) ≠ p(A2 = 0|Â2 = 1).
If the adversarial debiasing objective in Equation (6) reaches the global optimum, the label predictor fθY will achieve demographic parity, i.e.,
p(Ŷ2|A2 = 0) = p(Ŷ2|A2 = 1).
We first explain the two assumptions:
(1) Since we use two separate embedding layers h1(·) and h2(·) to predict the target sensitive attributes and to learn the latent representation, respectively, it generally holds that Â2 is independent of the representation Z2 of X2 conditioned on the true sensitive attributes, i.e., p(Â2, Z2|A2) = p(Â2|A2)p(Z2|A2);
(2) Since we are using adversarial learning to learn an effective estimator fθA for sensitive attributes, it is reasonable to assume that it does not produce random prediction results.
We prove Theorem 1 as follows:
Since p(Â2, Z2|A2) = p(Â2|A2)p(Z2|A2), we have p(Z2|Â2, A2) = p(Z2|A2). In addition, when the algorithm converges, we have p(Z2|Â2 = 1) = p(Z2|Â2 = 0). Expanding both sides over the true sensitive attribute A2 and using p(Z2|Â2, A2) = p(Z2|A2), we obtain
p(Z2|A2 = 1)p(A2 = 1|Â2 = 1) + p(Z2|A2 = 0)p(A2 = 0|Â2 = 1) = p(Z2|A2 = 1)p(A2 = 1|Â2 = 0) + p(Z2|A2 = 0)p(A2 = 0|Â2 = 0).
Based on the above equation, we can get
[p(A2 = 1|Â2 = 1) − p(A2 = 1|Â2 = 0)] · [p(Z2|A2 = 1) − p(Z2|A2 = 0)] = 0,
which, together with assumption (2) that the estimated sensitive attributes are not random, shows that p(Z2|A2 = 1) = p(Z2|A2 = 0). Since the prediction Ŷ2 = fθY(Z2) depends on X2 only through the representation Z2, we get p(Ŷ2|A2 = 1) = p(Ŷ2|A2 = 0), which is exactly demographic parity.
6. Experiments
In this section, we present the experiments to evaluate the effectiveness of FAIRDA. We aim to answer the following research questions (RQs):
• RQ1: Can FAIRDA obtain fair predictions without accessing sensitive attributes in the target domain?
• RQ2: How can we transfer fairness knowledge from the source domain while effectively using it to regularize the target's prediction?
• RQ3: How would different choices of the weights of the two adversarial components impact the performance of FAIRDA?
6.1. Datasets
We conduct experiments on four publicly available benchmark datasets for fair classification: COMPAS (Angwin et al., 2016), Adult (Dua and Graff, 2017), Toxicity (Dixon et al., 2018), and CelebA (Liu et al., 2015).
• COMPAS3: This dataset describes the task of predicting the recidivism of individuals in the U.S. Both “sex” and “race” could be the sensitive attributes of this dataset (Le Quy et al., 2022).
• ADULT4: This dataset contains records of personal yearly income, and the label is whether the income of a specific individual exceeds 50k or not. Following Zhao et al. (2022), we choose “sex" as the sensitive attribute.
• Toxicity5: This dataset collects comments with labels indicating whether each comment is toxic or not. Following Chuang and Mroueh (2021), we choose “race" as the sensitive attribute.
• CelebA6: This dataset contains celebrity images, where each image has 40 human-labeled binary attributes. Following Chuang and Mroueh (2021), we choose the attribute “attractive" as our prediction task and “sex" as the sensitive attribute.
The evaluation of all models is measured by the performance on the target dataset. We split train : eval : test set for the target dataset as 0.5 : 0.25 : 0.25. For the source dataset, because we do not evaluate on it, we split train : eval set as 0.6 : 0.4. It is worth noting that for the aforementioned datasets, we need to define a filter to derive source and target domains for empirical studies. Our general principle is to choose a feature that is reasonable in the specific scenario. For example, in COMPAS, we use age as the filter due to the prior knowledge that individuals in different age groups have different preferences to disclose their sex and race information. For ADULT, we consider the sex of each individual to be the sensitive attribute, and define two filters based on the working class and country to derive the source and target domains. The statistics of the datasets are shown in Table 1.
6.2. Experimental settings
6.2.1. Baselines
Since there is no existing work on cross-domain fair classification, we compare our proposed FAIRDA with the following representative methods in fair classification without sensitive attributes.
• Vanilla: This method directly trains a base classifier without explicitly regularizing on the sensitive attributes.
• ARL (Lahoti et al., 2020): This method uses reweighting of under-represented regions detected by an adversarial model to alleviate bias. It focuses on improving Max-Min fairness rather than the group fairness considered in our setting; however, it is an important work in fair machine learning, so we still include it as a baseline for completeness.
• KSMOTE (Yan et al., 2020): It first derives pseudo groups, and then uses them to design regularization to ensure fairness.
• FairRF (Zhao et al., 2022): It optimizes prediction fairness without the sensitive attribute by using available features that are known to be correlated with the sensitive attribute.
For KSMOTE7 and FairRF8, we directly use the code provided by the authors. For all other approaches, we adopt a three-layer multi-layer perceptron (MLP) as the base classifier, set its two hidden dimensions to 64 and 32, and use the RMSProp optimizer with an initial learning rate of 0.001. We have similar observations for other types of classifiers such as Logistic Regression and SVM.
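For reference, below is a minimal sketch of this base classifier configuration (a three-layer MLP with hidden sizes 64 and 32, optimized with RMSProp at an initial learning rate of 0.001); the input dimension and number of classes are placeholders.

```python
import torch.nn as nn
import torch.optim as optim

def build_base_classifier(in_dim, n_classes=2):
    # Three-layer MLP with hidden dimensions 64 and 32.
    model = nn.Sequential(
        nn.Linear(in_dim, 64), nn.ReLU(),
        nn.Linear(64, 32), nn.ReLU(),
        nn.Linear(32, n_classes),
    )
    optimizer = optim.RMSprop(model.parameters(), lr=0.001)
    return model, optimizer
```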
6.2.2. Evaluation metrics
Following existing work on fair models, we measure the classification performance with Accuracy (ACC) and F1, and the fairness performance based on Demographic Parity and Equal Opportunity (Mehrabi et al., 2019). We consider the scenario when the sensitive attributes and labels are binary, which can be naturally extended to more general cases.
• Demographic Parity: A classifier is considered fair if the prediction Ŷ is independent of the sensitive attribute A. In other words, demographic parity requires that each demographic group has the same chance of a positive outcome: 𝔼(Ŷ|A = 1) = 𝔼(Ŷ|A = 0). We report the difference in demographic parity between the sensitive groups (ΔDP): ΔDP = |𝔼(Ŷ|A = 1) − 𝔼(Ŷ|A = 0)|.
• Equal Opportunity: Equal opportunity adds one more condition to demographic parity. A classifier is said to be fair if the prediction Ŷ for positive instances is independent of the sensitive attribute. Concretely, it requires the true positive rates of different sensitive groups to be equal: 𝔼(Ŷ|A = 1, Y = 1) = 𝔼(Ŷ|A = 0, Y = 1). Similarly, in the experiments, we report the difference in equal opportunity between the sensitive groups (ΔEO): ΔEO = |𝔼(Ŷ|A = 1, Y = 1) − 𝔼(Ŷ|A = 0, Y = 1)|.
Note that demographic parity and equal opportunity measure fairness performance in different ways, and the smaller the values are, the better the performance of fairness.
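The two reported gaps can be computed directly from predictions, labels, and sensitive attributes; the short sketch below shows one way to do so for binary attributes and labels (the function names are ours, not taken from any released code).

```python
import numpy as np

def delta_dp(y_pred, a):
    """|E(Y_hat | A=1) - E(Y_hat | A=0)|: demographic parity gap."""
    return abs(y_pred[a == 1].mean() - y_pred[a == 0].mean())

def delta_eo(y_pred, y_true, a):
    """|E(Y_hat | A=1, Y=1) - E(Y_hat | A=0, Y=1)|: equal opportunity gap,
    i.e., the difference in true positive rates between the two groups."""
    pos = y_true == 1
    return abs(y_pred[(a == 1) & pos].mean() - y_pred[(a == 0) & pos].mean())
```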
6.3. Fairness performances (RQ1)
To answer RQ1, we compare the proposed framework with the aforementioned baselines in terms of prediction and fairness. All experiments are conducted 5 times and the average performances and standard deviations are reported in Tables 2–4 in terms of ACC, F1, ΔEO, ΔDP. We have the following observations:
• Adapting sensitive attribute information from a similar source domain could help achieve a better fairness performance in the target domain. For example, compared with ARL, KSMOTE and FairRF, which leverage the implicit sensitive information within the target domain, overall FAIRDA performs better in terms of both fairness metrics ΔDP and ΔEO across all six experiments.
• In general, the proposed framework FAIRDA achieves better fairness while maintaining comparatively good prediction performance. For example, in Exp. 6 on the CelebA dataset, the ΔDP of FAIRDA improves by 21.2% compared to Vanilla, which applies no explicit fairness regularization, while ACC drops by only 0.1%.
• We conduct experiments on various source and target datasets. We observe that, no matter whether the number of instances in the source domain is greater or smaller than that in the target domain, the proposed FAIRDA outperforms the baselines consistently, which demonstrates that the intuition of adapting sensitive attribute information is effective and that FAIRDA is robust.
6.4. Ablation study (RQ2)
In this section, we aim to analyze the effectiveness of each component in the proposed FAIRDA framework. As shown in Section 4, FAIRDA contains two adversarial components. The first is designed to make better predictions of the target domain's sensitive attributes using knowledge learned from the source domain. The second is designed to debias the classifier using the sensitive attributes predicted by the domain-adaptive classifier. To answer RQ2, we investigate the effects of these components by defining two variants of FAIRDA:
• FAIRDA w/o DA: This is a variant of FAIRDA without domain adversary on the source domain. It first trains a classifier with the source domain's sensitive attribute and directly applies it to predict the target domain's sensitive attribute.
• FAIRDA w/o Debiasing: This is a variant of FAIRDA without debiasing adversary on the target domain. In short, it is equivalent to Vanilla.
Table 5 reports the average performances with standard deviations for each method. We could make the following observations:
• When we eliminate the domain adversary for adapting the sensitive attributes information (i.e., FAIRDA w/o DA), both ΔDP and ΔEO increase, which means the fairness performances are reduced. In addition, the classification performances also decline compared with FAIRDA. This suggests that the adaptation is able to make better estimations for the target sensitive attributes, which contributes to better fairness performances.
• When we eliminate the debiasing adversary (i.e., FAIRDA w/o Debiasing), no fairness regularization is applied to the target domain; therefore, it is equivalent to Vanilla. We observe that the fairness performance drops significantly, which suggests that the debiasing adversary in FAIRDA is indispensable. In addition, even with the adversarial debiasing, FAIRDA still achieves a comparatively good F1 compared with FAIRDA w/o Debiasing. This reinforces the intuition of leveraging an adversary for debiasing.
6.5. Parameter analysis (RQ3)
To answer RQ3, we explore the sensitivity of the two important hyperparameters of our model using Exp. 1. α controls the impact of the adversary on the sensitive attribute predictor, while β controls the influence of the adversary on the debiasing. We vary both α and β over [0.0001, 0.001, 0.01, 0.1, 1], and other settings are the same as in Exp. 1. The results are shown in Figure 3. It is worth noting that, for Figures 3C, D, lower values indicate better fairness performance, while for Figures 3A, B, higher values indicate better prediction performance. From Figures 3C, D we observe that: (1) generally, larger α and β achieve fairer predictions, while smaller α and β result in worse fairness; (2) when we increase the value of α, both ΔDP and ΔEO first decrease and then increase when α becomes too large. This may be because the estimated target sensitive attributes are still noisy, and strengthening the adversarial debiasing that optimizes against these noisy sensitive attributes may not necessarily lead to better fairness; and (3) for β, the fairness performance is consistently better with a higher value. From Figure 3B, we observe that the classification performance is better when α and β are balanced. Overall, when α and β are in the range of [0.001, 0.1], FAIRDA achieves relatively good fairness and classification performance.
Figure 3. Parameter sensitivity analysis to assess the impact of domain adversary and debiasing adversary components. (A) ACC, (B) F1, (C) ΔDP, and (D) ΔEO.
7. Conclusion and future work
In this paper, we study a novel and challenging problem: exploiting domain adaptation for fair and accurate classification in a target domain without access to sensitive attributes. We propose a new framework, FAIRDA, that uses a dual adversarial learning approach to achieve fair and accurate classification. We provide a theoretical analysis demonstrating that we can learn a fair model under mild assumptions. Experiments on real-world datasets show that the proposed approach achieves fairer performance than existing approaches by exploiting information from a source domain, even without knowing the sensitive attributes in the target domain. For future work, first, we can consider settings where multiple source domains are available and explore how to exploit domain discrepancies across multiple domains to enhance the fair classifier in the target domain. Second, we will explore fairness transfer under different types of domain shifts, such as the conditional shift of sensitive attributes. Third, we will explore other domain adaptation approaches, such as meta-transfer learning, to achieve cross-domain fair classification.
Data availability statement
The original contributions presented in the study are included in the article/supplementary material, further inquiries can be directed to the corresponding author.
Author contributions
All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.
Funding
This work was supported in part by a Cisco Research Award and XCMG American Research Corporation.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Footnotes
1. ^https://www.fastcompany.com/90342596/schools-are-quietly-turning-to-ai-to-help-pick-who-gets-in-what-could-go-wrong
2. ^https://www.consumerfinance.gov/rules-policy/regulations/1002/5/
3. ^https://github.com/propublica/compas-analysis
4. ^https://archive.ics.uci.edu/ml/machine-learning-databases/adult/
5. ^https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification
6. ^http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
References
Agarwal, A., Beygelzimer, A., Dudík, M., Langford, J., and Wallach, H. (2018). “A reductions approach to fair classification,” in International Conference on Machine Learning (Stockholm), 60–69.
Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). “Machine bias,” in Ethics of Data and Analytics (Auerbach Publications), 254–264.
Azizzadenesheli, K., Liu, A., Yang, F., and Anandkumar, A. (2019). Regularized learning for domain adaptation under label shifts. arXiv preprint arXiv:1903.09734. doi: 10.48550/arXiv.1903.09734
Barocas, S., Hardt, M., and Narayanan, A. (2017). “Fairness in machine learning,” in NeurIPS Tutorial, Vol. 1, 2. Available online at: http://www.fairmlbook.org
Bechavod, Y., and Ligett, K. (2017). Learning fair classifiers: a regularization-inspired approach. arXiv preprint arXiv:1707.00044. doi: 10.48550/arXiv.1707.00044
Caton, S., and Haas, C. (2020). Fairness in machine learning: a survey. CoRR, abs/2010.04053. doi: 10.48550/arXiv.2010.04053
Chen, Q., Liu, Y., Wang, Z., Wassell, I., and Chetty, K. (2018). “Re-weighted adversarial adaptation network for unsupervised domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Salt Lake City, UT: IEEE), 7976–7985.
Cheng, L., Varshney, K. R., and Liu, H. (2021). Socially responsible ai algorithms: Issues, purposes, and challenges. arXiv preprint arXiv:2101.02032. doi: 10.1613/jair.1.12814
Chuang, C.-Y., and Mroueh, Y. (2021). “Fair mixup: fairness via interpolation,” in International Conference on Learning Representations (Vienna).
Clavijo, J. M., Glaysher, P., Jitsev, J., and Katzy, J. (2021). Adversarial domain adaptation to reduce sample bias of a high energy physics event classifier. Mach. Learn. Sci. Technol. 3, 015014. doi: 10.1088/2632-2153/ac3dde
Coston, A., Ramamurthy, K. N., Wei, D., Varshney, K. R., Speakman, S., Mustahsan, Z., et al. (2019). “Fair transfer learning with missing protected attributes,” in Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (Honolulu, HI), 91–98.
Dixon, L., Li, J., Sorensen, J., Thain, N., and Vasserman, L. (2018). “Measuring and mitigating unintended bias in text classification,” in Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society (New Orleans, LA), 67–73.
Dou, Q., Ouyang, C., Chen, C., Chen, H., Glocker, B., Zhuang, X., et al. (2019). Pnp-adanet: plug-and-play adversarial domain adaptation network at unpaired cross-modality cardiac segmentation. IEEE Access 7:99065–99076. doi: 10.1109/ACCESS.2019.2929258
Du, M., Yang, F., Zou, N., and Hu, X. (2020). Fairness in deep learning: a computational perspective. IEEE Intell. Syst. 36, 25–34. doi: 10.1109/MIS.2020.3000681
Dua, D., and Graff, C. (2017). UCI Machine Learning Repository. Irvine, CA: University of California. Available online at: http://archive.ics.uci.edu/ml
Dwork, C., Immorlica, N., Kalai, A. T., and Leiserson, M. (2018). “Decoupled classifiers for group-fair and efficient machine learning,” in Conference on Fairness, Accountability and Transparency (New York, NY), 119–133.
Ganin, Y., and Lempitsky, V. (2015). “Unsupervised domain adaptation by backpropagation,” in International Conference on Machine Learning (Lille: PMLR), 1180–1189.
Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., et al. (2016). Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 1–35. doi: 10.1007/978-3-319-58347-1_10
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). “Generative adversarial nets,” in NIPS (Montreal, QC), 2672–2680.
Gretton, A., Smola, A., Huang, J., Schmittfull, M., Borgwardt, K., and Schölkopf, B. (2009). Covariate shift by kernel mean matching. Dataset Shift Mach. Learn. 3, 5. doi: 10.7551/mitpress/9780262170055.003.0008
Gupta, M., Cotter, A., Fard, M. M., and Wang, S. (2018). Proxy fairness. arXiv preprint arXiv:1806.11212. doi: 10.48550/arXiv.1806.11212
Hamid, A. J., and Ahmed, T. M. (2016). Developing prediction model of loan risk in banks using data mining. Mach. Learn. Appl. 3, 3101. doi: 10.5121/mlaij.2016.3101
Hardt, M., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning. arXiv preprint arXiv:1610.02413. doi: 10.48550/arXiv.1610.02413
Kamiran, F., and Calders, T. (2012). Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 33, 1–33. doi: 10.1007/s10115-011-0463-8
Kang, J., He, J., Maciejewski, R., and Tong, H. (2020). “Inform: individual fairness on graph mining,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Diego, CA), 379–389.
Kearns, M., Neel, S., Roth, A., and Wu, Z. S. (2018). “Preventing fairness gerrymandering: auditing and learning for subgroup fairness,” in International Conference on Machine Learning (Stockholm), 2564–2572.
Kouw, W. M., and Loog, M. (2018). An introduction to domain adaptation and transfer learning. arXiv preprint arXiv:1812.11806. doi: 10.48550/arXiv.1812.11806
Lahoti, P., Beutel, A., Chen, J., Lee, K., Prost, F., Thain, N., et al. (2020). Fairness without demographics through adversarially reweighted learning. arXiv preprint arXiv:2006.13114.
Le Quy, T., Roy, A., Iosifidis, V., Zhang, W., and Ntoutsi, E. (2022). “A survey on datasets for fairness-aware machine learning,” in Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, e1452.
Li, Y., Murias, M., Major, S., Dawson, G., and Carlson, D. (2019). “On target shift in adversarial domain adaptation,” in The 22nd International Conference on Artificial Intelligence and Statistics (Naha, Okinawa: PMLR), 616–625.
Li, Y., Zhang, Y., and Yang, C. (2022). Unsupervised domain adaptation with joint adversarial variational autoencoder. Knowl. Based Syst. 17, 109065. doi: 10.1016/j.knosys.2022.109065
Lipton, Z., Wang, Y.-X., and Smola, A. (2018). “Detecting and correcting for label shift with black box predictors,” in International Conference on Machine Learning (Stockholm: PMLR), 3122–3130.
Liu, X., Guo, Z., Li, S., Xing, F., You, J., Kuo, C.-C. J., et al. (2021). “Adversarial unsupervised domain adaptation with conditional and label shift: Infer, align and iterate,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (Montreal, QC: IEEE), 10367–10376.
Liu, Z., Luo, P., Wang, X., and Tang, X. (2015). “Deep learning face attributes in the wild,” in Proceedings of International Conference on Computer Vision (ICCV).
Luo, Y., Zhang, S.-Y., Zheng, W.-L., and Lu, B.-L. (2018). “Wgan domain adaptation for eeg-based emotion recognition,” in International Conference on Neural Information Processing (Montreal, QC: Springer), 275–286.
Madras, D., Creager, E., Pitassi, T., and Zemel, R. (2018). “Learning adversarially fair and transferable representations,” in International Conference on Machine Learning (Stockholm: PMLR), 3384–3393.
Mao, X., Li, Q., Xie, H., Lau, R. Y., Wang, Z., and Paul Smolley, S. (2017). “Least squares generative adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision (Venice: IEEE), 2794–2802.
Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K., and Galstyan, A. (2019). A survey on bias and fairness in machine learning. arXiv preprint arXiv:1908.09635. doi: 10.48550/arXiv.1908.09635
Mena, J. (2011). Machine Learning Forensics for Law Enforcement, Security, and Intelligence. CRC Press.
Pan, S. J., and Yang, Q. (2009). A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359. doi: 10.1109/TKDE.2009.191
Redmond, M. (2011). Communities and Crime Unnormalized Data set. UCI Machine Learning Repository. Available online at: http://www.ics.uci.edu/mlearn/MLRepository.html
Schumann, C., Wang, X., Beutel, A., Chen, J., Qian, H., and Chi, E. H. (2019). Transfer of machine learning fairness across domains. arXiv preprint arXiv:1906.09688. doi: 10.48550/arXiv.1906.09688
Shu, Y., Cao, Z., Long, M., and Wang, J. (2019). Transferable curriculum for weakly-supervised domain adaptation. Proc. AAAI Conf. Artif. Intell. 33, 4951–4958. doi: 10.1609/aaai.v33i01.33014951
Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., and Liu, C. (2018). “A survey on deep transfer learning,” in International Conference on Artificial Neural Networks (Rhodes: Springer), 270–279.
Tan, Y., Li, Y., and Huang, S.-L. (2021). “Otce: a transferability metric for cross-domain cross-task representations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (Nashville, TN: IEEE), 15779–15788.
Tzeng, E., Hoffman, J., Saenko, K., and Darrell, T. (2017). “Adversarial discriminative domain adaptation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Honolulu, HI: IEEE), 7167–7176.
Wu, Y., Winston, E., Kaushik, D., and Lipton, Z. (2019). “Domain adaptation with asymmetrically-relaxed distribution alignment,” in International Conference on Machine Learning (Long Beach, CA: PMLR), 6872–6881.
Yan, S., Kao, H.-T., and Ferrara, E. (2020). “Fair class balancing: Enhancing model fairness without observing sensitive attributes,” in Proceedings of the 29th ACM International Conference on Information &Knowledge Management, 1715–1724.
Zhang, L., Wu, Y., and Wu, X. (2017). “Achieving non-discrimination in data release,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Halifax, NS), 1335–1344.
Zhang, Y., Liu, T., Long, M., and Jordan, M. (2019). “Bridging theory and algorithm for domain adaptation,” in International Conference on Machine Learning (Long Beach, CA: PMLR), 7404–7413.
Zhao, T., Dai, E., Shu, K., and Wang, S. (2022). “You can still achieve fairness without sensitive attributes: exploring biases in non-sensitive features,” in WSDM '22, The Fourteenth ACM International Conference on Web Search and Data Mining, Phoenix, March 8-12, 2022 (Phoenix: ACM).
Zhou, R., and Pan, Y. (2022). “Flooddan: unsupervised flood forecasting based on adversarial domain adaptation,” in 2022 IEEE 5th International Conference on Big Data and Artificial Intelligence (BDAI) (Fuzhou: IEEE), 6–12.
Zhuang, F., Cheng, X., Luo, P., Pan, S. J., and He, Q. (2015). “Supervised representation learning: Transfer learning with deep autoencoders,” in IJCAI (Buenos Aires).
Keywords: fair machine learning, adversarial learning, domain adaptation, trustworthiness, transfer learning
Citation: Liang Y, Chen C, Tian T and Shu K (2023) Fair classification via domain adaptation: A dual adversarial learning approach. Front. Big Data 5:1049565. doi: 10.3389/fdata.2022.1049565
Received: 20 September 2022; Accepted: 07 December 2022;
Published: 04 January 2023.
Edited by:
Qi Li, Iowa State University, United States
Reviewed by:
Chaochao Chen, Zhejiang University, China
Mengdi Huai, Iowa State University, United States
Copyright © 2023 Liang, Chen, Tian and Shu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Kai Shu, kshu@iit.edu