- 1Department of Sociology and Anthropology, Ben-Gurion University of the Negev, Beer-Sheba, Israel
- 2Uehiro Research Division for iPS Cell Ethics, Center for iPS Cell Research and Application (CiRA), Kyoto University, Kyoto, Japan
Introduction
Big data, juxtaposing genetic, clinical, and socio-demographic information, forms the basis for research on health risk correlations in precision/personalized medicine. In this context, artificial intelligence (AI) has recently been used to improve polygenic risk scores (PRS). A polygenic risk score provides a measure of individual disease risk based on genome-wide information, typically calculated as a statistical aggregation of many genomic variants1. The development of population-level genetic studies, such as genome-wide association studies (GWAS), has accelerated the development of PRS as part of genomic research. Because a PRS is derived from a particular population, its development must inherently guard against overfitting and underfitting and must address population diversity. Previous studies comparing PRS predictive accuracy across biobank data from different countries have shown that genetic prediction accuracy (based on UK Biobank data) was far lower in non-European populations: 2.5-fold lower in East Asians and 4.9-fold lower in Africans, on average (Martin et al., 2019). This poorer predictive power of PRS in non-European populations, particularly among individuals of African ancestry, is most likely due to their underrepresentation in the training data. In the same vein, PRS for breast cancer in African American women, based largely on variants identified in European-ancestry populations, show poor performance: DNA susceptibility loci are not shared across race/ethnicity and have been shown to differ most often for individuals of African ancestry because of their considerably greater genetic diversity (Feng et al., 2017). The way in which each individual variant affects the polygenic score can also vary from study to study, adding to the complexity.
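To make the statistical aggregation concrete, a PRS is commonly computed as a weighted sum: for each variant, the individual's count of risk alleles is multiplied by a GWAS-derived effect size, and the products are summed. The following minimal sketch illustrates this; the variant IDs, effect sizes, and genotype are invented for illustration and are not real GWAS results.

```python
# Toy sketch of a polygenic risk score (PRS) computation.
# Variant IDs, effect sizes, and genotypes are hypothetical.

# GWAS-derived effect size (beta) per risk allele, for a handful of variants
effect_sizes = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}

# One individual's genotype: count of risk alleles (0, 1, or 2) at each variant
genotype = {"rs0001": 2, "rs0002": 1, "rs0003": 0}

def polygenic_risk_score(genotype, effect_sizes):
    """PRS as the weighted sum of risk-allele counts times effect sizes."""
    return sum(effect_sizes[v] * genotype[v] for v in effect_sizes)

score = polygenic_risk_score(genotype, effect_sizes)
print(round(score, 2))  # 2*0.12 + 1*(-0.05) + 0*0.30 -> 0.19
```

Because the weights are estimated from a training population, a score computed this way silently inherits that population's allele frequencies and linkage structure, which is the root of the transferability problem discussed above.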
In addition, using AI for PRS increases the complexity of ethical and social challenges, especially when electronic health records are integrated (Fritzsche et al., 2023). While research on PRS is ongoing, its clinical validity is still debated (Slunecka et al., 2021). Nevertheless, commercial genomic sequencing laboratories are already offering an array of both clinical and direct-to-consumer tests that include PRS as part of their risk prediction products for a variety of diseases and conditions (James et al., 2021).
We wish to draw attention to the importance and timeliness of comparing some key issues emerging from the proposed EU AI Act, especially the banning of “social scoring” by AI systems, with the ethical concerns related to PRS. In particular, when used as a form of ethnicity-related genomic scoring, PRS with poorer predictive power in underrepresented populations could exacerbate ethnically based health discrimination and force a reckoning with the relevance of self-reported race, ethnicity, and ancestry, and with the relationship of such biomarkers and risk factors to disease diagnoses. PRS is geared primarily toward healthcare/medicine, whereas social scoring is used in various areas (e.g., education, finance, insurance, migration) as well as in healthcare/medicine. While PRS are also developed for areas other than healthcare, e.g., for educational purposes (Merz et al., 2022), such educational attainment polygenic scores are similarly vulnerable to biases due to stratification, again highlighting the need for the critical reflection raised in this opinion. While PRS is not the same as social scoring, highlighting the differences and similarities will open up the interface of AI and health risk construction to an even wider audience.
Criticism of PRS in the context of ethnic/ancestry traits
There is no well-established genetic basis for distinctly stratifying human populations by ethnicity (Mersha and Beck, 2020). However, adding parameters of ethnicity to the calculation of polygenic risk scores may reveal statistical correlations and thus interest researchers. It is now widely accepted that most of the genetic diversity in the human species exists between individuals within populations and that only a small fraction of the total genetic diversity is related to variation between ethnic populations (Kaplan and Fullerton, 2022). As geneticist Richard Lewontin (1972) famously asserted, these features of human genetic variation mean that racial classification is of “virtually no genetic or taxonomic significance” and hence should be abandoned. Recently, there have been calls for building genetic literacy through education that uses population thinking and multifactorial genetics to refute genetic essentialist beliefs about race (Little et al., 2022). However, with PRS targeting “risk groups,” we are currently witnessing the resurfacing of traditional social groupings like ethnicity and race, re-charged by genomic designations. When risk estimates are applied to patients stratified by self-identified race and/or ethnicity, a range of consequences may follow, despite the often-imprecise designation of “ethnicity” and its conflation with ancestry (James et al., 2021). Clinical use of PRS could exacerbate race-based health disparities and reinforce systemic biases that treat self-reported race, ethnicity, and ancestry as biomarkers and risk factors for disease diagnoses (Lewis and Green, 2021). While many common complex traits and diseases differ in their prevalence between racial and/or ethnic groups, particularly in the United States, this has been shown to be the result of pronounced racial and ethnic health disparities rather than genetic differences (Yearby et al., 2022).
These concerns regarding the social/ethnic aspects of PRS echo recent concerns about AI-driven social scoring.
Criticism of AI-driven social scoring
The proposed EU AI Act (2021) explicitly bans the use of AI systems by public authorities (and is expected to be extended later to the private sector) for social scoring purposes. Social scoring in this context means using an AI system to evaluate the trustworthiness of individuals based on their behaviors or personal characteristics, leading to stratified treatment of individuals. Adherence to public health measures, for example following quarantine rules or receiving vaccinations, can affect a social score (Meszaros et al., 2022). The proposed Act explains this as follows:
AI systems providing social scoring of natural persons for general purpose by public authorities or on their behalf may lead to discriminatory outcomes and the exclusion of certain groups. They may violate the right to dignity and non-discrimination and the values of equality and justice. Such AI systems evaluate or classify the trustworthiness of natural persons based on their social behaviour in multiple contexts or known or predicted personal or personality characteristics. The social score obtained from such AI systems may lead to the detrimental or unfavourable treatment of natural persons or whole groups thereof in social contexts, which are unrelated to the context in which the data was originally generated or collected or to a detrimental treatment that is disproportionate or unjustified to the gravity of their social behaviour. Such AI systems should be therefore prohibited. (EU, 2021, article 17, p. 21).
The proposed AI Act lists high-risk AI systems in areas that include, for example, biometric identification and categorization of natural persons, law enforcement, and migration, asylum, and border control management. China’s social credit system, which allegedly rates individuals based on the aggregation and analysis of data concerning their past behaviors, would be banned under the EU Act if it indeed uses social scoring.
Discussion
Polygenic risk scores (PRS) and social scoring are two different concepts. PRS are used primarily in medical research and do not involve any evaluation of an individual’s behavior or personal characteristics, but rather are based solely on genetic data. Social scoring, on the other hand, refers to a system of evaluating individuals based on various social and behavioral factors, such as their credit score, online activity, criminal record, or other personal data. However, both may reproduce biases. The concerns raised here could be used to develop a critique of how AI for genomic risk stratification in healthcare/medicine should not only be regulated for representativeness of human diversity but perhaps also for potential amplification of social scoring. This is especially important, as there may be a risk of drawing conclusions from PRS about causal relationships too quickly and with insufficient knowledge of statistics and causality/correlation claims (Fritzsche et al., 2023). By lowering the statistical standards for regarding a marker as trait-associated, weighting associations by estimated effect sizes, and aggregating associations over a larger number of variants, predictive accuracy may be increased at the expense of explainability, as any clear etiological link between specific genetic changes and the phenotype of interest is obscured.
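The trade-off described above, between predictive accuracy and explainability, can be illustrated with a toy sketch of p-value thresholding, a common step in PRS construction: relaxing the significance threshold pulls many more weakly associated variants into the score. All variant IDs, p-values, and effect sizes below are invented for illustration.

```python
# Toy illustration of p-value thresholding in PRS construction.
# Relaxing the threshold admits more weakly associated variants,
# which may raise predictive accuracy while obscuring etiology.
# All values are hypothetical.

variants = [
    ("rs0001", 1e-9, 0.30),   # (variant ID, GWAS p-value, effect size)
    ("rs0002", 1e-6, 0.12),
    ("rs0003", 3e-4, 0.05),
    ("rs0004", 2e-2, 0.01),
    ("rs0005", 4e-1, 0.004),
]

def select_variants(variants, p_threshold):
    """Keep only variants whose association p-value passes the threshold."""
    return [(vid, beta) for vid, p, beta in variants if p <= p_threshold]

strict = select_variants(variants, 5e-8)   # genome-wide significance
relaxed = select_variants(variants, 0.05)  # relaxed threshold

print(len(strict), len(relaxed))  # 1 4
```

Under the strict genome-wide significance threshold only one variant survives; relaxing the threshold to 0.05 admits four. Each admitted variant carries a tiny, hard-to-interpret weight, which is precisely why the aggregate score resists clear etiological explanation.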
By banning social scoring as an unacceptable risk, the proposed AI Act aims to go beyond the technical robustness, privacy, and safety required by the General Data Protection Regulation (GDPR) to prevent or minimize the probability of unintentional harm in processing personal data by AI systems. The AI Act does not directly mention AI-driven PRS. Nevertheless, in addition to specifying several unacceptable risks, it establishes the goal of minimizing the risk of erroneous or biased AI-assisted decisions in critical areas, including healthcare. We must hence carefully consider PRS in the light of minimizing the risk of erroneous or biased AI-assisted decisions. Arguably, there are three major foci in the proposed AI Act that are relevant to both social scoring and polygenic scores: transparency, non-discrimination, and accountability.
(1) Transparency: The “right to explanation” formulated in the GDPR and the proposed EU AI Act requires that AI systems be explainable for high-risk decision making (EU, 2021). The “black box” conundrum is manifested in the context of scoring through the non-explainable relationships between individual genomic variants, PRS, and disease phenotypes, similar to the relationships between individual “accountability,” obtained/accessible personal data, and social scoring.
(2) Non-discrimination: AI systems must be trained on diverse data to avoid bias and to prevent uncertain decision-making and the unjust use of such data toward different populations. This requirement is critical in the case of ethnicity-related PRS, which demands careful consideration of population diversity.
(3) Accountability: Certain actors, such as the government, health maintenance organizations, or health insurance companies, should be held responsible for the unintended consequences of decisions made using PRS. For example, who is responsible if a PRS-based model for breast cancer screening precludes a patient from accessing screening, and who bears responsibility for the harm of improper screening when risk scores are wrongly produced on the basis of race and ethnicity?
Social scoring is used in various areas as well as in healthcare/medicine, but for the sake of comparison we focus here on its use in healthcare/medicine, which is the primary area of PRS. If AI-derived PRS evaluate or classify the risk of natural persons based on their ethnic/racial self-designation (or practitioner designation), this would be akin to AI-derived social scoring based on previous social behaviours in multiple contexts or on known or predicted personal or personality characteristics. The ethnicity-related PRS obtained from such AI systems may therefore lead to the detrimental or unfavourable treatment of natural persons or whole groups of persons in healthcare contexts. Further, if the model of PRS-based screening is adopted as standard clinical practice, and if risk scores are produced based on race and ethnicity, it could lead to under- or over-screening. The purpose and implications of the classification must be clear to both those making the classification and those being classified. Social scoring can go wrong by relying on previous behaviours that are unrelated to the context of scoring, or by producing disproportionately detrimental treatment. Ethnicity-related PRS can go wrong by relying on ethnic/ancestry traits that are similarly unrelated to the context of scoring, or by leading to similarly detrimental treatment. In this case, both AI systems should thus be equally prohibited.
Author contributions
AR and JM both made substantial, direct contributions and approved the final version of the manuscript. All authors contributed to the article and approved the submitted version.
Acknowledgments
We are grateful for the funding provided by the JSPS–ISF Joint Program, grant 62/22 (ISF), JPJSBP120228404 (JSPS), “Biobanks for genomic medicine in Israel and Japan: An analysis of ethics and policy”.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Footnotes
1https://www.genome.gov/genetics-glossary/Polygenic-Risk-Score.
References
EU (2021). Proposal for a regulation of the European parliament and of the council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain union legislative acts. Available at: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A52021PC0206.
Feng, Y., Rhie, S. K., Huo, D., Ruiz Narvaez, E. A., Haddad, S. A., Ambrosone, C. B., et al. (2017). Characterizing genetic susceptibility to breast cancer in women of African ancestry. Cancer Epidemiol. Biomarkers Prev. 26 (7), 1016–1026.
Fritzsche, M. C., Akyüz, K., Cano Abadía, M., McLennan, S., Marttinen, P., Mayrhofer, M. T., et al. (2023). Ethical layering in AI-driven polygenic risk scores – new complexities, new challenges. Front. Genet. 14, 1098439. doi:10.3389/fgene.2023.1098439
James, J. E., Riddle, L., Koenig, B. A., and Joseph, G. (2021). The limits of personalization in precision medicine: Polygenic risk scores and racial categorization in a precision breast cancer screening trial. PLoS ONE 16 (10), e0258571. doi:10.1371/journal.pone.0258571
Kaplan, J. M., and Fullerton, S. M. (2022). Polygenic risk, population structure and ongoing difficulties with race in human genetics. Philos. Trans. R. Soc. Lond., B, Biol. Sci. doi:10.1098/rstb.2020.0427
Lewis, A. C. F., and Green, R. C. (2021). Polygenic risk scores in the clinic: New perspectives needed on familiar ethical issues. Genome Med. 13, 14. doi:10.1186/s13073-021-00829-7
Lewontin, R. C. (1972). “The apportionment of human diversity,” in Evolutionary biology. Editors T. Dobzhansky, M. K. Hech, and W. C. Steere (New York: NY: Springer), 381–398.
Little, I. D., Koehly, L. M., and Gunter, C. (2022). Understanding changes in genetic literacy over time and in genetic research participants. Am. J. Hum. Genet. 109 (12), 2141–2151. doi:10.1016/j.ajhg.2022.11.005
Martin, A. R., Kanai, M., Kamatani, Y., Okada, Y., Neale, B. M., and Daly, M. J. (2019). Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51 (4), 584–591. doi:10.1038/s41588-019-0379-x
Mersha, T. B., and Beck, A. F. (2020). The social, economic, political, and genetic value of race and ethnicity in 2020. Hum. Genomics 14 (1), 37. doi:10.1186/s40246-020-00284-2
Merz, E. C., Strack, J., Hurtado, H., Vainik, U., Thomas, M., Evans, A., et al. (2022). Educational attainment polygenic scores, socioeconomic factors, and cortical structure in children and adolescents. Hum. Brain Mapp. 43 (16), 4886–4900. doi:10.1002/hbm.26034
Meszaros, J., Minari, J., and Huys, I. (2022). The future regulation of artificial intelligence systems in healthcare services and medical research in the European Union. Front. Genet. 13, 927721. doi:10.3389/fgene.2022.927721
Slunecka, J. L., van der Zee, M. D., Beck, J. J., Johnson, B. N., Finnicum, C. T., Pool, R., et al. (2021). Implementation and implications for polygenic risk scores in healthcare. Hum. Genomics 15 (1), 46. doi:10.1186/s40246-021-00339-y
Keywords: polygenic risk score (PRS), artificial intelligence, stratification, ethnicity, transparency
Citation: Raz A and Minari J (2023) AI-driven risk scores: should social scoring and polygenic scores based on ethnicity be equally prohibited? Front. Genet. 14:1169580. doi:10.3389/fgene.2023.1169580
Received: 19 February 2023; Accepted: 17 May 2023;
Published: 30 May 2023.
Edited by:
Manuel Corpas, University of Westminster, United Kingdom
Reviewed by:
Stuart McLennan, Technical University of Munich, Germany
Marie-Christine Fritzsche, Technical University of Munich, Germany, in collaboration with reviewer SM
Copyright © 2023 Raz and Minari. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Aviad Raz, aviadraz@bgu.ac.il