Recent Consanguinity and Outbred Autozygosity Are Associated With Increased Risk of Late-Onset Alzheimer’s Disease

Napolioni, Valerio; Scelsi, Marzia A.; Khan, Raiyan R.; Altmann, Andre; Greicius, Michael D.

doi:10.3389/fgene.2020.629373

ORIGINAL RESEARCH article

Front. Genet. , 29 January 2021

Sec. Genetics of Aging

Volume 11 - 2020 | https://doi.org/10.3389/fgene.2020.629373

This article is part of the Research Topic Omics of Human Aging and Longevity in the Post Genome Era: From Single Biomarkers to Systems Biology Approaches View all 7 articles

Recent Consanguinity and Outbred Autozygosity Are Associated With Increased Risk of Late-Onset Alzheimer’s Disease

$\r\nValerio Napolioni*$ Valerio Napolioni^1*

Marzia A. Scelsi²

Raiyan R. Khan³

Andre Altmann²

Michael D. Greicius⁴

for the Alzheimer’s Disease Neuroimaging Initiative

¹Genomic and Molecular Epidemiology (GAME) Lab, School of Biosciences and Veterinary Medicine, University of Camerino, Camerino, Italy
²Computational Biology in Imaging and Genetics (COMBINE) Lab, Centre for Medical Image Computing, Department of Medical Physics and Biomedical Engineering, University College London, London, United Kingdom
³Department of Computer Science, Columbia University, New York, NY, United States
⁴Functional Imaging in Neuropsychiatric Disorders (FIND) Lab, Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, CA, United States

Prior work in late-onset Alzheimer’s disease (LOAD) has resulted in discrepant findings as to whether recent consanguinity and outbred autozygosity are associated with LOAD risk. In the current study, we tested the association between consanguinity and outbred autozygosity with LOAD in the largest such analysis to date, in which 20 LOAD GWAS datasets were retrieved through public databases. Our analyses were restricted to eight distinct ethnic groups: African–Caribbean, Ashkenazi–Jewish European, European–Caribbean, French–Canadian, Finnish European, North-Western European, South-Eastern European, and Yoruba African for a total of 21,492 unrelated subjects (11,196 LOAD and 10,296 controls). Recent consanguinity determination was performed using FSuite v1.0.3, according to subjects’ ancestral background. The level of autozygosity in the outbred population was assessed by calculating inbreeding estimates based on the proportion (F_ROH) and the number (N_ROH) of runs of homozygosity (ROHs). We analyzed all eight ethnic groups using a fixed-effect meta-analysis, which showed a significant association of recent consanguinity with LOAD (N = 21,481; OR = 1.262, P = 3.6 × 10^–4), independently of APOE^∗4 (N = 21,468, OR = 1.237, P = 0.002), and years of education (N = 9,257; OR = 1.274, P = 0.020). Autozygosity in the outbred population was also associated with an increased risk of LOAD, both for F_ROH (N = 20,237; OR = 1.204, P = 0.030) and N_ROH metrics (N = 20,237; OR = 1.019, P = 0.006), independently of APOE^∗4 [(F_ROH, N = 20,225; OR = 1.222, P = 0.029) (N_ROH, N = 20,225; OR = 1.019, P = 0.007)]. By leveraging the Alzheimer’s Disease Sequencing Project (ADSP) whole-exome sequencing (WES) data, we determined that LOAD subjects do not show an enrichment of rare, risk-enhancing minor homozygote variants compared to the control population. A two-stage recessive GWAS using ADSP data from 201 consanguineous subjects in the discovery phase followed by validation in 10,469 subjects led to the identification of RPH3AL p.A303V (rs117190076) as a rare minor homozygote variant increasing the risk of LOAD [discovery: Genotype Relative Risk (GRR) = 46, P = 2.16 × 10^–6; validation: GRR = 1.9, P = 8.0 × 10^–4]. These results confirm that recent consanguinity and autozygosity in the outbred population increase risk for LOAD. Subsequent work, with increased samples sizes of consanguineous subjects, should accelerate the discovery of non-additive genetic effects in LOAD.

Introduction

The impact of consanguinity on reproduction and Mendelian disorders is well known and documented (Bittles and Black, 2010). In contrast, very little has been published on the effects of consanguinity on late-onset diseases, even though inbreeding may have a prominent influence on late-onset traits (Rudan et al., 2003). Recessive inheritance of complex phenotypes can be linked to long [≥1-megabase (Mb)] runs of homozygosity (ROHs), which are indicative of recent consanguinity (Ghani et al., 2013). Levels of homozygosity vary by population owing to the evolutionary distance of different populations from the ancient migration events that led to elevated homozygosity (Pemberton et al., 2012; Kang et al., 2016).

Several studies have been carried out in late-onset Alzheimer’s Disease (LOAD) cohorts from different ethnicities, including Caribbean-Hispanics (Ghani et al., 2013), African Americans (Ghani et al., 2015), Wadi-Ara Arabs (Sherva et al., 2011), and Northern-Europeans (Nalls et al., 2009a; Sims et al., 2011) with the aim of determining the impact of ROHs on LOAD. The studies carried out in Caribbean-Hispanic (Ghani et al., 2013) and African American (Ghani et al., 2015) populations both demonstrated an association of long ROHs with LOAD, thus suggesting a link between recent consanguinity and LOAD. An association between consanguinity and LOAD was also demonstrated in a genealogical study of the Saguenay region in Québec (Vézina et al., 1999). Conversely, in the small ethnic isolate of Wadi-Ara Arabs (Sherva et al., 2011), the average degree of inbreeding was significantly higher in controls compared to cases. Moreover, the two studies carried out in Caucasians (Nalls et al., 2009a; Sims et al., 2011) showed discordant findings: the British–Irish study (Sims et al., 2011) displayed no association of number of ROHs with LOAD, while the mainly North-Western European cohort of neuropathologically verified subjects from the TGenII cohort (Nalls et al., 2009a) showed a suggestive increased number of ROHs in LOAD cases compared to controls. In sum, the results vary considerably by ancestral background, thus failing to provide a clear picture of the overall impact of homozygosity on LOAD risk.

In the present work we tested the association of consanguinity and autozygosity with LOAD by leveraging a large collection of publicly available GWAS data. To this aim, we determined the individual ancestry of subjects belonging to 20 independent GWAS datasets and pooled the consanguineous subjects according to their respective ethnic group. This step was followed by an association analysis between consanguinity in LOAD cases against older, cognitively healthy controls. We also tested the overall impact of genome-wide autozygosity in the outbred population. Finally, we leveraged Whole-Exome Sequencing (WES) data from the Alzheimer’s Disease Sequencing Project (ADSP) (Beecham et al., 2017) to test the global burden of rare minor homozygote variants in LOAD and to perform a two-stage recessive GWAS using 201 consanguineous subjects in the discovery phase followed by validation in 10,469 subjects.

Materials and Methods

Subjects

Twenty LOAD GWAS datasets were obtained from publicly available data repositories (Supplementary Table S1). The 20 datasets have been described in previous studies (Li et al., 2008; Filippini et al., 2009; Lee et al., 2011; Naj et al., 2011; Zhang et al., 2013; Clark et al., 2014; Proitsi et al., 2014; Saykin et al., 2015). Details of the participating studies and genotyping platforms used are provided in Supplementary Table S1.

The Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Saykin et al., 2015) was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment and early Alzheimer’s disease.

Whole-exome sequencing from the discovery phase of the Alzheimer’s Disease Sequencing Project (ADSP) (Beecham et al., 2017) was obtained through the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) and it includes 5,096 Alzheimer’s Disease (AD) cases and 4,965 controls, with an additional enriched sample set comprised of 853 AD cases from multiple affected families and 171 Hispanic controls.

This was a re-analysis of de-identified data available from shared data repositories. The study protocol was granted an exemption by the Stanford Institutional Review Board because the analyses were carried out on “de-identified, off-the-shelf” data.

Inclusion Criteria, Quality Control (QC) Pipeline, Ancestry Determination and Imputation

The entire dataset includes 34,111 participants. Analyses were performed using PLINK 1.9 (Chang et al., 2015). A comprehensive flowchart of the data QC/harmonization/ancestry-determination steps applied to the full dataset is reported as Figure 1.

FIGURE 1

Figure 1. Flow-chart of the workflow adopted in the data QC/harmonization/ancestry-determination processing of the full dataset.

Subjects with autosome missingness (≥5%) and/or X-chromosome missingness (≥5%) within the same dataset, age below 60 years, age information missing, or phenotype inconsistency [missing phenotype, diagnosis of mild cognitive impairment or other neurodegenerative phenotype] were excluded from the analysis (Supplementary Table S2).

Individual ancestry was determined using SNPweights v.2.1 (Chen et al., 2013) using reference populations from the 1000 Genomes Project (1KGP) (1000 Genomes Project Consortium, Auton et al., 2015). By applying an ancestry percentage cut-off ≥ 80%, the samples were stratified into the five super populations, South-Asians (SAS), East-Asians (EAS), Americans (AMR), Africans (AFR), and Europeans (EUR) (Supplementary Table S2). Since most of the samples belonged to the European population, we also determined their ancestry percentage according to four major ethnicities, North-Western, South-Eastern, Ashkenazi-Jewish, and Finnish Europeans, using reference populations available both from SNPweights v.2.1 (Chen et al., 2013) and 1KGP (1000 Genomes Project Consortium, Auton et al., 2015). European subjects were stratified into the above-mentioned ethnicities when their ancestry percentage was attributable with an ancestry percentage cut-off ≥ 50% (Supplementary Table S3).

We assigned French–Canadian (FCN) ancestry to subjects included in GenADA GWAS (Li et al., 2008) when they both reported Canada as country of origin for all the grandparents and French as their first spoken language.

Most subjects belonging to the Columbia University Study of Caribbean Hispanics with Familial and Sporadic Late Onset Alzheimer’s disease (CIDR) (Lee et al., 2011) had admixed ancestry (Supplementary Table S2); therefore, we stratified the subjects into three groups according to their prevalent ancestral background. This stratification allowed the definition of one dataset composed of African–Caribbean (African ancestry ≥ 50%), one dataset composed of European–Caribbean (European ancestry ≥ 50%) and one dataset including highly admixed subjects (ancestry percentage less than 50% attributable to a unique super-population from 1KGP, 1000 Genomes Project Consortium, Auton et al., 2015). Only the African–Caribbean and the European–Caribbean datasets were considered for the analyses. Subjects with genetic ancestry estimates discordant from self-reported ancestry were excluded from the analyses.

Next, datasets were tested for presence of consanguineous subjects using FSuite v.1.0.3 (Gazal et al., 2014). Consanguineous female subjects were flagged to avoid their exclusion because of apparent sex-inconsistency (e.g., due to increased homozygosity at X-chromosome SNPs). Subjects showing sex-inconsistency were excluded along with the possibly contaminated samples {heterozygosity F ≤ −0.03; more than 25 related [(identity-by-descent (IBD) ≥ 0.0625, equivalent to 3rd degree relative] within the same dataset}.

With the aim of maximizing the efficiency of quality control procedures and harmonizing the GWAS results after imputation, we collapsed the genotyping data from the 20 GWAS (when needed), according to sample ethnicity and the number of SNPs shared across the SNP-array platforms. Thus, we defined five groups, reported in Supplementary Table S4, where the subjects from different GWAS were collapsed and further QCed to remove the SNPs with a call rate ≤ 95%; Minor Allele Frequency (MAF) ≤ 1%; SNPs with MAF deviating more than 10% from the MAF reported in 1KGP for the relative population; SNPs with differential missingness between cases and controls (P < 0.05); SNPs deviating from Hardy–Weinberg Equilibrium (HWE) in controls (P < 5 × 10^–5); tri-allelic SNPs; and SNPs where the alleles are mismatched compared to the 1KGP reference sequence. A/T and C/G SNPs were removed prior to imputation.

All the datasets were phased and imputed using the Michigan Imputation Server (Das et al., 2016), considering the Haplotype Reference Consortium r1.1. 2016 European panel (McCarthy et al., 2016) for Europeans, 1KGP Phase 3 African panel (1000 Genomes Project Consortium, Auton et al., 2015) for African Indianapolis-Ibadan and the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) (Mathias et al., 2016) for admixed Caribbean. After imputation, SNPs with a r² quality score ≤ 0.7 and MAF ≤ 0.01 were excluded. Results of the imputation process are provided in Supplementary Table S5 and Supplementary Figure S1.

For the statistical analyses, inter-dataset duplicates (IBD ≥ 0.95) were removed from the dataset having the lowest SNP coverage, while, in case of relatedness (IBD ≥ 0.0625) the affected or older subject were kept, independently of SNP coverage (Supplementary Tables S6–S13).

Consanguinity Determination

Consanguinity determination was performed using FSuite v1.0.3 (Gazal et al., 2014), both at the pre- and post-imputation stage on QCed SNPs, according to each subjects’ ancestral background. Results were concordant in 98% of the subjects analyzed. Discordant subjects were kept in the subsequent analyses as outbred, since their ROHs may be somatic Copy Number Variations or linked to a specific ancestral background (e.g., ethnic minorities – Acadians, Sardinians, etc.) not captured by our ancestry-determination pipeline.

Subjects showing a homozygous region over 10 Mb on only one chromosome were considered carriers of putative uniparental isodisomy (UPD), according to the homozygosity cut-off previously reported (Papenhausen et al., 2011). UPD carriers were excluded from the association testing of consanguinity with LOAD since they represent subjects affected by chromosomal alterations, not the result of consanguineous unions.

ROH Calling and Burden Analysis

Runs of homozygosity (≥1 Mb) were determined for each ethnic group separately using PLINK1.9 (Chang et al., 2015) and according to the guidelines recently reported (Keller et al., 2012). GWAS datasets were pruned for strong LD (MAF ≥ 0.01, r² ≤ 0.1) and ROHs were defined as being ≥ 65 consecutive homozygous SNPs with no heterozygote calls allowed and a density greater than 1 SNP per 200 kb.

For each subject, we summed the total length of all their ROHs in the autosome and divided by the total SNP-mappable autosomal distance (2.77 × 10⁹ bases) to derive F_ROH, the proportion (between 0 and 1) of the autosome in ROHs, as previously described (Chang et al., 2015). F_ROH was used as the predictor of case-control status in ROH burden analyses.

Population Genetics, WES Variant Annotation and Statistical Analyses

Inter-population structure was examined using ADMIXTURE (Alexander et al., 2009). Intra-population structure for each ancestral group was determined using principal components (PCs) obtained from EIGENSOFT v.6.1 (Price et al., 2006) using pruned (r² ≤ 0.1), directly genotyped, SNPs.

The association of consanguinity/autozygosity with LOAD was carried out using three different logistic regression models:

(a) MODEL1: adjusting for each subject’s age at LOAD onset (when not available, we used the subject’s age at last visit or death), sex, the first three PC eigenvalues from population structure and GWAS imputation group (only for Ashkenazi-Jewish Europeans, Finnish Europeans, North-Western Europeans and South-Eastern Europeans, since those subjects were genotyped on multiple platforms);

(b) MODEL2: including APOE^∗4 dose to the list of covariates used in MODEL1;

Since EDU is available only for 43.1% of the full dataset (9,260 out of 21,492 subjects), we fitted MODEL1 and MODEL2 to this restricted subset of subjects as well, to allow an appropriate comparison with MODEL3 of the effect estimates. The association of EDU with F_ROH was tested by linear regression, considering MODEL2 and adjusting for diagnosis status. The analyses were carried out for each ethnic group separately and combined using a fixed-effect meta-analysis implemented in GWAMA (Mägi and Morris, 2010).

GWAS participants were mapped to ADSP WES participants through IBD estimation using ∼30K common, overlapping SNPs (MAF > 0.01) between the two datasets. We considered matching samples as those with a pair-wise coefficient of relatedness above 99%. The burden of rare recessive variants was tested using a general linear model, adjusting for sex, age, APOE^∗4 dose and ethnicity. ADSP WES data were annotated using Variant Effect Predictor (VEP) (McLaren et al., 2016). Recessive variants having a CADD (Rentzsch et al., 2019) score ≥ 15 and a MAF < 10% were considered when testing the global burden of rare recessive variants in LOAD and in a two-stage GWAS by applying the Recessive-Allele Frequency Test (RAFT) statistic (Lim et al., 2014).

Statistical significance was set at P < 0.05 for all the association testing, while for the two-stage GWAS, we applied Bonferroni’s correction according to the number of recessive variants tested in the discovery [P < 1.8 × 10^–5 (0.05/2767)] and replication [P < 0.007 (0.05/7)] phases.

Results

Ancestry Determination and Ethnic-Specific Differences

After quality control procedures, our analyses were restricted to eight distinct ethnic groups, namely African–Caribbeans from Dominican Republic (ACD, N = 398), Ashkenazi-Jewish Europeans (AJE, N = 1,229), European–Caribbeans from Dominican Republic (ECD, N = 671), French–Canadians (FCN, N = 376), Finnish Europeans (FIN, N = 219), North-Western Europeans (NWE, N = 16,496), South-Eastern Europeans (SEE, N = 1,083), and African-Yoruba (YRI, N = 1,020), for a total of 21,492 unrelated subjects (11,196 LOAD, 10,296 controls). The analysis of genetic structure, applied to the whole dataset, confirmed the effectiveness of ancestry determination cut-offs (Figure 2).

FIGURE 2

Figure 2. Analysis of the genetic structure confirmed the effectiveness of our ancestry-determination pipeline. Ancestry proportions of the 21,492 unrelated subjects studied (from the eight determined ethnic groups) as revealed by the ADMIXTURE software (Alexander et al., 2009), using pruned (MAF ≥ 0.01, r² ≤ 0.1) SNPs at K = 6. Each color represents a different ancestral component, and each ancestry is a mixture of different components. ACD, African–Caribbean from Dominican Republic; AJE, Ashkenazi-Jewish Europeans; ECD, European–Caribbean from Dominican Republic; FCN, French–Canadians; FIN, Finnish Europeans; NWE, North-Western Europeans; SEE, South-Eastern Europeans; YRI, African Yoruba.

Ethnic groups showed significant differences (P < 0.00001) in mean age, EDU and APOE allele frequency, independently of diagnostic status (Figure 3). Both consanguinity rates and consanguinity degree differed significantly (P < 0.00001) across the eight ethnic groups analyzed, with percentages of consanguineous subjects ranging from 1.2% in YRI to 31.7% in ECD (Table 1). As expected, consanguineous subjects displayed higher F_ROH estimates (0.024 ± 0.025 vs. 0.003 ± 0.002) and more ROH (N_ROH, 9.4 ± 6.3 vs. 4.4 ± 2.5) compared to outbred subjects (P < 0.00001, Table 2). When considering exclusively the outbred population, both F_ROH estimates and N_ROH significantly differed across the eight ethnic groups analyzed (P < 0.00001, Figure 3), with ACD and ECD showing the lowest F_ROH (0.0007 ± 0.0012) and N_ROH (0.4 ± 0.8), respectively. Conversely, the highest F_ROH and N_ROH were found in FIN (0.0053 ± 0.0038) and FCN (8.4 ± 2.9), respectively (Figure 3).

FIGURE 3

Figure 3. The eight ethnic groups analyzed showed ethnic-specific differences in Alzheimer’s-relevant risk factors. (A) mean age, (B) EDU, (C) APOE^∗2 frequency, (D) APOE^∗4 frequency, (E) inbreeding coefficient, and (F) number of ROHs (>1 Mb) differ significantly across the eight ethnic groups analyzed (P < 0.00001).

TABLE 1

Table 1. Consanguinity rates differ across the eight ethnic groups analyzed.

TABLE 2

Table 2. Genome-wide autozygosity determined through F_ROH estimates and number of ROHs across the eight ethnic groups according to case-control and consanguinity status.

Consanguineous subjects in ACD, ECD, FIN, and NWE groups showed a statistically significant (ACD, ECD, NWE), or nearly significant (FIN), lower education level compared to the relative outbred population (on average, 1.4 fewer EDU, Supplementary Tables S6–S13). Conversely, YRI consanguineous subjects showed higher education levels compared to the relative outbred population (Supplementary Table S13). No association was detected between F_ROH and EDU in the outbred sample (N = 8,648, β = −0.059, SE = 0.196, P = 0.760, heterogeneity-Q = 2.1 × 10^–7). However, a highly significant heterogeneity was found across seven ethnic groups, with ACD (β = −3.765, P = 0.077), ECD (β = −2.688, P = 0.059), and FIN (β = −4.959, P < 0.00001) showing a significant, or nearly significant, negative correlation between F_ROH and EDU, while YRI displayed an opposite correlation (β = 0.870, P = 0.079).

No significant differences were found for mean age between inbred and outbred groups. Consanguinity rates reported for the eight analyzed ethnic groups largely fall within the ranges reported in the literature (Goldschmidt et al., 1960; Gazal et al., 2015; Vardarajan et al., 2015).

The distribution of APOE genotypes and alleles did not show deviations from HWE (Supplementary Table S14). In addition, APOE genotypes and allele counts differed significantly between outbred and consanguineous subjects only for AJE controls, FCN cases and FIN controls (Supplementary Table S14).

Consanguinity Is Associated With an Increased Risk of LOAD

Consanguinity was significantly associated with LOAD (N = 21,481, OR = 1.262, P = 3.6 × 10^–4), independently of APOE^∗4 (N = 21,468, OR = 1.237, P = 0.002) and EDU (N = 9,257, OR = 1.274, P = 0.020) (Table 3).

TABLE 3

Table 3. Consanguinity increases the risk of LOAD.

When testing the association of degree of consanguinity with LOAD by separating close (first cousin/double-first cousin/avuncular offspring) from distant (second cousin offspring) consanguinity, it became clear that the association was driven by close consanguinity (close: N = 19,227, OR = 1.713, P = 0.002; distant: N = 21,284, OR = 1.207, P = 0.007, Table 3). The association reported for close and distant consanguinity was independent of APOE^∗4 and EDU (Table 3). When considering the analyses carried out in the smaller EDU subset, the inclusion of APOE^∗4 does not reduce statistical estimates of the associations (Table 3). Conversely, the inclusion of EDU as a variable slightly decreases all the associations reported in MODEL3 compared to MODEL2, such that the association of distant consanguinity with LOAD in MODEL3 trends in the same direction but is no longer statistically significant (OR = 1.170, P = 0.158, Table 3).

Autozygosity in the Outbred Population Is Associated With an Increased Risk of LOAD

Table 4 reports the results obtained from the meta-analysis of the association of genome-wide autozygosity determined both by F_ROH estimates and by N_ROH across the eight ethnic groups. When considering the full dataset, both F_ROH and number of ROHs are significantly associated with LOAD, independently of APOE^∗4 (Table 4). However, when testing the association of F_ROH and the number of ROHs with LOAD in the subset with information on EDU, the meta-analysis results are not statistically significant for Table 4, likely reflecting a lack of power rather than an effect of education given that MODELS 1 and 2 are also no longer significant with these sample sizes.

TABLE 4

Table 4. Autozygosity in the outbred population increases the risk of LOAD.

LOAD Genome Is Not Enriched in Rare Recessive Damaging Variants

Given the consistent association of LOAD with consanguinity and autozygosity, we leveraged ADSP WES data to establish whether LOAD subjects showed an enrichment of damaging recessive variants compared to the control population. After merging GWAS imputed data from NWE, SEE, AJE, FIN, and FCN groups with ADSP WES data, we determined that 4,969 subjects were overlapping between the two datasets (AJE = 287; FIN = 47; NWE = 4,424; SEE = 211).

Previous studies have shown that long ROHs are enriched for damaging homozygous variants, with the majority having a MAF ≤ 5% (Pemberton et al., 2012; Pemberton and Szpiech, 2018). Thus, we determined the number of rare, deleterious minor homozygote variants (RMHV) for each GWAS subject that was also whole-exome sequenced through ADSP. The four ethnic groups differed significantly in the average individual number of RMHV in their respective outbred population (N = 4,753, P = 0.0002), independently of diagnostic status (AJE: 18.97 ± 0.31; FIN:16.65 ± 0.74; NWE: 14.88 ± 0.08; SEE: 18.16 ± 0.38). As expected, consanguineous subjects displayed a significantly higher average individual number of RMHV compared to the outbred population, independently of their ethnicity and diagnostic status (23.31 ± 0.37 vs. 15.26 ± 0.08, P < 0.00001). Notably, the average RMHV in 12 subjects carrying a putative UPD was lower than the one reported for the outbred group, and significantly different compared to distant or close consanguineous subjects (UPD:14.46 ± 1.43; outbred:15.26 ± 0.07; distant consanguinity:20.23 ± 0.39; close consanguinity:42.14 ± 1.43).

When testing the burden of RMHV in LOAD vs. controls, no significant association was detected in the outbred (LOAD:15.15 ± 0.10; Controls:15.35 ± 1.14; P = 0.303) or consanguineous (LOAD:23.97 ± 0.99; Controls:24.48 ± 1.67; P = 0.805) group.

Identification of RPH3AL p.A303V (rs117190076) as RMHV Associated With LOAD

Despite the lack of association between the burden of RMHV and LOAD, we decided to leverage WES data to perform a two-stage recessive-GWAS using the 201 consanguineous subjects identified in ADSP as discovery phase, followed by validation in the remaining 10,469 ADSP subjects. To this aim, we applied the RAFT statistic (Lim et al., 2014) to the 2,767 RMHV detected in the discovery cohort composed exclusively of consanguineous subjects. Seven RMHV yielded a Bonferroni’s corrected statistically significant P < 1.8 × 10^–5 (Table 5). When applying the RAFT statistic to the seven variants in the validation group, only the RPH3AL missense variant (rs117190076, NP_001177340 p.A303V), successfully replicated (Genotype Relative Risk = 1.9, P = 8.0 × 10^–4). However, we could not validate one of the seven variants passing the statistical threshold in the discovery phase (SCAPER on chr15q24.3, rs200719909, NP_001339938 p.A280V), because no minor homozygote was detected in the replication/validation phase conducted on outbred subjects (Table 5). Remarkably, 523 out of 2,767 RMHV tested in the discovery group (18.9%) did not have a minor homozygote counterpart in the validation group (Supplementary Table S15). Although those variants did not pass the statistical threshold, set up by applying the Bonferroni’s correction in the discovery phase on outbred subjects, they may still have functional/causal role in LOAD.

TABLE 5

Table 5. Identification of RPH3AL rs117190076 as rare, deleterious minor homozygote variant associated with LOAD.

Putative UniParental Disomy Does Not Associate With LOAD

During the inbreeding determination process, 56 subjects (6 ACD, 42 NWE, 8 SEE) were found to be potential cases of UPD (Figure 4). The presence of UPD did not show a significant association with an increased risk for LOAD in a logistic regression testing the presence of putative isodisomy compared to the rest of ACD, NWE, and SEE outbred populations (OR = 1.561, P = 0.158). The origin of UPD in our subjects is unknown due to the lack of genotype data from their parents. Nine putative UPD subjects had first-degree relatives genotyped and in these nine cases none of the first-degree relatives showed any evidence of consanguinity or shared long ROHs, suggesting the presence of true isodisomy. Moreover, one of the 12 UPD subjects had ADSP WES data, showing a UPD on chromosome 9p (9p-UDP), was homozygote for the somatic JAK2 V617F (rs77375493) mutation. Since the co-occurrence of JAK2 V617F mutation and 9p-UPD is very common in hematological malignancies (Wang et al., 2016), a somatic (hematologic) origin for some of the reported UPD is highly conceivable, especially since 9p-UPD is the most common UPD among our UPD subjects (8/56, 14%, Figure 4).

FIGURE 4

Figure 4. Numerous cases of putative isodisomy misidentified as consanguineous subjects. Fifty-six subjects identified as consanguineous by FSuite v1.0.3 (Gazal et al., 2014) showed a single homozygous region over 10 Mb on just one chromosome, the homozygosity cut-off previously reported to define the presence of putative uniparental isodisomy (UPD). Two female subjects showing a putative UPD on X-chromosome are not shown. Red = LOAD; Blue = Control.

Discussion

Our results clearly demonstrate the effect of recent consanguinity and outbred autozygosity in increasing the risk of LOAD consistently across the eight ethnic groups analyzed, independently of APOE^∗4 and EDU. Several important features separate our work from previous studies looking at the impact of consanguinity and autozygosity on LOAD. First, and most critically, this is the largest such study to date combining 11,196 cases and 10,296 controls across eight ethnically distinct populations. Second, we went beyond standard super-population definitions of ethnicity and determined European sub-ancestry (NWE, AJE, FCN, FIN, NWE, and SEE), since it has previously been established that these ethnicities have different inbreeding rates (Pemberton et al., 2012; Kang et al., 2016), or are characterized by founder effects (AJE, FCN, FIN) (Jakkula et al., 2008; Roy-Gagnon et al., 2011). Third, rather than examining autozygosity across all subjects, we enriched our sample by identifying a consanguineous subset and analyzing them separately from the relative outbred population. This step allowed us to estimate the risk for LOAD attributable to the mating types of consanguineous subjects (first cousin/double-first cousin/avuncular vs. second cousin offspring), thereby providing a measure applicable to the clinical setting. Fourth, we leveraged the large amount of WES data from ADSP to determine the contribution of rare recessive damaging variants in LOAD. Lastly, we provided, for the first time, an estimate of the impact of putative isodisomy on LOAD; these subjects were also removed from our analysis of inbred subjects, thereby eliminating a source of noise since these subjects are wrongly identified as consanguineous using standard measures.

The overall results suggest the existence of inbreeding depression, which is a recognized phenomenon that is common to polygenic traits in all living organisms (Joshi et al., 2015). Inbreeding depression is thought to result from increased homozygosity of multiple recessive alleles that act in the same direction of effect at loci that influence the phenotype of interest (“directional dominance”) (Joshi et al., 2015). In a consanguineous individual, inbreeding depression is predicted to affect many polygenic endophenotypes which can be established risk factors for late-onset diseases, such as blood pressure, body mass index, cholesterol levels, glucose levels, and bone mineral density (Rudan and Campbell, 2004). The previous negative association of educational attainment and general cognitive abilities with genome-wide autozygosity (Joshi et al., 2015) suggests involvement of directional dominance at these two endophenotypes in increasing the risk for LOAD. Indeed, it has been widely reported that lower education is associated with a greater risk for dementia (Sharp and Gatz, 2011), while lower general cognitive abilities have been linked to an increased risk of dementia according to the cognitive reserve theory (Schmand et al., 1997). However, the present results show that the association of consanguinity with LOAD is independent of educational attainment. This evidence leads us to speculate on the involvement of other polygenic endophenotypes mediating the association of consanguinity with LOAD or on the direct effect of recessive loci in LOAD, yet to be discovered. In this context, we can mention that a recent study (Andrews et al., 2021), leveraging Polygenic Risk Score/Mendelian Randomization analyses on a large sample (N = 26,431 LOAD cases/controls tested for 22 LOAD risk factors/clinical biomarkers), strongly supported a causal role for blood pressure and cholesterol levels with LOAD phenome. Thus, it may be conceivable that directional dominance acting on blood pressure and cholesterol levels may be contributing to the association reported here. Future studies targeting a larger subset of consanguineous subjects, phenotypically characterized in a more homogeneous way, ideally including clinically relevant biomarkers such as blood pressure and cholesterol levels, will allow us to better determine the impact of directional dominance at those endophenotypes in LOAD.

Notably, despite significant differences in consanguinity rates, autozygosity level, mean age, mean EDU, and APOE frequencies, each of the ethnic groups individually showed significant association (or a non-significant trend in the same direction) for the association of close consanguinity or autozygosity with an increased risk of LOAD. Our results in ACD, ECD, and YRI are reassuringly and not surprisingly, in line with the previous studies (Ghani et al., 2013, 2015) since we used overlapping datasets. However, the results in the European groups (AJE, FCN, FIN, NWE, and SEE) are new and highlight interesting differences across the five ethnicities.

Previous studies carried out on Europeans of British/Irish descent (Nalls et al., 2009a; Sims et al., 2011) reported inconsistent results on the role of ROHs in Caucasians. However, neither study had sufficient power to detect significant results given the small sample size (N < 3,000). Indeed, given the small variation in genome wide F_ROH in unselected samples (standard deviation in our analyses are on the order of 0.001), large sample sizes (e.g., >12,000) are necessary to detect inbreeding depression given the relatively small effect sizes in samples not selected for recent inbreeding (Keller et al., 2012).

We also leveraged WES data from ADSP to determine the contribution of rare recessive variants in LOAD. The lack of association between the global burden of rare recessive variants and LOAD suggests either the involvement of increased homozygosity at common loci or the existence of specific recessive loci driving the association of consanguinity with LOAD. The two-stage recessive-GWAS we carried out using ADSP WES data showed the association of RPH3AL p.A303V (rs117190076) with LOAD. The RPH3AL gene (also known as NOC2), located on 17p13.3 (OMIM^∗604881), encodes for the Rabphilin 3A-like (without C2 domains) protein which plays an essential role in endocrine and exocrine cells, ranging from the accumulation of secretory granules of increased size to impairments in the regulated release of their secretory products (Cheviet et al., 2004). In particular, RPH3AL has been shown to be a crucial effector for RAB3A and RAB27A in the regulation of secretory vesicle exocytosis (Fukuda et al., 2004). The dysregulation of RAB3A and RAB27A has already been linked to Alzheimer’s and other neurodegenerative disorders (Davidsson et al., 2001; Ginsberg et al., 2011; Bereczki et al., 2016; Iguchi et al., 2016), while the ancestral RPH3A (Rabphilin 3A) gene (Craxton, 2010) was found to influence dementia severity, cholinergic deafferentation, and increased β-amyloid concentrations in postmortem neocortex of Alzheimer’s disease patients (Tan et al., 2014). Moreover, other rs117190076-unlinked variants at the RPH3AL locus have been associated with LOAD-related phenotypes, such as Alzheimer’s age-at-onset in PSEN1 E280A carriers (rs4341804, P = 7.10 × 10^–13) (Vélez et al., 2013) and cognitive performance scores in electronic health records (rs74192827, P = 5.02 × 10^–7) (McCoy et al., 2018). Thus, the overall evidence suggests a functional role of the RPH3AL locus in LOAD that clearly warrants further investigations.

Interestingly, our work highlighted the presence of potential UPD carriers in the population studied, with prevalence estimates of 0.25% in NWE, 0.74% in SEE and 1.51% in ACD, respectively, showing a trend toward a significant association with increased risk of LOAD. Current estimates of UPD in the general population suggests a general prevalence of 0.05% (1 in 2,000 births) (Nakka et al., 2019), lower than the estimates we reported. One explanation for this discrepancy could be the fact that we were not able to determine whether those long, unique, ROHs (used to define the presence of UPD) may turn to be true long deleted genomic regions, of somatic or germ-line origin. Indeed, we did not have access to raw SNP-array intensity data from most of the cohorts included in our study, leading to an under-estimation of large deletions and a consequent increased number of subjects carrying a putative UPD. Nonetheless, our studied sample is mostly representative of the elderly population, where age-related somatic events, like Clonal Hematopoiesis of Indeterminate Potential (CHIP), already linked to cardiovascular disease (Jaiswal et al., 2017), may result in large somatic genomic aberrations, such as “pseudo” 9p-UPD (Wang et al., 2016), that can be misinterpreted as germline UPD. A deeper analysis of these phenomena is clearly warranted, since it may offer important insights into the missing heritability of several age-related diseases. In this context, it is remarkable that functional mutations of the TET2 gene, a main driver of CHIP (Jaiswal et al., 2017), have recently been found to be associated with multiple neurodegenerative disorders, including LOAD (Cochran et al., 2020).

One important limitation is that the ethnic stratification, especially for the European groups, led to very small samples in terms of the number of inbred subjects for some of the ancestral groups [e.g., FCN, N = 53; FIN, N = 31 (Supplementary Tables S6–S13)]. Nonetheless, the meta-analytic approach used can greatly mitigate potential biases due to the inclusions of small samples, while providing a better sense of the ethnic-related differences in consanguinity prevalence. Similarly, considering the important contribution of somatic genomic events related to aging such as CHIP, the heterogeneous nature of the specimens used (e.g., whole blood vs. post-mortem brain tissue) in SNP-array genotyping across the different cohorts and samples may have led to uncontrolled biases when determining the impact of true ROHs vs. large genomic deletions of somatic origin.

Overall, these results provide substantial evidence that consanguinity increases risk for LOAD. One might anticipate a change in the genetic architecture of LOAD in the coming decades when more recent cohorts, composed of subjects born after the World War II, will be analyzed. Panmixia and larger effective population sizes have resulted in decreasing autozygosity as the chronological age of a population decreases (Nalls et al., 2009b). Consistent with this pattern, mounting evidence suggests that trends in dementia incidence rates are decreasing (Satizabal et al., 2016; Derby et al., 2017; Noble et al., 2017). Subsequent work with increased sample sizes of consanguineous subjects should accelerate the discovery of non-additive genetic effects in LOAD.

Alzheimer’s Disease Neuroimaging Initiative

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such. the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author/s.

Ethics Statement

Author Contributions

VN: conceptualization, data curation, formal analysis, investigation, methodology, and writing – original draft preparation. MS: formal analysis, validation, and writing – review and editing. RK: formal analysis, resources, writing – review and editing. AA: writing – review and editing. MG: funding acquisition, supervision, and writing – review and editing. All authors contributed to the article and approved the submitted version.

Funding

This study was supported by the following funding sources: MG: NIH grants AG066515 and AG060747, the McKnight Memory and Cognitive Disorders Award, the Iqbal Farrukh & Asad Jamal Fund, the Show Me Charitable Foundation, and the J. W. Bagley Foundation. AA: Medical Research Council (grant number MR/L016311/1).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors wish to thank Professor Noah A. Rosenberg (Stanford University) for his insightful comments on the manuscript. MS acknowledges financial support by the EPSRC-funded UCL Centre for Doctoral Training in Medical Imaging (EP/L016478/1). AA holds a Medical Research Council eMedLab Medical Bioinformatics Career Development Fellowship. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering and through generous contributions from the following: AbbVie. Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica. Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir. Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals. Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech. Inc.; Fujirebio; GE HealtControlsare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development. LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co. Inc.; Meso Scale Diagnostics. LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. Acknowledgment for Alzheimer Disease Genetic Analysis Data Biological samples and Associated Phenotypic Data used in primary data analyses were stored at Principal Investigators’ institutions, and at the National Cell Repository for Alzheimer’s Disease (NCRAD) at Indiana University funded by NIA. Associated phenotypic data used in primary and secondary data analyses were provided by Principal Investigators, the NIA funded Alzheimer’s Disease Centers (ADCs), and the National Alzheimer’s Coordinating Center (NACC) and stored at Principal Investigators’ Institutions, NCRAD, and at the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania, funded by NIA. Contributors to the Genetic Analysis Data included Principal Investigators on projects that were individually funded by NIA, other NIH Institutes, private U.S. organizations, or foreign governmental or non-governmental organizations. The NACC database is funded by NIA/NIH Grant U01 AG016976. NACC data are contributed by the NIA funded ADCs: P30 AG019610 (PI Eric Reiman, MD), P30 AG013846 (PI Neil Kowall, MD), P50 AG008702 (PI Scott Small, MD), P50 AG025688 (PI Allan Levey, MD, PhD), P50 AG047266 (PI Todd Golde, MD, Ph.D.). P30 AG010133 (PI Andrew Saykin, PsyD), P50 AG005146 (PI Marilyn Albert, Ph.D.), P50 AG005134 (PI Bradley Hyman, MD, Ph.D.), P50 AG016574 (PI Ronald Petersen, MD, Ph.D.), P50 AG005138 (PI Mary Sano, Ph.D.), P30 AG008051 (PI Steven Ferris, Ph.D.), P30 AG013854 (PI M. Marsel Mesulam, MD), P30 AG008017 (PI Jeffrey Kaye, MD), P30 AG010161 (PI David Bennett, MD), P50 AG047366 (PI Victor Henderson, MD, MS), P30 AG010129 (PI Charles DeCarli, MD), P50 AG016573 (PI Frank LaFerla, Ph.D.), P50 AG016570 (PI Marie-Francoise Chesselet, MD, Ph.D.), P50 AG005131 (PI Douglas Galasko, MD), P50 AG023501 (PI Bruce Miller, MD), P30 AG035982 (PI Russell Swerdlow, MD), P30 AG028383 (PI Linda Van Eldik, Ph.D.), P30 AG010124 (PI John Trojanowski, MD, Ph.D.), P50 AG005133 (PI Oscar Lopez, MD), P50 AG005142 (PI Helena Chui, MD), P30 AG012300 (PI Roger Rosenberg, MD), P50 AG005136 (PI Thomas Montine, MD, Ph.D.), P50 AG033514 (PI Sanjay Asthana, MD, FRCP), P50 AG005681 (PI John Morris, MD), and P50 AG047270 (PI Stephen Strittmatter, MD, Ph.D.).” The genotypic and associated phenotypic data used in the study. “Multi-Site Collaborative Study for Genotype-Phenotype Associations in Alzheimer’s Disease (GenADA)” were provided by the GlaxoSmithKline, R&D Limited. The “Genome Wide Association study of Yoruba in Nigeria” is part of the “Indianapolis – Ibadan Dementia Project (R01 AG009956)” funded by the Division of Neuroscience, National Institute on Aging, National Institutes of Health, ROSMAP study data were provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago. Data collection was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, the Illinois Department of Public Health, and the Translational Genomics Research Institute. Acknowledgment for the Columbia University Study of Caribbean Hispanics with Familial and Sporadic Late Onset Alzheimer’s disease (CIDR): funding support for the “Genetic Consortium for Late Onset Alzheimer’s Disease” was provided through the Division of Neuroscience, NIA. The Genetic Consortium for Late Onset Alzheimer’s Disease includes a genome-wide association study funded as part of the Division of Neuroscience, NIA. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by Genetic Consortium for Late Onset Alzheimer’s Disease. The AddNeuroMed data are from a public-private partnership supported by EFPIA companies and SMEs as part of InnoMed (Innovative Medicines in Europe), an Integrated Project funded by the European Union of the Sixth Framework program priority FP6-2004-LIFESCIHEALTH-5. Clinical leads responsible for data collection are Iwona Kłoszewska (Lodz), Simon Lovestone (London), Patrizia Mecocci (Perugia), Hilkka Soininen (Kuopio), Magda Tsolaki (Thessaloniki), and Bruno Vellas (Toulouse), imaging leads are Andy Simmons (London), Lars-Olad Wahlund (Stockholm), and Christian Spenger (Zurich) and bioinformatics leads are Richard Dobson (London) and Stephen Newhouse (London).

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2020.629373/full#supplementary-material

References

1000 Genomes Project Consortium, Auton, A., Brooks, L. D., Durbin, R. M., Garrison, E. P., Kang, H. M., et al. (2015). A global reference for human genetic variation. Nature 526, 68–74. doi: 10.1038/nature15393

PubMed Abstract | CrossRef Full Text | Google Scholar

Alexander, D. H., Novembre, J., and Lange, K. (2009). Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664. doi: 10.1101/gr.094052.109

PubMed Abstract | CrossRef Full Text | Google Scholar

Andrews, S. J., Fulton-Howard, B., O’Reilly, P., Marcora, E., Goate, A. M., and Collaborators of the Alzheimer’s Disease Genetics Consortium, (2021). Causal associations between modifiable risk factors and the Alzheimer’s Phenome. Ann. Neurol. 89, 54–65. doi: 10.1002/ana.25918

PubMed Abstract | CrossRef Full Text | Google Scholar

Beecham, G. W., Bis, J. C., Martin, E. R., Choi, S. H., DeStefano, A. L., van Duijn, C. M., et al. (2017). The Alzheimer’s disease sequencing project: study design and sample selection. Neurol. Genet. 3:e194.

Recent Consanguinity and Outbred Autozygosity Are Associated With Increased Risk of Late-Onset Alzheimer’s Disease

Introduction

Materials and Methods

Subjects

Inclusion Criteria, Quality Control (QC) Pipeline, Ancestry Determination and Imputation

Consanguinity Determination

ROH Calling and Burden Analysis

Population Genetics, WES Variant Annotation and Statistical Analyses

Results

Ancestry Determination and Ethnic-Specific Differences

Consanguinity Is Associated With an Increased Risk of LOAD

Autozygosity in the Outbred Population Is Associated With an Increased Risk of LOAD

LOAD Genome Is Not Enriched in Rare Recessive Damaging Variants

Identification of RPH3AL p.A303V (rs117190076) as RMHV Associated With LOAD

Putative UniParental Disomy Does Not Associate With LOAD

Discussion

Alzheimer’s Disease Neuroimaging Initiative

Data Availability Statement

Ethics Statement

Author Contributions

Funding

Conflict of Interest

Acknowledgments

Supplementary Material

References

95% of researchers rate our articles as excellent or good