METHODS article

Front. Bioinform., 08 March 2024

Sec. Drug Discovery in Bioinformatics

Volume 4 - 2024 | https://doi.org/10.3389/fbinf.2024.1295600

A network-based method for associating genes with autism spectrum disorder

  • 1. Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel

  • 2. Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel

Abstract

Autism spectrum disorder (ASD) is a highly heritable complex disease that affects 1% of the population, yet its underlying molecular mechanisms are largely unknown. Here we study the problem of predicting causal genes for ASD by combining genome-scale data with a network propagation approach. We construct a predictor that integrates multiple omic data sets that assess genomic, transcriptomic, proteomic, and phosphoproteomic associations with ASD. In cross validation our predictor yields mean area under the ROC curve of 0.87 and area under the precision-recall curve of 0.89. We further show that it outperforms previous gene-level predictors of autism association. Finally, we show that we can use the model to predict genes associated with Schizophrenia which is known to share genetic components with ASD.

Introduction

Autism spectrum disorder (ASD) is a complex neurological and developmental disorder that affects a person’s behavior, communication, and learning abilities. About 1 in 44 children is identified with the disorder according to estimates from Centers for Disease Control and Prevention (CDC) Autism and Developmental Disabilities Monitoring (ADDM) Network (Maenner, 2021). It is thought to be caused by a combination of genetic and environmental factors that impacts the structure and function of the brain and nervous system (Flickr, 2023). Identifying the genetic base of ASD is critical to understanding its underlying biological mechanisms. Such knowledge will impact the development of new interventions and treatments for individuals affected by this disorder.

Extensive molecular studies have charted the landscape of ASD with respect to different information layers including genome wide association (GWAS), differential gene expression (Parikshak et al., 2016; Gandal et al., 2018), differential transcript expression (Gandal et al., 2018), alternative splicing changes (Parikshak et al., 2016; Gandal et al., 2018), differential methylation (Wong et al., 2019), copy number variation (Sanders et al., 2015) and more (Satterstrom et al., 2020). Each of these studies have come up with candidate lists of ASD-associated genes, calling for computational methods to consolidate these gene lists.

Machine learning based methods offer a new perspective to the problem by learning from known ASD-related genes and building models that provide ways to prioritize the risk associated with previously unknown genes based on their predicted scores (Liu et al., 2014; Krishnan et al., 2016; Duda et al., 2018; Brueggeman et al., 2020; Lin et al., 2020). These methods differ in their training features and prioritization method. (Duda et al., 2018) and (Krishnan et al., 2016) leveraged brain-specific functional interaction networks to produce genome-wide rankings of ASD associated genes. (Liu et al., 2014) clustered evidence for ASD association within a co-expression network in specific brain regions. (Lin et al., 2020) used features extracted from spatiotemporal gene expression patterns in the human brain. Last, the state-of–the-art forecASD (Brueggeman et al., 2020) integrates network-based information from large gene interaction networks with scores of genetic association and brain gene expression information. In detail, novel features are generated from BrainSpan expression and STRING interaction data. These are combined with literature-derived features from DAWN, DAMAGES, and Krishnan (Liu et al., 2014); (Krishnan et al., 2016; Zhang and Shen, 2017) and used to train a random forest classifier for ASD association.

The methods described above have predominantly relied on a single data source, potentially missing relevant information. Conversely, some approaches such as forecASD utilize multiple data sources that are devoid of network context. To address these limitations, our proposed method leverages a network propagation technique to integrate diverse data sources while accounting for their network context. Our model employs network-propagation on ASD associated genes from different data sets to derive predictive gene scores. Features are then combined using a random forest classifier. We evaluate the performance of our model and compare it to previous methods. Finally, we use our model to predict Schizophrenia-associated genes.

Materials and methods

Classification pipeline

The computational pipeline has two stages. First, network-based gene features are generated using a network propagation technique. Second, a random forest model is applied to these features to yield the prediction score. The classifier is summarized in Figure 1. Its stages are described in the following sections.

FIGURE 1

Feature generation

Our starting point is gene lists obtained from the literature as detailed in Table 1. Overall, we use ten gene sets that were suggested to be associated with ASD based on various layers of information.

TABLE 1

ReferencesData typeData sourceNumber of genes
1Gandal et al., (2018)Differential gene expression (DGE)Frontal and temporal cortex brain samples1,611
2Gandal et al., (2018)Differential transcript expression (DTE)Frontal and temporal cortex brain samples767
3Gandal et al., (2018)Summary-data–based Mendelian randomization (SMR)Frontal and temporal cortex brain samples36
4Gandal et al., (2018)Transcriptome-wide association study (TWAS)Frontal and temporal cortex brain samples12
5Parikshak et al., (2016)Differential gene expression (DGE)Cortex Samples of frontal and temporal cortex and cerebellum1,142
6Parikshak et al., (2016)Differential alternative splicing analysisCortex Samples of frontal and temporal cortex and cerebellum833
7Satterstrom et al., (2020)Transmitted and de novo association (TADA) modelWhole-exome sequence (WES) data from 35,584 samples102
8Wong et al., (2019)Cross-cortex iASD-associated genes that reported both signficant differential DNA mtheyltaion and transcriptional changes223 post-mortem tissues samples isolated from three brain regions [prefrontal cortex, temporal cortex and cerebellum (CB)]18
9Wong et al., (2019)Cross-cortex dup15q-associated genes that reported both significant differential DNA methylation and transcriptional changes223 post-mortem tissues samples isolated from three brain regions [prefrontal cortex, temporal cortex and cerebellum (CB)]74
10Sanders et al., (2015)Analysis of de novo CNVs (dnCNVs) from the full Simons Simplex Collection65

Lists of ASD associated genes according to different sources.

Each of these ASD related gene lists is used as a seed for a network propagation process that pinpoints other genes with high proximity to the seed set in a protein-protein interaction (PPI) network. The initial value of each seed protein from a list of size s is set to 1/s. We use a human PPI network from (Signorini et al., 2021) which has 20,933 proteins and 251,078 interactions in its main connected component. We run network propagation with default damping parameter ɑ = 0.8. We normalize the results using the eigenvector centrality method (Erten et al., 2011) in order to avoid biases which are caused by the degrees of the proteins. The resulting ten propagation scores for each gene comprise its feature set.

Random forest model

The features of a gene are integrated using a random forest model. To train the model, we use SFARI’s Gene Scoring Module (Abrahams et al., 2013) which offers critical evaluation of the strength of the evidence for each gene’s association with ASD. The genes are assigned to four categories: “Syndromic” (S), “Category 1” (High Confidence), “Category 2” (Strong Candidate) and “Category 3” (Suggestive Evidence). We label “Category 1” genes as positives (206 in total) and randomly pick 206 negative genes that do not appear in the SFARI database.

The random forest model is trained with the “sklearn” Python package using its default parameters which are a maximum of 100 trees, no maximum tree depth and minimum number of samples required to split an internal node of 2.

Results and discussion

Model performance

We tested our classifier using 5-fold cross-validation. Figure 2 depicts ROC and precision-recall graphs showing the final result as the mean between all the five fold scores. The AUROC is 0.87 and the AUPRC is 0.89 indicating the high accuracy of our method.

FIGURE 2

To facilitate the model’s application by potential users, we calculated an optimal classification cutoff of 0.86, which maximizes the product of specificity and sensitivity (Liu, 2012). The full model scores can be found in Supplementary Table S2.

To further support our results, we tested our classifier’s predictions on SFARI genes of scores 2 and 3 (which were not used during training), compared to randomly chosen negative genes (Figure 3). The classifier’s score distributions of the groups were compared using the Wilcoxon signed-rank test. Reassuringly, SFARI genes with scores 2 and 3 got significantly higher scores than the random negative genes (p-value < 3.62e-34).

FIGURE 3

After establishing the accuracy of our proposed method, we conducted a comparative analysis with the state-of-the-art forecASD classifier, which was shown to outperform earlier predictors. For this purpose, we employed the same random forest classifier on both our dataset and the features suggested by forecASD, namely BrainSpan and STRING. We additionally assessed the propagation procedure on a random degree-preserving network to act as a negative control. The results are provided in Figure 4 showing the superiority of our method compared to forecASD and the negative control (AUROC of 0.91 vs. 0.87 and 0.82). Notably, the relatively high AUROC of the negative control testifies to the quality of the gene sets that serve as seeds for the network propagation.

FIGURE 4

Functional annotation analysis

Next, we wished to analyze the functional roles of the top predicted genes. To this end, we set a prediction threshold of 0.947, which maximizes the sum of precision and recall, and focused on the 84 genes passing this threshold. We conducted functional enrichment analysis using g:Profiler (version e109_eg56_p17_1d3191d) with Bonferroni corrected p-values and a significance threshold of 0.001 (Raudvere et al., 2019). The analysis is based on several data sources (GO:MF, GO:BP, Human Phenotype Ontology) (Figure 5).

FIGURE 5

From Human Phenotype Ontology—Autistic Behavior was the highest enriched phenotype. Using GO:BP (Biological Process) and GO:MF(Molecular Function) data sources, yield several highly enriched pathways known to play important roles in autism etiology including chromatin organization and binding (Haddad Derafshi et al., 2022), histone modification (Sun et al., 2016), neuron cell-cell adhesion (Eve et al., 2022) and zinc ion binding (Walsh et al., 2001; Walsh, 2002; Wang and Zhou, 2010; Yasuda et al., 2011). The full list of the functional annotation analysis results can be found in Supplementary Table S1.

Exploiting schizophrenia genes

Given the known phenotypic similarity between ASD and schizophrenia (SCZ) (Hommer and Swedo, 2015) we wished to test whether our classification model could be further improved by adding information on SCZ associated genes. Thus, we collected lists of genes that were associated with SCZ (Table 2) and reapplied our computational pipeline.

TABLE 2

ReferencesMethodData sourceNumber of genes
1Gandal et al., (2018)Differential gene expression (DGE)Frontal and temporal cortex brain samples4,821
2Gandal et al., (2018)Differential transcript expression (DTE)Frontal and temporal cortex brain samples3,803
3Gandal et al., (2018)Transcriptome-wide association study (TWAS)Frontal and temporal cortex brain samples193
4Huang et al., (2020)equality of variances to normalized expression data (evQTLs) Differential gene expressiondorsolateral prefrontal cortex of individuals affected with SCZ 212 SCZ and 214 unaffected control (CTL) samples88
5Huang et al., (2020)expression variability QTL (evQTL) mapping analysisdorsolateral prefrontal cortex of individuals affected with SCZ 212 SCZ and 214 unaffected control (CTL) samples1,453

Lists of SCZ associated genes.

Indeed, the model based on both ASD, and SCZ, genes obtained improved results with AUC, and AUPRC, of 0.88 and 0.89 respectively.

Next, we checked if our ASD classifier can also predict schizophrenia associated genes. We ran it on genes which are associated with SCZ from a database for Schizophrenia genetic research, SZDB (Wu et al., 2017). Specifically, we focused on 1622 SCZ-associated genes from SZDB with scores higher than 3, and compared their scores to those of the same number of random genes (Figure 6). Wilcoxon signed-rank test showed that the classifier gave the SCZ-associated genes significantly higher scores than the random genes (p-value of 2.275e-11).

FIGURE 6

Conclusion

We have presented a classification model for disease association. The classifier uses network propagation which enables us to combine and to amplify signals from individual genes, and uses machine learning to allow us to learn from known genes in order to classify new ones. In application to ASD, our classifier attained high accuracy, outperforming the state of the art. We have further shown its applicability to SCZ, which benefits from the similarity between these diseases but also shows the generality of our approach.

Functional enrichment analysis of our proposed candidate ASD genes has pointed to several pathways and processes that have been previously linked to ASD. For instance, neuron cell-cell adhesion, which may contribute to neuroinflammation in ASD (Eve et al., 2022) and participates in neurodevelopmental pathways associated with the disorder (Gandawijaya et al., 2020). Moreover (Bonsi et al., 2022), demonstrated that genes associated with ASD are frequently involved in the structural organization and functional activity of synapses, as evident in our results indicating enriched pathways such as synapse assembly and presynaptic and postsynaptic membrane assembly. This is interesting, as ASD is sometimes regarded as a disorder of connectivity (Mohammad-Rezazadeh et al., 2016), since its genes have both direct and indirect effect on a range of presynaptic and postsynaptic proteins (Bonsi et al., 2022; Yeo et al., 2022).

The fact that our classifier succeeded in predicting Schizophrenia-associated genes may suggest that previously implied association between ASD and SCZ may evolve from connectivity-related issues (Hommer and Swedo, 2015).

Statements

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Ethics statement

Ethical approval was not required for the study involving humans in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and the institutional requirements.

Author contributions

NZ: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing–original draft. GA: Funding acquisition, Writing–review and editing. RS: Funding acquisition, Writing–review and editing, Conceptualization, Project administration, Supervision.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was supported by a research grant from the Israel Science Foundation (IPMP grant no. 2417/20).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fbinf.2024.1295600/full#supplementary-material

References

  • 1

    AbrahamsB. S.ArkingD. E.CampbellD. B.MeffordH. C.MorrowE. M.WeissL. A.et al (2013). SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol. autism4 (1), 3643. 10.1186/2040-2392-4-36

  • 2

    BonsiP.De JacoA.FasanoL.GubelliniP. (2022). Postsynaptic autism spectrum disorder genes and synaptic dysfunction. Neurobiol. Dis.162, 105564. 10.1016/j.nbd.2021.105564

  • 3

    BrueggemanL.KoomarT.MichaelsonJ. J. (2020). Forecasting risk gene discovery in autism with machine learning and genome-scale data. Sci. Rep.10 (1), 4569. 10.1038/s41598-020-61288-5

  • 4

    DudaM.ZhangH.LiH. D.WallD. P.BurmeisterM.GuanY. (2018). Brain-specific functional relationship networks inform autism spectrum disorder gene prediction. Transl. psychiatry8 (1), 56. 10.1038/s41398-018-0098-6

  • 5

    ErtenS.BebekG.EwingR. M.KoyutürkM. (2011). DADA: degree-aware algorithms for network-based disease gene prioritization. BioData Min.4, 19. 10.1186/1756-0381-4-19

  • 6

    EveM.GandawijayaJ.YangL.Oguro-AndoA. (2022). Neuronal cell adhesion molecules may mediate neuroinflammation in autism spectrum disorder. Front. psychiatry/Front. Res. Found.13, 842755. 10.3389/fpsyt.2022.842755

  • 7

    FlickrF. (2023). Us on (no date) about Autism. Available at: https://www.nichd.nih.gov/health/topics/autism/conditioninfo (Accessed February 28, 2023).

  • 8

    GandalM. J.ZhangP.HadjimichaelE.WalkerR. L.ChenC.LiuS.et al (2018). Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science362 (6420), eaat8127. Available at:. 10.1126/science.aat8127

  • 9

    GandawijayaJ.BamfordR. A.BurbachJ. P. H.Oguro-AndoA. (2020). Cell adhesion molecules involved in neurodevelopmental pathways implicated in 3p-deletion syndrome and autism spectrum disorder. Front. Cell. Neurosci.14, 611379. 10.3389/fncel.2020.611379

  • 10

    Haddad DerafshiB.DankoT.ChandaS.BatistaP. J.LitzenburgerU.LeeQ. Y.et al (2022). The autism risk factor CHD8 is a chromatin activator in human neurons and functionally dependent on the ERK-MAPK pathway effector ELK1. Sci. Rep.12 (1), 22425. 10.1038/s41598-022-23614-x

  • 11

    HommerR. E.SwedoS. E. (2015). Schizophrenia and autism-related disorders. Schizophr. Bull.41 (2), 313314. 10.1093/schbul/sbu188

  • 12

    HuangG.OsorioD.GuanJ.JiG.CaiJ. J. (2020). Overdispersed gene expression in schizophrenia. NPJ Schizophr.6 (1), 9. 10.1038/s41537-020-0097-5

  • 13

    KrishnanA.ZhangR.YaoV.TheesfeldC. L.WongA. K.TadychA.et al (2016). Genome-wide prediction and functional characterization of the genetic basis of autism spectrum disorder. Nat. Neurosci.19 (11), 14541462. 10.1038/nn.4353

  • 14

    LinY.AfsharS.RajadhyakshaA. M.PotashJ. B.HanS. (2020). A machine learning approach to predicting autism risk genes: validation of known genes and discovery of new candidates. Front. Genet.11, 500064. 10.3389/fgene.2020.500064

  • 15

    LiuL.LeiJ.SandersS. J.WillseyA. J.KouY.CicekA. E.et al (2014). DAWN: a framework to identify autism genes and subnetworks using gene expression and genetics. Mol. autism5 (1), 2218. 10.1186/2040-2392-5-22

  • 16

    LiuX. (2012). Classification accuracy and cut point selection. Statistics Med.31 (23), 26762686. 10.1002/sim.4509

  • 17

    MaennerM. J. (2021). ‘Prevalence and characteristics of autism spectrum disorder among children aged 8 Years — autism and developmental Disabilities monitoring network, 11 sites, United States, 2018’, morbidity and mortality weekly report. Surveill. Summ., 70. Available at:. 10.15585/mmwr.ss7011a1

  • 18

    Mohammad-RezazadehI.FrohlichJ.LooS. K.JesteS. S. (2016). Brain connectivity in autism spectrum disorder. Curr. Opin. neurology29 (2), 137147. 10.1097/wco.0000000000000301

  • 19

    ParikshakN. N.SwarupV.BelgardT. G.IrimiaM.RamaswamiG.GandalM. J.et al (2016). Genome-wide changes in lncRNA, splicing, and regional gene expression patterns in autism. Nature540 (7633), 423427. 10.1038/nature20612

  • 20

    RaudvereU.KolbergL.KuzminI.ArakT.AdlerP.PetersonH.et al (2019). g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update). Nucleic acids Res.47 (W1), W191W198. 10.1093/nar/gkz369

  • 21

    SandersS. J.HeX.WillseyA.Ercan-SencicekA.SamochaK.CicekA.et al (2015). Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron87 (6), 12151233. 10.1016/j.neuron.2015.09.016

  • 22

    SatterstromF. K.KosmickiJ. A.WangJ.BreenM. S.De RubeisS.AnJ. Y.et al (2020). Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell.180 (3), 568584.e23. 10.1016/j.cell.2019.12.036

  • 23

    SignoriniL. F.AlmozlinoT.SharanR. (2021). ANAT 3.0: a framework for elucidating functional protein subnetworks using graph-theoretic and machine learning approaches. BMC Bioinforma.22 (1), 5266. 10.1186/s12859-021-04449-1

  • 24

    SunW.PoschmannJ.Cruz-Herrera del RosarioR.ParikshakN. N.HajanH. S.KumarV.et al (2016). Histone acetylome-wide association study of autism spectrum disorder. Cell.167 (5), 13851397.e11. 10.1016/j.cell.2016.10.031

  • 25

    WalshW. J.et al (2002). Metallothionein and autism’, and autism 2nd edn. Naperville, IL: pfeiffer.

  • 26

    WalshW. J.UsmanA.TarpeyJ. (2001). Disordered metal metabolism in a large autism population. walshinstitute.Org. Available at: http://www.walshinstitute.org/uploads/1/7/9/9/17997321/disordered_metal_metabolism_in_a_large_autism_population.pdf (Accessed August 28, 2023).

  • 27

    WangX.ZhouB. (2010). Dietary zinc absorption: a play of Zips and ZnTs in the gut. IUBMB life62 (3), 176182. 10.1002/iub.291

  • 28

    WongC. C. Y.SmithR. G.HannonE.RamaswamiG.ParikshakN. N.AssaryE.et al (2019). Genome-wide DNA methylation profiling identifies convergent molecular signatures associated with idiopathic and syndromic autism in post-mortem human brain tissue. Hum. Mol. Genet.28 (13), 22012211. 10.1093/hmg/ddz052

  • 29

    WuY.YaoY.-G.LuoX.-J. (2017). SZDB: a database for schizophrenia genetic research. Schizophr. Bull.43 (2), 459471. 10.1093/schbul/sbw102

  • 30

    YasudaH.YoshidaK.YasudaY.TsutsuiT. (2011). Infantile zinc deficiency: association with autism spectrum disorders. Sci. Rep.1, 129. 10.1038/srep00129

  • 31

    YeoX. Y.LimY. T.ChaeW. R.ParkC.ParkH.JungS. (2022). Alterations of presynaptic proteins in autism spectrum disorder. Front. Mol. Neurosci.15, 1062878. 10.3389/fnmol.2022.1062878

  • 32

    ZhangC.ShenY. (2017). A cell type-specific expression signature predicts haploinsufficient autism-susceptibility genes. Hum. Mutat.38 (2), 204215. 10.1002/humu.23147

Summary

Keywords

autism spectrum disorder (ASD), network propagation, machine learning, ASD genes, random forest

Citation

Zadok N, Ast G and Sharan R (2024) A network-based method for associating genes with autism spectrum disorder. Front. Bioinform. 4:1295600. doi: 10.3389/fbinf.2024.1295600

Received

16 September 2023

Accepted

26 February 2024

Published

08 March 2024

Volume

4 - 2024

Edited by

Yovani Marrero-Ponce, University of Valencia, Spain

Reviewed by

Hanbing Song, University of California, San Francisco, United States

Guillermin Agüero-Chapin, University of Porto, Portugal

Updates

Copyright

*Correspondence: Neta Zadok,

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics