AUTHOR=Maciel Lucas F., Morales-Vicente David A., Silveira Gilbert O., Ribeiro Raphael O., Olberg Giovanna G. O., Pires David S., Amaral Murilo S., Verjovski-Almeida Sergio
TITLE=Weighted Gene Co-Expression Analyses Point to Long Non-Coding RNA Hub Genes at Different Schistosoma mansoni Life-Cycle Stages
JOURNAL=Frontiers in Genetics
VOLUME=10
YEAR=2019
URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2019.00823
DOI=10.3389/fgene.2019.00823
ISSN=1664-8021
ABSTRACT=
Long non-coding RNAs (lncRNAs) (>200 nt) are expressed at levels lower than those of the protein-coding mRNAs, and in all eukaryotic model species where they have been characterized, they are transcribed from thousands of different genomic loci. In humans, some four dozen lncRNAs have been studied in detail, and they have been shown to play important roles in transcriptional regulation, acting in conjunction with transcription factors and epigenetic marks to modulate the tissue-type-specific programs of transcriptional gene activation and repression. In Schistosoma mansoni, around 10,000 lncRNAs have been identified in previous works. However, the limited number of RNA-sequencing (RNA-seq) libraries that had been previously assessed, together with the use of old and incomplete versions of the S. mansoni genome and protein-coding transcriptome annotations, has hampered the identification of all lncRNAs expressed in the parasite. Here we have used 633 publicly available S. mansoni RNA-seq libraries from whole worms at different stages (n = 121), from isolated tissues (n = 24), from cell populations (n = 81), and from single cells (n = 407). We have assembled a set of 16,583 lncRNA transcripts originating from 10,024 genes, of which 11,022 are novel S. mansoni lncRNA transcripts, whereas the remaining 5,561 transcripts comprise 120 lncRNAs that are identical to and 5,441 lncRNAs that have gene overlap with S. mansoni lncRNAs already reported in previous works. Most importantly, our more stringent assembly and filtering pipeline has identified and removed a set of 4,293 lncRNA transcripts from previous publications that were in fact derived from partially processed mRNAs with intron retention. We have used weighted gene co-expression network analyses and identified 15 different gene co-expression modules.
Each parasite life-cycle stage has at least one highly correlated gene co-expression module, and each module comprises hundreds to thousands of lncRNAs and mRNAs having correlated co-expression patterns at different stages. Inspection of the most highly connected genes within the modules’ networks has shown that different lncRNAs are hub genes at different life-cycle stages, making them among the most promising candidate lncRNAs to be further explored for functional characterization.