Introduction: Fasciolopsiasis, a food-borne intestinal disease, is most common in Asia and the Indian subcontinent. Pigs are the reservoir host, and fasciolopsiasis is most widespread in locations where pigs are reared and aquatic plants are widely consumed. Human infection has been most commonly documented in China, Bangladesh, Southeast Asia, and parts of India. It predominates in school-age children, and significant worm burdens are not uncommon. The causal organism is Fasciolopsis buski, a giant intestinal fluke that infects humans and causes diarrhoea, fever, ascites, and intestinal blockage. The increasing prevalence of drug resistance and the need for an effective vaccine make controlling these diseases challenging.
Methods: Over the last decade, we have achieved major advances in our understanding of intestinal fluke biology through in-depth interrogation and analysis of evolving F. buski omics datasets. The creation of large omics datasets for F. buski by our group has accelerated the discovery of key molecules involved in intestinal fluke biology, toxicity, and virulence that can be targeted for vaccine development. Finding successful vaccine antigen combinations among the huge number of genes and proteins in the available omics datasets is key to combating these neglected tropical diseases. In the present study, we developed an in silico workflow to select antigens for composing a chimeric vaccine, which could be a significant step toward a fasciolopsiasis vaccine that prevents the parasite from causing serious harm.
Results and discussion: This chimeric vaccine can now be tested experimentally and compared to other vaccine candidates to determine its potential influence on human health. Although the results are encouraging, additional validation is needed both in vivo and in vitro.
Considering the extensive genetic data available for intestinal flukes that has expanded with technological advancements, we may need to reassess our methods and suggest a more sophisticated technique in the future for identifying vaccine molecules.
Introduction: In the realm of next-generation sequencing datasets, various characteristics can be extracted through k-mer based analysis. Among these characteristics, genome size (GS) is one that can be estimated with relative ease, yet achieving satisfactory accuracy, especially in the context of heterozygosity, remains a challenge.
Methods: In this study, we introduce a high-precision genome size estimator, GSET (Genome Size Estimation Tool), which is based on k-mer histogram correction.
Results: We evaluated GSET on both simulated and real datasets. The experimental results demonstrate that the tool estimates genome size more accurately than state-of-the-art tools. Notably, GSET also performs satisfactorily on heterozygous datasets, where other tools struggle to produce usable results.
Discussion: The processing model of GSET diverges from the popular data fitting models used by similar tools. Instead, it is derived from empirical data and incorporates a correction term to mitigate the impact of sequencing errors on genome size estimation. GSET is freely available for use and can be accessed at the following URL: https://github.com/Xingyu-Liao/GSET.
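For orientation, the basic idea behind k-mer based genome size estimation can be sketched in a few lines of Python. This is the textbook peak-depth estimate, not GSET's histogram-correction model, which the abstract does not specify. The function names, the "first rise" rule for discarding low-depth error k-mers, and the absence of any heterozygosity handling are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_histogram(counts):
    """Map depth d to the number of distinct k-mers seen exactly d times."""
    return Counter(counts.values())


def estimate_genome_size(hist):
    """Naive estimate: total solid k-mer occurrences divided by the peak depth.

    Assumption: k-mers below the first rise of the histogram are treated as
    sequencing errors and ignored. Heterozygous peaks are not modelled.
    """
    max_depth = max(hist)
    start = 1
    for d in range(1, max_depth):
        if hist.get(d + 1, 0) > hist.get(d, 0):
            start = d + 1  # histogram starts rising here: end of error region
            break
    depths = range(start, max_depth + 1)
    peak = max(depths, key=lambda d: hist.get(d, 0))
    total = sum(d * hist.get(d, 0) for d in depths)
    return total / peak


# Hypothetical histogram: an error tail at depths 1-3, a main peak at depth 20.
example_hist = {1: 500, 2: 100, 3: 10, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
estimated_size = estimate_genome_size(example_hist)
```

In this toy histogram the error tail is dropped and the remaining k-mer occurrences are divided by the peak depth. A real estimator has to handle the cases this sketch ignores, such as heterozygous and repeat peaks and error k-mers that overlap the main peak.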
Preventing, diagnosing, and treating diseases requires accurate clinical biomarkers, and identifying them remains challenging. Recently, advanced computational approaches have accelerated the discovery of promising biomarkers from high-dimensional multimodal data. Although machine-learning methods have contributed greatly to this field, data sparseness, which is common in research settings, remains a problem because missing information limits both interpretability and performance. Here, we propose a novel pipeline that integrates joint non-negative matrix factorization (JNMF), which identifies key features within sparse high-dimensional heterogeneous data, with a biological pathway analysis, which interprets the functionality of those features by detecting activated signaling pathways. By applying our pipeline to large-scale public cancer datasets, we identified sets of genomic features relevant to specific cancer types as common pattern modules (CPMs) of JNMF. We further detected COPS5 as a potential upstream regulator of pathways associated with diffuse large B-cell lymphoma (DLBCL). COPS5 exhibited co-overexpression with MYC, TP53, and BCL2, known DLBCL marker genes, and its high expression was correlated with a lower survival probability of DLBCL patients. Using the CRISPR-Cas9 system, we confirmed the tumor growth effect of COPS5, which suggests that it is a novel prognostic biomarker for DLBCL. Our results highlight that integrating multiple high-dimensional datasets and decomposing them into interpretable dimensions reveals hidden biological signals and thereby enhances the discovery of clinical biomarkers.
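To make the factorization step concrete, the following is a minimal sketch of joint NMF. It assumes the common formulation in which several non-negative matrices over the same samples share one factor W, with each X_i approximated by W H_i. It uses standard multiplicative updates on fully observed toy matrices. The abstract does not give the pipeline's actual objective, its handling of missing values, or its rule for extracting CPMs, so none of those are reproduced here. All function names are illustrative.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(Xs, W, Hs):
    """Sum of squared reconstruction errors over all data matrices."""
    err = 0.0
    for X, H in zip(Xs, Hs):
        R = matmul(W, H)
        err += sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))
    return err


def joint_nmf(Xs, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize each X_i (samples x features_i) as W @ H_i with a shared W."""
    rng = random.Random(seed)
    n = len(Xs[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    Hs = [[[rng.random() + 0.1 for _ in range(len(X[0]))] for _ in range(rank)]
          for X in Xs]
    for _ in range(n_iter):
        # Update each H_i with W fixed: H_i *= (W^T X_i) / (W^T W H_i)
        Wt = transpose(W)
        WtW = matmul(Wt, W)
        for i, X in enumerate(Xs):
            num, den = matmul(Wt, X), matmul(WtW, Hs[i])
            Hs[i] = [[h * a / (b + eps) for h, a, b in zip(hr, nr, dr)]
                     for hr, nr, dr in zip(Hs[i], num, den)]
        # Update shared W: W *= (sum_i X_i H_i^T) / (W sum_i H_i H_i^T)
        num = [[0.0] * rank for _ in range(n)]
        HHt = [[0.0] * rank for _ in range(rank)]
        for X, H in zip(Xs, Hs):
            Ht = transpose(H)
            num = [[p + q for p, q in zip(r1, r2)]
                   for r1, r2 in zip(num, matmul(X, Ht))]
            HHt = [[p + q for p, q in zip(r1, r2)]
                   for r1, r2 in zip(HHt, matmul(H, Ht))]
        den = matmul(W, HHt)
        W = [[w * a / (b + eps) for w, a, b in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, Hs
```

In a formulation like this, each column of W together with the matching rows of the H_i describes one shared pattern across data types. Features with large loadings in those rows would be natural candidates for a module, although the selection criterion used in the study is not stated in the abstract.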
Esophageal squamous cell carcinoma (ESCC) is the predominant subtype of esophageal cancer in Central Asia, often diagnosed at advanced stages. Understanding population-specific patterns of ESCC is crucial for tailored treatments. This study aimed to unravel ESCC’s genetic basis in Kazakhstani patients and identify potential biomarkers for early diagnosis and targeted therapies. ESCC patients from Kazakhstan were studied. We analyzed histological subtypes and conducted in-depth transcriptome sequencing. Differential gene expression analysis was performed, and significantly dysregulated pathways were identified using KEGG pathway analysis (p-value < 0.05). Protein-protein interaction networks were constructed to elucidate key modules and their functions. Among Kazakhstani patients, ESCC with moderate dysplasia was the most prevalent subtype. We identified 42 significantly upregulated and two significantly downregulated KEGG pathways, highlighting molecular mechanisms driving ESCC pathogenesis. Immune-related pathways, such as viral protein interaction with cytokines, rheumatoid arthritis, and oxidative phosphorylation, were elevated, suggesting immune system involvement. Conversely, downregulated pathways were associated with extracellular matrix degradation, crucial in cancer invasion and metastasis. Protein-protein interaction network analysis revealed four distinct modules with specific functions, implicating pathways in esophageal cancer development. High-throughput transcriptome sequencing elucidated critical molecular pathways underlying esophageal carcinogenesis in Kazakhstani patients. Insights into dysregulated pathways offer potential for early diagnosis and precision treatment strategies for ESCC. Understanding population-specific patterns is essential for personalized approaches to ESCC management.
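As an illustration of how a pathway can be called significant at p < 0.05, the sketch below implements a one-sided hypergeometric over-representation test. This is one common way to test KEGG pathways against a list of differentially expressed genes. The abstract does not state which enrichment method or software was used, so the choice of test, the function names, and the toy gene identifiers are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_de, n_overlap):
    """P(X >= n_overlap) when n_de genes are drawn without replacement from
    n_background genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_de)
    favourable = sum(
        comb(n_pathway, i) * comb(n_background - n_pathway, n_de - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_background, n_de)


def enriched_pathways(de_genes, background, pathways, alpha=0.05):
    """Return {pathway name: p-value} for pathways with p < alpha."""
    bg = set(background)
    de = set(de_genes) & bg
    hits = {}
    for name, genes in pathways.items():
        members = set(genes) & bg
        p = hypergeom_enrichment_p(len(bg), len(members), len(de), len(members & de))
        if p < alpha:
            hits[name] = p
    return hits


# Toy data with hypothetical gene identifiers.
background = [f"g{i}" for i in range(100)]
de_genes = [f"g{i}" for i in range(10)]
pathways = {
    "pathway_A": [f"g{i}" for i in range(8)],       # fully inside the DE list
    "pathway_B": [f"g{i}" for i in range(50, 60)],  # no overlap with the DE list
}
significant = enriched_pathways(de_genes, background, pathways)
```

Here only `pathway_A` passes the threshold. When many pathways are tested at once, the raw p-values are normally adjusted for multiple testing, a step this sketch omits.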