
ORIGINAL RESEARCH article

Front. Res. Metr. Anal.
Sec. Emerging Technologies and Transformative Paradigms in Research
Volume 10 - 2025 | doi: 10.3389/frma.2025.1509502

Predicting Implicit Concept Embeddings for Singular Relationship Discovery Replication of Closed Literature-based Discovery

Provisionally accepted
  • Clint Cuffy and Bridget T. McInnes, Virginia Commonwealth University, Richmond, United States


    Literature-based Discovery (LBD) identifies new knowledge by leveraging existing literature. It exploits interconnecting implicit relationships to build bridges between isolated sets of non-interacting literatures. It has been used to facilitate drug repurposing, new drug discovery, and the study of adverse event reactions. Within the last decade, LBD systems have transitioned from statistical methods to deep learning (DL) for analyzing the semantic spaces between non-interacting literatures. Recent works explore knowledge graphs (KGs) to represent explicit relationships. These works frame LBD as a knowledge graph completion (KGC) task and use DL to generate implicit relationships. However, these systems require the researcher to have domain-expert knowledge when submitting relevant queries for novel hypothesis discovery. Our method explores a novel approach that identifies all implicit hypotheses given the researcher's search query and expedites the knowledge discovery process. We recast the KGC task as predicting interconnecting vertex embeddings within the graph. We train our model using a similarity learning objective and compare our model's predictions against all known vertices within the graph to determine the likelihood of an implicit relationship (i.e., a connecting edge). We also explore three approaches to represent edge connections between vertices within the KG: average, concatenation, and Hadamard product. Lastly, we explore an approach to induce inductive biases and expedite model convergence (i.e., input representation scaling). We evaluate our method by replicating five known discoveries within the Hallmark of Cancer (HOC) datasets and compare our method to two existing works. Our results show no significant difference in reported ranks or model convergence rate between models trained with and without input representation scaling.
Comparing our method to previous works, we found that it achieves the best performance on two of the five datasets and comparable performance on the remaining three. We further analyze our results using statistical significance testing to demonstrate the efficacy of our method. We found that our similarity-based learning objective predicts linking vertex embeddings for single-relationship closed discovery replication. Our method also provides a ranked list of linking vertices between a set of inputs. This approach reduces researcher burden and enables further exploration of generated hypotheses.
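The abstract names two concrete operations: combining a pair of vertex embeddings into an edge representation (average, concatenation, or Hadamard product) and ranking all known vertices by similarity to a predicted linking-vertex embedding. The following is a minimal illustrative sketch of those two ideas in NumPy, not the authors' implementation; the function names (`edge_representation`, `rank_linking_vertices`) and the use of cosine similarity are assumptions for illustration only.

```python
import numpy as np

def edge_representation(u, v, mode="hadamard"):
    """Combine two vertex embeddings u, v into one edge representation,
    mirroring the three strategies named in the abstract."""
    if mode == "average":
        return (u + v) / 2.0
    if mode == "concatenation":
        return np.concatenate([u, v])
    if mode == "hadamard":
        return u * v  # element-wise (Hadamard) product
    raise ValueError(f"unknown mode: {mode}")

def rank_linking_vertices(predicted, vocabulary):
    """Rank every known vertex embedding (rows of `vocabulary`) by cosine
    similarity to the model's predicted linking-vertex embedding.
    Returns vertex indices ordered best-first."""
    vocab = vocabulary / np.linalg.norm(vocabulary, axis=1, keepdims=True)
    pred = predicted / np.linalg.norm(predicted)
    scores = vocab @ pred          # cosine similarity per vertex
    return np.argsort(-scores)     # highest similarity first
```

A ranked list like the one returned here is what lets a researcher inspect all candidate linking vertices for a query pair rather than testing one hypothesis at a time.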

    Keywords: natural language processing, semantic similarity and relatedness, distributional similarity, literature-based discovery, neural networks, deep learning, knowledge discovery

    Received: 11 Oct 2024; Accepted: 27 Jan 2025.

    Copyright: © 2025 Cuffy and McInnes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    Clint Cuffy, Virginia Commonwealth University, Richmond, United States
    Bridget T. McInnes, Virginia Commonwealth University, Richmond, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.