ORIGINAL RESEARCH article

Front. Genet.
Sec. Computational Genomics
Volume 15 - 2024 | doi: 10.3389/fgene.2024.1483490
This article is part of the Research Topic The 22nd International Conference on Bioinformatics (InCoB 2023) Translational Bioinformatics Transforming Life

Quadratic Descriptors and Reduction Methods in a Two-layered Model for Compound Inference

Provisionally accepted
Jianshen Zhu 1, Naveed Ahmed Azam 2*, Shengjuan Cao 1, Ryota Ido 1, Kazuya Haraguchi 1, Liang Zhao 3, Hiroshi Nagamochi 1, Tatsuya Akutsu 4
  • 1 Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto, Kyōto, Japan
  • 2 Department of Mathematics, Faculty of Natural Sciences, Quaid-i-Azam University, Islamabad, Islamabad, Pakistan
  • 3 Graduate School of Advanced Integrated Studies in Human Survivability (Shishu-Kan), Kyoto University, Kyoto, Japan
  • 4 Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Japan

The final, formatted version of the article will be published soon.

    Compound inference models are crucial for discovering novel drugs in bioinformatics and chemoinformatics. These models rely heavily on descriptors of chemical compounds that capture the information needed to construct accurate prediction functions. In this paper, we introduce quadratic descriptors, the products of two graph-theoretic descriptors, to enhance the learning performance of a novel two-layered compound inference model. A mixed-integer linear programming formulation is designed to approximate these quadratic descriptors for inferring desired compounds with the two-layered model. Furthermore, we introduce different methods to reduce the number of descriptors, aiming to avoid the high computational cost and overfitting that the large number of quadratic descriptors would otherwise cause during learning. Experimental results show that for 32 chemical properties of monomers and ten chemical properties of polymers, the prediction functions constructed by the proposed method achieved high test coefficients of determination. In addition, our method inferred chemical compounds in running times ranging from a few seconds to around 60 seconds. These results indicate a strong correlation between the properties of chemical graphs and their quadratic graph-theoretic descriptors.
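
    As a rough illustration of the idea in the abstract, the sketch below expands a vector of linear descriptors with all pairwise products and then applies a very naive reduction step. It is not the authors' implementation: the function names, the toy descriptor values, the inclusion of squared terms, and the "drop constant columns" reduction rule are all assumptions made only for this example. It mainly shows why reduction matters, since K linear descriptors grow to K + K(K+1)/2 after expansion.

```python
from itertools import combinations_with_replacement


def quadratic_descriptors(linear):
    """Return the linear descriptors followed by all pairwise products.

    Assumption: squared terms x_i * x_i are included; the abstract only says
    that a quadratic descriptor is a product of two descriptors.
    """
    products = [a * b for a, b in combinations_with_replacement(linear, 2)]
    return list(linear) + products


def drop_constant_columns(rows):
    """Hypothetical reduction rule: keep columns whose value varies across rows."""
    n_cols = len(rows[0])
    keep = [j for j in range(n_cols) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows], keep


# Hypothetical toy data: two compounds, three linear descriptors each.
toy = [[6.0, 1.0, 0.0], [8.0, 2.0, 0.0]]

expanded = [quadratic_descriptors(v) for v in toy]  # 3 + 6 = 9 columns each
reduced, kept = drop_constant_columns(expanded)     # constant columns removed

print(len(expanded[0]), kept)
```

    On this toy input the expansion produces nine columns per compound, and the reduction keeps only the five columns that differ between the two compounds. A real pipeline would use a statistically motivated selection rule rather than this constant-column filter.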

    Keywords: machine learning, integer programming, chemoinformatics, materials informatics, QSAR/QSPR, molecular design

    Received: 20 Aug 2024; Accepted: 30 Dec 2024.

    Copyright: © 2024 Zhu, Azam, Cao, Ido, Haraguchi, Zhao, Nagamochi and Akutsu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Naveed Ahmed Azam, Department of Mathematics, Faculty of Natural Sciences, Quaid-i-Azam University, Islamabad, 45320, Islamabad, Pakistan

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.