About this Research Topic
ML cannot offer simple, human-readable interpretations of the rules by which it converts a multi-dimensional input data stream into output classes. This makes ML a closed system, an inaccessible black box, a sort of deus ex machina, in which movements in latent space stay largely hidden from human observers (the latent space can be viewed as analogous to the space of “hidden variables” in quantum mechanics). One can argue that this is just an issue of linguistics or epistemology, but we find it crucial if we aim to fully understand biological or physical processes. At the same time, the very ability of ML protocols to efficiently extract key features from a given data set makes them vulnerable to sampling bias in the input data. In more technical terms, ML algorithms easily find local minima for the given input stream, but the global minimum may be out of reach due to limited sampling. This issue is especially pronounced in the natural sciences, where sampling bias is almost inevitable. As a result, we may end up where we started, with observable phenomena that have no clear explanation. That may lead to a saturation of the explanatory ability of science, with weak feedback for knowledge improvement.
ML also lacks overall knowledge of the world; that is, artificial general intelligence (AGI) still does not exist. Yet access to general knowledge has been instrumental in researchers’ ability to cross the sampling gap, and it has guided them toward correct explanations. For example, the Copernican revolution in astronomy would not have been possible from the observational data of the 16th century alone, had Nicolaus Copernicus not possessed a wider understanding and knowledge of the world.
In computer-aided drug design, ML has been used to recognize drug binding sites, binding modes, and conformations, to speed up costly MD calculations (QM/MM in particular), and to optimize potential hits. The aim of ML practitioners has been to reduce the exorbitant costs of drug development and to shorten development time, which correspondingly lengthens the period during which patent rights apply. ML can help in that regard, but the above comments still apply. Without a full compendium of cell signaling processes, and with the principles of molecular interactions and dynamics still not fully resolved, machine learning’s ability to filter out unnecessary details from the input stream (experimentally obtained molecular structures, interactions, and clinical data) seems like a short-term success. What is necessary is to understand cause and effect on both the micro and macro levels (cells, tissues, organs, individuals), together with the effects of timescales and time relativity inside cells and tissues.
To address this, we invite practitioners of ML and drug hunters to submit manuscripts to this Research Topic presenting research that utilizes ML protocols or architectures but also offers a detailed and comprehensive interpretation of the observed phenomena. The topics that can be addressed with ML and that we are interested in include molecular dynamics (MD) acceleration techniques, implicit solvent improvements, small-molecule force field optimization, the discovery of cryptic pockets and their physical interpretation, and the discovery of allosteric effects and their application to undruggable targets. Toxicity in drug discovery is also a topic we would like to see addressed. We also encourage the application of ML to the analysis of interaction frequencies between genome loci in the nucleus, measured as statistical averages over cell populations, and of their relation (and potential clinical use) to single-cell analyses via chromosome conformation capture (3C) and fluorescence in situ hybridization techniques. The already established fractal and continuous polymer models of chromatin are ripe for deeper interpretation with the help of ML tools, together with experimental data on covalent modifications.
As a tool of choice for data analysis in experimental and observational sciences, ML has helped produce a deluge of research papers, many with limited or questionable scientific merit. We would therefore like to avoid manuscripts that offer only a simple ML analysis of input data sets or a comparison of different ML algorithms (e.g., neural networks vs. random forests). Manuscripts that use machine learning but at the same time offer solid theoretical interpretations of the results are, on the other hand, welcome. Such work may not only help drug discovery and molecular biology but also benefit the machine learning field, as it may shed light on the underlying processes in the latent space of variables.
Dr. Gunady is currently an employee of Illumina Inc.; Dr. Perišić is an employee (Research Scientist) of Redesign Science. All other Topic Editors declare no competing interests.
Keywords: machine learning, small molecule force field optimization, drug design
Important Note: All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.