AUTHOR=Münch Maximilian, Raab Christoph, Biehl Michael, Schleif Frank-Michael TITLE=Data-Driven Supervised Learning for Life Science Data JOURNAL=Frontiers in Applied Mathematics and Statistics VOLUME=6 YEAR=2020 URL=https://www.frontiersin.org/journals/applied-mathematics-and-statistics/articles/10.3389/fams.2020.553000 DOI=10.3389/fams.2020.553000 ISSN=2297-4687 ABSTRACT=

Life science data are often encoded in non-standard ways, for example as alphanumeric sequences, graph representations, numerical vectors of variable length, or other formats. Domain-specific or data-driven similarity measures, such as alignment functions, have been employed on such data with great success. However, the vast majority of more complex data analysis algorithms require fixed-length vectorial input, which calls for substantial preprocessing of life science data, and data-driven measures are widely ignored in favor of simple encodings. These preprocessing steps are neither always easy to perform nor particularly effective, and they risk a loss of information and interpretability. We present strategies and concepts for employing data-driven similarity measures in the life sciences and for other complex biological systems. In particular, we show how to use data-driven similarity measures effectively in standard learning algorithms.