AUTHOR=Kolyshkina Inna, Simoff Simeon

TITLE=Interpretability of Machine Learning Solutions in Public Healthcare: The CRISP-ML Approach

JOURNAL=Frontiers in Big Data

VOLUME=Volume 4 - 2021

YEAR=2021

URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2021.660206

DOI=10.3389/fdata.2021.660206

ISSN=2624-909X

ABSTRACT=Public healthcare has a history of cautious adoption of artificial intelligence (AI) systems. The rapid growth of data collection and linking capabilities, combined with the increasing diversity of data-driven AI techniques, including machine learning (ML), has brought both ubiquitous opportunities for data analytics projects and increased demands for regulation and accountability for the outcomes of these projects. As a result, the area of interpretability and explainability of ML is gaining significant research momentum. However, while there has been some progress in the development of ML methods, less has been done on the methodological side. This limits the practicality of using ML in the health domain: the difficulty of explaining the outcomes of ML algorithms to medical practitioners and policy makers in public health has been a recognised obstacle to the broader adoption of data science approaches in this domain. This paper builds on our earlier work, which introduced CRISP-ML, a methodology that determines the interpretability level required by stakeholders for a successful real-world solution and then helps to achieve it. CRISP-ML builds on the strengths of CRISP-DM while addressing its gaps in handling interpretability. Its application in the public healthcare sector follows its successful deployment in a number of recent real-world projects across several industries and fields, including credit risk, insurance, utilities and sport. In this paper we elaborate the CRISP-ML methodology for the determination, measurement and achievement of the necessary level of interpretability of ML solutions in the public healthcare sector. The paper demonstrates how CRISP-ML addressed the problems of data diversity, the unstructured nature of the data and relatively low linkage between diverse data sets in the healthcare domain.
The characteristics of the case study we used are typical of healthcare data, and CRISP-ML addressed these issues, ensuring the required interpretability of the ML solutions in the project. It met the interpretability requirements while taking into account public healthcare specifics, regulatory requirements, the stakeholders involved, project objectives and data characteristics. The paper concludes with three main directions for the development of the presented cross-industry standard process.