AUTHOR=Kidwai-Khan Farah , Rentsch Christopher T. , Pulk Rebecca , Alcorn Charles , Brandt Cynthia A. , Justice Amy C. TITLE=Pharmacogenomics driven decision support prototype with machine learning: A framework for improving patient care JOURNAL=Frontiers in Big Data VOLUME=5 YEAR=2022 URL=https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2022.1059088 DOI=10.3389/fdata.2022.1059088 ISSN=2624-909X ABSTRACT=Introduction

A growing number of healthcare providers make complex treatment decisions guided by electronic health record (EHR) software interfaces. Many interfaces integrate multiple sources of data (e.g., labs, pharmacy, diagnoses) successfully, though relatively few have incorporated genetic data.

Method

This study utilizes informatics methods with predictive modeling to create and validate algorithms to enable informed pharmacogenomic decision-making at the point of care in near real-time. The proposed framework integrates EHR and genetic data relevant to the patient's current medications, including decision support mechanisms based on predictive modeling. We created a prototype with EHR and linked genetic data from the Department of Veterans Affairs (VA), the largest integrated healthcare system in the US. The EHR data included diagnoses, medication fills, and outpatient clinic visits for 2,600 people with HIV and matched uninfected controls linked to prototypic genetic data (variations in single or multiple positions in the DNA sequence). We then mapped the medications that patients were prescribed to medications defined in the drug-gene interaction mapping of the Clinical Pharmacogenomics Implementation Consortium's (CPIC) level A (i.e., sufficient evidence for at least one prescribing action) guidelines that predict adverse events. CPIC is a National Institutes of Health-funded group of experts who develop evidence-based pharmacogenomic guidelines. A preventable adverse event (PAE) can be defined as a harmful outcome from an intervention that could have been prevented. For this study, we focused on potential PAEs resulting from a medication-gene interaction.
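The medication-to-guideline mapping step described above can be sketched as a lookup from each prescribed drug to its associated gene(s). This is a minimal illustration under stated assumptions, not the study's implementation: the table `DRUG_GENE_PAIRS`, the function `flag_drug_gene_interactions`, the record layout, and the phenotype labels are all hypothetical, and the few drug-gene pairs listed are only a small sample of well-known CPIC level A pairs.

```python
# Hypothetical, abbreviated drug-gene lookup for illustration only.
# The actual CPIC level A list is much longer.
DRUG_GENE_PAIRS = {
    "clopidogrel": ["CYP2C19"],
    "warfarin": ["CYP2C9", "VKORC1"],
    "abacavir": ["HLA-B"],
    "simvastatin": ["SLCO1B1"],
    "codeine": ["CYP2D6"],
}


def flag_drug_gene_interactions(medications, genotype):
    """Return (drug, gene, phenotype) for each potential medication-gene interaction.

    medications: iterable of drug names from pharmacy fills (assumed layout).
    genotype: dict mapping gene -> phenotype label (assumed layout), where
              "normal" means no actionable variant was found.
    """
    flags = []
    for drug in medications:
        name = drug.lower()
        # Drugs without a mapped gene are skipped.
        for gene in DRUG_GENE_PAIRS.get(name, []):
            phenotype = genotype.get(gene)
            # Flag only genes that were tested and show a non-normal phenotype.
            if phenotype is not None and phenotype != "normal":
                flags.append((name, gene, phenotype))
    return flags


# Hypothetical patient record.
patient_meds = ["Clopidogrel", "Metformin", "Warfarin"]
patient_genotype = {
    "CYP2C19": "poor metabolizer",
    "CYP2C9": "normal",
    "VKORC1": "normal",
}
flags = flag_drug_gene_interactions(patient_meds, patient_genotype)
print(flags)
```

In this sketch, a drug is flagged only when the patient both fills a mapped medication and carries a non-normal result for the paired gene; how the prototype actually encodes genetic results is not specified in the abstract.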

Results

The final model achieved an AUC of 0.972 and an F1 score of 0.97 with genetic data, compared with 0.766 and 0.73, respectively, without genetic data integration.
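For readers less familiar with the two metrics reported here, the following sketch computes AUC and F1 from first principles on a small made-up example. The function names `roc_auc` and `f1_score`, the toy labels and scores, and the 0.5 decision threshold are assumptions for illustration; they are not taken from the study.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(y_true, y_pred):
    """F1 as the harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: 1 = adverse event occurred, 0 = no event (made-up values).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
print(f"AUC = {auc:.3f}, F1 = {f1:.3f}")
```

AUC is threshold-free and reflects ranking quality, whereas F1 depends on the chosen decision threshold, which is why the two are usually reported together.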

Discussion

Over 98% of people in the cohort were on at least one medication with a CPIC level A guideline in their lifetime. We compared the predictive power of five machine learning methods to detect a PAE: Random Forest, Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbors (KNN), and Decision Tree. We compared area under the curve (AUC) between the models in the testing dataset. We found that XGBoost performed best for the prototype, and that adding genetic data to the framework improved prediction of PAE.
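A comparison of this kind can be sketched with scikit-learn as follows. This is an illustrative outline only, not the authors' pipeline: the data are synthetic (generated with `make_classification`), the hyperparameters are library defaults or arbitrary choices, and scikit-learn's `GradientBoostingClassifier` is used as a stand-in for XGBoost so that the example needs no additional package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; NOT the VA cohort described in the abstract.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Five model families; gradient boosting substitutes for XGBoost here.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "Gradient Boosting (XGBoost stand-in)": GradientBoostingClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model on the training split and score AUC on the held-out test split.
auc_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    probs = model.predict_proba(X_test)[:, 1]
    auc_by_model[name] = roc_auc_score(y_test, probs)

# Report models from highest to lowest test AUC.
for name, auc in sorted(auc_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AUC = {auc:.3f}")
```

To mirror the with- and without-genetics comparison, one could run the same loop twice, once on a feature matrix that includes genetic columns and once on a matrix that omits them; the rankings on synthetic data say nothing about the study's results.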