AUTHOR=Sepas Ali , Bangash Ali Haider , Alraoui Omar , El Emam Khaled , El-Hussuna Alaa
TITLE=Algorithms to anonymize structured medical and healthcare data: A systematic review
JOURNAL=Frontiers in Bioinformatics
VOLUME=2
YEAR=2022
URL=https://www.frontiersin.org/journals/bioinformatics/articles/10.3389/fbinf.2022.984807
DOI=10.3389/fbinf.2022.984807
ISSN=2673-7647
ABSTRACT=
Introduction: With many anonymization algorithms developed for structured medical health data (SMHD) over the last decade, our systematic review provides a comprehensive bird’s-eye view of algorithms for SMHD anonymization.
Methods: This systematic review was conducted according to the recommendations in the Cochrane Handbook for Systematic Reviews of Interventions and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA). Eligible articles were identified through systematic searches of the PubMed, ACM Digital Library, Medline, IEEE, Embase, Web of Science Core Collection, Scopus, and ProQuest Dissertations & Theses Global databases. The following parameters were extracted from the eligible studies: author, year of publication, sample size, and the algorithms and/or software applied to anonymize SMHD, along with a summary of outcomes.
Results: Of 1,804 initial hits, the present study considered 63 records, including research articles, reviews, and books. Seventy-five evaluated the anonymization of demographic data, 18 assessed diagnosis codes, and 3 assessed genomic data. One of the most common approaches was k-anonymity, which was used mainly for demographic data, often in combination with another algorithm such as l-diversity. No approaches have yet been developed for protection against membership disclosure attacks on diagnosis codes.
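To illustrate why k-anonymity is often paired with l-diversity, the sketch below checks both properties on a small, hypothetical patient table whose quasi-identifiers (age, ZIP code) are assumed to be already generalized. It is a minimal sketch, not an anonymization algorithm from the reviewed studies: it only verifies the properties and uses the simple "distinct l-diversity" variant. The function names, field names, and data are all illustrative assumptions.

```python
from collections import defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records by their (already generalized) quasi-identifier values."""
    classes = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        classes[key].append(rec)
    return classes


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(len(group) >= k for group in classes.values())


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l_value):
    """True if every equivalence class holds at least l_value distinct sensitive values."""
    classes = equivalence_classes(records, quasi_identifiers)
    return all(
        len({rec[sensitive] for rec in group}) >= l_value
        for group in classes.values()
    )


# Hypothetical toy table: ages bucketed into ranges, ZIP codes truncated to prefixes.
patients = [
    {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "130**", "diagnosis": "diabetes"},
    {"age": "30-39", "zip": "130**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "148**", "diagnosis": "flu"},
]

QI = ["age", "zip"]
print(is_k_anonymous(patients, QI, k=3))                            # True
print(is_distinct_l_diverse(patients, QI, "diagnosis", l_value=2))  # False
```

The table is 3-anonymous, yet every record in the second equivalence class shares the diagnosis "flu". Anyone who knows that a patient falls into that class therefore learns the diagnosis (a homogeneity attack). Requiring at least l distinct sensitive values per class is meant to close this gap.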
Conclusion: This study reviewed and categorized different anonymization approaches for SMHD according to the anonymized data types (demographics, diagnosis codes, and genomic data). Further research is needed to develop more efficient algorithms for the anonymization of diagnosis codes and genomic data. The risk of reidentification can be minimized with adequate application of the reviewed anonymization approaches.
Systematic Review Registration: [http://www.crd.york.ac.uk/prospero], identifier [CRD42021228200].