AUTHOR=Salway Travis, Butt Zahid A., Wong Stanley, Abdia Younathan, Balshaw Robert, Rich Ashleigh J., Ablona Aidan, Wong Jason, Grennan Troy, Yu Amanda, Alvarez Maria, Rossi Carmine, Gilbert Mark, Krajden Mel, Janjua Naveed Z. TITLE=A Computable Phenotype Model for Classification of Men Who Have Sex With Men Within a Large Linked Database of Laboratory, Surveillance, and Administrative Healthcare Records JOURNAL=Frontiers in Digital Health VOLUME=2 YEAR=2020 URL=https://www.frontiersin.org/journals/digital-health/articles/10.3389/fdgth.2020.547324 DOI=10.3389/fdgth.2020.547324 ISSN=2673-253X ABSTRACT=

Background: Most public health datasets do not include sexual orientation measures, which limits the data available to monitor health disparities and evaluate tailored interventions. We therefore developed, validated, and applied a novel computable phenotype model to classify men who have sex with men (MSM) using multiple health datasets from British Columbia, Canada, 1990–2015.

Methods: Three disease case surveillance databases, a public health laboratory database, and five administrative health databases were linked and deidentified (BC Hepatitis Testers Cohort), resulting in a retrospective cohort of 727,091 adult men. Known MSM status from the three case surveillance databases was used to develop a multivariable model for classifying MSM in the full cohort. Candidate models were fitted using “elastic-net” regularization (glmnet package in R), and the final model was the one that optimized the area under the receiver operating characteristic curve (AUC). We compared characteristics of known MSM, classified MSM, and classified heterosexual men.
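
The two ingredients named in the Methods, an elastic-net penalty and selection by area under the ROC curve, can be illustrated with a minimal sketch. This is not the authors' R code: it is plain Python, the function names (`elastic_net_penalty`, `roc_auc`, `select_by_auc`) are hypothetical, the labels and scores are made-up toy values, and the penalty uses the parameterization documented for glmnet, lambda * ((1 - alpha)/2 * ||b||_2^2 + alpha * ||b||_1).

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic-net penalty: lam * ((1 - alpha)/2 * sum(b^2) + alpha * sum(|b|)).

    alpha = 1 gives the lasso penalty, alpha = 0 gives the ridge penalty.
    """
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * ((1.0 - alpha) / 2.0 * l2_squared + alpha * l1)


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_by_auc(labels, candidates):
    """Return (name, auc) for the candidate whose scores give the highest AUC."""
    aucs = {name: roc_auc(labels, scores) for name, scores in candidates.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Toy data (invented for illustration): 1 = known MSM, 0 = known non-MSM.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical predicted probabilities from two candidate penalty settings.
candidates = {
    "weak_penalty": [0.9, 0.4, 0.7, 0.5, 0.2, 0.1, 0.3, 0.6],
    "strong_penalty": [0.8, 0.7, 0.6, 0.5, 0.2, 0.1, 0.3, 0.4],
}
best_name, best_auc = select_by_auc(labels, candidates)
print(best_name, best_auc)
```

In practice the AUC would be estimated on held-out or cross-validated data rather than on the records used for fitting; the abstract does not say how the authors did this, so the sketch leaves that step out.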

Findings: A history of gonorrhea or syphilis diagnoses, HIV tests in the past year, a history of visiting an identified gay and bisexual men's clinic, and residence in MSM-dense neighborhoods were all positively associated with being MSM. The selected model had a sensitivity of 72% and a specificity of 94%. Excluding those with known MSM status, a total of 85,521 men (12% of the cohort) were classified as MSM.
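
As a reading aid for the reported metrics, the sketch below shows how sensitivity and specificity follow from confusion-matrix counts and checks the stated cohort share. The counts passed to the hypothetical `sensitivity_specificity` helper are invented so that they reproduce the reported percentages; they are not the study's validation counts. Only 85,521 and 727,091 come from the abstract.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Invented counts per 100 known MSM and 100 known non-MSM (illustration only).
sens, spec = sensitivity_specificity(tp=72, fn=28, tn=94, fp=6)

# Share of the full cohort classified as MSM, using the figures in the abstract.
classified_msm = 85_521
cohort_size = 727_091
share = classified_msm / cohort_size

print(f"sensitivity={sens:.0%}, specificity={spec:.0%}, classified share={share:.0%}")
```

With a sensitivity of 72%, roughly 28 of every 100 true MSM would be missed by the classifier, while a specificity of 94% means about 6 of every 100 non-MSM would be misclassified as MSM.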

Interpretation: Computable phenotyping is a promising approach for classification of sexual minorities and investigation of health outcomes in the absence of routinely available self-report data.