AUTHOR=Hu Lijuan , Zhao Bin , Liu Mingchen , Gao Yang , Ding Haibo , Hu Qinghai , An Minghui , Shang Hong , Han Xiaoxu TITLE=Optimization of genetic distance threshold for inferring the CRF01_AE molecular network based on next-generation sequencing JOURNAL=Frontiers in Cellular and Infection Microbiology VOLUME=14 YEAR=2024 URL=https://www.frontiersin.org/journals/cellular-and-infection-microbiology/articles/10.3389/fcimb.2024.1388059 DOI=10.3389/fcimb.2024.1388059 ISSN=2235-2988 ABSTRACT=Introduction

HIV molecular networks based on genetic distance (GD) have been extensively utilized. However, the appropriate GD threshold for non-B subtypes differs from that for subtype B. This study aimed to optimize the GD threshold for inferring the CRF01_AE molecular network.

Methods

Next-generation sequencing data of partial CRF01_AE pol sequences were obtained for 59 samples from 12 transmission pairs enrolled from a high-risk cohort between 2009 and 2014. The paired GD was calculated using the Tamura-Nei 93 model to infer a GD threshold range for HIV molecular networks.
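
As an illustration of the distance calculation named above, the following is a minimal Python sketch of the Tamura-Nei 93 distance between two aligned sequences. It is not the study's pipeline: the function name `tn93_distance`, the choice to estimate base frequencies from the two sequences being compared, and the choice to skip sites with gaps or ambiguity codes are all assumptions made for this example.

```python
import math
from collections import Counter

PURINES = frozenset("AG")


def tn93_distance(seq_a, seq_b):
    """Tamura-Nei 93 distance (substitutions/site) for two aligned sequences.

    Assumptions of this sketch: sites where either sequence is not A, C, G
    or T are skipped, and base frequencies come from the pair itself.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")

    # Pooled base frequencies over both sequences.
    counts = Counter(base for pair in pairs for base in pair)
    g = {base: counts[base] / (2 * n) for base in "ACGT"}
    if min(g.values()) == 0:
        raise ValueError("TN93 needs all four bases to be present")
    g_r = g["A"] + g["G"]  # purine frequency
    g_y = g["C"] + g["T"]  # pyrimidine frequency

    # Count purine transitions, pyrimidine transitions, and transversions.
    p1 = p2 = q = 0
    for a, b in pairs:
        if a == b:
            continue
        a_pur, b_pur = a in PURINES, b in PURINES
        if a_pur and b_pur:
            p1 += 1
        elif not a_pur and not b_pur:
            p2 += 1
        else:
            q += 1
    p1, p2, q = p1 / n, p2 / n, q / n

    arg1 = 1 - g_r * p1 / (2 * g["A"] * g["G"]) - q / (2 * g_r)
    arg2 = 1 - g_y * p2 / (2 * g["C"] * g["T"]) - q / (2 * g_y)
    arg3 = 1 - q / (2 * g_r * g_y)
    if min(arg1, arg2, arg3) <= 0:
        raise ValueError("sequences too divergent for the TN93 correction")

    k1 = 2 * g["A"] * g["G"] / g_r
    k2 = 2 * g["C"] * g["T"] / g_y
    k3 = 2 * (g_r * g_y
              - g["A"] * g["G"] * g_y / g_r
              - g["C"] * g["T"] * g_r / g_y)
    return -k1 * math.log(arg1) - k2 * math.log(arg2) - k3 * math.log(arg3)


if __name__ == "__main__":
    # Toy sequences, not study data: one A-to-G transition in 100 sites.
    donor = "ACGT" * 25
    recipient = "GCGT" + "ACGT" * 24
    print(f"paired GD = {tn93_distance(donor, recipient):.4f} substitutions/site")
```

The corrected distance is slightly larger than the raw proportion of differing sites, because the model accounts for unobserved multiple substitutions at the same site.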

Results

A total of 2,019 CRF01_AE pol sequences and information on recent HIV infection (RHI) from newly diagnosed individuals in Shenyang from 2016 to 2019 were collected to construct molecular networks and to assess the ability of the inferred GD thresholds to predict recent transmission events. When HIV transmission had occurred within 1, 2, 3, and 4 years, the mean paired GD between the donor and recipient sequences within the same transmission pair was 0.008, 0.011, 0.013, and 0.023 substitutions/site, respectively. Using these four GD thresholds, 98.9%, 96.0%, 88.2%, and 40.4% of all randomly paired GD values from the 12 transmission pairs were correctly identified as originating from the same transmission pair. In the real-world data, as the GD threshold increased from 0.001 to 0.02 substitutions/site, the proportion of RHI within the molecular network gradually increased from 16.6% to 92.3%. Meanwhile, the proportion of links with RHI gradually decreased from 87.0% to 48.2%. The two curves intersected at a GD of 0.008 substitutions/site.
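
The threshold sweep described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not define the two proportions precisely, so here "proportion of RHI within the network" is taken to mean the share of RHI individuals with at least one link, and "proportion of links with RHI" the share of links with at least one RHI endpoint. A link is assumed to exist when the paired GD is at or below the threshold. The function name `network_metrics`, the sample identifiers, and the distances are hypothetical toy values, not study data.

```python
def network_metrics(distances, rhi, threshold):
    """Return (share of RHI individuals in the network, share of links with RHI).

    distances: dict mapping (id_a, id_b) to the paired GD of those sequences.
    rhi: set of ids classified as recent HIV infection.
    Assumption: two sequences are linked when their GD is <= threshold.
    """
    links = [pair for pair, gd in distances.items() if gd <= threshold]
    nodes = {sample for pair in links for sample in pair}
    rhi_in_network = len(nodes & rhi) / len(rhi) if rhi else 0.0
    links_with_rhi = (
        sum(1 for a, b in links if a in rhi or b in rhi) / len(links)
        if links else 0.0
    )
    return rhi_in_network, links_with_rhi


# Hypothetical toy data for illustration only.
toy_distances = {
    ("s1", "s2"): 0.004,
    ("s2", "s3"): 0.009,
    ("s3", "s4"): 0.012,
    ("s4", "s5"): 0.018,
    ("s1", "s5"): 0.030,
}
toy_rhi = {"s1", "s4"}

if __name__ == "__main__":
    for threshold in (0.005, 0.008, 0.013, 0.020):
        in_network, with_rhi = network_metrics(toy_distances, toy_rhi, threshold)
        print(f"GD <= {threshold:.3f}: RHI in network {in_network:.1%}, "
              f"links with RHI {with_rhi:.1%}")
```

Plotting the two proportions against the threshold gives the pair of curves whose crossing point the abstract reports; with real data the sweep would cover the full 0.001 to 0.02 range in small steps.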

Discussion

A suitable range of GD thresholds, 0.008-0.013 substitutions/site, was identified to infer the CRF01_AE molecular transmission network and identify HIV transmission events that occurred within the past three years. This finding provides valuable data for selecting appropriate GD thresholds when constructing molecular networks for non-B subtypes.