AUTHOR=Lobb Briallen, Kurtz Daniel A., Moreno-Hagelsieb Gabriel, Doxey Andrew C. TITLE=Remote homology and the functions of metagenomic dark matter JOURNAL=Frontiers in Genetics VOLUME=6 YEAR=2015 URL=https://www.frontiersin.org/journals/genetics/articles/10.3389/fgene.2015.00234 DOI=10.3389/fgene.2015.00234 ISSN=1664-8021 ABSTRACT=
Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology far exceeds that obtained for artificial protein families (1.4%). As expected for real genes, the predicted functions of ORFans are significantly similar to the functions of their gene neighbors (