AUTHOR=Clarke Neil D., Taylor John S. TITLE=Taxonomic distribution of opsin families inferred from UniProt Reference Proteomes and a suite of opsin-specific hidden Markov models JOURNAL=Frontiers in Ecology and Evolution VOLUME=11 YEAR=2023 URL=https://www.frontiersin.org/journals/ecology-and-evolution/articles/10.3389/fevo.2023.1190549 DOI=10.3389/fevo.2023.1190549 ISSN=2296-701X ABSTRACT=Introduction

Opsins are a large and sequence-diverse family of light-responsive G protein-coupled receptors involved in vision, circadian rhythm, and other processes. Numerous subfamilies have been defined based on sequence similarity, cell-type localization, signal transduction mechanism, or biological function, but there is no consensus classification system.

Methods

We used multiple hidden Markov models (HMMs) to identify opsins in the UniProt Reference Proteomes database. Opsin-specific HMMs were also used in an annotation procedure that represents each sequence as a vector of HMM scores and assesses the similarity of these vectors to those of annotated sequences. UniProt Reference Proteomes are built from genome sequences, allowing us to make meaningful comparisons of the number of opsins in each of the 260 species available at the time of the survey, both in absolute terms and relative to the larger superfamily of which opsins are a member.
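
The annotation idea described above (score vectors compared against those of annotated sequences) can be illustrated with a minimal sketch. The abstract does not state which similarity measure or assignment rule was used, so cosine similarity and nearest-neighbor assignment are assumptions here, and the function names, subfamily labels, and score values are hypothetical placeholders rather than data from the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def annotate(query, references):
    """Return (label, similarity) of the most similar annotated vector.

    Assumption: nearest-neighbor assignment by cosine similarity; the
    procedure in the paper may differ.
    """
    best_label, best_sim = None, -1.0
    for label, vec in references:
        sim = cosine(query, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim


# Hypothetical data: each vector holds one score per opsin-specific HMM.
references = [
    ("subfamily_A", [310.0, 120.0, 95.0]),
    ("subfamily_B", [115.0, 295.0, 90.0]),
]
query = [290.0, 130.0, 100.0]

label, similarity = annotate(query, references)
print(label, round(similarity, 3))
```

Comparing whole score vectors, rather than taking only the single best-scoring HMM, lets the pattern of scores across all models inform the assignment, which is the property the Methods text emphasizes.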

Results

More than 2,000 opsins were retrieved from 262 species (all metazoans).

Discussion

Merging opsin counts into higher-order taxa gives a broad view of the taxonomic distribution of opsins, and of opsin subfamilies, annotated according to three different schemes.