Opsins are a large and sequence-diverse family of light-responsive G protein-coupled receptors involved in vision, circadian rhythm, and other processes. Numerous subfamilies have been defined based on sequence similarity, cell-type localization, signal transduction mechanism, or biological function, but there is no consensus classification system.
We used multiple hidden Markov models (HMMs) to identify opsins in the UniProt Reference Proteomes database. Opsin-specific HMMs were also used in an annotation procedure that represents each sequence as a vector of HMM scores and assesses the similarity of that vector to those of annotated sequences. Because UniProt Reference Proteomes are built from genome sequences, we could meaningfully compare the number of opsins in each of the 260 species available at the time of the survey, both in absolute terms and relative to the larger superfamily to which opsins belong.
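The score-vector annotation idea can be illustrated with a minimal sketch. It assumes that each sequence has already been scored against a fixed, ordered set of opsin HMMs, that cosine similarity is the comparison measure, and that a query takes the label of its most similar annotated sequence. The similarity measure, the nearest-neighbor rule, the function names, and all scores and labels below are illustrative assumptions, not details taken from the procedure itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector carries no direction to compare
    return dot / (norm_u * norm_v)

def annotate(query_scores, annotated):
    """Return (label, similarity) of the most similar annotated vector.

    `annotated` is a list of (label, score_vector) pairs. Assumption:
    every vector uses the same HMMs in the same order.
    """
    best_label, best_sim = None, -1.0
    for label, ref_scores in annotated:
        sim = cosine(query_scores, ref_scores)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim

# Hypothetical HMM scores for two annotated sequences and one query.
reference = [
    ("subfamily_A", [250.0, 40.0, 10.0]),
    ("subfamily_B", [30.0, 220.0, 15.0]),
]
label, sim = annotate([240.0, 55.0, 12.0], reference)
print(label, round(sim, 3))
```

Comparing whole score vectors, rather than taking only the top-scoring HMM, lets the relative pattern of scores across all models inform the assignment.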
More than 2,000 opsins were retrieved from 262 species (all metazoans).
Merging opsin counts into higher-order taxa gives a broad view of the taxonomic distribution of opsins, and of opsin subfamilies, annotated according to three different schemes.
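As a rough sketch of this merging step, per-species counts can be summed by higher-order taxon and expressed relative to the size of the larger superfamily. The species names, taxon names, and counts below are hypothetical placeholders, and the function name and data layout are assumptions made only for illustration.

```python
from collections import defaultdict

def merge_counts(per_species, taxon_of):
    """Sum per-species counts into higher-order taxa.

    per_species: {species: (opsin_count, superfamily_count)}
    taxon_of:    {species: higher_order_taxon}
    Returns {taxon: {"opsins", "superfamily", "fraction"}}.
    """
    totals = defaultdict(lambda: {"opsins": 0, "superfamily": 0})
    for species, (n_opsins, n_superfamily) in per_species.items():
        taxon = taxon_of[species]
        totals[taxon]["opsins"] += n_opsins
        totals[taxon]["superfamily"] += n_superfamily
    merged = {}
    for taxon, t in totals.items():
        # Relative abundance: opsins as a fraction of the superfamily.
        fraction = t["opsins"] / t["superfamily"] if t["superfamily"] else 0.0
        merged[taxon] = {**t, "fraction": fraction}
    return merged

# Hypothetical counts: (opsins, superfamily members) per species.
per_species = {
    "species_1": (8, 400),
    "species_2": (4, 600),
    "species_3": (3, 100),
}
taxon_of = {
    "species_1": "taxon_X",
    "species_2": "taxon_X",
    "species_3": "taxon_Y",
}
merged = merge_counts(per_species, taxon_of)
for taxon, stats in sorted(merged.items()):
    print(taxon, stats)
```

Keeping the absolute count and the relative fraction side by side mirrors the two kinds of comparison described above.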