AUTHOR=Lockwood Svetlana, Brayton Kelly A., Daily Jeff A., Broschat Shira L. TITLE=Whole Proteome Clustering of 2,307 Proteobacterial Genomes Reveals Conserved Proteins and Significant Annotation Issues JOURNAL=Frontiers in Microbiology VOLUME=10 YEAR=2019 URL=https://www.frontiersin.org/journals/microbiology/articles/10.3389/fmicb.2019.00383 DOI=10.3389/fmicb.2019.00383 ISSN=1664-302X ABSTRACT=
We clustered 8.76 M protein sequences deduced from 2,307 completely sequenced Proteobacterial genomes, resulting in 707,311 clusters of one or more sequences, of which 224,442 ranged in size from 2 to 2,894 sequences. To our knowledge, this is the first study of this scale. We were surprised to find that no single cluster contained a representative sequence from all the organisms in the study. Given the minimal genome concept, we expected to find a shared set of proteins. To determine why the clusters did not have universal representation, we chose four essential proteins, the chaperonin GroEL, DNA-dependent RNA polymerase subunits beta and beta′ (RpoB/RpoB′), and DNA polymerase I (PolA), representing fundamental cellular functions, and examined their cluster distribution. We found these proteins to be remarkably conserved, with certain caveats. Although the