10.6078/D1K01X
Srouji, John
John
Srouji
Xu, Anting
Anting
Xu
Park, Annsea
Annsea
Park
Kirsch, Jack
Jack
Kirsch
Brenner, Steven
Steven
Brenner
https://orcid.org/0000-0001-7559-6185
The Evolution of Function within the Nudix Superfamily: Steps to reconstruct sequence alignments
UC Berkeley
2015
Data Set/Description
hydrolase
homoplasy
Nudix
sequence alignment
structural alignment
2015
10.6078/d1cc74
Creative Commons Attribution 4.0 International (CC-BY 4.0)
The Nudix superfamily encompasses over 80,000 protein domains from all three domains of life. These proteins fall into four general functional classes: isopentenyl diphosphate isomerases (IDIs), adenine/guanine mismatch-specific adenine glycosylases (A/G-specific adenine glycosylases), pyrophosphohydrolases, and non-enzymatic activities such as protein/protein interaction and transcriptional regulation. The largest group, pyrophosphohydrolases, encompasses more than 100 distinct hydrolase specificities. To understand the evolution of this vast number of activities, we assembled and analyzed experimental and structural data for 205 Nudix proteins collected from the literature. We corrected erroneous functions or provided more appropriate descriptions for 53 annotations of this family in the Gene Ontology Annotation database, and we propose 275 new experimentally based annotations. We manually constructed a structure-guided sequence alignment of 78 Nudix proteins. Using the structural alignment as a seed, we then made an alignment of 347 “select” Nudix domains, curated from structurally determined, functionally characterized, or phylogenetically important Nudix domains. Based on our review of Nudix pyrophosphohydrolase structures and specificities, we further analyzed a loop region downstream of the Nudix hydrolase motif that was previously shown to contact the substrate molecule and to possess known functional motifs. This loop region provides a potential structural basis for the functional radiation and evolution of substrate specificity within the hydrolase family. Finally, phylogenetic analyses of the 347 select protein domains and of the complete Nudix clan revealed general monophyly with regard to function and a few instances of probable homoplasy.
Graphical representation of the steps used to reconstruct sequence alignments of the Nudix superfamily, as described in the Materials and Methods section. (A) The pipeline to build the 78-PDB structure-guided sequence alignment. (B) The pipeline to build the 324-core sequence alignment guided by the 78-PDB sequence alignment. (C) The pipeline to build the alignment of the complete Nudix clan (38,950 sequences). (D) Illustration of how to combine two alignments into one, guided by a scaffold alignment.
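
The idea in panel (D), combining two alignments into one, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a simplified case in which the "scaffold" is a single anchor sequence present in both alignments with identical ungapped residues. The function name `merge_on_anchor` and the toy sequences are hypothetical and serve only as illustration.

```python
def merge_on_anchor(aln_a, aln_b, anchor):
    """Merge two alignments (dicts of name -> gapped string) that share
    one anchor sequence. Assumption: the anchor has the same ungapped
    residues in both alignments; this stands in for a scaffold alignment."""
    row_a, row_b = aln_a[anchor], aln_b[anchor]
    if row_a.replace("-", "") != row_b.replace("-", ""):
        raise ValueError("anchor residues differ between alignments")

    names_a = list(aln_a)
    # Sequences already present in A (including the anchor) are taken from A.
    names_b = [n for n in aln_b if n not in aln_a]
    merged = {n: [] for n in names_a + names_b}

    i = j = 0
    while i < len(row_a) or j < len(row_b):
        if i < len(row_a) and row_a[i] == "-":
            # Column where the anchor is gapped in A: pad B-only rows.
            for n in names_a:
                merged[n].append(aln_a[n][i])
            for n in names_b:
                merged[n].append("-")
            i += 1
        elif j < len(row_b) and row_b[j] == "-":
            # Column where the anchor is gapped in B: pad A rows.
            for n in names_a:
                merged[n].append("-")
            for n in names_b:
                merged[n].append(aln_b[n][j])
            j += 1
        else:
            # Both alignments sit on the same anchor residue: join columns.
            for n in names_a:
                merged[n].append(aln_a[n][i])
            for n in names_b:
                merged[n].append(aln_b[n][j])
            i += 1
            j += 1
    return {n: "".join(cols) for n, cols in merged.items()}


# Toy example with a shared anchor named "ref".
aln_a = {"ref": "AC-GT", "a1": "ACTGT"}
aln_b = {"ref": "A-CGT", "b1": "ATCG-"}
merged = merge_on_anchor(aln_a, aln_b, "ref")
for name, row in merged.items():
    print(name, row)
```

Columns in which the anchor is gapped are kept separate rather than forced together, so insertions unique to one alignment are never aligned against unrelated residues from the other.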