With large-scale genomic sequencing efforts meeting with success, post-genomics research is focusing on the functional assignment of gene products. One proposed approach to the functional assignment of hypothetical proteins is the use of structural data determined by crystallographic or NMR methods. The crystal structure determination of the E. coli YgfZ protein was undertaken as part of a structural genomics effort (http://s2f.carb.nist.gov) in order to assist with the functional assignment of the protein. The ygfZ gene of Escherichia coli encodes an uncharacterized protein with a molecular weight of 36 kDa. Its homologues are present in most bacteria and eukaryotes, but not in archaea. Based on a marginal sequence similarity to the T-protein of the glycine-cleavage system (GCS),2 the YgfZ protein has been annotated as a putative aminomethyltransferase. T-protein catalyzes the release of ammonia from the S-aminomethyldihydrolipoyl moiety attached to the H-protein of the GCS and, in the presence of tetrahydrofolate, the synthesis of methylenetetrahydrofolate. The sequence identity between YgfZ and T-protein is very low, e.g. only 15% for the E. coli proteins. Moreover, with the exception of several glycine residues, none of the residues conserved among T-proteins is conserved in the YgfZ family, nor does the YgfZ conservation pattern match that of the T-protein sequence. In support of a possible relationship, in some enterobacteria, including E. coli, Yersinia pestis, and Salmonella typhimurium, the ygfZ gene is located upstream of the gcv operon of the GCS. Although this cannot be used to assign the molecular function, it may indicate an involvement of YgfZ in one of the pathways related to the GCS. The YgfZ protein from Escherichia coli was cloned and expressed, and its crystal structure was determined at 2.8 Å resolution. The YgfZ protein molecule has a globular shape with dimensions of 60 × 50 × 30 Å. It consists of three domains that are arranged at the vertices of an equilateral triangle, creating a narrow central channel (Fig. 1). One of the most interesting features of the YgfZ structure is the cysteine residue (Cys228) located next to the charged surface in the loop bearing the fingerprint sequence. Cys228 is strictly invariant in the YgfZ family. Its thiol group is completely exposed and apparently highly reactive.
Our functional hypothesis for the YgfZ protein is centered on this reactive cysteine residue, which may act as a nucleophile. The position of Cys228 in the middle of the highly conserved fragment of the sequence supports this hypothesis. The nature of the substrate remains unknown; identifying it will require further biochemical and biophysical studies, which will be facilitated by the three-dimensional structure of the YgfZ protein.