Cys.sqlite: A structured-information approach to the comprehensive analysis of cysteine disulfide bonds in the Protein Databank
Theodore L. Fobe, Andrei F. Kazakov, Demian Riccardi
Cysteine is a multifaceted amino acid that is central to the structure and function of many proteins. A disulfide bond formed between two cysteines restrains protein conformations through the strong covalent bond and torsions about the bond that prefer, energetically, +/- 90 degrees. In this study, we transform over 30k Protein Databank files (PDBx/mmCIFs) into a single-file SQLite database (Cys.sqlite). The database schema is designed to accommodate the structural information of both oxidized and reduced cysteines and to retain essential protein metadata to establish informational and biological provenance. Cys.sqlite contains over 95k peptide chains and 500k cysteines (700k structural conformers); there are over 265k cysteine disulfide bond conformations from structures solved with all available experimental methods. The structural information is analyzed with respect to sequence identity cutoff, the experimental method, and energetics of the disulfide. We find that as the experimental information becomes limiting and the influence of modeling becomes more pronounced, the observed average strain increases artificially. The database and analyses presented here can be used to improve the refinement of biological structures from experiments that are known to contain one or more disulfide bonds.
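As an illustration of the torsion about the disulfide bond mentioned above, the following minimal sketch computes a signed dihedral angle from four atomic positions, here taken to be CB, SG, SG' and CB' of the two bonded cysteines. The function name `dihedral_degrees` and the example coordinates are hypothetical and are not taken from Cys.sqlite or its schema; the formula is the standard atan2 form of the dihedral angle, not necessarily the procedure the authors used.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    For a disulfide, p0..p3 would be CB, SG, SG', CB' (an assumption
    about atom ordering made for this illustration).
    """
    b1 = _sub(p1, p0)  # CB  -> SG
    b2 = _sub(p2, p1)  # SG  -> SG' (the S-S bond, the rotation axis)
    b3 = _sub(p3, p2)  # SG' -> CB'
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical, idealized coordinates (not from any PDB entry).
example_angle = dihedral_degrees(
    (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
)
```

Using `atan2` rather than `acos` keeps the sign of the angle, which matters when distinguishing conformations near +90 degrees from those near -90 degrees.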
Fobe, T., Kazakov, A. and Riccardi, D., Cys.sqlite: A structured-information approach to the comprehensive analysis of cysteine disulfide bonds in the Protein Databank, Structure, [online], https://doi.org/10.1021/acs.jcim.8b00950, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=925795 (Accessed February 25, 2024)