About the CDCS

The Configurable Data Curation System (CDCS), developed at NIST and also known as the Curator, provides a means for capturing, sharing, and transforming unstructured data into a structured format based on the Extensible Markup Language (XML). The CDCS can be viewed as a "loading dock" for scientific data: it enables the collection and dissemination of structured scientific data. It can be applied to any domain and is agnostic to the type of data. “Curated” data are amenable to transformation into other formats, such as those used by existing computational tools. The data are organized using user-selected, community-developed templates encoded in XML Schema. These templates are used to create data documents, which are saved in a non-relational (NoSQL) document database.
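As a rough illustration of how a curated XML document might be transformed into another format, the sketch below converts a small XML record into JSON using only the Python standard library. The element names, attribute names, and values in the sample record are hypothetical placeholders and do not come from any actual CDCS template; the helper functions `element_to_dict` and `xml_to_json` are likewise illustrative and are not part of the CDCS.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical curated record. The element names and values are placeholders
# chosen for illustration; they are not taken from a real CDCS template.
SAMPLE_XML = """\
<experiment>
  <material>Sample-A</material>
  <property name="density" unit="g/cm3">1.0</property>
  <property name="hardness" unit="HV">2.0</property>
</experiment>
"""


def element_to_dict(element):
    """Recursively convert an XML element into plain Python data.

    Attributes become dictionary keys, child elements are collected in a
    "children" list (preserving order), and non-empty text goes under "text".
    """
    node = dict(element.attrib)
    children = list(element)
    if children:
        node["children"] = [{child.tag: element_to_dict(child)} for child in children]
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return an equivalent JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


if __name__ == "__main__":
    print(xml_to_json(SAMPLE_XML))
```

This mapping is only one of many possible conventions. A real conversion would normally follow the structure defined by the template's XML Schema, and an attribute named `text` or `children` would collide with the keys this sketch uses.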

The CDCS is currently in use by the Materials Genome Initiative (MGI). Within the MGI, collections of data are often mutually incompatible and represented in diverse formats, which is a challenge to the distributed research goal envisaged by the MGI. The Materials Data Curation System (MDCS) allows for the curation of materials data into a repository using predefined templates. Because the MDCS's underlying XML format can be transformed into virtually any other format using standard tools, the MDCS can serve as a data source for a wide variety of existing materials informatics efforts that span projects, groups, and organizations. Each project, group, or organization can run as many MDCS instances as needed. Individual MDCS repositories can be interconnected for federated searches and data sharing.

The CDCS is implemented in Python using the Django framework and MongoDB. It uses XML because XML is a robust, proven standard written as plain text that can be shared and converted into other formats easily. The CDCS provides a Representational State Transfer (REST) API that allows other software to interact with it directly over a network. CDCS functions are available via the API, allowing for full automation.
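To give a sense of what automating a REST interaction could look like, the sketch below builds (but does not send) an HTTP request that uploads an XML document. Everything specific here is an assumption made for illustration: the base URL, the `/rest/data/` path, the JSON field names, the use of HTTP Basic authentication, and the helper `build_upload_request` are hypothetical and should be checked against the API documentation of the CDCS instance in use.

```python
import base64
import json
import urllib.request

# Assumptions: the base URL, endpoint path, and payload field names below are
# hypothetical placeholders, not the documented CDCS API.
BASE_URL = "http://localhost:8000"
UPLOAD_PATH = "/rest/data/"


def build_upload_request(title, template_id, xml_content, username, password):
    """Build a POST request carrying an XML document as a JSON payload.

    The request is only constructed here; nothing is sent over the network.
    """
    payload = json.dumps(
        {"title": title, "template": template_id, "xml_content": xml_content}
    ).encode("utf-8")
    # HTTP Basic authentication: base64-encode "username:password".
    credentials = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=BASE_URL + UPLOAD_PATH,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


if __name__ == "__main__":
    request = build_upload_request(
        title="record-1",
        template_id="hypothetical-template-id",
        xml_content="<experiment/>",
        username="user",
        password="pass",
    )
    # Inspect the request without contacting any server.
    print(request.get_method(), request.full_url)
```

Against a running server, the request could be sent with `urllib.request.urlopen(request)`. Keeping request construction separate from sending makes the automation step easy to inspect and test without network access.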

Created September 6, 2018, Updated September 17, 2018