Configurable Data Curation System (CDCS)

Summary

The Configurable Data Curation System (CDCS) is an informatics platform created as critical data infrastructure for materials science R&D. It was initially conceived under the Materials Genome Initiative (MGI) to accelerate advanced materials innovation, design, and deployment. Since then, the CDCS has been increasingly used by scientific projects, organizations, and institutions, and in other domains, both in the U.S. and internationally.

Project website: cdcs.nist.gov

Description

The ability to automate and accelerate the scientific and engineering lifecycles of materials science (or any other domain) depends critically on a scalable infrastructure for scientific data. Without appropriate data, and without connections between those data, no meaningful automation or interpretation is possible. Within the MGI, data may be spread across incompatible collections, often in diverse formats. This hinders the distributed research the MGI envisions. The Configurable Data Curation System (CDCS) addresses this problem by curating materials data into a repository using predefined templates. The platform's underlying XML format can be transformed into virtually any other format with standard tools. As a result, the CDCS can serve as a data source for a wide variety of existing materials informatics efforts that span projects, groups, and organizations. Each project, group, or organization can run as many curator (MDCS) instances as needed, and individual repositories can be interconnected for federated searches and data sharing.
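The CDCS stores curated records as XML documents that conform to a template, which is an XML Schema. Because XML is plain text with mature standard tooling, converting a record to another format takes little code. The sketch below converts an XML record to JSON using only Python's standard library. It is an illustration, not a CDCS function, and the record layout (`sample`, `composition`, `element`) is hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical curated record; element names are illustrative and are not
# taken from any actual CDCS template.
RECORD_XML = """
<sample>
  <name>Alloy A</name>
  <composition>
    <element symbol="Ni">0.6</element>
    <element symbol="Al">0.4</element>
  </composition>
  <temperature unit="K">1273</temperature>
</sample>
"""


def element_to_dict(elem):
    """Recursively convert an ElementTree element into plain Python data.

    Leaf elements without attributes become strings. Attributes become
    "@name" keys, text alongside attributes or children becomes "#text",
    and repeated child tags are collected into lists.
    """
    children = list(elem)
    if not children and not elem.attrib:
        return (elem.text or "").strip()
    result = {f"@{key}": value for key, value in elem.attrib.items()}
    text = (elem.text or "").strip()
    if text:
        result["#text"] = text
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tag: promote the existing value to a list, then append.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(xml_text):
    """Parse an XML document and serialize it as indented JSON."""
    root = ET.fromstring(xml_text.strip())
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


if __name__ == "__main__":
    print(xml_to_json(RECORD_XML))
```

The same pattern extends to other targets such as CSV rows or RDF triples. In practice, a production pipeline would more likely use XSLT or a schema-aware library than a hand-written converter.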

The informatics platform created by the NIST informatics team is a scalable data management platform. Its two system types, curator and registry, are basic building blocks of MGI infrastructure for activities involving data, computation, integration, and R&D. The CDCS is built as web applications composed of modular functional components. Through it, the platform and team have continually realized the data-infrastructure aspect of the MGI vision by providing a scalable basis for incrementally curating, aggregating, connecting, searching, and sharing data, resources, and infrastructure. The platform is built on a stack of modern web, data, and informatics technologies, including:

the Django web-application framework,
XML and JSON for structured, validated, interoperable data,
the OAI-PMH protocol and federation for infrastructural composition,
persistent identifier support for linked data applications,
Elasticsearch for advanced indexing and searching,
and much more.
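OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) is a simple HTTP protocol. A harvester sends requests that carry a `verb` parameter, and it pages through XML responses by following a `resumptionToken`. The sketch below builds `ListRecords` request URLs and extracts record identifiers from a response. It assumes a hypothetical endpoint URL, uses a hand-written sample response, and does not reproduce the CDCS harvester itself. `oai_dc` (Dublin Core) is the metadata format every OAI-PMH repository is required to support.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# OAI-PMH 2.0 response namespace, as defined by the protocol specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical base URL; replace with the OAI-PMH endpoint of a real registry.
BASE_URL = "https://registry.example.org/oai/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH, so follow-up
    requests carry only the verb and the token.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken signals that the list is complete.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Abbreviated, hand-written response used only to illustrate parsing.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2021-05-27T00:00:00Z</responseDate>
  <request verb="ListRecords">https://registry.example.org/oai/</request>
  <ListRecords>
    <record><header><identifier>oai:example:1</identifier></header></record>
    <record><header><identifier>oai:example:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A full harvester would loop: fetch the first URL, parse it, and request `list_records_url(BASE_URL, resumption_token=token)` until the token comes back as `None`.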

The CDCS is implemented in Python using the Django web-application framework and MongoDB. It uses XML because XML is a robust, proven, plain-text standard that can be easily shared and converted into other formats. The CDCS provides a Representational State Transfer (REST) API that allows other software to interact with it directly over a network. Because CDCS functions are available through the API, workflows can be fully automated.
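Automating a CDCS instance through its REST API amounts to sending authenticated HTTP requests. The sketch below uses only the standard library to build, but not send, a POST that uploads an XML record. The host, the `rest/data/` path, and the payload field names (`title`, `xml_content`, `template`) are assumptions for illustration. Check them against the API documentation of the instance you are targeting before use.

```python
import base64
import json
from urllib.parse import urljoin
from urllib.request import Request

# Hypothetical values: the host, endpoint path, and payload fields are
# placeholders, not confirmed CDCS routes.
BASE_URL = "https://cdcs.example.org/"
DATA_ENDPOINT = "rest/data/"


def build_upload_request(base_url, title, xml_content, template_id, username, password):
    """Build (but do not send) an authenticated POST that uploads an XML record."""
    payload = {
        "title": title,
        "xml_content": xml_content,
        "template": template_id,  # identifier of the schema the record conforms to
    }
    # HTTP Basic authentication: base64-encoded "username:password".
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return Request(
        urljoin(base_url, DATA_ENDPOINT),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_upload_request(
        BASE_URL, "alloy-a.xml", "<sample/>", "template-123", "user", "secret"
    )
    print(req.get_method(), req.full_url)
    # Sending it would be: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the sketch testable offline. In a real script, credentials would come from the environment rather than literals, and a token-based scheme may be preferable if the instance supports one.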

Features and capabilities available in fielded systems include:

Support for JSON-formatted data
Support for custom front-ends (Angular, etc.)
Advances in search technologies - Integration of Root-and-Rule term-based technology (Parmenides), context-sensitive custom search operators, search by periodic table
Advanced indexing with Elasticsearch
Online data migration support (Data migration tool)
Advanced data linking capabilities (persistent identification and linked data support)
Extended support for multiple data views
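The "search by periodic table" feature above lets a user pick elements and retrieve matching records. The page does not describe how the CDCS implements it, so the sketch below only illustrates the general idea. It assumes hypothetical records that each carry a flat list of element symbols. The function filters them in memory and also shows the equivalent MongoDB filter using the standard `$all` array operator, which matches arrays containing every listed value.

```python
# Hypothetical records: each carries a flat list of element symbols. Real CDCS
# templates define their own structure; this layout is an assumption.
RECORDS = [
    {"id": "r1", "elements": ["Ni", "Al"]},
    {"id": "r2", "elements": ["Ni", "Al", "Cr"]},
    {"id": "r3", "elements": ["Fe", "C"]},
]


def search_by_elements(records, selected, exact=False):
    """Return records whose composition includes every selected element.

    With exact=True, a record matches only if its set of elements equals the
    selection, i.e. it contains no additional elements.
    """
    wanted = set(selected)
    hits = []
    for record in records:
        present = set(record["elements"])
        matched = (present == wanted) if exact else (wanted <= present)
        if matched:
            hits.append(record)
    return hits


def mongo_filter(selected):
    """MongoDB filter equivalent to the non-exact case, using the $all operator."""
    return {"elements": {"$all": sorted(selected)}}


if __name__ == "__main__":
    print([r["id"] for r in search_by_elements(RECORDS, {"Ni", "Al"})])
    print([r["id"] for r in search_by_elements(RECORDS, {"Ni", "Al"}, exact=True)])
    print(mongo_filter({"Ni", "Al"}))
```

Sorting the selection makes the generated filter deterministic, which simplifies caching and testing of repeated queries.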

Common use-cases supported by the CDCS include:

Enterprise workflows and data integration
Lab information management systems
End-to-end microscopy workflows
Scientific reference data
Scientific sample repositories and data-linking
NIST building maintenance data analysis and linking
Tracking packages, plugins, data schemas, and more for NIST scientific computing platforms
Real-time streaming data support and applications
Data pipelines, workflows, and tools for community research (COVID-19)

Major Accomplishments

Since the CDCS was rearchitected in 2017 into the modular 2.0 core system of packages, the team has produced 31 releases of the curator (MDCS) software, 24 releases of the registry (MRR) software, and nearly 50 modular components. Before that, there were six primary releases, starting with the first public release in 2015. Releases have been developed in close collaboration with a growing base of stakeholders, who suggest important features that are rapidly developed and deployed on both test and production systems.
The CDCS user community has a growing footprint in the U.S. (government, industry, and academia) and internationally. CDCS systems support research at institutions in Switzerland (Swiss Federal Laboratories for Materials Science and Technology, or EMPA), Sweden (KTH Royal Institute of Technology in Stockholm), Taiwan (University of Taiwan), China (Shanghai University), Japan (National Institute for Materials Science, or NIMS), and Korea (Korea Institute of Materials Science, or KIMS). In the United States, the CDCS is used extensively at the National Institute of Standards and Technology (NIST) for projects such as smart manufacturing, additive manufacturing, interatomic potentials research, phase-data research, and high-throughput materials science. It is also used for materials science R&D by the U.S. Army Research Laboratory, Hollings Marine Laboratory, Argonne National Laboratory, Johns Hopkins University, Duke University, Texas A&M, Missouri S&T, Northwestern University, and others. Prominent industry organizations such as NextFlex, America Makes, and QuesTek also use it. The CDCS is having an impact in domains well beyond materials science, supporting human genome bioinformatics research at NIH and work in the greenhouse gas research community. Its registries have been deployed domestically and internationally to support scientific discovery at prominent institutions. These include the internationally recognized metrology organization Bureau International des Poids et Mesures (BIPM), the Research Data Alliance (RDA), and the Center for Hierarchical Materials Design (CHiMaD), as well as a primary registry deployment at NIST.
Recently, the CDCS has been rapidly applied, integrated, and scaled in the COVID-19 research domain. The NIST COVID19-DATA repository and registry systems are being made available to help meet the White House Call to Action. The Call asks the Nation's artificial intelligence experts to develop new text and data mining techniques that can help the science community answer high-priority scientific questions related to COVID-19.
The CDCS is actively developed in collaboration with the Material Measurement Laboratory (MML) and the Office of Data and Informatics (ODI). It has regularly been listed among NIST-wide initiatives for improving access to open data and for supporting the NIST processes for developing and producing scientific reference data. As a result, the CDCS remains closely connected to discussions of high-quality science and analysis, such as scientific reproducibility and provenance.
Through continual engagement, the team is steadily growing and integrating communities of scientific data. It does this by deploying registries and repositories and connecting them into scientific workflows in materials science, the bioeconomy, greenhouse gas research, international metrology, and international research data working groups (RDA). It also integrates them with other existing data infrastructure.

Resources/Demos

Project website: cdcs.nist.gov
Software repositories
- Curator | Materials Data Curation System (MDCS): github.com/usnistgov/mdcs
- Registry | Materials Resource Registry (MRR): github.com/usnistgov/nmrr
Demo systems
- Curator demo system: mdcs.nist.gov/
- Registry demo system: cdcs.registry.nist.gov/

Information technology

Created May 27, 2021, Updated February 19, 2026

NIST Staff

Benjamin Long

Guillaume Sousa Amaral

Philippe Dessauw

Former Staff

Adrien Catel, Information Technology Laboratory, NIST
Augustin Chini, Information Technology Laboratory, NIST
Xavier Schmitt, Information Technology Laboratory, NIST
Pierre-Francois Rigodiat, Information Technology Laboratory, NIST
Sharief Youssef, Information Technology Laboratory, NIST
Alden Dima, Information Technology Laboratory, NIST
Mary Brady, Information Technology Laboratory, NIST
Joshua Taillon, Office of Data and Informatics, NIST

Project Status

Ongoing

Related Publications

An Informatics Infrastructure for the Materials Genome Initiative

Implementing a Registry Federation for Materials Science Data Discovery

CUSTOMERS/CONTRIBUTORS/COLLABORATORS

Internal:

Carelyn Campbell, Material Measurement Laboratory, NIST
Chandler Becker, Office of Data and Informatics, NIST
Raymond Plante, Office of Data and Informatics, NIST
Ali Daoudi, Office of Data and Informatics, NIST
Gretchen Greene, Office of Data and Informatics, NIST
Marcus Newrock, Office of Data and Informatics, NIST
Kamal Choudhary, Material Measurement Laboratory, NIST
Faical Yannick Congo, Material Measurement Laboratory, NIST
Frederick R. Phelan Jr., Material Measurement Laboratory, NIST
June Lau, Material Measurement Laboratory, NIST
Lyle Levine, Material Measurement Laboratory, NIST
Paul Witherow, Engineering Laboratory, NIST
Yan Lu, Engineering Laboratory, NIST
Michael Brundage, Engineering Laboratory, NIST
Tom Hedberg, Engineering Laboratory, NIST
Rachael Sexton, Engineering Laboratory, NIST
Yande Ndiaye, Engineering Laboratory, NIST
Melvin Martins, Information Technology Laboratory, NIST
Ya-Shian Li-Baboud, Information Technology Laboratory, NIST
Eswaran Subrahmanian, Information Technology Laboratory, NIST
Talapady N. Bhat, Material Measurement Laboratory, NIST
John T. Elliott, Material Measurement Laboratory, NIST
Kathryn L. Beers, Material Measurement Laboratory, NIST
Kelsea A. Schumacher, Material Measurement Laboratory, NIST
Mylene Simon, Information Technology Laboratory, NIST
Peter Bajcsy, Information Technology Laboratory, NIST
Regina L. Avila, Information Services Office, NIST

External:

Greta Lindwall, KTH Royal Institute of Technology in Stockholm
Hans-Henrik König, KTH Royal Institute of Technology in Stockholm
Anders Lindquist, KTH Royal Institute of Technology in Stockholm
Michele Griffa, Swiss Federal Laboratories for Materials Science and Technology (EMPA)
Fabian Bucher, Swiss Federal Laboratories for Materials Science and Technology (EMPA)
Laura M. Bartolo, Northwestern University
Gerard Lemson, Johns Hopkins University
David Elbert, Johns Hopkins University
Quan Qian, Shanghai University
Shengyen Li, Southwest Research Institute
Rachel Cusic, U.S. Navy
