Reproducibility and reuse play fundamental roles in the development of science. With the growth of computational power and tools, simulated experiments have become increasingly useful for understanding and predicting physical phenomena. In particular, computational methods in materials science have opened up a wealth of possibilities for novel materials discovery and design. The growth in this area, however, has largely outpaced the current cultural and institutional capacity to adequately facilitate reproducible and collaborative science.
We have identified several key areas (not an exhaustive list) within materials science for which reproducibility and/or data sharing is challenging: atomistic simulations, CALPHAD assessments, and density functional theory (DFT) based studies. We focus on efforts within the DFT arena, but there is substantial methodological overlap with the other listed areas.
Historically, the printed article has served as the medium de rigueur for the dissemination of scientific information. This works well when the context and results of an experiment or theory fit on a few pages; however, it is insufficient as a publication medium for many computational studies. For example, if one fits an effective Hamiltonian to the DFT formation energies of 100 to 500 different atomic configurations in an alloy, then reproduction from a paper-and-ink publication is prohibitively time-consuming (if possible at all), whereas reproduction or reuse from a repository is orders of magnitude faster and therefore quite practical.
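To make the example concrete, the following is a minimal sketch of such a fit, assuming a toy one-dimensional binary alloy on a ring and a Hamiltonian with only three terms (constant, point, and nearest-neighbor pair). The function names, the three-term basis, and the coefficient values are illustrative assumptions, and the "energies" are synthetic stand-ins rather than DFT results. The point is that the fit itself is a few lines once the configurations and energies are available in machine-readable form.

```python
import numpy as np

def correlations(config):
    """Return [constant, point, nearest-neighbor pair] correlations
    for a ring of +/-1 site occupations (hypothetical toy basis)."""
    config = np.asarray(config, dtype=float)
    point = config.mean()
    pair = (config * np.roll(config, 1)).mean()
    return np.array([1.0, point, pair])

def fit_effective_hamiltonian(configs, energies):
    """Least-squares fit of interaction coefficients to formation energies."""
    X = np.array([correlations(c) for c in configs])
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(energies, dtype=float), rcond=None)
    return coeffs

# Synthetic data only: 200 random 8-site configurations and energies
# generated from assumed coefficients (NOT real DFT formation energies).
rng = np.random.default_rng(0)
configs = rng.choice([-1, 1], size=(200, 8))
true_eci = np.array([-0.10, 0.02, 0.05])  # assumed values, arbitrary units
energies = np.array([correlations(c) @ true_eci for c in configs])

fitted_eci = fit_effective_hamiltonian(configs, energies)
```

Because the synthetic energies contain no noise, the fit recovers the assumed coefficients. With real data, the input arrays would come from the deposited calculations rather than from a random generator.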
A cultural bias toward the traditional paper-and-ink publication and a lack of "best practices" for computational scientists contribute to the challenge of reproducible science. A cultural shift toward reproducible computational science will be a multifaceted process that includes:
Our initial focus has been on the development of tools and repositories for data publication, citation, and curation. A curated file repository (http://nist.matdl.org/) using the DSpace framework serves as a functional prototype for our efforts. Users upload data with accompanying metadata. The Handle System provides unique persistent identifiers for the uploaded data, which allows attribution to be given unambiguously to the data entry itself. Support for standard metadata-sharing protocols and harvesters allows the system to share content with external sites and federated instances. While the system is still in a developmental stage, all data contributed to it will be maintained going forward, and thus we encourage wide use.
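As an illustration of data with accompanying metadata and a persistent identifier, the sketch below builds a small machine-readable record. The field names loosely follow Dublin Core element names, the handle "12345/example" is a placeholder, and the function name, title, and creators are hypothetical; none of this is taken from the repository's actual schema. The only external fact assumed is that handles can be resolved through the public proxy at hdl.handle.net.

```python
import json

HANDLE_PROXY = "https://hdl.handle.net/"  # public Handle System proxy

def build_record(handle, title, creators, date, description):
    """Assemble a metadata record (hypothetical layout) for one data entry."""
    return {
        "identifier": HANDLE_PROXY + handle,  # citable, persistent link
        "title": title,
        "creator": list(creators),
        "date": date,
        "description": description,
    }

# Placeholder values for illustration only.
record = build_record(
    handle="12345/example",
    title="Example DFT formation energies",
    creators=["A. Researcher", "B. Researcher"],
    date="2014-01-01",
    description="Synthetic example entry; not a real deposit.",
)

# Serialize to JSON so the record can be exchanged or harvested as text.
serialized = json.dumps(record, indent=2, sort_keys=True)
```

Keeping the identifier inside the record means a citation can point to the data entry itself rather than to an article that merely describes it.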
Current goals include further development of the repository system with a focus on ease of use for end users, machine-readable data, collaboration with stakeholders, and continued attention to the four points listed above.