An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models

Jason Hattrick-Simpers; Brian DeCost; Aaron Gilad Kusne; Howard Joress; Winnie Wong-Ng; Debra Kaiser; Andriy Zakutayev; Caleb Phillips; Tonio Buonassisi; Shijing Sun; Janak Thapa

Published

June 9, 2021

Abstract

Modern machine learning and autonomous experimentation schemes in materials science rely on accurate analysis of the data ingested by these models. Unfortunately, accurate analysis of the underlying data can be difficult, even for domain experts, complicating the training of the models intended to drive experiments. This is especially true when the goal is to identify the presence of weak signatures in diffraction or spectroscopic datasets. In this work, we examine a set of as-obtained diffraction data that track the phase transition from monoclinic to tetragonal in a Nb-doped VO2 film as a function of temperature and dopant concentration. We then task a set of domain experts and a set of machine learning experts with identifying which phase is present in each diffraction pattern, manually and algorithmically, respectively; in both cases, the labels can vary dramatically, especially at the phase boundaries. We use the mode of the labels and the Shannon entropy as a method to capture, preserve, and propagate consensus labels and their variance. Further, we use the expert labels as a benchmark and demonstrate the use of Shannon-entropy-weighted scoring to test the performance of machine-learning-generated labels. Finally, we propose a material data challenge centered around generating improved labeling algorithms. This real-world dataset, curated with expert labels, can act as a test bed for new algorithms. The raw data, annotations, and code used in this study are all available online at data.gov, and interested readers are encouraged to replicate and improve the existing models.
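The consensus and scoring ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the label strings "M" (monoclinic) and "T" (tetragonal), and the specific weighting (one minus the normalized entropy, so that patterns on which the experts disagree count for less) are all assumptions chosen for illustration. The abstract does not specify the exact weighting used in the paper.

```python
from collections import Counter
import math


def consensus_label(labels):
    """Return (mode, Shannon entropy in bits) for one pattern's labels."""
    counts = Counter(labels)
    n = len(labels)
    # Mode of the labels; ties resolve to the label seen first.
    mode = counts.most_common(1)[0][0]
    # Shannon entropy of the empirical label distribution.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return mode, entropy


def entropy_weighted_accuracy(predictions, expert_labels, n_classes=2):
    """Score predictions against consensus labels.

    Assumed weighting (hypothetical, for illustration only): each pattern
    gets weight 1 - H / log2(n_classes), so unanimous patterns count fully
    and maximally contested patterns count for nothing.
    """
    max_entropy = math.log2(n_classes)
    total_weight = 0.0
    score = 0.0
    for pred, labels in zip(predictions, expert_labels):
        mode, entropy = consensus_label(labels)
        weight = 1.0 - entropy / max_entropy
        total_weight += weight
        score += weight * (pred == mode)
    return score / total_weight if total_weight > 0 else 0.0


# Hypothetical expert labels for three diffraction patterns.
experts = [
    ["M", "M", "M", "M"],  # unanimous
    ["M", "T", "M", "T"],  # evenly split, e.g. near a phase boundary
    ["T", "T", "T", "M"],  # mostly tetragonal
]
model_predictions = ["M", "T", "T"]
print(entropy_weighted_accuracy(model_predictions, experts))
```

Under this assumed weighting, a disagreement with the consensus on the evenly split pattern does not lower the score, because that pattern carries zero weight; an error on the unanimous pattern is penalized most heavily.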

Citation

Integrating Materials and Manufacturing Innovation

Pub Type

Journals

Download Paper

https://doi.org/10.1007/s40192-021-00213-8

Keywords

reference data, combinatorial materials science, diffraction, machine learning, labeling uncertainty

Composition and structure

Citation

Hattrick-Simpers, J., DeCost, B., Kusne, A., Joress, H., Wong-Ng, W., Kaiser, D., Zakutayev, A., Phillips, C., Buonassisi, T., Sun, S. and Thapa, J. (2021), An Open Combinatorial Diffraction Dataset Including Consensus Human and Machine Learning Labels with Quantified Uncertainty for Training New Machine Learning Models, Integrating Materials and Manufacturing Innovation, [online], https://doi.org/10.1007/s40192-021-00213-8, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=931195 (Accessed October 12, 2025)

Created June 9, 2021, Updated October 14, 2021
