We are developing benchmark datasets for machine learning (ML) models trained on electron microscopy data, using NIST’s NexusLIMS database, with the goal of reducing the black-box nature of these models. The project focuses on defining parameters for curating materials science image datasets and on developing a benchmark dataset that can standardize the evaluation of materials science ML models. This is a critical step toward the interpretability of these models, connecting their behavior to our physical understanding of materials.
(Left) NexusLIMS automatically collects and stores data. (Middle) Data preprocessing: labeling data, connecting with researchers, and creating models. (Right) Training models to identify the best curation method, be it by scale, image type, or material type.
Machine learning (ML) models in materials science need a benchmark dataset to improve our trust in and understanding of ML tools. Benchmarks let developers compare models side by side and focus on model architecture rather than on the data used for training. Because some material designs are proprietary, sharing of industrially relevant materials data is not widespread, so collecting large, unbiased datasets that cover multiple materials, data types, and scales is a challenge for ML methodology researchers.
This project specifically focuses on developing a benchmark for convolutional neural networks (CNNs) using the NexusLIMS database. NexusLIMS contains over 80,000 records from 13 electron microscopes, collected by NIST users across the Materials Measurement Laboratory. It spans a mix of materials (e.g., metals, ceramics, and polymers) and data modalities (e.g., images, spectra, and diffraction), which allows for testing of dataset curation strategies. The first step is to make the data in NexusLIMS ML-actionable, which requires labeling and preprocessing. Next, we will identify researchers at NIST whose projects would benefit from a CNN and partner with them on model development. Then, using a variety of curated datasets, we will train established, well-performing CNN architectures (e.g., VGG19, ResNet, and ConvNeXt) and compare their performance. The primary question is whether materials science datasets can be as heterogeneous as existing benchmarks like ImageNet and CIFAR-10, or whether specific tasks require datasets restricted to the same material, scale, and/or modality.
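The curation strategies above can be sketched in miniature. The snippet below is a minimal illustration, not the project's actual pipeline: the record schema (`material`, `modality`, `scale_nm` fields) is a hypothetical simplification of the much richer metadata a NexusLIMS record actually carries, and the `curate` helper is an assumed name. It shows how the same pool of records could be partitioned into homogeneous subsets (by material or by modality) versus pooled into one heterogeneous, ImageNet-style dataset, which is exactly the comparison the benchmark is meant to test.

```python
from collections import defaultdict

# Hypothetical, simplified records; real NexusLIMS entries carry far richer
# metadata (instrument, acquisition parameters, session context, etc.).
records = [
    {"id": 1, "material": "metal",   "modality": "image",       "scale_nm": 10},
    {"id": 2, "material": "ceramic", "modality": "image",       "scale_nm": 100},
    {"id": 3, "material": "metal",   "modality": "spectrum",    "scale_nm": 10},
    {"id": 4, "material": "polymer", "modality": "diffraction", "scale_nm": 50},
    {"id": 5, "material": "metal",   "modality": "image",       "scale_nm": 100},
]

def curate(records, key):
    """Group records into candidate training subsets by one metadata field."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record[key]].append(record["id"])
    return dict(subsets)

# Homogeneous curation: one candidate dataset per material type
by_material = curate(records, "material")
# Homogeneous curation: one candidate dataset per data modality
by_modality = curate(records, "modality")
# Heterogeneous curation (ImageNet-style): simply pool every record
heterogeneous = [record["id"] for record in records]
```

Each subset would then serve as a training set for the same CNN architecture, so that differences in test performance can be attributed to the curation choice rather than to the model.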
The short-term outcomes of this project are to:
The long-term outcome is a better understanding of materials science ML models and a connection between their results and our physical understanding. The black-box nature of ML is one of the main factors preventing its further adoption. We plan to work with other researchers at NIST on projects well suited to acceleration with ML tools, collaborating with them to build and assess the performance of models for concrete scientific tasks. This will allow us to more confidently expand the use of ML models in industry for quality checks, property prediction, manufacturing process monitoring and control, and many other areas.