We are developing benchmark datasets for machine learning (ML) models trained on electron microscopy data, using NIST’s NexusLIMS database, with the goal of reducing the black-box nature of these models. The project focuses on defining parameters for curating materials science image datasets and on developing a benchmark dataset that can standardize the evaluation of materials science ML models. This is a critical step toward the interpretability of these models, connecting their behavior to our physical understanding of materials.
(Left) NexusLIMS automatically collects and stores data. (Middle) Data preprocessing: labeling data, connecting with researchers, and creating models. (Right) Training models to identify the best curation method, be it by scale, image type, or material type.
Machine learning (ML) models in materials science need a benchmark dataset to improve our trust in and understanding of ML tools. Benchmarks let developers compare models side by side and focus on model architecture rather than on the data used for training. Because some material designs are proprietary, sharing of industrially relevant materials data is not widespread, so collecting large, unbiased datasets that cover multiple materials, data types, and scales is a challenge for ML methodology researchers.
This project specifically focuses on developing a benchmark for convolutional neural networks (CNNs) using the NexusLIMS database. NexusLIMS contains over 80,000 records from 13 electron microscopes, collected by NIST users across the Materials Measurement Laboratory. It spans a mix of materials (e.g., metals, ceramics, and polymers) and data modalities (e.g., images, spectra, and diffraction), which allows for testing of dataset curation strategies. The first step is to make the data in NexusLIMS ML-actionable, which requires labeling and preprocessing. Next, we will identify researchers at NIST whose projects would benefit from a CNN and partner with them on model development. Then, using a variety of curated datasets, we will train established, well-performing CNN architectures (e.g., VGG19, ResNet, and ConvNeXt) and compare their performance. The primary question is whether materials science datasets can be as heterogeneous as existing benchmarks like ImageNet and CIFAR-10, or whether specific tasks require datasets restricted to the same material, scale, and/or modality.
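The curation strategies above can be sketched in miniature. The snippet below is a minimal illustration, not the project's actual pipeline: the record schema (`material`, `modality`, `scale_nm` fields) is a hypothetical simplification of the much richer metadata a NexusLIMS record actually carries, and the `curate` helper is an assumed name. It shows how the same pool of records could be partitioned into homogeneous subsets (by material or by modality) versus pooled into one heterogeneous, ImageNet-style dataset, which is exactly the comparison the benchmark is meant to test.

```python
from collections import defaultdict

# Hypothetical, simplified records; real NexusLIMS entries carry far richer
# metadata (instrument, acquisition parameters, session context, etc.).
records = [
    {"id": 1, "material": "metal",   "modality": "image",       "scale_nm": 10},
    {"id": 2, "material": "ceramic", "modality": "image",       "scale_nm": 100},
    {"id": 3, "material": "metal",   "modality": "spectrum",    "scale_nm": 10},
    {"id": 4, "material": "polymer", "modality": "diffraction", "scale_nm": 50},
    {"id": 5, "material": "metal",   "modality": "image",       "scale_nm": 100},
]

def curate(records, key):
    """Group records into candidate training subsets by one metadata field."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record[key]].append(record["id"])
    return dict(subsets)

# Homogeneous curation: one candidate dataset per material type
by_material = curate(records, "material")
# Homogeneous curation: one candidate dataset per data modality
by_modality = curate(records, "modality")
# Heterogeneous curation (ImageNet-style): simply pool every record
heterogeneous = [record["id"] for record in records]
```

Each subset would then serve as a training set for the same CNN architecture, so that differences in test performance can be attributed to the curation choice rather than to the model.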
The short-term outcomes of this project are to:
The long-term outcome is a better understanding of materials science ML models and a connection between their results and our physical understanding. The black-box nature of ML is one of the main factors preventing its further adoption. We plan to work with other researchers at NIST on projects well suited to acceleration with ML tools, collaborating with them to build and assess the performance of models for concrete scientific tasks. This will allow us to more confidently expand the use of ML models in industry for quality checks, property prediction, manufacturing process monitoring and control, and many other areas.