Aggressively optimizing validation statistics can degrade interpretability of data-driven materials models

Jason Hattrick-Simpers; Brian DeCost; Howie Joress; Nils Persson; Katherine Lei

Published

August 3, 2021

Author(s)

Jason Hattrick-Simpers, Brian DeCost, Howie Joress, Nils Persson, Katherine Lei

Abstract

One of the key factors in enabling trust in artificial intelligence within the materials science community is the interpretability (or explainability) of the underlying models used. By understanding what features were used to generate predictions, scientists are able to critically evaluate the credibility of the predictions and gain new insights. Here, we demonstrate that ignoring hyperparameters viewed as less impactful to the overall model performance can degrade model explainability. Specifically, we demonstrate that random forest models trained using unconstrained maximum depths, in accordance with accepted best practices, can often report a randomly generated feature as being one of the most important features in generating predictions for classifying an alloy as being a high entropy alloy. We demonstrate that this is the case for impurity, permutation, and Shapley importance rankings, and the latter two showed no strong structure in terms of optimal hyperparameters. Furthermore, we demonstrate that, for the case of impurity importance rankings, optimizing only the validation accuracy, as is also considered standard in the random forest community, yields models that prefer the random feature in generating their predictions. We show that by adopting a Pareto optimization strategy for model performance that balances validation statistics with the differences between the training and validation statistics, one obtains models that reject random features and thus balance model predictive power and explainability.
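
The Pareto selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the two objectives (maximize validation accuracy, minimize the gap between training and validation accuracy) are taken from the abstract, while the Candidate class, the helper names, and every score below are hypothetical values chosen only for illustration.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """One hyperparameter setting and its scores (all values hypothetical)."""
    max_depth: int
    train_acc: float
    val_acc: float

    @property
    def gap(self) -> float:
        # Difference between training and validation accuracy.
        return self.train_acc - self.val_acc


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.val_acc >= b.val_acc and a.gap <= b.gap
    strictly_better = a.val_acc > b.val_acc or a.gap < b.gap
    return no_worse and strictly_better


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical scores for random forests of increasing maximum depth.
candidates = [
    Candidate(max_depth=2, train_acc=0.80, val_acc=0.78),
    Candidate(max_depth=4, train_acc=0.88, val_acc=0.85),
    Candidate(max_depth=8, train_acc=0.97, val_acc=0.87),
    Candidate(max_depth=16, train_acc=1.00, val_acc=0.87),
    Candidate(max_depth=3, train_acc=0.86, val_acc=0.77),
]

front = pareto_front(candidates)
for c in front:
    print(f"max_depth={c.max_depth} val_acc={c.val_acc:.2f} gap={c.gap:.2f}")
```

In this toy data the deepest model ties for the best validation accuracy but has a larger train-validation gap than the depth-8 model, so it is dominated and drops off the front. Selecting on validation accuracy alone would not distinguish the two, which is the kind of trade-off the abstract argues should be made explicit.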

Citation

The Journal of Chemical Physics

Volume

155

Pub Type

Journals

Download Paper

https://doi.org/10.1063/5.0050885

Keywords

high entropy alloys, machine learning, explainability, artificial intelligence, materials genome initiative

Modeling and computational material science, Metals and Materials

Citation

Hattrick-Simpers, J., DeCost, B., Joress, H., Persson, N. and Lei, K. (2021), Aggressively optimizing validation statistics can degrade interpretability of data-driven materials models, The Journal of Chemical Physics, [online], https://doi.org/10.1063/5.0050885, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=931942 (Accessed July 26, 2025)

Issues

If you have any questions about this publication or are having problems accessing it, please contact [email protected].

Created August 3, 2021, Updated November 29, 2022
