The application of machine learning to the materials domain has traditionally struggled with two major challenges: a lack of large, curated data sets and the need to understand the physics behind the machine-learning prediction. The former problem is particularly acute in the polymers domain. Here we aim to tackle both challenges simultaneously by incorporating scientific knowledge, thereby providing improved predictions for smaller data sets, under both interpolation and extrapolation, as well as a degree of explainability. We focus on imperfect theories, as they are often readily available and easier to interpret. Using a system of a polymer in solvents of differing quality, we explore numerous methods for incorporating theory into machine learning with different machine-learning models, including Gaussian process regression. Ultimately, we find that encoding the functional form of the theory performs best, followed by encoding the numeric values of the theory.
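As a rough illustration of what incorporating an imperfect theory into Gaussian process regression can look like, the sketch below uses the theory as the prior mean of the Gaussian process, so that predictions fall back to the theory far from the training data. This is only one possible approach under stated assumptions and is not necessarily any of the methods compared in the paper. The functions `theory` and `truth`, the kernel settings, and the toy data are all hypothetical and chosen purely for demonstration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def theory(x):
    # Hypothetical imperfect theory: right overall trend, missing detail.
    return 2.0 * x


def truth(x):
    # Hypothetical ground truth, used only to generate toy data.
    return 2.0 * x + 0.5 * np.sin(3.0 * x)


def zero_mean(x):
    # Baseline prior mean that carries no theoretical knowledge.
    return np.zeros_like(x)


def gp_posterior_mean(x_train, y_train, x_test, mean_fn, noise=1e-2):
    """Posterior mean of a GP whose prior mean is mean_fn."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    # The GP models only the residual between the data and the prior mean.
    alpha = np.linalg.solve(K, y_train - mean_fn(x_train))
    return mean_fn(x_test) + K_star @ alpha


# Small training set, with a test point well outside it (extrapolation).
x_train = np.linspace(0.0, 2.0, 6)
y_train = truth(x_train)
x_test = np.array([8.0])

pred_theory = gp_posterior_mean(x_train, y_train, x_test, theory)
pred_zero = gp_posterior_mean(x_train, y_train, x_test, zero_mean)

err_theory = abs(pred_theory[0] - truth(x_test)[0])
err_zero = abs(pred_zero[0] - truth(x_test)[0])
print(f"error with theory as prior mean: {err_theory:.3f}")
print(f"error with zero prior mean:      {err_zero:.3f}")
```

In this toy setting the zero-mean model reverts to zero away from the data, while the theory-informed model reverts to the theory, which is why the latter extrapolates better here. How much this helps in practice depends on how good the theory is in the region of interest.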
, McDannald, A.
and DeCost, B.
Leveraging Theory for Enhanced Machine Learning, ACS Macro Letters, [online], https://doi.org/10.1021/acsmacrolett.2c00369, https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=934090
(Accessed September 28, 2023)