We are developing machine learning algorithms to accelerate the discovery and optimization of advanced materials. These algorithms form part of a data analysis system that integrates data mining, materials databases, and measurement tools to provide high-throughput analysis of materials data. Of primary interest is the high-throughput analysis of experimental data measured on combinatorial "libraries", both on the fly (in real time) during the measurement and offline, after data collection. The advantage of real-time analysis is that it can guide the measurement while it is in progress, improving data collection.
Over the last few decades, materials discovery and optimization have become significantly more sophisticated through the use of high-throughput (combinatorial) methodologies. As a result, materials researchers can now collect far more data than was previously possible, producing large data sets that can be very time-consuming to analyze. This gap between data collection and analysis times is fueling interest in new machine learning algorithms and related data-mining techniques. These techniques are especially needed when each individual measurement takes a non-trivial, high-dimensional form such as a spectrum or an image.
Combinatorial Library Data:
Our data-mining techniques have been developed to address two applications of interest to the combinatorial materials community: the discovery of phase structure diagrams for binary and ternary composition spreads, and the determination of relationships between different material properties measured on the same combinatorial libraries. A separate description of the inorganic materials combinatorial research project at MML/NIST is also available.
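As a rough illustration of the first application, the sketch below groups the samples of a composition spread by the similarity of their measured patterns, so that each group marks a candidate phase region. It is a minimal sketch under stated assumptions, not the project's algorithm: it assumes each sample yields an X-ray diffraction pattern binned onto a common angle grid, and the greedy cosine-similarity grouping, the 0.9 threshold, the toy data, and the names `cosine_similarity` and `group_by_pattern` are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length intensity vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_pattern(
    patterns: Dict[str, List[float]], threshold: float = 0.9
) -> List[List[str]]:
    """Greedily assign each sample to the first group whose seed pattern is similar enough.

    Assumption (hypothetical): each group approximates one phase region of the spread.
    """
    groups: List[List[str]] = []
    seeds: List[List[float]] = []  # first pattern seen in each group
    for sample_id, pattern in patterns.items():
        for group, seed in zip(groups, seeds):
            if cosine_similarity(pattern, seed) >= threshold:
                group.append(sample_id)
                break
        else:  # no existing group matched: start a new candidate phase region
            groups.append([sample_id])
            seeds.append(pattern)
    return groups


# Hypothetical toy data: 4-bin diffraction intensities along a binary spread A(1-x)B(x).
patterns = {
    "x=0.0": [10.0, 0.0, 1.0, 0.0],
    "x=0.2": [9.0, 1.0, 1.0, 0.0],
    "x=0.8": [0.0, 1.0, 0.0, 10.0],
    "x=1.0": [0.0, 0.0, 1.0, 9.0],
}
print(group_by_pattern(patterns))
# [['x=0.0', 'x=0.2'], ['x=0.8', 'x=1.0']]
```

A greedy pass like this depends on sample order and on the threshold, and it ignores peak shifts and phase mixtures, so it only conveys the idea of turning pattern similarity into regions of a phase diagram.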
Incorporating data from materials databases into the data analysis system can provide further benefits. Critically evaluated database entries can improve data-mining performance or be used to cross-reference results. Two such databases of interest are the FIZ/NIST Inorganic Crystal Structure Database and the NIST Phase Equilibria Diagrams Database.
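Cross-referencing against a database can be sketched in a similar spirit: rank reference patterns by their similarity to a measured pattern and keep only the close matches. This assumes the reference patterns (for example, patterns computed from crystal structure entries) have already been placed on the same angle grid as the measurement; the function name `rank_references`, the 0.8 cutoff, and the phase names are hypothetical, and no real database API is used.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length intensity vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_references(
    measured: Sequence[float],
    references: Dict[str, Sequence[float]],
    min_score: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (reference name, similarity) pairs at or above min_score, best match first."""
    scored = [(name, cosine_similarity(measured, ref)) for name, ref in references.items()]
    matches = [(name, score) for name, score in scored if score >= min_score]
    return sorted(matches, key=lambda item: item[1], reverse=True)


# Hypothetical reference patterns on the same 4-bin grid (toy values, not real database entries).
references = {
    "phase_A": [10.0, 0.0, 1.0, 0.0],
    "phase_B": [0.0, 1.0, 0.0, 10.0],
}
measured = [9.0, 1.0, 1.0, 0.0]
for name, score in rank_references(measured, references):
    print(f"{name}: {score:.3f}")
# phase_A: 0.994
```

Returning a ranked list rather than a single best match leaves room for samples whose patterns resemble more than one reference.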
Integration with Instrumentation and Real-Time Analysis:
Giving the data analysis software real-time access to data as they are collected adds the benefits of live data analysis. For example, when instrument time or cost limits the number of samples that can be characterized, live analysis can guide the experimentalist toward the most informative samples to measure, maximizing knowledge of the overall set of samples with a minimum number of measurements. This is of great interest to experimentalists working with combinatorial libraries, which commonly comprise hundreds or thousands of samples. The integration of data-mining algorithms, materials property databases, and experimental equipment defines one NIST vision of the Materials Genome Initiative (MGI), a major initiative for accelerating the discovery and optimization of novel advanced materials.
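One simple way to choose the next sample is a space-filling rule: measure the unmeasured composition farthest from every composition already measured. This is a minimal stand-in, not the NIST method; a practical strategy would also weigh what the data collected so far suggest, for example by concentrating measurements near suspected phase boundaries. Compositions are given here as integer atomic percentages so the distances are exact, and `next_sample` and the toy library are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A composition as a tuple of integer atomic percentages, e.g. (at.% B,) for a binary spread.
Composition = Tuple[int, ...]


def next_sample(
    candidates: Sequence[Composition], measured: Sequence[Composition]
) -> Composition:
    """Pick the unmeasured candidate whose nearest measured neighbor is farthest away.

    Ties go to the candidate listed first, because max() returns the first maximal item.
    """
    unmeasured = [c for c in candidates if c not in measured]
    if not unmeasured:
        raise ValueError("every candidate has already been measured")
    if not measured:
        return unmeasured[0]
    return max(unmeasured, key=lambda c: min(math.dist(c, m) for m in measured))


# Hypothetical binary library: 11 samples from 0 to 100 at.% B in 10 at.% steps.
candidates: List[Composition] = [(pct,) for pct in range(0, 101, 10)]
measured: List[Composition] = [(0,), (100,)]  # end members measured first
for _ in range(2):
    measured.append(next_sample(candidates, measured))
print(measured)
# [(0,), (100,), (50,), (20,)]
```

In a live setting, each call would run after the previous measurement arrives, so the choice of the next sample always reflects everything measured so far.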