To develop the needed data infrastructure and informatics tools, NIST is focusing on three areas: data curation, data infrastructure, and data access.
- Data curation includes developing a phase-based materials ontology, which will ensure that disparate members of the phase-based data community represent data in a semantically consistent fashion and will facilitate search, transformation, and reasoning over the curated data. Domain-level modeling using UML (Unified Modeling Language) will enable software development that is consistent with the developed ontologies and data schemas. XML schemas will also be developed for different phase-based use cases. A variety of existing XML schemas, including the ThermoML schema, will be incorporated and/or expanded. The curated data will be encoded in a consistent and repeatable fashion with the necessary provenance and stored in the appropriate repositories. Web-based interfaces will be used to capture curated data.
- The database infrastructure will be structured to manage the large amount of heterogeneous data in a flexible way and will be capable of supporting complex data queries. This infrastructure will be distributed, federated, and heterogeneous, and will draw on modern NoSQL databases as well as more traditional relational technologies.
- Data access to these heterogeneous, federated database systems, which must handle complex dynamic queries, will require the development of programmatic access to the data via traditional APIs, web APIs (REST), and data exchange facilities and formats (XML, JSON, BSON). These interfaces and facilities will serve as the foundation for flexibility and scalability, and will enable data analytics and machine learning tools.
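
As a rough illustration of the data exchange formats named above, the following Python sketch serializes one curated record to both JSON and XML and reads the XML back. It is only a sketch under stated assumptions: the record, its field names, its placeholder values, and the helper functions `to_json`, `to_xml`, and `from_xml` are all hypothetical and are not taken from any NIST schema or from ThermoML. BSON is omitted because it requires a third-party library.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical curated record. Field names and values are placeholders
# chosen for illustration; they do not come from any NIST schema.
record = {
    "system": "Al-Cu",
    "phase": "FCC_A1",
    "property": "example_property",
    "value": 1.0,
    "units": "arbitrary",
    "provenance": {"source": "example-dataset", "method": "placeholder"},
}


def to_json(rec: dict) -> str:
    """Serialize a record to a JSON string with stable key order."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec: dict, root_tag: str = "record") -> str:
    """Serialize a (possibly nested) record to XML, one element per key."""

    def build(parent: ET.Element, key: str, value) -> None:
        child = ET.SubElement(parent, key)
        if isinstance(value, dict):
            for k, v in value.items():
                build(child, k, v)
        else:
            child.text = str(value)

    root = ET.Element(root_tag)
    for k, v in rec.items():
        build(root, k, v)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    """Parse XML produced by to_xml back into nested dicts of strings."""

    def parse(elem: ET.Element):
        children = list(elem)
        if children:
            return {c.tag: parse(c) for c in children}
        return elem.text

    return parse(ET.fromstring(text))


if __name__ == "__main__":
    print(to_json(record))
    print(to_xml(record))
```

Note that this naive XML mapping loses type information (numbers come back as strings), which is one reason explicit schemas matter when the same data must round-trip across formats.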