Strategy for extensible, evolving terminology for MGI efforts

Summary

Current terminology used to describe materials data is heterogeneous, redundant, and often ambiguous. The lack of common, community-based terminology hinders the discovery and integration of materials data for improved design of advanced materials. Intuitive, flexible, and evolving terminology plays a significant role in capitalizing on recommended knowledge representation models for materials engineering applications. We are developing a rules-based approach with initial examples from a growing corpus of materials terms in the NIST Materials Repository (https://materialsdata.nist.gov). Our method aims to establish a common, consistent, and evolving set of rules for creating or extending terminology as needed to describe materials data. The rules are intended to be simple and generalizable so that users can understand and extend them. They are also intended for other groups to apply to the repositories they are building, and to guide machines during automated processing and execution of the terms.

Description

Many Indo-European languages use a limited set of highly reused, non-synonymous, short, semantically relevant words called roots that can be combined to build new compound terms such as peanut butter and watch dog. This approach, which is more prominent in certain languages such as Sanskrit, Latin, and German, permits the creation of terms on demand as well as the replacement of a root in an existing term by one or more other roots to create a new, related term. As illustrated in the figure below, the proposed MGI terminology uses these root and term concepts to generate terms on demand in order to establish a common and evolving vocabulary that is based on use cases and related to developing ontologies.

Root and Rules
Credit: Bhat, T N

 

Selected rules with examples:

  1. Choose frequently used short words as roots, such as crystal.

  2. Keep roots in singular form, such as property instead of properties.

  3. Avoid special characters (such as ', :, _, -, =) in a root; for example, use Xray instead of X-Ray.

  4. Avoid superfluous words, including stop words such as 'of' and 'with', in a term; for example, use vaporization heat instead of heat of vaporization.

  5. Concatenate roots with a hyphen (-) to form a term, such as Crystal-structure.

  6. Create reasonably discriminating terms. If needed, add roots to a term to increase its discriminating power; for example, use Spectroscopy-XPS instead of XPS.
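
To make rules 2 through 5 concrete, here is a minimal Python sketch of how a term might be assembled from candidate words. It is an illustration under stated assumptions, not the software used for the NIST Materials Repository: the function names, the stop-word set, and the singular-form lookup table are all hypothetical, and the sketch does not reorder words the way the rule 4 example does.

```python
import re

# Hypothetical helper data for illustration; neither collection comes from the source.
STOP_WORDS = {"of", "with", "the", "a", "an", "for", "in"}
SINGULAR_FORMS = {"properties": "property", "crystals": "crystal"}


def normalize_root(word):
    """Apply rule 3 (no special characters) and rule 2 (singular form)."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", word)  # e.g. "X-ray" -> "Xray"
    # Look up a singular form; fall back to the cleaned word unchanged.
    return SINGULAR_FORMS.get(cleaned.lower(), cleaned)


def build_term(words):
    """Apply rule 4 (drop stop words) and rule 5 (join roots with hyphens)."""
    roots = [normalize_root(w) for w in words if w.lower() not in STOP_WORDS]
    return "-".join(r for r in roots if r)


print(build_term(["Crystal", "structure"]))   # Crystal-structure
print(build_term(["Spectroscopy", "XPS"]))    # Spectroscopy-XPS
print(normalize_root("X-ray"))                # Xray
```

Rules 1 and 6 (choosing short, frequently used roots and making terms discriminating enough) depend on the corpus and on human judgment, so the sketch leaves them to the caller.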

                                   

Examples of rule-based terms:

General example: Watch dog

  • A dog with a particular purpose to watch as opposed to a show dog or a hunting dog.

Compounding two roots, watch and dog, creates a new term whose meaning is related to the qualified root, dog, and the qualifying root, watch.

Materials science example: Crystal-structure-FCC-Be-diffusion

  • FCC class of structure of crystal (crystal structure) has a diffusion of Be (Be diffusion) at a rate of. . .

Compounding five roots (Crystal, structure, FCC, Be, and diffusion) creates a new term whose meaning is related to the qualified roots and the qualifying root, crystal structure.
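
Because roots are joined by hyphens and contain no hyphens themselves, a term can be split back into its roots mechanically, and one root can be swapped for one or more others to form a related term. The following Python sketch illustrates this under assumptions: the function names are hypothetical, and the replacement root BCC is used only as an illustrative value that does not appear in the source.

```python
def split_term(term):
    """Return the roots of a hyphen-joined term, in order."""
    return term.split("-")


def replace_root(term, old_root, new_roots):
    """Swap one root for one or more other roots and rejoin with hyphens."""
    result = []
    for root in split_term(term):
        if root == old_root:
            result.extend(new_roots)  # one root may be replaced by several
        else:
            result.append(root)
    return "-".join(result)


term = "Crystal-structure-FCC-Be-diffusion"
print(split_term(term))
# ['Crystal', 'structure', 'FCC', 'Be', 'diffusion']

# "BCC" is an illustrative replacement root, not taken from the source.
print(replace_root(term, "FCC", ["BCC"]))
# Crystal-structure-BCC-Be-diffusion
```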

Major Accomplishments

Strategy for Extensible, Evolving Terminology for the Materials Genome Initiative Efforts.

Building an infrastructure to create terminology is not new to humans. Over thousands of years, humans have developed many languages and the terminology those languages need to evolve. What is new is devising a way to adopt and adapt some of these elegant linguistic concepts to build the terminology infrastructure for the Materials Genome Initiative (MGI), a multi-agency initiative designed to accelerate the discovery, development, and deployment of advanced materials. Consistency among the words (terminology) that researchers use to describe and share data on advanced materials is an essential component of this national infrastructure. Current terminology used by the MGI community is ad hoc and heterogeneous. To help accelerate the discovery and integration of materials data for improved design of advanced materials, we have constructed a “root” and rule-based approach that will help the community build reusable, extensible, and automation-friendly terminology to describe MGI data in an intuitive way.

After reviewing many possibilities, we chose to adopt some of the rule and root-based concepts used by a few Indo-European languages, such as Sanskrit, Latin, and German, to build MGI terminology using English words. Unlike spoken or written linguistic terminology, MGI terminology is expected to be not only human-friendly but also machine-friendly. For this reason we had to give special consideration to the requirements of text-mining techniques such as Natural Language Processing, data graphs, and databases. The proposed MGI terminology effort takes advantage of our past experience in developing ChemBLAST (terminology for chemical structures) and Cell image terminology (cell image data).

In particular, we focused on producing terms that describe material properties. Database developers could then use the terminology to build user-friendly web interfaces to archive and distribute MGI data. The community may use these databases to get succinct answers to product-related questions, which may lead to a decrease in the time needed to develop new products.
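
One way a database developer might exploit hyphen-joined terms for search is an inverted index from roots to the terms that contain them. The Python sketch below is a hypothetical illustration of that idea, not a description of the NIST search implementation; the function names and the case-insensitive matching are assumptions.

```python
from collections import defaultdict


def build_root_index(terms):
    """Map each lowercased root to the set of terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for root in term.split("-"):
            index[root.lower()].add(term)
    return index


def search(index, *roots):
    """Return, in sorted order, the terms that contain every requested root."""
    matches = [index.get(root.lower(), set()) for root in roots]
    return sorted(set.intersection(*matches)) if matches else []


terms = [
    "Crystal-structure",
    "Crystal-structure-FCC-Be-diffusion",
    "Spectroscopy-XPS",
]
index = build_root_index(terms)
print(search(index, "crystal", "diffusion"))
# ['Crystal-structure-FCC-Be-diffusion']
```

Adding a root to the query narrows the result set, which mirrors rule 6: extra roots increase a term's discriminating power.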

CURRENT AND PLANNED OUTPUTS: 

This technique has been implemented to enhance and extend MGI/DSpace search capabilities.

New features include a user-friendly interface that produces succinct answers to:

1. Searches on data from DSpace (right panel of the DSpace website, https://materialsdata.nist.gov/dspace/xmlui/handle/11115/75).

2. Searches and downloads of NIST publications related to MGI projects, using metadata related to those found in DSpace.

Work is underway to use this software-generated, semantic, evolving, and extensible terminology to improve the search experience for publications distributed by:

1. International Union of Crystallography

2. American Physical Society

Within NIST resources, efforts are underway to use this method to improve the search experience of the MML/ODI-managed data.gov and Biosystems and Biomaterials Division (644) web pages, with the future possibility of covering the entire MML website.

Reference:

T. N. Bhat, L. Bartolo, U. Kattner, C. Campbell, and J. Elliott, “Strategy for Extensible, Evolving Terminology for the Materials Genome Initiative Efforts,” JOM, online July 2015.

DOI 10.1007/s11837-015-1487-4

Created September 7, 2018, Updated October 16, 2018