Whether at the level of the individual, team, project, or program, research and engineering work must keep abreast of huge amounts of published information that might contain discoveries, or the elements needed for discoveries, that are crucial to its success. In collaboration with the Software and Systems Division (ITL), we have been developing next-generation technologies for the automated creation of terminological and semantic resources from published information. Individual users can use these resources to build their own terminology, which in turn provides common ground for communication among groups of all sizes, from small teams to large-scale multidisciplinary and multinational research and design projects and programs.
The technologies are designed around a set of default modules that can easily be modified or replaced by ones better adapted to user needs or domain requirements. These modules cover domain-dependent preprocessing rules (for example, filtering out non-linguistic content); language models that provide information about dependency structure, vector semantics, and syntactic annotations such as part of speech; and terminology generation rules that can be added to or replaced as users gain a better understanding of the information being tracked.
The resources produced can easily be plugged into information-seeking tools, where they provide the basis for thesauri that support expanded searches, structured indices (including keywords, snippets, and phrases) for browsing collections of online text and databases, and ontologies that help organize vast amounts of information. These resources could also help detect important changes as the information is updated, namely content that no longer fits adequately with the current terminological indices of the ontology. Such changes may represent new knowledge paradigms in the information module.
Although our technology shares some elements with general-purpose search engines such as Google, it is fundamentally different in its emphasis on adaptability to different knowledge domains, its capacity to evolve, and the ease with which the terminological and semantic resources can be plugged into individual research and engineering systems.
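
As a rough illustration of the replaceable-module idea described above, the following minimal Python sketch shows a pipeline whose preprocessing step and terminology generation rules can be swapped by the user. It is not the project's actual implementation: the class `TerminologyPipeline` and the functions `strip_markup`, `bigram_terms`, and `long_unigrams` are hypothetical names introduced only for this example, and the rules are deliberately simplistic.

```python
import re
from typing import Callable, List, Optional

# Hypothetical module signatures, assumed for this sketch only.
Preprocessor = Callable[[str], str]
TermRule = Callable[[List[str]], List[str]]


def strip_markup(text: str) -> str:
    """Default preprocessing: replace tag-like, non-linguistic content with spaces."""
    return re.sub(r"<[^>]+>", " ", text)


def bigram_terms(tokens: List[str]) -> List[str]:
    """Default rule: propose every pair of adjacent tokens as a candidate term."""
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def long_unigrams(tokens: List[str]) -> List[str]:
    """Alternative rule: propose single tokens of at least ten characters."""
    return [t for t in tokens if len(t) >= 10]


class TerminologyPipeline:
    """Illustrative pipeline whose modules can be replaced at construction time."""

    def __init__(
        self,
        preprocess: Preprocessor = strip_markup,
        term_rules: Optional[List[TermRule]] = None,
    ) -> None:
        self.preprocess = preprocess
        self.term_rules = term_rules if term_rules is not None else [bigram_terms]

    def extract(self, text: str) -> List[str]:
        """Preprocess the text, tokenize it, and apply each rule in order."""
        tokens = re.findall(r"[a-z]+", self.preprocess(text).lower())
        terms: List[str] = []
        for rule in self.term_rules:
            terms.extend(rule(tokens))
        return terms


sample = "<b>Enzyme</b> thermodynamics database"
default_terms = TerminologyPipeline().extract(sample)
custom_terms = TerminologyPipeline(term_rules=[long_unigrams]).extract(sample)
```

With the default modules, `default_terms` holds adjacent-word candidates; passing a different `term_rules` list changes what is extracted without touching the rest of the pipeline, which is the kind of adaptation to user or domain needs the description refers to.
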
Below are examples of our ongoing projects that support information seeking in our databases and knowledge bases.
Material Genome Archive https://randr.nist.gov/mgi/Default.aspx
A subset of cancer-related articles from PubMed https://randr.nist.gov/mgi/Default.aspx?dataSource=james
NIST publications https://randr.nist.gov/mgi/Default.aspx?dataSource=nike
Structural database https://randr.nist.gov/chemblast/default.aspx
Enzyme thermodynamics database https://randr.nist.gov/enzyme/Default.aspx
Search MML Website https://randr.nist.gov/mgi/Default.aspx?dataSource=mml
Our previous work on data management systems (Protein Data Bank) has been cited over 22,000 times.