A novel root and rule-based natural language processing (NLP) approach to information indexing and searching


A natural language processing approach to information indexing and searching


Whether at the level of the individual, team, project or program, research and engineering work must keep abreast of huge amounts of published information that might contain discoveries, or elements needed for discoveries, crucial to its success. In collaboration with the Software and Systems Division (ITL), we have been developing next-generation technologies for the automated creation of terminological and semantic resources from published information. Individual users can use these resources to create their own terminology, which provides a common communication ground for groups of all sizes, from small teams to large-scale multidisciplinary and multinational research and design projects and programs.

The technologies include a set of default modules that can easily be modified or replaced by ones better adapted to user needs or domain requirements. These modules cover domain-dependent preprocessing rules (for example, filtering out non-linguistic content); language models that provide information about dependency structure, vector semantics, and syntactic annotations such as part of speech; and terminology generation rules that can be extended or replaced as users gain increased understanding of the information being tracked.

The resources produced can easily be plugged into information seeking tools. They provide the basis for thesauri that support expanding searches, structured indices (including key words, snippets and phrases) for browsing collections of online text and databases, and ontologies helpful for organizing vast amounts of information. These resources could also help detect important changes as the information is updated: content that no longer fits adequately with the current terminological indices or ontology may represent new knowledge paradigms in the information.
Although our technology shares some common elements with general-purpose search engines such as Google, it is fundamentally different in its emphasis on adaptability to different knowledge domains, its capability to evolve, and the ease with which the terminological and semantic resources can be plugged into individual research and engineering systems.
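
As a rough illustration of the modular design described above, the following Python sketch wires replaceable preprocessing rules and terminology generation rules into a small term index, then uses a thesaurus to expand a search. It is not the project's actual implementation: every name here (`strip_non_linguistic`, `unigram_rule`, `bigram_rule`, `build_index`, `search`), the stopword list, the sample documents, and the thesaurus entry are hypothetical, and simple stopword filtering stands in for the language-model annotations (dependency structure, part of speech) mentioned in the text.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical module types: each stage is a plain function, so it can be swapped out.
Preprocessor = Callable[[str], str]
TermRule = Callable[[List[str]], Iterable[str]]

# Assumed, deliberately tiny stopword list (a real system would use a language model).
STOPWORDS = {"a", "an", "and", "are", "at", "for", "in", "is", "of", "on", "the", "to", "with"}

def strip_non_linguistic(text: str) -> str:
    """Default preprocessing rule: drop URLs and bare numbers."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\b\d+(\.\d+)?\b", " ", text)

def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphabetic (optionally hyphenated) tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def unigram_rule(tokens: List[str]) -> Iterable[str]:
    """Terminology rule: every non-stopword token is a candidate term."""
    return (t for t in tokens if t not in STOPWORDS)

def bigram_rule(tokens: List[str]) -> Iterable[str]:
    """Terminology rule: adjacent pairs of non-stopword tokens are candidate phrases."""
    for a, b in zip(tokens, tokens[1:]):
        if a not in STOPWORDS and b not in STOPWORDS:
            yield f"{a} {b}"

def build_index(docs: Dict[str, str],
                preprocessors: List[Preprocessor],
                rules: List[TermRule]) -> Dict[str, Set[str]]:
    """Map each generated term to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for preprocess in preprocessors:
            text = preprocess(text)
        tokens = tokenize(text)
        for rule in rules:
            for term in rule(tokens):
                index[term].add(doc_id)
    return dict(index)

def search(index: Dict[str, Set[str]], query: str,
           thesaurus: Optional[Dict[str, List[str]]] = None) -> Set[str]:
    """Look up a term, expanding the query with related thesaurus terms if given."""
    terms = {query} | set((thesaurus or {}).get(query, ()))
    hits: Set[str] = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits

# Made-up sample documents and thesaurus entry, for illustration only.
docs = {
    "d1": "Enzyme thermodynamics data are archived at https://example.org/data",
    "d2": "The protein structure database stores 3 crystal forms",
}
index = build_index(docs, [strip_non_linguistic], [unigram_rule, bigram_rule])
thesaurus = {"enzyme": ["protein"]}
print(sorted(search(index, "enzyme", thesaurus)))  # ['d1', 'd2']
```

Because each stage is passed in as a list of functions, a user could replace `strip_non_linguistic` with a domain-specific filter or add a new terminology rule without touching `build_index`, which is the kind of adaptability the text emphasizes.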

Below are examples of our ongoing projects that support information seeking in our databases and knowledge bases.

Material Genome Archive

A subset of cancer related articles from PubMed

NIST publications

Structural database

Enzyme thermodynamics database

Search MML Website

All publications from Nature Communications, as a multidisciplinary example

Our previous work on data management systems (Protein Data Bank) has been cited over 22,000 times.

Created May 5, 2017, Updated June 2, 2021