A novel root and rule-based natural language processing (NLP) approach to information indexing and searching

Description

A natural language processing approach to information indexing and searching

Whether at the level of the individual, team, project or program, research and engineering work must keep abreast of huge amounts of published information that might contain discoveries, or the elements needed for discoveries, that are crucial to its success. In collaboration with the Software and Systems Division (ITL), we have been developing next-generation technologies for the automated creation of terminological and semantic resources from published information. Individual users can use these resources to create their own terminology, which in turn provides common ground for communication among groups of all sizes, from small teams to large-scale multidisciplinary and multinational research and design projects and programs.

The technologies have been designed to include a set of default modules that can easily be modified or replaced by ones better adapted to user needs or domain requirements. The modules cover domain-dependent preprocessing rules (for example, filtering out non-linguistic content); language models that provide information about dependency structure, vector semantics, and syntactic annotations such as part of speech; and terminology generation rules that can be added to or replaced as users gain a better understanding of the information being tracked.

The resources produced can easily be plugged into information-seeking tools, where they provide the basis for thesauri that support expanded searches; structured indices (including keywords, snippets and phrases) for browsing collections of online text and databases; and ontologies that help organize vast amounts of information. These resources could also help detect important changes as the information is updated, namely content that no longer fits adequately with the current terminological indices of the ontology. Such changes may represent new knowledge paradigms in the information module.

Although our technology shares some elements with general-purpose search engines such as Google, it is fundamentally different in its emphasis on adaptability to different knowledge domains, its capacity to evolve, and the ease with which the terminological and semantic resources can be plugged into individual research and engineering systems.
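
The idea of default modules that can be swapped out, feeding a structured index and a thesaurus-expanded search, can be illustrated with a minimal Python sketch. This is not the project's implementation: the class and function names (TermIndexer, strip_non_linguistic, stopword_delimited_phrases) are hypothetical, the stopword-delimited phrase rule is only a simple stand-in for real terminology generation rules, and the thesaurus is a hand-written dictionary assumed for the example.

```python
import re
from typing import Callable, Dict, List, Optional, Set

# Tiny illustrative stopword list (an assumption, not a project resource).
STOPWORDS = {"a", "an", "the", "of", "in", "and", "for", "to", "is", "are", "on", "with"}


def strip_non_linguistic(text: str) -> str:
    """Default preprocessing rule: drop URLs and bare numbers."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\b\d+(?:\.\d+)?\b", " ", text)


def stopword_delimited_phrases(tokens: List[str]) -> List[str]:
    """Default term rule: each maximal run of non-stopword tokens is one candidate term."""
    terms: List[str] = []
    run: List[str] = []
    for tok in tokens:
        if tok in STOPWORDS:
            if run:
                terms.append(" ".join(run))
            run = []
        else:
            run.append(tok)
    if run:
        terms.append(" ".join(run))
    return terms


class TermIndexer:
    """Hypothetical pipeline whose preprocessing and term rules are replaceable."""

    def __init__(
        self,
        preprocess: Callable[[str], str] = strip_non_linguistic,
        term_rule: Callable[[List[str]], List[str]] = stopword_delimited_phrases,
    ) -> None:
        self.preprocess = preprocess
        self.term_rule = term_rule
        self.index: Dict[str, Set[str]] = {}  # term -> ids of documents containing it

    def add_document(self, doc_id: str, text: str) -> None:
        cleaned = self.preprocess(text).lower()
        # Split on punctuation so candidate terms never span clause boundaries.
        for segment in re.split(r"[.,;:!?()]", cleaned):
            tokens = re.findall(r"[a-z][a-z-]*", segment)
            for term in self.term_rule(tokens):
                self.index.setdefault(term, set()).add(doc_id)

    def search(self, term: str, thesaurus: Optional[Dict[str, List[str]]] = None) -> Set[str]:
        """Look up a term, optionally expanded with related terms from a thesaurus."""
        related = (thesaurus or {}).get(term, [])
        hits: Set[str] = set()
        for t in [term, *related]:
            hits |= self.index.get(t, set())
        return hits


docs = {
    "d1": "Thermodynamics of enzyme catalysis, see https://example.org/x for 42 details.",
    "d2": "Protein structure and enzyme kinetics.",
}

indexer = TermIndexer()
for doc_id, text in docs.items():
    indexer.add_document(doc_id, text)

plain = indexer.search("enzyme catalysis")
expanded = indexer.search("enzyme catalysis", {"enzyme catalysis": ["enzyme kinetics"]})

# Swapping in a different term rule: index single words instead of phrases.
word_indexer = TermIndexer(term_rule=lambda tokens: tokens)
for doc_id, text in docs.items():
    word_indexer.add_document(doc_id, text)
single_word = word_indexer.search("enzyme")
```

In this sketch, replacing a module means passing a different function to the constructor, so a domain-specific preprocessing rule or a more elaborate term rule (for example, one driven by part-of-speech tags) could be substituted without touching the indexing or search code.
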

Given below are examples of some of our ongoing projects supporting information seeking in our databases and knowledge bases.

Material Genome Archive https://randr.nist.gov/mgi/Default.aspx

A subset of cancer related articles from PubMed https://randr.nist.gov/mgi/Default.aspx?dataSource=james

NIST publications https://randr.nist.gov/mgi/Default.aspx?dataSource=nike

Structural database https://randr.nist.gov/chemblast/default.aspx

Enzyme thermodynamics database https://randr.nist.gov/enzyme/Default.aspx

Search MML Website https://randr.nist.gov/mgi/Default.aspx?dataSource=mml

Our previous work on data management systems (Protein Data Bank) has been cited over 22,000 times.


 
Created May 5, 2017, Updated February 5, 2018