

Much of the information that could have immediate practical value in understanding, managing, and measuring socio-technical systems exists as domain-specific text embedded in technical documents and information systems. The goal of this project is to discover, characterize, and apply text analysis methodologies that extract relevant information from domain-specific text.


A large portion of this project involves a collaboration with the Engineering Laboratory’s (EL) Model-based Enterprise program, specifically with its Knowledge Extraction and Application for Manufacturing Operations project. This collaboration has focused on analyzing maintenance work order data and on adapting Natural Language Processing (NLP) techniques to technical text.

We contribute to the NIST effort to provide tools for searching the COVID-19 Open Research Dataset (CORD-19) by reconstructing the dataset’s contiguous text from its JSON format. We use an Allen AI language model to extract keywords for this effort.
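Reconstructing contiguous text from CORD-19 JSON amounts to walking each paper's paragraph lists in order and stitching them back together. The sketch below is an illustration only, not the project's actual code. It assumes the field names of the published CORD-19 JSON schema (`metadata.title`, and `abstract` and `body_text` as lists of paragraph objects with `text` and `section` keys); the function names `reconstruct_text` and `load_and_reconstruct` are hypothetical, and the keyword-extraction step with the Allen AI model is not shown.

```python
import json


def reconstruct_text(paper: dict) -> str:
    """Join a CORD-19-style paper's title, abstract, and body into one string.

    Assumes the CORD-19 JSON layout: paragraphs are dicts with "text" and
    "section" keys, stored in the "abstract" and "body_text" lists.
    """
    parts = []
    title = paper.get("metadata", {}).get("title", "")
    if title:
        parts.append(title)

    current_section = None
    # Abstract paragraphs come first, then body paragraphs, in file order.
    for para in paper.get("abstract", []) + paper.get("body_text", []):
        section = para.get("section", "")
        # Emit a section heading only when the section changes, so that
        # consecutive paragraphs of one section stay contiguous.
        if section and section != current_section:
            parts.append(section)
            current_section = section
        parts.append(para.get("text", ""))

    # Drop empty strings and separate blocks with blank lines.
    return "\n\n".join(p for p in parts if p)


def load_and_reconstruct(path: str) -> str:
    """Read one CORD-19 JSON file from disk and return its contiguous text."""
    with open(path, encoding="utf-8") as f:
        return reconstruct_text(json.load(f))
```

For example, `load_and_reconstruct("document_parses/pdf_json/<paper_id>.json")` would return a single string suitable as input to a downstream keyword extractor; the path shown is illustrative.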

We have been developing a methodology for analyzing large collections of privacy notices and have an emerging collaboration with the Federal Judicial Center to classify court dockets. Both of these efforts are focused on developing domain-specific text analyses for low-resource environments.

We have contributed to the IARPA TrojAI project by developing a semi-automated methodology for literature reviews, and we are also investigating a methodology for developing synthetic data for low-resource domains. This effort interfaces with the Configurable Data Curation System (CDCS) project, which will provide its infrastructure.

We are beginning an effort with the University of Maryland/Applied Research Laboratory for Intelligence and Security (ARLIS) to analyze supply-chain related documents to identify potential risks. We will begin by evaluating several text analysis methodologies. We also plan to investigate architectures for supply chain observatories.

We also plan to collaborate with the Material Measurement Laboratory to analyze microscopy journal articles to assist with the development of scientific taxonomies.


Created May 27, 2021, Updated May 25, 2023