Information Systems Group

Welcome to the Software and Systems Division's Information Systems Group.

The group’s work falls into three main categories: (1) Infrastructure or Platforms for Science-Oriented Analytics, (2) Advanced Computational Techniques and Algorithms, and (3) Foundational Capabilities. Together, these contribute to developing computationally enabled measurements in which trust in computing and the handling of high-throughput instruments are built in by design.

Infrastructure or Platforms for Science-Oriented Analytics

  • CDCS (Customizable Data Curation System) focuses on registries, curating document-oriented data, and providing persistent IDs.  It provides web interfaces for retrieving and querying data and for text-oriented search.  It is being augmented with advanced tools, inspired by Natural Language Processing (NLP), to provide semantic search.
     
  • WIPP (Web Image Processing Pipeline) focuses on image analytics over terabyte-sized image collections running on distributed computational hardware (clusters and clouds).  It provides web interfaces for managing and viewing images or subsets of images, and for the traceable and reproducible processing of images via workflows of software containers from a WIPP registry.

Advanced Computational Techniques and Algorithms

  • Image Analytics: the group is developing methods that combine conventional feature engineering with techniques rooted in Artificial Intelligence/Deep Learning (AI/DL) to analyze a variety of image types, including optical microscopy, electron microscopy, cryogenic electron microscopy (Cryo-EM), and neutron images.  Several of these image types go beyond 3D by adding a time dimension (T) or multiple channels (C) and can be very large (approaching 1 TB).
     
  • Text Analytics: the group is applying Natural Language Processing (NLP) techniques and language models (e.g., BERT) to analyze curated scientific publications and answer more sophisticated queries than traditional Information Retrieval (IR) systems can.  The group is also helping to define a subdomain of NLP, Technical Language Processing (TLP), which aims to tackle text-related problems in technical domains with limited data availability (e.g., maintenance logs).
     
  • Algorithmic Acceleration: the group is continuing its development of specialized algorithms with reduced operation counts, in areas ranging from Monte Carlo sampling for Molecular Dynamics to mixed or reduced precision computations and stochastic algorithms.
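
As a small illustration of the mixed or reduced precision idea mentioned above, the sketch below sums values that are stored in single precision, once with a single-precision accumulator and once with a double-precision accumulator. It is not one of the group's algorithms: the function names (to_float32, sum_float32, sum_mixed) and the test data are assumptions made for this example, and float32 arithmetic is only simulated with the standard struct module.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_float32(values):
    """Reduced precision: inputs and the running total are both float32."""
    total = 0.0
    for v in values:
        total = to_float32(total + to_float32(v))
    return total

def sum_mixed(values):
    """Mixed precision: float32 inputs, float64 running total."""
    total = 0.0
    for v in values:
        total += to_float32(v)
    return total

# Hypothetical data: 100,000 copies of 0.1, stored as float32.
values = [0.1] * 100_000

# Reference: correctly rounded sum of the float32-rounded inputs.
reference = math.fsum(to_float32(v) for v in values)

err_low = abs(sum_float32(values) - reference)
err_mixed = abs(sum_mixed(values) - reference)

print("float32 accumulator error:", err_low)
print("float64 accumulator error:", err_mixed)
```

Keeping only the accumulator in higher precision removes most of the rounding error while the bulk of the data stays in the smaller format, which is the usual motivation for mixed-precision schemes.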

Foundational Capabilities

  • Artificial Intelligence/Deep Learning for Imaging and NLP: most group members are now proficient in the use of DL tools, and the group uses DL approaches as a foundational building block to solve problems in multiple domains, including imaging across multiple modalities, text, specialized signal processing, and computer security (Trojan detection).  Furthermore, the group is collaborating with groups in several NIST OUs (EL, MML, NCNR) to apply AI/DL techniques to OU-specific problems, has made code available to NIST researchers for automating training on AI-oriented hardware resources at NIST, and has given AI-related presentations to NIST researchers.

    The group is administering a public competition to detect Trojans (hidden classes) in AI Deep Learning models (Neural Networks) on behalf of IARPA.
     
  • Scalability & Performance: the group continues to extend its work on Program Execution Models for obtaining performance in a range of applications.  This work identified Data Flow Graphs as a promising execution model that makes it easy to take advantage of accelerators (e.g., GPUs).  The group released Hedgehog, a library and runtime system for implementing Multi-threaded Asynchronous Data Flow Graphs on high-end single compute nodes, along with FastLoader, a companion library for the multithreaded asynchronous reading of large objects from files (e.g., very large images), to simplify the development of performance-oriented applications.  The group has used this execution model to develop such applications (e.g., analysis of very large microscopy images of 100K x 50K pixels).

    At a conceptual level, the group is cooperating with a University of Utah research team, led by Prof. Martin Berzins, to extend this programming model beyond a single compute node so that it applies to a cluster.  The group is also exploring Ray Tracing as a programming model to accelerate and simplify particle transport simulations, which are of interest to multiple NIST OUs.
     
  • Trustworthy Computing: the group is developing or extending approaches to enhance trust in computing in three areas: (a) numerical reproducibility, by associating a numerical uncertainty with a computed result; (b) explainable AI in OMICS problems, by combining simulations of neural networks, interactive visualizations for sequencing data, and designs of multiple metrics of AI models using perturbations; (c) reproducible image analysis, by organizing imaging computations as reproducible workflows using containers and tracking data & result provenance.
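
To make the Data Flow Graph execution model described under Scalability & Performance more concrete, here is a minimal sketch of a multi-threaded asynchronous pipeline. It does not use Hedgehog or FastLoader and does not reflect their APIs. It relies only on Python's standard queue and threading modules, and the names run_node, run_pipeline, load_tile, and mean_intensity are hypothetical stand-ins chosen for this example.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the data stream

def run_node(func, inbox, outbox):
    """Apply func to each item arriving on inbox; forward results to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next node
            return
        outbox.put(func(item))

def run_pipeline(stages, inputs):
    """Run a linear data flow graph: one thread per node, queues as edges."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_node, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for x in inputs:
        queues[0].put(x)
    queues[0].put(_STOP)

    # Drain the last edge of the graph until the sentinel arrives.
    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results

def load_tile(index):
    """Hypothetical loader: returns a fake 4-pixel tile for the given index."""
    return [index] * 4

def mean_intensity(tile):
    """Hypothetical analysis step: average pixel value of a tile."""
    return sum(tile) / len(tile)

result = run_pipeline([load_tile, mean_intensity], range(5))
print(result)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

Each node runs as soon as data arrives on its input queue, so loading the next tile overlaps with analysis of the previous one. With one thread per node and FIFO queues, results come out in input order. A production runtime would add features this sketch omits, such as multiple threads per node, non-linear graphs, and accelerator support.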

Projects and Programs

Actionable Intelligence

Ongoing
Much information that could have immediate practical value in the understanding, management, and measurement of socio-technical systems exists as domain

AI for Low-Field MRI

Ongoing
Emerging low-field Magnetic Resonance Imaging (MRI) systems offer the promise of low-cost point-of-care imaging that could be conducted in, for example, rural

News and Updates

A Map App to Track Stem Cells

Researchers who work with stem cells have ambitious goals. Some want to cure cancer or treat heart disease. Others want to grow the tissues and organs that

Publications

Ballot Definition Common Data Format Specification

Author(s)
Benjamin Long, John Dziurłaj
This publication describes a ballot definition common data format for the interchange of logical and physical ballot style information. It contains a UML

Micro Common Data Format Specification

Author(s)
Benjamin Long, John Dziurłaj
This specification describes a data format for space-constrained environments, such as the placement of machine readable data on paper. The specification is

Characterization of AI Model Configurations For Model Reuse

Author(s)
Peter Bajcsy, Daniel Gao, Michael Paul Majurski, Thomas Cleveland, Manuel Carrasco, Michael Buschmann, Walid Keyrouz
With the widespread creation of artificial intelligence (AI) models in biosciences, bio-medical researchers are reusing trained AI models from other
