Validating A.I. Pipelines for Analysis of Live Cell Image Data

Summary

Imaging of living cells over time provides unparalleled data on the dynamic cellular characteristics that give rise to complex biological functions. Quantitative time lapse microscopy applied to the analysis of individual cells used to require labor-intensive, custom-designed image analysis algorithms to segment and track single cells. Deep learning promises to dramatically change this paradigm. Our work focuses on elucidating the generalizability and limitations of model training as a function of the imaging system, cell type, culture conditions, and other factors. Methods are needed to rapidly obtain large amounts of training data, and to test, compare and validate models. Our program components are: 1) very high-speed time lapse image acquisition, 2) storage, analysis and management of time lapse image data, and 3) testing and validating pipelines for analysis of live cell image data.

Description

We are exploring methods for very high-speed cell image data collection, which enable the repeated sampling of thousands of cells every 2 minutes. Rapid sampling allows us to observe dynamic processes in cells (such as cell division) on a relevant time scale. Such sampling rates also allow us to track unlabeled cells over long periods in culture. This effort involves collecting and handling very large image datasets, and developing and validating analysis pipelines that use deep learning.

VERY HIGH SPEED LIVE CELL IMAGE ACQUISITION

Quantitative time lapse microscopy applied to the analysis of cellular populations requires sufficient spatial resolution to identify individual cells, and sufficient temporal sampling to track those cells as they grow, move and divide. A fundamental challenge in time lapse imaging is the inverse relationship between the temporal sampling rate that can be achieved and the area that can be imaged within the time lapse interval (a.k.a. the spatial bandwidth product of the imaging system).

To simultaneously obtain high spatial coverage and temporal sampling, we are working with technology developed by Inscoper that eliminates software latency to optimize the rate at which an automated acquisition workflow can be executed and to maximize the spatial bandwidth product of a traditional microscope [Figure 1]. We are also developing acquisition and analysis workflows around continuous motion imaging. In this operational mode, the motorized x-y stage moves continuously, and the illumination and camera are event-synchronized for a rapid acquisition sequence. Time lapse image data can be acquired ~50-fold faster than with the traditional “stop-and-stare” operational mode of a multi-field-of-view workflow: an image is acquired every ~12 ms, so a whole 6-well plate, or ~12x108 cells, can be imaged every ~3 minutes. A description of the acquisition method and implementation can be found here.
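The trade-off between area covered and sampling interval can be made concrete with a little arithmetic. The sketch below is illustrative only: it takes the ~12 ms per image, ~3 minute interval and ~50-fold speed-up quoted above at face value, assumes the whole interval is spent acquiring images with no other overhead, and derives the stop-and-stare time per field from the speed-up rather than from a measurement. The function and constant names are hypothetical.

```python
def fields_per_interval(interval_ms: int, ms_per_field: int) -> int:
    """Number of fields of view that fit inside one time-lapse interval."""
    if interval_ms <= 0 or ms_per_field <= 0:
        raise ValueError("durations must be positive")
    return interval_ms // ms_per_field


# Figures quoted in the text (approximate).
INTERVAL_MS = 3 * 60 * 1000      # ~3 minutes between visits to the same field
CONTINUOUS_MS_PER_FIELD = 12     # ~12 ms per image in continuous motion mode
SPEEDUP = 50                     # ~50-fold faster than stop-and-stare

# Assumption: stop-and-stare time per field is inferred from the speed-up.
STOP_AND_STARE_MS_PER_FIELD = CONTINUOUS_MS_PER_FIELD * SPEEDUP

continuous_fields = fields_per_interval(INTERVAL_MS, CONTINUOUS_MS_PER_FIELD)
stop_and_stare_fields = fields_per_interval(INTERVAL_MS, STOP_AND_STARE_MS_PER_FIELD)

print(continuous_fields)      # 15000
print(stop_and_stare_fields)  # 300
```

Under these assumptions, for a fixed 3 minute interval, continuous motion imaging covers about 15,000 fields where stop-and-stare would cover about 300. Equivalently, the same area could be revisited about 50 times more often.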
 

Figure 1. Fluorescence microscope equipped for fast execution of automated acquisition workflows, continuous motion imaging and spinning disk confocal.

Another method to achieve high spatial bandwidth product imaging is Fourier ptychographic microscopy (FPM). We have capabilities and interest in developing workflows for dynamic cellular analysis based on this cutting-edge technology. With high spatial bandwidth product imaging capabilities, bioscience researchers can explore a large number of applications, including the impact of drugs on cell survival, mitosis or the motion of single cells. The dynamic interactions of gene regulatory network components could also potentially be examined with very high-speed time lapse image acquisition (Plant, Halter, 2020).

STORAGE, ANALYSIS AND MANAGEMENT OF TIME LAPSE IMAGE DATA

Very high-speed acquisition of time lapse image data produces experimental datasets on the order of 1-10 TB per week. Deep learning models (i.e. CNNs), in addition to classical image analysis, provide another tool for the quantitative analysis of cellular attributes from microscopic image data, and for better quantification and prediction of complex biological processes. These models require high quality, unbiased training data, as well as computational hardware and software for model ‘learning’, so that they can then be used for cell image analysis. Unlike classical image analysis, deep learning models do not require rules-based algorithm development. We are developing a laboratory with state-of-the-art data tools to advance the application of deep learning for cellular analysis:

  1. TRAINING DATA. We use fluorescently labeled cells, including gene edited cell lines from the Allen Institute for Cell Science, imaged at high signal to noise and preprocessed with classical methods to provide training data for deep learning models. Importantly, the high-speed data acquisition instrumentation described above can provide very large amounts of data for model development and evaluation.
  2. DATA PLUMBING. The data collection instrumentation is directly connected via 10 GbE to the primary data storage (i.e. large capacity NAS drives) to reduce data fragmentation and duplication. The primary data storage repository is connected via 40 GbE/100 GbE to an optical fiber network on the NIST campus so that data can be transferred and processed on high performance CPU and GPU clusters in computational facilities such as the Center for Theoretical and Computational Materials Science (CTCMS) at NIST.
  3. DATA PROCESSING. Image data are processed for model training or inferencing either locally on multi-GPU workstations (i.e. Lambda workstations) or on CPU and GPU clusters at computational facilities on the NIST campus.
  4. SOFTWARE. Web Image Processing Pipeline (WIPP) is an open-source, web-based algorithmic plugin platform for trusted image-based measurements from terabyte-sized images, developed at the National Institute of Standards and Technology (NIST) in collaboration with the National Institutes of Health (NIH) - National Center for Advancing Translational Sciences (NCATS). We work with WIPP developers to visualize, analyze, maintain, and share our complete quantitative imaging workflows.
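To give a feel for why link speed matters at the 1-10 TB scale, the following sketch estimates best-case transfer times over the 10, 40 and 100 GbE links mentioned in the list above. It is an illustration, not a measurement: it assumes decimal terabytes, a link running at its nominal line rate, and an optional `efficiency` factor (a hypothetical parameter) to account for protocol and storage overhead.

```python
def transfer_time_hours(dataset_tb: float, link_gbit_per_s: float,
                        efficiency: float = 1.0) -> float:
    """Best-case hours needed to move dataset_tb terabytes over a link.

    efficiency is the assumed fraction of the nominal line rate actually
    achieved (1.0 = ideal, which real transfers will not reach).
    """
    if dataset_tb < 0 or link_gbit_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid dataset size, link speed or efficiency")
    bits = dataset_tb * 1e12 * 8                           # decimal TB -> bits
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 3600


for size_tb in (1, 10):
    for link in (10, 40, 100):
        hours = transfer_time_hours(size_tb, link)
        print(f"{size_tb:>2} TB over {link:>3} GbE: {hours:.2f} h")
```

At the ideal line rate, a 10 TB dataset takes a little over two hours on 10 GbE and roughly a tenth of that on 100 GbE. Actual throughput depends on the storage and network stack.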

TESTING AND VALIDATING PIPELINES FOR ANALYSIS OF LIVE CELL IMAGE DATA

To fully realize the potential of deep learning for large-scale bioimage analysis, we are collaborating with computational scientists to develop strategies to build and deploy trusted A.I. analysis pipelines. Part of this work involves building and testing high-speed imaging systems (see above) for rapidly generating data at an appropriate scale for training new models, then testing models over a range of data qualities (e.g. cell density). Another aspect is designing experimental data that have low ambiguity and can serve as ‘ground truth’ for validating a quantitative time lapse imaging workflow.

Cell Populations With a Fraction of Labelled Cells
The cell population is partially labeled: a fraction of the cells is fluorescent. The labeled cells can be tracked with low ambiguity and can serve as ‘ground truth’ for a label-free, deep learning-based analysis.
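One way such partially labeled data could be used for scoring is sketched below. This is a minimal illustration, not the pipeline used in this program: masks are represented as sets of pixel coordinates, overlap is measured with intersection-over-union (IoU), and the 0.5 threshold is an arbitrary assumption. Because only a fraction of the cells is fluorescent, a predicted cell with no fluorescent counterpart is not counted as an error, and only recall over the labeled cells is reported. All names and the toy masks are hypothetical.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]


def rect(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Toy cell mask: pixels in rows r0..r1-1 and columns c0..c1-1."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def score_against_labeled_cells(reference: Dict[int, Mask],
                                predicted: Dict[str, Mask],
                                iou_threshold: float = 0.5):
    """Return (recall, best IoU per reference cell).

    reference: masks of the fluorescently labeled cells.
    predicted: masks from the label-free model, for all cells.
    Simplification: one predicted mask may match several reference cells.
    """
    best = {
        ref_id: max((iou(ref_mask, p) for p in predicted.values()), default=0.0)
        for ref_id, ref_mask in reference.items()
    }
    detected = sum(1 for value in best.values() if value >= iou_threshold)
    recall = detected / len(best) if best else 0.0
    return recall, best


reference = {1: rect(0, 0, 4, 4), 2: rect(10, 10, 14, 14)}
predicted = {
    "A": rect(0, 0, 4, 3),       # overlaps reference cell 1 well
    "B": rect(20, 20, 24, 24),   # unlabeled cell: no reference, not penalized
    "C": rect(12, 12, 16, 16),   # poor overlap with reference cell 2
}
recall, best = score_against_labeled_cells(reference, predicted)
print(recall)  # 0.5
```

In this toy case, reference cell 1 is recovered (IoU 0.75) and cell 2 is missed, giving a recall of 0.5. A fuller evaluation would enforce one-to-one matching and extend the comparison to tracks over time.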

The following questions drive our work in advancing these complex measurement systems: When can we have confidence in the quantitative output of a deep learning A.I. model? Under what conditions does the model fail? We are also evaluating the effect of training data on A.I. model performance characteristics such as accuracy, reliability, robustness and bias. These systematic studies involve the acquisition of image datasets under varying conditions for the training and testing of A.I. models.
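A simple way to look for the conditions under which a model fails is to stratify its performance by acquisition condition rather than report a single pooled number. The sketch below assumes each test result has already been reduced to a (condition, correct) pair. The condition names, the toy records and the 0.9 acceptance threshold are invented for illustration and are not results from this program.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def accuracy_by_condition(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Fraction of correct results within each acquisition condition."""
    counts = defaultdict(lambda: [0, 0])  # condition -> [correct, total]
    for condition, correct in records:
        counts[condition][1] += 1
        if correct:
            counts[condition][0] += 1
    return {cond: hits / total for cond, (hits, total) in counts.items()}


def failing_conditions(accuracy: Dict[str, float], minimum: float) -> List[str]:
    """Conditions whose accuracy falls below the chosen minimum."""
    return sorted(cond for cond, acc in accuracy.items() if acc < minimum)


# Invented toy records: (condition, was the model output correct?)
records = [
    ("low density", True), ("low density", True),
    ("low density", True), ("low density", True),
    ("high density", True), ("high density", False),
    ("high density", False), ("high density", True),
]

accuracy = accuracy_by_condition(records)
print(accuracy)
print(failing_conditions(accuracy, minimum=0.9))
```

Pooled over all records, this toy model would score 0.75, which hides the fact that every error occurs in one condition. Stratifying by cell density, imaging system or culture condition surfaces that kind of failure.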

With trusted A.I. systems, fit-for-purpose cellular measurements of greater complexity than ever before are possible.


DISCLAIMER: Certain commercial equipment, instruments, or materials (or suppliers, or software) are identified in this webpage to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.

Created June 9, 2021, Updated March 14, 2023