
A Hybrid CPU-GPU System for Stitching Large Scale Optical Microscopy Images



Timothy J. Blattner, Walid Keyrouz, Joe Chalfoun, Bertrand C. Stivalet, Mary C. Brady, Shujia Zhou


Researchers in various fields use optical microscopy to acquire very large images, 10K to 200K pixels per side. Optical microscopes acquire these images as grids of overlapping partial images (thousands of pixels per side) that are then stitched together in software. Composing such large images is a compute- and data-intensive task even on modern machines, and researchers compound the difficulty by acquiring time-series, volumetric, or multi-channel images, with the resulting data sets now reaching or approaching terabyte sizes. We present a scalable hybrid CPU-GPU implementation of image stitching that processes large image sets at near-interactive rates. Our implementation scales well with both image size and the number of CPU and GPU cores in a machine. It processes a grid of 42 x 59 tiles into a 17K x 22K pixel image in 43 s (end-to-end execution time) when using one NVIDIA Tesla card and two Intel Xeon E5620 quad-core CPUs, and in 29 s when using two Tesla C2070 cards and the same two CPUs. It also composes and renders the composite image, without saving it, in 15 s. In comparison, ImageJ/Fiji takes more than 3.6 h for the same workload despite being multithreaded and executing the same mathematical operators; it composes and saves the large image in 1.5 h. Our implementation takes advantage of coarse-grain parallelism. It organizes the computation into a pipeline architecture that spans CPU and GPU resources and overlaps computation with data motion. The implementation achieves a nearly 10x performance improvement over our optimized non-pipelined GPU implementation and demonstrates near-linear speedup when increasing the CPU thread count and the number of GPUs.
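The pipeline idea described in the abstract, with stages connected by queues so that data motion for one tile overlaps with computation on another, can be sketched in a simplified, CPU-only form. This is an illustrative sketch only, not the paper's implementation: the stage names, stand-in functions, and tile counts below are assumptions for demonstration purposes.

```python
import queue
import threading

SENTINEL = None  # marks end-of-stream through the pipeline


def stage(in_q, out_q, fn):
    # Pull items, apply this stage's function, and push results downstream;
    # forward the sentinel so later stages also shut down cleanly.
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            return
        out_q.put(fn(item))


def run_pipeline(tiles):
    # Two stages run concurrently, so the "read" of tile i+1 (standing in
    # for disk I/O or a host-to-device copy) overlaps with the
    # "correlate" of tile i (standing in for FFT-based computation).
    q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
    read = lambda t: ("read", t)           # hypothetical data-motion stage
    correlate = lambda t: ("corr", t[1])   # hypothetical compute stage
    threads = [
        threading.Thread(target=stage, args=(q1, q2, read)),
        threading.Thread(target=stage, args=(q2, q3, correlate)),
    ]
    for th in threads:
        th.start()
    for t in tiles:
        q1.put(t)
    q1.put(SENTINEL)
    results = []
    while True:
        item = q3.get()
        if item is SENTINEL:
            break
        results.append(item)
    for th in threads:
        th.join()
    return results


print(run_pipeline(range(4)))  # [('corr', 0), ('corr', 1), ('corr', 2), ('corr', 3)]
```

Because each queue is FIFO and each stage is a single thread, tile order is preserved end to end while the stages themselves execute concurrently; the paper's system applies the same idea across CPU threads and GPU streams.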
Proceedings Title: Proceedings of the 2014 International Conference on Parallel Processing (ICPP-2014)
Conference Dates: September 9-12, 2014
Conference Location: Minneapolis, MN


Keywords: Hybrid systems, Parallel Architectures, Heterogeneous (hybrid) systems, Scheduling and task partitioning
Created September 12, 2014, Updated August 30, 2018