, , , Shuvra S. Bhattacharyya, Milton Halem
Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to exploit parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) increases programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computation with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. To demonstrate the HTGS API, we implement and present two algorithms. The first is matrix multiplication, which achieves speedups of 1.3× and 1.8× over the OpenBLAS library for 16k × 16k and 32k × 32k matrices, respectively. The second is a hybrid implementation of microscopy image stitching that reduces code size by approximately 43% and performs favorably compared with a similar hybrid workflow implementation of the same algorithm that does not use HTGS.
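The task-graph idea summarized above can be illustrated with a minimal sketch. It does not use or reproduce the HTGS API; it only assumes that tasks can be modeled as Python threads and that graph edges can be modeled as `queue.Queue` objects. Loosely following the matrix multiplication use case, a producer task emits pairs of matrix blocks, a pool of worker tasks multiplies them, and a single accumulator task sums the partial products. The names `blocked_matmul`, `get_block`, and `mult_block` are hypothetical.

```python
import queue
import threading

# Hypothetical sketch, NOT the HTGS API: threads stand in for tasks and
# queues stand in for the edges of a task graph.

SENTINEL = None  # marks the end of a stream on a queue


def get_block(m, bi, bj, bs):
    """Return the bs x bs sub-block of matrix m at block coordinates (bi, bj)."""
    return [row[bj * bs:(bj + 1) * bs] for row in m[bi * bs:(bi + 1) * bs]]


def mult_block(a, b):
    """Plain triple-loop product of two square blocks."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def blocked_matmul(A, B, bs, num_workers=4):
    n = len(A)
    if n % bs:
        raise ValueError("matrix size must be a multiple of the block size")
    nb = n // bs
    work = queue.Queue(maxsize=8)  # bounded edge: limits blocks held in memory
    results = queue.Queue()        # edge from workers to the accumulator
    C = [[0] * n for _ in range(n)]

    def producer():
        # In a real workflow this task could read blocks from disk, so that
        # I/O overlaps with the workers' computation.
        for i in range(nb):
            for j in range(nb):
                for k in range(nb):
                    work.put((i, j, get_block(A, i, k, bs), get_block(B, k, j, bs)))
        for _ in range(num_workers):
            work.put(SENTINEL)  # one end-of-stream marker per worker

    def worker():
        while True:
            item = work.get()
            if item is SENTINEL:
                results.put(SENTINEL)  # tell the accumulator this worker is done
                return
            i, j, a, b = item
            results.put((i, j, mult_block(a, b)))

    def accumulator():
        # A single accumulator owns C, so no locking is needed.
        finished = 0
        while finished < num_workers:
            item = results.get()
            if item is SENTINEL:
                finished += 1
                continue
            i, j, p = item
            for r in range(bs):
                for c in range(bs):
                    C[i * bs + r][j * bs + c] += p[r][c]

    threads = [threading.Thread(target=producer), threading.Thread(target=accumulator)]
    threads += [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return C
```

For example, `blocked_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], 1)` returns `[[19, 22], [43, 50]]`. The bounded `work` queue is one simple way to cap the number of blocks in flight, a stand-in for the memory management concern above. Python threads can overlap I/O with computation, but the global interpreter lock limits CPU parallelism, so this sketch shows the structure of the graph rather than its performance.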
Journal of Signal Processing Systems
heterogeneous architectures, task graphs, hybrid workflows, dataflow, image processing, matrix multiplication