Improving Communication by Optimizing On-Node Data Movement with Data Layout
We present optimizations to improve communication performance by reducing on-node data movement for a class of distributed memory applications. The primary concept is to eliminate the data movement associated with packing and unpacking subsets of the data during communication. With the rapid rise in network injection bandwidth reducing off-node data movement cost, on-node data movement can be significantly more expensive than computation and network communication. This data movement is especially costly for small domains, as in memory-intensive multi-physics codes or when strong scaling to reduce time-to-solution. The optimizations presented include (1) optimizing data layout through indirection to enable pack-free communication; (2) creating contiguous views of memory using memory mapping, thus minimizing the number of messages; and (3) applying these techniques to intra-node data movement, including CPU-GPU data movement. The benefits of these optimizations are demonstrated in stencil benchmarks against a highly optimized baseline, reducing communication time by up to 14.4×.
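To make optimization (1) concrete, the following is a minimal sketch and not the authors' implementation. It assumes a 2D subdomain split into small fixed-size blocks ("bricks"), and every name in it (`BRICK`, `NX`, `NY`, `make_slot_table`, `brick_view`, `east_send_buffer`) is hypothetical. An indirection table maps each logical brick to a storage slot, and the storage order is chosen so that the bricks on the east face sit next to each other. The data destined for the east neighbor is then a single contiguous slice that could be handed to the network without a packing step.

```python
from array import array

BRICK = 4          # doubles per brick (hypothetical, tiny for illustration)
NX, NY = 3, 3      # bricks per dimension in the local subdomain (hypothetical)

def make_slot_table(nx, ny):
    """Map logical brick (i, j) to a storage slot, east column (i == nx-1) first."""
    east = [(nx - 1, j) for j in range(ny)]
    rest = [(i, j) for j in range(ny) for i in range(nx - 1)]
    return {coord: slot for slot, coord in enumerate(east + rest)}

def brick_view(storage, slot_table, i, j):
    """Zero-copy view of one brick, located through the indirection table."""
    start = slot_table[(i, j)] * BRICK
    return memoryview(storage)[start:start + BRICK]

def east_send_buffer(storage, ny):
    """East-face bricks occupy slots 0..ny-1, so the message is one slice."""
    return memoryview(storage)[:ny * BRICK]

slots = make_slot_table(NX, NY)
storage = array("d", [0.0] * (NX * NY * BRICK))

# Fill each brick with a value that encodes its logical coordinates.
for (i, j) in slots:
    view = brick_view(storage, slots, i, j)
    for k in range(BRICK):
        view[k] = 10.0 * i + j

# No gather loop and no temporary buffer: the slice aliases the storage itself.
send = east_send_buffer(storage, NY)
print(send.tolist())
```

In a row-major layout the same east face would be strided across memory and would normally be gathered into a temporary buffer before sending. The sketch covers only one face; faces that share corner bricks cannot all be made contiguous by ordering alone, which is the situation the abstract's memory-mapping step (2) is aimed at and which is not shown here.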
Wed 3 Mar, 10:00 - 11:00 (displayed time zone: Eastern Time, US & Canada)
10:00 | 15m Talk | EGEMM-TC: Accelerating Scientific Computing on Tensor Cores with Extended Precision | Main Conference | Boyuan Feng (UC Santa Barbara), Yuke Wang (UC Santa Barbara), Guoyang Chen (Alibaba Group US Inc.), Weifeng Zhang (Alibaba Group US Inc.), Yuan Xie (UCSB), Yufei Ding (UC Santa Barbara)
10:15 | 15m Talk | Efficiently Running SpMV on Long Vector Architectures | Main Conference | Constantino Gómez (Barcelona Supercomputing Center), Filippo Mantovani (Barcelona Supercomputing Center), Erich Focht (NEC), Marc Casas (Barcelona Supercomputing Center)
10:30 | 15m Talk | Improving Communication by Optimizing On-Node Data Movement with Data Layout | Main Conference | Tuowen Zhao (University of Utah), Mary Hall (University of Utah), Hans Johansen (Lawrence Berkeley National Laboratory), Samuel Williams (Lawrence Berkeley National Laboratory)
10:45 | 15m Talk | Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory | Main Conference | Jiawen Liu (University of California, Merced), Jie Ren (University of California, Merced), Roberto Gioiosa (Pacific Northwest National Laboratory), Dong Li (University of California, Merced), Jiajia Li (Pacific Northwest National Laboratory)