PPoPP 2021
Sat 27 February - Wed 3 March 2021
Wed 3 Mar 2021 10:00 - 10:15 - Session 8. Scientific Computing & Optimizations Chair(s): Tim Harris

Nvidia Tensor Cores achieve high performance with half-precision matrix inputs tailored towards deep learning workloads. However, this limits the application of Tensor Cores, especially in scientific computing, which often has high-precision requirements. In this paper, we build Emulated GEMM on Tensor Cores (EGEMM-TC) to extend the usage of Tensor Cores to accelerate scientific computing applications without compromising their precision requirements. First, EGEMM-TC employs an extendable workflow of hardware profiling and operation design to generate a lightweight, extended-precision emulation algorithm on Tensor Cores. Second, EGEMM-TC exploits a set of Tensor Core kernel optimizations to achieve high performance, including highly efficient tensorization that exploits the Tensor Core memory architecture and instruction-level optimizations that coordinate the emulation computation and memory access. Third, EGEMM-TC incorporates a hardware-aware analytic model to offer great flexibility for automatic performance tuning across various scientific computing workloads and input datasets. Extensive evaluations show that EGEMM-TC achieves, on average, 3.13X and 11.18X speedup over the cuBLAS kernels and the CUDA-SDK kernels on CUDA Cores, respectively. Our case study on several scientific computing applications further confirms that EGEMM-TC can generalize the usage of Tensor Cores and achieve about 1.8X speedup compared to hand-tuned, highly optimized implementations running on CUDA Cores.
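
The abstract does not spell out the emulation itself, but the general idea behind this family of techniques is to split each single-precision value into a high half-precision part plus a low half-precision correction, and to recover most of the lost accuracy from the cross terms of the resulting matrix products. The sketch below is a minimal NumPy illustration of that splitting scheme, not the EGEMM-TC kernels, tensorization, or tuning model; the names `split_fp32`, `tc_matmul`, and `emulated_gemm` are hypothetical, and fp16 products are accumulated in fp32 to roughly mimic Tensor Core multiply-accumulate behavior.

```python
# Conceptual sketch of extended-precision GEMM emulation via fp16 splitting.
# Illustrates the general splitting idea only; all function names are hypothetical
# and this does not reproduce the paper's actual Tensor Core kernels.
import numpy as np

def split_fp32(x):
    """Split an fp32 matrix into a high fp16 part and a low fp16 correction."""
    hi = x.astype(np.float16)
    lo = (x - hi.astype(np.float32)).astype(np.float16)
    return hi, lo

def tc_matmul(a16, b16):
    """Model a Tensor-Core-style MMA: fp16 inputs, products accumulated in fp32."""
    return a16.astype(np.float32) @ b16.astype(np.float32)

def emulated_gemm(a, b):
    """Approximate an fp32 GEMM with three fp16 Tensor-Core-style products."""
    a_hi, a_lo = split_fp32(a)
    b_hi, b_lo = split_fp32(b)
    # The lo*lo term is dropped; its contribution is near fp32 rounding level here.
    return tc_matmul(a_hi, b_hi) + tc_matmul(a_hi, b_lo) + tc_matmul(a_lo, b_hi)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((256, 256)).astype(np.float32)
    b = rng.standard_normal((256, 256)).astype(np.float32)

    ref = a.astype(np.float64) @ b.astype(np.float64)              # high-precision reference
    naive = tc_matmul(a.astype(np.float16), b.astype(np.float16))  # plain fp16 inputs
    emu = emulated_gemm(a, b)                                      # split-and-recompose

    print("naive fp16 rel. error:", np.abs(naive - ref).max() / np.abs(ref).max())
    print("emulated   rel. error:", np.abs(emu - ref).max() / np.abs(ref).max())
```

Running the sketch shows the emulated product tracking the fp64 reference far more closely than the naive fp16 product, at the cost of three Tensor-Core-style multiplications instead of one; the paper's kernel optimizations and analytic tuning model address making that trade-off fast in practice.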

Wed 3 Mar
Times are displayed in time zone: Eastern Time (US & Canada)

10:00 - 11:00
Session 8. Scientific Computing & Optimizations
Main Conference
Chair(s): Tim Harris (Microsoft, UK)
10:00
15m
Talk
EGEMM-TC: Accelerating Scientific Computing on Tensor Cores with Extended Precision
Main Conference
Boyuan Feng (UC Santa Barbara), Yuke Wang (UC Santa Barbara), Guoyang Chen (Alibaba Group US Inc.), Weifeng Zhang (Alibaba Group US Inc.), Yuan Xie (UCSB), Yufei Ding (UC Santa Barbara)
Link to publication
10:15
15m
Talk
Efficiently Running SpMV on Long Vector Architectures
Main Conference
Constantino Gómez (Barcelona Supercomputing Center), Filippo Mantovani (Barcelona Supercomputing Center), Erich Focht (NEC), Marc Casas (Barcelona Supercomputing Center)
Link to publication
10:30
15m
Talk
Improving Communication by Optimizing On-Node Data Movement with Data Layout
Main Conference
Tuowen Zhao (University of Utah), Mary Hall (University of Utah), Hans Johansen (Lawrence Berkeley National Laboratory), Samuel Williams (Lawrence Berkeley National Laboratory)
Link to publication
10:45
15m
Talk
Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory
Main Conference
Jiawen Liu (University of California, Merced), Jie Ren (University of California, Merced), Roberto Gioiosa (Pacific Northwest National Laboratory), Dong Li (University of California, Merced), Jiajia Li (Pacific Northwest National Laboratory)
Link to publication