EGEMM-TC: Accelerating Scientific Computing on Tensor Cores with Extended Precision
Nvidia Tensor Cores achieve high performance with half-precision matrix inputs tailored to deep learning workloads. However, this limits their applicability, especially in scientific computing with its high precision requirements. In this paper, we build Emulated GEMM on Tensor Cores (EGEMM-TC) to extend the usage of Tensor Cores to accelerate scientific computing applications without compromising their precision requirements. First, EGEMM-TC employs an extendable workflow of hardware profiling and operation design to generate a lightweight extended-precision emulation algorithm on Tensor Cores. Second, EGEMM-TC exploits a set of Tensor Core kernel optimizations to achieve high performance, including highly efficient tensorization that exploits the Tensor Core memory architecture and instruction-level optimizations that coordinate the emulation computation and memory access. Third, EGEMM-TC incorporates a hardware-aware analytic model to offer broad flexibility for automatic performance tuning across various scientific computing workloads and input datasets. Extensive evaluations show that EGEMM-TC achieves on average 3.13X and 11.18X speedup over the cuBLAS kernels and the CUDA-SDK kernels on CUDA Cores, respectively. Our case study on several scientific computing applications further confirms that EGEMM-TC generalizes the use of Tensor Cores and achieves about 1.8X speedup over hand-tuned, highly optimized implementations running on CUDA Cores.
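The core idea behind emulating higher precision on half-precision hardware can be illustrated with the well-known split-and-recombine scheme: each fp32 value is split into a high fp16 part and an fp16 residual, and the product is reassembled from the cheap fp16 partial GEMMs with fp32 accumulation (the mode Tensor Cores support). The sketch below is a minimal NumPy illustration of this general emulation idea, not EGEMM-TC's actual algorithm; the function names are hypothetical.

```python
import numpy as np

def split_fp32(x):
    # Split each fp32 value into a high fp16 part and an fp16 residual:
    # x ~= hi + lo, where hi carries the leading bits and lo the remainder.
    hi = x.astype(np.float16)
    lo = (x - hi.astype(np.float32)).astype(np.float16)
    return hi, lo

def emulated_gemm(A, B):
    # Emulate an fp32-accuracy GEMM using only fp16 multiplicands,
    # mimicking Tensor Cores' fp16-input / fp32-accumulate mode.
    # A @ B ~= A_hi@B_hi + A_hi@B_lo + A_lo@B_hi  (the lo*lo cross
    # term is dropped as negligible).
    A_hi, A_lo = split_fp32(A)
    B_hi, B_lo = split_fp32(B)
    f32 = np.float32
    return (A_hi.astype(f32) @ B_hi.astype(f32)
            + A_hi.astype(f32) @ B_lo.astype(f32)
            + A_lo.astype(f32) @ B_hi.astype(f32))
```

Compared with casting the inputs straight to fp16, the three partial products recover most of the mantissa bits lost in the downcast, which is why the emulated result tracks a true fp32 GEMM far more closely.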
Wed 3 Mar (displayed time zone: Eastern Time, US & Canada)
Session: 10:00 - 11:00

10:00 (15m, Talk) EGEMM-TC: Accelerating Scientific Computing on Tensor Cores with Extended Precision (Main Conference)
Boyuan Feng (UC Santa Barbara), Yuke Wang (UC Santa Barbara), Guoyang Chen (Alibaba Group US Inc.), Weifeng Zhang (Alibaba Group US Inc.), Yuan Xie (UC Santa Barbara), Yufei Ding (UC Santa Barbara)

10:15 (15m, Talk) Efficiently Running SpMV on Long Vector Architectures (Main Conference)
Constantino Gómez (Barcelona Supercomputing Center), Filippo Mantovani (Barcelona Supercomputing Center), Erich Focht (NEC), Marc Casas (Barcelona Supercomputing Center)

10:30 (15m, Talk) Improving Communication by Optimizing On-Node Data Movement with Data Layout (Main Conference)
Tuowen Zhao (University of Utah), Mary Hall (University of Utah), Hans Johansen (Lawrence Berkeley National Laboratory), Samuel Williams (Lawrence Berkeley National Laboratory)

10:45 (15m, Talk) Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory (Main Conference)
Jiawen Liu (University of California, Merced), Jie Ren (University of California, Merced), Roberto Gioiosa (Pacific Northwest National Laboratory), Dong Li (University of California, Merced), Jiajia Li (Pacific Northwest National Laboratory)