I/O Lower Bounds for Auto-tuning of Convolutions in CNNs (PPoPP 2021 - Main Conference)

Who

Xiaoyang Zhang, Junmin Xiao, Guangming Tan

Track

PPoPP 2021 Main Conference

Time Zone

The program is currently displayed in (GMT-05:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 2 Mar 2021 11:30 - 11:50, Session 5. Auto Tuning. Chair(s): Saeed Maleki

Abstract

Convolution is the most time-consuming part of the computation in convolutional neural networks (CNNs), which have achieved great success in numerous practical applications. Due to complex data dependencies and the growing number of model samples, convolution suffers from high data movement (i.e., memory access) overhead. This work provides a comprehensive analysis and methodologies to minimize communication for convolution in CNNs. With an in-depth analysis of recent I/O complexity theory under the red-blue game model, we develop a general I/O lower bound theory for a composite algorithm that consists of several different sub-computations. Based on the proposed theory, we establish data movement lower bounds for the two main convolution algorithms in CNNs, namely direct convolution and the Winograd algorithm, which represent the direct and indirect implementations of a convolution, respectively. Next, from these I/O lower bounds, we design near I/O-optimal dataflow strategies for the two algorithms by fully exploiting data reuse. Furthermore, to push the performance of the near I/O-optimal dataflow strategies further, we propose an aggressive auto-tuning design based on I/O lower bounds that searches for an optimal parameter configuration for direct convolution and the Winograd algorithm on GPUs, such as the number of threads and the size of shared memory used in each thread block. Finally, experimental results on direct convolution and the Winograd algorithm show that our dataflow strategies with the auto-tuning approach achieve about $3.32\times$ speedup on average over cuDNN. In addition, compared with TVM, which represents the state of the art in auto-tuning, our auto-tuning method based on I/O lower bounds not only finds the optimal parameter configuration faster, but also delivers higher performance than the optimal solution provided by TVM.
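
For readers unfamiliar with the two algorithms named in the abstract, the following Python sketch contrasts a direct 1D convolution with the textbook Winograd F(2,3) transform on a single tile. It is an illustration only, not the authors' implementation or dataflow strategy; the function names `direct_conv_f23` and `winograd_f23` are hypothetical, and the example ignores GPU concerns such as thread blocks and shared memory.

```python
def direct_conv_f23(d, g):
    """Direct form: 2 outputs from 4 inputs and a 3-tap filter (6 multiplications)."""
    # CNN-style convolution (cross-correlation): y[i] = sum_k d[i + k] * g[k]
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs using 4 elementwise multiplications."""
    # Filter transform; in practice this can be computed once per filter.
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Input transform: only additions and subtractions.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Elementwise products in the transformed domain.
    m = [u[i] * v[i] for i in range(4)]
    # Output transform back to the two convolution results.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


if __name__ == "__main__":
    tile = [1.0, 2.0, 3.0, 4.0]    # assumed sample input tile
    taps = [1.0, 0.0, -1.0]        # assumed sample filter
    print(direct_conv_f23(tile, taps))
    print(winograd_f23(tile, taps))
```

Both functions return the same values for the same tile. The Winograd form trades multiplications for extra additions and transformed intermediate data, which is one reason its data movement behaviour differs from that of direct convolution.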

Link to Publication

https://dl.acm.org/doi/10.1145/3437801.3441609

Xiaoyang Zhang

Institute of Computing Technology, Chinese Academy of Sciences

Junmin Xiao

Institute of Computing Technology, Chinese Academy of Sciences

Guangming Tan

Institute of Computing Technology, Chinese Academy of Sciences

Session Program

Tue 2 Mar
Displayed time zone: Eastern Time (US & Canada)

11:10 - 12:10	Session 5. Auto Tuning (Main Conference). Chair(s): Saeed Maleki (Microsoft Research)

11:10 20m Talk		GPTune: Multitask Learning for Autotuning Exascale Applications (Main Conference). Yang Liu; Wissam M. Sid-Lakhdar (Lawrence Berkeley National Laboratory); Osni Marques (Lawrence Berkeley National Laboratory); Xinran Zhu (Cornell University); Chang Meng (Emory University); James W. Demmel (UC Berkeley); Xiaoye S. Li (Lawrence Berkeley National Laboratory)
11:30 20m Talk		I/O Lower Bounds for Auto-tuning of Convolutions in CNNs (Main Conference). Xiaoyang Zhang (Institute of Computing Technology, Chinese Academy of Sciences); Junmin Xiao (Institute of Computing Technology, Chinese Academy of Sciences); Guangming Tan (Institute of Computing Technology, Chinese Academy of Sciences)
11:50 20m Talk		ApproxTuner: A Compiler and Runtime System for Adaptive Approximations (Main Conference). Hashim Sharif (University of Illinois at Urbana-Champaign); Yifan Zhao (University of Illinois at Urbana-Champaign); Maria Kotsifakou (Runtime Verification, Inc.); Akash Kothari (University of Illinois at Urbana-Champaign); Ben Schreiber (University of Illinois at Urbana-Champaign); Elizabeth Wang (University of Illinois at Urbana-Champaign); Yasmin Sarita (Cornell University); Nathan Zhao (University of Illinois at Urbana-Champaign); Keyur Joshi (University of Illinois at Urbana-Champaign); Vikram S. Adve (University of Illinois at Urbana-Champaign); Sasa Misailovic (University of Illinois at Urbana-Champaign); Sarita Adve (University of Illinois at Urbana-Champaign)