PPoPP 2021
Sat 27 February - Wed 3 March 2021

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, several tricky issues remain: improving computing efficiency while ensuring convergence, and reducing memory usage without incurring additional computing costs. We propose \emph{DAPPLE}, a synchronous training framework that combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy \emph{planner} that solves the partition and placement problems and explores optimal hybrid strategies of data and pipeline parallelism. We also propose a new runtime scheduling algorithm to reduce device memory usage, which is orthogonal to the re-computation approach and does not come at the expense of training throughput. Experiments show that the \emph{DAPPLE planner} consistently outperforms strategies generated by PipeDream's planner by up to $3.23\times$ under synchronous training scenarios, and the \emph{DAPPLE runtime} outperforms GPipe by $1.6\times$ in training throughput while saving 12% of memory consumption.
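The memory saving described above hinges on when each micro-batch's backward pass is scheduled relative to the forward passes across pipeline stages. The following minimal sketch (not the DAPPLE implementation; the stage counts and both helper functions are hypothetical, for illustration only) contrasts a GPipe-style schedule, which holds every micro-batch's activations until the backward phase starts, with an early-backward schedule, which frees a micro-batch's activations as soon as its backward pass runs:

```python
# Hypothetical comparison of peak activation memory under two pipeline
# schedules. This is an illustrative model, not DAPPLE's actual scheduler.

def gpipe_peak_activations(num_micro_batches: int) -> int:
    # GPipe-style: all forwards run before any backward, so every stage
    # must keep activations for all micro-batches simultaneously.
    return num_micro_batches

def early_backward_peak_activations(num_stages: int, stage_index: int) -> int:
    # Early-backward: stage i starts backwards as soon as they become
    # available, so it only holds activations for the micro-batches still
    # in flight ahead of it, roughly (num_stages - stage_index), which is
    # independent of the total micro-batch count.
    return num_stages - stage_index

if __name__ == "__main__":
    stages, micro_batches = 4, 16
    print("GPipe-style peak (any stage):", gpipe_peak_activations(micro_batches))
    for s in range(stages):
        print(f"early-backward peak at stage {s}:",
              early_backward_peak_activations(stages, s))
```

Under this toy model, per-stage memory stays flat as the micro-batch count grows instead of scaling with it, which is consistent with the abstract's claim of reducing memory without sacrificing training throughput.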

Wed 3 Mar

Displayed time zone: Eastern Time (US & Canada)

12:30 - 13:30
Session 10: Machine Learning and Software Engineering (Main Conference)
Chair(s): Albert Cohen Google
12:30
15m
Talk
TurboTransformers: An Efficient GPU Serving System For Transformer Models
Main Conference
Jiarui Fang Tencent, Yang Yu, Chengduo Zhao Tencent, Jie Zhou Tencent
12:45
15m
Talk
Extracting Clean Performance Models from Tainted Programs
Main Conference
Marcin Copik ETH Zurich, Alexandru Calotoiu ETH Zurich, Tobias Grosser University of Edinburgh, Nicolas Wicki ETH Zurich, Felix Wolf TU Darmstadt, Torsten Hoefler ETH Zurich
13:00
15m
Talk
Modernizing Parallel Code with Pattern Analysis
Main Conference
Roberto Castañeda Lozano University of Edinburgh, Murray Cole University of Edinburgh, Björn Franke University of Edinburgh
13:15
15m
Talk
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Main Conference
Shiqing Fan Alibaba Group, Yi Rong Alibaba Group, Chen Meng Alibaba Group, ZongYan Cao Alibaba Group, Siyu Wang Alibaba Group, Zhen Zheng Alibaba Group, Chuan Wu The University of Hong Kong, Guoping Long Alibaba Group, Jun Yang Alibaba Group, LiXue Xia Alibaba Group, Lansong Diao Alibaba Group, Xiaoyong Liu Alibaba Group, Wei Lin Alibaba Group