PPoPP 2021
Sat 27 February - Wed 3 March 2021

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However, several tricky issues remain: improving computing efficiency while ensuring convergence, and reducing memory usage without incurring additional computing costs. We propose \emph{DAPPLE}, a synchronous training framework that combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy \emph{planner} that solves the partition and placement problems and explores optimal hybrid strategies of data and pipeline parallelism. We also propose a new runtime scheduling algorithm to reduce device memory usage, which is orthogonal to the re-computation approach and does not come at the expense of training throughput. Experiments show that the \emph{DAPPLE planner} consistently outperforms strategies generated by PipeDream's planner by up to $3.23\times$ under synchronous training scenarios, and the \emph{DAPPLE runtime} outperforms GPipe by $1.6\times$ in training throughput while saving 12% of memory consumption.
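The memory saving described above hinges on when each micro-batch's backward pass is scheduled relative to the forward passes across pipeline stages. The following minimal sketch (not the DAPPLE implementation; the stage counts and both helper functions are hypothetical, for illustration only) contrasts a GPipe-style schedule, which holds every micro-batch's activations until the backward phase starts, with an early-backward schedule, which frees a micro-batch's activations as soon as its backward pass runs:

```python
# Hypothetical comparison of peak activation memory under two pipeline
# schedules. This is an illustrative model, not DAPPLE's actual scheduler.

def gpipe_peak_activations(num_micro_batches: int) -> int:
    # GPipe-style: all forwards run before any backward, so every stage
    # must keep activations for all micro-batches simultaneously.
    return num_micro_batches

def early_backward_peak_activations(num_stages: int, stage_index: int) -> int:
    # Early-backward: stage i starts backwards as soon as they become
    # available, so it only holds activations for the micro-batches still
    # in flight ahead of it, roughly (num_stages - stage_index), which is
    # independent of the total micro-batch count.
    return num_stages - stage_index

if __name__ == "__main__":
    stages, micro_batches = 4, 16
    print("GPipe-style peak (any stage):", gpipe_peak_activations(micro_batches))
    for s in range(stages):
        print(f"early-backward peak at stage {s}:",
              early_backward_peak_activations(stages, s))
```

Under this toy model, per-stage memory stays flat as the micro-batch count grows instead of scaling with it, which is consistent with the abstract's claim of reducing memory without sacrificing training throughput.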

Wed 3 Mar

Displayed time zone: Eastern Time (US & Canada)

12:30 - 13:30
Session 10: Machine Learning and Software Engineering (Main Conference)
Chair(s): Albert Cohen Google
12:30
15m
Talk
TurboTransformers: An Efficient GPU Serving System For Transformer Models
Main Conference
Jiarui Fang Tencent, Yang Yu, Chengduo Zhao Tencent, Jie Zhou Tencent
12:45
15m
Talk
Extracting Clean Performance Models from Tainted Programs
Main Conference
Marcin Copik ETH Zurich, Alexandru Calotoiu ETH Zurich, Tobias Grosser University of Edinburgh, Nicolas Wicki ETH Zurich, Felix Wolf TU Darmstadt, Torsten Hoefler ETH Zurich
13:00
15m
Talk
Modernizing Parallel Code with Pattern Analysis
Main Conference
Roberto Castañeda Lozano University of Edinburgh, Murray Cole University of Edinburgh, Björn Franke University of Edinburgh
13:15
15m
Talk
DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Main Conference
Shiqing Fan Alibaba Group, Yi Rong Alibaba Group, Chen Meng Alibaba Group, ZongYan Cao Alibaba Group, Siyu Wang Alibaba Group, Zhen Zheng Alibaba Group, Chuan Wu The University of Hong Kong, Guoping Long Alibaba Group, Jun Yang Alibaba Group, LiXue Xia Alibaba Group, Lansong Diao Alibaba Group, Xiaoyong Liu Alibaba Group, Wei Lin Alibaba Group