In recent years, distributed deep learning has become popular in both industry and academia. Distributed training requires synchronizing gradients across workers, however, and the communication cost of this synchronization has been reported to be a bottleneck. Using low-precision gradients is a promising technique for reducing the bandwidth requirement. In this work, we propose Auto Precision Scaling (APS), an algorithm that improves accuracy when gradients are communicated as low-precision floating-point values. APS improves accuracy at all precisions while adding only a trivial communication cost. Our experimental results show that, for both image classification and segmentation, APS can train state-of-the-art models using 8-bit floating-point gradients with no accuracy loss or only a tiny one (<0.05%). Furthermore, a hybrid-precision technique we design avoids any accuracy loss. Finally, we propose a performance model to evaluate the proposed method, and our experimental results show that APS achieves a significant speedup over the state-of-the-art method. To make APS available to researchers and developers, we design and implement a high-performance system for Customized-Precision Deep Learning (CPD), which can simulate the training process using an arbitrary customized low-precision floating-point format. We integrate CPD into PyTorch and release it as open source.
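The abstract does not spell out how CPD emulates a custom format or how APS chooses its scaling factors, so the following is only a minimal pure-Python sketch under stated assumptions. A custom float is emulated by rounding to a hypothetical IEEE-like format with `exp_bits` exponent bits and `man_bits` mantissa bits (subnormals included, saturating instead of overflowing to infinity). The scale is assumed to be a power of two derived from the global maximum gradient magnitude, chosen so that summing `n` workers' scaled gradients cannot overflow; gathering that maximum costs one scalar per layer, which is in line with the "trivial communication cost" claim. The names `max_value`, `quantize`, `aps_scale_exponent`, and `simulated_allreduce` are hypothetical and are not CPD's API.

```python
import math


def max_value(exp_bits: int, man_bits: int) -> float:
    """Largest finite value of a hypothetical IEEE-like format (top exponent reserved)."""
    bias = 2 ** (exp_bits - 1) - 1
    return (2.0 - 2.0 ** -man_bits) * 2.0 ** bias


def quantize(x: float, exp_bits: int, man_bits: int) -> float:
    """Round x to the nearest value of the hypothetical format (round half to even)."""
    if x == 0.0 or math.isnan(x):
        return x
    bias = 2 ** (exp_bits - 1) - 1
    min_exp = 1 - bias                      # smallest normal exponent
    limit = max_value(exp_bits, man_bits)
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if a >= limit:
        return sign * limit                 # saturate (also catches inf)
    _, ex = math.frexp(a)                   # a = f * 2**ex with 0.5 <= f < 1
    e = max(ex - 1, min_exp)                # below the normal range: subnormal spacing
    step = 2.0 ** (e - man_bits)            # gap between neighbouring representable values
    return sign * min(round(a / step) * step, limit)


def aps_scale_exponent(local_maxes, n_workers, exp_bits, man_bits) -> int:
    """Assumed rule: largest k with global_max * n_workers * 2**k <= max_value."""
    global_max = max(local_maxes)           # real system: all-reduce(MAX) of one scalar
    if global_max == 0.0:
        return 0
    limit = max_value(exp_bits, man_bits)
    k = math.floor(math.log2(limit / (global_max * n_workers)))
    # correct any off-by-one caused by floating-point error in log2
    while global_max * n_workers * 2.0 ** k > limit:
        k -= 1
    while global_max * n_workers * 2.0 ** (k + 1) <= limit:
        k += 1
    return k


def simulated_allreduce(worker_grads, exp_bits, man_bits, use_scaling=True):
    """Sum one layer's gradients across workers entirely in the low-precision format."""
    n = len(worker_grads)
    local_maxes = [max(abs(g) for g in grads) for grads in worker_grads]
    k = aps_scale_exponent(local_maxes, n, exp_bits, man_bits) if use_scaling else 0
    scale = 2.0 ** k                        # power of two: scaling and unscaling are exact
    result = []
    for values in zip(*worker_grads):       # one gradient element from every worker
        acc = 0.0
        for g in values:
            # both the worker's contribution and the running sum are held in the
            # low-precision format, as they would be on the wire
            acc = quantize(acc + quantize(g * scale, exp_bits, man_bits), exp_bits, man_bits)
        result.append(acc / scale)          # unscale in full precision
    return result


if __name__ == "__main__":
    grads = [[1e-6, -3e-6], [2e-6, 1e-6]]   # two workers, two gradient elements
    print(simulated_allreduce(grads, 5, 2, use_scaling=False))  # prints [0.0, 0.0]
    print(simulated_allreduce(grads, 5, 2))
```

With a 5-bit exponent and a 2-bit mantissa, gradients around 1e-6 fall below the smallest subnormal (2^-16) and vanish without scaling, whereas the power-of-two scale keeps them, with an error set only by the format's coarse mantissa. Restricting the scale to powers of two means multiplying and dividing by it introduces no rounding of its own.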
Tue 2 Mar (displayed time zone: Eastern Time, US & Canada)
13:30 - 14:30
13:30 6m Talk | POSTER: In-situ Workflow Auto-tuning through Combining Component Models | Main Conference | Tong Shu (Southern Illinois University Carbondale), Yanfei Guo (Argonne National Laboratory), Justin Wozniak (Argonne National Laboratory), Xiaoning Ding (New Jersey Institute of Technology), Ian Foster (Argonne Nat Lab and U.Chicago), Tahsin Kurc (Stony Brook University)
13:36 6m Talk | POSTER: Simplifying Low-Level GPU Programming with GAS | Main Conference | Da Yan (Hong Kong University of Science and Technology), Wei Wang (Hong Kong University of Science and Technology), Xiaowen Chu (Hong Kong Baptist University)
13:42 6m Talk | POSTER: Corder: Cache-Aware Reordering for Optimizing Graph Analytics | Main Conference | YuAng Chen (The Chinese University of Hong Kong, Shenzhen), Yeh-Ching Chung (The Chinese University of Hong Kong, Shenzhen)
13:48 6m Talk | POSTER: DFOGraph: An I/O- and Communication-Efficient System for Distributed Fully-out-of-Core Graph Processing | Main Conference | Jiping Yu (Tsinghua University), Wei Qin (Tsinghua University), Xiaowei Zhu (Tsinghua University), Zhenbo Sun (Tsinghua University), Jianqiang Huang (Tsinghua University), Xiaohan Li (Tsinghua University), Wenguang Chen (Tsinghua University)
13:54 6m Talk | POSTER: An Efficient Uncertain Graph Processing Framework for Heterogeneous Architectures | Main Conference | Heng Zhang (Institute of Software, Chinese Academy of Sciences; University of Sydney), Lingda Li (Brookhaven National Laboratory), Donglin Zhuang (University of Sydney), Rui Liu (University of Chicago), Shuang Song (Facebook Inc.), Dingwen Tao (Washington State University), Yanjun Wu (Institute of Software, Chinese Academy of Sciences), Shuaiwen Leon Song (University of Sydney)
14:00 6m Talk | POSTER: Dynamic Scaling for Low-Precision Learning | Main Conference | Ruobing Han (Peking University), Min Si (Argonne National Laboratory), James W. Demmel (UC Berkeley), Yang You (UC Berkeley)
14:06 6m Talk | POSTER: Exploring Deep Reuse in Winograd CNN Inference | Main Conference | Ruofan Wu (Renmin University of China), Feng Zhang (Renmin University of China), Zhen Zheng (Alibaba Group), Xiaoyong Du (Renmin University of China), Xipeng Shen (North Carolina State University)
14:12 6m Talk | POSTER: A Novel Memory-Efficient Deep Learning Training Framework via Error-Bounded Lossy Compression | Main Conference | Sian Jin (Washington State University), Guanpeng Li (University of Iowa), Shuaiwen Leon Song (University of Sydney), Dingwen Tao (Washington State University)
14:18 6m Talk | POSTER: FFT Blitz: The Tensor Cores Strike Back | Main Conference | Sultan Durrani (University of Illinois at Urbana-Champaign), Muhammad Saad Chughtai (Georgia Institute of Technology), Abdul Dakkak (University of Illinois at Urbana-Champaign), Wen-mei Hwu (University of Illinois at Urbana-Champaign), Lawrence Rauchwerger (UIUC)
14:24 6m Break | Main Conference