Large-Scale Distributed Learning Strategies
LOW-RANK GRADIENT APPROXIMATION FOR MEMORY-EFFICIENT ON-DEVICE TRAINING OF DEEP NEURAL NETWORK
IMPROVING EFFICIENCY IN LARGE-SCALE DECENTRALIZED DISTRIBUTED TRAINING
PARALLELIZING ADAM OPTIMIZER WITH BLOCKWISE MODEL-UPDATE FILTERING
Chair: Xiaodong Cui (IBM) and Bhuvana Ramabhadran (Google)