Large-Scale Distributed Learning Strategies

LOW-RANK GRADIENT APPROXIMATION FOR MEMORY-EFFICIENT ON-DEVICE TRAINING OF DEEP NEURAL NETWORK

IMPROVING EFFICIENCY IN LARGE-SCALE DECENTRALIZED DISTRIBUTED TRAINING

PARALLELIZING ADAM OPTIMIZER WITH BLOCKWISE MODEL-UPDATE FILTERING

Chairs: Xiaodong Cui (IBM) and Bhuvana Ramabhadran (Google)
