Graduate School and Research Center in Digital Sciences

Sparsification as a remedy for staleness in distributed asynchronous SGD

Candela, Rosa; Franzese, Giulio; Filippone, Maurizio; Michiardi, Pietro

Submitted on ArXiv, 21 October 2019

Large-scale machine learning increasingly relies on distributed optimization, whereby several machines contribute to the training process of a statistical model. While there exists a large literature on stochastic gradient descent (SGD) and its variants, the study of countermeasures to the problems arising in asynchronous distributed settings is still in its infancy. The key question of this work is whether sparsification, a technique predominantly used to reduce communication overheads, can also mitigate the staleness problem that affects asynchronous SGD. We study the role of sparsification both theoretically and empirically. Our theory indicates that, in an asynchronous, non-convex setting, the ergodic convergence rate of sparsified SGD matches the known O(1/√T) rate of non-convex SGD. We then carry out an empirical study to complement our theory and show that, in practice, sparsification consistently improves over vanilla SGD and current alternatives for mitigating the effects of staleness.
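The abstract does not specify which sparsification operator the paper uses, but a common choice in the sparsified-SGD literature is top-k sparsification with local error accumulation (memory), where each step communicates only the k largest-magnitude gradient entries and carries the dropped residual into the next step. The sketch below is a minimal single-worker illustration under that assumption; `topk_sparsify`, `sparsified_sgd`, and the toy quadratic objective are illustrative names, not from the paper.

```python
import numpy as np

def topk_sparsify(grad, k):
    """Keep only the k largest-magnitude entries of the gradient; zero the rest."""
    sparse = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse[idx] = grad[idx]
    return sparse

def sparsified_sgd(grad_fn, w0, lr=0.1, k=2, steps=200):
    """SGD with a top-k sparsifier and error feedback (assumed setup, single worker)."""
    w = w0.copy()
    memory = np.zeros_like(w0)  # accumulates the entries dropped by sparsification
    for _ in range(steps):
        g = grad_fn(w) + memory       # add back the residual from previous steps
        sparse_g = topk_sparsify(g, k)
        memory = g - sparse_g         # carry the dropped mass to the next step
        w -= lr * sparse_g
    return w

# Toy quadratic f(w) = 0.5 * ||w||^2, so grad_fn(w) = w; the iterate should shrink
# toward zero even though only 2 of 4 coordinates are transmitted per step.
w = sparsified_sgd(lambda w: w, np.array([4.0, -3.0, 2.0, 1.0]), lr=0.1, k=2)
```

In a distributed deployment the sparsified gradient (rather than the dense one) would be sent to the parameter server, which is what reduces both communication volume and, per the paper's thesis, the impact of stale updates.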


Title: Sparsification as a remedy for staleness in distributed asynchronous SGD
Department: Data Science
Eurecom ref: 6080
Copyright: © EURECOM. Personal use of this material is permitted. The definitive version of this paper was submitted to arXiv on 21 October 2019.
Bibtex:
@inproceedings{EURECOM+6080,
  year      = {2019},
  title     = {{S}parsification as a remedy for staleness in distributed asynchronous {SGD}},
  author    = {{C}andela, {R}osa and {F}ranzese, {G}iulio and {F}ilippone, {M}aurizio and {M}ichiardi, {P}ietro},
  booktitle = {{S}ubmitted on {A}r{X}iv, 21 {O}ctober 2019},
  address   = {},
  month     = {10},
  url       = {}
}