Large language models (LLMs) are well known for their impressive performance across a wide range of tasks. One surprising example is their recently identified capacity to understand the governing principles of dynamical systems that satisfy the Markov property. In this paper, we explore this direction further by studying the dynamics of stochastic gradient descent (SGD) in convex and non-convex optimization. Leveraging the theoretical link between SGD and Markov chains, we show that LLMs exhibit remarkable zero-shot performance in predicting the local minima to which SGD converges from previously unseen starting points. More broadly, we investigate the possibility of using LLMs to perform zero-shot randomized trials for the larger deep learning models used in practice.
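
To make the setup concrete, the sketch below is a minimal illustration only, not the paper's actual pipeline: it runs noisy gradient descent on a hypothetical one-dimensional non-convex objective from a few starting points and serializes the trajectories into an in-context prompt, which an LLM would then be asked to complete for an unseen starting point. The objective f(x) = (x^2 - 1)^2, the step size, the noise level, and the prompt format are all assumptions made for this example.

    import numpy as np

    # Hypothetical 1-D non-convex objective (an assumption, not the one used in the paper):
    # f(x) = (x^2 - 1)^2 has two local minima, at x = -1 and x = +1.
    def grad(x):
        return 4.0 * x * (x ** 2 - 1.0)

    def sgd_trajectory(x0, lr=0.05, noise=0.1, steps=50, seed=0):
        """Run noisy gradient descent from x0 and return the visited iterates."""
        rng = np.random.default_rng(seed)
        xs = [x0]
        x = x0
        for _ in range(steps):
            x = x - lr * (grad(x) + noise * rng.standard_normal())
            xs.append(x)
        return np.array(xs)

    # Build an in-context prompt: a few full trajectories serve as demonstrations,
    # followed by an unseen starting point whose limit the LLM is asked to predict.
    demo_starts = [-1.5, -0.5, 0.5, 1.5]
    lines = []
    for i, x0 in enumerate(demo_starts):
        traj = sgd_trajectory(x0, seed=i)
        lines.append(
            f"start={x0:+.2f} trajectory="
            + ",".join(f"{v:+.3f}" for v in traj[::10])
            + f" converges_to={traj[-1]:+.2f}"
        )

    unseen_start = 0.8
    lines.append(f"start={unseen_start:+.2f} trajectory=? converges_to=?")

    prompt = "\n".join(lines)
    print(prompt)  # this string would be sent to an LLM for zero-shot completion
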
Can LLMs predict the convergence of Stochastic Gradient Descent?
ICML 2024, 41st International Conference on Machine Learning, 1st ICML Workshop on In-Context Learning, 27 July 2024, Vienna, Austria
Type: Poster / Demo
City: Vienna
Date: 2024-07-27
Department: Data Science
Eurecom Ref: 8083
Copyright: © EURECOM. Personal use of this material is permitted. The definitive version of this paper was published in ICML 2024, 41st International Conference on Machine Learning, 1st ICML Workshop on In-Context Learning, 27 July 2024, Vienna, Austria and is available at:
See also: PERMALINK: https://www.eurecom.fr/publication/8083