The initialization determines whether in-context learning is gradient descent

Xie, Shifeng; Yuan, Rui; Rossi, Simone; Hannagan, Thomas
NeurIPS 2025, Workshop, What Can(’t) Transformers Do?, 39th Annual Conference on Neural Information Processing Systems, 2-7 December 2025, San Diego, USA


Type: Poster / Demo
City: San Diego
Date: 2025-12-02
Department: Data Science
Eurecom Ref: 8537
Copyright: © NIST. Personal use of this material is permitted. The definitive version of this paper was published in NeurIPS 2025, Workshop, What Can(’t) Transformers Do?, 39th Annual Conference on Neural Information Processing Systems, 2-7 December 2025, San Diego, USA and is available at :

PERMALINK : https://www.eurecom.fr/publication/8537