Large language models as Markov chains

Zekri, Oussama; Odonnat, Ambroise; Benechehab, Abdelhakim; Bleistein, Linus; Boullé, Nicolas; Redko, Ievgen

Submitted to arXiv, 2 February 2025

Large language models (LLMs) are remarkably efficient across a wide range of natural language processing tasks and well beyond them. However, a comprehensive theoretical analysis of LLMs' generalization capabilities remains elusive. In our paper, we approach this task by drawing an equivalence between autoregressive transformer-based language models and Markov chains defined on a finite state space. This allows us to study the multi-step inference mechanism of LLMs from first principles. We relate the obtained results to pathological behaviors observed in LLMs, such as repetitions and incoherent replies at high temperature. Finally, we leverage the proposed formalization to derive pre-training and in-context learning generalization bounds for LLMs under realistic data and model assumptions. Experiments with the most recent Llama and Gemma herds of models show that our theory correctly captures their behavior in practice.
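The equivalence described in the abstract can be illustrated with a toy sketch. It is not the authors' code, and it rests on assumptions of ours: a vocabulary of size `vocab_size`, a context window of `context_window` tokens, states taken to be all non-empty token sequences of length at most `context_window`, and a hypothetical `toy_next_token_probs` function standing in for a real model's next-token distribution. Under those assumptions, one autoregressive step (sample a token, append it, drop the oldest token if the window is full) is a transition of a finite Markov chain.

```python
import itertools
import numpy as np


def build_states(vocab_size, context_window):
    """All non-empty token sequences of length <= context_window (assumed state space)."""
    states = []
    for length in range(1, context_window + 1):
        states.extend(itertools.product(range(vocab_size), repeat=length))
    return states


def toy_next_token_probs(context, vocab_size, temperature=1.0):
    """Hypothetical stand-in for an LLM: deterministic logits, then a tempered softmax."""
    weighted = sum((i + 1) * c for i, c in enumerate(context))
    logits = np.array([np.sin(1.0 + tok + weighted) for tok in range(vocab_size)])
    z = logits / temperature
    z = z - z.max()  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()


def transition_matrix(vocab_size, context_window, temperature=1.0):
    """Row-stochastic matrix P with P[i, j] = Pr(next state j | current state i)."""
    states = build_states(vocab_size, context_window)
    index = {s: i for i, s in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for s in states:
        probs = toy_next_token_probs(s, vocab_size, temperature)
        for tok in range(vocab_size):
            # Append the sampled token; keep only the last context_window tokens.
            nxt = (s + (tok,))[-context_window:]
            P[index[s], index[nxt]] += probs[tok]
    return states, P


states, P = transition_matrix(vocab_size=2, context_window=2)
```

Each row of `P` has exactly `vocab_size` non-zero entries, one per possible next token, so the matrix is sparse even though the number of states grows quickly with the vocabulary size and context window. Multi-step generation then corresponds to powers of `P`.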

Detail

Type: Conference

Date: 2025-02-02

Department: Communication systems

Eurecom Ref: 8082