Model-based reinforcement learning in the era of foundation models

Benechehab, Abdelhakim

Thesis

Reinforcement Learning (RL) provides a general framework for sequential decision-making, enabling agents to learn through interaction with an environment. Despite successes in domains such as games and robotics, RL remains challenging to deploy in real-world settings, notably because of its limited sample efficiency.

Model-based Reinforcement Learning (MBRL) addresses part of this issue by learning a model of the environment dynamics for planning, thereby reducing costly interactions. However, learned models introduce new challenges, including compounding errors and model-policy objective mismatch.

In parallel, machine learning has shifted toward Foundation Models (FMs): large-scale pre-trained models that learn transferable representations from massive datasets. Initially developed for natural language processing, they are now emerging in areas such as time series modeling, raising questions about their role in RL.

This thesis investigates how integrating FMs can strengthen MBRL.

Adopting a sequence modeling perspective, it first revisits dynamics modeling by directly optimizing a multi-step objective to improve long-term accuracy.
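As a rough illustration of what a multi-step objective means, the sketch below contrasts a one-step prediction loss with a loss computed over open-loop rollouts, where the model is fed its own predictions. This is a generic toy example and not the formulation used in the thesis: the scalar linear dynamics, the squared-error loss, the horizon value, and all names (`one_step_loss`, `multi_step_loss`, `true_dynamics`, `learned_model`) are assumptions made for illustration only.

```python
from typing import Callable, Sequence

# Hypothetical scalar setting: a model maps (state, action) to the next state.
Model = Callable[[float, float], float]


def one_step_loss(model: Model, states: Sequence[float], actions: Sequence[float]) -> float:
    """Mean squared error when every prediction starts from a true state."""
    errors = [
        (model(s, a) - s_next) ** 2
        for s, a, s_next in zip(states[:-1], actions, states[1:])
    ]
    return sum(errors) / len(errors)


def multi_step_loss(model: Model, states: Sequence[float], actions: Sequence[float], horizon: int) -> float:
    """Mean squared error over open-loop rollouts of length `horizon`.

    Each rollout starts from a true state and then feeds the model its own
    predictions, so prediction errors are allowed to compound.
    """
    total, count = 0.0, 0
    for start in range(len(actions) - horizon + 1):
        pred = states[start]
        for k in range(horizon):
            pred = model(pred, actions[start + k])
            total += (pred - states[start + k + 1]) ** 2
            count += 1
    return total / count


# Assumed toy dynamics and a slightly biased learned model (illustration only).
def true_dynamics(s: float, a: float) -> float:
    return 0.9 * s + a


def learned_model(s: float, a: float) -> float:
    return 0.8 * s + a


actions = [1.0] * 10
states = [0.0]
for a in actions:
    states.append(true_dynamics(states[-1], a))

print("one-step loss:  ", one_step_loss(learned_model, states, actions))
print("multi-step loss:", multi_step_loss(learned_model, states, actions, horizon=5))
```

In this toy setting the multi-step loss is larger than the one-step loss for the same biased model, because each rollout error is carried into the next prediction. Training against the rollout loss therefore penalizes exactly the compounding behavior that a one-step loss cannot see.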

Building on this view, it then formulates next-state prediction as an in-context learning problem and introduces a latent projection mechanism that enables zero-shot use of pre-trained Large Language Models (LLMs) within MBRL.

Extending these representation learning ideas further, the thesis addresses the adaptation of univariate Time Series Foundation Models (TSFMs) to multivariate and probabilistic forecasting through learnable encoder–decoder adapters.

Finally, shifting the focus from dynamics to rewards, it proposes a bilevel optimization approach to learn implicit reward functions from supervised data, broadening the applicability of RL to settings where explicit reward design is impractical.

Overall, this work demonstrates how combining MBRL with FMs leads to more sample-efficient and generalizable decision-making systems.

Type: Thesis

Date: 2026-05-13

Department: Data Science

Eurecom Ref: 8657