In model-based reinforcement learning, most algorithms rely on simulating trajectories from one-step models of the dynamics learned on data. A critical challenge of this approach is the compounding of one-step prediction errors as the length of the trajectory grows. In this paper we tackle this issue by using a multi-step objective to train one-step models. Our objective is a weighted sum of the mean squared error (MSE) loss at various future horizons. We find that this new loss is particularly useful when the data is noisy (additive Gaussian noise in the observations), which is often the case in real-life environments. We show on a variety of tasks (environments or datasets) that the models learned with this loss achieve a significant improvement in terms of the R2-score averaged over future prediction horizons. To our surprise, in the pure batch reinforcement learning setting, we find that the models trained with the multi-step loss perform only marginally better than the baseline. Furthermore, this improvement is only observed for small loss horizons, unlike the trend present with the R2-score on the respective datasets.
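The objective described above can be illustrated with a minimal sketch, assuming a learned one-step model model(state, action) that is rolled out autoregressively from the true initial state. The function name, the trajectory layout, and the choice of horizon_weights are illustrative assumptions rather than the paper's exact implementation.

```python
import torch

def weighted_multistep_mse(model, states, actions, horizon_weights):
    """Weighted sum of MSE losses over future prediction horizons.

    states:  (batch, H+1, state_dim) ground-truth trajectory segment
    actions: (batch, H,   action_dim) actions taken along the segment
    horizon_weights: H scalars weighting the MSE at each horizon
    """
    pred = states[:, 0]  # start the rollout from the true initial state
    loss = 0.0
    for h, w in enumerate(horizon_weights):
        # roll the one-step model forward autoregressively
        pred = model(pred, actions[:, h])
        target = states[:, h + 1]
        loss = loss + w * torch.mean((pred - target) ** 2)
    return loss
```

Setting a single nonzero weight at horizon 1 recovers the standard one-step MSE; spreading weight over longer horizons penalizes compounding rollout error, though the specific weighting scheme studied in the paper is not reproduced here.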
A Study of the Weighted Multi-step Loss Impact on the Predictive Error and the Return in MBRL
RLC 2024, 1st Reinforcement Learning Conference, 9-12 August 2024, Amherst, MA, USA
Type:
Conference
City:
Amherst
Date:
2024-08-09
Department:
Data Science
Eurecom Ref:
8084
See also:
PERMALINK : https://www.eurecom.fr/publication/8084