Deep Reinforcement Learning (DRL) has become a prominent paradigm for designing trajectories of autonomous unmanned aerial vehicles (UAVs) used as flying access points for cellular or Internet of Things (IoT) connectivity. However, the prohibitively high demand for training data severely restricts the applicability of DRL-based trajectory planning in real-world missions. We propose a model-aided deep Q-learning approach that, in contrast to previous work, requires only a minimal number of expensive training data samples and can guide a flight-time-restricted UAV on a data harvesting mission without prior knowledge of the wireless channel characteristics and with only limited knowledge of wireless node locations. By exploiting a few known reference wireless node positions and channel gain measurements, we learn a model of the environment, estimating the unknown node positions and the wireless channel characteristics. Interaction with this model allows us to train a deep Q-network (DQN) that approximates the optimal UAV control policy. We show that, compared with standard DRL approaches, the proposed model-aided approach requires at least one order of magnitude fewer training data samples to reach identical data collection performance, offering a first step towards making DRL a viable solution to the problem.
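As a rough illustration of the model-aided idea summarized above (not the paper's actual implementation), the following minimal sketch trains a tiny numpy Q-network by interacting with a hypothetical learned environment model instead of the real world. The grid size, the assumed (estimated) node positions, the flight-time budget, and all network and learning hyperparameters are illustrative assumptions.

```python
import numpy as np

class ModelEnv:
    """Hypothetical stand-in for the learned environment model: a small grid
    world in which the UAV harvests data at estimated IoT node positions
    under a flight-time budget. All numbers here are illustrative."""
    SIZE = 5
    NODES = [(1, 3), (4, 1)]  # assumed (estimated) node positions
    MAX_STEPS = 12            # flight-time restriction

    def reset(self):
        self.pos = [0, 0]
        self.steps = 0
        self.collected = set()
        return self._obs()

    def _obs(self):
        # one-hot UAV position plus the fraction of flight time remaining
        o = np.zeros(self.SIZE * self.SIZE + 1)
        o[self.pos[0] * self.SIZE + self.pos[1]] = 1.0
        o[-1] = 1.0 - self.steps / self.MAX_STEPS
        return o

    def step(self, a):
        # actions: 0 up, 1 down, 2 left, 3 right
        d = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
        self.pos[0] = min(max(self.pos[0] + d[0], 0), self.SIZE - 1)
        self.pos[1] = min(max(self.pos[1] + d[1], 0), self.SIZE - 1)
        self.steps += 1
        reward = 0.0
        for i, node in enumerate(self.NODES):
            if i not in self.collected and tuple(self.pos) == node:
                self.collected.add(i)  # data harvested from node i
                reward = 1.0
        done = (self.steps >= self.MAX_STEPS
                or len(self.collected) == len(self.NODES))
        return self._obs(), reward, done

class TinyDQN:
    """One-hidden-layer Q-network updated by SGD on the TD error."""
    def __init__(self, n_in, n_act, n_hid=32, lr=0.05, gamma=0.95, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.3, (n_in, n_hid))
        self.W2 = rng.normal(0.0, 0.3, (n_hid, n_act))
        self.lr, self.gamma = lr, gamma

    def q(self, s):
        h = np.maximum(s @ self.W1, 0.0)  # ReLU hidden layer
        return h, h @ self.W2             # Q-value per action

    def update(self, s, a, r, s2, done):
        h, q = self.q(s)
        target = r if done else r + self.gamma * self.q(s2)[1].max()
        err = q[a] - target               # TD error
        dh = err * self.W2[:, a] * (h > 0.0)
        onehot = np.zeros(self.W2.shape[1]); onehot[a] = 1.0
        self.W2 -= self.lr * err * np.outer(h, onehot)
        self.W1 -= self.lr * np.outer(s, dh)

def train(episodes=300, seed=0):
    """Epsilon-greedy training against the (cheap) model, not the real world."""
    rng = np.random.default_rng(seed)
    env = ModelEnv()
    agent = TinyDQN(env.SIZE * env.SIZE + 1, 4)
    for ep in range(episodes):
        s, done = env.reset(), False
        eps = max(0.05, 1.0 - ep / (0.8 * episodes))  # decaying exploration
        while not done:
            if rng.random() < eps:
                a = int(rng.integers(4))
            else:
                a = int(agent.q(s)[1].argmax())
            s2, r, done = env.step(a)
            agent.update(s, a, r, s2, done)
            s = s2
    return env, agent
```

In the paper's setting, the analogue of `ModelEnv` is itself learned from a few reference node positions and channel gain measurements, so that the many cheap model interactions replace most of the expensive real-world flights.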
Model-aided deep reinforcement learning for sample-efficient UAV trajectory design in IoT networks
Submitted to GLOBECOM 2021, IEEE Global Communications Conference, 7-11 December 2021, Madrid, Spain / Submitted to arXiv, 21 April 2021
Communication Systems
© EURECOM. Personal use of this material is permitted. The definitive version of this paper was submitted to GLOBECOM 2021, IEEE Global Communications Conference, 7-11 December 2021, Madrid, Spain, and to arXiv on 21 April 2021, and is available at:
PERMALINK: https://www.eurecom.fr/publication/6543