Deep Reinforcement Learning (DRL) has become a prominent paradigm to design trajectories for autonomous unmanned aerial vehicles (UAV) used as flying access points in the context of cellular or Internet of Things (IoT) connectivity. However, the prohibitively high training data demand severely restricts the applicability of RL-based trajectory planning in real-world missions. We propose a model-aided deep Q-learning approach that, in contrast to previous work, requires a minimum of expensive training data samples and is able to guide a flight-time restricted UAV on a data harvesting mission without prior knowledge of wireless channel characteristics and limited knowledge of wireless node locations. By exploiting some known reference wireless node positions and channel gain measurements, we seek to learn a model of the environment by estimating unknown node positions and learning the wireless channel characteristics. Interaction with the model allows us to train a deep Q-network (DQN) to approximate the optimal UAV control policy. We show that in comparison with standard DRL approaches, the proposed model-aided approach requires at least one order of magnitude less training data samples to reach identical data collection performance, hence offering a first step towards making DRL a viable solution to the problem.
Model-aided deep reinforcement learning for sample-efficient UAV trajectory design in IoT networks
GLOBECOM 2021, IEEE Global Communications Conference, 7-11 December 2021, Madrid, Spain
© 2021 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
PERMALINK : https://www.eurecom.fr/publication/6543