On using reinforcement learning for network slice admission control in 5G: offline vs. online