On enabling 5G dynamic TDD by leveraging deep reinforcement learning and O-RAN