In this dissertation we deal with the design of strategies for agents interacting in a dynamic environment. The mathematical tool of Game Theory (GT) on Markov Decision Processes (MDPs) is adopted. The agents' strategies control both the transition probabilities among the states and the rewards earned by each agent. Rewards are geometrically discounted over time. We first study the competitive case, in which two agents act selfishly. The game is zero-sum and the agents control disjoint sets of states. We devise two algorithms to compute the Nash equilibrium for all discount factors close enough to 1. Then we consider the long-run cooperative case, in which agents can coordinate their strategies. We utilize our two algorithms to compute the value of the coalitions in a routing game, in which several providers share the same network and control the routing in disjoint sets of nodes. Next we deal with dynamic cooperative GT on MDPs, in which coalitions can form throughout the game. We show how to enforce a common agreement forwhich the pay-off is distributed at each state, in a global optimum way and such that no coalition is ever enticed to break the agreement. We apply these concepts to a wireless multiple access channel, in which the channel is quasi-static. We assign the rate to users in each channel state in a fair and satisfactory manner. Next we provide three methods to compute a confidence interval for Shapley value on Markovian games. Such methods have polynomial complexity in the number of agents, while the complexity of the exact computation is exponential. Two methods are still valid when the values in each state are learned during the game. Finally we assess the performance of two strategies to dynamically select the frequency band to communicate on. We exploit an MDP formulation with uncountable state space.
Markovian competitive and cooperative games with applications to communications
Systèmes de Communication
© Université de Nice. Personal use of this material is permitted. The definitive version of this paper was published in Thesis and is available at :
PERMALINK : https://www.eurecom.fr/publication/3835