Value-Based Planning for Teams of Agents in Stochastic Partially Observable Environments
Annotation. A key requirement of decision support systems is the ability to reason about uncertainty. This is a complex problem, especially when multiple decision makers are involved. For instance, consider a team of firefighting agents whose goal is to extinguish a large fire in a residential area using only local observations. In this case, the environment is stochastic and the agents may be uncertain with respect to: 1) the effects of their actions, 2) the true state of the environment, and 3) the actions the other agents take. These uncertainties render the problem computationally intractable. In this thesis, such decision-making problems are formalized using a stochastic discrete-time model called the decentralized partially observable Markov decision process (Dec-POMDP). The first part of the thesis describes a value-based (i.e., based on value functions) approach for Dec-POMDPs that makes use of Bayesian games. In the second part, different forms of structure in this approach are identified and exploited to achieve better scaling behavior. This title can be previewed in Google Books: http://books.google.com/books?vid=ISBN9789056296100.