Task Coordination for Service Robots Based on Multiple Markov Decision Processes

the same time but it is not desirable given the application. For example, it is not desirable for a mobile robot to be navigating and handing out an object to a person at the same time (this situation is also difficult for a person). Behavior conflicts are solved on-line based on a set of restrictions specified by the user. If there are no restrictions, all the actions are executed concurrently; otherwise, a constraint satisfaction module selects the set of actions with the highest expected utility. We have applied our proposed solutions to a messenger robot. We initially consider no conflicts between the MDPs and demonstrate that a complex robot task can be solved efficiently using our approach. Then we consider a more difficult scenario with conflicts, and show that our method can resolve these conflicts and still solve the problem effectively. We also compared it with a solution without restrictions and confirmed that by incorporating restrictions the robot can accomplish the task more efficiently, with a significant reduction in the number of time steps required to complete the task.

MARKOV DECISION PROCESSES

A Markov Decision Process (MDP) (Puterman, 1994) models a sequential decision problem in which a system evolves in time and is controlled by an agent. A policy for an MDP is a mapping π : S → A that selects an action for each state. A solution to an MDP is a policy that maximizes its expected value. Two popular methods for finding an optimal policy for an MDP are (a) value iteration and (b) policy iteration (Puterman, 1994) (see Chapter 3). The main drawback of the MDP approach is that the solution complexity is polynomial in the size of the state-action space, which can be very large for most applications. To deal with the complexity problem for MDPs there are three main approaches:

· Factorization, in which the state space is represented in a factored form (Boutilier, Dearden & Goldszmidt, 1995; Hoey, St-Aubin, Hu & Boutilier, 1999).

· Abstraction, which creates an abstract model where states with similar features are grouped together (Parr & Russell, 1997; Dietterich, 1998; Jong & Stone, 2005; Li, Walsh & Littman, 2006; Dean & Givan, 1997).

· Decomposition, which divides the global problem into smaller subproblems that are solved independently and whose solutions are then combined (Meuleau et al., 1998; Laroche, Boniface & Schott, 2001).

We briefly review these three approaches, with emphasis on decomposition, which is the approach we follow.

Factorization

Traditional MDP solution techniques have the drawback that they require an explicit state space representation, limiting their applicability to real-world problems. Factored representations address this drawback by compactly specifying the state space in factored form. In a factored MDP, the set of states is described via a set of random variables, X = {X1, ..., Xn}, where each Xi takes on values in some finite domain, Dom(Xi). The framework of dynamic Bayesian networks (DBNs) gives us the tools to describe the transition function concisely (see Chapter 2). For each action, a two-stage DBN specifies the transition function. An even more compact representation can be obtained by representing the transition tables and value functions as decision trees (Boutilier, Dean & Hanks, 1999) or algebraic decision diagrams (Hoey et al., 1999).
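The saving a factored representation buys can be made concrete with a small sketch. The variable names, parent sets, and probabilities below are invented for illustration and are not from the chapter; the point is only that each post-action variable depends on a few parents, so the per-variable tables grow linearly in the number of variables while a flat transition table grows exponentially.

```python
import random

random.seed(0)

# Hypothetical two-stage DBN for one action. State variables (all binary,
# names invented): battery_ok, at_goal, holding. Each next-slice variable
# depends only on its listed parents in the current slice.
parents = {
    "battery_ok": ("battery_ok",),
    "at_goal":    ("battery_ok", "at_goal"),
    "holding":    ("holding",),
}
# CPTs: P(X_i' = 1 | parent values), one entry per parent assignment.
cpt = {
    "battery_ok": {(1,): 0.95, (0,): 0.0},
    "at_goal":    {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.9, (0, 0): 0.05},
    "holding":    {(1,): 1.0, (0,): 0.0},
}

def step(state):
    """Sample the next state variable-by-variable from the two-stage DBN."""
    return {x: int(random.random() < cpt[x][tuple(state[p] for p in parents[x])])
            for x in parents}

s = {"battery_ok": 1, "at_goal": 0, "holding": 0}
next_s = step(s)
print(next_s)

# Parameter count: a flat table P(s' | s) for this action has (2**n)**2
# entries; the factored form stores one CPT row per parent assignment,
# i.e. roughly n * 2**k entries for maximum parent-set size k.
n = len(parents)
flat_entries = (2 ** n) ** 2
factored_entries = sum(len(t) for t in cpt.values())
print(flat_entries, factored_entries)
```

Here the factored form needs 8 CPT rows against 64 flat entries; with tens of variables the gap is what makes factored solvers practical.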
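Value iteration, mentioned above as one of the two standard solution methods, can be sketched on a toy MDP. The three states, two actions, transition probabilities, and rewards below are invented for illustration, not taken from the chapter.

```python
import numpy as np

gamma = 0.9  # discount factor

# T[a, s, s2] = P(s2 | s, a); two actions over three states (numbers invented).
T = np.array([
    [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]],
    [[0.9, 0.1, 0.0],
     [0.0, 0.1, 0.9],
     [0.0, 0.0, 1.0]],
])
# R[s, a] = immediate reward; only state 2 is rewarding.
R = np.array([[0.0, 0.0],
              [0.0, 0.0],
              [1.0, 1.0]])

V = np.zeros(3)
for _ in range(1000):
    # Bellman backup: Q(s, a) = R(s, a) + gamma * sum_s2 T(a, s, s2) V(s2)
    Q = R + gamma * np.einsum('ast,t->sa', T, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy (optimal) action per state
print(np.round(V, 2), policy)
```

With the absorbing rewarding state under action 1, V converges to about 1/(1 − gamma) = 10 there, and the greedy policy in states 1 and 2 picks the action that drives toward it.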
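The conflict-resolution idea described at the top of the section — execute all proposed actions concurrently unless user-specified restrictions forbid it, otherwise select the compatible action set with the highest expected utility — can be sketched as follows. The behavior names, the restriction, and the utility numbers are all invented for illustration; this is a brute-force stand-in for the constraint satisfaction module, not the chapter's implementation.

```python
from itertools import combinations

# Each MDP proposes the action of its own policy plus an expected utility
# (e.g. its Q-value). All names and values below are hypothetical.
proposals = {
    "navigation":   ("goto_kitchen", 8.0),
    "manipulation": ("hand_object", 6.5),
    "dialogue":     ("speak", 3.0),
}
# User-specified restriction: these two actions must not run concurrently.
restrictions = {frozenset(["goto_kitchen", "hand_object"])}

def compatible(actions):
    """True if no restriction set is fully contained in the chosen actions."""
    return not any(r <= set(actions) for r in restrictions)

# Enumerate subsets of proposed actions and keep the compatible subset with
# the highest total expected utility. With no restrictions, the full set is
# always compatible, so every action executes concurrently.
acts = list(proposals.values())
best, best_u = None, float("-inf")
for size in range(1, len(acts) + 1):
    for subset in combinations(acts, size):
        names = [a for a, _ in subset]
        u = sum(u for _, u in subset)
        if compatible(names) and u > best_u:
            best, best_u = names, u

print(best, best_u)
```

Here navigation and dialogue together (utility 11.0) beat manipulation plus dialogue (9.5), so handing out the object is deferred, matching the abstract's navigate-versus-hand-over example. Brute-force enumeration is exponential in the number of behaviors, which is acceptable for the handful of MDPs a service robot runs.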