vector $\hat{u}_{j,t}$ and builds a strategy $x_{j,t+1}$ for the next stage. The strategy $x_{j,t+1}$ is a function only of $x_{j,t}$, $\hat{u}_{j,t}$ and $u_{j,t}$. Note that the exact value of the state of nature $w_t$ at time $t$, the past strategies $x_{-j,t-1} := (x_{k,t-1})_{k \neq j}$ of the other players and their past utilities $u_{-j,t-1} := (u_{k,t-1})_{k \neq j}$ are unknown to player $j$ at time $t$. The game moves to $t + 1$.

Remark. This model is clearly a stochastic game (Shapley, 1953b) with incomplete information and independent state transitions. It includes some interesting, well-studied learning problems. A basic example is the class of matrix games with i.i.d. random entries of the form $\tilde{U}(a) = \tilde{D}(a) + \tilde{S}$ (a deterministic part and a stochastic part with $E(\tilde{S}) = 0$).

6.2.1.1 Private Histories

As seen in Chapter 3, the (private) history of player $j$ at stage $t$ comprises all observations made up to stage $t$. As each player is assumed to know only his own actions and to observe the realizations of his utility, his history is given by

$$h_{j,t} = (a_{j,1}, u_{j,1}, a_{j,2}, u_{j,2}, \ldots, a_{j,t-1}, u_{j,t-1}), \tag{6.3}$$

which belongs to the set of private histories of player $j$ at stage $t$:

$$H_{j,t} := (A_j \times \mathbb{R})^{t-1}. \tag{6.4}$$

6.2.1.2 Behavioral Strategy

A behavioral strategy for player $j$ is a sequence of mappings $(\tilde{\sigma}_{j,t})_{t \geq 0}$ with

$$\tilde{\sigma}_{j,t} : H_{j,t} \to \Delta(A_j), \qquad h_{j,t} \mapsto x_j(t). \tag{6.5}$$

The set of behavioral strategies of player $j$ is denoted by $\Sigma_j$. The set of complete histories of the dynamic game after $t$ stages is $H_t = (W \times \prod_j A_j \times \mathbb{R}^{K})^{t-1}$; it describes the set of active players, the states, the chosen actions and the received utilities for all the players at all past stages before $t$. A strategy profile $\tilde{\sigma} = (\tilde{\sigma}_j)_{j \in K} \in \prod_j \Sigma_j$ and an initial state $w \in W$ induce a probability distribution $P_{w,\tilde{\sigma}}$ on the set of plays $H_\infty = (W \times \prod_j A_j \times \mathbb{R}^{K})^{\infty}$.
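To make the objects above concrete, the following is a minimal Python sketch, not the book's algorithm. It simulates a two-player matrix game whose entries are a deterministic part plus zero-mean noise. Each player stores only his private history of (own action, own realized utility) pairs and maps it to a mixed strategy. The payoff numbers, the uniform noise, the exploration rate, and the names `noisy_utilities`, `behavioral_strategy` and `play` are all illustrative assumptions.

```python
import random
from typing import Dict, List, Tuple

Action = int
# Private history h_{j,t}: the pairs (a_{j,s}, u_{j,s}) for s = 1, ..., t-1.
History = List[Tuple[Action, float]]
# Mixed strategy: a probability vector over the player's own actions.
Mixed = List[float]

# Assumed deterministic part D(a) for two players with two actions each.
D: Dict[Tuple[Action, Action], Tuple[float, float]] = {
    (0, 0): (1.0, 1.0),
    (0, 1): (0.0, 0.0),
    (1, 0): (0.0, 0.0),
    (1, 1): (2.0, 2.0),
}


def noisy_utilities(profile: Tuple[Action, Action], rng: random.Random,
                    noise: float = 0.5) -> Tuple[float, ...]:
    """Return D(a) + S, with S uniform on [-noise, noise], hence E(S) = 0."""
    return tuple(d + rng.uniform(-noise, noise) for d in D[profile])


def behavioral_strategy(history: History, n_actions: int = 2,
                        explore: float = 0.1) -> Mixed:
    """Hypothetical rule mapping a private history to a mixed strategy.

    It favours the action with the best empirical mean utility and keeps
    a small exploration probability on every action.
    """
    sums = [0.0] * n_actions
    counts = [0] * n_actions
    for action, utility in history:
        sums[action] += utility
        counts[action] += 1
    if 0 in counts:
        # Some action has never been tried: play uniformly at random.
        return [1.0 / n_actions] * n_actions
    means = [s / c for s, c in zip(sums, counts)]
    best = max(range(n_actions), key=lambda a: means[a])
    strategy = [explore / n_actions] * n_actions
    strategy[best] += 1.0 - explore
    return strategy


def play(stages: int, seed: int = 0) -> List[History]:
    """Run the repeated game and return both players' private histories."""
    rng = random.Random(seed)
    histories: List[History] = [[], []]
    for _ in range(stages):
        # Each player uses only his own history to build his strategy.
        strategies = [behavioral_strategy(h) for h in histories]
        actions = tuple(rng.choices(range(2), weights=x)[0] for x in strategies)
        utilities = noisy_utilities(actions, rng)
        for j in range(2):
            # Player j records his own action and his own noisy utility only.
            histories[j].append((actions[j], utilities[j]))
    return histories
```

Calling `play(50)` returns two histories of 50 pairs each. Neither player's history contains the other player's actions or utilities, which is the informational restriction the model imposes.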
Given an initial state $w$ and a strategy profile $\tilde{\sigma}$, the utility of player $j$ is the superior limit of the Cesaro-mean utility

$$E_{w,\tilde{\sigma}}\, v_{j,T} = E_{w,\tilde{\sigma}}\left[\frac{1}{T} \sum_{t=1}^{T} u_{j,t}\right].$$

We assume that $E_{w,\tilde{\sigma}}\, v_{j,T}$ has a limit as $T \to \infty$ and can be expressed in terms of the stationary distribution as $u_j(x_j, x_{-j}) = E_{w,x}\, U_j(w, a)$.

The main idea to understand here is that we associate a static game with the dynamic game. A concrete consequence is that, under appropriate assumptions, the players' limiting behaviors may be given by the Nash equilibria of the associated static game. We call this game the expected robust game, and define it as follows.