Sunday, August 4, 2019
- Reinforcement learning. The repeated chicken game
- The chicken game -
https://en.wikipedia.org/wiki/Chicken_(game)
- Nash equilibrium: a strategy profile (x, y) such that f_1(x, y) >= f_1(z, y) and f_2(x, y) >= f_2(x, z) for every possible deviation z
- (2, 7) and (7, 2) are the payoffs at the pure Nash equilibria of the chicken game
- The minmax/maxmin value is 2: each player can guarantee this amount regardless of what the other player does.
-- For the row player: if she plays Dare, the worst that can happen is 0; if she plays Chicken, the worst that can happen is 2. So by playing Chicken she guarantees 2. The same holds for the column player. (A small numeric check of these one-shot claims appears right after these notes, before the analysis.)
- We'll repeat the game an infinite number of times. The payoff of the row player at stage i is r_i. Define the long-run payoff to be the liminf of the running average, liminf_{k -> infinity} Sum_{i=1 to k}(r_i)/k. The same definition applies to the column player.
- The folk theorem in game theory - https://en.wikipedia.org/wiki/Folk_theorem_(game_theory)
- Objective - obtain an equilibrium with payoff (1/2)(0, 0) + (1/2)(6, 6) = (3, 3). In fact, by the folk theorem, any convex combination of the game's payoffs that gives each player at least her minmax value can be obtained.
- Strategy: the players play the profile with payoffs (0, 0) in odd stages and the profile with payoffs (6, 6) in even stages. If the column player deviates from the prescribed behavior, the row player plays Dare forever, ensuring the column player gets no more than 2 per stage from then on. Symmetrically for a deviation by the row player.
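A minimal Python sketch (my own, not part of the notes) brute-forcing the one-shot claims above; the payoff entries (6, 6), (2, 7), (7, 2), (0, 0) are the values quoted in these notes, with action 0 standing for Chicken and 1 for Dare:

```python
# A toy check of the one-shot chicken game. Entries are (row, column) payoffs.
payoff = {
    (0, 0): (6, 6),   # both swerve
    (0, 1): (2, 7),   # row swerves, column dares
    (1, 0): (7, 2),
    (1, 1): (0, 0),   # both dare -> crash
}

def is_nash(x, y):
    # (x, y) is a pure Nash equilibrium if no unilateral deviation helps.
    row_ok = all(payoff[(x, y)][0] >= payoff[(z, y)][0] for z in (0, 1))
    col_ok = all(payoff[(x, y)][1] >= payoff[(x, z)][1] for z in (0, 1))
    return row_ok and col_ok

print([payoff[p] for p in payoff if is_nash(*p)])   # [(2, 7), (7, 2)]

# Maxmin of the row player: best worst-case payoff over her own actions.
print(max(min(payoff[(x, y)][0] for y in (0, 1)) for x in (0, 1)))  # 2
```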
-- Analysis
- Payoffs when nobody deviates are the same for both players. For the row player, the stage payoffs and the corresponding running averages are
Stage payoffs:    0 6 0 6 0 6 ....
Running averages: 0 3 2 3 12/5 3 ....
The liminf of the running averages is 3.
https://en.wikipedia.org/wiki/Limit_superior_and_limit_inferior
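A quick numeric sketch of my own confirming the table above: the running averages of the stream 0, 6, 0, 6, ... start 0, 3, 2, 3, 12/5, 3 and their liminf is 3.

```python
# Running averages of the alternating payoff stream 0, 6, 0, 6, ...
payoffs = [0 if stage % 2 == 1 else 6 for stage in range(1, 100001)]

averages, total = [], 0.0
for k, r in enumerate(payoffs, start=1):
    total += r
    averages.append(total / k)

print(averages[:6])          # [0.0, 3.0, 2.0, 3.0, 2.4, 3.0]
print(min(averages[-100:]))  # just below 3: the liminf is 3
```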
- Suppose the column player deviates. At the deviation stage she gains at most a bounded one-time amount; from then on the row player plays Dare, so the best she can earn is 2 per stage (Chicken against Dare). A single stage does not affect the liminf of the running averages.
- The column player's payoff stream is therefore bounded, up to the deviation stage itself, by
0 6 0 6 2 2 2 2 2 2 ....
- Assume the prefix before the deviation has length 2k. The column player's average payoff over the first n stages is then bounded by [6k + 2(n - 2k)]/n, which tends to 2 as n goes to infinity.
- Since 2 is less than 3, deviating is not profitable. Therefore, the above strategy is a Nash equilibrium.
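A simulation sketch of my own of the grim-trigger argument, under the assumption that the deviating column player best-responds with Chicken to the perpetual Dare punishment:

```python
# Average column payoff over n stages when deviating at stage d (1-based);
# d=None means following the prescribed alternating play forever.
def column_average(n, d=None):
    total = 0
    for t in range(1, n + 1):
        if d is None or t < d:
            total += 0 if t % 2 == 1 else 6  # prescribed (0,0)/(6,6) path
        elif t == d:
            total += 7                       # the most a one-shot deviation pays
        else:
            total += 2                       # Chicken against the Dare punishment
    return total / n

print(column_average(10**6))         # ~3.0: conforming pays 3 in the limit
print(column_average(10**6, d=100))  # ~2.0: deviating is punished down to 2
```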
Can we marry repeated games with reinforcement learning? See additional interesting reading -
https://arxiv.org/pdf/1703.02702.pdf
Sunday, July 28, 2019
We'll continue with concentration inequalities https://drive.google.com/file/d/1_raS_p223vmfTVBEY7T_9oyO9ctBfYLA/view?usp=sharing
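As a warm-up, a small sketch of my own (not from the linked slides) checking Hoeffding's inequality, one standard concentration inequality, on bounded i.i.d. samples:

```python
import math, random

# Hoeffding: for i.i.d. X_i in [0, 1] with mean mu,
#   P(|mean_n - mu| >= t) <= 2 * exp(-2 * n * t**2).
n, t, trials, mu = 100, 0.1, 20000, 0.5

hits = sum(
    abs(sum(random.random() for _ in range(n)) / n - mu) >= t
    for _ in range(trials)
)
print(hits / trials)                 # empirical deviation frequency (tiny)
print(2 * math.exp(-2 * n * t * t))  # Hoeffding bound ~= 0.271
```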
Sunday, July 21, 2019
We'll cover the Shapley value and a recent application to data analysis https://drive.google.com/file/d/1rRQv3KoJmmboJl4Or0doX2KsLg_edQKU/view?usp=sharing To dive deeper, review the following paper - https://drive.google.com/file/d/1DfoFOzO5LQ5Jzppv7lRNLzaS0tM1ky9L/view?usp=sharing For the curious, the slide picture was taken in Hummus Yossef https://www.tripadvisor.co.il/Restaurant_Review-g7314119-d8589712-Reviews-Yossef_Hummus_Joint-Pardes_Hanna_Karkur_Haifa_District.html
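A minimal sketch of my own computing the Shapley value straight from its permutation definition; the three-player glove game below is a made-up illustration, not an example from the slides or the paper:

```python
import math
from itertools import permutations

players = (1, 2, 3)

def worth(coalition):
    # Toy glove game: players 1 and 2 hold left gloves, player 3 a right
    # glove; a coalition is worth 1 if it can form at least one pair.
    return 1 if 3 in coalition and (1 in coalition or 2 in coalition) else 0

# Shapley value: average marginal contribution over all player orderings.
shapley = {p: 0.0 for p in players}
for order in permutations(players):
    built = set()
    for p in order:
        before = worth(built)
        built.add(p)
        shapley[p] += worth(built) - before

factor = math.factorial(len(players))
print({p: s / factor for p, s in shapley.items()})  # {1: 1/6, 2: 1/6, 3: 2/3}
```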
Sunday, July 14, 2019
As we discussed the use of concentration inequalities in our last meeting on reinforcement learning, we will run a refresher on this subject and discuss Markov's inequality this week - https://drive.google.com/file/d/1nvvLIXALNb__l0wNFifWmdVN15Fze5WC/view?usp=sharing
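A tiny empirical sketch of my own of Markov's inequality, P(X >= a) <= E[X]/a for nonnegative X, here with exponential samples:

```python
import random

# Markov's inequality: for X >= 0 and a > 0, P(X >= a) <= E[X] / a.
# Empirical check with X ~ Exponential(rate 1), so E[X] = 1.
a, trials = 3.0, 100000
samples = [random.expovariate(1.0) for _ in range(trials)]

print(sum(x >= a for x in samples) / trials)  # ~ exp(-3) ~= 0.05
print(sum(samples) / trials / a)              # Markov bound ~= 0.33
```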
Sunday, July 7, 2019
The simplest Monte Carlo evaluation of a policy value https://drive.google.com/file/d/1bVcrAV9WsQt2oyZAoaW2kOUzv5isLGvV/view?usp=sharing
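A sketch of my own of the simplest Monte Carlo policy evaluation: average the discounted returns of episodes generated under the fixed policy. The two-state chain below is a made-up toy, not the example from the slides:

```python
import random

# Under the fixed policy, state 0 yields reward 1 and stays in state 0 with
# probability 0.9, otherwise the episode terminates. Estimate V(0) by
# averaging discounted episode returns.
gamma = 0.95

def episode_return():
    ret, discount, done = 0.0, 1.0, False
    while not done:
        ret += discount * 1.0
        discount *= gamma
        done = random.random() >= 0.9   # terminate with probability 0.1
    return ret

n = 20000
estimate = sum(episode_return() for _ in range(n)) / n
print(estimate)   # ~ 1 / (1 - 0.9 * gamma) ~= 6.90
```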
Tuesday, July 2, 2019
Following our meeting yesterday, promised book reference on convex optimization https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf