In the context of reinforcement learning, we show that a specific Monte Carlo control scheme improves the policy monotonically, meaning that no new policy is worse than the previous one, provided that Q(a, pi) is well estimated during the exploration stage.
https://drive.google.com/file/d/11Aa92Mr3nMF1Gxa5r0kIiHfg-9wn_rkI/view?usp=sharing
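The linked note is not reproduced here, so what follows is only a minimal sketch, built on our own assumptions, of the mechanism the claim relies on. The loop has three parts:

1. Estimate the Q function of the current policy from Monte Carlo returns, using exploring starts as the exploration stage.
2. Act greedily with respect to that estimate.
3. Check with exact policy evaluation that the value never goes down.

Several details are hypothetical choices made for illustration: the three-state chain MDP, the slip probability, the discount, the horizon, the episode count and every function name. The code also writes Q with an explicit state argument, Q(s, a), rather than Q(a, pi).

```python
import random

# Hypothetical toy MDP (an assumption, not taken from the linked note):
# non-terminal states 0..N_STATES-1 on a line, terminal goal state GOAL = N_STATES.
# Action 0 = left, 1 = right; with probability SLIP the agent moves the other way.
# Entering the goal gives reward 1 and ends the episode; all other rewards are 0.
N_STATES, SLIP, GAMMA, HORIZON = 3, 0.2, 0.9, 100
ACTIONS = (0, 1)
GOAL = N_STATES


def transitions(s, a):
    """Return [(prob, next_state, reward)]: intended move first, slip second."""
    out = []
    for move, p in ((a, 1.0 - SLIP), (1 - a, SLIP)):
        nxt = min(s + 1, GOAL) if move == 1 else max(s - 1, 0)
        out.append((p, nxt, 1.0 if nxt == GOAL else 0.0))
    return out


def exact_v(policy, tol=1e-12):
    """Iterative (in-place) policy evaluation of a deterministic policy."""
    v = [0.0] * (N_STATES + 1)  # v[GOAL] stays 0 because GOAL is terminal
    while True:
        delta = 0.0
        for s in range(N_STATES):
            new = sum(p * (r + GAMMA * v[n]) for p, n, r in transitions(s, policy[s]))
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < tol:
            return v[:N_STATES]


def exact_q(policy):
    """Exact Q^pi(s, a), computed from the exact state values of the policy."""
    v = exact_v(policy) + [0.0]  # append the terminal value for index GOAL
    return [[sum(p * (r + GAMMA * v[n]) for p, n, r in transitions(s, a))
             for a in ACTIONS] for s in range(N_STATES)]


def mc_q(policy, episodes, rng):
    """Exploration stage: exploring starts from every (s, a), then follow policy.
    The sample mean of discounted returns estimates Q^pi(s, a)."""
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for s0 in range(N_STATES):
        for a0 in ACTIONS:
            total = 0.0
            for _ in range(episodes):
                s, a, g, disc = s0, a0, 0.0, 1.0
                for _ in range(HORIZON):  # truncation bias is at most GAMMA**HORIZON
                    (p_main, n_main, r_main), (_, n_slip, r_slip) = transitions(s, a)
                    n, r = (n_main, r_main) if rng.random() < p_main else (n_slip, r_slip)
                    g += disc * r
                    disc *= GAMMA
                    if n == GOAL:
                        break
                    s, a = n, policy[n]
                total += g
            q[s0][a0] = total / episodes
    return q


def greedy(q):
    """Policy improvement step: pick the action with the largest estimated Q."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


def mc_control(iterations=4, episodes=2000, seed=0):
    """Alternate MC estimation and greedy improvement; record exact values."""
    rng = random.Random(seed)
    policy = [0] * N_STATES  # start from "always move left"
    values = [exact_v(policy)]
    for _ in range(iterations):
        policy = greedy(mc_q(policy, episodes, rng))
        values.append(exact_v(policy))
    return policy, values


if __name__ == "__main__":
    final_policy, history = mc_control()
    for i, v in enumerate(history):
        print(i, [round(x, 4) for x in v])
    print("final policy:", final_policy)
```

With enough episodes per state-action pair, the estimated Q stays close to the exact one. The greedy step then abandons the current action only when another action is genuinely better. As a result, the exact values are non-decreasing from one iteration to the next. With very few episodes, a badly estimated Q can lead the greedy step to a worse action. That is the case the "well estimated" condition rules out.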