Monte Carlo in Reinforcement Learning, the Easy Way
Update: The best way to learn and practice the Monte Carlo method is to go to http://rl-lab.com/gridworld-mc
In Dynamic Programming (DP) we have seen that in order to compute the value function of each state, we need to know the transition matrix as well as the reward system. But this is not always a realistic assumption. It may be possible to have such information in some board games, but in video games and real-life problems such as self-driving cars there is no way to know it beforehand.
If you recall the formula of the State-Value function from the “Math Behind Reinforcement Learning” article:
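As a reminder, the state-value function under a policy π is given by the Bellman expectation equation:

$$v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\big[r + \gamma\, v_\pi(s')\big]$$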
It is not possible to compute V(s), because p(s’,r|s,a) is now unknown to us.
Always keep in mind that our goal is to find the policy that maximizes the reward for an agent. We said in previous articles that an analytical solution is hard to obtain, so we fall back on iterative solutions such as Dynamic Programming. However, DP has its own problems, as mentioned above.
An alternative solution is to play a large enough number of episodes of the game and extract the information needed. Notice that in DP we didn’t play the game, because we knew its dynamics: at each state we knew the probabilities of moving to another state when taking a certain action, and we knew what the reward was going to be. Based on that, we were able to do our calculations. In this new scenario, we won’t know these data unless we play the game.
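To make this concrete, here is a minimal sketch of first-visit Monte Carlo prediction: play episodes under a fixed policy, then average the returns observed after the first visit to each state. The env.reset() / env.step() interface and the policy(state) function are assumptions for illustration, not code from this article.

```python
from collections import defaultdict

def mc_prediction(env, policy, num_episodes, gamma=1.0):
    """Estimate V(s) by averaging the returns that follow the
    first visit to each state, over many sampled episodes."""
    returns_sum = defaultdict(float)   # sum of returns observed for each state
    returns_count = defaultdict(int)   # how many returns were observed per state
    V = defaultdict(float)             # current value estimate per state

    for _ in range(num_episodes):
        # Generate one full episode by following the given policy.
        # Assumed interface: env.reset() -> state,
        # env.step(action) -> (next_state, reward, done).
        episode = []                   # list of (state, reward) pairs
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            episode.append((state, reward))
            state = next_state

        # Index of the first time each state appears in this episode.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)

        # Walk the episode backwards, accumulating the discounted return G.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = gamma * G + r
            if first_visit[s] == t:    # only count the first visit to s
                returns_sum[s] += G
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]

    return V
```

No transition probabilities or reward model appear anywhere in this sketch; the estimate comes purely from the sampled episodes, which is exactly what distinguishes Monte Carlo from DP.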