Monte Carlo in Reinforcement Learning, the Easy Way
Update: The best way to learn and practice the Monte Carlo method is to go to http://rl-lab.com/gridworld-mc
In Dynamic Programming (DP) we have seen that in order to compute the value function of each state, we need to know the transition matrix as well as the reward system. But this is not always a realistic assumption. It may be possible to have such information in some board games, but in video games and real-life problems such as self-driving cars there is no way to know it beforehand.
If you recall the formula of the State-Value function from the “Math Behind Reinforcement Learning” article:
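As a reminder, the state-value function under a policy π is given by the Bellman expectation equation:

$$v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\big[r + \gamma\, v_\pi(s')\big]$$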
It is not possible to compute V(s), because p(s’,r|s,a) is now unknown to us.
Always keep in mind that our goal is to find the policy that maximizes the reward for an agent. We said in previous articles that an analytical solution is hard to obtain, so we fall back on iterative solutions such as Dynamic Programming. However, DP has its own problems, as mentioned above.
An alternative solution is to play a large enough number of episodes of the game and extract the information needed. Notice that in DP we didn’t play the game, because we knew its dynamics: at each state we knew the probabilities of moving to another state when taking a certain action, and we knew what the reward was going to be. Based on that, we were able to do our calculations. In this new scenario, we won’t know these data unless we play the game.
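To make this concrete, here is a minimal sketch of first-visit Monte Carlo prediction: play episodes under a fixed policy, then average the returns observed after the first visit to each state. The env.reset() / env.step() interface and the policy(state) function are assumptions for illustration, not code from this article.

```python
from collections import defaultdict

def mc_prediction(env, policy, num_episodes, gamma=1.0):
    """Estimate V(s) by averaging the returns that follow the
    first visit to each state, over many sampled episodes."""
    returns_sum = defaultdict(float)   # sum of returns observed for each state
    returns_count = defaultdict(int)   # how many returns were observed per state
    V = defaultdict(float)             # current value estimate per state

    for _ in range(num_episodes):
        # Generate one full episode by following the given policy.
        # Assumed interface: env.reset() -> state,
        # env.step(action) -> (next_state, reward, done).
        episode = []                   # list of (state, reward) pairs
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            episode.append((state, reward))
            state = next_state

        # Index of the first time each state appears in this episode.
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            first_visit.setdefault(s, t)

        # Walk the episode backwards, accumulating the discounted return G.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, r = episode[t]
            G = gamma * G + r
            if first_visit[s] == t:    # only count the first visit to s
                returns_sum[s] += G
                returns_count[s] += 1
                V[s] = returns_sum[s] / returns_count[s]

    return V
```

No transition probabilities or reward model appear anywhere in this sketch; the estimate comes purely from the sampled episodes, which is exactly what distinguishes Monte Carlo from DP.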