Monte Carlo in Reinforcement Learning, the Easy Way

In Dynamic Programming (DP) we saw that computing the value function for each state requires knowing the transition matrix as well as the reward system. But this is not always a realistic condition. Such a model may be available for some board games, but in video games and real-life problems such as self-driving cars, there is no way to know this information beforehand.
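
To make that dependency concrete, here is a minimal sketch of DP-style policy evaluation. The function name `policy_evaluation`, the two-state chain, and the values of `gamma` and `tol` are all assumptions made up for illustration, not something taken from the DP article. The point is only that the update cannot be written without `P` and `R` in hand.

```python
def policy_evaluation(P, R, gamma=0.9, tol=1e-8):
    """Iterative policy evaluation for a fixed policy (illustrative sketch).

    P[s][s2] : probability of moving from state s to s2 under the policy
    R[s]     : expected immediate reward in state s under the policy
    """
    n = len(R)
    V = [0.0] * n
    while True:
        # Bellman expectation backup: every term uses the known model (P and R).
        V_new = [
            R[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(n))
            for s in range(n)
        ]
        # Stop once no state value changed by more than tol.
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new


# Hypothetical 2-state chain: state 0 always moves to state 1,
# and state 1 is absorbing with zero reward.
P = [[0.0, 1.0],
     [0.0, 1.0]]
R = [1.0, 0.0]

V = policy_evaluation(P, R)
print(V)  # [1.0, 0.0]
```

If `P` and `R` are unknown, this loop has nothing to iterate over, which is exactly the situation Monte Carlo methods are meant to handle.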

If you recall the formula of the State-Value function from the “Math Behind Reinforcement Learning” article: