Dynamic Programming in Reinforcement Learning, the Easy Way
Update: The best way to learn and practice the Dynamic Programming method is by going to http://rl-lab.com/gridworld-dp
In the previous article “Basics of Reinforcement Learning, the Easy Way”, we said that the goal of Reinforcement Learning is to make the agent follow an optimal policy in order to maximize collected rewards.
To achieve this goal, we need to solve one of the following equations:
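The equations themselves appear to have been images that did not survive extraction. Given the surrounding text ("the maximum value that we can get at each state"), they are presumably the standard Bellman optimality equations for the state-value and action-value functions; a reconstruction in standard notation:

```latex
v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_*(s') \bigr]

q_*(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma \max_{a'} q_*(s', a') \bigr]
```

Here $p(s', r \mid s, a)$ is the transition probability, $\gamma$ the discount factor, and the max expresses that the agent acts greedily with respect to future value.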
The idea is to know the maximum value that we can get at each state. This will allow us to determine what policy to follow.
We also said that Linear Algebra is not suitable for environments with a large number of states, so it cannot help us solve the above equations. For this reason, we turn to iterative methods. One such method is Dynamic Programming (DP).
Dynamic Programming Method
Important: to be able to apply DP to solve the RL problem, we need to know the transition probability matrix, as well as the reward system.
This might not always be the case in real world problems!
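To make "knowing the transition probability matrix and the reward system" concrete, here is a minimal sketch of what that knowledge could look like in code, for a hypothetical 3-state chain MDP (the states, actions, and numbers are illustrative, not from the article):

```python
# P[s][a] is a list of (probability, next_state, reward) tuples.
# This is the model DP assumes we have full access to.
P = {
    0: {"right": [(1.0, 1, 0.0)]},
    1: {"right": [(0.9, 2, 1.0),   # usually reach the goal, reward 1
                  (0.1, 0, 0.0)],  # sometimes slip back to the start
        "left":  [(1.0, 0, 0.0)]},
    2: {},  # terminal state: no actions available
}

# Sanity check: outgoing probabilities from each (state, action)
# pair must sum to 1 for this to be a valid transition model.
for s, actions in P.items():
    for a, transitions in actions.items():
        assert abs(sum(p for p, _, _ in transitions) - 1.0) < 1e-9
```

In a real-world problem we often do not have `P` written down like this, which is exactly the limitation the warning above points out.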
As mentioned, we will proceed with an iterative solution, because the analytical one is hard to obtain.
We start with a random initial value at each state, then pick a random policy to follow. The reason we need a policy is that computing any state-value function requires knowing how the agent behaves.
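The procedure just described, random initial values plus a fixed random policy, swept iteratively, can be sketched as follows. The MDP here is a hypothetical 4-state chain (my own illustrative example, not the article's environment):

```python
import random

GAMMA = 0.9                     # discount factor (assumed, not from the article)
STATES = [0, 1, 2, 3]           # state 3 is terminal
ACTIONS = ["left", "right"]

def step(state, action):
    """Deterministic transitions: every move costs -1 until the terminal state."""
    if state == 3:
        return state, 0.0
    nxt = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    return nxt, -1.0

random.seed(0)
V = {s: random.random() for s in STATES}              # random initial values
V[3] = 0.0                                            # terminal state is worth 0
policy = {s: random.choice(ACTIONS) for s in STATES}  # random fixed policy

# Repeatedly apply the Bellman expectation backup under the fixed
# policy until the state values stop changing.
for _ in range(1000):
    delta = 0.0
    for s in STATES[:-1]:
        nxt, reward = step(s, policy[s])
        new_v = reward + GAMMA * V[nxt]
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < 1e-8:
        break
```

After convergence, `V` holds the value of each state *under that particular policy*; every non-terminal value is negative here because each move costs -1.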
(If you are wondering…