Dynamic Programming in Reinforcement Learning, the Easy Way
Update: The best way to learn and practice the Dynamic Programming method is by going to http://rl-lab.com/gridworld-dp
In the previous article “Basics of Reinforcement Learning, the Easy Way”, we said that the goal of Reinforcement Learning is to make the agent follow an optimal policy in order to maximize collected rewards.
To achieve this goal, we need to solve one of the following equations:
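The equations themselves appear to have been images that did not survive extraction. Given the surrounding text ("the maximum value that we can get at each state"), they are presumably the standard Bellman optimality equations for the state-value and action-value functions; a reconstruction in standard notation:

```latex
v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma\, v_*(s') \bigr]

q_*(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\bigl[ r + \gamma \max_{a'} q_*(s', a') \bigr]
```

Here $p(s', r \mid s, a)$ is the transition probability, $\gamma$ the discount factor, and the max expresses that the agent acts greedily with respect to future value.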
The idea is to know the maximum value that we can get at each state. This will allow us to determine what policy to follow.
We also said that Linear Algebra is not suitable for environments with a large number of states, so it cannot help us solve the above equations. For this reason, we turn to iterative methods. One such method is Dynamic Programming (DP).
Dynamic Programming Method
Important: to be able to apply DP to solve the RL problem, we need to know the transition probability matrix, as well as the reward system.
This might not always be the case in real world problems!
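To make "knowing the transition probability matrix and the reward system" concrete, here is a minimal sketch of what that knowledge could look like in code, for a hypothetical 3-state chain MDP (the states, actions, and numbers are illustrative, not from the article):

```python
# P[s][a] is a list of (probability, next_state, reward) tuples.
# This is the model DP assumes we have full access to.
P = {
    0: {"right": [(1.0, 1, 0.0)]},
    1: {"right": [(0.9, 2, 1.0),   # usually reach the goal, reward 1
                  (0.1, 0, 0.0)],  # sometimes slip back to the start
        "left":  [(1.0, 0, 0.0)]},
    2: {},  # terminal state: no actions available
}

# Sanity check: outgoing probabilities from each (state, action)
# pair must sum to 1 for this to be a valid transition model.
for s, actions in P.items():
    for a, transitions in actions.items():
        assert abs(sum(p for p, _, _ in transitions) - 1.0) < 1e-9
```

In a real-world problem we often do not have `P` written down like this, which is exactly the limitation the warning above points out.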
As mentioned, we will proceed with an iterative solution, because the analytical one is hard to obtain.
We start with a random initial value at each state, then pick a random policy to follow. The reason we need a policy is that computing any state-value function requires knowing how the agent behaves.
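The procedure just described, random initial values plus a fixed random policy, swept iteratively, can be sketched as follows. The MDP here is a hypothetical 4-state chain (my own illustrative example, not the article's environment):

```python
import random

GAMMA = 0.9                     # discount factor (assumed, not from the article)
STATES = [0, 1, 2, 3]           # state 3 is terminal
ACTIONS = ["left", "right"]

def step(state, action):
    """Deterministic transitions: every move costs -1 until the terminal state."""
    if state == 3:
        return state, 0.0
    nxt = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    return nxt, -1.0

random.seed(0)
V = {s: random.random() for s in STATES}              # random initial values
V[3] = 0.0                                            # terminal state is worth 0
policy = {s: random.choice(ACTIONS) for s in STATES}  # random fixed policy

# Repeatedly apply the Bellman expectation backup under the fixed
# policy until the state values stop changing.
for _ in range(1000):
    delta = 0.0
    for s in STATES[:-1]:
        nxt, reward = step(s, policy[s])
        new_v = reward + GAMMA * V[nxt]
        delta = max(delta, abs(new_v - V[s]))
        V[s] = new_v
    if delta < 1e-8:
        break
```

After convergence, `V` holds the value of each state *under that particular policy*; every non-terminal value is negative here because each move costs -1.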
(If you are wondering…