Basics of Reinforcement Learning, the Easy Way
Aug 29, 2018
Update: The best way of learning and practicing Reinforcement Learning is by going to http://rl-lab.com
Reinforcement Learning (RL) is the study of an agent that interacts with an environment in order to maximize a cumulative reward.
An example of RL is an agent in a labyrinth trying to find its way out. The faster it finds the exit, the greater the reward it gets.
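To make the labyrinth example concrete, here is a minimal sketch of the agent-environment loop in Python. The maze layout, the function names (step, run_episode, random_policy) and the reward of -1 per step are all assumptions made for illustration; the per-step penalty is simply one way to make a faster exit yield a higher cumulative reward.

```python
import random

# Hypothetical 4x4 labyrinth: '#' is a wall, 'S' the start, 'E' the exit.
MAZE = [
    "S.#.",
    ".##.",
    "...E",
    "#...",
]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def find(symbol):
    """Return the (row, col) position of a symbol in the maze."""
    for r, row in enumerate(MAZE):
        c = row.find(symbol)
        if c != -1:
            return (r, c)
    raise ValueError(f"{symbol!r} not found in maze")


def step(state, action):
    """Environment: apply an action and return (next_state, reward, done)."""
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    inside = 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0])
    if not inside or MAZE[r][c] == "#":
        r, c = state  # hitting a wall or the border leaves the agent in place
    done = MAZE[r][c] == "E"
    return (r, c), -1.0, done  # assumed reward: -1 for every step taken


def run_episode(policy, max_steps=1000):
    """Agent: act until the exit is found, return the cumulative reward."""
    state = find("S")
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return total


def random_policy(state):
    """An agent that has learned nothing yet: pick any action at random."""
    return random.choice(list(MOVES))
```

Calling run_episode(random_policy) gives the cumulative reward of a purely random agent. A learning agent would try to improve on that number by choosing its actions more carefully.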
Markov Decision Process (MDP)
To describe this problem mathematically, we use a Markov Decision Process (MDP).
An MDP describes the environment as follows.
- An MDP is a tuple (S, A, P, R, γ) of states, actions, transition probabilities, rewards, and a discount factor.
- S is a finite set of states that describe the environment.
- A is a finite set of actions that the agent can take.
- P is a transition probability matrix that gives the probability of moving from one state to another.
- R is a set of rewards that depend on the…
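As a rough illustration, the tuple (S, A, P, R, γ) can be written down as a small Python data structure. This is only a sketch under stated assumptions: the class name MDP, the field names, and the two-state example are hypothetical. Keying the transitions by (state, action) follows the common textbook form of an MDP rather than anything stated above, and keying the rewards by (state, action) is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: list        # S: finite set of states
    actions: list       # A: finite set of actions
    transitions: dict   # P: (state, action) -> {next_state: probability}
    rewards: dict       # R: (state, action) -> reward (assumed form)
    gamma: float        # discount factor

    def validate(self):
        """Check that each transition distribution sums to 1 and gamma is in [0, 1]."""
        for (state, action), dist in self.transitions.items():
            total = sum(dist.values())
            if abs(total - 1.0) > 1e-9:
                raise ValueError(
                    f"P({state!r}, {action!r}) sums to {total}, expected 1"
                )
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")


# Hypothetical two-state environment: a corridor and the exit.
mdp = MDP(
    states=["corridor", "exit"],
    actions=["stay", "move"],
    transitions={
        ("corridor", "stay"): {"corridor": 1.0},
        ("corridor", "move"): {"corridor": 0.2, "exit": 0.8},
        ("exit", "stay"): {"exit": 1.0},
        ("exit", "move"): {"exit": 1.0},
    },
    rewards={
        ("corridor", "stay"): -1.0,
        ("corridor", "move"): -1.0,
        ("exit", "stay"): 0.0,
        ("exit", "move"): 0.0,
    },
    gamma=0.9,
)
mdp.validate()
```

The validate method only checks that the numbers form a well-defined MDP. It does not solve anything.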