“We should regret our mistakes and learn from them, but never carry them forward into the future with us.”
Lucy Maud Montgomery
Learning from regrets is what Counterfactual Regret Minimization is all about.
The notion of “regret” is introduced in the article “Introduction to Regret in Reinforcement Learning”. However, that article only considers scenarios or games composed of a single step or action. This is not realistic enough, because most real-world scenarios are composed of multiple steps.
It is clear that in every aspect of life, each decision might have a long term impact, and its effect might not be…
“In the end, we only regret the chances we didn’t take”
Almost certainly, every human has regretted something (actually many things) during their lifetime: not buying a ticket when the price was still affordable, not taking a career decision, not making a personal or social move, and so on. Of course, regret has a bitter taste, and even though it can be instructive, the reality is that the opportunity is often lost and there is no turning back.
But that might not be quite the case when training a machine or an algorithm.
This article describes an implementation of Neural Fictitious Self-Play (NFSP) for Leduc Hold’em poker, based on code by Eric Steinberger. The full source code can be found in his GitHub repository.
Disclaimer: this article does not aim to explain every detail of the implementation code, but rather to highlight its main lines and the mapping between the code and the academic solution.
The implementation involves distributed computation which…
If you are a developer without a strong background in math, you might have a hard time grasping the basic formula of Reinforcement Learning.
Policy is a somewhat tricky concept, mainly for Reinforcement Learning beginners. This article will try to clarify the topic in plain and simple English, away from mathematical notions. It is written with developers in mind.
If you have ever heard of best practices or guidelines, then you have heard about policy. Consider, for example, fire safety guidelines for people living in high-rise buildings. Probably the most important guideline is not to use the elevator during a fire. People should close doors, store water, use wet sheets, and make their position known to firefighters.
These series of actions are issued by guidelines…
This article is based on a scientific paper by Heinrich & Silver that introduces the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge.
Using Reinforcement Learning in a zero-sum game requires more involved methods than standard Fictitious Play. Standard Fictitious Play applies to Normal Form Games, which do not model time, so another approach is needed. This article is based on the paper “Fictitious Self-Play in Extensive-Form Games” by Johannes Heinrich & David Silver.
Normal Form Games are modelled as a table in which the actions (called strategies) of each player are the row and column headers, and each cell contains the payoffs resulting from the strategies chosen by the players.
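As a rough illustration of such a table, here is a minimal Python sketch. The game (matching pennies) and all names in it, such as `PAYOFF_TABLE`, `payoff`, `is_zero_sum`, and `best_response_for_row`, are assumptions chosen for this example and do not come from the paper or from any library.

```python
# Hypothetical example: matching pennies as a normal form game.
# Rows are player 1's strategies, columns are player 2's strategies.
ROW_STRATEGIES = ("heads", "tails")
COL_STRATEGIES = ("heads", "tails")

# Each cell holds (payoff to player 1, payoff to player 2).
# Player 1 wins when the coins match, player 2 wins when they differ.
PAYOFF_TABLE = {
    ("heads", "heads"): (1, -1),
    ("heads", "tails"): (-1, 1),
    ("tails", "heads"): (-1, 1),
    ("tails", "tails"): (1, -1),
}


def payoff(row_strategy, col_strategy):
    """Look up the cell selected by the two players' strategies."""
    return PAYOFF_TABLE[(row_strategy, col_strategy)]


def is_zero_sum(table):
    """A game is zero-sum if the payoffs in every cell sum to zero."""
    return all(p1 + p2 == 0 for p1, p2 in table.values())


def best_response_for_row(col_strategy):
    """Row strategy with the highest payoff against a fixed column strategy."""
    return max(ROW_STRATEGIES, key=lambda r: payoff(r, col_strategy)[0])


if __name__ == "__main__":
    print(payoff("heads", "tails"))
    print(is_zero_sum(PAYOFF_TABLE))
    print(best_response_for_row("tails"))
```

A dictionary keyed by strategy pairs keeps the sketch close to the "rows and columns" picture; a 2-D array would work just as well for larger games.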
A first step toward understanding Self-Play in Reinforcement Learning
Fictitious play is a game theory concept. It consists of analyzing the game to figure out the best strategy to adopt when facing an opponent in a zero-sum game.
This is usually a heavy subject, so we will start with some important definitions and then explain the Fictitious Play algorithm.
A zero-sum game is a game where the points gained by one player are the points lost by the other(s), in such a way that the sum of all points attributed to the players is equal to…
When you are new to Reinforcement Learning, you will no doubt be bombarded with weird terms like Model-Based, Model-Free, On-Policy, Off-Policy, etc.
Soon you will find it exhausting to keep track of this terminology, which seems to appear all over the place without any obvious link between its terms.
This article will try to put all these terms into perspective so that beginners don’t feel overwhelmed.
Disclaimer: this article assumes that you already know what Reinforcement Learning is and are familiar with some of the existing algorithms. …
Before delving into the details of the actor-critic, let’s remind ourselves of the Policy Gradient.
What does it mean to have policy-based reinforcement learning?
To put it simply, imagine that a robot finds itself in some situation that appears similar to something it has experienced before.
So the policy-based method says: since I took action (a) in this particular situation in the past, let’s try the same action this time too.
PS. Don’t confuse similar situations with identical states: in a similar situation the robot or agent is in some…