Hello and thank you for your comment.

Jan 11, 2021

I believe this blog post will make it clearer: https://towardsdatascience.com/neural-fictitious-self-play-800612b4a53f

Anyway, in short: you have a list called Msl containing states and their best responses (the best action for each state). The problem is that as the number of states grows exponentially, this list becomes unmanageable. To solve this, you create a neural network that tries to "map" (approximate) each state to its best action.

So the inputs are the states and the outputs (labels) are the actions. When training this NN, you try to find the parameters (θ) such that when you feed the NN a state from this list (or a similar one), you get the best action for it. Hope this makes sense.
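
If it helps, here is a minimal sketch of that idea in plain Python. Everything in it is an assumption made for illustration: the toy contents of `M_SL`, the two-feature state encoding, the three action ids, and the helper names `policy`, `train` and `best_action`. The "network" is just a single linear layer with a softmax, standing in for a real neural net, and θ is fitted by gradient descent on the cross-entropy loss.

```python
import math

# Hypothetical toy memory: (state, best_action) pairs.
# States are 2-feature vectors; actions are integer ids 0..2.
M_SL = [
    ([1.0, 0.0], 0),
    ([0.9, 0.1], 0),
    ([0.0, 1.0], 1),
    ([0.1, 0.9], 1),
    ([1.0, 1.0], 2),
    ([0.9, 0.9], 2),
]
N_FEATURES, N_ACTIONS = 2, 3


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy(theta, state):
    """Action probabilities for a state under parameters theta."""
    x = state + [1.0]  # extra constant input acts as the bias term
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in theta]
    return softmax(logits)


def train(memory, epochs=2000, lr=0.5):
    """Fit theta so the stored best action gets high probability."""
    theta = [[0.0] * (N_FEATURES + 1) for _ in range(N_ACTIONS)]
    for _ in range(epochs):
        for state, action in memory:
            probs = policy(theta, state)
            x = state + [1.0]
            for a in range(N_ACTIONS):
                # Gradient of cross-entropy w.r.t. logit a: p_a - 1[a == label]
                grad = probs[a] - (1.0 if a == action else 0.0)
                for j in range(N_FEATURES + 1):
                    theta[a][j] -= lr * grad * x[j]
    return theta


def best_action(theta, state):
    """Pick the action with the highest predicted probability."""
    probs = policy(theta, state)
    return max(range(N_ACTIONS), key=lambda a: probs[a])


theta = train(M_SL)
```

After training, `best_action(theta, s)` can be called on a state from the list or on a nearby state that was never stored, which is the whole point of replacing the list with an approximator.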

Written by Ziad SALLOUM
