Ziad SALLOUM
Nov 26, 2019

Hello Siddharth Singi,
If I understood your question correctly, Double Q-learning won’t help you here.
Double Q-learning, as you can see from the graphs in the article, is mostly about convergence performance.
If you describe your problem the way you did in your question, all Double Q-learning tells you is that going from A to C is better than going to B. It just tells you that faster than the standard Q-learning algorithm does.
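To make this concrete, here is a minimal tabular sketch on a hypothetical three-state problem. The states, rewards and hyper-parameters are my assumptions, not the setup from your question. Both learners end up preferring the route via C. Double Q-learning only changes how the bootstrap target is computed: one table picks the best next action and the other table scores it. That changes how the estimates evolve, not which action wins.

```python
import random
from collections import defaultdict

# Hypothetical toy problem (an assumption, not the setup from the question):
# from A you can move to B or C with reward 0; from B or C a single "end"
# action finishes the episode with reward 0.5 (via B) or 1.0 (via C).
TRANSITIONS = {
    "A": {"to_B": ("B", 0.0), "to_C": ("C", 0.0)},
    "B": {"end": (None, 0.5)},
    "C": {"end": (None, 1.0)},
}


def actions(state):
    return list(TRANSITIONS[state])


def greedy(value, state):
    # Ties go to the first listed action.
    return max(actions(state), key=lambda a: value(state, a))


def epsilon_greedy(value, state, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(actions(state))
    return greedy(value, state)


def q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)

    def value(s, a):
        return q[(s, a)]

    for _ in range(episodes):
        state = "A"
        while state is not None:
            action = epsilon_greedy(value, state, epsilon, rng)
            next_state, reward = TRANSITIONS[state][action]
            target = reward
            if next_state is not None:
                # One table both picks and scores the best next action.
                target += gamma * max(q[(next_state, a)] for a in actions(next_state))
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return value


def double_q_learning(episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    qa, qb = defaultdict(float), defaultdict(float)

    def value(s, a):
        # Behave (and report) using the sum of both tables.
        return qa[(s, a)] + qb[(s, a)]

    for _ in range(episodes):
        state = "A"
        while state is not None:
            action = epsilon_greedy(value, state, epsilon, rng)
            next_state, reward = TRANSITIONS[state][action]
            # Coin flip: one table is updated and picks the best next action,
            # the other table scores that action.
            upd, ev = (qa, qb) if rng.random() < 0.5 else (qb, qa)
            target = reward
            if next_state is not None:
                best = max(actions(next_state), key=lambda a: upd[(next_state, a)])
                target += gamma * ev[(next_state, best)]
            upd[(state, action)] += alpha * (target - upd[(state, action)])
            state = next_state
    return value


for name, learn in [("Q-learning", q_learning), ("Double Q-learning", double_q_learning)]:
    print(name, "prefers", greedy(learn(), "A"))
```

This sketch does not measure speed. It only shows that both algorithms arrive at the same greedy choice from A.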
The algorithm does not exploit “gaps” in the problem.
Algorithms that have the “ability to exploit gaps” (and I am being really optimistic here) are policy-based methods (https://towardsdatascience.com/policy-gradient-step-by-step-ac34b629fd55)
or self-play (https://towardsdatascience.com/introduction-to-fictitious-play-12a8bc4ed1bb).

I hope this answers your question.
Regards