Ziad SALLOUM
1 min read · Jan 27, 2020


Hi,
In the paragraph you mentioned from the previous article, “Math Behind Reinforcement Learning”, I had not yet introduced the policy 𝜋(a), which is the probability of taking a given action.
In this article there are 3 possible actions, each with its own probability of success and failure. I assumed that all three actions are equally likely to be taken (see the sentence “Suppose at first he considers using all actions equally in the attack”), so I divide by 3. If they were not equally likely, I would multiply each (Ri + 𝛄 V(Si)) by its 𝜋(ai), for i = 1 to 3.
V(S) would then look something like this (not exactly, but in general form):
V(S) = 𝜋(a1) * (R1 + 𝛄 V(S1)) + 𝜋(a2) * (R2 + 𝛄 V(S2)) + 𝜋(a3) * (R3 + 𝛄 V(S3))
but since 𝜋(a1) = 𝜋(a2) = 𝜋(a3) = 1/3:
V(S) = 1/3 * (R1 + 𝛄 V(S1)) + 1/3 * (R2 + 𝛄 V(S2)) + 1/3 * (R3 + 𝛄 V(S3))
V(S) = 1/3 * (R1 + R2 + R3) + 1/3 * 𝛄 * (V(S1) + V(S2) + V(S3))
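
If it helps, here is a minimal Python sketch of the same calculation. It only illustrates the general form above, not the exact computation from the article: the function name state_value and all the numbers (rewards, next-state values, 𝛄 = 0.9, and the non-uniform policy) are made up for the example.

```python
def state_value(policy, rewards, next_values, gamma):
    """General form: V(S) = sum over i of pi(a_i) * (R_i + gamma * V(S_i))."""
    return sum(p * (r + gamma * v)
               for p, r, v in zip(policy, rewards, next_values))


# Hypothetical numbers, chosen only for illustration.
rewards = [1.0, 2.0, 3.0]          # R1, R2, R3
next_values = [10.0, 20.0, 30.0]   # V(S1), V(S2), V(S3)
gamma = 0.9                        # discount factor

# Equally likely actions: pi(a1) = pi(a2) = pi(a3) = 1/3.
uniform_policy = [1 / 3, 1 / 3, 1 / 3]
v_uniform = state_value(uniform_policy, rewards, next_values, gamma)

# The factored form from the last line of the derivation.
v_factored = (1 / 3) * sum(rewards) + (1 / 3) * gamma * sum(next_values)

# A non-uniform policy (also hypothetical): each term keeps its own weight,
# so the 1/3 can no longer be factored out.
skewed_policy = [0.5, 0.3, 0.2]
v_skewed = state_value(skewed_policy, rewards, next_values, gamma)

print(v_uniform, v_factored, v_skewed)
```

With equal probabilities the general form and the factored form give the same value; with the skewed policy the result changes because each (Ri + 𝛄 V(Si)) is weighted by its own 𝜋(ai).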

I hope this makes it clearer.

Regards,
