Ziad SALLOUM
1 min readJul 5, 2019

--

Hello Ning Wang
It is all a matter of convention. Some consider the reward when the agent arrives to the state, others when leaving it. Both are right as long as you stick to one convention. There is not absolute truth in here, but a relative one in which we try to estimate the value of one state compared to others. The value of the state by itself is useless unless it is compared to others. Hope this is clearer now. Thank you.

--

--

No responses yet