
logs14:Normalize Reward

Higepon Taro Minowa edited this page May 7, 2018 · 3 revisions

Normalize Reward

1: What specific output am I working on right now?

According to Why do we normalize the discounted rewards when doing policy gradient reinforcement learning? - Data Science Stack Exchange, we should standardize the discounted rewards (subtract the mean, divide by the standard deviation) so that roughly half of the actions get a positive signal and the other half a negative one.
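A minimal sketch of what that standardization looks like (the function name and gamma value are my own, not from the runs logged here): compute the discounted returns backwards through the episode, then shift and scale them to zero mean and unit standard deviation.

```python
import numpy as np

def discount_and_normalize(rewards, gamma=0.99):
    """Discount rewards over an episode, then standardize them."""
    discounted = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    # Standardize: after this, roughly half the returns are positive
    # and half negative, which centers the policy-gradient signal.
    discounted -= discounted.mean()
    discounted /= discounted.std() + 1e-8  # epsilon avoids division by zero
    return discounted
```

The epsilon guard matters for episodes where every return is identical (e.g. all-zero rewards), where the standard deviation would otherwise be zero.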

2: Thinking out loud - e.g. hypotheses about the current problem, what to work on next, how can I verify

3: A record of currently ongoing runs along with a short reminder of what question each run is supposed to answer

  • run1: title

4: Results of runs (TensorBoard graphs, any other significant observations), separated by type of run (e.g. by the environment the agent is being trained in)

run1
  • hparams
  • mega.nz directory: 20180430rl_test_medium7