logs14: Normalize Reward
Higepon Taro Minowa edited this page May 7, 2018
According to "Why do we normalize the discounted rewards when doing policy gradient reinforcement learning?" on Data Science Stack Exchange, we should standardize the discounted rewards so that roughly half of the actions receive a positive learning signal and the other half a negative one.
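A minimal sketch of this standardization, assuming NumPy and a discount factor gamma; the helper name and defaults below are illustrative, not code from this repo:

```python
import numpy as np

def discount_and_normalize(rewards, gamma=0.99):
    """Compute discounted returns for one episode, then standardize
    them to zero mean and unit variance, so roughly half of the
    resulting advantages are positive and half are negative."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    # Accumulate discounted sums from the end of the episode backwards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Standardize: subtract the mean, divide by the standard deviation.
    returns -= returns.mean()
    returns /= returns.std() + 1e-8  # epsilon guards against a zero std
    return returns
```

The standardized returns would then be used as the per-step weights on the policy-gradient loss in place of the raw discounted rewards.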
2: Thinking out loud - e.g. hypotheses about the current problem, what to work on next, how can I verify
3: A record of currently ongoing runs along with a short reminder of what question each run is supposed to answer
- run1: title
4: Results of runs (TensorBoard graphs, any other significant observations), separated by type of run (e.g. by the environment the agent is being trained in)
- hparams
- mega.nz directory: 20180430rl_test_medium7
seq2seq: (TensorBoard graphs; images not recovered)

RL: (TensorBoard graphs; images not recovered)





