
logs:33 Long RL run

Higepon Taro Minowa edited this page Jul 9, 2018 · 4 revisions
| Log Type | Detail |
| --- | --- |
| 1: What specific output am I working on right now? | Run RL with reward_qi + reward_s and see whether the reward goes up. |
| 2: Thinking out loud: hypotheses about the current problem, what to work on next, how can I verify | Just watch the TensorBoard graph. |
| 3: A record of currently ongoing runs along with a short reminder of what question each run is supposed to answer | Did the reward go up? Do the answers look okay? |
| 4: Results of runs and conclusion | The reward was flat; it didn't go up. |
| 5: Next steps | |
| 6: mega.nz | 20180709140004_rl_test |

In the middle of the run, the results were actually good.

ありがとうございます🙇🏻‍♀️💓こちらこそありがとうございます❇︎これからよろしくお願いします(^^)[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
[seq2] : こちら♡です!!!!!!!!!!!!!!!!!!!!!!!! -0.43 => (-16.16) <= -15.73
[RL greedy] : いえいえ(^^)よろしくお願いします🙇 -2.77 =>
[RL sample]: いえいえ(^^)よろしくお願いします🙇 -6.82 => (-20.74) <= -13.92

But later the responses became very short.

なのにもう汚いお勉強、えらいな![PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
[seq2] : ありがとう!!!!!!!!!!!!!!!!!!!!!!!!!! -0.23 => (-19.05) <= -18.82
[RL greedy] : 痛かっゆこ -5.23 =>
[RL sample]: エアコン -7.77 => (-17.96) <= -10.19

By the end, the output was not even human-readable and the reward was very low.

PS4とモンハンで計5万はやばいセットのやつ?[PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
[seq2] : 💩の💩のやつ 💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩💩 -2.55 => (-8.05) <= -5.50
[RL greedy] : 一室ジャポニカ|||)|||)⊂)⊂)⊂)⊂)⊂)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||) -1.41 =>
[RL sample]: アサデス山中湖|||)|||)⊂)⊂)⊂)⊂)⊂)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||)`|||) -13.26 => (-20.16) <= -6.90
reward_qi size= 64 28
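
In every sample above, the number in parentheses equals the sum of the two numbers around it (for example, -0.43 + -15.73 = -16.16). That would be consistent with a total reward built from two terms such as reward_qi and reward_s, but this reading is an assumption, and the log does not say which side is which. A minimal sketch for checking the pattern on pasted log text follows; the helper names `parse_triples` and `sums_match` are hypothetical and not part of the project.

```python
import re

# Matches the "a => (total) <= b" triple seen in the [seq2] and [RL sample] lines.
# Treating a and b as two reward terms is an assumption, not confirmed by the log.
TRIPLE = re.compile(r"(-?\d+\.\d+) => \((-?\d+\.\d+)\) <= (-?\d+\.\d+)")


def parse_triples(log_text):
    """Return a (left, total, right) float tuple for every triple in log_text."""
    return [tuple(float(x) for x in m.groups()) for m in TRIPLE.finditer(log_text)]


def sums_match(triples, tol=0.011):
    """True if left + right equals total, within rounding, for every triple."""
    return all(abs(left + right - total) <= tol for left, total, right in triples)


# Numeric parts of the three samples above (response text omitted).
samples = (
    "-0.43 => (-16.16) <= -15.73  -6.82 => (-20.74) <= -13.92 "
    "-0.23 => (-19.05) <= -18.82  -7.77 => (-17.96) <= -10.19 "
    "-2.55 => (-8.05) <= -5.50  -13.26 => (-20.16) <= -6.90"
)
triples = parse_triples(samples)
print(len(triples), sums_match(triples))
```

The [RL greedy] lines carry only a single number followed by `=>`, so the pattern skips them.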
