logs:33 Long RL run

Higepon Taro Minowa edited this page Jul 9, 2018 · 4 revisions
| Log Type | Detail |
| --- | --- |
| 1: What specific output am I working on right now? | Run reward_qi + reward_s RL and see if the reward goes up. |
| 2: Thinking out loud (hypotheses about the current problem, what to work on next, how to verify) | Just check the TensorBoard graph. |
| 3: A record of currently ongoing runs, with a short reminder of what question each run is supposed to answer | Did the reward go up? Does the answer look okay? |
| 4: Results of runs and conclusion | The reward was flat; it didn't go up. |
| 5: Next steps | |
| 6: mega.nz | |
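The run fine-tunes the seq2seq model with a policy-gradient objective whose reward is the sum of two terms, reward_qi and reward_s. A minimal REINFORCE-style sketch of that idea (all function names and numbers here are hypothetical illustrations, not the author's actual training code):

```python
import numpy as np

def reinforce_loss(logprobs, rewards, baseline=0.0):
    """Minimal REINFORCE sketch (hypothetical, not the actual code).

    logprobs: per-sample sequence log-probabilities (summed over tokens).
    rewards:  per-sample scalar rewards, e.g. reward_qi + reward_s.
    Returns the loss to minimize: -mean((R - baseline) * log p(reply)),
    so minimizing it raises the probability of high-reward replies.
    """
    logprobs = np.asarray(logprobs, dtype=float)
    advantages = np.asarray(rewards, dtype=float) - baseline
    return -np.mean(advantages * logprobs)

# Toy usage: two sampled replies with made-up log-probs and rewards.
loss = reinforce_loss(logprobs=[-6.82, -2.77], rewards=[1.0, 0.5], baseline=0.25)
```

If the combined reward is well shaped, this loss should fall and the mean reward rise, which is exactly what the TensorBoard graph is checked for above.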

In the middle of the run, there were actually some good results:

- Source: ありがとうございます🙇🏻‍♀️💓こちらこそありがとうございます❇︎これからよろしくお願いします(^^) ("Thank you 🙇🏻‍♀️💓 Thank you too ❇︎ I look forward to working with you (^^)") [PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD][PAD]
- [seq2]: こちら♡です!!!!!!!!!!!!!!!!!!!!!!!! ("Here ♡ it is!!!") -0.43 => (-16.16) <= -15.73
- [RL greedy]: いえいえ(^^)よろしくお願いします🙇 ("Not at all (^^), nice to meet you 🙇") -2.77 =>
- [RL sample]: いえいえ(^^)よろしくお願いします🙇 (same reply) -6.82 => (-20.74) <= -13.92

(TensorBoard reward graphs for the seq2seq and RL runs)

| hparam | seq2seq (src) | RL (dst) |
| --- | --- | --- |
| machine | client2 | client2 |
| batch_size | 64 | 64 |
| num_units | 512 | 512 |
| num_layers | 2 | 2 |
| vocab_size | 5000 | 5000 |
| embedding_size | 256 | 256 |
| learning_rate | 0.5 | 0.1 |
| learning_rate_decay | 0.99 | 0.99 |
| use_attention | True | True |
| encoder_length | 28 | 28 |
| decoder_length | 28 | 28 |
| max_gradient_norm | 5.0 | 5.0 |
| beam_width | 0 | 0 |
| num_train_steps | 1560 | 1560 |
| model_path | model/tweet_large | model/tweet_large_rl |
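The two hparam sets are identical except for the learning rate and the model path, which a small sketch makes explicit (a hypothetical reconstruction of the logged dicts, not the author's config code):

```python
# Hyperparameters as logged for the source seq2seq run.
seq2seq_hparams = {
    'machine': 'client2', 'batch_size': 64, 'num_units': 512,
    'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256,
    'learning_rate': 0.5, 'learning_rate_decay': 0.99,
    'use_attention': True, 'encoder_length': 28, 'decoder_length': 28,
    'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560,
    'model_path': 'model/tweet_large',
}

# The RL run reuses everything, overriding only two keys.
rl_hparams = dict(seq2seq_hparams,
                  learning_rate=0.1,
                  model_path='model/tweet_large_rl')

# Keys whose values differ between the two runs.
diff = sorted(k for k in seq2seq_hparams
              if seq2seq_hparams[k] != rl_hparams[k])
print(diff)  # → ['learning_rate', 'model_path']
```

Lowering the learning rate for the RL phase (0.5 → 0.1) is a common choice when fine-tuning a pretrained model, since large policy-gradient updates can quickly destroy the seq2seq initialization.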
