logs15:See if new RL is working
- In logs14:Normalize Reward we fixed the reward logic; now we want to confirm the fix actually works.
2: Thinking out loud - e.g. hypotheses about the current problem, what to work on next, how can I verify
Repeat some of the trials from logs11:See if RL works with medium model2 and check whether avg_reply_len changes.
3: A record of currently ongoing runs along with a short reminder of what question each run is supposed to answer
- run1 & run2: basic run.
- run3 & run4: basic run with nan fix
  - Fixed a division by zero when standardizing the reward.
  - Fixed some issues found in run3 and run4:
    - fixed the graph title
    - the average reward is almost zero by definition once standardized, so we should log it before normalizing
    - and run again
- run5 & run6: basic run with the fixes above
  - In run5, all replies have length 0, which is really weird. Investigating.
    - This turned out to be because a zero-length reply gets zero loss, so I changed the reward logic.
- run7
  - See what happens if we skip seq2seq training and set reward = 1 only when len = 3.
  - The agent never received any reward, so it failed.
- run8
  - Adjusted run7 so that the agent can get some reward.
    - Reward is given when len >= encoder_len - 2.
  - The agent gets reward from the beginning.
    - Still didn't work.
    - I found a wrong setup in the current implementation.
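The length-based reward variants tried in these runs can be sketched as simple functions. This is a hypothetical sketch, not the repo's actual code; `encoder_len = 28` is taken from the configs recorded below, and the function names are illustrative:

```python
def reward_run7(reply_len):
    # run7: reward = 1 only for the exact length 3.
    # Too sparse -- the agent never produced a length-3 reply,
    # so it never received any reward and learning failed.
    return 1.0 if reply_len == 3 else 0.0

def reward_run8(reply_len, encoder_len=28):
    # run8: loosened threshold so reward is reachable from the start.
    return 1.0 if reply_len >= encoder_len - 2 else 0.0
```

With encoder_len = 28, run8 rewards any reply of 26 tokens or more, which matches the observation that the agent gets reward from the beginning.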
4: Results of runs (TensorBoard graphs, any other significant observations), separated by type of run (e.g. by the environment the agent is being trained in)
average_reward became nan and results got messed up.
```
ばいとおわ! [0]
[1]← average reply len=1.0 validation loss=146020.7 learning rate 0.1 msec/data=8.3
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:112: RuntimeWarning: invalid value encountered in true_divide
average_reward=nan
INFO:tensorflow:Restoring parameters from model/tweet_large_rl/ChatbotModel-1662
==== 1662 ====
おやすみ~ [0][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS] おやすみ~
```
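The true_divide warning above is the classic symptom of standardizing a batch of identical rewards: the std is 0, so (r - mean) / std is 0/0 = nan. A minimal reproduction, together with the kind of epsilon guard the run3/run4 fix would need (a sketch, not the repo's actual code):

```python
import numpy as np

rewards = np.array([1.0, 1.0, 1.0])  # batch where every reward is equal

with np.errstate(invalid="ignore"):  # the log shows this warning firing
    bad = (rewards - rewards.mean()) / rewards.std()  # 0/0 -> nan

def standardize(rewards, eps=1e-8):
    """Standardize rewards, falling back to centering when std ~ 0."""
    rewards = np.asarray(rewards, dtype=np.float64)
    std = rewards.std()
    if std < eps:
        return rewards - rewards.mean()  # all-equal batch -> all zeros
    return (rewards - rewards.mean()) / std

print(np.isnan(bad).all())                   # True
print(np.isnan(standardize(rewards)).any())  # False
```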
```
src {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
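The src and dst dicts above differ only in learning_rate (0.5 → 0.1) and model_path; a small diff helper (a hypothetical utility, not part of the repo) makes that quick to verify:

```python
def dict_diff(src, dst):
    """Map each key whose value differs to its (src, dst) pair."""
    return {k: (src.get(k), dst.get(k))
            for k in src.keys() | dst.keys()
            if src.get(k) != dst.get(k)}

# Abbreviated versions of the configs logged above.
src = {'learning_rate': 0.5, 'model_path': 'model/tweet_large',
       'batch_size': 64, 'encoder_length': 28}
dst = {'learning_rate': 0.1, 'model_path': 'model/tweet_large_rl',
       'batch_size': 64, 'encoder_length': 28}
print(sorted(dict_diff(src, dst)))  # ['learning_rate', 'model_path']
```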
validation loss and reward became nan.
```
ばいとおわ! [0][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS] ばいとおわ!
[0]併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ
[1]併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ入り
average reply len=140.0 validation loss=nan learning rate 0.1 msec/data=7.5
average_reward=nan
```
```
src {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
20180511rl_test_medium19
```
src {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
seq2seq: (TensorBoard graphs not recoverable)
RL: (TensorBoard graphs not recoverable)
20180511rl_test_medium20
```
src {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
seq2seq: (TensorBoard graphs not recoverable)
RL: (TensorBoard graphs not recoverable)
20180511rl_test_medium21
```
src {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
seq2seq: (TensorBoard graphs not recoverable)
RL: (TensorBoard graphs not recoverable)
20180511rl_test_medium22
```
src {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
seq2seq: (TensorBoard graphs not recoverable)
RL: (TensorBoard graphs not recoverable)
20180513_test_medium23
```
src {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 22, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```



















