logs15:See if new RL is working
- In logs14:Normalize Reward we fixed the reward logic; now we want to confirm the fix actually works.
2: Thinking out loud - e.g. hypotheses about the current problem, what to work on next, how can I verify
Repeat some of the trials from logs11:See if RL works with medium model2 and check whether avg_reply_len changes.
3: A record of currently ongoing runs along with a short reminder of what question each run is supposed to answer
- run1 & run2: basic run.
- run3 & run4: basic run with nan fix
  - Fixed a division by zero when standardizing the reward.
  - Fixed some issues found in run3 and run4:
    - fixed the graph title
    - the average reward is almost zero by definition once standardized, so we should log it before normalizing
    - and run again
- run5 & run6: basic run with the fixes above
  - In run5, all replies have length 0, which is really weird. Investigating.
    - This turned out to be because a zero-length reply gets zero loss, so I changed the reward logic.
- run7
  - See what happens if we skip seq2seq training and set reward = 1 only when len = 3.
  - The agent never received any reward, so it failed.
- run8
  - Adjusted run7 so that the agent can get some reward.
    - Reward is given when len >= encoder_len - 2.
  - The agent gets reward from the beginning.
    - Still didn't work.
    - I found a wrong setup in the current implementation.
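The length-based reward variants tried in these runs can be sketched as simple functions. This is a hypothetical sketch, not the repo's actual code; `encoder_len = 28` is taken from the configs recorded below, and the function names are illustrative:

```python
def reward_run7(reply_len):
    # run7: reward = 1 only for the exact length 3.
    # Too sparse -- the agent never produced a length-3 reply,
    # so it never received any reward and learning failed.
    return 1.0 if reply_len == 3 else 0.0

def reward_run8(reply_len, encoder_len=28):
    # run8: loosened threshold so reward is reachable from the start.
    return 1.0 if reply_len >= encoder_len - 2 else 0.0
```

With encoder_len = 28, run8 rewards any reply of 26 tokens or more, which matches the observation that the agent gets reward from the beginning.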
4: Results of runs (TensorBoard graphs, any other significant observations), separated by type of run (e.g. by the environment the agent is being trained in)
average_reward became nan and results got messed up.
```
ばいとおわ! [0]
[1]← average reply len=1.0 validation loss=146020.7 learning rate 0.1 msec/data=8.3
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:112: RuntimeWarning: invalid value encountered in true_divide
average_reward=nan
INFO:tensorflow:Restoring parameters from model/tweet_large_rl/ChatbotModel-1662
==== 1662 ====
おやすみ~ [0][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS] おやすみ~
```
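The true_divide warning above is the classic symptom of standardizing a batch of identical rewards: the std is 0, so (r - mean) / std is 0/0 = nan. A minimal reproduction, together with the kind of epsilon guard the run3/run4 fix would need (a sketch, not the repo's actual code):

```python
import numpy as np

rewards = np.array([1.0, 1.0, 1.0])  # batch where every reward is equal

with np.errstate(invalid="ignore"):  # the log shows this warning firing
    bad = (rewards - rewards.mean()) / rewards.std()  # 0/0 -> nan

def standardize(rewards, eps=1e-8):
    """Standardize rewards, falling back to centering when std ~ 0."""
    rewards = np.asarray(rewards, dtype=np.float64)
    std = rewards.std()
    if std < eps:
        return rewards - rewards.mean()  # all-equal batch -> all zeros
    return (rewards - rewards.mean()) / std

print(np.isnan(bad).all())                   # True
print(np.isnan(standardize(rewards)).any())  # False
```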
```
src {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
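The src and dst dicts above differ only in learning_rate (0.5 → 0.1) and model_path; a small diff helper (a hypothetical utility, not part of the repo) makes that quick to verify:

```python
def dict_diff(src, dst):
    """Map each key whose value differs to its (src, dst) pair."""
    return {k: (src.get(k), dst.get(k))
            for k in src.keys() | dst.keys()
            if src.get(k) != dst.get(k)}

# Abbreviated versions of the configs logged above.
src = {'learning_rate': 0.5, 'model_path': 'model/tweet_large',
       'batch_size': 64, 'encoder_length': 28}
dst = {'learning_rate': 0.1, 'model_path': 'model/tweet_large_rl',
       'batch_size': 64, 'encoder_length': 28}
print(sorted(dict_diff(src, dst)))  # ['learning_rate', 'model_path']
```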
validation loss and reward became nan.
```
ばいとおわ! [0][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS] ばいとおわ!
[0]併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ
[1]併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ入り
average reply len=140.0 validation loss=nan learning rate 0.1 msec/data=7.5
average_reward=nan
```
```
src {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
20180511rl_test_medium19
```
src {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
seq2seq: (TensorBoard graphs not recoverable)
RL: (TensorBoard graphs not recoverable)
20180511rl_test_medium20
```
src {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
seq2seq: (TensorBoard graphs not recoverable)
RL: (TensorBoard graphs not recoverable)
20180511rl_test_medium21
```
src {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
seq2seq: (TensorBoard graphs not recoverable)
RL: (TensorBoard graphs not recoverable)
20180511rl_test_medium22
```
src {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
seq2seq: (TensorBoard graphs not recoverable)
RL: (TensorBoard graphs not recoverable)
20180513_test_medium23
```
src {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 22, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```



















