logs15: See if new RL is working
- In logs14: Normalize Reward, we fixed the reward logic. We want to confirm that the fix works.
2: Thinking out loud - e.g. hypotheses about the current problem, what to work on next, and how I can verify it
Repeat some of the trials from logs11: See if RL works with medium model2, and check whether avg_reply_len changes.
3: A record of currently ongoing runs along with a short reminder of what question each run is supposed to answer
- run1 & run2: basic runs.
- run3 & run4: basic runs with the NaN fix.
  - Fixed a division by zero when standardizing the reward (a minimal sketch of the guard is shown after this list).
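A minimal sketch of the guard, assuming the reward is standardized to zero mean and unit variance per batch; `standardize_rewards` and `eps` are illustrative names, not the repo's actual code:

```python
import numpy as np

def standardize_rewards(rewards, eps=1e-8):
    """Standardize a batch of rewards to zero mean / unit variance.

    When every reward in the batch is identical (e.g. all replies
    collapse to the same output), the standard deviation is 0 and a
    bare division produces NaNs; the eps guard keeps the update finite.
    """
    rewards = np.asarray(rewards, dtype=np.float32)
    mean, std = rewards.mean(), rewards.std()
    if std < eps:
        # Degenerate batch: no variance to normalize away.
        return rewards - mean
    return (rewards - mean) / std
```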
4: Results of runs (TensorBoard graphs, any other significant observations), separated by type of run (e.g. by the environment the agent is being trained in)
average_reward became NaN and the results were corrupted:
```
ばいとおわ! [0]
[1]
average reply len=1.0 validation loss=146020.7 learning rate 0.1 msec/data=8.3
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:112: RuntimeWarning: invalid value encountered in true_divide
average_reward=nan
INFO:tensorflow:Restoring parameters from model/tweet_large_rl/ChatbotModel-1662
==== 1662 ====
おやすみ~ [0][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS]
おやすみ~
```
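For context, the `true_divide` warning above is exactly what NumPy emits for a 0/0 during standardization; a minimal reproduction, assuming the reward was standardized as `(r - mean) / std`:

```python
import numpy as np

# A batch where every sample gets the same reward: std is 0, so
# (r - mean) / std is 0/0 and NumPy emits
# "RuntimeWarning: invalid value encountered in true_divide".
rewards = np.full(64, 1.0, dtype=np.float32)  # batch_size=64 as in the config
standardized = (rewards - rewards.mean()) / rewards.std()
print(standardized)  # [nan nan ... nan], which then propagates to average_reward
```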
```
src: {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst: {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
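The dst (RL) config differs from the src (seq2seq) one only in `learning_rate` and `model_path`; a hypothetical sketch of how such a derived config could be built (not the repo's actual loader):

```python
# Derive the RL fine-tuning config from the pre-trained seq2seq config;
# only learning_rate and model_path change.
src = {'machine': 'client2', 'batch_size': 64, 'num_units': 512,
       'learning_rate': 0.5, 'model_path': 'model/tweet_large'}
# ...remaining keys as logged above
dst = dict(src, learning_rate=0.1, model_path='model/tweet_large_rl')
```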
Validation loss and reward also became NaN:
```
ばいとおわ! [0][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS]
ばいとおわ! [0]併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ [1]併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ併せ入り
average reply len=140.0 validation loss=nan learning rate 0.1 msec/data=7.5
average_reward=nan
```
```
src: {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst: {'machine': 'client1', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
seq2seq: (three TensorBoard graphs; images missing)

RL: (three TensorBoard graphs; images missing)
```
src: {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst: {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
```
seq2seq: (three TensorBoard graphs; images missing)

RL: (three TensorBoard graphs; images missing)







