logs15:See if new RL is working

1: What specific output am I working on right now?

In logs14:Normalize Reward, we fixed the reward logic. Here we want to confirm that the fix actually works.

2: Thinking out loud - e.g. hypotheses about the current problem, what to work on next, how can I verify

Repeat some of the trials from logs11:See if RL works with medium model2, and check whether avg_reply_len changes.

3: A record of currently ongoing runs along with a short reminder of what question each run is supposed to answer

run1: title

4: Results of runs (TensorBoard graphs, any other significant observations), separated by type of run (e.g. by the environment the agent is being trained in)

run1

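average_reward became NaN and the results were corrupted. The log shows a RuntimeWarning (invalid value encountered in true_divide) immediately before average_reward=nan, and the reply generated after restoring checkpoint 1662 is a run of [SOS] tokens. The Japanese strings in the log are tweet text: "ばいとおわ！" means "Done with my part-time job!" and "おやすみ～" means "Good night~".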

ばいとおわ！ [0]
[1]← average reply len=1.0 validation loss=146020.7 learning rate 0.1 msec/data=8.3
.../usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:112: RuntimeWarning: invalid value encountered in true_divide
...............average_reward=nan
.INFO:tensorflow:Restoring parameters from model/tweet_large_rl/ChatbotModel-1662
==== 1662 ====
おやすみ～ [0][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS][SOS]
おやすみ～
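NumPy emits "invalid value encountered in true_divide" when a division evaluates to NaN, typically 0/0. The log does not say which division at line 112 triggered it. One possible cause, which is only a guess, is a reward normalization that divides by a zero spread when every reward in a batch is identical. The sketch below shows a guarded normalization plus a check that stops before a NaN statistic is used further. The names `normalize_rewards`, `check_finite`, and `eps` are hypothetical and not taken from the project; plain lists are used so the example runs without NumPy.

```python
import math


def normalize_rewards(rewards, eps=1e-8):
    """Shift rewards to zero mean and scale them to unit spread.

    Hypothetical helper, not the project's code. When the spread is
    (near) zero, only the mean is subtracted, so no 0/0 can occur.
    """
    if not rewards:
        return []
    mean = sum(rewards) / len(rewards)
    variance = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(variance)
    if std < eps:
        # All rewards are (almost) equal: skip the division entirely.
        return [r - mean for r in rewards]
    return [(r - mean) / std for r in rewards]


def check_finite(name, value):
    """Raise instead of continuing with a NaN or infinite statistic."""
    if not math.isfinite(value):
        raise FloatingPointError(f"{name} is not finite: {value}")
    return value


# A batch of identical rewards no longer produces NaN.
print(normalize_rewards([1.0, 1.0, 1.0]))  # [0.0, 0.0, 0.0]
print(normalize_rewards([0.0, 2.0]))       # [-1.0, 1.0]
```

Calling `check_finite("average_reward", value)` before applying an update would make a run like run1 fail at the first NaN rather than continue with it.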

hparams > {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.5, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large'}
dst {'machine': 'client2', 'batch_size': 64, 'num_units': 512, 'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 28, 'decoder_length': 28, 'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560, 'model_path': 'model/tweet_large_rl'}
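The two hparams dumps are hard to compare by eye. A small diff helper, sketched here with the hypothetical name `diff_hparams`, lists only the keys whose values differ. The dictionaries are copied from the dump above, and the names `src` and `dst` are labels chosen for this example.

```python
def diff_hparams(src, dst):
    """Return {key: (src_value, dst_value)} for keys whose values differ.

    Hypothetical helper for reading the dump, not part of the project.
    A key missing on one side is reported with None on that side.
    """
    keys = sorted(set(src) | set(dst))
    return {k: (src.get(k), dst.get(k)) for k in keys if src.get(k) != dst.get(k)}


src = {
    'machine': 'client2', 'batch_size': 64, 'num_units': 512,
    'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256,
    'learning_rate': 0.5, 'learning_rate_decay': 0.99,
    'use_attention': True, 'encoder_length': 28, 'decoder_length': 28,
    'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560,
    'model_path': 'model/tweet_large',
}
dst = {
    'machine': 'client2', 'batch_size': 64, 'num_units': 512,
    'num_layers': 2, 'vocab_size': 5000, 'embedding_size': 256,
    'learning_rate': 0.1, 'learning_rate_decay': 0.99,
    'use_attention': True, 'encoder_length': 28, 'decoder_length': 28,
    'max_gradient_norm': 5.0, 'beam_width': 0, 'num_train_steps': 1560,
    'model_path': 'model/tweet_large_rl',
}

for key, (before, after) in diff_hparams(src, dst).items():
    print(f"{key}: {before} -> {after}")
# learning_rate: 0.5 -> 0.1
# model_path: model/tweet_large -> model/tweet_large_rl
```

Only learning_rate and model_path differ between the two dumps; every other hyperparameter is the same.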
