
logs21: Steps to make small RL work

Higepon Taro Minowa edited this page May 31, 2018 · 22 revisions
| Log Type | Detail |
| --- | --- |
| 1: What specific output am I working on right now? | See whether this small RL setup is working. |
| 2: Thinking out loud | Hypotheses about the current problem; what to work on next; how to verify. Reward: 1.0 when len == 8 or len == 0, otherwise reward -1.0. |
| 3: A record of currently ongoing runs, with a short reminder of what question each run is supposed to answer | Run 1 & Run 2. Run 3: give -1.0 for len = 1. Run 4 and Run 5: longer training. |
| 4: Results of runs and conclusions | Run 1: eventually converged to produce len == 1. Run 2: converged differently, but still looks good. Run 3: suggests it might converge if trained longer. Run 4 & 5: seem to be working. |
| 5: Next steps | See whether it works with large pre-trained data. |
| 6 | mega.nz |
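The length reward described in row 2 can be sketched as a minimal Python function. This is an illustration only, not the actual training code; the function name and signature are hypothetical:

```python
def length_reward(length: int, target_len: int = 8) -> float:
    # +1.0 when the sampled decode hits the target length (or is empty),
    # -1.0 for everything else, per the reward note above.
    return 1.0 if length in (target_len, 0) else -1.0
```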

Run 1

  • sampled lengths=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] 0
  • sampled lengths=[0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0] 0
  • sampled lengths=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] 0
  • sampled lengths=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] 0
  • sampled lengths=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] 0

{'machine': 'client2', 'batch_size': 16, 'num_units': 256, 'num_layers': 2, 'vocab_size': 34, 'embedding_size': 40, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 8, 'decoder_length': 8, 'max_gradient_norm': 5.0, 'beam_width': 2, 'num_train_steps': 5000, 'model_path': 'model/tweet_small'}
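Each log line above scores a batch of 16 sampled decode lengths. A hedged sketch of how such a batch could feed a REINFORCE-style update (the mean baseline and helper name are assumptions for illustration, not the run's actual code):

```python
def batch_advantages(sampled_lengths, target_len=8):
    # Score each sampled decode with the +/-1 length reward, then subtract
    # a mean baseline to reduce policy-gradient variance.
    rewards = [1.0 if n in (target_len, 0) else -1.0 for n in sampled_lengths]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

# e.g. the first Run 1 batch above, all length-1 samples:
adv = batch_advantages([1] * 16)
```

Note that with a mean baseline, a batch where every sample gets the same reward (like the all-1 batches above) yields zero advantages, so that batch contributes no gradient under this sketch.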

Run 2

  • objective_count=16.0
  • sampled lengths=[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] 0
  • sampled lengths=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 0
  • sampled lengths=[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 0
  • sampled lengths=[3, 3, 4, 3, 3, 4, 3, 3, 4, 3, 3, 4, 3, 3, 4, 3] 11
  • sampled lengths=[5, 5, 6, 5, 5, 6, 5, 5, 6, 5, 5, 6, 5, 5, 6, 5] 0
  • objective_count=9.6
  • sampled lengths=[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 0
  • sampled lengths=[1, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 1, 2, 2, 2] 0
  • sampled lengths=[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4] 0
  • sampled lengths=[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 0

{'machine': 'client2', 'batch_size': 16, 'num_units': 256, 'num_layers': 2, 'vocab_size': 34, 'embedding_size': 40, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 8, 'decoder_length': 8, 'max_gradient_norm': 5.0, 'beam_width': 2, 'num_train_steps': 5000, 'model_path': 'model/tweet_small'}

Run 3

  • objective_count=9.6
  • sampled lengths=[6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6] 0
  • sampled lengths=[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 0
  • sampled lengths=[6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6] 0
  • sampled lengths=[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 0
  • sampled lengths=[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2] 0
  • objective_count=9.6
  • sampled lengths=[6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6] 0
  • sampled lengths=[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 0
  • sampled lengths=[5, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5] 0
  • sampled lengths=[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 0

{'machine': 'client1', 'batch_size': 16, 'num_units': 256, 'num_layers': 2, 'vocab_size': 34, 'embedding_size': 40, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 8, 'decoder_length': 8, 'max_gradient_norm': 5.0, 'beam_width': 2, 'num_train_steps': 5000, 'model_path': 'model/tweet_small'}

Run 4

{'machine': 'client1', 'batch_size': 16, 'num_units': 256, 'num_layers': 2, 'vocab_size': 34, 'embedding_size': 40, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 8, 'decoder_length': 8, 'max_gradient_norm': 5.0, 'beam_width': 2, 'num_train_steps': 90000, 'model_path': 'model/tweet_small'}
  • sampled lengths=[7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7] 16
  • sampled lengths=[7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7] 16
  • sampled lengths=[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 0

Run 5

{'machine': 'client1', 'batch_size': 16, 'num_units': 256, 'num_layers': 2, 'vocab_size': 34, 'embedding_size': 40, 'learning_rate': 0.1, 'learning_rate_decay': 0.99, 'use_attention': True, 'encoder_length': 8, 'decoder_length': 8, 'max_gradient_norm': 5.0, 'beam_width': 2, 'num_train_steps': 90000, 'model_path': 'model/tweet_small'}

  • sampled lengths=[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 0
  • sampled lengths=[2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2] 12
  • sampled lengths=[8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8] 0
