logs32: Values to watch for RL
Higepon Taro Minowa edited this page Jul 2, 2018
·
2 revisions
It's hard to tell whether RL is working or whether your implementation is correct. Let's watch a few values to check.
- How good are the random samples shown? Are they diverse enough?
- Does the mutual information reward look reasonable?
- Which is better, the seq2seq reply or the RL reply?
  - Compare on validation tweets, typically "good morning" type tweets.
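
One rough way to put a number on "diverse enough" is a distinct-n ratio: unique n-grams divided by total n-grams across the sampled replies. This is a swapped-in metric, not something these notes prescribe. The sketch below assumes the replies are already available as whitespace-tokenized strings; `distinct_n` and the sample replies are hypothetical and only for illustration.

```python
from collections import Counter


def distinct_n(replies, n=1):
    """Ratio of unique n-grams to total n-grams over all replies.

    Assumes each reply is a string whose tokens are separated by whitespace.
    Returns 0.0 when no n-grams can be formed.
    """
    ngrams = Counter()
    for reply in replies:
        tokens = reply.split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0


# Made-up sampled replies, not real model output.
samples = ["good morning", "good morning", "good morning to you"]
print(distinct_n(samples, 1), distinct_n(samples, 2))
```

A value close to 0 means the samples keep repeating the same tokens; a value close to 1 means almost every n-gram is unique. Tracking it for both seq2seq and RL samples on the same validation tweets gives a side-by-side view.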
The average reward actually went down, which doesn't look right.
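
To confirm that a drop like this is a trend rather than noise, it may help to smooth the logged rewards before comparing them. The sketch below is a minimal example, assuming the per-step rewards are available as a plain list of floats; `moving_average`, `reward_trend`, the window size, and the reward values are all hypothetical.

```python
def moving_average(values, window):
    """Simple moving average; returns one value per full window."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    total = 0.0
    for i, value in enumerate(values):
        total += value
        if i >= window:
            # Drop the value that just left the window.
            total -= values[i - window]
        if i >= window - 1:
            averages.append(total / window)
    return averages


def reward_trend(rewards, window=3):
    """Last smoothed reward minus the first; negative means it went down."""
    smoothed = moving_average(rewards, window)
    if len(smoothed) < 2:
        return 0.0
    return smoothed[-1] - smoothed[0]


# Made-up reward log for illustration only.
rewards = [1.0, 0.8, 0.9, 0.6, 0.5, 0.4]
print(reward_trend(rewards))
```

A clearly negative result over a long enough window suggests the reward really is decreasing, which points at the reward computation or the sign of the policy gradient update as things to check first.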