
logs32: Values to watch for RL

Higepon Taro Minowa edited this page Jul 2, 2018 · 2 revisions

background

It's hard to tell whether RL is working or whether your implementation is correct. Let's watch some values to make sure.

values

  • How good are the random samples? Are they diverse enough?
  • Does the mutual information reward look reasonable?
  • Which is better, the seq2seq reply or the RL reply?
    • Compare them on validation tweets, which are typically of the "good morning" type.
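The first two checks can be turned into numbers that are easy to log. The sketch below is a minimal illustration under a few assumptions that are not from this page: replies are tokenized into lists of strings, diversity is measured with a distinct-n ratio (unique n-grams over total n-grams), and the mutual information reward is taken to be the length-normalized log p(reply | tweet) from the seq2seq model plus the length-normalized log p(tweet | reply) from a backward model. The function names `distinct_n` and `mutual_information_reward` are hypothetical.

```python
from typing import List, Sequence


def distinct_n(replies: Sequence[List[str]], n: int) -> float:
    """Hypothetical diversity check: unique n-grams / total n-grams.

    A value near 1.0 means the sampled replies are varied; a value near 0
    means the sampler keeps producing the same few phrases.
    """
    ngrams = [
        tuple(tokens[i:i + n])
        for tokens in replies
        for i in range(len(tokens) - n + 1)
    ]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


def mutual_information_reward(
    forward_logprobs: Sequence[float],
    backward_logprobs: Sequence[float],
) -> float:
    """Assumed MI reward: mean log p(reply | tweet) + mean log p(tweet | reply).

    forward_logprobs: per-token log-probs of the reply under the seq2seq model.
    backward_logprobs: per-token log-probs of the tweet under a backward model.
    """
    if not forward_logprobs or not backward_logprobs:
        raise ValueError("log-prob sequences must be non-empty")
    forward = sum(forward_logprobs) / len(forward_logprobs)
    backward = sum(backward_logprobs) / len(backward_logprobs)
    return forward + backward


# Usage with made-up samples: two identical "good morning" replies lower diversity.
samples = [
    ["good", "morning", "!"],
    ["good", "morning", "!"],
    ["have", "a", "nice", "day"],
]
print(distinct_n(samples, 1))  # 7 unique / 10 total unigrams -> 0.7
print(mutual_information_reward([-1.0, -2.0, -3.0], [-0.5, -1.5]))  # -> -3.0
```

Length normalization keeps the reward from simply favoring short replies, which tend to have higher total log-probability; whether that matches this project's reward is an assumption.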

Observation

The average reward actually went down, which doesn't look right.
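Per-batch rewards in RL are noisy, so a single drop may not mean much. One way to check whether the average reward is really trending down is to log a sliding-window mean of per-batch averages and compare it against an earlier point. This is a minimal sketch; `RewardTracker`, the window size, and the lookback comparison are hypothetical choices, not part of the original setup.

```python
from collections import deque
from typing import Deque, List


class RewardTracker:
    """Hypothetical helper: sliding-window mean of per-batch average rewards."""

    def __init__(self, window: int = 100) -> None:
        # Only the most recent `window` batch averages are kept.
        self.recent: Deque[float] = deque(maxlen=window)
        # Running mean recorded after every batch, for plotting or comparison.
        self.history: List[float] = []

    def add(self, batch_rewards: List[float]) -> float:
        if not batch_rewards:
            raise ValueError("batch_rewards must be non-empty")
        batch_avg = sum(batch_rewards) / len(batch_rewards)
        self.recent.append(batch_avg)
        running = sum(self.recent) / len(self.recent)
        self.history.append(running)
        return running

    def is_decreasing(self, lookback: int) -> bool:
        """True if the running mean is lower now than `lookback` batches ago."""
        if len(self.history) <= lookback:
            return False
        return self.history[-1] < self.history[-1 - lookback]


# Usage with made-up rewards and a tiny window.
tracker = RewardTracker(window=2)
for rewards in ([1.0, 3.0], [0.0, 2.0], [0.0, 0.0]):
    tracker.add(rewards)
print(tracker.history)            # [2.0, 1.5, 0.5]
print(tracker.is_decreasing(2))   # True
```

If the running mean keeps falling over a long window, that points to a real problem (for example, a sign error in the policy-gradient loss) rather than batch-to-batch noise.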
