
A Deep Reinforcement Learning Chatbot

Higepon Taro Minowa edited this page Apr 8, 2018 · 19 revisions

Abstract

  • MILABOT was developed with deep reinforcement learning for the Amazon Alexa Prize competition.
  • It handles common small talk topics and can hold conversations with people.
  • The bot
    • is built by combining natural language generation models and retrieval models
    • includes template-based models, bag-of-words models, seq2seq models, latent variable models, and others.
  • Using crowdsourced data and interactions with real users, it was trained with reinforcement learning to select an appropriate response from among the multiple models.
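
The idea in the bullets above (several response models propose candidates, and a learned policy picks one) can be sketched roughly as follows. This is a minimal toy illustration, not the paper's implementation: the two response models, the three features, the linear softmax policy, and the plain REINFORCE-style update are all assumptions made for the example.

```python
import math
import random

# Hypothetical stand-ins for two models in the ensemble.
def template_model(utterance):
    return "Tell me more about that."

def retrieval_model(utterance):
    return "I read something similar about " + utterance.split()[-1] + "."

MODELS = [template_model, retrieval_model]

def features(utterance, response):
    # Toy features (assumed): bias, word overlap, scaled response length.
    u = set(utterance.lower().split())
    r = set(response.lower().rstrip(".").split())
    return [1.0, float(len(u & r)), len(response.split()) / 10.0]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def policy(weights, utterance, candidates):
    # Linear score per candidate, turned into selection probabilities.
    scores = [
        sum(w * f for w, f in zip(weights, features(utterance, c)))
        for c in candidates
    ]
    return softmax(scores)

def select_response(weights, utterance, rng):
    # Every model proposes a candidate; the policy samples one of them.
    candidates = [model(utterance) for model in MODELS]
    probs = policy(weights, utterance, candidates)
    idx = rng.choices(range(len(candidates)), weights=probs)[0]
    return candidates, probs, idx

def reinforce_update(weights, utterance, candidates, probs, idx, reward, lr=0.1):
    # Gradient of log pi(idx) for a linear softmax policy:
    # features of the chosen candidate minus the expected features.
    feats = [features(utterance, c) for c in candidates]
    expected = [
        sum(p * f[k] for p, f in zip(probs, feats))
        for k in range(len(weights))
    ]
    return [
        w + lr * reward * (feats[idx][k] - expected[k])
        for k, w in enumerate(weights)
    ]

if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.0, 0.0, 0.0]
    utterance = "I like cats"
    candidates, probs, idx = select_response(weights, utterance, rng)
    # Pretend a user (or crowd worker) scored the chosen response.
    weights = reinforce_update(weights, utterance, candidates, probs, idx, reward=1.0)
    print(candidates[idx], weights)
```

Here a positive reward raises the probability of the response that was chosen, and a negative reward lowers it. The real system described in the paper uses far richer input features and several training schemes (see section 4 of the outline); this sketch only shows the overall shape of "many models, one selection policy".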

1 Introduction

2 System Overview

3 Response Models

3.1 Template-based Models

3.2 Knowledge Base-based Question Answering

3.3 Retrieval-based Neural Networks

3.4 Retrieval-based Logistic Regression

3.5 Search Engine-based Neural Networks

3.6 Generation-based Neural Networks

4 Model Selection Policy

4.1 Input Features

4.2 Model Architecture

4.3 Supervised AMT: Learning with Crowdsourced Labels

4.4 Supervised Learned Reward: Learning with a Learned Reward Function

4.5 Off-policy REINFORCE

4.6 Off-policy REINFORCE with Learned Reward Function

4.7 Q-learning with the Abstract Discourse Markov Decision Process

4.8 Preliminary Evaluation

5 A/B Testing Experiments

5.1 A/B Testing Experiment #1

5.2 A/B Testing Experiment #2

5.3 A/B Testing Experiment #3

5.4 Discussion

6 Related Work

7 Future Work

7.1 Personalization

7.2 Text-based Evaluation

8 Conclusion

Acknowledgments
