A Deep Reinforcement Learning Chatbot


Higepon Taro Minowa edited this page Apr 8, 2018 · 19 revisions

Abstract

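MILABOT was developed with deep reinforcement learning for the Amazon Alexa Prize competition.
It can converse with people on common small-talk topics.
The bot:
- is built as an ensemble of natural language generation models and retrieval models
- uses template-based models, bag-of-words models, seq2seq models, latent variable models, and others.
Using crowdsourced data and interactions with real users, it was trained with reinforcement learning to select an appropriate response from among the candidates produced by the multiple models.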
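To make "select a response with reinforcement learning" concrete, here is a minimal sketch, not the paper's implementation. It assumes a softmax policy with a linear score over hypothetical feature vectors for each candidate, and a made-up reward (1 if a simulated user liked the chosen candidate, else 0). It uses the plain on-policy REINFORCE update; the paper's own variants (e.g. the off-policy REINFORCE of Section 4.5) differ in how samples and rewards are obtained.

```python
import math
import random
from typing import List, Sequence

Features = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LinearSelectionPolicy:
    """pi(a | candidates) proportional to exp(w . phi(a)); phi is a hypothetical feature map."""

    def __init__(self, n_features: int, lr: float = 0.5) -> None:
        self.w = [0.0] * n_features
        self.lr = lr

    def probs(self, feats: List[Features]) -> List[float]:
        return softmax([sum(wi * xi for wi, xi in zip(self.w, f)) for f in feats])

    def sample(self, feats: List[Features], rng: random.Random) -> int:
        # Draw a candidate index according to the current policy.
        return rng.choices(range(len(feats)), weights=self.probs(feats))[0]

    def reinforce_update(self, feats: List[Features], action: int, reward: float) -> None:
        # For a linear softmax policy: grad log pi(a) = phi(a) - sum_b pi(b) phi(b).
        p = self.probs(feats)
        expected = [sum(pb * f[j] for pb, f in zip(p, feats)) for j in range(len(self.w))]
        for j in range(len(self.w)):
            self.w[j] += self.lr * reward * (feats[action][j] - expected[j])


# Toy setting (hypothetical): three candidate responses described by two features;
# the simulated user rewards only candidate 0.
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
policy = LinearSelectionPolicy(n_features=2)
rng = random.Random(0)
for _ in range(1000):
    a = policy.sample(feats, rng)
    reward = 1.0 if a == 0 else 0.0
    policy.reinforce_update(feats, a, reward)
```

After training, `policy.probs(feats)` puts almost all probability on candidate 0. Only selected actions with non-zero reward move the weights, which is why reward design (crowdsourced labels vs. real user ratings) matters so much for this kind of selector.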

1 Introduction

Omitted.

2 System Overview

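Because rule-based approaches have limits, almost everything was built with statistical machine learning.
Every component was trained independently with ML on large amounts of data.
The dialogue manager:
- collects candidate responses from the models
- if there is a priority response, returns it immediately (e.g. for "what's your name?")
- otherwise, picks one candidate with the selection policy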
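The dialogue manager's three steps can be sketched as follows. This is an illustration under assumptions, not MILABOT's code: the interface (a response model is a function from dialogue history to an optional string), the names `dialogue_manager`, `name_model`, and `longest_policy`, and the toy models are all hypothetical. The stand-in policy simply picks the longest candidate; in MILABOT that role is played by the learned selection policy.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical interfaces: a response model maps the dialogue history to a
# candidate response (or None); a selection policy picks one candidate.
ResponseModel = Callable[[List[str]], Optional[str]]
SelectionPolicy = Callable[[List[str], List[str]], str]


def dialogue_manager(
    history: List[str],
    models: Dict[str, ResponseModel],
    priority_models: Sequence[str],
    policy: SelectionPolicy,
) -> str:
    # 1. Collect candidate responses from every model (None = no candidate).
    candidates: Dict[str, str] = {}
    for name, model in models.items():
        reply = model(history)
        if reply is not None:
            candidates[name] = reply
    # 2. A priority response is returned immediately, without consulting the policy.
    for name in priority_models:
        if name in candidates:
            return candidates[name]
    # 3. Otherwise the selection policy chooses among the collected candidates.
    return policy(history, list(candidates.values()))


def name_model(history: List[str]) -> Optional[str]:
    # Toy priority model: fires only when the user asks for the bot's name.
    return "I'm a chatbot." if "your name" in history[-1].lower() else None


models: Dict[str, ResponseModel] = {
    "name": name_model,
    "echo": lambda h: "You said: " + h[-1],
    "chitchat": lambda h: "That sounds interesting!",
}
# Placeholder for a learned selection policy: choose the longest candidate.
longest_policy: SelectionPolicy = lambda h, cands: max(cands, key=len)
```

For example, `dialogue_manager(["What's your name?"], models, ["name"], longest_policy)` returns the priority answer even though the echo candidate is longer, while a message such as `"I like jazz"` falls through to the policy.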

3 Response Models

3.1 Template-based Models

3.2 Knowledge Base-based Question Answering

3.3 Retrieval-based Neural Networks

3.4 Retrieval-based Logistic Regression

3.5 Search Engine-based Neural Networks

3.6 Generation-based Neural Networks

4 Model Selection Policy

4.1 Input Features

4.2 Model Architecture

4.3 Supervised AMT: Learning with Crowdsourced Labels

4.4 Supervised Learned Reward: Learning with a Learned Reward Function

4.5 Off-policy REINFORCE

4.6 Off-policy REINFORCE with Learned Reward Function

4.7 Q-learning with the Abstract Discourse Markov Decision Process

4.8 Preliminary Evaluation

5 A/B Testing Experiments

5.1 A/B Testing Experiment #1

5.2 A/B Testing Experiment #2

5.3 A/B Testing Experiment #3

5.4 Discussion

6 Related Work

7 Future Work

7.1 Personalization

7.2 Text-based Evaluation

8 Conclusion

Acknowledgments