A Deep Reinforcement Learning Chatbot

Abstract

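MILABOT was developed for the Amazon Alexa Prize competition using deep reinforcement learning.
It can converse with people on common small talk topics.
The bot
- is built by combining natural language generation models and retrieval models
- includes template-based models, bag-of-words models, seq2seq models, latent variable models, and more.
Using crowdsourced data and interactions with real users, it was trained with reinforcement learning to select an appropriate response from the candidates produced by the multiple models.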

1 Introduction

(Omitted)

2 System Overview

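Because rule-based approaches have limits, almost every part of the system was built with statistical machine learning.
Each component was trained independently with ML on large amounts of data.
Dialogue manager:
- receives candidate responses from the response models
- if there is a priority response, returns it immediately (e.g. for "What's your name?")
- otherwise, picks one using the selection policy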
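The dialogue manager's decision rule above can be sketched in Python. This is a minimal sketch under assumptions: the `Candidate` type, the `choose_response` function, and the idea that candidates arrive already ordered by model priority are illustrative and not taken from MILABOT's code; the selection policy is simply passed in as a function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    model: str              # name of the response model that produced it
    text: str               # the candidate response
    priority: bool = False  # True if the model marked it as a priority response


def choose_response(candidates: List[Candidate],
                    policy: Callable[[List[Candidate]], Candidate]) -> Optional[str]:
    """Return the first priority response if any; otherwise ask the selection policy."""
    if not candidates:
        return None
    # Assumption: candidates are already ordered by model priority.
    for cand in candidates:
        if cand.priority:
            return cand.text
    return policy(candidates).text


# Usage with a toy policy (hypothetical) that simply prefers the longest response.
def longest(cands: List[Candidate]) -> Candidate:
    return max(cands, key=lambda c: len(c.text))


cands = [Candidate("Alicebot", "I like talking to you, it is fun."),
         Candidate("Evibot", "Paris is the capital of France.", priority=True)]
print(choose_response(cands, longest))  # → Paris is the capital of France.
```

In the real system the policy is the learned model selection policy; the toy `longest` only makes the control flow visible.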

3 Response Models

The system contains 22 response models of the following kinds:
- retrieval-based
- generation-based
- knowledge base-based
- template-based

3.1 Template-based Models

[1]. Alicebot

Produces a response from the dialogue history using templates written in AIML
www.alicebot.org
Its responses are not priority responses
Training

[2]. Elizabot

Template bot based on string matching
Imitates a Rogerian psychotherapist (the classic ELIZA system)
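String-matching templates of the ELIZA kind work roughly like this Python sketch. The two patterns, the pronoun table, and the fallback reply are hypothetical examples, not Elizabot's actual rule set.

```python
import re

# Pronoun swaps so the bot can echo the user's words back ("my job" -> "your job").
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my"}

# (pattern, template) pairs: a tiny hypothetical subset of ELIZA-style rules.
RULES = [
    (re.compile(r"i feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), "How long have you been {0}?"),
]


def reflect(fragment: str) -> str:
    """Swap first- and second-person words in the captured fragment."""
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())


def elizabot(utterance: str) -> str:
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.fullmatch(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    return "Please tell me more."  # generic Rogerian-style fallback


print(elizabot("I feel sad about my job."))  # → Why do you feel sad about your job?
```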

[3]. Initiatorbot

Used as a conversation starter: 40 open questions are prepared
Example: What did you do today?
It can also start a conversation with an interesting fact
Also checks whether it has already fired in the last two turns
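A minimal sketch of the Initiatorbot behavior described above, assuming turns are numbered with integers and that "the last two turns" means the two turns before the current one. Only "What did you do today?" comes from the notes; the other questions and the function name `initiatorbot` are hypothetical.

```python
import random
from typing import List, Optional

# The first question comes from the notes; the others are hypothetical
# (the real bot reportedly has about 40 open questions plus interesting facts).
OPEN_QUESTIONS = [
    "What did you do today?",
    "What is your favorite book?",
    "Have you traveled anywhere interesting recently?",
]


def initiatorbot(fired_turns: List[int], current_turn: int,
                 rng: random.Random) -> Optional[str]:
    """Return an open question unless the bot already fired in the last two turns."""
    if any(0 < current_turn - t <= 2 for t in fired_turns):
        return None  # stay silent so the bot does not repeat itself too often
    return rng.choice(OPEN_QUESTIONS)


rng = random.Random(0)
print(initiatorbot([], 0, rng))   # one of OPEN_QUESTIONS
print(initiatorbot([3], 4, rng))  # → None (fired one turn ago)
```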

[4]. Storybot

Triggered when the user says "say" or "tell"
The only bot whose output is not conversational
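The trigger can be sketched as a whole-word, case-insensitive keyword match. This is a simplification that follows the notes literally; the real Storybot may use stricter conditions, and the story text below is a hypothetical placeholder.

```python
import re
from typing import Optional

# Hypothetical placeholder story; the notes do not describe the real story set.
STORY = "Once upon a time, a curious robot learned to tell stories."

# Whole-word match on "say" or "tell", case-insensitive ("telling" does not match).
TRIGGER = re.compile(r"\b(say|tell)\b", re.IGNORECASE)


def storybot(utterance: str) -> Optional[str]:
    """Return a story when the utterance contains a trigger word, else None."""
    return STORY if TRIGGER.search(utterance) else None


print(storybot("Tell me a story"))  # → Once upon a time, a curious robot learned to tell stories.
```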

3.2 Knowledge Base-based Question Answering

[5]. Evibot

www.evi.com
Amazon's Q/A web service
Gives a priority response to direct questions
Sends the query to Evi; if that fails, uses the NLTK named entity processor to look for subqueries
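The fallback flow (full query first, then entity subqueries) is sketched below. The QA service and the entity extractor are passed in as functions; the dict-backed `FAKE_EVI` and the capitalized-word `naive_entities` are hypothetical stand-ins, not Evi's API and not NLTK's named entity chunker.

```python
from typing import Callable, Dict, List, Optional


def evibot(query: str,
           ask_evi: Callable[[str], Optional[str]],
           extract_entities: Callable[[str], List[str]]) -> Optional[str]:
    """Ask the QA service the full query; on failure, retry with entity subqueries."""
    answer = ask_evi(query)
    if answer is not None:
        return answer
    for entity in extract_entities(query):
        answer = ask_evi(entity)
        if answer is not None:
            return answer
    return None  # neither the full query nor any subquery got an answer


# Stand-ins for illustration only.
FAKE_EVI: Dict[str, str] = {"Montreal": "Montreal is a city in Quebec, Canada."}


def naive_entities(text: str) -> List[str]:
    # Skip the first word, which is capitalized at the start of a sentence anyway.
    return [w.strip("?.,!") for w in text.split()[1:] if w[:1].isupper()]


print(evibot("Tell me about Montreal please", FAKE_EVI.get, naive_entities))
# → Montreal is a city in Quebec, Canada.
```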

[6]. BoWMovies

Template-based
Movie bot: holds data such as plot summaries and release years
Uses string matching to find movie titles and similar items in the user's query
Also uses word embeddings for matching
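The notes do not say exactly how string matching and word embeddings are combined, so the following is a sketch under an assumed design: try an exact title match first, then fall back to the most similar (query word, title word) embedding pair above a threshold. The movie table, the 3-dimensional vectors, and the 0.9 threshold are all hypothetical toy values.

```python
import math
import re
from typing import Dict, List, Optional

# Toy movie table; the real bot stores plots, release years, and more.
MOVIES: Dict[str, Dict[str, str]] = {
    "the matrix": {"year": "1999"},
    "titanic": {"year": "1997"},
}

# Hypothetical 3-d vectors standing in for pretrained word embeddings.
EMB: Dict[str, List[float]] = {
    "matrix": [0.9, 0.1, 0.0],
    "simulation": [0.8, 0.2, 0.1],
    "titanic": [0.0, 0.9, 0.3],
    "ship": [0.1, 0.8, 0.4],
}


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_movie(query: str, threshold: float = 0.9) -> Optional[str]:
    q = query.lower()
    # 1) Exact string match: the title appears verbatim in the query.
    for title in MOVIES:
        if title in q:
            return title
    # 2) Fallback: the most similar (query word, title word) embedding pair.
    best, best_sim = None, threshold
    q_words = [w for w in re.findall(r"[a-z]+", q) if w in EMB]
    for title in MOVIES:
        for tw in title.split():
            if tw not in EMB:
                continue
            for qw in q_words:
                sim = cosine(EMB[qw], EMB[tw])
                if sim >= best_sim:
                    best, best_sim = title, sim
    return best


title = find_movie("Which year was that simulation movie released?")
print(title, MOVIES[title]["year"])  # → the matrix 1999
```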

3.3 Retrieval-based Neural Networks

[7, 8, 9, 10]. VHRED models

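There are several VHRED models. VHRED is a seq2seq model with Gaussian latent variables (see its paper).
How responses are generated:
- K candidate responses are retrieved by the cosine similarity between the current dialogue history and the dialogue histories in the dataset (using bag-of-words TF-IDF and GloVe word embeddings).
- VHRED computes the log-likelihood of each of those K (20?) candidates, and the one with the highest log-likelihood is returned.
There are four VHRED models, trained on data scraped from Reddit.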
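The two-step retrieve-then-rerank procedure can be sketched as follows. Assumptions: only TF-IDF cosine similarity is used (the GloVe part is omitted), the IDF smoothing is one common choice, `K` is a parameter (the notes suggest 20), and VHRED is replaced by a hypothetical `toy_log_likelihood` placeholder that merely prefers short responses.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple


def cosine_sparse(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_and_rerank(history: str, pairs: List[Tuple[str, str]], k: int,
                        log_likelihood: Callable[[str, str], float]) -> str:
    """pairs holds (dataset dialogue history, response) tuples."""
    contexts = [ctx.lower().split() for ctx, _ in pairs]
    n = len(contexts)
    # Document frequency and smoothed IDF over the dataset histories.
    df = Counter(w for toks in contexts for w in set(toks))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}

    def tfidf(tokens: List[str]) -> Dict[str, float]:
        return {w: c * idf[w] for w, c in Counter(tokens).items() if w in idf}

    query = tfidf(history.lower().split())
    sims = [cosine_sparse(query, tfidf(toks)) for toks in contexts]
    # Step 1: keep the K dataset entries whose history is most similar to ours.
    top_k = sorted(range(n), key=lambda i: sims[i], reverse=True)[:k]
    # Step 2: return the candidate response the generative model scores highest.
    best = max(top_k, key=lambda i: log_likelihood(history, pairs[i][1]))
    return pairs[best][1]


pairs = [("do you like music", "I love jazz."),
         ("what music do you like", "Mostly rock music, how about you?"),
         ("how is the weather", "Sunny.")]


def toy_log_likelihood(history: str, response: str) -> float:
    # Placeholder for VHRED: simply favors shorter responses.
    return -len(response.split())


print(retrieve_and_rerank("do you like music", pairs, 2, toy_log_likelihood))  # → I love jazz.
```

With `k=3` the off-topic "Sunny." would win on the toy score alone, which is why the retrieval step matters: it restricts reranking to responses whose original context resembles the current one.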

[11]. SkipThought Vector Models

SkipThought Vector model trained on BookCorpus
The Amazon Alexa Prize rules forbade the bot from giving opinions on topics such as religion and politics
This model handles that case

[12, 13]. Dual Encoder Models

DualEncoderRedditPolitics and DualEncoderRedditNews
Two encoders are trained: ENC_Q encodes the dialogue history and ENC_R encodes a candidate response; the model then returns the best-scoring response.
The score of a candidate response is a bilinear mapping of the dialogue history embedding and the candidate response embedding.
The top K candidates are retrieved by TF-IDF and GloVe word-embedding cosine similarity.
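A bilinear score can be written as $s(h, r) = h^\top M r$, where $h$ is the dialogue history embedding, $r$ the candidate response embedding, and $M$ a learned matrix (the symbol names are assumed notation). The sketch below computes it and ranks candidates; the 2-dimensional vectors and matrix are hypothetical toy values in place of real ENC_Q / ENC_R outputs.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear_score(h: Vector, M: Matrix, r: Vector) -> float:
    """s(h, r) = h^T M r, summed element by element."""
    return sum(h[i] * M[i][j] * r[j]
               for i in range(len(h)) for j in range(len(r)))


def rank_candidates(h: Vector, M: Matrix, candidates: List[Vector]) -> List[int]:
    """Indices of candidate embeddings, best score first."""
    scores = [bilinear_score(h, M, r) for r in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Toy embeddings (hypothetical); in the real model h comes from ENC_Q and each r from ENC_R.
h = [1.0, 2.0]
M = [[1.0, 2.0],
     [3.0, 4.0]]
print(bilinear_score(h, M, [1.0, 1.0]))                 # → 17.0
print(rank_candidates(h, M, [[1.0, 0.0], [0.0, 1.0]]))  # → [1, 0]
```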

