DialToM

Official code and data repository for the paper "DialToM: A Theory of Mind Benchmark for Forecasting State-Driven Dialogue Trajectories."

Directory Structure

counterfactual_data: contains three files, one per dataset, holding all counterfactuals generated for the counterfactual ablation study.
data: contains the human-verified version of DialToM.

Packages required

Install the following packages in your preferred virtual environment: google-genai, openai, sacrebleu, rouge, bert-score.
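
Before running the scripts, it can help to confirm that the packages are visible from the active environment. The following is a minimal sketch, not part of the repository: the helper name missing_packages is hypothetical, and the import names in REQUIRED are assumptions based on how each package is commonly imported.

```python
import importlib.util

# Distribution name -> import name. The import names are assumptions based on
# how each package is commonly imported; adjust them if your versions differ.
REQUIRED = {
    "google-genai": "google.genai",
    "openai": "openai",
    "sacrebleu": "sacrebleu",
    "rouge": "rouge",
    "bert-score": "bert_score",
}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose import module cannot be found."""
    missing = []
    for dist_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised, for example, when a parent package such as "google"
            # is itself absent from the environment.
            found = False
        if not found:
            missing.append(dist_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks up module specs and does not import the packages themselves, so it runs quickly even for heavy dependencies such as bert-score.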

Usage

Benchmarking

Retrospective

python benchmark.py --model gpt-5 --task retrospective --filename retrospective.csv

Prospective

python benchmark.py --model gpt-5 --task prospective --exp [EXP_TYPE] --filename prospective.csv

The prospective task supports four experiment types (EXP_TYPE): normal, easy, NOTA, and CoT. The output filename changes with the chosen experiment to {filename}_{EXP_TYPE}.csv.
normal is the default prospective baseline; easy runs the easy-set evaluation of the prospective task; NOTA and CoT are the two ablations discussed during the rebuttal phase.
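
The naming rule above can be illustrated with a short sketch. This is not the code in benchmark.py: the function output_filename is hypothetical, and it assumes that {filename} refers to the name passed via --filename without its .csv extension and that the suffix is applied to every experiment type, including normal.

```python
import os

# The four experiment types listed for the prospective task.
EXP_TYPES = ("normal", "easy", "NOTA", "CoT")


def output_filename(filename, exp_type):
    """Derive {filename}_{EXP_TYPE}.csv from the --filename and --exp values.

    Assumption: the extension is stripped before the suffix is added and
    ".csv" is used when the given name has no extension.
    """
    if exp_type not in EXP_TYPES:
        raise ValueError(f"unknown experiment type: {exp_type!r}")
    stem, ext = os.path.splitext(filename)
    return f"{stem}_{exp_type}{ext or '.csv'}"


print(output_filename("prospective.csv", "NOTA"))  # prospective_NOTA.csv
```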

Written

python benchmark.py --model gpt-5 --task written --filename written.csv

Counterfactual testing

python counterfactual_test.py --model gpt-5 --filename counter.csv

Memorization pilot study

python memorization_pilot.py --model gpt-5 --filename memorize.csv
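
To run every experiment for one model, the commands above can be generated programmatically. The sketch below only assembles and prints the command lines, using the scripts, flags, and filenames shown in this README; the helper build_commands is hypothetical and not part of the repository.

```python
import shlex

# Experiment types accepted by --exp for the prospective task.
PROSPECTIVE_EXPS = ("normal", "easy", "NOTA", "CoT")


def build_commands(model="gpt-5"):
    """Return one argument list per run described in the Usage section."""
    bench = ["python", "benchmark.py", "--model", model]
    commands = [bench + ["--task", "retrospective", "--filename", "retrospective.csv"]]
    for exp in PROSPECTIVE_EXPS:
        commands.append(
            bench + ["--task", "prospective", "--exp", exp, "--filename", "prospective.csv"]
        )
    commands.append(bench + ["--task", "written", "--filename", "written.csv"])
    commands.append(
        ["python", "counterfactual_test.py", "--model", model, "--filename", "counter.csv"]
    )
    commands.append(
        ["python", "memorization_pilot.py", "--model", model, "--filename", "memorize.csv"]
    )
    return commands


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True) to execute the run instead of printing it.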
