by Tin Nguyen, Logan Bolton, Mohammad R. Taesiri, Trung Bui, and Anh Nguyen.
python=3.10.15
google-generativeai==0.8.3
openai==1.58.1
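Before running the scripts, it can help to confirm that the active environment matches these pins. The following is a minimal sketch, not part of the repository, that uses only the standard library's `importlib.metadata`. The helper names `installed_version` and `find_mismatches` are hypothetical.

```python
import sys
from importlib import metadata
from typing import Callable, Dict, Optional

# Versions pinned in the requirements above.
PINNED = {
    "google-generativeai": "0.8.3",
    "openai": "1.58.1",
}
PINNED_PYTHON = (3, 10, 15)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from its pin to the
    version actually found (None means the package is not installed)."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, wanted in pins.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = found
    return mismatches


if __name__ == "__main__":
    # The Python pin is reported, not enforced.
    if sys.version_info[:3] != PINNED_PYTHON:
        print(f"Note: running Python {sys.version.split()[0]}, pinned is 3.10.15")
    for package, found in find_mismatches(PINNED).items():
        print(f"{package}: expected {PINNED[package]}, found {found}")
```

The `lookup` parameter exists only so the check can be exercised without the real packages installed.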
Run the following command to execute the script:
python main.py --save_answer --llm_model "$llm_model" --dataset "$dataset" --answer_mode "$answer_mode" --data_mode "$data_mode"

--llm_model: Defines the LLM model to use. Choices include: gemini-1.5-pro-002, gemini-1.5-flash-002, gpt-4o-2024-08-06, llama_8b, llama_70b, llama_sambanova_405b, qwen25_coder_32b, qwq_32b, deepseek_r1.
--dataset: Specifies the dataset to evaluate, such as GSM8K, AQUA, or DROP.
--answer_mode: Determines the answering strategy: cot (Chain-of-Thought prompting) or hot (Highlighted Chain-of-Thought prompting).
--data_mode: Selects the samples to run on: random (200 randomly selected samples), longest (the 200 longest samples), shortest (the 200 shortest samples), or full (the whole dataset).
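To make the four `--data_mode` choices concrete, here is a hypothetical sketch of how such a subset could be drawn. It is not the repository's implementation: it assumes each sample is a dict with a `question` field, measures length in characters of that field, and uses a fixed seed for `random`; the actual scripts may measure length or seed sampling differently.

```python
import random
from typing import Dict, List


def select_samples(
    samples: List[Dict[str, str]],
    data_mode: str,
    n: int = 200,
    seed: int = 0,
    key: str = "question",
) -> List[Dict[str, str]]:
    """Hypothetical re-implementation of the --data_mode choices.

    Assumes sample length is the character count of samples[i][key].
    """
    if data_mode == "full":
        return list(samples)
    if data_mode == "random":
        # Fixed seed so repeated runs pick the same subset (an assumption).
        rng = random.Random(seed)
        return rng.sample(samples, min(n, len(samples)))
    if data_mode in ("longest", "shortest"):
        ordered = sorted(
            samples,
            key=lambda s: len(s[key]),
            reverse=(data_mode == "longest"),
        )
        return ordered[:n]
    raise ValueError(f"unknown data_mode: {data_mode!r}")


if __name__ == "__main__":
    # Toy dataset: 500 questions with lengths 1..500.
    toy = [{"question": "q" * k} for k in range(1, 501)]
    print(len(select_samples(toy, "random")))                   # 200
    print(len(select_samples(toy, "longest")[0]["question"]))   # 500
    print(len(select_samples(toy, "shortest")[0]["question"]))  # 1
```

Capping `random` at `min(n, len(samples))` keeps the sketch from failing on datasets with fewer than 200 items.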
For example:
python main.py --save_answer --llm_model "gpt-4o-2024-08-06" --dataset "GSM8K" --answer_mode "cot" --data_mode random

Run the following command to evaluate the results:
python evaluate.py --llm_model "$llm_model" --dataset "$dataset" --answer_mode "$answer_mode" --data_mode "$data_mode"

For example:
python evaluate.py --llm_model "gpt-4o-2024-08-06" --dataset "GSM8K" --answer_mode "cot" --data_mode longest

Run the following command to render the results as HTML pages:
python visualize.py --llm_model "$llm_model" --dataset "$dataset" --answer_mode "$answer_mode" --data_mode "$data_mode" --save_html

For example:
python visualize.py --llm_model "gpt-4o-2024-08-06" --dataset "GSM8K" --answer_mode "cot" --data_mode random --save_html

License: MIT
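The generate, evaluate, and visualize steps can be chained for one configuration. The following is a minimal driver sketch, not part of the repository. It uses only the flags shown in this README, assumes `visualize.py` accepts `--data_mode` (as its example command suggests), and by default only prints the commands. The `--run` switch and the names `pipeline_commands` and `main` belong to this hypothetical driver, not to the project's scripts.

```python
import shlex
import subprocess
import sys
from typing import List


def pipeline_commands(
    llm_model: str, dataset: str, answer_mode: str, data_mode: str
) -> List[List[str]]:
    """Build the generate, evaluate, and visualize commands for one configuration."""
    common = ["--llm_model", llm_model, "--dataset", dataset,
              "--answer_mode", answer_mode, "--data_mode", data_mode]
    # sys.executable runs the scripts with the same interpreter as this driver.
    return [
        [sys.executable, "main.py", "--save_answer", *common],
        [sys.executable, "evaluate.py", *common],
        [sys.executable, "visualize.py", *common, "--save_html"],
    ]


def main(run: bool = False) -> None:
    # Compare both answering strategies on the same model, dataset, and subset.
    for answer_mode in ("cot", "hot"):
        for cmd in pipeline_commands("gpt-4o-2024-08-06", "GSM8K", answer_mode, "random"):
            print(shlex.join(cmd))
            if run:
                # Stop at the first failing step.
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(run="--run" in sys.argv[1:])
```

Keeping `data_mode` identical across the three steps matters: results generated with `random` can only be evaluated and visualized under `random`.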
If you use this for your research, please cite:
@article{nguyen2025hot,
title={HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs},
author={Nguyen, Tin and Bolton, Logan and Taesiri, Mohammad Reza and Nguyen, Anh Totti},
journal={arXiv preprint arXiv:2503.02003},
year={2025}
}

