Official PyTorch code snapshot for the final CataCon manuscript package.
This package contains two main parts:
- entry.py
- model/RxnCatNet_Paper.py
- model/MolGNN.py
- model/CELosses.py
- model/CELosses_InfoNCE_Contrastive.py
- model/RxnCatNet.py
The final revised setting uses:
- GraphSAGE as the default molecular encoder,
- reactant/product graph representations for reaction modeling,
- contrastive learning together with a classification objective,
- random and scaffold-product split evaluation.
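As a rough illustration of how a contrastive term can be combined with a classification objective, the sketch below adds an InfoNCE loss to a cross-entropy loss in plain Python. It is not the code in model/CELosses_InfoNCE_Contrastive.py: the names `anchors`, `positives`, `alpha`, and `temperature`, the choice of cosine similarity, and the additive weighting are all assumptions made for the example.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cosine(u, v):
    """Cosine similarity; the small floor avoids division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, 1e-12)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; anchors[i] and positives[i] form the matched pair,
    and every other row of `positives` serves as an in-batch negative."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        losses.append(-log_softmax(logits)[i])
    return sum(losses) / len(losses)


def cross_entropy(class_logits, labels):
    """Mean cross-entropy over a batch of per-class logits."""
    losses = [-log_softmax(row)[y] for row, y in zip(class_logits, labels)]
    return sum(losses) / len(losses)


def combined_loss(anchors, positives, class_logits, labels,
                  alpha=0.5, temperature=0.1):
    """Classification loss plus a weighted contrastive term (assumed form)."""
    return (cross_entropy(class_logits, labels)
            + alpha * info_nce(anchors, positives, temperature))
```

In a PyTorch training loop the same arithmetic would normally be expressed with tensor operations; the weighting `alpha` here is only a placeholder for whatever balance the actual training code uses.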
- data/preprocess_uspto_catalyst.py
- data/datamodule.py
- data/datasets.py
- data/scaffold_split.py
These scripts cover:
- canonicalization and normalization,
- invalid-row removal,
- duplicate filtering,
- deterministic label remapping,
- deterministic random and scaffold split generation.
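To make the idea of a deterministic scaffold split concrete, here is a minimal sketch of one common heuristic: group rows by scaffold, sort the groups in a fixed order, and fill train, validation, and test in turn so that no scaffold appears in more than one subset. It is not taken from data/scaffold_split.py. The function name, the 80/10/10 fractions, and the assumption that one scaffold string per row has already been computed (for example from the product molecule) are illustrative only.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_val=0.1):
    """Split row indices so that no scaffold spans two subsets.

    scaffolds: one precomputed scaffold string per row (assumed input).
    Returns three lists of row indices: train, val, test.
    """
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by scaffold string so the
    # ordering, and therefore the split, is reproducible.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    n = len(scaffolds)
    train_cap = frac_train * n
    val_cap = (frac_train + frac_val) * n

    train, val, test = [], [], []
    for _, members in ordered:
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        elif len(train) + len(val) + len(members) <= val_cap:
            val.extend(members)
        else:
            test.extend(members)
    return train, val, test
```

Because no random number generator is involved, repeated calls on the same input give the same partition. A random split can be made equally reproducible by fixing the seed before shuffling.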
- Ubuntu Linux
- CUDA 11.8
- Python 3.9
From this directory:
cd code_final
conda create --name catacon-final --file requirements.txt
conda activate catacon-final

The requirements.txt file in this directory is exported directly from the conda environment used in our experiments, so it should be treated as a conda package list rather than a pip-style requirements file.
The graph training pipeline expects a processed USPTO-Catalyst directory, for example:
/path/to/uspto
Use:
python data/preprocess_uspto_catalyst.py \
--input-csv /path/to/uspto_catalyst.csv \
--reference-csv /path/to/uspto_catalyst.csv \
--output-csv /tmp/uspto_catalyst_rebuilt.csv \
--mapping-csv /tmp/uspto_label_mapping.csv \
--stats-csv /tmp/uspto_preprocessing_stats.csv \
--skip-canonicalize

This script supports:
- RDKit-based normalization and canonicalization,
- duplicate removal by (reactant, product, catalyst) and/or (reactant, product),
- deterministic label remapping,
- filtering to a curated final label space through --reference-csv.
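The duplicate-removal, reference-filtering, and label-remapping steps listed above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the logic of data/preprocess_uspto_catalyst.py: rows are modeled as dictionaries with hypothetical keys `reactant`, `product`, and `catalyst`, and labels are assigned by sorted catalyst string as one possible way to keep the mapping reproducible.

```python
def dedupe(rows, key_fields):
    """Keep the first row for each distinct key, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def filter_to_reference(rows, reference_catalysts):
    """Drop rows whose catalyst is absent from the reference label space."""
    allowed = set(reference_catalysts)
    return [row for row in rows if row["catalyst"] in allowed]


def remap_labels(rows):
    """Assign integer labels in sorted catalyst order (assumed convention),
    so the same input always yields the same mapping."""
    catalysts = sorted({row["catalyst"] for row in rows})
    mapping = {cat: i for i, cat in enumerate(catalysts)}
    labeled = [dict(row, label=mapping[row["catalyst"]]) for row in rows]
    return labeled, mapping
```

Deduplicating on `("reactant", "product", "catalyst")` removes exact repeats, while deduplicating on `("reactant", "product")` additionally keeps only one catalyst per reaction; which key applies depends on the options passed to the real script.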
python entry.py \
--model_variant paper \
--dataset /path/to/uspto \
--split_strategy random \
--classification_mode mlp \
--scheduler_strategy warmup_cosine \
--selection_metric val_top1_acc \
--name Y_paper_random_catacon \
--cuda 0

python entry.py \
--model_variant paper \
--dataset /path/to/uspto \
--split_strategy scaffold \
--scaffold_source product \
--classification_mode mlp \
--scheduler_strategy warmup_cosine \
--selection_metric val_top1_acc \
--name Y_paper_scaffold_product_catacon \
--cuda 0
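Both commands pass --scheduler_strategy warmup_cosine. The exact schedule implemented in entry.py is not described here, so the following is only a sketch of one common reading of that name: a linear warmup to the base learning rate followed by cosine decay. The function name and the parameters `warmup_steps`, `total_steps`, `base_lr`, and `min_lr` are hypothetical.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, a function of this shape is typically wrapped in `torch.optim.lr_scheduler.LambdaLR`, which expects a multiplicative factor rather than an absolute learning rate.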