CataCon: a contrastive graph representation learning framework for catalyst prediction

Official PyTorch code snapshot for the final CataCon manuscript package.

What Is Included?

This package contains two main parts:

1. Main CataCon pipeline

entry.py
model/RxnCatNet_Paper.py
model/MolGNN.py
model/CELosses.py
model/CELosses_InfoNCE_Contrastive.py
model/RxnCatNet.py

The final revised setting uses:

GraphSAGE as the default molecular encoder,
reactant/product graph representations for reaction modeling,
contrastive learning together with a classification objective,
evaluation on random and product-scaffold splits.
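
As a rough, framework-free illustration of pairing a contrastive term with a classification term, the sketch below adds a standard InfoNCE loss over reaction/catalyst embedding pairs to a cross-entropy loss. It is not the code in model/CELosses_InfoNCE_Contrastive.py. The function names, the weight `alpha`, and the `temperature` value are assumptions made for this example.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]

def info_nce(rxn_emb, cat_emb, temperature=0.1):
    """Mean InfoNCE loss; row i of rxn_emb is the positive pair of row i of cat_emb.

    All other rows in the batch act as negatives (an assumption of this sketch).
    """
    rxn = [l2_normalize(v) for v in rxn_emb]
    cat = [l2_normalize(v) for v in cat_emb]
    losses = []
    for i, r in enumerate(rxn):
        sims = [dot(r, c) / temperature for c in cat]
        losses.append(-log_softmax(sims)[i])
    return sum(losses) / len(losses)

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under per-row logits."""
    return sum(-log_softmax(row)[y] for row, y in zip(logits, labels)) / len(labels)

def joint_loss(rxn_emb, cat_emb, logits, labels, alpha=0.5, temperature=0.1):
    """Classification loss plus a weighted contrastive loss (alpha is hypothetical)."""
    return cross_entropy(logits, labels) + alpha * info_nce(rxn_emb, cat_emb, temperature)
```

With in-batch negatives, two reactions that share a catalyst are treated as a negative pair here; a real implementation may handle that case differently.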

2. Data processing and split utilities

data/preprocess_uspto_catalyst.py
data/datamodule.py
data/datasets.py
data/scaffold_split.py

These scripts cover:

canonicalization and normalization,
invalid-row removal,
duplicate filtering,
deterministic label remapping,
deterministic random and scaffold split generation.
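
To make "deterministic" concrete, here is a minimal sketch of label remapping and a seeded random split. It only illustrates the idea and does not reproduce data/preprocess_uspto_catalyst.py or data/datamodule.py. The function names, the 80/10/10 fractions, and the seed are assumptions.

```python
import random

def build_label_mapping(catalysts):
    """Map each distinct catalyst string to an integer id.

    Sorting before enumerating makes the ids independent of row order.
    """
    return {cat: idx for idx, cat in enumerate(sorted(set(catalysts)))}

def random_split(n_rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Return (train, val, test) index lists; the same seed gives the same split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # private RNG, global state untouched
    n_train = int(fractions[0] * n_rows)
    n_val = int(fractions[1] * n_rows)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test
```

Using a private `random.Random(seed)` instance keeps the split reproducible even if other code draws from the global random state.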

Environment

1. Recommended environment

Ubuntu Linux
CUDA 11.8
Python 3.9

2. Install dependencies

From this directory:

cd code_final
conda create --name catacon-final --file requirements.txt
conda activate catacon-final

The requirements.txt file in this directory was exported directly from the conda environment used in our experiments. Treat it as a conda package list, not as a pip-style requirements file.

Data Preparation

1. Expected dataset root

The graph training pipeline expects a processed USPTO-Catalyst directory, for example:

/path/to/uspto

2. Rebuild the processed benchmark

Run the preprocessing script:

python data/preprocess_uspto_catalyst.py \
  --input-csv /path/to/uspto_catalyst.csv \
  --reference-csv /path/to/uspto_catalyst.csv \
  --output-csv /tmp/uspto_catalyst_rebuilt.csv \
  --mapping-csv /tmp/uspto_label_mapping.csv \
  --stats-csv /tmp/uspto_preprocessing_stats.csv \
  --skip-canonicalize

This script supports:

RDKit-based normalization and canonicalization,
duplicate removal by (reactant, product, catalyst) and/or (reactant, product),
deterministic label remapping,
filtering to a curated final label space through --reference-csv.
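
The two duplicate-removal modes differ only in which columns form the key. A minimal sketch, assuming each row is a dict with hypothetical column names `reactant`, `product`, and `catalyst`, and assuming the first occurrence is the one kept (the script's actual tie-breaking is not documented here):

```python
def drop_duplicates(rows, key_fields):
    """Keep the first row for each distinct combination of key_fields."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

rows = [
    {"reactant": "A", "product": "B", "catalyst": "[Pd]"},
    {"reactant": "A", "product": "B", "catalyst": "[Pd]"},  # exact duplicate
    {"reactant": "A", "product": "B", "catalyst": "[Cu]"},  # same reaction, other catalyst
]

# Key on the full triple: only the exact duplicate is dropped.
by_triple = drop_duplicates(rows, ("reactant", "product", "catalyst"))

# Key on the reaction only: one row per (reactant, product) pair survives.
by_reaction = drop_duplicates(rows, ("reactant", "product"))
```

Keying on (reactant, product) alone is the stricter filter, since it also discards reactions that appear with more than one catalyst label.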

Running CataCon

1. Final model on the random split

python entry.py \
  --model_variant paper \
  --dataset /path/to/uspto \
  --split_strategy random \
  --classification_mode mlp \
  --scheduler_strategy warmup_cosine \
  --selection_metric val_top1_acc \
  --name Y_paper_random_catacon \
  --cuda 0
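
For readers unfamiliar with the schedule named by `--scheduler_strategy warmup_cosine`, the sketch below shows one common form: linear warmup followed by cosine decay. The exact shape used by entry.py is not documented in this README, so the function name, its parameters, and the formula are assumptions.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly so that the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, `warmup_cosine_lr(step, total_steps=100, warmup_steps=10, base_lr=1e-3)` rises over the first 10 steps and then decays smoothly toward `min_lr` by step 100.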

2. Final model on the product-scaffold split

python entry.py \
  --model_variant paper \
  --dataset /path/to/uspto \
  --split_strategy scaffold \
  --scaffold_source product \
  --classification_mode mlp \
  --scheduler_strategy warmup_cosine \
  --selection_metric val_top1_acc \
  --name Y_paper_scaffold_product_catacon \
  --cuda 0
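
A scaffold split keeps every reaction whose product shares a scaffold in the same partition, so the test set contains scaffolds never seen in training. The sketch below shows one common greedy variant and is not the code in data/scaffold_split.py. It assumes the scaffold string for each row has already been computed (for instance with RDKit's `MurckoScaffold.MurckoScaffoldSmiles`, which is an assumption about the tooling), and the 80/10/10 fractions are hypothetical.

```python
from collections import defaultdict

def scaffold_split(scaffolds, fractions=(0.8, 0.1, 0.1)):
    """Assign whole scaffold groups to train/val/test; returns three index lists."""
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by scaffold string so the order is deterministic.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    n = len(scaffolds)
    train_cap = fractions[0] * n
    val_cap = fractions[1] * n
    train, val, test = [], [], []
    for _, members in ordered:
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        elif len(val) + len(members) <= val_cap:
            val.extend(members)
        else:
            test.extend(members)
    return train, val, test
```

Because groups are never divided, the resulting partition sizes only approximate the requested fractions.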
