
CataCon: a contrastive graph representation learning framework for catalyst prediction

Official PyTorch code snapshot for the final CataCon manuscript package.

[Figure: CataCon framework]

What Is Included?

This package contains two main parts:

1. Main CataCon pipeline

  • entry.py
  • model/RxnCatNet_Paper.py
  • model/MolGNN.py
  • model/CELosses.py
  • model/CELosses_InfoNCE_Contrastive.py
  • model/RxnCatNet.py

The final revised setting uses:

  • GraphSAGE as the default molecular encoder,
  • reactant/product graph representations for reaction modeling,
  • contrastive learning together with a classification objective,
  • random and scaffold-product split evaluation.
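As a rough illustration of how a contrastive term and a classification term can be combined, here is a minimal pure-Python sketch. It assumes the contrastive part is an InfoNCE loss that pairs each reaction embedding with its catalyst embedding inside a batch, and that the two terms are added with a scalar weight. The function names, the `weight` and `temperature` parameters, and the pairing scheme are all illustrative assumptions; the actual losses live in model/CELosses.py and model/CELosses_InfoNCE_Contrastive.py and may differ.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(rxn_embs, cat_embs, temperature=0.1):
    """InfoNCE: reaction i should score highest against catalyst i,
    with the other catalysts in the batch acting as negatives.
    (Assumed pairing; a batch with repeated catalysts would need masking.)"""
    total = 0.0
    for i, rxn in enumerate(rxn_embs):
        sims = [cosine(rxn, cat) / temperature for cat in cat_embs]
        total += -log_softmax(sims)[i]
    return total / len(rxn_embs)


def cross_entropy(logits, labels):
    """Mean cross-entropy of per-sample class logits against integer labels."""
    losses = [-log_softmax(row)[y] for row, y in zip(logits, labels)]
    return sum(losses) / len(losses)


def combined_loss(rxn_embs, cat_embs, logits, labels, weight=1.0, temperature=0.1):
    """Classification loss plus a weighted contrastive loss (hypothetical weighting)."""
    return cross_entropy(logits, labels) + weight * info_nce(rxn_embs, cat_embs, temperature)
```

In a PyTorch training loop the same structure would normally be written with tensor operations such as `torch.nn.functional.cross_entropy`; the list-based version here only shows the arithmetic.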

2. Data processing and split utilities

  • data/preprocess_uspto_catalyst.py
  • data/datamodule.py
  • data/datasets.py
  • data/scaffold_split.py

These scripts cover:

  • canonicalization and normalization,
  • invalid-row removal,
  • duplicate filtering,
  • deterministic label remapping,
  • deterministic random and scaffold split generation.
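The two split styles can be sketched as follows. This is a simplified illustration, not the logic of data/scaffold_split.py: it assumes an 80/10/10 split, a fixed seed for the random split, and scaffold keys that have already been computed as strings (for example by RDKit from the product molecule). The helper names and fractions are hypothetical.

```python
import random
from collections import defaultdict


def random_split(n_rows, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle row indices with a fixed seed, then cut into train/val/test."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed gives the same order
    n_train = int(fractions[0] * n_rows)
    n_val = int(fractions[1] * n_rows)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


def grouped_split(group_keys, fractions=(0.8, 0.1, 0.1)):
    """Assign each group (e.g. one product scaffold) entirely to one split,
    so no scaffold appears in more than one of train/val/test."""
    groups = defaultdict(list)
    for idx, key in enumerate(group_keys):
        groups[key].append(idx)
    # Largest groups first; ties broken by key so the order is reproducible.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    n = len(group_keys)
    train_cap = fractions[0] * n
    val_cap = fractions[1] * n
    train, val, test = [], [], []
    for _, members in ordered:
        if len(train) + len(members) <= train_cap:
            train.extend(members)
        elif len(val) + len(members) <= val_cap:
            val.extend(members)
        else:
            test.extend(members)
    return train, val, test
```

Because whole scaffold groups are kept together, the grouped split tests generalization to unseen product scaffolds, whereas the random split can place closely related reactions on both sides of the train/test boundary.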

Environment

1. Recommended environment

  • Ubuntu Linux
  • CUDA 11.8
  • Python 3.9

2. Install dependencies

From this directory:

cd code_final
conda create --name catacon-final --file requirements.txt
conda activate catacon-final

The requirements.txt file in this directory is exported directly from the conda environment used in our experiments. Treat it as a conda package list, to be consumed with conda create --file as shown above, and not as a pip-style requirements file.
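After activating the environment, a quick sanity check can confirm that the main libraries are importable. The sketch below uses only the standard library; the helper name `check_environment` is hypothetical, and `torch` and `rdkit` are chosen because the README names PyTorch and RDKit.

```python
import importlib.util
import sys


def check_environment(modules=("torch", "rdkit")):
    """Report the Python version and whether each top-level module can be found."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```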

Data Preparation

1. Expected dataset root

The graph training pipeline expects a processed USPTO-Catalyst directory, for example:

/path/to/uspto

2. Rebuild the processed benchmark

Use:

python data/preprocess_uspto_catalyst.py \
  --input-csv /path/to/uspto_catalyst.csv \
  --reference-csv /path/to/uspto_catalyst.csv \
  --output-csv /tmp/uspto_catalyst_rebuilt.csv \
  --mapping-csv /tmp/uspto_label_mapping.csv \
  --stats-csv /tmp/uspto_preprocessing_stats.csv \
  --skip-canonicalize

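The example above passes --skip-canonicalize, which by its name presumably bypasses the canonicalization step. In general, the script supports: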

  • RDKit-based normalization and canonicalization,
  • duplicate removal by (reactant, product, catalyst) and/or (reactant, product),
  • deterministic label remapping,
  • filtering to a curated final label space through --reference-csv.
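The duplicate-removal and label-remapping steps can be illustrated with a small standalone sketch. It assumes each row is a dict with `reactant`, `product`, and `catalyst` fields and that labels are assigned by sorting the distinct catalyst strings. Those column names and the sorting rule are assumptions for illustration, not the behavior of data/preprocess_uspto_catalyst.py.

```python
def drop_duplicates(rows, key_fields=("reactant", "product", "catalyst")):
    """Keep the first row for each distinct key, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def remap_labels(rows, field="catalyst"):
    """Map each distinct catalyst to an integer id.
    Sorting the distinct values makes the mapping independent of row order."""
    distinct = sorted({row[field] for row in rows})
    mapping = {value: idx for idx, value in enumerate(distinct)}
    labeled = [dict(row, label=mapping[row[field]]) for row in rows]
    return labeled, mapping
```

Passing `key_fields=("reactant", "product")` gives the stricter variant, which keeps a single catalyst per reactant/product pair.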

Running CataCon

1. Final model on the random split

python entry.py \
  --model_variant paper \
  --dataset /path/to/uspto \
  --split_strategy random \
  --classification_mode mlp \
  --scheduler_strategy warmup_cosine \
  --selection_metric val_top1_acc \
  --name Y_paper_random_catacon \
  --cuda 0
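
For readers unfamiliar with the term, a warmup-cosine schedule usually means a linear learning-rate ramp followed by cosine decay. The sketch below shows that common form; the function name, its parameters, and the exact shape are assumptions, and the schedule selected by --scheduler_strategy warmup_cosine in entry.py may be defined differently.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay towards min_lr."""
    if step < warmup_steps:
        # Ramp linearly; the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With 100 total steps and 10 warmup steps, the rate rises over steps 0 to 9, peaks at step 10, and falls to `min_lr` by step 100.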

2. Final model on the product-scaffold split

python entry.py \
  --model_variant paper \
  --dataset /path/to/uspto \
  --split_strategy scaffold \
  --scaffold_source product \
  --classification_mode mlp \
  --scheduler_strategy warmup_cosine \
  --selection_metric val_top1_acc \
  --name Y_paper_scaffold_product_catacon \
  --cuda 0
