OrkaZeta/CHILI

CHILI (CHinese Latent diffusion with OCR Integration)

[Figure: CHILI Training Pipeline]

[Figure: CHILI Inference Pipeline]

Setup and Runs

CHILI/
├── asset/
├── config/
│   ├── diffusion_hwdb1.yaml
│   └── vae_hwdb1.yaml
├── logs/
├── model/
│   ├── diffusion/
│   │   ├── __init__.py
│   │   ├── model.py
│   │   ├── scheduler.py
│   │   └── unet.py
│   ├── vae/
│   │   ├── __init__.py
│   │   ├── decoder.py
│   │   ├── encoder.py
│   │   └── vae.py
│   └── content_encoder.py
├── runs/
├── scripts/
│   ├── data_scope.ipynb
│   ├── infer_vae.ipynb
│   ├── test_vae_ocr.py
│   ├── train_diffusion.py
│   ├── train_vae.py
│   ├── hwdb_download.sh
│   ├── train_diffusion.slurm
│   └── train_vae.slurm
├── utils/
│   ├── config.py
│   ├── dataset.py
│   ├── log.py
│   ├── loss.py
│   ├── ocr_score.py
│   └── seed.py
├── LICENSE
├── .gitignore
├── README.md
├── requirements.txt
└── environment.yml

Environment setup

# Option 1: conda
conda env create -f environment.yml
conda activate chili311
# Option 2: pip
pip install -r requirements.txt

HWDB dataset download

bash scripts/hwdb_download.sh
  • Download the HWDB1.0 and HWDB1.1 datasets from the official site;
  • Extract them to the data/HWDB1/ folder;
  • Folder structure:
      • data/CASIA/HWDB1.0/train/<char_id>-f.gnt for the training set;
      • data/CASIA/HWDB1.0/test/<char_id>-t.gnt for the test set;
  • Use scripts/data_scope.ipynb to explore dataset statistics.
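For orientation, the .gnt files listed above can be read with a few lines of Python. This is a minimal sketch, not the loader in utils/dataset.py: it assumes the publicly described CASIA GNT record layout (a 4-byte little-endian sample size, a 2-byte GB tag code, 2-byte width, 2-byte height, then width × height grayscale bytes), and `read_gnt` is a hypothetical helper name.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def read_gnt(stream: BinaryIO) -> Iterator[Tuple[str, int, int, bytes]]:
    """Yield (label, width, height, pixels) for each sample in a .gnt stream."""
    header_size = 10  # 4 (sample size) + 2 (tag code) + 2 (width) + 2 (height)
    while True:
        header = stream.read(header_size)
        if len(header) < header_size:
            break  # end of stream (or a truncated trailing header)
        (sample_size,) = struct.unpack("<I", header[:4])
        width, height = struct.unpack("<HH", header[6:10])
        if sample_size != header_size + width * height:
            raise ValueError("unexpected record size; the assumed layout may not hold")
        pixels = stream.read(width * height)
        if len(pixels) != width * height:
            raise ValueError("truncated bitmap")
        # Assumption: the tag code is the GB-encoded label, NUL-padded when single-byte.
        label = header[4:6].rstrip(b"\x00").decode("gbk", errors="replace")
        yield label, width, height, pixels


# Round-trip one synthetic 2x3 sample instead of touching a real dataset file.
tag = "中".encode("gbk")
record = struct.pack("<I", 10 + 6) + tag + struct.pack("<HH", 2, 3) + bytes(range(6))
samples = list(read_gnt(io.BytesIO(record)))
```

To read a real file, open it in binary mode and pass the handle, e.g. `with open(path, "rb") as f: for label, w, h, px in read_gnt(f): ...`.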

Configuration

  • Config files live in the config/ folder.
  • config/vae_hwdb1.yaml configures VAE training on the HWDB1 dataset.
  • config/diffusion_hwdb1.yaml configures diffusion model training on the HWDB1 dataset.

Scripts

Training VAE

  • Run scripts/train_vae.py to train the VAE;
  • it loads config/vae_hwdb1.yaml;
  • creates the run directory runs/<ts>/vae/;
  • copies the config to vae_hwdb1_<ts>.yaml;
  • writes the checkpoints vae_hwdb1_best_<ts>.pt and vae_hwdb1_last_<ts>.pt;
  • writes logs and TensorBoard files under runs/<ts>/.
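The run-directory convention above can be sketched as follows. This is an illustration under stated assumptions, not the code in scripts/train_vae.py: `prepare_run_dir` is a hypothetical helper, the timestamp format `%Y%m%d_%H%M%S` is inferred from the checkpoint paths in the Experiments table, and placing the config copy inside the run directory is a guess.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def prepare_run_dir(config_path: Path, stage: str, root: Path,
                    ts: Optional[str] = None) -> Path:
    """Create <root>/<ts>/<stage>/ and copy the config there as <name>_<ts>.yaml."""
    # Assumed timestamp format, e.g. 20251212_132836.
    ts = ts or datetime.now().strftime("%Y%m%d_%H%M%S")
    run_dir = root / ts / stage
    # exist_ok=False so two runs can never silently share a directory.
    run_dir.mkdir(parents=True, exist_ok=False)
    copy_name = f"{config_path.stem}_{ts}{config_path.suffix}"
    shutil.copy2(config_path, run_dir / copy_name)
    return run_dir


# Demonstration in a temporary directory with a placeholder config file.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    cfg = tmp_path / "vae_hwdb1.yaml"
    cfg.write_text("# placeholder config\n")
    run_dir = prepare_run_dir(cfg, "vae", tmp_path / "runs", ts="20251212_132836")
    copied = sorted(p.name for p in run_dir.iterdir())
    relative = run_dir.relative_to(tmp_path).as_posix()
```

Keeping a timestamped copy of the config next to the checkpoints makes each run reproducible even after config/vae_hwdb1.yaml is edited for the next experiment.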

VAE inference

  • Use the notebook scripts/infer_vae.ipynb for VAE inference;
  • it loads the trained VAE from model/vae/vae_hwdb1_best_<ts>.pt;
  • writes outputs to Generated/vae_infer/.

VAE OCR evaluation

TODO

Training Diffusion Model

  • Run scripts/train_diffusion.py to train the diffusion model;
  • it loads config/diffusion_hwdb1.yaml;
  • creates the run directory runs/<ts>/diffusion/;
  • copies the config to diffusion_hwdb1_<ts>.yaml;
  • writes the checkpoints diffusion_hwdb1_best_<ts>.pt and diffusion_hwdb1_last_<ts>.pt;
  • writes logs and TensorBoard files under runs/<ts>/.
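The best/last checkpoint pair listed above usually comes from a small piece of bookkeeping in the training loop. The sketch below shows one common way to do it; `CheckpointKeeper` is a hypothetical class, the "lower metric is better" rule is an assumption, and the actual logic in scripts/train_diffusion.py may differ. The save function is injected so the example runs without PyTorch.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


class CheckpointKeeper:
    """Write a 'last' checkpoint on every update and a 'best' one on improvement."""

    def __init__(self, run_dir: Path, prefix: str, ts: str,
                 save_fn: Callable[[Path], None]) -> None:
        self.best_path = run_dir / f"{prefix}_best_{ts}.pt"
        self.last_path = run_dir / f"{prefix}_last_{ts}.pt"
        self.save_fn = save_fn
        self.best_metric = float("inf")  # assumption: lower is better (e.g. a loss)

    def update(self, metric: float) -> bool:
        """Save checkpoints for this evaluation; return True if 'best' was updated."""
        self.save_fn(self.last_path)
        improved = metric < self.best_metric
        if improved:
            self.best_metric = metric
            self.save_fn(self.best_path)
        return improved


# Demonstration with a stand-in save function and arbitrary loss values.
with tempfile.TemporaryDirectory() as tmp:
    written: List[str] = []

    def fake_save(path: Path) -> None:
        path.write_bytes(b"weights")
        written.append(path.name)

    keeper = CheckpointKeeper(Path(tmp), "diffusion_hwdb1", "20251212_132836", fake_save)
    flags = [keeper.update(loss) for loss in (0.5, 0.3, 0.4)]
```

In a PyTorch training loop, `save_fn` could be something like `lambda p: torch.save(model.state_dict(), p)`, where `model` stands for whatever module is being trained.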

Experiments

  • Gp 1: One-DM reimplementation on the HWDB1 dataset
  • Gp 1': Gp 1 checkpoint with continued training (lower LR)
  • Gp 2: CHILI with DDIM

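| ExpID | Comments | Batch Size | LR | Backbone | HighNCE | LowNCE | Content | Xstart | Load Ckpt | Best Epoch | Best Step | Recon Loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 132836 | completely garbled | 1024 | 5e-4 | resnet18 | 1.0 | 1.0 | - | - | N/A | 4 | 4000 | 0.109876 |
| 133224 | completely garbled | 1024 | 5e-4 | resnet32 | 1.0 | 1.0 | - | - | N/A | 3 | 3000 | 0.111537 |
| 135252 | completely garbled | 1024 | 5e-4 | resnet32 | 1.0 | 1.0 | - | - | N/A | 4 | 4000 | 0.110261 |
| 183205 | completely garbled | 1024 | 2e-5 | resnet32 | 1.0 | 1.0 | - | - | N/A | 7 | 8500 | 0.119643 |
| 184510 | epoch 4: a mess | 1024 | 2e-5 | resnet18 | 1.0 | 1.0 | - | - | runs/diff_hwdb1_20251212_132836/diff_epoch0003_best.pt (resume) | 4 | 4000 | 0.109466 |
| 194209 | epoch 4: a mess | 1024 | 2e-5 | resnet18 | 1.0 | 1.0 | - | - | runs/diff_hwdb1_20251212_132836/diff_epoch0003_best.pt (resume) | 4 | 4000 | 0.109466 |
| 213138 | epoch 1: a mess | 1024 | 5e-5 | resnet32 | 1.0 | 1.0 | - | - | runs/diff_hwdb1_20251212_183205/diff_epoch0006_best.pt (resume) | 12 | 15000 | 0.113213 |
| 004312 | epoch 1: a mess | 128 | 5e-5 | resnet18 | 1.0 | 1.0 | - | 1.0 | N/A | 1 | 4000 | 0.153818 |
| 015232 | epoch 4/5: a few characters recognizable | 1024 | 5e-5 | resnet32 | 1.0 | 1.0 | - | - | N/A | 5 | 40000 | 0.114299 |
| 121215 | epoch 14: decent; complex glyphs show broken strokes / evo results are good | 128 | 1e-5 | resnet18 | 0.5 | 0.5 | - | 0.0 | runs/diff_hwdb1_20251213_015232/diff_epoch0005_best.pt (resume) | 8 | 66000 | 0.103914 |
| 164212 | epoch 1: completely garbled | 128 | 1e-4 | resnet18 | 0.5 | 0.5 | 1.0 | 1.0 | N/A | 1 | 2000 | 0.553935 |
| 173327 | epoch 14: complex glyphs unclear / evo yields some good images | 128 | 5e-5 | resnet18 | 1.0 | 1.0 | 1.0 | 1.0 | N/A | 1 | 2000 | 0.525631 |
| 230032 | epoch 1: completely garbled | 128 | 2e-5 | resnet18 | 1.0 | 1.0 | 1.0 | 0.1 | N/A | 1 | 2000 | 0.562972 |
| 001631 | epoch 1: completely garbled | 128 | 2e-5 | resnet18 | 1.0 | 1.0 | 0.5 | 1.0 | N/A | 1 | 1000 | 0.396908 |
| 015721 | epoch 4: complex glyphs unclear | 128 | 2e-5 | resnet18 | 1.0 | 1.0 | 0.0 | 1.0 | N/A | 4 | 33000 | 0.124525 |
| 133318 | epoch 6: images very blurry, strokes disconnected, unrecognizable | 256 | 1e-4 | resnet32 | 1.0 | 1.0 | 0.0 | 1.0 | N/A | 3 | 15000 | 0.126090 |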

Reference

  • One-DM: One-Shot Diffusion Mimicker for Handwritten Text Generation
  • MetaScript: Few-Shot Handwritten Chinese Content Generation via Generative Adversarial Networks
