
mrheng9/CNN-PointNet


CNN-PointNet (CNN + PointNet++) Tutorial

This repository implements a regression model for PMT waveforms (time series) combined with detector geometry (xyz):

  1. A 1D CNN extracts per-PMT features from each PMT waveform.
  2. All PMTs are treated as a point cloud (each PMT is a point):
    • point position: xyz
    • point feature: CNN features (optionally concatenated with static features)
  3. A PointNet++ backbone aggregates all points into a global representation and regresses the final output (default: a 3D vector).

Two training modes are provided:

  • Waveform-only: CNN features from waveform only
    (train_npevst.py + models/pointnet_regression_npevst.py)
  • Waveform + static features: 4 static features are prepended to the waveform
    (train_npevst+feat.py + models/pointnet_regression_npevst_add.py)

1. Repository Layout

CNN-PointNet/
├─ data_utils/
│  └─ npevst_dataset.py                 # Datasets: lazy loading paired x_i.npy / y_i.npy (and optional feat)
├─ models/
│  ├─ pointnet_regression_utils.py      # PointNet++ utilities (FPS, ball query, SA/MSG modules)
│  ├─ pointnet_regression_npevst.py     # waveform-only model: CNNBlock + PointNet++
│  └─ pointnet_regression_npevst_add.py # waveform+feat model: conv on waveform + concat static feats
├─ train_npevst.py                      # Training script (waveform-only)
├─ train_npevst+feat.py                 # Training script (waveform + 4 static features)
├─ export_visE.py                       # Export a column (e.g., visE) from y and match the same test split
├─ plots.py                             # Plotting/analysis helpers (optional, see Section 6)
└─ run_training.sh                      # Example launcher (torchrun commands are commented)

2. Method Overview: How CNN + PointNet++ Are Combined

2.1 Inputs / Outputs (default task)

Waveform input wave:

  • Waveform-only mode: wave has shape (B, N, T, 1)
  • Waveform+feat mode: wave has shape (B, N, (4+T), 1)
    (the first 4 are static features; the remaining T are waveform samples)

Here:

  • B: batch size
  • N = 17612: number of PMTs (assumed fixed in the code)
  • T = 100: number of waveform timesteps used by default

Geometry input xyz:

  • Loaded as (N, 3) and expanded to (B, N, 3) during training.
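A minimal NumPy sketch of the default input shapes follows. It uses random dummy arrays instead of the real x_i.npy files and coordinates file, and all variable names are illustrative. It shows the 4 extra leading entries in waveform+feat mode and how the (N, 3) geometry can be expanded to (B, N, 3) as a view rather than a copy.

```python
import numpy as np

B, N, T = 2, 17612, 100  # batch size, number of PMTs, default timesteps

# Waveform-only mode: T samples per PMT plus a trailing channel dimension of 1.
wave = np.random.rand(B, N, T, 1).astype(np.float32)

# Waveform+feat mode: 4 static features (zeros here) come before the T waveform samples.
static = np.zeros((B, N, 4, 1), dtype=np.float32)
wave_feat = np.concatenate([static, wave], axis=2)  # (B, N, 4 + T, 1)

# Geometry is stored once as (N, 3) and expanded to (B, N, 3) for each batch.
xyz = np.random.rand(N, 3).astype(np.float32)
xyz_batch = np.broadcast_to(xyz[None, :, :], (B, N, 3))  # read-only view, no data copy

print(wave.shape, wave_feat.shape, xyz_batch.shape)
```

In PyTorch the analogous view is usually built with `xyz.unsqueeze(0).expand(B, -1, -1)`. The training scripts may use a different call.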

Model output:

  • (B, 3) regression vector (a 3D vector)
    • train_npevst.py uses MSELoss
    • train_npevst+feat.py uses mean Euclidean distance by default
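For reference, the two default losses can be written in NumPy as below. This is an illustrative re-implementation, not the scripts' code. MSE averages the squared error over all B x 3 components. The mean Euclidean distance averages one L2 distance per event and stays in the units of the output vector.

```python
import numpy as np

def mse_loss(pred, true):
    """Mean squared error over all elements (the default 'mean' reduction of torch.nn.MSELoss)."""
    return float(np.mean((pred - true) ** 2))

def mean_euclidean_distance(pred, true):
    """Average L2 distance between predicted and true 3D vectors, one distance per event."""
    return float(np.mean(np.linalg.norm(pred - true, axis=1)))

pred = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
true = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
print(mse_loss(pred, true))                 # 2 / 6, about 0.3333
print(mean_euclidean_distance(pred, true))  # (0 + sqrt(2)) / 2, about 0.7071
```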

2.2 Per-PMT CNN Feature Extraction

In models/pointnet_regression_npevst.py and ..._add.py:

  1. Extract each PMT’s 1D time series.
  2. Feed it into a CNNBlock to produce a compact feature vector (e.g., 4 dims).
  3. Assemble all PMT features into points of shape (B, D, N) and feed (xyz, points) to PointNet++.
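The three steps above can be sketched in NumPy as follows. `toy_feature_fn` is a hypothetical stand-in for a CNNBlock that just computes four summary statistics. `extract_points` is an illustrative helper, not a function from the repository. PMTs are processed in chunks of `pmt_batch_size`, mirroring the chunking mentioned in Q3 (Section 7). The result is transposed to the channel-first (B, D, N) layout passed to PointNet++.

```python
import numpy as np

def toy_feature_fn(waves):
    """Hypothetical stand-in for a CNNBlock: maps (M, T) waveforms to (M, 4) features."""
    return np.stack(
        [waves.sum(axis=1), waves.max(axis=1), waves.argmax(axis=1), waves.std(axis=1)],
        axis=1,
    ).astype(np.float32)

def extract_points(wave, feature_fn, pmt_batch_size=1000):
    """wave: (B, N, T, 1) -> points: (B, D, N), processing PMTs in chunks to bound memory."""
    B, N, T, _ = wave.shape
    flat = wave[..., 0]  # drop the channel dim -> (B, N, T)
    chunks = []
    for start in range(0, N, pmt_batch_size):
        block = flat[:, start:start + pmt_batch_size, :]    # (B, n_chunk, T)
        n_chunk = block.shape[1]
        feats = feature_fn(block.reshape(B * n_chunk, T))   # one row per (event, PMT)
        chunks.append(feats.reshape(B, n_chunk, -1))        # (B, n_chunk, D)
    per_pmt = np.concatenate(chunks, axis=1)  # (B, N, D)
    return per_pmt.transpose(0, 2, 1)        # (B, D, N)

wave = np.random.rand(2, 17612, 100, 1).astype(np.float32)
points = extract_points(wave, toy_feature_fn)
print(points.shape)  # (2, 4, 17612)
```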

The model also splits PMTs into two groups by type (Hamamatsu vs NNVT) and uses two different CNN blocks (cnn_hama, cnn_nnvt). The type mapping is loaded from a CSV:

  • '/disk_pool1/houyh/data/PMTType_CD_LPMT.csv'

You will almost certainly need to change this path to your local environment (see Section 5.1).
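The Hamamatsu/NNVT split amounts to routing each PMT to one of two feature extractors and writing the results back in the original PMT order. The sketch below assumes you have already turned the CSV into a boolean `is_hama` array of length N. The CSV's column names and type labels are not described here, so that conversion is left to you. `route_by_type` and the two lambdas are hypothetical stand-ins for the `cnn_hama` / `cnn_nnvt` blocks, which must produce the same feature dimension D.

```python
import numpy as np

def route_by_type(waves, is_hama, fn_hama, fn_nnvt):
    """Apply fn_hama to Hamamatsu PMTs and fn_nnvt to NNVT PMTs, keeping PMT order.

    waves: (N, T) per-PMT inputs; is_hama: (N,) boolean mask (assumed to be built from the CSV).
    """
    out_hama = fn_hama(waves[is_hama])   # (N_hama, D)
    out_nnvt = fn_nnvt(waves[~is_hama])  # (N_nnvt, D), D must match out_hama
    out = np.empty((waves.shape[0], out_hama.shape[1]), dtype=out_hama.dtype)
    out[is_hama] = out_hama   # scatter results back to their original PMT slots
    out[~is_hama] = out_nnvt
    return out

is_hama = np.array([True, False, True, False, False, True])
waves = np.ones((6, 3), dtype=np.float32)
out = route_by_type(
    waves, is_hama,
    fn_hama=lambda w: w.sum(axis=1, keepdims=True),   # toy "Hamamatsu" extractor
    fn_nnvt=lambda w: -w.sum(axis=1, keepdims=True),  # toy "NNVT" extractor
)
```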

2.3 PointNet++ Aggregation

PointNet++ components live in models/pointnet_regression_utils.py. The backbone uses:

  • SA1: PointNetSetAbstraction(...)
  • SA2: PointNetSetAbstractionMsg(...) (multi-scale grouping)
  • SA3: PointNetSetAbstraction(..., group_all=True) for global feature
  • MLP head to regress the final output vector
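To make the sampling and grouping steps concrete, here is a simplified single-cloud NumPy version of farthest point sampling (FPS) and ball query. It illustrates the standard PointNet++ techniques and is not the batched PyTorch code in models/pointnet_regression_utils.py. The random starting point and the padding rule (repeat the first neighbor when fewer than `nsample` points fall inside the radius) are assumptions that follow common public implementations.

```python
import numpy as np

def farthest_point_sampling(xyz, npoint, seed=0):
    """Pick npoint indices from xyz (N, 3); each new point maximizes distance to those already chosen."""
    N = xyz.shape[0]
    rng = np.random.default_rng(seed)
    idx = np.zeros(npoint, dtype=np.int64)
    dist = np.full(N, np.inf)          # squared distance to the nearest chosen point
    farthest = int(rng.integers(N))    # random starting point (assumption)
    for i in range(npoint):
        idx[i] = farthest
        d = np.sum((xyz - xyz[farthest]) ** 2, axis=1)
        dist = np.minimum(dist, d)
        farthest = int(np.argmax(dist))
    return idx

def ball_query(xyz, centroids, radius, nsample):
    """For each centroid, return nsample neighbor indices within radius, padded with the first hit."""
    groups = np.empty((len(centroids), nsample), dtype=np.int64)
    for j, c in enumerate(centroids):
        d2 = np.sum((xyz - c) ** 2, axis=1)
        hits = np.flatnonzero(d2 <= radius ** 2)[:nsample]  # centroid itself is always a hit
        groups[j] = np.concatenate([hits, np.full(nsample - len(hits), hits[0])])
    return groups

pts = np.array([[0.0, 0, 0], [1.0, 0, 0], [2.0, 0, 0], [10.0, 0, 0]])
centers = farthest_point_sampling(pts, 2)
groups = ball_query(pts, pts[[0]], radius=1.5, nsample=4)
```

In the real SA modules the grouped neighbor features are then passed through shared MLPs and max-pooled per group.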

3. Data Requirements

3.1 Paired waveform/label files

The training scripts search for:

  • x_dir/x_*.npy
  • y_dir/y_*.npy

Pairs are matched by index i (x_i.npy with y_i.npy), then filtered by start/end (default 3000..4999).
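A sketch of the pairing logic is shown below. It assumes file names of the exact form x_<i>.npy / y_<i>.npy and an exclusive `end` (consistent with the 3000..4999 default above). `find_paired_indices_sketch` is a hypothetical re-implementation; the repository's own find_paired_indices may differ in details. The strict regex also keeps files such as x_fht_pmt_<i>.npy from being mistaken for waveform files if they share a directory.

```python
import glob
import os
import re
import tempfile

def find_paired_indices_sketch(x_dir, y_dir, start=3000, end=5000):
    """Sorted indices i with both x_dir/x_i.npy and y_dir/y_i.npy, keeping start <= i < end."""
    def indices(directory, prefix):
        pattern = re.compile(rf"^{prefix}_(\d+)\.npy$")  # only plain <prefix>_<int>.npy names
        found = set()
        for path in glob.glob(os.path.join(directory, f"{prefix}_*.npy")):
            match = pattern.match(os.path.basename(path))
            if match:
                found.add(int(match.group(1)))
        return found

    paired = indices(x_dir, "x") & indices(y_dir, "y")
    return sorted(i for i in paired if start <= i < end)

# Demo with empty placeholder files in a temporary directory.
demo_dir = tempfile.mkdtemp()
for name in ["x_2999.npy", "x_3000.npy", "x_3001.npy", "x_5000.npy",
             "x_fht_pmt_3000.npy", "y_3000.npy", "y_3001.npy", "y_5000.npy"]:
    open(os.path.join(demo_dir, name), "wb").close()
pairs = find_paired_indices_sketch(demo_dir, demo_dir)
```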

From data_utils/npevst_dataset.py:

  • x_i.npy shape: (num_samples_in_file, 17612, T_total)
    • only the first T timesteps (default T = 100) are used
  • y_i.npy shape: (num_samples_in_file, ncol)
    • columns theta_col=2 and phi_col=3 are used to generate a unit direction vector label:
      • x = sin(theta) cos(phi)
      • y = sin(theta) sin(phi)
      • z = cos(theta)
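The label construction can be written as below. The README does not state the angle units, so radians are assumed here; if the y files store degrees, convert with np.radians first. `direction_from_angles` is an illustrative name, not a function from the repository.

```python
import numpy as np

def direction_from_angles(y, theta_col=2, phi_col=3):
    """Build (num_samples, 3) unit direction vectors from the theta/phi columns (radians assumed)."""
    theta = y[:, theta_col]
    phi = y[:, phi_col]
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=1)

y = np.zeros((2, 8))
y[0, 2], y[0, 3] = 0.0, 0.0              # theta = 0: along +z
y[1, 2], y[1, 3] = np.pi / 2, np.pi / 2  # theta = 90 deg, phi = 90 deg: along +y
dirs = direction_from_angles(y)
```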

3.2 Optional: 4 static feature files (only for train_npevst+feat.py)

For waveform+feat mode, you also need these files under feat_dir for each i:

  • x_fht_pmt_i.npy
  • x_npe_pmt_i.npy
  • x_slope4_pmt_i.npy
  • x_peaktime_pmt_i.npy

For each sample, the dataset loads (17612,) from each feature file, stacks to (17612, 4), and concatenates with waveform (17612, T) along the “sequence” dimension to get (17612, 4+T).
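The stacking step looks like this with random placeholder arrays. The order of the four features inside the stacked (17612, 4) block is assumed to follow the file list above; check npevst_dataset.py before relying on it.

```python
import numpy as np

N, T = 17612, 100
# Placeholder per-sample arrays; in the dataset each comes from one x_*_pmt_i.npy file.
fht, npe, slope4, peaktime = (np.random.rand(N).astype(np.float32) for _ in range(4))
waveform = np.random.rand(N, T).astype(np.float32)

static = np.stack([fht, npe, slope4, peaktime], axis=1)  # (N, 4)
sample = np.concatenate([static, waveform], axis=1)      # (N, 4 + T): static first, then waveform
```

For lazy loading, `np.load(path, mmap_mode="r")[k]` is one way to read the k-th sample from a large per-file array without loading the whole file. Whether NPYPairFeatDataset does exactly this is not confirmed here.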

3.3 Geometry coordinates (norm_coords)

Training scripts load:

  • /disk_pool1/houyh/coords/norm_coords_single.npy

It must be a numpy array of shape:

  • (17612, 3)

You must update this path (or provide the file at that location).
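A small guard like the following turns a wrong coordinates file into an immediate, readable error instead of a shape mismatch deep inside the model (compare Q2 in Section 7). `load_coords` is a hypothetical helper, not part of the repository, and the demo writes temporary files only for illustration.

```python
import os
import tempfile

import numpy as np

NPMT = 17612

def load_coords(path, npmt=NPMT):
    """Load PMT coordinates and fail early if the shape is not (npmt, 3)."""
    xyz = np.load(path)
    if xyz.shape != (npmt, 3):
        raise ValueError(f"expected coords of shape ({npmt}, 3), got {xyz.shape} from {path}")
    return xyz.astype(np.float32)

tmp = tempfile.mkdtemp()
good_path = os.path.join(tmp, "coords_ok.npy")
np.save(good_path, np.zeros((NPMT, 3)))
xyz = load_coords(good_path)

bad_path = os.path.join(tmp, "coords_bad.npy")
np.save(bad_path, np.zeros((100, 3)))
bad_msg = ""
try:
    load_coords(bad_path)
except ValueError as err:
    bad_msg = str(err)
```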


4. Environment Setup (Recommended)

Main dependencies:

  • Python 3.9+ (recommended)
  • PyTorch (recommended 2.x)
  • numpy, matplotlib, tqdm, scikit-learn
  • pandas (required for PMT type CSV in the model)

Example setup:

conda create -n cnn_pointnet python=3.10 -y
conda activate cnn_pointnet

# Install torch matching your CUDA version (example only)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

pip install numpy matplotlib tqdm scikit-learn pandas

5. Training Tutorial (Step-by-step)

5.1 Step 1: Fix hard-coded paths

The current code contains multiple hard-coded local paths:

In training scripts:

  • x_dir = "/disk_pool1/chenhr/decon2"
  • y_dir = "/disk_pool1/wangjb/nu_mu_waveform/y"
  • feat_dir = "/disk_pool1/wangjb/nu_mu_waveform/elec_fea2" (only +feat)
  • xyz = np.load('/disk_pool1/houyh/coords/norm_coords_single.npy')
  • default log_dir = "/home/houyh/CNN-PointNet/experiments/runX"

In model files:

  • PMT type CSV: '/disk_pool1/houyh/data/PMTType_CD_LPMT.csv'

Minimum required changes to run on a new machine:

  1. Edit train_npevst.py / train_npevst+feat.py to point to your x_dir, y_dir, (feat_dir) and coords file.
  2. Edit models/pointnet_regression_npevst.py and models/pointnet_regression_npevst_add.py to point to your PMT type CSV path.

A more maintainable approach is to expose these paths as command-line arguments (see Q4 in Section 7).

5.2 Train waveform-only model (train_npevst.py)

Example command:

python train_npevst.py \
  --gpu 0 \
  --epoch 100 \
  --batch_size 32 \
  --learning_rate 5e-4 \
  --weight_decay 1e-4 \
  --num_features 4 \
  --log_dir ./experiments/wave_only_run

What it does:

  • Finds paired x_i.npy/y_i.npy indices via find_paired_indices
  • Uses NPYPairDataset for lazy sample loading
  • Splits train/test using random_split (test = 20%)
  • Tracks train/test loss each epoch
  • Saves into log_dir:
    • best.pth (best test loss checkpoint)
    • predict_xyz.npy, true_xyz.npy (best-epoch predictions/targets)
    • learning_curve.png (log-scaled y-axis)
    • Test Performance.png (scatter plot; by default uses theta derived from xyz)
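Test Performance.png plots theta derived from the xyz vectors. A minimal NumPy version of that conversion is shown below. Normalizing each vector first is an assumption (useful because predicted vectors need not have unit length), and the exact formula used by the script may differ.

```python
import numpy as np

def theta_from_xyz(v):
    """Zenith angle theta (radians) of each row of v (B, 3), after normalizing to unit length."""
    v = np.asarray(v, dtype=np.float64)
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    return np.arccos(np.clip(unit[:, 2], -1.0, 1.0))  # clip guards against rounding just past +/-1

pred = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]])
thetas = theta_from_xyz(pred)
```

To reproduce a scatter of this kind, apply `theta_from_xyz` to the arrays loaded from true_xyz.npy and predict_xyz.npy in your log_dir and plot one against the other.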

Common arguments:

  • --num_features: per-PMT CNN feature dimension D (default 4)
  • --patience: early stopping patience (default 15)
  • --save_each_epoch: also save last.pth each epoch (default off)
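A minimal sketch of patience-based early stopping as described for --patience: training stops once the test loss has failed to improve for `patience` consecutive epochs. The `EarlyStopping` class is illustrative and not taken from train_npevst.py; the script may, for example, compare losses with a tolerance.

```python
class EarlyStopping:
    """Track the best test loss and signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=15):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, test_loss):
        """Return (improved, should_stop) for this epoch's test loss."""
        if test_loss < self.best:
            self.best = test_loss
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
best_epoch, stopped_at = None, None
for epoch, loss in enumerate([1.0, 0.8, 0.9, 0.85, 0.7]):
    improved, stop = stopper.step(loss)
    if improved:
        best_epoch = epoch  # a training loop would save best.pth here
    if stop:
        stopped_at = epoch
        break
```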

5.3 Train waveform + static features model (train_npevst+feat.py)

Example command:

python "train_npevst+feat.py" \
  --gpu 0 \
  --epoch 100 \
  --batch_size 32 \
  --learning_rate 5e-4 \
  --weight_decay 1e-4 \
  --num_features 4 \
  --log_dir ./experiments/wave_plus_feat_run

Differences vs waveform-only:

  • Dataset becomes NPYPairFeatDataset
  • Input feature length becomes 4 + timesteps
  • models/pointnet_regression_npevst_add.py:
    • applies conv only to the waveform portion
    • concatenates CNN output with the static features, forming per-point feature dimension
      num_features + static_feats
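The model-side split can be sketched in NumPy as below. The helper name, the toy waveform feature function, and the concatenation order (CNN features first, static features last) are assumptions; consult models/pointnet_regression_npevst_add.py for the actual order. With D = num_features = 4 and 4 static features, each point ends up with 8 features.

```python
import numpy as np

N_STATIC = 4

def combine_static_and_wave(wave_feat, wave_feature_fn):
    """wave_feat: (B, N, 4 + T, 1) -> per-point features (B, D + 4, N).

    Only the waveform part goes through wave_feature_fn; static features pass through unchanged.
    """
    B, N, L, _ = wave_feat.shape
    static = wave_feat[:, :, :N_STATIC, 0]  # (B, N, 4)
    wave = wave_feat[:, :, N_STATIC:, 0]    # (B, N, T)
    feats = wave_feature_fn(wave.reshape(B * N, L - N_STATIC)).reshape(B, N, -1)  # (B, N, D)
    combined = np.concatenate([feats, static], axis=2)  # (B, N, D + 4)
    return combined.transpose(0, 2, 1)                  # (B, D + 4, N)

def toy_wave_fn(w):
    """Hypothetical stand-in for the waveform conv: (M, T) -> (M, 4) summary features."""
    return np.stack([w.mean(axis=1), w.max(axis=1), w.min(axis=1), w.std(axis=1)], axis=1)

x = np.random.rand(2, 10, 104, 1)  # small N for the demo
out = combine_static_and_wave(x, toy_wave_fn)
```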

5.4 Export visE (Optional)

export_visE.py extracts one column (default vise_col=7) from all y_i.npy files and reproduces the training test split (using the same seed-based permutation), so that only the test subset is exported.

Example:

python export_visE.py \
  --x_dir /your/x_dir \
  --y_dir /your/y_dir \
  --start 3000 \
  --end 5000 \
  --vise_col 7 \
  --seed 42 \
  --out ./experiments/wave_only_run/visE_test.npy
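Because visE is only useful if its rows line up with predict_xyz.npy, the permutation must match the one used during training. The sketch below uses a seeded NumPy permutation and an 80/20 split with `int(0.8 * n)` training samples; both are assumptions, and `export_test_column` is a hypothetical helper. In particular, torch.utils.data.random_split draws its permutation from a PyTorch generator, so if training uses it, a NumPy permutation will not reproduce the same split. The test indices are kept in permutation order rather than sorted, which matches prediction order only if the test DataLoader does not shuffle.

```python
import numpy as np

def export_test_column(y_all, seed=42, vise_col=7, train_frac=0.8):
    """Return y_all[test_idx, vise_col] for a seeded train/test split (NumPy generator assumed)."""
    n = len(y_all)
    n_train = int(train_frac * n)
    perm = np.random.default_rng(seed).permutation(n)
    test_idx = perm[n_train:]  # permutation order, not sorted, to match an unshuffled test loader
    return y_all[test_idx, vise_col]

y_all = np.arange(100 * 8, dtype=np.float64).reshape(100, 8)  # toy stand-in for stacked y rows
vis = export_test_column(y_all)
```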

6. Plotting & Post-processing (plots.py)

The file plots.py contains utility functions/scripts for visualization and offline analysis of your experiment outputs. Typical use cases include:

  • Plotting training curves (train/test loss vs epoch) from saved logs
  • Visualizing prediction quality (e.g., predicted vs true scatter)
  • Comparing different runs (e.g., different num_features, with/without static features)
  • Additional custom plots for physics/geometry-related metrics (depending on what you save)

How to use it

Because plots.py is a standalone script, usage depends on how the functions are written inside it. A common workflow is:

  1. Run training to produce outputs under --log_dir, such as:
    • loss_log.txt
    • learning_curve.png
    • predict_xyz.npy
    • true_xyz.npy
  2. Run/modify plots.py to read those files and generate additional figures.

Example (conceptual):

python plots.py --log_dir ./experiments/wave_only_run

Note: if plots.py currently does not expose a CLI, you can either:

  • add an argparse interface, or
  • import it from a notebook/script and call its functions directly.
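As an example of the notebook route, the helpers below load predict_xyz.npy and true_xyz.npy from a run directory and summarize the angle between predicted and true directions. They are hypothetical additions, not functions from plots.py, and they assume each row is a 3D direction vector as in the default task. The demo writes two toy arrays to a temporary directory.

```python
import os
import tempfile

import numpy as np

def angular_error_deg(pred, true):
    """Per-event angle in degrees between predicted and true direction vectors."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    t = true / np.linalg.norm(true, axis=1, keepdims=True)
    cos = np.clip(np.sum(p * t, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def summarize_run(log_dir):
    """Load predict_xyz.npy / true_xyz.npy from log_dir and return angular-error statistics."""
    pred = np.load(os.path.join(log_dir, "predict_xyz.npy"))
    true = np.load(os.path.join(log_dir, "true_xyz.npy"))
    err = angular_error_deg(pred, true)
    return {"n": len(err),
            "mean_deg": float(err.mean()),
            "median_deg": float(np.median(err)),
            "p68_deg": float(np.percentile(err, 68))}

demo_dir = tempfile.mkdtemp()
np.save(os.path.join(demo_dir, "predict_xyz.npy"), np.array([[0.0, 0, 1], [1.0, 0, 0]]))
np.save(os.path.join(demo_dir, "true_xyz.npy"), np.array([[0.0, 0, 1], [0.0, 0, 1]]))
summary = summarize_run(demo_dir)
```

Calling `summarize_run` on ./experiments/wave_only_run and ./experiments/wave_plus_feat_run gives a quick side-by-side comparison of the two training modes.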

7. FAQ / Troubleshooting

Q1: It fails because /disk_pool1/... does not exist.

The code contains hard-coded paths. Follow Section 5.1 to update dataset paths, coordinates path, and the PMT type CSV path.

Q2: xyz N=... does not match model.npmt=17612

Your coordinates file is not shaped (17612, 3). This repo assumes N=17612 PMTs.

Q3: Out of memory / too slow

  • Reduce --batch_size
  • Reduce timesteps (requires consistent changes in dataset + model config)
  • The model extracts features in PMT chunks using pmt_batch_size=1000; reducing it saves memory but may slow down training.

Q4: I want everything configurable via CLI arguments

Recommended refactor: add --x_dir --y_dir --feat_dir --coords_path --pmt_type_csv and remove hard-coded paths.
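One possible shape for that refactor uses the standard-library argparse module with the flag names listed above. Which flags are required, the help texts, and the placeholder paths in the demo are assumptions. Passing `args.pmt_type_csv` into the model would also require a new, hypothetical constructor argument, because the model files currently read the CSV path internally.

```python
import argparse

def build_parser():
    """CLI for the paths that are currently hard-coded (flag names from Q4; other details assumed)."""
    p = argparse.ArgumentParser(description="CNN-PointNet training paths")
    p.add_argument("--x_dir", required=True, help="directory containing x_*.npy waveform files")
    p.add_argument("--y_dir", required=True, help="directory containing y_*.npy label files")
    p.add_argument("--feat_dir", default=None,
                   help="static feature directory (train_npevst+feat.py only)")
    p.add_argument("--coords_path", required=True, help="(17612, 3) normalized PMT coordinates .npy")
    p.add_argument("--pmt_type_csv", required=True, help="PMT type CSV for the Hamamatsu/NNVT split")
    return p

# Parsing an explicit list here; a script would call parse_args() with no arguments.
args = build_parser().parse_args([
    "--x_dir", "/data/x", "--y_dir", "/data/y",
    "--coords_path", "/data/norm_coords_single.npy",
    "--pmt_type_csv", "/data/PMTType_CD_LPMT.csv",
])
```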


8. Acknowledgements

  • PointNet++ utilities and Set Abstraction modules are implemented in models/pointnet_regression_utils.py.
  • The CNN + PointNet++ hybrid models are in models/pointnet_regression_npevst*.py.

License

No LICENSE file is detected in the repository root. Add one if you plan to open-source/distribute the code.
