CNN-PointNet (CNN + PointNet++) Tutorial

This repository implements a regression model that combines PMT waveforms (time series) with detector geometry (PMT xyz positions):

1. A 1D CNN extracts per-PMT features from each PMT waveform.
2. All PMTs are treated as a point cloud (each PMT is a point):
   - point position: xyz
   - point feature: CNN features (optionally concatenated with static features)
3. A PointNet++ backbone aggregates all points into a global representation and regresses the final output (default: a 3D vector).

Two training modes are provided:

Waveform-only: per-PMT CNN features are computed from the waveform alone
(train_npevst.py + models/pointnet_regression_npevst.py)
Waveform + static features: 4 static per-PMT features are prepended to the waveform
(train_npevst+feat.py + models/pointnet_regression_npevst_add.py)

1. Repository Layout

CNN-PointNet/
├─ data_utils/
│  └─ npevst_dataset.py                 # Datasets: lazy loading paired x_i.npy / y_i.npy (and optional feat)
├─ models/
│  ├─ pointnet_regression_utils.py      # PointNet++ utilities (FPS, ball query, SA/MSG modules)
│  ├─ pointnet_regression_npevst.py     # waveform-only model: CNNBlock + PointNet++
│  └─ pointnet_regression_npevst_add.py # waveform+feat model: conv on waveform + concat static feats
├─ train_npevst.py                      # Training script (waveform-only)
├─ train_npevst+feat.py                 # Training script (waveform + 4 static features)
├─ export_visE.py                       # Export a column (e.g., visE) from y and match the same test split
├─ plots.py                             # Plotting/analysis helpers (optional, see Section 6)
└─ run_training.sh                      # Example launcher (torchrun commands are commented out)

2. Method Overview: How CNN + PointNet++ Are Combined

2.1 Inputs / Outputs (default task)

Waveform input wave:

Waveform-only mode: wave has shape (B, N, T, 1)
Waveform+feat mode: wave has shape (B, N, (4+T), 1)
(the first 4 are static features; the remaining T are waveform samples)

Here:

B: batch size
N = 17612: number of PMTs (assumed fixed in the code)
T = 100: number of waveform timesteps used by default

Geometry input xyz:

Loaded as (N, 3) and expanded to (B, N, 3) during training.

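Model output:

(B, 3) regression vector, trained against the 3D unit direction label described in Section 3.1

Loss functions:
- train_npevst.py uses MSELoss
- train_npevst+feat.py uses the mean Euclidean distance between predicted and true vectors by default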
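To make the difference between the two default losses concrete, here is a minimal NumPy sketch of the arithmetic. The function names are hypothetical and this is not the scripts' PyTorch code; it only assumes MSELoss uses its default mean reduction over all elements.

```python
import numpy as np

def mse_loss(pred: np.ndarray, true: np.ndarray) -> float:
    """Squared error averaged over all B*3 elements (like MSELoss with mean reduction)."""
    return float(np.mean((pred - true) ** 2))

def mean_euclidean_loss(pred: np.ndarray, true: np.ndarray) -> float:
    """Per-event Euclidean distance ||pred - true||_2, averaged over the batch."""
    return float(np.mean(np.linalg.norm(pred - true, axis=1)))

pred = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
true = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
print(mse_loss(pred, true))             # 2 / 6 = 0.333...
print(mean_euclidean_loss(pred, true))  # (0 + sqrt(2)) / 2 = 0.7071...
```

The element-wise MSE equals the mean squared distance divided by 3, so it penalizes large misses quadratically, while the Euclidean-distance loss grows linearly with the error.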

2.2 Per-PMT CNN Feature Extraction

In models/pointnet_regression_npevst.py and models/pointnet_regression_npevst_add.py, the per-PMT stage works as follows:

1. Extract each PMT's 1D time series.
2. Feed it into a CNNBlock to produce a compact feature vector (e.g., 4 dims).
3. Assemble all PMT features into points of shape (B, D, N) and pass (xyz, points) to PointNet++.
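The steps above can be sketched in NumPy to show how shapes flow from wave to points. This is a shape-level illustration only: toy_cnn_block is a hypothetical stand-in for CNNBlock (it returns summary statistics, not learned features), and the chunk loop mirrors the pmt_batch_size idea from Q3 in Section 7 rather than reproducing the model's code.

```python
import numpy as np

def toy_cnn_block(w: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for CNNBlock: maps (M, T) waveforms to (M, D=4) features."""
    return np.stack([w.mean(axis=1), w.std(axis=1), w.max(axis=1),
                     w.argmax(axis=1).astype(w.dtype)], axis=1)

def extract_points(wave: np.ndarray, pmt_batch_size: int = 1000) -> np.ndarray:
    """wave: (B, N, T, 1) -> points: (B, D, N), processing PMTs in chunks to bound memory."""
    B, N, T, _ = wave.shape
    flat = wave[..., 0]                                     # (B, N, T)
    chunks = []
    for start in range(0, N, pmt_batch_size):
        sl = flat[:, start:start + pmt_batch_size, :]       # (B, n, T)
        n = sl.shape[1]
        feats = toy_cnn_block(sl.reshape(B * n, T))         # (B*n, D)
        chunks.append(feats.reshape(B, n, -1))              # (B, n, D)
    per_pmt = np.concatenate(chunks, axis=1)                # (B, N, D)
    return per_pmt.transpose(0, 2, 1)                       # (B, D, N), channels first

rng = np.random.default_rng(0)
wave = rng.random((2, 2500, 100, 1)).astype(np.float32)    # small N for speed; the detector has N = 17612
points = extract_points(wave)
print(points.shape)  # (2, 4, 2500)
```

Chunking only changes peak memory, not the result, because each PMT's features depend on its own waveform alone.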

The model also splits PMTs into two groups by type (Hamamatsu vs NNVT) and uses two different CNN blocks (cnn_hama, cnn_nnvt). The type mapping is loaded from a CSV:

'/disk_pool1/houyh/data/PMTType_CD_LPMT.csv'

You will need to change this path for your local environment (see Section 5.1).
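The type-based routing can be illustrated with a boolean mask: PMTs flagged as Hamamatsu go through one extractor, the rest through another, and the results are written back in the original PMT order. In this sketch the mask is a hypothetical array (the real mapping comes from the CSV, whose column layout is not described here), and f_hama / f_nnvt are hypothetical stand-ins for cnn_hama and cnn_nnvt.

```python
import numpy as np

def route_by_type(wave, is_hama, f_hama, f_nnvt, d):
    """wave: (B, N, T); is_hama: (N,) bool mask. Returns (B, N, d) features in the original PMT order."""
    B, N, T = wave.shape
    out = np.empty((B, N, d), dtype=wave.dtype)
    for mask, extractor in ((is_hama, f_hama), (~is_hama, f_nnvt)):
        sel = wave[:, mask, :]                                    # (B, n_type, T)
        n = sel.shape[1]
        out[:, mask, :] = extractor(sel.reshape(B * n, T)).reshape(B, n, d)
    return out

def f_hama(w):
    """Hypothetical stand-in for cnn_hama: D = 2 features per waveform."""
    return np.stack([w.sum(axis=1), w.max(axis=1)], axis=1)

def f_nnvt(w):
    """Hypothetical stand-in for cnn_nnvt: D = 2 features per waveform."""
    return np.stack([-w.sum(axis=1), w.min(axis=1)], axis=1)

wave = np.arange(2 * 5 * 3, dtype=np.float64).reshape(2, 5, 3)   # B=2, N=5, T=3
is_hama = np.array([True, False, True, False, False])           # hypothetical type mask
feats = route_by_type(wave, is_hama, f_hama, f_nnvt, d=2)
print(feats.shape)  # (2, 5, 2)
```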

2.3 PointNet++ Aggregation

PointNet++ components live in models/pointnet_regression_utils.py. The backbone uses:

SA1: PointNetSetAbstraction(...)
SA2: PointNetSetAbstractionMsg(...) (multi-scale grouping)
SA3: PointNetSetAbstraction(..., group_all=True) for global feature
MLP head to regress the final output vector
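Before grouping neighbors, the SA layers pick representative centroids with farthest point sampling (FPS, listed among the utilities in models/pointnet_regression_utils.py). As an illustration of the FPS idea only, here is a NumPy version for a single point cloud; the repository's PyTorch version is batched and may choose the first point differently (this sketch starts from a fixed index).

```python
import numpy as np

def farthest_point_sampling(xyz: np.ndarray, npoint: int, start: int = 0) -> np.ndarray:
    """xyz: (N, 3). Return npoint indices, each the point farthest from all previously chosen ones."""
    n = xyz.shape[0]
    idx = np.empty(npoint, dtype=np.int64)
    dist = np.full(n, np.inf)              # squared distance from each point to its nearest chosen centroid
    idx[0] = start
    for i in range(1, npoint):
        d = np.sum((xyz - xyz[idx[i - 1]]) ** 2, axis=1)
        dist = np.minimum(dist, d)         # account for the newest centroid
        idx[i] = int(np.argmax(dist))      # farthest remaining point becomes the next centroid
    return idx

line = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [10, 0, 0]], dtype=float)
print(farthest_point_sampling(line, 4))  # [0 4 3 1]
```

FPS spreads centroids over the whole detector surface, so the subsequent ball queries cover all regions instead of clustering where PMTs happen to be dense.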

3. Data Requirements

3.1 Paired waveform/label files

The training scripts search for:

x_dir/x_*.npy
y_dir/y_*.npy

Pairs are matched by index i (x_i.npy with y_i.npy), then filtered by start/end (default 3000..4999).
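Pairing by index can be sketched with the standard library. The function below is hypothetical (it is not the repository's find_paired_indices) and assumes file names follow the x_<i>.npy / y_<i>.npy pattern exactly and that the range is half-open, start <= i < end, which matches the default 3000..4999 with end = 5000 as passed to export_visE.py in Section 5.4.

```python
import re
import tempfile
from pathlib import Path

def _indices(directory, prefix):
    """Collect integer i from files named <prefix>_<i>.npy."""
    pat = re.compile(rf"{prefix}_(\d+)\.npy")
    found = set()
    for p in Path(directory).glob(f"{prefix}_*.npy"):
        m = pat.fullmatch(p.name)
        if m:
            found.add(int(m.group(1)))
    return found

def find_pairs_sketch(x_dir, y_dir, start=3000, end=5000):
    """Hypothetical: sorted i with both x_i.npy and y_i.npy present and start <= i < end."""
    common = _indices(x_dir, "x") & _indices(y_dir, "y")
    return sorted(i for i in common if start <= i < end)

with tempfile.TemporaryDirectory() as xd, tempfile.TemporaryDirectory() as yd:
    for i in (2999, 3000, 3001, 4999, 5000):
        (Path(xd) / f"x_{i}.npy").touch()
    for i in (3000, 4999, 5000):
        (Path(yd) / f"y_{i}.npy").touch()
    pairs = find_pairs_sketch(xd, yd)
print(pairs)  # [3000, 4999]
```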

From data_utils/npevst_dataset.py:

x_i.npy shape: (num_samples_in_file, 17612, T_total)
- only the first T timesteps (default T = 100) are used
y_i.npy shape: (num_samples_in_file, ncol)
- columns theta_col=2 and phi_col=3 are used to generate a unit direction vector label:
  - x = sin(theta) cos(phi)
  - y = sin(theta) sin(phi)
  - z = cos(theta)
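A minimal sketch of the label construction, plus the inverse that recovers theta from a (possibly unnormalized) predicted vector, which is presumably how the theta in Test Performance.png (Section 5.2) is obtained. Both function names are hypothetical, and the sketch assumes theta and phi are stored in radians (the units are not stated here).

```python
import numpy as np

def direction_from_angles(theta, phi):
    """Unit vector (x, y, z) from polar angle theta and azimuth phi (assumed radians)."""
    theta, phi = np.asarray(theta), np.asarray(phi)
    return np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)

def theta_from_xyz(v):
    """Polar angle of a vector; normalizes first and clips to guard arccos against rounding."""
    v = np.asarray(v)
    cos_t = v[..., 2] / np.linalg.norm(v, axis=-1)
    return np.arccos(np.clip(cos_t, -1.0, 1.0))

y = np.array([[0.0, 0.0, np.pi / 3, np.pi / 4]])   # hypothetical y row; only theta_col=2, phi_col=3 used
label = direction_from_angles(y[:, 2], y[:, 3])     # shape (1, 3)
print(label, theta_from_xyz(label))
```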

3.2 Optional: 4 static feature files (only for `train_npevst+feat.py`)

For waveform+feat mode, you also need these files under feat_dir for each i:

x_fht_pmt_i.npy
x_npe_pmt_i.npy
x_slope4_pmt_i.npy
x_peaktime_pmt_i.npy

For each sample, the dataset loads a (17612,) vector from each feature file, stacks the four into (17612, 4), and concatenates the result with the waveform (17612, T) along the sequence (time) axis to obtain (17612, 4+T).
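The stacking step can be sketched with NumPy on small arrays. The variable names are hypothetical, and the sketch assumes the features are stacked in the order listed above (fht, npe, slope4, peaktime); check the dataset code if the order matters for your analysis.

```python
import numpy as np

N, T = 6, 5   # small sizes for illustration; the real data use N = 17612, T = 100
rng = np.random.default_rng(1)
# Hypothetical per-sample feature vectors, one (N,) array per feature file
fht, npe, slope4, peaktime = (rng.random(N) for _ in range(4))
waveform = rng.random((N, T))

static = np.stack([fht, npe, slope4, peaktime], axis=1)   # (N, 4)
x = np.concatenate([static, waveform], axis=1)            # (N, 4 + T): static first, then waveform
wave_input = x[None, :, :, None]                          # (1, N, 4 + T, 1): a batch of one sample
print(x.shape, wave_input.shape)  # (6, 9) (1, 6, 9, 1)
```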

3.3 Geometry coordinates (`norm_coords`)

Training scripts load:

/disk_pool1/houyh/coords/norm_coords_single.npy

It must be a numpy array of shape:

(17612, 3)

You must update this path (or provide the file at that location).
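A small loader that fails early on a wrong shape (the situation behind Q2 in Section 7) and optionally expands to (B, N, 3) can be sketched as follows. load_coords is hypothetical, not the scripts' code.

```python
import os
import tempfile

import numpy as np

NPMT = 17612

def load_coords(path, npmt=NPMT, batch_size=None):
    """Load (npmt, 3) coordinates, raise on a shape mismatch, optionally expand to (B, npmt, 3)."""
    xyz = np.load(path)
    if xyz.shape != (npmt, 3):
        raise ValueError(f"expected coords of shape ({npmt}, 3), got {xyz.shape}")
    if batch_size is not None:
        xyz = np.broadcast_to(xyz, (batch_size, npmt, 3))   # read-only view, no copy
    return xyz

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "norm_coords_single.npy")
    np.save(path, np.zeros((NPMT, 3), dtype=np.float32))
    batched = load_coords(path, batch_size=4)
print(batched.shape)  # (4, 17612, 3)
```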

4. Environment Setup (Recommended)

Main dependencies:

Python 3.9+ (recommended)
PyTorch (recommended 2.x)
numpy, matplotlib, tqdm, scikit-learn
pandas (required by the model files to read the PMT type CSV)

Example setup:

conda create -n cnn_pointnet python=3.10 -y
conda activate cnn_pointnet

# Install torch matching your CUDA version (example only)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

pip install numpy matplotlib tqdm scikit-learn pandas

5. Training Tutorial (Step-by-step)

5.1 Step 1: Fix hard-coded paths

The current code contains multiple hard-coded local paths:

In training scripts:

x_dir = "/disk_pool1/chenhr/decon2"
y_dir = "/disk_pool1/wangjb/nu_mu_waveform/y"
feat_dir = "/disk_pool1/wangjb/nu_mu_waveform/elec_fea2" (only +feat)
xyz = np.load('/disk_pool1/houyh/coords/norm_coords_single.npy')
default log_dir = "/home/houyh/CNN-PointNet/experiments/runX"

In model files:

PMT type CSV: '/disk_pool1/houyh/data/PMTType_CD_LPMT.csv'

Minimum required changes to run on a new machine:

Edit train_npevst.py / train_npevst+feat.py to point to your x_dir, y_dir, feat_dir (+feat script only), and coords file.
Edit models/pointnet_regression_npevst.py and models/pointnet_regression_npevst_add.py to point to your PMT type CSV path.

A more maintainable approach is to turn these paths into command-line arguments (see Q4 in Section 7).
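A minimal argparse sketch of that refactor. The flag names follow the suggestion in Q4 (Section 7) and the defaults are the current hard-coded paths, so behavior is unchanged unless a flag is given. add_path_args is hypothetical; the PMT type CSV path would additionally have to be passed into the model constructor, which the models do not currently accept.

```python
import argparse

def add_path_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical helper: expose the hard-coded paths as CLI flags with the current values as defaults."""
    parser.add_argument("--x_dir", default="/disk_pool1/chenhr/decon2")
    parser.add_argument("--y_dir", default="/disk_pool1/wangjb/nu_mu_waveform/y")
    parser.add_argument("--feat_dir", default="/disk_pool1/wangjb/nu_mu_waveform/elec_fea2",
                        help="only used by train_npevst+feat.py")
    parser.add_argument("--coords_path", default="/disk_pool1/houyh/coords/norm_coords_single.npy")
    parser.add_argument("--pmt_type_csv", default="/disk_pool1/houyh/data/PMTType_CD_LPMT.csv",
                        help="must be forwarded to the model instead of being read inside it")
    return parser

parser = add_path_args(argparse.ArgumentParser(description="CNN-PointNet training"))
args = parser.parse_args(["--x_dir", "/data/x", "--coords_path", "/data/coords.npy"])
print(args.x_dir, args.y_dir)
```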

5.2 Train waveform-only model (`train_npevst.py`)

Example command:

python train_npevst.py \
  --gpu 0 \
  --epoch 100 \
  --batch_size 32 \
  --learning_rate 5e-4 \
  --weight_decay 1e-4 \
  --num_features 4 \
  --log_dir ./experiments/wave_only_run

What it does:

Finds paired x_i.npy/y_i.npy indices via find_paired_indices
Uses NPYPairDataset for lazy sample loading
Splits train/test using random_split (test = 20%)
Tracks train/test loss each epoch
Saves into log_dir:
- best.pth (best test loss checkpoint)
- predict_xyz.npy, true_xyz.npy (best-epoch predictions/targets)
- learning_curve.png (log-scaled y-axis)
- Test Performance.png (scatter plot; by default uses theta derived from xyz)

Common arguments:

--num_features: per-PMT CNN feature dimension D (default 4)
--patience: early stopping patience (default 15)
--save_each_epoch: also save last.pth each epoch (default off)
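A minimal sketch of patience-based early stopping, assuming the common rule: stop after patience consecutive epochs without a new best test loss, and save best.pth at each new best. The class is hypothetical, and the exact rule in train_npevst.py may differ (for example, a minimum-improvement threshold).

```python
class EarlyStopping:
    """Track the best test loss and signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience: int = 15):
        self.patience = patience
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch: int, test_loss: float) -> bool:
        """Return True if training should stop after this epoch."""
        if test_loss < self.best:
            # New best: this is where best.pth would be written
            self.best, self.best_epoch, self.bad_epochs = test_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
losses = [1.0, 0.8, 0.9, 0.85, 0.7]
stopped_at = next((e for e, loss in enumerate(losses) if stopper.step(e, loss)), None)
print(stopped_at, stopper.best_epoch)  # 3 1
```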

5.3 Train waveform + static features model (`train_npevst+feat.py`)

Example command:

python "train_npevst+feat.py" \
  --gpu 0 \
  --epoch 100 \
  --batch_size 32 \
  --learning_rate 5e-4 \
  --weight_decay 1e-4 \
  --num_features 4 \
  --log_dir ./experiments/wave_plus_feat_run

Differences vs waveform-only:

Dataset becomes NPYPairFeatDataset
Input feature length becomes 4 + T
models/pointnet_regression_npevst_add.py:
- applies the conv layers only to the waveform portion
- concatenates the CNN output with the static features, giving a per-point feature dimension of
  num_features + static_feats (num_features + 4 with the 4 static features)
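The split-and-concatenate step can be sketched in NumPy. feat_model_points and toy_conv are hypothetical stand-ins for the logic in pointnet_regression_npevst_add.py; the sketch assumes the static features occupy the first 4 positions (Section 2.1) and are appended after the CNN output, which may not match the model's exact concatenation order.

```python
import numpy as np

NUM_STATIC = 4

def feat_model_points(x, conv_fn):
    """x: (B, N, 4+T, 1). Run conv_fn on the waveform part only, then append the static features.
    Returns (B, N, D + 4), where D is the output width of conv_fn."""
    static = x[:, :, :NUM_STATIC, 0]                            # (B, N, 4), passed through unchanged
    wave = x[:, :, NUM_STATIC:, 0]                              # (B, N, T), the only part the CNN sees
    B, N, T = wave.shape
    cnn = conv_fn(wave.reshape(B * N, T)).reshape(B, N, -1)     # (B, N, D)
    return np.concatenate([cnn, static], axis=-1)

def toy_conv(w):
    """Hypothetical stand-in for the conv layers: D = 4 summary features per waveform."""
    return np.stack([w.mean(1), w.std(1), w.min(1), w.max(1)], axis=1)

rng = np.random.default_rng(2)
x = rng.random((2, 3, NUM_STATIC + 10, 1))
pts = feat_model_points(x, toy_conv)
print(pts.shape)  # (2, 3, 8)
```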

5.4 Export `visE` (Optional)

export_visE.py extracts one column (default vise_col=7) from the y_i.npy files in the selected index range, then reproduces the training-time test split (using the same seed-based permutation) so that only the test subset is exported.

Example:

python export_visE.py \
  --x_dir /your/x_dir \
  --y_dir /your/y_dir \
  --start 3000 \
  --end 5000 \
  --vise_col 7 \
  --seed 42 \
  --out ./experiments/wave_only_run/visE_test.npy
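The reason the split can be reproduced: with the same sample order and the same seed, a seeded permutation yields the same test indices in both scripts. The sketch below uses NumPy's Generator purely for illustration; training uses random_split, and a NumPy permutation will not reproduce torch's indices, so export_visE.py must use the same RNG and sample ordering as training. split_test_indices is hypothetical; the 20% test fraction comes from Section 5.2.

```python
import numpy as np

def split_test_indices(num_samples: int, seed: int = 42, test_frac: float = 0.2) -> np.ndarray:
    """Hypothetical helper: deterministic test-set indices taken from the tail of a seeded permutation."""
    perm = np.random.default_rng(seed).permutation(num_samples)
    n_test = int(round(num_samples * test_frac))
    return perm[num_samples - n_test:]

vise = np.linspace(0.0, 1.0, 10)   # stand-in for the extracted visE column
a = split_test_indices(len(vise))
b = split_test_indices(len(vise))
vise_test = vise[a]                # same seed -> same indices -> same exported subset
print(a, b)
```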

6. Plotting & Post-processing (`plots.py`)

The file plots.py contains utility functions/scripts for visualization and offline analysis of your experiment outputs. Typical use cases include:

Plotting training curves (train/test loss vs epoch) from saved logs
Visualizing prediction quality (e.g., predicted vs true scatter)
Comparing different runs (e.g., different num_features, with/without static features)
Additional custom plots for physics/geometry-related metrics (depending on what you save)

How to use it

Because plots.py is a standalone script, its exact usage depends on how its functions are written. A typical workflow is:

Run training to produce outputs under --log_dir, such as:
- loss_log.txt
- learning_curve.png
- predict_xyz.npy
- true_xyz.npy
Run/modify plots.py to read those files and generate additional figures.

Example (conceptual):

python plots.py --log_dir ./experiments/wave_only_run

Note: if plots.py does not currently expose a CLI, you can either add an argparse interface or import it from a notebook/script and call its functions directly.
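As an example of an additional metric worth plotting, here is a sketch of the per-event opening angle between predicted and true directions. It assumes predict_xyz.npy and true_xyz.npy hold (num_test, 3) arrays in the same event order; angular_error_deg is hypothetical. A histogram of err, or its 68% quantile as a resolution summary, would be natural follow-ups.

```python
import numpy as np

def angular_error_deg(pred, true):
    """Opening angle in degrees between predicted and true direction vectors, row by row."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)   # predictions need not be unit length
    true = true / np.linalg.norm(true, axis=1, keepdims=True)
    cos = np.clip(np.sum(pred * true, axis=1), -1.0, 1.0)       # clip guards arccos against rounding
    return np.degrees(np.arccos(cos))

# In a real run: pred = np.load(f"{log_dir}/predict_xyz.npy"); true = np.load(f"{log_dir}/true_xyz.npy")
pred = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 0.0]])
true = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
err = angular_error_deg(pred, true)
print(err, np.percentile(err, 68))
```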

7. FAQ / Troubleshooting

Q1: It fails because `/disk_pool1/...` does not exist.

The code contains hard-coded paths. Follow Section 5.1 to update dataset paths, coordinates path, and the PMT type CSV path.

Q2: `xyz N=... does not match model.npmt=17612`

Your coordinates file is not shaped (17612, 3). This repo assumes N=17612 PMTs.

Q3: Out of memory / too slow

Reduce --batch_size
Reduce timesteps (requires consistent changes in dataset + model config)
The model extracts features in PMT chunks using pmt_batch_size=1000; reducing it saves memory but may slow down training.

Q4: I want everything configurable via CLI arguments

Recommended refactor: add --x_dir --y_dir --feat_dir --coords_path --pmt_type_csv and remove hard-coded paths.

8. Acknowledgements

PointNet++ utilities and Set Abstraction modules are implemented in models/pointnet_regression_utils.py.
The CNN + PointNet++ hybrid models are in models/pointnet_regression_npevst*.py.

License

No LICENSE file is detected in the repository root. Add one if you plan to open-source/distribute the code.
