This repository implements a regression model for PMT waveforms (time series) combined with detector geometry (xyz):
- A 1D CNN extracts per-PMT features from each PMT waveform.
- All PMTs are treated as a point cloud (each PMT is a point):
  - point position: xyz
  - point feature: CNN features (optionally concatenated with static features)
- A PointNet++ backbone aggregates all points into a global representation and regresses the final output (default: a 3D vector).
Two training modes are provided:
- Waveform-only: CNN features from the waveform only (train_npevst.py + models/pointnet_regression_npevst.py)
- Waveform + static features: concatenate 4 static features before the waveform (train_npevst+feat.py + models/pointnet_regression_npevst_add.py)
CNN-PointNet/
├─ data_utils/
│ └─ npevst_dataset.py # Datasets: lazy loading paired x_i.npy / y_i.npy (and optional feat)
├─ models/
│ ├─ pointnet_regression_utils.py # PointNet++ utilities (FPS, ball query, SA/MSG modules)
│ ├─ pointnet_regression_npevst.py # waveform-only model: CNNBlock + PointNet++
│ └─ pointnet_regression_npevst_add.py # waveform+feat model: conv on waveform + concat static feats
├─ train_npevst.py # Training script (waveform-only)
├─ train_npevst+feat.py # Training script (waveform + 4 static features)
├─ export_visE.py # Export a column (e.g., visE) from y and match the same test split
├─ plots.py # Plotting/analysis helpers (optional, see Section 6.5)
└─ run_training.sh # Example launcher (torchrun commands are commented)
Waveform input wave:
- Waveform-only mode: wave has shape (B, N, T, 1)
- Waveform+feat mode: wave has shape (B, N, (4+T), 1) (the first 4 are static features; the remaining T are waveform samples)
Here:
- B: batch size
- N = 17612: number of PMTs (assumed fixed in the code)
- T = 100: number of waveform timesteps used by default
Geometry input xyz:
- Loaded as (N, 3) and expanded to (B, N, 3) during training.
Model output:
- (B, 3) regression vector (a 3D vector)
- train_npevst.py uses MSELoss
- train_npevst+feat.py uses mean Euclidean distance by default
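The two loss choices above differ in what they average. The following is a minimal NumPy sketch, not the repository's code: `mse_loss` and `mean_euclidean_distance` are illustrative names, and the first assumes the default element-wise mean reduction of MSELoss.

```python
import numpy as np

def mse_loss(pred, target):
    """Mean of squared errors over every element (element-wise mean reduction)."""
    return float(np.mean((pred - target) ** 2))

def mean_euclidean_distance(pred, target):
    """Mean over the batch of the L2 distance between predicted and true vectors."""
    return float(np.mean(np.linalg.norm(pred - target, axis=1)))

# Two (B, 3) arrays: the first row matches exactly, the second is off by (1, -1, 0).
pred = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
target = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])

print(mse_loss(pred, target))
print(mean_euclidean_distance(pred, target))
```

The Euclidean variant is expressed in the same units as the output vector, so its value reads directly as an average distance between prediction and label.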
In models/pointnet_regression_npevst.py and ..._add.py:
- Extract each PMT’s 1D time series.
- Feed it into a CNNBlock to produce a compact feature vector (e.g., 4 dims).
- Assemble all PMT features into points of shape (B, D, N) and feed (xyz, points) to PointNet++.
The model also splits PMTs into two groups by type (Hamamatsu vs NNVT) and uses two different CNN blocks (cnn_hama, cnn_nnvt). The type mapping is loaded from a CSV:
'/disk_pool1/houyh/data/PMTType_CD_LPMT.csv'
You will almost certainly need to change this path to your local environment (see Section 5).
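The per-type routing and the final (B, D, N) layout can be sketched in NumPy as follows. This is only an illustration under assumptions: `extract_features`, `stats_a`, and `stats_b` are hypothetical names, the two extractors are simple statistics standing in for cnn_hama and cnn_nnvt, and the boolean mask stands in for whatever the model builds from the CSV.

```python
import numpy as np

def extract_features(wave, is_hama, extract_hama, extract_nnvt):
    """Route each PMT's waveform to the extractor for its type.

    wave:    (B, N, T) waveforms
    is_hama: (N,) boolean mask, True for Hamamatsu PMTs, False for NNVT
    returns: (B, D, N) per-PMT features, the layout used for `points`
    """
    B, N, T = wave.shape
    hama = extract_hama(wave[:, is_hama, :].reshape(-1, T))   # (B * N_hama, D)
    nnvt = extract_nnvt(wave[:, ~is_hama, :].reshape(-1, T))  # (B * N_nnvt, D)
    D = hama.shape[1]
    feats = np.empty((B, N, D), dtype=wave.dtype)
    # Scatter each group's features back to the original PMT positions.
    feats[:, is_hama, :] = hama.reshape(B, -1, D)
    feats[:, ~is_hama, :] = nnvt.reshape(B, -1, D)
    return feats.transpose(0, 2, 1)  # (B, N, D) -> (B, D, N)

# Placeholder extractors (not real CNNs); each maps (M, T) -> (M, 4).
def stats_a(w):
    return np.stack([w.sum(1), w.max(1), w.argmax(1), w.std(1)], axis=1)

def stats_b(w):
    return np.stack([w.sum(1), w.min(1), w.argmin(1), w.mean(1)], axis=1)

rng = np.random.default_rng(0)
wave = rng.random((2, 6, 100))
is_hama = np.array([True, False, True, True, False, False])
points = extract_features(wave, is_hama, stats_a, stats_b)
print(points.shape)  # (2, 4, 6)
```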
PointNet++ components live in models/pointnet_regression_utils.py. The backbone uses:
- SA1: PointNetSetAbstraction(...)
- SA2: PointNetSetAbstractionMsg(...) (multi-scale grouping)
- SA3: PointNetSetAbstraction(..., group_all=True) for global feature
- MLP head to regress the final output vector
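Set abstraction layers in PointNet++ typically choose their centroids with farthest point sampling (FPS), which the utilities file lists among its helpers. The sketch below shows the greedy FPS idea in plain NumPy. It is not the repository's implementation, and `farthest_point_sampling` with its `start` parameter is an illustrative name and signature.

```python
import numpy as np

def farthest_point_sampling(xyz, n_samples, start=0):
    """Pick n_samples indices from xyz (N, 3) that are spread far apart.

    Greedy rule: repeatedly add the point whose distance to the
    already-selected set is largest.
    """
    n_points = xyz.shape[0]
    selected = np.empty(n_samples, dtype=np.int64)
    # Squared distance from every point to its nearest selected point so far.
    min_dist = np.full(n_points, np.inf)
    current = start
    for i in range(n_samples):
        selected[i] = current
        dist = np.sum((xyz - xyz[current]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
        current = int(np.argmax(min_dist))
    return selected

# Four points on a line; starting from index 0, the far end (index 3) is picked next.
xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [10.0, 0.0, 0.0]])
idx = farthest_point_sampling(xyz, n_samples=3)
print(idx)  # [0 3 2]
```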
The training scripts search for:
- x_dir/x_*.npy
- y_dir/y_*.npy
Pairs are matched by index i (x_i.npy with y_i.npy), then filtered by start/end (default 3000..4999).
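The matching rule can be sketched with the standard library alone. This is a hypothetical re-implementation for illustration, not the repository's find_paired_indices: the name `paired_indices_sketch` is invented, and the half-open range (start <= i < end) is an assumption chosen so that start=3000, end=5000 yields 3000..4999.

```python
import re
import tempfile
from pathlib import Path

def paired_indices_sketch(x_dir, y_dir, start, end):
    """Return sorted indices i for which both x_i.npy and y_i.npy exist.

    Assumption: the range is half-open, start <= i < end.
    """
    def indices(directory, prefix):
        pattern = re.compile(rf"^{prefix}_(\d+)\.npy$")
        found = set()
        for path in Path(directory).iterdir():
            match = pattern.match(path.name)
            if match:
                found.add(int(match.group(1)))
        return found

    common = indices(x_dir, "x") & indices(y_dir, "y")
    return sorted(i for i in common if start <= i < end)

# Demo on empty placeholder files in temporary directories.
with tempfile.TemporaryDirectory() as x_dir, tempfile.TemporaryDirectory() as y_dir:
    for i in (2999, 3000, 3001, 3002):
        (Path(x_dir) / f"x_{i}.npy").touch()
    for i in (3000, 3002, 5000):
        (Path(y_dir) / f"y_{i}.npy").touch()
    pairs = paired_indices_sketch(x_dir, y_dir, start=3000, end=5000)

print(pairs)  # [3000, 3002]
```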
From data_utils/npevst_dataset.py:
- x_i.npy shape: (num_samples_in_file, 17612, T_total)
  - only the first timesteps (default 100) are used
- y_i.npy shape: (num_samples_in_file, ncol)
  - columns theta_col=2 and phi_col=3 are used to generate a unit direction vector label:
    - x = sin(theta) cos(phi)
    - y = sin(theta) sin(phi)
    - z = cos(theta)
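The angle-to-vector conversion for the labels can be written in a few lines of NumPy. This is a sketch that assumes theta and phi are stored in radians; `direction_from_angles` is an illustrative name, not a function from the dataset module.

```python
import numpy as np

def direction_from_angles(y, theta_col=2, phi_col=3):
    """Build (num_samples, 3) unit vectors from the theta/phi columns of y."""
    theta = y[:, theta_col]
    phi = y[:, phi_col]
    return np.stack(
        [
            np.sin(theta) * np.cos(phi),  # x
            np.sin(theta) * np.sin(phi),  # y
            np.cos(theta),                # z
        ],
        axis=1,
    )

# Three synthetic rows pointing along +z, +x and +y.
y = np.zeros((3, 8))
y[:, 2] = [0.0, np.pi / 2, np.pi / 2]   # theta
y[:, 3] = [0.0, 0.0, np.pi / 2]         # phi
labels = direction_from_angles(y)
print(np.round(labels, 6))
```

Every row has unit length by construction, since sin²(theta)(cos²(phi) + sin²(phi)) + cos²(theta) = 1.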
For waveform+feat mode, you also need these files under feat_dir for each i:
- x_fht_pmt_i.npy
- x_npe_pmt_i.npy
- x_slope4_pmt_i.npy
- x_peaktime_pmt_i.npy
For each sample, the dataset loads (17612,) from each feature file, stacks to (17612, 4), and concatenates with waveform (17612, T) along the “sequence” dimension to get (17612, 4+T).
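The stacking and concatenation described above look like this in NumPy. Random arrays stand in for the loaded files, and the column order (fht, npe, slope4, peaktime) is an assumption taken from the order of the file list.

```python
import numpy as np

N, T = 17612, 100
rng = np.random.default_rng(0)

# Stand-ins for one sample taken from each static-feature file, each of shape (N,).
fht, npe, slope4, peaktime = (rng.random(N) for _ in range(4))
waveform = rng.random((N, T))

static = np.stack([fht, npe, slope4, peaktime], axis=1)  # (N, 4)
sample = np.concatenate([static, waveform], axis=1)      # (N, 4 + T)

print(static.shape, sample.shape)  # (17612, 4) (17612, 104)
```

Because the static features occupy the first 4 positions, a model can recover the two parts with `sample[:, :4]` and `sample[:, 4:]`.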
Training scripts load:
/disk_pool1/houyh/coords/norm_coords_single.npy
It must be a numpy array of shape:
(17612, 3)
You must update this path (or provide the file at that location).
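A shape check before training makes a wrong coordinates file fail early with a clear message. The sketch below is an assumption-laden illustration: `expand_coords` is a hypothetical helper, and a zero array stands in for the result of np.load on your coordinates path.

```python
import numpy as np

def expand_coords(xyz, batch_size, n_pmts=17612):
    """Validate the (N, 3) geometry array and view it as (B, N, 3)."""
    if xyz.shape != (n_pmts, 3):
        raise ValueError(f"expected coords of shape ({n_pmts}, 3), got {xyz.shape}")
    # broadcast_to returns a read-only view, so no per-batch copy is made.
    return np.broadcast_to(xyz, (batch_size, n_pmts, 3))

xyz = np.zeros((17612, 3), dtype=np.float32)  # stand-in for the loaded coordinates
batch_xyz = expand_coords(xyz, batch_size=32)
print(batch_xyz.shape)  # (32, 17612, 3)
```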
Main dependencies:
- Python 3.9+ (recommended)
- PyTorch (recommended 2.x)
- numpy, matplotlib, tqdm, scikit-learn
- pandas (required for PMT type CSV in the model)
Example setup:
conda create -n cnn_pointnet python=3.10 -y
conda activate cnn_pointnet
# Install torch matching your CUDA version (example only)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install numpy matplotlib tqdm scikit-learn pandas
The current code contains multiple hard-coded local paths:
In training scripts:
- x_dir = "/disk_pool1/chenhr/decon2"
- y_dir = "/disk_pool1/wangjb/nu_mu_waveform/y"
- feat_dir = "/disk_pool1/wangjb/nu_mu_waveform/elec_fea2" (only +feat)
- xyz = np.load('/disk_pool1/houyh/coords/norm_coords_single.npy')
- default log_dir = "/home/houyh/CNN-PointNet/experiments/runX"
In model files:
- PMT type CSV: '/disk_pool1/houyh/data/PMTType_CD_LPMT.csv'
Minimum required changes to run on a new machine:
- Edit train_npevst.py / train_npevst+feat.py to point to your x_dir, y_dir, (feat_dir) and coords file.
- Edit models/pointnet_regression_npevst.py and models/pointnet_regression_npevst_add.py to point to your PMT type CSV path.
More maintainable approach: convert these into command-line arguments.
Example command:
python train_npevst.py \
--gpu 0 \
--epoch 100 \
--batch_size 32 \
--learning_rate 5e-4 \
--weight_decay 1e-4 \
--num_features 4 \
--log_dir ./experiments/wave_only_run
What it does:
- Finds paired x_i.npy / y_i.npy indices via find_paired_indices
- Uses NPYPairDataset for lazy sample loading
- Splits train/test using random_split (test = 20%)
- Tracks train/test loss each epoch
- Saves into log_dir:
  - best.pth (best test loss checkpoint)
  - predict_xyz.npy, true_xyz.npy (best-epoch predictions/targets)
  - learning_curve.png (log-scaled y-axis)
  - Test Performance.png (scatter plot; by default uses theta derived from xyz)
Common arguments:
- --num_features: per-PMT CNN feature dimension D (default 4)
- --patience: early stopping patience (default 15)
- --save_each_epoch: also save last.pth each epoch (default off)
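The usual meaning of an early stopping patience is sketched below. The exact criterion in the training scripts may differ, and `EarlyStopping` is a hypothetical helper class written for this example, not part of the repository.

```python
class EarlyStopping:
    """Stop once the test loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=15):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, test_loss):
        """Record one epoch; return (improved, should_stop)."""
        if test_loss < self.best:
            self.best = test_loss
            self.bad_epochs = 0
            return True, False
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience

# With patience=2, two epochs without improvement trigger the stop.
stopper = EarlyStopping(patience=2)
history = [stopper.step(loss) for loss in [1.0, 0.8, 0.9, 0.85]]
print(history)
```

In a training loop, the `improved` flag is the natural moment to write the best checkpoint, and `should_stop` ends the loop.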
Example command:
python "train_npevst+feat.py" \
--gpu 0 \
--epoch 100 \
--batch_size 32 \
--learning_rate 5e-4 \
--weight_decay 1e-4 \
--num_features 4 \
--log_dir ./experiments/wave_plus_feat_run
Differences vs waveform-only:
- Dataset becomes NPYPairFeatDataset
- Input feature length becomes 4 + timesteps
- models/pointnet_regression_npevst_add.py:
  - applies conv only to the waveform portion
  - concatenates CNN output with the static features, forming per-point feature dimension num_features + static_feats
export_visE.py extracts one column (default vise_col=7) from all y_i.npy files and reproduces the same test split as training (via the same seed-based permutation) to export only the test subset.
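The principle behind reproducing a split is that the same seed and the same sample count give the same permutation. The NumPy sketch below shows only that principle: `split_test_indices` is a hypothetical function, and a NumPy permutation will not match indices drawn by a different random generator, so a real export has to use the same generator and seed as the training script.

```python
import numpy as np

def split_test_indices(n_samples, seed=42, test_fraction=0.2):
    """Return the test-set indices of a seeded random split."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_test = int(n_samples * test_fraction)
    return perm[:n_test]

# Two independent calls with the same seed select the same samples.
first = split_test_indices(1000, seed=42)
second = split_test_indices(1000, seed=42)
print(np.array_equal(first, second), len(first))  # True 200
```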
Example:
python export_visE.py \
--x_dir /your/x_dir \
--y_dir /your/y_dir \
--start 3000 \
--end 5000 \
--vise_col 7 \
--seed 42 \
--out ./experiments/wave_only_run/visE_test.npy
The file plots.py contains utility functions/scripts for visualization and offline analysis of your experiment outputs. Typical use cases include:
- Plotting training curves (train/test loss vs epoch) from saved logs
- Visualizing prediction quality (e.g., predicted vs true scatter)
- Comparing different runs (e.g., different num_features, with/without static features)
- Additional custom plots for physics/geometry-related metrics (depending on what you save)
Because plots.py is a standalone script, usage depends on how the functions are written inside it. A common workflow is:
- Run training to produce outputs under --log_dir, such as:
  - loss_log.txt
  - learning_curve.png
  - predict_xyz.npy
  - true_xyz.npy
- Run/modify plots.py to read those files and generate additional figures.
Example (conceptual):
python plots.py --log_dir ./experiments/wave_only_run
Note: if plots.py currently does not expose a CLI, you can either:
- add an argparse interface, or
- import it from a notebook/script and call its functions directly.
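One offline metric that can be computed from predict_xyz.npy and true_xyz.npy is the opening angle between predicted and true directions. The sketch below is an illustration, not a function from plots.py: `opening_angle_deg` is a hypothetical name, and small inline arrays stand in for the saved files.

```python
import numpy as np

def opening_angle_deg(pred, true):
    """Angle in degrees between each predicted and true 3D vector."""
    pred_unit = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    true_unit = true / np.linalg.norm(true, axis=1, keepdims=True)
    # Clip guards against values slightly outside [-1, 1] from rounding.
    cos_angle = np.clip(np.sum(pred_unit * true_unit, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos_angle))

# Stand-ins for np.load("<log_dir>/predict_xyz.npy") and np.load("<log_dir>/true_xyz.npy").
pred = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 0.0]])
true = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
angles = opening_angle_deg(pred, true)
print(angles)
```

Normalizing the prediction first matters because a regressed vector is not guaranteed to have unit length.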
The code contains hard-coded paths. Follow Section 5.1 to update dataset paths, coordinates path, and the PMT type CSV path.
Your coordinates file is not shaped (17612, 3). This repo assumes N=17612 PMTs.
To reduce memory usage:
- Reduce --batch_size
- Reduce timesteps (requires consistent changes in dataset + model config)
- The model extracts features in PMT chunks using pmt_batch_size=1000; reducing it saves memory but may slow down training.
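Chunked extraction trades speed for peak memory without changing the result, as this NumPy sketch illustrates. `extract_in_chunks` and `toy_extractor` are hypothetical names, and the extractor is a simple statistic standing in for the CNN.

```python
import numpy as np

def extract_in_chunks(wave, extractor, pmt_batch_size=1000):
    """Apply extractor to wave (N, T) in slices of pmt_batch_size PMTs."""
    chunks = []
    for start in range(0, wave.shape[0], pmt_batch_size):
        chunks.append(extractor(wave[start:start + pmt_batch_size]))
    return np.concatenate(chunks, axis=0)

def toy_extractor(w):
    """Placeholder for the CNN: maps (M, T) -> (M, 2)."""
    return np.stack([w.sum(axis=1), w.max(axis=1)], axis=1)

wave = np.random.default_rng(0).random((17612, 100))
feats = extract_in_chunks(wave, toy_extractor, pmt_batch_size=1000)
print(feats.shape)  # (17612, 2)
```

Only one slice of intermediate results is alive at a time, so a smaller chunk size lowers the memory peak at the cost of more iterations. The last chunk is simply shorter when N is not a multiple of the chunk size.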
Recommended refactor: add --x_dir --y_dir --feat_dir --coords_path --pmt_type_csv and remove hard-coded paths.
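A minimal argparse sketch of that refactor is shown below. The flag names come from the recommendation above. Which flags are required, the help strings, and the `build_parser` function are assumptions, and the placeholder paths are not real locations.

```python
import argparse

def build_parser():
    """Command-line flags replacing the hard-coded paths (illustrative only)."""
    parser = argparse.ArgumentParser(description="Path options for training")
    parser.add_argument("--x_dir", required=True, help="directory with x_*.npy")
    parser.add_argument("--y_dir", required=True, help="directory with y_*.npy")
    parser.add_argument("--feat_dir", default=None,
                        help="static-feature directory (waveform+feat mode only)")
    parser.add_argument("--coords_path", required=True,
                        help="numpy file with PMT coordinates of shape (17612, 3)")
    parser.add_argument("--pmt_type_csv", required=True,
                        help="CSV mapping each PMT to its type")
    return parser

# Parse an explicit list here so the example runs without a real command line.
args = build_parser().parse_args([
    "--x_dir", "/your/x_dir",
    "--y_dir", "/your/y_dir",
    "--coords_path", "/your/norm_coords_single.npy",
    "--pmt_type_csv", "/your/PMTType_CD_LPMT.csv",
])
print(args.x_dir, args.feat_dir)
```

Passing the CSV path into the model constructor, rather than reading it inside the model file, would let both model files share the same flag.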
- PointNet++ utilities and Set Abstraction modules are implemented in models/pointnet_regression_utils.py.
- The CNN + PointNet++ hybrid models are in models/pointnet_regression_npevst*.py.
No LICENSE file is detected in the repository root. Add one if you plan to open-source/distribute the code.