DVCTr: Digital volume correlation based on Transformer

Introduction

Network architecture

DVCTr extends DICTr to Digital Volume Correlation (DVC). It uses a Transformer architecture to match volumetric image pairs precisely, enabling accurate displacement field estimation.

DVCTr consists of the following key components:

ResNet Encoder: Extracts multi-scale features from input volume images
Transformer Encoder: Enhances feature representation using Swin Transformer
Matching Block: Performs both global and local feature matching

The network processes input volumes at multiple resolutions, starting with global matching at lower resolution and refining with local matching at higher resolution.

Loss Function for Unsupervised DICTr Training

The unsupervised loss function combines photometric consistency with multi-resolution displacement gradient consistency (MrDGC), enabling training without labeled displacement data or specific assumptions about the smoothness of the displacement field.

1. Photometric Consistency Loss ($l_g$)

$l_{g}=\sum_{j=0}^{2} \frac{w_{j}}{N K_{g}} \sum_{i=0}^{N}\left| I_{0}\left(x_{i}\right)-I_{1}^{j}\left(x_{i}-u_{j}\left(x_{i}\right)\right)\right| _{1}$
Weights ($w_j$): $w_{j}=\frac{0.9^j}{\sum_{k=0}^{2} 0.9^{k}}$ (exponential decay with base 0.9, so intensity differences at higher resolution receive higher weights). The loss is normalized by $N$ (total number of pixels) and $K_g=255$ (maximum grayscale value of an 8-bit image).
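The weighting above can be illustrated with a minimal pure-Python sketch. It is not the code in loss.py: the function names are hypothetical, the volumes are flattened to plain lists, and the warped intensities $I_{1}^{j}(x_i - u_j(x_i))$ are assumed to be already computed and resampled to the size of the reference.

```python
def photometric_weights(num_levels=3, base=0.9):
    """Return normalized weights w_j = base**j / sum_k base**k."""
    raw = [base ** j for j in range(num_levels)]
    total = sum(raw)
    return [r / total for r in raw]


def photometric_loss(ref, warped_levels, k_g=255.0):
    """Weighted, normalized L1 intensity difference (illustrative only).

    ref: flat list of reference intensities I_0(x_i).
    warped_levels: one flat list per level j with the already warped
    intensities of the deformed volume (assumed precomputed).
    """
    n = len(ref)
    weights = photometric_weights(len(warped_levels))
    loss = 0.0
    for w, warped in zip(weights, warped_levels):
        l1 = sum(abs(a - b) for a, b in zip(ref, warped))
        loss += w * l1 / (n * k_g)
    return loss


weights = photometric_weights()
print([round(w, 4) for w in weights])  # weights decrease with j and sum to 1
```

Because the weights are normalized, a perfect match at every level gives a loss of 0, and a full-range mismatch (0 versus 255 everywhere) gives a loss of about 1.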

2. Multi-resolution Displacement Gradient Consistency Loss ($l_m$)

$l_{m}=\frac{h\left| g_{h}-g_{f}\right| _{1}+q\left| g_{q}-g_{f}\right| _{1}}{N K_{m}}$
Displacement gradients are calculated via central difference (forward/backward difference for edge pixels) and weighted with $h=0.9$ (1/2 resolution) and $q=0.1$ (1/4 resolution) to preserve genuine high-frequency deformation.
Normalized by $N$ (total number of pixels) and $K_m$ (a dimensionless factor determined via 1-epoch trial training).
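As a rough illustration of the two ingredients above, the following sketch computes a displacement gradient along one axis and evaluates $l_m$. It is a simplified 1D version with hypothetical function names, not the repository's implementation. It assumes that the gradients from the 1/2 and 1/4 resolution fields ($g_h$, $g_q$) have already been brought to the same grid as the full-resolution gradient $g_f$, and that $K_m$ is supplied as a plain number.

```python
def gradient_1d(u, spacing=1.0):
    """Central difference inside, forward/backward difference at the ends."""
    n = len(u)
    if n < 2:
        raise ValueError("need at least two samples")
    grad = [0.0] * n
    grad[0] = (u[1] - u[0]) / spacing        # forward difference at the first sample
    grad[-1] = (u[-1] - u[-2]) / spacing     # backward difference at the last sample
    for i in range(1, n - 1):
        grad[i] = (u[i + 1] - u[i - 1]) / (2.0 * spacing)  # central difference
    return grad


def mrdgc_loss(g_h, g_q, g_f, k_m, h=0.9, q=0.1):
    """l_m = (h*|g_h - g_f|_1 + q*|g_q - g_f|_1) / (N * K_m)."""
    n = len(g_f)
    l1_h = sum(abs(a - b) for a, b in zip(g_h, g_f))
    l1_q = sum(abs(a - b) for a, b in zip(g_q, g_f))
    return (h * l1_h + q * l1_q) / (n * k_m)


g_f = gradient_1d([0.0, 1.0, 4.0, 9.0])
print(g_f)  # [1.0, 2.0, 4.0, 5.0]
```

The loss is zero when the coarse-resolution gradients agree with the full-resolution gradient, so it penalizes inconsistency between resolutions rather than the gradient magnitude itself.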

3. Total Loss Function

$l_{I}=w_{g} l_{g}+w_{m} l_{m}$
Optimal weight ratio for DVCTr: $w_g : w_m = 2 : 1$.
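A small sketch of how the 2 : 1 ratio translates into concrete weights. Normalizing the weights so that they sum to 1 is an assumption taken from the `lg+lm=1` comment in train.sh; the function names are hypothetical.

```python
def loss_weights(ratio_g=2.0, ratio_m=1.0):
    """Turn a ratio w_g : w_m into weights that sum to 1 (assumed convention)."""
    total = ratio_g + ratio_m
    return ratio_g / total, ratio_m / total


def total_loss(l_g, l_m, w_g, w_m):
    """Weighted sum of photometric loss l_g and MrDGC loss l_m."""
    return w_g * l_g + w_m * l_m


w_g, w_m = loss_weights()  # 2 : 1 gives w_g = 2/3 and w_m = 1/3
print(total_loss(0.3, 0.6, w_g, w_m))
```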

Prerequisites

We recommend creating a Conda environment through the YAML file provided in the repository:

conda env create -f environment.yaml
conda activate dvctr

Datasets

Generate synthetic speckle datasets using the MATLAB scripts provided:

cd ./dataset/SpeckleDataset
matlab -nosplash -nodesktop -r main

Key parameters in main.m:

% Total number of training and validation samples
batch_size_total = 3200;
% Control displacement magnitude
sigma_d
% Control displacement gradient
% Multiple gradient displacements to enrich the dataset
% Smaller values represent larger relative displacement gradients
grid_size_list = [6, 8, 16, 24];
% Number of speckles
% Determined based on duty cycle
NUMSPECKLE = 6000;
% Control speckle radius
sigma_r = 1.1;
% Noise parameters
% 2% Gaussian noise
myparamnoise = paramnoise('G',0.02);

Training

Execute the following command in the root directory of the repository using the provided script:

sh ./train.sh

Key parameters in train.sh:

# Batch size per GPU
--batch_size 4
# Number of transformer layers
--num_transformer_layers 10
# Attention splits list
--attn_splits_list 2 8
# Correlation radius list
--corr_radius_list -1 4
# Total training steps
--num_steps 100000
# Loss weight for photometric loss
# lm: MrDGC loss, lg+lm=1
--lg 2/3
# Loss weight for half-resolution gradient difference loss
# q: quarter-resolution, q+h=1
--h 0.9

The script uses the PyTorch distributed launcher for efficient multi-GPU training.

Training Strategy

To improve training efficiency and reduce grayscale overfitting in unsupervised learning, an early stopping strategy is used. The reduction in grayscale difference loss saturates after sufficient training, so the relative reduction of this loss during validation can serve as the stopping criterion. Our experiments suggest that a threshold of 2% is a reasonable choice. For example, if the relative reduction of the mean absolute error of grayscale values stays below 2% for three consecutive epochs, training can be stopped, and the model from the beginning of those three epochs is used as the fully trained model for inference.
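The stopping rule described above can be sketched as follows. This is an illustrative helper, not part of the repository: the function name and the list of per-epoch validation errors are hypothetical, and "the model at the beginning of the three epochs" is interpreted here as the checkpoint saved just before the three low-improvement epochs.

```python
def early_stop_epoch(val_mae, threshold=0.02, patience=3):
    """Return the index of the epoch whose checkpoint to keep, or None.

    val_mae[k] is the validation mean absolute grayscale error after epoch k.
    Training stops once the relative reduction stays below `threshold`
    for `patience` consecutive epochs.
    """
    streak = 0
    for k in range(1, len(val_mae)):
        prev, curr = val_mae[k - 1], val_mae[k]
        reduction = (prev - curr) / prev if prev > 0 else 0.0
        # An increase in error also counts as "below threshold".
        streak = streak + 1 if reduction < threshold else 0
        if streak == patience:
            return k - patience  # checkpoint from before the stagnating epochs
    return None


history = [10.0, 8.0, 7.0, 6.95, 6.9, 6.88]
print(early_stop_epoch(history))  # 2
```

In this made-up history the error drops by 20% and 12.5% in the first two steps, then by less than 1% for three epochs in a row, so the checkpoint after epoch index 2 is kept.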

For reference, DVCTr was trained on a high-performance computing server with four NVIDIA A800 SXM4 GPUs (80 GB VRAM). With a batch size of 4, training took around 30 hours.

Running Inference

Execute the following command in the root directory of the repository using the provided script:

sh ./experiment.sh

Key parameters in experiment.sh:

# Path to the pretrained model
--resume checkpoints/UnsupervisedDVCTr.pth
# Types of experiments to run (tfm, rotation, simulate)
--exp_type tfm rotation simulate
# Number of transformer layers
--num_transformer_layers 10
# Attention splits list
--attn_splits_list 2 8
# Correlation radius list
--corr_radius_list -1 4

This script runs the model on predefined test cases to evaluate its performance on different deformation scenarios.

Example inference results:

Figure: displacement inference result for a 128×128×32 voxel volume rotated by 5° in the x-y plane.

Figure: inference result for a 128×128×32 voxel volume with relatively random complex deformation.

Pretrained Models

The pretrained models of supervised and unsupervised DVCTr used in the paper are provided in the repository:

Supervised model: ./checkpoints/SupervisedDVCTr.pth
Unsupervised model: ./checkpoints/UnsupervisedDVCTr.pth

These models will be loaded in the default experiment script.

Citation

@article{HE2026114939,
    title = {Unsupervised Transformer-based deep learning for digital image correlation and digital volume correlation},
    journal = {Optics \& Laser Technology},
    volume = {198},
    pages = {114939},
    year = {2026},
    issn = {0030-3992},
    doi = {https://doi.org/10.1016/j.optlastec.2026.114939},
    url = {https://www.sciencedirect.com/science/article/pii/S0030399226002902},
    author = {He, Haoyang and Zhou, Yifei and Zhang, Yajing and Cai, Yuqi and Li, Rui and Liu, Yiping and Tang, Liqun and Sun, Taolin and Jiang, Zhenyu}
}
