
Add configurable anisotropic downsampling support to AutoencoderKL an… #8856

Open
shubham-61969 wants to merge 2 commits into Project-MONAI:dev from shubham-61969:8447-autoencoderkl-kernel-stride

Conversation

@shubham-61969
Contributor

…d relevant testcases

Fixes #8447.

Description

This PR adds configurable anisotropic downsampling support to AutoencoderKL.

Previously, AutoencoderKL hardcoded:

  • kernel_size=3
  • stride=2
  • isotropic downsampling assumptions
  • asymmetric padding logic coupled to the default configuration

This PR introduces configurable per-level and per-dimension downsampling parameters while preserving backward compatibility and encoder-decoder spatial consistency.

Key changes:

  • Added configurable downsampling parameters for AEKLDownsample

  • Added helper utilities for:

    • parameter normalization
    • validation
    • automatic padding computation
  • Added support for anisotropic configurations such as:

    • stride=(2,2,1)
    • kernel_size=(3,3,1)
  • Removed dependency on hardcoded asymmetric padding for configurable paths

  • Updated decoder upsampling to automatically mirror encoder downsampling configuration

  • Added validation for:

    • odd kernels only
    • valid tuple lengths
    • correct number of downsampling levels
  • Added comprehensive tests covering:

    • backward compatibility
    • anisotropic 2D/3D configurations
    • per-level configurations
    • reconstruction shape consistency
    • non-power-of-two spatial dimensions
    • invalid configuration handling

This is particularly useful for medical imaging workloads with anisotropic voxel spacing such as CT and MRI volumes.
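
To make the normalization, validation, and padding rules listed above concrete, here is a minimal pure-Python sketch. The names `normalize_param` and `make_level` are hypothetical and are not taken from the PR; only the rules themselves (int or per-dimension tuple, odd kernels only, padding derived as `kernel_size // 2`) come from the description.

```python
from typing import Sequence, Union

IntOrSeq = Union[int, Sequence[int]]


def normalize_param(value: IntOrSeq, spatial_dims: int, name: str) -> tuple:
    """Expand an int to one value per spatial dimension, or check a sequence's length."""
    if isinstance(value, int):
        return (value,) * spatial_dims
    values = tuple(int(v) for v in value)
    if len(values) != spatial_dims:
        raise ValueError(f"{name} must have {spatial_dims} entries, got {len(values)}")
    return values


def make_level(kernel_size: IntOrSeq, stride: IntOrSeq, spatial_dims: int) -> dict:
    """Build one hypothetical per-level config with padding derived as k // 2."""
    ks = normalize_param(kernel_size, spatial_dims, "kernel_size")
    st = normalize_param(stride, spatial_dims, "stride")
    if any(k % 2 == 0 for k in ks):
        raise ValueError(f"kernel_size must be odd in every dimension, got {ks}")
    return {"kernel_size": ks, "stride": st, "padding": tuple(k // 2 for k in ks)}


# Anisotropic 3D level: downsample in-plane, leave the third axis untouched.
level = make_level(kernel_size=(3, 3, 1), stride=(2, 2, 1), spatial_dims=3)
print(level)
```

With these rules, a kernel of 1 along an axis gets padding 0 and, combined with stride 1, leaves that axis at its original length.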

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

…d relevant testcases

Signed-off-by: Shubham Chandravanshi <[email protected]>
@coderabbitai
Contributor

coderabbitai Bot commented May 17, 2026

📝 Walkthrough

This PR adds per-level configurable downsampling and upsampling parameters to AutoencoderKL. New validation utilities enforce odd kernel sizes and normalize parameters across spatial dimensions. AEKLDownsample is refactored to accept kernel_size, stride, and padding. Encoder normalizes and applies per-level parameters; Decoder reverses them to compute per-dimension upsampling scale_factors. AutoencoderKL exposes downsample_parameters and wires encoder/decoder consistently. Tests cover valid anisotropic configs, validation errors, and reconstruction/shape robustness.
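
The mirroring step described in the walkthrough can be sketched as follows. This assumes the per-level parameters are held as a list of dicts with a `"stride"` key; the helper name `decoder_scale_factors` is hypothetical, while the `tuple(float(s) for s in ...)` conversion matches the expression quoted in the review comments.

```python
def decoder_scale_factors(encoder_levels):
    """Reverse the encoder's per-level strides into per-level upsampling factors."""
    return [tuple(float(s) for s in level["stride"]) for level in reversed(encoder_levels)]


# Two encoder levels: isotropic first, then in-plane only.
encoder_levels = [
    {"kernel_size": (3, 3, 3), "stride": (2, 2, 2)},
    {"kernel_size": (3, 3, 1), "stride": (2, 2, 1)},
]

scale_factors = decoder_scale_factors(encoder_levels)
print(scale_factors)
```

The decoder undoes the last encoder level first, so the list order is reversed while each per-dimension tuple keeps its axis order.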

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 78.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title partially describes the main change. It mentions 'configurable anisotropic downsampling support to AutoencoderKL' which is the primary objective, though it appears truncated. |
| Description check | ✅ Passed | Description is comprehensive and complete. It covers objectives, types of changes, key implementation details, and validates alignment with the template requirements. |
| Linked Issues check | ✅ Passed | Code changes meet all core requirements from #8447: configurable kernel_size and stride per level/dimension, automatic padding derivation, backward compatibility, and encoder-decoder consistency validation. |
| Out of Scope Changes check | ✅ Passed | All changes are in-scope. Modified files directly implement #8447 requirements: AEKLDownsample refactoring, Encoder/Decoder parameter support, and comprehensive test coverage. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
monai/networks/nets/autoencoderkl.py (1)

636-667: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

ConvTranspose path ignores anisotropic stride.

When use_convtranspose=True, the Upsample call doesn't receive the per-level stride and defaults to stride=2. This breaks anisotropic configurations (e.g., stride=(2,2,1)). The upsampling_stride computed on line 638 is unused in this branch, while the nontrainable path correctly applies it as scale_factor.

Proposed fix
             if use_convtranspose:
                 blocks.append(
                     Upsample(
-                        spatial_dims=spatial_dims, mode="deconv", in_channels=block_in_ch, out_channels=block_in_ch
+                        spatial_dims=spatial_dims,
+                        mode="deconv",
+                        in_channels=block_in_ch,
+                        out_channels=block_in_ch,
+                        scale_factor=tuple(float(s) for s in upsampling_stride),
                     )
                 )

Note: Anisotropic stride tests exist but don't exercise the convtranspose path, leaving this bug untested.
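
A small shape calculation illustrates the consequence. This is a sketch only: it models each upsampling stage as a plain per-axis multiplication and does not call MONAI's `Upsample`. The example input shape is an assumption chosen for illustration.

```python
import math


def downsample(shape, stride):
    """Per-axis output size of a strided conv with padding = kernel_size // 2."""
    return tuple(math.ceil(n / s) for n, s in zip(shape, stride))


def upsample(shape, scale):
    """Per-axis output size after upsampling by an integer factor."""
    return tuple(n * s for n, s in zip(shape, scale))


input_shape = (64, 64, 16)  # assumed example volume with few slices on the last axis
stride = (2, 2, 1)          # anisotropic encoder level from the PR description

latent = downsample(input_shape, stride)
isotropic = upsample(latent, (2, 2, 2))  # what a default stride of 2 on every axis yields
mirrored = upsample(latent, stride)      # what mirroring the encoder stride yields

print(latent, isotropic, mirrored)
```

Only the mirrored variant returns to the input shape; the isotropic default doubles the axis that the encoder never downsampled.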

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@monai/networks/nets/autoencoderkl.py` around lines 636 - 667, The
convtranspose branch for upsampling ignores the computed per-level
upsampling_stride (variable upsampling_stride) and always uses the default
stride, breaking anisotropic cases; modify the use_convtranspose branch in the
loop that builds blocks so the Upsample(...) call for mode="deconv" receives the
per-level scale/stride (e.g., pass scale_factor=tuple(float(s) for s in
upsampling_stride) or the appropriate strides argument accepted by Upsample) so
it uses the anisotropic upsampling_stride instead of the hardcoded default;
update the Upsample(...) invocation in the use_convtranspose True branch (the
block creating Upsample with mode="deconv" and in_channels=block_in_ch) to
include that scale/stride parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/networks/nets/test_autoencoderkl.py`:
- Around line 560-576: The test test_validation_even_kernel_raises_error
currently fails for the wrong reason because the supplied downsample_parameters
list length doesn't match the expected number of downsampling levels for the
provided channels, so the level-count validation triggers before kernel-size
validation; update the test to supply a downsample_parameters list whose length
matches the required number of levels for AutoencoderKL (e.g., for
channels=(4,4,4) provide two dicts) and ensure at least one dict uses an even
"kernel_size" (e.g., 4) so that instantiating AutoencoderKL(...) raises the
intended ValueError about even kernel sizes rather than the level-count
mismatch.
- Around line 578-595: The test
test_validation_invalid_tuple_length_raises_error is failing because the
level-count mismatch validation runs before the tuple-length check; to reach the
tuple-length validation you must provide two downsample parameter dicts in
downsample_parameters so the number of levels matches attention_levels and
channels length, then still include invalid tuple lengths (e.g., kernel_size and
stride with only 2 elements) to trigger ValueError from AutoencoderKL; update
the downsample_params used in the test (referenced variable downsample_params
and class AutoencoderKL) to contain two dicts with the bad tuples so the
tuple-length validation is exercised.
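
The ordering problem behind both test comments can be reproduced with a toy validator. Everything here is hypothetical (the function `validate`, its messages, and the check order are assumptions, not the PR's code); the point is only that a bare "raises ValueError" assertion cannot tell the two failures apart.

```python
def validate(downsample_parameters, num_levels, spatial_dims):
    """Toy validator that checks the level count before per-level kernel rules."""
    if len(downsample_parameters) != num_levels:
        raise ValueError(
            f"expected {num_levels} downsampling levels, got {len(downsample_parameters)}"
        )
    for level in downsample_parameters:
        ks = level["kernel_size"]
        ks = (ks,) * spatial_dims if isinstance(ks, int) else tuple(ks)
        if len(ks) != spatial_dims:
            raise ValueError(f"kernel_size must have {spatial_dims} entries, got {len(ks)}")
        if any(k % 2 == 0 for k in ks):
            raise ValueError(f"kernel_size must be odd, got {ks}")


def error_message(params, num_levels=2, spatial_dims=3):
    """Return the ValueError message raised for params, or None if validation passes."""
    try:
        validate(params, num_levels, spatial_dims)
    except ValueError as exc:
        return str(exc)
    return None


# One dict for a two-level network: fails on the level count, not the even kernel.
wrong_reason = error_message([{"kernel_size": 4}])
# Two dicts, one with an even kernel: now the kernel check is what fails.
right_reason = error_message([{"kernel_size": 4}, {"kernel_size": 3}])

print(wrong_reason)
print(right_reason)
```

In a `unittest` test case, `assertRaisesRegex(ValueError, ...)` with a pattern matching the intended message is one way to keep such a test from passing for the wrong reason.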


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8b8b88a6-f7ce-47d8-b866-6f242adcbc65

📥 Commits

Reviewing files that changed from the base of the PR and between 2a98f63 and 0a90773.

📒 Files selected for processing (2)
  • monai/networks/nets/autoencoderkl.py
  • tests/networks/nets/test_autoencoderkl.py

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
monai/networks/nets/autoencoderkl.py (1)

668-692: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Pass anisotropic stride to deconv branch.

The deconv upsampling ignores upsampling_stride (line 667) and defaults to isotropic ×2, while the nontrainable branch correctly passes scale_factor=tuple(float(s) for s in upsampling_stride). For anisotropic configs like (2, 2, 1), deconv will upscale incorrectly.

Add scale_factor=tuple(float(s) for s in upsampling_stride) to the deconv Upsample call (lines 668-673).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@monai/networks/nets/autoencoderkl.py` around lines 668 - 692, The deconv
branch inside the upsampling construction (when use_convtranspose is True)
currently creates an Upsample(mode="deconv", ...) but omits the anisotropic
upsampling factor; pass the same computed scale factor used by the nontrainable
branch by adding scale_factor=tuple(float(s) for s in upsampling_stride) to that
Upsample(...) call so Upsample(mode="deconv", ...) uses the correct anisotropic
upsampling_stride value (refer to the Upsample instantiation, use_convtranspose
flag, and the upsampling_stride variable).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@monai/networks/nets/autoencoderkl.py`:
- Around line 130-139: The current None-handling for downsample_parameters
normalizes to symmetric padding via
_validate_kernel_stride_parameters/_compute_padding, changing legacy behavior;
revert to special-casing the legacy default when downsample_parameters is None
by returning per-level entries that match the original AsymmetricPad + Conv
semantics (kernel_size=3, stride=2, padding=0) instead of computed symmetric
padding—use the symbols downsample_parameters,
default_kernel_size/default_stride, spatial_dims and num_levels to locate the
branch and ensure each returned dict keeps padding=0 (caller is expected to
apply the AsymmetricPad((0,1)*spatial_dims) behavior externally) so existing
checkpoints keep the same behavior.
- Around line 85-99: The current _compute_padding that returns padding = tuple(k
// 2 for k in kernel_size) produces symmetric padding only and does not preserve
spatial sizes for non-divisible inputs; update the encoder/decoder to record
per-stage spatial outputs (target sizes) during encoding and use those targets
in reconstruct() to compute and apply either per-stage output_padding for
ConvTranspose (based on stride and recorded encoder sizes) or explicit cropping
after upsampling, rather than relying on fixed symmetric padding. Specifically,
modify the code paths around _compute_padding and the encoder forward that
produces ceil(n/stride) to store each intermediate spatial shape, and update the
decoder/ConvTranspose reconstruction logic (where output_padding or cropping is
applied) to use those stored sizes to guarantee exact recovery for
kernels/strides such as kernel=3,stride=2 (also fix the similar logic referenced
at lines 691-692).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 43f2a9ce-100e-4c3b-a9b2-baea9721e870

📥 Commits

Reviewing files that changed from the base of the PR and between 0a90773 and 6bc4ebc.

📒 Files selected for processing (2)
  • monai/networks/nets/autoencoderkl.py
  • tests/networks/nets/test_autoencoderkl.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/networks/nets/test_autoencoderkl.py

Comment on lines +85 to +99
def _compute_padding(kernel_size: tuple[int, ...]) -> tuple[int, ...]:
    """
    Compute symmetric padding for odd kernel sizes.

    Padding is derived as:
        padding[d] = kernel_size[d] // 2

    Args:
        kernel_size: Kernel size for each spatial dimension.

    Returns:
        Tuple of padding values for each spatial dimension.
    """
    padding = tuple(k // 2 for k in kernel_size)
    return padding
Contributor


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Static symmetric padding still breaks odd/non-divisible reconstruction sizes.

padding[d] = kernel_size[d] // 2 makes the encoder output ceil(n / stride[d]), while the decoder only multiplies by stride[d]. For (kernel=3, stride=2), a size-5 axis becomes 3 after encode and 6 after decode, so reconstruct() can still return the wrong spatial shape on non-divisible inputs. This needs per-stage target sizes or crop/output-padding metadata, not just fixed padding.

Also applies to: 691-692
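
The size arithmetic above, and the "record per-stage target sizes, then crop" remedy it points to, can be sketched for a single axis as follows. The helper names are hypothetical, and cropping is just one of the two options mentioned (the other being output-padding metadata for the transposed convolution).

```python
import math


def encode_sizes(n, strides):
    """Return the axis length at each stage, starting with the input length."""
    sizes = [n]
    for s in strides:
        # Odd kernel with padding = kernel // 2 gives ceil(n / stride).
        sizes.append(math.ceil(sizes[-1] / s))
    return sizes


def decode_size(sizes, strides, crop):
    """Upsample back through the stages, optionally cropping to the recorded sizes."""
    n = sizes[-1]
    for target, s in zip(reversed(sizes[:-1]), reversed(strides)):
        n = n * s
        if crop:
            n = min(n, target)  # trim the overshoot introduced by ceil()
    return n


strides = [2, 2]
sizes = encode_sizes(5, strides)                  # lengths seen by the encoder
naive = decode_size(sizes, strides, crop=False)   # multiply by stride only
cropped = decode_size(sizes, strides, crop=True)  # crop to recorded encoder sizes

print(sizes, naive, cropped)
```

With one level, the same helpers reproduce the 5 → 3 → 6 case from the comment; with two levels the uncropped overshoot grows to 8, while cropping at each stage recovers 5.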


Comment on lines +130 to +139
    if downsample_parameters is None:
        # Default: use provided defaults for all levels
        default_ks_tuple, default_s_tuple = _validate_kernel_stride_parameters(
            default_kernel_size, default_stride, spatial_dims
        )
        default_padding = _compute_padding(default_ks_tuple)
        return [
            {"kernel_size": default_ks_tuple, "stride": default_s_tuple, "padding": default_padding}
            for _ in range(num_levels)
        ]
Contributor


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

downsample_parameters=None no longer preserves the legacy path.

This now normalizes the default case to kernel_size=3, stride=2, padding=1, but the previous implementation was AsymmetricPad((0, 1) * spatial_dims) + Conv(..., kernel_size=3, stride=2, padding=0). Those are not equivalent: a length-5 axis previously went to 2 and now goes to 3, and even for even-sized inputs, where the output length is unchanged, each window sees different border context. Existing checkpoints on the default config will silently change behavior unless the legacy default is special-cased.
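
The difference between the two padding schemes can be checked with the standard convolution output-length formula (dilation 1). This is a pure-arithmetic sketch for one axis; the function names are hypothetical and no MONAI code is called.

```python
def conv_out(n, kernel, stride, pad_before, pad_after):
    """Output length of a 1D convolution with explicit padding on each side."""
    return (n + pad_before + pad_after - kernel) // stride + 1


def legacy(n):
    """AsymmetricPad (0, 1) followed by Conv(kernel_size=3, stride=2, padding=0)."""
    return conv_out(n, kernel=3, stride=2, pad_before=0, pad_after=1)


def symmetric(n):
    """Conv(kernel_size=3, stride=2, padding=1)."""
    return conv_out(n, kernel=3, stride=2, pad_before=1, pad_after=1)


for n in (4, 5, 6, 7):
    print(n, legacy(n), symmetric(n))

# The first window starts at input index -pad_before: 0 for legacy, -1 for symmetric,
# so even when the output lengths agree the windows cover different input positions.
```

For even lengths the two schemes agree on output size, while for odd lengths the legacy path rounds down and the symmetric path rounds up.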



Development

Successfully merging this pull request may close these issues.

AutoencoderKL does not allow to modify the kernel size and stride

1 participant