feat: make tiling the default input_prep, with ViT patch-grid safety guards #89
Merged
Summary
Changes the default `input_prep` policy from `"resize"` to tiling, so that large on-the-fly inputs preserve native resolution by default instead of being silently downsampled to the model's image size.

- `input_prep=None` is now the "unset / use package default" sentinel and resolves to `"tile"`. This default is applied consistently across `get_embedding`, `get_embeddings_batch`, `Model(...)`, and `ExportConfig`. `"resize"` and `"auto"` remain available as explicit opt-ins.
- For the ViT patch-grid models (`remoteclip`, `scalemae`, `satmae`, `satmaepp`, `satmaepp_s2_10b`), an unset/`"auto"` `input_prep` resolves to `"resize"` instead of tile, because their tiled patch-token grids are not a seamless spatial mosaic and can show stitching seams. Requesting `grid` output on these models emits a warning; explicit `"tile"` still works but warns and records seam-risk metadata. The resolved embedding records the model policy in `meta["input_prep"]`.
- The `max_tiles` default is raised 9 → 64 and is now a soft threshold (requests above it still run but warn that they may be slow / memory-heavy). A new `max_tiles_hard` ceiling (default `1024`, clamped `>= max_tiles`) raises an error to guard against runaway-area mistakes.
- Falling back to resize records `resolved_mode="resize"` in meta, so basic usage never breaks and the fallback stays auditable.
- A `meta["input_prep"]` block (`INPUT_PREP_VERSION = 1`) records requested vs. resolved mode (and tiling layout / counts when tiled), regardless of which code path produced it.
- `remoteclip` pooled/grid consistency fix. The single path now uses `encode_image` (canonical 512-d CLIP embedding), so `pooled` is identical across single, batch, and tiled paths instead of being a 768-d raw-token mean on the single path. `grid` extraction was rewritten to read open_clip's `forward_intermediates` patch grid. See `docs/models/remoteclip.md`.

Why: the previous `"resize"` default quietly discarded spatial resolution for larger ROIs, and tile mode was easy to trip into hard errors. This makes the high-fidelity behavior the default while guarding the cases (ViT patch grids, huge areas) where naive tiling misleads.

Testing
How did you verify the change?
- `CHANGELOG.md` for user-facing API, model, semantic, or installation changes.

New/updated coverage: `tests/test_model_aware_input_prep.py` (default resolves to resize with metadata for ViT-grid models; explicit tile allowed but marked experimental), plus updated `test_input_prep_tiling.py`, `test_specs.py`, `test_api.py`, `test_batch_overrides_all_models.py`, `test_combined_flow.py`, `test_export_batch.py`, and `test_scalemae_preprocess_alignment.py`.

Notes
- Callers relying on the previous `"resize"` default will now tile by default; results for large ROIs change shape/values. Pin `input_prep="resize"` to keep prior behavior. Documented in `CHANGELOG.md` under Changed.
- `pooled` output is the recommended path for the affected ViT patch-grid models.
- Docs: `choosing_settings.md`, `concepts.md`, `api_embedding.md`, `api_specs.md`, and the per-model pages for the affected ViT models.