Skip to content

fix: combined export_batch manifest lost all but the first model entry#88

Merged
Dinghye merged 3 commits into
mainfrom
combinebug
Jun 11, 2026
Merged

fix: combined export_batch manifest lost all but the first model entry#88
Dinghye merged 3 commits into
mainfrom
combinebug

Conversation

@Dinghye

@Dinghye Dinghye commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes a bug reported by a user: in multi-model combined exports (export_batch with a single output file), the manifest's models list ended up with only one entry — each checkpoint write effectively rewrote it instead of accumulating.

Root cause

CheckpointManager.combined_write_checkpoint returned the jsonable copy of the manifest that write_arrays operates on, not the original dict. run_pending_models rebinds its local manifest to that returned copy after each per-model checkpoint, while the _write_ckpt closure in BatchExporter._run_combined keeps serializing the stale original — so every model entry after the first was appended to a throwaway copy and dropped from both the in-memory result and the on-disk manifest. As a side effect, combined-mode resume would needlessly re-run already-completed models.

Fix

combined_write_checkpoint now merges the writer's path keys (manifest_path, npz_path, npz_keys, nc_path, nc_variables) back into the original manifest and returns that same object, preserving the in-place mutation contract. Per-item exports were never affected (each point's manifest is fully built before its single write).

Tests

  • New tests/test_checkpoint_manager.py with two regression tests; the second faithfully mirrors the exporter's closure + run_pending_models loop structure. Verified both fail on the pre-fix code and pass with the fix.
  • Full suite: 751 passed, 2 skipped (after merging latest main).

@Dinghye Dinghye merged commit 35a3fc8 into main Jun 11, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant